
Infinite loop in BandwidthAllocator.allocate() causes CPU 100% under high load (stable-10431) #2366

@semi-lion

Description

An infinite loop in the BandwidthAllocator.allocate() method drives CPU usage to 100% when handling a large number of participants. The issue occurs when bandwidth is insufficient: the improve() method returns 0, yet the while loop condition oldRemainingBandwidth != remainingBandwidth stays true, so the loop never exits. The problem is intermittent and occurs when the stress level exceeds 1.0, even with the configured limit of 60 participants per JVB.
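
To help narrow down when the loop condition can stay true, here is a toy model of the loop shape described above. It is a sketch only, not the JVB implementation: ToySource, its layer steps, and countPasses are hypothetical names invented for illustration. In this model, a pass in which every improve() call returns 0 leaves the remaining bandwidth unchanged and ends the loop, so a non-terminating run would need at least one non-zero return in every pass. That is an observation about the model, not a confirmed diagnosis of the real code.

```kotlin
// Toy model of the allocation loop's shape; NOT the JVB implementation.
// ToySource and countPasses are hypothetical names used for illustration only.
class ToySource(private val steps: List<Long>) {
    private var next = 0

    // Returns the bandwidth consumed by moving to the next layer, or 0 if
    // there is no next layer or it does not fit in the remaining bandwidth.
    fun improve(remaining: Long): Long {
        if (next >= steps.size || steps[next] > remaining) return 0L
        return steps[next++]
    }
}

// Runs passes until a full pass leaves the remaining bandwidth unchanged.
// Returns the number of passes executed.
fun countPasses(sources: List<ToySource>, bweBps: Long): Int {
    var remaining = bweBps
    var oldRemaining = -1L
    var passes = 0
    while (oldRemaining != remaining) {
        oldRemaining = remaining
        passes++
        for (source in sources) {
            remaining -= source.improve(remaining)
        }
    }
    return passes
}

fun main() {
    // Bandwidth too low for any layer: every improve() returns 0.
    val starved = List(60) { ToySource(listOf(150_000L, 500_000L)) }
    println(countPasses(starved, bweBps = 100_000L)) // prints 1

    // Enough bandwidth for some layers: the loop settles after a few passes.
    val partial = List(3) { ToySource(listOf(150_000L, 500_000L)) }
    println(countPasses(partial, bweBps = 1_000_000L)) // prints 3
}
```

Logging the per-pass values of remainingBandwidth in the real allocate() method would show whether the production loop behaves differently from this model.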

Current behavior
CPU usage reaches 100% under high load scenarios
Infinite loop in bandwidth allocation algorithm
System becomes unresponsive and load balancer redirects traffic to other JVB instances
Logs show 3,430+ WARNING messages related to TCC packet processing
High RTT values (6000ms+) and packet reordering issues
Intermittent stress level exceeding 1.0 (overloaded state)
Occurs even with 60 participants per JVB limit configured
Stress level fluctuates and occasionally spikes above threshold

Expected behavior
Bandwidth allocation should complete within reasonable iterations
CPU usage should remain within normal limits even under high load
System should handle 60 participants per JVB without infinite loops
Bandwidth allocation should gracefully handle insufficient bandwidth scenarios
Stress level should remain below 1.0 consistently

Possible solution
Add a loop counter with a maximum iteration limit, plus early-termination logic:

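// Proposed replacement for the allocation loop in allocate().
// remainingBandwidth, oldRemainingBandwidth, oversending and
// sourceBitrateAllocations are assumed to be declared earlier in the method.
var loopCount = 0
val maxLoops = 100

while (oldRemainingBandwidth != remainingBandwidth && loopCount < maxLoops) {
    loopCount++
    oldRemainingBandwidth = remainingBandwidth
    // Bandwidth consumed by all sources during this pass.
    var totalConsumed = 0L

    for (i in sourceBitrateAllocations.indices) {
        val sourceBitrateAllocation = sourceBitrateAllocations[i]
        if (sourceBitrateAllocation.constraints.isDisabled()) {
            continue
        }

        val consumed = sourceBitrateAllocation.improve(remainingBandwidth, i == 0)
        remainingBandwidth -= consumed
        totalConsumed += consumed

        if (remainingBandwidth < 0) {
            oversending = true
        }

        if (sourceBitrateAllocation.isOnStage() && !sourceBitrateAllocation.hasReachedPreferred()) {
            break
        }
    }

    // Early termination if no bandwidth was consumed in this pass.
    // Note: totalConsumed == 0 means remainingBandwidth is unchanged, so the
    // loop condition would also stop here; the explicit check adds a debug log.
    if (totalConsumed == 0L) {
        logger.debug("No bandwidth consumed in iteration $loopCount, breaking loop")
        break
    }
}

// Warn only if the cap was hit while the allocation was still changing.
if (loopCount >= maxLoops && oldRemainingBandwidth != remainingBandwidth) {
    logger.warn("Bandwidth allocation loop exceeded maximum iterations: $loopCount")
}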
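
The iteration cap can be exercised in isolation with a self-contained sketch. Everything here is hypothetical and for illustration only: Improvable, LoopResult, and allocateBounded are not JVB types, and the oscillating source (alternately taking and releasing 1000 bps) is a deliberately pathological stand-in. Whether the real improve() can ever behave this way is not established by this report.

```kotlin
// Hypothetical stand-in for a source allocation; not the JVB class.
fun interface Improvable {
    fun improve(remaining: Long): Long
}

data class LoopResult(val passes: Int, val converged: Boolean, val remaining: Long)

// Same loop shape as the proposal above, with the cap as a parameter.
fun allocateBounded(sources: List<Improvable>, bweBps: Long, maxPasses: Int = 100): LoopResult {
    var remaining = bweBps
    var oldRemaining = -1L
    var passes = 0
    while (oldRemaining != remaining && passes < maxPasses) {
        oldRemaining = remaining
        passes++
        for (source in sources) {
            remaining -= source.improve(remaining)
        }
    }
    // converged is true only if the last pass left the bandwidth unchanged.
    return LoopResult(passes, converged = oldRemaining == remaining, remaining = remaining)
}

fun main() {
    // Pathological source: never settles, so only the cap ends the loop.
    var take = true
    val oscillating = Improvable { _ ->
        val delta = if (take) 1_000L else -1_000L
        take = !take
        delta
    }
    println(allocateBounded(listOf(oscillating), bweBps = 10_000L))
    // prints LoopResult(passes=100, converged=false, remaining=10000)

    // Well-behaved source: consumes once, then reports 0.
    var done = false
    val oneShot = Improvable { remaining ->
        if (!done && remaining >= 1_000L) { done = true; 1_000L } else 0L
    }
    println(allocateBounded(listOf(oneShot), bweBps = 10_000L))
    // prints LoopResult(passes=2, converged=true, remaining=9000)
}
```

Returning a converged flag separately from the pass count lets the caller tell a loop that settled on its final allowed pass apart from one that was cut off by the cap.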

Steps to reproduce
Set up a Jitsi Meet conference with 60 participants per JVB (configured limit)
Have participants turn on video simultaneously or in quick succession
Monitor CPU usage and JVB logs
Observe the following symptoms:
Intermittent stress level spikes above 1.0, even with the participant limit in place
CPU usage reaching 100%
Repeated WARNING messages in logs
High RTT values and packet processing issues

Environment details
Jitsi Videobridge version: stable-10431 (Docker image jitsi/jvb:stable-10431)
Deployment: Docker container
Participants: 60 endpoints per JVB (configured limit)
Scenario: Participants turning on video simultaneously or in quick succession
Stress level: Intermittently exceeds 1.0 (overloaded: true)
Log evidence: 3,430 WARNING messages, TCC packet processing errors
File location: jvb/src/main/kotlin/org/jitsi/videobridge/cc/allocation/BandwidthAllocator.kt lines 251-272
Method: allocate() in the BandwidthAllocator class
Root cause: When sourceBitrateAllocation.improve() returns 0 because of insufficient bandwidth, the while loop condition oldRemainingBandwidth != remainingBandwidth remains true and the loop iterates indefinitely. This happens intermittently, even with the configured participant limit, and pushes the stress level above 1.0.
