
feat(bench): add VirtualTime-based LEDBAT benchmarks #2605

Merged
iduartgomez merged 26 commits into main from claude/virtualtime-benchmarks-qTWk2 on Jan 7, 2026

Conversation

@iduartgomez
Collaborator

Add deterministic benchmarks using VirtualTime for instant simulation
of network conditions. This enables testing LEDBAT congestion control
behavior without wall-clock delays.

Key changes:

  • Extract LedbatTestHarness into pub mod harness (available with bench feature)
  • Add virtualtime.rs with 7 benchmark groups testing:
    • Slow start convergence across RTT scenarios (LAN to satellite)
    • Long simulation runs (1000-2000 RTTs)
    • Loss recovery behavior (0%, 1%, 5% loss rates)
    • High RTT path behavior (50-500ms RTT)
    • GAIN calculation validation
    • Periodic slowdown (LEDBAT++ fairness feature)
    • Determinism verification

Performance comparison (100 RTTs @ 135ms):

  • Real-time: ~13.5 seconds
  • VirtualTime: ~15 microseconds (~900,000x faster)

This allows comprehensive LEDBAT algorithm testing in CI without the
overhead of real network delays.
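The core idea behind the speedup is that a VirtualTime clock is just a shared counter that the test harness advances explicitly, so simulated delays cost nothing in wall-clock time. A minimal sketch (illustrative names only, not this crate's actual `TimeSource` API):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical minimal VirtualTime: a shared monotonic nanosecond
/// counter that tests advance explicitly instead of sleeping.
#[derive(Clone, Default)]
struct VirtualTime {
    now: Arc<AtomicU64>,
}

impl VirtualTime {
    fn now_nanos(&self) -> u64 {
        self.now.load(Ordering::SeqCst)
    }

    /// Jump forward by `nanos` instantly; no real time passes.
    fn advance(&self, nanos: u64) {
        self.now.fetch_add(nanos, Ordering::SeqCst);
    }
}

fn main() {
    let vt = VirtualTime::default();
    // Simulate 100 RTTs at 135 ms each without sleeping once.
    for _ in 0..100 {
        vt.advance(135_000_000);
    }
    assert_eq!(vt.now_nanos(), 13_500_000_000); // 13.5 simulated seconds
    println!("simulated {} ns instantly", vt.now_nanos());
}
```

This is why the 13.5 s real-time scenario collapses to microseconds: the only work left is the algorithm under test.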

@iduartgomez iduartgomez marked this pull request as draft January 5, 2026 17:41
@iduartgomez iduartgomez closed this Jan 5, 2026
@iduartgomez iduartgomez force-pushed the claude/virtualtime-benchmarks-qTWk2 branch from 36bed29 to 50ab0f4 on January 5, 2026 21:25
@iduartgomez iduartgomez reopened this Jan 5, 2026
@iduartgomez iduartgomez marked this pull request as ready for review January 6, 2026 12:32
@iduartgomez iduartgomez force-pushed the claude/virtualtime-benchmarks-qTWk2 branch from 5b4adbd to e7d6a48 on January 6, 2026 17:44
@freenet freenet deleted a comment from github-actions bot Jan 6, 2026
@iduartgomez iduartgomez enabled auto-merge January 7, 2026 00:05
claude added 17 commits January 7, 2026 07:43

Integrate TimeSource generic throughout the transport layer to enable
deterministic simulation benchmarks with instant execution of high-latency
scenarios.

Changes:
- Add TimeSource generic to InboundConnectionHandler, OutboundConnectionHandler,
  UdpPacketsListener, and ConnectionEvent
- Add MockSocket::with_time_source() for VirtualTime-based packet delays
- Add create_mock_peer_with_virtual_time() for benchmark helpers
- Update benchmark common.rs with VirtualTimeMeasurement for Criterion
- Propagate time_source through all async blocks in connection handling
- Update type aliases (GatewayConnectionFuture, TraverseNatFuture, etc.)
  to include TimeSource generic parameter

The generic impl<TS: TimeSource> PeerPair<TS>::connect() cannot work because
OutboundConnectionHandler::connect() is defined on separate impl blocks for
RealTime and VirtualTime, not generically.

Split into two specialized impl blocks to match the handler's structure.

Fixes Criterion warning: 'Unable to complete 10 samples in 15.0s'
The warm connection benchmark takes ~408ms/iteration, requiring ~25s
for 10 samples plus warmup.

Migrate all transport benchmarks to use VirtualTime for time tracking:
- slow_start.rs: cold_start, warm_connection, cwnd_evolution, rtt_scenarios
- transport_extended.rs: sustained_throughput, packet_loss, large_files
- transport_ci.rs: updated config for VirtualTime
- streaming.rs: stream_throughput, concurrent_streams
- ledbat_validation.rs: cold_start, warm_connection
- blackbox.rs: connection_establishment, message_throughput

Uses iter_custom() to track virtual elapsed time via TimeSource::now_nanos().

Note: This adds VirtualTime time-tracking but actual execution still runs
at real-time speed. For true instant execution, the transport stack's
internal tokio::time::sleep() and timeout() calls would need to be
replaced with VirtualTime-aware versions throughout.
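The iter_custom()-style measurement described above reports the virtual elapsed time (a diff of `now_nanos()` around the workload) rather than wall-clock time. A std-only sketch of the idea, with a plain counter standing in for VirtualTime:

```rust
use std::time::Duration;

/// Sketch of virtual-elapsed measurement: instead of timing with the
/// wall clock, diff the virtual clock around the workload and report
/// that as the sample. The `u64` counter stands in for VirtualTime.
fn measure_virtual(vt: &mut u64, mut workload: impl FnMut(&mut u64)) -> Duration {
    let start = *vt;
    workload(vt); // the workload advances virtual time itself
    Duration::from_nanos(*vt - start)
}

fn main() {
    let mut vt = 0u64;
    // 100 simulated RTTs at 135 ms each:
    let elapsed = measure_virtual(&mut vt, |t| *t += 100 * 135_000_000);
    assert_eq!(elapsed, Duration::from_millis(13_500));
    println!("virtual elapsed: {:?}", elapsed);
}
```

As the note says, this measures virtual time correctly even while execution itself still runs at real-time speed.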
- Send packets before yielding to ensure they're available when
  other tasks run
- Always advance VirtualTime by at least 10ms to ensure protocol
  timeouts fire (needed for select! patterns with VirtualSleep)
- This is a partial fix for VirtualTime benchmarks; full integration
  requires changes to how VirtualTime sleep() and timeout() work
  with tokio's select!

Key changes:
- Add trigger_expired() method to VirtualTime that wakes expired futures
  even when current time equals deadline (fixes edge case where advance_to()
  wouldn't trigger wakeups at exactly the current time)
- Update try_auto_advance() to call trigger_expired() before checking
  for new wakeups to advance to
- Switch VirtualTime benchmarks to single-threaded tokio runtime for
  deterministic task scheduling (multi-threaded runtimes caused race
  conditions between auto-advance and packet processing)
- Restructure cold_start benchmark to use futures::join! for connection
  establishment instead of tokio::spawn (prevents ownership issues)
- Tune auto-advance task with 100µs real-time sleep for balance between
  speed and reliability

These changes fix the VirtualTime benchmark deadlocks where connections
would fail with "max connection attempts reached" or "ConnectionClosed"
errors due to improper time advancement coordination.
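The trigger_expired()/try_auto_advance() interplay can be sketched with a plain min-heap of deadlines (illustrative types only, not the crate's actual futures machinery). The key detail is the `<=` comparison, which fires wakeups even when the deadline equals the current time:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Sketch of the wakeup queue behind trigger_expired(): sleepers
/// register a deadline; expired entries fire even at exact equality.
struct Wakeups {
    now_nanos: u64,
    pending: BinaryHeap<Reverse<u64>>, // min-heap of deadlines
}

impl Wakeups {
    fn sleep_until(&mut self, deadline: u64) {
        self.pending.push(Reverse(deadline));
    }

    /// Fire every wakeup with deadline <= now. Using `<=` (not `<`)
    /// covers the edge case where advance_to() lands exactly on a
    /// deadline, the bug this commit fixes.
    fn trigger_expired(&mut self) -> usize {
        let mut fired = 0;
        while let Some(&Reverse(d)) = self.pending.peek() {
            if d <= self.now_nanos {
                self.pending.pop();
                fired += 1;
            } else {
                break;
            }
        }
        fired
    }

    /// Auto-advance: fire anything already due, then jump straight to
    /// the earliest remaining deadline.
    fn try_auto_advance(&mut self) -> Option<u64> {
        self.trigger_expired();
        let &Reverse(next) = self.pending.peek()?;
        self.now_nanos = next;
        Some(next)
    }
}

fn main() {
    let mut wq = Wakeups { now_nanos: 0, pending: BinaryHeap::new() };
    wq.sleep_until(10);
    wq.sleep_until(10);
    assert_eq!(wq.trigger_expired(), 0); // nothing due at t=0
    wq.now_nanos = 10;
    assert_eq!(wq.trigger_expired(), 2); // fires at exactly t == deadline
    wq.sleep_until(25);
    assert_eq!(wq.try_auto_advance(), Some(25));
    assert_eq!(wq.trigger_expired(), 1);
}
```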
- Add #[cfg(test)] to RealTime impl blocks in ledbat, token_bucket,
  and sent_packet_tracker since they're only used in tests
- Add #[allow(clippy::too_many_arguments)] to config_listener_with_virtual_time
- Replace unwrap_or_else(VirtualTime::new) with unwrap_or_default()
- Remove for loop over single element in cold_start benchmark
- Fix unused imports (VirtualTime, RealTime)
- Add separate "Compile Benchmarks" step to CI workflows for fast-fail
  on compilation errors before running actual benchmarks

- Switch to single-threaded runtime for deterministic scheduling
- Add spawn_auto_advance_task to prevent VirtualTime deadlocks
- Replace tokio::spawn with futures::join! for proper coordination
- Import spawn_auto_advance_task from common module

This fixes the extended benchmarks getting stuck during warmup
due to the same VirtualTime coordination issues fixed in transport_ci.

Update streaming.rs and ledbat_validation.rs with the same VirtualTime
coordination fixes that were applied to other benchmarks:

- Switch from multi-threaded to single-threaded runtime for deterministic
  scheduling with VirtualTime
- Add spawn_auto_advance_task to prevent deadlocks when tasks block
- Replace tokio::spawn with sequential operations (a single-threaded
  runtime cannot run tasks in parallel)

These benchmarks were timing out in CI because they were still using
a multi-threaded runtime, which doesn't work correctly with VirtualTime.

Instead of running `cargo bench` again in the Run step (which triggers
cargo's compilation check), execute the benchmark binary directly after
compilation. This:

1. Avoids confusing "Compiling" messages in the Run step
2. Eliminates any potential for accidental recompilation
3. Slightly faster execution since we skip cargo's dependency check

The compile step now finds and exports the binary path via GITHUB_ENV
for the run step to use.

Two major fixes for extended benchmark failures:

1. **Aggressive auto-advance**: The auto-advance task was sleeping 100µs
   real time between each try_auto_advance() call. With many VirtualSleeps
   in streaming benchmarks, this accumulated to 4-10 seconds of real time.
   Now it advances ALL pending wakeups in a burst, only sleeping when idle.

2. **Reuse connections**: Benchmarks were creating new peer pairs inside
   the iteration loop, causing port exhaustion after ~100 iterations.
   The 65536-byte streaming benchmark crashed with "ConnectionClosed"
   errors from exhausted ports. Now connections are created once per
   benchmark run and reused across iterations.

Also renamed "rate_limited" benchmark to "stream" since there's no
actual rate limiting (delay is Duration::ZERO).
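The "advance ALL pending wakeups in a burst" fix can be sketched as a single drain pass over the pending queue (`deadlines` and the return-count contract are stand-ins for the benchmark helper's actual API):

```rust
/// Sketch of the burst auto-advance loop: drain every pending wakeup
/// in one pass, jumping virtual time forward as needed, and let the
/// caller pay a short real-time sleep only when nothing was pending.
fn drain_wakeups(deadlines: &mut Vec<u64>, now_nanos: &mut u64) -> usize {
    deadlines.sort_unstable();
    let fired = deadlines.len();
    for d in deadlines.drain(..) {
        if d > *now_nanos {
            *now_nanos = d; // advance straight to the wakeup, no real sleep
        }
    }
    fired // 0 => queue idle, caller does a brief real-time sleep
}

fn main() {
    let mut now = 0;
    let mut q = vec![5_000_000, 1_000_000, 2_000_000];
    assert_eq!(drain_wakeups(&mut q, &mut now), 3);
    assert_eq!(now, 5_000_000); // jumped through all wakeups in one burst
    assert_eq!(drain_wakeups(&mut q, &mut now), 0); // idle
}
```

This is why the per-wakeup 100µs real sleep no longer accumulates: the sleep is paid once per idle period instead of once per VirtualSleep.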
…ures)

Changes to improve VirtualTime benchmark reliability:

1. **Bounded auto-advance**: Added try_auto_advance_bounded() that limits
   time advancement to prevent jumping past protocol timeouts. Default
   max step is 1s, benchmarks use 10ms.

2. **Fix auto-advance bug**: try_auto_advance_bounded was returning Some
   even when no advancement was needed (deadline <= current), causing
   the auto-advance loop to spin without sleeping.

3. **Fresh VirtualTime per iteration**: Benchmarks now create fresh
   VirtualTime instances per iter_custom call to prevent time accumulation
   across criterion warmup and sample iterations.

4. **Abort auto-advance tasks**: Benchmarks now call abort() on the
   auto-advance JoinHandle when done.

**Still investigating**: Connections are still closing prematurely with
"ConnectionClosed" errors. The 120s idle timeout appears to be triggering
despite these fixes. Need to investigate:
- How the keep-alive task interacts with VirtualTime
- Whether packet delivery is happening correctly in MockSocket
- Whether the connection's listener task is processing packets
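The bounded-step logic and the spin-bug fix described in points 1 and 2 can be sketched as a pure function (names and signatures are illustrative, not the crate's API):

```rust
/// Sketch of try_auto_advance_bounded(): clamp each advancement to
/// `max_step` nanos so time cannot jump past protocol timeouts, and
/// return None when the earliest deadline is not in the future (the
/// bug was returning Some there, so the loop spun without sleeping).
fn try_auto_advance_bounded(
    now: u64,
    earliest_deadline: Option<u64>,
    max_step: u64,
) -> Option<u64> {
    let deadline = earliest_deadline?;
    if deadline <= now {
        return None; // nothing to advance to; caller should sleep
    }
    Some(now + (deadline - now).min(max_step))
}

fn main() {
    let ms = 1_000_000; // nanos per millisecond
    // A deadline 30 ms away is approached in 10 ms steps, never skipped:
    assert_eq!(try_auto_advance_bounded(0, Some(30 * ms), 10 * ms), Some(10 * ms));
    // Deadline already due: no advancement (the fixed spin bug):
    assert_eq!(try_auto_advance_bounded(50 * ms, Some(50 * ms), 10 * ms), None);
    // No pending wakeups at all:
    assert_eq!(try_auto_advance_bounded(0, None, 10 * ms), None);
}
```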
VirtualTime caused connection timeouts when auto-advance advanced time
faster than packets could be delivered. The timeout mechanism (120s
idle timeout in PeerConnection) uses VirtualTime to check elapsed time,
but packet delivery uses real async channels that don't integrate with
VirtualTime.

Changes:
- streaming.rs: Use RealTime with MockSocket for reliable benchmarks
- ledbat_validation.rs: Use RealTime with MockSocket for reliability
- slow_start.rs: Update to multi-threaded runtime for better concurrency
- transport_extended.rs: Update to multi-threaded runtime

The benchmarks now complete successfully, though with slightly longer
wall-clock time. Some warmup errors are acceptable as they don't affect
the final measurements.

Fixes connection closure errors in CI benchmark runs.

VirtualTime benchmarks were failing because the auto-advance task
advanced time faster than packets could be delivered, triggering the
120-second connection timeout prematurely.

Changes:
- Add connection_idle_timeout() method to TimeSource trait
- RealTime uses default 120s timeout
- VirtualTime uses 1-hour timeout to avoid premature disconnections
- Update peer_connection.rs to use configurable timeout
- Spawn auto-advance AFTER connection is established to avoid
  timeouts during handshake

The benchmarks now complete successfully with VirtualTime.

All three extended benchmarks (sustained throughput, packet loss,
large files) now follow the same pattern as the other VirtualTime
benchmarks:
- Create fresh VirtualTime for each iteration
- Connect peers WITHOUT auto-advance running
- Spawn auto-advance AFTER connection is established
- Abort auto-advance before cleanup

This prevents ConnectionEstablishmentFailure errors caused by
VirtualTime advancing too fast during the handshake.

claude added 9 commits January 7, 2026 07:44

…eout

Now that VirtualTime.connection_idle_timeout() returns 1 hour,
it's safe to run auto-advance during the handshake phase. This
allows VirtualTime to advance properly, making benchmarks run
instantly instead of using wall-clock time.

Without auto-advance during handshake, the connection worked but
took real wall-clock time (~30s per transfer instead of instant).

The auto-advance task now unconditionally advances VirtualTime in
small increments (10ms per 100µs real time = 100x faster). This
ensures VirtualTime-based protocol timers fire even when tasks are
blocked on real async channel operations.

Also increases VirtualTime connection_idle_timeout from 1 hour to
24 hours to accommodate aggressive time advancement.

Results:
- 16KB transfer: 11s → 186ms (60x improvement)
- Larger transfers still slow due to LEDBAT congestion control
  seeing "high delay" from rapid VirtualTime advancement (needs
  further investigation)

Unconditional auto-advance inflated RTT measurements because VirtualTime
advanced during async channel operations. This caused LEDBAT to throttle
heavily for larger transfers (64KB+).

Conditional auto-advance only advances when there are pending wakeups,
which prevents RTT inflation. However, with NoDelay policy, packets
are delivered so fast that retransmit timers are cancelled before
auto-advance runs, causing VirtualTime to not advance at all.

Still investigating the right balance - may need to use a small
simulated delay (e.g., 1ms) to ensure VirtualTime advances meaningfully.

Previously, MockSocket advanced VirtualTime by 10ms on every send_to() call,
even with NoDelay policy. This inflated RTT measurements since
receive_time - send_time included the accumulated VirtualTime advances,
causing LEDBAT to throttle throughput unnecessarily.

New approach:
- Packets carry an `available_at_nanos` timestamp computed at send time
- recv_from() waits via sleep_until() for the packet's delivery time
- VirtualTime auto-advance handles both protocol timers AND packet delivery
- RTT now accurately reflects the simulated network delay from PacketDelayPolicy

This enables proper emulation of different RTT/latency conditions:
- NoDelay: 0ms RTT, immediate delivery
- Fixed(d): d RTT per packet
- Uniform{min,max}: random RTT in range

Implementation:
- Changed Channels type to include delivery timestamp (third tuple element)
- Added compute_delay_and_timestamp() helper for consistent timestamp calculation
- Added send_packet_internal() helper for packet transmission
- Updated recv_from() to wait for delivery time via sleep_until()

All benchmarks pass: 16KB-1MB transfers, RTT scenarios (0-50ms).
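The delivery-timestamp scheme can be sketched as follows; the policy variants mirror the ones listed above (NoDelay, Fixed, Uniform), but the signatures and the deterministic midpoint standing in for random sampling are assumptions for illustration:

```rust
/// Illustrative delay policies matching the commit description.
enum PacketDelayPolicy {
    NoDelay,
    Fixed(u64),                     // delay in nanos
    Uniform { min: u64, max: u64 }, // delay range in nanos
}

/// Compute (delay, available_at_nanos) at send time; recv_from() then
/// waits via sleep_until(available_at_nanos), so the measured RTT
/// reflects the simulated network delay rather than auto-advance noise.
fn compute_delay_and_timestamp(
    policy: &PacketDelayPolicy,
    send_time_nanos: u64,
) -> (u64, u64) {
    let delay = match policy {
        PacketDelayPolicy::NoDelay => 0,
        PacketDelayPolicy::Fixed(d) => *d,
        // A real implementation samples uniformly; the midpoint keeps
        // this sketch deterministic.
        PacketDelayPolicy::Uniform { min, max } => min + (max - min) / 2,
    };
    (delay, send_time_nanos + delay)
}

fn main() {
    let (delay, at) =
        compute_delay_and_timestamp(&PacketDelayPolicy::Fixed(5_000_000), 1_000);
    assert_eq!((delay, at), (5_000_000, 5_001_000));
    let (d0, at0) = compute_delay_and_timestamp(&PacketDelayPolicy::NoDelay, 42);
    assert_eq!((d0, at0), (0, 42)); // NoDelay: immediate delivery
}
```

Stamping the delivery time at send, rather than advancing the clock inside send_to(), is what keeps RTT = receive_time - send_time equal to the policy's delay.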
Previously, slow_start.rs shared a single VirtualTime instance across all
benchmark iterations. This caused VirtualTime to accumulate, leading to:
- Inconsistent RTT measurements as auto-advance ran between send/receive
- ConnectionClosed errors from timing accumulation issues
- Massive performance regression (16KB: 117ms -> 3.2s)

This change aligns slow_start.rs with transport_extended.rs by:
- Creating fresh VirtualTime for each iteration
- Spawning auto-advance task per iteration
- Aborting auto-advance task at end of each iteration

This ensures clean timing state for each benchmark iteration,
preventing cross-iteration interference.

All benchmarks updated:
- bench_cold_start_throughput
- bench_warm_connection_throughput
- bench_cwnd_evolution
- bench_rtt_scenarios
- bench_high_bandwidth_throughput

…e inflation

The key insight is that ledbat_validation.rs benchmarks work correctly
because they establish connection WITHOUT auto-advance running, avoiding
VirtualTime inflation during handshake.

Changes:
- Restructure all slow_start.rs benchmarks to connect peers directly
  (not in spawned tasks) and spawn auto-advance AFTER connection
- Fix transport_extended.rs benchmarks with same pattern
- Remove misleading "safe now due to 1-hour idle timeout" comments

This should fix the 3+ second regression for 16KB transfers.

…fore connection

The previous change caused a +3218% regression on 16KB transfers.
transport_extended.rs was working correctly with auto-advance spawned
before connection - the 1-hour idle timeout prevented premature disconnects.

Keep slow_start.rs changes which use a different pattern.

… progression

Root cause: Conditional auto-advance with 1ms sleep was too slow.
VirtualTime wasn't advancing fast enough, causing protocol timers
to stall and benchmarks to take 5+ seconds instead of ~35ms.

Changes:
- Revert common.rs to unconditional auto-advance with 100µs sleep
- Update slow_start/cold_start to use fresh VirtualTime per iteration
  (matching transport_extended.rs pattern)

Results: 16KB transfers now complete in ~35-38ms instead of 5-6 seconds.

…ocks

Replace tokio::spawn-based sender/receiver pattern with synchronous
futures::join! connection pattern for warm_connection, cwnd_evolution,
and rtt_scenarios benchmarks.

The tokio::spawn pattern was causing race conditions and ConnectionClosed
errors with VirtualTime because spawned tasks weren't synchronized with
the auto-advance task. The new pattern follows cold_start's approach:
1. Connect both peers concurrently using futures::join!
2. Send from one connection
3. Receive on the other connection

This ensures proper synchronization with VirtualTime and eliminates
the hanging/timeout issues seen with the previous implementation.

All benchmarks now complete successfully with expected performance:
- cold_start: ~38ms
- warm_connection: ~54ms (includes 3 warmup transfers)
- cwnd_evolution: ~38ms
- rtt_scenarios: 38-189ms depending on RTT (0-50ms)

@iduartgomez iduartgomez force-pushed the claude/virtualtime-benchmarks-qTWk2 branch from c700e96 to c14b727 on January 7, 2026 07:51
@github-actions
Contributor

github-actions bot commented Jan 7, 2026

⚠️ Performance Benchmark Regressions Detected

Found 2 benchmark(s) with performance regressions:

  • streaming_buffer/latency/first_fragment_full: +69.495%
  • streaming_buffer/latency/first_fragment_1kb: +67.489%

⚠️ Important: This may be a false positive!

Common causes of false positives:

  1. Stale baseline: If recent PRs improved performance on main, this PR (which doesn't include those changes) will show as "regressed" when compared to the new baseline
  2. GitHub runner variance: Benchmarks run on shared ubuntu-latest runners with variable CPU contention
  3. Old baseline: The baseline might be from an older main commit if the cache restore used restore-keys fallback

To verify if this is a real regression:

  1. Check if recent commits on main touched transport or benchmark code
  2. Merge main into your branch and re-run benchmarks
  3. Review the baseline age in the "Download main branch baseline" step

This is informational only and does not block the PR.

View full benchmark results and summary

@iduartgomez iduartgomez added this pull request to the merge queue Jan 7, 2026
Merged via the queue into main with commit e7e497e Jan 7, 2026
11 checks passed
@iduartgomez iduartgomez deleted the claude/virtualtime-benchmarks-qTWk2 branch January 7, 2026 08:08
