fix(bench): improve benchmark stability and output formatting by iduartgomez · Pull Request #2597 · freenet/freenet-core

iduartgomez · 2026-01-04T23:41:08Z

Fixes three issues with benchmark reports:

ANSI escape codes in output - Cargo warnings with color codes were
polluting markdown reports. Fixed by:
- Adding strip_ansi_codes() to parse_bench_output.py
- Using --color=never in both workflow files
- Adding sed fallback stripping in workflow error handling
Benchmark panic during warmup - ConnectionClosed errors in warmup
phase caused benchmark crashes. Fixed by:
- Replacing .unwrap() with graceful error handling in slow_start.rs
- Making ConnectedPeerPair::warmup() return completion count instead
  of panicking on errors
Extended benchmarks not running - transport_extended was missing from
Cargo.toml [[bench]] configuration, causing cargo to run test harness
instead of criterion. Fixed by adding proper bench configuration with
harness = false.

This commit consolidates multiple improvements to the benchmark infrastructure: Benchmark Stability Fixes: - Use small warmup messages (1000 bytes) to avoid cwnd exhaustion - Add proper error handling in all benchmark functions - Increase measurement times to collect sufficient samples - Fix doc comment warnings (/// -> //) on criterion macros Parser Improvements (parse_bench_output.py): - Fix time extraction to properly get middle value with unit - Add benchmark name validation (must contain '/') - Filter out criterion output lines from benchmark name detection - Add severe regression detection (>25% on throughput benchmarks) - Exit code 2 for severe regressions, 1 for regular regressions CI Workflow Updates: - Add "Fail on Severe Regressions" step for nightly/main builds - Parse exit codes to distinguish severity levels - PRs get informational comments only, not failures New Benchmarks: - Add RTT scenario benchmarks (0ms, 5ms, 10ms, 25ms, 50ms delays) Removed: - transport_direct.rs (redundant after criterion fixes)

Document known gaps in transport benchmark coverage that could allow regressions to go undetected: Critical gaps: - Memory usage tracking (none) - Bidirectional simultaneous transfer (none) - Connection churn / many short connections (none) - High RTT paths 100ms+ (partial - only up to 50ms) Medium priority gaps: - Timeout/retransmission behavior (indirect only) - Multiple concurrent connections from different peers - Backpressure / flow control behavior Also documents potential for virtual time benchmarks using Criterion's custom Measurement trait, which could make high-RTT and timeout testing 10-100x faster.

iduartgomez marked this pull request as draft January 4, 2026 23:41

iduartgomez force-pushed the claude/fix-benchmark-stability-faYpJ branch from 403e0ce to 337a18a Compare January 5, 2026 16:32

iduartgomez marked this pull request as ready for review January 5, 2026 16:54

iduartgomez merged commit 7a4bb93 into main Jan 5, 2026
8 of 9 checks passed

iduartgomez deleted the claude/fix-benchmark-stability-faYpJ branch January 5, 2026 16:55

sanity mentioned this pull request Jan 5, 2026

build: release 0.1.83 #2614

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(bench): improve benchmark stability and output formatting#2597

fix(bench): improve benchmark stability and output formatting#2597
iduartgomez merged 2 commits intomainfrom
claude/fix-benchmark-stability-faYpJ

iduartgomez commented Jan 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

iduartgomez commented Jan 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants