
fix: prevent streaming transfers from hanging on degraded connections#3609

Open
sanity wants to merge 4 commits into main from fix-3608
Conversation


@sanity sanity commented Mar 20, 2026

Problem

GET requests for large contracts (~1MB, e.g. the River UI container) consistently fail because streaming transfers die mid-transfer. The responding peer starts sending fragments but they stop arriving, causing stream assembly timeouts (30s).

User impact: River UI cannot load. Users see "GET request timed out after 30s". Affects users with otherwise healthy nodes (16+ ring peers). Multiple diagnostic reports confirm the pattern (E7NJG5, 23TN9A).

Root cause: The cwnd_wait loop in send_stream and pipe_stream has no timeout. When the FixedRateController's loss_pause activates (on packet loss/RTO), it caps cwnd at flightsize, blocking all new data until an ACK clears it. This is correct for transient loss (ACKs arrive in 1-2s), but when the outbound connection degrades (consecutive retransmission failures with exponential RTO backoff: 1s→2s→4s→8s→16s = 31s), the sender stalls for >30s and the receiver's STREAM_INACTIVITY_TIMEOUT fires with no diagnostic from the sender.

Large transfers (737 fragments for ~1MB) are disproportionately affected: more fragments mean more opportunities for loss events, and the longer total transfer time gives degradation more chances to strike mid-stream.
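The stall arithmetic above can be checked directly: with an initial RTO of 1s doubling on each consecutive retransmission failure, five failed retransmissions alone keep the sender blocked longer than the receiver's 30s inactivity window. A minimal sketch (illustrative only, not the crate's retransmission code):

```rust
fn main() {
    // Exponential RTO backoff: 1s, 2s, 4s, 8s, 16s across five
    // consecutive retransmission failures.
    let initial_rto_secs: u64 = 1;
    let total_stall: u64 = (0..5).map(|i| initial_rto_secs << i).sum();
    assert_eq!(total_stall, 31); // 1 + 2 + 4 + 8 + 16
    // Exceeds the receiver's 30s STREAM_INACTIVITY_TIMEOUT, so the
    // receiver times out before the sender ever reports a problem.
    assert!(total_stall > 30);
    println!("stall across 5 consecutive RTO backoffs: {total_stall}s");
}
```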

Approach

Add CWND_WAIT_TIMEOUT (20s) to both send_stream and pipe_stream, set below the 30s STREAM_INACTIVITY_TIMEOUT so the sender fails first with a clear diagnostic ("cwnd wait timed out — outbound connection likely dead") rather than the receiver timing out silently.

The fix preserves the existing loss_pause behavior (complete block until ACK clears it) for transient loss recovery, while adding a safety net for dead connections.
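A minimal sketch of the guarded wait, under assumptions: the function name `wait_for_cwnd` and the `can_send` closure are illustrative stand-ins for the real loop in send_stream, which checks `flightsize + packet_size <= cwnd` against the rate controller and uses async sleeps rather than a thread sleep:

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum SendError {
    ConnectionClosed,
}

// Wait for the congestion window to admit the next packet, but give up
// after `timeout` so a dead outbound connection fails on the sender side
// with a diagnostic instead of stalling until the receiver times out.
fn wait_for_cwnd(
    mut can_send: impl FnMut() -> bool,
    timeout: Duration,
) -> Result<(), SendError> {
    let start = Instant::now();
    while !can_send() {
        // One clock snapshot serves both the timeout check and the
        // elapsed-time diagnostic (avoids two reads per iteration).
        let now = Instant::now();
        if now.duration_since(start) >= timeout {
            eprintln!("cwnd wait timed out — outbound connection likely dead");
            return Err(SendError::ConnectionClosed);
        }
        std::thread::sleep(Duration::from_millis(1));
    }
    Ok(())
}

fn main() {
    // Window already open: returns immediately.
    assert_eq!(wait_for_cwnd(|| true, Duration::from_secs(20)), Ok(()));
    // Window never opens (dead-connection stand-in): fails at the deadline.
    assert_eq!(
        wait_for_cwnd(|| false, Duration::from_millis(10)),
        Err(SendError::ConnectionClosed)
    );
}
```

Because the timeout is strictly below the receiver's 30s window, the sender's error always lands first, which is the whole point of the 20s choice.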

Testing

  • All 2068 existing tests pass (0 failures)
  • Added test_send_stream_cwnd_wait_timeout: verifies send_stream returns ConnectionClosed when cwnd wait exceeds timeout, and that completion_tx fires on timeout
  • Fixed telemetry bytes_sent overestimate (now capped at bytes_to_send)

Closes #3608

[AI-assisted - Claude]

The cwnd_wait loop in send_stream and pipe_stream had no timeout,
causing indefinite hangs when the FixedRateController's loss_pause was
active and retransmission ACKs never arrived (dead/degraded outbound
connection). For large transfers (737 fragments for ~1MB), this
manifested as "stream assembly timed out after 30s" on the receiver
with no diagnostic from the sender.

Two fixes:
1. Add CWND_WAIT_TIMEOUT (20s) to both send_stream and pipe_stream.
   The sender now fails with a clear diagnostic before the receiver's
   30s inactivity timeout fires.

2. Fix FixedRateController::current_cwnd() during loss_pause: both
   flightsize() and current_cwnd() read the same AtomicUsize, making
   the check `flightsize + packet_size <= cwnd` permanently false
   (X + positive <= X). Adding a one-packet margin allows flow restart
   when ACKs reduce flightsize, without waiting for loss_pause to
   fully clear.
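The "permanently false" claim in fix 2 is easy to verify numerically. A sketch with assumed figures (the 1422-byte fragment size comes from this PR's discussion; note a later commit in this PR reverts the margin because it neutralized loss_pause entirely):

```rust
fn main() {
    // During loss_pause, cwnd is capped at the current flightsize, so the
    // admission check `flightsize + packet_size <= cwnd` reduces to
    // `X + positive <= X`, which can never hold.
    let flightsize: usize = 44_000;
    let cwnd = flightsize; // loss_pause cap
    let packet_size: usize = 1422; // typical fragment size per the PR
    assert!(flightsize + packet_size > cwnd); // check permanently fails
    // With a one-packet margin, the check can pass again once ACKs
    // shrink flightsize below cwnd + margin.
    let margin: usize = 1500;
    assert!(flightsize + packet_size <= cwnd + margin);
}
```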

Closes #3608

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Mar 20, 2026

🔍 Rule Review: cwnd timeout for degraded connections

Rules checked: transport.md, code-style.md, testing.md, git-workflow.md
Files reviewed: 2 (fixed_rate/controller.rs, peer_connection/outbound_stream.rs)

Critical

None.

Warnings

None.

Info

  • outbound_stream.rs:42–46 — CWND_WAIT_TIMEOUT is hardcoded at 20s and documented as being "below the receiver's STREAM_INACTIVITY_TIMEOUT (30s)." If STREAM_INACTIVITY_TIMEOUT is itself a named constant, the code-style rule recommends deriving from it rather than keeping the relationship in a comment alone. Not a blocker, but fragile if the receiver timeout ever changes. (rule: code-style.md — numeric thresholds)

  • outbound_stream.rs:1101 (test) — The test uses RealTime::new() inside crates/core/ with #[tokio::test(start_paused = true)]. The comment explains that VirtualTime requires manual advance() calls that don't interleave with the cwnd loop's sleep iterations. This is a reasonable exception, but it ties test correctness to RealTime being backed by tokio::time (not std::time::Instant). Worth confirming RealTime::now() delegates to tokio::time::Instant::now() so the paused-clock auto-advance actually works — if it uses std::time, the 20s timeout would never fire and the test would hang. (rule: testing.md — TimeSource / determinism)
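The first Info item's suggestion could look like the following sketch. This is hypothetical: the actual definition site of STREAM_INACTIVITY_TIMEOUT is not shown in this PR, and the 10s headroom is an assumed choice that reproduces the current 20s value:

```rust
use std::time::Duration;

// Receiver-side inactivity timeout (assumed to be a named constant).
const STREAM_INACTIVITY_TIMEOUT: Duration = Duration::from_secs(30);

// Derive the sender-side timeout from it, so the "sender fails first"
// invariant holds by construction instead of by comment.
const CWND_WAIT_TIMEOUT: Duration =
    Duration::from_secs(STREAM_INACTIVITY_TIMEOUT.as_secs() - 10);

fn main() {
    assert!(CWND_WAIT_TIMEOUT < STREAM_INACTIVITY_TIMEOUT);
    assert_eq!(CWND_WAIT_TIMEOUT.as_secs(), 20);
}
```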


No Critical or Warning findings — merge is not blocked by this check.


Automated review against .claude/rules/. Critical and Warning findings block merge — check boxes or post /ack to acknowledge.

The LOSS_PAUSE_CWND_MARGIN constant doc and CWND_WAIT_TIMEOUT doc
already explain the rationale; inline comments were restating the
same information. Replaced with brief references to the constants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

⚠️ Extended Benchmark Regressions Detected

Some extended benchmarks show performance regressions.

Note: Extended benchmarks include high-latency and packet-loss scenarios
which have higher variance than standard CI benchmarks.

View full results

View full benchmark summary

sanity and others added 2 commits March 20, 2026 16:32
The margin (1500 bytes) effectively disabled loss_pause entirely —
since each fragment is ~1422 bytes, the margin always allowed one
fragment through, and since cwnd dynamically tracks flightsize,
every subsequent fragment also passed. Loss_pause became a no-op.

The CWND_WAIT_TIMEOUT (20s) alone is the correct fix:
- Normal loss: ACKs clear loss_pause in 1-2s, sends resume
- Dead connection: timeout fires at 20s, clean error

Also adds a regression test that verifies send_stream returns
ConnectionClosed and fires completion_tx when cwnd wait exceeds
the timeout, and fixes a telemetry overestimate (bytes_sent was
not capped at bytes_to_send).
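The commit message's "no-op" claim follows from the cap tracking flightsize. A small sketch (figures assumed from this PR's discussion): because each ~1422-byte fragment fits inside the 1500-byte margin, and the capped cwnd moves with flightsize after every send, every fragment of a 737-fragment transfer is admitted:

```rust
fn main() {
    let fragment: usize = 1422; // fragment size per the PR
    let margin: usize = 1500;   // the reverted one-packet margin
    let mut flightsize: usize = 10_000;
    let mut admitted = 0;
    for _ in 0..737 {
        let cwnd = flightsize; // loss_pause cap tracks flightsize
        if flightsize + fragment <= cwnd + margin {
            admitted += 1;
            flightsize += fragment;
        }
    }
    // Every fragment passes: loss_pause never actually paused anything.
    assert_eq!(admitted, 737);
}
```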

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Avoid two consecutive clock reads per timeout check — use one
snapshot for both the timeout condition and elapsed calculation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

⚠️ Extended Benchmark Regressions Detected

Some extended benchmarks show performance regressions.

Note: Extended benchmarks include high-latency and packet-loss scenarios
which have higher variance than standard CI benchmarks.

View full results

View full benchmark summary

@sanity sanity added this pull request to the merge queue Mar 20, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 20, 2026

@iduartgomez iduartgomez left a comment


address #3608 (comment)



Development

Successfully merging this pull request may close these issues.

fix: streaming transfers for large contracts (~1MB) fail mid-transfer

2 participants