Skip to content

Rapid-skip loop after PUT connect state timeout — no recovery without reboot #300

@rkorell

Description

@rkorell

Bug description

After a put connect state request to Spotify's API fails with a timeout (context deadline exceeded), go-librespot loses synchronization with Spotify Connect and enters an unrecoverable rapid-skip loop, cycling through multiple tracks in seconds without playing any audio. The player does not recover on its own — a full device reboot is required.

Environment

  • go-librespot via Volumio Spotify Plugin 4.4.1
  • Raspberry Pi 5, 8GB, NVMe
  • HiFiBerry DAC+ Pro, OGG_VORBIS_320
  • WLAN (stable — plays for hours without issue when the bug doesn't trigger)

Reproduction pattern

Consistent trigger after several pause/resume cycles during long playback sessions (uptime: 8 days, single boot). In version 4.3.4 after ~8–10 cycles, worsened in 4.4.1 to ~2–3 cycles.

Failure sequence

Phase 1 — Normal operation

10:11:39 go-librespot: resume track at 219763ms
10:11:39 go-librespot: put connect state because PLAYER_STATE_CHANGED
10:11:39 go-librespot: emitting websocket event: playing

Phase 2 — Trigger: PUT times out (57s after resume)

10:12:36 level=error msg="failed put state after update" error="context deadline exceeded"
10:12:36 go-librespot: emitting websocket event: not_playing
10:12:36 go-librespot: loading track (paused: false, position: 1ms)

Current track is abandoned immediately. Player jumps to a different track at position 1ms — playlist context appears lost.

Phase 3 — State corruption confirmed

10:13:24 level=debug msg="put state request failed with status 400: Missing payload"

go-librespot no longer has a valid session state.

Phase 4 — Rapid-skip loop (~33 seconds)

10:13:27 loaded "If We Ever Meet Again"     → pause at 196ms → skip
10:13:28 loaded "Calm After The Storm"      → skip
10:13:29 loaded "Calm After The Storm"      → skip  ← duplicate
10:13:31 loaded "Calm After The Storm"      → skip  ← triplicate
10:13:32 loaded "Calm After The Storm"      → skip  ← 4x
10:13:33 loaded "Who Knew"                  → skip
10:13:33 loaded "Lights - Single Version"   → skip
10:13:35 loaded "Lights - Single Version"   → skip  ← duplicate
10:13:40 loaded "Lights - Single Version"   → skip  ← 3x
10:13:41 loaded "Lights - Single Version"   → skip  ← 4x
10:13:43 loaded "Lights - Single Version"   → skip  ← 5x
10:13:44 loaded "Lights - Single Version"   → skip  ← 6x
10:13:50 loaded "Lights - Single Version"   → skip  ← 7x
10:13:53 loaded "SexyBack"                  → skip
10:13:55 loaded "SexyBack"                  → skip
10:13:57 loaded "Mi Verdad"                 → skip
10:13:59 loaded "Stupid Love"               → (loop ends)

Each track triggers: resolved context → loading track → loaded track → put connect state → resolved context for NEXT track. No audio is played.

Phase 5 — Unsolicited volume reset

10:13:59 update volume requested to 14417/65535
10:13:59 update volume requested to 15728/65535
10:14:00 put connect state because VOLUME_CHANGED

Phase 6 — Player stuck

go-librespot process still running, responding to pings. UI responds visually (play/pause symbols toggle) but no audio. Skip/forward unresponsive. Full reboot required.

Pattern across the same boot cycle (8 days uptime)

Mar 24 13:06:18  "did not receive last pong from dealer, 59s passed"
Mar 24 18:26:48  "did not receive last pong ack from accesspoint, 120s passed"
Mar 27 05:01:07  "failed receiving packet" error="connection reset by peer"
Mar 28 01:14:29  "did not receive last pong from dealer, 60s passed"
Mar 28 07:11:59  "did not receive last pong from dealer, 60s passed"
Mar 28 09:36:59  "did not receive last pong from dealer, 60s passed"
Mar 28 10:12:36  "failed put state after update" error="context deadline exceeded"  ← this incident
Mar 28 12:36:29  "did not receive last pong ack from accesspoint, 137s passed"

Root cause analysis

go-librespot has no graceful recovery path when a put connect state request fails:

  1. On timeout: fires not_playing, abandons current track, attempts to load next from now-invalid context
  2. Each load triggers another put connect state → likely fails → resolves next context → load → loop
  3. After the loop: player left in broken idle state, no self-recovery possible

Suggested fix

  • On context deadline exceeded: retry the PUT, do not abandon the current track
  • On HTTP 400 Missing payload: re-authenticate / re-sync session state before continuing
  • Do not enter rapid-load loop when state is desynchronized
  • If state recovery fails: reset player to clean idle state instead of leaving it stuck

Logs

Full Volumio system logs from affected sessions:

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions