
feat(mcp): headless sessions with playhead simulation + lag measurement #257

Draft
leszko wants to merge 1 commit into main from rafal/mcp-headless


leszko (Collaborator) commented Jun 12, 2026

Why

Testing realtime behavior through the frontend is slow, and some bugs (e.g. music generation lagging behind the playhead) were only reproducible with a browser attached. Until now the MCP server could only attach to a browser-created session, for two reasons: only the PRIMARY WS transport advances the server playhead (StreamingSession.set_knobs is echo-only for CommandOrigin.EXTERNAL), and knob persistence relies on the browser's useMcpMirror carrying echoes back.

What

demos/realtime_motion_graph_web/headless_client.py (new, torch-free — numpy/zstandard/websockets only, runs inside the MCP stdio process):

  • Full PRIMARY client: init handshake (server-side fixture or PCM upload), binary slice decode kept byte-identical to the server SliceCodec mirror, swap_ready/stem_assets handling.
  • PlayheadSim: simulated audible clock reported via params ticks like the browser; supports seek, pause, and rate skew (reproduces a fast client AudioContext eating the lead).
  • params_echo mirror: control-bus knob changes are merged and carried back on the next PRIMARY tick, so all existing MCP tools work against headless sessions unchanged.
  • LagTracker: per-slice lead_s (where fresh audio landed relative to the playhead; negative = behind the listener) and per-tick staleness_s (age of the audio under the playhead; a value near the buffer duration means generation is a full lap behind), plus a ring recorder of what the simulated listener heard.
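For reviewers who want a mental model of the simulated clock, here is a minimal sketch of the idea behind PlayheadSim. It is not the code in headless_client.py: the class name `PlayheadSimSketch`, its method names, and the injectable `clock` parameter are all hypothetical, and it only assumes a circular buffer of `buffer_s` seconds. The position is derived from wall time and rebased whenever rate or pause state changes, so a rate above 1.0 models a fast client AudioContext.

```python
import time
from typing import Callable


class PlayheadSimSketch:
    """Hypothetical stand-in for PlayheadSim: a clock-driven position in a circular buffer."""

    def __init__(self, buffer_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.buffer_s = buffer_s
        self._clock = clock
        self._anchor_wall = clock()  # wall time at the last rebase
        self._anchor_pos = 0.0       # playhead position (seconds) at the last rebase
        self._rate = 1.0             # 1.0 = realtime; >1.0 = fast client clock
        self._paused = False

    def position(self) -> float:
        """Current playhead in seconds, wrapped into [0, buffer_s)."""
        if self._paused:
            return self._anchor_pos
        elapsed = self._clock() - self._anchor_wall
        return (self._anchor_pos + elapsed * self._rate) % self.buffer_s

    def _rebase(self) -> None:
        # Freeze the current position as the new anchor so that a rate or
        # pause change never causes the playhead to jump.
        self._anchor_pos = self.position()
        self._anchor_wall = self._clock()

    def seek(self, pos_s: float) -> None:
        self._anchor_pos = pos_s % self.buffer_s
        self._anchor_wall = self._clock()

    def set_rate(self, rate: float) -> None:
        self._rebase()
        self._rate = rate

    def set_paused(self, paused: bool) -> None:
        self._rebase()
        self._paused = paused
```

Injecting the clock keeps the sketch deterministic under test; a real client would presumably report `position()` on each params tick.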

New MCP tools (mcp_server.py): headless_start, headless_stop, headless_status, headless_lag_report, headless_observe, headless_seek, headless_set_playback, headless_dump_audio.

Typical lag-bug repro:

headless_start(fixture="low_fi_Gm_loop_60s_gnm.wav")
→ drive knobs / prompts / seeks
→ headless_lag_report(include_timeline=True)
   # lead_s p5 < 0 or stale_ticks > 0 ⇒ generation fell behind
→ headless_dump_audio(source="played")   # hear what the user would have heard
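The pass/fail rule in the comment above can be written out as a small check. This is a sketch only: the function names, the nearest-rank percentile, and the `stale_threshold_s` cutoff for counting a tick as stale are assumptions, not what headless_lag_report actually computes.

```python
import math
from typing import Dict, List, Union


def percentile_nearest_rank(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def summarize_lag(
    lead_s: List[float],
    staleness_s: List[float],
    stale_threshold_s: float,  # assumption: a tick is "stale" above this age
) -> Dict[str, Union[float, int, bool]]:
    lead_p5 = percentile_nearest_rank(lead_s, 5)
    stale_ticks = sum(1 for s in staleness_s if s > stale_threshold_s)
    return {
        "lead_s_p5": lead_p5,
        "stale_ticks": stale_ticks,
        # Same rule as the repro comment: either signal means generation fell behind.
        "fell_behind": lead_p5 < 0 or stale_ticks > 0,
    }
```

Using the 5th percentile rather than the minimum keeps a single late slice in a long run from dominating the verdict, while a sustained dip below zero still trips it.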

Note: starting a headless session preempts any live browser session (one-session-per-pod policy); the tool documents this.

Testing

  • 13 new unit tests (tests/unit/test_headless_client.py): playhead sim (advance/wrap/seek/pause/rate-rebase/swap-reset), circular lead folding, lag detection (healthy stream, stall + behind-playhead writes, window filtering, timeline rollup), and slice codec round-trips against the server-side SliceCodec (anti-ghosting byte-identity invariant).
  • Verified end-to-end against a live TRT backend: headless session created, ~395 slices/25 s decoded, healthy lead ≈ 0.2 s with zero stale ticks, knob set via control bus persisted through the echo mirror, seek produced the expected transient staleness spike, played-audio WAV dump contains real audio, clean teardown.
  • tests/unit/test_ws_preemption.py has 2 failures that pre-exist on main (SDK package move fallout), unrelated to this change.

No wire-contract or knob-registry changes — no TS regen needed.

🤖 Generated with Claude Code

The MCP server could only attach to a browser-created session, so
realtime bugs (e.g. generation lagging behind the playhead) were only
reproducible with the frontend. Add a headless mode: headless_start
spawns a full PRIMARY WebSocket client (new headless_client.py,
torch-free) that does what the browser does — init handshake, binary
slice decode (zstd float16 deltas, kept byte-identical to the server
SliceCodec mirror), a simulated audible playhead reported over the
params channel, and the params_echo mirror so control-bus knob changes
persist. The session registers like any other, so every existing MCP
tool drives it unchanged.
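To illustrate the echo mirror described above, here is a rough sketch of the merge-and-carry-back behavior. The class `EchoMirrorSketch`, the dict message shapes, and the knob names used in the usage note are invented for illustration; the real params and params_echo payloads are not shown in this PR description.

```python
from typing import Any, Dict


class EchoMirrorSketch:
    """Hypothetical model of the params_echo mirror; message shapes are invented."""

    def __init__(self) -> None:
        self._knobs: Dict[str, Any] = {}

    def on_params_echo(self, changed: Dict[str, Any]) -> None:
        # Merge rather than replace, so earlier control-bus changes
        # survive when a later echo touches a different knob.
        self._knobs.update(changed)

    def build_tick(self, playhead_s: float) -> Dict[str, Any]:
        # Each PRIMARY tick carries the playhead plus a copy of the mirrored
        # knob state, which is what makes control-bus changes persist.
        return {"playhead_s": playhead_s, "knobs": dict(self._knobs)}
```

For example, echoes for a hypothetical `tempo` knob and then a hypothetical `density` knob would both appear in the next tick built by `build_tick`, without the second echo erasing the first.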

On top of the transport it measures the two lag signals per event:
lead_s (where each fresh slice landed relative to the playhead;
negative = behind the listener) and staleness_s (age of the audio
under the playhead; ~buffer duration = a full lap behind). New tools:
headless_start/stop/status, headless_lag_report, headless_observe,
headless_seek, headless_set_playback (clock-skew + pause repro), and
headless_dump_audio (the stream the simulated listener heard, stale
audio audible as a user would hear it).
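The two signals can be sketched as follows, assuming a circular buffer of `buffer_s` seconds. Everything here is illustrative rather than the LagTracker implementation: `fold_lead`, `StalenessSketch`, and the fixed `bin_s` resolution are hypothetical names and choices. Folding maps the raw write-minus-playhead offset into a signed half-buffer range so a write just past the wrap point reads as a small positive lead instead of a large negative one.

```python
import math
from typing import List, Optional


def fold_lead(write_pos_s: float, playhead_s: float, buffer_s: float) -> float:
    """Signed circular distance from playhead to write position, in [-buffer_s/2, buffer_s/2)."""
    half = buffer_s / 2.0
    return (write_pos_s - playhead_s + half) % buffer_s - half


class StalenessSketch:
    """Remembers when each region of the buffer was last written (coarse bins)."""

    def __init__(self, buffer_s: float, bin_s: float = 0.5) -> None:
        self.bin_s = bin_s
        self.n_bins = max(1, round(buffer_s / bin_s))
        self.written_at: List[Optional[float]] = [None] * self.n_bins

    def mark_written(self, start_s: float, dur_s: float, now_s: float) -> None:
        # Stamp every bin the slice touches, wrapping around the buffer end.
        first = int(start_s // self.bin_s)
        count = max(1, math.ceil(dur_s / self.bin_s))
        for i in range(count):
            self.written_at[(first + i) % self.n_bins] = now_s

    def staleness_s(self, playhead_s: float, now_s: float) -> Optional[float]:
        # Age of the audio under the playhead; None if never written.
        stamp = self.written_at[int(playhead_s // self.bin_s) % self.n_bins]
        return None if stamp is None else now_s - stamp
```

In this model a healthy stream keeps staleness near the lead, while audio that was last written one full pass earlier shows a staleness close to `buffer_s`, matching the "full lap behind" reading above.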

Verified end-to-end against a live TRT backend: healthy stream shows
lead ~0.2 s and zero stale ticks; a seek produces the expected
transient staleness spike; knobs set through the control bus persist
via the echo mirror.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>