
feat: hold-mode PTT with TTS interrupt, tag-based polyglot TTS, and local TTS retry #328

Open
luisdemarchi wants to merge 2 commits into mbailey:master from luisdemarchi:feat/hold-ptt-tts-interrupt-polyglot-tags



@luisdemarchi

Summary

  • Hold-mode PTT: Replaces toggle-mode with hold-to-talk. Press and hold the PTT key to record; release to send. Pressing the key while Claude is speaking immediately interrupts playback and starts recording.
  • Tag-based Polyglot TTS: Replaces heuristic language detection with explicit <en>...</en> tags annotated by Claude. Tags are stripped before synthesis so they are never read aloud. Heuristic mode preserved as opt-in (VOICEMODE_POLYGLOT_HEURISTIC=true).
  • Local TTS retry: Auto-retries up to 3 times with exponential backoff (2s/4s/8s) on connection errors from local providers (e.g. Kokoro restarting after hitting UVICORN_LIMIT_MAX_REQUESTS).

Details

Push-to-Talk: hold mode + interrupt

  • Global pynput listener starts eagerly at server startup (not lazily on first recording) so PTT is ready from the very first response
  • audio_player.py: tracks active player in _current_playback global so PTT can stop it from any thread
  • streaming.py: polls interrupt_streaming event inside the PCM chunk loop and calls stream.abort() to cut off playback mid-stream
  • core.py: treats an interrupted TTS stream as success (avoids triggering buffered fallback)
  • New dep: pynput (+ evdev on Linux)
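
A minimal sketch of the interrupt path described above, assuming a PCM output stream that exposes `write()` and `abort()`. `FakeStream`, `play_chunks`, and `speak` are hypothetical names used only for illustration; only `interrupt_streaming` and `stream.abort()` come from the description, and the real code in `streaming.py` and `core.py` may be structured differently.

```python
import threading


class FakeStream:
    """Hypothetical stand-in for a PCM output stream with write() and abort()."""

    def __init__(self):
        self.written = []
        self.aborted = False

    def write(self, chunk):
        self.written.append(chunk)

    def abort(self):
        self.aborted = True


# Set by the PTT key handler when the key is pressed during playback.
interrupt_streaming = threading.Event()


def play_chunks(stream, chunks, interrupt=interrupt_streaming):
    """Write chunks until exhausted or interrupted; return True if interrupted."""
    for chunk in chunks:
        if interrupt.is_set():
            stream.abort()  # discard pending audio rather than draining it
            return True
        stream.write(chunk)
    return False


def speak(stream, chunks, interrupt=interrupt_streaming):
    """Report an interrupted stream as success so no fallback is attempted."""
    interrupted = play_chunks(stream, chunks, interrupt)
    return {"success": True, "interrupted": interrupted}
```

Checking the event once per chunk keeps interrupt latency bounded by the duration of a single chunk, without needing a separate watcher thread.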

Polyglot TTS: tag-based detection

  • Claude annotates English words/phrases in Portuguese sentences with <en>...</en> tags via updated system prompts in prompts/converse.py and tools/converse.py
  • polyglot_tts.py: new parse_tag_segments() parses tags into (lang, text) pairs; new strip_language_tags() removes tags before non-polyglot synthesis
  • Heuristic word-scoring still available but disabled by default (false positive rate too high for Portuguese)
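
The two helpers could look roughly like the sketch below. The function names come from this PR, but the signatures, the `default_lang` parameter, the regular expressions, and the handling of whitespace-only segments are assumptions made for illustration, not the code in `polyglot_tts.py`.

```python
import re

# Assumed tag set: only <en> and <pt>, the two tags named in this PR.
_TAG_RE = re.compile(r"<(en|pt)>(.*?)</\1>", re.DOTALL)
_ANY_TAG_RE = re.compile(r"</?(?:en|pt)>")


def parse_tag_segments(text, default_lang="pt"):
    """Split text into (lang, text) pairs; untagged text gets default_lang."""
    segments = []
    pos = 0
    for match in _TAG_RE.finditer(text):
        if match.start() > pos:
            segments.append((default_lang, text[pos:match.start()]))
        segments.append((match.group(1), match.group(2)))
        pos = match.end()
    if pos < len(text):
        segments.append((default_lang, text[pos:]))
    # Drop segments that contain only whitespace; there is nothing to synthesize.
    return [(lang, seg) for lang, seg in segments if seg.strip()]


def strip_language_tags(text):
    """Remove <en>/<pt> open and close tags, keeping the enclosed text."""
    return _ANY_TAG_RE.sub("", text)
```

For example, under these assumptions `parse_tag_segments("Vou fazer o <en>deploy</en> agora.")` yields a Portuguese segment, an English segment for `deploy`, and a trailing Portuguese segment, while `strip_language_tags` on the same input returns the sentence with the tags removed.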

Local TTS retry

  • simple_failover.py: detects connection-error strings and retries with asyncio.sleep backoff for local endpoints only; non-connection failures and remote endpoints fail fast as before
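
The retry policy can be sketched as follows. This is an illustration under stated assumptions, not the code in `simple_failover.py`: `tts_with_retry`, `is_local`, `is_connection_error`, the marker strings, and the injectable `sleep` parameter are all hypothetical. Only the limits (3 retries, 2s/4s/8s backoff, local endpoints only) come from the description.

```python
import asyncio

# Assumed markers; the PR says only that connection-error strings are detected.
CONNECTION_ERROR_MARKERS = ("connection refused", "connection error", "connect call failed")


def is_local(base_url):
    """Assumed locality check: loopback host names only."""
    return "127.0.0.1" in base_url or "localhost" in base_url


def is_connection_error(exc):
    message = str(exc).lower()
    return any(marker in message for marker in CONNECTION_ERROR_MARKERS)


async def tts_with_retry(synthesize, base_url, max_retries=3, base_delay=2.0,
                         sleep=asyncio.sleep):
    """Call synthesize(); retry local connection errors with 2s/4s/8s backoff."""
    attempt = 0
    while True:
        try:
            return await synthesize()
        except Exception as exc:
            retryable = is_local(base_url) and is_connection_error(exc)
            if not retryable or attempt >= max_retries:
                raise  # remote endpoint, other failure, or retries exhausted
            await sleep(base_delay * 2 ** attempt)  # 2.0, 4.0, 8.0 seconds
            attempt += 1
```

Passing `sleep` as a parameter is a testing convenience in this sketch: a test can substitute a coroutine that records the delays instead of waiting for them.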

Test plan

  • PTT hold mode: hold key → record → release → sends audio
  • PTT interrupt: press key while Claude is speaking → playback stops immediately, recording starts
  • Polyglot: Portuguese response with <en>word</en> tags uses English voice for tagged segments
  • Tags stripped: non-polyglot TTS does not read <en> or </en> aloud
  • Local TTS retry: simulate Kokoro restart → voicemode retries and recovers

🤖 Generated with Claude Code

luisdemarchi and others added 2 commits March 27, 2026 08:56
…evity rules

- polyglot_tts.py: detect PT/EN language segments and synthesize each with
  the appropriate Kokoro voice, then concatenate audio (VOICEMODE_POLYGLOT=true)
- ptt.py: toggle push-to-talk via configurable key (default F9) using pynput;
  replaces continuous microphone VAD when VOICEMODE_PTT_MODE=toggle
- cli_commands/voice_notify.py: 'voicemode notify pre|post <tool>' command for
  Claude Code PreToolUse/PostToolUse hooks; speaks tool name via TTS when a
  voice session is active (VOICEMODE_TOOL_NOTIFY=true)
- converse tool: integrates PTT via record_audio_smart(), touches voice session
  state file on every call, injects polyglot TTS in text_to_speech_with_failover
- prompts/converse.py + converse docstring: strict brevity rules so Claude keeps
  voice responses to 1-2 sentences and avoids preambles/summaries
- pyproject.toml: adds optional [ptt] extra for pynput dependency

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ocal TTS retry

## Push-to-Talk: hold mode + TTS interrupt
- Rewrote PTT from toggle-mode to hold-mode (hold key to record, release to send)
- Global pynput listener starts eagerly at server startup so PTT is ready on
  the very first response
- Pressing the PTT key during TTS playback immediately aborts the stream and
  starts recording — no need to wait for Claude to finish speaking
- `audio_player.py`: track active player in `_current_playback` global so PTT
  can call `stop()` from any thread
- `streaming.py`: poll `interrupt_streaming` event inside the chunk loop and
  call `stream.abort()` to cut off PCM playback mid-stream
- `core.py`: treat interrupted TTS as success (no buffered fallback)

## Polyglot TTS: tag-based detection
- Primary strategy now uses explicit `<en>...</en>` / `<pt>...</pt>` tags that
  Claude annotates in its responses, replacing unreliable heuristic detection
- Heuristic word-scoring retained but disabled by default; opt-in via
  `VOICEMODE_POLYGLOT_HEURISTIC=true`
- Added `parse_tag_segments()` and `strip_language_tags()` helpers
- Tags are stripped before non-polyglot TTS so they are never read aloud
- `prompts/converse.py` and `tools/converse.py` updated with rules instructing
  Claude to annotate English terms in Portuguese responses

## Local TTS retry
- `simple_failover.py`: retry up to 3 times (2s/4s/8s backoff) on connection
  errors from local providers (e.g. Kokoro restarting after hitting
  UVICORN_LIMIT_MAX_REQUESTS)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
