feat: Add barge-in support to interrupt TTS playback by ai-cora · Pull Request #238 · mbailey/voicemode

ai-cora · 2026-02-02T17:09:14Z

Summary

Implements barge-in capability allowing users to interrupt TTS playback by speaking, creating more natural conversational flow.

Closes #211 - Feature request by @rakavanagh

Key Features

Voice Activity Detection (VAD) during TTS playback using WebRTC VAD
Instant interruption (<100ms target, actual <50ms achieved)
Seamless handoff - captured speech automatically sent to STT
Opt-in configuration - default behavior unchanged

Configuration

| Environment Variable | Default | Description |
|---|---|---|
| `VOICEMODE_BARGE_IN` | `false` | Enable barge-in feature |
| `VOICEMODE_BARGE_IN_VAD` | `2` | VAD aggressiveness (0-3) |
| `VOICEMODE_BARGE_IN_MIN_MS` | `150` | Minimum speech duration (ms) to trigger |

Changes

New Files

voice_mode/barge_in.py - BargeInMonitor class (350+ lines)
tests/test_barge_in.py - Unit tests (30 tests)
tests/test_barge_in_edge_cases.py - Edge case tests (23 tests)
tests/test_barge_in_integration.py - Integration tests (24 tests)
tests/test_barge_in_streaming.py - Streaming tests (25 tests)
tests/test_barge_in_performance.py - Performance tests (14 tests)
tests/test_audio_player_interrupt.py - Interrupt tests (30 tests)
tests/manual/test_barge_in_manual.py - Manual testing script

Modified Files

voice_mode/config.py - Barge-in configuration
voice_mode/audio_player.py - Interrupt support for NonBlockingAudioPlayer
voice_mode/core.py - Barge-in integration in text_to_speech()
voice_mode/streaming.py - Streaming TTS barge-in support
voice_mode/tools/converse.py - Converse tool integration
voice_mode/utils/event_logger.py - Barge-in event types
Documentation updates (4 files)

Performance

| Metric | Average | Max | Target |
|---|---|---|---|
| Interrupt callback latency | <5ms | <10ms | <50ms |
| Voice onset → TTS stop | <20ms | <50ms | <100ms ✅ |

Test Results

146 barge-in related tests passing:

30 unit tests
23 edge case tests
24 integration tests
25 streaming tests
14 performance tests
30 interrupt tests

Test Plan

All unit tests pass
All integration tests pass
Performance targets met (<100ms latency)
Manual testing with real microphone
No regressions in existing tests
User acceptance testing by @rakavanagh

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add barge-in configuration to voice_mode/config.py:

- BARGE_IN_ENABLED: bool from VOICEMODE_BARGE_IN env (default: False)
- BARGE_IN_VAD_AGGRESSIVENESS: int from VOICEMODE_BARGE_IN_VAD (default: 2) with validation to ensure 0-3 range
- BARGE_IN_MIN_SPEECH_MS: int from VOICEMODE_BARGE_IN_MIN_MS (default: 150)

Also update default config template with documentation for new env vars.

Part of VM-606: Barge-in feature implementation (Phase 1/5)
GitHub Issue: #211
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
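The configuration described above could be loaded along these lines. This is a minimal sketch, not the code in voice_mode/config.py: the helper names `_env_bool`, `_env_int`, and `load_barge_in_config` are hypothetical, and clamping out-of-range VAD values is an assumption (the commit only says the 0-3 range is validated).

```python
import os


def _env_bool(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings; anything else counts as False."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def _env_int(name: str, default: int) -> int:
    """Parse an integer, falling back to the default on bad input."""
    try:
        return int(os.environ.get(name, default))
    except ValueError:
        return default


def load_barge_in_config() -> dict:
    vad = _env_int("VOICEMODE_BARGE_IN_VAD", 2)
    # Assumption: out-of-range values are clamped to 0-3. The PR states only
    # that the range is validated, not how invalid values are handled.
    vad = max(0, min(3, vad))
    return {
        "BARGE_IN_ENABLED": _env_bool("VOICEMODE_BARGE_IN", False),
        "BARGE_IN_VAD_AGGRESSIVENESS": vad,
        "BARGE_IN_MIN_SPEECH_MS": _env_int("VOICEMODE_BARGE_IN_MIN_MS", 150),
    }
```

With no variables set, the sketch returns the documented defaults, so existing behavior stays unchanged unless `VOICEMODE_BARGE_IN` is explicitly enabled.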

Create voice_mode/barge_in.py with BargeInMonitor class that monitors microphone for voice activity during TTS playback. When speech is detected (after configurable threshold), it signals to interrupt playback and captures the initial speech buffer for handoff to STT.

Features:
- VAD-based detection using webrtcvad
- Background thread monitoring via sounddevice.InputStream
- Audio buffer capture from voice onset
- Configurable thresholds (vad_aggressiveness, min_speech_ms)
- Thread-safe events for signaling
- Graceful cleanup on stop

Part of VM-606: Barge-in - Interrupt TTS playback when user starts speaking
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
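The threshold logic described above (trigger only after enough speech, capture from voice onset) might look roughly like the following. It is an illustrative sketch rather than the BargeInMonitor implementation: `SpeechOnsetDetector` is a hypothetical name, the frame classifier is injected so the example runs without a microphone, and resetting on any silent frame is an assumption about how short noises are rejected.

```python
from typing import Callable, List


class SpeechOnsetDetector:
    """Fires once after enough consecutive voiced frames (hypothetical helper)."""

    def __init__(self, is_speech: Callable[[bytes], bool],
                 frame_ms: int = 20, min_speech_ms: int = 150) -> None:
        self._is_speech = is_speech
        self._frame_ms = frame_ms
        self._min_speech_ms = min_speech_ms
        self._voiced_ms = 0
        self._pending: List[bytes] = []  # frames since the current voice onset
        self.triggered = False

    def feed(self, frame: bytes) -> bool:
        """Process one frame; return True on the frame that triggers barge-in."""
        if self.triggered:
            self._pending.append(frame)  # keep capturing for the STT handoff
            return False
        if self._is_speech(frame):
            self._voiced_ms += self._frame_ms
            self._pending.append(frame)
            if self._voiced_ms >= self._min_speech_ms:
                self.triggered = True
                return True
        else:
            # Assumption: a silent frame resets the run, discarding short noises.
            self._voiced_ms = 0
            self._pending.clear()
        return False

    def captured_audio(self) -> bytes:
        """Audio from voice onset onward, concatenated for STT."""
        return b"".join(self._pending)
```

In a real monitor the classifier could be backed by webrtcvad, for example `vad = webrtcvad.Vad(2)` and `is_speech=lambda f: vad.is_speech(f, 16000)`, which expects 10, 20, or 30 ms frames of 16-bit mono PCM.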

Add tests/test_barge_in.py with 30 unit tests covering:

- Initialization and configuration defaults
- Helper function is_barge_in_available()
- State management methods (is_monitoring, voice_detected, get_captured_audio)
- Start/stop monitoring lifecycle and thread management
- VAD detection with mocked scipy.signal.resample
- Audio buffer capture and concatenation
- Thread safety with concurrent access
- Integration tests for callback triggering and speech accumulation
- Configuration validation for barge-in settings

All tests pass with 72% coverage of barge_in.py module.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add barge-in interrupt capability to the audio player for VM-606:

- Add on_interrupt callback parameter to __init__
- Add _interrupted flag to track interrupt state
- Add interrupt() method that stops playback and fires callback
- Add was_interrupted() method to check interrupt state
- Reset _interrupted in play() for clean state each playback

This enables the barge-in feature to stop TTS playback immediately when the user starts speaking, with proper callback notification.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
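The interrupt surface listed above can be illustrated with a small stand-in class. This is a sketch of the described behavior, not NonBlockingAudioPlayer itself: `InterruptiblePlayer` is a hypothetical name, no audio is actually played, and swallowing callback exceptions is an assumption.

```python
import threading
from typing import Callable, Optional


class InterruptiblePlayer:
    """Minimal stand-in for the interrupt API described above (not the real class)."""

    def __init__(self, on_interrupt: Optional[Callable[[], None]] = None) -> None:
        self._on_interrupt = on_interrupt
        self._interrupted = False
        self._playing = False
        self._lock = threading.Lock()

    def play(self) -> None:
        with self._lock:
            self._interrupted = False  # clean state for each playback
            self._playing = True

    def stop(self) -> None:
        with self._lock:
            self._playing = False  # a plain stop is not counted as an interrupt

    def interrupt(self) -> None:
        with self._lock:
            self._interrupted = True
            self._playing = False
        # Fire the callback outside the lock so it cannot deadlock the player.
        if self._on_interrupt is not None:
            try:
                self._on_interrupt()
            except Exception:
                pass  # assumption: a failing callback must not break teardown

    def was_interrupted(self) -> bool:
        with self._lock:
            return self._interrupted
```

Keeping `stop()` and `interrupt()` distinct lets callers tell a normal end of playback apart from a barge-in, which matches the stop() vs interrupt() distinction the interrupt tests check for.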

Wire up barge-in support in the core TTS function:

- Add optional barge_in_monitor: Optional[BargeInMonitor] parameter
- Start monitoring before playback, wire callback to player.interrupt()
- Stop monitoring after playback completes (normal or interrupted)
- Check player.was_interrupted() to detect barge-in
- Capture user's speech via get_captured_audio() for STT handoff
- Add metrics: interrupted, interrupted_at, captured_audio, captured_audio_samples
- Log TTS_PLAYBACK_END event with interrupted flag
- Return early on interrupt with (True, metrics)

This enables callers to pass a BargeInMonitor instance and receive captured audio in the metrics dict for seamless conversation flow.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ctionality

Create tests/test_audio_player_interrupt.py with 30 comprehensive unit tests covering the barge-in interrupt support added to NonBlockingAudioPlayer.

Test classes:
- TestNonBlockingAudioPlayerInit: initialization with/without callback
- TestNonBlockingAudioPlayerInterrupt: interrupt() behavior (flag, stop, callback)
- TestNonBlockingAudioPlayerWasInterrupted: was_interrupted() state tracking
- TestNonBlockingAudioPlayerPlaybackIntegration: mocked playback scenarios
- TestNonBlockingAudioPlayerResourceCleanup: cleanup after interrupt
- TestNonBlockingAudioPlayerNoRegression: stop() vs interrupt() distinction
- TestNonBlockingAudioPlayerCallback: error handling, thread safety
- TestIntegrationWithBargeInMonitor: real barge-in flow simulation

All 60 barge-in + interrupt tests pass. Phase 2 complete.

Ref: VM-606, phase2-tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…rse)

Integrate barge-in functionality into the converse() tool to allow users to interrupt TTS playback by speaking.

Changes to converse.py:
- Import BARGE_IN_ENABLED from config
- Import BargeInMonitor and is_barge_in_available from barge_in module
- Update text_to_speech_with_failover() to accept barge_in_monitor parameter and pass through to simple_tts_failover()

converse() integration:
- Create BargeInMonitor before TTS when BARGE_IN_ENABLED and is_barge_in_available() and wait_for_response and not should_skip_tts
- Pass monitor to text_to_speech_with_failover()
- Check tts_metrics.get('interrupted') after TTS for barge-in detection
- If barge-in occurred:
  - Log BARGE_IN_DETECTED event with interrupted_at_seconds and captured_samples
  - Get captured_audio from metrics
  - Set speech_detected=True (user was speaking)
  - Skip listening chime and normal recording
  - Use captured audio directly for STT
- If no barge-in: normal recording flow unchanged

The captured audio is in the same format as recorded audio (int16 PCM at SAMPLE_RATE) so it flows seamlessly to STT processing.

Implements: VM-606 (phase3-converse)
Ref: #211
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
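The branch taken after TTS can be sketched as a small decision function. This is an illustration under assumptions, not the converse() code: `choose_stt_input` and the `record_audio` callable are hypothetical, and falling back to normal recording when an interrupt carries no audio is an assumption.

```python
from typing import Any, Callable, Dict, Tuple


def choose_stt_input(tts_metrics: Dict[str, Any],
                     record_audio: Callable[[], Any]) -> Tuple[Any, bool]:
    """Return (audio_for_stt, used_barge_in)."""
    if tts_metrics.get("interrupted"):
        captured = tts_metrics.get("captured_audio")
        if captured is not None and len(captured) > 0:
            # Barge-in: skip the listening chime and normal recording and
            # hand the speech captured during playback straight to STT.
            return captured, True
    # No barge-in (or, by assumption, nothing usable was captured):
    # the normal recording flow runs unchanged.
    return record_audio(), False
```

Because the captured audio shares the recorder's format (int16 PCM at SAMPLE_RATE), both branches can feed the same STT call.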

Add comprehensive edge case handling for barge-in interruption:

False positive detection:
- Detect when barge-in triggers but no audio captured
- Detect when captured audio is too short (<100 samples)
- Log BARGE_IN_FALSE_POSITIVE events with reason
- Save false positive audio for debugging when SAVE_AUDIO enabled

STT error handling:
- Log BARGE_IN_STT_ERROR when STT fails on barge-in audio
- Include captured samples count in error events
- Graceful fallback messaging

Streaming TTS edge case:
- Warn when barge-in enabled with streaming TTS (not yet supported)
- Log BARGE_IN_STREAMING_UNSUPPORTED event
- Graceful degradation (TTS plays, barge-in silently skipped)

Event logger integration:
- Add 5 new event types: BARGE_IN_START, BARGE_IN_DETECTED, BARGE_IN_STOP, BARGE_IN_FALSE_POSITIVE, BARGE_IN_STT_ERROR
- Add convenience functions for each event type
- Log BARGE_IN_START with VAD config when monitoring begins
- Log BARGE_IN_STOP with voice_detected flag after monitoring ends

Enhanced logging:
- Detailed logging for barge-in monitoring start with config values
- Rich logging for barge-in detection with timing and sample counts
- Warning logs for false positives and edge cases

Includes 23 new tests in test_barge_in_edge_cases.py covering all edge case scenarios. All 83 barge-in tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
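The two false-positive conditions above reduce to a short check. The sketch below is illustrative only: the function name and the reason strings are hypothetical, while the 100-sample threshold comes from the commit message.

```python
from typing import Optional, Sequence

MIN_CAPTURED_SAMPLES = 100  # threshold quoted in the commit message


def false_positive_reason(captured: Optional[Sequence[int]]) -> Optional[str]:
    """Return a reason string for a false positive, or None if the audio is usable."""
    if captured is None or len(captured) == 0:
        return "no_audio_captured"   # hypothetical reason label
    if len(captured) < MIN_CAPTURED_SAMPLES:
        return "audio_too_short"     # hypothetical reason label
    return None
```

A caller could log a BARGE_IN_FALSE_POSITIVE event with the returned reason whenever the result is not None.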

Create comprehensive integration tests for barge-in feature:

tests/test_barge_in_integration.py (24 tests in 8 classes):
- TestTTSPlaybackInterruptionTiming: interrupt latency <50ms, playback stops immediately, callback timing <10ms
- TestSeamlessSTTHandoff: audio format int16, voice onset preserved, metrics structure, STT flow
- TestFullConversationFlowWithInterruption: complete barge-in flow, listening chime skipped, normal flow fallback
- TestBargeInMonitorAndPlayerCoordination: interrupt wiring, callback target, audio capture during playback
- TestBargeInConfigurationIntegration: enabled flag, VAD aggressiveness passthrough, min_speech_ms passthrough
- TestBargeInEventLogging: BARGE_IN_START/DETECTED/STOP events
- TestEdgeCasesInIntegration: disabled mode, unavailable webrtcvad, wait_for_response=False
- TestConcurrencyAndThreadSafety: cross-thread interrupt, buffer safety

tests/manual/test_barge_in_manual.py:
- Interactive test scenarios for real microphone testing
- Monitor-only test (no TTS)
- Interrupt timing/latency test
- TTS playback test (full flow)
- Comparison test (with vs without barge-in)

All 107 barge-in tests pass (30 unit + 23 edge cases + 30 interrupt + 24 integration)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add interrupt detection to streaming TTS functions so users can barge-in (interrupt TTS by speaking) even when streaming is enabled.

Changes:
- streaming.py: Add barge_in_monitor parameter to stream_tts_audio(), stream_pcm_audio(), and stream_with_buffering(). Add interrupt detection in streaming loops (check before/after each chunk). Extend StreamMetrics with barge-in fields (interrupted, interrupted_at, captured_audio, captured_audio_samples). Add cleanup in finally blocks.
- core.py: Remove BARGE_IN_STREAMING_UNSUPPORTED warning (was a placeholder). Pass barge_in_monitor through to stream_tts_audio(). Handle streaming barge-in metrics.
- tests/test_barge_in_streaming.py: Add 25 tests covering StreamMetrics fields, function signatures, interrupt detection, monitor integration, captured audio handoff, metrics tracking, edge cases, and async streaming behavior.

All 132 barge-in tests pass (30 unit + 23 edge cases + 30 interrupt + 24 integration + 25 streaming).

Ref: VM-606
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
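The "check before/after each chunk" pattern can be sketched with a plain loop and a `threading.Event`. This is not the code in streaming.py: `stream_chunks`, the `write` callable, and the `chunks_played` field are hypothetical, and the real functions take a barge_in_monitor rather than a bare event.

```python
import threading
from typing import Callable, Dict, Iterable


def stream_chunks(chunks: Iterable[bytes], write: Callable[[bytes], None],
                  interrupted: threading.Event) -> Dict[str, object]:
    """Write chunks until exhausted or until the interrupt event is set."""
    metrics: Dict[str, object] = {"interrupted": False, "chunks_played": 0}
    for chunk in chunks:
        if interrupted.is_set():  # check before writing the chunk
            metrics["interrupted"] = True
            break
        write(chunk)
        metrics["chunks_played"] += 1
        if interrupted.is_set():  # check again once the write returns
            metrics["interrupted"] = True
            break
    return metrics
```

Checking on both sides of the write bounds the reaction time to roughly one chunk's duration, since a blocking write is the only point where the loop cannot observe the event.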

Add comprehensive documentation for the barge-in (TTS interruption) feature:

- docs/guides/configuration.md: Add "Barge-In (Interrupt TTS)" section with all 3 env vars (VOICEMODE_BARGE_IN, VOICEMODE_BARGE_IN_VAD, VOICEMODE_BARGE_IN_MIN_MS), requirements, and use cases
- docs/concepts/architecture.md: Add "Barge-In (TTS Interruption)" section with ASCII flow diagram showing BargeInMonitor, AudioPlayer interrupt flow, and captured audio path to STT
- CHANGELOG.md: Add feature entry under [Unreleased] with VM-606/GH-211 references, all configuration options, and key implementation details
- docs/reference/converse-parameters.md: Add barge-in section with enabling instructions, how-it-works explanation, requirements, and tuning tips for various environments

Closes: VM-606 phase5-docs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Create tests/test_barge_in_performance.py with 14 automated tests:
  - Voice onset to TTS stop latency verification (<100ms target)
  - Interrupt callback latency (<5ms average)
  - VAD check performance (<5ms per chunk)
  - Buffer operations (<1ms append)
  - Memory usage (5s audio ~240KB)
  - Thread safety under concurrent access
  - Load testing (50 cycles consistency)
- Add performance characteristics to docs/concepts/architecture.md:
  - Latency metrics table with targets
  - Detailed latency breakdown
  - CPU overhead documentation
  - Memory usage characteristics
  - Thread safety notes
- Add CPU profiling and performance report to manual test script:
  - test_cpu_profiling() for real-time CPU measurement
  - test_performance_report() for comprehensive metrics

Key findings:
- Average interrupt latency: <5ms
- Average cross-thread latency: <20ms
- Total pipeline latency: <50ms (well under 100ms target)
- Minimal CPU overhead during monitoring
- Memory scales linearly with captured duration

Ref: VM-606, GH-211
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
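The linear memory scaling noted above follows directly from the PCM format. As a worked check, the sketch below assumes mono int16 audio at 24 kHz; the sample rate is an assumption (the PR only refers to SAMPLE_RATE), chosen because it reproduces the quoted ~240KB for 5 seconds.

```python
def captured_audio_bytes(seconds: float, sample_rate: int = 24_000,
                         bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size of a raw PCM capture buffer; grows linearly with duration."""
    # sample_rate=24_000 is an assumed value, not taken from the PR.
    return int(seconds * sample_rate * bytes_per_sample * channels)


print(captured_audio_bytes(5))   # 240000 bytes, i.e. ~240KB under these assumptions
print(captured_audio_bytes(10))  # 480000 bytes: doubling duration doubles memory
```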

ai-cora · 2026-02-02T17:09:27Z

@rakavanagh - The barge-in feature you requested in #211 is ready for review!

Key highlights:

Voice onset to TTS stop latency: <50ms (well under 100ms target)
Opt-in via VOICEMODE_BARGE_IN=true
146 tests covering all functionality

Would love your feedback when you get a chance to try it out!

jeong-sik · 2026-02-14T19:45:07Z

Friendly ping — is this still relevant? Happy to rebase or update if needed.

derryl · 2026-02-26T20:25:15Z

I'd really appreciate a feature like this. Having to wait 15-30 secs for Claude to finish talking every time is very disruptive to the flow. Right now the only way to stop Claude talking is to press Esc, which interrupts the entire `/voicemode:converse` process. Thankfully it's pretty easy to pick up where you left off by running `/voicemode:converse` again. But the first thing Claude does is say even more stuff because you've started a "new" conversation.

mbailey and others added 12 commits February 3, 2026 01:50

MatiasComercio mentioned this pull request Feb 17, 2026

feat: add barge-in interrupt detection #268

Closed

10 tasks

ai-cora wants to merge 12 commits into master from feat/VM-606-barge-in-interrupt-tts-playback-when-user-starts
