feat: Add barge-in support to interrupt TTS playback #238
Open
Add barge-in configuration to voice_mode/config.py:
- BARGE_IN_ENABLED: bool from VOICEMODE_BARGE_IN env (default: False)
- BARGE_IN_VAD_AGGRESSIVENESS: int from VOICEMODE_BARGE_IN_VAD (default: 2) with validation to ensure 0-3 range
- BARGE_IN_MIN_SPEECH_MS: int from VOICEMODE_BARGE_IN_MIN_MS (default: 150)
Also update default config template with documentation for new env vars.
Part of VM-606: Barge-in feature implementation (Phase 1/5)
GitHub Issue: #211
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
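A minimal sketch of how the three settings above could be read from the environment. The helper names (`_env_bool`, `_env_int`, `load_barge_in_config`) are hypothetical, and falling back to the default on an out-of-range VAD value is an assumption: the commit message only says the value is validated to the 0-3 range, not what happens when validation fails.

```python
import os


def _env_bool(name: str, default: bool = False) -> bool:
    """Treat common truthy strings as True; any other set value is False."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def _env_int(name: str, default: int) -> int:
    """Parse an integer, falling back to the default on malformed input."""
    try:
        return int(os.environ.get(name, default))
    except ValueError:
        return default


def load_barge_in_config() -> dict:
    """Hypothetical loader mirroring the three documented variables."""
    vad = _env_int("VOICEMODE_BARGE_IN_VAD", 2)
    if not 0 <= vad <= 3:
        vad = 2  # assumption: out-of-range values revert to the default
    return {
        "enabled": _env_bool("VOICEMODE_BARGE_IN", False),
        "vad_aggressiveness": vad,
        "min_speech_ms": _env_int("VOICEMODE_BARGE_IN_MIN_MS", 150),
    }
```

Clamping to the nearest valid value would be an equally reasonable choice; the sketch picks the default only because it is the simplest behaviour to reason about.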
Create voice_mode/barge_in.py with BargeInMonitor class that monitors microphone for voice activity during TTS playback. When speech is detected (after configurable threshold), it signals to interrupt playback and captures the initial speech buffer for handoff to STT.
Features:
- VAD-based detection using webrtcvad
- Background thread monitoring via sounddevice.InputStream
- Audio buffer capture from voice onset
- Configurable thresholds (vad_aggressiveness, min_speech_ms)
- Thread-safe events for signaling
- Graceful cleanup on stop
Part of VM-606: Barge-in - Interrupt TTS playback when user starts speaking
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
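The thresholding idea described above (trigger only after enough speech has accumulated, and keep the audio from voice onset) can be sketched without any audio hardware. This is not the BargeInMonitor implementation: `SpeechOnsetDetector` is a hypothetical class, the VAD is injected as a plain callable instead of webrtcvad, and resetting the counter on a silent frame is an assumption about how the threshold is applied.

```python
from typing import Callable, List


class SpeechOnsetDetector:
    """Trigger once consecutive speech frames reach a minimum duration."""

    def __init__(self, is_speech: Callable[[bytes], bool],
                 frame_ms: int = 20, min_speech_ms: int = 150) -> None:
        self._is_speech = is_speech          # stand-in for a real VAD call
        self._frame_ms = frame_ms
        self._min_speech_ms = min_speech_ms
        self._speech_ms = 0
        self._buffer: List[bytes] = []       # audio kept from voice onset
        self.triggered = False

    def feed(self, frame: bytes) -> bool:
        """Process one frame; return True once barge-in has triggered."""
        if self.triggered:
            self._buffer.append(frame)       # keep capturing after trigger
            return True
        if self._is_speech(frame):
            self._speech_ms += self._frame_ms
            self._buffer.append(frame)
            if self._speech_ms >= self._min_speech_ms:
                self.triggered = True
        else:
            # Assumption: a silent frame discards the partial run.
            self._speech_ms = 0
            self._buffer.clear()
        return self.triggered

    def captured_audio(self) -> bytes:
        """Concatenate the frames buffered since voice onset."""
        return b"".join(self._buffer)
```

With 20 ms frames and a 150 ms threshold, the detector fires on the eighth consecutive speech frame (160 ms), which is why short noises such as a cough or a key click would not interrupt playback under these assumptions.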
Add tests/test_barge_in.py with 30 unit tests covering:
- Initialization and configuration defaults
- Helper function is_barge_in_available()
- State management methods (is_monitoring, voice_detected, get_captured_audio)
- Start/stop monitoring lifecycle and thread management
- VAD detection with mocked scipy.signal.resample
- Audio buffer capture and concatenation
- Thread safety with concurrent access
- Integration tests for callback triggering and speech accumulation
- Configuration validation for barge-in settings
All tests pass with 72% coverage of barge_in.py module.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add barge-in interrupt capability to the audio player for VM-606:
- Add on_interrupt callback parameter to __init__
- Add _interrupted flag to track interrupt state
- Add interrupt() method that stops playback and fires callback
- Add was_interrupted() method to check interrupt state
- Reset _interrupted in play() for clean state each playback
This enables the barge-in feature to stop TTS playback immediately when the user starts speaking, with proper callback notification.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
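The interrupt contract listed above can be illustrated with a stripped-down stand-in. `InterruptiblePlayer` is a hypothetical class, not NonBlockingAudioPlayer: it performs no audio output and only models the state changes (`play()` resets the flag, `interrupt()` sets it and fires the callback, `stop()` leaves it untouched).

```python
import threading
from typing import Callable, Optional


class InterruptiblePlayer:
    """Models the interrupt state machine only; no real playback."""

    def __init__(self, on_interrupt: Optional[Callable[[], None]] = None) -> None:
        self._on_interrupt = on_interrupt
        self._interrupted = False
        self._playing = False
        self._lock = threading.Lock()  # interrupt() may come from another thread

    def play(self) -> None:
        with self._lock:
            self._interrupted = False  # clean state for each playback
            self._playing = True

    def stop(self) -> None:
        """Normal stop: does not count as an interrupt."""
        with self._lock:
            self._playing = False

    def interrupt(self) -> None:
        """Stop playback, mark it as interrupted, then notify the caller."""
        with self._lock:
            self._interrupted = True
            self._playing = False
        # Fire the callback outside the lock so it can call back into the player.
        if self._on_interrupt is not None:
            self._on_interrupt()

    def was_interrupted(self) -> bool:
        with self._lock:
            return self._interrupted
```

Keeping `stop()` and `interrupt()` distinct lets a caller tell "playback finished or was cancelled" apart from "the user spoke over it", which is the distinction the later TTS wiring relies on.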
Wire up barge-in support in the core TTS function:
- Add optional barge_in_monitor: Optional[BargeInMonitor] parameter
- Start monitoring before playback, wire callback to player.interrupt()
- Stop monitoring after playback completes (normal or interrupted)
- Check player.was_interrupted() to detect barge-in
- Capture user's speech via get_captured_audio() for STT handoff
- Add metrics: interrupted, interrupted_at, captured_audio, captured_audio_samples
- Log TTS_PLAYBACK_END event with interrupted flag
- Return early on interrupt with (True, metrics)
This enables callers to pass a BargeInMonitor instance and receive captured audio in the metrics dict for seamless conversation flow.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ctionality
Create tests/test_audio_player_interrupt.py with 30 comprehensive unit tests covering the barge-in interrupt support added to NonBlockingAudioPlayer.
Test classes:
- TestNonBlockingAudioPlayerInit: initialization with/without callback
- TestNonBlockingAudioPlayerInterrupt: interrupt() behavior (flag, stop, callback)
- TestNonBlockingAudioPlayerWasInterrupted: was_interrupted() state tracking
- TestNonBlockingAudioPlayerPlaybackIntegration: mocked playback scenarios
- TestNonBlockingAudioPlayerResourceCleanup: cleanup after interrupt
- TestNonBlockingAudioPlayerNoRegression: stop() vs interrupt() distinction
- TestNonBlockingAudioPlayerCallback: error handling, thread safety
- TestIntegrationWithBargeInMonitor: real barge-in flow simulation
All 60 barge-in + interrupt tests pass. Phase 2 complete.
Ref: VM-606, phase2-tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…rse)
Integrate barge-in functionality into the converse() tool to allow users
to interrupt TTS playback by speaking.
Changes to converse.py:
- Import BARGE_IN_ENABLED from config
- Import BargeInMonitor and is_barge_in_available from barge_in module
- Update text_to_speech_with_failover() to accept barge_in_monitor
parameter and pass through to simple_tts_failover()
converse() integration:
- Create BargeInMonitor before TTS when BARGE_IN_ENABLED and
is_barge_in_available() and wait_for_response and not should_skip_tts
- Pass monitor to text_to_speech_with_failover()
- Check tts_metrics.get('interrupted') after TTS for barge-in detection
- If barge-in occurred:
  - Log BARGE_IN_DETECTED event with interrupted_at_seconds and
    captured_samples
  - Get captured_audio from metrics
  - Set speech_detected=True (user was speaking)
  - Skip listening chime and normal recording
  - Use captured audio directly for STT
- If no barge-in: normal recording flow unchanged
The captured audio is in the same format as recorded audio (int16 PCM at
SAMPLE_RATE) so it flows seamlessly to STT processing.
Implements: VM-606 (phase3-converse)
Ref: #211
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
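The branch described in this commit (use captured audio after a barge-in, otherwise record as usual) could look roughly like the following. `select_audio_for_stt` and the `record` callable are hypothetical names for illustration; only the `interrupted` and `captured_audio` metric keys come from the commit messages.

```python
from typing import Callable, Dict, Sequence, Tuple


def select_audio_for_stt(
    tts_metrics: Dict,
    record: Callable[[], Sequence[int]],
) -> Tuple[Sequence[int], bool]:
    """Return (samples, was_barge_in).

    When TTS was interrupted and audio was captured, that audio goes
    straight to STT and the normal recording step (chime included) is
    skipped. Otherwise the usual recording path runs unchanged.
    """
    if tts_metrics.get("interrupted"):
        captured = tts_metrics.get("captured_audio")
        if captured is not None and len(captured) > 0:
            return captured, True
    # No barge-in, or nothing usable was captured: record normally.
    return record(), False
```

Because the captured audio is described as int16 PCM at SAMPLE_RATE, the same as recorded audio, the caller does not need a separate STT path for the barge-in case.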
Add comprehensive edge case handling for barge-in interruption:
False positive detection:
- Detect when barge-in triggers but no audio captured
- Detect when captured audio is too short (<100 samples)
- Log BARGE_IN_FALSE_POSITIVE events with reason
- Save false positive audio for debugging when SAVE_AUDIO enabled
STT error handling:
- Log BARGE_IN_STT_ERROR when STT fails on barge-in audio
- Include captured samples count in error events
- Graceful fallback messaging
Streaming TTS edge case:
- Warn when barge-in enabled with streaming TTS (not yet supported)
- Log BARGE_IN_STREAMING_UNSUPPORTED event
- Graceful degradation (TTS plays, barge-in silently skipped)
Event logger integration:
- Add 5 new event types: BARGE_IN_START, BARGE_IN_DETECTED, BARGE_IN_STOP, BARGE_IN_FALSE_POSITIVE, BARGE_IN_STT_ERROR
- Add convenience functions for each event type
- Log BARGE_IN_START with VAD config when monitoring begins
- Log BARGE_IN_STOP with voice_detected flag after monitoring ends
Enhanced logging:
- Detailed logging for barge-in monitoring start with config values
- Rich logging for barge-in detection with timing and sample counts
- Warning logs for false positives and edge cases
Includes 23 new tests in test_barge_in_edge_cases.py covering all edge case scenarios.
All 83 barge-in tests pass.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
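The two false-positive conditions named above (nothing captured, or fewer than 100 samples) reduce to a small classification step. The function name and the reason strings below are hypothetical; only the 100-sample threshold is taken from the commit message.

```python
from typing import Optional, Sequence

# Threshold from the commit message: captures under 100 samples are too short.
MIN_CAPTURED_SAMPLES = 100


def barge_in_false_positive_reason(captured: Optional[Sequence[int]]) -> Optional[str]:
    """Return None for a usable capture, else a reason suitable for logging."""
    if captured is None or len(captured) == 0:
        return "no_audio_captured"   # hypothetical reason label
    if len(captured) < MIN_CAPTURED_SAMPLES:
        return "audio_too_short"     # hypothetical reason label
    return None
```

A caller could log a BARGE_IN_FALSE_POSITIVE event with the returned reason and then fall back to the normal recording flow.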
Create comprehensive integration tests for barge-in feature:
tests/test_barge_in_integration.py (24 tests in 8 classes):
- TestTTSPlaybackInterruptionTiming: interrupt latency <50ms, playback stops immediately, callback timing <10ms
- TestSeamlessSTTHandoff: audio format int16, voice onset preserved, metrics structure, STT flow
- TestFullConversationFlowWithInterruption: complete barge-in flow, listening chime skipped, normal flow fallback
- TestBargeInMonitorAndPlayerCoordination: interrupt wiring, callback target, audio capture during playback
- TestBargeInConfigurationIntegration: enabled flag, VAD aggressiveness passthrough, min_speech_ms passthrough
- TestBargeInEventLogging: BARGE_IN_START/DETECTED/STOP events
- TestEdgeCasesInIntegration: disabled mode, unavailable webrtcvad, wait_for_response=False
- TestConcurrencyAndThreadSafety: cross-thread interrupt, buffer safety
tests/manual/test_barge_in_manual.py:
- Interactive test scenarios for real microphone testing
- Monitor-only test (no TTS)
- Interrupt timing/latency test
- TTS playback test (full flow)
- Comparison test (with vs without barge-in)
All 107 barge-in tests pass (30 unit + 23 edge cases + 30 interrupt + 24 integration)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add interrupt detection to streaming TTS functions so users can barge-in (interrupt TTS by speaking) even when streaming is enabled.
Changes:
- streaming.py: Add barge_in_monitor parameter to stream_tts_audio(), stream_pcm_audio(), and stream_with_buffering(). Add interrupt detection in streaming loops (check before/after each chunk). Extend StreamMetrics with barge-in fields (interrupted, interrupted_at, captured_audio, captured_audio_samples). Add cleanup in finally blocks.
- core.py: Remove BARGE_IN_STREAMING_UNSUPPORTED warning (was a placeholder). Pass barge_in_monitor through to stream_tts_audio(). Handle streaming barge-in metrics.
- tests/test_barge_in_streaming.py: Add 25 tests covering StreamMetrics fields, function signatures, interrupt detection, monitor integration, captured audio handoff, metrics tracking, edge cases, and async streaming behavior.
All 132 barge-in tests pass (30 unit + 23 edge cases + 30 interrupt + 24 integration + 25 streaming).
Ref: VM-606
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
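The "check before/after each chunk" pattern can be sketched as a plain loop. This is a synchronous illustration, not the code in streaming.py: `play_chunks`, `write`, and `voice_detected` are hypothetical names, and the real functions are described as async with a richer StreamMetrics structure.

```python
import time
from typing import Callable, Dict, Iterable


def play_chunks(
    chunks: Iterable[bytes],
    write: Callable[[bytes], None],
    voice_detected: Callable[[], bool],
) -> Dict:
    """Write audio chunks until they run out or the user starts speaking."""
    metrics = {"interrupted": False, "interrupted_at": None, "chunks_played": 0}
    start = time.monotonic()
    for chunk in chunks:
        # Check before writing so a pending interrupt never plays more audio.
        if voice_detected():
            metrics["interrupted"] = True
            break
        write(chunk)
        metrics["chunks_played"] += 1
        # Check again after writing so the loop exits without fetching more.
        if voice_detected():
            metrics["interrupted"] = True
            break
    if metrics["interrupted"]:
        metrics["interrupted_at"] = time.monotonic() - start
    return metrics
```

The interrupt latency of such a loop is bounded by the playback time of one chunk, so smaller chunks trade throughput for a faster stop.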
Add comprehensive documentation for the barge-in (TTS interruption) feature:
- docs/guides/configuration.md: Add "Barge-In (Interrupt TTS)" section with all 3 env vars (VOICEMODE_BARGE_IN, VOICEMODE_BARGE_IN_VAD, VOICEMODE_BARGE_IN_MIN_MS), requirements, and use cases
- docs/concepts/architecture.md: Add "Barge-In (TTS Interruption)" section with ASCII flow diagram showing BargeInMonitor, AudioPlayer interrupt flow, and captured audio path to STT
- CHANGELOG.md: Add feature entry under [Unreleased] with VM-606/GH-211 references, all configuration options, and key implementation details
- docs/reference/converse-parameters.md: Add barge-in section with enabling instructions, how-it-works explanation, requirements, and tuning tips for various environments
Closes: VM-606 phase5-docs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create tests/test_barge_in_performance.py with 14 automated tests:
  - Voice onset to TTS stop latency verification (<100ms target)
  - Interrupt callback latency (<5ms average)
  - VAD check performance (<5ms per chunk)
  - Buffer operations (<1ms append)
  - Memory usage (5s audio ~240KB)
  - Thread safety under concurrent access
  - Load testing (50 cycles consistency)
- Add performance characteristics to docs/concepts/architecture.md:
  - Latency metrics table with targets
  - Detailed latency breakdown
  - CPU overhead documentation
  - Memory usage characteristics
  - Thread safety notes
- Add CPU profiling and performance report to manual test script:
  - test_cpu_profiling() for real-time CPU measurement
  - test_performance_report() for comprehensive metrics
Key findings:
- Average interrupt latency: <5ms
- Average cross-thread latency: <20ms
- Total pipeline latency: <50ms (well under 100ms target)
- Minimal CPU overhead during monitoring
- Memory scales linearly with captured duration
Ref: VM-606, GH-211
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
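The "5s audio ~240KB" figure and the linear-scaling finding follow from simple arithmetic on the sample format. The sketch below assumes 24 kHz mono audio; the commits state only that captured audio is int16 PCM at SAMPLE_RATE, so the 24,000 value is an assumption chosen because it reproduces the quoted number.

```python
def captured_audio_bytes(
    seconds: float,
    sample_rate: int = 24_000,   # assumption: SAMPLE_RATE is not stated in the PR text
    bytes_per_sample: int = 2,   # int16 PCM, as stated in the commit messages
    channels: int = 1,           # assumption: mono capture
) -> int:
    """Raw buffer size for a capture of the given duration."""
    return int(seconds * sample_rate * bytes_per_sample * channels)


five_seconds = captured_audio_bytes(5)
ten_seconds = captured_audio_bytes(10)
```

Under these assumptions five seconds of capture comes to 240,000 bytes, and doubling the duration doubles the size.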
@rakavanagh - The barge-in feature you requested in #211 is ready for review! Key highlights:
Would love your feedback when you get a chance to try it out!
Friendly ping: is this still relevant? Happy to rebase or update if needed.
I'd really appreciate a feature like this. Having to wait 15-30 secs for Claude to finish talking every time is very disruptive to the flow. Right now the only way to stop Claude talking is to press |
Summary
Implements barge-in capability, allowing users to interrupt TTS playback by speaking and creating a more natural conversational flow.
Closes #211 - Feature request by @rakavanagh
Key Features
Configuration
- VOICEMODE_BARGE_IN: default false
- VOICEMODE_BARGE_IN_VAD: default 2
- VOICEMODE_BARGE_IN_MIN_MS: default 150
Changes
New Files
- voice_mode/barge_in.py - BargeInMonitor class (350+ lines)
- tests/test_barge_in.py - Unit tests (30 tests)
- tests/test_barge_in_edge_cases.py - Edge case tests (23 tests)
- tests/test_barge_in_integration.py - Integration tests (24 tests)
- tests/test_barge_in_streaming.py - Streaming tests (25 tests)
- tests/test_barge_in_performance.py - Performance tests (14 tests)
- tests/test_audio_player_interrupt.py - Interrupt tests (30 tests)
- tests/manual/test_barge_in_manual.py - Manual testing script
Modified Files
- voice_mode/config.py - Barge-in configuration
- voice_mode/audio_player.py - Interrupt support for NonBlockingAudioPlayer
- voice_mode/core.py - Barge-in integration in text_to_speech()
- voice_mode/streaming.py - Streaming TTS barge-in support
- voice_mode/tools/converse.py - Converse tool integration
- voice_mode/utils/event_logger.py - Barge-in event types
Performance
Test Results
146 barge-in related tests passing:
Test Plan
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>