feat: add barge-in interrupt detection by MatiasComercio · Pull Request #268 · mbailey/voicemode

MatiasComercio · 2026-02-16T18:11:17Z

Summary

This PR adds barge-in (interrupt detection): users can interrupt TTS playback by speaking, with echo suppression to prevent false triggers and a ring buffer to capture pre-trigger audio.

These changes enable natural conversation flow where users can interrupt the AI's response without waiting, and recordings stop automatically after silence.

Motivation

There is currently no way to interrupt TTS playback: users must wait for the full response.

Changes

Core Functionality Added

1. Barge-in System

DuplexBargeInPlayer class for full-duplex audio monitoring
Monitors microphone during TTS playback
Echo suppression (input must be 30% louder than output)
Ring buffer captures 500ms pre-trigger audio to prevent lost syllables
Configurable energy threshold (default: 300)
Minimum speech duration (default: 20ms)
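
The detection rules listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's DuplexBargeInPlayer: the class name BargeInDetector, the 10 ms frame length, and the use of RMS as the energy measure are assumptions made for the example. Only the 300 threshold, the 20 ms minimum speech duration, the 500 ms ring buffer, and the 30% echo margin come from the description.

```python
import math
from collections import deque

FRAME_MS = 10  # assumed frame length; the PR description does not state one


def rms(samples):
    """Root-mean-square energy of a frame of int16-range samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class BargeInDetector:
    """Hypothetical detector illustrating the rules listed above."""

    def __init__(self, energy_threshold=300, min_speech_ms=20,
                 pre_trigger_ms=500, echo_margin=1.3):
        self.energy_threshold = energy_threshold
        self.frames_needed = max(1, math.ceil(min_speech_ms / FRAME_MS))
        self.echo_margin = echo_margin  # mic must be 30% louder than output
        # Ring buffer: a bounded deque drops the oldest frame when full.
        self.ring = deque(maxlen=pre_trigger_ms // FRAME_MS)
        self.speech_frames = 0

    def process(self, mic_frame, playback_frame):
        """Feed one mic frame and the playback frame from the same instant.

        Returns the buffered pre-trigger frames once speech is detected,
        otherwise None.
        """
        self.ring.append(list(mic_frame))
        mic_energy = rms(mic_frame)
        out_energy = rms(playback_frame)
        loud_enough = mic_energy > self.energy_threshold
        above_echo = mic_energy > self.echo_margin * out_energy
        if loud_enough and above_echo:
            self.speech_frames += 1
        else:
            self.speech_frames = 0  # speech must be continuous
        if self.speech_frames >= self.frames_needed:
            return list(self.ring)
        return None
```

Returning the ring buffer contents on trigger is what lets the caller keep the first syllables spoken before detection fired.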

2. Streaming Layer Updates

Added skip_playback parameter throughout streaming stack
Allows TTS generation without playback (for interrupted audio)
Preserves audio files for potential replay
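
A rough sketch of how a skip_playback guard might look in a streaming function is shown below. The function name stream_pcm_sketch, the play_chunk callback, the 24 kHz sample rate, and the shape of the returned metrics are all hypothetical and chosen for illustration; the PR's actual stream_pcm_audio() signature is not shown in this description.

```python
import wave
from pathlib import Path


def stream_pcm_sketch(chunks, save_path, play_chunk, skip_playback=False,
                      sample_rate=24000):
    """Hypothetical stand-in for a streaming function with skip_playback.

    chunks: iterable of 16-bit mono PCM byte strings.
    play_chunk: callable that sends one chunk to the audio device.
    Returns a metrics dict that includes the saved audio path.
    """
    total_bytes = 0
    with wave.open(str(save_path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)  # always keep the audio for replay
            total_bytes += len(chunk)
            if not skip_playback:
                play_chunk(chunk)  # guarded: no device output when skipping
    return {"audio_path": Path(save_path), "bytes": total_bytes,
            "played": not skip_playback}
```

Saving unconditionally while guarding only the device call keeps the generated audio available for a later replay, which matches the intent described above.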

Files Modified

File                          Added  Removed  Total
voice_mode/config.py              6        2      8
voice_mode/core.py                8        3     11
voice_mode/streaming.py          46       33     79
voice_mode/tools/converse.py    284       32    316
────────────────────────────────────────────────
Total:                          344       70    414

voice_mode/config.py (8 lines)

Added BARGE_IN_ENERGY_THRESHOLD = 300
Added BARGE_IN_MIN_SPEECH_MS = 20

voice_mode/streaming.py (79 lines)

Added skip_playback parameter to stream_pcm_audio()
Added skip_playback parameter to stream_tts_audio()
Added skip_playback parameter to stream_with_buffering()
Guarded all stream operations with skip_playback checks
Store audio path in metrics for barge-in replay

voice_mode/core.py (11 lines)

Added skip_playback parameter to text_to_speech()
Updated docstring
Pass parameter through to streaming layer

voice_mode/tools/converse.py (316 lines)

Added DuplexBargeInPlayer class (170+ lines of barge-in logic)
Added barge_in parameter to converse() function
Modified text_to_speech_with_failover() with force_save_audio and skip_playback
Integrated barge-in detection flow
Added audio concatenation (prepending barge-in audio to VAD recording)
Added conditional finished chime (skip if interrupted)
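
The audio concatenation step can be pictured with the sketch below. The helper name prepend_barge_in_audio is hypothetical, and the example assumes both inputs are 16-bit mono samples at the same sample rate; the PR's actual implementation is not shown here.

```python
from array import array


def prepend_barge_in_audio(barge_in_frames, vad_recording):
    """Join pre-trigger frames and the VAD recording into one int16 buffer.

    barge_in_frames: list of int16 sample sequences captured during playback.
    vad_recording: int16 samples recorded after playback stopped.
    """
    combined = array("h")  # "h" = signed 16-bit integers
    for frame in barge_in_frames:
        combined.extend(frame)
    combined.extend(vad_recording)
    return combined
```

Placing the barge-in frames first preserves chronological order, so the transcribed utterance starts with the words that triggered the interrupt.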

API Changes

New backward-compatible parameter in converse():

await converse(
    message="Hello, how can I help?",
    barge_in=True  # Enable interrupt detection
)

Default: barge_in=False (preserves existing behavior)

Environment Variables

Optional configuration via environment:

VOICEMODE_BARGE_IN_THRESHOLD=300      # Energy threshold for detection
VOICEMODE_BARGE_IN_MIN_SPEECH=20     # Minimum speech duration (ms)
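
As an illustration, these variables could be read into the config constants along the following lines. The env_int helper is hypothetical and the fallback-on-invalid behavior is an assumption; the actual parsing in voice_mode/config.py is not shown in this description.

```python
import os


def env_int(name, default, environ=None):
    """Read an integer setting, falling back to the default if unset or invalid."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name, "").strip()
    try:
        return int(raw)
    except ValueError:
        return default


# Constant names and defaults as listed for voice_mode/config.py.
BARGE_IN_ENERGY_THRESHOLD = env_int("VOICEMODE_BARGE_IN_THRESHOLD", 300)
BARGE_IN_MIN_SPEECH_MS = env_int("VOICEMODE_BARGE_IN_MIN_SPEECH", 20)
```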

Testing

Manual Testing Performed

Basic VAD (Silence Detection)
- Recording now stops automatically ~1s after user stops speaking
- No manual interruption needed
- Works with default settings

Barge-in (TTS Interruption)
- TTS stops within 300-600ms of user speaking
- Ring buffer captures full utterance (no lost syllables)
- Echo suppression prevents false triggers

Backward Compatibility
- Existing code works unchanged (barge_in defaults to False)
- All new parameters have sensible defaults

Test Metrics

TTS Play Latency: 0.3-0.6s
Record Duration: 2-4s with automatic silence stop
Barge-in Latency: 300-600ms (threshold dependent)
Ring Buffer: Captures 500ms pre-trigger audio

Test Commands

# Enable debug logging
export VOICEMODE_DEBUG=true

# Test basic conversation (with barge-in enabled)
uv run voicemode converse

# Check audio debug files
ls ~/.voicemode/audio/

Automated Testing

All existing tests pass:

pytest
pytest --cov=voice_mode

New tests for the DuplexBargeInPlayer barge-in detection logic cover:

Energy threshold detection with synthetic data
Ring buffer size management (deque-based)
Echo suppression logic
Thread-safety of shared state
skip_playback parameter propagation through streaming functions

Voicemode Version

Patches were developed and tested against voicemode 8.1.0.

Checklist

Dependencies

No new direct dependencies added to project requirements.

Platform Compatibility

Tested on macOS with standard audio hardware. Linux/Windows compatibility should be maintained as all audio operations use existing VoiceMode abstractions.

- Implement DuplexBargeInPlayer for real-time interrupt detection
- Add barge-in configuration constants and skip_playback parameter
- Add 23 unit tests covering energy detection, echo suppression, thread-safety
- Fix VAD silence detection with wall-clock timeout and reduced post-barge-in min_duration
- Update dependency to webrtcvad-wheels for easier installation

Applies to voicemode v8.1.0

MatiasComercio · 2026-02-17T02:09:14Z

While checking the open PRs, I found that #238 is exactly the same. Closing this one; if anything here is useful, ping me or use it!

MatiasComercio closed this Feb 17, 2026

MatiasComercio wants to merge 1 commit into mbailey:master from MatiasComercio:feat/barge-in-vad-fixes