A simple, hands-free Python voice assistant that runs 100% locally. This script uses openwakeword for wakeword detection, webrtcvad for silence detection, faster-whisper (an optimized implementation of OpenAI's Whisper) for transcription, Ollama for generative AI responses, and Piper for text-to-speech.
```mermaid
flowchart LR
    A[Microphone] --> B(openwakeword);
    B -- "hey jarvis" --> C(webrtcvad);
    C -- "Records until silence" --> D[faster-whisper STT];
    D -- "Transcribes audio" --> E[Ollama LLM];
    E -- "Generates streaming response" --> F[Piper TTS];
    F -- "Speaks response" --> G[Speaker];
```
- 100% Local: No cloud services are required for STT, TTS, or the LLM.
- Hands-Free: Uses openwakeword for wakeword detection.
- Low-Latency TTS: Uses the Piper TTS engine for fast, high-quality voice output.
- Optimized STT: Leverages faster-whisper models for efficient and accurate speech-to-text.
- Smart Recording: Uses webrtcvad (Voice Activity Detection) to automatically stop recording when you finish speaking (see the sketch after this list).
- Flexible LLM: Easily configurable to use any model supported by your local Ollama instance (e.g., llama3, mistral, phi3).
- Cross-Platform Audio: Uses sounddevice for audio input/output.
- Configurable: Settings are adjustable via config.ini and command-line arguments.
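The Smart Recording feature works roughly like the following hedged sketch: webrtcvad classifies each short audio frame as speech or silence, and recording stops after an unbroken run of silent frames. The frame size, the one-second silence window, and read_frame() are all illustrative assumptions:

```python
# A minimal sketch of VAD-based end-of-speech detection, assuming 16 kHz
# mono 16-bit PCM and 30 ms frames; the silence window is illustrative.
import webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit samples

vad = webrtcvad.Vad(2)  # aggressiveness 0 (least) to 3 (most)

def record_until_silence(read_frame, max_silence_frames=33):  # ~1 s of silence
    """read_frame() is a hypothetical callable returning FRAME_BYTES of PCM."""
    audio, silent = bytearray(), 0
    while silent < max_silence_frames:
        frame = read_frame()
        audio.extend(frame)
        # Reset the silence counter whenever the frame contains speech
        silent = 0 if vad.is_speech(frame, SAMPLE_RATE) else silent + 1
    return bytes(audio)
```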
Before you begin, ensure you have the following installed and running:
You must have the Ollama application installed and running.
You need at least one model downloaded for Ollama to use.
```bash
# The default model is llama3
ollama pull llama3
```

The underlying audio libraries require system packages to be installed.
On Debian/Ubuntu Linux:

```bash
sudo apt-get update && sudo apt-get install -y portaudio19-dev ffmpeg
```

On Fedora/RHEL Linux:

```bash
# Enable RPM Fusion if you haven't already (see https://rpmfusion.org/Configuration)
sudo dnf install -y portaudio-devel gcc python3-devel ffmpeg pulseaudio-libs-devel
```

This project comes with pre-packaged wake-word (hey_jarvis) and TTS models in the models/ directory. No download is required unless you wish to use different ones.
Clone this repository to your local machine and navigate into the project directory.

```bash
git clone https://github.com/BranchingBad/ollama-STT-TTS.git
cd ollama-STT-TTS
```

Create and activate a Python virtual environment (recommended).
```bash
# Create the environment
python3 -m venv venv

# Activate it (Linux/macOS)
source venv/bin/activate

# Activate it (Windows)
# venv\Scripts\activate
```

Install the project. For regular use:

```bash
pip install .
```

For development (including testing dependencies):

```bash
pip install -e .[test]
```

On the first run, the application will automatically download the required faster-whisper model.
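That first-run download amounts to roughly the following hedged sketch: the first WhisperModel() call fetches and caches the model. The model name, device, compute type, and file name here are assumptions:

```python
# Illustrative use of faster-whisper; constructing WhisperModel downloads
# and caches the model on first use. All values shown are assumptions.
from faster_whisper import WhisperModel

model = WhisperModel("base.en", device="cpu", compute_type="int8")
segments, info = model.transcribe("command.wav")  # hypothetical recording
print("".join(segment.text for segment in segments))
```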
You can run the assistant either locally with Python or via Docker. All commands should be run from the root of the project directory.
Make sure your Ollama application is running. Then, start the assistant:
```bash
python run.py
```

Or, if you have installed the package, you can use the entry point:

```bash
ollama-voice-assistant
```

When ready, you will see the message: Ready! Listening for 'hey jarvis'...
How to Interact:
- Say the wakeword (e.g., "Hey jarvis").
- The assistant will respond, "Yes?" and begin listening.
- Speak your command (e.g., "Who won the War of 1812?").
- The assistant will transcribe your audio, send it to Ollama, and speak the response. It will then return to listening for the wakeword.
Special Commands:
"goodbye"or"exit": Stops the script."new chat"or"reset chat": Clears the conversation history for the LLM.
A pre-built Docker image is available on the GitHub Container Registry.
1. Pull the Image:
```bash
docker pull ghcr.io/branchingbad/ollama-stt-tts:latest
```

2. Prepare Configuration: You will likely need to find the correct audio device index for the container to use. You can list the devices from your local (non-Docker) installation:

```bash
python run.py --list-devices
```

Copy the config.ini file from the repository to a local directory and edit the device_index with the correct value from the command above.
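If you want to double-check the index without the project installed, the same information is available directly from sounddevice (a generic library call, not project code):

```python
# Prints all audio devices with their indices; the leftmost number is
# the value to put in config.ini's device_index.
import sounddevice as sd

print(sd.query_devices())
```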
3. Run the Container (Linux):
This command connects the container to your host's network (to access Ollama), mounts your sound devices, and mounts your local config.ini.
```bash
docker run --rm -it \
  --network=host \
  --device /dev/snd \
  -v ./config.ini:/app/config.ini:ro \
  ghcr.io/branchingbad/ollama-stt-tts:latest
```

- --network=host: Required for the container to access Ollama at http://localhost:11434.
- --device /dev/snd: Grants the container access to your host's sound devices (Linux-specific).
- -v ./config.ini...: Mounts your local configuration file as read-only.
Note for macOS/Windows users: Audio device mapping is more complex. You may need to adjust the docker run command. If --network=host is unavailable, remove it and set ollama_host in your config.ini to http://host.docker.internal:11434.
Customize the assistant by editing config.ini or by providing command-line arguments. Arguments always override settings from the config file.
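For example, a minimal config.ini might look like the following sketch. Only keys mentioned in this README are shown, and their section placement is an assumption; see the repository's config.ini for the authoritative layout:

```ini
[Models]
ollama_model = llama3
whisper_model = base.en

[Functionality]
# Microphone index from --list-devices (section placement assumed)
device_index = 2
ollama_host = http://localhost:11434
```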
Example Commands:
```bash
# Run with a different wakeword threshold and VAD aggressiveness
python run.py --wakeword-threshold 0.5 --vad-aggressiveness 1

# Run using a different Ollama model and input device
python run.py --ollama-model mistral --device-index 2
```

Common Arguments:

- --list-devices: List available audio input devices and exit.
- --list-output-devices: List available audio output devices and exit.
- --debug: Enable verbose debug logging.
- --ollama-model: Name of the Ollama model to use (e.g., llama3, mistral).
- --whisper-model: Name of the faster-whisper model to use (e.g., tiny.en, base.en).
- --wakeword: The wakeword phrase to listen for.
- --device-index: The integer index of your microphone.
- --piper-output-device-index: The integer index of your speaker.
- --system-prompt: A custom system prompt or a path to a .txt file containing one.
For a full list of configurable options, see the [Models] and [Functionality] sections in the config.ini file.
This project includes a suite of unit tests to ensure the reliability of its core components. The tests cover:
- Ollama connection
- Configuration management
- Audio utilities
- LLM handling
- Audio transcription
- Speech synthesis
To run the tests, first ensure you have installed the development dependencies:
```bash
pip install -e .[test]
```

Then, run pytest from the root of the project directory:

```bash
python3 -m pytest
```
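As an illustration of what the configuration-management tests exercise (arguments always override the config file), a test might look like this sketch. The test body is hypothetical, not copied from the suite:

```python
# Hypothetical pytest example; names and logic are illustrative, not the
# project's actual test code or API.
import configparser

def test_cli_overrides_config(tmp_path):
    cfg = tmp_path / "config.ini"
    cfg.write_text("[Models]\nollama_model = llama3\n")
    parser = configparser.ConfigParser()
    parser.read(cfg)
    # Simulate a command-line override winning over the file value
    cli_args = {"ollama_model": "mistral"}
    effective = cli_args.get("ollama_model") or parser["Models"]["ollama_model"]
    assert effective == "mistral"
```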