opencode-voice

Speech-to-text and text-to-speech plugin for OpenCode.

Record voice prompts with local whisper transcription and hear assistant responses spoken aloud via Piper TTS. Both directions use an LLM to normalize text for natural speech (fixing homophones, splitting camelCase identifiers, summarizing code-heavy responses, etc.).

Install

Add the plugin to your tui.json (create it at ~/.config/opencode/tui.json if it doesn't exist). You must configure at least endpoint and model:

{
  "$schema": "https://opencode.ai/tui.json",
  "plugin": [
    [
      "@renjfk/opencode-voice",
      {
        "endpoint": "https://api.anthropic.com/v1",
        "model": "claude-haiku-4-5",
        "apiKeyEnv": "ANTHROPIC_API_KEY"
      }
    ]
  ]
}

Prerequisites

Speech-to-text

brew install whisper-cpp sox

Download a whisper model to ~/.local/share/whisper-cpp/:

mkdir -p ~/.local/share/whisper-cpp
curl -L -o ~/.local/share/whisper-cpp/ggml-large-v3-turbo-q5_0.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo-q5_0.bin

Text-to-speech

Install Piper:

uv tool install piper-tts

Or with pip:

pip install piper-tts

Download a voice model to ~/.local/share/piper-voices/:

mkdir -p ~/.local/share/piper-voices
curl -L -o ~/.local/share/piper-voices/en_US-ryan-high.onnx \
  https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/high/en_US-ryan-high.onnx
curl -L -o ~/.local/share/piper-voices/en_US-ryan-high.onnx.json \
  https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/high/en_US-ryan-high.onnx.json

LLM endpoint

An OpenAI-compatible LLM endpoint is required for text normalization. For speech-to-text, it cleans up whisper output (punctuation, filler words, software engineering homophones). For text-to-speech, it converts markdown into natural spoken text.

Configure your endpoint in tui.json via plugin options. Any OpenAI-compatible endpoint works (Anthropic, OpenAI, Ollama, vLLM, LM Studio, etc.). The apiKeyEnv option is optional; omit it for unauthenticated endpoints such as Ollama.

{
  "plugin": [
    [
      "@renjfk/opencode-voice",
      {
        "endpoint": "https://api.anthropic.com/v1",
        "model": "claude-haiku-4-5",
        "apiKeyEnv": "ANTHROPIC_API_KEY"
      }
    ]
  ]
}

For unauthenticated local endpoints (e.g. Ollama):

{
  "plugin": [
    [
      "@renjfk/opencode-voice",
      {
        "endpoint": "http://localhost:11434/v1",
        "model": "llama3.2"
      }
    ]
  ]
}

Options:

  • endpoint (required) - OpenAI-compatible base URL
  • model (required) - model name sent to /chat/completions
  • apiKeyEnv (optional) - environment variable containing the API key
  • maxTokens (optional) - maximum completion tokens for normalization calls
  • reasoningEffort (optional) - reasoning level for models that support it
  • chatTemplateKwargs (optional) - extra keyword arguments passed to the model's chat template (e.g. {"enable_thinking": false} for Qwen models to disable chain-of-thought)
  • retries (optional) - number of retry attempts for transient LLM failures
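
To make the options above concrete, here is a minimal sketch of how they might map onto an OpenAI-style /chat/completions request. It is not the plugin's actual code: the function name buildChatRequest, the LlmOptions shape, the Bearer authorization header, and the wire names max_tokens, reasoning_effort, and chat_template_kwargs are assumptions based on common OpenAI-compatible conventions.

```typescript
// Hypothetical option shape mirroring the plugin options listed above.
interface LlmOptions {
  endpoint: string;
  model: string;
  apiKeyEnv?: string;
  maxTokens?: number;
  reasoningEffort?: string;
  chatTemplateKwargs?: Record<string, unknown>;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: Record<string, unknown>;
}

// Build a /chat/completions request. The environment is passed in explicitly
// (for example process.env) so the function stays pure and easy to test.
function buildChatRequest(
  options: LlmOptions,
  env: Record<string, string | undefined>,
  systemPrompt: string,
  userText: string,
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };

  // apiKeyEnv names an environment variable; when it is omitted or unset,
  // no Authorization header is sent (unauthenticated endpoints such as Ollama).
  const apiKey = options.apiKeyEnv ? env[options.apiKeyEnv] : undefined;
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;

  const body: Record<string, unknown> = {
    model: options.model,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userText },
    ],
  };

  // Assumed wire names; the plugin's real mapping may differ.
  if (options.maxTokens !== undefined) body["max_tokens"] = options.maxTokens;
  if (options.reasoningEffort !== undefined) body["reasoning_effort"] = options.reasoningEffort;
  if (options.chatTemplateKwargs !== undefined) {
    body["chat_template_kwargs"] = options.chatTemplateKwargs;
  }

  // Strip trailing slashes so "…/v1/" and "…/v1" produce the same URL.
  const base = options.endpoint.replace(/\/+$/, "");
  return { url: `${base}/chat/completions`, headers, body };
}
```

Passing the environment as a parameter rather than reading it inside the function keeps the sketch testable without touching real API keys.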

STT API transcription (optional)

Instead of local whisper-cli, you can use an OpenAI-compatible speech-to-text API (e.g. serving a Whisper model). This is useful when you want to run the plugin on a machine without whisper-cpp installed.

{
  "plugin": [
    [
      "@renjfk/opencode-voice",
      {
        "sttEndpoint": "http://127.0.0.1:8000/v1",
        "sttModel": "whisper-large-v3-turbo",
        "sttApiKeyEnv": "MY_STT_API_KEY"
      }
    ]
  ]
}

Options:

  • sttEndpoint (optional) - OpenAI-compatible base URL with /audio/transcriptions support
  • sttModel (optional) - whisper model name to pass to the API (default: whisper-large-v3-turbo). Can be changed at runtime via /stt-model, which fetches available whisper models from the endpoint's /models listing
  • sttApiKeyEnv (optional) - environment variable containing the API key
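
The choice between local and API transcription can be sketched as a small decision function. This is an illustration under assumptions, not the plugin's implementation: resolveSttBackend, SttOptions, and SttBackend are hypothetical names. Only the default model name and the /audio/transcriptions path come from the option descriptions above.

```typescript
// Hypothetical subset of the plugin options relevant to transcription.
interface SttOptions {
  sttEndpoint?: string;
  sttModel?: string;
}

type SttBackend =
  | { kind: "local" } // use whisper-cli on this machine
  | { kind: "api"; url: string; model: string }; // use an OpenAI-compatible API

// Default taken from the sttModel description above.
const DEFAULT_STT_MODEL = "whisper-large-v3-turbo";

// Pick the API backend only when sttEndpoint is configured.
function resolveSttBackend(options: SttOptions): SttBackend {
  if (!options.sttEndpoint) return { kind: "local" };
  const base = options.sttEndpoint.replace(/\/+$/, "");
  return {
    kind: "api",
    url: `${base}/audio/transcriptions`,
    model: options.sttModel ?? DEFAULT_STT_MODEL,
  };
}
```

With the configuration shown earlier, this would resolve to the URL http://127.0.0.1:8000/v1/audio/transcriptions. With no sttEndpoint it falls back to local whisper-cli.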

Custom prompts

The LLM system prompts used for normalization can be fully replaced by pointing to your own prompt files. This lets you fine-tune how transcriptions are cleaned up or how responses are spoken.

{
  "plugin": [
    [
      "@renjfk/opencode-voice",
      {
        "sttPrompt": "~/.config/opencode/stt-prompt.md",
        "ttsAutoPrompt": "~/.config/opencode/tts-auto-prompt.md",
        "ttsManualPrompt": "~/.config/opencode/tts-manual-prompt.md"
      }
    ]
  ]
}

Options:

  • sttPrompt (optional) - system prompt for cleaning up whisper transcriptions
  • ttsAutoPrompt (optional) - system prompt for auto-speaking assistant responses
  • ttsManualPrompt (optional) - system prompt for manually reading responses aloud

If a path is not set, the built-in default prompt is used.
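
As a rough sketch of that fallback rule, the prompt lookup could look like the following. The names expandHome and resolvePrompt are hypothetical, and the handling of a leading ~ is an assumption about how paths like the ones above would be expanded. The file reader is injected so the example does not depend on a particular runtime.

```typescript
// Expand a leading "~" to the user's home directory (assumed behavior).
function expandHome(path: string, homeDir: string): string {
  if (path === "~") return homeDir;
  if (path.slice(0, 2) === "~/") return homeDir + path.slice(1);
  return path;
}

// Return the custom prompt if a path is configured, else the built-in default.
function resolvePrompt(
  configuredPath: string | undefined,
  builtInPrompt: string,
  homeDir: string,
  readFile: (path: string) => string,
): string {
  if (!configuredPath) return builtInPrompt;
  return readFile(expandHome(configuredPath, homeDir));
}
```

Because the custom file fully replaces the built-in prompt, a replacement prompt would need to carry every instruction you want the LLM to follow, not just the parts you want to change.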

Commands

Speech-to-text

| Command | Keybind | Description |
| --- | --- | --- |
| /stt-record | ctrl+r | Start/stop recording + transcribe |
| /stt-stop | | Cancel recording |
| /stt-model | | Select whisper model |
| /stt-mic | | Select microphone |

Text-to-speech

The leader key in OpenCode is ctrl+x, so leader+s means pressing ctrl+x, then s.

| Command | Keybind | Description |
| --- | --- | --- |
| /tts-speak | leader+s | Read last response aloud |
| /tts-mode | leader+v | Toggle auto TTS on/off |
| /tts-stop | escape | Stop playback |
| /tts-voice | | Select TTS voice |

How it works

STT pipeline

  1. sox records audio from your microphone
  2. whisper-cli transcribes locally using a ggml model, or an OpenAI-compatible API endpoint if sttEndpoint is configured
  3. LLM normalizes the transcription: fixes punctuation, removes filler words, corrects software engineering homophones ("Jason" to "JSON", "bullion" to "boolean", etc.)
  4. Cleaned text is appended to the OpenCode prompt. If normalization fails (e.g. LLM endpoint unreachable), the raw transcription is used as a fallback so you never lose your input
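
The fallback in step 4 can be sketched as a retry loop that returns the raw transcription when normalization keeps failing. This is a minimal illustration, not the plugin's code: normalizeOrFallback is a hypothetical name, the way it uses retries is a guess at how that option behaves, and treating an empty LLM reply as a failure is an added assumption.

```typescript
// Try the normalizer up to 1 + retries times. If every attempt fails,
// return the raw transcription so the user's input is never lost.
async function normalizeOrFallback(
  rawTranscript: string,
  normalize: (text: string) => Promise<string>,
  retries: number = 0,
): Promise<string> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const cleaned = (await normalize(rawTranscript)).trim();
      // Assumption: an empty reply counts as a failed attempt.
      if (cleaned.length > 0) return cleaned;
    } catch {
      // Every error is treated as transient here; a real implementation
      // might inspect the error before deciding to retry.
    }
  }
  return rawTranscript;
}
```

Here normalize stands in for the LLM call. Keeping it as a parameter means the same fallback logic works with any OpenAI-compatible client.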

TTS pipeline

  1. When the assistant finishes responding (or on manual trigger), the response text is sent to the LLM for speech normalization
  2. The LLM decides how to handle it: narrate simple answers, summarize code-heavy responses, or briefly notify for confirmations
  3. Piper synthesizes speech locally, piped through sox for playback

Auto TTS

When enabled (/tts-mode), the plugin automatically speaks:

  • Assistant responses when a session goes idle after work
  • Permission requests
  • Questions that need your answer
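
The three triggers above can be pictured as a small filter over incoming events. The event names and fields below are hypothetical and do not reflect OpenCode's actual event API. The sketch only shows the decision: speak nothing when auto TTS is off, otherwise pick the text that matches the event.

```typescript
// Hypothetical event shapes; OpenCode's real plugin events may look different.
type VoiceEvent =
  | { type: "session-idle"; lastAssistantText: string }
  | { type: "permission-request"; description: string }
  | { type: "question"; text: string };

// Return the text that should be spoken automatically, or null for nothing.
function textToAutoSpeak(autoTtsEnabled: boolean, event: VoiceEvent): string | null {
  if (!autoTtsEnabled) return null;
  switch (event.type) {
    case "session-idle":
      // Skip idle events that carry no assistant output.
      return event.lastAssistantText.trim() || null;
    case "permission-request":
      return event.description;
    case "question":
      return event.text;
  }
}
```

In the pipeline described earlier, the returned text would still pass through LLM speech normalization before reaching Piper.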

Contributing

opencode-voice is open to contributions and ideas!

Issue conventions

Format: type: brief description

  • feat: new features or functionality
  • fix: bug fixes
  • enhance: improvements to existing features
  • chore: maintenance tasks, dependencies, cleanup
  • docs: documentation updates
  • build: build system, CI/CD changes

Development

npm run check        # lint + fmt
npm run lint         # oxlint
npm run fmt          # oxfmt --check
npm run fmt:fix      # oxfmt --write

Test local plugin in OpenCode

To test unpublished changes in the OpenCode TUI, point ~/.config/opencode/tui.json at the local repo path, not the npm package name:

{
  "$schema": "https://opencode.ai/tui.json",
  "plugin": ["/Users/your-user/opencode-voice"]
}

Release process

Releases are performed manually via opencode; see RELEASE_PROCESS.md.

License

This project is licensed under the MIT License.
