Speaking is just easier.
Speak freely, type instantly — 100% local voice dictation for Linux with 25+ languages, translation, speaker diarization, and real-time visual feedback. Text appears right where your cursor is.
Installation • Configuration • Visual interfaces • Usage • Going further • Roadmap
dictee is a complete voice dictation system for Linux. Transcription is performed 100% locally — no audio data ever leaves your machine. Press a shortcut, speak, and the text is typed directly into the active application.
- 4 ASR backends: Parakeet-TDT (25 languages, native punctuation), Canary-1B (built-in translation, GPU), Vosk (lightweight, ~50 MB), faster-whisper (99 languages)
- Daemon mode: model loaded once, near-instant transcriptions (~0.8s on CPU)
- Translation: 4 backends — Google, Bing, LibreTranslate (local), ollama (local)
- Speaker diarization: who said what, up to 4 speakers via Sortformer (CLI only, not yet in voice dictation)
- 3 visual interfaces: KDE Plasma widget, notification area icon, fullscreen animation
Ubuntu / Debian:

Download the .deb from the Releases, then install one of the two packages:

```shell
# GPU version (requires the NVIDIA CUDA repository; see "GPU dependencies" below)
sudo dpkg -i dictee-cuda_1.2.0_amd64.deb
# CPU version (any computer, no extra repository needed)
sudo dpkg -i dictee-cpu_1.2.0_amd64.deb
# Install missing dependencies
sudo apt-get install -f
```

Note: The GPU version requires cuDNN from the NVIDIA CUDA repository, which is not included in the standard Ubuntu/Fedora repositories. Without it, the GPU version still works, but in CPU mode only.
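If you are unsure which package fits your machine, one quick heuristic is to check whether the NVIDIA driver responds. The sketch below assumes that a working nvidia-smi means a usable NVIDIA GPU. pick_variant and deb_name are hypothetical helpers, not part of dictee, and the check says nothing about cuDNN.

```shell
# Hypothetical helper: print "cuda" if an NVIDIA driver responds, else "cpu".
pick_variant() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo cuda
  else
    echo cpu
  fi
}

# Hypothetical helper: build the matching .deb file name for release 1.2.0 (amd64).
deb_name() {
  echo "dictee-${1}_1.2.0_amd64.deb"
}

deb_name "$(pick_variant)"
```

Pass the printed file name to sudo dpkg -i. Even if cuda is selected, the GPU path still needs cuDNN (see "GPU dependencies" below).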
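Fedora / openSUSE:

```shell
# Install one of the two packages.
# GPU version (NVIDIA CUDA; see "GPU dependencies" below)
sudo dnf install ./dictee-cuda-1.2.0-1.x86_64.rpm
# CPU version (any computer)
sudo dnf install ./dictee-cpu-1.2.0-1.x86_64.rpm
# On openSUSE, use zypper instead of dnf, e.g.:
# sudo zypper install ./dictee-cpu-1.2.0-1.x86_64.rpm
```

Arch Linux (AUR):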
A PKGBUILD is available in the repository root. It builds from source and includes all components (x86_64 and aarch64).
aarch64 (ARM64):
Pre-built packages are x86_64 only. On aarch64 (Raspberry Pi 5, Ampere, etc.), build from source — see below. CUDA is limited to NVIDIA Jetson on this architecture; most users will use CPU mode.
Other distributions (.tar.gz):

```shell
tar xzf dictee-1.2.0_amd64.tar.gz
cd dictee-1.2.0
sudo ./install.sh
```

From source:

```shell
tar xzf dictee-1.2.0-source.tar.gz
cd dictee-1.2.0-source
cargo build --release --features sortformer
sudo ./install.sh
```

For detailed build instructions and Cargo features, see docs/building.md.
The GPU version (dictee-cuda) requires cuDNN, which is not available in standard Ubuntu/Fedora repositories. You need the NVIDIA CUDA repository:
Ubuntu / Debian:

```shell
wget -qO - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/3bf863cc.pub | \
  sudo gpg --dearmor -o /usr/share/keyrings/cuda-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] \
https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/ /" | \
  sudo tee /etc/apt/sources.list.d/cuda-ubuntu2404-x86_64.list
sudo apt update
sudo apt install libcudnn9-cuda-12
```

Replace ubuntu2404 with your version (ubuntu2204, ubuntu2504, etc.). See NVIDIA CUDA repos.
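On Ubuntu, the identifier can be derived from /etc/os-release, whose VERSION_ID field holds values such as 24.04. Here is a minimal sketch, assuming NVIDIA keeps the ubuntuXXYY naming used in the commands above. cuda_repo_id is a hypothetical helper name.

```shell
# Hypothetical helper: turn a VERSION_ID such as "24.04" into "ubuntu2404".
cuda_repo_id() {
  printf 'ubuntu%s\n' "$(printf '%s' "$1" | tr -d .)"
}

# Read the running system's release in a subshell so its variables do not leak.
(
  [ -r /etc/os-release ] || exit 0
  . /etc/os-release
  if [ "${ID:-}" = ubuntu ]; then
    cuda_repo_id "$VERSION_ID"
  fi
)
```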
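Fedora:

```shell
# dnf5 syntax (Fedora 41 and later); with dnf4, use: sudo dnf config-manager --add-repo with the same URL
# Replace fedora41 with your release if NVIDIA provides a repository for it.
sudo dnf config-manager addrepo --from-repofile=https://developer.download.nvidia.com/compute/cuda/repos/fedora41/x86_64/cuda-fedora41.repo
sudo dnf install libcudnn9-cuda-12
```

Without cuDNN, the GPU version falls back to CPU automatically. dictee-setup detects this and guides you through the setup.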
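To check by hand whether the dynamic linker can see cuDNN, you can search the ldconfig cache. This is a rough heuristic, not the check dictee-setup itself performs. It assumes the library is registered under a libcudnn.so* name, which is typically the case after installing libcudnn9-cuda-12. cudnn_listed is a hypothetical helper.

```shell
# Hypothetical helper: succeeds if an `ldconfig -p` listing read from stdin
# mentions a cuDNN shared library.
cudnn_listed() {
  grep -q 'libcudnn\.so'
}

# ldconfig often lives in /sbin, which may be missing from a regular user's PATH.
ldc=$(command -v ldconfig || echo /sbin/ldconfig)
if "$ldc" -p 2>/dev/null | cudnn_listed; then
  echo "cuDNN found: dictee-cuda can use the GPU"
else
  echo "cuDNN not found: dictee-cuda will fall back to CPU"
fi
```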
On first launch, a setup wizard guides you through backend selection, model download, and keyboard shortcuts. You can reconfigure it at any time from the application menu, the tray icon, the Plasma widget, or by running dictee --setup in a terminal.
Four mutually exclusive transcription backends, switchable from dictee --setup:
| Backend | Languages | Model size | Warm latency | Type |
|---|---|---|---|---|
| Parakeet-TDT | 25 | ~2.5 GB | ~0.8s CPU · ~0.16s GPU | ONNX Runtime (Rust) |
| Canary-1B | 4 (EN,ES,FR,DE) | ~5 GB | ~0.7s GPU | ONNX Runtime (Python, GPU recommended) |
| faster-whisper | 99 | ~500 MB–3 GB | ~0.3s | CTranslate2 (Python) |
| Vosk | 9+ | ~50 MB | ~1.5s | Python (lightweight) |
Each backend runs as a systemd user service and speaks the same Unix socket protocol, so switching backends is transparent to the user.
dictee --setup captures and registers shortcuts automatically (KDE Plasma / GNOME). Two separate shortcuts: one for dictation, one for dictation + translation.
For tiling WMs (Sway, i3, Hyprland…), the tool shows the command to add manually to your config.
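For reference, a manual entry typically has the following shape. This is a sketch, not the command dictee --setup prints. The key combinations are arbitrary examples chosen to avoid the default $mod+d launcher binding. The sway/i3 lines assume your config defines $mod, as the default configs do. wm_bindings is a hypothetical helper.

```shell
# Hypothetical helper: print example key bindings for a tiling WM.
# Super+F9 dictates; Super+Shift+F9 dictates and translates.
wm_bindings() {
  case "$1" in
    sway|i3)
      printf '%s\n' \
        'bindsym $mod+F9 exec dictee' \
        'bindsym $mod+Shift+F9 exec dictee --translate'
      ;;
    hyprland)
      printf '%s\n' \
        'bind = SUPER, F9, exec, dictee' \
        'bind = SUPER SHIFT, F9, exec, dictee --translate'
      ;;
    *)
      echo "unsupported window manager: $1" >&2
      return 1
      ;;
  esac
}

wm_bindings sway
```

After appending the lines to ~/.config/sway/config (or the i3/Hyprland equivalent), reload the configuration, for example with swaymsg reload on Sway.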
| Backend | Privacy | Speed | Quality | Setup |
|---|---|---|---|---|
| Canary-1B | 100% local | Built-in | Best | Included with ASR backend |
| LibreTranslate | 100% local | 0.1–0.3s | Good | Guided from setup |
| ollama | 100% local | 2.3–3.4s | Best | Guided from setup |
| translate-shell (Google) | Online | 0.2–0.7s | Good | Included |
| translate-shell (Bing) | Online | 1.7–2.2s | Good | Included |
Switch ASR or translation backend instantly from the command line, the tray icon menu, or the Plasma widget:
```shell
# Switch ASR backend
dictee-switch-backend asr canary
# Switch translation backend
dictee-switch-backend translate ollama
# Show current backends
dictee-switch-backend status
# → ASR: parakeet (dictee.service, active)
# → Translate: google (trans)
```

The tray icon and Plasma widget include sub-menus for switching backends without opening the configuration.
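To show the active backend elsewhere, for instance in a status bar module, you can parse the status output. This sketch assumes the lines look like the sample above. The "# →" prefix only marks output in this README, although the pattern also tolerates leading non-letter characters. parse_backend is a hypothetical helper.

```shell
# Hypothetical helper: read `dictee-switch-backend status` output on stdin and
# print the backend name for the given kind ("ASR" or "Translate").
parse_backend() {
  sed -nE "s/^[^[:alpha:]]*$1: ([^ ]+).*/\1/p"
}

# Offline example on the sample output; with dictee installed you would run:
#   dictee-switch-backend status | parse_backend ASR
printf 'ASR: parakeet (dictee.service, active)\nTranslate: google (trans)\n' | parse_backend Translate
```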
A native KDE Plasma 6 widget with real-time audio visualization during recording, daemon status, and quick controls (dictate, translate, cancel).
Five animation styles are available: Bars, Wave, Pulse, Dots, and Waveform.
All styles support color gradients, adjustable Hanning envelope (shape and center frequency), per-style sensitivity curve, and fine-tuning options (bar count, spacing, radius, speed…).
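As background on the envelope, a Hann (Hanning) window weights bar i of N by w(i) = 0.5 · (1 − cos(2πi / (N − 1))). The outer bars fade to zero, and the centre bar keeps full height. The awk sketch below computes these textbook weights. It does not reproduce the widget's own shape or center-frequency parameters, which are covered in docs/plasmoid.md. hann_weights is a hypothetical name.

```shell
# Hypothetical helper: print N Hann window weights (one per bar), space-separated.
hann_weights() {
  LC_ALL=C awk -v n="$1" 'BEGIN {
    pi = atan2(0, -1)
    for (i = 0; i < n; i++) {
      # A single bar keeps full weight; otherwise apply the Hann formula.
      w = (n > 1) ? 0.5 * (1 - cos(2 * pi * i / (n - 1))) : 1
      sep = (i > 0) ? " " : ""
      printf "%s%.2f", sep, w
    }
    print ""
  }'
}

hann_weights 5   # 0.00 0.50 1.00 0.50 0.00
```

Multiplying each bar's raw level by its weight produces the tapered look.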
```shell
# Install (included in the .deb, or manually)
kpackagetool6 -t Plasma/Applet -i /usr/share/dictee/dictee.plasmoid
```

Right-click on the panel → "Add Widgets…" → search for "Dictée".
For full widget settings documentation, see docs/plasmoid.md.
dictee-tray is the alternative to the KDE Plasma widget for non-KDE desktops (GNOME, Xfce, Sway, Hyprland…). It displays a notification area icon reflecting the real-time state: idle, recording (green), transcribing (blue), daemon stopped (red).
- Left click → start dictation
- Middle click → cancel
- Context menu → all actions (dictation, translation, daemon, configuration)
```shell
# Launch manually
dictee-tray
# Enable at session startup
systemctl --user enable --now dictee-tray
```

The icon automatically adapts to light and dark themes.
Both the Plasma widget and the tray icon include:
- Backend selectors: switch ASR and translation backends without opening dictee-setup
- First-run detection: prompts you to run the setup wizard if dictee is not yet configured
- Install detection (Plasma widget): shows a clear message if dictee is not installed
animation-speech is a standalone project that provides a fullscreen visual animation during recording, with cancellation via the Escape key. It works on any Wayland compositor that supports wlr-layer-shell (KDE Plasma, Sway, Hyprland…).

```shell
sudo dpkg -i animation-speech_1.2.0_all.deb
```

Download: animation-speech releases

Note: animation-speech is not compatible with GNOME (no wlr-layer-shell support). GNOME users can use dictee-tray for visual feedback. Contributions for a GNOME Shell extension are welcome; see the plasmoid source for a reference architecture.
Without any visual interface, dictee works normally but without visual feedback during recording.
```shell
# Simple dictation: transcribe and type
dictee
# With translation (default: system language → English)
dictee --translate
dictee --translate --ollama # 100% local translation via ollama
# Change translation languages
DICTEE_LANG_TARGET=es dictee --translate # → Spanish
# Cancel an ongoing recording (via shortcut or Escape key)
dictee --cancel
# Test post-processing rules
dictee-test-rules # interactive mode
dictee-test-rules --loop # continuous test loop
dictee-test-rules --wav file.wav # test from audio file
# Switch backend from command line
dictee-switch-backend status # show current backends
dictee-switch-backend asr canary # switch to Canary
dictee-switch-backend translate bing # switch translation to Bing
```

dictee includes a configurable text transformation pipeline that runs after transcription:
- Custom rules — regex-based text replacements (e.g., voice commands like "new line", "comma")
- Dictionary — replace common ASR mistakes with correct words
- Continuation — detect incomplete sentences across multiple dictations
- Elisions — French grammar rules (e.g., "le arbre" → "l'arbre")
- Number conversion — spoken numbers to digits (e.g., "vingt-trois" → "23")
- Auto-capitalization — capitalize after sentence-ending punctuation
- LLM correction — optional grammar/spelling fix via Ollama before rules
Configure from dictee --setup → Post-processing tab, or test rules with dictee-test-rules.
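To make the pipeline idea concrete, here is a toy version of a few stages written with GNU sed. It relies on the GNU-only \b, \n and \U/\L extensions. The rules, the order of stages, and the function names (postprocess, continue_text) are illustrative assumptions. They are not dictee's actual rule format or implementation. The single number rule only stands in for a real text2num converter.

```shell
# Toy post-processing pass (hypothetical rules). Stages, in order:
#   1. voice command "comma"     -> ","
#   2. voice command "new line"  -> line break
#   3. dictionary fix            "teh" -> "the"
#   4. French elision            "le arbre" -> "l'arbre"
#   5. number conversion (toy)   "vingt-trois" -> "23"
#   6-7. auto-capitalization     at start and after . ! ?
postprocess() {
  sed -E \
    -e 's/ *\bcomma\b/,/g' \
    -e 's/ *\bnew line\b */\n/g' \
    -e 's/\bteh\b/the/g' \
    -e "s/\b([Ll])[ae] ([aeiou])/\1'\2/g" \
    -e 's/\bvingt-trois\b/23/g' \
    -e 's/^([a-z])/\U\1/' \
    -e 's/([.!?] +)([a-z])/\1\U\2/g'
}

# Toy continuation: if the previous dictation did not end a sentence,
# lowercase the first letter of the next one.
continue_text() {
  prev=$1
  next=$2
  case $prev in
    '' | *[.!?]) printf '%s\n' "$next" ;;
    *) printf '%s\n' "$next" | sed 's/^./\L&/' ;;
  esac
}

printf '%s\n' "le arbre est grand. il fait beau" | postprocess
continue_text "I went to the" "Store this morning."
```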
| Documentation | Description |
|---|---|
| docs/cli-programs.md | CLI binaries, direct usage, ONNX models |
| docs/building.md | Building from source, Cargo features, audio pipeline |
| docs/plasmoid.md | Widget settings, animation styles, configuration details |
| Post-processing | Text transformation pipeline: rules, dictionary, elisions, text2num, capitalization, LLM correction |
v1.2.0 (current): 4 ASR backends (+ Canary), post-processing pipeline, quick backend switching, first-run wizard, dictee-test-rules
- (v1.3) Hotword boosting: bias ASR decoding toward custom names and terms without retraining (beam search + Aho-Corasick in Rust)
- Diarization from tray/plasmoid: select an audio file, get a speaker-labeled transcription
- CLI for speech-to-text (pipe audio, get text)
- dictee-ctl coordinator: a single entry point that eliminates race conditions
- VAD (Voice Activity Detection): hands-free dictation without push-to-talk
- Real-time streaming transcription with live text display
- Built-in visual overlay (replacing the external animation-speech)
- AppImage / Flatpak packaging
- COSMIC / GNOME applet (contributions welcome!)
The transcription engine is built on parakeet-rs by Enes Altun, which provides the Rust library for NVIDIA Parakeet model inference via ONNX Runtime. The Canary-1B backend uses onnx-asr by Ivan Stupakov for ONNX-based ASR inference.
This project is distributed under the GPL-3.0-or-later license (see LICENSE).
The original parakeet-rs code by Enes Altun is under the MIT license (see LICENSE-MIT).
dotool by geb is bundled for keyboard input simulation and is under the GPL-3.0 license.
The Parakeet ONNX models (downloaded separately from HuggingFace) are provided by NVIDIA. This project does not distribute the models.