GitHub - rcspam/dictee: Push-to-talk voice dictation for Linux. 100% local, multilingual (25+ languages), with speaker diarization. Qt frontend, Rust backend on NVIDIA Parakeet via ONNX Runtime. KDE Plasmoid integrated.

Speaking is just easier.

Speak freely, type instantly — 100% local voice dictation for Linux with 25+ languages, translation, speaker diarization, and real-time visual feedback. Text appears right where your cursor is.

Installation • Configuration • Visual interfaces • Usage • Going further • Roadmap

dictee is a complete voice dictation system for Linux. Transcription is performed 100% locally — no audio data ever leaves your machine. Press a shortcut, speak, and the text is typed directly into the active application.

4 ASR backends: Parakeet-TDT (25 languages, native punctuation), Canary-1B (built-in translation, GPU), Vosk (lightweight, ~50 MB), faster-whisper (99 languages)
Daemon mode: model loaded once, near-instant transcriptions (~0.8s on CPU)
Translation: 4 backends — Google, Bing, LibreTranslate (local), ollama (local)
Speaker diarization: who said what, up to 4 speakers via Sortformer (CLI only, not yet in voice dictation)
3 visual interfaces: KDE Plasma widget, notification area icon, fullscreen animation

Installation

Ubuntu / Debian:

Download the .deb from the Releases page, then:

# GPU version (requires the NVIDIA CUDA repository — see "GPU dependencies" below)
sudo dpkg -i dictee-cuda_1.2.0_amd64.deb

# CPU version (any computer, no extra repository needed)
sudo dpkg -i dictee-cpu_1.2.0_amd64.deb

# Install missing dependencies
sudo apt-get install -f

Note: The GPU version requires cuDNN from the NVIDIA CUDA repository, which is not included in standard Ubuntu/Fedora repos. Without it, the GPU version will still work but in CPU mode only.

Fedora / openSUSE:

# GPU version (NVIDIA CUDA — see "GPU dependencies" below)
sudo dnf install ./dictee-cuda-1.2.0-1.x86_64.rpm

# CPU version (any computer)
sudo dnf install ./dictee-cpu-1.2.0-1.x86_64.rpm

Arch Linux (AUR):

A PKGBUILD is available in the repository root. It builds from source and includes all components (x86_64 and aarch64).

aarch64 (ARM64):

Pre-built packages are x86_64 only. On aarch64 (Raspberry Pi 5, Ampere, etc.), build from source — see below. CUDA is limited to NVIDIA Jetson on this architecture; most users will use CPU mode.

Other distributions (.tar.gz):

tar xzf dictee-1.2.0_amd64.tar.gz
cd dictee-1.2.0
sudo ./install.sh

From source:

tar xzf dictee-1.2.0-source.tar.gz
cd dictee-1.2.0-source
cargo build --release --features sortformer
sudo ./install.sh

For detailed build instructions and Cargo features, see docs/building.md.

GPU version: NVIDIA CUDA dependencies

The GPU version (dictee-cuda) requires cuDNN, which is not available in standard Ubuntu/Fedora repositories. You need the NVIDIA CUDA repository:

Ubuntu / Debian:

wget -qO - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/3bf863cc.pub | \
  sudo gpg --dearmor -o /usr/share/keyrings/cuda-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] \
  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/ /" | \
  sudo tee /etc/apt/sources.list.d/cuda-ubuntu2404-x86_64.list
sudo apt update
sudo apt install libcudnn9-cuda-12

Replace ubuntu2404 with your version (ubuntu2204, ubuntu2504, etc.). See NVIDIA CUDA repos.

Fedora:

sudo dnf config-manager addrepo --from-repofile=https://developer.download.nvidia.com/compute/cuda/repos/fedora41/x86_64/cuda-fedora41.repo
sudo dnf install libcudnn9-cuda-12

Without cuDNN, the GPU version falls back to CPU automatically. dictee-setup will detect this and guide you through the setup.
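
To check ahead of time whether the GPU path is likely to be usable, you can test whether a cuDNN shared library is loadable on your system. The sketch below is not how dictee-setup performs its detection (the README does not describe that); it is a minimal, independent check using Python's standard `ctypes` module. The function names `cudnn_available` and `pick_execution_mode` are made up for this example.

```python
import ctypes
import ctypes.util


def cudnn_available() -> bool:
    """Return True if a cuDNN shared library can be located and loaded.

    Assumption: the library is discoverable under the short name "cudnn"
    through the system linker configuration (e.g. after installing
    libcudnn9-cuda-12 from the NVIDIA CUDA repository).
    """
    name = ctypes.util.find_library("cudnn")
    if name is None:
        return False
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


def pick_execution_mode() -> str:
    """Mirror the documented behaviour: GPU when cuDNN is present, else CPU."""
    return "gpu" if cudnn_available() else "cpu"


if __name__ == "__main__":
    print(f"cuDNN found: {cudnn_available()} -> expected mode: {pick_execution_mode()}")
```

A `False` result only means the library was not found by the dynamic linker; it does not say anything about the NVIDIA driver or the GPU itself.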

Configuration

On first launch, a setup wizard guides you through backend selection, model download, and keyboard shortcuts. You can reconfigure anytime from the application menu, tray icon, Plasma widget, or by running dictee --setup from the terminal.

ASR backend

Four mutually exclusive transcription backends, switchable from dictee --setup:

Backend	Languages	Model size	Warm latency	Type
Parakeet-TDT	25	~2.5 GB	~0.8s CPU · ~0.16s GPU	ONNX Runtime (Rust)
Canary-1B	4 (EN, ES, FR, DE)	~5 GB	~0.7s GPU	ONNX Runtime (Python, GPU recommended)
faster-whisper	99	~500 MB–3 GB	~0.3s	CTranslate2 (Python)
Vosk	9+	~50 MB	~1.5s	Python (lightweight)

Each backend runs as a systemd user service — same Unix socket protocol, fully transparent to the user.
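
The point of a shared socket protocol is that the client does not need to know which backend is answering. The sketch below illustrates that idea only. The wire format (one line with a file path in, one line of text out), the function names, and the sample path are all invented for the example; the README does not document dictee's real protocol or socket path. A `socketpair` and a thread stand in for the systemd service so the example runs on its own.

```python
import socket
import threading


def read_line(conn: socket.socket) -> str:
    """Read bytes until a newline (or EOF) and return the decoded line."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
    return data.decode("utf-8").strip()


def fake_backend(conn: socket.socket, name: str) -> None:
    """Stand-in for an ASR daemon: one request line in, one reply line out."""
    with conn:
        wav_path = read_line(conn)
        conn.sendall(f"[{name}] transcript of {wav_path}\n".encode("utf-8"))


def transcribe(conn: socket.socket, wav_path: str) -> str:
    """Client side: identical code regardless of which backend is listening."""
    conn.sendall(f"{wav_path}\n".encode("utf-8"))
    return read_line(conn)


def demo(backend_name: str) -> str:
    # socketpair() gives a connected AF_UNIX pair without a filesystem path.
    client, server = socket.socketpair()
    worker = threading.Thread(target=fake_backend, args=(server, backend_name))
    worker.start()
    try:
        return transcribe(client, "/tmp/sample.wav")
    finally:
        client.close()
        worker.join()


if __name__ == "__main__":
    for backend in ("parakeet", "vosk"):
        print(demo(backend))
```

Swapping `"parakeet"` for `"vosk"` changes only the server side; `transcribe` is untouched, which is the property the shared protocol is meant to provide.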

Keyboard shortcuts

dictee --setup captures and registers shortcuts automatically on KDE Plasma and GNOME. It defines two separate shortcuts: one for dictation, one for dictation + translation.

For tiling WMs (Sway, i3, Hyprland…), the tool shows the command to add manually to your config.

Translation

Backend	Privacy	Speed	Quality	Setup
Canary-1B	100% local	Built-in	Best	Included with ASR backend
LibreTranslate	100% local	0.1–0.3s	Good	Guided from setup
ollama	100% local	2.3–3.4s	Best	Guided from setup
translate-shell (Google)	Online	0.2–0.7s	Good	Included
translate-shell (Bing)	Online	1.7–2.2s	Good	Included

Quick backend switching

Switch ASR or translation backend instantly from the command line, the tray icon menu, or the Plasma widget:

# Switch ASR backend
dictee-switch-backend asr canary

# Switch translation backend
dictee-switch-backend translate ollama

# Show current backends
dictee-switch-backend status
# → ASR: parakeet (dictee.service, active)
# → Translate: google (trans)

The tray icon and Plasma widget include sub-menus for switching backends without opening the configuration.
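
If you want to read the current backends from a script (for a status bar, for instance), the two-line output of `dictee-switch-backend status` can be parsed with a small regular expression. This sketch assumes the output looks exactly like the sample shown above (`ASR: <name> (<detail>)` and `Translate: <name> (<detail>)`); the real command may print additional or differently formatted lines, which this parser simply ignores. `parse_status` is a name chosen for the example.

```python
import re

# Matches lines such as "ASR: parakeet (dictee.service, active)".
STATUS_LINE = re.compile(
    r"^(?P<kind>ASR|Translate):\s+(?P<backend>\S+)\s+\((?P<detail>[^)]*)\)\s*$"
)


def parse_status(output: str) -> dict:
    """Turn the status text into {"asr": {...}, "translate": {...}}."""
    result = {}
    for raw in output.splitlines():
        # Tolerate the "# → " prefix used in the README listing.
        line = re.sub(r"^#\s*→\s*", "", raw.strip())
        match = STATUS_LINE.match(line)
        if match:
            result[match.group("kind").lower()] = {
                "backend": match.group("backend"),
                "detail": match.group("detail"),
            }
    return result


if __name__ == "__main__":
    sample = "ASR: parakeet (dictee.service, active)\nTranslate: google (trans)"
    print(parse_status(sample))
```

In practice the input would come from something like `subprocess.run(["dictee-switch-backend", "status"], capture_output=True, text=True).stdout`.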

Visual interfaces

KDE Plasma widget

A native KDE Plasma 6 widget with real-time audio visualization during recording, daemon status, and quick controls (dictate, translate, cancel).

Five animation styles with Hanning envelope, per-style sensitivity, and optional color gradients:

Bars	Wave	Pulse	Dots	Waveform

All styles support color gradients, adjustable Hanning envelope (shape and center frequency), per-style sensitivity curve, and fine-tuning options (bar count, spacing, radius, speed…).
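
For readers unfamiliar with the term, a Hanning (Hann) envelope tapers values toward zero at both edges and leaves the center at full height, which is why the bars look taller in the middle. The sketch below applies the textbook Hann window to a list of bar levels. It is not taken from the plasmoid source, and it does not model the widget's adjustable shape or center settings; `hann_envelope` and `shape_bars` are illustrative names.

```python
import math


def hann_envelope(count: int) -> list[float]:
    """Standard Hann window: w[i] = 0.5 * (1 - cos(2*pi*i / (count - 1)))."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [1.0]
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * i / (count - 1)))
        for i in range(count)
    ]


def shape_bars(levels: list[float]) -> list[float]:
    """Scale each bar level by the envelope value at its position."""
    envelope = hann_envelope(len(levels))
    return [level * weight for level, weight in zip(levels, envelope)]


if __name__ == "__main__":
    # Five bars at equal loudness: the edges drop to zero, the center stays at 1.
    print([round(v, 2) for v in shape_bars([1.0] * 5)])
```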

# Install (included in .deb, or manually)
kpackagetool6 -t Plasma/Applet -i /usr/share/dictee/dictee.plasmoid

Right-click on the panel → "Add Widgets…" → search for "Dictée".

For full widget settings documentation, see docs/plasmoid.md.

Notification area icon (dictee-tray)

dictee-tray is the alternative to the KDE Plasma widget for non-KDE desktops (GNOME, Xfce, Sway, Hyprland…). It displays a notification area icon reflecting the real-time state: idle, recording (green), transcribing (blue), daemon stopped (red).

Left click → start dictation
Middle click → cancel
Context menu → all actions (dictation, translation, daemon, configuration)

# Launch manually
dictee-tray

# Enable at session startup
systemctl --user enable --now dictee-tray

The icon automatically adapts to light/dark themes.

Both the Plasma widget and the tray icon include:

Backend selectors — switch ASR and translation backends without opening dictee-setup
First-run detection — prompts to run the setup wizard if not yet configured
Install detection (Plasma widget) — shows a clear message if dictee is not installed

animation-speech

animation-speech is a standalone project that provides a fullscreen visual animation during recording, with cancellation via Escape key. It works on any Wayland compositor supporting wlr-layer-shell (KDE Plasma, Sway, Hyprland…).

sudo dpkg -i animation-speech_1.2.0_all.deb

Download: animation-speech releases

Note: animation-speech is not compatible with GNOME (no wlr-layer-shell support). GNOME users can use dictee-tray for visual feedback. Contributions for a GNOME Shell extension are welcome — see the plasmoid source for reference architecture.

Without any visual interface, dictee works normally but without visual feedback during recording.

Usage

# Simple dictation — transcribe and type
dictee

# With translation (default: system language → English)
dictee --translate
dictee --translate --ollama    # 100% local translation via ollama

# Change the translation target language
DICTEE_LANG_TARGET=es dictee --translate    # → Spanish

# Cancel an ongoing recording (via shortcut or Escape key)
dictee --cancel

# Test post-processing rules
dictee-test-rules                    # interactive mode
dictee-test-rules --loop             # continuous test loop
dictee-test-rules --wav file.wav     # test from audio file

# Switch backend from command line
dictee-switch-backend status         # show current backends
dictee-switch-backend asr canary     # switch to Canary
dictee-switch-backend translate bing # switch translation to Bing
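
If you trigger dictee from your own script rather than a keyboard shortcut, the options above map onto a command line and an environment variable. The helper below only assembles them; it uses nothing beyond what is listed in this section (`--translate`, `--ollama`, `DICTEE_LANG_TARGET`). The function name `build_dictee_call` is hypothetical.

```python
import os


def build_dictee_call(translate=False, target_lang=None, use_ollama=False):
    """Return (argv, env) for launching dictee with the requested options."""
    argv = ["dictee"]
    if translate:
        argv.append("--translate")
        if use_ollama:
            # Only meaningful together with --translate, as in the usage above.
            argv.append("--ollama")
    env = dict(os.environ)
    if target_lang:
        env["DICTEE_LANG_TARGET"] = target_lang
    return argv, env


if __name__ == "__main__":
    argv, env = build_dictee_call(translate=True, target_lang="es")
    print(argv, env["DICTEE_LANG_TARGET"])
```

To actually start a dictation, pass the result to `subprocess.run(argv, env=env, check=True)`.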

Going further

Post-processing

dictee includes a configurable text transformation pipeline that runs after transcription:

Custom rules — regex-based text replacements (e.g., voice commands like "new line", "comma")
Dictionary — replace common ASR mistakes with correct words
Continuation — detect incomplete sentences across multiple dictations
Elisions — French grammar rules (e.g., "le arbre" → "l'arbre")
Number conversion — spoken numbers to digits (e.g., "vingt-trois" → "23")
Auto-capitalization — capitalize after sentence-ending punctuation
LLM correction — optional grammar/spelling fix via Ollama before rules

Configure from dictee --setup → Post-processing tab, or test rules with dictee-test-rules.
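
To make the pipeline idea concrete, here is a minimal sketch of three of the steps listed above: custom rules, elisions, and auto-capitalization. It is not the code from dictee-postprocess.py. The rule patterns, the simplified elision logic (words starting with "h" or "y" are left alone), and the order of the steps are assumptions made for the example.

```python
import re

# Hypothetical voice-command rules: (compiled pattern, replacement).
CUSTOM_RULES = [
    (re.compile(r"\s*\bcomma\b", re.IGNORECASE), ","),
    (re.compile(r"\s*\bnew line\b\s*", re.IGNORECASE), "\n"),
]

# Simplified French elision: a short word ending in a vowel, followed by a
# word that starts with a vowel, loses its final letter and gains an apostrophe.
ELISION = re.compile(
    r"\b(le|la|de|je|me|te|se|ne|que)\s+(?=[aeiouâàéèêëîïôûù])",
    re.IGNORECASE,
)

# First letter of the text, or first letter after ., ! or ? plus whitespace.
SENTENCE_START = re.compile(r"(^|[.!?]\s+)(\w)")


def apply_custom_rules(text: str) -> str:
    for pattern, replacement in CUSTOM_RULES:
        text = pattern.sub(replacement, text)
    return text


def apply_elisions(text: str) -> str:
    return ELISION.sub(lambda m: m.group(1)[:-1] + "'", text)


def auto_capitalize(text: str) -> str:
    return SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)


def postprocess(text: str) -> str:
    """Run the steps in a fixed order (the order here is an assumption)."""
    for step in (apply_custom_rules, apply_elisions, auto_capitalize):
        text = step(text)
    return text


if __name__ == "__main__":
    print(postprocess("le arbre est grand. je ai fini"))
```

Keeping each step as a separate function makes it easy to test a rule in isolation, which is the same workflow dictee-test-rules offers for the real pipeline.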

Documentation	Description
docs/cli-programs.md	CLI binaries, direct usage, ONNX models
docs/building.md	Building from source, Cargo features, audio pipeline
docs/plasmoid.md	Widget settings, animation styles, configuration details
Post-processing	Text transformation pipeline: rules, dictionary, elisions, text2num, capitalization, LLM correction

Roadmap

v1.2.0 (current): 4 ASR backends (+ Canary), post-processing pipeline, quick backend switching, first-run wizard, dictee-test-rules

(v1.3) Hotword boosting — bias ASR decoding toward custom names and terms without retraining (beam search + Aho-Corasick in Rust)
Diarization from tray/plasmoid — select audio file, get speaker-labeled transcription
CLI for speech-to-text (pipe audio, get text)
dictee-ctl coordinator — single entry point, eliminates race conditions
VAD (Voice Activity Detection) — hands-free dictation without push-to-talk
Real-time streaming transcription with live text display
Built-in visual overlay (replace external animation-speech)
AppImage / Flatpak packaging
COSMIC / GNOME applet (contributions welcome!)

Credits

The transcription engine is built on parakeet-rs by Enes Altun, which provides the Rust library for NVIDIA Parakeet model inference via ONNX Runtime. The Canary-1B backend uses onnx-asr by Ivan Stupakov for ONNX-based ASR inference.

License

This project is distributed under the GPL-3.0-or-later license (see LICENSE).

The original parakeet-rs code by Enes Altun is under the MIT license (see LICENSE-MIT).

dotool by geb is bundled for keyboard input simulation and is under the GPL-3.0 license.

The Parakeet ONNX models (downloaded separately from HuggingFace) are provided by NVIDIA. This project does not distribute the models.

