Offline audio-to-text transcription tool powered by faster-whisper.
This desktop application transcribes local audio/video files (MP3, WAV, MP4, etc.) into English text using a local Whisper model—no internet required after the first launch.
Note: Designed for batch processing of pre-recorded content. For live streams, use the companion app Stream Recorder and Transcriber.
- Supports multiple formats: MP3, WAV, MP4 (any file with audio track)
- Automatic audio conversion: resamples to 16 kHz mono using PyAV
- Accurate transcription using the Whisper `small` model (int8, CPU-friendly)
- Sentence-by-sentence output with visual formatting (red → blue)
- Export to `.txt` with the same base name as the input file (e.g., `talk.mp4` → `talk_tr.txt`)
- Language mode selection:
  - English → English (forced)
  - Any language → English (auto-detect + translate)
  - Russian → English (explicit)
- Real-time timer and progress feedback
- Graceful stop with logging of partial results
- Temporary file cleanup after processing
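The export naming rule (`talk.mp4` → `talk_tr.txt`) can be sketched with `pathlib`. This is a minimal illustration, not the app's actual code: the function name `transcript_path` and the `out_dir` parameter are assumptions, and the output directory is left to the caller.

```python
from pathlib import Path


def transcript_path(input_file, out_dir):
    """Return the transcript path for an input media file.

    Hypothetical helper: keeps the input's base name, drops its
    extension, and appends the `_tr.txt` suffix.
    """
    src = Path(input_file)
    return Path(out_dir) / f"{src.stem}_tr.txt"
```

For example, `transcript_path("recordings/talk.mp4", ".")` yields a path whose file name is `talk_tr.txt`.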
- Uses faster-whisper with the `small` model (better accuracy than `base`)
- CPU-only, `int8` quantization for low memory usage (~1–2 GB RAM)
- First run downloads ~500 MB model to `~/.cache/huggingface/hub`
- Model is loaded once per session
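The model settings above map onto faster-whisper's `WhisperModel` constructor roughly as follows. This is a sketch under assumptions: the constant names and the `get_model` helper are hypothetical, and caching with `lru_cache` is just one way to get "loaded once per session" behavior; the app may do it differently.

```python
from functools import lru_cache

# Settings taken from the list above; the constant names are illustrative.
MODEL_SIZE = "small"
DEVICE = "cpu"
COMPUTE_TYPE = "int8"


@lru_cache(maxsize=1)
def get_model():
    """Build the model on the first call; later calls reuse the cached instance."""
    # Imported lazily so this module can be loaded without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(MODEL_SIZE, device=DEVICE, compute_type=COMPUTE_TYPE)
```

The first call to `get_model()` triggers the model download if it is not cached yet; repeated calls in the same process return the same object.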
- Python 3.8+
- Required packages:
  `pip install faster-whisper pyside6 pyaudio pydub numpy av`
- ~600 MB free disk space (for model + temp files)
- No FFmpeg required — audio conversion handled by PyAV
- Download or clone the project.
- Install dependencies:
  `pip install -r requirements.txt` (if a `requirements.txt` has been created)
- Run the app:
  `python file.py`
- Click File, choose an audio/video file.
- Select language mode and click START.
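The three language modes could translate into faster-whisper's `language` and `task` arguments along these lines. The mode keys and the `transcribe_options` helper are hypothetical, and the mapping is an assumption about how the app configures each mode, not a description of its code.

```python
# Hypothetical mode keys mapped to keyword arguments for WhisperModel.transcribe().
LANGUAGE_MODES = {
    # English audio, English text: force the language, plain transcription.
    "en_to_en": {"language": "en", "task": "transcribe"},
    # Any audio, English text: language=None lets the model auto-detect.
    "any_to_en": {"language": None, "task": "translate"},
    # Russian audio, English text: explicit source language, translation.
    "ru_to_en": {"language": "ru", "task": "translate"},
}


def transcribe_options(mode):
    """Return a fresh copy of the keyword arguments for the given mode."""
    return dict(LANGUAGE_MODES[mode])
```

With a loaded model, the options would be passed as `segments, info = model.transcribe(path, **transcribe_options("any_to_en"))`.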
On first launch, the `small` Whisper model will be downloaded automatically.
- Text: displayed in real time; each complete sentence appears in red bold, then turns blue bold.
- Transcript file: saved as `{original_name}_tr.txt` in the same directory as the executable.
- Log: `file.log` with entries like:
2025-12-11 15:22:10, meeting.mp4, Length: 42:18, Duration: 03:25, Transcription completed
The log file uses a rotating handler (max 5 MB, 3 backups).
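A log with this entry format and rotation policy can be approximated with the standard library's `RotatingFileHandler`. This is a sketch, not the app's code: the helper names and the logger name are hypothetical, and it assumes `Length` is the media length and `Duration` the processing time, both formatted as `MM:SS`.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_logger(log_path="file.log"):
    """Create a logger that rotates at 5 MB and keeps 3 backups."""
    logger = logging.getLogger("transcriber_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = RotatingFileHandler(
            log_path, maxBytes=5 * 1024 * 1024, backupCount=3, encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s, %(message)s", datefmt="%Y-%m-%d %H:%M:%S")
        )
        logger.addHandler(handler)
    return logger


def mmss(seconds):
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def log_entry(file_name, length_s, duration_s, status):
    """Build the message part of a log line (the timestamp comes from the formatter)."""
    return f"{file_name}, Length: {mmss(length_s)}, Duration: {mmss(duration_s)}, {status}"
```

Calling `make_logger().info(log_entry("meeting.mp4", 2538, 205, "Transcription completed"))` would append a line shaped like the sample entry above, prefixed with the current timestamp.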
| Element | Function |
|---|---|
| Language dropdown | Choose transcription/translation mode |
| File browser | Select MP3/WAV/MP4 (or any AV file with audio) |
| START / STOP | Begin or interrupt processing |
| Timer | Shows elapsed processing time |
| Text box | Live transcription output |
- Do not close the app abruptly during processing—use STOP to ensure partial results are saved.
- The app splits text at sentence boundaries (`. ! ?`) for clean output.
- Input files with no audio track will trigger an error.
- For best results, use clear, speech-focused recordings (e.g., interviews, lectures, podcasts).
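Splitting at `. ! ?` while text arrives incrementally can be sketched as a small buffer helper. The function name and the regular expression are assumptions for illustration; the app's real splitting logic may handle abbreviations or other edge cases differently.

```python
import re

# Split after '.', '!' or '?' when followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def pop_complete_sentences(buffer):
    """Return (complete_sentences, remainder) for the text accumulated so far."""
    parts = _BOUNDARY.split(buffer)
    if parts[-1].rstrip().endswith((".", "!", "?")):
        # The buffer ends on a sentence boundary, so nothing is left over.
        return [p.strip() for p in parts if p.strip()], ""
    # The last piece is still incomplete; keep it for the next chunk.
    return [p.strip() for p in parts[:-1] if p.strip()], parts[-1]
```

Each new chunk of recognized text would be appended to the remainder before the next call, so a sentence is emitted only once its terminator has been seen.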
This tool is for personal or research use.
Underlying libraries:
- `faster-whisper`: MIT License
- `PyAV`: BSD License
- `PySide6`: LGPL/GPL
Built with Python, faster-whisper, and PyAV.
Version: V201125