Note: This is a self-contained PySide6 GUI application designed for Windows (with
`ffmpeg.exe` support), but it can be adapted to other platforms.
- Real-time transcription of live audio streams (HLS, Icecast, HTTP, etc.)
- Predefined stream presets (BBC, CNN, Al Jazeera, VOA, Bloomberg, and more)
- Manual URL input for custom streams
- Configurable recording duration (1–30 minutes)
- Adjustable audio chunk size for latency/performance tuning
- Live transcription display with color-coded formatting
- Automatic audio saving as WAV → MP3 (192 kbps)
- Usage logging with stream name, action type, and duration
- Model-on-demand loading: Whisper `base` model (int8, CPU-only) loads only when needed
- Listen-only mode: transcribe without saving audio
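As a rough illustration of how the stream decoding and the WAV → MP3 step could be driven through FFmpeg, the sketch below only builds the command lines. The function names, the 16 kHz mono PCM format, and the use of FFmpeg for the MP3 conversion are assumptions; the application may do this differently (for example through pydub).

```python
import os
from typing import List

SAMPLE_RATE = 16000  # assumption: 16 kHz mono, the rate Whisper models work with


def decode_stream_cmd(url: str, ffmpeg: str = "ffmpeg") -> List[str]:
    """Build a command that decodes a stream URL to raw 16-bit mono PCM on stdout."""
    return [
        ffmpeg, "-loglevel", "error",
        "-i", url,                 # HLS, Icecast or plain HTTP input
        "-vn",                     # drop any video track
        "-ac", "1",                # mono
        "-ar", str(SAMPLE_RATE),   # resample
        "-f", "s16le", "-",        # raw little-endian PCM written to stdout
    ]


def wav_to_mp3_cmd(wav_path: str, ffmpeg: str = "ffmpeg") -> List[str]:
    """Build a command that converts a WAV file to a 192 kbps MP3 next to it."""
    mp3_path = os.path.splitext(wav_path)[0] + ".mp3"
    return [ffmpeg, "-y", "-i", wav_path, "-b:a", "192k", mp3_path]
```

Either list can be passed to `subprocess.Popen` or `subprocess.run`; keeping command construction separate from execution makes the flags easy to inspect.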
- Uses faster-whisper with the `base` model (English-optimized)
- Runs locally on CPU (`int8` quantization for efficiency)
- Language is locked to English (with VAD filtering enabled)
- Model is loaded once per session and reused across recordings
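The settings listed above map onto faster-whisper roughly as follows. This is a minimal sketch, not the application's actual code: `get_model` and `transcribe_chunk` are hypothetical helper names, and the lazy import is one way to get the load-once, on-demand behavior.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_model():
    """Load the Whisper base model on first use and reuse it afterwards."""
    from faster_whisper import WhisperModel  # imported lazily so startup stays fast
    return WhisperModel("base", device="cpu", compute_type="int8")


def transcribe_chunk(samples, model=None):
    """Transcribe one audio chunk (file path or float32 NumPy array at 16 kHz)."""
    if model is None:
        model = get_model()
    # Language locked to English, with voice-activity-detection filtering enabled.
    segments, _info = model.transcribe(samples, language="en", vad_filter=True)
    return " ".join(segment.text.strip() for segment in segments)
```

Passing `model` explicitly is optional; it mainly makes the function easy to exercise with a stand-in object.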
- Python 3.8+
- Required packages:
  `pip install faster-whisper pyside6 pyaudio pydub colorama requests numpy`
- FFmpeg (bundled as `ffmpeg.exe` in frozen builds; otherwise must be in `PATH`)
- ~500 MB disk space for model cache (`~/.cache/huggingface/hub`)
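One way to check the FFmpeg requirement before starting is sketched below. `find_ffmpeg` is a hypothetical helper, and the frozen-build branch assumes a PyInstaller-style bundle (which sets `sys.frozen` and, in one-file mode, `sys._MEIPASS`); other packagers may lay files out differently.

```python
import os
import shutil
import sys
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return a path to FFmpeg, preferring a bundled ffmpeg.exe in frozen builds."""
    if getattr(sys, "frozen", False):
        # Assumption: the executable sits in the bundle's extraction directory,
        # or next to the frozen program if no such directory is set.
        base = getattr(sys, "_MEIPASS", os.path.dirname(sys.executable))
        bundled = os.path.join(base, "ffmpeg.exe")
        if os.path.isfile(bundled):
            return bundled
    # Otherwise fall back to whatever is on PATH (None if not installed).
    return shutil.which("ffmpeg")
```

A `None` result means FFmpeg is neither bundled nor on `PATH`, which is worth reporting to the user before any stream is opened.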
- Clone or download the project.
- Install dependencies:
  `pip install -r requirements.txt` (if you create one)
- Run the application:
  `python stream.py`
- Choose a stream from the dropdown or enter a custom URL.
- Set duration (1–30 min), optionally adjust chunk size, and click Record.
The first run will download the Whisper `base` model (~150 MB).
- Text: displayed in real time; each new sentence appears in red bold, then turns blue bold after processing.
- Audio: saved as `StreamName_audio_YYYY-MM-DD_HH-MM-SS.wav` → auto-converted to `.mp3`.
- Log: `stream.log` in the app directory with entries like:
  `2025-12-11 14:30:22 BBC - Record: 2:15`
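The file naming pattern and log format above can be reproduced with a couple of small helpers. The function names are hypothetical, and reading the trailing `2:15` as minutes:seconds is an assumption based on the sample entry.

```python
from datetime import datetime


def audio_filename(stream_name: str, when: datetime, ext: str = "wav") -> str:
    """Build a name like StreamName_audio_YYYY-MM-DD_HH-MM-SS.wav."""
    return f"{stream_name}_audio_{when:%Y-%m-%d_%H-%M-%S}.{ext}"


def log_entry(stream_name: str, action: str, seconds: float, when: datetime) -> str:
    """Build a log line with timestamp, stream name, action and duration."""
    minutes, secs = divmod(int(seconds), 60)  # assumed minutes:seconds format
    return f"{when:%Y-%m-%d %H:%M:%S} {stream_name} - {action}: {minutes}:{secs:02d}"


when = datetime(2025, 12, 11, 14, 30, 22)
print(audio_filename("BBC", when))          # BBC_audio_2025-12-11_14-30-22.wav
print(log_entry("BBC", "Record", 135, when))  # 2025-12-11 14:30:22 BBC - Record: 2:15
```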
| Control | Purpose |
|---|---|
| Stream selector | Choose from built-in stations or type a URL |
| Duration (min) | Max recording/transcription time |
| Audio chunk size | Controls block duration: 160000 = 5s, 320000 = 10s, ..., 1920000 = 60s |
| Record button | Start/stop recording + transcription + audio save |
| Listen button | Transcribe only (no file saved) |
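The chunk-size values in the table all work out to 32,000 units per second. The sketch below assumes these are bytes of 16 kHz, 16-bit mono PCM (16000 × 2 × 1), which matches the numbers but is not confirmed by the source; the helper names are illustrative.

```python
SAMPLE_RATE = 16000      # assumed samples per second
BYTES_PER_SAMPLE = 2     # assumed 16-bit PCM
CHANNELS = 1             # assumed mono
BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS  # 32000


def chunk_seconds(chunk_size: int) -> float:
    """Convert a chunk size from the settings into its duration in seconds."""
    return chunk_size / BYTES_PER_SECOND


def chunk_size_for(seconds: float) -> int:
    """Convert a desired block duration in seconds into a chunk size."""
    return int(seconds * BYTES_PER_SECOND)


for size in (160000, 320000, 1920000):
    print(size, "->", chunk_seconds(size), "s")  # 5.0 s, 10.0 s, 60.0 s
```

Larger chunks give the model more context per call at the cost of higher latency before text appears.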
This project is for personal/educational use.
Underlying components:
- `faster-whisper`: MIT License
- FFmpeg: GPL/LGPL
- PySide6: LGPL/GPL
- Designed for stable, low-bandwidth English streams (news, talk radio).
- Not optimized for noisy or multilingual content.
- Avoid very short chunk sizes (<5 s), which may reduce transcription accuracy.
- On first launch, ensure internet access for model download and stream validation.
Made with ❤️ using Python, faster-whisper, and PySide6.
Version: V211125