Self-hosted speech-to-text API service powered by OpenAI Whisper
Gosper is a production-ready transcription service that runs entirely on your infrastructure. Deploy it to k3s, send audio files via HTTP API, and get accurate transcripts back—all without your users' voice data ever touching a cloud provider's servers.
Building an app with speech-to-text features? You have options:
| Your Choice | Gosper | Cloud APIs (Google, AWS, Azure) |
|---|---|---|
| Privacy | ✅ Audio stays on your servers | ❌ Audio sent to cloud |
| Cost | ✅ No per-minute fees (you pay only for your own hardware) | ❌ Billed per minute of audio |
| Control | ✅ You own the infrastructure | ❌ Vendor lock-in |
| Accuracy | ✅ OpenAI Whisper | ✅ High accuracy |
Gosper was built for developers who:
- 🔒 Care about user privacy and data sovereignty
- 💰 Want to avoid escalating per-minute API costs
- 🏗️ Prefer self-hosted infrastructure (homelab, VPS, on-prem)
- 🚀 Need a production-ready backend for mobile/web apps
- 🛠️ Value clean, extensible architecture
"Your users' voices shouldn't be a subscription service."
- Deploy Gosper to your k3s cluster or Docker host
- Integrate your mobile/web app with the /api/transcribe endpoint
- Send audio files (WAV or MP3) via HTTP POST
- Receive accurate JSON transcripts powered by Whisper
- Scale with your userbase—no per-minute costs
- Build the CLI binary using make build-all.
- Download a Model: Gosper needs a Whisper model to run.
# Build the model downloader utility
make -C whisper.cpp/bindings/go examples
# Download the tiny English model
./whisper.cpp/bindings/go/build_go/go-model-download -out whisper.cpp/models ggml-tiny.en.bin
- Transcribe an Audio File:
Note: The current version has a known issue with MP3 decoding. Please use WAV files for transcription.
# Transcribe a WAV file
./dist/gosper transcribe path/to/your/audio.wav --model whisper.cpp/models/ggml-tiny.en.bin
All processing happens locally using whisper.cpp, a high-performance C++ implementation of OpenAI's Whisper model.
Note: The public Docker image gosper/server:latest is currently out of date. Please build the image locally.
# Build the server image
docker build -f Dockerfile.server -t gosper/server:local .
# Run the service
docker run -p 8080:8080 gosper/server:local
# Transcribe an audio file
curl -X POST http://localhost:8080/api/transcribe \
-F "[email protected]" \
-F "lang=auto"
🎉 That's it! Your transcript is returned as JSON.
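For apps that call the service from code rather than curl, the same request can be built with Go's standard library. This is a minimal sketch, not an official client: the file field name `audio`, the helper name `newTranscribeRequest`, and the sample file name are assumptions for illustration, so check the API reference for the real field name and response schema.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"mime/multipart"
	"net/http"
)

// newTranscribeRequest builds a multipart POST for the /api/transcribe endpoint.
// fileField is the form field carrying the audio; its real name is an assumption here.
func newTranscribeRequest(baseURL, fileField, filename string, audio []byte, lang string) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	// Add the audio bytes as a file part.
	part, err := w.CreateFormFile(fileField, filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	// "lang" mirrors the curl example above.
	if err := w.WriteField("lang", lang); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/transcribe", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	// Placeholder bytes stand in for real audio data read from disk.
	req, err := newTranscribeRequest("http://localhost:8080", "audio", "sample.wav", []byte("placeholder bytes"), "auto")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it: resp, err := http.DefaultClient.Do(req), then decode resp.Body as JSON.
}
```

The request is built separately from sending so it can be inspected or unit-tested without a running server.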
- 📦 Docker & Docker Compose - Run locally in seconds
- ☸️ Kubernetes/k3s Deployment - Production setup
- 💻 CLI Installation - Command-line usage
- 🛠️ Build from Source - Development setup
- 🎙️ Multiple Interfaces: HTTP API, CLI, and Web UI
- 🎵 Format Support: WAV and MP3 with automatic detection (MP3 decoding currently has a known issue; prefer WAV)
- 🌍 Multi-Language: 100+ languages with auto-detection
- ⚡ Fast: Optimized whisper.cpp with parallelization
- 🐳 Production-Ready: Docker images and k8s manifests included
- 🏗️ Clean Architecture: Hexagonal design, 85%+ test coverage
- 📴 Offline Capable: Models cached locally, no internet required
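To illustrate what automatic format detection can look like, here is a hedged sketch that inspects the first bytes of a file. It is a common heuristic, not necessarily how Gosper does it, and the function name `detectFormat` is hypothetical. WAV files are RIFF containers with the form type "WAVE"; MP3 files usually start with an ID3v2 tag or an MPEG frame sync.

```go
package main

import (
	"bytes"
	"fmt"
)

// detectFormat guesses the audio format from the leading bytes of a file.
// It returns "wav", "mp3", or "unknown". This is a heuristic sketch only.
func detectFormat(header []byte) string {
	// WAV: "RIFF" at offset 0 and "WAVE" at offset 8.
	if len(header) >= 12 &&
		bytes.Equal(header[0:4], []byte("RIFF")) &&
		bytes.Equal(header[8:12], []byte("WAVE")) {
		return "wav"
	}
	// MP3 with an ID3v2 tag starts with the ASCII bytes "ID3".
	if len(header) >= 3 && bytes.Equal(header[0:3], []byte("ID3")) {
		return "mp3"
	}
	// MPEG audio frames begin with 11 set sync bits (0xFF, then top 3 bits set).
	// Other MPEG audio layers match too, so treat this as a best guess.
	if len(header) >= 2 && header[0] == 0xFF && header[1]&0xE0 == 0xE0 {
		return "mp3"
	}
	return "unknown"
}

func main() {
	fmt.Println(detectFormat([]byte("RIFF\x24\x00\x00\x00WAVEfmt ")))
	fmt.Println(detectFormat([]byte("ID3\x04\x00")))
	fmt.Println(detectFormat([]byte{0xFF, 0xFB, 0x90, 0x00}))
	fmt.Println(detectFormat([]byte("hello")))
}
```

Checking content rather than the file extension means a mislabeled upload is still routed to the right decoder.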
Gosper follows hexagonal (ports & adapters) architecture:
┌─────────────────────────────────────────────┐
│              Inbound Adapters               │
│           (HTTP API, CLI, Web UI)           │
└──────────────┬──────────────────────────────┘
               │
┌──────────────▼──────────────────────────────┐
│                  Use Cases                  │
│    (TranscribeFile, RecordAndTranscribe)    │
└──────────────┬──────────────────────────────┘
               │
┌──────────────▼──────────────────────────────┐
│              Outbound Adapters              │
│   (Whisper.cpp, Audio Decoders, Storage)    │
└─────────────────────────────────────────────┘
📚 Full Architecture Guide - Detailed layer descriptions and extension points
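The layering above can be sketched in a few lines of Go. This is an illustration of the ports and adapters idea under stated assumptions, not Gosper's actual code: the interface names `Decoder` and `Transcriber`, the `Run` method, and the fake adapters are all hypothetical. Only the use case name `TranscribeFile` comes from the diagram.

```go
package main

import (
	"errors"
	"fmt"
)

// Decoder is an outbound port: it turns encoded audio into PCM samples.
type Decoder interface {
	Decode(data []byte) ([]float32, error)
}

// Transcriber is an outbound port: the use case depends on this interface,
// not on whisper.cpp directly.
type Transcriber interface {
	Transcribe(samples []float32, lang string) (string, error)
}

// TranscribeFile is the use case, wired with whichever adapters the caller supplies.
type TranscribeFile struct {
	Decoder     Decoder
	Transcriber Transcriber
}

// Run decodes the audio and passes the samples to the transcriber.
func (uc TranscribeFile) Run(data []byte, lang string) (string, error) {
	if len(data) == 0 {
		return "", errors.New("empty audio input")
	}
	samples, err := uc.Decoder.Decode(data)
	if err != nil {
		return "", fmt.Errorf("decode: %w", err)
	}
	return uc.Transcriber.Transcribe(samples, lang)
}

// fakeDecoder and fakeTranscriber are stand-in adapters for tests and demos.
type fakeDecoder struct{}

func (fakeDecoder) Decode(data []byte) ([]float32, error) {
	return make([]float32, len(data)), nil
}

type fakeTranscriber struct{}

func (fakeTranscriber) Transcribe(samples []float32, lang string) (string, error) {
	return fmt.Sprintf("%d samples, lang=%s", len(samples), lang), nil
}

func main() {
	uc := TranscribeFile{Decoder: fakeDecoder{}, Transcriber: fakeTranscriber{}}
	text, err := uc.Run([]byte{1, 2, 3}, "auto")
	fmt.Println(text, err)
}
```

Because the use case only sees interfaces, an inbound adapter (HTTP handler, CLI command) can reuse it unchanged, and tests can swap in fakes without loading a model.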
- ✅ Linux (x86_64, ARM64) - Ubuntu 20.04+, Debian 11+
- ✅ macOS (Intel, Apple Silicon) - macOS 11+
- ✅ Windows (x86_64) - Windows 10+
- ✅ Docker - Multi-platform images available
- ✅ Kubernetes - k3s/k8s manifests and Helm charts
🔧 Platform-Specific Notes - Build requirements and known issues
- 🚀 Quick Start Guide - Get transcribing in minutes
- ☸️ Deployment Guide - Production k3s/k8s setup
- 🛠️ Build from Source - Development environment
- 🏗️ Architecture - Design principles and code structure
- 🔌 API Reference - HTTP API endpoints and examples
- ⚙️ Configuration - Environment variables and models
- 🩺 Troubleshooting - Common issues and solutions
- 🤝 Contributing Guide - Development workflow and guidelines
We welcome contributions! Gosper aims to be not just useful, but also forkable and extensible.
- Check existing issues at github.com/cjpais/go-whisper/issues
- Read the contributing guide at docs/CONTRIBUTING.md
- Fork and create a feature branch
- Write tests - We maintain 85%+ coverage
- Submit a pull request with a clear description
# Clone and build
git clone https://github.com/yourusername/gosper.git
cd gosper
# Build all binaries
make build-all
# Run tests
make test
See docs/BUILD.md for detailed setup instructions.
MIT License - see LICENSE file for details.
Gosper stands on the shoulders of giants:
- OpenAI Whisper - Revolutionary speech recognition model
- whisper.cpp - High-performance C++ implementation
- hajimehoshi/go-mp3 - Pure Go MP3 decoder
- Go Community - Excellent language and ecosystem
"Self-host your speech-to-text. Own your data. Build without limits."