A production-ready, self-hosted Text-to-Speech API powered by Coqui XTTS-v2 with voice cloning, multi-language support, and a beautiful admin dashboard. Deploy for free on HuggingFace Spaces!
Author: Anubhav N. Mishra
- High-Quality TTS - XTTS-v2 model for natural-sounding speech
- 17 Languages - English, Hindi, Spanish, French, German, Japanese, Chinese, and more
- Voice Cloning - Clone any voice from 6-30 second audio samples
- Admin Dashboard - Beautiful UI to manage API keys and view usage analytics
- Multi-Tier Auth - Owner + Friends access system with rate limiting
- Async Processing - Queue long texts for background processing
- Audio Caching - Automatic caching for repeated requests
- Usage Analytics - Track requests, characters, audio minutes, languages, and voices
- 100% Free - Deploy on HuggingFace Spaces at no cost
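
Audio caching for repeated requests implies that two identical requests resolve to the same stored audio. A minimal sketch of one way to derive such a lookup key, assuming the key covers the text, voice, language, format, and speed of a request. The helper name `cache_key` and the choice of SHA-256 are illustrative assumptions, not a description of the project's `cache.py`:

```python
import hashlib
import json


def cache_key(text, voice="default", language="en", fmt="mp3", speed=1.0):
    """Return a deterministic hex digest for one synthesis request (hypothetical helper)."""
    # Serialize with sorted keys so the same request always yields the same string.
    payload = json.dumps(
        {
            "text": text,
            "voice": voice,
            "language": language,
            "format": fmt,
            "speed": speed,
        },
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any change to a field, such as a different speed, produces a different key, so cached audio is only reused for an exact repeat.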
- Go to HuggingFace Spaces
- Click "Create new Space"
- Configure:
  - Space name: `tts` (or any name you prefer)
  - SDK: `Docker`
  - Visibility: `Public` (recommended) or `Private`
- Click "Create Space"
Run this Python script locally to generate secure tokens:
import secrets

print("=" * 60)
print("YOUR TTS API TOKENS - SAVE THESE SECURELY!")
print("=" * 60)
print(f"\nOWNER_TOKEN={secrets.token_urlsafe(32)}")
print("OWNER_NAME=YourName")
print()
# One token/name pair for each of the five friend slots
for i in range(1, 6):
    print(f"FRIEND_{i}_TOKEN={secrets.token_urlsafe(32)}")
    print(f"FRIEND_{i}_NAME=Friend{i}")
print("\n" + "=" * 60)

In your HuggingFace Space:
- Go to Settings → Repository secrets
- Add these secrets:
| Secret Name | Description |
|---|---|
| `OWNER_TOKEN` | Your master access token (unlimited access) |
| `OWNER_NAME` | Your display name |
| `FRIEND_1_TOKEN` | Friend 1's access token |
| `FRIEND_1_NAME` | Friend 1's display name |
| `FRIEND_2_TOKEN` | Friend 2's access token (optional) |
| ... | Add up to 5 friends |
# Clone this repository (stay in the parent directory so both clones sit side by side)
git clone https://github.com/anubhav-n-mishra/xtts-api.git
# Clone your HuggingFace Space
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME hf-space
cd hf-space
# Copy all files
cp -r ../xtts-api/* .
# Push to HuggingFace
git add -A
git commit -m "Initial deployment"
git push origin main

- Build time: 5-10 minutes
- First TTS request: Model downloads (~2GB), takes 2-3 minutes
- Dashboard: https://YOUR_USERNAME-YOUR_SPACE.hf.space/static/index.html
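
Because the build takes several minutes, a script that calls the API right after pushing may want to wait until the Space answers. A minimal sketch that polls the unauthenticated `/health` endpoint, assuming it returns HTTP 200 once the server is up. The response body is not documented here, so only the status code is checked, and whether `/health` also reflects the model download is not stated. The names `http_probe` and `wait_until_ready` are hypothetical:

```python
import time
import urllib.error
import urllib.request


def http_probe(url):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(base_url, timeout=600.0, interval=5.0,
                     probe=http_probe, sleep=time.sleep):
    """Poll `<base_url>/health` until it responds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(f"{base_url}/health"):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_until_ready("https://YOUR_USERNAME-YOUR_SPACE.hf.space")` returns `True` as soon as the endpoint responds and `False` if ten minutes pass first.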
- Python 3.10 or higher
- FFmpeg installed on your system
- CUDA-capable GPU (optional, for faster inference)
# Clone the repository
git clone https://github.com/anubhav-n-mishra/xtts-api.git
cd xtts-api
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Linux/Mac:
source venv/bin/activate
# Windows:
.\venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set environment variables
# Linux/Mac:
export OWNER_TOKEN="your_secure_token_here"
export OWNER_NAME="YourName"
export DATA_DIR="./data"
# Windows PowerShell:
$env:OWNER_TOKEN="your_secure_token_here"
$env:OWNER_NAME="YourName"
$env:DATA_DIR="./data"
# Run the server
uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload

# Build the image
docker build -t xtts-api .
# Run the container
docker run -p 7860:7860 \
  -e OWNER_TOKEN="your_token_here" \
  -e OWNER_NAME="YourName" \
  -v "$(pwd)/data:/data" \
  xtts-api

All endpoints (except `/health`) require authentication. Pass your token in the `key` header:
curl -H "key: YOUR_API_KEY" https://your-space.hf.space/voices

Convert text to speech audio.

curl -X POST "https://your-space.hf.space/tts" \
  -H "key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test!",
    "voice": "default",
    "language": "en",
    "format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | string | required | Text to synthesize (max 5000 chars) |
| `voice` | string | "default" | Voice ID |
| `language` | string | "en" | Language code |
| `format` | string | "mp3" | Output format: "mp3" or "wav" |
| `speed` | float | 1.0 | Speech speed (0.5-2.0) |
| `async_mode` | bool | false | Return job_id for async processing |
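
The documented limits can be checked on the client before a request is sent, which avoids a round trip for input that is bound to fail. A minimal sketch, assuming the server enforces exactly the limits in the table above. The helper name `build_tts_payload` is hypothetical, and how the server reports violations is not covered here:

```python
ALLOWED_FORMATS = {"mp3", "wav"}
MAX_TEXT_CHARS = 5000  # documented maximum for `text`


def build_tts_payload(text, voice="default", language="en", fmt="mp3",
                      speed=1.0, async_mode=False):
    """Validate arguments against the documented limits and return the JSON body."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError("format must be 'mp3' or 'wav'")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    return {
        "text": text,
        "voice": voice,
        "language": language,
        "format": fmt,
        "speed": speed,
        "async_mode": async_mode,
    }
```

The returned dictionary can be passed as the JSON body of a `POST /tts` request.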
curl -H "key: YOUR_API_KEY" "https://your-space.hf.space/voices"

Response:

{
  "voices": [
    {"voice_id": "default", "description": "Built-in default voice", "type": "built-in"},
    {"voice_id": "female_1", "description": "Built-in female voice 1", "type": "built-in"},
    {"voice_id": "my_clone", "description": "Cloned voice: my_clone", "type": "cloned"}
  ],
  "total": 3
}

Upload audio to create a custom voice.
curl -X POST "https://your-space.hf.space/clone" \
  -H "key: YOUR_API_KEY" \
  -F "audio=@sample.wav" \
  -F "name=my_voice" \
  -F "description=My custom cloned voice"

Requirements:
- Audio: WAV or MP3 format
- Duration: 6-30 seconds (ideal), 3-60 seconds (allowed)
- Quality: Clear speech, minimal background noise
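
The duration limits above can be checked locally before uploading a sample. A minimal sketch using Python's standard `wave` module, which reads WAV files only, so an MP3 sample would need a different library. The helper names `wav_duration_seconds` and `classify_sample` are hypothetical, and the sketch assumes the 3-60 second bounds are inclusive:

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_sample(duration):
    """Map a duration in seconds onto the documented ranges."""
    if duration < 3 or duration > 60:
        return "rejected"  # outside the allowed 3-60 second range
    if 6 <= duration <= 30:
        return "ideal"     # inside the recommended 6-30 second range
    return "allowed"       # accepted, but shorter or longer than ideal
```

For instance, `classify_sample(wav_duration_seconds("sample.wav"))` indicates whether `sample.wav` is worth uploading as is.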
curl -H "key: YOUR_API_KEY" "https://your-space.hf.space/languages"

Create a new API key (requires master token).

curl -X POST "https://your-space.hf.space/keys" \
  -H "key: YOUR_MASTER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "My App"}'

curl -H "key: YOUR_MASTER_TOKEN" "https://your-space.hf.space/keys"

curl -X DELETE "https://your-space.hf.space/keys/123" \
  -H "key: YOUR_MASTER_TOKEN"

curl -H "key: YOUR_API_KEY" "https://your-space.hf.space/stats"

Response:
{
"total_requests": 150,
"total_audio_seconds": 3600.5,
"total_characters": 50000,
"language_usage": {"en": 100, "hi": 30, "es": 20},
"voice_usage": {"default": 120, "my_clone": 30}
}

curl -H "key: YOUR_API_KEY" "https://your-space.hf.space/usage?limit=10"

Interactive Swagger docs available at: https://your-space.hf.space/docs
| Code | Language | Code | Language |
|---|---|---|---|
| `en` | English | `ko` | Korean |
| `es` | Spanish | `ja` | Japanese |
| `fr` | French | `zh-cn` | Chinese (Simplified) |
| `de` | German | `ar` | Arabic |
| `it` | Italian | `hi` | Hindi |
| `pt` | Portuguese | `pl` | Polish |
| `ru` | Russian | `tr` | Turkish |
| `nl` | Dutch | `cs` | Czech |
| `hu` | Hungarian | | |
xtts-api/
├── app/
│   ├── __init__.py        # Package init
│   ├── main.py            # FastAPI application & routes
│   ├── tts_engine.py      # XTTS-v2 model wrapper
│   ├── database.py        # SQLite database & usage tracking
│   ├── auth.py            # Authentication module
│   ├── middleware.py      # Auth & rate limiting middleware
│   ├── queue.py           # Async job processing queue
│   ├── cache.py           # Audio caching system
│   └── static/
│       └── index.html     # Admin dashboard
├── Dockerfile             # HuggingFace Spaces compatible
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── LICENSE                # MIT License
| Tier | Rate Limit | Max API Keys | Description |
|---|---|---|---|
| Owner | Unlimited | Unlimited | Full admin access, all features |
| Friend | 3 req/sec | 5 keys | Shared access for trusted friends |
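
A limit such as 3 requests per second can be enforced with a sliding window over recent request timestamps. The sketch below illustrates that general technique only. It is not the project's `middleware.py`, and the class name `SlidingWindowLimiter` and its parameters are assumptions:

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls within any `window_seconds` span."""

    def __init__(self, max_requests=3, window_seconds=1.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps = deque()

    def allow(self):
        """Return True and record the request if it fits in the current window."""
        now = self.clock()
        # Discard timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            self._stamps.append(now)
            return True
        return False
```

A server would typically keep one limiter per token or API key, for example in a dictionary keyed by the value of the `key` header, and skip the check entirely for the Owner tier.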
- Master Token: Used to log in to the dashboard and create API keys
- API Key: Used by applications to call TTS endpoints
| Variable | Required | Default | Description |
|---|---|---|---|
| `OWNER_TOKEN` | Yes | - | Owner's master authentication token |
| `OWNER_NAME` | No | "owner" | Owner's display name in dashboard |
| `FRIEND_1_TOKEN` | No | - | Friend 1's master token |
| `FRIEND_1_NAME` | No | "friend_1" | Friend 1's display name |
| `FRIEND_2_TOKEN` | No | - | Friend 2's master token |
| ... | ... | ... | Up to `FRIEND_5_TOKEN` / `FRIEND_5_NAME` |
| `DATA_DIR` | No | "/data" | Directory for persistent storage |
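
To make the table concrete, the following sketch shows how these variables could be read into a token lookup at startup, applying the documented defaults. It is an illustration under assumptions, not the project's `auth.py`. The function name `load_master_tokens` and the shape of the returned dictionary are hypothetical:

```python
import os


def load_master_tokens(env=None, max_friends=5):
    """Build a {token: {"name", "tier"}} mapping from environment variables."""
    env = os.environ if env is None else env

    owner_token = env.get("OWNER_TOKEN")
    if not owner_token:
        # OWNER_TOKEN is the only required variable.
        raise RuntimeError("OWNER_TOKEN is required")

    tokens = {
        owner_token: {"name": env.get("OWNER_NAME", "owner"), "tier": "owner"},
    }
    # Friend slots are optional; a slot without a token is skipped.
    for i in range(1, max_friends + 1):
        token = env.get(f"FRIEND_{i}_TOKEN")
        if token:
            tokens[token] = {
                "name": env.get(f"FRIEND_{i}_NAME", f"friend_{i}"),
                "tier": "friend",
            }
    return tokens
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment.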
import requests

API_URL = "https://your-space.hf.space"
API_KEY = "tts_your_api_key_here"

def text_to_speech(text, language="en", voice="default"):
    """Request MP3 audio for `text` and return the raw bytes."""
    response = requests.post(
        f"{API_URL}/tts",
        headers={"key": API_KEY},
        json={
            "text": text,
            "language": language,
            "voice": voice,
            "format": "mp3"
        }
    )
    response.raise_for_status()
    return response.content

# Generate and save audio
audio = text_to_speech("Hello from Python!", language="en")
with open("output.mp3", "wb") as f:
    f.write(audio)

const fetch = require('node-fetch');
const fs = require('fs');
const API_URL = 'https://your-space.hf.space';
const API_KEY = 'tts_your_api_key_here';

async function textToSpeech(text, language = 'en') {
  const response = await fetch(`${API_URL}/tts`, {
    method: 'POST',
    headers: {
      'key': API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: text,
      language: language,
      format: 'mp3'
    })
  });
  const buffer = await response.buffer();
  fs.writeFileSync('output.mp3', buffer);
}

textToSpeech('Hello from JavaScript!');

async function generateSpeech(text) {
  const response = await fetch('https://your-space.hf.space/tts', {
    method: 'POST',
    headers: {
      'key': 'your_api_key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: text,
      language: 'en',
      format: 'mp3'
    })
  });
  const audioBlob = await response.blob();
  const audioUrl = URL.createObjectURL(audioBlob);
  const audio = new Audio(audioUrl);
  audio.play();
}

Contributions are welcome! Here's how you can help:
- Fork the repository
- Create your feature branch: `git checkout -b feature/AmazingFeature`
- Commit your changes: `git commit -m 'Add AmazingFeature'`
- Push to the branch: `git push origin feature/AmazingFeature`
- Open a Pull Request
# Clone your fork
git clone https://github.com/YOUR_USERNAME/xtts-api.git
cd xtts-api
# Create branch
git checkout -b feature/my-feature
# Make changes, test locally, then submit PR

This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 Anubhav N. Mishra
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
- Coqui TTS - The amazing XTTS-v2 model
- FastAPI - Modern Python web framework
- HuggingFace - Free model hosting and Spaces
- Issues: GitHub Issues
- Author: Anubhav N. Mishra
If you find this project useful, please give it a star! It helps others discover it and motivates continued development.
Made with ❤️ by Anubhav N. Mishra