Qwen3-TTS-API

OpenAI-compatible Text-to-Speech API powered by Qwen3-TTS with faster-qwen3-tts CUDA graph acceleration.

7-10x faster than stock inference, with real-time generation on RTX 4060 / 5060 Ti class GPUs. No flash-attn, no vLLM, no Triton: just CUDA graphs.

Features

OpenAI-compatible /v1/audio/speech endpoint (JSON body)
CUDA graph acceleration: 7-10x faster than baseline inference
Streaming output ("stream": true returns chunked WAV)
Voice cloning from a 3-second reference clip (Base models)
10 languages: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
9 built-in voices (CustomVoice models), with instruction-based emotion control on the 1.7B CustomVoice model
No flash-attn dependency required
Supports RTX 50-series (Blackwell) GPUs
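With "stream": true the server starts sending the WAV before the total length is known, so the size fields in the RIFF header may hold placeholder values (an assumption: the README does not say what the server writes there). A minimal sketch of reading the audio format from the start of such a stream: it walks the RIFF chunks up to the data chunk and never trusts the data size. `parse_wav_header` and `WavFormat` are hypothetical names, not part of this project.

```python
import struct
from dataclasses import dataclass


@dataclass
class WavFormat:
    channels: int
    sample_rate: int
    bits_per_sample: int
    data_offset: int  # byte offset where PCM samples start


def parse_wav_header(head: bytes) -> WavFormat:
    """Parse the header of a (possibly streamed) PCM WAV.

    The RIFF and data chunk sizes are ignored, because a streaming server may
    not know the final length when it sends the header (assumption).
    """
    if len(head) < 12 or head[0:4] != b"RIFF" or head[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    pos = 12
    fmt = None
    while pos + 8 <= len(head):
        chunk_id = head[pos:pos + 4]
        (size,) = struct.unpack("<I", head[pos + 4:pos + 8])
        body = pos + 8
        if chunk_id == b"fmt ":
            if body + 16 > len(head):
                break
            # audio format, channels, sample rate, byte rate, block align, bits
            _tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", head[body:body + 16]
            )
            fmt = (channels, rate, bits)
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("data chunk before fmt chunk")
            # Stop here: the data size may be a placeholder while streaming.
            return WavFormat(*fmt, data_offset=body)
        # Chunks are padded to an even number of bytes.
        pos = body + size + (size & 1)
    raise ValueError("header incomplete: read more bytes and retry")
```

Pass the first few hundred bytes of the response; PCM samples begin at `data_offset`.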

Quick Start

docker run -d --gpus all \
  -p 8080:8080 \
  -v /mnt/user/appdata/qwen3-tts-api/models:/root/.cache/huggingface \
  -e MODEL_ID=Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
  --shm-size=4g \
  --name qwen3-tts-api \
  ghcr.io/hsiang-han/qwen3-tts-api:latest

Or with docker compose:

docker compose -f docker/gpu/docker-compose.yml up -d

The first start downloads the model (~3-7 GB), and CUDA graphs are captured on the first request, so expect that request to be slower than later ones. Users in mainland China can set HF_ENDPOINT=https://hf-mirror.com.
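Because the first start may spend minutes downloading weights, scripts that launch the container usually need to wait before sending requests. A minimal polling sketch, assuming `/health` answers HTTP 200 only once the model is loaded (the README does not specify its semantics). `http_ok` and `wait_until_ready` are hypothetical helpers; the clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if GET url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (e.g. 503 while loading) and socket timeouts
        # are all OSError subclasses.
        return False


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float = 900.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it succeeds (True) or timeout_s elapses (False)."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until_ready(lambda: http_ok("http://localhost:8080/health"))`. Since CUDA graphs are captured on the first request, sending one short warm-up request after this returns keeps that one-time cost off real traffic.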

Usage Examples

# Generate speech with built-in voice
curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello, this is a test.", "voice": "Vivian", "language": "English"}' \
  --output output.wav

# With emotion instruction (1.7B CustomVoice only)
# input: "I'm really so happy!"; instruct: "Say it in an especially happy tone"
curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "我真的太开心了！", "voice": "Vivian", "instruct": "用特别开心的语气说"}' \
  --output happy.wav

# Streaming output
curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Streaming test.", "voice": "Vivian", "stream": true}' \
  --output stream.wav

# List available voices
curl http://localhost:8080/v1/voices
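The same requests can be sent from Python with only the standard library. This sketch mirrors the curl examples above; `build_speech_request` and `synthesize_to_file` are hypothetical helper names, and optional fields are only included when set, as in the curl bodies. Because the response body is copied to disk in fixed-size chunks, a "stream": true response is written incrementally rather than held in memory as a whole.

```python
import json
import shutil
import urllib.request
from typing import Optional


def build_speech_request(
    text: str,
    voice: str = "Vivian",
    language: Optional[str] = None,
    instruct: Optional[str] = None,
    stream: bool = False,
    base_url: str = "http://localhost:8080",
) -> urllib.request.Request:
    """Build a POST /v1/audio/speech request with a JSON body."""
    payload = {"input": text, "voice": voice}
    if language is not None:
        payload["language"] = language
    if instruct is not None:  # emotion instruction: 1.7B CustomVoice only
        payload["instruct"] = instruct
    if stream:
        payload["stream"] = True
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(
    req: urllib.request.Request, path: str, timeout_s: float = 300.0
) -> None:
    """Send the request and copy the WAV body to path in 64 KiB chunks."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out, 64 * 1024)
```

Usage: `synthesize_to_file(build_speech_request("Streaming test.", stream=True), "stream.wav")`.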

Built-in Voices (CustomVoice model)

Voice	Description	Native Language
Vivian	Bright, slightly edgy young female	Chinese
Serena	Warm, gentle young female	Chinese
Uncle_Fu	Seasoned male, low mellow timbre	Chinese
Dylan	Youthful Beijing male, clear natural	Chinese (Beijing)
Eric	Lively Chengdu male, slightly husky	Chinese (Sichuan)
Ryan	Dynamic male, strong rhythmic drive	English
Aiden	Sunny American male, clear midrange	English
Ono_Anna	Playful Japanese female, light nimble	Japanese
Sohee	Warm Korean female, rich emotion	Korean
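A client can catch misspelled voice names before a request reaches the server. This sketch hardcodes the table above, because `GET /v1/voices` is the authoritative list but its response format is not documented here; `voices_for_language` and `check_voice` are hypothetical helpers. Native language is used only as a sensible default pairing (the README does not say other pairings are rejected), and the exact, case-sensitive name match is an assumption about how the server compares names.

```python
# Built-in CustomVoice speakers and their native language, copied from the table.
BUILTIN_VOICES = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",  # Beijing
    "Eric": "Chinese",  # Sichuan
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def voices_for_language(language: str) -> list[str]:
    """Return built-in voices whose native language matches (case-insensitive)."""
    wanted = language.strip().lower()
    return [name for name, native in BUILTIN_VOICES.items() if native.lower() == wanted]


def check_voice(voice: str) -> str:
    """Fail fast on an unknown voice name instead of waiting for a server error."""
    if voice not in BUILTIN_VOICES:
        raise ValueError(f"unknown voice {voice!r}; choose from {sorted(BUILTIN_VOICES)}")
    return voice
```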

API Endpoints

Endpoint	Method	Description
`/v1/audio/speech`	POST	Text-to-speech (JSON body, OpenAI-compatible)
`/v1/audio/speech/clone`	POST	Voice cloning (multipart form with audio file upload; Base models only)
`/v1/voices`	GET	List available voices and languages
`/v1/models`	GET	List models
`/health`	GET	Health check
`/docs`	GET	Swagger documentation
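`/v1/audio/speech/clone` takes a multipart form instead of JSON and only works when `MODEL_ID` points at a Base model. The README does not list the form fields, so the names `input`, `ref_text` and `ref_audio` below are hypothetical placeholders; check the Swagger page at `/docs` for the real ones. The sketch shows how to encode such a form with the Python standard library, which has no built-in multipart encoder.

```python
import urllib.request
import uuid
from typing import Optional


def build_multipart(
    fields: dict[str, str], files: dict[str, tuple[str, bytes, str]]
) -> tuple[bytes, str]:
    """Encode fields and files as multipart/form-data.

    files maps field name -> (filename, content, content_type).
    Returns (body, Content-Type header value).
    """
    # A random boundary; a clash with the payload bytes is very unlikely.
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    for name, (filename, content, ctype) in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n'
        ).encode("utf-8")
        parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_clone_request(
    text: str,
    ref_audio: bytes,
    ref_text: Optional[str] = None,
    base_url: str = "http://localhost:8080",
) -> urllib.request.Request:
    # HYPOTHETICAL field names: confirm them against /docs before use.
    fields = {"input": text}
    if ref_text is not None:
        fields["ref_text"] = ref_text
    body, ctype = build_multipart(fields, {"ref_audio": ("reference.wav", ref_audio, "audio/wav")})
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech/clone",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
```

Send the result with `urllib.request.urlopen` and save the body as in the JSON example.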

Environment Variables

Variable	Default	Description
MODEL_ID	Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice	HuggingFace model ID or local path
DTYPE	bfloat16	Model precision (float16, bfloat16, float32)
DEVICE	cuda:0	CUDA device
ATTN_IMPLEMENTATION	sdpa	Attention backend (sdpa, eager)
PORT	8080	API server port
HF_HOME	/root/.cache/huggingface	HuggingFace cache directory
HF_ENDPOINT	https://huggingface.co	Hugging Face Hub endpoint (in mainland China, use a mirror such as https://hf-mirror.com)
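The defaults and allowed values in this table can be checked before starting the container. This is a hypothetical pre-flight sketch, not the server's own startup code (which is not shown here); `load_config` and `ServerConfig` are illustrative names. `HF_HOME` and `HF_ENDPOINT` are left out because the Hugging Face Hub client reads them directly.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

ALLOWED_DTYPES = {"float16", "bfloat16", "float32"}
ALLOWED_ATTN = {"sdpa", "eager"}


@dataclass(frozen=True)
class ServerConfig:
    model_id: str
    dtype: str
    device: str
    attn_implementation: str
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read the documented variables, applying the README defaults."""
    cfg = ServerConfig(
        model_id=env.get("MODEL_ID", "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"),
        dtype=env.get("DTYPE", "bfloat16"),
        device=env.get("DEVICE", "cuda:0"),
        attn_implementation=env.get("ATTN_IMPLEMENTATION", "sdpa"),
        port=int(env.get("PORT", "8080")),
    )
    # Reject values outside the choices listed in the table.
    if cfg.dtype not in ALLOWED_DTYPES:
        raise ValueError(f"DTYPE must be one of {sorted(ALLOWED_DTYPES)}, got {cfg.dtype!r}")
    if cfg.attn_implementation not in ALLOWED_ATTN:
        raise ValueError(
            f"ATTN_IMPLEMENTATION must be one of {sorted(ALLOWED_ATTN)}, "
            f"got {cfg.attn_implementation!r}"
        )
    if not 0 < cfg.port < 65536:
        raise ValueError(f"PORT out of range: {cfg.port}")
    return cfg
```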

Available Models

Model ID	Type	VRAM	Features
Qwen/Qwen3-TTS-12Hz-0.6B-Base	Base	~3GB	Voice clone
Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice	CustomVoice	~3GB	9 built-in voices
Qwen/Qwen3-TTS-12Hz-1.7B-Base	Base	~6GB	Voice clone
Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice	CustomVoice	~6GB	9 built-in voices + instruction control
Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign	VoiceDesign	~6GB	Design voice from text description
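Choosing a model comes down to the feature you need and the VRAM you have. A sketch of that decision using this table and the minimums from Hardware Requirements (4 GB+ for 0.6B, 8 GB+ for 1.7B); `pick_model` and its feature keywords are hypothetical, and the thresholds are the README's guidance rather than measurements.

```python
# (model ID, type, size, minimum VRAM in GB per Hardware Requirements)
MODELS = [
    ("Qwen/Qwen3-TTS-12Hz-0.6B-Base", "Base", "0.6B", 4),
    ("Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice", "CustomVoice", "0.6B", 4),
    ("Qwen/Qwen3-TTS-12Hz-1.7B-Base", "Base", "1.7B", 8),
    ("Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice", "CustomVoice", "1.7B", 8),
    ("Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign", "VoiceDesign", "1.7B", 8),
]

# feature -> (required model type, required size or None for any size)
FEATURES = {
    "clone": ("Base", None),
    "builtin": ("CustomVoice", None),
    "instruct": ("CustomVoice", "1.7B"),
    "design": ("VoiceDesign", "1.7B"),
}


def pick_model(feature: str, vram_gb: float) -> str:
    """Return the largest model that supports `feature` and fits in `vram_gb`."""
    if feature not in FEATURES:
        raise ValueError(f"unknown feature {feature!r}; choose from {sorted(FEATURES)}")
    mtype, size = FEATURES[feature]
    fits = [
        (need, model_id)
        for model_id, t, s, need in MODELS
        if t == mtype and (size is None or s == size) and need <= vram_gb
    ]
    if not fits:
        raise ValueError(f"no model supports {feature!r} within {vram_gb} GB VRAM")
    # Largest VRAM requirement that still fits means the bigger model.
    return max(fits)[1]
```

The returned ID is what you would pass as `MODEL_ID`.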

Hardware Requirements

NVIDIA GPU with 4GB+ VRAM (0.6B) or 8GB+ VRAM (1.7B)
NVIDIA driver 550+ (Ampere/Ada) or 570+ (Blackwell RTX 50-series)
Docker with NVIDIA Container Toolkit
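A quick host-side check of these requirements. `nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader,nounits` prints one CSV line per GPU with memory in MiB; the parsing helpers below are hypothetical, and detecting Blackwell from an "RTX 50xx" product name is a heuristic assumption that will miss other Blackwell parts.

```python
import re
import subprocess

# One CSV line per GPU: name, driver version, total memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader,nounits",
]
# Heuristic (assumption): GeForce "RTX 50xx" names are Blackwell cards.
BLACKWELL_NAME = re.compile(r"\bRTX 50\d{2}\b")


def check_gpu_line(line: str) -> list[str]:
    """Compare one nvidia-smi CSV line with the README's requirements.

    Returns a list of problems; an empty list means the GPU looks fine.
    """
    # rsplit keeps any comma inside the GPU name intact.
    name, driver, mem_mib = (field.strip() for field in line.rsplit(",", 2))
    problems = []
    need_driver = 570 if BLACKWELL_NAME.search(name) else 550
    if int(driver.split(".")[0]) < need_driver:
        problems.append(f"{name}: driver {driver} is older than {need_driver}")
    # Round, because nvidia-smi reports slightly less than the nominal size.
    vram_gb = round(int(mem_mib) / 1024)
    if vram_gb < 4:
        problems.append(f"{name}: ~{vram_gb} GB VRAM, below the 4 GB minimum")
    elif vram_gb < 8:
        problems.append(f"{name}: ~{vram_gb} GB VRAM, enough for 0.6B models only")
    return problems


def check_gpus() -> list[str]:
    """Run nvidia-smi on the host and collect problems for every GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [p for line in out.splitlines() if line.strip() for p in check_gpu_line(line)]
```

Run `check_gpus()` on the host before starting the container; an empty list means nothing obvious is missing.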

Credits

Qwen3-TTS by Alibaba Qwen Team — the model
faster-qwen3-tts by @andimarafioti — CUDA graph acceleration (7-10x speedup)
