AI Video Transcriber

An AI-powered tool to transcribe and summarize videos and podcasts — supports YouTube, TikTok, Bilibili, Apple Podcasts, SoundCloud, and 30+ platforms.

✨ Features

🎥 Multi-Platform Support: Works with YouTube, TikTok, Bilibili, Apple Podcasts, SoundCloud, and 30+ more
⚡ Subtitle-First Architecture: For platforms with native subtitles (e.g. YouTube), transcripts are extracted instantly — no audio download needed. Whisper is only used as a fallback, making the whole pipeline dramatically faster.
🗣️ Intelligent Transcription: High-accuracy speech-to-text using Faster-Whisper when subtitles aren't available
🤖 AI Text Optimization: Automatic typo correction, sentence completion, and intelligent paragraphing
🌍 Multi-Language Summaries: Generate intelligent summaries in multiple languages
🔧 Bring Your Own Model: Configure any OpenAI-compatible API endpoint (OpenAI, OpenRouter, a local LLM, etc.) directly in the UI. Enter your API Base URL and API Key, click Fetch to list the models that endpoint offers, then select the one you want
⚙️ Conditional Translation: Auto-translates the transcript when the summary language differs from the source language
📱 Mobile-Friendly: Responsive interface that works on phones and tablets
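The subtitle-first flow in the feature list can be sketched as below. This is a minimal illustration, not the project's actual code: `get_transcript`, `fetch_subtitles`, and `transcribe_audio` are hypothetical names, with the two callables standing in for the native-subtitle extractor and the Faster-Whisper path.

```python
from typing import Callable, Optional, Tuple


def get_transcript(
    url: str,
    fetch_subtitles: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (mode, transcript), preferring native subtitles over Whisper."""
    subtitles = fetch_subtitles(url)
    # Treat missing or whitespace-only subtitles as "not available".
    if subtitles and subtitles.strip():
        return "subtitle", subtitles
    # Fallback: only now pay the cost of downloading and transcribing audio.
    return "whisper", transcribe_audio(url)


if __name__ == "__main__":
    # Stub callables simulate a video that has no native subtitles.
    mode, text = get_transcript(
        "https://example.com/video",
        fetch_subtitles=lambda url: None,
        transcribe_audio=lambda url: "transcribed text",
    )
    print(mode, text)
```

Passing the two steps in as callables keeps the decision logic independent of any particular downloader or speech model, which also makes it easy to test.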

🚀 Quick Start

Prerequisites

Python 3.8+
FFmpeg
An API key from any OpenAI-compatible provider (OpenAI, OpenRouter, etc.) — configured directly in the UI, no server-side env var needed

Installation

Method 1: Automatic Installation

# Clone the repository
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber

# Run installation script
chmod +x install.sh
./install.sh

Method 2: Docker

# Clone the repository
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber

# Using Docker Compose (easiest)
cp .env.example .env
# Edit .env file if you want server-side defaults (optional)
docker-compose up -d

# Or using Docker directly
docker build -t ai-video-transcriber .
docker run -p 8000:8000 ai-video-transcriber

Method 3: Manual Installation

Install Python Dependencies

# Use a virtualenv: Homebrew Python on macOS is marked as externally managed (PEP 668), so system-wide pip installs are refused
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

Install FFmpeg

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

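# CentOS/RHEL (ffmpeg is not in the default repositories; enable a third-party repository such as RPM Fusion first)
sudo yum install ffmpeg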

Configure Environment Variables (optional)

# If you prefer server-side defaults, set these — otherwise configure via the UI
export OPENAI_API_KEY="your_api_key_here"
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"  # any OpenAI-compatible endpoint

Start the Service

python3 start.py

After the service starts, open your browser and visit http://localhost:8000

Production Mode (Recommended for long videos)

To avoid SSE disconnections during long processing, start in production mode (hot-reload disabled):

python3 start.py --prod

This keeps the SSE connection stable throughout long tasks (30–60+ min).
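How a `--prod` flag might map to server options can be sketched as follows. This is an assumption-heavy illustration, not the contents of `start.py`: it assumes the FastAPI app is served by uvicorn and lives at `backend.main:app` (inferred from the project layout), and `build_server_options` is a hypothetical helper.

```python
import argparse
import os
import sys
from typing import Dict, List, Mapping, Union


def build_server_options(
    argv: List[str], env: Mapping[str, str]
) -> Dict[str, Union[str, int, bool]]:
    """Translate CLI flags and environment variables into server options."""
    parser = argparse.ArgumentParser(description="Start the web service")
    parser.add_argument("--prod", action="store_true", help="disable hot-reload")
    args = parser.parse_args(argv)
    return {
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8000")),
        # Hot-reload restarts the worker on file changes, which would drop
        # open SSE streams, so it is switched off in production mode.
        "reload": not args.prod,
    }


def main() -> None:
    # Imported lazily so the helper above can be used without uvicorn installed.
    import uvicorn  # assumption: the service runs on uvicorn

    options = build_server_options(sys.argv[1:], os.environ)
    uvicorn.run("backend.main:app", **options)  # assumed application path
```

Calling `main()` from a startup script would then launch the server with reload enabled by default and disabled under `--prod`.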

Run with explicit env (example)

source venv/bin/activate
export OPENAI_API_KEY=your_api_key_here         # optional: server-side default
# export OPENAI_BASE_URL=https://openrouter.ai/api/v1  # optional: server-side default
python3 start.py --prod

📖 Usage Guide

Enter Video URL: Paste a video link from YouTube, Bilibili, or other supported platforms
Select Summary Language: Choose the output language from the dropdown next to the input area
(Optional) Configure AI Model: Click AI Settings to expand the panel
- Enter your API Base URL (e.g. https://openrouter.ai/api/v1) and API Key
- Click Fetch to auto-load all models from that provider
- Select the model you want — or leave blank to use the server default
Start Processing: Click the Transcribe button. The progress bar shows which mode is active:
- ⚡ Subtitle (green) — native subtitles found, transcript extracted in seconds
- 🎙 Whisper (amber) — no subtitles available, downloading audio for transcription
View Results: Review the optimized transcript and AI summary
- If transcript language ≠ selected summary language, a Translation tab appears automatically
Download Files: Save Markdown-formatted files (Transcript / Translation / Summary)
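The rule behind the Translation tab can be sketched like this. It is an illustrative guess at the logic, not the project's implementation; `needs_translation` and `result_tabs` are hypothetical names, and language codes are assumed to look like `en`, `en-US`, or `zh_CN`.

```python
from typing import List


def needs_translation(transcript_lang: str, summary_lang: str) -> bool:
    """True when the transcript language differs from the summary language."""

    def primary(code: str) -> str:
        # Compare only the primary subtag, so "en-US" and "en" count as equal.
        return code.strip().lower().replace("_", "-").split("-")[0]

    return primary(transcript_lang) != primary(summary_lang)


def result_tabs(transcript_lang: str, summary_lang: str) -> List[str]:
    """List the result tabs to show, adding Translation only when needed."""
    tabs = ["Transcript"]
    if needs_translation(transcript_lang, summary_lang):
        tabs.append("Translation")
    tabs.append("Summary")
    return tabs
```

Comparing primary subtags is a simplification: it would treat Simplified and Traditional Chinese as the same language, so a real implementation may need a finer comparison.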

🛠️ Technical Architecture

Backend Stack

FastAPI: Modern Python web framework
yt-dlp: Video downloading and processing
Faster-Whisper: Efficient speech transcription
OpenAI-compatible API: Text optimization, summarization, and translation

Frontend Stack

HTML5 + CSS3: Responsive interface design
JavaScript (ES6+): Modern frontend interactions
Marked.js: Markdown rendering
Font Awesome: Icon library

Project Structure

AI-Video-Transcriber/
├── backend/                 # Backend code
│   ├── main.py             # FastAPI main application
│   ├── video_processor.py  # Video processing module
│   ├── transcriber.py      # Transcription module
│   ├── summarizer.py       # Summary module
│   └── translator.py       # Translation module
├── static/                 # Frontend files
│   ├── index.html          # Main page
│   └── app.js              # Frontend logic
├── temp/                   # Temporary files directory
├── Dockerfile              # Docker image configuration
├── docker-compose.yml      # Docker Compose configuration
├── .dockerignore           # Docker ignore rules
├── .env.example            # Environment variables template
├── requirements.txt        # Python dependencies
├── start.py               # Startup script
└── README.md              # Project documentation

⚙️ Configuration Options

Environment Variables

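Variable	Description	Default	Required
`OPENAI_API_KEY`	API key (server-side default)	-	No (can be set in the UI instead)
`OPENAI_BASE_URL`	Base URL of any OpenAI-compatible endpoint (server-side default)	-	No (can be set in the UI instead)
`HOST`	Server address	`0.0.0.0`	No
`PORT`	Server port	`8000`	No
`WHISPER_MODEL_SIZE`	Whisper model size	`base`	No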
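Reading these variables with the defaults from the table might look like the sketch below. The `Settings` class and `load_settings` function are hypothetical and only illustrate the lookup; the backend may read its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    whisper_model_size: str
    openai_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from the environment, falling back to documented defaults."""
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        whisper_model_size=env.get("WHISPER_MODEL_SIZE", "base"),
        # An empty string is treated the same as "not set".
        openai_api_key=env.get("OPENAI_API_KEY") or None,
    )
```

For example, `load_settings({"WHISPER_MODEL_SIZE": "tiny"})` keeps the default host and port and only overrides the model size.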

Whisper Model Size Options

Model	Parameters	English-only	Multilingual	Speed	Memory Usage
tiny	39 M	✓	✓	Fast	Low
base	74 M	✓	✓	Medium	Low
small	244 M	✓	✓	Medium	Medium
medium	769 M	✓	✓	Slow	Medium
large	1550 M	✗	✓	Very Slow	High

🔧 FAQ

Q: Why is transcription slow?

A: Transcription speed depends on video length, Whisper model size, and hardware performance. Try using smaller models (like tiny or base) to improve speed.

Q: Which video platforms are supported?

A: All platforms supported by yt-dlp, including but not limited to: YouTube, TikTok, Facebook, Instagram, Twitter, Bilibili, Youku, iQiyi, Tencent Video, etc.

Q: What if the AI optimization features are unavailable?

A: AI features require an API key from any OpenAI-compatible provider (OpenAI, OpenRouter, etc.). You can enter it directly in the AI Settings panel in the UI — no server restart needed. Alternatively, set OPENAI_API_KEY as an environment variable for a server-side default.
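To check a Base URL and API key outside the UI, the model list of an OpenAI-compatible endpoint can be requested directly. The sketch below assumes the provider exposes the standard `GET <base URL>/models` route with Bearer authentication and a `data` array of objects carrying an `id`; whether the Fetch button does exactly this is an assumption, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List


def models_url(base_url: str) -> str:
    """Build the model-list URL, tolerating a trailing slash in the base URL."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: Dict[str, Any]) -> List[str]:
    """Extract and sort model ids from an OpenAI-style list response."""
    return sorted(item["id"] for item in payload.get("data", []) if "id" in item)


def fetch_model_ids(base_url: str, api_key: str, timeout: float = 10.0) -> List[str]:
    """Query the endpoint for its available models (performs a network call)."""
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(json.load(response))
```

If `fetch_model_ids("https://openrouter.ai/api/v1", "your_api_key_here")` raises an HTTP or connection error, the same URL and key are unlikely to work in the AI Settings panel either.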

Q: I get HTTP 500 errors when starting/using the service. Why?

A: In most cases this is an environment configuration issue rather than a code bug. Please check:

Ensure a virtualenv is activated: source venv/bin/activate
Install deps inside the venv: pip install -r requirements.txt
Configure your API key in the AI Settings panel, or set OPENAI_API_KEY as an env var
Install FFmpeg: brew install ffmpeg (macOS) / sudo apt install ffmpeg (Debian/Ubuntu)
If port 8000 is occupied, stop the old process or change PORT
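The checklist above can be partly automated with a small preflight script. This is a hypothetical helper, not part of the repository; it only inspects the local environment and does not verify that the API key is valid.

```python
import os
import shutil
import socket
import sys
from typing import Dict


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; failure means something is already using it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def preflight(port: int = 8000) -> Dict[str, bool]:
    """Run the environment checks and report each one as passed or failed."""
    return {
        # Inside a venv, sys.prefix points at the venv instead of the base install.
        "virtualenv active": sys.prefix != sys.base_prefix,
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "OPENAI_API_KEY set (optional)": bool(os.environ.get("OPENAI_API_KEY")),
        f"port {port} free": port_is_free(port),
    }


if __name__ == "__main__":
    for check, ok in preflight(int(os.environ.get("PORT", "8000"))).items():
        print(f"[{'ok' if ok else '!!'}] {check}")
```

Run it before starting the service; a failed port check while the service is already running is expected.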

Q: How to handle long videos?

A: There is no hard length limit, but processing time grows with video length. For very long videos, use a smaller Whisper model and start the service with python3 start.py --prod so the SSE connection stays stable.

Q: How to use Docker for deployment?

A: Docker provides the easiest deployment method:

Prerequisites:

Install Docker Desktop from https://www.docker.com/products/docker-desktop/
Ensure Docker service is running

Quick Start:

# Clone and setup
git clone https://github.com/wendy7756/AI-Video-Transcriber.git
cd AI-Video-Transcriber
cp .env.example .env
# Edit .env file to set server-side defaults (optional)

# Start with Docker Compose (recommended)
docker-compose up -d

# Or build and run manually
docker build -t ai-video-transcriber .
docker run -p 8000:8000 --env-file .env ai-video-transcriber

Common Docker Issues:

Port conflict: Change port mapping -p 8001:8000 if 8000 is occupied
Permission denied: Ensure Docker Desktop is running and you have proper permissions
Build fails: Check disk space (need ~2GB free) and network connection
Container won't start: Check Docker logs with docker logs <container_id>

Docker Commands:

# View running containers
docker ps

# Check container logs (use the container name shown by docker ps if yours differs)
docker logs ai-video-transcriber-ai-video-transcriber-1

# Stop service
docker-compose down

# Rebuild after changes
docker-compose build --no-cache

Q: What are the memory requirements?

A: Memory usage varies depending on the deployment method and workload:

Docker Deployment:

Base memory: ~128MB for idle container
During processing: 500MB - 2GB depending on video length and Whisper model
Docker image size: ~1.6GB disk space required
Recommended: 4GB+ RAM for smooth operation

Traditional Deployment:

Base memory: ~50-100MB for FastAPI server
Whisper models memory usage:
- tiny: ~150MB
- base: ~250MB
- small: ~750MB
- medium: ~1.5GB
- large: ~3GB
Peak usage: Base + Model + Video processing (~500MB additional)
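The peak-usage formula above can be turned into a quick sizing helper. The figures are the approximate values listed in this section (the upper end of the base range, the per-model estimates, and ~500MB for video processing); the function names are hypothetical and the result is a rough guide, not a guarantee.

```python
from typing import Optional

# Approximate figures from this README, in megabytes.
MODEL_MEMORY_MB = {"tiny": 150, "base": 250, "small": 750, "medium": 1500, "large": 3000}
SERVER_BASE_MB = 100  # upper end of the ~50-100MB FastAPI estimate
PROCESSING_OVERHEAD_MB = 500  # additional memory during video processing


def peak_memory_mb(model: str) -> int:
    """Estimate peak usage as base + model + video processing."""
    return SERVER_BASE_MB + MODEL_MEMORY_MB[model] + PROCESSING_OVERHEAD_MB


def largest_model_that_fits(available_mb: int) -> Optional[str]:
    """Pick the biggest Whisper model whose estimated peak fits the budget."""
    fitting = [m for m in MODEL_MEMORY_MB if peak_memory_mb(m) <= available_mb]
    return max(fitting, key=MODEL_MEMORY_MB.get) if fitting else None
```

The returned name can be used as the value of `WHISPER_MODEL_SIZE`; a result of `None` means even `tiny` is estimated to exceed the available memory.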

Memory Optimization Tips:

# Use smaller Whisper model to reduce memory usage
WHISPER_MODEL_SIZE=tiny  # or base

# For Docker, limit container memory if needed
docker run -m 1g -p 8000:8000 --env-file .env ai-video-transcriber

# Monitor memory usage
docker stats ai-video-transcriber-ai-video-transcriber-1

Q: Network connection errors or timeouts?

A: If you encounter network-related errors during video downloading or API calls, try these solutions:

Common Network Issues:

Video download fails with "Unable to extract" or timeout errors
OpenAI API calls return connection timeout or DNS resolution failures
Docker image pull fails or is extremely slow

Solutions:

Switch VPN/Proxy: Try connecting to a different VPN server or switch your proxy settings
Check Network Stability: Ensure your internet connection is stable
Retry After Network Change: Wait 30-60 seconds after changing network settings before retrying
Use Alternative Endpoints: If using custom OpenAI endpoints, verify they're accessible from your network
Docker Network Issues: Restart Docker Desktop if container networking fails

Quick Network Test:

# Test video platform access
curl -I https://www.youtube.com/

# Test your AI provider endpoint
curl -I https://openrouter.ai

# Test Docker Hub access
docker pull hello-world

🎯 Supported Languages

Transcription

Supports 100+ languages through Whisper
Automatic language detection
High accuracy for major languages

Summary Generation

English
Chinese (Simplified)
Japanese
Korean
Spanish
French
German
Portuguese
Russian
Arabic
And more...

📈 Performance Tips

Hardware Requirements:
- Minimum: 4GB RAM, dual-core CPU
- Recommended: 8GB RAM, quad-core CPU
- Ideal: 16GB RAM, multi-core CPU, SSD storage

Processing Time Estimates:

Video Length	Subtitle Mode	Whisper Mode	Notes
1 minute	~5s	30s–1 min	Subtitle mode needs no audio download
5 minutes	~10s	2–5 min	YouTube auto-captions trigger subtitle mode
15 minutes	~15s	5–15 min	Most YouTube videos support subtitle mode
30+ minutes	~20s	15–60 min	Podcast/audio-only always uses Whisper

🤝 Contributing

We welcome Issues and Pull Requests!

Fork the project
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

Acknowledgments

yt-dlp - Powerful video downloading tool
Faster-Whisper - Efficient Whisper implementation
FastAPI - Modern Python web framework
OpenAI - Intelligent text processing API

📞 Contact

For questions or suggestions, please submit an Issue or contact Wendy.

🚀 Try the Full Product — sipsip.ai

This tool is the open-source part of sipsip.ai.

The full product goes further:

📧 Daily email briefs — follow your favorite creators and get an AI-curated digest in your inbox every morning
⚡ Transcribe & summarize any video or podcast on demand
🌐 Multi-language support across all features

Free to start — no credit card required.

➡️ sipsip.ai

⭐ Star History

If you find this project helpful, please consider giving it a star!
