A powerful audio transcription tool that supports local files, YouTube videos, and batch processing with an intuitive web interface.
Multiple Input Sources
- Local audio files (MP3, WAV, M4A, OGG)
- YouTube videos via URL
- Batch processing from CSV files
- Channel Extractor - Extract all videos from a YouTube channel
Modern Web Interface
- Native file and folder selection dialogs
- Dropdown menu for input source selection
- Persistent output directory settings
- Real-time progress tracking with live metrics
- Download buttons for transcripts
- YouTube Data API v3 integration with OAuth 2.0
Multiple Output Formats
- Plain text transcripts (.txt)
- SRT subtitles with timestamps (.srt)
- JSON format with detailed segments (.json)
- Flexible format selection (individual or combined)
Language Support
- Turkish transcription
- English transcription
- Automatic language detection
Requirements
- macOS (with native Finder integration) or Linux/Windows
- Python 3.12 or higher
- FFmpeg
- Git
- For Channel Extractor: Google Cloud Project with YouTube Data API v3 enabled (see setup below)
Install Homebrew (if not already installed):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install required dependencies:
brew install ffmpeg python git
Install Poetry (dependency manager):
curl -sSL https://install.python-poetry.org | python3 -
Clone the repository:
git clone https://github.com/makgunay/meseleekonomiTranscribe.git
cd meseleekonomiTranscribe
Install project dependencies:
poetry install
Download the Whisper model:
mkdir -p models/models--mlx-community--whisper-large-v2-mlx/
Then download the model files from Hugging Face and place them in the created folder.
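One way to fetch these files is with the `huggingface_hub` package (an assumption, not the tool's own code; any method that places the files in the folder above works):

```python
from pathlib import Path

# Folder the application expects (matches the mkdir step above).
MODEL_DIR = Path("models/models--mlx-community--whisper-large-v2-mlx")

def download_model(target: Path = MODEL_DIR) -> Path:
    """Download all model files from Hugging Face into `target`."""
    from huggingface_hub import snapshot_download
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id="mlx-community/whisper-large-v2-mlx",
                      local_dir=str(target))
    return target
```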
Start the application:
poetry run streamlit run app.py
Access the interface:
- Open your browser at http://localhost:8501
- The interface loads with settings and input options
Configure settings:
- Output Directory: Click "Browse" to select via Finder or enter path manually
- Input Source: Select from dropdown (Local File, YouTube URL, Batch Processing, or Channel Extractor)
- Language: Choose Turkish or English
- Output Format: Select desired output format(s)
Process audio:
- Local File: Click "Select File" to choose audio file, then "Transcribe"
- YouTube URL: Enter URL and click "Transcribe"
- Batch Processing: Select CSV file with URLs and click "Process Batch"
- Channel Extractor: Enter channel ID to extract all videos and download/transcribe
For command-line usage:
poetry run python main.py

Follow the prompts to:
- Choose input source (local file or YouTube URL)
- Select language (Turkish or English)
- Choose output format
- Enter file path or URL
The tool generates different output formats based on your selection:
Plain text transcript without timestamps:
Hello, welcome to our podcast.
Thank you for having me today.
Let's talk about our topic...
Subtitle format with timestamps:
1
00:00:00,000 --> 00:00:02,500
Hello, welcome to our podcast.
2
00:00:02,500 --> 00:00:04,800
Thank you for having me today.
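The timestamps above follow the SRT HH:MM:SS,mmm convention. As an illustration (not the tool's own code), cues like these can be formatted as follows:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_block(index: int, start: float, end: float, text: str) -> str:
    """Render one numbered SRT cue like the example above."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```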
Detailed segment information with timestamps and confidence scores:
{
"text": "Full transcript text...",
"segments": [
{
"start": 0.0,
"end": 2.5,
"text": "Hello, welcome to our podcast."
}
]
}

Extract all videos from a YouTube channel using YouTube Data API v3.
Create Google Cloud Project:
- Go to Google Cloud Console
- Create a new project or select existing one
- Enable "YouTube Data API v3" in the API Library
Create OAuth 2.0 Credentials:
- Navigate to: APIs & Services → Credentials
- Click "Create Credentials" → "OAuth 2.0 Client ID"
- Application type: "Desktop app"
- Download the credentials as client_secret.json
Place credentials file:
- Put client_secret.json in the project root directory
- It's automatically excluded from git via .gitignore
First-time authentication:
- Run the channel extractor (CLI or Web UI)
- Browser will open for Google account authorization
- Grant access to YouTube Data API
- Token will be saved to token.json for future use
- Find the YouTube channel ID (the part starting with "UC" in the channel URL)
- In the web interface, select "Channel Extractor" from the input source dropdown
- Enter the channel ID and click "Extract Channel Data"
- The tool will:
- Authenticate with YouTube API
- Extract all videos from the channel
- Save CSV and JSON files with comprehensive metadata
- Mark already downloaded videos in the CSV
- Use the generated CSV with "Batch Processing" to download and transcribe videos
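As an illustration (not the tool's own code), the channel ID can be pulled out of a /channel/ URL with a simple pattern, assuming the standard 24-character UC… format:

```python
import re
from typing import Optional

# Channel IDs are 24 characters beginning with "UC".
_CHANNEL_ID = re.compile(r"(UC[0-9A-Za-z_-]{22})")

def extract_channel_id(url_or_id: str) -> Optional[str]:
    """Pull a channel ID out of a /channel/ URL, or pass an ID through."""
    m = _CHANNEL_ID.search(url_or_id)
    return m.group(1) if m else None
```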
API quota usage:
- Daily quota: 10,000 units
- Cost per channel extraction (~1000 videos): ~41 units
- Large channel (2000 videos): ~81 units (0.81% of daily quota)
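These figures are consistent with 1 unit per API call at 50 videos per page: one playlistItems.list plus one videos.list call per page, plus a single channels.list lookup. A sketch of the arithmetic (an assumption about how the estimate is derived, not the tool's own code):

```python
import math

def quota_cost(video_count: int, page_size: int = 50) -> int:
    """Estimated YouTube Data API v3 units for one channel extraction:
    1 channels.list call, plus one playlistItems.list call and one
    videos.list call (1 unit each) per page of up to `page_size` videos."""
    pages = math.ceil(video_count / page_size)
    return 1 + 2 * pages
```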
For processing multiple YouTube videos:
- Use the Channel Extractor to generate a CSV with all channel videos
- The CSV includes comprehensive metadata and download tracking
- Use the generated CSV with Batch Processing
- The system automatically:
- Skips already downloaded videos
- Updates CSV with download status after each video
- Shows live metrics (downloaded, skipped, failed, remaining)
- Create a CSV file with URLs in the third column
- Use the web interface's "Batch Processing" option
- Select your CSV file using the native file picker
- Monitor progress as each video is processed
- All transcripts are saved to your output directory
The Channel Extractor generates CSV files with comprehensive metadata:
Required columns:
- video_id: YouTube video ID (11 characters)
- title: Video title
- description: Full video description
- link: YouTube watch URL (column 4, index 3)
- publish_date: ISO 8601 timestamp
- view_count, like_count, comment_count: Statistics (may be 'N/A')
- duration: ISO 8601 duration format (e.g., PT4M33S)
- tags: Comma-separated tag list
- category_id: YouTube category ID
- downloaded: Download status flag ("Yes", "No", "Failed")
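The duration column uses ISO 8601 durations. A hedged sketch of converting them to seconds (illustrative only; `duration_seconds` is a hypothetical helper):

```python
import re

_DURATION = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?")

def duration_seconds(iso: str) -> int:
    """Convert an ISO 8601 duration such as PT4M33S to total seconds."""
    m = _DURATION.fullmatch(iso)
    if m is None:
        raise ValueError(f"unrecognised duration: {iso!r}")
    h, mi, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mi * 60 + s
```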
Folder structure:
- Channel data saved to: video/{channel_name}/
- Filename format: {channel_name}_{YYYYMMDD_HHMM}_{video_count}.csv
Download tracking:
- "No": Not yet downloaded
- "Yes": Successfully downloaded
- "Failed": Download failed
- The batch downloader automatically updates this flag in real-time
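A hypothetical sketch of such a flag update (the real implementation may differ):

```python
import csv
import io

def mark_downloaded(csv_text: str, video_id: str, status: str = "Yes") -> str:
    """Return the CSV with the `downloaded` flag set for one video."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    for row in rows:
        if row["video_id"] == video_id:
            row["downloaded"] = status
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```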
Example:
video_id,title,description,link,publish_date,view_count,like_count,comment_count,duration,tags,category_id,downloaded
dQw4w9WgXcQ,Never Gonna Give You Up,Official music video,https://www.youtube.com/watch?v=dQw4w9WgXcQ,2009-10-25T06:57:33Z,1500000000,25000000,3500000,PT3M33S,"music, 80s",10,No

For manually created batch CSVs:
- The CSV must include a header row
- YouTube URLs must be in the third column (index 2)
- Empty or malformed rows are ignored
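The rules above can be sketched as follows (illustrative only; `batch_urls` is a hypothetical helper, not the tool's own code):

```python
import csv
import io

def batch_urls(csv_text: str) -> list:
    """Collect YouTube URLs from the third column (index 2), skipping
    the header row and any empty or malformed rows."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return [row[2] for row in reader
            if len(row) >= 3 and row[2].startswith("http")]
```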
Example:
title,channel,url
Konutta şehir efsanesi,Mesele Ekonomi,https://www.youtube.com/watch?v=lX42-MSQ_rM
2025 beklentileri,Mesele Ekonomi,https://www.youtube.com/watch?v=UuXCEDwVKMI

Project structure:

meseleekonomiTranscribe/
├── app.py # Streamlit web interface
├── main.py # CLI interface
├── channel_extractor.py # YouTube Data API v3 integration
├── transcription.py # Core transcription logic
├── audio_downloader.py # YouTube download functionality
├── interface.py # CLI user interface
├── utils.py # Utility functions
├── models/ # MLX Whisper model files
├── video/ # Default output directory
│ └── {channel_name}/ # Channel-specific folders
├── client_secret.json # OAuth credentials (not in git)
└── token.json # OAuth token (not in git)
YouTube download errors:
- The tool includes automatic retry mechanisms
- Uses latest yt-dlp with enhanced extraction methods
- If the problem persists, check your internet connection
Model not found:
- Ensure model files are in models/models--mlx-community--whisper-large-v2-mlx/
- Download all required files from Hugging Face
Permission errors:
- Ensure you have write permissions for the output directory
- Try selecting a different output folder
Memory issues:
- For long audio files, the tool processes in segments
- Close other applications if needed
Run tests:
poetry run pytest

Format code:
poetry run black .

Lint:
poetry run flake8

Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.
- MLX Whisper for the transcription model
- Streamlit for the web framework
- yt-dlp for YouTube downloading capabilities
- Google YouTube Data API v3 for channel extraction
For issues or questions, please open an issue on GitHub.
Happy transcribing! 🎙️