MeseleEkonomi Transcribe

A powerful audio transcription tool that supports local files, YouTube videos, and batch processing with an intuitive web interface.

Features

  • Multiple Input Sources

    • Local audio files (MP3, WAV, M4A, OGG)
    • YouTube videos via URL
    • Batch processing from CSV files
    • Channel Extractor - Extract all videos from a YouTube channel
  • Modern Web Interface

    • Native file and folder selection dialogs
    • Dropdown menu for input source selection
    • Persistent output directory settings
    • Real-time progress tracking with live metrics
    • Download buttons for transcripts
    • YouTube Data API v3 integration with OAuth 2.0
  • Multiple Output Formats

    • Plain text transcripts (.txt)
    • SRT subtitles with timestamps (.srt)
    • JSON format with detailed segments (.json)
    • Flexible format selection (individual or combined)
  • Language Support

    • Turkish transcription
    • English transcription
    • Automatic language detection

Prerequisites

  • macOS (with native Finder integration) or Linux/Windows
  • Python 3.12 or higher
  • FFmpeg
  • Git
  • For Channel Extractor: Google Cloud Project with YouTube Data API v3 enabled (see setup below)

Installation

macOS Setup

  1. Install Homebrew (if not already installed):

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  2. Install required dependencies:

    brew install ffmpeg python git
  3. Install Poetry (dependency manager):

    curl -sSL https://install.python-poetry.org | python3 -
  4. Clone the repository:

    git clone https://github.com/makgunay/meseleekonomiTranscribe.git
    cd meseleekonomiTranscribe
  5. Install project dependencies:

    poetry install
  6. Download the Whisper model:

    mkdir -p models/models--mlx-community--whisper-large-v2-mlx/

    Then download the model files from Hugging Face and place them in the created folder.

Usage

Web Interface (Recommended)

  1. Start the application:

    poetry run streamlit run app.py
  2. Access the interface:

    • Open your browser at http://localhost:8501
    • The interface opens with settings and input options
  3. Configure settings:

    • Output Directory: Click "Browse" to select via Finder or enter path manually
    • Input Source: Select from dropdown (Local File, YouTube URL, Batch Processing, or Channel Extractor)
    • Language: Choose Turkish or English
    • Output Format: Select desired output format(s)
  4. Process audio:

    • Local File: Click "Select File" to choose audio file, then "Transcribe"
    • YouTube URL: Enter URL and click "Transcribe"
    • Batch Processing: Select CSV file with URLs and click "Process Batch"
    • Channel Extractor: Enter channel ID to extract all videos and download/transcribe

Command Line Interface

For command-line usage:

poetry run python main.py

Follow the prompts to:

  1. Choose input source (local file or YouTube URL)
  2. Select language (Turkish or English)
  3. Choose output format
  4. Enter file path or URL

Output Files

The tool generates different output formats based on your selection:

Text Format (.txt)

Plain text transcript without timestamps:

Hello, welcome to our podcast.
Thank you for having me today.
Let's talk about our topic...

SRT Format (.srt)

Subtitle format with timestamps:

1
00:00:00,000 --> 00:00:02,500
Hello, welcome to our podcast.

2
00:00:02,500 --> 00:00:04,800
Thank you for having me today.
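SRT timestamps use the `HH:MM:SS,mmm` convention shown above. As an illustration (not part of the tool's code), a small helper can convert a segment's start or end time in seconds to this format:

```python
def srt_timestamp(seconds):
    # Convert a time in seconds (float) to an SRT timestamp "HH:MM:SS,mmm".
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

srt_timestamp(2.5)  # → "00:00:02,500"
```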

JSON Format (.json)

Detailed segment information with per-segment timestamps:

{
  "text": "Full transcript text...",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, welcome to our podcast."
    }
  ]
}
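The JSON output is convenient for downstream processing. For example, the segment texts can be pulled out with the standard library alone (field names match the sample above):

```python
import json

# Illustrative: parse a generated .json transcript and collect
# the per-segment texts in order.
data = json.loads("""{
  "text": "Full transcript text...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "Hello, welcome to our podcast."},
    {"start": 2.5, "end": 4.8, "text": "Thank you for having me today."}
  ]
}""")

lines = [seg["text"] for seg in data["segments"]]
```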

Channel Extractor

Extract all videos from a YouTube channel using YouTube Data API v3.

Setup

  1. Create Google Cloud Project:

    • Go to Google Cloud Console
    • Create a new project or select existing one
    • Enable "YouTube Data API v3" in the API Library
  2. Create OAuth 2.0 Credentials:

    • Navigate to: APIs & Services → Credentials
    • Click "Create Credentials" → "OAuth client ID"
    • Application type: "Desktop app"
    • Download the credentials as client_secret.json
  3. Place credentials file:

    • Put client_secret.json in the project root directory
    • It's automatically excluded from git via .gitignore
  4. First-time authentication:

    • Run the channel extractor (CLI or Web UI)
    • Browser will open for Google account authorization
    • Grant access to YouTube Data API
    • Token will be saved to token.json for future use

Usage

  1. Find the YouTube channel ID (the part starting with "UC" in the channel URL)
  2. In the web interface, select "Channel Extractor" from the input source dropdown
  3. Enter the channel ID and click "Extract Channel Data"
  4. The tool will:
    • Authenticate with YouTube API
    • Extract all videos from the channel
    • Save CSV and JSON files with comprehensive metadata
    • Mark already downloaded videos in the CSV
  5. Use the generated CSV with "Batch Processing" to download and transcribe videos
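Standard channel IDs are typically 24 characters beginning with "UC" (an assumption about YouTube's current ID format, not something the tool enforces). A quick sanity check before submitting an ID could look like:

```python
import re

# Assumption: standard YouTube channel IDs are "UC" followed by
# 22 URL-safe base64 characters (24 characters total).
CHANNEL_ID_RE = re.compile(r"^UC[0-9A-Za-z_-]{22}$")

def looks_like_channel_id(value):
    return bool(CHANNEL_ID_RE.match(value))
```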

API Quota

  • Daily quota: 10,000 units
  • Cost per channel extraction (~1000 videos): ~41 units
  • Large channel (2000 videos): ~81 units (0.81% of daily quota)
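The per-extraction figures above follow from the API's per-page pricing. A sketch of that arithmetic, assuming one `channels.list` call plus one unit each for every `playlistItems.list` and `videos.list` page of 50 results:

```python
import math

def extraction_cost(n_videos, page_size=50):
    # 1 unit for the initial channels.list lookup, then 1 unit per
    # page for both playlistItems.list and videos.list.
    pages = math.ceil(n_videos / page_size)
    return 1 + 2 * pages

extraction_cost(1000)  # → 41
extraction_cost(2000)  # → 81
```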

Batch Processing

For processing multiple YouTube videos:

Option 1: Channel Extractor (Recommended)

  1. Use the Channel Extractor to generate a CSV with all channel videos
  2. The CSV includes comprehensive metadata and download tracking
  3. Use the generated CSV with Batch Processing
  4. The system automatically:
    • Skips already downloaded videos
    • Updates CSV with download status after each video
    • Shows live metrics (downloaded, skipped, failed, remaining)

Option 2: Manual CSV

  1. Create a CSV file with URLs in the third column
  2. Use the web interface's "Batch Processing" option
  3. Select your CSV file using the native file picker
  4. Monitor progress as each video is processed
  5. All transcripts are saved to your output directory

CSV Format

Channel Extractor CSV Format (Recommended)

The Channel Extractor generates CSV files with comprehensive metadata:

Required columns:

  • video_id: YouTube video ID (11 characters)
  • title: Video title
  • description: Full video description
  • link: YouTube watch URL (column 4, index 3)
  • publish_date: ISO 8601 timestamp
  • view_count, like_count, comment_count: Statistics (may be 'N/A')
  • duration: ISO 8601 duration format (e.g., PT4M33S)
  • tags: Comma-separated tag list
  • category_id: YouTube category ID
  • downloaded: Download status flag ("Yes", "No", "Failed")

Folder structure:

  • Channel data saved to: video/{channel_name}/
  • Filename format: {channel_name}_{YYYYMMDD_HHMM}_{video_count}.csv
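As an illustration of the filename pattern (the builder below is hypothetical, not the tool's actual code):

```python
from datetime import datetime

def channel_csv_name(channel_name, video_count, when):
    # Builds "{channel_name}_{YYYYMMDD_HHMM}_{video_count}.csv"
    return f"{channel_name}_{when:%Y%m%d_%H%M}_{video_count}.csv"

channel_csv_name("MeseleEkonomi", 120, datetime(2025, 1, 2, 13, 45))
# → "MeseleEkonomi_20250102_1345_120.csv"
```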

Download tracking:

  • "No": Not yet downloaded
  • "Yes": Successfully downloaded
  • "Failed": Download failed
  • The batch downloader automatically updates this flag in real-time

Example:

video_id,title,description,link,publish_date,view_count,like_count,comment_count,duration,tags,category_id,downloaded
dQw4w9WgXcQ,Never Gonna Give You Up,Official music video,https://www.youtube.com/watch?v=dQw4w9WgXcQ,2009-10-25T06:57:33Z,1500000000,25000000,3500000,PT3M33S,"music, 80s",10,No
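The `duration` column uses ISO 8601 durations such as `PT3M33S`. If you need the value in seconds for your own analysis, a small parser (illustrative, covering the common `PT#H#M#S` shape only) is enough:

```python
import re

def iso_duration_to_seconds(value):
    # Parse YouTube-style ISO 8601 durations, e.g. "PT3M33S" or "PT1H2M".
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", value)
    if not m:
        raise ValueError(f"unrecognized duration: {value!r}")
    h, mi, s = (int(g or 0) for g in m.groups())
    return h * 3600 + mi * 60 + s

iso_duration_to_seconds("PT3M33S")  # → 213
```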

Manual CSV Format (Legacy)

  • The CSV must include a header row
  • YouTube URLs must be in the third column (index 2)
  • Empty or malformed rows are ignored

Example:

title,channel,url
Konutta şehir efsanesi,Mesele Ekonomi,https://www.youtube.com/watch?v=lX42-MSQ_rM
2025 beklentileri,Mesele Ekonomi,https://www.youtube.com/watch?v=UuXCEDwVKMI
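The rules above (header row, URL in the third column, malformed rows ignored) can be sketched with the standard `csv` module:

```python
import csv
import io

# Illustrative reader: skip the header, take the URL from the third
# column (index 2), and ignore rows that are too short to hold one.
sample = """title,channel,url
Konutta şehir efsanesi,Mesele Ekonomi,https://www.youtube.com/watch?v=lX42-MSQ_rM
broken row
2025 beklentileri,Mesele Ekonomi,https://www.youtube.com/watch?v=UuXCEDwVKMI
"""

reader = csv.reader(io.StringIO(sample))
next(reader)  # header row
urls = [row[2] for row in reader if len(row) > 2 and row[2].startswith("http")]
```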

Project Structure

meseleekonomiTranscribe/
├── app.py                 # Streamlit web interface
├── main.py                # CLI interface
├── channel_extractor.py   # YouTube Data API v3 integration
├── transcription.py       # Core transcription logic
├── audio_downloader.py    # YouTube download functionality
├── interface.py           # CLI user interface
├── utils.py               # Utility functions
├── models/                # MLX Whisper model files
├── video/                 # Default output directory
│   └── {channel_name}/    # Channel-specific folders
├── client_secret.json     # OAuth credentials (not in git)
└── token.json             # OAuth token (not in git)

Troubleshooting

Common Issues

  1. YouTube download errors:

    • The tool includes automatic retry mechanisms
    • Uses latest yt-dlp with enhanced extraction methods
    • If persistent, check your internet connection
  2. Model not found:

    • Ensure model files are in models/models--mlx-community--whisper-large-v2-mlx/
    • Download all required files from Hugging Face
  3. Permission errors:

    • Ensure you have write permissions for the output directory
    • Try selecting a different output folder
  4. Memory issues:

    • For long audio files, the tool processes in segments
    • Close other applications if needed

Development

Running Tests

poetry run pytest

Code Style

poetry run black .
poetry run flake8

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License.

Acknowledgments

  • MLX Whisper for the transcription model
  • Streamlit for the web framework
  • yt-dlp for YouTube downloading capabilities
  • Google YouTube Data API v3 for channel extraction

Support

For issues or questions, please open an issue on GitHub.

Happy transcribing! 🎙️
