🎙️ SpeechSync – Speak with Ease, Translate in a Breeze

SpeechSync is a real-time multilingual speech-to-speech translation system that bridges communication gaps across languages. Built using a modular pipeline of ASR (Automatic Speech Recognition), NMT (Neural Machine Translation), and TTS (Text-to-Speech), SpeechSync delivers low-latency, high-accuracy translations that are both context-aware and user-friendly.

🔗 Live Demo

Access the live project here: View Deployed App

⚠️ For the best experience, open this app in Google Chrome. Safari may have limited audio playback support.

🖼️ User Interface Preview

Below is a preview of the web-based interface designed for seamless interaction. Users can speak live and receive real-time translations in an intuitive, responsive interface.

🚀 Features

🎤 Real-time speech recognition with pause/resume control
🌐 Multilingual translation using MarianMT
🗣️ Natural-sounding speech synthesis via gTTS
⚡ Fast performance with Whisper Tiny model
🧠 Support for low-resource languages (Hindi, Bengali, etc.)
🔄 Manual and automatic transcription + translation options
🎛️ Clean, responsive UI with real-time audio waveform visualization
💻 Best experienced in Google Chrome (Safari playback may be limited)

🛠️ Tech Stack

Component	Technology Used
Frontend	HTML5, CSS3, JavaScript
Backend	Flask (Python)
Translation	MarianMT (transformers)
Speech-to-Text	OpenAI Whisper (tiny)
Text-to-Speech	gTTS + pydub
Audio Encoding	MediaRecorder API

📸 Interface Overview

🎙️ Start, Pause, and Stop recording buttons
🎧 Playback for both input and translated audio
🌍 Clickable buttons to select translation language
📈 Real-time audio waveform visualizer
🔄 Status updates showing transcription and translation stages

📁 Project Structure

📂 Here's how the core directory looks:

SpeechSync/
├── app.py                  # Flask backend server
├── index.html              # Main frontend UI
├── process_audio.py        # ASR, NMT, and TTS logic
├── requirements.txt        # Python dependencies
├── .gitignore              # Git ignore rules
├── README.md               # Project documentation
├── uploads/                # Stores recorded audio files
└── output/                 # Stores generated audio response

🧑‍💻 Installation & Setup

1. Clone the Repository

git clone https://github.com/DSinghania13/SpeechSync.git
cd SpeechSync

2. Create Virtual Environment (Optional but Recommended)

python3.12 -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows

3. Install Python Dependencies

pip install -r requirements.txt

▶️ Run the Application

python3 app.py

Open your browser (preferably Google Chrome) and go to:

http://localhost:5050

🧪 Model Architecture

ASR (Whisper) → Transcribes speech into text
NMT (MarianMT) → Translates text into target language
TTS (gTTS) → Converts translated text into speech
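
The three-stage flow above can be sketched as a chain of plain callables. This is only an illustration of the hand-offs between stages, not the code in process_audio.py: the `Pipeline` class, the stub stage functions, and the sample file paths are hypothetical names introduced here so the example runs without loading any model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical container for the three stages (not taken from process_audio.py)."""
    asr: Callable[[str], str]        # audio path -> source text
    nmt: Callable[[str, str], str]   # (source text, target language) -> translated text
    tts: Callable[[str, str], str]   # (translated text, target language) -> output audio path

    def run(self, audio_path: str, target_lang: str) -> dict:
        # Each stage consumes only the previous stage's output,
        # which is what makes the stages swappable.
        text = self.asr(audio_path)
        translated = self.nmt(text, target_lang)
        audio_out = self.tts(translated, target_lang)
        return {"transcript": text, "translation": translated, "audio": audio_out}


# Stub stages stand in for Whisper, MarianMT, and gTTS.
def fake_asr(path: str) -> str:
    return "hello"


def fake_nmt(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


def fake_tts(text: str, lang: str) -> str:
    return f"output/{lang}.mp3"


demo = Pipeline(asr=fake_asr, nmt=fake_nmt, tts=fake_tts)
result = demo.run("uploads/sample.webm", "hi")
print(result)
```

Because every stage is injected, a real Whisper, MarianMT, or gTTS wrapper can replace a stub without touching `run`.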

⚙️ How It Works

Transcribe: Converts input speech to text using Whisper.
Translate: Translates English text to the selected language using MarianMT.
Synthesize: Generates translated speech using gTTS and plays it back.
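
As a rough sketch of what each step might look like with the libraries named in the tech stack, the functions below call openai-whisper, Hugging Face transformers, and gTTS directly. The function names, the `marian_checkpoint` helper, and the assumption that every target language has a `Helsinki-NLP/opus-mt-en-<code>` checkpoint are illustrative and may not match process_audio.py; check that a checkpoint exists for a language pair before relying on it.

```python
def marian_checkpoint(target: str) -> str:
    """Build an English -> target checkpoint name (assumed naming scheme)."""
    return f"Helsinki-NLP/opus-mt-en-{target}"


def transcribe(audio_path: str) -> str:
    """Speech -> text with the Whisper tiny model."""
    import whisper  # openai-whisper, imported lazily because it is heavy
    model = whisper.load_model("tiny")
    return model.transcribe(audio_path)["text"].strip()


def translate(text: str, target: str) -> str:
    """English text -> target-language text with MarianMT."""
    from transformers import MarianMTModel, MarianTokenizer
    name = marian_checkpoint(target)
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)
    batch = tokenizer([text], return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.decode(generated[0], skip_special_tokens=True)


def synthesize(text: str, target: str, out_path: str) -> str:
    """Translated text -> MP3 file with gTTS (needs network access)."""
    from gtts import gTTS
    gTTS(text=text, lang=target).save(out_path)
    return out_path
```

This version reloads the models on every call to keep the sketch short. A long-running server would more likely load them once at startup and reuse them.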

🌍 Supported Languages

Hindi (hi-IN)
Bengali (bn-IN)
Spanish (es-ES)
French (fr-FR)

Note: More languages can be added by extending the MarianMT and gTTS logic in process_audio.py.
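
One way to keep such an extension in a single place is a small registry that maps each language code to the MarianMT checkpoint and the gTTS code it needs. The sketch below is an assumption about how this could be organised, not a description of process_audio.py: `Language`, `LANGUAGES`, `register`, and `lookup` are hypothetical names, and Bengali is left out because the matching checkpoint name is not stated in this README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Language:
    name: str
    marian_model: str   # Hugging Face checkpoint for English -> this language
    gtts_code: str      # language code passed to gTTS


# Hypothetical registry keyed by the short language code.
LANGUAGES: dict[str, Language] = {
    "hi": Language("Hindi", "Helsinki-NLP/opus-mt-en-hi", "hi"),
    "es": Language("Spanish", "Helsinki-NLP/opus-mt-en-es", "es"),
    "fr": Language("French", "Helsinki-NLP/opus-mt-en-fr", "fr"),
}


def register(code: str, name: str, marian_model: str,
             gtts_code: Optional[str] = None) -> Language:
    """Add a language; refuse to silently overwrite an existing entry."""
    if code in LANGUAGES:
        raise ValueError(f"language {code!r} is already registered")
    LANGUAGES[code] = Language(name, marian_model, gtts_code or code)
    return LANGUAGES[code]


def lookup(locale: str) -> Language:
    """Accept either 'hi' or a locale tag such as 'hi-IN'."""
    code = locale.split("-")[0].lower()
    if code not in LANGUAGES:
        raise KeyError(f"unsupported language: {locale}")
    return LANGUAGES[code]


# Adding German takes one call; translation and synthesis code stay unchanged.
register("de", "German", "Helsinki-NLP/opus-mt-en-de")
print(lookup("de-DE").marian_model)
```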

🎯 Use Cases

Education: Multi-language classrooms and learning tools
Healthcare: Real-time doctor-patient translation
Business: Cross-lingual collaboration
Tourism: Voice translator for travel
Emergency: Humanitarian aid communication

📊 Performance

⏱️ Latency: Under 5 seconds
✅ Accuracy: 90%+ for common languages
🌍 Supports Hindi, Spanish, French, Bengali
🔁 Scalable with modular backend
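
To check the latency figure on your own hardware, each stage can be wrapped in a timer. The sketch below is a hypothetical helper, not part of the repository: `timed` is an illustrative name, and the `time.sleep` calls are placeholders for the real ASR, NMT, and TTS calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage: str, timings: dict):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


timings: dict[str, float] = {}

with timed("asr", timings):
    time.sleep(0.01)   # placeholder for the Whisper call
with timed("nmt", timings):
    time.sleep(0.01)   # placeholder for the MarianMT call
with timed("tts", timings):
    time.sleep(0.01)   # placeholder for the gTTS call

total = sum(timings.values())
for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.3f}s")
print(f"total: {total:.3f}s")
```

Timing the stages separately shows which one dominates the end-to-end delay. gTTS, for example, depends on a network round trip, so its share can vary with the connection.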

👨‍💻 Contributors

“Speak with ease, translate in a breeze” – because the world should never be lost in translation.

📝 License

This project is licensed under the MIT License.
You are free to use, modify, and distribute this software with proper attribution.
