A voice-powered AI assistant that listens to your speech, translates it to English if needed, processes it through an AI agent, and responds with synthesized speech, all in real time.
- 🎤 Voice Input — Captures speech using your microphone with Google Speech Recognition
- 🌍 Auto Translation — Automatically translates non-English speech to English
- 🤖 AI Agent — Processes queries using OpenRouter's free LLM models
- 🔊 Voice Output — Responds with natural text-to-speech (offline, no API needed)
- ⚡ Async Architecture — Built with asyncio for efficient, non-blocking operations
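
Microphone capture and speech synthesis are typically blocking calls, so one common way to keep an asyncio program responsive is to push them onto a worker thread. The sketch below illustrates that idea with the standard library only. `blocking_listen` and `heartbeat` are hypothetical stand-ins, not functions from this project, and the project's actual approach may differ.

```python
import asyncio
import time


def blocking_listen() -> str:
    """Hypothetical stand-in for a blocking call such as microphone capture."""
    time.sleep(0.1)  # simulate waiting on audio hardware
    return "hello"


async def heartbeat(ticks: list[int]) -> None:
    """Runs alongside the blocking call to show the event loop stays free."""
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0.01)


async def main() -> tuple[str, list[int]]:
    ticks: list[int] = []
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so other coroutines keep running while it waits.
    text, _ = await asyncio.gather(
        asyncio.to_thread(blocking_listen),
        heartbeat(ticks),
    )
    return text, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `blocking_listen()` directly inside a coroutine would stall every other task for the duration of the call; wrapping it in `asyncio.to_thread` avoids that.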
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Microphone    │────▶│  Speech-to-Text  │────▶│   Translator    │
│   (Voice In)    │     │   (Google API)   │     │  (to English)   │
└─────────────────┘     └──────────────────┘     └────────┬────────┘
                                                          │
                                                          ▼
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│     Speaker     │◀────│  Text-to-Speech  │◀────│    AI Agent     │
│   (Voice Out)   │     │    (pyttsx3)     │     │  (OpenRouter)   │
└─────────────────┘     └──────────────────┘     └─────────────────┘
voice-agentic-workflow/
├── src/
│ ├── ai_agents/
│ │ └── all_agents.py # Main agent configuration & runner
│ ├── agents_tools/
│ │ ├── speech_to_text.py # Voice capture & translation
│ │ └── text_to_speech.py # AI response vocalization
│ └── .env # API keys (not tracked in git)
├── pyproject.toml # Project dependencies
├── README.md
└── .gitignore
- Python 3.12+
- uv (recommended) or pip
- Working microphone
- Internet connection (for speech recognition & AI)
- Clone the repository: `git clone https://github.com/MYounus-Codes/voice-ai-agent.git`, then `cd voice-ai-agent`
- Install dependencies with uv: `uv sync`. Or with pip: `pip install -r requirements.txt`
- Set up environment variables: create a `.env` file in the `src/` directory containing `OPENROUTER_API_KEY=your_openrouter_api_key_here`. Get your free API key from OpenRouter.
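
A missing or placeholder key usually only surfaces as an authentication error on the first request. The sketch below shows one way to fail early instead. It assumes the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`); the helper name `get_openrouter_key` is hypothetical and not part of this project.

```python
import os


def get_openrouter_key(env=None) -> str:
    """Return the OpenRouter key, or raise a clear error if it is missing."""
    # Accept a plain dict for testing; default to the real environment.
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    # Treat the README placeholder the same as an unset key.
    if not key or key == "your_openrouter_api_key_here":
        raise RuntimeError("OPENROUTER_API_KEY is not set; add it to src/.env")
    return key
```

Calling this once at startup gives a readable message before any audio is captured.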
`uv run src/ai_agents/all_agents.py`

When you see `Listening for voice input... Speak clearly.`, start speaking!
Edit src/ai_agents/all_agents.py to use a different model:
model = OpenAIChatCompletionsModel(
    model="xiaomi/mimo-v2-flash:free",  # Change to any OpenRouter model
    openai_client=external_client,
)

Browse available models at OpenRouter Models.
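
If you switch models often, editing the source each time can get tedious. One possible alternative, sketched below, is to read the model name from an environment variable and fall back to the default. The variable name `OPENROUTER_MODEL` and the helper `resolve_model_name` are assumptions made for illustration; the project does not define them.

```python
import os

# Default taken from the model configuration shown above.
DEFAULT_MODEL = "xiaomi/mimo-v2-flash:free"


def resolve_model_name(env=None) -> str:
    """Pick the model from a hypothetical OPENROUTER_MODEL variable, else the default."""
    env = os.environ if env is None else env
    name = env.get("OPENROUTER_MODEL", "").strip()
    return name or DEFAULT_MODEL
```

The returned string could then be passed as the `model=` argument in place of the hard-coded name.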
Edit src/agents_tools/text_to_speech.py to adjust:
# Speech rate (words per minute)
engine.setProperty('rate', 150)
# Volume (0.0 to 1.0)
engine.setProperty('volume', 1.0)
# Voice selection (male/female depends on system)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)  # Try different indices

| Package | Purpose |
|---|---|
| `openai-agents` | AI agent framework with OpenAI-compatible API |
| `speechrecognition` | Voice capture & Google Speech-to-Text |
| `translate` | Automatic language translation |
| `pyttsx3` | Offline text-to-speech synthesis |
| `pyaudio` | Audio I/O for microphone access |
| `python-dotenv` | Environment variable management |
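
To confirm that the packages listed above are installed before running the agent, a quick check along these lines may help. The mapping from package name to import name reflects the usual import names for these packages and should be treated as an assumption; `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Package name -> module name used in `import` statements (assumed).
REQUIRED_MODULES = {
    "openai-agents": "agents",
    "speechrecognition": "speech_recognition",
    "translate": "translate",
    "pyttsx3": "pyttsx3",
    "pyaudio": "pyaudio",
    "python-dotenv": "dotenv",
}


def missing_packages(modules: dict[str, str]) -> list[str]:
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only looks the module up without importing it, so the check has no side effects such as opening an audio device.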
- Voice Capture — The microphone listens for your voice input
- Speech Recognition — Google's Speech-to-Text API converts audio to text
- Translation — If the text isn't in English, it's automatically translated
- AI Processing — The translated text is sent to an AI agent via OpenRouter
- Voice Response — The AI's response is spoken aloud using pyttsx3
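
The last three steps above can be sketched as a single coroutine that takes each stage as a parameter. This is only an illustration of the flow, not the project's code: `run_turn`, the `detected_lang` argument, and the `fake_*` stages are all hypothetical, and how the real project detects the input language is not stated here.

```python
import asyncio
from typing import Awaitable, Callable

# Each stage is an async function from text to text, so real
# implementations (translator, agent, TTS) can be swapped in.
Stage = Callable[[str], Awaitable[str]]


async def run_turn(
    heard_text: str,
    detected_lang: str,
    translate: Stage,
    ask_agent: Stage,
    speak: Stage,
) -> str:
    """Run one voice turn: translate if needed, query the agent, speak the reply."""
    # Translation: only when the recognized text is not already English.
    english = heard_text if detected_lang == "en" else await translate(heard_text)
    # AI processing: send the English text to the agent.
    reply = await ask_agent(english)
    # Voice response: vocalize the agent's reply.
    await speak(reply)
    return reply


# Stub stages used only to exercise the flow.
async def fake_translate(text: str) -> str:
    return f"[en] {text}"


async def fake_agent(text: str) -> str:
    return f"You said: {text}"


async def fake_speak(text: str) -> str:
    return text


if __name__ == "__main__":
    print(asyncio.run(run_turn("hola", "es", fake_translate, fake_agent, fake_speak)))
```

Passing the stages in as arguments keeps the flow testable without a microphone, network access, or a speaker.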
- Speak clearly and closer to the microphone
- Reduce background noise
- Check if your microphone is working
- Verify your OpenRouter API key is valid
- Generate a new key at openrouter.ai/keys
- Ensure there are no quotes around the key in `.env`
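
The quote check above can be automated with a few lines of standard-library Python. The sketch below scans the text of a `.env` file and reports variables whose values are wrapped in matching quotes. The function name `find_quoted_values` is hypothetical, and the parsing is deliberately simplified (no multi-line values or `export` prefixes).

```python
def find_quoted_values(env_text: str) -> list[str]:
    """Return the names of variables whose values are wrapped in quotes."""
    quoted = []
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        value = value.strip()
        # A value counts as quoted when it starts and ends with the same quote mark.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            quoted.append(name.strip())
    return quoted
```

For example, passing the contents of `src/.env` and getting back `["OPENROUTER_API_KEY"]` would indicate that the key line still has quotes around its value.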
On Windows: `pip install pipwin`, then `pipwin install pyaudio`

On Linux: `sudo apt-get install portaudio19-dev`, then `pip install pyaudio`

On macOS: `brew install portaudio`, then `pip install pyaudio`

Contributions are welcome! Feel free to:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenRouter for providing access to various LLM models
- OpenAI Agents SDK for the agent framework
- SpeechRecognition for voice capture capabilities
Made with ❤️ by MYounus-Codes