Skip to content

MYounus-Codes/voice-ai-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🎙️ Voice Agentic Workflow

A voice-powered AI assistant that listens to your speech, translates it to English (if needed), processes it through an AI agent, and responds with synthesized speech — all in real-time.

Python License OpenRouter

✨ Features

  • 🎤 Voice Input — Captures speech using your microphone with Google Speech Recognition
  • 🌍 Auto Translation — Automatically translates non-English speech to English
  • 🤖 AI Agent — Processes queries using OpenRouter's free LLM models
  • 🔊 Voice Output — Responds with natural text-to-speech (offline, no API needed)
  • Async Architecture — Built with asyncio for efficient, non-blocking operations

🏗️ Architecture

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Microphone    │────▶│  Speech-to-Text  │────▶│   Translator    │
│   (Voice In)    │     │  (Google API)    │     │   (to English)  │
└─────────────────┘     └──────────────────┘     └────────┬────────┘
                                                          │
                                                          ▼
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│    Speaker      │◀────│  Text-to-Speech  │◀────│    AI Agent     │
│   (Voice Out)   │     │    (pyttsx3)     │     │  (OpenRouter)   │
└─────────────────┘     └──────────────────┘     └─────────────────┘

📁 Project Structure

voice-agentic-workflow/
├── src/
│   ├── ai_agents/
│   │   └── all_agents.py      # Main agent configuration & runner
│   ├── agents_tools/
│   │   ├── speech_to_text.py  # Voice capture & translation
│   │   └── text_to_speech.py  # AI response vocalization
│   └── .env                   # API keys (not tracked in git)
├── pyproject.toml             # Project dependencies
├── README.md
└── .gitignore

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • uv (recommended) or pip
  • Working microphone
  • Internet connection (for speech recognition & AI)

Installation

  1. Clone the repository

    git clone https://github.com/MYounus-Codes/voice-ai-agent.git
    cd voice-ai-agent
  2. Install dependencies with uv

    uv sync

    Or with pip:

    pip install -r requirements.txt
  3. Set up environment variables

    Create a .env file in the src/ directory:

    OPENROUTER_API_KEY=your_openrouter_api_key_here

    Get your free API key from OpenRouter

Running the Assistant

uv run src/ai_agents/all_agents.py

When you see Listening for voice input... Speak clearly., start speaking!

🔧 Configuration

Changing the AI Model

Edit src/ai_agents/all_agents.py to use a different model:

model = OpenAIChatCompletionsModel(
    model="xiaomi/mimo-v2-flash:free",  # Change to any OpenRouter model
    openai_client=external_client
)

Browse available models at OpenRouter Models

Customizing Voice Output

Edit src/agents_tools/text_to_speech.py to adjust:

# Speech rate (words per minute)
engine.setProperty('rate', 150)

# Volume (0.0 to 1.0)
engine.setProperty('volume', 1.0)

# Voice selection (male/female depends on system)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)  # Try different indices

📦 Dependencies

Package Purpose
openai-agents AI agent framework with OpenAI-compatible API
speechrecognition Voice capture & Google Speech-to-Text
translate Automatic language translation
pyttsx3 Offline text-to-speech synthesis
pyaudio Audio I/O for microphone access
python-dotenv Environment variable management

🔍 How It Works

  1. Voice Capture — The microphone listens for your voice input
  2. Speech Recognition — Google's Speech-to-Text API converts audio to text
  3. Translation — If the text isn't in English, it's automatically translated
  4. AI Processing — The translated text is sent to an AI agent via OpenRouter
  5. Voice Response — The AI's response is spoken aloud using pyttsx3

🐛 Troubleshooting

"Could not understand the audio"

  • Speak clearly and closer to the microphone
  • Reduce background noise
  • Check if your microphone is working

"401 - User not found" Error

  • Verify your OpenRouter API key is valid
  • Generate a new key at openrouter.ai/keys
  • Ensure no quotes around the key in .env

"No module named 'pyaudio'"

On Windows:

pip install pipwin
pipwin install pyaudio

On Linux:

sudo apt-get install portaudio19-dev
pip install pyaudio

On macOS:

brew install portaudio
pip install pyaudio

🤝 Contributing

Contributions are welcome! Feel free to:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments


Made with ❤️ by MYounus-Codes

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages