Valoqui is a high-performance, real-time AI conversation partner designed for immersive language learning. This MVP focuses on ultra-low latency voice interactions, utilizing a hybrid on-device/cloud audio pipeline to provide a seamless, human-like speaking experience.
Journey • Get Started • Screenshots • Architecture • Setup • Testing • Contributing
Welcome to Valoqui! Whether you're evaluating the project, diving into the codebase, or contributing, start here:
- 🧭 Journey: How this was built, what I learned, and where the line is between my work and the AI's. Start here if you're evaluating the project.
- 🚀 Zero to Hero (Onboarding): New to the codebase? Start here for a mental model, codebase tour, and your first steps.
- 📱 App Screenshots: Explore the interface and features of the app through high-fidelity design screenshots.
- 🛠 Developer Setup: A step-by-step guide to setting up your local environment, Firebase, and required models.
- 🏛 Architecture Deep Dive: Understand the Feature-First Clean Architecture, the voice pipeline, and state management.
- 🧠 Architectural Decision Records (ADRs): Learn why decisions were made (e.g., choosing BLoC, Groq, Piper).
- 📊 Product Management: View the Product Requirements Document (PRD) and Sprint Plans to see how the project is scoped and executed.
- 📏 Coding Standards: The strict technical conventions required when contributing to this codebase.
- 🧪 Testing Strategy: Learn about our ~95% coverage, testing patterns, and how to write tests for BLoCs.
  ↳ Latest targeted use-case coverage update: Use Case Coverage Expansion (April 2026).
- 🤝 Contributing: Ready to write code? Read our guidelines for PRs, commits, and codebase rules.
To minimize latency, Valoqui leverages a highly optimized audio stack:
- VAD (Voice Activity Detection): Uses on-device Silero VAD for accurate segment-based speech detection.
- STT (Speech-to-Text): Utilizes on-device Sherpa-ONNX (Moonshine) for highly accurate, private transcription.
- TTS (Text-to-Speech): Utilizes on-device VITS/Piper with a natural Spanish voice (Lucia). Optimized for streaming using a rotating file-cache strategy to prevent stale audio.
- TTS Warm-Up: Pre-initializes the TTS engine during LLM generation to overlap initialization with other processing, eliminating 500-2000ms cold start latency (ADR-013).
- Streaming LLM: Integrates Groq (LLaMA 3.3 70B) for near-instantaneous response generation.
- Resilient Fallback: Automatically switches to Google Gemini 2.0 Flash if the primary provider fails, ensuring conversation continuity.
- Sentence-Boundary TTS: Intelligently triggers TTS playback on sentence boundaries during LLM streaming for a natural human-like cadence.
- Security First: Implements a "Bring Your Own Key" (BYOK) model. API keys are stored securely in the Android Keystore / iOS Keychain via `flutter_secure_storage`.
- Clean Network Layer: Custom Dio interceptors dynamically inject secure keys into request headers, isolating security logic from business features.
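
The interceptor pattern above can be sketched roughly as follows. This is an illustrative example, not the actual implementation: the class name, storage key (`groq_api_key`), and header format are all assumptions.

```dart
import 'package:dio/dio.dart';
import 'package:flutter_secure_storage/flutter_secure_storage.dart';

/// Sketch: reads the user's BYOK API key from platform-secure storage
/// (Android Keystore / iOS Keychain) and injects it into each request.
class ApiKeyInterceptor extends Interceptor {
  ApiKeyInterceptor(this._storage);

  final FlutterSecureStorage _storage;

  @override
  void onRequest(
    RequestOptions options,
    RequestInterceptorHandler handler,
  ) async {
    // The storage key and header scheme here are hypothetical.
    final apiKey = await _storage.read(key: 'groq_api_key');
    if (apiKey != null) {
      options.headers['Authorization'] = 'Bearer $apiKey';
    }
    handler.next(options);
  }
}

// Usage (sketch): business features never see the key.
// final dio = Dio()
//   ..interceptors.add(ApiKeyInterceptor(const FlutterSecureStorage()));
```

Because the key is resolved inside the interceptor, repositories and BLoCs stay free of any credential handling.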
- Framework: Flutter (Dart)
- Architecture: Feature-First Modular Clean Architecture
- State Management: `flutter_bloc` + `freezed` (States) & `Equatable` (Events)
- Dependency Injection: `get_it` (Service Locator)
- Functional Programming: `fpdart` (using `Either` for robust error handling)
- Networking: `dio`
- Backend: Firebase (Authentication & Firestore)
- AI/ML: Groq (LLM), Gemini (LLM Fallback), Sherpa-ONNX (VAD/STT/TTS)
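
The `Either`-based error handling and the Groq-to-Gemini fallback mentioned above could be combined along these lines. This is a minimal sketch under assumed names (`LlmProvider`, `LlmFailure`, `FallbackLlm` are illustrative, not the project's actual types):

```dart
import 'package:fpdart/fpdart.dart';

/// Sketch: a failed call returns Left(LlmFailure), a reply returns Right.
typedef LlmResult = Either<LlmFailure, String>;

class LlmFailure {
  const LlmFailure(this.message);
  final String message;
}

/// Hypothetical provider interface; Groq and Gemini would each implement it.
abstract interface class LlmProvider {
  Future<LlmResult> complete(String prompt);
}

/// Tries the primary provider (e.g. Groq); on failure, transparently
/// retries on the fallback (e.g. Gemini 2.0 Flash).
class FallbackLlm implements LlmProvider {
  const FallbackLlm({required this.primary, required this.fallback});

  final LlmProvider primary;
  final LlmProvider fallback;

  @override
  Future<LlmResult> complete(String prompt) async {
    final result = await primary.complete(prompt);
    if (result.isRight()) return result;
    // Primary failed: the caller never sees the provider switch.
    return fallback.complete(prompt);
  }
}
```

Returning `Either` instead of throwing keeps failures explicit in the type signature, so callers are forced to handle both branches rather than relying on try/catch.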
