🏆 Official Entrant of Gemini Live Agent Challenge
Transforming children's drawings into interactive imaginary friends via Multimodal Orchestration.
A Dual-Audience Platform designed for Pediatric Therapy and Emotional Engagement.
██████╗ ██████╗ ██████╗ ██████╗ ██╗ ███████╗███████╗ ██████╗ ██╗ ██╗██╗
██╔══██╗██╔═══██╗██╔═══██╗██╔══██╗██║ ██╔════╝██╔════╝██╔═══██╗██║ ██║██║
██║ ██║██║ ██║██║ ██║██║ ██║██║ █████╗ ███████╗██║ ██║██║ ██║██║
██║ ██║██║ ██║██║ ██║██║ ██║██║ ██╔══╝ ╚════██║██║ ██║██║ ██║██║
██████╔╝╚██████╔╝╚██████╔╝██████╔╝███████╗███████╗███████║╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝╚══════╝╚══════╝ ╚═════╝ ╚═════╝ ╚══════╝
Developed for the #GeminiLiveAgentChallenge
- 🎯 Overview
- 🚀 Local Spin-Up (Judge's Guide)
- ✨ Key Features
- 🏗️ Technical Architecture
- 🔒 Security & Compliance (LGPD/ECA)
- 🧠 Gemini Multimodal Integration
- 🎥 Judge Test Script
- 🛠️ Technical Debt & Production Roadmap
DoodleSoul addresses the "Clinical Blockade" in pediatric therapy. For children with ASD, ADHD, or Selective Mutism, traditional talk therapy is often perceived as a threat.
Our solution uses Technological Externalization: the child draws a character, and we use the Gemini Live API to bring it to life. By projecting internal emotions onto a "digital puppet," we bypass defensive filters, transforming a passive patient into an active storyteller.
As per the hackathon rules ("URL to your Public Code Repository"), the following instructions allow for a complete local reproduction of the DoodleSoul experience. Only the backend is hosted on Google Cloud; for evaluation purposes, running the frontend locally ensures the best performance and microphone access.
- Python 3.11+ and Node.js 20+
- A Google Cloud Project with Gemini Live API, Imagen, and Veo enabled.
git clone https://github.com/matheus896/DoodleSoul.git
cd DoodleSoul
Create a .env file in both the repository root and the backend/ directory with your key:
GOOGLE_API_KEY="YOUR_GEMINI_API_KEY"
ANIMISM_LIVE_MODE="adk"
ANIMISM_ADK_TOOL_MODE="text_fallback"
ANIMISM_DEBUG_MEDIA=0 # or 1 to enable media debug
ANIMISM_LOG_LEVEL=INFO
We recommend using uv for fast dependency management:
cd backend
pip install uv # if not installed
uv pip install -r requirements.txt
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload # local test
cd ../frontend
npm install
npm run dev
Access the platform at: http://localhost:5173/demo
DoodleSoul provides three primary entry points depending on the user role:
| Route | View | Purpose |
|---|---|---|
| `/demo` | Unified View | Recommended for Judges. Renders the Child Session and Therapist Dashboard side-by-side. Automatically syncs the `session_id` between views via `localStorage`. |
| `/session` | Child View | The main interface. Used in isolation for the patient's "Adventure" session. |
| `/therapist/live` | Therapist View | Real-time clinical monitoring. Resolves the active session via `?session_id=` or `localStorage`. Displays "Silent Alerts" and emotional state KPIs. |
The project includes a cloudbuild.yaml for serverless deployment to Google Cloud Run.
Command (PowerShell):
.\scripts\deploy_cloud.ps1 -ProjectId "YOUR_PROJECT_ID"
To prove the backend is running on Google Cloud and to capture the "Silent Alarm" events, use the evidence collection script. It queries Cloud Logging for canonical audit events (session_started, dlp_redaction_applied, etc.) and generates a JSON evidence file.
Command (PowerShell):
.\scripts\collect_epic5_evidence.ps1 -ProjectId "YOUR_PROJECT_ID" -ServiceName "YOUR_SERVICE_NAME"
The script automatically detects the latest session ID if one is not provided.
- Child Side: Continuous, low-latency voice conversation with their drawing.
- Therapist Side: A real-time dashboard receiving "Silent Alerts" and emotional state tracking via `report_clinical_alert`.
- Single-Shot Intake: Capture physical drawings via camera.
- Persona Derivation: Gemini 3.1 Flash Lite extracts voice and personality traits from visual cues.
- Cascading Media: Imagen-4 generates an immediate still, followed by Veo-3 cinematic video.
The heart of the system is a Python bridge that manages the asynchronous upstream (microphone) and downstream (Gemini voice + media events) tasks using `asyncio.wait(..., return_when=asyncio.FIRST_COMPLETED)`.
graph TD
A[Child Microphone] -->|PCM16| B[React Frontend]
B -->|WebSocket| C[FastAPI Bridge]
C -->|ADK Runner| D[Gemini Live API]
D -->|Tool Call| E[Media Interceptor]
E -->|Imagen-4| F[Still Image]
E -->|Veo-3| G[Cinematic Video]
D -->|Clinical Alert| H[Silent Alarm Store]
H -->|API| I[Therapist Dashboard]
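The upstream/downstream pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual bridge: the queue-based transport, the `None` end-of-stream sentinel, and the names `upstream`, `downstream`, and `run_bridge` are assumptions made for the example.

```python
import asyncio


async def upstream(mic: asyncio.Queue, model_in: asyncio.Queue) -> None:
    """Forward microphone chunks until the client sends None (hypothetical sentinel)."""
    while True:
        chunk = await mic.get()
        if chunk is None:
            return
        await model_in.put(chunk)


async def downstream(model_out: asyncio.Queue, client_out: list) -> None:
    """Relay model events to the client until the model sends None."""
    while True:
        event = await model_out.get()
        if event is None:
            return
        client_out.append(event)


async def run_bridge(mic, model_in, model_out, client_out) -> list:
    """Run both directions; stop the whole session as soon as either side ends."""
    tasks = {
        asyncio.create_task(upstream(mic, model_in), name="upstream"),
        asyncio.create_task(downstream(model_out, client_out), name="downstream"),
    }
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the surviving direction has no peer left to talk to
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        task.result()  # re-raise any exception from the finished task
    return sorted(task.get_name() for task in done)
```

Waiting for the first completion, rather than for both tasks, means a closed microphone stream or a dropped model connection tears down the other direction promptly instead of leaving a half-open session.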
DoodleSoul was built with Brazil's Digital Statute for Children and Adolescents (2026) and the LGPD in mind:
| Provision | Implementation |
|---|---|
| Privacy by Default | Aggressive data minimization; no raw audio persistence. |
| DLP Gatekeeper | Mandatory redaction of PII before any clinical storage. |
| Silent Alarm | Clinical alerts are filtered from the child's channel (immersion safety). |
| Audit Logs | Immutable JSON audit logs for legal and clinical accountability. |
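To illustrate how the DLP Gatekeeper, Silent Alarm, and Audit Log rows could fit together, here is a small sketch. It is not the project's DLP implementation: the regex patterns, the event shape (`type`, `note`), and the function names are assumptions for the example; only the `dlp_redaction_applied` audit event name comes from this README.

```python
import json
import re

# Deliberately simple, illustrative patterns; a real DLP service covers far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> tuple[str, int]:
    """Replace e-mail addresses and phone numbers; return the text and the hit count."""
    text, n_email = EMAIL.subn("[REDACTED_EMAIL]", text)
    text, n_phone = PHONE.subn("[REDACTED_PHONE]", text)
    return text, n_email + n_phone


def route_event(event: dict, child_channel: list, alarm_store: list, audit_log: list) -> None:
    """Send clinical alerts to the therapist store only; everything else goes to the child."""
    if event.get("type") == "clinical_alert":  # hypothetical event type
        clean, hits = redact(event.get("note", ""))
        alarm_store.append({**event, "note": clean})  # stored only after redaction
        if hits:
            audit_log.append(json.dumps({"event": "dlp_redaction_applied", "count": hits}))
        return  # the alert is never forwarded to the child's channel
    child_channel.append(event)
```

For example, routing `{"type": "clinical_alert", "note": "Call mom at +55 11 91234-5678"}` leaves the child channel untouched and stores the note as `Call mom at [REDACTED_PHONE]`.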
To solve the ~45s processing time of Veo-3, we implemented a cascading fallback:
- Imagen-4: Generates a 1024x1024 still in < 5s.
- Ken Burns UI: The frontend applies a CSS `zoom-and-pan` animation to the still.
- Veo-3: The actual video replaces the animated still once ready.
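The cascade above can be sketched as follows. The generator functions are stand-ins that only sleep (no Imagen or Veo call is made), the event shape is hypothetical, and the sketch assumes the video job starts alongside the still; in the live demo the two may be triggered by separate requests.

```python
import asyncio


async def generate_still(prompt: str) -> str:
    """Stand-in for the fast image model."""
    await asyncio.sleep(0.01)
    return f"still:{prompt}"


async def generate_video(prompt: str) -> str:
    """Stand-in for the slow video model."""
    await asyncio.sleep(0.05)
    return f"video:{prompt}"


async def cascade(prompt, emit, still_fn=generate_still, video_fn=generate_video) -> None:
    """Emit the still first, then swap in the video when (and if) it arrives."""
    video_task = asyncio.create_task(video_fn(prompt))  # start the slow job right away
    still = await still_fn(prompt)
    # The frontend animates this still with the CSS effect until the video lands.
    await emit({"kind": "still", "uri": still, "animate": "zoom-and-pan"})
    try:
        video = await video_task
    except Exception:
        return  # video failed: the animated still simply stays on screen
    await emit({"kind": "video", "uri": video})
```

Because the still is emitted before the video task is awaited, the child sees something within the image model's latency, and a failed video render degrades to the animated still rather than to an empty screen.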
- Access the Demo: Open `http://localhost:5173/demo`.
- Start Adventure: Upload a drawing and provide a name.
- Trigger Silent Alarm: Speak: "I am scared of the loud school bell."
- Observe: The voice remains warm; the Therapist Dashboard (right iframe) shows a private alert.
- Generate Magic: Say "Can you draw our adventure?" then "Can you make it move?".
- Observe: The agent keeps talking while the video renders in the background.
To ensure the "Architectural Illusion" and low-latency delivery within the hackathon deadline, the following trade-offs were made:
- In-Memory State: Clinical session states (`ClinicalSessionStore`) are currently held in-memory, requiring the Cloud Run deployment to be restricted to `--max-instances 1`. In a production environment, this would be migrated to Google Cloud Firestore.
- Asset Persistence: Media assets (Imagen/Veo) are currently served from local disk. A production-ready version would utilize Google Cloud Storage (GCS) with signed URLs for secure, scalable delivery.
- DLP Simulation: The Cloud DLP mode currently uses a local simulator for quota reliability. Production would swap this for the Google Cloud DLP API.
- CORS Policy: CORS has been left open (`*`) exclusively to streamline the demo evaluation process across different local/cloud environments.
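The single-instance restriction follows directly from keeping state in process memory. The sketch below uses a hypothetical `InMemoryAlertStore` (not the project's `ClinicalSessionStore`, whose interface is not shown here) to show why two instances would disagree about the same session.

```python
from collections import defaultdict
from threading import Lock


class InMemoryAlertStore:
    """Hypothetical per-process store: data lives only in this instance's memory."""

    def __init__(self) -> None:
        self._alerts = defaultdict(list)
        self._lock = Lock()

    def add_alert(self, session_id: str, alert: dict) -> None:
        with self._lock:
            self._alerts[session_id].append(alert)

    def list_alerts(self, session_id: str) -> list:
        with self._lock:
            return list(self._alerts.get(session_id, []))


# Two objects stand in for two Cloud Run instances behind one URL.
instance_a = InMemoryAlertStore()
instance_b = InMemoryAlertStore()

instance_a.add_alert("s-1", {"level": "watch"})   # child's socket hit instance A
seen_by_therapist = instance_b.list_alerts("s-1")  # dashboard request hit instance B
```

Here `seen_by_therapist` is empty even though an alert was recorded, which is the failure mode that an external store such as Firestore would remove.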
