DoodleSoul 🎨✨

🏆 Official Entrant of the Gemini Live Agent Challenge
Transforming children's drawings into interactive imaginary friends via Multimodal Orchestration.
A Dual-Audience Platform designed for Pediatric Therapy and Emotional Engagement.

    ██████╗  ██████╗  ██████╗ ██████╗ ██╗     ███████╗███████╗ ██████╗ ██╗   ██╗██╗     
    ██╔══██╗██╔═══██╗██╔═══██╗██╔══██╗██║     ██╔════╝██╔════╝██╔═══██╗██║   ██║██║     
    ██║  ██║██║   ██║██║   ██║██║  ██║██║     █████╗  ███████╗██║   ██║██║   ██║██║     
    ██║  ██║██║   ██║██║   ██║██║  ██║██║     ██╔══╝  ╚════██║██║   ██║██║   ██║██║     
    ██████╔╝╚██████╔╝╚██████╔╝██████╔╝███████╗███████╗███████║╚██████╔╝╚██████╔╝███████╗
    ╚═════╝  ╚═════╝  ╚═════╝ ╚═════╝ ╚══════╝╚══════╝╚══════╝ ╚═════╝ ╚═════╝ ╚══════╝

Developed for the #GeminiLiveAgentChallenge

📋 Table of Contents

🎯 Overview
🚀 Local Spin-Up (Judge's Guide)
✨ Key Features
🏗️ Technical Architecture
🔒 Security & Compliance (LGPD/ECA)
🧠 Gemini Multimodal Integration
🎥 Judge Test Script
🛠️ Technical Debt & Production Roadmap

🎯 Overview

DoodleSoul addresses the "Clinical Blockade" in pediatric therapy. For children with ASD, ADHD, or Selective Mutism, traditional talk therapy is often perceived as a threat.

Our solution uses Technological Externalization: the child draws a character, and we use the Gemini Live API to bring it to life. By projecting internal emotions onto a "digital puppet," we bypass defensive filters, transforming a passive patient into an active storyteller.

🚀 Local Spin-Up (Judge's Guide)

As per the hackathon rules ("URL to your Public Code Repository"), the following instructions allow for a complete local reproduction of the DoodleSoul experience. Only the backend is hosted on Google Cloud; for evaluation purposes, running the frontend locally ensures the best performance and microphone access.

1. Prerequisites

Python 3.11+ and Node.js 20+
A Google Cloud Project with Gemini Live API, Imagen, and Veo enabled.

2. Repository & Environment

git clone https://github.com/matheus896/DoodleSoul.git
cd DoodleSoul

Create a .env file in both the repository root and the backend/ directory, containing your key and the following settings:

GOOGLE_API_KEY="YOUR_GEMINI_API_KEY"
ANIMISM_LIVE_MODE="adk"
ANIMISM_ADK_TOOL_MODE="text_fallback"
ANIMISM_DEBUG_MEDIA=0 # or 1 to enable media debug
ANIMISM_LOG_LEVEL=INFO
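
Before starting the backend, it can help to confirm that the variables above are actually visible to the process. The following is a minimal sketch, not part of the repository: the helper name `check_env` is hypothetical, and since the README does not list other valid values, it only checks presence, the placeholder key, and the 0/1 debug flag.

```python
import os

# Variable names are taken from the .env listing above. Which other values
# the backend accepts is unknown, so only basic checks are performed.
REQUIRED = ("GOOGLE_API_KEY", "ANIMISM_LIVE_MODE", "ANIMISM_ADK_TOOL_MODE")


def check_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED:
        if not env.get(name, "").strip():
            problems.append(f"{name} is missing or empty")
    if env.get("GOOGLE_API_KEY", "") == "YOUR_GEMINI_API_KEY":
        problems.append("GOOGLE_API_KEY still holds the placeholder value")
    if env.get("ANIMISM_DEBUG_MEDIA", "0").strip() not in ("0", "1"):
        problems.append("ANIMISM_DEBUG_MEDIA should be 0 or 1")
    return problems
```

Calling `check_env()` with no arguments inspects the current process environment; passing a dict makes it easy to try out.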

3. Backend Setup

We recommend using uv for lightning-fast dependency management:

cd backend
pip install uv  # if not installed
uv pip install -r requirements.txt
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload # local test

4. Frontend Setup

cd ../frontend
npm install
npm run dev

Access the platform at: http://localhost:5173/demo

🚦 Navigation & Roles

DoodleSoul provides three primary entry points depending on the user role:

Route	View	Purpose
`/demo`	Unified View	Recommended for Judges. Renders the Child Session and Therapist Dashboard side-by-side. Automatically syncs the `session_id` between views via `localStorage`.
`/session`	Child View	The main interface. Used in isolation for the patient's "Adventure" session.
`/therapist/live`	Therapist View	Real-time clinical monitoring. Resolves the active session via `?session_id=` or `localStorage`. Displays "Silent Alerts" and emotional state KPIs.
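
The session lookup for the Therapist View can be illustrated with a short Python sketch. It is only a model of the behavior described in the table: the real logic lives in the React frontend, the function names are hypothetical, and giving the `?session_id=` parameter precedence over the stored value is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def resolve_session_id(url, stored=None):
    """Prefer ?session_id= from the URL, else fall back to a stored value.

    `stored` stands in for what the browser keeps in localStorage.
    """
    values = parse_qs(urlsplit(url).query).get("session_id", [])
    if values and values[0]:
        return values[0]
    return stored


def therapist_url(base, session_id):
    """Build a Therapist View link that pins a specific session."""
    return f"{base}/therapist/live?{urlencode({'session_id': session_id})}"
```

A link built with `therapist_url("http://localhost:5173", some_id)` lets the dashboard be opened in a separate browser, where the child's `localStorage` is not available.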

☁️ Cloud Deployment & Evidence

1. Automated Deployment

The project includes a cloudbuild.yaml for serverless deployment to Google Cloud Run.

Command (PowerShell):

.\scripts\deploy_cloud.ps1 -ProjectId "YOUR_PROJECT_ID"

2. Evidence Collection (Audit Logs)

To prove the backend is running on Google Cloud and capture the "Silent Alarm" events, use the evidence collection script. It queries Cloud Logging for canonical audit events (session_started, dlp_redaction_applied, etc.) and generates a JSON evidence file.

Command (PowerShell):

.\scripts\collect_epic5_evidence.ps1 -ProjectId "YOUR_PROJECT_ID" -ServiceName "YOUR_SERVICE_NAME"

The script automatically detects the latest session ID if not provided.
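
The filtering step can be sketched in Python for readers who do not run PowerShell. This is an illustration under assumptions, not a port of the script: it works on log entries that have already been exported as dicts, and the field names `event`, `session_id`, and `timestamp` are hypothetical.

```python
# Only these two event names appear in this README; the script's full
# canonical list is not reproduced here.
CANONICAL_EVENTS = {"session_started", "dlp_redaction_applied"}


def latest_session_id(entries):
    """Return the session_id of the most recent session_started entry."""
    starts = [e for e in entries if e.get("event") == "session_started"]
    if not starts:
        return None
    # ISO 8601 timestamps in one timezone sort correctly as plain strings.
    return max(starts, key=lambda e: e["timestamp"])["session_id"]


def build_evidence(entries, session_id=None):
    """Collect the canonical audit events for one session, oldest first."""
    session_id = session_id or latest_session_id(entries)
    matched = [
        e for e in entries
        if e.get("session_id") == session_id
        and e.get("event") in CANONICAL_EVENTS
    ]
    matched.sort(key=lambda e: e["timestamp"])
    return {"session_id": session_id, "event_count": len(matched), "events": matched}
```

The returned dict can be written out with `json.dump` to produce an evidence file.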

✨ Key Features

🎙️ Real-Time Dual-Audience Interaction

Child Side: Continuous, low-latency voice conversation with their drawing.
Therapist Side: A real-time dashboard receiving "Silent Alerts" and emotional state tracking via report_clinical_alert.
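
The dual-audience split can be pictured as a small router that sends each event to exactly one audience. The sketch below is hypothetical: the event shape (`type`, `payload`) and the class name are assumptions, and only the idea that clinical alerts never reach the child's channel comes from this README.

```python
from dataclasses import dataclass, field


@dataclass
class DualAudienceRouter:
    """Split one event stream into a child channel and a therapist channel."""

    child_channel: list = field(default_factory=list)
    therapist_alerts: list = field(default_factory=list)

    def route(self, event: dict) -> None:
        # Assumed event shape: {"type": "...", "payload": ...}
        if event.get("type") == "clinical_alert":
            # Alerts go only to the therapist; the child never sees them.
            self.therapist_alerts.append(event["payload"])
        else:
            self.child_channel.append(event)
```

Routing every event through a single function makes the "alerts stay private" rule easy to test in isolation.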

🖼️ Living Drawings Pipeline

Single-Shot Intake: Capture physical drawings via camera.
Persona Derivation: Gemini 3.1 Flash Lite extracts voice and personality traits from visual cues.
Cascading Media: Imagen-4 generates an immediate still, followed by Veo-3 cinematic video.

🏗️ Technical Architecture

The Full-Duplex Bridge

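The heart of the system is a Python bridge that runs the upstream (microphone) and downstream (Gemini voice + media events) directions as separate asyncio tasks and supervises them with asyncio.wait(..., return_when=asyncio.FIRST_COMPLETED), so the bridge can react as soon as either side finishes.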

graph TD
    A[Child Microphone] -->|PCM16| B[React Frontend]
    B -->|WebSocket| C[FastAPI Bridge]
    C -->|ADK Runner| D[Gemini Live API]
    D -->|Tool Call| E[Media Interceptor]
    E -->|Imagen-4| F[Still Image]
    E -->|Veo-3| G[Cinematic Video]
    D -->|Clinical Alert| H[Silent Alarm Store]
    H -->|API| I[Therapist Dashboard]
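
The supervision pattern behind the bridge can be sketched with plain asyncio queues. This is a simplified illustration, not the repository's code: the queues stand in for the WebSocket and the Gemini Live session, and the function names are hypothetical.

```python
import asyncio


async def upstream(mic: asyncio.Queue, model_in: asyncio.Queue) -> None:
    """Forward microphone chunks until the client signals the end (None)."""
    while True:
        chunk = await mic.get()
        if chunk is None:
            return
        await model_in.put(chunk)


async def downstream(model_out: asyncio.Queue, client: list) -> None:
    """Forward model events to the client until cancelled."""
    while True:
        client.append(await model_out.get())


async def run_bridge(mic, model_in, model_out, client) -> None:
    tasks = {
        asyncio.create_task(upstream(mic, model_in)),
        asyncio.create_task(downstream(model_out, client)),
    }
    # Return as soon as either direction ends (or fails).
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # stop the surviving direction
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        task.result()  # re-raise an error from the side that finished
```

Cancelling the pending task is what keeps a closed microphone stream from leaving an orphaned downstream loop behind.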

🔒 Security & Compliance

DoodleSoul was built with Brazil's Digital Statute for Children and Adolescents (ECA Digital, 2026) and the LGPD (Brazil's General Data Protection Law) in mind:

Provision	Implementation
Privacy by Default	Aggressive data minimization; no raw audio persistence.
DLP Gatekeeper	Mandatory redaction of PII before any clinical storage.
Silent Alarm	Clinical alerts are filtered from the child's channel (immersion safety).
Audit Logs	Immutable JSON audit logs for legal and clinical accountability.
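
The DLP Gatekeeper idea, redacting before anything is stored, can be sketched with regular expressions. The patterns below are illustrative assumptions and far less thorough than a real DLP service; they are not the project's simulator, and the function name is hypothetical.

```python
import re

# Illustrative patterns only; a production DLP service covers many more
# identifier types and locale-specific formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> tuple[str, int]:
    """Replace matches with a [LABEL] token; return the text and hit count."""
    total = 0
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        total += count
    return text, total
```

The hit count is returned separately so that a caller could record an audit event such as dlp_redaction_applied without logging the sensitive text itself.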

🧠 Gemini Multimodal Integration

The Cascading Rendering Strategy (Latency Masking)

To mask the ~45s processing time of Veo-3, we implemented a cascading fallback:

Imagen-4: Generates a 1024x1024 still in < 5s.
Ken Burns UI: The frontend applies a CSS zoom-and-pan animation to the still.
Veo-3: The actual video replaces the animated still once ready.
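
The three steps above can be sketched as two concurrent jobs whose results are published in order of readiness. The generator functions are stand-ins that sleep briefly instead of calling Imagen or Veo, and the event fields (`kind`, `asset`, `animate`) are hypothetical.

```python
import asyncio


async def generate_still(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for the fast image call
    return f"still:{prompt}"


async def generate_video(prompt: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for the slow video call
    return f"video:{prompt}"


async def cascade(prompt: str, publish) -> None:
    # Start the slow job first so it renders while the still is shown.
    video_task = asyncio.create_task(generate_video(prompt))
    still = await generate_still(prompt)
    # The frontend animates the still until the video replaces it.
    publish({"kind": "still", "asset": still, "animate": "ken_burns"})
    video = await video_task
    publish({"kind": "video", "asset": video})
```

Because the video task is created before the still is awaited, the total wait is governed by the slower job rather than the sum of both.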

🎥 Judge Test Script

Access the Demo: Open http://localhost:5173/demo.
Start Adventure: Upload a drawing and provide a name.
Trigger Silent Alarm: Speak: "I am scared of the loud school bell."
- Observe: The voice remains warm; the Therapist Dashboard (right iframe) shows a private alert.
Generate Magic: Say "Can you draw our adventure?" then "Can you make it move?".
- Observe: The agent keeps talking while the video renders in the background.

🛠️ Technical Debt & Production Roadmap

To preserve the "Architectural Illusion" and deliver low latency within the hackathon deadline, we made the following trade-offs:

In-Memory State: Clinical session state (ClinicalSessionStore) is currently held in memory, which requires restricting the Cloud Run deployment to --max-instances 1. In production, this would migrate to Google Cloud Firestore.
Asset Persistence: Media assets (Imagen/Veo) are currently served from local disk. A production-ready version would utilize Google Cloud Storage (GCS) with signed URLs for secure, scalable delivery.
DLP Simulation: The Cloud DLP mode currently uses a local simulator for quota reliability. Production would swap this for the Google Cloud DLP API.
CORS Policy: CORS has been left open (*) exclusively to streamline the demo evaluation process across different local/cloud environments.
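
The in-memory trade-off is easier to see with a small sketch. The interface and class below are hypothetical and do not mirror the actual ClinicalSessionStore; they only show why per-process state forces a single instance and how a shared backend could later sit behind the same methods.

```python
from typing import Protocol


class SessionStore(Protocol):
    """Minimal interface a persistent backend could also implement."""

    def append_alert(self, session_id: str, alert: dict) -> None: ...

    def alerts(self, session_id: str) -> list[dict]: ...


class InMemorySessionStore:
    """Keeps alerts in a per-process dict, so instances share nothing."""

    def __init__(self) -> None:
        self._alerts: dict[str, list[dict]] = {}

    def append_alert(self, session_id: str, alert: dict) -> None:
        self._alerts.setdefault(session_id, []).append(alert)

    def alerts(self, session_id: str) -> list[dict]:
        # Return a copy so callers cannot mutate stored state.
        return list(self._alerts.get(session_id, []))
```

If the child's WebSocket landed on one instance and the therapist's request on another, the second store would simply be empty, which is why the deployment is pinned to one instance for now.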

#GeminiLiveAgentChallenge

DoodleSoul: Where imagination finds its voice.

⬆ Back to Top
