TrueVoice

Clinical voice intelligence that listens for what patients don't say.

Built at the Voice AI Hack (London, 2026) — Voice & Medical track, sponsored by Thymia and Speechmatics.


What It Does

Patients routinely minimise symptoms during GP consultations: "I'm fine, just a bit tired." The words say one thing; the voice says another. TrueVoice catches that gap in real time.

During the consultation we run three signals in parallel:

  1. Medical STT (Speechmatics) — an accurate clinical transcript, with speaker diarization for single-mic mode.
  2. Voice biomarkers (Thymia Sentinel — Helios, Apollo, Psyche) — per-utterance distress, mood/energy, and affect scores.
  3. Concordance engine — a rolling matcher over the transcript for minimisation phrases ("I'm fine", "sleeping well"), gated against the biomarker window.

When the patient's words and their voice diverge, Claude Haiku 4.5 writes a one-sentence clinical gloss (< 1 s) that lands on the clinician's dashboard. At the end of the consultation, Claude Sonnet 4.6 synthesises every flag, the full transcript, and every biomarker reading into a one-page evidence report.
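The gating idea can be sketched in a few lines. This is illustrative only: the phrase list, field names, and threshold are assumptions, and the real engine lives in backend/app/services/concordance.py.

```python
from dataclasses import dataclass

# Hypothetical phrase list; the real patterns live in the concordance engine.
MINIMISATION_PHRASES = ["i'm fine", "just a bit tired", "sleeping well"]

@dataclass
class BiomarkerWindow:
    distress: float      # Helios-style distress score, assumed 0..1
    mood_energy: float   # Apollo-style mood/energy score, assumed 0..1

def concordance_flag(utterance: str, window: BiomarkerWindow,
                     distress_threshold: float = 0.6) -> bool:
    """Flag only when a minimisation phrase co-occurs with high vocal distress."""
    text = utterance.lower()
    minimises = any(p in text for p in MINIMISATION_PHRASES)
    # Gate: words alone never fire; the biomarker window must contradict them.
    return minimises and window.distress >= distress_threshold
```

The gate is what keeps the flag rate low: "I'm fine" with a calm voice is concordant and produces nothing.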

No audio is ever persisted. Rooms are ephemeral and live only in memory.


How It Works

Audio is captured in the browser, downsampled to 16 kHz PCM16 in an AudioWorklet, and streamed to FastAPI over WebSockets as 40 ms binary frames. The backend fans each frame out to Speechmatics and Thymia in parallel, merges their outputs in the concordance engine, and publishes every event onto a per-room async event bus that the clinician dashboard subscribes to.

```mermaid
flowchart LR
    MIC["Microphone\n48 kHz"] --> AW["AudioWorklet\n→ 16 kHz PCM16"]
    AW -->|"WS binary frames\n640 samples / 40 ms"| BE["Backend\n/ws/audio"]

    BE --> STT["Speechmatics\nMedical STT"]
    BE --> BIO["Thymia Biomarkers\nHelios · Apollo · Psyche"]

    STT --> CE["Concordance Engine"]
    BIO --> CE

    CE --> CL["Claude Haiku 4.5\nHot-path gloss < 1s"]
    CL --> EB["EventBus\nper-room pub/sub"]
    EB -->|"WS stream"| DASH["Clinician Dashboard"]
    DASH -->|"end of consult"| RPT["Claude Sonnet 4.6\nEvidence Report"]
```
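The frame arithmetic is worth making explicit: 40 ms at 16 kHz is exactly 640 samples, i.e. 1280 bytes of PCM16. A rough Python sketch of what the browser side does (the actual downsampler is public/pcm-worklet.js; this naive decimate-by-3 is illustrative, not the worklet's code):

```python
import struct

def downsample_48k_to_16k(samples: list[float]) -> list[float]:
    """Decimate 48 kHz float audio by 3 with a 3-tap average (illustrative)."""
    return [
        (samples[i] + samples[i + 1] + samples[i + 2]) / 3.0
        for i in range(0, len(samples) - 2, 3)
    ]

def to_pcm16_frame(samples_16k: list[float]) -> bytes:
    """Pack one 40 ms frame: 640 floats -> 1280 bytes of little-endian int16,
    the binary payload sent over /ws/audio."""
    clamped = [max(-1.0, min(1.0, s)) for s in samples_16k]
    return struct.pack(f"<{len(clamped)}h", *(int(s * 32767) for s in clamped))

# One 40 ms frame at 48 kHz is 1920 samples; after decimation, 640 at 16 kHz.
```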

Signal pipeline

| Stage | Service | Target latency |
| --- | --- | --- |
| Medical transcription | Speechmatics RT | ~200 ms |
| Distress / stress score (Helios) | Thymia | per utterance |
| Mood / energy score (Apollo) | Thymia | per utterance |
| Affect breakdown (Psyche) | Thymia | per utterance |
| Minimisation flag gloss | Claude Haiku 4.5 | < 1 s |
| End-of-consult evidence report | Claude Sonnet 4.6 | on demand |

Consultation Modes

TrueVoice runs the same pipeline in two environments:

  • Telehealth (/online) — patient and clinician on separate devices. A lightweight WebRTC signaling relay in the backend pairs them; patient audio is tee'd to the pipeline. The clinician's screen shows the video call and the live dashboard.
  • In-person (/in-person) — one laptop on the desk. A single microphone captures both voices; Speechmatics speaker diarization (max_speakers=2) attributes each utterance to clinician or patient. The dashboard lives on a second monitor.
```mermaid
graph TD
    A[TrueVoice] --> B[Telehealth]
    A --> C[In-person]
    B --> B1["Browser WebRTC\npatient + clinician on separate devices"]
    C --> C1["Single laptop mic\ndiarization splits the two speakers"]
```
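In-person mode still has to map diarization labels (e.g. "S1", "S2") to roles. One plausible heuristic, shown purely for illustration (the real attribution logic is in backend/app/services/speechmatics.py and may differ), assumes the clinician opens the consultation:

```python
def attribute_speakers(utterances: list[dict]) -> list[dict]:
    """Map diarization labels to roles by order of first appearance:
    first speaker -> clinician, second -> patient (illustrative heuristic)."""
    label_to_role: dict[str, str] = {}
    roles = ["clinician", "patient"]
    out = []
    for u in utterances:
        label = u["speaker"]
        if label not in label_to_role and len(label_to_role) < len(roles):
            label_to_role[label] = roles[len(label_to_role)]
        out.append({**u, "role": label_to_role.get(label, "unknown")})
    return out
```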

Screenshots

Landing page

Landing page showing the two consultation modes and the TrueVoice value proposition

Clinician dashboard

Live transcript lane, Helios/Apollo/Psyche biomarker bars, and concordance flag cards. Each flag pairs the minimisation phrase with the biomarker evidence that triggered it and Claude's gloss.

Clinician dashboard with live transcript, concordance meter, biomarker bars, and flag cards

Telehealth call (GP view)

The clinician's screen during a live telehealth consult. WebRTC video sits alongside the full diagnostic surface: patient tile, live diarized transcript, concordance meter, biomarker bars, and the gap panel waiting to fire when minimisation meets biomarker evidence.

Clinician telehealth view with patient video tile, live transcript, and diagnostic dashboard side by side

Evidence report

Clicking Generate Report at the end of the consult calls Claude Sonnet 4.6 with the full transcript, every biomarker reading, and every flag. The output is a structured one-page brief the GP can review and attach to the patient record.

Evidence report listing concordance gaps with quoted utterances and biomarker evidence


Tech Stack

Backend — Python 3.11 · FastAPI · WebSockets · speechmatics-rt · thymia-sentinel · Anthropic SDK · uv

Frontend — Next.js 16 · React 19 · TypeScript · Tailwind CSS 4 · AudioWorklet · WebRTC · npm

AI — Claude Haiku 4.5 (claude-haiku-4-5) for sub-second gloss · Claude Sonnet 4.6 (claude-sonnet-4-6) for end-of-consult synthesis


Getting Started

Prerequisites

| Tool | Version | Why |
| --- | --- | --- |
| Python | ≥ 3.11 | Backend runtime |
| uv | latest | Backend deps / runner |
| Node.js | ≥ 20 | Frontend runtime |
| npm | ≥ 10 | Package install (lockfile is package-lock.json) |
| A modern browser | Chrome, Edge, Safari 17+ | AudioWorklet + microphone permission |

API keys

You need three keys before running anything:

| Variable | Where to get it |
| --- | --- |
| SPEECHMATICS_API_KEY | speechmatics.com — sign up, create an API key under API Access |
| THYMIA_API_KEY | thymia.ai — request access to the Sentinel SDK |
| ANTHROPIC_API_KEY | console.anthropic.com — create a key under API Keys |

1. Backend

```bash
cd backend
cp .env.example .env
# fill in the three keys in .env
uv sync
uv run uvicorn app.main:app --reload
```

Server listens on http://localhost:8000. Verify with:

```bash
curl http://localhost:8000/health
# → {"ok":true}
```

backend/.env:

```env
SPEECHMATICS_API_KEY=your-speechmatics-key
THYMIA_API_KEY=your-thymia-key
ANTHROPIC_API_KEY=your-anthropic-key
ALLOWED_ORIGINS=http://localhost:3000
```

2. Frontend

```bash
cd frontend
npm install
npm run dev
```

Open http://localhost:3000. The frontend proxies /api/* to the backend (see next.config.ts), so no env file is needed for local dev.

To point at a non-default backend, create frontend/.env.local:

```env
NEXT_PUBLIC_BACKEND_HTTP_URL=http://localhost:8000
NEXT_PUBLIC_BACKEND_WS_URL=ws://localhost:8000
```

Using TrueVoice

Telehealth

  1. Clinician opens http://localhost:3000/online and clicks Start as clinician → a 4-digit room code is generated.
  2. Share the code with the patient (out-of-band — text, email).
  3. Patient opens /online, enters the code, clicks Join as patient, and grants microphone + camera permission.
  4. WebRTC connects the call; patient audio is simultaneously streamed to the pipeline.
  5. Clinician sees the live dashboard (transcript · biomarkers · flags) alongside the video tile.
  6. End the call → click Generate Report to get the one-page brief.
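The pairing above relies on the backend relaying SDP offers/answers and ICE candidates between the two roles. A stripped-down model of that relay logic (the real relay in app/ws/signaling.py speaks WebSockets; this queue-based sketch models only the pairing, and the names are assumptions):

```python
import asyncio

class SignalingRoom:
    """Per-room signaling relay sketch: whatever one role sends
    (offer, answer, ICE candidate) is forwarded verbatim to the other role."""

    def __init__(self) -> None:
        self.queues: dict[str, asyncio.Queue] = {
            "clinician": asyncio.Queue(),
            "patient": asyncio.Queue(),
        }

    async def send(self, from_role: str, message: dict) -> None:
        # Forward to the counterpart, never back to the sender.
        other = "patient" if from_role == "clinician" else "clinician"
        await self.queues[other].put(message)

    async def recv(self, role: str) -> dict:
        return await self.queues[role].get()
```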

In-person

  1. Open http://localhost:3000/in-person on a single laptop and click Start session.
  2. Grant microphone permission. Both people speak into the same mic; Speechmatics diarization attributes utterances to clinician vs. patient.
  3. Dashboard streams live to the same browser window (or a second monitor).
  4. Click Generate Report at the end.

Project Structure

```text
TrueVoice/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI entry + middleware + routers
│   │   ├── config.py            # Pydantic settings / secret masking
│   │   ├── models.py            # Pydantic event schema (dashboard events)
│   │   ├── rooms.py             # Ephemeral in-memory room state
│   │   ├── eventbus.py          # Per-room async pub/sub + replay buffer
│   │   ├── api/
│   │   │   ├── rooms.py         # POST /api/rooms, GET /api/rooms/{id}
│   │   │   ├── report.py        # POST/GET /api/report/{room}
│   │   │   └── debug.py         # Opt-in debug endpoints
│   │   ├── services/
│   │   │   ├── distributor.py   # Fan-out 16 kHz frames to many consumers
│   │   │   ├── speechmatics.py  # Medical STT + speaker diarization
│   │   │   ├── thymia.py        # Helios / Apollo / Psyche biomarkers
│   │   │   ├── concordance.py   # Minimisation matcher + biomarker gating
│   │   │   └── claude.py        # Hot-path gloss + report synthesis
│   │   └── ws/
│   │       ├── audio.py         # /ws/audio/{role}/{room} ingress
│   │       ├── dashboard.py     # /ws/dashboard/{room} event stream
│   │       └── signaling.py     # /ws/signal/{role}/{room} WebRTC relay
│   └── tests/                   # pytest suite (unit + live integration)
└── frontend/
    ├── app/
    │   ├── page.tsx             # Landing
    │   ├── online/              # Telehealth lobby + /patient and /clinician rooms
    │   ├── in-person/           # Single-laptop mode
    │   └── report/[room]/       # Evidence report viewer
    ├── components/
    │   ├── Dashboard.tsx        # Composed live view
    │   ├── TranscriptLane.tsx
    │   ├── BiomarkerLane.tsx
    │   ├── FlagCard.tsx
    │   ├── ConcordanceMeter.tsx
    │   ├── ClinicianVideoPanel.tsx, VideoTile.tsx, MeetingControls.tsx
    │   └── ui/                  # Shared primitives (buttons, badges, effects)
    ├── lib/
    │   ├── audioCapture.ts      # Mic → AudioWorklet → WebSocket
    │   ├── dashboardSocket.ts   # Dashboard WS client w/ reconnect
    │   └── useVideoCall.ts      # WebRTC + signaling hook
    └── public/pcm-worklet.js    # 48 kHz → 16 kHz PCM16 downsampler
```
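The per-room pub/sub with replay buffer that eventbus.py provides can be sketched as follows (illustrative only; class name, method names, and buffer size are assumptions, not the repo's actual API):

```python
import asyncio
from collections import deque

class EventBus:
    """Per-room pub/sub sketch: each subscriber gets its own queue, and a
    bounded replay buffer lets a dashboard that connects mid-consult catch up."""

    def __init__(self, replay_size: int = 256) -> None:
        self._subscribers: list[asyncio.Queue] = []
        self._replay: deque = deque(maxlen=replay_size)

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        for event in self._replay:   # replay buffered history on connect
            q.put_nowait(event)
        self._subscribers.append(q)
        return q

    def publish(self, event: dict) -> None:
        self._replay.append(event)
        for q in self._subscribers:
            q.put_nowait(event)
```

Because rooms live only in memory, dropping the room object drops the bus and its replay buffer with it, which is what makes the "no audio persisted, rooms are ephemeral" guarantee cheap to keep.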

API Surface

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /health | Liveness probe |
| POST | /api/rooms | Create an ephemeral room; returns {room_id, created_at_ms} |
| GET | /api/rooms/{room_id} | Check whether a room exists |
| POST | /api/report/{room_id} | Trigger Claude Sonnet synthesis |
| GET | /api/report/{room_id} | Fetch the generated report |
| WS | /ws/audio/{role}/{room_id}?mode=inperson | Binary 40 ms PCM16 frames in |
| WS | /ws/dashboard/{room_id} | JSON event stream out (with replay on connect) |
| WS | /ws/signal/{role}/{room_id} | WebRTC signaling relay (telehealth) |

Development

```bash
# Backend
cd backend
uv run pytest                    # unit tests
uv run pytest -m integration     # live integration tests (slower)
uv run ruff check .

# Frontend
cd frontend
npm run lint
npm run build
```

Team

| Name | GitHub |
| --- | --- |
| Joan Torres Gordo | @joant11 |
| Indigo Luksch | @IndigoLuksch |
| Oriol Morros Vilaseca | @omorros |

Disclaimer: TrueVoice is a research-grade hackathon prototype. It is not a medical device and should not be used for clinical diagnosis.

About

πŸ† Voice AI Hack London 2026, Overall Winner. Patients minimise. Voices don't. Real-time clinical voice intelligence that flags when what a patient says diverges from what their voice reveals. Built on Speechmatics medical STT, Thymia voice biomarkers, and Claude.
