Clinical voice intelligence that listens for what patients don't say.
Built at the Voice AI Hack (London, 2026) – Voice & Medical track, sponsored by Thymia and Speechmatics.
Patients routinely minimise symptoms during GP consultations: "I'm fine, just a bit tired." The words say one thing; the voice says another. TrueVoice catches that gap in real time.
During the consultation we run three signals in parallel:
- Medical STT (Speechmatics) – an accurate clinical transcript, with speaker diarization for single-mic mode.
- Voice biomarkers (Thymia Sentinel – Helios, Apollo, Psyche) – per-utterance distress, mood/energy, and affect scores.
- Concordance engine – a rolling matcher over the transcript for minimisation phrases ("I'm fine", "sleeping well"), gated against the biomarker window; a sketch of the gating idea follows this list.
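A minimal Python sketch of that gating idea. The phrase list, window length, and threshold here are illustrative stand-ins, not the shipped values in `concordance.py`:

```python
# Sketch of a minimisation matcher gated against a rolling biomarker window.
# Phrases, window length, and threshold are illustrative, not the shipped values.
from dataclasses import dataclass
import re

MINIMISATION_PATTERNS = [
    re.compile(p, re.I)
    for p in (r"\bi'?m fine\b", r"\bsleeping well\b", r"\bjust a bit tired\b")
]

@dataclass
class BiomarkerReading:
    ts: float        # epoch seconds
    distress: float  # Helios-style distress score in [0, 1]

class ConcordanceGate:
    """Flag a minimisation phrase only when recent biomarkers disagree with it."""

    def __init__(self, window_s: float = 10.0, threshold: float = 0.6):
        self.window_s = window_s
        self.threshold = threshold
        self.readings: list[BiomarkerReading] = []

    def add_reading(self, reading: BiomarkerReading) -> None:
        # Keep only the rolling window behind the newest reading.
        self.readings.append(reading)
        cutoff = reading.ts - self.window_s
        self.readings = [r for r in self.readings if r.ts >= cutoff]

    def check(self, utterance: str) -> str | None:
        if not any(p.search(utterance) for p in MINIMISATION_PATTERNS):
            return None  # the words don't minimise, nothing to gate
        if not self.readings:
            return None  # no biomarker evidence inside the window yet
        mean = sum(r.distress for r in self.readings) / len(self.readings)
        if mean >= self.threshold:
            return f"minimisation vs. mean distress {mean:.2f}"
        return None
```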
When the patient's words and their voice diverge, Claude Haiku 4.5 writes a one-sentence clinical gloss (< 1 s) that lands on the clinician's dashboard. At the end of the consult, Claude Sonnet 4.6 synthesises every flag, the full transcript, and every biomarker reading into a one-page evidence report.
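With the Anthropic Python SDK, the hot-path call could look roughly like this. The prompt wording and token budget are assumptions; only the model id comes from the stack list below:

```python
# Illustrative hot-path gloss call. Prompt and max_tokens are assumptions,
# not the shipped values; the model id is from this README's stack list.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def gloss_flag(phrase: str, distress: float) -> str:
    msg = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=60,  # one sentence keeps the round trip under a second
        messages=[{
            "role": "user",
            "content": (
                f'The patient said "{phrase}" while voice biomarkers showed '
                f"distress {distress:.2f}. State the gap in one clinical sentence."
            ),
        }],
    )
    return msg.content[0].text
```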
No audio is ever persisted. Rooms are ephemeral and live only in memory.
Audio is captured in the browser, downsampled to 16 kHz PCM16 in an AudioWorklet, and streamed to FastAPI over WebSockets as 40 ms binary frames. The backend fans each frame out to Speechmatics and Thymia in parallel, merges their outputs in the concordance engine, and publishes every event onto a per-room async event bus that the clinician dashboard subscribes to.
```mermaid
flowchart LR
    MIC["Microphone\n48 kHz"] --> AW["AudioWorklet\n→ 16 kHz PCM16"]
    AW -->|"WS binary frames\n640 samples / 40 ms"| BE["Backend\n/ws/audio"]
    BE --> STT["Speechmatics\nMedical STT"]
    BE --> BIO["Thymia Biomarkers\nHelios · Apollo · Psyche"]
    STT --> CE["Concordance Engine"]
    BIO --> CE
    CE --> CL["Claude Haiku 4.5\nHot-path gloss < 1 s"]
    CL --> EB["EventBus\nper-room pub/sub"]
    EB -->|"WS stream"| DASH["Clinician Dashboard"]
    DASH -->|"end of consult"| RPT["Claude Sonnet 4.6\nEvidence Report"]
```
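For scale: a 40 ms frame at 16 kHz is 640 samples, i.e. 1 280 bytes of PCM16, matching the diagram's edge label. The per-room pub/sub with a replay buffer can be as small as this sketch (class name and replay depth are illustrative, not lifted from `eventbus.py`):

```python
# Minimal per-room pub/sub with a replay buffer (names and replay depth are
# illustrative, not the shipped eventbus.py).
import asyncio
from collections import deque

class RoomBus:
    def __init__(self, replay_depth: int = 50):
        self.subscribers: set[asyncio.Queue] = set()
        self.replay: deque[dict] = deque(maxlen=replay_depth)

    def publish(self, event: dict) -> None:
        # Remember the event for late joiners, then fan out without blocking.
        self.replay.append(event)
        for q in self.subscribers:
            q.put_nowait(event)

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        for event in self.replay:  # replay on connect, as /ws/dashboard does
            q.put_nowait(event)
        self.subscribers.add(q)
        return q
```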
| Stage | Service | Latency / cadence |
|---|---|---|
| Medical transcription | Speechmatics RT | ~200 ms |
| Distress / stress score (Helios) | Thymia | per utterance |
| Mood / energy score (Apollo) | Thymia | per utterance |
| Affect breakdown (Psyche) | Thymia | per utterance |
| Minimisation flag gloss | Claude Haiku 4.5 | < 1 s |
| End-of-consult evidence report | Claude Sonnet 4.6 | on demand |
TrueVoice runs the same pipeline in two environments:
- Telehealth (`/online`) – patient and clinician on separate devices. A lightweight WebRTC signaling relay in the backend pairs them; patient audio is tee'd to the pipeline. The clinician's screen shows the video call and the live dashboard.
- In-person (`/in-person`) – one laptop on the desk. A single microphone captures both voices; Speechmatics speaker diarization (`max_speakers=2`) attributes each utterance to clinician or patient (see the config sketch after the diagram below). The dashboard lives on a second monitor.
```mermaid
graph TD
    A[TrueVoice] --> B[Telehealth]
    A --> C[In-person]
    B --> B1["Browser WebRTC\npatient + clinician on separate devices"]
    C --> C1["Single laptop mic\ndiarization splits the two speakers"]
```
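Single-mic mode maps onto Speechmatics' speaker-diarization settings. A sketch of the relevant config fragment; how `speechmatics.py` actually wires this through the `speechmatics-rt` client is an assumption:

```python
# Transcription-config fragment for single-mic mode. The field names match
# Speechmatics' real-time API; the wiring through speechmatics-rt may differ.
transcription_config = {
    "language": "en",
    "diarization": "speaker",
    "speaker_diarization_config": {"max_speakers": 2},  # clinician + patient
}
```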
Live transcript lane, Helios/Apollo/Psyche biomarker bars, and concordance flag cards. Each flag pairs the minimisation phrase with the biomarker evidence that triggered it and Claude's gloss.
The clinician's screen during a live telehealth consult. WebRTC video sits alongside the full diagnostic surface: patient tile, live diarized transcript, concordance meter, biomarker bars, and the gap panel waiting to fire when minimisation meets biomarker evidence.
Clicking Generate Report at the end of the consult calls Claude Sonnet 4.6 with the full transcript, every biomarker reading, and every flag. The output is a structured one-page brief the GP can review and attach to the patient record.
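Against the endpoint table below, the same trigger-then-fetch flow can be scripted. This sketch uses httpx and a hypothetical room id; the report's JSON shape isn't documented here:

```python
# Trigger Sonnet synthesis, then fetch the result (room id is hypothetical;
# the report's response shape is an assumption beyond the endpoint table).
import httpx

BASE = "http://localhost:8000"
room_id = "1234"  # hypothetical 4-digit room code

httpx.post(f"{BASE}/api/report/{room_id}")        # POST triggers the synthesis
resp = httpx.get(f"{BASE}/api/report/{room_id}")  # GET returns the report
print(resp.json())
```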
Backend – Python 3.11 · FastAPI · WebSockets · speechmatics-rt · thymia-sentinel · Anthropic SDK · uv
Frontend – Next.js 16 · React 19 · TypeScript · Tailwind CSS 4 · AudioWorklet · WebRTC · npm
AI – Claude Haiku 4.5 (`claude-haiku-4-5`) for sub-second gloss · Claude Sonnet 4.6 (`claude-sonnet-4-6`) for end-of-consult synthesis
| Tool | Version | Why |
|---|---|---|
| Python | ≥ 3.11 | Backend runtime |
| uv | latest | Backend deps / runner |
| Node.js | ≥ 20 | Frontend runtime |
| npm | ≥ 10 | Package install (lockfile is `package-lock.json`) |
| A modern browser | Chrome, Edge, Safari 17+ | AudioWorklet + microphone permission |
You need three keys before running anything:
| Variable | Where to get it |
|---|---|
| `SPEECHMATICS_API_KEY` | speechmatics.com – sign up, create an API key under API Access |
| `THYMIA_API_KEY` | thymia.ai – request access to the Sentinel SDK |
| `ANTHROPIC_API_KEY` | console.anthropic.com – create a key under API Keys |
Start the backend:

```bash
cd backend
cp .env.example .env
# fill in the three keys in .env
uv sync
uv run uvicorn app.main:app --reload
```

Server listens on http://localhost:8000. Verify with:

```bash
curl http://localhost:8000/health
# → {"ok":true}
```

`backend/.env`:

```
SPEECHMATICS_API_KEY=your-speechmatics-key
THYMIA_API_KEY=your-thymia-key
ANTHROPIC_API_KEY=your-anthropic-key
ALLOWED_ORIGINS=http://localhost:3000
```

Then the frontend:

```bash
cd frontend
npm install
npm run dev
```

Open http://localhost:3000. The frontend proxies `/api/*` to the backend (see `next.config.ts`), so no env file is needed for local dev.
To point at a non-default backend, create frontend/.env.local:
```
NEXT_PUBLIC_BACKEND_HTTP_URL=http://localhost:8000
NEXT_PUBLIC_BACKEND_WS_URL=ws://localhost:8000
```

To run a telehealth consult:

- Clinician opens http://localhost:3000/online and clicks Start as clinician – a 4-digit room code is generated.
- Share the code with the patient (out-of-band – text, email).
- Patient opens `/online`, enters the code, clicks Join as patient, and grants microphone + camera permission.
- WebRTC connects the call; patient audio is simultaneously streamed to the pipeline.
- Clinician sees the live dashboard (transcript · biomarkers · flags) alongside the video tile.
- End the call – click Generate Report to get the one-page brief.
For an in-person consult:

- Open http://localhost:3000/in-person on a single laptop and click Start session.
- Grant microphone permission. Both people speak into the same mic; Speechmatics diarization attributes utterances to clinician vs. patient.
- Dashboard streams live in the same browser window (or on a second monitor).
- Click Generate Report at the end.
```
TrueVoice/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI entry + middleware + routers
│   │   ├── config.py            # Pydantic settings / secret masking
│   │   ├── models.py            # Pydantic event schema (dashboard events)
│   │   ├── rooms.py             # Ephemeral in-memory room state
│   │   ├── eventbus.py          # Per-room async pub/sub + replay buffer
│   │   ├── api/
│   │   │   ├── rooms.py         # POST /api/rooms, GET /api/rooms/{id}
│   │   │   ├── report.py        # POST/GET /api/report/{room}
│   │   │   └── debug.py         # Opt-in debug endpoints
│   │   ├── services/
│   │   │   ├── distributor.py   # Fan-out 16 kHz frames to many consumers
│   │   │   ├── speechmatics.py  # Medical STT + speaker diarization
│   │   │   ├── thymia.py        # Helios / Apollo / Psyche biomarkers
│   │   │   ├── concordance.py   # Minimisation matcher + biomarker gating
│   │   │   └── claude.py        # Hot-path gloss + report synthesis
│   │   └── ws/
│   │       ├── audio.py         # /ws/audio/{role}/{room} ingress
│   │       ├── dashboard.py     # /ws/dashboard/{room} event stream
│   │       └── signaling.py     # /ws/signal/{role}/{room} WebRTC relay
│   └── tests/                   # pytest suite (unit + live integration)
└── frontend/
    ├── app/
    │   ├── page.tsx             # Landing
    │   ├── online/              # Telehealth lobby + /patient and /clinician rooms
    │   ├── in-person/           # Single-laptop mode
    │   └── report/[room]/       # Evidence report viewer
    ├── components/
    │   ├── Dashboard.tsx        # Composed live view
    │   ├── TranscriptLane.tsx
    │   ├── BiomarkerLane.tsx
    │   ├── FlagCard.tsx
    │   ├── ConcordanceMeter.tsx
    │   ├── ClinicianVideoPanel.tsx, VideoTile.tsx, MeetingControls.tsx
    │   └── ui/                  # Shared primitives (buttons, badges, effects)
    ├── lib/
    │   ├── audioCapture.ts      # Mic → AudioWorklet → WebSocket
    │   ├── dashboardSocket.ts   # Dashboard WS client w/ reconnect
    │   └── useVideoCall.ts      # WebRTC + signaling hook
    └── public/pcm-worklet.js    # 48 kHz → 16 kHz PCM16 downsampler
```
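Conceptually, `pcm-worklet.js` performs 3:1 decimation from 48 kHz float32 to 16 kHz PCM16. A Python sketch of the same arithmetic (the shipped worklet is JavaScript, and whether it low-pass filters before decimating is not documented here):

```python
# Conceptual 48 kHz float32 -> 16 kHz PCM16 conversion (the real worklet is
# JS; any pre-decimation filtering it does is not shown here).
import numpy as np

def downsample_48k_to_16k(frame_f32: np.ndarray) -> bytes:
    decimated = frame_f32[::3]                     # 48 000 / 16 000 = 3:1
    pcm16 = np.clip(decimated, -1.0, 1.0) * 32767  # map [-1, 1] to int16 range
    return pcm16.astype(np.int16).tobytes()

# A 40 ms frame at 16 kHz is 640 samples = 1 280 bytes, matching /ws/audio.
```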
| Method | Path | Purpose |
|---|---|---|
| GET | `/health` | Liveness probe |
| POST | `/api/rooms` | Create an ephemeral room, returns `{room_id, created_at_ms}` |
| GET | `/api/rooms/{room_id}` | Check whether a room exists |
| POST | `/api/report/{room_id}` | Trigger Claude Sonnet synthesis |
| GET | `/api/report/{room_id}` | Fetch the generated report |
| WS | `/ws/audio/{role}/{room_id}?mode=inperson` | Binary 40 ms PCM16 frames in |
| WS | `/ws/dashboard/{room_id}` | JSON event stream out (with replay on connect) |
| WS | `/ws/signal/{role}/{room_id}` | WebRTC signaling relay (telehealth) |
```bash
# Backend
cd backend
uv run pytest                  # unit tests
uv run pytest -m integration   # live integration tests (slower)
uv run ruff check .

# Frontend
cd frontend
npm run lint
npm run build
```

| Name | GitHub |
|---|---|
| Joan Torres Gordo | @joant11 |
| Indigo Luksch | @IndigoLuksch |
| Oriol Morros Vilaseca | @omorros |
Disclaimer: TrueVoice is a research-grade hackathon prototype. It is not a medical device and should not be used for clinical diagnosis.



