Skip to content

Latest commit

 

History

History
306 lines (217 loc) · 14 KB

File metadata and controls

306 lines (217 loc) · 14 KB

Architecture and Vision

M4 is infrastructure for AI-assisted clinical research. This document explains the design philosophy, architecture, and why M4 exists.

Why M4?

The Problem: Clinical Semantics

LLM-assisted clinical data exploration fails most often due to clinical semantics, not SQL syntax. An LLM can write syntactically correct SQL, but it doesn't know:

  • What "sepsis" maps to in ICD codes
  • Which lab values indicate kidney function (creatinine? GFR? both?)
  • That SOFA scores require specific chartevents itemids
  • How to join MIMIC-IV tables without duplicating rows
  • That eICU structures data differently than MIMIC-IV

Without this knowledge, even sophisticated models produce queries that are syntactically correct but clinically meaningless.

The Solution: Infrastructure Layer

M4 addresses this by providing three layers of clinical intelligence:

  1. Schema Documentation: Tables and columns annotated with clinical meaning, not just data types
  2. Concept Mappings: Curated mappings from clinical concepts ("sepsis", "kidney function") to database-specific implementations
  3. Agent Skills: Skills for the Python API and validated clinical research patterns (SOFA scoring, Sepsis-3 criteria, KDIGO AKI staging) extracted from MIT-LCP repositories. For the canonical list, see src/m4/skills/SKILLS_INDEX.md.

This transforms M4 from "an MCP server that runs SQL" into "infrastructure that understands clinical research."

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                        AI Clients                           │
│          (Claude Desktop, Cursor, Claude Code)              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     M4 Infrastructure                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  MCP Server  │  │  Python API  │  │ Agent Skills │       │
│  │   (tools)    │  │ (code exec)  │  │ (knowledge)  │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│                              │                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │           Modality-Based Tool System             │       │
│  │  (TABULAR, NOTES, future: WAVEFORMS, IMAGING)    │       │
│  └──────────────────────────────────────────────────┘       │
│                              │                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │             Backend Abstraction                  │       │
│  │        (DuckDB local, BigQuery cloud)            │       │
│  └──────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Clinical Datasets                       │
│       MIMIC-IV  │  MIMIC-IV-Note  │  eICU  │  Custom        │
└─────────────────────────────────────────────────────────────┘

Three Access Patterns

1. MCP Server (Natural Language)

For exploratory questions: "What tables exist?", "Show me patients over 80", "Search notes for pneumonia."

AI Client → MCP Protocol → M4 Tools → Backend → Dataset

2. Python API (Code Execution)

For complex analysis: multi-step workflows, statistical computations, survival analysis.

from m4 import set_dataset, execute_query
set_dataset("mimic-iv")
df = execute_query("SELECT * FROM mimiciv_hosp.patients")  # Returns DataFrame

3. Agent Skills (Knowledge)

For clinical intelligence: validated SQL patterns, concept definitions, methodological guidance.

User: "Calculate SOFA scores for my sepsis cohort"
Claude: [Uses sofa-score skill with validated MIT-LCP SQL]

Core Design Principles

1. Modality-Based Tools

Datasets contain different data types (modalities):

  • TABULAR: Structured tables (labs, demographics, vitals)
  • NOTES: Clinical narratives (discharge summaries, radiology reports)
  • Future: WAVEFORMS, IMAGING

Tools declare which modalities they require:

class ExecuteQueryTool:
    required_modalities = frozenset({Modality.TABULAR})

class SearchNotesTool:
    required_modalities = frozenset({Modality.NOTES})

When you switch datasets, M4 automatically shows only compatible tools. Query a tabular dataset? You get SQL tools. Switch to MIMIC-IV-Note? You get notes search tools.

2. Backend Abstraction

The same queries work on local files or cloud databases:

Backend Use Case Setup
DuckDB Local development, data governance Parquet files on disk
BigQuery Full datasets, cloud collaboration Google Cloud credentials

Researchers can prototype locally with the demo dataset, then scale to full MIMIC-IV on BigQuery without changing queries.

3. Cross-Dataset Portability

The same clinical question should work across datasets:

"Find patients with sepsis"

On MIMIC-IV: Uses ICD-10 codes, mimiciv_derived tables On eICU: Uses ICD-9 codes, different table structure

M4's skills and concept mappings handle these translations, enabling external validation studies without per-dataset engineering.

Component Details

MCP Server (mcp_server.py)

FastMCP server exposing tools via Model Context Protocol. Thin adapter over core functionality.

Tools exposed:

  • Dataset management: list_datasets, set_dataset
  • Tabular: get_database_schema, get_table_info, execute_query
  • Notes: search_notes, get_note, list_patient_notes

Python API (api.py)

Direct programmatic access returning native Python types:

Function Returns
execute_query(sql) pd.DataFrame
get_schema() dict
get_table_info(table) dict with DataFrame values
search_notes(query) dict with DataFrame values

Tool System (core/tools/)

Protocol-based design (structural typing). Tools implement:

class Tool(Protocol):
    name: str
    description: str
    required_modalities: frozenset[Modality]

    def invoke(self, dataset, params) -> ToolOutput: ...
    def is_compatible(self, dataset) -> bool: ...

The ToolSelector filters tools based on active dataset modalities.

Derived Table System (core/derived/)

Pre-computed clinical concept tables materialized from vendored SQL. This system provides ~63 derived tables for MIMIC-IV, covering severity scores, sepsis definitions, organ failure staging, medications, measurements, and more.

Architecture:

core/derived/
├── __init__.py              # Public API: materialize_all()
├── materializer.py          # Orchestrates CREATE TABLE execution
└── builtins/
    ├── __init__.py           # Execution order parsing, listing
    └── mimic_iv/
        ├── duckdb.sql        # Orchestrator (defines dependency order)
        ├── score/            # SOFA, SAPS-II, APACHE III, OASIS, LODS, SIRS
        ├── sepsis/           # Sepsis-3, suspicion of infection
        ├── organfailure/     # KDIGO creatinine/UO/stages, MELD
        ├── medication/       # Vasopressors, antibiotics, ACE inhibitors, NSAIDs, etc.
        ├── measurement/      # Labs, vitals, GCS, blood gas, urine output, etc.
        ├── demographics/     # Age, ICU stay details, weight durations
        ├── firstday/         # First-day labs, vitals, GCS, SOFA, etc.
        ├── treatment/        # Ventilation, RRT, CRRT, invasive lines
        └── comorbidity/      # Charlson comorbidity index

Vendored SQL approach: The SQL files are sourced from the mimic-code repository, which is peer-reviewed and used in hundreds of published studies. The files are vendored (copied into M4's source tree) rather than fetched at runtime, ensuring reproducibility and offline operation. Each SQL file creates a table in the mimiciv_derived schema.

Materialization flow:

  1. User runs m4 init-derived mimic-iv (or accepts the prompt during m4 init mimic-iv)
  2. The materializer opens a read-write DuckDB connection to the existing database
  3. A pre-flight check verifies that required base schemas (mimiciv_hosp, mimiciv_icu) exist — these are created by m4 init via schema mapping
  4. It drops and recreates the mimiciv_derived schema
  5. The duckdb.sql orchestrator defines execution order (respecting inter-table dependencies)
  6. Each SQL file is executed sequentially, creating tables like mimiciv_derived.sofa, mimiciv_derived.sepsis3, etc.
  7. Tables become immediately queryable via execute_query

Schema dependency: The vendored SQL references schema-qualified tables (e.g., mimiciv_icu.chartevents, mimiciv_hosp.patients). These schemas are created by m4 init using the dataset's schema_mapping. If you initialized your database with an older version of M4 that used flat naming (e.g., icu_chartevents), you must reinitialize with m4 init mimic-iv --force before running init-derived.

DuckDB-specific SQL modifications: Most vendored SQL is used as-is from mimic-code. Where the upstream SQL causes severe performance problems under DuckDB (but runs fine on PostgreSQL/BigQuery), we apply targeted rewrites that preserve identical output. These are documented in comments at the top of each modified file. Current modifications:

  • sepsis/suspicion_of_infection.sql: The upstream query uses OR conditions in its JOIN clauses (one branch for cultures with charttime, another for cultures with only chartdate). OR in join predicates prevents DuckDB's IEJoin range-join optimization, causing nested-loop scans (~18 min on full MIMIC-IV). The rewrite splits the OR into two separate INNER JOIN + UNION ALL paths with identical comparison operators, enabling IEJoin on each path.

Backend note: Derived table materialization is a DuckDB-only operation. BigQuery users already have these tables available at physionet-data.mimiciv_derived and do not need to run init-derived.

Dataset support: Currently limited to MIMIC-IV (full dataset). Not available for mimic-iv-demo or eICU.

Backend System (core/backends/)

Backend protocol for query execution:

class Backend(Protocol):
    def execute_query(self, sql, dataset) -> QueryResult: ...
    def get_table_list(self, dataset) -> list[str]: ...

Implementations handle database-specific details (DuckDB views, BigQuery schemas).

Data Flow Example

User asks: "What's the mortality rate for patients with AKI?"

Without M4:

  1. LLM guesses at table names, column names
  2. May not know KDIGO staging criteria
  3. Produces syntactically correct but clinically incorrect SQL

With M4:

  1. kdigo-aki-staging skill activates with validated SQL
  2. LLM uses proper MIMIC-IV table joins
  3. Query uses peer-reviewed AKI definitions from MIT-LCP
# Claude Code with M4 skills generates:
from m4 import set_dataset, execute_query

set_dataset("mimic-iv")

# KDIGO AKI staging (validated)
aki_cohort = execute_query("""
    SELECT
        k.stay_id,
        k.aki_stage,
        a.hospital_expire_flag
    FROM mimiciv_derived.kdigo_stages k
    JOIN admissions a ON k.hadm_id = a.hadm_id
    WHERE k.aki_stage >= 1
""")

mortality_rate = aki_cohort['hospital_expire_flag'].mean()

Supported Datasets

Dataset Modalities Patients Access Derived Tables
mimic-iv-demo TABULAR 100 Free No
mimic-iv TABULAR 365k PhysioNet credentialed Yes (63 tables)
mimic-iv-note NOTES 331k notes PhysioNet credentialed No
eicu TABULAR 200k+ PhysioNet credentialed No

Custom datasets can be added via JSON definition. See Custom Datasets.

Future Directions

Additional Modalities

  • WAVEFORMS: ECG, arterial blood pressure waveforms
  • IMAGING: Chest X-rays, CT scans

Enhanced Clinical Semantics

  • Semantic search over clinical notes (beyond keyword matching)
  • Entity extraction from unstructured text
  • Expanded concept mappings for more clinical domains

Provenance and Reproducibility

  • Query logging with timestamps
  • Session export/replay
  • Result fingerprints for audit trails

References