Architecture and Vision

M4 is infrastructure for AI-assisted clinical research. This document explains the design philosophy, architecture, and why M4 exists.

Why M4?

The Problem: Clinical Semantics

LLM-assisted clinical data exploration fails most often due to clinical semantics, not SQL syntax. An LLM can write syntactically correct SQL, but it doesn't know:

What "sepsis" maps to in ICD codes
Which lab values indicate kidney function (creatinine? GFR? both?)
That SOFA scores require specific chartevents itemids
How to join MIMIC-IV tables without duplicating rows
That eICU structures data differently than MIMIC-IV

Without this knowledge, even sophisticated models produce queries that are syntactically correct but clinically meaningless.

The Solution: Infrastructure Layer

M4 addresses this by providing three layers of clinical intelligence:

Schema Documentation: Tables and columns annotated with clinical meaning, not just data types
Concept Mappings: Curated mappings from clinical concepts ("sepsis", "kidney function") to database-specific implementations
Agent Skills: Skills for the Python API and validated clinical research patterns (SOFA scoring, Sepsis-3 criteria, KDIGO AKI staging) extracted from MIT-LCP repositories. For the canonical list, see src/m4/skills/SKILLS_INDEX.md.

This transforms M4 from "an MCP server that runs SQL" into "infrastructure that understands clinical research."

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                        AI Clients                           │
│          (Claude Desktop, Cursor, Claude Code)              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     M4 Infrastructure                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  MCP Server  │  │  Python API  │  │ Agent Skills │       │
│  │   (tools)    │  │ (code exec)  │  │ (knowledge)  │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│                              │                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │           Modality-Based Tool System             │       │
│  │  (TABULAR, NOTES, future: WAVEFORMS, IMAGING)    │       │
│  └──────────────────────────────────────────────────┘       │
│                              │                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │             Backend Abstraction                  │       │
│  │        (DuckDB local, BigQuery cloud)            │       │
│  └──────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Clinical Datasets                       │
│       MIMIC-IV  │  MIMIC-IV-Note  │  eICU  │  Custom        │
└─────────────────────────────────────────────────────────────┘

Three Access Patterns

1. MCP Server (Natural Language)

For exploratory questions: "What tables exist?", "Show me patients over 80", "Search notes for pneumonia."

AI Client → MCP Protocol → M4 Tools → Backend → Dataset

2. Python API (Code Execution)

For complex analysis: multi-step workflows, statistical computations, survival analysis.

from m4 import set_dataset, execute_query
set_dataset("mimic-iv")
df = execute_query("SELECT * FROM mimiciv_hosp.patients")  # Returns DataFrame

3. Agent Skills (Knowledge)

For clinical intelligence: validated SQL patterns, concept definitions, methodological guidance.

User: "Calculate SOFA scores for my sepsis cohort"
Claude: [Uses sofa-score skill with validated MIT-LCP SQL]

Core Design Principles

1. Modality-Based Tools

Datasets contain different data types (modalities):

TABULAR: Structured tables (labs, demographics, vitals)
NOTES: Clinical narratives (discharge summaries, radiology reports)
Future: WAVEFORMS, IMAGING

Tools declare which modalities they require:

class ExecuteQueryTool:
    required_modalities = frozenset({Modality.TABULAR})

class SearchNotesTool:
    required_modalities = frozenset({Modality.NOTES})

When you switch datasets, M4 automatically shows only compatible tools. Query a tabular dataset? You get SQL tools. Switch to MIMIC-IV-Note? You get notes search tools.

2. Backend Abstraction

The same queries work on local files or cloud databases:

Backend	Use Case	Setup
DuckDB	Local development, data governance	Parquet files on disk
BigQuery	Full datasets, cloud collaboration	Google Cloud credentials

Researchers can prototype locally with the demo dataset, then scale to full MIMIC-IV on BigQuery without changing queries.

3. Cross-Dataset Portability

The same clinical question should work across datasets:

"Find patients with sepsis"

On MIMIC-IV: Uses ICD-10 codes, mimiciv_derived tables On eICU: Uses ICD-9 codes, different table structure

M4's skills and concept mappings handle these translations, enabling external validation studies without per-dataset engineering.

Component Details

MCP Server (`mcp_server.py`)

FastMCP server exposing tools via Model Context Protocol. Thin adapter over core functionality.

Tools exposed:

Dataset management: list_datasets, set_dataset
Tabular: get_database_schema, get_table_info, execute_query
Notes: search_notes, get_note, list_patient_notes

Python API (`api.py`)

Direct programmatic access returning native Python types:

Function	Returns
`execute_query(sql)`	`pd.DataFrame`
`get_schema()`	`dict`
`get_table_info(table)`	`dict` with DataFrame values
`search_notes(query)`	`dict` with DataFrame values

Tool System (`core/tools/`)

Protocol-based design (structural typing). Tools implement:

class Tool(Protocol):
    name: str
    description: str
    required_modalities: frozenset[Modality]

    def invoke(self, dataset, params) -> ToolOutput: ...
    def is_compatible(self, dataset) -> bool: ...

The ToolSelector filters tools based on active dataset modalities.

Derived Table System (`core/derived/`)

Pre-computed clinical concept tables materialized from vendored SQL. This system provides ~63 derived tables for MIMIC-IV, covering severity scores, sepsis definitions, organ failure staging, medications, measurements, and more.

Architecture:

core/derived/
├── __init__.py              # Public API: materialize_all()
├── materializer.py          # Orchestrates CREATE TABLE execution
└── builtins/
    ├── __init__.py           # Execution order parsing, listing
    └── mimic_iv/
        ├── duckdb.sql        # Orchestrator (defines dependency order)
        ├── score/            # SOFA, SAPS-II, APACHE III, OASIS, LODS, SIRS
        ├── sepsis/           # Sepsis-3, suspicion of infection
        ├── organfailure/     # KDIGO creatinine/UO/stages, MELD
        ├── medication/       # Vasopressors, antibiotics, ACE inhibitors, NSAIDs, etc.
        ├── measurement/      # Labs, vitals, GCS, blood gas, urine output, etc.
        ├── demographics/     # Age, ICU stay details, weight durations
        ├── firstday/         # First-day labs, vitals, GCS, SOFA, etc.
        ├── treatment/        # Ventilation, RRT, CRRT, invasive lines
        └── comorbidity/      # Charlson comorbidity index

Vendored SQL approach: The SQL files are sourced from the mimic-code repository, which is peer-reviewed and used in hundreds of published studies. The files are vendored (copied into M4's source tree) rather than fetched at runtime, ensuring reproducibility and offline operation. Each SQL file creates a table in the mimiciv_derived schema.

Materialization flow:

User runs m4 init-derived mimic-iv (or accepts the prompt during m4 init mimic-iv)
The materializer opens a read-write DuckDB connection to the existing database
A pre-flight check verifies that required base schemas (mimiciv_hosp, mimiciv_icu) exist — these are created by m4 init via schema mapping
It drops and recreates the mimiciv_derived schema
The duckdb.sql orchestrator defines execution order (respecting inter-table dependencies)
Each SQL file is executed sequentially, creating tables like mimiciv_derived.sofa, mimiciv_derived.sepsis3, etc.
Tables become immediately queryable via execute_query

Schema dependency: The vendored SQL references schema-qualified tables (e.g., mimiciv_icu.chartevents, mimiciv_hosp.patients). These schemas are created by m4 init using the dataset's schema_mapping. If you initialized your database with an older version of M4 that used flat naming (e.g., icu_chartevents), you must reinitialize with m4 init mimic-iv --force before running init-derived.

DuckDB-specific SQL modifications: Most vendored SQL is used as-is from mimic-code. Where the upstream SQL causes severe performance problems under DuckDB (but runs fine on PostgreSQL/BigQuery), we apply targeted rewrites that preserve identical output. These are documented in comments at the top of each modified file. Current modifications:

sepsis/suspicion_of_infection.sql: The upstream query uses OR conditions in its JOIN clauses (one branch for cultures with charttime, another for cultures with only chartdate). OR in join predicates prevents DuckDB's IEJoin range-join optimization, causing nested-loop scans (~18 min on full MIMIC-IV). The rewrite splits the OR into two separate INNER JOIN + UNION ALL paths with identical comparison operators, enabling IEJoin on each path.

Backend note: Derived table materialization is a DuckDB-only operation. BigQuery users already have these tables available at physionet-data.mimiciv_derived and do not need to run init-derived.

Dataset support: Currently limited to MIMIC-IV (full dataset). Not available for mimic-iv-demo or eICU.

Backend System (`core/backends/`)

Backend protocol for query execution:

class Backend(Protocol):
    def execute_query(self, sql, dataset) -> QueryResult: ...
    def get_table_list(self, dataset) -> list[str]: ...

Implementations handle database-specific details (DuckDB views, BigQuery schemas).

Data Flow Example

User asks: "What's the mortality rate for patients with AKI?"

Without M4:

LLM guesses at table names, column names
May not know KDIGO staging criteria
Produces syntactically correct but clinically incorrect SQL

With M4:

kdigo-aki-staging skill activates with validated SQL
LLM uses proper MIMIC-IV table joins
Query uses peer-reviewed AKI definitions from MIT-LCP

# Claude Code with M4 skills generates:
from m4 import set_dataset, execute_query

set_dataset("mimic-iv")

# KDIGO AKI staging (validated)
aki_cohort = execute_query("""
    SELECT
        k.stay_id,
        k.aki_stage,
        a.hospital_expire_flag
    FROM mimiciv_derived.kdigo_stages k
    JOIN admissions a ON k.hadm_id = a.hadm_id
    WHERE k.aki_stage >= 1
""")

mortality_rate = aki_cohort['hospital_expire_flag'].mean()

Supported Datasets

Dataset	Modalities	Patients	Access	Derived Tables
mimic-iv-demo	TABULAR	100	Free	No
mimic-iv	TABULAR	365k	PhysioNet credentialed	Yes (63 tables)
mimic-iv-note	NOTES	331k notes	PhysioNet credentialed	No
eicu	TABULAR	200k+	PhysioNet credentialed	No

Custom datasets can be added via JSON definition. See Custom Datasets.

Future Directions

Additional Modalities

WAVEFORMS: ECG, arterial blood pressure waveforms
IMAGING: Chest X-rays, CT scans

Enhanced Clinical Semantics

Semantic search over clinical notes (beyond keyword matching)
Entity extraction from unstructured text
Expanded concept mappings for more clinical domains

Provenance and Reproducibility

Query logging with timestamps
Session export/replay
Result fingerprints for audit trails

References

MIMIC-IV: https://mimic.mit.edu/docs/iv/
eICU: https://eicu-crd.mit.edu/
MIT-LCP Code Repositories: https://github.com/MIT-LCP
Model Context Protocol: https://modelcontextprotocol.io/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Architecture and Vision

Why M4?

The Problem: Clinical Semantics

The Solution: Infrastructure Layer

Architecture Overview

Three Access Patterns

Core Design Principles

1. Modality-Based Tools

2. Backend Abstraction

3. Cross-Dataset Portability

Component Details

MCP Server (`mcp_server.py`)

Python API (`api.py`)

Tool System (`core/tools/`)

Derived Table System (`core/derived/`)

Backend System (`core/backends/`)

Data Flow Example

Supported Datasets

Future Directions

Additional Modalities

Enhanced Clinical Semantics

Provenance and Reproducibility

References

FilesExpand file tree

ARCHITECTURE.md

Latest commit

History

ARCHITECTURE.md

File metadata and controls

Architecture and Vision

Why M4?

The Problem: Clinical Semantics

The Solution: Infrastructure Layer

Architecture Overview

Three Access Patterns

Core Design Principles

1. Modality-Based Tools

2. Backend Abstraction

3. Cross-Dataset Portability

Component Details

MCP Server (mcp_server.py)

Python API (api.py)

Tool System (core/tools/)

Derived Table System (core/derived/)

Backend System (core/backends/)

Data Flow Example

Supported Datasets

Future Directions

Additional Modalities

Enhanced Clinical Semantics

Provenance and Reproducibility

References

MCP Server (`mcp_server.py`)

Python API (`api.py`)

Tool System (`core/tools/`)

Derived Table System (`core/derived/`)

Backend System (`core/backends/`)