Skip to content

Latest commit

 

History

History
189 lines (143 loc) · 5.05 KB

File metadata and controls

189 lines (143 loc) · 5.05 KB

M4 - Installation Guide for AI Agents

This guide helps AI agents like Cline, Cursor, and other MCP clients install and configure M4 for clinical data analysis.

What is M4?

M4 is infrastructure for AI-assisted clinical research. It provides:

  • MCP Server: Query clinical datasets (MIMIC-IV, eICU) via natural language
  • Python API: Direct programmatic access returning pandas DataFrames
  • Clinical Skills: A set of bundled skills for the Python API and validated clinical research patterns.

Installation

Option 1: uvx (Zero-Installation)

uvx m4-infra

Option 2: pip

pip install m4-infra

MCP Configuration

DuckDB Backend (Recommended for Getting Started)

{
  "mcpServers": {
    "m4": {
      "command": "uvx",
      "args": ["m4-infra"]
    }
  }
}

Features:

  • Demo database (100 patients) downloads automatically
  • No setup required
  • Perfect for testing and development

BigQuery Backend (Full Datasets)

First, switch the active backend:

m4 backend bigquery

Then use this MCP configuration:

{
  "mcpServers": {
    "m4": {
      "command": "uvx",
      "args": ["m4-infra"],
      "env": {
        "M4_PROJECT_ID": "user-project-id"
      }
    }
  }
}

Prerequisites:

  1. Google Cloud credentials configured (gcloud auth application-default login)
  2. PhysioNet credentialed access to MIMIC-IV or eICU

Available MCP Tools

Dataset Management

Tool Description
list_datasets List available datasets and their status
set_dataset Switch the active dataset

Tabular Data (MIMIC-IV, eICU)

Tool Description
get_database_schema List all tables in the database
get_table_info Get column details and sample data for a table
execute_query Run SQL SELECT queries

Clinical Notes (MIMIC-IV-Note)

Tool Description
search_notes Full-text search with snippets
get_note Retrieve a single note by ID
list_patient_notes List notes for a patient (metadata only)

Python API (Code Execution)

For AI agents with code execution capabilities (Claude Code, Cursor), M4 provides a Python API that returns native types instead of formatted strings:

from m4 import set_dataset, execute_query, get_schema

set_dataset("mimic-iv")

# Returns pandas DataFrame
df = execute_query("""
    SELECT subject_id, gender, anchor_age
    FROM mimiciv_hosp.patients
    WHERE anchor_age > 65
""")

# Full pandas power: filter, aggregate, visualize
print(df.describe())
df.plot(kind='bar', x='gender', y='anchor_age')

When to use the Python API:

  • Multi-step analyses where each query informs the next
  • Statistical computations and survival analysis
  • Large result sets that shouldn't flood context
  • Building reproducible analysis notebooks

Clinical Research Skills

M4 ships with a set of bundled skills for the Python API (m4-api) and clinical research patterns extracted from MIT-LCP validated code.

For the canonical list of bundled skills, see src/m4/skills/SKILLS_INDEX.md.

Severity Scores

  • sofa-score - Sequential Organ Failure Assessment
  • apsiii-score - APACHE III with mortality prediction
  • sapsii-score - SAPS-II score
  • oasis-score - Oxford Acute Severity of Illness Score
  • lods-score - Logistic Organ Dysfunction Score
  • sirs-criteria - Systemic Inflammatory Response Syndrome

Sepsis and Infection

  • sepsis-3-cohort - Sepsis-3 identification (SOFA >= 2 + infection)
  • suspicion-of-infection - Suspected infection events

Organ Failure

  • kdigo-aki-staging - KDIGO AKI staging (creatinine + urine output)

Medications

  • vasopressor-equivalents - Norepinephrine-equivalent dose

Laboratory

  • baseline-creatinine - Baseline creatinine estimation
  • gcs-calculation - Glasgow Coma Scale extraction

Cohort Definitions

  • first-icu-stay - First ICU stay selection

Data Quality

  • mimic-table-relationships - Table relationships and join patterns
  • mimic-eicu-mapping - Cross-database concept mapping
  • clinical-research-pitfalls - Common methodological mistakes

Install skills for Claude Code:

m4 skills --tools claude

Verification

After configuration, test by asking:

  • "What tables are available in the database?"
  • "Show me patient demographics"
  • "Search for notes mentioning diabetes"

Troubleshooting

DuckDB backend fails:

  • The demo database downloads automatically on first query
  • No manual m4 init needed when using uvx

BigQuery backend fails:

  • Verify GCP authentication: gcloud auth list
  • Confirm PhysioNet access to the dataset
  • Check project ID is correct

Tools not appearing:

  • Ensure the MCP client configuration is valid JSON
  • Restart the MCP client after configuration changes

Resources