ClaimLens NPDB

ClaimLens NPDB is a healthcare analytics dashboard for exploring U.S. medical malpractice payment patterns using the National Practitioner Data Bank Public Use Data File.

The project converts a raw, coded government dataset into an interactive analytics product focused on malpractice allegation types, injury severity, payment concentration, geography, practitioner fields, reporting lag, and data reliability.

Project Objective

The objective of ClaimLens NPDB is to make public malpractice payment data easier to understand, analyze, and communicate in a responsible healthcare analytics context.

The dashboard helps answer questions such as:

Which medical malpractice allegation categories account for the highest payment totals?
How do payment patterns differ across injury severity levels?
How have report volume and median payment amounts changed over time?
Which practitioner license fields are associated with the largest payment concentration?
Which states show the highest reported malpractice payment totals?
How complete and reliable are the fields used in the analysis?

This is not a clinical diagnosis tool, provider ranking system, or negligence detector. It is a healthcare data analytics project built around de-identified public-use malpractice payment records.

Why This Project Matters

Medical malpractice data is high-stakes, sensitive, and easy to misinterpret. Raw NPDB public-use records are coded, de-identified, and difficult to use directly without understanding the codebook and data limitations.

This project demonstrates practices for handling sensitive healthcare data responsibly:

Decode official public-use codes before analysis.
Separate payment reports from proof of clinical negligence.
Show data quality instead of hiding missingness.
Avoid unsupported clinical claims.
Present findings in a clear, decision-friendly interface.

Key Features

Executive KPI dashboard: reports, total paid, median payment, P90 payment, and severe injury share.
Trend analysis: yearly report volume and median malpractice payment trends.
Payment distribution: payment-band analysis across filtered records.
Injury severity analysis: NPDB outcome severity mix with median payment context.
Allegation intelligence: ranked allegation categories by total payment, frequency, severity share, and median payment.
Geographic analysis: state-level malpractice payment concentration using available NPDB location fields.
Practitioner field analysis: decoded license/practitioner fields grouped into broader categories.
Reliability tab: field completeness checks for payment, severity, demographics, reporting lag, and location availability.
Dark-mode UI: polished Streamlit dashboard designed for portfolio presentation.
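
As a rough illustration of how the executive KPIs listed above could be computed with pandas, the sketch below runs on a small made-up table. The column names `payment` and `is_severe` and the function `compute_kpis` are assumptions for this example, not names taken from the project's code.

```python
import pandas as pd

def compute_kpis(df: pd.DataFrame) -> dict:
    """Summarize payments. Column names are illustrative assumptions."""
    payments = df["payment"].dropna()
    return {
        "reports": int(len(df)),
        "total_paid": float(payments.sum()),
        "median_payment": float(payments.median()),
        # P90 uses pandas' default linear interpolation between data points
        "p90_payment": float(payments.quantile(0.90)),
        # Share of all records flagged as severe
        "severe_injury_share": float(df["is_severe"].mean()),
    }

# Made-up sample values, not NPDB data
sample = pd.DataFrame({
    "payment": [10_000, 50_000, 100_000, 250_000, 1_000_000],
    "is_severe": [False, False, True, False, True],
})
kpis = compute_kpis(sample)
print(kpis)
```

Reporting the median and P90 alongside the total is one way to keep a few very large payments from dominating the summary.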

Dataset

Source: National Practitioner Data Bank Public Use Data File.

Local source file:

data/npdb_public.csv

Official code mappings are stored locally in:

data/npdb_codebook.json

The codebook JSON was generated from the official NPDB Public Use Data File Format Specifications.
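
A minimal sketch of how a coded value might be decoded against such a codebook follows. It assumes the JSON maps each field name to a dictionary of code-to-label pairs; the actual layout of data/npdb_codebook.json may differ, and the field name, codes, labels, and the `decode` helper are placeholders, not real NPDB values.

```python
import json

# Hypothetical codebook fragment; real field names and codes will differ.
codebook_text = '{"EXAMPLE_FIELD": {"1": "Example label A", "2": "Example label B"}}'
codebook = json.loads(codebook_text)

def decode(field: str, code, codebook: dict) -> str:
    """Return the label for a code, or a visible fallback if it is unmapped."""
    mapping = codebook.get(field, {})
    key = str(code).strip()  # JSON object keys are always strings
    return mapping.get(key, f"Unknown ({key})")

print(decode("EXAMPLE_FIELD", 1, codebook))
print(decode("EXAMPLE_FIELD", "9", codebook))
```

Keeping unmapped codes visible as "Unknown (...)" rather than dropping them makes gaps in the mapping easier to spot.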

Generated analytics outputs:

data/npdb_analytics.csv
data/data_quality.json
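
For a sense of what a data-quality summary could contain, the sketch below computes per-field completeness with pandas and prints it as JSON. The function `completeness_summary`, the column names, and the output shape are assumptions for illustration; the real data/data_quality.json may be structured differently.

```python
import json
import pandas as pd

def completeness_summary(df: pd.DataFrame) -> dict:
    """Fraction of non-missing values per column, rounded for readability."""
    return {col: round(float(df[col].notna().mean()), 4) for col in df.columns}

# Made-up sample values, not NPDB data
sample = pd.DataFrame({
    "payment": [10000.0, None, 250000.0, 75000.0],
    "state": ["CA", "TX", None, None],
})
summary = completeness_summary(sample)
print(json.dumps(summary, indent=2))
```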

Methodology

The project follows a reproducible analytics workflow:

Load the raw NPDB public-use CSV.
Decode coded NPDB fields using the official public-use format specification.
Clean payment fields and convert dollar strings into numeric amounts.
Derive healthcare analytics features, including:
- report year
- event year
- event-to-report lag
- payment band
- injury severity score
- practitioner group
- state proxy
Generate an analytics-ready dataset.
Generate a data-quality summary.
Render the dashboard with Streamlit and Plotly.
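
The cleaning and feature-derivation steps above can be sketched as plain Python functions. This is an illustration under stated assumptions, not the code in pipeline.py: the helper names, the band thresholds, and the band labels are all hypothetical.

```python
import re
from typing import Optional

def parse_dollars(raw) -> Optional[float]:
    """Convert a string like '$1,250,000' to a float; None if unparseable."""
    if raw is None:
        return None
    cleaned = re.sub(r"[^0-9.\-]", "", str(raw))  # drop '$', commas, spaces
    try:
        return float(cleaned)
    except ValueError:
        return None

# Illustrative thresholds only; the project's actual bands may differ.
BANDS = [(50_000, "<$50K"), (250_000, "$50K-$250K"), (1_000_000, "$250K-$1M")]

def payment_band(amount: Optional[float]) -> str:
    """Assign a payment to the first band whose upper bound exceeds it."""
    if amount is None:
        return "Unknown"
    for upper, label in BANDS:
        if amount < upper:
            return label
    return "$1M+"

def report_lag_years(event_year: Optional[int], report_year: Optional[int]) -> Optional[int]:
    """Years between the event and the report; None if either year is missing."""
    if event_year is None or report_year is None:
        return None
    return report_year - event_year

amount = parse_dollars("$75,000")
print(amount, payment_band(amount), report_lag_years(2015, 2018))
```

Functions like these can be applied column by column, for example with pandas `Series.map`, so that unparseable values stay missing and are counted in the data-quality summary.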

Tech Stack

Python
pandas
Streamlit
Plotly
NPDB Public Use Data File
Official NPDB format/codebook specifications

Project Structure

.
├── analyzer.py                 # Aggregation, KPI, trend, and reliability helpers
├── dashboard.py                # Streamlit dashboard
├── pipeline.py                 # Cleaning, decoding, and feature engineering pipeline
├── requirements.txt            # Python dependencies
├── data/
│   ├── npdb_public.csv         # Raw NPDB public-use data
│   ├── npdb_codebook.json      # Official code mappings
│   ├── npdb_analytics.csv      # Generated analytics dataset
│   └── data_quality.json       # Generated data-quality summary

How To Run

Install dependencies:

pip install -r requirements.txt

Build the analytics dataset:

python3 pipeline.py

Launch the dashboard:

python3 -m streamlit run dashboard.py

Then open the local Streamlit URL shown in the terminal.
