MedExpert

Code for the "MedExpert: An Expert-Annotated Dataset for Medical Chatbot Evaluation" paper at Machine Learning for Health (ML4H) 2025.

Paper Link: https://openreview.net/pdf?id=rkLAzDPlqL

Dataset release on Hugging Face: https://huggingface.co/datasets/sonal-ssj/MedExpert

Overview

Setup Instructions

Clone the repository:

[email protected]:JHU-CLSP/MedExpert.git
cd MedExpert

Copy example.env to .env and fill in the required environment variables.
```
cp example.env .env
```
Edit the .env file to edit paths for MEDEXPERT_REPO="path/to/medexpert/repo",MEDEXPERT_DATA="path/to/medexpert/data", and API key OPENAI_API_KEY="your_openai_api_key_here"

Install the required packages:

conda env create -f environment.yml
conda activate medexpert
pip install git+https://github.com/Heyuan9/MedScore.git --no-deps

(Optional) Data Annotation & Pre-processing

This section not required for MedExpert data or benchmark experiments itself but only if someone wants to annotate data like MedExpert we provide all the scripts & tools for the same. We used John Snow Labs No Tool NLP Interface.

Once you have the interface hosted, click on New Project -> Setup -> Configuration -> Keep clicking on Next button until you reach Customize Configuration -> Select Code tab in customize configuration -> Copy paste the XML config in scripts/data_preprocessing/JSL_interface.XML. -> Save Config.

In Project -> Setup -> Team you can add the annotators

Upload Annotation Tasks. A sample task is provided in scripts/data_preprocessing/sample_task.json

Assign tasks to annotators

Once all the annotations are done, 'Export' all the annoated data in .json format

Note that we used John Snow Labs No Tool NLP Interface (however it seems it is deprecated now). The upgraded version is Generative AI Labs. The XML config should ideally be compatible.

After exporting the data, you get a sample data. An example of it is in sample_data/sample_data_export_from_JSL.json

Now you can use data-preprocessing script to get the final dataset file.

./run_00_data_preprocessing.sh

You will need to change the paths of data if you want to change it your custom data.

raw_annotations_file = f"{project_dir}/sample_data/sample_data_export_from_JSL.json"
processed_file = f"{project_dir}/sample_data/sample_data_export_from_JSL.jsonl"
topics_file = f"{project_dir}/sample_data/medexpert_questions_with_topics.jsonl"

Here the topics_file is optional and corresponding to topics in Appendix table of the paper.

Data Preparation

Download the MedExpert dataset from the Hugging Face link using

./run_01_prepare_data.sh

This script also computes data statistics reported in the the paper.

MedExpert Benchmark: Factuality and Omission Detection Systems

We include the code and instructions to run the factuality and omission detection systems discussed in the MedExpert paper.

The following commands assume you have set up the medexpert environment as described above.

Factuality / Hallucination Detection

We evaluate two factuality detection systems:

MedScore+GPT-4o Knowledge
MedScore+MedRAG

Both can be run with

./run_02_factuality_detection.sh

Omission Detection

We evaluate two omission detection systems:

Zero-shot Omission Detector
HealthBench-ICL

Both can be run with

./run_03_omission_detection.sh

Note that the HealthBench-ICL dataset automatically downloads with the script.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

MedExpert

Overview

Setup Instructions

(Optional) Data Annotation & Pre-processing

Data Preparation

MedExpert Benchmark: Factuality and Omission Detection Systems

Factuality / Hallucination Detection

Omission Detection

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
images		images
sample_data		sample_data
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
example.env		example.env
run_00_data_preprocessing.sh		run_00_data_preprocessing.sh
run_01_prepare_data.sh		run_01_prepare_data.sh
run_02_factuality_detection.sh		run_02_factuality_detection.sh
run_03_omission_detection.sh		run_03_omission_detection.sh

License

JHU-CLSP/MedExpert

Folders and files

Latest commit

History

Repository files navigation

MedExpert

Overview

Setup Instructions

(Optional) Data Annotation & Pre-processing

Data Preparation

MedExpert Benchmark: Factuality and Omission Detection Systems

Factuality / Hallucination Detection

Omission Detection

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

Packages