Code for the "MedExpert: An Expert-Annotated Dataset for Medical Chatbot Evaluation" paper at Machine Learning for Health (ML4H) 2025.
Paper Link: https://openreview.net/pdf?id=rkLAzDPlqL
Dataset release on Hugging Face: https://huggingface.co/datasets/sonal-ssj/MedExpert
- Clone the repository:
    git clone git@github.com:JHU-CLSP/MedExpert.git
    cd MedExpert
- Copy example.env to .env and fill in the required environment variables:
    cp example.env .env
  Edit the .env file to set the paths MEDEXPERT_REPO="path/to/medexpert/repo" and MEDEXPERT_DATA="path/to/medexpert/data", and the API key OPENAI_API_KEY="your_openai_api_key_here".
- Install the required packages:
    conda env create -f environment.yml
    conda activate medexpert
    pip install git+https://github.com/Heyuan9/MedScore.git --no-deps
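Before running the scripts, it can help to confirm that the .env file defines all three variables. The following is a minimal sketch, not part of the repository: the helper names `parse_env_text` and `missing_keys` are hypothetical, and it assumes the file uses plain KEY="value" lines as shown above.

```python
# Hypothetical helper, not part of the MedExpert repository.
# Assumes .env contains simple KEY="value" lines.
REQUIRED_KEYS = ("MEDEXPERT_REPO", "MEDEXPERT_DATA", "OPENAI_API_KEY")


def parse_env_text(text):
    """Parse KEY=value lines into a dict, ignoring blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in KEY="value".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in declaration order."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    sample = 'MEDEXPERT_REPO="path/to/medexpert/repo"\nMEDEXPERT_DATA="path/to/medexpert/data"\n'
    print("Missing keys:", missing_keys(parse_env_text(sample)))
```

To check a real file, pass the contents of .env (for example `open(".env").read()`) to `parse_env_text` instead of the inline sample.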
This section is not required for using the MedExpert data or running the benchmark experiments. It is only needed if you want to annotate data in the same way as MedExpert, and we provide all the scripts and tools for doing so. We used the John Snow Labs No Tool NLP Interface.
- Once the interface is hosted, click New Project -> Setup -> Configuration, keep clicking Next until you reach Customize Configuration, select the Code tab, paste in the XML config from scripts/data_preprocessing/JSL_interface.XML, and click Save Config.
- In Project -> Setup -> Team, add the annotators.
- Upload the annotation tasks. A sample task is provided in scripts/data_preprocessing/sample_task.json.
- Once all the annotations are done, use 'Export' to export all the annotated data in .json format.
Note that the John Snow Labs No Tool NLP Interface we used appears to be deprecated now. Its upgraded version is Generative AI Labs, and the XML config should ideally be compatible with it.
Exporting the data produces a JSON file of annotations. An example export is in sample_data/sample_data_export_from_JSL.json.
Now you can use the data preprocessing script to produce the final dataset file:
    ./run_00_data_preprocessing.sh
To run it on your own custom data, you will need to change the data paths:
    raw_annotations_file = f"{project_dir}/sample_data/sample_data_export_from_JSL.json"
    processed_file = f"{project_dir}/sample_data/sample_data_export_from_JSL.jsonl"
    topics_file = f"{project_dir}/sample_data/medexpert_questions_with_topics.jsonl"
Here topics_file is optional and corresponds to the topics in the Appendix table of the paper.
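The raw file is a .json export and the processed file is a .jsonl file with one record per line. The sketch below only illustrates that relationship between the two formats. It is not the repository's preprocessing script, which may restructure the annotations further. The function name `json_export_to_jsonl` is hypothetical, and the code assumes the export is a top-level JSON list of records.

```python
import json
from pathlib import Path


def json_export_to_jsonl(raw_path, out_path):
    """Write each element of a top-level JSON list as one line of a JSONL file.

    Assumption (not confirmed by the repository): the export is a JSON list.
    Returns the number of records written.
    """
    records = json.loads(Path(raw_path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON list of records")
    with Path(out_path).open("w", encoding="utf-8") as handle:
        for record in records:
            # One compact JSON object per line; keep non-ASCII text readable.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

Each line of the output can then be read back independently with `json.loads`, which is what makes JSONL convenient for streaming over large annotation files.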
Download the MedExpert dataset from the Hugging Face link using
    ./run_01_prepare_data.sh
This script also computes the data statistics reported in the paper.
We include the code and instructions to run the factuality and omission detection systems discussed in the MedExpert paper.
The following commands assume you have set up the medexpert environment as described above.
We evaluate two factuality detection systems:
- MedScore+GPT-4o Knowledge
- MedScore+MedRAG
Both can be run with
    ./run_02_factuality_detection.sh
We evaluate two omission detection systems:
- Zero-shot Omission Detector
- HealthBench-ICL
Both can be run with
    ./run_03_omission_detection.sh
Note that the HealthBench-ICL dataset is downloaded automatically by the script.


