DIBBS Text to Code Data

Overview

The data folder contains publicly available, synthetic, and augmented data used in Text to Code (TTC) model development, tuning, and evaluation.

Data extracted from LOINC, SNOMED, and HL7 value set resources through queries, API calls, or other pulls is categorized under /snoinc_extracts.

- For more details, read here.
- To generate these SNOINC extract files, refer to this README.

Data created as part of curation, augmentation, or synthetic generation for model training and evaluation is categorized under /training_files/.

Data used to evaluate the accuracy of the codes assigned by the TTC model against the expected codes is categorized under /accuracy_evaluation.

- build_evaluation_files.py creates the files required to run the evaluation against the eRSD.
- oid_to_conditions.txt is a JSON file that stores the OID-to-SNOMED condition ID key-value pairs.
- loinc_to_oids.txt is a JSON file that maps each LOINC code to the array of one or more OIDs that use that code.
- add_loinc_codes.py adds LOINC codes to a JSONL file whose expected and returned text fields contain only display names. It will probably be deprecated once LOINC codes are added to the embedding files, which will avoid thousands of calls to the LOINC API.
- evaluation.py takes a JSON file containing the expected and returned LOINC codes and compares them to determine how accurate each match is. The criteria for the degree of correctness of a match are described here: https://docs.google.com/document/d/1yA5NJ06mf1EfLZRmNrrNKopWL6ExMj-dPYKy8wlVDGs/edit?tab=t.0#heading=h.rn5y5vzcin6p.
- /accuracy_evaluation/sample_data/ contains small samples of data for confirming that evaluation.py works as intended.
  - eval_results_snippet.jsonl is a portion of the output of the performance.ipynb notebook; this output currently lacks LOINC codes. eval_results_snippet_with_loinc_codes.json is the corresponding output of add_loinc_codes.py and can be used as input to evaluation.py; evaluation_results_eval_results_snippet_with_loinc_codes.json is the resulting evaluation output.
  - sample_evaluation_file.txt is a dummy file that can be run through evaluation.py to confirm the logic for first-, second-, and third-degree matches; evaluation_results_sample_evaluation_file.json is the resulting output.
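The two mapping files, loinc_to_oids.txt and oid_to_conditions.txt, can be chained to go from a LOINC code to the SNOMED condition IDs of every value set (OID) that uses it. The sketch below is a minimal illustration under assumed file layouts: loinc_to_oids.txt is taken to be a JSON object mapping each LOINC code to a list of OIDs, and oid_to_conditions.txt a JSON object mapping each OID to one SNOMED condition ID. Those layouts and the helper names load_json and conditions_for_loinc are hypothetical, so check them against the actual files before relying on this.

```python
import json
from pathlib import Path


def load_json(path):
    """Parse a JSON file; the .txt extension on the mapping files does not matter."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def conditions_for_loinc(loinc_code, loinc_to_oids, oid_to_conditions):
    """Return the sorted SNOMED condition IDs reachable from one LOINC code.

    Assumed (hypothetical) layouts:
      loinc_to_oids:     {"<LOINC code>": ["<OID>", ...]}
      oid_to_conditions: {"<OID>": "<SNOMED condition ID>"}  (a list value is also accepted)
    """
    conditions = set()
    for oid in loinc_to_oids.get(loinc_code, []):
        value = oid_to_conditions.get(oid)
        if value is None:
            continue  # this OID has no recorded condition
        if isinstance(value, list):
            conditions.update(value)
        else:
            conditions.add(value)
    return sorted(conditions)
```

Usage would look like conditions_for_loinc(code, load_json("loinc_to_oids.txt"), load_json("oid_to_conditions.txt")), with the paths adjusted to wherever the files live. A set is used so that a condition reached through several OIDs is reported once.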
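add_loinc_codes.py currently looks codes up through the LOINC API. Once a local display-name-to-code table exists (for example, from the embedding files mentioned above), the same enrichment could in principle run offline. The sketch below assumes each JSONL record has "expected" and "returned" display-name fields and adds "expected_loinc" and "returned_loinc" next to them; those field names, the lookup table, and the function name are hypothetical and are not taken from the actual script.

```python
import json


def add_loinc_codes(lines, lookup, fields=("expected", "returned")):
    """Yield JSONL lines with a '<field>_loinc' code added for each display-name field.

    `lookup` is a hypothetical dict mapping display name -> LOINC code.
    Names missing from `lookup` (or absent fields) get None.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL input
        record = json.loads(line)
        for field in fields:
            record[f"{field}_loinc"] = lookup.get(record.get(field))
        yield json.dumps(record)
```

Because it accepts any iterable of lines and yields strings, it can stream a file: open the input JSONL, pass the file handle as lines, and write each yielded line plus a newline to the output file.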
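evaluation.py grades each match by degree using the criteria in the linked document, and those criteria are not reproduced here. As a simpler baseline, the sketch below computes exact-match accuracy over records that carry an expected and a returned LOINC code, reading one JSON object per line. The field names "expected_loinc" and "returned_loinc" and both helper names are assumptions, not the script's actual schema.

```python
import json


def exact_match_accuracy(records, expected_key="expected_loinc", returned_key="returned_loinc"):
    """Return (matches, total, accuracy) where a match means identical LOINC codes.

    Records missing the expected code count toward the total but never as a match.
    """
    total = 0
    matches = 0
    for record in records:
        total += 1
        expected = record.get(expected_key)
        if expected is not None and expected == record.get(returned_key):
            matches += 1
    accuracy = matches / total if total else 0.0
    return matches, total, accuracy


def read_jsonl(path):
    """Yield one dict per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)
```

Exact match only captures the strictest case; the first-, second-, and third-degree grading that evaluation.py applies would need the criteria from the linked document.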
