The data folder contains publicly available, synthetic, and augmented data used in
TTC model development, tuning, and evaluation.
Data extracted through queries, API calls, or other pulls from LOINC, SNOMED, and HL7 Valueset resources is categorized under
/snoinc_extracts.
Data created as part of curation, augmentation, or synthetic generation for model training and evaluation is categorized under
/training_files/.
Data used to evaluate the accuracy of the codes assigned by the TTC model against the expected codes is categorized under /accuracy_evaluation.
- build_evaluation_files.py creates the files required to run the evaluation using the eRSD.
- oid_to_conditions.txt is a JSON file that records the OID to SNOMED condition ID key-value pairs.
- loinc_to_oids.txt is a JSON file that maps each LOINC code to an array of one or more OIDs that use that LOINC code.
- add_loinc_codes.py adds LOINC codes to a JSONL file that contains only the display names in the expected and returned text fields. This script will likely be deprecated once LOINC codes are added to the embedding files, which will avoid thousands of calls to the LOINC API.
- evaluation.py takes a JSON file containing the expected and returned LOINC codes and compares them to determine the degree of correctness of each match. The criteria for the degree of correctness are described here: https://docs.google.com/document/d/1yA5NJ06mf1EfLZRmNrrNKopWL6ExMj-dPYKy8wlVDGs/edit?tab=t.0#heading=h.rn5y5vzcin6p
- /accuracy_evaluation/sample_data/ contains small samples of data for confirming that evaluation.py works as intended:
  - eval_results_snippet.jsonl is a portion of the output of the performance.ipynb notebook; it does not yet include LOINC codes.
  - eval_results_snippet_with_loinc_codes.json is the output of add_loinc_codes.py and can be used as input to evaluation.py; evaluation_results_eval_results_snippet_with_loinc_codes.json is the resulting output.
  - sample_evaluation_file.txt is a dummy file that can be run through evaluation.py to confirm the logic for first-, second-, and third-degree matches; evaluation_results_sample_evaluation_file.json is the resulting output.
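The two mapping files can be chained to go from a LOINC code to the SNOMED condition IDs associated with it. The following is a minimal sketch, assuming that loinc_to_oids.txt holds a JSON object mapping LOINC code strings to lists of OID strings, and that oid_to_conditions.txt holds a JSON object mapping each OID to a condition ID (or possibly a list of them). The function names load_json_mapping and conditions_for_loinc are hypothetical helpers, not part of this repository.

```python
import json
from pathlib import Path


def load_json_mapping(path):
    """Load one of the .txt mapping files, which contain JSON despite the extension."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def conditions_for_loinc(loinc_code, loinc_to_oids, oid_to_conditions):
    """Return the sorted SNOMED condition IDs reachable from a LOINC code.

    Assumption: loinc_to_oids maps a LOINC code to a list of OIDs, and
    oid_to_conditions maps an OID to one condition ID or a list of IDs.
    """
    conditions = set()  # a set so a condition reached via several OIDs appears once
    for oid in loinc_to_oids.get(loinc_code, []):
        value = oid_to_conditions.get(oid)
        if value is None:
            continue  # this OID has no recorded condition ID
        if isinstance(value, list):
            conditions.update(value)
        else:
            conditions.add(value)
    return sorted(conditions)
```

For example, after loading both files with load_json_mapping("loinc_to_oids.txt") and load_json_mapping("oid_to_conditions.txt"), calling conditions_for_loinc(code, loinc_to_oids, oid_to_conditions) returns the condition IDs linked to code, or an empty list if the code does not appear in the mapping.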