OARD (open annotations for rare diseases) paper: https://pubmed.ncbi.nlm.nih.gov/35998640/
Andrew shared this resource on 5/28 as one to possibly ingest for Translator purposes (internal Slack link).
My initial review:
It could provide rare disease -> phenotype associations (derived from EHR data using ontology mapping/NLP + frequency stats).
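To make the "frequency stats" concrete, here is a minimal sketch of the four statistics that OARD offers (chi-squared, odds ratio, relative frequency, Jaccard index), computed from patient counts for one concept pair. These are the textbook 2x2 contingency-table definitions; I'm *assuming* OARD computes something close to them, and I haven't checked its exact formulas (e.g. continuity corrections or the direction of relative frequency). `PairCounts` and the counts below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PairCounts:
    """Hypothetical patient counts for a concept pair (A, B) in one dataset."""
    n_ab: int  # patients with both A and B
    n_a: int   # patients with A
    n_b: int   # patients with B
    n: int     # total patients in the dataset


def table(c: PairCounts) -> tuple[int, int, int, int]:
    """Return the 2x2 cells (a, b, c, d): A&B, A only, B only, neither."""
    a = c.n_ab
    b = c.n_a - c.n_ab
    cc = c.n_b - c.n_ab
    d = c.n - c.n_a - c.n_b + c.n_ab
    return a, b, cc, d


def odds_ratio(c: PairCounts) -> float:
    a, b, cc, d = table(c)
    # No continuity correction; an empty off-diagonal cell gives infinity.
    return math.inf if b * cc == 0 else (a * d) / (b * cc)


def chi_squared(c: PairCounts) -> float:
    a, b, cc, d = table(c)
    # Pearson chi-squared for a 2x2 table, without Yates' correction.
    # Assumes no row or column total is zero.
    denom = (a + b) * (cc + d) * (a + cc) * (b + d)
    return c.n * (a * d - b * cc) ** 2 / denom


def relative_frequency(c: PairCounts) -> float:
    # Fraction of patients with A who also have B (one common definition).
    return c.n_ab / c.n_a


def jaccard(c: PairCounts) -> float:
    # Patients with both A and B, over patients with either.
    return c.n_ab / (c.n_a + c.n_b - c.n_ab)


counts = PairCounts(n_ab=30, n_a=50, n_b=100, n=1000)  # made-up numbers
for f in (odds_ratio, chi_squared, relative_frequency, jaccard):
    print(f.__name__, round(f(counts), 3))
```

The four measures rank pairs differently (odds ratio and chi-squared account for the background rate in all N patients; relative frequency and Jaccard don't), which is part of why the choice of method matters for ingest.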
HOWEVER:
- I don't see a way to download the data: there's only the website and the API.
- It doesn't provide a simple set of asserted associations. Instead, given a concept (or a set of concepts), it runs statistical analyses on the fly and returns a ranked list of associations to other concepts. So we'd need to make further analysis/curation decisions, such as:
  - Which statistical method? There are 4 choices (chi-squared, odds ratio, relative frequency, Jaccard index). Odds ratio seems the most prominent: the paper's analysis of its associations used odds ratio, and the website seems to default to it.
  - Which dataset? There are 3 original data sources: cuimc/ohdsi (dataset num 1), cuimc/notes (2), and chop/notes (3), plus many other subsets.
  - What should the cutoff for an "assertion" be? (stat >= x, or take the top n?)
- If we only want disease -> phenotype associations, we'll need a filtering step after NodeNorming. I see a way to filter results to only HP terms or only MONDO terms, but I also see MONDO terms that look like PhenotypicFeatures and HP terms that look like Diseases.
Note: OARD is made by the COHD team.