Possible resource to ingest: OARD #911

@colleenXu

OARD (open annotations for rare diseases) paper: https://pubmed.ncbi.nlm.nih.gov/35998640/

Andrew shared this resource on 5/28 as one to possibly ingest for Translator purposes (internal Slack link).


My initial review:

It could provide rare disease -> phenotype associations (derived from EHR data using ontology mapping/NLP + frequency stats).

HOWEVER:

  • I don't see a way to download the data: there's only the website and API
  • It doesn't provide a simple set of asserted associations. Instead, given a concept (or a set of concepts), it runs statistical analyses on the fly and returns a ranked list of associations to other concepts. So further analysis/curation decisions are needed, such as:
    • Which statistical method? There are 4 choices (chi-squared, odds ratio, relative frequency, Jaccard index). Odds ratio seems the most prominent: the paper's analysis of its associations used odds ratio, and the website seems to default to it.
    • Which dataset? There are 3 original data sources: cuimc/ohdsi (dataset num 1), cuimc/notes (2), chop/notes (3). Plus lots of other subsets.
    • What cutoff to use for an "assertion"? (stat >= some value x, or take the top n?)
  • If we only want disease -> phenotype associations, we'll need a filtering process after NodeNorming. I see a way to filter results to only HP terms or only MONDO terms, but some MONDO terms look like PhenotypicFeatures and some HP terms look like Diseases.
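
To make the method and cutoff questions above more concrete, here is a rough Python sketch of the four statistics computed from a 2x2 co-occurrence table, plus the two cutoff styles (threshold vs. top n). These are textbook definitions and are an assumption on my part: OARD's exact formulas (smoothing, continuity correction, how relative frequency is normalized) may differ. `PairCounts`, `select_assertions`, and the concept IDs in the usage note are hypothetical names for illustration, not anything from OARD.

```python
import math
from dataclasses import dataclass


@dataclass
class PairCounts:
    """2x2 co-occurrence counts for concepts A and B (hypothetical structure)."""
    both: int      # records mentioning A and B
    only_a: int    # records mentioning A but not B
    only_b: int    # records mentioning B but not A
    neither: int   # records mentioning neither


def odds_ratio(c: PairCounts) -> float:
    # (both * neither) / (only_a * only_b); infinite when the denominator is 0.
    denom = c.only_a * c.only_b
    return math.inf if denom == 0 else (c.both * c.neither) / denom


def jaccard_index(c: PairCounts) -> float:
    # |A and B| / |A or B|
    union = c.both + c.only_a + c.only_b
    return c.both / union if union else 0.0


def relative_frequency(c: PairCounts) -> float:
    # Assumption: co-occurrence count divided by the count of concept A.
    total_a = c.both + c.only_a
    return c.both / total_a if total_a else 0.0


def chi_squared(c: PairCounts) -> float:
    # Pearson chi-squared for a 2x2 table, no continuity correction.
    n = c.both + c.only_a + c.only_b + c.neither
    denom = ((c.both + c.only_a) * (c.only_b + c.neither)
             * (c.both + c.only_b) * (c.only_a + c.neither))
    if denom == 0:
        return 0.0
    return n * (c.both * c.neither - c.only_a * c.only_b) ** 2 / denom


def select_assertions(scored, min_score=None, top_n=None):
    """Rank (concept_id, score) pairs, then apply a threshold and/or top-n cutoff."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if min_score is not None:
        ranked = [pair for pair in ranked if pair[1] >= min_score]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked
```

For example, `select_assertions([("EX:1", 5.0), ("EX:2", 62.0), ("EX:3", 1.2)], min_score=2.0)` keeps only the two pairs scoring at least 2.0, while `top_n=1` keeps just the highest-ranked one. The two cutoff styles can give quite different assertion sets for the same ranked list, which is why the choice matters.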

Note: OARD is made by the COHD team.
