Add validate text-file subcommand for regex-based extraction from text/markdown #12

@cmungall

Feature request

Problem

Currently linkml-term-validator only accepts LinkML YAML schemas as input. When validating ontology term identifiers referenced in documentation (markdown, plain text, OBO files, etc.), users must write a shim script to:

  1. Extract CURIEs and labels from the text using regex
  2. Generate a synthetic LinkML schema YAML with enum permissible values
  3. Create an OAK config file mapping prefixes to adapters
  4. Run validate-schema on the generated YAML
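For illustration, step 2 of such a shim might look roughly like the sketch below. It assumes the CURIE/label pairs have already been extracted. The function name `build_synthetic_schema`, the enum name `ExtractedTerms`, the schema `id`, and the layout (label as the permissible value, CURIE in `meaning`) are all hypothetical choices made for this example, not something the tool prescribes.

```python
import json

def build_synthetic_schema(pairs, prefixes):
    """Build a throwaway LinkML-style schema dict from (curie, label) pairs.

    Assumption: each label becomes a permissible value whose `meaning`
    is the CURIE. Duplicate labels would overwrite each other here.
    """
    permissible_values = {label: {"meaning": curie} for curie, label in pairs}
    return {
        "id": "https://example.org/extracted-terms",  # placeholder id
        "name": "extracted_terms",
        "prefixes": dict(prefixes),
        "enums": {"ExtractedTerms": {"permissible_values": permissible_values}},
    }

# Pairs as they might come out of the regex extraction step.
pairs = [("MONDO:0009282", "multiple acyl-CoA dehydrogenase deficiency")]
schema = build_synthetic_schema(
    pairs, {"MONDO": "http://purl.obolibrary.org/obo/MONDO_"}
)

# JSON output is generally accepted by YAML parsers, so this avoids a
# third-party YAML dependency in the sketch.
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

Even in this reduced form, the shim has to invent a schema name, an enum name, and a prefix map that have nothing to do with the document being checked.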

This is clunky compared to the sister tool linkml-reference-validator, which already has a validate text-file subcommand with --regex support.

Proposed solution

Add a validate text-file subcommand analogous to the one in linkml-reference-validator:

uvx linkml-term-validator validate text-file document.md \
  --regex '@term (\S+) "([^"]*)"' \
  --curie-group 1 --label-group 2 \
  --config oak_config.yaml --strict -v

This would:

  • Read the text file
  • Extract CURIE + label pairs using the regex
  • Resolve each CURIE via OAK
  • Check the label matches
  • Report results
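A minimal sketch of that flow, under stated assumptions: `validate_text`, `TermResult`, and the `lookup_label` callable are hypothetical names, and the label lookup is stubbed with a dict so the example runs offline. In a real implementation the lookup would presumably go through an OAK adapter (for example something along the lines of `get_adapter(...).label(curie)` in oaklib). Exact, case-sensitive label comparison is also an assumption.

```python
import re
from typing import Callable, List, NamedTuple, Optional

class TermResult(NamedTuple):
    curie: str
    expected: str
    actual: Optional[str]  # None when the CURIE could not be resolved
    matches: bool

def validate_text(
    text: str,
    pattern: str,
    lookup_label: Callable[[str], Optional[str]],
    curie_group: int = 1,
    label_group: int = 2,
) -> List[TermResult]:
    """Extract CURIE/label pairs with the regex and compare each label."""
    results = []
    for m in re.finditer(pattern, text):
        curie = m.group(curie_group)
        expected = m.group(label_group)
        actual = lookup_label(curie)
        results.append(TermResult(curie, expected, actual, actual == expected))
    return results

# Stub standing in for an ontology lookup; only one term is known to it.
STUB_LABELS = {"MONDO:0009282": "multiple acyl-CoA dehydrogenase deficiency"}

document = (
    '- @term MONDO:0009282 "multiple acyl-CoA dehydrogenase deficiency"\n'
    '- @term ORDO:26791 "Multiple acyl-CoA dehydrogenase deficiency"\n'
)
results = validate_text(document, r'@term (\S+) "([^"]*)"', STUB_LABELS.get)

for r in results:
    status = "OK" if r.matches else "CHECK"
    print(f"{status}\t{r.curie}\texpected={r.expected!r}\tactual={r.actual!r}")
```

The `curie_group` and `label_group` parameters mirror the proposed `--curie-group` and `--label-group` options, so the same function works for patterns that capture the label before the CURIE.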

Use case

I'm writing analysis documents for ontology restructuring (e.g. MONDO disease term reviews) that reference many terms from multiple ontologies (MONDO, ORDO, etc.). I want to embed machine-checkable assertions directly in the markdown:

## Validated Identifiers
- @term MONDO:0009282 "multiple acyl-CoA dehydrogenase deficiency"
- @term ORDO:26791 "Multiple acyl-CoA dehydrogenase deficiency"

I would then like to validate them with a single command rather than generating intermediate YAML files.

Additional issue: --strict should error on unresolvable CURIEs

Currently, if a CURIE doesn't exist in the ontology, the tool silently passes: no label is retrieved, so no mismatch is detected. With --strict, an unresolvable CURIE should be an error, not a silent pass. This is important for catching typos in identifiers.
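The requested behaviour could be expressed as a small decision function. This is only a sketch: `check_term` is a hypothetical name, `resolved_label` being `None` stands for "the CURIE could not be resolved", and treating a label mismatch as a failure regardless of `strict` is an assumption.

```python
from typing import Optional

def check_term(expected_label: str, resolved_label: Optional[str], strict: bool) -> bool:
    """Return True if the term assertion should pass.

    - Unresolvable CURIE: passes only when not strict (the current
      behaviour described above); fails under strict.
    - Resolved CURIE: passes only if the labels are identical.
    """
    if resolved_label is None:
        return not strict
    return resolved_label == expected_label

# A typo in the identifier means nothing is resolved for it.
print(check_term("some label", None, strict=False))
print(check_term("some label", None, strict=True))
```

With this split, non-strict runs keep their current behaviour while strict runs catch identifiers that do not exist.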

Labels: enhancement
