This document summarizes the initial implementation of the EnumValidator for the linkml-term-validator project, based on the reference implementation in ../valuesets.
Pydantic Models (`src/linkml_term_validator/models.py`)
- `ValidationConfig`: Configuration for validation behavior
- `ValidationIssue`: Individual validation issue with severity levels (ERROR, WARNING, INFO)
- `ValidationResult`: Aggregated validation results with helper methods
- `SeverityLevel`: Enum for severity classification
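The sketch below is a dependency-free picture of how the four models might fit together. It uses stdlib `dataclasses` and `enum` as a stand-in for Pydantic, so it shows shape only and does no validation. The `ValidationConfig` field names come from the Python usage example in this document, but their defaults are assumed. The `ValidationIssue` fields (`enum_name`, `value_name`, `message`, `meaning`) and the `errors()` helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path


class SeverityLevel(str, Enum):
    """Severity classification for a single issue."""

    ERROR = "ERROR"
    WARNING = "WARNING"
    INFO = "INFO"


@dataclass
class ValidationConfig:
    """Validation settings (field names from the usage example; defaults assumed)."""

    oak_adapter_string: str = "sqlite:obo:"
    strict_mode: bool = False
    cache_labels: bool = True
    oak_config_path: Path | None = None


@dataclass
class ValidationIssue:
    """One problem found for a permissible value (field names are illustrative)."""

    severity: SeverityLevel
    enum_name: str
    value_name: str
    message: str
    meaning: str | None = None  # the CURIE being checked, if any


@dataclass
class ValidationResult:
    """All issues found in one schema, plus small helper methods."""

    schema_path: str
    issues: list[ValidationIssue] = field(default_factory=list)

    def errors(self) -> list[ValidationIssue]:
        """Return only ERROR-level issues."""
        return [i for i in self.issues if i.severity is SeverityLevel.ERROR]

    def has_errors(self) -> bool:
        """True if at least one ERROR-level issue was recorded."""
        return bool(self.errors())
```

Grouping issues under one result object lets a caller decide on exit status with a single `has_errors()` check.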
EnumValidator (`src/linkml_term_validator/validator.py`)
- Main validation class with multi-level caching
- Integration with OAK (Ontology Access Kit) for ontology access
- Per-prefix adapter configuration support
- Label normalization and alias matching
- Configurable strictness modes
CLI (`src/linkml_term_validator/cli.py`)
- `validate` command with options for strict mode, adapter selection, and config files
- Support for single files and directories
- Verbose and quiet output modes
- Exit codes for CI/CD integration
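The CLI accepts either a file or a directory and reports failure through its exit code. Both behaviours can be sketched with the standard library alone. `collect_schema_files` and `exit_code_for` are hypothetical names. The sketch assumes that schemas are `*.yaml` files found recursively, and that any ERROR-level issue produces exit code 1, as in the Python usage example, which exits with status 1 when `result.has_errors()` is true.

```python
from __future__ import annotations

from pathlib import Path


def collect_schema_files(target: Path) -> list[Path]:
    """Expand a CLI argument into schema files: a file as-is, a directory recursively."""
    if target.is_dir():
        # Recurse into subdirectories; sort so output order is reproducible.
        return sorted(p for p in target.rglob("*.yaml") if p.is_file())
    return [target]


def exit_code_for(severities: list[str]) -> int:
    """0 when no ERROR-level issue was reported, 1 otherwise (CI fails on errors)."""
    return 1 if "ERROR" in severities else 0
```

In a Typer command, the returned code could be handed to `raise typer.Exit(code=...)`. Warnings and info messages then show up in the output without failing the build.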
Configuration (`oak_config.yaml`)
- Template configuration for per-prefix ontology adapters
- Support for OBO sqlite databases, OLS, BioPortal, and custom adapters
- Option to skip validation for specific prefixes
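Per-prefix adapter configuration, including the skip option, can be pictured as a lookup from CURIE prefix to OAK adapter string. This is a minimal sketch built on three assumptions:
- the config file has already been parsed into a plain `dict`;
- an empty string means "skip this prefix";
- unconfigured prefixes fall back to a default such as `sqlite:obo:`, completed with the lowercased prefix.

`resolve_adapter` is a hypothetical helper and not part of the package.

```python
from __future__ import annotations


def resolve_adapter(
    prefix: str,
    per_prefix: dict[str, str],
    default: str | None = "sqlite:obo:",
) -> str | None:
    """Pick the OAK adapter string for a CURIE prefix, or None to skip validation.

    Illustrative rules (assumptions, not the package's documented behaviour):
    1. an explicit per-prefix entry wins;
    2. an empty-string entry means "skip this prefix";
    3. otherwise use the default, completing a bare 'sqlite:obo:' with the
       lowercased prefix (e.g. 'GO' -> 'sqlite:obo:go').
    """
    if prefix in per_prefix:
        return per_prefix[prefix] or None  # "" -> skip
    if default is None:
        return None
    if default == "sqlite:obo:":
        return default + prefix.lower()
    return default


# Hypothetical parsed config: GO pinned explicitly, NCIT skipped.
config = {"GO": "sqlite:obo:go", "NCIT": ""}
print(resolve_adapter("CHEBI", config))  # prints: sqlite:obo:chebi
```

In OAK, an adapter string like these is what `oaklib.get_adapter()` accepts. The returned adapter's `label(curie)` method is one way to fetch the ontology label for comparison.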
Multi-Level Caching
- In-memory cache for session performance
- File-based cache (CSV) for persistence across runs
- Per-prefix cache organization
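The two cache levels can be combined as a read-through lookup: memory first, then a per-prefix CSV file, then the ontology source. Everything in the sketch below is illustrative:
- the `LabelCache` class;
- the `<cache_dir>/<prefix>.csv` layout;
- the `curie,label,retrieved_at` columns;
- the `fetch` callable, which stands in for an OAK label lookup.

None of these are the project's actual cache format.

```python
from __future__ import annotations

import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


class LabelCache:
    """Read-through label cache: a dict in memory, backed by one CSV file per prefix."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], str | None]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch  # stands in for an ontology adapter's label lookup
        self._memory: dict[str, str | None] = {}

    def path_for(self, curie: str) -> Path:
        """Per-prefix cache file, e.g. GO:0007049 -> <cache_dir>/go.csv."""
        prefix = curie.split(":", 1)[0].lower()
        return self.cache_dir / f"{prefix}.csv"

    def _read_file(self, path: Path) -> dict[str, str]:
        if not path.exists():
            return {}
        with path.open(newline="", encoding="utf-8") as fh:
            return {row["curie"]: row["label"] for row in csv.DictReader(fh)}

    def _append_file(self, path: Path, curie: str, label: str) -> None:
        path.parent.mkdir(parents=True, exist_ok=True)
        write_header = not path.exists()
        with path.open("a", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            if write_header:
                writer.writerow(["curie", "label", "retrieved_at"])
            # Timestamp each row so stale entries could be expired later.
            writer.writerow([curie, label, datetime.now(timezone.utc).isoformat()])

    def get_label(self, curie: str) -> str | None:
        if curie in self._memory:  # level 1: already looked up this session
            return self._memory[curie]
        path = self.path_for(curie)
        label = self._read_file(path).get(curie)  # level 2: persisted by earlier runs
        if label is None:
            label = self.fetch(curie)  # level 3: query the ontology source
            if label is not None:
                self._append_file(path, curie, label)
        self._memory[curie] = label  # misses are remembered for this session only
        return label
```

Only successful lookups are written to disk, so a term that was missing in one run is looked up again in the next.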
Flexible Validation Modes
- Strict mode: All mismatches treated as errors
- Lenient mode: Configured prefixes are strict; mismatches on unconfigured prefixes generate warnings
- Info level: For completely unconfigured prefixes
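The rule for label mismatches in the strict and lenient modes fits in a small decision function. The sketch covers only mismatches. It leaves out the INFO case for completely unconfigured prefixes because the exact trigger for that level is not spelled out here. `mismatch_severity` is a hypothetical helper, and the enum is repeated so the block runs on its own.

```python
from __future__ import annotations

from enum import Enum


class SeverityLevel(str, Enum):
    ERROR = "ERROR"
    WARNING = "WARNING"
    INFO = "INFO"


def mismatch_severity(prefix: str, configured_prefixes: set[str], strict: bool) -> SeverityLevel:
    """Severity to report when a term's ontology label does not match the schema.

    - strict mode: every mismatch is an error;
    - lenient mode: mismatches on configured prefixes are errors,
      mismatches on unconfigured prefixes are only warnings.
    """
    if strict or prefix in configured_prefixes:
        return SeverityLevel.ERROR
    return SeverityLevel.WARNING
```

Deciding severity from configuration lets a project tighten validation gradually: each prefix added to the config turns its warnings into errors.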
Comprehensive Label Matching
- Permissible value name
- Title field
- Description field
- Aliases
- Annotations (label, display_name, preferred_name, synonym)
- All normalized for case-insensitive, punctuation-free comparison
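The matching strategy above amounts to building a set of normalized candidate strings from a permissible value and testing the ontology label against it. This is a sketch that assumes the permissible value is available as a plain dict mirroring its YAML. It treats annotations in either the short form (`label: foo`) or the long form (`label: {tag: label, value: foo}`). `normalize`, `candidate_labels` and `label_matches` are hypothetical names, not the package's API.

```python
from __future__ import annotations

import re
from typing import Any

# Runs of non-word characters (punctuation, whitespace) plus underscores.
_SEPARATORS = re.compile(r"[\W_]+")

# Annotation tags consulted as extra label candidates (list taken from the text above).
LABEL_ANNOTATION_TAGS = ("label", "display_name", "preferred_name", "synonym")


def normalize(text: str) -> str:
    """Lowercase, turn punctuation/underscores into spaces, collapse whitespace."""
    return " ".join(_SEPARATORS.sub(" ", text.lower()).split())


def candidate_labels(pv_name: str, pv: dict[str, Any]) -> set[str]:
    """Normalized strings from one permissible value that may equal the ontology label."""
    raw: list[str] = [pv_name]
    for key in ("title", "description"):
        if pv.get(key):
            raw.append(str(pv[key]))
    raw.extend(str(a) for a in pv.get("aliases") or [])
    annotations = pv.get("annotations") or {}
    for tag in LABEL_ANNOTATION_TAGS:
        value = annotations.get(tag)
        if isinstance(value, dict):  # long form: {tag: ..., value: ...}
            value = value.get("value")
        if value:
            raw.append(str(value))
    return {normalize(s) for s in raw}


def label_matches(ontology_label: str, pv_name: str, pv: dict[str, Any]) -> bool:
    """True if the ontology label equals any candidate after normalization."""
    return normalize(ontology_label) in candidate_labels(pv_name, pv)
```

With this scheme, a value named `CELL_CYCLE` matches the ontology label "cell cycle" without needing an explicit title.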
Unknown Prefix Tracking
- Identifies prefixes not in the configuration
- Suggests adding them to `oak_config.yaml`
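Tracking unknown prefixes needs two steps: extract the prefix from each `meaning` CURIE, then take the set difference against the configured prefixes. This is a minimal sketch. `curie_prefix`, `unknown_prefixes` and `suggestion` are hypothetical helpers, and the wording of the suggestion message is invented for illustration.

```python
from __future__ import annotations


def curie_prefix(curie: str) -> str | None:
    """Return the prefix of a CURIE such as 'GO:0007049', or None if there is none."""
    prefix, sep, _local = curie.partition(":")
    return prefix if sep and prefix else None


def unknown_prefixes(meanings: list[str], configured: set[str]) -> list[str]:
    """Prefixes used in `meaning` fields but absent from the configuration, sorted."""
    found = {p for m in meanings if (p := curie_prefix(m)) is not None}
    return sorted(found - configured)


def suggestion(prefixes: list[str]) -> str:
    """Human-readable hint listing prefixes to add to the config file."""
    return "Consider adding these prefixes to oak_config.yaml: " + ", ".join(prefixes)
```

Sorting the result keeps the suggestion stable across runs, which makes it easier to compare CI logs.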
Key design decisions:
- OAK Integration: Used OAK as the primary abstraction for ontology access rather than implementing custom clients
- Two-Tier Validation: Strict validation for configured prefixes, lenient for unconfigured
- Pydantic Models: Type-safe configuration and results with built-in validation
- CSV Caching: Simple, human-readable cache format with timestamps
- CLI Design: Simple success output ("✅"), detailed error reporting when needed
- No REST Adapters: Custom REST adapters were not implemented in the initial version (they can be added later)
- Typer CLI: Used Typer instead of raw argument parsing
- Comprehensive Doctests: Added extensive doctests for documentation and testing
- Modern Python: Used Python 3.10+ type hints (`list[T]` instead of `List[T]`)
Unit Tests (`tests/test_validator.py`)
- String normalization
- CURIE prefix extraction
- Configuration validation
- Alias extraction
- Schema validation structure
- Result and issue helper methods
- Cache file path generation
- Unknown prefix tracking
Doctests (in source files)
- All public methods documented with examples
- Verified through pytest doctest module
Test Data (`tests/data/test_schema.yaml`)
- Example schema with multiple enums
- Mix of enums with and without meanings
- GO and CHEBI term examples
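Only permissible values that carry a `meaning` can be checked against an ontology, so a test schema mixing enums with and without meanings exercises both paths. The sketch walks a schema already loaded into a plain dict (for example, with a YAML parser) and yields only the values that have a meaning. `iter_meanings` is a hypothetical helper. The enum names and GO identifier in the example are illustrative and are not the contents of `tests/data/test_schema.yaml`.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Any


def iter_meanings(schema: dict[str, Any]) -> Iterator[tuple[str, str, str]]:
    """Yield (enum_name, value_name, meaning) for permissible values with a meaning.

    Values without a ``meaning`` are skipped: there is no ontology term to check.
    """
    for enum_name, enum_def in (schema.get("enums") or {}).items():
        for pv_name, pv in ((enum_def or {}).get("permissible_values") or {}).items():
            meaning = (pv or {}).get("meaning")  # a bare `ACTIVE:` entry loads as None
            if meaning:
                yield enum_name, pv_name, meaning


example = {
    "enums": {
        "BiologicalProcessEnum": {
            "permissible_values": {"CELL_CYCLE": {"meaning": "GO:0007049"}}
        },
        "StatusEnum": {"permissible_values": {"ACTIVE": None}},
    }
}
for enum_name, pv_name, meaning in iter_meanings(example):
    print(enum_name, pv_name, meaning)  # prints: BiologicalProcessEnum CELL_CYCLE GO:0007049
```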
All tests passing:
- 13 pytest tests
- 20 doctests
- mypy type checking passes
- ruff linting passes
Command-line usage:

```bash
# Validate a single schema
linkml-term-validator validate schema.yaml

# Validate all schemas in a directory
linkml-term-validator validate src/schema/

# Strict mode
linkml-term-validator validate --strict schema.yaml

# Custom adapter
linkml-term-validator validate --adapter ols: schema.yaml

# With config file
linkml-term-validator validate --config oak_config.yaml schema.yaml
```

Python usage:

```python
import sys
from pathlib import Path

from linkml_term_validator.models import ValidationConfig
from linkml_term_validator.validator import EnumValidator

config = ValidationConfig(
    oak_adapter_string="sqlite:obo:",
    strict_mode=False,
    cache_labels=True,
    oak_config_path=Path("oak_config.yaml"),
)

validator = EnumValidator(config)
result = validator.validate_schema(Path("schema.yaml"))

if result.has_errors():
    result.print_summary(verbose=True)
    sys.exit(1)  # non-zero status so CI jobs fail on validation errors
```

File structure:

```
linkml-term-validator/
├── src/linkml_term_validator/
│   ├── __init__.py
│   ├── _version.py
│   ├── cli.py              # Typer CLI
│   ├── models.py           # Pydantic models
│   └── validator.py        # EnumValidator class
├── tests/
│   ├── data/
│   │   └── test_schema.yaml
│   ├── test_simple.py
│   └── test_validator.py
├── docs/
│   └── usage.md
├── oak_config.yaml         # Example configuration
├── CLAUDE.md               # Repository instructions
├── IMPLEMENTATION.md       # This file
└── pyproject.toml          # Dependencies and config
```
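Dependencies in `pyproject.toml` (entries marked `# NEW` were added for this implementation):

```toml
dependencies = [
    "typer >= 0.9.0",
    "linkml-runtime >=1.9.4",
    "oaklib>=0.6.23",        # NEW
    "pydantic>=2.0.0",       # NEW
    "ruamel-yaml>=0.18.15",  # NEW
]
```

Possible future enhancements:
- REST Adapters: Custom REST adapters for non-OAK sources (e.g., ROR)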
- Dataset Validation: Validating actual data instances (not just schemas)
- Dynamic Enum Validation: Based on OAK vskit
- Batch Processing Optimizations: Parallel adapter queries
- Cache Invalidation: TTL-based cache expiry
- Progress Bars: For large schema directories
- JSON/YAML Output: Machine-readable validation results
- GitHub Actions Integration: Pre-built action for CI/CD
- Pre-commit Hook: For automatic validation
- Watch Mode: Continuous validation during development
Retained from the valuesets reference implementation:
- Core validation logic (label matching, severity levels)
- OAK integration pattern
- Two-level caching (memory + file)
- Per-prefix configuration
- CLI design philosophy
Differences from the reference implementation:
- Typer CLI instead of raw argparse
- No REST adapters yet
- Simplified initial implementation
- More comprehensive doctests
- Modern Python type hints
Documentation:
- README.md: Updated with features and quick start
- docs/usage.md: Comprehensive usage guide
- CLAUDE.md: Repository context for AI assistants
- Inline documentation: Extensive docstrings with examples
The EnumValidator implementation successfully replicates the core functionality of the valuesets reference implementation while adapting it for this project's needs. The foundation is solid, with comprehensive testing, type safety, and extensibility for future enhancements.