CMIP Linked Data Utilities Library
CMIP-LD is a Python library for working with CMIP (Coupled Model Intercomparison Project) Linked Data vocabularies. It provides tools to fetch, resolve, validate, and generate documentation for JSON-LD controlled vocabularies used across CMIP and related climate science projects.
- Prefix Resolution - Resolve short prefixes (e.g., `universal:frequency`) to full URLs
- Data Fetching - Retrieve and expand JSON-LD documents with automatic dereferencing
- Documentation Generation - Auto-generate README files for vocabulary directories
- Validation - Validate JSON files against schemas and contexts
- CI/CD Actions - GitHub Actions for automated vocabulary processing

| Prefix | Repository | Description |
|---|---|---|
| `universal` | WCRP-universe | Universal controlled vocabularies |
| `cmip7` | CMIP7-CVs | CMIP7 controlled vocabularies |
| `cmip6plus` | CMIP6Plus_CVs | CMIP6Plus controlled vocabularies |
| `cf` | CF | CF Conventions vocabularies |
| `vr` | Variable-Registry | Variable registry |
| `emd` | Essential-Model-Documentation | Essential model documentation |
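A minimal sketch of how prefix expansion works, assuming a plain dictionary from prefix to base URL. Only the `universal` base URL is taken from this README (see the `cmipld.mapping` example); the `expand_prefix` and `compact_url` helpers are hypothetical and are not the cmipld API, whose resolver may add further handling such as file extensions.

```python
from typing import Dict

# Hypothetical mapping; only the "universal" base URL comes from this README.
PREFIXES: Dict[str, str] = {
    "universal": "https://wcrp-cmip.github.io/WCRP-universe/",
}


def expand_prefix(curie: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Turn 'prefix:path' into a full URL; unknown prefixes and full URLs are returned as-is."""
    prefix, sep, path = curie.partition(":")
    if not sep or prefix not in prefixes:
        return curie
    return prefixes[prefix] + path


def compact_url(url: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Reverse of expand_prefix: shorten a URL using the longest matching base URL."""
    matches = [(p, base) for p, base in prefixes.items() if url.startswith(base)]
    if not matches:
        return url
    prefix, base = max(matches, key=lambda item: len(item[1]))
    return f"{prefix}:{url[len(base):]}"


print(expand_prefix("universal:frequency/mon"))
# https://wcrp-cmip.github.io/WCRP-universe/frequency/mon
```

Choosing the longest matching base in `compact_url` keeps the result unambiguous if one base URL is ever nested inside another.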

```bash
git clone https://github.com/wcrp-cmip/CMIP-LD.git
cd CMIP-LD
pip install -e .
```

Requirements:
- Python 3.8+
- `jsonld-recursive` - JSON-LD processing
- `requests` - HTTP requests
- Optional: `esgvoc` - For Pydantic model integration

```python
import cmipld

# Fetch and resolve a vocabulary term
data = cmipld.get("universal:frequency/mon")
print(data)

# Expand a JSON-LD document
expanded = cmipld.expand("universal:frequency")
```

```python
import cmipld

# Get the full URL for a prefix
url = cmipld.mapping['universal']
# → 'https://wcrp-cmip.github.io/WCRP-universe/'

# Resolve a prefixed URI
full_url = cmipld.resolve_prefix("universal:frequency/mon")
```

```python
from esgvoc.api import search

# Search for terms
results = search.find("frequency", term="mon")
print(results)
```

```
CMIP-LD/
├── cmipld/                     # Main Python package
│   ├── __init__.py             # Package initialization & client setup
│   ├── locations.py            # Prefix mappings and URL resolution
│   ├── prefix_mappings.json    # Prefix → repository mappings
│   ├── generate/               # Documentation generation tools
│   │   ├── create_readme.py    # Generate READMEs for vocab directories
│   │   ├── generate_summary.py
│   │   └── validate_json.py
│   ├── utils/                  # Utility functions
│   ├── git/                    # Git integration
│   ├── extract/                # Data extraction tools
│   └── ...
├── actions/                    # GitHub Actions for CI/CD
├── static/                     # Static assets (viewer, images)
├── notebooks/                  # Example Jupyter notebooks
└── scripts/                    # Standalone utility scripts
```
The `create_readme.py` script generates standardized documentation for vocabulary directories containing JSON-LD files:

```bash
python -m cmipld.generate.create_readme /path/to/src-data/universe
```

Features:
- Only processes directories with a `_context` file
- Extracts schema from Pydantic models (via esgvoc) or JSON keys
- Generates usage examples for cmipld, esgvoc, and direct HTTP
- Creates collapsible file listings
- Analyzes external dependencies
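The first two selection rules listed for `create_readme.py` can be sketched with the standard library alone: find directories that contain a `_context` file, then take the union of the top-level keys of their JSON files as a fallback "schema". The function names and the exact file name are assumptions for illustration; this is not the code in `create_readme.py`, which can also draw on esgvoc Pydantic models.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Set


def vocab_directories(root: Path, context_name: str = "_context") -> List[Path]:
    """Return directories under root that contain a context file (file name is an assumption)."""
    return sorted(p.parent for p in root.rglob(context_name) if p.is_file())


def keys_by_directory(directory: Path, context_name: str = "_context") -> Set[str]:
    """Union of top-level keys across the JSON files in one vocabulary directory."""
    keys: Set[str] = set()
    for path in sorted(directory.glob("*.json")):
        if path.name == context_name:
            continue  # never treat the context itself as a term
        data = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(data, dict):
            keys.update(data)
    return keys


# Usage on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "frequency").mkdir()
    (root / "frequency" / "_context").write_text("{}", encoding="utf-8")
    (root / "frequency" / "mon.json").write_text(
        json.dumps({"id": "mon", "label": "monthly"}), encoding="utf-8")
    (root / "notes").mkdir()  # no _context file, so it is skipped
    for directory in vocab_directories(root):
        print(directory.name, sorted(keys_by_directory(directory)))
# frequency ['id', 'label']
```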

```bash
python scripts/collect_vocab_docs.py /path/to/src-data --output docs/vocabularies
```

This collects all vocabulary READMEs into a single folder for rendering with MkDocs.
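Collecting READMEs for MkDocs amounts to copying each vocabulary's README into one output folder. The sketch below is a hypothetical stand-in for `scripts/collect_vocab_docs.py`: the naming scheme (one Markdown file per directory, named after its path relative to the source root) is an assumption, not the script's documented behaviour.

```python
import shutil
from pathlib import Path
from typing import List


def collect_readmes(src: Path, output: Path) -> List[Path]:
    """Copy every README.md below src into output as '<relative-dir>.md' (assumed naming scheme)."""
    output.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for readme in sorted(src.rglob("README.md")):
        relative = readme.parent.relative_to(src)
        if relative == Path("."):
            continue  # skip a README sitting directly in the source root
        # Flatten nested paths, e.g. universe/frequency -> universe-frequency.md
        target = output / ("-".join(relative.parts) + ".md")
        shutil.copyfile(readme, target)
        written.append(target)
    return written
```

Flattening nested paths into hyphenated file names keeps the output folder flat; mirroring the source directory tree would work just as well with MkDocs.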
CMIP-LD provides reusable GitHub Actions for vocabulary repositories:

| Action | Description |
|---|---|
| `actions/process_jsonld` | Process and validate JSON-LD files |
| `actions/build-mkdocs` | Build MkDocs documentation |
| `actions/check-graph` | Validate graph structure |
| `actions/commit-all` | Commit changes with attribution |
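One check a graph validator such as `actions/check-graph` might perform is confirming that every `@id` reference points at a node defined in the graph. The sketch below assumes a flat list of JSON-LD nodes with compact `@id` values and treats any nested `@id` as a reference; the function names, the data shape, and the made-up identifiers are illustrative assumptions, and the action's actual checks may differ.

```python
from typing import Any, Dict, Iterable, List, Set


def collect_refs(value: Any) -> Set[str]:
    """Gather @id values from nested objects inside a node's properties."""
    refs: Set[str] = set()
    if isinstance(value, dict):
        if "@id" in value:
            refs.add(value["@id"])
        for key, item in value.items():
            if key != "@id":
                refs |= collect_refs(item)
    elif isinstance(value, list):
        for item in value:
            refs |= collect_refs(item)
    return refs


def dangling_references(nodes: Iterable[Dict[str, Any]]) -> List[str]:
    """Return referenced @ids that no top-level node defines, sorted."""
    node_list = list(nodes)
    defined = {node["@id"] for node in node_list if "@id" in node}
    referenced: Set[str] = set()
    for node in node_list:
        for key, value in node.items():
            if key != "@id":  # a node's own @id is a definition, not a reference
                referenced |= collect_refs(value)
    return sorted(referenced - defined)


graph = [
    {"@id": "universal:frequency/mon", "label": "monthly"},
    {"@id": "cmip7:experiment/historical",
     "frequency": {"@id": "universal:frequency/mon"},
     "parent": {"@id": "cmip7:experiment/piControl"}},
]
print(dangling_references(graph))  # ['cmip7:experiment/piControl']
```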
See CONTRIBUTING.md for guidelines.
- esgvoc - ESGF Vocabulary API with Pydantic models
- jsonld-recursive - JSON-LD recursive resolution
- WCRP-universe - Universal vocabularies
Apache 2.0 - See LICENSE for details.
Developed by WCRP-CMIP for the climate science community.
