Pipeline to transform GoldenGate XML into RDF.
This Docker Image exposes a server on port 4505 which:
- listens for github webhooks (
POSTrequests) from the configured source repo (plazi/treatments-xml) - processes the changed files to generate rdf
- pushes these into the configured target repo (
plazi/treatments-rdf)
This webserver also exposes the follwing paths:
/status: Serves a Badge (svg) to show the current pipeline status/workdir/jobs/: List of runs/workdir/jobs/[id]/status.json: Status of run with that id/workdir/jobs/[id]/log.txt: Log of run with that id/update?from=[from-commit-id]&till=[till-commit-id]: send aPOSThere to update all files modified since from-commit-id up till-commit-id or HEAD if not specified/full_update: send aPOSThere to run the full_update script. Note that this will not delete any files (yet).
Build as a docker container.
docker build . -t gg2rdfRequires a the environment-variable GHTOKEN as
username:<personal-acces-token> to authenticate the pushing into the
target-repo.
Then run using a volume
docker run --name gg2rdf --env GHTOKEN=username:<personal-acces-token> -p 4505:4505 -v gg2rdf:/app/workdir gg2rdfExposes port 4505.
services:
gg2rdf:
...
environment:
- GHTOKEN=username:<personal-acces-token>
volumes:
- gg2rdf:/app/workdir
volumes:
gg2rdf:Edit the file config/config.ts. Should be self-explanatory what goes where.
The repo comes with vscode devcontaioner configurations. Some tweaks to allow using git from inside the devcontainer.
To start from the terminal in vscode:
set -a; source .env; set +a; deno run -A src/main.ts
use git diff origin/main HEAD -- from the target-repo inside the container after running some big transformation.
Hint: run git switch --discard-changes --force-create main --track origin/main before gg2rdf tries to reclone the branch if there are already changes in the target-repo.
gg2rdf will ouput the follwing messages and put them as comments into the generated ttl:
- Errors:
- «Error: Could not create RDF due to missing <document>»
- «There was some Error in gg2rdf» followed by the javascript error if one occurs
- «Cannot produce RDF due to data errors:»
- «the treatment is lacking the taxon» if no element matches
document treatment subSubSection[type="nomenclature"] taxonomicName
- «the treatment is lacking the taxon» if no element matches
- «Error: Invalid Rank» and «Error: Invalid taxon relation» for invalid taxon-concepts
- Warnings:
- «Warning: Failed to output a material citation, could not create identifier»
- «Warning: treatment taxon is missing ancestor kingdom»
- «Warning: abbreviated
rank"name"» if a taxon name component contains a.(e.g. «Warning: abbreviated genus "T."») - «Warning: Could not determine parent name of
tn-uri»