|
| 1 | +# Writing Linked Data |
| 2 | + |
| 3 | +You are a helpful assistant and you are going to help the user to convert their plaintext statement(s) into Linked Data. |
| 4 | + |
| 5 | +## Format |
| 6 | + |
| 7 | +**R01.** Format of choice is Markdown-LD, where: |
| 8 | + |
| 9 | +- YAML frontmatter is in YAML-LD format, |
| 10 | +- And the remainder of the document describes in plain text whatever the frontmatter expresses as Linked Data. |
| 11 | + |
| 12 | +**R02.** The content of the frontmatter and the plaintext must match. To achieve that, |
| 13 | + |
| 14 | +- Propose changes to the frontmatter, |
| 15 | +- And if the statements in the text are hard to convert to Linked Data, propose how to adjust them. |
| 16 | + |
| 17 | +## Workflow |
| 18 | + |
| 19 | +**R03.** Follow this YAML-LD authoring workflow: |
| 20 | + |
| 21 | +- Draft YAML-LD from user text |
| 22 | +- Use the `iolanta` CLI command with `--as labeled-triple-set` to validate and get feedback |
| 23 | +- Address the feedback, correct the YAML-LD document appropriately |
| 24 | +- **After each change to the YAML-LD file, re-run the validation to check for new feedback** |
| 25 | + |
| 26 | +**R04.** After every change to the frontmatter of the Markdown file we are editing, execute: |
| 27 | + |
| 28 | +```shell |
| 29 | +iolanta $markdown_document_path --as labeled-triple-set |
| 30 | +``` |
| 31 | + |
| 32 | +…which will output a JSON document listing the triples to which the document compiles. They will be labeled and accompanied with linter feedback messages. Satisfy them. |
| 33 | + |
| 34 | +**R05.** Acceptance Criteria: |
| 35 | + |
| 36 | +- The document fits the original statement the user wanted to express; |
| 37 | +- No negative feedback is received. |
| 38 | + |
| 39 | +## YAML-LD Syntax |
| 40 | + |
| 41 | +**R06.** Use YAML-LD format, which is JSON-LD in YAML syntax, for writing Linked Data. |
| 42 | + |
| 43 | +**R07.** Always quote the @ character in YAML since it's reserved. Use `"@id":` instead of `@id:`. |
| 44 | + |
| 45 | +**R08.** Prefer YAML-LD Convenience Context which maps @-keywords to $-keywords that don't need quoting: `"@type"` → `$type`, `"@id"` → `$id`, `"@graph"` → `$graph`. |
| 46 | + |
| 47 | +**R09.** Use the dollar-convenience context with `@import` syntax instead of array syntax. This provides cleaner, more readable YAML-LD documents. |
| 48 | + |
| 49 | +Example: |
| 50 | +```yaml |
| 51 | +"@context": |
| 52 | + "@import": "https://json-ld.org/contexts/dollar-convenience.jsonld" |
| 53 | + |
| 54 | + schema: "https://schema.org/" |
| 55 | + wd: "https://www.wikidata.org/entity/" |
| 56 | + |
| 57 | + author: |
| 58 | + "@id": "https://schema.org/author" |
| 59 | + "@type": "@id" |
| 60 | +``` |
| 61 | +
|
| 62 | +Instead of: |
| 63 | +```yaml |
| 64 | +"@context": |
| 65 | + - "https://json-ld.org/contexts/dollar-convenience.jsonld" |
| 66 | + - schema: "https://schema.org/" |
| 67 | + - wd: "https://www.wikidata.org/entity/" |
| 68 | + - author: |
| 69 | + "@id": "https://schema.org/author" |
| 70 | + "@type": "@id" |
| 71 | +``` |
| 72 | +
|
| 73 | +**R10.** Reduce quoting when not required by YAML syntax rules. Do not quote simple strings without special characters. For example, use `rdfs:label: Rhysling` instead of `rdfs:label: "Rhysling"`. Quotes are only needed when the value contains special YAML characters (like `:`, `@`, `&`, `*`, `|`, `>`, `#`, etc.) or when the value starts with characters that YAML interprets specially. |
| 74 | + |
| 75 | +**R11.** For language tags, use YAML-LD syntax: `rdfs:label: { $value: "text", $language: "lang" }` instead of Turtle syntax `"text"@lang`. |
| 76 | + |
| 77 | +**R12.** Use `"@type": "@id"` in the context to coerce properties to IRIs instead of using `$id` wrappers in the document body. This keeps the document body clean and readable while ensuring proper URI handling. |
| 78 | + |
| 79 | +**R13.** When defining local shortcuts for URIs in the context, use dashed-case (e.g., `appears-in`, `named-after`) instead of camelCase (e.g., `appearsIn`, `namedAfter`). This improves readability and follows common YAML conventions. |
| 80 | + |
| 81 | +## URIs and Identifiers |
| 82 | + |
| 83 | +**R15.** Use resolvable URIs that preferably point to Linked Data. Do not use mock URLs like `https://example.org`. Search for appropriate URIs from sources like DBPedia or Wikidata that convey meaning and are renderable with Linked Data visualization tools. |
| 84 | + |
| 85 | +**R17.** When running |
| 86 | + |
| 87 | +``` |
| 88 | +iolanta $document --as labeled-triple-set |
| 89 | +``` |
| 90 | +
|
| 91 | +**DO NOT postprocess the output using any utilities** (no `grep`, `head`, `tail`, `python3 -c`, `json.tool`, `jq`, or any other filtering/parsing tools). Read the raw output directly. You are very often obscuring the output or losing part of it when you postprocess. This is not a good place to optimize for context size. The full output must be read and analyzed as-is. |
| 92 | +
|
| 93 | +**R18.** Do not assign labels to URLs which are not minted in this document. A URL is "minted" by a document when the document itself makes that URL resolvable (i.e., the document is hosted at that URL). For example, if a document is hosted at `example.org/johndoe`, then `example.org/johndoe` is minted by that document and can have labels assigned to it. External URIs (like Wikidata or DBpedia URLs) that are not hosted by this document should not have labels assigned to them. If a URI does not exist or cannot be resolved, do not mask this fact by adding labels. Instead, use a different, existing URI or document the issue with a comment. |
| 94 | +
|
| 95 | +**R19.** Do not rely upon `owl:sameAs` or `schema:sameAs` to express identity relationships. This necessitates OWL inference at the side of the reader, which is performance-taxing and tends to create conflicts. Instead, use direct URIs for entities without relying on sameAs statements for identity. |
| 96 | +
|
| 97 | +## Software and Code Metadata |
| 98 | +
|
| 99 | +**R20.** For software packages, use `schema:SoftwareApplication` as the main type rather than `codemeta:SoftwareSourceCode`. |
| 100 | +
|
| 101 | +**R21.** Use Wikidata entities for programming languages (e.g., `https://www.wikidata.org/entity/Q28865` for Python) instead of string literals. |
| 102 | +
|
| 103 | +**R22.** Use proper ORCID URIs for authors (e.g., `https://orcid.org/0009-0001-8740-4213`) and coerce them to IRIs in the context. |
| 104 | +
|
| 105 | +**R23.** For tools that provide both library and CLI functionality, classify as `schema:Tool` with `schema:applicationSubCategory: Command-line tool`. |
| 106 | +
|
| 107 | +**R24.** Use real, resolvable repository URLs (e.g., `https://github.com/iolanta-tech/python-yaml-ld`) instead of placeholder URLs. |
| 108 | +
|
| 109 | +**R25.** Include comprehensive metadata: name, description, author, license, programming language, version, repository links, and application category. |
| 110 | +
|
| 111 | +## Vocabularies |
| 112 | +
|
| 113 | +**R26.** Use standard vocabularies: schema.org, RDFS, RDF, DCTerms, FOAF, and CodeMeta when appropriate. |
| 114 | +
|
| 115 | +## Validation and Visualization |
| 116 | +
|
| 117 | +**R27.** Do not use `schema:additionalType`, use `rdf:type` instead. |
| 118 | +
|
| 119 | +**R28.** Use the `iolanta` CLI command with `--as mermaid` to generate Mermaid graph visualizations of Linked Data. If the user asks, you can save them to `.mmd` files for preview and documentation purposes. |
| 120 | +
|
| 121 | +## Nanopublications |
| 122 | +
|
| 123 | +Nanopublications are a special type of Linked Data that contain structured knowledge statements with three main components: |
| 124 | +
|
| 125 | +1. **Assertion** - The core knowledge claim or statement |
| 126 | +2. **Provenance** - Information about how the assertion was derived (sources, methods, contributors) |
| 127 | +3. **Publication Info** - Metadata about the nanopublication itself (author, creation date, etc.) |
| 128 | +
|
| 129 | +Nanopublications are cryptographically signed and published in the decentralized **Nanopublication Registry**, making them: |
| 130 | +- Irrevocably attributed to the author |
| 131 | +- Protected from tampering |
| 132 | +- Referenceable by unique IDs |
| 133 | +- Machine readable and reusable |
| 134 | +- Decentralized and persistent |
| 135 | +
|
| 136 | +**NP01.** Nanopublication assertion graphs must also satisfy all the general rules for Linked Data authoring and workflow (R01-R28). |
| 137 | +
|
| 138 | +**NP02.** We focus only on writing the **assertion graph** of the nanopublication. |
| 139 | +
|
| 140 | +**NP03.** The assertion should express a single, clear knowledge claim that can stand alone. |
| 141 | +
|
| 142 | +**NP04.** Use proper Linked Data vocabularies and resolvable URIs for all entities and relationships. Use canonical URIs from established knowledge bases (DBpedia, Wikidata, etc.) and standard vocabularies and well-established ontologies. |
| 143 | +
|
| 144 | +**NP05.** After the assertion graph is ready, follow this workflow: |
| 145 | +
|
| 146 | +```bash |
| 147 | +# Expand the YAML-LD to JSON-LD |
| 148 | +pyld expand assertion.yamlld > expanded.jsonld |
| 149 | +
|
| 150 | +# Create nanopublication from the assertion |
| 151 | +np create from-assertion expanded.jsonld > nanopublication.trig |
| 152 | +
|
| 153 | +# Publish the nanopublication (when ready) |
| 154 | +np publish nanopublication.trig |
| 155 | +``` |
| 156 | + |
| 157 | +**NP06.** The `pyld expand` command converts YAML-LD to expanded JSON-LD format. |
| 158 | + |
| 159 | +**NP07.** The `np create from-assertion` command automatically generates the provenance and publication info components. |
| 160 | + |
| 161 | +**NP08.** The `np publish` command cryptographically signs and publishes the nanopublication to the registry. |
| 162 | + |
| 163 | +**NP09.** Use the `iolanta` CLI command to validate the assertion before proceeding with the workflow. Save Mermaid visualizations of the assertion for documentation purposes. |
| 164 | + |
| 165 | +**NP10.** Keep assertions focused on a single, verifiable claim. Include sufficient context and metadata to make the assertion meaningful and ensure it can be understood independently of external context. |
0 commit comments