feat(RDF): expose access rights and rights in RDF#3667
maudetes wants to merge 18 commits into `feat/harvest-and-expose-access-rights`
Conversation
udata/core/dataset/rdf.py
Outdated
```python
    DCT.description,
    Literal(InspireLimitationCategory(dataset.access_type_reason_category).definition),
)
yield node, DCT.rights
```
Why not DCT.accessRights here? https://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess is only about access.
That was my first attempt, but I think DCT.accessRights is only 0..1 and we already have the one exposed based on dataset.access_type?
Ah right...
IIRC from the GeoDCAT-AP spec, we should then put dataset.access_type_reason_category in DCT.accessRights instead of dataset.access_type (when available):
https://semiceu.github.io/GeoDCAT-AP/releases/3.1.0/#Dataset.accessrights
For INSPIRE metadata, this property SHOULD be used with the URIs of the "Limitations on public access" code list operated by the INSPIRE Registry [INSPIRE-LPA].
But then I don't know how compatible it is with DCAT, which lists dct:accessRights among "a number of properties are listed with controlled vocabularies that MUST be used for the listed properties", but follows with "implementers are encouraged to publish and to use further region or domain-specific vocabularies that are available online". The fact that GeoDCAT-AP says that the "considerations for expressing legal information follow the guidelines and approach of DCAT-AP" (and therefore I assume don't depart from it) suggests it might be OK to use a more precise vocabulary when available?
If the main target is "pure DCAT", I guess we can't do much more than what you do now. Question is: are we targeting DCAT or GeoDCAT-AP?
I think we're targeting GeoDCAT-AP in the end. However, it should be compatible with DCAT-AP and this is not the case here in my understanding. We can raise this point in GeoDCAT-AP discussions.
In the meantime, we do need to use the EU publication office vocabulary for the European Register for Protected Data (ERPD) (guidelines).
Making a note of this for the GeoDCAT-AP pilot discussions.
The difficult choices will be if/when DCAT (or DCAT-AP) and GeoDCAT-AP differ. But let's hope it won't happen too often.
udata/core/dataset/rdf.py
Outdated
```python
if frequency := frequency_to_rdf(dataset.frequency):
    d.set(DCT.accrualPeriodicity, frequency)

for access_right, predicate in rights_to_rdf(dataset, graph):
```
As you mention in #3667 (comment) it's not clear what should go at the dataset level, so I would suggest adding only what draws consensus, to avoid having to remove stuff later?
In addition to the sources you cited, there is an interesting discussion in SEMICeu/GeoDCAT-AP#156, especially the conclusion from SEMICeu/GeoDCAT-AP#156 (comment):
Conclusion (DCAT-AP-VL / Metadata Vlaanderen advice)
For stability and cross-border interoperability, we recommend sticking to the current hierarchy:
- Dataset → dct:accessRights (PUBLIC / NON_PUBLIC) for access classification.
- Distribution → dct:license + dct:rights (and always a dcat:accessURL).
- DataService → dct:license (+ in DCAT-AP-VL also dct:rights) and (in DCAT-AP-VL) dct:accessRights.
- Offline/restricted/no-download → model a proxy Distribution via dcat:accessURL and attach rights/licence there.
I would therefore refrain from adding anything other than actual accessRights at the Dataset level. rights_to_rdf in this PR might output the license title as rights, which would be inconsistent with having the license only at the resource level.
I've tried splitting the legal information accordingly in 5d0bb15
udata/core/dataset/rdf.py
Outdated
```python
node.set(RDF.type, DCT.RightsStatement)
yield node, DCT.accessRights
if dataset.license and dataset.license.id != DEFAULT_LICENSE["id"]:
    yield Literal(dataset.license.title), DCT.rights
```
Is this needed? It seems the URL in DCT.license should be enough?
Or could this be added as a description of the licence node, like you do with INSPIRE rights below?
I'm worried about having two different independent nodes providing the same information. We have similar cases in ISO-19139 with {access,use}Constraints and otherConstraints and it's a pain to reconcile.
I kept the existing logic from the previous resource_to_rdf implementation, with a `r.add(DCT.rights, Literal(dataset.license.title))`.
I think it is useful for licenses without URLs? For example "License Not Specified" (there are others, but I guess they could have a URL?).
In any case, I think we shouldn't add a DCT.license if we are in the case of not-specified.
Right, in the case of not-specified we don't want a DCT.license. Are we also OK with all the other entries without URIs in https://www.data.gouv.fr/api/1/datasets/licenses/?
My worry is incorrectly outputting a "true" license with only a label but no URI. In which case I think we should still be outputting it as a DCT.license instead of a DCT.rights. But I'm not sure this case can exist?
So if the definition of a license in udata is "has an URI" then we're good with the current implem. Otherwise we may have to be more careful.
That's why I kept the existing logic as I wasn't sure how to improve the situation at the moment.
I can ask for legal advice on this, but I believe keeping close to what exists is fair for now.
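A minimal sketch of the decision logic discussed above, in plain Python (`license_triples` and `DEFAULT_LICENSE_ID` are hypothetical stand-ins, not udata's helpers): emit dct:license only when a URL exists, fall back to a dct:rights literal for label-only licenses, and emit nothing for the not-specified license:

```python
# Assumption: mirrors the intent of udata's DEFAULT_LICENSE["id"]
DEFAULT_LICENSE_ID = "notspecified"


def license_triples(license_id, title, url=None):
    """Return (predicate, value) pairs describing a license."""
    if license_id == DEFAULT_LICENSE_ID:
        # neither dct:license nor dct:rights for "License Not Specified"
        return []
    if url:
        return [("dct:license", url)]
    # label-only licenses fall back to a literal dct:rights
    return [("dct:rights", title)]
```

This avoids emitting two independent nodes carrying the same information, which was the reconciliation concern raised above.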
Use these at the appropriate level, i.e.:
- accessRights for Dataset
- license + rights for Distribution
Co-authored-by: maudetes <maudet.estelle@gmail.com>
There is a lot of variability in our tests but here are some stats.

#### [main](https://app.circleci.com/pipelines/github/opendatateam/udata/8966/workflows/7cc6d4e3-1bab-48fd-ac08-4711b236e5b2?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link&utm_content=summary)
- Around 9min/9min30 for tests
- Total is 11min30 (including deploy)

#### [some branch right now](https://app.circleci.com/pipelines/github/opendatateam/udata/8963/workflows/3b176b23-e787-4738-8e4e-ed860f8ffe55?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link&utm_content=summary)
- Between 7min30 and 9min30 for tests
- Total 9min52

#### [this branch](https://app.circleci.com/pipelines/github/opendatateam/udata/8974/workflows/9696d75b-506c-4ec8-ae21-4294873ace81?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link&utm_content=summary)
- Between 5min40 and 6min45 for tests
- Total 7min16
The real commit is 3220c99 (adding the type logic to `field()`). The other commits are simple find-and-replace of the dynamic `db.*Field` to the correct direct import.

1. Add the correct return type for the `field()` wrapper based on the `inner` Mongo type.
2. Convert dynamic `db.StringField` to plain Mongo `StringField` so it will work.

This allows having the correct type inside models, which is even better than plain `db.StringField` or even `StringField` during usage. For example:

```python
class DatasetAPI(API):
    @api.doc("get_dataset")
    @api.marshal_with(dataset_fields)
    def get(self, dataset: Dataset):
        """Get a dataset given its identifier"""
        if not dataset.permissions["edit"].can():
            if dataset.private:  # <- Here VSCode knows that .private is a bool
                api.abort(404)
            elif dataset.deleted:
                api.abort(410, "Dataset has been deleted")
        return dataset
```

With `uvx ty check`:
- before: `Found 3479 diagnostics`
- after: `Found 3212 diagnostics`

#### TODO
- [x] Convert all `db.*` fields
- [ ] Improve `field()` typechecking by creating an object instead of a dict (so that it can be checked inside the API generation)
- [ ] Support `require=True` to add `| None` to nullable fields in Mongo
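The return-type idea behind that commit can be sketched like this (a simplified stand-in, not udata's actual `field()` implementation or metadata storage):

```python
from typing import TypeVar

F = TypeVar("F")


def field(inner: F, **metadata) -> F:
    # Return `inner` unchanged; the TypeVar lets the type checker see the
    # concrete Mongo field class instead of an erased wrapper type.
    inner._field_metadata = metadata  # illustrative storage only
    return inner


class StringField:
    """Stand-in for mongoengine's StringField."""


# The checker infers `name: StringField` rather than a generic wrapper
name = field(StringField(), description="The dataset name")
```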
…3678) It prevents conflicts in case of values that may impact tests in udata.cfg.
I have some crashes in #3674 (only on Python 3.11, reproduced twice with a retry at the same point in tests). See https://app.circleci.com/pipelines/github/opendatateam/udata/8980/workflows/8fdff4fd-aceb-4cc9-8746-4a6667808de8?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link&utm_content=summary

This solution is also clearly faster! It wins between 1 and 2 minutes on the tests.

I've tried:
- [x] ~~Dropping the collections from `get_all_models()` instead of dropping the db (but some dynamic fake models weren't taken into consideration)~~
- [x] ~~Dropping the collections from `__sub_classes__` of `Document`, but I still got crashes~~
- [x] Dropping the items inside collections (there was one problem with one `Fake` model that creates an index for `slug=Unique`, so other `Fake` models with the same collection name cannot insert two documents with `slug: null`)
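The retained approach, emptying each collection rather than dropping the database, can be sketched as follows (using a tiny in-memory stand-in for a pymongo-like handle so the sketch is self-contained; real code would call the same two methods on the actual driver):

```python
class FakeCollection:
    """Minimal stand-in for a pymongo collection."""

    def __init__(self):
        self.docs = []

    def delete_many(self, _filter):
        # With an empty filter, this removes every document
        self.docs.clear()


class FakeDB(dict):
    """Minimal stand-in for a pymongo database handle."""

    def list_collection_names(self):
        return list(self.keys())


def clean_collections(db):
    # Emptying collections keeps their indexes (e.g. the unique `slug`
    # index mentioned above), unlike dropping the whole database.
    for name in db.list_collection_names():
        db[name].delete_many({})
```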
This reverts commit 729eb66.
@maudetes there's something wrong with the PR (target branch?): 120-file diff 😬
Related to datagouv/data.gouv.fr#1942
Builds on #3666
Expose access rights and rights in DCAT following ERPD guidelines.
All this information is exposed at both the distribution and dataset levels, since everything is for now consolidated at the dataset level.
In my understanding, there are conflicting visions (e.g. between the DCAT guidelines, DCAT-AP guidelines, GeoDCAT-AP cardinalities and ERPD guidelines) regarding whether or not legal information should be exposed at the dataset level.
As explained, we only expose consolidated information, so we avoid the pitfall of having conflicting legal information at different levels.