Description
In the BioPortal context file (https://github.com/linkml/prefixmaps/blob/main/src/prefixmaps/data/bioportal.csv#L199-L202), two prefixes share the same URI prefix. This seems to invalidate the claim on the README about bijectivity (https://github.com/linkml/prefixmaps/blob/main/README.md?plain=1#L13).
Maybe this is a curation oversight, which I bet @caufieldjh has already found, since I know he recently put a lot of effort into looking through this content. However, this isn't resolved when loading content through the package, which makes me think that the package should be more careful about checking the integrity of content. The following code illustrates:

    from prefixmaps import load_context

    context = load_context("bioportal")
    for e in context.prefix_expansions:
        if e.prefix in {"INVERSEROLES", "ISO-15926-2_2003"}:
            print(e)

    # Output:
    # PrefixExpansion(context='bioportal', prefix='INVERSEROLES', namespace='http://rds.posccaesar.org/2008/02/OWL/ISO-15926-2_2003#', status=<StatusType.canonical: 'canonical'>)
    # PrefixExpansion(context='bioportal', prefix='ISO-15926-2_2003', namespace='http://rds.posccaesar.org/2008/02/OWL/ISO-15926-2_2003#', status=<StatusType.canonical: 'canonical'>)

This means the assumptions in Context.as_dict() are also incorrect, since this naively iterates through the expansions and picks out the prefix/URI prefix (namespace) pairs.
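To make the kind of integrity check I have in mind concrete, here is a minimal sketch. It uses plain `(prefix, namespace)` tuples as stand-ins for `PrefixExpansion` records so it runs without the package. The helper name `find_shared_namespaces` and the `EX` row are hypothetical and not part of prefixmaps.

```python
from collections import defaultdict

# Stand-in data: the first two pairs are the BioPortal records shown above;
# the third is a made-up, non-conflicting row added only for contrast.
expansions = [
    ("INVERSEROLES", "http://rds.posccaesar.org/2008/02/OWL/ISO-15926-2_2003#"),
    ("ISO-15926-2_2003", "http://rds.posccaesar.org/2008/02/OWL/ISO-15926-2_2003#"),
    ("EX", "http://example.org/"),  # hypothetical
]


def find_shared_namespaces(pairs):
    """Return {namespace: [prefixes]} for each namespace claimed by more than one prefix."""
    by_namespace = defaultdict(list)
    for prefix, namespace in pairs:
        by_namespace[namespace].append(prefix)
    return {ns: prefixes for ns, prefixes in by_namespace.items() if len(prefixes) > 1}


conflicts = find_shared_namespaces(expansions)
for namespace, prefixes in conflicts.items():
    print(f"{namespace} is shared by {prefixes}")
```

A loader could run something along these lines over the canonical records and either raise or warn when the result is non-empty, rather than silently producing a non-bijective mapping.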
I'm pretty stumped trying to understand the data structure used in this package: it seems like a lot of things that could be grouped are not. Have you considered using a JSON structure?
For example, the curies package has a lot of overlap, since it also needs to represent a group of related prefixes and URI prefixes while denoting which is the "canonical" prefix and which is the "canonical" URI prefix. This data structure is described in the curies documentation and a fuller example with the whole Bioregistry can be found here.
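As a rough illustration of that grouping, the two BioPortal records above could be folded into one record. This is a sketch using plain dictionaries. The field names (`prefix`, `prefix_synonyms`, `uri_prefix`, `uri_prefix_synonyms`) follow my understanding of the extended prefix map in the curies documentation, and choosing `ISO-15926-2_2003` as the canonical prefix is an arbitrary assumption, since the source data gives no basis for the choice.

```python
import json

# One grouped record instead of two competing "canonical" rows.
# Which prefix is canonical is an arbitrary assumption made for this sketch.
extended_prefix_map = [
    {
        "prefix": "ISO-15926-2_2003",
        "prefix_synonyms": ["INVERSEROLES"],
        "uri_prefix": "http://rds.posccaesar.org/2008/02/OWL/ISO-15926-2_2003#",
        "uri_prefix_synonyms": [],
    },
]

# The canonical prefix map is bijective by construction: one pair per record.
prefix_map = {r["prefix"]: r["uri_prefix"] for r in extended_prefix_map}

# Synonyms remain available for standardizing incoming CURIEs.
synonym_to_canonical = {
    synonym: r["prefix"]
    for r in extended_prefix_map
    for synonym in r["prefix_synonyms"]
}

print(json.dumps(extended_prefix_map, indent=2))
```

With this shape, the non-canonical prefix is kept as a synonym instead of being dropped or competing with the canonical one.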
Background: I'm currently trying to implement a more principled import of a Context object from this package into a curies.Converter in biopragmatics/curies#22 and am stuck since there's no way to decide which of these two canonical records should be the actual canonical record.