Lutaml integration by andrew2net · Pull Request #66 · relaton/relaton-bipm

andrew2net · 2025-02-13T23:47:34Z

No description provided.

…w URI

- Changed ID formats in YAML files for CGPM, CIPM, and JCRB meetings to a more descriptive format.
- Updated XML fixture for cctf_recommendation_2009_02.xml to include a new URI element.
- Modified the data_outcomes_parser.rb to use Array instead of Relaton.array for handling PDF links.
- Adjusted date parsing in article_parser_spec.rb to ensure the date is compared as a string.
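As a rough illustration of the Array change above, here is a minimal sketch that assumes "Array" means Ruby's built-in Kernel#Array conversion method. The helper name pdf_links is hypothetical, and the behavior of Relaton.array is not shown on this page, so no comparison with it is implied.

```ruby
# Hypothetical helper: normalise a value that may be nil, a single link,
# or a list of links into an array using Kernel#Array.
def pdf_links(value)
  Array(value)
end

p pdf_links(nil)                # nil becomes an empty array
p pdf_links("a.pdf")            # a single value is wrapped
p pdf_links(%w[a.pdf b.pdf])    # an array is returned unchanged
```

One caveat with Kernel#Array is that a Hash argument is converted to an array of key/value pairs, so it only suits values that are nil, scalars, or arrays.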

@ronaldtse

Context: GitHub issue #28 (Metrologia parsing) is mostly implemented, but two items from the comment thread remain unresolved:

1. <back><ref-list> not parsed as relations: Some Metrologia XML files contain a <back><ref-list> section with bibliographic references. @ronaldtse confirmed these should be parsed as relations (type "cites").
2. Implicit deduplication: The same article can appear in multiple date-stamped archives. Currently, Dir glob ordering implicitly overwrites older copies with newer ones, but this is fragile and not guaranteed. @ronaldtse said to take the newest copy based on the archive date in the folder name.

Changes:

1. Parse <back><ref-list> as "cites" relations
2. Explicit date-based deduplication in fetcher
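The explicit date-based deduplication could look roughly like the sketch below. It is not the fetcher code from this PR. It assumes a hypothetical layout in which each archive folder is named with an ISO date (YYYY-MM-DD) and copies of the same article share a file name. The method name newest_copies and the sample paths are illustrative only.

```ruby
require "date"

# Assumed layout (hypothetical): "<root>/<YYYY-MM-DD>/<article file>".
# Groups paths by file name and keeps the copy from the newest archive.
def newest_copies(paths)
  paths
    .group_by { |path| File.basename(path) }
    .map do |_file, copies|
      copies.max_by { |path| Date.iso8601(File.basename(File.dirname(path))) }
    end
end

# Hypothetical sample paths; a.xml appears in two archives.
paths = [
  "data/2022-05-01/a.xml",
  "data/2023-01-15/a.xml",
  "data/2021-12-31/b.xml",
]
puts newest_copies(paths)
```

Because the choice is made by comparing parsed dates rather than by iteration order, the result does not depend on the order in which Dir.glob happens to return the files.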

- Updated contributor organization structure in multiple YAML files to include a subdivision for the committee.
- Added description fields for roles in YAML files to specify the type as "committee".
- Modified the SI brochure YAML and RXL files to reflect the new organization structure.
- Adjusted tests in data_outcomes_parser_spec.rb to validate the new structure and ensure proper parsing of committee details.

+          next unless from
+
+          owner = l.at("./copyright-statement").text.split(" & ").map do |c|
+            /(?<name>[A-z]+(?:\s[A-z]+)*)/ =~ c


In general, the fix is to replace ambiguous or overly broad character ranges like A-z with explicit ranges that only cover the intended characters, such as A-Za-z. This excludes the six characters that sit between Z and a in the ASCII table: [, \, ], ^, _, and `.

In this specific case, on line 294 in lib/relaton/bipm/rawdata_bipm_metrologia/article_parser.rb, the regex:

/(?<name>[A-z]+(?:\s[A-z]+)*)/ =~ c

is clearly intended to match alphabetic words separated by single spaces. To keep the same semantics but avoid the overly permissive range, change both A-z instances to A-Za-z:

/(?<name>[A-Za-z]+(?:\s[A-Za-z]+)*)/ =~ c

This keeps the behavior (names composed of letters and spaces) while ensuring no stray punctuation characters are accidentally matched as part of the name. No additional methods or imports are needed; this is a local change to the regex in the parse_copyright method.
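The difference between the two ranges can be checked with a short standalone script. This is a sketch for illustration, not code from the PR: the constant names, the extract_name helper, and the sample string are hypothetical, while the two patterns are copied from the suggestion.

```ruby
# Original pattern: A-z spans ASCII codes 65..122, so it also accepts
# the six characters between "Z" (90) and "a" (97).
LOOSE = /(?<name>[A-z]+(?:\s[A-z]+)*)/
# Suggested pattern: ASCII letters only.
STRICT = /(?<name>[A-Za-z]+(?:\s[A-Za-z]+)*)/

# Characters accepted by [A-z] but rejected by [A-Za-z].
EXTRAS = (65..122).map(&:chr).grep(/[A-z]/).grep_v(/[A-Za-z]/)

# Returns the "name" capture, or nil when nothing matches.
def extract_name(pattern, text)
  match = pattern.match(text)
  match && match[:name]
end

# Hypothetical input containing an underscore.
sample = "IOP_Publishing Ltd"

puts EXTRAS.join(" ")              # [ \ ] ^ _ `
puts extract_name(LOOSE, sample)   # IOP_Publishing Ltd
puts extract_name(STRICT, sample)  # IOP
```

With the tightened range the match stops at the underscore, so input that contains such characters yields a shorter name rather than an error. Whether that matters depends on the copyright statements actually present in the data.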

+          next unless from
+
+          owner = l.at("./copyright-statement").text.split(" & ").map do |c|
+            /(?<name>[A-z]+(?:\s[A-z]+)*)/ =~ c


In general, overly permissive ranges like [A-z] should be replaced with explicit ranges that include only the desired characters, most commonly [A-Za-z] for ASCII letters. This avoids unintentionally matching punctuation between Z and a in the ASCII table.

Here, the problematic line is in parse_copyright in lib/relaton/bipm/rawdata_bipm_metrologia/article_parser.rb:

/(?<name>[A-z]+(?:\s[A-z]+)*)/ =~ c

This appears to intend to capture a person's or organization's name made of alphabetic words separated by single spaces. To fix the issue without changing higher-level functionality, we should tighten both occurrences of [A-z] to [A-Za-z]. The rest of the pattern (+, (?:\s...)*, and the named capture) can remain unchanged. No new imports or methods are required; we are only adjusting the regex literal.

Concretely, in that file, update line 294 so that [A-z] is replaced with [A-Za-z] in both locations within the pattern.
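The named capture matters because the diff uses the local variable name right after the match. A small sketch of that Ruby behavior follows. The helper owner_name and the sample statement are hypothetical, and only the pattern and the split(" & ") call mirror the diff.

```ruby
# Hypothetical helper wrapping the tightened pattern from the suggestion.
def owner_name(statement)
  # A regex literal with named groups on the left-hand side of =~ assigns
  # each group to a local variable of the same name. When the match
  # fails, the variable is set to nil.
  /(?<name>[A-Za-z]+(?:\s[A-Za-z]+)*)/ =~ statement
  name
end

# Hypothetical copyright statement, split the same way as in the diff.
owners = "2020 BIPM & IOP Publishing Ltd".split(" & ").map { |c| owner_name(c) }
p owners              # ["BIPM", "IOP Publishing Ltd"]
p owner_name("2020")  # nil
```

This assignment only happens when the regex literal is the receiver of =~ and contains no interpolation. Since name can be nil when a segment holds no letters, calling code may want to skip or handle that case.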

andrew2net added 26 commits February 13, 2025 18:45

WIP implement flavor model

af2bfa2

WIP update fethcers, parsers, grammars, model, and processor

d484cef

WIP update data fetcher

7f502a8

WIP update data parsers

ae3eaee

update Relaton::Bipm::Bibliography

9219704

refactor: update require statements to use relative paths in Processor class

54c6d9f

update VCRs

4a5d802

refactor: remove EditorialGroup and related classes, update YAML and XML fixtures

d934904

feat: implement Asciibib converter and update ItemData to support conversion

977c8f0

feat: add CLAUDE.md for project guidance and common commands

0267979

fix: add missing require_relative for bipm/version in bipm.rb

1d9176d

feat: implement serialization methods for bibxml format in DataFetcher

b5798fa

fix: update identifiers and document references in fixtures

a3618d8

fix: update biblio.rng to replace optional with zeroOrMore and adjust attribute references

da37306

fix: update meeting IDs for consistency across YAML fixtures

c57a7c7

fix: update serialization output to include reference anchor in bibxml format

69fdd46

fix: update Ruby version requirements to 3.2 and adjust dependencies in gemspec

73f476d

fix: update version to 2.0.0-alpha.2 in version.rb

b065f34

Update VCR cassettes

492a8ef

fix: unify INDEX_FILE constant usage across modules

db67d2a

fix: update ID formatting in fix_si_brochure_id method to include hyphens

110163a

update VCRs

f19fe28

bump version to 2.0.0-alpha.3

355de02

github-advanced-security AI found potential problems Mar 24, 2026

feat: enhance error handling and logging in DataFetcher and ArticleParser fixes relaton/relaton-iso#167

5d679f8

andrew2net force-pushed the lutaml-integration branch from ec982ef to 5d679f8 Compare March 24, 2026 18:53

fix: improve error handling for journal title, contributors, affiliations, and citations in ArticleParser

a344200

andrew2net added 7 commits March 25, 2026 13:16

feat: enhance error handling in parsers and fetcher for article and SI brochure data

6380e45

refactor: remove unused gh_issue_channel method and fix typo in report_errors method

eef9606

Update grammars

d90af53

chore: update version to 2.0.0-alpha.4 and change relaton-bib dependency to 2.0.0-alpha.7

c0dec5d

chore: update version to 2.0.0 and change relaton-bib dependency to 2.0.0

5901d81

feat: add index fixture and update Rake task for downloading latest index data

d31b256

Update VCRs

74a8fa0

andrew2net wants to merge 35 commits into main from lutaml-integration

@@ -291,7 +291,7 @@
                       next unless from
                       owner = l.at("./copyright-statement").text.split(" & ").map do |c|
-                        /(?<name>[A-z]+(?:\s[A-z]+)*)/ =~ c
+                        /(?<name>[A-Za-z]+(?:\s[A-Za-z]+)*)/ =~ c
                         org_name = Relaton::Bib::TypedLocalizedString.new(content: name, language: "en", script: "Latn")
                         org = Relaton::Bib::Organization.new name: [org_name]
                         Relaton::Bib::ContributionInfo.new(organization: org)
