…w URI
- Changed ID formats in YAML files for CGPM, CIPM, and JCRB meetings to a more descriptive format.
- Updated XML fixture for cctf_recommendation_2009_02.xml to include a new URI element.
- Modified data_outcomes_parser.rb to use Array instead of Relaton.array for handling PDF links.
- Adjusted date parsing in article_parser_spec.rb to ensure the date is compared as a string.
… attribute references
Context: GitHub issue #28 (Metrologia parsing) is mostly implemented, but two items from the comment thread remain unresolved:
1. <back><ref-list> not parsed as relations: some Metrologia XML files contain a <back><ref-list> section with bibliographic references. @ronaldtse confirmed these should be parsed as relations (type "cites").
2. Implicit deduplication: the same article can appear in multiple date-stamped archives. Currently, Dir glob ordering implicitly overwrites older copies with newer ones, but this is fragile and not guaranteed. @ronaldtse said to take the newest copy based on the archive date in the folder name.
Changes:
1. Parse <back><ref-list> as "cites" relations
2. Explicit date-based deduplication in fetcher
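The explicit date-based deduplication described above could look roughly like the following. This is a minimal sketch and not the fetcher's actual code: the folder layout (`data/<YYYY-MM-DD>_archive/<article>.xml`), the `ARCHIVE_DATE` pattern, and the helper names `archive_date` and `newest_copies` are all assumptions made for illustration.

```ruby
require "date"

# Hypothetical layout: <root>/<YYYY-MM-DD>_archive/<article>.xml
# The folder naming is an assumption; adjust ARCHIVE_DATE to the real pattern.
ARCHIVE_DATE = /(\d{4})-(\d{2})-(\d{2})/

# Extract the archive date from the folder part of a path, or nil if absent.
def archive_date(path)
  m = ARCHIVE_DATE.match(File.dirname(path))
  m && Date.new(m[1].to_i, m[2].to_i, m[3].to_i)
end

# Keep one path per article file name: the copy from the newest archive.
# Paths without a recognizable date sort before all dated ones.
def newest_copies(paths)
  paths
    .group_by { |p| File.basename(p) }
    .map { |_name, copies| copies.max_by { |p| archive_date(p) || Date.new(0) } }
end

paths = [
  "data/2022-01-10_archive/met_1_1_001.xml",
  "data/2023-06-05_archive/met_1_1_001.xml",
  "data/2022-01-10_archive/met_1_1_002.xml",
]
RESULT = newest_copies(paths)
```

Selecting by a parsed date rather than relying on Dir glob ordering makes the choice independent of how the filesystem happens to list the archives.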
- Updated contributor organization structure in multiple YAML files to include a subdivision for the committee.
- Added description fields for roles in YAML files to specify the type as "committee".
- Modified the SI brochure YAML and RXL files to reflect the new organization structure.
- Adjusted tests in data_outcomes_parser_spec.rb to validate the new structure and ensure proper parsing of committee details.
    next unless from

    owner = l.at("./copyright-statement").text.split(" & ").map do |c|
      /(?<name>[A-z]+(?:\s[A-z]+)*)/ =~ c
Check warning
Code scanning / CodeQL
Overly permissive regular expression range Medium
Copilot Autofix
AI 21 days ago
In general, the fix is to replace ambiguous or overly broad character ranges like A-z with explicit ranges that only cover the intended characters, such as A-Za-z. This removes the unintended extra characters between Z and a in the ASCII table.
In this specific case, on line 294 in lib/relaton/bipm/rawdata_bipm_metrologia/article_parser.rb, the regex:
    /(?<name>[A-z]+(?:\s[A-z]+)*)/ =~ c
is clearly intended to match alphabetic words separated by single spaces. To keep the same semantics but avoid the overly permissive range, change both A-z instances to A-Za-z:
    /(?<name>[A-Za-z]+(?:\s[A-Za-z]+)*)/ =~ c
This keeps the behavior (names composed of letters and spaces) while ensuring no stray punctuation characters are accidentally matched as part of the name. No additional methods or imports are needed; this is a local change to the regex in the parse_copyright method.
@@ -291,7 +291,7 @@
     next unless from

     owner = l.at("./copyright-statement").text.split(" & ").map do |c|
-      /(?<name>[A-z]+(?:\s[A-z]+)*)/ =~ c
+      /(?<name>[A-Za-z]+(?:\s[A-Za-z]+)*)/ =~ c
       org_name = Relaton::Bib::TypedLocalizedString.new(content: name, language: "en", script: "Latn")
       org = Relaton::Bib::Organization.new name: [org_name]
       Relaton::Bib::ContributionInfo.new(organization: org)
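To make the warning concrete, the sketch below lists the ASCII characters that fall between `Z` and `a` and checks which of the two character classes accepts them. The constant names and the sample string `"IOP_Publishing"` are illustrative assumptions, not taken from the repository.

```ruby
# Characters that sit between "Z" (0x5A) and "a" (0x61) in ASCII:
# [ \ ] ^ _ `
between = (("Z".ord + 1)...("a".ord)).map(&:chr)

LOOSE  = /\A[A-z]+\z/     # range used before the fix
STRICT = /\A[A-Za-z]+\z/  # range suggested by the autofix

LOOSE_HITS  = between.select { |ch| LOOSE.match?(ch) }
STRICT_HITS = between.select { |ch| STRICT.match?(ch) }

# The same patterns as in parse_copyright, applied to a made-up string
# that contains an underscore.
OLD_NAME = /(?<name>[A-z]+(?:\s[A-z]+)*)/.match("IOP_Publishing")[:name]
NEW_NAME = /(?<name>[A-Za-z]+(?:\s[A-Za-z]+)*)/.match("IOP_Publishing")[:name]

puts "loose accepts:  #{LOOSE_HITS.inspect}"
puts "strict accepts: #{STRICT_HITS.inspect}"
puts "old capture: #{OLD_NAME.inspect}, new capture: #{NEW_NAME.inspect}"
```

For input made only of letters and whitespace the two patterns capture the same text; they differ only when one of those six punctuation characters appears.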
    next unless from

    owner = l.at("./copyright-statement").text.split(" & ").map do |c|
      /(?<name>[A-z]+(?:\s[A-z]+)*)/ =~ c
Check warning
Code scanning / CodeQL
Overly permissive regular expression range Medium
Copilot Autofix
AI 21 days ago
In general, overly permissive ranges like [A-z] should be replaced with explicit ranges that include only the desired characters, most commonly [A-Za-z] for ASCII letters. This avoids unintentionally matching punctuation between Z and a in the ASCII table.
Here, the problematic line is in parse_copyright in lib/relaton/bipm/rawdata_bipm_metrologia/article_parser.rb:
    /(?<name>[A-z]+(?:\s[A-z]+)*)/ =~ c
This appears to intend to capture a person's or organization's name made of alphabetic words separated by single spaces. To fix the issue without changing higher-level functionality, we should tighten both occurrences of [A-z] to [A-Za-z]. The rest of the pattern (+, (?:\s...)*, and the named capture) can remain unchanged. No new imports or methods are required; we are only adjusting the regex literal.
Concretely, in that file, update line 294 so that [A-z] is replaced with [A-Za-z] in both locations within the pattern.
@@ -291,7 +291,7 @@
     next unless from

     owner = l.at("./copyright-statement").text.split(" & ").map do |c|
-      /(?<name>[A-z]+(?:\s[A-z]+)*)/ =~ c
+      /(?<name>[A-Za-z]+(?:\s[A-Za-z]+)*)/ =~ c
       org_name = Relaton::Bib::TypedLocalizedString.new(content: name, language: "en", script: "Latn")
       org = Relaton::Bib::Organization.new name: [org_name]
       Relaton::Bib::ContributionInfo.new(organization: org)
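As a rough illustration of how the tightened pattern behaves in the split-and-capture step, the sketch below runs it over made-up copyright statements. The helper `owner_names` and the sample strings are hypothetical; the real method goes on to build Relaton organization objects, which is omitted here.

```ruby
NAME_RE = /(?<name>[A-Za-z]+(?:\s[A-Za-z]+)*)/

# Split a copyright statement on " & " and take the first run of
# letter-only words from each part (nil when a part has no ASCII letters).
def owner_names(statement)
  statement.split(" & ").map do |part|
    m = NAME_RE.match(part)
    m && m[:name]
  end
end

OWNERS = owner_names("2020 BIPM & IOP Publishing Ltd")

# Caveat: the class is ASCII-only, so a name with accented letters is cut
# at the first non-ASCII character. This holds for the old pattern too.
TRUNCATED = owner_names("Société Exemple")

puts OWNERS.inspect
puts TRUNCATED.inspect
```

If names with accented letters or punctuation can occur in the source data, a Unicode-aware class such as `\p{L}` may be worth considering; whether that is needed here is not something the thread establishes.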
ec982ef to 5d679f8
…ions, and citations in ArticleParser
…ncy to 2.0.0-alpha.7
No description provided.