2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -13,3 +13,5 @@
.rspec_status
.rubocop-https---raw-githubusercontent-com-riboseinc-oss-guides-master-ci-rubocop-yml
Gemfile.lock
rubocop-87c7cdd254a8d09d005ee06efac7acc0.yml
.claude/
2 changes: 1 addition & 1 deletion .rubocop.yml
@@ -7,6 +7,6 @@ require: rubocop-rails
inherit_from:
- https://raw.githubusercontent.com/riboseinc/oss-guides/master/ci/rubocop.yml
AllCops:
TargetRubyVersion: 2.7
TargetRubyVersion: 3.2
Rails:
Enabled: false
59 changes: 59 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,59 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What is relaton-xsf?

A Ruby gem for bibliographic retrieval of XMPP XEP (XMPP Extension Protocol) specifications. Part of the Relaton family of gems. Fetches data from https://xmpp.org/extensions/refs/ and the relaton-data-xsf GitHub repository.

## Commands

- `bundle exec rspec` — run all tests
- `bundle exec rspec spec/relaton/xsf/processor_spec.rb` — run a single spec file
- `bundle exec rubocop` — lint
- `bundle exec rubocop -a` — lint with auto-fix
- `bin/console` — interactive console with gem loaded

## Architecture

Namespace: `Relaton::Xsf` (under `lib/relaton/xsf/`). Branch `lutaml-integration` uses the new nested namespace (not the old `RelatonXsf`).

Key classes and their base classes from relaton-core:

| Class | Base | Role |
|---|---|---|
| `Processor` | `Relaton::Core::Processor` | Plugin entry point for relaton registry |
| `Bibliography` | Module (extends self) | Search & get interface (`search`, `get`) |
| `HitCollection` | `Relaton::Core::HitCollection` | Collection of search results |
| `Hit` | `Relaton::Core::Hit` | Single result; lazy-loads YAML from GitHub |
| `DataFetcher` | `Relaton::Core::DataFetcher` | Crawls xmpp.org, parses BibXML, saves docs |
| `Item` / `Bibitem` / `Bibdata` | `Relaton::Bib::Item` | Bibliographic item models (lutaml-model based) |

Data flow: `Processor#get` → `Bibliography.get` → `HitCollection.search` → `Hit#item` → fetches YAML → `Relaton::Bib::Item.from_yaml`
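
The lazy-loading step in this flow can be sketched with plain Ruby. This is an illustration only: `SketchHit`, `SketchHitCollection`, the `loader` callable, and the example.org URL are hypothetical stand-ins, and a Hash parsed from YAML takes the place of the real item model.

```ruby
require "yaml"

# Hypothetical stand-in for a hit: it holds an index row and loads its document lazily.
class SketchHit
  def initialize(row, loader)
    @row = row       # e.g. { id: "XEP 0001", url: "https://example.org/xep-0001.yaml" }
    @loader = loader # callable returning YAML text for a URL; replaces the HTTP request
  end

  # Parse the document on first access and reuse the cached result afterwards.
  def item
    @item ||= YAML.safe_load(@loader.call(@row[:url]))
  end
end

# Hypothetical stand-in for a hit collection: it filters index rows by reference.
class SketchHitCollection
  def initialize(ref, index_rows, loader)
    @ref = ref
    @index_rows = index_rows
    @loader = loader
  end

  def search
    @index_rows.select { |row| row[:id] == @ref }
               .map { |row| SketchHit.new(row, @loader) }
  end
end

fetches = 0
loader = lambda do |_url|
  fetches += 1
  "title: XMPP Extension Protocols\n"
end
rows = [{ id: "XEP 0001", url: "https://example.org/xep-0001.yaml" }]

hit = SketchHitCollection.new("XEP 0001", rows, loader).search.first
puts hit.item["title"]
hit.item # second call is served from the cache
puts "fetches: #{fetches}"
```

Searching only builds hit objects; the document is requested when `item` is first called, so a search that returns many hits does not trigger many downloads.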

DataFetcher flow: Crawls `https://xmpp.org/extensions/refs/`, parses each XML ref via `Relaton::Bib::Converter::BibXml.to_item`, sets `ext.flavor = "xsf"`, saves to disk.
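
The tagging and identifier-selection part of this flow can be sketched as follows. The `Sketch*` structs and the `tag_and_identify` helper are hypothetical stand-ins for the real models, used only so the example runs without the gem.

```ruby
# Hypothetical stand-ins for the bibliographic models.
SketchDocId = Struct.new(:content, :primary, keyword_init: true)
SketchExt = Struct.new(:flavor, keyword_init: true)
SketchItem = Struct.new(:docidentifier, :ext, keyword_init: true)

# Mark the item as an XSF document and return the identifier used to name
# its file: the primary docidentifier if one exists, otherwise the first.
def tag_and_identify(item)
  item.ext ||= SketchExt.new
  item.ext.flavor = "xsf"
  docid = item.docidentifier.detect(&:primary) || item.docidentifier.first
  docid&.content
end

item = SketchItem.new(
  docidentifier: [
    SketchDocId.new(content: "XEP-0001"),
    SketchDocId.new(content: "XEP 0001", primary: true),
  ],
)
puts tag_and_identify(item)
puts item.ext.flavor
```

An item without any docidentifier yields `nil`, which lets the caller skip it instead of writing a file with an empty name.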

Constants: `INDEXFILE = "index-v1"`, `GHDATA_URL` points to relaton-data-xsf `data-v2` branch.

## Testing

- **Index fixture:** `spec/fixtures/index-v1.zip` is pre-loaded into `Relaton::Index` pool in `before(:suite)` (configured in `spec/support/webmock.rb`). Run `rake spec:update_index` to refresh from relaton-data-xsf.
- RSpec with VCR cassettes (`spec/vcr_cassettes/`) for HTTP interactions
- WebMock disables all external network connections
- Fixtures in `spec/fixtures/` (item.yaml, bibdata.xml, bibitem.xml)
- Round-trip tests verify YAML→Item→YAML and XML→Item→XML fidelity
- `DataFetcher` is lazily required — specs that test it must `require "relaton/xsf/data_fetcher"` explicitly
- Same for `Processor` — `require "relaton/xsf/processor"`
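
The idea behind the round-trip tests can be sketched with the standard library alone. This is a minimal sketch, assuming a plain Hash as a stand-in for the item model; `round_trips?` and the inline fixture are hypothetical and not part of the spec suite.

```ruby
require "yaml"

# Parse the YAML, serialize the result, parse it again, and compare the two
# parsed structures. Comparing structures rather than raw text keeps the
# check independent of key order and quoting style.
def round_trips?(yaml_text)
  first = YAML.safe_load(yaml_text)
  second = YAML.safe_load(YAML.dump(first))
  first == second
end

fixture = <<~YAML
  docidentifier:
    - content: XEP 0001
      type: XEP
      primary: true
  title:
    - content: XMPP Extension Protocols
      language: en
YAML

puts round_trips?(fixture)
```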

## Key dependencies

- `relaton-core` — abstract base classes (Processor, HitCollection, Hit, DataFetcher)
- `relaton-bib` — bibliographic models, XML/YAML serialization (lutaml-model based)
- `relaton-index` — index management for quick document lookups
- `mechanize` — HTTP fetching and HTML parsing

## Style

- RuboCop with relaton shared config (inherits from riboseinc/oss-guides)
- Target Ruby version: 3.2
- Logging via `Relaton::Xsf::Util` (extends `Relaton::Bib::Util`, PROGNAME = "relaton-xsf")
66 changes: 38 additions & 28 deletions README.adoc
@@ -1,4 +1,4 @@
= Relaton-XSF: bibliographic retrieval of XMPP XEP specifications
= Relaton::XSF: bibliographic retrieval of XMPP XEP specifications

image:https://img.shields.io/gem/v/relaton-xsf.svg["Gem Version", link="https://rubygems.org/gems/relaton-xsf"]
image:https://github.com/relaton/relaton-xsf/workflows/macos/badge.svg["Build Status (macOS)", link="https://github.com/relaton/relaton-xsf/actions?workflow=macos"]
@@ -42,81 +42,89 @@ Or install it yourself as:

[source,ruby]
----
require 'relaton_xsf'
require 'relaton/xsf'
=> true

hit_collection = RelatonXsf::Bibliography.search("XEP 0001")
=> <RelatonXsf::HitCollection:0x00000000019780 @ref=XEP 0001 @fetched=false>
hit_collection = Relaton::Xsf::Bibliography.search("XEP 0001")
=> <Relaton::Xsf::HitCollection:0x000000000013c0 @ref=XEP 0001 @fetched=false>

item = hit_collection[0].fetch
=> #<RelatonXsf::BibliographicItem:0x000000011167a518
item = hit_collection[0].item
=> #<Relaton::Bib::ItemData:0x0000000124d510b8
...
----

=== XML serialization
[source,ruby]
----
item.to_xml
=> "<bibitem id="XEP0001" type="standard" schema-version="v1.2.9">
<fetched>2023-07-18</fetched>
<title format="text/plain" language="en" script="Latn">XMPP Extension Protocols</title>
=> "<bibitem id="XEP0001" type="standard" schema-version="v1.4.1">
<fetched>2026-03-04</fetched>
<title language="en" script="Latn">XMPP Extension Protocols</title>
<uri type="src">http://xmpp.org/extensions/xep-0001.html</uri>
<uri type="HTML">http://xmpp.org/extensions/xep-0001.html</uri>
<docidentifier type="XEP" primary="true">XEP 0001</docidentifier>
...
</bibitem>"
----
With the argument `bibdata: true`, it outputs XML wrapped in a `bibdata` element and adds the flavor-specific `ext` element.
[source,ruby]
----
item.to_xml bibdata: true
=> "<bibdata type="standard" schema-version="v1.2.9">
<fetched>2023-07-18</fetched>
<title format="text/plain" language="en" script="Latn">XMPP Extension Protocols</title>
=> "<bibdata type="standard" schema-version="v1.4.1">
<fetched>2026-03-04</fetched>
<title language="en" script="Latn">XMPP Extension Protocols</title>
<uri type="src">http://xmpp.org/extensions/xep-0001.html</uri>
<uri type="HTML">http://xmpp.org/extensions/xep-0001.html</uri>
<docidentifier type="XEP" primary="true">XEP 0001</docidentifier>
...
<ext>
<doctype>rfc</doctype>
<flavor>xsf</flavor>
</ext>
</bibdata>"
----

=== Get document by reference
[source,ruby]
----
item = RelatonXsf::Bibliography.get "XEP 0001"
[relaton-xsf] (XEP 0001) Fetching from Relaton repository ...
[relaton-xsf] (XEP 0001) Found `XEP 0001`
=> #<RelatonXsf::BibliographicItem:0x000000011275cd18
item = Relaton::Xsf::Bibliography.get "XEP 0001"
[relaton-xsf] INFO: (XEP 0001) Fetching from Relaton repository ...
[relaton-xsf] INFO: (XEP 0001) Found: `XEP 0001`
=> #<Relaton::Bib::ItemData:0x0000000125036f58
...

item.docidentifier.first.id
item.docidentifier.first.content
=> "XEP 0001"
----

=== Typed links
=== Typed source links

XSF publications have `src` type link.
XSF publications have a source link of type `src`.

[source,ruby]
----
item.link
=> [#<RelatonBib::TypedUri:0x0000000113ad5ca0
@content=#<Addressable::URI:0xcc24 URI:http://xmpp.org/extensions/xep-0001.html>,
@language=nil,
@script=nil,
@type="src">]
item.source[0].type
=> "src"

item.source[0].content
=> "http://xmpp.org/extensions/xep-0001.html"
----

=== Fetch data

This gem uses the https://xmpp.org/extensions/refs/ dataset as a data source.

The method `RelatonXsf::DataFetcher.fetch(output: "data", format: "yaml")` fetches all the documents from the dataset and saves them to the `./data` folder in YAML format.
The method `Relaton::Xsf::DataFetcher.fetch(output: "data", format: "yaml")` fetches all the documents from the dataset and saves them to the `./data` folder in YAML format.
Arguments:

- `output` - folder to save documents (default './data').
- `format` - the format in which the documents are saved. Possible formats are: `yaml`, `xml`, `bibxml` (default `yaml`).
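
One way to picture how the `format` argument selects a serializer is a lookup from format name to method name. This is a sketch under stated assumptions: `SKETCH_SERIALIZERS` and `serializer_for` are hypothetical, and the actual dispatch inside the base `DataFetcher` class is not shown in this document.

```ruby
# Hypothetical mapping from a format name to the serializer method
# that a data fetcher would call for that format.
SKETCH_SERIALIZERS = {
  "yaml" => :to_yaml,
  "xml" => :to_xml,
  "bibxml" => :to_bibxml,
}.freeze

# Return the serializer method name, or fail early on an unknown format.
def serializer_for(format)
  SKETCH_SERIALIZERS.fetch(format) do
    raise ArgumentError, "unsupported format: #{format}"
  end
end

puts serializer_for("yaml")
puts serializer_for("bibxml")
```

Failing early on an unknown name surfaces a misspelled format before any documents are downloaded.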

[source,ruby]
----
RelatonXsf::DataFetcher.fetch
require 'relaton/xsf/data_fetcher'

Relaton::Xsf::DataFetcher.fetch
Started at: 2021-09-01 18:01:01 +0200
Stopped at: 2021-09-01 18:01:43 +0200
Done in: 42 sec.
@@ -125,12 +133,14 @@ Done in: 42 sec.

=== Logging

RelatonXsf uses the relaton-logger gem for logging. By default, it logs to STDOUT. To change the log levels and add other loggers, read the https://github.com/relaton/relaton-logger#usage[relaton-logger] documentation.
Relaton::Xsf uses the relaton-logger gem for logging. By default, it logs to STDOUT. To change the log levels and add other loggers, read the https://github.com/relaton/relaton-logger#usage[relaton-logger] documentation.

== Development

After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.

To update the index test fixture (used by tests), run `rake spec:update_index`. This downloads the latest `index-v1.zip` from the https://github.com/relaton/relaton-data-xsf[relaton-data-xsf] repository.

To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to https://rubygems.org[rubygems.org].

== Contributing
22 changes: 22 additions & 0 deletions Rakefile
@@ -10,3 +10,25 @@ require "rubocop/rake_task"
RuboCop::RakeTask.new

task default: %i[spec]

namespace :spec do
  desc "Download latest XSF index fixture from relaton-data-xsf"
  task :update_index do
    require "net/http"
    require "uri"

    url = "https://raw.githubusercontent.com/relaton/relaton-data-xsf/data-v2/index-v1.zip"
    dest = File.join(__dir__, "spec", "fixtures", "index-v1.zip")

    puts "Downloading #{url} ..."
    uri = URI.parse(url)
    response = Net::HTTP.get_response(uri)

    if response.is_a?(Net::HTTPSuccess)
      File.binwrite(dest, response.body)
      puts "Updated #{dest} (#{response.body.bytesize} bytes)"
    else
      abort "Failed to download: HTTP #{response.code}"
    end
  end
end
2 changes: 1 addition & 1 deletion bin/console
@@ -2,7 +2,7 @@
# frozen_string_literal: true

require "bundler/setup"
require "relaton_xsf"
require "relaton/xsf"

# You can add fixtures and/or initialization code here to make experimenting
# with your gem easier. You can also use a different console, if you like.
29 changes: 29 additions & 0 deletions lib/relaton/xsf.rb
@@ -0,0 +1,29 @@
# frozen_string_literal: true

require "mechanize"
require "relaton/index"
require "relaton/bib"
require_relative "xsf/version"
require_relative "xsf/util"
require_relative "xsf/item"
require_relative "xsf/bibitem"
require_relative "xsf/bibdata"
require_relative "xsf/hit"
require_relative "xsf/hit_collection"
require_relative "xsf/bibliography"

module Relaton
  module Xsf
    INDEXFILE = "index-v1"

    class Error < StandardError; end

    # Your code goes here...
    def self.grammar_hash
      # gem_path = File.expand_path "..", __dir__
      # grammars_path = File.join gem_path, "grammars", "*"
      # grammars = Dir[grammars_path].sort.map { |gp| File.read gp }.join
      Digest::MD5.hexdigest Relaton::Bib::VERSION # grammars
    end
  end
end
7 changes: 7 additions & 0 deletions lib/relaton/xsf/bibdata.rb
@@ -0,0 +1,7 @@
module Relaton
  module Xsf
    class Bibdata < Item
      include Bib::BibdataShared
    end
  end
end
7 changes: 7 additions & 0 deletions lib/relaton/xsf/bibitem.rb
@@ -0,0 +1,7 @@
module Relaton
  module Xsf
    class Bibitem < Item
      include Bib::BibitemShared
    end
  end
end
24 changes: 24 additions & 0 deletions lib/relaton/xsf/bibliography.rb
@@ -0,0 +1,24 @@
module Relaton
  module Xsf
    module Bibliography
      extend self

      def search(ref)
        HitCollection.new(ref).search
      end

      def get(code, _year = nil, _opts = {})
        Util.info "Fetching from Relaton repository ...", key: code
        result = search(code)
        if result.empty?
          Util.info "Not found.", key: code
          return
        end

        bib = result.first.item
        Util.info "Found: `#{bib.docidentifier.first.content}`", key: code
        bib
      end
    end
  end
end
56 changes: 56 additions & 0 deletions lib/relaton/xsf/data_fetcher.rb
@@ -0,0 +1,56 @@
require "relaton/core"

module Relaton
  module Xsf
    class DataFetcher < Relaton::Core::DataFetcher
      def index
        @index ||= Relaton::Index.find_or_create :xsf, file: "#{INDEXFILE}.yaml"
      end

      def fetch(_source = nil)
        agent = Mechanize.new
        resp = agent.get "https://xmpp.org/extensions/refs/"
        resp.xpath("//a[contains(@href, 'XEP-')]").each do |link|
          doc = agent.get link[:href]
          bib = Relaton::Bib::Converter::BibXml.to_item doc.body
          save_doc bib
        rescue StandardError => e
          Util.warn "Failed to parse #{link[:href]}: #{e.message}"
        end
        index.save
      end

      def save_doc(bib)
        return unless bib

        bib.ext ||= Relaton::Bib::Ext.new
        bib.ext.flavor = "xsf"

        docid = bib.docidentifier.detect(&:primary) || bib.docidentifier.first
        id = docid&.content
        return unless id

        file = output_file id
        if @files.include? file
          Util.warn "File #{file} already exists"
        else
          @files << file
        end
        File.write file, serialize(bib), encoding: "UTF-8"
        index.add_or_update id, file
      end

      def to_yaml(bib)
        bib.to_yaml
      end

      def to_xml(bib)
        bib.to_xml bibdata: true
      end

      def to_bibxml(bib)
        bib.to_rfcxml
      end
    end
  end
end
17 changes: 17 additions & 0 deletions lib/relaton/xsf/hit.rb
@@ -0,0 +1,17 @@
module Relaton
  module Xsf
    class Hit < Relaton::Core::Hit
      def item
        return @doc if @doc

        agent = Mechanize.new
        resp = agent.get hit[:url]
        hash = YAML.safe_load resp.body
        hash["fetched"] = Date.today.to_s
        @doc = Relaton::Bib::Item.from_yaml hash.to_yaml
      rescue StandardError => e
        raise Relaton::RequestError, e.message
      end
    end
  end
end