Overview
We have developed the html_to_docbook() function in the R package movepub, which converts HTML-formatted text to DocBook markup. We propose to implement this functionality directly within the EML package.
Rationale
The movepub package streamlines the publication of animal tracking data from Movebank to the Global Biodiversity Information Facility (GBIF). When converting Movebank metadata to EML with movepub::write_eml(), we aim to support rich text formatting (e.g., bold, hyperlinks) in dataset descriptions.
The only consistent way to provide rich text that:
- passes EML validation,
- is accepted by the Integrated Publishing Toolkit (IPT), and
- displays correctly on GBIF.org
is to follow the EML specification for <para> elements, using a subset of DocBook syntax.
While EML::set_TextType() addresses this partially, it is difficult to use and only works for external files—not for inline text strings.
See related discussion and evaluation of alternatives in movepub issue #101.
Proposed Solution
After reviewing several options, we implemented a custom converter for movepub that transforms HTML syntax into DocBook, splitting paragraphs and headers into separate elements.
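To make the idea concrete, here is a minimal base R sketch of this kind of conversion. It is not the movepub implementation: the function name html_to_docbook_sketch() is hypothetical, it only handles a handful of tags (b, strong, i, em, ul, li, p), and it assumes well-formed, non-nested input without attributes.

```r
# Hypothetical illustration only; NOT the code used in movepub.
html_to_docbook_sketch <- function(html) {
  # Map simple inline tags to DocBook <emphasis>.
  x <- gsub("<(/?)(b|strong|i|em)>", "<\\1emphasis>", html, perl = TRUE)
  # Map unordered lists to <itemizedlist> with <listitem><para> children.
  x <- gsub("<(/?)ul>", "<\\1itemizedlist>", x, perl = TRUE)
  x <- gsub("<li>", "<listitem><para>", x, fixed = TRUE)
  x <- gsub("</li>", "</para></listitem>", x, fixed = TRUE)
  # Treat <p>, </p> and newlines as paragraph breaks
  # (the pattern does not match <para> or </para>).
  parts <- strsplit(x, "</?p>|\\n+", perl = TRUE)[[1]]
  parts <- trimws(parts)
  # Drop empty fragments left between adjacent breaks.
  parts[nzchar(parts)]
}

html <- "This is <b>bold</b>.\nParagraph 1\n\nParagraph 2<p></p>What follows is a list: <ul><li>Item 1</li><li>Item 2</li></ul>"
print(html_to_docbook_sketch(html))
```

A regex-based approach like this is easy to read but fragile; a production converter would more likely parse the HTML properly and handle tags with attributes, such as hyperlinks.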
Benefits of integrating this into EML:
- It is a better fit for EML than for movepub, because it is specifically designed for EML <para> elements.
- More users can prepare EML with rich text descriptions using familiar HTML syntax.
- Ensures interoperability with IPT and GBIF.org.
- Reduces duplicated effort across EML-related packages.
Implementation reference:
Reproducible example
library(movepub)
html <- "This is <b>bold</b>.\nParagraph 1\n\nParagraph 2<p></p>What follows is a list: <ul><li>Item 1</li><li>Item 2</li></ul>"
html_to_docbook(html)
#> [1] "This is <emphasis>bold</emphasis>."
#> [2] "Paragraph 1"
#> [3] "Paragraph 2"
#> [4] "What follows is a list: <itemizedlist><listitem><para>Item 1</para></listitem><listitem><para>Item 2</para></listitem></itemizedlist>"
Created on 2025-10-15 with reprex v2.1.1
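As a rough illustration of how such a character vector could end up in EML, each element can be wrapped in its own <para> element. The snippet below uses plain string concatenation and an <abstract> parent as assumptions for the sake of the example; it does not show how movepub::write_eml() or the EML package actually serialize text.

```r
# Illustrative only: wrap each converted paragraph in an EML <para> element.
paragraphs <- c(
  "This is <emphasis>bold</emphasis>.",
  "Paragraph 1"
)

# One <para> per vector element.
para_xml <- paste0("<para>", paragraphs, "</para>")

# Assumed parent element for this example: <abstract>.
abstract_xml <- paste0("<abstract>", paste(para_xml, collapse = ""), "</abstract>")
cat(abstract_xml, "\n")
```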
Next Steps
- Would the maintainers be receptive to integrating this converter?
- We can provide a PR or collaborate on adapting the code to EML’s conventions and requirements.