Releases: slub/mets-mods2tei

v0.1.6.post2

25 Sep 22:43
f899803

fix version

v0.1.6.post1

25 Sep 22:14

re-allow py38

v0.1.6

25 Sep 21:47
82c13b0

Added

  • CI/CD via GitHub Actions, ht @rettinghaus #77
  • type hints and PEP257 conformity, ht @rettinghaus #75
  • support file: URIs for local FLocats (as from OCR)
  • add test covering METS without logical sub-divs, and only local ALTO (as from OCR)
  • add test covering remote METS
  • start mapping from DFG Strukturdatenset mets:div/@TYPE to DTABf tei:div/@type
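
As a rough illustration of the file: URI item above, here is a minimal sketch of how an FLocat href could be resolved to a local path. The function name `local_path_from_href` is hypothetical and not taken from mets-mods2tei; the sketch assumes POSIX-style paths and relies only on the Python standard library.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def local_path_from_href(href):
    """Return a local Path for file: URIs and bare paths, else None.

    Hypothetical helper for illustration; not the project's actual code.
    """
    parsed = urlparse(href)
    if parsed.scheme == "file":
        # url2pathname undoes percent-encoding (e.g. %20 -> space)
        return Path(url2pathname(parsed.path))
    if parsed.scheme == "":
        # no scheme at all: treat the href as a plain (relative) path
        return Path(href)
    # anything else (http, https, ...) is remote and must be downloaded
    return None
```

A caller could use the `None` result to decide whether the ALTO file has to be fetched over the network or can be read directly from disk.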

Changed

  • update CI/CD via CircleCI rules

Fixed

  • timeout and retry when downloading ALTO files
  • downgrade loglevel error→warning if MODS has no encodingDesc
  • better differentiate between front/body/back (based on mets:div/@TYPE rules)
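
The timeout-and-retry fix above could look roughly like the following sketch. It is an assumption about the general technique, not the project's implementation: the name `fetch_with_retry`, its parameters, and the exponential backoff are all illustrative, and only standard-library calls are used.

```python
import time
import urllib.request


def fetch_with_retry(url, retries=3, timeout=10.0, backoff=0.5,
                     opener=urllib.request.urlopen):
    """Download url, retrying on network errors (illustrative sketch).

    `opener` is injectable so the retry logic can be exercised offline.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as err:
            # URLError, HTTPError and socket timeouts all derive from OSError
            last_error = err
            if attempt < retries - 1:
                # wait longer after each failed attempt
                time.sleep(backoff * 2 ** attempt)
    raise last_error
```

Catching `OSError` keeps the sketch short; a stricter version might retry only on timeouts and 5xx responses and fail fast on a 404.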

v0.1.5

03 Feb 11:11
49104dc

Added

  • Cover flat logical structMap with only top-level @ADMID="AMD" div
    (i.e. no structure / headings): fall back to physical structMap, too
  • Increase test coverage (OCR, only physical structMap)

Changed

  • replaced python-Levenshtein with rapidfuzz dependency
  • improved pretty-printing of lb (one newline and indentation per line)
  • 🔥 add-refs page: use top-level facsimile for pb/@facs page URLs instead of pb/@corresp, #69
  • 🔥 add-refs page: use mods:identifier[@type="PURL"] (presentation) for pb/@corresp links, #69
  • 🔥 add-refs line: use lb/@n instead of lb/@corresp for line IDs, #69

Fixed

  • adapted to click.File change (CLI option for output file)
  • adapted to pkg_resources deprecation

v0.1.4

12 Dec 21:58
6602f9a

Changed

  • mm-update: adapt to OCR-D API changes

v0.1.3

11 Feb 19:36

Added

  • mm2tei CLI param controlling page and line refs via @corresp
  • mm-update CLI

v0.1.2

10 Jan 12:45
69665d9

Added

  • tests for TEI API
  • tests for insertion index identification
  • more logging
  • CLI param for output file
  • CLI param for image fileGrp

Changed

  • Add front, body and back per default
  • Log to stderr instead of stdout
  • Differentiate between (physical) image nr and (logical) page nr

Fixed

  • Evaluate texts from all struct types but binding and colour_checker, #43
  • Handle errors during language code expansion, and fall back to Unbekannt, #47
  • Add ALTO HYP text content if available, #52
  • Allow empty logical structMap and structLink, fall back to physical, or empty, #57
  • Allow partial dmdSec (MODS) or amdSec, fall back to empty, #46, #51
  • Pass all mods:identifiers to msIdentifier/idno (not just VD and URN)
  • Parse full titleInfo (main/sub/part/volume), and re-use in biblFull
  • Prefer titleInfo/title over div/@LABEL if available
  • Map top logical div/@TYPE into allowed biblFull/title/@level only
  • Map top logical div/@TYPE into appropriate bibl/@type if possible

v0.1.1

10 Jan 12:38
21e8bd0

Added

  • Treat nested AMD-type (non-logical) divs in logical struct map (i.e. newspaper case)
  • Make full text file group selectable by user
  • Allow for file entries (in addition to URLs) in METS
  • Add special treatment for URNs and VD IDs
  • Add poor man's namespace versioning handling

Changed

  • Make extraction of subtitles conditional on their presence
  • Use "licence" for all types of licences (even unknown ones)

Fixed

Working TEI/text serialization

04 Dec 17:32
ecde7db

With this version, the <text> part of the TEI file gets populated. The div structure from the METS file is carried over to the TEI and, optionally, attached OCR in ALTO format is added to the individual divs as defined by the METS logical structMap.

Initial Release

31 Jul 11:57
4585111

A complete TEI header is created from METS/MODS files. Tested with multiple examples but not yet systematically.