Releases: slub/mets-mods2tei
v0.1.6.post2
fix version
v0.1.6.post1
re-allow py38
v0.1.6
Added
- CI/CD via GitHub Actions, ht @rettinghaus #77
- type hints and PEP257 conformity, ht @rettinghaus #75
- support file: URIs for local FLocats (as from OCR)
- add test covering METS without logical sub-divs, and only local ALTO (as from OCR)
- add test covering remote METS
- start mapping from DFG Strukturdatenset mets:div/@TYPE to DTABf tei:div/@type
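The file: URI support for local FLocats can be illustrated with a minimal sketch. The function name `flocat_to_local_path` is hypothetical and not taken from the project; it only shows one way to tell a local reference from a remote one using the standard library.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def flocat_to_local_path(href):
    """Return a local Path for a file: URI or plain path, or None for remote URLs.

    Hypothetical helper for illustration; not part of the mets-mods2tei API.
    """
    parsed = urlparse(href)
    if parsed.scheme == "file":
        # url2pathname undoes percent-encoding and handles platform specifics
        return Path(url2pathname(parsed.path))
    if parsed.scheme in ("http", "https"):
        # remote reference: the caller has to download it instead
        return None
    # no recognised scheme: treat the value as a plain file system path
    return Path(href)
```

A caller could read the ALTO file directly when a path comes back, and fall back to downloading when the result is `None`.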
Changed
- update CI/CD via CircleCI rules
Fixed
- timeout and retry when downloading ALTO files
- downgrade loglevel error→warning if MODS has no encodingDesc
- better differentiate between front/body/back (based on mets:div/@TYPE rules)
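As an illustration of the timeout-and-retry fix for ALTO downloads, here is a rough sketch using only the standard library. The function name, its parameters, and the linear backoff are assumptions made for this example; the project's actual implementation may differ.

```python
import time
import urllib.request


def fetch_with_retry(url, timeout=10.0, retries=3, backoff=1.0,
                     opener=urllib.request.urlopen):
    """Download url, retrying on network errors (hypothetical helper).

    `opener` is injectable so the retry logic can be exercised offline.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as err:
            # URLError and socket timeouts are both OSError subclasses
            last_error = err
            if attempt < retries - 1:
                # wait a little longer after each failed attempt
                time.sleep(backoff * (attempt + 1))
    raise last_error
```

After the last failed attempt the original error is re-raised, so the caller decides whether a missing ALTO file is fatal or only worth a warning.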
v0.1.5
Added
- Cover flat logical structMap with only top-level @ADMID="AMD" div (i.e. no structure / headings): fall back to physical structMap, too
- Increase test coverage (OCR, only physical structMap)
Changed
- replaced python-Levenshtein with rapidfuzz dependency
- improved pretty-printing of lb (one newline and indentation per line)
- 🔥 add-refs page: use top-level facsimile for pb/@facs page URLs instead of pb/@corresp, #69
- 🔥 add-refs page: use mods:identifier[@type="PURL"] (presentation) for pb/@corresp links, #69
- 🔥 add-refs line: use lb/@n instead of lb/@corresp for line IDs, #69
Fixed
- adapted to click.File change (CLI option for output file)
- adapted to pkg_resources deprecation
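The notes do not say what replaced pkg_resources. One common standard-library substitute for version lookups is `importlib.metadata`, sketched below purely as an assumption about how such a migration can look. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution, default="unknown"):
    """Look up an installed distribution's version without pkg_resources.

    Hypothetical helper; returns `default` if the distribution is not installed.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return default
```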
v0.1.4
v0.1.3
v0.1.2
Added
- tests for TEI API
- tests for insertion index identification
- more logging
- CLI param for output file
- CLI param for image fileGrp
Changed
- Add front, body and back per default
- Log to stderr instead of stdout
- Differentiate between (physical) image nr and (logical) page nr
Fixed
- Evaluate texts from all struct types but binding and colour_checker, #43
- Handle errors during language code expansion, and fallback to Unbekannt, #47
- Add ALTO HYP text content if available, #52
- Allow empty logical structMap and structLink, fallback to physical, or empty, #57
- Allow partial dmdSec (MODS) or amdSec, fallback to empty, #46, #51
- Pass all mods:identifiers to msIdentifier/idno (not just VD and URN)
- Parse full titleInfo (main/sub/part/volume), and re-use in biblFull
- Prefer titleInfo/title over div/@LABEL if available
- Map top logical div/@TYPE into allowed biblFull/title/@level only
- Map top logical div/@TYPE into appropriate bibl/@type if possible
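The fallback to Unbekannt during language code expansion can be sketched as follows. The lookup table and the function name are hypothetical and only stand in for whatever language database the project actually consults.

```python
# Hypothetical lookup table; the real project may use a language database instead.
LANGUAGE_NAMES = {
    "ger": "Deutsch",
    "lat": "Latein",
    "fre": "Französisch",
}


def expand_language_code(code, default="Unbekannt"):
    """Expand a language code to a display name, falling back to `default`.

    Missing, empty, or unrecognised codes never raise; they yield the fallback.
    """
    if not code:
        return default
    return LANGUAGE_NAMES.get(code.strip().lower(), default)
```

Returning a fixed fallback keeps header generation going even when the MODS record carries a missing or malformed language code.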
v0.1.1
Added
- Treat nested AMD-type (non-logical) divs in logical struct map (i.e. newspaper case)
- Make full text file group selectable by user
- Allow for file entries (in addition to URLs) in METS
- Add special treatment for URNs and VD IDs
- Add poor man's namespace versioning handling
Changed
- Make extraction of subtitles conditional on their presence
- Use "licence" for all types of licences (even unknown ones)
Fixed
Working TEI/text serialization
With this version, the <text> part of the TEI file gets populated. The div structure from the METS file is carried over to the TEI and, optionally, attached OCR in ALTO format is added to the individual divs as defined by the METS logical struct map.
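Carrying the div structure from METS over to TEI can be pictured with a small recursive sketch based on `xml.etree.ElementTree`. The function name and the choice to copy @TYPE and @LABEL are assumptions for illustration only, not the project's actual code.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
TEI_NS = "http://www.tei-c.org/ns/1.0"


def mets_divs_to_tei(mets_parent, tei_parent):
    """Recursively mirror nested mets:div elements as tei:div elements.

    Hypothetical helper: copies @TYPE to @type and turns @LABEL into a head.
    """
    for mets_div in mets_parent.findall(f"{{{METS_NS}}}div"):
        tei_div = ET.SubElement(tei_parent, f"{{{TEI_NS}}}div")
        div_type = mets_div.get("TYPE")
        if div_type:
            tei_div.set("type", div_type)
        label = mets_div.get("LABEL")
        if label:
            # the heading goes first, before any nested divs
            head = ET.SubElement(tei_div, f"{{{TEI_NS}}}head")
            head.text = label
        mets_divs_to_tei(mets_div, tei_div)
```

Called with the logical structMap element and an empty TEI body element, this produces a nested div skeleton to which OCR text could then be attached.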
Initial Release
A complete TEI header is created from METS/MODS files. Tested with multiple examples but not yet systematically.