usfm3 is a Python parser for USFM 3.x.
It turns USFM into Python-friendly outputs:
- to_usj(): USJ as a dict
- to_usx(): USX as an XML str
- to_usfm(): normalized USFM as a str
- to_vref(): a verse-text map like {"GEN 1:1": "In the beginning..."}
The parser is error-tolerant, so malformed input still produces a parse result with structured diagnostics.
Built in Rust for speed, with native Python bindings via PyO3.
pip install usfm3

Requires Python 3.9+.
import usfm3
text = r"""\id GEN
\c 1
\p
\v 1 In the beginning God created the heavens and the earth.
"""
result = usfm3.parse(text)
print(result.to_vref()["GEN 1:1"])
for diagnostic in result.diagnostics:
    print(
        f"[{diagnostic.severity}] {diagnostic.code}: "
        f"{diagnostic.message} ({diagnostic.start}..{diagnostic.end})"
    )
usj = result.to_usj()
usx = result.to_usx()
normalized_usfm = result.to_usfm()

parse() runs semantic validation by default, so diagnostics can include issues such as chapter and verse sequencing, invalid attributes, or mismatched milestones.
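When validation produces many diagnostics, a per-severity summary can be easier to read than the full list. The sketch below is a minimal illustration, not part of usfm3: severity_counts and only are hypothetical helper names, and the sample objects are hand-built stand-ins that carry only the severity, code, and message fields. The code "HypotheticalCode" is made up for the example.

```python
from collections import Counter
from types import SimpleNamespace


def severity_counts(diagnostics) -> Counter:
    """Count diagnostics per severity ("error", "warning", "info")."""
    return Counter(d.severity for d in diagnostics)


def only(diagnostics, severity: str) -> list:
    """Return the diagnostics that have exactly the given severity."""
    return [d for d in diagnostics if d.severity == severity]


# Stand-in objects; in real use, pass result.diagnostics instead.
sample = [
    SimpleNamespace(severity="error", code="UnknownMarker", message="stand-in message"),
    SimpleNamespace(severity="warning", code="HypotheticalCode", message="stand-in message"),
    SimpleNamespace(severity="warning", code="HypotheticalCode", message="stand-in message"),
]

counts = severity_counts(sample)
errors = only(sample, "error")
print(dict(counts))
print([d.code for d in errors])
```

With a real parse result, the same helpers would be called as severity_counts(result.diagnostics).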
If you only want parsing, disable validation:
result = usfm3.parse(text, validate=False)

parse() parses a USFM string and returns a ParseResult.

- to_usj() -> dict
- to_usx() -> str
- to_usfm() -> str
- to_vref() -> dict[str, str]
- has_errors() -> bool
- diagnostics -> list[Diagnostic]
Each diagnostic has:
- severity: "error", "warning", or "info"
- code: machine-readable code such as "UnknownMarker"
- message: human-readable message
- start
- end
start and end are byte offsets into the original source.
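Because the offsets count bytes rather than characters, slicing the Python str directly can point at the wrong place once the text contains non-ASCII characters. The sketch below shows one way to turn an offset into a line and column and to extract the flagged span. It assumes the offsets index the UTF-8 encoding of the source; locate and snippet are hypothetical helpers, and the offsets are computed by hand here rather than taken from a real diagnostic.

```python
def locate(source: str, byte_offset: int) -> tuple[int, int]:
    """Convert a UTF-8 byte offset into a 1-based (line, column) pair.

    The column counts characters, not bytes.
    """
    prefix = source.encode("utf-8")[:byte_offset].decode("utf-8", errors="ignore")
    line = prefix.count("\n") + 1
    column = len(prefix) - (prefix.rfind("\n") + 1) + 1
    return line, column


def snippet(source: str, start: int, end: int) -> str:
    """Return the text covered by the byte range start..end."""
    return source.encode("utf-8")[start:end].decode("utf-8", errors="replace")


# Sample text with multi-byte characters before the span of interest.
source = "\\id GEN\n\\c 1\n\\p\n\\v 1 λογος \\zz text\n"

# Stand-in for diagnostic.start and diagnostic.end (assumed values).
start = source.encode("utf-8").index(b"\\zz")
end = start + len(b"\\zz")

position = locate(source, start)
flagged = snippet(source, start, end)
print(position, flagged)
```

In real use, start and end would come from each entry in result.diagnostics.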
- to_vref() returns plain verse text keyed by references such as "GEN 1:1".
- to_usfm() returns normalized USFM, so whitespace may be regularized.
- Invalid USFM is reported through diagnostics; parse() still returns a result.
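Since to_vref() returns an ordinary dict, the usual dictionary tools apply to it. As a minimal sketch, the code below groups verses by book and chapter by splitting keys shaped like "GEN 1:1". The helpers split_ref and group_by_chapter are hypothetical, and the dict is a hand-written stand-in for a real to_vref() result. Chapter and verse are kept as strings, on the assumption that some keys may not be plain integers.

```python
from collections import defaultdict


def split_ref(ref: str) -> tuple[str, str, str]:
    """Split "GEN 1:1" into ("GEN", "1", "1")."""
    book, _, chapter_verse = ref.partition(" ")
    chapter, _, verse = chapter_verse.partition(":")
    return book, chapter, verse


def group_by_chapter(vref: dict[str, str]) -> dict[tuple[str, str], dict[str, str]]:
    """Group verse text under (book, chapter) keys, preserving insertion order."""
    chapters = defaultdict(dict)
    for ref, text in vref.items():
        book, chapter, verse = split_ref(ref)
        chapters[(book, chapter)][verse] = text
    return dict(chapters)


# Stand-in for result.to_vref(); the texts are placeholders.
vref = {
    "GEN 1:1": "In the beginning...",
    "GEN 1:2": "(placeholder text)",
    "GEN 2:1": "(placeholder text)",
}

grouped = group_by_chapter(vref)
for (book, chapter), verses in grouped.items():
    print(book, chapter, len(verses))
```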
- Rust crate: crates.io/crates/usfm3
- JavaScript/TypeScript package: npmjs.com/package/usfm3
- Source code: github.com/jcuenod/usfm3
License: MIT