An error-tolerant USFM 3.x parser written in Rust. Outputs USJ (JSON), USX (XML), normalized USFM, and vref format (a key-value map of verse references to text).
Available as a Rust library, CLI tool, Python package, and WebAssembly module.
- Parses all USFM 3.x markers including tables, milestones, sidebars, figures, and nested character styles
- Error-tolerant: always produces a document tree, even from malformed input
- Structured diagnostics with source locations, severity levels, and machine-readable codes
- Semantic validation (chapter/verse sequence, attribute rules, milestone pairing, etc.)
- Multiple output formats: USJ, USX, USFM, and verse-reference maps
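Each diagnostic described above pairs a message with a source span, a severity, and a machine-readable code. The sketch below is a hypothetical, std-only model of such a record, for illustration only. The `Severity`, `Diagnostic`, `has_errors`, and `snippet` names are assumptions that mirror the `severity`, `message`, `start`, `end`, and `code` fields exposed by the Python and WebAssembly bindings. They are not the `usfm3` crate's actual Rust types. The sketch also assumes `start`/`end` are byte offsets, which would match the lexer's byte-offset spans.

```rust
use std::fmt;

/// Hypothetical severity levels (the real crate may define more or different ones).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Warning,
    Error,
}

/// Hypothetical diagnostic: a machine-readable code, a message, and a byte span.
#[derive(Debug, Clone)]
struct Diagnostic {
    severity: Severity,
    code: &'static str, // e.g. "UnknownMarker" or "ImplicitClose"
    message: String,
    start: usize, // byte offset where the problem starts (inclusive)
    end: usize,   // byte offset where it ends (exclusive)
}

impl fmt::Display for Diagnostic {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "[{:?}] {} ({}..{}) {}",
            self.severity, self.message, self.start, self.end, self.code
        )
    }
}

/// True if any diagnostic is an error (compare `has_errors()` in the Python bindings).
fn has_errors(diags: &[Diagnostic]) -> bool {
    diags.iter().any(|d| d.severity == Severity::Error)
}

/// The source text a diagnostic points at, or None if the span does not fit `source`.
fn snippet<'a>(source: &'a str, d: &Diagnostic) -> Option<&'a str> {
    source.get(d.start..d.end)
}

fn main() {
    let source = "\\id GEN\n\\c 1\n\\p\n\\zz x\n";
    let diags = vec![Diagnostic {
        severity: Severity::Warning,
        code: "UnknownMarker",
        message: "unknown marker".to_string(),
        start: 16,
        end: 19,
    }];
    for d in &diags {
        eprintln!("{}: {:?}", d, snippet(source, d));
    }
    println!("has errors: {}", has_errors(&diags));
}
```

Keeping byte offsets rather than line/column pairs keeps the record small. A caller can still slice the original source to show the offending text.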
| Crate | Description |
|---|---|
| `usfm3` | Core Rust library |
| `usfm3-cli` | Command-line tool |
| `usfm3-python` | Python bindings (PyO3) |
| `usfm3-wasm` | WebAssembly bindings (works in browsers and Node.js) |
```bash
# From a file (defaults to USJ output)
usfm3 path/to/file.usfm

# Choose output format
usfm3 path/to/file.usfm usx
usfm3 path/to/file.usfm usfm
usfm3 path/to/file.usfm vref

# From stdin
cat file.usfm | usfm3

# Skip validation
usfm3 path/to/file.usfm --no-validate
```

Diagnostics are printed to stderr; document output goes to stdout.
The `usfm3` crate is available on crates.io.

```rust
fn main() {
    let result = usfm3::builder::parse(r#"\id GEN
\c 1
\p
\v 1 In the beginning God created the heavens and the earth.
"#);

    // Check for errors
    for diag in result.diagnostics.iter() {
        eprintln!("{diag}");
    }

    // Output as USJ (JSON)
    let usj = usfm3::usj::to_usj_string_pretty(&result.document).unwrap();
    println!("{usj}");

    // Output as USX (XML)
    let usx = usfm3::usx::to_usx_string(&result.document).unwrap();

    // Output as normalized USFM
    let usfm = usfm3::usfm::to_usfm_string(&result.document);

    // Run semantic validation
    let validation_diags = usfm3::validation::validate(&result.document);
}
```

Python bindings are available on PyPI.
```python
import usfm3

with open("GEN.usfm", encoding="utf-8") as f:
    text = f.read()

result = usfm3.parse(text)

# Output formats
usj = result.to_usj()    # dict
usx = result.to_usx()    # XML string
usfm = result.to_usfm()  # USFM string
vref = result.to_vref()  # {"GEN 1:1": "In the beginning...", ...}

# Diagnostics
for d in result.diagnostics:
    print(f"[{d.severity}] {d.message} ({d.start}..{d.end})")
if result.has_errors():
    print("Document has errors")

# Skip validation
result = usfm3.parse(text, validate=False)
```

Build with maturin:
```bash
cd crates/usfm3-python
maturin develop  # install into current venv
```

The WebAssembly package works in browsers, Node.js, Deno, and Bun, and is published on npm.
WASM is automatically initialized in Node.js, Deno, and Bun. In a browser, call `init()` first:
```js
import init from "usfm3";
await init(); // browser only
```

```js
import { parse } from "usfm3";

// usfmText: a string containing USFM source
const result = parse(usfmText);

// Output formats (lazy -- only serialized when called)
const usj = result.toUsj();   // USJ object
const usx = result.toUsx();   // USX XML string
const usfm = result.toUsfm(); // Normalized USFM string
const vref = result.toVref(); // Vref pairs like { "GEN 1:1": "In the beginning...", ... }

// Diagnostics
for (const d of result.diagnostics) {
  console.log(`[${d.severity}] ${d.message} (${d.start}..${d.end})`);
  // d.code is a machine-readable enum like "UnknownMarker", "ImplicitClose", etc.
}

// Skip validation
const result2 = parse(usfmText, { validate: false });

// Free wasm memory when done
result.free();
result2.free();
```

Build with wasm-pack:
```bash
wasm-pack build crates/usfm3-wasm --target web     # for browsers
wasm-pack build crates/usfm3-wasm --target nodejs  # for Node.js
```

To build and test from source:

```bash
# Build everything
cargo build

# Build individual crates
cargo build -p usfm3      # core library
cargo build -p usfm3-cli  # CLI

# Run tests
cargo test -p usfm3
```

The parser uses a two-phase architecture:
- Lexer (`logos`-based tokenizer) -- splits USFM source into tokens with byte-offset spans
- Builder (stack-based tree builder) -- converts the token stream into a `Document` AST
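To make the first phase concrete, here is a deliberately simplified, std-only tokenizer. It splits USFM into marker and text tokens that carry byte-offset spans. This is a hypothetical sketch, not the crate's lexer. The real one is generated with `logos` and covers far more of USFM (attributes, milestones, and so on), and the `Kind`, `Token`, and `tokenize` names are assumptions. Like the lexer described above, it never fails: any input, even a lone backslash, yields a token list.

```rust
/// Token kinds for a toy USFM tokenizer (hypothetical, not the crate's types).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Kind {
    /// Opening marker such as `\v` or `\wj` (name stored without the backslash).
    Marker(String),
    /// Closing marker such as `\wj*`.
    EndMarker(String),
    /// A run of text between markers.
    Text(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Token {
    kind: Kind,
    start: usize, // byte offset, inclusive
    end: usize,   // byte offset, exclusive
}

/// Split USFM source into tokens. Never fails: every input yields a token list.
fn tokenize(src: &str) -> Vec<Token> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        if bytes[i] == b'\\' {
            i += 1;
            // Marker names: ASCII letters, digits, '+' (nested styles) and '-' (milestones).
            while i < bytes.len()
                && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'+' || bytes[i] == b'-')
            {
                i += 1;
            }
            let name = src[start + 1..i].to_string();
            if i < bytes.len() && bytes[i] == b'*' {
                i += 1;
                tokens.push(Token { kind: Kind::EndMarker(name), start, end: i });
            } else {
                tokens.push(Token { kind: Kind::Marker(name), start, end: i });
            }
        } else {
            // Text runs up to the next backslash. Every boundary is next to an ASCII
            // byte, so the slice below always falls on UTF-8 character boundaries.
            while i < bytes.len() && bytes[i] != b'\\' {
                i += 1;
            }
            tokens.push(Token { kind: Kind::Text(src[start..i].to_string()), start, end: i });
        }
    }
    tokens
}

fn main() {
    let src = "\\c 1\n\\p\n\\v 1 In the \\wj beginning\\wj* God";
    for t in tokenize(src) {
        println!("{:>2}..{:<2} {:?}", t.start, t.end, t.kind);
    }
}
```

Every byte belongs to some token, so the spans can later point diagnostics back at the source. One simplification is worth noting. This sketch keeps the single space after a marker at the start of the next text token, but USFM treats that space as part of the marker.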
This two-phase design makes the parser error-tolerant: the lexer always succeeds, and the builder recovers from structural errors by emitting diagnostics and applying heuristics, such as implicitly closing markers that were never explicitly closed.
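The recovery strategy can be illustrated with a small stack-based builder. This is a hypothetical sketch, not the crate's builder, and the `Tok`, `Node`, and `build` names and the recovery rules are assumptions. Paragraph markers start a new container, and character styles such as `\wj` are pushed on a stack. An end marker closes the matching style, first closing any styles opened inside it. Styles still open at the next paragraph or at the end of input are closed implicitly. An end marker with no matching opener is reported and ignored. Each recovery records a diagnostic string instead of aborting.

```rust
/// Hypothetical tree node types for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Para(String, Vec<Node>),
    Char(String, Vec<Node>),
    Text(String),
}

/// Hypothetical pre-classified tokens (a real builder would get these from the lexer).
#[derive(Debug, Clone, Copy)]
enum Tok<'a> {
    Para(&'a str), // paragraph marker, e.g. "p" or "q1"
    Char(&'a str), // opening character style, e.g. "wj"
    End(&'a str),  // closing marker, e.g. "wj" for `\wj*`
    Text(&'a str),
}

/// Open containers: index 0 is the current paragraph, the rest are character styles.
type Stack = Vec<(String, Vec<Node>)>;

/// Pop the innermost character style and attach it to its parent container.
fn pop_char(stack: &mut Stack) {
    let (name, children) = stack.pop().expect("stack is non-empty");
    stack.last_mut().expect("a paragraph is at the bottom").1.push(Node::Char(name, children));
}

/// Close every open container, recording a diagnostic for each unclosed style.
fn close_all(stack: &mut Stack, doc: &mut Vec<Node>, diags: &mut Vec<String>) {
    while stack.len() > 1 {
        diags.push(format!("ImplicitClose: \\{}", stack[stack.len() - 1].0));
        pop_char(stack);
    }
    if let Some((name, children)) = stack.pop() {
        doc.push(Node::Para(name, children));
    }
}

fn build(tokens: &[Tok]) -> (Vec<Node>, Vec<String>) {
    let mut doc = Vec::new();
    let mut diags = Vec::new();
    let mut stack: Stack = Vec::new();
    for &tok in tokens {
        // Content before any paragraph marker gets an implied \p (an assumed recovery rule).
        if stack.is_empty() && !matches!(tok, Tok::Para(_)) {
            stack.push(("p".to_string(), Vec::new()));
        }
        match tok {
            Tok::Para(m) => {
                close_all(&mut stack, &mut doc, &mut diags);
                stack.push((m.to_string(), Vec::new()));
            }
            Tok::Char(m) => stack.push((m.to_string(), Vec::new())),
            Tok::Text(t) => stack.last_mut().unwrap().1.push(Node::Text(t.to_string())),
            Tok::End(m) => match stack.iter().rposition(|(name, _)| name == m) {
                Some(pos) if pos > 0 => {
                    // Styles opened inside the matching one are closed implicitly first.
                    while stack.len() > pos + 1 {
                        diags.push(format!("ImplicitClose: \\{}", stack[stack.len() - 1].0));
                        pop_char(&mut stack);
                    }
                    pop_char(&mut stack); // explicit close of the matching style
                }
                _ => diags.push(format!("unmatched end marker \\{m}*")),
            },
        }
    }
    close_all(&mut stack, &mut doc, &mut diags);
    (doc, diags)
}

fn main() {
    let toks = [
        Tok::Para("p"),
        Tok::Text("In the "),
        Tok::Char("wj"),
        Tok::Text("beginning"),
        Tok::Para("q1"), // \wj was never closed
        Tok::Text("God"),
    ];
    let (doc, diags) = build(&toks);
    println!("{doc:#?}");
    for d in diags {
        eprintln!("{d}");
    }
}
```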
Validation is a separate pass over the AST that checks semantic rules without modifying the tree.
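The sketch below shows a read-only semantic check for one rule of the kind listed earlier: chapter/verse sequence. It runs over a flattened list of chapter and verse numbers. It is hypothetical: the `Item` type, the message wording, and the input shape are assumptions, and real USFM details such as verse ranges (`\v 1-2`) are ignored. The input is taken as `&[Item]`, so the check cannot modify it, mirroring how validation leaves the tree untouched.

```rust
/// Hypothetical flattened view of a document: just its \c and \v numbers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Item {
    Chapter(u32),
    Verse(u32),
}

/// Report out-of-order, duplicate, or skipped chapter and verse numbers.
fn validate_sequence(items: &[Item]) -> Vec<String> {
    let mut diags = Vec::new();
    let mut chapter: u32 = 0;
    let mut last_verse: u32 = 0;
    for (i, item) in items.iter().enumerate() {
        match *item {
            Item::Chapter(c) => {
                if c != chapter + 1 {
                    diags.push(format!("item {i}: expected chapter {}, found {c}", chapter + 1));
                }
                chapter = c;
                last_verse = 0; // verse numbering restarts in each chapter
            }
            Item::Verse(v) => {
                if chapter == 0 {
                    diags.push(format!("item {i}: verse {v} appears before any chapter"));
                } else if v != last_verse + 1 {
                    diags.push(format!(
                        "item {i}: expected verse {} in chapter {chapter}, found {v}",
                        last_verse + 1
                    ));
                }
                // Remember the highest verse seen so a duplicate does not reset the count.
                last_verse = last_verse.max(v);
            }
        }
    }
    diags
}

fn main() {
    // Equivalent to: \c 1 \v 1 \v 3 \c 3 \v 1
    let items = [Item::Chapter(1), Item::Verse(1), Item::Verse(3), Item::Chapter(3), Item::Verse(1)];
    for msg in validate_sequence(&items) {
        println!("{msg}");
    }
}
```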
| Format | Function | Description |
|---|---|---|
| USJ | `usj::to_usj_string()` | Unified Scripture JSON -- the standard JSON representation |
| USX | `usx::to_usx_string()` | Unified Scripture XML -- the standard XML representation |
| USFM | `usfm::to_usfm_string()` | Normalized USFM with regularized whitespace |
| VRef | `vref::to_vref_json_string()` | Map from verse reference to plain text (formatting and notes stripped) |
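The VRef format keys plain verse text by reference (for example `GEN 1:1`). The sketch below roughly illustrates how such a map can be derived. It is not the crate's `vref` implementation, and it uses only the standard library on simplified USFM. It takes the book code from `\id`, the chapter from `\c`, and the verse from `\v`. It drops footnotes and cross-references (`\f ... \f*`, `\x ... \x*`) and skips the text of a few heading-style markers. It strips every other marker while keeping its text. The `to_vref` and `SKIP_TEXT` names and the skip list are assumptions. Verse ranges, attributes, and many other USFM features are not handled.

```rust
/// Markers whose own text is not verse content (a short, non-exhaustive, assumed list).
const SKIP_TEXT: &[&str] = &["h", "toc1", "toc2", "toc3", "mt1", "s1", "s2", "r"];

/// Build (reference, plain text) pairs such as ("GEN 1:1", "In the beginning ...").
fn to_vref(usfm: &str) -> Vec<(String, String)> {
    let mut pairs: Vec<(String, String)> = Vec::new();
    let mut book = String::new();
    let mut chapter = String::new();
    let mut in_verse = false; // a \v has been seen in the current chapter
    let mut in_note = false; // inside \f ... \f* or \x ... \x*

    // After split, each piece except the first starts with a marker name.
    for piece in usfm.split('\\').skip(1) {
        // The name ends at whitespace, or just after a closing '*'.
        let name_len = match piece.find(|c: char| c.is_whitespace() || c == '*') {
            Some(i) if piece[i..].starts_with('*') => i + 1,
            Some(i) => i,
            None => piece.len(),
        };
        let (name, rest) = piece.split_at(name_len);
        let closing = name.ends_with('*');
        let base = name.trim_end_matches('*').trim_start_matches('+');
        // After an opening marker, one whitespace character is a delimiter, not content.
        let text = if closing {
            rest
        } else {
            rest.strip_prefix(|c: char| c.is_whitespace()).unwrap_or(rest)
        };

        match base {
            "id" => book = text.split_whitespace().next().unwrap_or("").to_string(),
            "c" => {
                chapter = text.split_whitespace().next().unwrap_or("").to_string();
                in_verse = false;
            }
            "v" => {
                let mut parts = text.trim_start().splitn(2, char::is_whitespace);
                let verse = parts.next().unwrap_or("");
                let body = parts.next().unwrap_or("");
                pairs.push((format!("{book} {chapter}:{verse}"), body.to_string()));
                in_verse = true;
            }
            "f" | "x" => {
                in_note = !closing;
                // Text after a closing \f* or \x* belongs to the verse again.
                if closing && in_verse {
                    pairs.last_mut().unwrap().1.push_str(text);
                }
            }
            _ if in_note || SKIP_TEXT.iter().any(|m| *m == base) => {}
            _ if in_verse => pairs.last_mut().unwrap().1.push_str(text),
            _ => {}
        }
    }
    // Collapse newlines and runs of spaces left behind by removed markers.
    for (_, text) in pairs.iter_mut() {
        let collapsed = text.split_whitespace().collect::<Vec<_>>().join(" ");
        *text = collapsed;
    }
    pairs
}

fn main() {
    let usfm = "\\id GEN\n\\c 1\n\\p\n\\v 1 In the beginning\\f + \\ft a note\\f* God created.\n";
    for (reference, text) in to_vref(usfm) {
        println!("{reference}\t{text}");
    }
}
```

The sketch returns the pairs in document order instead of a JSON string. Skipping serialization keeps it dependency-free.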
License: MIT