upf is a Rust library for working with Unified Pseudopotential Format (UPF)
documents as typed Rust data. The current codebase supports both directions:
- read UPF text into a validated UpfData structure
- write a validated UpfData value back to UPF text
The project is aimed at semantic round-tripping. A document can be parsed, serialized, and parsed again into the same Rust data model, even if the exact whitespace or original layout is not preserved.
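Semantic round-tripping compares parsed values rather than text. Below is a minimal std-only illustration for a single whitespace-delimited numeric array. parse_floats and format_floats are hypothetical helpers, not part of the crate's API, and Fortran-style D exponents are not handled.

```rust
use std::num::ParseFloatError;

/// Hypothetical helper: parse a whitespace-delimited list of floats.
fn parse_floats(text: &str) -> Result<Vec<f64>, ParseFloatError> {
    text.split_whitespace().map(str::parse::<f64>).collect()
}

/// Hypothetical helper: write one value per line. `{:e}` prints the shortest
/// exponent form that parses back to the identical f64.
fn format_floats(values: &[f64]) -> String {
    values
        .iter()
        .map(|v| format!("{v:e}"))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // Two textual layouts of the same three values.
    let original = "  0.0  1.5e-3\n    2.25  ";
    let relaid = "0.0\n0.0015\n2.25";
    let a = parse_floats(original).expect("valid numbers");
    let b = parse_floats(relaid).expect("valid numbers");
    assert_eq!(a, b); // equal as data, different as text

    // Writing and re-reading gives back the same values, not the same bytes.
    let written = format_floats(&a);
    assert_eq!(parse_floats(&written).expect("valid numbers"), a);
    assert_ne!(written, original);
    println!("{written}");
}
```

The same idea applies to the whole document: a round-trip test compares the two parsed models, not the serialized strings.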
The crate exposes six primary entry points:
- from_str: parse a UPF document from a UTF-8 string
- from_reader: parse a UPF document from a buffered reader
- from_file: parse a UPF document from a file path
- to_string: serialize a validated UpfData into UPF text
- to_writer: serialize a validated UpfData into any writer
- to_file: serialize a validated UpfData to a file path
Parse and write operations use the shared public model type UpfData and
return Result<_, UpfError>.
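Because every read API returns the same model and error types, the file and reader variants can be thin wrappers around a single parsing path. The std-only sketch below shows one such layering under stated assumptions. UpfData and UpfError are placeholders, and the variant names Io and Validation are invented. from_str only rejects an empty document instead of calling quick_xml::de. from_reader reads the whole input into a String, whereas the real crate may hand the reader to quick-xml directly.

```rust
use std::fmt;
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};
use std::path::Path;

/// Placeholder error type; the real UpfError also covers XML and value-parsing failures.
#[derive(Debug)]
enum UpfError {
    Io(io::Error),
    Validation(String),
}

impl fmt::Display for UpfError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UpfError::Io(e) => write!(f, "I/O error: {e}"),
            UpfError::Validation(msg) => write!(f, "validation error: {msg}"),
        }
    }
}

impl std::error::Error for UpfError {}

// Lets `?` turn I/O failures into UpfError automatically.
impl From<io::Error> for UpfError {
    fn from(e: io::Error) -> Self {
        UpfError::Io(e)
    }
}

/// Placeholder for the real document model.
#[derive(Debug, PartialEq)]
struct UpfData {
    byte_len: usize,
}

/// Single parsing path. A stub check stands in for deserialization plus validation.
fn from_str(s: &str) -> Result<UpfData, UpfError> {
    if s.trim().is_empty() {
        return Err(UpfError::Validation("empty document".to_string()));
    }
    Ok(UpfData { byte_len: s.len() })
}

/// Reader variant: buffer the input, then delegate to from_str.
fn from_reader<R: BufRead>(mut reader: R) -> Result<UpfData, UpfError> {
    let mut buf = String::new();
    reader.read_to_string(&mut buf)?;
    from_str(&buf)
}

/// File variant: open the path, then delegate to from_reader.
fn from_file<P: AsRef<Path>>(path: P) -> Result<UpfData, UpfError> {
    from_reader(BufReader::new(File::open(path)?))
}

fn main() {
    // &[u8] implements BufRead, so a byte slice can stand in for a buffered reader.
    match from_reader("<UPF version=\"2.0.1\"></UPF>".as_bytes()) {
        Ok(data) => println!("parsed: {data:?}"),
        Err(e) => println!("error: {e}"),
    }
    if let Err(e) = from_file("missing.upf") {
        println!("error: {e}");
    }
}
```

The write side can mirror this arrangement, with to_file and to_writer delegating to a single serialization path that validates first.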
The implementation is organized around serde-based XML mapping rather than a custom parser pipeline.
- src/de.rs: read-side APIs. These use quick_xml::de to deserialize a full document into UpfData, then run semantic validation.
- src/ser.rs: write-side APIs. These validate UpfData first, then use quick_xml::se to serialize it back into UPF text.
- src/model/core.rs: defines the root UpfData type, PP_HEADER, PP_MESH, shared numeric arrays, and the central validation logic.
- src/model/nonlocal.rs: defines PP_INFO, PP_NONLOCAL, PP_SEMILOCAL, PP_PSWFC, and related nested nodes.
- src/model/paw.rs: defines PAW-specific sections such as PP_FULL_WFC, PP_PAW, and PP_AUGMENTATION.
- src/model/gipaw.rs: defines GIPAW-specific sections.
- src/error.rs: defines UpfError for XML decode/encode, I/O, value-parsing, and validation failures.
- src/text.rs: provides helpers for whitespace-delimited numeric fields and UPF boolean flags.
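UPF headers encode booleans as short text flags such as T and F. The sketch below parses and prints those flags. The function names are hypothetical, and the set of accepted spellings (T/F, true/false, and .true./.false., case-insensitive) is an assumption, not a description of src/text.rs.

```rust
/// Hypothetical flag parser. Accepted spellings are an assumption:
/// "T"/"F", "true"/"false", and Fortran-style ".true."/".false.", case-insensitive.
fn parse_upf_bool(raw: &str) -> Option<bool> {
    // Drop surrounding whitespace and Fortran-style dots, then compare in lowercase.
    let s = raw.trim().trim_matches('.').to_ascii_lowercase();
    match s.as_str() {
        "t" | "true" => Some(true),
        "f" | "false" => Some(false),
        _ => None,
    }
}

/// Hypothetical writer: emit the short T/F spelling used in UPF 2.0.1 headers.
fn format_upf_bool(value: bool) -> &'static str {
    if value {
        "T"
    } else {
        "F"
    }
}

fn main() {
    for raw in ["T", "F", ".true.", "false", "maybe"] {
        println!("{raw:?} -> {:?}", parse_upf_bool(raw));
    }
    println!("{}", format_upf_bool(true));
}
```

Returning Option rather than defaulting to false keeps an unrecognized flag visible to the caller, which can then report it as a value-parsing error.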
The crate currently enforces a small set of structural invariants in
UpfData::validate():
- PP_HEADER/@mesh_size must match the lengths of PP_R, PP_RAB, PP_LOCAL, and PP_RHOATOM
- PP_HEADER/@is_paw="T" requires a PP_PAW section
- PP_HEADER/@has_gipaw="T" requires a PP_GIPAW section
These checks run after deserialization and before serialization, so both read and write paths enforce the same structural contract.
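The three invariants fit in a single check over the parsed model. This sketch uses a heavily simplified, hypothetical UpfData that keeps only the fields the checks touch, and it returns a String error instead of UpfError. It shows the shape of the checks, not the crate's actual types.

```rust
/// Hypothetical stand-ins for the PP_PAW and PP_GIPAW contents.
struct Paw;
struct Gipaw;

/// Simplified PP_HEADER: only the attributes the checks need.
struct Header {
    mesh_size: usize,
    is_paw: bool,
    has_gipaw: bool,
}

/// Heavily simplified document model; field names are assumptions.
struct UpfData {
    header: Header,
    r: Vec<f64>,
    rab: Vec<f64>,
    local: Vec<f64>,
    rhoatom: Vec<f64>,
    paw: Option<Paw>,
    gipaw: Option<Gipaw>,
}

impl UpfData {
    /// Check the three structural invariants described in this section.
    fn validate(&self) -> Result<(), String> {
        let n = self.header.mesh_size;
        let arrays = [
            ("PP_R", self.r.len()),
            ("PP_RAB", self.rab.len()),
            ("PP_LOCAL", self.local.len()),
            ("PP_RHOATOM", self.rhoatom.len()),
        ];
        for (name, len) in arrays {
            if len != n {
                return Err(format!("{name} has {len} points, but PP_HEADER/@mesh_size is {n}"));
            }
        }
        if self.header.is_paw && self.paw.is_none() {
            return Err("PP_HEADER/@is_paw=\"T\" requires a PP_PAW section".to_string());
        }
        if self.header.has_gipaw && self.gipaw.is_none() {
            return Err("PP_HEADER/@has_gipaw=\"T\" requires a PP_GIPAW section".to_string());
        }
        Ok(())
    }
}

/// Build a non-PAW, non-GIPAW document whose four radial arrays have `points` entries.
fn sample(mesh_size: usize, points: usize) -> UpfData {
    UpfData {
        header: Header { mesh_size, is_paw: false, has_gipaw: false },
        r: vec![0.0; points],
        rab: vec![0.0; points],
        local: vec![0.0; points],
        rhoatom: vec![0.0; points],
        paw: None,
        gipaw: None,
    }
}

fn main() {
    println!("{:?}", sample(3, 3).validate());
    println!("{:?}", sample(3, 2).validate());
}
```

Calling the same validate() from both the read and write paths is what keeps the two directions consistent.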
The current top-level model covers these sections:
- PP_INFO
- PP_HEADER
- PP_MESH
- PP_NLCC
- PP_LOCAL
- PP_SEMILOCAL
- PP_NONLOCAL
- PP_PSWFC
- PP_FULL_WFC
- PP_RHOATOM
- PP_PAW
- PP_GIPAW
Optional sections are represented as Option<T>. Repeated numbered tags such
as PP_BETA.n, PP_CHI.n, and PAW/GIPAW entry lists are represented with enums
and vectors that match the serialized UPF tags.
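Numbered tags carry their index in the element name. A mapping layer therefore has to split a name like PP_BETA.2 into a base name and an index when reading, then rebuild the name when writing. The sketch below shows that round trip using a hypothetical NonlocalChild enum. The enum and helper names are assumptions, and the crate's own serde mapping may handle this differently.

```rust
/// Hypothetical children of PP_NONLOCAL; variant names are assumptions.
#[derive(Debug, PartialEq)]
enum NonlocalChild {
    /// One projector, serialized as PP_BETA.<index>.
    Beta { index: usize, values: Vec<f64> },
    /// The coupling matrix, serialized as PP_DIJ.
    Dij(Vec<f64>),
}

impl NonlocalChild {
    /// The XML tag name this child would be written under.
    fn tag_name(&self) -> String {
        match self {
            NonlocalChild::Beta { index, .. } => format!("PP_BETA.{index}"),
            NonlocalChild::Dij(_) => "PP_DIJ".to_string(),
        }
    }
}

/// Split a numbered tag such as "PP_CHI.2" into ("PP_CHI", 2).
/// Returns None for unnumbered tags or a non-numeric suffix.
fn split_numbered_tag(tag: &str) -> Option<(&str, usize)> {
    let (base, suffix) = tag.rsplit_once('.')?;
    let index = suffix.parse::<usize>().ok()?;
    Some((base, index))
}

fn main() {
    let children = vec![
        NonlocalChild::Beta { index: 1, values: vec![0.0, 0.1] },
        NonlocalChild::Beta { index: 2, values: vec![0.0, 0.2] },
        NonlocalChild::Dij(vec![1.0, 0.0, 0.0, 1.0]),
    ];
    for child in &children {
        let tag = child.tag_name();
        println!("{tag} -> {:?}", split_numbered_tag(&tag));
    }
}
```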
- The code is built around the UPF 2.0.1 structure currently represented in src/model.
- Serialization aims to produce valid UPF for the supported model, not to preserve original comments, formatting, or unknown sections byte-for-byte.
- The crate does not currently preserve unsupported top-level sections.
- Input must still be readable by quick-xml; the old custom normalization/tree pipeline described in previous docs is no longer part of the implementation.
The repository uses focused inline fixtures in tests/*.rs to cover:
- basic parsing of core sections
- file/string/reader read APIs
- file/string/writer write APIs
- semantic round-tripping
- validation failures for inconsistent sections
- PAW, GIPAW, and nonlocal subtree coverage
- UPF: Unified Pseudopotential Format
- PP: pseudopotential
- NC: norm-conserving
- US: ultrasoft
- PAW: projector augmented wave
- GIPAW: gauge-including projector augmented wave
- AE: all-electron
- PS: pseudo
- WFC: wavefunction
- NLCC: nonlinear core correction
- RHOATOM: atomic charge density
- RAB: radial integration measure
- DIJ: nonlocal projector coupling matrix
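The UPF format describes PP_RAB as the factor for discrete radial integration, so that ∫ f(r) dr ≈ Σ f(r_i) · rab_i over the mesh points. The sketch below uses the simplest form of that weighted sum. radial_integral is a hypothetical helper, not a crate API, and production codes often apply Simpson-style weights on top of rab instead.

```rust
/// Hypothetical helper: approximate the radial integral of `f` as the sum of f_i * rab_i.
/// Returns None when the two arrays have different lengths.
fn radial_integral(f: &[f64], rab: &[f64]) -> Option<f64> {
    if f.len() != rab.len() {
        return None;
    }
    Some(f.iter().zip(rab).map(|(fi, wi)| fi * wi).sum())
}

fn main() {
    // On a uniform mesh r_i = i * h, dr/di is constant, so every rab_i equals h.
    let h = 0.01;
    let n = 101;
    let r: Vec<f64> = (0..n).map(|i| i as f64 * h).collect();
    let rab = vec![h; n];
    let f: Vec<f64> = r.iter().map(|&ri| ri * ri).collect();
    println!("{:?}", radial_integral(&f, &rab));
}
```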
The current repository verification commands are:
- cargo fmt --check
- cargo clippy --all-targets -- -D warnings
- cargo test
- cargo doc --no-deps (when public API docs or rustdoc are touched)