Project Guide

Purpose

upf is a Rust library for working with Unified Pseudopotential Format (UPF) documents as typed Rust data. The current codebase supports both directions:

  • read UPF text into a validated UpfData structure
  • write a validated UpfData value back to UPF text

The project targets semantic round-tripping: a document can be parsed, serialized, and parsed again into an equal UpfData value, even if the exact whitespace or original layout is not preserved.
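As a rough illustration of what semantic round-tripping means, the sketch below uses a toy whitespace-delimited numeric field instead of the crate's real API. The names parse_field, write_field, and round_trips are hypothetical stand-ins introduced only for this example.

```rust
use std::num::ParseFloatError;

/// Hypothetical stand-in for parsing: read whitespace-delimited numbers.
pub fn parse_field(text: &str) -> Result<Vec<f64>, ParseFloatError> {
    text.split_whitespace().map(str::parse::<f64>).collect()
}

/// Hypothetical stand-in for serialization: one value per line.
/// The original spacing and number formatting are deliberately not kept.
pub fn write_field(values: &[f64]) -> String {
    values
        .iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join("\n")
}

/// Parse, serialize, and parse again; true when the data is unchanged.
pub fn round_trips(original: &str) -> bool {
    match parse_field(original) {
        Ok(first) => parse_field(&write_field(&first)) == Ok(first),
        Err(_) => false,
    }
}
```

Here the rewritten text may differ from the input (different spacing, different number formatting), yet the parsed values compare equal, which is the property the crate aims for at the level of UpfData.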

Public API

The crate exposes six primary entry points:

  • from_str: parse a UPF document from a UTF-8 string
  • from_reader: parse a UPF document from a buffered reader
  • from_file: parse a UPF document from a file path
  • to_string: serialize a validated UpfData into UPF text
  • to_writer: serialize a validated UpfData into any writer
  • to_file: serialize a validated UpfData to a file path

Parse and write operations use the shared public model type UpfData and return Result<_, UpfError>.
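One common way to offer string, reader, and file entry points is to funnel them into a single parsing function. The sketch below shows that layering with stand-in types; Doc and DocError are hypothetical placeholders for UpfData and UpfError, the "parser" only trims text, and whether the crate layers its functions this way is an assumption.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Stand-in for `UpfData`; the real type is a full typed model.
#[derive(Debug, PartialEq)]
pub struct Doc {
    pub body: String,
}

/// Stand-in for `UpfError`.
#[derive(Debug)]
pub enum DocError {
    Io(io::Error),
    Empty,
}

impl From<io::Error> for DocError {
    fn from(err: io::Error) -> Self {
        DocError::Io(err)
    }
}

/// Innermost entry point: the other two funnel into this one.
pub fn from_str(input: &str) -> Result<Doc, DocError> {
    let body = input.trim();
    if body.is_empty() {
        return Err(DocError::Empty);
    }
    Ok(Doc { body: body.to_owned() })
}

/// Collect the reader's lines into memory, then reuse `from_str`.
pub fn from_reader<R: BufRead>(reader: R) -> Result<Doc, DocError> {
    let mut text = String::new();
    for line in reader.lines() {
        text.push_str(&line?);
        text.push('\n');
    }
    from_str(&text)
}

/// Open the file, wrap it in a `BufReader`, then reuse `from_reader`.
pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Doc, DocError> {
    let file = File::open(path)?;
    from_reader(BufReader::new(file))
}
```

With this shape, I/O failures and parse failures surface through one error type, which matches the Result<_, UpfError> contract described above.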

Current architecture

The implementation is organized around serde-based XML mapping rather than a custom parser pipeline.

Entry points

  • src/de.rs: Read-side APIs. These use quick_xml::de to deserialize a full document into UpfData, then run semantic validation.
  • src/ser.rs: Write-side APIs. These validate UpfData first, then use quick_xml::se to serialize it back into UPF text.

Public model

  • src/model/core.rs: Defines the root UpfData type, PP_HEADER, PP_MESH, shared numeric arrays, and the central validation logic.
  • src/model/nonlocal.rs: Defines PP_INFO, PP_NONLOCAL, PP_SEMILOCAL, PP_PSWFC, and related nested nodes.
  • src/model/paw.rs: Defines PAW-specific sections such as PP_FULL_WFC, PP_PAW, and PP_AUGMENTATION.
  • src/model/gipaw.rs: Defines GIPAW-specific sections.

Support code

  • src/error.rs: Defines UpfError for XML decode/encode, I/O, value parsing, and validation failures.
  • src/text.rs: Provides helpers for whitespace-delimited numeric fields and UPF boolean flags.
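To make the role of the text helpers concrete, here is a minimal sketch of parsing a whitespace-delimited numeric field and a boolean flag. It is not the contents of src/text.rs: FieldError, parse_numbers, parse_flag, and format_flag are hypothetical names, and accepting Fortran-style spellings such as .true. alongside T and F is an assumption of this example.

```rust
use std::fmt;

/// Hypothetical error for a malformed text field (not the crate's `UpfError`).
#[derive(Debug, PartialEq)]
pub enum FieldError {
    BadNumber(String),
    BadFlag(String),
}

impl fmt::Display for FieldError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FieldError::BadNumber(token) => write!(f, "invalid number: {token:?}"),
            FieldError::BadFlag(token) => write!(f, "invalid flag: {token:?}"),
        }
    }
}

impl std::error::Error for FieldError {}

/// Parse a whitespace-delimited numeric field, such as the body of PP_R.
/// Spaces, tabs, and newlines are all treated as separators.
pub fn parse_numbers(text: &str) -> Result<Vec<f64>, FieldError> {
    text.split_whitespace()
        .map(|token| {
            token
                .parse::<f64>()
                .map_err(|_| FieldError::BadNumber(token.to_owned()))
        })
        .collect()
}

/// Parse a boolean flag. `T`/`F` is the form quoted in this guide;
/// the other accepted spellings are an assumption of this sketch.
pub fn parse_flag(text: &str) -> Result<bool, FieldError> {
    match text.trim().to_ascii_lowercase().as_str() {
        "t" | "true" | ".true." => Ok(true),
        "f" | "false" | ".false." => Ok(false),
        _ => Err(FieldError::BadFlag(text.to_owned())),
    }
}

/// Render a flag in the `T`/`F` form.
pub fn format_flag(value: bool) -> &'static str {
    if value { "T" } else { "F" }
}
```

Reporting the offending token in the error keeps value-parsing failures distinguishable from XML and I/O failures, which is the separation UpfError is described as providing.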

Validation rules

The crate currently enforces a small set of structural invariants in UpfData::validate():

  • PP_HEADER/@mesh_size must match the lengths of PP_R, PP_RAB, PP_LOCAL, and PP_RHOATOM
  • PP_HEADER/@is_paw="T" requires a PP_PAW section
  • PP_HEADER/@has_gipaw="T" requires a PP_GIPAW section

These checks run after deserialization and before serialization, so both read and write paths enforce the same structural contract.
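The three rules above can be sketched as a single validate method. The struct below is a heavily simplified stand-in for UpfData: the field names, the empty Paw and Gipaw types, and the ValidationError enum are all hypothetical and chosen only to mirror the rules as stated.

```rust
/// Placeholder for the PP_PAW section.
#[derive(Debug, Default)]
pub struct Paw;

/// Placeholder for the PP_GIPAW section.
#[derive(Debug, Default)]
pub struct Gipaw;

/// Simplified stand-in for `UpfData`; field names are hypothetical.
#[derive(Debug, Default)]
pub struct Doc {
    pub mesh_size: usize,
    pub is_paw: bool,
    pub has_gipaw: bool,
    pub r: Vec<f64>,
    pub rab: Vec<f64>,
    pub local: Vec<f64>,
    pub rhoatom: Vec<f64>,
    pub paw: Option<Paw>,
    pub gipaw: Option<Gipaw>,
}

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    LengthMismatch { section: &'static str, expected: usize, found: usize },
    MissingSection(&'static str),
}

impl Doc {
    /// Check the structural invariants listed in this guide.
    pub fn validate(&self) -> Result<(), ValidationError> {
        // Every radial array must have exactly `mesh_size` entries.
        let arrays = [
            ("PP_R", &self.r),
            ("PP_RAB", &self.rab),
            ("PP_LOCAL", &self.local),
            ("PP_RHOATOM", &self.rhoatom),
        ];
        for (section, values) in arrays {
            if values.len() != self.mesh_size {
                return Err(ValidationError::LengthMismatch {
                    section,
                    expected: self.mesh_size,
                    found: values.len(),
                });
            }
        }
        // Header flags must be backed by the sections they announce.
        if self.is_paw && self.paw.is_none() {
            return Err(ValidationError::MissingSection("PP_PAW"));
        }
        if self.has_gipaw && self.gipaw.is_none() {
            return Err(ValidationError::MissingSection("PP_GIPAW"));
        }
        Ok(())
    }
}
```

Calling the same method right after decoding and right before encoding is what lets the read and write paths share one contract.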

Supported UPF sections

The current top-level model covers these sections:

  • PP_INFO
  • PP_HEADER
  • PP_MESH
  • PP_NLCC
  • PP_LOCAL
  • PP_SEMILOCAL
  • PP_NONLOCAL
  • PP_PSWFC
  • PP_FULL_WFC
  • PP_RHOATOM
  • PP_PAW
  • PP_GIPAW

Optional sections are represented as Option<T>. Repeated numbered tags such as PP_BETA.n, PP_CHI.n, and PAW/GIPAW entry lists are represented with enums and vectors that match the serialized UPF tags.
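As a small illustration of how an enum can mirror numbered tag names, the sketch below maps tag strings such as PP_BETA.3 to a variant carrying the index and back again. NonlocalTag and its methods are hypothetical; the crate's actual enums are serde-mapped and may look quite different.

```rust
/// Hypothetical classification of child tags inside a nonlocal section.
#[derive(Debug, Clone, PartialEq)]
pub enum NonlocalTag {
    /// A numbered projector tag, e.g. `PP_BETA.3` carries index 3.
    Beta(usize),
    Dij,
    Augmentation,
}

impl NonlocalTag {
    /// Recognize a tag name, extracting the index from numbered tags.
    pub fn parse(tag: &str) -> Option<Self> {
        match tag {
            "PP_DIJ" => Some(Self::Dij),
            "PP_AUGMENTATION" => Some(Self::Augmentation),
            _ => {
                let suffix = tag.strip_prefix("PP_BETA.")?;
                // Require plain digits so inputs like "+1" are rejected.
                if suffix.is_empty() || !suffix.bytes().all(|b| b.is_ascii_digit()) {
                    return None;
                }
                suffix.parse().ok().map(Self::Beta)
            }
        }
    }

    /// Rebuild the serialized tag name from the variant.
    pub fn name(&self) -> String {
        match self {
            Self::Beta(index) => format!("PP_BETA.{index}"),
            Self::Dij => "PP_DIJ".to_owned(),
            Self::Augmentation => "PP_AUGMENTATION".to_owned(),
        }
    }
}
```

Keeping the index inside the variant means a Vec of such entries can be written back out with the same numbered tag names it was read from.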

Current scope and limitations

  • The code is built around the UPF 2.0.1 structure currently represented in src/model.
  • Serialization aims to produce valid UPF for the supported model, not to preserve original comments, formatting, or unknown sections byte-for-byte.
  • The crate does not currently preserve unsupported top-level sections.
  • Input must still be parseable by quick-xml; the custom normalization/tree pipeline described in earlier docs is no longer part of the implementation.

Testing strategy

The repository uses focused inline fixtures in tests/*.rs to cover:

  • basic parsing of core sections
  • file/string/reader read APIs
  • file/string/writer write APIs
  • semantic round-tripping
  • validation failures for inconsistent sections
  • PAW, GIPAW, and nonlocal subtree coverage

Abbreviation glossary

  • UPF: Unified Pseudopotential Format
  • PP: pseudopotential
  • NC: norm-conserving
  • US: ultrasoft
  • PAW: projector augmented wave
  • GIPAW: gauge-including projector augmented wave
  • AE: all-electron
  • PS: pseudo
  • WFC: wavefunction
  • NLCC: nonlinear core correction
  • RHOATOM: atomic charge density
  • RAB: radial integration weights (the mesh derivative dr/di at each radial point)
  • DIJ: nonlocal projector coupling matrix

Verification

The current repository verification commands are:

  • cargo fmt --check
  • cargo clippy --all-targets -- -D warnings
  • cargo test
  • cargo doc --no-deps (when public API docs or rustdoc are touched)