
perf: eliminate ~40K allocations per XML parse via lazy wrapping and caching #58

Merged
ronaldtse merged 3 commits into main from perf/xml-deserialization-optimizations
Apr 21, 2026

@ronaldtse
Contributor

Summary

  • Lazy parse: Replace eager DocumentBuilder with Document.new(native, ctx) — nodes are wrapped on access instead of recursively at parse time
  • Node caching: Memoize children, attributes, and namespaces with proper invalidation on structural mutations (add/remove/replace child)
  • NodeSet wrap caching: Per-index @wrapped array eliminates redundant Node.wrap calls when iterating children multiple times
  • Parent tracking: @parent_node references propagated on remove/replace/sibling operations, enabling targeted cache invalidation
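
To make the lazy-parse bullet concrete, here is a minimal sketch of wrapping on access versus wrapping the whole tree up front. `NativeNode`, `LazyNode`, `force_all`, and `build_sample_tree` are hypothetical names invented for this illustration; they are not Moxml's classes, and the real `Document.new(native, ctx)` path may differ.

```ruby
# Hypothetical stand-ins for illustration; these are not Moxml's real classes.
NativeNode = Struct.new(:name, :children)

class LazyNode
  @wrap_count = 0

  class << self
    attr_accessor :wrap_count
  end

  attr_reader :native

  def initialize(native)
    @native = native
    LazyNode.wrap_count += 1 # count every wrapper allocation
  end

  def name
    @native.name
  end

  # Child wrappers are created on first access and then reused.
  def children
    @children ||= @native.children.map { |child| LazyNode.new(child) }
  end
end

# Simulates an eager builder by touching every node in the tree.
def force_all(node)
  node.children.each { |child| force_all(child) }
  node
end

# A five-node tree: root -> branch -> three leaves.
def build_sample_tree
  leaves = Array.new(3) { |i| NativeNode.new("leaf#{i}", []) }
  NativeNode.new("root", [NativeNode.new("branch", leaves)])
end

LazyNode.wrap_count = 0
root = LazyNode.new(build_sample_tree)
puts "wrappers after lazy parse: #{LazyNode.wrap_count}"
force_all(root)
puts "wrappers after full walk:  #{LazyNode.wrap_count}"
```

In this sketch the lazy path allocates one wrapper for the root and defers the rest, while the full walk pays for all five nodes. A consumer that only reads part of the tree never pays for the remainder.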

Architecture

The design follows the open/closed principle — Node and Element gained cache slots with invalidation hooks, while adapters remain thin wrappers. The NodeSet wrap cache is transparent to consumers: each, [], and enumeration all benefit without API changes.
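
The transparency claim can be illustrated with a small sketch of a per-index wrap cache. `WrappedNode` and `CachingNodeSet` are hypothetical names for this example, and the actual `NodeSet` internals (beyond the `@wrapped` array named in the summary) are assumed rather than taken from the diff.

```ruby
# Hypothetical stand-ins; only the @wrapped array name comes from the PR text.
class WrappedNode
  attr_reader :native

  def initialize(native)
    @native = native
  end
end

class CachingNodeSet
  include Enumerable

  def initialize(natives)
    @natives = natives
    @wrapped = Array.new(natives.size) # one cache slot per index, filled on demand
  end

  def size
    @natives.size
  end

  # Wraps the native node at most once per index.
  def [](index)
    native = @natives[index]
    return nil if native.nil?

    @wrapped[index] ||= WrappedNode.new(native)
  end

  # Enumeration goes through #[], so it shares the same cache.
  def each
    return to_enum(:each) unless block_given?

    @natives.each_index { |i| yield self[i] }
    self
  end
end

set = CachingNodeSet.new(%w[a b c])
puts set[0].equal?(set[0])                           # same wrapper on repeated index access
puts set.map(&:object_id) == set.map(&:object_id)    # same wrappers on repeated iteration
```

Because `each` delegates to `[]`, every `Enumerable` method picks up the cache without any change to the caller.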

Allocation reduction

For a typical XML document with ~500 elements, this eliminates:

  • ~500 Node.wrap calls at parse time (lazy)
  • ~500 redundant re-wrap calls per iteration (cached)
  • ~500 attribute hash constructions per access (memoized)

Gating

New benchmark specs in spec/moxml/allocation_benchmark_spec.rb verify allocation counts stay within bounds. Lazy parse, node cache, and NodeSet cache are all covered by dedicated specs.
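
How the specs count allocations is not shown in this description. One common approach, sketched below under that assumption, reads Ruby's cumulative `GC.stat(:total_allocated_objects)` counter before and after a block. `count_allocations` and `within_budget?` are hypothetical helper names, not the PR's `AllocationHelper` API.

```ruby
# Hypothetical helpers; the PR's AllocationHelper may be implemented differently.

# Returns how many objects were allocated while the block ran.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# True when the block stays at or below the given allocation budget.
def within_budget?(budget, &block)
  count_allocations(&block) <= budget
end

small = count_allocations { Array.new(100) { Object.new } }
large = count_allocations { Array.new(1_000) { Object.new } }
puts "100 objects:  #{small} allocations"
puts "1000 objects: #{large} allocations"
puts "within budget of 10? #{within_budget?(10) { Array.new(100) { Object.new } }}"
```

The counter only ever increases, so the difference is unaffected by garbage collection running during the block. Exact counts vary slightly across Ruby versions, which is one reason to assert an upper bound rather than an exact number.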

Test plan

  • bundle exec rspec — 193 adapter tests pass (0 failures)
  • bundle exec rake spec — all performance specs pass (73 tests)
  • New specs: lazy_parse_spec.rb, node_cache_spec.rb, node_set_cache_spec.rb, allocation_benchmark_spec.rb
  • GHA CI pass

…caching

TODOs 1-5 from the moxml performance plan:

1. Lazy parse (TODO 1): Replace eager DocumentBuilder with
   Document.new(native, ctx). Nodes are wrapped on access, not during
   parse. ~36,640 allocations eliminated per 200-element doc.

2. Cache Node#children (TODO 2): Memoize children NodeSet with
   invalidation on mutation (add_child, remove, replace, siblings).
   ~4,000 redundant NodeSet allocations eliminated.

3. Cache Element#attributes/namespaces (TODO 3): Memoize with
   invalidation on attribute/namespace mutation. ~3,000 wrapper
   allocations eliminated.

4. Cache wrapped nodes in NodeSet (TODO 4): Per-index wrapped cache
   eliminates redundant Node.wrap calls during iteration. ~1,000
   wrapper allocations eliminated.

5. Remove unnecessary allocations (TODO 5): Remove .dup from
   visit_children, set parent refs in Ox adapter children method,
   compact adapter test XML to avoid whitespace text node issues.
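
The memoize-then-invalidate pattern behind TODOs 2 and 3 can be sketched as follows. `ChildWrapper` and `CachedElement` are hypothetical names for this illustration, and the method bodies are assumptions rather than the code in this PR.

```ruby
# Hypothetical stand-ins; not the Node/Element classes from this PR.
ChildWrapper = Struct.new(:native_name)

class CachedElement
  def initialize
    @native_children = []
    @children = nil
  end

  # Builds the wrapper list once and reuses it until a mutation occurs.
  def children
    @children ||= @native_children.map { |name| ChildWrapper.new(name) }.freeze
  end

  def add_child(name)
    @native_children << name
    invalidate_children_cache
    self
  end

  def remove_child(name)
    @native_children.delete(name)
    invalidate_children_cache
    self
  end

  private

  # Every structural mutation must drop the memoized list.
  def invalidate_children_cache
    @children = nil
  end
end

element = CachedElement.new.add_child("a")
first = element.children
puts first.equal?(element.children)  # cache hit: same object
element.add_child("b")
puts first.equal?(element.children)  # cache dropped by the mutation
```

Freezing the cached array keeps callers from mutating it behind the cache's back, which would otherwise leave the wrapper list out of sync with the native children.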

Parent cache invalidation: Store parent_node on child wrappers via
NodeSet, propagate invalidation upward on remove/replace/sibling ops.
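
A rough sketch of why the parent reference matters: when a child removes itself, it is the parent's cached children that become stale, so the child needs a way to reach the parent. `TrackedNode` and its methods are hypothetical; only the `parent_node` name is taken from the text above.

```ruby
# Hypothetical class; only the parent_node name comes from the commit message.
class TrackedNode
  attr_reader :name
  attr_accessor :parent_node

  def initialize(name)
    @name = name
    @kids = []
    @children = nil
    @parent_node = nil
  end

  # Memoized snapshot of the current children.
  def children
    @children ||= @kids.dup.freeze
  end

  def add_child(node)
    node.parent_node = self
    @kids << node
    invalidate_children_cache
    node
  end

  # Called on the child; invalidation is targeted at its parent only.
  def remove
    parent = @parent_node
    return self unless parent

    parent.detach(self)
    @parent_node = nil
    self
  end

  protected

  def detach(node)
    @kids.delete(node)
    invalidate_children_cache
  end

  def invalidate_children_cache
    @children = nil
  end
end

root = TrackedNode.new("root")
a = root.add_child(TrackedNode.new("a"))
root.add_child(TrackedNode.new("b"))
cached = root.children
a.remove
puts cached.equal?(root.children)    # the parent's cache was dropped
puts root.children.map(&:name).inspect
```

Without the back-reference, the only safe alternatives would be to skip caching or to invalidate far more broadly than the one affected parent.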

All adapters (Nokogiri, Ox, Oga, REXML) pass the shared contract tests.
73 performance specs pass under RUN_PERFORMANCE=1.

Add CI-gated allocation benchmarks covering Nokogiri, Ox, HeadedOx, and
OGA adapters. These guards run in every CI build (no :performance tag)
and enforce per-adapter allocation budgets for:

- Parse allocations (100/50 element documents)
- Cache hit verification (children, attributes, iteration)
- Round-trip allocations (parse → serialize → parse)
- Scalability (linear growth, not quadratic)
- Cache invalidation (mutation breaks cache)
- NodeSet wrap cache (identity across accesses)

Measured baselines (allocations per 100-element parse):
  Nokogiri: 299 | Ox: 1,003 | OGA: 8,732 | HeadedOx: 176,472

HeadedOx thresholds are intentionally generous because it still uses
DocumentBuilder (eager parse). Tighten after lazy parse migration.

Also:
- Add AllocationHelper support module with per-adapter thresholds
- Add StackProf diagnostic on guard failure (allocation hotspots)
- Extend lazy_parse, node_cache, node_set_cache specs to all adapters
- Remove :performance tag from correctness tests (now run in CI)
- Add stackprof to Gemfile

Replace DocumentBuilder (eager tree construction) with Document.new
(lazy wrapping) in HeadedOx adapter, matching the Ox adapter approach.
The XPath engine already wraps nodes on-demand via Moxml::Node.wrap,
so the eager DocumentBuilder was pure overhead.

Impact on 100-element parse:
  Before: 176472 allocations (3.93x scalability - nearly quadratic)
  After:    1001 allocations (2.0x scalability - linear)
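
One way to read the scalability figures, assuming they are the ratio of allocations when the document size doubles (for example 50 to 100 elements, which this message does not state explicitly): the growth exponent is the log of that ratio over the log of the size factor. `growth_exponent` and `growth_label` are hypothetical helpers, and the 1.5 cutoff is an arbitrary choice for this sketch.

```ruby
# Hypothetical helpers; the size factor of 2 is an assumption about how the
# scalability ratio was measured.

# Exponent k such that allocations grow like size**k.
def growth_exponent(ratio, size_factor: 2.0)
  Math.log(ratio) / Math.log(size_factor)
end

# The 1.5 cutoff is arbitrary, chosen only to separate the two cases here.
def growth_label(ratio, size_factor: 2.0)
  growth_exponent(ratio, size_factor: size_factor) < 1.5 ? "roughly linear" : "closer to quadratic"
end

[3.93, 2.0].each do |ratio|
  puts format("ratio %.2fx -> exponent %.2f (%s)",
              ratio, growth_exponent(ratio), growth_label(ratio))
end
```

Under that assumption, a 3.93x ratio corresponds to an exponent of about 1.97 and a 2.0x ratio to an exponent of 1.0, which matches the "nearly quadratic" and "linear" labels.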

Also tighten HeadedOx allocation guard thresholds from the bloated
DocumentBuilder budgets to match Ox-level budgets.

ronaldtse merged commit 7575904 into main on Apr 21, 2026
33 of 37 checks passed
ronaldtse deleted the perf/xml-deserialization-optimizations branch on April 21, 2026 02:22