perf: use unsynchronized StringBuilderWriter in TomlRenderer by He-Pin · Pull Request #875 · databricks/sjsonnet

He-Pin · 2026-05-30T11:30:40Z

Motivation

std.manifestTomlEx had three sources of avoidable overhead on the hot manifestation path:

Synchronized writer. TomlRenderer and ManifestModule.evalRhs rendered into a java.io.StringWriter, whose backing StringBuffer pays a monitor enter/exit on every write/flush. The FastMaterializeJsonRenderer already uses the unsynchronized StringBuilderWriter (perf: speed up manifest JSON rendering #874); TOML did not.
Redundant field lookups in renderTableInternal. Each key's Val.Obj.value(k) was resolved twice — once to classify scalar vs section, then again to render or recurse. The cache deduplicates the result, but the lookup itself still costs.
Wasted indexing work. visibleKeyNames was iterated and each key binary-searched back into sortedVisibleKeyNames — sortedVisibleKeyNames can be iterated directly, skipping O(n log n) compares per table.
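
For readers unfamiliar with the first point, here is a minimal sketch of what an unsynchronized writer over a StringBuilder can look like. It is illustrative only: SketchStringBuilderWriter, its initialCapacity parameter and renderSample are hypothetical names, and this is not the actual sjsonnet StringBuilderWriter source.

```scala
import java.io.Writer

// Hypothetical sketch, not the sjsonnet class.
// java.io.StringWriter appends into a StringBuffer, whose methods are synchronized.
// This writer appends into a plain StringBuilder, so no monitor is taken per write.
final class SketchStringBuilderWriter(initialCapacity: Int = 16) extends Writer {
  private val sb = new java.lang.StringBuilder(initialCapacity)

  // Every write overload is overridden so calls go straight to the builder.
  override def write(c: Int): Unit = sb.append(c.toChar)
  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = sb.append(cbuf, off, len)
  override def write(str: String): Unit = sb.append(str)
  override def write(str: String, off: Int, len: Int): Unit = sb.append(str, off, off + len)

  // Nothing is buffered outside the builder, so flush and close are no-ops.
  override def flush(): Unit = ()
  override def close(): Unit = ()

  override def toString: String = sb.toString
}

// Hypothetical usage helper: render a tiny fragment into the writer.
def renderSample(): String = {
  val w = new SketchStringBuilderWriter(1024)
  w.write("name = ")
  w.write("\"sjsonnet\"")
  w.write('\n'.toInt)
  w.toString
}
```

Such a writer is only safe when a single thread owns it, which is the usual situation for a renderer that builds one output string per call.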

Modification

Two commits:

perf: use unsynchronized StringBuilderWriter in TomlRenderer — Swap TomlRenderer and the manifestTomlEx render path in ManifestModule from java.io.StringWriter to the package-private StringBuilderWriter. std.deepJoin keeps StringWriter (separate concern).
perf: cache resolved field values and skip binary search in renderTableInternal — Resolve each field once into a resolved: Array[Val] during section classification and reuse it during render/recurse; iterate sortedVisibleKeyNames directly (removes the now-unused sortedKeyIndex binary search); hoist childIndent = cumulatedIndent + indent out of the section loop (was an identical allocation per sibling section); pre-size the output StringBuilderWriter to 1 KiB so small/medium outputs skip the first ~6 doublings.
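
The second commit can be pictured with a toy model. The sketch below assumes a simplified Node/Scalar/Table hierarchy and a simplified output layout; none of these names come from sjsonnet, and the real renderTableInternal works on Val.Obj rather than a Map. It only shows the three ideas: resolve each field once into an array, iterate the sorted keys directly, and compute childIndent once per table.

```scala
// Toy stand-ins (assumptions), not sjsonnet's Val hierarchy.
sealed trait Node
final case class Scalar(text: String) extends Node
final case class Table(fields: Map[String, Node]) extends Node

def renderTable(
    sb: java.lang.StringBuilder,
    table: Table,
    path: String,
    cumulatedIndent: String,
    indent: String): Unit = {
  // Iterate the sorted keys directly; no per-key binary search back into them.
  val sortedKeys: Array[String] = table.fields.keys.toArray.sorted

  // Resolve every field exactly once and reuse the result in both passes.
  val resolved = new Array[Node](sortedKeys.length)
  var i = 0
  while (i < sortedKeys.length) {
    resolved(i) = table.fields(sortedKeys(i))
    i += 1
  }

  // Pass 1: scalars of this table.
  i = 0
  while (i < sortedKeys.length) {
    resolved(i) match {
      case Scalar(t) =>
        sb.append(cumulatedIndent).append(sortedKeys(i)).append(" = ").append(t).append('\n')
      case _ => ()
    }
    i += 1
  }

  // Hoisted: one concatenation per table instead of one per sibling section.
  val childIndent = cumulatedIndent + indent

  // Pass 2: nested sections, reusing the already resolved values.
  i = 0
  while (i < sortedKeys.length) {
    resolved(i) match {
      case t: Table =>
        val childPath = if (path.isEmpty) sortedKeys(i) else path + "." + sortedKeys(i)
        sb.append('\n').append(childIndent).append('[').append(childPath).append("]\n")
        renderTable(sb, t, childPath, childIndent, indent)
      case _ => ()
    }
    i += 1
  }
}

// Hypothetical entry point: pre-size the buffer to 1 KiB, as the commit describes.
def renderToml(doc: Table): String = {
  val sb = new java.lang.StringBuilder(1024)
  renderTable(sb, doc, "", "", "  ")
  sb.toString
}
```

The two passes over resolved mirror the classify-then-render structure described above; the point is that the second pass reads from the array rather than looking the field up again.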

Output is byte-identical (verified at 1,228,186 bytes on the benchmark workload).

Result

Scala Native, hyperfine A/B against master (fc292fa6). Workload: object comprehension over 8000 small tables → ~1.2 MB TOML output (render-dominated). Four interleaved-order passes, --warmup 10 --min-runs 100 --shell=none:

pass	order	before mean	after mean	before min	after min	min ratio
1	before → after	59.4 ± 2.7 ms	53.2 ± 23.4 ms	55.4 ms	43.8 ms	1.27×
2	after → before	64.1 ± 7.7 ms	51.8 ± 12.2 ms	56.4 ms	43.7 ms	1.29×
3	before → after	64.1 ± 8.1 ms	53.2 ± 14.3 ms	56.4 ms	42.0 ms	1.34×
4	after → before	63.3 ± 14.3 ms	49.2 ± 3.7 ms	57.2 ms	42.8 ms	1.34×

Mean is noisy on the host (1.12× – 1.29×), but after is faster in every one of the 4 passes and the min values are tight at ~1.27–1.34× faster (best observed: 42.0 ms vs 56.4 ms, ~25.5% reduction). Output byte-identical, 1,228,186 bytes both sides.
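
The ratio and percentage figures follow from the table by simple arithmetic. A small sketch, using the rounded values printed in the table (so the last digit can differ from ratios computed on unrounded hyperfine data); the names mins, speedup and reductionPercent are only for illustration:

```scala
// (before min, after min) in ms for passes 1 to 4, copied from the table above.
val mins: Seq[(Double, Double)] =
  Seq((55.4, 43.8), (56.4, 43.7), (56.4, 42.0), (57.2, 42.8))

// "N times faster" is before divided by after.
def speedup(before: Double, after: Double): Double = before / after

// Percentage reduction in run time relative to before.
def reductionPercent(before: Double, after: Double): Double =
  (1.0 - after / before) * 100.0

// Per-pass speedups, in table order.
val minRatios: Seq[Double] = mins.map { case (b, a) => speedup(b, a) }
```

For the best observed pass, speedup(56.4, 42.0) is about 1.34 and reductionPercent(56.4, 42.0) is about 25.5.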

For comparison, the StringBuilderWriter swap alone (commit 1) measures ~1.08–1.14× min; the cache + binary-search elimination + childIndent hoist (commit 2) lifts that to ~1.27–1.34× min.

Test plan

./mill __.reformat
./mill 'sjsonnet.jvm[3.3.7]'.test — 519/519 pass
Scala Native A/B hyperfine — 4 interleaved-order passes, all positive; output byte-identical

Rebased onto current master (fc292fa6). The companion commit "speed up manifest JSON rendering" was merged separately as #879, so this PR now contains only the TomlRenderer / ManifestModule changes.

## Motivation

`std.manifestJson`, `std.manifestJsonMinified`, and `std.manifestJsonEx` routed through `java.io.StringWriter`, paying `StringBuffer` synchronization per `write`/`flush` on the hot manifestation path. Source-built jrsonnet comparisons showed sjsonnet trailing on object-heavy manifest workloads.

## Modification

- Add `StringBuilderWriter`: an unsynchronized `Writer` over a `StringBuilder`.
- Add package-private `FastMaterializeJsonRenderer` backed by `StringBuilderWriter`; route the three `std.manifestJson*` builtins through it. Public `MaterializeJsonRenderer` ABI/shape unchanged.
- Use an in-place codepoint sort for `sortedVisibleKeyNames` / `maybeSortKeys` (avoids `.sorted` boxing).
- Fix codepoint comparison for raw surrogate prefixes; `UnicodeHandlingTests` extended.

## Result

Scala Native hyperfine on kube-prometheus, jrsonnet HEAD `2d7eed05`:

| Workload (native) | Before | After | Δ |
|---|---:|---:|---:|
| kube-prometheus, sjsonnet | 158.4 ± 16.8 ms | 143.7 ± 3.2 ms | **−9.3%** |
| `manifestJsonEx`, sjsonnet | — | 5.09 ± 1.01 ms | new |

## Test plan

- [x] `./mill __.reformat`
- [x] `./mill 'sjsonnet.jvm[3.3.7]'.test` — 518/518 pass

This PR is the base for the stacked follow-ups #875 (TomlRenderer reuses `StringBuilderWriter`), and the independent #876/#877/#878.

std.manifestTomlEx routed through java.io.StringWriter, whose backing StringBuffer pays a monitor enter/exit on every write/flush on the hot TOML manifestation path. Switch TomlRenderer and the manifestTomlEx render path in ManifestModule to the unsynchronized package-private StringBuilderWriter (the same writer the JSON manifest renderer uses). Output is byte-identical; std.deepJoin keeps StringWriter (separate concern). Result (Scala Native hyperfine, TOML-heavy workload, ~1.8 MB output): after ran 1.11 ± 0.07x faster than before (~10%); output byte-identical.

He-Pin · 2026-06-03T04:13:30Z

@stephenamar-db rebased

…leInternal

Each TOML table iteration was doing redundant work for every key:

* v.value(k) was called twice — once to classify scalar vs section, then again to render or recurse. The cache deduplicates the result but the lookup itself still costs.
* visibleKeyNames was iterated and each key binary-searched back into sortedVisibleKeyNames. Iterating sortedVisibleKeyNames directly is simpler and skips O(n log n) compares per table.
* childIndent (cumulatedIndent + indent) was allocated inside the section loop once per section, all producing the same String for sibling sections.

Also pre-size the output StringBuilderWriter to 1 KiB at the manifestTomlEx entry point so small/medium outputs skip the first few StringBuilder doublings.

Output byte-identical (no behavior change).
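
As a rough check on the pre-sizing claim, the sketch below counts how many buffer growths are skipped by starting at 1 KiB. It assumes a default capacity of 16 chars and the JDK-style growth rule of old * 2 + 2; the Scala Native StringBuilder may grow differently, so treat the result as an estimate. growthsToReach is a hypothetical helper, not part of sjsonnet.

```scala
// Assumption: capacity grows as old * 2 + 2 each time the buffer is full
// (the rule used by the JDK's AbstractStringBuilder; other runtimes may differ).
def growthsToReach(initialCapacity: Int, target: Int): Int = {
  var cap = initialCapacity
  var growths = 0
  while (cap < target) {
    cap = cap * 2 + 2
    growths += 1
  }
  growths
}

// Growths needed to get from the assumed default of 16 chars to at least 1024.
val skippedByPreSizing: Int = growthsToReach(16, 1024)
```

Under these assumptions the count is 6 (16, 34, 70, 142, 286, 574, 1150), and each growth would also copy the contents written so far.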

This was referenced May 30, 2026

perf: speed up manifest JSON rendering #874

Closed

perf: speed up manifest JSON rendering #879

Merged

He-Pin marked this pull request as draft May 30, 2026 12:48

He-Pin marked this pull request as ready for review June 3, 2026 02:57

He-Pin force-pushed the perf/toml-stringbuilder-writer branch from 933ed41 to e327ba2 Compare June 3, 2026 03:52

He-Pin marked this pull request as draft June 3, 2026 04:35

He-Pin marked this pull request as ready for review June 3, 2026 04:53

He-Pin mentioned this pull request Jun 3, 2026

perf: use unsynchronized StringBuilderWriter in std.deepJoin #889

Open

3 tasks

He-Pin wants to merge 2 commits into databricks:master from He-Pin:perf/toml-stringbuilder-writer
