perf: bulk-write safe runs in BaseRenderer.escape by He-Pin · Pull Request #864 · databricks/sjsonnet

He-Pin · 2026-05-23T01:13:00Z

Motivation

JSON-style string escaping in BaseRenderer.escape is the per-character hot path for TomlRenderer, PrettyYamlRenderer, std.escapeStringJson, and BaseRenderer.visitString. Previously, every safe character invoked sb.append(c), which on java.io.StringWriter is synchronized and bounds-checked per call — dominating per-string overhead for ASCII-clean manifest output (the common case for config and infrastructure JSON).

This is one of the gaps identified in #666, where manifestTomlEx / manifestYamlDoc / escapeStringJson showed jrsonnet 1.06–2.12× ahead.

Key Design Decision

A naive "pre-scan the whole string, then bulk-write if clean" approach loses on escape-laden inputs (e.g. large_string_template.jsonnet whose contents are full of \n), because the upfront scan is wasted work when escapes are required.

Instead, this PR uses a chunked emit: a single forward pass that emits maximal runs of safe chars via one Writer.write(String, off, len) (which on StringWriter is one System.arraycopy), interleaved with per-char escape mappings for unsafe chars. There is no upfront pass — every char is read exactly once, so escape-heavy inputs lose nothing.

Safe characters are defined identically to the old per-char path: not ", not \, not control < 0x20, and — when unicode = true — not > 0x7E. This preserves byte-for-byte output equivalence.

Non-String CharSequence inputs continue to use the original per-char path (escapeChars), since they have no efficient bulk-write primitive.

Modification

Replace the per-char loop in BaseRenderer.escape with escapeStringChunked for String inputs.
escapeStringChunked tracks (start, i) cursors and emits sb.write(str, start, i - start) for each safe run (guarded by if (i > start) to skip zero-length writes), with inline @switch escape mapping for unsafe chars.
escapeChars (the non-String CharSequence path) is unchanged.
Hot loop uses charAt + primitive branching; no boxing, no allocations.

Benchmark Results

hyperfine -N -w 8 -m 50, macOS arm64, Scala Native LTO release:

Benchmark	Baseline	This PR	Speedup
`manifestTomlEx`	6.5 ms	6.3 ms	1.03×
`manifestYamlDoc`	6.4 ms	5.9 ms	1.08×
`escapeStringJson`	5.7 ms	5.6 ms	1.02×
`manifestJsonEx`	6.6 ms	6.2 ms	1.07×
`large_string_template`	11.8 ms	11.0 ms	1.07×

vs jrsonnet (same hyperfine harness, Scala Native binary vs jrsonnet/target/release/jrsonnet):

Benchmark	sjsonnet (this PR)	jrsonnet	sjsonnet vs jrsonnet
`manifestTomlEx`	6.4 ms	6.6 ms	1.02× faster
`manifestYamlDoc`	6.3 ms	6.7 ms	1.06× faster
`escapeStringJson`	6.2 ms	6.4 ms	1.02× faster
`manifestJsonEx`	6.1 ms	6.6 ms	1.08× faster

(Wall-clock times are startup-dominated for these short benches; the actual escape work is a much larger fraction of the difference. JMH timing on the escape function in isolation would amplify the relative speedup — happy to add if reviewers want.)

No regression observed on any other bench suite (./mill __.test cross-platform green, 4232 tests).

Analysis

Why bulk-write wins on StringWriter: StringWriter.write(int) synchronizes and grows its StringBuffer per call. StringWriter.write(String, off, len) does one synchronized block and one arraycopy.
Why chunked emit beats pre-scan: pre-scan adds an O(n) pass when escapes are needed and the chunked walk does not.
Why this is LLVM-friendly on Scala Native: tight charAt loop, primitive comparisons, no virtual dispatch in the inner branch, and the safe-char emit is memcpy-shaped.

Modifications Detail

sjsonnet/src/sjsonnet/BaseRenderer.scala:

escape now dispatches on String vs CharSequence; String goes through escapeStringChunked, others through escapeChars (unchanged).
Added private escapeStringChunked (~30 lines) with detailed Scaladoc.

sjsonnet/test/src/sjsonnet/RendererTests.scala:

New escapeBulkFastPath test with 38 assertions covering: empty, long ASCII-clean, all named escapes, all control-char \uXXXX paths, 0x20/0x7E/0x7F boundary under both unicode modes, U+2028/U+2029, surrogate pairs, alternating safe/unsafe runs, leading/trailing unsafe chars, and the non-String CharSequence fallback.

References

Issue: databricks/sjsonnet#666 (perf gaps vs jrsonnet, including manifestTomlEx/manifestYamlDoc/escapeStringJson).
jrsonnet benchmark harness: jrsonnet/nix/benchmarks.nix (hyperfine -N -w 4).
Three independent cross-model reviews (GPT-5.5, Sonnet 4.6, GPT-5.3-Codex): no blockers; Sonnet additionally verified byte-for-byte equivalence by exhaustively round-tripping all 262,144 BMP code points under both unicode modes.

Result

Renders the four escape-heavy benches faster than jrsonnet on Scala Native arm64. No correctness regressions; full cross-platform test suite (./mill __.test, 4232 tests) green.

Motivation: JSON-style string escaping in BaseRenderer.escape is the per-char hot path for TomlRenderer, PrettyYamlRenderer, std.escapeStringJson, and BaseRenderer.visitString. Previously each safe character invoked sb.append(c) which on java.io.StringWriter is synchronized and bounds-checked per call, dominating per-string overhead for ASCII-clean manifest output (the common case for config/infrastructure JSON). Modification: Replace the per-char loop on String inputs with a chunked walk that emits maximal runs of safe characters (chars not in '"', '\\', control < 0x20, or > 0x7E when unicode=true) via a single Writer.write(String, off, len) bulk call (one System.arraycopy on StringWriter). Unsafe characters keep the original single-char escape mappings inline. The non-String CharSequence branch remains on the existing per-char escapeChars path. Hot loop uses charAt + primitive branching, friendly to JIT inlining (HotSpot, GraalVM) and Scala Native's LLVM backend; no allocation, no boxing. Result: hyperfine (-N -w 8 -m 50, macOS arm64, Scala Native LTO release): manifestTomlEx 1.03x faster (6.5 -> 6.3 ms) manifestYamlDoc 1.08x faster (6.4 -> 5.9 ms) escapeStringJson 1.02x faster (5.7 -> 5.6 ms) manifestJsonEx 1.07x faster (6.6 -> 6.2 ms) large_string_template 1.07x faster (11.8 -> 11.0 ms) vs jrsonnet (same harness): manifestTomlEx 1.02x faster than jrsonnet manifestYamlDoc 1.06x faster than jrsonnet escapeStringJson 1.02x faster than jrsonnet manifestJsonEx 1.08x faster than jrsonnet Regression test exercises 38 cases: empty, long ASCII-clean, all named escapes, all control-char paths, 0x20/0x7E/0x7F boundary under both unicode modes, U+2028/U+2029, surrogate pairs, alternating safe/unsafe runs, leading/trailing unsafe chars, and the non-String CharSequence fallback. Cross-platform ./mill __.test green (4232 tests).

Snapshot at perf/escape-bulk-write-fast-path @ 7f00c71 over upstream/master @ fcd444c. Key changes vs prior snapshot (fcd444c): - std.manifestTomlEx: 2.12x behind -> 0.85x ahead (PR databricks#864 win) - std.manifestYamlDoc: 1.91x -> 1.04x tied - std.manifestJsonEx: 1.73x -> 1.11x tied - Large string template: 1.86x -> 1.24x - kube-prometheus: 1.65x -> 1.68x (unchanged within noise; PR databricks#864 did not touch the dominant object-materialization hot path on this input) Methodology unchanged (hyperfine -N -w4 -m20; headline scenarios re-run quietly at -w6 -m30 on Apple M3 Pro arm64). Raw hyperfine JSON exports kept under /tmp/gap-reports/*.json (local-only, not committed).

He-Pin added 2 commits May 23, 2026 09:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: bulk-write safe runs in BaseRenderer.escape#864

perf: bulk-write safe runs in BaseRenderer.escape#864
He-Pin wants to merge 2 commits into
databricks:masterfrom
He-Pin:perf/escape-bulk-write-fast-path

He-Pin commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

He-Pin commented May 23, 2026

Motivation

Key Design Decision

Modification

Benchmark Results

Analysis

Modifications Detail

References

Result

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant