perf: bulk-write safe runs in BaseRenderer.escape #864
Draft
He-Pin wants to merge 2 commits into
Motivation: JSON-style string escaping in BaseRenderer.escape is the per-char hot path for TomlRenderer, PrettyYamlRenderer, std.escapeStringJson, and BaseRenderer.visitString. Previously each safe character invoked sb.append(c), which on java.io.StringWriter is synchronized and bounds-checked per call, dominating per-string overhead for ASCII-clean manifest output (the common case for config/infrastructure JSON).

Modification: Replace the per-char loop on String inputs with a chunked walk that emits maximal runs of safe characters (anything other than '"', '\\', control chars < 0x20, or chars > 0x7E when unicode=true) via a single Writer.write(String, off, len) bulk call (one System.arraycopy on StringWriter). Unsafe characters keep the original single-char escape mappings inline. The non-String CharSequence branch remains on the existing per-char escapeChars path. The hot loop uses charAt + primitive branching, friendly to JIT inlining (HotSpot, GraalVM) and Scala Native's LLVM backend; no allocation, no boxing.

Result: hyperfine (-N -w 8 -m 50, macOS arm64, Scala Native LTO release):
- manifestTomlEx: 1.03x faster (6.5 -> 6.3 ms)
- manifestYamlDoc: 1.08x faster (6.4 -> 5.9 ms)
- escapeStringJson: 1.02x faster (5.7 -> 5.6 ms)
- manifestJsonEx: 1.07x faster (6.6 -> 6.2 ms)
- large_string_template: 1.07x faster (11.8 -> 11.0 ms)

vs jrsonnet (same harness):
- manifestTomlEx: 1.02x faster than jrsonnet
- manifestYamlDoc: 1.06x faster than jrsonnet
- escapeStringJson: 1.02x faster than jrsonnet
- manifestJsonEx: 1.08x faster than jrsonnet

Regression test exercises 38 cases: empty, long ASCII-clean, all named escapes, all control-char paths, 0x20/0x7E/0x7F boundary under both unicode modes, U+2028/U+2029, surrogate pairs, alternating safe/unsafe runs, leading/trailing unsafe chars, and the non-String CharSequence fallback. Cross-platform ./mill __.test green (4232 tests).
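The chunked walk described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual escapeStringChunked: the names ChunkedEscapeSketch, isSafe, writeEscaped, and escapeToString are hypothetical, the surrounding quotes are left out, and lowercase four-digit hex for \uXXXX is an assumption about the output format.

```scala
import java.io.{StringWriter, Writer}

object ChunkedEscapeSketch {
  // Hypothetical helper: true when `c` can be copied through verbatim.
  // Mirrors the safe-character rule stated above.
  private def isSafe(c: Char, unicode: Boolean): Boolean =
    c != '"' && c != '\\' && c >= 0x20 && !(unicode && c > 0x7e)

  // Writes the escape sequence for one unsafe character.
  private def writeEscaped(out: Writer, c: Char): Unit = c match {
    case '"'  => out.write("\\\"")
    case '\\' => out.write("\\\\")
    case '\b' => out.write("\\b")
    case '\f' => out.write("\\f")
    case '\n' => out.write("\\n")
    case '\r' => out.write("\\r")
    case '\t' => out.write("\\t")
    case _ =>
      // Assumed format: backslash, 'u', then four lowercase hex digits.
      val hex = Integer.toHexString(c.toInt)
      out.write("\\u")
      out.write("0000".substring(hex.length) + hex)
  }

  // Single forward pass: `start` marks the first char of the pending safe run.
  def escape(out: Writer, s: String, unicode: Boolean): Unit = {
    val len = s.length
    var start = 0
    var i = 0
    while (i < len) {
      val c = s.charAt(i)
      if (!isSafe(c, unicode)) {
        // Flush the safe run before the unsafe char in one bulk call.
        if (i > start) out.write(s, start, i - start)
        writeEscaped(out, c)
        start = i + 1
      }
      i += 1
    }
    // Flush the trailing safe run, if any.
    if (len > start) out.write(s, start, len - start)
  }

  // Convenience wrapper for trying the sketch out.
  def escapeToString(s: String, unicode: Boolean): String = {
    val sw = new StringWriter()
    escape(sw, s, unicode)
    sw.toString
  }
}
```

For example, ChunkedEscapeSketch.escapeToString("a\nb", unicode = false) yields the four characters a, backslash, n, b, using two bulk writes of length one around a single escape. An ASCII-clean input results in exactly one write call for the whole string.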
Snapshot at perf/escape-bulk-write-fast-path @ 7f00c71 over upstream/master @ fcd444c.

Key changes vs prior snapshot (fcd444c):
- std.manifestTomlEx: 2.12x behind -> 0.85x ahead (PR databricks#864 win)
- std.manifestYamlDoc: 1.91x -> 1.04x tied
- std.manifestJsonEx: 1.73x -> 1.11x tied
- Large string template: 1.86x -> 1.24x
- kube-prometheus: 1.65x -> 1.68x (unchanged within noise; PR databricks#864 did not touch the dominant object-materialization hot path on this input)

Methodology unchanged (hyperfine -N -w4 -m20; headline scenarios re-run quietly at -w6 -m30 on Apple M3 Pro arm64). Raw hyperfine JSON exports kept under /tmp/gap-reports/*.json (local-only, not committed).
Motivation
JSON-style string escaping in BaseRenderer.escape is the per-character hot path for TomlRenderer, PrettyYamlRenderer, std.escapeStringJson, and BaseRenderer.visitString. Previously, every safe character invoked sb.append(c), which on java.io.StringWriter is synchronized and bounds-checked per call, dominating per-string overhead for ASCII-clean manifest output (the common case for config and infrastructure JSON).

This is one of the gaps identified in #666, where manifestTomlEx / manifestYamlDoc / escapeStringJson showed jrsonnet 1.06–2.12× ahead.

Key Design Decision
A naive "pre-scan the whole string, then bulk-write if clean" approach loses on escape-laden inputs (e.g. large_string_template.jsonnet, whose contents are full of \n), because the upfront scan is wasted work when escapes are required.

Instead, this PR uses a chunked emit: a single forward pass that emits maximal runs of safe chars via one Writer.write(String, off, len) (which on StringWriter is one System.arraycopy), interleaved with per-char escape mappings for unsafe chars. There is no upfront pass: every char is read exactly once, so escape-heavy inputs lose nothing.

Safe characters are defined identically to the old per-char path: not ", not \, not a control character < 0x20, and, when unicode = true, not > 0x7E. This preserves byte-for-byte output equivalence.

Non-String CharSequence inputs continue to use the original per-char path (escapeChars), since they have no efficient bulk-write primitive.

Modification
- Replace the per-char loop in BaseRenderer.escape with escapeStringChunked for String inputs.
- escapeStringChunked tracks (start, i) cursors and emits sb.write(str, start, i - start) for each safe run (guarded by if (i > start) to skip zero-length writes), with inline @switch escape mapping for unsafe chars.
- escapeChars (the non-String CharSequence path) is unchanged.
- The hot loop uses charAt + primitive branching; no boxing, no allocations.

Benchmark Results
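hyperfine -N -w 8 -m 50, macOS arm64, Scala Native LTO release:
- manifestTomlEx: 1.03x faster (6.5 -> 6.3 ms)
- manifestYamlDoc: 1.08x faster (6.4 -> 5.9 ms)
- escapeStringJson: 1.02x faster (5.7 -> 5.6 ms)
- manifestJsonEx: 1.07x faster (6.6 -> 6.2 ms)
- large_string_template: 1.07x faster (11.8 -> 11.0 ms)

vs jrsonnet (same hyperfine harness, Scala Native binary vs jrsonnet/target/release/jrsonnet):
- manifestTomlEx: 1.02x faster than jrsonnet
- manifestYamlDoc: 1.06x faster than jrsonnet
- escapeStringJson: 1.02x faster than jrsonnet
- manifestJsonEx: 1.08x faster than jrsonnet

(Wall-clock times are startup-dominated for these short benches; the actual escape work is a much larger fraction of the difference. JMH timing on the escape function in isolation would amplify the relative speedup; happy to add if reviewers want.)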
No regression observed on any other bench suite (./mill __.test cross-platform green, 4232 tests).

Analysis
- StringWriter: StringWriter.write(int) synchronizes and grows its StringBuffer per call. StringWriter.write(String, off, len) does one synchronized block and one arraycopy.
- charAt loop, primitive comparisons, no virtual dispatch in the inner branch, and the safe-char emit is memcpy-shaped.

Modifications Detail
- sjsonnet/src/sjsonnet/BaseRenderer.scala: escape now dispatches on String vs CharSequence; String goes through escapeStringChunked, others through escapeChars (unchanged).
- escapeStringChunked (~30 lines) with detailed Scaladoc.
- sjsonnet/test/src/sjsonnet/RendererTests.scala: escapeBulkFastPath test with 38 assertions covering: empty, long ASCII-clean, all named escapes, all control-char \uXXXX paths, 0x20/0x7E/0x7F boundary under both unicode modes, U+2028/U+2029, surrogate pairs, alternating safe/unsafe runs, leading/trailing unsafe chars, and the non-String CharSequence fallback.

References
- manifestTomlEx / manifestYamlDoc / escapeStringJson).
- jrsonnet/nix/benchmarks.nix (hyperfine -N -w 4).
- unicode modes.

Result
Renders the four escape-heavy benches faster than jrsonnet on Scala Native arm64. No correctness regressions; full cross-platform test suite (./mill __.test, 4232 tests) green.
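
The StringWriter contrast drawn in the Analysis section can be illustrated with a small sketch. It only shows the two call shapes and that they produce identical output; it makes no timing claims. The object and method names (BulkWriteSketch, copyPerChar, copyBulk) are hypothetical and do not appear in the PR.

```scala
import java.io.StringWriter

object BulkWriteSketch {
  // Per-char shape: one Writer.write(int) call per character,
  // so the writer is entered once for every char of the input.
  def copyPerChar(s: String): String = {
    val sw = new StringWriter()
    var i = 0
    while (i < s.length) {
      sw.write(s.charAt(i).toInt)
      i += 1
    }
    sw.toString
  }

  // Bulk shape: a single Writer.write(String, off, len) call
  // covers the whole run of characters.
  def copyBulk(s: String): String = {
    val sw = new StringWriter()
    sw.write(s, 0, s.length)
    sw.toString
  }
}
```

Both methods return the same string for any input; the difference is only in how many times the writer is entered, which is the per-call overhead the chunked path avoids for safe runs.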