perf: skip UTF-8 encode for clean-ASCII long strings in renderer #866
Open
He-Pin wants to merge 1 commit into
Motivation:
async-profiler on the Scala Native kube-prometheus workload shows
HeapCharBuffer.wrap accounting for 40.3% of GC-allocation parents
(GC itself is ~25-30% of native runtime). The wrap site is
String.getBytes(UTF_8) called once per long (>=128 char) JSON string
inside BaseByteRenderer.visitLongString. Each call also allocates an
output byte[]. In K8s manifest output the overwhelming majority of
these long values (descriptions, annotations, base64 blobs, paths)
are pure printable ASCII with no JSON-escape characters.
Modification:
At the top of visitLongString, probe the string with the existing
Platform.isAsciiJsonSafe SWAR scan (16 chars/Long word, no allocation).
On a positive probe, delegate to renderAsciiSafeString which uses
Platform.copyAsciiStringToBytes for a direct char->byte memcpy and
skips the CharsetEncoder, HeapCharBuffer, and intermediate byte[]
entirely. Strings that contain any escape-requiring char or any
non-ASCII codepoint fall through to the existing byte-SWAR path
unchanged — they pay one SWAR scan over chars (~bLen/16 Long reads)
on top of the existing work, which is dominated by the encode cost
they already perform.
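
A minimal sketch of the probe-then-copy dispatch described above. The names `AsciiFastPathSketch`, `renderLongString`, and `encodeWithEscapes` are hypothetical; the real `Platform.isAsciiJsonSafe`, `renderAsciiSafeString`, and `Platform.copyAsciiStringToBytes` are not shown in this PR text and may differ. Assumptions made here: "safe" means every code unit is in 0x20..0x7F and is neither `"` nor `\`; the probe packs 4 UTF-16 code units per Long (it does not try to reproduce the 16 chars/word figure quoted above); and the result is returned as a fresh array rather than written into the renderer's buffer.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object AsciiFastPathSketch {
  // Four 16-bit lanes per 64-bit word.
  private final val HighBits = 0xFF80FF80FF80FF80L // non-zero => some lane >= 0x80
  private final val Bit7     = 0x0080008000800080L
  private final val Add60    = 0x0060006000600060L // v + 0x60 sets bit 7 iff v >= 0x20
  private final val Add7F    = 0x007F007F007F007FL // x + 0x7F sets bit 7 iff x >= 1
  private final val Quotes   = 0x0022002200220022L
  private final val Slashes  = 0x005C005C005C005CL

  // Only valid when every lane of x is < 0x80, so lane sums cannot carry.
  private def hasZeroLane(x: Long): Boolean = ((x + Add7F) & Bit7) != Bit7

  private def wordIsSafe(w: Long): Boolean =
    (w & HighBits) == 0L &&            // all lanes are 7-bit ASCII
      ((w + Add60) & Bit7) == Bit7 &&  // no lane below 0x20 (control chars)
      !hasZeroLane(w ^ Quotes) &&      // no '"'
      !hasZeroLane(w ^ Slashes)        // no '\'

  private def charIsSafe(c: Char): Boolean =
    c >= 0x20 && c < 0x80 && c != '"' && c != '\\'

  /** Allocation-free probe: true iff the string can be copied verbatim. */
  def isAsciiJsonSafe(s: String): Boolean = {
    val n = s.length
    var i = 0
    while (i + 4 <= n) {
      val w = s.charAt(i).toLong |
        (s.charAt(i + 1).toLong << 16) |
        (s.charAt(i + 2).toLong << 32) |
        (s.charAt(i + 3).toLong << 48)
      if (!wordIsSafe(w)) return false
      i += 4
    }
    while (i < n) { // scalar tail for the last 0..3 chars
      if (!charIsSafe(s.charAt(i))) return false
      i += 1
    }
    true
  }

  /** Narrow each char to one byte; only correct after a positive probe. */
  def copyAsciiToBytes(s: String, dst: Array[Byte], offset: Int): Int = {
    var i = 0
    while (i < s.length) { dst(offset + i) = s.charAt(i).toByte; i += 1 }
    offset + s.length
  }

  /** Stand-in for the existing slow path: escape, then UTF-8 encode. */
  def encodeWithEscapes(s: String): Array[Byte] = {
    val sb = new java.lang.StringBuilder(s.length + 2)
    sb.append('"')
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c == '"' || c == '\\') sb.append('\\').append(c)
      else if (c < 0x20)
        sb.append('\\').append('u').append(String.format("%04x", Integer.valueOf(c.toInt)))
      else sb.append(c)
      i += 1
    }
    sb.append('"')
    sb.toString.getBytes(UTF_8)
  }

  /** Probe first; copy directly when clean, otherwise fall through. */
  def renderLongString(s: String): Array[Byte] =
    if (isAsciiJsonSafe(s)) {
      val out = new Array[Byte](s.length + 2)
      out(0) = '"'.toByte
      val end = copyAsciiToBytes(s, out, 1)
      out(end) = '"'.toByte
      out
    } else encodeWithEscapes(s)
}
```

For a clean string the fast branch yields the same bytes as quoting the string and calling `getBytes(UTF_8)`, which is the property behind the byte-identical output check reported below.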
Result:
- ./mill 'sjsonnet.jvm[3.3.7]'.test : 444/444 pass
- Byte-identical output on kube-prometheus (1.5MB / 72k lines)
- hyperfine (Scala Native, kube-prom, 60 runs, warmup 8):
    before: 150.7 ms ± 8.3 ms
    after : 145.9 ms ± 6.2 ms
    => 1.03x faster (-4.8 ms mean, -3.2%)
- ./mill bench.runRegressions : completes successfully across all
  cpp/go/sjsonnet suites with no anomalies.
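
The headline figures can be re-derived from the two hyperfine means, as in the sketch below. The standard-error estimate at the end is an added assumption (independent, roughly normal runs, 60 per side), not something the PR reports; `SpeedupCheck` is a hypothetical name.

```scala
object SpeedupCheck {
  val runs = 60
  val beforeMean = 150.7 // ms
  val beforeSd   = 8.3   // ms
  val afterMean  = 145.9 // ms
  val afterSd    = 6.2   // ms

  val deltaMs      = beforeMean - afterMean        // ≈ 4.8 ms
  val speedup      = beforeMean / afterMean        // ≈ 1.033
  val reductionPct = 100.0 * deltaMs / beforeMean  // ≈ 3.19 %

  // Standard error of the difference of two means (assumption: independent runs).
  val stdErr = math.sqrt(beforeSd * beforeSd / runs + afterSd * afterSd / runs)
  val deltaInStdErrs = deltaMs / stdErr            // ≈ 3.6
}
```

Under those assumptions the 4.8 ms difference is smaller than either run-to-run spread but several standard errors away from zero.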
Analysis:
Modest but real: visitLongString is one call per long output string,
so even on a 72k-line kube-prom output we hit it on the order of
10^4 times. Each spared call avoids two heap allocations and a
CharsetEncoder dispatch. Larger gains require attacking the
remaining UTF-8 path itself (next commits target the escape-needing
branch and the PlatformBase64 zero-copy).
References:
- async-profiler GC-parent analysis on /tmp/sjsonnet-yaml-fix
- Platform.isAsciiJsonSafe / CharSWAR.isAsciiJsonSafe (existing SWAR helper)
- renderAsciiSafeString / Platform.copyAsciiStringToBytes (existing fast path)
Motivation
async-profiler on the Scala Native kube-prometheus workload shows HeapCharBuffer.wrap accounting for 40.3% of GC-allocation parents (GC itself is ~25–30% of native runtime). The wrap site is String.getBytes(UTF_8), called once per long (≥128 char) JSON string inside BaseByteRenderer.visitLongString. Each call also allocates an output byte[]. In K8s manifest output, the overwhelming majority of these long values (descriptions, annotations, base64 blobs, paths) are pure printable ASCII with no JSON-escape characters.
Key Design Decision
Use the existing Platform.isAsciiJsonSafe SWAR scan (16 chars/Long word, no allocation) as a cheap up-front probe. On a positive probe, route to the existing renderAsciiSafeString fast path, which uses Platform.copyAsciiStringToBytes for a direct char→byte memcpy. On a negative probe, fall through unchanged to the byte-SWAR path. The extra cost paid by the non-ASCII branch is one SWAR scan over chars (~bLen/16 Long reads), which is dominated by the encode cost that branch already performs.
Modification
At the top of visitLongString in BaseByteRenderer.scala, probe with Platform.isAsciiJsonSafe(str) and delegate to renderAsciiSafeString(str) when clean. Otherwise the original code runs unchanged.
Benchmark Results
./mill bench.runRegressions completes across all cpp/go/sjsonnet suites with no anomalies.
hyperfine (Scala Native AOT, kube-prometheus, 60 runs, 8 warmup):
[Results table not recovered; surviving row-label fragment: fcd444cc)]
After is 1.03× faster (−4.8 ms mean, −3.2%). σ ratio = 0.07; the improvement is reproducible across runs.
Analysis
Modest but real. visitLongString is one call per long output string, so on a 72k-line kube-prom output we hit it on the order of 10⁴ times. Each spared call avoids two heap allocations and a CharsetEncoder dispatch, which directly reduces the HeapCharBuffer.wrap GC pressure that profiling identified as the largest single allocation source.
Larger gains require attacking the remaining UTF-8 path itself: follow-up commits target the escape-needing branch and PlatformBase64 zero-copy.
References
- /tmp/sjsonnet-yaml-fix
- Platform.isAsciiJsonSafe, renderAsciiSafeString, Platform.copyAsciiStringToBytes
Result
- ./mill 'sjsonnet.jvm[3.3.7]'.test : 444/444 pass
- bench.runRegressions