perf: hand-rolled YAML quote scanner removes fastparse allocations in Native #865
Open
He-Pin wants to merge 2 commits into
…hine
Motivation:
async-profiler on the Scala Native AOT build over the kube-prometheus
workload (jrsonnet/tests/realworld/entry-kube-prometheus.jsonnet) showed
27% of total CPU was spent in fastparse Failure message construction:
fastparse.Parsed$Failure$.formatMsg 17.8% self
fastparse.Parsed$Failure.msg 9.1% self
Root cause: `Parsed.fromParsingRun` eagerly evaluates the failure
message string regardless of `verboseFailures`. `stringNeedsToBeQuoted`
calls `fastparse.parse(...).isSuccess` once per emitted YAML scalar
(via `std.manifestYamlDoc` / `PrettyYamlRenderer`), so every "this
string is fine as bare YAML" decision pays the failure-message tax.
kube-prom emits tens of thousands of YAML scalars per render
(alertmanager / prometheus-adapter / grafana configs), so this single
function dominates the native profile.
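The cost pattern described above can be illustrated with a toy model. This is a minimal sketch, not fastparse's source: `LazyMsg`, `Result`, `fromRun`, and `parseKeyword` are hypothetical stand-ins, and only the `Option(...).fold("")(_.render)` shape is taken from the PR description. It shows that a caller reading only `isSuccess` still pays for every rendered failure message when the conversion is eager.

```scala
// Toy model only: these types are hypothetical and are NOT fastparse's source.
object EagerFailureSketch {
  var renders = 0 // counts how many failure messages were actually built

  // Stand-in for a lazily described failure message.
  final class LazyMsg(thunk: () => String) {
    def render: String = { renders += 1; thunk() }
  }

  sealed trait Result { def isSuccess: Boolean }
  case object Ok extends Result { val isSuccess = true }
  final case class Fail(msg: String) extends Result { val isSuccess = false }

  // Eager conversion: the message is rendered while the result is built,
  // even if the caller only ever reads `isSuccess`.
  def fromRun(matched: Boolean, lastFailure: LazyMsg): Result =
    if (matched) Ok else Fail(Option(lastFailure).fold("")(_.render))

  // Stand-in for a "does this scalar look like a YAML keyword?" parser.
  def parseKeyword(s: String): Result =
    fromRun(
      s == "true" || s == "false",
      new LazyMsg(() => s"expected keyword, found '$s'")
    )

  def main(args: Array[String]): Unit = {
    val scalars = Seq("alertmanager", "true", "grafana", "prometheus-adapter")
    val keywords = scalars.count(s => parseKeyword(s).isSuccess)
    println(s"keywords=$keywords, messages rendered=$renders")
  }
}
```

In this model every scalar that is fine as bare YAML is a parse failure, so the common case is exactly the one that builds a message string.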
Modification:
Replace the fastparse parser in `PrettyYamlRenderer.stringNeedsToBeQuoted`
with a hand-rolled, allocation-free scanner that reproduces the original
grammar exactly:
- yamlPunctuation prefix (single-char + ' '-suffixed '-', ':', '?')
- yamlKeyword whole-string match (yes/no/true/false/on/off/null/~/-/=
and their capitalized variants)
- yamlTime (dd:dd, exactly 5 chars)
- yamlDate (YYYY-M[M]-D[D] with optional datetime suffix)
- yamlNumber ('-'? .inf | float | 0x.. | 0o.. | digits)
- trailing/substring fallbacks (': ', ' #', trailing ':' / space)
A `StringNeedsToBeQuotedTests` suite exhaustively exercises every
grammar branch (~80 cases) and was used to validate equivalence with the
prior fastparse implementation.
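A few of the grammar branches listed above can be sketched as a scanner that works directly on `charAt` without building intermediate objects. This is an illustrative subset under stated assumptions, not the code in `PrettyYamlRenderer.scala`: the object and method names are hypothetical, the keyword list covers only the lowercase and capitalized forms, and the punctuation, date, and number branches are omitted.

```scala
// Illustrative sketch only; names and the exact rule set are assumptions,
// not the implementation from the PR.
object QuoteScanSketch {
  private def isDigit(c: Char): Boolean = c >= '0' && c <= '9'

  // yamlTime branch as described above: dd:dd, exactly 5 chars.
  def isTime(s: String): Boolean =
    s.length == 5 &&
      isDigit(s.charAt(0)) && isDigit(s.charAt(1)) &&
      s.charAt(2) == ':' &&
      isDigit(s.charAt(3)) && isDigit(s.charAt(4))

  // yamlKeyword branch (subset): whole-string match only.
  def isKeyword(s: String): Boolean = s match {
    case "yes" | "no" | "true" | "false" | "on" | "off" | "null" | "~" | "-" | "=" => true
    case "Yes" | "No" | "True" | "False" | "On" | "Off" | "Null"                   => true
    case _                                                                         => false
  }

  // Fallbacks: ": " or " #" anywhere, or a trailing ':' / space.
  def hasFallbackTrigger(s: String): Boolean = {
    val n = s.length
    if (n == 0) false
    else if (s.charAt(n - 1) == ':' || s.charAt(n - 1) == ' ') true
    else {
      var i = 0
      var hit = false
      while (!hit && i < n - 1) {
        val a = s.charAt(i)
        val b = s.charAt(i + 1)
        hit = (a == ':' && b == ' ') || (a == ' ' && b == '#')
        i += 1
      }
      hit
    }
  }

  // Combines only the branches sketched here; the real grammar has more.
  def needsQuotes(s: String): Boolean =
    isKeyword(s) || isTime(s) || hasFallbackTrigger(s)

  def main(args: Array[String]): Unit =
    Seq("true", "12:30", "grafana", "key: value", "a #b").foreach { s =>
      println(s"'$s' -> ${needsQuotes(s)}")
    }
}
```

Each branch returns a plain `Boolean`, so the "no quoting needed" outcome costs a handful of character comparisons rather than a failure object.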
Benchmark Results:
Hyperfine on Scala Native (Apple M-series, aarch64, 50 iters, warmup 5,
`--prepare 'sleep 0.05'`):
baseline (fastparse): 196.0 ms ± 41.7 ms [158.1 - 412.9 ms]
hand-rolled scanner: 168.3 ms ± 29.9 ms [144.6 - 295.1 ms]
speedup: 1.16x mean, -8% min, -29% max (p99)
Tail-latency improvement is more dramatic than mean because the fix
removes a hot allocation path (Failure / Lazy[String] thunks), reducing
GC pressure under the Scala Native immix collector.
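The summary figures follow directly from the two hyperfine rows above. A short sketch of the arithmetic, with hypothetical names (`BenchArithmetic`, `pctChange`) and the measurements copied from the rows:

```scala
// Recomputes the summary line from the two hyperfine rows quoted above.
object BenchArithmetic {
  // Relative change from `before` to `after`, as a percentage.
  def pctChange(before: Double, after: Double): Double =
    (after - before) / before * 100.0

  val speedup: Double = 196.0 / 168.3            // ratio of mean times
  val meanChange: Double = pctChange(196.0, 168.3)
  val minChange: Double = pctChange(158.1, 144.6)
  val maxChange: Double = pctChange(412.9, 295.1)

  def main(args: Array[String]): Unit = {
    println(f"mean speedup: ${speedup}%.2fx")
    println(f"mean change:  ${meanChange}%.1f%%")
    println(f"min change:   ${minChange}%.1f%%")
    println(f"max change:   ${maxChange}%.1f%%")
  }
}
```

The mean change is the source of the wall-clock reduction quoted in the Result section.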
Analysis:
JVM JIT compiles away most of the Failure cost via dead-code elimination
of the unread message strings, so the JVM regression bench
(`bench.runRegressions go_suite/manifestYamlDoc.jsonnet`) shows no
measurable change (0.062 ms/op before and after). The fix is therefore
neutral on JVM and a clear win on Scala Native, exactly matching the
profile-driven hypothesis.
References:
async-profiler / samply native profile,
fastparse/fastparse/src/fastparse/Parsed.scala (fromParsingRun, lines 20-27).
Result:
27% CPU hotspot eliminated, 14% wall-clock reduction on kube-prom under
Scala Native, no JVM regression. All 512 JVM tests pass.
Motivation:
A review of PR databricks#865 found a Scaladoc reference to a non-existent reference function.
Modification:
Clarify that the original FastParse implementation is preserved in the commit message / PR description and that regression expectations were generated from it.
Result:
The Scala 3 JVM tests pass locally.
References:
databricks#865
Motivation
`PrettyYamlRenderer.stringNeedsToBeQuoted` was the top self-time hotspot on the kube-prometheus workload in Scala Native AOT builds, accounting for ~27% of total CPU (`fastparse.Parsed$Failure.formatMsg` 17.8% + `.msg` 9.1%).
Root cause: fastparse's `Parsed.fromParsingRun` eagerly evaluates `Option(p.lastFailureMsg).fold("")(_.render)` regardless of `verboseFailures=false`. Every "no match" branch allocates `Lazy[String]` thunks and renders them. On JVM the JIT can dead-code-eliminate these allocations, but Scala Native AOT cannot. `std.manifestYamlDoc` calls this function once per emitted scalar, so a single kube-prom render does it thousands of times.
Key Design Decisions
- The hand-rolled scanner reproduces the fastparse grammar of `stringNeedsToBeQuoted` 1:1: no parser combinator overhead, no message allocations, no `Lazy[String]` thunks.
- `StringNeedsToBeQuotedTests` with ~80 equivalence cases covering: punctuation prefixes, YAML keywords (yes/no/true/false/on/off/null/~), time `HH:MM:SS`, dates `YYYY-MM-DD`, integers, floats, octals, hex, leading/trailing space, control chars, ASCII vs non-ASCII boundaries.
Modification
- Replaced `stringNeedsToBeQuoted` in `PrettyYamlRenderer.scala` with a hand-rolled scanner plus helpers (`isYamlKeywordExact`, `isYamlTimeExact`, `isYamlDateExact`, `isYamlNumberExact`, `isYamlFloatFromDot`, `isYamlOctalHex`).
- Test suite: `sjsonnet/test/src/sjsonnet/StringNeedsToBeQuotedTests.scala`.
Benchmark Results
Workload: `jrsonnet/tests/realworld/entry-kube-prometheus.jsonnet` with `-J jrsonnet/tests/realworld/vendor`.
Scala Native AOT, hyperfine `-w 5 -r 50 -N`:
JVM warm `bench.runRegressions`: neutral (within noise), as expected; the JIT already handled this.
Analysis
The fastparse library's `Parsed.Failure` message materialization is unavoidable in current versions for AOT-compiled callers. Replacing the inner-loop parser with a direct scanner removes the entire allocation chain for the success path. This is a Native-specific win that JVM users won't notice but Native deployments will.
References
- `PrettyYamlRenderer.scala:447-630`
- `StringNeedsToBeQuotedTests.scala`
- `samply` capture on Apple M-series, 600 iters, identified the dual hotspots `Parsed$Failure.formatMsg` + `.msg`.
Result