fix(hogql): correct interval string parsing in rust and cpp parsers#66373
fix(hogql): correct interval string parsing in rust and cpp parsers#66373robbie-c wants to merge 9 commits into
Conversation
The rust parser fell through to the `INTERVAL <expr> <unit>` form for a string-literal interval whose string has no internal space (e.g. `interval ''`, `interval 'x'`), then raised a "expected interval unit keyword" SyntaxError when no unit keyword followed. cpp commits to ColumnExprIntervalString and rejects with "Unsupported interval type: must be in the format '<count> <unit>'". Try the expr+unit form first via backtracking (so the no-space string as the head of a longer unit-terminated value, `interval 'a' || 'b' hour`, still parses) and fall back to cpp's rejection when it fails to find a trailing unit. A fatal error from the value parse surfaces as-is. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The rust parser diverged from cpp on two count edge cases in the
`INTERVAL '<count> <unit>'` form:
- An empty count from a space-but-no-digits string (`interval ' '`,
`interval ' day'`) was rejected with "Unsupported interval count:",
but cpp's per-char isdigit loop passes vacuously and then `std::stoi("")`
throws invalid_argument ("Unknown error: stoi: no conversion").
- A count past INT_MAX (`interval '3000000000 day'`) was parsed as i64 and
silently accepted, but cpp's `std::stoi` returns int and throws
out_of_range.
Emulate `std::stoi` faithfully: digit-check (empty passes), reject empty as
"no conversion", parse as i32 so overflow past INT_MAX is "out of range".
Deduplicated the count+unit validation, previously copied in both
parse_interval_expr and parse_interval_string_only, into a shared
emit_interval_combined_string helper so both paths stay in lockstep.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`interval_call_name_case_sensitive` used `trim_end_matches('s')`, which
strips every trailing 's', so the rust parser silently accepted doubled
plurals like `INTERVAL '1 dayss'` (as toIntervalDay) and `'1 secondss'`.
cpp's visitColumnExprIntervalString matches each unit against exactly its
singular or single-'s' plural and rejects the rest.
Match cpp by listing both accepted forms explicitly. Surfaced by review of
the interval-string parity changes; the helper is the only caller.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Reviews (1): Last reviewed commit: "fix(hogql): reject doubled-plural interv..." | Re-trigger Greptile |
|
It looks like the code of |
ClickHouse stores intervals as Int64 (DataTypeInterval is DataTypeNumberBase<Int64>)
and handles the whole Int64 range correctly; beyond it the value silently overflows.
The combined-string interval form `INTERVAL '<count> <unit>'` was capped at int32 by
the cpp parser's std::stoi, while the equivalent `INTERVAL <count> <unit>` and
`toIntervalX(<count>)` forms already accepted the full range. Align all three forms
by converting the string count with std::stoll (cpp) / i64 (rust), so e.g.
`INTERVAL '3000000000 day'` now parses on both backends and only counts past Int64
max are rejected (matching std::stoll's out_of_range).
- common/hogql_parser/parser_json.cpp: std::stoi -> std::stoll (int -> int64_t)
- rust emit_interval_combined_string: parse the count as i64; error text matches
cpp's stoll what() ("Unknown error: stoll: ...")
- bumps hogql-parser 1.3.72 -> 1.3.73 and hogql-parser-rs 1.3.102 -> 1.3.103
Verified by building both parsers locally and diffing rust vs cpp across the string,
expr+unit, and direct-call forms over the int32/Int64 boundaries. Runtime-confirmed
the Int64 range against the ClickHouse cluster.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
It looks like the code of |
|
It looks like the code of |
✅ Hobby deploy smoke test: PASSEDHobby deployment smoke test passed successfully. |
Replace the leaked C++ stdlib text in `INTERVAL '<count> <unit>'` count errors with clear, parser-controlled messages, kept identical across the cpp and rust parsers: - empty / non-digit count: "Unsupported interval count: '<count>' is not a valid integer" (was rust "Unknown error: stoll: no conversion"; cpp surfaced the same via the std::stoll exception propagating to the generic handler) - count past Int64 max: "Unsupported interval count: '<count>' is too large" (was "Unknown error: stoll: out of range") cpp now catches std::out_of_range from std::stoll and throws a NotImplementedError, so neither parser leaks the "Unknown error: stoll: ..." text. As a bonus the test no longer depends on the libc++/libstdc++ difference in stoll's what(), so it asserts the full message. Bumps hogql-parser 1.3.73 -> 1.3.74 and hogql-parser-rs 1.3.103 -> 1.3.104. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
It looks like the code of |
Problem
PostHog runs several HogQL parser backends that must produce identical results: the C++ reference (
cpp-json), the hand-rolled Rust port (rust-py/rust-json, now the production default primary), and a legacypythonbackend. The two production parsers diverged from each other and from ClickHouse on severalINTERVAL '<string>'edge cases, and some invalid-interval errors leaked internal C++ stdlib text. The reported case wasinterval ''.Changes
Fixes to
INTERVAL '<count> <unit>'parsing so the rust and cpp parsers agree with each other and with ClickHouse, plus clearer error messages. Behavior onmastervs after this PR:masterinterval '',interval 'x'(no-space string)SyntaxError; cpp: rejectsUnsupported interval type: must be in the format '<count> <unit>'interval ' ',interval ' day'(empty count)Unsupported interval count:; cpp:Unknown error: stoll: no conversionUnsupported interval count: '' is not a valid integerinterval 'twenty days'(non-numeric count)Unsupported interval count: twentyUnsupported interval count: 'twenty' is not a valid integerinterval '1 dayss'(doubled-plural unit)toIntervalDay; cpp: rejectsUnsupported interval unit: dayssinterval '3000000000 day'(int32 < count ≤ Int64 max)std::stoi)toIntervalDay(3000000000)interval '9223372036854775808 day'(count > Int64 max)Unknown error: stoll: out of rangeUnsupported interval count: '9223372036854775808' is too largeTwo things worth calling out:
Count range. ClickHouse stores intervals as
Int64(DataTypeIntervalisDataTypeNumberBase<Int64>) and handles the whole Int64 range correctly; past it the value silently overflows to garbage. The cpp parser capped the string-form count at int32 viastd::stoi, even though the equivalentINTERVAL <count> <unit>andtoIntervalX(<count>)forms already accepted the full range. This change converts the string count withstd::stoll(cpp) /i64(rust) so all three forms accept the same Int64 range and only reject counts past Int64 max.Error messages. Invalid counts used to surface the raw C++ stdlib text (
Unknown error: stoll: out of range/no conversion), which is meaningless to a user. Both parsers now emitUnsupported interval count: '<count>' is not a valid integeror... is too large. cpp catchesstd::out_of_rangefromstd::stolland throws a clean error instead of letting it propagate to the generic handler, so the message is fully parser-controlled and identical across backends (and the test no longer depends on the libc++/libstdc++ difference instoll'swhat()).Other notes:
INTERVAL <expr> <unit>form first so a no-space string that is only the head of a longer unit-terminated value still parses (interval 'a' || 'b' hour), and falls back to the string-form rejection only when no unit closes the value. This mirrors how the C++ ANTLR grammar chooses betweenColumnExprIntervalandColumnExprIntervalString.splural explicitly instead oftrim_end_matches('s'), which stripped every trailingsand over-accepted doubled plurals.hogql-parser1.3.72 → 1.3.74 andhogql-parser-rs→ 1.3.104.How did you test this code?
I'm an agent; I did not do manual UI testing. Automated only:
rust-pyagainstcpp-jsonacross the string, expr+unit, and direct-call forms over the int32 / Int64 boundaries and the empty / non-numeric / doubled-plural / no-space cases. All agree, including the new error messages.toIntervalDay(9223372036854775807)is correct,toIntervalDay(9223372036854775808)silently overflows to a negative value, so Int64 is the meaningful bound.test_interval_combined_string_validates_count_and_unitagainstcpp-json,rust-json, andrust-py(3 passed), plus the fullrust-py+rust-jsonparser suites (943 passed, 504 snapshots). Rust unit tests (20) andcargo fmt/clippy/ruffclean.The shared regression test asserts both backends raise the same
ExposedHogQLErrormessage for each reject case, and compares the rust AST to thecpp-jsonAST for accept cases (including the Int64-max boundary). The legacypythonbackend is deferred for this test (it cannot match C++std::stollbecause Python ints never overflow) and is not wired to any production parser mode.Automatic notifications
Docs update
No docs change needed.
🤖 Agent context
Autonomy: Human-driven (agent-assisted). Driven by Robbie.
Built with Claude Code (Opus 4.8). Skills invoked:
/qa-team(multi-agent review of the diff), which independently surfaced the doubled-plural over-acceptance.Decisions along the way:
interval ''rejects.Int64interval type rather than the arbitrarystd::stoiint32 cap, after confirming against both the ClickHouse source and the live cluster that Int64 is the range it handles correctly. This required changing the cpp reference parser too, not just rust.pythonbackend untouched and deferred.Reviewer note on CI: this PR bumps both parser wheels. The root
pyproject.toml/uv.lockpins are auto-bumped bytests-posthog[bot]after each wheel'sbuild-hogql-parser*workflow publishes. Until both bot commits land, Python CI installs the old wheels and the new tests will be transiently red. That is expected, not a real failure.