Fix JSON extraction of $-prefixed property keys#807
Conversation
DuckDB's json_extract[_string](doc, path) treats a '$'-prefixed second
argument as a JSONPath. PostHog/HogQL property keys like "$ai_session_id"
and "$group_0" are not valid JSONPaths ('$' must be followed by '.' or
'['), so DuckDB failed at bind time:
Binder Error: JSON path error near 'ai_session_id'
Normalize the key/path argument of -> / ->> operators and direct
json_extract / json_extract_string calls:
- Literal keys are rewritten at transpile time: $ai_session_id ->
$."$ai_session_id". Plain keys and already-valid JSONPaths ($.foo,
$."foo", $.a[0], $[0]) are preserved unchanged.
- Bound parameters ($N), whose value is unknown at transpile time, are
wrapped in a new duckgres_json_extract_path() macro that applies the
same normalization at runtime. Only JSON path arguments are wrapped,
never ordinary string parameters. The macro is registered on every
backend and emitted memory.main-qualified in DuckLake/Iceberg mode.
Adds transpiler unit tests, server transpile->DuckDB round-trip tests
(literal + bound param), e2e harness assertions on both metadata
backends, and updates the compatibility docs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Test Impact PlanDeterministic summary of how this PR changes tests, CI runners, and coverage-risk signals. Summary
Signals
Coverage risk: needs review Warnings
|
- $1::text / props->>$1::text parse as TypeCast(ParamRef) and bypassed normalization, so a bound $ai_session_id still reached DuckDB as a malformed JSONPath. normalizeJSONExtractPathArg now looks through wrapping casts (and rewrites a literal inside a cast, preserving it). - json ->> $1 with an INTEGER-bound parameter (Postgres array indexing) regressed: the param was wrapped in a macro whose LIKE checks failed to bind on an integer. duckgres_json_extract_path is now type-aware — an integer argument becomes a $[i] JSONPath (array index), while a string "0" stays an object-key lookup, matching Postgres int-vs-text semantics. The string branches cast to VARCHAR so LIKE binds for any bound type. - Documented the deliberate divergence where a $./$[ -prefixed key navigates as a DuckDB JSONPath even via ->> (kept consistent with direct calls and the #639 test; HogQL property keys never take this branch). Adds transpiler + round-trip tests (casted text param, integer array index) and an integer-index e2e harness assertion. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks — all three addressed in e9e5770. [P2] Casted text path params ( [P2] Integer array-index params — confirmed regression, and worse than wrong results: an integer arg made the macro's [P3] Arrow vs direct
|
Problem
Production logs showed repeated DuckDB binder failures:
(same pattern for
$group_0).DuckDB's
json_extract[_string](doc, path)treats a$-prefixed second argument as a JSONPath. PostHog/HogQL property keys like$ai_session_id/$group_0are not valid JSONPaths ($must be followed by.or[), so DuckDB fails at bind time — for both literal arguments and bound parameters. Non-$keys are already literal-key lookups (verified against DuckDB 1.5.2) and must be left alone.Fix
Normalize the key/path argument of
->/->>operators and directjson_extract/json_extract_stringcalls (the production query uses the latter shape):$ai_session_id→$."$ai_session_id". Plain keys (key,foo.bar) and already-valid JSONPaths ($.foo,$."foo",$.a[0],$[0]) are preserved unchanged — the rewrite is idempotent.$N), whose value is unknown at transpile time, are wrapped in a newduckgres_json_extract_path()macro that applies the identical normalization at runtime, before DuckDB's binder sees the value. The placeholder is referenced exactly once, so param count/position is unchanged. Only JSON path arguments are wrapped — ordinary string parameters are untouched.initUtilityMacros(reaches standalone, process workers, and k8s workers) and emittedmemory.main-qualified in DuckLake/Iceberg mode (the worker backend), since the macro-qualifier pass runs before the operator transform.Tests
TDD red→green. Added:
$-keys, chained keys, direct func-call shape, parameterized form (default + DuckLake-qualified), and a guard that ordinary string params are not wrapped.server/json_extract_path_test.go) — full transpile→real-DuckDB execution for both literal and bound-parameter cases, exercising the registered macro.tests/e2e-mw-dev/harness.sh) — 3 assertions (arrow, direct call, and the macro directly) that run against real worker pods on both cnpg + ext metadata backends.docs/postgres-compatibility.md.just lintclean; full server + transpiler suites pass.🤖 Generated with Claude Code