Skip to content

Fix JSON extraction of $-prefixed property keys#807

Open
bill-ph wants to merge 2 commits into
mainfrom
claude/vibrant-goldwasser-058fee
Open

Fix JSON extraction of $-prefixed property keys#807
bill-ph wants to merge 2 commits into
mainfrom
claude/vibrant-goldwasser-058fee

Conversation

@bill-ph

@bill-ph bill-ph commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Problem

Production logs showed repeated DuckDB binder failures:

query: SELECT json_extract_string(events.properties, $1) AS ai_session_id ...
error: Binder Error: JSON path error near 'ai_session_id'

(same pattern for $group_0).

DuckDB's json_extract[_string](doc, path) treats a $-prefixed second argument as a JSONPath. PostHog/HogQL property keys like $ai_session_id / $group_0 are not valid JSONPaths ($ must be followed by . or [), so DuckDB fails at bind time — for both literal arguments and bound parameters. Non-$ keys are already literal-key lookups (verified against DuckDB 1.5.2) and must be left alone.

Fix

Normalize the key/path argument of ->/->> operators and direct json_extract / json_extract_string calls (the production query uses the latter shape):

  • Literal keys are rewritten at transpile time: $ai_session_id$."$ai_session_id". Plain keys (key, foo.bar) and already-valid JSONPaths ($.foo, $."foo", $.a[0], $[0]) are preserved unchanged — the rewrite is idempotent.
  • Bound parameters ($N), whose value is unknown at transpile time, are wrapped in a new duckgres_json_extract_path() macro that applies the identical normalization at runtime, before DuckDB's binder sees the value. The placeholder is referenced exactly once, so param count/position is unchanged. Only JSON path arguments are wrapped — ordinary string parameters are untouched.
  • The macro is registered in initUtilityMacros (reaches standalone, process workers, and k8s workers) and emitted memory.main-qualified in DuckLake/Iceberg mode (the worker backend), since the macro-qualifier pass runs before the operator transform.

Tests

TDD red→green. Added:

  • Transpiler unit tests — literal $-keys, chained keys, direct func-call shape, parameterized form (default + DuckLake-qualified), and a guard that ordinary string params are not wrapped.
  • Server round-trip tests (server/json_extract_path_test.go) — full transpile→real-DuckDB execution for both literal and bound-parameter cases, exercising the registered macro.
  • e2e harness (tests/e2e-mw-dev/harness.sh) — 3 assertions (arrow, direct call, and the macro directly) that run against real worker pods on both cnpg + ext metadata backends.
  • Updated docs/postgres-compatibility.md.

just lint clean; full server + transpiler suites pass.

🤖 Generated with Claude Code

DuckDB's json_extract[_string](doc, path) treats a '$'-prefixed second
argument as a JSONPath. PostHog/HogQL property keys like "$ai_session_id"
and "$group_0" are not valid JSONPaths ('$' must be followed by '.' or
'['), so DuckDB failed at bind time:

  Binder Error: JSON path error near 'ai_session_id'

Normalize the key/path argument of -> / ->> operators and direct
json_extract / json_extract_string calls:

- Literal keys are rewritten at transpile time: $ai_session_id ->
  $."$ai_session_id". Plain keys and already-valid JSONPaths ($.foo,
  $."foo", $.a[0], $[0]) are preserved unchanged.
- Bound parameters ($N), whose value is unknown at transpile time, are
  wrapped in a new duckgres_json_extract_path() macro that applies the
  same normalization at runtime. Only JSON path arguments are wrapped,
  never ordinary string parameters. The macro is registered on every
  backend and emitted memory.main-qualified in DuckLake/Iceberg mode.

Adds transpiler unit tests, server transpile->DuckDB round-trip tests
(literal + bound param), e2e harness assertions on both metadata
backends, and updates the compatibility docs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

Test Impact Plan

Deterministic summary of how this PR changes tests, CI runners, and coverage-risk signals.

Summary

Area Added Changed Deleted
Test files 1 2 0
E2E/journey files 0 1 0
Workflow files 0 0 0

Signals

  • Test cases: +5 / -0
  • Assertions: +16 / -0
  • Skips or known failures added: 0
  • Workflow continue-on-error added: 0
  • Workflow path filters added: 0
  • Test commands removed from justfile: 0
  • E2E/journey retry lines added: 0

Coverage risk: needs review

Warnings

  • E2E or journey files changed (needs review)
    • tests/e2e-mw-dev/harness.sh

- $1::text / props->>$1::text parse as TypeCast(ParamRef) and bypassed
  normalization, so a bound $ai_session_id still reached DuckDB as a
  malformed JSONPath. normalizeJSONExtractPathArg now looks through
  wrapping casts (and rewrites a literal inside a cast, preserving it).

- json ->> $1 with an INTEGER-bound parameter (Postgres array indexing)
  regressed: the param was wrapped in a macro whose LIKE checks failed to
  bind on an integer. duckgres_json_extract_path is now type-aware — an
  integer argument becomes a $[i] JSONPath (array index), while a string
  "0" stays an object-key lookup, matching Postgres int-vs-text semantics.
  The string branches cast to VARCHAR so LIKE binds for any bound type.

- Documented the deliberate divergence where a $./$[ -prefixed key
  navigates as a DuckDB JSONPath even via ->> (kept consistent with direct
  calls and the #639 test; HogQL property keys never take this branch).

Adds transpiler + round-trip tests (casted text param, integer array
index) and an integer-index e2e harness assertion.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bill-ph

bill-ph commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks — all three addressed in e9e5770.

[P2] Casted text path params ($1::text) — confirmed: $1::text and props ->> $1::text parse as TypeCast(ParamRef), so the bare-ParamRef check skipped them and the bound $ai_session_id still hit the binder error. normalizeJSONExtractPathArg now looks through wrapping casts: a cast-wrapped param is wrapped whole (duckgres_json_extract_path($1::text), cast preserved so the wire type is unchanged), and a literal inside a cast ('$ai_session_id'::text) is rewritten in place. Covered by transpiler cases + a ::text-param round-trip.

[P2] Integer array-index params — confirmed regression, and worse than wrong results: an integer arg made the macro's LIKE fail to bind (~~ wants VARCHAR). The macro is now type-aware — it dispatches on typeof(k): an integer becomes a $[i] JSONPath (array index), while a string "0" stays an object-key lookup (matching Postgres int-vs-text ->> semantics). The string branches cast to VARCHAR so LIKE binds for any bound type. Verified through the Go DuckDB driver that a Go int/int64 arg keeps its integer type and indexes correctly; added arrow + direct round-trip cases and an json_int_index_macro e2e assertion.

[P3] Arrow vs direct $.a.b semantics — you're right that data ->> '$.a.b' is a literal key in Postgres. I kept the consistent DuckDB-JSONPath behavior for both forms (rather than splitting arrow/direct) deliberately, for two reasons: (a) a single runtime macro can't both preserve a valid path and quote it, so arrow params could never match arrow literals — splitting only literals would make the two inconsistent; and (b) it matches the pre-existing #639 transpiler test's convention. The divergence only affects keys that are also valid JSONPaths ($./$[-prefixed); the reported keys ($ai_session_id, $group_0 — no dot) and HogQL property keys never take that branch. I've made it explicit rather than an implicit lock-in: documented in normalizeJSONPathKey, the round-trip test (dotted JSONPath via arrow navigates (documented divergence) + a direct-call case where DuckDB semantics are unambiguously correct), and the compatibility doc. Happy to split arrow→literal-key for the literal case if you'd prefer the partial correctness over the literal/param consistency.

just lint clean; full transpiler + server suites pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant