feat(data-warehouse): resolve duckgres data-import schema from team table_suffix by EDsCODE · Pull Request #65323 · PostHog/posthog

EDsCODE · 2026-06-22T21:07:26Z

Summary

A DuckgresServer is org-scoped and hosts many teams, so the v3 data-import sink writes each team into its own schema — historically posthog_data_imports_team_{team_id}. This threads the team's DuckLakeBackfill.table_suffix (merged #64950 — the same field that names a team's events/persons duckling tables) through the schema name, so one user-chosen identifier governs all of a team's warehouse tables instead of an opaque team id.

suffix unset → posthog_data_imports_team_<team_id>   (today's behavior, unchanged)
suffix set   → posthog_data_imports_<suffix>

Changes

New shared resolver posthog.ducklake.common.duckgres_data_imports_schema(team_id) + validate_duckgres_identifier (fail-closed alnum/underscore, mirrors the duckling DAG's _validate_identifier so the suffix is validated identically wherever it hits DuckDB DDL).
Wired into the v3 sink processor (_duckgres_schema_name) and the legacy DuckLakeCopyDataImportsWorkflow, so the historical-copy and live-sink paths never diverge on schema for the same team. (The copy workflow already stands down for sink-enabled teams via the flag gate; this keeps non-sink suffix teams consistent too.)

Backward-compatible no-op while all suffixes are NULL — existing teams are unaffected until a suffix is explicitly set.

Follow-ups / call-outs (not in this PR)

Suffix change ⇒ re-prime. team_id was immutable; a suffix is not. Changing it moves the schema and orphans the old one. The clean fix is to record the primed schema on DuckgresSinkSchemaState and transition to NEEDS_RESYNC on drift — that depends on the backfill PR (feat(data-warehouse): duckgres sink backfill primer #63144), so it's a tracked follow-up. Until then a change would silently start a new schema.
Per-org suffix uniqueness is an assumed invariant (two teams in an org picking the same suffix collide — same as events_<suffix> would). The provision/enable flow (feat(data-warehouse): provision/enable per-team warehouse backfill with named tables #64939) must enforce it.
Audit before enabling: any sink-enabled team that already has a non-NULL table_suffix will move schemas on deploy and needs a one-time re-prime — not a silent switch.

Tests

Resolver fallback (no row / NULL / empty) → team-id schema; set → suffixed; unsafe suffix → ValueError; validator accept/reject cases. Processor unit tests pin the resolver via an autouse stub.

🤖 Generated with Claude Code

…able_suffix A DuckgresServer is org-scoped and hosts many teams, so the v3 sink writes each team into its own schema — historically posthog_data_imports_team_{id}. This threads the team's DuckLakeBackfill.table_suffix (merged #64950, the same field that names its events/persons duckling tables) through the schema name so one user-chosen identifier governs all of a team's warehouse tables. - New shared resolver posthog.ducklake.common.duckgres_data_imports_schema(): NULL/empty suffix -> posthog_data_imports_team_{id} (today's behavior), set -> posthog_data_imports_{suffix}, fail-closed validated via validate_duckgres_identifier (mirrors the duckling DAG's _validate_identifier). - Wired into the v3 sink processor (_duckgres_schema_name) and the legacy DuckLakeCopyDataImportsWorkflow, so the historical-copy and live-sink paths never diverge on schema for the same team. Backward-compatible no-op while all suffixes are NULL. Follow-up (depends on backfill #63144): a suffix CHANGE moves the schema and orphans the old one — detect drift in the DuckgresSinkSchemaState reconciler and trigger NEEDS_RESYNC rather than silently switching. Tracked separately. Tests: resolver fallback/set/unsafe-suffix + validator accept/reject; processor tests pin the resolver via an autouse stub. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

greptile-apps · 2026-06-22T21:11:57Z

Comments Outside Diff (1)

posthog/temporal/ducklake/ducklake_copy_data_imports_workflow.py, line 222-234 (link)

duckgres_data_imports_schema is called inside the loop with the same inputs.team_id on every iteration, so it fires one DB query per schema_id when one query before the loop would suffice. All schemas in the batch belong to the same team.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

_{Reviews (1): Last reviewed commit: "feat(data-warehouse): resolve duckgres d..." | Re-trigger Greptile}

veria-ai · 2026-06-22T21:19:03Z

PR overview

All previously flagged issues have been addressed. No open security concerns remain on this pull request.

Security review

No open security issues remain on this pull request.

Fixed/addressed: 1 · PR risk: 0/10

EDsCODE · 2026-06-22T21:22:22Z

Tracked follow-up for the suffix-change / re-prime hazard noted in the description: #65331 (NEEDS_RESYNC on schema drift). Gated on this PR + the backfill state machine (#63144).

…imports-suffix # Conflicts: # posthog/ducklake/common.py

hex-security-app · 2026-06-23T16:26:24Z

+    if not suffix:
+        return f"posthog_data_imports_team_{team_id}"
+    validate_duckgres_identifier(suffix)
+    return f"posthog_data_imports_{suffix}"


Cross-team schema collision

DuckgresServer is org-scoped, but this resolver maps a user-controlled DuckLakeBackfill.table_suffix directly to a shared schema name without preserving team_id or checking uniqueness. A team that can set its suffix to another team’s suffix, or to team_<id> for a legacy schema, can make its import/copy jobs create or replace tables in that other team’s data-import schema, breaking tenant isolation inside the org-scoped warehouse.

Prompt To Fix With AI

Keep team isolation in duckgres data-import schemas. Either keep `team_id` in the schema name even when a suffix is set (for example `posthog_data_imports_team_<team_id>_<suffix>`), or enforce and verify uniqueness of `DuckLakeBackfill.table_suffix` within each organization/DuckgresServer before it can ever be used here, including preventing collisions with legacy `posthog_data_imports_team_<id>` names. Update all writers/readers consistently and add tests covering cross-team collision attempts.

_{Severity: high | Confidence: 96% | React with 👍 if useful or 👎 if not}

duckgres_data_imports_schema(inputs.team_id) was called inside the per-schema loop in prepare_data_imports_ducklake_metadata_activity, issuing one identical DuckLakeBackfill lookup per schema_id. All schemas in a copy batch belong to the same team, so the result is loop-invariant — hoist it above the loop into a single query. Behavior is unchanged (same resolved schema name); only the redundant queries are removed. Addresses Greptile review comment on PR #65323. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

deployment-status-posthog · 2026-06-23T20:43:09Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-06-23 20:42 UTC	Run
prod-us	✅ Deployed	2026-06-23 20:57 UTC	Run
prod-eu	✅ Deployed	2026-06-23 21:09 UTC	Run

assign-reviewers-posthog Bot assigned EDsCODE Jun 22, 2026

assign-reviewers-posthog Bot requested a review from a team June 22, 2026 21:07

veria-ai Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread posthog/ducklake/common.py

EDsCODE mentioned this pull request Jun 22, 2026

duckgres sink: re-prime on table_suffix change (NEEDS_RESYNC on schema drift) #65331

Open

Gilbert09 approved these changes Jun 23, 2026

View reviewed changes

Merge remote-tracking branch 'origin/master' into eric/duckgres-data-…

8d7ab17

…imports-suffix # Conflicts: # posthog/ducklake/common.py

hex-security-app Bot reviewed Jun 23, 2026

View reviewed changes

EDsCODE merged commit 6396064 into master Jun 23, 2026
224 checks passed

EDsCODE deleted the eric/duckgres-data-imports-suffix branch June 23, 2026 18:33

EDsCODE mentioned this pull request Jun 23, 2026

fix(data-warehouse): make duckgres table_suffix read-only in admin #65616

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(data-warehouse): resolve duckgres data-import schema from team table_suffix#65323

feat(data-warehouse): resolve duckgres data-import schema from team table_suffix#65323
EDsCODE merged 3 commits into
masterfrom
eric/duckgres-data-imports-suffix

EDsCODE commented Jun 22, 2026

Uh oh!

greptile-apps Bot commented Jun 22, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

Uh oh!

veria-ai Bot commented Jun 22, 2026 •

edited

Loading

Uh oh!

EDsCODE commented Jun 22, 2026

Uh oh!

hex-security-app Bot Jun 23, 2026

Uh oh!

Uh oh!

deployment-status-posthog Bot commented Jun 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

EDsCODE commented Jun 22, 2026

Summary

Changes

Follow-ups / call-outs (not in this PR)

Tests

Uh oh!

greptile-apps Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comments Outside Diff (1)

Uh oh!

Uh oh!

veria-ai Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR overview

Security review

Uh oh!

EDsCODE commented Jun 22, 2026

Uh oh!

hex-security-app Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

deployment-status-posthog Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploy status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented Jun 22, 2026 •

edited

Loading

veria-ai Bot commented Jun 22, 2026 •

edited

Loading

deployment-status-posthog Bot commented Jun 23, 2026 •

edited

Loading