duckgres sink: re-prime on table_suffix change (NEEDS_RESYNC on schema drift) #65331

Description

@EDsCODE

Background

The v3 duckgres data-import sink resolves a team's target schema from DuckLakeBackfill.table_suffix (PR #65323): posthog_data_imports_<suffix>, falling back to posthog_data_imports_team_<id> when unset. team_id was immutable; a suffix is not.
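
As a rough illustration of the resolution rule described above, here is a minimal sketch. The function name `resolve_schema_name` and its signature are hypothetical and are not the actual `_duckgres_schema_name` implementation. It also assumes "unset" means `None`; how an empty-string suffix is treated is not stated here.

```python
from typing import Optional


def resolve_schema_name(team_id: int, table_suffix: Optional[str]) -> str:
    """Hypothetical stand-in for the resolver introduced in PR #65323.

    Assumption: an unset suffix is represented as None.
    """
    if table_suffix is None:
        # Fallback: the schema is derived from the immutable team id.
        return f"posthog_data_imports_team_{team_id}"
    # A suffix is mutable, so this result can change over a team's lifetime.
    return f"posthog_data_imports_{table_suffix}"
```

The point of the sketch is that the second branch depends on mutable state, which is what makes the drift described below possible.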

Problem

If a team sets or changes table_suffix after the sink has already written its tables, _duckgres_schema_name starts resolving to a new schema. The sink would then begin writing into the new, empty schema and orphan the fully primed old one. Nothing errors, so that team's warehouse silently ends up with partial data.

This is the same "a suffix could be set but ignored" hazard that the duckling DAG stack calls out, here in the form it takes for the sink.

Proposed fix (depends on #63144 backfill state machine + #65323 resolver)

  1. Record what was primed. Add primed_schema (CharField) to DuckgresSinkSchemaState. Set it at mark_primed / the reconciler's PRIMED transition to the schema that duckgres_data_imports_schema(team_id) resolves to at prime time.
  2. Detect drift in the reconciler. When the schema currently resolved for a PRIMED row differs from its primed_schema, CAS-transition the row to NEEDS_RESYNC (the state already exists) so the next backfill re-primes into the new schema. Log the transition and report it to Sentry.
  3. Re-prime targets the new schema — falls out of the existing backfill flow once the state flips, since the backfill processor also resolves via _duckgres_schema_name.
  4. Old schema is left in place (not auto-dropped) — cleanup belongs to the separate deletion-lifecycle sweeper follow-up.
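
Steps 1 and 2 can be sketched in memory as follows. This is not the real reconciler: `SchemaState`, `cas_transition`, and `reconcile_drift` are hypothetical names, the compare-and-swap is simulated on a dataclass rather than a filtered database UPDATE, and skipping rows whose primed_schema is still empty is an assumption about how pre-existing rows might be handled.

```python
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class SinkState(str, Enum):
    PRIMED = "primed"
    NEEDS_RESYNC = "needs_resync"


@dataclass
class SchemaState:
    """Hypothetical in-memory stand-in for a DuckgresSinkSchemaState row."""
    team_id: int
    state: SinkState
    primed_schema: Optional[str] = None


def cas_transition(row: SchemaState, expected: SinkState, new: SinkState) -> bool:
    # Simulated compare-and-swap: only flip if the row is still in `expected`.
    # A database-backed version would typically be an UPDATE filtered on state.
    if row.state != expected:
        return False
    row.state = new
    return True


def reconcile_drift(row: SchemaState, resolve: Callable[[int], str]) -> bool:
    """Return True if the row was flipped to NEEDS_RESYNC."""
    if row.state != SinkState.PRIMED:
        return False
    if row.primed_schema is None:
        # Assumption: rows primed before primed_schema existed are skipped here.
        return False
    current = resolve(row.team_id)
    if current == row.primed_schema:
        return False  # unchanged suffix: no spurious re-prime
    flipped = cas_transition(row, SinkState.PRIMED, SinkState.NEEDS_RESYNC)
    if flipped:
        # The issue also asks for a Sentry report at this point (omitted here).
        logger.warning(
            "duckgres schema drift for team %s: primed=%s current=%s",
            row.team_id, row.primed_schema, current,
        )
    return flipped
```

Because the transition is guarded by the expected state, running the check a second time after a flip is a no-op, which is the behaviour one would want if two reconciler passes overlap.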

Acceptance criteria / tests

  • Setting a suffix on an already-PRIMED team → reconciler flips it to NEEDS_RESYNC; a fresh backfill primes posthog_data_imports_<suffix>; live batches then apply there.
  • Unset→unset and unchanged-suffix cases are no-ops (no spurious re-prime).
  • DB-backed test on the state transition.

Dependencies / ordering

Interim safety (until this lands)

After #65323 deploys, audit sink-enabled teams for a non-NULL table_suffix — any that already have one will move schemas and need a one-time manual re-prime (reset_duckgres_failed_runs --replan-backfill once #63144 is in).
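
The audit could be shaped roughly like the sketch below. The input is a hypothetical list of (team_id, table_suffix) pairs for sink-enabled teams; the real audit would be a query against DuckLakeBackfill, which is not shown. The sketch assumes each affected team's old schema was the team-id fallback.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class SchemaMove(NamedTuple):
    team_id: int
    old_schema: str
    new_schema: str


def teams_needing_reprime(
    rows: Iterable[Tuple[int, Optional[str]]],
) -> List[SchemaMove]:
    """List teams whose target schema changes once the suffix resolver is live.

    `rows` is a hypothetical (team_id, table_suffix) listing of sink-enabled teams.
    """
    moves = []
    for team_id, table_suffix in rows:
        if table_suffix is None:
            continue  # still resolves to the team-id fallback; nothing moves
        moves.append(
            SchemaMove(
                team_id=team_id,
                # Assumption: before #65323 the sink wrote to the team-id schema.
                old_schema=f"posthog_data_imports_team_{team_id}",
                new_schema=f"posthog_data_imports_{table_suffix}",
            )
        )
    return moves
```

Each returned entry is a team that would need the one-time manual re-prime described above.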
