diff --git a/.github/workflows/e2e-mw-dev.yml b/.github/workflows/e2e-mw-dev.yml index 1f7acc1d..e4a09c6a 100644 --- a/.github/workflows/e2e-mw-dev.yml +++ b/.github/workflows/e2e-mw-dev.yml @@ -2,9 +2,7 @@ # # Replaces what kind (tests/k8s/) cannot exercise: real Cilium network # policies, real Crossplane Duckling provisioning, real cnpg-shard + external -# RDS metadata stores, and the real per-org Lakekeeper operator — the layers -# where this quarter's bugs actually lived (Cilium egress, lakekeeper -# encryption-key drift, cnpg role drift, RBAC delete gaps). +# RDS metadata stores — the layers kind cannot model. # # Flow: # 1. build arm64-only worker + control-plane images, tagged pr--, @@ -17,7 +15,7 @@ # ClusterIP service (no public DNS / NLB needed). Covers the cnpg-shard # and external metadata backends. # 5. always tear the namespace down (and deprovision the ci-pr ducklings so -# no S3 / cnpg role / lakekeeper CR leaks on shared infra). +# no S3 / cnpg role leaks on shared infra). # # SECURITY — who can run this: # Same model as the AWS/OIDC job in ci.yml: the gate is the repo setting @@ -201,8 +199,8 @@ jobs: uses: azure/setup-kubectl@776406bce94f63e41d621b960d78ee25c8b76ede # v4.0.1 - name: Update kubeconfig run: aws eks update-kubeconfig --name "$CLUSTER_NAME" --region us-east-1 --alias "$KUBE_CONTEXT" - # Deprovision the ci-pr ducklings (so no S3 / cnpg role+db / lakekeeper CR - # leaks on shared infra) then delete the namespace. + # Deprovision the ci-pr ducklings (so no S3 / cnpg role+db leaks on shared + # infra) then delete the namespace. - name: Teardown run: bash tests/e2e-mw-dev/run.sh teardown diff --git a/CLAUDE.md b/CLAUDE.md index 640e87a6..291333b6 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -126,7 +126,7 @@ exercises it (and update it if the path moved) — a refactor that quietly drops e2e coverage is a regression in the test suite even if behavior is unchanged. 
Unit/package tests are necessary but not sufficient: a change is only "done" once it is exercised against the real mw-dev cluster — real worker pods, real Crossplane ducklings, real cnpg/RDS metadata, real -Lakekeeper, real S3/Iceberg/STS. "Solid" means a deterministic pass/fail +S3/STS. "Solid" means a deterministic pass/fail assertion of the actual user-visible behavior (not just "it didn't error"), with transient/cold-pool conditions handled, on both metadata backends (cnpg + ext) where it touches metadata. A bugfix gets a regression assertion that would have @@ -139,20 +139,19 @@ Three test lanes worth knowing about, in increasing order of blast radius: - **Unit / package tests** (`go test ./...`): in-process, no external deps. Where most coverage lives. Includes `tests/manifests/` (static-manifest artifact asserts for `k8s/rbac.yaml` + `k8s/networkpolicy.yaml`). - **`tests/integration/`** (`just test-integration`): spins up the standalone server binary against a real MinIO + Postgres metadata store via docker compose. Covers wire protocol, DuckLake on real S3-compatible storage, transpilation against a live server. -- **`tests/e2e-mw-dev/`** (per-PR GitHub workflow `e2e-mw-dev.yml`): the full multi-tenant activation pipeline against the **real posthog-mw-dev EKS cluster** — real Cilium, real Crossplane ducklings, real cnpg-shard + external-RDS metadata, real per-org Lakekeeper, real AWS S3/Iceberg. A shell harness (`harness.sh`) runs as an in-cluster Job per PR; `run.sh` orchestrates deploy/test/teardown/e2e-cleanup. **Replaces the retired kind suite** (`tests/k8s/`) — that suite's `k8s-integration-tests` CI job and its Go tests are gone; the supporting `k8s/` scripts/manifests + Dockerfiles are kept for now. See `tests/e2e-mw-dev/README.md`. 
+- **`tests/e2e-mw-dev/`** (per-PR GitHub workflow `e2e-mw-dev.yml`): the full multi-tenant activation pipeline against the **real posthog-mw-dev EKS cluster** — real Cilium, real Crossplane ducklings, real cnpg-shard + external-RDS metadata, real AWS S3. A shell harness (`harness.sh`) runs as an in-cluster Job per PR; `run.sh` orchestrates deploy/test/teardown/e2e-cleanup. **Replaces the retired kind suite** (`tests/k8s/`) — that suite's `k8s-integration-tests` CI job and its Go tests are gone; the supporting `k8s/` scripts/manifests + Dockerfiles are kept for now. See `tests/e2e-mw-dev/README.md`. ### When code changes obligate test changes `tests/e2e-mw-dev/` is the only place we exercise the full activation pipeline (control plane → STS broker → worker pod → DuckDB → ATTACH against real cloud storage). If your change touches any of the following, treat updating the harness as part of the change, not a follow-up: -- `controlplane/shared_worker_activator.go`, `controlplane/sts_broker.go`, anything in the activation payload shape (`TenantActivationPayload`, `server.DuckLakeConfig`, `server.IcebergConfig`) -- `server/server.go::AttachDeltaCatalog`, `server.AttachIcebergCatalog`, `server.attachDuckLake*`, `server.refresh*Secret` -- `server/iceberg/` (config, dispatcher, backend implementations) — the harness provisions iceberg-enabled ducklings on both cnpg + ext backends and asserts the catalog attaches + reads/writes +- `controlplane/shared_worker_activator.go`, `controlplane/sts_broker.go`, anything in the activation payload shape (`TenantActivationPayload`, `server.DuckLakeConfig`) +- `server/server.go::AttachDeltaCatalog`, `server.attachDuckLake*`, `server.refresh*Secret` - `controlplane/configstore/models.go` — new columns flow through the provisioning API the harness calls; exercise them via a provision body field - `duckdbservice/activation.go`, `worker_activation.go` — worker-side activation order - Any code path that wires AWS credentials through to DuckDB 
SECRETs -The contract: if the harness no longer exercises a path you changed, **update `harness.sh`**; if your change removes a path it asserts against, **delete the assertion**. The DuckLake round-trip / durability / concurrent-writers / iceberg activation checks in `harness.sh` are the load-bearing ones for catalog wiring — keep them honest. +The contract: if the harness no longer exercises a path you changed, **update `harness.sh`**; if your change removes a path it asserts against, **delete the assertion**. The DuckLake round-trip / durability / concurrent-writers checks in `harness.sh` are the load-bearing ones for catalog wiring — keep them honest. ## Dependencies @@ -271,8 +270,8 @@ Invariants for anyone touching this path: secrets — persistent ones AND non-persistent (plain/TEMPORARY `CREATE SECRET`) ones, which pass through to the worker and would otherwise leak to the next user. It preserves only the system-managed allowlist - (`usersecrets.IsReservedName`: `ducklake_s3`/`iceberg_sigv4`/`iceberg_oauth` - + the `__default_*`/`duckgres_*` prefixes, which activation re-creates). It + (`usersecrets.IsReservedName`: `ducklake_s3` + the + `__default_*`/`duckgres_*` prefixes, which activation re-creates). It MUST run before replay on every CreateSession in shared-warm mode, and a wipe failure MUST fail the session. 
- **Execute-then-persist ordering.** Persist only statements DuckDB accepted; diff --git a/Dockerfile b/Dockerfile index 0e3c3866..3088fa90 100644 --- a/Dockerfile +++ b/Dockerfile @@ -42,9 +42,7 @@ RUN : "${DUCKDB_EXTENSION_VERSION:?must be set}" \ | gunzip > "/build/duckdb-extensions/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/json.duckdb_extension" \ && curl -fsSL "${POSTGRES_SCANNER_REPOSITORY}/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/postgres_scanner.duckdb_extension.gz" \ | gunzip > "/build/duckdb-extensions/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/postgres_scanner.duckdb_extension" \ - && curl -fsSL "${DUCKDB_EXTENSION_REPOSITORY}/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/iceberg.duckdb_extension.gz" \ - | gunzip > "/build/duckdb-extensions/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/iceberg.duckdb_extension" \ - && for f in httpfs ducklake json postgres_scanner iceberg; do \ + && for f in httpfs ducklake json postgres_scanner; do \ [ -s "/build/duckdb-extensions/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/$f.duckdb_extension" ] \ || { echo "ERROR: $f.duckdb_extension is empty after fetch" >&2; exit 1; }; \ done diff --git a/Dockerfile.worker b/Dockerfile.worker index 243d18de..6428787d 100644 --- a/Dockerfile.worker +++ b/Dockerfile.worker @@ -100,9 +100,7 @@ RUN : "${DUCKDB_EXTENSION_VERSION:?must be set}" \ | gunzip > "/build/duckdb-extensions/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/json.duckdb_extension" \ && curl -fsSL "${POSTGRES_SCANNER_REPOSITORY}/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/postgres_scanner.duckdb_extension.gz" \ | gunzip > "/build/duckdb-extensions/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/postgres_scanner.duckdb_extension" \ - && curl -fsSL "${DUCKDB_EXTENSION_REPOSITORY}/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/iceberg.duckdb_extension.gz" \ - | gunzip > "/build/duckdb-extensions/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/iceberg.duckdb_extension" \ - && for 
f in httpfs ducklake json postgres_scanner iceberg; do \ + && for f in httpfs ducklake json postgres_scanner; do \ [ -s "/build/duckdb-extensions/v${DUCKDB_EXTENSION_VERSION}/linux_${TARGETARCH}/$f.duckdb_extension" ] \ || { echo "ERROR: $f.duckdb_extension is empty after fetch" >&2; exit 1; }; \ done diff --git a/README.md b/README.md index 6fbef880..82c75694 100644 --- a/README.md +++ b/README.md @@ -121,7 +121,6 @@ LIMIT 20; - [Performance Harness](docs/perf-harness-runbook.md): Local smoke and nightly operations for performance testing. - [Control Plane Rollout](docs/runbooks/control-plane-rollout.md): Zero-downtime deployment process for the control plane itself. - [Managed Warehouse Deprovision](docs/runbooks/managed-warehouse-deprovision.md): Destructive teardown process for managed warehouse infrastructure and org cleanup. -- [Lakekeeper Iceberg Catalog](docs/runbooks/lakekeeper-iceberg-catalog.md): Per-org Lakekeeper Iceberg REST catalog backend — architecture, the no-vending credential model, activation, and the `ACCESS_DELEGATION_MODE 'none'` gotcha. ## Quick Start @@ -324,12 +323,12 @@ teardown), so you can build a provisioning funnel and alert on failures. 
| Event | Fires when | Properties | | --- | --- | --- | -| `warehouse_provision_begin` | Provisioning accepted by the admin API (warehouse not usable yet) | `database_name`, `metadata_store`, `ducklake_enabled`, `iceberg_enabled` | -| `warehouse_provision_success` | Warehouse reaches Ready and is usable (provisioner controller) | `metadata_store`, `ducklake_enabled`, `iceberg_enabled` | -| `warehouse_provision_failed` | Warehouse reaches Failed (provisioner controller) | `metadata_store`, `ducklake_enabled`, `iceberg_enabled`, `reason` (`provisioning_timeout`/`crossplane_sync_failure`) | +| `warehouse_provision_begin` | Provisioning accepted by the admin API (warehouse not usable yet) | `database_name`, `metadata_store`, `ducklake_enabled` | +| `warehouse_provision_success` | Warehouse reaches Ready and is usable (provisioner controller) | `metadata_store`, `ducklake_enabled` | +| `warehouse_provision_failed` | Warehouse reaches Failed (provisioner controller) | `metadata_store`, `ducklake_enabled`, `reason` (`provisioning_timeout`/`crossplane_sync_failure`) | | `warehouse_deprovision_begin` | Deprovisioning accepted by the admin API (teardown not finished yet) | — | | `warehouse_deprovision_success` | All underlying resources deleted (provisioner controller) | — | -| `warehouse_deprovision_failed` | A teardown attempt failed (provisioner controller) | `reason` (`duckling_delete_failed`/`lakekeeper_teardown_failed`) | +| `warehouse_deprovision_failed` | A teardown attempt failed (provisioner controller) | `reason` (`duckling_delete_failed`) | | `warehouse_password_reset` | An org's root password is reset (admin API) | `username` | | `query_initiated` | A client query begins execution | `user`, `trace_id` | | `query_failed` | A query errors | `user`, `trace_id`, `error_code` (SQLSTATE), `error_category` (`user`/`system`/`conflict`/`metadata_connection_lost`) | @@ -775,7 +774,7 @@ Managed-warehouse contract notes: - At most one managed-warehouse row exists per team. 
The row may be absent before first provisioning or after cleanup, but there is never more than one active warehouse contract for a team. - The admin API exposes that contract at `GET /api/v1/teams/:name/warehouse` and `PUT /api/v1/teams/:name/warehouse`. Team list/get responses also include a nested `warehouse` object when present. -- User rows support an optional `default_catalog` field on `POST /api/v1/users` and `PUT /api/v1/orgs/:id/users/:username`. The default is empty, which preserves the standard DuckLake-first session behavior. Set `default_catalog` to `iceberg` for users whose sessions should resolve schema-qualified names and compatibility metadata through the Iceberg catalog by default; a client-supplied startup `search_path` still takes precedence. +- User rows support an optional `default_catalog` field on `POST /api/v1/users` and `PUT /api/v1/orgs/:id/users/:username`. The default is empty, which preserves the standard DuckLake session behavior. `ducklake` is the only non-empty value accepted. - The typed sections are `warehouse_database`, `metadata_store`, `s3`, `worker_identity`, and structured secret refs for `warehouse_database_credentials`, `metadata_store_credentials`, `s3_credentials`, and `runtime_config`. In shared worker mode, every non-empty secret ref must store an explicit `namespace`, and it must match `worker_identity.namespace`. - Secret references only are stored in the config store. Secret material remains outside the database. - The provisioning fields are stored directly on the warehouse row as overall `state` / `status_message`, per-resource `*_state` / `*_status_message`, plus `ready_at` and `failed_at`. 
diff --git a/cmd/duckgres-worker/main.go b/cmd/duckgres-worker/main.go index f802174d..b266f4ec 100644 --- a/cmd/duckgres-worker/main.go +++ b/cmd/duckgres-worker/main.go @@ -22,7 +22,6 @@ import ( "github.com/posthog/duckgres/duckdbservice" "github.com/posthog/duckgres/internal/cliboot" "github.com/posthog/duckgres/server" - "github.com/posthog/duckgres/server/lakekeeperbroker" ) // Each duckgres binary owns its own package-main version/commit/date because @@ -79,9 +78,6 @@ func main() { duckLakeDeltaCatalogEnabled := flag.Bool("ducklake-delta-catalog-enabled", true, "Attach a Delta Lake catalog during DuckLake worker boot (default true; use --ducklake-delta-catalog-enabled=false to disable; env: DUCKGRES_DUCKLAKE_DELTA_CATALOG_ENABLED)") duckLakeDeltaCatalogPath := flag.String("ducklake-delta-catalog-path", "", "Delta Lake catalog/table path to attach (env: DUCKGRES_DUCKLAKE_DELTA_CATALOG_PATH)") duckLakeDefaultSpecVersion := flag.String("ducklake-default-spec-version", "", "Default DuckLake spec version for migration checks (env: DUCKGRES_DUCKLAKE_DEFAULT_SPEC_VERSION)") - icebergEnabled := flag.Bool("iceberg-enabled", false, "Attach a per-tenant Iceberg catalog (Lakekeeper REST) at session init (env: DUCKGRES_ICEBERG_ENABLED)") - icebergRegion := flag.String("iceberg-region", "", "AWS region for S3 object access by Iceberg (env: DUCKGRES_ICEBERG_REGION)") - icebergNamespace := flag.String("iceberg-namespace", "", "Default Iceberg namespace (env: DUCKGRES_ICEBERG_NAMESPACE)") // Query log queryLog := flag.Bool("query-log", true, "Enable/disable DuckLake query log (use --query-log=false to disable; env: DUCKGRES_QUERY_LOG_ENABLED)") @@ -206,9 +202,6 @@ func main() { DuckLakeDeltaCatalogEnabled: *duckLakeDeltaCatalogEnabled, DuckLakeDeltaCatalogPath: *duckLakeDeltaCatalogPath, DuckLakeDefaultSpecVersion: *duckLakeDefaultSpecVersion, - IcebergEnabled: *icebergEnabled, - IcebergRegion: *icebergRegion, - IcebergNamespace: *icebergNamespace, QueryLog: *queryLog, }, 
os.Getenv, func(msg string) { slog.Warn(msg) @@ -248,23 +241,6 @@ func main() { os.Exit(1) } - // Lakekeeper OIDC SA-token broker. Started when DUCKGRES_LAKEKEEPER_TOKEN_PATH - // is set (typically by the duckling pod spec, mounting a projected SA - // token volume at /var/run/secrets/lakekeeper/token). DuckDB's iceberg - // extension POSTs to OAUTH2_SERVER_URI=http://127.0.0.1:9876/token to - // fetch a bearer; the broker reads the projected file and hands it back. - // When the env var is unset, no broker is started — preserves the - // allowall + NetworkPolicy posture for clusters that haven't enabled - // OIDC on Lakekeeper yet. - if tokenPath := configloader.Env("DUCKGRES_LAKEKEEPER_TOKEN_PATH", ""); tokenPath != "" { - broker := lakekeeperbroker.New(tokenPath) - if err := broker.ListenAndServe(lakekeeperbroker.DefaultAddr); err != nil { - fmt.Fprintf(os.Stderr, "Failed to start lakekeeper broker on %s: %s\n", lakekeeperbroker.DefaultAddr, err) - os.Exit(1) - } - slog.Info("Lakekeeper OIDC broker started.", "addr", lakekeeperbroker.DefaultAddr, "token_path", tokenPath) - } - // No initMetrics() here. In control-plane mode all worker pods would // fight over :9090; the control plane process owns the metrics // endpoint. 
The duckdb-service exposes per-session metrics via its own @@ -277,4 +253,3 @@ func main() { MaxSessions: maxSessions, }) } - diff --git a/configloader/file_config.go b/configloader/file_config.go index 8ad43642..a2039685 100644 --- a/configloader/file_config.go +++ b/configloader/file_config.go @@ -26,7 +26,6 @@ type FileConfig struct { RateLimit RateLimitFileConfig `yaml:"rate_limit"` Extensions []string `yaml:"extensions"` DuckLake DuckLakeFileConfig `yaml:"ducklake"` - Iceberg IcebergFileConfig `yaml:"iceberg"` FilePersistence bool `yaml:"file_persistence"` ProcessIsolation bool `yaml:"process_isolation"` IdleTimeout string `yaml:"idle_timeout"` @@ -106,15 +105,6 @@ type RateLimitFileConfig struct { MaxConnections int `yaml:"max_connections"` } -// IcebergFileConfig is the YAML shape for opting a single-tenant duckgres -// instance into Iceberg catalog attachment. In multi-tenant mode the -// equivalent values come from the per-warehouse configstore row, not YAML. -type IcebergFileConfig struct { - Enabled *bool `yaml:"enabled"` - Region string `yaml:"region"` - Namespace string `yaml:"namespace"` -} - type DuckLakeFileConfig struct { MetadataStore string `yaml:"metadata_store"` ObjectStore string `yaml:"object_store"` diff --git a/configresolve/cliflags.go b/configresolve/cliflags.go index 34308045..75a79126 100644 --- a/configresolve/cliflags.go +++ b/configresolve/cliflags.go @@ -43,9 +43,6 @@ func RegisterCLIInputsFlags(fs *flag.FlagSet) func() CLIInputs { duckLakeDeltaCatalogEnabled := fs.Bool("ducklake-delta-catalog-enabled", true, "Attach a Delta Lake catalog during DuckLake worker boot (default true; use --ducklake-delta-catalog-enabled=false to disable; env: DUCKGRES_DUCKLAKE_DELTA_CATALOG_ENABLED)") duckLakeDeltaCatalogPath := fs.String("ducklake-delta-catalog-path", "", "Delta Lake catalog/table path to attach, defaults to sibling delta/ prefix at DuckLake object-store root (env: DUCKGRES_DUCKLAKE_DELTA_CATALOG_PATH)") duckLakeDefaultSpecVersion := 
fs.String("ducklake-default-spec-version", "", "Default DuckLake spec version for migration checks (env: DUCKGRES_DUCKLAKE_DEFAULT_SPEC_VERSION)") - icebergEnabled := fs.Bool("iceberg-enabled", false, "Attach a per-tenant Iceberg catalog (Lakekeeper REST) at session init (env: DUCKGRES_ICEBERG_ENABLED)") - icebergRegion := fs.String("iceberg-region", "", "AWS region for S3 object access by Iceberg (default: us-east-1) (env: DUCKGRES_ICEBERG_REGION)") - icebergNamespace := fs.String("iceberg-namespace", "", "Default Iceberg namespace (informational; default: main) (env: DUCKGRES_ICEBERG_NAMESPACE)") processMinWorkers := fs.Int("process-min-workers", 0, "Pre-warm worker count at startup for process workers (control-plane mode) (env: DUCKGRES_PROCESS_MIN_WORKERS)") processMaxWorkers := fs.Int("process-max-workers", 0, "Max process workers, 0=auto-derived (control-plane mode) (env: DUCKGRES_PROCESS_MAX_WORKERS)") processRetireOnSessionEnd := fs.Bool("process-retire-on-session-end", false, "Retire a process worker immediately after its last session ends instead of keeping it warm for reuse (control-plane mode) (env: DUCKGRES_PROCESS_RETIRE_ON_SESSION_END)") @@ -103,9 +100,6 @@ func RegisterCLIInputsFlags(fs *flag.FlagSet) func() CLIInputs { cli.DuckLakeDeltaCatalogEnabled = *duckLakeDeltaCatalogEnabled cli.DuckLakeDeltaCatalogPath = *duckLakeDeltaCatalogPath cli.DuckLakeDefaultSpecVersion = *duckLakeDefaultSpecVersion - cli.IcebergEnabled = *icebergEnabled - cli.IcebergRegion = *icebergRegion - cli.IcebergNamespace = *icebergNamespace cli.ProcessMinWorkers = *processMinWorkers cli.ProcessMaxWorkers = *processMaxWorkers cli.ProcessRetireOnSessionEnd = *processRetireOnSessionEnd diff --git a/configresolve/cliflags_test.go b/configresolve/cliflags_test.go index 5dea829b..c96afa3d 100644 --- a/configresolve/cliflags_test.go +++ b/configresolve/cliflags_test.go @@ -30,9 +30,6 @@ func fieldNameToFlagName(name string) string { {"DuckLakeDeltaCatalogEnabled", 
"ducklake-delta-catalog-enabled"}, {"DuckLakeDeltaCatalogPath", "ducklake-delta-catalog-path"}, {"DuckLakeDefaultSpecVersion", "ducklake-default-spec-version"}, - {"IcebergEnabled", "iceberg-enabled"}, - {"IcebergRegion", "iceberg-region"}, - {"IcebergNamespace", "iceberg-namespace"}, {"FlightSessionIdleTTL", "flight-session-idle-ttl"}, {"FlightSessionReapInterval", "flight-session-reap-interval"}, {"FlightHandleIdleTTL", "flight-handle-idle-ttl"}, diff --git a/configresolve/resolve.go b/configresolve/resolve.go index 241aeef0..0245758f 100644 --- a/configresolve/resolve.go +++ b/configresolve/resolve.go @@ -48,9 +48,6 @@ type CLIInputs struct { DuckLakeDeltaCatalogEnabled bool DuckLakeDeltaCatalogPath string DuckLakeDefaultSpecVersion string - IcebergEnabled bool - IcebergRegion string - IcebergNamespace string ProcessMinWorkers int ProcessMaxWorkers int ProcessRetireOnSessionEnd bool @@ -317,15 +314,6 @@ func ResolveEffective(fileCfg *configloader.FileConfig, cli CLIInputs, getenv fu if fileCfg.DuckLake.DeltaCatalogPath != "" { cfg.DuckLake.DeltaCatalogPath = fileCfg.DuckLake.DeltaCatalogPath } - if fileCfg.Iceberg.Enabled != nil { - cfg.Iceberg.Enabled = *fileCfg.Iceberg.Enabled - } - if fileCfg.Iceberg.Region != "" { - cfg.Iceberg.Region = fileCfg.Iceberg.Region - } - if fileCfg.Iceberg.Namespace != "" { - cfg.Iceberg.Namespace = fileCfg.Iceberg.Namespace - } if fileCfg.DuckLake.DisableMetadataThreadLocalCache != nil { cfg.DuckLake.DisableMetadataThreadLocalCache = boolPtr(*fileCfg.DuckLake.DisableMetadataThreadLocalCache) } @@ -593,19 +581,6 @@ func ResolveEffective(fileCfg *configloader.FileConfig, cli CLIInputs, getenv fu if v := getenv("DUCKGRES_DUCKLAKE_DELTA_CATALOG_PATH"); v != "" { cfg.DuckLake.DeltaCatalogPath = v } - if v := getenv("DUCKGRES_ICEBERG_ENABLED"); v != "" { - if b, err := strconv.ParseBool(v); err == nil { - cfg.Iceberg.Enabled = b - } else { - warn("Invalid DUCKGRES_ICEBERG_ENABLED: " + err.Error()) - } - } - if v := 
getenv("DUCKGRES_ICEBERG_REGION"); v != "" { - cfg.Iceberg.Region = v - } - if v := getenv("DUCKGRES_ICEBERG_NAMESPACE"); v != "" { - cfg.Iceberg.Namespace = v - } if v := getenv("DUCKGRES_DUCKLAKE_DISABLE_METADATA_THREAD_LOCAL_CACHE"); v != "" { if b, err := strconv.ParseBool(v); err == nil { cfg.DuckLake.DisableMetadataThreadLocalCache = boolPtr(b) @@ -1059,15 +1034,6 @@ func ResolveEffective(fileCfg *configloader.FileConfig, cli CLIInputs, getenv fu if cli.Set["ducklake-delta-catalog-path"] { cfg.DuckLake.DeltaCatalogPath = cli.DuckLakeDeltaCatalogPath } - if cli.Set["iceberg-enabled"] { - cfg.Iceberg.Enabled = cli.IcebergEnabled - } - if cli.Set["iceberg-region"] { - cfg.Iceberg.Region = cli.IcebergRegion - } - if cli.Set["iceberg-namespace"] { - cfg.Iceberg.Namespace = cli.IcebergNamespace - } if cli.Set["ducklake-default-spec-version"] { cfg.DuckLake.SpecVersion = cli.DuckLakeDefaultSpecVersion } diff --git a/controlplane/admin/api.go b/controlplane/admin/api.go index 6cb06997..99454cad 100644 --- a/controlplane/admin/api.go +++ b/controlplane/admin/api.go @@ -427,9 +427,6 @@ func managedWarehouseUpsertColumns() []string { "s3_url_style", "s3_delta_catalog_enabled", "s3_delta_catalog_path", - "iceberg_enabled", - "iceberg_region", - "iceberg_namespace", "worker_identity_namespace", "worker_identity_service_account_name", "worker_identity_iam_role_arn", @@ -453,8 +450,6 @@ func managedWarehouseUpsertColumns() []string { "metadata_store_status_message", "s3_state", "s3_status_message", - "iceberg_state", - "iceberg_status_message", "identity_state", "identity_status_message", "secrets_state", @@ -532,7 +527,6 @@ type managedWarehouseRequest struct { MetadataStore configstore.ManagedWarehouseMetadataStore `json:"metadata_store"` PgBouncer configstore.ManagedWarehousePgBouncer `json:"pgbouncer"` S3 configstore.ManagedWarehouseS3 `json:"s3"` - Iceberg configstore.ManagedWarehouseIceberg `json:"iceberg"` WorkerIdentity configstore.ManagedWarehouseWorkerIdentity 
`json:"worker_identity"` WarehouseDatabaseCredentials configstore.SecretRef `json:"warehouse_database_credentials"` MetadataStoreCredentials configstore.SecretRef `json:"metadata_store_credentials"` @@ -546,8 +540,6 @@ type managedWarehouseRequest struct { MetadataStoreStatusMessage string `json:"metadata_store_status_message"` S3State configstore.ManagedWarehouseProvisioningState `json:"s3_state"` S3StatusMessage string `json:"s3_status_message"` - IcebergState configstore.ManagedWarehouseProvisioningState `json:"iceberg_state"` - IcebergStatusMessage string `json:"iceberg_status_message"` IdentityState configstore.ManagedWarehouseProvisioningState `json:"identity_state"` IdentityStatusMessage string `json:"identity_status_message"` SecretsState configstore.ManagedWarehouseProvisioningState `json:"secrets_state"` @@ -1087,10 +1079,10 @@ func validateDefaultCatalog(raw string) (string, error) { if raw == "" { return "", nil } - if raw == "iceberg" { + if raw == "ducklake" { return raw, nil } - return "", errors.New("default_catalog must be empty or iceberg") + return "", errors.New("default_catalog must be empty or ducklake") } func (h *apiHandler) deleteUser(c *gin.Context) { diff --git a/controlplane/admin/api_test.go b/controlplane/admin/api_test.go index 9e9ce3fd..19fcdf26 100644 --- a/controlplane/admin/api_test.go +++ b/controlplane/admin/api_test.go @@ -327,9 +327,9 @@ func TestCreateUserAcceptsDefaultCatalog(t *testing.T) { body := []byte(`{ "org_id": "analytics", - "username": "iceberg_reader", + "username": "ducklake_reader", "password": "secret", - "default_catalog": "iceberg" + "default_catalog": "ducklake" }`) req := httptest.NewRequest(http.MethodPost, "/api/v1/users", bytes.NewReader(body)) req.Header.Set("Content-Type", "application/json") @@ -339,20 +339,20 @@ func TestCreateUserAcceptsDefaultCatalog(t *testing.T) { if rec.Code != http.StatusCreated { t.Fatalf("status = %d, want %d: %s", rec.Code, http.StatusCreated, rec.Body.String()) } - user := 
store.users["analytics/iceberg_reader"] + user := store.users["analytics/ducklake_reader"] if user == nil { t.Fatal("expected user to be created") } - if user.DefaultCatalog != "iceberg" { - t.Fatalf("DefaultCatalog = %q, want iceberg", user.DefaultCatalog) + if user.DefaultCatalog != "ducklake" { + t.Fatalf("DefaultCatalog = %q, want ducklake", user.DefaultCatalog) } var response configstore.OrgUser if err := json.Unmarshal(rec.Body.Bytes(), &response); err != nil { t.Fatalf("unmarshal response: %v", err) } - if response.DefaultCatalog != "iceberg" { - t.Fatalf("response DefaultCatalog = %q, want iceberg", response.DefaultCatalog) + if response.DefaultCatalog != "ducklake" { + t.Fatalf("response DefaultCatalog = %q, want ducklake", response.DefaultCatalog) } } @@ -362,9 +362,9 @@ func TestCreateUserRejectsInvalidDefaultCatalog(t *testing.T) { body := []byte(`{ "org_id": "analytics", - "username": "iceberg_reader", + "username": "ducklake_reader", "password": "secret", - "default_catalog": "iceberg;DROP TABLE x" + "default_catalog": "ducklake;DROP TABLE x" }`) req := httptest.NewRequest(http.MethodPost, "/api/v1/users", bytes.NewReader(body)) req.Header.Set("Content-Type", "application/json") @@ -381,44 +381,44 @@ func TestCreateUserRejectsInvalidDefaultCatalog(t *testing.T) { func TestUpdateUserDefaultCatalogPreserveSetAndClear(t *testing.T) { store := newFakeAPIStore() - store.users["analytics/iceberg_reader"] = &configstore.OrgUser{ + store.users["analytics/ducklake_reader"] = &configstore.OrgUser{ OrgID: "analytics", - Username: "iceberg_reader", + Username: "ducklake_reader", Password: "hash", - DefaultCatalog: "iceberg", + DefaultCatalog: "ducklake", } router := newTestAPIRouter(store) - req := httptest.NewRequest(http.MethodPut, "/api/v1/orgs/analytics/users/iceberg_reader", bytes.NewReader([]byte(`{"passthrough":true}`))) + req := httptest.NewRequest(http.MethodPut, "/api/v1/orgs/analytics/users/ducklake_reader", 
bytes.NewReader([]byte(`{"passthrough":true}`))) req.Header.Set("Content-Type", "application/json") rec := httptest.NewRecorder() router.ServeHTTP(rec, req) if rec.Code != http.StatusOK { t.Fatalf("preserve status = %d, want %d: %s", rec.Code, http.StatusOK, rec.Body.String()) } - if got := store.users["analytics/iceberg_reader"].DefaultCatalog; got != "iceberg" { - t.Fatalf("preserved DefaultCatalog = %q, want iceberg", got) + if got := store.users["analytics/ducklake_reader"].DefaultCatalog; got != "ducklake" { + t.Fatalf("preserved DefaultCatalog = %q, want ducklake", got) } - req = httptest.NewRequest(http.MethodPut, "/api/v1/orgs/analytics/users/iceberg_reader", bytes.NewReader([]byte(`{"default_catalog":"iceberg"}`))) + req = httptest.NewRequest(http.MethodPut, "/api/v1/orgs/analytics/users/ducklake_reader", bytes.NewReader([]byte(`{"default_catalog":"ducklake"}`))) req.Header.Set("Content-Type", "application/json") rec = httptest.NewRecorder() router.ServeHTTP(rec, req) if rec.Code != http.StatusOK { t.Fatalf("set status = %d, want %d: %s", rec.Code, http.StatusOK, rec.Body.String()) } - if got := store.users["analytics/iceberg_reader"].DefaultCatalog; got != "iceberg" { - t.Fatalf("updated DefaultCatalog = %q, want iceberg", got) + if got := store.users["analytics/ducklake_reader"].DefaultCatalog; got != "ducklake" { + t.Fatalf("updated DefaultCatalog = %q, want ducklake", got) } - req = httptest.NewRequest(http.MethodPut, "/api/v1/orgs/analytics/users/iceberg_reader", bytes.NewReader([]byte(`{"default_catalog":""}`))) + req = httptest.NewRequest(http.MethodPut, "/api/v1/orgs/analytics/users/ducklake_reader", bytes.NewReader([]byte(`{"default_catalog":""}`))) req.Header.Set("Content-Type", "application/json") rec = httptest.NewRecorder() router.ServeHTTP(rec, req) if rec.Code != http.StatusOK { t.Fatalf("clear status = %d, want %d: %s", rec.Code, http.StatusOK, rec.Body.String()) } - if got := store.users["analytics/iceberg_reader"].DefaultCatalog; got != 
"" { + if got := store.users["analytics/ducklake_reader"].DefaultCatalog; got != "" { t.Fatalf("cleared DefaultCatalog = %q, want empty", got) } } @@ -644,7 +644,7 @@ func TestPutWarehouseDisablesPgBouncerWhenSetToFalse(t *testing.T) { } } -func TestPutWarehouseEnablesIcebergWhenSetToTrue(t *testing.T) { +func TestPutWarehouseRejectsIcebergUpdate(t *testing.T) { store := newFakeAPIStore() seedOrgWithWarehouse(store, "analytics") router := newTestAPIRouter(store) @@ -655,11 +655,8 @@ func TestPutWarehouseEnablesIcebergWhenSetToTrue(t *testing.T) { rec := httptest.NewRecorder() router.ServeHTTP(rec, req) - if rec.Code != http.StatusOK { - t.Fatalf("status = %d, want %d: %s", rec.Code, http.StatusOK, rec.Body.String()) - } - if !store.warehouses["analytics"].Iceberg.Enabled { - t.Fatal("expected iceberg.enabled=true after PUT") + if rec.Code != http.StatusBadRequest { + t.Fatalf("status = %d, want %d: %s", rec.Code, http.StatusBadRequest, rec.Body.String()) } } diff --git a/controlplane/configstore/iceberg_backend_test.go b/controlplane/configstore/iceberg_backend_test.go deleted file mode 100644 index 4964afc3..00000000 --- a/controlplane/configstore/iceberg_backend_test.go +++ /dev/null @@ -1,22 +0,0 @@ -package configstore - -import "testing" - -func TestManagedWarehouseIceberg_ResolvedBackend(t *testing.T) { - cases := []struct { - name string - in ManagedWarehouseIceberg - want string - }{ - {"empty defaults to lakekeeper", ManagedWarehouseIceberg{}, IcebergBackendLakekeeper}, - {"explicit lakekeeper", ManagedWarehouseIceberg{Backend: IcebergBackendLakekeeper}, IcebergBackendLakekeeper}, - {"unknown passthrough", ManagedWarehouseIceberg{Backend: "future"}, "future"}, - } - for _, c := range cases { - t.Run(c.name, func(t *testing.T) { - if got := c.in.ResolvedBackend(); got != c.want { - t.Errorf("ResolvedBackend() = %q, want %q", got, c.want) - } - }) - } -} diff --git a/controlplane/configstore/migrations/000001_initial_config_schema.sql 
b/controlplane/configstore/migrations/000001_initial_config_schema.sql index 19c1d067..f25d0734 100644 --- a/controlplane/configstore/migrations/000001_initial_config_schema.sql +++ b/controlplane/configstore/migrations/000001_initial_config_schema.sql @@ -210,17 +210,6 @@ CREATE TABLE IF NOT EXISTS duckgres_managed_warehouses ( s3_delta_catalog_enabled BOOLEAN DEFAULT true, s3_delta_catalog_path VARCHAR(1024), ducklake_enabled BOOLEAN DEFAULT false, - iceberg_enabled BOOLEAN DEFAULT false, - iceberg_backend VARCHAR(32) DEFAULT 'lakekeeper', - iceberg_namespace VARCHAR(255), - iceberg_region VARCHAR(64), - iceberg_lakekeeper_endpoint VARCHAR(512), - iceberg_lakekeeper_warehouse VARCHAR(128), - iceberg_lakekeeper_client_id VARCHAR(128), - iceberg_lakekeeper_o_auth2_server_uri VARCHAR(512), - iceberg_lakekeeper_client_credentials_namespace VARCHAR(255), - iceberg_lakekeeper_client_credentials_name VARCHAR(255), - iceberg_lakekeeper_client_credentials_key VARCHAR(255), worker_identity_namespace VARCHAR(255), worker_identity_service_account_name VARCHAR(255), worker_identity_iam_role_arn VARCHAR(512), @@ -244,8 +233,6 @@ CREATE TABLE IF NOT EXISTS duckgres_managed_warehouses ( metadata_store_status_message VARCHAR(1024), s3_state VARCHAR(32), s3_status_message VARCHAR(1024), - iceberg_state VARCHAR(32), - iceberg_status_message VARCHAR(1024), identity_state VARCHAR(32), identity_status_message VARCHAR(1024), secrets_state VARCHAR(32), @@ -286,17 +273,6 @@ ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS s3_url_style VA ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS s3_delta_catalog_enabled BOOLEAN DEFAULT true; ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS s3_delta_catalog_path VARCHAR(1024); ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS ducklake_enabled BOOLEAN DEFAULT false; -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_enabled BOOLEAN DEFAULT false; -ALTER TABLE 
duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_backend VARCHAR(32) DEFAULT 'lakekeeper'; -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_namespace VARCHAR(255); -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_region VARCHAR(64); -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_lakekeeper_endpoint VARCHAR(512); -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_lakekeeper_warehouse VARCHAR(128); -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_lakekeeper_client_id VARCHAR(128); -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_lakekeeper_o_auth2_server_uri VARCHAR(512); -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_lakekeeper_client_credentials_namespace VARCHAR(255); -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_lakekeeper_client_credentials_name VARCHAR(255); -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_lakekeeper_client_credentials_key VARCHAR(255); ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS worker_identity_namespace VARCHAR(255); ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS worker_identity_service_account_name VARCHAR(255); ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS worker_identity_iam_role_arn VARCHAR(512); @@ -320,8 +296,6 @@ ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS metadata_store_ ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS metadata_store_status_message VARCHAR(1024); ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS s3_state VARCHAR(32); ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS s3_status_message VARCHAR(1024); -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS iceberg_state VARCHAR(32); -ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS 
iceberg_status_message VARCHAR(1024); ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS identity_state VARCHAR(32); ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS identity_status_message VARCHAR(1024); ALTER TABLE duckgres_managed_warehouses ADD COLUMN IF NOT EXISTS secrets_state VARCHAR(32); diff --git a/controlplane/configstore/migrations/000004_drop_iceberg_columns.sql b/controlplane/configstore/migrations/000004_drop_iceberg_columns.sql new file mode 100644 index 00000000..79cd9a2a --- /dev/null +++ b/controlplane/configstore/migrations/000004_drop_iceberg_columns.sql @@ -0,0 +1,16 @@ +-- +goose Up + +ALTER TABLE duckgres_managed_warehouses + DROP COLUMN IF EXISTS iceberg_enabled, + DROP COLUMN IF EXISTS iceberg_backend, + DROP COLUMN IF EXISTS iceberg_namespace, + DROP COLUMN IF EXISTS iceberg_region, + DROP COLUMN IF EXISTS iceberg_lakekeeper_endpoint, + DROP COLUMN IF EXISTS iceberg_lakekeeper_warehouse, + DROP COLUMN IF EXISTS iceberg_lakekeeper_client_id, + DROP COLUMN IF EXISTS iceberg_lakekeeper_o_auth2_server_uri, + DROP COLUMN IF EXISTS iceberg_lakekeeper_client_credentials_namespace, + DROP COLUMN IF EXISTS iceberg_lakekeeper_client_credentials_name, + DROP COLUMN IF EXISTS iceberg_lakekeeper_client_credentials_key, + DROP COLUMN IF EXISTS iceberg_state, + DROP COLUMN IF EXISTS iceberg_status_message; diff --git a/controlplane/configstore/models.go b/controlplane/configstore/models.go index c3c7e604..de847e17 100644 --- a/controlplane/configstore/models.go +++ b/controlplane/configstore/models.go @@ -108,15 +108,7 @@ type ManagedWarehouseDatabase struct { } // Metadata-store kinds, stored verbatim in ManagedWarehouseMetadataStore.Kind -// and mirrored onto the Duckling CR's spec.metadataStore.type. The control -// plane provisions two of these: -// -// - "cnpg-shard": the per-tenant Lakekeeper Iceberg catalog Postgres backend -// on the shared CloudNativePG shard (always paired with iceberg.enabled). 
-// - "external": a pre-existing Postgres (e.g. RDS/Aurora), referenced by -// endpoint + an AWS Secrets Manager secret for the password. Backs either a -// DuckLake catalog (iceberg disabled) or the Lakekeeper catalog (iceberg -// enabled). +// and mirrored onto the Duckling CR's spec.metadataStore.type. const ( MetadataStoreKindCnpgShard = "cnpg-shard" MetadataStoreKindExternal = "external" @@ -187,83 +179,11 @@ type ManagedWarehouseWorkerIdentity struct { } // ManagedWarehouseDuckLake captures whether the org's DuckLake catalog is -// enabled. Decoupled from the metadata-store type and from Iceberg: a duckling -// may run DuckLake, Iceberg, or both, on any metadata backend (cnpg / -// external). The DuckLake catalog lives in the metadata Postgres — the -// per-tenant database for cnpg-shard, or the metadata database for external. -// -// For ducklings created before this field existed the column is absent/false; -// the worker activator does NOT key off it directly — it reads the Duckling -// CR's spec.ducklake.enabled (present/absent) so legacy ducklings keep their -// implied behavior (external ⇒ DuckLake, cnpg ⇒ none). +// enabled. DuckLake is the only supported warehouse catalog. type ManagedWarehouseDuckLake struct { Enabled bool `gorm:"default:false" json:"enabled"` } -// ManagedWarehouseIceberg captures per-org Iceberg catalog config. The -// only supported backend is Lakekeeper: a per-org Lakekeeper instance -// vends the Iceberg REST catalog. The provisioner creates the Lakekeeper -// CR + a warehouse pointing at the org's existing S3 bucket (path -// /lakekeeper//) and persists the endpoint + -// OAuth2 client credentials back here. The worker activator reads these -// and emits a (TYPE ICEBERG, CLIENT_ID/CLIENT_SECRET/OAUTH2_SERVER_URI) -// DuckDB SECRET + ATTACH at session init. -// -// The Backend column is retained for forward-compat / observability; the -// legacy "s3_tables" value is no longer honored anywhere in the code path. 
-type ManagedWarehouseIceberg struct { - Enabled bool `gorm:"default:false" json:"enabled"` - - // Backend is retained for schema compat. Empty/unset is treated as - // "lakekeeper" by callers; any other value is also treated as - // "lakekeeper" since no other backend is implemented. - Backend string `gorm:"size:32;default:'lakekeeper'" json:"backend"` - - // Namespace is the default Iceberg namespace inside the catalog. - Namespace string `gorm:"size:255" json:"namespace"` - - // Region is the AWS region for the Lakekeeper warehouse storage profile. - Region string `gorm:"size:64" json:"region"` - - // Lakekeeper fields. Populated by the provisioner after the per-org - // Lakekeeper is ready. - LakekeeperEndpoint string `gorm:"size:512" json:"lakekeeper_endpoint,omitempty"` - - // LakekeeperWarehouse is the warehouse NAME (e.g. "org-acme"), not the - // UUID. Iceberg REST clients pass this as the `warehouse` parameter to - // /v1/config and the server returns the UUID as a prefix for subsequent - // calls. PR2's worker-side ATTACH SQL uses this value directly. - LakekeeperWarehouse string `gorm:"size:128" json:"lakekeeper_warehouse,omitempty"` - LakekeeperClientID string `gorm:"size:128" json:"lakekeeper_client_id,omitempty"` - - // LakekeeperOAuth2ServerURI is the OAuth2 token endpoint URI for the - // duckling-side CREATE SECRET. Empty during PR1 (allowall mode); - // populated by PR3 once OIDC SA-token auth is wired. PR2 worker code - // must guard against empty and either skip the OAuth2 fields on the - // CREATE SECRET statement or emit a different secret shape. - LakekeeperOAuth2ServerURI string `gorm:"size:512" json:"lakekeeper_oauth2_server_uri,omitempty"` - - // LakekeeperClientCredentials holds the OAuth2 client_secret used by - // the duckling to authenticate to Lakekeeper. The control plane - // resolves this just before sending the activation payload. 
- LakekeeperClientCredentials SecretRef `gorm:"embedded;embeddedPrefix:lakekeeper_client_credentials_" json:"lakekeeper_client_credentials"` -} - -// IcebergBackend constants — string-typed to keep the GORM tag happy. -const ( - IcebergBackendLakekeeper = "lakekeeper" -) - -// ResolvedBackend returns Backend with the empty-string default applied. -// Callers should prefer this over reading Backend directly so that rows -// migrated from earlier schemas (no Backend column) behave correctly. -func (i ManagedWarehouseIceberg) ResolvedBackend() string { - if i.Backend == "" { - return IcebergBackendLakekeeper - } - return i.Backend -} - // ManagedWarehouse is the config-store source of truth for an org's managed warehouse metadata. type ManagedWarehouse struct { OrgID string `gorm:"primaryKey;size:255" json:"org_id"` @@ -277,7 +197,6 @@ type ManagedWarehouse struct { PgBouncer ManagedWarehousePgBouncer `gorm:"embedded;embeddedPrefix:pgbouncer_" json:"pgbouncer"` S3 ManagedWarehouseS3 `gorm:"embedded;embeddedPrefix:s3_" json:"s3"` DuckLake ManagedWarehouseDuckLake `gorm:"embedded;embeddedPrefix:ducklake_" json:"ducklake"` - Iceberg ManagedWarehouseIceberg `gorm:"embedded;embeddedPrefix:iceberg_" json:"iceberg"` WorkerIdentity ManagedWarehouseWorkerIdentity `gorm:"embedded;embeddedPrefix:worker_identity_" json:"worker_identity"` WarehouseDatabaseCredentials SecretRef `gorm:"embedded;embeddedPrefix:warehouse_database_credentials_" json:"warehouse_database_credentials"` @@ -293,8 +212,6 @@ type ManagedWarehouse struct { MetadataStoreStatusMessage string `gorm:"size:1024" json:"metadata_store_status_message"` S3State ManagedWarehouseProvisioningState `gorm:"size:32" json:"s3_state"` S3StatusMessage string `gorm:"size:1024" json:"s3_status_message"` - IcebergState ManagedWarehouseProvisioningState `gorm:"size:32" json:"iceberg_state"` - IcebergStatusMessage string `gorm:"size:1024" json:"iceberg_status_message"` IdentityState ManagedWarehouseProvisioningState `gorm:"size:32" 
json:"identity_state"` IdentityStatusMessage string `gorm:"size:1024" json:"identity_status_message"` SecretsState ManagedWarehouseProvisioningState `gorm:"size:32" json:"secrets_state"` @@ -571,7 +488,6 @@ type ManagedWarehouseConfig struct { MetadataStore ManagedWarehouseMetadataStore PgBouncer ManagedWarehousePgBouncer S3 ManagedWarehouseS3 - Iceberg ManagedWarehouseIceberg WorkerIdentity ManagedWarehouseWorkerIdentity WarehouseDatabaseCredentials SecretRef @@ -587,8 +503,6 @@ type ManagedWarehouseConfig struct { MetadataStoreStatusMessage string S3State ManagedWarehouseProvisioningState S3StatusMessage string - IcebergState ManagedWarehouseProvisioningState - IcebergStatusMessage string IdentityState ManagedWarehouseProvisioningState IdentityStatusMessage string SecretsState ManagedWarehouseProvisioningState @@ -610,7 +524,6 @@ func copyManagedWarehouseConfig(warehouse *ManagedWarehouse) *ManagedWarehouseCo MetadataStore: warehouse.MetadataStore, PgBouncer: warehouse.PgBouncer, S3: warehouse.S3, - Iceberg: warehouse.Iceberg, WorkerIdentity: warehouse.WorkerIdentity, WarehouseDatabaseCredentials: warehouse.WarehouseDatabaseCredentials, MetadataStoreCredentials: warehouse.MetadataStoreCredentials, @@ -624,8 +537,6 @@ func copyManagedWarehouseConfig(warehouse *ManagedWarehouse) *ManagedWarehouseCo MetadataStoreStatusMessage: warehouse.MetadataStoreStatusMessage, S3State: warehouse.S3State, S3StatusMessage: warehouse.S3StatusMessage, - IcebergState: warehouse.IcebergState, - IcebergStatusMessage: warehouse.IcebergStatusMessage, IdentityState: warehouse.IdentityState, IdentityStatusMessage: warehouse.IdentityStatusMessage, SecretsState: warehouse.SecretsState, diff --git a/controlplane/configstore/store.go b/controlplane/configstore/store.go index b7699070..736cb68f 100644 --- a/controlplane/configstore/store.go +++ b/controlplane/configstore/store.go @@ -50,7 +50,6 @@ type Snapshot struct { // non-empty values a client may request. 
const ( catalogDuckLake = "ducklake" - catalogIceberg = "iceberg" ) // PostgresConnectionResolution is the result of resolving and authenticating a @@ -70,11 +69,10 @@ type PostgresConnectionResolution struct { // SNIResolved is true when the managed hostname resolved to a known org. SNIResolved bool // EffectiveCatalog is the catalog the session should default to, selected by - // the startup `database` param: "" (use the per-user/attached default), - // "ducklake", or "iceberg". + // the startup `database` param: "" (use the attached default) or "ducklake". EffectiveCatalog string // CatalogValid is false when the requested `database` is not a selectable - // catalog name (anything other than "", "ducklake", "iceberg"). + // catalog name (anything other than "" or "ducklake"). CatalogValid bool // Valid is true when (OrgID, username, password) authenticated. Valid bool @@ -244,8 +242,8 @@ func (cs *ConfigStore) load() (*Snapshot, error) { if u.Passthrough { snap.OrgUserPassthrough[key] = true } - if u.DefaultCatalog != "" { - snap.OrgUserDefaultCatalog[key] = u.DefaultCatalog + if strings.EqualFold(u.DefaultCatalog, catalogDuckLake) { + snap.OrgUserDefaultCatalog[key] = catalogDuckLake } } snap.Orgs[o.Name] = oc @@ -359,8 +357,7 @@ func (cs *ConfigStore) ResolvePostgresConnection(startupDatabase, sniPrefix stri result := PostgresConnectionResolution{} // The startup `database` param is now pure catalog selection, not identity. - // Valid values: "" (use the per-user/attached default), "ducklake", or - // "iceberg". Anything else fails closed — there is no logical-name masking, + // Valid values: "" (use the attached default) or "ducklake". Anything else fails closed — there is no logical-name masking, // so an arbitrary name no longer routes anywhere. 
switch strings.ToLower(strings.TrimSpace(startupDatabase)) { case "": @@ -368,9 +365,6 @@ func (cs *ConfigStore) ResolvePostgresConnection(startupDatabase, sniPrefix stri case catalogDuckLake: result.EffectiveCatalog = catalogDuckLake result.CatalogValid = true - case catalogIceberg: - result.EffectiveCatalog = catalogIceberg - result.CatalogValid = true } cs.mu.RLock() @@ -607,28 +601,6 @@ var ErrWarehouseStateMismatch = errors.New("warehouse not in expected state") // has no row in duckgres_managed_warehouses. var ErrWarehouseNotFound = errors.New("warehouse not found") -// UpdateIcebergConfig writes the supplied column updates to the org's -// warehouse row without CAS'ing on the top-level state. Used by the -// Lakekeeper provisioner — Iceberg sub-state runs in parallel with the -// main warehouse state machine, so persisting the Lakekeeper endpoint -// after a top-level state transition shouldn't silently no-op. -// -// Caller-side discipline: the updates map should only contain -// iceberg_* columns. Untyped to keep the controller's WarehouseStore -// interface independent of the column list. -func (cs *ConfigStore) UpdateIcebergConfig(orgID string, updates map[string]interface{}) error { - result := cs.db.Model(&ManagedWarehouse{}). - Where("org_id = ?", orgID). - Updates(updates) - if result.Error != nil { - return fmt.Errorf("update iceberg config: %w", result.Error) - } - if result.RowsAffected == 0 { - return fmt.Errorf("warehouse %q: %w", orgID, ErrWarehouseNotFound) - } - return nil -} - func (cs *ConfigStore) UpdateWarehouseState(orgID string, expectedState ManagedWarehouseProvisioningState, updates map[string]interface{}) error { result := cs.db.Model(&ManagedWarehouse{}). Where("org_id = ? AND state = ?", orgID, expectedState). @@ -642,22 +614,6 @@ func (cs *ConfigStore) UpdateWarehouseState(orgID string, expectedState ManagedW return nil } -// GetManagedWarehouseIceberg reads the embedded Iceberg config for an -// org. 
Returns (nil, nil) when the org has no warehouse row so callers -// can distinguish "never provisioned" from a DB error. -func (cs *ConfigStore) GetManagedWarehouseIceberg(orgID string) (*ManagedWarehouseIceberg, error) { - var warehouse ManagedWarehouse - err := cs.db.First(&warehouse, "org_id = ?", orgID).Error - if err != nil { - if errors.Is(err, gorm.ErrRecordNotFound) { - return nil, nil - } - return nil, fmt.Errorf("get iceberg config for %q: %w", orgID, err) - } - ic := warehouse.Iceberg - return &ic, nil -} - func resolveRuntimeSchema(db *gorm.DB) (string, error) { var currentSchema string if err := db.Raw("SELECT current_schema()").Scan(&currentSchema).Error; err != nil { diff --git a/controlplane/configstore/store_test.go b/controlplane/configstore/store_test.go index 1712f45c..711d69fb 100644 --- a/controlplane/configstore/store_test.go +++ b/controlplane/configstore/store_test.go @@ -296,7 +296,7 @@ func TestResolvePostgresConnection(t *testing.T) { {OrgID: "test-org-smoke-1778167994", Username: "root"}: true, }, OrgUserDefaultCatalog: map[OrgUserKey]string{ - {OrgID: "billing", Username: "root"}: "iceberg", + {OrgID: "billing", Username: "root"}: "ducklake", }, }, } @@ -320,13 +320,10 @@ func TestResolvePostgresConnection(t *testing.T) { } }) - t.Run("iceberg catalog selected", func(t *testing.T) { + t.Run("iceberg catalog rejected", func(t *testing.T) { got := cs.ResolvePostgresConnection("iceberg", "test-org-smoke-1778167994", true, "root", "secret") - if !got.CatalogValid || got.EffectiveCatalog != "iceberg" { - t.Fatalf("catalog = (valid=%v, %q), want iceberg: %+v", got.CatalogValid, got.EffectiveCatalog, got) - } - if !got.Valid { - t.Fatalf("expected valid auth: %+v", got) + if got.CatalogValid { + t.Fatalf("iceberg must not be a selectable catalog: %+v", got) } }) @@ -341,7 +338,7 @@ func TestResolvePostgresConnection(t *testing.T) { }) t.Run("legacy database name is no longer a valid catalog", func(t *testing.T) { - // The org's old database_name is
not "ducklake"/"iceberg", so it fails the + // The org's old database_name is not "ducklake", so it fails the // catalog check even though SNI+auth would otherwise succeed. got := cs.ResolvePostgresConnection("test_org_smoke_1778167994", "test-org-smoke-1778167994", true, "root", "secret") if got.CatalogValid { @@ -385,8 +382,8 @@ func TestResolvePostgresConnection(t *testing.T) { if got.OrgID != "billing" { t.Fatalf("OrgID = %q, want billing (via hostname alias)", got.OrgID) } - if got.DefaultCatalog != "iceberg" { - t.Fatalf("DefaultCatalog = %q, want iceberg", got.DefaultCatalog) + if got.DefaultCatalog != "ducklake" { + t.Fatalf("DefaultCatalog = %q, want ducklake", got.DefaultCatalog) } }) } diff --git a/controlplane/control.go b/controlplane/control.go index 1a30617c..846293b6 100644 --- a/controlplane/control.go +++ b/controlplane/control.go @@ -246,7 +246,6 @@ type ConfigStoreInterface interface { // OrgRouterInterface abstracts the org router for the control plane. type OrgRouterInterface interface { StackForOrg(orgID string) (pool WorkerPool, sessions *SessionManager, rebalancer *MemoryRebalancer, ok bool) - IcebergConfigForOrg(orgID string) (server.IcebergConfig, bool) IsMigratingForOrg(orgID string) bool ShutdownAll() } @@ -908,8 +907,7 @@ func (cp *ControlPlane) handleConnection(conn net.Conn) { // Honor a client-supplied connect-time search_path from the startup // `options` parameter (libpq `options=-c search_path=...`, PGOPTIONS, or - // pgjdbc `currentSchema`), so a session can pick its default catalog at - // connect (e.g. iceberg.public). Sanitized here at the trust boundary; + // pgjdbc `currentSchema`). Sanitized here at the trust boundary; // empty/invalid falls back to the worker's default search_path. 
startupOptions := server.ParseStartupOptions(startupParams["options"]) var clientSearchPath string @@ -958,12 +956,12 @@ func (cp *ControlPlane) handleConnection(conn net.Conn) { // In multi-tenant mode the org is resolved solely from the managed hostname // (SNI); the user is authenticated within that org. The startup `database` // param no longer identifies the org — it selects which attached catalog - // (ducklake/iceberg) the session defaults to. + // (ducklake) the session defaults to. var ( orgID string passthroughUser bool defaultCatalog string - requestedCatalog string // "" | "ducklake" | "iceberg" (validated below) + requestedCatalog string // "" | "ducklake" (validated below) ) if cp.configStore != nil { sni := tlsConn.ConnectionState().ServerName @@ -999,11 +997,11 @@ func (cp *ControlPlane) handleConnection(conn net.Conn) { } if !resolution.CatalogValid { // The startup `database` is now a catalog selector; only - // "ducklake"/"iceberg"/empty are valid. No logical-name masking. + // "ducklake"/empty are valid. No logical-name masking. clog.Warn("Postgres connection rejected: requested database is not a selectable catalog.", "database", database, "org", resolution.OrgID) _ = server.WriteErrorResponse(writer, "FATAL", "3D000", - fmt.Sprintf("database %q does not exist (connect with \"ducklake\" or \"iceberg\")", database)) + fmt.Sprintf("database %q does not exist (connect with \"ducklake\")", database)) _ = writer.Flush() return } @@ -1192,18 +1190,13 @@ func (cp *ControlPlane) handleConnection(conn net.Conn) { // Probe which catalogs the worker actually attached for this session, then // resolve the real catalog the session defaults to. The startup `database` - // selected "ducklake"/"iceberg"/"" (default); fail closed (3D000) if the - // requested catalog isn't attached. + // selected "ducklake"/"" (default); fail closed (3D000) if the requested + // catalog isn't attached. 
attachCtx, attachCancel := context.WithTimeout(context.Background(), cp.cfg.SessionInitTimeout) duckLakeAttached, dlErr := sessionmeta.HasAttachedCatalog(attachCtx, executor, physicalDuckLakeCatalog) - icebergAttached, icErr := sessionmeta.HasAttachedCatalog(attachCtx, executor, physicalIcebergCatalog) attachCancel() - probeErr := dlErr - if probeErr == nil { - probeErr = icErr - } - if probeErr != nil { - clog.Error("Failed to detect attached catalogs.", "error", probeErr) + if dlErr != nil { + clog.Error("Failed to detect attached catalogs.", "error", dlErr) _ = server.WriteErrorResponse(writer, "FATAL", "XX000", "failed to detect attached catalogs") _ = writer.Flush() return @@ -1211,10 +1204,10 @@ func (cp *ControlPlane) handleConnection(conn net.Conn) { var effectiveCatalog string if cp.configStore != nil { var ok bool - effectiveCatalog, ok = resolveEffectiveCatalog(requestedCatalog, defaultCatalog, duckLakeAttached, icebergAttached) + effectiveCatalog, ok = resolveEffectiveCatalog(requestedCatalog, defaultCatalog, duckLakeAttached) if !ok { clog.Warn("Postgres connection rejected: requested catalog is not available for this connection.", - "requested", requestedCatalog, "ducklake_attached", duckLakeAttached, "iceberg_attached", icebergAttached) + "requested", requestedCatalog, "ducklake_attached", duckLakeAttached) msg := "no catalog is available for this connection" if requestedCatalog != "" { msg = fmt.Sprintf("database %q does not exist", requestedCatalog) @@ -1227,12 +1220,9 @@ func (cp *ControlPlane) handleConnection(conn net.Conn) { // Single-tenant (process backend / static users): de-mask to the real // attached catalog when present; otherwise keep the client's database name // (plain DuckDB, no masking concern). No catalog-selection rejection here. 
- switch { - case duckLakeAttached: + if duckLakeAttached { effectiveCatalog = physicalDuckLakeCatalog - case icebergAttached: - effectiveCatalog = physicalIcebergCatalog - default: + } else { effectiveCatalog = database } } @@ -1247,19 +1237,6 @@ func (cp *ControlPlane) handleConnection(conn net.Conn) { // worker session stays in DuckDB's empty in-memory catalog (see the // passthrough branch below). if !passthroughUser { - // Prime the Iceberg REST catalog's schema list on this session connection - // before the compat-view bind below enumerates every attached catalog. - // Without this, the first session on a freshly-spawned (cold) worker fails - // the bind with `Schema with name "" not found` until the instance settles. - // Best-effort: if the prime errors, the real failure (if any) surfaces from - // InitSessionDatabaseMetadata with its full context. - if icebergAttached { - primeCtx, primeCancel := context.WithTimeout(context.Background(), cp.cfg.SessionInitTimeout) - if err := sessionmeta.PrimeIcebergCatalog(primeCtx, executor, physicalIcebergCatalog); err != nil { - clog.Warn("Failed to prime Iceberg catalog before session metadata init.", "error", err) - } - primeCancel() - } initCtx, initCancel := context.WithTimeout(context.Background(), cp.cfg.SessionInitTimeout) if err := sessionmeta.InitSessionDatabaseMetadata(initCtx, executor, effectiveCatalog); err != nil { initCancel() @@ -1274,8 +1251,7 @@ func (cp *ControlPlane) handleConnection(conn net.Conn) { // It must run here, not on the worker at session create: // InitSessionDatabaseMetadata's defer resets the catalog/search_path, so an // earlier value would be clobbered. A client-supplied search_path is - // best-effort; the configured catalog (Iceberg) fails closed because - // silently falling back would route the user to the wrong catalog. + // best-effort. 
if cmd, source := effectiveSessionDefaultCommand(clientSearchPath, effectiveCatalog); cmd != "" { spCtx, spCancel := context.WithTimeout(context.Background(), cp.cfg.SessionInitTimeout) _, err := executor.ExecContext(spCtx, cmd) @@ -1293,8 +1269,8 @@ func (cp *ControlPlane) handleConnection(conn net.Conn) { } else { // Passthrough: no pg_catalog views and no rewriting, but the session must // still land in its selected catalog instead of the empty in-memory one. - // Standalone passthrough does this via server.setDuckLakeDefault/ - // setIcebergDefault; the remote-worker path issues the equivalent here. + // Standalone passthrough does this via server.setDuckLakeDefault; the + // remote-worker path issues the equivalent here. if clientSearchPath != "" { clog.Warn("Ignoring client connect-time search_path for passthrough session.", "search_path", clientSearchPath) } @@ -1317,19 +1293,13 @@ func (cp *ControlPlane) handleConnection(conn net.Conn) { // Create real clientConn with FlightExecutor and worker assignment cc := server.NewClientConn(cp.srv, tlsConn, reader, writer, username, orgID, database, applicationName, executor, pid, secretKey, workerID, workerPod) - if cp.orgRouter != nil && orgID != "" { - if icebergCfg, ok := cp.orgRouter.IcebergConfigForOrg(orgID); ok { - server.SetConnectionIcebergConfig(cc, icebergCfg) - } - } // Record the resolved physical catalog so the transpiler selects the right - // backend profile (DuckLake/Iceberg DDL+DML policy) for this session. + // backend profile for this session. server.SetConnectionPhysicalCatalog(cc, effectiveCatalog) - // Catalog USE rewriting (expanding bare `USE ducklake`/`USE iceberg` to the + // Catalog USE rewriting (expanding bare `USE ducklake` to the // reliable two-part target) is a non-passthrough feature; passthrough sessions - // talk raw DuckDB, so keep it disabled for them. Enabled whenever either - // catalog is attached. 
- server.SetCatalogUseRewrite(cc, (duckLakeAttached || icebergAttached) && !passthroughUser) + // talk raw DuckDB, so keep it disabled for them. + server.SetCatalogUseRewrite(cc, duckLakeAttached && !passthroughUser) server.SetPassthrough(cc, passthroughUser) if orgID != "" { observeOrgPgSessionAccepted(orgID, passthroughUser) diff --git a/controlplane/flight_ingress_test.go b/controlplane/flight_ingress_test.go index c1d44e19..8fb5fd47 100644 --- a/controlplane/flight_ingress_test.go +++ b/controlplane/flight_ingress_test.go @@ -5,7 +5,6 @@ import ( "sync" "testing" - "github.com/posthog/duckgres/server" "github.com/posthog/duckgres/server/flightsqlingress" ) @@ -19,10 +18,6 @@ func (r *reconnectTestOrgRouter) StackForOrg(orgID string) (WorkerPool, *Session return nil, nil, nil, false } -func (r *reconnectTestOrgRouter) IcebergConfigForOrg(_ string) (server.IcebergConfig, bool) { - return server.IcebergConfig{}, false -} - func (r *reconnectTestOrgRouter) IsMigratingForOrg(_ string) bool { return false } func (r *reconnectTestOrgRouter) ShutdownAll() {} @@ -39,9 +34,6 @@ func (r *recordingOrgRouter) StackForOrg(orgID string) (WorkerPool, *SessionMana r.mu.Unlock() return nil, nil, nil, false } -func (r *recordingOrgRouter) IcebergConfigForOrg(_ string) (server.IcebergConfig, bool) { - return server.IcebergConfig{}, false -} func (r *recordingOrgRouter) IsMigratingForOrg(_ string) bool { return false } func (r *recordingOrgRouter) ShutdownAll() {} diff --git a/controlplane/k8s_pool_acquire.go b/controlplane/k8s_pool_acquire.go index ade11599..c4bde5c6 100644 --- a/controlplane/k8s_pool_acquire.go +++ b/controlplane/k8s_pool_acquire.go @@ -179,7 +179,6 @@ func (p *K8sWorkerPool) ActivateReservedWorker(ctx context.Context, worker *Mana }, OrgID: payload.OrgID, DuckLake: payload.DuckLake, - Iceberg: payload.Iceberg, }) } } diff --git a/controlplane/lakekeeper_inputs.go b/controlplane/lakekeeper_inputs.go deleted file mode 100644 index 719deeb0..00000000 --- 
a/controlplane/lakekeeper_inputs.go +++ /dev/null @@ -1,231 +0,0 @@ -//go:build kubernetes - -package controlplane - -import ( - "context" - "fmt" - "net/url" - "os" - "strconv" - "strings" - - "github.com/posthog/duckgres/controlplane/configstore" - "github.com/posthog/duckgres/controlplane/provisioner" -) - -// Environment variables that drive the Lakekeeper provisioner. The toggle is -// off by default so existing deployments are unaffected until an operator -// explicitly opts in. -const ( - envLakekeeperEnabled = "DUCKGRES_LAKEKEEPER_PROVISIONER_ENABLED" - - // Dev/orbstack fallback inputs, used only when an org has no - // Crossplane-provisioned Duckling CR to read infrastructure from - // (local MinIO + a plain Postgres). In prod these come from the CR. - envLakekeeperAdminDSN = "DUCKGRES_LAKEKEEPER_ADMIN_DSN" - envLakekeeperPGHost = "DUCKGRES_LAKEKEEPER_PG_HOST" - envLakekeeperPGPort = "DUCKGRES_LAKEKEEPER_PG_PORT" - envLakekeeperPGSSLMode = "DUCKGRES_LAKEKEEPER_PG_SSLMODE" - envLakekeeperS3Bucket = "DUCKGRES_LAKEKEEPER_S3_BUCKET" - envLakekeeperS3Region = "DUCKGRES_LAKEKEEPER_S3_REGION" - envLakekeeperS3Endpoint = "DUCKGRES_LAKEKEEPER_S3_ENDPOINT" - envLakekeeperS3Flavor = "DUCKGRES_LAKEKEEPER_S3_FLAVOR" - envLakekeeperS3KeyID = "DUCKGRES_LAKEKEEPER_S3_ACCESS_KEY_ID" - envLakekeeperS3Secret = "DUCKGRES_LAKEKEEPER_S3_SECRET_ACCESS_KEY" -) - -// lakekeeperProvisionerEnabled reports whether the control plane should wire -// the Lakekeeper provisioner branch into the provisioning controller. -func lakekeeperProvisionerEnabled() bool { - v := strings.ToLower(strings.TrimSpace(os.Getenv(envLakekeeperEnabled))) - return v == "1" || v == "true" || v == "yes" -} - -// newLakekeeperInputsResolver builds the per-org ProvisioningInputs resolver -// the provisioning controller calls before EnsureForOrg. -// -// Two sources, in priority order: -// -// 1. The Crossplane-provisioned Duckling CR (prod). 
The CR's metadata-store -// master credentials double as the admin connection that can CREATE the -// lakekeeper_ database/role, and its data-store bucket is the S3 -// warehouse Lakekeeper hands out. The Lakekeeper pod uses its own IRSA -// identity for S3 (no static creds), so we leave them empty. -// -// 2. Environment-configured fallback (dev/orbstack with MinIO). Used when no -// Duckling CR resolver is available, or the CR has no usable -// metadata-store/data-store yet. -// -// KubernetesAuthAudiences is always empty here: this wires the allowall + -// NetworkPolicy deployment shape. Enabling OIDC SA-token auth is a separate, -// flag-day change (see ProvisioningInputs.KubernetesAuthAudiences). -func newLakekeeperInputsResolver( - resolveDucklingStatus func(context.Context, string) (*provisioner.DucklingStatus, error), -) provisioner.LakekeeperInputsResolver { - return func(ctx context.Context, w *configstore.ManagedWarehouse) (provisioner.ProvisioningInputs, error) { - if resolveDucklingStatus != nil { - in, ok, err := lakekeeperInputsFromDuckling(ctx, resolveDucklingStatus, w.OrgID) - if err != nil { - // CR exists but is malformed/incomplete — fall through to the - // env fallback rather than hard-failing, mirroring how - // BuildActivationRequest degrades to the config-store path. - if env, ok := lakekeeperInputsFromEnv(); ok { - return env, nil - } - return provisioner.ProvisioningInputs{}, fmt.Errorf("resolve lakekeeper inputs for org %q from duckling CR: %w", w.OrgID, err) - } - if ok { - return in, nil - } - } - if env, ok := lakekeeperInputsFromEnv(); ok { - return env, nil - } - return provisioner.ProvisioningInputs{}, fmt.Errorf("no lakekeeper provisioning inputs for org %q: no usable Duckling CR and %s/%s/%s not set", - w.OrgID, envLakekeeperAdminDSN, envLakekeeperS3Bucket, envLakekeeperS3Region) - } -} - -// lakekeeperInputsFromDuckling derives inputs from the Duckling CR status. 
-// Returns ok=false (no error) when the resolver simply has no CR for the org, -// so the caller can fall back to env config without logging it as a failure. -func lakekeeperInputsFromDuckling( - ctx context.Context, - resolve func(context.Context, string) (*provisioner.DucklingStatus, error), - orgID string, -) (provisioner.ProvisioningInputs, bool, error) { - status, err := resolve(ctx, orgID) - if err != nil { - return provisioner.ProvisioningInputs{}, false, nil - } - if status == nil || status.MetadataStore.Endpoint == "" { - return provisioner.ProvisioningInputs{}, false, nil - } - if status.MetadataStore.Password == "" || status.MetadataStore.User == "" || status.MetadataStore.Database == "" { - return provisioner.ProvisioningInputs{}, false, fmt.Errorf("duckling CR for org %q has incomplete metadata-store credentials", orgID) - } - if status.DataStore.BucketName == "" || status.DataStore.S3Region == "" { - return provisioner.ProvisioningInputs{}, false, fmt.Errorf("duckling CR for org %q has no data-store bucket/region", orgID) - } - - s3 := provisioner.S3StorageConfig{ - Bucket: status.DataStore.BucketName, - // Keep the catalog's data under a stable prefix so it never - // collides with DuckLake's own layout in the shared bucket. - KeyPrefix: "lakekeeper", - Region: status.DataStore.S3Region, - Flavor: "aws", - // Prod: Lakekeeper uses its pod IRSA identity (no static creds). - // RoleARN is the per-org duckling role Lakekeeper assumes to vend - // scoped S3 creds (sts-role-arn). It's the same role the Lakekeeper - // pod runs as via Pod Identity, self-assumed (the role trusts its - // own ARN). Sourced from the Duckling CR status. - RoleARN: status.IAMRoleARN, - } - - // cnpg-shard: the Lakekeeper database + role are pre-provisioned on the - // CNPG shard by provider-sql, and status carries the per-tenant role - // credentials (not a privileged admin). 
We have no superuser DSN here, so - // the provisioner must NOT attempt CREATE DATABASE/ROLE — it takes these - // creds verbatim. The shard's Pooler is session-mode, so the Lakekeeper - // pod's migrations + pooled prepared statements work through it. - if status.MetadataStore.Type == "cnpg-shard" { - return provisioner.ProvisioningInputs{ - PGPreProvisioned: true, - PGUser: status.MetadataStore.User, - PGPassword: status.MetadataStore.Password, - PGDatabase: status.MetadataStore.Database, - PGHost: status.MetadataStore.Endpoint, - PGPort: 5432, - PGSSLMode: "require", - S3: s3, - // Allowall + NetworkPolicy deployment shape — no OIDC audiences yet. - KubernetesAuthAudiences: nil, - }, true, nil - } - - // external: status carries the metadata-store master credentials, - // which double as the admin connection that can CREATE the - // lakekeeper_ database/role. Admin DDL (CREATE DATABASE/ROLE) goes - // to the direct RDS endpoint, not the PgBouncer pooler — transaction - // pooling breaks CREATE DATABASE and the session-level statements - // EnsureRole runs. We connect to the existing metadata-store database; - // CREATE DATABASE can be issued from any database. - adminDSN := buildAdminURLDSN( - status.MetadataStore.Endpoint, 5432, - status.MetadataStore.User, status.MetadataStore.Password, - status.MetadataStore.Database, "require", - ) - - return provisioner.ProvisioningInputs{ - AdminDSN: adminDSN, - // The Lakekeeper pod also connects to the direct endpoint: it runs its - // own migrations and connection pool, which a transaction pooler would - // break. - PGHost: status.MetadataStore.Endpoint, - PGPort: 5432, - PGSSLMode: "require", - S3: s3, - KubernetesAuthAudiences: nil, - }, true, nil -} - -// lakekeeperInputsFromEnv builds inputs from environment variables for -// dev/orbstack (MinIO). Returns ok=false when the minimum set isn't present. 
-func lakekeeperInputsFromEnv() (provisioner.ProvisioningInputs, bool) { - adminDSN := strings.TrimSpace(os.Getenv(envLakekeeperAdminDSN)) - bucket := strings.TrimSpace(os.Getenv(envLakekeeperS3Bucket)) - region := strings.TrimSpace(os.Getenv(envLakekeeperS3Region)) - if adminDSN == "" || bucket == "" || region == "" { - return provisioner.ProvisioningInputs{}, false - } - - pgHost := strings.TrimSpace(os.Getenv(envLakekeeperPGHost)) - pgPort := int32(0) - if v := strings.TrimSpace(os.Getenv(envLakekeeperPGPort)); v != "" { - if n, err := strconv.Atoi(v); err == nil { - pgPort = int32(n) - } - } - sslMode := strings.TrimSpace(os.Getenv(envLakekeeperPGSSLMode)) - if sslMode == "" { - sslMode = "require" - } - flavor := strings.TrimSpace(os.Getenv(envLakekeeperS3Flavor)) - if flavor == "" { - flavor = "aws" - } - - return provisioner.ProvisioningInputs{ - AdminDSN: adminDSN, - PGHost: pgHost, - PGPort: pgPort, - PGSSLMode: sslMode, - S3: provisioner.S3StorageConfig{ - Bucket: bucket, - Region: region, - Endpoint: strings.TrimSpace(os.Getenv(envLakekeeperS3Endpoint)), - Flavor: flavor, - StaticAccessKeyID: strings.TrimSpace(os.Getenv(envLakekeeperS3KeyID)), - StaticAccessKeySecret: strings.TrimSpace(os.Getenv(envLakekeeperS3Secret)), - }, - KubernetesAuthAudiences: nil, - }, true -} - -// buildAdminURLDSN renders a pgx-compatible URL-style DSN. URL form is what -// the provisioner's reDSN rewrites when it scopes a connection to a specific -// database, so we emit that form here. 
-func buildAdminURLDSN(host string, port int, user, password, dbName, sslMode string) string { - u := url.URL{ - Scheme: "postgres", - User: url.UserPassword(user, password), - Host: fmt.Sprintf("%s:%d", host, port), - Path: "/" + dbName, - } - q := u.Query() - q.Set("sslmode", sslMode) - u.RawQuery = q.Encode() - return u.String() -} diff --git a/controlplane/lakekeeper_inputs_test.go b/controlplane/lakekeeper_inputs_test.go deleted file mode 100644 index d6267af9..00000000 --- a/controlplane/lakekeeper_inputs_test.go +++ /dev/null @@ -1,209 +0,0 @@ -//go:build kubernetes - -package controlplane - -import ( - "context" - "errors" - "net/url" - "strings" - "testing" - - "github.com/posthog/duckgres/controlplane/configstore" - "github.com/posthog/duckgres/controlplane/provisioner" -) - -func ducklingStatusWithWarehouse() *provisioner.DucklingStatus { - s := &provisioner.DucklingStatus{} - s.MetadataStore.Endpoint = "duckling-acme.cluster-xyz.us-east-1.rds.amazonaws.com" - s.MetadataStore.PgBouncerEndpoint = "pgb-acme.svc:6432" - s.MetadataStore.User = "ducklingacme" - s.MetadataStore.Password = "s3cr3t" - s.MetadataStore.Database = "ducklingacme" - s.DataStore.BucketName = "posthog-duckling-acme" - s.DataStore.S3Region = "us-east-1" - s.IAMRoleARN = "arn:aws:iam::123:role/duckling-acme" - return s -} - -func TestResolverFromDucklingCR(t *testing.T) { - // No env fallback configured, so this must come purely from the CR. - resolve := func(context.Context, string) (*provisioner.DucklingStatus, error) { - return ducklingStatusWithWarehouse(), nil - } - r := newLakekeeperInputsResolver(resolve) - - in, err := r(context.Background(), &configstore.ManagedWarehouse{OrgID: "acme"}) - if err != nil { - t.Fatalf("resolver: %v", err) - } - - // Admin DSN must target the DIRECT endpoint (not the PgBouncer pooler), - // be URL-form, carry the master creds, and request sslmode=require. 
- u, perr := url.Parse(in.AdminDSN) - if perr != nil { - t.Fatalf("admin DSN not a URL: %q (%v)", in.AdminDSN, perr) - } - if u.Hostname() != "duckling-acme.cluster-xyz.us-east-1.rds.amazonaws.com" { - t.Errorf("admin DSN host = %q, want direct RDS endpoint (not pgbouncer)", u.Hostname()) - } - if strings.Contains(in.AdminDSN, "pgb-acme") || strings.Contains(in.AdminDSN, "6432") { - t.Errorf("admin DSN must not use the PgBouncer pooler: %q", in.AdminDSN) - } - if pw, _ := u.User.Password(); pw != "s3cr3t" || u.User.Username() != "ducklingacme" { - t.Errorf("admin DSN creds wrong: %q", u.Redacted()) - } - if got := u.Query().Get("sslmode"); got != "require" { - t.Errorf("sslmode = %q, want require", got) - } - - if in.PGHost != "duckling-acme.cluster-xyz.us-east-1.rds.amazonaws.com" || in.PGPort != 5432 { - t.Errorf("PGHost/PGPort = %q/%d, want direct endpoint:5432", in.PGHost, in.PGPort) - } - if in.PGSSLMode != "require" { - t.Errorf("PGSSLMode = %q, want require", in.PGSSLMode) - } - if in.S3.Bucket != "posthog-duckling-acme" || in.S3.Region != "us-east-1" || in.S3.Flavor != "aws" { - t.Errorf("S3 = %+v, want bucket/region/aws from CR", in.S3) - } - if in.S3.StaticAccessKeyID != "" || in.S3.StaticAccessKeySecret != "" { - t.Errorf("prod S3 must use pod IRSA, not static creds: %+v", in.S3) - } - if in.S3.RoleARN != "arn:aws:iam::123:role/duckling-acme" { - t.Errorf("S3.RoleARN = %q, want the duckling role from CR status (for STS vending)", in.S3.RoleARN) - } - if len(in.KubernetesAuthAudiences) != 0 { - t.Errorf("expected allowall mode (no audiences), got %v", in.KubernetesAuthAudiences) - } -} - -func TestResolverFromCnpgShardCR(t *testing.T) { - // cnpg-shard: the Lakekeeper DB + role are pre-provisioned by provider-sql - // on the shard, and status carries the per-tenant role creds + the - // session-mode Pooler endpoint. 
The resolver must produce pre-provisioned - // inputs with NO AdminDSN (the provisioner must not attempt CREATE - // DATABASE/ROLE), taking the role creds verbatim. - resolve := func(context.Context, string) (*provisioner.DucklingStatus, error) { - s := &provisioner.DucklingStatus{} - s.MetadataStore.Type = "cnpg-shard" - s.MetadataStore.Endpoint = "shard-001-pooler.cnpg-shards.svc.cluster.local" - s.MetadataStore.User = "lakekeeper_acme" - s.MetadataStore.Password = "from-provider-sql" - s.MetadataStore.Database = "lakekeeper_acme" - s.DataStore.BucketName = "posthog-duckling-acme" - s.DataStore.S3Region = "us-east-1" - s.IAMRoleARN = "arn:aws:iam::123:role/duckling-acme" - return s, nil - } - r := newLakekeeperInputsResolver(resolve) - - in, err := r(context.Background(), &configstore.ManagedWarehouse{OrgID: "acme"}) - if err != nil { - t.Fatalf("resolver: %v", err) - } - - if !in.PGPreProvisioned { - t.Fatalf("PGPreProvisioned = false, want true for cnpg-shard") - } - if in.AdminDSN != "" { - t.Errorf("AdminDSN = %q, want empty (no privileged DDL in cnpg-shard mode)", in.AdminDSN) - } - if in.PGUser != "lakekeeper_acme" || in.PGPassword != "from-provider-sql" || in.PGDatabase != "lakekeeper_acme" { - t.Errorf("PG creds = %q/%q/%q, want the provider-sql role verbatim", in.PGUser, in.PGPassword, in.PGDatabase) - } - if in.PGHost != "shard-001-pooler.cnpg-shards.svc.cluster.local" || in.PGPort != 5432 { - t.Errorf("PGHost/PGPort = %q/%d, want the session Pooler:5432", in.PGHost, in.PGPort) - } - if in.S3.Bucket != "posthog-duckling-acme" || in.S3.RoleARN != "arn:aws:iam::123:role/duckling-acme" { - t.Errorf("S3 = %+v, want bucket/roleARN from CR status", in.S3) - } -} - -func TestResolverFromEnvFallback(t *testing.T) { - t.Setenv(envLakekeeperAdminDSN, "postgres://admin:pw@127.0.0.1:5432/postgres?sslmode=disable") - t.Setenv(envLakekeeperPGHost, "127.0.0.1") - t.Setenv(envLakekeeperPGPort, "5432") - t.Setenv(envLakekeeperPGSSLMode, "disable") - 
t.Setenv(envLakekeeperS3Bucket, "warehouse") - t.Setenv(envLakekeeperS3Region, "us-east-1") - t.Setenv(envLakekeeperS3Endpoint, "http://minio.minio.svc:9000") - t.Setenv(envLakekeeperS3Flavor, "s3-compat") - t.Setenv(envLakekeeperS3KeyID, "minioadmin") - t.Setenv(envLakekeeperS3Secret, "minioadmin") - - // nil resolver → straight to env fallback. - r := newLakekeeperInputsResolver(nil) - in, err := r(context.Background(), &configstore.ManagedWarehouse{OrgID: "dev"}) - if err != nil { - t.Fatalf("resolver: %v", err) - } - if in.AdminDSN == "" || in.PGHost != "127.0.0.1" || in.PGPort != 5432 { - t.Errorf("env inputs wrong: %+v", in) - } - if in.PGSSLMode != "disable" { - t.Errorf("PGSSLMode = %q, want disable", in.PGSSLMode) - } - if in.S3.Endpoint != "http://minio.minio.svc:9000" || in.S3.Flavor != "s3-compat" { - t.Errorf("S3 = %+v, want MinIO endpoint + s3-compat", in.S3) - } - if in.S3.StaticAccessKeyID != "minioadmin" || in.S3.StaticAccessKeySecret != "minioadmin" { - t.Errorf("S3 static creds not propagated: %+v", in.S3) - } -} - -func TestResolverDucklingErrorFallsBackToEnv(t *testing.T) { - t.Setenv(envLakekeeperAdminDSN, "postgres://admin:pw@127.0.0.1:5432/postgres?sslmode=disable") - t.Setenv(envLakekeeperS3Bucket, "warehouse") - t.Setenv(envLakekeeperS3Region, "us-east-1") - - // CR resolver errors (e.g. no Duckling for this org) → env fallback used. - resolve := func(context.Context, string) (*provisioner.DucklingStatus, error) { - return nil, errors.New("duckling not found") - } - r := newLakekeeperInputsResolver(resolve) - in, err := r(context.Background(), &configstore.ManagedWarehouse{OrgID: "dev"}) - if err != nil { - t.Fatalf("expected env fallback, got error: %v", err) - } - if in.S3.Bucket != "warehouse" { - t.Errorf("expected env inputs, got %+v", in) - } -} - -func TestResolverIncompleteCRWithoutEnvErrors(t *testing.T) { - // CR present but missing bucket, and no env configured → hard error. 
- resolve := func(context.Context, string) (*provisioner.DucklingStatus, error) { - s := &provisioner.DucklingStatus{} - s.MetadataStore.Endpoint = "host" - s.MetadataStore.User = "u" - s.MetadataStore.Password = "p" - s.MetadataStore.Database = "d" - // DataStore left empty. - return s, nil - } - r := newLakekeeperInputsResolver(resolve) - if _, err := r(context.Background(), &configstore.ManagedWarehouse{OrgID: "acme"}); err == nil { - t.Fatal("expected error for incomplete CR with no env fallback") - } -} - -func TestResolverNoSourcesErrors(t *testing.T) { - r := newLakekeeperInputsResolver(nil) - if _, err := r(context.Background(), &configstore.ManagedWarehouse{OrgID: "x"}); err == nil { - t.Fatal("expected error when neither CR nor env provide inputs") - } -} - -func TestLakekeeperProvisionerEnabledToggle(t *testing.T) { - for _, tc := range []struct { - val string - want bool - }{ - {"", false}, {"true", true}, {"1", true}, {"yes", true}, {"TRUE", true}, {"false", false}, {"0", false}, - } { - t.Setenv(envLakekeeperEnabled, tc.val) - if got := lakekeeperProvisionerEnabled(); got != tc.want { - t.Errorf("enabled(%q) = %v, want %v", tc.val, got, tc.want) - } - } -} diff --git a/controlplane/multitenant.go b/controlplane/multitenant.go index d89d9c6d..f6a7cd30 100644 --- a/controlplane/multitenant.go +++ b/controlplane/multitenant.go @@ -49,10 +49,6 @@ func (a *orgRouterAdapter) StackForOrg(orgID string) (WorkerPool, *SessionManage return stack.Pool, stack.Sessions, stack.Rebalancer, true } -func (a *orgRouterAdapter) IcebergConfigForOrg(orgID string) (server.IcebergConfig, bool) { - return a.router.IcebergConfigForOrg(orgID) -} - func (a *orgRouterAdapter) IsMigratingForOrg(orgID string) bool { return a.router.IsMigrating(orgID) } @@ -390,19 +386,6 @@ func SetupMultiTenant( // spec.dataStore.bucketName onto existing ready ducklings. Empty leaves // it disabled (composition keeps deriving). 
provCtrl.WithBucketSuffix(cfg.DucklingBucketSuffix) - // Opt-in: enable the per-org Lakekeeper provisioning branch. Off by - // default so existing deploys are unaffected. Best-effort — if the - // K8s client can't be built we log and leave the controller running - // without the Lakekeeper branch (S3-Tables warehouses still work). - if lakekeeperProvisionerEnabled() { - if k8sClient, lkErr := provisioner.NewLakekeeperK8sClient(); lkErr != nil { - slog.Warn("Lakekeeper provisioner enabled but K8s client unavailable; skipping.", "error", lkErr) - } else { - lkProv := provisioner.NewLakekeeperProvisioner(store, k8sClient) - provCtrl.WithLakekeeperProvisioner(lkProv, newLakekeeperInputsResolver(resolveDucklingStatus)) - slog.Info("Lakekeeper provisioner enabled (allowall + NetworkPolicy mode).") - } - } go provCtrl.Run(context.Background()) } diff --git a/controlplane/org_activation_test.go b/controlplane/org_activation_test.go index bd1ff896..8aeb331b 100644 --- a/controlplane/org_activation_test.go +++ b/controlplane/org_activation_test.go @@ -305,6 +305,7 @@ func TestSharedWorkerActivatorDucklingCRRequiresSTSBroker(t *testing.T) { if orgID != "test-org" { t.Fatalf("expected test-org, got %q", orgID) } + ducklakeEnabled := true return &provisioner.DucklingStatus{ MetadataStore: struct { Type string @@ -327,7 +328,8 @@ func TestSharedWorkerActivatorDucklingCRRequiresSTSBroker(t *testing.T) { BucketName: "posthog-duckling-test-org", S3Region: "us-east-1", }, - IAMRoleARN: "arn:aws:iam::123:role/duckling-test-org", + IAMRoleARN: "arn:aws:iam::123:role/duckling-test-org", + DuckLakeEnabled: &ducklakeEnabled, }, nil }, } diff --git a/controlplane/org_router.go b/controlplane/org_router.go index 2bafb2f9..d1d5090d 100644 --- a/controlplane/org_router.go +++ b/controlplane/org_router.go @@ -207,28 +207,6 @@ func (tr *OrgRouter) StackForOrg(orgID string) (*OrgStack, bool) { return stack, ok } -func (tr *OrgRouter) IcebergConfigForOrg(orgID string) (server.IcebergConfig, 
bool) { - tr.mu.RLock() - stack, ok := tr.orgs[orgID] - tr.mu.RUnlock() - if !ok || stack == nil || stack.Config == nil || stack.Config.Warehouse == nil { - return server.IcebergConfig{}, false - } - - src := stack.Config.Warehouse.Iceberg - cfg := server.IcebergConfig{ - Enabled: src.Enabled, - Backend: src.Backend, - Namespace: src.Namespace, - Region: src.Region, - LakekeeperEndpoint: src.LakekeeperEndpoint, - LakekeeperWarehouse: src.LakekeeperWarehouse, - LakekeeperClientID: src.LakekeeperClientID, - LakekeeperOAuth2ServerURI: src.LakekeeperOAuth2ServerURI, - } - return cfg, true -} - // SetMigrating marks an org as having a DuckLake migration in progress. func (tr *OrgRouter) SetMigrating(orgID string) { tr.migrating.Store(orgID, struct{}{}) diff --git a/controlplane/org_router_test.go b/controlplane/org_router_test.go index 6bad1fe6..2f660225 100644 --- a/controlplane/org_router_test.go +++ b/controlplane/org_router_test.go @@ -11,6 +11,7 @@ import ( "unsafe" "github.com/posthog/duckgres/controlplane/configstore" + corev1 "k8s.io/api/core/v1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" ) @@ -59,39 +60,6 @@ func (l *recordingOrgRouterLease) Release(ctx context.Context) error { return nil } -func TestOrgRouterIcebergConfigForOrg(t *testing.T) { - router := &OrgRouter{ - orgs: map[string]*OrgStack{ - "org-acme": { - Config: &configstore.OrgConfig{ - Name: "org-acme", - Warehouse: &configstore.ManagedWarehouseConfig{ - Iceberg: configstore.ManagedWarehouseIceberg{ - Enabled: true, - Backend: configstore.IcebergBackendLakekeeper, - Namespace: "main", - Region: "us-east-1", - LakekeeperEndpoint: "http://lakekeeper/catalog", - LakekeeperWarehouse: "org-acme", - }, - }, - }, - }, - }, - } - - cfg, ok := router.IcebergConfigForOrg("org-acme") - if !ok { - t.Fatal("expected Iceberg config for org") - } - if !cfg.Enabled || cfg.LakekeeperEndpoint != "http://lakekeeper/catalog" || cfg.LakekeeperWarehouse != "org-acme" { - t.Fatalf("unexpected Iceberg config: %+v", 
cfg) - } - if cfg.Namespace != "main" || cfg.Region != "us-east-1" { - t.Fatalf("expected namespace and region to be preserved, got %+v", cfg) - } -} - func TestOrgRouterDestroyOrgStackDrainsSessionsBeforePoolShutdownAndReleasesSessionLeases(t *testing.T) { events := []string{} pool := &recordingOrgRouterPool{events: &events} diff --git a/controlplane/org_router_test_helpers_test.go b/controlplane/org_router_test_helpers_test.go index 20c66947..8f8ccdc9 100644 --- a/controlplane/org_router_test_helpers_test.go +++ b/controlplane/org_router_test_helpers_test.go @@ -1,7 +1,5 @@ package controlplane -import "github.com/posthog/duckgres/server" - // mockOrgRouter implements OrgRouterInterface for testing. type mockOrgRouter struct { sessions *SessionManager @@ -13,9 +11,5 @@ func (m *mockOrgRouter) StackForOrg(_ string) (WorkerPool, *SessionManager, *Mem return nil, m.sessions, m.rebalancer, m.ok } -func (m *mockOrgRouter) IcebergConfigForOrg(_ string) (server.IcebergConfig, bool) { - return server.IcebergConfig{}, false -} - func (m *mockOrgRouter) IsMigratingForOrg(_ string) bool { return false } func (m *mockOrgRouter) ShutdownAll() {} diff --git a/controlplane/provisioner/cleanup_helpers_test.go b/controlplane/provisioner/cleanup_helpers_test.go deleted file mode 100644 index 69e6447b..00000000 --- a/controlplane/provisioner/cleanup_helpers_test.go +++ /dev/null @@ -1,25 +0,0 @@ -//go:build kubernetes - -package provisioner - -import ( - "database/sql" - "testing" -) - -// cleanupDB attempts to drop a database. Best-effort: logs but does not fail. 
-func cleanupDB(t *testing.T, dsn, name string) { - t.Helper() - if dsn == "" { - return - } - db, err := sql.Open("pgx", dsn) - if err != nil { - t.Logf("cleanup open %s: %v", name, err) - return - } - defer db.Close() - if _, err := db.Exec("DROP DATABASE IF EXISTS " + quoteIdent(name)); err != nil { - t.Logf("cleanup drop %s: %v", name, err) - } -} diff --git a/controlplane/provisioner/controller.go b/controlplane/provisioner/controller.go index da016ea5..a3bed2fd 100644 --- a/controlplane/provisioner/controller.go +++ b/controlplane/provisioner/controller.go @@ -4,7 +4,6 @@ package provisioner import ( "context" - "errors" "fmt" "log/slog" "time" @@ -22,7 +21,6 @@ func provisionEventProps(w *configstore.ManagedWarehouse) map[string]any { return map[string]any{ "metadata_store": string(w.MetadataStore.Kind), "ducklake_enabled": w.DuckLake.Enabled, - "iceberg_enabled": w.Iceberg.Enabled, } } @@ -43,13 +41,6 @@ func captureProvisionFailed(w *configstore.ManagedWarehouse, reason string, upda type WarehouseStore interface { ListWarehousesByStates(states []configstore.ManagedWarehouseProvisioningState) ([]configstore.ManagedWarehouse, error) UpdateWarehouseState(orgID string, expectedState configstore.ManagedWarehouseProvisioningState, updates map[string]interface{}) error - // UpdateIcebergConfig writes per-org Iceberg/Lakekeeper config without - // CAS'ing on the top-level warehouse state. The Lakekeeper provisioner - // uses this because Iceberg provisioning runs in parallel with the - // top-level Duckling state machine — by the time we're ready to persist - // the Lakekeeper endpoint, the warehouse may already have transitioned - // to Ready, and a state-CAS update would silently no-op. - UpdateIcebergConfig(orgID string, updates map[string]interface{}) error } // MetadataProbe is the signature for an end-to-end metadata-store probe. 
The @@ -64,12 +55,6 @@ type Controller struct { pollInterval time.Duration probe MetadataProbe - // Lakekeeper-side reconcile dependencies. All optional: if any is nil - // (e.g. in older deployments or unit tests that don't exercise the - // Lakekeeper path), reconcileLakekeeper is skipped silently. - lakekeeperProvisioner *LakekeeperProvisioner - lakekeeperInputs LakekeeperInputsResolver - // bucketSuffix is the env suffix for CP-owned s3bucket naming // (DUCKGRES_DUCKLING_BUCKET_SUFFIX, e.g. "mw-prod-us"). When set, the // controller backfills spec.dataStore.bucketName onto ready s3bucket @@ -78,17 +63,6 @@ type Controller struct { bucketSuffix string } -// LakekeeperInputsResolver is the function shape the controller uses to -// build ProvisioningInputs for a given warehouse. Resolves things the -// configstore doesn't carry directly: the admin Postgres DSN (from a -// K8s Secret managed by Crossplane), the PG host the Lakekeeper pod uses -// to reach the cluster, and the S3 storage profile. -// -// Defined as a function rather than a method on Controller so tests can -// substitute a fake and so the convention for sourcing the admin DSN -// (Crossplane-emitted Secret name pattern) lives outside this package. -type LakekeeperInputsResolver func(ctx context.Context, w *configstore.ManagedWarehouse) (ProvisioningInputs, error) - // NewController creates a provisioning controller. Returns an error if the // Kubernetes client cannot be initialized (e.g., not running in-cluster). func NewController(store WarehouseStore, pollInterval time.Duration) (*Controller, error) { @@ -119,22 +93,6 @@ func (c *Controller) SetProbe(p MetadataProbe) { c.probe = p } -// WithLakekeeperProvisioner enables the Lakekeeper reconcile branch. -// Both p and inputs must be non-nil — partial wiring would silently -// disable the reconcile step and that's a misconfiguration we'd rather -// surface at startup than at the first activation. 
-func (c *Controller) WithLakekeeperProvisioner(p *LakekeeperProvisioner, inputs LakekeeperInputsResolver) *Controller { - if p == nil { - panic("WithLakekeeperProvisioner: provisioner is nil; call NewLakekeeperProvisioner first") - } - if inputs == nil { - panic("WithLakekeeperProvisioner: inputs resolver is nil") - } - c.lakekeeperProvisioner = p - c.lakekeeperInputs = inputs - return c -} - // WithBucketSuffix sets the env suffix used to compute and backfill the // CP-owned per-org s3bucket name onto existing ready ducklings. Empty leaves // backfill disabled (composition keeps deriving). Returns the controller for @@ -219,9 +177,7 @@ func (c *Controller) reconcilePending(ctx context.Context, w *configstore.Manage } // Create the Duckling CR - log.Info("Creating Duckling CR.", - "pgbouncer_enabled", w.PgBouncer.Enabled, - "iceberg_enabled", w.Iceberg.Enabled) + log.Info("Creating Duckling CR.", "pgbouncer_enabled", w.PgBouncer.Enabled) if err := c.duckling.Create(ctx, w.OrgID, CreateOptions{ MetadataStoreType: w.MetadataStore.Kind, PgBouncerEnabled: w.PgBouncer.Enabled, @@ -232,8 +188,6 @@ func (c *Controller) reconcilePending(ctx context.Context, w *configstore.Manage DataStoreType: w.DataStore.Kind, DataStoreBucket: w.DataStore.BucketName, DataStoreRegion: w.DataStore.Region, - IcebergEnabled: w.Iceberg.Enabled, - IcebergNamespace: w.Iceberg.Namespace, DuckLakeEnabled: w.DuckLake.Enabled, }); err != nil { log.Error("Failed to create Duckling CR.", "error", err) @@ -319,12 +273,6 @@ func (c *Controller) reconcileProvisioning(ctx context.Context, w *configstore.M updates["identity_state"] = configstore.ManagedWarehouseStateReady } - // Iceberg readiness for the Lakekeeper backend is owned by - // reconcileLakekeeper, which writes iceberg_state=Ready directly once - // the per-org Lakekeeper warehouse is provisioned. Nothing here needs - // to propagate from the Crossplane Duckling status — the Lakekeeper - // provisioner is the source of truth. 
- // Infrastructure is ready when all components are provisioned AND the // Crossplane Ready condition is True. The Ready condition ensures all // composed resources (including the metadata store) are fully reconciled, @@ -333,12 +281,8 @@ func (c *Controller) reconcileProvisioning(ctx context.Context, w *configstore.M metaReady := w.MetadataStoreState == configstore.ManagedWarehouseStateReady || updates["metadata_store_state"] == configstore.ManagedWarehouseStateReady secretsReady := w.SecretsState == configstore.ManagedWarehouseStateReady || updates["secrets_state"] == configstore.ManagedWarehouseStateReady identReady := w.IdentityState == configstore.ManagedWarehouseStateReady || updates["identity_state"] == configstore.ManagedWarehouseStateReady - // Iceberg is only required for Ready when the tenant opted in. - icebergReady := !w.Iceberg.Enabled || - w.IcebergState == configstore.ManagedWarehouseStateReady || - updates["iceberg_state"] == configstore.ManagedWarehouseStateReady - if s3Ready && metaReady && secretsReady && identReady && icebergReady && status.ReadyCondition { + if s3Ready && metaReady && secretsReady && identReady && status.ReadyCondition { // End-to-end probe: AWS reports the metadata store (RDS) Available before // its DNS record has propagated to in-cluster CoreDNS, and even longer // before pgbouncer's resolver picks it up. Flipping to Ready on @@ -394,32 +338,16 @@ func (c *Controller) reconcileProvisioning(ctx context.Context, w *configstore.M } } - // Provision the per-org Lakekeeper as part of turn-up, not only as a - // post-Ready late-enable. A cnpg-shard Duckling is required by the XRD to - // have iceberg.enabled=true, so its warehouse can never reach Ready until - // iceberg_state flips — and for the Lakekeeper backend that only happens - // once EnsureForOrg has stood the catalog up (there's no S3 Tables bucket - // whose ARN would otherwise trigger it). 
reconcileLakekeeper is idempotent, - // no-ops for non-Lakekeeper backends, and returns quietly until the - // Duckling status carries the metadata creds and the Lakekeeper CR has - // bootstrapped; the poll loop is the requeue. The iceberg_state=Ready it - // writes is picked up by the readiness check on a subsequent tick. - c.reconcileLakekeeper(ctx, w) } // reconcileReady handles drift correction for Ready warehouses. The -// post-create-mutable spec fields are metadataStore.pgbouncer.enabled and -// iceberg.enabled; if an operator flips either in the config store (admin -// API), we patch the CR so the Crossplane composition provisions / tears -// down the affected resource. +// post-create-mutable spec field is metadataStore.pgbouncer.enabled; if an +// operator flips it in the config store (admin API), we patch the CR so the +// Crossplane composition converges. // // Scope is intentionally narrow: we do NOT reconcile ACU, image, or other // spec fields. Those aren't user-mutable via the admin API today, and // aggressive drift correction would conflict with manual kubectl patches. -// -// iceberg.namespace is NOT drift-corrected — the XRD's CEL admission rule -// rejects post-create namespace changes, so a drift attempt would just hit -// a 422 from the API server. Namespace changes require warehouse re-creation. 
func (c *Controller) reconcileReady(ctx context.Context, w *configstore.ManagedWarehouse) { log := slog.With("org", w.OrgID, "phase", "ready") @@ -443,30 +371,12 @@ func (c *Controller) reconcileReady(ctx context.Context, w *configstore.ManagedW } } - currentIceberg, err := c.duckling.GetIcebergEnabled(ctx, w.OrgID) - if err != nil { - log.Warn("Failed to read Duckling CR iceberg.enabled for drift check.", "error", err) - return - } - if currentIceberg != w.Iceberg.Enabled { - log.Info("Iceberg drift detected, patching Duckling CR.", - "desired", w.Iceberg.Enabled, "current", currentIceberg) - if err := c.duckling.SetIcebergEnabled(ctx, w.OrgID, w.Iceberg.Enabled); err != nil { - log.Warn("Failed to patch Duckling CR iceberg.enabled.", "error", err) - } - } - // Backfill the CP-owned s3bucket name onto ducklings provisioned before the // control plane started supplying it (spec.dataStore carries only // {type: s3bucket}). The name computed here is exactly what the composition // has been deriving into status, so the patch moves it from a derived output // into a durable input without changing the actual bucket. No-op once set. c.reconcileBucketName(ctx, w, log) - - // Lakekeeper reconcile is independent of the Duckling state machine — - // provisioning a Lakekeeper for an org doesn't depend on, and doesn't - // block, the warehouse top-level state. - c.reconcileLakekeeper(ctx, w) } // reconcileBucketName backfills spec.dataStore.bucketName (and the config-store @@ -523,81 +433,9 @@ func (c *Controller) reconcileBucketName(ctx context.Context, w *configstore.Man } } -// reconcileLakekeeper provisions a per-org Lakekeeper instance when the -// warehouse selects the lakekeeper backend and isn't yet provisioned. Called -// from both reconcileProvisioning (initial turn-up — required for cnpg-shard, -// which must have iceberg enabled and so can't reach Ready until the catalog -// is up) and reconcileReady (late-enable on an already-Ready warehouse). 
-// Idempotent: a warehouse with LakekeeperEndpoint already populated is a -// no-op. ErrBootstrapPending from the underlying EnsureForOrg is logged -// at debug and treated as "retry on the next tick" — the controller's -// poll loop is the requeue mechanism. -// -// Skipped silently when the controller wasn't built with -// WithLakekeeperProvisioner (e.g. in deployments where the operator -// isn't installed, or in tests). -func (c *Controller) reconcileLakekeeper(ctx context.Context, w *configstore.ManagedWarehouse) { - if c.lakekeeperProvisioner == nil || c.lakekeeperInputs == nil { - return - } - if !w.Iceberg.Enabled { - return - } - if w.Iceberg.ResolvedBackend() != configstore.IcebergBackendLakekeeper { - return - } - log := slog.With("org", w.OrgID, "phase", "lakekeeper") - - if w.Iceberg.LakekeeperEndpoint != "" { - // Already provisioned. Converge the pod-shape fields (replicas, resource - // requests, scrape annotations) onto the org's existing CR(s) via a - // label-matched merge patch — no inputs needed, never recreates under a - // new name. The operator rolls the Deployment only when the spec actually - // changes. (The operator can't add fields that aren't in the CR, so a - // spec change in the provisioner must be written back here — it won't - // appear on its own.) 
- if err := c.lakekeeperProvisioner.PatchPodShape(ctx, w.OrgID); err != nil { - log.Warn("Lakekeeper CR pod-shape drift correction failed.", "error", err) - } - return - } - - inputs, err := c.lakekeeperInputs(ctx, w) - if err != nil { - log.Warn("Failed to resolve lakekeeper provisioning inputs.", "error", err) - return - } - if err := c.lakekeeperProvisioner.EnsureForOrg(ctx, w, inputs); err != nil { - if errors.Is(err, ErrBootstrapPending) { - log.Debug("Lakekeeper bootstrap still pending; next tick will retry.") - return - } - log.Warn("Lakekeeper provisioning failed.", "error", err) - return - } - log.Info("Lakekeeper provisioning completed.") -} - func (c *Controller) reconcileDeleting(ctx context.Context, w *configstore.ManagedWarehouse) { log := slog.With("org", w.OrgID, "phase", "deleting") - // Resolve the Lakekeeper inputs BEFORE deleting the Duckling CR. The - // inputs include the metadata-store admin DSN derived from the CR's - // status; once the CR is gone the resolver can't reconstruct it, and - // we'd lose the only way to drop the per-tenant lakekeeper_ - // Postgres database. Best-effort resolution: when the inputs aren't - // available (resolver unwired, CR never reconciled, dev/orbstack - // without env config), the subsequent DeleteForOrg call falls back to - // k8s-only teardown. - var lkInputs ProvisioningInputs - if c.lakekeeperProvisioner != nil && c.lakekeeperInputs != nil { - if in, err := c.lakekeeperInputs(ctx, w); err != nil { - log.Debug("Lakekeeper inputs unavailable at delete time; skipping PG cleanup.", "error", err) - } else { - lkInputs = in - } - } - log.Info("Deleting Duckling CR.") if err := c.duckling.Delete(ctx, w.OrgID); err != nil { // Only proceed if the CR is already gone (NotFound). 
For other errors @@ -617,26 +455,6 @@ func (c *Controller) reconcileDeleting(ctx context.Context, w *configstore.Manag } } - // Tear down the per-org Lakekeeper instance the control plane provisioned - // out-of-band (CR + Secret + ServiceAccount in the lakekeeper namespace, - // and — when this provisioner created them — the lakekeeper_ - // Postgres database and role on the metadata store). The Crossplane - // Duckling composition doesn't own the k8s pieces, so without an explicit - // teardown they leak after the warehouse is gone. Idempotent and - // NotFound-tolerant — a clean no-op for ducklings that never enabled - // Iceberg. Skipped silently when the provisioner isn't wired (mirrors - // reconcileLakekeeper). On error we return without marking the warehouse - // deleted so the next reconcile pass retries. - if c.lakekeeperProvisioner != nil { - if err := c.lakekeeperProvisioner.DeleteForOrg(ctx, w.OrgID, lkInputs); err != nil { - log.Warn("Failed to tear down Lakekeeper resources, will retry.", "error", err) - analytics.Default().Capture("warehouse_deprovision_failed", w.OrgID, map[string]any{ - "reason": "lakekeeper_teardown_failed", - }) - return - } - } - if err := c.store.UpdateWarehouseState(w.OrgID, configstore.ManagedWarehouseStateDeleting, map[string]interface{}{ "state": configstore.ManagedWarehouseStateDeleted, "status_message": "Resources deleted", diff --git a/controlplane/provisioner/controller_analytics_test.go b/controlplane/provisioner/controller_analytics_test.go index d77f91bd..26fc65b7 100644 --- a/controlplane/provisioner/controller_analytics_test.go +++ b/controlplane/provisioner/controller_analytics_test.go @@ -124,8 +124,8 @@ func TestReconcileProvisioningSuccessEmitsEvent(t *testing.T) { if e.props["ducklake_enabled"] != true { t.Errorf("ducklake_enabled = %v, want true", e.props["ducklake_enabled"]) } - if e.props["iceberg_enabled"] != false { - t.Errorf("iceberg_enabled = %v, want false", e.props["iceberg_enabled"]) + if _, ok := 
e.props["iceberg_enabled"]; ok { + t.Errorf("iceberg_enabled should not be emitted") } if n := fake.count("warehouse_provision_failed"); n != 0 { t.Errorf("expected no failure event, got %d", n) diff --git a/controlplane/provisioner/controller_lakekeeper_test.go b/controlplane/provisioner/controller_lakekeeper_test.go deleted file mode 100644 index 1f4430a3..00000000 --- a/controlplane/provisioner/controller_lakekeeper_test.go +++ /dev/null @@ -1,331 +0,0 @@ -//go:build kubernetes - -package provisioner - -import ( - "context" - "encoding/json" - "errors" - "io" - "net/http" - "net/http/httptest" - "os" - "testing" - - "github.com/posthog/duckgres/controlplane/configstore" - apierrors "k8s.io/apimachinery/pkg/api/errors" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" -) - -func TestReconcileDeleting_TearsDownLakekeeper(t *testing.T) { - dc, _ := newFakeDucklingClient() - k8sClient, dyn, kc := newFakeLakekeeperClient() - ctx := context.Background() - orgID := "tenant-x" - - // Seed the per-org Lakekeeper resources EnsureForOrg would have created. - // The Duckling CR is intentionally left absent — reconcileDeleting tolerates - // NotFound on the CR delete and must still proceed to Lakekeeper teardown. 
- if err := k8sClient.EnsureSecret(ctx, orgID, LakekeeperSecretData{ - DBUser: "u", DBPassword: "p", EncryptionKey: "k", OAuth2ClientSecret: "o", - }); err != nil { - t.Fatalf("seed secret: %v", err) - } - if err := k8sClient.EnsureServiceAccount(ctx, orgID); err != nil { - t.Fatalf("seed service account: %v", err) - } - if err := k8sClient.EnsureCR(ctx, LakekeeperCRSpec{ - OrgID: orgID, Image: "stub", PGHost: "stub", PGDatabase: "stub", - SecretName: "stub", BaseURI: "http://stub", - }); err != nil { - t.Fatalf("seed CR: %v", err) - } - - fs := newFakeStore() - fs.warehouses[orgID] = &configstore.ManagedWarehouse{ - OrgID: orgID, - State: configstore.ManagedWarehouseStateDeleting, - } - - p := NewLakekeeperProvisioner(fs, k8sClient) - c := NewControllerWithClient(fs, dc, 0). - WithLakekeeperProvisioner(p, func(context.Context, *configstore.ManagedWarehouse) (ProvisioningInputs, error) { - return ProvisioningInputs{}, nil - }) - - c.reconcileDeleting(ctx, fs.warehouses[orgID]) - - if _, err := dyn.Resource(lakekeeperGVR).Namespace(k8sClient.namespace).Get(ctx, LakekeeperResourceName(orgID), metav1.GetOptions{}); !apierrors.IsNotFound(err) { - t.Errorf("Lakekeeper CR not torn down: err=%v", err) - } - if _, err := kc.CoreV1().Secrets(k8sClient.namespace).Get(ctx, LakekeeperResourceName(orgID), metav1.GetOptions{}); !apierrors.IsNotFound(err) { - t.Errorf("Lakekeeper Secret not torn down: err=%v", err) - } - if _, err := kc.CoreV1().ServiceAccounts(k8sClient.namespace).Get(ctx, LakekeeperServiceAccountName(orgID), metav1.GetOptions{}); !apierrors.IsNotFound(err) { - t.Errorf("Lakekeeper ServiceAccount not torn down: err=%v", err) - } - if fs.warehouses[orgID].State != configstore.ManagedWarehouseStateDeleted { - t.Errorf("warehouse state = %q, want deleted", fs.warehouses[orgID].State) - } -} - -func TestReconcileLakekeeper_SkipsWhenProvisionerNotConfigured(t *testing.T) { - store := newFakeStore() - store.warehouses["acme"] = &configstore.ManagedWarehouse{ - OrgID: 
"acme", - State: configstore.ManagedWarehouseStateReady, - Iceberg: configstore.ManagedWarehouseIceberg{ - Enabled: true, - Backend: configstore.IcebergBackendLakekeeper, - }, - } - c := NewControllerWithClient(store, nil, 0) // no lakekeeperProvisioner - // Should be a silent no-op — no panic, no calls, no err. - c.reconcileLakekeeper(context.Background(), store.warehouses["acme"]) -} - -func TestReconcileLakekeeper_SkipsWhenIcebergDisabled(t *testing.T) { - called := false - store := newFakeStore() - store.warehouses["acme"] = &configstore.ManagedWarehouse{ - OrgID: "acme", - State: configstore.ManagedWarehouseStateReady, - Iceberg: configstore.ManagedWarehouseIceberg{ - Enabled: false, // <-- gated off - Backend: configstore.IcebergBackendLakekeeper, - }, - } - c := NewControllerWithClient(store, nil, 0) - c.lakekeeperProvisioner = &LakekeeperProvisioner{} // present but unused - c.lakekeeperInputs = func(_ context.Context, _ *configstore.ManagedWarehouse) (ProvisioningInputs, error) { - called = true - return ProvisioningInputs{}, nil - } - c.reconcileLakekeeper(context.Background(), store.warehouses["acme"]) - if called { - t.Errorf("inputs resolver should not be called when Iceberg.Enabled=false") - } -} - -func TestReconcileLakekeeper_DriftCorrectsWhenAlreadyProvisioned(t *testing.T) { - // An already-provisioned org (LakekeeperEndpoint set) is NOT skipped: the - // Ready loop patches the pod shape (replicas/requests/scrape) onto the org's - // existing CR, matched by the duckgres/active-org label. No inputs are - // resolved, and no duplicate CR is created under the recomputed name. - called := false - k8sClient, dyn, _ := newFakeLakekeeperClient() - p := NewLakekeeperProvisioner(newFakeStore(), k8sClient) - - // Seed a legacy-named CR carrying the org label. Its name intentionally - // differs from LakekeeperResourceName("acme") to prove the patch is matched - // by label, not by a recomputed name (the post-#632 hyphenation bug). 
- seed := &unstructured.Unstructured{Object: map[string]interface{}{ - "apiVersion": "lakekeeper.k8s.lakekeeper.io/v1alpha1", - "kind": "Lakekeeper", - "metadata": map[string]interface{}{ - "name": "lakekeeper-acme-legacy", - "namespace": k8sClient.namespace, - "labels": map[string]interface{}{"duckgres/active-org": "acme"}, - }, - "spec": map[string]interface{}{"replicas": int64(1)}, - }} - if _, err := dyn.Resource(lakekeeperGVR).Namespace(k8sClient.namespace).Create(context.Background(), seed, metav1.CreateOptions{}); err != nil { - t.Fatalf("seed CR: %v", err) - } - - store := newFakeStore() - store.warehouses["acme"] = &configstore.ManagedWarehouse{ - OrgID: "acme", - State: configstore.ManagedWarehouseStateReady, - Iceberg: configstore.ManagedWarehouseIceberg{ - Enabled: true, - Backend: configstore.IcebergBackendLakekeeper, - LakekeeperEndpoint: "http://lk-acme.lakekeeper.svc:8181/catalog", - }, - } - c := NewControllerWithClient(store, nil, 0). - WithLakekeeperProvisioner(p, func(_ context.Context, _ *configstore.ManagedWarehouse) (ProvisioningInputs, error) { - called = true - return ProvisioningInputs{}, nil - }) - - c.reconcileLakekeeper(context.Background(), store.warehouses["acme"]) - - // Pod-shape drift correction needs no inputs — the resolver must not run. - if called { - t.Errorf("inputs resolver should NOT be called for pod-shape drift correction") - } - // The existing legacy-named CR was patched in place by label. 
- got, err := dyn.Resource(lakekeeperGVR).Namespace(k8sClient.namespace).Get(context.Background(), "lakekeeper-acme-legacy", metav1.GetOptions{}) - if err != nil { - t.Fatalf("get patched CR: %v", err) - } - spec := got.Object["spec"].(map[string]interface{}) - if rv := spec["replicas"]; rv != int64(lakekeeperPodReplicas) && rv != float64(lakekeeperPodReplicas) { - t.Errorf("replicas = %v, want %d", rv, lakekeeperPodReplicas) - } - ann := spec["podMetadata"].(map[string]interface{})["annotations"].(map[string]interface{}) - if ann["prometheus.io/scrape"] != "true" { - t.Errorf("scrape annotation not applied: %v", ann) - } - // No duplicate CR created under the recomputed (hyphenated) name. - if _, err := dyn.Resource(lakekeeperGVR).Namespace(k8sClient.namespace).Get(context.Background(), LakekeeperResourceName("acme"), metav1.GetOptions{}); err == nil { - t.Errorf("drift correction created a duplicate CR under the recomputed name") - } -} - -func TestReconcileLakekeeper_LogsAndContinuesOnInputResolverError(t *testing.T) { - store := newFakeStore() - store.warehouses["acme"] = &configstore.ManagedWarehouse{ - OrgID: "acme", - Iceberg: configstore.ManagedWarehouseIceberg{ - Enabled: true, - Backend: configstore.IcebergBackendLakekeeper, - }, - } - c := NewControllerWithClient(store, nil, 0) - c.lakekeeperProvisioner = &LakekeeperProvisioner{} - c.lakekeeperInputs = func(_ context.Context, _ *configstore.ManagedWarehouse) (ProvisioningInputs, error) { - return ProvisioningInputs{}, errors.New("crossplane secret not found") - } - // Should not panic; the controller logs at warn level and moves on so - // the poll loop retries on the next tick. - c.reconcileLakekeeper(context.Background(), store.warehouses["acme"]) -} - -func TestWithLakekeeperProvisioner_PanicsOnNilProvisioner(t *testing.T) { - defer func() { - if r := recover(); r == nil { - t.Fatal("expected panic on nil provisioner") - } - }() - NewControllerWithClient(newFakeStore(), nil, 0). 
- WithLakekeeperProvisioner(nil, func(context.Context, *configstore.ManagedWarehouse) (ProvisioningInputs, error) { - return ProvisioningInputs{}, nil - }) -} - -func TestWithLakekeeperProvisioner_PanicsOnNilResolver(t *testing.T) { - defer func() { - if r := recover(); r == nil { - t.Fatal("expected panic on nil resolver") - } - }() - NewControllerWithClient(newFakeStore(), nil, 0). - WithLakekeeperProvisioner(&LakekeeperProvisioner{}, nil) -} - -// TestReconcileLakekeeper_HappyPath exercises the positive case where all -// gates pass: a real LakekeeperProvisioner (built with fakes) is invoked, -// the inputs resolver's ProvisioningInputs reach EnsureForOrg intact, and -// the resulting Lakekeeper config is written through to the warehouse row. -// -// This is the "EnsureForOrg is actually called" coverage the early gate -// tests don't reach. -func TestReconcileLakekeeper_HappyPath(t *testing.T) { - // Fake Lakekeeper HTTP server: accepts bootstrap, list, and create - // warehouse. Tracks the create-warehouse call so we can assert it - // fired with the expected inputs. - var createCalled bool - var createReq CreateWarehouseRequest - lk := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - switch { - case r.Method == http.MethodPost && r.URL.Path == "/management/v1/bootstrap": - w.WriteHeader(http.StatusNoContent) - case r.Method == http.MethodGet && r.URL.Path == "/management/v1/warehouse": - _, _ = io.WriteString(w, `{"warehouses":[]}`) - case r.Method == http.MethodPost && r.URL.Path == "/management/v1/warehouse": - createCalled = true - _ = json.NewDecoder(r.Body).Decode(&createReq) - _, _ = io.WriteString(w, `{"warehouse-id":"wh-uuid","name":"`+createReq.WarehouseName+`","status":"active"}`) - default: - t.Errorf("unexpected request: %s %s", r.Method, r.URL.Path) - http.NotFound(w, r) - } - })) - t.Cleanup(lk.Close) - - // Fake K8s client + dynamic for the Lakekeeper CR + Secret. 
Pre-seed - // the CR with status.bootstrappedAt so waitForBootstrap (now - // checkBootstrap) returns immediately. - k8sClient, dyn, _ := newFakeLakekeeperClient() - if err := k8sClient.EnsureCR(context.Background(), LakekeeperCRSpec{ - OrgID: "happy", Image: "stub", PGHost: "stub", PGDatabase: "stub", - SecretName: "stub", BaseURI: "http://stub", - }); err != nil { - t.Fatalf("seed CR: %v", err) - } - cr, _ := dyn.Resource(lakekeeperGVR).Namespace(k8sClient.namespace).Get(context.Background(), LakekeeperResourceName("happy"), metav1.GetOptions{}) - cr.Object["status"] = map[string]interface{}{"bootstrappedAt": "2026-05-19T12:00:00Z"} - if _, err := dyn.Resource(lakekeeperGVR).Namespace(k8sClient.namespace).Update(context.Background(), cr, metav1.UpdateOptions{}); err != nil { - t.Fatalf("inject status: %v", err) - } - - // Real provisioner, with the Lakekeeper HTTP client factory pointed at - // our fake server. The EnsureDatabase call inside EnsureForOrg needs a - // live PG — we point at the prototype Postgres on localhost:5434, and - // skip the test if PG_ADMIN_DSN isn't set. - pgDSN := os.Getenv("PG_ADMIN_DSN") - if pgDSN == "" { - t.Skip("PG_ADMIN_DSN not set; skipping happy-path test") - } - p := NewLakekeeperProvisioner(newFakeStore(), k8sClient, - WithImage("stub:test"), - WithClientFactory(func(string) *LakekeeperClient { return NewLakekeeperClient(lk.URL) }), - ) - t.Cleanup(func() { dropDatabaseAndRole(t, pgDSN, lakekeeperDBName("happy")) }) - - // Wire into the controller. - store := newFakeStore() - store.warehouses["happy"] = &configstore.ManagedWarehouse{ - OrgID: "happy", - State: configstore.ManagedWarehouseStateReady, - Iceberg: configstore.ManagedWarehouseIceberg{ - Enabled: true, - Backend: configstore.IcebergBackendLakekeeper, - }, - } - // Re-point the provisioner's store at the controller's store so the - // EnsureForOrg writeback hits the same row reconcileLakekeeper sees. 
- p.store = store - - var capturedInputs ProvisioningInputs - wantInputs := ProvisioningInputs{ - AdminDSN: pgDSN, PGHost: "localhost", PGPort: 5434, PGSSLMode: "disable", - S3: S3StorageConfig{ - Bucket: "warehouse", KeyPrefix: "happy", Region: "us-east-1", Flavor: "s3-compat", - StaticAccessKeyID: "minioadmin", StaticAccessKeySecret: "minioadmin", - }, - } - - c := NewControllerWithClient(store, nil, 0). - WithLakekeeperProvisioner(p, func(_ context.Context, w *configstore.ManagedWarehouse) (ProvisioningInputs, error) { - if w.OrgID != "happy" { - t.Errorf("resolver got wrong org: %s", w.OrgID) - } - capturedInputs = wantInputs - return wantInputs, nil - }) - - c.reconcileLakekeeper(context.Background(), store.warehouses["happy"]) - - if !createCalled { - t.Fatalf("Lakekeeper warehouse-create REST call was never made") - } - if createReq.StorageProfile.KeyPrefix != "happy" { - t.Errorf("storage key-prefix = %q, want happy", createReq.StorageProfile.KeyPrefix) - } - // Verify inputs threaded through unchanged. - if capturedInputs.PGPort != wantInputs.PGPort || capturedInputs.S3.Bucket != wantInputs.S3.Bucket { - t.Errorf("inputs not threaded intact to provisioner; got %+v", capturedInputs) - } - // Verify the warehouse row got the persisted Lakekeeper config. 
- w := store.warehouses["happy"] - if w.Iceberg.LakekeeperEndpoint == "" { - t.Errorf("LakekeeperEndpoint not persisted after EnsureForOrg") - } - if w.Iceberg.LakekeeperWarehouse != "org-happy" { - t.Errorf("LakekeeperWarehouse = %q, want org-happy", w.Iceberg.LakekeeperWarehouse) - } -} diff --git a/controlplane/provisioner/controller_test.go b/controlplane/provisioner/controller_test.go index 301677ea..336f4ba8 100644 --- a/controlplane/provisioner/controller_test.go +++ b/controlplane/provisioner/controller_test.go @@ -92,68 +92,6 @@ func (s *fakeStore) UpdateWarehouseState(orgID string, expectedState configstore case "provisioning_started_at": t := v.(time.Time) w.ProvisioningStartedAt = &t - case "iceberg_region": - w.Iceberg.Region = v.(string) - case "iceberg_namespace": - w.Iceberg.Namespace = v.(string) - case "iceberg_state": - w.IcebergState = v.(configstore.ManagedWarehouseProvisioningState) - case "iceberg_enabled": - w.Iceberg.Enabled = v.(bool) - case "iceberg_backend": - w.Iceberg.Backend = v.(string) - case "iceberg_lakekeeper_endpoint": - w.Iceberg.LakekeeperEndpoint = v.(string) - case "iceberg_lakekeeper_warehouse": - w.Iceberg.LakekeeperWarehouse = v.(string) - case "iceberg_lakekeeper_client_id": - w.Iceberg.LakekeeperClientID = v.(string) - case "iceberg_lakekeeper_oauth2_server_uri": - w.Iceberg.LakekeeperOAuth2ServerURI = v.(string) - case "iceberg_lakekeeper_client_credentials_namespace": - w.Iceberg.LakekeeperClientCredentials.Namespace = v.(string) - case "iceberg_lakekeeper_client_credentials_name": - w.Iceberg.LakekeeperClientCredentials.Name = v.(string) - case "iceberg_lakekeeper_client_credentials_key": - w.Iceberg.LakekeeperClientCredentials.Key = v.(string) - } - } - return nil -} - -// UpdateIcebergConfig writes per-org Iceberg/Lakekeeper fields without a -// top-level state CAS. Mirrors the real configstore method's contract. 
-func (s *fakeStore) UpdateIcebergConfig(orgID string, updates map[string]interface{}) error { - w, ok := s.warehouses[orgID] - if !ok { - return fmt.Errorf("warehouse %q: %w", orgID, configstore.ErrWarehouseNotFound) - } - for k, v := range updates { - switch k { - case "iceberg_enabled": - w.Iceberg.Enabled = v.(bool) - case "iceberg_backend": - w.Iceberg.Backend = v.(string) - case "iceberg_namespace": - w.Iceberg.Namespace = v.(string) - case "iceberg_region": - w.Iceberg.Region = v.(string) - case "iceberg_state": - w.IcebergState = v.(configstore.ManagedWarehouseProvisioningState) - case "iceberg_lakekeeper_endpoint": - w.Iceberg.LakekeeperEndpoint = v.(string) - case "iceberg_lakekeeper_warehouse": - w.Iceberg.LakekeeperWarehouse = v.(string) - case "iceberg_lakekeeper_client_id": - w.Iceberg.LakekeeperClientID = v.(string) - case "iceberg_lakekeeper_oauth2_server_uri": - w.Iceberg.LakekeeperOAuth2ServerURI = v.(string) - case "iceberg_lakekeeper_client_credentials_namespace": - w.Iceberg.LakekeeperClientCredentials.Namespace = v.(string) - case "iceberg_lakekeeper_client_credentials_name": - w.Iceberg.LakekeeperClientCredentials.Name = v.(string) - case "iceberg_lakekeeper_client_credentials_key": - w.Iceberg.LakekeeperClientCredentials.Key = v.(string) } } return nil @@ -374,102 +312,6 @@ func TestReconcileReadyPatchesCRWhenPgBouncerFlippedOff(t *testing.T) { } } -func TestReconcileReadyPatchesCRWhenIcebergFlippedOn(t *testing.T) { - dc, fakeK8s := newFakeDucklingClient() - fs := newFakeStore() - fs.warehouses["org-iceberg-on"] = &configstore.ManagedWarehouse{ - OrgID: "org-iceberg-on", - State: configstore.ManagedWarehouseStateReady, - Iceberg: configstore.ManagedWarehouseIceberg{Enabled: true}, - } - // Seed a CR with no iceberg block — represents a warehouse that - // existed before iceberg was opted into. 
- cr := &unstructured.Unstructured{Object: map[string]interface{}{ - "apiVersion": "k8s.posthog.com/v1alpha1", - "kind": "Duckling", - "metadata": map[string]interface{}{ - "name": ducklingName("org-iceberg-on"), - "namespace": ducklingNamespace, - }, - "spec": map[string]interface{}{ - "metadataStore": map[string]interface{}{ - "type": "external", - "external": map[string]interface{}{ - "endpoint": "ext.example.internal", - "passwordAwsSecret": "ext-secret", - }, - }, - "dataStore": map[string]interface{}{"type": "s3bucket"}, - }, - }} - if _, err := fakeK8s.Resource(ducklingGVR).Namespace(ducklingNamespace).Create(context.Background(), cr, metav1.CreateOptions{}); err != nil { - t.Fatalf("seed CR: %v", err) - } - - ctrl := NewControllerWithClient(fs, dc, time.Second) - ctrl.reconcile(context.Background()) - - got, err := fakeK8s.Resource(ducklingGVR).Namespace(ducklingNamespace).Get(context.Background(), ducklingName("org-iceberg-on"), metav1.GetOptions{}) - if err != nil { - t.Fatalf("re-fetch CR: %v", err) - } - spec := got.Object["spec"].(map[string]interface{}) - iceberg, ok := spec["iceberg"].(map[string]interface{}) - if !ok { - t.Fatalf("expected iceberg block after drift patch, got spec=%v", spec) - } - if iceberg["enabled"] != true { - t.Fatalf("expected iceberg.enabled=true, got %v", iceberg["enabled"]) - } - // Merge-patch must not wipe sibling spec fields (metadataStore / dataStore). 
- if _, ok := spec["metadataStore"].(map[string]interface{}); !ok { - t.Fatalf("expected metadataStore preserved after iceberg patch") - } - if _, ok := spec["dataStore"].(map[string]interface{}); !ok { - t.Fatalf("expected dataStore preserved after iceberg patch") - } -} - -func TestReconcileReadyPatchesCRWhenIcebergFlippedOff(t *testing.T) { - dc, fakeK8s := newFakeDucklingClient() - fs := newFakeStore() - fs.warehouses["org-iceberg-off"] = &configstore.ManagedWarehouse{ - OrgID: "org-iceberg-off", - State: configstore.ManagedWarehouseStateReady, - Iceberg: configstore.ManagedWarehouseIceberg{Enabled: false}, - } - // Seed a CR that currently has iceberg enabled — expect drift back to false. - cr := &unstructured.Unstructured{Object: map[string]interface{}{ - "apiVersion": "k8s.posthog.com/v1alpha1", - "kind": "Duckling", - "metadata": map[string]interface{}{ - "name": ducklingName("org-iceberg-off"), - "namespace": ducklingNamespace, - }, - "spec": map[string]interface{}{ - "metadataStore": map[string]interface{}{"type": "external"}, - "dataStore": map[string]interface{}{"type": "s3bucket"}, - "iceberg": map[string]interface{}{"enabled": true}, - }, - }} - if _, err := fakeK8s.Resource(ducklingGVR).Namespace(ducklingNamespace).Create(context.Background(), cr, metav1.CreateOptions{}); err != nil { - t.Fatalf("seed CR: %v", err) - } - - ctrl := NewControllerWithClient(fs, dc, time.Second) - ctrl.reconcile(context.Background()) - - got, err := fakeK8s.Resource(ducklingGVR).Namespace(ducklingNamespace).Get(context.Background(), ducklingName("org-iceberg-off"), metav1.GetOptions{}) - if err != nil { - t.Fatalf("re-fetch CR: %v", err) - } - spec := got.Object["spec"].(map[string]interface{}) - iceberg := spec["iceberg"].(map[string]interface{}) - if iceberg["enabled"] != false { - t.Fatalf("expected iceberg.enabled=false after drift patch, got %v", iceberg["enabled"]) - } -} - func TestReconcileReadyNoDriftDoesNotPatch(t *testing.T) { dc, fakeK8s := 
newFakeDucklingClient() fs := newFakeStore() @@ -912,9 +754,8 @@ func TestFakeStoreUpdateWarehouseState(t *testing.T) { // TestReconcilePendingCreatesCnpgShardCR verifies that a warehouse whose // metadata-store kind is cnpg-shard produces a Duckling CR with -// metadataStore.type=cnpg-shard, no external/pgbouncer blocks, and the iceberg -// block enabled — the shape the composition expects for a Lakekeeper-backed -// shard tenant. +// metadataStore.type=cnpg-shard, no external/pgbouncer blocks, and DuckLake +// enabled. func TestReconcilePendingCreatesCnpgShardCR(t *testing.T) { dc, fakeK8s := newFakeDucklingClient() fs := newFakeStore() @@ -924,10 +765,7 @@ func TestReconcilePendingCreatesCnpgShardCR(t *testing.T) { MetadataStore: configstore.ManagedWarehouseMetadataStore{ Kind: configstore.MetadataStoreKindCnpgShard, }, - Iceberg: configstore.ManagedWarehouseIceberg{ - Enabled: true, - Backend: configstore.IcebergBackendLakekeeper, - }, + DuckLake: configstore.ManagedWarehouseDuckLake{Enabled: true}, } ctrl := NewControllerWithClient(fs, dc, time.Second) @@ -949,23 +787,20 @@ func TestReconcilePendingCreatesCnpgShardCR(t *testing.T) { if _, present := metadataStore["pgbouncer"]; present { t.Errorf("cnpg-shard CR must not carry a pgbouncer block, got %v", metadataStore["pgbouncer"]) } - iceberg, ok := spec["iceberg"].(map[string]interface{}) - if !ok || iceberg["enabled"] != true { - t.Errorf("expected iceberg.enabled=true on cnpg-shard CR, got %v", spec["iceberg"]) + if _, present := spec["iceberg"]; present { + t.Errorf("cnpg-shard CR must not carry an iceberg block, got %v", spec["iceberg"]) } - // DuckLake is emitted explicitly (false here — iceberg-only cnpg). 
ducklake, ok := spec["ducklake"].(map[string]interface{}) - if !ok || ducklake["enabled"] != false { - t.Errorf("expected ducklake.enabled=false on iceberg-only cnpg-shard CR, got %v", spec["ducklake"]) + if !ok || ducklake["enabled"] != true { + t.Errorf("expected ducklake.enabled=true on cnpg-shard CR, got %v", spec["ducklake"]) } if fs.warehouses["org-cnpg"].State != configstore.ManagedWarehouseStateProvisioning { t.Fatalf("expected provisioning state, got %q", fs.warehouses["org-cnpg"].State) } } -// TestReconcilePendingCreatesDuckLakeOnlyCnpgCR verifies the decoupled combo: -// cnpg-shard with DuckLake on and Iceberg off → ducklake.enabled=true, no -// iceberg block. +// TestReconcilePendingCreatesDuckLakeOnlyCnpgCR verifies cnpg-shard with +// DuckLake on emits ducklake.enabled=true and no iceberg block. func TestReconcilePendingCreatesDuckLakeOnlyCnpgCR(t *testing.T) { dc, fakeK8s := newFakeDucklingClient() fs := newFakeStore() @@ -992,15 +827,13 @@ func TestReconcilePendingCreatesDuckLakeOnlyCnpgCR(t *testing.T) { } // TestDucklingCreateCnpgShardRequiresCatalog verifies the Create guard: a -// cnpg-shard CR with neither DuckLake nor Iceberg has nothing to attach and is -// rejected. +// cnpg-shard CR without DuckLake has nothing to attach and is rejected. func TestDucklingCreateCnpgShardRequiresCatalog(t *testing.T) { dc, fakeK8s := newFakeDucklingClient() ctx := context.Background() err := dc.Create(ctx, "no-catalog", CreateOptions{ MetadataStoreType: configstore.MetadataStoreKindCnpgShard, - IcebergEnabled: false, DuckLakeEnabled: false, }) if err == nil { @@ -1012,167 +845,19 @@ func TestDucklingCreateCnpgShardRequiresCatalog(t *testing.T) { } // TestDucklingCreateRejectsUnsupportedType verifies the control plane refuses -// to create metadata-store types it doesn't provision (notably "external"). +// to create metadata-store types it doesn't provision. 
func TestDucklingCreateRejectsUnsupportedType(t *testing.T) { dc, _ := newFakeDucklingClient() - if err := dc.Create(context.Background(), "ext-org", CreateOptions{MetadataStoreType: "external"}); err == nil { - t.Fatal("expected error creating CR with unsupported metadata store type 'external'") - } -} - -// TestReconcileProvisioningCnpgShardGatedOnIceberg verifies a cnpg-shard -// warehouse does not flip to Ready while iceberg_state is unset (the Lakekeeper -// catalog isn't up yet), then reaches Ready once iceberg_state is Ready. This -// is the gating that the reconcileProvisioning -> reconcileLakekeeper call -// satisfies in production; here we drive iceberg_state directly since no -// Lakekeeper provisioner is wired. -func TestReconcileProvisioningCnpgShardGatedOnIceberg(t *testing.T) { - dc, fakeK8s := newFakeDucklingClient() - fs := newFakeStore() - fs.warehouses["org-cs"] = &configstore.ManagedWarehouse{ - OrgID: "org-cs", - State: configstore.ManagedWarehouseStateProvisioning, - CreatedAt: time.Now(), - MetadataStore: configstore.ManagedWarehouseMetadataStore{ - Kind: configstore.MetadataStoreKindCnpgShard, - }, - Iceberg: configstore.ManagedWarehouseIceberg{ - Enabled: true, - Backend: configstore.IcebergBackendLakekeeper, - }, - } - - cr := &unstructured.Unstructured{ - Object: map[string]interface{}{ - "apiVersion": "k8s.posthog.com/v1alpha1", - "kind": "Duckling", - "metadata": map[string]interface{}{ - "name": ducklingName("org-cs"), - "namespace": ducklingNamespace, - }, - "status": map[string]interface{}{ - "metadataStore": map[string]interface{}{ - "type": configstore.MetadataStoreKindCnpgShard, - "endpoint": "shard-001-pooler.cnpg-shards.svc.cluster.local", - "password": "from-provider-sql", - "user": "lakekeeper_org_cs", - "database": "lakekeeper_org_cs", - }, - "dataStore": map[string]interface{}{ - "type": "s3bucket", - "bucketName": "org-cs-bucket", - }, - "iamRoleArn": "arn:aws:iam::123456789012:role/duckling-org-cs", - "conditions": 
[]interface{}{ - map[string]interface{}{"type": "Ready", "status": "True"}, - map[string]interface{}{"type": "Synced", "status": "True"}, - }, - }, - }, - } - ctx := context.Background() - if _, err := fakeK8s.Resource(ducklingGVR).Namespace(ducklingNamespace).Create(ctx, cr, metav1.CreateOptions{}); err != nil { - t.Fatalf("create CR: %v", err) - } - - ctrl := NewControllerWithClient(fs, dc, time.Second) - ctrl.SetProbe(func(context.Context, string, string, string, string, string) error { return nil }) - - // First pass: infra is ready but iceberg_state is still pending (no - // Lakekeeper provisioner wired), so the warehouse must stay in provisioning. - ctrl.reconcile(ctx) - w := fs.warehouses["org-cs"] - if w.MetadataStoreState != configstore.ManagedWarehouseStateReady { - t.Fatalf("expected metadata_store_state ready, got %q", w.MetadataStoreState) - } - if w.State == configstore.ManagedWarehouseStateReady { - t.Fatal("warehouse must not be Ready while iceberg (Lakekeeper) is unprovisioned") - } - - // Simulate the Lakekeeper provisioner having completed. - if err := fs.UpdateIcebergConfig("org-cs", map[string]interface{}{ - "iceberg_state": configstore.ManagedWarehouseStateReady, - "iceberg_lakekeeper_endpoint": "http://lakekeeper-org-cs.lakekeeper.svc/catalog", - }); err != nil { - t.Fatalf("update iceberg config: %v", err) - } - - // Second pass: now all components incl. iceberg are ready -> Ready. 
-	ctrl.reconcile(ctx)
-	if fs.warehouses["org-cs"].State != configstore.ManagedWarehouseStateReady {
-		t.Fatalf("expected ready state after iceberg ready, got %q", fs.warehouses["org-cs"].State)
-	}
-}
-
-// --- external metadata store (iceberg+external and ducklake+external) ---
-
-// TestReconcilePendingCreatesIcebergExternalCR verifies a warehouse with an
-// external metadata store + iceberg produces a Duckling CR with
-// metadataStore.type=external (carrying endpoint/passwordAwsSecret/user/
-// database), an external dataStore reusing the named bucket, and iceberg on.
-func TestReconcilePendingCreatesIcebergExternalCR(t *testing.T) {
-	dc, fakeK8s := newFakeDucklingClient()
-	fs := newFakeStore()
-	fs.warehouses["org-ext"] = &configstore.ManagedWarehouse{
-		OrgID: "org-ext",
-		State: configstore.ManagedWarehouseStatePending,
-		MetadataStore: configstore.ManagedWarehouseMetadataStore{
-			Kind: configstore.MetadataStoreKindExternal,
-			Endpoint: "rds.example.us-east-1.rds.amazonaws.com",
-			Username: "postgres",
-			DatabaseName: "postgres",
-			PasswordAWSSecret: "duckling-example-rds-password",
-		},
-		DataStore: configstore.ManagedWarehouseDataStore{
-			Kind: "external",
-			BucketName: "posthog-duckling-example",
-			Region: "us-east-1",
-		},
-		Iceberg: configstore.ManagedWarehouseIceberg{
-			Enabled: true,
-			Backend: configstore.IcebergBackendLakekeeper,
-		},
-	}
-
-	ctrl := NewControllerWithClient(fs, dc, time.Second)
-	ctx := context.Background()
-	ctrl.reconcile(ctx)
-
-	cr, err := fakeK8s.Resource(ducklingGVR).Namespace(ducklingNamespace).Get(ctx, ducklingName("org-ext"), metav1.GetOptions{})
-	if err != nil {
-		t.Fatalf("expected CR to exist: %v", err)
-	}
-	spec := cr.Object["spec"].(map[string]interface{})
-	ms := spec["metadataStore"].(map[string]interface{})
-	if ms["type"] != configstore.MetadataStoreKindExternal {
-		t.Fatalf("metadataStore.type = %v, want external", ms["type"])
-	}
-	ext, ok := ms["external"].(map[string]interface{})
-	if !ok {
-		t.Fatalf("expected metadataStore.external block, got %v", ms)
-	}
-	if ext["endpoint"] != "rds.example.us-east-1.rds.amazonaws.com" || ext["passwordAwsSecret"] != "duckling-example-rds-password" {
-		t.Errorf("external endpoint/secret wrong: %v", ext)
-	}
-	if ext["user"] != "postgres" || ext["database"] != "postgres" {
-		t.Errorf("external user/database wrong: %v", ext)
-	}
-	ds := spec["dataStore"].(map[string]interface{})
-	if ds["type"] != "external" {
-		t.Fatalf("dataStore.type = %v, want external", ds["type"])
-	}
-	dsExt, ok := ds["external"].(map[string]interface{})
-	if !ok || dsExt["bucketName"] != "posthog-duckling-example" || dsExt["region"] != "us-east-1" {
-		t.Errorf("dataStore.external wrong: %v", ds["external"])
-	}
-	iceberg, ok := spec["iceberg"].(map[string]interface{})
-	if !ok || iceberg["enabled"] != true {
-		t.Errorf("expected iceberg.enabled=true, got %v", spec["iceberg"])
+	if err := dc.Create(context.Background(), "aurora-org", CreateOptions{
+		MetadataStoreType: "aurora",
+		DuckLakeEnabled: true,
+	}); err == nil {
+		t.Fatal("expected error creating CR with unsupported metadata store type")
 	}
 }
 
 // TestReconcilePendingCreatesDuckLakeExternalCR verifies external metadata
-// WITHOUT iceberg yields a CR with no iceberg block (DuckLake-on-external).
+// yields a DuckLake CR with no iceberg block.
 func TestReconcilePendingCreatesDuckLakeExternalCR(t *testing.T) {
 	dc, fakeK8s := newFakeDucklingClient()
 	fs := newFakeStore()
@@ -1187,6 +872,7 @@ func TestReconcilePendingCreatesDuckLakeExternalCR(t *testing.T) {
 		DataStore: configstore.ManagedWarehouseDataStore{
 			Kind: "external", BucketName: "posthog-duckling-example", Region: "us-east-1",
 		},
+		DuckLake: configstore.ManagedWarehouseDuckLake{Enabled: true},
 	}
 
 	ctrl := NewControllerWithClient(fs, dc, time.Second)
@@ -1201,6 +887,9 @@ func TestReconcilePendingCreatesDuckLakeExternalCR(t *testing.T) {
 	if spec["metadataStore"].(map[string]interface{})["type"] != configstore.MetadataStoreKindExternal {
 		t.Fatalf("metadataStore.type wrong: %v", spec["metadataStore"])
 	}
+	if dl, ok := spec["ducklake"].(map[string]interface{}); !ok || dl["enabled"] != true {
+		t.Errorf("expected ducklake.enabled=true, got %v", spec["ducklake"])
+	}
 	if _, present := spec["iceberg"]; present {
 		t.Errorf("ducklake+external CR must not carry an iceberg block, got %v", spec["iceberg"])
 	}
@@ -1211,7 +900,10 @@ func TestReconcilePendingCreatesDuckLakeExternalCR(t *testing.T) {
 func TestDucklingCreateExternalRequiresFields(t *testing.T) {
 	dc, fakeK8s := newFakeDucklingClient()
 	ctx := context.Background()
-	if err := dc.Create(ctx, "ext-bad", CreateOptions{MetadataStoreType: configstore.MetadataStoreKindExternal}); err == nil {
+	if err := dc.Create(ctx, "ext-bad", CreateOptions{
+		MetadataStoreType: configstore.MetadataStoreKindExternal,
+		DuckLakeEnabled: true,
+	}); err == nil {
 		t.Fatal("expected error creating external CR without endpoint/passwordAwsSecret")
 	}
 	if _, getErr := fakeK8s.Resource(ducklingGVR).Namespace(ducklingNamespace).Get(ctx, ducklingName("ext-bad"), metav1.GetOptions{}); getErr == nil {
@@ -1228,6 +920,7 @@ func TestDucklingCreateExternalDataStoreRequiresBucket(t *testing.T) {
 		ExternalEndpoint: "h",
 		ExternalPasswordAWSSecret: "s",
 		DataStoreType: "external",
+		DuckLakeEnabled: true,
 	})
 	if err == nil {
 		t.Fatal("expected error: external dataStore without a bucket name")
@@ -1268,9 +961,9 @@ func TestDucklingGetFallsBackToLegacyName(t *testing.T) {
 		t.Error("expected to parse the legacy CR's status")
 	}
 
-	// And iceberg/pgbouncer reads + delete must resolve it too.
-	if _, err := dc.GetIcebergEnabled(ctx, org); err != nil {
-		t.Errorf("GetIcebergEnabled fallback: %v", err)
+	// And pgbouncer reads + delete must resolve it too.
+	if _, err := dc.GetPgBouncerEnabled(ctx, org); err != nil {
+		t.Errorf("GetPgBouncerEnabled fallback: %v", err)
 	}
 	if err := dc.Delete(ctx, org); err != nil {
 		t.Errorf("Delete fallback: %v", err)
@@ -1281,8 +974,8 @@ func TestDucklingGetFallsBackToLegacyName(t *testing.T) {
 }
 
 // TestParseDucklingStatusDuckLakeEnabled locks in the present/absent contract
-// for spec.ducklake.enabled: nil (legacy CR) lets the activator apply the
-// type-based default; an explicit true/false is authoritative.
+// for spec.ducklake.enabled: nil means absent; an explicit true/false is
+// authoritative.
 func TestParseDucklingStatusDuckLakeEnabled(t *testing.T) {
 	mk := func(spec map[string]interface{}) *unstructured.Unstructured {
 		return &unstructured.Unstructured{Object: map[string]interface{}{
diff --git a/controlplane/provisioner/k8s_client.go b/controlplane/provisioner/k8s_client.go
index 10d5091e..95001595 100644
--- a/controlplane/provisioner/k8s_client.go
+++ b/controlplane/provisioner/k8s_client.go
@@ -6,7 +6,6 @@ import (
 	"context"
 	"encoding/json"
 	"fmt"
-	"regexp"
 	"strings"
 
 	"github.com/posthog/duckgres/controlplane/configstore"
@@ -45,22 +44,12 @@ type DucklingStatus struct {
 		BucketName string
 		S3Region string
 	}
-	// Iceberg is populated when spec.iceberg.enabled=true. The
-	// composition provisions a per-org Lakekeeper instance; the
-	// Lakekeeper provisioner extension drives readiness off the
-	// Lakekeeper CR itself, so we only need namespace/region here.
-	Iceberg struct {
-		NamespaceName string
-		Region string
-	}
 	IAMRoleARN string
 	ReadyCondition bool
 	SyncedFalseMessage string
 
 	// DuckLakeEnabled is spec.ducklake.enabled, read present/absent: nil when
-	// the CR predates the decoupled ducklake field (the worker activator then
-	// falls back to the legacy type-based default — DuckLake on for
-	// external, off for cnpg-shard). Non-nil for decoupled ducklings.
+	// the CR predates the explicit ducklake field.
 	DuckLakeEnabled *bool
 }
 
@@ -88,8 +77,7 @@ func NewDucklingClientWithDynamic(client dynamic.Interface) *DucklingClient {
 }
 
 // ducklingName is the k8s/AWS resource name derived from an org ID, used for
-// the Duckling CR, the IAM role (duckling-), the S3 bucket, the
-// Lakekeeper CR/SA/Secret, etc. Org IDs are validated as DNS-1123 labels at
+// the Duckling CR, the IAM role (duckling-), the S3 bucket, etc. Org IDs are validated as DNS-1123 labels at
 // provision time (lowercase alphanumerics + hyphens), so this only lowercases:
 // hyphens are preserved, keeping in-cluster names human-readable and injective
 // with the org ID.
@@ -101,22 +89,6 @@ func ducklingName(orgID string) string {
 	return strings.ToLower(orgID)
 }
 
-// pgIdentSanitizeRe matches characters not allowed in an unquoted Postgres
-// identifier fragment.
-var pgIdentSanitizeRe = regexp.MustCompile(`[^a-z0-9_]`)
-
-// pgIdentSuffix sanitizes an org ID into a valid unquoted Postgres identifier
-// fragment: lowercase, with every non-[a-z0-9_] character (notably hyphens)
-// mapped to '_'. Postgres identifiers can't contain hyphens unquoted, so PG
-// object names can't preserve them the way k8s names do. This mirrors the
-// Crossplane composition's $pgIdent transform for cnpg-shard, so the external
-// (provisioner-created) and cnpg-shard (composition-created) Lakekeeper
-// databases follow the same convention. Injective for org IDs restricted to
-// [a-z0-9-], which the provision-time validation guarantees.
-func pgIdentSuffix(orgID string) string {
-	return pgIdentSanitizeRe.ReplaceAllString(strings.ToLower(orgID), "_")
-}
-
 // legacyDucklingName is the pre-hyphen-preservation transform (hyphens
 // stripped). Retained ONLY so lookups can still find Duckling CRs created
 // before ducklingName started preserving hyphens — e.g. a CR named
@@ -150,24 +122,17 @@ type CreateOptions struct {
 	DataStoreBucket string
 	DataStoreRegion string
 
-	// IcebergEnabled toggles spec.iceberg.enabled on the Duckling CR. The
-	// composition only provisions the per-tenant Lakekeeper Iceberg catalog
-	// when this is true; flipping it post-create is handled by the controller's
-	// Ready-state drift logic.
-	IcebergEnabled bool
-	// IcebergNamespace is the Iceberg namespace within the tenant's catalog.
-	// Empty falls back to the XRD default ("main").
-	IcebergNamespace string
-
-	// DuckLakeEnabled toggles spec.ducklake.enabled. Independent of Iceberg and
-	// of the metadata-store type; at least one of DuckLakeEnabled/IcebergEnabled
-	// must be true (Create rejects a CR with neither).
+	// DuckLakeEnabled toggles spec.ducklake.enabled. Create rejects CRs without
+	// DuckLake because DuckLake is the only supported catalog.
 	DuckLakeEnabled bool
 }
 
 // Create creates a Duckling CR for the given org.
 func (d *DucklingClient) Create(ctx context.Context, orgID string, opts CreateOptions) error {
 	name := ducklingName(orgID)
+	if !opts.DuckLakeEnabled {
+		return fmt.Errorf("create duckling CR %q: ducklake must be enabled", name)
+	}
 
 	var metadataStore map[string]interface{}
 	switch opts.MetadataStoreType {
@@ -175,21 +140,16 @@ func (d *DucklingClient) Create(ctx context.Context, orgID string, opts CreateOp
 		// The cnpg-shard metadata store is the per-tenant Postgres on the shared
 		// CloudNativePG shard, provisioned via provider-sql. It carries no
 		// per-claim config — the composition reads the active shard from chart
-		// values. It hosts the DuckLake catalog and/or the Lakekeeper PG
-		// depending on the catalog flags; a CR with neither catalog has nothing
-		// to attach, so refuse it.
-		if !opts.IcebergEnabled && !opts.DuckLakeEnabled {
-			return fmt.Errorf("create duckling CR %q: metadata store type %q requires at least one of ducklake or iceberg enabled", name, configstore.MetadataStoreKindCnpgShard)
-		}
+		// values. It hosts the DuckLake catalog; a CR without DuckLake has
+		// nothing to attach, so refuse it.
 		// No pgbouncer block: cnpg-shard tenants reach Postgres through the
 		// shard's own session-mode Pooler, not a per-Duckling PgBouncer.
 		metadataStore = map[string]interface{}{"type": configstore.MetadataStoreKindCnpgShard}
 	case configstore.MetadataStoreKindExternal:
 		// A pre-existing Postgres (e.g. RDS), referenced by endpoint + an AWS
 		// Secrets Manager secret name for the password (resolved by the
-		// composition via ESO). Backs a DuckLake catalog (iceberg disabled) or
-		// the Lakekeeper catalog (iceberg enabled). User/Database are omitted
-		// when empty so the XRD defaults ("postgres") apply.
+		// composition via ESO). Backs a DuckLake catalog. User/Database are
+		// omitted when empty so the XRD defaults ("postgres") apply.
 		if opts.ExternalEndpoint == "" || opts.ExternalPasswordAWSSecret == "" {
 			return fmt.Errorf("create duckling CR %q: metadata store type %q requires endpoint and passwordAwsSecret", name, configstore.MetadataStoreKindExternal)
 		}
@@ -246,18 +206,7 @@ func (d *DucklingClient) Create(ctx context.Context, orgID string, opts CreateOp
 	spec := map[string]interface{}{
 		"metadataStore": metadataStore,
 		"dataStore": dataStore,
-		// DuckLake is set explicitly (true or false) so the catalog choice is
-		// unambiguous on the CR — the worker activator reads spec.ducklake.enabled
-		// and only falls back to the legacy type-based default when the field is
-		// absent (i.e. for ducklings created before decoupling).
-		"ducklake": map[string]interface{}{"enabled": opts.DuckLakeEnabled},
-	}
-	if opts.IcebergEnabled {
-		iceberg := map[string]interface{}{"enabled": true}
-		if ns := opts.IcebergNamespace; ns != "" {
-			iceberg["namespace"] = ns
-		}
-		spec["iceberg"] = iceberg
+		"ducklake": map[string]interface{}{"enabled": opts.DuckLakeEnabled},
 	}
 	cr := &unstructured.Unstructured{
 		Object: map[string]interface{}{
@@ -375,59 +324,6 @@ func (d *DucklingClient) SetPgBouncerEnabled(ctx context.Context, orgID string,
 	return nil
 }
 
-// GetIcebergEnabled reads spec.iceberg.enabled from the Duckling CR. Missing
-// blocks (composition at an older schema, CR predates iceberg support) are
-// reported as false — same as an explicit opt-out — so the caller can just
-// compare against the desired value.
-func (d *DucklingClient) GetIcebergEnabled(ctx context.Context, orgID string) (bool, error) {
-	cr, name, err := d.getCR(ctx, orgID)
-	if err != nil {
-		return false, fmt.Errorf("get duckling CR %q: %w", name, err)
-	}
-	spec, ok := cr.Object["spec"].(map[string]interface{})
-	if !ok {
-		return false, nil
-	}
-	iceberg, ok := spec["iceberg"].(map[string]interface{})
-	if !ok {
-		return false, nil
-	}
-	enabled, _ := iceberg["enabled"].(bool)
-	return enabled, nil
-}
-
-// SetIcebergEnabled patches spec.iceberg.enabled on the Duckling CR for the
-// given org. Uses a JSON merge patch (RFC 7396) so the call is idempotent and
-// only touches the iceberg block — sibling fields under spec (metadataStore,
-// dataStore) are left untouched.
-//
-// Note: iceberg.namespace is enforced immutable by the XRD's CEL rule, so
-// this method intentionally only patches enabled — namespace changes have
-// to go through warehouse re-creation.
-func (d *DucklingClient) SetIcebergEnabled(ctx context.Context, orgID string, enabled bool) error {
-	_, name, err := d.getCR(ctx, orgID)
-	if err != nil {
-		return fmt.Errorf("resolve duckling CR for %q: %w", orgID, err)
-	}
-	patch, err := json.Marshal(map[string]interface{}{
-		"spec": map[string]interface{}{
-			"iceberg": map[string]interface{}{
-				"enabled": enabled,
-			},
-		},
-	})
-	if err != nil {
-		return fmt.Errorf("marshal iceberg patch for %q: %w", name, err)
-	}
-	_, err = d.client.Resource(ducklingGVR).Namespace(ducklingNamespace).Patch(
-		ctx, name, types.MergePatchType, patch, metav1.PatchOptions{},
-	)
-	if err != nil {
-		return fmt.Errorf("patch duckling CR %q iceberg: %w", name, err)
-	}
-	return nil
-}
-
 // GetDataStoreBucketName reads spec.dataStore.bucketName from the Duckling CR.
 // Empty (missing block / missing key) means the CR predates CP-owned naming
 // and the composition is still deriving the name — the signal the backfill in
@@ -482,8 +378,7 @@ func (d *DucklingClient) SetDataStoreBucketName(ctx context.Context, orgID, buck
 }
 
 // readSpecDuckLakeEnabled returns spec.ducklake.enabled as *bool — nil when the
-// ducklake block (or its enabled key) is absent, so callers can distinguish a
-// legacy CR (apply the type-based default) from an explicit true/false.
+// ducklake block (or its enabled key) is absent.
 func readSpecDuckLakeEnabled(cr *unstructured.Unstructured) *bool {
 	spec, ok := cr.Object["spec"].(map[string]interface{})
 	if !ok {
@@ -502,8 +397,7 @@ func readSpecDuckLakeEnabled(cr *unstructured.Unstructured) *bool {
 
 func parseDucklingStatus(cr *unstructured.Unstructured) (*DucklingStatus, error) {
 	// spec.ducklake.enabled lives in .spec (not .status) — read it first so it's
-	// captured even before the composition writes any status. Present/absent is
-	// significant: absent (legacy CR) leaves DuckLakeEnabled nil.
+	// captured even before the composition writes any status.
 	duckLakeEnabled := readSpecDuckLakeEnabled(cr)
 
 	status, ok := cr.Object["status"].(map[string]interface{})
@@ -533,12 +427,6 @@ func parseDucklingStatus(cr *unstructured.Unstructured) (*DucklingStatus, error)
 		ds.DataStore.S3Region = getNestedString(store, "s3Region")
 	}
 
-	// Parse status.iceberg (only populated when spec.iceberg.enabled=true)
-	if ic, ok := status["iceberg"].(map[string]interface{}); ok {
-		ds.Iceberg.NamespaceName = getNestedString(ic, "namespaceName")
-		ds.Iceberg.Region = getNestedString(ic, "region")
-	}
-
 	// Parse conditions
 	conditions, _ := status["conditions"].([]interface{})
 	for _, cond := range conditions {
diff --git a/controlplane/provisioner/lakekeeper_client.go b/controlplane/provisioner/lakekeeper_client.go
deleted file mode 100644
index f22a8776..00000000
--- a/controlplane/provisioner/lakekeeper_client.go
+++ /dev/null
@@ -1,248 +0,0 @@
-package provisioner
-
-import (
-	"bytes"
-	"context"
-	"encoding/json"
-	"errors"
-	"fmt"
-	"io"
-	"net/http"
-	"time"
-)
-
-// LakekeeperClient is a thin HTTP client for the Lakekeeper management +
-// catalog REST surface that the provisioner needs to drive: bootstrap the
-// server, create the per-org warehouse, and check ready/bootstrapped state.
-//
-// One client instance addresses one Lakekeeper deployment. Per-org Lakekeeper
-// deployments get their own client instances built ad-hoc by the provisioner.
-//
-// Authentication: a static bearer token can be set via WithBearer. In the
-// current allowall + NetworkPolicy deployment model the token is unused; the
-// field is kept so the OIDC follow-up (PR3) can plug in without changing
-// callers.
-type LakekeeperClient struct {
-	baseURL string
-	hc *http.Client
-	bearer string
-}
-
-func NewLakekeeperClient(baseURL string) *LakekeeperClient {
-	return &LakekeeperClient{
-		baseURL: baseURL,
-		hc: &http.Client{Timeout: 30 * time.Second},
-	}
-}
-
-// WithBearer sets the bearer token used on subsequent requests. The token
-// must be a single-line string (no CR/LF) — Go's http.Header.Set won't error
-// on construction but http.Client.Do will reject malformed headers at send.
-//
-// NOT safe to call concurrently with in-flight requests. Build the client
-// once, configure it, then share read-only across goroutines. PR3 will add
-// proper rotation primitives when OIDC refresh lands.
-func (c *LakekeeperClient) WithBearer(token string) *LakekeeperClient {
-	c.bearer = token
-	return c
-}
-
-func (c *LakekeeperClient) WithHTTPClient(hc *http.Client) *LakekeeperClient {
-	c.hc = hc
-	return c
-}
-
-// ServerInfo is the subset of GET /management/v1/info the provisioner cares
-// about. Bootstrapped flips to true after the first POST /management/v1/bootstrap.
-type ServerInfo struct {
-	Version string `json:"version"`
-	Bootstrapped bool `json:"bootstrapped"`
-	ServerID string `json:"server-id"`
-	AuthzBackend string `json:"authz-backend"`
-	DefaultProjectID string `json:"default-project-id"`
-}
-
-// Info fetches the server info. Returns ErrServerNotReady if the endpoint is
-// reachable but the server reports itself as not yet ready (rare; treated as
-// transient by the caller).
-func (c *LakekeeperClient) Info(ctx context.Context) (*ServerInfo, error) {
-	var out ServerInfo
-	if err := c.do(ctx, http.MethodGet, "/management/v1/info", nil, &out); err != nil {
-		return nil, err
-	}
-	return &out, nil
-}
-
-// Bootstrap is the one-shot init of a fresh Lakekeeper server. Returns nil if
-// the server was already bootstrapped (Lakekeeper responds 409 in that case;
-// we treat it as success).
-func (c *LakekeeperClient) Bootstrap(ctx context.Context) error {
-	body := map[string]any{
-		"accept-terms-of-use": true,
-		"is-operator": true,
-	}
-	err := c.do(ctx, http.MethodPost, "/management/v1/bootstrap", body, nil)
-	if err == nil {
-		return nil
-	}
-	var apiErr *APIError
-	if errors.As(err, &apiErr) && apiErr.Status == http.StatusConflict {
-		// Already bootstrapped — idempotent success.
-		return nil
-	}
-	return err
-}
-
-// WarehouseStorageProfile is the subset of fields we send for an S3 / S3-compat
-// warehouse. The Lakekeeper API accepts more; we only set what we need.
-type WarehouseStorageProfile struct {
-	Type string `json:"type"` // "s3"
-	Bucket string `json:"bucket"`
-	KeyPrefix string `json:"key-prefix"`
-	Endpoint string `json:"endpoint,omitempty"` // e.g. http://minio:9000; omit for real AWS
-	STSEndpoint string `json:"sts-endpoint,omitempty"` // optional
-	Region string `json:"region"`
-	PathStyleAccess bool `json:"path-style-access"`
-	Flavor string `json:"flavor"` // "s3-compat" for MinIO, "aws" for AWS
-	STSEnabled bool `json:"sts-enabled"`
-	RemoteSigningEnabled bool `json:"remote-signing-enabled"`
-	// STSRoleARN is the IAM role Lakekeeper assumes to mint vended (scoped,
-	// short-lived) S3 credentials for clients. Lakekeeper requires it for the
-	// AWS flavor when sts-enabled. Empty for s3-compat (MinIO). We set it to
-	// the per-org duckling role, which the Lakekeeper pod already runs as via
-	// EKS Pod Identity — so it assumes itself (the role trusts its own ARN).
-	STSRoleARN string `json:"sts-role-arn,omitempty"`
-}
-
-// WarehouseStorageCredential supports access-key creds (dev/MinIO) and
-// instance-profile / IRSA-style creds (prod AWS, no static key).
-type WarehouseStorageCredential struct {
-	Type string `json:"type"` // "s3"
-	CredentialType string `json:"credential-type"` // "access-key" or "aws-system-identity"
-	AWSAccessKeyID string `json:"aws-access-key-id,omitempty"`
-	AWSSecretAccessKey string `json:"aws-secret-access-key,omitempty"`
-}
-
-// CreateWarehouseRequest is the body of POST /management/v1/warehouse.
-type CreateWarehouseRequest struct {
-	WarehouseName string `json:"warehouse-name"`
-	ProjectID string `json:"project-id,omitempty"`
-	StorageProfile WarehouseStorageProfile `json:"storage-profile"`
-	StorageCredential WarehouseStorageCredential `json:"storage-credential"`
-}
-
-// Warehouse is the subset of the create/list response we use.
-type Warehouse struct {
-	ID string `json:"id"`
-	WarehouseID string `json:"warehouse-id"`
-	Name string `json:"name"`
-	ProjectID string `json:"project-id"`
-	Status string `json:"status"`
-}
-
-// listWarehousesResponse wraps the list endpoint response.
-type listWarehousesResponse struct {
-	Warehouses []Warehouse `json:"warehouses"`
-}
-
-// EnsureWarehouse creates the warehouse if it doesn't exist, otherwise returns
-// the existing one. Match is by warehouse-name within the default project.
-//
-// Idempotent under concurrent callers: if two callers both observe an empty
-// list and both POST, the second POST will 409 and we re-list to return the
-// winner. Callers that need stronger ordering should hold a per-org lock
-// outside this method.
-func (c *LakekeeperClient) EnsureWarehouse(ctx context.Context, req CreateWarehouseRequest) (*Warehouse, error) {
-	if existing, err := c.findWarehouseByName(ctx, req.WarehouseName); err != nil {
-		return nil, err
-	} else if existing != nil {
-		return existing, nil
-	}
-	var out Warehouse
-	err := c.do(ctx, http.MethodPost, "/management/v1/warehouse", req, &out)
-	if err == nil {
-		return &out, nil
-	}
-	// 409 → another caller (or a previous attempt) won the race. Re-list and
-	// return the existing warehouse rather than surface the conflict.
-	var apiErr *APIError
-	if errors.As(err, &apiErr) && apiErr.Status == http.StatusConflict {
-		existing, lookupErr := c.findWarehouseByName(ctx, req.WarehouseName)
-		if lookupErr != nil {
-			return nil, fmt.Errorf("create warehouse %q hit 409, lookup failed: %w", req.WarehouseName, lookupErr)
-		}
-		if existing != nil {
-			return existing, nil
-		}
-		// 409 but nothing matches by name — pass the original error through.
-	}
-	return nil, fmt.Errorf("create warehouse %q: %w", req.WarehouseName, err)
-}
-
-// findWarehouseByName scans the first page of the warehouse list for a name
-// match. Lakekeeper paginates the list endpoint; we assume one warehouse per
-// per-org Lakekeeper instance, so a single page is enough today. If we ever
-// host multiple warehouses per instance, revisit to honor `next-page-token`.
-func (c *LakekeeperClient) findWarehouseByName(ctx context.Context, name string) (*Warehouse, error) {
-	var resp listWarehousesResponse
-	if err := c.do(ctx, http.MethodGet, "/management/v1/warehouse", nil, &resp); err != nil {
-		return nil, fmt.Errorf("list warehouses: %w", err)
-	}
-	for i := range resp.Warehouses {
-		if resp.Warehouses[i].Name == name {
-			return &resp.Warehouses[i], nil
-		}
-	}
-	return nil, nil
-}
-
-// APIError is returned for non-2xx responses. Status holds the HTTP code;
-// Body holds the raw response body (often a JSON error envelope from
-// Lakekeeper that we don't bother unmarshalling).
-type APIError struct {
-	Status int
-	Method string
-	Path string
-	Body string
-}
-
-func (e *APIError) Error() string {
-	return fmt.Sprintf("lakekeeper %s %s: HTTP %d: %s", e.Method, e.Path, e.Status, e.Body)
-}
-
-func (c *LakekeeperClient) do(ctx context.Context, method, path string, body, out any) error {
-	var rdr io.Reader
-	if body != nil {
-		b, err := json.Marshal(body)
-		if err != nil {
-			return fmt.Errorf("marshal %s %s body: %w", method, path, err)
-		}
-		rdr = bytes.NewReader(b)
-	}
-	req, err := http.NewRequestWithContext(ctx, method, c.baseURL+path, rdr)
-	if err != nil {
-		return err
-	}
-	if body != nil {
-		req.Header.Set("Content-Type", "application/json")
-	}
-	if c.bearer != "" {
-		req.Header.Set("Authorization", "Bearer "+c.bearer)
-	}
-	resp, err := c.hc.Do(req)
-	if err != nil {
-		return fmt.Errorf("%s %s: %w", method, path, err)
-	}
-	defer func() { _ = resp.Body.Close() }()
-	respBody, _ := io.ReadAll(resp.Body)
-	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
-		return &APIError{Status: resp.StatusCode, Method: method, Path: path, Body: string(respBody)}
-	}
-	if out == nil || len(respBody) == 0 {
-		return nil
-	}
-	if err := json.Unmarshal(respBody, out); err != nil {
-		return fmt.Errorf("decode %s %s response: %w", method, path, err)
-	}
-	return nil
-}
diff --git a/controlplane/provisioner/lakekeeper_client_smoke_test.go b/controlplane/provisioner/lakekeeper_client_smoke_test.go
deleted file mode 100644
index 0c0943fa..00000000
--- a/controlplane/provisioner/lakekeeper_client_smoke_test.go
+++ /dev/null
@@ -1,47 +0,0 @@
-package provisioner
-
-import (
-	"context"
-	"errors"
-	"net/http"
-	"net/url"
-	"os"
-	"strings"
-	"testing"
-)
-
-// Smoke-test against a real running Lakekeeper. Skipped unless
-// LAKEKEEPER_SMOKE_URL is set (e.g. http://localhost:8181). Pair with
-// LAKEKEEPER_SMOKE_BEARER if the server has OIDC enabled.
-//
-// To run against the tmp/lakekeeper-proto stack:
-//
-//	export LAKEKEEPER_SMOKE_URL=http://localhost:8181
-//	export LAKEKEEPER_SMOKE_BEARER="$(docker exec lkproto-oidc python /srv/mkjwt.py 300)"
-//	go test ./controlplane/provisioner/ -run TestSmoke -v
-func TestSmoke_InfoAgainstLiveLakekeeper(t *testing.T) {
-	base := os.Getenv("LAKEKEEPER_SMOKE_URL")
-	if base == "" {
-		t.Skip("LAKEKEEPER_SMOKE_URL not set; skipping live smoke test")
-	}
-	if _, err := url.Parse(base); err != nil {
-		t.Fatalf("LAKEKEEPER_SMOKE_URL invalid: %v", err)
-	}
-	c := NewLakekeeperClient(strings.TrimRight(base, "/"))
-	if tok := os.Getenv("LAKEKEEPER_SMOKE_BEARER"); tok != "" {
-		c.WithBearer(tok)
-	}
-	info, err := c.Info(context.Background())
-	if err != nil {
-		// Distinguish auth from connectivity.
-		var apiErr *APIError
-		if errors.As(err, &apiErr) && apiErr.Status == http.StatusUnauthorized {
-			t.Fatalf("Lakekeeper requires bearer (set LAKEKEEPER_SMOKE_BEARER): %v", err)
-		}
-		t.Fatalf("Info: %v", err)
-	}
-	if info.Version == "" {
-		t.Fatalf("expected non-empty version, got %+v", info)
-	}
-	t.Logf("live lakekeeper info: version=%s authz=%s bootstrapped=%v", info.Version, info.AuthzBackend, info.Bootstrapped)
-}
diff --git a/controlplane/provisioner/lakekeeper_client_test.go b/controlplane/provisioner/lakekeeper_client_test.go
deleted file mode 100644
index 78084250..00000000
--- a/controlplane/provisioner/lakekeeper_client_test.go
+++ /dev/null
@@ -1,182 +0,0 @@
-package provisioner
-
-import (
-	"context"
-	"encoding/json"
-	"errors"
-	"io"
-	"net/http"
-	"net/http/httptest"
-	"strings"
-	"testing"
-)
-
-func newTestClient(handler http.Handler) (*LakekeeperClient, func()) {
-	srv := httptest.NewServer(handler)
-	return NewLakekeeperClient(srv.URL), srv.Close
-}
-
-func TestInfo_OK(t *testing.T) {
-	c, stop := newTestClient(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		if r.Method != http.MethodGet || r.URL.Path != "/management/v1/info" {
-			t.Fatalf("unexpected request: %s %s", r.Method, r.URL.Path)
-		}
-		_, _ = io.WriteString(w, `{"version":"0.11.6","bootstrapped":true,"server-id":"sid","authz-backend":"allow-all"}`)
-	}))
-	defer stop()
-
-	info, err := c.Info(context.Background())
-	if err != nil {
-		t.Fatalf("Info: %v", err)
-	}
-	if !info.Bootstrapped || info.Version != "0.11.6" || info.AuthzBackend != "allow-all" {
-		t.Fatalf("unexpected info: %+v", info)
-	}
-}
-
-func TestBootstrap_AlreadyBootstrappedIsIdempotent(t *testing.T) {
-	c, stop := newTestClient(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		w.WriteHeader(http.StatusConflict)
-		_, _ = io.WriteString(w, `{"error":{"message":"already bootstrapped"}}`)
-	}))
-	defer stop()
-
-	if err := c.Bootstrap(context.Background()); err != nil {
-		t.Fatalf("expected idempotent success on 409, got: %v", err)
-	}
-}
-
-func TestBootstrap_ServerErrorPropagates(t *testing.T) {
-	c, stop := newTestClient(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		w.WriteHeader(http.StatusInternalServerError)
-		_, _ = io.WriteString(w, "boom")
-	}))
-	defer stop()
-
-	err := c.Bootstrap(context.Background())
-	var apiErr *APIError
-	if !errors.As(err, &apiErr) {
-		t.Fatalf("expected APIError, got: %v", err)
-	}
-	if apiErr.Status != http.StatusInternalServerError || apiErr.Path != "/management/v1/bootstrap" {
-		t.Fatalf("unexpected APIError: %+v", apiErr)
-	}
-}
-
-func TestEnsureWarehouse_CreatesWhenAbsent(t *testing.T) {
-	var sawCreate bool
-	c, stop := newTestClient(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		switch {
-		case r.Method == http.MethodGet && r.URL.Path == "/management/v1/warehouse":
-			_, _ = io.WriteString(w, `{"warehouses":[]}`)
-		case r.Method == http.MethodPost && r.URL.Path == "/management/v1/warehouse":
-			sawCreate = true
-			var req CreateWarehouseRequest
-			if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
-				t.Fatalf("decode body: %v", err)
-			}
-			if req.WarehouseName != "org-acme" {
-				t.Fatalf("warehouse name = %q, want org-acme", req.WarehouseName)
-			}
-			if !req.StorageProfile.STSEnabled || req.StorageProfile.Flavor != "s3-compat" {
-				t.Fatalf("storage profile must enable STS + s3-compat flavor, got %+v", req.StorageProfile)
-			}
-			if req.StorageProfile.RemoteSigningEnabled {
-				t.Fatalf("remote-signing-enabled must be false for DuckDB-compatible vending")
-			}
-			_, _ = io.WriteString(w, `{"warehouse-id":"wh-uuid","name":"org-acme","status":"active"}`)
-		default:
-			t.Fatalf("unexpected request: %s %s", r.Method, r.URL.Path)
-		}
-	}))
-	defer stop()
-
-	req := CreateWarehouseRequest{
-		WarehouseName: "org-acme",
-		StorageProfile: WarehouseStorageProfile{
-			Type: "s3", Bucket: "warehouse", KeyPrefix: "org-acme",
-			Endpoint: "http://minio:9000", Region: "us-east-1",
-			PathStyleAccess: true, Flavor: "s3-compat",
-			STSEnabled: true, RemoteSigningEnabled: false,
-		},
-		StorageCredential: WarehouseStorageCredential{
-			Type: "s3", CredentialType: "access-key",
-			AWSAccessKeyID: "minioadmin", AWSSecretAccessKey: "minioadmin",
-		},
-	}
-	wh, err := c.EnsureWarehouse(context.Background(), req)
-	if err != nil {
-		t.Fatalf("EnsureWarehouse: %v", err)
-	}
-	if !sawCreate {
-		t.Fatalf("create POST was not sent")
-	}
-	if wh.Name != "org-acme" || wh.WarehouseID != "wh-uuid" {
-		t.Fatalf("unexpected warehouse: %+v", wh)
-	}
-}
-
-func TestEnsureWarehouse_ResolvesRaceOn409(t *testing.T) {
-	// Simulate concurrent reconcilers: first GET sees empty list, POST loses
-	// the race (409), follow-up GET returns the winner's warehouse.
- step := 0 - c, stop := newTestClient(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - switch { - case r.Method == http.MethodGet && step == 0: - step = 1 - _, _ = io.WriteString(w, `{"warehouses":[]}`) - case r.Method == http.MethodPost && step == 1: - step = 2 - w.WriteHeader(http.StatusConflict) - _, _ = io.WriteString(w, `{"error":{"message":"warehouse exists"}}`) - case r.Method == http.MethodGet && step == 2: - _, _ = io.WriteString(w, `{"warehouses":[{"warehouse-id":"winner","name":"org-acme","status":"active"}]}`) - default: - t.Fatalf("unexpected request at step %d: %s %s", step, r.Method, r.URL.Path) - } - })) - defer stop() - - wh, err := c.EnsureWarehouse(context.Background(), CreateWarehouseRequest{WarehouseName: "org-acme"}) - if err != nil { - t.Fatalf("EnsureWarehouse should resolve 409 via re-list: %v", err) - } - if wh.WarehouseID != "winner" { - t.Fatalf("expected winner warehouse, got %+v", wh) - } -} - -func TestEnsureWarehouse_NoOpWhenPresent(t *testing.T) { - c, stop := newTestClient(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - if r.Method == http.MethodPost { - t.Fatalf("create should not be called when warehouse already exists") - } - _, _ = io.WriteString(w, `{"warehouses":[{"warehouse-id":"existing","name":"org-acme","status":"active"}]}`) - })) - defer stop() - - wh, err := c.EnsureWarehouse(context.Background(), CreateWarehouseRequest{WarehouseName: "org-acme"}) - if err != nil { - t.Fatalf("EnsureWarehouse: %v", err) - } - if wh.WarehouseID != "existing" { - t.Fatalf("expected existing warehouse, got %+v", wh) - } -} - -func TestBearerHeaderIsSetWhenConfigured(t *testing.T) { - var sawAuth string - c, stop := newTestClient(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - sawAuth = r.Header.Get("Authorization") - _, _ = io.WriteString(w, `{"version":"x","bootstrapped":true}`) - })) - defer stop() - c.WithBearer("abc.def.ghi") - - if _, err := c.Info(context.Background()); err != 
nil { - t.Fatalf("Info: %v", err) - } - if !strings.HasPrefix(sawAuth, "Bearer abc") { - t.Fatalf("Authorization header = %q, want Bearer abc...", sawAuth) - } -} diff --git a/controlplane/provisioner/lakekeeper_e2e_helpers_test.go b/controlplane/provisioner/lakekeeper_e2e_helpers_test.go deleted file mode 100644 index 4c437f29..00000000 --- a/controlplane/provisioner/lakekeeper_e2e_helpers_test.go +++ /dev/null @@ -1,53 +0,0 @@ -//go:build kubernetes - -package provisioner - -import ( - "context" - "database/sql" - "fmt" - "testing" - - corev1 "k8s.io/api/core/v1" - apierrors "k8s.io/apimachinery/pkg/api/errors" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/client-go/kubernetes" -) - -func ensureNamespace(t *testing.T, kc kubernetes.Interface, name string) error { - t.Helper() - ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: name}} - if _, err := kc.CoreV1().Namespaces().Create(context.Background(), ns, metav1.CreateOptions{}); err != nil { - if apierrors.IsAlreadyExists(err) { - return nil - } - return err - } - return nil -} - -func metav1DeleteOpts() metav1.DeleteOptions { - policy := metav1.DeletePropagationBackground - return metav1.DeleteOptions{PropagationPolicy: &policy} -} - -// dropDatabaseAndRole removes both the DB and the role created by EnsureRole. -// Used by the e2e test's t.Cleanup. Best-effort — logs but doesn't fail. 
-func dropDatabaseAndRole(t *testing.T, dsn, dbName string) { - t.Helper() - db, err := sql.Open("pgx", dsn) - if err != nil { - t.Logf("cleanup open: %v", err) - return - } - defer db.Close() - roleName := dbName - _, _ = db.Exec("REASSIGN OWNED BY " + quoteIdent(roleName) + " TO CURRENT_USER") - _, _ = db.Exec("DROP OWNED BY " + quoteIdent(roleName)) - if _, err := db.Exec("DROP DATABASE IF EXISTS " + quoteIdent(dbName)); err != nil { - t.Logf("cleanup drop database %s: %v", dbName, err) - } - if _, err := db.Exec(fmt.Sprintf("DROP ROLE IF EXISTS %s", quoteIdent(roleName))); err != nil { - t.Logf("cleanup drop role %s: %v", roleName, err) - } -} diff --git a/controlplane/provisioner/lakekeeper_e2e_test.go b/controlplane/provisioner/lakekeeper_e2e_test.go deleted file mode 100644 index 033be330..00000000 --- a/controlplane/provisioner/lakekeeper_e2e_test.go +++ /dev/null @@ -1,174 +0,0 @@ -//go:build kubernetes - -package provisioner - -import ( - "context" - "errors" - "fmt" - "os" - "testing" - "time" - - "github.com/posthog/duckgres/controlplane/configstore" - "k8s.io/client-go/dynamic" - "k8s.io/client-go/kubernetes" - "k8s.io/client-go/rest" - "k8s.io/client-go/tools/clientcmd" -) - -// TestE2E_LakekeeperProvisionerOnOrbstack drives EnsureForOrg against a real -// Kubernetes cluster (orbstack) with the lakekeeper-operator installed and a -// Postgres + MinIO running on the host (via tmp/lakekeeper-proto/docker-compose). -// -// Skipped unless LAKEKEEPER_E2E_KUBECONFIG is set. 
To run: -// -// export LAKEKEEPER_E2E_KUBECONFIG=$HOME/.kube/config -// export PG_ADMIN_DSN='postgres://lakekeeper:lakekeeper@localhost:5434/lakekeeper?sslmode=disable' -// go test -tags kubernetes ./controlplane/provisioner/ -run TestE2E_LakekeeperProvisionerOnOrbstack -v -// -// What's exercised: -// - Real K8s API server (not the fake dynamic client) so CR field -// types are validated against the operator's CRD OpenAPI schema -// - Real Postgres so EnsureDatabase + EnsureRole + the GRANT chain run -// end-to-end (verifies the previously-missing CREATE ROLE bug stays -// fixed) -// - Real lakekeeper-operator reconciling the CR — status.bootstrappedAt -// flips when the operator actually completes bootstrap -// - Real Lakekeeper pod calling /management/v1/warehouse, which exercises -// our HTTP client against a real Lakekeeper server (not httptest) -func TestE2E_LakekeeperProvisionerOnOrbstack(t *testing.T) { - kubeconfig := os.Getenv("LAKEKEEPER_E2E_KUBECONFIG") - if kubeconfig == "" { - t.Skip("LAKEKEEPER_E2E_KUBECONFIG not set; skipping orbstack e2e test") - } - dsn := os.Getenv("PG_ADMIN_DSN") - if dsn == "" { - t.Fatal("PG_ADMIN_DSN is required for e2e test") - } - - cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig) - if err != nil { - t.Fatalf("load kubeconfig: %v", err) - } - dc, err := dynamic.NewForConfig(cfg) - if err != nil { - t.Fatalf("dynamic client: %v", err) - } - kc, err := kubernetes.NewForConfig(cfg) - if err != nil { - t.Fatalf("kube client: %v", err) - } - - orgID := fmt.Sprintf("e2e%d", time.Now().Unix()) // alnum-only so it passes the label-value rule - t.Logf("orgID=%s", orgID) - - const namespace = "lakekeeper" // separate from lakekeeper-operator's own namespace - if err := ensureNamespace(t, kc, namespace); err != nil { - t.Fatalf("ensure namespace %s: %v", namespace, err) - } - - k8sClient := NewLakekeeperK8sClientWithClients(dc, kc, namespace) - - store := newFakeStore() - store.warehouses[orgID] = 
&configstore.ManagedWarehouse{ - OrgID: orgID, - State: configstore.ManagedWarehouseStateProvisioning, - } - - // The provisioner computes a baseURL like - // http://lakekeeper-.lakekeeper.svc:8181 — that hostname is only - // resolvable inside the cluster. From the host, we go through the API - // server's service-proxy subresource. WithClientFactory swaps the - // baseURL for the proxy URL on every NewLakekeeperClient call. - httpClient, err := rest.HTTPClientFor(cfg) - if err != nil { - t.Fatalf("rest http client: %v", err) - } - apiBase := cfg.Host - clientFor := func(baseURL string) *LakekeeperClient { - // The provisioner passes a baseURL like - // http://lakekeeper-.lakekeeper.svc:8181 - // (no /catalog suffix — management API lives at the root). From the - // host we route via the API server's service-proxy subresource. - proxyBase := fmt.Sprintf("%s/api/v1/namespaces/%s/services/http:%s:8181/proxy", - apiBase, namespace, LakekeeperResourceName(orgID)) - c := NewLakekeeperClient(proxyBase) - c.WithHTTPClient(httpClient) - return c - } - p := NewLakekeeperProvisioner(store, k8sClient, - WithImage("quay.io/lakekeeper/catalog:latest"), - WithClientFactory(clientFor), - ) - - in := ProvisioningInputs{ - AdminDSN: dsn, - PGHost: "host.docker.internal", // reachable from orbstack pods - PGPort: 5434, - PGSSLMode: "disable", // prototype PG has no TLS - S3: S3StorageConfig{ - Bucket: "warehouse", - KeyPrefix: orgID, - Endpoint: "http://host.docker.internal:9100", - Region: "us-east-1", - Flavor: "s3-compat", - // MinIO doesn't actually do STS-compat with these creds, but - // the warehouse-create call only validates the credential is - // well-formed; the actual S3 operations happen later when a - // duckling tries to write. - StaticAccessKeyID: "minioadmin", - StaticAccessKeySecret: "minioadmin", - }, - } - - t.Cleanup(func() { - // Cleanup: drop the per-org PG database + role, delete the CR + Secret. 
- // Keep best-effort — failures here shouldn't break the test result. - ctx := context.Background() - _ = k8sClient.dynamic.Resource(lakekeeperGVR).Namespace(namespace). - Delete(ctx, LakekeeperResourceName(orgID), metav1DeleteOpts()) - _ = k8sClient.kubernetes.CoreV1().Secrets(namespace). - Delete(ctx, LakekeeperResourceName(orgID), metav1DeleteOpts()) - dropDatabaseAndRole(t, dsn, lakekeeperDBName(orgID)) - }) - - // Retry EnsureForOrg until bootstrap completes (or timeout). The real - // operator's bootstrap takes 10–30s including image pull on first run. - ctx, cancel := context.WithTimeout(context.Background(), 3*time.Minute) - defer cancel() - startedAt := time.Now() - var lastErr error - for ctx.Err() == nil { - lastErr = p.EnsureForOrg(ctx, store.warehouses[orgID], in) - if lastErr == nil { - t.Logf("EnsureForOrg succeeded after %s", time.Since(startedAt)) - break - } - if errors.Is(lastErr, ErrBootstrapPending) { - t.Logf("[%s] bootstrap pending, requeueing...", time.Since(startedAt).Round(time.Second)) - time.Sleep(3 * time.Second) - continue - } - t.Fatalf("EnsureForOrg returned fatal error: %v", lastErr) - } - if lastErr != nil { - t.Fatalf("EnsureForOrg timed out: %v", lastErr) - } - - w := store.warehouses[orgID] - if !w.Iceberg.Enabled { - t.Errorf("Iceberg.Enabled not flipped on") - } - if w.Iceberg.LakekeeperEndpoint == "" { - t.Errorf("LakekeeperEndpoint not persisted") - } - if w.Iceberg.LakekeeperWarehouse != lakekeeperWarehouseName(orgID) { - t.Errorf("LakekeeperWarehouse = %q, want %q", w.Iceberg.LakekeeperWarehouse, lakekeeperWarehouseName(orgID)) - } - if w.Iceberg.LakekeeperClientID != oauthClientID(orgID) { - t.Errorf("LakekeeperClientID = %q, want %q", w.Iceberg.LakekeeperClientID, oauthClientID(orgID)) - } - t.Logf("warehouse row: endpoint=%s warehouse=%s client_id=%s", - w.Iceberg.LakekeeperEndpoint, w.Iceberg.LakekeeperWarehouse, w.Iceberg.LakekeeperClientID) -} diff --git a/controlplane/provisioner/lakekeeper_k8s.go 
b/controlplane/provisioner/lakekeeper_k8s.go deleted file mode 100644 index e3b4a064..00000000 --- a/controlplane/provisioner/lakekeeper_k8s.go +++ /dev/null @@ -1,578 +0,0 @@ -//go:build kubernetes - -package provisioner - -import ( - "context" - "encoding/json" - "fmt" - "regexp" - - corev1 "k8s.io/api/core/v1" - apierrors "k8s.io/apimachinery/pkg/api/errors" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" - "k8s.io/apimachinery/pkg/runtime/schema" - "k8s.io/apimachinery/pkg/types" - "k8s.io/client-go/dynamic" - "k8s.io/client-go/kubernetes" - "k8s.io/client-go/rest" -) - -// k8sLabelValue matches the Kubernetes label-value grammar -// (([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])? with ≤63 chars. We use it to -// validate org IDs before stamping them on labels, so a malformed ID -// surfaces as a clear error rather than an opaque API server rejection. -var k8sLabelValue = regexp.MustCompile(`^([A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?)$`) - -func isValidOrgIDLabel(orgID string) bool { - return len(orgID) > 0 && len(orgID) <= 63 && k8sLabelValue.MatchString(orgID) -} - -// LakekeeperNamespace is the Kubernetes namespace where per-org Lakekeeper -// instances live. The lakekeeper-operator and its CRD watch every namespace -// by default but we co-locate the CRs to keep RBAC tight. -const LakekeeperNamespace = "lakekeeper" - -// Per-org Lakekeeper pod resource shape. Requests only (no limits) → Burstable -// QoS: a CPU limit would CFS-throttle the catalog, and we intentionally leave -// memory unbounded too. Lakekeeper is a light Rust REST catalog (mostly idle -// metadata ops), so a modest request floor is plenty; bump if a tenant needs -// more headroom. -const ( - lakekeeperPodCPU = "500m" - lakekeeperPodMemory = "512Mi" - lakekeeperPodReplicas = 2 -) - -// lakekeeperMetricsPort is the operator's default metrics container port -// (lakekeeper-operator getMetricsPort default). 
We don't set -// spec.server.metricsPort, so this is where the metrics endpoint listens and -// the value advertised to vmagent via the prometheus.io/port pod annotation. -const lakekeeperMetricsPort = "9000" - -// lakekeeperResourceRequests returns the CR spec.resources block: requests only, -// no limits (Burstable). Shared by EnsureCR (create) and PatchPodShape (drift) -// so the create and drift paths never diverge. -func lakekeeperResourceRequests() map[string]interface{} { - return map[string]interface{}{ - "requests": map[string]interface{}{ - "cpu": lakekeeperPodCPU, - "memory": lakekeeperPodMemory, - }, - } -} - -// lakekeeperPodMetadata returns the CR spec.podMetadata block carrying the -// Prometheus scrape annotations (see EnsureCR for the rationale). -func lakekeeperPodMetadata() map[string]interface{} { - return map[string]interface{}{ - "annotations": map[string]interface{}{ - "prometheus.io/scrape": "true", - "prometheus.io/port": lakekeeperMetricsPort, - "prometheus.io/path": "/metrics", - }, - } -} - -// PatchPodShape converges the pod-shape fields (replicas + resource requests + -// scrape annotations) onto every existing Lakekeeper CR for the org, matched by -// the duckgres/active-org label. -// -// It deliberately does NOT recompute the CR name from the orgID. Post-#632, -// LakekeeperResourceName preserves hyphens, but a legacy org's CR — and its -// Secret, ServiceAccount, and EKS pod-identity, all derived from the no-hyphen -// Duckling XR name — keeps the de-hyphenated name. Looking up by label patches -// whatever name actually exists (no-dash for legacy orgs, hyphenated for new -// ones) instead of minting a duplicate CR under a name that has no matching -// Secret/SA/pod-identity. -// -// Uses a JSON merge patch, which carries no resourceVersion, so it never races -// the operator's frequent status writes (the "object has been modified" -// conflicts seen under multiple control-plane replicas). 
limits is explicitly -// nulled so the patch strips any stale CPU/memory limit (requests-only shape). -func (c *LakekeeperK8sClient) PatchPodShape(ctx context.Context, orgID string) error { - if !isValidOrgIDLabel(orgID) { - return fmt.Errorf("PatchPodShape: orgID %q is not a valid K8s label value", orgID) - } - resource := c.dynamic.Resource(lakekeeperGVR).Namespace(c.namespace) - list, err := resource.List(ctx, metav1.ListOptions{ - LabelSelector: "duckgres/active-org=" + orgID, - }) - if err != nil { - return fmt.Errorf("list lakekeeper CRs for org %s: %w", orgID, err) - } - resources := lakekeeperResourceRequests() - resources["limits"] = nil // merge-patch removes any pre-existing limit block - patch, err := json.Marshal(map[string]interface{}{ - "spec": map[string]interface{}{ - "replicas": lakekeeperPodReplicas, - "resources": resources, - "podMetadata": lakekeeperPodMetadata(), - }, - }) - if err != nil { - return fmt.Errorf("marshal pod-shape patch: %w", err) - } - for i := range list.Items { - name := list.Items[i].GetName() - if _, err := resource.Patch(ctx, name, types.MergePatchType, patch, metav1.PatchOptions{}); err != nil { - return fmt.Errorf("patch lakekeeper CR %s pod shape: %w", name, err) - } - } - return nil -} - -// lakekeeperGVR matches the operator at /Users/james/opt/ph/lakekeeper-operator. -var lakekeeperGVR = schema.GroupVersionResource{ - Group: "lakekeeper.k8s.lakekeeper.io", - Version: "v1alpha1", - Resource: "lakekeepers", -} - -// LakekeeperK8sClient bundles the dynamic client (for the Lakekeeper CR) and -// the typed clientset (for the backing Secret) so the provisioner can drive -// both with one dependency. Construction mirrors DucklingClient. -type LakekeeperK8sClient struct { - dynamic dynamic.Interface - kubernetes kubernetes.Interface - namespace string -} - -// NewLakekeeperK8sClient builds a client from in-cluster config. 
-func NewLakekeeperK8sClient() (*LakekeeperK8sClient, error) { - cfg, err := rest.InClusterConfig() - if err != nil { - return nil, fmt.Errorf("in-cluster config: %w", err) - } - dc, err := dynamic.NewForConfig(cfg) - if err != nil { - return nil, fmt.Errorf("dynamic client: %w", err) - } - kc, err := kubernetes.NewForConfig(cfg) - if err != nil { - return nil, fmt.Errorf("kubernetes client: %w", err) - } - return &LakekeeperK8sClient{dynamic: dc, kubernetes: kc, namespace: LakekeeperNamespace}, nil -} - -// NewLakekeeperK8sClientWithClients is used by tests with fakes. -func NewLakekeeperK8sClientWithClients(dc dynamic.Interface, kc kubernetes.Interface, namespace string) *LakekeeperK8sClient { - if namespace == "" { - namespace = LakekeeperNamespace - } - return &LakekeeperK8sClient{dynamic: dc, kubernetes: kc, namespace: namespace} -} - -// LakekeeperResourceName derives the K8s resource name (CR + Secret + SA) for -// an org. Uses ducklingName so it preserves hyphens, matching the Duckling CR -// and the rest of the in-cluster resources. -func LakekeeperResourceName(orgID string) string { - return "lakekeeper-" + ducklingName(orgID) -} - -// LakekeeperSecretData is the strongly-typed contents of the per-org Secret -// that the CR's *SecretRef fields point at. All values are required. -type LakekeeperSecretData struct { - DBUser string - DBPassword string - EncryptionKey string // 32-byte key used by Lakekeeper for at-rest secret encryption - OAuth2ClientSecret string // client_secret minted for the duckling -} - -// SecretKey* are the keys inside the Secret. Stable contract with the CR. -const ( - SecretKeyDBUser = "db-user" - SecretKeyDBPassword = "db-password" - SecretKeyEncryptionKey = "encryption-key" - SecretKeyOAuth2ClientSecret = "oauth2-client-secret" -) - -// EnsureSecret creates the per-org Secret in the lakekeeper namespace or -// updates it if it already exists. 
Update semantics: the four keys are -// replaced wholesale on every call — callers must pass the full desired -// state, not a delta. -// -// On a Get/Update resourceVersion conflict (concurrent recreator between -// our IsAlreadyExists branch and our Update), the returned error wraps the -// apierrors.IsConflict-detectable original so the reconciler can treat it -// as transient and requeue. -func (c *LakekeeperK8sClient) EnsureSecret(ctx context.Context, orgID string, data LakekeeperSecretData) error { - if !isValidOrgIDLabel(orgID) { - return fmt.Errorf("EnsureSecret: orgID %q is not a valid K8s label value", orgID) - } - name := LakekeeperResourceName(orgID) - // Write to Data (bytes) rather than StringData (strings). The K8s API - // server converts StringData → Data on write and clears StringData on - // read, so production code that reads the Secret only ever sees Data - // populated. The dynamic-fake clientset doesn't do that conversion, so - // writing Data directly makes the fake's behavior match real K8s for - // downstream readers like secretFromExisting. - desired := &corev1.Secret{ - ObjectMeta: metav1.ObjectMeta{ - Name: name, - Namespace: c.namespace, - Labels: map[string]string{ - "app": "lakekeeper", - "duckgres/active-org": orgID, - }, - }, - Type: corev1.SecretTypeOpaque, - Data: map[string][]byte{ - SecretKeyDBUser: []byte(data.DBUser), - SecretKeyDBPassword: []byte(data.DBPassword), - SecretKeyEncryptionKey: []byte(data.EncryptionKey), - SecretKeyOAuth2ClientSecret: []byte(data.OAuth2ClientSecret), - }, - } - - secrets := c.kubernetes.CoreV1().Secrets(c.namespace) - _, err := secrets.Create(ctx, desired, metav1.CreateOptions{}) - if err == nil { - return nil - } - if !apierrors.IsAlreadyExists(err) { - return fmt.Errorf("create secret %s: %w", name, err) - } - // Update path: fetch resourceVersion + replace. 
- existing, getErr := secrets.Get(ctx, name, metav1.GetOptions{}) - if getErr != nil { - return fmt.Errorf("get secret %s for update: %w", name, getErr) - } - desired.ResourceVersion = existing.ResourceVersion - if _, err := secrets.Update(ctx, desired, metav1.UpdateOptions{}); err != nil { - return fmt.Errorf("update secret %s: %w", name, err) - } - return nil -} - -// LakekeeperServiceAccountName is the per-org ServiceAccount the Lakekeeper -// Deployment + migration Job run under. It matches the CR/Secret resource -// name. The posthog-cloud-infra EKS Pod Identity association keys on this -// exact (namespace, name) pair to bind a per-org IAM role, so changing this -// convention requires a matching Terraform change. -func LakekeeperServiceAccountName(orgID string) string { - return LakekeeperResourceName(orgID) -} - -// EnsureServiceAccount creates the per-org ServiceAccount that the Lakekeeper -// workload runs under. One SA per org — in a single shared namespace — so each -// org's Lakekeeper can carry a distinct cloud identity (EKS Pod Identity) -// scoped to its own object store. -// -// The SA is intentionally bare: EKS Pod Identity binds the IAM role to the -// (namespace, serviceAccount) pair on the AWS side, so no IRSA role-arn -// annotation is needed here. On re-runs we leave an existing SA untouched -// rather than overwriting it, so any annotations added out-of-band (e.g. an -// IRSA role-arn, if a cluster uses IRSA instead of Pod Identity) survive. 
-func (c *LakekeeperK8sClient) EnsureServiceAccount(ctx context.Context, orgID string) error { - if !isValidOrgIDLabel(orgID) { - return fmt.Errorf("EnsureServiceAccount: orgID %q is not a valid K8s label value", orgID) - } - name := LakekeeperServiceAccountName(orgID) - desired := &corev1.ServiceAccount{ - ObjectMeta: metav1.ObjectMeta{ - Name: name, - Namespace: c.namespace, - Labels: map[string]string{ - "app": "lakekeeper", - "duckgres/active-org": orgID, - }, - }, - } - sas := c.kubernetes.CoreV1().ServiceAccounts(c.namespace) - _, err := sas.Create(ctx, desired, metav1.CreateOptions{}) - if err == nil || apierrors.IsAlreadyExists(err) { - return nil - } - return fmt.Errorf("create service account %s: %w", name, err) -} - -// LakekeeperCRSpec carries the inputs we need to render a Lakekeeper CR. -// One CR per org. PG connection points at the org's existing managed-warehouse -// metadata Postgres, with the lakekeeper_ database created by -// EnsureDatabase. -type LakekeeperCRSpec struct { - OrgID string - Image string // e.g. quay.io/lakekeeper/catalog:v0.12.2 - Replicas int32 - PGHost string - PGPort int32 - PGDatabase string - // SecretName is the K8s Secret holding db-user, db-password, encryption-key, - // oauth2-client-secret. Typically LakekeeperResourceName(OrgID). - SecretName string - // BaseURI is the externally-visible URL clients (ducklings) use to reach - // Lakekeeper. In-cluster: http://lakekeeper-.lakekeeper.svc:8181 - BaseURI string - // PGSSLMode is the Postgres SSL mode the Lakekeeper pod uses to connect. - // Defaults to "require" (the operator's default). Set to "disable" for - // local/dev where PG runs without TLS. - PGSSLMode string - - // ServiceAccountName binds the Lakekeeper pod (and migration Job) to a - // specific ServiceAccount via the operator's spec.serviceAccountName - // field (a PostHog-fork addition). Empty falls back to the namespace - // default. 
We set it to the per-org SA so each org's Lakekeeper carries - // its own EKS Pod Identity for isolated object-store access. - ServiceAccountName string - - // KubernetesAuthAudiences, when non-empty, enables the operator's - // `authentication.kubernetes` mode with the given audiences. The - // duckling's projected SA token must carry one of these audiences for - // Lakekeeper to accept it. Empty disables kubernetes auth (Lakekeeper - // runs in allowall mode behind NetworkPolicy — the PR1+PR2 deployment - // shape). - KubernetesAuthAudiences []string -} - -// EnsureCR creates the Lakekeeper CR for the given org or patches it to match -// spec if it already exists. -// -// Like EnsureSecret, Update conflicts surface as apierrors.IsConflict-detectable -// errors so the reconciler can treat them as transient and requeue. -func (c *LakekeeperK8sClient) EnsureCR(ctx context.Context, spec LakekeeperCRSpec) error { - if spec.OrgID == "" { - return fmt.Errorf("EnsureCR: spec.OrgID is required") - } - if !isValidOrgIDLabel(spec.OrgID) { - return fmt.Errorf("EnsureCR: orgID %q is not a valid K8s label value", spec.OrgID) - } - if spec.Image == "" || spec.PGHost == "" || spec.PGDatabase == "" || spec.SecretName == "" { - return fmt.Errorf("EnsureCR: missing required field in spec: %+v", spec) - } - if spec.Replicas == 0 { - spec.Replicas = lakekeeperPodReplicas - } - if spec.PGPort == 0 { - spec.PGPort = 5432 - } - - name := LakekeeperResourceName(spec.OrgID) - cr := &unstructured.Unstructured{ - Object: map[string]interface{}{ - "apiVersion": "lakekeeper.k8s.lakekeeper.io/v1alpha1", - "kind": "Lakekeeper", - "metadata": map[string]interface{}{ - "name": name, - "namespace": c.namespace, - "labels": map[string]interface{}{ - "app": "lakekeeper", - "duckgres/active-org": spec.OrgID, - }, - }, - "spec": map[string]interface{}{ - "image": spec.Image, - "replicas": int64(spec.Replicas), - "database": map[string]interface{}{ - "type": "postgres", - "postgres": func() 
map[string]interface{} { - pg := map[string]interface{}{ - "host": spec.PGHost, - "port": int64(spec.PGPort), - "database": spec.PGDatabase, - "userSecretRef": map[string]interface{}{ - "name": spec.SecretName, - "key": SecretKeyDBUser, - }, - "passwordSecretRef": map[string]interface{}{ - "name": spec.SecretName, - "key": SecretKeyDBPassword, - }, - "encryptionKeySecretRef": map[string]interface{}{ - "name": spec.SecretName, - "key": SecretKeyEncryptionKey, - }, - } - if spec.PGSSLMode != "" { - pg["sslMode"] = spec.PGSSLMode - } - return pg - }(), - }, - "authorization": map[string]interface{}{ - "backend": "allowall", - }, - "bootstrap": map[string]interface{}{ - "enabled": true, - }, - "server": map[string]interface{}{ - "listenPort": int64(8181), - "enableDefaultProject": true, - "baseURI": spec.BaseURI, - // Let Lakekeeper use the pod's ambient (EKS Pod Identity) - // credentials for S3 — needed so it can assume the - // warehouse's sts-role-arn to vend. Lakekeeper defaults this - // OFF; the operator only emits the enabling env when the - // object is present, so we set it explicitly. The flags live - // under .aws (gated by enableAWS) — a nil parent wouldn't - // pick up the CRD defaults. assumeRoleRequireExternalID is - // false: same-account self-assume needs no external id (and - // requiring one would force an external-id in the - // storage-credential we don't send). - "storageSystemCredentials": map[string]interface{}{ - "enableAWS": true, - "aws": map[string]interface{}{ - "allowDirectSystemCredentials": true, - "assumeRoleRequireExternalID": false, - }, - }, - }, - // Pin a resource request floor (requests only, no limits → - // Burstable). Without it the catalog pod runs BestEffort and is - // first evicted under node pressure; a CPU limit would CFS-throttle - // it. Lakekeeper is a light Rust REST catalog, so a modest floor is - // plenty — tune the consts if a tenant needs more. Kept in sync with - // the drift-correction patch in PatchPodShape. 
- "resources": lakekeeperResourceRequests(), - // Stamp Prometheus scrape annotations onto the operator-managed - // pods. The managed-warehouse clusters have no prometheus-operator; - // vmagent discovers targets by pod annotation (kubernetes_sd), and - // the Lakekeeper CRD exposes no other pod-metadata hook — so without - // this the per-org catalog pods are never scraped. Lakekeeper serves - // metrics on the operator's "metrics" container port (its - // getMetricsPort default = lakekeeperMetricsPort). Requires the - // spec.podMetadata passthrough from PostHog's operator fork (branch - // posthog/serviceaccountname); on an operator without it the CRD - // prunes the field and these annotations are dropped — a safe no-op - // until the new operator image ships. - "podMetadata": lakekeeperPodMetadata(), - }, - }, - } - // Bind the workload to the per-org ServiceAccount when set. Maps to the - // operator's spec.serviceAccountName (PostHog-fork field); empty leaves it - // unset so the operator falls back to the namespace default. - if spec.ServiceAccountName != "" { - cr.Object["spec"].(map[string]interface{})["serviceAccountName"] = spec.ServiceAccountName - } - // Optional: enable Kubernetes SA-token authentication. The operator - // turns this into LAKEKEEPER__K8S_AUTH_ENABLED=true + - // LAKEKEEPER__K8S_AUTH_AUDIENCES=, which makes Lakekeeper validate - // incoming bearer tokens against the K8s TokenReview API. 
- if len(spec.KubernetesAuthAudiences) > 0 { - audiences := make([]interface{}, 0, len(spec.KubernetesAuthAudiences)) - for _, a := range spec.KubernetesAuthAudiences { - audiences = append(audiences, a) - } - cr.Object["spec"].(map[string]interface{})["authentication"] = map[string]interface{}{ - "kubernetes": map[string]interface{}{ - "enabled": true, - "audiences": audiences, - }, - } - } - - resource := c.dynamic.Resource(lakekeeperGVR).Namespace(c.namespace) - _, err := resource.Create(ctx, cr, metav1.CreateOptions{}) - if err == nil { - return nil - } - if !apierrors.IsAlreadyExists(err) { - return fmt.Errorf("create lakekeeper CR %s: %w", name, err) - } - // Update path. Fetch the existing CR to carry over resourceVersion and - // status. Real K8s separates status into a subresource, so an Update - // against the main resource shouldn't touch status — but the fake - // dynamic client doesn't honor that split, and the behaviour we want - // either way is "spec drift correction never overwrites the operator's - // status." So copy the existing status into the desired CR. - existing, getErr := resource.Get(ctx, name, metav1.GetOptions{}) - if getErr != nil { - return fmt.Errorf("get lakekeeper CR %s for update: %w", name, getErr) - } - cr.SetResourceVersion(existing.GetResourceVersion()) - if status, ok := existing.Object["status"]; ok { - cr.Object["status"] = status - } - if _, err := resource.Update(ctx, cr, metav1.UpdateOptions{}); err != nil { - return fmt.Errorf("update lakekeeper CR %s: %w", name, err) - } - return nil -} - -// LakekeeperCRStatus is the subset of status fields the provisioner reads. -type LakekeeperCRStatus struct { - Bootstrapped bool - ServerID string - ReadyReplicas int32 -} - -// GetCR fetches the current status of the Lakekeeper CR. Returns -// (nil, nil) if the CR does not exist yet, matching the "not provisioned" -// state the provisioner reconciler treats as a no-op. 
-func (c *LakekeeperK8sClient) GetCR(ctx context.Context, orgID string) (*LakekeeperCRStatus, error) { - name := LakekeeperResourceName(orgID) - cr, err := c.dynamic.Resource(lakekeeperGVR).Namespace(c.namespace).Get(ctx, name, metav1.GetOptions{}) - if err != nil { - if apierrors.IsNotFound(err) { - return nil, nil - } - return nil, fmt.Errorf("get lakekeeper CR %s: %w", name, err) - } - st := &LakekeeperCRStatus{} - status, _ := cr.Object["status"].(map[string]interface{}) - if status == nil { - return st, nil - } - if v, _ := status["bootstrappedAt"].(string); v != "" { - st.Bootstrapped = true - } - if v, _ := status["serverID"].(string); v != "" { - st.ServerID = v - } - // readyReplicas may be int64 or float64 depending on the decoder path. - switch v := status["readyReplicas"].(type) { - case int64: - st.ReadyReplicas = int32(v) - case float64: - st.ReadyReplicas = int32(v) - } - return st, nil -} - -// backgroundDeletion propagates deletion to dependents (the operator-managed -// Deployment, Service, and migration Job are owned by the CR via -// ownerReferences). Without this, the dynamic client uses the API server's -// per-resource default, which can orphan those children. -var backgroundDeletion = metav1.DeletePropagationBackground - -// DeleteCR deletes the org's Lakekeeper CR. The operator-managed Deployment / -// Service / migration Job carry the CR as their controller ownerReference, so -// background-propagation deletion cascades to them. NotFound is treated as -// success (idempotent teardown). -func (c *LakekeeperK8sClient) DeleteCR(ctx context.Context, orgID string) error { - name := LakekeeperResourceName(orgID) - err := c.dynamic.Resource(lakekeeperGVR).Namespace(c.namespace).Delete(ctx, name, metav1.DeleteOptions{ - PropagationPolicy: &backgroundDeletion, - }) - if err != nil && !apierrors.IsNotFound(err) { - return fmt.Errorf("delete lakekeeper CR %s: %w", name, err) - } - return nil -} - -// DeleteSecret deletes the org's Lakekeeper Secret. 
The Secret has no -// ownerReference to the CR (EnsureSecret creates it standalone), so deleting -// the CR does not remove it — it must be torn down explicitly. NotFound is -// treated as success. -func (c *LakekeeperK8sClient) DeleteSecret(ctx context.Context, orgID string) error { - name := LakekeeperResourceName(orgID) - err := c.kubernetes.CoreV1().Secrets(c.namespace).Delete(ctx, name, metav1.DeleteOptions{}) - if err != nil && !apierrors.IsNotFound(err) { - return fmt.Errorf("delete lakekeeper secret %s: %w", name, err) - } - return nil -} - -// DeleteServiceAccount deletes the org's Lakekeeper ServiceAccount. Like the -// Secret, it is created standalone (no ownerReference) so it must be deleted -// explicitly. NotFound is treated as success. -func (c *LakekeeperK8sClient) DeleteServiceAccount(ctx context.Context, orgID string) error { - name := LakekeeperServiceAccountName(orgID) - err := c.kubernetes.CoreV1().ServiceAccounts(c.namespace).Delete(ctx, name, metav1.DeleteOptions{}) - if err != nil && !apierrors.IsNotFound(err) { - return fmt.Errorf("delete lakekeeper service account %s: %w", name, err) - } - return nil -} diff --git a/controlplane/provisioner/lakekeeper_k8s_test.go b/controlplane/provisioner/lakekeeper_k8s_test.go deleted file mode 100644 index 277deaf2..00000000 --- a/controlplane/provisioner/lakekeeper_k8s_test.go +++ /dev/null @@ -1,534 +0,0 @@ -//go:build kubernetes - -package provisioner - -import ( - "context" - "strings" - "testing" - - corev1 "k8s.io/api/core/v1" - apierrors "k8s.io/apimachinery/pkg/api/errors" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" - "k8s.io/apimachinery/pkg/runtime" - "k8s.io/apimachinery/pkg/runtime/schema" - dynamicfake "k8s.io/client-go/dynamic/fake" - kubefake "k8s.io/client-go/kubernetes/fake" -) - -func newFakeLakekeeperClient() (*LakekeeperK8sClient, *dynamicfake.FakeDynamicClient, *kubefake.Clientset) { - scheme := runtime.NewScheme() - 
scheme.AddKnownTypeWithName(schema.GroupVersionKind{ - Group: "lakekeeper.k8s.lakekeeper.io", Version: "v1alpha1", Kind: "Lakekeeper", - }, &unstructured.Unstructured{}) - scheme.AddKnownTypeWithName(schema.GroupVersionKind{ - Group: "lakekeeper.k8s.lakekeeper.io", Version: "v1alpha1", Kind: "LakekeeperList", - }, &unstructured.UnstructuredList{}) - dc := dynamicfake.NewSimpleDynamicClient(scheme) - kc := kubefake.NewClientset() - c := NewLakekeeperK8sClientWithClients(dc, kc, "lakekeeper") - return c, dc, kc -} - -func TestLakekeeperResourceName(t *testing.T) { - // k8s resource names preserve hyphens (only lowercasing is applied). - cases := map[string]string{ - "acme": "lakekeeper-acme", - "019e417b-18c4-7a41": "lakekeeper-019e417b-18c4-7a41", - "00000000-0000-0000-0000-000000000000": "lakekeeper-00000000-0000-0000-0000-000000000000", - } - for in, want := range cases { - if got := LakekeeperResourceName(in); got != want { - t.Errorf("LakekeeperResourceName(%q) = %q, want %q", in, got, want) - } - } -} - -func TestEnsureSecret_CreateThenUpdate(t *testing.T) { - c, _, kc := newFakeLakekeeperClient() - ctx := context.Background() - orgID := "acme" - - data := LakekeeperSecretData{ - DBUser: "lakekeeper_acme", - DBPassword: "pw-1", - EncryptionKey: "key-1", - OAuth2ClientSecret: "oauth-1", - } - if err := c.EnsureSecret(ctx, orgID, data); err != nil { - t.Fatalf("EnsureSecret create: %v", err) - } - got, err := kc.CoreV1().Secrets("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}) - if err != nil { - t.Fatalf("get secret: %v", err) - } - assertSecretData(t, got, data) - if lbl := got.Labels["duckgres/active-org"]; lbl != "acme" { - t.Errorf("active-org label = %q, want acme", lbl) - } - - // Idempotent update: change one field, verify it lands. 
- data.DBPassword = "pw-2" - if err := c.EnsureSecret(ctx, orgID, data); err != nil { - t.Fatalf("EnsureSecret update: %v", err) - } - got, err = kc.CoreV1().Secrets("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}) - if err != nil { - t.Fatalf("get secret after update: %v", err) - } - assertSecretData(t, got, data) -} - -func assertSecretData(t *testing.T, s *corev1.Secret, want LakekeeperSecretData) { - t.Helper() - get := func(k string) string { return string(s.Data[k]) } - if get(SecretKeyDBUser) != want.DBUser { - t.Errorf("db-user = %q, want %q", get(SecretKeyDBUser), want.DBUser) - } - if get(SecretKeyDBPassword) != want.DBPassword { - t.Errorf("db-password = %q, want %q", get(SecretKeyDBPassword), want.DBPassword) - } - if get(SecretKeyEncryptionKey) != want.EncryptionKey { - t.Errorf("encryption-key = %q, want %q", get(SecretKeyEncryptionKey), want.EncryptionKey) - } - if get(SecretKeyOAuth2ClientSecret) != want.OAuth2ClientSecret { - t.Errorf("oauth2-client-secret = %q, want %q", get(SecretKeyOAuth2ClientSecret), want.OAuth2ClientSecret) - } -} - -func TestEnsureServiceAccount_CreateAndIdempotent(t *testing.T) { - c, _, kc := newFakeLakekeeperClient() - ctx := context.Background() - - if err := c.EnsureServiceAccount(ctx, "acme"); err != nil { - t.Fatalf("EnsureServiceAccount: %v", err) - } - sa, err := kc.CoreV1().ServiceAccounts("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}) - if err != nil { - t.Fatalf("get SA: %v", err) - } - if sa.Labels["duckgres/active-org"] != "acme" { - t.Errorf("active-org label = %q, want acme", sa.Labels["duckgres/active-org"]) - } - // Re-run must not error (AlreadyExists is swallowed). 
- if err := c.EnsureServiceAccount(ctx, "acme"); err != nil { - t.Fatalf("EnsureServiceAccount re-run: %v", err) - } -} - -func TestEnsureServiceAccount_RejectsBadOrgID(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - if err := c.EnsureServiceAccount(context.Background(), "bad/org id"); err == nil { - t.Fatal("expected error for invalid org ID") - } -} - -func TestEnsureCR_SetsServiceAccountNameWhenProvided(t *testing.T) { - c, dc, _ := newFakeLakekeeperClient() - ctx := context.Background() - base := LakekeeperCRSpec{ - OrgID: "acme", - Image: "quay.io/lakekeeper/catalog:v0.12.2", - PGHost: "acme-pg.local", - PGDatabase: "lakekeeper_acme", - SecretName: "lakekeeper-acme", - BaseURI: "http://lakekeeper-acme.lakekeeper.svc:8181", - } - - // With SA set → rendered into spec. - withSA := base - withSA.ServiceAccountName = "lakekeeper-acme" - if err := c.EnsureCR(ctx, withSA); err != nil { - t.Fatalf("EnsureCR: %v", err) - } - got, err := dc.Resource(lakekeeperGVR).Namespace("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}) - if err != nil { - t.Fatalf("get CR: %v", err) - } - specMap := got.Object["spec"].(map[string]interface{}) - if specMap["serviceAccountName"] != "lakekeeper-acme" { - t.Errorf("spec.serviceAccountName = %v, want lakekeeper-acme", specMap["serviceAccountName"]) - } -} - -func TestEnsureCR_OmitsServiceAccountNameWhenEmpty(t *testing.T) { - c, dc, _ := newFakeLakekeeperClient() - ctx := context.Background() - spec := LakekeeperCRSpec{ - OrgID: "acme", - Image: "quay.io/lakekeeper/catalog:v0.12.2", - PGHost: "acme-pg.local", - PGDatabase: "lakekeeper_acme", - SecretName: "lakekeeper-acme", - BaseURI: "http://lakekeeper-acme.lakekeeper.svc:8181", - // ServiceAccountName intentionally empty. 
- } - if err := c.EnsureCR(ctx, spec); err != nil { - t.Fatalf("EnsureCR: %v", err) - } - got, err := dc.Resource(lakekeeperGVR).Namespace("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}) - if err != nil { - t.Fatalf("get CR: %v", err) - } - specMap := got.Object["spec"].(map[string]interface{}) - if _, present := specMap["serviceAccountName"]; present { - t.Errorf("spec.serviceAccountName should be omitted when empty, got %v", specMap["serviceAccountName"]) - } -} - -func TestEnsureCR_ValidatesRequiredFields(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - err := c.EnsureCR(context.Background(), LakekeeperCRSpec{OrgID: "acme"}) - if err == nil { - t.Fatal("expected error for missing required fields") - } -} - -func TestEnsureCR_CreateAndShape(t *testing.T) { - c, dc, _ := newFakeLakekeeperClient() - ctx := context.Background() - spec := LakekeeperCRSpec{ - OrgID: "acme", - Image: "quay.io/lakekeeper/catalog:v0.12.2", - PGHost: "acme-pg.local", - PGDatabase: "lakekeeper_acme", - SecretName: "lakekeeper-acme", - BaseURI: "http://lakekeeper-acme.lakekeeper.svc:8181", - } - if err := c.EnsureCR(ctx, spec); err != nil { - t.Fatalf("EnsureCR: %v", err) - } - - got, err := dc.Resource(lakekeeperGVR).Namespace("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}) - if err != nil { - t.Fatalf("get CR: %v", err) - } - // Drill into the structure - specMap := got.Object["spec"].(map[string]interface{}) - if specMap["image"] != spec.Image { - t.Errorf("image = %v, want %s", specMap["image"], spec.Image) - } - authz := specMap["authorization"].(map[string]interface{}) - if authz["backend"] != "allowall" { - t.Errorf("authz.backend = %v, want allowall", authz["backend"]) - } - boot := specMap["bootstrap"].(map[string]interface{}) - if boot["enabled"] != true { - t.Errorf("bootstrap.enabled = %v, want true", boot["enabled"]) - } - server := specMap["server"].(map[string]interface{}) - if server["baseURI"] != spec.BaseURI { - 
t.Errorf("server.baseURI = %v, want %s", server["baseURI"], spec.BaseURI) - } - pg := specMap["database"].(map[string]interface{})["postgres"].(map[string]interface{}) - if pg["host"] != spec.PGHost || pg["database"] != spec.PGDatabase { - t.Errorf("pg host/db = %v/%v, want %s/%s", pg["host"], pg["database"], spec.PGHost, spec.PGDatabase) - } - // Resources are requests-only (no limits → Burstable). - res := specMap["resources"].(map[string]interface{}) - reqs := res["requests"].(map[string]interface{}) - if reqs["cpu"] != lakekeeperPodCPU || reqs["memory"] != lakekeeperPodMemory { - t.Errorf("requests cpu/mem = %v/%v, want %s/%s", reqs["cpu"], reqs["memory"], lakekeeperPodCPU, lakekeeperPodMemory) - } - if _, hasLimits := res["limits"]; hasLimits { - t.Errorf("resources.limits set, want none (requests-only): %v", res["limits"]) - } - // Two replicas. - if specMap["replicas"] != int64(lakekeeperPodReplicas) { - t.Errorf("replicas = %v, want %d", specMap["replicas"], lakekeeperPodReplicas) - } - // Prometheus scrape annotations are stamped onto the pod via podMetadata. - ann := specMap["podMetadata"].(map[string]interface{})["annotations"].(map[string]interface{}) - if ann["prometheus.io/scrape"] != "true" { - t.Errorf("prometheus.io/scrape = %v, want true", ann["prometheus.io/scrape"]) - } - if ann["prometheus.io/port"] != lakekeeperMetricsPort { - t.Errorf("prometheus.io/port = %v, want %s", ann["prometheus.io/port"], lakekeeperMetricsPort) - } - if ann["prometheus.io/path"] != "/metrics" { - t.Errorf("prometheus.io/path = %v, want /metrics", ann["prometheus.io/path"]) - } -} - -func TestPatchPodShape_LabelMatchedStripsLimits(t *testing.T) { - c, dc, _ := newFakeLakekeeperClient() - ctx := context.Background() - // Two CRs for one org under different names (legacy de-hyphenated + new - // hyphenated), both label-tagged; the first carries a stale limits block. 
- names := []string{"lakekeeper-acme", "lakekeeper-a-c-m-e"} - for i, name := range names { - spec := map[string]interface{}{"replicas": int64(1)} - if i == 0 { - spec["resources"] = map[string]interface{}{ - "limits": map[string]interface{}{"cpu": "250m", "memory": "256Mi"}, - "requests": map[string]interface{}{"cpu": "250m", "memory": "256Mi"}, - } - } - cr := &unstructured.Unstructured{Object: map[string]interface{}{ - "apiVersion": "lakekeeper.k8s.lakekeeper.io/v1alpha1", - "kind": "Lakekeeper", - "metadata": map[string]interface{}{ - "name": name, - "namespace": "lakekeeper", - "labels": map[string]interface{}{"duckgres/active-org": "acme"}, - }, - "spec": spec, - }} - if _, err := dc.Resource(lakekeeperGVR).Namespace("lakekeeper").Create(ctx, cr, metav1.CreateOptions{}); err != nil { - t.Fatalf("seed %s: %v", name, err) - } - } - - if err := c.PatchPodShape(ctx, "acme"); err != nil { - t.Fatalf("PatchPodShape: %v", err) - } - - for _, name := range names { - got, err := dc.Resource(lakekeeperGVR).Namespace("lakekeeper").Get(ctx, name, metav1.GetOptions{}) - if err != nil { - t.Fatalf("get %s: %v", name, err) - } - spec := got.Object["spec"].(map[string]interface{}) - if rv := spec["replicas"]; rv != int64(lakekeeperPodReplicas) && rv != float64(lakekeeperPodReplicas) { - t.Errorf("%s replicas = %v, want %d", name, rv, lakekeeperPodReplicas) - } - res := spec["resources"].(map[string]interface{}) - if _, ok := res["limits"]; ok { - t.Errorf("%s resources.limits still present after patch: %v", name, res["limits"]) - } - if res["requests"].(map[string]interface{})["cpu"] != lakekeeperPodCPU { - t.Errorf("%s requests.cpu = %v, want %s", name, res["requests"], lakekeeperPodCPU) - } - ann := spec["podMetadata"].(map[string]interface{})["annotations"].(map[string]interface{}) - if ann["prometheus.io/scrape"] != "true" { - t.Errorf("%s scrape annotation not applied: %v", name, ann) - } - } -} - -func TestEnsureCR_KubernetesAuthOff_OmitsAuthenticationBlock(t 
*testing.T) { - c, dc, _ := newFakeLakekeeperClient() - ctx := context.Background() - spec := LakekeeperCRSpec{ - OrgID: "acme", Image: "img:v1", PGHost: "h", PGDatabase: "lakekeeper_acme", - SecretName: "lakekeeper-acme", BaseURI: "http://x", - // KubernetesAuthAudiences left empty - } - if err := c.EnsureCR(ctx, spec); err != nil { - t.Fatalf("EnsureCR: %v", err) - } - got, _ := dc.Resource(lakekeeperGVR).Namespace("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}) - specMap := got.Object["spec"].(map[string]interface{}) - if _, present := specMap["authentication"]; present { - t.Errorf("authentication block should be absent when KubernetesAuthAudiences is empty") - } -} - -func TestEnsureCR_KubernetesAuthOn_EmitsAuthenticationBlock(t *testing.T) { - c, dc, _ := newFakeLakekeeperClient() - ctx := context.Background() - spec := LakekeeperCRSpec{ - OrgID: "acme", Image: "img:v1", PGHost: "h", PGDatabase: "lakekeeper_acme", - SecretName: "lakekeeper-acme", BaseURI: "http://x", - KubernetesAuthAudiences: []string{"lakekeeper", "lakekeeper-prod"}, - } - if err := c.EnsureCR(ctx, spec); err != nil { - t.Fatalf("EnsureCR: %v", err) - } - got, _ := dc.Resource(lakekeeperGVR).Namespace("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}) - specMap := got.Object["spec"].(map[string]interface{}) - auth, ok := specMap["authentication"].(map[string]interface{}) - if !ok { - t.Fatalf("authentication block missing or wrong type: %T", specMap["authentication"]) - } - k8sAuth, ok := auth["kubernetes"].(map[string]interface{}) - if !ok { - t.Fatalf("authentication.kubernetes missing") - } - if k8sAuth["enabled"] != true { - t.Errorf("authentication.kubernetes.enabled = %v, want true", k8sAuth["enabled"]) - } - auds, ok := k8sAuth["audiences"].([]interface{}) - if !ok || len(auds) != 2 { - t.Fatalf("audiences = %v, want [lakekeeper lakekeeper-prod]", k8sAuth["audiences"]) - } - if auds[0] != "lakekeeper" || auds[1] != "lakekeeper-prod" { - t.Errorf("audiences 
contents = %v", auds) - } -} - -func TestEnsureCR_IdempotentUpdate(t *testing.T) { - c, dc, _ := newFakeLakekeeperClient() - ctx := context.Background() - spec := LakekeeperCRSpec{ - OrgID: "acme", Image: "img:v1", PGHost: "h", PGDatabase: "lakekeeper_acme", - SecretName: "lakekeeper-acme", BaseURI: "http://x", - } - if err := c.EnsureCR(ctx, spec); err != nil { - t.Fatalf("first EnsureCR: %v", err) - } - spec.Image = "img:v2" - if err := c.EnsureCR(ctx, spec); err != nil { - t.Fatalf("second EnsureCR (update): %v", err) - } - // Read back the CR and confirm the update actually landed. - got, err := dc.Resource(lakekeeperGVR).Namespace("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}) - if err != nil { - t.Fatalf("get CR after update: %v", err) - } - if image := got.Object["spec"].(map[string]interface{})["image"]; image != "img:v2" { - t.Errorf("image after update = %v, want img:v2", image) - } -} - -func TestEnsureCR_RejectsEmptyOrgID(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - err := c.EnsureCR(context.Background(), LakekeeperCRSpec{ - Image: "img:v1", PGHost: "h", PGDatabase: "lakekeeper_x", - SecretName: "lakekeeper-x", BaseURI: "http://x", - }) - if err == nil || !strings.Contains(err.Error(), "OrgID is required") { - t.Fatalf("expected OrgID-required error, got: %v", err) - } -} - -func TestEnsureCR_RejectsUnsafeOrgID(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - for _, bad := range []string{"With Space", "trailing-", "-leading", "UPPER ok but space bad"} { - err := c.EnsureCR(context.Background(), LakekeeperCRSpec{ - OrgID: bad, Image: "img:v1", PGHost: "h", PGDatabase: "x", - SecretName: "lakekeeper-x", BaseURI: "http://x", - }) - if err == nil { - t.Errorf("expected error for orgID %q, got nil", bad) - } - } -} - -func TestEnsureSecret_RejectsUnsafeOrgID(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - err := c.EnsureSecret(context.Background(), "bad value", LakekeeperSecretData{ - DBUser: "u", DBPassword: "p", 
EncryptionKey: "k", OAuth2ClientSecret: "s", - }) - if err == nil || !strings.Contains(err.Error(), "not a valid K8s label value") { - t.Fatalf("expected label-value error, got: %v", err) - } -} - -func TestIsValidOrgIDLabel(t *testing.T) { - cases := map[string]bool{ - "acme": true, - "019e417b-18c4-7a41-bfec-e9ae3a02deb8": true, // UUID - "a": true, - "a.b_c-d": true, - "": false, - "-leading": false, - "trailing-": false, - ".": false, - "has space": false, - // 64 chars (over the 63 limit) - "a234567890123456789012345678901234567890123456789012345678901234": false, - } - for in, want := range cases { - if got := isValidOrgIDLabel(in); got != want { - t.Errorf("isValidOrgIDLabel(%q) = %v, want %v", in, got, want) - } - } -} - -func TestGetCR_NotFoundReturnsNilNil(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - st, err := c.GetCR(context.Background(), "ghost-org") - if err != nil || st != nil { - t.Fatalf("expected (nil, nil) for absent CR, got (%v, %v)", st, err) - } -} - -func TestGetCR_ParsesStatus(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - ctx := context.Background() - if err := c.EnsureCR(ctx, LakekeeperCRSpec{ - OrgID: "acme", Image: "img:v1", PGHost: "h", PGDatabase: "lakekeeper_acme", - SecretName: "lakekeeper-acme", BaseURI: "http://x", - }); err != nil { - t.Fatalf("seed CR: %v", err) - } - // Patch status on the fake. The fake dynamic client lets us mutate. 
- cr, _ := c.dynamic.Resource(lakekeeperGVR).Namespace("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}) - cr.Object["status"] = map[string]interface{}{ - "bootstrappedAt": "2026-05-19T12:00:00Z", - "serverID": "abc-123", - "readyReplicas": int64(2), - } - if _, err := c.dynamic.Resource(lakekeeperGVR).Namespace("lakekeeper").Update(ctx, cr, metav1.UpdateOptions{}); err != nil { - t.Fatalf("inject status: %v", err) - } - - st, err := c.GetCR(ctx, "acme") - if err != nil { - t.Fatalf("GetCR: %v", err) - } - if !st.Bootstrapped || st.ServerID != "abc-123" || st.ReadyReplicas != 2 { - t.Errorf("status parse mismatch: %+v", st) - } -} - -func TestDeleteResources_RemovesProvisionedResources(t *testing.T) { - c, dc, kc := newFakeLakekeeperClient() - ctx := context.Background() - orgID := "acme" - - // Provision the three resources DeleteForOrg is responsible for. - if err := c.EnsureSecret(ctx, orgID, LakekeeperSecretData{ - DBUser: "lakekeeper_acme", DBPassword: "pw", EncryptionKey: "key", OAuth2ClientSecret: "oauth", - }); err != nil { - t.Fatalf("EnsureSecret: %v", err) - } - if err := c.EnsureServiceAccount(ctx, orgID); err != nil { - t.Fatalf("EnsureServiceAccount: %v", err) - } - if err := c.EnsureCR(ctx, LakekeeperCRSpec{ - OrgID: orgID, Image: "quay.io/lakekeeper/catalog:v0.12.2", - PGHost: "acme-pg.local", PGDatabase: "lakekeeper_acme", SecretName: "lakekeeper-acme", - BaseURI: "http://lakekeeper-acme.lakekeeper.svc:8181", - }); err != nil { - t.Fatalf("EnsureCR: %v", err) - } - - if err := c.DeleteCR(ctx, orgID); err != nil { - t.Fatalf("DeleteCR: %v", err) - } - if err := c.DeleteSecret(ctx, orgID); err != nil { - t.Fatalf("DeleteSecret: %v", err) - } - if err := c.DeleteServiceAccount(ctx, orgID); err != nil { - t.Fatalf("DeleteServiceAccount: %v", err) - } - - if _, err := dc.Resource(lakekeeperGVR).Namespace("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}); !apierrors.IsNotFound(err) { - t.Errorf("CR still present after delete: 
err=%v", err) - } - if _, err := kc.CoreV1().Secrets("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}); !apierrors.IsNotFound(err) { - t.Errorf("Secret still present after delete: err=%v", err) - } - if _, err := kc.CoreV1().ServiceAccounts("lakekeeper").Get(ctx, "lakekeeper-acme", metav1.GetOptions{}); !apierrors.IsNotFound(err) { - t.Errorf("ServiceAccount still present after delete: err=%v", err) - } -} - -func TestDeleteResources_NotFoundIsNoOp(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - ctx := context.Background() - // Deleting resources that were never created must be a clean no-op so the - // teardown path is safe for ducklings that never enabled Iceberg. - if err := c.DeleteCR(ctx, "never-existed"); err != nil { - t.Errorf("DeleteCR on missing CR: %v", err) - } - if err := c.DeleteSecret(ctx, "never-existed"); err != nil { - t.Errorf("DeleteSecret on missing secret: %v", err) - } - if err := c.DeleteServiceAccount(ctx, "never-existed"); err != nil { - t.Errorf("DeleteServiceAccount on missing SA: %v", err) - } -} diff --git a/controlplane/provisioner/lakekeeper_provisioner.go b/controlplane/provisioner/lakekeeper_provisioner.go deleted file mode 100644 index 1ff6bf19..00000000 --- a/controlplane/provisioner/lakekeeper_provisioner.go +++ /dev/null @@ -1,553 +0,0 @@ -//go:build kubernetes - -package provisioner - -import ( - "context" - "crypto/rand" - "encoding/hex" - "errors" - "fmt" - "log/slog" - - "github.com/posthog/duckgres/controlplane/configstore" - "github.com/posthog/duckgres/server/lakekeeperbroker" - corev1 "k8s.io/api/core/v1" - apierrors "k8s.io/apimachinery/pkg/api/errors" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" -) - -// LakekeeperProvisioner advances an org's Lakekeeper through a fixed pipeline: -// -// create lakekeeper_ DB → ensure K8s Secret with creds → -// ensure Lakekeeper CR → wait for operator bootstrap → -// ensure Iceberg warehouse via REST → persist endpoint+client_id back to -// the warehouse row. 
-// -// Every step is idempotent so EnsureForOrg can be called repeatedly. The -// pure dependencies (admin DSN, PG host, S3 config) are passed in by the -// caller so the provisioner doesn't bake in secret-lookup logic — PR2 wiring -// will compute them from the Duckling CR status + K8s Secrets. -type LakekeeperProvisioner struct { - store WarehouseStore - k8s *LakekeeperK8sClient - image string - clientFor ClientFactory -} - -// ClientFactory builds a LakekeeperClient for a base URL. Lets tests inject -// httptest.Server URLs without rebuilding the provisioner. -type ClientFactory func(baseURL string) *LakekeeperClient - -// ProvisioningInputs are everything the provisioner needs about the org's -// environment that doesn't live in the warehouse row yet. -type ProvisioningInputs struct { - // AdminDSN is a pgx-compatible DSN with permission to CREATE DATABASE - // on the org's metadata Postgres. Caller resolves this from a K8s Secret - // managed by Crossplane. - AdminDSN string - - // PG host/port the Lakekeeper pod uses to reach the same cluster. Often - // the same hostname as AdminDSN but routed differently (e.g. through - // PgBouncer). Empty Port defaults to 5432. - PGHost string - PGPort int32 - - // PGPreProvisioned is set when the org's Lakekeeper database and role - // already exist, created out-of-band — specifically by provider-sql on a - // CloudNativePG shard (the cnpg-shard metadata-store type). In that mode - // the provisioner does NOT run CREATE DATABASE / CREATE ROLE (it has no - // privileged AdminDSN — the connection is the per-tenant lakekeeper_ - // role itself), and instead takes the role credentials below verbatim: - // PGUser/PGPassword go into the Lakekeeper pod's Secret, PGDatabase is the - // database it connects to. AdminDSN is not required when this is true. - PGPreProvisioned bool - PGUser string - PGPassword string - PGDatabase string - - // PGSSLMode the Lakekeeper pod sets when connecting. Default "require". 
- // Set to "disable" for local/dev environments where PG has no TLS. - PGSSLMode string - - // S3 storage profile for the warehouse Lakekeeper hands out. For prod - // AWS, leave Endpoint empty and Flavor "aws". For MinIO / s3-compat, - // set Endpoint and Flavor "s3-compat". - S3 S3StorageConfig - - // KubernetesAuthAudiences enables the OIDC SA-token auth path on the - // Lakekeeper CR. The duckling's projected SA token must carry one of - // these audiences. Empty disables — Lakekeeper runs in allowall + - // NetworkPolicy mode (the PR1+PR2 deployment shape). - // - // When set, the provisioner also writes a non-empty - // LakekeeperOAuth2ServerURI to the warehouse row so the worker's - // in-process broker becomes DuckDB's OAuth2 server. - // - // **Flag-day deploy ordering.** Once any org gets a non-empty value - // here, its activation payload carries LakekeeperOAuth2ServerURI= - // http://127.0.0.1:9876/token, which the worker's iceberg extension - // tries to POST to. If the duckling pod spec hasn't yet been updated - // to (a) mount the projected SA token at DUCKGRES_LAKEKEEPER_TOKEN_PATH - // and (b) expose the broker on 9876, the POST hits a closed port and - // the ATTACH fails. There is no path that clears the URI once written - // (re-running with empty audiences would leave the old value behind). - // Deploy order MUST be: - // 1. Roll out the duckling pod spec change with the projected SA - // volume + env var. - // 2. Apply the operator chart change that flips the CR's - // authentication.kubernetes.enabled (PR4 already plumbs this). - // 3. Only then flip KubernetesAuthAudiences in the inputs resolver. - KubernetesAuthAudiences []string -} - -// S3StorageConfig captures the bucket + credentials Lakekeeper uses. -type S3StorageConfig struct { - Bucket string - KeyPrefix string - Endpoint string - Region string - // Flavor is "aws" or "s3-compat". - Flavor string - - // Static creds — present for MinIO/dev. 
For prod AWS, leave empty and - // Lakekeeper uses its pod IRSA identity (allow-direct-system-credentials - // in the operator config). - StaticAccessKeyID string - StaticAccessKeySecret string - - // RoleARN is the IAM role Lakekeeper assumes to vend scoped S3 credentials - // to clients (sts-role-arn). Required for the AWS flavor when STS is on; - // the per-org duckling role (Lakekeeper's own Pod Identity, self-assumed). - // Empty for s3-compat (MinIO), where STS vending doesn't need a role. - RoleARN string -} - -// LakekeeperProvisionerOption tunes the provisioner. -type LakekeeperProvisionerOption func(*LakekeeperProvisioner) - -// WithImage sets the Lakekeeper container image. Defaults to a pinned tag. -func WithImage(img string) LakekeeperProvisionerOption { - return func(p *LakekeeperProvisioner) { p.image = img } -} - -// WithClientFactory overrides how LakekeeperClient instances are built. -// Tests use this to point at httptest.Server. -func WithClientFactory(f ClientFactory) LakekeeperProvisionerOption { - return func(p *LakekeeperProvisioner) { p.clientFor = f } -} - -// DefaultLakekeeperImage is the pinned image we deploy by default. -// Bumps to this constant should be paired with a Lakekeeper-operator -// version compatibility check. The published tags carry a "v" prefix -// (quay.io/lakekeeper/catalog:v0.12.2) — without it the pull 404s. -// -// MUST be v0.12.x or later: the operator (lakekeeper_controller.go) emits the -// v0.12.x config scheme (LAKEKEEPER__STORAGE_CREDENTIAL_BACKEND__AWS__*). -// v0.11.x ignores those env vars, so e.g. system-identity credential vending -// silently stays disabled ("System identity credentials are disabled"). -const DefaultLakekeeperImage = "quay.io/lakekeeper/catalog:v0.12.2" - -// NewLakekeeperProvisioner builds a provisioner. The WarehouseStore is the -// same interface the existing controller uses for persistence. 
-func NewLakekeeperProvisioner(store WarehouseStore, k8s *LakekeeperK8sClient, opts ...LakekeeperProvisionerOption) *LakekeeperProvisioner { - p := &LakekeeperProvisioner{ - store: store, - k8s: k8s, - image: DefaultLakekeeperImage, - clientFor: NewLakekeeperClient, - } - for _, o := range opts { - o(p) - } - return p -} - -// ErrBootstrapPending signals "the operator hasn't finished initial bootstrap -// of the Lakekeeper CR yet". Callers should requeue rather than treat this -// as a failure. -var ErrBootstrapPending = errors.New("lakekeeper bootstrap pending") - -// EnsureForOrg drives the pipeline. Idempotent: re-running on an -// already-provisioned org reads existing state and only emits writes for -// genuine drift. -func (p *LakekeeperProvisioner) EnsureForOrg(ctx context.Context, w *configstore.ManagedWarehouse, in ProvisioningInputs) error { - if w == nil { - return errors.New("EnsureForOrg: warehouse is nil") - } - if !isValidOrgIDLabel(w.OrgID) { - return fmt.Errorf("EnsureForOrg: orgID %q is not a valid K8s label value", w.OrgID) - } - if err := validateInputs(in); err != nil { - return err - } - - dbName := lakekeeperDBName(w.OrgID) - secretName := LakekeeperResourceName(w.OrgID) - resourceName := LakekeeperResourceName(w.OrgID) - // In-cluster Service DNS — operator names the Service after the CR. - baseURL := fmt.Sprintf("http://%s.%s.svc:8181", resourceName, p.k8s.namespace) - - // In pre-provisioned mode (cnpg-shard) the database + role were already - // created by provider-sql on the shard, and the connection we have is the - // non-privileged per-tenant role — so skip the CREATE DATABASE / CREATE - // ROLE DDL and take the role's name/password verbatim from the inputs. - if in.PGPreProvisioned { - dbName = in.PGDatabase - } else { - // 1. CREATE DATABASE lakekeeper_ if absent. - if err := EnsureDatabase(ctx, in.AdminDSN, dbName); err != nil { - return fmt.Errorf("ensure lakekeeper db: %w", err) - } - } - - // 2. 
Resolve or generate the per-org credentials, and ensure the K8s - // Secret holds them. On re-runs we read the existing Secret rather than - // rotating its values — Lakekeeper's DB password would otherwise drift - // from the Postgres user's actual password. In pre-provisioned mode the - // DB user/password come from the inputs (the provider-sql role); the - // Lakekeeper-internal EncryptionKey/OAuth2ClientSecret are still generated. - creds, err := p.resolveOrGenerateSecret(ctx, w.OrgID, dbName, secretName, in.PGUser, in.PGPassword) - if err != nil { - return fmt.Errorf("ensure lakekeeper secret: %w", err) - } - - // 2b. Ensure the Postgres role exists with the password matching the - // Secret. A freshly-created database has no users by default; without - // this, the Lakekeeper pod's CreateAdminUser-style startup would fail - // with "role does not exist". EnsureRole is idempotent and rotates the - // password to match the Secret on re-runs. Skipped when pre-provisioned — - // provider-sql owns the role and our connection can't CREATE ROLE. - if !in.PGPreProvisioned { - if err := EnsureRole(ctx, in.AdminDSN, creds.DBUser, creds.DBPassword, dbName); err != nil { - return fmt.Errorf("ensure lakekeeper pg role: %w", err) - } - } - - // 2c. Ensure the per-org ServiceAccount the Lakekeeper pod runs under. - // Must exist before the CR so the operator's Deployment + migration Job - // can mount it. Each org gets its own SA (in the shared namespace) so it - // can carry a distinct EKS Pod Identity scoped to that org's bucket. - if err := p.k8s.EnsureServiceAccount(ctx, w.OrgID); err != nil { - return fmt.Errorf("ensure lakekeeper service account: %w", err) - } - - // 3. Apply the Lakekeeper CR pointing at the org's PG + the Secret. - if err := p.k8s.EnsureCR(ctx, p.buildCRSpec(w, in)); err != nil { - return fmt.Errorf("ensure lakekeeper cr: %w", err) - } - - // 4. Check whether the operator has marked bootstrap complete. 
If not, - // return ErrBootstrapPending so the outer reconcile loop can requeue - // without blocking other orgs. - if err := p.checkBootstrap(ctx, w.OrgID); err != nil { - return err - } - - // 5. Idempotently create the org's warehouse via Lakekeeper's REST API. - // LakekeeperClient calls /management/v1/* which are at the server root, - // NOT under /catalog. The /catalog prefix is only for the Iceberg REST - // API that ducklings hit later — that's the value we persist in - // LakekeeperEndpoint below. - lkClient := p.clientFor(baseURL) - whReq := CreateWarehouseRequest{ - WarehouseName: lakekeeperWarehouseName(w.OrgID), - StorageProfile: WarehouseStorageProfile{ - Type: "s3", - Bucket: in.S3.Bucket, - KeyPrefix: in.S3.KeyPrefix, - Endpoint: in.S3.Endpoint, - Region: in.S3.Region, - PathStyleAccess: in.S3.Flavor == "s3-compat", - Flavor: in.S3.Flavor, - // STS credential vending is OFF: Lakekeeper would assume a role - // and hand DuckDB short-lived creds, but its downscoping session - // policy overflows AWS's packed-policy limit (PackedPolicyTooLarge), - // and it's unnecessary — the duckling worker already holds STS creds - // for this bucket (brokered by the control plane for DuckLake, same - // per-org role). The worker attaches the catalog with its own S3 - // secret and Lakekeeper serves metadata only (using its pod identity - // for catalog-metadata IO via allowDirectSystemCredentials). - STSEnabled: false, - RemoteSigningEnabled: false, - }, - StorageCredential: storageCredFor(in.S3), - } - wh, err := lkClient.EnsureWarehouse(ctx, whReq) - if err != nil { - return fmt.Errorf("ensure lakekeeper warehouse: %w", err) - } - - // 6. Persist endpoint+credentials back into the warehouse row. Iceberg - // state flips to Ready as part of the same write. - // - // OAUTH2_SERVER_URI is populated only when KubernetesAuthAudiences was - // passed (PR4 OIDC mode). 
Without it, Lakekeeper runs in allowall + - // NetworkPolicy mode and the worker emits an ATTACH with - // AUTHORIZATION_TYPE 'none' — the empty URI value signals that path - // downstream via server/iceberg.BuildLakekeeperAttachStmt. - oauth2URI := "" - if len(in.KubernetesAuthAudiences) > 0 { - oauth2URI = lakekeeperbroker.DefaultOAuth2ServerURI - } - // NOTE: these are raw DB column names — GORM's Updates(map) uses keys - // verbatim, bypassing struct field→column mapping. They MUST match the - // columns AutoMigrate generated from ManagedWarehouseIceberg. GORM - // snake-cases the field LakekeeperOAuth2ServerURI to "o_auth2_server_uri" - // (it splits the OAuth2 acronym), so the column is - // iceberg_lakekeeper_o_auth2_server_uri — NOT ...oauth2.... A mismatch - // here fails the persist with "column does not exist" (SQLSTATE 42703). - updates := map[string]interface{}{ - "iceberg_enabled": true, - "iceberg_backend": configstore.IcebergBackendLakekeeper, - "iceberg_lakekeeper_endpoint": baseURL + "/catalog", - "iceberg_lakekeeper_warehouse": wh.Name, - "iceberg_lakekeeper_client_id": oauthClientID(w.OrgID), - "iceberg_lakekeeper_o_auth2_server_uri": oauth2URI, - "iceberg_lakekeeper_client_credentials_namespace": p.k8s.namespace, - "iceberg_lakekeeper_client_credentials_name": secretName, - "iceberg_lakekeeper_client_credentials_key": SecretKeyOAuth2ClientSecret, - // Persist the S3 region so the worker's iceberg S3 secret (built from - // the duckling's brokered creds) targets the right region for data IO. - "iceberg_region": in.S3.Region, - "iceberg_state": configstore.ManagedWarehouseStateReady, - } - _ = creds // creds are written into the Secret; the row only references them - if err := p.store.UpdateIcebergConfig(w.OrgID, updates); err != nil { - return fmt.Errorf("persist lakekeeper config: %w", err) - } - return nil -} - -// buildCRSpec assembles the desired Lakekeeper CR spec for an org, used by -// EnsureForOrg on initial provisioning. 
(Drift correction for already- -// provisioned orgs goes through PatchPodShape, which patches the live CR by -// label rather than rebuilding the whole spec under a recomputed name.) -func (p *LakekeeperProvisioner) buildCRSpec(w *configstore.ManagedWarehouse, in ProvisioningInputs) LakekeeperCRSpec { - dbName := lakekeeperDBName(w.OrgID) - if in.PGPreProvisioned { - dbName = in.PGDatabase - } - pgPort := in.PGPort - if pgPort == 0 { - pgPort = 5432 - } - resourceName := LakekeeperResourceName(w.OrgID) - return LakekeeperCRSpec{ - OrgID: w.OrgID, - Image: p.image, - Replicas: lakekeeperPodReplicas, - PGHost: in.PGHost, - PGPort: pgPort, - PGDatabase: dbName, - SecretName: resourceName, - BaseURI: fmt.Sprintf("http://%s.%s.svc:8181", resourceName, p.k8s.namespace), - PGSSLMode: in.PGSSLMode, - ServiceAccountName: LakekeeperServiceAccountName(w.OrgID), - KubernetesAuthAudiences: in.KubernetesAuthAudiences, - } -} - -// PatchPodShape is the drift-correction path for already-provisioned orgs: it -// converges the pod-shape fields (replicas + resource requests + scrape -// annotations) onto the org's existing Lakekeeper CR(s) without re-running the -// database / Secret / REST-warehouse pipeline and without resolving inputs. -// -// It matches CRs by the duckgres/active-org label rather than recomputing the -// name, so it patches whatever CR actually exists — important because a legacy -// org's CR keeps the de-hyphenated name (derived from the no-hyphen Duckling XR) -// while LakekeeperResourceName now preserves hyphens. See -// LakekeeperK8sClient.PatchPodShape for the merge-patch (conflict-free) details. 
-func (p *LakekeeperProvisioner) PatchPodShape(ctx context.Context, orgID string) error { - return p.k8s.PatchPodShape(ctx, orgID) -} - -// DeleteForOrg tears down the per-org Lakekeeper instance that EnsureForOrg -// created: the CR (which cascades to the operator-managed Deployment, Service, -// and migration Job via ownerReferences) plus the standalone Secret and -// ServiceAccount (which don't carry an ownerReference and so must be deleted -// explicitly). Idempotent and NotFound-tolerant, so it's a safe no-op for orgs -// that never had Iceberg enabled. -// -// When inputs carry an AdminDSN (i.e. !PGPreProvisioned), additionally drops -// the lakekeeper_ Postgres database and role on the metadata store. -// This is what makes recreating a duckling with the same orgID work: the -// duckgres lakekeeper provisioner rotates the k8s Secret's encryption-key on -// every provision, and leaving the old encrypted rows behind causes -// Lakekeeper to return SecretFetchError ("Wrong key or corrupt data") on -// every CREATE TABLE in the next lifetime. For the cnpg-shard case -// (PGPreProvisioned) the corresponding cleanup happens via the Crossplane -// composition's [Delete] managementPolicy on the cnpg-tenant-role and -// cnpg-tenant-database resources — see posthog/charts PR for the parallel -// fix. -// -// Best-effort: PG drop failures are logged and swallowed so a transient -// network issue or a half-deleted metadata store doesn't permanently block -// the duckling teardown. The k8s teardown failures (which actually leave -// resources stranded) still surface as errors to the controller's retry -// loop. 
-func (p *LakekeeperProvisioner) DeleteForOrg(ctx context.Context, orgID string, in ProvisioningInputs) error { - if err := p.k8s.DeleteCR(ctx, orgID); err != nil { - return err - } - if err := p.k8s.DeleteSecret(ctx, orgID); err != nil { - return err - } - if err := p.k8s.DeleteServiceAccount(ctx, orgID); err != nil { - return err - } - - // PG cleanup applies only when this provisioner actually created the - // DB/role (external + dev/orbstack paths). cnpg-shard ownership lives - // in the Crossplane composition. - if in.PGPreProvisioned || in.AdminDSN == "" { - return nil - } - dbName := lakekeeperDBName(orgID) - // DROP DATABASE first so DROP ROLE doesn't trip over the role's - // ownership of the database. Both best-effort. - if err := DropDatabase(ctx, in.AdminDSN, dbName); err != nil { - slog.Warn("Lakekeeper PG database drop failed; continuing teardown.", - "org", orgID, "database", dbName, "error", err) - } - if err := DropRole(ctx, in.AdminDSN, dbName); err != nil { - slog.Warn("Lakekeeper PG role drop failed; continuing teardown.", - "org", orgID, "role", dbName, "error", err) - } - return nil -} - -// checkBootstrap reads the Lakekeeper CR status once. Returns nil when -// bootstrappedAt is non-empty, ErrBootstrapPending otherwise. The caller — -// typically the warehouse-reconcile loop — is responsible for requeueing on -// ErrBootstrapPending. We intentionally don't poll here: the outer loop -// iterates over many orgs per tick, and a per-org sleep would stall every -// other org behind a slow bootstrap. The operator's bootstrap typically -// completes in <10s once the Deployment is ready, well within a normal -// reconcile cadence. 
-func (p *LakekeeperProvisioner) checkBootstrap(ctx context.Context, orgID string) error { - st, err := p.k8s.GetCR(ctx, orgID) - if err != nil { - return fmt.Errorf("get lakekeeper cr status: %w", err) - } - if st == nil || !st.Bootstrapped { - return ErrBootstrapPending - } - return nil -} - -// resolveOrGenerateSecret reads the existing per-org Secret if present, or -// generates fresh credentials and writes a new Secret. Generating ONLY on -// first run keeps the Postgres user's password stable across re-runs — -// rotating the Secret would silently de-sync it from the actual PG password. -// -// The DB user/password are generated unless pgUser/pgPassword are supplied -// (pre-provisioned cnpg-shard mode), in which case they're taken verbatim so -// the Lakekeeper pod authenticates as the provider-sql-created role. The -// Lakekeeper-internal EncryptionKey and OAuth2ClientSecret are always -// generated regardless of mode. -func (p *LakekeeperProvisioner) resolveOrGenerateSecret(ctx context.Context, orgID, dbName, secretName, pgUser, pgPassword string) (LakekeeperSecretData, error) { - existing, err := p.k8s.kubernetes.CoreV1().Secrets(p.k8s.namespace).Get(ctx, secretName, metav1.GetOptions{}) - if err == nil { - return secretFromExisting(existing), nil - } - if !apierrors.IsNotFound(err) { - return LakekeeperSecretData{}, fmt.Errorf("get existing secret: %w", err) - } - // DB credentials: pre-provisioned values win; otherwise generate. Default - // user is a role named after the DB (matches EnsureRole). 
- dbUser := dbName - dbPassword := mustRandomHex(32) - if pgUser != "" { - dbUser = pgUser - dbPassword = pgPassword - } - data := LakekeeperSecretData{ - DBUser: dbUser, - DBPassword: dbPassword, - EncryptionKey: mustRandomHex(32), // 32 bytes ⇒ 64 hex chars; safely covers any 256-bit key expectation - OAuth2ClientSecret: mustRandomHex(32), - } - if err := p.k8s.EnsureSecret(ctx, orgID, data); err != nil { - return LakekeeperSecretData{}, err - } - return data, nil -} - -// secretFromExisting decodes the per-org Secret keys back into a typed -// value. Reads only from Data — the K8s API server clears StringData on -// read and base64-decodes everything into Data, so checking StringData -// first would be dead code in production. The fake clientset echoes -// StringData back; assertSecretData in the unit tests covers both, so -// coverage is unchanged. -func secretFromExisting(s *corev1.Secret) LakekeeperSecretData { - return LakekeeperSecretData{ - DBUser: string(s.Data[SecretKeyDBUser]), - DBPassword: string(s.Data[SecretKeyDBPassword]), - EncryptionKey: string(s.Data[SecretKeyEncryptionKey]), - OAuth2ClientSecret: string(s.Data[SecretKeyOAuth2ClientSecret]), - } -} - -func validateInputs(in ProvisioningInputs) error { - if in.PGPreProvisioned { - if in.PGUser == "" || in.PGPassword == "" || in.PGDatabase == "" { - return errors.New("ProvisioningInputs: PGPreProvisioned requires PGUser, PGPassword, and PGDatabase") - } - } else if in.AdminDSN == "" { - return errors.New("ProvisioningInputs.AdminDSN is required") - } - if in.PGHost == "" { - return errors.New("ProvisioningInputs.PGHost is required") - } - if in.S3.Bucket == "" || in.S3.Region == "" { - return errors.New("ProvisioningInputs.S3 Bucket+Region are required") - } - if in.S3.Flavor == "" { - return errors.New("ProvisioningInputs.S3.Flavor is required (\"aws\" or \"s3-compat\")") - } - return nil -} - -func storageCredFor(s3 S3StorageConfig) WarehouseStorageCredential { - if s3.StaticAccessKeyID != "" { 
- return WarehouseStorageCredential{ - Type: "s3", - CredentialType: "access-key", - AWSAccessKeyID: s3.StaticAccessKeyID, - AWSSecretAccessKey: s3.StaticAccessKeySecret, - } - } - return WarehouseStorageCredential{ - Type: "s3", - CredentialType: "aws-system-identity", - } -} - -// lakekeeperDBName is the Postgres database (and role) name for the org's -// Lakekeeper backend. It's a Postgres identifier, so hyphens are sanitized to -// underscores via pgIdentSuffix — matching the cnpg-shard composition's -// $pgIdent so the external and cnpg-shard paths agree. -func lakekeeperDBName(orgID string) string { - return "lakekeeper_" + pgIdentSuffix(orgID) -} - -// lakekeeperWarehouseName and oauthClientID are free-form strings (the Iceberg -// REST warehouse name and the OAuth2 client_id), so they preserve hyphens. -func lakekeeperWarehouseName(orgID string) string { - return "org-" + ducklingName(orgID) -} - -func oauthClientID(orgID string) string { - return "duckling-" + ducklingName(orgID) -} - -func mustRandomHex(byteLen int) string { - b := make([]byte, byteLen) - if _, err := rand.Read(b); err != nil { - // crypto/rand on Linux/macOS doesn't fail in practice; if it does, - // we cannot proceed safely. - panic(fmt.Sprintf("crypto/rand failed: %v", err)) - } - return hex.EncodeToString(b) -} diff --git a/controlplane/provisioner/lakekeeper_provisioner_test.go b/controlplane/provisioner/lakekeeper_provisioner_test.go deleted file mode 100644 index 27842d32..00000000 --- a/controlplane/provisioner/lakekeeper_provisioner_test.go +++ /dev/null @@ -1,533 +0,0 @@ -//go:build kubernetes - -package provisioner - -import ( - "context" - "encoding/json" - "errors" - "io" - "net/http" - "net/http/httptest" - "os" - "strings" - "testing" - - "github.com/posthog/duckgres/controlplane/configstore" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" -) - -// fakeLakekeeperServer is an httptest stand-in for Lakekeeper. Tracks calls -// and lets us assert the right requests landed. 
-type fakeLakekeeperServer struct { - t *testing.T - srv *httptest.Server - warehouses []CreateWarehouseRequest - bootstraps int -} - -func newFakeLakekeeperServer(t *testing.T) *fakeLakekeeperServer { - t.Helper() - f := &fakeLakekeeperServer{t: t} - f.srv = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - switch { - case r.Method == http.MethodGet && r.URL.Path == "/management/v1/info": - _, _ = io.WriteString(w, `{"version":"0.11.6","bootstrapped":true}`) - case r.Method == http.MethodPost && r.URL.Path == "/management/v1/bootstrap": - f.bootstraps++ - w.WriteHeader(http.StatusNoContent) - case r.Method == http.MethodGet && r.URL.Path == "/management/v1/warehouse": - resp := listWarehousesResponse{Warehouses: nil} - for i, req := range f.warehouses { - resp.Warehouses = append(resp.Warehouses, Warehouse{ - Name: req.WarehouseName, WarehouseID: stableID(i), - }) - } - _ = json.NewEncoder(w).Encode(resp) - case r.Method == http.MethodPost && r.URL.Path == "/management/v1/warehouse": - var req CreateWarehouseRequest - if err := json.NewDecoder(r.Body).Decode(&req); err != nil { - t.Fatalf("decode body: %v", err) - } - f.warehouses = append(f.warehouses, req) - _ = json.NewEncoder(w).Encode(Warehouse{ - Name: req.WarehouseName, WarehouseID: stableID(len(f.warehouses) - 1), - }) - default: - t.Errorf("unexpected request: %s %s", r.Method, r.URL.Path) - http.NotFound(w, r) - } - })) - t.Cleanup(f.srv.Close) - return f -} - -func stableID(i int) string { - return "wh-" + string(rune('a'+i)) -} - -func newFakeProvisionerStore(orgID string, state configstore.ManagedWarehouseProvisioningState) *fakeStore { - fs := newFakeStore() - fs.warehouses[orgID] = &configstore.ManagedWarehouse{ - OrgID: orgID, - State: state, - } - return fs -} - -// markBootstrapped helps EnsureForOrg's wait-for-bootstrap step succeed under -// the fake dynamic client (which doesn't run the operator's reconciler). 
-func markBootstrapped(t *testing.T, c *LakekeeperK8sClient, orgID string) { - t.Helper() - name := LakekeeperResourceName(orgID) - cr, err := c.dynamic.Resource(lakekeeperGVR).Namespace(c.namespace).Get(context.Background(), name, metav1.GetOptions{}) - if err != nil { - t.Fatalf("get CR to mark bootstrapped: %v", err) - } - cr.Object["status"] = map[string]interface{}{ - "bootstrappedAt": "2026-05-19T12:00:00Z", - "serverID": "test-server", - } - if _, err := c.dynamic.Resource(lakekeeperGVR).Namespace(c.namespace).Update(context.Background(), cr, metav1.UpdateOptions{}); err != nil { - t.Fatalf("mark bootstrapped: %v", err) - } -} - -func TestEnsureForOrg_HappyPath_Fakes(t *testing.T) { - // This test exercises everything EXCEPT the actual Postgres CREATE - // DATABASE — that's covered by EnsureDatabase_AgainstLivePG. We point - // AdminDSN at the local prototype Postgres if available, otherwise - // skip the test. - dsn := os.Getenv("PG_ADMIN_DSN") - if dsn == "" { - t.Skip("PG_ADMIN_DSN not set; happy-path test needs a real Postgres for CREATE DATABASE") - } - - c, _, _ := newFakeLakekeeperClient() - fake := newFakeLakekeeperServer(t) - store := newFakeProvisionerStore("acme", configstore.ManagedWarehouseStateProvisioning) - p := NewLakekeeperProvisioner(store, c, - WithImage("img:test"), - WithClientFactory(func(baseURL string) *LakekeeperClient { - // Redirect "in-cluster" base URL to the fake test server. - return NewLakekeeperClient(fake.srv.URL) - }), - ) - - // We need the CR to exist before EnsureForOrg's wait-for-bootstrap step, - // and we need it marked bootstrapped (the fake k8s client doesn't run - // the operator). EnsureForOrg's k8s.EnsureCR call creates the CR; we - // then need to inject status concurrently. Simpler: pre-create the CR - // with status set, so EnsureForOrg's EnsureCR Update path executes - // instead of Create. 
- if err := c.EnsureCR(context.Background(), LakekeeperCRSpec{ - OrgID: "acme", Image: "stub", PGHost: "stub", PGDatabase: "stub", SecretName: "stub", BaseURI: "http://stub", - }); err != nil { - t.Fatalf("seed CR: %v", err) - } - markBootstrapped(t, c, "acme") - - in := ProvisioningInputs{ - AdminDSN: dsn, - PGHost: "localhost", - PGPort: 5434, - S3: S3StorageConfig{ - Bucket: "warehouse", KeyPrefix: "org-acme", - Endpoint: "http://minio:9000", Region: "us-east-1", Flavor: "s3-compat", - StaticAccessKeyID: "minioadmin", StaticAccessKeySecret: "minioadmin", - }, - } - t.Cleanup(func() { - // Drop the lakekeeper_acme DB created by EnsureDatabase. - dropDatabase(t, dsn, "lakekeeper_acme") - }) - - if err := p.EnsureForOrg(context.Background(), store.warehouses["acme"], in); err != nil { - t.Fatalf("EnsureForOrg: %v", err) - } - - // Warehouse row should now carry Lakekeeper config. - w := store.warehouses["acme"] - if !w.Iceberg.Enabled { - t.Errorf("Iceberg.Enabled not flipped on") - } - if w.Iceberg.Backend != configstore.IcebergBackendLakekeeper { - t.Errorf("Backend = %q, want lakekeeper", w.Iceberg.Backend) - } - if !strings.HasSuffix(w.Iceberg.LakekeeperEndpoint, "/catalog") { - t.Errorf("Endpoint = %q, want /catalog suffix", w.Iceberg.LakekeeperEndpoint) - } - if w.Iceberg.LakekeeperWarehouse != "org-acme" { - t.Errorf("Warehouse = %q, want org-acme", w.Iceberg.LakekeeperWarehouse) - } - if w.Iceberg.LakekeeperClientID != "duckling-acme" { - t.Errorf("ClientID = %q, want duckling-acme", w.Iceberg.LakekeeperClientID) - } - if w.Iceberg.LakekeeperClientCredentials.Name != LakekeeperResourceName("acme") { - t.Errorf("ClientCredentials.Name = %q, want %q", w.Iceberg.LakekeeperClientCredentials.Name, LakekeeperResourceName("acme")) - } - if w.IcebergState != configstore.ManagedWarehouseStateReady { - t.Errorf("IcebergState = %q, want ready", w.IcebergState) - } - - // Fake Lakekeeper should have received exactly one warehouse create. 
- if len(fake.warehouses) != 1 { - t.Fatalf("warehouse creates = %d, want 1", len(fake.warehouses)) - } - got := fake.warehouses[0] - if got.WarehouseName != "org-acme" { - t.Errorf("WarehouseName = %q, want org-acme", got.WarehouseName) - } - if !got.StorageProfile.STSEnabled || got.StorageProfile.Flavor != "s3-compat" { - t.Errorf("storage profile must be sts-enabled + s3-compat, got %+v", got.StorageProfile) - } - - // Idempotency: a second call should be a no-op on writes. - beforeCalls := len(fake.warehouses) - if err := p.EnsureForOrg(context.Background(), w, in); err != nil { - t.Fatalf("EnsureForOrg (second call): %v", err) - } - if len(fake.warehouses) != beforeCalls { - t.Errorf("second EnsureForOrg created another warehouse (want idempotent)") - } -} - -// TestEnsureForOrg_PreProvisioned exercises the cnpg-shard path: the database -// and role were already created (by provider-sql on the shard), so EnsureForOrg -// must NOT run CREATE DATABASE / CREATE ROLE (it has no AdminDSN — a connection -// attempt would fail), and must put the supplied PG credentials into the -// Lakekeeper Secret verbatim. Unlike the happy-path test this needs no real -// Postgres, precisely because the DDL steps are skipped. -func TestEnsureForOrg_PreProvisioned(t *testing.T) { - c, _, kube := newFakeLakekeeperClient() - fake := newFakeLakekeeperServer(t) - store := newFakeProvisionerStore("acme", configstore.ManagedWarehouseStateProvisioning) - p := NewLakekeeperProvisioner(store, c, - WithImage("img:test"), - WithClientFactory(func(baseURL string) *LakekeeperClient { - return NewLakekeeperClient(fake.srv.URL) - }), - ) - - // Pre-create + bootstrap the CR (the fake k8s client doesn't run the operator). 
- if err := c.EnsureCR(context.Background(), LakekeeperCRSpec{ - OrgID: "acme", Image: "stub", PGHost: "stub", PGDatabase: "stub", SecretName: "stub", BaseURI: "http://stub", - }); err != nil { - t.Fatalf("seed CR: %v", err) - } - markBootstrapped(t, c, "acme") - - in := ProvisioningInputs{ - PGPreProvisioned: true, - PGUser: "lakekeeper_acme", - PGPassword: "from-provider-sql", - PGDatabase: "lakekeeper_acme", - PGHost: "shard-001-pooler.cnpg-shards.svc.cluster.local", - PGPort: 5432, - PGSSLMode: "require", - S3: S3StorageConfig{ - Bucket: "warehouse", KeyPrefix: "lakekeeper", - Region: "us-east-1", Flavor: "aws", - RoleARN: "arn:aws:iam::123456789012:role/duckling-acme", - }, - } - - // No PG_ADMIN_DSN, no real Postgres: succeeding proves the DDL steps were - // skipped (a nil/blank AdminDSN connection would otherwise error). - if err := p.EnsureForOrg(context.Background(), store.warehouses["acme"], in); err != nil { - t.Fatalf("EnsureForOrg (pre-provisioned): %v", err) - } - - // The Secret must carry the supplied DB creds verbatim (not generated). - sec, err := kube.CoreV1().Secrets(c.namespace).Get(context.Background(), LakekeeperResourceName("acme"), metav1.GetOptions{}) - if err != nil { - t.Fatalf("get lakekeeper secret: %v", err) - } - if got := string(sec.Data[SecretKeyDBUser]); got != "lakekeeper_acme" { - t.Errorf("db-user = %q, want lakekeeper_acme", got) - } - if got := string(sec.Data[SecretKeyDBPassword]); got != "from-provider-sql" { - t.Errorf("db-password = %q, want from-provider-sql", got) - } - // Lakekeeper-internal secrets are still generated even in pre-provisioned mode. - if len(sec.Data[SecretKeyEncryptionKey]) == 0 { - t.Errorf("encryption-key should still be generated") - } - - // Warehouse created + iceberg flipped on. 
- w := store.warehouses["acme"] - if !w.Iceberg.Enabled || w.Iceberg.Backend != configstore.IcebergBackendLakekeeper { - t.Errorf("iceberg not configured: %+v", w.Iceberg) - } - if len(fake.warehouses) != 1 { - t.Fatalf("warehouse creates = %d, want 1", len(fake.warehouses)) - } -} - -// EnsureForOrg returns ErrBootstrapPending when the operator hasn't yet -// flipped status.bootstrappedAt — caller (the warehouse-reconcile loop) -// is responsible for requeueing without blocking other orgs. -func TestEnsureForOrg_NotBootstrappedReturnsTransient(t *testing.T) { - dsn := os.Getenv("PG_ADMIN_DSN") - if dsn == "" { - t.Skip("PG_ADMIN_DSN not set") - } - c, _, _ := newFakeLakekeeperClient() - store := newFakeProvisionerStore("acme2", configstore.ManagedWarehouseStateProvisioning) - p := NewLakekeeperProvisioner(store, c) - t.Cleanup(func() { dropDatabase(t, dsn, "lakekeeper_acme2") }) - - err := p.EnsureForOrg(context.Background(), store.warehouses["acme2"], ProvisioningInputs{ - AdminDSN: dsn, PGHost: "localhost", PGPort: 5434, - S3: S3StorageConfig{ - Bucket: "warehouse", KeyPrefix: "org-acme2", Region: "us-east-1", Flavor: "s3-compat", - StaticAccessKeyID: "minioadmin", StaticAccessKeySecret: "minioadmin", - }, - }) - if !errors.Is(err, ErrBootstrapPending) { - t.Fatalf("expected ErrBootstrapPending, got: %v", err) - } -} - -func TestEnsureForOrg_RejectsInvalidInputs(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - store := newFakeProvisionerStore("acme", configstore.ManagedWarehouseStateProvisioning) - p := NewLakekeeperProvisioner(store, c) - - cases := map[string]ProvisioningInputs{ - "missing AdminDSN": {PGHost: "h", S3: S3StorageConfig{Bucket: "b", Region: "r", Flavor: "aws"}}, - "missing PGHost": {AdminDSN: "d", S3: S3StorageConfig{Bucket: "b", Region: "r", Flavor: "aws"}}, - "missing S3 bucket": {AdminDSN: "d", PGHost: "h", S3: S3StorageConfig{Region: "r", Flavor: "aws"}}, - "missing S3 region": {AdminDSN: "d", PGHost: "h", S3: S3StorageConfig{Bucket: 
"b", Flavor: "aws"}}, - "missing S3 flavor": {AdminDSN: "d", PGHost: "h", S3: S3StorageConfig{Bucket: "b", Region: "r"}}, - } - for name, in := range cases { - t.Run(name, func(t *testing.T) { - if err := p.EnsureForOrg(context.Background(), store.warehouses["acme"], in); err == nil { - t.Errorf("expected validation error for %s", name) - } - }) - } -} - -// TestEnsureForOrg_PersistsAfterTopLevelStateMoved covers the case where -// the warehouse row's top-level state has already transitioned to Ready -// (e.g. the Duckling controller moved it ahead of us). The new persist -// path uses UpdateIcebergConfig which doesn't CAS on top-level state, so -// Lakekeeper config still lands. -func TestEnsureForOrg_PersistsAfterTopLevelStateMoved(t *testing.T) { - dsn := os.Getenv("PG_ADMIN_DSN") - if dsn == "" { - t.Skip("PG_ADMIN_DSN not set") - } - c, _, _ := newFakeLakekeeperClient() - fake := newFakeLakekeeperServer(t) - store := newFakeProvisionerStore("ready-test", configstore.ManagedWarehouseStateProvisioning) - // Simulate the Duckling state machine racing ahead. - store.warehouses["ready-test"].State = configstore.ManagedWarehouseStateReady - // Caller's snapshot of w is stale. 
- staleW := &configstore.ManagedWarehouse{ - OrgID: "ready-test", - State: configstore.ManagedWarehouseStateProvisioning, - } - - if err := c.EnsureCR(context.Background(), LakekeeperCRSpec{ - OrgID: "ready-test", Image: "stub", PGHost: "stub", PGDatabase: "stub", SecretName: "stub", BaseURI: "http://stub", - }); err != nil { - t.Fatalf("seed CR: %v", err) - } - markBootstrapped(t, c, "ready-test") - - p := NewLakekeeperProvisioner(store, c, - WithClientFactory(func(string) *LakekeeperClient { return NewLakekeeperClient(fake.srv.URL) }), - ) - t.Cleanup(func() { dropDatabase(t, dsn, "lakekeeper_readytest") }) - - err := p.EnsureForOrg(context.Background(), staleW, ProvisioningInputs{ - AdminDSN: dsn, PGHost: "localhost", PGPort: 5434, - S3: S3StorageConfig{ - Bucket: "warehouse", KeyPrefix: "ready-test", Region: "us-east-1", Flavor: "s3-compat", - StaticAccessKeyID: "minioadmin", StaticAccessKeySecret: "minioadmin", - }, - }) - if err != nil { - t.Fatalf("expected nil, got: %v", err) - } - if w := store.warehouses["ready-test"]; w.Iceberg.LakekeeperEndpoint == "" { - t.Errorf("expected LakekeeperEndpoint persisted even though top-level state was Ready") - } -} - -// TestEnsureForOrg_PersistsOAuth2URIWhenKubernetesAuthOn confirms that -// KubernetesAuthAudiences on the inputs flows through to: -// -// - the Lakekeeper CR's spec.authentication.kubernetes block -// - the warehouse row's LakekeeperOAuth2ServerURI (pointing at the -// worker's local broker on 127.0.0.1) -// -// This is the wire-level handshake PR4 unlocks: the worker emits an -// OAuth2 secret + ATTACH because the URI is non-empty. 
-func TestEnsureForOrg_PersistsOAuth2URIWhenKubernetesAuthOn(t *testing.T) { - dsn := os.Getenv("PG_ADMIN_DSN") - if dsn == "" { - t.Skip("PG_ADMIN_DSN not set") - } - c, _, _ := newFakeLakekeeperClient() - fake := newFakeLakekeeperServer(t) - store := newFakeProvisionerStore("oidc-org", configstore.ManagedWarehouseStateProvisioning) - - if err := c.EnsureCR(context.Background(), LakekeeperCRSpec{ - OrgID: "oidc-org", Image: "stub", PGHost: "stub", PGDatabase: "stub", - SecretName: "stub", BaseURI: "http://stub", - }); err != nil { - t.Fatalf("seed CR: %v", err) - } - markBootstrapped(t, c, "oidc-org") - - p := NewLakekeeperProvisioner(store, c, - WithClientFactory(func(string) *LakekeeperClient { return NewLakekeeperClient(fake.srv.URL) }), - ) - t.Cleanup(func() { dropDatabase(t, dsn, "lakekeeper_oidcorg") }) - - err := p.EnsureForOrg(context.Background(), store.warehouses["oidc-org"], ProvisioningInputs{ - AdminDSN: dsn, PGHost: "localhost", PGPort: 5434, PGSSLMode: "disable", - S3: S3StorageConfig{ - Bucket: "warehouse", KeyPrefix: "oidc-org", Region: "us-east-1", Flavor: "s3-compat", - StaticAccessKeyID: "minioadmin", StaticAccessKeySecret: "minioadmin", - }, - KubernetesAuthAudiences: []string{"lakekeeper"}, - }) - if err != nil { - t.Fatalf("EnsureForOrg: %v", err) - } - - w := store.warehouses["oidc-org"] - if w.Iceberg.LakekeeperOAuth2ServerURI == "" { - t.Errorf("LakekeeperOAuth2ServerURI should be populated in OIDC mode") - } - if w.Iceberg.LakekeeperOAuth2ServerURI != "http://127.0.0.1:9876/token" { - t.Errorf("OAUTH2_SERVER_URI = %q, want http://127.0.0.1:9876/token (worker-local broker)", - w.Iceberg.LakekeeperOAuth2ServerURI) - } - - // Cross-check that the CR's authentication.kubernetes block was set in - // the SAME EnsureForOrg call. 
Without this read-back, the DB row and - // the CR could drift — a future refactor that threads - // KubernetesAuthAudiences into ProvisioningInputs but forgets to pass - // it to LakekeeperCRSpec would leave Lakekeeper in allowall mode while - // the worker tells DuckDB to POST to the broker. Lakekeeper would - // then reject the token because k8s auth wasn't actually enabled. - cr, err := c.dynamic.Resource(lakekeeperGVR).Namespace(c.namespace). - Get(context.Background(), LakekeeperResourceName("oidc-org"), metav1.GetOptions{}) - if err != nil { - t.Fatalf("get CR for cross-check: %v", err) - } - specMap := cr.Object["spec"].(map[string]interface{}) - auth, ok := specMap["authentication"].(map[string]interface{}) - if !ok { - t.Fatalf("spec.authentication missing on CR — would be allowall in prod") - } - k8sAuth, ok := auth["kubernetes"].(map[string]interface{}) - if !ok { - t.Fatalf("spec.authentication.kubernetes missing on CR") - } - if k8sAuth["enabled"] != true { - t.Errorf("spec.authentication.kubernetes.enabled = %v, want true", k8sAuth["enabled"]) - } - auds, ok := k8sAuth["audiences"].([]interface{}) - if !ok || len(auds) != 1 || auds[0] != "lakekeeper" { - t.Errorf("audiences = %v, want [lakekeeper]", k8sAuth["audiences"]) - } -} - -func TestEnsureForOrg_OAuth2URIEmptyInAllowallMode(t *testing.T) { - dsn := os.Getenv("PG_ADMIN_DSN") - if dsn == "" { - t.Skip("PG_ADMIN_DSN not set") - } - c, _, _ := newFakeLakekeeperClient() - fake := newFakeLakekeeperServer(t) - store := newFakeProvisionerStore("allowall-org", configstore.ManagedWarehouseStateProvisioning) - - if err := c.EnsureCR(context.Background(), LakekeeperCRSpec{ - OrgID: "allowall-org", Image: "stub", PGHost: "stub", PGDatabase: "stub", - SecretName: "stub", BaseURI: "http://stub", - }); err != nil { - t.Fatalf("seed CR: %v", err) - } - markBootstrapped(t, c, "allowall-org") - - p := NewLakekeeperProvisioner(store, c, - WithClientFactory(func(string) *LakekeeperClient { return 
NewLakekeeperClient(fake.srv.URL) }), - ) - t.Cleanup(func() { dropDatabase(t, dsn, "lakekeeper_allowallorg") }) - - // KubernetesAuthAudiences left empty → allowall mode. - err := p.EnsureForOrg(context.Background(), store.warehouses["allowall-org"], ProvisioningInputs{ - AdminDSN: dsn, PGHost: "localhost", PGPort: 5434, PGSSLMode: "disable", - S3: S3StorageConfig{ - Bucket: "warehouse", KeyPrefix: "allowall-org", Region: "us-east-1", Flavor: "s3-compat", - StaticAccessKeyID: "minioadmin", StaticAccessKeySecret: "minioadmin", - }, - }) - if err != nil { - t.Fatalf("EnsureForOrg: %v", err) - } - if w := store.warehouses["allowall-org"]; w.Iceberg.LakekeeperOAuth2ServerURI != "" { - t.Errorf("OAUTH2_SERVER_URI = %q, want empty (allowall mode)", w.Iceberg.LakekeeperOAuth2ServerURI) - } -} - -func TestEnsureForOrg_RejectsInvalidOrgID(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - store := newFakeProvisionerStore("bad org", configstore.ManagedWarehouseStateProvisioning) - p := NewLakekeeperProvisioner(store, c) - err := p.EnsureForOrg(context.Background(), store.warehouses["bad org"], ProvisioningInputs{ - AdminDSN: "d", PGHost: "h", - S3: S3StorageConfig{Bucket: "b", Region: "r", Flavor: "aws"}, - }) - if err == nil || !strings.Contains(err.Error(), "not a valid K8s label value") { - t.Fatalf("expected label-value error, got: %v", err) - } -} - -// dropDatabase is a t.Cleanup helper that best-effort removes a DB. -func dropDatabase(t *testing.T, dsn, name string) { - t.Helper() - dropDSN := dsn // pgx accepts the same DSN; CREATE/DROP from any current DB. - cleanupDB(t, dropDSN, name) -} - -// TestDeleteForOrg_SkipsPGCleanupForCnpgShard guards the contract that the -// cnpg-shard path leaves PG cleanup to the Crossplane composition. The -// composition's [Delete] managementPolicy on the cnpg-tenant-role and -// cnpg-tenant-database resources owns role+DB teardown there; the -// duckgres provisioner doesn't have the AdminDSN to do it itself -// anyway. 
The k8s teardown still runs. -func TestDeleteForOrg_SkipsPGCleanupForCnpgShard(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - p := NewLakekeeperProvisioner(newFakeStore(), c) - // AdminDSN empty + PGPreProvisioned=true: DropDatabase/DropRole must - // not be invoked. With a bogus DSN they'd surface as connection - // errors; a nil-DSN attempt would panic deep in pgx. The fact that - // this returns nil with no DSN configured is the assertion. - err := p.DeleteForOrg(context.Background(), "acme", ProvisioningInputs{ - PGPreProvisioned: true, - }) - if err != nil { - t.Fatalf("DeleteForOrg with PGPreProvisioned should succeed without DSN, got: %v", err) - } -} - -// TestDeleteForOrg_SkipsPGCleanupWithoutAdminDSN covers the dev/orbstack -// path where the lakekeeper provisioner never had an AdminDSN in the -// first place (no env, no Duckling CR). PG cleanup must be skipped -// rather than failing the teardown. -func TestDeleteForOrg_SkipsPGCleanupWithoutAdminDSN(t *testing.T) { - c, _, _ := newFakeLakekeeperClient() - p := NewLakekeeperProvisioner(newFakeStore(), c) - err := p.DeleteForOrg(context.Background(), "acme", ProvisioningInputs{}) - if err != nil { - t.Fatalf("DeleteForOrg with empty inputs should succeed, got: %v", err) - } -} diff --git a/controlplane/provisioner/naming_test.go b/controlplane/provisioner/naming_test.go index de51e612..f0fff58c 100644 --- a/controlplane/provisioner/naming_test.go +++ b/controlplane/provisioner/naming_test.go @@ -10,8 +10,8 @@ import "testing" // "ab" both collapsed to "ab". 
func TestDucklingNamePreservesHyphens(t *testing.T) { for in, want := range map[string]string{ - "ben-iceberg-cnpg": "ben-iceberg-cnpg", - "Ben-Iceberg": "ben-iceberg", + "ben-ducklake-cnpg": "ben-ducklake-cnpg", + "Ben-DuckLake": "ben-ducklake", "team123": "team123", "f47ac10b-58cc-4372-a567-0e02b2c3d479": "f47ac10b-58cc-4372-a567-0e02b2c3d479", } { @@ -23,36 +23,3 @@ func TestDucklingNamePreservesHyphens(t *testing.T) { t.Error("ducklingName must not collide \"a-b\" with \"ab\" (the old de-hyphenation bug)") } } - -// TestPgIdentSuffixSanitizes verifies the Postgres-identifier transform maps -// hyphens to underscores (PG identifiers can't be unquoted-hyphenated) and -// stays injective for [a-z0-9-] inputs. -func TestPgIdentSuffixSanitizes(t *testing.T) { - for in, want := range map[string]string{ - "ben-iceberg-cnpg": "ben_iceberg_cnpg", - "team123": "team123", - "ABC-1": "abc_1", - } { - if got := pgIdentSuffix(in); got != want { - t.Errorf("pgIdentSuffix(%q) = %q, want %q", in, got, want) - } - } -} - -// TestLakekeeperNamesForHyphenatedOrg verifies the Lakekeeper-derived names: -// k8s/string names keep hyphens; the PG database uses underscores. 
-func TestLakekeeperNamesForHyphenatedOrg(t *testing.T) { - const org = "ben-iceberg-external" - if got := LakekeeperResourceName(org); got != "lakekeeper-ben-iceberg-external" { - t.Errorf("LakekeeperResourceName = %q", got) - } - if got := lakekeeperDBName(org); got != "lakekeeper_ben_iceberg_external" { - t.Errorf("lakekeeperDBName = %q", got) - } - if got := lakekeeperWarehouseName(org); got != "org-ben-iceberg-external" { - t.Errorf("lakekeeperWarehouseName = %q", got) - } - if got := oauthClientID(org); got != "duckling-ben-iceberg-external" { - t.Errorf("oauthClientID = %q", got) - } -} diff --git a/controlplane/provisioner/postgres_admin.go b/controlplane/provisioner/postgres_admin.go deleted file mode 100644 index 38c63a10..00000000 --- a/controlplane/provisioner/postgres_admin.go +++ /dev/null @@ -1,403 +0,0 @@ -package provisioner - -import ( - "context" - "database/sql" - "errors" - "fmt" - "regexp" - "strings" - - _ "github.com/jackc/pgx/v5/stdlib" -) - -// EnsureDatabase creates dbName on the Postgres server addressed by adminDSN -// if it does not already exist. Idempotent: returns nil when the database is -// already present. Caller owns the DSN's credential lifetime. -// -// CREATE DATABASE cannot run in a transaction and Postgres does not support -// CREATE DATABASE IF NOT EXISTS, so we probe pg_database first and only fire -// CREATE DATABASE when the row is missing. There is a TOCTOU race against -// concurrent callers; we handle the duplicate_database SQLSTATE (42P04) as -// a benign collision rather than an error. 
-func EnsureDatabase(ctx context.Context, adminDSN, dbName string) error { - if !isSafePGIdent(dbName) { - return fmt.Errorf("ensure database: unsafe identifier %q", dbName) - } - db, err := sql.Open("pgx", adminDSN) - if err != nil { - return fmt.Errorf("open admin connection: %w", err) - } - defer func() { _ = db.Close() }() - - var exists bool - if err := db.QueryRowContext(ctx, "SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname=$1)", dbName).Scan(&exists); err != nil { - return fmt.Errorf("probe pg_database: %w", err) - } - if exists { - return nil - } - - // Identifier validated above. CREATE DATABASE does not accept parameters. - if _, err := db.ExecContext(ctx, "CREATE DATABASE "+quoteIdent(dbName)); err != nil { - if isDuplicateDatabase(err) { - return nil - } - return fmt.Errorf("create database %s: %w", dbName, err) - } - return nil -} - -// EnsureRole creates a login role with the given password, or rotates the -// password if the role already exists. Used by the Lakekeeper provisioner to -// make sure Lakekeeper's pod can connect with the credentials stored in its -// K8s Secret — a freshly-created database has no users by default. -// -// On a re-run with the same password the ALTER ROLE is a no-op for Postgres -// internals. On a re-run with a different password we explicitly rotate; -// callers must keep the password in their Secret in sync with whatever was -// last passed here. The Lakekeeper provisioner achieves this by reading the -// existing Secret on every run (resolveOrGenerateSecret) rather than -// regenerating, so the same password threads through to EnsureRole. -// -// Grants the role ALL PRIVILEGES on the named database. Cluster admin -// permissions are not granted. 
-func EnsureRole(ctx context.Context, adminDSN, role, password, ownedDB string) error { - if !isSafePGIdent(role) { - return fmt.Errorf("ensure role: unsafe role name %q", role) - } - if !isSafePGIdent(ownedDB) { - return fmt.Errorf("ensure role: unsafe db name %q", ownedDB) - } - if password == "" { - return fmt.Errorf("ensure role: empty password") - } - db, err := sql.Open("pgx", adminDSN) - if err != nil { - return fmt.Errorf("open admin connection: %w", err) - } - defer func() { _ = db.Close() }() - - var exists bool - if err := db.QueryRowContext(ctx, "SELECT EXISTS(SELECT 1 FROM pg_roles WHERE rolname=$1)", role).Scan(&exists); err != nil { - return fmt.Errorf("probe pg_roles: %w", err) - } - if !exists { - // CREATE ROLE accepts password as a literal — pgx's parameter - // binding doesn't apply to DDL. Validate as identifier so embedded - // quotes can't break out (passwords from mustRandomHex are pure - // hex so this is belt-and-suspenders). - if !isSafePGPassword(password) { - return fmt.Errorf("ensure role: password contains unsafe characters") - } - stmt := fmt.Sprintf("CREATE ROLE %s WITH LOGIN PASSWORD %s", quoteIdent(role), quoteLiteral(password)) - if _, err := db.ExecContext(ctx, stmt); err != nil { - if isDuplicateObject(err) { - // Concurrent creator beat us — fall through to ALTER + GRANT. - } else { - return fmt.Errorf("create role %s: %w", role, err) - } - } - } else { - // Role exists; rotate the password to whatever the caller passed - // (matches the contract that the secret + role stay in sync). - if !isSafePGPassword(password) { - return fmt.Errorf("ensure role: password contains unsafe characters") - } - stmt := fmt.Sprintf("ALTER ROLE %s WITH PASSWORD %s", quoteIdent(role), quoteLiteral(password)) - if _, err := db.ExecContext(ctx, stmt); err != nil { - return fmt.Errorf("alter role %s: %w", role, err) - } - } - - // GRANT is idempotent — Postgres ignores a re-grant of an existing privilege. 
- if _, err := db.ExecContext(ctx, "GRANT ALL PRIVILEGES ON DATABASE "+quoteIdent(ownedDB)+" TO "+quoteIdent(role)); err != nil { - return fmt.Errorf("grant on database %s to %s: %w", ownedDB, role, err) - } - - // Postgres 15+ revokes CREATE on the public schema for non-owner roles. - // Lakekeeper's `migrate` step needs DDL inside that schema, so make the - // role the database OWNER (which carries schema-creation privileges by - // default). Also ALTER SCHEMA public OWNER as belt-and-suspenders for - // older PG versions where the database OWNER doesn't automatically own - // pre-existing schemas in the new DB. - if _, err := db.ExecContext(ctx, "ALTER DATABASE "+quoteIdent(ownedDB)+" OWNER TO "+quoteIdent(role)); err != nil { - return fmt.Errorf("alter database owner %s -> %s: %w", ownedDB, role, err) - } - // Run the schema-owner ALTER inside the target database — schema - // ownership is local to each database. - dbScoped, err := sql.Open("pgx", reDSN(adminDSN, ownedDB)) - if err != nil { - return fmt.Errorf("open admin connection to %s: %w", ownedDB, err) - } - defer func() { _ = dbScoped.Close() }() - if _, err := dbScoped.ExecContext(ctx, "ALTER SCHEMA public OWNER TO "+quoteIdent(role)); err != nil { - return fmt.Errorf("alter schema public owner -> %s: %w", role, err) - } - return nil -} - -// DropDatabase removes dbName on the Postgres server addressed by adminDSN. -// Idempotent: returns nil when the database is already absent (3D000). Forces -// disconnection of any active sessions on the target DB so DROP DATABASE -// can't hang waiting for clients to drain — necessary at duckling teardown -// time because the per-tenant Lakekeeper pod may still be alive when the -// drop runs (the k8s teardown is fire-and-forget and the operator's -// reconciliation lag means connections linger). -// -// Reassigns ownership to CURRENT_USER before the drop so the admin role -// can issue DROP DATABASE even when it doesn't own the target. 
EnsureRole -// runs ALTER DATABASE ... OWNER TO , which means a non-superuser -// admin (e.g. the ducklingexample master on a shared RDS) wouldn't -// otherwise have permission — 42501 must be owner of database. GRANT -// role-membership first so CURRENT_USER inherits the necessary privileges -// to ALTER OWNER (which itself requires being a member of the new owner -// role on Postgres 14+). -// -// Caller must connect via a privileged DSN against a different database -// than dbName (the admin DSN's path is OK to be `postgres`). -func DropDatabase(ctx context.Context, adminDSN, dbName string) error { - if !isSafePGIdent(dbName) { - return fmt.Errorf("drop database: unsafe identifier %q", dbName) - } - db, err := sql.Open("pgx", adminDSN) - if err != nil { - return fmt.Errorf("open admin connection: %w", err) - } - defer func() { _ = db.Close() }() - - // Probe first so the IF EXISTS hides the missing-DB case cleanly, - // and so we don't run the ownership reassignment against a phantom. - var exists bool - if err := db.QueryRowContext(ctx, "SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname=$1)", dbName).Scan(&exists); err != nil { - return fmt.Errorf("probe pg_database: %w", err) - } - if !exists { - return nil - } - - // EnsureRole made the tenant role the database owner. To DROP we must - // either be that role or a superuser; on RDS the admin is neither. - // Take role-membership of the current owner so we can hand ownership - // back to ourselves, then drop. Both GRANT and ALTER OWNER are - // idempotent / best-effort: if the role doesn't exist (e.g. we are - // already the owner) the GRANT 0LP01-likes and ALTER fail cleanly, - // and we continue to the DROP, which will report the real obstacle. 
- if owner, ok := databaseOwner(ctx, db, dbName); ok && owner != "" && isSafePGIdent(owner) { - _, _ = db.ExecContext(ctx, "GRANT "+quoteIdent(owner)+" TO CURRENT_USER") - _, _ = db.ExecContext(ctx, "ALTER DATABASE "+quoteIdent(dbName)+" OWNER TO CURRENT_USER") - } - - // FORCE terminates active backends as part of DROP DATABASE (Postgres - // 13+). Without it a single lingering Lakekeeper connection blocks the - // drop until backoff. - if _, err := db.ExecContext(ctx, "DROP DATABASE IF EXISTS "+quoteIdent(dbName)+" WITH (FORCE)"); err != nil { - if isInvalidCatalogName(err) { - return nil - } - return fmt.Errorf("drop database %s: %w", dbName, err) - } - return nil -} - -// databaseOwner returns the rolname that owns dbName, or "" if the lookup -// fails. Used by DropDatabase to reassign ownership before DROP. -func databaseOwner(ctx context.Context, db *sql.DB, dbName string) (string, bool) { - var owner string - err := db.QueryRowContext(ctx, - "SELECT pg_catalog.pg_get_userbyid(datdba) FROM pg_database WHERE datname=$1", - dbName).Scan(&owner) - if err != nil { - return "", false - } - return owner, true -} - -// DropRole removes role on the Postgres server addressed by adminDSN. -// Idempotent: returns nil when the role is already absent. Best-effort -// REASSIGN/DROP OWNED first so any object the role owns (e.g. grants on -// the maintenance DB, default privileges) doesn't block DROP ROLE with -// 2BP01 ("role cannot be dropped because some objects depend on it"). -// -// Requires role-membership in `role` for REASSIGN OWNED + DROP OWNED to -// run (Postgres 14+); the GRANT is best-effort because if `role` is -// already gone the GRANT itself fails. The caller's admin must either be -// a superuser or already a member of role; on RDS we explicitly GRANT -// the membership first since the admin is neither. -// -// Caller must connect via a privileged DSN. 
-func DropRole(ctx context.Context, adminDSN, role string) error { - if !isSafePGIdent(role) { - return fmt.Errorf("drop role: unsafe role name %q", role) - } - db, err := sql.Open("pgx", adminDSN) - if err != nil { - return fmt.Errorf("open admin connection: %w", err) - } - defer func() { _ = db.Close() }() - - var exists bool - if err := db.QueryRowContext(ctx, "SELECT EXISTS(SELECT 1 FROM pg_roles WHERE rolname=$1)", role).Scan(&exists); err != nil { - return fmt.Errorf("probe pg_roles: %w", err) - } - if !exists { - return nil - } - - // Inherit the role's privileges so REASSIGN/DROP OWNED can run on - // shared RDS where the admin is not a superuser. Best-effort: the - // GRANT can fail when the admin is already a member (cycle) or is - // the superuser, in which case it isn't needed anyway. - _, _ = db.ExecContext(ctx, "GRANT "+quoteIdent(role)+" TO CURRENT_USER") - - // REASSIGN handles owned database objects in this DB; DROP OWNED - // CASCADE then handles cluster-wide grants/default privileges. Both - // can fail without blocking the final DROP ROLE — if there's a real - // remaining dependency, DROP ROLE will surface it. - _, _ = db.ExecContext(ctx, "REASSIGN OWNED BY "+quoteIdent(role)+" TO CURRENT_USER") - if _, err := db.ExecContext(ctx, "DROP OWNED BY "+quoteIdent(role)+" CASCADE"); err != nil { - if isUndefinedObject(err) { - return nil - } - // Continue to DROP ROLE — there may be nothing to drop. Surface - // the underlying error only if DROP ROLE itself fails too. - _ = err - } - if _, err := db.ExecContext(ctx, "DROP ROLE IF EXISTS "+quoteIdent(role)); err != nil { - return fmt.Errorf("drop role %s: %w", role, err) - } - return nil -} - -// reDSN rewrites the dbname component of a Postgres URL-style DSN. Used to -// connect to a specific database with the same admin credentials. -func reDSN(dsn, dbName string) string { - // pgx accepts both URL and keyword/value DSNs. Detect the URL form by - // the postgres:// prefix. 
- const urlPrefix = "postgres://" - const urlPrefix2 = "postgresql://" - if strings.HasPrefix(dsn, urlPrefix) || strings.HasPrefix(dsn, urlPrefix2) { - // Find the last "/" after the "@" — that's the path component. - at := strings.Index(dsn, "@") - slash := -1 - if at >= 0 { - slash = strings.Index(dsn[at:], "/") - if slash >= 0 { - slash += at - } - } - if slash < 0 { - // No dbname segment; append one. - if q := strings.Index(dsn, "?"); q >= 0 { - return dsn[:q] + "/" + dbName + dsn[q:] - } - return dsn + "/" + dbName - } - // Replace the segment between slash+1 and the next "?" (or end). - rest := dsn[slash+1:] - q := strings.Index(rest, "?") - if q < 0 { - return dsn[:slash+1] + dbName - } - return dsn[:slash+1] + dbName + rest[q:] - } - // Keyword/value form: replace dbname=... or append. - return strings.NewReplacer("dbname="+extractDBName(dsn), "dbname="+dbName).Replace(dsn) -} - -func extractDBName(dsn string) string { - // Best-effort extract for the keyword/value form. We only use this when - // pgx URL prefix isn't present, which is rare in our codebase. - for _, kv := range strings.Fields(dsn) { - if strings.HasPrefix(kv, "dbname=") { - return strings.TrimPrefix(kv, "dbname=") - } - } - return "" -} - -// isSafePGPassword restricts passwords we generate to a printable ASCII -// subset that can't break out of a single-quoted SQL literal. Our generator -// (mustRandomHex) produces pure hex which trivially passes; the check -// exists so an external caller can't sneak in newlines / quotes. -var safePGPassword = regexp.MustCompile(`^[A-Za-z0-9_\-+=./]{1,256}$`) - -func isSafePGPassword(s string) bool { return safePGPassword.MatchString(s) } - -// quoteLiteral wraps a string in single quotes and doubles any embedded -// single quotes per Postgres rules. Identifiers use double quotes -// (quoteIdent); string literals use single quotes. 
-func quoteLiteral(s string) string { - out := make([]byte, 0, len(s)+2) - out = append(out, '\'') - for i := 0; i < len(s); i++ { - if s[i] == '\'' { - out = append(out, '\'', '\'') - } else { - out = append(out, s[i]) - } - } - out = append(out, '\'') - return string(out) -} - -// isDuplicateObject reports whether err is Postgres 42710 (duplicate_object) -// — used for the CREATE ROLE concurrent-create race. -func isDuplicateObject(err error) bool { - type sqlStater interface{ SQLState() string } - var s sqlStater - return errors.As(err, &s) && s.SQLState() == "42710" -} - -// isSafePGIdent restricts database names to a conservative whitelist. Names -// we generate look like "lakekeeper_" where orgid is itself -// constrained; the regex catches accidental typos and any attempt to inject -// SQL via the name. Real escaping is still done with quoteIdent below — this -// is a belt-and-suspenders check. -var safePGIdent = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]{0,62}$`) - -func isSafePGIdent(s string) bool { return safePGIdent.MatchString(s) } - -// quoteIdent wraps an identifier in double quotes and escapes embedded quotes -// per Postgres rules. Used after isSafePGIdent so this is defensive. -func quoteIdent(s string) string { - out := make([]byte, 0, len(s)+2) - out = append(out, '"') - for i := 0; i < len(s); i++ { - if s[i] == '"' { - out = append(out, '"', '"') - } else { - out = append(out, s[i]) - } - } - out = append(out, '"') - return string(out) -} - -// isDuplicateDatabase reports whether err is a Postgres 42P04 (database -// already exists) — the only race outcome we consider benign. Uses -// errors.As so multi-error chains (e.g. errors.Join) are handled too. 
-func isDuplicateDatabase(err error) bool { - type sqlStater interface{ SQLState() string } - var s sqlStater - return errors.As(err, &s) && s.SQLState() == "42P04" -} - -// isInvalidCatalogName reports whether err is Postgres 3D000 (database does -// not exist) — what DROP DATABASE returns when the target is already gone. -// Without the IF EXISTS clause this would matter; we keep the check anyway -// because IF EXISTS is silent on missing-DB and an actual no-such-database -// error can also surface from the connection attempt itself. -func isInvalidCatalogName(err error) bool { - type sqlStater interface{ SQLState() string } - var s sqlStater - return errors.As(err, &s) && s.SQLState() == "3D000" -} - -// isUndefinedObject reports whether err is Postgres 42704 (undefined_object) -// — what DROP OWNED returns when the role doesn't exist. Benign. -func isUndefinedObject(err error) bool { - type sqlStater interface{ SQLState() string } - var s sqlStater - return errors.As(err, &s) && s.SQLState() == "42704" -} diff --git a/controlplane/provisioner/postgres_admin_test.go b/controlplane/provisioner/postgres_admin_test.go deleted file mode 100644 index 86aec77f..00000000 --- a/controlplane/provisioner/postgres_admin_test.go +++ /dev/null @@ -1,288 +0,0 @@ -package provisioner - -import ( - "context" - "database/sql" - "fmt" - "os" - "strings" - "testing" -) - -func TestIsSafePGIdent(t *testing.T) { - cases := []struct { - in string - want bool - }{ - {"lakekeeper_acme", true}, - {"a", true}, - {"_underscore_start", true}, - {"with-hyphen", false}, - {"1starts_with_digit", false}, - {"has space", false}, - {`"injection"`, false}, - {`; DROP TABLE`, false}, - {"", false}, - // Postgres identifier max 63 — 64 chars should reject. - {"x" + string(make([]byte, 63)), false}, // 64 NULs after x - } - // Build a 63-char valid ident and verify it passes. 
- maxOK := make([]byte, 63) - for i := range maxOK { - maxOK[i] = 'a' - } - cases = append(cases, struct { - in string - want bool - }{string(maxOK), true}) - - for _, c := range cases { - if got := isSafePGIdent(c.in); got != c.want { - t.Errorf("isSafePGIdent(%q) = %v, want %v", c.in, got, c.want) - } - } -} - -func TestQuoteIdent(t *testing.T) { - cases := map[string]string{ - "plain": `"plain"`, - `with"quote`: `"with""quote"`, - "": `""`, - } - for in, want := range cases { - if got := quoteIdent(in); got != want { - t.Errorf("quoteIdent(%q) = %q, want %q", in, got, want) - } - } -} - -// TestEnsureDatabase_AgainstLivePG is an integration test against a real -// Postgres. Skipped unless PG_ADMIN_DSN is set. Pair with the -// docker-compose stack at tmp/lakekeeper-proto: -// -// export PG_ADMIN_DSN='postgres://lakekeeper:lakekeeper@localhost:5434/lakekeeper?sslmode=disable' -// go test ./controlplane/provisioner/ -run TestEnsureDatabase -v -func TestEnsureDatabase_AgainstLivePG(t *testing.T) { - dsn := os.Getenv("PG_ADMIN_DSN") - if dsn == "" { - t.Skip("PG_ADMIN_DSN not set; skipping live Postgres test") - } - - dbName := fmt.Sprintf("lakekeeper_test_%d", os.Getpid()) - t.Cleanup(func() { - // Best-effort drop. Use a fresh connection because EnsureDatabase - // closes its own. - db, err := sql.Open("pgx", dsn) - if err != nil { - t.Logf("cleanup open: %v", err) - return - } - defer func() { _ = db.Close() }() - if _, err := db.Exec("DROP DATABASE IF EXISTS " + quoteIdent(dbName)); err != nil { - t.Logf("cleanup drop: %v", err) - } - }) - - // First call creates. - if err := EnsureDatabase(context.Background(), dsn, dbName); err != nil { - t.Fatalf("EnsureDatabase (first): %v", err) - } - // Second call is a no-op (idempotent). - if err := EnsureDatabase(context.Background(), dsn, dbName); err != nil { - t.Fatalf("EnsureDatabase (idempotent): %v", err) - } - // Verify the database actually exists. 
- db, err := sql.Open("pgx", dsn) - if err != nil { - t.Fatalf("verify open: %v", err) - } - defer func() { _ = db.Close() }() - var exists bool - if err := db.QueryRow("SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname=$1)", dbName).Scan(&exists); err != nil { - t.Fatalf("verify query: %v", err) - } - if !exists { - t.Fatalf("database %s not found after EnsureDatabase", dbName) - } -} - -// TestEnsureRole_AgainstLivePG verifies the role create/alter/grant path -// against a real Postgres. Skipped unless PG_ADMIN_DSN is set. -func TestEnsureRole_AgainstLivePG(t *testing.T) { - dsn := os.Getenv("PG_ADMIN_DSN") - if dsn == "" { - t.Skip("PG_ADMIN_DSN not set") - } - dbName := fmt.Sprintf("lakekeeper_role_test_%d", os.Getpid()) - roleName := dbName - pw1 := "abcdef0123456789" - pw2 := "fedcba9876543210" - t.Cleanup(func() { - db, _ := sql.Open("pgx", dsn) - defer func() { _ = db.Close() }() - // Order matters: drop privs before dropping the role. - _, _ = db.Exec("REASSIGN OWNED BY " + quoteIdent(roleName) + " TO CURRENT_USER") - _, _ = db.Exec("DROP OWNED BY " + quoteIdent(roleName)) - _, _ = db.Exec("DROP DATABASE IF EXISTS " + quoteIdent(dbName)) - _, _ = db.Exec("DROP ROLE IF EXISTS " + quoteIdent(roleName)) - }) - - ctx := context.Background() - if err := EnsureDatabase(ctx, dsn, dbName); err != nil { - t.Fatalf("EnsureDatabase: %v", err) - } - if err := EnsureRole(ctx, dsn, roleName, pw1, dbName); err != nil { - t.Fatalf("EnsureRole (create): %v", err) - } - // Idempotent: second call with same password is a no-op-ish ALTER. - if err := EnsureRole(ctx, dsn, roleName, pw1, dbName); err != nil { - t.Fatalf("EnsureRole (idempotent): %v", err) - } - // Verify the role can actually connect with pw1. 
- connDSN := fmt.Sprintf("postgres://%s:%s@localhost:5434/%s?sslmode=disable", roleName, pw1, dbName) - if connDB, err := sql.Open("pgx", connDSN); err != nil { - t.Fatalf("open as role: %v", err) - } else { - var one int - if err := connDB.QueryRow("SELECT 1").Scan(&one); err != nil { - t.Errorf("connect+query as role with pw1 failed: %v", err) - } - _ = connDB.Close() - } - // Rotate the password and verify the new one connects. - if err := EnsureRole(ctx, dsn, roleName, pw2, dbName); err != nil { - t.Fatalf("EnsureRole (rotate): %v", err) - } - connDSN2 := fmt.Sprintf("postgres://%s:%s@localhost:5434/%s?sslmode=disable", roleName, pw2, dbName) - if connDB, err := sql.Open("pgx", connDSN2); err != nil { - t.Fatalf("open as role with rotated pw: %v", err) - } else { - var one int - if err := connDB.QueryRow("SELECT 1").Scan(&one); err != nil { - t.Errorf("connect+query with rotated password failed: %v", err) - } - _ = connDB.Close() - } -} - -// TestDropDatabaseAndRole_AgainstLivePG covers the teardown helpers added -// for duckling delete: round-trips Ensure → Drop and confirms the role -// and DB are both gone, plus the idempotent re-drop case. -func TestDropDatabaseAndRole_AgainstLivePG(t *testing.T) { - dsn := os.Getenv("PG_ADMIN_DSN") - if dsn == "" { - t.Skip("PG_ADMIN_DSN not set") - } - dbName := fmt.Sprintf("lakekeeper_drop_test_%d", os.Getpid()) - roleName := dbName - t.Cleanup(func() { - // Belt-and-suspenders cleanup in case the assertions short-circuit. 
-		db, _ := sql.Open("pgx", dsn)
-		defer func() { _ = db.Close() }()
-		_, _ = db.Exec("DROP DATABASE IF EXISTS " + quoteIdent(dbName) + " WITH (FORCE)")
-		_, _ = db.Exec("DROP ROLE IF EXISTS " + quoteIdent(roleName))
-	})
-
-	ctx := context.Background()
-	if err := EnsureDatabase(ctx, dsn, dbName); err != nil {
-		t.Fatalf("EnsureDatabase: %v", err)
-	}
-	if err := EnsureRole(ctx, dsn, roleName, "abcdef0123456789", dbName); err != nil {
-		t.Fatalf("EnsureRole: %v", err)
-	}
-
-	if err := DropDatabase(ctx, dsn, dbName); err != nil {
-		t.Fatalf("DropDatabase: %v", err)
-	}
-	// Idempotent re-drop.
-	if err := DropDatabase(ctx, dsn, dbName); err != nil {
-		t.Fatalf("DropDatabase (idempotent): %v", err)
-	}
-	if err := DropRole(ctx, dsn, roleName); err != nil {
-		t.Fatalf("DropRole: %v", err)
-	}
-	if err := DropRole(ctx, dsn, roleName); err != nil {
-		t.Fatalf("DropRole (idempotent): %v", err)
-	}
-
-	db, err := sql.Open("pgx", dsn)
-	if err != nil {
-		t.Fatalf("open: %v", err)
-	}
-	defer func() { _ = db.Close() }()
-	var dbExists, roleExists bool
-	if err := db.QueryRow("SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname=$1)", dbName).Scan(&dbExists); err != nil {
-		t.Fatalf("verify db gone: %v", err)
-	}
-	if dbExists {
-		t.Errorf("database %s still present after DropDatabase", dbName)
-	}
-	if err := db.QueryRow("SELECT EXISTS(SELECT 1 FROM pg_roles WHERE rolname=$1)", roleName).Scan(&roleExists); err != nil {
-		t.Fatalf("verify role gone: %v", err)
-	}
-	if roleExists {
-		t.Errorf("role %s still present after DropRole", roleName)
-	}
-}
-
-func TestIsSafePGPassword(t *testing.T) {
-	cases := map[string]bool{
-		"abc123": true,
-		"hex0123456789abcdef": true,
-		"with-allowed-chars_=.+/": true,
-		"": false,
-		"has space": false,
-		"has'quote": false,
-		`has"doublequote`: false,
-		"has\nnewline": false,
-		"has;semicolon": false,
-	}
-	for in, want := range cases {
-		if got := isSafePGPassword(in); got != want {
-			t.Errorf("isSafePGPassword(%q) = %v, want %v", in, got, want)
-		}
-	}
-}
-
-func TestQuoteLiteral(t *testing.T) {
-	cases := map[string]string{
-		"plain": `'plain'`,
-		"with'quote": `'with''quote'`,
-		"": `''`,
-		`multiple''ticks`: `'multiple''''ticks'`,
-	}
-	for in, want := range cases {
-		if got := quoteLiteral(in); got != want {
-			t.Errorf("quoteLiteral(%q) = %q, want %q", in, got, want)
-		}
-	}
-}
-
-func TestEnsureRole_RejectsUnsafeInput(t *testing.T) {
-	ctx := context.Background()
-	// Unsafe role
-	if err := EnsureRole(ctx, "postgres://stub", "bad role", "abc123", "db"); err == nil ||
-		!strings.Contains(err.Error(), "unsafe role name") {
-		t.Errorf("expected unsafe role error, got: %v", err)
-	}
-	// Unsafe db
-	if err := EnsureRole(ctx, "postgres://stub", "role", "abc123", "bad db"); err == nil ||
-		!strings.Contains(err.Error(), "unsafe db name") {
-		t.Errorf("expected unsafe db error, got: %v", err)
-	}
-	// Empty password
-	if err := EnsureRole(ctx, "postgres://stub", "role", "", "db"); err == nil ||
-		!strings.Contains(err.Error(), "empty password") {
-		t.Errorf("expected empty password error, got: %v", err)
-	}
-}
-
-func TestEnsureDatabase_RejectsUnsafeIdent(t *testing.T) {
-	err := EnsureDatabase(context.Background(), "postgres://stub", `evil"; DROP DATABASE foo;--`)
-	if err == nil {
-		t.Fatal("expected error for unsafe identifier, got nil")
-	}
-	if !strings.Contains(err.Error(), "unsafe identifier") {
-		t.Fatalf("expected 'unsafe identifier' error, got: %v", err)
-	}
-}
diff --git a/controlplane/provisioning/analytics_events_test.go b/controlplane/provisioning/analytics_events_test.go
index 3bb89af3..848bd1f2 100644
--- a/controlplane/provisioning/analytics_events_test.go
+++ b/controlplane/provisioning/analytics_events_test.go
@@ -55,7 +55,7 @@ func TestProvisionEmitsAnalyticsEvent(t *testing.T) {
 	store := newFakeStore()
 	router := newTestRouter(store)
 
-	body := []byte(`{"database_name": "acme-db", "metadata_store": {"type": "cnpg-shard"}, "iceberg": {"enabled": true}}`)
+	body := []byte(`{"database_name": "acme-db", "metadata_store": {"type": "cnpg-shard"}, "ducklake": {"enabled": true}}`)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/acme/provision", bytes.NewReader(body))
 	req.Header.Set("Content-Type", "application/json")
 	rec := httptest.NewRecorder()
@@ -75,11 +75,11 @@ func TestProvisionEmitsAnalyticsEvent(t *testing.T) {
 	if e.props["metadata_store"] != "cnpg-shard" {
 		t.Errorf("metadata_store = %v, want cnpg-shard", e.props["metadata_store"])
 	}
-	if e.props["iceberg_enabled"] != true {
-		t.Errorf("iceberg_enabled = %v, want true", e.props["iceberg_enabled"])
+	if _, ok := e.props["iceberg_enabled"]; ok {
+		t.Errorf("iceberg_enabled should not be emitted")
 	}
-	if e.props["ducklake_enabled"] != false {
-		t.Errorf("ducklake_enabled = %v, want false", e.props["ducklake_enabled"])
+	if e.props["ducklake_enabled"] != true {
+		t.Errorf("ducklake_enabled = %v, want true", e.props["ducklake_enabled"])
 	}
 }
 
@@ -89,7 +89,7 @@ func TestProvisionFailureEmitsNoEvent(t *testing.T) {
 	router := newTestRouter(store)
 
 	// Missing database_name → 400, no provisioning happens.
-	body := []byte(`{"metadata_store": {"type": "cnpg-shard"}, "iceberg": {"enabled": true}}`)
+	body := []byte(`{"metadata_store": {"type": "cnpg-shard"}, "ducklake": {"enabled": true}}`)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/acme/provision", bytes.NewReader(body))
 	req.Header.Set("Content-Type", "application/json")
 	rec := httptest.NewRecorder()
diff --git a/controlplane/provisioning/api.go b/controlplane/provisioning/api.go
index 06620e25..d7f22dde 100644
--- a/controlplane/provisioning/api.go
+++ b/controlplane/provisioning/api.go
@@ -1,6 +1,7 @@
 package provisioning
 
 import (
+	"encoding/json"
 	"errors"
 	"fmt"
 	"net/http"
@@ -32,8 +33,8 @@ func isUniqueViolation(err error) bool {
 // single DNS-1123 label (lowercase alphanumerics + hyphens, start/end
 // alphanumeric). This is the shape every derived name needs:
 // - the SNI prefix . is a single DNS label already;
-// - the Duckling CR / IAM role / S3 bucket / Lakekeeper CR names use the org
-// ID verbatim (lowercased), so it must be a valid k8s/AWS name;
+// - the Duckling CR / IAM role / S3 bucket names use the org ID verbatim
+// (lowercased), so it must be a valid k8s/AWS name;
 // - the Postgres identifier maps any non-[a-z0-9_] char to '_', which is only
 // injective when the source charset excludes everything but hyphens.
 //
@@ -111,7 +112,7 @@ type provisionRequest struct {
 	MetadataStore *provisionMetadataReq `json:"metadata_store,omitempty"`
 	DataStore *provisionDataStoreReq `json:"data_store,omitempty"`
 	DuckLake *provisionDuckLakeReq `json:"ducklake,omitempty"`
-	Iceberg *provisionIcebergReq `json:"iceberg,omitempty"`
+	Iceberg *json.RawMessage `json:"iceberg,omitempty"`
 }
 
 type provisionMetadataReq struct {
@@ -121,9 +122,8 @@ type provisionMetadataReq struct {
 	External *provisionExternalReq `json:"external,omitempty"`
 }
 
-// provisionDuckLakeReq toggles the DuckLake catalog. Independent of Iceberg and
-// of the metadata-store type: enable DuckLake, Iceberg, or both. At least one
-// catalog must be enabled.
+// provisionDuckLakeReq toggles the DuckLake catalog. DuckLake is the only
+// supported managed warehouse catalog.
 type provisionDuckLakeReq struct {
 	Enabled bool `json:"enabled"`
 }
@@ -148,23 +148,6 @@ type provisionDataStoreReq struct {
 	Region string `json:"region,omitempty"`
 }
 
-// provisionIcebergReq toggles the per-tenant Lakekeeper Iceberg catalog. For
-// external metadata stores it's optional (enabled → iceberg+external, omitted
-// → ducklake+external); for cnpg-shard it's implied and always enabled.
-type provisionIcebergReq struct {
-	Enabled bool `json:"enabled"`
-	Namespace string `json:"namespace,omitempty"`
-}
-
-// icebergNamespace returns the requested Iceberg namespace, or "" to let the
-// XRD default ("main") apply.
-func icebergNamespace(req *provisionIcebergReq) string {
-	if req == nil {
-		return ""
-	}
-	return req.Namespace
-}
-
 // resolveDataStore validates and normalizes the data-store request into the
 // stored intent. Nil or "s3bucket" provisions a fresh per-org bucket;
 // "external" reuses an existing bucket and requires a bucket name.
@@ -209,14 +192,14 @@ func (h *handler) provisionWarehouse(c *gin.Context) {
 		return
 	}
 
-	// Catalogs are decoupled from the metadata backend: a duckling can run
-	// DuckLake, Iceberg, or both, on any of the three metadata stores. At least
-	// one catalog must be enabled (a warehouse with neither has nothing to
-	// attach).
+	if req.Iceberg != nil {
+		c.JSON(http.StatusBadRequest, gin.H{"error": "iceberg provisioning is no longer supported; set ducklake.enabled to true"})
+		return
+	}
+
 	ducklakeEnabled := req.DuckLake != nil && req.DuckLake.Enabled
-	icebergEnabled := req.Iceberg != nil && req.Iceberg.Enabled
-	if !ducklakeEnabled && !icebergEnabled {
-		c.JSON(http.StatusBadRequest, gin.H{"error": "at least one of ducklake.enabled or iceberg.enabled must be true"})
+	if !ducklakeEnabled {
+		c.JSON(http.StatusBadRequest, gin.H{"error": "ducklake.enabled must be true"})
 		return
 	}
 
@@ -241,17 +224,10 @@ func (h *handler) provisionWarehouse(c *gin.Context) {
 		DataStore: ds,
 		DuckLake: configstore.ManagedWarehouseDuckLake{Enabled: ducklakeEnabled},
 	}
-	if icebergEnabled {
-		warehouse.Iceberg = configstore.ManagedWarehouseIceberg{
-			Enabled: true,
-			Backend: configstore.IcebergBackendLakekeeper,
-			Namespace: icebergNamespace(req.Iceberg),
-		}
-	}
 
-	// Metadata backend (the Postgres that hosts the DuckLake catalog and/or the
-	// Lakekeeper PG). Provisioning shape differs per type; the catalog choice
-	// above is orthogonal.
+	// Metadata backend (the Postgres that hosts the DuckLake catalog).
+	// Provisioning shape differs per type; the catalog choice above is
+	// orthogonal.
 	switch req.MetadataStore.Type {
 	case configstore.MetadataStoreKindCnpgShard:
 		// No per-claim config — the composition picks the active shard from
@@ -332,7 +308,6 @@ func (h *handler) provisionWarehouse(c *gin.Context) {
 		"database_name": req.DatabaseName,
 		"metadata_store": string(req.MetadataStore.Type),
 		"ducklake_enabled": ducklakeEnabled,
-		"iceberg_enabled": icebergEnabled,
 	})
 
 	resp := gin.H{
diff --git a/controlplane/provisioning/api_test.go b/controlplane/provisioning/api_test.go
index 01fd1ba6..06a87760 100644
--- a/controlplane/provisioning/api_test.go
+++ b/controlplane/provisioning/api_test.go
@@ -207,7 +207,7 @@ func TestProvisionAutoCreatesOrg(t *testing.T) {
 	store := newFakeStore()
 	router := newTestRouter(store)
 
-	body := []byte(`{"database_name": "test-db", "metadata_store": {"type": "cnpg-shard"}, "iceberg": {"enabled": true}}`)
+	body := []byte(`{"database_name": "test-db", "metadata_store": {"type": "cnpg-shard"}, "ducklake": {"enabled": true}}`)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/new-org/provision", bytes.NewReader(body))
 	req.Header.Set("Content-Type", "application/json")
 	rec := httptest.NewRecorder()
@@ -248,7 +248,7 @@ func TestProvisionRejectsExistingNonTerminal(t *testing.T) {
 	}
 	router := newTestRouter(store)
 
-	body := []byte(`{"database_name": "test-db", "metadata_store": {"type": "cnpg-shard"}, "iceberg": {"enabled": true}}`)
+	body := []byte(`{"database_name": "test-db", "metadata_store": {"type": "cnpg-shard"}, "ducklake": {"enabled": true}}`)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/analytics/provision", bytes.NewReader(body))
 	req.Header.Set("Content-Type", "application/json")
 	rec := httptest.NewRecorder()
@@ -269,7 +269,7 @@ func TestProvisionAllowsRetryAfterFailure(t *testing.T) {
 	}
 	router := newTestRouter(store)
 
-	body := []byte(`{"database_name": "analytics-db", "metadata_store": {"type": "cnpg-shard"}, "iceberg": {"enabled": true}}`)
+	body := []byte(`{"database_name": "analytics-db", "metadata_store": {"type": "cnpg-shard"}, "ducklake": {"enabled": true}}`)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/analytics/provision", bytes.NewReader(body))
 	req.Header.Set("Content-Type", "application/json")
 	rec := httptest.NewRecorder()
@@ -293,7 +293,7 @@ func TestProvisionAllowsRetryAfterDeleted(t *testing.T) {
 	}
 	router := newTestRouter(store)
 
-	body := []byte(`{"database_name": "analytics-db", "metadata_store": {"type": "cnpg-shard"}, "iceberg": {"enabled": true}}`)
+	body := []byte(`{"database_name": "analytics-db", "metadata_store": {"type": "cnpg-shard"}, "ducklake": {"enabled": true}}`)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/analytics/provision", bytes.NewReader(body))
 	req.Header.Set("Content-Type", "application/json")
 	rec := httptest.NewRecorder()
@@ -439,7 +439,7 @@ func TestProvisionTransactionRollsBackOnUserFailure(t *testing.T) {
 	store.setProvisionUserFailHook(errors.New("simulated DB write failure"))
 	router := newTestRouter(store)
 
-	body := []byte(`{"database_name": "team-7-db", "metadata_store": {"type": "cnpg-shard"}, "iceberg": {"enabled": true}}`)
+	body := []byte(`{"database_name": "team-7-db", "metadata_store": {"type": "cnpg-shard"}, "ducklake": {"enabled": true}}`)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/7/provision", bytes.NewReader(body))
 	req.Header.Set("Content-Type", "application/json")
 	rec := httptest.NewRecorder()
@@ -502,8 +502,7 @@ func TestProvisionCnpgShard(t *testing.T) {
 	store.orgs["shardco"] = &configstore.Org{Name: "shardco"}
 	router := newTestRouter(store)
 
-	// cnpg-shard takes no sizing and auto-enables iceberg.
-	body := []byte(`{"database_name": "shardco-db", "metadata_store": {"type": "cnpg-shard"}, "iceberg": {"enabled": true}}`)
+	body := []byte(`{"database_name": "shardco-db", "metadata_store": {"type": "cnpg-shard"}, "ducklake": {"enabled": true}}`)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/shardco/provision", bytes.NewReader(body))
 	req.Header.Set("Content-Type", "application/json")
 	rec := httptest.NewRecorder()
@@ -520,26 +519,26 @@ func TestProvisionCnpgShard(t *testing.T) {
 	if w.MetadataStore.Kind != configstore.MetadataStoreKindCnpgShard {
 		t.Errorf("metadata store kind = %q, want cnpg-shard", w.MetadataStore.Kind)
 	}
-	if !w.Iceberg.Enabled || w.Iceberg.Backend != configstore.IcebergBackendLakekeeper {
-		t.Errorf("expected iceberg enabled with lakekeeper backend, got %+v", w.Iceberg)
+	if !w.DuckLake.Enabled {
+		t.Errorf("expected ducklake enabled, got %+v", w.DuckLake)
 	}
 }
 
-func TestProvisionIcebergExternal(t *testing.T) {
+func TestProvisionDuckLakeExternalWithCustomDatabase(t *testing.T) {
 	store := newFakeStore()
 	router := newTestRouter(store)
 
 	body := []byte(`{
-		"database_name": "extice-db",
+		"database_name": "extcustom-db",
 		"metadata_store": {"type": "external", "external": {
 			"endpoint": "rds.example.us-east-1.rds.amazonaws.com",
 			"password_aws_secret": "duckling-example-rds-password",
 			"user": "postgres",
 			"database": "postgres"
 		}},
 		"data_store": {"type": "external", "bucket_name": "posthog-duckling-example", "region": "us-east-1"},
-		"iceberg": {"enabled": true}
+		"ducklake": {"enabled": true}
 	}`)
-	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/extice/provision", bytes.NewReader(body))
+	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/extcustom/provision", bytes.NewReader(body))
 	req.Header.Set("Content-Type", "application/json")
 	rec := httptest.NewRecorder()
 	router.ServeHTTP(rec, req)
@@ -547,7 +546,7 @@ func TestProvisionIcebergExternal(t *testing.T) {
 	if rec.Code != http.StatusAccepted {
 		t.Fatalf("status = %d, want %d: %s", rec.Code, http.StatusAccepted, rec.Body.String())
 	}
-	w := store.warehouses["extice"]
+	w := store.warehouses["extcustom"]
 	if w == nil {
 		t.Fatal("expected warehouse to be created")
 		return
@@ -564,17 +563,12 @@ func TestProvisionIcebergExternal(t *testing.T) {
 	if w.DataStore.Kind != "external" || w.DataStore.BucketName != "posthog-duckling-example" || w.DataStore.Region != "us-east-1" {
 		t.Errorf("data store not persisted: %+v", w.DataStore)
 	}
-	if !w.Iceberg.Enabled || w.Iceberg.Backend != configstore.IcebergBackendLakekeeper {
-		t.Errorf("expected iceberg enabled with lakekeeper backend, got %+v", w.Iceberg)
-	}
-	// Decoupled: iceberg without a ducklake flag is iceberg-ONLY (no implicit DuckLake).
-	if w.DuckLake.Enabled {
-		t.Errorf("iceberg+external without a ducklake flag must NOT enable DuckLake; got ducklake=%v", w.DuckLake.Enabled)
+	if !w.DuckLake.Enabled {
+		t.Errorf("expected ducklake enabled, got %+v", w.DuckLake)
 	}
 }
 
-// TestProvisionRejectsNoCatalog verifies the ≥1-catalog gate: a duckling with
-// neither ducklake nor iceberg is rejected.
+// TestProvisionRejectsNoCatalog verifies DuckLake must be enabled.
 func TestProvisionRejectsNoCatalog(t *testing.T) {
 	store := newFakeStore()
 	router := newTestRouter(store)
@@ -591,25 +585,19 @@ func TestProvisionRejectsNoCatalog(t *testing.T) {
 	}
 }
 
-// TestProvisionCnpgDuckLakeAndIceberg verifies the fully-decoupled combo:
-// cnpg-shard with both catalogs.
-func TestProvisionCnpgDuckLakeAndIceberg(t *testing.T) {
+func TestProvisionRejectsIcebergCatalogRequest(t *testing.T) {
 	store := newFakeStore()
 	router := newTestRouter(store)
-	body := []byte(`{"database_name":"both-db","metadata_store":{"type":"cnpg-shard"},"ducklake":{"enabled":true},"iceberg":{"enabled":true}}`)
-	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/bothco/provision", bytes.NewReader(body))
+	body := []byte(`{"database_name":"iceberg-db","metadata_store":{"type":"cnpg-shard"},"iceberg":{"enabled":true}}`)
+	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/acme/provision", bytes.NewReader(body))
 	req.Header.Set("Content-Type", "application/json")
 	rec := httptest.NewRecorder()
 	router.ServeHTTP(rec, req)
-	if rec.Code != http.StatusAccepted {
-		t.Fatalf("status = %d, want 202: %s", rec.Code, rec.Body.String())
-	}
-	w := store.warehouses["bothco"]
-	if w == nil || w.MetadataStore.Kind != configstore.MetadataStoreKindCnpgShard {
-		t.Fatalf("expected cnpg-shard warehouse, got %+v", w)
+	if rec.Code != http.StatusBadRequest {
+		t.Fatalf("status = %d, want 400: %s", rec.Code, rec.Body.String())
 	}
-	if !w.DuckLake.Enabled || !w.Iceberg.Enabled {
-		t.Errorf("expected both ducklake and iceberg enabled; got ducklake=%v iceberg=%v", w.DuckLake.Enabled, w.Iceberg.Enabled)
+	if _, ok := store.warehouses["acme"]; ok {
+		t.Error("iceberg provision request must not create a warehouse")
 	}
 }
 
@@ -617,7 +605,6 @@ func TestProvisionDuckLakeExternal(t *testing.T) {
 	store := newFakeStore()
 	router := newTestRouter(store)
 
-	// ducklake on, no iceberg → DuckLake-only on external.
 	body := []byte(`{
 		"database_name": "extdl-db",
 		"metadata_store": {"type": "external", "external": {
@@ -643,8 +630,8 @@ func TestProvisionDuckLakeExternal(t *testing.T) {
 	if w.MetadataStore.Kind != configstore.MetadataStoreKindExternal {
 		t.Errorf("metadata store kind = %q, want external", w.MetadataStore.Kind)
 	}
-	if !w.DuckLake.Enabled || w.Iceberg.Enabled {
-		t.Errorf("ducklake-only+external: want ducklake on, iceberg off; got ducklake=%v iceberg=%v", w.DuckLake.Enabled, w.Iceberg.Enabled)
+	if !w.DuckLake.Enabled {
+		t.Errorf("ducklake-only+external: want ducklake on, got ducklake=%v", w.DuckLake.Enabled)
 	}
 }
 
@@ -773,11 +760,11 @@ func TestProvisionRejectsUnsupportedMetadataStore(t *testing.T) {
 }
 
 func TestProvisionRejectsInvalidOrgID(t *testing.T) {
-	for _, bad := range []string{"ben.iceberg", "Ben-Iceberg", "ben_iceberg", "-bad", "bad-"} {
+	for _, bad := range []string{"ben.ducklake", "Ben-Ducklake", "ben_ducklake", "-bad", "bad-"} {
 		t.Run(bad, func(t *testing.T) {
 			store := newFakeStore()
 			router := newTestRouter(store)
-			body := []byte(`{"database_name":"d","metadata_store":{"type":"cnpg-shard"},"iceberg":{"enabled":true}}`)
+			body := []byte(`{"database_name":"d","metadata_store":{"type":"cnpg-shard"},"ducklake":{"enabled":true}}`)
 			req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/"+bad+"/provision", bytes.NewReader(body))
 			req.Header.Set("Content-Type", "application/json")
 			rec := httptest.NewRecorder()
@@ -795,8 +782,8 @@ func TestProvisionRejectsInvalidOrgID(t *testing.T) {
 func TestProvisionAcceptsHyphenatedOrgID(t *testing.T) {
 	store := newFakeStore()
 	router := newTestRouter(store)
-	body := []byte(`{"database_name":"d","metadata_store":{"type":"cnpg-shard"},"iceberg":{"enabled":true}}`)
-	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/ben-iceberg-cnpg/provision", bytes.NewReader(body))
+	body := []byte(`{"database_name":"d","metadata_store":{"type":"cnpg-shard"},"ducklake":{"enabled":true}}`)
+	req := httptest.NewRequest(http.MethodPost, "/api/v1/orgs/ben-ducklake-cnpg/provision", bytes.NewReader(body))
 	req.Header.Set("Content-Type", "application/json")
 	rec := httptest.NewRecorder()
 	router.ServeHTTP(rec, req)
diff --git a/controlplane/provisioning/store.go b/controlplane/provisioning/store.go
index 4c481dc8..050c234a 100644
--- a/controlplane/provisioning/store.go
+++ b/controlplane/provisioning/store.go
@@ -120,13 +120,6 @@ func createPendingWarehouseTx(tx *gorm.DB, orgID, databaseName string, warehouse
 	warehouse.S3State = configstore.ManagedWarehouseStatePending
 	warehouse.IdentityState = configstore.ManagedWarehouseStatePending
 	warehouse.SecretsState = configstore.ManagedWarehouseStatePending
-	// Track iceberg as a provisioning component only when the tenant opted
-	// in (e.g. cnpg-shard, which is always iceberg-backed). Leaving it
-	// empty for non-iceberg warehouses keeps them out of the iceberg
-	// readiness gate.
-	if warehouse.Iceberg.Enabled {
-		warehouse.IcebergState = configstore.ManagedWarehouseStatePending
-	}
 
 	return tx.Create(warehouse).Error
 }
diff --git a/controlplane/session_search_path.go b/controlplane/session_search_path.go
index dd05d698..a69d0bb8 100644
--- a/controlplane/session_search_path.go
+++ b/controlplane/session_search_path.go
@@ -3,8 +3,6 @@ package controlplane
 import (
 	"fmt"
 	"strings"
-
-	"github.com/posthog/duckgres/server/iceberg"
 )
 
 type sessionSearchPathSource string
@@ -20,46 +18,20 @@ const (
 // HasAttachedCatalog probe in control.go in sync.
 const physicalDuckLakeCatalog = "ducklake"
 
-// physicalIcebergCatalog is the name the per-tenant Iceberg catalog is attached
-// as on the worker (the `ATTACH ... AS iceberg` during activation).
-const physicalIcebergCatalog = iceberg.CatalogName
-
 // effectiveSessionDefaultCommand returns the connect-time command for a
 // non-passthrough session, given the resolved real catalog the session defaults
-// to (effectiveCatalog, one of "ducklake"/"iceberg").
+// to.
 //
 // For DuckLake the catalog switch is owned by InitSessionDatabaseMetadata's
 // defer (which also restores memory.main on the search_path so the pg_catalog
 // compat macros stay resolvable), so a DuckLake session only needs a command
 // when the client supplied its own search_path.
-//
-// For Iceberg there is no such defer, so the `USE iceberg.` catalog
-// switch MUST be issued here — even when the client also supplied a search_path.
-// Otherwise the session stays in the ephemeral `memory` catalog while
-// current_database() reports 'iceberg', and unqualified DDL/DML silently misses
-// the warehouse. The Iceberg USE is load-bearing, so when it is combined with a
-// client search_path the command fails closed (sessionDefaultSourceConfiguredCatalog)
-// rather than treating the whole thing as a best-effort search_path.
 func effectiveSessionDefaultCommand(clientSearchPath, effectiveCatalog string) (string, sessionSearchPathSource) {
-	icebergUse := ""
-	if effectiveCatalog == iceberg.CatalogName {
-		icebergUse = fmt.Sprintf("USE %s.%s", iceberg.CatalogName, iceberg.DefaultSchema)
-	}
-
-	switch {
-	case clientSearchPath != "":
+	if clientSearchPath != "" {
 		searchPath := fmt.Sprintf("SET search_path = '%s'", ensureMemoryMainInSearchPath(clientSearchPath))
-		if icebergUse != "" {
-			// Switch into Iceberg first, then apply the client search_path (the
-			// USE resets it). The catalog switch is fail-closed.
-			return icebergUse + "; " + searchPath, sessionDefaultSourceConfiguredCatalog
-		}
 		return searchPath, sessionSearchPathSourceClient
-	case icebergUse != "":
-		return icebergUse, sessionDefaultSourceConfiguredCatalog
-	default:
-		return "", ""
 	}
+	return "", ""
 }
 
 // passthroughSessionDefaultCatalogCommand returns the connect-time command that
@@ -67,61 +39,43 @@ func effectiveSessionDefaultCommand(clientSearchPath, effectiveCatalog string) (
 // Passthrough users skip InitSessionDatabaseMetadata (whose defer issues the
 // catalog `USE` for the standard path), so without this the session stays in
 // DuckDB's empty in-memory catalog — current_database() reports "memory" and
-// unqualified DDL/DML never reaches the warehouse. Mirrors
-// server.setIcebergDefault / setDuckLakeDefault used by the standalone
-// passthrough path.
+// unqualified DDL/DML never reaches the warehouse.
 func passthroughSessionDefaultCatalogCommand(effectiveCatalog string) string {
-	switch effectiveCatalog {
-	case iceberg.CatalogName:
-		return fmt.Sprintf("USE %s.%s", iceberg.CatalogName, iceberg.DefaultSchema)
-	case physicalDuckLakeCatalog:
+	if effectiveCatalog == physicalDuckLakeCatalog {
 		return "USE " + physicalDuckLakeCatalog
-	default:
-		return ""
 	}
+	return ""
 }
 
 // resolveEffectiveCatalog picks the real catalog a session should default to.
 // requested is the validated startup selection ("" → use the per-user/attached
-// default, "ducklake", or "iceberg"). defaultCatalog is the per-user configured
-// default ("" or "iceberg"). duckLakeAttached/icebergAttached reflect what the
-// worker actually attached for this session. The bool is false when the
-// requested catalog isn't attached (caller should fail the connection 3D000) or
-// nothing is attached at all.
-func resolveEffectiveCatalog(requested, defaultCatalog string, duckLakeAttached, icebergAttached bool) (string, bool) {
+// default, or "ducklake"). defaultCatalog is the per-user configured default
+// ("" or "ducklake"). duckLakeAttached reflects what the worker actually
+// attached for this session. The bool is false when the requested catalog isn't
+// attached (caller should fail the connection 3D000) or nothing is attached.
+func resolveEffectiveCatalog(requested, defaultCatalog string, duckLakeAttached bool) (string, bool) {
 	switch requested {
 	case physicalDuckLakeCatalog:
 		if duckLakeAttached {
 			return physicalDuckLakeCatalog, true
 		}
 		return "", false
-	case iceberg.CatalogName:
-		if icebergAttached {
-			return iceberg.CatalogName, true
-		}
-		return "", false
 	}
 	// requested == "": fall back to the per-user configured default. If the user
 	// explicitly configured a default catalog, honor it strictly — fail closed if
 	// it isn't attached rather than silently routing to a different catalog (the
 	// connect path turns the false into a 3D000). This preserves the pre-rework
 	// fail-closed contract for configured catalogs.
-	if defaultCatalog == iceberg.CatalogName {
-		if icebergAttached {
-			return iceberg.CatalogName, true
+	if defaultCatalog == physicalDuckLakeCatalog {
+		if duckLakeAttached {
+			return physicalDuckLakeCatalog, true
 		}
 		return "", false
 	}
-	// No configured default: use whatever is attached (DuckLake preferred, then
-	// Iceberg for iceberg-only orgs).
-	switch {
-	case duckLakeAttached:
+	if duckLakeAttached {
 		return physicalDuckLakeCatalog, true
-	case icebergAttached:
-		return iceberg.CatalogName, true
-	default:
-		return "", false
 	}
+	return "", false
 }
 
 func ensureMemoryMainInSearchPath(searchPath string) string {
diff --git a/controlplane/session_search_path_test.go b/controlplane/session_search_path_test.go
index e40ee676..923006a7 100644
--- a/controlplane/session_search_path_test.go
+++ b/controlplane/session_search_path_test.go
@@ -2,19 +2,6 @@ package controlplane
 
 import "testing"
 
-func TestEffectiveSessionDefaultCommandIcebergSwitchSurvivesClientSearchPath(t *testing.T) {
-	// An Iceberg session with a client-supplied search_path must STILL switch
-	// into the Iceberg catalog (there is no InitSessionDatabaseMetadata defer to
-	// do it). The catalog switch precedes the search_path and is fail-closed.
-	got, source := effectiveSessionDefaultCommand("public", "iceberg")
-	if got != "USE iceberg.public; SET search_path = 'public,memory.main'" {
-		t.Fatalf("command = %q, want USE iceberg.public; SET search_path = 'public,memory.main'", got)
-	}
-	if source != sessionDefaultSourceConfiguredCatalog {
-		t.Fatalf("source = %q, want %q", source, sessionDefaultSourceConfiguredCatalog)
-	}
-}
-
 func TestEffectiveSessionDefaultCommandDuckLakeClientSearchPathOnly(t *testing.T) {
 	// DuckLake's catalog switch is owned by InitSessionDatabaseMetadata's defer,
 	// so a client search_path is applied alone and best-effort.
@@ -27,16 +14,6 @@ func TestEffectiveSessionDefaultCommandDuckLakeClientSearchPathOnly(t *testing.T
 	}
 }
 
-func TestEffectiveSessionDefaultCommandUsesIcebergCatalogWhenClientOmitted(t *testing.T) {
-	got, source := effectiveSessionDefaultCommand("", "iceberg")
-	if got != "USE iceberg.public" {
-		t.Fatalf("command = %q, want USE iceberg.public", got)
-	}
-	if source != sessionDefaultSourceConfiguredCatalog {
-		t.Fatalf("source = %q, want %q", source, sessionDefaultSourceConfiguredCatalog)
-	}
-}
-
 func TestEffectiveSessionDefaultCommandEmptyForDuckLake(t *testing.T) {
 	// DuckLake's catalog switch is owned by InitSessionDatabaseMetadata's defer,
 	// so the connect-time command for a ducklake session is empty.
@@ -56,7 +33,6 @@ func TestPassthroughSessionDefaultCatalogCommand(t *testing.T) {
 		want string
 	}{
 		{name: "ducklake selected", effectiveCatalog: "ducklake", want: "USE ducklake"},
-		{name: "iceberg selected", effectiveCatalog: "iceberg", want: "USE iceberg.public"},
 		{name: "nothing resolved leaves session as-is", effectiveCatalog: "", want: ""},
 	}
 	for _, tt := range tests {
@@ -74,27 +50,22 @@ func TestResolveEffectiveCatalog(t *testing.T) {
 		requested string
 		defaultCatalog string
 		duckLake bool
-		iceberg bool
 		want string
 		wantOK bool
 	}{
-		{name: "explicit ducklake attached", requested: "ducklake", duckLake: true, iceberg: true, want: "ducklake", wantOK: true},
-		{name: "explicit iceberg attached", requested: "iceberg", duckLake: true, iceberg: true, want: "iceberg", wantOK: true},
-		{name: "explicit ducklake not attached", requested: "ducklake", duckLake: false, iceberg: true, want: "", wantOK: false},
-		{name: "explicit iceberg not attached", requested: "iceberg", duckLake: true, iceberg: false, want: "", wantOK: false},
-		{name: "default prefers ducklake", requested: "", duckLake: true, iceberg: true, want: "ducklake", wantOK: true},
-		{name: "default honors per-user iceberg", requested: "", defaultCatalog: "iceberg", duckLake: true, iceberg: true, want: "iceberg", wantOK: true},
-		{name: "configured iceberg default not attached fails closed", requested: "", defaultCatalog: "iceberg", duckLake: true, iceberg: false, want: "", wantOK: false},
-		{name: "default falls back to iceberg-only", requested: "", duckLake: false, iceberg: true, want: "iceberg", wantOK: true},
-		{name: "nothing attached fails", requested: "", duckLake: false, iceberg: false, want: "", wantOK: false},
+		{name: "explicit ducklake attached", requested: "ducklake", duckLake: true, want: "ducklake", wantOK: true},
+		{name: "explicit ducklake not attached", requested: "ducklake", duckLake: false, want: "", wantOK: false},
+		{name: "default uses ducklake", requested: "", duckLake: true, want: "ducklake", wantOK: true},
+		{name: "configured ducklake default attached", requested: "", defaultCatalog: "ducklake", duckLake: true, want: "ducklake", wantOK: true},
+		{name: "configured ducklake default not attached fails closed", requested: "", defaultCatalog: "ducklake", duckLake: false, want: "", wantOK: false},
+		{name: "nothing attached fails", requested: "", duckLake: false, want: "", wantOK: false},
 	}
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
-			got, ok := resolveEffectiveCatalog(tt.requested, tt.defaultCatalog, tt.duckLake, tt.iceberg)
+			got, ok := resolveEffectiveCatalog(tt.requested, tt.defaultCatalog, tt.duckLake)
 			if got != tt.want || ok != tt.wantOK {
 				t.Fatalf("resolveEffectiveCatalog = (%q, %v), want (%q, %v)", got, ok, tt.want, tt.wantOK)
 			}
 		})
 	}
 }
-
diff --git a/controlplane/shared_worker_activator.go b/controlplane/shared_worker_activator.go
index 13436477..a1eb967c 100644
--- a/controlplane/shared_worker_activator.go
+++ b/controlplane/shared_worker_activator.go
@@ -66,9 +66,6 @@ type TenantActivationPayload struct {
 	OrgID string `json:"org_id"`
 	Usernames []string `json:"usernames,omitempty"`
 	DuckLake server.DuckLakeConfig `json:"ducklake"`
-	// Iceberg is the per-tenant Iceberg catalog (AWS S3 Tables) config.
-	// Empty when the tenant has not opted in or hasn't been provisioned yet.
-	Iceberg server.IcebergConfig `json:"iceberg"`
 	// S3CredentialsExpiresAt is the absolute expiration time of the STS
 	// credentials embedded in DuckLake.{S3AccessKey,S3SecretKey,S3SessionToken}.
 	// nil for non-STS payloads (config-store-driven warehouses use static
@@ -171,7 +168,6 @@ func (a *SharedWorkerActivator) ActivateReservedWorker(ctx context.Context, work
 		},
 		OrgID: payload.OrgID,
 		DuckLake: payload.DuckLake,
-		Iceberg: payload.Iceberg,
 	})
 }
 
@@ -320,7 +316,6 @@ func (a *SharedWorkerActivator) RefreshCredentials(ctx context.Context, worker *
 		},
 		OrgID: payload.OrgID,
 		DuckLake: payload.DuckLake,
-		Iceberg: payload.Iceberg,
 	}
 	if err := worker.ActivateTenant(ctx, rpcPayload); err != nil {
 		return fmt.Errorf("activate tenant for refresh: %w", err)
@@ -397,16 +392,10 @@ func (a *SharedWorkerActivator) BuildActivationRequest(ctx context.Context, org
 	}
 	dl.SpecVersion = targetSpecVersion
 
-	ic, err := a.buildIcebergConfig(ctx, assignment.OrgID, &org.Warehouse.Iceberg)
-	if err != nil {
-		return TenantActivationPayload{}, err
-	}
-
 	return TenantActivationPayload{
 		OrgID: assignment.OrgID,
 		Usernames: usernames,
 		DuckLake: dl,
-		Iceberg: ic,
 		S3CredentialsExpiresAt: expiresAt,
 	}, nil
 }
@@ -431,34 +420,24 @@ func (a *SharedWorkerActivator) buildDuckLakeConfigFromDuckling(ctx context.Cont
 		S3URLStyle: "vhost",
 	}
 
-	// DuckLake is attached iff this tenant has it enabled. The CR's
-	// spec.ducklake.enabled is authoritative (decoupled ducklings); legacy CRs
-	// that predate the field fall back to the historical coupling — DuckLake on
-	// for external, off for cnpg-shard. When on, the catalog lives in the
-	// metadata Postgres (the per-tenant lakekeeper_ DB on cnpg, or the
-	// metadata DB on external); when off the worker attaches Iceberg only
-	// (server.ActivateDBConnection takes its iceberg-only branch).
-	ducklakeEnabled := status.MetadataStore.Type != configstore.MetadataStoreKindCnpgShard
-	if status.DuckLakeEnabled != nil {
-		ducklakeEnabled = *status.DuckLakeEnabled
-	}
-	if ducklakeEnabled {
-		if status.MetadataStore.Password == "" {
-			return server.DuckLakeConfig{}, nil, fmt.Errorf("duckling CR %q has DuckLake enabled but no metadata store password", orgID)
-		}
-		host, port, viaPgBouncer, err := ducklingMetadataStoreAddress(status, orgID)
-		if err != nil {
-			return server.DuckLakeConfig{}, nil, err
-		}
-		dl.MetadataStore = buildDuckLakeMetadataStoreDSN(
-			host,
-			port,
-			status.MetadataStore.User,
-			status.MetadataStore.Password,
-			status.MetadataStore.Database,
-		)
-		dl.ViaPgBouncer = viaPgBouncer
+	if status.DuckLakeEnabled == nil || !*status.DuckLakeEnabled {
+		return server.DuckLakeConfig{}, nil, fmt.Errorf("duckling CR %q does not have DuckLake enabled", orgID)
+	}
+	if status.MetadataStore.Password == "" {
+		return server.DuckLakeConfig{}, nil, fmt.Errorf("duckling CR %q has DuckLake enabled but no metadata store password", orgID)
+	}
+	host, port, viaPgBouncer, err := ducklingMetadataStoreAddress(status, orgID)
+	if err != nil {
+		return server.DuckLakeConfig{}, nil, err
 	}
+	dl.MetadataStore = buildDuckLakeMetadataStoreDSN(
+		host,
+		port,
+		status.MetadataStore.User,
+		status.MetadataStore.Password,
+		status.MetadataStore.Database,
+	)
+	dl.ViaPgBouncer = viaPgBouncer
 
 	// Broker S3 credentials via STS AssumeRole
 	if status.IAMRoleARN == "" {
@@ -541,11 +520,9 @@ func (a *SharedWorkerActivator) buildDuckLakeConfigFromConfigStore(ctx context.C
 		dl.S3SecretKey = secretKey
 		// session_token is optional in the secret payload — long-term IAM
 		// user keys don't have one. STS-vended temporary credentials
-		// (AccessKeyId starting with ASIA…) require it: AWS rejects the
-		// signing identity without the token and the iceberg REST endpoint
-		// returns 403. Letting the field through lets sandbox/CI fixtures
-		// that source creds from STS use the same secret-ref schema as
-		// production's long-term keys.
+		// (AccessKeyId starting with ASIA...) require it for AWS signing.
+		// Letting the field through lets sandbox/CI fixtures that source creds
+		// from STS use the same secret-ref schema as production's long-term keys.
 		dl.S3SessionToken = sessionToken
 	case strings.EqualFold(warehouse.S3.Provider, "aws"):
 		roleARN := warehouse.WorkerIdentity.IAMRoleARN
@@ -587,39 +564,6 @@ func BuildTenantActivationPayload(ctx context.Context, clientset kubernetes.Inte
 	return activator.BuildActivationRequest(ctx, org, assignment)
 }
 
-// buildIcebergConfig maps a stored ManagedWarehouseIceberg into the wire-level
-// IcebergConfig that ships to workers. Lakekeeper is the only backend, so the
-// fields populated here all describe the per-tenant Lakekeeper REST catalog.
-// Empty Lakekeeper fields are treated as "provisioner hasn't filled this in
-// yet" and the worker returns no-op for that org.
-//
-// With a non-empty LakekeeperClientCredentials SecretRef, the OAuth2
-// client_secret is resolved via readSecretValue just before sending. Empty
-// SecretRef is fine (allowall mode; OIDC SA-token auth supersedes this when
-// configured).
-func (a *SharedWorkerActivator) buildIcebergConfig(ctx context.Context, orgID string, src *configstore.ManagedWarehouseIceberg) (server.IcebergConfig, error) {
-	ic := server.IcebergConfig{
-		Enabled: src.Enabled,
-		Backend: src.Backend,
-		Namespace: src.Namespace,
-		Region: src.Region,
-	}
-	// Lakekeeper is the only supported backend; populate its fields
-	// unconditionally.
- ic.LakekeeperEndpoint = src.LakekeeperEndpoint - ic.LakekeeperWarehouse = src.LakekeeperWarehouse - ic.LakekeeperClientID = src.LakekeeperClientID - ic.LakekeeperOAuth2ServerURI = src.LakekeeperOAuth2ServerURI - if src.LakekeeperClientCredentials.Name != "" { - val, err := a.readSecretValue(ctx, src.LakekeeperClientCredentials) - if err != nil { - return server.IcebergConfig{}, fmt.Errorf("resolve lakekeeper client credentials for org %q: %w", orgID, err) - } - ic.LakekeeperClientSecret = val - } - return ic, nil -} - func (a *SharedWorkerActivator) readSecretValue(ctx context.Context, ref configstore.SecretRef) (string, error) { if strings.TrimSpace(ref.Name) == "" || strings.TrimSpace(ref.Key) == "" { return "", fmt.Errorf("secret ref requires name and key") diff --git a/controlplane/shared_worker_activator_iceberg_test.go b/controlplane/shared_worker_activator_iceberg_test.go deleted file mode 100644 index 27709ece..00000000 --- a/controlplane/shared_worker_activator_iceberg_test.go +++ /dev/null @@ -1,113 +0,0 @@ -//go:build kubernetes - -package controlplane - -import ( - "context" - "testing" - - "github.com/posthog/duckgres/controlplane/configstore" - "github.com/posthog/duckgres/server/iceberg" - corev1 "k8s.io/api/core/v1" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/client-go/kubernetes/fake" -) - -func TestBuildIcebergConfig_LakekeeperAllowall(t *testing.T) { - cs := fake.NewSimpleClientset() - a := &SharedWorkerActivator{clientset: cs, defaultNamespace: "duckgres"} - - src := &configstore.ManagedWarehouseIceberg{ - Enabled: true, - Backend: iceberg.BackendLakekeeper, - Namespace: "main", - LakekeeperEndpoint: "http://lakekeeper-acme.lakekeeper.svc:8181/catalog", - LakekeeperWarehouse: "org-acme", - LakekeeperClientID: "duckling-acme", - LakekeeperOAuth2ServerURI: "", // allowall mode - // LakekeeperClientCredentials is empty → no secret resolution - } - ic, err := a.buildIcebergConfig(context.Background(), "acme", src) - if err != nil { - 
t.Fatalf("buildIcebergConfig: %v", err) - } - if ic.LakekeeperEndpoint != src.LakekeeperEndpoint { - t.Errorf("LakekeeperEndpoint = %q, want %q", ic.LakekeeperEndpoint, src.LakekeeperEndpoint) - } - if ic.LakekeeperWarehouse != "org-acme" { - t.Errorf("LakekeeperWarehouse = %q, want org-acme", ic.LakekeeperWarehouse) - } - if ic.LakekeeperClientSecret != "" { - t.Errorf("allowall mode should not call readSecretValue; got LakekeeperClientSecret=%q", ic.LakekeeperClientSecret) - } -} - -func TestBuildIcebergConfig_LakekeeperResolvesOAuth2Secret(t *testing.T) { - const ns = "lakekeeper" - cs := fake.NewSimpleClientset(&corev1.Secret{ - ObjectMeta: metav1.ObjectMeta{Name: "lakekeeper-acme", Namespace: ns}, - Data: map[string][]byte{"oauth2-client-secret": []byte("super-secret-token")}, - }) - a := &SharedWorkerActivator{clientset: cs, defaultNamespace: ns} - - src := &configstore.ManagedWarehouseIceberg{ - Enabled: true, - Backend: iceberg.BackendLakekeeper, - LakekeeperEndpoint: "http://lakekeeper-acme.lakekeeper.svc:8181/catalog", - LakekeeperWarehouse: "org-acme", - LakekeeperClientID: "duckling-acme", - LakekeeperOAuth2ServerURI: "http://oidc/token", - LakekeeperClientCredentials: configstore.SecretRef{ - Namespace: ns, - Name: "lakekeeper-acme", - Key: "oauth2-client-secret", - }, - } - ic, err := a.buildIcebergConfig(context.Background(), "acme", src) - if err != nil { - t.Fatalf("buildIcebergConfig: %v", err) - } - if ic.LakekeeperClientSecret != "super-secret-token" { - t.Errorf("LakekeeperClientSecret = %q, want super-secret-token", ic.LakekeeperClientSecret) - } -} - -func TestBuildIcebergConfig_EmptyBackendDefaultsLakekeeper(t *testing.T) { - cs := fake.NewSimpleClientset() - a := &SharedWorkerActivator{clientset: cs, defaultNamespace: "duckgres"} - - src := &configstore.ManagedWarehouseIceberg{ - Enabled: true, - // Backend left empty → ResolvedBackend returns lakekeeper - LakekeeperEndpoint: "http://x/catalog", - LakekeeperWarehouse: "org-x", - } - ic, err 
:= a.buildIcebergConfig(context.Background(), "x", src) - if err != nil { - t.Fatalf("buildIcebergConfig: %v", err) - } - if ic.ResolvedBackend() != iceberg.BackendLakekeeper { - t.Errorf("empty Backend should resolve to lakekeeper, got %q", ic.ResolvedBackend()) - } -} - -func TestBuildIcebergConfig_SecretFetchErrorSurfaces(t *testing.T) { - const ns = "lakekeeper" - cs := fake.NewSimpleClientset() // no secret pre-loaded - a := &SharedWorkerActivator{clientset: cs, defaultNamespace: ns} - - src := &configstore.ManagedWarehouseIceberg{ - Enabled: true, - Backend: iceberg.BackendLakekeeper, - LakekeeperOAuth2ServerURI: "http://oidc/token", - LakekeeperClientCredentials: configstore.SecretRef{ - Namespace: ns, - Name: "missing-secret", - Key: "oauth2-client-secret", - }, - } - _, err := a.buildIcebergConfig(context.Background(), "x", src) - if err == nil { - t.Fatal("expected error for missing secret, got nil") - } -} diff --git a/controlplane/sni_kubernetes_test.go b/controlplane/sni_kubernetes_test.go index 5de42386..2c2a4bca 100644 --- a/controlplane/sni_kubernetes_test.go +++ b/controlplane/sni_kubernetes_test.go @@ -265,7 +265,7 @@ func TestPostgresSNIUnknownModeIgnoresSNI(t *testing.T) { } } -// A non-selectable database name (anything other than ducklake/iceberg/empty) +// A non-selectable database name (anything other than ducklake/empty) // is rejected with 3D000 — the database param is now catalog selection, not an // org/identity routing key. 
func TestPostgresInvalidCatalogSQLSTATE(t *testing.T) { @@ -279,7 +279,7 @@ func TestPostgresInvalidCatalogSQLSTATE(t *testing.T) { OrgID: "other-org", SNIOrgID: "other-org", SNIResolved: true, - CatalogValid: false, // "requested_db" is not ducklake/iceberg + CatalogValid: false, // "requested_db" is not ducklake Valid: true, } }, diff --git a/controlplane/sts_broker.go b/controlplane/sts_broker.go index 9580814f..863ecf57 100644 --- a/controlplane/sts_broker.go +++ b/controlplane/sts_broker.go @@ -75,10 +75,10 @@ var credentialRefreshLookahead = stsSessionDuration / 2 // secrets through the statement's MVCC snapshot). Stock httpfs has NO // mid-statement recovery path for scan workloads: its refresh-on-403 hook // only runs at file open AND only when the open performs a network request, -// but DuckLake, Iceberg, and S3-glob scans all pre-populate file +// but DuckLake and S3-glob scans all pre-populate file // size/etag/last-modified so opens skip the HEAD entirely and the first auth // failure surfaces on a range GET, which is not retried (verified against -// httpfs v1.5.3 / ducklake / duckdb-iceberg sources; pinned by +// httpfs v1.5.3 / ducklake sources; pinned by // TestInFlightScanDiesOnCredentialRotation in tests/integration). 
A // statement on stock httpfs therefore lives or dies on its starting runway: // this margin guarantees every statement at least lookahead+5m of token diff --git a/controlplane/worker_profile_test.go b/controlplane/worker_profile_test.go index 6e74121e..915ed077 100644 --- a/controlplane/worker_profile_test.go +++ b/controlplane/worker_profile_test.go @@ -37,7 +37,7 @@ func TestResolveWorkerProfileSizing(t *testing.T) { wantWarn bool }{ {name: "no opts -> default (nil)", opts: map[string]string{}, wantNil: true}, - {name: "unrelated opts -> default (nil)", opts: map[string]string{"search_path": "iceberg.public"}, wantNil: true}, + {name: "unrelated opts -> default (nil)", opts: map[string]string{"search_path": "analytics"}, wantNil: true}, {name: "client cpu+mem", opts: map[string]string{gucWorkerCPU: "4", gucWorkerMemory: "32Gi"}, wantKey: "4|32Gi", wantTTL: defaultWorkerTTL}, {name: "client ttl only -> concrete w/ default size", opts: map[string]string{gucWorkerTTL: "5m"}, wantKey: "8|16Gi", wantTTL: 5 * time.Minute}, {name: "cpu over max -> clamp+warn", opts: map[string]string{gucWorkerCPU: "64"}, wantKey: "16|16Gi", wantTTL: defaultWorkerTTL, wantWarn: true}, diff --git a/docs/design/connection-string-worker-profile.md b/docs/design/connection-string-worker-profile.md index caba911d..76039432 100644 --- a/docs/design/connection-string-worker-profile.md +++ b/docs/design/connection-string-worker-profile.md @@ -43,10 +43,10 @@ options=-c duckgres.worker_tier=backfill # ergonomic alias = {cpu, mem, c |---|---|---|---|---| | default / exclusive | false | 46 / 360Gi (pool-global; normalized `""/""`) | `duckgres-workers` (today) | absent GUCs / gate off / `worker_tier=heavy` | | backfill | true | 4 / 16Gi | `duckgres-workers-colocated` | `colocate=true` no size / `worker_tier=backfill` | -| iceberg backfill | true | 8 / 48Gi | `duckgres-workers-colocated` | `colocate=true worker_cpu=8 worker_memory=48Gi` (Iceberg-allowlisted org) | +| large backfill | true | 8 / 
48Gi | `duckgres-workers-colocated` | `colocate=true worker_cpu=8 worker_memory=48Gi` (allowlisted org) | -Iceberg dual-write does `INSERT … SELECT * FROM read_parquet(full-day)` (memory-heavy; can OOM 16Gi) — hence -8/48 for allowlisted orgs; the common DuckLake register path is metadata-only and fine on 4/16. +Large `INSERT ... SELECT * FROM read_parquet(full-day)` backfills are memory-heavy and can OOM 16Gi, so +8/48 remains available for allowlisted orgs; the common DuckLake register path is metadata-only and fine on 4/16. ## Open decisions (locked for this implementation) @@ -58,7 +58,7 @@ Iceberg dual-write does `INSERT … SELECT * FROM read_parquet(full-day)` (memor An adversarial multi-agent review (36/37 findings confirmed) drove these fixes: -- **Shape-aware warm pool** (was single-shape 4/16): an 8/48 Iceberg request now matches a warm worker instead of permanent backpressure (`ColocatedWarmShapes`). +- **Shape-aware warm pool** (was single-shape 4/16): an 8/48 backfill request now matches a warm worker instead of permanent backpressure (`ColocatedWarmShapes`). - **NULL-safe claim filters** (`COALESCE`): legacy `worker_records` rows (profile columns NULL before AutoMigrate) stay claimable by the default request. - **Profile-filtered `ClaimHotIdleWorker`**: a differently-shaped request no longer reclaims-and-retires an org's default-shape hot-idle workers. - **Authoritative per-org colocated quota** inside the claim txn (cross-CP), not just in-process. diff --git a/docs/runbooks/lakekeeper-iceberg-catalog.md b/docs/runbooks/lakekeeper-iceberg-catalog.md deleted file mode 100644 index 7feac1b5..00000000 --- a/docs/runbooks/lakekeeper-iceberg-catalog.md +++ /dev/null @@ -1,159 +0,0 @@ -# Lakekeeper Iceberg Catalog - -How the per-org [Lakekeeper](https://docs.lakekeeper.io/) Iceberg REST catalog -backend works, how to activate it for a tenant, and how to diagnose the -failure modes we hit bringing it up. 
This is the alternative to the AWS S3 -Tables Iceberg backend — selected per tenant via `iceberg.backend`. - -## Architecture - -Each tenant that opts into the Lakekeeper backend gets its **own** single-tenant -Lakekeeper instance, provisioned by the control plane through the -[lakekeeper-operator](https://github.com/lakekeeper/lakekeeper-operator) and a -`Lakekeeper` custom resource. All instances share one Kubernetes namespace -(`lakekeeper`); isolation is per-CR (Deployment + Service + ServiceAccount + -Secret + a migrate Job), not per-namespace. - -``` -duckgres control plane ──(provisioner)──▶ Lakekeeper CR ──(operator)──▶ lakekeeper- Deployment/Service - │ │ - │ persists endpoint/warehouse/client_id into the config store │ serves Iceberg REST catalog (metadata only) - ▼ ▼ - worker activation ──▶ ATTACH '' AS iceberg (TYPE ICEBERG, ENDPOINT '', …) - data read/write goes straight to S3 with the worker's own credentials -``` - -The relevant code: - -- `controlplane/provisioner/lakekeeper_provisioner.go` — reconciles the - `Lakekeeper` CR, bootstraps the server, and creates the per-org warehouse. -- `controlplane/provisioner/lakekeeper_k8s.go` — renders the CR, ServiceAccount, - and client-credentials Secret. -- `server/iceberg/migration.go` — builds the `ATTACH` and `CREATE SECRET` - statements the worker runs on activation. -- `server/server.go::attachLakekeeperCatalog` — wires the worker's S3 credentials - into the attach. - -## Credential model (important) - -**Lakekeeper serves catalog metadata only. The worker reads and writes table -data in S3 with its own credentials — Lakekeeper does not vend credentials to -the client.** - -On activation the worker creates a `TYPE S3` secret named `iceberg_sigv4` from -the credentials in its activation payload (the same per-org role/bucket the -DuckLake S3 secret uses), then `ATTACH`es the Lakekeeper REST catalog. DuckDB -signs S3 data requests with that secret. 
- -Credential **vending** (the Iceberg REST `vended-credentials` delegation, where -the catalog hands per-table S3 credentials back to the client) is deliberately -**disabled**: - -- The Lakekeeper instance is configured with STS vending off; its STS - down-scoping session policy overflowed AWS's packed-policy size limit - (`PackedPolicyTooLargeException`). -- It is unnecessary — the worker already holds S3 credentials for the warehouse - bucket. - -### `ACCESS_DELEGATION_MODE 'none'` is mandatory, not optional - -DuckDB's iceberg extension **defaults `ACCESS_DELEGATION_MODE` to -`'vended_credentials'`**. *Omitting* the option does **not** disable vending. - -If the client requests delegation but the server has vending disabled, -Lakekeeper returns a per-table storage *config* (region/endpoint) with **no -credentials**. DuckDB still materializes that into a path-scoped `__internal_ic_*` -S3 secret with empty credentials. Because its scope (the table's S3 prefix) is -*more specific* than `iceberg_sigv4`'s (`s3://`), it **shadows** the working -secret — and every data read/write goes out anonymous, so S3 returns -`403 Forbidden`. Metadata operations (CREATE SCHEMA/TABLE) still succeed because -they go through the REST API, which masks the problem during provisioning. - -`BuildLakekeeperAttachStmt` therefore sets `ACCESS_DELEGATION_MODE 'none'` -explicitly on every `ATTACH`. With delegation off, DuckDB falls back to the -ambient `iceberg_sigv4` secret. A quick check that this is in effect: - -```sql -SELECT count(*) FILTER (WHERE name LIKE '__internal_ic%') AS vended, - count(*) FILTER (WHERE name = 'iceberg_sigv4') AS sigv4 -FROM duckdb_secrets(); --- expect: vended = 0, sigv4 = 1 -``` - -## Prerequisites - -- The `lakekeeper-operator` is deployed in the same cluster as the tenants' - workers (it must be co-located with the `Lakekeeper` CRs the control plane - creates). 
-- The control plane runs with `DUCKGRES_LAKEKEEPER_PROVISIONER_ENABLED=true`, - and the `lakekeeper` namespace plus the provisioner's RBAC exist. -- The per-org Lakekeeper ServiceAccount has cloud credentials (e.g. via an EKS - Pod Identity association mapping it to the tenant's IAM role) so the catalog - can perform its own metadata IO to S3. - -## Activate the Lakekeeper backend for a tenant - -Enable Iceberg on the tenant's managed warehouse with the `lakekeeper` backend -via the admin API: - -```sh -curl -X PUT "$ADMIN_API/api/v1/orgs//warehouse" \ - -H "X-Duckgres-Internal-Secret: $INTERNAL_SECRET" \ - -H 'Content-Type: application/json' \ - -d '{"iceberg":{"enabled":true,"backend":"lakekeeper"}}' -``` - -The control plane detects the drift, patches the Duckling CR, and the operator -provisions the Lakekeeper instance (typically ready in well under a minute). -Confirm via the warehouse record: - -```sh -curl -s "$ADMIN_API/api/v1/orgs//warehouse" \ - -H "X-Duckgres-Internal-Secret: $INTERNAL_SECRET" | jq .iceberg, .iceberg_state -# iceberg_state: "ready"; iceberg.lakekeeper_endpoint / lakekeeper_warehouse / lakekeeper_client_id populated -``` - -## Verify end to end - -Connect as the tenant and round-trip data through the catalog: - -```sql -CREATE SCHEMA IF NOT EXISTS iceberg.smoke; -CREATE TABLE iceberg.smoke.t (id INTEGER, note VARCHAR); -INSERT INTO iceberg.smoke.t VALUES (1, 'hello'), (2, 'world'); -SELECT count(*) FROM iceberg.smoke.t; -- 2 -``` - -A successful `INSERT` writes a parquet data file plus Iceberg metadata -(`*.metadata.json`, manifest/snapshot `*.avro`) under the warehouse prefix in -the tenant's S3 bucket. - -## `information_schema.columns` - -DuckDB's Iceberg catalog can list tables before it has loaded each table's -schema, so raw `information_schema.columns` may expose placeholder -`__` / `unknown` rows. 
Duckgres hides those placeholders and uses the -Lakekeeper REST catalog to load current table schemas with bounded -concurrency before returning `information_schema.columns` results. - -For simple PostgreSQL client metadata predicates such as -`table_schema = 'x' AND table_name = 'y'`, Duckgres targets the preload to the -requested table. For broad metadata scans, stale Lakekeeper `404` table entries -are skipped because DuckDB's Iceberg `information_schema.tables` may -temporarily expose dropped tables from a worker/session-local catalog view. - -## Troubleshooting - -| Symptom | Likely cause | Action | -| :------ | :----------- | :----- | -| `CREATE TABLE` works but `INSERT` fails with `Unable to connect to URL s3://… Forbidden (HTTP 403)` | DuckDB requested vended credentials (the `ACCESS_DELEGATION_MODE` default) and is using an empty `__internal_ic_*` shadow secret | Confirm the `ATTACH` includes `ACCESS_DELEGATION_MODE 'none'` (see `BuildLakekeeperAttachStmt`) and that `duckdb_secrets()` shows `vended = 0`. Re-activate the worker on a build that includes the fix. | -| `iceberg_state` never leaves `pending`; no `lakekeeper-` Deployment appears | operator not running in this cluster, or provisioner disabled | Check the operator pod and that the control plane has `DUCKGRES_LAKEKEEPER_PROVISIONER_ENABLED=true`; check the `Lakekeeper` CR's status. | -| Warehouse-create fails with a credentials/`SystemIdentity` error | the Lakekeeper pod can't reach cloud credentials for its own metadata IO | Verify the per-org ServiceAccount's Pod Identity association exists and the pod has credential env injected (the pod must start *after* the association is created). | -| `column "iceberg_lakekeeper_oauth2_server_uri" does not exist` on persist | GORM snake-cases `OAuth2` to `o_auth2`; a config write used the wrong column name | The column is `iceberg_lakekeeper_o_auth2_server_uri`. 
| - -## See also - -- `server/iceberg/migration.go` — the attach/secret statement builders and the - delegation-mode rationale in the doc comments. -- [Delta Catalog Activation](delta-catalog-activation.md) — the analogous - runbook for the DuckLake Delta catalog. diff --git a/duckdbservice/activation.go b/duckdbservice/activation.go index 76438895..ddd8976f 100644 --- a/duckdbservice/activation.go +++ b/duckdbservice/activation.go @@ -18,10 +18,6 @@ type ActivationPayload struct { server.WorkerControlMetadata OrgID string `json:"org_id"` DuckLake server.DuckLakeConfig `json:"ducklake"` - // Iceberg is the per-tenant Iceberg catalog (AWS S3 Tables) config. Empty - // (Enabled=false) when the tenant has not opted in or hasn't been - // provisioned yet — workers handle that as a no-op at attach time. - Iceberg server.IcebergConfig `json:"iceberg"` } type activatedTenantRuntime struct { @@ -84,7 +80,6 @@ func (p *SessionPool) activateTenant(payload ActivationPayload) error { cfg := p.cfg cfg.DuckLake = payload.DuckLake - cfg.Iceberg = payload.Iceberg overrideS3EndpointForCacheProxy(&cfg.DuckLake) // Tag postgres_scanner libpq connections with an application_name that // includes the org so Aurora's pg_stat_activity / Performance Insights @@ -206,17 +201,11 @@ func (p *SessionPool) reuseExistingActivation(payload ActivationPayload) bool { } } - // needsRefresh is keyed on DuckLake creds because the activator - // populates DuckLake.S3* with the STS-minted credentials for the - // per-tenant IAM role, and the iceberg_sigv4 secret reuses the same - // values. So a single change-detection covers both downstream - // consumers. The guard "something is actually using S3" expands here - // to include iceberg — there are tenants (e.g. metadata-only DuckLake) - // where ObjectStore is empty but Iceberg.Enabled is true, and on - // those the iceberg secret still needs to be rotated. 
+ // needsRefresh is keyed on DuckLake creds because the activator populates + // DuckLake.S3* with the STS-minted credentials for the per-tenant IAM role. needsRefresh := false if p.activation.db != nil && - (payload.DuckLake.ObjectStore != "" || payload.Iceberg.Enabled) && + payload.DuckLake.ObjectStore != "" && !reflect.DeepEqual(current.DuckLake, payload.DuckLake) { needsRefresh = s3CredentialsChanged(current.DuckLake, payload.DuckLake) if !needsRefresh { @@ -238,7 +227,6 @@ func (p *SessionPool) reuseExistingActivation(payload ActivationPayload) bool { refreshDB = p.activation.db } refreshFn := p.refreshS3Secret - refreshIcebergFn := p.refreshIcebergSecret sem := p.duckLakeSem p.mu.Unlock() @@ -252,24 +240,9 @@ func (p *SessionPool) reuseExistingActivation(payload ActivationPayload) bool { if refreshFn == nil { refreshFn = server.RefreshS3Secret } - if refreshIcebergFn == nil { - refreshIcebergFn = server.RefreshIcebergSecret - } - if payload.DuckLake.ObjectStore != "" { - if err := refreshFn(refreshDB, payload.DuckLake, sem); err != nil { - slog.Warn("Failed to refresh S3 credentials on hot-idle reuse.", "org", payload.OrgID, "error", err) - return false - } - } - if payload.Iceberg.Enabled { - if err := refreshIcebergFn(refreshDB, payload.Iceberg, sem, - payload.DuckLake.S3AccessKey, - payload.DuckLake.S3SecretKey, - payload.DuckLake.S3SessionToken, - ); err != nil { - slog.Warn("Failed to refresh Iceberg credentials on hot-idle reuse.", "org", payload.OrgID, "error", err) - return false - } + if err := refreshFn(refreshDB, payload.DuckLake, sem); err != nil { + slog.Warn("Failed to refresh S3 credentials on hot-idle reuse.", "org", payload.OrgID, "error", err) + return false } } @@ -322,7 +295,6 @@ func sameTenantActivationRuntime(current, next ActivationPayload) bool { return false } a, b := current.DuckLake, next.DuckLake - ai, bi := current.Iceberg, next.Iceberg return a.MetadataStore == b.MetadataStore && a.ObjectStore == b.ObjectStore && a.DataPath == 
b.DataPath && @@ -337,20 +309,7 @@ func sameTenantActivationRuntime(current, next ActivationPayload) bool { a.S3Profile == b.S3Profile && a.Migrate == b.Migrate && reflect.DeepEqual(a.DataInliningRowLimit, b.DataInliningRowLimit) && - a.CheckpointInterval == b.CheckpointInterval && - ai.Enabled == bi.Enabled && - ai.Backend == bi.Backend && - ai.Region == bi.Region && - ai.Namespace == bi.Namespace && - // Lakekeeper-side identity. Without this, a hot-idle worker - // activated before Lakekeeper provisioning completed would be - // reclaimed for the same org without forcing the new ATTACH — - // the worker would keep running with no iceberg catalog - // attached even though the new payload carries the endpoint. - ai.LakekeeperEndpoint == bi.LakekeeperEndpoint && - ai.LakekeeperWarehouse == bi.LakekeeperWarehouse && - ai.LakekeeperClientID == bi.LakekeeperClientID && - ai.LakekeeperOAuth2ServerURI == bi.LakekeeperOAuth2ServerURI + a.CheckpointInterval == b.CheckpointInterval } func (p *SessionPool) validateControlMetadata(meta server.WorkerControlMetadata) error { @@ -388,7 +347,6 @@ func (p *SessionPool) currentSessionConfig() (server.Config, error) { cfg := p.cfg cfg.DuckLake = p.activation.payload.DuckLake - cfg.Iceberg = p.activation.payload.Iceberg overrideS3EndpointForCacheProxy(&cfg.DuckLake) return cfg, nil } @@ -397,7 +355,6 @@ func (p *SessionPool) sharedWarmupConfig() server.Config { cfg := p.cfg if p.sharedWarmMode { cfg.DuckLake = server.DuckLakeConfig{} - cfg.Iceberg = server.IcebergConfig{} } return cfg } diff --git a/duckdbservice/activation_lakekeeper_test.go b/duckdbservice/activation_lakekeeper_test.go deleted file mode 100644 index 9dbd766c..00000000 --- a/duckdbservice/activation_lakekeeper_test.go +++ /dev/null @@ -1,93 +0,0 @@ -package duckdbservice - -import ( - "testing" - - "github.com/posthog/duckgres/server" -) - -// TestSameTenantActivationRuntime_LakekeeperFields locks in the cross-PR fix -// that extends the same-tenant equality check 
to include the Lakekeeper-side -// identity fields. Without these in the check, a hot-idle worker activated -// before Lakekeeper provisioning completed would be reclaimed for the same -// org without forcing a fresh ATTACH — the worker would keep running with no -// iceberg catalog attached even though the new payload now carries the -// provisioned endpoint. -func TestSameTenantActivationRuntime_LakekeeperFields(t *testing.T) { - base := ActivationPayload{ - OrgID: "acme", - Iceberg: server.IcebergConfig{ - Enabled: true, - Backend: server.IcebergConfig{}.Backend, // empty - }, - } - - cases := []struct { - name string - mutate func(p *ActivationPayload) - wantSameRun bool - }{ - { - name: "identical payload is same-tenant", - mutate: func(p *ActivationPayload) {}, - wantSameRun: true, - }, - { - name: "Backend differs → not same", - mutate: func(p *ActivationPayload) { - p.Iceberg.Backend = "lakekeeper" - }, - wantSameRun: false, - }, - { - name: "LakekeeperEndpoint added → not same (provisioning completed mid-flight)", - mutate: func(p *ActivationPayload) { - p.Iceberg.LakekeeperEndpoint = "http://lk-acme.lakekeeper.svc:8181/catalog" - }, - wantSameRun: false, - }, - { - name: "LakekeeperWarehouse differs → not same", - mutate: func(p *ActivationPayload) { - p.Iceberg.LakekeeperWarehouse = "org-acme" - }, - wantSameRun: false, - }, - { - name: "LakekeeperClientID differs → not same", - mutate: func(p *ActivationPayload) { - p.Iceberg.LakekeeperClientID = "duckling-acme" - }, - wantSameRun: false, - }, - { - name: "LakekeeperOAuth2ServerURI differs → not same (OIDC flip)", - mutate: func(p *ActivationPayload) { - p.Iceberg.LakekeeperOAuth2ServerURI = "http://127.0.0.1:9876/token" - }, - wantSameRun: false, - }, - { - name: "LakekeeperClientSecret intentionally NOT compared", - mutate: func(p *ActivationPayload) { - // The client_secret is rotated only via re-provisioning, - // which already changes other fields. 
Including it would - // also force a useless reactivation if the activator - // re-resolves the same SecretRef across two activations. - p.Iceberg.LakekeeperClientSecret = "different-but-shouldn't-matter" - }, - wantSameRun: true, - }, - } - for _, c := range cases { - t.Run(c.name, func(t *testing.T) { - next := base - c.mutate(&next) - got := sameTenantActivationRuntime(base, next) - if got != c.wantSameRun { - t.Errorf("sameTenantActivationRuntime() = %v, want %v\nbase iceberg: %+v\nnext iceberg: %+v", - got, c.wantSameRun, base.Iceberg, next.Iceberg) - } - }) - } -} diff --git a/duckdbservice/activation_test.go b/duckdbservice/activation_test.go index 7cd77510..829e5192 100644 --- a/duckdbservice/activation_test.go +++ b/duckdbservice/activation_test.go @@ -690,190 +690,3 @@ func TestConcurrentActivateTenantSamePayloadBothSucceed(t *testing.T) { t.Errorf("workerID = %d, want %d", pool.workerID, payload.WorkerID) } } - -// TestReuseExistingActivationRefreshesIcebergAlongsideS3 is the regression -// net for PR #563 (commit 12a9304): on hot-idle reclaim with rotated STS -// credentials, the iceberg_sigv4 secret must be refreshed alongside the -// DuckLake S3 secret. Before the fix, only refreshS3Secret was invoked, -// so long-lived workers' iceberg queries 403'd after STS expiry while -// DuckLake queries kept working — a class of bug that's invisible to -// tenants without iceberg enabled and silent for hours on tenants that -// have it but query it rarely. -// -// We assert two things the original fix actually changed: (1) the iceberg -// refresh function runs at all when Iceberg.Enabled=true on the payload, -// and (2) it receives the NEW credentials from the rotated payload, not -// the stale ones from the existing activation. Either failing reproduces -// the production bug. 
-func TestReuseExistingActivationRefreshesIcebergAlongsideS3(t *testing.T) { - mainDB, err := sql.Open("duckdb", "") - if err != nil { - t.Fatalf("open main duckdb: %v", err) - } - defer func() { _ = mainDB.Close() }() - controlDB, err := sql.Open("duckdb", "") - if err != nil { - t.Fatalf("open control duckdb: %v", err) - } - defer func() { _ = controlDB.Close() }() - - pool := &SessionPool{ - sessions: make(map[string]*Session), - stopRefresh: make(map[string]func()), - duckLakeSem: make(chan struct{}, 1), - warmupDone: make(chan struct{}), - sharedWarmMode: true, - warmupDB: mainDB, - controlDB: controlDB, - activation: &activatedTenantRuntime{ - payload: ActivationPayload{ - WorkerControlMetadata: server.WorkerControlMetadata{OwnerEpoch: 1}, - OrgID: "analytics", - DuckLake: server.DuckLakeConfig{ - MetadataStore: "postgres:host=meta port=5432 user=u password=p dbname=d", - ObjectStore: "s3://analytics/", - S3AccessKey: "OLD_AK", - S3SecretKey: "OLD_SK", - S3SessionToken: "OLD_TOK", - }, - Iceberg: server.IcebergConfig{ - Enabled: true, - LakekeeperEndpoint: "http://lakekeeper-analytics.lakekeeper.svc:8181/catalog", - LakekeeperWarehouse: "org-analytics", - Region: "us-east-1", - Namespace: "main", - }, - }, - db: mainDB, - }, - ownerEpoch: 1, - } - close(pool.warmupDone) - - var s3Calls int - var icebergCalls int - var icebergKeyID, icebergSecret, icebergToken string - var icebergCfg server.IcebergConfig - pool.refreshS3Secret = func(db *sql.DB, dlCfg server.DuckLakeConfig, sem chan struct{}) error { - s3Calls++ - return nil - } - pool.refreshIcebergSecret = func(db *sql.DB, ic server.IcebergConfig, sem chan struct{}, keyID, secret, sessionToken string) error { - icebergCalls++ - icebergCfg = ic - icebergKeyID = keyID - icebergSecret = secret - icebergToken = sessionToken - return nil - } - - newPayload := ActivationPayload{ - WorkerControlMetadata: server.WorkerControlMetadata{OwnerEpoch: 2}, - OrgID: "analytics", - DuckLake: server.DuckLakeConfig{ - 
MetadataStore: "postgres:host=meta port=5432 user=u password=p dbname=d", - ObjectStore: "s3://analytics/", - S3AccessKey: "NEW_AK", - S3SecretKey: "NEW_SK", - S3SessionToken: "NEW_TOK", - }, - Iceberg: server.IcebergConfig{ - Enabled: true, - LakekeeperEndpoint: "http://lakekeeper-analytics.lakekeeper.svc:8181/catalog", - LakekeeperWarehouse: "org-analytics", - Region: "us-east-1", - Namespace: "main", - }, - } - - if !pool.reuseExistingActivation(newPayload) { - t.Fatal("reuseExistingActivation returned false; expected hot-idle reclaim to succeed") - } - - if s3Calls != 1 { - t.Errorf("refreshS3Secret called %d times, want 1", s3Calls) - } - if icebergCalls != 1 { - t.Fatalf("refreshIcebergSecret called %d times, want 1 — iceberg secret was NOT rotated alongside DuckLake (PR #563 regression)", icebergCalls) - } - if icebergKeyID != "NEW_AK" || icebergSecret != "NEW_SK" || icebergToken != "NEW_TOK" { - t.Errorf("iceberg refresh got stale credentials: keyID=%q secret=%q token=%q, want NEW_AK/NEW_SK/NEW_TOK", - icebergKeyID, icebergSecret, icebergToken) - } - if icebergCfg.LakekeeperWarehouse != newPayload.Iceberg.LakekeeperWarehouse || icebergCfg.Region != newPayload.Iceberg.Region { - t.Errorf("iceberg refresh got wrong config: %+v, want %+v", icebergCfg, newPayload.Iceberg) - } -} - -// TestReuseExistingActivationSkipsIcebergRefreshWhenDisabled guards the -// inverse: tenants that haven't opted into iceberg must not have the -// iceberg refresh fn invoked at all on hot-idle reclaim. A regression -// that unconditionally fires the refresh would run a CREATE OR REPLACE -// SECRET against DuckDB for every reclaim, which is wasteful and — more -// importantly — would mask the real bug if the iceberg secret SQL form -// breaks for the disabled case (e.g. empty ARN/empty creds). 
-func TestReuseExistingActivationSkipsIcebergRefreshWhenDisabled(t *testing.T) { - mainDB, err := sql.Open("duckdb", "") - if err != nil { - t.Fatalf("open main duckdb: %v", err) - } - defer func() { _ = mainDB.Close() }() - controlDB, err := sql.Open("duckdb", "") - if err != nil { - t.Fatalf("open control duckdb: %v", err) - } - defer func() { _ = controlDB.Close() }() - - pool := &SessionPool{ - sessions: make(map[string]*Session), - stopRefresh: make(map[string]func()), - duckLakeSem: make(chan struct{}, 1), - warmupDone: make(chan struct{}), - sharedWarmMode: true, - warmupDB: mainDB, - controlDB: controlDB, - activation: &activatedTenantRuntime{ - payload: ActivationPayload{ - WorkerControlMetadata: server.WorkerControlMetadata{OwnerEpoch: 1}, - OrgID: "billing", - DuckLake: server.DuckLakeConfig{ - MetadataStore: "postgres:host=meta port=5432 user=u password=p dbname=d", - ObjectStore: "s3://billing/", - S3AccessKey: "OLD_AK", - S3SecretKey: "OLD_SK", - }, - // Iceberg disabled — typical for tenants on DuckLake-only. 
- }, - db: mainDB, - }, - ownerEpoch: 1, - } - close(pool.warmupDone) - - var icebergCalls int - pool.refreshS3Secret = func(db *sql.DB, dlCfg server.DuckLakeConfig, sem chan struct{}) error { - return nil - } - pool.refreshIcebergSecret = func(db *sql.DB, ic server.IcebergConfig, sem chan struct{}, keyID, secret, sessionToken string) error { - icebergCalls++ - return nil - } - - newPayload := ActivationPayload{ - WorkerControlMetadata: server.WorkerControlMetadata{OwnerEpoch: 2}, - OrgID: "billing", - DuckLake: server.DuckLakeConfig{ - MetadataStore: "postgres:host=meta port=5432 user=u password=p dbname=d", - ObjectStore: "s3://billing/", - S3AccessKey: "NEW_AK", - S3SecretKey: "NEW_SK", - }, - } - - if !pool.reuseExistingActivation(newPayload) { - t.Fatal("reuseExistingActivation returned false; expected hot-idle reclaim to succeed") - } - if icebergCalls != 0 { - t.Errorf("refreshIcebergSecret called %d times for iceberg-disabled tenant, want 0", icebergCalls) - } -} diff --git a/duckdbservice/service.go b/duckdbservice/service.go index bc7e9d2b..b26883bf 100644 --- a/duckdbservice/service.go +++ b/duckdbservice/service.go @@ -85,13 +85,6 @@ type SessionPool struct { // inject a stub to verify the credential-refresh path is non-blocking // (see TestReuseExistingActivationDoesNotBlockHealthChecks). refreshS3Secret func(*sql.DB, server.DuckLakeConfig, chan struct{}) error - // refreshIcebergSecret is the sibling indirection for rotating the - // iceberg_sigv4 secret. Runs alongside refreshS3Secret on hot-idle - // reuse whenever the tenant has iceberg enabled — without it, iceberg - // queries on a long-lived worker would 403 after STS rotation while - // DuckLake stays fresh. Defaults to server.RefreshIcebergSecret; - // stubbed by tests the same way refreshS3Secret is. 
- refreshIcebergSecret func(*sql.DB, server.IcebergConfig, chan struct{}, string, string, string) error drainMu sync.Mutex draining bool @@ -485,7 +478,6 @@ func NewDuckDBService(cfg ServiceConfig) *DuckDBService { createDBPair: CreateWorkerDBPair, activateDBConnection: server.ActivateDBConnection, refreshS3Secret: server.RefreshS3Secret, - refreshIcebergSecret: server.RefreshIcebergSecret, drainZero: make(chan struct{}), drainZeroOpen: true, } diff --git a/duckdbservice/user_secrets.go b/duckdbservice/user_secrets.go index 93dab97e..616b49df 100644 --- a/duckdbservice/user_secrets.go +++ b/duckdbservice/user_secrets.go @@ -32,9 +32,9 @@ const userSecretOpTimeout = 15 * time.Second // duckdb_secrets(). That makes a persistent-only wipe a cross-user isolation // leak; this wipe closes it by dropping temporary secrets too. // -// System-managed secrets (ducklake_s3, iceberg_sigv4, iceberg_oauth, plus the -// reserved __default_*/duckgres_* prefixes) are preserved: activation -// re-creates them and dropping them would break the org's own catalog wiring. +// System-managed secrets (ducklake_s3, plus the reserved +// __default_*/duckgres_* prefixes) are preserved: activation re-creates them +// and dropping them would break the org's own catalog wiring. // The allowlist is usersecrets.IsReservedName, the same set the control plane // forbids users from creating, so no user secret can hide behind it. // diff --git a/duckdbservice/user_secrets_test.go b/duckdbservice/user_secrets_test.go index c31922f9..6506700a 100644 --- a/duckdbservice/user_secrets_test.go +++ b/duckdbservice/user_secrets_test.go @@ -46,8 +46,8 @@ func secretNames(t *testing.T, db *sql.DB, where string) map[string]bool { // Wipe must drop ALL user-created secrets — persistent AND non-persistent // (plain/TEMPORARY CREATE SECRET) — while leaving the system-managed catalog -// secrets (ducklake_s3 / iceberg_sigv4 / iceberg_oauth, plus the reserved -// __default_*/duckgres_* prefixes) untouched. 
+// secrets (ducklake_s3, plus the reserved __default_*/duckgres_* prefixes) +// untouched. func TestWipeUserSecrets(t *testing.T) { db := openSecretTestDB(t) mustExec := func(q string) { @@ -59,7 +59,6 @@ func TestWipeUserSecrets(t *testing.T) { // System-managed secrets (created with plain CREATE OR REPLACE SECRET, so // they land in in-memory/temporary storage). These must survive the wipe. mustExec("CREATE OR REPLACE SECRET ducklake_s3 (TYPE s3, KEY_ID 'sys', SECRET 'sys')") - mustExec("CREATE OR REPLACE SECRET iceberg_sigv4 (TYPE s3, KEY_ID 'sys', SECRET 'sys')") // User secrets: a persistent one and a temporary one. Both must be dropped. mustExec("CREATE PERSISTENT SECRET user_a (TYPE s3, KEY_ID 'a', SECRET 'a')") mustExec(`CREATE PERSISTENT SECRET "user_b" (TYPE gcs, KEY_ID 'b', SECRET 'b')`) @@ -78,7 +77,7 @@ func TestWipeUserSecrets(t *testing.T) { t.Errorf("user secret %q remains after wipe; remaining: %v", leaked, remaining) } } - for _, sys := range []string{"ducklake_s3", "iceberg_sigv4"} { + for _, sys := range []string{"ducklake_s3"} { if !remaining[sys] { t.Errorf("system secret %q was wiped; remaining: %v", sys, remaining) } diff --git a/k8s/lakekeeper-networkpolicy.yaml b/k8s/lakekeeper-networkpolicy.yaml deleted file mode 100644 index 1ae5c005..00000000 --- a/k8s/lakekeeper-networkpolicy.yaml +++ /dev/null @@ -1,94 +0,0 @@ -# Baseline NetworkPolicy for the per-org Lakekeeper instances duckgres -# provisions in the `lakekeeper` namespace. Default-deny on both ingress -# and egress so a misconfigured Lakekeeper instance can't talk to (or be -# talked to by) anything not explicitly allowed below. -# -# Per-org allow rules are NOT in this file. The lakekeeper-operator -# stamps only the standard app.kubernetes.io/{name,instance,managed-by} -# labels on its Deployment pods today, so a NetworkPolicy keyed on -# `duckgres/active-org` would never match. 
Per-org isolation needs either: -# -# (a) an upstream patch to the operator that propagates CR labels onto -# the Deployment template (then add an ingress rule below that -# matches on duckgres/active-org), or -# (b) per-org CiliumClusterwideNetworkPolicy resources templated in -# the charts repo, matching on app.kubernetes.io/instance= -# lakekeeper-. -# -# Today both are TODO. Until one ships, the rules below only allow -# duckgres workers (any org) to reach Lakekeeper. That's coarser than -# the final story but matches the "allowall + NetworkPolicy" PR1 model: -# cluster boundary holds, intra-cluster org boundary lands in PR3 with -# OIDC + the label-propagation work. - -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: lakekeeper-default-deny - namespace: lakekeeper -spec: - podSelector: {} # all pods in the lakekeeper namespace - policyTypes: - - Ingress - - Egress ---- -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: lakekeeper-ingress-from-duckgres-workers - namespace: lakekeeper -spec: - # `app.kubernetes.io/name: lakekeeper` is what the lakekeeper-operator - # stamps on its Deployment pods (see operator's getLabels). The CR and - # the K8s Secret we create alongside it use `app: lakekeeper`, but - # those are non-pod objects so they don't intersect with this selector. - podSelector: - matchLabels: - app.kubernetes.io/name: lakekeeper - policyTypes: - - Ingress - ingress: - - from: - - namespaceSelector: - matchLabels: - kubernetes.io/metadata.name: duckgres - podSelector: - matchLabels: - app: duckgres-worker - ports: - - port: 8181 - protocol: TCP ---- -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: lakekeeper-egress-to-pg-and-s3 - namespace: lakekeeper -spec: - podSelector: - matchLabels: - app.kubernetes.io/name: lakekeeper - policyTypes: - - Egress - egress: - # DNS so service hostnames resolve. 
- - ports: - - port: 53 - protocol: UDP - - port: 53 - protocol: TCP - # Postgres (the org's managed-warehouse RDS/Aurora). Open to any IP since - # those endpoints aren't necessarily in-cluster. - - ports: - - port: 5432 - protocol: TCP - # HTTPS to S3 (and to the Lakekeeper-Operator-managed webhook / - # metrics endpoints we may add later). - - ports: - - port: 443 - protocol: TCP - # MinIO / S3-compat in-cluster (dev clusters). Harmless in prod where - # MinIO isn't running. - - ports: - - port: 9000 - protocol: TCP diff --git a/main.go b/main.go index 34c53fc5..fcd015d0 100644 --- a/main.go +++ b/main.go @@ -38,7 +38,6 @@ type ( ACMEConfig = configloader.ACMEConfig RateLimitFileConfig = configloader.RateLimitFileConfig DuckLakeFileConfig = configloader.DuckLakeFileConfig - IcebergFileConfig = configloader.IcebergFileConfig ) // loadConfigFile + env are thin wrappers around configloader for back-compat diff --git a/main_test.go b/main_test.go index 7df035cf..90da59ae 100644 --- a/main_test.go +++ b/main_test.go @@ -298,33 +298,6 @@ func TestResolveEffectiveConfigDuckLakeDeltaCatalog(t *testing.T) { } } -func TestResolveEffectiveConfigIceberg(t *testing.T) { - // Default-off: with no overrides, Iceberg stays disabled (opt-in unlike Delta). - resolved := configresolve.ResolveEffective(nil, configresolve.CLIInputs{}, envFromMap(nil), nil) - if resolved.Server.Iceberg.Enabled { - t.Fatal("expected Iceberg.Enabled to default to false") - } - - // File config opts in; remaining knobs (region/namespace) flow through. 
- fileEnabled := true - resolved = configresolve.ResolveEffective(&FileConfig{ - Iceberg: IcebergFileConfig{ - Enabled: &fileEnabled, - Region: "us-east-1", - Namespace: "main", - }, - }, configresolve.CLIInputs{}, envFromMap(nil), nil) - if !resolved.Server.Iceberg.Enabled { - t.Fatal("expected YAML iceberg.enabled=true to enable Iceberg") - } - if got, want := resolved.Server.Iceberg.Region, "us-east-1"; got != want { - t.Fatalf("expected YAML iceberg region %q, got %q", want, got) - } - if got, want := resolved.Server.Iceberg.Namespace, "main"; got != want { - t.Fatalf("expected YAML iceberg namespace %q, got %q", want, got) - } -} - func TestResolveEffectiveConfigInvalidQueryLogEnvValues(t *testing.T) { env := map[string]string{ "DUCKGRES_QUERY_LOG_FLUSH_INTERVAL": "0s", diff --git a/scripts/copy-ducklake-to-iceberg.sh b/scripts/copy-ducklake-to-iceberg.sh deleted file mode 100644 index 0f4bc575..00000000 --- a/scripts/copy-ducklake-to-iceberg.sh +++ /dev/null @@ -1,505 +0,0 @@ -#!/usr/bin/env bash -# -# copy-ducklake-to-iceberg.sh -# -# Copy tables from the `ducklake` catalog into the `iceberg` (Lakekeeper REST) -# catalog of a duckgres managed warehouse. -# -# SAFE BY DEFAULT: prints the SQL it would run and exits. Pass --execute to run. -# -# ── Why it works the way it does (hard-won) ────────────────────────────────── -# The DuckDB iceberg extension + Lakekeeper, reached through duckgres, have -# sharp edges that dictate this exact sequence: -# * `CREATE TABLE iceberg.x AS SELECT ...` creates an EMPTY table — the data -# is NOT written. So we create the shell, then INSERT. -# * `CREATE OR REPLACE TABLE` is NOT supported ("use separate Drop and -# Create"). So idempotency is done with CREATE IF NOT EXISTS + DELETE. 
-# * `COPY FROM DATABASE ducklake TO iceberg` (the DuckLake-native bulk copy) -# works but (a) requires the target schema to already exist, and (b) is one -# all-or-nothing statement — a single Iceberg-incompatible column type -# aborts the whole thing. This script copies table-by-table instead so one -# bad table is logged and skipped. -# * duckgres routes each NEW connection to an arbitrary warm worker, and -# Lakekeeper commits have visibility lag — so catalog state is inconsistent -# ACROSS connections. We therefore run the whole CATALOG-MUTATING batch -# (CREATE/DELETE/INSERT) in ONE psql session. Read-only PLANNING queries -# against the stable `ducklake` source (counts, quantiles, column types) -# don't touch iceberg catalog state, so they may run in their own sessions. -# -# ── Why we copy in CHUNKS (the big-table story) ────────────────────────────── -# A single `INSERT INTO iceberg.x SELECT * FROM ducklake.x` over a huge table -# holds the TLS connection open for minutes with ZERO bytes flowing (psql is -# blocked waiting for CommandComplete). There is no statement timeout in -# duckgres — but cloud load balancers drop idle TCP connections, so the copy -# dies partway. Splitting one table into many bounded INSERTs fixes this: -# * each chunk finishes fast and emits a CommandComplete — that traffic keeps -# the connection's idle timer from firing; -# * a dropped connection only loses the in-flight chunk, not the whole table; -# * each chunk is INDEPENDENTLY idempotent (DELETE its range, then INSERT its -# range), so resume is at chunk granularity, not whole-table. -# -# How a table is chunked: -# * We pick a KEY COLUMN to range over. Auto-detected per table (prefer an -# integer/bigint id-like column, else a timestamp/date column); override -# with --chunk-column. Tables with no usable key (or row count <= one -# chunk) are copied in a single whole-table INSERT, exactly as before. 
-# * We size the number of chunks from the row count and --chunk-rows, then -# compute distribution-aware boundaries with approx_quantile(key, ...). -# Quantiles (not equal-width ranges) keep chunk sizes even under skew — -# critical for PostHog data, which is heavily weighted toward recent time. -# * Boundaries are emitted as native-typed SQL literals so the per-chunk -# `WHERE key >= lo AND key < hi` predicate prunes DuckLake parquet files -# (≈ one source scan total, not one scan per chunk). A trailing -# `key IS NULL` chunk catches NULL keys so coverage is complete. -# * The iceberg target is unpartitioned (the DuckDB iceberg extension cannot -# write partitioned tables); chunking is purely on the READ side. -# -# Per chunk, the proven sequence is (CREATE SCHEMA / shell emitted once/table): -# CREATE SCHEMA IF NOT EXISTS iceberg."s"; -# CREATE TABLE IF NOT EXISTS iceberg."s"."t" AS SELECT * FROM ducklake."s"."t" LIMIT 0; -# DELETE FROM iceberg."s"."t" WHERE ; -- idempotent for THIS range -# INSERT INTO iceberg."s"."t" SELECT * FROM ducklake."s"."t" WHERE ; -# -# ── Resume (the connection WILL drop on big runs) ──────────────────────────── -# Recovery is just: run the exact same command again. Two mechanisms decide -# what to skip: -# 1. ROW-COUNT PRE-CHECK (authoritative, table-level): before copying, we -# compare each existing destination table's TOTAL row count to its ducklake -# source. Equal => already fully copied => skip the whole table. Disable -# with --no-verify (e.g. if counting is too slow on huge tables). -# 2. PROGRESS FILE (fast path, chunk-level): after each run we parse the log -# and record every CHUNK whose INSERT actually returned (and every TABLE -# whose chunks all returned). On the next run those are skipped. This is -# what makes a half-finished huge table resume mid-table. Path printed -# each run. Re-doing a chunk is always safe (DELETE range + INSERT range), -# so even a stale boundary set never duplicates rows. 
-# --restart ignores both and re-copies everything. -# -# Connection comes from libpq env vars. duckgres routes on the TLS SNI, so -# PGHOST must be the per-org hostname (.dw.): -# export PGHOST=.dw.us.postwh.com -# export PGUSER=... PGPASSWORD=... PGDATABASE=... PGSSLMODE=require -# -# Usage: -# ./copy-ducklake-to-iceberg.sh [--execute] [--schema NAME] [--table NAME] [--limit N] -# [--chunk-column NAME] [--chunk-rows N] [--max-chunks N] -# [--no-chunk] [--progress FILE] [--restart] [--no-verify] -# -# --schema NAME only this source schema (STRONGLY recommended for a first -# run — `ducklake` can hold thousands of tables / lots of data) -# --table NAME only this table (use with --schema) -# --limit N only the first N matched tables (smoke run) -# --chunk-column NAME force this column as the chunk key for every table that -# has it with a usable (integer/timestamp/date) type; tables -# without it fall back to auto-detect. -# --chunk-rows N target rows per chunk (default 5000000). Smaller = more, -# shorter statements (safer against idle drops, more overhead). -# --max-chunks N cap chunks per table (default 512). -# --no-chunk disable chunking entirely (one whole-table INSERT per table, -# the old behavior). -# --execute actually run (default prints the plan + SQL). -# --progress FILE progress file for resume (default: derived from host+db). -# --restart ignore progress + count-check; re-copy everything. -# --no-verify skip the table-level row-count pre-check. 
-# -set -uo pipefail - -SRC=ducklake -DST=iceberg -EXECUTE=0; SCHEMA=""; TABLE=""; LIMIT=0; PROGRESS=""; RESTART=0; VERIFY=1 -CHUNK_COLUMN=""; CHUNK_ROWS=5000000; MAX_CHUNKS=512; NOCHUNK=0 - -while [ $# -gt 0 ]; do - case "$1" in - --execute) EXECUTE=1 ;; - --schema) SCHEMA="${2:?--schema needs a value}"; shift ;; - --table) TABLE="${2:?--table needs a value}"; shift ;; - --limit) LIMIT="${2:?--limit needs a value}"; shift ;; - --chunk-column) CHUNK_COLUMN="${2:?--chunk-column needs a value}"; shift ;; - --chunk-rows) CHUNK_ROWS="${2:?--chunk-rows needs a value}"; shift ;; - --max-chunks) MAX_CHUNKS="${2:?--max-chunks needs a value}"; shift ;; - --no-chunk) NOCHUNK=1 ;; - --progress) PROGRESS="${2:?--progress needs a value}"; shift ;; - --restart) RESTART=1 ;; - --no-verify) VERIFY=0 ;; - -h|--help) sed -n '2,120p' "$0"; exit 0 ;; - *) echo "unknown arg: $1" >&2; exit 2 ;; - esac - shift -done - -: "${PGHOST:?set PGHOST to the org hostname, e.g. .dw.us.postwh.com}" -: "${PGUSER:?set PGUSER}"; : "${PGDATABASE:?set PGDATABASE}"; : "${PGPASSWORD:?set PGPASSWORD}" -export PGSSLMODE="${PGSSLMODE:-require}" - -q() { psql -X -w -tA -F $'\t' -v ON_ERROR_STOP=1 -c "$1"; } -qid(){ printf '"%s"' "${1//\"/\"\"}"; } - -echo "# $SRC -> $DST host=$PGHOST db=$PGDATABASE user=$PGUSER" -echo "# mode: $([ $EXECUTE = 1 ] && echo EXECUTE || echo 'DRY RUN (pass --execute to run)')" -[ -n "$SCHEMA" ] && echo "# schema=$SCHEMA"; [ -n "$TABLE" ] && echo "# table=$TABLE"; [ "$LIMIT" != 0 ] && echo "# limit=$LIMIT" -if [ "$NOCHUNK" = 1 ]; then - echo "# chunking: DISABLED (--no-chunk; one INSERT per table)" -else - echo "# chunking: target ~$CHUNK_ROWS rows/chunk, max $MAX_CHUNKS chunks/table${CHUNK_COLUMN:+, forced key '$CHUNK_COLUMN'}" -fi - -if [ "$(q "SELECT count(*) FROM duckdb_databases() WHERE database_name='$DST';")" != "1" ]; then - echo "ERROR: catalog '$DST' not attached — reconnect (you may be on a worker activated before iceberg was enabled)." 
>&2 - exit 1 -fi - -# Progress file: keyed by the "schema.table" string (whole table done) and by -# "schema.table#