Merged
4 changes: 2 additions & 2 deletions justfile
@@ -260,7 +260,7 @@ test:
# Run unit tests only
[group('test')]
test-unit:
-    go test -v -p 1 . ./configresolve/... ./duckdbservice/... ./server/... ./transpiler/... ./internal/...
+    go test -v -p 1 . ./configresolve/... ./duckdbservice/... ./server/... ./transpiler/... ./internal/... ./tests/manifests/...

# Run cache-proxy tests
[group('test')]
@@ -286,7 +286,7 @@ test-configstore-integration:
# Run Kubernetes-only control plane package tests
[group('test')]
test-controlplane-k8s:
-    go test -v -count=1 -tags kubernetes ./controlplane ./controlplane/admin ./controlplane/provisioner
+    go test -v -count=1 -tags kubernetes . ./controlplane ./controlplane/admin ./controlplane/provisioner

# Print the test impact plan for the current branch
[group('test')]
72 changes: 71 additions & 1 deletion k8s/README.md
@@ -43,13 +43,16 @@ The control plane handles TLS, authentication, PostgreSQL wire protocol, and SQL
| File | Description |
|------|-------------|
| `namespace.yaml` | `duckgres` namespace |
-| `rbac.yaml` | Control-plane and shared worker ServiceAccounts, Role (pods + secrets), RoleBinding |
+| `rbac.yaml` | Control-plane, shared worker, and query-log writer ServiceAccounts plus required Roles/RoleBindings |
| `configmap.yaml` | Shared duckgres config (users, extensions, data dir) |
| `secret.yaml` | Bearer token secret (auto-populated by CP if empty) |
| `managed-warehouse-secrets.yaml` | Local secret payloads referenced by the seeded managed-warehouse contract |
| `worker-identity.yaml` | Local worker ServiceAccount referenced by the seeded managed-warehouse contract |
| `networkpolicy.yaml` | Restricts worker ingress to CP pods only |
| `control-plane-multitenant-local.yaml` | Optional OrbStack-oriented shared-worker control-plane manifest |
| `query-log-kafka-config.example.yaml` | Example ConfigMap for enabling Kafka query-log producer and writer settings |
| `query-log-writer.yaml` | Disabled-by-default query-log writer Deployment and metrics Service |
| `query-log-writer-alerts.example.yaml` | Example PrometheusRule alerts for writer failures, drops, and high retry volume |
| `kind/config-store.overlay.yaml` | Compose overlay that attaches local dependency containers to the external Docker `kind` network |
| `kind/config-store.seed.sql` | Kind-oriented managed-warehouse seed for the shared-worker flow |
| `kind/control-plane.yaml` | Kind-first shared-worker control-plane manifest used by local dev and CI |
@@ -90,6 +93,73 @@ For seamless planned deployments, use a rolling strategy with overlap and enough

That gives the old replica time to fail readiness, stop taking new pgwire sessions, keep existing pgwire and Flight sessions alive during the drain window, and then force shutdown at the timeout boundary if sessions remain.

## Query Log Kafka Writer

By default, query logging writes directly to each tenant's DuckLake-backed
`ducklake.system.query_log`. To route query logs through Kafka instead, deploy
the producer config and query-log writer:

```bash
cp k8s/query-log-kafka-config.example.yaml /tmp/duckgres-query-log-kafka.yaml
# edit brokers, topic, config_store, aws_region, and k8s_worker_namespace
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/rbac.yaml
kubectl apply -f k8s/networkpolicy.yaml
kubectl apply -f /tmp/duckgres-query-log-kafka.yaml
kubectl apply -f k8s/query-log-writer.yaml
# apply or upgrade the relevant control-plane manifest/chart so the
# DUCKGRES_QUERY_LOG_* env refs exist before restarting control-plane pods
kubectl apply -f k8s/kind/control-plane.yaml
kubectl -n duckgres scale deploy/duckgres-query-log-writer --replicas=1
kubectl -n duckgres rollout restart deploy/duckgres-control-plane
```

The control-plane manifests already declare optional `DUCKGRES_QUERY_LOG_*`
environment variables sourced from the `duckgres-query-log-kafka` ConfigMap.
When the ConfigMap is absent, those variables stay unset and direct DuckLake
logging is unchanged. When it is present with `sink: kafka`, control-plane pods
publish query-log events to Kafka and the writer consumes the configured topic
as the consumer group named by `group_id`. The writer Deployment requires
`config_store`, `brokers`, and `topic` from the ConfigMap, so missing runtime
config fails at pod start instead of letting the writer retry events without a
tenant target.
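
As a rough preflight before scaling the writer up, the required keys can be
checked by hand. The sketch below is an illustration only and is not part of
Duckgres: `missing_writer_keys` is a hypothetical helper that takes `key=value`
pairs and prints whichever of the required keys (`config_store`, `brokers`,
`topic`) are absent or empty.

```bash
# Hypothetical helper, not shipped with Duckgres.
# Prints the required writer keys that are missing or empty among the
# key=value arguments; prints an empty line when all three are set.
missing_writer_keys() {
  missing=""
  for key in config_store brokers topic; do
    value=""
    for pair in "$@"; do
      case "$pair" in
        # Match "key=..." and keep everything after the first "=".
        "$key="*) value="${pair#*=}" ;;
      esac
    done
    if [ -z "$value" ]; then
      missing="${missing:+$missing }$key"
    fi
  done
  echo "$missing"
}
```

For example, `missing_writer_keys "brokers=kafka:9092" "topic="` prints
`config_store topic`. The pairs could be filled from lookups such as
`kubectl -n duckgres get configmap duckgres-query-log-kafka -o jsonpath='{.data.brokers}'`.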

Multiple writer replicas can use the same `group_id`; Kafka partition ownership
keeps each partition assigned to one active consumer in the group. Scale replicas
up to the topic partition count when throughput requires it; replicas beyond
that count have no partition to own and stay idle.
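
The scaling limit is simple arithmetic. As a minimal sketch, assuming standard
Kafka consumer-group assignment where a partition has at most one owner in the
group, the number of replicas that actually consume is the smaller of the
replica count and the partition count. `active_writers` is a hypothetical name
used only for this illustration.

```bash
# Hypothetical helper: how many writer replicas can own at least one
# partition, given the replica count ($1) and topic partition count ($2).
active_writers() {
  replicas=$1
  partitions=$2
  if [ "$replicas" -lt "$partitions" ]; then
    echo "$replicas"
  else
    echo "$partitions"
  fi
}
```

With a 6-partition topic, `active_writers 3 6` prints `3` and
`active_writers 8 6` prints `6`, so the two extra replicas in the second case
add no throughput.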

The writer is a privileged infrastructure service. It never executes logged SQL;
it resolves the target tenant by `org_id`, attaches that tenant's DuckLake, and
inserts generated rows into `ducklake.system.query_log`. The reference RBAC
grants read-only Duckling CR access plus `get` on Secrets in the `duckgres`
namespace. If managed-warehouse SecretRefs point to other namespaces, grant the
same get-only Secret access for those namespaces. Kubernetes RBAC does not grant
cloud credentials: when `aws_region` is set and Duckling-backed tenants require
STS, bind the `duckgres-query-log-writer` ServiceAccount to an IAM/Pod
Identity/IRSA role with the same STS permissions needed to resolve tenant object
store credentials.

Rollback:

```bash
kubectl -n duckgres scale deploy/duckgres-query-log-writer --replicas=0
kubectl -n duckgres delete configmap duckgres-query-log-kafka
kubectl -n duckgres rollout restart deploy/duckgres-control-plane
```

Useful checks:

```bash
kubectl -n duckgres get deploy duckgres-query-log-writer
kubectl -n duckgres logs deploy/duckgres-query-log-writer --tail=200
kubectl -n duckgres port-forward svc/duckgres-query-log-writer-metrics 9090:9090
curl -s localhost:9090/metrics | rg 'duckgres_query_log_kafka_writer'
```

If your cluster runs the Prometheus Operator, adapt and apply
`k8s/query-log-writer-alerts.example.yaml`. Kafka consumer lag usually comes
from your Kafka exporter rather than Duckgres itself, so wire a lag alert from
that metric source alongside these writer-process alerts.

## Local Development with kind

The primary shared-worker workflow now uses [`kind`](https://kind.sigs.k8s.io/). Prerequisites: Docker, `kubectl`, `kind`, and `just`.
24 changes: 24 additions & 0 deletions k8s/control-plane-multitenant-local.yaml
@@ -42,6 +42,30 @@ spec:
              value: "500m"
            - name: DUCKGRES_K8S_WORKER_MEMORY_REQUEST
              value: "512Mi"
            - name: DUCKGRES_QUERY_LOG_SINK
              valueFrom:
                configMapKeyRef:
                  name: duckgres-query-log-kafka
                  key: sink
                  optional: true
            - name: DUCKGRES_QUERY_LOG_KAFKA_BROKERS
              valueFrom:
                configMapKeyRef:
                  name: duckgres-query-log-kafka
                  key: brokers
                  optional: true
            - name: DUCKGRES_QUERY_LOG_KAFKA_TOPIC
              valueFrom:
                configMapKeyRef:
                  name: duckgres-query-log-kafka
                  key: topic
                  optional: true
            - name: DUCKGRES_QUERY_LOG_KAFKA_CLIENT_ID
              valueFrom:
                configMapKeyRef:
                  name: duckgres-query-log-kafka
                  key: client_id
                  optional: true
          args:
            - "--mode"
            - "control-plane"
24 changes: 24 additions & 0 deletions k8s/kind/control-plane.yaml
@@ -42,6 +42,30 @@ spec:
              value: "500m"
            - name: DUCKGRES_K8S_WORKER_MEMORY_REQUEST
              value: "512Mi"
            - name: DUCKGRES_QUERY_LOG_SINK
              valueFrom:
                configMapKeyRef:
                  name: duckgres-query-log-kafka
                  key: sink
                  optional: true
            - name: DUCKGRES_QUERY_LOG_KAFKA_BROKERS
              valueFrom:
                configMapKeyRef:
                  name: duckgres-query-log-kafka
                  key: brokers
                  optional: true
            - name: DUCKGRES_QUERY_LOG_KAFKA_TOPIC
              valueFrom:
                configMapKeyRef:
                  name: duckgres-query-log-kafka
                  key: topic
                  optional: true
            - name: DUCKGRES_QUERY_LOG_KAFKA_CLIENT_ID
              valueFrom:
                configMapKeyRef:
                  name: duckgres-query-log-kafka
                  key: client_id
                  optional: true
          args:
            - "--mode"
            - "control-plane"
5 changes: 5 additions & 0 deletions k8s/namespace.yaml
@@ -2,3 +2,8 @@ apiVersion: v1
kind: Namespace
metadata:
  name: duckgres
---
apiVersion: v1
kind: Namespace
metadata:
  name: ducklings
44 changes: 44 additions & 0 deletions k8s/networkpolicy.yaml
@@ -77,5 +77,49 @@ spec:
          protocol: TCP
        - port: 5432
          protocol: TCP
        # Kafka query-log producer sink. Adjust or extend in production when
        # brokers listen on different ports.
        - port: 9092
          protocol: TCP
        - port: 9093
          protocol: TCP
        - port: 9094
          protocol: TCP
        - port: 8816
          protocol: TCP
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: duckgres-query-log-writer-boundaries
  namespace: duckgres
spec:
  podSelector:
    matchLabels:
      app: duckgres-query-log-writer
  policyTypes:
    - Egress
  egress:
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
        # Kubernetes API, STS, object stores, and extension bootstrap.
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        # Tenant DuckLake metadata stores.
        - port: 5432
          protocol: TCP
        # Local MinIO/object-store development endpoint.
        - port: 9000
          protocol: TCP
        # Kafka query-log consumer.
        - port: 9092
          protocol: TCP
        - port: 9093
          protocol: TCP
        - port: 9094
          protocol: TCP
19 changes: 19 additions & 0 deletions k8s/query-log-kafka-config.example.yaml
@@ -0,0 +1,19 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: duckgres-query-log-kafka
  namespace: duckgres
data:
  # Set sink=kafka on control-plane pods to publish query-log events instead of
  # writing directly to DuckLake. The query-log-writer consumes the same topic
  # and writes events into each tenant's ducklake.system.query_log table.
  sink: kafka
  brokers: "kafka:9092"
  topic: "duckgres_query_log"
  client_id: "duckgres-query-log"
  group_id: "duckgres-query-log-writer"

  # Query-log writer runtime. Use the same config store as the control plane.
  config_store: "postgres://duckgres:duckgres@duckgres-config-store:5432/duckgres_config?sslmode=disable"
  aws_region: ""
  k8s_worker_namespace: "duckgres"
33 changes: 33 additions & 0 deletions k8s/query-log-writer-alerts.example.yaml
@@ -0,0 +1,33 @@
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: duckgres-query-log-writer
  namespace: duckgres
spec:
  groups:
    - name: duckgres-query-log-writer
      rules:
        - alert: DuckgresQueryLogWriterFailures
          expr: increase(duckgres_query_log_kafka_writer_events_total{outcome="failed"}[10m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Duckgres query-log writer is failing to process events
            description: Query-log writer failures usually mean Kafka, tenant DuckLake attach, or query_log inserts are failing. Check writer logs before offsets pile up.
        - alert: DuckgresQueryLogWriterDrops
          expr: increase(duckgres_query_log_kafka_writer_events_total{outcome="dropped"}[10m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Duckgres query-log writer is dropping events
            description: Dropped query-log events are intentionally committed invalid or non-targetable events. Check writer logs for the drop reason.
        - alert: DuckgresQueryLogWriterRetriesHigh
          expr: increase(duckgres_query_log_kafka_writer_events_total{outcome="retried"}[10m]) > 100
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: Duckgres query-log writer retry rate is high
            description: The writer is retrying Kafka events repeatedly. Check Kafka commit errors, DuckLake attach errors, and tenant metadata-store connectivity.