Context
docs/design/worker-ttl-pool.md documents the current remote/K8s worker TTL model and explicitly leaves per-org reserved minimum capacity as future work. Today, small/default queries benefit from hot-idle reuse only when a compatible worker already exists. After an idle gap, the first small query still has to wait for a new worker pod to become query-ready.
Why this matters
In testing, spawning a worker on an existing node still took roughly 3-5 seconds before the query could run. Because the node already existed, that delay is not just node provisioning; it includes pod creation/startup, worker process startup, DuckDB extension/bootstrap/warmup, control-plane health-check confirmation, and tenant activation.
That latency is acceptable for cold or large batch work, but it is a poor fit for small/interactive queries where users expect the query path to feel already warm. The current TTL model improves latency for repeated queries, but it does not protect the first small query after all compatible hot-idle workers have expired.
Desired outcome
Small/default query traffic should have an operator-controlled way to avoid paying cold K8s worker startup latency after ordinary idle gaps, while preserving the existing remote-worker safety properties: one query session per worker, tenant isolation, version-aware reuse, and predictable per-query resources.
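One way to picture the desired outcome is a per-org warm reserve layered on top of the existing TTL model. The Python below is a minimal sketch under stated assumptions, not the actual pool implementation: `Worker`, `State`, `workers_to_spawn`, `expirable_workers`, and the `min_idle` knob are hypothetical names that do not come from docs/design/worker-ttl-pool.md. It shows how a reserve could keep the listed safety properties: busy workers never count toward the reserve (one query session per worker), only workers for the same org and worker version qualify (tenant isolation, version-aware reuse), and TTL expiry skips the most recently idled `min_idle` workers.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    STARTING = "starting"  # pod created, not yet query-ready
    HOT_IDLE = "hot_idle"  # query-ready, no session attached
    BUSY = "busy"          # serving exactly one query session


@dataclass(frozen=True)
class Worker:
    # Hypothetical worker record; field names are illustrative only.
    worker_id: str
    org: str
    version: str
    state: State
    idle_since: float = 0.0  # seconds; only meaningful for HOT_IDLE


def _compatible(w: Worker, org: str, version: str) -> bool:
    # Tenant isolation and version-aware reuse: only workers for the same
    # org and the same worker version may satisfy that org's reserve.
    return w.org == org and w.version == version


def workers_to_spawn(workers, org, version, min_idle):
    """Number of workers to pre-spawn so the org keeps `min_idle`
    compatible workers that are hot-idle or already starting."""
    available = sum(
        1
        for w in workers
        if _compatible(w, org, version)
        and w.state in (State.HOT_IDLE, State.STARTING)
    )
    # Busy workers never count: each one is pinned to a single query session.
    return max(0, min_idle - available)


def expirable_workers(workers, org, version, min_idle, now, ttl):
    """Compatible hot-idle workers whose TTL has elapsed, excluding the
    `min_idle` most recently idled ones, which are kept as the reserve."""
    idle = sorted(
        (w for w in workers
         if _compatible(w, org, version) and w.state is State.HOT_IDLE),
        key=lambda w: w.idle_since,
        reverse=True,  # most recently idled first
    )
    candidates = idle[min_idle:]  # everything beyond the reserve
    return [w for w in candidates if now - w.idle_since >= ttl]


# Usage: one busy and one hot-idle worker for acme/v2, plus workers that
# must not be reused for it (stale version, different tenant).
pool = [
    Worker("w1", "acme", "v2", State.BUSY),
    Worker("w2", "acme", "v2", State.HOT_IDLE, idle_since=100.0),
    Worker("w3", "acme", "v1", State.HOT_IDLE, idle_since=50.0),
    Worker("w4", "other", "v2", State.HOT_IDLE, idle_since=90.0),
]
print(workers_to_spawn(pool, "acme", "v2", min_idle=2))  # 1
```

Counting `STARTING` workers toward the reserve avoids over-spawning while a replacement pod is still becoming query-ready; whether a real implementation should do that, and how `min_idle` would be configured per org, are open design questions.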
Related
- docs/design/worker-ttl-pool.md: remote/K8s spawn-on-demand + hot-idle worker model