Small queries should avoid K8s worker cold-start latency #761

Description

@bill-ph

Context

docs/design/worker-ttl-pool.md documents the current remote/K8s worker TTL model and explicitly leaves per-org reserved minimum capacity as future work. Today, small/default queries benefit from hot-idle reuse only when a compatible worker already exists. After an idle gap, the first small query still has to wait for a new worker pod to become query-ready.

Why this matters

In testing, spawning a worker on an existing node still took roughly 3-5 seconds before the query could run. So the delay is not just node provisioning: even with a node already available, it includes pod creation/startup, worker process startup, DuckDB extension/bootstrap/warmup, control-plane health-check confirmation, and tenant activation.

That latency is acceptable for cold or large batch work, but it is a poor fit for small/interactive queries where users expect the query path to feel already warm. The current TTL model improves repeated queries, but it does not protect the first small query after all compatible hot-idle workers have expired.

Desired outcome

Small/default query traffic should have an operator-controlled way to avoid paying cold K8s worker startup latency after ordinary idle gaps, while preserving the existing remote-worker safety properties: one query session per worker, tenant isolation, version-aware reuse, and predictable per-query resources.
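
To make the desired behavior concrete, here is a minimal sketch of one possible shape: a per-org warm floor layered on top of TTL expiry. It is not the existing implementation and not a proposal for specific config names. `IdleWorker`, `plan_idle_pool`, `ttl_seconds`, and `min_warm` are all hypothetical, and the sketch assumes compatibility means "same org and same worker version".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleWorker:
    """Hypothetical view of a hot-idle worker (not the project's real type)."""
    worker_id: str
    org: str
    version: str
    idle_seconds: float


def plan_idle_pool(workers, org, version, ttl_seconds, min_warm):
    """Decide what to do with idle workers compatible with (org, version).

    Returns (keep, retire, to_spawn):
      keep     - workers that stay hot-idle
      retire   - workers past their TTL that are not needed for the floor
      to_spawn - extra workers to pre-start so the floor is met

    Workers for other orgs or versions are ignored here, so tenant
    isolation and version-aware reuse are unaffected by this decision.
    """
    compatible = [w for w in workers if w.org == org and w.version == version]
    # Freshest first, so expired workers are only kept to fill the floor.
    compatible.sort(key=lambda w: w.idle_seconds)

    keep, retire = [], []
    for w in compatible:
        if w.idle_seconds < ttl_seconds or len(keep) < min_warm:
            keep.append(w)
        else:
            retire.append(w)

    to_spawn = max(0, min_warm - len(keep))
    return keep, retire, to_spawn


if __name__ == "__main__":
    idle = [
        IdleWorker("w1", "acme", "v2", idle_seconds=400),
        IdleWorker("w2", "acme", "v2", idle_seconds=900),
        IdleWorker("w3", "other", "v2", idle_seconds=5),
    ]
    keep, retire, to_spawn = plan_idle_pool(
        idle, org="acme", version="v2", ttl_seconds=300, min_warm=1
    )
    print([w.worker_id for w in keep], [w.worker_id for w in retire], to_spawn)
```

With `min_warm=0` this reduces to plain TTL expiry, which is the behavior described above: once every compatible worker is past its TTL, the next small query pays the cold start. With `min_warm=1`, one compatible worker survives the idle gap (or is pre-started if none exists), so the first small query lands on a worker that is already query-ready. Each kept worker still serves one query session at a time; the floor only changes when workers are retired or pre-started.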

Related

  • docs/design/worker-ttl-pool.md
  • Remote/K8s spawn-on-demand + hot-idle worker model
