docs: refine BEP-1053 ROUTER frontend mode after design review#12331
docs: refine BEP-1053 ROUTER frontend mode after design review#12331achimnol wants to merge 1 commit into
Conversation
Resolve all four open questions and fill the design gaps surfaced in a design-review grilling session: - Endpoint placement via scaling-group routing; pick_worker treats ROUTER as unbounded; dedicated-coordinator topology (drop app-filters from the design). - Per-node liveness (ephemeral node_id, valkey liveness set as source of truth, derived `nodes`, dual-mode for backward compat) + REST/metrics exposure. - Visibility model: explicit-at-issuance, RBAC-validated, shrink-only; periodic Manager reconciler for the RBAC-driven shrink (no RBAC events to hook). - Two-tier revocation, documented; opt-in strict all-live-nodes confirmation (ack event carries node_id). - Hash-only key custody (plain SHA-256): token_hash replaces plaintext token. - Single per-authority revision + conditional poll (?known_revision -> 304). - Per-authority publication scope; multi-authority = caller composition. - First-class aliases (one deployment under several names, gated per key). - Atomic key rotation; overlap via two key_ids. - control_mode (manual/strategy-managed) to avoid BEP-1049 dual-writer. - rate_limit documented as per-node best-effort. - ratio = relative weight (0 = drain); Manager-side endpoint_id validation. - Authority discovery via appproxy client list_workers(). - Terminology: define publication, model name/alias, authority/node. Paired with the worker-side doc update in lablup/continuum-router#748. Claude-Session: https://claude.ai/code/session_01DRHPAAYiQxQyMRkzic4Dda
There was a problem hiding this comment.
Pull request overview
This PR updates the BEP-1053 design document to reflect post–design-review decisions for ROUTER frontend mode, clarifying routing placement/HA, key custody/visibility, and desired-state propagation semantics across Manager ↔ coordinator ↔ router worker.
Changes:
- Refines terminology and core entities (publications, aliases, authority/node) and documents per-authority scoping.
- Updates key issuance/visibility and revocation semantics (explicit-at-issuance shrink-only visibility, two-tier revocation, optional strict all-live-nodes confirmation).
- Expands coordinator/worker design details (per-node liveness via
node_id, conditional snapshot polling viaknown_revision, and hash-only key custody).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| | Key generation / issuance | **User** | An explicit action via WebUI / API / CLI; the Manager mints the model API key and persists it with its owner. | | ||
| | Model → endpoint mapping | **User** | Explicit publication management via WebUI / API / CLI: choose the model name, the backing endpoints, and the split ratios (A/B tests, resource-group balancing). | | ||
| | Key → model visibility | **System** | Derived from RBAC: the Manager computes the key's allowed-model set from the owner's scope (domain/project permissions over published models), optionally narrowed further at issuance, and recomputes and re-propagates it whenever RBAC grants or publications change — no manual list maintenance. | | ||
| | Key → model visibility | **User at issuance, System for shrink** | `allowed_models` is an **explicit set chosen at issuance**, validated by the Manager to be ⊆ the owner's RBAC-visible published names (no privilege escalation). It is then **static except for two automatic shrink triggers**: publication-delete / endpoint-destroy (immediate) and owner-loses-RBAC-access (periodic reconcile). It **never auto-expands** — widening is an explicit re-issue. | |
| periodic reconciler (modelled on the sokovan reconciler) recomputes each | ||
| active key's `allowed_models` against current RBAC ∩ published names and | ||
| pushes **shrink** diffs via `PUT /v2/routers/{authority}/api-keys/{key_id}`. |
| - **Backward compatibility (dual-mode).** `node_id` is optional. Per authority, | ||
| the first registration decides the mode: with `node_id` → liveness-set + | ||
| derived `nodes`; without → the legacy `+1/−1` counter path, unchanged. Mixed | ||
| modes within one authority do not occur because all nodes run the same build. | ||
| Updating the stock worker types to send `node_id` is **follow-up work**. |
| `/v1/models` for a key returns `allowed_models ∩ currently-published-names`, | ||
| so aliases are independently gated and never auto-exposed. |
| `accepted_traffics`. The ROUTER worker is the inference/HTTP worker; with the | ||
| unbounded-slot treatment above, `pick_worker()` selects it for inference | ||
| circuits. | ||
| - **Operational constraint (documented, not enforced):** do **not** colocate a |
There was a problem hiding this comment.
Could we fail fast instead of silently misrouting? For example, when an app proxy worker is registered, could we raise an error if a ROUTER and an app proxy worker type that can receive legacy inference traffic are registered at the same time?
| aliases JSONB # ["alias-a", …] additional names that route identically | ||
| mappings JSONB # [{"endpoint_id": UUID, "ratio": float}] |
There was a problem hiding this comment.
Is there a reason to keep this in a JSONB column instead of normalizing it into a separate table?
| `?known_revision=`; an unchanged revision yields a cheap `304`. (An explicit | ||
| query param is used rather than HTTP `ETag`/`If-None-Match` because the |
There was a problem hiding this comment.
For extensibility, I think it would be good to introduce a middleware layer and align the implementation with it.
| publication binds **a primary model name (plus optional aliases)** to **one | ||
| or more deployment endpoints** (each with a split `ratio`) on **one router | ||
| authority**, and carries a `control_mode`. It is the object users | ||
| create/edit/delete; it is stored Manager-side (source of truth) and mirrored |
There was a problem hiding this comment.
Will the same table be added to the manager database with the same schema as well?
Refines BEP-1053: ROUTER Frontend Mode after a design-review grilling session that resolved all four of its open questions and filled the gaps surfaced while walking the design tree.
What changed
Placement & HA
ScalingGroupProxyTarget→ coordinator);pick_workertreats ROUTER as unbounded; dedicated-coordinator topology documented (app-filters dropped from the design — a rarely-used legacy import).node_id, valkey liveness set as source of truth, derivednodes, dual-mode for backward compat; exposed via coordinator REST + Prometheus metrics + Manager fleet view.Keys & visibility
node_id).token_hashreplaces plaintext; a leaked hash is not bearer-equivalent.Data model & propagation
revision+ conditional poll (?known_revision→304).key_ids.control_mode(manual/strategy-managed) to avoid a BEP-1049 dual-writer conflict.rate_limit= per-node best-effort;ratio= relative weight (0= drain); Manager-sideendpoint_idvalidation; authority discovery vialist_workers().Pairing
Worker-side counterpart updated in lablup/continuum-router#748 (
docs/en/architecture/appproxy-worker.md), kept in sync field-for-field.https://claude.ai/code/session_01DRHPAAYiQxQyMRkzic4Dda