docs: refine BEP-1053 ROUTER frontend mode after design review by achimnol · Pull Request #12331 · lablup/backend.ai

achimnol · 2026-06-21T18:31:45Z

Refines BEP-1053: ROUTER Frontend Mode after a design-review grilling session that resolved all four of its open questions and filled the gaps surfaced while walking the design tree.

What changed

Placement & HA

Endpoint placement via existing scaling-group routing (ScalingGroupProxyTarget → coordinator); pick_worker treats ROUTER as unbounded; dedicated-coordinator topology documented (app-filters dropped from the design — a rarely-used legacy import).
Per-node liveness (§2a): ephemeral node_id, valkey liveness set as source of truth, derived nodes, dual-mode for backward compat; exposed via coordinator REST + Prometheus metrics + Manager fleet view.

Keys & visibility

Visibility model: explicit-at-issuance, RBAC-validated, shrink-only; periodic Manager reconciler for the RBAC-driven shrink (RBAC mutations emit no events to hook).
Two-tier revocation documented honestly; opt-in strict all-live-nodes confirmation (ack event carries node_id).
Hash-only key custody (plain SHA-256): token_hash replaces plaintext; a leaked hash is not bearer-equivalent.

Data model & propagation

Single per-authority revision + conditional poll (?known_revision → 304).
Per-authority publication scope; multi-authority = caller composition.
First-class aliases (one deployment under several names, gated per key).
Atomic key rotation; overlap via two key_ids.
control_mode (manual/strategy-managed) to avoid a BEP-1049 dual-writer conflict.
rate_limit = per-node best-effort; ratio = relative weight (0 = drain); Manager-side endpoint_id validation; authority discovery via list_workers().
Terminology section expanded: defines publication, model name/alias, authority/node.

Pairing

Worker-side counterpart updated in lablup/continuum-router#748 (docs/en/architecture/appproxy-worker.md), kept in sync field-for-field.

https://claude.ai/code/session_01DRHPAAYiQxQyMRkzic4Dda

Resolve all four open questions and fill the design gaps surfaced in a design-review grilling session: - Endpoint placement via scaling-group routing; pick_worker treats ROUTER as unbounded; dedicated-coordinator topology (drop app-filters from the design). - Per-node liveness (ephemeral node_id, valkey liveness set as source of truth, derived `nodes`, dual-mode for backward compat) + REST/metrics exposure. - Visibility model: explicit-at-issuance, RBAC-validated, shrink-only; periodic Manager reconciler for the RBAC-driven shrink (no RBAC events to hook). - Two-tier revocation, documented; opt-in strict all-live-nodes confirmation (ack event carries node_id). - Hash-only key custody (plain SHA-256): token_hash replaces plaintext token. - Single per-authority revision + conditional poll (?known_revision -> 304). - Per-authority publication scope; multi-authority = caller composition. - First-class aliases (one deployment under several names, gated per key). - Atomic key rotation; overlap via two key_ids. - control_mode (manual/strategy-managed) to avoid BEP-1049 dual-writer. - rate_limit documented as per-node best-effort. - ratio = relative weight (0 = drain); Manager-side endpoint_id validation. - Authority discovery via appproxy client list_workers(). - Terminology: define publication, model name/alias, authority/node. Paired with the worker-side doc update in lablup/continuum-router#748. Claude-Session: https://claude.ai/code/session_01DRHPAAYiQxQyMRkzic4Dda

Copilot

Pull request overview

This PR updates the BEP-1053 design document to reflect post–design-review decisions for ROUTER frontend mode, clarifying routing placement/HA, key custody/visibility, and desired-state propagation semantics across Manager ↔ coordinator ↔ router worker.

Changes:

Refines terminology and core entities (publications, aliases, authority/node) and documents per-authority scoping.
Updates key issuance/visibility and revocation semantics (explicit-at-issuance shrink-only visibility, two-tier revocation, optional strict all-live-nodes confirmation).
Expands coordinator/worker design details (per-node liveness via node_id, conditional snapshot polling via known_revision, and hash-only key custody).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

 | Key generation / issuance | **User** | An explicit action via WebUI / API / CLI; the Manager mints the model API key and persists it with its owner. |
 | Model → endpoint mapping | **User** | Explicit publication management via WebUI / API / CLI: choose the model name, the backing endpoints, and the split ratios (A/B tests, resource-group balancing). |
-| Key → model visibility | **System** | Derived from RBAC: the Manager computes the key's allowed-model set from the owner's scope (domain/project permissions over published models), optionally narrowed further at issuance, and recomputes and re-propagates it whenever RBAC grants or publications change — no manual list maintenance. |
+| Key → model visibility | **User at issuance, System for shrink** | `allowed_models` is an **explicit set chosen at issuance**, validated by the Manager to be ⊆ the owner's RBAC-visible published names (no privilege escalation). It is then **static except for two automatic shrink triggers**: publication-delete / endpoint-destroy (immediate) and owner-loses-RBAC-access (periodic reconcile). It **never auto-expands** — widening is an explicit re-issue. |


+  periodic reconciler (modelled on the sokovan reconciler) recomputes each
+  active key's `allowed_models` against current RBAC ∩ published names and
+  pushes **shrink** diffs via `PUT /v2/routers/{authority}/api-keys/{key_id}`.


+- **Backward compatibility (dual-mode).** `node_id` is optional. Per authority,
+  the first registration decides the mode: with `node_id` → liveness-set +
+  derived `nodes`; without → the legacy `+1/−1` counter path, unchanged. Mixed
+  modes within one authority do not occur because all nodes run the same build.
+  Updating the stock worker types to send `node_id` is **follow-up work**.


+  `/v1/models` for a key returns `allowed_models ∩ currently-published-names`,
+  so aliases are independently gated and never auto-exposed.


jopemachine · 2026-06-24T06:11:48Z

+  `accepted_traffics`. The ROUTER worker is the inference/HTTP worker; with the
+  unbounded-slot treatment above, `pick_worker()` selects it for inference
+  circuits.
+- **Operational constraint (documented, not enforced):** do **not** colocate a


Could we fail fast instead of silently misrouting? For example, when an app proxy worker is registered, could we raise an error if a ROUTER and an app proxy worker type that can receive legacy inference traffic are registered at the same time?

jopemachine · 2026-06-24T06:54:34Z

+    aliases JSONB         # ["alias-a", …] additional names that route identically
    mappings JSONB        # [{"endpoint_id": UUID, "ratio": float}]


Is there a reason to keep this in a JSONB column instead of normalizing it into a separate table?

jopemachine · 2026-06-24T07:10:51Z

+`?known_revision=`; an unchanged revision yields a cheap `304`. (An explicit
+query param is used rather than HTTP `ETag`/`If-None-Match` because the


For extensibility, I think it would be good to introduce a middleware layer and align the implementation with it.

jopemachine · 2026-06-24T07:42:28Z

+  publication binds **a primary model name (plus optional aliases)** to **one
+  or more deployment endpoints** (each with a split `ratio`) on **one router
+  authority**, and carries a `control_mode`. It is the object users
+  create/edit/delete; it is stored Manager-side (source of truth) and mirrored


Will the same table be added to the manager database with the same schema as well?

achimnol requested a review from a team as a code owner June 21, 2026 18:31

Copilot AI review requested due to automatic review settings June 21, 2026 18:31

github-actions Bot assigned achimnol Jun 21, 2026

github-actions Bot added the size:L 100~500 LoC label Jun 21, 2026

Copilot started reviewing on behalf of achimnol June 21, 2026 18:32 View session

Copilot AI reviewed Jun 21, 2026

View reviewed changes

jopemachine reviewed Jun 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

docs: refine BEP-1053 ROUTER frontend mode after design review#12331

docs: refine BEP-1053 ROUTER frontend mode after design review#12331
achimnol wants to merge 1 commit into
mainfrom
bep/refine-1053

achimnol commented Jun 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

jopemachine Jun 24, 2026

Uh oh!

jopemachine Jun 24, 2026

Uh oh!

jopemachine Jun 24, 2026

Uh oh!

jopemachine Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		`/v1/models` for a key returns `allowed_models ∩ currently-published-names`,
		so aliases are independently gated and never auto-exposed.

		aliases JSONB # ["alias-a", …] additional names that route identically
		mappings JSONB # [{"endpoint_id": UUID, "ratio": float}]

		`?known_revision=`; an unchanged revision yields a cheap `304`. (An explicit
		query param is used rather than HTTP `ETag`/`If-None-Match` because the

Uh oh!

Conversation

achimnol commented Jun 21, 2026

What changed

Pairing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

jopemachine Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

jopemachine Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

jopemachine Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

jopemachine Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants