Skip to content

docs: refine BEP-1053 ROUTER frontend mode after design review#12331

Open
achimnol wants to merge 1 commit into
mainfrom
bep/refine-1053
Open

docs: refine BEP-1053 ROUTER frontend mode after design review#12331
achimnol wants to merge 1 commit into
mainfrom
bep/refine-1053

Conversation

@achimnol

Copy link
Copy Markdown
Member

Refines BEP-1053: ROUTER Frontend Mode after a design-review grilling session that resolved all four of its open questions and filled the gaps surfaced while walking the design tree.

What changed

Placement & HA

  • Endpoint placement via existing scaling-group routing (ScalingGroupProxyTarget → coordinator); pick_worker treats ROUTER as unbounded; dedicated-coordinator topology documented (app-filters dropped from the design — a rarely-used legacy import).
  • Per-node liveness (§2a): ephemeral node_id, valkey liveness set as source of truth, derived nodes, dual-mode for backward compat; exposed via coordinator REST + Prometheus metrics + Manager fleet view.

Keys & visibility

  • Visibility model: explicit-at-issuance, RBAC-validated, shrink-only; periodic Manager reconciler for the RBAC-driven shrink (RBAC mutations emit no events to hook).
  • Two-tier revocation documented honestly; opt-in strict all-live-nodes confirmation (ack event carries node_id).
  • Hash-only key custody (plain SHA-256): token_hash replaces plaintext; a leaked hash is not bearer-equivalent.

Data model & propagation

  • Single per-authority revision + conditional poll (?known_revision304).
  • Per-authority publication scope; multi-authority = caller composition.
  • First-class aliases (one deployment under several names, gated per key).
  • Atomic key rotation; overlap via two key_ids.
  • control_mode (manual/strategy-managed) to avoid a BEP-1049 dual-writer conflict.
  • rate_limit = per-node best-effort; ratio = relative weight (0 = drain); Manager-side endpoint_id validation; authority discovery via list_workers().
  • Terminology section expanded: defines publication, model name/alias, authority/node.

Pairing

Worker-side counterpart updated in lablup/continuum-router#748 (docs/en/architecture/appproxy-worker.md), kept in sync field-for-field.

https://claude.ai/code/session_01DRHPAAYiQxQyMRkzic4Dda

Resolve all four open questions and fill the design gaps surfaced in a
design-review grilling session:

- Endpoint placement via scaling-group routing; pick_worker treats ROUTER as
  unbounded; dedicated-coordinator topology (drop app-filters from the design).
- Per-node liveness (ephemeral node_id, valkey liveness set as source of truth,
  derived `nodes`, dual-mode for backward compat) + REST/metrics exposure.
- Visibility model: explicit-at-issuance, RBAC-validated, shrink-only; periodic
  Manager reconciler for the RBAC-driven shrink (no RBAC events to hook).
- Two-tier revocation, documented; opt-in strict all-live-nodes confirmation
  (ack event carries node_id).
- Hash-only key custody (plain SHA-256): token_hash replaces plaintext token.
- Single per-authority revision + conditional poll (?known_revision -> 304).
- Per-authority publication scope; multi-authority = caller composition.
- First-class aliases (one deployment under several names, gated per key).
- Atomic key rotation; overlap via two key_ids.
- control_mode (manual/strategy-managed) to avoid BEP-1049 dual-writer.
- rate_limit documented as per-node best-effort.
- ratio = relative weight (0 = drain); Manager-side endpoint_id validation.
- Authority discovery via appproxy client list_workers().
- Terminology: define publication, model name/alias, authority/node.

Paired with the worker-side doc update in lablup/continuum-router#748.

Claude-Session: https://claude.ai/code/session_01DRHPAAYiQxQyMRkzic4Dda
@achimnol achimnol requested a review from a team as a code owner June 21, 2026 18:31
Copilot AI review requested due to automatic review settings June 21, 2026 18:31
@github-actions github-actions Bot added the size:L 100~500 LoC label Jun 21, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the BEP-1053 design document to reflect post–design-review decisions for ROUTER frontend mode, clarifying routing placement/HA, key custody/visibility, and desired-state propagation semantics across Manager ↔ coordinator ↔ router worker.

Changes:

  • Refines terminology and core entities (publications, aliases, authority/node) and documents per-authority scoping.
  • Updates key issuance/visibility and revocation semantics (explicit-at-issuance shrink-only visibility, two-tier revocation, optional strict all-live-nodes confirmation).
  • Expands coordinator/worker design details (per-node liveness via node_id, conditional snapshot polling via known_revision, and hash-only key custody).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

| Key generation / issuance | **User** | An explicit action via WebUI / API / CLI; the Manager mints the model API key and persists it with its owner. |
| Model → endpoint mapping | **User** | Explicit publication management via WebUI / API / CLI: choose the model name, the backing endpoints, and the split ratios (A/B tests, resource-group balancing). |
| Key → model visibility | **System** | Derived from RBAC: the Manager computes the key's allowed-model set from the owner's scope (domain/project permissions over published models), optionally narrowed further at issuance, and recomputes and re-propagates it whenever RBAC grants or publications change — no manual list maintenance. |
| Key → model visibility | **User at issuance, System for shrink** | `allowed_models` is an **explicit set chosen at issuance**, validated by the Manager to be ⊆ the owner's RBAC-visible published names (no privilege escalation). It is then **static except for two automatic shrink triggers**: publication-delete / endpoint-destroy (immediate) and owner-loses-RBAC-access (periodic reconcile). It **never auto-expands** — widening is an explicit re-issue. |
Comment on lines +225 to +227
periodic reconciler (modelled on the sokovan reconciler) recomputes each
active key's `allowed_models` against current RBAC ∩ published names and
pushes **shrink** diffs via `PUT /v2/routers/{authority}/api-keys/{key_id}`.
Comment on lines +343 to +347
- **Backward compatibility (dual-mode).** `node_id` is optional. Per authority,
the first registration decides the mode: with `node_id` → liveness-set +
derived `nodes`; without → the legacy `+1/−1` counter path, unchanged. Mixed
modes within one authority do not occur because all nodes run the same build.
Updating the stock worker types to send `node_id` is **follow-up work**.
Comment on lines +529 to +530
`/v1/models` for a key returns `allowed_models ∩ currently-published-names`,
so aliases are independently gated and never auto-exposed.
`accepted_traffics`. The ROUTER worker is the inference/HTTP worker; with the
unbounded-slot treatment above, `pick_worker()` selects it for inference
circuits.
- **Operational constraint (documented, not enforced):** do **not** colocate a

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we fail fast instead of silently misrouting? For example, when an app proxy worker is registered, could we raise an error if a ROUTER and an app proxy worker type that can receive legacy inference traffic are registered at the same time?

Comment on lines +371 to 372
aliases JSONB # ["alias-a", …] additional names that route identically
mappings JSONB # [{"endpoint_id": UUID, "ratio": float}]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to keep this in a JSONB column instead of normalizing it into a separate table?

Comment on lines +456 to +457
`?known_revision=`; an unchanged revision yields a cheap `304`. (An explicit
query param is used rather than HTTP `ETag`/`If-None-Match` because the

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For extensibility, I think it would be good to introduce a middleware layer and align the implementation with it.

publication binds **a primary model name (plus optional aliases)** to **one
or more deployment endpoints** (each with a split `ratio`) on **one router
authority**, and carries a `control_mode`. It is the object users
create/edit/delete; it is stored Manager-side (source of truth) and mirrored

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the same table be added to the manager database with the same schema as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L 100~500 LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants