bootjp · May 22, 2026 · gemini-code-assist
diff --git a/.../2026_05_16_proposed_admin_purge_queue.md → ...26_05_16_implemented_admin_purge_queue.md b/.../2026_05_16_proposed_admin_purge_queue.md → ...26_05_16_implemented_admin_purge_queue.md
@@ -1,9 +1,25 @@
 # Admin Queue Peek and Purge (DLQ-Aware) for the SQS Web Console
 
-**Status:** Proposed
+**Status:** Implemented
 **Author:** bootjp
 **Date:** 2026-05-16
 
+## Implementation history
+
+| Phase | PR | Landed |
+|-------|-----|--------|
+| 1 (this doc, proposal) | #757 | 2026-05-16 |
+| 2 (`AdminPurgeQueue` + `IsDLQ`/`DLQSources`) | #771 | 2026-05-17 |
+| 3 (`AdminPeekQueue` backend) | #794 | 2026-05-20 |
+| 4 (HTTP handler + bridge) | #797 | 2026-05-21 |
+| 5 (SPA Messages tab + Purge button + DLQ chips) | #798 | 2026-05-21 |
+
+Out-of-scope follow-ups (tracked separately, not gating this rename):
+- Throttle integration (`bucketActionAdminPeek` + dedicated per-queue admin-peek bucket per §3.1)
+- Audit logging + Prometheus counters per §3.6
+- `principalForReadSensitive` live `RoleStore` re-check (Goal 8, blocked on wider RoleStore plumbing)
+- Page-size selector (20 / 50 / 100) + response-size warning
+
 ---
 
 ## 1. Background and Motivation
@@ -42,7 +58,7 @@ Both features work for any queue. The **UI is DLQ-aware** so the operator gets t
    - `DLQSources []string` — the source-queue names that point at this queue. The SPA renders these as a chip list on the detail page so the operator can confirm they understand what queue feeds the DLQ before purging.
 6. **Same AWS-shaped error mapping** as the SigV4 path — purging more than once per 60 seconds returns the SQS `PurgeQueueInProgress` semantics that `tryPurgeQueueOnce` already enforces. The admin response surfaces it as a structured `429 Too Many Requests` JSON payload (`{"code":"PurgeQueueInProgress", "retry_after_seconds":N}`).
 7. **Audit** — `admin.sqs.purge_queue` (subject, role, queue, generation_before, generation_after). Peek is a read and does NOT generate an audit line per call (the SPA polls; per-poll audit would drown the log) but the admin handler emits the standard request-log line with `route` / `subject` / `status_code` so the call is still traceable.
-8. **Read-only role can peek but not purge.** Peek is gated by a **live `RoleStore` re-check** (not just the session-auth gate that List / Describe currently use), introduced as a new `principalForReadSensitive` helper alongside the existing `principalForWrite`. Purge stays gated by `principalForWrite` (live-role re-check), matching `AdminDeleteQueue` exactly. Codex r9 P1 flagged the security gap in the earlier draft: peek exposes full message bodies / attributes (not just metadata like List / Describe), so a session JWT that was revoked or whose role was downgraded after login could still read DLQ payloads via peek until the token's natural 1-hour TTL elapsed. The new `principalForReadSensitive` helper performs the same revocation check `principalForWrite` does, but classifies the call as a read in the audit pipeline — keeping the audit shape parallel to List / Describe while closing the confidentiality gap. List / Describe themselves remain on the session-auth-only gate because their output is metadata that is already shown on the SPA's queue list page; the divergence is intentional and is documented at the call site so a future reviewer does not "fix" the inconsistency by downgrading peek's gate. Claude r2 caught the earlier draft that implied a non-existent `principalForRead` helper; this paragraph spells out the actual gate with the security-class distinction.
+8. **Read-only role can peek but not purge.** _The live `RoleStore` re-check is NOT yet implemented in the initial rollout (see "Out-of-scope follow-ups" at the top). Phase 4 shipped a `Role.AllowsRead()` gate that accepts the JWT-embedded role plus an optional RoleStore lookup; the wider live-revalidation plumbing the design calls for is blocked on a broader RoleStore refactor that affects every adapter's read path. Residual risk in the meantime: a revoked or downgraded key keeps peek access until the session JWT's natural 1-hour TTL expires._ In the target design, peek is gated by a **live `RoleStore` re-check** (not just the session-auth gate that List / Describe currently use), introduced as a new `principalForReadSensitive` helper alongside the existing `principalForWrite`. Purge stays gated by `principalForWrite` (live-role re-check), matching `AdminDeleteQueue` exactly. Codex r9 P1 flagged the security gap in the earlier draft: peek exposes full message bodies and attributes (not just metadata like List / Describe), so a session JWT that was revoked, or whose role was downgraded after login, could still read DLQ payloads via peek until the token's natural 1-hour TTL elapsed. The `principalForReadSensitive` helper performs the same revocation check as `principalForWrite` but classifies the call as a read in the audit pipeline, keeping the audit shape parallel to List / Describe while closing the confidentiality gap. List / Describe themselves remain on the session-auth-only gate because their output is metadata already shown on the SPA's queue list page; the divergence is intentional and is documented at the call site so a future reviewer does not "fix" the inconsistency by downgrading peek's gate. Claude r2 caught an earlier draft that implied a non-existent `principalForRead` helper; this paragraph spells out the actual gate with the security-class distinction.
 
 ### 2.2 Non-Goals
 
@@ -172,7 +188,7 @@ The walk terminates when either `Partition` advances back to `StartPartition` (f
 
 Cost is `O(Limit)` round-trips against Pebble at peek time — tiny for the bounded result sets the SPA uses. The bound on `Limit` (max 100) prevents an operator script from accidentally issuing million-row peeks against the leader.
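A sketch of the `Limit` bound as a request-validation step. Only the maximum of 100 comes from the text. Treating 0 as "use a default of 20" and rejecting out-of-range values rather than clamping them are assumptions, as are the names `normalizePeekLimit` and `errLimitOutOfRange`. Rejecting makes a runaway script fail loudly instead of quietly receiving a smaller page than it asked for.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxPeekLimit     = 100 // from the design text
	defaultPeekLimit = 20  // assumption: mirrors the SPA's fixed page size
)

var errLimitOutOfRange = errors.New("limit must be between 1 and 100")

// normalizePeekLimit maps 0 to the default and rejects anything outside
// 1..maxPeekLimit instead of clamping it.
func normalizePeekLimit(requested int) (int, error) {
	switch {
	case requested == 0:
		return defaultPeekLimit, nil
	case requested < 0 || requested > maxPeekLimit:
		return 0, errLimitOutOfRange
	default:
		return requested, nil
	}
}

func main() {
	for _, req := range []int{0, 50, 1_000_000} {
		limit, err := normalizePeekLimit(req)
		fmt.Printf("requested=%d limit=%d err=%v\n", req, limit, err)
	}
}
```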
 
-**Throttle.** Peek consults a **distinct per-queue admin-peek bucket**, *not* the per-queue `ReceiveMessage` budget. An earlier draft of this design merged the two; Claude r2 flagged that an operator paginating through a 10k-message DLQ could exhaust the budget that real consumers depend on. The separate admin-peek bucket defaults to a lower steady-rate (`adminPeekRPS = 5`, `adminPeekBurst = 20`) so a pagination loop cannot starve consumers.
+**Throttle.** _Not yet implemented in the initial rollout (see "Out-of-scope follow-ups" at the top). Interim mitigation: the Phase 3 implementation enforces a hard `Limit ≤ 100` per call and leader-only execution, which bounds the cost of any single peek but not the rate of peeks; per-queue throttling lands in a follow-up, together with the SPA wiring that gives its rate-limit metric a real consumer._ As designed, peek consults a **distinct per-queue admin-peek bucket**, *not* the per-queue `ReceiveMessage` budget. An earlier draft of this design merged the two; Claude r2 flagged that an operator paginating through a 10k-message DLQ could exhaust the budget that real consumers depend on. The separate admin-peek bucket defaults to a lower steady-state rate (`adminPeekRPS = 5`, `adminPeekBurst = 20`) so a pagination loop cannot starve consumers.
 
 **Bucket key format.** The existing `bucketStore` (`adapter/sqs_throttle.go`) keys on a struct `bucketKey{queue, action, incarnation}`, not a string. The admin-peek bucket therefore uses `bucketStore.charge()` directly with `action = bucketActionAdminPeek` and the queue's current incarnation, exactly like the `SendMessage` / `ReceiveMessage` paths do. Claude r4 flagged an earlier draft that described the bucket as a free-standing string-keyed map; that would have required parallel rate-limiter infrastructure and would not have been swept by `invalidateQueue()` on queue re-creation. The `bucketStore.charge(adminPeekThrottle, queueName, bucketActionAdminPeek, meta.Incarnation, 1)` form participates in the existing incarnation reset machinery automatically.
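A stripped-down Go sketch of why the struct key matters. Because `incarnation` is part of `bucketKey`, a re-created queue starts with a fresh admin-peek bucket. The admin-peek budget never touches the `ReceiveMessage` budget, and `invalidateQueue()` sweeps every action's bucket for the queue in one pass. The field names follow the text above. The enum values, the `remaining` map, the capacity-only `charge` signature (no refill), and the `uint64` incarnation type are assumptions and do not reflect the actual `adapter/sqs_throttle.go` code.

```go
package main

import "fmt"

type bucketAction uint8

const (
	bucketActionSend bucketAction = iota
	bucketActionReceive
	bucketActionAdminPeek // hypothetical value for the new action
)

// bucketKey mirrors the {queue, action, incarnation} struct key.
type bucketKey struct {
	queue       string
	action      bucketAction
	incarnation uint64
}

type bucketStore struct {
	remaining map[bucketKey]int // tokens left per bucket; refill omitted
}

func newBucketStore() *bucketStore {
	return &bucketStore{remaining: map[bucketKey]int{}}
}

// charge debits cost from the (queue, action, incarnation) bucket, creating
// it full at capacity on first use, and reports whether the call is allowed.
func (s *bucketStore) charge(capacity int, queue string, action bucketAction, incarnation uint64, cost int) bool {
	k := bucketKey{queue: queue, action: action, incarnation: incarnation}
	left, seen := s.remaining[k]
	if !seen {
		left = capacity // first use: bucket starts full
	}
	if left < cost {
		s.remaining[k] = left
		return false
	}
	s.remaining[k] = left - cost
	return true
}

// invalidateQueue drops every bucket for queue, across actions and
// incarnations (deleting during range is safe for Go maps).
func (s *bucketStore) invalidateQueue(queue string) {
	for k := range s.remaining {
		if k.queue == queue {
			delete(s.remaining, k)
		}
	}
}

func main() {
	s := newBucketStore()
	fmt.Println(s.charge(1, "orders-dlq", bucketActionAdminPeek, 7, 1)) // true
	fmt.Println(s.charge(1, "orders-dlq", bucketActionAdminPeek, 7, 1)) // false: peek budget spent
	fmt.Println(s.charge(1, "orders-dlq", bucketActionReceive, 7, 1))   // true: consumers unaffected
	fmt.Println(s.charge(1, "orders-dlq", bucketActionAdminPeek, 8, 1)) // true: re-created queue
}
```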
 
@@ -430,7 +446,7 @@ The queue detail page gains two new pieces of UI on top of the existing attribut
 | Body preview      | `body` (already truncated by backend)               | first 96 chars; "…" suffix when `body_truncated`. Row click opens detail modal.                                                    |
 | Size              | `body_original_size`                                | human-readable ("1.4 kB") so operators can spot oversized messages                                                                 |
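A sketch of the two display rules in the table above. It is written in Go for consistency with the rest of this document, although these rules run in the SPA. Cutting by runes rather than bytes keeps multi-byte UTF-8 characters intact. Two choices here are assumptions: appending "…" when the 96-character cut itself drops text (not only when `body_truncated` is set), and using SI (1000-based) units to match the "1.4 kB" example. `bodyPreview` and `humanSize` are illustrative names.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const previewRunes = 96

// bodyPreview returns at most the first 96 characters (runes) of body and
// appends "…" when the backend truncated it or when this cut drops text.
func bodyPreview(body string, bodyTruncated bool) string {
	if utf8.RuneCountInString(body) <= previewRunes {
		if bodyTruncated {
			return body + "…"
		}
		return body
	}
	return string([]rune(body)[:previewRunes]) + "…"
}

// humanSize formats a byte count with SI units, e.g. 1400 -> "1.4 kB".
func humanSize(n int64) string {
	if n < 1000 {
		return fmt.Sprintf("%d B", n)
	}
	units := []string{"kB", "MB", "GB", "TB"}
	v := float64(n) / 1000
	i := 0
	// Roll over at 999.95 so values never print as "1000.0 kB".
	for v >= 999.95 && i < len(units)-1 {
		v /= 1000
		i++
	}
	return fmt.Sprintf("%.1f %s", v, units[i])
}

func main() {
	fmt.Println(bodyPreview(`{"order":42}`, false), humanSize(1400))
}
```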
 
-Below the table: a page-size selector (20 / 50 / 100), a Refresh button, and Next / Previous controls driven by the cursor. Detail modal shows full body + every attribute + the timestamps; a "Copy as JSON" button copies the row's full record to the clipboard for manual export.
+Below the table: a Refresh button and Next / Previous controls driven by the cursor. _The page-size selector (20 / 50 / 100) is NOT yet implemented (see "Out-of-scope follow-ups" at the top). Phase 5 shipped a fixed page size of 20 rows; the worst-case response (20 × 256 KiB = 5 MiB of bodies) stays well under network / JSON-parse budgets even without an operator-tunable size. The selector and the response-size warning land in a follow-up if operators ask for larger pages._ The detail modal shows the full body, every attribute, and the timestamps; a "Copy as JSON" button copies the row's full record to the clipboard for manual export.
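The 20 × 256 KiB = 5 MiB figure generalizes to the page sizes the deferred selector would offer. This sketch multiplies page size by the 256 KiB per-message cap that the note uses. It counts raw body bytes only, ignoring JSON escaping, attributes, and backend-side truncation, and the names are illustrative.

```go
package main

import "fmt"

const (
	kib            = 1 << 10
	mib            = 1 << 20
	maxMessageSize = 256 * kib // per-message body cap assumed by the note
)

// worstCaseBodyBytes is the upper bound on raw body bytes for one page.
func worstCaseBodyBytes(pageSize int) int {
	return pageSize * maxMessageSize
}

func main() {
	for _, n := range []int{20, 50, 100} {
		fmt.Printf("%3d rows -> %.1f MiB\n", n, float64(worstCaseBodyBytes(n))/mib)
	}
}
```

This prints 5.0, 12.5, and 25.0 MiB. A 100-row page could therefore approach 25 MiB of bodies, which is one reason the selector is paired with a size warning.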
 
 **Copy as JSON payload schema.** The clipboard payload is the exact wire shape of a single `AdminPeekedMessage` entry (top-level keys: `message_id`, `body`, `body_truncated`, `body_original_size`, `sent_timestamp`, `receive_count`, `group_id`, `deduplication_id`, `attributes`) plus a wrapper `{"schema_version": 1, "queue": "<name>", "exported_at": "<ISO8601>", "message": { … }}`. The `schema_version` is what downstream tooling pins so a future change to the export format (e.g. multi-message JSONL bundle) does not silently break exporters. Operator workflows that pipe this into a recovery tool can rely on the schema not shifting under them.
 
@@ -475,6 +491,8 @@ mirroring the existing `deleteQueue` / `describeQueue` shape. `peekQueue` is `si
 
 ### 3.6 Audit and observability
 
+_Not yet implemented in the initial rollout (see "Out-of-scope follow-ups" at the top). Interim mitigation: the admin handler still emits the standard request-log line with `route` / `subject` / `status_code` for both purge and peek calls, so at audit-review time an operator can correlate who did what, and when, against the application logs. The structured `admin.sqs.purge_queue` audit line and the two Prometheus counters land in a follow-up that wires them into the SPA, so the metrics have a real consumer._
+
 New structured log line at `slog.Info` level (matches `AdminDeleteQueue`):
 
 ```