diff --git a/.claude/skills/kaizen-research/SKILL.md b/.claude/skills/kaizen-research/SKILL.md
new file mode 100644
index 00000000..aa1933fd
--- /dev/null
+++ b/.claude/skills/kaizen-research/SKILL.md
@@ -0,0 +1,384 @@
+---
+name: kaizen-research
+description: Weekly Friday early-morning external + internal scan for emerging functionality, agentic trends, tools, and feature/UX improvements in the AgentCore Public Stack repo. Tracks AWS Bedrock + AgentCore announcements, Strands Agents releases, FastMCP (used by externally hosted MCP servers), the aws-samples/sample-strands-agent-with-agentcore reference repo, the MCP ecosystem (including MCP Apps + extensions), frontier model announcements, agent-harness patterns, and agentic UI/UX patterns (MCP Apps, Vercel AI SDK, assistant-ui, NN/g AI research, Linear/Cursor/Anthropic product blogs). Audits internal signals (recent commits, open PRs, CI failures, version-pin lag, dormant skills). Outputs a dated research doc + queues ideas in `docs/kaizen/review-queue.md` for that same morning's `kaizen-review-prep` (runs ~2 hours later) to rank into decisions. Opens a PR into `develop`. **Out of scope**: security advisories / Dependabot / CodeQL — those have dedicated tooling and don't need a weekly kaizen lens. Triggers: "kaizen research", "weekly research scan", "external scan", "what should we look at this week".
+---
+
+# Kaizen Research
+
+Friday early morning. The "what's the rest of the world learning that we should consider, and what's our own week telling us?" scan. Pairs with `kaizen-review-prep` which runs ~2 hours later the same morning and ranks this skill's output into a decision agenda — both docs ready before Phil sits down to review Friday morning.
+
+## Philosophy
+
+- **Subtraction first.** Every research run should propose at least as many things to *remove or simplify* as to add. A smaller stack you trust beats a bigger one you route around. **Subtraction explicitly includes replacing custom code with library-native equivalents** — when an upstream release (Strands, AgentCore SDK, FastMCP, MCP, etc.) ships a capability we'd already built or filed an issue for, the win is closing our version and adopting upstream. Example: the 2026-05-10 bootstrap run found that Strands v1.37/v1.38 silently closed our open issues #266 and #267 — the codebase surface area shrinks even though we "added" a dep bump.
+- **Dual lens — impact + capability-unlock.** Evaluate every upstream feature through *two* lenses, not one: (a) **impact on existing code** (does it change, simplify, or obsolete something we already have?) and (b) **capability unlock** (what *new* product capability, UX pattern, or enhancement does this make possible that we couldn't easily do before?). Subtraction-first still applies to the first lens. But capability-unlock items — features that enable net-new product surface — must be evaluated on their strategic merit, *not* hedged into "replaces future glue we haven't written." Example: the 2026-05-10 AgentCore Runtime BYO filesystem was first framed only as "could replace future filesystem-staging glue" — under-weighting the real story (code-interpreter sandboxes, cross-session uploads, shared skill hot-swap, persistent vector indexes). A dep-bump's win is usually subtraction; a *new* platform primitive's win is usually capability unlock. Don't mis-classify.
+- **Subagent fan-out.** External sources are independent — fan them out to parallel subagents and synthesize. Keeps the main context clean and runs faster.
+- **Web budget soft cap.** Target ≤50 web requests. If a source is exhausted, unreachable, or rate-limited, list it as "not scanned this week" — don't skip silently. Going modestly over the cap (say, to 60) is fine if the extra requests are surfacing real signal; document the overage in the Web Budget block. Don't pad — if 30 requests covered every source meaningfully, stop at 30.
+- **Cite everything.** Every external claim gets a URL + access date in the Sources Scanned appendix. Web findings rot fast and you'll re-read them next week.
+- **No edits outside `docs/kaizen/`.** This skill writes a dated research doc and updates `review-queue.md`. It never touches `backend/`, `frontend/`, `infrastructure/`, `CLAUDE.md`, or skill files.
+
+## When to run
+
+Friday early morning (~6am MT). `kaizen-review-prep` runs ~2 hours later (~8am MT) so both docs are waiting when Phil sits down Friday morning. Phil reviews, picks 1–3 to ship over the coming week, and POCs additional items over the weekend. Last weekend's POC findings surface in *this* run's review-prep as Carried Over items (lifted from comments on the previous week's research PR).
+
+## Sources
+
+### External (web — last 7 days unless noted)
+
+1. **AWS Bedrock + AgentCore "What's New"**
+   - https://aws.amazon.com/about-aws/whats-new/recent/feed/ (canonical AWS What's New RSS — filter entries for Bedrock/AgentCore)
+   - https://aws.amazon.com/blogs/machine-learning/ (filter: bedrock, agentcore)
+   - Filter to: Bedrock, AgentCore, Bedrock Agents, Knowledge Bases, Guardrails, model availability/region/quota changes.
+
+2. **Strands Agents SDK**
+   - https://github.com/strands-agents/sdk-python/releases
+   - https://github.com/strands-agents/sdk-python/blob/main/CHANGELOG.md
+   - https://github.com/strands-agents/sdk-python/issues?q=is%3Aissue+sort%3Aupdated-desc
+   - For each new release, identify: breaking changes, new hooks/features, fixes that map to current usage in `backend/src/agents/main_agent/`.
+
+3. **Reference repo — `aws-samples/sample-strands-agent-with-agentcore`**
+   - https://github.com/aws-samples/sample-strands-agent-with-agentcore/commits/main
+   - Diff the last 7 days (or "since last research run" — whichever is longer). Identify new patterns, removed approaches, or fixes that map to constructs in this repo: agent setup, tool registration, AgentCore Identity flows, Memory configuration, Gateway/MCP wiring.
+   - This repo has historically informed our architecture; week-over-week deltas are first-class signal.
+
+4. **MCP ecosystem**
+   - https://modelcontextprotocol.io (blog, spec changes)
+   - https://github.com/modelcontextprotocol/servers (new servers, retired servers)
+   - MCP registry / awesome-mcp lists for new servers relevant to the stack (Bedrock, AWS, GitHub, Slack, observability).
+
+4a. **FastMCP** — used by our externally hosted MCP servers (Lambda-backed, behind AgentCore Gateway). FastMCP is **not** pinned in this repo's `pyproject.toml`; it lives in the MCP server repos this stack consumes via Gateway. Track upstream releases because changes affect server behavior we depend on.
+   - https://github.com/jlowin/fastmcp/releases
+   - https://github.com/jlowin/fastmcp/blob/main/CHANGELOG.md
+   - https://github.com/jlowin/fastmcp/issues?q=is%3Aissue+sort%3Aupdated-desc
+   - https://pypi.org/project/fastmcp/ (for latest version + release date)
+   - Identify: breaking changes, new server-side primitives (resources/prompts/tool decorators, lifespan, auth helpers), transport changes (especially relevant if MCP SEP-2567 sessionless transport lands), and Lambda/runtime adapter changes.
+
+4b. **Agentic UI/UX patterns** — emerging UI and UX conventions for AI/agentic apps. We're Angular + Tailwind, so React-specific libraries are **pattern-only** references (extract the idea, implement in signals). Focus on functionality + interaction + visual conventions, not generic "good chat UX".
+   - **MCP Apps + extensions** (priority): https://modelcontextprotocol.io/extensions/apps/overview, https://github.com/modelcontextprotocol/ext-apps, https://blog.modelcontextprotocol.io. The "MCP server returns an interactive UI inline with the chat" standard. Track host adoption (Claude Desktop, ChatGPT, VS Code Copilot, Goose, Postman) and new MCP extension SEPs.
+   - **AI SDK / Generative UI** (Vercel): https://ai-sdk.dev/docs/ai-sdk-ui, https://ai-sdk.dev/cookbook. Canonical reference for tool-call rendering, multi-step UI, generative UI, streaming state patterns. React, but the patterns port.
+   - **assistant-ui**: https://www.assistant-ui.com/docs, https://github.com/Yonom/assistant-ui/releases. React component library purpose-built for AI chat UI. Tracks attachment UX, threading, tool-call rendering primitives.
+   - **Vendor product-blog UX writeups**: https://linear.app/blog (Linear Agent), https://www.cursor.com/blog (canvas, agent harness), https://www.anthropic.com/news filtered for `artifact`/`ui`/`design`. Where in-app agentic patterns get documented by the teams shipping them.
+   - **OpenAI Canvas + ChatGPT UI**: https://openai.com/blog filtered for `canvas`, `chatgpt`, agent UI updates.
+   - **Nielsen Norman Group AI articles**: https://www.nngroup.com/topic/artificial-intelligence/. UX-research perspective; evidence-based; slow cadence — surfaces in ~1 of 4 weekly runs but high signal when it does.
+   - Identify: new agentic UI standards (especially MCP Apps + adjacent SEPs), tool-result rendering patterns, attachment/preview UX, multi-agent attribution patterns, consent/elicitation UX, evidence-based usability findings.
+
+5. **Frontier model announcements**
+   - https://www.anthropic.com/news
+   - https://openai.com/blog (filter: API, agents, tools)
+   - https://blog.google/technology/google-deepmind/ (Gemini)
+   - https://ai.meta.com/blog/ (Llama)
+   - Focus on capability deltas affecting agent harness design: longer context, native tool use changes, prompt caching APIs, computer use, structured output, latency/cost shifts.
+
+6. **Agent harness patterns**
+   - https://www.anthropic.com/engineering (Claude Code, agent design posts)
+   - https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md
+   - LangChain / LlamaIndex / Pydantic-AI release notes — for ideas, not adoption.
+
+7. **AWS Bedrock pricing + quota**
+   - https://aws.amazon.com/bedrock/pricing/
+   - Note any model price/quota changes that could shift architecture choices in this repo (e.g., model selection in `inference_api`).
+
+8. **AgentCore SDK / starter-toolkit issues**
+   - https://github.com/aws/bedrock-agentcore-sdk-python/issues
+   - https://github.com/aws/bedrock-agentcore-starter-toolkit/issues
+   - Early-signal bugs/limits other users hit before we do.
+
+9. **Community signal (filtered)**
+   - HN search: `site:news.ycombinator.com bedrock OR agentcore OR strands OR "claude code"` (last 7 days)
+   - r/LocalLLaMA, r/MachineLearning — agent-harness critiques and patterns surface here before vendor blogs.
+
+10. **Anthropic cookbook**
+    - https://github.com/anthropics/anthropic-cookbook
+    - Worked examples often outpace docs — especially for caching, tool use, and agent loops.
+
+11. **Seasonal sources** (only when in window)
+    - AWS re:Invent (typically late Nov / early Dec) — Bedrock/AgentCore announcements.
+    - NeurIPS / ICLR / EMNLP agent tracks (when proceedings drop).
+    - If today's date is not in a known window, skip with "no seasonal sources this week".
+
+### Internal (this repo)
+
+13. **Recent commits.** `git log develop --since="7 days ago" --oneline --no-merges`. Cluster by area (`backend/`, `frontend/`, `infrastructure/`). Reverts and high-churn files signal pain points.
+
+14. **Open PRs + review comments.** `gh pr list --base develop --state open --limit 20`, then `gh pr view <n> --comments` on the top 3 by comment count. Repeated review feedback is a CLAUDE.md or skill-update signal.
+
+15. **GitHub issues opened in last 7 days.** `gh issue list --state open --search "created:>$(date -v-7d +%Y-%m-%d)"` (`date -v-7d` is BSD/macOS syntax; on GNU/Linux use `date -d '7 days ago' +%Y-%m-%d`). Bug clustering = refactor signal.
+
+16. **CI failures.** `gh run list --status=failure --limit 30`. Group by workflow + job. Flaky tests and recurring infra failures.
+
+17. **Recent CHANGELOG.md / RELEASE_NOTES.md entries** (last 14 days). Used as the "don't re-propose what we just shipped" filter.
+
+18. **Skill inventory.** `find .claude/skills -name SKILL.md -exec stat -f "%Sm %N" {} \;` (`stat -f` is BSD/macOS syntax; on GNU/Linux use `stat -c "%y %n"`). Skills not modified in 60+ days and not visibly referenced in recent PRs are retirement candidates.
+
+19. **Version-pin lag.** For each tracked dep, fetch latest release version and compute lag:
+    - Backend: `strands-agents`, `boto3`, `botocore`, `fastapi`, `pydantic`, `bedrock-agentcore`, `mcp`
+    - Frontend: `@angular/core`, `@analogjs/platform`, `vitest`
+    - Infrastructure: `aws-cdk-lib`, `constructs`
+    - Source files: `backend/pyproject.toml`, `frontend/ai.client/package.json`, `infrastructure/package.json`.
+
+20. **Decisions log** — `docs/kaizen/decisions.md` (if it exists). Items previously declined; don't re-propose without materially new context.
+
+21. **Recent reviews** — `docs/kaizen/reviews/*.md` (last 1–2). Used to avoid duplicate proposals.
+
+## Output
+
+### 1. Primary doc — `docs/kaizen/research/YYYY-MM-DD.md`
+
+```markdown
+# Kaizen Research — [Day, Month D, YYYY]
+> Scan window: [Month D – Month D, YYYY] (7 days)
+> Web budget: N/50 used (target).
+
+## TL;DR
+
+[2-3 sentences. The single most important external move and the single most pressing internal signal. Name the recommended #1 idea here.]
+
+## External Scan
+
+### What's moving this week
+
+[1-2 paragraphs — gestalt. What's the shape of the week? Are vendors converging on a pattern? Anything surprise you?]
+
+### Notable items by source
+
+> **Annotation conventions:**
+> - `*relevance*:` — impact-on-existing-code lens. What construct/file does this affect? What does it replace, simplify, or obsolete?
+> - `*unlocks*:` — capability-unlock lens (use when applicable, especially for *new* platform primitives, SDK hooks, or UX patterns). What net-new product capability or enhancement does this make possible? What could we now build that we couldn't before?
+>
+> Bug-fixes and incremental dep-bumps usually only need `*relevance*`. New platform features, new SDK primitives, new spec capabilities, and new UX patterns usually deserve both.
+
+#### AWS Bedrock / AgentCore
+- **[Item]** — [1-2 sentence summary] — [URL] — *relevance*: [specific construct/file] — *unlocks* (if applicable): [net-new capability or enhancement this enables]
+
+#### Strands Agents
+- **[Item]** — …
+
+#### Reference repo (aws-samples/sample-strands-agent-with-agentcore)
+- **[Commit / change]** — [diff summary] — [URL] — *applicability*: [does our equivalent code do this differently? worth porting?]
+
+#### MCP ecosystem
+- …
+
+#### FastMCP
+- **[Release / change]** — [URL] — *implications for our MCP servers*: [breaking change? new primitive worth adopting?]
+
+#### Agentic UI/UX patterns
+- **[Pattern / release]** — [URL] — *what it is*: [1-2 sentences] — *fit for our stack*: [direct port / pattern-only (Angular equivalent: …) / not applicable] — *where it'd land*: [SSE event / component / route]
+
+#### Frontier model announcements
+- …
+
+#### Agent harness patterns
+- …
+
+#### Pricing / quota
+- …
+
+#### Community + GitHub issues
+- …
+
+#### Anthropic cookbook
+- …
+
+#### Seasonal
+- [content, or "Out of window — none scanned this week"]
+
+### Patterns worth considering
+
+- **[Pattern]** — [3 sentences: what it is, where it's appearing, fit for this repo]
+  - **Where**: [examples]
+  - **Fit**: [would this help? what does it replace? cost to adopt?]
+  - **Verdict**: [Worth trying / Not a fit / Monitor]
+
+## Internal Audit
+
+### Activity (last 7 days)
+- **Commits on develop**: N (across N PRs)
+- **PRs opened**: N — **merged**: N — **reverted**: N
+- **Issues opened**: N — **closed**: N
+- **CI failures (workflow → count)**: …
+
+### Repeated friction signals
+- **[Pattern]** (N occurrences) — [evidence: commit SHAs, PR numbers, issue links]
+  - **Hypothesis**: [root cause]
+  - **Fix candidate**: [specific change — file + behavior]
+
+### Version-pin lag
+| Dep | Pinned | Latest | Lag | Notes |
+|---|---|---|---|---|
+| strands-agents | x.y.z | a.b.c | N releases / N days | [breaking? new feature relevant to us?] |
+
+### Retirement candidates
+- **[Skill / file / config]** — [evidence: not modified in N days, replaced by X, never referenced]
+
+### Risks introduced this week
+<!-- Defensive scanning — things that could break us if ignored. -->
+- **[Risk]** — [source URL or PR] — *what breaks if we ignore this*
+
+## Ideas — Top 5 (ranked)
+
+| # | Idea | Surface | Effort | Impact | Subtracts? | Unlocks? |
+|---|---|---|---|---|---|---|
+| 1 | [Title] | backend / frontend / infra / cross-cutting | L/M/H | L/M/H | [what it retires, or "addition only — justified because…"] | [net-new capability, or "—" if not applicable] |
+| 2 | … | | | | | |
+
+### 1. [Idea title]
+- **Source**: [external item / internal signal — URL or commit SHA]
+- **Surface area**: [paths affected]
+- **Change**: [what specifically would change]
+- **Subtracts**: [what this retires/simplifies, or explicitly: "addition only — justified because…"]
+- **Unlocks** (if applicable): [net-new product capability, UX pattern, or enhancement this enables — bulleted if multiple. Omit field when not a capability-unlock item.]
+- **Effort × Impact**: [Low/Med/High] × [Low/Med/High]
+- **Verdict**: [Worth trying / Not a fit / Monitor]
+
+### 2. …
+
+## Take
+
+[2-4 sentences. Net read of the week. Is the system trending toward the ecosystem or away from it? One change that would matter most. What Phil would notice first if shipped.]
+
+---
+
+## Sources Scanned
+
+| # | Source | URL | Accessed | Items |
+|---|---|---|---|---|
+| 1 | AWS Bedrock What's New | https://… | 2026-05-10 | 3 |
+
+## Web Budget
+
+Used: N / 50 requests (target).
+Skipped (unreachable / rate-limited): [list]
+Skipped (other): [list with reason]
+Notes: [if the cap was exceeded, name the source category that justified it]
+```
+
+### 2. Handoff — `docs/kaizen/review-queue.md` (rolling, not dated)
+
+The explicit contract with `kaizen-review-prep`. This skill **prepends** new entries under `## Open` (newest at top). It never edits `## Resolved` (review-prep does the move).
+
+```markdown
+# Kaizen Review Queue
+
+Items added by `kaizen-research`, consumed by `kaizen-review-prep`.
+
+## Open
+<!-- Newest at top. -->
+
+### [YYYY-MM-DD] [Idea title]
+- **Source**: research/YYYY-MM-DD.md
+- **Surface**: backend | frontend | infrastructure | cross-cutting
+- **Effort × Impact**: L/M/H × L/M/H
+- **Subtracts**: [yes — what / no — justification]
+- **Unlocks** (if applicable): [net-new capability, UX pattern, or enhancement this enables; bulleted if multiple. Omit when not a capability-unlock item.]
+- **Status**: open
+
+## Resolved
+<!-- kaizen-review-prep moves entries here after a review. -->
+
+### [YYYY-MM-DD] [Idea title]
+- **Source**: research/YYYY-MM-DD.md
+- **Decision**: Ship | Decline | Defer until [date]
+- **Reasoning**: [Phil's reason, one sentence]
+- **Reviewed in**: reviews/YYYY-MM-DD.md
+```
+
+## How to run
+
+1. **Bootstrap.** If any of `docs/kaizen/`, `docs/kaizen/research/`, `docs/kaizen/reviews/`, or `docs/kaizen/review-queue.md` is missing, create it. The queue starts with the headers above and empty sections.
+
+2. **Read recent context** (sequential — small reads):
+   - Last 1-2 files in `docs/kaizen/research/`
+   - Last 1-2 files in `docs/kaizen/reviews/`
+   - `docs/kaizen/decisions.md` if present
+   - `docs/kaizen/review-queue.md`
+   - Last 14 days of `CHANGELOG.md` and `RELEASE_NOTES.md`
+
+3. **Inventory internal signals** (parallel Bash calls):
+   - `git log develop --since="7 days ago" --oneline --no-merges`
+   - `gh pr list --base develop --state open --limit 20`
+   - `gh issue list --state open --search "created:>$(date -v-7d +%Y-%m-%d)"`
+   - `gh run list --status=failure --limit 30`
+   - `find .claude/skills -name SKILL.md -exec stat -f "%Sm %N" {} \;`
+   - Read pinned versions from the three manifest files.
+
+4. **Fan out external scan** — spawn parallel `general-purpose` subagents (or `Explore` for sources requiring multiple targeted lookups). One subagent per source category 1–11 above (13 categories total including 4a FastMCP and 4b Agentic UI/UX). Each subagent receives:
+   - The exact URLs to scan
+   - Scope: last 7 days
+   - Web budget for that subagent (3–5 requests soft target)
+   - Required output: 3-5 bullet items max — title, 1-2 sentence summary, URL, "relevance to this repo" line.
+   - **Required**: cite URLs; never fabricate. If empty, return "no notable items this week".
+
+   Total budget across subagents targets ≤50. Track centrally; modest overage (~60) is acceptable when surfacing real signal — beyond that, stop and document the skip.
+
+5. **Version-pin diff.** For each tracked dep, fetch latest release version (WebFetch on the release page or registry equivalent — counts toward budget). Compute lag in releases and days. If a budget hit prevents a check, list the dep under "Skipped".
+
+6. **Synthesize.** Write the research doc per the shape above. Pull subagent reports verbatim into source sections; write the gestalt narrative (TL;DR, "What's moving", Take) yourself. **Top 5 weighting**:
+   - **Library-native subtraction** opportunities (where upstream closed a custom-code need) get a subtraction boost.
+   - **Capability-unlock** items (new platform primitives, SDK hooks, spec capabilities, or UX patterns that enable net-new product surface we couldn't easily build before) rank on their strategic merit and are *not* deprioritized just because they don't intersect existing code. Apply the dual lens from Philosophy: if a feature genuinely unlocks new capability (code-interpreter, persistent agent state, multi-agent UI attribution, etc.), rank it like a fit item, not like a "monitor" item. Resist the temptation to hedge unlock items into "replaces future glue we haven't written"; that under-weights the real story.
+   - **Concrete fit** UI/UX patterns that match an existing surface (tool-call rendering, attachments, A2A attribution, consent flows) get a fit boost over generic "interesting trend" items.
+
+7. **Update review queue.** For each Top 5 idea, prepend a new entry under `## Open` in `docs/kaizen/review-queue.md`. Never touch `## Resolved`.
+
+8. **Open a PR** — see "PR creation".
+
+## PR creation
+
+```bash
+DATE=$(TZ=America/Denver date +'%Y-%m-%d')
+BRANCH="kaizen/research-${DATE}"
+
+git checkout -b "$BRANCH" develop
+git add docs/kaizen/
+git commit -m "$(cat <<EOF
+chore(kaizen): weekly research scan ${DATE}
+
+Generated by the kaizen-research skill. Top 5 ideas queued in
+docs/kaizen/review-queue.md for the kaizen-review-prep run later this morning.
+EOF
+)"
+git push -u origin "$BRANCH"
+
+gh pr create --base develop --head "$BRANCH" \
+  --title "chore(kaizen): weekly research scan ${DATE}" \
+  --body "$(cat <<'EOF'
+## Summary
+- External scan: AWS Bedrock/AgentCore, Strands Agents, FastMCP, reference repo, MCP, agentic UI/UX patterns, frontier models, agent-harness patterns, pricing.
+- Internal audit: recent commits, open PRs, GitHub issues, CI failures, version-pin lag, retirement candidates.
+- Top 5 ideas in the dated research doc and queued in `docs/kaizen/review-queue.md`.
+
+## Review
+- Read the research doc.
+- Comment on the PR with reactions and any weekend POC findings — these become first-class signal for *next* Friday's `kaizen-review-prep`.
+- POC promising ideas over the weekend.
+
+## Decision
+Ship the doc to `develop`. Ranking into decisions happens in the kaizen-review-prep PR opened later this morning. Action on individual ideas happens in separate PRs the following week.
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
+EOF
+)"
+```
+
+The branch is one-shot — squash-merging the PR lands the doc on `develop` and the branch can be deleted.
+
+## Rules
+
+- **No fabrication.** If a source is rate-limited or empty, list it as "not scanned" — don't invent content. The Sources Scanned table is auditable.
+- **Web budget is a soft target, not a hard cap.** ≤50 requests is the goal. Overage is acceptable when justified by signal (document in the Web Budget block). Don't pad — if a source is empty after one fetch, move on.
+- **Subtraction first.** Top 5 should include at least 2 retire/simplify candidates if the system has been running >2 weeks.
+- **Concrete, not aspirational.** "Consider Strands hooks" is too vague. "Add a Strands `BeforeToolCall` hook in `backend/src/agents/main_agent/hooks/` to attribute tokens by tool" is actionable.
+- **No edits to source code.** This skill only writes under `docs/kaizen/`.
+- **Honest about dry weeks.** A quiet week produces a short doc, not a padded one.
+- **Don't re-propose declined ideas** without materially new context. Check `docs/kaizen/decisions.md` and recent reviews.
+- **Cite everything.** Every external claim has a URL + access date in the Sources Scanned appendix.
+- **Don't auto-merge the PR.** Phil reviews and merges Friday morning. Review-prep runs against the unmerged PR's docs — it reads the file from the working tree, not from `develop`.
+
+## Confirmation
+
+After the PR is opened, tell Phil:
+1. PR URL.
+2. Top 1-2 ideas (title + Effort×Impact).
+3. One-sentence Take.
+4. Web budget used (N/50 target) and any skipped sources.
+
+Brief. The full doc is on the PR.
diff --git a/.claude/skills/kaizen-review-prep/SKILL.md b/.claude/skills/kaizen-review-prep/SKILL.md
new file mode 100644
index 00000000..a986b46f
--- /dev/null
+++ b/.claude/skills/kaizen-review-prep/SKILL.md
@@ -0,0 +1,255 @@
+---
+name: kaizen-review-prep
+description: Friday late-morning synthesis. Runs ~2 hours after `kaizen-research` the same morning. Consumes this week's research doc, open items in `docs/kaizen/review-queue.md`, last weekend's POC findings (from comments on the previous week's research PR), and recent merges/reverts/CI signal — produces a ranked, decision-oriented agenda. Every item has a Ship / Decline / Defer recommendation. Opens a PR into `develop`. Triggers: "kaizen review prep", "weekly review prep", "friday review", "rank kaizen ideas".
+---
+
+# Kaizen Review Prep
+
+Friday late morning, after `kaizen-research` ran earlier the same morning. This skill consolidates this week's research + open queue items + last weekend's POC findings (lifted from PR comments on the previous week's research PR) + recent repo state into a ranked decision agenda. Phil reviews Friday morning, marks ✅/❌/⏸ on each item, ships 1–3 the following week, and POCs the next batch over the weekend.
+
+## Philosophy
+
+- **Review is a decision forum, not a status update.** Everything that lands in the output should be either: (a) actionable this week, (b) explicitly deferred with a reason and revisit date, or (c) declined. Nothing is "noted." Noted-and-forgotten is how systems accumulate friction.
+- **Subtraction first.** Every proposal ranks against "do nothing" and "retire something instead." If a proposal adds anything, it must explain what existing thing it either replaces or simplifies.
+- **Dual lens — impact + capability-unlock.** Rank proposals through *two* lenses, not one: (a) **impact on existing code** (does this change, simplify, or obsolete something we already have?) and (b) **capability unlock** (what *new* product capability or UX enhancement does this enable that we couldn't easily build before?). Subtraction-first applies to lens (a). But proposals that genuinely unlock new product surface — code-interpreter sandboxes, persistent agent state, multi-agent UI attribution, new SSE event types that enable inline UI, etc. — must be evaluated on their strategic merit, *not* auto-deferred because they don't intersect existing code. A proposal with no `Subtracts` value but a substantive `Unlocks` value can rank above a low-impact dep-bump. Don't penalize net-new capability for not being a cleanup.
+- **Multiple cycles.** Kaizen is small changes, weekly, compounding. If this week's review touches 3 things, next week's will touch 3 different things. Phil doesn't need a grand plan — he needs a reliable weekly cadence.
+- **One-week feedback lag is intentional.** Phil reviews Friday → POCs over the weekend → those POC findings surface in the *next* Friday's review-prep as Carried Over items. Don't try to fold same-day POC findings in — they don't exist yet.
+- **No edits outside `docs/kaizen/`.** This skill writes one Markdown file under `docs/kaizen/reviews/` and updates `docs/kaizen/review-queue.md` (moves Open → Resolved post-review). It never touches source code, `CLAUDE.md`, or skill files. Those changes happen in separate PRs after the review.
+
+## When to run
+
+Friday late morning (~8am MT), ~2 hours after `kaizen-research` runs. Phil reviews both docs Friday morning, picks 1–3 to ship over the coming week, and POCs additional items over the weekend. Findings from last weekend's POCs surface here as Carried Over items (lifted from PR comments on the *previous* week's research PR, not this week's, which Phil hasn't seen yet).
+
+## Inputs
+
+1. **Most recent `docs/kaizen/research/YYYY-MM-DD.md`** — Friday's scan. Its Top 5 ideas are the primary candidate list.
+2. **`docs/kaizen/review-queue.md`** — `## Open` entries. Includes both this week's ideas (just appended by `kaizen-research`) and any prior-week items that weren't resolved.
+3. **Last 1–2 `docs/kaizen/reviews/*.md`** — what was proposed before, what was decided, anything deferred to "revisit by [date]".
+4. **PR comments on the *previous* week's kaizen-research PR.** `gh pr view <n> --comments` — Phil's reactions and weekend POC findings are first-class signal. The PR opened *this* morning by `kaizen-research` is too fresh; comments accumulate over the week as Phil POCs ideas. Pick the research PR from one week ago (or the most recent merged/closed kaizen-research PR), not today's.
+5. **`docs/kaizen/decisions.md`** (if it exists) — declined items with reasons. Don't re-propose without materially new context.
+6. **Recent activity since last review:**
+   - `git log develop --since="<last review date>" --oneline --no-merges` — what shipped.
+   - `gh pr list --base develop --state merged --search "merged:>$(date -v-7d +%Y-%m-%d)"` — what landed.
+   - `gh run list --status=failure --limit 30` — fresh CI failures.
+7. **`CLAUDE.md` + skill inventory** — surface concerns only; never propose unilateral edits to these.
+8. **`CHANGELOG.md` / `RELEASE_NOTES.md`** — most recent ~14 days, for the "what shipped this week" celebration block + the don't-re-propose filter.
+
+## Output
+
+### 1. Review doc — `docs/kaizen/reviews/YYYY-MM-DD.md`
+
+```markdown
+# Kaizen Review — [Day, Month D, YYYY]
+> Prepared HH:MMam MT. Review window: [Month D – D] (7 days).
+> Source: research/YYYY-MM-DD.md + review-queue.md (N open items).
+
+## Week in Review
+
+[2-4 sentences. What did the week reveal about the system? Use concrete language —
+"The aws-samples reference repo introduced a new agent-loop pattern and we're 2
+Strands releases behind" beats "some external changes". This is Phil's pulse
+check before decisions.]
+
+## Friction — the week's signal
+
+### Repeated patterns (≥2 occurrences)
+- **[Pattern]** (N times) — [concrete description; quote PR review comments or commit messages where helpful]
+  - *Hypothesis*: [root cause]
+  - *Candidate fix*: [specific change — file + behavior]
+
+### One-offs worth watching
+- **[Pattern]** (1 occurrence) — [context]
+
+### Silence that matters
+<!-- What WASN'T used: skills not referenced this week, features not invoked,
+     CI workflows that haven't run, etc. -->
+- **[Silence]** — [what wasn't used + what that might mean]
+
+## Proposals — ranked
+
+<!-- 5–10 items. Each is a DECISION for Phil, not a status. Every item has
+     a specific Ship option, a specific Decline option, and a recommendation. -->
+
+### 1. [Proposal title]
+- **Source**: research/YYYY-MM-DD.md ▸ Top 5 #N | review-queue.md (open since YYYY-MM-DD) | PR comment | direct observation
+- **Surface area**: backend / frontend / infrastructure / cross-cutting / docs / skills
+- **Change**: [concrete description — what files change, what the new behavior is]
+- **Subtracts**: [required field — what this retires, simplifies, or replaces. Or explicitly "addition only — justified because…"]
+- **Unlocks** (if applicable): [net-new product capability, UX pattern, or enhancement this enables — bulleted if multiple. Required for proposals where `Subtracts: no — addition only`; the unlock is the justification. Omit for a pure cleanup or dep bump where no unlock applies.]
+- **Effort**: Low / Med / High
+- **Impact**: Low / Med / High
+- **POC findings (if Phil tried it)**: [summary or "not POCed"]
+- **Ship means**: [specific action — "open PR updating X to do Y" or "retire skill Z"]
+- **Decline means**: [what happens instead — usually "keep current behavior, revisit in N weeks"]
+- **Recommendation**: Ship / Decline / Defer N weeks — [one-sentence why]
+
+### 2. [Next proposal]
+…
+
+## Carried Over From Prior Reviews
+<!-- Items deferred in earlier review docs that have hit their revisit date.
+     Surfaced here for re-decision so nothing rots silently in "deferred" status. -->
+
+- **[Deferred item]** (deferred YYYY-MM-DD until YYYY-MM-DD) — [original context]. Now due.
+
+## Retirement Candidates
+
+<!-- Things currently in the scaffold that aren't earning their place.
+     Bias strongly toward subtraction. If you can't find anything, that's a
+     finding — flag it in the Take. -->
+
+- **[Candidate]** — [evidence: not modified in N days, not referenced, replaced by X]
+
+## Risks Acknowledged But Not Acted On
+<!-- From research's "Risks introduced this week" section. Surface so Phil
+     can decide: address now, defer with a watch date, or accept. -->
+
+- **[Risk]** — [source URL] — *what breaks if ignored* — recommendation: [Address now / Watch until [date] / Accept]
+
+## What Shipped This Week
+
+<!-- From CHANGELOG.md / merged PRs. Short list, one line each. Context for
+     "the system absorbed this much change recently — propose less if a lot." -->
+
+- [shipped item] — *why it mattered*
+
+## Take
+
+[2-4 sentences. Is the system trending toward trust or toward friction? Is the kaizen
+loop catching real signal or generating noise? What's the one change that would
+matter most this week if shipped? Don't sugarcoat — if a skill or pattern isn't
+pulling its weight, say so.]
+
+---
+
+## Review Protocol (for Phil)
+
+1. Read Friction (2 min).
+2. Scan Proposals — mark ✅ Ship / ❌ Decline / ⏸ Defer on each (3-5 min).
+3. Scan Retirement Candidates — same marks (1-2 min).
+4. Resolve Carried Over items (1-2 min).
+5. Resolve Risks block.
+6. Pick 1-3 to ship this week. Decline or defer the rest with a reason.
+
+Target: 10-15 minutes.
+
+## Post-review (for Phil — separate PRs)
+
+- ✅ Ship items → individual feature PRs over the week. The decision is logged in this doc; the implementation lives elsewhere.
+- ❌ Decline items → appended to `docs/kaizen/decisions.md` with Phil's reason so future research doesn't re-propose.
+- ⏸ Defer items → kept open in `review-queue.md` with a "revisit by [date]"; surface again in the next review when due.
+
+This skill produces the agenda. Implementation never happens here.
+```
+
+### 2. Queue update — `docs/kaizen/review-queue.md`
+
+After Phil reviews and the decisions are logged in the review doc, this skill (or Phil himself, manually) **moves resolved items** from `## Open` to `## Resolved` with a Decision and Reasoning. On a fresh run before Phil has reviewed, the skill leaves Open as-is — only the *prior* review's outcomes get processed for queue movement.
+
+## How to run
+
+1. **Bootstrap.** Confirm `docs/kaizen/reviews/` exists; create it if not.
+
+2. **Read inputs** (sequential — small reads):
+   - Latest file in `docs/kaizen/research/`
+   - `docs/kaizen/review-queue.md` (full)
+   - Last 1–2 files in `docs/kaizen/reviews/`
+   - `docs/kaizen/decisions.md` if present
+   - Last ~14 days of `CHANGELOG.md` and `RELEASE_NOTES.md`
+   - `CLAUDE.md` (read-only — for context, not edits)
+
+3. **Pull PR comments on the latest research PR** (parallel with step 4):
+   ```bash
+   gh pr list --base develop --state all --search "kaizen/research" --limit 1 --json number,url
+   gh pr view <number> --comments
+   ```
+   Capture Phil's reactions. POC findings he mentions get folded into proposal entries.
+
+4. **Pull recent activity** (parallel Bash):
+   - `git log develop --since="<last review date>" --oneline --no-merges`
+   - `gh pr list --base develop --state merged --search "merged:>$(date -v-7d +%Y-%m-%d 2>/dev/null || date -d '7 days ago' +%Y-%m-%d)" --limit 30`
+   - `gh run list --status=failure --limit 30`
+   - `gh issue list --state open --search "created:>$(date -v-7d +%Y-%m-%d 2>/dev/null || date -d '7 days ago' +%Y-%m-%d)"`
+
+5. **Process prior-review queue movement.** For each entry in `## Open` that was resolved in the most recent review doc, move it to `## Resolved` with the Decision + Reasoning + Reviewed-in fields. Items with no decision in the prior review stay open.
+
+6. **Identify Carried Over items.** Scan prior review docs for `Defer N weeks` recommendations whose revisit date has hit. Add those to the new review's Carried Over section.
+
+7. **Synthesize the review doc** per the shape above. The Proposals list is built from:
+   - All `## Open` entries in `review-queue.md` (the primary source)
+   - Any new friction patterns surfaced from PR comments / merged PRs / CI that weren't already in the queue
+   - Carried Over items
+   Rank:
+   - Low-effort × High-impact first.
+   - **Retirement candidates** get a +1 boost (subtraction bias).
+   - **Capability-unlock items** (proposals with a substantive `Unlocks` field — new product capability, UX surface, or platform primitive adoption) rank on their strategic merit. Do not auto-defer just because `Subtracts: no`. A High-impact unlock can rank above a Low-impact subtraction.
+   - Items with **POC findings** rank above untested items at the same effort/impact.
+
+8. **Cap the proposal count at 10.** If there are more than 10 candidates, defer the lowest-ranked to next week with a note. The review should take 10-15 minutes; it is not meant to be exhaustive.
+
+9. **Open a PR** — see "PR creation".
+
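The ranking and cap rules in steps 7 and 8 can be sketched roughly as follows. This is a minimal illustration, not the skill's implementation: the `Proposal` class, the numeric Low/Med/High mapping, and the "impact minus effort" score are all assumptions made for the example. The POC tie-break is approximated at equal score rather than at identical effort/impact, and the strategic-merit judgment for capability unlocks is deliberately left out because it is not mechanical.

```python
from dataclasses import dataclass

# Assumed numeric mapping for the Low / Med / High labels used in the review doc.
LEVELS = {"Low": 1, "Med": 2, "High": 3}
MAX_PROPOSALS = 10  # cap from step 8


@dataclass
class Proposal:  # hypothetical container, not a type from the repo
    title: str
    effort: str  # "Low" / "Med" / "High"
    impact: str  # "Low" / "Med" / "High"
    is_retirement: bool = False
    has_poc: bool = False


def score(p: Proposal) -> int:
    # Assumed scoring: impact minus effort, so Low-effort x High-impact ranks first.
    base = LEVELS[p.impact] - LEVELS[p.effort]
    # Retirement candidates get the +1 subtraction-bias boost.
    return base + (1 if p.is_retirement else 0)


def rank(proposals):
    """Return (kept, deferred): the top 10 by score, and the overflow."""
    ordered = sorted(
        proposals,
        # Higher score first; at equal score, POC-tested items come first;
        # title is a stable final tie-break.
        key=lambda p: (-score(p), not p.has_poc, p.title),
    )
    return ordered[:MAX_PROPOSALS], ordered[MAX_PROPOSALS:]
```

Anything returned in `deferred` would be carried to next week's review with a note rather than dropped.
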
+## PR creation
+
+```bash
+DATE=$(TZ=America/Denver date +'%Y-%m-%d')
+BRANCH="kaizen/review-${DATE}"
+
+git checkout -b "$BRANCH" develop
+git add docs/kaizen/
+git commit -m "$(cat <<EOF
+chore(kaizen): weekly review prep ${DATE}
+
+Generated by kaizen-review-prep. Ranked agenda for the 10-15 min decision pass;
+queue updated with prior-review outcomes.
+EOF
+)"
+git push -u origin "$BRANCH"
+
+gh pr create --base develop --head "$BRANCH" \
+  --title "chore(kaizen): weekly review prep ${DATE}" \
+  --body "$(cat <<'EOF'
+## Summary
+- N proposals ranked Effort × Impact (retirement candidates boosted).
+- Friction patterns from the week's commits, PRs, and CI surfaced.
+- Carried-over deferred items now due for re-decision.
+- POC findings (from kaizen-research PR comments) folded into proposals where Phil tried something.
+
+## Review
+1. Read Friction (2 min).
+2. Mark each Proposal: ✅ Ship / ❌ Decline / ⏸ Defer.
+3. Same for Retirement Candidates and Risks.
+4. Pick 1-3 to ship this week.
+
+Target: 10-15 minutes.
+
+## Decision
+Ship the doc to `develop`. Action on individual items happens in separate PRs over the week.
+Declined items go to `docs/kaizen/decisions.md`; deferred items stay open in `review-queue.md` with a revisit date.
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
+EOF
+)"
+```
+
+## Rules
+
+- **Every proposal is a decision.** No "consider X" or "we might want to." Each has Ship means / Decline means / Recommendation.
+- **Every proposal has a Subtracts field.** Required. If empty, ask: does it really need to be its own thing? Could an existing skill / construct be updated instead? If still pure addition, justify it explicitly.
+- **Retirement candidates are required.** If the section is empty and the system has been running >2 weeks, *that's* the finding — flag it in the Take.
+- **Don't re-propose declined items** without materially new context. Cross-check `docs/kaizen/decisions.md` and the last 1–2 reviews.
+- **Carried Over is not a graveyard.** Deferred items resurface on their revisit date. No silent deferrals.
+- **No fabrication.** If a week was quiet, the review is short. Length tracks signal, not target word count.
+- **Never edit `CLAUDE.md` or skill files unilaterally.** A proposal can recommend a change to them, but the change itself is always Phil-approved in review and shipped in a separate PR.
+- **Cap at 10 proposals.** A 15-item list defeats the 10-15 min target.
+
+## Confirmation
+
+After the PR is opened, tell Phil:
+1. PR URL.
+2. Top 1–2 proposals (title, Effort×Impact, recommendation).
+3. Top 1 retirement candidate if any.
+4. One-sentence Take.
+5. Estimated review time.
+
+Brief. Phil reads the full doc on the PR and marks decisions there or in a follow-up commit.
diff --git a/.github/ACTIONS-REFERENCE.md b/.github/ACTIONS-REFERENCE.md
index 229ff65d..e9a63489 100644
--- a/.github/ACTIONS-REFERENCE.md
+++ b/.github/ACTIONS-REFERENCE.md
@@ -29,6 +29,10 @@ GitHub provides two mechanisms for storing configuration values:
 | CDK_APP_API_ENABLED | Variable | No | `true` | App API | Enable/disable App API stack deployment |
 | CDK_APP_API_MAX_CAPACITY | Variable | No | `10` | Infrastructure, App API | Maximum App API tasks for auto-scaling |
 | CDK_APP_API_MEMORY | Variable | No | `1024` | Infrastructure, App API | Memory (MB) for App API ECS task (512, 1024, 2048, 4096, 8192) |
+| CDK_ARTIFACTS_CERTIFICATE_ARN | Variable | No | None | Artifacts | ACM certificate ARN that covers `artifacts.{CDK_DOMAIN_NAME}`. **Must be in `us-east-1`** regardless of deployment region. Required when `CDK_ARTIFACTS_ENABLED=true`. Reuse `CDK_FRONTEND_CERTIFICATE_ARN` **only when `CDK_DOMAIN_NAME` is the apex** — TLS wildcards are one label deep, so `*.example.com` covers `artifacts.example.com` but not `artifacts.alpha.example.com`. When `CDK_DOMAIN_NAME` is a subdomain, issue a dedicated `us-east-1` cert for `*.{CDK_DOMAIN_NAME}`. |
+| CDK_ARTIFACTS_ENABLED | Variable | No | `false` | Artifacts, Infrastructure, App API, Inference API, Frontend | Enable iframe-isolated artifact rendering. Toggling on provisions a DDB metadata table, S3 content bucket, CloudFront + Lambda render service, and the supporting IAM grants / env vars on the consumer stacks. |
+| CDK_ARTIFACTS_EXTRA_FRAME_ANCESTORS | Variable | No | None | Artifacts | Comma-separated extra origins (beyond `https://{CDK_DOMAIN_NAME}`) permitted to embed artifact iframes via CSP `frame-ancestors` — e.g. `http://localhost:4200` for a local SPA pointed at this deployment. Applied to both the CloudFront response-headers policy and the render Lambda's CSP. **Leave unset in production**: every listed origin can frame users' artifacts (still render-token gated, but a real loosening on a shared environment). |
+| CDK_ARTIFACTS_RETENTION_DAYS | Variable | No | `90` | Artifacts | Days after which soft-deleted artifacts (objects tagged `lifecycle-class=deleted`) are reaped by the S3 lifecycle rule. |
 | CDK_ASSISTANTS_CORS_ORIGINS | Variable | No | None | Infrastructure | Additional CORS origins for the assistants module only (appended to global CORS origins) |
 | CDK_AWS_ACCOUNT | Variable | Yes | None | All | 12-digit AWS account ID for CDK deployment |
 | CDK_CERTIFICATE_ARN | Variable | No | None | Infrastructure | ACM certificate ARN for HTTPS on ALB |
diff --git a/.github/README-ACTIONS.md b/.github/README-ACTIONS.md
index 0aec9de5..a259a02a 100644
--- a/.github/README-ACTIONS.md
+++ b/.github/README-ACTIONS.md
@@ -12,6 +12,7 @@ Deploy a production-ready multi-agent AI platform to your AWS account in about 4
 |-----------|-------------|
 | **VPC + ALB + ECS** | Networking, load balancer, and container orchestration |
 | **Fine-Tuning** *(optional)* | SageMaker training/inference infrastructure, S3 artifact storage, DynamoDB job tracking |
+| **Artifacts** *(optional)* | Iframe-isolated artifact rendering (DDB metadata, S3 content, CloudFront at `artifacts.{domain}`, Lambda render service) |
 | **RAG Ingestion** | Document ingestion pipeline for retrieval-augmented generation |
 | **Inference API** | Strands Agent runtime powered by AWS Bedrock AgentCore |
 | **App API** | Backend REST API for chat, sessions, admin, and auth |
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 00000000..ec532f7b
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,84 @@
+# Copilot Instructions — AgentCore Public Stack
+
+Production multi-agent conversational AI platform built on AWS Bedrock AgentCore + Strands Agents. Monorepo with four top-level packages: `backend/` (Python 3.13, FastAPI), `frontend/ai.client/` (Angular 21 + Analog.js), `infrastructure/` (AWS CDK, TypeScript), and `scripts/`.
+
+Authoritative deeper docs: `CLAUDE.md` (architecture), `CONTRIBUTING.md` (setup), `.kiro/steering/` and `.claude/skills/` (topic-specific patterns — CDK, Tailwind, Angular signals, CORS, release notes, versioning).
+
+## Build, Test, Lint
+
+### Backend (`cd backend`)
+```bash
+uv sync --extra agentcore --extra dev
+uv run python -m pytest tests/ -v
+uv run python -m pytest tests/path/to/test_file.py::test_name -v   # single test
+uv run black src/ && uv run ruff check src/ && uv run mypy src/
+# Run services locally:
+cd src/apis/app_api && uv run python main.py        # port 8000
+cd src/apis/inference_api && uv run python main.py  # port 8001
+```
+
+### Frontend (`cd frontend/ai.client`)
+```bash
+npm ci
+npm run start                          # dev server on 4200
+npm test                               # Vitest via Analog.js
+npx vitest run path/to/file.spec.ts    # single test file
+npx eslint src/ && npx prettier --check src/
+```
+
+### Infrastructure (`cd infrastructure`)
+```bash
+npm ci && npm run build
+npx cdk synth                          # validates stacks
+npx cdk deploy --all
+npm test -- test/stack-dependencies.test.ts   # verifies new stacks are registered
+```
+
+## Architecture — the big picture
+
+- **Three independent backend consumers** of `apis.shared`: `app_api`, `inference_api`, and `agents/`. They must **never import from each other** — only from `apis.shared`. Enforced by `backend/tests/architecture/test_import_boundaries.py`.
+- **Inference API runs inside an AgentCore Runtime container.** The runtime data plane only proxies `POST /invocations` and `GET /ping` — any other route returns 404 in cloud (works locally because `localhost:8001` bypasses the gateway). User-facing CRUD endpoints **belong in app-api**, not inference-api. To get workload context on app-api, use the `AGENTCORE_RUNTIME_WORKLOAD_NAME` mint fallback in `apis/shared/oauth/agentcore_identity.py`.
+- **Deploy order** (cross-stack SSM references): Infrastructure → (Gateway, RAG Ingestion, SageMaker Fine-Tuning, Artifacts, MCP Sandbox in parallel) → Inference API → App API → Frontend. App API reads `runtime-workload-identity-name` from SSM, published by Inference API.
+- **Errors stream as assistant messages over SSE**, not HTTP error codes. See SSE event table in `CLAUDE.md` (`message_start`, `content_block_*`, `tool_use`/`tool_result`, `ui_resource`, `stream_error`, `oauth_required`, `compaction`, `done`).
+- **Multi-protocol tools:** direct/AWS-SDK tools live in `agents/main_agent/tools/`; remote tools come via MCP+SigV4 (Gateway Lambda) or A2A (Runtime). A2A is currently **client-only**; if exposing an A2A server, `capabilities` must include `streaming=True` or clients hang.
+- **Frontend is signal-based** throughout (`signal()`, `computed()`). API shapes are defined by backend routes; matching TS interfaces must be updated in the same PR as breaking backend changes.
+
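The import-boundary rule above can be illustrated with a small `ast`-based check. This is a sketch only: the contents of `backend/tests/architecture/test_import_boundaries.py` are not shown here, and the dotted module names in `CONSUMERS` are assumed from the directory layout rather than confirmed.

```python
import ast

# Assumed dotted names for the three consumers of apis.shared.
CONSUMERS = ("apis.app_api", "apis.inference_api", "agents")


def forbidden_imports(source: str, owner: str) -> list[str]:
    """Return absolute imports in `source` that reach into a consumer other than `owner`."""
    others = [c for c in CONSUMERS if c != owner]
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) stay inside the owner's package.
            names = [node.module]
        else:
            continue
        for name in names:
            if any(name == o or name.startswith(o + ".") for o in others):
                found.append(name)
    return found
```

A test built on this would read every `.py` file under one consumer and assert that `forbidden_imports(...)` is empty; imports from `apis.shared` pass because it is not listed as a consumer.
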
+## Conventions specific to this repo
+
+- **Auth on `apis/app_api/` routes** uses `Depends(get_current_user_from_session)` (cookie-based) or `Depends(require_admin)`. The SPA sends an httpOnly session cookie, **not** `Authorization: Bearer`. Bearer-only deps on user-facing routes cause a 401 → redirect loop. Exceptions: `auth/api_keys/` (X-API-Key) and `voice/` (voice-ticket cookie) — do not template off these.
+- **Admin endpoints** go under `/admin/<domain>/`, user-facing under `/<domain>/`.
+- **Exact dependency pins only** — no `^`, `~`, or `>=` anywhere (Python, npm, CDK).
+- **Never install new packages without explicit user approval.**
+- **Branch from `develop`**, never `main`. PRs target `develop`; `main` advances only via squash-merge releases. Branch naming: `feature/<short-description>`. Sign commits with `git commit -s` (DCO).
+- **Conventional commits** (`feat:`, `fix:`, `chore:`, ...), one logical change per commit.
+- **No `print()` in backend** — use `logging`. Python: `snake_case` / `PascalCase`, type hints required. TS: strict mode, no `any` unless unavoidable.
+
+## File placement
+
+| Change | Location |
+|---|---|
+| New API route | `backend/src/apis/app_api/<domain>/` |
+| Admin endpoint | `backend/src/apis/app_api/admin/<domain>/` |
+| New agent tool | `backend/src/agents/main_agent/tools/` + register in `__init__.py` |
+| Shared backend code | `backend/src/apis/shared/<domain>/` |
+| Lambda for an infra stack | `backend/src/lambdas/<lambda-name>/` (not part of `apis/` boundary) |
+| Angular page | `frontend/ai.client/src/app/<feature>/` |
+| New CDK stack | `infrastructure/lib/<stack-name>-stack.ts` — also register in `test/stack-dependencies.test.ts` with a tier, add `scripts/stack-<name>/`, add a workflow under `.github/workflows/`, update `.github/docs/deploy/step-04-deploy.md` |
+
+## Debugging cheatsheet
+
+- **Tool not appearing:** check `__init__.py` export, RBAC permissions, `enabled_tools`, ToolRegistry.
+- **Session not persisting:** check AgentCore Memory config, `session_id`, `TurnBasedSessionManager` flush.
+- **SSE stream disconnecting:** check the 600s timeout, client connection, quota-exceeded events.
+- **Local inference-api route works, cloud returns 404:** the route isn't `/invocations` or `/ping` — move it to app-api (see Architecture).
+
+## Topic deep-dives
+
+Before non-trivial work in these areas, consult the matching skill/steering doc:
+
+- CDK stacks/constructs → `.claude/skills/cdk-infrastructure/` and `.kiro/steering/cdk-*.md`
+- Angular components/signals → `.claude/skills/angular-best-practices/` and `.kiro/steering/angular-*.md`
+- Tailwind v4 / a11y → `.claude/skills/tailwind-ui/` and `.kiro/steering/tailwind-*.md`
+- CORS across stacks → `.claude/skills/cors-deployment/SKILL.md`
+- Release notes / CHANGELOG → `.claude/skills/release-notes/SKILL.md`
+- Version bumps → `.claude/skills/versioning/SKILL.md`
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
index 70103ca9..59d6848d 100644
--- a/.github/dependabot.yml
+++ b/.github/dependabot.yml
@@ -1,4 +1,8 @@
 version: 2
+# Version-update PRs are disabled across all ecosystems
+# (open-pull-requests-limit: 0). We handle dependency upgrades manually on a
+# weekly cadence. Security updates are unaffected and will still be raised by
+# Dependabot when a CVE is published against a dependency.
 updates:
   # ── Python backend ──
   - package-ecosystem: "pip"
@@ -9,7 +13,7 @@ updates:
       day: "monday"
       time: "09:00"
       timezone: "America/Boise"
-    open-pull-requests-limit: 10
+    open-pull-requests-limit: 0
     versioning-strategy: "increase-if-necessary"
     commit-message:
       prefix: "chore(deps)"
@@ -33,7 +37,7 @@ updates:
       day: "monday"
       time: "09:00"
       timezone: "America/Boise"
-    open-pull-requests-limit: 10
+    open-pull-requests-limit: 0
     versioning-strategy: "increase-if-necessary"
     commit-message:
       prefix: "chore(deps)"
@@ -58,7 +62,7 @@ updates:
       day: "monday"
       time: "09:00"
       timezone: "America/Boise"
-    open-pull-requests-limit: 5
+    open-pull-requests-limit: 0
     versioning-strategy: "increase-if-necessary"
     commit-message:
       prefix: "chore(deps)"
@@ -83,7 +87,7 @@ updates:
       day: "monday"
       time: "09:00"
       timezone: "America/Boise"
-    open-pull-requests-limit: 5
+    open-pull-requests-limit: 0
     commit-message:
       prefix: "chore(deps)"
       include: "scope"
diff --git a/.github/docs/deploy/step-02-aws-setup.md b/.github/docs/deploy/step-02-aws-setup.md
index 698dc046..940a74b7 100644
--- a/.github/docs/deploy/step-02-aws-setup.md
+++ b/.github/docs/deploy/step-02-aws-setup.md
@@ -133,6 +133,13 @@ This allows the certificate to cover subdomains like `api.example.com` and `app.
 - `ALB Certificate ARN` (e.g. `arn:aws:acm:us-west-2:123456789012:certificate/abc-123`)
 - `CloudFront Certificate ARN` (e.g. `arn:aws:acm:us-east-1:123456789012:certificate/def-456`)
 
+> [!IMPORTANT]
+> **If you plan to enable the optional Artifacts stack, mind the wildcard depth.** A TLS wildcard covers **exactly one** label — `*.example.com` matches `artifacts.example.com` but **not** `artifacts.alpha.example.com`.
+> - If `CDK_DOMAIN_NAME` is your **apex** (e.g. `example.com`), the artifact origin is `artifacts.example.com` and the existing `*.example.com` CloudFront cert covers it — reuse that cert ARN, no third certificate needed.
+> - If `CDK_DOMAIN_NAME` is **already a subdomain** (e.g. `alpha.example.com`), the artifact origin is `artifacts.alpha.example.com`, which `*.example.com` does **not** cover. Issue a dedicated `us-east-1` cert for `*.alpha.example.com` (or exactly `artifacts.alpha.example.com`) and use that ARN for `CDK_ARTIFACTS_CERTIFICATE_ARN`.
+>
+> Verify before deploying: `aws acm describe-certificate --region us-east-1 --certificate-arn <arn> --query 'Certificate.SubjectAlternativeNames'` should list a SAN that matches `artifacts.{CDK_DOMAIN_NAME}`.
+
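The one-label wildcard rule can be checked mechanically against the SAN list returned by the `describe-certificate` command above. The sketch below is illustrative; `san_covers` and `cert_covers` are hypothetical helper names, and the match is a simplified version of standard hostname matching (left-most label only, case-insensitive).

```python
def san_covers(san: str, hostname: str) -> bool:
    """True if a single SAN entry covers `hostname`."""
    san = san.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not san.startswith("*."):
        # Non-wildcard entries must match exactly.
        return san == hostname
    # A wildcard stands in for exactly one left-most label, so strip one
    # label from the hostname and compare the remainder to the SAN's suffix.
    _, _, rest = hostname.partition(".")
    return bool(rest) and rest == san[2:]


def cert_covers(sans, hostname: str) -> bool:
    """True if any SAN on the certificate covers `hostname`."""
    return any(san_covers(s, hostname) for s in sans)


if __name__ == "__main__":
    sans = ["example.com", "*.example.com"]
    print(cert_covers(sans, "artifacts.example.com"))        # True
    print(cert_covers(sans, "artifacts.alpha.example.com"))  # False
```

Passing the `SubjectAlternativeNames` array and `artifacts.{CDK_DOMAIN_NAME}` to `cert_covers` gives a quick yes/no before the stack is deployed.
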
 <details>
 <summary>My certificate is stuck in "Pending validation"</summary>
 
diff --git a/.github/docs/deploy/step-03-github-config.md b/.github/docs/deploy/step-03-github-config.md
index 165308eb..97b1fa40 100644
--- a/.github/docs/deploy/step-03-github-config.md
+++ b/.github/docs/deploy/step-03-github-config.md
@@ -100,6 +100,9 @@ This prefix is prepended to all AWS resource names to avoid conflicts. Use somet
 | Variable Name | Default | Description |
 |---------------|---------|-------------|
 | `CDK_FINE_TUNING_ENABLED` | `false` | Set to `true` to enable the SageMaker Fine-Tuning stack. Must be set before running the fine-tuning deployment workflow in Step 4. |
+| `CDK_ARTIFACTS_ENABLED` | `false` | Set to `true` to enable iframe-isolated artifact rendering. Provisions the artifacts CloudFront origin, DDB table, S3 bucket, and Lambda. Requires `CDK_ARTIFACTS_CERTIFICATE_ARN`. |
+| `CDK_ARTIFACTS_CERTIFICATE_ARN` | — | ACM certificate ARN that covers `artifacts.{CDK_DOMAIN_NAME}`. **Must be in `us-east-1`** (CloudFront requirement). Reuse `CDK_FRONTEND_CERTIFICATE_ARN` **only if `CDK_DOMAIN_NAME` is your apex** (a `*.example.com` cert covers `artifacts.example.com`). If `CDK_DOMAIN_NAME` is itself a subdomain (e.g. `alpha.example.com`), wildcards are one label deep so `*.example.com` does **not** cover `artifacts.alpha.example.com` — issue a dedicated `us-east-1` cert for `*.alpha.example.com`. See [Step 2c](./step-02-aws-setup.md#2c-create-acm-certificates). |
+| `CDK_ARTIFACTS_EXTRA_FRAME_ANCESTORS` | — | Comma-separated extra origins (beyond `https://{CDK_DOMAIN_NAME}`) allowed to embed artifact iframes via CSP `frame-ancestors` — applied to both the CloudFront response-headers policy and the render Lambda. Set to `http://localhost:4200` to point a local SPA at this deployment. **Leave unset in production**: every listed origin can frame your users' artifacts (still render-token gated, but a real loosening on a shared environment). Prefer a one-off `cdk deploy '*ArtifactsStack*'` with this exported over committing it as a CI variable. |
 
 ---
 
diff --git a/.github/docs/deploy/step-04-deploy.md b/.github/docs/deploy/step-04-deploy.md
index 0b678a89..d1861656 100644
--- a/.github/docs/deploy/step-04-deploy.md
+++ b/.github/docs/deploy/step-04-deploy.md
@@ -10,7 +10,7 @@
 
 ---
 
-Now for the fun part. You'll trigger up to 8 GitHub Actions workflows in order. Each one deploys a different layer of the stack. Workflows that share the same step number can be run in parallel — just wait for the previous step to finish first.
+Now for the fun part. You'll trigger up to 9 GitHub Actions workflows in order. Each one deploys a different layer of the stack. Workflows that share the same step number can be run in parallel — just wait for the previous step to finish first.
 
 ## What you'll need for this step
 
@@ -41,6 +41,8 @@ Now for the fun part. You'll trigger up to 8 GitHub Actions workflows in order.
 | 1 | **Deploy Infrastructure** | VPC, subnets, ALB, ECS cluster, DynamoDB tables, S3 buckets |
 | 2 | **Deploy RAG Ingestion** | Document ingestion pipeline for retrieval-augmented generation |
 | 2 | **Deploy SageMaker Fine-Tuning** *(optional)* | SageMaker training/inference resources, S3 bucket, DynamoDB tables. Requires `CDK_FINE_TUNING_ENABLED=true` ([Step 3](./step-03-github-config.md#optional-features)). |
+| 2 | **Deploy Artifacts** *(optional)* | DynamoDB metadata + S3 content + CloudFront at `artifacts.{domain}` + Lambda render service. Requires `CDK_ARTIFACTS_ENABLED=true` and `CDK_ARTIFACTS_CERTIFICATE_ARN` (cert MUST be in `us-east-1`). |
+| 2 | **Deploy MCP Sandbox** *(optional)* | S3 + CloudFront at `mcp-sandbox.{domain}` + Route53 serving the MCP Apps sandbox-proxy shell. Requires `CDK_MCP_SANDBOX_ENABLED=true` and `CDK_MCP_SANDBOX_CERTIFICATE_ARN` (cert MUST be in `us-east-1`). Inert until later MCP Apps host-renderer PRs wire the SPA to it. |
 | 3 | **Deploy Inference API** | Strands Agent runtime container on ECS (Bedrock AgentCore) |
 | 4 | **Deploy App API** | Backend REST API container on ECS |
 | 5 | **Deploy Frontend** | Angular app to S3 + CloudFront distribution |
@@ -59,6 +61,8 @@ You can monitor the current state of each workflow:
 | Deploy Infrastructure | [![1.](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/infrastructure.yml/badge.svg)](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/infrastructure.yml) |
 | Deploy RAG Ingestion | [![2.](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/rag-ingestion.yml/badge.svg)](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/rag-ingestion.yml) |
 | Deploy SageMaker Fine-Tuning *(optional)* | [![2.](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/sagemaker-fine-tuning.yml/badge.svg)](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/sagemaker-fine-tuning.yml) |
+| Deploy Artifacts *(optional)* | [![2.](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/artifacts.yml/badge.svg)](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/artifacts.yml) |
+| Deploy MCP Sandbox *(optional)* | [![2.](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/mcp-sandbox.yml/badge.svg)](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/mcp-sandbox.yml) |
 | Deploy Inference API | [![3.](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/inference-api.yml/badge.svg)](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/inference-api.yml) |
 | Deploy App API | [![4.](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/app-api.yml/badge.svg)](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/app-api.yml) |
 | Deploy Frontend | [![5.](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/frontend.yml/badge.svg)](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/frontend.yml) |
@@ -70,6 +74,22 @@ You can monitor the current state of each workflow:
 
 ---
 
+## First-time deploy of a new optional stack
+
+> [!IMPORTANT]
+> When you flip a previously-disabled stack on for the first time (e.g. setting `CDK_FINE_TUNING_ENABLED=true` or `CDK_ARTIFACTS_ENABLED=true`), the PR that enables it will modify multiple stacks at once — the new stack itself, plus any consumer stacks that read its SSM exports. If you simply merge to `develop`, every affected workflow fires in parallel and the consumers will fail with `Parameter not found` because the new stack hasn't written its SSM keys yet. Subsequent normal pushes don't hit this — the race only happens on the very first deploy of new cross-stack SSM dependencies.
+
+**Recommended sequence — deploy in tier order via `workflow_dispatch` against the feature branch, then merge:**
+
+1. Push your feature branch. Do **not** merge yet.
+2. Open the **Actions** tab, pick the **Infrastructure** workflow, click **Run workflow**, and select your feature branch. Wait for it to go green. This publishes any new foundation SSM keys (e.g. the JWT signing secret for artifacts).
+3. Run the new stack's workflow (e.g. **Artifacts** or **SageMaker Fine-Tuning**) the same way. Wait for green. This publishes the new stack's SSM exports.
+4. Merge the PR. The consumer workflows (Inference API, App API, Frontend) re-deploy automatically on push to `develop` and find every SSM key they need on the first try.
+
+If you skip steps 2–3 and merge directly, the failing consumer workflows aren't broken — just click **Re-run** in the Actions tab after the foundation + new stack have completed, and they'll succeed. The pre-merge sequence above just spares you the retry dance and keeps your Actions history clean.
+
+---
+
 ## What Each Workflow Does
 
 <details>
@@ -109,6 +129,50 @@ After deployment, grant fine-tuning access to users via the admin dashboard.
 
 </details>
 
+<details>
+<summary>2. Deploy Artifacts (Optional)</summary>
+
+Provisions iframe-isolated artifact rendering — versioned, sandboxed HTML/code artifacts the agent can generate and the user can render inline. Skip if you don't need this capability; the rest of the stack works without it.
+
+Creates:
+- DynamoDB table for artifact metadata + version log
+- S3 bucket for artifact content blobs (private, no CORS)
+- CloudFront distribution serving `artifacts.{CDK_DOMAIN_NAME}`
+- Route 53 A record for the artifacts subdomain
+- Python render Lambda that wraps content in a strict CSP shell
+
+To enable, set these GitHub environment variables before running:
+
+- `CDK_ARTIFACTS_ENABLED=true`
+- `CDK_ARTIFACTS_CERTIFICATE_ARN` — ACM cert ARN that covers `artifacts.{domain}`. **Must be in `us-east-1`** (CloudFront requirement). Reuse `CDK_FRONTEND_CERTIFICATE_ARN` **only if `CDK_DOMAIN_NAME` is your apex** — a `*.example.com` cert covers `artifacts.example.com` but, because TLS wildcards are one label deep, does **not** cover `artifacts.alpha.example.com`. If `CDK_DOMAIN_NAME` is a subdomain, issue a dedicated `us-east-1` cert for `*.{CDK_DOMAIN_NAME}`. See [Step 2c](./step-02-aws-setup.md#2c-create-acm-certificates).
+- `CDK_ARTIFACTS_RETENTION_DAYS` *(optional, default 90)* — how long soft-deleted artifacts linger before lifecycle expiry.
+- `CDK_ARTIFACTS_EXTRA_FRAME_ANCESTORS` *(optional, default none)* — comma-separated extra origins allowed to embed artifact iframes via CSP `frame-ancestors`, on top of `https://{domain}`. Set to `http://localhost:4200` to point a local SPA at this environment. **Leave unset in production** — every listed origin can frame users' artifacts. Prefer a one-off targeted `cdk deploy '*ArtifactsStack*'` with this var exported over committing it as a CI variable, so localhost never lands in automated shared-env deploys.
+
+The artifact origin is intentionally a sibling subdomain (not the SPA origin) so artifact JS runs cross-origin and cannot access the `__Host-` session cookies, `localStorage`, or the app API. Defense in depth via strict CSP (`connect-src 'none'`, pinned `frame-ancestors`) is enforced both at the Lambda response and at the CloudFront response-headers policy.
+
+</details>
+
+<details>
+<summary>2. Deploy MCP Sandbox (Optional)</summary>
+
+Provisions the MCP Apps **sandbox-proxy origin** — a dedicated cross-origin shell that the SPA's MCP App iframe is pointed at, so interactive MCP App UIs run isolated from the `ai.client` origin. This is PR #1 of the MCP Apps host-renderer initiative (`docs/kaizen/scoping/mcp-apps-host-renderer.md`). Skip if you don't need MCP Apps; the rest of the stack works without it.
+
+Creates:
+- S3 bucket (private, OAC-only) holding the static `proxy.html` + `proxy.js` shell
+- CloudFront distribution serving `mcp-sandbox.{CDK_DOMAIN_NAME}`
+- Route 53 A record for the sandbox subdomain
+- A CloudFront response-headers policy that stamps `Content-Security-Policy: frame-ancestors <SPA origin only>`
+
+To enable, set these GitHub environment variables before running:
+
+- `CDK_MCP_SANDBOX_ENABLED=true`
+- `CDK_MCP_SANDBOX_CERTIFICATE_ARN` — ACM cert ARN that covers `mcp-sandbox.{domain}`. **Must be in `us-east-1`** (CloudFront requirement). The same wildcard-depth caveat as Artifacts applies: a `*.example.com` cert covers `mcp-sandbox.example.com` but **not** `mcp-sandbox.alpha.example.com`. If `CDK_DOMAIN_NAME` is a subdomain, issue a dedicated `us-east-1` cert for `*.{CDK_DOMAIN_NAME}`. See [Step 2c](./step-02-aws-setup.md#2c-create-acm-certificates).
+- `CDK_MCP_SANDBOX_EXTRA_FRAME_ANCESTORS` *(optional, default none)* — comma-separated extra origins allowed to embed the proxy iframe via CSP `frame-ancestors`, on top of `https://{domain}`. Set to `http://localhost:4200` to point a local SPA at this environment. **Leave unset in production** — every listed origin can frame the proxy. Prefer a one-off targeted `cdk deploy '*McpSandboxStack*'` with this var exported over committing it as a CI variable, so localhost never lands in automated shared-env deploys.
+
+The sandbox origin is intentionally a sibling subdomain (not the SPA origin), matching the Artifacts pattern: it is the **outer** half of the spec's Sandbox Proxy pattern, giving a stable cross-origin boundary so the inner MCP App content frame's `allow-same-origin` never reaches the SPA's cookies/`localStorage`/app API. When this stack is enabled, the Inference API conditionally consumes its `/mcp-sandbox/origin` SSM export into `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` and surfaces it on the `ui_resource` SSE event so the SPA knows where to frame an App. The MCP Apps host surface is now on by default (`AGENTCORE_MCP_APPS_HOST_ENABLED=true` since PR #7), but **stays dormant until an MCP-Apps-capable server is registered** in the tool catalog — see [Register an MCP-Apps-capable MCP server](#register-an-mcp-apps-capable-mcp-server) below. If you skip this stack the surface stays dormant regardless, because the SPA has no proxy origin to frame an App in.
+
+</details>
+
 <details>
 <summary>3. Deploy Inference API (AgentCore Runtime)</summary>
 
@@ -161,6 +225,77 @@ You can re-run this workflow later to update seed data.
 
 ---
 
+## Register an MCP-Apps-capable MCP server
+
+The MCP Apps host renderer (`docs/kaizen/scoping/mcp-apps-host-renderer.md`) is **on by default** (`AGENTCORE_MCP_APPS_HOST_ENABLED=true` since PR #7). But it stays completely dormant until you register an MCP-Apps-capable MCP server in the tool catalog — there is no built-in MCP App. Registration is a normal **tool-catalog** operation (DynamoDB, via the admin API); it is *not* a code constant or part of bootstrap seeding, so each environment opts in explicitly.
+
+### What "MCP-Apps-capable" means
+
+A server is MCP-Apps-capable when, on top of being a normal external MCP server, it:
+
+- advertises `_meta.ui` on its `tools/list` entries (a `resourceUri` of the form `ui://…` plus a `visibility` that includes `app`), and
+- serves that `ui://` resource via `resources/read` as `text/html;profile=mcp-app`.
+
+Our host advertises the `io.modelcontextprotocol/ui` extension on every outbound MCP `initialize` automatically (Gateway + external clients) — the server side needs no per-server opt-in here. Servers that don't speak the extension simply ignore it and behave as plain MCP tools.
+
+### Prerequisites
+
+1. **MCP Sandbox stack deployed** (`CDK_MCP_SANDBOX_ENABLED=true`, see *Deploy MCP Sandbox (Optional)* under [What Each Workflow Does](#what-each-workflow-does)). Without it the Inference API has no `/{prefix}/mcp-sandbox/origin` SSM value to consume, `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` stays empty, and the SPA has no cross-origin shell to frame the App in — every other piece can be wired and the App still won't render.
+2. **An MCP-Apps server reachable over Streamable HTTP.** External MCP tools connect via the Streamable-HTTP client (not stdio). Run the example server in HTTP mode (below).
+3. **Admin access** to the running App API (admin session cookie; the tool admin endpoints chain through `require_admin`).
+
+### Step 1 — Run the example server (dogfood)
+
+We dogfood with [`modelcontextprotocol/ext-apps` → `budget-allocator-server`](https://github.com/modelcontextprotocol/ext-apps/tree/main/examples/budget-allocator-server) — a form-style App (sliders, presets, benchmarks) that exercises `ui/update-model-context` and app-initiated `tools/call` without any 3D/charting backend infra. `scenario-modeler-server` works the same way if you prefer it.
+
+```bash
+git clone https://github.com/modelcontextprotocol/ext-apps
+cd ext-apps/examples/budget-allocator-server
+npm install
+npm run start:http          # stateless Streamable HTTP on http://localhost:3001/mcp
+```
+
+(Override the port with `PORT=…`. The README's default config is stdio — use `start:http`, since external MCP tools here are Streamable-HTTP only.)
+
+### Step 2 — Register it in the tool catalog
+
+Either use the **Admin UI** (Settings → Tools → Add tool → protocol *MCP (external)*) or the admin API directly. A ready-to-POST body is committed at [`docs/kaizen/scoping/mcp-apps-budget-allocator.tool.json`](../../../docs/kaizen/scoping/mcp-apps-budget-allocator.tool.json).
+
+Optionally pre-flight discovery (lists the server's tool names so you can confirm reachability/auth before creating the catalog entry):
+
+```bash
+curl -X POST "$APP_API/admin/tools/discover" \
+  -H 'Content-Type: application/json' --cookie "$ADMIN_COOKIE" \
+  -d '{"serverUrl":"http://localhost:3001/mcp","transport":"streamable-http","authType":"none"}'
+```
+
+Then create the catalog entry:
+
+```bash
+curl -X POST "$APP_API/admin/tools/" \
+  -H 'Content-Type: application/json' --cookie "$ADMIN_COOKIE" \
+  -d @docs/kaizen/scoping/mcp-apps-budget-allocator.tool.json
+```
+
+`authType: none` is only appropriate for an unauthenticated local/dev server. A real deployed MCP-Apps server uses `aws-iam` (Lambda Function URL / API Gateway behind SigV4), `api-key` (+ `secretArn`), or `oauth2` — exactly like any other external MCP tool. After creating the tool, grant it to the relevant App roles (Settings → Tools → Roles, or the `/admin/tools/{id}/roles` endpoints); a freshly created tool is in the catalog but not yet visible to any role.
+
+### Step 3 — Local-dev environment
+
+When running the backend locally (App API + Inference API), set in `backend/src/.env` (template: `backend/src/.env.example`):
+
+| Var | Value | Why |
+|-----|-------|-----|
+| `AGENTCORE_MCP_APPS_HOST_ENABLED` | `true` | Default; set `false` to opt this env out entirely. |
+| `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` | the deployed `mcp-sandbox.{domain}` origin (SSM `/{prefix}/mcp-sandbox/origin`) | The agent puts this on the `ui_resource` SSE event as `sandboxOrigin`; the SPA frames the App in it. Empty ⇒ no render. There is no local sandbox shell — point at a deployed one. |
+
+For the local SPA (`http://localhost:4200`) to be allowed to embed that deployed sandbox origin, the sandbox stack must list it in CSP `frame-ancestors` — deploy McpSandbox once with `CDK_MCP_SANDBOX_EXTRA_FRAME_ANCESTORS=http://localhost:4200` (one-off targeted deploy; never commit localhost as a shared CI variable — see *Deploy MCP Sandbox (Optional)* under [What Each Workflow Does](#what-each-workflow-does)).
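+
+One way to fill in `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` is to read the SSM value directly. A minimal sketch, assuming `{prefix}` is your `CDK_PROJECT_PREFIX`, the parameter is a plain string, and your AWS CLI credentials can read it:
+
+```bash
+# Assumption: CDK_PROJECT_PREFIX and AWS_REGION match the deployed environment.
+SANDBOX_ORIGIN=$(aws ssm get-parameter \
+  --name "/${CDK_PROJECT_PREFIX}/mcp-sandbox/origin" \
+  --region "$AWS_REGION" \
+  --query 'Parameter.Value' --output text)
+
+# Print the line to paste into backend/src/.env.
+echo "AGENTCORE_MCP_APPS_SANDBOX_ORIGIN=${SANDBOX_ORIGIN}"
+```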
+
+### Step 4 — Verify end-to-end
+
+In a chat, prompt the agent so it invokes the registered tool. You should see, in order: a `tool_use`/`tool_result` card, then the App's iframe rendering inside it (`ui_resource` SSE event carrying a non-empty `sandboxOrigin`). Driving the form should push `ui/notifications/tool-input`, app-initiated buttons should round-trip through `tools/call`, and a `ui/update-model-context` write should be visible to the model on the **next** turn. If the iframe stays blank, the `sandboxOrigin` is almost always empty (prerequisite 1) or the SPA origin isn't in the sandbox `frame-ancestors` (Step 3).
+
+---
+
 ## If a Workflow Fails
 
 1. Click into the failed workflow run to see the error logs
diff --git a/.github/docs/deploy/step-05-verify.md b/.github/docs/deploy/step-05-verify.md
index 5f4ea981..9e3450d0 100644
--- a/.github/docs/deploy/step-05-verify.md
+++ b/.github/docs/deploy/step-05-verify.md
@@ -72,6 +72,34 @@ The user who completed the first-boot setup is automatically the system admin.
 > [!TIP]
 > To add federated identity providers (Entra ID, Okta, Google, etc.), use the admin dashboard's authentication settings. No redeployment is needed.
 
+### 5. (Optional) MCP Apps dogfood — end-to-end
+
+Run this only if you've enabled the MCP Apps host renderer (MCP Sandbox stack deployed and an MCP-Apps server registered — see [Register an MCP-Apps-capable MCP server](./step-04-deploy.md#register-an-mcp-apps-capable-mcp-server)). It is the manual e2e scenario for the host-renderer initiative and walks every host↔App interaction. Using the `budget-allocator-server` example from the runbook:
+
+**Setup**
+- [ ] `budget-allocator-server` running over Streamable HTTP and registered as an `mcp_external` tool, granted to your role
+- [ ] `AGENTCORE_MCP_APPS_HOST_ENABLED=true` (default) and `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` resolves to the deployed `mcp-sandbox.{domain}` origin
+- [ ] Your SPA origin is in the sandbox's CSP `frame-ancestors`
+
+**Scenario** — in a fresh chat, ask the agent to "help me allocate a budget" (or anything that invokes the tool):
+
+- [ ] **Resource fetch** — a `tool_use` then `tool_result` card appears; backend logs show a server-side `resources/read` for the tool's `ui://…` resource (no client fetch)
+- [ ] **Iframe render** — the App renders *inside* the tool card: a `ui_resource` SSE event arrives with a **non-empty** `sandboxOrigin`, and the iframe is sourced from that origin (not `srcdoc` against the SPA origin)
+- [ ] **Tool-input push** — the App shows the arguments the model called it with (host pushed `ui/notifications/tool-input` from the active stream)
+- [ ] **App-initiated `tools/call`** — drive the form (move a slider / pick a preset) so the App calls a server tool; the call shows up as its own tool card in the thread *and* the App updates from the `ui/notifications/tool-result` it gets back
+- [ ] **`ui/update-model-context` mutates the next turn** — after changing the allocation, send a new chat message that asks about it (e.g. "is my current split reasonable?"); the model's reply reflects the App's latest state — i.e. context written via `ui/update-model-context` was merged into the **next** turn (not the one that opened the App)
+- [ ] **`ui/open-link` consent prompt** — trigger a link-open from the App (e.g. an "industry benchmarks" link); an inline consent prompt appears in the message list (modeled on the OAuth-consent prompt) and the link only opens after you approve. (Consent is **frontend-only** — there is no `ui_consent_required` SSE event; don't look for one.)
+
+<details>
+<summary>The App card appears but the iframe is blank</summary>
+
+In order of likelihood:
+1. `sandboxOrigin` is empty on the `ui_resource` event → the MCP Sandbox stack isn't deployed, or `CDK_MCP_SANDBOX_ENABLED` wasn't `true` when the Inference API deployed (it consumes `/{prefix}/mcp-sandbox/origin` conditionally).
+2. The SPA origin isn't in the sandbox CSP `frame-ancestors` → the browser blocks the frame (console shows a `frame-ancestors` violation). Redeploy McpSandbox with the right origin (`CDK_MCP_SANDBOX_EXTRA_FRAME_ANCESTORS` for localhost).
+3. The server didn't return `_meta.ui` on `tools/list`, or its `ui://` resource isn't `text/html;profile=mcp-app` → it isn't actually MCP-Apps-capable; re-check with the discover endpoint and the server's own logs.
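+
+For cause 2, you can inspect the header the sandbox actually serves. A sketch, assuming `proxy.html` sits at the root of the distribution and `SANDBOX_ORIGIN` (a placeholder name) holds the `sandboxOrigin` value from the `ui_resource` event:
+
+```bash
+# Placeholder: substitute the sandboxOrigin seen on the ui_resource event.
+SANDBOX_ORIGIN=https://mcp-sandbox.example.com
+
+# Fetch response headers only and show the CSP; your SPA origin should be listed in frame-ancestors.
+curl -sI "${SANDBOX_ORIGIN}/proxy.html" | grep -i '^content-security-policy'
+```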
+
+</details>
+
 ---
 
 ## You're Done!
diff --git a/.github/workflows/app-api.yml b/.github/workflows/app-api.yml
index c518c286..ab6a216a 100644
--- a/.github/workflows/app-api.yml
+++ b/.github/workflows/app-api.yml
@@ -273,7 +273,6 @@ jobs:
       CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
       CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
       CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
-      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_APP_API_ENABLED: ${{ vars.CDK_APP_API_ENABLED }}
       CDK_APP_API_CPU: ${{ vars.CDK_APP_API_CPU }}
       CDK_APP_API_MEMORY: ${{ vars.CDK_APP_API_MEMORY }}
@@ -284,6 +283,9 @@ jobs:
       CDK_FILE_UPLOAD_MAX_SIZE_MB: ${{ vars.CDK_FILE_UPLOAD_MAX_SIZE_MB }}
       CDK_FINE_TUNING_ENABLED: ${{ vars.CDK_FINE_TUNING_ENABLED }}
       CDK_FINE_TUNING_DEFAULT_QUOTA_HOURS: ${{ vars.CDK_FINE_TUNING_DEFAULT_QUOTA_HOURS }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
 
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
@@ -348,6 +350,9 @@ jobs:
       CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
       CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
       CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
       AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
@@ -412,6 +417,9 @@ jobs:
       IMAGE_TAG: ${{ needs.build-docker.outputs.image-tag }}
       CDK_AWS_REGION: ${{ vars.AWS_REGION }}
       CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
       AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
@@ -473,7 +481,6 @@ jobs:
       CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
       CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
       CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
-      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_APP_API_ENABLED: ${{ vars.CDK_APP_API_ENABLED }}
       CDK_APP_API_CPU: ${{ vars.CDK_APP_API_CPU }}
       CDK_APP_API_MEMORY: ${{ vars.CDK_APP_API_MEMORY }}
@@ -484,6 +491,9 @@ jobs:
       CDK_FILE_UPLOAD_MAX_SIZE_MB: ${{ vars.CDK_FILE_UPLOAD_MAX_SIZE_MB }}
       CDK_FINE_TUNING_ENABLED: ${{ vars.CDK_FINE_TUNING_ENABLED }}
       CDK_FINE_TUNING_DEFAULT_QUOTA_HOURS: ${{ vars.CDK_FINE_TUNING_DEFAULT_QUOTA_HOURS }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
 
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
diff --git a/.github/workflows/artifacts.yml b/.github/workflows/artifacts.yml
new file mode 100644
index 00000000..fd280255
--- /dev/null
+++ b/.github/workflows/artifacts.yml
@@ -0,0 +1,384 @@
+name: "2. Deploy Artifacts (DDB, S3, CloudFront, Lambda)"
+
+# Provisions the artifact iframe rendering pipeline (DynamoDB metadata table,
+# S3 content bucket, CloudFront distribution at artifacts.{domain}, and the
+# render Lambda). Gated on CDK_ARTIFACTS_ENABLED: the stack is skipped
+# end-to-end when disabled, identical to the SageMaker Fine-Tuning pattern.
+#
+# Deploy tier 1: depends only on InfrastructureStack (via SSM read of the
+# render-token signing-secret ARN). Parallel-safe with RAG Ingestion,
+# Gateway, and Fine-Tuning. Must complete before Inference API, App API,
+# and Frontend, which read this stack's SSM exports.
+
+on:
+  push:
+    branches:
+      - main
+      - develop
+    paths:
+      - 'infrastructure/lib/artifacts-stack.ts'
+      - 'infrastructure/lib/config.ts'
+      - 'infrastructure/bin/infrastructure.ts'
+      - 'infrastructure/cdk.json'
+      - 'infrastructure/cdk.context.json'
+      - 'infrastructure/package.json'
+      - 'backend/src/lambdas/artifact_render/**'
+      - 'scripts/stack-artifacts/**'
+      - '.github/workflows/artifacts.yml'
+  pull_request:
+    branches:
+      - main
+      - develop
+    paths:
+      - 'infrastructure/lib/artifacts-stack.ts'
+      - 'infrastructure/lib/config.ts'
+      - 'infrastructure/bin/infrastructure.ts'
+      - 'infrastructure/cdk.json'
+      - 'infrastructure/cdk.context.json'
+      - 'infrastructure/package.json'
+      - 'backend/src/lambdas/artifact_render/**'
+      - 'scripts/stack-artifacts/**'
+      - '.github/workflows/artifacts.yml'
+  workflow_dispatch:
+    inputs:
+      environment:
+        description: 'Deployment environment'
+        required: true
+        default: 'production'
+        type: choice
+        options:
+          - production
+          - staging
+          - development
+      skip_tests:
+        description: 'Skip tests'
+        required: false
+        default: 'false'
+      skip_deploy:
+        description: 'Skip deployment'
+        required: false
+        default: 'false'
+
+permissions:
+  contents: read
+
+env:
+  CDK_REQUIRE_APPROVAL: never
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
+  LOAD_ENV_QUIET: true
+
+concurrency:
+  group: artifacts-${{ github.ref }}
+  cancel-in-progress: false
+
+jobs:
+  check-stack-dependencies:
+    name: Check Stack Dependencies
+    runs-on: ubuntu-24.04
+    if: github.event_name != 'pull_request'
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Run stack dependency tests
+        run: |
+          bash scripts/common/test-stack-dependencies.sh
+
+  install:
+    name: Install Dependencies
+    runs-on: ubuntu-24.04
+    if: github.event_name != 'pull_request'
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Install system dependencies
+        run: |
+          bash scripts/common/install-deps.sh
+
+      - name: Install CDK dependencies
+        run: |
+          bash scripts/stack-artifacts/install.sh
+
+      - name: Save node_modules cache
+        uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: infrastructure/node_modules
+          key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }}
+
+  build:
+    name: Build CDK Code
+    runs-on: ubuntu-24.04
+    needs: [install, check-stack-dependencies]
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Restore node_modules cache
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: infrastructure/node_modules
+          key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }}
+
+      - name: Build CDK
+        run: |
+          bash scripts/stack-artifacts/build-cdk.sh
+
+      - name: Save build artifacts cache
+        uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: |
+            infrastructure/lib/**/*.js
+            infrastructure/lib/**/*.d.ts
+            infrastructure/bin/**/*.js
+            infrastructure/bin/**/*.d.ts
+          key: infrastructure-build-${{ github.sha }}
+
+  synth:
+    name: Synthesize CloudFormation Template
+    runs-on: ubuntu-24.04
+    needs: build
+    if: github.event_name != 'pull_request'
+    outputs:
+      enabled: ${{ steps.check.outputs.enabled }}
+
+    environment: ${{ github.event.inputs.environment || (github.ref == 'refs/heads/main' && 'production') || (github.ref == 'refs/heads/develop' && 'development') || 'development' }}
+
+    permissions:
+      id-token: write
+      contents: read
+
+    env:
+      CDK_AWS_REGION: ${{ vars.AWS_REGION }}
+      CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
+      CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
+      CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
+      CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
+      CDK_RETAIN_DATA_ON_DELETE: ${{ vars.CDK_RETAIN_DATA_ON_DELETE }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_ARTIFACTS_RETENTION_DAYS: ${{ vars.CDK_ARTIFACTS_RETENTION_DAYS }}
+      CDK_ARTIFACTS_EXTRA_FRAME_ANCESTORS: ${{ vars.CDK_ARTIFACTS_EXTRA_FRAME_ANCESTORS }}
+      CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
+      AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
+      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+
+    steps:
+      - name: Check if artifacts are enabled
+        id: check
+        run: |
+          if [ "$CDK_ARTIFACTS_ENABLED" = "true" ] || [ "$CDK_ARTIFACTS_ENABLED" = "1" ]; then
+            echo "enabled=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "enabled=false" >> "$GITHUB_OUTPUT"
+            echo "Artifacts stack is disabled — skipping synth"
+          fi
+
+      - name: Checkout code
+        if: steps.check.outputs.enabled == 'true'
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Restore node_modules cache
+        if: steps.check.outputs.enabled == 'true'
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: infrastructure/node_modules
+          key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }}
+
+      - name: Restore build artifacts
+        if: steps.check.outputs.enabled == 'true'
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: |
+            infrastructure/lib/**/*.js
+            infrastructure/lib/**/*.d.ts
+            infrastructure/bin/**/*.js
+            infrastructure/bin/**/*.d.ts
+          key: infrastructure-build-${{ github.sha }}
+
+      - name: Configure AWS credentials
+        if: steps.check.outputs.enabled == 'true'
+        uses: ./.github/actions/configure-aws-credentials
+        with:
+          aws-region: ${{ env.CDK_AWS_REGION }}
+          role-session-name: GitHubActions-Artifacts-Synth
+          aws-role-arn: ${{ env.AWS_ROLE_ARN }}
+          aws-access-key-id: ${{ env.AWS_ACCESS_KEY_ID }}
+          aws-secret-access-key: ${{ env.AWS_SECRET_ACCESS_KEY }}
+
+      - name: Install system dependencies
+        if: steps.check.outputs.enabled == 'true'
+        run: |
+          bash scripts/common/install-deps.sh
+
+      - name: Synthesize CloudFormation template
+        if: steps.check.outputs.enabled == 'true'
+        run: |
+          bash scripts/stack-artifacts/synth.sh
+
+      - name: Upload synthesized templates
+        if: steps.check.outputs.enabled == 'true'
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
+        with:
+          name: artifacts-cdk-synth
+          path: infrastructure/cdk.out/
+          retention-days: 7
+
+  test:
+    name: Validate CloudFormation Template
+    runs-on: ubuntu-24.04
+    needs: synth
+    if: ${{ needs.synth.outputs.enabled == 'true' && github.event.inputs.skip_tests != 'true' }}
+
+    environment: ${{ github.event.inputs.environment || (github.ref == 'refs/heads/main' && 'production') || (github.ref == 'refs/heads/develop' && 'development') || 'development' }}
+
+    permissions:
+      id-token: write
+      contents: read
+
+    env:
+      CDK_AWS_REGION: ${{ vars.AWS_REGION }}
+      CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
+      CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
+      CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
+      CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
+      CDK_RETAIN_DATA_ON_DELETE: ${{ vars.CDK_RETAIN_DATA_ON_DELETE }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_ARTIFACTS_RETENTION_DAYS: ${{ vars.CDK_ARTIFACTS_RETENTION_DAYS }}
+      CDK_ARTIFACTS_EXTRA_FRAME_ANCESTORS: ${{ vars.CDK_ARTIFACTS_EXTRA_FRAME_ANCESTORS }}
+      CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
+      AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
+      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Restore node_modules cache
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: infrastructure/node_modules
+          key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }}
+
+      - name: Restore build artifacts
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: |
+            infrastructure/lib/**/*.js
+            infrastructure/lib/**/*.d.ts
+            infrastructure/bin/**/*.js
+            infrastructure/bin/**/*.d.ts
+          key: infrastructure-build-${{ github.sha }}
+
+      - name: Download synthesized templates
+        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        with:
+          name: artifacts-cdk-synth
+          path: infrastructure/cdk.out/
+
+      - name: Configure AWS credentials
+        uses: ./.github/actions/configure-aws-credentials
+        with:
+          aws-region: ${{ env.CDK_AWS_REGION }}
+          role-session-name: GitHubActions-Artifacts-Test
+          aws-role-arn: ${{ env.AWS_ROLE_ARN }}
+          aws-access-key-id: ${{ env.AWS_ACCESS_KEY_ID }}
+          aws-secret-access-key: ${{ env.AWS_SECRET_ACCESS_KEY }}
+
+      - name: Install system dependencies
+        run: |
+          bash scripts/common/install-deps.sh
+
+      - name: Run CDK diff to validate template
+        run: |
+          bash scripts/stack-artifacts/test-cdk.sh
+
+  deploy:
+    name: Deploy Artifacts Stack
+    runs-on: ubuntu-24.04
+    needs: [synth, test]
+    if: |
+      always() && !cancelled() &&
+      needs.synth.outputs.enabled == 'true' &&
+      !contains(needs.*.result, 'failure') &&
+      (github.event_name == 'push' || github.event_name == 'workflow_dispatch') &&
+      (github.event_name != 'workflow_dispatch' || github.event.inputs.skip_deploy != 'true')
+
+    environment: ${{ github.event.inputs.environment || (github.ref == 'refs/heads/main' && 'production') || (github.ref == 'refs/heads/develop' && 'development') || 'development' }}
+
+    permissions:
+      id-token: write
+      contents: read
+
+    env:
+      CDK_AWS_REGION: ${{ vars.AWS_REGION }}
+      CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
+      CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
+      CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
+      CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
+      CDK_RETAIN_DATA_ON_DELETE: ${{ vars.CDK_RETAIN_DATA_ON_DELETE }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_ARTIFACTS_RETENTION_DAYS: ${{ vars.CDK_ARTIFACTS_RETENTION_DAYS }}
+      CDK_ARTIFACTS_EXTRA_FRAME_ANCESTORS: ${{ vars.CDK_ARTIFACTS_EXTRA_FRAME_ANCESTORS }}
+      CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
+      AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
+      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Restore node_modules cache
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: infrastructure/node_modules
+          key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }}
+
+      - name: Restore build artifacts
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: |
+            infrastructure/lib/**/*.js
+            infrastructure/lib/**/*.d.ts
+            infrastructure/bin/**/*.js
+            infrastructure/bin/**/*.d.ts
+          key: infrastructure-build-${{ github.sha }}
+
+      - name: Download synthesized templates
+        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        with:
+          name: artifacts-cdk-synth
+          path: infrastructure/cdk.out/
+
+      - name: Configure AWS credentials
+        uses: ./.github/actions/configure-aws-credentials
+        with:
+          aws-region: ${{ env.CDK_AWS_REGION }}
+          role-session-name: GitHubActions-Artifacts-Deploy
+          aws-role-arn: ${{ env.AWS_ROLE_ARN }}
+          aws-access-key-id: ${{ env.AWS_ACCESS_KEY_ID }}
+          aws-secret-access-key: ${{ env.AWS_SECRET_ACCESS_KEY }}
+
+      - name: Install system dependencies
+        run: |
+          bash scripts/common/install-deps.sh
+
+      - name: Deploy Artifacts Stack
+        run: |
+          bash scripts/stack-artifacts/deploy.sh
+
+      - name: Upload stack outputs
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
+        if: always()
+        with:
+          name: artifacts-outputs
+          path: infrastructure/artifacts-outputs.json
+          retention-days: 30
diff --git a/.github/workflows/backup-data.yml b/.github/workflows/backup-data.yml
new file mode 100644
index 00000000..bc2c7735
--- /dev/null
+++ b/.github/workflows/backup-data.yml
@@ -0,0 +1,102 @@
+name: "Backup Data (Pre-Migration)"
+
+# Manual-dispatch only — this is a one-shot tool intended to be run before
+# a destructive infrastructure migration. See scripts/backup-data/README.md.
+
+on:
+  workflow_dispatch:
+    inputs:
+      project_prefix:
+        description: "CDK_PROJECT_PREFIX of the environment to back up (e.g. 'boisestate-prod')"
+        required: true
+        type: string
+      aws_region:
+        description: "AWS region"
+        required: true
+        default: "us-west-2"
+        type: string
+      aws_environment:
+        description: "GitHub Environment (selects AWS_ROLE_ARN secret + approvals)"
+        required: true
+        default: "production"
+        type: choice
+        options:
+          - production
+          - staging
+          - development
+      include_ephemeral:
+        description: "Also back up TTL-driven session/state tables"
+        required: false
+        default: false
+        type: boolean
+      dry_run:
+        description: "Discover and list sources without performing any writes"
+        required: false
+        default: false
+        type: boolean
+      allow_partial:
+        description: "Succeed even if some components fail (manifest still reflects state)"
+        required: false
+        default: false
+        type: boolean
+
+permissions:
+  id-token: write   # OIDC role assumption
+  contents: read
+
+concurrency:
+  # One backup at a time per environment to avoid bucket-name collisions
+  # and double-export of the same tables.
+  group: backup-data-${{ inputs.aws_environment }}
+  cancel-in-progress: false
+
+jobs:
+  backup:
+    name: Backup ${{ inputs.project_prefix }}
+    runs-on: ubuntu-24.04
+    environment: ${{ inputs.aws_environment }}
+    timeout-minutes: 360  # large S3 syncs can take a while
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Configure AWS credentials
+        uses: ./.github/actions/configure-aws-credentials
+        with:
+          aws-region: ${{ inputs.aws_region }}
+          aws-role-arn: ${{ secrets.AWS_ROLE_ARN }}
+          role-session-name: "backup-data-${{ inputs.project_prefix }}"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@d0cc045d04ccac9d8b7881df0226f9e82c39688e # v6.8.0
+        with:
+          version: "0.5.x"
+
+      - name: Set up Python 3.13
+        run: uv python install 3.13
+
+      - name: Install backup tool
+        working-directory: scripts/backup-data
+        run: uv sync
+
+      - name: Run backup
+        working-directory: scripts/backup-data
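+        env:
+          AWS_REGION: ${{ inputs.aws_region }}
+          PROJECT_PREFIX: ${{ inputs.project_prefix }}
+        run: |
+          set -euo pipefail
+          # Free-text inputs reach the script through env vars rather than
+          # inline expression interpolation, so they are never parsed as shell.
+          ARGS=(--project-prefix "$PROJECT_PREFIX" --region "$AWS_REGION")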
+          if [[ "${{ inputs.include_ephemeral }}" == "true" ]]; then
+            ARGS+=(--include-ephemeral)
+          fi
+          if [[ "${{ inputs.dry_run }}" == "true" ]]; then
+            ARGS+=(--dry-run)
+          fi
+          if [[ "${{ inputs.allow_partial }}" == "true" ]]; then
+            ARGS+=(--allow-partial)
+          fi
+          uv run python backup.py "${ARGS[@]}"
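Note on the "Run backup" step above: the step always passes `--project-prefix` and `--region`, and appends `--include-ephemeral`, `--dry-run`, and `--allow-partial` only when the matching boolean input is `true`. A minimal sketch of that contract follows. `backup.py` itself is not shown in this diff, so `build_parser` and `workflow_args` are hypothetical helpers, and only the flag names come from the workflow.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the flags the workflow passes."""
    parser = argparse.ArgumentParser(prog="backup.py")
    parser.add_argument("--project-prefix", required=True)
    parser.add_argument("--region", required=True)
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--include-ephemeral", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--allow-partial", action="store_true")
    return parser


def workflow_args(project_prefix: str, region: str, include_ephemeral: bool = False,
                  dry_run: bool = False, allow_partial: bool = False) -> List[str]:
    """Assemble argv the same way the step's ARGS array does."""
    args = ["--project-prefix", project_prefix, "--region", region]
    if include_ephemeral:
        args.append("--include-ephemeral")
    if dry_run:
        args.append("--dry-run")
    if allow_partial:
        args.append("--allow-partial")
    return args


if __name__ == "__main__":
    # Example: a dry run against the prefix used in the input description.
    argv = workflow_args("boisestate-prod", "us-west-2", dry_run=True)
    print(build_parser().parse_args(argv))
```

Building the argument list as separate elements, as the bash array does, keeps a prefix containing spaces or shell metacharacters as a single argument.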
diff --git a/.github/workflows/frontend.yml b/.github/workflows/frontend.yml
index 9b46c12c..e50e65db 100644
--- a/.github/workflows/frontend.yml
+++ b/.github/workflows/frontend.yml
@@ -235,6 +235,9 @@ jobs:
       CDK_FRONTEND_ENABLED: ${{ vars.CDK_FRONTEND_ENABLED }}
       CDK_FRONTEND_CLOUDFRONT_PRICE_CLASS: ${{ vars.CDK_FRONTEND_CLOUDFRONT_PRICE_CLASS }}
       CDK_RETAIN_DATA_ON_DELETE: ${{ vars.CDK_RETAIN_DATA_ON_DELETE }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       CDK_CERTIFICATE_ARN: ${{ vars.CDK_CERTIFICATE_ARN }}
       CDK_FRONTEND_CERTIFICATE_ARN: ${{ vars.CDK_FRONTEND_CERTIFICATE_ARN }}
@@ -297,6 +300,9 @@ jobs:
       # Environment-scoped configuration
       CDK_AWS_REGION: ${{ vars.AWS_REGION }}
       CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
       AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
@@ -365,6 +371,9 @@ jobs:
       CDK_FRONTEND_ENABLED: ${{ vars.CDK_FRONTEND_ENABLED }}
       CDK_FRONTEND_CLOUDFRONT_PRICE_CLASS: ${{ vars.CDK_FRONTEND_CLOUDFRONT_PRICE_CLASS }}
       CDK_RETAIN_DATA_ON_DELETE: ${{ vars.CDK_RETAIN_DATA_ON_DELETE }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       CDK_CERTIFICATE_ARN: ${{ vars.CDK_CERTIFICATE_ARN }}
       CDK_FRONTEND_CERTIFICATE_ARN: ${{ vars.CDK_FRONTEND_CERTIFICATE_ARN }}
@@ -438,6 +447,9 @@ jobs:
       CDK_AWS_REGION: ${{ vars.AWS_REGION }}
       CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
       CDK_PRODUCTION: ${{ vars.CDK_PRODUCTION }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       CDK_FRONTEND_BUCKET_NAME: ${{ vars.CDK_FRONTEND_BUCKET_NAME }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
diff --git a/.github/workflows/inference-api.yml b/.github/workflows/inference-api.yml
index 88bdddc8..a210b133 100644
--- a/.github/workflows/inference-api.yml
+++ b/.github/workflows/inference-api.yml
@@ -295,6 +295,9 @@ jobs:
       CDK_INFERENCE_API_MAX_CAPACITY: ${{ vars.CDK_INFERENCE_API_MAX_CAPACITY }}
       ENV_INFERENCE_API_LOG_LEVEL: ${{ vars.ENV_INFERENCE_API_LOG_LEVEL }}
       CDK_INFERENCE_API_CORS_ORIGINS: ${{ vars.CDK_INFERENCE_API_CORS_ORIGINS }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       CDK_APP_VERSION: ${{ needs.build-docker.outputs.app-version }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
@@ -356,6 +359,9 @@ jobs:
       CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
       CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
       CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
       AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
@@ -422,6 +428,9 @@ jobs:
       APP_VERSION: ${{ needs.build-docker.outputs.app-version }}
       CDK_AWS_REGION: ${{ vars.AWS_REGION }}
       CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
       AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
@@ -491,6 +500,9 @@ jobs:
       CDK_INFERENCE_API_MAX_CAPACITY: ${{ vars.CDK_INFERENCE_API_MAX_CAPACITY }}
       ENV_INFERENCE_API_LOG_LEVEL: ${{ vars.ENV_INFERENCE_API_LOG_LEVEL }}
       CDK_INFERENCE_API_CORS_ORIGINS: ${{ vars.CDK_INFERENCE_API_CORS_ORIGINS }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
       AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
diff --git a/.github/workflows/infrastructure.yml b/.github/workflows/infrastructure.yml
index 6ed82f5d..d3a60487 100644
--- a/.github/workflows/infrastructure.yml
+++ b/.github/workflows/infrastructure.yml
@@ -157,7 +157,6 @@ jobs:
       CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
       CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
       CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
-      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_ALB_SUBDOMAIN: ${{ vars.CDK_ALB_SUBDOMAIN }}
       CDK_CERTIFICATE_ARN: ${{ vars.CDK_CERTIFICATE_ARN }}
       CDK_RETAIN_DATA_ON_DELETE: ${{ vars.CDK_RETAIN_DATA_ON_DELETE }}
@@ -167,6 +166,9 @@ jobs:
       CDK_COGNITO_CALLBACK_URLS: ${{ vars.CDK_COGNITO_CALLBACK_URLS }}
       CDK_COGNITO_LOGOUT_URLS: ${{ vars.CDK_COGNITO_LOGOUT_URLS }}
       CDK_COGNITO_SUPPORTED_IDPS: ${{ vars.CDK_COGNITO_SUPPORTED_IDPS }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
       AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
@@ -235,7 +237,6 @@ jobs:
       CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
       CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
       CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
-      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_ALB_SUBDOMAIN: ${{ vars.CDK_ALB_SUBDOMAIN }}
       CDK_CERTIFICATE_ARN: ${{ vars.CDK_CERTIFICATE_ARN }}
       CDK_RETAIN_DATA_ON_DELETE: ${{ vars.CDK_RETAIN_DATA_ON_DELETE }}
@@ -245,6 +246,9 @@ jobs:
       CDK_COGNITO_CALLBACK_URLS: ${{ vars.CDK_COGNITO_CALLBACK_URLS }}
       CDK_COGNITO_LOGOUT_URLS: ${{ vars.CDK_COGNITO_LOGOUT_URLS }}
       CDK_COGNITO_SUPPORTED_IDPS: ${{ vars.CDK_COGNITO_SUPPORTED_IDPS }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
       AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
@@ -317,7 +321,6 @@ jobs:
       CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
       CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
       CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
-      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_ALB_SUBDOMAIN: ${{ vars.CDK_ALB_SUBDOMAIN }}
       CDK_CERTIFICATE_ARN: ${{ vars.CDK_CERTIFICATE_ARN }}
       CDK_RETAIN_DATA_ON_DELETE: ${{ vars.CDK_RETAIN_DATA_ON_DELETE }}
@@ -335,6 +338,9 @@ jobs:
       CDK_COGNITO_CALLBACK_URLS: ${{ vars.CDK_COGNITO_CALLBACK_URLS }}
       CDK_COGNITO_LOGOUT_URLS: ${{ vars.CDK_COGNITO_LOGOUT_URLS }}
       CDK_COGNITO_SUPPORTED_IDPS: ${{ vars.CDK_COGNITO_SUPPORTED_IDPS }}
+      CDK_ARTIFACTS_ENABLED: ${{ vars.CDK_ARTIFACTS_ENABLED }}
+      CDK_ARTIFACTS_CERTIFICATE_ARN: ${{ vars.CDK_ARTIFACTS_CERTIFICATE_ARN }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
       CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
       AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
diff --git a/.github/workflows/mcp-sandbox.yml b/.github/workflows/mcp-sandbox.yml
new file mode 100644
index 00000000..af80d449
--- /dev/null
+++ b/.github/workflows/mcp-sandbox.yml
@@ -0,0 +1,383 @@
+name: "2. Deploy MCP Sandbox (S3, CloudFront, Route53)"
+
+# Provisions the MCP Apps sandbox-proxy origin: an S3 bucket holding the
+# static proxy shell, a CloudFront distribution at mcp-sandbox.{domain} that
+# stamps the CSP (frame-ancestors = SPA origin only), and a Route53 record.
+# Gated on CDK_MCP_SANDBOX_ENABLED — the stack is skipped end-to-end when
+# disabled, identical to the Artifacts / SageMaker Fine-Tuning pattern.
+#
+# PR #1 of docs/kaizen/scoping/mcp-apps-host-renderer.md. Deploy tier 1:
+# reads no cross-stack SSM. Parallel-safe with Artifacts, RAG Ingestion,
+# Gateway, and Fine-Tuning. Inert until the SPA wiring (PR #4) and
+# MCP_APPS_HOST_ENABLED (PR #7) land — nothing consumes its SSM origin
+# export before then.
+
+on:
+  push:
+    branches:
+      - main
+      - develop
+    paths:
+      - 'infrastructure/lib/mcp-sandbox-stack.ts'
+      - 'infrastructure/lib/config.ts'
+      - 'infrastructure/bin/infrastructure.ts'
+      - 'infrastructure/cdk.json'
+      - 'infrastructure/cdk.context.json'
+      - 'infrastructure/package.json'
+      - 'infrastructure/assets/mcp-sandbox/**'
+      - 'scripts/stack-mcp-sandbox/**'
+      - '.github/workflows/mcp-sandbox.yml'
+  pull_request:
+    branches:
+      - main
+      - develop
+    paths:
+      - 'infrastructure/lib/mcp-sandbox-stack.ts'
+      - 'infrastructure/lib/config.ts'
+      - 'infrastructure/bin/infrastructure.ts'
+      - 'infrastructure/cdk.json'
+      - 'infrastructure/cdk.context.json'
+      - 'infrastructure/package.json'
+      - 'infrastructure/assets/mcp-sandbox/**'
+      - 'scripts/stack-mcp-sandbox/**'
+      - '.github/workflows/mcp-sandbox.yml'
+  workflow_dispatch:
+    inputs:
+      environment:
+        description: 'Deployment environment'
+        required: true
+        default: 'production'
+        type: choice
+        options:
+          - production
+          - staging
+          - development
+      skip_tests:
+        description: 'Skip tests'
+        required: false
+        default: 'false'
+      skip_deploy:
+        description: 'Skip deployment'
+        required: false
+        default: 'false'
+
+permissions:
+  contents: read
+
+env:
+  CDK_REQUIRE_APPROVAL: never
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
+  LOAD_ENV_QUIET: true
+
+concurrency:
+  group: mcp-sandbox-${{ github.ref }}
+  cancel-in-progress: false
+
+jobs:
+  check-stack-dependencies:
+    name: Check Stack Dependencies
+    runs-on: ubuntu-24.04
+    if: github.event_name != 'pull_request'
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Run stack dependency tests
+        run: |
+          bash scripts/common/test-stack-dependencies.sh
+
+  install:
+    name: Install Dependencies
+    runs-on: ubuntu-24.04
+    if: github.event_name != 'pull_request'
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Install system dependencies
+        run: |
+          bash scripts/common/install-deps.sh
+
+      - name: Install CDK dependencies
+        run: |
+          bash scripts/stack-mcp-sandbox/install.sh
+
+      - name: Save node_modules cache
+        uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: infrastructure/node_modules
+          key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }}
+
+  build:
+    name: Build CDK Code
+    runs-on: ubuntu-24.04
+    needs: [install, check-stack-dependencies]
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Restore node_modules cache
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: infrastructure/node_modules
+          key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }}
+
+      - name: Build CDK
+        run: |
+          bash scripts/stack-mcp-sandbox/build-cdk.sh
+
+      - name: Save build artifacts cache
+        uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: |
+            infrastructure/lib/**/*.js
+            infrastructure/lib/**/*.d.ts
+            infrastructure/bin/**/*.js
+            infrastructure/bin/**/*.d.ts
+          key: infrastructure-build-${{ github.sha }}
+
+  synth:
+    name: Synthesize CloudFormation Template
+    runs-on: ubuntu-24.04
+    needs: build
+    if: github.event_name != 'pull_request'
+    outputs:
+      enabled: ${{ steps.check.outputs.enabled }}
+
+    environment: ${{ github.event.inputs.environment || (github.ref == 'refs/heads/main' && 'production') || (github.ref == 'refs/heads/develop' && 'development') || 'development' }}
+
+    permissions:
+      id-token: write
+      contents: read
+
+    env:
+      CDK_AWS_REGION: ${{ vars.AWS_REGION }}
+      CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
+      CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
+      CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
+      CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
+      CDK_RETAIN_DATA_ON_DELETE: ${{ vars.CDK_RETAIN_DATA_ON_DELETE }}
+      CDK_MCP_SANDBOX_ENABLED: ${{ vars.CDK_MCP_SANDBOX_ENABLED }}
+      CDK_MCP_SANDBOX_CERTIFICATE_ARN: ${{ vars.CDK_MCP_SANDBOX_CERTIFICATE_ARN }}
+      CDK_MCP_SANDBOX_EXTRA_FRAME_ANCESTORS: ${{ vars.CDK_MCP_SANDBOX_EXTRA_FRAME_ANCESTORS }}
+      CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
+      AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
+      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+
+    steps:
+      - name: Check if MCP sandbox is enabled
+        id: check
+        run: |
+          if [ "${{ vars.CDK_MCP_SANDBOX_ENABLED }}" = "true" ] || [ "${{ vars.CDK_MCP_SANDBOX_ENABLED }}" = "1" ]; then
+            echo "enabled=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "enabled=false" >> "$GITHUB_OUTPUT"
+            echo "MCP Sandbox stack is disabled — skipping synth"
+          fi
+
+      - name: Checkout code
+        if: steps.check.outputs.enabled == 'true'
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Restore node_modules cache
+        if: steps.check.outputs.enabled == 'true'
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: infrastructure/node_modules
+          key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }}
+
+      - name: Restore build artifacts
+        if: steps.check.outputs.enabled == 'true'
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: |
+            infrastructure/lib/**/*.js
+            infrastructure/lib/**/*.d.ts
+            infrastructure/bin/**/*.js
+            infrastructure/bin/**/*.d.ts
+          key: infrastructure-build-${{ github.sha }}
+
+      - name: Configure AWS credentials
+        if: steps.check.outputs.enabled == 'true'
+        uses: ./.github/actions/configure-aws-credentials
+        with:
+          aws-region: ${{ env.CDK_AWS_REGION }}
+          role-session-name: GitHubActions-McpSandbox-Synth
+          aws-role-arn: ${{ env.AWS_ROLE_ARN }}
+          aws-access-key-id: ${{ env.AWS_ACCESS_KEY_ID }}
+          aws-secret-access-key: ${{ env.AWS_SECRET_ACCESS_KEY }}
+
+      - name: Install system dependencies
+        if: steps.check.outputs.enabled == 'true'
+        run: |
+          bash scripts/common/install-deps.sh
+
+      - name: Synthesize CloudFormation template
+        if: steps.check.outputs.enabled == 'true'
+        run: |
+          bash scripts/stack-mcp-sandbox/synth.sh
+
+      - name: Upload synthesized templates
+        if: steps.check.outputs.enabled == 'true'
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
+        with:
+          name: mcp-sandbox-cdk-synth
+          path: infrastructure/cdk.out/
+          retention-days: 7
+
+  test:
+    name: Validate CloudFormation Template
+    runs-on: ubuntu-24.04
+    needs: synth
+    if: ${{ needs.synth.outputs.enabled == 'true' && github.event.inputs.skip_tests != 'true' }}
+
+    environment: ${{ github.event.inputs.environment || (github.ref == 'refs/heads/main' && 'production') || (github.ref == 'refs/heads/develop' && 'development') || 'development' }}
+
+    permissions:
+      id-token: write
+      contents: read
+
+    env:
+      CDK_AWS_REGION: ${{ vars.AWS_REGION }}
+      CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
+      CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
+      CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
+      CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
+      CDK_RETAIN_DATA_ON_DELETE: ${{ vars.CDK_RETAIN_DATA_ON_DELETE }}
+      CDK_MCP_SANDBOX_ENABLED: ${{ vars.CDK_MCP_SANDBOX_ENABLED }}
+      CDK_MCP_SANDBOX_CERTIFICATE_ARN: ${{ vars.CDK_MCP_SANDBOX_CERTIFICATE_ARN }}
+      CDK_MCP_SANDBOX_EXTRA_FRAME_ANCESTORS: ${{ vars.CDK_MCP_SANDBOX_EXTRA_FRAME_ANCESTORS }}
+      CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
+      AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
+      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Restore node_modules cache
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: infrastructure/node_modules
+          key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }}
+
+      - name: Restore build artifacts
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: |
+            infrastructure/lib/**/*.js
+            infrastructure/lib/**/*.d.ts
+            infrastructure/bin/**/*.js
+            infrastructure/bin/**/*.d.ts
+          key: infrastructure-build-${{ github.sha }}
+
+      - name: Download synthesized templates
+        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        with:
+          name: mcp-sandbox-cdk-synth
+          path: infrastructure/cdk.out/
+
+      - name: Configure AWS credentials
+        uses: ./.github/actions/configure-aws-credentials
+        with:
+          aws-region: ${{ env.CDK_AWS_REGION }}
+          role-session-name: GitHubActions-McpSandbox-Test
+          aws-role-arn: ${{ env.AWS_ROLE_ARN }}
+          aws-access-key-id: ${{ env.AWS_ACCESS_KEY_ID }}
+          aws-secret-access-key: ${{ env.AWS_SECRET_ACCESS_KEY }}
+
+      - name: Install system dependencies
+        run: |
+          bash scripts/common/install-deps.sh
+
+      - name: Run CDK diff to validate template
+        run: |
+          bash scripts/stack-mcp-sandbox/test-cdk.sh
+
+  deploy:
+    name: Deploy MCP Sandbox Stack
+    runs-on: ubuntu-24.04
+    needs: [synth, test]
+    if: |
+      always() && !cancelled() &&
+      needs.synth.outputs.enabled == 'true' &&
+      !contains(needs.*.result, 'failure') &&
+      (github.event_name == 'push' || github.event_name == 'workflow_dispatch') &&
+      (github.event_name != 'workflow_dispatch' || github.event.inputs.skip_deploy != 'true')
+
+    environment: ${{ github.event.inputs.environment || (github.ref == 'refs/heads/main' && 'production') || (github.ref == 'refs/heads/develop' && 'development') || 'development' }}
+
+    permissions:
+      id-token: write
+      contents: read
+
+    env:
+      CDK_AWS_REGION: ${{ vars.AWS_REGION }}
+      CDK_PROJECT_PREFIX: ${{ vars.CDK_PROJECT_PREFIX }}
+      CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
+      CDK_DOMAIN_NAME: ${{ vars.CDK_DOMAIN_NAME }}
+      CDK_HOSTED_ZONE_DOMAIN: ${{ vars.CDK_HOSTED_ZONE_DOMAIN }}
+      CDK_CORS_ORIGINS: ${{ vars.CDK_CORS_ORIGINS }}
+      CDK_RETAIN_DATA_ON_DELETE: ${{ vars.CDK_RETAIN_DATA_ON_DELETE }}
+      CDK_MCP_SANDBOX_ENABLED: ${{ vars.CDK_MCP_SANDBOX_ENABLED }}
+      CDK_MCP_SANDBOX_CERTIFICATE_ARN: ${{ vars.CDK_MCP_SANDBOX_CERTIFICATE_ARN }}
+      CDK_MCP_SANDBOX_EXTRA_FRAME_ANCESTORS: ${{ vars.CDK_MCP_SANDBOX_EXTRA_FRAME_ANCESTORS }}
+      CDK_AWS_ACCOUNT: ${{ vars.CDK_AWS_ACCOUNT }}
+      AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
+      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Restore node_modules cache
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: infrastructure/node_modules
+          key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }}
+
+      - name: Restore build artifacts
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        with:
+          path: |
+            infrastructure/lib/**/*.js
+            infrastructure/lib/**/*.d.ts
+            infrastructure/bin/**/*.js
+            infrastructure/bin/**/*.d.ts
+          key: infrastructure-build-${{ github.sha }}
+
+      - name: Download synthesized templates
+        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        with:
+          name: mcp-sandbox-cdk-synth
+          path: infrastructure/cdk.out/
+
+      - name: Configure AWS credentials
+        uses: ./.github/actions/configure-aws-credentials
+        with:
+          aws-region: ${{ env.CDK_AWS_REGION }}
+          role-session-name: GitHubActions-McpSandbox-Deploy
+          aws-role-arn: ${{ env.AWS_ROLE_ARN }}
+          aws-access-key-id: ${{ env.AWS_ACCESS_KEY_ID }}
+          aws-secret-access-key: ${{ env.AWS_SECRET_ACCESS_KEY }}
+
+      - name: Install system dependencies
+        run: |
+          bash scripts/common/install-deps.sh
+
+      - name: Deploy MCP Sandbox Stack
+        run: |
+          bash scripts/stack-mcp-sandbox/deploy.sh
+
+      - name: Upload stack outputs
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
+        if: always()
+        with:
+          name: mcp-sandbox-outputs
+          path: infrastructure/mcp-sandbox-outputs.json
+          retention-days: 30
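Note on the workflow above: the `environment:` expression repeated on the synth, test, and deploy jobs resolves in a fixed order (an explicit dispatch input wins, then the branch decides, then `development` is the fallback), and the deploy job's `if:` combines the enabled flag, upstream results, and the triggering event. A minimal Python sketch of both rules, assuming GitHub's `||` / `&&` short-circuiting behaves like Python's `or` / `and` for these string values. `resolve_environment` and `should_deploy` are hypothetical names used only for illustration; GitHub evaluates the real expressions.

```python
from typing import List, Optional


def resolve_environment(dispatch_input: Optional[str], ref: str) -> str:
    """Mirror the `input || (ref == main && 'production') || ... || 'development'` chain."""
    return (
        dispatch_input
        or (ref == "refs/heads/main" and "production")
        or (ref == "refs/heads/develop" and "development")
        or "development"
    )


def should_deploy(event_name: str, enabled: str, needs_results: List[str],
                  skip_deploy: str = "", cancelled: bool = False) -> bool:
    """Mirror the deploy job's `if:` condition, one clause per line."""
    return (
        not cancelled
        and enabled == "true"                      # synth reported the stack as enabled
        and "failure" not in needs_results         # a skipped test job does not block
        and event_name in ("push", "workflow_dispatch")
        and (event_name != "workflow_dispatch" or skip_deploy != "true")
    )


if __name__ == "__main__":
    print(resolve_environment(None, "refs/heads/main"))
    print(should_deploy("push", "true", ["success", "skipped"]))
```

Because a skipped job reports `skipped` rather than `failure`, running with `skip_tests` set still allows the deploy job to proceed under this condition.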
diff --git a/.github/workflows/nightly-deploy-pipeline.yml b/.github/workflows/nightly-deploy-pipeline.yml
index 422dbb05..fbaa1139 100644
--- a/.github/workflows/nightly-deploy-pipeline.yml
+++ b/.github/workflows/nightly-deploy-pipeline.yml
@@ -431,6 +431,7 @@ jobs:
       CDK_VPC_CIDR: ${{ vars.CDK_VPC_CIDR }}
       CDK_DOMAIN_NAME: ""
       CDK_HOSTED_ZONE_DOMAIN: ""
+      CDK_CERTIFICATE_ARN: ${{ vars.CDK_CERTIFICATE_ARN }}
       CDK_FRONTEND_CERTIFICATE_ARN: ""
       CDK_FRONTEND_BUCKET_NAME: ""
       CDK_FRONTEND_CLOUDFRONT_PRICE_CLASS: ""
diff --git a/.github/workflows/skip-auth-guard.yml b/.github/workflows/skip-auth-guard.yml
new file mode 100644
index 00000000..d3fd2b5b
--- /dev/null
+++ b/.github/workflows/skip-auth-guard.yml
@@ -0,0 +1,49 @@
+name: "Guard: SKIP_AUTH must not appear in deployed config"
+
+# Refuses any PR or push that lets the local-dev SKIP_AUTH=true bypass
+# leak into deployed configuration. The bypass itself is implemented in
+# backend/src/apis/shared/auth/dependencies.py and gated at boot in
+# backend/src/apis/app_api/main.py — both of those legitimately reference
+# the string and are excluded from the scan. Anywhere else is a leak.
+
+permissions:
+  contents: read
+
+on:
+  push:
+    branches: [main, develop]
+  pull_request:
+    branches: [main, develop]
+
+jobs:
+  scan:
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Scan for SKIP_AUTH=true in deployable config
+        run: |
+          set -euo pipefail
+          # Look in CDK, GitHub Actions, Dockerfiles, and any task definitions.
+          # The two files that legitimately reference the variable live under
+          # backend/src, which is not scanned; this workflow is filtered out below.
+          # Catches `SKIP_AUTH=true`, `SKIP_AUTH: true`, `SKIP_AUTH: "true"`, etc.
+          # Covers shell, Dockerfile, YAML, and CDK TypeScript object-literal styles.
+          PATTERN='SKIP_AUTH[[:space:]]*[:=][[:space:]]*["'\'']*true'
+          MATCHES=$(grep -RInE "$PATTERN" \
+            infrastructure/lib \
+            infrastructure/bin \
+            scripts \
+            .github/workflows \
+            backend/Dockerfile.app-api \
+            backend/Dockerfile.inference-api \
+            2>/dev/null \
+            | grep -v '^\.github/workflows/skip-auth-guard\.yml:' \
+            || true)
+          if [ -n "$MATCHES" ]; then
+            echo "::error::SKIP_AUTH=true is a local-dev bypass and must never appear in deployed config."
+            echo "Found in:"
+            echo "$MATCHES"
+            exit 1
+          fi
+          echo "OK — no SKIP_AUTH=true in deployable config."
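Note on the guard pattern above: a minimal sketch of which lines the grep expression reports, translated to Python's `re` module. The translation is an assumption (POSIX `[[:space:]]` is written as `\s`), and the sample lines are hypothetical rather than taken from the repository.

```python
import re

# Python translation of the workflow's ERE (assumption: [[:space:]] -> \s).
PATTERN = re.compile(r"""SKIP_AUTH\s*[:=]\s*["']*true""")

# Hypothetical sample lines mapped to whether the guard should flag them.
SAMPLES = {
    "ENV SKIP_AUTH=true": True,        # Dockerfile / shell assignment
    "SKIP_AUTH: true": True,           # YAML
    'SKIP_AUTH: "true"': True,         # YAML with a quoted value
    "SKIP_AUTH: 'true',": True,        # TypeScript object literal
    "SKIP_AUTH=false": False,          # explicit opt-out is not a leak
    '"SKIP_AUTH": "true"': False,      # quoted key: not matched by the pattern as written
}


def leaks(line: str) -> bool:
    """Return True when the line would be reported by the guard."""
    return PATTERN.search(line) is not None


if __name__ == "__main__":
    for line, expected in SAMPLES.items():
        assert leaks(line) == expected, line
    print("all samples behave as annotated")
```

The last sample shows one limit of the expression: a key wrapped in quotes (JSON style) places a quote character between `SKIP_AUTH` and the separator, so that form is not flagged.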
diff --git a/.gitignore b/.gitignore
index 58ed0a5a..a20a3e3e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -126,3 +126,11 @@ coverage/
 
 # Local dev scripts
 start.sh
+refresh-aws-sso.ps1
+start.ps1
+test-api-key.py
+.kiro/steering/github.md
+backend/start_app_api.ps1
+backend/start_inference_api.ps1
+scripts/stack-bootstrap/install.sh
+test-accounts
diff --git a/.kiro/specs/bff-middleware-event-loop-blocking/.config.kiro b/.kiro/specs/bff-middleware-event-loop-blocking/.config.kiro
new file mode 100644
index 00000000..d4b7813d
--- /dev/null
+++ b/.kiro/specs/bff-middleware-event-loop-blocking/.config.kiro
@@ -0,0 +1 @@
+{"specId": "075212d4-ee53-4e7a-bc6d-9d99dacb7cef", "workflowType": "requirements-first", "specType": "bugfix"}
diff --git a/.kiro/specs/bff-middleware-event-loop-blocking/bugfix.md b/.kiro/specs/bff-middleware-event-loop-blocking/bugfix.md
new file mode 100644
index 00000000..4e345403
--- /dev/null
+++ b/.kiro/specs/bff-middleware-event-loop-blocking/bugfix.md
@@ -0,0 +1,78 @@
+# Bugfix Requirements Document
+
+## Introduction
+
+Since the `v1.0.0-beta.24` BFF Token Handler release (commit `258193d`, deployed 2026-05-06), the app-api service has exhibited severe tail-latency and ingress stalls on page loads. Angular's refresh fan-out (~8 concurrent endpoints — `/auth/session`, `/models`, `/tools/`, `/files/quota`, `/users/me/permissions`, `/sessions`, `/assistants`, `/connectors/`) sees requests time out or exceed the ALB 60s idle cap. Observed signals over the last 24h on the affected fleet:
+
+- Two `HTTPCode_ELB_504_Count` events (13:37 and 14:40 UTC) — driven by ALB idle timeout, not target 5xx.
+- `TargetResponseTime` maximum of 15.6s at 15:25 UTC; `/files/quota` outliers reaching ~80s.
+- Endpoint p95s under load: `/models` ~141ms, `/tools/` ~289ms, `/users/me/permissions` ~239ms, `/sessions` ~188ms.
+- ECS task at 0.7% CPU / 23% memory. No DDB throttling (0 `ReadThrottleEvents` / `WriteThrottleEvents` across all 23 tables). Zero target 5xx.
+
+The new `SessionRefreshMiddleware` runs on every request carrying a `__Host-bff_session` cookie. Its hot-path calls block the single-worker uvicorn event loop on synchronous boto3 I/O (DynamoDB + Cognito), its cache TTL and its sliding-renewal throttle are aligned on the same 60s boundary, and the per-session lock that coalesces refreshes does not coalesce the broader session-resolve path. The result is ~16 serialized blocking AWS calls at the front of every page load per active user, once per minute — with no concurrency slack because the service runs one uvicorn worker in one ECS task.
+
+Impact: degraded UX for every logged-in user (spinners, stale data, failed tab refreshes), 504s visible to users, and a fragile service posture where any single slow AWS call stalls every in-flight request on the same task.
+
+## Bug Analysis
+
+### Current Behavior (Defect)
+
+What currently happens under the new middleware on cookie-bearing requests:
+
+1.1 WHEN `SessionRepository.get`, `touch_last_seen`, `update_tokens`, `put`, or `delete` is awaited inside a request handler THEN the uvicorn event loop blocks for the full DynamoDB round-trip because the methods are declared `async def` but call boto3 synchronously with no `asyncio.to_thread` offload and no aioboto3.
+
+1.2 WHEN `SessionRefreshMiddleware._resolve_session` invokes `CognitoRefreshClient.refresh` THEN the uvicorn event loop blocks for the full `cognito-idp:initiate_auth` round-trip because `CognitoRefreshClient.refresh` is a plain `def` called without threadpool offload, and it runs while the per-session `asyncio.Lock` from `get_session_lock()` is held.
+
+1.3 WHEN N concurrent requests for the same `session_id` arrive with no valid cached `SessionRecord` THEN the middleware issues N independent DynamoDB `get_item` calls because the existing per-session lock only coalesces the refresh exchange, not the upstream unseal → `SessionCache.get` → `SessionRepository.get` sequence.
+
+1.4 WHEN the `SessionCache` TTL elapses at the same moment the sliding-renewal throttle window elapses THEN a single request triggers both a DynamoDB `get_item` and a DynamoDB `update_item` on its critical path because `_DEFAULT_REFRESH_LEEWAY_SECONDS` and `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS` are both `60` in `sessions_bff/config.py`, so cache expiry and throttle expiry are aligned.
+
+1.5 WHEN a request passes `SessionRefreshMiddleware` and a slide is warranted THEN the caller's response waits for `touch_last_seen` to complete against DynamoDB because `_maybe_slide` `await`s the write inline on the request path, even though the code is documented to swallow failures ("Don't fail the request if the slide-write fails").
+
+1.6 WHEN the app-api container starts THEN the service has no concurrency slack because the `CMD` in `backend/Dockerfile.app-api` launches a single uvicorn worker with no `--workers` flag and the service runs as a single ECS task — one blocked event loop stalls all ingress, consistent with low CPU/memory while latency climbs.
+
+1.7 WHEN Angular fires its ~8-endpoint page-load fan-out with a session cookie and the per-session cache window has just elapsed THEN ~16 serialized blocking DynamoDB operations (per-coroutine `get_item` plus per-coroutine `update_item`) accumulate at the front of the page load because each coroutine independently observes cache-miss and past-throttle on its local `SessionRecord` copy and each runs its own blocking AWS I/O on the event loop thread.
+
+### Expected Behavior (Correct)
+
+What should happen instead, keeping the same middleware surface and contracts:
+
+2.1 WHEN `SessionRepository.get`, `touch_last_seen`, `update_tokens`, `put`, or `delete` is awaited inside a request handler THEN the method SHALL execute its underlying boto3 call in a threadpool (via `asyncio.to_thread` or an equivalent offload) so the uvicorn event loop continues scheduling other coroutines for the full DynamoDB round-trip.
+
+2.2 WHEN `SessionRefreshMiddleware._resolve_session` invokes `CognitoRefreshClient.refresh` THEN the Cognito `initiate_auth` call SHALL execute in a threadpool so the uvicorn event loop continues scheduling other coroutines — including coroutines for different `session_id`s — while the per-session `asyncio.Lock` is held.
+
+2.3 WHEN N concurrent requests for the same `session_id` arrive with no valid cached `SessionRecord` THEN the middleware SHALL coalesce them so at most one DynamoDB `get_item` is issued per `session_id` per cache window; the remaining N−1 requests SHALL await a shared in-process future keyed by `session_id` and consume its result.
+
+2.4 WHEN the `SessionCache` TTL elapses THEN a cache miss SHALL NOT imply a sliding-renewal DynamoDB write on the same request because `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS` SHALL be a strict multiple of `_DEFAULT_REFRESH_LEEWAY_SECONDS` (e.g. 300s versus 60s) so cache-expiry and throttle-expiry are de-aligned.
+
+2.5 WHEN a request passes `SessionRefreshMiddleware` and a slide is warranted THEN the caller's response SHALL NOT wait for `touch_last_seen` because `_maybe_slide` SHALL dispatch the DynamoDB write as a detached `asyncio.Task` (fire-and-forget) and SHALL return the computed `Max-Age` to the response path immediately.
+
+2.6 WHEN the app-api container starts THEN the service SHALL have concurrency slack such that a single blocked event loop does not stall all ingress — uvicorn SHALL run with `--workers >= 2` (at current `cpu=512`) and/or the ECS service SHALL run `>= 2` tasks; the chosen configuration SHALL be deployed.
+
+2.7 WHEN Angular fires its ~8-endpoint page-load fan-out with a session cookie and the per-session cache window has just elapsed THEN the middleware SHALL issue at most 1 DynamoDB `get_item` and at most 1 DynamoDB `update_item` per `session_id` per cache window across the fan-out (not ~16 blocking calls), and all 8 responses SHALL return without any one of them serially waiting on another's AWS I/O to complete.
+
+### Unchanged Behavior (Regression Prevention)
+
+Existing guarantees that MUST continue to hold after the fix:
+
+3.1 WHEN `BFFConfig.is_enabled()` returns `False` (env vars unset) THEN `SessionRefreshMiddleware` SHALL CONTINUE TO short-circuit as a pass-through with no AWS calls (dormant pass-through invariant preserved).
+
+3.2 WHEN a request arrives without a `__Host-bff_session` cookie THEN `SessionRefreshMiddleware` SHALL CONTINUE TO pass the request through without unsealing, cache lookup, or DynamoDB access.
+
+3.3 WHEN an unrecoverable cookie is detected (bad seal, missing DynamoDB row, expired TTL, or terminal `CognitoRefreshError`) THEN the middleware SHALL CONTINUE TO clear both `__Host-bff_session` and `__Host-bff_csrf` on the response.
+
+3.4 WHEN `_maybe_slide` returns a non-`None` `Max-Age` THEN the response's `Set-Cookie` headers for `__Host-bff_session` and `__Host-bff_csrf` SHALL CONTINUE TO use that exact value (the cookie re-emit contract between `_maybe_slide` and `_reemit_cookies` is preserved under fire-and-forget dispatch of the DynamoDB write).
+
+3.5 WHEN N concurrent requests for the same `session_id` cross the refresh-leeway window at the same moment THEN exactly one `cognito-idp:initiate_auth` exchange SHALL CONTINUE TO be issued per `session_id` per leeway window (the existing refresh-storm coalescing via `get_session_lock(session_id)` is preserved end-to-end).
+
+3.6 WHEN `CookieCodec._ensure_cipher` is called on a hot request THEN the AES-GCM cipher SHALL CONTINUE TO be served from the process-wide `get_default_codec()` singleton with no per-request `kms:GenerateDataKey` call.
+
+3.7 WHEN `resolve_bff_client_secret` is called on a hot request THEN the BFF Cognito app-client secret SHALL CONTINUE TO be served from the module-scope cache with no per-request `secretsmanager:GetSecretValue` call.
+
+3.8 WHEN `CSRFMiddleware` validates an unsafe-method request with `request.state.bff_session` set THEN it SHALL CONTINUE TO accept/reject using the existing in-memory HMAC double-submit check with no new I/O introduced on its path.
+
+3.9 WHEN the absolute-lifetime cap (`created_at + absolute_lifetime_seconds`) has passed THEN `_maybe_slide` SHALL CONTINUE TO return `None` so no further cookie re-emission or DynamoDB slide is issued.
+
+3.10 WHEN a refresh rotates the Cognito refresh token and the `update_tokens` persist fails THEN the middleware SHALL CONTINUE TO invalidate the cache entry and clear the cookie so the user is forced to re-authenticate (fail-closed rotation behavior preserved).
+
+3.11 WHEN the BFF cookie seal fails to decode THEN the middleware SHALL CONTINUE TO treat every decode failure identically (no timing or response-shape oracle introduced by the new offload or single-flight paths).
diff --git a/.kiro/specs/bff-middleware-event-loop-blocking/code-review-report.md b/.kiro/specs/bff-middleware-event-loop-blocking/code-review-report.md
new file mode 100644
index 00000000..f30360f5
--- /dev/null
+++ b/.kiro/specs/bff-middleware-event-loop-blocking/code-review-report.md
@@ -0,0 +1,245 @@
+# Code Review Report: BFF Middleware Event-Loop Blocking Bugfix
+
+**Branch**: `fix/bff-middleware-event-loop-blocking`
+**PR**: [#264](https://github.com/Boise-State-Development/agentcore-public-stack/pull/264)
+**Commits reviewed**:
+- `db3d2e06` — Initial fix (tasks 3.1–3.7)
+- `dd91d6fd` — Test polling adjustment
+- `78891e2e` — Strong-reference fix for fire-and-forget tasks
+
+This report reviews each technical decision in the bugfix against authoritative external sources (Python docs, AWS docs, canonical patterns from the Python ecosystem) to demonstrate the approach was sound. Where my initial implementation missed a production nuance, I flag it and cite the source that caught it.
+
+---
+
+## 1. Offloading sync boto3 to threads via `asyncio.to_thread`
+
+**Change**: `SessionRepository.{get,put,update_tokens,touch_last_seen,delete}` and `CognitoRefreshClient.refresh` now wrap their boto3 calls in `await asyncio.to_thread(...)`.
+
+**Why this is correct**:
+
+The official Python documentation for [`asyncio.to_thread()`](https://docs.python.org/3/library/asyncio-task.html#asyncio.to_thread) describes it as:
+
+> This coroutine function is primarily intended to be used for executing IO-bound functions/methods that would otherwise block the event loop if they were run in the main thread.
+
+The docs state explicitly that `asyncio.to_thread` is the idiomatic solution for IO-bound blocking work — which is exactly what boto3's synchronous HTTP calls to DynamoDB and Cognito are. They also note:
+
+> Due to the GIL, asyncio.to_thread() can typically only be used to make IO-bound functions non-blocking.
+
+boto3 is a well-known offender in this exact scenario. [Stack Overflow](https://stackoverflow.com/questions/72092993/i-want-to-use-boto3-in-async-function-python) recommends two options for using boto3 in async code: (a) use `aioboto3`/`aiobotocore`, or (b) wrap boto3 in `asyncio.to_thread`/`loop.run_in_executor`. Both are valid; `to_thread` is the lower-friction choice because it doesn't introduce a new async SDK with a different API surface.
+
+The existing codebase had a documented awareness of this gap — the `SessionRepository` docstring before the fix acknowledged that boto3 runs on the event loop thread. The fix simply closes that gap without reshaping the API.
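+
+A minimal sketch of the per-method offload, under stated assumptions: `FakeTable` is a hypothetical stand-in for a boto3 table (its `time.sleep` simulates the network round-trip), and `SketchSessionRepository` is a simplified illustration rather than the repository's actual code. It shows that other coroutines keep running while one `get` is in flight.
+
+```python
+import asyncio
+import time
+
+
+class FakeTable:
+    """Hypothetical stand-in for a boto3 DynamoDB Table."""
+
+    def get_item(self, Key):
+        time.sleep(0.05)  # blocking, like a synchronous boto3 HTTP call
+        return {"Item": {"session_id": Key["session_id"]}}
+
+
+class SketchSessionRepository:
+    """Simplified illustration only; not the real SessionRepository."""
+
+    def __init__(self, table):
+        self._table = table
+
+    async def get(self, session_id):
+        # The blocking call runs in a worker thread, so the event loop stays free.
+        response = await asyncio.to_thread(
+            self._table.get_item, Key={"session_id": session_id}
+        )
+        return response.get("Item")
+
+
+async def measure_loop_ticks():
+    """Count how often the loop runs other work while one get() is in flight."""
+    repo = SketchSessionRepository(FakeTable())
+    ticks = 0
+
+    async def ticker():
+        nonlocal ticks
+        while True:
+            await asyncio.sleep(0.005)
+            ticks += 1
+
+    ticker_task = asyncio.create_task(ticker())
+    item = await repo.get("abc")
+    ticker_task.cancel()
+    await asyncio.gather(ticker_task, return_exceptions=True)
+    return item, ticks
+```
+
+If `get` called `self._table.get_item(...)` directly instead, the ticker would not advance at all during the simulated round-trip, which is the starvation described in the bug analysis.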
+
+**Alternative considered (not taken)**: Replacing boto3 with [`aioboto3`](https://pypi.org/project/aioboto3/). Rejected because: (a) it adds a new dependency, and (b) it changes call sites across the repository (e.g. `await table.get_item(...)` on a resource opened with `async with`, vs a plain `table.get_item(...)`). The per-method offload, by contrast, is a surgical change with no ripple effect on callers. The spec explicitly called for "targeted, minimal-surface intervention that keeps the middleware's public contracts intact."
+
+**Verdict**: ✅ Correct approach, supported by official Python docs.
+
+---
+
+## 2. Per-session single-flight via `asyncio.Future`
+
+**Change**: New `backend/src/apis/shared/sessions_bff/single_flight.py` exports `async def resolve_once(session_id, loader_coro_factory)`. The first caller per `session_id` creates a Future, runs the loader, sets the result; concurrent callers await the same Future.
+
+**Why this is correct**:
+
+This is the canonical **request coalescing** / **single-flight** pattern. The Python ecosystem recognizes it as the standard solution for N-concurrent-callers-one-backend-hit. From [OneUptime's "How to Reduce DB Load with Request Coalescing in Python"](https://oneuptime.com/blog/post/2026-01-23-request-coalescing-python/view):
+
+> Request coalescing, also known as request deduplication or single-flighting, is a technique where concurrent requests for the same resource are merged into a single backend call.
+>
+> _(paraphrased for licensing compliance)_
+
+And from [SystemDesignSandbox](https://www.systemdesignsandbox.com/learn/hot-key-cache-stampede), "request coalescing" is listed as a textbook solution to fan-out amplification on hot keys / concurrent cache misses.
+
+The name comes from Go's `golang.org/x/sync/singleflight` package, which is the reference implementation of this pattern. Python's `asyncio.Future` is the natural primitive for it: multiple coroutines can `await` the same Future, and setting the result/exception wakes all of them.
+
+**Why a Future and not an `asyncio.Lock`**: The existing `get_session_lock(session_id)` in `lock.py` already serializes the Cognito refresh exchange. A lock would serialize the fan-out (N callers queue up and each issues its own DDB call in turn), but we want to **coalesce** it (N callers share the result of one call). A Future is the right primitive for coalescing. The design doc called this out:
+
+> The fix needs a different primitive — an `asyncio.Future` stored in a per-session slot that N waiters can await — because a lock would serialize N requests through one DDB call instead of consolidating them to one call.
+
+**Implementation notes**:
+- The registry is a plain `dict` guarded by a `threading.Lock` with double-checked locking — mirrors the pattern in `lock.py` which is already approved by the team.
+- Leader always removes the entry in a `try/except/finally` pattern so a failed loader doesn't sticky-cache.
+- Exceptions propagate to all waiters via `future.set_exception(exc)`; the leader additionally calls `future.exception()` to silence the "Future exception was never retrieved" warning if no follower attached.
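+
+The notes above can be sketched as follows. This is an illustrative approximation, not the contents of `single_flight.py`: the registry names are assumptions, the lookup uses a plain lock-guarded check rather than double-checked locking, and the `asyncio.shield` plus cancel-on-exit handling are additions made so the sketch behaves sensibly under cancellation.
+
+```python
+import asyncio
+import threading
+
+_inflight: dict[str, asyncio.Future] = {}  # hypothetical registry name
+_registry_lock = threading.Lock()
+
+
+async def resolve_once(session_id, loader_coro_factory):
+    """Run the loader once per session_id; concurrent callers share its outcome."""
+    loop = asyncio.get_running_loop()
+    with _registry_lock:
+        future = _inflight.get(session_id)
+        is_leader = future is None
+        if is_leader:
+            future = loop.create_future()
+            _inflight[session_id] = future
+
+    if not is_leader:
+        # Follower: shield so cancelling one waiter does not cancel the shared future.
+        return await asyncio.shield(future)
+
+    try:
+        result = await loader_coro_factory()
+    except Exception as exc:
+        future.set_exception(exc)
+        future.exception()  # mark as retrieved so no "never retrieved" warning is logged
+        raise
+    else:
+        future.set_result(result)
+        return result
+    finally:
+        if not future.done():
+            future.cancel()  # leader was cancelled; release any followers
+        with _registry_lock:
+            _inflight.pop(session_id, None)
+
+
+async def demo():
+    calls = 0
+
+    async def loader():
+        nonlocal calls
+        calls += 1
+        await asyncio.sleep(0.01)  # simulated DynamoDB get_item
+        return {"session_id": "abc"}
+
+    results = await asyncio.gather(*(resolve_once("abc", loader) for _ in range(8)))
+    return calls, results
+```
+
+In `demo`, eight concurrent callers for the same `session_id` produce a single loader invocation, and all eight receive the same record.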
+
+**Verdict**: ✅ Canonical pattern, implemented against Python's standard asyncio primitives.
+
+---
+
+## 3. De-aligning cache TTL and slide-throttle windows
+
+**Change**: `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS` raised from 60s to 300s while `_DEFAULT_REFRESH_LEEWAY_SECONDS` stays at 60s.
+
+**Why this is correct**:
+
+Aligned TTL boundaries are the textbook cause of **cache stampede / thundering herd**. Multiple sources document this:
+
+- [Redis (antirez) on cache stampedes](https://redis.antirez.com/fundamental/cache-stampede-prevention.html): a popular cache key expiring causes many concurrent requests to regenerate it, overwhelming the backend.
+- [Aman Maharshi, "Cache Stampede: Solving the Thundering Herd Problem"](https://www.amanmaharshi.com/blog/cache-stampede): "Synchronized Expiration" — caching N items at once with one TTL causes them all to expire at the same second, creating a spike.
+- [softwarepatternslexicon.com "Thundering Herds and Backend Pressure"](https://softwarepatternslexicon.com/caching-patterns-and-invalidation/consistency-and-stampede-control/thundering-herds-backend-pressure/): "A synchronized TTL boundary... can create a wave of misses that ripples into databases."
+
+Our case was a miniature version of this: with the `SessionCache` TTL and the slide-throttle window both at 60s, the two tended to elapse together, so a single request paid **both** a `get_item` AND an `update_item` on its critical path. Raising the throttle to 300s (5× the leeway) means a cache miss no longer implies a slide-write: the slide-throttle can elapse on at most one cache window in five, so the remaining cache misses pay only the `get_item`. Because 300 is a multiple of 60, the two boundaries can still coincide on that fifth window; the change reduces the overlap rather than eliminating it.
+
+**Why 300s and not some other value**: The design doc explicitly says "strict multiple of refresh leeway (e.g. 300s vs 60s)". 300s is 5× 60s. The key property is that `throttle % leeway == 0` AND `throttle > leeway` — the multiplier could be 2, 5, 10, etc. 5× was chosen because it matches industry practice of caching session metadata for minutes, not seconds.
+
+**Related patterns we didn't need but recognized**: TTL jitter (randomizing per-key expiry) is another standard mitigation. We don't need it because we only have one key class (sessions) and the single-flight already coalesces; jitter would add complexity without a clear benefit.
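+
+To make the window arithmetic concrete, here is a toy model (not the middleware's logic). It assumes one request per second for a single session, both timers starting at zero, and the slide check evaluated on every request. It counts how often one request pays both a cache reload and a slide-write.
+
+```python
+def count_boundary_hits(cache_ttl, slide_throttle, duration=600):
+    """Toy model: returns (cache_misses, slide_writes, requests_paying_both)."""
+    cache_loaded_at = 0
+    last_slide_at = 0
+    misses = slides = both = 0
+    for now in range(1, duration + 1):
+        miss = now - cache_loaded_at >= cache_ttl
+        slide = now - last_slide_at >= slide_throttle
+        if miss:
+            misses += 1
+            cache_loaded_at = now  # simulated get_item refills the cache
+        if slide:
+            slides += 1
+            last_slide_at = now  # simulated update_item resets the throttle
+        if miss and slide:
+            both += 1
+    return misses, slides, both
+
+
+aligned = count_boundary_hits(cache_ttl=60, slide_throttle=60)
+de_aligned = count_boundary_hits(cache_ttl=60, slide_throttle=300)
+```
+
+Over ten minutes in this model, `aligned` is `(10, 10, 10)` and `de_aligned` is `(10, 2, 2)`: the number of cache misses is unchanged, but only two of them also carry a slide-write.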
+
+**Verdict**: ✅ Direct application of a well-documented cache-stampede prevention technique.
+
+---
+
+## 4. Fire-and-forget slide-write via `asyncio.create_task`
+
+**Change**: `_maybe_slide` now dispatches `touch_last_seen` as a detached task rather than awaiting it inline.
+
+**Why the approach is correct**:
+
+The inline `await` was causing the response path to wait on a DDB round-trip for a write that was already documented to swallow failures — i.e. the response didn't actually need the write to complete. That's the textbook scenario for fire-and-forget.
+
+**What I got wrong initially**: I wrote `asyncio.create_task(self._slide_write_task(...))` without holding a reference to the returned Task. This is a **known dangerous anti-pattern**. The [Python docs for `asyncio.create_task`](https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task) contain this explicit warning:
+
+> **Important**
+>
+> Save a reference to the result of this function, to avoid a task disappearing mid-execution. The event loop only keeps weak references to tasks. A task that isn't referenced elsewhere may get garbage collected at any time, even before it's done.
+>
+> For reliable "fire-and-forget" background tasks, gather them in a collection:
+>
+> ```python
+> background_tasks = set()
+> for i in range(10):
+>     task = asyncio.create_task(some_coro(param=i))
+>     background_tasks.add(task)
+>     task.add_done_callback(background_tasks.discard)
+> ```
+
+The fix in commit `78891e2e` applies this exact pattern: `self._slide_tasks: set[asyncio.Task]` on the middleware instance, with `task.add_done_callback(self._slide_tasks.discard)` to prevent the set from leaking.
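+
+A minimal sketch of that shape, assuming a hypothetical `SlideDispatcher` holder in place of the middleware and a stand-in `touch_last_seen` coroutine. Only the `_slide_tasks` set and the `discard` callback come from the description above; the error-swallowing wrapper is an illustration of the "don't fail the request" contract.
+
+```python
+import asyncio
+import logging
+
+logger = logging.getLogger(__name__)
+
+
+class SlideDispatcher:
+    """Hypothetical holder illustrating the strong-reference pattern."""
+
+    def __init__(self):
+        self._slide_tasks: set[asyncio.Task] = set()
+
+    def dispatch(self, write_coro):
+        task = asyncio.create_task(self._swallow_errors(write_coro))
+        self._slide_tasks.add(task)  # strong reference keeps the task alive
+        task.add_done_callback(self._slide_tasks.discard)  # drop it once finished
+        return task
+
+    async def _swallow_errors(self, write_coro):
+        try:
+            await write_coro
+        except Exception:
+            logger.warning("slide-write failed; request is unaffected", exc_info=True)
+
+
+async def demo():
+    dispatcher = SlideDispatcher()
+    written = []
+
+    async def touch_last_seen(session_id):
+        await asyncio.sleep(0.01)  # simulated DynamoDB update_item
+        written.append(session_id)
+
+    dispatcher.dispatch(touch_last_seen("abc"))
+    pending_after_dispatch = len(dispatcher._slide_tasks)
+    await asyncio.sleep(0.05)  # the "response" continues without awaiting the write
+    return pending_after_dispatch, written, len(dispatcher._slide_tasks)
+```
+
+`dispatch` returns immediately, the write completes in the background, and the set is empty again once the task finishes.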
+
+**Multiple external sources reinforce this**:
+- [SuperFastPython, "Asyncio Disappearing Task Bug"](http://superfastpython.com/asyncio-disappearing-task-bug/): "Save a reference to the result of this function, to avoid a task disappearing mid-execution. The event loop only keeps weak references to tasks."
+- [Michael Kennedy, "Fire and forget (or never) with Python's asyncio"](https://mkennedy.codes/posts/fire-and-forget-or-never-with-python-s-asyncio/): "create_task() can silently garbage collect your fire-and-forget tasks starting in Python 3.12 — they may never run. The fix: store task references in a set and register a done_callback to clean them up."
+- [Ruff's `RUF006` lint rule ("asyncio-dangling-task")](https://docs.astral.sh/ruff/rules/asyncio-dangling-task/) flags exactly this anti-pattern automatically.
+- [Runebook, "Replacing Low-Level Task Registration"](http://runebook.dev/en/docs/python/library/asyncio-extending/asyncio._register_task): describes the weak-reference behavior and the risk of collection mid-execution.
+
+**Why the bug surfaced only on CI**: The most likely explanation is a timing difference between interpreter versions rather than anything specific to the test. On my local Python 3.13 (different GC tuning, different scheduler timing), the task usually completed before GC ran. On CI's Python 3.12 runners, the GC occasionally collected the task first, causing a missing `update_item`. Hypothesis reported it as `FlakyFailure` (failed once, passed on retry), which is the signature of exactly this kind of race.
+
+**Verdict**: ✅ Fire-and-forget is the right approach; ❌ my initial implementation had a canonical asyncio bug; ✅ the fix matches the Python docs' recommended pattern verbatim.
+
+---
+
+## 5. ECS `desiredCount` raised from 1 to 2
+
+**Change**: `infrastructure/cdk.context.json` `appApi.desiredCount: 1 → 2` in the production context.
+
+**Why this is correct**:
+
+The issue was a single point of failure at the deployment layer: one ECS task running one uvicorn worker means any slow AWS call on that task's event loop halts every in-flight request. AWS's own [ECS availability best practices](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/) document explicitly recommends multi-task deployments for availability.
+
+Independently of the event-loop issue, single-task services fail basic availability requirements: if the one task crashes, restarts, or becomes unreachable, the service has zero capacity until a replacement boots, which for Fargate is tens of seconds to minutes. With two tasks, rolling restarts always keep one healthy instance serving traffic.
+
+This change is belt-and-suspenders: even if the event-loop-blocking fix is 100% correct, running `desiredCount: 1` would still be a latent availability liability. Raising to 2 gives us:
+1. Concurrency slack so a single stuck loop can't halt all ingress (primary rationale).
+2. Rolling deploy safety (automatic secondary benefit).
+3. Resilience to a single task's AZ failure (automatic tertiary benefit).
+
+`maxCapacity` stays at 10 so auto-scaling can still burst upward under load.
+
+**Verdict**: ✅ Standard AWS multi-task posture, with a specific and documented trigger in the bug analysis.
+
+---
+
+## 6. Lock scope preservation (existing `get_session_lock`)
+
+**Change**: None — the `async with get_session_lock(session_id)` scope around the Cognito refresh exchange is deliberately preserved exactly as it was.
+
+**Why this is correct**:
+
+The existing lock exists for a specific purpose: the Cognito refresh-token rotation flow invalidates the previous refresh token as soon as a new one is issued. If N concurrent requests all call `initiate_auth` with the same refresh token, only the first succeeds; the rest receive the token-rotated-out error and have to be failed or retried. Serializing the exchange with a per-session lock prevents this race.
+
+The new single-flight primitive sits **upstream** of this lock — it coalesces the resolve path (cache, repo.get, needs_refresh decision) so typically only the leader ever reaches the Cognito refresh at all. But in the edge case where the leader decides refresh is NOT needed but a follower does (race with TTL expiry), the existing lock is still needed as a defense-in-depth. The design doc was explicit about not moving or widening the lock.
+
+The preservation test `test_3_5_refresh_storm_coalesces_to_single_initiate_auth` verifies that exactly one `cognito-idp:initiate_auth` fires per 10 concurrent same-session requests — which is the original contract, preserved end-to-end.
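+
+The refresh-storm contract can be illustrated with a small sketch. It is not the code in `lock.py` or the middleware: the lock registry, the dict-shaped session, and in particular the re-check after acquiring the lock are assumptions made for illustration.
+
+```python
+import asyncio
+import threading
+
+_locks: dict[str, asyncio.Lock] = {}  # hypothetical registry
+_locks_guard = threading.Lock()
+
+
+def get_session_lock(session_id):
+    """Return the one asyncio.Lock associated with this session_id."""
+    with _locks_guard:
+        lock = _locks.get(session_id)
+        if lock is None:
+            lock = _locks[session_id] = asyncio.Lock()
+        return lock
+
+
+async def refresh_if_needed(session, exchange):
+    """session: dict with 'session_id' and 'needs_refresh'; exchange: async callable."""
+    if not session["needs_refresh"]:
+        return
+    async with get_session_lock(session["session_id"]):
+        # Re-check under the lock: an earlier holder may already have refreshed.
+        if session["needs_refresh"]:
+            await exchange(session)
+            session["needs_refresh"] = False
+
+
+async def demo():
+    session = {"session_id": "abc", "needs_refresh": True}
+    exchanges = 0
+
+    async def exchange(_session):
+        nonlocal exchanges
+        exchanges += 1
+        await asyncio.sleep(0.01)  # simulated initiate_auth round-trip
+
+    await asyncio.gather(*(refresh_if_needed(session, exchange) for _ in range(10)))
+    return exchanges
+```
+
+Ten concurrent callers all observe that a refresh is needed, but only the first lock holder performs the exchange; the other nine find the work already done when they acquire the lock.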
+
+**Verdict**: ✅ Correctly preserved. The contract the existing lock was enforcing continues to hold.
+
+---
+
+## 7. Testing approach
+
+**Property-Based Tests over scenario-based tests**: Used `hypothesis` for:
+- Sub-conditions that generalize over a domain (fan-out size, request shapes across the non-buggy input domain).
+- Preservation properties that must hold "for all" inputs meeting certain criteria.
+
+This is the approach the project's Kiro spec workflow calls for (Property-Based Testing Integration section). Property-based testing for preservation invariants is particularly strong because it catches edge cases in the fix (single-flight exception paths, background task races, Set-Cookie attribute sets) that scenario tests would miss.
+
+**Bug Condition exploration test FAILS on unfixed code, PASSES on fixed code**: This is the core methodology of the bugfix workflow — the test serves as the executable specification. 10 of 12 sub-conditions failed on unfixed code (proving the bug); all 12 pass after the fix.
+
+**What the tests caught that scenario tests would have missed**:
+- Hypothesis's `FlakyFailure` detection caught the `asyncio.create_task` GC race on CI — a scenario test at a fixed seed likely wouldn't have reproduced it at all.
+
+**Verdict**: ✅ Correct methodology; the tests caught a real bug I introduced.
+
+---
+
+## 8. What I did well
+
+1. **Read before writing**: traced the full middleware path, repository, lock, and config before proposing changes.
+2. **Preservation-first**: wrote the preservation test suite on unfixed code before implementing any fix, so regressions surface immediately.
+3. **Separate primitive for separate concern**: new `single_flight.py` module instead of overloading `lock.py` — keeps each primitive's contract clear.
+4. **Minimal-surface interventions**: no new async SDK, no public API changes, no lock-scope shift.
+
+## 9. What I got wrong (and corrected)
+
+1. **Missed the `asyncio.create_task` strong-reference requirement** on the first pass. The Python docs warn about this in bold, Ruff has a lint rule for it, and multiple blog posts cover it. This is directly traceable to me not running the full CI script locally before pushing — my local Python 3.13 GC didn't hit the race.
+2. **Initial CI fix was a band-aid** (polling on the test side) rather than a root-cause fix (strong reference in the middleware). The polling remains as defensive depth but the real fix is the set-based reference in commit `78891e2e`.
+
+## 10. Root cause summary
+
+The fix addresses five independent but correlated defects (four in `SessionRefreshMiddleware` and one at the deployment layer), each with a canonical industry solution:
+
+| Defect | Canonical fix | Authority |
+|---|---|---|
+| Sync boto3 blocks event loop | `asyncio.to_thread` | [Python docs](https://docs.python.org/3/library/asyncio-task.html#asyncio.to_thread) |
+| N concurrent same-session → N DDB calls | Single-flight / request coalescing via `asyncio.Future` | [OneUptime](https://oneuptime.com/blog/post/2026-01-23-request-coalescing-python/view), Go's `singleflight` |
+| Aligned TTL = cache stampede | De-align boundaries (strict multiple) | [Redis on cache stampedes](https://redis.antirez.com/fundamental/cache-stampede-prevention.html), [softwarepatternslexicon.com](https://softwarepatternslexicon.com/caching-patterns-and-invalidation/consistency-and-stampede-control/thundering-herds-backend-pressure/) |
+| Response waits on non-critical DDB write | Fire-and-forget task with strong reference | [Python docs on `asyncio.create_task`](https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task) |
+| Single ECS task = no concurrency slack | `desiredCount >= 2` | [AWS ECS availability best practices](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/) |
+
+Each fix is directly traceable to a published authority. The overall shape — coalesce upstream, offload sync I/O to threads, dispatch non-critical writes asynchronously, stagger TTLs, add replica slack — is the standard stack of techniques for keeping an ASGI service's event loop free under concurrent load.
+
+## 11. Verification status
+
+- **Local**: `scripts/stack-app-api/test.sh` and `scripts/stack-inference-api/test.sh` both pass with 2459 tests inside the `agentcore-dev` container.
+- **Bug condition exploration suite**: 12/12 pass on fixed code (0/12 passed before fix).
+- **Preservation suite**: 19/19 pass on both unfixed and fixed code (baseline intact).
+- **Single-flight primitive unit tests**: 6/6 pass.
+- **CDK unit tests**: 25/25 pass for `app-api-stack` including new production-context `DesiredCount: 2` assertion.
+- **CI PR #264**: pushed commit `78891e2e` with the strong-reference fix; awaiting CI verification.
+
+---
+
+## Sources consulted
+
+Primary:
+- [Python 3 docs: `asyncio.to_thread`](https://docs.python.org/3/library/asyncio-task.html#asyncio.to_thread)
+- [Python 3 docs: `asyncio.create_task` (Important: Save a reference...)](https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task)
+
+Supporting (asyncio task lifecycle):
+- [SuperFastPython: Asyncio Disappearing Task Bug](http://superfastpython.com/asyncio-disappearing-task-bug/)
+- [Michael Kennedy: Fire and forget (or never) with Python's asyncio](https://mkennedy.codes/posts/fire-and-forget-or-never-with-python-s-asyncio/)
+- [Ruff RUF006: asyncio-dangling-task](https://docs.astral.sh/ruff/rules/asyncio-dangling-task/)
+
+Supporting (boto3 + async):
+- [Stack Overflow: I want to use boto3 in async function, python](https://stackoverflow.com/questions/72092993/i-want-to-use-boto3-in-async-function-python)
+- [aioboto3 on PyPI](https://pypi.org/project/aioboto3/) — considered and rejected as too invasive
+
+Supporting (cache stampede / thundering herd):
+- [Redis on cache stampede prevention](https://redis.antirez.com/fundamental/cache-stampede-prevention.html)
+- [softwarepatternslexicon.com: Thundering Herds and Backend Pressure](https://softwarepatternslexicon.com/caching-patterns-and-invalidation/consistency-and-stampede-control/thundering-herds-backend-pressure/)
+- [Aman Maharshi: Cache Stampede: Solving the Thundering Herd Problem](https://www.amanmaharshi.com/blog/cache-stampede)
+
+Supporting (request coalescing):
+- [OneUptime: How to Reduce DB Load with Request Coalescing in Python](https://oneuptime.com/blog/post/2026-01-23-request-coalescing-python/view)
+- [SystemDesignSandbox: Hot Keys and Cache Stampedes](https://www.systemdesignsandbox.com/learn/hot-key-cache-stampede)
+
+Supporting (ECS availability):
+- [AWS ECS availability best practices](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/)
+
+Content was paraphrased for compliance with licensing restrictions; verbatim quotes are limited to short excerpts attributed inline.
diff --git a/.kiro/specs/bff-middleware-event-loop-blocking/design.md b/.kiro/specs/bff-middleware-event-loop-blocking/design.md
new file mode 100644
index 00000000..b5e99a44
--- /dev/null
+++ b/.kiro/specs/bff-middleware-event-loop-blocking/design.md
@@ -0,0 +1,380 @@
+# BFF Middleware Event Loop Blocking Bugfix Design
+
+## Overview
+
+The `SessionRefreshMiddleware` runs on every cookie-bearing request and, as of `v1.0.0-beta.24`, executes four independent classes of blocking/serialized work on the uvicorn event loop:
+
+1. **Sync boto3 I/O on the event loop thread** — `SessionRepository.*` and `CognitoRefreshClient.refresh` are declared `async def` but call boto3 synchronously. Every DynamoDB `get_item`/`update_item` and every Cognito `initiate_auth` freezes the whole event loop for its round-trip duration.
+2. **Missing fan-out coalescing** — the per-session `asyncio.Lock` wraps only the refresh exchange. The upstream `unseal → cache → get_item → maybe_slide` path is not coalesced, so Angular's ~8-endpoint page-load fan-out produces ~16 serialized blocking DDB calls per cache window.
+3. **Aligned cache TTL / throttle window** — `_DEFAULT_REFRESH_LEEWAY_SECONDS` and `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS` both default to 60s. Cache expiry and slide-throttle expiry land on the same boundary, so a single request crossing that boundary incurs both a `get_item` and an `update_item` on its critical path.
+4. **Inline awaited slide-write** — `_maybe_slide` awaits `touch_last_seen` on the request path even though the call is already written defensively (failures are swallowed). The caller's response waits on DDB.
+
+All of this runs inside a **single uvicorn worker on a single ECS task** (no `--workers` flag in `backend/Dockerfile.app-api`, `desiredCount: 1` in CDK), so any one blocked round-trip stalls every other in-flight request.
+
+The fix is a targeted, minimal-surface intervention that keeps the middleware's public contracts intact:
+
+- Offload every synchronous boto3 call in `SessionRepository` and `CognitoRefreshClient.refresh` via `asyncio.to_thread`.
+- Introduce a per-session `asyncio.Future`-based single-flight in front of the `get_item → needs_refresh → maybe-refresh` path so N concurrent requests for the same `session_id` share one lookup result.
+- De-align `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS` from the cache/leeway window (raise to 300s) so cache-miss does not imply slide-write.
+- Dispatch `_maybe_slide`'s `touch_last_seen` as a detached `asyncio.Task` and return the `Max-Age` synchronously.
+- Add concurrency slack at the deployment layer (raise `CDK_APP_API_DESIRED_COUNT` to ≥ 2 for production config, keeping 1 valid for dev) so a single stuck event loop can no longer halt all ingress.
+
+## Glossary
+
+- **Bug_Condition (C)**: The condition that triggers the bug — a cookie-bearing request reaches `SessionRefreshMiddleware` while the middleware is active (`BFFConfig.is_enabled()` is True), under any of the sub-conditions 1.1–1.7 in `bugfix.md#Current Behavior`.
+- **Property (P)**: The desired behavior when the bug condition holds — AWS I/O never freezes the uvicorn event loop, fan-outs share a single coalesced lookup, and slide-writes never block the response path.
+- **Preservation**: Existing contracts that must remain unchanged — dormant pass-through (`is_enabled() == False`), no-cookie pass-through, unrecoverable-cookie clearing, refresh-storm coalescing, Max-Age re-emit contract, CSRF unchanged, absolute-lifetime cap, fail-closed rotation, uniform cookie decode failure.
+- **SessionRefreshMiddleware**: The middleware in `backend/src/apis/shared/middleware/session_refresh.py` that unseals the BFF cookie, resolves the `SessionRecord`, optionally refreshes Cognito tokens, and slides the session's DDB TTL.
+- **SessionRepository**: The repository in `backend/src/apis/shared/sessions_bff/repository.py` that wraps boto3 DynamoDB calls with `async def` signatures. Today the methods call boto3 synchronously on the event loop thread.
+- **CognitoRefreshClient**: The class in `backend/src/apis/shared/sessions_bff/refresh.py` whose `refresh()` method is plain `def` and calls `cognito-idp:initiate_auth` synchronously.
+- **SessionCache**: The process-wide `TTLCache` in `backend/src/apis/shared/sessions_bff/cache.py` whose TTL defaults to `refresh_leeway_seconds` (60s).
+- **`_DEFAULT_REFRESH_LEEWAY_SECONDS`**: 60s constant in `config.py` — both the refresh pre-expiry window and the SessionCache TTL.
+- **`_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS`**: 60s constant in `config.py` — the minimum interval between DDB `touch_last_seen` writes for a single session. Currently aligned with leeway, will be de-aligned to 300s.
+- **per-session `asyncio.Lock`**: The lock from `get_session_lock(session_id)` in `sessions_bff/lock.py`. Today it wraps only the Cognito refresh exchange; the fix does NOT move its scope — a separate single-flight `Future` is added upstream.
+- **Single-flight Future**: New per-session `asyncio.Future` added for this fix that coalesces the upstream `get_item → needs_refresh → refresh?` resolution across concurrent callers within one task.
+
+## Bug Details
+
+### Bug Condition
+
+The bug manifests when a request reaches `SessionRefreshMiddleware.dispatch` with `BFFConfig.is_enabled() == True` AND a `__Host-bff_session` cookie present. Under this condition the middleware's resolve/slide path performs at least one event-loop-blocking AWS call and, under fan-out, up to 2×N blocking calls for N concurrent same-session requests. The observable symptoms (504s, 80s `/files/quota` tails, 15.6s p-max at 0.7% CPU) follow directly.
+
+**Formal Specification:**
+
+```
+FUNCTION isBugCondition(input)
+  INPUT: input of type HTTPRequest
+  OUTPUT: boolean
+
+  # Middleware-level precondition — everything else is scoped inside this.
+  IF NOT BFFConfig.from_env().is_enabled() THEN
+    RETURN false
+  END IF
+  IF input.cookies["__Host-bff_session"] IS NULL THEN
+    RETURN false
+  END IF
+
+  # Sub-condition 1.1: sync boto3 in SessionRepository blocks the loop.
+  blocks_on_repo := (
+    awaitedIn(input, SessionRepository.get)
+      OR awaitedIn(input, SessionRepository.touch_last_seen)
+      OR awaitedIn(input, SessionRepository.update_tokens)
+      OR awaitedIn(input, SessionRepository.put)
+      OR awaitedIn(input, SessionRepository.delete)
+  )
+    AND NOT executesInThreadpool(boto3_call_of_that_method)
+
+  # Sub-condition 1.2: sync boto3 in CognitoRefreshClient blocks the loop,
+  # AND it runs while get_session_lock(session_id) is held.
+  blocks_on_cognito := (
+    invokedIn(input, CognitoRefreshClient.refresh)
+      AND NOT executesInThreadpool(initiate_auth_call)
+      AND sessionLockHeldDuring(initiate_auth_call)
+  )
+
+  # Sub-condition 1.3: N concurrent same-session requests are not coalesced
+  # across the session-resolve path.
+  missing_resolve_coalescing := (
+    concurrentRequestsForSameSession(input.session_id) > 1
+      AND countOf(SessionRepository.get calls for input.session_id in this window)
+          = concurrentRequestsForSameSession(input.session_id)
+  )
+
+  # Sub-condition 1.4: cache-miss boundary aligns with throttle boundary.
+  aligned_windows := (
+    BFFConfig._DEFAULT_REFRESH_LEEWAY_SECONDS
+      == BFFConfig._DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS
+  )
+
+  # Sub-condition 1.5: response waits on inline-awaited touch_last_seen.
+  inline_slide := (
+    slideWarrantedFor(input)
+      AND responseWaitsFor(touch_last_seen_call_of_this_request)
+  )
+
+  # Sub-condition 1.6: no concurrency slack at the deployment boundary.
+  no_slack := (
+    uvicornWorkerCount() == 1
+      AND ecsDesiredCount() == 1
+  )
+
+  # Sub-condition 1.7: page-load fan-out amplifies 1.1 + 1.3 + 1.4.
+  amplified_fanout := (
+    concurrentRequestsForSameSession(input.session_id) >= 8
+      AND cacheWindowJustElapsedFor(input.session_id)
+      AND countOf(DDB calls on critical path during this window)
+          >= 2 * concurrentRequestsForSameSession(input.session_id)
+  )
+
+  RETURN blocks_on_repo
+    OR blocks_on_cognito
+    OR missing_resolve_coalescing
+    OR aligned_windows
+    OR inline_slide
+    OR no_slack
+    OR amplified_fanout
+END FUNCTION
+```
+
+### Examples
+
+- **1.1 blocking repo call**: Any request that hits `request.state.bff_session = record` → `_maybe_slide` → `touch_last_seen`. Expected: the DDB round-trip runs off the event loop thread; other coroutines continue to be scheduled. Actual: the event loop is frozen for the full round-trip.
+- **1.2 blocking Cognito call**: Two tabs refresh concurrently at minute 59 of the access token's lifetime. Expected: the Cognito `initiate_auth` for session A runs off the loop thread; unrelated requests (different cookies, Bearer-token requests, health checks) proceed. Actual: the loop is frozen for the full Cognito round-trip AND the per-session lock is held during that freeze.
+- **1.3 missing resolve coalescing**: Angular fan-out of 8 same-session requests with no cached `SessionRecord`. Expected: 1 DDB `get_item`. Actual: 8 DDB `get_item` calls, each blocking.
+- **1.4 aligned windows**: A request at T when `T - last_seen_at == 60s` AND `SessionCache` entry for this session has just TTL-evicted at T. Expected: at most 1 of `{get_item, update_item}`. Actual: both, serialized.
+- **1.5 inline slide**: Request with `_maybe_slide` returning non-None. Expected: the response Set-Cookie lands immediately; the DDB write happens in the background. Actual: the response waits for DDB.
+- **1.7 page-load fan-out**: Angular page load fires 8 endpoints at once right after a cache window elapses. Expected: ≤1 `get_item` + ≤1 `update_item` across the 8 requests. Actual: up to 16 serialized blocking calls at the front of the page load.
+- **Edge case — `is_enabled() == False`**: The middleware must short-circuit before any of the above sub-conditions can manifest. No AWS calls, no locks, no futures.
+
+## Expected Behavior
+
+### Preservation Requirements
+
+**Unchanged Behaviors:**
+
+- **3.1 Dormant pass-through**: `BFFConfig.is_enabled() == False` → `dispatch` short-circuits to `call_next(request)` with no AWS calls, no cache lookup, no single-flight registration.
+- **3.2 No-cookie pass-through**: No `__Host-bff_session` cookie → same short-circuit as 3.1.
+- **3.3 Unrecoverable cookie → clear both cookies**: Bad seal, missing DDB row, expired TTL, or terminal `CognitoRefreshError` → `_clear_cookies(response)` clears both `__Host-bff_session` AND `__Host-bff_csrf` with the same attribute set as today.
+- **3.4 Max-Age re-emit contract**: When `_maybe_slide` returns a non-None `Max-Age`, the `Set-Cookie` headers for both BFF cookies use that exact value and the exact attribute set in `_reemit_cookies` today. Fire-and-forget dispatch of the DDB write does not change this contract.
+- **3.5 Refresh-storm coalescing (existing)**: For N concurrent same-session requests crossing the refresh-leeway boundary, exactly one `cognito-idp:initiate_auth` is issued per `session_id` per leeway window. The existing `get_session_lock(session_id)` scope around the Cognito exchange is preserved end-to-end.
+- **3.6 Codec singleton**: `get_default_codec()` is the same process-wide instance used by the auth/callback seal path and the middleware unseal path. No per-request `kms:GenerateDataKey` is introduced.
+- **3.7 Client-secret cache**: `resolve_bff_client_secret` continues to serve from the module-scope cache. No per-request `secretsmanager:GetSecretValue`.
+- **3.8 CSRF middleware path**: `CSRFMiddleware` continues to validate unsafe-method requests using the existing in-memory HMAC double-submit check against `request.state.bff_csrf_token`. No new I/O is introduced on that path.
+- **3.9 Absolute-lifetime cap**: `_maybe_slide` returns `None` once `created_at + absolute_lifetime_seconds` has passed. No further cookie re-emit or DDB slide.
+- **3.10 Fail-closed rotation**: When Cognito rotates the refresh token and `_persist_refresh` exhausts its retries, the middleware invalidates the cache and clears the cookie.
+- **3.11 Uniform cookie decode failure**: Every `CookieDecodeError` branch produces the same response shape and timing signature. No new oracle is introduced by the offload or single-flight paths.
+
+**Scope:**
+
+All inputs that do NOT involve the BFF middleware path should be completely unaffected by this fix. This includes:
+
+- Bearer-token requests (no `__Host-bff_session` cookie) — untouched.
+- Anonymous endpoints (health, static assets) — untouched.
+- WebSocket voice routes — they replicate the cookie unseal + DDB lookup outside the middleware (see `voice/routes.py`); this fix does not change their path.
+- The auth/callback token-exchange route — it uses the same `CookieCodec` singleton to seal cookies; the singleton is not disturbed.
+- The logout route — its cache `invalidate(session_id)` call is preserved.
+
+## Hypothesized Root Cause
+
+Based on the bug description and code inspection, the root causes are concurrent and independent — each sub-condition has its own root cause, and the fix addresses all of them:
+
+1. **Sync boto3 in `async def` methods (1.1, 1.2)**: The `SessionRepository` docstring explicitly acknowledges this ("The methods are declared `async` to match the rest of `apis.shared`, but boto3 is sync — calls run on the event loop thread"). The original reasoning was that refresh-storm coalescing via `get_session_lock()` would hold fan-out low enough to make thread-pool offload unnecessary. That reasoning is wrong for two reasons: (a) the lock only covers the Cognito exchange, not the DDB path — so fan-out is not coalesced at all for cache misses; and (b) even a single blocking call is enough to freeze the event loop for the round-trip duration, which is directly observable in `TargetResponseTime` p-max.
+
+2. **Wrong lock scope (1.3, 1.7)**: `get_session_lock(session_id)` is acquired inside `_resolve_session` only after the `_cache.get → _repository.get → needs_refresh` decision has been made. An `asyncio.Lock` held this narrowly cannot coalesce anything upstream of itself. The fix needs a different primitive, an `asyncio.Future` stored in a per-session slot that N waiters can await, because a lock by itself would only serialize the N requests, each still issuing its own DDB call, instead of consolidating them into one call.
+
+3. **Aligned windows by default (1.4)**: Both constants default to 60s in `config.py`. A strict-multiple relationship (e.g. throttle = 5 × leeway) de-aligns the boundaries. This is a config fix with no code change needed in the middleware.
+
+4. **`await` on `touch_last_seen` by pattern (1.5)**: `_maybe_slide` awaits the write because that matches the rest of the codebase's DB access shape. The surrounding `try/except` already swallows failures (documented as "Don't fail the request if the slide-write fails"), which is exactly the pre-condition that makes fire-and-forget safe.
+
+5. **Single-worker container (1.6)**: The Dockerfile CMD ships one uvicorn worker and `desiredCount: 1` in CDK ships one task. This was fine for the Bearer-token era; under the BFF middleware, it means any one blocked round-trip halts every other in-flight request. Concurrency slack is a separate lever from event-loop non-blocking — both are required, neither is sufficient alone.
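+
+To make root cause 2 concrete, here is a minimal single-flight sketch, assuming every caller runs on one event loop. `resolve_once` follows the signature proposed under Fix Implementation; the `_inflight` registry, the `demo` harness, and the use of `asyncio.shield` for followers are illustrative assumptions, not the contents of the planned `single_flight.py`. The sketch also omits the thread lock that the proposal mirrors from `lock.py`.
+
+```python
+import asyncio
+from typing import Awaitable, Callable, Dict, TypeVar
+
+T = TypeVar("T")
+
+# Hypothetical registry: one in-flight Future per session_id.
+_inflight: Dict[str, "asyncio.Future"] = {}
+
+
+async def resolve_once(session_id: str, loader: Callable[[], Awaitable[T]]) -> T:
+    existing = _inflight.get(session_id)
+    if existing is not None:
+        # Follower: share the leader's outcome. shield() keeps a cancelled
+        # follower from cancelling the Future that other waiters depend on.
+        return await asyncio.shield(existing)
+
+    # Leader: register the Future before the first await so that followers
+    # arriving during the load find it.
+    future = asyncio.get_running_loop().create_future()
+    _inflight[session_id] = future
+    try:
+        result = await loader()
+        future.set_result(result)
+        return result
+    except Exception as exc:
+        future.set_exception(exc)
+        future.exception()  # mark as retrieved so asyncio does not warn
+        raise
+    finally:
+        if not future.done():
+            future.cancel()  # leader was cancelled: wake any followers
+        _inflight.pop(session_id, None)
+
+
+async def demo() -> int:
+    calls = 0
+
+    async def loader() -> str:
+        nonlocal calls
+        calls += 1
+        await asyncio.sleep(0.01)  # stands in for one DynamoDB get_item
+        return "record"
+
+    results = await asyncio.gather(*(resolve_once("s1", loader) for _ in range(8)))
+    assert results == ["record"] * 8
+    return calls
+
+
+if __name__ == "__main__":
+    print(asyncio.run(demo()))
+```
+
+With a per-session lock in the same position, the eight callers would each run `loader` in turn unless every one of them re-checked a cache after acquiring the lock. The Future makes the sharing explicit instead.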
+
+## Correctness Properties
+
+Property 1: Bug Condition — Event-Loop Non-Blocking, Coalesced, Window-Staggered, Fire-and-Forget BFF Middleware
+
+_For any_ request where the bug condition holds (`isBugCondition` returns true), the fixed middleware and its collaborators SHALL (a) execute every boto3 DynamoDB and Cognito call off the event loop thread (via `asyncio.to_thread` or equivalent), (b) coalesce N concurrent same-`session_id` requests crossing a cold cache window to at most one DynamoDB `get_item` via a per-session `asyncio.Future`, (c) hold the `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS` default to a strict multiple of `_DEFAULT_REFRESH_LEEWAY_SECONDS` (300s vs 60s) so cache-expiry and throttle-expiry do not align, (d) dispatch `_maybe_slide`'s `touch_last_seen` as a detached `asyncio.Task` and return the `Max-Age` to the response path synchronously, and (e) run with concurrency slack such that `desiredCount >= 2` in production configuration. The observable result SHALL be that Angular's ~8-endpoint page-load fan-out issues at most 1 `get_item` and at most 1 `update_item` per `session_id` per cache window (not ~16), and no single AWS call serializes unrelated requests.
+
+**Validates: Requirements 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7**
+
+Property 2: Preservation — BFF Middleware Contracts Unchanged for Non-Buggy Inputs
+
+_For any_ request where the bug condition does NOT hold (`isBugCondition` returns false), the fixed middleware SHALL produce the same externally observable result as the original middleware, preserving: dormant pass-through (`is_enabled() == False`), no-cookie pass-through, unrecoverable-cookie clearing of both `__Host-bff_session` and `__Host-bff_csrf` with the same attribute set, the `Max-Age` re-emit contract between `_maybe_slide` and `_reemit_cookies`, exactly-one Cognito `initiate_auth` per `session_id` per leeway window, the `CookieCodec` and client-secret process-wide singletons, the `CSRFMiddleware` in-memory HMAC double-submit check, the absolute-lifetime cap behavior, fail-closed refresh-token rotation, and uniform `CookieDecodeError` handling.
+
+**Validates: Requirements 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11**
+
+## Fix Implementation
+
+### Changes Required
+
+Assuming the root cause analysis above is correct, the fix spans four existing code files, one new module (`single_flight.py`), and one infrastructure config.
+
+**File**: `backend/src/apis/shared/sessions_bff/repository.py`
+
+**Function**: `SessionRepository.get`, `touch_last_seen`, `update_tokens`, `put`, `delete`
+
+**Specific Changes**:
+
+1. **Threadpool offload for every boto3 call**: Extract each method's boto3 invocation into a nested sync helper and invoke it via `await asyncio.to_thread(helper, ...)`. Example for `get`:
+   ```
+   async def get(self, session_id):
+       if not self._enabled:
+           return None
+       def _call():
+           return self._table.get_item(Key=self._key(session_id))
+       try:
+           response = await asyncio.to_thread(_call)
+       except ClientError as exc:
+           ...
+   ```
+   The method signatures, return types, and exception handling stay identical. The post-decode TTL defense-in-depth check and `_item_to_record` translation stay on the calling coroutine.
+
+2. **No change to public API**: Every callsite in the middleware (`self._repository.get`, `self._repository.touch_last_seen`, `self._repository.update_tokens`) remains an `await`. The offload is purely internal.
+
+**File**: `backend/src/apis/shared/sessions_bff/refresh.py`
+
+**Function**: `CognitoRefreshClient.refresh`
+
+**Specific Changes**:
+
+3. **Add async wrapper that offloads to a threadpool**: Either rename `refresh` to `_refresh_sync` and add a new `async def refresh(...)` that calls `await asyncio.to_thread(self._refresh_sync, username=..., refresh_token=...)`, or convert `refresh` to `async def` in-place with the same offload. The middleware callsite (`self._refresh_client.refresh(...)`) becomes `await self._refresh_client.refresh(...)`. The Cognito SDK call and the `CognitoRefreshError` contract are unchanged.
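+
+The rename-and-wrap option in change 3 could look roughly like the sketch below. `CognitoRefreshClientSketch`, its constructor argument, and `_fake_initiate_auth` are hypothetical stand-ins; only the `_refresh_sync` / `async def refresh` split and the keyword arguments come from the description above.
+
+```python
+import asyncio
+import threading
+import time
+
+
+class CognitoRefreshClientSketch:
+    """Hypothetical stand-in that shows only the sync/async split."""
+
+    def __init__(self, initiate_auth):
+        # Any blocking callable; in the real class this would be the boto3 call.
+        self._initiate_auth = initiate_auth
+
+    def _refresh_sync(self, *, username: str, refresh_token: str) -> dict:
+        # The original blocking body, unchanged apart from the rename.
+        return self._initiate_auth(username=username, refresh_token=refresh_token)
+
+    async def refresh(self, *, username: str, refresh_token: str) -> dict:
+        # asyncio.to_thread forwards keyword arguments to the callable and
+        # runs it in the loop's default thread pool.
+        return await asyncio.to_thread(
+            self._refresh_sync, username=username, refresh_token=refresh_token
+        )
+
+
+def _fake_initiate_auth(*, username: str, refresh_token: str) -> dict:
+    time.sleep(0.05)  # simulated network round-trip
+    return {"user": username, "thread": threading.get_ident()}
+
+
+async def demo() -> bool:
+    client = CognitoRefreshClientSketch(_fake_initiate_auth)
+    result = await client.refresh(username="alice", refresh_token="opaque")
+    # True when the blocking call ran on a different thread than the loop.
+    return result["thread"] != threading.get_ident()
+
+
+if __name__ == "__main__":
+    print(asyncio.run(demo()))
+```
+
+Exceptions raised inside `_refresh_sync` re-raise at the `await`, so a `CognitoRefreshError` contract would carry over without extra handling in the wrapper.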
+
+**File**: `backend/src/apis/shared/middleware/session_refresh.py`
+
+**Function**: `SessionRefreshMiddleware._resolve_session`, `_maybe_slide`, `dispatch`
+
+**Specific Changes**:
+
+4. **Add per-session single-flight for the session-resolve path**: Introduce a new module-level `dict[str, asyncio.Future[tuple[Optional[SessionRecord], bool]]]` guarded by a thread lock in a new small module `backend/src/apis/shared/sessions_bff/single_flight.py` (mirroring `lock.py`'s shape), with an API:
+   ```
+   async def resolve_once(session_id, loader_coro_factory) -> tuple[Optional[SessionRecord], bool]
+   ```
+   The leader creates an `asyncio.Future`, registers it, runs the loader, sets the result/exception, and removes the entry. Followers `await` the existing Future. In `_resolve_session`, wrap the `_cache.get → _repository.get → needs_refresh → (maybe refresh)` block (from cache lookup through return) inside this single-flight, keyed by `session_id`. The existing `get_session_lock(session_id)` scope around the Cognito refresh exchange is **not** moved or widened — it stays exactly where it is today.
+
+5. **Fire-and-forget slide-write in `_maybe_slide`**: Replace `await self._repository.touch_last_seen(...)` with a detached task. The function still computes `new_max_age` and returns it synchronously. The DDB write happens in the background; the existing `try/except` that was already documented to swallow failures moves into a `_slide_write_task(...)` helper that logs on failure. Update the local cache (`record.last_seen_at = now`, `record.ttl = new_ttl`, `self._cache.set(record)`) before scheduling the task, so subsequent same-request reads see the slid state.
+
+6. **No change to `dispatch` structure or the cookie-clear / cookie-reemit branches**: Keep `clear_cookie` and `renewal_max_age` handling identical.
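+
+A minimal sketch of the detached slide-write from change 5, assuming the write is passed in as a zero-argument coroutine function. `_slide_write_task` is the helper named above; `schedule_slide_write` and `_background_tasks` are hypothetical names. The strong-reference set matters because the event loop keeps only weak references to tasks, so a detached task with no other reference can be garbage-collected before it finishes.
+
+```python
+import asyncio
+import logging
+from typing import Awaitable, Callable, List, Set, Tuple
+
+logger = logging.getLogger(__name__)
+
+# Strong references keep detached tasks alive until they complete.
+_background_tasks: Set["asyncio.Task"] = set()
+
+
+async def _slide_write_task(write: Callable[[], Awaitable[None]]) -> None:
+    try:
+        await write()
+    except Exception:
+        # Same policy as the inline version: log, never fail the request.
+        logger.warning("slide-write failed", exc_info=True)
+
+
+def schedule_slide_write(write: Callable[[], Awaitable[None]]) -> "asyncio.Task":
+    # Must be called from inside a running event loop.
+    task = asyncio.create_task(_slide_write_task(write))
+    _background_tasks.add(task)
+    task.add_done_callback(_background_tasks.discard)
+    return task
+
+
+async def demo() -> Tuple[List[str], List[str]]:
+    written: List[str] = []
+
+    async def slow_write() -> None:
+        await asyncio.sleep(0.05)  # stands in for the DDB update_item
+        written.append("touch_last_seen")
+
+    task = schedule_slide_write(slow_write)
+    at_return = list(written)  # snapshot taken before the write has run
+    await task  # only the demo waits; the middleware would not
+    return at_return, written
+
+
+if __name__ == "__main__":
+    print(asyncio.run(demo()))
+```
+
+In the middleware, `_maybe_slide` would return `new_max_age` at the point where the demo takes its snapshot, which is why the cache update has to happen before the task is scheduled.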
+
+**File**: `backend/src/apis/shared/sessions_bff/config.py`
+
+**Constant**: `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS`
+
+**Specific Changes**:
+
+7. **Raise default from 60s to 300s**: Change the constant from `60` to `60 * 5` (or explicitly `300`) so the slide-throttle window is a strict multiple (5×) of the cache TTL (tied to `_DEFAULT_REFRESH_LEEWAY_SECONDS = 60`) rather than equal to it. The env var `BFF_SESSION_SLIDING_RENEWAL_THROTTLE_SECONDS` continues to override.
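+
+The effect of the new default can be checked with a small worked model. It assumes a worst-case client that sends one request exactly every 60s, that the cache entry is repopulated only on a miss, and that the slide timestamp moves only when a slide is due. `count_double_hits` is a hypothetical helper, not code from the repository.
+
+```python
+def count_double_hits(cache_ttl: int, throttle: int, horizon: int, step: int) -> int:
+    """Count requests that need both a cache-miss read and a slide-write."""
+    cached_at = 0      # when the cache entry was last populated
+    last_seen_at = 0   # when touch_last_seen was last written
+    double_hits = 0
+    for now in range(step, horizon + 1, step):
+        cache_miss = now - cached_at >= cache_ttl
+        slide_due = now - last_seen_at >= throttle
+        if cache_miss:
+            cached_at = now
+        if slide_due:
+            last_seen_at = now
+        if cache_miss and slide_due:
+            double_hits += 1
+    return double_hits
+
+
+if __name__ == "__main__":
+    # Ten minutes of traffic at one request per 60s.
+    print(count_double_hits(cache_ttl=60, throttle=60, horizon=600, step=60))
+    print(count_double_hits(cache_ttl=60, throttle=300, horizon=600, step=60))
+```
+
+In this model the aligned 60s/60s defaults pair a read with a write on all ten requests, while 60s/300s does so on two (at 300s and 600s). A cache miss no longer implies a slide-write, although the two can still coincide when the elapsed time is a common multiple of both windows.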
+
+**File**: `infrastructure/cdk.context.json` (and test fixtures under `infrastructure/test/`)
+
+**Key**: `appApi.desiredCount`
+
+**Specific Changes**:
+
+8. **Raise production `desiredCount` to 2**: Keep `maxCapacity` as-is (4). Update only the production/non-test context — test fixtures can stay at 1 if needed to keep CDK unit tests fast, but the top-level production context value must flip to 2. This is a **deployment-time** behavior change and the last item in the fix plan; it does not become necessary until the other changes ship.
+
+**No changes required** in: `backend/src/apis/shared/sessions_bff/cache.py`, `backend/src/apis/shared/sessions_bff/cookie.py`, `backend/src/apis/shared/sessions_bff/lock.py`, `backend/src/apis/shared/sessions_bff/csrf.py`, `backend/src/apis/shared/middleware/csrf.py`, `backend/src/apis/app_api/auth/bff/*`, or the uvicorn `CMD` in `backend/Dockerfile.app-api` (the ECS `desiredCount` bump is the chosen vector for concurrency slack in 2.6 — a `--workers N` flag would require reworking the in-process singletons in `cache.py` and `refresh.py`, which is out of scope).
+
+## Testing Strategy
+
+### Validation Approach
+
+The testing strategy follows a two-phase approach: first, surface counterexamples that demonstrate the bug on unfixed code, then verify the fix works correctly and preserves existing behavior. Because the sub-conditions have independent root causes, we run the exploratory phase against each one.
+
+### Exploratory Bug Condition Checking
+
+**Goal**: Surface counterexamples that demonstrate the bug BEFORE implementing the fix. Confirm or refute the root-cause analysis for each sub-condition. If any is refuted, we re-hypothesize.
+
+**Test Plan**: Write tests that inject a slow/instrumented boto3 stub (for DDB and Cognito) and drive the middleware directly under `pytest-asyncio`. For each sub-condition, assert the blocking/serialization behavior is present on unfixed code. Run on UNFIXED code first; the assertions SHALL fail against fixed code later.
+
+**Test Cases**:
+
+1. **Event loop blocked by `SessionRepository.get`** (validates 1.1): Stub the boto3 `table.get_item` with a 500ms `time.sleep`. Submit a `SessionRepository.get` call and a concurrent `asyncio.sleep(0.05)` marker coroutine on the same loop. Assert the marker resolves strictly after the `get` (will hold on unfixed code, will fail on fixed code where the marker completes long before `get` returns).
+
+2. **Event loop blocked by `CognitoRefreshClient.refresh`** (validates 1.2): Same shape as (1) but against a stubbed `cognito-idp:initiate_auth`. Additionally assert that a concurrent `get_session_lock(other_session_id)` acquisition cannot complete until the Cognito call returns (will hold on unfixed code because the sync Cognito call freezes the whole loop thread; will fail on fixed code, where the unrelated lock is acquired while `refresh` is still in flight).
+
+3. **N fan-out → N `get_item` calls** (validates 1.3): Spin up 8 concurrent `dispatch` calls with the same cookie and a cold `SessionCache`. Count `table.get_item` invocations on the stub. Assert count == 8 on unfixed code; the fix target is 1.
+
+4. **Aligned windows → both writes on one request** (validates 1.4): Set clock to a moment where the cache TTL just elapsed AND `now - last_seen_at == 60s`. Drive a single request. Assert both `get_item` AND `update_item` are called on unfixed code; on fixed code with the new 300s throttle default, only `get_item` is called.
+
+5. **Response waits on `touch_last_seen`** (validates 1.5): Stub `table.update_item` with a 500ms delay. Measure time from `dispatch` entry to `call_next(request)` return. On unfixed code, response time ≥ 500ms; on fixed code, response time is independent of the DDB write latency.
+
+6. **Single-worker container / `desiredCount: 1`** (validates 1.6): This is a deployment-level property, not a middleware-level one. Verified by reading `infrastructure/cdk.context.json` and the Dockerfile `CMD`. No runtime test; CDK unit test asserts `DesiredCount: 2` on the production context.
+
+7. **Page-load fan-out amplification** (validates 1.7): Combine (3) + (4) — 8 concurrent requests at a boundary moment. Count blocking DDB calls. Assert ≥ 16 on unfixed code, ≤ 2 on fixed code.
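+
+The marker technique used in test cases 1 and 2 can be sketched without any AWS stubs. Here `slow_get_item` stands in for the stubbed `table.get_item`, and `blocking_get` / `offloaded_get` are hypothetical stand-ins for the unfixed and fixed repository method; the 300ms delay is shortened from the 500ms in the plan only to keep the sketch quick.
+
+```python
+import asyncio
+import time
+from typing import Awaitable, Callable, List
+
+
+def slow_get_item() -> dict:
+    time.sleep(0.3)  # stands in for a slow boto3 table.get_item
+    return {"Item": {}}
+
+
+async def blocking_get() -> dict:
+    # Unfixed shape: the sync call runs on the event loop thread.
+    return slow_get_item()
+
+
+async def offloaded_get() -> dict:
+    # Fixed shape: the sync call runs in the default thread pool.
+    return await asyncio.to_thread(slow_get_item)
+
+
+async def marker_finished_first(get: Callable[[], Awaitable[dict]]) -> bool:
+    """Return True if a 50ms marker coroutine completes before `get` returns."""
+    finished: List[str] = []
+
+    async def call() -> None:
+        await get()
+        finished.append("get")
+
+    async def marker() -> None:
+        await asyncio.sleep(0.05)
+        finished.append("marker")
+
+    # call() is scheduled first, so a blocking get starves the marker.
+    await asyncio.gather(call(), marker())
+    return finished[0] == "marker"
+
+
+if __name__ == "__main__":
+    print(asyncio.run(marker_finished_first(blocking_get)))
+    print(asyncio.run(marker_finished_first(offloaded_get)))
+```
+
+The exploratory assertion is that the marker does not finish first against the blocking shape. The same helper, with the assertion inverted, serves as the fix check for Property 1(a).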
+
+**Expected Counterexamples**:
+
+- Blocked-loop markers do not complete until the stubbed AWS call returns.
+- `table.get_item` call count on the stub matches the fan-out, not 1.
+- `Set-Cookie` response latency tracks `table.update_item` latency.
+- Possible causes confirmed: sync boto3 on event loop, narrow lock scope, aligned constants, inline-awaited slide write.
+
+### Fix Checking
+
+**Goal**: Verify that for all inputs where the bug condition holds, the fixed middleware produces the expected behavior defined by Property 1.
+
+**Pseudocode:**
+
+```
+FOR ALL input WHERE isBugCondition(input) DO
+  # (a) event loop non-blocking
+  marker_latency := measureConcurrentMarker(dispatch(input))
+  ASSERT marker_latency << AWS_call_latency
+
+  # (b) fan-out coalescing
+  ddb_get_calls := countGetItemCalls(during_dispatch(input_fanout_n=8))
+  ASSERT ddb_get_calls <= 1
+
+  # (c) window staggering
+  ASSERT config.slidingRenewalThrottleSeconds
+         % config.refreshLeewaySeconds == 0
+  ASSERT config.slidingRenewalThrottleSeconds
+         > config.refreshLeewaySeconds
+
+  # (d) fire-and-forget slide
+  response_latency := measureDispatchTime(input_with_slide)
+  ASSERT response_latency independent_of touch_last_seen_latency
+
+  # (e) concurrency slack (deployment assertion)
+  ASSERT cdkContextAppApiDesiredCount >= 2
+END FOR
+```
+
+### Preservation Checking
+
+**Goal**: Verify that for all inputs where the bug condition does NOT hold, the fixed middleware produces the same externally observable result as the original middleware.
+
+**Pseudocode:**
+
+```
+FOR ALL input WHERE NOT isBugCondition(input) DO
+  ASSERT dispatch_original(input).response == dispatch_fixed(input).response
+  ASSERT dispatch_original(input).set_cookie_headers
+         == dispatch_fixed(input).set_cookie_headers
+  ASSERT dispatch_original(input).request_state_bff_session
+         == dispatch_fixed(input).request_state_bff_session
+  ASSERT dispatch_original(input).cleared_cookies
+         == dispatch_fixed(input).cleared_cookies
+  ASSERT countOf(cognito.initiate_auth across N same-session concurrent requests)
+         == 1 per leeway window
+END FOR
+```
+
+**Testing Approach**: Property-based testing is recommended for preservation checking because:
+
+- It generates many request shapes across the input domain (cookie present/absent, cookie seal valid/invalid, cache hit/miss, needs_refresh yes/no, rotation yes/no, slide warranted yes/no, absolute cap passed yes/no, `is_enabled()` true/false) and asserts equivalence against a mocked `SessionRepository` + `CognitoRefreshClient`.
+- It catches edge cases in the single-flight and fire-and-forget paths that manual unit tests might miss (e.g. an exception inside the single-flight leader; a background slide task racing with the next request).
+- It provides strong guarantees that the observable middleware contract is unchanged for the entire `¬C` input domain.
+
+**Test Plan**: First, exercise the unfixed middleware with an expressive `Hypothesis` strategy over request shapes and record observable outputs (response status, `Set-Cookie` headers, `request.state.bff_session`, DDB/Cognito call counts). Then, swap in the fixed middleware and assert equivalence on the same inputs. The strategy must skip any input that satisfies `isBugCondition` — only `¬C` inputs enter the preservation assertion.
+
+**Test Cases**:
+
+1. **Dormant pass-through unchanged** (3.1): With `is_enabled() == False`, every request shape produces identical responses under fixed and unfixed middleware with zero AWS calls.
+2. **No-cookie pass-through unchanged** (3.2): Request with no `__Host-bff_session` cookie, for any method/path, produces identical responses with zero AWS calls.
+3. **Unrecoverable cookie clears both cookies** (3.3): Bad-seal, missing-row, expired-row, and terminal-refresh-error inputs produce the same `Set-Cookie` headers with `Max-Age=0` for both `__Host-bff_session` and `__Host-bff_csrf`, same attribute set.
+4. **Max-Age re-emit contract** (3.4): For inputs where `_maybe_slide` returns a non-None value, the resulting `Set-Cookie` headers match the original exactly (including attribute set). Fire-and-forget dispatch does not delay or drop the re-emit.
+5. **Refresh-storm coalescing preserved** (3.5): For 10 concurrent same-session requests crossing the refresh-leeway window, exactly one `initiate_auth` call is observed on the Cognito stub.
+6. **Codec / secret singletons preserved** (3.6, 3.7): Across many requests, `get_default_codec()` returns the same instance, and `resolve_bff_client_secret()` hits Secrets Manager exactly once per process.
+7. **CSRF path unchanged** (3.8): Requests that trigger `CSRFMiddleware` produce identical accept/reject decisions with no new I/O.
+8. **Absolute lifetime cap preserved** (3.9): Inputs with `created_at + absolute_lifetime_seconds < now` produce `_maybe_slide → None`, no slide write scheduled.
+9. **Fail-closed rotation preserved** (3.10): With rotation triggered and `_persist_refresh` forced to exhaust retries, the cache is invalidated and both cookies are cleared.
+10. **Cookie decode uniformity** (3.11): All `CookieDecodeError` branches produce identical response shapes and timing profiles on the fixed middleware (no new oracle via single-flight or fire-and-forget).
+
+### Unit Tests
+
+- **Repository offload**: Assert each `SessionRepository.*` method calls `asyncio.to_thread` (monkeypatched) exactly once per call and that the wrapped boto3 call receives the expected arguments. Assert `ClientError` propagation still matches today's behavior.
+- **Cognito offload**: Assert `CognitoRefreshClient.refresh` is awaitable, offloads to a threadpool, preserves `CognitoRefreshError`, and returns the same `RefreshResult` shape.
+- **Single-flight**: Two concurrent `resolve_once(session_id, factory)` calls share one loader invocation; the entry is removed after completion; an exception in the loader propagates to all waiters; distinct `session_id`s do not share.
+- **Fire-and-forget slide**: `_maybe_slide` returns Max-Age before `touch_last_seen` completes; the background task writes to DDB; failure inside the task logs and does not bubble to `dispatch`; the local cache is updated synchronously before the task is scheduled.
+- **Config constant**: `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS == 300`; strict multiple of `_DEFAULT_REFRESH_LEEWAY_SECONDS`.
+
+### Property-Based Tests
+
+- **Preservation over `¬C` input domain**: As described in Preservation Checking — generate request shapes, assert fixed ≡ original on response, cookies, `request.state`, and AWS call counts.
+- **Fan-out coalescing invariant**: For any N ∈ [2, 32] and any cookie-bearing same-session fan-out, the number of DDB `get_item` calls observed on the stub is ≤ 1 per cache window. Randomize cache warm/cold state, `needs_refresh` outcomes, and concurrent-request arrival ordering.
+- **Window-staggering invariant**: For any request timing `t` within one leeway window of a cache TTL boundary, the fixed middleware issues at most one of `{get_item, update_item}` on the critical path — never both.
+
+### Integration Tests
+
+- **End-to-end page-load fan-out**: Drive the app-api container (under `moto` for DDB, a stubbed Cognito client) with a simulated 8-endpoint Angular page load. Measure total wall-clock time and count of DDB/Cognito calls. Assert ≤ 1 `get_item` and ≤ 1 `update_item` across the fan-out, and total latency bounded by the slowest individual handler (not by serialized AWS I/O).
+- **Concurrency slack at the deployment boundary**: CDK unit test asserts `DesiredCount: 2` for the production `app-api` service. Integration smoke test asserts that a deliberately slow endpoint (e.g., a route that sleeps 5s) does not stall a concurrent fast endpoint on a parallel request.
+- **Refresh-storm under fan-out**: 8 concurrent requests across the refresh-leeway boundary on the same session. Assert exactly 1 Cognito `initiate_auth`, all 8 responses succeed, and `request.state.bff_session` carries the freshly rotated tokens.
diff --git a/.kiro/specs/bff-middleware-event-loop-blocking/tasks.md b/.kiro/specs/bff-middleware-event-loop-blocking/tasks.md
new file mode 100644
index 00000000..b6e58c64
--- /dev/null
+++ b/.kiro/specs/bff-middleware-event-loop-blocking/tasks.md
@@ -0,0 +1,169 @@
+# Implementation Plan
+
+- [x] 1. Write bug condition exploration test
+  - **Property 1: Bug Condition** - Event-Loop Blocking, Missing Coalescing, Aligned Windows, Inline Slide-Write
+  - **CRITICAL**: This test MUST FAIL on unfixed code - failure confirms the bug exists
+  - **DO NOT attempt to fix the test or the code when it fails**
+  - **NOTE**: This test encodes the expected behavior - it will validate the fix when it passes after implementation
+  - **GOAL**: Surface counterexamples that demonstrate each sub-condition of the bug in `SessionRefreshMiddleware`
+  - **Scoped PBT Approach**: Scope the property to concrete failing cases that deterministically reproduce each sub-condition under `pytest-asyncio`
+  - Test location: `backend/tests/apis/shared/middleware/test_session_refresh_bug_condition.py`
+  - Use `hypothesis` + `pytest-asyncio`; inject slow/instrumented boto3 stubs for DynamoDB (`table.get_item`, `table.update_item`) and Cognito (`initiate_auth`) via monkeypatching on `SessionRepository._table` and `CognitoRefreshClient`
+  - Bug Condition (from design `isBugCondition`): `BFFConfig.is_enabled() == True` AND `__Host-bff_session` cookie present AND any of sub-conditions 1.1 through 1.7 hold
+  - Expected Behavior assertions (from design Property 1 / Expected Behavior 2.1–2.7) that must hold for all inputs satisfying the bug condition:
+    - **(1.1) Repository offload**: Stub `table.get_item`/`update_item`/`put_item`/`delete_item` with a 500ms `time.sleep`. Run `SessionRepository.get(session_id)` concurrently with an `asyncio.sleep(0.05)` marker coroutine. ASSERT the marker completes strictly BEFORE the repository call returns (loop is not blocked). Repeat for `touch_last_seen`, `update_tokens`, `put`, `delete`.
+    - **(1.2) Cognito offload**: Stub Cognito `initiate_auth` with a 500ms `time.sleep`. Run `CognitoRefreshClient.refresh(...)` concurrently with a marker coroutine AND a concurrent `get_session_lock(other_session_id)` acquisition. ASSERT the marker and unrelated lock acquisition complete while `refresh` is in flight.
+    - **(1.3) Resolve-path coalescing**: Drive 8 concurrent `SessionRefreshMiddleware.dispatch` calls for the same `session_id` with cold `SessionCache` and a valid sealed cookie. Count `table.get_item` invocations on the stub. ASSERT count == 1 (bug: count == 8).
+    - **(1.4) Window de-alignment**: ASSERT `BFFConfig._DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS % BFFConfig._DEFAULT_REFRESH_LEEWAY_SECONDS == 0` AND `BFFConfig._DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS > BFFConfig._DEFAULT_REFRESH_LEEWAY_SECONDS`. Drive a single request with `SessionCache` TTL just elapsed AND `now - last_seen_at == 60s`. ASSERT at most one of `{get_item, update_item}` is observed on the critical path.
+    - **(1.5) Fire-and-forget slide**: Stub `table.update_item` with a 500ms delay. Drive a `dispatch` call where a slide is warranted. Measure elapsed time from `dispatch` entry to `call_next(request)` returning. ASSERT elapsed time < 250ms (bug: elapsed time ≥ 500ms because the response waits on the DDB write).
+    - **(1.6) Concurrency slack at deployment**: Read `infrastructure/cdk.context.json` and assert `appApi.desiredCount >= 2` for the production context.
+    - **(1.7) Fan-out amplification**: Drive 8 concurrent `dispatch` calls on the same session at a cache-boundary moment. Count blocking DDB calls across the fan-out. ASSERT count ≤ 2 (bug: count ≥ 16).
+  - Run all property cases on UNFIXED code
+  - **EXPECTED OUTCOME**: Test FAILS (this is correct - it proves the bug exists). Document the counterexamples in the test output: marker coroutines starved, 8 `get_item` calls per fan-out, both `get_item` and `update_item` on a single request, response latency tracking `update_item` latency
+  - Mark task complete when test is written, run, and failures are documented
+  - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7_
+
+- [x] 2. Write preservation property tests (BEFORE implementing fix)
+  - **Property 2: Preservation** - BFF Middleware Contracts Unchanged for Non-Buggy Inputs
+  - **IMPORTANT**: Follow observation-first methodology
+  - Test location: `backend/tests/apis/shared/middleware/test_session_refresh_preservation.py`
+  - Use `hypothesis` to generate request shapes across the `¬C` input domain; skip any input for which `isBugCondition` returns true
+  - Strategy must cover all axes that exist today: `is_enabled()` true/false, `__Host-bff_session` cookie present/absent, cookie seal valid/invalid/expired, `SessionCache` hit/miss, `needs_refresh` yes/no, refresh-token rotation yes/no, slide warranted yes/no, absolute-lifetime cap passed yes/no, request method safe/unsafe (for CSRF interaction)
+  - **Observe behavior on UNFIXED code** for each non-buggy input and record: response status, `Set-Cookie` headers for `__Host-bff_session` and `__Host-bff_csrf` (including every attribute), `request.state.bff_session`, `request.state.bff_csrf_token`, DDB call counts, Cognito call counts, KMS/Secrets Manager call counts
+  - Write property-based tests capturing these observed behaviors as preservation invariants (from Preservation Requirements 3.1–3.11):
+    - **(3.1) Dormant pass-through**: for all requests, when `is_enabled() == False`, response == `call_next(request)` AND zero AWS calls
+    - **(3.2) No-cookie pass-through**: for all requests with no `__Host-bff_session` cookie, response == `call_next(request)` AND zero AWS calls
+    - **(3.3) Unrecoverable cookie clears both cookies**: for bad-seal / missing-row / expired-row / terminal-`CognitoRefreshError` inputs, `Set-Cookie` for both `__Host-bff_session` AND `__Host-bff_csrf` has `Max-Age=0` AND identical attribute set
+    - **(3.4) Max-Age re-emit contract**: when `_maybe_slide` returns non-None, the resulting `Set-Cookie` headers for both BFF cookies use that exact `Max-Age` and the exact attribute set from `_reemit_cookies` today
+    - **(3.5) Refresh-storm coalescing**: for 10 concurrent same-session requests crossing the refresh-leeway window, exactly one `cognito-idp:initiate_auth` call is observed
+    - **(3.6) Codec singleton**: across many requests, `get_default_codec()` returns the same instance identity; zero per-request `kms:GenerateDataKey` calls
+    - **(3.7) Client-secret cache**: across many requests, `resolve_bff_client_secret()` hits Secrets Manager exactly once per process
+    - **(3.8) CSRF path unchanged**: `CSRFMiddleware` accept/reject decision on unsafe-method requests is identical to unfixed; zero new I/O on the CSRF path
+    - **(3.9) Absolute-lifetime cap**: when `now > created_at + absolute_lifetime_seconds`, `_maybe_slide` returns `None`; no slide scheduled
+    - **(3.10) Fail-closed rotation**: when rotation triggers AND `_persist_refresh` exhausts retries, cache is invalidated AND both cookies are cleared
+    - **(3.11) Cookie decode uniformity**: every `CookieDecodeError` branch produces identical response shape and timing profile (no new oracle)
+  - Run tests on UNFIXED code
+  - **EXPECTED OUTCOME**: Tests PASS (this confirms baseline behavior to preserve)
+  - Mark task complete when tests are written, run, and passing on unfixed code
+  - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11_
+
+- [x] 3. Fix for BFF middleware event-loop blocking and fan-out amplification
+
+  - [x] 3.1 Offload `SessionRepository` boto3 calls via `asyncio.to_thread`
+    - Edit `backend/src/apis/shared/sessions_bff/repository.py`
+    - For each of `get`, `touch_last_seen`, `update_tokens`, `put`, `delete`: extract the boto3 invocation into a nested sync helper and invoke it via `await asyncio.to_thread(helper, ...)`
+    - Keep method signatures, return types, and exception-handling branches identical
+    - Keep the post-decode TTL defense-in-depth check and `_item_to_record` translation on the calling coroutine
+    - Do NOT change the public API — every callsite in the middleware remains an `await`
+    - _Bug_Condition: isBugCondition(input) where sub-condition 1.1 holds (sync boto3 on event loop)_
+    - _Expected_Behavior: expectedBehavior(result) per design Property 1 clause (a) — every `SessionRepository` boto3 call executes off the event loop thread_
+    - _Preservation: 3.3, 3.10 — exception branches and fail-closed rotation unchanged_
+    - _Requirements: 2.1_
+
+  - [x] 3.2 Offload `CognitoRefreshClient.refresh` via `asyncio.to_thread`
+    - Edit `backend/src/apis/shared/sessions_bff/refresh.py`
+    - Rename existing `refresh` to `_refresh_sync` (or equivalent private sync form) and add a new `async def refresh(...)` that calls `await asyncio.to_thread(self._refresh_sync, username=..., refresh_token=...)`
+    - Update the callsite in `SessionRefreshMiddleware._resolve_session` to `await self._refresh_client.refresh(...)`
+    - Preserve the `CognitoRefreshError` contract and `RefreshResult` return shape exactly
+    - Do NOT move or widen the `get_session_lock(session_id)` scope around the refresh exchange
+    - _Bug_Condition: isBugCondition(input) where sub-condition 1.2 holds (sync Cognito on event loop while session lock held)_
+    - _Expected_Behavior: expectedBehavior(result) per design Property 1 clause (a) — Cognito `initiate_auth` executes off the event loop thread, other sessions' locks are acquirable_
+    - _Preservation: 3.5 — refresh-storm coalescing preserved_
+    - _Requirements: 2.2_
+
+  - [x] 3.3 Add per-session single-flight primitive module
+    - Create `backend/src/apis/shared/sessions_bff/single_flight.py`
+    - Export `async def resolve_once(session_id: str, loader_coro_factory: Callable[[], Awaitable[tuple[Optional[SessionRecord], bool]]]) -> tuple[Optional[SessionRecord], bool]`
+    - Internal state: module-level `dict[str, asyncio.Future[tuple[Optional[SessionRecord], bool]]]` guarded by a `threading.Lock` (mirroring the shape of `sessions_bff/lock.py`)
+    - Leader semantics: first caller for a given `session_id` creates an `asyncio.Future`, registers it while holding the registry `threading.Lock` (not `get_session_lock`), runs the loader, sets the result or exception, removes the entry, and returns
+    - Follower semantics: any caller that finds an existing Future `await`s it and returns its value
+    - Exception propagation: an exception from the loader MUST propagate to all current waiters, and the registry entry MUST be removed so subsequent calls start a new leader
+    - Distinct `session_id`s MUST NOT share a Future
+    - Include unit tests alongside: two concurrent `resolve_once` calls share one loader invocation; exception propagation to all waiters; distinct sessions are independent
+    - _Bug_Condition: isBugCondition(input) where sub-condition 1.3 holds (N concurrent same-session resolves issue N `get_item` calls)_
+    - _Expected_Behavior: expectedBehavior(result) per design Property 1 clause (b) — at most one DynamoDB `get_item` per `session_id` per cache window_
+    - _Preservation: 3.5 — the existing `get_session_lock` scope around the Cognito exchange is unchanged (this is a separate primitive upstream)_
+    - _Requirements: 2.3_
+
+  - [x] 3.4 Wire single-flight into `SessionRefreshMiddleware._resolve_session`
+    - Edit `backend/src/apis/shared/middleware/session_refresh.py`
+    - Wrap the `_cache.get → _repository.get → needs_refresh → (maybe refresh)` block in `_resolve_session` inside `resolve_once(session_id, loader_coro_factory)` where the loader factory builds the coroutine that performs today's cache/repo/refresh sequence and returns `(Optional[SessionRecord], clear_cookie: bool)`
+    - Ensure the existing `get_session_lock(session_id)` scope around the Cognito refresh exchange remains exactly where it is today — do NOT move or widen it
+    - Ensure the bad-seal / missing-row / expired-row / terminal-refresh-error paths still produce the same `clear_cookie=True` return and the same exception propagation to `dispatch` as today
+    - _Bug_Condition: isBugCondition(input) where sub-conditions 1.3 and 1.7 hold (fan-out amplification)_
+    - _Expected_Behavior: expectedBehavior(result) per design Property 1 clause (b)_
+    - _Preservation: 3.3, 3.5, 3.11 — unrecoverable cookie clearing, refresh-storm coalescing, uniform decode failure preserved_
+    - _Requirements: 2.3, 2.7_
+
+  - [x] 3.5 Convert `_maybe_slide` to fire-and-forget DDB write
+    - Edit `backend/src/apis/shared/middleware/session_refresh.py`
+    - In `_maybe_slide`, update the local cache synchronously (`record.last_seen_at = now`, `record.ttl = new_ttl`, `self._cache.set(record)`) BEFORE scheduling the background task
+    - Replace `await self._repository.touch_last_seen(...)` with `asyncio.create_task(self._slide_write_task(...))`
+    - Introduce a private `async def _slide_write_task(self, record, ...)` helper that performs `await self._repository.touch_last_seen(...)` inside a `try/except` that logs on failure (preserving today's "swallow failures" semantics)
+    - Return the computed `new_max_age` synchronously from `_maybe_slide`
+    - Do NOT change `dispatch` structure or the cookie-clear / cookie-reemit branches
+    - Do NOT change the absolute-lifetime cap path — it must still return `None`
+    - _Bug_Condition: isBugCondition(input) where sub-condition 1.5 holds (response waits on inline slide-write)_
+    - _Expected_Behavior: expectedBehavior(result) per design Property 1 clause (d) — response latency independent of `touch_last_seen` latency_
+    - _Preservation: 3.4, 3.9 — Max-Age re-emit contract and absolute-lifetime cap preserved_
+    - _Requirements: 2.5_
+
+  - [x] 3.6 De-align cache/leeway and throttle windows in config
+    - Edit `backend/src/apis/shared/sessions_bff/config.py`
+    - Change `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS` from `60` to `60 * 5` (or explicit `300`)
+    - Verify `_DEFAULT_REFRESH_LEEWAY_SECONDS` remains `60`
+    - Confirm the strict-multiple relationship: `300 % 60 == 0` AND `300 > 60`
+    - Ensure the `BFF_SESSION_SLIDING_RENEWAL_THROTTLE_SECONDS` env var still overrides the default
+    - _Bug_Condition: isBugCondition(input) where sub-condition 1.4 holds (aligned windows force both writes on one request)_
+    - _Expected_Behavior: expectedBehavior(result) per design Property 1 clause (c) — cache-miss does not imply slide-write_
+    - _Preservation: none impacted (pure default-value change; overrides preserved)_
+    - _Requirements: 2.4_
+
+  - [x] 3.7 Raise production `appApi.desiredCount` to 2
+    - Edit `infrastructure/cdk.context.json` to set `appApi.desiredCount` to `2` in the production/non-test context
+    - Keep `appApi.maxCapacity` unchanged (4)
+    - Test fixtures under `infrastructure/test/` may stay at `1` if needed for CDK unit-test speed; only the top-level production context value must change
+    - Update or add CDK unit tests to assert `DesiredCount: 2` on the production `app-api` service synthesis
+    - _Bug_Condition: isBugCondition(input) where sub-condition 1.6 holds (no concurrency slack at deployment)_
+    - _Expected_Behavior: expectedBehavior(result) per design Property 1 clause (e) — `desiredCount >= 2` in production configuration_
+    - _Preservation: none impacted (deployment-config change; in-process singletons untouched)_
+    - _Requirements: 2.6_
+
+  - [x] 3.8 Verify bug condition exploration test now passes
+    - **Property 1: Expected Behavior** - Event-Loop Non-Blocking, Coalesced, Window-Staggered, Fire-and-Forget BFF Middleware
+    - **IMPORTANT**: Re-run the SAME test from task 1 - do NOT write a new test
+    - The test from task 1 encodes the expected behavior from design Property 1
+    - When this test passes, it confirms the expected behavior is satisfied across all seven sub-conditions
+    - Run: `cd backend && uv run python -m pytest tests/apis/shared/middleware/test_session_refresh_bug_condition.py -v`
+    - **EXPECTED OUTCOME**: Test PASSES (confirms bug is fixed):
+      - Marker coroutines complete while AWS stubs are still sleeping (1.1, 1.2)
+      - 8-fan-out produces exactly 1 `get_item` on the stub (1.3, 1.7)
+      - Aligned-boundary request produces at most one of `{get_item, update_item}` (1.4)
+      - Dispatch latency independent of `update_item` stub latency (1.5)
+      - `appApi.desiredCount >= 2` in production context (1.6)
+    - _Requirements: Expected Behavior Properties from design (2.1–2.7)_
+
+  - [x] 3.9 Verify preservation tests still pass
+    - **Property 2: Preservation** - BFF Middleware Contracts Unchanged for Non-Buggy Inputs
+    - **IMPORTANT**: Re-run the SAME tests from task 2 - do NOT write new tests
+    - Run: `cd backend && uv run python -m pytest tests/apis/shared/middleware/test_session_refresh_preservation.py -v`
+    - **EXPECTED OUTCOME**: Tests PASS (confirms no regressions):
+      - Dormant pass-through with zero AWS calls (3.1)
+      - No-cookie pass-through with zero AWS calls (3.2)
+      - Unrecoverable cookie clears both cookies with identical attributes (3.3)
+      - Max-Age re-emit contract preserved under fire-and-forget dispatch (3.4)
+      - Exactly one `initiate_auth` per `session_id` per leeway window (3.5)
+      - `get_default_codec()` and `resolve_bff_client_secret()` remain singletons (3.6, 3.7)
+      - `CSRFMiddleware` path unchanged (3.8)
+      - Absolute-lifetime cap preserved (3.9)
+      - Fail-closed rotation preserved (3.10)
+      - Uniform `CookieDecodeError` handling preserved (3.11)
+    - Confirm all tests still pass after fix (no regressions)
+
+- [x] 4. Checkpoint - Ensure all tests pass
+  - Run the full backend test suite: `cd backend && uv run python -m pytest tests/ -v`
+  - Run CDK unit tests: `cd infrastructure && npm run build && npm test`
+  - Confirm the bug condition exploration test (task 1) passes on fixed code
+  - Confirm the preservation property tests (task 2) pass on fixed code
+  - Confirm no unrelated tests regress
+  - Ensure all tests pass; ask the user if questions arise
diff --git a/.kiro/steering/structure.md b/.kiro/steering/structure.md
index 54606b6f..d5862e64 100644
--- a/.kiro/steering/structure.md
+++ b/.kiro/steering/structure.md
@@ -238,7 +238,7 @@ scripts/
 
 - **Files**: snake_case (e.g., `turn_based_session_manager.py`)
 - **Classes**: PascalCase (e.g., `TurnBasedSessionManager`)
-- **Functions**: snake_case (e.g., `get_current_user`)
+- **Functions**: snake_case (e.g., `get_current_user_from_session`)
 - **Constants**: UPPER_SNAKE_CASE (e.g., `MAX_FILE_SIZE`)
 - **Private**: Leading underscore (e.g., `_internal_method`)
 
@@ -266,7 +266,7 @@ All modules are properly packaged and can be imported directly:
 
 ```python
 # Shared utilities (canonical location for cross-service code)
-from apis.shared.auth import get_current_user, User
+from apis.shared.auth import get_current_user_from_session, User
 from apis.shared.rbac import RBACService
 from apis.shared.costs.calculator import CostCalculator
 from apis.shared.tools.models import ToolDefinition
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3756d72e..4b854a06 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,220 @@ All notable changes to this project are documented in this file. Format follows
 
 For narrative release notes written for operators and product owners, see [RELEASE_NOTES.md](RELEASE_NOTES.md).
 
+## [1.0.0-beta.27] - 2026-05-20
+
+The largest release since the BFF cutover. Two new user-facing surfaces (Artifacts and MCP Apps host-renderer) each backed by a new CDK stack, an admin shell redesign that replaces the 15-card grid with a persistent grouped sidebar, recoverable `max_tokens` truncation with a Continue affordance, model-aware adaptive thinking for Opus 4.7, an inference-API `/ping` reaper fix, and a pre-migration backup tool. `bedrock-agentcore` 1.6.4 → 1.9.1, `boto3` 1.42.96 → 1.43.9, `strands-agents` 1.39.0 → 1.40.0.
+
+### 🚀 Added
+
+- **Artifacts feature** — agent-authored versioned standalone documents (HTML, Markdown, code) that render in a sandboxed iframe in a docked side panel. Backed by a new `ArtifactsStack` (DDB `user-artifacts` heads + version log with session GSI; private S3 `artifacts-content` bucket; render Lambda; CloudFront on `artifacts.{domain}`) and short-lived HMAC-signed render-token JWTs minted by app-api. Two new built-in tools (`create_artifact`, `update_artifact`) registered as default public tools so the feature works on first deploy. Versions are immutable (no `s3:DeleteObject` on inference-api). HTML mode allows scripts from `cdn.tailwindcss.com`, `esm.sh`, `cdn.jsdelivr.net`, `unpkg.com`; `connect-src 'none'`. Markdown mode wraps GFM input in a self-contained HTML render harness server-side. Frontend: docked resizable panel, auto-open on first creation, skeleton loader, latest-version on update, per-version history cards, preview/code toggle with syntax-highlighted source view, download button (#306, #309, #310, #311, #312, #314, #316, #317, #318, #319, #321, #322, #323, #324, #325, #326, #334)
+- **MCP Apps host-renderer** — third-party MCP servers can ship UI alongside their tools. New `McpSandboxStack` (CloudFront on `mcp-sandbox.{domain}` with a CloudFront Function emitting per-resource `frame-ancestors` CSP; outer mount-page S3 bucket). Agent advertises `experimental.ui` on MCP `initialize`, fetches `ui_resource` payloads via `resources/read`, emits a `ui_resource` SSE event with `uri`, `permissions`, and `sandboxOrigin`. Frontend `<mcp-app-frame>` Angular custom element renders Apps in a sandboxed iframe with a `postMessage` bridge that enforces allowed message types (`ui/message`, `ui/update-model-context`) and origin checks. App-initiated `tools/call` proxied through app-api over an event broker. Explicit user consent prompt on first frame, persisted across reloads via card store. Default-on this release (`Defaults.MCP_APPS_HOST_ENABLED` flips false → true) with `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` wired into inference-api runtime env from SSM. Tools whose only output is a `ui_resource` are filtered out for non-capable clients. Committed `budget-allocator-server` example; runbooks updated (#296, #339, #342, #343, #344, #345, #346, #347, #348, #349, #352, #353, #355, #360)
+- **Admin shell redesign** — persistent grouped sidebar nav (Usage & Spend / AI Configuration / Identity & Access / Customization) replaces the 15-card admin grid. `/admin` redirects to `/admin/costs`. Quotas (Tiers / Assignments / Overrides / Inspector / Events) collapses 5 sibling routes into a single tabbed page; Fine-Tuning (Access / Costs) collapses into one. "Back to Admin" link removed from 10 sub-pages. Cost summary cards restructured (title on its own row, icon as top-right corner accent) so "Cache Savings" / "Avg Cost/User" stop wrapping (#300)
+- **Compact model browse + manage views** — manage-models and the Bedrock/Gemini/OpenAI browse pages redesigned as one-line scannable rows with expand-on-demand detail; slim inline filter toolbar; inline enable/disable toggle so status changes don't require opening the form; `rounded-2xl` matches the chat input (#332)
+- **Compact tool catalog + form** — same redesign applied to admin tools list and create/edit form. Compact expandable rows; form flattened to shared list-page token set (`rounded-2xl`, `text-sm/6`, `text-2xl/8` header, `focus:ring-2`); no behavior changes (#335)
+- **Admin-managed user-menu links** — new admin domain so org admins can curate the SPA user-menu links without code changes. Each link is either an external URL (new tab) or an in-app modal with admin-authored Markdown. New `user-menu-links` DDB table; admin CRUD at `/admin/user-menu-links` (`require_admin`); public enabled-only read at `/user-menu-links` (cookie-aware `get_current_user_from_session`) (#298)
+- **Recoverable `max_tokens` truncation** — `MaxTokensReachedException` is classified specifically in the stream processor and emits a `max_tokens`-coded recoverable `stream_error` event. Continue is a resume, not a new turn: `continue_truncated` re-enters the agent loop with an empty-list prompt (assistant-prefill) bypassing quota / RAG / file-resolution. `lastTurnContinuable` marker on session metadata flows through `SessionMetadataResponse` so Continue reappears after a refresh. Frontend renders a compact inline "Response length limit reached" notice + Continue button (no verbose error bubble); continuation-aware message-map sync pins the partial and appends the continuation. `stream_error` is now an always-allowed parser event (#328)
+- **Model-aware adaptive thinking + `effort` knob** — `_shape_thinking_value` is now model-aware. Opus 4.6/4.7, Sonnet 4.6, and Mythos emit `{type: "adaptive", display: "summarized"}` (the explicit `display` keeps the reasoning trace visible — Opus 4.7 defaults `display` to `"omitted"`); older models keep `{type: "enabled", budget_tokens: N}`. New `effort` canonical inference param wired through `additional_request_fields.output_config.effort` (NOT `additionalModelRequestFields`). Wired through the admin model form and the user-facing chat settings panel as a new select control with server-side allowed-set gating. Generic `allowed` enum on `ModelParamSpec` so the per-model effort-tier difference (Sonnet 4.6 vs Opus 4.7) is data, not a model-family branch (#331)
+- **Pre-migration backup tool** — `scripts/backup-data/` produces a complete restore-friendly snapshot for a given `CDK_PROJECT_PREFIX`: all ~20 application DDB tables via `ExportTableToPointInTime`, user-content S3 buckets via `aws s3 sync`, full Cognito user pool config including identity providers and app clients with plaintext client secrets preserved, users / groups / group memberships, and best-effort AgentCore Memory events. Each run lands in a freshly-created versioned SSE-encrypted TLS-only `{prefix}-backup-{utc_timestamp}` bucket. `manifest.json` is the single source of truth for restore. Cognito password hashes are not exportable by AWS — documented prominently. Ephemeral session/state tables excluded by default. `workflow_dispatch` GitHub workflow wired via the existing OIDC composite action (#361)
+- **Live tool output streamed into the tool rail** during artifact authoring (#316)
+- **Markdown content-type support** in the artifact tool (#318)
+- **Configurable extra CSP `frame-ancestors`** for the artifact origin (#314)
+- **`<mcp-app-frame>` custom element + `postMessage` bridge** with origin- and type-enforcement (#346)
+- **Tool result renderer registry** — signal-backed `ToolRendererRegistryService` keyed by tool name replaces the implicit text/JSON/image switch baked into `ToolUseComponent`. The default renderer reproduces the prior markup verbatim — zero visible change. `calculator`, `fetch_url_content`, and `create_visualization` migrated as proof points. Foundation for the MCP Apps `<mcp-app-frame>` renderer (#339)
+- **Copy-to-clipboard button on chat code blocks** + Prism syntax-highlighting bundles for JavaScript, TypeScript, Python, and SQL alongside the existing C#/CSS bundles (#299)
+- **Autofocus chat input on session load and switch** so the user can type immediately without clicking. Assistant-preview empty state opts out via a new `autoFocus` input (#333)
+- **Denser session sidebar with skeleton + entry animation** — rows tighten from ~40px to ~32px (`py-2 → py-1.5`, `text-sm/6 → text-sm/5`); nested flex wrappers around the title removed; group gaps tightened. A 10-session list is ~25% shorter overall. Inactive items `font-normal`; active row `!font-medium` via `routerLinkActive` (#301)
+
+### ✨ Improved
+
+- **Spinners across admin / settings / fine-tuning / auth pages** — 24 loading spinners had been rendering as a uniform gray ring in dark mode (no visible motion); they now spin with the proper accent (#300)
+- **Admin shell wider with sidebar label wrapping fixed** (#305)
+- **User-menu links / in-app modals visually distinguished** in both modal preview and runtime rendering (#303)
+- **`mcp-sandbox` outer CSP + inner mount aligned** with the upstream `ext-apps` basic-host reference; blob iframe rendering, first-class block element, Angular 21-specific fixes (#352, #353)
+- **Dynamic per-resource CSP** for the sandbox proxy — CloudFront Function decodes a URL-encoded `?csp=` query param scoped to one resource and emits the per-request `Content-Security-Policy` header. Source loaded from `assets/mcp-sandbox/csp-function.js` with `frame-ancestors` JSON-injected at synth; substitution asserts the placeholder is present exactly once so a future refactor that loses it fails loudly at synth (#355)
+
+### 🐛 Fixed
+
+- **Critical:** `MaxTokensReachedException` surfaced as a generic leaky error (`...unrecoverable state... https://strandsagents.com/...`) and the only "recovery" re-sent the original prompt as a new user turn, so the model re-answered from scratch and re-truncated — an infinite loop. Continue is now a true resume (`continue_truncated` empty-list prompt, assistant-prefill on restored history) bypassing quota / RAG / file-resolution like the existing interrupt-resume path (#328)
+- **Opus 4.7 400 on `thinking.type="enabled"`** — Opus 4.7 rejects the legacy thinking shape; model-aware `_shape_thinking_value` now emits `{type: "adaptive"}` for Opus 4.6/4.7, Sonnet 4.6, Mythos. Without this fix, Opus 4.7 turns failed at the SDK boundary (#331)
+- **Float-typed `max_tokens` / `top_k` crashed boto3's Bedrock Converse client.** Untyped inference params (`Dict[str, Any]` from JSON) let a float reach the SDK, which rejects a float `maxTokens` with a hard validation error. Coerced to `int` at the single provider-translation chokepoint (covers fresh + resumed turns, all providers). The thinking-vs-`max_tokens` consistency guard previously used `isinstance(..., int)` and silently no-opped on float input; it now coerces first so an inconsistent request (`thinking >= max_tokens`) is rejected before reaching Anthropic. Model-ceiling cap protects against admin-configured `max_tokens` exceeding the model's hard limit (#329, #330)
+- **Silent mid-stream microVM reaping on long generations.** AgentCore's idle reaper requires an integer `time_of_last_update` field alongside `status`; when absent, the platform reaps the microVM at `idleRuntimeSessionTimeout` regardless of reported status (`bedrock-agentcore-sdk-python#471`). Inference-api's `/ping` now emits a fresh timestamp on every call as the documented mitigation. Status casing also corrected to match `PingStatus`. Workaround until async-task busy tracking lands and we can report `HealthyBusy` (#338)
+- **Frontend deploy bundles shipped the `'dev'` placeholder.** `scripts/stack-frontend/build.sh` invoked `ng build` directly, bypassing the npm `prebuild` lifecycle hook that runs `gen-version.js`. The user menu rendered "local" on `develop` and `main`. Build script now runs `gen-version.js` explicitly before the build (#336)
+- **Chart.js artifacts loaded via `cdn.jsdelivr.net` rendered blank.** The artifact-origin CSP only permitted scripts from `cdn.tailwindcss.com` and `esm.sh`. Widened script-src to `cdn.jsdelivr.net` and `unpkg.com`, kept byte-identical across the render Lambda `CSP_SCRIPT_SRC` env var and the system-prompt allowlist (#326)
+- **Admin user-menu-links resource fired a duplicate load request for non-admin users** — gated to admin-only (#315)
+- **Artifact card z-index escapes its message row on focus** — scoped with `isolation: isolate` (#323)
+- **`mcp-sandbox` CFN `Comment` overflowed AWS's 128-char cap** — twice, on the original RHP and the rebuild (#356, #357)
+- **`mcp-sandbox` CSP not URL-decoded in CloudFront Function** — decoded properly; `x-csp-debug` diagnostic header added during the investigation (#358) and removed once the fix landed (#359)
+- **Inner App iframe gained `allow-same-origin`** to match the upstream basic-host reference (#360)
+- **Docker build hard-fail from rotated `curl` apt pin.** Debian rotated `curl 8.14.1-2+deb13u2` out of the trixie apt index (superseded by `+deb13u3`); the exact pin made every App API / Inference API Docker build on `develop` fail with `E: Version '8.14.1-2+deb13u2' for 'curl' was not found`. Pin bumped (#327)
+- **Artifact env vars not passed to non-`ArtifactsStack` consumer workflows.** `validateConfig` runs on every stack synth (the `bin/` instantiates all enabled stacks), so consumer workflows need to pass `CDK_HOSTED_ZONE_DOMAIN`, `CDK_ARTIFACTS_ENABLED`, and `CDK_ARTIFACTS_CERTIFICATE_ARN` even though they don't synth `ArtifactsStack` directly. Five deploys failed on the develop merge before this fix (#307)
+- **`infrastructure-stack` tests asserted a stale DDB count.** `resourceCountIs(18)` went red when `user-menu-links` landed (19 tables). Replaced the magic number with an enumerated, justified table list (#350)
+
+### 🔒 Security
+
+- **Artifacts isolation.** `artifacts.{domain}` is a different cookie-jar host from the SPA. CSP `connect-src 'none'` — artifacts cannot make outbound network calls. Render-token JWTs are scoped to one `(artifact_id, version)` and are HMAC-signed with a Secrets-Manager-managed key. S3 versions are immutable: there's no `s3:DeleteObject` grant on the inference-api role
+- **MCP Apps isolation.** `mcp-sandbox.{domain}` is a separate origin from the SPA. Per-resource `frame-ancestors` CSP is emitted by a CloudFront Function on viewer-response. Inner App iframe carries `allow-same-origin` to match the basic-host reference. Explicit user consent (with reload persistence) gates first-time framing
+- **Dead Bearer-only auth removed from app-api (#297).** A sweep of `app_api/` for `Depends(get_current_user)`, `Depends(security)`, `Depends(verify_token)`, and manual `Authorization` header reads turned up exactly two routes still on Bearer auth, both in `chat/routes.py`. Dead Bearer paths removed; `POST /chat/agent-stream` is documented as intentionally Bearer for non-SPA callers (API-key tooling, scripts). All other app-api routes are cookie-based BFF auth post-beta.24
+
+### ⚠️ Breaking changes
+
+- **MCP Apps default-on.** `Defaults.MCP_APPS_HOST_ENABLED` flips false → true. To remain opt-in, set `AGENTCORE_MCP_APPS_HOST_ENABLED=false` in inference-api task env. If MCP Apps is enabled but `mcp-sandbox` isn't deployed, `ui_resource` events emit with empty `sandboxOrigin` and the SPA cannot frame the App (#349)
+- **App-api Bearer-only auth removed (#297).** External integrations calling `apis/app_api/` routes with `Authorization: Bearer` must switch to the API-key feature (`auth/api_keys/`, `X-API-Key`) before deploying beta.27. `POST /chat/agent-stream` remains Bearer-acceptable for non-SPA callers
+
+### 🏗️ Infrastructure
+
+- **New `ArtifactsStack`** (gated by `config.artifacts.enabled`) — DDB `user-artifacts` table, private S3 `artifacts-content` bucket, render Lambda, CloudFront on `artifacts.{domain}`, Route53 alias. Consumes `/artifacts/render-token-key-arn` SSM (published by `InfrastructureStack`); publishes `/artifacts/bucket-name`, `/artifacts/bucket-arn`, `/artifacts/table-name`, `/artifacts/table-arn`, `/artifacts/origin`. Requires `CDK_HOSTED_ZONE_DOMAIN`, `CDK_ARTIFACTS_CERTIFICATE_ARN` (must be in `us-east-1`)
+- **New `McpSandboxStack`** (gated by `config.mcpSandbox.enabled`) — S3 mount-page bucket, CloudFront on `mcp-sandbox.{domain}` with a CloudFront Function for dynamic per-resource CSP, Route53 alias. Publishes `/mcp-sandbox/origin` SSM, consumed by inference-api at runtime as `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN`. ACM cert must be in `us-east-1`
+- **New `UserMenuLinksTable`** in `InfrastructureStack` + `/admin/user-menu-links-table-name` and `/admin/user-menu-links-table-arn` SSM parameters (#298)
+- **New `ArtifactRenderTokenSecret`** in `InfrastructureStack` (Secrets Manager, AWS-managed encryption, `generateSecretString` 64-char) gated on `config.artifacts.enabled`. SSM `/artifacts/render-token-key-arn` publishes the ARN. Lives in `InfrastructureStack` (not `ArtifactsStack`) so app-api can read it without taking a stack-deploy-order dependency on `ArtifactsStack`
+- **Inference-api conditionally consumes `mcp-sandbox` SSM** when `config.mcpSandbox.enabled` is true. Mirrors the artifacts conditional-SSM pattern; two synth tests cover present/absent (#349)
+
+### 🔧 CI/CD
+
+- **Backup workflow** wired as `workflow_dispatch` against the existing OIDC composite action (#361)
+- **All five consumer workflows** now thread `CDK_HOSTED_ZONE_DOMAIN`, `CDK_ARTIFACTS_ENABLED`, `CDK_ARTIFACTS_CERTIFICATE_ARN` so synth-time validation doesn't fail on workflows that don't synth `ArtifactsStack` directly (#307)
+- **Frontend build** runs `gen-version.js` explicitly before `ng build` so deployed bundles bake the real version (#336)
+- **`infrastructure/test/infrastructure-stack.test.ts`** enumerates the 19 DDB tables instead of asserting `resourceCountIs(18)` (#350)
+- **Docker `curl` pin** bumped to `8.14.1-2+deb13u3`; pin policy documented as "follow Debian point-releases" (#327)
+
+### 📦 Dependency upgrades
+
+- `bedrock-agentcore` 1.6.4 → 1.9.1 (with coupled `boto3` 1.42.96 → 1.43.9, `botocore` / `s3transfer` following). CHANGELOG audited end-to-end: no breaking changes for our memory/identity usage. Validated with a read-only dev smoke test (memory `get_memory_strategies` / `retrieve_memories` + identity `list_workload_identities`) and the full backend suite. Test-infra side effect: `botocore` 1.43 newly reads `Credentials.account_id` during endpoint construction; on a `RefreshableCredentials` (SSO) object that forces a refresh → `GetRoleCredentials`, which `moto` does not implement. Combined with `backend/src/.env`'s `AWS_PROFILE` leaking via `load_dotenv(override=True)`, this turned the suite red in an order-dependent way. Added per-test autouse scrub fixtures for `AWS_PROFILE` and the `DYNAMODB_*` / `COGNITO_*` config families, mirroring the existing `_clear_skip_auth_env` fixture for the same `.env`-bleed bug class (#337)
+- `strands-agents` 1.39.0 → 1.40.0. Gated on a token-count audit and a compaction double-fire check. The `use_native_token_count` default flip (true → false, Strands PR #2284) is inert for our token accounting: the flag gates only `BedrockModel.count_tokens()`, which Strands calls solely from `_estimate_input_tokens()` to populate `projected_input_tokens` on `BeforeModelCallEvent`. Our cost-badge / context-% / compaction-trigger plumbing reads from `inputTokens` + `cacheReadInputTokens` + `cacheWriteInputTokens` directly, so the flip is transparent (#340)
+
+### 🧪 Test Coverage
+
+- Backend + frontend regression coverage for `MaxTokensReachedException` classification, the `continue_truncated` resume path, `stream_error` always-allowed parser gating, and the `lastTurnContinuable` refresh-survival marker round-trip (#328)
+- Backend regression coverage for adaptive thinking shape per model marker, `effort` allowed-set gating, and the float→int coercion path on `max_tokens` / `top_k` (#329, #330, #331)
+- `infrastructure/test/mcp-sandbox-stack.test.ts` (264 lines) — synth + CFN unit coverage including the placeholder-substitution invariants (#343, #355)
+- `infrastructure/test/mcp-sandbox-csp-function.test.ts` (357 lines) — `frame-ancestors` quote-escaping, including `'none'` (which would otherwise produce `''none''`, a JS syntax error) (#355)
+- `infrastructure/test/inference-api-stack.test.ts` — two synth cases gating `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` wiring on `config.mcpSandbox.enabled` (#349)
+- `infrastructure/test/cors.test.ts` (53 lines) — new CORS test surface
+- `infrastructure/test/infrastructure-stack.test.ts` — 19 DDB tables enumerated with one-line justifications instead of count assertion (#350)
+- Frontend specs: `mcp-app-bridge`, `mcp-app-card-state.service`, `mcp-app-consent.service`, `mcp-app-message.service`, `mcp-app-proxy.service`, `mcp-app-state.service`, `proxy-url`, `artifact-http.service`, `artifact-state.service`, `artifact-source.component`
+
+### 📚 Docs
+
+- `docs/kaizen/scoping/mcp-apps-host-renderer.md` — initial scoping document for the MCP Apps Host Renderer initiative (#296)
+- `step-04-deploy.md` — "Register an MCP-Apps-capable MCP server" section with `budget-allocator-server` example + committed `ToolCreateRequest` payload (no auto-seed; registration stays an explicit per-env opt-in) (#349)
+- `step-05-verify.md` — manual e2e dogfood scenario exercising all six Definition-of-Done MCP Apps interactions (#349)
+- `docs/artifacts/...` — corrected cert-reuse guidance for subdomain primaries (#308)
+- `CLAUDE.md` — `ui_resource` SSE row + deploy-order line updated for the live flag and conditional `mcp-sandbox` SSM consumption (#349)
+- `.env.example` — documents `BFF_COOKIE_DATA_KEY_SECRET_ARN` (carry-over from beta.25) (#276)
+- Architecture rules surfaced for Copilot CLI: 3-package import boundary, inference-api Runtime 404 trap, deploy order, SSE error model. Points to `.kiro/steering` and `.claude/skills` for deeper dives (#361)
+- Forward-looking A2A guard: if exposing an A2A server, `AgentCard.capabilities` must include `streaming=True` or clients hang ~40 min (`sample-strands-agent-with-agentcore` commit `50c9112`) (#338)
+- Kaizen-2026-05-15 hygiene — replaced dead source URLs in `kaizen-research` (the `bedrock/whats-new/` 404, the `docs.claude.com` claude-code release-notes 301→404, and the inactive `anthropics/courses`); fixed `aws/amazon-bedrock-agentcore-{sdk-python,starter-toolkit}` repo-slug typos to the correct `aws/bedrock-agentcore-*` slugs (#338, #341, #302, #304)
+
+## [1.0.0-beta.26] - 2026-05-13
+
+Small focused release. Multi-sheet XLSX support for the spreadsheet analysis tool, async refactor of the spreadsheet file-lookup path, user default model preference applied at chat time, nightly E2E pipeline restored, and upstream contribution governance (PRs restricted to collaborators, Dependabot version-update PRs disabled).
+
+### 🚀 Added
+
+- Multi-sheet XLSX support in the `analyze_spreadsheet` tool. Each sheet converts to its own deterministic CSV (`stem.sheetname.csv`) with a primary alias (`stem.csv`) for the first sheet. Defensive caps via env vars `MAX_SHEETS_TO_CONVERT` and `MAX_ROWS_PER_SHEET` prevent latency blowout and context-window exhaustion on pathological workbooks. Skipped/truncated sheets are surfaced to the model with markdown footers documenting per-sheet conversion status
+- `_sanitize_sheet_name()` produces filesystem-safe deterministic CSV filenames; `_parse_sheet_inventory()` extracts structured sheet metadata from bootstrap stdout without `eval`-style evaluation; `_safe_int()` for defensive integer parsing; `_format_sheet_note()` for the per-call markdown footer
+
+### ✨ Improved
+
+- `analyze_spreadsheet`, `list_spreadsheets`, `_find_file`, `_get_kb_files`, and `_get_session_files` are now `async def`. Every DynamoDB call is offloaded via `asyncio.to_thread` so the event loop keeps scheduling other coroutines for the full round-trip duration
+- `inference_api/chat/routes.py::_build_tabular_inventory` is now `async` and awaits the file-operation calls directly, replacing the nested `asyncio.run` + thread pool executor pattern that could deadlock under concurrent chat load. Closes the regression introduced in #260
+- `analyze_tool` code generation stashes the filename as a `_FNAME` variable inside the generated snippet to prevent f-string interpolation conflicts when filenames contain quotes or special characters (`repr()` indirection in `_build_preview_code`)
+- `_clean_stderr` now respects the `MAX_ERROR_CHARS` budget strictly, accounting for ellipsis length
+
+### 🐛 Fixed
+
+- User-saved default model preference (`defaultModelId` in user settings) is now applied at chat time when the request doesn't specify a `model_id`. Previously the persisted preference was silently ignored and chat fell back to the hardcoded factory default. RBAC is re-checked on the resolved default so that a model whose access has since been revoked cannot be used. A missing user-settings table now surfaces as `503` instead of silently dropping the user's choice. Fixes #161
+- Nightly E2E pipeline failures from cookie/JWT validation against the dynamic CloudFront URL, missing CDK certificate ARN in the nightly job, agent test timeouts on multi-tool turns, and cross-region Bedrock model routing flakes (switched the suite from global to US-region model IDs) (#290)
+
+### 📚 Docs
+
+- `backend/src/.env.example` — BFF cookie encryption documentation updated to reflect the beta.25 shift from direct KMS cookie encryption to Secrets Manager-mediated approach. Documents the new `BFF_COOKIE_DATA_KEY_SECRET_ARN` variable, the SHA-256 cross-task derivation, and the SSM parameter path with example ARN format
+
+### 🔧 CI/CD
+
+- Nightly E2E pipeline restored after multi-attempt fix (#290): CloudFront URL handling, CDK certificate ARN wiring, agent test timeout bumps, US-region Bedrock model IDs, rebase on develop to pick up #248
+
+### 🛡️ Governance
+
+- **CONTRIBUTING.md** documents that pull requests are restricted to approved collaborators (GitHub "Collaborators only" setting). Issues remain open to everyone; maintainers triage and either implement upstream or coordinate next steps with the reporter. Adds a collaborator checklist (link tracking issue, single logical change per PR, DCO sign-off, green CI, respect backend import boundaries enforced by `backend/tests/architecture/test_import_boundaries.py`) (#293)
+- **`.github/dependabot.yml`** — `open-pull-requests-limit: 0` across all four ecosystems (pip, frontend npm, infrastructure npm, github-actions). Disables scheduled version-update PRs; security updates are unaffected and will still be raised when a CVE is published. Existing groups, labels, schedules retained for easy reversal (#293)
+
+### 🧪 Test Coverage
+
+- `backend/tests/agents/builtin_tools/spreadsheet_analysis/` — 2,800+ lines of new tests across 8 files. Notable: `test_analyze_tool_integration.py` (779 lines, multi-sheet XLSX + CSV workflows end-to-end), `test_sheet_inventory.py` (307 lines, parser robustness against malformed bootstrap output), `test_clean_stderr.py` (202 lines, strict error-char budget), `test_build_preview_code.py` (127 lines, filename escaping), plus `test_helpers.py`, `test_find_file.py`, `test_list_spreadsheets.py`, `test_strip_first_row.py`
+- `frontend/ai.client/src/app/session/services/model/model.service.spec.ts` (56 lines) — default-model resolution flow
+- `frontend/ai.client/src/app/settings/pages/chat-preferences/chat-preferences-settings.page.spec.ts` (101 lines) — Chat Preferences settings UI
+
+## [1.0.0-beta.25] - 2026-05-11
+
+Production-readiness fix for the BFF Token Handler shipped in beta.24. Fixes three production-breaking bugs introduced by beta.24: event-loop-blocking sync boto3 on every cookie-bearing request, per-process AES-256 keys that can't round-trip cookies across ECS tasks, and an in-process-only refresh lock that races Cognito rotation across replicas. Also ships PDF thumbnails, rich attachment previews, spreadsheet analysis tools, centralized 401 handling, and a `SKIP_AUTH` local-dev bypass.
+
+### 🐛 Fixed
+
+- **Critical (beta.24 regression):** `SessionRefreshMiddleware` ran sync boto3 (DynamoDB + Cognito) on the uvicorn event loop so Angular's ~8-endpoint page-load fan-out produced ~16 serialized blocking AWS calls per user per minute. Observable as ALB 504s, 15.6s p-max `TargetResponseTime` at 0.7% CPU, `/files/quota` outliers reaching ~80s. Every boto3 call in `SessionRepository` and `CognitoRefreshClient.refresh` now offloads via `asyncio.to_thread`; `_resolve_session` is wrapped in a per-session `asyncio.Future` single-flight so N concurrent same-session callers share one loader invocation; `_maybe_slide` dispatches `touch_last_seen` as a detached `asyncio.Task` (with strong reference on the middleware to prevent GC); `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS` raised 60s → 300s to de-align from the 60s refresh-leeway window (#264)
+- **Critical (beta.24 regression):** `CookieCodec` called `kms:GenerateDataKey` on first use per process, so each app-api task minted its own random AES-256 key. Once `desiredCount` went above 1, cookies sealed on Task A failed as `bad seal` on Task B (~50% of requests). Data key is now generated once via Secrets Manager `generateSecretString` (44-char, ~261 bits entropy) encrypted at rest with the existing `BFFCookieSigningKey` CMK; `CookieCodec._ensure_cipher` reads the secret and derives the AES-256 key via SHA-256; `kms:GenerateDataKey` dropped from the runtime task role (#273, #274)
+- **Critical (beta.24 regression):** In-process `single_flight` and `get_session_lock` only coalesce same-session callers within one Python process. Under multi-replica, two tasks could each call `cognito-idp:initiate_auth` with the same refresh token; Cognito rotates on the winner and the loser silently logs the user out. New DDB conditional-write lock (`try_acquire_refresh_lock` / `release_refresh_lock` on `BFFSessionsTable`, reusing the existing `dynamodb:UpdateItem` grant) elects exactly one leader fleet-wide; followers poll the row and adopt the leader's tokens. `update_tokens` gains strict-owner condition (`refresh_lock_owner = :owner`) that atomically `REMOVE`s the lock attrs on successful persist and rejects stale-leader stomps via `ConditionalCheckFailedException`. Absolute-lifetime guard added ahead of lock acquisition so we don't burn a Cognito refresh on a row that's about to TTL-evict (#273, #275)
+- Per-message cost double-count on tool-use turns — Strands' `AgentResultEvent` cumulative `accumulated_usage` overwrote the last assistant message's per-call usage via `.update()`. Route the result-extracted cumulative on the `metadata_summary` turn-summary track instead of `metadata` (#270)
+- Context-% inflation within a tool turn — Bedrock reports each per-LLM-call `inputTokens` as the full context sent on that call, so Strands' summed `accumulated_usage` over-reports. `stream_coordinator` no longer accumulates `metadata_summary` into `accumulated_metadata`; per-call `metadata` last-write-wins so the value equals the most recent call's full input = current context. Summed across `inputTokens` + `cacheReadInputTokens` + `cacheWriteInputTokens` since `AgentResult.context_size` under-reports by 99%+ under prompt caching (#270)
+- `LatencyMetrics.time_to_first_token` changed from `int` (placeholder 0) to `Optional[int]` (placeholder `null`) — a real TTFT can't be 0ms and aggregations need to distinguish absence from a real value (#270)
+- Session-expired mid-session left users stranded with a generic toast or no feedback on SSE. Every 401 now flows through `SessionService.handleUnauthorized()`, which dedupes concurrent calls and navigates once with preserved `returnUrl` (#277)
+- Session loss not surfaced until the next HTTP call failed. Added cookie-presence fast-path (JS-readable `__Host-bff_csrf` cookie absence implies `__Host-bff_session` also gone) and visibility re-probe on tab refocus (#277)
+- Login & first-boot lava-lamp backdrop dark-mode CSS never applied on cold load — `html.dark .X` selectors don't match under Angular's emulated view encapsulation, and `ThemeService` was never injected in the pre-auth tree. Switched to `:host-context(html.dark) .X` and forced `ThemeService` construction via `provideAppInitializer` (#271)
+- XLSX→CSV filename mismatches in the Code Interpreter sandbox triggered retry loops. Added targeted error hints, tolerant filename matching for CSV↔XLSX aliasing, and schema footer preservation on errors
+
+### 🚀 Added
+
+- Server-rendered PDF page-1 thumbnails on attachment cards. New `ThumbnailRenderer` MIME-dispatcher (PDF today via `pypdfium2`, lazy-cached `_thumb.png` sibling in S3, render runs in `loop.run_in_executor`); new `GET /files/{upload_id}/thumbnail` returning a short-lived presigned URL; single-file + session-cascade deletes clean up thumbnails. Frontend: `FileUploadService.getThumbnail()` returns a typed `ready` / `unsupported` / `unavailable` result; PDF badge renders `object-cover` (#263)
+- Rich previews in user messages — iMessage-style image mosaic (1-bubble / 2-col / 1+2 split / 2×2 / 5+ with `+N` overlay) with full-screen lightbox + arrow-key navigation; document-style cards for non-images with tinted header + folded corner + content excerpt. New `GET /files/{upload_id}/preview-url` and `GET /files/{upload_id}/text-snippet` (first 2KB UTF-8) (#254)
+- Inline markdown preview for `.md` files in attachment cards; full-screen modal viewer via `ngx-markdown` instead of opening raw source in a new tab (#262)
+- Spreadsheet analysis tools — `list_spreadsheets` enumerates CSV/XLSX across KB + attachments (with size + MIME metadata); `analyze_spreadsheet` runs Python analysis in Code Interpreter with schema detection (skiprows probing), cleaned pandas/numpy tracebacks, and 10K/600-char output/error truncation. Injected per-request via `extra_tools` (#f88ce7ec, #0ab90bb1)
+- `SKIP_AUTH=true` local-dev bypass in `apis.shared.auth.dependencies` returns a fake admin user from all three auth dependencies. Optional tuning: `SKIP_AUTH_ROLES`, `SKIP_AUTH_USER_ID`, `SKIP_AUTH_EMAIL`. Startup guard in `app_api/main.lifespan` refuses to boot when `SKIP_AUTH=true` is paired with any non-localhost entry in `CORS_ORIGINS`. Inference-api intentionally not bypassed (all SPA traffic flows through app-api) (#272)
+- New CI workflow `.github/workflows/skip-auth-guard.yml` greps CDK source, workflow files, and Dockerfiles for `SKIP_AUTH=true` / `SKIP_AUTH: true` patterns and fails the build if any leak into deployed config. SHA-pinned `actions/checkout`, `ubuntu-24.04` (#272)
+- `SessionRepository.try_acquire_refresh_lock(session_id, owner, lock_ttl_seconds)` and `release_refresh_lock(session_id, owner)` for cross-task refresh coalescing (#273, #275)
+- `apis/shared/sessions_bff/single_flight.py` — new `resolve_once(session_id, loader_coro_factory)` primitive for in-process coalescing of the session-resolve path (#264)
+- CAUTION comment in `stream_coordinator` documenting that `AgentResult.context_size` / `EventLoopMetrics.latest_context_size` return only `inputTokens`, under-reporting by 99%+ under prompt caching (#270)
+
+### ✨ Improved
+
+- File metadata utilities (`backend/src/apis/shared/files/models.py`) for consistent attachment handling — `FileMetadata`, `FileContent`, size formatting, MIME-type inference — shared between routes and the chat-input component
+- Spreadsheet-analysis system prompt clarifies filename vs. sandbox-path handling; tool docstrings expanded with critical guidance on retries
+- Stream processor error handling for Code Interpreter responses is more defensive
+- Updated `test_session_refresh_preservation.py`'s `InstrumentedTable` to differentiate lock-acquire / token-persist / slide writes so `update_item_side_effect` injection only fires on the persist path (preserving original test intent) (#273)
+
+### 🔒 Security
+
+- `kms:GenerateDataKey` and `kms:DescribeKey` dropped from the app-api runtime task role (least privilege). Only `kms:Decrypt` remains, invoked by Secrets Manager on the caller's behalf when reading the CMK-encrypted `BFFCookieDataKeySecret` (#274)
+- `SKIP_AUTH=true` gated by boot-time CORS-origin allowlist + CI guard workflow; fails closed for any deploy target we haven't anticipated instead of blocklisting known cloud env vars (#272)
+
+### ⚡ Performance
+
+- `SessionRefreshMiddleware` resolve path now coalesces Angular's ~8-endpoint page-load fan-out to 1 `get_item` and 0 `update_item` on the critical path (previously ~16 serialized blocking AWS calls per user per minute). Response latency independent of `touch_last_seen` DDB latency after the `_maybe_slide` fire-and-forget refactor (#264)
+- `CookieCodec` initialization went from a `kms:GenerateDataKey` round trip on every cold start to a one-shot Secrets Manager `GetSecretValue` + local SHA-256. No more per-task cold-start KMS call (#274)
+- Thumbnail render runs in `loop.run_in_executor` so the request worker isn't blocked; lazy `_thumb.png` sibling in S3 means steady-state thumbnails are a HEAD + presign, not a render (#263)
+
+### 🏗️ Infrastructure
+
+- New `BFFCookieDataKeySecret` (Secrets Manager, encrypted with `BFFCookieSigningKey` CMK); SSM parameter `/${projectPrefix}/auth/bff-cookie-data-key-secret-arn` publishes the ARN
+- App-api task role: added `secretsmanager:GetSecretValue` on the new secret; removed `kms:GenerateDataKey` and `kms:DescribeKey` on `BFFCookieSigningKey`; kept `kms:Decrypt`
+- `appApi.desiredCount` raised 1 → 2 — concurrency slack so a single blocked event loop can no longer halt all ingress
+
+### 📦 Dependencies
+
+- Backend: `strands-agents` 1.37.0 → 1.39.0, `strands-agents-tools` 0.5.1 → 0.5.2, new: `pypdfium2` (#265, #263)
+
+### 🧪 Test Coverage
+
+- `tests/apis/shared/middleware/test_session_refresh_bug_condition.py` (12 cases) — encodes the seven sub-conditions of the event-loop-blocking bug as Hypothesis properties. Fails on unfixed code (by design); passes on fixed code (#264)
+- `tests/apis/shared/middleware/test_session_refresh_preservation.py` (19 cases) — locks in 11 preservation invariants that must remain unchanged for non-buggy inputs (#264)
+- `tests/apis/shared/sessions_bff/test_single_flight.py` (6 cases) — primitive-level coverage for the new `resolve_once` module (#264)
+- `tests/apis/shared/sessions_bff/test_session_refresh_cross_task.py` (480 lines) — two-task integration coverage over moto DDB for the cross-task refresh lock, follower-polling/adoption, TTL recovery, headline invariant that two tasks racing in parallel call Cognito at most once (#273)
+- 8 new repository tests for the lock primitive (acquire on unlocked row, contention blocks peer, TTL recovery, distinct-session isolation, release-by-owner-only, atomic clear on token persist, condition fails when peer owns the lock, phantom-row-prevention on acquire, strict-owner release condition, absolute-lifetime guard ahead of refresh) (#273, #275)
+- `tests/agents/main_agent/streaming/test_per_message_cost_attribution.py` — three regression cases for the `metadata` vs `metadata_summary` contract; two parametrized cases for `stream_coordinator` current-context semantics including all-three-buckets-summed under cache-read/write (#270)
+- `tests/costs/test_calculator.py` — 26 cases of direct coverage for `CostCalculator` (per-bucket pricing, cache scenarios against Sonnet 4.5 rates, defensive missing-key / None handling, `calculate_cache_savings`, `validate_*` predicates) (#270)
+- `tests/auth/test_skip_auth.py` — `SKIP_AUTH` dependency-bypass + env-override coverage, startup guard allowlist behavior, skip-auth-guard.yml regex matches (#272)
+- Session-wide autouse fixture in `tests/conftest.py` scrubs `SKIP_AUTH_*` env so developer `.env` bleed doesn't silently turn on the bypass in test runs (#272)
+- Infrastructure-stack tests: dropped bootstrap-custom-resource assertions; added negative lock that no `AwsCustomResource` emits `kms:GenerateDataKey` / `secretsmanager:PutSecretValue`; positive assertion on `generateSecretString` shape (44-char, no punctuation, no space); fixed two pre-existing stale resource-count assertions (16→18 DDB tables, 3→6 secrets) (#273, #274)
+
 ## [1.0.0-beta.24] - 2026-05-06
 
 ### 🚀 Added
diff --git a/CLAUDE.MD b/CLAUDE.MD
index f3f0f754..356e2143 100644
--- a/CLAUDE.MD
+++ b/CLAUDE.MD
@@ -32,7 +32,7 @@ npx cdk deploy --all
 
 ## Key Conventions
 
-- **Deploy order:** Infrastructure → Gateway → Inference API → App API → Frontend (App API reads `runtime-workload-identity-name` from SSM, published by Inference API)
+- **Deploy order:** Infrastructure → (Gateway, RAG Ingestion, SageMaker Fine-Tuning, Artifacts, MCP Sandbox — parallel-safe) → Inference API → App API → Frontend (App API reads `runtime-workload-identity-name` from SSM, published by Inference API; Inference API + App API + Frontend conditionally consume `/{prefix}/artifacts/*` SSM params when `CDK_ARTIFACTS_ENABLED=true`; Inference API conditionally consumes `/{prefix}/mcp-sandbox/origin` into `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` when `CDK_MCP_SANDBOX_ENABLED=true`)
 - **Admin endpoints** go under `/admin/<domain>/`, user-facing under `/<domain>/`
 - **Errors stream as assistant messages** via SSE (not HTTP error codes)
 - **Signal-based state** throughout frontend (`signal()`, `computed()`)
@@ -47,6 +47,7 @@ npx cdk deploy --all
 | `content_block_start/delta/stop` | Streaming content |
 | `message_stop` | End of message |
 | `tool_use` / `tool_result` | Tool invocation and result |
+| `ui_resource` | MCP App UI for a tool result (SEP-1865) — payload `{type, toolUseId, resourceUri, html, mimeType, csp, permissions, sandboxOrigin}`, emitted right after the correlated `tool_result` when the tool declared a `ui://` resource. HTML fetched server-side via `resources/read` and inlined; `sandboxOrigin` is the proxy.html origin the SPA frames it in (empty unless the mcp-sandbox stack is deployed — inference-api consumes its SSM origin only when `CDK_MCP_SANDBOX_ENABLED=true`; an empty origin means the SPA cannot frame the App). Gated by `AGENTCORE_MCP_APPS_HOST_ENABLED` (default true since PR #7; set `=false` to opt an environment out) |
 | `stream_error` | Conversational error |
 | `oauth_required` | External MCP tool needs user consent — payload `{providerId, authorizationUrl}`, one event per provider emitted after `message_stop` |
 | `compaction` | Backend rolled older turns into a summary on this turn — payload `{previousCheckpoint, newCheckpoint, summarizedTurns, inputTokens}`, emitted after the final `metadata` event so the badge updates first, before `done` |
@@ -61,6 +62,8 @@ npx cdk deploy --all
 | MCP + SigV4 | Cloud Lambda (Gateway) | AWS SigV4 |
 | A2A | Cloud Runtime | AgentCore auth |
 
+Today A2A is **client-only** — `A2AAgentConfig` (`apis/shared/tools/models.py`) describes remote agents we call out to; we do not yet expose an A2A server / `AgentCard`. **When the first A2A server construct lands** (Strands `agent.to_a2a()`, an `A2AServer`, or a hand-built `AgentCard`), its advertised `capabilities` MUST include `streaming=True`. Without it the A2A SDK client silently falls back to non-streaming, never receives a `completed` event, and hangs until its ~40-minute timeout (ref-repo `aws-samples/sample-strands-agent-with-agentcore` commit `50c9112`).
+
 ## Cross-Package Contracts
 
 - Backend route handlers define the API shape; frontend TypeScript interfaces must match
@@ -76,7 +79,8 @@ npx cdk deploy --all
 | New admin endpoint | `backend/src/apis/app_api/admin/<domain>/` |
 | New agent tool | `backend/src/agents/main_agent/tools/` + register in `__init__.py` |
 | New Angular page | `frontend/ai.client/src/app/<feature>/` |
-| New CDK stack | `infrastructure/lib/<stack-name>-stack.ts` |
+| New CDK stack | `infrastructure/lib/<stack-name>-stack.ts` (also: register in `test/stack-dependencies.test.ts` with a tier, add scripts in `scripts/stack-<name>/`, add a workflow in `.github/workflows/`, update `step-04-deploy.md`) |
+| New Lambda for an infra stack | `backend/src/lambdas/<lambda-name>/` (one folder per Lambda; not part of the `apis/` import boundary) |
 | Shared backend code | `backend/src/apis/shared/<domain>/` |
 
 ### Inference API boundary
@@ -87,6 +91,14 @@ The `inference-api` runs inside an AgentCore Runtime container. The runtime data
 
 If you're tempted to add to inference-api because of an existing route there (`converse_router`, `voice_router`), don't use them as templates without confirming their access path — they predate this rule and may rely on bypasses (API key, WebSocket upgrade, direct container reach in environments where the runtime isn't the only path).
 
+### Auth dependency on app_api routes
+
+The SPA sends an httpOnly session cookie — not `Authorization: Bearer`. A route that declares a bare Bearer-only dependency on the SPA-facing surface causes a 401 → centralized redirect loop the moment the SPA hits it.
+
+**Rule:** New routes under `apis/app_api/` use `Depends(get_current_user_from_session)` from `apis.shared.auth.dependencies` for user authentication. Admin routes use `Depends(require_admin)` (which chains through the same cookie dependency). The only exceptions are the API-key feature (`auth/api_keys/`, uses `X-API-Key`) and voice mode (`voice/`, uses a voice-ticket cookie) — both handle auth on their own terms and should not be used as templates for ordinary user routes.
+
+Do not reintroduce a Bearer-only `Depends(...)` on any user-facing route. If you find one in older code, migrate it to `get_current_user_from_session`.
+
 ## Debugging Quick Reference
 
 - **Tool not appearing:** Check `__init__.py` export, RBAC permissions, `enabled_tools`, ToolRegistry
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 18a4b50c..6ecfe347 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,5 +1,39 @@
 # Contributing to AgentCore Public Stack
 
+## Contribution Policy
+
+AgentCore Public Stack is maintained by Boise State University as a reference
+implementation for academic and public-sector AgentCore deployments. It is
+source-available under the PolyForm Noncommercial License 1.0.0 (see
+[`LICENSE`](./LICENSE)).
+
+### Pull requests are restricted to approved collaborators
+
+To keep the reference architecture coherent and to let downstream deployments
+stay in sync with a single, well-known upstream, this repository uses GitHub's
+**"Collaborators only"** pull request setting. Only users with Write access or
+higher can open a pull request.
+
+### Reporting issues and proposing changes
+
+If you are deploying this stack and find a bug, regression, or documentation
+gap, please open a GitHub issue — issues are open to everyone. A maintainer
+will triage the report, and if the change belongs upstream we will either
+implement it or coordinate with the reporter on next steps.
+
+### For collaborators
+
+- Link the tracking issue in the PR description so changes stay discoverable.
+- Keep each PR focused on a single logical change.
+- Sign off your commits with `git commit -s` (Developer Certificate of Origin).
+- Make sure CI is green before requesting review.
+- Respect the backend import boundaries enforced by
+  `backend/tests/architecture/test_import_boundaries.py` — `app_api`,
+  `inference_api`, and `agents/` are independent consumers of `apis.shared`
+  and must not import from each other.
+
+---
+
 ## Prerequisites
 
 - **Node.js** 20+ (for frontend and infrastructure)
diff --git a/README.md b/README.md
index c9db526f..e0089660 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@
 **An open-source, production-ready Generative AI platform for institutions**
 *Built by Boise State University, designed for everyone.*
 
-[![Release](https://img.shields.io/badge/Release-v1.0.0--beta.24-6366f1?style=flat&logo=github&logoColor=white)](RELEASE_NOTES.md)
+[![Release](https://img.shields.io/badge/Release-v1.0.0--beta.28-6366f1?style=flat&logo=github&logoColor=white)](RELEASE_NOTES.md)
 [![Nightly](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/nightly.yml/badge.svg)](https://github.com/Boise-State-Development/agentcore-public-stack/actions/workflows/nightly.yml)
 
 ![Python](https://img.shields.io/badge/Python-3.13+-3776AB?style=flat&logo=python&logoColor=white)
@@ -204,6 +204,7 @@ The fastest path to production is the **GitHub Actions pipeline**, which automat
 |-----------|-------------|---------|
 | Networking | VPC, ALB, Security Groups | Isolated network with load balancing |
 | Fine-Tuning *(optional)* | SageMaker, S3, DynamoDB | Model training, batch inference, artifact storage |
+| Artifacts *(optional)* | DynamoDB, S3, CloudFront, Lambda | Iframe-isolated rendering for agent-generated HTML/code artifacts |
 | RAG Ingestion | Lambda, S3 | Document ingestion for retrieval-augmented generation |
 | Inference API | Bedrock Agentcore | Agent orchestration with Bedrock |
 | App API | ECS Fargate | Authentication, admin, session management |
@@ -248,7 +249,8 @@ agentcore-public-stack/
 │       └── services/               # State management
 ├── infrastructure/                  # AWS CDK stacks
 │   └── lib/                         # Infra, App API, Inference API, Frontend,
-│                                    # Gateway, RAG Ingestion, SageMaker Fine-Tuning
+│                                    # Gateway, RAG Ingestion, SageMaker Fine-Tuning,
+│                                    # Artifacts
 └── .github/
     ├── workflows/                   # CI/CD pipelines
     └── docs/deploy/                 # Deployment guides
@@ -260,7 +262,7 @@ agentcore-public-stack/
 
 See [RELEASE_NOTES.md](RELEASE_NOTES.md) for the full changelog, including new features, bug fixes, platform upgrades, and deployment notes for each release.
 
-**Current release:** v1.0.0-beta.24
+**Current release:** v1.0.0-beta.28
 
 ---
 
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
index 46d94619..bd182350 100644
--- a/RELEASE_NOTES.md
+++ b/RELEASE_NOTES.md
@@ -1,13 +1,699 @@
-# Release Notes — v1.0.0-beta.24
+# Release Notes — v1.0.0-beta.27
 
-**Release Date:** May 6, 2026
-**Previous Release:** v1.0.0-beta.23 (April 29, 2026)
+**Release Date:** May 20, 2026
+**Previous Release:** v1.0.0-beta.26 (May 13, 2026)
 
 ---
 
 ## Highlights
 
-This release lands the **BFF Token Handler** — a ground-up rewrite of the SPA's auth surface. `localStorage` Bearer tokens are replaced with server-side Cognito session storage keyed by an opaque session id in a KMS-sealed AES-GCM cookie, the public PKCE Cognito client is decommissioned in favor of a confidential client whose secret never leaves the server, and same-origin `/api/*` routing via CloudFront enables `__Host-` cookies, double-submit CSRF, and eliminates the CORS preflight from every chat turn. **Voice mode returns** via a WebSocket-ticket proxy on app-api. The chat view gains a **per-conversation cost + context-window badge** with write-time aggregation, and **context compaction events** now surface inline with refresh-survival. Anthropic **extended thinking** is wired end-to-end via per-model inference parameters. The backend finishes its architecture cleanup: cost, tools, storage, and API-keys modules now live under `apis.shared` with AST-enforced import boundaries.
+The largest release since the BFF cutover. Beta.27 lands two new user-visible surfaces, both built on top of brand-new CDK stacks, plus a major admin redesign and a handful of inference-API correctness fixes.
+
+- **Artifacts** — the agent can now produce versioned, iframe-isolated HTML, Markdown, and code artifacts that render in a docked side panel beside the chat. Backed by a new `ArtifactsStack` (S3 + DynamoDB + render Lambda + CloudFront on `artifacts.{domain}`) and short-lived JWT render tokens minted by app-api.
+- **MCP Apps host renderer** — third-party MCP servers can ship UI alongside their tools. The agent advertises a UI extension on `initialize`, fetches `ui_resource` payloads via `resources/read`, and the SPA frames them in a sandboxed `<mcp-app-frame>` over a strict CSP, with an app-initiated `tools/call` proxy and explicit user consent. Backed by a new `McpSandboxStack` (CloudFront origin on `mcp-sandbox.{domain}` with dynamic per-resource CSP via a CloudFront Function). Default-on this release.
+- **Admin shell redesign** — the 15-card admin grid is replaced with a persistent grouped sidebar, and dense list redesigns for models and tools turn cards into compact expandable rows. Quotas and Fine-Tuning collapse from seven sibling routes into two tabbed pages.
+- **Recoverable `max_tokens` truncation** — what used to be a leaky, infinite-looping `MaxTokensReachedException` is now an inline "Response length limit reached" notice with a Continue button that resumes the truncated turn instead of resending the prompt. Survives a page refresh.
+- **Model-aware adaptive thinking** — Opus 4.7's 400 on `thinking.type=enabled` is fixed: Opus 4.6/4.7, Sonnet 4.6, and Mythos now emit `{type: adaptive, display: summarized}` and depth is governed by a new admin- and user-configurable `effort` knob. Older models keep the legacy `enabled` shape.
+- **`/ping` reaper fix** — fixes silent mid-stream microVM reaping by emitting the integer `time_of_last_update` field AgentCore's idle reaper requires. Workaround for `bedrock-agentcore-sdk-python#471` until async-task busy tracking lands.
+- **Pre-migration backup tool** — `scripts/backup-data/` produces a complete, restore-friendly snapshot of all DynamoDB tables, user-content S3 buckets, and Cognito (config + users + groups + IdPs + plaintext app-client secrets) for a given `CDK_PROJECT_PREFIX`. Workflow-dispatch wired.
+- **Dependency upgrades** — `bedrock-agentcore` 1.6.4 → 1.9.1 (with coupled `boto3` 1.42.96 → 1.43.9) and `strands-agents` 1.39.0 → 1.40.0.
+
+This release adds two new CDK stacks (`ArtifactsStack`, `McpSandboxStack`) and two new DynamoDB tables (`user-artifacts` in `ArtifactsStack`, `user-menu-links` in `InfrastructureStack`). Both new stacks are gated by config flags. Deploy order matters; see "Deployment notes" below.
+
+---
+
+## Artifacts
+
+The agent can now author versioned standalone documents — HTML pages, charts, Markdown reports — that render in a sandboxed iframe alongside the chat. Artifacts solve two problems the existing `create_visualization` and Code Interpreter outputs couldn't: persistence (the user can re-open and download), and isolation (HTML/JS runs in a cross-origin sandbox so it can't read cookies or the SPA DOM).
+
+### Architecture
+
+A new leaf stack, `ArtifactsStack`, owns the rendering pipeline:
+
+- **DynamoDB `user-artifacts` table** — version log + HEAD pointer per artifact. PK `USER#{user_id}`, SK `ARTIFACT#{aid}#V#{version:05d}` for versions and `ARTIFACT#{aid}#HEAD` for the latest pointer. GSI1 indexes by `SESSION#{session_id}` so the SPA can list artifacts produced in the current chat.
+- **S3 `artifacts-content` bucket** — private, no CORS. Layout `{user_id}/{aid}/v{n}/index.html`. Versions are immutable: there's no `s3:DeleteObject` grant on the inference-api role, so an `update_artifact` writes a new version and re-points HEAD instead of mutating.
+- **Render Lambda** — validates a render-token JWT scoped to one `(artifact_id, version)`, fetches the blob from S3, and returns it with a strict per-origin CSP that allows inline `<style>` / `<script>` plus scripts from `cdn.tailwindcss.com`, `esm.sh`, `cdn.jsdelivr.net`, and `unpkg.com`. `connect-src 'none'` — artifacts cannot make outbound network calls.
+- **CloudFront distribution on `artifacts.{domain}`** — terminates TLS, attaches the security-headers policy. The artifact origin is intentionally a different cookie-jar host from the SPA so a script in an artifact can't read `__Host-bff_session`.
+- **HMAC signing key** — the render-token signing secret lives in Secrets Manager in `InfrastructureStack` (not `ArtifactsStack`), so app-api and the render Lambda can both read it without `ArtifactsStack` becoming a stack-dependency root. App-api mints short-lived JWTs that the SPA embeds as the iframe `src`.
+
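The key layout above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: the helper names (`user_pk`, `artifact_version_sk`, `artifact_head_sk`, `s3_key`) are hypothetical, and only the string formats come from the notes. It shows why the version number is zero-padded: lexicographic sort order on the sort key then matches numeric version order.

```python
def user_pk(user_id: str) -> str:
    """Partition key: one partition per user."""
    return f"USER#{user_id}"


def artifact_version_sk(artifact_id: str, version: int) -> str:
    """Sort key for one immutable version, zero-padded to five digits."""
    return f"ARTIFACT#{artifact_id}#V#{version:05d}"


def artifact_head_sk(artifact_id: str) -> str:
    """Sort key for the pointer to the latest version."""
    return f"ARTIFACT#{artifact_id}#HEAD"


def s3_key(user_id: str, artifact_id: str, version: int) -> str:
    """Object key for the rendered document of one version."""
    return f"{user_id}/{artifact_id}/v{version}/index.html"


if __name__ == "__main__":
    # Without padding, "V#10" would sort before "V#2".
    for key in sorted(artifact_version_sk("a1", v) for v in (10, 2, 1)):
        print(key)
    print(artifact_head_sk("a1"))
    print(s3_key("u1", "a1", 3))
```

Because `H` sorts before `V` in ASCII, a range query on the `ARTIFACT#{aid}#` prefix returns the HEAD item first and the versions in ascending order after it.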
+### Agent tools
+
+Two new built-in tools, registered as default public tools so the feature is usable on first deploy without an admin opting them in per role:
+
+- `create_artifact(title, content, content_type="text/html; charset=utf-8")` — writes v1. HTML mode requires a complete standalone document (`<!doctype html>` + full `<html>`); Markdown mode (`content_type="text/markdown"`) takes raw GFM and the writer wraps it in a self-contained HTML render harness server-side.
+- `update_artifact(artifact_id, content, ...)` — writes a new version and re-points HEAD; the render-token mints against the latest version when the panel updates.
+
+The system prompt documents the dual authoring contract and the CSP allowlist (Chart.js auto-registering build, `import Chart from "https://esm.sh/chart.js@4/auto"` etc.) so the model produces output that actually renders.
+
+### SSE + SPA
+
+A new `artifact` SSE event streams from the inference-api each time the agent creates or updates an artifact. The frontend has:
+
+- `ArtifactStateService` + `ArtifactHttpService` + `ArtifactDownloadService` — signal-backed state, render-token fetch, blob download.
+- A docked, resizable artifact panel beside the chat that auto-opens on first creation, shows a skeleton while loading, and on update jumps to the latest version. Per-version history cards in the panel let the user step backwards through revisions.
+- An inline artifact card anchored to the producing message, with a preview/code toggle (syntax-highlighted source view) and a download button on both the card and the panel.
+- Full-width inline cards, scoped `isolation: isolate` z-indexing so a focused artifact card doesn't escape its message row, and live tool-output streaming into the tool rail while the artifact is being authored.
+
+### Configuration
+
+Artifacts is opt-in at deploy time via `CDK_ARTIFACTS_ENABLED=true`. When enabled, `CDK_HOSTED_ZONE_DOMAIN` and `CDK_ARTIFACTS_CERTIFICATE_ARN` become required. Validation runs on every stack synth, so all five consumer GitHub workflows now thread these env vars through the OIDC composite action — a missing var on a non-`ArtifactsStack` workflow would otherwise fail synth.
+
+---
+
+## MCP Apps Host-Renderer
+
+A scoping document landed early in the cycle (`docs/kaizen/scoping/mcp-apps-host-renderer.md`) and the implementation followed a deliberate seven-PR sequence (#339 PR #0 → #349 PR #7). The result: third-party MCP servers can ship a small interactive UI alongside their tools, and that UI renders in a sandboxed iframe with the same isolation guarantees as artifacts.
+
+### Architecture
+
+A new leaf stack, `McpSandboxStack`, mirrors the artifacts pattern:
+
+- **CloudFront distribution on `mcp-sandbox.{domain}`** — fronts an S3 origin that serves a tiny "basic-host" mount page. App URLs land at `mcp-sandbox.{domain}/<resource-encoded-path>`; the mount page reads the encoded resource URL from the path and frames the actual MCP App content in an inner blob iframe with `allow-same-origin`, matching the basic-host reference.
+- **Dynamic per-resource CSP** — a CloudFront Function on the viewer-response decodes a `?csp=` query param (URL-encoded `frame-ancestors` source list scoped to that one resource) and emits a per-request `Content-Security-Policy` header. The function source is loaded from `assets/mcp-sandbox/csp-function.js` and the `frame-ancestors` allowlist is JSON-injected at synth — the substitution asserts the placeholder marker is present exactly once so a future refactor that loses it fails loudly at synth, not at edge runtime.
+- **Outer `frame-ancestors` allowlist** — configurable via `mcpSandbox.extraFrameAncestors` so a deploy can permit framing from custom origins (preview environments, alternate SPA hosts) without rebuilding the function asset.
+
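The "exactly once" placeholder assertion described above can be sketched as follows. The real substitution runs in the CDK stack at synth time; this Python version only illustrates the idea, and the marker string `__FRAME_ANCESTORS_ALLOWLIST__` and the function name `inject_allowlist` are assumptions, not names taken from the repository.

```python
import json
from typing import List

# Hypothetical marker; the notes do not quote the real placeholder string.
PLACEHOLDER = "__FRAME_ANCESTORS_ALLOWLIST__"


def inject_allowlist(function_source: str, allowlist: List[str]) -> str:
    """Replace the single placeholder with a JSON array of allowed origins.

    Raises ValueError when the marker is missing or duplicated, so a refactor
    that drops it fails at build time instead of at the edge.
    """
    occurrences = function_source.count(PLACEHOLDER)
    if occurrences != 1:
        raise ValueError(
            f"expected exactly one {PLACEHOLDER} marker, found {occurrences}"
        )
    return function_source.replace(PLACEHOLDER, json.dumps(allowlist))


if __name__ == "__main__":
    source = "var ALLOWED = __FRAME_ANCESTORS_ALLOWLIST__;"
    print(inject_allowlist(source, ["https://app.example.com"]))
```

Counting occurrences before replacing is the design point: a plain `replace` would succeed silently on zero matches and ship a function with no allowlist.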
+### MCP protocol surface
+
+The agent now advertises an `experimental.ui` extension during MCP `initialize` so a server knows whether the host can render UI. Tools whose only output is a `ui_resource` are filtered out for non-capable clients (the existing API-key path, scripted callers).
+
+When a tool result references a UI resource, the agent fetches it via the standard MCP `resources/read` flow and emits a `ui_resource` SSE event with `uri`, `permissions`, and a `sandboxOrigin` that points at the deployed `mcp-sandbox` host (sourced from SSM, so the value is correct per environment). Three app-initiated message types complete the protocol:
+
+- `ui/message` — the App pushes structured data into the chat input as a tool-input draft (acts like a smart form).
+- `ui/update-model-context` — the App contributes context the agent should consider on the next turn.
+- `tools/call` proxy — the App can invoke other tools on the same MCP server. The frontend brokers these through app-api over an event broker rather than letting an iframe call the Bedrock runtime directly.
+
+### Frontend
+
+- `<mcp-app-frame>` Angular custom element + a `postMessage` bridge that enforces the allowed message types and rejects unknown origins.
+- A consent prompt rendered as an inline message component: the user explicitly approves an App before it gets framed.
+- Reload persistence: the consent service hydrates from a card store on session load, so a refresh doesn't re-prompt for a previously approved App.
+- A signal-backed `ToolRendererRegistryService` (the PR #0 refactor) keyed by tool name. The `mcp-app-frame` renderer is the first registry-aware tool result; the default renderer reproduces the prior text/JSON/image switch verbatim, so all existing tool-result cards render identically. `calculator`, `fetch_url_content`, and `create_visualization` were migrated as proof points to validate the registry shape.
+
+### Default-on
+
+`Defaults.MCP_APPS_HOST_ENABLED` flips `False → True` this release, and `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` is wired into the inference-api runtime env from the `mcp-sandbox` SSM origin (gated on `config.mcpSandbox.enabled`, mirrors the artifacts conditional-SSM pattern). Without that wiring a deployed environment would emit `ui_resource` events with an empty `sandboxOrigin` and the SPA couldn't frame the App. Two synth tests cover the present/absent paths.
+
+A budget-allocator-server example is committed as a reference MCP App, and `step-04-deploy.md` / `step-05-verify.md` runbooks gain "Register an MCP-Apps-capable MCP server" sections plus a manual e2e dogfood scenario.
+
+### CSP / isolation hardening (PRs #352–#360)
+
+Several follow-ups landed during dogfood to align the host with the upstream `ext-apps` basic-host reference:
+
+- Outer CSP + inner mount alignment with the reference implementation (#353).
+- Blob-iframe rendering, first-class block element, Angular 21-specific fixes (#352).
+- Sandbox CFN `Comment` shortened to fit the 128-char AWS cap, twice (#356, #357).
+- URL-decoded `?csp=` parsing in the sandbox CloudFront Function (#358), with the `x-csp-debug` diagnostic header added during the investigation (#358) and removed once the fix landed (#359).
+- Inner App iframe got `allow-same-origin` to match the basic-host reference (#360).
+
+---
+
+## Admin Shell Redesign
+
+The 15-card admin grid had outgrown its container — a sibling navigation surface that grew unboundedly with every new admin domain. Beta.27 replaces it with a persistent sidebar shell modeled on the user settings page, plus dense list redesigns for the two highest-traffic admin pages.
+
+### Persistent sidebar shell (#300)
+
+- Replaces the card grid with a left rail that stays visible across all admin routes. Nav items are grouped: **Usage & Spend**, **AI Configuration**, **Identity & Access**, **Customization**.
+- `/admin` redirects to `/admin/costs` as the default landing.
+- Strips the redundant "Back to Admin" link from 10 top-level admin sub-pages — the sidebar replaces them.
+- Cost summary cards restructured so the title gets its own row and the icon is a small top-right corner accent — fixes label wrapping on "Cache Savings" / "Avg Cost/User" in the narrower content area.
+- Drive-by fix: 24 loading spinners across admin, settings, fine-tuning, and auth pages were rendering as a uniform gray ring in dark mode (no visible motion); they now spin with the proper accent.
+- Admin shell widened and sidebar label wrapping fixed (#305).
+
+### Route consolidations
+
+Two clusters of sibling routes collapse into tabbed pages:
+
+- **Quotas** (`/admin/quotas`) — Tiers, Assignments, Overrides, Inspector, Events. Five sibling routes become tabs on a single page; deep-link URLs are preserved for back-compat.
+- **Fine-Tuning** (`/admin/fine-tuning`) — Access + Costs.
+
+### Compact list redesigns
+
+- **Manage Models + Bedrock/Gemini/OpenAI browse pages (#332)** — information-dense card layouts replaced with one-line scannable rows that expand on demand to show detail. Slim inline filter toolbar above the list. Inline enable/disable toggle on the manage-models row so status changes no longer require opening the edit form. Border-radius standardized on `rounded-2xl` to match the chat input.
+- **Tool catalog + form (#335)** — same redesign applied to the admin tools list and create/edit form. Compact expandable rows with an inline detail panel. Form flattened to use the shared list-page token set (`rounded-2xl`, `text-sm/6`, `text-2xl/8` header, `focus:ring-2`) instead of the older heavy section cards. No behavior changes — purely visual.
+
+### Admin-managed user-menu links (#298, #303, #315)
+
+A new admin domain so org admins can curate the links shown in the SPA user menu without code changes. Each link is either an external URL (opens in new tab) or an in-app modal that renders admin-authored Markdown — covers the common cases of policy pages, feedback forms, and embedded org-specific notices.
+
+- New `user-menu-links` DynamoDB table (single-tenant flat config; per-org PK scoping can be added later without changing the SK shape).
+- Admin CRUD at `/admin/user-menu-links` (gated by `require_admin`).
+- Public enabled-only read at `/user-menu-links` (cookie-aware `get_current_user_from_session` so it works under the BFF cutover).
+- Links and in-app modals are visually distinguished in both the modal preview and the runtime rendering (#303).
+- Resource gated to admin-only so non-admin user-menu loads no longer fire a duplicate request (#315).
+
+### Sidebar density (#301)
+
+Drive-by improvement on the chat session list: rows tighten from ~40px to ~32px (`py-2 → py-1.5`, `text-sm/6 → text-sm/5`), nested flex wrappers around the title removed (the link is now `block truncate` directly on the text), group gaps reduced (`gap-y-4 → gap-y-3`, `pb-1 → pb-0.5`, row `gap-y-1 → gap-y-0.5`). A list of 10 sessions is ~25% shorter overall. Inactive items drop from `font-medium` to `font-normal`; the active row picks up `!font-medium` via `routerLinkActive` so the selected state still feels distinct. Skeleton loader and entry animation added.
+
+---
+
+## Recoverable `max_tokens` Truncation
+
+Previously a `MaxTokensReachedException` surfaced as a generic, leaky error in the chat (`...unrecoverable state... https://strandsagents.com/...`) and the only "recovery" was a re-send button that fired the original prompt as a new user turn — the model re-answered from scratch, hit the same ceiling, and infinite-looped (#328).
+
+Beta.27 turns the failure into a first-class inline affordance.
+
+### Backend
+
+- `MaxTokensReachedException` is classified specifically in the stream processor, which emits a `max_tokens`-coded, **recoverable** `stream_error` event. The leaked SDK URL and the verbose chat bubble are gone.
+- **Continue is a resume, not a new turn.** A `continue_truncated` invocation re-enters the agent loop with an empty-list prompt, so the model continues the truncated assistant message in restored history (assistant-prefill) instead of answering a fresh instruction. Bypasses quota / RAG / file-resolution like the existing interrupt-resume path.
+- The error is no longer double-persisted as a second assistant message (would otherwise break role alternation for the follow-up turn).
+- **Refresh-survival.** A `lastTurnContinuable` marker on session metadata is set on truncation and cleared at the start of any non-resume turn. The marker flows through `SessionMetadataResponse` so Continue reappears after a page reload.
+- `stream_error` is now an always-allowed parser event so a terminal recovery signal can't be dropped by stream-state gating.
+
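The marker lifecycle described in the backend list can be sketched as a small state model. Everything here is illustrative: the `SessionMetadata` dataclass, the function names, and the event field names (`type`, `code`, `recoverable`) are assumptions, and only the set-on-truncation, clear-on-non-resume behaviour comes from the notes.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SessionMetadata:
    # Mirrors the lastTurnContinuable marker on session metadata.
    last_turn_continuable: bool = False


def truncation_event() -> Dict[str, Any]:
    """Recoverable stream_error payload; field names are hypothetical."""
    return {"type": "stream_error", "code": "max_tokens", "recoverable": True}


def on_truncation(meta: SessionMetadata) -> Dict[str, Any]:
    """Set the marker so Continue can reappear after a page reload."""
    meta.last_turn_continuable = True
    return truncation_event()


def on_turn_start(meta: SessionMetadata, is_resume: bool) -> None:
    """Clear the marker at the start of any turn that is not a resume."""
    if not is_resume:
        meta.last_turn_continuable = False


if __name__ == "__main__":
    meta = SessionMetadata()
    print(on_truncation(meta), meta)
    on_turn_start(meta, is_resume=False)
    print(meta)
```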
+### Frontend
+
+- Compact inline "Response length limit reached" notice with a Continue button on the truncated message — no verbose error bubble.
+- Continuation-aware message-map sync: pins the existing partial + notice and **appends** the continuation rather than truncating to the last user message.
+- Hydrates `lastTurnContinuable` from session metadata on session load.
+
+Backend + frontend regression tests cover classification, the continuation path, the always-allowed `stream_error`, and the refresh-survival marker round-trip.
+
+---
+
+## Model-Aware Adaptive Thinking + `effort`
+
+Opus 4.7 rejects `thinking.type="enabled"` with a 400 — it requires adaptive thinking with depth governed by Anthropic's top-level `output_config.effort` field. Sonnet 4.6, Opus 4.6, and Mythos accept the legacy shape but recommend adaptive. Beta.27 makes `_shape_thinking_value` model-aware (#329, #330, #331).
+
+- **Adaptive marker list.** `_BEDROCK_ADAPTIVE_THINKING_MARKERS = ("claude-opus-4-7", "claude-opus-4-6", ...)`. On a marker hit, `_shape_thinking_value` emits `{type: "adaptive", display: "summarized"}` (the explicit `display` keeps the reasoning trace visible — Opus 4.7 defaults `display` to `"omitted"`). Non-marker models keep the legacy `{type: "enabled", budget_tokens: N}` shape.
+- **`effort` as a canonical inference param.** Routed through `additional_request_fields.output_config.effort` (it's NOT on `additionalModelRequestFields` like `thinking` / `top_k`). Wired through the admin model form and the user-facing chat settings panel as a new select control, with server-side allowed-set gating in the param normalizer.
+- **Generic `allowed` enum on `ModelParamSpec`** — the per-model effort-tier difference between Sonnet 4.6 and Opus 4.7 (which gets the additional `xhigh` / `max` tiers) is now data, not a model-family branch in code.
+- **Hardened param coercion (#329, #330).** `Dict[str, Any]` from JSON let a float reach the Bedrock Converse SDK, which rejects a float `maxTokens` with a hard boto3 validation error. `max_tokens` and `top_k` are now coerced to `int` at the single provider-translation chokepoint (covers fresh + resumed turns, all providers). The thinking-vs-`max_tokens` consistency guard previously used `isinstance(..., int)` and silently no-opped on float input; it now coerces first so an inconsistent request (`thinking >= max_tokens`) is rejected before reaching Anthropic. A model-ceiling cap protects against admin-configured `max_tokens` that exceed the model's hard limit.
+
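A rough sketch of the model-aware shaping and the coercion guard described above. It is not the repository's `_shape_thinking_value`: the function names are stand-ins, the marker tuple contains only the two entries quoted in the notes (the real list is longer), substring matching on the model id is an assumption, and the model ids in the usage lines are made up.

```python
from typing import Any, Dict

# Only the two markers quoted in the notes; the real tuple has more entries.
ADAPTIVE_MARKERS = ("claude-opus-4-7", "claude-opus-4-6")


def shape_thinking_value(model_id: str, budget_tokens: int) -> Dict[str, Any]:
    """Adaptive shape on a marker hit, legacy enabled shape otherwise."""
    if any(marker in model_id for marker in ADAPTIVE_MARKERS):
        # Explicit display keeps the reasoning trace visible.
        return {"type": "adaptive", "display": "summarized"}
    return {"type": "enabled", "budget_tokens": budget_tokens}


def coerce_int_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Coerce max_tokens and top_k to int; JSON may deliver them as floats."""
    coerced = dict(params)
    for name in ("max_tokens", "top_k"):
        if coerced.get(name) is not None:
            coerced[name] = int(coerced[name])
    return coerced


def check_thinking_budget(budget_tokens: Any, max_tokens: Any) -> None:
    """Coerce first, then reject a thinking budget at or above max_tokens."""
    if int(budget_tokens) >= int(max_tokens):
        raise ValueError("thinking budget must be smaller than max_tokens")


if __name__ == "__main__":
    print(shape_thinking_value("example.claude-opus-4-7", 2048))
    print(shape_thinking_value("example.legacy-model", 2048))
    print(coerce_int_params({"max_tokens": 4096.0, "top_k": 40.0}))
```

Coercing inside the guard is what closes the gap the notes describe: an `isinstance(..., int)` check skips float input entirely, while `int(...)` normalizes it before the comparison.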
+---
+
+## Inference-API Reliability
+
+### `/ping` reaper fix (#338)
+
+AgentCore's idle reaper requires an integer `time_of_last_update` field alongside `status`; when absent, the platform reaps the microVM at `idleRuntimeSessionTimeout` even mid-stream regardless of reported status (`bedrock-agentcore-sdk-python#471`). We have no async-task busy tracking yet (deferred async-mode work), so we cannot report `HealthyBusy` — returning a fresh timestamp on every ping is the documented mitigation against silent mid-generation reaps. Status casing also corrected to match `PingStatus`. This was a Kaizen-2026-05-15 review item.
+
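As an illustration of the mitigation, a ping payload that always carries a fresh integer timestamp might look like the sketch below. The function name is hypothetical, and the `"Healthy"` status string is an assumption about the `PingStatus` casing; only the `status` and `time_of_last_update` field names come from the notes.

```python
import time
from typing import Any, Dict, Optional


def build_ping_response(now: Optional[float] = None) -> Dict[str, Any]:
    """Return a ping body with an integer time_of_last_update.

    `now` is injectable for tests; in normal use the current time is taken
    on every call so the reaper always sees recent activity.
    """
    timestamp = time.time() if now is None else now
    return {
        "status": "Healthy",  # assumed casing, see lead-in
        "time_of_last_update": int(timestamp),  # must be an int, not a float
    }


if __name__ == "__main__":
    print(build_ping_response())
```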
+### Removed dead Bearer-only auth from app-api (#297)
+
+A sweep of `app_api/` for `Depends(get_current_user)`, `Depends(security)`, `Depends(verify_token)`, and manual `Authorization` header reads turned up exactly two routes still on Bearer auth, both in `chat/routes.py`. The dead Bearer-only paths are removed; `POST /chat/agent-stream` is documented as intentionally Bearer for non-SPA callers (API-key tooling, scripts). All other app-api routes are cookie-based BFF auth post-beta.24.
+
+### Frontend version baking (#336)
+
+`scripts/stack-frontend/build.sh` invoked `ng build` directly, which bypassed the npm `prebuild` lifecycle hook that runs `gen-version.js`. The deployed bundle therefore shipped the committed `'dev'` placeholder in `src/version.ts`, so the user menu rendered "local" on `develop` and `main`. Build script now runs `gen-version.js` explicitly before the build.
+
+### A2A streaming-capability guard (#338)
+
+Forward-looking guard: A2A is currently client-only. When the first A2A server construct lands (Strands `agent.to_a2a()`, `A2AServer`, or a hand-built `AgentCard`), its advertised capabilities **must** include `streaming=True` — otherwise the A2A SDK client silently falls back to non-streaming, never receives a `completed` event, and hangs ~40 minutes (ref-repo `sample-strands-agent-with-agentcore` commit `50c9112`). Documented in `CLAUDE.md` as a Kaizen-2026-05-15 review item.
+
+### Misc inference-API polish
+
+- Markdown content-type support in the artifact tool (#318).
+- Configurable extra CSP `frame-ancestors` for artifacts (#314).
+- `jsdelivr` and `unpkg` added to the artifact-origin script-src CSP so Chart.js artifacts loaded via the canonical jsDelivr snippet stop rendering blank (#326).
+
+---
+
+## Pre-Migration Backup Tool
+
+A new `scripts/backup-data/` tool produces a complete, restore-friendly snapshot for a given `CDK_PROJECT_PREFIX`, plus a `workflow_dispatch` GitHub Actions workflow that runs it via the existing OIDC composite action (#361).
+
+**Coverage:**
+
+- All ~20 application DynamoDB tables via `ExportTableToPointInTime` (portable DynamoDB-JSON).
+- User-content S3 buckets via `aws s3 sync`.
+- Full Cognito user pool config including identity providers and app clients **with their plaintext client secrets preserved** (so IdP re-registration with new infra can be fully automated).
+- Users, groups, and group memberships.
+- Best-effort AgentCore Memory events.
+
+Each run lands in a freshly-created, versioned, SSE-encrypted, TLS-only backup bucket named `{prefix}-backup-{utc_timestamp}`. `manifest.json` is the single source of truth a future restore script will consume.
+
+**Known limitation:** Cognito password hashes are not exportable by AWS — that constraint is documented prominently. Ephemeral session/state tables are excluded by default. Restore is intentionally a separate phase, to be written against the new infrastructure once it exists.
+
+---
+
+## Smaller Improvements
+
+- **Autofocus chat input on session load and switch (#333)** — focus the textarea on first mount and whenever the session changes (new or existing) so the user can type immediately. Assistant-preview empty state opts out via a new `autoFocus` input so it doesn't steal focus from the editor form.
+- **Copy-to-clipboard button on chat code blocks (#299)** — plus Prism syntax-highlighting bundles for JavaScript, TypeScript, Python, and SQL alongside the existing C#/CSS bundles.
+- **Tool renderer registry (#339)** — signal-backed `ToolRendererRegistryService` keyed by tool name replaces the implicit text/JSON/image switch baked into `ToolUseComponent`. Foundation for the MCP Apps `<mcp-app-frame>` renderer; `calculator`, `fetch_url_content`, and `create_visualization` migrated as proof points. Default renderer reproduces prior markup verbatim — zero visible change for existing tools.
+- **Kaizen-2026-05-15 hygiene (#338, #341, #302, #304)** — replaced dead source URLs in `kaizen-research` (the `bedrock/whats-new/` 404, the `docs.claude.com` claude-code release-notes 301→404, and the inactive `anthropics/courses`); fixed `aws/amazon-bedrock-agentcore-{sdk-python,starter-toolkit}` repo-slug typos to the correct `aws/bedrock-agentcore-*` slugs.
+
+---
+
+## 🐛 Bug fixes
+
+- `MaxTokensReachedException` no longer infinite-loops on retry; surfaces as a recoverable inline notice with Continue (#328).
+- Float-typed `max_tokens` / `top_k` in inference params no longer crash boto3's Bedrock Converse client (#329, #330).
+- Opus 4.7 no longer 400s on `thinking.type="enabled"` — model-aware adaptive shaping (#331).
+- Silent mid-stream microVM reaping on long generations fixed via `time_of_last_update` (#338).
+- Frontend deploy bundles bake the real version instead of the `'dev'` placeholder (#336).
+- Chart.js artifacts loaded via `cdn.jsdelivr.net` no longer render blank (#326).
+- Admin user-menu-links resource no longer fires a duplicate load request for non-admin users; it is now gated to admin-only (#315).
+- Artifact card z-index no longer escapes its message row on focus; scoped with `isolation: isolate` (#323).
+- `mcp-sandbox` CFN `Comment` no longer overflows the 128-char AWS cap (#356, #357).
+- `mcp-sandbox` `?csp=` value is now URL-decoded in the CloudFront Function (#358).
+
+---
+
+## 🔒 Security / isolation
+
+- **Artifacts** render on `artifacts.{domain}` — a different cookie-jar host from the SPA, with `connect-src 'none'` so an artifact cannot make outbound requests. Render-token JWTs are scoped to one `(artifact_id, version)` and are HMAC-signed with a Secrets-Manager-managed key. S3 versions are immutable: there's no `s3:DeleteObject` grant on the inference-api role.
+- **MCP Apps** render on `mcp-sandbox.{domain}` with a per-resource `frame-ancestors` CSP emitted by a CloudFront Function. The outer host enforces a separate origin from the SPA, the inner App iframe carries `allow-same-origin` to match the basic-host reference, and an explicit user consent step (with reload persistence) gates first-time framing.
+- App-api Bearer-only auth removed from all routes except `POST /chat/agent-stream`, which stays Bearer for non-SPA callers such as API-key tooling and scripts (#297).
+
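To make the render-token scoping concrete, here is a standard-library sketch of an HMAC-signed token bound to one `(artifact_id, version)`. It assumes HS256 and hypothetical claim names (`aid`, `ver`, `exp`); the notes only say the JWT is HMAC-signed, short-lived, and scoped, so treat this as an illustration rather than the app-api or render Lambda code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(key: bytes, signing_input: str) -> str:
    digest = hmac.new(key, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return _b64url(digest)


def mint_render_token(key: bytes, artifact_id: str, version: int,
                      ttl_seconds: int = 60, now: Optional[float] = None) -> str:
    """Mint a short-lived token scoped to one artifact version."""
    issued = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    claims = _b64url(json.dumps(
        {"aid": artifact_id, "ver": version, "exp": issued + ttl_seconds}
    ).encode("utf-8"))
    return f"{header}.{claims}.{_sign(key, header + '.' + claims)}"


def verify_render_token(key: bytes, token: str, artifact_id: str, version: int,
                        now: Optional[float] = None) -> bool:
    """Check signature first, then expiry and the (artifact_id, version) scope."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    header, claims, signature = parts
    expected = _sign(key, header + "." + claims)
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return False
    payload = json.loads(_b64url_decode(claims))
    current = time.time() if now is None else now
    return (payload.get("aid") == artifact_id
            and payload.get("ver") == version
            and payload.get("exp", 0) > current)


if __name__ == "__main__":
    token = mint_render_token(b"demo-key", "a1", 3, now=1000)
    print(verify_render_token(b"demo-key", token, "a1", 3, now=1030))
    print(verify_render_token(b"demo-key", token, "a1", 4, now=1030))
```

A token minted for version 3 fails verification for version 4 even with a valid signature, which is the property that keeps a leaked iframe `src` from being replayed against other versions.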
+---
+
+## ⚠️ Breaking changes
+
+- **MCP Apps default-on.** `Defaults.MCP_APPS_HOST_ENABLED` flips `False → True`. To stay opt-in, set `AGENTCORE_MCP_APPS_HOST_ENABLED=false` in inference-api task env. If MCP Apps is enabled but `mcp-sandbox` isn't deployed, `ui_resource` events will emit with an empty `sandboxOrigin` and the SPA cannot frame the App.
+- **App-api Bearer-only auth removed (#297).** If any external integration was calling `apis/app_api/` routes with `Authorization: Bearer`, switch it to the API-key feature (`auth/api_keys/`, `X-API-Key`) before deploying beta.27. `POST /chat/agent-stream` remains Bearer for non-SPA callers and is unaffected.
+- **Adaptive-thinking admin model entries.** Any admin model entry for an Opus 4.6/4.7, Sonnet 4.6, or Mythos model that used `thinking.type="enabled"` should be updated to use the new `effort` knob; the runtime still emits the correct adaptive shape regardless, but the admin UI now exposes `effort` directly.
+
+---
+
+## 🏗️ Infrastructure
+
+**New stacks (both gated by config flags, both safe to enable independently):**
+
+- **`ArtifactsStack`** (gated by `config.artifacts.enabled`) — DDB `user-artifacts` table, private S3 `artifacts-content` bucket, render Lambda, CloudFront on `artifacts.{domain}`, Route53 alias. Consumes `/artifacts/render-token-key-arn` SSM (published by `InfrastructureStack`); publishes `/artifacts/bucket-name`, `/artifacts/bucket-arn`, `/artifacts/table-name`, `/artifacts/table-arn`, `/artifacts/origin`. Requires `CDK_HOSTED_ZONE_DOMAIN` and `CDK_ARTIFACTS_CERTIFICATE_ARN`.
+- **`McpSandboxStack`** (gated by `config.mcpSandbox.enabled`) — S3 mount-page bucket, CloudFront distribution on `mcp-sandbox.{domain}` with a CloudFront Function for dynamic per-resource CSP, Route53 alias. Publishes `/mcp-sandbox/origin` SSM, consumed by inference-api at runtime as `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN`.
+
+**`InfrastructureStack` additions:**
+
+- New `UserMenuLinksTable` (DDB) + `/admin/user-menu-links-table-name` and `/admin/user-menu-links-table-arn` SSM parameters.
+- New `ArtifactRenderTokenSecret` (Secrets Manager, AWS-managed encryption, `generateSecretString` 64-char) gated on `config.artifacts.enabled`. SSM `/artifacts/render-token-key-arn` publishes the ARN. Lives in `InfrastructureStack` (not `ArtifactsStack`) so app-api can read it without taking a stack-deploy-order dependency on `ArtifactsStack`.
+
+**Cross-stack:** `inference-api-stack` conditionally consumes `mcp-sandbox` SSM when `config.mcpSandbox.enabled` is true (mirrors the artifacts conditional-SSM pattern). Two synth tests cover present/absent.
+
+**Deploy order:** `InfrastructureStack` → `ArtifactsStack` (if enabled) and `McpSandboxStack` (if enabled) → app-api → inference-api → frontend.
+
+---
+
+## 🔧 CI/CD improvements
+
+- **Artifact env vars threaded through every consumer workflow (#307).** Validation on `config.artifacts.enabled` runs on every stack synth (the `bin/` instantiates all enabled stacks), so all five consumer workflows now pass `CDK_HOSTED_ZONE_DOMAIN`, `CDK_ARTIFACTS_ENABLED`, and `CDK_ARTIFACTS_CERTIFICATE_ARN` even when they're not synth'ing `ArtifactsStack` directly.
+- **Backup workflow** — new `workflow_dispatch` job wired to the OIDC composite action, runs `scripts/backup-data/` against any `CDK_PROJECT_PREFIX` (#361).
+- **Docker `curl` pin bumped (#327)** — Debian rotated `curl 8.14.1-2+deb13u2` out of the trixie apt index (superseded by `+deb13u3`), so the exact pin made every App API / Inference API Docker build hard-fail. Pin bumped, and the apt-pin policy documented as "follow Debian point-releases" rather than fully unpinning.
+- **`infrastructure-stack` DDB count test (#350)** — replaced the brittle `resourceCountIs(18)` magic number (which went stale when `user-menu-links` landed) with an enumerated, justified table list. Infra Jest is the only check covering this, and it does not block merges, so the count assertion had been sitting red on `develop`.
+
+---
+
+## 📦 Dependency upgrades
+
+- **`bedrock-agentcore` 1.6.4 → 1.9.1** (#337). Coupled `boto3` 1.42.96 → 1.43.9 with `botocore` / `s3transfer` following — `bedrock-agentcore` 1.9.1 requires `boto3>=1.43.0`. CHANGELOG audited end-to-end: no breaking changes for our memory/identity usage (the double-base64 fix is unused here, the namespace redesign is backward-compatible, the `ConversationTurn` fix is internal telemetry). Validated with a read-only dev smoke test (memory `get_memory_strategies` / `retrieve_memories` + identity `list_workload_identities`) and the full backend suite (2913 passed).
+
+  Test-infra side effect: `botocore` 1.43 newly reads `Credentials.account_id` during endpoint construction; on a `RefreshableCredentials` (SSO) object that forces a refresh → `GetRoleCredentials`, which `moto` does not implement. Combined with `backend/src/.env`'s `AWS_PROFILE` leaking via `load_dotenv(override=True)`, this turned the suite red in an order-dependent way. Added per-test autouse scrub fixtures for `AWS_PROFILE` and the `DYNAMODB_*` / `COGNITO_*` config families, mirroring the existing `_clear_skip_auth_env` fixture for the same `.env`-bleed bug class.
+
+- **`strands-agents` 1.39.0 → 1.40.0** (#340). Gated on a token-count audit and a compaction double-fire check. The `use_native_token_count` default flip (`True → False`, Strands PR #2284) is inert for our token accounting: the flag gates only `BedrockModel.count_tokens()`, which Strands calls solely from `_estimate_input_tokens()` to populate `projected_input_tokens` on `BeforeModelCallEvent`. Our cost-badge / context-% / compaction-trigger plumbing reads from `inputTokens` + `cacheReadInputTokens` + `cacheWriteInputTokens` directly, so the default flip is transparent.
+
+---
+
+## 🧪 Test Coverage
+
+- Backend + frontend regression tests for `MaxTokensReachedException` classification, the `continue_truncated` resume path, `stream_error` always-allowed gating, and the `lastTurnContinuable` refresh-survival marker round-trip (#328).
+- Backend regression tests for adaptive thinking shape per model marker, `effort` allowed-set gating, and the float→int coercion path on `max_tokens` / `top_k` (#329, #330, #331).
+- `infrastructure/test/mcp-sandbox-stack.test.ts` (264 lines) and `mcp-sandbox-csp-function.test.ts` (357 lines) — synth + CFN unit coverage for the new stack including the placeholder-substitution invariants and `frame-ancestors` quote-escaping.
+- `infrastructure/test/inference-api-stack.test.ts` — two synth cases gating `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` wiring on `config.mcpSandbox.enabled` (#349).
+- `infrastructure/test/cors.test.ts` (53 lines) — new CORS test surface.
+- Refactored `infrastructure/test/infrastructure-stack.test.ts` to enumerate the 19 DDB tables with one-line justifications instead of asserting a count (#350).
+- Frontend specs for `mcp-app-bridge`, `mcp-app-card-state.service`, `mcp-app-consent.service`, `mcp-app-message.service`, `mcp-app-proxy.service`, `mcp-app-state.service`, `proxy-url`, `artifact-http.service`, `artifact-state.service`, `artifact-source.component`.
+
+---
+
+## 🚀 Deployment notes
+
+This is a multi-stack release. **Read this section before deploying.**
+
+### New stacks
+
+If you want either feature, set the gating flag and the supporting env vars before synth:
+
+- **Artifacts:** set `CDK_ARTIFACTS_ENABLED=true`. `CDK_HOSTED_ZONE_DOMAIN` and `CDK_ARTIFACTS_CERTIFICATE_ARN` become required across **every** consumer workflow that synthesizes any stack (validation runs on every synth — see #307). The artifacts ACM cert must be in `us-east-1` (CloudFront).
+- **MCP Apps:** set the corresponding `mcpSandbox.enabled` config and `AGENTCORE_MCP_APPS_HOST_ENABLED` (now defaults true). The `mcp-sandbox` ACM cert must be in `us-east-1`. Without `mcp-sandbox` deployed, `ui_resource` SSE events will emit with an empty `sandboxOrigin` and the SPA cannot frame the App.
+
+### Deploy order
+
+1. `InfrastructureStack` (provisions `UserMenuLinksTable` + `ArtifactRenderTokenSecret` + SSM publishes).
+2. `ArtifactsStack` (consumes `/artifacts/render-token-key-arn`).
+3. `McpSandboxStack` (independent of `ArtifactsStack`).
+4. `app-api` (consumes artifact + user-menu-links SSM).
+5. `inference-api` (consumes artifact + mcp-sandbox SSM, conditional on flags).
+6. Frontend.
+
+### Auth migration
+
+If any external integration was calling `apis/app_api/` routes with `Authorization: Bearer`, switch it to the API-key feature (`auth/api_keys/`, `X-API-Key`) before deploying beta.27 (#297). `POST /chat/agent-stream` remains Bearer-acceptable for non-SPA callers.
+
+### Pre-migration safety net
+
+Before any large infrastructure change (a stack-prefix migration, a region cutover, a CDK boundary refactor), run `scripts/backup-data/` first. The new workflow makes this a one-click affair against any `CDK_PROJECT_PREFIX`.
+
+### Optional follow-ups (not deploy-blocking)
+
+- Register an MCP Apps-capable MCP server via `step-04-deploy.md` to validate the host-renderer end-to-end against the committed `budget-allocator-server` example. Manual e2e dogfood scenario in `step-05-verify.md` exercises all six Definition-of-Done interactions.
+- If you carry custom CSP `frame-ancestors` source lists for embedded preview environments, set `mcpSandbox.extraFrameAncestors` rather than rebuilding the CloudFront Function asset.
+
+---
+
+# Release Notes — v1.0.0-beta.26
+
+**Release Date:** May 13, 2026
+**Previous Release:** v1.0.0-beta.25 (May 11, 2026)
+
+---
+
+## Highlights
+
+A small, focused release that lands two operator-facing fixes and one user-facing feature on top of the beta.25 production hardening. The big ones: **multi-sheet XLSX support** in the spreadsheet analysis tool with defensive caps so a pathological workbook can't blow up latency or context, and an **async refactor of the spreadsheet file-lookup path** that closes a regression where concurrent chat load could block the event loop. Also shipping a **user default model preference applied at chat time**, a **green nightly E2E pipeline** after a multi-attempt fix, and **upstream contribution governance** — PRs are now restricted to approved collaborators (GitHub "Collaborators only") and Dependabot version-update PRs are disabled in favor of manual weekly upgrades.
+
+This release has no schema or infrastructure changes. Deploy in any order.
+
+---
+
+## Multi-Sheet XLSX Support in Spreadsheet Analysis
+
+The spreadsheet analysis tool from beta.25 only handled the first sheet of an XLSX file, which silently misled the agent on multi-tabbed workbooks (financial models, fine-tuning datasets, anything from a real BI export). Beta.26 expands the tool to convert every sheet into its own predictable CSV, with sane defaults that protect the latency budget and the model's context window from pathological inputs.
+
+### Backend
+
+- `backend/src/agents/builtin_tools/spreadsheet_analysis/analyze_tool.py` — adds two environment-configurable caps (`MAX_SHEETS_TO_CONVERT`, `MAX_ROWS_PER_SHEET`) so a workbook with thousands of small sheets can't blow out the Code Interpreter sandbox. New helpers:
+  - `_sanitize_sheet_name()` produces filesystem-safe deterministic CSV filenames (`stem.sheetname.csv`) so the model's downstream code paths are predictable
+  - `_parse_sheet_inventory()` extracts structured sheet metadata from the bootstrap stdout without `eval`/`literal_eval` on untrusted output
+  - `_safe_int()` parses bootstrap integers defensively
+  - `_format_sheet_note()` generates a markdown footer documenting which sheets converted, which were truncated, and the per-sheet CSV paths — surfacing caps to the model with actionable warnings rather than silently wrong results
+- Tool docstring documents the dual contract: single-sheet workbooks keep the legacy `stem.csv` fast path; multi-sheet workbooks get per-sheet CSVs plus a primary alias for the first sheet
+- `backend/src/agents/main_agent/core/system_prompt_builder.py` — system-prompt guidance updated so the model handles per-sheet filenames correctly on retries
+
+### Test Coverage
+
+2,800+ lines of new tests across `backend/tests/agents/builtin_tools/spreadsheet_analysis/`:
+
+- `test_analyze_tool_integration.py` (779 lines) — multi-sheet XLSX and CSV workflows end-to-end
+- `test_sheet_inventory.py` (307 lines) — parser robustness against malformed bootstrap output
+- `test_build_preview_code.py` (127 lines) — filename escaping for quotes and special characters via `repr()` indirection (closes a code-generation injection edge case)
+- `test_clean_stderr.py` (202 lines) — `MAX_ERROR_CHARS` budget is now respected strictly, accounting for ellipsis length
+- `test_helpers.py`, `test_find_file.py`, `test_list_spreadsheets.py`, `test_strip_first_row.py` — coverage for the smaller utilities
+
+A small robustness fix landed alongside the tests: code generation now stashes the filename as a `_FNAME` variable inside the generated snippet to prevent f-string interpolation conflicts when filenames contain quotes or braces.
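
The `_FNAME` indirection can be illustrated with a minimal sketch. The generator name `build_preview_snippet` and the snippet body are hypothetical and not taken from the codebase; only the idea of binding the filename once through `repr()` comes from the note above.

```python
def build_preview_snippet(filename: str) -> str:
    """Return generated code that binds the filename once, then refers only to _FNAME.

    Hypothetical helper for illustration; the real generator may differ.
    """
    # repr() emits a valid Python string literal, escaping quotes and backslashes,
    # so the filename never has to be spliced into later f-strings or format calls.
    return (
        f"_FNAME = {filename!r}\n"
        "preview_label = 'previewing ' + _FNAME\n"
    )


# A filename with single quotes, double quotes, and braces.
tricky = "Q3 'final' {draft} \"v2\".xlsx"
namespace = {}
exec(build_preview_snippet(tricky), namespace)
recovered = namespace["_FNAME"]
```

Because the rest of the generated snippet only reads `_FNAME`, quotes and braces in the filename cannot terminate a string literal or be mistaken for f-string fields.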
+
+---
+
+## Async Spreadsheet File Lookups
+
+The `analyze_spreadsheet` and `list_spreadsheets` tools shipped in beta.25 ran synchronous DynamoDB queries on the event loop (`_find_file`, `_get_kb_files`, `_get_session_files`), and the inference-api `_build_tabular_inventory` chat-route helper used a nested `asyncio.run` + thread pool executor pattern that could block under concurrent chat load. This release converts the entire path to native async: tool entry points are `async def`, every DynamoDB query is offloaded via `asyncio.to_thread`, and the inference-api helper awaits directly. This fixes a regression introduced in #260 where high-concurrency chat traffic could stall the event loop during file lookups — the same class of bug the BFF middleware fix in beta.25 addressed for session resolution.
+
+### Backend
+
+- `backend/src/agents/builtin_tools/spreadsheet_analysis/analyze_tool.py` and `list_spreadsheets_tool.py` — `analyze_spreadsheet`, `list_spreadsheets`, `_find_file`, `_get_kb_files`, `_get_session_files` are all `async def`; DynamoDB calls offload via `asyncio.to_thread`
+- `backend/src/apis/inference_api/chat/routes.py` — `_build_tabular_inventory` is now `async` and awaits the file-operation calls directly. Replaces the nested `asyncio.run` + thread pool executor pattern that could deadlock under load
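
The offload pattern described here can be sketched as follows. `query_files_sync` is a hypothetical stand-in for a blocking boto3 DynamoDB query, and the function names are illustrative rather than the project's own.

```python
import asyncio
import time


def query_files_sync(session_id: str) -> list[str]:
    """Hypothetical stand-in for a blocking boto3 DynamoDB query."""
    time.sleep(0.05)  # simulates the network round-trip
    return [f"{session_id}/report.xlsx"]


async def get_session_files(session_id: str) -> list[str]:
    # Run the blocking call in a worker thread so the event loop stays free
    # to schedule other coroutines while the query is in flight.
    return await asyncio.to_thread(query_files_sync, session_id)


async def main() -> list[list[str]]:
    # The two lookups overlap instead of serializing on the event loop.
    return await asyncio.gather(get_session_files("s1"), get_session_files("s2"))


results = asyncio.run(main())
```

Callers simply `await` the coroutine; there is no nested `asyncio.run` and no hand-managed executor.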
+
+---
+
+## User Default Model Preference
+
+User-saved default model preferences (set in Settings → Chat Preferences) are now actually applied when the chat starts. Previously the persisted `defaultModelId` was ignored and chat fell back to the hardcoded factory default — closes issue #161.
+
+### Backend
+
+- `backend/src/apis/app_api/chat/routes.py` and `backend/src/apis/inference_api/chat/routes.py` — new `_resolve_user_default_model()` helper looks up the persisted `defaultModelId` from user settings. Applied in `chat_agent_stream` and the invocations endpoint when the request does not specify a `model_id`
+- RBAC re-checks the resolved default at chat time, so a user whose access to the previously-saved default has been revoked falls back gracefully rather than getting a permission error mid-stream
+- A missing user-settings table now surfaces as `503 Service Unavailable` instead of silently dropping the user choice
+- `backend/src/apis/app_api/user_settings/routes.py` — defaults endpoint adjustments
+
+### Frontend
+
+- `frontend/ai.client/src/app/session/services/model/model.service.ts` — supports persisted default model resolution
+- `frontend/ai.client/src/app/settings/pages/chat-preferences/chat-preferences-settings.page.ts` — Chat Preferences page now wires the default model picker to the persisted setting
+
+### Test Coverage
+
+- `model.service.spec.ts` — 56 lines covering the default-model resolution flow
+- `chat-preferences-settings.page.spec.ts` — 101 lines covering the settings UI
+
+---
+
+## Nightly E2E Pipeline Restored
+
+The nightly E2E pipeline had been red since the multi-stack deployment hit a series of cookie/JWT validation issues against the dynamic CloudFront URL. This release lands the fixes that turn the pipeline green:
+
+- CloudFront URL handling for cookie auth in the test environment
+- CDK certificate ARN wiring through the nightly job
+- Increased agent test time limits (the multi-tool turns were tripping default timeouts)
+- Switched the nightly suite from global Bedrock model IDs to US-region IDs to avoid cross-region routing flakes
+- Rebased fix branch on `develop` to pick up the release-notes strategy update from #248
+
+---
+
+## Upstream Contribution Governance
+
+A non-code change worth flagging because it changes how external contributors interact with this repository.
+
+- **`CONTRIBUTING.md`** — pull requests are now restricted to approved collaborators only (GitHub "Collaborators only" setting). The repository remains source-available under PolyForm Noncommercial 1.0.0; issues stay open to everyone for bug reports and proposed changes, and a maintainer triages each one. The contributing guide explains the path: open an issue → maintainer triages → maintainer either implements upstream or coordinates next steps with the reporter.
+- **`.github/dependabot.yml`** — `open-pull-requests-limit: 0` across all four ecosystems (pip, frontend npm, infrastructure npm, github-actions). Scheduled version-update PRs are off; we handle dependency upgrades manually on a weekly cadence. Dependabot **security updates** are unaffected — when a CVE is published against a dependency, you'll still see a PR.
+
+The full schedules, groups, and labels are retained in the config so flipping the limit back to a positive number restores the previous behavior with a one-line change.
+
+---
+
+## Documentation
+
+- `backend/src/.env.example` — BFF cookie encryption architecture documentation updated to reflect the beta.25 shift from direct KMS cookie encryption to Secrets Manager-mediated approach. Clarifies that the `BFFCookieSigningKey` CMK now encrypts the Secrets Manager secret at rest (not the cookie directly), documents the new `BFF_COOKIE_DATA_KEY_SECRET_ARN` variable, explains the cross-task SHA-256 derivation, and adds the SSM parameter path for locating the secret ARN with an example ARN format
+
+---
+
+## 📦 Dependencies
+
+No dependency upgrades in this release. Dependabot version-update PRs are disabled going forward; the next deps refresh will land as a manually curated batch.
+
+---
+
+## 🏗️ Infrastructure
+
+No infrastructure changes. No new resources, no IAM changes, no SSM parameter changes.
+
+---
+
+## 🔧 CI/CD
+
+- Nightly E2E pipeline fixes (#290) — CloudFront URL handling, CDK certificate ARN, agent test timeouts, US-region Bedrock model IDs
+
+---
+
+## 🚀 Deployment notes
+
+- Deploy in any order. No schema, infrastructure, or IAM changes.
+- After deployment, set the `MAX_SHEETS_TO_CONVERT` and `MAX_ROWS_PER_SHEET` env vars on the Inference API task definition if you want non-default caps for the spreadsheet analysis tool. Reasonable defaults are baked into the code; only set these if your workbooks routinely need higher limits.
+- **Manual follow-up (not deploy-blocking):** in the GitHub repo settings, flip **Settings → General → Pull Requests → Collaborators only** to actually enforce the contribution policy documented in `CONTRIBUTING.md`. Verify **Settings → Code security → Dependabot security updates** is still enabled — we explicitly want CVE-driven PRs to keep flowing even with version-update PRs disabled.
+
+---
+
+# Release Notes — v1.0.0-beta.25
+
+**Release Date:** May 11, 2026
+**Previous Release:** v1.0.0-beta.24 (May 6, 2026)
+
+---
+
+## Highlights
+
+This release is the **production-readiness fix for the BFF Token Handler** shipped in v1.0.0-beta.24. Beta.24 rewrote the SPA's auth surface onto cookie-based sessions but left three production-breaking bugs that only surfaced under real traffic: the `SessionRefreshMiddleware` ran synchronous boto3 on the uvicorn event loop so Angular's ~8-endpoint page-load fan-out produced ~16 serialized blocking AWS calls per user per minute (504s, 80s `/files/quota` tails, 15.6s p-max on a 0.7% CPU task); the `CookieCodec` minted a fresh random AES-256 key per process, so as soon as we raised `desiredCount` for concurrency slack every cookie started failing as `bad seal` on ~50% of requests; and the per-session refresh lock only coalesced in-process, so two tasks could still race `cognito-idp:initiate_auth` with the same refresh token and Cognito's rotation would silently log out the loser. This release lands the **event-loop offload + single-flight resolve**, a **cross-task shared AES key via Secrets Manager**, and a **DDB conditional-write refresh lock** that elects exactly one leader fleet-wide.
+
+Also shipping: **server-rendered PDF page-1 thumbnails** on attachment cards, **rich iMessage-style image mosaics** with a full-screen lightbox and inline markdown preview for `.md` files in user messages, **spreadsheet analysis tools** (`list_spreadsheets`, `analyze_spreadsheet`) that run CSV/XLSX analysis inside the Code Interpreter sandbox, **centralized 401 handling** with proactive session-loss detection on tab refocus, and a **`SKIP_AUTH=true` local-dev bypass** gated by a CORS-origin allowlist and a CI guard workflow. Token accounting was corrected across the board — per-message cost no longer double-counts tool-use turns and the context-% badge reflects current context occupancy rather than Strands' summed-across-calls value.
+
+### Heads-up on beta.24
+
+If you deployed beta.24 to a multi-replica environment, you saw some or all of: 401 storms on `/auth/session`, page-load latency tails in the tens of seconds, and users silently logged out after tab refocus. Beta.25 is the fix. The CookieCodec and refresh-lock changes require redeploying the Infrastructure and App API stacks in order — see **🚀 Deployment notes** at the bottom.
+
+---
+
+## BFF Middleware Event-Loop Blocking & Fan-Out Amplification
+
+The middleware introduced in beta.24 ran three independent classes of work on the uvicorn event loop that weren't safe to run there: synchronous boto3 for DynamoDB + Cognito, an inline-awaited sliding-session write on the response path, and a refresh-coalescing lock that only wrapped the Cognito exchange instead of the full resolve path. Under Angular's ~8-endpoint page-load fan-out with a cold `SessionCache` window, a single cookie-bearing user produced ~16 serialized blocking AWS round-trips on one uvicorn worker running in a single ECS task — every slow call stalled every concurrent request on the same task. The observable symptoms were ALB 504s, `TargetResponseTime` p-max of 15.6s at 0.7% CPU, `/files/quota` outliers reaching ~80s, and endpoint p95s climbing into the hundreds of ms under trivial load. (#264)
+
+### How it works now
+
+`SessionRepository.{get,put,update_tokens,touch_last_seen,delete}` and `CognitoRefreshClient.refresh` now offload every boto3 call via `asyncio.to_thread`, so the event loop keeps scheduling other coroutines for the full AWS round-trip duration. A new per-session single-flight primitive (`apis/shared/sessions_bff/single_flight.py`) wraps the whole `cache.get → repository.get → needs_refresh → (maybe refresh)` block in `SessionRefreshMiddleware._resolve_session` — the first caller per `session_id` runs the loader; N concurrent followers await a shared `asyncio.Future` and consume the leader's result. The existing `get_session_lock(session_id)` around the Cognito exchange is preserved end-to-end as defense in depth. `_maybe_slide` no longer `await`s `touch_last_seen` inline — the DDB write dispatches as a detached `asyncio.Task` and the response returns the fresh `Max-Age` synchronously. The cache/throttle boundary alignment that forced a single request to pay both `get_item` and `update_item` on the cache-miss boundary has been de-aligned: `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS` is now a strict multiple of `_DEFAULT_REFRESH_LEEWAY_SECONDS` (300s vs 60s).
+
+### Backend
+
+- `apis/shared/sessions_bff/repository.py` — every boto3 call now wrapped in a nested sync helper invoked via `await asyncio.to_thread(helper, ...)`; method signatures, return types, and exception branches unchanged
+- `apis/shared/sessions_bff/refresh.py` — `refresh` is now `async def`, calling `await asyncio.to_thread(self._refresh_sync, ...)`; `CognitoRefreshError` contract and `RefreshResult` shape preserved verbatim
+- `apis/shared/sessions_bff/single_flight.py` — new module. `async def resolve_once(session_id, loader_coro_factory) -> tuple[Optional[SessionRecord], bool]`. Leader registers an `asyncio.Future` under a thread-lock-guarded `dict`, runs the loader, sets the result/exception on the Future, removes the registry entry in a `finally` block. Followers `await` the existing Future. Distinct `session_id`s never share a Future
+- `apis/shared/middleware/session_refresh.py` — `_resolve_session` wraps the cache/repo/refresh block in `resolve_once(session_id, _loader)`. `_maybe_slide` updates the local cache synchronously and dispatches `touch_last_seen` via `asyncio.create_task`, keeping the task on `self._slide_tasks` with an `add_done_callback(self._slide_tasks.discard)`. Python's asyncio docs explicitly warn that unreferenced tasks can be garbage-collected mid-flight, and our initial fix hit exactly this footgun (caught by CI on Python 3.12)
+- `apis/shared/sessions_bff/config.py` — `_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS` raised 60s → 300s. Strict multiple of the 60s leeway guarantees cache-miss and slide-throttle boundaries never coincide
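
The detached-task pattern from the `_maybe_slide` note can be sketched as below. The class, the fake write, and the returned `Max-Age` value are hypothetical; only the `create_task` plus strong-reference-set plus `add_done_callback(discard)` shape mirrors the description above.

```python
import asyncio


class SlideDispatcher:
    """Illustrative stand-in for the middleware's slide path."""

    def __init__(self) -> None:
        # Strong references keep in-flight tasks from being garbage-collected.
        self._slide_tasks: set = set()
        self.writes: list = []

    async def _touch_last_seen(self, session_id: str) -> None:
        await asyncio.sleep(0.01)  # stand-in for the DynamoDB write
        self.writes.append(session_id)

    def maybe_slide(self, session_id: str) -> int:
        task = asyncio.create_task(self._touch_last_seen(session_id))
        self._slide_tasks.add(task)
        # Drop the reference once the task finishes so the set does not grow.
        task.add_done_callback(self._slide_tasks.discard)
        return 300  # hypothetical Max-Age, returned without awaiting the write


async def demo():
    dispatcher = SlideDispatcher()
    max_age = dispatcher.maybe_slide("abc")
    pending_at_return = len(dispatcher._slide_tasks)
    await asyncio.sleep(0.05)  # give the background write time to finish
    return max_age, pending_at_return, dispatcher.writes, len(dispatcher._slide_tasks)


result = asyncio.run(demo())
```

The response path gets its value immediately, while the write completes in the background and removes itself from the set.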
+
+### Infrastructure
+
+- `infrastructure/cdk.context.json` — `appApi.desiredCount` raised 1 → 2 for concurrency slack. A single blocked event loop on one task can no longer halt all ingress
+
+### Test Coverage
+
+~900 lines of new property-based tests. `test_session_refresh_bug_condition.py` encodes each of the seven sub-conditions as a hypothesis property that fails on unfixed code and passes on fixed code (Property 1 / Expected Behavior from the bugfix spec). `test_session_refresh_preservation.py` locks in the 11 preservation invariants that must stay unchanged for non-buggy inputs — dormant pass-through, no-cookie pass-through, unrecoverable-cookie clearing, `Max-Age` re-emit contract, refresh-storm coalescing, codec + client-secret singletons, CSRF decision unchanged, absolute-lifetime cap, fail-closed rotation, uniform `CookieDecodeError` handling. `test_single_flight.py` covers the primitive itself: concurrent callers share one loader invocation, exceptions propagate to every waiter, registry entries clean up after failure, distinct sessions are independent.
+
+---
+
+## BFF Cross-Task Cookie & Refresh Correctness
+
+The `desiredCount: 1 → 2` bump in the event-loop fix immediately exposed two latent defects in beta.24's BFF design that were hidden when only one task existed. Both had to be fixed before the deployment was actually safe to run with more than one replica. (#273, #274, #275)
+
+### Shared AES-256 data key via Secrets Manager
+
+`CookieCodec` in beta.24 called `kms:GenerateDataKey` on first use per process and cached the resulting plaintext AES-256 key in memory. The code's own docstring predicted what would happen with more than one task: _"two codecs in one process can never decrypt each other's output."_ And that's what happened — Task A sealed a cookie with Key-A, the ALB routed the follow-up to Task B which had its own Key-B, `unseal` hit `InvalidTag` → `CookieDecodeError` → `Discarding unrecoverable BFF cookie (bad seal)` → 401. CloudWatch confirmed: three app-api streams each independently logged _"BFF cookie codec initialized (KMS data key fetched)"_ and every subsequent `/auth/session` returned 401.
+
+The fix moves the data key out of per-process state and into a single Secrets Manager secret, encrypted at rest by the existing `BFFCookieSigningKey` CMK:
+
+- CDK creates `BFFCookieDataKeySecret` with `generateSecretString` (44-char alphanumeric, ~261 bits of entropy). On subsequent deploys the secret already exists, so the value is stable and cookies survive redeploys
+- `CookieCodec._ensure_cipher` reads the secret string and applies SHA-256 to derive the 32-byte AES-256 key. Single-shot SHA-256 of a ≥256-bit-entropy random input is a sound KDF for AES-256 usage
+- Every app-api task decrypts the same secret and derives the same key → all codecs round-trip each other's seals. The `kms:GenerateDataKey` permission dropped from the runtime task role (least privilege); `kms:Decrypt` stays because Secrets Manager invokes it on the caller's behalf when reading a CMK-encrypted secret
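
The derivation step can be sketched with the standard library alone. The generator below only imitates a 44-character alphanumeric secret locally, and the UTF-8 encoding is an assumption; the real value comes from Secrets Manager.

```python
import hashlib
import math
import secrets
import string


def generate_secret_string(length: int = 44) -> str:
    """Local imitation of a 44-char alphanumeric secret (illustration only)."""
    alphabet = string.ascii_letters + string.digits  # 62 symbols
    return "".join(secrets.choice(alphabet) for _ in range(length))


def derive_aes256_key(secret_string: str) -> bytes:
    # Single-shot SHA-256 yields exactly 32 bytes, the AES-256 key size.
    # The encoding choice (UTF-8) is an assumption of this sketch.
    return hashlib.sha256(secret_string.encode("utf-8")).digest()


shared_secret = generate_secret_string()

# Two tasks reading the same secret derive the same key independently.
key_task_a = derive_aes256_key(shared_secret)
key_task_b = derive_aes256_key(shared_secret)

# Entropy of 44 symbols drawn uniformly from a 62-symbol alphabet.
entropy_bits = 44 * math.log2(62)
```

Because the derivation is deterministic, no key material has to be exchanged between tasks; each one needs only read access to the secret.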
+
+A previous attempt at this bootstrap (#273's initial chained `AwsCustomResource` flow with `kms:GenerateDataKey → secretsmanager:PutSecretValue`) failed stack create with `Response object is too long`. Root cause: the `AwsCustomResource` framework Lambda JSON-stringifies the AWS-SDK response before applying `outputPaths`, and KMS returns `CiphertextBlob` as a Uint8Array that serializes as `{"0":233,"1":18,...}`, roughly 1.5 KB for a 200-byte ciphertext, which inflated the response past CloudFormation's 4 KB response-object limit. The Secrets-Manager-native `generateSecretString` path in #274 removes the chained custom resources entirely (-153 lines net), eliminates the per-cold-start `kms:Decrypt` call, and simplifies the runtime IAM surface.
+
+### Cross-task refresh lock via DDB conditional-write
+
+The in-process single-flight and the existing `get_session_lock` only coalesce same-session callers within one Python process. Once the cookie-codec fix lands and both tasks can share cookies again, `desiredCount: 2` means two tasks can each receive a same-session request inside the refresh-leeway window, and each calls `cognito-idp:initiate_auth` with the same refresh token. Cognito rotates on the winning call; the loser receives `NotAuthorizedException`, the loser's middleware clears the cookie, and the user is silently logged out.
+
+- `SessionRepository.try_acquire_refresh_lock(session_id, owner, lock_ttl_seconds)` — conditional `UpdateItem` that succeeds iff `attribute_not_exists(refresh_lock_until) OR refresh_lock_until < :now` AND `attribute_exists(PK)` (no phantom rows for sessions that don't exist). Loser returns `False`
+- `SessionRepository.update_tokens` gains `expected_lock_owner=...` — when supplied, the write conditionally requires `refresh_lock_owner = :owner` (strict, not "owner-or-absent") and atomically `REMOVE`s the lock attrs in the same write. The stale-leader-stomp case (Task A's lock expires, Task B refreshes, Task A returns with older tokens) now surfaces as `ConditionalCheckFailedException` so the caller can re-read and adopt the peer's tokens
+- `SessionRepository.release_refresh_lock(session_id, owner)` — best-effort cleanup for the leader-failed path so a peer doesn't have to wait the full TTL before retrying
+- `SessionRefreshMiddleware._resolve_session._loader` — two-tier coalescing: (1) existing `get_session_lock` collapses N in-process same-session callers to one contender; (2) `try_acquire_refresh_lock` elects exactly one leader fleet-wide. Followers poll the row via `_wait_for_peer_refresh` and adopt the leader's tokens (rotation detected by refresh-token mismatch; non-rotation by access-token mismatch + future-dated `exp`). Absolute-lifetime guard added ahead of the lock acquisition — if `now > created_at + absolute_lifetime_seconds`, clear the cookie instead of burning a Cognito refresh on a row that's about to TTL-evict
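
The acquire condition can be sketched as a request builder plus a pure-Python mirror of the same predicate. The key format (`SESSION#...`), the absence of a sort key, and both function names are assumptions; the attribute names and the condition itself follow the bullets above.

```python
import time
from typing import Any, Dict, Optional


def build_acquire_lock_request(
    session_id: str, owner: str, lock_ttl_seconds: int, now: Optional[int] = None
) -> Dict[str, Any]:
    """Build keyword arguments for a DynamoDB UpdateItem call (hypothetical helper)."""
    now = int(time.time()) if now is None else now
    return {
        "Key": {"PK": f"SESSION#{session_id}"},  # key format is an assumption
        "UpdateExpression": "SET refresh_lock_until = :until, refresh_lock_owner = :owner",
        # AND binds tighter than OR in DynamoDB expressions, so the OR is parenthesized.
        "ConditionExpression": (
            "attribute_exists(PK) AND "
            "(attribute_not_exists(refresh_lock_until) OR refresh_lock_until < :now)"
        ),
        "ExpressionAttributeValues": {
            ":until": now + lock_ttl_seconds,
            ":owner": owner,
            ":now": now,
        },
    }


def would_acquire(row: Optional[Dict[str, Any]], now: int) -> bool:
    """Pure-Python mirror of the condition, for reasoning about outcomes."""
    if row is None:
        return False  # no phantom rows for sessions that do not exist
    until = row.get("refresh_lock_until")
    return until is None or until < now


request = build_acquire_lock_request("sid-1", "task-a", lock_ttl_seconds=30, now=1000)
```

With a boto3 `Table` resource, a dict like this could be passed as `table.update_item(**request)`; a failed condition surfaces as a `ClientError` whose error code is `ConditionalCheckFailedException`, which the caller would translate into "lost the election".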
+
+### Test Coverage
+
+Cross-task integration tests (`test_session_refresh_cross_task.py`, 480 lines) run two `SessionRefreshMiddleware` instances against one moto DDB table and exercise leader/follower paths, follower-polling-then-adopting, lock TTL recovery after a dead leader, follower-fall-back-terminal when the leader is stuck, and the headline invariant: two tasks racing in parallel call Cognito at most once. Eight new repository tests pin the lock primitive's shape, plus targeted tests for the strict-owner release condition and the phantom-row-prevention guard on acquire.
+
+### Infrastructure
+
+- New `BFFCookieDataKeySecret` (Secrets Manager), encrypted with `BFFCookieSigningKey`. SSM parameter `/${projectPrefix}/auth/bff-cookie-data-key-secret-arn` publishes the ARN for app-api
+- App-api task role: added `secretsmanager:GetSecretValue` on the new secret; kept `kms:Decrypt` (needed by Secrets Manager to read the CMK-encrypted secret); removed `kms:GenerateDataKey` and `kms:DescribeKey`
+- No IAM change required for the DDB refresh lock — app-api task role already had `dynamodb:UpdateItem` on `BFFSessionsTable`
+
+### Breaking changes
+
+- None user-facing. The new env var and SSM parameter are additive; existing deployments redeploy Infrastructure first, then App API, to pick up the shared secret
+
+---
+
+## Token Accounting Correctness
+
+Two related bugs were inflating cost and context-% reporting on tool-use turns. (#270)
+
+### Per-message cost double-count
+
+Strands emits per-LLM-call metadata (each call's tokens) AND a final `AgentResultEvent` whose `EventLoopMetrics.accumulated_usage` is summed across every call in the turn. Both were emitted as `metadata` events and routed into `per_message_metadata[current_assistant_message_index]["usage"]` via `.update()`. Because the `AgentResult` event arrives after every `message_stop`, the index still pointed at the last assistant message — so cumulative tokens overwrote that message's per-call values, double-counting earlier messages' input tokens when each entry was priced and summed.
+
+Fix: route the cumulative usage extracted from the result onto the existing `metadata_summary` (turn-summary) track instead of `metadata`. The `stream_processor` main loop consumes both event types into `accumulated_metadata` so the final summary still carries true totals.
+
+### Context-% inflation within a tool turn
+
+Bedrock reports each per-LLM-call `inputTokens` as the FULL context size sent on that call. For a 2-call tool turn (`call_1.input=1000`, `call_2.input=2500`), Strands' `accumulated_usage` reports 3500 — but the actual current context occupancy is 2500. The final SSE `usage` field driving the context-% badge and compaction trigger was inheriting Strands' summed value.
+
+Fix: `stream_coordinator` no longer accumulates `metadata_summary` into `accumulated_metadata`. Per-call `metadata` events last-write-wins via `.update()`, so `accumulated_metadata.usage` equals the most recent call's full input = current context. Added a `CAUTION` comment noting `AgentResult.context_size` / `EventLoopMetrics.latest_context_size` return only `inputTokens` (excluding `cacheRead` / `cacheWrite`) — under prompt caching they under-report by 99%+, so we deliberately sum all three buckets. `TTFT` placeholder of 0 changed to `null` (a real time-to-first-token can never be 0ms and aggregations need to distinguish absence from a real zero); `LatencyMetrics.time_to_first_token` is now `Optional[int]` in both the shared and app-api models.
+
+### Test Coverage
+
+`test_per_message_cost_attribution.py` pins the `metadata` vs `metadata_summary` contract, the main-loop accumulator's both-tracks consumption, and the `stream_coordinator` current-context semantics (two parametrized cases plus all-three-buckets-summed for cache-read/write). Direct unit coverage for `CostCalculator` arrived in `test_calculator.py` (26 cases: per-bucket pricing, cache scenarios against Sonnet 4.5 rates, defensive missing-key / None handling, `calculate_cache_savings`, `validate_pricing` / `validate_usage`).
+
+---
+
+## Auth UX & Local-Dev Bypass
+
+### Centralized 401 handling + proactive session detection
+
+Beta.24 only redirected on a 401 from the SessionService bootstrap path, so a session that expired while the app was in use left the user stranded with a generic toast (CRUD endpoints) or no feedback at all (SSE chat stream). Every 401 now flows through `SessionService.handleUnauthorized()`, which dedupes concurrent calls and queues a single navigation to `/auth/login` with a preserved `returnUrl`. Session loss is also surfaced proactively rather than waiting for the next HTTP call to fail: (#277)
+
+- **Cookie-presence fast-path** in bootstrap and recheck. The JS-readable `__Host-bff_csrf` cookie is set and cleared alongside `__Host-bff_session` with matching `Max-Age`, so if the CSRF cookie is gone the session cookie is gone too — skip the `/auth/session` round-trip and bounce straight to login
+- **Visibility re-probe** in the app shell. On tab refocus, `recheck()` runs the cookie check and falls back to `/auth/session`, so a session that expired while the tab was backgrounded is caught immediately rather than on the next user action
+
+### `SKIP_AUTH=true` local-dev bypass
+
+A single-env-var bypass for unattended local dev (and Claude Code agents) that can't round-trip through an external IdP. (#272)
+
+- Returns a fake admin `User` from the three auth dependencies in `apis.shared.auth.dependencies`; CSRF middleware, RBAC, and profile cache flow naturally because no `bff_session` is resolved
+- **Allowlist startup guard** in `app_api/main.lifespan` — app refuses to boot when `SKIP_AUTH=true` is paired with any non-localhost entry in `CORS_ORIGINS` (or an empty `CORS_ORIGINS`). Fails closed for deploy targets we haven't anticipated rather than blocklisting known cloud env vars
+- **CI guard workflow** (`.github/workflows/skip-auth-guard.yml`) — greps CDK source, workflow files, and Dockerfiles for `SKIP_AUTH=true` / `SKIP_AUTH: true` patterns and fails the build if any leak into deployed config
+- Inference-api is intentionally not bypassed — all SPA traffic flows through app-api per the BFF pattern, so one bypass is sufficient
+- Optional tuning: `SKIP_AUTH_ROLES`, `SKIP_AUTH_USER_ID`, `SKIP_AUTH_EMAIL` override the default fake user
+
+### Lava-lamp backdrop dark-mode fix
+
+The dark-mode CSS for the auth pages' lava-lamp backdrop and frosted-glass card never applied on cold load: hand-written `html.dark .X` selectors don't match under Angular's emulated view encapsulation, and `ThemeService` (`providedIn: 'root'`) was never injected by anything in the pre-auth tree. Switched the auth-page CSS to `:host-context(html.dark) .X` (the pattern other component-scoped styles already use) and forced `ThemeService` to construct at bootstrap via `provideAppInitializer`, so the persisted/system theme is applied to `<html>` before any route renders, including `/auth/login` and `/auth/first-boot` on cold load. (#271)
+
+---
+
+## Attachments: PDF Thumbnails, Rich Previews, Markdown Modal
+
+### Server-rendered PDF page-1 thumbnails
+
+Real first-page thumbnails for PDF attachments instead of the skeleton mockup. Page rasterization runs in app-api via `pypdfium2` (Apache 2.0 / BSD, bundled PDFium binary, no system `poppler`/`ghostscript`). (#263)
+
+- New `ThumbnailRenderer` with a MIME-type dispatcher; PDF only today. Class docstring documents the recommended out-of-process design for `.docx` / `.xlsx` so the dispatcher stays small
+- `GET /files/{upload_id}/thumbnail` — lazy: HEAD-checks for a cached `_thumb.png` sibling next to the original, renders + stores on miss, returns a short-lived presigned GET URL. 415 for unsupported MIME types, 422 for unreadable / corrupt PDFs. Render runs in `loop.run_in_executor` so request workers aren't blocked
+- Single-file and session-cascade deletes also remove the thumbnail sibling
+- `FileUploadService.getThumbnail()` returns a typed result so callers switch on `ready` / `unsupported` / `unavailable` without parsing HTTP errors. Badge fetches on mount for PDFs and renders as `object-cover`, suppressing the bottom fade. Silent fall-back to the skeleton on any error
+
+### Rich previews in user messages
+
+The dense badge is replaced with a richer attachment renderer in user message history. (#254)
+
+- **Images** render as an iMessage-style mosaic: 1-bubble, 2-col, 1+2 split, 2×2 grid, 5+ with `+N` overlay. Opens in a full-screen lightbox with arrow-key navigation
+- **Non-image files** render as a document-style card: tinted header strip with type chip, white "page" body with a folded corner, filename + size footer. Text-based files (txt, md, csv, html) show a real content excerpt; binary types (pdf, docx, xls/xlsx) get skeleton lines
+- `GET /files/{upload_id}/preview-url` — short-lived presigned GET URL scoped to the file owner, used for inline images and the lightbox
+- `GET /files/{upload_id}/text-snippet` — first 2KB of a text-based file decoded as UTF-8 for the document card content peek
+
+### Inline markdown preview for `.md` files
+
+Parsed markdown renders in the attachment card excerpt instead of raw text; clicking a `.md` card opens a full-screen modal viewer rather than opening the raw source in a new tab. Reuses `ngx-markdown` (already wired up for assistant messages) and the existing presigned preview-url flow. (#262)
+
+---
+
+## Spreadsheet Analysis Tools
+
+New spreadsheet analysis capability for CSV/XLSX files. (#f88ce7ec, #0ab90bb1)
+
+- `list_spreadsheets` — enumerates CSV/Excel files from knowledge bases and chat attachments; includes file size and MIME type metadata
+- `analyze_spreadsheet` — downloads files from S3, executes Python analysis via Code Interpreter, returns results. Intelligent schema detection with skiprows probing handles report-style exports with metadata rows. Stderr is cleaned to filter pandas/numpy internal frames and show only user-relevant errors. Output truncated at 10K chars, errors at 600 chars, to prevent context-window overflow
+- Tools injected per-request into `ToolRegistry` via `extra_tools`; chat routes (app-api and inference-api) pass conversation context to the factories
+- Targeted error hints for XLSX→CSV filename mismatches in the sandbox environment; tolerant filename matching for CSV↔XLSX aliasing to prevent retry loops; schema footer preservation on errors for better retry context
+- File metadata models and utilities for consistent attachment handling; stream processor error handling improved for Code Interpreter responses
+
+---
+
+## 📦 Dependencies
+
+| Package | From | To |
+|---|---|---|
+| strands-agents (backend) | 1.37.0 | 1.39.0 |
+| strands-agents-tools (backend) | 0.5.1 | 0.5.2 |
+| pypdfium2 (backend, new) | — | 4.30.0 |
+
+`CacheConfig(strategy="auto")` remains intentionally deferred on `BedrockModel`. The strands v1.39.0 bump includes the SDK-side fix (strands PR #1438 — `cachePoint` blocks alongside non-PDF document attachments), so the technical barrier is gone — but the user-visible cost/badge impact warrants a separate scoped rollout. (#265)
+
+---
+
+## 🏗️ Infrastructure
+
+- **New**: `BFFCookieDataKeySecret` (Secrets Manager), encrypted at rest with the existing `BFFCookieSigningKey` CMK. SSM parameter `/${projectPrefix}/auth/bff-cookie-data-key-secret-arn`
+- **Changed**: `appApi.desiredCount` raised 1 → 2
+- **IAM delta on app-api task role**: added `secretsmanager:GetSecretValue` on `BFFCookieDataKeySecret`; removed `kms:GenerateDataKey` and `kms:DescribeKey` on `BFFCookieSigningKey`; kept `kms:Decrypt` (Secrets Manager invokes it on the caller's behalf when reading a CMK-encrypted secret)
+- **No new tables**. The cross-task refresh lock reuses `BFFSessionsTable` via conditional `UpdateItem`
+
+---
+
+## 🔧 CI/CD
+
+- **New workflow**: `.github/workflows/skip-auth-guard.yml` — greps CDK source, workflow files, and Dockerfiles for `SKIP_AUTH=true` / `SKIP_AUTH: true` patterns and fails the build if any leak into deployed config. Uses SHA-pinned `actions/checkout` and `ubuntu-24.04` per existing supply-chain conventions in `tests/supply_chain/`
+
+---
+
+## 🚀 Deployment notes
+
+Deploy Infrastructure first, then App API, in that order.
+
+1. **Infrastructure stack** creates `BFFCookieDataKeySecret` and publishes its ARN to SSM. The secret value is generated by Secrets Manager on create and stays stable across subsequent deploys — cookies survive redeploys
+2. **App API stack** picks up `BFF_COOKIE_DATA_KEY_SECRET_ARN` on the next task rotation; existing tasks keep the old per-process data key until they drain. Both states coexist cleanly — new tasks seal under the shared key; old tasks still seal under their own; unsealing on a task that holds a different key fails the same way it does today and the SPA bounces to login. End state (all tasks rotated): cookies round-trip cleanly across the fleet
+3. **`desiredCount: 2` takes effect** on the App API stack's next deploy. CloudFormation scales up without draining traffic; the fix makes multi-replica safe
+
+No manual cleanup required if you were running on beta.24 — the migration is forward-only. If you want zero-drift on the user population, invalidate active sessions once post-deploy: `aws dynamodb scan --table-name ${BFFSessionsTable} --select COUNT` then a bulk delete, or just let the 30-day absolute-lifetime cap roll them off naturally.
+
+---
+
+
 
 ---
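A minimal sketch of the context-% arithmetic described in the Token Accounting Correctness notes above, using the 2-call tool turn from that section. The function names are hypothetical, and the usage key names (`inputTokens`, `cacheReadInputTokens`, `cacheWriteInputTokens`) are assumed to follow the Bedrock Converse usage shape; this is an illustration, not the repo's `stream_coordinator` code.

```python
from typing import TypedDict


class CallUsage(TypedDict, total=False):
    # Assumed key names, modeled on the Bedrock Converse usage block.
    inputTokens: int
    cacheReadInputTokens: int
    cacheWriteInputTokens: int


def summed_input(calls: list[CallUsage]) -> int:
    """Accumulated-usage view: adds every call's inputTokens together."""
    return sum(call.get("inputTokens", 0) for call in calls)


def current_context(calls: list[CallUsage]) -> int:
    """Context occupancy: the most recent call's input across all three buckets."""
    if not calls:
        return 0
    last = calls[-1]
    return (
        last.get("inputTokens", 0)
        + last.get("cacheReadInputTokens", 0)
        + last.get("cacheWriteInputTokens", 0)
    )


# The 2-call tool turn from the release notes: each call reports its full context.
turn: list[CallUsage] = [{"inputTokens": 1000}, {"inputTokens": 2500}]
print(summed_input(turn), current_context(turn))
```

Under prompt caching most of the prompt is reported in the cache buckets, which is why reading `inputTokens` alone would under-report occupancy and why the sketch sums all three.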
 
diff --git a/VERSION b/VERSION
index 33008cb0..c15e72b7 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.0.0-beta.24
+1.0.0-beta.28
diff --git a/backend/Dockerfile.app-api b/backend/Dockerfile.app-api
index 1a29e440..2f31de09 100644
--- a/backend/Dockerfile.app-api
+++ b/backend/Dockerfile.app-api
@@ -39,7 +39,7 @@ WORKDIR /app
 
 # Install runtime dependencies (curl required for HEALTHCHECK)
 RUN apt-get update && apt-get install -y \
-    curl=8.14.1-2+deb13u2 \
+    curl=8.14.1-2+deb13u3 \
     && rm -rf /var/lib/apt/lists/*
 
 # Copy the virtual environment from builder
diff --git a/backend/Dockerfile.inference-api b/backend/Dockerfile.inference-api
index 97a34fb9..0c7d17e1 100644
--- a/backend/Dockerfile.inference-api
+++ b/backend/Dockerfile.inference-api
@@ -39,7 +39,7 @@ WORKDIR /app
 
 # Install runtime dependencies (curl required for HEALTHCHECK)
 RUN apt-get update && apt-get install -y \
-    curl=8.14.1-2+deb13u2 \
+    curl=8.14.1-2+deb13u3 \
     && rm -rf /var/lib/apt/lists/*
 
 # Copy the virtual environment from builder
diff --git a/backend/pyproject.toml b/backend/pyproject.toml
index d6393e2b..e83c70af 100644
--- a/backend/pyproject.toml
+++ b/backend/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "agentcore-stack"
-version = "1.0.0-beta.24"
+version = "1.0.0-beta.28"
 requires-python = ">=3.10"
 description = "Multi-agent conversational AI system with AWS Bedrock AgentCore"
 readme = "README.md"
@@ -17,7 +17,7 @@ dependencies = [
     "uvicorn[standard]==0.46.0",
 
     # AWS and cloud services
-    "boto3==1.42.96",
+    "boto3==1.43.9",
 
     # Utilities
     "python-dotenv==1.2.2",
@@ -39,16 +39,20 @@ dependencies = [
     "cryptography==47.0.0",
     "python-multipart==0.0.27",
     "aiohttp==3.13.5",
+
+    # PDF page-1 rasterization for attachment thumbnails (Apache 2.0 / BSD).
+    # Bundles PDFium (no system poppler/ghostscript needed).
+    "pypdfium2==4.30.0",
 ]
 
 [project.optional-dependencies]
 # AgentCore-specific dependencies (for inference_api)
 agentcore = [
-    "strands-agents==1.37.0",
-    "strands-agents-tools==0.5.1",
+    "strands-agents==1.40.0",
+    "strands-agents-tools==0.5.2",
 
     "aws-opentelemetry-distro==0.17.0",
-    "bedrock-agentcore==1.6.4",
+    "bedrock-agentcore==1.9.1",
     # Multi-provider LLM support
     "openai==2.32.0",  # For OpenAI models
     "google-genai==1.73.1",  # For Google Gemini models
@@ -56,7 +60,7 @@ agentcore = [
 
 # Voice/BidiAgent dependencies (Nova Sonic speech-to-speech)
 bidi = [
-    "strands-agents[bidi]==1.37.0",
+    "strands-agents[bidi]==1.40.0",
 ]
 
 # Document ingestion pipeline dependencies (for Lambda deployment)
diff --git a/backend/scripts/seed_bootstrap_data.py b/backend/scripts/seed_bootstrap_data.py
index 5f745311..e603b5d7 100644
--- a/backend/scripts/seed_bootstrap_data.py
+++ b/backend/scripts/seed_bootstrap_data.py
@@ -207,10 +207,30 @@ def seed_default_quota_assignment(
 }
 
 
+# Sonnet 4.6 adds the `effort` knob (adaptive-thinking depth + overall token
+# spend). The per-model `allowed` set is the whole point of the design —
+# it's data, not code, so Opus 4.7 (which also gets `xhigh`/`max`) is just a
+# different array on a different record. Ordered low->high so future clamping
+# degrades gracefully. NOTE: Anthropic's published docs additionally list
+# `max` for Sonnet 4.6; this seeds the narrower low/medium/high set — widen
+# the array here if you want `max` exposed on this model.
+CLAUDE_SONNET_46_SUPPORTED_PARAMS: dict[str, Any] = {
+    "params": {
+        **CLAUDE_CHAT_SUPPORTED_PARAMS["params"],
+        "effort": {
+            "supported": True,
+            "allowed": ["low", "medium", "high"],
+            "default": "high",
+            "locked": False,
+        },
+    }
+}
+
+
 # Default Bedrock models to seed
 DEFAULT_MODELS: list[dict[str, Any]] = [
     {
-        "modelId": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
+        "modelId": "us.anthropic.claude-haiku-4-5-20251001-v1:0",
         "modelName": "Claude Haiku 4.5",
         "provider": "bedrock",
         "providerName": "Anthropic",
@@ -227,7 +247,7 @@ def seed_default_quota_assignment(
         "supportedParams": CLAUDE_CHAT_SUPPORTED_PARAMS,
     },
     {
-        "modelId": "global.anthropic.claude-sonnet-4-6",
+        "modelId": "us.anthropic.claude-sonnet-4-6",
         "modelName": "Claude Sonnet 4.6",
         "provider": "bedrock",
         "providerName": "Anthropic",
@@ -241,7 +261,7 @@ def seed_default_quota_assignment(
         "cacheReadPricePerMillionTokens": Decimal("0.30"),
         "supportsCaching": True,
         "isDefault": False,
-        "supportedParams": CLAUDE_CHAT_SUPPORTED_PARAMS,
+        "supportedParams": CLAUDE_SONNET_46_SUPPORTED_PARAMS,
     },
     {
         "modelId": "amazon.nova-2-sonic-v1:0",
@@ -390,6 +410,26 @@ def seed_default_models(
         "isPublic": False,
         "forwardAuthToken": False,
     },
+    {
+        "toolId": "create_artifact",
+        "displayName": "Create Artifact",
+        "description": "Save standalone HTML or Markdown documents as versioned artifacts the user can open and iterate on.",
+        "category": "document",
+        "protocol": "local",
+        "enabledByDefault": True,
+        "isPublic": True,
+        "forwardAuthToken": False,
+    },
+    {
+        "toolId": "update_artifact",
+        "displayName": "Update Artifact",
+        "description": "Replace an existing artifact's content, creating a new immutable version.",
+        "category": "document",
+        "protocol": "local",
+        "enabledByDefault": True,
+        "isPublic": True,
+        "forwardAuthToken": False,
+    },
 ]
 
 
diff --git a/backend/src/.env.example b/backend/src/.env.example
index 44ca0eb4..c3adccb2 100644
--- a/backend/src/.env.example
+++ b/backend/src/.env.example
@@ -56,6 +56,19 @@ AGENTCORE_MEMORY_TOP_K=10
 # Requires: Gateway deployed via GatewayStack and SSM parameter /{projectPrefix}/mcp/gateway-url
 AGENTCORE_GATEWAY_MCP_ENABLED=true
 
+# MCP Apps host renderer (docs/kaizen/scoping/mcp-apps-host-renderer.md)
+# Gates the entire MCP Apps host surface. Default true since PR #7; set
+# false to opt this environment out. With no MCP-Apps-capable server
+# registered in the tool catalog the surface stays dormant regardless.
+AGENTCORE_MCP_APPS_HOST_ENABLED=true
+# Origin of the sandbox-proxy (proxy.html) the SPA frames an MCP App in.
+# Surfaced to the SPA on the `ui_resource` SSE event so the frontend needs no
+# separate config fetch. The iframe cannot render until this is set: point it
+# at a deployed mcp-sandbox origin (SSM /{projectPrefix}/mcp-sandbox/origin,
+# published by the mcp-sandbox CDK stack). See the MCP Apps registration
+# runbook in .github/docs/deploy/step-04-deploy.md.
+AGENTCORE_MCP_APPS_SANDBOX_ORIGIN=
+
 # AgentCore Code Interpreter ID (OPTIONAL)
 # Purpose: AWS Bedrock AgentCore Code Interpreter for executing Python code in a sandbox
 # Features: Generate charts/diagrams with matplotlib, data analysis with pandas/numpy
@@ -69,6 +82,41 @@ AGENTCORE_CODE_INTERPRETER_ID=
 # DEVELOPMENT SETTINGS
 # =============================================================================
 
+# Local-dev auth bypass (OPTIONAL — LOCAL DEV ONLY)
+# Purpose: Skip the Cognito redirect and return a fake admin user from
+# the three auth dependencies (get_current_user,
+# get_current_user_from_session, get_current_user_trusted). Lets an
+# unattended agent or a dev with no IdP access hit protected app-api
+# routes without the OAuth round-trip.
+# Default: unset (auth enforced)
+# Guard rails:
+#   - app-api refuses to boot when SKIP_AUTH=true unless every entry in
+#     CORS_ORIGINS is a localhost URL (localhost, 127.0.0.1, ::1,
+#     0.0.0.0). This is the allowlist that keeps the bypass off
+#     deployed environments.
+#   - A CI workflow (.github/workflows/skip-auth-guard.yml) refuses any
+#     PR that puts SKIP_AUTH=true into Dockerfiles, CDK, scripts, or
+#     other workflows.
+#   - Inference-api is NOT bypassed — SPA traffic flows through app-api
+#     so a single chokepoint suffices.
+# DO NOT enable in any deployed environment.
+SKIP_AUTH=
+
+# Roles to assign the fake user (OPTIONAL — only read when SKIP_AUTH=true)
+# Purpose: Comma-separated list of roles for the bypass user. Drives
+# RBAC filtering (model visibility, admin endpoints, etc.).
+# Default: admin
+# Example: DotNetDevelopers,QA
+SKIP_AUTH_ROLES=admin
+
+# User ID to assign the fake user (OPTIONAL — only read when SKIP_AUTH=true)
+# Default: local-dev
+SKIP_AUTH_USER_ID=local-dev
+
+# Email to assign the fake user (OPTIONAL — only read when SKIP_AUTH=true)
+# Default: dev@local
+SKIP_AUTH_EMAIL=dev@local
+
 # Enable quota enforcement (OPTIONAL)
 # Purpose: Enforce user quota limits on chat requests
 # If true (default), checks user quota before processing each request
@@ -502,6 +550,7 @@ COGNITO_DOMAIN_URL=
 #   aws ssm get-parameters --names \
 #     /<projectPrefix>/auth/bff-sessions-table-name \
 #     /<projectPrefix>/auth/bff-cookie-signing-key-arn \
+#     /<projectPrefix>/auth/bff-cookie-data-key-secret-arn \
 #     /<projectPrefix>/auth/cognito/bff-app-client-id \
 #     /<projectPrefix>/auth/cognito/bff-app-client-secret-arn
 
@@ -513,13 +562,25 @@ COGNITO_DOMAIN_URL=
 BFF_SESSIONS_TABLE_NAME=
 
 # KMS key used to seal the session cookie (REQUIRED)
-# Purpose: Customer-managed KMS key. Cookie codec encrypts the session_id
-# under this key with version-byte AAD; rotating the key invalidates all
-# outstanding cookies.
+# Purpose: Customer-managed KMS key that encrypts the BFF cookie
+# data-key secret at rest in Secrets Manager. App-api never calls KMS
+# directly; Secrets Manager invokes kms:Decrypt on the caller's behalf
+# when GetSecretValue runs.
 # CDK Deployment: Created by InfrastructureStack
 # Example: arn:aws:kms:us-west-2:123456789012:key/abc-123
 BFF_COOKIE_SIGNING_KEY_ARN=
 
+# Secrets Manager ARN of the BFF cookie data-key secret (REQUIRED)
+# Purpose: High-entropy random secret generated once at stack create.
+# Every app-api task derives the same AES-256 cookie key via
+# SHA-256(secret_string), so cookies sealed by any task unseal on any
+# other task (required for desiredCount > 1). Without this var the
+# codec raises CookieDecodeError and /auth/callback returns 500.
+# CDK Deployment: Created by InfrastructureStack
+# Where to find: SSM /<projectPrefix>/auth/bff-cookie-data-key-secret-arn
+# Example: arn:aws:secretsmanager:us-west-2:123456789012:secret:<projectPrefix>-bff-cookie-data-key-XXXXXX
+BFF_COOKIE_DATA_KEY_SECRET_ARN=
+
 # Confidential Cognito app client used by the BFF (REQUIRED)
 # Purpose: Client ID for the server-side OAuth code exchange. Distinct
 # from any public PKCE client because the BFF holds a client_secret.
@@ -608,6 +669,38 @@ DYNAMODB_AUTH_PROVIDERS_TABLE_NAME=
 # Example: arn:aws:secretsmanager:us-west-2:123456789:secret:auth-provider-secrets-xxxxx
 AUTH_PROVIDER_SECRETS_ARN=
 
+# =============================================================================
+# ARTIFACTS CONFIGURATION
+# =============================================================================
+# SPA-rendered artifacts created by the create_artifact / update_artifact
+# agent tools (#306-#312). OPTIONAL: leave the secret ARN empty to disable.
+# CDK Deployment: all four created by ArtifactsStack when CDK_ARTIFACTS_ENABLED=true,
+# published to SSM at /{projectPrefix}/artifacts/* for the deployed stacks.
+
+# Secrets Manager ARN of the HMAC key that signs artifact render tokens
+# Purpose: signs/verifies the short-lived JWT the render Lambda checks
+# Feature gate: app-api ONLY registers the /artifacts routes when this is
+#   set and non-empty (apis/app_api/main.py) — empty here means a 404
+# Example: arn:aws:secretsmanager:us-west-2:123456789012:secret:artifact-render-token-key-xxxxx
+ARTIFACTS_RENDER_TOKEN_SECRET_ARN=
+
+# DynamoDB table for artifact metadata and versions
+# Purpose: HEAD/version rows + SessionIndex GSI the list endpoint queries
+#   and the agent-side create/update tools write
+# Example: bsu-agentcore-user-artifacts
+DYNAMODB_ARTIFACTS_TABLE_NAME=
+
+# S3 bucket for artifact content bodies
+# Purpose: stores the rendered artifact payloads written by the agent tools
+# Example: bsu-agentcore-artifacts-content
+S3_ARTIFACTS_BUCKET_NAME=
+
+# Origin the render token is bound to (no trailing slash)
+# Purpose: base URL of the artifacts CloudFront distribution; render-token
+#   URLs are issued as {origin}/?t=<jwt>
+# Example: https://artifacts.example.com
+ARTIFACTS_ORIGIN=
+
 # =============================================================================
 # FILE UPLOAD CONFIGURATION
 # =============================================================================
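The `SKIP_AUTH` guard rail documented in the `.env.example` comments above could be sketched roughly as follows. This is an assumption-laden illustration rather than the code in `app_api/main.lifespan`: the function name is hypothetical, and `CORS_ORIGINS` is assumed to be a comma-separated list of origin URLs.

```python
from urllib.parse import urlsplit

# Hosts the .env.example comment lists as acceptable for the bypass.
_LOCAL_HOSTS = frozenset({"localhost", "127.0.0.1", "::1", "0.0.0.0"})


def skip_auth_allowed(skip_auth: str, cors_origins: str) -> bool:
    """Return True when booting with these settings is acceptable."""
    if skip_auth.strip().lower() != "true":
        return True  # bypass not requested, nothing to guard
    origins = [o.strip() for o in cors_origins.split(",") if o.strip()]
    if not origins:
        return False  # an empty CORS_ORIGINS fails closed
    # Every origin must resolve to a local host; an unparseable entry has
    # hostname None, which is not in the allowlist, so it also fails closed.
    return all(urlsplit(o).hostname in _LOCAL_HOSTS for o in origins)


if __name__ == "__main__":
    print(skip_auth_allowed("true", "http://localhost:4200"))
    print(skip_auth_allowed("true", "https://app.example.com"))
```

Checking an allowlist of local hosts, instead of looking for known cloud markers, means an unanticipated deploy target is rejected by default.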
diff --git a/backend/src/agents/builtin_tools/__init__.py b/backend/src/agents/builtin_tools/__init__.py
index 5c0df74f..3c7ea2ed 100644
--- a/backend/src/agents/builtin_tools/__init__.py
+++ b/backend/src/agents/builtin_tools/__init__.py
@@ -2,10 +2,15 @@
 
 This package contains tools that leverage AWS Bedrock capabilities:
 - Code Interpreter: Execute Python code for diagrams and charts
+- Spreadsheet Analysis: Analyze tabular data via Code Interpreter (factory-produced, not in registry)
 """
 
 from .code_interpreter_diagram_tool import generate_diagram_and_validate
+from .spreadsheet_analysis import make_list_spreadsheets_tool, make_analyze_tool
 
+# Only static tools go in __all__ (registered in ToolRegistry at startup).
+# Factory-produced tools (make_list_spreadsheets_tool, make_analyze_tool) are created
+# per-request with context and injected via extra_tools — not registered here.
 __all__ = [
     'generate_diagram_and_validate',
 ]
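The split between static and factory-produced tools noted in the comment above can be pictured with plain closures. Everything in this sketch is hypothetical (the `STATIC_TOOLS` dict standing in for `ToolRegistry`, the `make_list_files_tool` factory, the `tools_for_request` helper); it only illustrates the per-request injection idea and does not use the Strands tool API.

```python
from typing import Callable

# Static tools: registered once at startup (a stand-in for ToolRegistry).
STATIC_TOOLS: dict[str, Callable[..., object]] = {
    "generate_diagram_and_validate": lambda spec: f"diagram:{spec}",
}


def make_list_files_tool(session_files: list[str]) -> Callable[[], list[str]]:
    """Hypothetical factory: closes over one request's attachment list."""

    def list_files() -> list[str]:
        return sorted(session_files)

    return list_files


def tools_for_request(session_files: list[str]) -> dict[str, Callable[..., object]]:
    """Merge static tools with per-request extras without mutating the registry."""
    extra_tools = {"list_files": make_list_files_tool(session_files)}
    return {**STATIC_TOOLS, **extra_tools}


if __name__ == "__main__":
    tools = tools_for_request(["b.csv", "a.xlsx"])
    print(tools["list_files"]())
```

Because the factory output never enters the shared registry, one request's conversation context cannot leak into another request's tool set.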
diff --git a/backend/src/agents/builtin_tools/artifacts/__init__.py b/backend/src/agents/builtin_tools/artifacts/__init__.py
new file mode 100644
index 00000000..761336e5
--- /dev/null
+++ b/backend/src/agents/builtin_tools/artifacts/__init__.py
@@ -0,0 +1,13 @@
+"""Artifact authoring tools.
+
+Lets the agent persist standalone HTML or Markdown documents as versioned artifacts.
+The writer here owns S3 upload + the DynamoDB version/HEAD rows; the
+render Lambda (`backend/src/lambdas/artifact_render/handler.py`) and the
+app-api render-token minter read those rows back. The DDB key layout and
+the `storage`/`content_key`/`content_type` attributes are a frozen
+cross-PR contract with both readers.
+"""
+
+from .tools import make_create_artifact_tool, make_update_artifact_tool
+
+__all__ = ["make_create_artifact_tool", "make_update_artifact_tool"]
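The frozen key layout that this package docstring refers to is spelled out in the `service.py` module docstring. A small sketch of that layout follows; the helper names are hypothetical and the real writer may build these keys differently.

```python
def version_key(user_id: str, aid: str, version: int) -> dict[str, str]:
    """Key of an immutable version row; the version is zero-padded to 5 digits."""
    return {"PK": f"USER#{user_id}", "SK": f"ARTIFACT#{aid}#V#{version:05d}"}


def head_key(user_id: str, aid: str) -> dict[str, str]:
    """Key of the HEAD row that points at the latest version."""
    return {"PK": f"USER#{user_id}", "SK": f"ARTIFACT#{aid}#HEAD"}


def session_index_attrs(session_id: str, updated_at: str, aid: str) -> dict[str, str]:
    """GSI attributes on the HEAD row, used by the SessionIndex list query."""
    return {
        "GSI1PK": f"SESSION#{session_id}",
        "GSI1SK": f"ARTIFACT#{updated_at}#{aid}",
    }


def content_key(user_id: str, aid: str, version: int) -> str:
    """S3 object key holding the body of one version."""
    return f"{user_id}/{aid}/v{version}/index.html"


if __name__ == "__main__":
    print(version_key("u1", "a1", 3))
    print(content_key("u1", "a1", 3))
```

The zero-padded version keeps sort keys in numeric order under DynamoDB's lexicographic string ordering for versions up to 99999.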
diff --git a/backend/src/agents/builtin_tools/artifacts/service.py b/backend/src/agents/builtin_tools/artifacts/service.py
new file mode 100644
index 00000000..86a5cb23
--- /dev/null
+++ b/backend/src/agents/builtin_tools/artifacts/service.py
@@ -0,0 +1,480 @@
+"""Artifact writer — S3 upload + DynamoDB version/HEAD rows.
+
+Frozen cross-PR contract (must stay in sync with the render Lambda
+`backend/src/lambdas/artifact_render/handler.py` and the app-api minter
+`backend/src/apis/app_api/artifacts/service.py`):
+
+  Version row : PK=USER#{user_id}  SK=ARTIFACT#{aid}#V#{version:05d}
+                attrs storage="s3", content_key, content_type
+  HEAD row    : PK=USER#{user_id}  SK=ARTIFACT#{aid}#HEAD
+                + GSI1PK=SESSION#{session_id}
+                + GSI1SK=ARTIFACT#{updated_at}#{aid}   (SessionIndex)
+  S3 layout   : {user_id}/{aid}/v{n}/index.html
+
+Versions are immutable (no DeleteObject grant in inference-api) — an
+update writes a new version and re-points HEAD.
+
+Markdown artifacts: when `content_type` is a Markdown type, the model
+authors raw Markdown but S3 stores a self-contained HTML render wrapper
+(the writer owns rendering — the render Lambda is a pass-through). The
+version/HEAD rows keep the authored `content_type` (`text/markdown`) so
+the card badge and list stay truthful; the render Lambda maps that type
+to a `text/html` HTTP response so the browser renders the wrapper.
+"""
+
+from __future__ import annotations
+
+import base64
+import html
+import logging
+import os
+import uuid
+from datetime import datetime, timezone
+from typing import Optional
+
+import boto3
+from boto3.dynamodb.conditions import Key
+from botocore.exceptions import ClientError
+
+logger = logging.getLogger(__name__)
+
+_DEFAULT_CONTENT_TYPE = "text/html; charset=utf-8"
+# What S3 physically holds (and the HTTP type the render Lambda emits)
+# for a Markdown artifact: a self-contained HTML render wrapper.
+_RENDERED_CONTENT_TYPE = "text/html; charset=utf-8"
+_MARKDOWN_MIME_TYPES = frozenset({"text/markdown", "text/x-markdown"})
+
+# Markdown is base64-embedded so no character ever needs HTML/JS escaping
+# and there is no second network fetch (the artifact-origin CSP sets
+# connect-src 'none'). The rendered HTML is intentionally NOT sanitized:
+# this document runs in the same null-origin sandboxed iframe as HTML
+# artifacts, so its containment story is identical to theirs. `marked` is
+# pinned (dependency-free single module) and loaded from esm.sh, which the
+# artifact-origin CSP allows under script-src.
+_MARKDOWN_RENDER_TEMPLATE = """<!doctype html>
+<html lang="en">
+<head>
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<title>__ARTIFACT_TITLE__</title>
+<style>
+  :root { color-scheme: light dark; }
+  body {
+    margin-inline: auto;
+    max-width: 56rem;
+    padding: 2.5rem clamp(1rem, 5vw, 4rem);
+    font: 16px/1.7 -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
+      Helvetica, Arial, sans-serif;
+    color: #1f2328;
+    background: #ffffff;
+  }
+  h1, h2, h3, h4 { line-height: 1.25; margin: 2rem 0 1rem; font-weight: 600; }
+  h1 { font-size: 2rem; border-bottom: 1px solid #d0d7de; padding-bottom: .3rem; }
+  h2 { font-size: 1.5rem; border-bottom: 1px solid #d0d7de; padding-bottom: .3rem; }
+  h3 { font-size: 1.25rem; }
+  a { color: #0969da; }
+  p, ul, ol, blockquote, table, pre { margin: 0 0 1rem; }
+  ul, ol { padding-left: 2rem; }
+  code {
+    font-family: ui-monospace, SFMono-Regular, "SF Mono", Menlo, Consolas,
+      monospace;
+    font-size: .9em;
+    background: rgba(175, 184, 193, .2);
+    padding: .2em .4em;
+    border-radius: 6px;
+  }
+  pre { background: #f6f8fa; padding: 1rem; border-radius: 6px; overflow: auto; }
+  pre code { background: none; padding: 0; font-size: .875em; }
+  blockquote {
+    margin-left: 0;
+    padding: 0 1rem;
+    color: #59636e;
+    border-left: .25rem solid #d0d7de;
+  }
+  table { border-collapse: collapse; display: block; overflow: auto; }
+  th, td { border: 1px solid #d0d7de; padding: .4rem .8rem; }
+  th { background: #f6f8fa; }
+  img { max-width: 100%; }
+  hr { border: none; border-top: 1px solid #d0d7de; margin: 2rem 0; }
+  @media (prefers-color-scheme: dark) {
+    body { color: #e6edf3; background: #0d1117; }
+    h1, h2 { border-bottom-color: #30363d; }
+    a { color: #4493f8; }
+    code { background: rgba(110, 118, 129, .4); }
+    pre { background: #161b22; }
+    blockquote { color: #9198a1; border-left-color: #30363d; }
+    th, td { border-color: #30363d; }
+    th { background: #161b22; }
+    hr { border-top-color: #30363d; }
+  }
+</style>
+</head>
+<body>
+<main id="content" aria-live="polite">Rendering…</main>
+<script type="application/x-markdown-base64" id="md-src">__ARTIFACT_MD_B64__</script>
+<script type="module">
+  import { marked } from "https://esm.sh/marked@14.1.4";
+  const b64 = document.getElementById("md-src").textContent.trim();
+  const md = new TextDecoder().decode(
+    Uint8Array.from(atob(b64), (c) => c.charCodeAt(0)),
+  );
+  const el = document.getElementById("content");
+  try {
+    el.innerHTML = marked.parse(md, { gfm: true, breaks: false });
+    const h1 = document.querySelector("h1");
+    if (h1 && h1.textContent.trim()) document.title = h1.textContent.trim();
+  } catch (err) {
+    el.textContent = "Could not render this Markdown document.";
+  }
+</script>
+</body>
+</html>
+"""
+
+
+def _is_markdown(content_type: Optional[str]) -> bool:
+    """True for a Markdown MIME type, ignoring any `; charset=` suffix."""
+    bare = (content_type or "").split(";")[0].strip().lower()
+    return bare in _MARKDOWN_MIME_TYPES
+
+
+def _wrap_markdown(title: str, markdown: str) -> str:
+    """Render Markdown source into a self-contained HTML document."""
+    md_b64 = base64.b64encode(markdown.encode("utf-8")).decode("ascii")
+    return _MARKDOWN_RENDER_TEMPLATE.replace(
+        "__ARTIFACT_TITLE__", html.escape(title or "Markdown document")
+    ).replace("__ARTIFACT_MD_B64__", md_b64)
+
+
+_cached_bucket: Optional[str] = None
+_cached_table: Optional[str] = None
+_ssm_client = None
+_s3_client = None
+_ddb_resource = None
+
+
+class ArtifactError(Exception):
+    """Base class for artifact write failures."""
+
+
+class ArtifactNotFoundError(ArtifactError):
+    """Update target does not exist for this user."""
+
+
+class ArtifactConfigError(ArtifactError):
+    """Artifacts feature is not configured for this environment."""
+
+
+def _reset_caches_for_tests() -> None:
+    global _cached_bucket, _cached_table, _ssm_client, _s3_client, _ddb_resource
+    _cached_bucket = None
+    _cached_table = None
+    _ssm_client = None
+    _s3_client = None
+    _ddb_resource = None
+
+
+def _region() -> str:
+    return (
+        os.environ.get("AWS_REGION")
+        or os.environ.get("AWS_DEFAULT_REGION")
+        or "us-west-2"
+    )
+
+
+def _resolve(env_var: str, ssm_suffix: str) -> str:
+    """Env var first, then SSM under the runtime's PROJECT_PREFIX.
+
+    inference-api exposes PROJECT_PREFIX and holds ssm:GetParameter on
+    `/{prefix}/*`, so the artifacts params published by the artifacts
+    stack are readable without any extra wiring."""
+    value = os.environ.get(env_var)
+    if value:
+        return value
+    global _ssm_client
+    prefix = os.environ.get("PROJECT_PREFIX")
+    if not prefix:
+        raise ArtifactConfigError(
+            f"{env_var} unset and PROJECT_PREFIX unavailable"
+        )
+    if _ssm_client is None:
+        _ssm_client = boto3.client("ssm", region_name=_region())
+    try:
+        resp = _ssm_client.get_parameter(
+            Name=f"/{prefix}/artifacts/{ssm_suffix}"
+        )
+    except ClientError as exc:
+        raise ArtifactConfigError(
+            f"artifacts {ssm_suffix} parameter not found"
+        ) from exc
+    return resp["Parameter"]["Value"]
+
+
+def _bucket_name() -> str:
+    global _cached_bucket
+    if _cached_bucket is None:
+        _cached_bucket = _resolve("S3_ARTIFACTS_BUCKET_NAME", "bucket-name")
+    return _cached_bucket
+
+
+def _table():
+    global _cached_table, _ddb_resource
+    if _cached_table is None:
+        _cached_table = _resolve("DYNAMODB_ARTIFACTS_TABLE_NAME", "table-name")
+    if _ddb_resource is None:
+        _ddb_resource = boto3.resource("dynamodb", region_name=_region())
+    return _ddb_resource.Table(_cached_table)
+
+
+def _s3():
+    global _s3_client
+    if _s3_client is None:
+        _s3_client = boto3.client("s3", region_name=_region())
+    return _s3_client
+
+
+def _now_iso() -> str:
+    return datetime.now(timezone.utc).isoformat()
+
+
+def _put_object(user_id: str, artifact_id: str, version: int,
+                content: str, content_type: str, title: str) -> str:
+    key = f"{user_id}/{artifact_id}/v{version}/index.html"
+    if _is_markdown(content_type):
+        body = _wrap_markdown(title, content)
+        object_content_type = _RENDERED_CONTENT_TYPE
+    else:
+        body = content
+        object_content_type = content_type
+    _s3().put_object(
+        Bucket=_bucket_name(),
+        Key=key,
+        Body=body.encode("utf-8"),
+        ContentType=object_content_type,
+    )
+    return key
+
+
+def create_artifact_record(
+    user_id: str,
+    session_id: str,
+    title: str,
+    content: str,
+    content_type: str,
+) -> tuple[str, int]:
+    """Create v1 of a new artifact. Returns (artifact_id, version)."""
+    artifact_id = uuid.uuid4().hex
+    version = 1
+    content_type = content_type or _DEFAULT_CONTENT_TYPE
+    now = _now_iso()
+    content_key = _put_object(
+        user_id, artifact_id, version, content, content_type, title
+    )
+
+    pk = f"USER#{user_id}"
+    common = {
+        "storage": "s3",
+        "content_key": content_key,
+        "content_type": content_type,
+        "version": version,
+        "artifact_id": artifact_id,
+        "user_id": user_id,
+        "session_id": session_id,
+        "title": title,
+        "created_at": now,
+    }
+    table = _table()
+    try:
+        table.put_item(
+            Item={
+                **common,
+                "PK": pk,
+                "SK": f"ARTIFACT#{artifact_id}#V#{version:05d}",
+                "updated_at": now,
+            },
+            ConditionExpression="attribute_not_exists(SK)",
+        )
+        table.put_item(
+            Item={
+                **common,
+                "PK": pk,
+                "SK": f"ARTIFACT#{artifact_id}#HEAD",
+                "updated_at": now,
+                "GSI1PK": f"SESSION#{session_id}",
+                "GSI1SK": f"ARTIFACT#{now}#{artifact_id}",
+            },
+            ConditionExpression="attribute_not_exists(SK)",
+        )
+    except ClientError as exc:
+        raise ArtifactError("failed to write artifact metadata") from exc
+
+    logger.info(
+        "created artifact user=%s artifact=%s v=%s session=%s",
+        user_id, artifact_id, version, session_id,
+    )
+    return artifact_id, version
+
+
+def update_artifact_record(
+    user_id: str,
+    artifact_id: str,
+    content: str,
+    title: Optional[str],
+    content_type: Optional[str],
+) -> int:
+    """Append a new immutable version and re-point HEAD. Returns version."""
+    pk = f"USER#{user_id}"
+    table = _table()
+    try:
+        head = table.get_item(
+            Key={"PK": pk, "SK": f"ARTIFACT#{artifact_id}#HEAD"}
+        ).get("Item")
+    except ClientError as exc:
+        raise ArtifactError("artifact metadata lookup failed") from exc
+    if not head:
+        raise ArtifactNotFoundError(artifact_id)
+
+    current = int(head["version"])
+    version = current + 1
+    title = title or head.get("title", "")
+    content_type = content_type or head.get("content_type") or _DEFAULT_CONTENT_TYPE
+    now = _now_iso()
+    content_key = _put_object(
+        user_id, artifact_id, version, content, content_type, title
+    )
+
+    common = {
+        "storage": "s3",
+        "content_key": content_key,
+        "content_type": content_type,
+        "version": version,
+        "artifact_id": artifact_id,
+        "user_id": user_id,
+        "session_id": head.get("session_id", ""),
+        "title": title,
+        "created_at": head.get("created_at", now),
+    }
+    try:
+        table.put_item(
+            Item={
+                **common,
+                "PK": pk,
+                "SK": f"ARTIFACT#{artifact_id}#V#{version:05d}",
+                "updated_at": now,
+            },
+            ConditionExpression="attribute_not_exists(SK)",
+        )
+        # Optimistic lock: HEAD must still be at the version we read, so
+        # two concurrent updates can't silently clobber each other.
+        table.put_item(
+            Item={
+                **common,
+                "PK": pk,
+                "SK": f"ARTIFACT#{artifact_id}#HEAD",
+                "updated_at": now,
+                "GSI1PK": f"SESSION#{head.get('session_id', '')}",
+                "GSI1SK": f"ARTIFACT#{now}#{artifact_id}",
+            },
+            ConditionExpression="version = :cur",
+            ExpressionAttributeValues={":cur": current},
+        )
+    except ClientError as exc:
+        code = exc.response.get("Error", {}).get("Code", "")
+        if code == "ConditionalCheckFailedException":
+            raise ArtifactError(
+                "artifact changed concurrently; retry the update"
+            ) from exc
+        raise ArtifactError("failed to write artifact metadata") from exc
+
+    logger.info(
+        "updated artifact user=%s artifact=%s v=%s", user_id, artifact_id, version
+    )
+    return version
+
+
+def set_produced_by_message_index(
+    user_id: str, artifact_id: str, version: int, message_index: int
+) -> None:
+    """Stamp the artifact's version row (and HEAD) with the index of the
+    assistant message that produced this version.
+
+    Per-version linkage is what lets the SPA place every version's card
+    under the turn that produced it after a reload — the list endpoint
+    returns all version rows, not just HEAD. HEAD is stamped too so any
+    HEAD-only reader still sees the latest version's linkage.
+
+    The artifact tool can't know this index at write time — the turn
+    isn't finished — so the stream coordinator writes it back post-turn
+    using the same odd-position index it already computes for per-message
+    metadata (`initial_message_count + 2*i + 1`). That index matches the
+    `idx` the messages endpoint enumerates on reload.
+
+    Best-effort: a SET on a single attribute that deliberately does not
+    touch `version`, so it can never collide with the update_artifact
+    optimistic lock. Failures are swallowed by the caller (linkage is a
+    UX nicety, never worth breaking a turn over).
+    """
+    table = _table()
+    for sk in (
+        f"ARTIFACT#{artifact_id}#V#{version:05d}",
+        f"ARTIFACT#{artifact_id}#HEAD",
+    ):
+        table.update_item(
+            Key={"PK": f"USER#{user_id}", "SK": sk},
+            UpdateExpression="SET produced_by_message_index = :idx",
+            ExpressionAttributeValues={":idx": message_index},
+            ConditionExpression="attribute_exists(SK)",
+        )
+
+
+_SESSION_INDEX = "SessionIndex"
+
+
+def list_session_artifacts(user_id: str, session_id: str) -> list[dict]:
+    """Current HEAD of every artifact written in a chat session.
+
+    Read side of the same SessionIndex GSI the app-api list endpoint
+    uses; the stream coordinator calls this post-turn to emit the live
+    `artifact` SSE event. Only HEAD rows carry GSI1PK/GSI1SK, so the
+    query returns one row per artifact (its current version). GSI1PK is
+    SESSION#-scoped (not user-scoped) so every row is re-checked against
+    the authenticated user's id.
+    """
+    table = _table()
+    items: list[dict] = []
+    kwargs: dict = {
+        "IndexName": _SESSION_INDEX,
+        "KeyConditionExpression": Key("GSI1PK").eq(f"SESSION#{session_id}"),
+        "ScanIndexForward": False,  # GSI1SK embeds updated_at → newest first
+    }
+    try:
+        while True:
+            resp = table.query(**kwargs)
+            items.extend(resp.get("Items", []))
+            last = resp.get("LastEvaluatedKey")
+            if not last:
+                break
+            kwargs["ExclusiveStartKey"] = last
+    except ClientError as exc:
+        raise ArtifactError("artifact list query failed") from exc
+
+    out: list[dict] = []
+    for item in items:
+        if item.get("user_id") != user_id:
+            continue
+        out.append(
+            {
+                "artifact_id": item.get("artifact_id", ""),
+                "version": int(item.get("version", 0)),
+                "title": item.get("title", ""),
+                "content_type": item.get(
+                    "content_type", _DEFAULT_CONTENT_TYPE
+                ),
+                "updated_at": item.get("updated_at", ""),
+                "created_at": item.get("created_at"),
+                "produced_by_message_index": item.get(
+                    "produced_by_message_index"
+                ),
+            }
+        )
+    return out
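
The versioning scheme in `create_artifact_record` and `update_artifact_record` can be sketched without AWS. This is a minimal in-memory illustration, assuming a hypothetical `FakeTable` whose two methods stand in for the `attribute_not_exists(SK)` and `version = :cur` condition expressions. `FakeTable`, `ConditionFailed`, `version_sk`, `head_sk`, `create`, and `update` are illustrative names, not part of the service module or of boto3.

```python
class ConditionFailed(Exception):
    """Hypothetical stand-in for DynamoDB's ConditionalCheckFailedException."""


class FakeTable:
    """Hypothetical in-memory table keyed by sort key only."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, sk, item):
        # Plays the role of ConditionExpression="attribute_not_exists(SK)".
        if sk in self.items:
            raise ConditionFailed(sk)
        self.items[sk] = item

    def put_if_version(self, sk, item, expected):
        # Plays the role of ConditionExpression="version = :cur".
        current = self.items.get(sk)
        if current is None or current["version"] != expected:
            raise ConditionFailed(sk)
        self.items[sk] = item


def version_sk(artifact_id, version):
    # Zero-padding keeps version rows in numeric order under string sorting.
    return f"ARTIFACT#{artifact_id}#V#{version:05d}"


def head_sk(artifact_id):
    return f"ARTIFACT#{artifact_id}#HEAD"


def create(table, artifact_id):
    table.put_if_absent(version_sk(artifact_id, 1), {"version": 1})
    table.put_if_absent(head_sk(artifact_id), {"version": 1})


def update(table, artifact_id, seen_version):
    # seen_version is what the caller read from HEAD before writing.
    new_version = seen_version + 1
    table.put_if_absent(
        version_sk(artifact_id, new_version), {"version": new_version}
    )
    table.put_if_version(
        head_sk(artifact_id), {"version": new_version}, seen_version
    )
    return new_version


table = FakeTable()
create(table, "abc")
first = update(table, "abc", seen_version=1)

# A second writer that also read version 1 is rejected rather than clobbering.
try:
    update(table, "abc", seen_version=1)
    stale_rejected = False
except ConditionFailed:
    stale_rejected = True
```

In this sketch the stale writer already fails on the version-row write, because `V#00002` exists. The HEAD condition covers the remaining case where two writers race past that first check.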
diff --git a/backend/src/agents/builtin_tools/artifacts/tools.py b/backend/src/agents/builtin_tools/artifacts/tools.py
new file mode 100644
index 00000000..a202e034
--- /dev/null
+++ b/backend/src/agents/builtin_tools/artifacts/tools.py
@@ -0,0 +1,140 @@
+"""Context-bound factories for the artifact authoring tools.
+
+Identity is captured by closure (the codebase has no tool-execution
+contextvar) — same pattern as the spreadsheet_analysis tools. Blocking
+boto3 work is offloaded with ``asyncio.to_thread`` so the chat event
+loop stays responsive under concurrent load.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+from typing import Any, Optional
+
+from strands import tool
+
+from . import service
+
+logger = logging.getLogger(__name__)
+
+
+def make_create_artifact_tool(session_id: str, user_id: str):
+    @tool
+    async def create_artifact(
+        title: str,
+        content: str,
+        content_type: str = "text/html; charset=utf-8",
+    ) -> dict[str, Any]:
+        """Save a standalone document as a versioned artifact the user can open.
+
+        Use this when you produce a self-contained deliverable the user
+        will want to view, keep, or iterate on — an HTML page, a chart,
+        an interactive widget, a formatted report, or a written document.
+
+        Two authoring modes:
+
+        - HTML (default): `content` MUST be a complete standalone HTML
+          document (include `<!doctype html>` and a full `<html>` …
+          `</html>`). It renders in a sandboxed iframe with a strict CSP:
+          inline `<style>`/`<script>` are allowed, as are scripts from
+          `https://cdn.tailwindcss.com`, `https://esm.sh`,
+          `https://cdn.jsdelivr.net`, and `https://unpkg.com` — no
+          other origin loads. The page cannot make `fetch`/XHR calls
+          (CSP `connect-src` is blocked), so inline any data.
+
+          Load JS libraries from one of those CDNs and pin a version.
+          Chart.js note: use an auto-registering build or charts
+          render blank — e.g.
+          `import Chart from "https://esm.sh/chart.js@4/auto"` inside a
+          `<script type="module">`, or the UMD bundle
+          `<script src="https://cdn.jsdelivr.net/npm/chart.js@4">`. A
+          bare `https://esm.sh/chart.js` import (no `/auto`) silently
+          fails to draw.
+
+        - Markdown: pass `content_type="text/markdown"` and provide raw
+          GitHub-flavored Markdown as `content`. Do NOT add an HTML
+          shell — the system renders the Markdown for the user. Prefer
+          this for prose, reports, and documentation.
+
+        Either way, do NOT wrap `content` in markdown code fences.
+
+        Args:
+            title: Short human-readable name shown in the artifacts list.
+            content: The full HTML document, or raw Markdown when
+                content_type is text/markdown.
+            content_type: Defaults to text/html. Pass "text/markdown"
+                to author a Markdown document instead.
+
+        Returns the new artifact id and version — reference the id if the
+        user later asks you to change it (via update_artifact).
+        """
+        try:
+            artifact_id, version = await asyncio.to_thread(
+                service.create_artifact_record,
+                user_id, session_id, title, content, content_type,
+            )
+        except service.ArtifactConfigError as exc:
+            return {"content": [{"text": f"❌ Artifacts are not available: {exc}"}], "status": "error"}
+        except service.ArtifactError as exc:
+            return {"content": [{"text": f"❌ Failed to create artifact: {exc}"}], "status": "error"}
+        return {
+            "content": [{"text": (
+                f'Created artifact "{title}" '
+                f"(id: {artifact_id}, version {version})."
+            )}],
+            "status": "success",
+        }
+
+    return create_artifact
+
+
+def make_update_artifact_tool(session_id: str, user_id: str):
+    @tool
+    async def update_artifact(
+        artifact_id: str,
+        content: str,
+        title: Optional[str] = None,
+        content_type: Optional[str] = None,
+    ) -> dict[str, Any]:
+        """Replace an existing artifact's content, creating a new version.
+
+        Prior versions are kept immutably. Pass the `artifact_id`
+        returned by an earlier create_artifact. `content` follows the
+        same rules as create_artifact: a complete standalone HTML
+        document, or raw Markdown when content_type is text/markdown; no
+        markdown code fences either way. If content_type is omitted the
+        artifact's existing type (HTML or Markdown) is kept.
+
+        Args:
+            artifact_id: The artifact to update.
+            content: The full replacement HTML document, or raw Markdown
+                for a Markdown artifact.
+            title: Optional new title; unchanged if omitted.
+            content_type: Optional; unchanged if omitted. Pass
+                "text/markdown" to switch an artifact to Markdown.
+
+        Returns the new version number.
+        """
+        try:
+            version = await asyncio.to_thread(
+                service.update_artifact_record,
+                user_id, artifact_id, content, title, content_type,
+            )
+        except service.ArtifactNotFoundError:
+            return {"content": [{"text": (
+                f"❌ Artifact {artifact_id} was not found. Use the id "
+                f"returned by create_artifact."
+            )}], "status": "error"}
+        except service.ArtifactConfigError as exc:
+            return {"content": [{"text": f"❌ Artifacts are not available: {exc}"}], "status": "error"}
+        except service.ArtifactError as exc:
+            return {"content": [{"text": f"❌ Failed to update artifact: {exc}"}], "status": "error"}
+        return {
+            "content": [{"text": (
+                f"Updated artifact {artifact_id} to version {version}."
+            )}],
+            "status": "success",
+        }
+
+    return update_artifact
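
The factory pattern used by `make_create_artifact_tool` and `make_update_artifact_tool` can be shown in isolation. The sketch below assumes a hypothetical blocking function `slow_save` in place of the boto3-backed service call and leaves out the Strands `@tool` decorator. It only illustrates the two ideas from the module docstring: identity captured by closure, and blocking work offloaded with `asyncio.to_thread`.

```python
import asyncio
import time
from typing import Any


def slow_save(user_id: str, session_id: str, title: str) -> str:
    """Hypothetical blocking call standing in for the service layer."""
    time.sleep(0.01)  # simulates network latency
    return f"{user_id}:{session_id}:{title}"


def make_save_tool(session_id: str, user_id: str):
    async def save(title: str) -> dict[str, Any]:
        # session_id and user_id come from the enclosing scope, so the
        # caller (the model) can only supply the title.
        record = await asyncio.to_thread(
            slow_save, user_id, session_id, title
        )
        return {
            "content": [{"text": f"Saved {record}"}],
            "status": "success",
        }

    return save


async def demo():
    tool_a = make_save_tool("s1", "alice")
    tool_b = make_save_tool("s2", "bob")
    # Both blocking calls run in worker threads, so the loop stays free.
    return await asyncio.gather(tool_a("report"), tool_b("report"))


results = asyncio.run(demo())
```

Each factory call yields a tool bound to one user and session, so two concurrent chats never share identity even though they run the same function body.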
diff --git a/backend/src/agents/builtin_tools/spreadsheet_analysis/__init__.py b/backend/src/agents/builtin_tools/spreadsheet_analysis/__init__.py
new file mode 100644
index 00000000..2e380954
--- /dev/null
+++ b/backend/src/agents/builtin_tools/spreadsheet_analysis/__init__.py
@@ -0,0 +1,10 @@
+"""Spreadsheet analysis tools for Code Interpreter integration.
+
+Provides tools that enable the agent to list and analyze tabular data files
+from assistant knowledge bases and chat attachments using Code Interpreter.
+"""
+
+from .list_spreadsheets_tool import make_list_spreadsheets_tool
+from .analyze_tool import make_analyze_tool
+
+__all__ = ["make_list_spreadsheets_tool", "make_analyze_tool"]
diff --git a/backend/src/agents/builtin_tools/spreadsheet_analysis/analyze_tool.py b/backend/src/agents/builtin_tools/spreadsheet_analysis/analyze_tool.py
new file mode 100644
index 00000000..700f5ca0
--- /dev/null
+++ b/backend/src/agents/builtin_tools/spreadsheet_analysis/analyze_tool.py
@@ -0,0 +1,870 @@
+"""Analyze spreadsheet files using Code Interpreter.
+
+Factory function creates a context-bound tool that downloads tabular files
+from S3, pushes them to Code Interpreter, and executes Python code for analysis.
+"""
+
+import logging
+import os
+import re
+from typing import Any, Dict, Optional
+
+import boto3
+from strands import tool
+
+from .list_spreadsheets_tool import _get_kb_files, _get_session_files
+
+logger = logging.getLogger(__name__)
+
+MAX_OUTPUT_CHARS = 10000  # ~2500 tokens — safe margin under context limits
+MAX_ERROR_CHARS = 600  # cleaned traceback budget — full pandas tracebacks are noise
+
+# Defensive caps for multi-sheet XLSX conversion. The outer upload limit
+# (FILE_UPLOAD_MAX_SIZE_BYTES, default 4 MB) catches naive abuse, but XLSX
+# is a zip of XML and can pack thousands of nearly-empty sheets into a few
+# megabytes. We cap both sheet count and per-sheet row count to keep turn
+# latency bounded; anything excluded is surfaced to the model with a
+# warning so the user learns about the cap rather than getting silently
+# wrong results.
+MAX_SHEETS_TO_CONVERT = int(os.environ.get("ANALYZE_MAX_SHEETS", 25))
+MAX_ROWS_PER_SHEET = int(os.environ.get("ANALYZE_MAX_ROWS_PER_SHEET", 500_000))
+
+_SCHEMA_MARKER = "[__SCHEMA__]"
+_SHEETS_MARKER = "[__SHEETS__]"
+
+
+def _sanitize_sheet_name(name: str) -> str:
+    """Make a sheet name filesystem-safe.
+
+    Sheet names can contain spaces, slashes, unicode — pick a deterministic
+    filename-safe transform so the model can predict the output filename
+    from the sheet name. Lowercase for cross-platform stability, replace
+    anything non-alphanumeric with underscore, collapse repeats, trim.
+    """
+    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").lower()
+    return cleaned or "sheet"
+
+
+def _parse_sheet_inventory(bootstrap_stdout: str) -> Dict[str, Any]:
+    """Extract the sheet inventory emitted by the XLSX bootstrap.
+
+    The bootstrap prints a pipe-delimited record per converted sheet
+    (``sheet|<name>|<path>|<rows>|<truncated_flag>|<primary_alias>``),
+    bracketed by ``_SHEETS_MARKER``. We parse that into a structured dict
+    so the tool never executes anything from untrusted-ish interpreter
+    stdout. The one list-valued field, ``skipped_names``, goes through
+    ``ast.literal_eval``, which accepts literals only.
+
+    Returns a dict with:
+        - ``total`` (int): total sheets in workbook
+        - ``converted`` (int): sheets actually written to CSV
+        - ``skipped`` (int): sheets excluded by MAX_SHEETS_TO_CONVERT
+        - ``skipped_preview`` (list[str]): first few skipped sheet names
+        - ``sheets`` (list[dict]): per-sheet records with name, path,
+          rows, truncated, primary_alias
+        - ``has_primary_alias`` (bool): whether the <stem>.csv fast-path
+          alias was written for the first sheet
+    """
+    result: Dict[str, Any] = {
+        "total": 0,
+        "converted": 0,
+        "skipped": 0,
+        "skipped_preview": [],
+        "sheets": [],
+        "has_primary_alias": False,
+    }
+    if _SHEETS_MARKER not in bootstrap_stdout:
+        return result
+    try:
+        block = bootstrap_stdout.split(_SHEETS_MARKER)[1].strip()
+    except IndexError:
+        return result
+
+    for line in block.splitlines():
+        stripped = line.strip()
+        if not stripped:
+            continue
+        if stripped.startswith("total:"):
+            result["total"] = _safe_int(stripped.split(":", 1)[1])
+        elif stripped.startswith("converted:"):
+            result["converted"] = _safe_int(stripped.split(":", 1)[1])
+        elif stripped.startswith("skipped:"):
+            result["skipped"] = _safe_int(stripped.split(":", 1)[1])
+        elif stripped.startswith("skipped_names:"):
+            # Emitted as a Python list literal of sheet names. ast.literal_eval
+            # only parses literals and never executes code, so it is safe here.
+            import ast as _ast
+            try:
+                names = _ast.literal_eval(stripped.split(":", 1)[1].strip())
+                if isinstance(names, list):
+                    result["skipped_preview"] = [str(n) for n in names]
+            except (ValueError, SyntaxError):
+                pass
+        elif stripped.startswith("sheet|"):
+            parts = stripped.split("|")
+            # sheet | name | path | rows | truncated | alias
+            if len(parts) < 6:
+                continue
+            _, name, path, rows, trunc, alias = parts[:6]
+            result["sheets"].append({
+                "name": name,
+                "path": path,
+                "rows": _safe_int(rows),
+                "truncated": trunc == "1",
+                "primary_alias": alias or None,
+            })
+            if alias:
+                result["has_primary_alias"] = True
+    return result
+
+
+def _safe_int(raw: str) -> int:
+    try:
+        return int(str(raw).strip())
+    except (TypeError, ValueError):
+        return 0
+
+
+def _format_sheet_note(inventory: Dict[str, Any]) -> str:
+    """Turn a parsed sheet inventory into a markdown footer for the tool
+    response. Empty string when the workbook has a single sheet that was
+    fully converted (no-op path — nothing interesting to report).
+    """
+    total = inventory.get("total", 0)
+    sheets = inventory.get("sheets", [])
+    skipped = inventory.get("skipped", 0)
+    truncated_sheets = [s for s in sheets if s.get("truncated")]
+
+    if total <= 1 and not truncated_sheets:
+        return ""
+
+    lines: list[str] = []
+
+    if total > 1:
+        converted = inventory.get("converted", len(sheets))
+        if skipped:
+            preview = inventory.get("skipped_preview", [])
+            shown = ", ".join(preview)
+            more = f" (+{skipped - len(preview)} more)" if skipped > len(preview) else ""
+            lines.append(
+                f"⚠ Workbook has {total} sheets; converted the first {converted}. "
+                f"Skipped: {shown}{more}. "
+                f"Split the file or export specific tabs as CSV to analyze the rest."
+            )
+        else:
+            lines.append(
+                f"Workbook has {total} sheets; all converted. Use the "
+                f"per-sheet filenames below to read or combine them."
+            )
+        lines.append("")
+        lines.append("**Available sheets (load via `pd.read_csv`):**")
+        for s in sheets:
+            trunc_tag = ""
+            if s.get("truncated"):
+                trunc_tag = f" — ⚠ truncated at {MAX_ROWS_PER_SHEET:,} rows"
+            lines.append(f"- `{s['name']}` → `{s['path']}` ({s['rows']:,} rows{trunc_tag})")
+
+    elif truncated_sheets:
+        # Single-sheet workbook but hit the row cap.
+        s = truncated_sheets[0]
+        lines.append(
+            f"⚠ Sheet `{s['name']}` was truncated at {MAX_ROWS_PER_SHEET:,} rows "
+            f"due to the analysis size cap; full row count not reported."
+        )
+
+    return "\n".join(lines)
+
+
+def _truncate_output(text: str) -> str:
+    """Truncate tool output to prevent blowing the LLM context window."""
+    if not text or len(text) <= MAX_OUTPUT_CHARS:
+        return text
+    return text[:MAX_OUTPUT_CHARS] + f"\n\n... (output truncated — {len(text):,} chars total, showing first {MAX_OUTPUT_CHARS:,})"
+
+
+def _strip_first_row(schema: str) -> str:
+    """Drop the ``first_row: ...`` line from a schema footer.
+
+    On the happy path the first-row preview helps the model write correct
+    code. On the error path the model already has the load line and column
+    list — the full row dump is ~30 fields of noise. This trims it.
+    """
+    return "\n".join(
+        line for line in schema.splitlines()
+        if not line.startswith("first_row:")
+    )
+
+
+# ---------------------------------------------------------------------------
+# Stderr cleaning
+# ---------------------------------------------------------------------------
+
+# Frames we never want to show the LLM — they're pandas/numpy internals with
+# zero signal for fixing the user's code.
+_INTERNAL_FRAME_MARKERS = (
+    "site-packages/pandas/",
+    "site-packages/numpy/",
+    "pandas/_libs/",
+    "pandas/core/",
+    "pandas/io/",
+)
+
+
+def _clean_stderr(stderr: str) -> str:
+    """Strip pandas internal frames and dtype warnings from a traceback.
+
+    Keeps the first frame outside pandas/numpy (normally the
+    `/tmp/ipykernel_*.py` line the user wrote) and the final exception line.
+    Falls back to a truncated raw stderr if the traceback doesn't match the
+    expected shape.
+    """
+    if not stderr:
+        return "Unknown error"
+
+    lines = stderr.splitlines()
+
+    # 1. Drop warning noise (DtypeWarning, FutureWarning, UserWarning). Each
+    #    spans 2 lines: the warning itself, then the call-site snippet.
+    filtered: list[str] = []
+    skip_next = False
+    for line in lines:
+        if skip_next:
+            skip_next = False
+            continue
+        if "DtypeWarning:" in line or "FutureWarning:" in line or "UserWarning:" in line:
+            skip_next = True  # next line is usually the offending code snippet
+            continue
+        filtered.append(line)
+
+    # 2. Find the final exception line (e.g. "KeyError: 'NET_AMOUNT'").
+    final_exception = ""
+    for line in reversed(filtered):
+        stripped = line.strip()
+        if not stripped:
+            continue
+        # Exception lines are left-flush and match "ExceptionName: message".
+        if not line.startswith((" ", "\t")) and re.match(r"^[A-Z][A-Za-z]*(?:Error|Exception|Warning):", stripped):
+            final_exception = stripped
+            break
+
+    # 3. Find the user-code frame (ipykernel tempfile, not site-packages).
+    user_frame_lines: list[str] = []
+    for i, line in enumerate(filtered):
+        stripped = line.strip()
+        if not stripped.startswith("File "):
+            continue
+        if any(m in stripped for m in _INTERNAL_FRAME_MARKERS):
+            continue
+        # Keep this frame + up to the next 2 lines (the code snippet + pointer).
+        user_frame_lines.append(stripped)
+        for j in range(i + 1, min(i + 3, len(filtered))):
+            nxt = filtered[j].strip()
+            if not nxt or nxt.startswith("File "):
+                break
+            user_frame_lines.append(nxt)
+        break
+
+    if user_frame_lines and final_exception:
+        cleaned = "\n".join(user_frame_lines) + "\n" + final_exception
+    elif final_exception:
+        cleaned = final_exception
+    else:
+        # Unrecognized shape — return a short tail rather than a 3K dump.
+        cleaned = "\n".join(filtered[-8:]).strip()
+
+    if len(cleaned) > MAX_ERROR_CHARS:
+        # Leave room for the ellipsis tail so the final string respects
+        # the budget strictly — callers rely on ``len(output) <=
+        # MAX_ERROR_CHARS``.
+        ellipsis = " ..."
+        cleaned = cleaned[:MAX_ERROR_CHARS - len(ellipsis)] + ellipsis
+    return cleaned
+
+
+# ---------------------------------------------------------------------------
+# Schema preview probe
+# ---------------------------------------------------------------------------
+
+
+def _build_preview_code(csv_filename: str) -> str:
+    """Return Python code that prints a compact schema snapshot for csv_filename.
+
+    Runs a bounded skiprows probe (0..8) to handle report-style exports with
+    leading metadata rows. Picks the skiprows value that produces the cleanest
+    header — no ``Unnamed:`` columns, no duplicates, non-empty names — and
+    emits a ready-to-use ``pd.read_csv(...)`` invocation when the best
+    candidate is meaningfully better than skiprows=0. Otherwise it reports the
+    columns at skiprows=0 and lets the model decide.
+
+    Output is bracketed with _SCHEMA_MARKER so it can be reliably extracted
+    from the interpreter's stdout stream even if user code prints other things.
+
+    Filenames with quotes or other f-string-breaking characters are handled
+    by stashing the filename as a top-of-script local variable (``_FNAME``)
+    via ``repr()`` once. The rest of the template references ``_FNAME`` as
+    an ordinary string, so we never re-interpolate the raw filename into
+    nested f-string contexts. Before this indirection, a filename like
+    ``"O'Brien data.csv"`` would generate invalid Python because ``repr``
+    emits double quotes when the string contains a single quote, conflicting
+    with the outer f-string's own quoting.
+    """
+    # repr() always produces a valid Python string literal; storing that
+    # literal once means the generated code can refer to the filename by
+    # name, without any further escaping.
+    fname_literal = repr(csv_filename)
+    return f"""
+import warnings, pandas as pd
+warnings.filterwarnings('ignore')
+
+_FNAME = {fname_literal}
+
+def _score(cols):
+    # Higher is better. Punishes Unnamed columns and duplicates.
+    if not cols:
+        return -10_000
+    unnamed = sum(1 for c in cols if str(c).startswith('Unnamed:'))
+    named = len(cols) - unnamed
+    dup_penalty = (len(cols) - len(set(cols))) * 20
+    blank_penalty = sum(1 for c in cols if not str(c).strip()) * 10
+    return named - (unnamed * 5) - dup_penalty - blank_penalty
+
+try:
+    with open(_FNAME, 'r') as _fh:
+        _total_rows = sum(1 for _ in _fh)
+
+    # Score skiprows=0..8, keep the winner and remember the baseline.
+    _baseline_score, _baseline_cols = -float('inf'), []
+    _best_skip, _best_score, _best_cols = 0, -float('inf'), []
+    for _sk in range(9):
+        try:
+            _cols = pd.read_csv(_FNAME, nrows=0, skiprows=_sk, low_memory=False).columns.tolist()
+        except Exception:
+            continue
+        _sc = _score(_cols)
+        if _sk == 0:
+            _baseline_score, _baseline_cols = _sc, _cols
+        if _sc > _best_score:
+            _best_skip, _best_score, _best_cols = _sk, _sc, _cols
+
+    # Confidence gate: only prescribe a non-zero skiprows when the winner
+    # actually fixes a header problem — either more named columns OR fewer
+    # Unnamed columns than the baseline — AND the winner is mostly clean.
+    # A score-delta threshold alone can't distinguish "found the real header"
+    # from "data row happens to parse cleanly", so we anchor on named/unnamed
+    # counts instead.
+    def _named_unnamed(cols):
+        u = sum(1 for c in cols if str(c).startswith('Unnamed:'))
+        return len(cols) - u, u
+    _base_named, _base_unnamed = _named_unnamed(_baseline_cols)
+    _win_named, _win_unnamed = _named_unnamed(_best_cols)
+    _win_clean_ratio = _win_named / max(len(_best_cols), 1)
+
+    _prescribe = (
+        _best_skip > 0
+        and _win_clean_ratio >= 0.7
+        and (_win_named > _base_named or _win_unnamed < _base_unnamed)
+    )
+
+    if _prescribe:
+        _report_skip, _report_cols = _best_skip, _best_cols
+    else:
+        _report_skip, _report_cols = 0, _baseline_cols
+
+    _data_rows = max(_total_rows - 1 - _report_skip, 0)
+    _col_preview = ', '.join(str(c) for c in _report_cols[:20])
+    if len(_report_cols) > 20:
+        _col_preview += f' ... (+{{len(_report_cols) - 20}} more)'
+
+    try:
+        _head = pd.read_csv(_FNAME, skiprows=_report_skip, nrows=1, low_memory=False)
+        _first_row = _head.iloc[0].to_dict() if len(_head) else {{}}
+        _first_row = {{k: (str(v)[:40] + '...' if len(str(v)) > 40 else v) for k, v in _first_row.items()}}
+    except Exception:
+        _first_row = {{}}
+
+    if _prescribe:
+        _load = f"pd.read_csv({{_FNAME!r}}, skiprows={{_report_skip}}, low_memory=False)"
+        _note = f"  # {{_report_skip}} metadata row(s) detected before header"
+    else:
+        _load = f"pd.read_csv({{_FNAME!r}}, low_memory=False)"
+        _note = ""
+
+    print({_SCHEMA_MARKER!r})
+    print(f'file: {{_FNAME}} ({{_data_rows}} rows x {{len(_report_cols)}} cols)')
+    print(f'load: {{_load}}{{_note}}')
+    print(f'columns: {{_col_preview}}')
+    print(f'first_row: {{_first_row}}')
+    # If confidence was low, flag it so the model knows to verify.
+    if not _prescribe and 0 < _base_unnamed < len(_baseline_cols):
+        print(f'note: header may need adjustment (skiprows=0 has {{_base_unnamed}}/{{len(_baseline_cols)}} unnamed columns); inspect head() if unsure')
+    print({_SCHEMA_MARKER!r})
+except Exception as _e:
+    print({_SCHEMA_MARKER!r})
+    print(f'schema preview unavailable: {{_e}}')
+    print({_SCHEMA_MARKER!r})
+"""
+
+
+def _extract_schema_preview(stdout: str) -> tuple[str, str]:
+    """Split stdout into (schema_block, remaining_stdout).
+
+    The schema block is whatever is between _SCHEMA_MARKER pairs; if no markers
+    are found, returns ("", stdout).
+    """
+    if _SCHEMA_MARKER not in stdout:
+        return "", stdout
+    parts = stdout.split(_SCHEMA_MARKER)
+    # parts = [before, schema, after, ...]; stitch back everything non-schema.
+    if len(parts) >= 3:
+        schema = parts[1].strip()
+        remaining = (parts[0] + _SCHEMA_MARKER.join(parts[2:])).strip("\n")
+        return schema, remaining
+    return "", stdout
+
+
+def _get_code_interpreter_id() -> Optional[str]:
+    """Get Code Interpreter ID from environment or SSM."""
+    ci_id = os.getenv("AGENTCORE_CODE_INTERPRETER_ID")
+    if ci_id:
+        return ci_id
+    try:
+        project_name = os.getenv("PROJECT_NAME", "strands-agent-chatbot")
+        environment = os.getenv("ENVIRONMENT", "dev")
+        region = os.getenv("AWS_REGION", "us-west-2")
+        ssm = boto3.client("ssm", region_name=region)
+        response = ssm.get_parameter(Name=f"/{project_name}/{environment}/agentcore/code-interpreter-id")
+        return response["Parameter"]["Value"]
+    except Exception:
+        return None
+
+
+def make_analyze_tool(
+    assistant_id: Optional[str],
+    session_id: str,
+    user_id: str,
+):
+    """Create an analyze_spreadsheet tool bound to the given context."""
+
+    @tool
+    async def analyze_spreadsheet(
+        filename: str,
+        python_code: str,
+        output_filename: Optional[str] = None,
+    ) -> Dict[str, Any]:
+        """Analyze a spreadsheet file using Python code in Code Interpreter.
+
+        Downloads the specified file and loads it into a sandboxed Python
+        environment for analysis. Use pandas, numpy, matplotlib, and seaborn.
+
+        ⚠️  CRITICAL — filename vs. in-sandbox path
+        -------------------------------------------
+        The ``filename`` parameter names the **source** file (exactly as it
+        appears in the chat attachment or knowledge base, e.g.
+        ``"FY_27_Ledger.xlsx"``).
+
+        In the sandbox, XLSX files are pre-converted to CSV:
+
+        • Single-sheet workbooks → loadable as ``<stem>.csv``
+          ``FY_27_Ledger.xlsx`` → ``FY_27_Ledger.csv``
+
+        • Multi-sheet workbooks → one CSV per sheet, plus a primary alias
+          for the first sheet:
+            ``Budget.xlsx`` → ``Budget.summary.csv``,
+                              ``Budget.transactions.csv``,
+                              ``Budget.notes.csv``,
+                              ``Budget.csv`` (alias of the first sheet)
+
+          The tool response's "Available sheets" footer lists the exact
+          ``pd.read_csv`` target for every sheet. **Use those names
+          verbatim.** For cross-sheet aggregation, read multiple and
+          combine with ``pd.concat``.
+
+        So ``python_code`` must read the CSV form, even for an XLSX source:
+
+            filename:    "FY_27_Ledger.xlsx"      (source name)
+            python_code: pd.read_csv('FY_27_Ledger.csv', low_memory=False)
+                                         ^^^ .csv, not .xlsx
+
+        CSV files keep their name unchanged in the sandbox.
+
+        Handling leading metadata rows
+        ------------------------------
+        Some exports have metadata rows above the real header. When the
+        code fails, the error response includes a schema footer with a
+        ready-to-use ``load:`` command that accounts for this, e.g.
+        ``pd.read_csv('file.csv', skiprows=3, low_memory=False)``.
+        **On any retry, use that exact load line verbatim** instead of
+        guessing ``skiprows``.
+
+        Safety limits
+        -------------
+        Multi-sheet workbooks convert at most the first 25 sheets; each
+        sheet is truncated at 500,000 rows. When a cap triggers, the
+        response footer tells you what was excluded so you can relay it
+        to the user instead of presenting a partial answer as complete.
+
+        Best for: aggregations, filtering, trends, comparisons, statistics,
+        charts. For simple factual lookups, use knowledge base search.
+
+        Args:
+            filename: Source filename from list_spreadsheets results. Use
+                the original name (``.xlsx`` or ``.csv``), not the sandbox
+                form.
+            python_code: Python to execute. For XLSX sources, use the exact
+                CSV names from the "Available sheets" footer. Available
+                libraries: pandas, numpy, matplotlib, seaborn, openpyxl.
+            output_filename: Optional PNG filename if generating a chart.
+                Must end with ``.png``. Example: ``"chart.png"``.
+
+        Returns:
+            Analysis results as text (with a schema footer), and optionally
+            a chart image.
+        """
+        from bedrock_agentcore.tools.code_interpreter_client import CodeInterpreter
+
+        # 1. Validate Code Interpreter is available
+        ci_id = _get_code_interpreter_id()
+        if not ci_id:
+            return {"content": [{"text": "❌ Code Interpreter is not configured. Contact your administrator."}], "status": "error"}
+
+        # 2. Find the file in accessible sources
+        file_info = await _find_file(filename, assistant_id, session_id)
+        if not file_info:
+            return {"content": [{"text": f"❌ File '{filename}' not found or not accessible. Use list_spreadsheets to see available files."}], "status": "error"}
+
+        # 3. Download from S3
+        try:
+            file_bytes = _download_file(file_info)
+        except Exception as e:
+            return {"content": [{"text": f"❌ Failed to download file: {e}"}], "status": "error"}
+
+        # 4. Push file to Code Interpreter
+        content_type = file_info.get("content_type", "")
+        is_xlsx = "spreadsheetml" in content_type or file_info["filename"].lower().endswith(".xlsx")
+
+        region = os.getenv("AWS_REGION", "us-west-2")
+        code_interpreter = CodeInterpreter(region)
+
+        try:
+            code_interpreter.start(identifier=ci_id)
+
+            if is_xlsx:
+                # Push XLSX as base64, decode in sandbox, then convert every
+                # sheet to its own CSV (subject to defensive caps below).
+                # Model gets a full sheet inventory in the schema footer so
+                # cross-sheet aggregation works in a single analyze call.
+                import base64
+                b64_content = base64.b64encode(file_bytes).decode("ascii")
+                stem = os.path.splitext(filename)[0]
+                # Back-compat alias: single-sheet workbooks still expose
+                # <stem>.csv so the one-file, one-sheet fast path keeps
+                # its existing filename contract. Multi-sheet workbooks
+                # use <stem>.<sanitized_sheet>.csv per sheet.
+                primary_csv_filename = f"{stem}.csv"
+
+                code_interpreter.invoke("writeFiles", {"content": [
+                    {"path": "_encoded.b64", "text": b64_content},
+                ]})
+                # Bootstrap: iterate every sheet (capped), write a CSV per
+                # sheet, emit an inventory the outer tool can parse. Uses
+                # read_only + data_only with values_only iteration, so cells
+                # arrive as plain cached values; keeps large workbooks cheap.
+                bootstrap_code = f"""
+import base64, io, csv, re
+from openpyxl import load_workbook
+
+MAX_SHEETS = {MAX_SHEETS_TO_CONVERT}
+MAX_ROWS = {MAX_ROWS_PER_SHEET}
+STEM = {stem!r}
+PRIMARY_CSV = {primary_csv_filename!r}
+
+def _sanitize(name):
+    cleaned = re.sub(r'[^A-Za-z0-9]+', '_', name).strip('_').lower()
+    return cleaned or 'sheet'
+
+with open('_encoded.b64', 'r') as f:
+    raw = base64.b64decode(f.read())
+
+wb = load_workbook(io.BytesIO(raw), read_only=True, data_only=True)
+all_sheets = wb.sheetnames
+total_sheets = len(all_sheets)
+sheets_to_convert = all_sheets[:MAX_SHEETS]
+skipped_sheets = all_sheets[MAX_SHEETS:]
+
+# Track which sanitized names we've used — de-duplicate if two sheet
+# names sanitize to the same token (e.g. "Q1 2026" and "q1_2026").
+used_names = set()
+def _unique(base):
+    candidate, n = base, 2
+    while candidate in used_names:
+        candidate = f"{{base}}_{{n}}"
+        n += 1
+    used_names.add(candidate)
+    return candidate
+
+# Single-sheet workbook: keep the legacy <stem>.csv filename for
+# back-compat with existing prompts/docstring examples. Multi-sheet
+# workbooks get <stem>.<sheet>.csv per sheet. The primary alias is
+# always the first sheet.
+sheet_records = []
+for idx, sheet_name in enumerate(sheets_to_convert):
+    if total_sheets == 1:
+        out_path = PRIMARY_CSV
+    else:
+        safe = _unique(_sanitize(sheet_name))
+        out_path = f"{{STEM}}.{{safe}}.csv"
+
+    ws = wb[sheet_name]
+    rows_written = 0
+    truncated = False
+    with open(out_path, 'w', newline='', encoding='utf-8') as out:
+        writer = csv.writer(out)
+        for row in ws.iter_rows(values_only=True):
+            if all(cell is None for cell in row):
+                continue
+            if rows_written >= MAX_ROWS:
+                truncated = True
+                break
+            writer.writerow([str(cell) if cell is not None else '' for cell in row])
+            rows_written += 1
+
+    # Alias the first sheet of a multi-sheet workbook to the legacy
+    # <stem>.csv path too, so the single-sheet fast path and the
+    # XLSX→CSV docstring example keep working for picking "the main
+    # sheet" without needing to know its name.
+    primary_alias = None
+    if total_sheets > 1 and idx == 0:
+        try:
+            with open(out_path, 'rb') as src, open(PRIMARY_CSV, 'wb') as dst:
+                dst.write(src.read())
+            primary_alias = PRIMARY_CSV
+        except Exception:
+            pass
+
+    sheet_records.append({{
+        'name': sheet_name,
+        'path': out_path,
+        'rows': rows_written,
+        'truncated': truncated,
+        'primary_alias': primary_alias,
+    }})
+
+print({_SHEETS_MARKER!r})
+print(f'total: {{total_sheets}}')
+print(f'converted: {{len(sheet_records)}}')
+print(f'skipped: {{len(skipped_sheets)}}')
+if skipped_sheets:
+    _preview = skipped_sheets[:5]
+    print(f'skipped_names: {{_preview}}')
+for rec in sheet_records:
+    # Emit one record per line, pipe-delimited, so the outer parser
+    # doesn't have to evaluate arbitrary Python literals.
+    trunc = '1' if rec['truncated'] else '0'
+    alias = rec['primary_alias'] or ''
+    print(f"sheet|{{rec['name']}}|{{rec['path']}}|{{rec['rows']}}|{{trunc}}|{{alias}}")
+print({_SHEETS_MARKER!r})
+wb.close()
+"""
+                resp = code_interpreter.invoke("executeCode", {"code": bootstrap_code, "language": "python", "clearContext": False})
+                bootstrap_stdout = ""
+                for event in resp.get("stream", []):
+                    result = event.get("result", {})
+                    if result.get("isError", False):
+                        error_msg = _clean_stderr(result.get("structuredContent", {}).get("stderr", ""))
+                        return {"content": [{"text": f"❌ Failed to convert XLSX in sandbox:\n```\n{error_msg}\n```"}], "status": "error"}
+                    bootstrap_stdout += result.get("structuredContent", {}).get("stdout", "")
+
+                sheet_inventory = _parse_sheet_inventory(bootstrap_stdout)
+                if not sheet_inventory["sheets"]:
+                    return {
+                        "content": [{"text": "❌ XLSX bootstrap produced no readable sheets."}],
+                        "status": "error",
+                    }
+
+                # csv_filename is the canonical name the rest of the code
+                # path uses to probe schema and emit "load:" hints. For
+                # single-sheet or the primary alias on multi-sheet, that's
+                # <stem>.csv. For multi-sheet with no primary alias (write
+                # failure), fall back to the first converted sheet's path.
+                csv_filename = (
+                    primary_csv_filename
+                    if sheet_inventory["has_primary_alias"] or len(sheet_inventory["sheets"]) == 1
+                    else sheet_inventory["sheets"][0]["path"]
+                )
+                multi_sheet_note = _format_sheet_note(sheet_inventory)
+            else:
+                # CSV — push directly as text
+                csv_filename = filename if filename.lower().endswith(".csv") else os.path.splitext(filename)[0] + ".csv"
+                multi_sheet_note = ""
+                try:
+                    csv_text = file_bytes.decode("utf-8")
+                except UnicodeDecodeError:
+                    csv_text = file_bytes.decode("utf-8", errors="replace")
+                code_interpreter.invoke("writeFiles", {"content": [{"path": csv_filename, "text": csv_text}]})
+
+            # 5. Probe schema — separate exec so its output is isolated from user code.
+            schema_preview = ""
+            try:
+                preview_resp = code_interpreter.invoke("executeCode", {
+                    "code": _build_preview_code(csv_filename),
+                    "language": "python",
+                    "clearContext": False,
+                })
+                preview_stdout = ""
+                for event in preview_resp.get("stream", []):
+                    result = event.get("result", {})
+                    if result.get("isError", False):
+                        continue
+                    preview_stdout += result.get("structuredContent", {}).get("stdout", "")
+                schema_preview, _ = _extract_schema_preview(preview_stdout)
+            except Exception as e:
+                logger.warning(f"Schema preview failed for {csv_filename}: {e}")
+
+            # 6. Execute user code
+            response = code_interpreter.invoke("executeCode", {
+                "code": python_code,
+                "language": "python",
+                "clearContext": False,
+            })
+
+            execution_output = ""
+            for event in response.get("stream", []):
+                result = event.get("result", {})
+                if result.get("isError", False):
+                    error_msg = _clean_stderr(result.get("structuredContent", {}).get("stderr", ""))
+                    error_text = f"❌ Code execution failed:\n```\n{error_msg}\n```"
+
+                    # Targeted hint for the most common wrong-filename error:
+                    # the model wrote `pd.read_csv('FY_27_Ledger.xlsx', ...)`
+                    # but in the sandbox the file lives as `FY_27_Ledger.csv`
+                    # (see docstring: XLSX sources are pre-converted). Naming
+                    # this out explicitly is much more effective than relying
+                    # on the model to infer it from the schema footer.
+                    if (
+                        is_xlsx
+                        and "FileNotFoundError" in error_msg
+                        and filename in error_msg
+                    ):
+                        error_text += (
+                            f"\n\n**Hint:** In the sandbox, the XLSX source "
+                            f"`{filename}` is loaded as `{csv_filename}`. "
+                            f"Retry with `pd.read_csv('{csv_filename}', "
+                            f"low_memory=False)`."
+                        )
+
+                    if schema_preview:
+                        # Drop the first_row dump on errors — the load line +
+                        # column list is enough for the retry, first_row is
+                        # ~1K tokens of bloat on a path that's already costing
+                        # a round-trip.
+                        trimmed_schema = _strip_first_row(schema_preview)
+                        error_text += f"\n\nDataset info (use the `load:` line verbatim):\n```\n{trimmed_schema}\n```"
+                    else:
+                        error_text += f"\n\nTry: `pd.read_csv('{csv_filename}', low_memory=False)`"
+                    if multi_sheet_note:
+                        error_text += f"\n\n{multi_sheet_note}"
+                    return {"content": [{"text": error_text}], "status": "error"}
+                stdout = result.get("structuredContent", {}).get("stdout", "")
+                if stdout:
+                    execution_output += stdout
+
+            # 7. Build the success text, then download the chart if requested
+            success_text = _truncate_output(execution_output) or "✅ Code executed successfully (no output)."
+            if schema_preview:
+                success_text = f"{success_text}\n\n---\nDataset: {schema_preview.splitlines()[0]}"
+            if multi_sheet_note:
+                success_text = f"{success_text}\n{multi_sheet_note}"
+
+            if output_filename and output_filename.endswith(".png"):
+                try:
+                    dl_response = code_interpreter.invoke("readFiles", {"paths": [output_filename]})
+                    file_content = None
+                    for event in dl_response.get("stream", []):
+                        result = event.get("result", {})
+                        if "content" in result:
+                            for block in result["content"]:
+                                if "data" in block:
+                                    file_content = block["data"]
+                                elif "resource" in block and "blob" in block["resource"]:
+                                    file_content = block["resource"]["blob"]
+                                if file_content:
+                                    break
+                        if file_content:
+                            break
+
+                    if file_content:
+                        return {
+                            "content": [
+                                {"text": success_text},
+                                {"image": {"format": "png", "source": {"bytes": file_content}}},
+                            ],
+                            "status": "success",
+                        }
+                except Exception as e:
+                    logger.warning(f"Failed to download chart {output_filename}: {e}")
+
+            return {
+                "content": [{"text": success_text}],
+                "status": "success",
+            }
+
+        finally:
+            try:
+                code_interpreter.stop()
+            except Exception:
+                pass
+
+    return analyze_spreadsheet
+
+
+async def _find_file(filename: str, assistant_id: Optional[str], session_id: str) -> Optional[Dict[str, Any]]:
+    """Find a file by name in accessible sources. Returns file info or None.
+
+    Matches are tolerant to XLSX ↔ CSV aliasing: if the model asks for
+    ``foo.csv`` but only ``foo.xlsx`` exists (because the sandbox converts
+    XLSX → CSV and the model copied the sandbox name into the ``filename``
+    param on retry), we treat them as the same file. Prevents the common
+    round-trip loop where analyze_spreadsheet rejects a reasonable guess
+    and forces the model to call list_spreadsheets (#206).
+    """
+    candidates: list[Dict[str, Any]] = []
+    if assistant_id:
+        candidates.extend(await _get_kb_files(assistant_id))
+    candidates.extend(await _get_session_files(session_id))
+
+    target_lower = filename.lower()
+    target_stem, _ = os.path.splitext(target_lower)
+
+    # First pass: exact match (case-insensitive).
+    for f in candidates:
+        if f["filename"].lower() == target_lower:
+            return f
+
+    # Second pass: same stem, tabular extension. Covers foo.csv -> foo.xlsx
+    # and foo.xlsx -> foo.csv. Only applies to tabular files so we don't
+    # accidentally alias foo.pdf to foo.docx.
+    from apis.shared.files.models import is_tabular_file
+
+    if target_stem and target_lower.endswith((".csv", ".xls", ".xlsx")):
+        for f in candidates:
+            cand_lower = f["filename"].lower()
+            cand_stem, _ = os.path.splitext(cand_lower)
+            if cand_stem == target_stem and is_tabular_file(f["filename"], f.get("content_type", "")):
+                return f
+
+    return None
+
+
+def _download_file(file_info: Dict[str, Any]) -> bytes:
+    """Download file bytes from S3."""
+    region = os.environ.get("AWS_REGION", "us-west-2")
+    s3 = boto3.client("s3", region_name=region)
+
+    if file_info["source"] == "knowledge_base":
+        bucket = os.environ.get("S3_ASSISTANTS_DOCUMENTS_BUCKET_NAME")
+        if not bucket:
+            raise ValueError("S3_ASSISTANTS_DOCUMENTS_BUCKET_NAME not configured")
+    else:
+        bucket = file_info.get("s3_bucket")
+        if not bucket:
+            raise ValueError("S3 bucket not found in file metadata")
+
+    response = s3.get_object(Bucket=bucket, Key=file_info["s3_key"])
+    return response["Body"].read()
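The two-pass lookup in `_find_file` above can be tried out in isolation. The sketch below is a simplified, synchronous stand-in rather than the project's implementation: `resolve_spreadsheet` and `TABULAR_EXTS` are hypothetical names, and a plain extension check substitutes for `is_tabular_file`, whose real logic is not shown in this diff.

```python
import os
from typing import Any, Dict, List, Optional

# Assumption: a plain extension check stands in for is_tabular_file().
TABULAR_EXTS = (".csv", ".xls", ".xlsx")


def resolve_spreadsheet(
    filename: str, candidates: List[Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: exact match first, then same-stem tabular alias."""
    target = filename.lower()
    stem, _ = os.path.splitext(target)

    # Pass 1: an exact, case-insensitive match always wins.
    for cand in candidates:
        if cand["filename"].lower() == target:
            return cand

    # Pass 2: alias only between tabular extensions (foo.csv <-> foo.xlsx).
    if stem and target.endswith(TABULAR_EXTS):
        for cand in candidates:
            name = cand["filename"].lower()
            if os.path.splitext(name)[0] == stem and name.endswith(TABULAR_EXTS):
                return cand
    return None


files = [{"filename": "Budget.xlsx"}, {"filename": "notes.pdf"}]
print(resolve_spreadsheet("budget.csv", files))  # resolves to the Budget.xlsx entry
print(resolve_spreadsheet("notes.docx", files))  # None: non-tabular names are never aliased
```

Restricting the second pass to tabular extensions is what keeps an unrelated `notes.pdf` from being returned for `notes.docx`.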
diff --git a/backend/src/agents/builtin_tools/spreadsheet_analysis/list_spreadsheets_tool.py b/backend/src/agents/builtin_tools/spreadsheet_analysis/list_spreadsheets_tool.py
new file mode 100644
index 00000000..45744f38
--- /dev/null
+++ b/backend/src/agents/builtin_tools/spreadsheet_analysis/list_spreadsheets_tool.py
@@ -0,0 +1,158 @@
+"""List available spreadsheet files for analysis.
+
+Factory function creates a context-bound tool that only exposes CSV/XLSX
+files belonging to the current assistant's knowledge base or chat session.
+"""
+
+import asyncio
+import logging
+import os
+from typing import Any, Dict, List, Optional
+
+import boto3
+from strands import tool
+
+from apis.shared.files.models import is_tabular_file
+
+logger = logging.getLogger(__name__)
+
+
+def _is_tabular_file(filename: str, content_type: str) -> bool:
+    """Deprecated wrapper — use apis.shared.files.models.is_tabular_file.
+
+    Kept as a module-local name so existing callers in this file stay
+    readable; shares the canonical implementation that the inference-api
+    route uses when partitioning chat attachments (#206).
+    """
+    return is_tabular_file(filename, content_type)
+
+
+def make_list_spreadsheets_tool(
+    assistant_id: Optional[str],
+    session_id: str,
+    user_id: str,
+):
+    """Create a list_spreadsheets tool bound to the given context."""
+
+    @tool
+    async def list_spreadsheets() -> Dict[str, Any]:
+        """List CSV/XLSX spreadsheet files available for analysis.
+
+        Returns spreadsheets from the assistant's knowledge base (if a
+        conversation is scoped to an assistant) and/or files attached to
+        the current conversation. Use this to discover which files can be
+        analyzed with the analyze_spreadsheet tool.
+
+        Returns:
+            Dictionary with 'files' list containing available spreadsheets,
+            each with filename, source, content_type, size_bytes, and document_id.
+        """
+        files: List[Dict[str, Any]] = []
+
+        # 1. Assistant KB files
+        if assistant_id:
+            files.extend(await _get_kb_files(assistant_id))
+
+        # 2. Session-attached files
+        files.extend(await _get_session_files(session_id))
+
+        if not files:
+            return {
+                "content": [{"text": "No spreadsheet files (CSV or XLSX) are available. Upload a spreadsheet to the assistant's knowledge base or attach one to this conversation."}],
+                "status": "success",
+            }
+
+        file_list = "\n".join(
+            f"- {f['filename']} ({f['source']}, {f['size_bytes'] / 1024:.0f} KB)"
+            for f in files
+        )
+        return {
+            "content": [{"text": f"Available spreadsheet files:\n{file_list}"}],
+            "status": "success",
+            "files": files,
+        }
+
+    return list_spreadsheets
+
+
+async def _get_kb_files(assistant_id: str) -> List[Dict[str, Any]]:
+    """Query DynamoDB for completed tabular documents in the assistant's KB.
+
+    The boto3 query is offloaded to a worker thread via ``asyncio.to_thread``
+    so the event loop stays free to schedule other coroutines while the
+    DynamoDB round-trip is in flight. Previously this was a sync function
+    called from async contexts — see #260 for the regression it caused
+    under concurrent chat load.
+    """
+    table_name = os.environ.get("DYNAMODB_ASSISTANTS_TABLE_NAME")
+    if not table_name:
+        logger.warning("DYNAMODB_ASSISTANTS_TABLE_NAME not set, skipping KB files")
+        return []
+
+    def _query() -> Dict[str, Any]:
+        dynamodb = boto3.resource(
+            "dynamodb", region_name=os.environ.get("AWS_REGION", "us-west-2")
+        )
+        table = dynamodb.Table(table_name)
+        return table.query(
+            KeyConditionExpression="PK = :pk AND begins_with(SK, :sk_prefix)",
+            ExpressionAttributeValues={":pk": f"AST#{assistant_id}", ":sk_prefix": "DOC#"},
+        )
+
+    try:
+        response = await asyncio.to_thread(_query)
+
+        files = []
+        for item in response.get("Items", []):
+            if item.get("status") != "complete":
+                continue
+            filename = item.get("filename", "")
+            content_type = item.get("contentType", item.get("content_type", ""))
+            if not _is_tabular_file(filename, content_type):
+                continue
+            files.append({
+                "filename": filename,
+                "source": "knowledge_base",
+                "content_type": content_type,
+                "size_bytes": int(item.get("sizeBytes", item.get("size_bytes", 0))),
+                "document_id": item.get("documentId", item.get("document_id", "")),
+                "s3_key": item.get("s3Key", item.get("s3_key", "")),
+            })
+        return files
+
+    except Exception as e:
+        logger.error(f"Error querying KB files for assistant {assistant_id}: {e}")
+        return []
+
+
+async def _get_session_files(session_id: str) -> List[Dict[str, Any]]:
+    """Query DynamoDB for tabular files attached to the current session.
+
+    Awaits ``FileUploadRepository.list_session_files`` directly — replaces
+    the earlier sync-in-async executor dance which spun up a thread pool
+    per call and ran ``asyncio.run`` inside it. See #260.
+    """
+    try:
+        from apis.shared.files.repository import get_file_upload_repository
+
+        repo = get_file_upload_repository()
+        session_files = await repo.list_session_files(session_id)
+
+        files = []
+        for f in session_files:
+            if not _is_tabular_file(f.filename, f.mime_type):
+                continue
+            files.append({
+                "filename": f.filename,
+                "source": "chat_attachment",
+                "content_type": f.mime_type,
+                "size_bytes": f.size_bytes,
+                "document_id": f.upload_id,
+                "s3_key": f.s3_key,
+                "s3_bucket": f.s3_bucket,
+            })
+        return files
+
+    except Exception as e:
+        logger.error(f"Error querying session files for {session_id}: {e}")
+        return []
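The `_get_kb_files` docstring above describes offloading a blocking boto3 call with `asyncio.to_thread`. A minimal sketch of that pattern follows, with `time.sleep` standing in for the DynamoDB round-trip. `blocking_query` and `heartbeat` are hypothetical names introduced only for illustration; the point is that other coroutines keep making progress while the worker thread blocks.

```python
import asyncio
import time
from typing import Any, Dict, List, Tuple


def blocking_query(delay: float) -> Dict[str, Any]:
    # Stand-in for a synchronous boto3 call (assumption: no AWS access here).
    time.sleep(delay)
    return {"Items": [{"filename": "a.csv"}]}


async def heartbeat(ticks: List[float], interval: float, count: int) -> None:
    # Records a timestamp on each wake-up to show the loop is still scheduling.
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


async def main() -> Tuple[Dict[str, Any], List[float]]:
    ticks: List[float] = []
    response, _ = await asyncio.gather(
        asyncio.to_thread(blocking_query, 0.2),  # runs in a worker thread
        heartbeat(ticks, 0.02, 3),               # runs on the event loop
    )
    return response, ticks


response, ticks = asyncio.run(main())
print(len(response["Items"]), len(ticks))
```

Calling `blocking_query` directly inside `main` would instead stall the heartbeat until the sleep finished, which is the failure mode the docstring attributes to #260.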
diff --git a/backend/src/agents/main_agent/USAGE_EXAMPLES.md b/backend/src/agents/main_agent/USAGE_EXAMPLES.md
index afc0fea5..b9b1360e 100644
--- a/backend/src/agents/main_agent/USAGE_EXAMPLES.md
+++ b/backend/src/agents/main_agent/USAGE_EXAMPLES.md
@@ -347,8 +347,8 @@ from agents.main_agent import MainAgent
 
 app = FastAPI()
 
-@app.post("/chat/agent-stream")
-async def chat_agent_stream(session_id: str, message: str, enabled_tools: list[str]):
+@app.post("/chat/stream")
+async def chat_stream(session_id: str, message: str, enabled_tools: list[str]):
     agent = MainAgent(
         session_id=session_id,
         enabled_tools=enabled_tools
diff --git a/backend/src/agents/main_agent/base_agent.py b/backend/src/agents/main_agent/base_agent.py
index 75c41b74..3b295545 100644
--- a/backend/src/agents/main_agent/base_agent.py
+++ b/backend/src/agents/main_agent/base_agent.py
@@ -59,6 +59,7 @@ def __init__(
         max_tokens: Optional[int] = None,
         inference_params: Optional[Dict[str, Any]] = None,
         skip_persistence: bool = False,
+        extra_tools: Optional[List[Any]] = None,
     ):
         """
         Initialize base agent with shared infrastructure.
@@ -84,6 +85,7 @@ def __init__(
         self.user_id = user_id or session_id
         self.auth_token = auth_token
         self.enabled_tools = enabled_tools
+        self.extra_tools = extra_tools or []
         self.agent = None
 
         # Merge legacy temperature/max_tokens into the canonical dict. Explicit
@@ -182,6 +184,7 @@ async def stream_async(
         citations: Optional[List] = None,
         original_message: Optional[str] = None,
         interrupt_responses: Optional[List[Dict[str, Any]]] = None,
+        continue_truncated: bool = False,
     ) -> AsyncGenerator[str, None]:
         """Stream agent responses. Subclasses must implement.
 
@@ -189,6 +192,12 @@ async def stream_async(
         agent turn (Strands interrupt protocol) instead of starting a new
         one. In that case `message`/`files` are ignored — the original turn
         already has the user's prompt in its context.
+
+        When `continue_truncated` is True, the call resumes after a
+        max_tokens truncation: `message`/`files` are ignored and the loop is
+        re-entered with an empty prompt so the model continues the truncated
+        assistant message already in restored history (assistant-prefill),
+        rather than answering a fresh instruction.
         """
         ...
 
@@ -415,6 +424,11 @@ async def _load_with_context():
 
             logger.info(f"Added {len(external_clients)} external MCP clients to tools")
 
+        # Append context-bound tools (e.g., spreadsheet analysis) created per-request
+        if self.extra_tools:
+            local_tools.extend(self.extra_tools)
+            logger.info(f"Added {len(self.extra_tools)} extra context-bound tools")
+
         return local_tools
 
     def get_model_config(self) -> dict:
diff --git a/backend/src/agents/main_agent/chat_agent.py b/backend/src/agents/main_agent/chat_agent.py
index edf26f44..12baf9e4 100644
--- a/backend/src/agents/main_agent/chat_agent.py
+++ b/backend/src/agents/main_agent/chat_agent.py
@@ -50,6 +50,7 @@ async def stream_async(
         citations: Optional[List] = None,
         original_message: Optional[str] = None,
         interrupt_responses: Optional[List[Dict[str, Any]]] = None,
+        continue_truncated: bool = False,
     ) -> AsyncGenerator[str, None]:
         """
         Stream agent responses.
@@ -65,6 +66,12 @@ async def stream_async(
             interrupt_responses: When set, resume a paused agent turn by
                 passing this list as the prompt to Strands. Each entry is
                 `{"interruptResponse": {"interruptId": str, "response": Any}}`.
+            continue_truncated: When True, resume after a max_tokens
+                truncation by passing an empty-list prompt to Strands.
+                `_convert_prompt_to_messages([])` appends no message, so the
+                event loop re-runs against restored history whose tail is the
+                truncated assistant message — the model continues it
+                (assistant-prefill) instead of answering a new instruction.
 
         Yields:
             str: SSE formatted events
@@ -78,6 +85,11 @@ async def stream_async(
             # interrupts' `.response`, and continues from the paused tool
             # call. multimodal_builder + files do not apply here.
             prompt: Any = interrupt_responses
+        elif continue_truncated:
+            # Empty list → Strands appends nothing → the loop re-runs against
+            # restored history (tail = truncated assistant message). No new
+            # user turn, no multimodal/files.
+            prompt = []
         else:
             prompt = self.multimodal_builder.build_prompt(message, files)
 
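Reviewer sketch (not part of the patch): the three-way prompt branch above, pulled out as a pure function so the precedence is easy to see. `select_prompt` and `build_prompt` are hypothetical names introduced here for illustration, and the truthiness check on `interrupt_responses` is an assumption, since the original `if` line is outside this hunk.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for `self.multimodal_builder.build_prompt`.
PromptBuilder = Callable[[str, Optional[List[Any]]], Any]


def select_prompt(
    message: str,
    files: Optional[List[Any]],
    build_prompt: PromptBuilder,
    interrupt_responses: Optional[List[Dict[str, Any]]] = None,
    continue_truncated: bool = False,
) -> Any:
    """Pick the prompt handed to the agent loop.

    Assumed precedence, mirroring the diff: interrupt resume first, then
    truncation resume, then a normal user turn.
    """
    if interrupt_responses:
        # Resume a paused turn: the responses themselves are the prompt.
        return interrupt_responses
    if continue_truncated:
        # Empty list: no new message is appended, so the model continues
        # from the truncated assistant message at the tail of history.
        return []
    # Normal turn: build the (possibly multimodal) prompt from user input.
    return build_prompt(message, files)
```

In this sketch `message` and `files` are only consulted on the final branch, which matches the docstring's statement that both are ignored when resuming.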
diff --git a/backend/src/agents/main_agent/config/constants.py b/backend/src/agents/main_agent/config/constants.py
index 0b98f286..d8f2f9a6 100644
--- a/backend/src/agents/main_agent/config/constants.py
+++ b/backend/src/agents/main_agent/config/constants.py
@@ -49,6 +49,14 @@ class EnvVars:
     # --- Gateway ---
     GATEWAY_MCP_ENABLED = "AGENTCORE_GATEWAY_MCP_ENABLED"
 
+    # --- MCP Apps (host renderer initiative) ---
+    MCP_APPS_HOST_ENABLED = "AGENTCORE_MCP_APPS_HOST_ENABLED"
+    # Origin of the sandbox-proxy (proxy.html) the SPA frames an MCP App in.
+    # Deploy pipeline sources it from SSM /{projectPrefix}/mcp-sandbox/origin
+    # (published by the PR #1 CDK stack). Surfaced to the SPA on the
+    # `ui_resource` SSE event so the frontend needs no separate config fetch.
+    MCP_APPS_SANDBOX_ORIGIN = "AGENTCORE_MCP_APPS_SANDBOX_ORIGIN"
+
     # --- Frontend ---
     FRONTEND_URL = "FRONTEND_URL"
 
@@ -104,6 +112,19 @@ class Defaults:
     # --- Gateway ---
     GATEWAY_MCP_ENABLED = True
 
+    # --- MCP Apps (host renderer initiative) ---
+    # Gates the entire MCP Apps host surface. Flipped on in PR #7 of
+    # docs/kaizen/scoping/mcp-apps-host-renderer.md (the full #1–#6 chain
+    # landed first). Set AGENTCORE_MCP_APPS_HOST_ENABLED=false to opt a
+    # given environment back out.
+    MCP_APPS_HOST_ENABLED = True
+    # Empty unless the mcp-sandbox stack is deployed: the inference-api CDK
+    # stack wires `/{prefix}/mcp-sandbox/origin` into this env only when
+    # `config.mcpSandbox.enabled` (conditional-SSM, mirrors artifacts).
+    # An empty origin keeps the surface dormant — the SPA has no proxy
+    # origin to frame an App in even with the host flag on.
+    MCP_APPS_SANDBOX_ORIGIN = ""
+
     # --- Voice Agent ---
     NOVA_SONIC_MODEL_ID = "amazon.nova-2-sonic-v1:0"
     NOVA_SONIC_VOICE = "tiffany"
diff --git a/backend/src/agents/main_agent/core/model_config.py b/backend/src/agents/main_agent/core/model_config.py
index 3b6dd31d..c6e8e254 100644
--- a/backend/src/agents/main_agent/core/model_config.py
+++ b/backend/src/agents/main_agent/core/model_config.py
@@ -30,6 +30,11 @@ class ModelProvider(str, Enum):
     "top_k": "additional_request_fields.top_k",
     "max_tokens": "max_tokens",
     "thinking": "additional_request_fields.thinking",
+    # `effort` is Anthropic's top-level `output_config.effort`. It isn't on
+    # the Bedrock Converse standard shape either, so it rides through
+    # `additionalModelRequestFields` like `thinking`/`top_k`. Soft guidance
+    # for thinking depth on adaptive models; also tunes overall token spend.
+    "effort": "additional_request_fields.output_config.effort",
 }
 
 _OPENAI_PARAM_MAP: Dict[str, str] = {
@@ -52,6 +57,13 @@ class ModelProvider(str, Enum):
 # them. Suppression happens in `_apply_canonical_params` before dispatch.
 _THINKING_INCOMPATIBLE = {"temperature", "top_p", "top_k"}
 
+# Canonical params whose provider-native value must be a plain int. JSON- and
+# DynamoDB-sourced inference params arrive untyped (Dict[str, Any]) and can be
+# a float (e.g. 100000.0); the Bedrock Converse SDK rejects a float maxTokens
+# with a hard boto3 validation error. Coerce at this single translation
+# chokepoint. `thinking` is excluded — `_shape_thinking_value` already int()s it.
+_INTEGER_CANONICAL_PARAMS: frozenset[str] = frozenset({"max_tokens", "top_k"})
+
 # Union of every canonical key we know how to translate. Used by the request
 # merge step to gate user-supplied keys against an allow-list — admins can
 # constrain known params with `supportedParams`, but users shouldn't be able
@@ -71,19 +83,49 @@ def _set_nested(target: Dict[str, Any], dotted_path: str, value: Any) -> None:
     cursor[keys[-1]] = value
 
 
-def _shape_thinking_value(provider_label: str, value: Any) -> Any:
+# Bedrock model-id substrings whose Anthropic models require (Opus 4.7) or
+# recommend (Opus 4.6, Sonnet 4.6, Mythos) adaptive thinking. On these,
+# `{type: "enabled", budget_tokens: N}` is rejected (4.7) or deprecated; the
+# shape is `{type: "adaptive"}` and depth is governed by the `effort` param.
+# Substring match — real ids are inference-profile-prefixed and date-stamped
+# (e.g. `us.anthropic.claude-opus-4-7-20XXXXXX-v1:0`). Unknown ids fall back
+# to the legacy enabled shape, which is the safe default for older models.
+_BEDROCK_ADAPTIVE_THINKING_MARKERS = (
+    "claude-opus-4-7",
+    "claude-opus-4-6",
+    "claude-sonnet-4-6",
+    "claude-mythos",
+)
+
+
+def _bedrock_uses_adaptive_thinking(model_id: Optional[str]) -> bool:
+    """True when the Bedrock model id requires/recommends adaptive thinking."""
+    mid = (model_id or "").lower()
+    return any(marker in mid for marker in _BEDROCK_ADAPTIVE_THINKING_MARKERS)
+
+
+def _shape_thinking_value(
+    provider_label: str, value: Any, model_id: Optional[str] = None
+) -> Any:
     """Wrap a canonical ``thinking`` value into the provider-native object.
 
     The canonical value is an ``int`` budget (>= 1024), or falsy / 0 to disable.
-    Anthropic on Bedrock requires ``{type, budget_tokens}``; Gemini wants
-    ``{thinking_budget}``. Anything that's already a dict (admin pasting raw
-    SDK shape) is passed through verbatim.
+    On Bedrock, older Anthropic models take ``{type: "enabled", budget_tokens}``;
+    Opus 4.6/4.7, Sonnet 4.6, and Mythos get ``{type: "adaptive"}`` instead (Opus
+    4.7 rejects ``enabled`` with a 400). For adaptive models the int budget only
+    signals "thinking on" — depth is controlled by the separate ``effort``
+    param — and we set ``display: "summarized"`` so the reasoning trace the
+    UI renders isn't blank (Opus 4.7 defaults ``display`` to ``"omitted"``).
+    Gemini wants ``{thinking_budget}``. Anything that's already a dict (admin
+    pasting raw SDK shape) is passed through verbatim.
     """
     if isinstance(value, dict):
         return value
     if not value:
         return None
     if provider_label == "bedrock":
+        if _bedrock_uses_adaptive_thinking(model_id):
+            return {"type": "adaptive", "display": "summarized"}
         return {"type": "enabled", "budget_tokens": int(value)}
     if provider_label == "gemini":
         return {"thinking_budget": int(value)}
@@ -95,6 +137,7 @@ def _apply_canonical_params(
     canonical_params: Dict[str, Any],
     provider_map: Dict[str, str],
     provider_label: str,
+    model_id: Optional[str] = None,
 ) -> None:
     """Translate canonical inference params into provider-native shape.
 
@@ -125,10 +168,16 @@ def _apply_canonical_params(
             )
             continue
         if name == "thinking":
-            shaped = _shape_thinking_value(provider_label, value)
+            shaped = _shape_thinking_value(provider_label, value, model_id)
             if shaped is None:
                 continue
             value = shaped
+        elif (
+            name in _INTEGER_CANONICAL_PARAMS
+            and isinstance(value, (int, float))
+            and not isinstance(value, bool)
+        ):
+            value = int(value)
         _set_nested(target, native_path, value)
 
 
@@ -227,13 +276,16 @@ def get_provider(self) -> ModelProvider:
     def to_bedrock_config(self) -> Dict[str, Any]:
         """Convert to BedrockModel kwargs, translating canonical inference params."""
         config: Dict[str, Any] = {"model_id": self.model_id}
-        _apply_canonical_params(config, self.inference_params, _BEDROCK_PARAM_MAP, "bedrock")
+        _apply_canonical_params(
+            config, self.inference_params, _BEDROCK_PARAM_MAP, "bedrock", self.model_id
+        )
 
-        # TODO: Re-enable once Bedrock supports cachePoint blocks alongside
-        # non-PDF document blocks (.md, .docx, etc.). Currently causes:
-        # ValidationException: messages.N.content.M.type: Field required
-        # because Bedrock can't translate cachePoint after document blocks
-        # to the Anthropic format.
+        # Bedrock prompt caching is intentionally deferred. The previous SDK
+        # blocker (`cachePoint` blocks failing alongside non-PDF document
+        # attachments, fixed by strands PR #1438) is resolved in
+        # strands-agents 1.39.0, so the technical barrier is gone. Re-enabling
+        # is being held for a separate, scoped rollout (cost/badge impact is
+        # user-visible the moment caching turns on).
         # See: https://github.com/strands-agents/sdk-python/pull/1438
         # if self.caching_enabled:
         #     from strands.models import CacheConfig
@@ -255,7 +307,9 @@ def to_bedrock_config(self) -> Dict[str, Any]:
     def to_openai_config(self) -> Dict[str, Any]:
         """Convert to OpenAIModel kwargs, translating canonical inference params."""
         params: Dict[str, Any] = {}
-        _apply_canonical_params(params, self.inference_params, _OPENAI_PARAM_MAP, "openai")
+        _apply_canonical_params(
+            params, self.inference_params, _OPENAI_PARAM_MAP, "openai", self.model_id
+        )
         config: Dict[str, Any] = {"model_id": self.model_id}
         if params:
             config["params"] = params
@@ -264,7 +318,9 @@ def to_openai_config(self) -> Dict[str, Any]:
     def to_gemini_config(self) -> Dict[str, Any]:
         """Convert to GeminiModel kwargs, translating canonical inference params."""
         params: Dict[str, Any] = {}
-        _apply_canonical_params(params, self.inference_params, _GEMINI_PARAM_MAP, "gemini")
+        _apply_canonical_params(
+            params, self.inference_params, _GEMINI_PARAM_MAP, "gemini", self.model_id
+        )
         config: Dict[str, Any] = {"model_id": self.model_id}
         if params:
             config["params"] = params
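Reviewer sketch (not part of the patch): a simplified, stand-alone restatement of the two behaviors this file's diff adds for Bedrock, namely choosing the thinking shape by model-id substring and coercing integer-valued params. The function names `shape_bedrock_thinking` and `coerce_integer_param` are hypothetical, and the model ids in the usage lines are made-up placeholders, not real Bedrock ids.

```python
from typing import Any, Dict, Optional

# Same markers as the diff; matched as substrings of the lowercased model id.
ADAPTIVE_MARKERS = (
    "claude-opus-4-7",
    "claude-opus-4-6",
    "claude-sonnet-4-6",
    "claude-mythos",
)
INTEGER_PARAMS = frozenset({"max_tokens", "top_k"})


def uses_adaptive_thinking(model_id: Optional[str]) -> bool:
    mid = (model_id or "").lower()
    return any(marker in mid for marker in ADAPTIVE_MARKERS)


def shape_bedrock_thinking(value: Any, model_id: Optional[str] = None) -> Optional[Dict[str, Any]]:
    """Bedrock-only slice of the shaping logic (other providers omitted)."""
    if isinstance(value, dict):
        return value  # raw SDK shape passes through verbatim
    if not value:
        return None  # falsy / 0 disables thinking
    if uses_adaptive_thinking(model_id):
        # The budget only signals "on"; depth comes from the `effort` param.
        return {"type": "adaptive", "display": "summarized"}
    return {"type": "enabled", "budget_tokens": int(value)}


def coerce_integer_param(name: str, value: Any) -> Any:
    """Turn e.g. 100000.0 into 100000 for params that must be plain ints."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if name in INTEGER_PARAMS and is_number:
        return int(value)
    return value


# Placeholder ids for illustration only.
print(shape_bedrock_thinking(4096, "us.anthropic.claude-opus-4-7-example-v1:0"))
print(shape_bedrock_thinking(4096.0, "example-older-model"))
print(coerce_integer_param("max_tokens", 100000.0))
```

The `bool` exclusion matters because `bool` is a subclass of `int` in Python, so `True` would otherwise be silently rewritten to `1`.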
diff --git a/backend/src/agents/main_agent/core/system_prompt_builder.py b/backend/src/agents/main_agent/core/system_prompt_builder.py
index 7fce7daf..9b76ffa2 100644
--- a/backend/src/agents/main_agent/core/system_prompt_builder.py
+++ b/backend/src/agents/main_agent/core/system_prompt_builder.py
@@ -51,6 +51,82 @@
 - Always explain your reasoning when using tools
 - If you don't have the right tool for a task, clearly inform the user about the limitation
 
+HANDLING MISSING TOOLS:
+Users can toggle individual tools on and off from the Tools section of the
+model settings panel (the gear icon next to the message input). When a user
+asks for something you would normally handle with a tool that isn't currently
+available to you, don't just say "I can't do that." Instead:
+
+1. Identify which capability they're asking for in plain language
+   (e.g. "spreadsheet analysis", "web browsing", "Python execution",
+   "knowledge base search").
+2. Tell them that capability isn't active in the current session and suggest
+   they enable the matching tool from the Tools panel in settings, then retry
+   the request.
+3. If you can offer a partial answer without the tool (e.g. explaining a
+   formula they could run themselves), do that as a fallback — but lead with
+   the tool suggestion so they know the better path exists.
+
+Common user intents and the tools to point at:
+- Analyzing spreadsheet/CSV data, aggregations, totals, trends → "Spreadsheet Analysis"
+- Listing files attached to the conversation or assistant → "List Spreadsheet Files"
+- Running Python code, generating charts or diagrams from data → "Code Interpreter"
+- Live web searches, news, current events → the web search tools
+- Fetching a specific URL's contents → the URL fetch tool
+- Questions answerable from the assistant's knowledge base → the knowledge base search tool
+
+Example response when spreadsheet analysis is disabled and a user asks for a
+column total:
+
+> I can compute that for you, but the Spreadsheet Analysis tool isn't
+> currently enabled for this conversation. Open the settings panel (gear
+> icon next to the message input), enable "Spreadsheet Analysis" under
+> Tools, and send the request again — I'll run the aggregation directly
+> on the file. Alternatively, you can open the file in Excel and use
+> `=SUM(NET_AMOUNT)` on the column.
+
+SPREADSHEET ANALYSIS — DISAMBIGUATION:
+When more than one spreadsheet is attached (including the assistant's
+knowledge base plus any chat attachments), do not silently pick one for
+`analyze_spreadsheet`. The turn preamble will list every available tabular
+file when multiple exist. Use that list to decide:
+
+1. If the user named a specific file (or the reference is unambiguous from
+   the query), analyze that file and state which one in your response:
+   "Analyzing `X.xlsx`: …"
+2. If the user's request could reasonably span multiple files (e.g. "total
+   X across the ledgers"), either run `analyze_spreadsheet` on each file
+   and combine the results, or explain the approach and ask the user which
+   files to include.
+3. If the reference is ambiguous, ask the user which file they mean
+   rather than guessing from RAG chunk ordering.
+
+Always name the file(s) you analyzed in the final response so the user can
+audit the choice. Example:
+
+> Analyzed `FY_27_Ledger.xlsx` — the total NET_AMOUNT is $20,419,308.89
+> across 18,551 transactions. Note: `FY_27_Ledger(_11).xlsx` is also
+> attached but was not included in this total. Let me know if you'd like
+> a combined figure.
+
+SPREADSHEET ANALYSIS — MULTI-SHEET WORKBOOKS:
+An XLSX workbook can have more than one sheet. When it does, the
+`analyze_spreadsheet` response includes an "Available sheets" footer
+listing one CSV target per sheet (e.g. `Budget.summary.csv`,
+`Budget.transactions.csv`).
+
+- Use the sheet CSV names verbatim in `pd.read_csv(...)` — they're
+  already correct for the sandbox.
+- For single-sheet workbooks the legacy `<stem>.csv` name still works.
+- For queries that span sheets (e.g. "total X across all tabs"), read
+  each sheet and combine with `pd.concat`:
+      ``dfs = [pd.read_csv(p) for p in paths]``
+      ``combined = pd.concat(dfs, ignore_index=True)``
+- Name the sheet(s) you analyzed in the response so the user can audit.
+- If the workbook had sheets skipped by the conversion cap (the footer
+  will say so), tell the user explicitly rather than presenting partial
+  results as complete.
+
 Your goal is to be helpful, accurate, and efficient in completing user requests using the available tools."""
 
 
diff --git a/backend/src/agents/main_agent/integrations/external_mcp_client.py b/backend/src/agents/main_agent/integrations/external_mcp_client.py
index fa5708dc..fd64cd1a 100644
--- a/backend/src/agents/main_agent/integrations/external_mcp_client.py
+++ b/backend/src/agents/main_agent/integrations/external_mcp_client.py
@@ -27,6 +27,7 @@
     ToolDefinition,
 )
 from agents.main_agent.integrations import oauth_token_cache
+from agents.main_agent.integrations.mcp_apps import UICapableMCPClient
 from agents.main_agent.integrations.gateway_auth import get_sigv4_auth
 from agents.main_agent.integrations.oauth_auth import (
     CompositeAuth,
@@ -182,7 +183,10 @@ def create_external_mcp_client(
             transport = MCPTransport(transport)
 
         if transport == MCPTransport.STREAMABLE_HTTP:
-            mcp_client = MCPClient(
+            # UICapableMCPClient advertises the MCP Apps UI extension on
+            # initialize and filters app-only tools out of the model's tool
+            # list (both inert unless AGENTCORE_MCP_APPS_HOST_ENABLED=true).
+            mcp_client = UICapableMCPClient(
                 lambda url=config.server_url, auth=auth: streamablehttp_client(
                     url,
                     auth=auth
diff --git a/backend/src/agents/main_agent/integrations/gateway_mcp_client.py b/backend/src/agents/main_agent/integrations/gateway_mcp_client.py
index de1a7953..d00fc87a 100644
--- a/backend/src/agents/main_agent/integrations/gateway_mcp_client.py
+++ b/backend/src/agents/main_agent/integrations/gateway_mcp_client.py
@@ -11,6 +11,11 @@
 from strands.tools.mcp import MCPClient
 from agents.main_agent.config.constants import EnvVars, Defaults
 from agents.main_agent.integrations.gateway_auth import get_sigv4_auth, get_gateway_region_from_url
+from agents.main_agent.integrations.mcp_apps import (
+    UICapableMCPClient,
+    ensure_ui_extension_session_patch,
+    record_and_filter_ui_tools,
+)
 
 logger = logging.getLogger(__name__)
 
@@ -38,6 +43,9 @@ def __init__(
             enabled_tool_ids: List of tool IDs that should be enabled
             prefix: Prefix used for tool IDs (default: 'gateway')
         """
+        # Advertise the MCP Apps UI extension on this client's initialize
+        # (inert unless AGENTCORE_MCP_APPS_HOST_ENABLED=true).
+        ensure_ui_extension_session_patch()
         super().__init__(client_factory)
         self.enabled_tool_ids = enabled_tool_ids
         self.prefix = prefix
@@ -80,6 +88,12 @@ def list_tools_sync(self, *args, **kwargs):
         logger.info(f"   Enabled tool IDs: {self.enabled_tool_ids}")
         logger.info(f"   Filtered tool names: {[t.tool_name for t in filtered_tools]}")
 
+        # Record `_meta.ui` into the catalog and drop app-only tools so the
+        # model never sees them (no-op unless the host flag is enabled).
+        # `self` is the client hosting these tools — recorded so PR #3 can
+        # issue `resources/read` against it.
+        filtered_tools = record_and_filter_ui_tools(filtered_tools, client=self)
+
         return PaginatedList(filtered_tools, token=paginated_result.pagination_token)
 
 
@@ -163,8 +177,10 @@ def create_gateway_mcp_client(
 
     # Create MCP client with streamable HTTP transport
     # Note: prefix and tool_filters are no longer supported in MCPClient constructor
-    # We'll filter tools manually after listing them
-    mcp_client = MCPClient(
+    # We'll filter tools manually after listing them.
+    # UICapableMCPClient advertises the MCP Apps UI extension on initialize and
+    # records/filters `_meta.ui` tools (inert unless the host flag is enabled).
+    mcp_client = UICapableMCPClient(
         lambda: streamablehttp_client(
             gateway_url,
             auth=auth  # httpx Auth class for automatic SigV4 signing
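Reviewer sketch (not part of the patch): the record-then-filter step that `record_and_filter_ui_tools` performs, reduced to plain dicts so it runs without Strands or the MCP SDK. The dict layout, the function name `split_by_visibility`, and the default visibility of `["model", "app"]` when the field is absent are assumptions made for illustration; the real code works on `MCPAgentTool` objects and `ToolUIMetadata`.

```python
from typing import Any, Dict, List, Tuple


def split_by_visibility(
    tools: List[Dict[str, Any]],
) -> Tuple[List[str], Dict[str, Dict[str, Any]]]:
    """Return (names the model may see, catalog of UI metadata by name)."""
    visible: List[str] = []
    catalog: Dict[str, Dict[str, Any]] = {}
    for tool in tools:
        ui = (tool.get("_meta") or {}).get("ui")
        if not isinstance(ui, dict):
            # Ordinary tool with no UI metadata: passes through untouched.
            visible.append(tool["name"])
            continue
        # Every UI tool is recorded, including app-only ones, so a later
        # step can still look up its resource URI.
        catalog[tool["name"]] = ui
        # Assumed default when the server omits `visibility`.
        visibility = ui.get("visibility", ["model", "app"])
        if "model" in visibility:
            visible.append(tool["name"])
    return visible, catalog


tools = [
    {"name": "plain"},
    {"name": "chart", "_meta": {"ui": {"resourceUri": "ui://demo/chart"}}},
    {"name": "refresh", "_meta": {"ui": {"resourceUri": "ui://demo/chart", "visibility": ["app"]}}},
]
visible, catalog = split_by_visibility(tools)
print(visible)
print(sorted(catalog))
```

The point of the split is that filtering and cataloguing are independent: an app-only tool is hidden from the model but stays in the catalog.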
diff --git a/backend/src/agents/main_agent/integrations/mcp_apps.py b/backend/src/agents/main_agent/integrations/mcp_apps.py
new file mode 100644
index 00000000..59ad67f5
--- /dev/null
+++ b/backend/src/agents/main_agent/integrations/mcp_apps.py
@@ -0,0 +1,462 @@
+"""MCP Apps host support — `initialize` extension advertisement + tool-visibility filter.
+
+PR #2 of the MCP Apps host-renderer initiative
+(`docs/kaizen/scoping/mcp-apps-host-renderer.md`). This module is the backend
+surface for the MCP Apps extension (SEP-1865):
+
+1. Advertise `capabilities.extensions["io.modelcontextprotocol/ui"]` on every
+   outbound MCP `initialize` (Gateway + external clients), so servers know we
+   can host their UIs. Unconditional per-server (servers that don't understand
+   the capability ignore it).
+2. Parse `_meta.ui` off `tools/list` responses and retain it in an in-process
+   catalog (`UIToolCatalog`) keyed by agent-facing tool name. Later PRs read
+   `resource_uri` from here to fetch the UI via `resources/read`.
+3. Filter tools whose `_meta.ui.visibility` excludes `"model"` out of the
+   Strands agent's tool list — the model must never see app-only tools — while
+   the full metadata stays in the catalog.
+
+The entire surface is gated by `AGENTCORE_MCP_APPS_HOST_ENABLED` (default
+true since PR #7; set it to `false` to opt an environment back out). When
+the flag is off, no extension is advertised and no tool is filtered or
+recorded — behavior is byte-for-byte unchanged.
+
+Why a `ClientSession` symbol patch: Strands' `MCPClient` constructs the MCP
+SDK `ClientSession` itself inside its background thread and exposes no hook to
+customize the `initialize` capabilities. The SDK hard-codes
+`ClientCapabilities(experimental=None, ...)` with no `extensions`. Subclassing
+`ClientSession` and substituting the single symbol Strands resolves
+(`strands.tools.mcp.mcp_client.ClientSession`) is the minimal, upgrade-robust
+seam: it does not touch the SDK's own `ClientSession`, and the unit test that
+asserts the capability appears on the wire fails loudly if a Strands upgrade
+ever changes how the session is constructed.
+"""
+
+import logging
+import os
+from typing import Any, Dict, List, Optional, Tuple
+
+import mcp.types as mcp_types
+from mcp.client.session import ClientSession
+from strands.tools.mcp import MCPClient
+from strands.types import PaginatedList
+
+from agents.main_agent.config.constants import Defaults, EnvVars
+from apis.shared.tools.models import ToolUIMetadata
+
+logger = logging.getLogger(__name__)
+
+# SEP-1865 wire constants.
+MCP_APPS_UI_EXTENSION_KEY = "io.modelcontextprotocol/ui"
+MCP_APPS_UI_MIME_TYPE = "text/html;profile=mcp-app"
+MCP_APPS_UI_CAPABILITY: dict[str, Any] = {"mimeTypes": [MCP_APPS_UI_MIME_TYPE]}
+
+
+def is_mcp_apps_host_enabled() -> bool:
+    """True when the MCP Apps host surface is enabled via env flag.
+
+    Read on every call (not cached) so the flag can be flipped without a
+    process restart, matching the Gateway flag's pattern.
+    """
+    raw = os.environ.get(
+        EnvVars.MCP_APPS_HOST_ENABLED, str(Defaults.MCP_APPS_HOST_ENABLED)
+    )
+    return raw.strip().lower() == "true"
+
+
+def mcp_apps_sandbox_origin() -> str:
+    """Origin of the sandbox-proxy the SPA frames an MCP App in.
+
+    Read on every call (not cached), same as the host flag. Empty string
+    until the PR #1 mcp-sandbox stack is deployed and the deploy pipeline
+    wires its SSM origin into the env. That is benign: with no proxy origin
+    the SPA cannot frame an App, even with the host flag on. Surfaced to the SPA
+    on the `ui_resource` event so the frontend needs no separate config fetch.
+    """
+    return os.environ.get(
+        EnvVars.MCP_APPS_SANDBOX_ORIGIN, Defaults.MCP_APPS_SANDBOX_ORIGIN
+    ).strip()
+
+
+# =============================================================================
+# In-process UI tool catalog
+# =============================================================================
+
+
+class UIToolCatalog:
+    """Process-global map of agent-facing tool name -> parsed `_meta.ui`.
+
+    This is the "tool catalog for later PRs": PR #3 reads `resource_uri` from
+    here to fetch the UI resource via `resources/read`. Kept in memory because
+    `_meta.ui` is discovered live from the server on every `tools/list`, not
+    admin-configured, and is re-derived each agent build.
+
+    PR #3 also records the MCP client that surfaced each UI tool, alongside
+    its metadata. `record_and_filter_ui_tools` is invoked from within a
+    client's own `list_tools_sync`, so "the server hosting the tool" is just
+    that client — `read_resource_sync` against it is the spec-mandated
+    `resources/read`. The client's session stays alive for the agent's
+    lifetime (Strands holds MCP clients as tool providers), so it is still
+    active when a tool result arrives mid-stream.
+    """
+
+    def __init__(self) -> None:
+        self._by_tool_name: dict[str, ToolUIMetadata] = {}
+        self._client_by_tool_name: dict[str, Any] = {}
+
+    def record(
+        self,
+        tool_name: str,
+        ui_metadata: ToolUIMetadata,
+        client: Optional[Any] = None,
+    ) -> None:
+        self._by_tool_name[tool_name] = ui_metadata
+        if client is not None:
+            self._client_by_tool_name[tool_name] = client
+
+    def get(self, tool_name: str) -> Optional[ToolUIMetadata]:
+        return self._by_tool_name.get(tool_name)
+
+    def get_client(self, tool_name: str) -> Optional[Any]:
+        """The MCP client that surfaced `tool_name`, or None.
+
+        Used by `fetch_ui_resource` to issue `resources/read` against the
+        same server the tool came from (spec MUST: never inline).
+        """
+        return self._client_by_tool_name.get(tool_name)
+
+    def snapshot(self) -> dict[str, ToolUIMetadata]:
+        return dict(self._by_tool_name)
+
+    def clear(self) -> None:
+        self._by_tool_name.clear()
+        self._client_by_tool_name.clear()
+
+
+_ui_tool_catalog: Optional[UIToolCatalog] = None
+
+
+def get_ui_tool_catalog() -> UIToolCatalog:
+    """Get or create the global UIToolCatalog instance."""
+    global _ui_tool_catalog
+    if _ui_tool_catalog is None:
+        _ui_tool_catalog = UIToolCatalog()
+    return _ui_tool_catalog
+
+
+def record_and_filter_ui_tools(
+    tools: List[Any], client: Optional[Any] = None
+) -> List[Any]:
+    """Record `_meta.ui` into the catalog and drop model-invisible tools.
+
+    Given the `MCPAgentTool` list a Strands `MCPClient` produced from a
+    `tools/list`, parse each tool's `_meta.ui`, store it in the catalog (keyed
+    by the agent-facing tool name), and return only the tools the model is
+    allowed to see. Tools with no `_meta.ui` are ordinary tools and pass
+    through untouched.
+
+    `client` is the MCP client whose `list_tools_sync` produced `tools`. It
+    is recorded alongside each UI tool's metadata so PR #3 can issue
+    `resources/read` against the same server the tool came from. It is
+    optional purely so PR #2's catalog tests can call this without a client.
+
+    When the host flag is disabled this is a pure pass-through: nothing is
+    recorded and nothing is filtered.
+    """
+    if not is_mcp_apps_host_enabled():
+        return tools
+
+    catalog = get_ui_tool_catalog()
+    visible: List[Any] = []
+    for tool in tools:
+        mcp_tool = getattr(tool, "mcp_tool", None)
+        meta = getattr(mcp_tool, "meta", None)
+        ui_metadata = ToolUIMetadata.from_meta(meta)
+
+        if ui_metadata is None:
+            visible.append(tool)
+            continue
+
+        tool_name = getattr(tool, "tool_name", None) or getattr(
+            mcp_tool, "name", "<unknown>"
+        )
+        catalog.record(tool_name, ui_metadata, client=client)
+
+        if ui_metadata.visible_to_model():
+            visible.append(tool)
+        else:
+            logger.debug(
+                "filtered app-only MCP tool from model tool list: %s "
+                "(visibility=%s)",
+                tool_name,
+                ui_metadata.visibility,
+            )
+
+    return visible
+
+
+# =============================================================================
+# resources/read fetch path (PR #3)
+# =============================================================================
+
+# Keys an MCP App resource may carry its `_meta.ui` block under. SEP-1865
+# namespaces it as `io.modelcontextprotocol/ui`; PR #2 also accepts the short
+# `ui` alias on tool `_meta`, so honor both on the resource side too.
+_UI_META_KEYS = (MCP_APPS_UI_EXTENSION_KEY, "ui")
+
+
+def _coerce_meta(meta: Any) -> Dict[str, Any]:
+    """Best-effort `_meta` -> dict. Accepts a dict or a pydantic model."""
+    if isinstance(meta, dict):
+        return meta
+    if meta is not None and hasattr(meta, "model_dump"):
+        try:
+            return meta.model_dump(by_alias=True, exclude_none=True)
+        except Exception:
+            return {}
+    return {}
+
+
+def _ui_block(meta: Any) -> Dict[str, Any]:
+    """Extract the MCP Apps `ui` block from a `_meta` dict, or {}."""
+    data = _coerce_meta(meta)
+    for key in _UI_META_KEYS:
+        block = data.get(key)
+        if isinstance(block, dict):
+            return block
+    return {}
+
+
+def _extract_html_content(result: Any) -> Tuple[Optional[str], str]:
+    """Pick the HTML body + MIME type out of a `resources/read` result.
+
+    Prefers the spec MIME type (`text/html;profile=mcp-app`), then any
+    `text/html*` text content, then untyped text (the tool already declared
+    a `ui://` resource, so an inline body with no MIME is treated as the
+    app). An explicit non-HTML MIME (`text/plain`, `application/json`, …) is
+    rejected — we never pass a non-app body off as the app. Returns
+    `(None, "")` when nothing usable is present (e.g. a blob-only resource);
+    the caller then emits nothing.
+    """
+    contents = getattr(result, "contents", None) or []
+    html_fallback: Optional[Tuple[str, str]] = None
+    untyped_fallback: Optional[Tuple[str, str]] = None
+
+    for item in contents:
+        text = getattr(item, "text", None)
+        if not isinstance(text, str):
+            continue
+        mime = getattr(item, "mimeType", None) or ""
+        if mime == MCP_APPS_UI_MIME_TYPE:
+            return text, mime
+        if html_fallback is None and mime.startswith("text/html"):
+            html_fallback = (text, mime)
+        elif untyped_fallback is None and not mime:
+            untyped_fallback = (text, MCP_APPS_UI_MIME_TYPE)
+
+    chosen = html_fallback or untyped_fallback
+    if chosen is None:
+        return None, ""
+    return chosen[0], chosen[1]
+
+
+def _extract_csp_permissions(
+    result: Any, ui_metadata: ToolUIMetadata
+) -> Tuple[Dict[str, Any], Dict[str, Any]]:
+    """Resolve `csp` / `permissions` for the `ui_resource` event.
+
+    The spec declares these on the resource's `_meta.ui` (per-content first,
+    then the result-level `_meta`). We fall back to the tool-level `_meta.ui`,
+    which PR #2 retained verbatim in `ToolUIMetadata.raw`, so a server that
+    declares them only on `tools/list` still works. PR #3 passes them through
+    opaquely; building the actual CSP (deny-by-default) is the frontend's job (PR #4).
+
+    Both are objects per SEP-1865: `csp` is `McpUiResourceCsp`
+    (`{connectDomains, resourceDomains, frameDomains, baseUriDomains}`) and
+    `permissions` is `{camera?, microphone?, geolocation?, clipboardWrite?}`
+    (each an empty object when requested) — NOT a list. The sandbox proxy
+    maps the permission keys onto the inner iframe's `allow` attribute.
+    """
+    sources: List[Dict[str, Any]] = []
+    for item in getattr(result, "contents", None) or []:
+        block = _ui_block(getattr(item, "meta", None))
+        if block:
+            sources.append(block)
+    result_block = _ui_block(getattr(result, "meta", None))
+    if result_block:
+        sources.append(result_block)
+    sources.append(ui_metadata.raw or {})
+
+    csp: Dict[str, Any] = {}
+    permissions: Dict[str, Any] = {}
+    for block in sources:
+        if not csp and isinstance(block.get("csp"), dict):
+            csp = block["csp"]
+        if not permissions and isinstance(block.get("permissions"), dict):
+            permissions = block["permissions"]
+    return csp, permissions
+
+
+def fetch_ui_resource(
+    tool_name: str, tool_use_id: str
+) -> Optional[Dict[str, Any]]:
+    """Fetch a tool's MCP App UI resource and build the `ui_resource` payload.
+
+    Looks up `tool_name` in the catalog PR #2 populates; if it carries a
+    `ui://` `resourceUri`, issues `resources/read` against the same MCP
+    client that surfaced the tool (spec MUST: fetch via `resources/read`,
+    never inline from the server's perspective) and returns the SSE payload
+    `{type, toolUseId, resourceUri, html, mimeType, csp, permissions}` with
+    the HTML inlined so the frontend needs no MCP client of its own.
+
+    Best-effort, and fully inert when `AGENTCORE_MCP_APPS_HOST_ENABLED` is
+    false. Returns None when the flag is off, the tool has no UI resource, no
+    hosting client was recorded, the session is inactive, the fetch fails, or
+    the body has no inline HTML. Never raises into the stream.
+    """
+    if not is_mcp_apps_host_enabled():
+        return None
+
+    catalog = get_ui_tool_catalog()
+    ui_metadata = catalog.get(tool_name)
+    if ui_metadata is None or not ui_metadata.resource_uri:
+        return None
+
+    client = catalog.get_client(tool_name)
+    if client is None:
+        logger.warning(
+            "MCP Apps: tool %s has resourceUri %s but no hosting client "
+            "was recorded; cannot issue resources/read",
+            tool_name,
+            ui_metadata.resource_uri,
+        )
+        return None
+
+    try:
+        result = client.read_resource_sync(ui_metadata.resource_uri)
+    except Exception:
+        logger.warning(
+            "MCP Apps: resources/read failed for %s (%s); emitting no "
+            "ui_resource event",
+            tool_name,
+            ui_metadata.resource_uri,
+            exc_info=True,
+        )
+        return None
+
+    html, mime_type = _extract_html_content(result)
+    if html is None:
+        logger.warning(
+            "MCP Apps: resources/read for %s (%s) returned no inline HTML; "
+            "emitting no ui_resource event",
+            tool_name,
+            ui_metadata.resource_uri,
+        )
+        return None
+
+    csp, permissions = _extract_csp_permissions(result, ui_metadata)
+    return {
+        "type": "ui_resource",
+        "toolUseId": tool_use_id,
+        "resourceUri": ui_metadata.resource_uri,
+        "html": html,
+        "mimeType": mime_type or MCP_APPS_UI_MIME_TYPE,
+        "csp": csp,
+        "permissions": permissions,
+        # Origin the SPA frames the sandbox-proxy at (PR #1's proxy.html).
+        # Carried on the event so the frontend needs no separate config
+        # fetch. Empty until the mcp-sandbox stack is deployed + wired.
+        "sandboxOrigin": mcp_apps_sandbox_origin(),
+    }
+
+
+# =============================================================================
+# initialize() extension advertisement
+# =============================================================================
+
+
+class _UIExtensionClientSession(ClientSession):
+    """`ClientSession` that advertises the MCP Apps UI extension on `initialize`.
+
+    Drop-in for the SDK `ClientSession` — identical constructor, identical
+    behavior, except that the outbound `InitializeRequest` gets
+    `capabilities.extensions["io.modelcontextprotocol/ui"]` added when the
+    host flag is enabled. We augment in `send_request` rather than reimplement
+    `initialize()` so we inherit whatever capabilities the SDK computes
+    (sampling/elicitation/roots/tasks) and stay robust to SDK changes.
+    """
+
+    async def send_request(self, request: Any, *args: Any, **kwargs: Any) -> Any:
+        if is_mcp_apps_host_enabled():
+            try:
+                root = getattr(request, "root", None)
+                if isinstance(root, mcp_types.InitializeRequest):
+                    caps = root.params.capabilities
+                    caps_data = caps.model_dump(by_alias=True, exclude_none=True)
+                    extensions = dict(caps_data.get("extensions") or {})
+                    extensions.setdefault(
+                        MCP_APPS_UI_EXTENSION_KEY, dict(MCP_APPS_UI_CAPABILITY)
+                    )
+                    caps_data["extensions"] = extensions
+                    # `ClientCapabilities` is `extra="allow"`, so the extra
+                    # `extensions` key round-trips through model_dump and onto
+                    # the JSON-RPC wire in BaseSession.send_request.
+                    root.params.capabilities = mcp_types.ClientCapabilities(
+                        **caps_data
+                    )
+            except Exception:
+                # Advertising the extension must never break a connection;
+                # a server that never sees it simply won't return MCP Apps.
+                logger.warning(
+                    "failed to advertise MCP Apps UI extension on initialize; "
+                    "continuing without it",
+                    exc_info=True,
+                )
+
+        return await super().send_request(request, *args, **kwargs)
+
+
+def ensure_ui_extension_session_patch() -> None:
+    """Idempotently make Strands' MCP client construct `_UIExtensionClientSession`.
+
+    Substitutes the single `ClientSession` symbol that
+    `strands.tools.mcp.mcp_client` resolves when it builds a session. The MCP
+    SDK's own `mcp.ClientSession` is left untouched. Safe to leave installed
+    permanently: the subclass only augments `initialize` when the host flag is
+    on, so with the flag off it is behaviorally identical to the SDK class.
+    """
+    import strands.tools.mcp.mcp_client as strands_mcp_client_mod
+
+    if strands_mcp_client_mod.ClientSession is _UIExtensionClientSession:
+        return
+
+    strands_mcp_client_mod.ClientSession = _UIExtensionClientSession
+    logger.info(
+        "MCP Apps: patched strands MCP client to advertise the "
+        "'%s' extension on initialize",
+        MCP_APPS_UI_EXTENSION_KEY,
+    )
+
+
+# =============================================================================
+# UI-capable MCP client
+# =============================================================================
+
+
+class UICapableMCPClient(MCPClient):
+    """`MCPClient` that records `_meta.ui` and hides app-only tools.
+
+    Used for external MCP servers. Construction installs the `initialize`
+    extension patch so this client's session advertises the UI capability.
+    `list_tools_sync` is the seam Strands calls to build the model's tool
+    list, so filtering here guarantees the model never sees app-only tools
+    while the full metadata is retained in the catalog.
+    """
+
+    def __init__(self, *args: Any, **kwargs: Any) -> None:
+        ensure_ui_extension_session_patch()
+        super().__init__(*args, **kwargs)
+
+    def list_tools_sync(self, *args: Any, **kwargs: Any) -> PaginatedList:
+        result = super().list_tools_sync(*args, **kwargs)
+        filtered = record_and_filter_ui_tools(list(result), client=self)
+        return PaginatedList(filtered, token=result.pagination_token)
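Reviewer sketch (not part of the patch): the MIME preference order that `_extract_html_content` documents above can be exercised in isolation. This is a minimal, standalone approximation, assuming the spec MIME string quoted in the docstring (`text/html;profile=mcp-app`); `APP_MIME` and `pick_html` are hypothetical names used only for illustration, and the content items are stand-ins built with `SimpleNamespace` rather than real SDK objects.

```python
from types import SimpleNamespace
from typing import Any, List, Optional, Tuple

# Assumption: value copied from the docstring; the real constant lives in the module.
APP_MIME = "text/html;profile=mcp-app"


def pick_html(contents: List[Any]) -> Tuple[Optional[str], str]:
    """Spec MIME wins outright, then the first text/html*, then the first untyped text."""
    html_fallback: Optional[Tuple[str, str]] = None
    untyped_fallback: Optional[Tuple[str, str]] = None
    for item in contents:
        text = getattr(item, "text", None)
        if not isinstance(text, str):
            continue  # blob-only entries carry no inline body
        mime = getattr(item, "mimeType", None) or ""
        if mime == APP_MIME:
            return text, mime
        if html_fallback is None and mime.startswith("text/html"):
            html_fallback = (text, mime)
        elif untyped_fallback is None and not mime:
            # An untyped body is treated as the app and labeled with the spec MIME.
            untyped_fallback = (text, APP_MIME)
    return html_fallback or untyped_fallback or (None, "")


contents = [
    SimpleNamespace(text="{}", mimeType="application/json"),
    SimpleNamespace(text="<p>plain</p>", mimeType="text/html"),
    SimpleNamespace(text="<p>app</p>", mimeType=APP_MIME),
]
best = pick_html(contents)
```

Ordering in the list does not matter for the spec MIME type: it is returned as soon as it is seen, while an explicit non-HTML type such as `application/json` is never promoted to a fallback.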
diff --git a/backend/src/agents/main_agent/multimodal/prompt_builder.py b/backend/src/agents/main_agent/multimodal/prompt_builder.py
index 97a4b3ff..2b254aaa 100644
--- a/backend/src/agents/main_agent/multimodal/prompt_builder.py
+++ b/backend/src/agents/main_agent/multimodal/prompt_builder.py
@@ -3,7 +3,7 @@
 """
 import logging
 import base64
-from typing import List, Optional, Union, Dict, Any
+from typing import List, Optional, Union, Dict, Any, Set
 from agents.main_agent.multimodal.image_handler import ImageHandler
 from agents.main_agent.multimodal.document_handler import DocumentHandler
 from agents.main_agent.multimodal.file_sanitizer import FileSanitizer
@@ -51,20 +51,27 @@ def build_prompt(
         else:
             content_blocks.append({"text": message})
 
+        # Track sanitized document names used in this turn to prevent
+        # Bedrock ValidationException: "Messages can't contain duplicate document names"
+        used_document_names: Set[str] = set()
+
         # Add each file as appropriate ContentBlock
         for file in files:
-            content_block = self._process_file(file)
+            content_block = self._process_file(file, used_document_names)
             if content_block:
                 content_blocks.append(content_block)
 
         return content_blocks
 
-    def _process_file(self, file: Any) -> Optional[Dict[str, Any]]:
+    def _process_file(self, file: Any, used_document_names: Optional[Set[str]] = None) -> Optional[Dict[str, Any]]:
         """
         Process a single file and create appropriate ContentBlock
 
         Args:
             file: FileContent object with content_type, filename, and base64 bytes
+            used_document_names: Set of already-used document names in this turn.
+                When provided, duplicate document names are made unique by appending
+                a counter suffix to prevent Bedrock ValidationException.
 
         Returns:
             dict: ContentBlock or None if unsupported
@@ -88,6 +95,15 @@ def _process_file(self, file: Any) -> Optional[Dict[str, Any]]:
             # Sanitize filename for Bedrock
             sanitized_name = self.file_sanitizer.sanitize_filename(file.filename)
 
+            # Deduplicate document names within this turn. Bedrock rejects
+            # requests where two document blocks share the same name, even
+            # across different messages; this set covers only the current
+            # turn's attachments. When two files sanitize to the same name,
+            # append a numeric suffix to make the later one unique.
+            if used_document_names is not None:
+                sanitized_name = self._unique_document_name(sanitized_name, used_document_names)
+                used_document_names.add(sanitized_name)
+
             return self.document_handler.create_content_block(
                 file_bytes=file_bytes,
                 filename=filename,
@@ -98,6 +114,23 @@ def _process_file(self, file: Any) -> Optional[Dict[str, Any]]:
             logger.warning(f"Unsupported file type: {filename} ({content_type})")
             return None
 
+    @staticmethod
+    def _unique_document_name(name: str, used_names: Set[str]) -> str:
+        """Return a name that is not already in used_names.
+
+        If ``name`` is already taken, appends ``-2``, ``-3``, … until a free
+        slot is found. A hyphen is used because Bedrock document names may not
+        contain underscores; names stay deterministic and human-readable.
+        """
+        if name not in used_names:
+            return name
+        counter = 2
+        while True:
+            candidate = f"{name}-{counter}"
+            if candidate not in used_names:
+                return candidate
+            counter += 1
+
     def get_content_type_summary(self, prompt: Union[str, List[Dict[str, Any]]]) -> str:
         """
         Get a summary of content types in the prompt
diff --git a/backend/src/agents/main_agent/session/turn_based_session_manager.py b/backend/src/agents/main_agent/session/turn_based_session_manager.py
index e15e005e..d13a34cf 100644
--- a/backend/src/agents/main_agent/session/turn_based_session_manager.py
+++ b/backend/src/agents/main_agent/session/turn_based_session_manager.py
@@ -157,6 +157,29 @@ def initialize(self, agent: "Agent", **kwargs: Any) -> None:
         self._total_message_count_at_init = len(agent.messages)
         self.message_count = self._total_message_count_at_init
 
+        # Slice 0 diagnostic: confirm what restored history a turn actually
+        # sees at init. Decisive for the max_tokens "Continue" flow — on the
+        # continuation turn the restored tail must be the truncated assistant
+        # message for the model to resume rather than restart. One concise
+        # line per init (init is not on the hot path).
+        try:
+            _msgs = agent.messages or []
+            if _msgs:
+                _last = _msgs[-1]
+                _last_role = _last.get("role")
+                _last_text = ""
+                for _blk in _last.get("content", []) or []:
+                    if isinstance(_blk, dict) and isinstance(_blk.get("text"), str):
+                        _last_text += _blk["text"]
+                logger.info(
+                    "Restore @init: %d message(s); last role=%s, last text len=%d",
+                    len(_msgs), _last_role, len(_last_text),
+                )
+            else:
+                logger.info("Restore @init: 0 messages (new or empty-at-init session)")
+        except Exception:
+            logger.debug("Restore @init: diagnostic log failed", exc_info=True)
+
         # Initialize compaction defaults
         self.compaction_state = CompactionState()
         self._valid_cutoff_indices = []
@@ -173,6 +196,23 @@ def initialize(self, agent: "Agent", **kwargs: Any) -> None:
         if not agent.messages:
             return
 
+        # Strip document bytes from history unconditionally — regardless of
+        # whether compaction is enabled. Document content blocks with inline
+        # bytes must never survive in restored history because Bedrock rejects
+        # any request where two document blocks share the same sanitized name
+        # across the conversation (ValidationException: "Messages can't contain
+        # duplicate document names"). This can happen on what feels like a
+        # "first turn" when the user returns to an existing session URL and
+        # re-attaches a file with the same name as one from a prior visit.
+        # The [Attached files: …] text marker already in the user message
+        # preserves the reference for the model without re-sending bytes.
+        # Images are handled the same way inside _truncate_tool_contents, but
+        # that method is gated on compaction being enabled — this one is not.
+        try:
+            agent.messages = self._strip_document_bytes(agent.messages)
+        except Exception as e:
+            logger.warning(f"Document byte stripping failed, continuing: {e}", exc_info=True)
+
         if not self.compaction_config or not self.compaction_config.enabled:
             return
 
@@ -698,6 +738,52 @@ def _truncate_text(self, text: str, max_length: int) -> str:
             return text
         return text[:max_length] + f"\n... [truncated, {len(text) - max_length} chars removed]"
 
+    def _strip_document_bytes(self, messages: List[Dict]) -> List[Dict]:
+        """Replace document content blocks' inline bytes with a text placeholder.
+
+        Called unconditionally on every session restore — independent of whether
+        compaction is enabled. Document blocks with ``source.bytes`` must never
+        survive in restored history because Bedrock rejects any request where two
+        document blocks share the same sanitized name across the conversation
+        (ValidationException: "Messages can't contain duplicate document names").
+
+        Images are handled the same way inside ``_truncate_tool_contents``, but
+        that method is gated on compaction being enabled. This one is not.
+
+        The ``[Attached files: …]`` text marker already present in the user
+        message preserves the reference for the model without re-sending bytes.
+        """
+        stripped_messages = copy.deepcopy(messages)
+        strip_count = 0
+
+        for msg in stripped_messages:
+            content = msg.get("content", [])
+            if not isinstance(content, list):
+                continue
+            for block_idx, block in enumerate(content):
+                if not isinstance(block, dict) or "document" not in block:
+                    continue
+                doc_data = block["document"]
+                source = doc_data.get("source", {})
+                # Only replace blocks that carry inline bytes — s3Location
+                # blocks (enhancement #401) have no bytes to strip and are
+                # safe to leave as-is since they don't accumulate in history.
+                if "bytes" not in source:
+                    continue
+                doc_name = doc_data.get("name", "unknown")
+                doc_format = doc_data.get("format", "unknown")
+                original_bytes = source.get("bytes", b"")
+                original_size = len(original_bytes) if isinstance(original_bytes, bytes) else 0
+                content[block_idx] = {
+                    "text": f"[Document placeholder: name={doc_name}, format={doc_format}, original_size={original_size} bytes]"
+                }
+                strip_count += 1
+
+        if strip_count > 0:
+            logger.debug(f"Stripped inline bytes from {strip_count} document block(s) in history")
+
+        return stripped_messages
+
     def _truncate_tool_contents(
         self,
         messages: List[Dict],
@@ -706,6 +792,10 @@ def _truncate_tool_contents(
         """
         Stage 1 Compaction: Truncate long tool inputs/results and replace images.
 
+        Note: document block byte-stripping is handled unconditionally by
+        ``_strip_document_bytes`` (called from ``initialize``) and is therefore
+        not repeated here.
+
         Returns:
             Tuple of (modified_messages, truncation_count, chars_saved)
         """
diff --git a/backend/src/agents/main_agent/streaming/stream_coordinator.py b/backend/src/agents/main_agent/streaming/stream_coordinator.py
index 01a4b667..5cdf2f9a 100644
--- a/backend/src/agents/main_agent/streaming/stream_coordinator.py
+++ b/backend/src/agents/main_agent/streaming/stream_coordinator.py
@@ -66,8 +66,26 @@ async def stream_response(
 
         # Track timing for latency metrics
         stream_start_time = time.time()
+        # Wall-clock turn start as a tz-aware datetime. Used post-turn to
+        # tell which artifacts (HEAD.updated_at) were touched *this* turn
+        # vs. carried over from earlier turns in the same session.
+        turn_start_dt = datetime.now(timezone.utc)
         first_token_time: Optional[float] = None
 
+        # Set when a create_artifact / update_artifact tool call is seen
+        # this turn — the only turns that need the post-turn artifacts
+        # query + `artifact` SSE emit. Normal turns pay nothing.
+        artifact_tool_invoked = False
+
+        # MCP Apps (PR #3): toolUseId -> tool name, learned from tool_use /
+        # content_block_start events so a later tool_result can be matched
+        # back to its catalog `_meta.ui`. `ui_resource_emitted` dedupes the
+        # `ui_resource` SSE per toolUseId (a tool result can surface twice —
+        # once via the lifecycle path, once via the tool path). Both stay
+        # empty and unused unless AGENTCORE_MCP_APPS_HOST_ENABLED=true.
+        ui_tool_use_names: Dict[str, str] = {}
+        ui_resource_emitted: set[str] = set()
+
         # Accumulate metadata from stream
         accumulated_metadata: Dict[str, Any] = {"usage": {}, "metrics": {}}
 
@@ -84,6 +102,30 @@ async def stream_response(
         initial_message_count = self._get_initial_message_count(session_manager)
         logger.info(f"📊 Initial message count before streaming: {initial_message_count}")
 
+        # MCP Apps PR #5: subscribe this conversation stream to the
+        # app-initiated tool-event broker so a `tools/call` proxied from an
+        # embedded MCP App surfaces as a tool_use/tool_result card in the live
+        # thread; events buffered while no stream was active flush here too.
+        # Inert and zero-cost unless the host flag is on. The subscriber is
+        # removed in the method-level `finally` so a dropped stream can't leak.
+        app_event_queue = None
+        try:
+            from agents.main_agent.integrations.mcp_apps import (
+                is_mcp_apps_host_enabled,
+            )
+
+            if is_mcp_apps_host_enabled():
+                from apis.shared.mcp_apps.broker import (
+                    get_app_tool_event_broker,
+                )
+
+                app_event_queue = get_app_tool_event_broker().add_subscriber(
+                    session_id
+                )
+        except Exception as e:  # noqa: BLE001 - never block the stream
+            logger.warning("MCP Apps broker subscribe failed: %s", e)
+            app_event_queue = None
+
         try:
             # Get raw agent stream
             agent_stream = agent.stream_async(prompt)
@@ -126,6 +168,37 @@ async def stream_response(
                                 if current_assistant_message_index == 0 and first_token_time is None:
                                     first_token_time = per_message_metadata[0]["first_token_time"]
 
+                # Note whether the agent invoked an artifact authoring
+                # tool this turn. Gates the post-turn artifacts query so
+                # only artifact turns pay for it.
+                if not artifact_tool_invoked and event.get("type") == "tool_use":
+                    tool_name = (
+                        event.get("data", {}).get("tool_use", {}).get("name")
+                    )
+                    if tool_name in ("create_artifact", "update_artifact"):
+                        artifact_tool_invoked = True
+
+                # MCP Apps (PR #3): remember toolUseId -> tool name so a
+                # later tool_result can be matched to its catalog `_meta.ui`.
+                # Captured from both event flavors that carry the pairing:
+                # the `tool_use` event (data.tool_use) and the
+                # `content_block_start` of a tool-use block (data.toolUse).
+                etype = event.get("type")
+                if etype == "tool_use":
+                    td = event.get("data", {}).get("tool_use", {})
+                    tn = td.get("name")
+                    tuid = td.get("tool_use_id") or td.get("toolUseId")
+                    if tn and tuid:
+                        ui_tool_use_names[tuid] = tn
+                elif etype == "content_block_start":
+                    bd = event.get("data", {})
+                    if bd.get("type") == "tool_use":
+                        tu = bd.get("toolUse", {})
+                        tn = tu.get("name")
+                        tuid = tu.get("toolUseId") or tu.get("tool_use_id")
+                        if tn and tuid:
+                            ui_tool_use_names[tuid] = tn
+
                 # Track when assistant messages end
                 if event.get("type") == "message_stop":
                     if current_assistant_message_index >= 0 and current_assistant_message_index < len(per_message_metadata):
@@ -180,13 +253,26 @@ async def stream_response(
                     if "metrics" in event_data:
                         accumulated_metadata["metrics"].update(event_data["metrics"])
 
-                # Collect metadata_summary event (don't send to client as-is)
+                # Collect metadata_summary event (don't send to client as-is).
+                #
+                # NOTE: metadata_summary carries Strands' EventLoopMetrics
+                # `accumulated_usage`, which sums each LLM call's full
+                # context-size across the turn (and across the agent's
+                # whole lifetime, per Strands' docs). For a 2-call tool
+                # turn with call_1.input=1000 and call_2.input=2500,
+                # accumulated_usage.inputTokens=3500 — but the *current*
+                # context occupancy is 2500, not 3500. We deliberately do
+                # NOT update accumulated_metadata["usage"] / ["metrics"]
+                # from this event: stream_coordinator's accumulated_metadata
+                # drives (a) the final SSE `usage` the frontend uses for
+                # the context-% badge and (b) the compaction trigger —
+                # both want "current context size", which the per-call
+                # `metadata` events already provide via last-write-wins
+                # `.update()`. Per-message cost attribution rides
+                # per_message_metadata (per-call) and is unaffected.
+                # We only keep the first_token_time backstop.
                 if event.get("type") == "metadata_summary":
                     event_data = event.get("data", {})
-                    if "usage" in event_data:
-                        accumulated_metadata["usage"].update(event_data["usage"])
-                    if "metrics" in event_data:
-                        accumulated_metadata["metrics"].update(event_data["metrics"])
                     if "first_token_time" in event_data:
                         first_token_time = event_data["first_token_time"]
                         # Associate first_token_time with first assistant message if we have one
@@ -327,6 +413,16 @@ async def stream_response(
                     # checkpoint advanced on this turn, emit a `compaction` SSE
                     # so the frontend can place an inline "earlier messages
                     # summarized" divider. Fires after metadata, before done.
+                    #
+                    # CAUTION: do NOT replace this with Strands'
+                    # AgentResult.context_size / EventLoopMetrics.latest_context_size.
+                    # Both return ONLY `inputTokens` from the last cycle —
+                    # under Bedrock prompt caching that's the uncached
+                    # suffix only, so a 50k-token fully-cached context
+                    # reports ~50 (inputTokens) and hides ~49,950 in
+                    # cacheReadInputTokens. Summing all three buckets
+                    # below is the only correct "current context size"
+                    # under caching.
                     if hasattr(session_manager, "update_after_turn"):
                         usage = accumulated_metadata.get("usage", {})
                         total_input_tokens = (
@@ -354,6 +450,34 @@ async def stream_response(
                             except Exception as e:
                                 logger.warning(f"Failed to update compaction state: {e}")
 
+                # Emit one `artifact` SSE per artifact created/updated this
+                # turn. Placed after the compaction emit (so it lands with
+                # the other post-message_stop side-channel events) and
+                # before `done`. Best-effort: a lookup failure logs and is
+                # swallowed so it never breaks the live stream.
+                if event.get("type") == "done" and artifact_tool_invoked:
+                    # Anchor every artifact touched this turn to the turn's
+                    # final assistant message. `done` lands after the last
+                    # `message_stop`, so current_assistant_message_index is
+                    # final here; this is the same odd-position index the
+                    # post-loop block uses for per-message metadata
+                    # (assistant_message_ids[-1]), which the messages
+                    # endpoint re-derives as `idx` on reload.
+                    produced_by_message_index = (
+                        initial_message_count
+                        + 2 * current_assistant_message_index
+                        + 1
+                        if current_assistant_message_index >= 0
+                        else None
+                    )
+                    for sse in await self._extract_artifact_events(
+                        session_id=session_id,
+                        user_id=user_id,
+                        turn_start=turn_start_dt,
+                        produced_by_message_index=produced_by_message_index,
+                    ):
+                        yield sse
+
                 # Intercept legacy "error" events from stream_processor and convert to conversational format
                 # This ensures errors appear as assistant messages in the chat UI
                 if event.get("type") == "error":
@@ -376,42 +500,86 @@ async def stream_response(
                         code=error_code, error=synthetic_error, session_id=session_id, recoverable=error_data.get("recoverable", False)
                     )
 
-                    # Emit message events so error appears in chat
-                    yield f'event: message_start\ndata: {{"role": "assistant"}}\n\n'
-                    yield f'event: content_block_start\ndata: {{"contentBlockIndex": 0, "type": "text"}}\n\n'
-                    yield f"event: content_block_delta\ndata: {json.dumps({'contentBlockIndex': 0, 'type': 'text', 'text': conv_error_event.message})}\n\n"
-                    yield f'event: content_block_stop\ndata: {{"contentBlockIndex": 0}}\n\n'
-                    yield f'event: message_stop\ndata: {{"stopReason": "error"}}\n\n'
-                    yield conv_error_event.to_sse_format()
-                    yield "event: done\ndata: {}\n\n"
-
-                    # Persist error messages to session
-                    try:
-                        from strands.types.content import Message
-                        from strands.types.session import SessionMessage
+                    if error_code == ErrorCode.MAX_TOKENS:
+                        # No verbose assistant bubble for truncation. The model
+                        # stream already emitted its own message_stop
+                        # (stopReason max_tokens) for the partial, so do NOT
+                        # emit a second synthetic message_stop here — a
+                        # duplicate with no active builder flips the client
+                        # parser into an error state and drops the
+                        # stream_error below. Just emit the stream_error
+                        # signal (frontend shows the inline "response length
+                        # limit reached" notice + Continue on the partial) and
+                        # done; `done` finalizes any still-open builder.
+                        yield conv_error_event.to_sse_format()
+                        # Durable marker so the Continue affordance survives a
+                        # page refresh (the partial itself is already in
+                        # AgentCore Memory). Best-effort; never blocks the
+                        # stream. Cleared at the start of the next non-resume
+                        # turn (see invocations route).
+                        try:
+                            from apis.shared.sessions.metadata import set_truncated_turn
+                            await set_truncated_turn(session_id, user_id)
+                        except Exception as marker_err:
+                            logger.error(
+                                "max_tokens: failed to persist truncated_turn marker for session %s: %s",
+                                session_id, marker_err, exc_info=True,
+                            )
+                        yield "event: done\ndata: {}\n\n"
+                    else:
+                        # Other errors still surface as a conversational
+                        # assistant message in the chat.
+                        yield f'event: message_start\ndata: {{"role": "assistant"}}\n\n'
+                        yield f'event: content_block_start\ndata: {{"contentBlockIndex": 0, "type": "text"}}\n\n'
+                        yield f"event: content_block_delta\ndata: {json.dumps({'contentBlockIndex': 0, 'type': 'text', 'text': conv_error_event.message})}\n\n"
+                        yield f'event: content_block_stop\ndata: {{"contentBlockIndex": 0}}\n\n'
+                        yield f'event: message_stop\ndata: {{"stopReason": "error"}}\n\n'
+                        yield conv_error_event.to_sse_format()
+                        yield "event: done\ndata: {}\n\n"
+
+                    # Persist error messages to session.
+                    #
+                    # SKIP for max_tokens: Strands already appended the recovered
+                    # partial assistant turn to agent.messages and the
+                    # MessageAddedEvent hook persisted it to AgentCore Memory
+                    # before the exception propagated; the user turn was
+                    # persisted at turn start by the normal hook. Re-persisting
+                    # here would duplicate the user turn and add a SECOND
+                    # consecutive assistant message, breaking Bedrock role
+                    # alternation for the follow-up "Continue" turn. The error
+                    # explanation stays a live-only UI affordance for this turn.
+                    if error_code == ErrorCode.MAX_TOKENS:
+                        logger.info(
+                            f"max_tokens: skipping error re-persist for session {session_id} "
+                            "(Strands already committed the recovered partial turn)"
+                        )
+                    else:
+                        try:
+                            from strands.types.content import Message
+                            from strands.types.session import SessionMessage
 
-                        from agents.main_agent.session.session_factory import SessionFactory
+                            from agents.main_agent.session.session_factory import SessionFactory
 
-                        persist_session_manager = SessionFactory.create_session_manager(session_id=session_id, user_id=user_id, caching_enabled=False)
+                            persist_session_manager = SessionFactory.create_session_manager(session_id=session_id, user_id=user_id, caching_enabled=False)
 
-                        # Extract user text from prompt (can be string or ContentBlock list)
-                        if isinstance(prompt, str):
-                            user_text = prompt
-                        else:
-                            # Extract text from ContentBlock list
-                            user_text = " ".join(block.get("text", "") for block in prompt if isinstance(block, dict) and "text" in block)
+                            # Extract user text from prompt (can be string or ContentBlock list)
+                            if isinstance(prompt, str):
+                                user_text = prompt
+                            else:
+                                # Extract text from ContentBlock list
+                                user_text = " ".join(block.get("text", "") for block in prompt if isinstance(block, dict) and "text" in block)
 
-                        user_msg: Message = {"role": "user", "content": [{"text": user_text}]}
-                        assistant_msg: Message = {"role": "assistant", "content": [{"text": conv_error_event.message}]}
+                            user_msg: Message = {"role": "user", "content": [{"text": user_text}]}
+                            assistant_msg: Message = {"role": "assistant", "content": [{"text": conv_error_event.message}]}
 
-                        if hasattr(persist_session_manager, "base_manager") and hasattr(persist_session_manager.base_manager, "create_message"):
-                            user_session_msg = SessionMessage.from_message(user_msg, 0)
-                            assistant_session_msg = SessionMessage.from_message(assistant_msg, 1)
-                            persist_session_manager.base_manager.create_message(session_id, "default", user_session_msg)
-                            persist_session_manager.base_manager.create_message(session_id, "default", assistant_session_msg)
-                            logger.info(f"💾 Saved intercepted error messages to session {session_id}")
-                    except Exception as persist_error:
-                        logger.error(f"Failed to persist intercepted error to session: {persist_error}")
+                            if hasattr(persist_session_manager, "base_manager") and hasattr(persist_session_manager.base_manager, "create_message"):
+                                user_session_msg = SessionMessage.from_message(user_msg, 0)
+                                assistant_session_msg = SessionMessage.from_message(assistant_msg, 1)
+                                persist_session_manager.base_manager.create_message(session_id, "default", user_session_msg)
+                                persist_session_manager.base_manager.create_message(session_id, "default", assistant_session_msg)
+                                logger.info(f"💾 Saved intercepted error messages to session {session_id}")
+                        except Exception as persist_error:
+                            logger.error(f"Failed to persist intercepted error to session: {persist_error}")
 
                     # Skip the original error event and exit the loop - we've handled the error
                     return
@@ -420,6 +588,34 @@ async def stream_response(
                 sse_event = self._format_sse_event(event)
                 yield sse_event
 
+                # MCP Apps PR #5: interleave any app-initiated tool events
+                # (a `tools/call` proxied from an embedded App, dispatched
+                # out-of-band on /mcp-apps/proxy-call) into the live thread.
+                # Non-blocking drain — never waits on the agent stream.
+                if app_event_queue is not None:
+                    from apis.shared.mcp_apps.broker import (
+                        get_app_tool_event_broker,
+                    )
+
+                    for app_ev in get_app_tool_event_broker().drain(
+                        app_event_queue
+                    ):
+                        yield self._format_sse_event(app_ev)
+
+                # MCP Apps (PR #3): if this tool_result belongs to a
+                # UI-bearing tool, fetch its `ui://` resource via
+                # `resources/read` and emit a `ui_resource` SSE right after
+                # the tool_result it correlates to (toolUseId ties them).
+                # Inert + zero-cost unless the host flag is on; best-effort
+                # so a fetch failure never breaks the live stream.
+                if event.get("type") == "tool_result":
+                    for sse in await self._extract_ui_resource_events(
+                        event,
+                        ui_tool_use_names,
+                        ui_resource_emitted,
+                    ):
+                        yield sse
+
             # Calculate end-to-end latency (fallback if done event wasn't received)
             stream_end_time = time.time()
 
@@ -622,6 +818,23 @@ async def stream_response(
                     logger.info(f"💾 Saved stream error messages to session {session_id}")
             except Exception as persist_error:
                 logger.error(f"Failed to persist stream error to session: {persist_error}")
+        finally:
+            # MCP Apps PR #5: always release the broker subscription —
+            # covers normal completion, the in-loop error `return`, and
+            # the except path, so a dropped stream never leaks a queue.
+            if app_event_queue is not None:
+                try:
+                    from apis.shared.mcp_apps.broker import (
+                        get_app_tool_event_broker,
+                    )
+
+                    get_app_tool_event_broker().remove_subscriber(
+                        session_id, app_event_queue
+                    )
+                except Exception:  # noqa: BLE001
+                    logger.warning(
+                        "MCP Apps broker unsubscribe failed", exc_info=True
+                    )
 
     async def _persist_paused_turn_snapshot(
         self,
@@ -842,6 +1055,148 @@ async def _extract_tool_approval_required_events(
             )
         return events
 
+    async def _extract_artifact_events(
+        self,
+        session_id: Optional[str],
+        user_id: Optional[str],
+        turn_start: datetime,
+        produced_by_message_index: Optional[int] = None,
+    ) -> List[str]:
+        """Return one SSE-formatted `artifact` event per artifact whose
+        HEAD was created or updated during this turn.
+
+        Identifies "this turn" by `updated_at >= turn_start` rather than
+        parsing the tool result text: it reflects exactly what was
+        persisted, handles multiple artifacts in one turn, and ignores
+        artifacts carried over from earlier turns in the same session. A
+        row with an unparseable `updated_at` is included (the artifact
+        tool ran this turn and the SPA dedupes by id+version anyway).
+
+        Best-effort: any failure (artifacts not configured for this env,
+        DynamoDB error) logs and returns [] — never breaks the stream.
+        """
+        if not (session_id and user_id):
+            return []
+        try:
+            from agents.builtin_tools.artifacts.service import (
+                ArtifactConfigError,
+                list_session_artifacts,
+                set_produced_by_message_index,
+            )
+
+            rows = await asyncio.to_thread(
+                list_session_artifacts, user_id, session_id
+            )
+        except ArtifactConfigError:
+            return []
+        except Exception as e:
+            logger.warning("Failed to list session artifacts: %s", e)
+            return []
+
+        events: List[str] = []
+        for row in rows:
+            updated_at = row.get("updated_at") or ""
+            try:
+                touched = datetime.fromisoformat(updated_at) >= turn_start
+            except (ValueError, TypeError):
+                touched = True
+            if not touched:
+                continue
+            artifact_id = row.get("artifact_id", "")
+            version = int(row.get("version", 0))
+            if produced_by_message_index is not None and artifact_id:
+                try:
+                    await asyncio.to_thread(
+                        set_produced_by_message_index,
+                        user_id,
+                        artifact_id,
+                        version,
+                        produced_by_message_index,
+                    )
+                except Exception as e:  # noqa: BLE001 - best-effort linkage
+                    logger.warning(
+                        "Failed to stamp produced_by_message_index "
+                        "(artifact=%s): %s",
+                        artifact_id,
+                        e,
+                    )
+            payload = {
+                "type": "artifact",
+                "artifactId": artifact_id,
+                "version": version,
+                "title": row.get("title", ""),
+                "contentType": row.get(
+                    "content_type", "text/html; charset=utf-8"
+                ),
+                "sessionId": session_id,
+                "updatedAt": updated_at,
+                "action": "created" if version == 1 else "updated",
+                "producedByMessageIndex": produced_by_message_index,
+            }
+            events.append(
+                f"event: artifact\ndata: {json.dumps(payload)}\n\n"
+            )
+        return events
+
+    async def _extract_ui_resource_events(
+        self,
+        event: Dict[str, Any],
+        tool_use_names: Dict[str, str],
+        emitted: set,
+    ) -> List[str]:
+        """Return a `ui_resource` SSE for a tool_result that ships an MCP App.
+
+        PR #3 of the MCP Apps host-renderer initiative
+        (`docs/kaizen/scoping/mcp-apps-host-renderer.md`). When the host flag
+        is on and this tool_result's tool declared a `ui://` resource in its
+        `tools/list` `_meta.ui` (recorded in the catalog by PR #2), fetch that
+        resource via the spec-mandated `resources/read` against the same MCP
+        client that surfaced the tool, and emit a single
+
+            `{type, toolUseId, resourceUri, html, mimeType, csp, permissions}`
+
+        event with the HTML inlined (so the frontend needs no MCP client).
+        The blocking `resources/read` runs in a worker thread so the live
+        stream is not stalled.
+
+        Inert and zero-cost when `AGENTCORE_MCP_APPS_HOST_ENABLED` is false.
+        Best-effort: deduped per toolUseId, and any failure logs and returns
+        [] — it never breaks the stream.
+        """
+        from agents.main_agent.integrations.mcp_apps import (
+            fetch_ui_resource,
+            is_mcp_apps_host_enabled,
+        )
+
+        if not is_mcp_apps_host_enabled():
+            return []
+
+        try:
+            tool_result = event.get("data", {}).get("tool_result", {})
+            if not isinstance(tool_result, dict):
+                return []
+            tool_use_id = tool_result.get("toolUseId") or tool_result.get(
+                "tool_use_id"
+            )
+            if not tool_use_id or tool_use_id in emitted:
+                return []
+
+            tool_name = tool_use_names.get(tool_use_id)
+            if not tool_name:
+                return []
+
+            payload = await asyncio.to_thread(
+                fetch_ui_resource, tool_name, tool_use_id
+            )
+            if payload is None:
+                return []
+
+            emitted.add(tool_use_id)
+            return [f"event: ui_resource\ndata: {json.dumps(payload)}\n\n"]
+        except Exception as e:  # noqa: BLE001 - best-effort side channel
+            logger.warning("Failed to emit ui_resource event: %s", e)
+            return []
+
     def _format_sse_event(self, event: Dict[str, Any]) -> str:
         """
         Format processed event as SSE (Server-Sent Event)
@@ -1165,27 +1520,25 @@ async def _store_message_metadata(
                 end_to_end_latency_ms = int((stream_end_time - stream_start_time) * 1000)
                 logger.info(f"📊 Calculated E2E latency: {end_to_end_latency_ms}ms")
 
-            # Get time to first token
-            # PRIORITY 1: Use provider's timeToFirstByteMs if available (most accurate)
+            # Get time to first token. We persist `None` (not 0) when the
+            # provider didn't emit `timeToFirstByteMs` and we couldn't
+            # measure it locally — a real TTFT can never be 0ms, and any
+            # downstream aggregation (averages, percentiles) needs to
+            # distinguish "not measured" from a real value to avoid
+            # pulling stats toward zero.
             if accumulated_metadata.get("metrics", {}).get("timeToFirstByteMs"):
                 time_to_first_token_ms = int(accumulated_metadata["metrics"]["timeToFirstByteMs"])
                 logger.info(f"📊 Using provider timeToFirstByteMs: {time_to_first_token_ms}ms")
-            # PRIORITY 2: Estimate TTFT as a portion of latency if we don't have it
-            # This is a rough estimate but better than 0 or None
-            # For most LLM calls, TTFT is typically 20-40% of total latency
-            elif end_to_end_latency_ms and end_to_end_latency_ms > 100:
-                # If E2E latency is available and substantial, estimate TTFT
-                # We don't have actual TTFT so we can't store it accurately
-                # Instead, log that we're missing it
-                logger.info(f"📊 No TTFT available - provider did not send timeToFirstByteMs for this message")
-                # Still create latency metrics with just E2E, using a placeholder of 0 for TTFT
-                # This is better than losing all latency data
-                time_to_first_token_ms = 0  # Indicates "not measured"
-
-            # Create latency metrics if we have at least E2E latency
+            else:
+                logger.info("📊 No TTFT available - provider did not send timeToFirstByteMs for this message")
+
+            # Create latency metrics if we have at least E2E latency.
+            # `time_to_first_token_ms` may be None — LatencyMetrics.time_to_first_token
+            # is Optional, so this serializes as JSON null.
             if end_to_end_latency_ms is not None:
                 latency_metrics = LatencyMetrics(
-                    time_to_first_token=time_to_first_token_ms if time_to_first_token_ms is not None else 0, end_to_end_latency=end_to_end_latency_ms
+                    time_to_first_token=time_to_first_token_ms,
+                    end_to_end_latency=end_to_end_latency_ms,
                 )
                 logger.info(f"📊 Created LatencyMetrics: TTFT={time_to_first_token_ms}ms, E2E={end_to_end_latency_ms}ms")
             else:
diff --git a/backend/src/agents/main_agent/streaming/stream_processor.py b/backend/src/agents/main_agent/streaming/stream_processor.py
index f2501713..227d4011 100644
--- a/backend/src/agents/main_agent/streaming/stream_processor.py
+++ b/backend/src/agents/main_agent/streaming/stream_processor.py
@@ -45,6 +45,8 @@
 from typing import Any, AsyncGenerator, Dict, List, Optional, Tuple, TypedDict
 from uuid import UUID
 
+from strands.types.exceptions import MaxTokensReachedException
+
 from apis.shared.errors import StreamErrorEvent, ErrorCode
 
 logger = logging.getLogger(__name__)
@@ -300,12 +302,13 @@ def _handle_completion_events(event: RawEvent) -> Tuple[List[ProcessedEvent], bo
     if event.get("force_stop", False):
         reason = event.get("force_stop_reason", "unknown reason")
 
-        # Create structured error event
+        error_message, recoverable = _format_force_stop_message(reason)
+
         error_event = StreamErrorEvent(
-            error=f"Agent force-stopped: {reason}",
+            error=error_message,
             code=ErrorCode.AGENT_ERROR,
             detail=str(reason) if reason != "unknown reason" else None,
-            recoverable=False
+            recoverable=recoverable,
         )
         # Convert to event dict format
         events.append(_create_event("error", error_event.model_dump(exclude_none=True)))
@@ -314,6 +317,104 @@ def _handle_completion_events(event: RawEvent) -> Tuple[List[ProcessedEvent], bo
     return events, should_break
 
 
+def _format_force_stop_message(reason: Any) -> tuple[str, bool]:
+    """Translate raw Bedrock errors in ``force_stop_reason`` into user-facing
+    markdown. Returns (message, recoverable).
+
+    We detect a handful of high-signal patterns (duplicate document names,
+    unsupported documents or images, document size limit, throttling, access
+    denied) and surface actionable guidance. Anything else falls through to
+    a generic, non-recoverable "Agent force-stopped" with the raw error.
+    """
+    reason_str = str(reason or "")
+    reason_lower = reason_str.lower()
+
+    # Bedrock rejects requests where two document blocks in the conversation
+    # share the same name. This can happen when the same file (or two files
+    # whose names sanitize identically) appears in both the current turn and
+    # a prior turn that is still in the active context window.
+    if "duplicate document name" in reason_lower or "can't contain duplicate document" in reason_lower:
+        return (
+            "⚠️ A file you attached has the same name as one already in this "
+            "conversation's context.\n\n"
+            "Try renaming the file before attaching it, or start a new "
+            "conversation.",
+            True,
+        )
+
+    # Some Bedrock-hosted models (e.g. gpt-oss-120b) reject any document or
+    # image content block outright with "This model doesn't support
+    # documents." Check this BEFORE the size-limit branch — the AWS message
+    # contains "ValidationException" + "documents" and would otherwise be
+    # misclassified as a 4.5 MB overflow.
+    #
+    # Copy notes: keep the actionable advice deployment-agnostic — no brand
+    # names (model lineups change), no UI affordance names (might drift),
+    # no references to optional tools like Spreadsheet Analysis (not
+    # guaranteed enabled across forks/deployments).
+    if "doesn't support document" in reason_lower or "does not support document" in reason_lower:
+        return (
+            "⚠️ The selected model can't read attached files.\n\n"
+            "To work with this file, switch to a model that supports "
+            "documents.",
+            True,
+        )
+
+    if "doesn't support image" in reason_lower or "does not support image" in reason_lower:
+        return (
+            "⚠️ The selected model can't read attached images.\n\n"
+            "To work with this image, switch to a model that supports "
+            "images.",
+            True,
+        )
+
+    # Bedrock ConverseStream rejects document content blocks over ~4.5 MB
+    # internal size. Triggered most often by XLSX files that inflate
+    # significantly during parsing. See issue #206. Narrowed from
+    # `"document" in reason` to size-specific markers so
+    # unsupported-modality errors don't false-positive here.
+    #
+    # Copy notes: keep guidance deployment-agnostic — no references to
+    # optional tools (Spreadsheet Analysis isn't guaranteed enabled across
+    # forks/deployments) and no UI affordance names that might drift.
+    if (
+        "maximum document size" in reason_lower
+        or "document size" in reason_lower
+        or ("document" in reason_lower and ("too large" in reason_lower or "exceeds" in reason_lower or "exceed the" in reason_lower))
+    ):
+        return (
+            "⚠️ One of the attached files is too large for the model to read "
+            "directly.\n\n"
+            "Bedrock limits inline documents to 4.5 MB after parsing, and "
+            "spreadsheets (especially XLSX) often expand past that. Try "
+            "splitting the file into smaller sections, converting to plain "
+            "text or CSV, or sharing only the relevant portion.",
+            True,
+        )
+
+    if "throttl" in reason_lower or "too many requests" in reason_lower:
+        return (
+            "⚠️ The model is receiving too many requests right now.\n\n"
+            "> " + reason_str + "\n\n"
+            "Please wait a moment and try again.",
+            True,
+        )
+
+    if "accessdenied" in reason_lower or "access denied" in reason_lower:
+        return (
+            "⚠️ I don't have access to complete this request.\n\n"
+            "> " + reason_str + "\n\n"
+            "Try a different model, or check that the required permissions "
+            "are configured for this workspace.",
+            False,
+        )
+
+    # Default: preserve prior behavior so operators can still read the raw
+    # error in logs and the UI. Unrecognized errors stay non-recoverable,
+    # matching the previous hard-coded `recoverable=False`.
+    return (f"Agent force-stopped: {reason_str}", False)
+
+
 def _handle_content_block_events(event: RawEvent, current_block_index: Dict[str, int]) -> List[ProcessedEvent]:
     """Process content block events from the agent stream.
 
@@ -1031,7 +1132,16 @@ def _extract_metrics_data(metrics_obj: Any) -> Dict[str, Any]:
                     metadata_data["metrics"] = metrics_data
             
             if metadata_data:
-                metadata_event = _create_event("metadata", metadata_data)
+                # Emit on the turn-summary track, NOT the per-message track.
+                # `result.metrics.accumulated_usage` is summed across every
+                # LLM call in the turn (Strands' EventLoopMetrics). If we
+                # emitted this as a `metadata` event, the stream coordinator
+                # would route it into per_message_metadata[last] and clobber
+                # that message's per-call usage — pricing each entry would
+                # then double-count earlier messages' input tokens. The
+                # main loop also accumulates `metadata_summary` events
+                # (see below), so the final summary stays cumulative.
+                metadata_event = _create_event("metadata_summary", metadata_data)
                 events.append(metadata_event)
 
     # Check for metadata in nested event structure (like content blocks)
@@ -1250,8 +1360,14 @@ async def mock_stream():
             # NOTE: timeToFirstByteMs from provider will be stored in metrics
             # and the coordinator can use it as a fallback if first_token_time is not set
             for processed_event in _handle_metadata_events(event):
-                # Accumulate metadata for summary
-                if processed_event.get("type") == "metadata":
+                # Accumulate metadata for the final summary. Both per-call
+                # `metadata` events (each LLM call's usage) and the
+                # turn-cumulative `metadata_summary` event (extracted from
+                # AgentResult.metrics.accumulated_usage) feed this dict —
+                # the cumulative event arrives last and `update()` makes it
+                # last-write-wins, so accumulated_metadata ends the turn
+                # carrying true totals.
+                if processed_event.get("type") in ("metadata", "metadata_summary"):
                     event_data = processed_event.get("data", {})
                     if "usage" in event_data:
                         accumulated_metadata["usage"].update(event_data["usage"])
@@ -1275,8 +1391,11 @@ async def mock_stream():
                 # Check one more time for metadata in case result came with complete
                 metadata_events_after_complete = _handle_metadata_events(event)
                 for processed_event in metadata_events_after_complete:
-                    # Accumulate metadata for summary
-                    if processed_event.get("type") == "metadata":
+                    # Accumulate metadata for summary — see note on the main
+                    # loop's accumulator above. Both `metadata` (per-call)
+                    # and `metadata_summary` (turn-cumulative from result)
+                    # feed accumulated_metadata so the final emit is total.
+                    if processed_event.get("type") in ("metadata", "metadata_summary"):
                         event_data = processed_event.get("data", {})
                         if "usage" in event_data:
                             accumulated_metadata["usage"].update(event_data["usage"])
@@ -1384,14 +1503,19 @@ async def mock_stream():
         yield _create_event("done", {})
 
     except RuntimeError as e:
-        # RuntimeError may occur during generator cleanup or cancellation
-        # Check if it's related to generator state
+        # The "generator"/"async" matched branch is the load-bearing part of
+        # this clause — it swallows async-generator state errors that aren't
+        # real user-facing failures: e.g. "asynchronous generator is already
+        # running" (re-entry) or "generator ignored GeneratorExit" (cleanup
+        # races). Normal shutdown goes through the GeneratorExit handler
+        # above; this catches the residual edge cases that should not
+        # surface as visible error events to the consumer.
         if "generator" in str(e).lower() or "async" in str(e).lower():
             logger.debug(f"Generator runtime error (likely cleanup): {e}")
         else:
-            # Unexpected RuntimeError - treat as error
+            # Any other RuntimeError: emit a STREAM_ERROR event, same shape
+            # as the generic Exception handler below.
             logger.error(f"Runtime error processing agent stream: {e}", exc_info=True)
-            # Create structured error event
             error_event = StreamErrorEvent(
                 error="Runtime error during streaming",
                 code=ErrorCode.STREAM_ERROR,
@@ -1400,6 +1524,24 @@ async def mock_stream():
             )
             yield _create_event("error", error_event.model_dump(exclude_none=True))
 
+    except MaxTokensReachedException:
+        # The model hit its output-token ceiling mid-turn. Strands has already
+        # appended the recovered partial assistant message (truncated tool uses
+        # swapped for stubs) to agent.messages, and the MessageAddedEvent hook
+        # has persisted it to AgentCore Memory before this exception propagated
+        # — so the turn is recoverable via a follow-up "continue". Emit a
+        # max_tokens-coded error so the coordinator can render an honest,
+        # actionable message and skip the duplicate-persist path. Do NOT leak
+        # the raw SDK message/URL into `detail`.
+        logger.info("Agent stream stopped: max_tokens limit reached (recoverable via continue)")
+        error_event = StreamErrorEvent(
+            error="Response truncated: maximum output length reached",
+            code=ErrorCode.MAX_TOKENS,
+            detail=None,
+            recoverable=True,
+        )
+        yield _create_event("error", error_event.model_dump(exclude_none=True))
+
     except Exception as e:
         # ERROR HANDLING: If anything else goes wrong, log it and send an error event
         # This ensures the client always gets a response, even on failure
diff --git a/backend/src/agents/main_agent/tools/tool_catalog.py b/backend/src/agents/main_agent/tools/tool_catalog.py
index c65344a8..587a1387 100644
--- a/backend/src/agents/main_agent/tools/tool_catalog.py
+++ b/backend/src/agents/main_agent/tools/tool_catalog.py
@@ -84,6 +84,22 @@ def to_dict(self) -> dict:
         icon="code-bracket",
     ),
 
+    # --- Built-in Tools (Spreadsheet Analysis) ---
+    "list_spreadsheets": ToolMetadata(
+        tool_id="list_spreadsheets",
+        name="List Spreadsheet Files",
+        description="List spreadsheet files available for analysis from the assistant's knowledge base or conversation attachments.",
+        category=ToolCategory.DATA,
+        icon="folder-open",
+    ),
+    "analyze_spreadsheet": ToolMetadata(
+        tool_id="analyze_spreadsheet",
+        name="Spreadsheet Analysis",
+        description="Analyze spreadsheet data using Python code. Use for aggregations, comparisons, trends, filtering, and chart generation. For simple factual lookups, use the knowledge base search instead.",
+        category=ToolCategory.DATA,
+        icon="table-cells",
+    ),
+
     # --- Gateway/MCP Tools ---
     # These are loaded dynamically from the gateway but we define metadata here
     # for the admin UI. Actual tool availability depends on gateway configuration.
diff --git a/backend/src/apis/app_api/admin/auth_providers/routes.py b/backend/src/apis/app_api/admin/auth_providers/routes.py
index 84bf83fc..07863e8f 100644
--- a/backend/src/apis/app_api/admin/auth_providers/routes.py
+++ b/backend/src/apis/app_api/admin/auth_providers/routes.py
@@ -8,7 +8,7 @@
 
 from fastapi import APIRouter, Depends, HTTPException, Query, status
 
-from apis.shared.auth import User
+from apis.shared.auth import User, require_admin
 from apis.shared.auth_providers.models import (
     AuthProviderCreate,
     AuthProviderListResponse,
@@ -18,7 +18,6 @@
     OIDCDiscoveryResponse,
 )
 from apis.shared.auth_providers.service import get_auth_provider_service
-from apis.shared.rbac.system_admin import require_system_admin
 
 logger = logging.getLogger(__name__)
 
@@ -32,7 +31,7 @@
 )
 async def list_auth_providers(
     enabled_only: bool = Query(False, description="Filter to enabled providers only"),
-    admin_user: User = Depends(require_system_admin),
+    admin_user: User = Depends(require_admin),
 ) -> AuthProviderListResponse:
     """List all configured OIDC authentication providers."""
     logger.info("Admin listing auth providers")
@@ -51,7 +50,7 @@ async def list_auth_providers(
     summary="Get current runtime container image tag",
 )
 async def get_runtime_image_tag(
-    admin_user: User = Depends(require_system_admin),
+    admin_user: User = Depends(require_admin),
 ) -> dict:
     """
     Get the current container image tag used for AgentCore runtimes.
@@ -94,7 +93,7 @@ async def get_runtime_image_tag(
     summary="Get the Cognito IdP-response redirect URI",
 )
 async def get_cognito_redirect_uri(
-    admin_user: User = Depends(require_system_admin),
+    admin_user: User = Depends(require_admin),
 ) -> dict:
     """Return the Cognito Hosted UI URL admins must register on an external IdP.
 
@@ -121,7 +120,7 @@ async def get_cognito_redirect_uri(
 )
 async def discover_oidc_endpoints(
     request: OIDCDiscoveryRequest,
-    admin_user: User = Depends(require_system_admin),
+    admin_user: User = Depends(require_admin),
 ) -> OIDCDiscoveryResponse:
     """
     Discover OIDC endpoints from an issuer URL.
@@ -142,7 +141,7 @@ async def discover_oidc_endpoints(
 )
 async def get_auth_provider(
     provider_id: str,
-    admin_user: User = Depends(require_system_admin),
+    admin_user: User = Depends(require_admin),
 ) -> AuthProviderResponse:
     """Get a specific authentication provider by ID."""
     logger.info("Admin requesting auth provider")
@@ -167,7 +166,7 @@ async def get_auth_provider(
 )
 async def create_auth_provider(
     data: AuthProviderCreate,
-    admin_user: User = Depends(require_system_admin),
+    admin_user: User = Depends(require_admin),
 ) -> AuthProviderResponse:
     """
     Create a new OIDC authentication provider.
@@ -196,7 +195,7 @@ async def create_auth_provider(
 async def update_auth_provider(
     provider_id: str,
     updates: AuthProviderUpdate,
-    admin_user: User = Depends(require_system_admin),
+    admin_user: User = Depends(require_admin),
 ) -> AuthProviderResponse:
     """
     Update an authentication provider.
@@ -231,7 +230,7 @@ async def update_auth_provider(
 )
 async def delete_auth_provider(
     provider_id: str,
-    admin_user: User = Depends(require_system_admin),
+    admin_user: User = Depends(require_admin),
 ) -> None:
     """Delete an authentication provider and its client secret."""
     logger.info("Admin deleting auth provider")
@@ -252,7 +251,7 @@ async def delete_auth_provider(
 )
 async def test_auth_provider(
     provider_id: str,
-    admin_user: User = Depends(require_system_admin),
+    admin_user: User = Depends(require_admin),
 ) -> dict:
     """
     Test provider connectivity by verifying JWKS, discovery, and
diff --git a/backend/src/apis/app_api/admin/roles/routes.py b/backend/src/apis/app_api/admin/roles/routes.py
index 76e6dabb..feefe093 100644
--- a/backend/src/apis/app_api/admin/roles/routes.py
+++ b/backend/src/apis/app_api/admin/roles/routes.py
@@ -12,7 +12,6 @@
     AppRoleListResponse,
     CacheStatsResponse,
 )
-from apis.shared.rbac.system_admin import require_system_admin
 from apis.shared.rbac.admin_service import get_app_role_admin_service
 from apis.shared.rbac.cache import get_app_role_cache
 
@@ -88,7 +87,7 @@ async def get_role(
 @router.post("/", response_model=AppRoleResponse, status_code=status.HTTP_201_CREATED)
 async def create_role(
     role_data: AppRoleCreate,
-    admin: User = Depends(require_system_admin),
+    admin: User = Depends(require_admin),
 ):
     """
     Create a new application role.
@@ -124,7 +123,7 @@ async def create_role(
 async def update_role(
     role_id: str,
     updates: AppRoleUpdate,
-    admin: User = Depends(require_system_admin),
+    admin: User = Depends(require_admin),
 ):
     """
     Update an application role.
@@ -170,7 +169,7 @@ async def update_role(
 @router.delete("/{role_id}", status_code=status.HTTP_204_NO_CONTENT)
 async def delete_role(
     role_id: str,
-    admin: User = Depends(require_system_admin),
+    admin: User = Depends(require_admin),
 ):
     """
     Delete an application role.
@@ -210,7 +209,7 @@ async def delete_role(
 @router.post("/{role_id}/sync", response_model=AppRoleResponse)
 async def sync_role_permissions(
     role_id: str,
-    admin: User = Depends(require_system_admin),
+    admin: User = Depends(require_admin),
 ):
     """
     Force recomputation of effective permissions for a role.
@@ -245,7 +244,7 @@ async def sync_role_permissions(
 
 @router.get("/cache/stats", response_model=CacheStatsResponse)
 async def get_cache_stats(
-    admin: User = Depends(require_system_admin),
+    admin: User = Depends(require_admin),
 ):
     """
     Get cache statistics.
@@ -268,7 +267,7 @@ async def get_cache_stats(
 
 @router.post("/cache/invalidate", status_code=status.HTTP_204_NO_CONTENT)
 async def invalidate_cache(
-    admin: User = Depends(require_system_admin),
+    admin: User = Depends(require_admin),
 ):
     """
     Force invalidation of all role caches.
diff --git a/backend/src/apis/app_api/admin/routes.py b/backend/src/apis/app_api/admin/routes.py
index 67de87d7..346db2db 100644
--- a/backend/src/apis/app_api/admin/routes.py
+++ b/backend/src/apis/app_api/admin/routes.py
@@ -35,7 +35,6 @@
     update_managed_model,
     delete_managed_model,
 )
-from apis.shared.rbac.system_admin import require_system_admin
 
 logger = logging.getLogger(__name__)
 
@@ -612,7 +611,7 @@ async def delete_managed_model_endpoint(
 @router.post("/managed-models/{model_id}/sync-roles", response_model=ManagedModel)
 async def sync_model_roles(
     model_id: str,
-    admin_user: User = Depends(require_system_admin),
+    admin_user: User = Depends(require_admin),
 ):
     """
     Sync a model's allowedAppRoles with the AppRole system.
@@ -727,6 +726,11 @@ async def sync_model_roles(
 
 router.include_router(auth_providers_router)
 
+# ========== Include User Menu Links Admin Subrouter ==========
+from .user_menu_links.routes import router as user_menu_links_admin_router
+
+router.include_router(user_menu_links_admin_router)
+
 # ========== Include Fine-Tuning Admin Subrouter (conditional) ==========
 if os.environ.get("FINE_TUNING_ENABLED", "false").lower() == "true":
     from .fine_tuning.routes import router as fine_tuning_admin_router
diff --git a/backend/src/apis/app_api/admin/user_menu_links/__init__.py b/backend/src/apis/app_api/admin/user_menu_links/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/backend/src/apis/app_api/admin/user_menu_links/routes.py b/backend/src/apis/app_api/admin/user_menu_links/routes.py
new file mode 100644
index 00000000..a91025c8
--- /dev/null
+++ b/backend/src/apis/app_api/admin/user_menu_links/routes.py
@@ -0,0 +1,125 @@
+"""Admin API routes for user-menu link management.
+
+All endpoints require admin role. Non-admin users hit the public
+``GET /user-menu-links`` endpoint (registered under ``app_api.user_menu_links``)
+which returns only enabled links and strips admin-only metadata.
+"""
+
+import logging
+
+from fastapi import APIRouter, Depends, HTTPException, Query, status
+
+from apis.shared.auth import User, require_admin
+from apis.shared.user_menu_links.models import (
+    UserMenuLinkCreate,
+    UserMenuLinkListResponse,
+    UserMenuLinkResponse,
+    UserMenuLinkUpdate,
+)
+from apis.shared.user_menu_links.service import get_user_menu_links_service
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(prefix="/user-menu-links", tags=["admin-user-menu-links"])
+
+
+@router.get(
+    "/",
+    response_model=UserMenuLinkListResponse,
+    summary="List all user-menu links",
+)
+async def list_user_menu_links(
+    enabled_only: bool = Query(False, description="Filter to enabled links only"),
+    admin_user: User = Depends(require_admin),
+) -> UserMenuLinkListResponse:
+    """List all user-menu links (admin sees disabled ones too)."""
+    service = get_user_menu_links_service()
+    links = await service.list_links(enabled_only=enabled_only)
+    return UserMenuLinkListResponse(
+        links=[UserMenuLinkResponse.from_link(link) for link in links],
+        total=len(links),
+    )
+
+
+@router.get(
+    "/{link_id}",
+    response_model=UserMenuLinkResponse,
+    summary="Get a user-menu link",
+)
+async def get_user_menu_link(
+    link_id: str,
+    admin_user: User = Depends(require_admin),
+) -> UserMenuLinkResponse:
+    service = get_user_menu_links_service()
+    link = await service.get_link(link_id)
+    if not link:
+        raise HTTPException(
+            status_code=status.HTTP_404_NOT_FOUND,
+            detail=f"User-menu link '{link_id}' not found",
+        )
+    return UserMenuLinkResponse.from_link(link)
+
+
+@router.post(
+    "/",
+    response_model=UserMenuLinkResponse,
+    status_code=status.HTTP_201_CREATED,
+    summary="Create a user-menu link",
+)
+async def create_user_menu_link(
+    data: UserMenuLinkCreate,
+    admin_user: User = Depends(require_admin),
+) -> UserMenuLinkResponse:
+    try:
+        service = get_user_menu_links_service()
+        link = await service.create_link(data, created_by=admin_user.email)
+        return UserMenuLinkResponse.from_link(link)
+    except ValueError as e:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail=str(e),
+        )
+
+
+@router.patch(
+    "/{link_id}",
+    response_model=UserMenuLinkResponse,
+    summary="Update a user-menu link",
+)
+async def update_user_menu_link(
+    link_id: str,
+    updates: UserMenuLinkUpdate,
+    admin_user: User = Depends(require_admin),
+) -> UserMenuLinkResponse:
+    try:
+        service = get_user_menu_links_service()
+        link = await service.update_link(link_id, updates)
+        if not link:
+            raise HTTPException(
+                status_code=status.HTTP_404_NOT_FOUND,
+                detail=f"User-menu link '{link_id}' not found",
+            )
+        return UserMenuLinkResponse.from_link(link)
+    except ValueError as e:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail=str(e),
+        )
+
+
+@router.delete(
+    "/{link_id}",
+    status_code=status.HTTP_204_NO_CONTENT,
+    summary="Delete a user-menu link",
+)
+async def delete_user_menu_link(
+    link_id: str,
+    admin_user: User = Depends(require_admin),
+) -> None:
+    service = get_user_menu_links_service()
+    deleted = await service.delete_link(link_id)
+    if not deleted:
+        raise HTTPException(
+            status_code=status.HTTP_404_NOT_FOUND,
+            detail=f"User-menu link '{link_id}' not found",
+        )
diff --git a/backend/src/apis/app_api/artifacts/__init__.py b/backend/src/apis/app_api/artifacts/__init__.py
new file mode 100644
index 00000000..75c4dadc
--- /dev/null
+++ b/backend/src/apis/app_api/artifacts/__init__.py
@@ -0,0 +1,7 @@
+"""Artifact render-token minter (app-api).
+
+Issues short-lived HS256 JWTs that the artifact render Lambda (a
+separate deployable) verifies before serving a sandboxed iframe. The
+claim shape and DDB lookup keys here are a cross-PR contract with that
+Lambda — see `backend/src/lambdas/artifact_render/handler.py`.
+"""
diff --git a/backend/src/apis/app_api/artifacts/models.py b/backend/src/apis/app_api/artifacts/models.py
new file mode 100644
index 00000000..a36ce7c4
--- /dev/null
+++ b/backend/src/apis/app_api/artifacts/models.py
@@ -0,0 +1,66 @@
+"""Request/response models for the artifact endpoints."""
+
+from __future__ import annotations
+
+from typing import Optional
+
+from pydantic import BaseModel, Field
+
+
+class RenderTokenRequest(BaseModel):
+    version: int = Field(..., ge=1, description="Artifact version to render")
+    session_id: Optional[str] = Field(
+        default=None,
+        validation_alias="sessionId",
+        description="Originating chat session id — audit correlation only",
+    )
+
+
+class RenderTokenResponse(BaseModel):
+    url: str = Field(
+        ...,
+        description="Artifact origin URL with the embedded render token "
+        "(set directly as the iframe src)",
+    )
+    expires_at: str = Field(..., description="ISO-8601 UTC token expiry")
+
+
+class ArtifactSummary(BaseModel):
+    """One artifact version row, for the session artifacts list.
+
+    Snake-case JSON to match this domain's existing REST shape
+    (RenderTokenResponse.expires_at). The SPA normalizes both this and
+    the camelCase live SSE `artifact` event into one client model.
+    """
+
+    artifact_id: str
+    version: int
+    title: str
+    content_type: str
+    updated_at: str
+    created_at: Optional[str] = None
+    produced_by_message_index: Optional[int] = Field(
+        default=None,
+        description="0-based index of the assistant message that produced "
+        "or last updated this artifact, matching the messages endpoint's "
+        "`msg-{session_id}-{index}` id. Null for artifacts written before "
+        "linkage existed — the SPA falls back to the end-of-chat strip.",
+    )
+
+
+class ArtifactListResponse(BaseModel):
+    artifacts: list[ArtifactSummary] = Field(default_factory=list)
+
+
+class ArtifactContentResponse(BaseModel):
+    """Raw source of one artifact version, for the panel's code view.
+
+    `content` is inert text the SPA highlights client-side — never
+    executed. For Markdown artifacts the stored S3 object is a rendered
+    HTML wrapper; the service unwraps it back to the authored Markdown
+    and `content_type` is normalized to `text/markdown` accordingly.
+    """
+
+    content: str
+    content_type: str
+    version: int
diff --git a/backend/src/apis/app_api/artifacts/routes.py b/backend/src/apis/app_api/artifacts/routes.py
new file mode 100644
index 00000000..6b265eec
--- /dev/null
+++ b/backend/src/apis/app_api/artifacts/routes.py
@@ -0,0 +1,161 @@
+"""Artifact API routes: render tokens, session listing, and raw content."""
+
+import logging
+from datetime import datetime, timezone
+
+from fastapi import APIRouter, Depends, HTTPException, Query, status
+
+from apis.shared.auth import User, get_current_user_from_session
+
+from .models import (
+    ArtifactContentResponse,
+    ArtifactListResponse,
+    ArtifactSummary,
+    RenderTokenRequest,
+    RenderTokenResponse,
+)
+from .service import (
+    ArtifactContentService,
+    ArtifactListService,
+    ArtifactNotFoundError,
+    ArtifactQueryError,
+    ArtifactTooLargeError,
+    RenderTokenConfigError,
+    RenderTokenService,
+    get_artifact_content_service,
+    get_artifact_list_service,
+    get_render_token_service,
+)
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(prefix="/artifacts", tags=["artifacts"])
+
+
+@router.post("/{artifact_id}/render-token", response_model=RenderTokenResponse)
+async def mint_render_token(
+    artifact_id: str,
+    request: RenderTokenRequest,
+    user: User = Depends(get_current_user_from_session),
+    service: RenderTokenService = Depends(get_render_token_service),
+) -> RenderTokenResponse:
+    """Mint a short-lived render token for one artifact version.
+
+    `sub` is taken from the authenticated session, so a caller can only
+    ever obtain a token for their own artifact. The version is validated
+    against DynamoDB before minting so the SPA gets a clean 404 rather
+    than a token that renders the Lambda's error page in the iframe.
+    """
+    try:
+        url, exp = service.mint(
+            user_id=user.user_id,
+            artifact_id=artifact_id,
+            version=request.version,
+            session_id=request.session_id,
+        )
+    except ArtifactNotFoundError:
+        raise HTTPException(
+            status.HTTP_404_NOT_FOUND, "Artifact version not found"
+        )
+    except RenderTokenConfigError:
+        logger.exception("render token service misconfigured")
+        raise HTTPException(
+            status.HTTP_500_INTERNAL_SERVER_ERROR,
+            "Artifact rendering is unavailable",
+        )
+
+    return RenderTokenResponse(
+        url=url,
+        expires_at=datetime.fromtimestamp(exp, tz=timezone.utc).isoformat(),
+    )
+
+
+@router.get("", response_model=ArtifactListResponse)
+async def list_session_artifacts(
+    session_id: str = Query(..., description="Chat session id to list artifacts for"),
+    user: User = Depends(get_current_user_from_session),
+    service: ArtifactListService = Depends(get_artifact_list_service),
+) -> ArtifactListResponse:
+    """List every version of every artifact created in a chat session.
+
+    Used by the SPA to hydrate per-version artifact cards on session load
+    (live creation/updates are delivered separately via the `artifact`
+    SSE event). Each artifact is re-checked against the authenticated
+    user, so a borrowed session id cannot enumerate another user's
+    artifacts. An unknown or artifact-free session is a normal empty
+    list, not a 404.
+    """
+    try:
+        rows = service.list_for_session(
+            user_id=user.user_id, session_id=session_id
+        )
+    except RenderTokenConfigError:
+        logger.exception("artifact list service misconfigured")
+        raise HTTPException(
+            status.HTTP_500_INTERNAL_SERVER_ERROR,
+            "Artifact listing is unavailable",
+        )
+    except ArtifactQueryError:
+        # Feature is configured fine — the backing query just failed
+        # (throttle/timeout/transient). Retryable, so signal 503 rather
+        # than masquerading as a 500 misconfiguration.
+        logger.exception("artifact list query failed")
+        raise HTTPException(
+            status.HTTP_503_SERVICE_UNAVAILABLE,
+            "Artifact listing is temporarily unavailable",
+        )
+
+    return ArtifactListResponse(
+        artifacts=[ArtifactSummary(**row) for row in rows]
+    )
+
+
+@router.get("/{artifact_id}/content", response_model=ArtifactContentResponse)
+async def get_artifact_content(
+    artifact_id: str,
+    version: int = Query(..., ge=1, description="Artifact version to read"),
+    user: User = Depends(get_current_user_from_session),
+    service: ArtifactContentService = Depends(get_artifact_content_service),
+) -> ArtifactContentResponse:
+    """Return one artifact version's raw source for the panel code view.
+
+    The bytes are inert text the SPA highlights client-side — never
+    executed. Ownership is enforced by building the lookup key from the
+    authenticated session, so a borrowed artifact/session id can't read
+    another user's content. Markdown is unwrapped back to the authored
+    source (see ArtifactContentService). Oversized artifacts 413 so the
+    SPA can steer the user to the download path instead.
+    """
+    try:
+        content, content_type = service.get(
+            user_id=user.user_id,
+            artifact_id=artifact_id,
+            version=version,
+        )
+    except ArtifactNotFoundError:
+        raise HTTPException(
+            status.HTTP_404_NOT_FOUND, "Artifact version not found"
+        )
+    except ArtifactTooLargeError:
+        raise HTTPException(
+            status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
+            "Artifact is too large to preview — download it instead",
+        )
+    except RenderTokenConfigError:
+        logger.exception("artifact content service misconfigured")
+        raise HTTPException(
+            status.HTTP_500_INTERNAL_SERVER_ERROR,
+            "Artifact content is unavailable",
+        )
+    except ArtifactQueryError:
+        logger.exception("artifact content fetch failed")
+        raise HTTPException(
+            status.HTTP_503_SERVICE_UNAVAILABLE,
+            "Artifact content is temporarily unavailable",
+        )
+
+    return ArtifactContentResponse(
+        content=content,
+        content_type=content_type,
+        version=version,
+    )
diff --git a/backend/src/apis/app_api/artifacts/service.py b/backend/src/apis/app_api/artifacts/service.py
new file mode 100644
index 00000000..6a8ac003
--- /dev/null
+++ b/backend/src/apis/app_api/artifacts/service.py
@@ -0,0 +1,477 @@
+"""Render-token minting service.
+
+Mints the HS256 JWT that the artifact render Lambda verifies. The claim
+shape, signing key, and DynamoDB lookup keys are a frozen cross-PR
+contract with `backend/src/lambdas/artifact_render/handler.py` — any
+change here must be mirrored in that verifier (and vice versa).
+
+SECURITY: the minted token is a bearer credential carried in a URL.
+Never log the token or the assembled URL — log identifiers only.
+"""
+
+from __future__ import annotations
+
+import base64
+import logging
+import os
+import re
+import threading
+import time
+from typing import Optional
+
+import boto3
+import jwt
+from boto3.dynamodb.conditions import Key
+from botocore.exceptions import ClientError
+
+logger = logging.getLogger(__name__)
+
+# Frozen contract — must match the render Lambda's _verify_token.
+_ISS = "app-api"
+_AUD = "artifact-render"
+# The verifier hard-caps exp - iat at 600s. 120s comfortably covers an
+# iframe load while keeping a leaked-in-a-log token useless almost
+# immediately.
+_TTL_SECONDS = 120
+
+_secret_lock = threading.Lock()
+_table_lock = threading.Lock()
+_s3_lock = threading.Lock()
+_cached_signing_key: Optional[str] = None
+_secrets_client = None
+_ddb_table = None
+_s3_client = None
+_cached_bucket: Optional[str] = None
+
+# Inline code-view ceiling. Past this the SPA shows a "too large to
+# preview — download instead" affordance rather than highlighting a
+# multi-MB blob in the DOM.
+_MAX_CONTENT_BYTES = 2 * 1024 * 1024
+
+# Bare Markdown MIME types. Duplicated (not imported) from the agent
+# writer: the import-boundary rule forbids app_api importing from
+# agents/, and this set rarely changes.
+_MARKDOWN_MIME_TYPES = frozenset({"text/markdown", "text/x-markdown"})
+
+# The writer embeds the authored Markdown as base64 in this exact script
+# tag inside the rendered HTML wrapper (agents/builtin_tools/artifacts
+# _MARKDOWN_RENDER_TEMPLATE). We unwrap it back to source for code view.
+_MD_SRC_RE = re.compile(
+    r'<script type="application/x-markdown-base64" id="md-src">'
+    r"(?P<b64>[^<]*)</script>"
+)
+
+
+class RenderTokenError(Exception):
+    """Base class for render-token failures."""
+
+
+class ArtifactNotFoundError(RenderTokenError):
+    """No version record for the requested (user, artifact, version)."""
+
+
+class RenderTokenConfigError(RenderTokenError):
+    """Required environment / AWS configuration is missing or unusable."""
+
+
+class ArtifactQueryError(RenderTokenError):
+    """A backing-store query failed at runtime (throttle, timeout,
+    transient DynamoDB error) — distinct from a misconfiguration: the
+    feature is set up correctly, the request just couldn't be served."""
+
+
+class ArtifactTooLargeError(RenderTokenError):
+    """The artifact body exceeds the inline code-view cap. The caller
+    should fall back to the download path rather than streaming a huge
+    blob into the SPA's DOM for syntax highlighting."""
+
+
+def _reset_caches_for_tests() -> None:
+    """Drop process-wide singletons so test order can't leak a stale
+    signing key, secrets client, DDB table, S3 client, or bucket name."""
+    global _cached_signing_key, _secrets_client, _ddb_table
+    global _s3_client, _cached_bucket
+    _s3_client = None
+    _cached_bucket = None
+    _cached_signing_key = None
+    _secrets_client = None
+    _ddb_table = None
+
+
+def _region() -> str:
+    return (
+        os.environ.get("AWS_REGION")
+        or os.environ.get("AWS_DEFAULT_REGION")
+        or "us-west-2"
+    )
+
+
+def _signing_key() -> str:
+    """Fetch and cache the HMAC signing key. The secret is a plain
+    string (Secrets Manager generateSecretString, no JSON wrapper) —
+    same shape as the BFF cookie data key."""
+    global _cached_signing_key, _secrets_client
+    if _cached_signing_key is not None:
+        return _cached_signing_key
+    with _secret_lock:
+        if _cached_signing_key is not None:
+            return _cached_signing_key
+        arn = os.environ.get("ARTIFACTS_RENDER_TOKEN_SECRET_ARN", "")
+        if not arn:
+            raise RenderTokenConfigError(
+                "ARTIFACTS_RENDER_TOKEN_SECRET_ARN is not set"
+            )
+        if _secrets_client is None:
+            _secrets_client = boto3.client(
+                "secretsmanager", region_name=_region()
+            )
+        try:
+            response = _secrets_client.get_secret_value(SecretId=arn)
+        except ClientError as exc:
+            raise RenderTokenConfigError(
+                "could not read render token secret"
+            ) from exc
+        key = response.get("SecretString") or ""
+        if not key:
+            raise RenderTokenConfigError("render token secret is empty")
+        _cached_signing_key = key
+        return key
+
+
+def _table():
+    global _ddb_table
+    if _ddb_table is not None:
+        return _ddb_table
+    with _table_lock:
+        if _ddb_table is not None:
+            return _ddb_table
+        name = os.environ.get("DYNAMODB_ARTIFACTS_TABLE_NAME", "")
+        if not name:
+            raise RenderTokenConfigError(
+                "DYNAMODB_ARTIFACTS_TABLE_NAME is not set"
+            )
+        _ddb_table = boto3.resource(
+            "dynamodb", region_name=_region()
+        ).Table(name)
+        return _ddb_table
+
+
+def _origin() -> str:
+    """The artifact origin the render token is bound to.
+
+    Validated like the signing key and table so a misconfigured deploy
+    fails closed with a 500 — never returns a usable token embedded in a
+    relative, unloadable URL. Infra sets this env var alongside the
+    secret ARN and table name, so an empty value here means a broken
+    artifacts deploy, not a disabled feature."""
+    origin = os.environ.get("ARTIFACTS_ORIGIN", "").strip().rstrip("/")
+    if not origin:
+        raise RenderTokenConfigError("ARTIFACTS_ORIGIN is not set")
+    return origin
+
+
+def _assert_version_exists(
+    user_id: str, artifact_id: str, version: int
+) -> None:
+    """Confirm the exact version row exists and belongs to this user.
+
+    Building the PK from the authenticated user's id is what scopes the
+    token: a caller can never mint for another user's artifact. The
+    SK zero-pad must match the verifier's `V#{version:05d}`."""
+    sk = f"ARTIFACT#{artifact_id}#V#{version:05d}"
+    try:
+        result = _table().get_item(
+            Key={"PK": f"USER#{user_id}", "SK": sk}
+        )
+    except ClientError as exc:
+        raise RenderTokenConfigError(
+            "artifact metadata lookup failed"
+        ) from exc
+    if "Item" not in result:
+        raise ArtifactNotFoundError("artifact version not found")
+
+
+class RenderTokenService:
+    def mint(
+        self,
+        *,
+        user_id: str,
+        artifact_id: str,
+        version: int,
+        session_id: Optional[str],
+    ) -> tuple[str, int]:
+        """Validate config + ownership/existence, then mint a token.
+
+        Returns (render_url, exp_unix). Raises ArtifactNotFoundError or
+        RenderTokenConfigError. Origin is resolved first so a misconfig
+        fails closed before any DDB call or credential is generated."""
+        origin = _origin()
+        _assert_version_exists(user_id, artifact_id, version)
+        now = int(time.time())
+        exp = now + _TTL_SECONDS
+        claims = {
+            "iss": _ISS,
+            "aud": _AUD,
+            "sub": user_id,
+            "aid": artifact_id,
+            "ver": version,
+            "sid": session_id or "",
+            "iat": now,
+            "exp": exp,
+        }
+        token = jwt.encode(claims, _signing_key(), algorithm="HS256")
+        logger.info(
+            "minted render token user=%s artifact=%s v=%s",
+            user_id,
+            artifact_id,
+            version,
+        )
+        return f"{origin}/?t={token}", exp
+
+
+def get_render_token_service() -> RenderTokenService:
+    return RenderTokenService()
+
+
+# Frozen contract — the HEAD row + SessionIndex keys the artifact writer
+# (backend/src/agents/builtin_tools/artifacts/service.py) emits.
+_SESSION_INDEX = "SessionIndex"
+
+
+class ArtifactListService:
+    """List every version of every artifact created in a chat session.
+
+    Two-step, because the SessionIndex GSI only projects HEAD rows (the
+    writer attaches GSI1PK/GSI1SK to the HEAD put only):
+
+      1. Query SessionIndex by GSI1PK=SESSION#{sid} to discover the
+         artifacts in the session. GSI1PK is NOT user-scoped, so each
+         HEAD row is re-checked against the authenticated user's id.
+      2. Per artifact, query the main table by PK=USER#{uid} and
+         SK begins_with ARTIFACT#{aid}#V# for all immutable version
+         rows. PK is the authenticated user's id, so step 2 is
+         ownership-safe by construction.
+
+    The SPA renders one card per version, anchored to the turn that
+    produced it via the per-version produced_by_message_index the writer
+    stamps. Version rows written before per-version linkage shipped lack
+    that attribute (and updated_at) and degrade to the SPA's
+    end-of-conversation strip rather than a per-turn anchor.
+    """
+
+    def list_for_session(
+        self, *, user_id: str, session_id: str
+    ) -> list[dict]:
+        table = _table()
+        head_items: list[dict] = []
+        kwargs: dict = {
+            "IndexName": _SESSION_INDEX,
+            "KeyConditionExpression": Key("GSI1PK").eq(
+                f"SESSION#{session_id}"
+            ),
+            "ScanIndexForward": False,  # GSI1SK embeds updated_at → newest first
+        }
+        try:
+            while True:
+                resp = table.query(**kwargs)
+                head_items.extend(resp.get("Items", []))
+                last = resp.get("LastEvaluatedKey")
+                if not last:
+                    break
+                kwargs["ExclusiveStartKey"] = last
+        except ClientError as exc:
+            raise ArtifactQueryError(
+                "artifact list query failed"
+            ) from exc
+
+        # Distinct artifact ids in the session, newest-first, owned by
+        # the caller. dict.fromkeys dedupes while preserving GSI order.
+        artifact_ids = list(
+            dict.fromkeys(
+                item.get("artifact_id", "")
+                for item in head_items
+                if item.get("user_id") == user_id
+                and item.get("artifact_id")
+            )
+        )
+
+        summaries: list[dict] = []
+        for artifact_id in artifact_ids:
+            summaries.extend(
+                self._versions_for_artifact(user_id, artifact_id)
+            )
+        return summaries
+
+    @staticmethod
+    def _versions_for_artifact(
+        user_id: str, artifact_id: str
+    ) -> list[dict]:
+        """All immutable version rows for one artifact, scoped to the
+        user by PK. The #HEAD row shares the SK prefix but not the `#V#`
+        infix, so begins_with cleanly excludes it."""
+        table = _table()
+        items: list[dict] = []
+        kwargs: dict = {
+            "KeyConditionExpression": Key("PK").eq(f"USER#{user_id}")
+            & Key("SK").begins_with(f"ARTIFACT#{artifact_id}#V#"),
+        }
+        try:
+            while True:
+                resp = table.query(**kwargs)
+                items.extend(resp.get("Items", []))
+                last = resp.get("LastEvaluatedKey")
+                if not last:
+                    break
+                kwargs["ExclusiveStartKey"] = last
+        except ClientError as exc:
+            raise ArtifactQueryError(
+                "artifact version query failed"
+            ) from exc
+
+        return [
+            {
+                "artifact_id": item.get("artifact_id", ""),
+                "version": int(item.get("version", 0)),
+                "title": item.get("title", ""),
+                "content_type": item.get(
+                    "content_type", "text/html; charset=utf-8"
+                ),
+                "updated_at": item.get("updated_at", ""),
+                "created_at": item.get("created_at"),
+                "produced_by_message_index": item.get(
+                    "produced_by_message_index"
+                ),
+            }
+            for item in items
+        ]
+
+
+def get_artifact_list_service() -> ArtifactListService:
+    return ArtifactListService()
+
+
+def _bucket_name() -> str:
+    """The artifacts S3 bucket. Set by app-api-stack alongside the table
+    name; an empty value means a broken artifacts deploy, not a disabled
+    feature, so fail closed with a 500."""
+    global _cached_bucket
+    if _cached_bucket is not None:
+        return _cached_bucket
+    with _s3_lock:
+        if _cached_bucket is not None:
+            return _cached_bucket
+        name = os.environ.get("S3_ARTIFACTS_BUCKET_NAME", "")
+        if not name:
+            raise RenderTokenConfigError(
+                "S3_ARTIFACTS_BUCKET_NAME is not set"
+            )
+        _cached_bucket = name
+        return name
+
+
+def _s3():
+    global _s3_client
+    if _s3_client is not None:
+        return _s3_client
+    with _s3_lock:
+        if _s3_client is None:
+            _s3_client = boto3.client("s3", region_name=_region())
+        return _s3_client
+
+
+def _get_version_item(
+    user_id: str, artifact_id: str, version: int
+) -> dict:
+    """Fetch the exact version row, scoped to the authenticated user.
+
+    Building the PK from the session user's id is what prevents reading
+    another user's artifact. The SK zero-padding follows the
+    writer/verifier `V#{version:05d}` contract."""
+    sk = f"ARTIFACT#{artifact_id}#V#{version:05d}"
+    try:
+        result = _table().get_item(
+            Key={"PK": f"USER#{user_id}", "SK": sk}
+        )
+    except ClientError as exc:
+        raise ArtifactQueryError(
+            "artifact metadata lookup failed"
+        ) from exc
+    item = result.get("Item")
+    if not item:
+        raise ArtifactNotFoundError("artifact version not found")
+    return item
+
+
+def _is_markdown(content_type: str) -> bool:
+    bare = (content_type or "").split(";")[0].strip().lower()
+    return bare in _MARKDOWN_MIME_TYPES
+
+
+def _unwrap_markdown(html_body: str) -> Optional[str]:
+    """Recover the authored Markdown from the writer's HTML wrapper.
+
+    Markdown artifacts are stored as a self-contained HTML render
+    scaffold with the original source base64-embedded in a fixed
+    `<script id="md-src">` tag. Returns the decoded Markdown, or None if
+    the tag is absent / undecodable (legacy object or a future template
+    change) so the caller can fall back to the raw bytes."""
+    match = _MD_SRC_RE.search(html_body)
+    if not match:
+        return None
+    try:
+        return base64.b64decode(match.group("b64")).decode("utf-8")
+    except (ValueError, UnicodeDecodeError):
+        return None
+
+
+class ArtifactContentService:
+    """Return one artifact version's raw source for the panel code view.
+
+    Ownership is enforced by the PK lookup. For Markdown the stored S3
+    object is a rendered HTML wrapper; we unwrap it back to the authored
+    Markdown so code view shows what the model actually wrote, and
+    normalize `content_type` to `text/markdown` to match. Anything that
+    can't be unwrapped falls back to the raw stored bytes + real type so
+    the view still shows something truthful instead of erroring."""
+
+    def get(
+        self, *, user_id: str, artifact_id: str, version: int
+    ) -> tuple[str, str]:
+        bucket = _bucket_name()
+        item = _get_version_item(user_id, artifact_id, version)
+        content_key = item.get("content_key")
+        stored_type = item.get(
+            "content_type", "text/html; charset=utf-8"
+        )
+        if not content_key:
+            raise ArtifactNotFoundError("artifact has no stored content")
+
+        try:
+            obj = _s3().get_object(Bucket=bucket, Key=content_key)
+        except ClientError as exc:
+            code = exc.response.get("Error", {}).get("Code", "")
+            if code in ("NoSuchKey", "NoSuchBucket", "404"):
+                raise ArtifactNotFoundError(
+                    "artifact content not found"
+                ) from exc
+            raise ArtifactQueryError(
+                "artifact content fetch failed"
+            ) from exc
+
+        if obj.get("ContentLength", 0) > _MAX_CONTENT_BYTES:
+            raise ArtifactTooLargeError("artifact too large for code view")
+
+        raw = obj["Body"].read(_MAX_CONTENT_BYTES + 1)
+        if len(raw) > _MAX_CONTENT_BYTES:
+            raise ArtifactTooLargeError("artifact too large for code view")
+        body = raw.decode("utf-8", errors="replace")
+
+        if _is_markdown(stored_type):
+            unwrapped = _unwrap_markdown(body)
+            if unwrapped is not None:
+                return unwrapped, "text/markdown"
+        return body, stored_type
+
+
+def get_artifact_content_service() -> ArtifactContentService:
+    return ArtifactContentService()
diff --git a/backend/src/apis/app_api/chat/proxy_routes.py b/backend/src/apis/app_api/chat/proxy_routes.py
index dfe2e18d..b01d036c 100644
--- a/backend/src/apis/app_api/chat/proxy_routes.py
+++ b/backend/src/apis/app_api/chat/proxy_routes.py
@@ -1,7 +1,7 @@
 """BFF chat proxy — forwards browser SSE chat requests to inference-api.
 
-`POST /chat/stream` is the cookie-authenticated counterpart to the SPA's
-current direct-to-inference-api chat call. The flow:
+`POST /chat/stream` is the cookie-authenticated chat path for the SPA.
+The flow:
 
   Browser  → CloudFront `/api/*`  → app-api  → inference-api `/invocations`
            (httpOnly session cookie)         (Authorization: Bearer <token>)
@@ -9,13 +9,8 @@
 `SessionRefreshMiddleware` resolves the cookie and, if the stored Cognito
 access token is near expiry, refreshes it before this handler runs. The
 handler then forwards `current_user.raw_token` — the freshly-validated
-access token — to inference-api, which already accepts Cognito Bearer
-tokens via `get_current_user_trusted` on `/invocations`. No inference-api
-changes needed (architecture decision #4 in the BFF migration plan).
-
-The legacy in-process Bearer agent route that previously owned `/chat/stream`
-was renamed to `/chat/agent-stream` in the Phase 6 cutover. The Phase 4
-`/chat/proxy-stream` rolling-deploy alias was deleted in Phase 7.
+access token — to inference-api, which accepts Cognito Bearer tokens via
+`get_current_user_trusted` on `/invocations`.
 """
 
 from __future__ import annotations
diff --git a/backend/src/apis/app_api/chat/routes.py b/backend/src/apis/app_api/chat/routes.py
index 9864f8ff..563af3fb 100644
--- a/backend/src/apis/app_api/chat/routes.py
+++ b/backend/src/apis/app_api/chat/routes.py
@@ -1,60 +1,26 @@
 """Chat feature routes
 
 Application-specific chat endpoints moved from inference_api to keep
-AgentCore Runtime API clean. These endpoints handle:
-- Conversation title generation
-- In-process agent streaming (`POST /chat/agent-stream`, Bearer auth)
-- Multimodal chat input
+the AgentCore Runtime API clean. This module currently provides:
+- Conversation title generation (`POST /chat/generate-title`)
 
-`/chat/agent-stream` was named `/chat/stream` until the Phase 6 BFF
-cutover, which reclaimed that path for the cookie-authenticated proxy
-to inference-api. Bearer-authenticated callers (API-key tooling, scripts)
-must update to the new path.
+The browser-facing streaming chat path is the cookie-authenticated BFF
+proxy at `POST /chat/stream` (see `proxy_routes.py`).
 """
 
-import asyncio
-import json
 import logging
 
-from fastapi import APIRouter, Depends, HTTPException
-from fastapi.responses import StreamingResponse
+from fastapi import APIRouter, Depends
 
-from agents.main_agent.session.session_factory import SessionFactory
-from agents.main_agent.session.preview_session_manager import is_preview_session
-from apis.app_api.admin.services import get_tool_access_service
-from apis.shared.assistants.service import assistant_exists, get_assistant_with_access_check, mark_share_as_interacted
-from apis.shared.assistants.rag_service import augment_prompt_with_context, search_assistant_knowledgebase_with_formatting
-from apis.shared.files.file_resolver import get_file_resolver
-from apis.shared.sessions.models import SessionMetadata, SessionPreferences
-from apis.shared.sessions.messages import get_messages
-from apis.shared.sessions.metadata import get_session_metadata, store_session_metadata
-
-# Import models and services from inference_api (shared code)
-from apis.inference_api.chat.models import ChatEvent, ChatRequest, FileContent, GenerateTitleRequest, GenerateTitleResponse
-from apis.inference_api.chat.routes import stream_conversational_message
-from apis.inference_api.chat.service import generate_conversation_title, get_agent
-from apis.shared.auth.dependencies import get_current_user, get_current_user_from_session
+from apis.inference_api.chat.models import GenerateTitleRequest, GenerateTitleResponse
+from apis.inference_api.chat.service import generate_conversation_title
+from apis.shared.auth.dependencies import get_current_user_from_session
 from apis.shared.auth.models import User
-from apis.shared.errors import (
-    ErrorCode,
-    build_conversational_error_event,
-)
-from apis.shared.quota import (
-    build_no_quota_configured_event,
-    build_quota_exceeded_event,
-    build_quota_warning_event,
-    get_quota_checker,
-    is_quota_enforcement_enabled,
-)
 
 logger = logging.getLogger(__name__)
 
 router = APIRouter(prefix="/chat", tags=["chat"])
 
-# Stream timeout configuration (in seconds)
-# Prevents hanging streams and resource exhaustion
-STREAM_TIMEOUT_SECONDS = 600  # 10 minutes
-
 
 @router.post("/generate-title")
 async def generate_title(request: GenerateTitleRequest, current_user: User = Depends(get_current_user_from_session)):
@@ -66,7 +32,7 @@ async def generate_title(request: GenerateTitleRequest, current_user: User = Dep
     to be called in parallel with the first chat request.
 
     The endpoint:
-    - Uses JWT authentication to extract user_id
+    - Uses cookie session auth to extract user_id
     - Truncates input to ~500 tokens for speed and cost efficiency
     - Calls Nova Micro with temperature=0.3 for consistent output
     - Updates session metadata both locally and in cloud
@@ -74,7 +40,7 @@ async def generate_title(request: GenerateTitleRequest, current_user: User = Dep
 
     Args:
         request: GenerateTitleRequest with session_id and user input
-        current_user: User from JWT token (injected by dependency)
+        current_user: User from session cookie (injected by dependency)
 
     Returns:
         GenerateTitleResponse with generated title and session_id
@@ -93,532 +59,3 @@ async def generate_title(request: GenerateTitleRequest, current_user: User = Dep
         # Return fallback instead of raising exception
         # Title generation failures shouldn't break the user experience
         return GenerateTitleResponse(title="New Conversation", session_id=request.session_id)
-
-
-@router.post("/agent-stream")
-async def chat_agent_stream(request: ChatRequest, current_user: User = Depends(get_current_user)):
-    """
-    Bearer-authenticated in-process agent stream.
-
-    Runs the agent loop inside this app-api process and streams the SSE
-    response back. Distinct from `/chat/stream` (Phase 6 BFF cookie proxy
-    to inference-api) and `/chat/api-converse` (X-API-Key proxy). This
-    endpoint was previously registered at `/chat/stream`; it was moved to
-    `/chat/agent-stream` in the Phase 6 BFF cutover so the shorter path
-    could be reclaimed for the browser-facing cookie route.
-
-    Uses default tools (all available) if enabled_tools not specified.
-    Uses the authenticated user's ID from the JWT token.
-
-    Tool authorization:
-    - Filters requested tools to only those the user is allowed to use via AppRoles
-    - If user has no tool permissions, agent runs without tools
-
-    Quota enforcement (when enabled via ENABLE_QUOTA_ENFORCEMENT=true):
-    - Checks user quota before processing
-    - Streams quota_exceeded as assistant message if quota exceeded (better UX)
-    - Injects quota_warning event into stream if approaching limit
-    """
-    user_id = current_user.user_id
-    logger.info(f"Legacy chat request - Session: {request.session_id}, User: {user_id}, Message: {request.message[:50]}...")
-
-    # Filter tools based on user permissions (RBAC)
-    authorized_tools = request.enabled_tools
-    try:
-        tool_access_service = get_tool_access_service()
-        authorized_tools, denied_tools = await tool_access_service.check_access_and_filter(
-            user=current_user,
-            requested_tools=request.enabled_tools,
-            strict=False,  # Don't fail, just filter
-        )
-        if denied_tools:
-            logger.info(f"Filtered out unauthorized tools for user {user_id}: {denied_tools}")
-    except Exception as e:
-        # Log error but don't block request - fail open for RBAC errors
-        logger.error(f"Error filtering tools for user {user_id}: {e}", exc_info=True)
-
-    # Check quota if enforcement is enabled
-    quota_warning_event = None
-    quota_exceeded_event = None
-    if is_quota_enforcement_enabled():
-        try:
-            quota_checker = get_quota_checker()
-            quota_result = await quota_checker.check_quota(user=current_user, session_id=request.session_id)
-
-            if not quota_result.allowed:
-                # Quota blocked - stream as SSE instead of 429 for better UX
-                logger.warning(f"Quota blocked for user {user_id}: {quota_result.message}")
-                if quota_result.tier is None:
-                    # No quota tier configured for this user
-                    quota_exceeded_event = build_no_quota_configured_event(quota_result)
-                else:
-                    # Quota limit exceeded
-                    quota_exceeded_event = build_quota_exceeded_event(quota_result)
-            else:
-                # Check for warning level
-                quota_warning_event = build_quota_warning_event(quota_result)
-                if quota_warning_event:
-                    logger.info(f"Quota warning for user {user_id}: {quota_result.warning_level}")
-
-        except Exception as e:
-            # Log error but don't block request - fail open for quota errors
-            logger.error(f"Error checking quota for user {user_id}: {e}", exc_info=True)
-
-    # If quota exceeded, stream the quota exceeded message instead of agent response
-    if quota_exceeded_event:
-        return StreamingResponse(
-            stream_conversational_message(
-                message=quota_exceeded_event.message,
-                stop_reason="quota_exceeded",
-                metadata_event=quota_exceeded_event,
-                session_id=request.session_id,
-                user_id=user_id,
-                user_input=request.message,
-            ),
-            media_type="text/event-stream",
-            headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no", "X-Session-ID": request.session_id},
-        )
-
-    # Handle assistant RAG integration if assistant_id is provided
-    assistant = None
-    augmented_message = request.message
-    system_prompt = None
-    context_chunks = []  # RAG context chunks for citation events
-
-    # Get assistant_id from request or session preferences (priority: request > preferences)
-    assistant_id_to_use = request.assistant_id
-    if not assistant_id_to_use:
-        # Check session preferences for persisted assistant
-        try:
-            existing_metadata = await get_session_metadata(request.session_id, user_id)
-            if existing_metadata and existing_metadata.preferences:
-                assistant_id_to_use = existing_metadata.preferences.assistant_id
-                if assistant_id_to_use:
-                    logger.info(f"Using persisted assistant {assistant_id_to_use} from session preferences")
-        except Exception as e:
-            logger.error(f"Error checking session preferences for assistant: {e}", exc_info=True)
-            # Continue without assistant if metadata check fails
-
-    logger.info(f"Chat request received - Session: {request.session_id}, Assistant ID: {assistant_id_to_use}, Message: {request.message[:50]}...")
-
-    if assistant_id_to_use:
-        logger.info(f"Assistant RAG requested - Assistant: {assistant_id_to_use}, Session: {request.session_id}")
-
-        # 1. Check if session already has an assistant attached
-        # If it does, verify it's the same assistant (can't change assistants mid-session)
-        # If it doesn't, verify session has no messages (can only attach to new sessions)
-        try:
-            existing_metadata = await get_session_metadata(request.session_id, user_id)
-            existing_assistant_id = existing_metadata.preferences.assistant_id if existing_metadata and existing_metadata.preferences else None
-
-            if existing_assistant_id:
-                # Session already has an assistant - verify it's the same one (if request provided one)
-                if request.assistant_id and existing_assistant_id != request.assistant_id:
-                    logger.warning(
-                        f"Attempted to change assistant from {existing_assistant_id} to {request.assistant_id} in session {request.session_id}"
-                    )
-                    raise HTTPException(
-                        status_code=400, detail="Cannot change assistants mid-session. Start a new session to use a different assistant."
-                    )
-                # Same assistant or using persisted one - allow it to continue
-                logger.info(f"Continuing with existing assistant {assistant_id_to_use} in session {request.session_id}")
-            else:
-                # No assistant attached - verify session has no messages (can only attach to new sessions)
-                # Only check if this is a new attachment (from request, not from preferences)
-                if request.assistant_id:
-                    messages_response = await get_messages(
-                        session_id=request.session_id,
-                        user_id=user_id,
-                        limit=1,  # Only need to check if any messages exist
-                    )
-                    if messages_response.messages and len(messages_response.messages) > 0:
-                        logger.warning(f"Attempted to attach assistant {request.assistant_id} to session {request.session_id} with existing messages")
-                        raise HTTPException(
-                            status_code=400, detail="Assistants can only be attached to new sessions, start a new session to chat with this assistant"
-                        )
-        except HTTPException:
-            raise
-        except Exception as e:
-            logger.error(f"Error checking session state: {e}", exc_info=True)
-            # Continue anyway - better to allow than block on error
-
-        # 2. Load assistant with access check
-        # First check if assistant exists (without access check) to distinguish 404 from 403
-        exists = await assistant_exists(assistant_id_to_use)
-
-        if not exists:
-            # Assistant doesn't exist
-            logger.warning(f"Assistant {assistant_id_to_use} not found for user {user_id}")
-            raise HTTPException(status_code=404, detail=f"Assistant not found: {assistant_id_to_use}")
-
-        # Assistant exists, now check access
-        assistant = await get_assistant_with_access_check(assistant_id=assistant_id_to_use, user_id=user_id, user_email=current_user.email)
-
-        if not assistant:
-            # Assistant exists but access denied (PRIVATE and not owner)
-            logger.warning(f"Access denied: user {user_id} attempted to access PRIVATE assistant {assistant_id_to_use}")
-            raise HTTPException(status_code=403, detail=f"Access denied: You don't have permission to use this assistant")
-
-        # Mark as viewed if this is a shared assistant (not owned)
-        if assistant.owner_id != user_id:
-            await mark_share_as_interacted(assistant_id=assistant_id_to_use, user_email=current_user.email)
-
-        # 3. Search assistant knowledge base
-        try:
-            logger.info(f"Searching knowledge base for assistant {assistant_id_to_use} with query: {request.message[:100]}...")
-            context_chunks = await search_assistant_knowledgebase_with_formatting(assistant_id=assistant_id_to_use, query=request.message, top_k=5)
-            logger.info(f"Knowledge base search returned {len(context_chunks) if context_chunks else 0} chunks")
-
-            # 4. Augment message with context
-            if context_chunks:
-                augmented_message = augment_prompt_with_context(user_message=request.message, context_chunks=context_chunks)
-                logger.info(
-                    f"✅ Augmented message with {len(context_chunks)} context chunks. Original length: {len(request.message)}, Augmented length: {len(augmented_message)}"
-                )
-            else:
-                logger.info(f"⚠️ No context chunks found for assistant {assistant_id_to_use} - using original message without augmentation")
-        except Exception as e:
-            logger.error(f"❌ Error searching assistant knowledge base: {e}", exc_info=True)
-            # Continue without RAG context rather than failing
-
-        # 5. Append assistant's instructions to the base system prompt (don't replace)
-        if assistant.instructions:
-            from agents.main_agent.core.system_prompt_builder import SystemPromptBuilder
-
-            # Build the base prompt with date
-            base_prompt_builder = SystemPromptBuilder()
-            base_prompt = base_prompt_builder.build(include_date=True)
-
-            # Append assistant instructions to the base prompt
-            system_prompt = f"{base_prompt}\n\n## Assistant-Specific Instructions\n\n{assistant.instructions}"
-            logger.info(
-                f"✅ Appended assistant instructions to base system prompt (base: {len(base_prompt)}, assistant: {len(assistant.instructions)}, total: {len(system_prompt)})"
-            )
-        else:
-            # No assistant instructions - use base prompt if no system_prompt provided
-            if not system_prompt:
-                from agents.main_agent.core.system_prompt_builder import SystemPromptBuilder
-
-                base_prompt_builder = SystemPromptBuilder()
-                system_prompt = base_prompt_builder.build(include_date=True)
-            logger.info(f"⚠️ Assistant {assistant_id_to_use} has no instructions - using {'provided' if system_prompt else 'default'} system prompt")
-
-        # 6. Save assistant_id to session preferences (persist for future loads)
-        # Only save if it came from the request (not already persisted)
-        # Skip for preview sessions - they should not persist metadata
-        if request.assistant_id and not is_preview_session(request.session_id):
-            try:
-                existing_metadata = await get_session_metadata(request.session_id, user_id)
-                if existing_metadata:
-                    # Update existing metadata with assistant_id in preferences
-                    prefs_dict = existing_metadata.preferences.model_dump(by_alias=False) if existing_metadata.preferences else {}
-                    prefs_dict["assistant_id"] = assistant_id_to_use
-                    preferences = SessionPreferences(**prefs_dict)
-
-                    updated_metadata = SessionMetadata(
-                        session_id=existing_metadata.session_id,
-                        user_id=existing_metadata.user_id,
-                        title=existing_metadata.title,
-                        status=existing_metadata.status,
-                        created_at=existing_metadata.created_at,
-                        last_message_at=existing_metadata.last_message_at,
-                        message_count=existing_metadata.message_count,
-                        starred=existing_metadata.starred,
-                        tags=existing_metadata.tags,
-                        preferences=preferences,
-                    )
-                else:
-                    # Create new metadata with assistant_id in preferences
-                    from datetime import datetime, timezone
-
-                    now = datetime.now(timezone.utc).isoformat()
-                    preferences = SessionPreferences(assistant_id=assistant_id_to_use)
-
-                    updated_metadata = SessionMetadata(
-                        session_id=request.session_id,
-                        user_id=user_id,
-                        title="New Conversation",
-                        status="active",
-                        created_at=now,
-                        last_message_at=now,
-                        message_count=0,  # Will be updated by stream coordinator
-                        starred=False,
-                        tags=[],
-                        preferences=preferences,
-                    )
-
-                await store_session_metadata(session_id=request.session_id, user_id=user_id, session_metadata=updated_metadata)
-                logger.info(f"💾 Saved assistant_id {assistant_id_to_use} to session {request.session_id} preferences")
-            except Exception as e:
-                logger.error(f"Failed to save assistant_id to session preferences: {e}", exc_info=True)
-                # Don't fail the request if metadata save fails
-
-    # Resolve file upload IDs to FileContent objects
-    files_to_send = list(request.files) if request.files else []
-
-    if request.file_upload_ids:
-        logger.info(f"Resolving {len(request.file_upload_ids)} file upload IDs")
-        try:
-            file_resolver = get_file_resolver()
-            resolved_files = await file_resolver.resolve_files(
-                user_id=user_id,
-                upload_ids=request.file_upload_ids,
-                max_files=5,  # Bedrock document limit
-            )
-            # Convert ResolvedFileContent to FileContent
-            for rf in resolved_files:
-                files_to_send.append(FileContent(filename=rf.filename, content_type=rf.content_type, bytes=rf.bytes))
-            logger.info(f"Resolved {len(resolved_files)} files from upload IDs")
-        except Exception as e:
-            logger.warning(f"Failed to resolve file upload IDs: {e}")
-            # Continue without files rather than failing the request
-
-    try:
-        # Get agent instance (with or without tool filtering)
-        # Use assistant's system prompt if provided
-        agent = await get_agent(
-            session_id=request.session_id,
-            user_id=user_id,
-            enabled_tools=authorized_tools,  # Filtered by RBAC (may be None for all allowed)
-            system_prompt=system_prompt,  # Assistant instructions if assistant is attached
-        )
-
-        # Wrap stream to ensure flush on disconnect and prevent further processing
-        async def stream_with_cleanup():
-            # Yield quota warning event first if applicable
-            if quota_warning_event:
-                yield quota_warning_event.to_sse_format()
-
-            # Yield citation events BEFORE the agent stream starts
-            # This allows the UI to display sources immediately
-            if context_chunks:
-                for chunk in context_chunks:
-                    citation_event = {
-                        "type": "citation",
-                        "assistantId": assistant_id_to_use,
-                        "documentId": chunk.get("metadata", {}).get("document_id", ""),
-                        "fileName": chunk.get("metadata", {}).get("source", "Unknown Source"),
-                        "text": chunk.get("text", "")[:500],  # Limit excerpt length
-                    }
-                    yield f"event: citation\ndata: {json.dumps(citation_event)}\n\n"
-
-            # Pass resolved files (from S3) merged with any direct file content
-            # Use augmented message if assistant RAG was applied
-            #
-            # Always store the original user message as displayText when the prompt
-            # will be modified before reaching the model. This happens when:
-            #   1. RAG augmentation prepends context chunks to the message
-            #   2. File attachments cause PromptBuilder to rewrite into ContentBlocks
-            # The original text becomes the single source of truth for UI display,
-            # while the full augmented prompt stays in AgentCore Memory for the LLM.
-            message_will_be_modified = (
-                augmented_message != request.message  # RAG augmentation
-                or bool(files_to_send)                # File attachments
-            )
-            stream_iterator = agent.stream_async(
-                augmented_message,
-                session_id=request.session_id,
-                files=files_to_send if files_to_send else None,
-                original_message=request.message if message_will_be_modified else None,
-            )
-
-            try:
-                # Add timeout to prevent hanging streams
-                async with asyncio.timeout(STREAM_TIMEOUT_SECONDS):
-                    async for event in stream_iterator:
-                        yield event
-
-            except asyncio.TimeoutError:
-                # Stream exceeded timeout - send as conversational message
-                logger.error(f"⏱️ Stream timeout ({STREAM_TIMEOUT_SECONDS}s) for session {request.session_id}")
-
-                # Build conversational timeout error
-                timeout_error = Exception(f"Stream processing time exceeded {STREAM_TIMEOUT_SECONDS} seconds")
-                error_event = build_conversational_error_event(
-                    code=ErrorCode.TIMEOUT, error=timeout_error, session_id=request.session_id, recoverable=True
-                )
-
-                # Stream timeout error as assistant message
-                yield f'event: message_start\ndata: {{"role": "assistant"}}\n\n'
-                yield f'event: content_block_start\ndata: {{"contentBlockIndex": 0, "type": "text"}}\n\n'
-                yield f"event: content_block_delta\ndata: {json.dumps({'contentBlockIndex': 0, 'type': 'text', 'text': error_event.message})}\n\n"
-
-                yield f'event: content_block_stop\ndata: {{"contentBlockIndex": 0}}\n\n'
-                yield f'event: message_stop\ndata: {{"stopReason": "error"}}\n\n'
-                yield error_event.to_sse_format()
-                yield "event: done\ndata: {}\n\n"
-
-                # Persist timeout error to session
-                try:
-                    from strands.types.content import Message
-                    from strands.types.session import SessionMessage
-
-                    session_manager = SessionFactory.create_session_manager(session_id=request.session_id, user_id=user_id, caching_enabled=False)
-
-                    user_msg: Message = {"role": "user", "content": [{"text": request.message}]}
-                    assistant_msg: Message = {"role": "assistant", "content": [{"text": error_event.message}]}
-
-                    if hasattr(session_manager, "base_manager") and hasattr(session_manager.base_manager, "create_message"):
-                        user_session_msg = SessionMessage.from_message(user_msg, 0)
-                        assistant_session_msg = SessionMessage.from_message(assistant_msg, 1)
-                        session_manager.base_manager.create_message(request.session_id, "default", user_session_msg)
-                        session_manager.base_manager.create_message(request.session_id, "default", assistant_session_msg)
-                        logger.info(f"💾 Saved timeout error messages to session {request.session_id}")
-                except Exception as persist_error:
-                    logger.error(f"Failed to persist timeout error to session: {persist_error}")
-
-                return
-
-            except asyncio.CancelledError:
-                # Client disconnected (e.g., stop button clicked)
-                logger.warning(f"⚠️ Client disconnected during streaming for session {request.session_id}")
-
-                # Mark session manager as cancelled to prevent further tool execution
-                if hasattr(agent.session_manager, "cancelled"):
-                    agent.session_manager.cancelled = True
-                    logger.info(f"🚫 Session manager marked as cancelled - will ignore further messages")
-
-                # Add final assistant message with stop reason
-                stop_message = {"role": "assistant", "content": [{"text": "Session stopped by user"}]}
-                if hasattr(agent.session_manager, "pending_messages"):
-                    agent.session_manager.pending_messages.append(stop_message)
-                    logger.info(f"📝 Added stop message to pending buffer")
-
-                # Re-raise to properly close the connection
-                raise
-
-            except Exception as e:
-                # Log unexpected errors and send to client as conversational message
-                logger.error(f"Error during streaming for session {request.session_id}: {e}", exc_info=True)
-
-                # Build conversational error for better UX
-                error_event = build_conversational_error_event(code=ErrorCode.STREAM_ERROR, error=e, session_id=request.session_id, recoverable=True)
-
-                # Stream error as assistant message
-                yield f'event: message_start\ndata: {{"role": "assistant"}}\n\n'
-                yield f'event: content_block_start\ndata: {{"contentBlockIndex": 0, "type": "text"}}\n\n'
-                yield f"event: content_block_delta\ndata: {json.dumps({'contentBlockIndex': 0, 'type': 'text', 'text': error_event.message})}\n\n"
-
-                yield f'event: content_block_stop\ndata: {{"contentBlockIndex": 0}}\n\n'
-                yield f'event: message_stop\ndata: {{"stopReason": "error"}}\n\n'
-                yield error_event.to_sse_format()
-                yield "event: done\ndata: {}\n\n"
-
-                # Persist error messages to session
-                try:
-                    from strands.types.content import Message
-                    from strands.types.session import SessionMessage
-
-                    session_manager = SessionFactory.create_session_manager(session_id=request.session_id, user_id=user_id, caching_enabled=False)
-
-                    user_msg: Message = {"role": "user", "content": [{"text": request.message}]}
-                    assistant_msg: Message = {"role": "assistant", "content": [{"text": error_event.message}]}
-
-                    if hasattr(session_manager, "base_manager") and hasattr(session_manager.base_manager, "create_message"):
-                        user_session_msg = SessionMessage.from_message(user_msg, 0)
-                        assistant_session_msg = SessionMessage.from_message(assistant_msg, 1)
-                        session_manager.base_manager.create_message(request.session_id, "default", user_session_msg)
-                        session_manager.base_manager.create_message(request.session_id, "default", assistant_session_msg)
-                        logger.info(f"💾 Saved stream error messages to session {request.session_id}")
-                except Exception as persist_error:
-                    logger.error(f"Failed to persist stream error to session: {persist_error}")
-
-                # Don't re-raise - we've handled the error gracefully
-                return
-
-            finally:
-                # Cleanup: Flush buffered messages and close stream iterator
-                # This runs on both success and error paths
-                if hasattr(agent.session_manager, "flush"):
-                    try:
-                        agent.session_manager.flush()
-                        logger.info(f"💾 Flushed buffered messages for session {request.session_id}")
-                    except Exception as flush_error:
-                        logger.error(f"Failed to flush session {request.session_id}: {flush_error}")
-
-                # Close the stream iterator if possible
-                if hasattr(stream_iterator, "aclose"):
-                    try:
-                        await stream_iterator.aclose()
-                    except Exception as close_error:
-                        logger.debug(f"Failed to close stream iterator: {close_error}")
-
-        # Stream response from agent
-        return StreamingResponse(
-            stream_with_cleanup(),
-            media_type="text/event-stream",
-            headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no", "X-Session-ID": request.session_id},
-        )
-
-    except HTTPException:
-        # Re-raise HTTP exceptions as-is (e.g., from auth)
-        raise
-    except Exception as e:
-        # Stream error as a conversational assistant message for better UX
-        logger.error(f"Error in chat_stream: {e}", exc_info=True)
-
-        error_event = build_conversational_error_event(code=ErrorCode.STREAM_ERROR, error=e, session_id=request.session_id, recoverable=True)
-
-        return StreamingResponse(
-            stream_conversational_message(
-                message=error_event.message,
-                stop_reason="error",
-                metadata_event=error_event,
-                session_id=request.session_id,
-                user_id=user_id,
-                user_input=request.message,
-            ),
-            media_type="text/event-stream",
-            headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no", "X-Session-ID": request.session_id},
-        )
-
-
-@router.post("/multimodal")
-async def chat_multimodal(request: ChatRequest, current_user: User = Depends(get_current_user)):
-    """
-    Stream chat response with multimodal input (files)
-
-    For now, just echoes the message and mentions files.
-    Will be replaced with actual Strands Agent execution.
-    Uses the authenticated user's ID from the JWT token.
-    """
-    user_id = current_user.user_id
-    logger.info(f"Multimodal chat request - Session: {request.session_id}, User: {user_id}")
-    logger.info(f"Message: {request.message[:50]}...")
-    if request.files:
-        logger.info(f"Files: {len(request.files)} uploaded")
-        for file in request.files:
-            logger.info(f"  - {file.filename} ({file.content_type})")
-
-    async def event_generator():
-        try:
-            # Send init event
-            event = ChatEvent(
-                type="init",
-                content="Processing multimodal input",
-                metadata={"session_id": request.session_id, "file_count": len(request.files or [])},
-            )
-            yield f"data: {event.to_json()}\n\n"
-            await asyncio.sleep(0.2)
-
-            # Echo message
-            response_text = f"Received message: '{request.message}'"
-            if request.files:
-                response_text += f" and {len(request.files)} file(s): "
-                response_text += ", ".join([f.filename for f in request.files])
-
-            for word in response_text.split():
-                event = ChatEvent(type="text", content=word + " ")
-                yield f"data: {event.to_json()}\n\n"
-                await asyncio.sleep(0.05)
-
-            # Complete
-            event = ChatEvent(type="complete", content="Multimodal processing complete")
-            yield f"data: {event.to_json()}\n\n"
-
-        except Exception as e:
-            logger.error(f"Error in multimodal event_generator: {e}")
-            error_event = ChatEvent(type="error", content=str(e))
-            yield f"data: {error_event.to_json()}\n\n"
-
-    return StreamingResponse(event_generator(), media_type="text/event-stream", headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"})
diff --git a/backend/src/apis/app_api/connectors/routes.py b/backend/src/apis/app_api/connectors/routes.py
index b5360bc5..20737efc 100644
--- a/backend/src/apis/app_api/connectors/routes.py
+++ b/backend/src/apis/app_api/connectors/routes.py
@@ -356,8 +356,8 @@ async def complete_consent(
 
     Returns `ok: true` on success; errors from AgentCore bubble up as 502.
 
-    Authorization: the inbound JWT (`current_user`) is verified by
-    `get_current_user`, and we pass that user's id as `userIdentifier` to
+    Authorization: the inbound session (`current_user`) is verified by
+    `get_current_user_from_session`, and we pass that user's id as `userIdentifier` to
     AgentCore. AgentCore's own binding rejects a completion attempt whose
     `userIdentifier` doesn't match the identity that initiated the session,
     so a leaked `session_uri` cannot be redeemed under a different user.
diff --git a/backend/src/apis/app_api/files/routes.py b/backend/src/apis/app_api/files/routes.py
index 44119dc7..ef115b97 100644
--- a/backend/src/apis/app_api/files/routes.py
+++ b/backend/src/apis/app_api/files/routes.py
@@ -16,6 +16,9 @@
     PresignRequest,
     PresignResponse,
     CompleteUploadResponse,
+    PreviewUrlResponse,
+    TextSnippetResponse,
+    ThumbnailResponse,
     FileListResponse,
     QuotaResponse,
     QuotaExceededError as QuotaExceededModel,
@@ -30,6 +33,7 @@
     FileNotFoundError,
     FileUploadError,
 )
+from .thumbnails import ThumbnailRenderError, ThumbnailUnsupportedError
 
 logger = logging.getLogger(__name__)
 
@@ -136,6 +140,91 @@ async def complete_upload(
         )
 
 
+@router.get("/{upload_id}/preview-url", response_model=PreviewUrlResponse)
+async def get_preview_url(
+    upload_id: str,
+    user: User = Depends(get_current_user_from_session),
+    service: FileUploadService = Depends(get_file_upload_service),
+):
+    """
+    Generate a short-lived presigned GET URL for an uploaded file.
+
+    Used by the UI to render image previews inline and to open files in a
+    lightbox. The URL is scoped to the file owner and expires after a few
+    minutes.
+    """
+    try:
+        return await service.get_preview_url(user.user_id, upload_id)
+    except FileNotFoundError:
+        raise HTTPException(
+            status_code=status.HTTP_404_NOT_FOUND,
+            detail=f"File {upload_id} not found or not owned by you",
+        )
+
+
+@router.get("/{upload_id}/text-snippet", response_model=TextSnippetResponse)
+async def get_text_snippet(
+    upload_id: str,
+    user: User = Depends(get_current_user_from_session),
+    service: FileUploadService = Depends(get_file_upload_service),
+):
+    """
+    Return a short UTF-8 text excerpt from the start of a file.
+
+    Used by the UI to render a content peek inside the document-style
+    attachment card for text-based files (txt, md, csv, html). Returns an
+    empty snippet for non-text MIME types so the UI can fall back to a
+    skeleton mockup.
+    """
+    try:
+        return await service.get_text_snippet(user.user_id, upload_id)
+    except FileNotFoundError:
+        raise HTTPException(
+            status_code=status.HTTP_404_NOT_FOUND,
+            detail=f"File {upload_id} not found or not owned by you",
+        )
+
+
+@router.get("/{upload_id}/thumbnail", response_model=ThumbnailResponse)
+async def get_thumbnail(
+    upload_id: str,
+    user: User = Depends(get_current_user_from_session),
+    service: FileUploadService = Depends(get_file_upload_service),
+):
+    """
+    Return a presigned URL for a PNG thumbnail of the file's first page.
+
+    Lazy-renders on first request and caches the resulting `_thumb.png`
+    sibling object next to the original. Subsequent calls hit the cache and
+    return immediately.
+
+    Status codes:
+    - 200: Thumbnail available (response body indicates `cached`).
+    - 404: File not found or not owned by the caller.
+    - 415: MIME type has no thumbnail renderer (UI should fall back to its
+           skeleton card).
+    - 422: File present but unrenderable (corrupt, encrypted, empty PDF, ...).
+    """
+    try:
+        return await service.get_or_create_thumbnail(user.user_id, upload_id)
+    except FileNotFoundError:
+        raise HTTPException(
+            status_code=status.HTTP_404_NOT_FOUND,
+            detail=f"File {upload_id} not found or not owned by you",
+        )
+    except ThumbnailUnsupportedError as e:
+        raise HTTPException(
+            status_code=status.HTTP_415_UNSUPPORTED_MEDIA_TYPE,
+            detail=str(e),
+        )
+    except ThumbnailRenderError as e:
+        logger.warning(f"Thumbnail render failed for {upload_id}: {e}")
+        raise HTTPException(
+            status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
+            detail="Could not render a thumbnail for this file",
+        )
+
+
 # =============================================================================
 # File Management Endpoints
 # =============================================================================
diff --git a/backend/src/apis/app_api/files/service.py b/backend/src/apis/app_api/files/service.py
index eeec0935..b837d26c 100644
--- a/backend/src/apis/app_api/files/service.py
+++ b/backend/src/apis/app_api/files/service.py
@@ -4,6 +4,7 @@
 Business logic for file uploads with S3 pre-signed URLs and quota management.
 """
 
+import asyncio
 import os
 import logging
 import uuid
@@ -20,12 +21,34 @@
     PresignRequest,
     PresignResponse,
     CompleteUploadResponse,
+    PreviewUrlResponse,
+    TextSnippetResponse,
+    ThumbnailResponse,
+    THUMBNAIL_SUPPORTED_MIME_TYPES,
     FileResponse,
     FileListResponse,
     QuotaResponse,
     is_allowed_mime_type,
     ALLOWED_MIME_TYPES,
 )
+from .thumbnails import (
+    ThumbnailRenderer,
+    ThumbnailRenderError,
+    ThumbnailUnsupportedError,
+    get_thumbnail_renderer,
+)
+
+
+# MIME types that are safe to decode as UTF-8 text for inline previews.
+TEXT_PREVIEW_MIME_TYPES = frozenset(
+    {
+        "text/plain",
+        "text/markdown",
+        "text/csv",
+        "text/html",
+    }
+)
+TEXT_SNIPPET_MAX_BYTES = 2048
 from apis.shared.files.repository import FileUploadRepository, get_file_upload_repository
 
 logger = logging.getLogger(__name__)
@@ -89,6 +112,10 @@ class FileUploadService:
     - File listing and deletion
     """
 
+    # Sibling key, in the same per-upload S3 "folder" as the original.
+    # Stored alongside the original so cleanup happens with the file.
+    THUMBNAIL_KEY_NAME = "_thumb.png"
+
     def __init__(
         self,
         repository: Optional[FileUploadRepository] = None,
@@ -97,9 +124,11 @@ def __init__(
         max_file_size: Optional[int] = None,
         max_files_per_message: Optional[int] = None,
         user_quota_bytes: Optional[int] = None,
+        thumbnail_renderer: Optional[ThumbnailRenderer] = None,
     ):
         """Initialize with dependencies."""
         self.repository = repository or get_file_upload_repository()
+        self._thumbnail_renderer = thumbnail_renderer or get_thumbnail_renderer()
 
         # S3 configuration
         # Use region from AWS_REGION env var to ensure presigned URLs use regional endpoint
@@ -133,6 +162,7 @@ def __init__(
 
         # Pre-signed URL expiration
         self.presign_expiration = 15 * 60  # 15 minutes
+        self.preview_url_expiration = 10 * 60  # 10 minutes for GET previews
 
     # =========================================================================
     # Pre-signed URL Flow
@@ -301,6 +331,284 @@ async def complete_upload(
             size_bytes=file_meta.size_bytes,
         )
 
+    # =========================================================================
+    # Preview URL
+    # =========================================================================
+
+    async def get_preview_url(
+        self, user_id: str, upload_id: str
+    ) -> PreviewUrlResponse:
+        """
+        Generate a short-lived presigned GET URL for displaying a file.
+
+        Used by the UI to render image thumbnails inline and to power the
+        full-size lightbox. Only owners can generate preview URLs for their
+        own files. Files must be in READY state.
+
+        Args:
+            user_id: The owner's user ID
+            upload_id: The upload identifier
+
+        Returns:
+            PreviewUrlResponse with a presigned GET URL
+
+        Raises:
+            FileNotFoundError: If file not found, not owned by user, or not ready
+        """
+        file_meta = await self.repository.get_file(user_id, upload_id)
+        if not file_meta:
+            raise FileNotFoundError(f"File {upload_id} not found")
+
+        if file_meta.status != FileStatus.READY:
+            raise FileNotFoundError(
+                f"File {upload_id} is not ready (status: {file_meta.status})"
+            )
+
+        try:
+            url = self._s3_client.generate_presigned_url(
+                "get_object",
+                Params={
+                    "Bucket": self.bucket_name,
+                    "Key": file_meta.s3_key,
+                    "ResponseContentType": file_meta.mime_type,
+                    "ResponseContentDisposition": f'inline; filename="{file_meta.filename}"',
+                },
+                ExpiresIn=self.preview_url_expiration,
+            )
+        except ClientError as e:
+            logger.error(f"Failed to generate preview URL: {e}")
+            raise
+
+        expires_at = (
+            datetime.now(timezone.utc) + timedelta(seconds=self.preview_url_expiration)
+        ).isoformat().replace("+00:00", "Z")
+
+        return PreviewUrlResponse(
+            upload_id=upload_id,
+            url=url,
+            expires_at=expires_at,
+            mime_type=file_meta.mime_type,
+            filename=file_meta.filename,
+        )
+
+    async def get_text_snippet(
+        self, user_id: str, upload_id: str
+    ) -> TextSnippetResponse:
+        """
+        Return a short UTF-8 text excerpt from the start of a file.
+
+        Used by the UI to render a content peek inside the document-style
+        attachment card. Only text-based MIME types are supported; other
+        types return an empty snippet so the UI can fall back to a skeleton.
+
+        Args:
+            user_id: The owner's user ID
+            upload_id: The upload identifier
+
+        Returns:
+            TextSnippetResponse with the decoded snippet (possibly empty)
+
+        Raises:
+            FileNotFoundError: If file not found, not owned, or not ready
+        """
+        file_meta = await self.repository.get_file(user_id, upload_id)
+        if not file_meta:
+            raise FileNotFoundError(f"File {upload_id} not found")
+
+        if file_meta.status != FileStatus.READY:
+            raise FileNotFoundError(
+                f"File {upload_id} is not ready (status: {file_meta.status})"
+            )
+
+        if file_meta.mime_type not in TEXT_PREVIEW_MIME_TYPES:
+            return TextSnippetResponse(
+                upload_id=upload_id,
+                snippet="",
+                truncated=False,
+                mime_type=file_meta.mime_type,
+            )
+
+        # Read up to TEXT_SNIPPET_MAX_BYTES + 1 bytes so we can detect truncation.
+        try:
+            response = self._s3_client.get_object(
+                Bucket=self.bucket_name,
+                Key=file_meta.s3_key,
+                Range=f"bytes=0-{TEXT_SNIPPET_MAX_BYTES}",
+            )
+            body = response["Body"].read()
+        except ClientError as e:
+            logger.warning(f"Failed to read snippet for {upload_id}: {e}")
+            return TextSnippetResponse(
+                upload_id=upload_id,
+                snippet="",
+                truncated=False,
+                mime_type=file_meta.mime_type,
+            )
+
+        truncated = len(body) > TEXT_SNIPPET_MAX_BYTES
+        excerpt = body[:TEXT_SNIPPET_MAX_BYTES]
+        try:
+            text = excerpt.decode("utf-8")
+        except UnicodeDecodeError:
+            # The byte cut can split a multi-byte sequence; decode leniently with U+FFFD.
+            text = excerpt.decode("utf-8", errors="replace")
+
+        return TextSnippetResponse(
+            upload_id=upload_id,
+            snippet=text,
+            truncated=truncated,
+            mime_type=file_meta.mime_type,
+        )
+
+    # =========================================================================
+    # Thumbnails
+    # =========================================================================
+
+    def _thumbnail_s3_key(self, file_meta: FileMetadata) -> str:
+        """
+        Derive the sibling thumbnail key for an original file.
+
+        Originals live at ``user-files/{user}/{session}/{upload_id}/{filename}``,
+        thumbnails at ``user-files/{user}/{session}/{upload_id}/_thumb.png`` —
+        same parent prefix so cleanup paths can find both.
+        """
+        base, _, _ = file_meta.s3_key.rpartition("/")
+        return f"{base}/{self.THUMBNAIL_KEY_NAME}"
+
+    async def get_or_create_thumbnail(
+        self, user_id: str, upload_id: str
+    ) -> ThumbnailResponse:
+        """
+        Return a presigned URL for a PNG thumbnail of the file's first page.
+
+        Lazy-renders on first request: checks for a cached ``_thumb.png``
+        sibling object in S3, generates one synchronously if missing, then
+        returns a short-lived presigned URL. Subsequent calls hit the cache.
+
+        Args:
+            user_id: The owner's user ID.
+            upload_id: The upload identifier.
+
+        Returns:
+            ThumbnailResponse with a presigned GET URL and ``cached`` flag.
+
+        Raises:
+            FileNotFoundError: File not found, not owned, or not ready.
+            ThumbnailUnsupportedError: MIME type has no registered renderer.
+            ThumbnailRenderError: The file was unreadable / corrupt / encrypted.
+        """
+        file_meta = await self.repository.get_file(user_id, upload_id)
+        if not file_meta:
+            raise FileNotFoundError(f"File {upload_id} not found")
+
+        if file_meta.status != FileStatus.READY:
+            raise FileNotFoundError(
+                f"File {upload_id} is not ready (status: {file_meta.status})"
+            )
+
+        if file_meta.mime_type not in THUMBNAIL_SUPPORTED_MIME_TYPES:
+            raise ThumbnailUnsupportedError(
+                f"No thumbnail renderer for {file_meta.mime_type}"
+            )
+
+        thumb_key = self._thumbnail_s3_key(file_meta)
+
+        # Cache check via HEAD — cheap, no body transfer.
+        cached = True
+        try:
+            self._s3_client.head_object(Bucket=self.bucket_name, Key=thumb_key)
+        except ClientError as e:
+            if e.response["Error"]["Code"] in ("404", "NoSuchKey"):
+                cached = False
+            else:
+                logger.error(f"HEAD failed for thumbnail {thumb_key}: {e}")
+                raise
+
+        if not cached:
+            await self._render_and_store_thumbnail(file_meta, thumb_key)
+
+        # Generate the presigned URL for the thumbnail. Use the same
+        # expiration window as preview URLs so the UI's caching expectations
+        # line up.
+        try:
+            url = self._s3_client.generate_presigned_url(
+                "get_object",
+                Params={
+                    "Bucket": self.bucket_name,
+                    "Key": thumb_key,
+                    "ResponseContentType": "image/png",
+                },
+                ExpiresIn=self.preview_url_expiration,
+            )
+        except ClientError as e:
+            logger.error(f"Failed to generate thumbnail presigned URL: {e}")
+            raise
+
+        expires_at = (
+            datetime.now(timezone.utc) + timedelta(seconds=self.preview_url_expiration)
+        ).isoformat().replace("+00:00", "Z")
+
+        return ThumbnailResponse(
+            upload_id=upload_id,
+            url=url,
+            expires_at=expires_at,
+            cached=cached,
+        )
+
+    async def _render_and_store_thumbnail(
+        self, file_meta: FileMetadata, thumb_key: str
+    ) -> None:
+        """Read the original from S3 into memory, rasterize page 1, store the PNG."""
+        try:
+            response = self._s3_client.get_object(
+                Bucket=self.bucket_name,
+                Key=file_meta.s3_key,
+            )
+            raw = response["Body"].read()
+        except ClientError as e:
+            logger.error(f"Failed to read source for thumbnail {file_meta.upload_id}: {e}")
+            raise ThumbnailRenderError(f"Failed to read source: {e}") from e
+
+        # Run the CPU-bound render off the event loop so we don't stall the
+        # request worker. pypdfium2 releases the GIL for the heavy bits.
+        loop = asyncio.get_running_loop()
+        png_bytes = await loop.run_in_executor(
+            None,
+            self._thumbnail_renderer.render,
+            file_meta.mime_type,
+            raw,
+        )
+
+        try:
+            self._s3_client.put_object(
+                Bucket=self.bucket_name,
+                Key=thumb_key,
+                Body=png_bytes,
+                ContentType="image/png",
+            )
+        except ClientError as e:
+            logger.error(f"Failed to write thumbnail {thumb_key}: {e}")
+            raise
+
+        logger.info(
+            f"Rendered thumbnail for upload {file_meta.upload_id} "
+            f"({len(png_bytes)} bytes)"
+        )
+
+    def _delete_thumbnail_object(self, file_meta: FileMetadata) -> None:
+        """
+        Best-effort delete of the thumbnail sibling.
+
+        S3 ``delete_object`` is idempotent — a missing key is not an error.
+        We swallow other errors so a broken thumbnail never blocks deletion
+        of the underlying file.
+        """
+        thumb_key = self._thumbnail_s3_key(file_meta)
+        try:
+            self._s3_client.delete_object(Bucket=self.bucket_name, Key=thumb_key)
+        except ClientError as e:
+            logger.warning(f"Failed to delete thumbnail {thumb_key}: {e}")
+
     # =========================================================================
     # File Management
     # =========================================================================
@@ -344,6 +652,9 @@ async def delete_file(self, user_id: str, upload_id: str) -> bool:
             logger.warning("Failed to delete S3 object", exc_info=True)
             # Continue with metadata deletion even if S3 fails
 
+        # Best-effort delete of the thumbnail sibling, if one exists.
+        self._delete_thumbnail_object(file_meta)
+
         # Delete metadata
         deleted = await self.repository.delete_file(user_id, upload_id)
 
@@ -488,6 +799,9 @@ async def delete_session_files(self, session_id: str) -> int:
                     logger.warning(f"Failed to delete S3 object for {file_meta.upload_id}: {e}")
                     # Continue with metadata deletion
 
+                # Best-effort delete of the thumbnail sibling, if one exists.
+                self._delete_thumbnail_object(file_meta)
+
                 # Delete metadata
                 deleted = await self.repository.delete_file(
                     file_meta.user_id, file_meta.upload_id
diff --git a/backend/src/apis/app_api/files/thumbnails.py b/backend/src/apis/app_api/files/thumbnails.py
new file mode 100644
index 00000000..c5bdbd89
--- /dev/null
+++ b/backend/src/apis/app_api/files/thumbnails.py
@@ -0,0 +1,147 @@
+"""
+Thumbnail rendering for non-image file attachments.
+
+Currently supports PDF (page 1) via pypdfium2. Office formats (.docx, .xlsx)
+are intentionally not implemented here; see the ThumbnailRenderer class
+docstring for guidance on how to extend this.
+"""
+
+import io
+import logging
+from typing import Callable, Dict
+
+logger = logging.getLogger(__name__)
+
+
+# Bounded box for the longest dimension of a generated thumbnail. The UI's
+# attachment card body is ~240x128 logical pixels, so 256 covers retina
+# without being wasteful on storage / CPU.
+THUMBNAIL_MAX_DIMENSION = 256
+
+
+class ThumbnailUnsupportedError(Exception):
+    """Raised when no renderer is registered for a given MIME type."""
+
+
+class ThumbnailRenderError(Exception):
+    """Raised when rendering ran but the source file was unreadable."""
+
+
+class ThumbnailRenderer:
+    """
+    MIME-type-dispatching renderer that produces a PNG of a file's first page.
+
+    Today this is PDF-only. The dispatcher exists so that callers (the file
+    upload service, the route layer) speak a single API: hand it a MIME type
+    plus bytes, get back a PNG or a typed error. New formats plug in by
+    adding an entry to ``_renderers``.
+
+    ----- Future formats: .docx and .xlsx -----
+
+    Office formats are deliberately out of scope for this in-process renderer.
+    The standard rasterization path requires LibreOffice (``soffice --headless
+    --convert-to pdf``) to first convert the document to PDF, which can then be
+    handed to the existing PDF path. LibreOffice adds roughly 500 MB to a
+    container image, pulls in ~20 system packages, and noticeably increases
+    cold start time — costs that are inappropriate for the app-api request
+    path, which today serves chat traffic with a tight latency budget.
+
+    Recommendation when those formats are needed:
+
+    1. Build a separate **thumbnail render service**. A small Fargate task or
+       a dedicated Lambda using a pre-baked LibreOffice container image is a
+       clean fit. Either flavor can stay scaled to zero when idle.
+    2. Have app-api enqueue render requests (SQS or a synchronous HTTPS call
+       behind an internal ALB) instead of importing the converter. The render
+       service writes the resulting PNG to the same `_thumb.png` sibling key
+       the PDF path uses, so the cache-and-serve flow on this side is
+       unchanged.
+    3. Keep the dispatcher's public API stable: callers should still get a PNG
+       back, and the cache layout in S3 should not change. The only difference
+       is *where* the bytes are produced.
+
+    Until that service exists, callers are expected to filter on
+    ``THUMBNAIL_SUPPORTED_MIME_TYPES`` from ``apis.shared.files.models`` and
+    return a 415 for unsupported types so the UI can fall back to the
+    existing skeleton card.
+    """
+
+    def __init__(self) -> None:
+        self._renderers: Dict[str, Callable[[bytes], bytes]] = {
+            "application/pdf": self._render_pdf,
+            # Future entries plug in here. See class docstring for the
+            # recommended out-of-process design for .docx and .xlsx.
+        }
+
+    def render(self, mime_type: str, raw: bytes) -> bytes:
+        """
+        Render a thumbnail PNG for the given file bytes.
+
+        Args:
+            mime_type: The source file's MIME type.
+            raw: The raw file bytes.
+
+        Returns:
+            PNG-encoded bytes for a thumbnail bounded by
+            THUMBNAIL_MAX_DIMENSION on its longest side.
+
+        Raises:
+            ThumbnailUnsupportedError: No renderer is registered for mime_type.
+            ThumbnailRenderError: Rendering failed (unreadable file or pypdfium2 missing).
+        """
+        renderer = self._renderers.get(mime_type)
+        if renderer is None:
+            raise ThumbnailUnsupportedError(
+                f"No thumbnail renderer registered for {mime_type}"
+            )
+        return renderer(raw)
+
+    def _render_pdf(self, raw: bytes) -> bytes:
+        # Imported lazily so unit tests that don't touch the renderer don't
+        # need the native lib loaded.
+        try:
+            import pypdfium2 as pdfium
+        except ImportError as e:
+            raise ThumbnailRenderError(
+                "pypdfium2 is not installed; PDF thumbnails are unavailable"
+            ) from e
+
+        try:
+            pdf = pdfium.PdfDocument(io.BytesIO(raw))
+        except Exception as e:
+            raise ThumbnailRenderError(f"Failed to open PDF: {e}") from e
+
+        try:
+            if len(pdf) == 0:
+                raise ThumbnailRenderError("PDF has no pages")
+
+            page = pdf[0]
+            try:
+                width, height = page.get_size()
+                longest = max(width, height)
+                if longest <= 0:
+                    raise ThumbnailRenderError("PDF page has zero dimensions")
+
+                # Scale so the longest side lands at THUMBNAIL_MAX_DIMENSION.
+                scale = THUMBNAIL_MAX_DIMENSION / longest
+                bitmap = page.render(scale=scale)
+                pil_image = bitmap.to_pil()
+            finally:
+                page.close()
+        finally:
+            pdf.close()
+
+        buffer = io.BytesIO()
+        pil_image.save(buffer, format="PNG", optimize=True)
+        return buffer.getvalue()
+
+
+_renderer_instance: ThumbnailRenderer | None = None
+
+
+def get_thumbnail_renderer() -> ThumbnailRenderer:
+    """Get or create the singleton ThumbnailRenderer."""
+    global _renderer_instance
+    if _renderer_instance is None:
+        _renderer_instance = ThumbnailRenderer()
+    return _renderer_instance
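Reviewer sketch (not part of the patch): the dispatcher above combines a MIME-type registry with a longest-side scale rule. The standalone version below illustrates both under stated assumptions. `Dispatcher`, `UnsupportedTypeError`, and `thumbnail_size` are hypothetical names, and the rounding shown here is illustrative; the real pixel size is whatever the rendering library produces for the given scale.

```python
from typing import Callable, Dict, Tuple


class UnsupportedTypeError(Exception):
    """Hypothetical stand-in for ThumbnailUnsupportedError."""


def thumbnail_size(width: float, height: float, max_dimension: int) -> Tuple[int, int]:
    """Scale (width, height) so the longest side lands at max_dimension."""
    longest = max(width, height)
    if longest <= 0:
        raise ValueError("page has zero dimensions")
    scale = max_dimension / longest
    return round(width * scale), round(height * scale)


class Dispatcher:
    """Registry keyed by MIME type; new formats plug in via register()."""

    def __init__(self) -> None:
        self._renderers: Dict[str, Callable[[bytes], bytes]] = {}

    def register(self, mime_type: str, renderer: Callable[[bytes], bytes]) -> None:
        self._renderers[mime_type] = renderer

    def render(self, mime_type: str, raw: bytes) -> bytes:
        renderer = self._renderers.get(mime_type)
        if renderer is None:
            # Callers would map this to a 415 so the UI can fall back.
            raise UnsupportedTypeError(f"No renderer registered for {mime_type}")
        return renderer(raw)
```

A US Letter page (612 x 792 points) bounded at 256 comes out as 198 x 256 under this rounding, and an unregistered type raises instead of returning an empty image.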
diff --git a/backend/src/apis/app_api/main.py b/backend/src/apis/app_api/main.py
index 17770531..804ad456 100644
--- a/backend/src/apis/app_api/main.py
+++ b/backend/src/apis/app_api/main.py
@@ -28,9 +28,58 @@
     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
 )
 logger = logging.getLogger(__name__)
+
+# When SKIP_AUTH=true, refuse to boot unless it is paired with a positive
+# local-dev signal. Allowlist over blocklist: every CORS_ORIGINS entry must
+# be a localhost URL, so any deployed origin (or an empty config) trips the
+# check. That is far safer than enumerating every env var a deployed runtime
+# might set, and it fails closed for deploy targets we haven't met yet.
+#
+# Runs from `lifespan()` rather than at import time so tests that import
+# this module (e.g. tests/routes/test_pbt_auth_sweep.py) don't trip the
+# check on environments where SKIP_AUTH=true is set globally.
+_SKIP_AUTH_LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}
+
+
+def _validate_skip_auth_or_raise() -> None:
+    """Raise RuntimeError if SKIP_AUTH=true without local-only CORS_ORIGINS.
+
+    No-op when SKIP_AUTH is unset/false. When set, CORS_ORIGINS must be
+    non-empty and every entry's hostname must be in _SKIP_AUTH_LOCAL_HOSTS.
+    """
+    if os.environ.get("SKIP_AUTH", "").lower() != "true":
+        return
+
+    from urllib.parse import urlparse
+
+    origins = [
+        o.strip()
+        for o in os.environ.get("CORS_ORIGINS", "").split(",")
+        if o.strip()
+    ]
+
+    def _is_local(origin: str) -> bool:
+        try:
+            return (urlparse(origin).hostname or "") in _SKIP_AUTH_LOCAL_HOSTS
+        except Exception:
+            return False
+
+    if not origins or not all(_is_local(o) for o in origins):
+        raise RuntimeError(
+            "SKIP_AUTH=true requires CORS_ORIGINS to contain only localhost "
+            "origins (localhost, 127.0.0.1, ::1, 0.0.0.0). Refusing to start "
+            "— this bypass is local-dev only."
+        )
+    logger.warning(
+        "SKIP_AUTH=true — auth dependencies will return a fake admin user. "
+        "DO NOT enable this in any deployed environment."
+    )
+
+
 @asynccontextmanager
 async def lifespan(app: FastAPI):
     # Startup
+    _validate_skip_auth_or_raise()
     logger.info("=== AgentCore Public Stack API Starting ===")
     logger.info("Agent execution engine initialized")
 
@@ -121,6 +170,7 @@ async def lifespan(app: FastAPI):
 from apis.app_api.chat.routes import router as chat_router
 from apis.app_api.chat.converse_routes import router as converse_router
 from apis.app_api.chat.proxy_routes import router as bff_chat_proxy_router
+from apis.app_api.mcp_apps.routes import router as mcp_apps_router
 from apis.app_api.memory.routes import router as memory_router
 from apis.app_api.tools.routes import router as tools_router
 from apis.app_api.files.routes import router as files_router
@@ -132,6 +182,7 @@ async def lifespan(app: FastAPI):
 from apis.app_api.system.routes import router as system_router
 from apis.app_api.shares.routes import conversations_share_router, shares_router, shared_view_router
 from apis.app_api.voice import router as voice_router
+from apis.app_api.user_menu_links.routes import router as user_menu_links_router
 
 # Include routers
 app.include_router(health_router)
@@ -149,6 +200,7 @@ async def lifespan(app: FastAPI):
 app.include_router(chat_router)  # Application-specific chat endpoints
 app.include_router(converse_router)  # Proxies to Inference API for cost accounting
 app.include_router(bff_chat_proxy_router)  # Cookie-authenticated SSE proxy (Phase 4, dormant until SPA cutover)
+app.include_router(mcp_apps_router)  # MCP Apps app-initiated tools/call proxy (PR #5; inert until host flag on)
 app.include_router(memory_router)  # AgentCore Memory access endpoints
 app.include_router(tools_router)  # Tool discovery and permissions
 app.include_router(files_router)  # File upload via pre-signed URLs
@@ -158,6 +210,7 @@ async def lifespan(app: FastAPI):
 app.include_router(shares_router)  # Share management (update, revoke, export)
 app.include_router(shared_view_router)  # Shared conversation read-only view
 app.include_router(voice_router)  # Cookie-authenticated WS proxy for Nova Sonic voice mode (#211)
+app.include_router(user_menu_links_router)  # Public read of admin-managed user-menu links
 
 # Conditionally register fine-tuning routes
 if os.environ.get("FINE_TUNING_ENABLED", "false").lower() == "true":
@@ -165,6 +218,14 @@ async def lifespan(app: FastAPI):
     app.include_router(fine_tuning_router)
     logger.info("Fine-tuning routes enabled")
 
+# Conditionally register artifact render-token routes. Infra only sets
+# the secret ARN when the artifacts feature is enabled for the
+# environment, so its presence is the enablement signal.
+if os.environ.get("ARTIFACTS_RENDER_TOKEN_SECRET_ARN"):
+    from apis.app_api.artifacts.routes import router as artifacts_router
+    app.include_router(artifacts_router)
+    logger.info("Artifact render-token routes enabled")
+
 # Mount static file directories for serving generated content
 # These are created by tools (visualization, code interpreter, etc.)
 # Use parent directory (src/) as base
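Reviewer sketch (not part of the patch): the SKIP_AUTH guard added to `main.py` above is an allowlist check on parsed origin hostnames. The standalone function below mirrors that logic so the edge cases are easy to see. `skip_auth_allowed` and `LOCAL_HOSTS` are hypothetical names; only `urllib.parse.urlparse` is a real API.

```python
from typing import List
from urllib.parse import urlparse

# Same allowlist as the patch; the names here are illustrative only.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}


def _parse_origins(raw: str) -> List[str]:
    """Split a comma-separated CORS_ORIGINS value, dropping blanks."""
    return [o.strip() for o in raw.split(",") if o.strip()]


def _is_local(origin: str) -> bool:
    try:
        # hostname is None when the entry has no scheme, so it fails closed.
        return (urlparse(origin).hostname or "") in LOCAL_HOSTS
    except ValueError:
        # urlparse rejects malformed netlocs (e.g. a broken IPv6 literal).
        return False


def skip_auth_allowed(cors_origins: str) -> bool:
    """True only when the list is non-empty and every origin is local."""
    origins = _parse_origins(cors_origins)
    return bool(origins) and all(_is_local(o) for o in origins)
```

Two consequences worth noting: an empty `CORS_ORIGINS` is refused, and a scheme-less entry such as `localhost:4200` is also refused because `urlparse` finds no hostname in it.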
diff --git a/backend/src/apis/app_api/mcp_apps/__init__.py b/backend/src/apis/app_api/mcp_apps/__init__.py
new file mode 100644
index 00000000..9ea06ea3
--- /dev/null
+++ b/backend/src/apis/app_api/mcp_apps/__init__.py
@@ -0,0 +1,6 @@
+"""MCP Apps app-api surface (SEP-1865).
+
+PR #5 of `docs/kaizen/scoping/mcp-apps-host-renderer.md`: the cookie-
+authenticated `POST /mcp-apps/proxy-call` boundary an embedded MCP App's
+`tools/call` is relayed through, on its way to inference-api dispatch.
+"""
diff --git a/backend/src/apis/app_api/mcp_apps/routes.py b/backend/src/apis/app_api/mcp_apps/routes.py
new file mode 100644
index 00000000..4ea4c9e7
--- /dev/null
+++ b/backend/src/apis/app_api/mcp_apps/routes.py
@@ -0,0 +1,237 @@
+"""Cookie-authenticated MCP App `tools/call` proxy (MCP Apps PR #5).
+
+`docs/kaizen/scoping/mcp-apps-host-renderer.md`, decision #2. The embedded
+MCP App iframe issues a JSON-RPC `tools/call` over the postMessage bridge;
+the SPA relays it here. The flow mirrors the BFF chat proxy:
+
+  iframe → SPA bridge → app-api `/mcp-apps/proxy-call` → inference-api
+  `/invocations` (app_tool_call directive) → MCP server → reverse path
+
+This handler is the **session-cookie boundary**: it authenticates the
+caller and forwards the conversation binding (sessionId + originating
+toolUseId) so the proxied call inherits provenance. It deliberately does
+NOT decide tool visibility — `_meta.ui.visibility` is derived live from the
+MCP server and only the inference-api process holds that catalog, so the
+authoritative spec-MUST "reject tools whose visibility excludes 'app'"
+gate lives in the inference-api dispatch (`dispatch_app_tool_call`). This
+boundary's contribution is auth + request validation + the bearer hand-off.
+
+Gated by `AGENTCORE_MCP_APPS_HOST_ENABLED` (default true since PR #7):
+with the host flag off the inference-api catalog is empty and every call
+is rejected there as not app-visible.
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any, Dict, List, Optional
+
+import httpx
+from fastapi import APIRouter, Depends, HTTPException, Request
+from fastapi.responses import JSONResponse
+from pydantic import BaseModel, Field
+
+from apis.app_api.chat import proxy_routes
+from apis.shared.auth.dependencies import get_current_user_from_session
+from apis.shared.auth.models import User
+from apis.shared.mcp_apps.card_store import get_app_card_store
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(prefix="/mcp-apps", tags=["mcp-apps"])
+
+
+class ProxyToolCallRequest(BaseModel):
+    """A `tools/call` proxied from an embedded MCP App.
+
+    `enabledTools` / `modelId` carry the conversation's configuration so
+    inference-api rebuilds the same agent shape (the MCP client that hosts
+    the tool must be loaded). The SPA already has these — it sends them on
+    every chat turn.
+    """
+
+    session_id: str = Field(..., alias="sessionId")
+    tool_use_id: str = Field(..., alias="toolUseId")
+    tool_name: str = Field(..., alias="toolName")
+    arguments: Dict[str, Any] = Field(default_factory=dict)
+    enabled_tools: List[str] = Field(default_factory=list, alias="enabledTools")
+    model_id: Optional[str] = Field(default=None, alias="modelId")
+
+    model_config = {"populate_by_name": True}
+
+
+@router.post("/proxy-call")
+async def proxy_call(
+    body: ProxyToolCallRequest,
+    request: Request,
+    current_user: User = Depends(get_current_user_from_session),
+) -> JSONResponse:
+    """Relay an app-initiated tool call to inference-api and return its result.
+
+    Non-streaming: inference-api runs the single tool (no model turn) and
+    returns the `CallToolResult` as JSON; the synthesized tool_use/
+    tool_result land in the conversation thread via the per-session broker.
+    """
+    invocation_body = {
+        "session_id": body.session_id,
+        "enabled_tools": body.enabled_tools,
+        "model_id": body.model_id,
+        "app_tool_call": {
+            "tool_use_id": body.tool_use_id,
+            "tool_name": body.tool_name,
+            "arguments": body.arguments,
+        },
+    }
+
+    target_url = proxy_routes._build_invocations_url(
+        proxy_routes._inference_api_url()
+    )
+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer {current_user.raw_token}",
+    }
+
+    client = proxy_routes._build_upstream_client()
+    try:
+        response = await client.post(
+            target_url, headers=headers, json=invocation_body
+        )
+    except httpx.ConnectError:
+        logger.error("Cannot reach Inference API at %s", target_url)
+        raise HTTPException(status_code=502, detail="Inference API is unreachable")
+    except httpx.TimeoutException:
+        logger.error("Inference API request timed out: %s", target_url)
+        raise HTTPException(status_code=504, detail="Inference API request timed out")
+    except Exception as exc:  # noqa: BLE001
+        logger.error("MCP Apps proxy-call error: %s", exc, exc_info=True)
+        raise HTTPException(status_code=502, detail="Proxy error")
+    finally:
+        await client.aclose()
+
+    try:
+        payload = response.json()
+    except Exception:  # noqa: BLE001 - upstream returned non-JSON
+        raise HTTPException(status_code=502, detail="Bad upstream response")
+
+    # Option A (PR #6): on success, persist a static provenance card so the
+    # call survives a page reload (the broker is in-memory; the live thread
+    # event is otherwise lost on refresh). Best-effort + provenance-only —
+    # model-visible state flows through ui/update-model-context, never a
+    # persisted synthetic tool turn. A failed write must not fail the call.
+    if response.status_code == 200 and isinstance(payload, dict):
+        result = payload.get("result") or {}
+        try:
+            get_app_card_store().store(
+                user_id=current_user.user_id,
+                session_id=body.session_id,
+                tool_use_id=body.tool_use_id,
+                tool_name=body.tool_name,
+                arguments=body.arguments,
+                content=result.get("content") or [],
+                is_error=bool(result.get("isError")),
+            )
+        except Exception:  # noqa: BLE001 - provenance is best-effort
+            logger.warning(
+                "mcp-apps: failed to persist provenance card", exc_info=True
+            )
+
+    # Relay inference-api's status verbatim (403 not-app-visible, 409 no
+    # live client, 502 tool failure, 200 success) so the bridge can answer
+    # the iframe's JSON-RPC with the right error.
+    return JSONResponse(payload, status_code=response.status_code)
+
+
+class ProxyContextUpdateRequest(BaseModel):
+    """App-pushed model context proxied from an embedded MCP App (PR #6).
+
+    The iframe's `ui/update-model-context` params are `{content?,
+    structuredContent?}`; `resourceUri` is the bound App resource the SPA
+    already holds (the `ui_resource` event's `resourceUri`) and is the
+    host's per-App dedupe key. `enabledTools` / `modelId` carry the
+    conversation config so inference-api rebuilds the same cached agent.
+    """
+
+    session_id: str = Field(..., alias="sessionId")
+    resource_uri: str = Field(..., alias="resourceUri")
+    content: Optional[List[Dict[str, Any]]] = None
+    structured_content: Optional[Dict[str, Any]] = Field(
+        default=None, alias="structuredContent"
+    )
+    enabled_tools: List[str] = Field(default_factory=list, alias="enabledTools")
+    model_id: Optional[str] = Field(default=None, alias="modelId")
+
+    model_config = {"populate_by_name": True}
+
+
+@router.post("/update-context")
+async def update_context(
+    body: ProxyContextUpdateRequest,
+    request: Request,
+    current_user: User = Depends(get_current_user_from_session),
+) -> JSONResponse:
+    """Relay an app-pushed model-context update to inference-api.
+
+    Non-streaming, runs no model turn: inference-api stashes the payload on
+    the conversation agent's Strands state; the next real user turn merges
+    it. Mirrors `/proxy-call`'s auth + bearer hand-off — this boundary's
+    only job is the session-cookie → bearer exchange.
+    """
+    invocation_body = {
+        "session_id": body.session_id,
+        "enabled_tools": body.enabled_tools,
+        "model_id": body.model_id,
+        "app_context_update": {
+            "resource_uri": body.resource_uri,
+            "content": body.content,
+            "structured_content": body.structured_content,
+        },
+    }
+
+    target_url = proxy_routes._build_invocations_url(
+        proxy_routes._inference_api_url()
+    )
+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer {current_user.raw_token}",
+    }
+
+    client = proxy_routes._build_upstream_client()
+    try:
+        response = await client.post(
+            target_url, headers=headers, json=invocation_body
+        )
+    except httpx.ConnectError:
+        logger.error("Cannot reach Inference API at %s", target_url)
+        raise HTTPException(status_code=502, detail="Inference API is unreachable")
+    except httpx.TimeoutException:
+        logger.error("Inference API request timed out: %s", target_url)
+        raise HTTPException(status_code=504, detail="Inference API request timed out")
+    except Exception as exc:  # noqa: BLE001
+        logger.error("MCP Apps update-context error: %s", exc, exc_info=True)
+        raise HTTPException(status_code=502, detail="Proxy error")
+    finally:
+        await client.aclose()
+
+    try:
+        payload = response.json()
+    except Exception:  # noqa: BLE001 - upstream returned non-JSON
+        raise HTTPException(status_code=502, detail="Bad upstream response")
+
+    return JSONResponse(payload, status_code=response.status_code)
+
+
+@router.get("/cards")
+async def list_cards(
+    session_id: str,
+    current_user: User = Depends(get_current_user_from_session),
+) -> JSONResponse:
+    """Return this user's app-initiated tool-call cards for a session.
+
+    Reload hydration for Option A: the SPA replays these as *static
+    historical cards* (the App iframe itself is not re-instantiated).
+    Ownership is re-checked in the store to guard against a guessed session id.
+    """
+    cards = get_app_card_store().list_for_session(
+        session_id=session_id, user_id=current_user.user_id
+    )
+    return JSONResponse({"cards": cards})
diff --git a/backend/src/apis/app_api/messages/models.py b/backend/src/apis/app_api/messages/models.py
index a42f7326..44c0f472 100644
--- a/backend/src/apis/app_api/messages/models.py
+++ b/backend/src/apis/app_api/messages/models.py
@@ -31,11 +31,22 @@ class MessageContent(BaseModel):
 
 
 class LatencyMetrics(BaseModel):
-    """Latency measurements in milliseconds"""
+    """Latency measurements in milliseconds.
+
+    ``time_to_first_token`` is ``None`` when the provider did not emit
+    ``timeToFirstByteMs`` and we couldn't compute it locally — distinct from
+    a measured value of 0ms (which is physically impossible). Aggregations
+    over TTFT must filter ``None`` so a missing measurement doesn't pull
+    averages toward zero.
+    """
 
     model_config = ConfigDict(populate_by_name=True)
 
-    time_to_first_token: int = Field(..., alias="timeToFirstToken", description="Time from request start to first token received (ms)")
+    time_to_first_token: Optional[int] = Field(
+        None,
+        alias="timeToFirstToken",
+        description="Time from request start to first token (ms); None if not measured",
+    )
     end_to_end_latency: int = Field(..., alias="endToEndLatency", description="Total time from request start to completion (ms)")
 
 
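Reviewer sketch (not part of the patch): the `LatencyMetrics` docstring above says aggregations over TTFT must filter `None`. A minimal illustration follows, assuming samples arrive as a plain list of optional integers; `mean_ttft_ms` is a hypothetical helper, not an existing function in this repo.

```python
from statistics import mean
from typing import Iterable, Optional


def mean_ttft_ms(samples: Iterable[Optional[int]]) -> Optional[float]:
    """Average time-to-first-token over measured samples only."""
    measured = [s for s in samples if s is not None]
    # No measurements at all is reported as None, not as 0 ms.
    return mean(measured) if measured else None
```

With samples `[120, None, 180]` the filtered mean is 150 ms, whereas coercing the missing value to 0 would report 100 ms.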
diff --git a/backend/src/apis/app_api/user_menu_links/__init__.py b/backend/src/apis/app_api/user_menu_links/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/backend/src/apis/app_api/user_menu_links/routes.py b/backend/src/apis/app_api/user_menu_links/routes.py
new file mode 100644
index 00000000..3741c762
--- /dev/null
+++ b/backend/src/apis/app_api/user_menu_links/routes.py
@@ -0,0 +1,37 @@
+"""Public read endpoint for user-menu links.
+
+Signed-in users (any role) fetch enabled links to render in the user menu.
+Admin writes go through ``/admin/user-menu-links``.
+"""
+
+import logging
+
+from fastapi import APIRouter, Depends
+
+from apis.shared.auth import User, get_current_user_from_session
+from apis.shared.user_menu_links.models import (
+    UserMenuLinkListResponse,
+    UserMenuLinkResponse,
+)
+from apis.shared.user_menu_links.service import get_user_menu_links_service
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(prefix="/user-menu-links", tags=["user-menu-links"])
+
+
+@router.get(
+    "/",
+    response_model=UserMenuLinkListResponse,
+    summary="List enabled user-menu links",
+)
+async def list_enabled_user_menu_links(
+    current_user: User = Depends(get_current_user_from_session),
+) -> UserMenuLinkListResponse:
+    """Return all enabled links for rendering in the SPA user menu."""
+    service = get_user_menu_links_service()
+    links = await service.list_links(enabled_only=True)
+    return UserMenuLinkListResponse(
+        links=[UserMenuLinkResponse.from_link(link) for link in links],
+        total=len(links),
+    )
diff --git a/backend/src/apis/app_api/user_settings/routes.py b/backend/src/apis/app_api/user_settings/routes.py
index 48997ba2..c2c04f15 100644
--- a/backend/src/apis/app_api/user_settings/routes.py
+++ b/backend/src/apis/app_api/user_settings/routes.py
@@ -54,6 +54,23 @@ async def update_settings(
         except Exception as e:
             logger.warning(f"Could not validate model ID: {e}")
 
+    # Surface the missing-table case as a real 503 instead of silently
+    # echoing the requested values back to the client. Previously the route
+    # returned 200 with the new payload while persisting nothing, so the
+    # SPA's "Saving..." indicator cleared and the user assumed success —
+    # then the next page load showed defaultModelId=null because the GET
+    # path falls through to the same disabled repo and returns defaults
+    # (#161). Failing loud here lets the frontend show the user that the
+    # backend is misconfigured rather than silently dropping their choice.
+    if not repo.enabled:
+        logger.error(
+            "User settings update rejected: DYNAMODB_USER_SETTINGS_TABLE_NAME is not configured"
+        )
+        raise HTTPException(
+            status_code=503,
+            detail="User settings storage is not configured on this server.",
+        )
+
     try:
         updated = await repo.update_settings(current_user.user_id, update_data)
         return UserSettings(**updated)
diff --git a/backend/src/apis/inference_api/chat/app_context_dispatch.py b/backend/src/apis/inference_api/chat/app_context_dispatch.py
new file mode 100644
index 00000000..031513cd
--- /dev/null
+++ b/backend/src/apis/inference_api/chat/app_context_dispatch.py
@@ -0,0 +1,191 @@
+"""App-pushed model context (`ui/update-model-context`, MCP Apps PR #6).
+
+`docs/kaizen/scoping/mcp-apps-host-renderer.md`, decision #3. An embedded
+MCP App pushes structured/text context to the host over the postMessage
+bridge; app-api relays it to `/invocations` with an `app_context_update`
+directive. Like PR #5's `app_tool_call` this runs WITHOUT a model turn —
+it stashes the payload on the conversation agent's Strands `agent.state`,
+keyed by the App's bound resource URI.
+
+Storage (decision #3): `agent.state` is the live Strands `AgentState` of
+the cached conversation agent. Multi-turn continuity in cloud rides the
+in-process LRU agent cache (AgentCore Memory is write-only for continuity —
+see docs/specs/MAX_TOKENS_CONTINUE_SESSION_RESTORE_ANALYSIS.md), so the
+same `agent.state` survives turn boundaries for free; a cold start /
+eviction drops the *entire* conversation anyway, so a dropped pending
+context there is consistent with existing behavior, not a new regression.
+No `TurnBasedSessionManager` / Memory change is needed.
+
+`AgentState` in strands 1.40 is a `.get()/.set()/.delete()` store whose
+`.get()` returns a **deep copy** — nested in-place mutation does NOT
+persist, so the bag under `STATE_KEY` is read-modify-written wholesale,
+and every value must be JSON-serializable.
+
+Read path: `merge_and_clear_pending_context` is called once before each
+real user turn (not resume / continuation / a directive call). It dedupes
+by resource URI (last-write-wins is inherent in the dict), renders a
+single delimited block, clears the bag, and the caller prepends the block
+to that turn's prompt only (kept out of persisted history + the cached
+system prefix, so prompt-cache stability is preserved).
+
+Inert unless `AGENTCORE_MCP_APPS_HOST_ENABLED=true` — app-api still relays,
+but with no live App nothing ever calls this.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from datetime import datetime, timezone
+from typing import Any, Dict, List, Optional
+
+logger = logging.getLogger(__name__)
+
+# Top-level Strands `agent.state` key that holds all MCP Apps host state.
+STATE_KEY = "mcp_apps"
+# Sub-key under STATE_KEY mapping resource_uri -> pending context entry.
+_CONTEXT_SUBKEY = "context"
+
+
+class AppContextUpdateError(Exception):
+    """Dispatch failed in a way the caller should surface as an error.
+
+    `code` is an app-api HTTP status hint; `message` is safe to return to
+    the client (no internals).
+    """
+
+    def __init__(self, message: str, code: int = 400) -> None:
+        super().__init__(message)
+        self.message = message
+        self.code = code
+
+
+def _strands_agent(agent: Any) -> Any:
+    """The inner Strands `Agent` (its `.state` is the `AgentState`).
+
+    `get_agent` returns a `BaseAgent` wrapper; the Strands agent is
+    `BaseAgent.agent` (set in `chat_agent._create_agent`).
+    """
+    strands_agent = getattr(agent, "agent", None)
+    if strands_agent is None or not hasattr(strands_agent, "state"):
+        raise AppContextUpdateError(
+            "Conversation agent has no state to update", code=409
+        )
+    return strands_agent
+
+
+def dispatch_app_context_update(
+    agent: Any,
+    *,
+    resource_uri: str,
+    content: Optional[List[Dict[str, Any]]],
+    structured_content: Optional[Dict[str, Any]],
+) -> Dict[str, Any]:
+    """Stash one app-pushed context update on the cached agent's state.
+
+    Last-write-wins per `resource_uri` (the dedupe key). Returns a small
+    JSON-able ack for app-api to relay to the iframe. Raises
+    `AppContextUpdateError` on a missing/un-serializable payload.
+    """
+    if content is None and structured_content is None:
+        raise AppContextUpdateError(
+            "ui/update-model-context requires content or structuredContent",
+            code=400,
+        )
+
+    strands_agent = _strands_agent(agent)
+
+    entry: Dict[str, Any] = {
+        "resourceUri": resource_uri,
+        "updatedAt": datetime.now(timezone.utc).isoformat(),
+    }
+    if content is not None:
+        entry["content"] = content
+    if structured_content is not None:
+        entry["structuredContent"] = structured_content
+
+    # AgentState.get() deep-copies, so read-modify-write the whole bag.
+    bag: Dict[str, Any] = strands_agent.state.get(STATE_KEY) or {}
+    ctx: Dict[str, Any] = dict(bag.get(_CONTEXT_SUBKEY) or {})
+    ctx[resource_uri] = entry  # last-write-wins
+    bag[_CONTEXT_SUBKEY] = ctx
+
+    try:
+        strands_agent.state.set(STATE_KEY, bag)
+    except ValueError as exc:  # not JSON serializable
+        raise AppContextUpdateError(
+            "ui/update-model-context payload is not JSON serializable",
+            code=400,
+        ) from exc
+
+    logger.info(
+        "mcp-apps: stored model context (resource=%s, pending=%d)",
+        resource_uri,
+        len(ctx),
+    )
+    return {"resourceUri": resource_uri, "status": "stored", "pending": len(ctx)}
+
+
+def _render_entry(resource_uri: str, entry: Dict[str, Any]) -> str:
+    parts: List[str] = [f'<context resource="{resource_uri}">']
+    structured = entry.get("structuredContent")
+    if structured is not None:
+        try:
+            parts.append(json.dumps(structured, ensure_ascii=False, indent=2))
+        except (TypeError, ValueError):
+            parts.append(str(structured))
+    for block in entry.get("content") or []:
+        if isinstance(block, dict):
+            text = block.get("text")
+            if isinstance(text, str) and text:
+                parts.append(text)
+                continue
+            try:
+                parts.append(json.dumps(block, ensure_ascii=False))
+            except (TypeError, ValueError):
+                parts.append(str(block))
+        else:
+            parts.append(str(block))
+    parts.append("</context>")
+    return "\n".join(parts)
+
+
+def merge_and_clear_pending_context(agent: Any) -> Optional[str]:
+    """Drain pending app-pushed context into a single prompt block.
+
+    Returns a delimited block to prepend to the current turn's prompt, or
+    `None` when nothing is pending. Clears the bag so each update reaches
+    the model exactly once. Never raises into the turn — context is
+    best-effort and must not break a conversation.
+    """
+    try:
+        strands_agent = _strands_agent(agent)
+    except AppContextUpdateError:
+        return None
+
+    try:
+        bag: Dict[str, Any] = strands_agent.state.get(STATE_KEY) or {}
+        ctx: Dict[str, Any] = bag.get(_CONTEXT_SUBKEY) or {}
+        if not ctx:
+            return None
+
+        rendered = "\n".join(
+            _render_entry(uri, entry) for uri, entry in ctx.items()
+        )
+
+        bag[_CONTEXT_SUBKEY] = {}
+        strands_agent.state.set(STATE_KEY, bag)
+    except Exception:  # noqa: BLE001 - context is best-effort, never fatal
+        logger.warning(
+            "mcp-apps: failed to merge pending model context", exc_info=True
+        )
+        return None
+
+    return (
+        "<mcp_app_context>\n"
+        "The user's embedded app(s) provided this context for the request "
+        "that follows. Treat it as authoritative app state, not as the "
+        "user's words.\n"
+        f"{rendered}\n"
+        "</mcp_app_context>"
+    )
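Reviewer sketch (not part of the patch): the module above relies on a state store whose `get()` returns a deep copy, which is why the bag is read, modified, and written back whole. The sketch below reproduces that behavior with a hypothetical `DeepCopyState` stand-in (it is not the Strands class) to show why in-place mutation is lost and how last-write-wins plus drain-once fall out of the pattern.

```python
import copy
import json
from typing import Any, Dict


class DeepCopyState:
    """Hypothetical stand-in: get() deep-copies, set() requires JSON-able values."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return copy.deepcopy(self._data.get(key))

    def set(self, key: str, value: Any) -> None:
        try:
            json.dumps(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"value for {key!r} is not JSON serializable") from exc
        self._data[key] = copy.deepcopy(value)


def stash(state: DeepCopyState, resource_uri: str, entry: Dict[str, Any]) -> int:
    """Read-modify-write the whole bag; returns the pending entry count."""
    bag = state.get("mcp_apps") or {}
    ctx = dict(bag.get("context") or {})
    ctx[resource_uri] = entry  # last-write-wins per resource URI
    bag["context"] = ctx
    state.set("mcp_apps", bag)
    return len(ctx)


def drain(state: DeepCopyState) -> Dict[str, Any]:
    """Return pending entries and clear them so each is delivered once."""
    bag = state.get("mcp_apps") or {}
    ctx = bag.get("context") or {}
    bag["context"] = {}
    state.set("mcp_apps", bag)
    return ctx
```

Mutating the result of `state.get("mcp_apps")` directly changes only the copy, so nothing persists until `set()` is called with the full bag.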
diff --git a/backend/src/apis/inference_api/chat/app_tool_dispatch.py b/backend/src/apis/inference_api/chat/app_tool_dispatch.py
new file mode 100644
index 00000000..4c8a33ed
--- /dev/null
+++ b/backend/src/apis/inference_api/chat/app_tool_dispatch.py
@@ -0,0 +1,191 @@
+"""App-initiated `tools/call` dispatch (MCP Apps PR #5).
+
+`docs/kaizen/scoping/mcp-apps-host-renderer.md`, decision #2. An embedded
+MCP App calls a server tool over the postMessage bridge; app-api relays it
+to `/invocations` with an `app_tool_call` directive. This module runs that
+single tool call WITHOUT a model turn:
+
+1. Rebuild the conversation's agent via `get_agent` (the same path resume
+   uses) so the MCP client session + auth (OAuth token cache, SigV4,
+   consent hook) are wired exactly as for a model-driven tool call.
+2. Check that the tool's `_meta.ui.visibility` includes `"app"`. This is the
+   spec MUST and the authoritative gate (app-api only authenticates).
+3. Call the tool against the MCP client that surfaced it (recorded in the
+   `UIToolCatalog` during the agent's `tools/list`).
+4. Publish synthesized `tool_use` / `tool_result` events to the
+   per-session broker so the live conversation stream shows the card, and
+   return the `CallToolResult` so app-api can hand it back to the iframe.
+
+Inert unless `AGENTCORE_MCP_APPS_HOST_ENABLED=true` (default true since
+PR #7) — the catalog is empty when the flag is off, so every call is
+rejected as not app-visible.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import uuid
+from typing import Any, Dict, List, Optional
+
+from apis.shared.mcp_apps.broker import get_app_tool_event_broker
+
+logger = logging.getLogger(__name__)
+
+
+class AppToolCallError(Exception):
+    """Dispatch failed in a way the caller should surface as an error.
+
+    `code` is an app-api HTTP status hint; `message` is safe to return to
+    the client (no internals).
+    """
+
+    def __init__(self, message: str, code: int = 400) -> None:
+        super().__init__(message)
+        self.message = message
+        self.code = code
+
+
+def _serialize_content(result: Any) -> List[Dict[str, Any]]:
+    """Best-effort MCP tool-result content -> JSON-able blocks.
+
+    Strands' `MCPToolResult.content` is a list of MCP content models;
+    `model_dump` is the canonical serialization. Falls back to a text
+    block so a quirky server response still round-trips.
+    """
+    content = getattr(result, "content", None)
+    blocks: List[Dict[str, Any]] = []
+    if isinstance(content, list):
+        for item in content:
+            if hasattr(item, "model_dump"):
+                try:
+                    blocks.append(item.model_dump(by_alias=True, exclude_none=True))
+                    continue
+                except Exception:  # noqa: BLE001
+                    pass
+            if isinstance(item, dict):
+                blocks.append(item)
+            else:
+                blocks.append({"type": "text", "text": str(item)})
+    return blocks
+
+
+def _is_error(result: Any) -> bool:
+    val = getattr(result, "isError", None)
+    if val is None:
+        val = getattr(result, "is_error", None)
+    if val is None and isinstance(result, dict):
+        val = result.get("isError") or result.get("is_error")
+    return bool(val)
+
+
+def _resolve_client(agent: Any, tool_name: str):
+    """Return `(ui_metadata, client)` for the MCP client that surfaced `tool_name`.
+
+    Primary source is the `UIToolCatalog` (recorded when the agent's MCP
+    client ran `tools/list` during build). Lazy import keeps the agent
+    layer off inference-api's cold-start path when MCP Apps is disabled.
+    """
+    from agents.main_agent.integrations.mcp_apps import (
+        get_ui_tool_catalog,
+        is_mcp_apps_host_enabled,
+    )
+
+    if not is_mcp_apps_host_enabled():
+        return None, None
+    catalog = get_ui_tool_catalog()
+    ui_metadata = catalog.get(tool_name)
+    client = catalog.get_client(tool_name)
+    return ui_metadata, client
+
+
+async def dispatch_app_tool_call(
+    agent: Any,
+    *,
+    session_id: str,
+    user_id: str,
+    tool_use_id: str,
+    tool_name: str,
+    arguments: Optional[Dict[str, Any]],
+) -> Dict[str, Any]:
+    """Execute one app-initiated tool call and publish thread events.
+
+    `agent` is the already-built conversation agent (its MCP clients are
+    live). Returns ``{"toolUseId", "result": {content, isError}}`` for the
+    JSON response app-api relays to the iframe. Raises `AppToolCallError`
+    for visibility / unknown-tool / dispatch failures.
+    """
+    ui_metadata, client = _resolve_client(agent, tool_name)
+
+    # Spec MUST: reject tools/call from apps for tools whose visibility
+    # excludes "app". With the host flag off the catalog is empty, so
+    # ui_metadata is None and every proxied call is rejected here.
+    if ui_metadata is None or not ui_metadata.visible_to_app():
+        raise AppToolCallError(
+            f"Tool '{tool_name}' is not callable from an MCP App", code=403
+        )
+    if client is None:
+        raise AppToolCallError(
+            f"No live MCP client for tool '{tool_name}'", code=409
+        )
+
+    # Distinct id for the thread card — the originating tool_use_id is the
+    # one that rendered the iframe; this proxied call is its own invocation.
+    synth_id = f"app-{tool_use_id}-{uuid.uuid4().hex[:8]}"
+    args = dict(arguments or {})
+
+    try:
+        result = await asyncio.to_thread(
+            client.call_tool_sync, synth_id, tool_name, args
+        )
+    except Exception as exc:  # noqa: BLE001 - surfaced to the App as an error
+        logger.warning(
+            "app tools/call dispatch failed (tool=%s session=%s): %s",
+            tool_name,
+            session_id,
+            exc,
+        )
+        raise AppToolCallError(
+            f"Tool '{tool_name}' failed to execute", code=502
+        ) from exc
+
+    content = _serialize_content(result)
+    is_error = _is_error(result)
+    status = "error" if is_error else "success"
+
+    # Surface the call in the conversation thread. Best-effort: a missing
+    # listener (no active stream) buffers in the broker for the next turn;
+    # never blocks returning the result to the App.
+    broker = get_app_tool_event_broker()
+    broker.publish(
+        session_id,
+        {
+            "type": "tool_use",
+            "data": {
+                "tool_use": {
+                    "name": tool_name,
+                    "tool_use_id": synth_id,
+                    "input": args,
+                    "origin": "mcp_app",
+                }
+            },
+        },
+    )
+    broker.publish(
+        session_id,
+        {
+            "type": "tool_result",
+            "data": {
+                "tool_result": {
+                    "toolUseId": synth_id,
+                    "status": status,
+                    "content": content,
+                }
+            },
+        },
+    )
+
+    return {
+        "toolUseId": tool_use_id,
+        "result": {"content": content, "isError": is_error},
+    }
diff --git a/backend/src/apis/inference_api/chat/models.py b/backend/src/apis/inference_api/chat/models.py
index 731befad..082ff473 100644
--- a/backend/src/apis/inference_api/chat/models.py
+++ b/backend/src/apis/inference_api/chat/models.py
@@ -29,6 +29,49 @@ class InterruptResponseEntry(BaseModel):
     response: Any = None
 
 
+class AppToolCallEntry(BaseModel):
+    """An app-initiated `tools/call` proxied from an embedded MCP App.
+
+    MCP Apps PR #5. The iframe's JSON-RPC `tools/call` is relayed by
+    app-api to `/invocations` with this directive. When set, the route
+    does NOT run a model turn: it dispatches the single named tool against
+    the conversation's live MCP client (rebuilding the agent like a resume
+    so the client session/auth are wired identically), then returns the
+    `CallToolResult` and publishes synthesized `tool_use`/`tool_result`
+    into the conversation thread via the per-session event broker.
+
+    `tool_use_id` is the originating MCP App's tool-use id; proxied calls
+    inherit that conversation/iframe binding for provenance.
+    """
+
+    tool_use_id: str
+    tool_name: str
+    arguments: Dict[str, Any] = {}
+
+
+class AppContextUpdateEntry(BaseModel):
+    """App-supplied model context pushed via `ui/update-model-context`.
+
+    MCP Apps PR #6. The embedded App's JSON-RPC `ui/update-model-context`
+    is relayed by app-api to `/invocations` with this directive. Like
+    `app_tool_call` it runs NO model turn — it stashes the payload on the
+    conversation agent's Strands `agent.state` under
+    `mcp_apps.context[resource_uri]`. The next real user turn merges any
+    pending entries into that turn's prompt and clears them.
+
+    `resource_uri` is the bound MCP App resource (`ui://...`) and is the
+    dedupe key: the host keeps only the last update per resource between
+    turns (spec: "if multiple updates are received before the next user
+    message, Host SHOULD only send the last"). `content` /
+    `structured_content` mirror the spec's `ui/update-model-context`
+    params; at least one is set.
+    """
+
+    resource_uri: str
+    content: Optional[List[Dict[str, Any]]] = None
+    structured_content: Optional[Dict[str, Any]] = None
+
+
 class InvocationRequest(BaseModel):
     """Input for /invocations endpoint with multi-provider support"""
 
@@ -57,10 +100,24 @@ class InvocationRequest(BaseModel):
     # new one. `message` is ignored in that case — the original prompt is
     # already in the agent's interrupt context.
     interrupt_responses: Optional[List[InterruptResponseEntry]] = None
+    # When true, this is a "Continue" after a max_tokens truncation. Like a
+    # resume, `message` is ignored: instead of synthesizing a new user turn,
+    # the agent re-enters the loop with an empty prompt so the model
+    # continues the truncated assistant message already in restored history
+    # (assistant-prefill). Bypasses quota / RAG / file resolution like resume.
+    continue_truncated: Optional[bool] = None
     # Selects which agent factory variant builds the turn. Defaults to "chat"
     # (MainAgent / ChatAgent) when omitted, so existing clients are unaffected.
     # Pass "skill" to route through SkillAgent's progressive skill disclosure.
     agent_type: Optional[str] = None
+    # When set, this invocation is an app-initiated tools/call proxied from
+    # an embedded MCP App (PR #5). `message` is ignored; no model turn runs.
+    app_tool_call: Optional[AppToolCallEntry] = None
+    # When set, this invocation pushes app-supplied model context onto the
+    # conversation agent's state (PR #6, `ui/update-model-context`).
+    # `message` is ignored; no model turn runs. The context is merged into
+    # the next real user turn's prompt, then cleared.
+    app_context_update: Optional[AppContextUpdateEntry] = None
 
 
 class InvocationResponse(BaseModel):
diff --git a/backend/src/apis/inference_api/chat/routes.py b/backend/src/apis/inference_api/chat/routes.py
index 175c138a..6d74b569 100644
--- a/backend/src/apis/inference_api/chat/routes.py
+++ b/backend/src/apis/inference_api/chat/routes.py
@@ -11,10 +11,11 @@
 import json
 import logging
 import os
+import time
 from typing import AsyncGenerator, Union
 
 from fastapi import APIRouter, Depends, HTTPException, status
-from fastapi.responses import StreamingResponse
+from fastapi.responses import JSONResponse, StreamingResponse
 
 from agents.main_agent.core.model_config import KNOWN_CANONICAL_PARAMS
 from agents.main_agent.session.session_factory import SessionFactory
@@ -38,7 +39,14 @@
 
 from apis.shared.rbac.service import get_app_role_service
 from apis.shared.sessions.metadata import ensure_session_metadata_exists
+from apis.shared.user_settings.repository import UserSettingsRepository
 
+from .app_context_dispatch import (
+    AppContextUpdateError,
+    dispatch_app_context_update,
+    merge_and_clear_pending_context,
+)
+from .app_tool_dispatch import AppToolCallError, dispatch_app_tool_call
 from .models import FileContent, InvocationRequest
 from .service import generate_conversation_title, get_agent
 
@@ -82,6 +90,23 @@ def _sanitize_log(value: object) -> str:
     return text.translate(control_map)
 
 
+def _as_int_or_none(value: object) -> int | None:
+    """Coerce a numeric inference-param value to int for safety comparisons.
+
+    Inference params arrive untyped (``Dict[str, Any]`` from JSON), so an
+    integer bound can show up as a float (e.g. ``8192.0``). Returns ``None``
+    for bool / non-numeric values (including a ``thinking`` value an admin
+    pasted as a raw SDK dict) so callers skip the check rather than crash.
+    """
+    if isinstance(value, bool):
+        return None
+    if isinstance(value, int):
+        return value
+    if isinstance(value, float):
+        return int(value)
+    return None
+
+
 async def _find_managed_model(model_id: str | None):
     """Best-effort lookup of a managed-model record by external model ID."""
     if not model_id:
@@ -98,6 +123,37 @@ async def _find_managed_model(model_id: str | None):
     return None
 
 
+async def _resolve_user_default_model(user_id: str | None) -> tuple[str | None, str | None]:
+    """Look up the user's persisted defaultModelId and resolve its provider.
+
+    Returns ``(model_id, provider)``. When the request does not specify
+    ``model_id``, callers fall back to the user's saved preference; if that
+    is also unset (or the saved id no longer exists in managed models), the
+    callers in turn fall back to the agent factory's hardcoded default.
+
+    The lookup is best-effort: a settings failure (no table or DynamoDB
+    error) returns ``(None, None)`` so the turn proceeds on the system
+    default; a saved id with no managed record returns ``(id, None)``.
+    """
+    if not user_id:
+        return None, None
+    try:
+        repo = UserSettingsRepository()
+        if not repo.enabled:
+            return None, None
+        settings = await repo.get_settings(user_id)
+        saved_id = settings.get("defaultModelId")
+    except Exception:
+        logger.warning("Failed to load user settings for default model lookup", exc_info=True)
+        return None, None
+    if not saved_id:
+        return None, None
+
+    managed = await _find_managed_model(saved_id)
+    provider = managed.provider if managed else None
+    return saved_id, provider
+
+
 def _merge_inference_params(
     managed_model,
     request_params: dict,
@@ -141,6 +197,19 @@ def _merge_inference_params(
                 merged[name] = spec.default
             continue
 
+        # Enum params (e.g. `effort`): the override must be a member of the
+        # admin-declared `allowed` set; an out-of-domain value falls back to
+        # the default rather than erroring mid-stream. Mirrors the numeric
+        # clamp below, and the per-model `allowed` differences (Sonnet 4.6
+        # vs Opus 4.7) stay data, not code.
+        if spec.allowed is not None:
+            req = request_params.get(name)
+            if req is not None and req in spec.allowed:
+                merged[name] = req
+            elif spec.default is not None:
+                merged[name] = spec.default
+            continue
+
         if name in request_params and request_params[name] is not None:
             value = request_params[name]
             if isinstance(value, (int, float)):
@@ -176,15 +245,12 @@ def _merge_inference_params(
     # both are set and inconsistent, drop `thinking` so the response still
     # streams instead of erroring out — the user just doesn't get a
     # reasoning trace this turn. Logged so the gap is visible in metrics.
-    thinking = merged.get("thinking")
-    max_tokens = merged.get("max_tokens")
-    if (
-        isinstance(thinking, int)
-        and not isinstance(thinking, bool)
-        and isinstance(max_tokens, int)
-        and not isinstance(max_tokens, bool)
-        and thinking >= max_tokens
-    ):
+    # Coerce before comparing: both values can arrive as floats (untyped
+    # Dict[str, Any] from JSON), and an `isinstance(..., int)` gate would
+    # silently skip the check on float input and let the bad request through.
+    thinking = _as_int_or_none(merged.get("thinking"))
+    max_tokens = _as_int_or_none(merged.get("max_tokens"))
+    if thinking is not None and max_tokens is not None and thinking >= max_tokens:
         logger.warning(
             "Dropping thinking budget %d for model %s — not less than max_tokens %d",
             thinking,
@@ -230,6 +296,262 @@ async def _resolve_caching_enabled(model_id: str | None, explicit_caching_enable
     return caching
 
 
+# ============================================================
+# Spreadsheet Analysis Tool Injection
+# ============================================================
+
+SPREADSHEET_TOOL_IDS = {"list_spreadsheets", "analyze_spreadsheet"}
+
+
+def _build_spreadsheet_tools(
+    enabled_tools: list | None,
+    assistant_id: str | None,
+    session_id: str,
+    user_id: str,
+) -> list:
+    """Create context-bound spreadsheet analysis tools if enabled by the user."""
+    if not enabled_tools:
+        return []
+
+    requested = SPREADSHEET_TOOL_IDS.intersection(enabled_tools)
+    if not requested:
+        return []
+
+    from agents.builtin_tools.spreadsheet_analysis import make_list_spreadsheets_tool, make_analyze_tool
+
+    tools = []
+    if "list_spreadsheets" in requested:
+        tools.append(make_list_spreadsheets_tool(assistant_id, session_id, user_id))
+    if "analyze_spreadsheet" in requested:
+        tools.append(make_analyze_tool(assistant_id, session_id, user_id))
+
+    logger.info(f"Created {len(tools)} spreadsheet analysis tools (assistant={assistant_id})")
+    return tools
+
+
+# ============================================================
+# Artifact Authoring Tool Injection
+# ============================================================
+
+ARTIFACT_TOOL_IDS = {"create_artifact", "update_artifact"}
+
+
+def _build_artifact_tools(
+    enabled_tools: list | None,
+    session_id: str,
+    user_id: str,
+) -> list:
+    """Create context-bound artifact authoring tools if enabled by the user."""
+    if not enabled_tools:
+        return []
+
+    requested = ARTIFACT_TOOL_IDS.intersection(enabled_tools)
+    if not requested:
+        return []
+
+    from agents.builtin_tools.artifacts import (
+        make_create_artifact_tool,
+        make_update_artifact_tool,
+    )
+
+    tools = []
+    if "create_artifact" in requested:
+        tools.append(make_create_artifact_tool(session_id, user_id))
+    if "update_artifact" in requested:
+        tools.append(make_update_artifact_tool(session_id, user_id))
+
+    logger.info(f"Created {len(tools)} artifact authoring tools")
+    return tools
+
+
+# ============================================================
+# Attachment Partitioning (#206)
+# ============================================================
+
+def _estimate_decoded_size(file: "FileContent") -> int:
+    """Estimate decoded byte size of a base64-encoded FileContent payload.
+
+    Base64 inflates bytes by ~4/3, so decoded size ≈ len(b64) * 3 / 4.
+    This avoids allocating the full bytes just to check a threshold.
+    """
+    try:
+        # Account for base64 padding: strip "=" padding before estimating.
+        stripped = (file.bytes or "").rstrip("=")
+        return (len(stripped) * 3) // 4
+    except Exception:
+        return 0
+
+
+def _partition_attachments(
+    all_files: list,
+) -> tuple[list, list, list]:
+    """Split attachments into (inline_for_bedrock, tabular, oversized_non_tabular).
+
+    - Tabular files (csv/xlsx) are never sent inline — they route through
+      the spreadsheet analysis tools, so XLSX files that expand during
+      internal parsing cannot exceed Bedrock's 4.5MB document limit.
+    - Non-tabular files larger than INLINE_DOCUMENT_MAX_BYTES are dropped
+      from the inline set with a user-facing note, to prevent mid-stream
+      ValidationException on the raw AWS error path.
+    - Everything else rides along as a regular document/image content block.
+    """
+    from apis.shared.files.models import INLINE_DOCUMENT_MAX_BYTES, is_tabular_file
+
+    inline: list = []
+    tabular: list = []
+    oversized: list = []
+
+    for file in all_files:
+        if is_tabular_file(file.filename, file.content_type):
+            tabular.append(file)
+            continue
+        # Only size-gate non-image documents. Images have their own Bedrock
+        # limits, and the prompt builder reroutes them as image blocks,
+        # which are not affected by the document-size cap.
+        content_type = (file.content_type or "").lower()
+        is_image = content_type.startswith("image/")
+        if not is_image and _estimate_decoded_size(file) > INLINE_DOCUMENT_MAX_BYTES:
+            oversized.append(file)
+            continue
+        inline.append(file)
+
+    return inline, tabular, oversized
+
+
+def _build_attachment_guidance(
+    diverted_tabular: list,
+    oversized_inline: list,
+    enabled_tools: list | None,
+) -> str:
+    """Return a short markdown addendum describing how attachments will be
+    handled, to append to the user's message so the agent (and the user)
+    both understand why a file isn't inline.
+    """
+    parts: list[str] = []
+
+    if diverted_tabular:
+        names = ", ".join(f"`{f.filename}`" for f in diverted_tabular)
+        tool_is_enabled = bool(enabled_tools) and (
+            "analyze_spreadsheet" in enabled_tools or "list_spreadsheets" in enabled_tools
+        )
+        if tool_is_enabled:
+            parts.append(
+                f"_Attached spreadsheet(s) {names} are available through the "
+                f"Spreadsheet Analysis tool rather than inline — use "
+                f"`list_spreadsheets` to see them and `analyze_spreadsheet` "
+                f"to run aggregations or lookups._"
+            )
+        else:
+            parts.append(
+                f"_Attached spreadsheet(s) {names} are not sent inline. "
+                f"To analyze them, enable **Spreadsheet Analysis** "
+                f"in the Tools section of the settings panel (gear icon next "
+                f"to the message input), then re-send your message._"
+            )
+
+    if oversized_inline:
+        names = ", ".join(f"`{f.filename}`" for f in oversized_inline)
+        parts.append(
+            f"_Attached file(s) {names} exceed the inline document size limit "
+            f"and were skipped. Try a smaller file, or convert to CSV/XLSX "
+            f"and use the Spreadsheet Analysis tool._"
+        )
+
+    return "\n\n".join(parts)
+
+
+async def _build_tabular_inventory(
+    session_id: str,
+    assistant_id: str | None,
+    enabled_tools: list | None,
+) -> str:
+    """Inventory every tabular file visible to the agent this turn, and
+    prepend it to the user message when more than one exists.
+
+    Motivation: when the vector search returns chunks from multiple source
+    files with identical schemas (e.g. two monthly FY ledgers), the model
+    has no way to tell there's more than one spreadsheet at all — RAG
+    surfaces chunk content but not a full file inventory. The model picks
+    whichever file yielded the first high-ranked chunk and silently runs
+    analyze_spreadsheet against just that one. The user's "total" is
+    wrong by exactly the other file(s).
+
+    We ship the file list inline so the agent sees the full set at turn
+    start and can call list_spreadsheets / pick deliberately / ask the
+    user / aggregate across files. Only emitted when the analysis tools
+    are enabled (otherwise the agent can't act on it anyway) and when at
+    least two tabular files exist (one file isn't ambiguous).
+    """
+    if not enabled_tools:
+        return ""
+    tool_is_enabled = (
+        "analyze_spreadsheet" in enabled_tools
+        or "list_spreadsheets" in enabled_tools
+    )
+    if not tool_is_enabled:
+        return ""
+
+    # Lazy imports to avoid pulling the agent layer into module-load time
+    # on cold starts where this code path isn't exercised.
+    try:
+        from agents.builtin_tools.spreadsheet_analysis.list_spreadsheets_tool import (
+            _get_kb_files,
+            _get_session_files,
+        )
+    except Exception:
+        return ""
+
+    files: list[dict] = []
+    try:
+        if assistant_id:
+            files.extend(await _get_kb_files(assistant_id))
+        files.extend(await _get_session_files(session_id))
+    except Exception:
+        logger.warning("Failed to enumerate tabular files for inventory", exc_info=True)
+        return ""
+
+    # De-duplicate by (filename, source) — a single file shouldn't be
+    # listed twice if our lookups overlap.
+    seen: set[tuple[str, str]] = set()
+    unique: list[dict] = []
+    for f in files:
+        key = (f.get("filename", ""), f.get("source", ""))
+        if key in seen:
+            continue
+        seen.add(key)
+        unique.append(f)
+
+    if len(unique) < 2:
+        # Single file: no ambiguity, and list_spreadsheets covers discovery
+        # for the agent if it ever needs it.
+        return ""
+
+    def _fmt_size(n: int) -> str:
+        if n >= 1024 * 1024:
+            return f"{n / (1024 * 1024):.1f} MB"
+        if n >= 1024:
+            return f"{n // 1024} KB"
+        return f"{n} B"
+
+    lines = []
+    for f in unique:
+        name = f.get("filename", "")
+        source = "knowledge base" if f.get("source") == "knowledge_base" else "chat attachment"
+        size = _fmt_size(int(f.get("size_bytes") or 0))
+        lines.append(f"- `{name}` ({source}, {size})")
+
+    listing = "\n".join(lines)
+    return (
+        f"_Multiple spreadsheet files are attached. Before running "
+        f"`analyze_spreadsheet`, decide which file(s) the user's request "
+        f"refers to — if it's ambiguous or spans multiple files, call "
+        f"`list_spreadsheets` and/or ask the user rather than picking one "
+        f"silently. State which file(s) you analyzed in your response._\n\n"
+        f"**Available spreadsheets:**\n{listing}"
+    )
+
+
+
 # ============================================================
 # Helper Functions for Streaming Error/Status Messages
 # ============================================================
@@ -316,8 +638,24 @@ async def stream_conversational_message(
 
 @router.get("/ping")
 async def ping():
-    """Health check endpoint (required by AgentCore Runtime)"""
-    return {"status": "healthy", "version": os.environ.get("APP_VERSION", "unknown")}
+    """Health check endpoint (required by AgentCore Runtime).
+
+    AgentCore's idle reaper requires ``time_of_last_update`` (int epoch
+    seconds) alongside ``status``. When the field is absent the platform
+    reaps the microVM at ``idleRuntimeSessionTimeout`` even mid-stream,
+    regardless of the reported status (bedrock-agentcore-sdk-python#471).
+
+    We do not run the SDK's async-task busy tracking here (that's the
+    deferred ``async_mode`` work), so we cannot report ``HealthyBusy``.
+    Returning a fresh timestamp on every ping keeps the session alive
+    while the runtime data plane is polling us, which is the documented
+    mitigation for the silent mid-generation reap.
+    """
+    return {
+        "status": "Healthy",
+        "time_of_last_update": int(time.time()),
+        "version": os.environ.get("APP_VERSION", "unknown"),
+    }
 
 
 @router.post("/invocations")
@@ -341,11 +679,107 @@ async def invocations(request: InvocationRequest, current_user: User = Depends(g
     # they bypass quota, file resolution, and RAG augmentation because those
     # already ran on the original turn that got paused.
     is_resume = bool(input_data.interrupt_responses)
+    # A "Continue" after a max_tokens truncation. Like resume, it bypasses
+    # quota / RAG / file resolution and does NOT clear the turn state; unlike
+    # resume there is no interrupt to validate — the agent is rebuilt from the
+    # resent params and re-entered with an empty prompt (assistant-prefill).
+    is_continuation = bool(input_data.continue_truncated)
     logger.info(
-        "Invocation request received (resume=%s)" % is_resume
+        "Invocation request received (resume=%s, continue_truncated=%s)" % (is_resume, is_continuation)
     )
     logger.info("Message received")
 
+    # App-initiated tools/call (MCP Apps PR #5). Like resume/continuation it
+    # bypasses quota / RAG / file resolution / title — there is no model
+    # turn. We rebuild the conversation agent (so the MCP client session +
+    # auth are wired exactly as for a model-driven call), dispatch the one
+    # named tool, publish synthesized tool_use/tool_result into the thread
+    # via the per-session broker, and return the CallToolResult as JSON for
+    # app-api to relay back to the iframe. Inert when the host flag is off
+    # (the UIToolCatalog is then empty, so dispatch rejects every call as
+    # not app-visible).
+    if input_data.app_tool_call is not None:
+        atc = input_data.app_tool_call
+        try:
+            request_inference_params = dict(input_data.inference_params or {})
+            caching_enabled, inference_params = await _resolve_model_settings(
+                model_id=input_data.model_id,
+                explicit_caching_enabled=input_data.caching_enabled,
+                request_inference_params=request_inference_params,
+            )
+            agent = await get_agent(
+                session_id=input_data.session_id,
+                user_id=user_id,
+                auth_token=auth_token,
+                enabled_tools=input_data.enabled_tools,
+                model_id=input_data.model_id,
+                system_prompt=input_data.system_prompt,
+                caching_enabled=caching_enabled,
+                provider=input_data.provider,
+                inference_params=inference_params,
+                agent_type=input_data.agent_type,
+                is_resume=False,
+            )
+            payload = await dispatch_app_tool_call(
+                agent,
+                session_id=input_data.session_id,
+                user_id=user_id,
+                tool_use_id=atc.tool_use_id,
+                tool_name=atc.tool_name,
+                arguments=atc.arguments,
+            )
+            return JSONResponse(payload)
+        except AppToolCallError as e:
+            return JSONResponse({"error": e.message}, status_code=e.code)
+        except HTTPException:
+            raise
+        except Exception:
+            logger.error("app tools/call invocation failed", exc_info=True)
+            return JSONResponse({"error": "Internal error"}, status_code=500)
+
+    # App-pushed model context (MCP Apps PR #6, `ui/update-model-context`).
+    # Like app_tool_call it bypasses quota / RAG / file resolution / title
+    # and runs NO model turn — we rebuild the conversation agent (so the
+    # same cached `agent.state` is reused) and stash the payload under
+    # `mcp_apps.context[resource_uri]`. The next real user turn merges and
+    # clears it. Inert when the host flag is off (no live App ever calls this).
+    if input_data.app_context_update is not None:
+        acu = input_data.app_context_update
+        try:
+            request_inference_params = dict(input_data.inference_params or {})
+            caching_enabled, inference_params = await _resolve_model_settings(
+                model_id=input_data.model_id,
+                explicit_caching_enabled=input_data.caching_enabled,
+                request_inference_params=request_inference_params,
+            )
+            agent = await get_agent(
+                session_id=input_data.session_id,
+                user_id=user_id,
+                auth_token=auth_token,
+                enabled_tools=input_data.enabled_tools,
+                model_id=input_data.model_id,
+                system_prompt=input_data.system_prompt,
+                caching_enabled=caching_enabled,
+                provider=input_data.provider,
+                inference_params=inference_params,
+                agent_type=input_data.agent_type,
+                is_resume=False,
+            )
+            payload = dispatch_app_context_update(
+                agent,
+                resource_uri=acu.resource_uri,
+                content=acu.content,
+                structured_content=acu.structured_content,
+            )
+            return JSONResponse(payload)
+        except AppContextUpdateError as e:
+            return JSONResponse({"error": e.message}, status_code=e.code)
+        except HTTPException:
+            raise
+        except Exception:
+            logger.error("app context update invocation failed", exc_info=True)
+            return JSONResponse({"error": "Internal error"}, status_code=500)
+
     if input_data.enabled_tools:
         logger.info(f"Enabled tools ({len(input_data.enabled_tools)})")
 
@@ -357,8 +791,18 @@ async def invocations(request: InvocationRequest, current_user: User = Depends(g
     if input_data.file_upload_ids:
         logger.info(f"File upload IDs: {len(input_data.file_upload_ids)} IDs to resolve")
 
-    # Resolve file upload IDs to FileContent objects
-    files_to_send = list(input_data.files) if input_data.files else []
+    # Resolve file upload IDs to FileContent objects, then partition:
+    #   - inline_files: images + non-tabular documents that Bedrock can
+    #     ingest directly as document content blocks
+    #   - tabular_files: csv/xlsx, which we intentionally NEVER send inline
+    #     because XLSX in particular inflates dramatically inside Bedrock
+    #     (1.4MB zipped → >4.5MB internal, triggering ValidationException).
+    #     They remain available to the agent via list_spreadsheets /
+    #     analyze_spreadsheet, which run pandas on the real file. See #206.
+    #   - oversized_files: non-tabular docs that exceed our inline size
+    #     budget; we skip them inline and surface a note instead of
+    #     letting Bedrock reject the turn.
+    all_files = list(input_data.files) if input_data.files else []
 
     if input_data.file_upload_ids:
         try:
@@ -368,14 +812,56 @@ async def invocations(request: InvocationRequest, current_user: User = Depends(g
                 upload_ids=input_data.file_upload_ids,
                 max_files=5,  # Bedrock document limit
             )
-            # Convert ResolvedFileContent to FileContent
             for rf in resolved_files:
-                files_to_send.append(FileContent(filename=rf.filename, content_type=rf.content_type, bytes=rf.bytes))
+                all_files.append(
+                    FileContent(filename=rf.filename, content_type=rf.content_type, bytes=rf.bytes)
+                )
             logger.info(f"Resolved {len(resolved_files)} files from upload IDs")
-        except Exception as e:
-            logger.warning("Failed to resolve file upload IDs")
+        except Exception:
+            logger.warning("Failed to resolve file upload IDs", exc_info=True)
             # Continue without files rather than failing the request
 
+    # Deduplicate files by case-insensitive (filename, content_type) before partitioning.
+    # The same file can arrive via both `files` (direct base64) and
+    # `file_upload_ids` (resolved from S3), or a client may submit the same
+    # upload ID twice. Sending two document blocks with the same sanitized
+    # name to Bedrock ConverseStream raises:
+    #   ValidationException: Messages can't contain duplicate document names.
+    # We keep the first occurrence and drop subsequent duplicates.
+    if all_files:
+        seen_file_keys: set = set()
+        deduped_files = []
+        for f in all_files:
+            key = (f.filename.lower(), f.content_type.lower())
+            if key not in seen_file_keys:
+                seen_file_keys.add(key)
+                deduped_files.append(f)
+            else:
+                logger.info(
+                    "Dropping duplicate file attachment: %s (%s)",
+                    f.filename,
+                    f.content_type,
+                )
+        if len(deduped_files) < len(all_files):
+            logger.info(
+                "Deduplicated %d -> %d file(s) before sending to Bedrock",
+                len(all_files),
+                len(deduped_files),
+            )
+        all_files = deduped_files
+
+    files_to_send, diverted_tabular, oversized_inline = _partition_attachments(all_files)
+    if diverted_tabular:
+        logger.info(
+            f"Diverted {len(diverted_tabular)} tabular file(s) from inline document blocks; "
+            f"available via spreadsheet tools: {[f.filename for f in diverted_tabular]}"
+        )
+    if oversized_inline:
+        logger.warning(
+            f"Skipped {len(oversized_inline)} oversized file(s) (> inline limit): "
+            f"{[(f.filename, _estimate_decoded_size(f)) for f in oversized_inline]}"
+        )
+
     # Pre-create session metadata so OAuth interrupts and other state can
     # attach to the session row from turn one. Best-effort; on failure the
     # post-stream lazy-create in StreamCoordinator still covers it.
@@ -386,7 +872,7 @@ async def invocations(request: InvocationRequest, current_user: User = Depends(g
     # later (mistaken) resume request pick up against a turn the user
     # already moved past.
     is_new_session = False
-    if not is_resume:
+    if not is_resume and not is_continuation:
         is_new_session = await ensure_session_metadata_exists(input_data.session_id, user_id)
         try:
             from apis.shared.sessions.metadata import clear_paused_turn
@@ -394,6 +880,17 @@ async def invocations(request: InvocationRequest, current_user: User = Depends(g
         except Exception as e:
             logger.error("Failed to clear stale paused_turn on new turn: %s", e, exc_info=True)
 
+    # Invalidate any prior max_tokens "Continue" marker on every new model
+    # turn that isn't an interrupt-resume — both a fresh turn and a
+    # continuation supersede it. If a continuation itself re-truncates, the
+    # stream_coordinator intercept re-sets the marker.
+    if not is_resume:
+        try:
+            from apis.shared.sessions.metadata import clear_truncated_turn
+            await clear_truncated_turn(input_data.session_id, user_id)
+        except Exception as e:
+            logger.error("Failed to clear stale truncated_turn on new turn: %s", e, exc_info=True)
+
     # First turn → kick off title generation concurrently with the stream.
     # Runs as a background task so it doesn't add latency to TTFT. The
     # targeted UpdateExpression in update_session_title is race-safe with
@@ -410,7 +907,7 @@ async def invocations(request: InvocationRequest, current_user: User = Depends(g
     # Check quota if enforcement is enabled
     quota_warning_event = None
     quota_exceeded_event = None
-    if is_quota_enforcement_enabled() and not is_resume:
+    if is_quota_enforcement_enabled() and not is_resume and not is_continuation:
         try:
             quota_checker = get_quota_checker()
             quota_result = await quota_checker.check_quota(user=current_user, session_id=input_data.session_id)
@@ -469,7 +966,7 @@ async def invocations(request: InvocationRequest, current_user: User = Depends(g
         "Invocation request - processing with assistant context"
     )
 
-    if input_data.rag_assistant_id and not is_resume:
+    if input_data.rag_assistant_id and not is_resume and not is_continuation:
         # Local imports to avoid circular dependency
         from apis.shared.assistants.rag_service import (
             augment_prompt_with_context,
@@ -637,11 +1134,25 @@ async def invocations(request: InvocationRequest, current_user: User = Depends(g
             try:
                 existing_metadata = await get_session_metadata(input_data.session_id, user_id)
                 if existing_metadata:
-                    # Update existing metadata with assistant_id in preferences
-                    prefs_dict = existing_metadata.preferences.model_dump(by_alias=False) if existing_metadata.preferences else {}
+                    # Update existing metadata: merge assistant_id into the
+                    # preferences sub-model. The top-level SessionMetadata has
+                    # no assistant_id field, so applying the update there
+                    # (previous behavior) silently did nothing under
+                    # extra="allow" and left preferences.assistant_id=None.
+                    # That broke the mid-session validation above on turn 2+
+                    # because the check relies on preferences.assistant_id to
+                    # recognize an already-attached assistant (#205).
+                    prefs_dict = (
+                        existing_metadata.preferences.model_dump(by_alias=False)
+                        if existing_metadata.preferences
+                        else {}
+                    )
                     prefs_dict["assistant_id"] = input_data.rag_assistant_id
+                    merged_preferences = SessionPreferences(**prefs_dict)
 
-                    updated_metadata = existing_metadata.model_copy(update={"assistant_id": input_data.rag_assistant_id})
+                    updated_metadata = existing_metadata.model_copy(
+                        update={"preferences": merged_preferences}
+                    )
 
                 else:
                     # Create new metadata with assistant_id in preferences
@@ -739,10 +1250,34 @@ async def invocations(request: InvocationRequest, current_user: User = Depends(g
             if input_data.max_tokens is not None:
                 request_inference_params.setdefault("max_tokens", input_data.max_tokens)
 
+            # Resolve the user's persisted default when the request does
+            # not pin a model. Without this, a "no default selected" client
+            # always lands on the hardcoded factory default and the user's
+            # saved preference is silently ignored at chat time (#161).
+            effective_model_id = input_data.model_id
+            effective_provider = input_data.provider
+            if not effective_model_id:
+                user_default_id, user_default_provider = await _resolve_user_default_model(user_id)
+                if user_default_id:
+                    # Re-check model access against the resolved id. The
+                    # earlier guard only ran on `input_data.model_id`, so a
+                    # stale saved default the user no longer has rights to
+                    # would otherwise sneak past RBAC here.
+                    app_role_service = get_app_role_service()
+                    if await app_role_service.can_access_model(current_user, user_default_id):
+                        effective_model_id = user_default_id
+                        if not effective_provider and user_default_provider:
+                            effective_provider = user_default_provider
+                        logger.info("Applied user default model from settings")
+                    else:
+                        logger.info(
+                            "User default model exists but RBAC denies access; falling back to system default"
+                        )
+
             # Single registry lookup resolves caching + inference params,
             # merging admin defaults with request overrides.
             caching_enabled, inference_params = await _resolve_model_settings(
-                model_id=input_data.model_id,
+                model_id=effective_model_id,
                 explicit_caching_enabled=input_data.caching_enabled,
                 request_inference_params=request_inference_params,
             )
@@ -750,17 +1285,39 @@ async def invocations(request: InvocationRequest, current_user: User = Depends(g
             if caching_enabled is False:
                 logger.info("Prompt caching disabled for model")
 
+            # Get agent instance with user-specific configuration
+            # AgentCore Memory tracks preferences across sessions per user_id
+            # Supports multiple LLM providers: AWS Bedrock, OpenAI, and Google Gemini
+            # Use augmented message and assistant system prompt if assistant RAG was applied
+
+            # Spreadsheet tools scoped to the assistant's document corpus,
+            # when an assistant is attached to this request. The frontend
+            # keeps the assistant id in the URL for the whole session's
+            # lifetime, so we can trust `input_data.rag_assistant_id`
+            # directly; no preferences fallback needed.
+            extra_tools = _build_spreadsheet_tools(
+                enabled_tools=input_data.enabled_tools,
+                assistant_id=input_data.rag_assistant_id,
+                session_id=input_data.session_id,
+                user_id=user_id,
+            ) + _build_artifact_tools(
+                enabled_tools=input_data.enabled_tools,
+                session_id=input_data.session_id,
+                user_id=user_id,
+            )
+
             agent = await get_agent(
                 session_id=input_data.session_id,
                 user_id=user_id,
                 auth_token=auth_token,
                 enabled_tools=input_data.enabled_tools,
-                model_id=input_data.model_id,
+                model_id=effective_model_id,
                 system_prompt=system_prompt,  # Use assistant's instructions if available
                 caching_enabled=caching_enabled,
-                provider=input_data.provider,
+                provider=effective_provider,
                 inference_params=inference_params,
                 agent_type=input_data.agent_type,
+                extra_tools=extra_tools,
                 is_resume=False,
             )
 
@@ -821,11 +1378,45 @@ async def stream_with_quota_warning() -> AsyncGenerator[str, None]:
             # will be modified before reaching the model. This happens when:
             #   1. RAG augmentation prepends context chunks to the message
             #   2. File attachments cause PromptBuilder to rewrite into ContentBlocks
+            #   3. Attachment guidance is appended (tabular routed to tools, etc.)
             # The original text becomes the single source of truth for UI display,
             # while the full augmented prompt stays in AgentCore Memory for the LLM.
+            attachment_guidance = _build_attachment_guidance(
+                diverted_tabular, oversized_inline, input_data.enabled_tools
+            )
+            # When multiple spreadsheets are visible, ship the full inventory
+            # up front so the agent can disambiguate intentionally instead of
+            # silently picking whichever file the vector search ranked first.
+            tabular_inventory = await _build_tabular_inventory(
+                session_id=input_data.session_id,
+                assistant_id=input_data.rag_assistant_id,
+                enabled_tools=input_data.enabled_tools,
+            )
+            # Bind to a new local so we don't trip Python's local-scope rules
+            # inside this generator closure (augmented_message is defined in
+            # the outer function; reassigning it here would make the name
+            # local and raise UnboundLocalError on any read before assignment).
+            final_message = augmented_message
+            if attachment_guidance:
+                final_message = f"{final_message}\n\n{attachment_guidance}"
+            if tabular_inventory:
+                final_message = f"{final_message}\n\n{tabular_inventory}"
+
+            # MCP Apps PR #6: drain any context an embedded App pushed via
+            # `ui/update-model-context` since the last turn and prepend it
+            # to this turn only. Skipped on resume/continuation (Strands
+            # ignores `final_message` there) so a pending update survives
+            # until the next real user turn instead of being silently
+            # cleared. Kept out of persisted history via the
+            # `original_message` path below (cache-prefix-safe).
+            if not is_resume and not is_continuation:
+                pending_ctx_block = merge_and_clear_pending_context(agent)
+                if pending_ctx_block:
+                    final_message = f"{pending_ctx_block}\n\n{final_message}"
+
             message_will_be_modified = (
-                augmented_message != input_data.message  # RAG augmentation
-                or bool(files_to_send)                   # File attachments
+                final_message != input_data.message  # RAG augmentation / attachment guidance / inventory
+                or bool(files_to_send)               # File attachments
             )
             # Strands' resume protocol wants each entry wrapped as
             # {"interruptResponse": {...}}. The InvocationRequest schema
@@ -838,12 +1429,13 @@ async def stream_with_quota_warning() -> AsyncGenerator[str, None]:
             )
 
             async for event in agent.stream_async(
-                augmented_message,
+                final_message,
                 session_id=input_data.session_id,
                 files=files_to_send if files_to_send else None,
                 citations=citations_for_storage if citations_for_storage else None,
                 original_message=input_data.message if message_will_be_modified else None,
                 interrupt_responses=interrupt_responses_payload,
+                continue_truncated=is_continuation,
             ):
                 yield event
 
diff --git a/backend/src/apis/inference_api/chat/service.py b/backend/src/apis/inference_api/chat/service.py
index 168da1e7..9bd0ff19 100644
--- a/backend/src/apis/inference_api/chat/service.py
+++ b/backend/src/apis/inference_api/chat/service.py
@@ -118,6 +118,7 @@ async def get_agent(
     provider: Optional[str] = None,
     max_tokens: Optional[int] = None,
     agent_type: Optional[str] = None,
+    extra_tools: Optional[list] = None,
     inference_params: Optional[Dict[str, Any]] = None,
     is_resume: bool = False,
 ) -> BaseAgent:
@@ -172,8 +173,7 @@ async def get_agent(
         agent_type=agent_type,
     )
 
-    # Check cache
-    if cache_key in _agent_cache:
+    if not extra_tools and cache_key in _agent_cache:
         cached = _agent_cache[cache_key]
         # Defense in depth: a non-resume request should never be served a
         # paused agent. If we ever desync the cache key between the original
@@ -210,6 +210,8 @@ async def get_agent(
         system_prompt=system_prompt,
         caching_enabled=caching_enabled,
         provider=provider,
+        max_tokens=max_tokens,
+        extra_tools=extra_tools,
         inference_params=merged_params,
     )
 
@@ -218,6 +220,11 @@ async def get_agent(
     if hasattr(agent, "_construction_snapshot"):
         agent._construction_snapshot["agent_type"] = resolved_agent_type
 
+    # Don't cache agents with context-bound extra_tools
+    if extra_tools:
+        logger.debug("⏭️ Skipping cache for agent with extra_tools")
+        return agent
+
     # Add to cache with LRU eviction
     if len(_agent_cache) >= _CACHE_MAX_SIZE:
         # Remove oldest entry (first inserted)
diff --git a/backend/src/apis/shared/auth/__init__.py b/backend/src/apis/shared/auth/__init__.py
index 4e3d7e61..87ac8723 100644
--- a/backend/src/apis/shared/auth/__init__.py
+++ b/backend/src/apis/shared/auth/__init__.py
@@ -1,12 +1,11 @@
 """Shared authentication utilities for API projects."""
 
-from .dependencies import get_current_user, get_current_user_from_session, security
+from .dependencies import get_current_user_from_session, security
 from .models import User
 from .state_store import StateStore, InMemoryStateStore, DynamoDBStateStore, create_state_store
 from .rbac import require_app_roles, require_admin
 
 __all__ = [
-    "get_current_user",
     "get_current_user_from_session",
     "security",
     "User",
diff --git a/backend/src/apis/shared/auth/dependencies.py b/backend/src/apis/shared/auth/dependencies.py
index e4e48417..f492fcbe 100644
--- a/backend/src/apis/shared/auth/dependencies.py
+++ b/backend/src/apis/shared/auth/dependencies.py
@@ -134,57 +134,10 @@ async def _sync_user_background(sync_service, user: User) -> None:
         # Log but don't fail - sync should never break authentication
         logger.warning(f"Failed to sync user {user.user_id}: {e}")
 
-# Lazy-initialized Cognito validator singleton
-_cognito_validator = None
-
-
-def _get_cognito_validator():
-    """
-    Get the CognitoJWTValidator singleton instance.
-
-    Reads Cognito configuration from environment variables:
-    - COGNITO_USER_POOL_ID: The Cognito User Pool ID
-    - COGNITO_APP_CLIENT_ID: The Cognito App Client ID
-    - COGNITO_REGION or AWS_REGION: The AWS region
-
-    Returns None if required environment variables are not set.
-    """
-    global _cognito_validator
-    if _cognito_validator is not None:
-        return _cognito_validator
-
-    try:
-        from .cognito_jwt_validator import CognitoJWTValidator
-
-        user_pool_id = os.environ.get("COGNITO_USER_POOL_ID")
-        app_client_id = os.environ.get("COGNITO_APP_CLIENT_ID")
-        region = os.environ.get("COGNITO_REGION") or os.environ.get("AWS_REGION")
-
-        if not user_pool_id or not app_client_id or not region:
-            logger.warning(
-                "Cognito environment variables not fully configured. "
-                "Required: COGNITO_USER_POOL_ID, COGNITO_APP_CLIENT_ID, "
-                "COGNITO_REGION (or AWS_REGION)"
-            )
-            return None
-
-        _cognito_validator = CognitoJWTValidator(
-            user_pool_id=user_pool_id,
-            app_client_id=app_client_id,
-            region=region,
-        )
-        logger.info("CognitoJWTValidator initialized for Cognito auth")
-    except Exception as e:
-        logger.error(f"Failed to initialize CognitoJWTValidator: {e}", exc_info=True)
-
-    return _cognito_validator
-
-
-# Separate validator for the BFF confidential client. Phase 1 CDK provisions
-# COGNITO_BFF_APP_CLIENT_ID alongside the SPA's COGNITO_APP_CLIENT_ID, and the
-# refresh exchange in `sessions_bff.refresh` issues against the BFF client —
-# so tokens carry `client_id = COGNITO_BFF_APP_CLIENT_ID` and would be rejected
-# by the SPA validator's client_id check.
+# Cognito JWT validator for tokens minted by the BFF confidential client.
+# The SPA-public PKCE client was decommissioned in Phase 7; all browser-facing
+# auth now flows through the BFF, which issues tokens carrying
+# `client_id = COGNITO_BFF_APP_CLIENT_ID`.
 _bff_cognito_validator = None
 
 
@@ -227,65 +180,26 @@ def _get_bff_cognito_validator():
     return _bff_cognito_validator
 
 
-async def get_current_user(
-    credentials: Optional[HTTPAuthorizationCredentials] = Depends(security)
-) -> User:
-    """
-    FastAPI dependency to get the current authenticated user.
-
-    Validates the JWT token using the CognitoJWTValidator against
-    the configured Cognito User Pool.
-
-    Args:
-        credentials: HTTP Bearer token credentials (None if missing)
-
-    Returns:
-        User object with authenticated user information
+def _skip_auth_user() -> Optional[User]:
+    """Return a fake admin user when SKIP_AUTH=true, else None.
 
-    Raises:
-        HTTPException:
-            - 401 if token is missing or invalid
-            - 500 if no JWT validator is available
+    Local-dev-only bypass so an unattended agent (or a dev with no IdP
+    access) can hit protected routes without the OAuth round-trip. The
+    startup check in `app_api/main.py` refuses to boot when this is
+    combined with deployed-environment indicators.
     """
-    # Check if credentials are missing
-    if credentials is None:
-        raise HTTPException(
-            status_code=status.HTTP_401_UNAUTHORIZED,
-            detail="Authentication required. Please provide a valid Bearer token in the Authorization header.",
-            headers={"WWW-Authenticate": "Bearer"},
-        )
-
-    token = credentials.credentials
-
-    validator = _get_cognito_validator()
-    if validator:
-        try:
-            user = validator.validate_token(token)
-            user.raw_token = token
-
-            # Enrich with stored profile (email, name) when using access tokens
-            await _enrich_user_from_store(user)
-
-            # Fire-and-forget sync to Users table
-            sync_service = _get_user_sync_service()
-            if sync_service and sync_service.enabled:
-                asyncio.create_task(_sync_user_background(sync_service, user))
-
-            return user
-        except HTTPException:
-            raise
-        except Exception as e:
-            logger.error(f"Token validation failed: {e}", exc_info=True)
-            raise HTTPException(
-                status_code=status.HTTP_401_UNAUTHORIZED,
-                detail="Authentication failed."
-            )
-
-    # No validator available - Cognito not configured
-    logger.error("No JWT validator available. Ensure Cognito environment variables are configured.")
-    raise HTTPException(
-        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-        detail="Authentication service not configured. Cognito environment variables are missing."
+    if os.environ.get("SKIP_AUTH", "").lower() != "true":
+        return None
+    roles = [
+        r.strip()
+        for r in os.environ.get("SKIP_AUTH_ROLES", "admin").split(",")
+        if r.strip()
+    ]
+    return User(
+        user_id=os.environ.get("SKIP_AUTH_USER_ID", "local-dev"),
+        email=os.environ.get("SKIP_AUTH_EMAIL", "dev@local"),
+        name="Local Dev",
+        roles=roles,
     )
 
 
@@ -295,19 +209,22 @@ async def get_current_user_from_session(request: Request) -> User:
     `SessionRefreshMiddleware` is responsible for unsealing the cookie,
     looking up the session row, refreshing the access token if needed, and
     attaching the resulting `SessionRecord` to `request.state.bff_session`.
-    This dependency just consumes that and reuses the existing Cognito JWT
-    validator + profile-enrichment pipeline so RBAC, user-profile cache,
-    and the fire-and-forget user-sync background task all behave identically
-    to the Bearer path.
+    This dependency consumes that and runs the access token through the BFF
+    Cognito validator + profile-enrichment pipeline so RBAC, the user-profile
+    cache, and the fire-and-forget user-sync background task all behave
+    consistently with the rest of the auth surface.
 
-    Phase 2 ships this dependency dormant — no router consumes it yet. Phase
-    6 cuts each router over from `get_current_user` to this one as part of
-    the per-environment cutover.
+    This is the only user-facing auth dependency in `app_api/`. External
+    Bearer callers were retired in the BFF migration (API keys and voice
+    handle their own auth).
 
     Raises:
         HTTPException 401 if no session was resolved by the upstream
         middleware (cookie missing, malformed, or session record gone).
     """
+    if (fake := _skip_auth_user()) is not None:
+        return fake
+
     record = getattr(request.state, "bff_session", None)
     if record is None:
         raise HTTPException(
@@ -379,9 +296,10 @@ async def get_current_user_trusted(
     skips expensive signature verification and simply extracts standard
     Cognito/OIDC claims from the token.
 
-    Security: Only use this in services where the JWT validation
-    is guaranteed. IE AgentCore Runtime with Inbound Auth. For services without pre-validation, use
-    get_current_user() instead.
+    Security: Only use this in services where JWT validation is
+    guaranteed at the network layer (e.g. AgentCore Runtime with Inbound
+    Auth). For cookie-authenticated user-facing routes use
+    `get_current_user_from_session` instead.
 
     Args:
         credentials: HTTP Bearer token credentials (None if missing)
@@ -395,6 +313,9 @@ async def get_current_user_trusted(
     """
     logger.debug("[get_current_user_trusted] Trusted auth extraction started")
 
+    if (fake := _skip_auth_user()) is not None:
+        return fake
+
     # Check if credentials are missing
     if credentials is None:
         logger.debug("[get_current_user_trusted] No credentials provided - returning 401")
diff --git a/backend/src/apis/shared/errors.py b/backend/src/apis/shared/errors.py
index 3a504aeb..c24a0d02 100644
--- a/backend/src/apis/shared/errors.py
+++ b/backend/src/apis/shared/errors.py
@@ -28,6 +28,7 @@ class ErrorCode(str, Enum):
     TOOL_ERROR = "tool_error"
     MODEL_ERROR = "model_error"
     STREAM_ERROR = "stream_error"
+    MAX_TOKENS = "max_tokens"
 
 
 class ErrorDetail(BaseModel):
@@ -204,6 +205,12 @@ def build_conversational_error_event(
 
 Please wait a moment and try again."""
 
+    elif code == ErrorCode.MAX_TOKENS:
+        # Not rendered as a chat bubble: the frontend shows a compact
+        # inline "response length limit reached" notice + Continue button
+        # driven by the stream_error signal. Kept concise for logs / payload.
+        message = "Response length limit reached."
+
     elif code == ErrorCode.STREAM_ERROR:
         if "accessdenied" in error_lower or "access denied" in error_lower:
             message = f"""⚠️ I don't have access to complete this request.
@@ -241,9 +248,25 @@ def build_conversational_error_event(
 
 Please try again."""
 
-    metadata = {}
+    # Override with specific actionable messages for known Bedrock errors
+    # that can arrive as raw exceptions (not force_stop events) depending
+    # on where in the call stack Strands surfaces them.
+    if "duplicate document name" in error_lower or "can't contain duplicate document" in error_lower:
+        message = (
+            "⚠️ A file you attached has the same name as one already in this "
+            "conversation's context.\n\n"
+            "Try renaming the file before attaching it, or start a new "
+            "conversation."
+        )
+        recoverable = True
+
+    metadata: Dict[str, Any] = {}
     if session_id:
         metadata["session_id"] = session_id
+    if code == ErrorCode.MAX_TOKENS:
+        # Machine-readable hint so the frontend can offer a "Continue"
+        # affordance without parsing the human-readable message.
+        metadata["error_kind"] = "max_tokens"
 
     return ConversationalErrorEvent(
         code=code,
diff --git a/backend/src/apis/shared/files/__init__.py b/backend/src/apis/shared/files/__init__.py
index 6e85c844..a29ba630 100644
--- a/backend/src/apis/shared/files/__init__.py
+++ b/backend/src/apis/shared/files/__init__.py
@@ -17,8 +17,12 @@
     QuotaExceededError,
     ALLOWED_MIME_TYPES,
     ALLOWED_EXTENSIONS,
+    TABULAR_MIME_TYPES,
+    TABULAR_EXTENSIONS,
+    INLINE_DOCUMENT_MAX_BYTES,
     get_file_format,
     is_allowed_mime_type,
+    is_tabular_file,
 )
 
 from .repository import (
@@ -47,8 +51,12 @@
     "QuotaExceededError",
     "ALLOWED_MIME_TYPES",
     "ALLOWED_EXTENSIONS",
+    "TABULAR_MIME_TYPES",
+    "TABULAR_EXTENSIONS",
+    "INLINE_DOCUMENT_MAX_BYTES",
     "get_file_format",
     "is_allowed_mime_type",
+    "is_tabular_file",
     # Repository
     "FileUploadRepository",
     "get_file_upload_repository",
diff --git a/backend/src/apis/shared/files/models.py b/backend/src/apis/shared/files/models.py
index c3fba383..1d0210aa 100644
--- a/backend/src/apis/shared/files/models.py
+++ b/backend/src/apis/shared/files/models.py
@@ -5,6 +5,7 @@
 Supports the pre-signed URL upload flow for S3.
 """
 
+import os
 from datetime import datetime, timezone
 from enum import Enum
 from typing import List, Optional
@@ -69,6 +70,56 @@ def is_allowed_mime_type(mime_type: str) -> bool:
     return mime_type in ALLOWED_MIME_TYPES
 
 
+# =============================================================================
+# Tabular File Detection
+# =============================================================================
+
+# Tabular files (CSV, XLSX) are routed to the spreadsheet analysis tools
+# (list_spreadsheets, analyze_spreadsheet) instead of being sent inline as
+# Bedrock document content blocks. Two reasons:
+#   1. XLSX files compress well on disk but expand dramatically when Bedrock
+#      parses them internally. A 1.4MB xlsx can exceed Bedrock's 4.5MB
+#      document-content limit and crash the turn with ValidationException
+#      (see #206).
+#   2. Even when under the limit, sending raw tabular bytes as a document
+#      block is wasteful: the model aggregates text-rendered tables far
+#      less reliably than pandas does. analyze_spreadsheet runs real Python
+#      on the real file, which is cheaper in tokens and more accurate.
+
+TABULAR_MIME_TYPES = frozenset({
+    "text/csv",
+    "application/vnd.ms-excel",
+    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
+})
+
+TABULAR_EXTENSIONS = frozenset({".csv", ".xls", ".xlsx"})
+
+
+def is_tabular_file(filename: str, mime_type: str) -> bool:
+    """Return True when the file should be handled by spreadsheet tools
+    rather than sent inline as a Bedrock document block.
+    """
+    if mime_type and mime_type.lower() in TABULAR_MIME_TYPES:
+        return True
+    if filename:
+        lower = filename.lower()
+        for ext in TABULAR_EXTENSIONS:
+            if lower.endswith(ext):
+                return True
+    return False
+
+
+# Bedrock's /ConverseStream imposes a 4.5MB hard limit on each document
+# content block's *internal* representation. Non-tabular formats (PDF, docx,
+# txt, md) don't inflate much, but we leave margin for per-request overhead
+# and for cumulative size across attachments. Rejecting inline files above
+# this threshold with a friendly message is better than a raw AWS
+# ValidationException mid-stream.
+INLINE_DOCUMENT_MAX_BYTES = int(
+    os.environ.get("INLINE_DOCUMENT_MAX_BYTES", 4 * 1024 * 1024)  # 4MB
+)
+
+
 # =============================================================================
 # Database Models (stored in DynamoDB)
 # =============================================================================
@@ -250,6 +301,48 @@ class CompleteUploadResponse(BaseModel):
     model_config = ConfigDict(populate_by_name=True)
 
 
+class PreviewUrlResponse(BaseModel):
+    """Response for GET /api/files/{uploadId}/preview-url."""
+
+    upload_id: str = Field(..., alias="uploadId")
+    url: str = Field(..., description="Short-lived presigned GET URL")
+    expires_at: str = Field(..., alias="expiresAt", description="ISO8601 expiration time")
+    mime_type: str = Field(..., alias="mimeType")
+    filename: str
+
+    model_config = ConfigDict(populate_by_name=True)
+
+
+class TextSnippetResponse(BaseModel):
+    """Response for GET /api/files/{uploadId}/text-snippet."""
+
+    upload_id: str = Field(..., alias="uploadId")
+    snippet: str = Field(..., description="UTF-8 decoded text from the start of the file")
+    truncated: bool = Field(..., description="True if the file was longer than the snippet limit")
+    mime_type: str = Field(..., alias="mimeType")
+
+    model_config = ConfigDict(populate_by_name=True)
+
+
+# MIME types the thumbnail renderer can currently produce a preview image for.
+# Callers should consult this set before invoking the thumbnail endpoint to
+# avoid hammering the service for unsupported types.
+THUMBNAIL_SUPPORTED_MIME_TYPES = frozenset({
+    "application/pdf",
+})
+
+
+class ThumbnailResponse(BaseModel):
+    """Response for GET /api/files/{uploadId}/thumbnail."""
+
+    upload_id: str = Field(..., alias="uploadId")
+    url: str = Field(..., description="Short-lived presigned GET URL for a PNG thumbnail")
+    expires_at: str = Field(..., alias="expiresAt", description="ISO8601 expiration time")
+    cached: bool = Field(..., description="True if served from cache, False if newly rendered")
+
+    model_config = ConfigDict(populate_by_name=True)
+
+
 class FileResponse(BaseModel):
     """Single file in list response."""
 
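Reviewer note on the tabular routing in `models.py`: a minimal sketch of how a caller might combine `is_tabular_file` with `INLINE_DOCUMENT_MAX_BYTES`. The `route_attachment` helper and its return labels are hypothetical and do not appear in this diff; the constants are copied from the hunk above so the example runs on its own, and the environment override is omitted for brevity.

```python
# Copied from the hunk above so the sketch is self-contained.
TABULAR_MIME_TYPES = frozenset({
    "text/csv",
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
})
TABULAR_EXTENSIONS = (".csv", ".xls", ".xlsx")
INLINE_DOCUMENT_MAX_BYTES = 4 * 1024 * 1024  # default from the diff


def is_tabular_file(filename: str, mime_type: str) -> bool:
    """Same decision as the diff: MIME type first, then file extension."""
    if mime_type and mime_type.lower() in TABULAR_MIME_TYPES:
        return True
    return bool(filename) and filename.lower().endswith(TABULAR_EXTENSIONS)


def route_attachment(filename: str, mime_type: str, size_bytes: int) -> str:
    """Hypothetical router: pick a handling path for one uploaded file."""
    if is_tabular_file(filename, mime_type):
        # Tabular files go to the spreadsheet tools regardless of size.
        return "spreadsheet_tools"
    if size_bytes > INLINE_DOCUMENT_MAX_BYTES:
        # Reject early instead of letting the model call fail mid-stream.
        return "reject_too_large"
    return "inline_document"


xlsx_route = route_attachment("q3.XLSX", "application/octet-stream", 1_400_000)
pdf_route = route_attachment("report.pdf", "application/pdf", 5 * 1024 * 1024)
md_route = route_attachment("notes.md", "text/markdown", 10)
```

Checking the tabular case first matters here: a 1.4MB xlsx is under the byte threshold, so a size check alone would let it through as an inline document.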
diff --git a/backend/src/apis/shared/mcp_apps/__init__.py b/backend/src/apis/shared/mcp_apps/__init__.py
new file mode 100644
index 00000000..04e452ea
--- /dev/null
+++ b/backend/src/apis/shared/mcp_apps/__init__.py
@@ -0,0 +1,7 @@
+"""Shared MCP Apps host-renderer support (SEP-1865).
+
+PR #5 of `docs/kaizen/scoping/mcp-apps-host-renderer.md`. Lives in
+`apis.shared` because both the inference-api `/invocations` app-tool-call
+dispatch (publisher) and the `agents` stream coordinator (subscriber) need
+it, and they must not import from each other.
+"""
diff --git a/backend/src/apis/shared/mcp_apps/broker.py b/backend/src/apis/shared/mcp_apps/broker.py
new file mode 100644
index 00000000..db620b7a
--- /dev/null
+++ b/backend/src/apis/shared/mcp_apps/broker.py
@@ -0,0 +1,135 @@
+"""Per-conversation app-initiated tool-event broker (MCP Apps PR #5).
+
+When an embedded MCP App calls a server tool (`tools/call` over the
+postMessage bridge), the result must surface as `tool_use` / `tool_result`
+in the user's conversation thread — even though the call arrives out-of-band
+on `POST /mcp-apps/proxy-call`, not on the chat stream.
+
+This broker is the seam the scoping doc's "open implementation question"
+calls for. The inference-api app-tool-call dispatch **publishes** synthesized
+events here; the `StreamCoordinator` (the live conversation SSE stream)
+**subscribes** and interleaves them. Delivery model:
+
+- A subscriber is active (a chat turn is streaming for that session): the
+  event is handed to every live subscriber queue — it lands in the thread
+  live.
+- No subscriber active (the App was used between turns): the event is
+  buffered in a small bounded per-session ring. The next stream that
+  subscribes drains the ring first, so the card shows when the user's next
+  turn opens. (Full reload is covered separately by session persistence.)
+
+Process-global and asyncio-only: the inference-api runtime is a single
+event loop, so a module singleton with `asyncio.Queue` subscribers is
+sufficient and avoids cross-process coupling. Each per-session pending
+ring is bounded, so a forgotten session's buffer can't grow without limit.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+from collections import deque
+from contextlib import asynccontextmanager
+from typing import Any, AsyncIterator, Deque, Dict, List, Set
+
+logger = logging.getLogger(__name__)
+
+# Cap buffered events for a session with no active stream. App-initiated
+# calls are user-driven clicks, so 100 is generous; the oldest is dropped.
+_MAX_PENDING_PER_SESSION = 100
+
+
+class AppToolEventBroker:
+    """Per-`session_id` fan-out of synthesized app-initiated tool events."""
+
+    def __init__(self) -> None:
+        self._subscribers: Dict[str, Set["asyncio.Queue[dict]"]] = {}
+        self._pending: Dict[str, Deque[dict]] = {}
+
+    def publish(self, session_id: str, event: Dict[str, Any]) -> None:
+        """Deliver an event to the session's live stream, or buffer it.
+
+        Never raises into the caller (a proxied tool call must still return
+        its result to the App even if nothing is listening).
+        """
+        if not session_id:
+            return
+        subs = self._subscribers.get(session_id)
+        if subs:
+            for q in list(subs):
+                try:
+                    q.put_nowait(event)
+                except Exception:  # noqa: BLE001 - best effort fan-out
+                    logger.warning(
+                        "mcp-apps broker: failed to enqueue for an active "
+                        "subscriber (session=%s)",
+                        session_id,
+                    )
+            return
+        # No live stream — buffer for the next one.
+        ring = self._pending.setdefault(session_id, deque())
+        ring.append(event)
+        while len(ring) > _MAX_PENDING_PER_SESSION:
+            ring.popleft()
+
+    def add_subscriber(self, session_id: str) -> "asyncio.Queue[dict]":
+        """Register an active stream as a subscriber and return its queue.
+
+        Any events buffered while no stream was active are moved into the
+        new queue so they surface on this turn. Caller MUST pair this with
+        `remove_subscriber` (e.g. in a generator's `finally`); `subscribe`
+        is the context-manager wrapper that does so automatically.
+        """
+        q: "asyncio.Queue[dict]" = asyncio.Queue()
+        if session_id:
+            self._subscribers.setdefault(session_id, set()).add(q)
+            pending = self._pending.pop(session_id, None)
+            if pending:
+                for event in pending:
+                    q.put_nowait(event)
+        return q
+
+    def remove_subscriber(
+        self, session_id: str, q: "asyncio.Queue[dict]"
+    ) -> None:
+        subs = self._subscribers.get(session_id)
+        if subs:
+            subs.discard(q)
+            if not subs:
+                self._subscribers.pop(session_id, None)
+
+    @asynccontextmanager
+    async def subscribe(
+        self, session_id: str
+    ) -> AsyncIterator["asyncio.Queue[dict]"]:
+        """Context-manager form of add/remove_subscriber (used by tests)."""
+        q = self.add_subscriber(session_id)
+        try:
+            yield q
+        finally:
+            self.remove_subscriber(session_id, q)
+
+    def drain(self, queue: "asyncio.Queue[dict]") -> List[dict]:
+        """Non-blocking pop of everything currently in a subscriber queue.
+
+        The stream loop calls this between model events to interleave any
+        app-initiated events without ever blocking on the agent stream.
+        """
+        out: List[dict] = []
+        while True:
+            try:
+                out.append(queue.get_nowait())
+            except asyncio.QueueEmpty:
+                break
+        return out
+
+
+_broker: AppToolEventBroker | None = None
+
+
+def get_app_tool_event_broker() -> AppToolEventBroker:
+    """Get or create the process-global broker."""
+    global _broker
+    if _broker is None:
+        _broker = AppToolEventBroker()
+    return _broker
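Reviewer note on the broker's delivery model: the buffer-then-drain behaviour described in the module docstring can be sketched as below. `MiniBroker` is a simplified stand-in written for this note, not the class in the diff; it uses `deque(maxlen=...)` in place of the explicit `popleft` loop, and the cap of 3 is illustrative (the diff uses 100).

```python
import asyncio
from collections import deque
from typing import Any, Deque, Dict, List, Set

MAX_PENDING = 3  # illustrative cap; the broker in the diff uses 100


class MiniBroker:
    """Simplified stand-in for AppToolEventBroker (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[asyncio.Queue]] = {}
        self._pending: Dict[str, Deque[dict]] = {}

    def publish(self, session_id: str, event: Dict[str, Any]) -> None:
        subs = self._subscribers.get(session_id)
        if subs:
            # A stream is live: fan out to every subscriber queue.
            for q in list(subs):
                q.put_nowait(event)
            return
        # No live stream: buffer; maxlen silently drops the oldest event.
        ring = self._pending.setdefault(session_id, deque(maxlen=MAX_PENDING))
        ring.append(event)

    def add_subscriber(self, session_id: str) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._subscribers.setdefault(session_id, set()).add(q)
        # Move anything buffered between turns into the new queue.
        for event in self._pending.pop(session_id, ()):
            q.put_nowait(event)
        return q

    def remove_subscriber(self, session_id: str, q: asyncio.Queue) -> None:
        subs = self._subscribers.get(session_id)
        if subs:
            subs.discard(q)
            if not subs:
                self._subscribers.pop(session_id, None)


async def demo() -> List[dict]:
    broker = MiniBroker()
    for i in range(4):  # App used between turns; cap is 3
        broker.publish("s1", {"n": i})
    q = broker.add_subscriber("s1")  # next turn opens
    try:
        broker.publish("s1", {"n": "live"})  # arrives while streaming
        out: List[dict] = []
        while not q.empty():
            out.append(q.get_nowait())
        return out
    finally:
        broker.remove_subscriber("s1", q)


events = asyncio.run(demo())
```

The buffered events surface first and in order, followed by the live one, which matches the "next stream drains the ring first" behaviour the docstring describes.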
diff --git a/backend/src/apis/shared/mcp_apps/card_store.py b/backend/src/apis/shared/mcp_apps/card_store.py
new file mode 100644
index 00000000..d1661266
--- /dev/null
+++ b/backend/src/apis/shared/mcp_apps/card_store.py
@@ -0,0 +1,230 @@
+"""Reload persistence for app-initiated tool-call cards (MCP Apps PR #6).
+
+PR #5's broker (`broker.py`) is in-memory only: when an embedded MCP App
+runs a server tool via `tools/call`, the synthesized `tool_use`/
+`tool_result` surface in the *live* conversation stream, but on a full
+page reload they're gone (the App iframe itself isn't re-instantiated on
+reload either). "Option A" of the scoping doc closes that gap with a
+side-channel store, mirroring the Artifacts feature: a small per-session
+provenance record the SPA replays as a *static historical card* on load.
+
+**This is provenance / UI only.** Model-visible state flows solely through
+`ui/update-model-context` (`app_context_dispatch.py`). We deliberately do
+NOT persist a synthetic tool turn into AgentCore Memory — that breaks
+Bedrock's user/assistant role alternation, a hazard this codebase has been
+bitten by (see `stream_coordinator` / max_tokens re-persist comments).
+
+Storage (decision #4): reuse the existing `sessions-metadata` DynamoDB
+table — its `SessionLookupIndex` GSI (`GSI_PK=SESSION#<id>`, Projection
+ALL) and the app-api task role's Query/Read grant already exist, so this
+needs **zero new infra**. New `APPCARD#` SK prefix, alongside the `C#`
+(cost) and `META` rows:
+
+    PK:     USER#<user_id>
+    SK:     APPCARD#<created_at>#<card_id>
+    GSI_PK: SESSION#<session_id>      (SessionLookupIndex)
+    GSI_SK: APPCARD#<created_at>
+
+The write stays on the **app-api boundary** (called from
+`/mcp-apps/proxy-call` after a successful dispatch) so inference-api keeps
+its inference-only scope. Dev/local has no table — every method degrades
+to a no-op / empty list, consistent with the whole MCP Apps surface being
+gated by `AGENTCORE_MCP_APPS_HOST_ENABLED` (default true since PR #7).
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import uuid as uuid_lib
+from datetime import datetime, timedelta, timezone
+from decimal import Decimal
+from typing import Any, Dict, List, Optional
+
+try:  # boto3 is absent in some local-dev setups
+    import boto3
+    from boto3.dynamodb.conditions import Key
+    from botocore.exceptions import ClientError
+except ImportError:  # pragma: no cover - exercised only without boto3
+    boto3 = None
+    Key = None  # type: ignore[assignment]
+    ClientError = Exception  # type: ignore[assignment, misc]
+
+logger = logging.getLogger(__name__)
+
+# Cards expire with the conversation; 90d matches "conversation history"
+# retention expectations without keeping provenance forever.
+_CARD_TTL_DAYS = 90
+# DynamoDB item hard limit is 400KB; cap the embedded result well under
+# that so a chatty tool can't fail the write (or bloat hydration).
+_MAX_CONTENT_BYTES = 200_000
+_KEY_ATTRS = ("PK", "SK", "GSI_PK", "GSI_SK", "ttl")
+
+
+def _floats_to_decimal(obj: Any) -> Any:
+    if isinstance(obj, float):
+        return Decimal(str(obj))
+    if isinstance(obj, dict):
+        return {k: _floats_to_decimal(v) for k, v in obj.items()}
+    if isinstance(obj, list):
+        return [_floats_to_decimal(v) for v in obj]
+    return obj
+
+
+def _decimal_to_native(obj: Any) -> Any:
+    if isinstance(obj, Decimal):
+        # int when whole, else float — keeps message-index style fields tidy.
+        return int(obj) if obj == obj.to_integral_value() else float(obj)
+    if isinstance(obj, dict):
+        return {k: _decimal_to_native(v) for k, v in obj.items()}
+    if isinstance(obj, list):
+        return [_decimal_to_native(v) for v in obj]
+    return obj
+
+
+class AppCardStore:
+    """Per-session store of app-initiated tool-call provenance cards."""
+
+    def __init__(self) -> None:
+        self._table = None
+        if boto3 is None:
+            return
+        table_name = os.environ.get("DYNAMODB_SESSIONS_METADATA_TABLE_NAME")
+        if not table_name:
+            return
+        try:
+            self._table = boto3.resource("dynamodb").Table(table_name)
+        except Exception:  # noqa: BLE001 - dev without AWS creds
+            logger.warning(
+                "mcp-apps card store: DynamoDB unavailable; persistence "
+                "disabled (cards will be live-only).",
+                exc_info=True,
+            )
+            self._table = None
+
+    @property
+    def enabled(self) -> bool:
+        return self._table is not None
+
+    def store(
+        self,
+        *,
+        user_id: str,
+        session_id: str,
+        tool_use_id: str,
+        tool_name: str,
+        arguments: Dict[str, Any],
+        content: List[Dict[str, Any]],
+        is_error: bool,
+        produced_by_message_index: Optional[int] = None,
+    ) -> None:
+        """Persist one app-initiated tool-call card. Best-effort.
+
+        Never raises into the proxy path — a failed provenance write must
+        not fail the tool call the App is waiting on (it still surfaced
+        live via the broker; only the reload card is lost).
+        """
+        if self._table is None:
+            return
+        created_at = datetime.now(timezone.utc).isoformat()
+        card_id = uuid_lib.uuid4().hex[:12]
+
+        safe_content = content
+        try:
+            import json
+
+            if len(json.dumps(content, ensure_ascii=False).encode()) > _MAX_CONTENT_BYTES:
+                safe_content = [
+                    {
+                        "type": "text",
+                        "text": "[result omitted from history — too large to persist]",
+                    }
+                ]
+        except (TypeError, ValueError):
+            safe_content = [
+                {"type": "text", "text": "[result not serializable for history]"}
+            ]
+
+        ttl = int(
+            (datetime.now(timezone.utc) + timedelta(days=_CARD_TTL_DAYS)).timestamp()
+        )
+        item = {
+            "PK": f"USER#{user_id}",
+            "SK": f"APPCARD#{created_at}#{card_id}",
+            "GSI_PK": f"SESSION#{session_id}",
+            "GSI_SK": f"APPCARD#{created_at}",
+            "userId": user_id,
+            "sessionId": session_id,
+            "cardId": card_id,
+            "toolUseId": tool_use_id,
+            "toolName": tool_name,
+            "arguments": arguments,
+            "content": safe_content,
+            "isError": is_error,
+            "createdAt": created_at,
+            "producedByMessageIndex": produced_by_message_index,
+            "ttl": ttl,
+        }
+        try:
+            self._table.put_item(Item=_floats_to_decimal(item))
+        except Exception:  # noqa: BLE001 - provenance is best-effort
+            logger.warning(
+                "mcp-apps card store: failed to persist card (session=%s)",
+                session_id,
+                exc_info=True,
+            )
+
+    def list_for_session(
+        self, *, session_id: str, user_id: str
+    ) -> List[Dict[str, Any]]:
+        """Return this user's app-initiated tool cards for a session.
+
+        Queried off the session GSI then re-filtered by `userId` so a
+        guessed session id can't surface another user's cards (mirrors the
+        Artifacts ownership re-check). Oldest-first for stable rendering.
+        """
+        if self._table is None:
+            return []
+        try:
+            items: List[Dict[str, Any]] = []
+            kwargs: Dict[str, Any] = {
+                "IndexName": "SessionLookupIndex",
+                "KeyConditionExpression": Key("GSI_PK").eq(f"SESSION#{session_id}")
+                & Key("GSI_SK").begins_with("APPCARD#"),
+                "ScanIndexForward": True,
+            }
+            while True:
+                resp = self._table.query(**kwargs)
+                items.extend(resp.get("Items", []))
+                lek = resp.get("LastEvaluatedKey")
+                if not lek:
+                    break
+                kwargs["ExclusiveStartKey"] = lek
+        except ClientError:
+            logger.warning(
+                "mcp-apps card store: query failed (session=%s)",
+                session_id,
+                exc_info=True,
+            )
+            return []
+
+        cards: List[Dict[str, Any]] = []
+        for raw in items:
+            if raw.get("userId") != user_id:
+                continue  # ownership re-check (guessed session id)
+            card = _decimal_to_native(
+                {k: v for k, v in raw.items() if k not in _KEY_ATTRS}
+            )
+            cards.append(card)
+        return cards
+
+
+_store: Optional[AppCardStore] = None
+
+
+def get_app_card_store() -> AppCardStore:
+    """Get or create the process-global card store."""
+    global _store
+    if _store is None:
+        _store = AppCardStore()
+    return _store
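Reviewer note on the card store's key layout: a small in-memory sketch of the `APPCARD#` key scheme and the ownership re-check from `list_for_session`, with no DynamoDB involved. `build_card_keys` and `cards_for_user` are hypothetical helper names introduced for this note; the key formats and the stripped attribute list are taken from the diff.

```python
from typing import Any, Dict, List

KEY_ATTRS = ("PK", "SK", "GSI_PK", "GSI_SK", "ttl")


def build_card_keys(
    user_id: str, session_id: str, created_at: str, card_id: str
) -> Dict[str, str]:
    """Key attributes for one card row, following the layout in the diff."""
    return {
        "PK": f"USER#{user_id}",
        "SK": f"APPCARD#{created_at}#{card_id}",
        "GSI_PK": f"SESSION#{session_id}",
        "GSI_SK": f"APPCARD#{created_at}",
    }


def cards_for_user(
    items: List[Dict[str, Any]], user_id: str
) -> List[Dict[str, Any]]:
    """Drop rows owned by other users, then strip the key attributes."""
    return [
        {k: v for k, v in raw.items() if k not in KEY_ATTRS}
        for raw in items
        if raw.get("userId") == user_id
    ]


# Two rows that share a session id but belong to different users, as a
# session-GSI query could return if a session id were guessed.
rows = [
    {
        **build_card_keys("alice", "sess-1", "2026-01-01T00:00:00+00:00", "aaa"),
        "userId": "alice",
        "toolName": "search",
    },
    {
        **build_card_keys("bob", "sess-1", "2026-01-01T00:00:01+00:00", "bbb"),
        "userId": "bob",
        "toolName": "search",
    },
]
alice_cards = cards_for_user(rows, "alice")
```

Because the GSI is keyed by session only, the `userId` filter is what keeps one user's cards from appearing in another user's session view.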
diff --git a/backend/src/apis/shared/middleware/session_refresh.py b/backend/src/apis/shared/middleware/session_refresh.py
index 4b36abba..6a2d7bb5 100644
--- a/backend/src/apis/shared/middleware/session_refresh.py
+++ b/backend/src/apis/shared/middleware/session_refresh.py
@@ -17,9 +17,12 @@
 
 import asyncio
 import logging
+import secrets
 import time
 from typing import Optional
 
+from botocore.exceptions import ClientError
+
 from starlette.middleware.base import BaseHTTPMiddleware
 from starlette.requests import Request
 from starlette.responses import Response
@@ -43,6 +46,7 @@
     CognitoRefreshError,
 )
 from apis.shared.sessions_bff.repository import SessionRepository
+from apis.shared.sessions_bff.single_flight import resolve_once
 
 logger = logging.getLogger(__name__)
 
@@ -64,6 +68,7 @@ def __init__(
         cookie_codec: Optional[CookieCodec] = None,
         refresh_client: Optional[CognitoRefreshClient] = None,
         cache: Optional[SessionCache] = None,
+        refresh_lock_ttl_seconds: int = 30,
     ) -> None:
         super().__init__(app)
         self._config = config
@@ -71,6 +76,19 @@ def __init__(
         self._cookie_codec = cookie_codec
         self._refresh_client = refresh_client
         self._cache = cache
+        # Cross-task refresh lock TTL. A leader that crashes mid-refresh
+        # strands the lock for at most this many seconds, after which any
+        # peer can re-acquire and retry. Followers poll for at most this
+        # long before treating the session as terminal. 30s is a safety
+        # cushion over the worst-case (Cognito + DDB + retries) latency.
+        self._refresh_lock_ttl_seconds = refresh_lock_ttl_seconds
+        # Strong-reference set for fire-and-forget slide-write tasks.
+        # The event loop keeps only a weak reference to a task, so a task
+        # from `asyncio.create_task(...)` with no other reference can be
+        # garbage-collected mid-execution (Python's docs warn about this);
+        # on fast CI runners the task can die before the scheduler picks
+        # it up. We remove each task via `add_done_callback`
+        self._slide_tasks: set[asyncio.Task] = set()
 
     def _ensure_collaborators(self) -> None:
         if self._config is None:
@@ -145,6 +163,13 @@ async def _maybe_slide(self, record: SessionRecord) -> Optional[int]:
         past it, the cookie is allowed to expire on its own original Max-Age
         — we don't extend, but we also don't proactively clear (the user
         might still complete the in-flight request).
+
+        The DDB `touch_last_seen` write is dispatched as a detached
+        `asyncio.Task` — the response path must not wait on it. The local
+        cache is updated synchronously BEFORE scheduling so subsequent
+        same-request reads (and the next cache window) see the slid state
+        even if the background write hasn't landed yet. Today's "swallow
+        failures" semantics are preserved inside `_slide_write_task`.
         """
         assert self._config is not None
         now = int(time.time())
@@ -165,26 +190,59 @@ async def _maybe_slide(self, record: SessionRecord) -> Optional[int]:
             return None
 
         new_ttl = now + new_max_age
-        try:
-            await self._repository.touch_last_seen(
-                record.session_id, last_seen_at=now, ttl=new_ttl
-            )
-        except Exception as exc:
-            # Don't fail the request if the slide-write fails — the user
-            # still has a valid session for the rest of its current TTL.
-            logger.warning(
-                "BFF session slide failed for %s: %s", record.session_id, exc
-            )
-            return None
 
-        # Reflect the slide locally so subsequent same-request reads (and the
-        # cache) don't think the row still needs a slide.
+        # Reflect the slide locally BEFORE dispatching the background write
+        # so subsequent same-request reads (and the cache) don't think the
+        # row still needs a slide — even if the background task hasn't yet
+        # landed the DDB write.
         record.last_seen_at = now
         record.ttl = new_ttl
         if self._cache is not None:
             self._cache.set(record)
+
+        # Fire-and-forget: the response path MUST NOT wait on the DDB write.
+        # Failures are swallowed inside `_slide_write_task` (preserving
+        # today's "slide failures are non-fatal" semantics — the user still
+        # has a valid session for the rest of its current TTL).
+        #
+        # CRITICAL: keep a strong reference on the middleware instance
+        # (`self._slide_tasks`). Without this, Python is free to GC the
+        # task before it runs — we observed this on Python 3.12 CI runners
+        # where the preservation tests saw 0 update_item calls because the
+        # task was collected mid-flight. The done-callback removes the task
+        # again so the set doesn't leak.
+        task = asyncio.create_task(
+            self._slide_write_task(
+                session_id=record.session_id,
+                last_seen_at=now,
+                ttl=new_ttl,
+            )
+        )
+        self._slide_tasks.add(task)
+        task.add_done_callback(self._slide_tasks.discard)
         return new_max_age
 
+    async def _slide_write_task(
+        self, *, session_id: str, last_seen_at: int, ttl: int
+    ) -> None:
+        """Background helper for `_maybe_slide`'s fire-and-forget DDB write.
+
+        Swallows exceptions so a DDB blip doesn't surface as an unhandled
+        task exception — today's inline slide-write already swallowed
+        failures, and we preserve that contract verbatim. The local cache
+        was updated synchronously in `_maybe_slide` before this task was
+        scheduled, so the user keeps seeing the slid state for the rest of
+        their current cache window.
+        """
+        try:
+            await self._repository.touch_last_seen(
+                session_id, last_seen_at=last_seen_at, ttl=ttl
+            )
+        except Exception as exc:
+            logger.warning(
+                "BFF session slide failed for %s: %s", session_id, exc
+            )
+
     async def _persist_refresh(
         self,
         *,
@@ -193,18 +251,26 @@ async def _persist_refresh(
         last_seen_at: int,
         ttl: int,
         rotated: bool,
+        lock_owner: str,
     ) -> bool:
         """Write refreshed tokens to DDB. Retry when rotation makes it critical.
 
         Returns True on success or on a benign (non-rotation) failure. Returns
         False only when rotation happened *and* every retry failed — caller
         should treat that as session-unrecoverable.
+
+        The write also clears the cross-task refresh lock (atomic with the
+        token rotation), conditional on `lock_owner` matching. A
+        `ConditionalCheckFailedException` here means a peer task acquired
+        the lock after ours expired — we abandon the persist and the caller
+        should re-read DDB to adopt the peer's tokens.
         """
         # Three attempts on rotation (≈350ms total worst-case), single shot
         # otherwise. boto3 already retries below us for transient API errors;
         # this layer handles longer blips and validation/throttling errors
         # that boto3 lets through.
         attempts = 3 if rotated else 1
+        last_exc: Optional[Exception] = None
         for attempt in range(attempts):
             try:
                 await self._repository.update_tokens(
@@ -215,8 +281,26 @@ async def _persist_refresh(
                     access_token_exp=refreshed.access_token_exp,
                     last_seen_at=last_seen_at,
                     ttl=ttl,
+                    expected_lock_owner=lock_owner,
                 )
                 return True
+            except ClientError as exc:
+                # Lock-ownership condition failed — a peer task took over.
+                # Don't retry: their refresh is authoritative now. Caller
+                # adopts their tokens via the post-failure DDB re-read.
+                if (
+                    exc.response.get("Error", {}).get("Code")
+                    == "ConditionalCheckFailedException"
+                ):
+                    logger.info(
+                        "BFF refresh persist for %s lost lock to a peer task — "
+                        "deferring to peer's tokens.",
+                        session_id,
+                    )
+                    return False
+                last_exc = exc
+                if attempt + 1 < attempts:
+                    await asyncio.sleep(0.05 * (2**attempt))  # 50ms, 100ms
             except Exception as exc:
                 last_exc = exc
                 if attempt + 1 < attempts:
@@ -240,6 +324,48 @@ async def _persist_refresh(
         )
         return True
 
+    async def _wait_for_peer_refresh(
+        self,
+        *,
+        session_id: str,
+        previous: SessionRecord,
+        max_wait_seconds: float,
+    ) -> Optional[SessionRecord]:
+        """Poll DDB for a peer task's freshly persisted tokens.
+
+        Called when we lost the cross-task refresh lock to a peer. Polls
+        the session row with bounded backoff (50ms → 500ms) until we
+        observe tokens that differ from `previous` — at which point we
+        adopt them — or `max_wait_seconds` elapses.
+
+        Returns the peer's record on success, or `None` if we timed out
+        (peer crashed mid-refresh). The caller treats `None` as terminal
+        and clears the cookie; the lock TTL ensures the next request can
+        re-acquire and retry without waiting for a stuck row.
+        """
+        deadline = time.monotonic() + max_wait_seconds
+        sleep_for = 0.05
+        while time.monotonic() < deadline:
+            await asyncio.sleep(sleep_for)
+            peer = await self._repository.get(session_id)
+            if peer is None:
+                # Row gone (delete or TTL eviction) — terminal.
+                return None
+            # Refresh-token rotation: peer issued a new refresh token, ours
+            # is now revoked. Adopt their record.
+            if peer.cognito_refresh_token != previous.cognito_refresh_token:
+                return peer
+            # No rotation but a fresh access token landed: peer refreshed
+            # successfully, we can use the new access token.
+            if (
+                peer.cognito_access_token != previous.cognito_access_token
+                and peer.access_token_exp
+                > int(time.time()) + self._config.refresh_leeway_seconds
+            ):
+                return peer
+            sleep_for = min(sleep_for * 1.5, 0.5)
+        return None
+
     async def _resolve_session(
         self, cookie_value: str
     ) -> tuple[Optional[SessionRecord], bool]:
@@ -247,6 +373,21 @@ async def _resolve_session(
 
         `should_clear_cookie` is True when the cookie is present but
         unrecoverable — bad seal, missing row, expired TTL, or refresh failure.
+
+        Cookie unseal happens before the single-flight wrap so a bad seal
+        short-circuits without registering a Future (and without keying the
+        registry off an untrusted session id). Once we have a validated
+        session id, the cache → `repository.get` → `needs_refresh` →
+        (maybe refresh) path is coalesced through `resolve_once` so an
+        Angular page-load fan-out of N same-session requests issues at most
+        one DynamoDB `get_item` per cache window.
+
+        The per-session `get_session_lock(session_id)` around the Cognito
+        refresh exchange stays exactly where it is today — the single-flight
+        sits upstream of it. In the common case that the single-flight
+        already coalesces N callers to one loader invocation, only the
+        leader ever reaches the refresh lock; the existing one-`initiate_auth`-
+        per-`session_id`-per-leeway-window contract is preserved end-to-end.
         """
         try:
             payload = self._cookie_codec.unseal(cookie_value)
@@ -256,92 +397,180 @@ async def _resolve_session(
 
         session_id = payload.session_id
 
-        cached = self._cache.get(session_id) if self._cache else None
-        if cached is not None and not cached.needs_refresh(
-            int(time.time()), self._config.refresh_leeway_seconds
-        ):
-            return cached, False
-
-        record = await self._repository.get(session_id)
-        if record is None:
-            logger.info("Discarding BFF cookie — no matching session row")
-            return None, True
+        async def _loader() -> tuple[Optional[SessionRecord], bool]:
+            cached = self._cache.get(session_id) if self._cache else None
+            if cached is not None and not cached.needs_refresh(
+                int(time.time()), self._config.refresh_leeway_seconds
+            ):
+                return cached, False
 
-        if not record.needs_refresh(
-            int(time.time()), self._config.refresh_leeway_seconds
-        ):
-            self._cache.set(record)
-            return record, False
-
-        # Coalesce concurrent refreshes for the same session id.
-        async with get_session_lock(session_id):
-            # Re-check after acquiring the lock — another waiter may have
-            # already refreshed, in which case we serve the fresh row.
-            current = await self._repository.get(session_id)
-            if current is None:
+            record = await self._repository.get(session_id)
+            if record is None:
+                logger.info("Discarding BFF cookie — no matching session row")
                 return None, True
-            if not current.needs_refresh(
+
+            if not record.needs_refresh(
                 int(time.time()), self._config.refresh_leeway_seconds
             ):
-                self._cache.set(current)
-                return current, False
-
-            try:
-                refreshed = self._refresh_client.refresh(
+                self._cache.set(record)
+                return record, False
+
+            # Two-tier coalescing:
+            #
+            # 1. `get_session_lock` (in-process): collapses N concurrent
+            #    same-session callers within ONE task to a single refresh
+            #    contender.
+            # 2. `try_acquire_refresh_lock` (cross-process, DDB conditional
+            #    write): one of those contenders, across all tasks, becomes
+            #    the leader and actually calls Cognito. Followers poll DDB
+            #    for the leader's persisted tokens.
+            #
+            # Without the cross-process lock, two tasks under desiredCount: 2
+            # would each call `cognito-idp:initiate_auth` with the same refresh
+            # token. Cognito rotates on the first; the second fails with
+            # `NotAuthorizedException` and the loser's middleware clears the
+            # user's cookie. The DDB lock turns that race into a leader/
+            # follower handoff so exactly one Cognito refresh happens per
+            # session per leeway window across the entire fleet.
+            async with get_session_lock(session_id):
+                current = await self._repository.get(session_id)
+                if current is None:
+                    return None, True
+                if not current.needs_refresh(
+                    int(time.time()), self._config.refresh_leeway_seconds
+                ):
+                    self._cache.set(current)
+                    return current, False
+
+                # Past absolute lifetime — terminal. Don't burn a Cognito
+                # refresh-token rotation on a session we'd just write a
+                # past-dated TTL onto (which would instantly TTL-evict the
+                # row right after we wrote tokens). Mirrors the slide path
+                # in `_maybe_slide`, which also short-circuits at the cap.
+                if (
+                    current.created_at + self._config.absolute_lifetime_seconds
+                    <= int(time.time())
+                ):
+                    logger.info(
+                        "BFF session %s past absolute lifetime — clearing cookie "
+                        "rather than refreshing.",
+                        session_id,
+                    )
+                    self._cache.invalidate(session_id)
+                    return None, True
+
+                lock_owner = secrets.token_hex(16)
+                lock_acquired = await self._repository.try_acquire_refresh_lock(
+                    session_id=session_id,
+                    owner=lock_owner,
+                    lock_ttl_seconds=self._refresh_lock_ttl_seconds,
+                )
+                if not lock_acquired:
+                    # FOLLOWER: a peer task is doing the Cognito refresh.
+                    # Wait for their tokens to land on the row, then adopt.
+                    peer = await self._wait_for_peer_refresh(
+                        session_id=session_id,
+                        previous=current,
+                        max_wait_seconds=self._refresh_lock_ttl_seconds,
+                    )
+                    if peer is None:
+                        # Peer never wrote — likely crashed or hit a Cognito
+                        # error. Lock will TTL out; the user's next request
+                        # will get to retry. Fail closed for *this* request.
+                        self._cache.invalidate(session_id)
+                        return None, True
+                    # Cross-task adoption succeeded — peer's refresh is
+                    # authoritative. Log at INFO so CloudWatch can answer
+                    # "how often is cross-task coalescing actually firing?"
+                    logger.info(
+                        "BFF session %s adopted peer task's refreshed tokens "
+                        "(cross-task lock follower path).",
+                        session_id,
+                    )
+                    self._cache.set(peer)
+                    return peer, False
+
+                # LEADER: do the Cognito refresh and persist.
+                try:
+                    refreshed = await self._refresh_client.refresh(
+                        username=current.username,
+                        refresh_token=current.cognito_refresh_token,
+                    )
+                except CognitoRefreshError:
+                    # Refresh refused — release the lock so a peer can retry
+                    # the next request without waiting for the full lock TTL,
+                    # then treat as terminal for this request.
+                    await self._repository.release_refresh_lock(
+                        session_id, lock_owner
+                    )
+                    self._cache.invalidate(session_id)
+                    return None, True
+
+                now = int(time.time())
+                # Slide the row's DDB TTL alongside the token rotation: the user
+                # is provably active. Capped at `created_at + absolute_lifetime`
+                # so a long-lived browser tab can't roll the session forever.
+                absolute_cap = (
+                    current.created_at + self._config.absolute_lifetime_seconds
+                )
+                new_ttl = min(
+                    now + self._config.session_ttl_seconds,
+                    absolute_cap,
+                )
+                # Detect refresh-token rotation. When Cognito rotates, the OLD
+                # refresh token is dead the moment the new one is issued — so a
+                # DDB write failure here means the session is unrecoverable on
+                # the *next* request even though *this* one succeeded. Retry
+                # aggressively, then fail-closed (clear cookie now) so the user
+                # re-auths immediately rather than getting silently 401'd later.
+                # Without rotation, the previous refresh token is still valid,
+                # so a DDB write failure is benign: the next request will just
+                # re-trigger refresh with the same (still good) refresh token.
+                rotated = refreshed.refresh_token != current.cognito_refresh_token
+                persist_ok = await self._persist_refresh(
+                    session_id=session_id,
+                    refreshed=refreshed,
+                    last_seen_at=now,
+                    ttl=new_ttl,
+                    rotated=rotated,
+                    lock_owner=lock_owner,
+                )
+                if not persist_ok:
+                    # Two reasons this lands here:
+                    #   (a) Rotation persist exhausted retries — session is
+                    #       unrecoverable; clear cookie and force re-auth.
+                    #   (b) Lock-owner condition failed (peer took over) —
+                    #       re-read DDB and adopt the peer's tokens rather
+                    #       than logging the user out.
+                    peer = await self._repository.get(session_id)
+                    if (
+                        peer is not None
+                        and not peer.needs_refresh(
+                            int(time.time()),
+                            self._config.refresh_leeway_seconds,
+                        )
+                    ):
+                        self._cache.set(peer)
+                        return peer, False
+                    self._cache.invalidate(session_id)
+                    return None, True
+                updated = SessionRecord(
+                    session_id=current.session_id,
+                    user_id=current.user_id,
                     username=current.username,
-                    refresh_token=current.cognito_refresh_token,
+                    cognito_access_token=refreshed.access_token,
+                    cognito_refresh_token=refreshed.refresh_token,
+                    id_token=refreshed.id_token,
+                    access_token_exp=refreshed.access_token_exp,
+                    csrf_secret=current.csrf_secret,
+                    created_at=current.created_at,
+                    last_seen_at=now,
+                    ttl=new_ttl,
                 )
-            except CognitoRefreshError:
-                # Refresh refused — treat as terminal, force re-login.
-                self._cache.invalidate(session_id)
-                return None, True
+                self._cache.set(updated)
+                return updated, False
 
-            now = int(time.time())
-            # Slide the row's DDB TTL alongside the token rotation: the user
-            # is provably active. Capped at `created_at + absolute_lifetime`
-            # so a long-lived browser tab can't roll the session forever.
-            absolute_cap = (
-                current.created_at + self._config.absolute_lifetime_seconds
-            )
-            new_ttl = min(
-                now + self._config.session_ttl_seconds,
-                absolute_cap,
-            )
-            # Detect refresh-token rotation. When Cognito rotates, the OLD
-            # refresh token is dead the moment the new one is issued — so a
-            # DDB write failure here means the session is unrecoverable on
-            # the *next* request even though *this* one succeeded. Retry
-            # aggressively, then fail-closed (clear cookie now) so the user
-            # re-auths immediately rather than getting silently 401'd later.
-            # Without rotation, the previous refresh token is still valid,
-            # so a DDB write failure is benign: the next request will just
-            # re-trigger refresh with the same (still good) refresh token.
-            rotated = refreshed.refresh_token != current.cognito_refresh_token
-            persist_ok = await self._persist_refresh(
-                session_id=session_id,
-                refreshed=refreshed,
-                last_seen_at=now,
-                ttl=new_ttl,
-                rotated=rotated,
-            )
-            if not persist_ok:
-                self._cache.invalidate(session_id)
-                return None, True
-            updated = SessionRecord(
-                session_id=current.session_id,
-                user_id=current.user_id,
-                username=current.username,
-                cognito_access_token=refreshed.access_token,
-                cognito_refresh_token=refreshed.refresh_token,
-                id_token=refreshed.id_token,
-                access_token_exp=refreshed.access_token_exp,
-                csrf_secret=current.csrf_secret,
-                created_at=current.created_at,
-                last_seen_at=now,
-                ttl=new_ttl,
-            )
-            self._cache.set(updated)
-            return updated, False
+        return await resolve_once(session_id, _loader)
 
     @staticmethod
     def _reemit_cookies(
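Reviewer sketch of the leader/follower handoff described in the two-tier coalescing comment above. It is a minimal in-memory simulation, not the repository code: `InMemoryRefreshLock`, `FakeFleet`, `idp_refresh`, `refresh`, and `simulate` are hypothetical names, the lock stands in for the DDB conditional write, and the sleep stands in for the Cognito call. The real path also re-checks `needs_refresh`, the absolute lifetime, and persist failures, all omitted here.

```python
import asyncio
import secrets
import time
from typing import Optional


class InMemoryRefreshLock:
    """Hypothetical stand-in for the DDB conditional-write lock."""

    def __init__(self) -> None:
        self._owner: Optional[str] = None
        self._expires_at = 0.0

    def try_acquire(self, owner: str, ttl_seconds: float) -> bool:
        now = time.monotonic()
        if self._owner is not None and now < self._expires_at:
            return False  # someone else holds an unexpired lock
        self._owner, self._expires_at = owner, now + ttl_seconds
        return True

    def release(self, owner: str) -> None:
        if self._owner == owner:  # only the current holder may release
            self._owner = None


class FakeFleet:
    """Shared state that every simulated task can see."""

    def __init__(self) -> None:
        self.lock = InMemoryRefreshLock()
        self.refresh_token = "rt-0"  # plays the role of the persisted row
        self.idp_calls = 0

    async def idp_refresh(self, refresh_token: str) -> str:
        """Pretend identity-provider call that rotates the refresh token."""
        self.idp_calls += 1
        await asyncio.sleep(0.01)  # simulated network latency
        return f"rt-{self.idp_calls}"


async def refresh(fleet: FakeFleet, lock_ttl: float = 1.0) -> str:
    previous = fleet.refresh_token
    owner = secrets.token_hex(16)
    if fleet.lock.try_acquire(owner, lock_ttl):
        # Leader: call the provider, persist the result, release the lock.
        try:
            fleet.refresh_token = await fleet.idp_refresh(previous)
        finally:
            fleet.lock.release(owner)
        return "leader"
    # Follower: poll the shared row until the leader's tokens land.
    deadline = time.monotonic() + lock_ttl
    while time.monotonic() < deadline:
        if fleet.refresh_token != previous:
            return "follower"
        await asyncio.sleep(0.005)
    return "timeout"  # the real code fails closed for this request


async def simulate(n_tasks: int) -> tuple[list[str], int]:
    fleet = FakeFleet()
    roles = await asyncio.gather(*(refresh(fleet) for _ in range(n_tasks)))
    return list(roles), fleet.idp_calls
```

Running `asyncio.run(simulate(5))` should show one contender taking the leader role and the rest adopting its result, with the fake provider called once.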
diff --git a/backend/src/apis/shared/models/models.py b/backend/src/apis/shared/models/models.py
index 38d5f367..b17afba2 100644
--- a/backend/src/apis/shared/models/models.py
+++ b/backend/src/apis/shared/models/models.py
@@ -21,10 +21,19 @@ class ModelParamSpec(BaseModel):
     supported: bool = True
     min: Optional[float] = None
     max: Optional[float] = None
+    allowed: Optional[list[Any]] = Field(
+        None,
+        description="Permissible values for enum-style params (e.g. `effort`: "
+                    "low/medium/high/xhigh/max). When set, `default` and any "
+                    "user override must be a member; `min`/`max` don't apply. "
+                    "Keep ordered low->high so future clamping (request `max`, "
+                    "model caps at `high`) can degrade gracefully."
+    )
     default: Optional[Any] = Field(
         None,
         description="Value sent when the user doesn't override. Type depends on the param "
-                    "(number for temperature/top_p, bool for thinking, etc.)."
+                    "(number for temperature/top_p, int budget for thinking, "
+                    "string for effort, etc.)."
     )
     locked: bool = Field(
         False,
@@ -41,6 +50,11 @@ def _check_bounds(self) -> "ModelParamSpec":
                 raise ValueError("default must be >= min")
             if self.max is not None and self.default > self.max:
                 raise ValueError("default must be <= max")
+        if self.allowed is not None:
+            if not self.allowed:
+                raise ValueError("allowed must be non-empty when set")
+            if self.default is not None and self.default not in self.allowed:
+                raise ValueError("default must be one of allowed")
         return self
 
 
@@ -54,8 +68,11 @@ class SupportedParams(BaseModel):
 
     For ``thinking``, ``ModelParamSpec.default`` carries the budget in
     tokens (int >= 1024, or 0/None to disable). The provider translator
-    wraps it into the Anthropic ``{type: "enabled", budget_tokens: N}``
-    shape on the way out.
+    wraps a truthy value into the Anthropic ``{type: "enabled",
+    budget_tokens: N}`` shape on older models, or ``{type: "adaptive"}``
+    on models that require adaptive thinking (Opus 4.6/4.7, Sonnet 4.6) —
+    where the int just means "thinking on" and depth is governed by the
+    separate ``effort`` param.
     """
     model_config = ConfigDict(populate_by_name=True)
 
@@ -88,6 +105,33 @@ def _check_thinking_invariants(self) -> "SupportedParams":
         return self
 
 
+def _max_tokens_within_ceiling(
+    max_output_tokens: Optional[int],
+    supported_params: Optional[SupportedParams],
+) -> None:
+    """Reject a max_tokens spec that lets the runtime request more output
+    than the model can physically produce.
+
+    Mirrors the Angular ``maxTokensCeilingValidator``. Only checks when both
+    the model ceiling and a *supported* max_tokens spec are present in the
+    same payload — a partial update touching only one side is left to the
+    per-field bounds rules.
+    """
+    if max_output_tokens is None or supported_params is None:
+        return
+    spec = supported_params.params.get("max_tokens")
+    if spec is None or not spec.supported:
+        return
+    if spec.max is not None and spec.max > max_output_tokens:
+        raise ValueError("max_tokens max must be <= maxOutputTokens")
+    if (
+        isinstance(spec.default, (int, float))
+        and not isinstance(spec.default, bool)
+        and spec.default > max_output_tokens
+    ):
+        raise ValueError("max_tokens default must be <= maxOutputTokens")
+
+
 class ManagedModelCreate(BaseModel):
     """Request model for creating a managed model."""
     model_config = ConfigDict(populate_by_name=True)
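Reviewer sketch of the two validation rules added in this file: enum membership for `allowed`, and the `max_tokens` ceiling. It uses a plain dataclass instead of Pydantic so it runs without third-party packages; `ParamSpec`, `check_max_tokens_ceiling`, and `accepts` are hypothetical names, and the spec is passed directly rather than looked up through `SupportedParams.params`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ParamSpec:
    """Plain-Python stand-in for ModelParamSpec (illustrative only)."""

    supported: bool = True
    max: Optional[float] = None
    default: Optional[Any] = None
    allowed: Optional[list] = None

    def __post_init__(self) -> None:
        # Same two checks as the `allowed` branch of `_check_bounds`.
        if self.allowed is not None:
            if not self.allowed:
                raise ValueError("allowed must be non-empty when set")
            if self.default is not None and self.default not in self.allowed:
                raise ValueError("default must be one of allowed")


def check_max_tokens_ceiling(
    max_output_tokens: Optional[int], spec: Optional[ParamSpec]
) -> None:
    """Reject a max_tokens spec that exceeds the model's output ceiling."""
    if max_output_tokens is None or spec is None or not spec.supported:
        return  # partial payload: nothing to cross-check
    if spec.max is not None and spec.max > max_output_tokens:
        raise ValueError("max_tokens max must be <= maxOutputTokens")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(spec.default, (int, float)) and not isinstance(
        spec.default, bool
    )
    if is_number and spec.default > max_output_tokens:
        raise ValueError("max_tokens default must be <= maxOutputTokens")


def accepts(build: Callable[[], Any]) -> bool:
    """Return True when `build()` raises no ValueError."""
    try:
        build()
        return True
    except ValueError:
        return False
```

For example, `accepts(lambda: ParamSpec(default="max", allowed=["low", "high"]))` is rejected because the default is not a member of `allowed`, while a `max_tokens` spec with `max=8192` passes against a ceiling of 8192.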
@@ -145,6 +189,11 @@ class ManagedModelCreate(BaseModel):
                     "When None, the runtime sends no inference params."
     )
 
+    @model_validator(mode="after")
+    def _check_max_tokens_within_ceiling(self) -> "ManagedModelCreate":
+        _max_tokens_within_ceiling(self.max_output_tokens, self.supported_params)
+        return self
+
 
 class ManagedModelUpdate(BaseModel):
     """Request model for updating a managed model."""
@@ -201,6 +250,11 @@ class ManagedModelUpdate(BaseModel):
         description="Per-model inference parameter capabilities."
     )
 
+    @model_validator(mode="after")
+    def _check_max_tokens_within_ceiling(self) -> "ManagedModelUpdate":
+        _max_tokens_within_ceiling(self.max_output_tokens, self.supported_params)
+        return self
+
 
 class ManagedModel(BaseModel):
     """Managed model with full details including cache pricing."""
diff --git a/backend/src/apis/shared/rbac/__init__.py b/backend/src/apis/shared/rbac/__init__.py
index 7178e2d2..60e3834f 100644
--- a/backend/src/apis/shared/rbac/__init__.py
+++ b/backend/src/apis/shared/rbac/__init__.py
@@ -9,7 +9,6 @@
 from .repository import AppRoleRepository
 from .service import AppRoleService
 from .admin_service import AppRoleAdminService
-from .system_admin import require_system_admin
 from .seeder import seed_system_roles, ensure_system_roles
 
 __all__ = [
@@ -20,7 +19,6 @@
     "AppRoleRepository",
     "AppRoleService",
     "AppRoleAdminService",
-    "require_system_admin",
     "seed_system_roles",
     "ensure_system_roles",
 ]
diff --git a/backend/src/apis/shared/rbac/system_admin.py b/backend/src/apis/shared/rbac/system_admin.py
deleted file mode 100644
index 638513cc..00000000
--- a/backend/src/apis/shared/rbac/system_admin.py
+++ /dev/null
@@ -1,115 +0,0 @@
-"""System administrator configuration and access control."""
-
-import logging
-from typing import Callable
-
-from fastapi import Depends, HTTPException, status
-
-from apis.shared.auth.models import User
-from apis.shared.auth.dependencies import get_current_user
-
-logger = logging.getLogger(__name__)
-
-
-async def require_system_admin(
-    user: User = Depends(get_current_user),
-) -> User:
-    """
-    Require system administrator access.
-
-    Resolves user permissions via the AppRole system and checks for
-    the ``system_admin`` app role.  Fails closed: if the permission
-    lookup raises an exception, access is denied.
-
-    Usage:
-        @router.post("/admin/roles")
-        async def create_role(
-            admin: User = Depends(require_system_admin)
-        ):
-            pass
-
-    Args:
-        user: Current authenticated user (injected)
-
-    Returns:
-        User object if authorized
-
-    Raises:
-        HTTPException: 403 if user lacks system admin access
-    """
-    from .service import get_app_role_service
-
-    try:
-        app_role_service = get_app_role_service()
-        permissions = await app_role_service.resolve_user_permissions(user)
-        if "system_admin" in permissions.app_roles:
-            logger.debug(f"User {user.name} authorized as system admin")
-            return user
-    except Exception:
-        logger.exception(
-            f"Failed to resolve permissions for {user.name}, denying admin access"
-        )
-
-    logger.warning(
-        f"User {user.name} (roles: {user.roles}) denied system admin access"
-    )
-    raise HTTPException(
-        status_code=status.HTTP_403_FORBIDDEN,
-        detail="System administrator access required",
-    )
-
-
-def require_tool_access(tool_id: str) -> Callable:
-    """
-    FastAPI dependency that checks if user can access a specific tool.
-
-    Usage:
-        @router.post("/tools/code-interpreter/execute")
-        async def execute_code(
-            user: User = Depends(require_tool_access("code_interpreter"))
-        ):
-            # User has been verified to have access
-            pass
-    """
-    from .service import get_app_role_service
-
-    async def checker(
-        user: User = Depends(get_current_user),
-    ) -> User:
-        app_role_service = get_app_role_service()
-        if not await app_role_service.can_access_tool(user, tool_id):
-            raise HTTPException(
-                status_code=status.HTTP_403_FORBIDDEN,
-                detail=f"Access denied to tool: {tool_id}",
-            )
-        return user
-
-    return checker
-
-
-def require_model_access(model_id: str) -> Callable:
-    """
-    FastAPI dependency that checks if user can access a specific model.
-
-    Usage:
-        @router.post("/chat")
-        async def chat(
-            user: User = Depends(require_model_access("claude-opus"))
-        ):
-            # User has been verified to have access
-            pass
-    """
-    from .service import get_app_role_service
-
-    async def checker(
-        user: User = Depends(get_current_user),
-    ) -> User:
-        app_role_service = get_app_role_service()
-        if not await app_role_service.can_access_model(user, model_id):
-            raise HTTPException(
-                status_code=status.HTTP_403_FORBIDDEN,
-                detail=f"Access denied to model: {model_id}",
-            )
-        return user
-
-    return checker
diff --git a/backend/src/apis/shared/sessions/metadata.py b/backend/src/apis/shared/sessions/metadata.py
index 55438f94..8925a29f 100644
--- a/backend/src/apis/shared/sessions/metadata.py
+++ b/backend/src/apis/shared/sessions/metadata.py
@@ -1919,3 +1919,83 @@ async def clear_paused_turn(session_id: str, user_id: str) -> None:
         logger.info("Cleared paused_turn for session %s", session_id)
     except Exception as e:
         logger.error("Failed to clear paused_turn: %s", e, exc_info=True)
+
+
+async def set_truncated_turn(session_id: str, user_id: str) -> None:
+    """Mark that the last turn ended in a recoverable max_tokens truncation.
+
+    Lets the client re-show the "Continue" affordance after a page refresh
+    (the truncated partial assistant message is already in AgentCore Memory;
+    this flag is the only missing piece). Idempotent overwrite. Best-effort:
+    a write failure logs but never breaks the live SSE flow. No-op when the
+    session record is missing or the table env var is unset.
+    """
+    sessions_metadata_table = os.environ.get("DYNAMODB_SESSIONS_METADATA_TABLE_NAME")
+    if not sessions_metadata_table:
+        logger.warning("DYNAMODB_SESSIONS_METADATA_TABLE_NAME not set; skipping truncated_turn persistence")
+        return
+
+    try:
+        import boto3
+
+        dynamodb = boto3.resource("dynamodb")
+        table = dynamodb.Table(sessions_metadata_table)
+
+        existing = await _get_session_by_gsi(session_id, user_id, table)
+        if not existing:
+            logger.info("Skipping truncated_turn write — session %s not found", session_id)
+            return
+
+        sk = existing.get("SK")
+        if not sk:
+            logger.warning("Session %s has no SK; cannot update truncated_turn", session_id)
+            return
+
+        table.update_item(
+            Key={"PK": f"USER#{user_id}", "SK": sk},
+            UpdateExpression="SET #ltc = :ltc",
+            ExpressionAttributeNames={"#ltc": "lastTurnContinuable"},
+            ExpressionAttributeValues={":ltc": True},
+        )
+        logger.info("Persisted truncated_turn marker for session %s", session_id)
+    except Exception as e:
+        logger.error("Failed to persist truncated_turn: %s", e, exc_info=True)
+
+
+async def clear_truncated_turn(session_id: str, user_id: str) -> None:
+    """Drop the truncated-turn marker.
+
+    Called at the start of any new turn that isn't an interrupt-resume
+    (fresh turn or a max_tokens continuation), so a stale marker can't
+    resurrect "Continue" against a turn the user moved past. If a
+    continuation itself re-truncates, the intercept re-sets the marker.
+    """
+    sessions_metadata_table = os.environ.get("DYNAMODB_SESSIONS_METADATA_TABLE_NAME")
+    if not sessions_metadata_table:
+        return
+
+    try:
+        import boto3
+
+        dynamodb = boto3.resource("dynamodb")
+        table = dynamodb.Table(sessions_metadata_table)
+
+        existing = await _get_session_by_gsi(session_id, user_id, table)
+        if not existing:
+            return
+
+        sk = existing.get("SK")
+        if not sk:
+            return
+
+        if "lastTurnContinuable" not in existing:
+            return  # Already clear
+
+        table.update_item(
+            Key={"PK": f"USER#{user_id}", "SK": sk},
+            UpdateExpression="REMOVE #ltc",
+            ExpressionAttributeNames={"#ltc": "lastTurnContinuable"},
+        )
+        logger.info("Cleared truncated_turn for session %s", session_id)
+    except Exception as e:
+        logger.error("Failed to clear truncated_turn: %s", e, exc_info=True)
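Reviewer sketch of the marker lifecycle that `set_truncated_turn` and `clear_truncated_turn` implement, with a plain dict in place of the DynamoDB item. `on_turn_start`, `on_turn_end`, and the `stop_reason` value `"max_tokens"` are assumptions made for illustration; only the attribute name `lastTurnContinuable` comes from the diff.

```python
MARKER = "lastTurnContinuable"


def on_turn_start(meta: dict, *, is_interrupt_resume: bool) -> None:
    """Drop a stale marker unless this turn resumes an interrupted one."""
    if not is_interrupt_resume:
        meta.pop(MARKER, None)  # no-op when the marker is already absent


def on_turn_end(meta: dict, *, stop_reason: str) -> None:
    """Set the marker when the turn ended in a recoverable truncation."""
    if stop_reason == "max_tokens":  # assumed stop-reason spelling
        meta[MARKER] = True  # idempotent overwrite
```

A continuation turn clears the marker at its start and sets it again only if the continuation itself truncates, which matches the docstring's description of the re-set behaviour.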
diff --git a/backend/src/apis/shared/sessions/models.py b/backend/src/apis/shared/sessions/models.py
index 771b4b36..95ef3cfa 100644
--- a/backend/src/apis/shared/sessions/models.py
+++ b/backend/src/apis/shared/sessions/models.py
@@ -185,6 +185,11 @@ class SessionMetadata(BaseModel):
         alias="pausedTurn",
         description="Agent-construction snapshot for a turn paused on OAuth consent; cleared on successful resume or when a new turn supersedes it",
     )
+    last_turn_continuable: Optional[bool] = Field(
+        default=None,
+        alias="lastTurnContinuable",
+        description="True when the last turn ended in a recoverable max_tokens truncation; lets the 'Continue' affordance survive a page refresh. Cleared at the start of any new (non-interrupt-resume) turn",
+    )
 
     # Denormalized cost + context aggregates for the session-cost badge.
     # Maintained by _bump_session_aggregates after each turn (write-time
@@ -267,6 +272,11 @@ class SessionMetadataResponse(BaseModel):
         alias="totalSummarizedTurns",
         description="Cumulative count of turns rolled into a compaction summary in this session",
     )
+    last_turn_continuable: Optional[bool] = Field(
+        default=None,
+        alias="lastTurnContinuable",
+        description="True when the last turn ended in a recoverable max_tokens truncation, so the client can re-show the 'Continue' affordance after a refresh",
+    )
 
 
 class SessionsListResponse(BaseModel):
@@ -332,11 +342,22 @@ class MessageContent(BaseModel):
 
 
 class LatencyMetrics(BaseModel):
-    """Latency measurements in milliseconds"""
+    """Latency measurements in milliseconds.
+
+    ``time_to_first_token`` is ``None`` when the provider did not emit
+    ``timeToFirstByteMs`` and we couldn't compute it locally — distinct from
+    a measured value of 0ms (which is physically impossible). Aggregations
+    over TTFT must filter ``None`` so a missing measurement doesn't pull
+    averages toward zero.
+    """
 
     model_config = ConfigDict(populate_by_name=True)
 
-    time_to_first_token: int = Field(..., alias="timeToFirstToken", description="Time from request start to first token received (ms)")
+    time_to_first_token: Optional[int] = Field(
+        None,
+        alias="timeToFirstToken",
+        description="Time from request start to first token (ms); None if not measured",
+    )
     end_to_end_latency: int = Field(..., alias="endToEndLatency", description="Total time from request start to completion (ms)")
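Reviewer sketch of the aggregation rule stated in the `LatencyMetrics` docstring: missing TTFT samples are filtered out rather than counted as zero. `mean_ttft_ms` is a hypothetical helper, not a function in this codebase.

```python
from typing import Optional, Sequence


def mean_ttft_ms(samples: Sequence[Optional[int]]) -> Optional[float]:
    """Average only the measured samples; None means 'not measured'."""
    measured = [s for s in samples if s is not None]
    if not measured:
        return None  # no measurements at all: report nothing, not 0
    return sum(measured) / len(measured)
```

With samples `[100, None, 300]` the helper averages the two measured values; treating `None` as 0 would instead pull the result down by a third.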
 
 
diff --git a/backend/src/apis/shared/sessions_bff/config.py b/backend/src/apis/shared/sessions_bff/config.py
index 311f1bbe..0a775eb9 100644
--- a/backend/src/apis/shared/sessions_bff/config.py
+++ b/backend/src/apis/shared/sessions_bff/config.py
@@ -29,9 +29,12 @@
 # fail anyway, so there's no value in carrying the cookie further.
 _DEFAULT_ABSOLUTE_LIFETIME_SECONDS = 30 * 24 * 3600
 # Don't write to DDB / re-emit cookies on every request; coalesce to once per
-# minute. Tabs that hit the BFF more often than this just ride the existing
-# row.
-_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS = 60
+# 5 minutes. Kept a strict multiple of `_DEFAULT_REFRESH_LEEWAY_SECONDS` (60s)
+# so cache-expiry (TTL = leeway) and slide-throttle boundaries are never
+# aligned — without this, a single request crossing the 60s boundary would
+# incur BOTH a `get_item` AND an `update_item` on the critical path. De-
+# alignment keeps a cache-miss request from also paying the slide-write cost.
+_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS = 60 * 5
 
 
 @dataclass(frozen=True)
diff --git a/backend/src/apis/shared/sessions_bff/cookie.py b/backend/src/apis/shared/sessions_bff/cookie.py
index 75b215f0..50741c5c 100644
--- a/backend/src/apis/shared/sessions_bff/cookie.py
+++ b/backend/src/apis/shared/sessions_bff/cookie.py
@@ -1,16 +1,38 @@
-"""Cookie codec — AES-GCM-sealed session id with a KMS-wrapped data key.
+"""Cookie codec — AES-GCM-sealed session id with a SHA-256-derived data key.
 
-Phase 1 CDK provisioned `BFFCookieSigningKey` as a symmetric KMS key. We
-follow the envelope-encryption pattern from the CDK comment:
+The CDK provisions two collaborating resources:
 
-    1. At first use, call `kms:GenerateDataKey(KeyId=...)` to get a 256-bit
-       AES key. The plaintext key is held in process memory; the wrapped
-       blob is discarded.
-    2. Cookie value = base64url( version || nonce || AES-GCM(payload) ).
-       The KMS key is *not* embedded — rotation requires a task restart,
-       which is fine for Phase 2 (rotation is on the Phase 7 hardening list).
-    3. `unseal` is constant-time on failure: any decode/auth-tag error maps
-       to `CookieDecodeError` so callers can't time-distinguish failure modes.
+    - `BFFCookieSigningKey` — symmetric KMS CMK that encrypts the secret at
+      rest. App-api never calls KMS directly; SecretsManager invokes
+      `kms:Decrypt` on the caller's behalf when `GetSecretValue` runs.
+    - `BFFCookieDataKeySecret` — Secrets Manager secret holding a 44-char
+      high-entropy random string (~261 bits of entropy) generated once at
+      stack create.
+
+Every app-api task on first use:
+
+    1. Reads the secret string from `BFFCookieDataKeySecret`.
+    2. Derives the 32-byte AES-256 key with `SHA-256(secret_string)` — a
+       single-shot KDF that is secure when the input has ≥256 bits of
+       entropy (a 44-char alphanumeric secret has ~261).
+    3. Caches the resulting `AESGCM` cipher as the process-wide singleton.
+
+This shared-secret design replaces the prior pattern of each task calling
+`kms:GenerateDataKey` directly: that produced a fresh random key per
+process, so under `desiredCount > 1` cookies sealed by Task A unsealed as
+`bad seal` on Task B (every page-load fan-out became a 401 storm). It
+also avoids the chained `AwsCustomResource` bootstrap design that broke
+on first deploy because the framework Lambda JSON-stringifies KMS's
+`Uint8Array` ciphertext as `{"0":233,...}`, exceeding the 4 KB
+CloudFormation response limit.
+
+    - Cookie value = base64url( version || nonce || AES-GCM(payload) ).
+    - The KMS key is *not* embedded — rotation requires regenerating the
+      secret AND restarting all tasks; in-flight cookies sealed under the
+      old key fail to unseal (Phase 7 hardening: kid-versioned cookies
+      enable hot rotation).
+    - `unseal` is constant-time on failure: any decode/auth-tag error maps
+      to `CookieDecodeError` so callers can't time-distinguish failure modes.
 
 Payload is JSON-encoded so we can extend `CookiePayload` later without
 breaking format compatibility (the version byte gates that). The whole
@@ -20,6 +42,7 @@
 from __future__ import annotations
 
 import base64
+import hashlib
 import json
 import logging
 import os
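Reviewer sketch of the key derivation described in the module docstring: every task hashes the same secret string and so arrives at the same 32-byte key. `derive_aes256_key` and `alphanumeric_entropy_bits` are hypothetical names, and the UTF-8 encoding and the 62-character alphabet are assumptions; the Secrets Manager fetch and the AES-GCM sealing are left out so the block needs only the standard library.

```python
import hashlib
import math


def derive_aes256_key(secret_string: str) -> bytes:
    """Hash the shared secret down to 32 bytes, the size of an AES-256 key."""
    return hashlib.sha256(secret_string.encode("utf-8")).digest()


def alphanumeric_entropy_bits(length: int, alphabet_size: int = 62) -> float:
    """Entropy of a uniformly random string over the given alphabet."""
    return length * math.log2(alphabet_size)
```

Because the derivation is deterministic, two processes that read the same secret produce the same key, which is the property the per-process `kms:GenerateDataKey` design lacked. For a 44-character alphanumeric secret the arithmetic gives just under 262 bits, in line with the figure quoted in the docstring.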
@@ -49,30 +72,55 @@ class CookieDecodeError(Exception):
     """
 
 
+class CookieDataKeyUnavailable(Exception):
+    """Raised when the data-key secret can't be fetched on first use.
+
+    Distinct from `CookieDecodeError` so callers can return 5xx (transient
+    infra problem — Secrets Manager unreachable, secret empty) rather than
+    silently clearing every active user's cookie.
+    """
+
+
 class CookieCodec:
-    """Stateful seal/unseal pair backed by a process-cached KMS data key.
+    """Stateful seal/unseal pair backed by a process-cached AES-GCM cipher.
 
     Construct one per process. The first `seal()` or `unseal()` call lazily
-    fetches the data key via `kms:GenerateDataKey`; subsequent calls reuse
-    the cached cipher. Thread-safe on the lazy-init path.
+    fetches the **shared** data-key secret from Secrets Manager and
+    derives the AES key via SHA-256; subsequent calls reuse the cached
+    cipher. Thread-safe on the lazy-init path.
+
+    Across multiple ECS tasks (`desiredCount > 1`), every task's codec
+    derives the **same** plaintext key, so cookies sealed by any task
+    unseal on any other task. This is the property that the prior
+    `kms:GenerateDataKey`-per-process design lacked.
     """
 
     def __init__(
         self,
         kms_key_arn: Optional[str] = None,
         *,
-        kms_client: Optional[object] = None,
+        data_key_secret_arn: Optional[str] = None,
+        secrets_manager_client: Optional[object] = None,
     ) -> None:
+        # `kms_key_arn` is retained for config-shape compatibility (and so
+        # callers can introspect which CMK encrypts the secret at rest), but
+        # the codec no longer calls KMS directly: Secrets Manager decrypts
+        # the secret on the caller's behalf when GetSecretValue runs.
         if kms_key_arn is None:
             kms_key_arn = os.environ.get("BFF_COOKIE_SIGNING_KEY_ARN") or ""
+        if data_key_secret_arn is None:
+            data_key_secret_arn = (
+                os.environ.get("BFF_COOKIE_DATA_KEY_SECRET_ARN") or ""
+            )
         self._kms_key_arn = kms_key_arn
-        self._kms_client = kms_client
+        self._data_key_secret_arn = data_key_secret_arn
+        self._secrets_manager_client = secrets_manager_client
         self._cipher: Optional[AESGCM] = None
         self._init_lock = Lock()
 
     @property
     def enabled(self) -> bool:
-        return bool(self._kms_key_arn)
+        return bool(self._kms_key_arn) and bool(self._data_key_secret_arn)
 
     def _ensure_cipher(self) -> AESGCM:
         if self._cipher is not None:
@@ -80,16 +128,40 @@ def _ensure_cipher(self) -> AESGCM:
         with self._init_lock:
             if self._cipher is not None:
                 return self._cipher
-            if not self._kms_key_arn:
+            # Configuration missing — surface as decode error so the
+            # middleware path stays the same as for a `bad seal` and clears
+            # the cookie. (This branch is normally only hit in tests or in
+            # a misconfigured deploy; the env vars are populated by CDK.)
+            if not self._kms_key_arn or not self._data_key_secret_arn:
                 raise CookieDecodeError()
-            kms = self._kms_client or boto3.client("kms")
-            response = kms.generate_data_key(
-                KeyId=self._kms_key_arn,
-                KeySpec="AES_256",
-            )
-            plaintext_key = response["Plaintext"]
+
+            sm = self._secrets_manager_client or boto3.client("secretsmanager")
+            try:
+                secret = sm.get_secret_value(SecretId=self._data_key_secret_arn)
+            except Exception as exc:
+                # Infra failure — propagate so the request returns 5xx
+                # rather than silently invalidating sessions.
+                raise CookieDataKeyUnavailable(
+                    f"Failed to fetch BFF data key secret from Secrets Manager: {exc}"
+                ) from exc
+            secret_string = secret.get("SecretString") or ""
+            if not secret_string:
+                raise CookieDataKeyUnavailable(
+                    "BFF cookie data key secret is empty — bootstrap missing"
+                )
+
+            # Single-shot KDF: SHA-256 of a high-entropy random input
+            # produces a uniformly distributed 32-byte AES-256 key. The
+            # CDK generates a 44-char alphanumeric secret (~261 bits of
+            # entropy), so the SHA-256 output is statistically
+            # indistinguishable from random.
+            plaintext_key = hashlib.sha256(secret_string.encode("utf-8")).digest()
+
             self._cipher = AESGCM(plaintext_key)
-            logger.info("BFF cookie codec initialized (KMS data key fetched)")
+            logger.info(
+                "BFF cookie codec initialized "
+                "(data key derived from Secrets Manager secret via SHA-256)"
+            )
             return self._cipher
 
     def seal(self, payload: CookiePayload) -> str:
@@ -119,10 +191,10 @@ def unseal(self, value: str) -> CookiePayload:
         unknown version) raise `CookieDecodeError` with no information about
         the cause — callers treat every decode failure identically.
 
-        Infrastructure failures from `_ensure_cipher` (KMS unavailable, etc.)
-        propagate up so the middleware can return 5xx instead of silently
-        clearing the session cookie and forcing every active user to re-login
-        on a transient KMS hiccup.
+        Infrastructure failures from `_ensure_cipher` (Secrets Manager
+        unavailable, etc.) propagate up so the middleware can return 5xx
+        instead of silently clearing the session cookie and forcing every
+        active user to re-login on a transient hiccup.
         """
         # Cipher acquisition is intentionally outside the try/except below —
         # a botocore error here must not be coerced into CookieDecodeError.
@@ -162,11 +234,16 @@ def unseal(self, value: str) -> CookiePayload:
             raise CookieDecodeError()
 
 
-# Process-wide singleton. Every codec instance pulls a fresh random data key
-# from KMS, so two codecs in one process can never decrypt each other's
-# output — the seal happens in the auth/callback route, the unseal happens
-# in `SessionRefreshMiddleware`, and they MUST share a cipher. Treat this as
-# the only construction path in production code.
+# Process-wide singleton. The first `seal` or `unseal` call fetches the
+# shared data-key secret from Secrets Manager and derives the AES key via
+# SHA-256; subsequent calls reuse the same `AESGCM` cipher. Across
+# processes (e.g. multiple ECS tasks under `desiredCount > 1`), every
+# task's singleton derives the **same** plaintext key — so a cookie
+# sealed by any task unseals on any other task, including across rolling
+# deploys where two task revisions briefly coexist. The seal happens in
+# the auth/callback route, the unseal happens in `SessionRefreshMiddleware`
+# and the voice WebSocket route, and they all MUST go through this
+# singleton.
 _default_codec: Optional[CookieCodec] = None
 _default_codec_lock = Lock()
 
@@ -190,7 +267,7 @@ def _reset_default_codec_for_tests() -> None:
 
 def _set_default_codec_for_tests(codec: CookieCodec) -> None:
     """Install a pre-built codec (typically with `_cipher` pre-injected so
-    no KMS call is made) as the process-wide singleton."""
+    no Secrets Manager call is made) as the process-wide singleton."""
     global _default_codec
     with _default_codec_lock:
         _default_codec = codec
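The shared-key property described in the codec docstrings can be illustrated without AWS. This is a minimal sketch, assuming a hypothetical `derive_aes_key` helper and a made-up secret string; the real codec fetches the secret from Secrets Manager and wraps the derived key in `AESGCM`, both of which are left out here so the example stays standard-library only.

```python
import hashlib


def derive_aes_key(secret_string: str) -> bytes:
    """Hypothetical helper: SHA-256 of the shared secret yields 32 key bytes."""
    return hashlib.sha256(secret_string.encode("utf-8")).digest()


# Made-up value standing in for the Secrets Manager secret; not a real key.
SHARED_SECRET = "example-shared-secret-not-a-real-key"

# Two "tasks" derive their key independently from the same secret string.
task_a_key = derive_aes_key(SHARED_SECRET)
task_b_key = derive_aes_key(SHARED_SECRET)
```

Because the derivation is deterministic, every process that reads the same secret ends up with the same 32-byte key, which is what lets a cookie sealed on one task unseal on another.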
diff --git a/backend/src/apis/shared/sessions_bff/refresh.py b/backend/src/apis/shared/sessions_bff/refresh.py
index 0a2c5709..4c515ac7 100644
--- a/backend/src/apis/shared/sessions_bff/refresh.py
+++ b/backend/src/apis/shared/sessions_bff/refresh.py
@@ -15,6 +15,7 @@
 
 from __future__ import annotations
 
+import asyncio
 import base64
 import hashlib
 import hmac
@@ -156,9 +157,15 @@ def _idp(self):
             return self._cognito_idp
         return boto3.client("cognito-idp", region_name=self._region)
 
-    def refresh(self, *, username: str, refresh_token: str) -> RefreshResult:
-        """Call Cognito to exchange the refresh token for a fresh access
-        token. Raises `CognitoRefreshError` on any failure."""
+    def _refresh_sync(self, *, username: str, refresh_token: str) -> RefreshResult:
+        """Synchronous Cognito refresh exchange.
+
+        This is the raw boto3 path — kept private so callers can't invoke it
+        directly from the event loop. Use :meth:`refresh` instead, which
+        offloads this call via ``asyncio.to_thread`` so the uvicorn event
+        loop stays responsive (and so other sessions' ``get_session_lock``
+        acquisitions can still progress while ours is held).
+        """
         if not self.enabled:
             raise CognitoRefreshError("BFF refresh client is not configured")
 
@@ -187,3 +194,22 @@ def refresh(self, *, username: str, refresh_token: str) -> RefreshResult:
             id_token=result.get("IdToken"),
             access_token_exp=int(time.time()) + expires_in,
         )
+
+    async def refresh(self, *, username: str, refresh_token: str) -> RefreshResult:
+        """Exchange the refresh token for a fresh access token, off the loop.
+
+        Offloads the synchronous boto3 ``initiate_auth`` call via
+        ``asyncio.to_thread`` so the event loop keeps scheduling other
+        coroutines while Cognito is in flight. Critically, this matters
+        while the per-session ``get_session_lock(session_id)`` is held —
+        unrelated sessions' locks must remain acquirable on the loop.
+
+        The exception contract and :class:`RefreshResult` return shape are
+        identical to :meth:`_refresh_sync`: ``CognitoRefreshError`` is
+        raised on any Cognito failure and should be treated as terminal.
+        """
+        return await asyncio.to_thread(
+            self._refresh_sync,
+            username=username,
+            refresh_token=refresh_token,
+        )
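The offload pattern used by `refresh` can be sketched in isolation. The names below (`blocking_exchange`, `exchange`) are hypothetical stand-ins, and `time.sleep` replaces the real boto3 call; the point is only that the blocking function runs on a worker thread rather than on the event-loop thread.

```python
import asyncio
import threading
import time


def blocking_exchange(token: str) -> dict:
    """Stand-in for a synchronous boto3 call (hypothetical, no network)."""
    time.sleep(0.05)
    return {"token": token.upper(), "thread": threading.get_ident()}


async def exchange(token: str) -> dict:
    # Run the blocking function on a worker thread; the loop stays free
    # to schedule other coroutines while it sleeps.
    return await asyncio.to_thread(blocking_exchange, token)


async def main() -> tuple:
    loop_thread = threading.get_ident()
    result = await exchange("abc")
    return loop_thread, result


loop_thread_id, outcome = asyncio.run(main())
```

`asyncio.to_thread` requires Python 3.9 or newer; keyword arguments are forwarded to the target function, which is how the `username=` and `refresh_token=` arguments reach `_refresh_sync`.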
diff --git a/backend/src/apis/shared/sessions_bff/repository.py b/backend/src/apis/shared/sessions_bff/repository.py
index c5eb39ff..5ba510be 100644
--- a/backend/src/apis/shared/sessions_bff/repository.py
+++ b/backend/src/apis/shared/sessions_bff/repository.py
@@ -8,13 +8,28 @@
     Attrs: user_id, cognito_access_token, cognito_refresh_token, id_token,
            access_token_exp, csrf_secret, created_at, last_seen_at, ttl
 
+    Cross-task refresh-lock attrs (added at runtime, never on the initial
+    write — both default to "absent" until a refresh contender writes them):
+       refresh_lock_owner: short opaque token identifying the leader
+       refresh_lock_until: epoch seconds; lock is considered expired past this
+
 The `ttl` attribute is wired to DynamoDB TTL so absolute session lifetime is
 enforced by the table itself — even if a session row is somehow leaked from
 the cleanup paths, DynamoDB will eventually evict it.
+
+The refresh-lock attrs coordinate the Cognito refresh exchange across tasks:
+the per-process `get_session_lock` and `single_flight` only coalesce within
+a single Python process, so under `desiredCount > 1` two tasks can otherwise
+issue parallel `cognito-idp:initiate_auth` calls with the same refresh token —
+Cognito rotates on the first; the second fails `NotAuthorizedException` and
+the loser unilaterally clears the user's cookie. The lock turns this into a
+leader/follower handoff: one task does the refresh, the other reads the
+freshly persisted tokens off the row.
 """
 
 from __future__ import annotations
 
+import asyncio
 import logging
 import os
 import time
@@ -31,11 +46,16 @@
 class SessionRepository:
     """Async-shaped wrapper over the BFF sessions DynamoDB table.
 
-    The methods are declared `async` to match the rest of `apis.shared`,
-    but boto3 is sync — calls run on the event loop thread. That mirrors
-    `UserRepository` and is intentional: refresh-storm coalescing happens
-    one layer up via `get_session_lock()`, so the lookup itself never
-    fans out enough to need a thread pool.
+    The methods are ``async def`` and offload each boto3 call via
+    ``asyncio.to_thread`` so the uvicorn event loop stays free to schedule
+    unrelated coroutines during the DynamoDB round-trip. Without this
+    offload, a single slow DDB call freezes every in-flight request — and
+    under page-load fan-out the blocking calls serialize, producing the
+    80s+ latency tails that motivated the event-loop-blocking bugfix.
+
+    The ``_item_to_record`` translation and the post-read TTL
+    defense-in-depth check run on the calling coroutine (pure Python, no
+    I/O); only the boto3 round-trip is offloaded.
     """
 
     def __init__(self, table_name: Optional[str] = None) -> None:
@@ -100,8 +120,14 @@ def _record_to_item(record: SessionRecord) -> dict:
     async def get(self, session_id: str) -> Optional[SessionRecord]:
         if not self._enabled:
             return None
+
+        key = self._key(session_id)
+
+        def _call() -> dict:
+            return self._table.get_item(Key=key)
+
         try:
-            response = self._table.get_item(Key=self._key(session_id))
+            response = await asyncio.to_thread(_call)
         except ClientError as exc:
             logger.error("BFF session get_item failed for %s: %s", session_id, exc)
             return None
@@ -118,7 +144,13 @@ async def get(self, session_id: str) -> Optional[SessionRecord]:
     async def put(self, record: SessionRecord) -> None:
         if not self._enabled:
             return
-        self._table.put_item(Item=self._record_to_item(record))
+
+        item = self._record_to_item(record)
+
+        def _call() -> None:
+            self._table.put_item(Item=item)
+
+        await asyncio.to_thread(_call)
 
     async def update_tokens(
         self,
@@ -129,6 +161,7 @@ async def update_tokens(
         access_token_exp: int,
         last_seen_at: int,
         ttl: Optional[int] = None,
+        expected_lock_owner: Optional[str] = None,
     ) -> None:
         """Atomically replace the Cognito tokens after a refresh.
 
@@ -138,6 +171,21 @@ async def update_tokens(
         is supplied, the row's DynamoDB TTL slides forward in the same write
         — a refresh proves the user is active, so the session row's expiry
         should slide alongside it.
+
+        When `expected_lock_owner` is supplied, the write is conditional on
+        the row's `refresh_lock_owner` attribute strictly matching. The lock
+        attributes are also REMOVED in the same write, releasing the
+        cross-task lock alongside the token rotation. The condition fails
+        in two distinct stale-leader cases, neither of which may overwrite
+        the row:
+
+        1. A peer holds the lock right now (their owner != ours) — we never
+           had it or our acquisition was stale.
+        2. A peer held the lock, completed the refresh, and `REMOVE`d the
+           attrs — the row has no lock owner at all but our tokens are
+           now older than the row's persisted state.
+
+        Both surface as `ConditionalCheckFailedException`; the caller
+        re-reads the row and adopts the peer's tokens instead of stomping.
         """
         if not self._enabled:
             return
@@ -159,7 +207,7 @@ async def update_tokens(
         if ttl is not None:
             update_expr += ", #ttl = :ttl"
             expr_values[":ttl"] = ttl
-        kwargs = {
+        kwargs: dict = {
             "Key": self._key(session_id),
             "UpdateExpression": update_expr,
             "ExpressionAttributeValues": expr_values,
@@ -167,7 +215,130 @@ async def update_tokens(
         if ttl is not None:
             # `ttl` is a reserved word in DynamoDB expressions.
             kwargs["ExpressionAttributeNames"] = {"#ttl": "ttl"}
-        self._table.update_item(**kwargs)
+
+        if expected_lock_owner is not None:
+            # Atomically release the cross-task refresh lock alongside the
+            # token write. The condition is strict — `refresh_lock_owner`
+            # MUST equal our owner. We don't accept "lock attrs absent"
+            # because that's exactly the stale-leader stomp case: a peer
+            # whose lock TTL'd, took over, refreshed, and persisted (which
+            # REMOVEs the lock attrs) — letting `attribute_not_exists`
+            # match here would let our stale tokens overwrite the peer's
+            # freshly rotated ones, silently logging the user out on the
+            # next request when Cognito rejects our (now-revoked) refresh
+            # token. The leader always set these attrs in
+            # `try_acquire_refresh_lock`, so the strict form is correct
+            # in every legitimate flow.
+            kwargs["UpdateExpression"] = (
+                update_expr + " REMOVE refresh_lock_owner, refresh_lock_until"
+            )
+            expr_values[":owner"] = expected_lock_owner
+            kwargs["ConditionExpression"] = "refresh_lock_owner = :owner"
+
+        def _call() -> None:
+            self._table.update_item(**kwargs)
+
+        await asyncio.to_thread(_call)
+
+    async def try_acquire_refresh_lock(
+        self,
+        session_id: str,
+        owner: str,
+        lock_ttl_seconds: int,
+    ) -> bool:
+        """Atomically claim leadership of a cross-task Cognito refresh.
+
+        Conditional `UpdateItem` on the session row: succeeds (returns True)
+        only if no peer holds the lock OR the holder's lock has expired
+        (`refresh_lock_until < now`). On contention returns False — the
+        caller should poll the row for the leader's persisted tokens.
+
+        Lock TTL bounds the worst case: a leader that crashes mid-refresh
+        strands the lock for at most `lock_ttl_seconds` (we use 30s in the
+        middleware), after which any peer can re-acquire and retry.
+
+        Returns False on `ConditionalCheckFailedException`. Other DDB
+        errors propagate so the caller can surface them as 5xx — silently
+        suppressing them would create a "neither leader nor follower" gap.
+        """
+        if not self._enabled:
+            return False
+        now = int(time.time())
+        kwargs: dict = {
+            "Key": self._key(session_id),
+            "UpdateExpression": (
+                "SET refresh_lock_owner = :owner, "
+                "refresh_lock_until = :until"
+            ),
+            # `attribute_exists(PK)` guards against UpdateItem's
+            # upsert-by-default behavior — without it, a logout that races
+            # the refresh path (deletes the row between `repository.get()`
+            # and this call) would let us create a phantom row containing
+            # only the lock attrs and no `ttl`, which DDB TTL would never
+            # reap. With it, lock acquisition on a missing row fails
+            # cleanly via ConditionalCheckFailedException → False.
+            "ConditionExpression": (
+                "attribute_exists(PK) AND ("
+                "attribute_not_exists(refresh_lock_until) "
+                "OR refresh_lock_until < :now)"
+            ),
+            "ExpressionAttributeValues": {
+                ":owner": owner,
+                ":until": now + lock_ttl_seconds,
+                ":now": now,
+            },
+        }
+
+        def _call() -> bool:
+            try:
+                self._table.update_item(**kwargs)
+                return True
+            except ClientError as exc:
+                if (
+                    exc.response.get("Error", {}).get("Code")
+                    == "ConditionalCheckFailedException"
+                ):
+                    return False
+                raise
+
+        return await asyncio.to_thread(_call)
+
+    async def release_refresh_lock(self, session_id: str, owner: str) -> None:
+        """Release the cross-task refresh lock if `owner` still holds it.
+
+        Used when the leader's Cognito refresh fails terminally and we want
+        a peer to be able to retry without waiting for the full lock TTL.
+        Best-effort: a `ConditionalCheckFailedException` (lock TTL'd or
+        re-acquired) is treated as a no-op.
+
+        `update_tokens` clears the lock attributes atomically with a
+        successful refresh, so this is only for the failure path.
+        """
+        if not self._enabled:
+            return
+        kwargs: dict = {
+            "Key": self._key(session_id),
+            "UpdateExpression": (
+                "REMOVE refresh_lock_owner, refresh_lock_until"
+            ),
+            "ConditionExpression": "refresh_lock_owner = :owner",
+            "ExpressionAttributeValues": {":owner": owner},
+        }
+
+        def _call() -> None:
+            try:
+                self._table.update_item(**kwargs)
+            except ClientError as exc:
+                code = exc.response.get("Error", {}).get("Code")
+                if code == "ConditionalCheckFailedException":
+                    return  # peer re-acquired or lock TTL'd — fine
+                logger.warning(
+                    "BFF refresh lock release failed for %s: %s",
+                    session_id,
+                    exc,
+                )
+
+        await asyncio.to_thread(_call)
 
     async def touch_last_seen(
         self,
@@ -195,8 +366,12 @@ async def touch_last_seen(
             expr_values[":ttl"] = ttl
             kwargs["UpdateExpression"] = update_expr
             kwargs["ExpressionAttributeNames"] = {"#ttl": "ttl"}
-        try:
+
+        def _call() -> None:
             self._table.update_item(**kwargs)
+
+        try:
+            await asyncio.to_thread(_call)
         except ClientError as exc:
             # Touch failures are non-critical — log and move on rather than
             # surfacing as a request error.
@@ -205,4 +380,10 @@ async def touch_last_seen(
     async def delete(self, session_id: str) -> None:
         if not self._enabled:
             return
-        self._table.delete_item(Key=self._key(session_id))
+
+        key = self._key(session_id)
+
+        def _call() -> None:
+            self._table.delete_item(Key=key)
+
+        await asyncio.to_thread(_call)
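The leader/follower handoff in `try_acquire_refresh_lock` and `update_tokens` can be modelled in memory to check the conditions by hand. This is a sketch under stated assumptions: `FakeSessionRow` is a hypothetical stand-in for a single DynamoDB row, timestamps are passed in explicitly instead of read from the clock, and a `False` return stands in for `ConditionalCheckFailedException`.

```python
class FakeSessionRow:
    """In-memory stand-in for one session row (hypothetical, not DynamoDB)."""

    def __init__(self) -> None:
        self.attrs: dict = {"tokens": "t0"}

    def try_acquire(self, owner: str, ttl: int, now: int) -> bool:
        # Mirrors: attribute_not_exists(refresh_lock_until)
        #          OR refresh_lock_until < :now
        until = self.attrs.get("refresh_lock_until")
        if until is not None and until >= now:
            return False
        self.attrs["refresh_lock_owner"] = owner
        self.attrs["refresh_lock_until"] = now + ttl
        return True

    def update_tokens(self, tokens: str, expected_owner: str) -> bool:
        # Mirrors the strict condition: refresh_lock_owner = :owner
        if self.attrs.get("refresh_lock_owner") != expected_owner:
            return False  # stands in for ConditionalCheckFailedException
        self.attrs["tokens"] = tokens
        # Release the lock in the same "write", like the REMOVE clause.
        self.attrs.pop("refresh_lock_owner", None)
        self.attrs.pop("refresh_lock_until", None)
        return True


row = FakeSessionRow()
a_won = row.try_acquire("task-a", ttl=30, now=1000)
b_blocked = row.try_acquire("task-b", ttl=30, now=1010)
b_took_over = row.try_acquire("task-b", ttl=30, now=1031)  # task-a's lock expired
b_wrote = row.update_tokens("t-from-b", expected_owner="task-b")
a_stale_write = row.update_tokens("t-from-a", expected_owner="task-a")
```

The last call is the stale-leader case: the lock attributes are absent after task-b's write, and because the condition is a strict owner match rather than "absent or mine", task-a's older tokens are rejected.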
diff --git a/backend/src/apis/shared/sessions_bff/single_flight.py b/backend/src/apis/shared/sessions_bff/single_flight.py
new file mode 100644
index 00000000..0af06ab6
--- /dev/null
+++ b/backend/src/apis/shared/sessions_bff/single_flight.py
@@ -0,0 +1,117 @@
+"""Per-session single-flight primitive — session-resolve path coalescing.
+
+`get_session_lock` in `lock.py` only serializes the Cognito refresh exchange.
+It does NOT coalesce the upstream unseal -> `SessionCache.get` ->
+`SessionRepository.get` -> `needs_refresh` sequence. When Angular's ~8-endpoint
+page-load fan-out hits a cold cache window, each coroutine independently
+observes the miss and each runs its own blocking `get_item`, producing ~N
+DynamoDB round-trips per cache window per session.
+
+The primitive in this module addresses that gap with a per-session
+`asyncio.Future`: the first caller (the "leader") registers a Future under the
+session id, runs the loader, and stores the result/exception on the Future.
+Concurrent callers that arrive while the leader is still running (the
+"followers") find the existing Future and simply `await` it, sharing the
+leader's single DynamoDB call.
+
+This is a separate primitive from `get_session_lock`. The existing lock scope
+around the Cognito exchange is preserved end-to-end — this single-flight sits
+upstream of it.
+"""
+
+from __future__ import annotations
+
+import asyncio
+from threading import Lock as _ThreadLock
+from typing import Awaitable, Callable, Dict, Optional, Tuple
+
+from apis.shared.sessions_bff.models import SessionRecord
+
+# Module-level registry of in-flight resolves keyed by `session_id`.
+# Unlike `lock.py`, we use a plain `dict` rather than a `WeakValueDictionary`
+# because an `asyncio.Future` that is only referenced by its awaiters would
+# otherwise be collected if every awaiter was garbage-collected before
+# resolution — the leader is responsible for removing its entry in a
+# `finally` block, which keeps lifetime management explicit.
+_inflight: Dict[str, "asyncio.Future[Tuple[Optional[SessionRecord], bool]]"] = {}
+_registry_guard = _ThreadLock()
+
+
+async def resolve_once(
+    session_id: str,
+    loader_coro_factory: Callable[
+        [], Awaitable[Tuple[Optional[SessionRecord], bool]]
+    ],
+) -> Tuple[Optional[SessionRecord], bool]:
+    """Run `loader_coro_factory()` at most once per concurrent `session_id`.
+
+    Leader semantics: the first caller for a given `session_id` creates a new
+    `asyncio.Future`, registers it under the thread-lock-guarded registry,
+    runs the loader, sets the result or exception on the Future, removes the
+    entry from the registry, and returns the value.
+
+    Follower semantics: any caller that finds an existing Future `await`s it
+    and returns its value, sharing the leader's single loader invocation.
+
+    Exception propagation: an exception raised by the loader is set on the
+    Future so it propagates to the leader AND to every follower currently
+    awaiting. The registry entry is always removed before the leader returns
+    (success or failure), so any subsequent call after the failure starts a
+    fresh leader.
+
+    Isolation: distinct `session_id`s do not share a Future — the registry is
+    keyed by `session_id`.
+    """
+    # Fast path: look for an existing Future without holding the thread lock.
+    existing = _inflight.get(session_id)
+    if existing is not None:
+        return await existing
+
+    # Slow path: register a new Future under the thread lock, double-checking
+    # so two coroutines on different threads can't race-create two Futures.
+    loop = asyncio.get_running_loop()
+    with _registry_guard:
+        existing = _inflight.get(session_id)
+        if existing is not None:
+            # Another caller won the race — fall through to follower path.
+            future = existing
+            is_leader = False
+        else:
+            future = loop.create_future()
+            _inflight[session_id] = future
+            is_leader = True
+
+    if not is_leader:
+        return await future
+
+    # Leader path — run the loader, set the result/exception, and clean up.
+    try:
+        result = await loader_coro_factory()
+    except BaseException as exc:  # noqa: BLE001 — we must propagate everything
+        if not future.done():
+            future.set_exception(exc)
+            # Mark the exception as retrieved on the leader's side. Followers
+            # still observe it when they `await` the Future; this only
+            # silences the "Future exception was never retrieved" warning
+            # emitted when no follower ever attached.
+            future.exception()
+        with _registry_guard:
+            # Only clear our own entry — another leader may have taken over
+            # after we set the exception, though in practice that's only
+            # possible if every follower has already consumed the Future.
+            if _inflight.get(session_id) is future:
+                del _inflight[session_id]
+        raise
+    else:
+        if not future.done():
+            future.set_result(result)
+        with _registry_guard:
+            if _inflight.get(session_id) is future:
+                del _inflight[session_id]
+        return result
+
+
+def _reset_for_tests() -> None:
+    """Test-only escape hatch — drop all tracked in-flight Futures."""
+    with _registry_guard:
+        _inflight.clear()
diff --git a/backend/src/apis/shared/tools/models.py b/backend/src/apis/shared/tools/models.py
index c1adc74b..afa4a3d6 100644
--- a/backend/src/apis/shared/tools/models.py
+++ b/backend/src/apis/shared/tools/models.py
@@ -7,7 +7,7 @@
 
 from datetime import datetime, timezone
 from enum import Enum
-from typing import Dict, List, Optional
+from typing import Any, Dict, List, Literal, Optional
 
 from pydantic import BaseModel, Field
 
@@ -280,6 +280,89 @@ def from_dict(cls, data: dict) -> "A2AAgentConfig":
         )
 
 
+# =============================================================================
+# MCP Apps — tool UI metadata (SEP-1865)
+# =============================================================================
+
+# Spec values for `_meta.ui.visibility`. "model" = the model may see/call the
+# tool; "app" = an embedded MCP App may call it. Default per spec is both.
+ToolVisibility = Literal["model", "app"]
+DEFAULT_TOOL_VISIBILITY: List[ToolVisibility] = ["model", "app"]
+
+
+class ToolUIMetadata(BaseModel):
+    """Parsed `_meta.ui` from an MCP `tools/list` entry (MCP Apps / SEP-1865).
+
+    PR #2 of the MCP Apps host-renderer initiative only consumes
+    `resource_uri` and `visibility`. The full `_meta.ui` payload is retained
+    verbatim in `raw` so later PRs (the `resources/read` fetch path, CSP /
+    permissions handling, the postMessage bridge) can read it without another
+    server round-trip. `_meta` is discovered live from the server, not
+    admin-configured, so this never round-trips through DynamoDB.
+    """
+
+    resource_uri: Optional[str] = Field(
+        None,
+        description="The `ui://` URI from `_meta.ui.resourceUri` (fetched via "
+        "`resources/read` in a later PR; never inlined).",
+    )
+    visibility: List[ToolVisibility] = Field(
+        default_factory=lambda: list(DEFAULT_TOOL_VISIBILITY),
+        description="Surfaces allowed to see/invoke the tool. Defaults to "
+        "['model', 'app'] when the server omits `visibility`.",
+    )
+    raw: Dict[str, Any] = Field(
+        default_factory=dict,
+        description="Verbatim `_meta.ui` payload as returned by the server.",
+    )
+
+    model_config = {"use_enum_values": True}
+
+    @classmethod
+    def from_meta(cls, meta: Optional[Dict[str, Any]]) -> Optional["ToolUIMetadata"]:
+        """Parse a tool's `_meta` dict into `ToolUIMetadata`.
+
+        Returns None when the tool carries no `_meta.ui` block (an ordinary,
+        non-UI tool). An absent `visibility` defaults to the spec default
+        (`['model', 'app']`); an explicitly present `visibility` list is
+        honored after dropping unrecognized entries (so `[]` or `['app']`
+        correctly hides the tool from the model).
+        """
+        if not isinstance(meta, dict):
+            return None
+        ui = meta.get("ui")
+        if not isinstance(ui, dict):
+            return None
+
+        raw_visibility = ui.get("visibility")
+        if isinstance(raw_visibility, list):
+            visibility: List[ToolVisibility] = [
+                v for v in raw_visibility if v in ("model", "app")
+            ]
+        else:
+            visibility = list(DEFAULT_TOOL_VISIBILITY)
+
+        resource_uri = ui.get("resourceUri")
+        return cls(
+            resource_uri=resource_uri if isinstance(resource_uri, str) else None,
+            visibility=visibility,
+            raw=dict(ui),
+        )
+
+    def visible_to_model(self) -> bool:
+        """True if the model is allowed to see/call this tool."""
+        return "model" in self.visibility
+
+    def visible_to_app(self) -> bool:
+        """True if an embedded MCP App may call this tool (SEP-1865).
+
+        PR #5 gates the app-initiated `tools/call` proxy on this at both
+        the app-api boundary and the inference-api dispatch (spec MUST:
+        reject `tools/call` from apps for tools whose visibility excludes
+        `"app"`).
+        """
+        return "app" in self.visibility
+
+
 # =============================================================================
 # Database Models (stored in DynamoDB)
 # =============================================================================
@@ -346,6 +429,21 @@ class ToolDefinition(BaseModel):
         description="A2A agent configuration (required when protocol is 'a2a')",
     )
 
+    # MCP Apps (SEP-1865) — derived live from the MCP server's `tools/list`
+    # `_meta.ui`, not admin-configured, so these are intentionally NOT
+    # round-tripped through DynamoDB. Defaults make the field inert for every
+    # existing tool (full visibility, no UI resource).
+    visibility: List[ToolVisibility] = Field(
+        default_factory=lambda: list(DEFAULT_TOOL_VISIBILITY),
+        description="Surfaces allowed to see/invoke this tool, from "
+        "`_meta.ui.visibility`. Defaults to ['model', 'app'].",
+    )
+    ui_metadata: Optional[ToolUIMetadata] = Field(
+        None,
+        description="Parsed `_meta.ui` block when the tool ships an MCP App "
+        "UI resource; None for ordinary tools.",
+    )
+
     # Audit
     created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
     updated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
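The visibility rules in `ToolUIMetadata.from_meta` can be exercised without pydantic. The sketch below assumes a hypothetical `parse_visibility` function that mirrors only the visibility branch of `from_meta`; the `ui://example/widget` URI is made up for illustration.

```python
from typing import Any, Dict, List, Optional

DEFAULT_VISIBILITY: List[str] = ["model", "app"]


def parse_visibility(meta: Optional[Dict[str, Any]]) -> Optional[List[str]]:
    """Return the visibility list, or None when there is no `_meta.ui` block."""
    if not isinstance(meta, dict):
        return None
    ui = meta.get("ui")
    if not isinstance(ui, dict):
        return None
    raw = ui.get("visibility")
    if isinstance(raw, list):
        # An explicit list wins, minus any values outside the two known ones.
        return [v for v in raw if v in ("model", "app")]
    # Missing (or non-list) visibility falls back to the default.
    return list(DEFAULT_VISIBILITY)


no_ui = parse_visibility({"other": 1})
defaulted = parse_visibility({"ui": {"resourceUri": "ui://example/widget"}})
app_only = parse_visibility({"ui": {"visibility": ["app", "bogus"]}})
hidden = parse_visibility({"ui": {"visibility": []}})
```

Note the difference between an absent `visibility` key (default to both surfaces) and an explicit empty list (visible to neither).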
diff --git a/backend/src/apis/shared/user_menu_links/__init__.py b/backend/src/apis/shared/user_menu_links/__init__.py
new file mode 100644
index 00000000..017e0d8e
--- /dev/null
+++ b/backend/src/apis/shared/user_menu_links/__init__.py
@@ -0,0 +1 @@
+"""User-menu links: admin-managed links shown in the SPA user menu."""
diff --git a/backend/src/apis/shared/user_menu_links/models.py b/backend/src/apis/shared/user_menu_links/models.py
new file mode 100644
index 00000000..81577613
--- /dev/null
+++ b/backend/src/apis/shared/user_menu_links/models.py
@@ -0,0 +1,180 @@
+"""Models for admin-managed user-menu links.
+
+A link is either:
+  - ``external``: opens ``url`` in a new tab
+  - ``modal``:    opens an in-app modal that renders ``body_markdown``
+
+Storage uses single-table design with a fixed partition (``USER_MENU_LINKS``)
+since the data is global / single-tenant. When per-org scoping is needed, the
+PK becomes ``USER_MENU_LINKS#<org_id>`` without touching the SK shape.
+"""
+
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Any, Dict, List, Literal, Optional
+
+from pydantic import BaseModel, Field, field_validator, model_validator
+
+LinkKind = Literal["external", "modal"]
+
+
+def _utc_now() -> str:
+    return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
+
+
+def _validate_http_url(value: Optional[str]) -> Optional[str]:
+    """Reject non-http(s) URLs. Defense-in-depth — Angular's DomSanitizer
+    also strips ``javascript:`` URLs from ``[href]``, but anyone hitting the
+    API directly (curl, scripts, a malicious admin) bypasses the SPA form."""
+    if value is None or value == "":
+        return value
+    lowered = value.strip().lower()
+    if not (lowered.startswith("http://") or lowered.startswith("https://")):
+        raise ValueError("url must start with http:// or https://")
+    return value
+
+
+@dataclass
+class UserMenuLink:
+    """Admin-managed user-menu link stored in DynamoDB."""
+
+    link_id: str
+    label: str
+    kind: LinkKind
+    created_at: str
+    updated_at: str
+    enabled: bool = True
+    order: int = 0
+    url: Optional[str] = None  # external kind only
+    body_markdown: Optional[str] = None  # modal kind only
+    created_by: Optional[str] = None
+
+    def to_dynamo_item(self) -> Dict[str, Any]:
+        item: Dict[str, Any] = {
+            "PK": "USER_MENU_LINKS",
+            "SK": f"LINK#{self.link_id}",
+            "linkId": self.link_id,
+            "label": self.label,
+            "kind": self.kind,
+            "enabled": self.enabled,
+            "order": self.order,
+            "createdAt": self.created_at,
+            "updatedAt": self.updated_at,
+        }
+        if self.url:
+            item["url"] = self.url
+        if self.body_markdown:
+            item["bodyMarkdown"] = self.body_markdown
+        if self.created_by:
+            item["createdBy"] = self.created_by
+        return item
+
+    @classmethod
+    def from_dynamo_item(cls, item: Dict[str, Any]) -> "UserMenuLink":
+        try:
+            created_at = item["createdAt"]
+            updated_at = item["updatedAt"]
+        except KeyError as e:
+            raise ValueError(
+                f"User-menu link item {item.get('SK', '?')} is missing required "
+                f"timestamp field: {e.args[0]}"
+            ) from e
+        return cls(
+            link_id=item["linkId"],
+            label=item["label"],
+            kind=item["kind"],
+            enabled=item.get("enabled", True),
+            order=int(item.get("order", 0)),
+            url=item.get("url"),
+            body_markdown=item.get("bodyMarkdown"),
+            created_at=created_at,
+            updated_at=updated_at,
+            created_by=item.get("createdBy"),
+        )
+
+
+# =============================================================================
+# Pydantic request/response models
+# =============================================================================
+
+
+class _LinkFieldsMixin(BaseModel):
+    """Field validation shared by Create + Update."""
+
+    @model_validator(mode="after")
+    def _validate_kind_fields(self) -> "_LinkFieldsMixin":
+        kind = getattr(self, "kind", None)
+        url = getattr(self, "url", None)
+        body = getattr(self, "body_markdown", None)
+        if kind == "external" and not url:
+            raise ValueError("external links require url")
+        if kind == "modal" and not body:
+            raise ValueError("modal links require body_markdown")
+        # On a partial update, kind may be None — that's fine; downstream
+        # service merges with existing values before persisting.
+        return self
+
+
+class UserMenuLinkCreate(_LinkFieldsMixin):
+    label: str = Field(..., min_length=1, max_length=64)
+    kind: LinkKind
+    enabled: bool = True
+    order: int = Field(default=0, ge=0, le=10_000)
+    url: Optional[str] = Field(None, max_length=2048)
+    body_markdown: Optional[str] = Field(None, max_length=50_000)
+
+    @field_validator("url")
+    @classmethod
+    def _check_url_scheme(cls, v: Optional[str]) -> Optional[str]:
+        return _validate_http_url(v)
+
+
+class UserMenuLinkUpdate(BaseModel):
+    """Partial update — all fields optional. Validation runs after merge."""
+
+    label: Optional[str] = Field(None, min_length=1, max_length=64)
+    kind: Optional[LinkKind] = None
+    enabled: Optional[bool] = None
+    order: Optional[int] = Field(None, ge=0, le=10_000)
+    url: Optional[str] = Field(None, max_length=2048)
+    body_markdown: Optional[str] = Field(None, max_length=50_000)
+
+    @field_validator("url")
+    @classmethod
+    def _check_url_scheme(cls, v: Optional[str]) -> Optional[str]:
+        return _validate_http_url(v)
+
+
+class UserMenuLinkResponse(BaseModel):
+    """API response shape (camelCase via field aliases on read sites)."""
+
+    link_id: str
+    label: str
+    kind: LinkKind
+    enabled: bool
+    order: int
+    url: Optional[str] = None
+    body_markdown: Optional[str] = None
+    created_at: str
+    updated_at: str
+    created_by: Optional[str] = None
+
+    @classmethod
+    def from_link(cls, link: UserMenuLink) -> "UserMenuLinkResponse":
+        return cls(
+            link_id=link.link_id,
+            label=link.label,
+            kind=link.kind,
+            enabled=link.enabled,
+            order=link.order,
+            url=link.url,
+            body_markdown=link.body_markdown,
+            created_at=link.created_at,
+            updated_at=link.updated_at,
+            created_by=link.created_by,
+        )
+
+
+class UserMenuLinkListResponse(BaseModel):
+    links: List[UserMenuLinkResponse]
+    total: int
diff --git a/backend/src/apis/shared/user_menu_links/repository.py b/backend/src/apis/shared/user_menu_links/repository.py
new file mode 100644
index 00000000..31e591c7
--- /dev/null
+++ b/backend/src/apis/shared/user_menu_links/repository.py
@@ -0,0 +1,181 @@
+"""DynamoDB repository for admin-managed user-menu links."""
+
+import logging
+import os
+import uuid
+from datetime import datetime, timezone
+from typing import List, Optional
+
+import boto3
+from botocore.exceptions import ClientError
+
+from .models import UserMenuLink, UserMenuLinkCreate, UserMenuLinkUpdate
+
+logger = logging.getLogger(__name__)
+
+
+class UserMenuLinksRepository:
+    """CRUD for user-menu links in DynamoDB.
+
+    Single-table design, fixed PK ``USER_MENU_LINKS``. All items are queried
+    with a single ``query`` by PK; no GSI needed.
+    """
+
+    def __init__(self, table_name: Optional[str] = None, region: Optional[str] = None):
+        self._table_name = table_name or os.getenv("DYNAMODB_USER_MENU_LINKS_TABLE_NAME")
+        self._region = region or os.getenv("AWS_REGION", "us-west-2")
+        self._enabled = bool(self._table_name)
+
+        if not self._enabled:
+            logger.warning(
+                "DYNAMODB_USER_MENU_LINKS_TABLE_NAME not set. "
+                "User menu links repository is disabled."
+            )
+            return
+
+        profile = os.getenv("AWS_PROFILE")
+        if profile:
+            session = boto3.Session(profile_name=profile)
+            self._dynamodb = session.resource("dynamodb", region_name=self._region)
+        else:
+            self._dynamodb = boto3.resource("dynamodb", region_name=self._region)
+        self._table = self._dynamodb.Table(self._table_name)
+        logger.info(f"Initialized user-menu links repository: table={self._table_name}")
+
+    @property
+    def enabled(self) -> bool:
+        return self._enabled
+
+    async def list_links(self, enabled_only: bool = False) -> List[UserMenuLink]:
+        if not self._enabled:
+            return []
+
+        try:
+            response = self._table.query(
+                KeyConditionExpression="PK = :pk AND begins_with(SK, :sk)",
+                ExpressionAttributeValues={":pk": "USER_MENU_LINKS", ":sk": "LINK#"},
+            )
+            items = response.get("Items", [])
+            while "LastEvaluatedKey" in response:
+                response = self._table.query(
+                    KeyConditionExpression="PK = :pk AND begins_with(SK, :sk)",
+                    ExpressionAttributeValues={":pk": "USER_MENU_LINKS", ":sk": "LINK#"},
+                    ExclusiveStartKey=response["LastEvaluatedKey"],
+                )
+                items.extend(response.get("Items", []))
+        except ClientError:
+            logger.error("Error listing user-menu links", exc_info=True)
+            raise
+
+        links = [UserMenuLink.from_dynamo_item(item) for item in items]
+        if enabled_only:
+            links = [link for link in links if link.enabled]
+        links.sort(key=lambda link: (link.order, link.label.lower()))
+        return links
+
+    async def get_link(self, link_id: str) -> Optional[UserMenuLink]:
+        if not self._enabled:
+            return None
+        try:
+            response = self._table.get_item(
+                Key={"PK": "USER_MENU_LINKS", "SK": f"LINK#{link_id}"}
+            )
+            item = response.get("Item")
+            if not item:
+                return None
+            return UserMenuLink.from_dynamo_item(item)
+        except ClientError:
+            logger.error("Error getting user-menu link", exc_info=True)
+            raise
+
+    async def create_link(
+        self, data: UserMenuLinkCreate, created_by: Optional[str] = None
+    ) -> UserMenuLink:
+        if not self._enabled:
+            raise RuntimeError("User-menu links repository is not enabled")
+
+        now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
+        link = UserMenuLink(
+            link_id=str(uuid.uuid4()),
+            label=data.label,
+            kind=data.kind,
+            enabled=data.enabled,
+            order=data.order,
+            url=data.url,
+            body_markdown=data.body_markdown,
+            created_at=now,
+            updated_at=now,
+            created_by=created_by,
+        )
+
+        try:
+            self._table.put_item(
+                Item=link.to_dynamo_item(),
+                ConditionExpression="attribute_not_exists(PK)",
+            )
+        except ClientError:
+            logger.error("Error creating user-menu link", exc_info=True)
+            raise
+
+        logger.info(f"Created user-menu link: {link.link_id}")
+        return link
+
+    async def update_link(
+        self, link_id: str, updates: UserMenuLinkUpdate
+    ) -> Optional[UserMenuLink]:
+        if not self._enabled:
+            return None
+
+        existing = await self.get_link(link_id)
+        if not existing:
+            return None
+
+        update_fields = updates.model_dump(exclude_none=True)
+        for field_name, value in update_fields.items():
+            setattr(existing, field_name, value)
+        existing.updated_at = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
+
+        # Re-validate the merged kind/url/body_markdown invariant before persist.
+        if existing.kind == "external" and not existing.url:
+            raise ValueError("external links require url")
+        if existing.kind == "modal" and not existing.body_markdown:
+            raise ValueError("modal links require body_markdown")
+        if existing.url:
+            lowered = existing.url.strip().lower()
+            if not (lowered.startswith("http://") or lowered.startswith("https://")):
+                raise ValueError("url must start with http:// or https://")
+
+        try:
+            self._table.put_item(Item=existing.to_dynamo_item())
+        except ClientError:
+            logger.error("Error updating user-menu link", exc_info=True)
+            raise
+
+        logger.info(f"Updated user-menu link: {link_id}")
+        return existing
+
+    async def delete_link(self, link_id: str) -> bool:
+        if not self._enabled:
+            return False
+        existing = await self.get_link(link_id)
+        if not existing:
+            return False
+        try:
+            self._table.delete_item(
+                Key={"PK": "USER_MENU_LINKS", "SK": f"LINK#{link_id}"}
+            )
+        except ClientError:
+            logger.error("Error deleting user-menu link", exc_info=True)
+            raise
+        logger.info(f"Deleted user-menu link: {link_id}")
+        return True
+
+
+_repository: Optional[UserMenuLinksRepository] = None
+
+
+def get_user_menu_links_repository() -> UserMenuLinksRepository:
+    global _repository
+    if _repository is None:
+        _repository = UserMenuLinksRepository()
+    return _repository
diff --git a/backend/src/apis/shared/user_menu_links/service.py b/backend/src/apis/shared/user_menu_links/service.py
new file mode 100644
index 00000000..292dc822
--- /dev/null
+++ b/backend/src/apis/shared/user_menu_links/service.py
@@ -0,0 +1,40 @@
+"""Service layer for user-menu links."""
+
+from typing import List, Optional
+
+from .models import UserMenuLink, UserMenuLinkCreate, UserMenuLinkUpdate
+from .repository import UserMenuLinksRepository, get_user_menu_links_repository
+
+
+class UserMenuLinksService:
+    def __init__(self, repository: UserMenuLinksRepository):
+        self._repo = repository
+
+    async def list_links(self, enabled_only: bool = False) -> List[UserMenuLink]:
+        return await self._repo.list_links(enabled_only=enabled_only)
+
+    async def get_link(self, link_id: str) -> Optional[UserMenuLink]:
+        return await self._repo.get_link(link_id)
+
+    async def create_link(
+        self, data: UserMenuLinkCreate, created_by: Optional[str] = None
+    ) -> UserMenuLink:
+        return await self._repo.create_link(data, created_by=created_by)
+
+    async def update_link(
+        self, link_id: str, updates: UserMenuLinkUpdate
+    ) -> Optional[UserMenuLink]:
+        return await self._repo.update_link(link_id, updates)
+
+    async def delete_link(self, link_id: str) -> bool:
+        return await self._repo.delete_link(link_id)
+
+
+_service: Optional[UserMenuLinksService] = None
+
+
+def get_user_menu_links_service() -> UserMenuLinksService:
+    global _service
+    if _service is None:
+        _service = UserMenuLinksService(get_user_menu_links_repository())
+    return _service
diff --git a/backend/src/lambdas/artifact_render/handler.py b/backend/src/lambdas/artifact_render/handler.py
new file mode 100644
index 00000000..434347b3
--- /dev/null
+++ b/backend/src/lambdas/artifact_render/handler.py
@@ -0,0 +1,507 @@
+"""Artifact render Lambda.
+
+Fronts the `artifacts.{domain}` CloudFront origin. Request flow:
+
+  1. CloudFront forwards a GET carrying a render-token JWT (`?t=...`).
+  2. Verify the JWT (HS256) against the HMAC key in Secrets Manager
+     (RENDER_TOKEN_SECRET_ARN). The token pins one immutable artifact
+     version: `{sub, aid, ver, sid, iss, aud, iat, exp}`.
+  3. Read the version record from DynamoDB (ARTIFACTS_TABLE):
+       PK = USER#{sub}
+       SK = ARTIFACT#{aid}#V#{ver:05d}
+  4. Fetch the content blob from S3 (ARTIFACTS_BUCKET) using the
+     `content_key` stored on the record (the writer owns key
+     construction; the verifier never reconstructs it).
+  5. Return those exact bytes with strict security headers. The CDN's
+     response-headers-policy also stamps the CSP, so the policy holds
+     even if this handler is buggy (defense in depth).
+
+This Lambda is a thin authenticated gate + header stamper, not a
+templating layer: S3 holds the complete document to serve, and the
+artifact writer owns all rendering. `#HEAD` is never read — the token
+pins an exact version.
+
+Markdown serve-type mapping: a Markdown artifact's version row carries
+the authored `content_type` (`text/markdown`) so the SPA card/list stay
+truthful, but S3 holds the writer's self-contained HTML render wrapper.
+So records typed as Markdown are served with a `text/html` HTTP
+content type — still the exact S3 bytes, only the response header is
+mapped (header stamping, not templating). Must stay in sync with the
+writer (`agents/builtin_tools/artifacts/service.py`).
+
+No third-party dependencies: HS256 is HMAC-SHA256, verified with the
+standard library. boto3 is provided by the Lambda runtime.
+
+Boundary: this Lambda runs OUTSIDE the apis/* import boundary
+(test_import_boundaries.py) — it's a standalone deployable, not part of
+app-api or inference-api. Do not import from apis/ here.
+"""
+
+from __future__ import annotations
+
+import base64
+import hashlib
+import hmac
+import json
+import logging
+import os
+import re
+import time
+from typing import Any
+from urllib.parse import parse_qs, quote
+
+import boto3
+from botocore.exceptions import ClientError
+
+logger = logging.getLogger()
+logger.setLevel(logging.INFO)
+
+# Pinned at deploy time via ArtifactsStack environment block. Read at
+# module load; emptiness is checked at request time so a missing var
+# becomes a clean runtime 500 with a log line rather than an import crash.
+_FRAME_ANCESTOR = os.environ.get("FRAME_ANCESTOR_ORIGIN", "")
+_CSP_SCRIPT_SRC = os.environ.get(
+    "CSP_SCRIPT_SRC",
+    "'self' 'unsafe-inline'",
+)
+_ARTIFACTS_BUCKET = os.environ.get("ARTIFACTS_BUCKET", "")
+_ARTIFACTS_TABLE = os.environ.get("ARTIFACTS_TABLE", "")
+_RENDER_TOKEN_SECRET_ARN = os.environ.get("RENDER_TOKEN_SECRET_ARN", "")
+
+_EXPECTED_ISS = "app-api"
+_EXPECTED_AUD = "artifact-render"
+# Tolerance for clock skew between the app-api minter and this Lambda.
+# Both run in AWS so skew is sub-second; keep it tight.
+_LEEWAY_SECONDS = 5
+# Upper bound on token lifetime. The minter issues ~60–120s tokens; a
+# token claiming a far-future exp is a minter bug or a forgery attempt,
+# so cap the blast radius. `iat` is mandatory, so this always applies.
+_MAX_TOKEN_LIFETIME_SECONDS = 600
+# Cap content size to stay within the Lambda's 5s / 512MB envelope and
+# to keep a single response bounded. Oversized blobs are a writer bug.
+_MAX_CONTENT_BYTES = 5 * 1024 * 1024
+
+# Module-scoped for container reuse across invocations.
+_secrets_client = None
+_s3_client = None
+_ddb_table = None
+_cached_signing_key: str | None = None
+
+
+class _TokenError(Exception):
+    """Render token is missing, malformed, or fails verification."""
+
+
+class _ArtifactNotFound(Exception):
+    """No version record or no backing object for the requested artifact."""
+
+
+class _RenderConfigError(Exception):
+    """Required environment / AWS configuration is missing or unusable."""
+
+
+class _UnsupportedStorage(Exception):
+    """Version record uses a storage class this handler can't serve yet."""
+
+
+def _csp_header() -> str:
+    """Build the artifact-origin CSP. Mirrors the CloudFront response-
+    headers-policy so the policy is identical whether CloudFront sets it
+    or the Lambda does (defense in depth)."""
+    return "; ".join(
+        [
+            "default-src 'none'",
+            f"script-src {_CSP_SCRIPT_SRC}",
+            "style-src 'self' 'unsafe-inline'",
+            "img-src 'self' data: https:",
+            "font-src 'self' data:",
+            "connect-src 'none'",
+            f"frame-ancestors {_FRAME_ANCESTOR}",
+            "form-action 'none'",
+            "base-uri 'none'",
+        ]
+    )
+
+
+# Authored types whose S3 body is a writer-produced HTML render wrapper
+# (see module docstring). Mirrors `_MARKDOWN_MIME_TYPES` in the writer's
+# service.py — this Lambda is standalone (no apis/* imports) so the small
+# duplication is by design; keep the two in sync.
+_HTML_CONTENT_TYPE = "text/html; charset=utf-8"
+_MARKDOWN_MIME_TYPES = frozenset({"text/markdown", "text/x-markdown"})
+
+
+def _serve_content_type(stored: str) -> str:
+    """HTTP content type to emit for a stored authored type.
+
+    Markdown records hold an HTML render wrapper in S3, so they are
+    served as HTML; every other type is served exactly as stored."""
+    bare = (stored or "").split(";")[0].strip().lower()
+    if bare in _MARKDOWN_MIME_TYPES:
+        return _HTML_CONTENT_TYPE
+    return stored
+
+
+def _security_headers(content_type: str) -> dict[str, str]:
+    return {
+        "content-type": content_type,
+        "content-security-policy": _csp_header(),
+        "x-content-type-options": "nosniff",
+        "referrer-policy": "no-referrer",
+        "cache-control": "no-store",
+    }
+
+
+# File extension to suggest when saving an artifact. Keyed by the bare
+# (parameter-stripped, lowercased) authored content type. Markdown rows
+# hold the writer's HTML render wrapper in S3 (see module docstring), so
+# the saved bytes are HTML — extension follows the bytes, not the label.
+_DOWNLOAD_EXTENSIONS = {
+    "text/html": "html",
+    "text/markdown": "html",
+    "text/x-markdown": "html",
+    "image/svg+xml": "svg",
+    "application/json": "json",
+    "text/css": "css",
+    "text/javascript": "js",
+    "application/javascript": "js",
+    "text/plain": "txt",
+}
+
+# Anything outside this set is collapsed to '_' for the ASCII fallback
+# filename. The UTF-8 `filename*` form carries the original title.
+_SAFE_FILENAME_RE = re.compile(r"[^A-Za-z0-9._ -]+")
+_MAX_FILENAME_BASE = 120
+
+
+def _download_extension(stored_content_type: str) -> str:
+    bare = (stored_content_type or "").split(";")[0].strip().lower()
+    return _DOWNLOAD_EXTENSIONS.get(bare, "bin")
+
+
+def _content_disposition(title: str, ext: str) -> str:
+    """Build an `attachment` Content-Disposition with an ASCII-safe
+    `filename` plus an RFC 5987 `filename*` that preserves a non-ASCII
+    title. Never reflects raw bytes into a header value."""
+    base = (title or "").strip() or "artifact"
+    ascii_base = _SAFE_FILENAME_RE.sub("_", base).strip(" ._")[
+        :_MAX_FILENAME_BASE
+    ] or "artifact"
+    fallback = f"{ascii_base}.{ext}"
+    utf8 = quote(f"{base}.{ext}", safe="")
+    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{utf8}"
+
+
+def _wants_download(event: dict[str, Any]) -> bool:
+    """True when the request asked to save rather than render (`?download=1`).
+    Read the same two ways as the render token (parsed params, then the
+    raw query string) so it survives whichever the Function URL provides."""
+    params = event.get("queryStringParameters") or {}
+    value = params.get("download")
+    if value is None:
+        raw = event.get("rawQueryString") or ""
+        value = (parse_qs(raw).get("download") or [None])[0]
+    return value == "1"
+
+
+def _download_headers(content_type: str, disposition: str) -> dict[str, str]:
+    """Headers for an attachment response. No CSP/frame-ancestors here —
+    the bytes are saved to disk, not framed — but keep nosniff so the
+    browser doesn't re-sniff the type, and no-store so a one-time
+    credentialed URL isn't cached."""
+    return {
+        "content-type": content_type,
+        "content-disposition": disposition,
+        "x-content-type-options": "nosniff",
+        "referrer-policy": "no-referrer",
+        "cache-control": "no-store",
+    }
+
+
+def _error_html(message: str) -> str:
+    """Generic error page. Never reflects token or claim values — keeps
+    the surface free of injected content even though the CSP would
+    neutralize it anyway."""
+    return (
+        "<!doctype html>"
+        "<html><head>"
+        "<meta charset='utf-8'>"
+        "<title>Artifact unavailable</title>"
+        "<style>body{font:14px system-ui;padding:2rem;color:#444}</style>"
+        "</head><body>"
+        "<h1>Artifact unavailable</h1>"
+        f"<p>{message}</p>"
+        "</body></html>"
+    )
+
+
+def _response(status: int, body: str, content_type: str) -> dict[str, Any]:
+    return {
+        "statusCode": status,
+        "headers": _security_headers(content_type),
+        "body": body,
+    }
+
+
+def _error_response(status: int, message: str) -> dict[str, Any]:
+    return _response(status, _error_html(message), "text/html; charset=utf-8")
+
+
+def _b64url_decode(segment: str) -> bytes:
+    """Decode a base64url JWT segment, restoring the stripped padding."""
+    padding = "=" * (-len(segment) % 4)
+    return base64.urlsafe_b64decode(segment + padding)
+
+
+def _signing_key() -> str:
+    """Fetch and cache the HMAC signing key. The secret is a plain
+    string (Secrets Manager `generateSecretString`, no JSON wrapper) —
+    same shape as the BFF cookie data key. Cached for the container
+    lifetime; on rotation the container eventually recycles, which is
+    acceptable for short-lived render tokens."""
+    global _secrets_client, _cached_signing_key
+    if _cached_signing_key is not None:
+        return _cached_signing_key
+    if not _RENDER_TOKEN_SECRET_ARN:
+        raise _RenderConfigError("RENDER_TOKEN_SECRET_ARN is not set")
+    if _secrets_client is None:
+        _secrets_client = boto3.client("secretsmanager")
+    try:
+        secret = _secrets_client.get_secret_value(SecretId=_RENDER_TOKEN_SECRET_ARN)
+    except ClientError as exc:
+        raise _RenderConfigError("could not read render token secret") from exc
+    key = secret.get("SecretString")
+    if not key:
+        raise _RenderConfigError("render token secret is empty")
+    _cached_signing_key = key
+    return key
+
+
+def _verify_token(token: str) -> dict[str, Any]:
+    """Verify an HS256 render token and return its validated claims.
+
+    Implemented against the stdlib rather than PyJWT so the Lambda asset
+    stays dependency-free. `alg` is pinned to HS256 explicitly to reject
+    the `none` algorithm and HS/RS confusion."""
+    parts = token.split(".")
+    if len(parts) != 3:
+        raise _TokenError("malformed token")
+    header_b64, payload_b64, signature_b64 = parts
+
+    try:
+        header = json.loads(_b64url_decode(header_b64))
+    except (ValueError, json.JSONDecodeError) as exc:
+        raise _TokenError("unreadable header") from exc
+    if not isinstance(header, dict):
+        raise _TokenError("malformed header")
+    if header.get("alg") != "HS256":
+        raise _TokenError("unexpected token algorithm")
+
+    expected_sig = hmac.new(
+        _signing_key().encode("utf-8"),
+        f"{header_b64}.{payload_b64}".encode("utf-8"),
+        hashlib.sha256,
+    ).digest()
+    try:
+        provided_sig = _b64url_decode(signature_b64)
+    except ValueError as exc:
+        raise _TokenError("unreadable signature") from exc
+    # Constant-time compare — never short-circuit on the first byte.
+    if not hmac.compare_digest(expected_sig, provided_sig):
+        raise _TokenError("signature mismatch")
+
+    try:
+        claims = json.loads(_b64url_decode(payload_b64))
+    except (ValueError, json.JSONDecodeError) as exc:
+        raise _TokenError("unreadable payload") from exc
+    if not isinstance(claims, dict):
+        raise _TokenError("malformed payload")
+
+    if claims.get("iss") != _EXPECTED_ISS:
+        raise _TokenError("unexpected issuer")
+    if claims.get("aud") != _EXPECTED_AUD:
+        raise _TokenError("unexpected audience")
+
+    now = time.time()
+    exp = claims.get("exp")
+    if not isinstance(exp, (int, float)) or isinstance(exp, bool):
+        raise _TokenError("missing exp")
+    if now > exp + _LEEWAY_SECONDS:
+        raise _TokenError("token expired")
+
+    # `iat` is mandatory: the lifetime cap is the blast-radius control for
+    # a minter bug, and it can only be enforced relative to `iat`. The
+    # cross-PR contract requires the minter to send it, so a missing `iat`
+    # is itself a contract violation — reject rather than skip the cap.
+    # `bool` is an `int` subclass — exclude it explicitly.
+    iat = claims.get("iat")
+    if not isinstance(iat, (int, float)) or isinstance(iat, bool):
+        raise _TokenError("missing iat")
+    if iat > now + _LEEWAY_SECONDS:
+        raise _TokenError("token issued in the future")
+    if exp - iat > _MAX_TOKEN_LIFETIME_SECONDS:
+        raise _TokenError("token lifetime too long")
+
+    sub = claims.get("sub")
+    aid = claims.get("aid")
+    ver = claims.get("ver")
+    if not isinstance(sub, str) or not sub:
+        raise _TokenError("missing sub")
+    if not isinstance(aid, str) or not aid:
+        raise _TokenError("missing aid")
+    # `bool` is an `int` subclass — exclude it explicitly.
+    if not isinstance(ver, int) or isinstance(ver, bool) or ver < 1:
+        raise _TokenError("invalid ver")
+
+    return claims
+
+
+def _get_version_record(user_id: str, artifact_id: str, version: int) -> dict[str, Any]:
+    global _ddb_table
+    if not _ARTIFACTS_TABLE:
+        raise _RenderConfigError("ARTIFACTS_TABLE is not set")
+    if _ddb_table is None:
+        _ddb_table = boto3.resource("dynamodb").Table(_ARTIFACTS_TABLE)
+    sk = f"ARTIFACT#{artifact_id}#V#{version:05d}"
+    try:
+        result = _ddb_table.get_item(Key={"PK": f"USER#{user_id}", "SK": sk})
+    except ClientError as exc:
+        raise _RenderConfigError("artifact metadata lookup failed") from exc
+    item = result.get("Item")
+    if not item:
+        raise _ArtifactNotFound("version record not found")
+    return item
+
+
+def _fetch_content(content_key: str) -> str:
+    global _s3_client
+    if not _ARTIFACTS_BUCKET:
+        raise _RenderConfigError("ARTIFACTS_BUCKET is not set")
+    if _s3_client is None:
+        _s3_client = boto3.client("s3")
+    try:
+        obj = _s3_client.get_object(Bucket=_ARTIFACTS_BUCKET, Key=content_key)
+    except ClientError as exc:
+        code = exc.response.get("Error", {}).get("Code", "")
+        if code in ("NoSuchKey", "404"):
+            raise _ArtifactNotFound("content object missing") from exc
+        raise _RenderConfigError("content fetch failed") from exc
+    content_length = obj.get("ContentLength")
+    if isinstance(content_length, int) and content_length > _MAX_CONTENT_BYTES:
+        raise _UnsupportedStorage("content exceeds size limit")
+    raw = obj["Body"].read(_MAX_CONTENT_BYTES + 1)
+    if len(raw) > _MAX_CONTENT_BYTES:
+        raise _UnsupportedStorage("content exceeds size limit")
+    try:
+        return raw.decode("utf-8")
+    except UnicodeDecodeError as exc:
+        raise _UnsupportedStorage("content is not valid utf-8") from exc
+
+
+def _extract_token(event: dict[str, Any]) -> str:
+    params = event.get("queryStringParameters") or {}
+    token = params.get("t")
+    if not token:
+        raw = event.get("rawQueryString") or ""
+        token = (parse_qs(raw).get("t") or [None])[0]
+    if not token:
+        raise _TokenError("missing render token")
+    return token
+
+
+def _request_method(event: dict[str, Any]) -> str:
+    return (
+        event.get("requestContext", {})
+        .get("http", {})
+        .get("method", "GET")
+        .upper()
+    )
+
+
+def handler(event: dict[str, Any], _context: Any) -> dict[str, Any]:
+    """Lambda Function URL handler. Payload format v2.0.
+
+    SECURITY: never log `event`, `rawQueryString`, `queryStringParameters`,
+    or the raw token — the render token is a bearer credential carried in
+    the URL query string. Log identifiers (sub/aid/ver/sid) only.
+    """
+    method = _request_method(event)
+    if method not in ("GET", "HEAD"):
+        return _error_response(405, "Method not allowed.")
+
+    try:
+        token = _extract_token(event)
+        claims = _verify_token(token)
+    except _TokenError as exc:
+        logger.warning("render token rejected: %s", exc)
+        return _error_response(403, "This artifact link is invalid or has expired.")
+    except _RenderConfigError as exc:
+        logger.error("render config error during verification: %s", exc)
+        return _error_response(500, "The artifact service is misconfigured.")
+
+    user_id = claims["sub"]
+    artifact_id = claims["aid"]
+    version = claims["ver"]
+    logger.info(
+        "render request user=%s artifact=%s v=%s sid=%s",
+        user_id,
+        artifact_id,
+        version,
+        claims.get("sid"),
+    )
+
+    try:
+        record = _get_version_record(user_id, artifact_id, version)
+        storage = record.get("storage")
+        if storage != "s3":
+            raise _UnsupportedStorage(f"storage class {storage!r} not supported")
+        content_key = record.get("content_key")
+        if not isinstance(content_key, str) or not content_key:
+            raise _ArtifactNotFound("version record has no content pointer")
+        stored_content_type = record.get("content_type") or _HTML_CONTENT_TYPE
+        content_type = _serve_content_type(stored_content_type)
+        raw_title = record.get("title")
+        title = raw_title if isinstance(raw_title, str) else ""
+        body = _fetch_content(content_key)
+    except _ArtifactNotFound as exc:
+        logger.warning(
+            "artifact not found user=%s artifact=%s v=%s: %s",
+            user_id,
+            artifact_id,
+            version,
+            exc,
+        )
+        return _error_response(404, "This artifact could not be found.")
+    except _UnsupportedStorage as exc:
+        logger.error(
+            "unsupported artifact content user=%s artifact=%s v=%s: %s",
+            user_id,
+            artifact_id,
+            version,
+            exc,
+        )
+        return _error_response(500, "This artifact could not be rendered.")
+    except _RenderConfigError as exc:
+        logger.error("render config error during fetch: %s", exc)
+        return _error_response(500, "The artifact service is misconfigured.")
+
+    if _wants_download(event):
+        ext = _download_extension(stored_content_type)
+        headers = _download_headers(
+            content_type, _content_disposition(title, ext)
+        )
+        return {
+            "statusCode": 200,
+            "headers": headers,
+            "body": "" if method == "HEAD" else body,
+        }
+
+    if method == "HEAD":
+        return _response(200, "", content_type)
+    return _response(200, body, content_type)
+
+
+# Local smoke test: `python handler.py` exercises the missing-token path
+# (returns 403) with zero AWS calls — the token check precedes any client.
+if __name__ == "__main__":
+    print(json.dumps(handler({}, None), indent=2))
diff --git a/backend/tests/agents/builtin_tools/__init__.py b/backend/tests/agents/builtin_tools/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/backend/tests/agents/builtin_tools/artifacts/test_artifact_tools.py b/backend/tests/agents/builtin_tools/artifacts/test_artifact_tools.py
new file mode 100644
index 00000000..23841c85
--- /dev/null
+++ b/backend/tests/agents/builtin_tools/artifacts/test_artifact_tools.py
@@ -0,0 +1,294 @@
+"""Tests for the artifact authoring tools.
+
+The headline guarantee is `test_record_satisfies_minter`: a row written
+by this tool must be accepted byte-for-byte by #310's app-api minter
+(the real downstream reader) and resolve to the S3 object #309's render
+Lambda would serve.
+"""
+
+from __future__ import annotations
+
+import base64
+import re
+
+import boto3
+import pytest
+from moto import mock_aws
+
+from agents.builtin_tools.artifacts import service
+from apis.inference_api.chat.routes import _build_artifact_tools
+
+REGION = "us-east-1"
+TABLE = "test-user-artifacts"
+BUCKET = "test-artifacts-content"
+USER = "user-123"
+SESSION = "sess-9"
+DOC = "<!doctype html><html><body><h1>hi</h1></body></html>"
+MD = "# Title\n\nSome **bold** text and a list:\n\n- one\n- two\n"
+
+
+def _embedded_markdown(body: str) -> str:
+    """Decode the base64 source the render wrapper embeds (no `<` in
+    base64, so the cheap regex is safe)."""
+    m = re.search(r'id="md-src">([^<]+)</script>', body)
+    assert m, "render wrapper is missing the embedded markdown block"
+    return base64.b64decode(m.group(1)).decode()
+
+
+@pytest.fixture(autouse=True)
+def _reset() -> None:
+    service._reset_caches_for_tests()
+
+
+@pytest.fixture
+def aws(monkeypatch: pytest.MonkeyPatch):
+    with mock_aws():
+        monkeypatch.setenv("AWS_REGION", REGION)
+        monkeypatch.setenv("S3_ARTIFACTS_BUCKET_NAME", BUCKET)
+        monkeypatch.setenv("DYNAMODB_ARTIFACTS_TABLE_NAME", TABLE)
+
+        boto3.client("s3", region_name=REGION).create_bucket(Bucket=BUCKET)
+        boto3.client("dynamodb", region_name=REGION).create_table(
+            TableName=TABLE,
+            KeySchema=[
+                {"AttributeName": "PK", "KeyType": "HASH"},
+                {"AttributeName": "SK", "KeyType": "RANGE"},
+            ],
+            AttributeDefinitions=[
+                {"AttributeName": "PK", "AttributeType": "S"},
+                {"AttributeName": "SK", "AttributeType": "S"},
+                {"AttributeName": "GSI1PK", "AttributeType": "S"},
+                {"AttributeName": "GSI1SK", "AttributeType": "S"},
+            ],
+            GlobalSecondaryIndexes=[
+                {
+                    "IndexName": "SessionIndex",
+                    "KeySchema": [
+                        {"AttributeName": "GSI1PK", "KeyType": "HASH"},
+                        {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
+                    ],
+                    "Projection": {"ProjectionType": "ALL"},
+                }
+            ],
+            BillingMode="PAY_PER_REQUEST",
+        )
+        yield boto3.resource("dynamodb", region_name=REGION), boto3.client(
+            "s3", region_name=REGION
+        )
+
+
+def _item(ddb, artifact_id: str, sk_suffix: str) -> dict:
+    return ddb.Table(TABLE).get_item(
+        Key={"PK": f"USER#{USER}", "SK": f"ARTIFACT#{artifact_id}#{sk_suffix}"}
+    ).get("Item")
+
+
+def test_create_writes_s3_and_rows(aws) -> None:
+    ddb, s3 = aws
+    aid, ver = service.create_artifact_record(USER, SESSION, "My Art", DOC, "")
+    assert ver == 1
+
+    key = f"{USER}/{aid}/v1/index.html"
+    assert s3.get_object(Bucket=BUCKET, Key=key)["Body"].read().decode() == DOC
+
+    vrow = _item(ddb, aid, "V#00001")
+    assert vrow["storage"] == "s3"
+    assert vrow["content_key"] == key
+    assert vrow["content_type"] == "text/html; charset=utf-8"
+
+    head = _item(ddb, aid, "HEAD")
+    assert head["version"] == 1
+    assert head["GSI1PK"] == f"SESSION#{SESSION}"
+    assert head["GSI1SK"].startswith("ARTIFACT#") and head["GSI1SK"].endswith(aid)
+
+
+def test_update_increments_and_preserves_old(aws) -> None:
+    ddb, s3 = aws
+    aid, _ = service.create_artifact_record(USER, SESSION, "T", DOC, "")
+    new_doc = "<!doctype html><html><body>v2</body></html>"
+    ver = service.update_artifact_record(USER, aid, new_doc, None, None)
+    assert ver == 2
+
+    # Old version object is immutable / still present.
+    assert s3.get_object(
+        Bucket=BUCKET, Key=f"{USER}/{aid}/v1/index.html"
+    )["Body"].read().decode() == DOC
+    assert s3.get_object(
+        Bucket=BUCKET, Key=f"{USER}/{aid}/v2/index.html"
+    )["Body"].read().decode() == new_doc
+
+    assert _item(ddb, aid, "V#00002")["content_key"] == f"{USER}/{aid}/v2/index.html"
+    head = _item(ddb, aid, "HEAD")
+    assert head["version"] == 2
+    assert head["title"] == "T"  # carried forward
+
+
+def test_update_unknown_artifact_raises(aws) -> None:
+    with pytest.raises(service.ArtifactNotFoundError):
+        service.update_artifact_record(USER, "nope", DOC, None, None)
+
+
+def test_update_foreign_artifact_raises(aws) -> None:
+    aid, _ = service.create_artifact_record(USER, SESSION, "T", DOC, "")
+    with pytest.raises(service.ArtifactNotFoundError):
+        service.update_artifact_record("someone-else", aid, DOC, None, None)
+
+
+def test_content_type_default(aws) -> None:
+    ddb, _ = aws
+    aid, _ = service.create_artifact_record(USER, SESSION, "T", DOC, "")
+    assert _item(ddb, aid, "V#00001")["content_type"] == "text/html; charset=utf-8"
+
+
+def test_markdown_create_wraps_and_preserves_type(aws) -> None:
+    ddb, s3 = aws
+    aid, ver = service.create_artifact_record(
+        USER, SESSION, "Notes", MD, "text/markdown"
+    )
+    assert ver == 1
+
+    # DDB keeps the authored Markdown type — drives the SPA card badge
+    # and list; the render Lambda maps it to text/html when serving.
+    assert _item(ddb, aid, "V#00001")["content_type"] == "text/markdown"
+
+    body = s3.get_object(
+        Bucket=BUCKET, Key=f"{USER}/{aid}/v1/index.html"
+    )["Body"].read().decode()
+    assert body.lstrip().startswith("<!doctype html>")
+    assert "https://esm.sh/marked@14.1.4" in body
+    # Source is base64-embedded, never inlined raw (escaping/XSS-safe).
+    assert MD not in body
+    assert _embedded_markdown(body) == MD
+
+
+def test_markdown_charset_suffix_still_markdown(aws) -> None:
+    _, s3 = aws
+    aid, _ = service.create_artifact_record(
+        USER, SESSION, "Doc", MD, "text/markdown; charset=utf-8"
+    )
+    body = s3.get_object(
+        Bucket=BUCKET, Key=f"{USER}/{aid}/v1/index.html"
+    )["Body"].read().decode()
+    assert _embedded_markdown(body) == MD
+
+
+def test_markdown_update_rewraps_inherited_type(aws) -> None:
+    _, s3 = aws
+    aid, _ = service.create_artifact_record(
+        USER, SESSION, "Doc", MD, "text/markdown"
+    )
+    new_md = "## v2\n\nrevised body\n"
+    # content_type omitted → inherits Markdown from HEAD, must re-wrap.
+    ver = service.update_artifact_record(USER, aid, new_md, None, None)
+    assert ver == 2
+    body = s3.get_object(
+        Bucket=BUCKET, Key=f"{USER}/{aid}/v2/index.html"
+    )["Body"].read().decode()
+    assert body.lstrip().startswith("<!doctype html>")
+    assert _embedded_markdown(body) == new_md
+
+
+def test_html_artifact_not_wrapped(aws) -> None:
+    _, s3 = aws
+    aid, _ = service.create_artifact_record(USER, SESSION, "Page", DOC, "text/html")
+    assert s3.get_object(
+        Bucket=BUCKET, Key=f"{USER}/{aid}/v1/index.html"
+    )["Body"].read().decode() == DOC
+
+
+def test_ssm_fallback(aws, monkeypatch: pytest.MonkeyPatch) -> None:
+    """Env unset → resolve bucket/table from /{PROJECT_PREFIX}/artifacts/*."""
+    monkeypatch.delenv("S3_ARTIFACTS_BUCKET_NAME", raising=False)
+    monkeypatch.delenv("DYNAMODB_ARTIFACTS_TABLE_NAME", raising=False)
+    monkeypatch.setenv("PROJECT_PREFIX", "myproj")
+    ssm = boto3.client("ssm", region_name=REGION)
+    ssm.put_parameter(Name="/myproj/artifacts/bucket-name", Value=BUCKET, Type="String")
+    ssm.put_parameter(Name="/myproj/artifacts/table-name", Value=TABLE, Type="String")
+    service._reset_caches_for_tests()
+
+    aid, ver = service.create_artifact_record(USER, SESSION, "T", DOC, "")
+    assert ver == 1 and aid
+
+
+def test_record_satisfies_minter(aws) -> None:
+    """Cross-PR contract: the written version row must be accepted by
+    #310's app-api minter and resolve to the S3 object #309 serves."""
+    _, s3 = aws
+    aid, ver = service.create_artifact_record(USER, SESSION, "T", DOC, "")
+
+    from apis.app_api.artifacts import service as minter
+
+    minter._reset_caches_for_tests()
+    # Minter reads its own table handle from the same env we set.
+    minter._assert_version_exists(USER, aid, ver)  # must not raise
+
+    # And the content_key the readers trust actually points at content.
+    vrow = _item(boto3.resource("dynamodb", region_name=REGION), aid, "V#00001")
+    assert s3.get_object(
+        Bucket=BUCKET, Key=vrow["content_key"]
+    )["Body"].read().decode() == DOC
+
+
+@pytest.mark.parametrize(
+    "enabled,expected",
+    [(None, 0), ([], 0), (["other"], 0), (["create_artifact"], 1),
+     (["create_artifact", "update_artifact"], 2)],
+)
+def test_routes_gating(enabled, expected) -> None:
+    tools = _build_artifact_tools(enabled, SESSION, USER)
+    assert len(tools) == expected
+
+
+def test_list_session_artifacts_returns_heads_newest_first(aws) -> None:
+    a1, _ = service.create_artifact_record(USER, SESSION, "First", DOC, "")
+    a2, _ = service.create_artifact_record(USER, SESSION, "Second", DOC, "")
+    # Bump a1 to v2 so it becomes the most-recently-updated HEAD.
+    service.update_artifact_record(USER, a1, DOC, None, None)
+
+    rows = service.list_session_artifacts(USER, SESSION)
+    by_id = {r["artifact_id"]: r for r in rows}
+    assert set(by_id) == {a1, a2}
+    assert by_id[a1]["version"] == 2  # reflects current HEAD, not v1
+    assert by_id[a2]["title"] == "Second"
+    # Newest-first: a1 (just updated) precedes a2.
+    assert [r["artifact_id"] for r in rows] == [a1, a2]
+
+
+def test_list_session_artifacts_scopes_to_user(aws) -> None:
+    mine, _ = service.create_artifact_record(USER, SESSION, "Mine", DOC, "")
+    service.create_artifact_record("someone-else", SESSION, "Theirs", DOC, "")
+
+    rows = service.list_session_artifacts(USER, SESSION)
+    assert [r["artifact_id"] for r in rows] == [mine]
+
+
+def test_list_session_artifacts_empty_session(aws) -> None:
+    assert service.list_session_artifacts(USER, "no-such-session") == []
+
+
+def test_set_produced_by_message_index_stamps_version_and_head(aws) -> None:
+    ddb, _ = aws
+    aid, _ = service.create_artifact_record(USER, SESSION, "Doc", DOC, "")
+    assert service.list_session_artifacts(USER, SESSION)[0][
+        "produced_by_message_index"
+    ] is None
+
+    service.set_produced_by_message_index(USER, aid, 1, 7)
+
+    # Per-version linkage: the v1 row itself carries the index — this is
+    # what survives reload via the all-versions list endpoint.
+    assert _item(ddb, aid, "V#00001")["produced_by_message_index"] == 7
+    # HEAD is stamped too, so the writer's HEAD-based live list still sees it.
+    rows = service.list_session_artifacts(USER, SESSION)
+    assert rows[0]["produced_by_message_index"] == 7
+    # The stamp must leave the optimistic-lock `version` untouched so a
+    # later update_artifact still re-points HEAD cleanly.
+    assert service.update_artifact_record(USER, aid, DOC, None, None) == 2
+
+
+def test_set_produced_by_message_index_requires_existing_rows(aws) -> None:
+    from botocore.exceptions import ClientError
+
+    # No version/HEAD rows for "nope": the conditional update fails closed.
+    with pytest.raises(ClientError):
+        service.set_produced_by_message_index(USER, "nope", 1, 1)
diff --git a/backend/tests/agents/builtin_tools/spreadsheet_analysis/__init__.py b/backend/tests/agents/builtin_tools/spreadsheet_analysis/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/backend/tests/agents/builtin_tools/spreadsheet_analysis/conftest.py b/backend/tests/agents/builtin_tools/spreadsheet_analysis/conftest.py
new file mode 100644
index 00000000..c6d41284
--- /dev/null
+++ b/backend/tests/agents/builtin_tools/spreadsheet_analysis/conftest.py
@@ -0,0 +1,664 @@
+"""Shared fixtures for spreadsheet_analysis unit tests.
+
+Central place to assemble the stack of mocks the analyze_spreadsheet tool
+requires: the Code Interpreter client, the S3 client, and the file
+resolution helpers (_get_kb_files / _get_session_files / _find_file).
+
+Each fixture is small and composable so individual tests can swap in
+exactly the behavior they want to assert on.
+
+S3 and DynamoDB are handled with moto (see ``tests/shared/conftest.py``)
+so that tests exercise real boto3 call paths rather than ad-hoc mocks.
+The Code Interpreter client has no moto equivalent — it's an AgentCore
+service — so ``FakeCodeInterpreter`` below is a hand-rolled stand-in.
+"""
+
+from __future__ import annotations
+
+import asyncio
+from dataclasses import dataclass, field
+from typing import Any, Callable
+from unittest.mock import patch
+
+import boto3
+import pytest
+from moto import mock_aws
+
+
+# ---------------------------------------------------------------------------
+# AWS mocks (moto) — S3 + DynamoDB tables
+# ---------------------------------------------------------------------------
+
+
+AWS_REGION = "us-east-1"
+SESSIONS_BUCKET = "test-sessions-bucket"
+KB_BUCKET = "test-kb-bucket"
+
+
+@pytest.fixture
+def aws_mocked(monkeypatch):
+    """Activate moto's ``mock_aws`` for the duration of the test.
+
+    Sets the minimum env vars boto3 clients expect. Any S3 / DynamoDB
+    calls made by analyze_tool._download_file or _get_kb_files during
+    the test execute against moto's in-process fakes, not real AWS.
+
+    ``AWS_REGION`` is set alongside ``AWS_DEFAULT_REGION`` because some
+    helpers (``_get_kb_files``, ``_download_file``) read ``AWS_REGION``
+    explicitly and fall back to ``us-west-2`` — which would land on a
+    different moto region than the fixtures use.
+    """
+    monkeypatch.setenv("AWS_DEFAULT_REGION", AWS_REGION)
+    monkeypatch.setenv("AWS_REGION", AWS_REGION)
+    monkeypatch.setenv("AWS_ACCESS_KEY_ID", "testing")
+    monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "testing")
+    monkeypatch.setenv("AWS_SECURITY_TOKEN", "testing")
+    monkeypatch.setenv("AWS_SESSION_TOKEN", "testing")
+    with mock_aws():
+        yield
+
+
+@pytest.fixture
+def sessions_bucket(aws_mocked):
+    """Create the session-attachments S3 bucket. Tests push real objects
+    in and analyze_tool downloads them through real boto3 calls.
+    """
+    s3 = boto3.client("s3", region_name=AWS_REGION)
+    s3.create_bucket(Bucket=SESSIONS_BUCKET)
+    return SESSIONS_BUCKET
+
+
+@pytest.fixture
+def kb_bucket(aws_mocked, monkeypatch):
+    """Create the assistant-KB S3 bucket and point the env var at it so
+    ``_download_file`` can resolve the bucket for KB-source files.
+    """
+    s3 = boto3.client("s3", region_name=AWS_REGION)
+    s3.create_bucket(Bucket=KB_BUCKET)
+    monkeypatch.setenv("S3_ASSISTANTS_DOCUMENTS_BUCKET_NAME", KB_BUCKET)
+    return KB_BUCKET
+
+
+@pytest.fixture
+def assistants_table(aws_mocked, monkeypatch):
+    """Create the DynamoDB assistants table with the schema
+    ``_get_kb_files`` queries against. Tests can ``put_item`` real
+    document records and see them flow through the filter.
+    """
+    ddb = boto3.client("dynamodb", region_name=AWS_REGION)
+    name = "test-assistants"
+    monkeypatch.setenv("DYNAMODB_ASSISTANTS_TABLE_NAME", name)
+    ddb.create_table(
+        TableName=name,
+        KeySchema=[
+            {"AttributeName": "PK", "KeyType": "HASH"},
+            {"AttributeName": "SK", "KeyType": "RANGE"},
+        ],
+        AttributeDefinitions=[
+            {"AttributeName": "PK", "AttributeType": "S"},
+            {"AttributeName": "SK", "AttributeType": "S"},
+        ],
+        BillingMode="PAY_PER_REQUEST",
+    )
+    return boto3.resource("dynamodb", region_name=AWS_REGION).Table(name)
+
+
+@pytest.fixture
+def files_table(aws_mocked, monkeypatch):
+    """Create the user-files DynamoDB table with the SessionIndex GSI
+    that ``FileUploadRepository.list_session_files`` queries.
+    """
+    ddb = boto3.client("dynamodb", region_name=AWS_REGION)
+    name = "test-user-files"
+    monkeypatch.setenv("DYNAMODB_USER_FILES_TABLE_NAME", name)
+    ddb.create_table(
+        TableName=name,
+        KeySchema=[
+            {"AttributeName": "PK", "KeyType": "HASH"},
+            {"AttributeName": "SK", "KeyType": "RANGE"},
+        ],
+        AttributeDefinitions=[
+            {"AttributeName": "PK", "AttributeType": "S"},
+            {"AttributeName": "SK", "AttributeType": "S"},
+            {"AttributeName": "GSI1PK", "AttributeType": "S"},
+            {"AttributeName": "GSI1SK", "AttributeType": "S"},
+        ],
+        GlobalSecondaryIndexes=[{
+            "IndexName": "SessionIndex",
+            "KeySchema": [
+                {"AttributeName": "GSI1PK", "KeyType": "HASH"},
+                {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
+            ],
+            "Projection": {"ProjectionType": "ALL"},
+        }],
+        BillingMode="PAY_PER_REQUEST",
+    )
+    return boto3.resource("dynamodb", region_name=AWS_REGION).Table(name)
+
+
+@pytest.fixture
+def file_repository(files_table):
+    """A real ``FileUploadRepository`` pointed at the moto-backed table."""
+    from apis.shared.files.repository import FileUploadRepository
+
+    return FileUploadRepository(table_name="test-user-files")
+
+
+# ---------------------------------------------------------------------------
+# Seed helpers — write KB docs / session files into moto-backed stores
+# ---------------------------------------------------------------------------
+
+
+def put_kb_doc(
+    table,
+    *,
+    assistant_id: str,
+    filename: str,
+    content_type: str,
+    status: str = "complete",
+    size_bytes: int = 1024,
+    document_id: str | None = None,
+    s3_key: str | None = None,
+    use_snake_case: bool = False,
+) -> None:
+    """Write a completed (or failed) KB document row to the assistants
+    table in the shape ``_get_kb_files`` queries.
+
+    ``use_snake_case`` lets tests pin the legacy field-name behavior —
+    some older items store ``content_type`` / ``size_bytes`` / ``s3_key``
+    / ``document_id`` instead of the camelCase defaults.
+    """
+    doc_id = document_id or f"doc-{filename}"
+    key = s3_key or f"assistants/{assistant_id}/{filename}"
+    item = {
+        "PK": f"AST#{assistant_id}",
+        "SK": f"DOC#{doc_id}",
+        "status": status,
+        "filename": filename,
+    }
+    if use_snake_case:
+        item.update({
+            "content_type": content_type,
+            "size_bytes": size_bytes,
+            "document_id": doc_id,
+            "s3_key": key,
+        })
+    else:
+        item.update({
+            "contentType": content_type,
+            "sizeBytes": size_bytes,
+            "documentId": doc_id,
+            "s3Key": key,
+        })
+    table.put_item(Item=item)
+
+
+async def put_session_file(
+    file_repository,
+    *,
+    session_id: str,
+    user_id: str = "u1",
+    upload_id: str,
+    filename: str,
+    mime_type: str,
+    size_bytes: int = 1024,
+    s3_bucket: str = SESSIONS_BUCKET,
+    s3_key: str | None = None,
+) -> None:
+    """Create a READY file record in the files repository so
+    ``FileUploadRepository.list_session_files`` returns it.
+    """
+    from apis.shared.files.models import FileMetadata, FileStatus
+
+    key = s3_key or f"sessions/{session_id}/{filename}"
+    await file_repository.create_file(FileMetadata(
+        upload_id=upload_id,
+        user_id=user_id,
+        session_id=session_id,
+        filename=filename,
+        mime_type=mime_type,
+        size_bytes=size_bytes,
+        s3_key=key,
+        s3_bucket=s3_bucket,
+        status=FileStatus.READY,
+    ))
+
+
+@pytest.fixture
+def seed_kb_doc(assistants_table):
+    """Tiny helper so tests read like ``seed_kb_doc(filename=..., ...)``
+    without threading the table fixture through every call site.
+    """
+    def _seed(**kwargs):
+        put_kb_doc(assistants_table, **kwargs)
+    return _seed
+
+
+@pytest.fixture
+def seed_session_file(file_repository):
+    """Async-aware helper; tests should ``await seed_session_file(...)``."""
+    async def _seed(**kwargs):
+        await put_session_file(file_repository, **kwargs)
+    return _seed
+
+
+# ---------------------------------------------------------------------------
+# Fake CodeInterpreter
+# ---------------------------------------------------------------------------
+
+
+@dataclass
+class InvocationRecord:
+    """One call to the fake CodeInterpreter's ``invoke`` method."""
+
+    name: str
+    payload: dict
+
+
+@dataclass
+class FakeCodeInterpreter:
+    """Drop-in stand-in for bedrock_agentcore's CodeInterpreter client.
+
+    Tests can:
+    - install a ``reply_for`` callback that returns the canned stream
+      response for a given (invocation_name, payload) pair; or
+    - rely on the default empty-success behavior (``executeCode`` returns
+      an empty stdout non-error stream; ``writeFiles`` / ``readFiles``
+      return empty streams).
+
+    The ``invocations`` list preserves call order so tests can assert on
+    the full sequence, not just the last call.
+    """
+
+    reply_for: Callable[[str, dict], dict] | None = None
+    invocations: list[InvocationRecord] = field(default_factory=list)
+    started: bool = False
+    stopped: bool = False
+
+    # @dataclass keeps this explicit __init__; it swallows the real client's
+    # constructor args. start/stop below only record that they were called.
+    def __init__(self, *_args, reply_for=None, **_kwargs):
+        self.reply_for = reply_for
+        self.invocations = []
+        self.started = False
+        self.stopped = False
+
+    def start(self, identifier: str) -> None:  # noqa: D401 — mock signature
+        self.started = True
+
+    def stop(self) -> None:
+        self.stopped = True
+
+    def invoke(self, name: str, payload: dict) -> dict:
+        self.invocations.append(InvocationRecord(name=name, payload=payload))
+        if self.reply_for is not None:
+            return self.reply_for(name, payload)
+        # Default: empty successful response.
+        return {"stream": [{"result": {"isError": False, "structuredContent": {"stdout": ""}}}]}
+
+    # --- Handy helpers for test assertions ---
+
+    def bootstrap_payload(self) -> str | None:
+        """Return the code string passed to the first executeCode call
+        (the XLSX bootstrap), or None if nothing was executed.
+        """
+        for rec in self.invocations:
+            if rec.name == "executeCode":
+                return rec.payload.get("code")
+        return None
+
+    def executed_codes(self) -> list[str]:
+        return [r.payload.get("code", "") for r in self.invocations if r.name == "executeCode"]
+
+
+def _stream_response(stdout: str = "", *, is_error: bool = False, stderr: str = "") -> dict:
+    """Build a minimally valid stream response from CodeInterpreter."""
+    return {
+        "stream": [
+            {
+                "result": {
+                    "isError": is_error,
+                    "structuredContent": {"stdout": stdout, "stderr": stderr},
+                }
+            }
+        ]
+    }
+
+
+@pytest.fixture
+def fake_code_interpreter():
+    """Return a FakeCodeInterpreter instance + a patch context that
+    substitutes it for the real client used by analyze_tool.
+
+    Usage:
+        def test_it(fake_code_interpreter):
+            fake, patcher = fake_code_interpreter
+            with patcher:
+                ...
+    """
+    fake = FakeCodeInterpreter()
+
+    def _factory(*_args, **_kwargs):
+        return fake
+
+    patcher = patch(
+        "bedrock_agentcore.tools.code_interpreter_client.CodeInterpreter",
+        side_effect=_factory,
+    )
+    return fake, patcher
+
+
+# ---------------------------------------------------------------------------
+# S3 object helpers (moto-backed)
+# ---------------------------------------------------------------------------
+
+
+def put_s3_object(bucket: str, key: str, body: bytes) -> None:
+    """Push a real object into a moto-backed bucket."""
+    s3 = boto3.client("s3", region_name=AWS_REGION)
+    s3.put_object(Bucket=bucket, Key=key, Body=body)
+
+
+@pytest.fixture
+def seed_s3_object(sessions_bucket):
+    """Drop an object into the sessions bucket. ``analyze_tool._download_file``
+    will pick it up via real boto3 through moto's interceptor.
+    """
+    def _seed(key: str, body: bytes = b"fake bytes", bucket: str = SESSIONS_BUCKET):
+        put_s3_object(bucket, key, body)
+    return _seed
+
+
+# ---------------------------------------------------------------------------
+# File sources (KB + session)
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture
+def file_sources():
+    """Patch ``_get_kb_files`` and ``_get_session_files`` together.
+
+    Patches both modules that import these helpers — analyze_tool
+    (for _find_file and the tabular inventory) and list_spreadsheets_tool
+    (for the tool factory's direct calls). Tests can configure both
+    sides cleanly:
+
+        def test_it(file_sources):
+            set_kb, set_session = file_sources
+            set_session([{...}])
+
+    The helpers are ``async def`` (see #260: sync boto3 was blocking
+    the event loop), so the patches install async side-effects. Calling
+    a patched helper returns a coroutine that resolves to a plain list,
+    which callers ``await`` exactly like the real helpers.
+    """
+    kb_files: list[dict[str, Any]] = []
+    session_files: list[dict[str, Any]] = []
+
+    def set_kb(files):
+        kb_files[:] = list(files)
+
+    def set_session(files):
+        session_files[:] = list(files)
+
+    async def _kb_side_effect(_aid):
+        return list(kb_files)
+
+    async def _session_side_effect(_sid):
+        return list(session_files)
+
+    patchers = [
+        patch(
+            "agents.builtin_tools.spreadsheet_analysis.analyze_tool._get_kb_files",
+            side_effect=_kb_side_effect,
+        ),
+        patch(
+            "agents.builtin_tools.spreadsheet_analysis.analyze_tool._get_session_files",
+            side_effect=_session_side_effect,
+        ),
+        patch(
+            "agents.builtin_tools.spreadsheet_analysis.list_spreadsheets_tool._get_kb_files",
+            side_effect=_kb_side_effect,
+        ),
+        patch(
+            "agents.builtin_tools.spreadsheet_analysis.list_spreadsheets_tool._get_session_files",
+            side_effect=_session_side_effect,
+        ),
+    ]
+    for p in patchers:
+        p.start()
+    try:
+        yield set_kb, set_session
+    finally:
+        for p in patchers:
+            p.stop()
+
+
+@pytest.fixture
+def code_interpreter_id(monkeypatch):
+    """Set a sentinel Code Interpreter id so ``_get_code_interpreter_id``
+    short-circuits to the env branch (avoiding the SSM fallback).
+    """
+    monkeypatch.setenv("AGENTCORE_CODE_INTERPRETER_ID", "ci-test-123")
+
+
+# ---------------------------------------------------------------------------
+# Canned file records
+# ---------------------------------------------------------------------------
+
+
+def make_session_csv(filename: str = "data.csv", size: int = 1024) -> dict:
+    return {
+        "filename": filename,
+        "source": "chat_attachment",
+        "content_type": "text/csv",
+        "size_bytes": size,
+        "document_id": f"upload-{filename}",
+        "s3_key": f"sessions/{filename}",
+        "s3_bucket": SESSIONS_BUCKET,
+    }
+
+
+def make_session_xlsx(filename: str = "workbook.xlsx", size: int = 1024 * 500) -> dict:
+    return {
+        "filename": filename,
+        "source": "chat_attachment",
+        "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
+        "size_bytes": size,
+        "document_id": f"upload-{filename}",
+        "s3_key": f"sessions/{filename}",
+        "s3_bucket": SESSIONS_BUCKET,
+    }
+
+
+def make_kb_xlsx(filename: str = "kb_workbook.xlsx", size: int = 1024 * 200) -> dict:
+    return {
+        "filename": filename,
+        "source": "knowledge_base",
+        "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
+        "size_bytes": size,
+        "document_id": f"doc-{filename}",
+        "s3_key": f"assistants/ast-1/{filename}",
+    }
+
+
+@pytest.fixture
+def file_factories():
+    """Expose the canned-file helpers to tests without an import dance."""
+    return {
+        "session_csv": make_session_csv,
+        "session_xlsx": make_session_xlsx,
+        "kb_xlsx": make_kb_xlsx,
+    }
+
+
+# ---------------------------------------------------------------------------
+# Tool invocation helper
+# ---------------------------------------------------------------------------
+
+
+def _unwrap(tool_obj: Any) -> Callable[..., Any]:
+    """Strands' @tool wraps the original function; our tests call the raw
+    function so we bypass framework marshalling.
+    """
+    return getattr(tool_obj, "__wrapped__", None) or tool_obj
+
+
+@pytest.fixture
+def call_analyze():
+    """Shortcut for ``_unwrap(tool)(**kwargs)``: builds the analyze tool
+    via the factory and invokes it in one go.
+
+    ``analyze_spreadsheet`` is ``async def`` (see #260) so we run the
+    returned coroutine to completion here; tests stay sync and assert
+    on the resolved result, keeping the call-site unchanged from the
+    pre-refactor shape.
+
+        def test_it(call_analyze, ...):
+            result = call_analyze(
+                filename="x.csv",
+                python_code="print(1)",
+                assistant_id=None, session_id="s1", user_id="u1",
+            )
+    """
+    from agents.builtin_tools.spreadsheet_analysis.analyze_tool import (
+        make_analyze_tool,
+    )
+
+    def _call(*, filename, python_code, output_filename=None,
+              assistant_id=None, session_id="s1", user_id="u1"):
+        tool = make_analyze_tool(assistant_id, session_id, user_id)
+        fn = _unwrap(tool)
+        return asyncio.run(fn(filename=filename, python_code=python_code,
+                              output_filename=output_filename))
+
+    return _call
+
+
+# ---------------------------------------------------------------------------
+# Bootstrap stdout builder (multi-sheet / single-sheet)
+# ---------------------------------------------------------------------------
+
+
+def build_bootstrap_stdout(
+    *,
+    total: int,
+    sheets: list[tuple[str, str, int, bool, str]],
+    skipped_names: list[str] | None = None,
+) -> str:
+    """Build the stdout the XLSX bootstrap emits inside its ``[__SHEETS__]``
+    block.
+
+    Each ``sheets`` entry is ``(name, path, rows, truncated, alias)``.
+    The function assembles the block in the exact shape the real
+    bootstrap writes, so ``_parse_sheet_inventory`` can round-trip it.
+    """
+    from agents.builtin_tools.spreadsheet_analysis.analyze_tool import _SHEETS_MARKER
+
+    lines = [
+        _SHEETS_MARKER,
+        f"total: {total}",
+        f"converted: {len(sheets)}",
+        f"skipped: {total - len(sheets)}",
+    ]
+    if skipped_names:
+        lines.append(f"skipped_names: {skipped_names!r}")
+    for name, path, rows, truncated, alias in sheets:
+        flag = "1" if truncated else "0"
+        lines.append(f"sheet|{name}|{path}|{rows}|{flag}|{alias}")
+    lines.append(_SHEETS_MARKER)
+    return "\n".join(lines) + "\n"
+
+
+@pytest.fixture
+def bootstrap_stdout():
+    """Expose ``build_bootstrap_stdout`` to tests."""
+    return build_bootstrap_stdout
+
+
+# ---------------------------------------------------------------------------
+# Schema-preview stdout builder
+# ---------------------------------------------------------------------------
+
+
+def build_schema_stdout(
+    *,
+    file: str,
+    rows: int = 100,
+    cols: int = 3,
+    load: str | None = None,
+    columns: str = "a, b, c",
+    first_row: str = "{'a': 1, 'b': 2, 'c': 3}",
+) -> str:
+    """Build the stdout the schema-preview probe emits inside its
+    ``[__SCHEMA__]`` block.
+    """
+    from agents.builtin_tools.spreadsheet_analysis.analyze_tool import _SCHEMA_MARKER
+
+    load_line = load or f"pd.read_csv('{file}', low_memory=False)"
+    return "\n".join([
+        _SCHEMA_MARKER,
+        f"file: {file} ({rows} rows x {cols} cols)",
+        f"load: {load_line}",
+        f"columns: {columns}",
+        f"first_row: {first_row}",
+        _SCHEMA_MARKER,
+    ]) + "\n"
+
+
+@pytest.fixture
+def schema_stdout():
+    return build_schema_stdout
+
+
+# ---------------------------------------------------------------------------
+# Default stream reply dispatcher
+# ---------------------------------------------------------------------------
+
+
+def default_reply_factory(
+    *,
+    bootstrap_out: str = "",
+    schema_out: str = "",
+    user_out: str = "",
+    user_err: str = "",
+    user_is_error: bool = False,
+) -> Callable[[str, dict], dict]:
+    """Return a ``reply_for`` callback suitable for ``FakeCodeInterpreter``.
+
+    Follows the invocation order the tool performs: ``writeFiles`` for the
+    base64 blob / raw CSV, then ``executeCode`` for the bootstrap (XLSX
+    only), the schema probe, and the user code, in that order, and emits
+    the matching stdout/stderr for each call.
+
+    Ignores ``readFiles`` (used for chart downloads) unless a caller
+    explicitly overrides.
+    """
+    state = {"execute_calls": 0}
+
+    def _reply(name: str, _payload: dict) -> dict:
+        if name == "executeCode":
+            state["execute_calls"] += 1
+            # Order: 1) XLSX bootstrap (or none for CSV), 2) schema probe,
+            # 3) user code. For CSV inputs, the bootstrap is skipped so
+            # call #1 is schema, call #2 is user code.
+            call_idx = state["execute_calls"]
+            if bootstrap_out and call_idx == 1:
+                return _stream_response(bootstrap_out)
+            if bootstrap_out and call_idx == 2:
+                return _stream_response(schema_out)
+            if bootstrap_out and call_idx == 3:
+                return _stream_response(user_out, is_error=user_is_error, stderr=user_err)
+            # CSV path — no bootstrap.
+            if not bootstrap_out and call_idx == 1:
+                return _stream_response(schema_out)
+            if not bootstrap_out and call_idx == 2:
+                return _stream_response(user_out, is_error=user_is_error, stderr=user_err)
+        return _stream_response()
+
+    return _reply
+
+
+@pytest.fixture
+def reply_factory():
+    return default_reply_factory
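Reviewer note: the call-count dispatch in `default_reply_factory` can be read as "answer the Nth `executeCode` call with the Nth canned reply". The sketch below restates that idea in isolation. It is not the code in this PR: `make_ordered_replies` is a hypothetical name, and it returns plain strings where the real helper returns `_stream_response(...)` dicts.

```python
from typing import Callable


def make_ordered_replies(*replies: str, default: str = "") -> Callable[[str, dict], str]:
    """Return a callback that answers the Nth ``executeCode`` call with the
    Nth canned reply and every other call with ``default``.

    Hypothetical helper for illustration only.
    """
    state = {"execute_calls": 0}

    def _reply(name: str, _payload: dict) -> str:
        if name != "executeCode":
            # writeFiles / readFiles and anything else get the default reply.
            return default
        idx = state["execute_calls"]
        state["execute_calls"] += 1
        # Calls beyond the scripted sequence fall back to the default.
        return replies[idx] if idx < len(replies) else default

    return _reply


# XLSX ordering: bootstrap, schema probe, user code.
xlsx_reply = make_ordered_replies("bootstrap", "schema", "user")
# CSV ordering: the bootstrap step is skipped.
csv_reply = make_ordered_replies("schema", "user")
```

Keeping the counter in a closure means each test gets a fresh sequence simply by calling the factory again, which is the same property the fixture above relies on.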
diff --git a/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_analyze_tool_integration.py b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_analyze_tool_integration.py
new file mode 100644
index 00000000..e4b4aae4
--- /dev/null
+++ b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_analyze_tool_integration.py
@@ -0,0 +1,779 @@
+"""Integration-style tests for the analyze_spreadsheet tool, exercising
+the full factory → file lookup → download → CodeInterpreter → response
+path with every external dependency mocked.
+
+Covers the behaviors the issue (#261) specifically called out as "subtle
+logic worth pinning down":
+
+- CSV fast-path (no bootstrap, direct writeFiles + schema probe + user code)
+- XLSX bootstrap: base64 push, sheet inventory round-trip, CSV rename
+- Single-sheet vs. multi-sheet response shape
+- Filename alias fallback (foo.csv ↔ foo.xlsx) via ``_find_file``
+- Error-path hints: wrong-filename retry, schema-footer attached
+- Truncation warnings when sheets hit MAX_ROWS_PER_SHEET
+- Skipped-sheet warning when workbook exceeds MAX_SHEETS_TO_CONVERT
+- Missing Code Interpreter → friendly error, no interpreter calls
+- File not found → friendly error with list_spreadsheets hint
+- S3 download failure → friendly error, interpreter never started
+
+S3 and DynamoDB go through moto so tests exercise the real boto3 call
+paths. Only the CodeInterpreter is hand-mocked (no moto equivalent for
+AgentCore). All tests run offline; no AWS credentials required and the
+backend env doesn't need pandas installed.
+"""
+
+from __future__ import annotations
+
+from unittest.mock import patch
+
+
+# ---------------------------------------------------------------------------
+# Happy path: CSV end-to-end
+# ---------------------------------------------------------------------------
+
+
+class TestCsvHappyPath:
+    def test_csv_end_to_end_success(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        schema_stdout,
+        reply_factory,
+    ):
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_csv"]("data.csv")])
+        seed_s3_object(key="sessions/data.csv", body=b"col1,col2\n1,2\n")
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            schema_out=schema_stdout(file="data.csv"),
+            user_out="Total: 42\n",
+        )
+
+        with ci_patch:
+            result = call_analyze(
+                filename="data.csv",
+                python_code="print('Total: 42')",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        assert result["status"] == "success"
+        text = result["content"][0]["text"]
+        assert "Total: 42" in text
+        # Schema footer attached to success responses.
+        assert "Dataset" in text
+        assert "data.csv" in text
+        # No XLSX bootstrap: one writeFiles (raw CSV) + two executeCode
+        # calls (schema probe, user code).
+        assert fake.started and fake.stopped
+        write_calls = [r for r in fake.invocations if r.name == "writeFiles"]
+        exec_calls = [r for r in fake.invocations if r.name == "executeCode"]
+        assert len(write_calls) == 1
+        assert len(exec_calls) == 2
+
+    def test_csv_writes_raw_text_to_sandbox(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        schema_stdout,
+        reply_factory,
+    ):
+        """CSV fast-path pushes the raw text directly — no base64, no
+        bootstrap. Regression guard against a future "always run the
+        bootstrap" refactor.
+        """
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_csv"]("data.csv")])
+        seed_s3_object(key="sessions/data.csv", body=b"col1,col2\n1,2\n")
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            schema_out=schema_stdout(file="data.csv"),
+            user_out="done\n",
+        )
+
+        with ci_patch:
+            call_analyze(
+                filename="data.csv",
+                python_code="print('done')",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        write_call = next(r for r in fake.invocations if r.name == "writeFiles")
+        files = write_call.payload["content"]
+        assert len(files) == 1
+        # Pushed as text, not base64.
+        assert files[0]["path"] == "data.csv"
+        assert files[0]["text"] == "col1,col2\n1,2\n"
+
+
+# ---------------------------------------------------------------------------
+# XLSX happy path — single-sheet and multi-sheet
+# ---------------------------------------------------------------------------
+
+
+XLSX_BYTES = b"\x50\x4b\x03\x04" + b"fake xlsx binary"  # PK... magic + payload
+
+
+class TestXlsxSingleSheet:
+    def test_single_sheet_xlsx_success(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        bootstrap_stdout,
+        schema_stdout,
+        reply_factory,
+    ):
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_xlsx"]("Budget.xlsx")])
+        seed_s3_object(key="sessions/Budget.xlsx", body=XLSX_BYTES)
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            bootstrap_out=bootstrap_stdout(
+                total=1,
+                sheets=[("Sheet1", "Budget.csv", 100, False, "")],
+            ),
+            schema_out=schema_stdout(file="Budget.csv", rows=100),
+            user_out="sum=9999\n",
+        )
+
+        with ci_patch:
+            result = call_analyze(
+                filename="Budget.xlsx",
+                python_code="print('sum=9999')",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        assert result["status"] == "success"
+        text = result["content"][0]["text"]
+        assert "sum=9999" in text
+        # Single-sheet path does not emit the multi-sheet inventory.
+        assert "Available sheets" not in text
+
+    def test_xlsx_bootstrap_pushes_base64_blob(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        bootstrap_stdout,
+        schema_stdout,
+        reply_factory,
+    ):
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_xlsx"]("Budget.xlsx")])
+        seed_s3_object(key="sessions/Budget.xlsx", body=b"xlsx-binary-bytes")
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            bootstrap_out=bootstrap_stdout(
+                total=1,
+                sheets=[("Sheet1", "Budget.csv", 10, False, "")],
+            ),
+            schema_out=schema_stdout(file="Budget.csv"),
+            user_out="",
+        )
+
+        with ci_patch:
+            call_analyze(
+                filename="Budget.xlsx",
+                python_code="pass",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        write_call = next(r for r in fake.invocations if r.name == "writeFiles")
+        # Encoded blob written as text under _encoded.b64.
+        entries = write_call.payload["content"]
+        assert any(e["path"] == "_encoded.b64" for e in entries)
+
+
+class TestXlsxMultiSheet:
+    def test_multi_sheet_response_includes_inventory(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        bootstrap_stdout,
+        schema_stdout,
+        reply_factory,
+    ):
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_xlsx"]("Budget.xlsx")])
+        seed_s3_object(key="sessions/Budget.xlsx", body=XLSX_BYTES)
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            bootstrap_out=bootstrap_stdout(
+                total=3,
+                sheets=[
+                    ("Summary", "Budget.summary.csv", 12, False, "Budget.csv"),
+                    ("Transactions", "Budget.transactions.csv", 18_551, False, ""),
+                    ("Notes", "Budget.notes.csv", 5, False, ""),
+                ],
+            ),
+            schema_out=schema_stdout(file="Budget.csv"),
+            user_out="analyzed\n",
+        )
+
+        with ci_patch:
+            result = call_analyze(
+                filename="Budget.xlsx",
+                python_code="pass",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        text = result["content"][0]["text"]
+        assert "Available sheets" in text
+        assert "Summary" in text
+        assert "Transactions" in text
+        assert "Notes" in text
+        assert "Budget.summary.csv" in text
+        # Row counts are formatted with commas for readability.
+        assert "18,551" in text
+
+    def test_skipped_sheets_warning_surfaces(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        bootstrap_stdout,
+        schema_stdout,
+        reply_factory,
+    ):
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_xlsx"]("Many.xlsx")])
+        seed_s3_object(key="sessions/Many.xlsx", body=XLSX_BYTES)
+
+        fake, ci_patch = fake_code_interpreter
+        sheets = [
+            (f"S{i}", f"Many.s{i}.csv", 10, False, "" if i > 1 else "Many.csv")
+            for i in range(1, 26)
+        ]
+        fake.reply_for = reply_factory(
+            bootstrap_out=bootstrap_stdout(
+                total=30,
+                sheets=sheets,
+                skipped_names=["S26", "S27", "S28", "S29", "S30"],
+            ),
+            schema_out=schema_stdout(file="Many.csv"),
+            user_out="ok\n",
+        )
+
+        with ci_patch:
+            result = call_analyze(
+                filename="Many.xlsx",
+                python_code="pass",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        text = result["content"][0]["text"]
+        assert "30 sheets" in text
+        assert "first 25" in text
+        assert "S26" in text
+        assert "S30" in text
+
+    def test_truncated_sheet_warning_surfaces(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        bootstrap_stdout,
+        schema_stdout,
+        reply_factory,
+    ):
+        """A sheet truncated at MAX_ROWS_PER_SHEET must be flagged in the
+        inventory list so the user knows the analysis may be partial.
+        """
+        from agents.builtin_tools.spreadsheet_analysis.analyze_tool import (
+            MAX_ROWS_PER_SHEET,
+        )
+
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_xlsx"]("Huge.xlsx")])
+        seed_s3_object(key="sessions/Huge.xlsx", body=XLSX_BYTES)
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            bootstrap_out=bootstrap_stdout(
+                total=2,
+                sheets=[
+                    ("BigSheet", "Huge.bigsheet.csv",
+                     MAX_ROWS_PER_SHEET, True, "Huge.csv"),
+                    ("SmallSheet", "Huge.smallsheet.csv", 10, False, ""),
+                ],
+            ),
+            schema_out=schema_stdout(file="Huge.csv"),
+            user_out="done\n",
+        )
+
+        with ci_patch:
+            result = call_analyze(
+                filename="Huge.xlsx",
+                python_code="pass",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        text = result["content"][0]["text"]
+        # Truncation tag for the big sheet; none for the small one.
+        big_line = next(line for line in text.splitlines() if "BigSheet" in line)
+        small_line = next(line for line in text.splitlines() if "SmallSheet" in line)
+        assert "truncated" in big_line.lower()
+        assert "truncated" not in small_line.lower()
+
+
+# ---------------------------------------------------------------------------
+# Filename aliasing
+# ---------------------------------------------------------------------------
+
+
+class TestFilenameAliasing:
+    def test_csv_request_resolves_xlsx_source(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        bootstrap_stdout,
+        schema_stdout,
+        reply_factory,
+    ):
+        """Model asks for ``Budget.csv`` (sandbox filename) when only
+        ``Budget.xlsx`` was uploaded. ``_find_file`` aliases to the XLSX
+        source; end-to-end the tool should still succeed.
+        """
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_xlsx"]("Budget.xlsx")])
+        seed_s3_object(key="sessions/Budget.xlsx", body=XLSX_BYTES)
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            bootstrap_out=bootstrap_stdout(
+                total=1,
+                sheets=[("Sheet1", "Budget.csv", 10, False, "")],
+            ),
+            schema_out=schema_stdout(file="Budget.csv"),
+            user_out="ok\n",
+        )
+
+        with ci_patch:
+            result = call_analyze(
+                filename="Budget.csv",
+                python_code="pass",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        assert result["status"] == "success"
+
+
+class TestKnowledgeBaseDownload:
+    def test_kb_source_downloads_from_kb_bucket(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        kb_bucket,
+        code_interpreter_id,
+        schema_stdout,
+        reply_factory,
+    ):
+        """KB-sourced files resolve their bucket from the
+        ``S3_ASSISTANTS_DOCUMENTS_BUCKET_NAME`` env var (set by the
+        ``kb_bucket`` fixture) rather than from the file record's
+        ``s3_bucket`` field. Covers the knowledge_base branch of
+        ``_download_file`` which the session-attachment tests never
+        exercise.
+        """
+        import boto3
+
+        from tests.agents.builtin_tools.spreadsheet_analysis.conftest import (
+            AWS_REGION,
+        )
+
+        # Seed the KB bucket directly (the kb_xlsx factory points s3_key
+        # at assistants/ast-1/..., which is what _download_file reads).
+        kb_file = file_factories["kb_xlsx"]("Ledger.csv")
+        kb_file["content_type"] = "text/csv"  # simpler path — no XLSX bootstrap
+        s3 = boto3.client("s3", region_name=AWS_REGION)
+        s3.put_object(
+            Bucket=kb_bucket,
+            Key=kb_file["s3_key"],
+            Body=b"a,b,c\n1,2,3\n",
+        )
+
+        set_kb, set_session = file_sources
+        set_kb([kb_file])
+        set_session([])
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            schema_out=schema_stdout(file="Ledger.csv"),
+            user_out="kb-analyzed\n",
+        )
+
+        with ci_patch:
+            result = call_analyze(
+                filename="Ledger.csv",
+                python_code="pass",
+                assistant_id="ast-1",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        assert result["status"] == "success"
+        assert "kb-analyzed" in result["content"][0]["text"]
+
+    def test_kb_source_missing_env_var_surfaces_friendly_error(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        aws_mocked,
+        monkeypatch,
+        code_interpreter_id,
+    ):
+        """``_download_file`` raises ``ValueError`` when a KB file has no
+        resolvable bucket. Tool wraps that in a graceful error rather
+        than propagating the exception.
+        """
+        monkeypatch.delenv("S3_ASSISTANTS_DOCUMENTS_BUCKET_NAME", raising=False)
+
+        kb_file = file_factories["kb_xlsx"]("Ledger.csv")
+        kb_file["content_type"] = "text/csv"
+
+        set_kb, set_session = file_sources
+        set_kb([kb_file])
+        set_session([])
+
+        result = call_analyze(
+            filename="Ledger.csv",
+            python_code="pass",
+            assistant_id="ast-1",
+            session_id="s1",
+            user_id="u1",
+        )
+
+        assert result["status"] == "error"
+        assert "Failed to download" in result["content"][0]["text"]
+
+
+# ---------------------------------------------------------------------------
+# Error paths
+# ---------------------------------------------------------------------------
+
+
+class TestFileNotFound:
+    def test_unknown_file_returns_list_spreadsheets_hint(
+        self,
+        call_analyze,
+        file_sources,
+        code_interpreter_id,
+    ):
+        set_kb, set_session = file_sources
+        set_session([])
+
+        # No Code Interpreter patching because we never get that far.
+        result = call_analyze(
+            filename="missing.csv",
+            python_code="print(1)",
+            session_id="s1",
+            user_id="u1",
+        )
+        assert result["status"] == "error"
+        text = result["content"][0]["text"]
+        assert "not found" in text
+        assert "list_spreadsheets" in text
+
+
+class TestS3DownloadFailure:
+    def test_s3_error_surfaces_friendly_message(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        code_interpreter_id,
+    ):
+        """File metadata points at a key that doesn't exist in the bucket.
+        moto returns a NoSuchKey ClientError, which ``_download_file``
+        wraps in a friendly message rather than crashing.
+        """
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_csv"]("data.csv")])
+        # Note: no seed_s3_object — the object doesn't exist, so
+        # get_object raises.
+
+        fake, ci_patch = fake_code_interpreter
+        with ci_patch:
+            result = call_analyze(
+                filename="data.csv",
+                python_code="pass",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        assert result["status"] == "error"
+        assert "Failed to download" in result["content"][0]["text"]
+        # The interpreter should never have been started for a download
+        # failure — start() happens after _download_file succeeds.
+        assert not fake.started
+
+
+class TestCodeInterpreterUnavailable:
+    def test_no_ci_id_returns_friendly_error(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        monkeypatch,
+    ):
+        """When ``_get_code_interpreter_id`` resolves to None (env unset,
+        SSM lookup fails), the tool bails out with a contact-admin
+        message instead of crashing.
+        """
+        monkeypatch.delenv("AGENTCORE_CODE_INTERPRETER_ID", raising=False)
+
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_csv"]("data.csv")])
+
+        with patch(
+            "agents.builtin_tools.spreadsheet_analysis.analyze_tool._get_code_interpreter_id",
+            return_value=None,
+        ):
+            result = call_analyze(
+                filename="data.csv",
+                python_code="pass",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        assert result["status"] == "error"
+        assert "Code Interpreter is not configured" in result["content"][0]["text"]
+
+
+class TestUserCodeError:
+    def test_wrong_xlsx_filename_injects_hint(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        bootstrap_stdout,
+        schema_stdout,
+        reply_factory,
+    ):
+        """Classic failure: model wrote ``pd.read_csv('Budget.xlsx', ...)``
+        but the sandbox has ``Budget.csv``. Error response must include
+        the targeted retry hint naming the correct filename, not just
+        dump the FileNotFoundError.
+        """
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_xlsx"]("Budget.xlsx")])
+        seed_s3_object(key="sessions/Budget.xlsx", body=XLSX_BYTES)
+
+        err_traceback = (
+            "Traceback (most recent call last):\n"
+            "  File \"/tmp/ipykernel_1/code.py\", line 1, in <module>\n"
+            "    df = pd.read_csv('Budget.xlsx', low_memory=False)\n"
+            "FileNotFoundError: [Errno 2] No such file or directory: "
+            "'Budget.xlsx'\n"
+        )
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            bootstrap_out=bootstrap_stdout(
+                total=1,
+                sheets=[("Sheet1", "Budget.csv", 10, False, "")],
+            ),
+            schema_out=schema_stdout(file="Budget.csv"),
+            user_out="",
+            user_err=err_traceback,
+            user_is_error=True,
+        )
+
+        with ci_patch:
+            result = call_analyze(
+                filename="Budget.xlsx",
+                python_code="df = pd.read_csv('Budget.xlsx')",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        assert result["status"] == "error"
+        text = result["content"][0]["text"]
+        assert "FileNotFoundError" in text
+        # The retry hint names the sandbox filename explicitly.
+        assert "loaded as" in text
+        assert "Budget.csv" in text
+        # Schema footer should also be attached so the retry has the
+        # load line.
+        assert "Dataset info" in text or "use the `load:` line" in text
+
+    def test_generic_user_error_attaches_schema(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        schema_stdout,
+        reply_factory,
+    ):
+        """A KeyError on a CSV — no XLSX hint needed, but the schema
+        footer with column list should land so the model can fix its
+        column reference on retry.
+        """
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_csv"]("data.csv")])
+        seed_s3_object(key="sessions/data.csv", body=b"a,b,c\n1,2,3\n")
+
+        err_traceback = (
+            "Traceback (most recent call last):\n"
+            "  File \"/tmp/ipykernel_1/code.py\", line 1, in <module>\n"
+            "    print(df['WRONG_COL'].sum())\n"
+            "KeyError: 'WRONG_COL'\n"
+        )
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            schema_out=schema_stdout(file="data.csv", columns="a, b, c"),
+            user_out="",
+            user_err=err_traceback,
+            user_is_error=True,
+        )
+
+        with ci_patch:
+            result = call_analyze(
+                filename="data.csv",
+                python_code="print(df['WRONG_COL'].sum())",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        assert result["status"] == "error"
+        text = result["content"][0]["text"]
+        assert "KeyError" in text
+        assert "Dataset info" in text
+        assert "columns: a, b, c" in text
+        # The xlsx hint must NOT appear on a CSV error.
+        assert "loaded as" not in text
+
+
+class TestInterpreterLifecycle:
+    def test_interpreter_stopped_on_success(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        schema_stdout,
+        reply_factory,
+    ):
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_csv"]("data.csv")])
+        seed_s3_object(key="sessions/data.csv", body=b"a,b\n1,2\n")
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            schema_out=schema_stdout(file="data.csv"),
+            user_out="done\n",
+        )
+
+        with ci_patch:
+            call_analyze(
+                filename="data.csv",
+                python_code="pass",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        assert fake.started
+        assert fake.stopped
+
+    def test_interpreter_stopped_on_user_error(
+        self,
+        call_analyze,
+        file_sources,
+        file_factories,
+        fake_code_interpreter,
+        sessions_bucket,
+        seed_s3_object,
+        code_interpreter_id,
+        schema_stdout,
+        reply_factory,
+    ):
+        """The finally: stop() must run even when user code fails.
+        Otherwise we'd leak interpreter sessions on every bad query.
+        """
+        set_kb, set_session = file_sources
+        set_session([file_factories["session_csv"]("data.csv")])
+        seed_s3_object(key="sessions/data.csv", body=b"a,b\n1,2\n")
+
+        fake, ci_patch = fake_code_interpreter
+        fake.reply_for = reply_factory(
+            schema_out=schema_stdout(file="data.csv"),
+            user_out="",
+            user_err="KeyError: 'x'\n",
+            user_is_error=True,
+        )
+
+        with ci_patch:
+            call_analyze(
+                filename="data.csv",
+                python_code="pass",
+                session_id="s1",
+                user_id="u1",
+            )
+
+        assert fake.stopped
diff --git a/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_build_preview_code.py b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_build_preview_code.py
new file mode 100644
index 00000000..b6915d9d
--- /dev/null
+++ b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_build_preview_code.py
@@ -0,0 +1,127 @@
+"""Tests for ``_build_preview_code`` — the schema-probe Python template
+that runs inside the Code Interpreter sandbox.
+
+Scope note: the sandbox runs in an AWS-managed container with pandas
+preinstalled; the backend's own test environment does NOT bundle pandas.
+That means we can't execute the template in-process here without pulling
+pandas into backend dependencies — which nothing else needs. So these
+tests focus on the template's **shape**: it must parse as valid Python,
+quote the filename safely (including filenames with apostrophes or
+double quotes), and include the expected scorer/marker scaffolding so
+regressions in the template structure are caught.
+
+Execution-level coverage of the scorer (does it correctly prescribe
+``skiprows=4`` for a 4-row title preamble?) will land in a follow-up
+that extracts the scorer into a pure, directly testable helper. See
+#261.
+"""
+
+import ast
+
+from agents.builtin_tools.spreadsheet_analysis.analyze_tool import (
+    _SCHEMA_MARKER,
+    _build_preview_code,
+)
+
+
+class TestPreviewCodeParsesAsValidPython:
+    def test_simple_filename(self):
+        ast.parse(_build_preview_code("data.csv"))
+
+    def test_filename_with_apostrophe(self):
+        """Regression: before the ``_FNAME`` indirection, a filename like
+        ``O'Brien data.csv`` produced invalid Python because repr() emits
+        double quotes around strings containing single quotes, conflicting
+        with the template's outer f-string quoting.
+        """
+        ast.parse(_build_preview_code("O'Brien data.csv"))
+
+    def test_filename_with_double_quote(self):
+        """Double quotes in filenames should also survive — repr() picks
+        single quotes when the string contains doubles.
+        """
+        ast.parse(_build_preview_code('say "hello".csv'))
+
+    def test_filename_with_backslashes(self):
+        ast.parse(_build_preview_code("path\\with\\backslashes.csv"))
+
+    def test_filename_with_tabs_and_newlines(self):
+        """Whitespace escapes — Python's repr uses \\t / \\n so the
+        generated source stays on one line.
+        """
+        ast.parse(_build_preview_code("file\twith\ttabs.csv"))
+        ast.parse(_build_preview_code("file\nwith\nnewlines.csv"))
+
+    def test_filename_with_unicode(self):
+        ast.parse(_build_preview_code("Ñiño.csv"))
+
+    def test_filename_with_braces(self):
+        """Curly braces in filenames must not be interpreted as f-string
+        placeholders. ``_FNAME`` indirection sidesteps the issue.
+        """
+        ast.parse(_build_preview_code("{templated}.csv"))
+
+    def test_empty_filename(self):
+        """Empty strings should produce valid (if useless) Python — we
+        don't want to fail tool construction on a bad filename; that's
+        the call site's job.
+        """
+        ast.parse(_build_preview_code(""))
+
+
+class TestPreviewCodeShape:
+    def test_contains_schema_markers(self):
+        code = _build_preview_code("x.csv")
+        # The marker appears at least twice — once to open, once to close.
+        assert code.count(repr(_SCHEMA_MARKER)) >= 2
+
+    def test_emits_marker_on_failure_branch(self):
+        """The template wraps its probe in try/except and emits the marker
+        on the except path too, so a probe failure doesn't leave the
+        outer parser hanging on a half-emitted schema.
+        """
+        code = _build_preview_code("x.csv")
+        # Look for the failure branch's signature text. Resilience against
+        # template churn: use a stable keyword rather than exact wording.
+        assert "schema preview unavailable" in code
+
+    def test_scorer_iterates_skiprows_0_to_8(self):
+        """Regression: the probe range is deliberate. If someone shortens
+        it, the scorer can't find the right header on deeply-nested
+        report exports.
+        """
+        code = _build_preview_code("x.csv")
+        assert "range(9)" in code
+
+    def test_references_pandas(self):
+        code = _build_preview_code("x.csv")
+        assert "pandas" in code
+        assert "pd.read_csv" in code
+
+    def test_stores_filename_in_local_once(self):
+        """The template references ``_FNAME`` rather than re-interpolating
+        the raw filename into every usage. Pin this to keep the quoting
+        bug from regressing if someone "simplifies" the template.
+        """
+        code = _build_preview_code("whatever.csv")
+        # Exactly one assignment of _FNAME.
+        assert code.count("_FNAME = ") == 1
+        # All file operations use the local, not a re-interpolated literal.
+        for expected in (
+            "open(_FNAME",
+            "pd.read_csv(_FNAME, nrows=0",
+            "pd.read_csv(_FNAME, skiprows=",
+        ):
+            assert expected in code
+
+    def test_confidence_gate_still_present(self):
+        """The ``_prescribe`` gate is what prevents over-eager skiprows
+        recommendations. If it disappears, the scorer will happily point
+        the model at a data-row-as-header and regressions become silent.
+        """
+        code = _build_preview_code("x.csv")
+        assert "_prescribe" in code
+        # The gate checks all three conditions — drop any and we're
+        # back to pre-gate behavior.
+        assert "_best_skip > 0" in code
+        assert "_win_clean_ratio" in code
diff --git a/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_clean_stderr.py b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_clean_stderr.py
new file mode 100644
index 00000000..84d214e7
--- /dev/null
+++ b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_clean_stderr.py
@@ -0,0 +1,202 @@
+"""Tests for ``_clean_stderr`` — strips pandas internal frames and warning
+noise from Code Interpreter tracebacks, keeping only the user-code frame
+and the final exception line.
+
+Fixtures model real tracebacks surfaced by the interpreter: KeyError from
+a missing column, ValueError from a bad dtype cast, FileNotFoundError from
+an incorrect filename, a SyntaxError from malformed python_code, and a
+malformed blob that doesn't match the expected traceback shape.
+
+These tests are important because ``_clean_stderr`` output is what the
+model sees on retry. Regressions here either flood the model with
+irrelevant noise (bad retries, wasted tokens) or swallow the real error
+(stuck retries). See #261.
+"""
+
+from agents.builtin_tools.spreadsheet_analysis.analyze_tool import (
+    MAX_ERROR_CHARS,
+    _clean_stderr,
+)
+
+
+class TestCleanStderrEmptyInput:
+    def test_empty_string_returns_placeholder(self):
+        assert _clean_stderr("") == "Unknown error"
+
+    def test_none_returns_placeholder(self):
+        assert _clean_stderr(None) == "Unknown error"  # type: ignore[arg-type]
+
+
+class TestCleanStderrKeyError:
+    """pandas KeyError — the most common failure: wrong column name."""
+
+    TRACEBACK = """Traceback (most recent call last):
+  File "/tmp/ipykernel_42/user_code.py", line 3, in <module>
+    total = df['NET_AMOUNT_MISSPELLED'].sum()
+            ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/opt/venv/lib/python3.12/site-packages/pandas/core/frame.py", line 4090, in __getitem__
+    indexer = self.columns.get_loc(key)
+  File "/opt/venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
+    raise KeyError(key) from err
+KeyError: 'NET_AMOUNT_MISSPELLED'
+"""
+
+    def test_keeps_user_frame(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "user_code.py" in cleaned
+        assert "NET_AMOUNT_MISSPELLED" in cleaned
+
+    def test_drops_pandas_internal_frames(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "site-packages/pandas/" not in cleaned
+        assert "pandas/core/frame.py" not in cleaned
+        assert "get_loc" not in cleaned
+
+    def test_includes_final_exception(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "KeyError" in cleaned
+        # The actual missing key should survive
+        assert "'NET_AMOUNT_MISSPELLED'" in cleaned
+
+    def test_within_budget(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert len(cleaned) <= MAX_ERROR_CHARS
+
+
+class TestCleanStderrValueError:
+    TRACEBACK = """Traceback (most recent call last):
+  File "/tmp/ipykernel_99/script.py", line 7, in <module>
+    df['amount'] = df['amount'].astype(int)
+                   ~~~~~~~~~~~~~~~~~~~~^^^^^
+  File "/opt/venv/lib/python3.12/site-packages/pandas/core/generic.py", line 6534, in astype
+    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
+  File "/opt/venv/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 414, in astype
+    return self.apply(
+ValueError: invalid literal for int() with base 10: '$1,234.56'
+"""
+
+    def test_user_frame_kept(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "script.py" in cleaned
+        assert "astype(int)" in cleaned
+
+    def test_exception_kept(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "ValueError" in cleaned
+        assert "'$1,234.56'" in cleaned
+
+    def test_pandas_internals_dropped(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "generic.py" not in cleaned
+        assert "managers.py" not in cleaned
+
+
+class TestCleanStderrFileNotFoundError:
+    """The XLSX/CSV mismatch path — model points at the wrong filename."""
+
+    TRACEBACK = """Traceback (most recent call last):
+  File "/tmp/ipykernel_1/user_code.py", line 2, in <module>
+    df = pd.read_csv('FY_27_Ledger.xlsx', low_memory=False)
+  File "/opt/venv/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
+    return _read(filepath_or_buffer, kwds)
+FileNotFoundError: [Errno 2] No such file or directory: 'FY_27_Ledger.xlsx'
+"""
+
+    def test_filename_preserved_for_targeted_hint(self):
+        """The outer tool matches on ``filename in error_msg`` to trigger
+        the xlsx→csv retry hint — the cleaner must keep the source
+        filename readable.
+        """
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "FY_27_Ledger.xlsx" in cleaned
+
+    def test_exception_name_preserved(self):
+        assert "FileNotFoundError" in _clean_stderr(self.TRACEBACK)
+
+    def test_pandas_reader_frame_dropped(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "readers.py" not in cleaned
+
+
+class TestCleanStderrSyntaxError:
+    """Model wrote broken python_code — no useful stack, just the syntax
+    error line and caret.
+    """
+
+    TRACEBACK = """  File "/tmp/ipykernel_5/broken.py", line 2
+    df = pd.read_csv('x.csv'
+                           ^
+SyntaxError: '(' was never closed
+"""
+
+    def test_syntax_error_surfaced(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "SyntaxError" in cleaned
+        assert "never closed" in cleaned
+
+    def test_user_frame_preserved(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "broken.py" in cleaned
+
+
+class TestCleanStderrMalformed:
+    """If the traceback doesn't match the expected shape, we should still
+    return *something* useful (a tail of the raw stderr) rather than blank.
+    """
+
+    def test_no_exception_line_returns_tail(self):
+        weird = "line1\nline2\nline3\n\nunexpected output without traceback"
+        cleaned = _clean_stderr(weird)
+        # Fallback path: a non-empty tail of the raw stderr is returned.
+        assert cleaned != ""
+
+    def test_tail_bounded_by_budget(self):
+        """Malformed output should not exceed the error budget — prevents a
+        multi-kilobyte dump of unrelated stderr from eating tool result
+        space on retries.
+        """
+        weird = "\n".join(f"random noise line {i}" for i in range(200))
+        cleaned = _clean_stderr(weird)
+        assert len(cleaned) <= MAX_ERROR_CHARS
+
+
+class TestCleanStderrWarnings:
+    """DtypeWarning / FutureWarning / UserWarning are pandas noise that
+    appears *above* the real error. The cleaner drops the warning line and
+    its call-site follow-up.
+    """
+
+    TRACEBACK = """/opt/venv/lib/python3.12/site-packages/pandas/io/parsers/readers.py:622: DtypeWarning: Columns (17) have mixed types. Specify dtype option on import or set low_memory=False.
+  return _read(filepath_or_buffer, kwds)
+Traceback (most recent call last):
+  File "/tmp/ipykernel_7/code.py", line 4, in <module>
+    print(df['NET'].sum())
+          ~~^^^^^^^
+KeyError: 'NET'
+"""
+
+    def test_warning_dropped(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "DtypeWarning" not in cleaned
+        assert "mixed types" not in cleaned
+
+    def test_real_error_preserved(self):
+        cleaned = _clean_stderr(self.TRACEBACK)
+        assert "KeyError" in cleaned
+        assert "'NET'" in cleaned
+        assert "code.py" in cleaned
+
+
+class TestCleanStderrTruncation:
+    def test_output_clamped_to_max_error_chars(self):
+        """A long user-code frame shouldn't push the cleaned output past
+        MAX_ERROR_CHARS. Truncation appends an ellipsis marker.
+        """
+        long_traceback = (
+            "Traceback (most recent call last):\n"
+            "  File \"/tmp/ipykernel_1/code.py\", line 1, in <module>\n"
+            f"    {'x' * 2000}\n"
+            "ValueError: super long error message " + "y" * 1000 + "\n"
+        )
+        cleaned = _clean_stderr(long_traceback)
+        assert len(cleaned) <= MAX_ERROR_CHARS
diff --git a/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_find_file.py b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_find_file.py
new file mode 100644
index 00000000..dfe26b61
--- /dev/null
+++ b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_find_file.py
@@ -0,0 +1,212 @@
+"""Tests for ``_find_file`` — file lookup used by analyze_spreadsheet to
+resolve a model-supplied filename to an S3-backed file record.
+
+The lookup pulls from two sources: the assistant's knowledge base
+(``_get_kb_files``) and the session's attachments (``_get_session_files``).
+The twist is an alias pass: XLSX↔CSV for tabular files, so
+``analyze_spreadsheet(filename="foo.csv", ...)`` resolves to the backing
+``foo.xlsx`` (and vice versa). Without this, the model's "retry with the
+sandbox filename" guess — which the docstring asks for — would fail at
+the tool boundary (#206).
+
+These tests pin down:
+- exact-match wins over the alias pass
+- aliasing only triggers for tabular extensions (no foo.pdf ↔ foo.docx)
+- both sources contribute candidates
+- case-insensitive exact match
+
+After #260, ``_find_file`` and both helpers are ``async def``; the
+``patch`` calls install ``AsyncMock`` side-effects so awaiting them
+yields the configured return values.
+"""
+
+from unittest.mock import AsyncMock, patch
+
+import pytest
+
+from agents.builtin_tools.spreadsheet_analysis.analyze_tool import _find_file
+
+
+def _kb_file(filename: str, content_type: str = "") -> dict:
+    return {
+        "filename": filename,
+        "source": "knowledge_base",
+        "content_type": content_type,
+        "size_bytes": 1234,
+        "document_id": "doc-1",
+        "s3_key": f"kb/{filename}",
+    }
+
+
+def _session_file(filename: str, content_type: str = "") -> dict:
+    return {
+        "filename": filename,
+        "source": "chat_attachment",
+        "content_type": content_type,
+        "size_bytes": 1234,
+        "document_id": "upload-1",
+        "s3_key": f"session/{filename}",
+        "s3_bucket": "test-bucket",
+    }
+
+
+def _patch_sources(*, kb=None, session=None):
+    """Build AsyncMock patches for both file-source helpers.
+
+    Returns a tuple of (kb_patch, session_patch) that callers apply via
+    ``with`` so the mocks tear down cleanly between tests.
+    """
+    kb_value = list(kb or [])
+    session_value = list(session or [])
+    return (
+        patch(
+            "agents.builtin_tools.spreadsheet_analysis.analyze_tool._get_kb_files",
+            new=AsyncMock(return_value=kb_value),
+        ),
+        patch(
+            "agents.builtin_tools.spreadsheet_analysis.analyze_tool._get_session_files",
+            new=AsyncMock(return_value=session_value),
+        ),
+    )
+
+
+class TestExactMatchWins:
+    @pytest.mark.asyncio
+    async def test_exact_xlsx_match_in_session(self):
+        kb_p, sess_p = _patch_sources(
+            session=[_session_file("Report.xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")],
+        )
+        with kb_p, sess_p:
+            result = await _find_file("Report.xlsx", assistant_id=None, session_id="s1")
+            assert result is not None
+            assert result["filename"] == "Report.xlsx"
+
+    @pytest.mark.asyncio
+    async def test_exact_csv_match_in_kb(self):
+        kb_p, sess_p = _patch_sources(kb=[_kb_file("Q1.csv", "text/csv")])
+        with kb_p, sess_p:
+            result = await _find_file("Q1.csv", assistant_id="ast-1", session_id="s1")
+            assert result is not None
+            assert result["filename"] == "Q1.csv"
+            assert result["source"] == "knowledge_base"
+
+    @pytest.mark.asyncio
+    async def test_exact_match_case_insensitive(self):
+        kb_p, sess_p = _patch_sources(
+            session=[_session_file("Budget.XLSX", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")],
+        )
+        with kb_p, sess_p:
+            result = await _find_file("budget.xlsx", assistant_id=None, session_id="s1")
+            assert result is not None
+            assert result["filename"] == "Budget.XLSX"
+
+    @pytest.mark.asyncio
+    async def test_exact_match_preferred_over_alias(self):
+        """If both ``foo.xlsx`` and ``foo.csv`` exist and the model asks
+        for ``foo.csv``, exact match should win — no surprise aliasing.
+        """
+        kb_p, sess_p = _patch_sources(
+            session=[
+                _session_file("Data.xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
+                _session_file("Data.csv", "text/csv"),
+            ],
+        )
+        with kb_p, sess_p:
+            result = await _find_file("Data.csv", assistant_id=None, session_id="s1")
+            assert result is not None
+            assert result["filename"] == "Data.csv"
+
+
+class TestAliasPass:
+    @pytest.mark.asyncio
+    async def test_csv_request_resolves_xlsx_source(self):
+        """Model asked for ``foo.csv`` (sandbox filename), only ``foo.xlsx``
+        is attached. Alias pass finds it.
+        """
+        kb_p, sess_p = _patch_sources(
+            session=[_session_file("FY_27_Ledger.xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")],
+        )
+        with kb_p, sess_p:
+            result = await _find_file("FY_27_Ledger.csv", assistant_id=None, session_id="s1")
+            assert result is not None
+            assert result["filename"] == "FY_27_Ledger.xlsx"
+
+    @pytest.mark.asyncio
+    async def test_xlsx_request_resolves_csv_source(self):
+        """Reverse direction — model asked for ``foo.xlsx`` but only
+        ``foo.csv`` is attached (rare but handled).
+        """
+        kb_p, sess_p = _patch_sources(
+            session=[_session_file("Q3.csv", "text/csv")],
+        )
+        with kb_p, sess_p:
+            result = await _find_file("Q3.xlsx", assistant_id=None, session_id="s1")
+            assert result is not None
+            assert result["filename"] == "Q3.csv"
+
+    @pytest.mark.asyncio
+    async def test_alias_only_applies_to_tabular(self):
+        """``foo.pdf`` must not alias to ``foo.docx``. The alias pass is
+        gated on target being a tabular extension.
+        """
+        kb_p, sess_p = _patch_sources(
+            session=[_session_file("report.docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document")],
+        )
+        with kb_p, sess_p:
+            result = await _find_file("report.pdf", assistant_id=None, session_id="s1")
+            assert result is None
+
+    @pytest.mark.asyncio
+    async def test_alias_skips_non_tabular_candidate(self):
+        """Even if the target is tabular, candidates with a non-tabular
+        content type shouldn't match. Prevents e.g. the alias pass
+        bleeding a ``.pdf`` into a ``.csv`` request.
+        """
+        kb_p, sess_p = _patch_sources(
+            session=[_session_file("data.pdf", "application/pdf")],
+        )
+        with kb_p, sess_p:
+            result = await _find_file("data.csv", assistant_id=None, session_id="s1")
+            assert result is None
+
+
+class TestSourceOrder:
+    @pytest.mark.asyncio
+    async def test_kb_checked_before_session(self):
+        """When assistant_id is set, KB files are consulted first. This
+        matches behavior documented in the tool: the KB is the
+        authoritative source for assistants.
+        """
+        kb_p, sess_p = _patch_sources(
+            kb=[_kb_file("shared.csv", "text/csv")],
+            session=[_session_file("shared.csv", "text/csv")],
+        )
+        with kb_p, sess_p:
+            result = await _find_file("shared.csv", assistant_id="ast-1", session_id="s1")
+            assert result is not None
+            assert result["source"] == "knowledge_base"
+
+    @pytest.mark.asyncio
+    async def test_no_assistant_skips_kb_lookup(self):
+        """With ``assistant_id=None``, KB is not queried — only session
+        files. Avoids spurious DynamoDB calls on non-assistant chats.
+        """
+        kb_mock = AsyncMock(return_value=[_kb_file("only-in-kb.csv", "text/csv")])
+        sess_mock = AsyncMock(return_value=[_session_file("only-in-session.csv", "text/csv")])
+        with patch(
+            "agents.builtin_tools.spreadsheet_analysis.analyze_tool._get_kb_files",
+            new=kb_mock,
+        ), patch(
+            "agents.builtin_tools.spreadsheet_analysis.analyze_tool._get_session_files",
+            new=sess_mock,
+        ):
+            result = await _find_file("only-in-kb.csv", assistant_id=None, session_id="s1")
+            kb_mock.assert_not_called()
+            # KB file isn't visible; only session files considered.
+            assert result is None
+
+    @pytest.mark.asyncio
+    async def test_returns_none_when_not_found(self):
+        kb_p, sess_p = _patch_sources()
+        with kb_p, sess_p:
+            assert await _find_file("nope.csv", assistant_id="ast-1", session_id="s1") is None
diff --git a/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_helpers.py b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_helpers.py
new file mode 100644
index 00000000..c723a8bf
--- /dev/null
+++ b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_helpers.py
@@ -0,0 +1,149 @@
+"""Unit tests for the small pure helpers in analyze_tool.py.
+
+These cover the boring-but-critical glue: output truncation, schema-marker
+extraction, sheet-name sanitization, and safe int parsing. The logic is
+simple so the tests are small — their job is to lock in the current
+behavior so the async refactor doesn't regress the happy paths (#261).
+"""
+
+from agents.builtin_tools.spreadsheet_analysis.analyze_tool import (
+    MAX_OUTPUT_CHARS,
+    _extract_schema_preview,
+    _safe_int,
+    _sanitize_sheet_name,
+    _truncate_output,
+    _SCHEMA_MARKER,
+)
+
+
+class TestTruncateOutput:
+    def test_empty_returns_empty(self):
+        assert _truncate_output("") == ""
+
+    def test_none_returns_none(self):
+        # The helper short-circuits on falsy inputs. Preserve that.
+        assert _truncate_output(None) is None  # type: ignore[arg-type]
+
+    def test_under_cap_unchanged(self):
+        text = "x" * (MAX_OUTPUT_CHARS - 1)
+        assert _truncate_output(text) == text
+
+    def test_at_cap_unchanged(self):
+        text = "x" * MAX_OUTPUT_CHARS
+        assert _truncate_output(text) == text
+
+    def test_over_cap_truncated_with_marker(self):
+        text = "x" * (MAX_OUTPUT_CHARS + 500)
+        out = _truncate_output(text)
+        assert out.startswith("x" * MAX_OUTPUT_CHARS)
+        assert "truncated" in out
+        assert f"{MAX_OUTPUT_CHARS:,}" in out
+        assert f"{len(text):,}" in out
+
+
+class TestExtractSchemaPreview:
+    def test_no_marker_returns_empty_block_and_full_stdout(self):
+        stdout = "some tool output\nwith no marker\n"
+        schema, remaining = _extract_schema_preview(stdout)
+        assert schema == ""
+        assert remaining == stdout
+
+    def test_full_block_between_markers(self):
+        stdout = (
+            f"{_SCHEMA_MARKER}\n"
+            "file: data.csv (10 rows x 3 cols)\n"
+            "columns: a, b, c\n"
+            f"{_SCHEMA_MARKER}\n"
+        )
+        schema, remaining = _extract_schema_preview(stdout)
+        assert "file: data.csv" in schema
+        assert "columns: a, b, c" in schema
+        # Nothing but whitespace should remain outside the schema block.
+        assert remaining.strip() == ""
+
+    def test_schema_surrounded_by_user_output(self):
+        """User code may print before AND after the schema block.
+
+        The helper should pull out just the schema and preserve both sides
+        of the user stdout — important because the tool concatenates the
+        two halves back together when rendering the final response.
+        """
+        stdout = (
+            "Hello from user code\n"
+            f"{_SCHEMA_MARKER}\n"
+            "file: data.csv\n"
+            f"{_SCHEMA_MARKER}\n"
+            "After schema user output\n"
+        )
+        schema, remaining = _extract_schema_preview(stdout)
+        assert "file: data.csv" in schema
+        assert "Hello from user code" in remaining
+        assert "After schema user output" in remaining
+
+    def test_marker_present_only_once_returns_empty_block(self):
+        """A single marker (not bracketed) is malformed — treat as no schema.
+
+        Prevents us from accidentally surfacing half of a stream as a
+        "schema" when the bootstrap failed mid-emit.
+        """
+        stdout = f"partial {_SCHEMA_MARKER}\ntruncated"
+        schema, remaining = _extract_schema_preview(stdout)
+        assert schema == ""
+        # Original stdout returned on the malformed path
+        assert remaining == stdout
+
+
+class TestSafeInt:
+    def test_parses_int(self):
+        assert _safe_int("42") == 42
+
+    def test_parses_large_int(self):
+        assert _safe_int("1000000") == 1_000_000
+
+    def test_strips_whitespace(self):
+        assert _safe_int("  7  ") == 7
+
+    def test_returns_zero_for_empty(self):
+        assert _safe_int("") == 0
+
+    def test_returns_zero_for_garbage(self):
+        assert _safe_int("not-a-number") == 0
+
+    def test_returns_zero_for_none(self):
+        assert _safe_int(None) == 0  # type: ignore[arg-type]
+
+    def test_parses_negative(self):
+        assert _safe_int("-5") == -5
+
+
+class TestSanitizeSheetName:
+    def test_simple_name_lowercased(self):
+        assert _sanitize_sheet_name("Summary") == "summary"
+
+    def test_spaces_become_underscore(self):
+        assert _sanitize_sheet_name("Q1 2026") == "q1_2026"
+
+    def test_multiple_non_alnum_collapse_to_single_underscore(self):
+        assert _sanitize_sheet_name("Q1   ---  2026") == "q1_2026"
+
+    def test_slashes_replaced(self):
+        assert _sanitize_sheet_name("Sales/2026") == "sales_2026"
+
+    def test_unicode_replaced(self):
+        # Non-ASCII characters aren't in [A-Za-z0-9] so they all collapse.
+        assert _sanitize_sheet_name("Ñiño") == "i_o"
+
+    def test_leading_trailing_punctuation_stripped(self):
+        assert _sanitize_sheet_name("--Budget--") == "budget"
+
+    def test_empty_returns_fallback(self):
+        assert _sanitize_sheet_name("") == "sheet"
+
+    def test_all_punctuation_returns_fallback(self):
+        # Everything collapses to "" post-strip, fallback kicks in.
+        assert _sanitize_sheet_name("---") == "sheet"
+
+    def test_deterministic(self):
+        # Same input always yields same output — callers rely on this
+        # to predict filenames.
+        assert _sanitize_sheet_name("Q1 2026") == _sanitize_sheet_name("Q1 2026")
diff --git a/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_list_spreadsheets.py b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_list_spreadsheets.py
new file mode 100644
index 00000000..6acc21cc
--- /dev/null
+++ b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_list_spreadsheets.py
@@ -0,0 +1,388 @@
+"""Tests for the ``list_spreadsheets`` tool factory and its two private
+helpers (``_get_kb_files``, ``_get_session_files``).
+
+The factory (``make_list_spreadsheets_tool``) builds a closure-bound tool
+the agent can invoke. Helper-level tests exercise the real boto3 /
+DynamoDB paths via moto so the query / filter / field-mapping logic is
+under test, not mocked out.
+
+After #260 the helpers and the tool itself are ``async def``; tests that
+invoke them are marked with ``@pytest.mark.asyncio`` and use ``await``.
+
+See #261.
+"""
+
+import pytest
+
+from agents.builtin_tools.spreadsheet_analysis.list_spreadsheets_tool import (
+    _get_kb_files,
+    _get_session_files,
+    _is_tabular_file,
+    make_list_spreadsheets_tool,
+)
+
+
+# ---------------------------------------------------------------------------
+# _is_tabular_file — thin wrapper; delegates to the shared is_tabular_file
+# ---------------------------------------------------------------------------
+
+
+class TestIsTabularFile:
+    def test_csv_by_extension(self):
+        assert _is_tabular_file("data.csv", "") is True
+
+    def test_csv_by_mime(self):
+        assert _is_tabular_file("anything", "text/csv") is True
+
+    def test_xlsx_by_extension(self):
+        assert _is_tabular_file("data.xlsx", "") is True
+
+    def test_xlsx_by_mime(self):
+        mime = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+        assert _is_tabular_file("anything", mime) is True
+
+    def test_pdf_rejected(self):
+        assert _is_tabular_file("report.pdf", "application/pdf") is False
+
+    def test_docx_rejected(self):
+        docx_mime = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+        assert _is_tabular_file("report.docx", docx_mime) is False
+
+
+# ---------------------------------------------------------------------------
+# list_spreadsheets tool — factory + invocation
+# ---------------------------------------------------------------------------
+
+
+async def _call_tool(tool) -> dict:
+    """Invoke a Strands-decorated async tool and return its awaited result.
+
+    ``@tool`` returns a wrapper that exposes the original coroutine
+    function via ``__wrapped__``. We ``await`` it here, so callers
+    ``await`` this helper from tests marked ``@pytest.mark.asyncio``.
+    """
+    fn = getattr(tool, "__wrapped__", None) or tool
+    return await fn()
+
+
+class TestMakeListSpreadsheetsTool:
+    @pytest.mark.asyncio
+    async def test_empty_state_returns_helpful_message(self, file_sources):
+        set_kb, set_session = file_sources
+        set_kb([])
+        set_session([])
+
+        tool = make_list_spreadsheets_tool(
+            assistant_id="ast-1", session_id="s1", user_id="u1"
+        )
+        result = await _call_tool(tool)
+
+        assert result["status"] == "success"
+        text = result["content"][0]["text"]
+        assert "No spreadsheet files" in text
+        # "files" key should NOT be present on the empty path so the model
+        # doesn't loop on an empty list.
+        assert "files" not in result
+
+    @pytest.mark.asyncio
+    async def test_kb_and_session_files_merged(self, file_sources, file_factories):
+        set_kb, set_session = file_sources
+        set_kb([file_factories["kb_xlsx"]("Budget.xlsx")])
+        set_session([file_factories["session_csv"]("notes.csv")])
+
+        tool = make_list_spreadsheets_tool(
+            assistant_id="ast-1", session_id="s1", user_id="u1"
+        )
+        result = await _call_tool(tool)
+
+        assert result["status"] == "success"
+        filenames = [f["filename"] for f in result["files"]]
+        assert filenames == ["Budget.xlsx", "notes.csv"]
+
+        text = result["content"][0]["text"]
+        assert "Budget.xlsx" in text
+        assert "knowledge_base" in text
+        assert "notes.csv" in text
+        assert "chat_attachment" in text
+
+    @pytest.mark.asyncio
+    async def test_no_assistant_skips_kb_call(self, file_sources):
+        """Without an assistant_id, KB files aren't queried — locks in the
+        conditional branch so we don't regress and start spamming DynamoDB
+        on non-assistant chats.
+        """
+        from unittest.mock import patch
+
+        kb_calls = []
+
+        async def _track(_aid):
+            kb_calls.append(_aid)
+            return []
+
+        set_kb, set_session = file_sources
+        set_session([])
+        with patch(
+            "agents.builtin_tools.spreadsheet_analysis.list_spreadsheets_tool._get_kb_files",
+            side_effect=_track,
+        ):
+            tool = make_list_spreadsheets_tool(
+                assistant_id=None, session_id="s1", user_id="u1"
+            )
+            await _call_tool(tool)
+
+        assert kb_calls == [], "KB lookup should be skipped when assistant_id is None"
+
+    @pytest.mark.asyncio
+    async def test_size_formatted_in_kb(self, file_sources, file_factories):
+        """Files are rendered with their size in KB for the preview text.
+        Pinned so that a formatter change can't silently regress it.
+        """
+        set_kb, set_session = file_sources
+        set_kb([])
+        set_session([file_factories["session_csv"]("tiny.csv", size=2560)])
+
+        tool = make_list_spreadsheets_tool(
+            assistant_id=None, session_id="s1", user_id="u1"
+        )
+        result = await _call_tool(tool)
+        text = result["content"][0]["text"]
+        # 2560 bytes is exactly 2.5 KB: rounds to 3 KB or 2 KB depending on the rounding mode.
+        assert "3 KB" in text or "2 KB" in text  # accept either rounding
+
+
+# ---------------------------------------------------------------------------
+# _get_kb_files — DynamoDB query with status filter, via moto
+# ---------------------------------------------------------------------------
+
+
+XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+
+
+class TestGetKbFilesDynamoDB:
+    """Exercise the real DynamoDB query path against a moto-backed table.
+
+    This replaces the earlier MagicMock-based tests: those verified that
+    ``table.query`` was called, but didn't check that the schema
+    (attribute names, key-condition expression) matches what production
+    writes. Moto does.
+
+    ``_get_kb_files`` is ``async def`` (see #260) so each test awaits it.
+    """
+
+    @pytest.mark.asyncio
+    async def test_no_table_env_returns_empty(self, monkeypatch):
+        """Helper bails out cleanly when the env var isn't set at all,
+        rather than crashing on a missing table.
+        """
+        monkeypatch.delenv("DYNAMODB_ASSISTANTS_TABLE_NAME", raising=False)
+        assert await _get_kb_files("ast-1") == []
+
+    @pytest.mark.asyncio
+    async def test_completed_tabular_file_included(self, assistants_table, seed_kb_doc):
+        seed_kb_doc(
+            assistant_id="ast-1",
+            filename="Budget.xlsx",
+            content_type=XLSX_MIME,
+            size_bytes=1024,
+        )
+        files = await _get_kb_files("ast-1")
+        assert len(files) == 1
+        assert files[0]["filename"] == "Budget.xlsx"
+        assert files[0]["source"] == "knowledge_base"
+        assert files[0]["size_bytes"] == 1024
+
+    @pytest.mark.asyncio
+    async def test_non_tabular_file_filtered_out(self, assistants_table, seed_kb_doc):
+        seed_kb_doc(
+            assistant_id="ast-1",
+            filename="report.pdf",
+            content_type="application/pdf",
+        )
+        assert await _get_kb_files("ast-1") == []
+
+    @pytest.mark.asyncio
+    async def test_incomplete_status_filtered_out(self, assistants_table, seed_kb_doc):
+        seed_kb_doc(
+            assistant_id="ast-1",
+            filename="Pending.xlsx",
+            content_type=XLSX_MIME,
+            status="processing",  # not "complete"
+        )
+        assert await _get_kb_files("ast-1") == []
+
+    @pytest.mark.asyncio
+    async def test_mixed_statuses_filters_correctly(self, assistants_table, seed_kb_doc):
+        seed_kb_doc(assistant_id="ast-1", filename="done.csv",
+                    content_type="text/csv", status="complete")
+        seed_kb_doc(assistant_id="ast-1", filename="broken.csv",
+                    content_type="text/csv", status="failed")
+        seed_kb_doc(assistant_id="ast-1", filename="notes.txt",
+                    content_type="text/plain", status="complete")
+
+        files = await _get_kb_files("ast-1")
+        assert len(files) == 1
+        assert files[0]["filename"] == "done.csv"
+
+    @pytest.mark.asyncio
+    async def test_isolates_by_assistant_id(self, assistants_table, seed_kb_doc):
+        """The ``PK = AST#<id>`` key condition partitions by assistant.
+        Documents under a different assistant must not leak through.
+        """
+        seed_kb_doc(assistant_id="ast-1", filename="mine.csv",
+                    content_type="text/csv")
+        seed_kb_doc(assistant_id="ast-other", filename="theirs.csv",
+                    content_type="text/csv")
+
+        files = await _get_kb_files("ast-1")
+        assert [f["filename"] for f in files] == ["mine.csv"]
+
+    @pytest.mark.asyncio
+    async def test_dynamodb_exception_returns_empty(
+        self, aws_mocked, monkeypatch, caplog
+    ):
+        """Graceful degradation: a query failure shouldn't crash the
+        tool. Points the helper at a table that doesn't exist *within
+        moto* so the failure mode is the production-realistic
+        ``ResourceNotFoundException`` rather than a credentials error
+        (which would mask a real graceful-degradation regression).
+        """
+        import logging
+
+        monkeypatch.setenv("DYNAMODB_ASSISTANTS_TABLE_NAME", "nonexistent-table")
+        with caplog.at_level(logging.ERROR):
+            files = await _get_kb_files("ast-1")
+        assert files == []
+        # Verify we actually hit the exception branch: passing solely
+        # because the early-return fired would be a silent regression
+        # of the graceful-degradation contract that the async refactor
+        # (#260) must preserve.
+        assert any(
+            "ResourceNotFoundException" in record.getMessage()
+            or "not found" in record.getMessage().lower()
+            for record in caplog.records
+        ), f"expected error log, got: {[r.getMessage() for r in caplog.records]}"
+
+    @pytest.mark.asyncio
+    async def test_legacy_snake_case_fields_supported(self, assistants_table, seed_kb_doc):
+        """The repo stores camelCase but some legacy items use snake_case
+        aliases. Both must work.
+        """
+        seed_kb_doc(
+            assistant_id="ast-1",
+            filename="legacy.xlsx",
+            content_type=XLSX_MIME,
+            size_bytes=500,
+            use_snake_case=True,
+        )
+        files = await _get_kb_files("ast-1")
+        assert len(files) == 1
+        assert files[0]["filename"] == "legacy.xlsx"
+        assert files[0]["size_bytes"] == 500
+
+
+# ---------------------------------------------------------------------------
+# _get_session_files — async repo, via moto
+# ---------------------------------------------------------------------------
+
+
+class TestGetSessionFiles:
+    """Real repository queries against the moto-backed files table.
+
+    After #260, ``_get_session_files`` awaits the repository directly
+    instead of running ``asyncio.run`` inside a thread-pool. These
+    tests exercise the straightened-out async path end-to-end.
+    """
+
+    @pytest.mark.asyncio
+    async def test_returns_tabular_files_only(
+        self, file_repository, seed_session_file
+    ):
+        await seed_session_file(
+            session_id="s1", upload_id="u-xlsx",
+            filename="Budget.xlsx", mime_type=XLSX_MIME,
+        )
+        await seed_session_file(
+            session_id="s1", upload_id="u-md",
+            filename="README.md", mime_type="text/markdown",
+        )
+        await seed_session_file(
+            session_id="s1", upload_id="u-csv",
+            filename="data.csv", mime_type="text/csv",
+        )
+
+        files = await _get_session_files("s1")
+        filenames = {f["filename"] for f in files}
+        assert filenames == {"Budget.xlsx", "data.csv"}
+        assert "README.md" not in filenames
+
+    @pytest.mark.asyncio
+    async def test_empty_session_returns_empty(self, file_repository):
+        # No files seeded — list_session_files returns [].
+        assert await _get_session_files("s1") == []
+
+    @pytest.mark.asyncio
+    async def test_missing_table_env_returns_empty(self, aws_mocked, monkeypatch, caplog):
+        """Pointing the repo at a table that doesn't exist exercises the
+        exception path inside the async helper. Tool should return an
+        empty list, not crash.
+
+        Uses ``caplog`` to confirm the error was actually logged —
+        otherwise this test could regress silently if a future refactor
+        made the helper return ``[]`` without ever reaching the
+        exception branch.
+        """
+        import logging
+
+        # Reset the module-level singleton so the new env var is picked
+        # up on the next ``get_file_upload_repository()`` call — otherwise
+        # we inherit the repo bound to whatever table name another test
+        # happened to set first.
+        import apis.shared.files.repository as repo_module
+        monkeypatch.setattr(repo_module, "_repository_instance", None)
+        monkeypatch.setenv("DYNAMODB_USER_FILES_TABLE_NAME", "no-such-table")
+
+        with caplog.at_level(logging.ERROR):
+            files = await _get_session_files("s1")
+        assert files == []
+        assert any(
+            "ResourceNotFoundException" in record.getMessage()
+            or "not found" in record.getMessage().lower()
+            for record in caplog.records
+        ), f"expected error log, got: {[r.getMessage() for r in caplog.records]}"
+
+    @pytest.mark.asyncio
+    async def test_record_structure(
+        self, file_repository, seed_session_file
+    ):
+        """Session records need specific keys so analyze_tool._download_file
+        can find the S3 bucket/key. Lock the contract.
+        """
+        await seed_session_file(
+            session_id="s1", upload_id="u-1",
+            filename="Q1.csv", mime_type="text/csv",
+        )
+        files = await _get_session_files("s1")
+        assert files[0].keys() >= {
+            "filename", "source", "content_type", "size_bytes",
+            "document_id", "s3_key", "s3_bucket",
+        }
+        assert files[0]["source"] == "chat_attachment"
+
+    @pytest.mark.asyncio
+    async def test_isolates_by_session_id(
+        self, file_repository, seed_session_file
+    ):
+        """The session index must partition: a file attached to session
+        A should not appear in session B's list.
+        """
+        await seed_session_file(
+            session_id="s1", upload_id="u-a",
+            filename="a.csv", mime_type="text/csv",
+        )
+        await seed_session_file(
+            session_id="s2", upload_id="u-b",
+            filename="b.csv", mime_type="text/csv",
+        )
+
+        files = await _get_session_files("s1")
+        assert [f["filename"] for f in files] == ["a.csv"]
diff --git a/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_sheet_inventory.py b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_sheet_inventory.py
new file mode 100644
index 00000000..1b71a694
--- /dev/null
+++ b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_sheet_inventory.py
@@ -0,0 +1,307 @@
+"""Tests for ``_parse_sheet_inventory`` and ``_format_sheet_note`` — the
+parser for the XLSX bootstrap's pipe-delimited sheet inventory, and the
+markdown footer builder that surfaces it to the model.
+
+The inventory flows from the sandbox's stdout back to the tool response,
+so regressions here would either silently drop multi-sheet support or
+mis-report which sheets were included/skipped. See #261.
+"""
+
+from agents.builtin_tools.spreadsheet_analysis.analyze_tool import (
+    MAX_ROWS_PER_SHEET,
+    _SHEETS_MARKER,
+    _format_sheet_note,
+    _parse_sheet_inventory,
+)
+
+
+def _wrap_block(lines: list[str]) -> str:
+    """Helper: wrap inventory lines with the sheet markers as the bootstrap
+    would emit them.
+    """
+    body = "\n".join(lines)
+    return f"{_SHEETS_MARKER}\n{body}\n{_SHEETS_MARKER}\n"
+
+
+class TestParseSheetInventoryEmpty:
+    def test_no_marker_returns_empty_inventory(self):
+        result = _parse_sheet_inventory("some unrelated stdout")
+        assert result["total"] == 0
+        assert result["sheets"] == []
+        assert result["skipped"] == 0
+        assert result["has_primary_alias"] is False
+
+    def test_empty_string_returns_empty_inventory(self):
+        result = _parse_sheet_inventory("")
+        assert result["sheets"] == []
+
+    def test_single_marker_returns_empty_inventory(self):
+        """Malformed emission with only one marker — don't try to parse."""
+        result = _parse_sheet_inventory(f"partial {_SHEETS_MARKER}\nsheet|x|x|0|0|")
+        # With only one marker there is no bracketed block to parse.
+        # Whether the parser returns partial data or an empty structure
+        # is implementation-defined; the contract pinned here is only
+        # that it returns a dict and does not raise.
+        assert isinstance(result, dict)
+
+
+class TestParseSheetInventorySingleSheet:
+    def test_single_sheet_no_truncation(self):
+        stdout = _wrap_block([
+            "total: 1",
+            "converted: 1",
+            "skipped: 0",
+            "sheet|Summary|Budget.csv|100|0|",
+        ])
+        result = _parse_sheet_inventory(stdout)
+        assert result["total"] == 1
+        assert result["converted"] == 1
+        assert result["skipped"] == 0
+        assert len(result["sheets"]) == 1
+        assert result["sheets"][0]["name"] == "Summary"
+        assert result["sheets"][0]["path"] == "Budget.csv"
+        assert result["sheets"][0]["rows"] == 100
+        assert result["sheets"][0]["truncated"] is False
+        assert result["sheets"][0]["primary_alias"] is None
+        assert result["has_primary_alias"] is False
+
+    def test_truncation_flag_parsed(self):
+        stdout = _wrap_block([
+            "total: 1",
+            "converted: 1",
+            "skipped: 0",
+            "sheet|BigSheet|data.csv|500000|1|",
+        ])
+        result = _parse_sheet_inventory(stdout)
+        assert result["sheets"][0]["truncated"] is True
+
+
+class TestParseSheetInventoryMultiSheet:
+    def test_multi_sheet_with_primary_alias(self):
+        stdout = _wrap_block([
+            "total: 3",
+            "converted: 3",
+            "skipped: 0",
+            "sheet|Summary|Budget.summary.csv|12|0|Budget.csv",
+            "sheet|Transactions|Budget.transactions.csv|18551|0|",
+            "sheet|Notes|Budget.notes.csv|5|0|",
+        ])
+        result = _parse_sheet_inventory(stdout)
+        assert result["total"] == 3
+        assert result["converted"] == 3
+        assert len(result["sheets"]) == 3
+        assert result["has_primary_alias"] is True
+        assert result["sheets"][0]["primary_alias"] == "Budget.csv"
+        # Sibling sheets don't carry the alias.
+        assert result["sheets"][1]["primary_alias"] is None
+        assert result["sheets"][2]["primary_alias"] is None
+
+    def test_skipped_sheets_preview(self):
+        stdout = _wrap_block([
+            "total: 30",
+            "converted: 25",
+            "skipped: 5",
+            "skipped_names: ['Sheet26', 'Sheet27', 'Sheet28', 'Sheet29', 'Sheet30']",
+            "sheet|Sheet1|data.sheet1.csv|10|0|",
+        ])
+        result = _parse_sheet_inventory(stdout)
+        assert result["skipped"] == 5
+        assert result["skipped_preview"] == [
+            "Sheet26", "Sheet27", "Sheet28", "Sheet29", "Sheet30",
+        ]
+
+    def test_sheet_names_with_special_chars_via_literal_eval(self):
+        """Sheet names can contain commas, apostrophes, etc. The skipped_names
+        field is a Python list literal — ast.literal_eval handles quoting
+        correctly. This locks in the contract.
+        """
+        stdout = _wrap_block([
+            "total: 5",
+            "converted: 3",
+            "skipped: 2",
+            'skipped_names: ["O\'Brien, J.", "Q1, 2026"]',
+            "sheet|Main|data.main.csv|10|0|",
+        ])
+        result = _parse_sheet_inventory(stdout)
+        # Both names survive round-trip.
+        assert "O'Brien, J." in result["skipped_preview"]
+        assert "Q1, 2026" in result["skipped_preview"]
+
+    def test_malformed_skipped_names_gracefully_ignored(self):
+        """If the literal is invalid, we don't crash — we just skip it."""
+        stdout = _wrap_block([
+            "total: 10",
+            "converted: 5",
+            "skipped: 5",
+            "skipped_names: not-a-valid-literal",
+            "sheet|Main|data.main.csv|10|0|",
+        ])
+        result = _parse_sheet_inventory(stdout)
+        assert result["skipped_preview"] == []
+        # Other fields still populated.
+        assert result["total"] == 10
+
+
+class TestParseSheetInventoryMalformedSheetLines:
+    def test_truncated_sheet_line_skipped(self):
+        """A sheet line with fewer than 6 pipe-delimited fields is
+        skipped rather than crashing the parser.
+        """
+        stdout = _wrap_block([
+            "total: 2",
+            "converted: 2",
+            "skipped: 0",
+            "sheet|Valid|data.csv|10|0|",
+            "sheet|Broken|truncated",  # too few fields
+        ])
+        result = _parse_sheet_inventory(stdout)
+        # Only the valid sheet is kept.
+        assert len(result["sheets"]) == 1
+        assert result["sheets"][0]["name"] == "Valid"
+
+    def test_integer_fields_with_whitespace(self):
+        """``_safe_int`` handles surrounding whitespace — regression
+        guard: the parser strips on its own too.
+        """
+        stdout = _wrap_block([
+            "total:  42 ",
+            "converted:  10  ",
+            "skipped: 32",
+            "sheet|S|p.csv|  500  | 0 |",
+        ])
+        result = _parse_sheet_inventory(stdout)
+        assert result["total"] == 42
+        assert result["converted"] == 10
+        assert result["skipped"] == 32
+        assert result["sheets"][0]["rows"] == 500
+
+
+# ---------------------------------------------------------------------------
+# _format_sheet_note
+# ---------------------------------------------------------------------------
+
+
+class TestFormatSheetNoteSingleSheet:
+    def test_single_sheet_no_truncation_returns_empty(self):
+        """Single-sheet workbook without truncation is the boring case —
+        no message needed.
+        """
+        inventory = {
+            "total": 1,
+            "converted": 1,
+            "skipped": 0,
+            "skipped_preview": [],
+            "sheets": [
+                {"name": "Sheet1", "path": "data.csv", "rows": 100,
+                 "truncated": False, "primary_alias": None},
+            ],
+            "has_primary_alias": False,
+        }
+        assert _format_sheet_note(inventory) == ""
+
+    def test_single_sheet_truncated_surfaces_warning(self):
+        inventory = {
+            "total": 1,
+            "converted": 1,
+            "skipped": 0,
+            "skipped_preview": [],
+            "sheets": [
+                {"name": "BigSheet", "path": "data.csv",
+                 "rows": MAX_ROWS_PER_SHEET, "truncated": True,
+                 "primary_alias": None},
+            ],
+            "has_primary_alias": False,
+        }
+        note = _format_sheet_note(inventory)
+        assert note != ""
+        assert "truncated" in note.lower()
+        assert "BigSheet" in note
+        assert f"{MAX_ROWS_PER_SHEET:,}" in note
+
+
+class TestFormatSheetNoteMultiSheet:
+    def test_all_sheets_converted(self):
+        inventory = {
+            "total": 3,
+            "converted": 3,
+            "skipped": 0,
+            "skipped_preview": [],
+            "sheets": [
+                {"name": "Summary", "path": "Budget.summary.csv", "rows": 12,
+                 "truncated": False, "primary_alias": "Budget.csv"},
+                {"name": "Transactions", "path": "Budget.transactions.csv",
+                 "rows": 18551, "truncated": False, "primary_alias": None},
+                {"name": "Notes", "path": "Budget.notes.csv", "rows": 5,
+                 "truncated": False, "primary_alias": None},
+            ],
+            "has_primary_alias": True,
+        }
+        note = _format_sheet_note(inventory)
+        # Full inventory listed so the model can pick or combine.
+        assert "Available sheets" in note
+        assert "Summary" in note
+        assert "Transactions" in note
+        assert "Notes" in note
+        assert "Budget.summary.csv" in note
+        assert "Budget.transactions.csv" in note
+        assert "18,551" in note  # row count formatted with commas
+
+    def test_skipped_sheets_surfaced_with_names(self):
+        inventory = {
+            "total": 30,
+            "converted": 25,
+            "skipped": 5,
+            "skipped_preview": ["Q26", "Q27", "Q28", "Q29", "Q30"],
+            "sheets": [
+                {"name": f"Q{i + 1}", "path": f"Budget.q{i + 1}.csv",
+                 "rows": 100, "truncated": False, "primary_alias": None}
+                for i in range(25)
+            ],
+            "has_primary_alias": False,
+        }
+        note = _format_sheet_note(inventory)
+        assert "30 sheets" in note
+        assert "first 25" in note
+        assert "Q26" in note
+        assert "Q30" in note
+        # Tells the user what to do about it.
+        assert "split" in note.lower() or "export" in note.lower()
+
+    def test_skipped_many_includes_more_suffix(self):
+        inventory = {
+            "total": 100,
+            "converted": 25,
+            "skipped": 75,
+            "skipped_preview": ["A", "B", "C", "D", "E"],
+            "sheets": [
+                {"name": f"S{i}", "path": f"d.s{i}.csv", "rows": 1,
+                 "truncated": False, "primary_alias": None}
+                for i in range(25)
+            ],
+            "has_primary_alias": False,
+        }
+        note = _format_sheet_note(inventory)
+        assert "+70 more" in note  # 75 skipped - 5 shown = 70
+
+    def test_truncated_sheet_annotation_in_list(self):
+        inventory = {
+            "total": 2,
+            "converted": 2,
+            "skipped": 0,
+            "skipped_preview": [],
+            "sheets": [
+                {"name": "Huge", "path": "wb.huge.csv",
+                 "rows": MAX_ROWS_PER_SHEET, "truncated": True,
+                 "primary_alias": None},
+                {"name": "Small", "path": "wb.small.csv", "rows": 10,
+                 "truncated": False, "primary_alias": None},
+            ],
+            "has_primary_alias": False,
+        }
+        note = _format_sheet_note(inventory)
+        # The truncated row should have a specific tag; the other shouldn't.
+        lines = note.splitlines()
+        huge_line = next(line for line in lines if "Huge" in line)
+        small_line = next(line for line in lines if "Small" in line)
+        assert "truncated" in huge_line.lower()
+        assert "truncated" not in small_line.lower()
diff --git a/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_strip_first_row.py b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_strip_first_row.py
new file mode 100644
index 00000000..505f043b
--- /dev/null
+++ b/backend/tests/agents/builtin_tools/spreadsheet_analysis/test_strip_first_row.py
@@ -0,0 +1,70 @@
+"""Tests for ``_strip_first_row`` — drops the ``first_row:`` line from a
+schema footer on the error path to keep retry responses token-efficient.
+
+Simple helper but load-bearing: every analyze_spreadsheet error retry goes
+through it, and a bug here silently bloats every follow-up turn by ~1K
+tokens (#261).
+"""
+
+from agents.builtin_tools.spreadsheet_analysis.analyze_tool import _strip_first_row
+
+
+class TestStripFirstRow:
+    def test_drops_first_row_line(self):
+        schema = (
+            "file: data.csv (100 rows x 5 cols)\n"
+            "load: pd.read_csv('data.csv', low_memory=False)\n"
+            "columns: a, b, c, d, e\n"
+            "first_row: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}\n"
+        )
+        result = _strip_first_row(schema)
+        assert "first_row:" not in result
+        assert "file: data.csv" in result
+        assert "load:" in result
+        assert "columns:" in result
+
+    def test_no_first_row_line_unchanged(self):
+        """If the schema footer doesn't have a first_row line (malformed or
+        schema-preview-failed path), return it as-is. Don't lose structure
+        trying to remove something that isn't there.
+        """
+        schema = (
+            "file: data.csv (100 rows x 5 cols)\n"
+            "load: pd.read_csv('data.csv', low_memory=False)\n"
+            "columns: a, b, c, d, e"
+        )
+        result = _strip_first_row(schema)
+        assert result.count("\n") == schema.count("\n")
+        assert "file: data.csv" in result
+        assert "columns:" in result
+
+    def test_empty_input_returns_empty_string(self):
+        assert _strip_first_row("") == ""
+
+    def test_only_first_row_line_returns_empty(self):
+        assert _strip_first_row("first_row: {'a': 1}") == ""
+
+    def test_first_row_with_leading_whitespace_not_stripped(self):
+        """The helper is strict: only lines whose raw text starts with
+        ``first_row:`` are dropped. Indented variants (which we don't emit)
+        should pass through. Pinning this so a future "be more lenient"
+        change is deliberate.
+        """
+        schema = (
+            "file: data.csv\n"
+            "  first_row: {'indented': True}\n"
+            "columns: a, b"
+        )
+        result = _strip_first_row(schema)
+        assert "indented" in result
+
+    def test_preserves_line_ordering(self):
+        schema = (
+            "file: a\n"
+            "first_row: x\n"
+            "columns: z\n"
+            "note: extra\n"
+        )
+        lines = _strip_first_row(schema).splitlines()
+        # Only the first_row line should be gone; relative order preserved.
+        assert lines == ["file: a", "columns: z", "note: extra"]
diff --git a/backend/tests/agents/main_agent/core/test_model_config.py b/backend/tests/agents/main_agent/core/test_model_config.py
index 9489f909..5dfeafb2 100644
--- a/backend/tests/agents/main_agent/core/test_model_config.py
+++ b/backend/tests/agents/main_agent/core/test_model_config.py
@@ -98,9 +98,10 @@ def test_explicit_gemini_overrides_gpt_model_id(self):
 class TestToBedrockConfig:
     """Validates: Requirements 1.6, 1.7"""
 
-    def test_bedrock_config_with_caching_disabled_due_to_bedrock_limitation(self):
-        """Req 1.6 — caching_enabled=True but cache_config omitted due to
-        Bedrock limitation with non-PDF document blocks. See model_config.py TODO."""
+    def test_bedrock_config_with_caching_enabled_currently_omits_cache_config(self):
+        """Req 1.6 — caching_enabled=True but cache_config omitted while
+        Bedrock prompt caching rollout is deferred. The SDK-side blocker is
+        resolved in strands 1.39.0; see model_config.py for the deferral note."""
         cfg = ModelConfig(caching_enabled=True)
         result = cfg.to_bedrock_config()
 
@@ -179,6 +180,109 @@ def test_bedrock_config_thinking_disabled_passes_sampling_params_through(self):
         assert result["temperature"] == 0.5
         assert result["top_p"] == 0.8
 
+    @pytest.mark.parametrize(
+        "model_id",
+        [
+            "us.anthropic.claude-opus-4-7-20260115-v1:0",
+            "us.anthropic.claude-opus-4-6",
+            "us.anthropic.claude-sonnet-4-6",
+            "claude-mythos-preview",
+        ],
+    )
+    def test_bedrock_thinking_uses_adaptive_shape_on_newer_models(self, model_id):
+        """Opus 4.6/4.7, Sonnet 4.6 and Mythos require/recommend adaptive
+        thinking. Opus 4.7 rejects `{type:"enabled"}` with a 400, so the
+        int budget only signals "on" and the shape is `{type:"adaptive"}`.
+        `display:"summarized"` keeps the reasoning trace from going blank
+        (Opus 4.7 defaults display to "omitted")."""
+        cfg = ModelConfig(model_id=model_id, inference_params={"thinking": 4096})
+        result = cfg.to_bedrock_config()
+
+        assert result["additional_request_fields"]["thinking"] == {
+            "type": "adaptive",
+            "display": "summarized",
+        }
+        assert "budget_tokens" not in result["additional_request_fields"]["thinking"]
+
+    @pytest.mark.parametrize(
+        "model_id",
+        [
+            "us.anthropic.claude-sonnet-4-5-20250101-v1:0",
+            "claude-3-opus",
+            "us.anthropic.claude-haiku-4-5-20251001-v1:0",
+        ],
+    )
+    def test_bedrock_thinking_keeps_legacy_enabled_shape_on_older_models(self, model_id):
+        """Older models (Sonnet 4.5, Claude 3, Haiku 4.5) still take the
+        legacy `{type:"enabled", budget_tokens:N}` shape — unchanged."""
+        cfg = ModelConfig(model_id=model_id, inference_params={"thinking": 4096})
+        result = cfg.to_bedrock_config()
+
+        assert result["additional_request_fields"]["thinking"] == {
+            "type": "enabled",
+            "budget_tokens": 4096,
+        }
+
+    def test_bedrock_adaptive_thinking_still_suppresses_sampling_params(self):
+        """Anthropic rejects temperature/top_p/top_k while extended thinking
+        is on regardless of mode — suppression still fires for adaptive."""
+        cfg = ModelConfig(
+            model_id="us.anthropic.claude-opus-4-7-20260115-v1:0",
+            inference_params={"thinking": 2048, "temperature": 0.7, "top_p": 0.9},
+        )
+        result = cfg.to_bedrock_config()
+
+        assert "temperature" not in result
+        assert "top_p" not in result
+        assert result["additional_request_fields"]["thinking"]["type"] == "adaptive"
+
+    def test_bedrock_effort_maps_to_output_config(self):
+        """`effort` rides through additional_request_fields as Anthropic's
+        top-level `output_config.effort` — not a Converse standard field."""
+        cfg = ModelConfig(
+            model_id="us.anthropic.claude-opus-4-7-20260115-v1:0",
+            inference_params={"effort": "xhigh"},
+        )
+        result = cfg.to_bedrock_config()
+
+        assert result["additional_request_fields"]["output_config"]["effort"] == "xhigh"
+        assert "effort" not in result
+        assert "output_config" not in result
+
+    def test_bedrock_effort_and_adaptive_thinking_coexist(self):
+        """effort and adaptive thinking are independent knobs — both land
+        under additional_request_fields together."""
+        cfg = ModelConfig(
+            model_id="us.anthropic.claude-opus-4-7-20260115-v1:0",
+            inference_params={"thinking": 2048, "effort": "high"},
+        )
+        result = cfg.to_bedrock_config()
+
+        arf = result["additional_request_fields"]
+        assert arf["thinking"] == {"type": "adaptive", "display": "summarized"}
+        assert arf["output_config"]["effort"] == "high"
+
+    def test_bedrock_config_coerces_float_max_tokens_to_int(self):
+        """JSON-sourced inference params can carry a float (100000.0); the
+        Bedrock SDK rejects a float maxTokens, so it must be coerced to int."""
+        cfg = ModelConfig(inference_params={"max_tokens": 100000.0, "top_k": 40.0})
+        result = cfg.to_bedrock_config()
+
+        assert result["max_tokens"] == 100000
+        assert isinstance(result["max_tokens"], int)
+        assert result["additional_request_fields"]["top_k"] == 40
+        assert isinstance(result["additional_request_fields"]["top_k"], int)
+
+    def test_gemini_config_coerces_float_max_tokens_to_int(self):
+        """Coercion applies across providers — Gemini max_output_tokens too."""
+        cfg = ModelConfig(
+            model_id="gemini-pro", inference_params={"max_tokens": 2048.0}
+        )
+        result = cfg.to_gemini_config()
+
+        assert result["params"]["max_output_tokens"] == 2048
+        assert isinstance(result["params"]["max_output_tokens"], int)
+
     def test_bedrock_config_drops_unknown_canonical_param(self):
         """Provider translation table silently drops keys it doesn't know."""
         cfg = ModelConfig(inference_params={"reasoning_effort": "high"})
diff --git a/backend/tests/agents/main_agent/integrations/test_mcp_apps.py b/backend/tests/agents/main_agent/integrations/test_mcp_apps.py
new file mode 100644
index 00000000..46063cae
--- /dev/null
+++ b/backend/tests/agents/main_agent/integrations/test_mcp_apps.py
@@ -0,0 +1,500 @@
+"""Tests for MCP Apps host support (host-renderer initiative, PRs #2 and #3).
+
+Covers the PR #2 acceptance criteria from
+`docs/kaizen/scoping/mcp-apps-host-renderer.md`:
+
+  (a) `io.modelcontextprotocol/ui` is advertised on the outbound MCP
+      `initialize` when the host flag is on, and absent when it is off.
+  (b) A tool whose `_meta.ui.visibility` excludes `"model"` is filtered out
+      of the Strands tool list (external client + gateway filtered client).
+  (c) `_meta.ui.resourceUri` survives the round-trip into our tool catalog.
+  (d) Ordinary tools and default-visibility (`["model", "app"]`) tools are
+      unaffected.
+
+The fake-MCP-server surface is a `super().list_tools_sync()` stub returning
+UI-bearing tools, mirroring the mock-the-boundary style already used in
+`test_external_mcp_client.py`.
+"""
+
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, patch
+
+import anyio
+import mcp.types as mcp_types
+import pytest
+import strands.tools.mcp.mcp_client as strands_mcp_client_mod
+from mcp.shared.session import BaseSession
+from strands.types import PaginatedList
+
+from agents.main_agent.integrations import mcp_apps
+from agents.main_agent.integrations.mcp_apps import (
+    MCP_APPS_UI_EXTENSION_KEY,
+    MCP_APPS_UI_MIME_TYPE,
+    UICapableMCPClient,
+    _UIExtensionClientSession,
+    ensure_ui_extension_session_patch,
+    fetch_ui_resource,
+    get_ui_tool_catalog,
+    record_and_filter_ui_tools,
+)
+from agents.main_agent.integrations.gateway_mcp_client import FilteredMCPClient
+from apis.shared.tools.models import DEFAULT_TOOL_VISIBILITY, ToolUIMetadata
+
+_ENV_FLAG = "AGENTCORE_MCP_APPS_HOST_ENABLED"
+_ENV_SANDBOX_ORIGIN = "AGENTCORE_MCP_APPS_SANDBOX_ORIGIN"
+
+
+@pytest.fixture
+def mcp_apps_clean(monkeypatch):
+    """Isolate the global catalog and the strands ClientSession symbol."""
+    get_ui_tool_catalog().clear()
+    original_session = strands_mcp_client_mod.ClientSession
+    monkeypatch.delenv(_ENV_FLAG, raising=False)
+    monkeypatch.delenv(_ENV_SANDBOX_ORIGIN, raising=False)
+    try:
+        yield
+    finally:
+        strands_mcp_client_mod.ClientSession = original_session
+        get_ui_tool_catalog().clear()
+
+
+def _fake_tool(tool_name, ui=None, mcp_name=None):
+    """An MCPAgentTool stand-in: it carries the raw mcp tool with `_meta`."""
+    meta = {"ui": ui} if ui is not None else None
+    return SimpleNamespace(
+        tool_name=tool_name,
+        mcp_tool=SimpleNamespace(name=mcp_name or tool_name, meta=meta),
+    )
+
+
+# ── ToolUIMetadata.from_meta ──────────────────────────────────────────────────
+
+
+class TestToolUIMetadataParsing:
+    def test_returns_none_for_non_ui_tool(self):
+        assert ToolUIMetadata.from_meta(None) is None
+        assert ToolUIMetadata.from_meta({}) is None
+        assert ToolUIMetadata.from_meta({"other": 1}) is None
+
+    def test_absent_visibility_defaults_to_spec_default(self):
+        ui = ToolUIMetadata.from_meta({"ui": {"resourceUri": "ui://x/y"}})
+        assert ui is not None
+        assert ui.resource_uri == "ui://x/y"
+        assert ui.visibility == DEFAULT_TOOL_VISIBILITY
+        assert ui.visible_to_model() is True
+
+    def test_app_only_visibility_hides_from_model(self):
+        ui = ToolUIMetadata.from_meta(
+            {"ui": {"resourceUri": "ui://x/y", "visibility": ["app"]}}
+        )
+        assert ui.visibility == ["app"]
+        assert ui.visible_to_model() is False
+
+    def test_raw_payload_is_retained_verbatim(self):
+        raw = {
+            "resourceUri": "ui://x/y",
+            "visibility": ["model", "app"],
+            "csp": {"connectDomains": ["https://example.com"]},
+        }
+        ui = ToolUIMetadata.from_meta({"ui": raw})
+        assert ui.raw == raw
+
+
+# ── (a) initialize advertises the UI extension ────────────────────────────────
+
+
+async def _run_initialize(monkeypatch, *, enabled):
+    """Drive _UIExtensionClientSession.initialize() with I/O stubbed out and
+    return the ClientCapabilities that went onto the wire."""
+    if enabled:
+        monkeypatch.setenv(_ENV_FLAG, "true")
+    else:
+        monkeypatch.setenv(_ENV_FLAG, "false")
+
+    captured: dict = {}
+
+    async def fake_send_request(request, result_type, *a, **k):
+        captured["request"] = request
+        return mcp_types.InitializeResult(
+            protocolVersion=mcp_types.LATEST_PROTOCOL_VERSION,
+            capabilities=mcp_types.ServerCapabilities(),
+            serverInfo=mcp_types.Implementation(name="fake-server", version="1"),
+        )
+
+    send_a, recv_a = anyio.create_memory_object_stream(1)
+    send_b, recv_b = anyio.create_memory_object_stream(1)
+    session = _UIExtensionClientSession(recv_a, send_b)
+
+    with patch.object(
+        BaseSession, "send_request", new=AsyncMock(side_effect=fake_send_request)
+    ), patch.object(BaseSession, "send_notification", new=AsyncMock()):
+        await session.initialize()
+
+    request = captured["request"]
+    caps = request.root.params.capabilities
+    return caps.model_dump(by_alias=True, exclude_none=True)
+
+
+@pytest.mark.asyncio
+async def test_initialize_advertises_ui_extension_when_enabled(
+    mcp_apps_clean, monkeypatch
+):
+    caps = await _run_initialize(monkeypatch, enabled=True)
+
+    assert caps.get("extensions", {}).get(MCP_APPS_UI_EXTENSION_KEY) == {
+        "mimeTypes": [MCP_APPS_UI_MIME_TYPE]
+    }
+    assert MCP_APPS_UI_MIME_TYPE == "text/html;profile=mcp-app"
+
+
+@pytest.mark.asyncio
+async def test_initialize_omits_ui_extension_when_disabled(
+    mcp_apps_clean, monkeypatch
+):
+    caps = await _run_initialize(monkeypatch, enabled=False)
+
+    assert MCP_APPS_UI_EXTENSION_KEY not in caps.get("extensions", {})
+
+
+# ── ClientSession symbol patch ────────────────────────────────────────────────
+
+
+class TestSessionPatch:
+    def test_ensure_patch_substitutes_strands_client_session(self, mcp_apps_clean):
+        ensure_ui_extension_session_patch()
+        assert (
+            strands_mcp_client_mod.ClientSession is _UIExtensionClientSession
+        )
+
+    def test_constructing_ui_capable_client_installs_patch(self, mcp_apps_clean):
+        strands_mcp_client_mod.ClientSession = (
+            mcp_apps._UIExtensionClientSession.__bases__[0]
+        )
+        UICapableMCPClient(lambda: None)
+        assert (
+            strands_mcp_client_mod.ClientSession is _UIExtensionClientSession
+        )
+
+
+# ── (b)(c)(d) record + visibility filter ─────────────────────────────────────
+
+
+class TestRecordAndFilter:
+    def test_passthrough_when_flag_disabled(self, mcp_apps_clean, monkeypatch):
+        monkeypatch.setenv(_ENV_FLAG, "false")
+        tools = [
+            _fake_tool("app_only", ui={"resourceUri": "ui://a", "visibility": ["app"]}),
+            _fake_tool("plain"),
+        ]
+
+        result = record_and_filter_ui_tools(tools)
+
+        # Inert: nothing filtered, nothing recorded.
+        assert result == tools
+        assert get_ui_tool_catalog().snapshot() == {}
+
+    def test_filters_app_only_and_records_metadata(
+        self, mcp_apps_clean, monkeypatch
+    ):
+        monkeypatch.setenv(_ENV_FLAG, "true")
+        tools = [
+            _fake_tool(
+                "app_widget",
+                ui={"resourceUri": "ui://app/widget", "visibility": ["app"]},
+            ),
+            _fake_tool(
+                "panel",
+                ui={"resourceUri": "ui://app/panel"},  # default visibility
+            ),
+            _fake_tool(
+                "dual",
+                ui={"resourceUri": "ui://app/dual", "visibility": ["model", "app"]},
+            ),
+            _fake_tool("plain"),  # ordinary, no _meta.ui
+        ]
+
+        result = record_and_filter_ui_tools(tools)
+
+        kept = {t.tool_name for t in result}
+        # (b) app-only hidden from the model; (d) the rest unaffected.
+        assert kept == {"panel", "dual", "plain"}
+
+        catalog = get_ui_tool_catalog()
+        # (c) resourceUri survives the round-trip into our tool catalog,
+        # including for the app-only tool we hide from the model.
+        assert catalog.get("app_widget").resource_uri == "ui://app/widget"
+        assert catalog.get("app_widget").visible_to_model() is False
+        assert catalog.get("panel").resource_uri == "ui://app/panel"
+        assert catalog.get("panel").visibility == DEFAULT_TOOL_VISIBILITY
+        assert catalog.get("dual").resource_uri == "ui://app/dual"
+        # Ordinary tools are never recorded.
+        assert catalog.get("plain") is None
+
+
+# ── external client: UICapableMCPClient.list_tools_sync ───────────────────────
+
+
+class TestUICapableMCPClientListTools:
+    @pytest.mark.asyncio
+    async def test_list_tools_sync_filters_and_preserves_pagination(
+        self, mcp_apps_clean, monkeypatch
+    ):
+        monkeypatch.setenv(_ENV_FLAG, "true")
+        client = UICapableMCPClient(lambda: None)
+
+        fake_page = PaginatedList(
+            [
+                _fake_tool(
+                    "app_only",
+                    ui={"resourceUri": "ui://srv/app", "visibility": ["app"]},
+                ),
+                _fake_tool("normal"),
+            ],
+            token="next-page",
+        )
+
+        with patch.object(
+            strands_mcp_client_mod.MCPClient,
+            "list_tools_sync",
+            return_value=fake_page,
+        ):
+            result = client.list_tools_sync()
+
+        assert [t.tool_name for t in result] == ["normal"]
+        assert result.pagination_token == "next-page"
+        assert (
+            get_ui_tool_catalog().get("app_only").resource_uri == "ui://srv/app"
+        )
+
+
+# ── gateway client: FilteredMCPClient applies the same filter ─────────────────
+
+
+class TestFilteredGatewayClientUIFilter:
+    @pytest.mark.asyncio
+    async def test_gateway_filtered_client_hides_app_only_tool(
+        self, mcp_apps_clean, monkeypatch
+    ):
+        monkeypatch.setenv(_ENV_FLAG, "true")
+        client = FilteredMCPClient(
+            lambda: None,
+            enabled_tool_ids=["app_only", "normal"],
+        )
+
+        fake_page = PaginatedList(
+            [
+                _fake_tool(
+                    "app_only",
+                    ui={"resourceUri": "ui://gw/app", "visibility": ["app"]},
+                ),
+                _fake_tool("normal"),
+            ],
+            token=None,
+        )
+
+        # Patch the grandparent MCPClient.list_tools_sync so FilteredMCPClient's
+        # own override runs (enabled-id filter -> UI visibility filter).
+        with patch.object(
+            strands_mcp_client_mod.MCPClient,
+            "list_tools_sync",
+            return_value=fake_page,
+        ):
+            result = client.list_tools_sync()
+
+        assert [t.tool_name for t in result] == ["normal"]
+        assert get_ui_tool_catalog().get("app_only").resource_uri == "ui://gw/app"
+
+
+# ── PR #3: resources/read fetch path + ui_resource payload ───────────────────
+
+
+class _FakeMCPClient:
+    """Stand-in for a Strands MCPClient at the `resources/read` boundary.
+
+    Mirrors the mock-the-boundary style in `test_external_mcp_client.py`:
+    the unit under test never starts a real session — it only calls
+    `read_resource_sync`, which we record and stub.
+    """
+
+    def __init__(self, result=None, raises: Exception | None = None) -> None:
+        self._result = result
+        self._raises = raises
+        self.read_calls: list = []
+
+    def read_resource_sync(self, uri):
+        self.read_calls.append(uri)
+        if self._raises is not None:
+            raise self._raises
+        return self._result
+
+
+def _html_resource(
+    *, text="<h1>widget</h1>", mime=MCP_APPS_UI_MIME_TYPE, ui_meta=None
+):
+    """A real `mcp.types.ReadResourceResult` — proves our extraction works
+    against the actual MCP SDK shape, not just a duck-typed fake."""
+    kwargs = {"uri": "ui://srv/widget", "mimeType": mime, "text": text}
+    if ui_meta is not None:
+        kwargs["_meta"] = {MCP_APPS_UI_EXTENSION_KEY: ui_meta}
+    return mcp_types.ReadResourceResult(
+        contents=[mcp_types.TextResourceContents(**kwargs)]
+    )
+
+
+def _seed_catalog(monkeypatch, *, ui, client):
+    """Record a UI tool + its hosting client exactly the way a live
+    `list_tools_sync` would (so the client-passing path is exercised too)."""
+    monkeypatch.setenv(_ENV_FLAG, "true")
+    record_and_filter_ui_tools([_fake_tool("widget", ui=ui)], client=client)
+
+
+class TestFetchUIResource:
+    def test_fetches_via_resources_read_and_inlines_html(
+        self, mcp_apps_clean, monkeypatch
+    ):
+        client = _FakeMCPClient(
+            result=_html_resource(
+                ui_meta={
+                    "csp": {"connectDomains": ["https://api.test"]},
+                    # SEP-1865: permissions is an OBJECT, not a list.
+                    "permissions": {"clipboardWrite": {}},
+                }
+            )
+        )
+        _seed_catalog(
+            monkeypatch,
+            ui={"resourceUri": "ui://srv/widget"},
+            client=client,
+        )
+
+        payload = fetch_ui_resource("widget", "tu-1")
+
+        # Spec MUST: the HTML is fetched via resources/read against the
+        # hosting client, addressed by the `ui://` URI; we never synthesize it.
+        assert client.read_calls == ["ui://srv/widget"]
+        assert payload == {
+            "type": "ui_resource",
+            "toolUseId": "tu-1",
+            "resourceUri": "ui://srv/widget",
+            "html": "<h1>widget</h1>",
+            "mimeType": MCP_APPS_UI_MIME_TYPE,
+            "csp": {"connectDomains": ["https://api.test"]},
+            "permissions": {"clipboardWrite": {}},
+            # Empty when the mcp-sandbox stack origin isn't wired into env.
+            "sandboxOrigin": "",
+        }
+
+    def test_carries_sandbox_origin_from_env(
+        self, mcp_apps_clean, monkeypatch
+    ):
+        client = _FakeMCPClient(result=_html_resource())
+        _seed_catalog(
+            monkeypatch, ui={"resourceUri": "ui://srv/widget"}, client=client
+        )
+        monkeypatch.setenv(
+            _ENV_SANDBOX_ORIGIN, "https://mcp-sandbox.example.com"
+        )
+
+        payload = fetch_ui_resource("widget", "tu-1")
+        assert payload["sandboxOrigin"] == "https://mcp-sandbox.example.com"
+
+    def test_inert_when_flag_disabled(self, mcp_apps_clean, monkeypatch):
+        client = _FakeMCPClient(result=_html_resource())
+        _seed_catalog(
+            monkeypatch, ui={"resourceUri": "ui://srv/widget"}, client=client
+        )
+        # Flag flipped off *after* catalog seeding: the fetch path itself
+        # must stay inert regardless of catalog contents.
+        monkeypatch.setenv(_ENV_FLAG, "false")
+
+        assert fetch_ui_resource("widget", "tu-1") is None
+        assert client.read_calls == []
+
+    def test_none_for_unknown_or_non_ui_tool(self, mcp_apps_clean, monkeypatch):
+        monkeypatch.setenv(_ENV_FLAG, "true")
+        assert fetch_ui_resource("never-seen", "tu-1") is None
+
+    def test_none_when_no_hosting_client_recorded(
+        self, mcp_apps_clean, monkeypatch
+    ):
+        # Metadata recorded without a client (e.g. PR #2's catalog-only
+        # path) → we cannot issue resources/read, so no event.
+        _seed_catalog(
+            monkeypatch, ui={"resourceUri": "ui://srv/widget"}, client=None
+        )
+        assert fetch_ui_resource("widget", "tu-1") is None
+
+    def test_resources_read_failure_is_swallowed(
+        self, mcp_apps_clean, monkeypatch
+    ):
+        client = _FakeMCPClient(raises=RuntimeError("session not running"))
+        _seed_catalog(
+            monkeypatch, ui={"resourceUri": "ui://srv/widget"}, client=client
+        )
+        assert fetch_ui_resource("widget", "tu-1") is None
+        assert client.read_calls == ["ui://srv/widget"]
+
+    def test_none_when_resource_has_no_inline_html(
+        self, mcp_apps_clean, monkeypatch
+    ):
+        blob = mcp_types.ReadResourceResult(
+            contents=[
+                mcp_types.BlobResourceContents(
+                    uri="ui://srv/widget",
+                    mimeType="application/octet-stream",
+                    blob="AAAA",
+                )
+            ]
+        )
+        client = _FakeMCPClient(result=blob)
+        _seed_catalog(
+            monkeypatch, ui={"resourceUri": "ui://srv/widget"}, client=client
+        )
+        assert fetch_ui_resource("widget", "tu-1") is None
+
+    def test_csp_permissions_fall_back_to_tool_meta(
+        self, mcp_apps_clean, monkeypatch
+    ):
+        # Resource carries no `_meta.ui`; the tool's `tools/list` `_meta.ui`
+        # (retained verbatim by PR #2 in ToolUIMetadata.raw) supplies them.
+        client = _FakeMCPClient(result=_html_resource(ui_meta=None))
+        _seed_catalog(
+            monkeypatch,
+            ui={
+                "resourceUri": "ui://srv/widget",
+                "csp": {"frameDomains": ["https://embed.test"]},
+                "permissions": {"geolocation": {}},
+            },
+            client=client,
+        )
+
+        payload = fetch_ui_resource("widget", "tu-9")
+        assert payload is not None
+        assert payload["csp"] == {"frameDomains": ["https://embed.test"]}
+        assert payload["permissions"] == {"geolocation": {}}
+
+    def test_prefers_mcp_app_mime_when_multiple_text_contents(
+        self, mcp_apps_clean, monkeypatch
+    ):
+        result = mcp_types.ReadResourceResult(
+            contents=[
+                mcp_types.TextResourceContents(
+                    uri="ui://srv/widget",
+                    mimeType="text/plain",
+                    text="ignored",
+                ),
+                mcp_types.TextResourceContents(
+                    uri="ui://srv/widget",
+                    mimeType=MCP_APPS_UI_MIME_TYPE,
+                    text="<main>chosen</main>",
+                ),
+            ]
+        )
+        client = _FakeMCPClient(result=result)
+        _seed_catalog(
+            monkeypatch, ui={"resourceUri": "ui://srv/widget"}, client=client
+        )
+
+        payload = fetch_ui_resource("widget", "tu-2")
+        assert payload["html"] == "<main>chosen</main>"
+        assert payload["mimeType"] == MCP_APPS_UI_MIME_TYPE
diff --git a/backend/tests/agents/main_agent/streaming/test_artifact_events.py b/backend/tests/agents/main_agent/streaming/test_artifact_events.py
new file mode 100644
index 00000000..c55f7554
--- /dev/null
+++ b/backend/tests/agents/main_agent/streaming/test_artifact_events.py
@@ -0,0 +1,198 @@
+"""Tests for StreamCoordinator._extract_artifact_events.
+
+Covers the post-turn `artifact` SSE emit: turn-window filtering (only
+artifacts touched this turn), action derivation, message-index stamping,
+swallowed service errors (no events emitted), and the no-session guard.
+"""
+
+from __future__ import annotations
+
+import json
+from datetime import datetime, timezone
+
+import pytest
+
+from agents.builtin_tools.artifacts import service as artifact_service
+from agents.main_agent.streaming.stream_coordinator import StreamCoordinator
+
+SESSION = "sess-9"
+USER = "user-123"
+
+
+def _parse_sse(raw: str) -> dict:
+    assert raw.startswith("event: artifact\ndata: ")
+    assert raw.endswith("\n\n")
+    return json.loads(raw[len("event: artifact\ndata: ") :].strip())
+
+
+@pytest.fixture
+def turn_start() -> datetime:
+    return datetime(2026, 5, 15, 12, 0, 0, tzinfo=timezone.utc)
+
+
+@pytest.fixture
+def coord() -> StreamCoordinator:
+    return StreamCoordinator()
+
+
+def _row(**kw) -> dict:
+    base = {
+        "artifact_id": "art-1",
+        "version": 1,
+        "title": "Doc",
+        "content_type": "text/html; charset=utf-8",
+        "updated_at": "2026-05-15T12:00:05+00:00",
+        "created_at": "2026-05-15T12:00:05+00:00",
+    }
+    base.update(kw)
+    return base
+
+
+@pytest.mark.asyncio
+async def test_emits_created_for_v1(coord, turn_start, monkeypatch) -> None:
+    monkeypatch.setattr(
+        artifact_service, "list_session_artifacts", lambda u, s: [_row()]
+    )
+    out = await coord._extract_artifact_events(SESSION, USER, turn_start)
+    assert len(out) == 1
+    payload = _parse_sse(out[0])
+    assert payload == {
+        "type": "artifact",
+        "artifactId": "art-1",
+        "version": 1,
+        "title": "Doc",
+        "contentType": "text/html; charset=utf-8",
+        "sessionId": SESSION,
+        "updatedAt": "2026-05-15T12:00:05+00:00",
+        "action": "created",
+        "producedByMessageIndex": None,
+    }
+
+
+@pytest.mark.asyncio
+async def test_stamps_and_emits_produced_by_message_index(
+    coord, turn_start, monkeypatch
+) -> None:
+    monkeypatch.setattr(
+        artifact_service,
+        "list_session_artifacts",
+        lambda u, s: [_row(artifact_id="a"), _row(artifact_id="b")],
+    )
+    stamped: list[tuple] = []
+    monkeypatch.setattr(
+        artifact_service,
+        "set_produced_by_message_index",
+        lambda u, aid, ver, idx: stamped.append((u, aid, ver, idx)),
+    )
+    out = await coord._extract_artifact_events(
+        SESSION, USER, turn_start, produced_by_message_index=7
+    )
+    assert {_parse_sse(e)["producedByMessageIndex"] for e in out} == {7}
+    # Each artifact's own version is threaded to the stamp so the right
+    # version row is linked (both rows are v1 here).
+    assert stamped == [(USER, "a", 1, 7), (USER, "b", 1, 7)]
+
+
+@pytest.mark.asyncio
+async def test_stamp_failure_is_swallowed(
+    coord, turn_start, monkeypatch
+) -> None:
+    monkeypatch.setattr(
+        artifact_service, "list_session_artifacts", lambda u, s: [_row()]
+    )
+
+    def _boom(u, aid, ver, idx):
+        raise RuntimeError("ddb down")
+
+    monkeypatch.setattr(
+        artifact_service, "set_produced_by_message_index", _boom
+    )
+    out = await coord._extract_artifact_events(
+        SESSION, USER, turn_start, produced_by_message_index=3
+    )
+    # Stamp failure must not drop the live event.
+    assert _parse_sse(out[0])["producedByMessageIndex"] == 3
+
+
+@pytest.mark.asyncio
+async def test_version_gt_1_is_updated(coord, turn_start, monkeypatch) -> None:
+    monkeypatch.setattr(
+        artifact_service,
+        "list_session_artifacts",
+        lambda u, s: [_row(version=4)],
+    )
+    out = await coord._extract_artifact_events(SESSION, USER, turn_start)
+    assert _parse_sse(out[0])["action"] == "updated"
+    assert _parse_sse(out[0])["version"] == 4
+
+
+@pytest.mark.asyncio
+async def test_filters_artifacts_from_earlier_turns(
+    coord, turn_start, monkeypatch
+) -> None:
+    stale = _row(artifact_id="old", updated_at="2026-05-15T11:59:59+00:00")
+    fresh = _row(artifact_id="new", updated_at="2026-05-15T12:00:30+00:00")
+    monkeypatch.setattr(
+        artifact_service,
+        "list_session_artifacts",
+        lambda u, s: [fresh, stale],
+    )
+    out = await coord._extract_artifact_events(SESSION, USER, turn_start)
+    ids = [_parse_sse(e)["artifactId"] for e in out]
+    assert ids == ["new"]
+
+
+@pytest.mark.asyncio
+async def test_unparseable_updated_at_is_included(
+    coord, turn_start, monkeypatch
+) -> None:
+    monkeypatch.setattr(
+        artifact_service,
+        "list_session_artifacts",
+        lambda u, s: [_row(updated_at="")],
+    )
+    out = await coord._extract_artifact_events(SESSION, USER, turn_start)
+    assert len(out) == 1
+
+
+@pytest.mark.asyncio
+async def test_config_error_is_swallowed(coord, turn_start, monkeypatch) -> None:
+    def _raise(u, s):
+        raise artifact_service.ArtifactConfigError("not configured")
+
+    monkeypatch.setattr(artifact_service, "list_session_artifacts", _raise)
+    assert await coord._extract_artifact_events(SESSION, USER, turn_start) == []
+
+
+@pytest.mark.asyncio
+async def test_unexpected_error_is_swallowed(
+    coord, turn_start, monkeypatch
+) -> None:
+    def _raise(u, s):
+        raise RuntimeError("ddb down")
+
+    monkeypatch.setattr(artifact_service, "list_session_artifacts", _raise)
+    assert await coord._extract_artifact_events(SESSION, USER, turn_start) == []
+
+
+@pytest.mark.asyncio
+async def test_no_session_or_user_is_noop(coord, turn_start) -> None:
+    assert await coord._extract_artifact_events(None, USER, turn_start) == []
+    assert await coord._extract_artifact_events(SESSION, None, turn_start) == []
+
+
+@pytest.mark.asyncio
+async def test_multiple_artifacts_one_turn(coord, turn_start, monkeypatch) -> None:
+    monkeypatch.setattr(
+        artifact_service,
+        "list_session_artifacts",
+        lambda u, s: [
+            _row(artifact_id="a", version=1),
+            _row(artifact_id="b", version=2),
+        ],
+    )
+    out = await coord._extract_artifact_events(SESSION, USER, turn_start)
+    actions = {
+        _parse_sse(e)["artifactId"]: _parse_sse(e)["action"] for e in out
+    }
+    assert actions == {"a": "created", "b": "updated"}
diff --git a/backend/tests/agents/main_agent/streaming/test_compaction_sse_emit_once.py b/backend/tests/agents/main_agent/streaming/test_compaction_sse_emit_once.py
new file mode 100644
index 00000000..32223b61
--- /dev/null
+++ b/backend/tests/agents/main_agent/streaming/test_compaction_sse_emit_once.py
@@ -0,0 +1,215 @@
+"""Regression: a compaction yields exactly one `compaction` SSE frame per turn.
+
+The `compaction` SSE event (frontend inline "earlier messages summarized"
+divider, landed in PR #243) is emitted by ``StreamCoordinator.stream_response``
+from inside the single terminal ``done`` handler, gated solely on
+``TurnBasedSessionManager.update_after_turn`` returning a ``CompactionResult``.
+
+``process_agent_stream`` yields exactly one ``done`` event per turn (STEP 9,
+after the raw agent stream is exhausted), so ``update_after_turn`` is awaited
+exactly once and the SSE frame is yielded at most once.
+
+This module locks that once-per-turn invariant against the *real* pipeline
+(``stream_response`` → ``process_agent_stream`` → coordinator emit code),
+stubbing only two narrow seams: ``agent.stream_async`` (raw Strands events)
+and ``session_manager.update_after_turn`` (the compaction decision).
+
+It is also the explicit non-regression guard for the Strands 1.40 bump.
+Strands 1.40 ships proactive context compression (strands PR #2239) and feeds
+``EventLoopMetrics.accumulated_usage`` on the ``AgentResult`` event — which
+``_handle_metadata_events`` surfaces on the ``metadata_summary`` track. Neither
+is a second emit path: proactive compression is opt-in via a
+``ConversationManager``'s ``proactive_compression`` (default ``None`` → the
+``BeforeModelCallEvent`` handler early-returns), and our compaction lives in a
+``SessionManager`` (``TurnBasedSessionManager.update_after_turn``), a different
+abstraction. The third test drives the ``metadata_summary``/accumulated-usage
+surface explicitly and asserts the emit count stays at one.
+"""
+
+from typing import Any, AsyncIterator, Dict, List, Optional
+
+import pytest
+
+from agents.main_agent.session.compaction_models import CompactionResult
+from agents.main_agent.streaming.stream_coordinator import StreamCoordinator
+
+
+# Per-call metadata raw event: Bedrock's `metadata` chunk wrapped inside
+# Strands' ModelStreamChunkEvent. Same shape as the cost-attribution suite.
+def _raw_metadata_event(usage: Dict[str, int]) -> Dict[str, Any]:
+    return {"event": {"metadata": {"usage": usage, "metrics": {"latencyMs": 100}}}}
+
+
+# Strands AgentResult event. EventLoopMetrics.accumulated_usage is summed
+# across all LLM calls in the turn; _handle_metadata_events extracts it onto
+# the `metadata_summary` track. This is the surface Strands 1.40's proactive
+# compression also reads from — included here to prove it is not a second
+# compaction-emit path.
+class _FakeEventLoopMetrics:
+    def __init__(self, accumulated_usage: Dict[str, int]) -> None:
+        self.accumulated_usage = accumulated_usage
+        self.accumulated_metrics = {"latencyMs": 250}
+
+
+class _FakeAgentResult:
+    def __init__(self, accumulated_usage: Dict[str, int]) -> None:
+        self.metrics = _FakeEventLoopMetrics(accumulated_usage)
+
+
+def _raw_agent_result_event(accumulated_usage: Dict[str, int]) -> Dict[str, Any]:
+    return {"result": _FakeAgentResult(accumulated_usage)}
+
+
+class _FakeAgent:
+    """Minimal agent: a message list and a controllable raw event stream.
+
+    No ``_interrupt_state`` so the coordinator's paused-turn snapshot and
+    OAuth / tool-approval extractors all early-return on the ``done`` event.
+    """
+
+    def __init__(self, raw_events: List[Dict[str, Any]]) -> None:
+        self.messages = [{"role": "user", "content": [{"text": "hi"}]}]
+        self._raw_events = raw_events
+
+    def stream_async(self, prompt: Any) -> AsyncIterator[Dict[str, Any]]:
+        async def _gen() -> AsyncIterator[Dict[str, Any]]:
+            for ev in self._raw_events:
+                yield ev
+
+        return _gen()
+
+
+class _RecordingSessionManager:
+    """Stub session manager that records ``update_after_turn`` invocations.
+
+    Only the seam the coordinator depends on is implemented; the real
+    threshold/checkpoint math is covered by the TurnBasedSessionManager
+    suite. This isolates the coordinator-level once-per-turn invariant.
+    """
+
+    def __init__(self, result: Optional[CompactionResult]) -> None:
+        self._result = result
+        self.calls: List[int] = []
+
+    async def update_after_turn(
+        self,
+        input_tokens: int,
+        current_messages: Optional[List[Dict]] = None,
+    ) -> Optional[CompactionResult]:
+        self.calls.append(input_tokens)
+        return self._result
+
+
+async def _collect_sse(
+    agent: _FakeAgent, session_manager: _RecordingSessionManager
+) -> List[str]:
+    coordinator = StreamCoordinator()
+    frames: List[str] = []
+    async for sse in coordinator.stream_response(
+        agent=agent,
+        prompt="hi",
+        session_manager=session_manager,
+        session_id="sess-1",
+        user_id="user-1",
+        main_agent_wrapper=None,
+    ):
+        frames.append(sse)
+    return frames
+
+
+def _compaction_frames(frames: List[str]) -> List[str]:
+    return [f for f in frames if f.startswith("event: compaction\n")]
+
+
+# A turn whose summed input buckets exceed any realistic threshold. Being
+# nonzero, it also passes the coordinator's `total_input_tokens > 0` guard,
+# so update_after_turn is consulted.
+_TURN_USAGE = {"inputTokens": 150_000, "outputTokens": 80, "totalTokens": 150_080}
+
+
+@pytest.mark.asyncio
+async def test_compaction_sse_emitted_exactly_once_when_checkpoint_advances():
+    """Checkpoint advances → exactly one `event: compaction` frame."""
+    result = CompactionResult(
+        previous_checkpoint=0,
+        new_checkpoint=4,
+        summarized_turns=2,
+        input_tokens=150_000,
+    )
+    agent = _FakeAgent([_raw_metadata_event(_TURN_USAGE)])
+    sm = _RecordingSessionManager(result)
+
+    frames = await _collect_sse(agent, sm)
+    compaction = _compaction_frames(frames)
+
+    # update_after_turn consulted exactly once (one terminal `done`).
+    assert sm.calls == [150_000]
+    assert len(compaction) == 1, (
+        f"expected exactly one compaction SSE frame, got {len(compaction)}: "
+        f"{compaction}"
+    )
+
+    import json
+
+    payload = json.loads(compaction[0][len("event: compaction\ndata: ") :].strip())
+    assert payload == {
+        "type": "compaction",
+        "previousCheckpoint": 0,
+        "newCheckpoint": 4,
+        "summarizedTurns": 2,
+        "inputTokens": 150_000,
+    }
+
+
+@pytest.mark.asyncio
+async def test_no_compaction_sse_when_checkpoint_does_not_advance():
+    """update_after_turn returns None → zero compaction frames, still one call."""
+    agent = _FakeAgent([_raw_metadata_event(_TURN_USAGE)])
+    sm = _RecordingSessionManager(None)
+
+    frames = await _collect_sse(agent, sm)
+
+    assert sm.calls == [150_000]
+    assert _compaction_frames(frames) == []
+
+
+@pytest.mark.asyncio
+async def test_strands_result_metadata_track_does_not_double_fire():
+    """Strands 1.40 non-regression guard.
+
+    Interleave per-call `metadata` events with a Strands ``AgentResult``
+    (the ``EventLoopMetrics.accumulated_usage`` / ``metadata_summary``
+    surface that 1.40's proactive compression also reads). There is still
+    exactly one terminal ``done`` → update_after_turn is consulted exactly
+    once → exactly one compaction frame. The accumulated-usage track is not
+    a second emit path.
+    """
+    call_0 = {"inputTokens": 80_000, "outputTokens": 40, "totalTokens": 80_040}
+    call_1 = {"inputTokens": 150_000, "outputTokens": 60, "totalTokens": 150_060}
+    turn_cumulative = {
+        "inputTokens": 230_000,  # Strands sums across calls — must not re-trigger
+        "outputTokens": 100,
+        "totalTokens": 230_100,
+    }
+    agent = _FakeAgent(
+        [
+            _raw_metadata_event(call_0),
+            _raw_metadata_event(call_1),
+            _raw_agent_result_event(turn_cumulative),
+        ]
+    )
+    result = CompactionResult(
+        previous_checkpoint=4,
+        new_checkpoint=8,
+        summarized_turns=2,
+        input_tokens=150_000,
+    )
+    sm = _RecordingSessionManager(result)
+
+    frames = await _collect_sse(agent, sm)
+
+    # Consulted exactly once. The compaction trigger reads "current context"
+    # (last per-call usage via last-write-wins), NOT Strands' summed
+    # accumulated_usage — so the input is call_1's 150_000, not 230_000.
+    assert sm.calls == [150_000]
+    assert len(_compaction_frames(frames)) == 1
diff --git a/backend/tests/agents/main_agent/streaming/test_per_message_cost_attribution.py b/backend/tests/agents/main_agent/streaming/test_per_message_cost_attribution.py
new file mode 100644
index 00000000..0c24bcc8
--- /dev/null
+++ b/backend/tests/agents/main_agent/streaming/test_per_message_cost_attribution.py
@@ -0,0 +1,310 @@
+"""Regression test for per-message cost attribution on multi-LLM-call turns.
+
+Strands emits two sources of usage during a tool-use turn:
+  1. Per-LLM-call metadata via ``ModelStreamChunkEvent`` (one per assistant
+     message), carrying just that call's tokens.
+  2. A final ``AgentResultEvent`` whose ``AgentResult.metrics`` is an
+     ``EventLoopMetrics`` with ``accumulated_usage`` summed across every call
+     in the turn.
+
+``stream_processor._handle_metadata_events`` extracts both. The stream
+coordinator routes any ``metadata`` event into
+``per_message_metadata[current_assistant_message_index]["usage"].update(...)``.
+Because the AgentResult event arrives *after* every ``message_stop`` (so the
+index still points at the last assistant message), a naive ``.update()`` on
+the same key overwrites the last message's per-call usage with the
+turn-cumulative usage. Pricing each per-message entry and summing then
+double-counts every earlier message's input tokens.
+
+This module locks the contract:
+  - The per-call metadata events stay typed ``metadata`` (per-message track).
+  - The result-extracted cumulative metadata is typed ``metadata_summary``
+    (turn-summary track), so it never lands in per_message_metadata.
+
+If the contract regresses, simulating the dispatch loop will reproduce the
+double-count and these assertions will fail.
+"""
+
+from typing import Any, Dict, List
+
+from agents.main_agent.streaming.stream_processor import _handle_metadata_events
+
+
+# Realistic per-call metadata chunk shape: Bedrock's `metadata` chunk wrapped
+# inside Strands' ModelStreamChunkEvent (`{"event": chunk}`).
+def _per_call_metadata_event(usage: Dict[str, int]) -> Dict[str, Any]:
+    return {"event": {"metadata": {"usage": usage, "metrics": {"latencyMs": 100}}}}
+
+
+# Realistic AgentResultEvent shape. EventLoopMetrics has accumulated_usage
+# summed across all calls; _handle_metadata_events extracts it via __dict__.
+class _FakeEventLoopMetrics:
+    def __init__(self, accumulated_usage: Dict[str, int]) -> None:
+        self.accumulated_usage = accumulated_usage
+        self.accumulated_metrics = {"latencyMs": 250}
+
+
+class _FakeAgentResult:
+    def __init__(self, accumulated_usage: Dict[str, int]) -> None:
+        self.metrics = _FakeEventLoopMetrics(accumulated_usage)
+
+
+def _agent_result_event(accumulated_usage: Dict[str, int]) -> Dict[str, Any]:
+    return {"result": _FakeAgentResult(accumulated_usage)}
+
+
+def _dispatch_to_per_message(
+    processed_events: List[Dict[str, Any]],
+    per_message_metadata: List[Dict[str, Any]],
+    current_index: int,
+) -> None:
+    """Mimic stream_coordinator's per-message routing for a single source event.
+
+    Only ``metadata`` events flow into ``per_message_metadata`` — the
+    ``metadata_summary`` track is for the turn-level accumulator and is
+    intentionally not routed here.
+    """
+    for processed in processed_events:
+        if processed.get("type") != "metadata":
+            continue
+        usage = processed.get("data", {}).get("usage")
+        if not usage:
+            continue
+        per_message_metadata[current_index]["usage"].update(usage)
+
+
+class TestPerMessageAttributionTwoCallTurn:
+    """Reproduce the dispatch sequence of a 2-call tool-use turn."""
+
+    CALL_0_USAGE = {"inputTokens": 1000, "outputTokens": 50, "totalTokens": 1050}
+    CALL_1_USAGE = {"inputTokens": 1300, "outputTokens": 80, "totalTokens": 1380}
+    TURN_CUMULATIVE = {
+        "inputTokens": CALL_0_USAGE["inputTokens"] + CALL_1_USAGE["inputTokens"],
+        "outputTokens": CALL_0_USAGE["outputTokens"] + CALL_1_USAGE["outputTokens"],
+        "totalTokens": CALL_0_USAGE["totalTokens"] + CALL_1_USAGE["totalTokens"],
+    }
+
+    def test_per_call_metadata_routes_to_per_message_track(self):
+        """Each per-call metadata event carries one message's tokens, no more."""
+        events = _handle_metadata_events(_per_call_metadata_event(self.CALL_0_USAGE))
+        metadata_events = [e for e in events if e["type"] == "metadata"]
+        assert len(metadata_events) == 1
+        assert metadata_events[0]["data"]["usage"] == self.CALL_0_USAGE
+
+    def test_result_cumulative_does_not_route_to_per_message_track(self):
+        """The AgentResult cumulative must not be a `metadata` event.
+
+        If it is, the dispatch loop overwrites the last per-message entry
+        with cumulative usage, double-counting earlier messages' input
+        tokens at pricing time.
+        """
+        events = _handle_metadata_events(_agent_result_event(self.TURN_CUMULATIVE))
+        per_message_typed = [e for e in events if e["type"] == "metadata"]
+        assert per_message_typed == [], (
+            "AgentResult cumulative usage was emitted as a `metadata` event; "
+            "it would clobber the last per-message entry. Expected "
+            "`metadata_summary` so it stays on the turn-summary track only."
+        )
+
+    def test_result_cumulative_emitted_on_summary_track(self):
+        """Result-extracted cumulative is still emitted — just on metadata_summary."""
+        events = _handle_metadata_events(_agent_result_event(self.TURN_CUMULATIVE))
+        summary_events = [e for e in events if e["type"] == "metadata_summary"]
+        assert len(summary_events) == 1
+        assert summary_events[0]["data"]["usage"] == self.TURN_CUMULATIVE
+
+    def test_full_turn_dispatch_preserves_per_call_attribution(self):
+        """Drive the full event sequence and assert no double-counting."""
+        per_message_metadata = [
+            {"usage": {}, "metrics": {}},
+            {"usage": {}, "metrics": {}},
+        ]
+
+        # Message 0's per-call metadata fires while index = 0.
+        _dispatch_to_per_message(
+            _handle_metadata_events(_per_call_metadata_event(self.CALL_0_USAGE)),
+            per_message_metadata,
+            current_index=0,
+        )
+        # Message 1's per-call metadata fires while index = 1.
+        _dispatch_to_per_message(
+            _handle_metadata_events(_per_call_metadata_event(self.CALL_1_USAGE)),
+            per_message_metadata,
+            current_index=1,
+        )
+        # AgentResult cumulative fires last, with index still at 1. If this
+        # leaks onto the `metadata` track, msg 1's usage gets clobbered with
+        # the turn cumulative — input tokens for msg 0 would be summed twice
+        # when pricing each entry independently.
+        _dispatch_to_per_message(
+            _handle_metadata_events(_agent_result_event(self.TURN_CUMULATIVE)),
+            per_message_metadata,
+            current_index=1,
+        )
+
+        assert per_message_metadata[0]["usage"] == self.CALL_0_USAGE
+        assert per_message_metadata[1]["usage"] == self.CALL_1_USAGE
+
+        # Summing the per-entry input tokens must equal the cumulative input,
+        # not 2× msg 0's input + msg 1's input.
+        summed_input = (
+            per_message_metadata[0]["usage"]["inputTokens"]
+            + per_message_metadata[1]["usage"]["inputTokens"]
+        )
+        assert summed_input == self.TURN_CUMULATIVE["inputTokens"]
+
+
+class TestSummaryAccumulatorAcceptsBothTracks:
+    """The stream_processor main loop must keep `accumulated_metadata` cumulative.
+
+    Per-call events accumulate via ``.update()`` (last-write-wins), so before
+    the cumulative arrives the accumulator only holds the last call's usage —
+    which is *not* cumulative. The accumulator must therefore consume both
+    `metadata` and `metadata_summary` events for the final summary emission
+    to carry true turn totals.
+    """
+
+    def test_accumulator_processes_both_tracks(self):
+        """Walk the same sequence the main loop does and check the final state."""
+        accumulated: Dict[str, Any] = {"usage": {}, "metrics": {}}
+
+        sequence = [
+            _per_call_metadata_event(TestPerMessageAttributionTwoCallTurn.CALL_0_USAGE),
+            _per_call_metadata_event(TestPerMessageAttributionTwoCallTurn.CALL_1_USAGE),
+            _agent_result_event(TestPerMessageAttributionTwoCallTurn.TURN_CUMULATIVE),
+        ]
+
+        for raw in sequence:
+            for processed in _handle_metadata_events(raw):
+                if processed.get("type") in ("metadata", "metadata_summary"):
+                    data = processed.get("data", {})
+                    if "usage" in data:
+                        accumulated["usage"].update(data["usage"])
+                    if "metrics" in data:
+                        accumulated["metrics"].update(data["metrics"])
+
+        assert accumulated["usage"] == TestPerMessageAttributionTwoCallTurn.TURN_CUMULATIVE
+
+
+class TestStreamCoordinatorContextOccupancy:
+    """The final SSE `usage` field must reflect current context, not sums.
+
+    Bedrock reports each LLM call's `inputTokens` as the FULL context size
+    sent on that call. For a 2-call tool turn:
+        1st call (CALL_0_USAGE) input = 1000  (system + user_msg)
+        2nd call (CALL_1_USAGE) input = 2500  (system + user_msg + tool_use + tool_result)
+
+    Strands' EventLoopMetrics.accumulated_usage sums these into 3500 — but
+    the actual context occupancy is 2500, the size of the most recent call.
+    The frontend uses the SSE metadata `usage` to drive the context-%
+    badge, and the backend uses it to decide whether to trigger
+    compaction; both need "current context size", not the cross-call sum.
+
+    This locks in the contract that stream_coordinator's accumulated_metadata
+    (which feeds the final SSE metadata) takes per-call values via
+    last-write-wins from `metadata` events and IGNORES the cross-call
+    cumulative carried on `metadata_summary`.
+    """
+
+    CALL_0_USAGE = {"inputTokens": 1000, "outputTokens": 50, "totalTokens": 1050}
+    CALL_1_USAGE = {"inputTokens": 2500, "outputTokens": 100, "totalTokens": 2600}
+    TURN_CUMULATIVE = {
+        "inputTokens": 3500,   # 1000 + 2500 — Strands' accumulated_usage
+        "outputTokens": 150,
+        "totalTokens": 3650,
+    }
+
+    def _simulate_stream_coordinator_accumulator(
+        self, events: List[Dict[str, Any]]
+    ) -> Dict[str, Any]:
+        """Mirror stream_coordinator's accumulator branches for a sequence of
+        already-processed events. Returns the resulting accumulated_metadata.
+
+        - `metadata` events → update accumulated_metadata.usage/metrics.
+        - `metadata_summary` events → first_token_time only; usage/metrics ignored.
+        """
+        accumulated: Dict[str, Any] = {"usage": {}, "metrics": {}}
+        for processed in events:
+            event_type = processed.get("type")
+            event_data = processed.get("data", {})
+            if event_type == "metadata":
+                if "usage" in event_data:
+                    accumulated["usage"].update(event_data["usage"])
+                if "metrics" in event_data:
+                    accumulated["metrics"].update(event_data["metrics"])
+            # metadata_summary intentionally does NOT touch usage/metrics here
+        return accumulated
+
+    def test_final_usage_reflects_last_call_not_sum(self):
+        """After a 2-call tool turn, usage is CALL_1_USAGE (the last call), not the sum."""
+        # Drive the realistic event order through _handle_metadata_events
+        # exactly as stream_processor would, then through the coordinator's
+        # accumulator branches.
+        raw_events = [
+            _per_call_metadata_event(self.CALL_0_USAGE),
+            _per_call_metadata_event(self.CALL_1_USAGE),
+            _agent_result_event(self.TURN_CUMULATIVE),
+        ]
+        processed: List[Dict[str, Any]] = []
+        for raw in raw_events:
+            processed.extend(_handle_metadata_events(raw))
+
+        result = self._simulate_stream_coordinator_accumulator(processed)
+
+        assert result["usage"] == self.CALL_1_USAGE, (
+            "Final accumulated usage must equal the last call's usage "
+            "(current context size), not Strands' summed-across-calls value. "
+            "If this regresses, the context-% badge and compaction trigger "
+            "will inflate by ~the size of every prior call in the turn."
+        )
+
+    def test_compaction_input_tokens_match_current_context(self):
+        """The trigger threshold computation in stream_coordinator uses
+        `usage.inputTokens + cacheReadInputTokens + cacheWriteInputTokens`."""
+        call_with_cache = {
+            "inputTokens": 200,
+            "outputTokens": 80,
+            "totalTokens": 280,
+            "cacheReadInputTokens": 2000,
+            "cacheWriteInputTokens": 300,
+        }
+        prior_call = {
+            "inputTokens": 100,
+            "outputTokens": 40,
+            "totalTokens": 140,
+            "cacheReadInputTokens": 0,
+            "cacheWriteInputTokens": 800,
+        }
+        cumulative_after_two_calls = {
+            "inputTokens": 300,            # would be summed by Strands
+            "outputTokens": 120,
+            "totalTokens": 420,
+            "cacheReadInputTokens": 2000,
+            "cacheWriteInputTokens": 1100, # would be summed by Strands
+        }
+
+        raw_events = [
+            _per_call_metadata_event(prior_call),
+            _per_call_metadata_event(call_with_cache),
+            _agent_result_event(cumulative_after_two_calls),
+        ]
+        processed: List[Dict[str, Any]] = []
+        for raw in raw_events:
+            processed.extend(_handle_metadata_events(raw))
+
+        result = self._simulate_stream_coordinator_accumulator(processed)
+        usage = result["usage"]
+
+        # Compaction sums all three input buckets — must equal call_with_cache's
+        # totals (current context), not the summed-across-calls totals.
+        compaction_input = (
+            usage.get("inputTokens", 0)
+            + usage.get("cacheReadInputTokens", 0)
+            + usage.get("cacheWriteInputTokens", 0)
+        )
+        expected_current_context = (
+            call_with_cache["inputTokens"]
+            + call_with_cache["cacheReadInputTokens"]
+            + call_with_cache["cacheWriteInputTokens"]
+        )
+        assert compaction_input == expected_current_context
diff --git a/backend/tests/agents/main_agent/streaming/test_stream_processor.py b/backend/tests/agents/main_agent/streaming/test_stream_processor.py
index 04a99318..2848c0e1 100644
--- a/backend/tests/agents/main_agent/streaming/test_stream_processor.py
+++ b/backend/tests/agents/main_agent/streaming/test_stream_processor.py
@@ -19,6 +19,7 @@
     _handle_reasoning_events,
     _handle_tool_events,
     _serialize_object,
+    process_agent_stream,
 )
 
 
@@ -608,7 +609,13 @@ def test_empty_event_returns_empty(self):
         assert _handle_metadata_events({}) == []
 
     def test_result_with_accumulated_usage(self):
-        """result.metrics.accumulated_usage produces a metadata event."""
+        """result.metrics.accumulated_usage rides the metadata_summary track.
+
+        It must NOT be emitted as a `metadata` event — those land in
+        per_message_metadata in the stream coordinator and would clobber
+        the last assistant message's per-call usage with a turn-cumulative
+        value, double-counting earlier messages at pricing time.
+        """
         raw = {
             "result": {
                 "metrics": {
@@ -621,9 +628,11 @@ def test_result_with_accumulated_usage(self):
             }
         }
         events = _handle_metadata_events(raw)
-        m = [e for e in events if e["type"] == "metadata"]
-        assert len(m) >= 1
-        assert m[0]["data"]["usage"]["inputTokens"] == 500
+        per_message_typed = [e for e in events if e["type"] == "metadata"]
+        summary_typed = [e for e in events if e["type"] == "metadata_summary"]
+        assert per_message_typed == []
+        assert len(summary_typed) == 1
+        assert summary_typed[0]["data"]["usage"]["inputTokens"] == 500
 
 
 # ---------------------------------------------------------------------------
@@ -779,3 +788,52 @@ def test_metadata_structure(self):
             "usage": {"inputTokens": 1, "outputTokens": 1, "totalTokens": 2},
         })
         self._assert_structure(events)
+
+
+class TestProcessAgentStreamMaxTokens:
+    """MaxTokensReachedException is classified as a recoverable max_tokens
+    error event (not the generic stream_error) and never leaks the raw SDK
+    message/URL."""
+
+    @pytest.mark.asyncio
+    async def test_max_tokens_emits_recoverable_error_event(self):
+        from strands.types.exceptions import MaxTokensReachedException
+
+        async def mock_stream():
+            yield {"start_event_loop": True}
+            raise MaxTokensReachedException(
+                "Agent has reached an unrecoverable state due to max_tokens "
+                "limit. For more information see: https://strandsagents.com/x"
+            )
+
+        events = []
+        async for ev in process_agent_stream(mock_stream()):
+            events.append(ev)
+
+        error_events = [e for e in events if e.get("type") == "error"]
+        assert len(error_events) == 1
+        data = error_events[0]["data"]
+        assert data["code"] == "max_tokens"
+        assert data["recoverable"] is True
+        # detail is None and excluded — no leaked SDK URL/raw exception text.
+        assert "strandsagents.com" not in str(data)
+        assert "unrecoverable" not in str(data).lower()
+
+    @pytest.mark.asyncio
+    async def test_generic_exception_still_stream_error(self):
+        """Regression: a non-max_tokens exception still maps to the
+        non-recoverable generic stream_error."""
+
+        async def mock_stream():
+            yield {"start_event_loop": True}
+            raise RuntimeError("totally unrelated boom")
+
+        events = []
+        async for ev in process_agent_stream(mock_stream()):
+            events.append(ev)
+
+        error_events = [e for e in events if e.get("type") == "error"]
+        assert len(error_events) == 1
+        data = error_events[0]["data"]
+        assert data["code"] == "stream_error"
+        assert data["recoverable"] is False
diff --git a/backend/tests/agents/main_agent/streaming/test_ui_resource_events.py b/backend/tests/agents/main_agent/streaming/test_ui_resource_events.py
new file mode 100644
index 00000000..883abd3a
--- /dev/null
+++ b/backend/tests/agents/main_agent/streaming/test_ui_resource_events.py
@@ -0,0 +1,226 @@
+"""Tests for StreamCoordinator._extract_ui_resource_events.
+
+PR #3 of the MCP Apps host-renderer initiative
+(`docs/kaizen/scoping/mcp-apps-host-renderer.md`). Covers the per-tool-result
+`ui_resource` SSE emit: it fires only for UI-bearing tools, fetches the
+resource via the hosting client's `resources/read` and inlines the HTML,
+correlates by toolUseId, dedupes, stays inert behind the host flag, and
+never breaks the stream on failure.
+
+Mirrors the helper-level style of `test_artifact_events.py` (drive the
+coordinator method directly) and the mock-the-boundary catalog seeding from
+`tests/agents/main_agent/integrations/test_mcp_apps.py`.
+"""
+
+from __future__ import annotations
+
+import json
+
+import mcp.types as mcp_types
+import pytest
+
+from agents.main_agent.integrations import mcp_apps
+from agents.main_agent.integrations.mcp_apps import (
+    MCP_APPS_UI_EXTENSION_KEY,
+    MCP_APPS_UI_MIME_TYPE,
+    get_ui_tool_catalog,
+    record_and_filter_ui_tools,
+)
+from agents.main_agent.streaming.stream_coordinator import StreamCoordinator
+
+_ENV_FLAG = "AGENTCORE_MCP_APPS_HOST_ENABLED"
+_ENV_SANDBOX_ORIGIN = "AGENTCORE_MCP_APPS_SANDBOX_ORIGIN"
+
+
+@pytest.fixture
+def coord() -> StreamCoordinator:
+    return StreamCoordinator()
+
+
+@pytest.fixture
+def catalog_clean(monkeypatch):
+    get_ui_tool_catalog().clear()
+    monkeypatch.delenv(_ENV_FLAG, raising=False)
+    monkeypatch.delenv(_ENV_SANDBOX_ORIGIN, raising=False)
+    try:
+        yield
+    finally:
+        get_ui_tool_catalog().clear()
+
+
+class _FakeMCPClient:
+    def __init__(self, result):
+        self._result = result
+        self.read_calls: list = []
+
+    def read_resource_sync(self, uri):
+        self.read_calls.append(uri)
+        return self._result
+
+
+def _fake_tool(tool_name, ui):
+    from types import SimpleNamespace
+
+    return SimpleNamespace(
+        tool_name=tool_name,
+        mcp_tool=SimpleNamespace(name=tool_name, meta={"ui": ui}),
+    )
+
+
+def _html_result(text="<h1>hi</h1>"):
+    return mcp_types.ReadResourceResult(
+        contents=[
+            mcp_types.TextResourceContents(
+                uri="ui://srv/widget",
+                mimeType=MCP_APPS_UI_MIME_TYPE,
+                text=text,
+                _meta={
+                    MCP_APPS_UI_EXTENSION_KEY: {
+                        "csp": {"connectDomains": ["https://api.test"]},
+                        "permissions": {"clipboardWrite": {}},
+                    }
+                },
+            )
+        ]
+    )
+
+
+def _seed(monkeypatch, client):
+    monkeypatch.setenv(_ENV_FLAG, "true")
+    record_and_filter_ui_tools(
+        [_fake_tool("widget", {"resourceUri": "ui://srv/widget"})],
+        client=client,
+    )
+
+
+def _tool_result_event(tool_use_id="tu-1"):
+    return {
+        "type": "tool_result",
+        "data": {
+            "tool_result": {
+                "toolUseId": tool_use_id,
+                "status": "success",
+                "content": [{"text": "ok"}],
+            }
+        },
+    }
+
+
+def _parse(raw: str) -> dict:
+    assert raw.startswith("event: ui_resource\ndata: ")
+    assert raw.endswith("\n\n")
+    return json.loads(raw[len("event: ui_resource\ndata: ") :].strip())
+
+
+@pytest.mark.asyncio
+async def test_emits_ui_resource_with_inline_html(
+    coord, catalog_clean, monkeypatch
+):
+    client = _FakeMCPClient(_html_result("<main>app</main>"))
+    _seed(monkeypatch, client)
+
+    out = await coord._extract_ui_resource_events(
+        _tool_result_event("tu-1"), {"tu-1": "widget"}, set()
+    )
+
+    assert client.read_calls == ["ui://srv/widget"]
+    assert len(out) == 1
+    payload = _parse(out[0])
+    assert payload == {
+        "type": "ui_resource",
+        "toolUseId": "tu-1",
+        "resourceUri": "ui://srv/widget",
+        "html": "<main>app</main>",
+        "mimeType": MCP_APPS_UI_MIME_TYPE,
+        "csp": {"connectDomains": ["https://api.test"]},
+        "permissions": {"clipboardWrite": {}},
+        "sandboxOrigin": "",
+    }
+
+
+@pytest.mark.asyncio
+async def test_dedupes_per_tool_use_id(coord, catalog_clean, monkeypatch):
+    client = _FakeMCPClient(_html_result())
+    _seed(monkeypatch, client)
+    emitted: set = set()
+
+    first = await coord._extract_ui_resource_events(
+        _tool_result_event("tu-1"), {"tu-1": "widget"}, emitted
+    )
+    second = await coord._extract_ui_resource_events(
+        _tool_result_event("tu-1"), {"tu-1": "widget"}, emitted
+    )
+
+    assert len(first) == 1
+    assert second == []
+    assert emitted == {"tu-1"}
+    # The dedupe must short-circuit before a second resources/read.
+    assert client.read_calls == ["ui://srv/widget"]
+
+
+@pytest.mark.asyncio
+async def test_inert_when_flag_disabled(coord, catalog_clean, monkeypatch):
+    client = _FakeMCPClient(_html_result())
+    _seed(monkeypatch, client)
+    monkeypatch.setenv(_ENV_FLAG, "false")
+
+    out = await coord._extract_ui_resource_events(
+        _tool_result_event("tu-1"), {"tu-1": "widget"}, set()
+    )
+    assert out == []
+    assert client.read_calls == []
+
+
+@pytest.mark.asyncio
+async def test_noop_for_untracked_tool_use_id(
+    coord, catalog_clean, monkeypatch
+):
+    client = _FakeMCPClient(_html_result())
+    _seed(monkeypatch, client)
+
+    # No name learned for this toolUseId → cannot map to the catalog.
+    out = await coord._extract_ui_resource_events(
+        _tool_result_event("tu-unknown"), {}, set()
+    )
+    assert out == []
+    assert client.read_calls == []
+
+
+@pytest.mark.asyncio
+async def test_noop_when_tool_result_has_no_tool_use_id(
+    coord, catalog_clean, monkeypatch
+):
+    client = _FakeMCPClient(_html_result())
+    _seed(monkeypatch, client)
+
+    event = {"type": "tool_result", "data": {"tool_result": {"status": "ok"}}}
+    out = await coord._extract_ui_resource_events(
+        event, {"tu-1": "widget"}, set()
+    )
+    assert out == []
+
+
+@pytest.mark.asyncio
+async def test_noop_for_non_ui_tool(coord, catalog_clean, monkeypatch):
+    # Flag on, but the tool has no `_meta.ui` in the catalog at all.
+    monkeypatch.setenv(_ENV_FLAG, "true")
+    out = await coord._extract_ui_resource_events(
+        _tool_result_event("tu-1"), {"tu-1": "plain_tool"}, set()
+    )
+    assert out == []
+
+
+@pytest.mark.asyncio
+async def test_failure_is_swallowed(coord, catalog_clean, monkeypatch):
+    _seed(monkeypatch, _FakeMCPClient(_html_result()))
+
+    def _boom(*args, **kwargs):  # accept any call shape so only RuntimeError is raised
+        raise RuntimeError("catalog exploded")
+
+    monkeypatch.setattr(mcp_apps, "fetch_ui_resource", _boom)
+
+    # A failure in the fetch path must not propagate into the live stream.
+    out = await coord._extract_ui_resource_events(
+        _tool_result_event("tu-1"), {"tu-1": "widget"}, set()
+    )
+    assert out == []
diff --git a/backend/tests/agents/main_agent/test_chat_agent_continue.py b/backend/tests/agents/main_agent/test_chat_agent_continue.py
new file mode 100644
index 00000000..645d2b65
--- /dev/null
+++ b/backend/tests/agents/main_agent/test_chat_agent_continue.py
@@ -0,0 +1,70 @@
+"""ChatAgent.stream_async continuation-after-max_tokens behavior.
+
+A `continue_truncated=True` call must NOT synthesize a new user prompt: it
+forwards an empty-list prompt so Strands appends no message and the model
+resumes the truncated assistant message already in restored history.
+"""
+
+import pytest
+
+from agents.main_agent.chat_agent import ChatAgent
+
+
+class _RecordingCoordinator:
+    """Captures the prompt stream_async forwards to the coordinator."""
+
+    def __init__(self):
+        self.captured = {}
+
+    async def stream_response(self, **kwargs):
+        self.captured = kwargs
+        if False:  # pragma: no cover - make this an async generator
+            yield ""
+
+
+class _ExplodingMultimodalBuilder:
+    """build_prompt must never be called on the continuation path."""
+
+    def build_prompt(self, message, files):  # noqa: D401
+        raise AssertionError("multimodal build_prompt called on continuation path")
+
+
+def _bare_chat_agent(coordinator, multimodal):
+    agent = object.__new__(ChatAgent)
+    agent.agent = object()  # truthy so _create_agent() is skipped
+    agent.stream_coordinator = coordinator
+    agent.multimodal_builder = multimodal
+    agent.session_manager = object()
+    agent.session_id = "sess-1"
+    agent.user_id = "user-1"
+    return agent
+
+
+@pytest.mark.asyncio
+async def test_continue_truncated_forwards_empty_list_prompt():
+    coordinator = _RecordingCoordinator()
+    agent = _bare_chat_agent(coordinator, _ExplodingMultimodalBuilder())
+
+    async for _ in agent.stream_async(
+        "this message text must be ignored",
+        continue_truncated=True,
+    ):
+        pass
+
+    assert coordinator.captured.get("prompt") == []
+
+
+@pytest.mark.asyncio
+async def test_normal_turn_still_uses_multimodal_builder():
+    coordinator = _RecordingCoordinator()
+
+    class _Builder:
+        def build_prompt(self, message, files):
+            return f"built:{message}"
+
+    agent = _bare_chat_agent(coordinator, _Builder())
+
+    async for _ in agent.stream_async("hello", continue_truncated=False):
+        pass
+
+    assert coordinator.captured.get("prompt") == "built:hello"
diff --git a/backend/tests/apis/app_api/admin/auth_providers/test_cognito_redirect_uri.py b/backend/tests/apis/app_api/admin/auth_providers/test_cognito_redirect_uri.py
index e71a9dd1..671ace8a 100644
--- a/backend/tests/apis/app_api/admin/auth_providers/test_cognito_redirect_uri.py
+++ b/backend/tests/apis/app_api/admin/auth_providers/test_cognito_redirect_uri.py
@@ -7,7 +7,7 @@
 from fastapi.testclient import TestClient
 
 from apis.shared.auth.models import User
-from apis.shared.rbac.system_admin import require_system_admin
+from apis.shared.auth import require_admin
 
 
 @pytest.fixture
@@ -35,7 +35,7 @@ def _create_app(admin_user: User) -> FastAPI:
     admin_router = APIRouter(prefix="/admin")
     admin_router.include_router(router)
     app.include_router(admin_router)
-    app.dependency_overrides[require_system_admin] = lambda: admin_user
+    app.dependency_overrides[require_admin] = lambda: admin_user
     return app
 
 
diff --git a/backend/tests/apis/app_api/artifacts/test_artifact_content.py b/backend/tests/apis/app_api/artifacts/test_artifact_content.py
new file mode 100644
index 00000000..200385fd
--- /dev/null
+++ b/backend/tests/apis/app_api/artifacts/test_artifact_content.py
@@ -0,0 +1,203 @@
+"""Tests for the app-api artifact content endpoint (panel code view).
+
+Covers ownership scoping, the Markdown unwrap (+ its fallback), the
+inline size cap, and the fail-closed config behavior.
+"""
+
+from __future__ import annotations
+
+import base64
+
+import boto3
+import pytest
+from fastapi import FastAPI
+from fastapi.testclient import TestClient
+from moto import mock_aws
+
+from apis.app_api.artifacts import service as artifact_service
+from apis.app_api.artifacts.routes import router as artifacts_router
+from apis.shared.auth import User, get_current_user_from_session
+
+TABLE = "test-user-artifacts"
+BUCKET = "test-artifacts-bucket"
+REGION = "us-east-1"
+USER_ID = "user-123"
+
+
+@pytest.fixture(autouse=True)
+def _reset_caches() -> None:
+    artifact_service._reset_caches_for_tests()
+
+
+@pytest.fixture
+def client(monkeypatch: pytest.MonkeyPatch):
+    with mock_aws():
+        monkeypatch.setenv("AWS_REGION", REGION)
+
+        ddb = boto3.client("dynamodb", region_name=REGION)
+        ddb.create_table(
+            TableName=TABLE,
+            KeySchema=[
+                {"AttributeName": "PK", "KeyType": "HASH"},
+                {"AttributeName": "SK", "KeyType": "RANGE"},
+            ],
+            AttributeDefinitions=[
+                {"AttributeName": "PK", "AttributeType": "S"},
+                {"AttributeName": "SK", "AttributeType": "S"},
+            ],
+            BillingMode="PAY_PER_REQUEST",
+        )
+        s3 = boto3.client("s3", region_name=REGION)
+        s3.create_bucket(Bucket=BUCKET)
+
+        monkeypatch.setenv("DYNAMODB_ARTIFACTS_TABLE_NAME", TABLE)
+        monkeypatch.setenv("S3_ARTIFACTS_BUCKET_NAME", BUCKET)
+
+        app = FastAPI()
+        app.include_router(artifacts_router)
+        app.dependency_overrides[get_current_user_from_session] = (
+            lambda: User(
+                email="u@x.com", user_id=USER_ID, name="U", roles=[]
+            )
+        )
+        yield (
+            TestClient(app),
+            boto3.resource("dynamodb", region_name=REGION),
+            s3,
+        )
+
+
+def _put(
+    ddb,
+    s3,
+    *,
+    user_id: str = USER_ID,
+    artifact: str = "art-1",
+    version: int = 1,
+    content_type: str = "text/html; charset=utf-8",
+    body: bytes = b"<h1>hi</h1>",
+    write_object: bool = True,
+    content_key: str | None = None,
+) -> None:
+    key = content_key
+    if key is None:
+        key = f"{user_id}/{artifact}/v{version}/index.html"
+    ddb.Table(TABLE).put_item(
+        Item={
+            "PK": f"USER#{user_id}",
+            "SK": f"ARTIFACT#{artifact}#V#{version:05d}",
+            "storage": "s3",
+            "content_key": key,
+            "content_type": content_type,
+        }
+    )
+    if write_object:
+        s3.put_object(Bucket=BUCKET, Key=key, Body=body)
+
+
+def _markdown_wrapper(md: str) -> bytes:
+    b64 = base64.b64encode(md.encode("utf-8")).decode("ascii")
+    return (
+        "<!doctype html><html><body><main>Rendering…</main>"
+        '<script type="application/x-markdown-base64" '
+        f'id="md-src">{b64}</script>'
+        "<script type=\"module\">/* render */</script>"
+        "</body></html>"
+    ).encode("utf-8")
+
+
+def test_happy_path_returns_raw_source(client) -> None:
+    tc, ddb, s3 = client
+    _put(ddb, s3, body=b"<h1>Hello</h1>")
+    resp = tc.get("/artifacts/art-1/content", params={"version": 1})
+    assert resp.status_code == 200
+    body = resp.json()
+    assert body["content"] == "<h1>Hello</h1>"
+    assert body["content_type"] == "text/html; charset=utf-8"
+    assert body["version"] == 1
+
+
+def test_markdown_is_unwrapped_to_authored_source(client) -> None:
+    tc, ddb, s3 = client
+    md = "# Title\n\nSome **bold** text.\n"
+    _put(
+        ddb,
+        s3,
+        content_type="text/markdown",
+        body=_markdown_wrapper(md),
+    )
+    resp = tc.get("/artifacts/art-1/content", params={"version": 1})
+    assert resp.status_code == 200
+    body = resp.json()
+    assert body["content"] == md
+    assert body["content_type"] == "text/markdown"
+
+
+def test_markdown_without_src_tag_falls_back_to_raw(client) -> None:
+    """A Markdown row whose object lacks the embed (legacy / future
+    template) returns the raw stored bytes + real type, not an error."""
+    tc, ddb, s3 = client
+    _put(
+        ddb,
+        s3,
+        content_type="text/markdown",
+        body=b"<html><body>no embed here</body></html>",
+    )
+    resp = tc.get("/artifacts/art-1/content", params={"version": 1})
+    assert resp.status_code == 200
+    body = resp.json()
+    assert "no embed here" in body["content"]
+    assert body["content_type"] == "text/markdown"
+
+
+def test_unknown_version_is_404(client) -> None:
+    tc, _, _ = client
+    resp = tc.get("/artifacts/art-1/content", params={"version": 1})
+    assert resp.status_code == 404
+
+
+def test_other_users_artifact_is_404(client) -> None:
+    tc, ddb, s3 = client
+    _put(ddb, s3, user_id="someone-else")
+    resp = tc.get("/artifacts/art-1/content", params={"version": 1})
+    assert resp.status_code == 404
+
+
+def test_missing_s3_object_is_404(client) -> None:
+    tc, ddb, s3 = client
+    _put(ddb, s3, write_object=False)
+    resp = tc.get("/artifacts/art-1/content", params={"version": 1})
+    assert resp.status_code == 404
+
+
+def test_oversized_artifact_is_413(client, monkeypatch) -> None:
+    tc, ddb, s3 = client
+    monkeypatch.setattr(artifact_service, "_MAX_CONTENT_BYTES", 16)
+    _put(ddb, s3, body=b"x" * 64)
+    resp = tc.get("/artifacts/art-1/content", params={"version": 1})
+    assert resp.status_code == 413
+
+
+def test_missing_bucket_is_500(client, monkeypatch) -> None:
+    tc, ddb, s3 = client
+    _put(ddb, s3)
+    monkeypatch.delenv("S3_ARTIFACTS_BUCKET_NAME", raising=False)
+    artifact_service._reset_caches_for_tests()
+    resp = tc.get("/artifacts/art-1/content", params={"version": 1})
+    assert resp.status_code == 500
+
+
+def test_version_must_be_positive(client) -> None:
+    tc, ddb, s3 = client
+    _put(ddb, s3)
+    resp = tc.get("/artifacts/art-1/content", params={"version": 0})
+    assert resp.status_code == 422
+
+
+def test_requires_authentication() -> None:
+    app = FastAPI()
+    app.include_router(artifacts_router)
+    resp = TestClient(app).get(
+        "/artifacts/art-1/content", params={"version": 1}
+    )
+    assert resp.status_code == 401
diff --git a/backend/tests/apis/app_api/artifacts/test_list_artifacts.py b/backend/tests/apis/app_api/artifacts/test_list_artifacts.py
new file mode 100644
index 00000000..82e25200
--- /dev/null
+++ b/backend/tests/apis/app_api/artifacts/test_list_artifacts.py
@@ -0,0 +1,365 @@
+"""Tests for the app-api session artifacts list endpoint.
+
+The endpoint returns *every version* of every artifact in a session via
+a two-step query: SessionIndex (HEAD rows only) to discover the
+artifacts, then a per-artifact main-table query on the version-row
+prefix (`SK begins_with ARTIFACT#<id>#V#`) for all immutable version
+rows. The SPA renders one card per version, anchored to the turn that
+produced it via the per-version `produced_by_message_index` the writer stamps.
+"""
+
+from __future__ import annotations
+
+import boto3
+import pytest
+from botocore.exceptions import ClientError
+from fastapi import FastAPI
+from fastapi.testclient import TestClient
+from moto import mock_aws
+
+from apis.app_api.artifacts import service as artifact_service
+from apis.app_api.artifacts.routes import router as artifacts_router
+from apis.app_api.artifacts.service import (
+    ArtifactListService,
+    ArtifactQueryError,
+    RenderTokenConfigError,
+    get_artifact_list_service,
+)
+from apis.shared.auth import User, get_current_user_from_session
+
+TABLE = "test-user-artifacts"
+REGION = "us-east-1"
+USER_ID = "user-123"
+SESSION = "sess-9"
+
+
+@pytest.fixture(autouse=True)
+def _reset_caches() -> None:
+    artifact_service._reset_caches_for_tests()
+
+
+@pytest.fixture
+def client(monkeypatch: pytest.MonkeyPatch):
+    with mock_aws():
+        monkeypatch.setenv("AWS_REGION", REGION)
+        ddb = boto3.client("dynamodb", region_name=REGION)
+        ddb.create_table(
+            TableName=TABLE,
+            KeySchema=[
+                {"AttributeName": "PK", "KeyType": "HASH"},
+                {"AttributeName": "SK", "KeyType": "RANGE"},
+            ],
+            AttributeDefinitions=[
+                {"AttributeName": "PK", "AttributeType": "S"},
+                {"AttributeName": "SK", "AttributeType": "S"},
+                {"AttributeName": "GSI1PK", "AttributeType": "S"},
+                {"AttributeName": "GSI1SK", "AttributeType": "S"},
+            ],
+            BillingMode="PAY_PER_REQUEST",
+            GlobalSecondaryIndexes=[
+                {
+                    "IndexName": "SessionIndex",
+                    "KeySchema": [
+                        {"AttributeName": "GSI1PK", "KeyType": "HASH"},
+                        {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
+                    ],
+                    "Projection": {"ProjectionType": "ALL"},
+                }
+            ],
+        )
+
+        monkeypatch.setenv("DYNAMODB_ARTIFACTS_TABLE_NAME", TABLE)
+
+        app = FastAPI()
+        app.include_router(artifacts_router)
+        app.dependency_overrides[get_current_user_from_session] = lambda: User(
+            email="u@x.com", user_id=USER_ID, name="U", roles=[]
+        )
+        yield TestClient(app), boto3.resource("dynamodb", region_name=REGION)
+
+
+def _put_version(
+    ddb,
+    *,
+    artifact: str,
+    version: int,
+    user_id: str = USER_ID,
+    session_id: str = SESSION,
+    title: str = "Doc",
+    updated_at: str | None = "2026-05-15T10:00:00+00:00",
+    created_at: str = "2026-05-15T10:00:00+00:00",
+    produced_by: int | None = None,
+) -> None:
+    """One immutable version row, mirroring the writer. `updated_at` /
+    `produced_by` left as None models a row written before per-version linkage."""
+    item = {
+        "PK": f"USER#{user_id}",
+        "SK": f"ARTIFACT#{artifact}#V#{version:05d}",
+        "storage": "s3",
+        "content_key": f"{user_id}/{artifact}/v{version}/index.html",
+        "content_type": "text/html; charset=utf-8",
+        "version": version,
+        "artifact_id": artifact,
+        "user_id": user_id,
+        "session_id": session_id,
+        "title": title,
+        "created_at": created_at,
+    }
+    if updated_at is not None:
+        item["updated_at"] = updated_at
+    if produced_by is not None:
+        item["produced_by_message_index"] = produced_by
+    ddb.Table(TABLE).put_item(Item=item)
+
+
+def _put_head(
+    ddb,
+    *,
+    artifact: str,
+    head_version: int,
+    user_id: str = USER_ID,
+    session_id: str = SESSION,
+    updated_at: str = "2026-05-15T10:00:00+00:00",
+    title: str = "Doc",
+) -> None:
+    """The HEAD pointer row — carries the SessionIndex GSI keys used for
+    step-1 artifact discovery."""
+    ddb.Table(TABLE).put_item(
+        Item={
+            "PK": f"USER#{user_id}",
+            "SK": f"ARTIFACT#{artifact}#HEAD",
+            "GSI1PK": f"SESSION#{session_id}",
+            "GSI1SK": f"ARTIFACT#{updated_at}#{artifact}",
+            "storage": "s3",
+            "content_key": f"{user_id}/{artifact}/v{head_version}/index.html",
+            "content_type": "text/html; charset=utf-8",
+            "version": head_version,
+            "artifact_id": artifact,
+            "user_id": user_id,
+            "session_id": session_id,
+            "title": title,
+            "created_at": "2026-05-15T10:00:00+00:00",
+            "updated_at": updated_at,
+        }
+    )
+
+
+def _put_artifact(
+    ddb,
+    *,
+    artifact: str,
+    versions: list[dict],
+    user_id: str = USER_ID,
+    session_id: str = SESSION,
+) -> None:
+    """N immutable version rows plus a HEAD at the latest — exactly what
+    the writer leaves after a create + updates sequence."""
+    for v in versions:
+        _put_version(
+            ddb,
+            artifact=artifact,
+            user_id=user_id,
+            session_id=session_id,
+            **v,
+        )
+    last = max(versions, key=lambda v: v["version"])
+    _put_head(
+        ddb,
+        artifact=artifact,
+        head_version=last["version"],
+        user_id=user_id,
+        session_id=session_id,
+        updated_at=last.get("updated_at") or "2026-05-15T10:00:00+00:00",
+        title=last.get("title", "Doc"),
+    )
+
+
+def test_empty_session_is_empty_list(client) -> None:
+    tc, _ = client
+    resp = tc.get("/artifacts", params={"session_id": SESSION})
+    assert resp.status_code == 200
+    assert resp.json() == {"artifacts": []}
+
+
+def test_returns_every_version_newest_artifact_first(client) -> None:
+    tc, ddb = client
+    _put_artifact(
+        ddb,
+        artifact="old",
+        versions=[
+            {"version": 1, "updated_at": "2026-05-15T10:00:00+00:00", "title": "Old"}
+        ],
+    )
+    _put_artifact(
+        ddb,
+        artifact="new",
+        versions=[
+            {"version": 1, "updated_at": "2026-05-15T11:00:00+00:00", "title": "New"},
+            {"version": 2, "updated_at": "2026-05-15T11:30:00+00:00", "title": "New"},
+            {"version": 3, "updated_at": "2026-05-15T12:00:00+00:00", "title": "New"},
+        ],
+    )
+
+    arts = tc.get("/artifacts", params={"session_id": SESSION}).json()[
+        "artifacts"
+    ]
+    # Every version of every artifact is present.
+    assert {(a["artifact_id"], a["version"]) for a in arts} == {
+        ("new", 1),
+        ("new", 2),
+        ("new", 3),
+        ("old", 1),
+    }
+    # Step-1 discovery is HEAD-newest-first, so all of "new"'s versions
+    # come before "old"'s.
+    ids = [a["artifact_id"] for a in arts]
+    assert set(ids[:3]) == {"new"}
+    assert ids[-1] == "old"
+
+
+def test_per_version_produced_by_index(client) -> None:
+    """Each version row carries its own linkage index so the SPA can
+    anchor every version's card under the turn that produced it. A row without
+    one (pre-linkage) is null and lands in the SPA end-of-conversation strip."""
+    tc, ddb = client
+    _put_artifact(
+        ddb,
+        artifact="art-1",
+        versions=[
+            {"version": 1, "updated_at": "2026-05-15T11:00:00+00:00", "produced_by": 3},
+            {"version": 2, "updated_at": "2026-05-15T12:00:00+00:00", "produced_by": 7},
+            {"version": 3, "updated_at": "2026-05-15T12:30:00+00:00"},
+        ],
+    )
+    arts = tc.get("/artifacts", params={"session_id": SESSION}).json()[
+        "artifacts"
+    ]
+    by_v = {a["version"]: a for a in arts}
+    assert by_v[1]["produced_by_message_index"] == 3
+    assert by_v[2]["produced_by_message_index"] == 7
+    assert by_v[3]["produced_by_message_index"] is None
+
+
+def test_legacy_version_rows_degrade_gracefully(client) -> None:
+    """Version rows written before per-version linkage lack updated_at /
+    produced_by_message_index. They must still be returned (empty/null)
+    so the SPA shows them in the strip rather than dropping them."""
+    tc, ddb = client
+    _put_artifact(
+        ddb,
+        artifact="legacy",
+        versions=[
+            {"version": 1, "updated_at": None},
+            {"version": 2, "updated_at": None},
+        ],
+    )
+    arts = tc.get("/artifacts", params={"session_id": SESSION}).json()[
+        "artifacts"
+    ]
+    assert {a["version"] for a in arts} == {1, 2}
+    for a in arts:
+        assert a["updated_at"] == ""
+        assert a["produced_by_message_index"] is None
+
+
+def test_created_at_present_on_each_version(client) -> None:
+    tc, ddb = client
+    _put_artifact(
+        ddb,
+        artifact="art-1",
+        versions=[
+            {"version": 1},
+            {"version": 2, "updated_at": "2026-05-15T12:00:00+00:00"},
+        ],
+    )
+    arts = tc.get("/artifacts", params={"session_id": SESSION}).json()[
+        "artifacts"
+    ]
+    assert all(
+        a["created_at"] == "2026-05-15T10:00:00+00:00" for a in arts
+    )
+
+
+def test_other_users_artifact_is_filtered(client) -> None:
+    """Step 1 drops a HEAD owned by another user that happens to share
+    the queried session id; step 2 is PK=USER#{caller}, so their version
+    rows are never read even if a HEAD leaked."""
+    tc, ddb = client
+    _put_artifact(ddb, artifact="mine", versions=[{"version": 1}])
+    _put_artifact(
+        ddb,
+        artifact="theirs",
+        versions=[{"version": 1}],
+        user_id="someone-else",
+    )
+
+    arts = tc.get("/artifacts", params={"session_id": SESSION}).json()[
+        "artifacts"
+    ]
+    assert {a["artifact_id"] for a in arts} == {"mine"}
+
+
+def test_session_id_required(client) -> None:
+    tc, _ = client
+    resp = tc.get("/artifacts")
+    assert resp.status_code == 422
+
+
+def test_requires_authentication() -> None:
+    app = FastAPI()
+    app.include_router(artifacts_router)
+    resp = TestClient(app).get("/artifacts", params={"session_id": SESSION})
+    assert resp.status_code == 401
+
+
+def test_transient_query_error_is_not_a_config_error(
+    client, monkeypatch
+) -> None:
+    """A transient DynamoDB ClientError is a runtime query failure, not a
+    misconfiguration — it must surface as ArtifactQueryError so the route
+    can distinguish a configured-but-throttled feature from a broken one."""
+
+    class _ThrottlingTable:
+        def query(self, **_):
+            raise ClientError(
+                {"Error": {"Code": "ThrottlingException", "Message": "slow down"}},
+                "Query",
+            )
+
+    monkeypatch.setattr(artifact_service, "_table", lambda: _ThrottlingTable())
+    with pytest.raises(ArtifactQueryError):
+        ArtifactListService().list_for_session(
+            user_id=USER_ID, session_id=SESSION
+        )
+
+
+def test_route_maps_transient_query_failure_to_503(client) -> None:
+    """ArtifactQueryError → 503 (retryable), distinct from the 500 a real
+    RenderTokenConfigError misconfiguration produces."""
+    tc, _ = client
+
+    class _FailingService:
+        def list_for_session(self, **_):
+            raise ArtifactQueryError("artifact list query failed")
+
+    tc.app.dependency_overrides[get_artifact_list_service] = _FailingService
+    try:
+        resp = tc.get("/artifacts", params={"session_id": SESSION})
+    finally:
+        tc.app.dependency_overrides.pop(get_artifact_list_service, None)
+    assert resp.status_code == 503
+
+
+def test_route_maps_misconfig_to_500(client) -> None:
+    tc, _ = client
+
+    class _MisconfiguredService:
+        def list_for_session(self, **_):
+            raise RenderTokenConfigError("DYNAMODB_ARTIFACTS_TABLE_NAME is not set")
+
+    tc.app.dependency_overrides[get_artifact_list_service] = _MisconfiguredService
+    try:
+        resp = tc.get("/artifacts", params={"session_id": SESSION})
+    finally:
+        tc.app.dependency_overrides.pop(get_artifact_list_service, None)
+    assert resp.status_code == 500
diff --git a/backend/tests/apis/app_api/artifacts/test_render_token.py b/backend/tests/apis/app_api/artifacts/test_render_token.py
new file mode 100644
index 00000000..0aaca0ed
--- /dev/null
+++ b/backend/tests/apis/app_api/artifacts/test_render_token.py
@@ -0,0 +1,182 @@
+"""Tests for the app-api render-token minter.
+
+The headline test (`test_token_verifies_against_render_lambda`) mints a
+token and feeds it straight through #309's Lambda verifier with a shared
+signing key — that is the real cross-PR contract guarantee.
+"""
+
+from __future__ import annotations
+
+import boto3
+import jwt
+import pytest
+from fastapi import FastAPI
+from fastapi.testclient import TestClient
+from moto import mock_aws
+
+from apis.app_api.artifacts import service as token_service
+from apis.app_api.artifacts.routes import router as artifacts_router
+from apis.shared.auth import User, get_current_user_from_session
+from lambdas.artifact_render import handler as render_lambda
+
+KEY = "test-render-key-44-chars-of-entropy-aaaaaaaa"
+SECRET_NAME = "test-artifact-render-token-key"
+TABLE = "test-user-artifacts"
+ORIGIN = "https://artifacts.test.example.com"
+REGION = "us-east-1"
+USER_ID = "user-123"
+
+
+@pytest.fixture(autouse=True)
+def _reset_caches(monkeypatch: pytest.MonkeyPatch) -> None:
+    token_service._reset_caches_for_tests()
+    # The verifier caches its own signing key separately.
+    monkeypatch.setattr(render_lambda, "_cached_signing_key", None)
+
+
+@pytest.fixture
+def client(monkeypatch: pytest.MonkeyPatch):
+    with mock_aws():
+        monkeypatch.setenv("AWS_REGION", REGION)
+        sm = boto3.client("secretsmanager", region_name=REGION)
+        arn = sm.create_secret(Name=SECRET_NAME, SecretString=KEY)["ARN"]
+
+        ddb = boto3.client("dynamodb", region_name=REGION)
+        ddb.create_table(
+            TableName=TABLE,
+            KeySchema=[
+                {"AttributeName": "PK", "KeyType": "HASH"},
+                {"AttributeName": "SK", "KeyType": "RANGE"},
+            ],
+            AttributeDefinitions=[
+                {"AttributeName": "PK", "AttributeType": "S"},
+                {"AttributeName": "SK", "AttributeType": "S"},
+            ],
+            BillingMode="PAY_PER_REQUEST",
+        )
+
+        monkeypatch.setenv("ARTIFACTS_RENDER_TOKEN_SECRET_ARN", arn)
+        monkeypatch.setenv("DYNAMODB_ARTIFACTS_TABLE_NAME", TABLE)
+        monkeypatch.setenv("ARTIFACTS_ORIGIN", ORIGIN)
+
+        app = FastAPI()
+        app.include_router(artifacts_router)
+        app.dependency_overrides[get_current_user_from_session] = (
+            lambda: User(email="u@x.com", user_id=USER_ID, name="U", roles=[])
+        )
+        yield TestClient(app), boto3.resource("dynamodb", region_name=REGION)
+
+
+def _put_version(ddb, *, user_id: str = USER_ID, artifact="art-1", version=1) -> None:
+    ddb.Table(TABLE).put_item(
+        Item={
+            "PK": f"USER#{user_id}",
+            "SK": f"ARTIFACT#{artifact}#V#{version:05d}",
+            "storage": "s3",
+            "content_key": f"{user_id}/{artifact}/v{version}/index.html",
+            "content_type": "text/html; charset=utf-8",
+        }
+    )
+
+
+def _token_from_url(url: str) -> str:
+    assert url.startswith(f"{ORIGIN}/?t=")
+    return url.split("?t=", 1)[1]
+
+
+def test_happy_path_mints_valid_token(client) -> None:
+    tc, ddb = client
+    _put_version(ddb)
+    resp = tc.post(
+        "/artifacts/art-1/render-token", json={"version": 1, "sessionId": "sess-9"}
+    )
+    assert resp.status_code == 200
+    body = resp.json()
+    claims = jwt.decode(
+        _token_from_url(body["url"]),
+        KEY,
+        algorithms=["HS256"],
+        audience="artifact-render",
+    )
+    assert claims["iss"] == "app-api"
+    assert claims["sub"] == USER_ID
+    assert claims["aid"] == "art-1"
+    assert claims["ver"] == 1
+    assert claims["sid"] == "sess-9"
+    assert claims["exp"] - claims["iat"] == 120
+    assert body["expires_at"].endswith("+00:00")
+
+
+def test_token_verifies_against_render_lambda(client, monkeypatch) -> None:
+    """The cross-PR contract: a freshly minted token must pass the
+    actual #309 verifier byte-for-byte with the same signing key."""
+    tc, ddb = client
+    _put_version(ddb)
+    monkeypatch.setattr(render_lambda, "_cached_signing_key", KEY)
+
+    resp = tc.post("/artifacts/art-1/render-token", json={"version": 1})
+    token = _token_from_url(resp.json()["url"])
+
+    verified = render_lambda._verify_token(token)
+    assert verified["sub"] == USER_ID
+    assert verified["aid"] == "art-1"
+    assert verified["ver"] == 1
+
+
+def test_unknown_version_is_404(client) -> None:
+    tc, _ = client
+    resp = tc.post("/artifacts/art-1/render-token", json={"version": 1})
+    assert resp.status_code == 404
+
+
+def test_other_users_artifact_is_404(client) -> None:
+    """Ownership scoping: a record owned by someone else is invisible
+    because the PK is built from the authenticated user's id."""
+    tc, ddb = client
+    _put_version(ddb, user_id="someone-else")
+    resp = tc.post("/artifacts/art-1/render-token", json={"version": 1})
+    assert resp.status_code == 404
+
+
+def test_version_must_be_positive(client) -> None:
+    tc, ddb = client
+    _put_version(ddb)
+    resp = tc.post("/artifacts/art-1/render-token", json={"version": 0})
+    assert resp.status_code == 422
+
+
+def test_session_id_optional(client) -> None:
+    tc, ddb = client
+    _put_version(ddb)
+    resp = tc.post("/artifacts/art-1/render-token", json={"version": 1})
+    assert resp.status_code == 200
+    claims = jwt.decode(
+        _token_from_url(resp.json()["url"]),
+        KEY,
+        algorithms=["HS256"],
+        audience="artifact-render",
+    )
+    assert claims["sid"] == ""
+
+
+def test_missing_origin_is_500(client, monkeypatch) -> None:
+    """Fail-closed config: with ARTIFACTS_ORIGIN unset the service must
+    500 before minting — never hand back a usable token embedded in a
+    relative, unloadable URL. The artifact row exists, so a 500 (not a
+    404) proves the origin check fires first."""
+    tc, ddb = client
+    _put_version(ddb)
+    monkeypatch.delenv("ARTIFACTS_ORIGIN", raising=False)
+    resp = tc.post("/artifacts/art-1/render-token", json={"version": 1})
+    assert resp.status_code == 500
+
+
+def test_requires_authentication() -> None:
+    """No dependency override and no session cookie → the route is
+    blocked by the session dependency, never reaching mint logic."""
+    app = FastAPI()
+    app.include_router(artifacts_router)
+    resp = TestClient(app).post(
+        "/artifacts/art-1/render-token", json={"version": 1}
+    )
+    assert resp.status_code == 401
diff --git a/backend/tests/apis/app_api/test_mcp_apps_proxy_call.py b/backend/tests/apis/app_api/test_mcp_apps_proxy_call.py
new file mode 100644
index 00000000..ea514e03
--- /dev/null
+++ b/backend/tests/apis/app_api/test_mcp_apps_proxy_call.py
@@ -0,0 +1,134 @@
+"""Tests for the cookie-authenticated MCP App tools/call proxy (PR #5).
+
+Mirrors `test_proxy_routes.py`: the upstream client seam
+(`proxy_routes._build_upstream_client`) is swapped for a MockTransport so
+the relay to inference-api `/invocations` is asserted without a network.
+"""
+
+from __future__ import annotations
+
+import json
+from typing import Callable, Optional
+
+import httpx
+import pytest
+from fastapi import FastAPI
+from fastapi.testclient import TestClient
+
+from apis.app_api.chat import proxy_routes
+from apis.app_api.mcp_apps.routes import router as mcp_apps_router
+from apis.shared.auth.dependencies import get_current_user_from_session
+from apis.shared.auth.models import User
+
+
+def _user(raw_token: str = "access.token.value") -> User:
+    user = User(
+        email="alice@example.com",
+        user_id="user-sub",
+        name="Alice",
+        roles=["user"],
+    )
+    user.raw_token = raw_token
+    return user
+
+
+def _build_app(*, user_override: Optional[User] = None) -> FastAPI:
+    app = FastAPI()
+    app.include_router(mcp_apps_router)
+    if user_override is not None:
+        app.dependency_overrides[get_current_user_from_session] = (
+            lambda: user_override
+        )
+    return app
+
+
+def _patch_upstream(
+    monkeypatch: pytest.MonkeyPatch,
+    handler: Callable[[httpx.Request], httpx.Response],
+) -> None:
+    transport = httpx.MockTransport(handler)
+    monkeypatch.setattr(
+        proxy_routes,
+        "_build_upstream_client",
+        lambda: httpx.AsyncClient(transport=transport),
+    )
+
+
+_BODY = {
+    "sessionId": "sess-1",
+    "toolUseId": "tu-1",
+    "toolName": "widget_tool",
+    "arguments": {"q": "x"},
+    "enabledTools": ["gateway_widget"],
+    "modelId": "m1",
+}
+
+
+def test_requires_session() -> None:
+    # No auth override → get_current_user_from_session rejects.
+    resp = TestClient(_build_app()).post("/mcp-apps/proxy-call", json=_BODY)
+    assert resp.status_code == 401
+
+
+def test_relays_directive_and_bearer_then_returns_result(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    seen: dict = {}
+
+    def handler(request: httpx.Request) -> httpx.Response:
+        seen["url"] = str(request.url)
+        seen["auth"] = request.headers.get("Authorization")
+        seen["body"] = json.loads(request.content)
+        return httpx.Response(
+            200,
+            json={
+                "toolUseId": "tu-1",
+                "result": {"content": [{"type": "text", "text": "ok"}], "isError": False},
+            },
+        )
+
+    _patch_upstream(monkeypatch, handler)
+    app = _build_app(user_override=_user("tok-abc"))
+
+    resp = TestClient(app).post("/mcp-apps/proxy-call", json=_BODY)
+
+    assert resp.status_code == 200
+    assert resp.json()["result"]["content"][0]["text"] == "ok"
+    assert seen["url"].endswith("/invocations")
+    assert seen["auth"] == "Bearer tok-abc"
+    # Binding + directive values pass through unchanged; keys become snake_case.
+    assert seen["body"]["session_id"] == "sess-1"
+    assert seen["body"]["enabled_tools"] == ["gateway_widget"]
+    assert seen["body"]["app_tool_call"] == {
+        "tool_use_id": "tu-1",
+        "tool_name": "widget_tool",
+        "arguments": {"q": "x"},
+    }
+
+
+def test_relays_inference_error_status_verbatim(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    # inference-api rejected the tool as not app-visible (spec MUST gate).
+    def handler(_request: httpx.Request) -> httpx.Response:
+        return httpx.Response(403, json={"error": "not app-visible"})
+
+    _patch_upstream(monkeypatch, handler)
+    app = _build_app(user_override=_user())
+
+    resp = TestClient(app).post("/mcp-apps/proxy-call", json=_BODY)
+    assert resp.status_code == 403
+    assert resp.json()["error"] == "not app-visible"
+
+
+def test_maps_unreachable_inference_to_502(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    def handler(_request: httpx.Request) -> httpx.Response:
+        raise httpx.ConnectError("refused")
+
+    _patch_upstream(monkeypatch, handler)
+    app = _build_app(user_override=_user())
+
+    resp = TestClient(app).post("/mcp-apps/proxy-call", json=_BODY)
+    assert resp.status_code == 502
diff --git a/backend/tests/apis/app_api/test_mcp_apps_update_context.py b/backend/tests/apis/app_api/test_mcp_apps_update_context.py
new file mode 100644
index 00000000..a0b42861
--- /dev/null
+++ b/backend/tests/apis/app_api/test_mcp_apps_update_context.py
@@ -0,0 +1,125 @@
+"""Tests for the cookie-authenticated ui/update-model-context relay (PR #6).
+
+Mirrors `test_mcp_apps_proxy_call.py`: the upstream client seam is swapped
+for a MockTransport so the relay to inference-api `/invocations` is
+asserted without a network.
+"""
+
+from __future__ import annotations
+
+import json
+from typing import Callable, Optional
+
+import httpx
+import pytest
+from fastapi import FastAPI
+from fastapi.testclient import TestClient
+
+from apis.app_api.chat import proxy_routes
+from apis.app_api.mcp_apps.routes import router as mcp_apps_router
+from apis.shared.auth.dependencies import get_current_user_from_session
+from apis.shared.auth.models import User
+
+
+def _user(raw_token: str = "access.token.value") -> User:
+    user = User(
+        email="alice@example.com",
+        user_id="user-sub",
+        name="Alice",
+        roles=["user"],
+    )
+    user.raw_token = raw_token
+    return user
+
+
+def _build_app(*, user_override: Optional[User] = None) -> FastAPI:
+    app = FastAPI()
+    app.include_router(mcp_apps_router)
+    if user_override is not None:
+        app.dependency_overrides[get_current_user_from_session] = (
+            lambda: user_override
+        )
+    return app
+
+
+def _patch_upstream(
+    monkeypatch: pytest.MonkeyPatch,
+    handler: Callable[[httpx.Request], httpx.Response],
+) -> None:
+    transport = httpx.MockTransport(handler)
+    monkeypatch.setattr(
+        proxy_routes,
+        "_build_upstream_client",
+        lambda: httpx.AsyncClient(transport=transport),
+    )
+
+
+_BODY = {
+    "sessionId": "sess-1",
+    "resourceUri": "ui://srv/widget",
+    "content": [{"type": "text", "text": "user picked X"}],
+    "structuredContent": {"selection": "X"},
+    "enabledTools": ["gateway_widget"],
+    "modelId": "m1",
+}
+
+
+def test_requires_session() -> None:
+    resp = TestClient(_build_app()).post("/mcp-apps/update-context", json=_BODY)
+    assert resp.status_code == 401
+
+
+def test_relays_directive_and_bearer(monkeypatch: pytest.MonkeyPatch) -> None:
+    seen: dict = {}
+
+    def handler(request: httpx.Request) -> httpx.Response:
+        seen["url"] = str(request.url)
+        seen["auth"] = request.headers.get("Authorization")
+        seen["body"] = json.loads(request.content)
+        return httpx.Response(
+            200, json={"resourceUri": "ui://srv/widget", "status": "stored"}
+        )
+
+    _patch_upstream(monkeypatch, handler)
+    app = _build_app(user_override=_user("tok-xyz"))
+
+    resp = TestClient(app).post("/mcp-apps/update-context", json=_BODY)
+
+    assert resp.status_code == 200
+    assert resp.json()["status"] == "stored"
+    assert seen["url"].endswith("/invocations")
+    assert seen["auth"] == "Bearer tok-xyz"
+    assert seen["body"]["session_id"] == "sess-1"
+    assert seen["body"]["enabled_tools"] == ["gateway_widget"]
+    assert seen["body"]["app_context_update"] == {
+        "resource_uri": "ui://srv/widget",
+        "content": [{"type": "text", "text": "user picked X"}],
+        "structured_content": {"selection": "X"},
+    }
+
+
+def test_relays_inference_error_status_verbatim(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    def handler(_request: httpx.Request) -> httpx.Response:
+        return httpx.Response(400, json={"error": "needs content"})
+
+    _patch_upstream(monkeypatch, handler)
+    app = _build_app(user_override=_user())
+
+    resp = TestClient(app).post("/mcp-apps/update-context", json=_BODY)
+    assert resp.status_code == 400
+    assert resp.json()["error"] == "needs content"
+
+
+def test_maps_unreachable_inference_to_502(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    def handler(_request: httpx.Request) -> httpx.Response:
+        raise httpx.ConnectError("refused")
+
+    _patch_upstream(monkeypatch, handler)
+    app = _build_app(user_override=_user())
+
+    resp = TestClient(app).post("/mcp-apps/update-context", json=_BODY)
+    assert resp.status_code == 502
diff --git a/backend/tests/apis/inference_api/test_app_context_dispatch.py b/backend/tests/apis/inference_api/test_app_context_dispatch.py
new file mode 100644
index 00000000..fdefc1be
--- /dev/null
+++ b/backend/tests/apis/inference_api/test_app_context_dispatch.py
@@ -0,0 +1,137 @@
+"""Tests for app-pushed model context dispatch (MCP Apps PR #6).
+
+Uses a fake that faithfully mimics strands 1.40 `AgentState`: `.get()`
+returns a deep copy (so the read-modify-write path is genuinely
+exercised) and `.set()` enforces JSON-serializability. No live agent.
+"""
+
+import copy
+import json
+
+import pytest
+
+from apis.inference_api.chat.app_context_dispatch import (
+    STATE_KEY,
+    AppContextUpdateError,
+    dispatch_app_context_update,
+    merge_and_clear_pending_context,
+)
+
+
+class _FakeState:
+    """Mimics strands.agent.state.AgentState get/set semantics."""
+
+    def __init__(self) -> None:
+        self._data: dict = {}
+
+    def get(self, key=None):
+        if key is None:
+            return copy.deepcopy(self._data)
+        return copy.deepcopy(self._data.get(key))
+
+    def set(self, key: str, value) -> None:
+        json.dumps(value)  # raises TypeError/ValueError if not serializable
+        self._data[key] = copy.deepcopy(value)
+
+
+class _FakeStrands:
+    def __init__(self) -> None:
+        self.state = _FakeState()
+
+
+class _FakeAgent:
+    """BaseAgent wrapper — inner Strands agent is `.agent`."""
+
+    def __init__(self) -> None:
+        self.agent = _FakeStrands()
+
+
+def test_dispatch_writes_under_resource_uri_and_acks():
+    agent = _FakeAgent()
+    ack = dispatch_app_context_update(
+        agent,
+        resource_uri="ui://srv/widget",
+        content=[{"type": "text", "text": "hello"}],
+        structured_content={"count": 2},
+    )
+    assert ack == {
+        "resourceUri": "ui://srv/widget",
+        "status": "stored",
+        "pending": 1,
+    }
+    bag = agent.agent.state.get(STATE_KEY)
+    entry = bag["context"]["ui://srv/widget"]
+    assert entry["content"] == [{"type": "text", "text": "hello"}]
+    assert entry["structuredContent"] == {"count": 2}
+    assert "updatedAt" in entry
+
+
+def test_last_write_wins_per_resource_uri():
+    agent = _FakeAgent()
+    dispatch_app_context_update(
+        agent, resource_uri="ui://a", content=None, structured_content={"v": 1}
+    )
+    ack = dispatch_app_context_update(
+        agent, resource_uri="ui://a", content=None, structured_content={"v": 2}
+    )
+    assert ack["pending"] == 1  # same URI was overwritten, not appended
+    bag = agent.agent.state.get(STATE_KEY)
+    assert bag["context"]["ui://a"]["structuredContent"] == {"v": 2}
+
+
+def test_requires_content_or_structured():
+    with pytest.raises(AppContextUpdateError) as ei:
+        dispatch_app_context_update(
+            _FakeAgent(), resource_uri="ui://a", content=None, structured_content=None
+        )
+    assert ei.value.code == 400
+
+
+def test_missing_agent_state_is_409():
+    class _NoState:
+        agent = None
+
+    with pytest.raises(AppContextUpdateError) as ei:
+        dispatch_app_context_update(
+            _NoState(), resource_uri="ui://a", content=None,
+            structured_content={"x": 1},
+        )
+    assert ei.value.code == 409
+
+
+def test_merge_drains_clears_and_dedupes_by_uri():
+    agent = _FakeAgent()
+    dispatch_app_context_update(
+        agent, resource_uri="ui://a", content=None,
+        structured_content={"a": 1},
+    )
+    dispatch_app_context_update(
+        agent,
+        resource_uri="ui://b",
+        content=[{"type": "text", "text": "note-b"}],
+        structured_content=None,
+    )
+
+    block = merge_and_clear_pending_context(agent)
+    assert block is not None
+    assert "<mcp_app_context>" in block and "</mcp_app_context>" in block
+    assert 'resource="ui://a"' in block
+    assert 'resource="ui://b"' in block
+    assert "note-b" in block
+    assert '"a": 1' in block
+
+    # Cleared: a second merge with no new updates yields nothing.
+    assert merge_and_clear_pending_context(agent) is None
+
+
+def test_merge_empty_returns_none():
+    assert merge_and_clear_pending_context(_FakeAgent()) is None
+
+
+def test_merge_never_raises_on_bad_agent():
+    class _Broken:
+        agent = None
+
+    # _strands_agent would raise AppContextUpdateError(409); merge swallows
+    # it (context is best-effort and must never break a turn).
+    assert merge_and_clear_pending_context(_Broken()) is None
diff --git a/backend/tests/apis/inference_api/test_app_tool_dispatch.py b/backend/tests/apis/inference_api/test_app_tool_dispatch.py
new file mode 100644
index 00000000..d270b52c
--- /dev/null
+++ b/backend/tests/apis/inference_api/test_app_tool_dispatch.py
@@ -0,0 +1,167 @@
+"""Tests for app-initiated tools/call dispatch (MCP Apps PR #5).
+
+Mocks the boundary the way the PR #3 tests do: a fake MCP client +
+`UIToolCatalog`, no live agent. Asserts the spec-MUST app-visibility gate
+at the inference-api dispatch, and that a successful call publishes
+synthesized tool_use/tool_result into the per-session broker.
+"""
+
+import pytest
+
+from apis.inference_api.chat.app_tool_dispatch import (
+    AppToolCallError,
+    dispatch_app_tool_call,
+)
+from apis.shared.mcp_apps.broker import get_app_tool_event_broker
+from apis.shared.tools.models import ToolUIMetadata
+from agents.main_agent.integrations import mcp_apps as mcp_apps_mod
+
+
+class _FakeContent:
+    def __init__(self, text: str) -> None:
+        self._text = text
+
+    def model_dump(self, **_: object) -> dict:
+        return {"type": "text", "text": self._text}
+
+
+class _FakeResult:
+    def __init__(self, text: str = "ok", is_error: bool = False) -> None:
+        self.content = [_FakeContent(text)]
+        self.isError = is_error
+
+
+class _FakeClient:
+    def __init__(self, result=None, raises: Exception | None = None) -> None:
+        self._result = result if result is not None else _FakeResult()
+        self._raises = raises
+        self.calls: list = []
+
+    def call_tool_sync(self, tool_use_id, name, arguments=None):
+        self.calls.append((tool_use_id, name, arguments))
+        if self._raises is not None:
+            raise self._raises
+        return self._result
+
+
+class _FakeCatalog:
+    def __init__(self, meta=None, client=None) -> None:
+        self._meta = meta
+        self._client = client
+
+    def get(self, _name):
+        return self._meta
+
+    def get_client(self, _name):
+        return self._client
+
+
+def _patch(monkeypatch, *, enabled=True, meta=None, client=None):
+    monkeypatch.setattr(
+        mcp_apps_mod, "is_mcp_apps_host_enabled", lambda: enabled
+    )
+    monkeypatch.setattr(
+        mcp_apps_mod,
+        "get_ui_tool_catalog",
+        lambda: _FakeCatalog(meta=meta, client=client),
+    )
+
+
+def _ui(visibility):
+    return ToolUIMetadata(resource_uri="ui://srv/w", visibility=visibility)
+
+
+async def _call(session_id="disp-s1", tool_name="widget_tool"):
+    return await dispatch_app_tool_call(
+        agent=None,
+        session_id=session_id,
+        user_id="u1",
+        tool_use_id="tu-1",
+        tool_name=tool_name,
+        arguments={"q": "x"},
+    )
+
+
+@pytest.mark.asyncio
+async def test_rejects_when_host_flag_disabled(monkeypatch):
+    _patch(monkeypatch, enabled=False)
+    with pytest.raises(AppToolCallError) as ei:
+        await _call()
+    assert ei.value.code == 403
+
+
+@pytest.mark.asyncio
+async def test_rejects_unknown_tool(monkeypatch):
+    _patch(monkeypatch, enabled=True, meta=None, client=_FakeClient())
+    with pytest.raises(AppToolCallError) as ei:
+        await _call()
+    assert ei.value.code == 403
+
+
+@pytest.mark.asyncio
+async def test_rejects_tool_not_app_visible(monkeypatch):
+    # visibility=["model"] → callable by the model, NOT by an app.
+    _patch(
+        monkeypatch,
+        enabled=True,
+        meta=_ui(["model"]),
+        client=_FakeClient(),
+    )
+    with pytest.raises(AppToolCallError) as ei:
+        await _call()
+    assert ei.value.code == 403
+
+
+@pytest.mark.asyncio
+async def test_rejects_when_no_live_client(monkeypatch):
+    _patch(monkeypatch, enabled=True, meta=_ui(["model", "app"]), client=None)
+    with pytest.raises(AppToolCallError) as ei:
+        await _call()
+    assert ei.value.code == 409
+
+
+@pytest.mark.asyncio
+async def test_dispatch_failure_maps_to_502(monkeypatch):
+    _patch(
+        monkeypatch,
+        enabled=True,
+        meta=_ui(["app"]),
+        client=_FakeClient(raises=RuntimeError("boom")),
+    )
+    with pytest.raises(AppToolCallError) as ei:
+        await _call()
+    assert ei.value.code == 502
+
+
+@pytest.mark.asyncio
+async def test_success_returns_result_and_publishes_thread_events(monkeypatch):
+    client = _FakeClient(_FakeResult("hello"))
+    _patch(monkeypatch, enabled=True, meta=_ui(["model", "app"]), client=client)
+
+    broker = get_app_tool_event_broker()
+    q = broker.add_subscriber("disp-ok")
+    try:
+        payload = await dispatch_app_tool_call(
+            agent=None,
+            session_id="disp-ok",
+            user_id="u1",
+            tool_use_id="tu-9",
+            tool_name="widget_tool",
+            arguments={"q": "x"},
+        )
+    finally:
+        events = broker.drain(q)
+        broker.remove_subscriber("disp-ok", q)
+
+    assert payload["toolUseId"] == "tu-9"
+    assert payload["result"]["isError"] is False
+    assert payload["result"]["content"] == [{"type": "text", "text": "hello"}]
+    # The MCP client was called with a synthesized (distinct) id.
+    assert client.calls[0][1] == "widget_tool"
+    assert client.calls[0][0] != "tu-9"
+    # Both thread events were published, tool_use before tool_result.
+    types = [e["type"] for e in events]
+    assert types == ["tool_use", "tool_result"]
+    assert events[0]["data"]["tool_use"]["name"] == "widget_tool"
+    assert events[0]["data"]["tool_use"]["origin"] == "mcp_app"
+    assert events[1]["data"]["tool_result"]["status"] == "success"
diff --git a/backend/tests/apis/inference_api/test_inference_param_merge.py b/backend/tests/apis/inference_api/test_inference_param_merge.py
new file mode 100644
index 00000000..60801937
--- /dev/null
+++ b/backend/tests/apis/inference_api/test_inference_param_merge.py
@@ -0,0 +1,123 @@
+"""Tests for the inference-param merge guard in ``apis.inference_api.chat.routes``.
+
+Focus: the cross-param safety check that drops ``thinking`` when
+``thinking >= max_tokens`` (Anthropic rejects that request outright). Inference
+params arrive untyped (``Dict[str, Any]`` from JSON), so an int bound can show
+up as a float — an ``isinstance(..., int)`` gate used to silently skip the
+check on float input and let the bad request through.
+"""
+
+from __future__ import annotations
+
+from types import SimpleNamespace
+
+import pytest
+
+from apis.inference_api.chat.routes import _as_int_or_none, _merge_inference_params
+from apis.shared.models.models import ModelParamSpec, SupportedParams
+
+
+def _model(**specs: ModelParamSpec) -> SimpleNamespace:
+    """Minimal managed-model stand-in: only ``supported_params`` + ``model_id``."""
+    return SimpleNamespace(
+        model_id="test-model",
+        supported_params=SupportedParams(params=dict(specs)),
+    )
+
+
+# Wide bounds so request values pass through unclamped (and keep their
+# original float type), reproducing the JSON-sourced-float scenario.
+_WIDE_MAX_TOKENS = ModelParamSpec(supported=True, min=1, max=200000)
+_WIDE_THINKING = ModelParamSpec(supported=True, min=1024, max=None)
+
+
+class TestAsIntOrNone:
+    @pytest.mark.parametrize(
+        "value,expected",
+        [
+            (8192, 8192),
+            (8192.0, 8192),
+            (100000.0, 100000),
+            (True, None),
+            (False, None),
+            (None, None),
+            ("8192", None),
+            ({"type": "enabled"}, None),
+        ],
+    )
+    def test_coercion(self, value, expected):
+        assert _as_int_or_none(value) == expected
+
+
+class TestThinkingGuardFloatInput:
+    def test_float_thinking_ge_float_max_tokens_drops_thinking(self):
+        """The original bug: both arrive as floats, thinking >= max_tokens.
+        The guard must still fire and drop thinking."""
+        model = _model(max_tokens=_WIDE_MAX_TOKENS, thinking=_WIDE_THINKING)
+        merged = _merge_inference_params(
+            model, {"max_tokens": 2048.0, "thinking": 4096.0}
+        )
+
+        assert "thinking" not in merged
+        assert merged["max_tokens"] == 2048.0
+
+    def test_float_thinking_below_float_max_tokens_is_retained(self):
+        """Guard must not over-drop when the float values are consistent."""
+        model = _model(max_tokens=_WIDE_MAX_TOKENS, thinking=_WIDE_THINKING)
+        merged = _merge_inference_params(
+            model, {"max_tokens": 8192.0, "thinking": 2048.0}
+        )
+
+        assert merged["thinking"] == 2048.0
+        assert merged["max_tokens"] == 8192.0
+
+    def test_int_inputs_still_guarded(self):
+        """Pre-existing int path must keep working."""
+        model = _model(max_tokens=_WIDE_MAX_TOKENS, thinking=_WIDE_THINKING)
+        merged = _merge_inference_params(
+            model, {"max_tokens": 2048, "thinking": 4096}
+        )
+
+        assert "thinking" not in merged
+
+
+class TestEffortAllowedGating:
+    """`effort` is enum-gated: a request override must be a member of the
+    admin-declared `allowed` set, else it falls back to the default. The
+    per-model effort-tier difference (Sonnet 4.6 vs Opus 4.7) is data on
+    `ModelParamSpec.allowed`, not model-family code."""
+
+    _SONNET_EFFORT = ModelParamSpec(
+        supported=True, allowed=["low", "medium", "high"], default="high"
+    )
+    _OPUS_EFFORT = ModelParamSpec(
+        supported=True, allowed=["low", "medium", "high", "xhigh", "max"], default="high"
+    )
+
+    def test_in_domain_override_is_kept(self):
+        model = _model(effort=self._SONNET_EFFORT)
+        merged = _merge_inference_params(model, {"effort": "low"})
+        assert merged["effort"] == "low"
+
+    def test_out_of_domain_override_falls_back_to_default(self):
+        # `xhigh` is Opus-4.7-only; on a Sonnet-4.6-shaped spec it's rejected
+        # and the admin default wins instead of erroring mid-stream.
+        model = _model(effort=self._SONNET_EFFORT)
+        merged = _merge_inference_params(model, {"effort": "xhigh"})
+        assert merged["effort"] == "high"
+
+    def test_xhigh_allowed_on_opus_spec(self):
+        model = _model(effort=self._OPUS_EFFORT)
+        merged = _merge_inference_params(model, {"effort": "xhigh"})
+        assert merged["effort"] == "xhigh"
+
+    def test_no_override_uses_default(self):
+        model = _model(effort=self._SONNET_EFFORT)
+        merged = _merge_inference_params(model, {})
+        assert merged["effort"] == "high"
+
+    def test_out_of_domain_with_no_default_is_dropped(self):
+        spec = ModelParamSpec(supported=True, allowed=["low", "medium", "high"])
+        model = _model(effort=spec)
+        merged = _merge_inference_params(model, {"effort": "max"})
+        assert "effort" not in merged
diff --git a/backend/tests/apis/shared/middleware/test_session_refresh_bug_condition.py b/backend/tests/apis/shared/middleware/test_session_refresh_bug_condition.py
new file mode 100644
index 00000000..3445180c
--- /dev/null
+++ b/backend/tests/apis/shared/middleware/test_session_refresh_bug_condition.py
@@ -0,0 +1,739 @@
+"""Bug condition exploration property tests for SessionRefreshMiddleware event-loop blocking.
+
+Property 1: Bug Condition — Event-Loop Non-Blocking, Coalesced, Window-Staggered, Fire-and-Forget
+
+This file encodes the EXPECTED behavior (Property 1 / Expected Behavior 2.1–2.7) from
+the design document. Each sub-condition test surfaces a counterexample that demonstrates
+the corresponding sub-condition (1.1–1.7) of `isBugCondition` from design.md.
+
+CRITICAL: These tests MUST FAIL on unfixed code — failure confirms the bug exists.
+They will PASS after the fix (task 3 series) is implemented:
+  - Repository/Cognito offload via asyncio.to_thread (2.1, 2.2)
+  - Per-session single-flight for the resolve path (2.3)
+  - Strict-multiple windows (throttle=300s, leeway=60s) (2.4)
+  - Fire-and-forget slide-write (2.5)
+  - appApi.desiredCount >= 2 (2.6)
+  - Bounded blocking DDB calls across fan-out (2.7)
+
+Scoped PBT Approach: each sub-condition is reproduced by a concrete, deterministic
+scenario under pytest-asyncio. Hypothesis is used on the two sub-conditions that
+generalize over a family of inputs (fan-out size for 1.3 / 1.7).
+
+Validates: Requirements 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import secrets
+import time
+from pathlib import Path
+from typing import Any, Optional
+from unittest.mock import MagicMock
+
+import httpx
+import pytest
+from cryptography.hazmat.primitives.ciphers.aead import AESGCM
+from fastapi import FastAPI, Request
+from hypothesis import HealthCheck, given, settings
+from hypothesis import strategies as st
+
+from apis.shared.middleware.session_refresh import SessionRefreshMiddleware
+from apis.shared.sessions_bff import lock as lock_module
+from apis.shared.sessions_bff.cache import SessionCache
+from apis.shared.sessions_bff.config import (
+    BFFConfig,
+    SESSION_COOKIE_NAME,
+    _DEFAULT_REFRESH_LEEWAY_SECONDS,
+    _DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS,
+)
+from apis.shared.sessions_bff.cookie import CookieCodec
+from apis.shared.sessions_bff.lock import get_session_lock
+from apis.shared.sessions_bff.models import CookiePayload, SessionRecord
+from apis.shared.sessions_bff.refresh import (
+    CognitoRefreshClient,
+    _reset_secret_cache_for_tests,
+)
+from apis.shared.sessions_bff.repository import SessionRepository
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Fixtures and helpers
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class InstrumentedTable:
+    """Synchronous fake of a boto3 DynamoDB Table.
+
+    Records call counts and can inject a `time.sleep` delay to block the
+    event loop thread on unfixed code, letting us prove whether the caller
+    yielded to the loop while the boto3 call was in flight.
+
+    Mirrors the tiny subset of the Table API that `SessionRepository` uses:
+    `get_item`, `update_item`, `put_item`, `delete_item`.
+    """
+
+    def __init__(
+        self,
+        *,
+        record: Optional[SessionRecord] = None,
+        delay_s: float = 0.0,
+    ) -> None:
+        self._delay_s = delay_s
+        self._record = record
+        self.get_item_calls = 0
+        self.update_item_calls = 0
+        self.put_item_calls = 0
+        self.delete_item_calls = 0
+
+    def _sleep(self) -> None:
+        if self._delay_s > 0:
+            time.sleep(self._delay_s)
+
+    def get_item(self, Key: dict) -> dict:
+        self.get_item_calls += 1
+        self._sleep()
+        if self._record is None:
+            return {}
+        return {"Item": _record_to_item(self._record)}
+
+    def update_item(self, **kwargs: Any) -> dict:
+        self.update_item_calls += 1
+        self._sleep()
+        return {}
+
+    def put_item(self, Item: dict) -> dict:
+        self.put_item_calls += 1
+        self._sleep()
+        return {}
+
+    def delete_item(self, Key: dict) -> dict:
+        self.delete_item_calls += 1
+        self._sleep()
+        return {}
+
+
+def _record_to_item(r: SessionRecord) -> dict:
+    return {
+        "PK": f"SESSION#{r.session_id}",
+        "SK": "META",
+        "session_id": r.session_id,
+        "user_id": r.user_id,
+        "username": r.username,
+        "cognito_access_token": r.cognito_access_token,
+        "cognito_refresh_token": r.cognito_refresh_token,
+        "id_token": r.id_token,
+        "access_token_exp": r.access_token_exp,
+        "csrf_secret": r.csrf_secret,
+        "created_at": r.created_at,
+        "last_seen_at": r.last_seen_at,
+        "ttl": r.ttl,
+    }
+
+
+def _make_repo(table: InstrumentedTable) -> SessionRepository:
+    """Build a SessionRepository backed by an InstrumentedTable.
+
+    Bypasses boto3.resource() initialization by starting disabled, then
+    flipping `_enabled` and injecting the fake table. Exercises the real
+    SessionRepository async-method bodies — which is the point for
+    sub-condition 1.1 (offload).
+    """
+    repo = SessionRepository(table_name="")
+    repo._enabled = True
+    repo._table = table  # type: ignore[assignment]
+    repo._table_name = "test-bff-sessions"
+    return repo
+
+
+def _make_codec() -> CookieCodec:
+    codec = CookieCodec(kms_key_arn="arn:aws:kms:fake")
+    # Pre-inject an AES-GCM cipher so no KMS call is attempted.
+    codec._cipher = AESGCM(secrets.token_bytes(32))
+    return codec
+
+
+def _make_record(
+    *,
+    session_id: str = "sess-001",
+    access_token_exp: Optional[int] = None,
+    last_seen_at: Optional[int] = None,
+    created_at: Optional[int] = None,
+) -> SessionRecord:
+    now = int(time.time())
+    return SessionRecord(
+        session_id=session_id,
+        user_id="user-sub-001",
+        username="alice",
+        cognito_access_token="access.original",
+        cognito_refresh_token="refresh.original",
+        id_token="id.original",
+        access_token_exp=access_token_exp if access_token_exp is not None else now + 3600,
+        csrf_secret="csrf-secret-deadbeef",
+        created_at=created_at if created_at is not None else now,
+        last_seen_at=last_seen_at if last_seen_at is not None else now,
+        ttl=now + 28800,
+    )
+
+
+def _enabled_config(**overrides: Any) -> BFFConfig:
+    defaults: dict[str, Any] = dict(
+        sessions_table_name="tbl",
+        cookie_signing_key_arn="arn:aws:kms:fake",
+        session_ttl_seconds=28800,
+        refresh_leeway_seconds=_DEFAULT_REFRESH_LEEWAY_SECONDS,
+        cognito_bff_app_client_id="client-id",
+        cognito_bff_app_client_secret_arn="arn:secret",
+        inference_api_url=None,
+        absolute_lifetime_seconds=30 * 24 * 3600,
+        sliding_renewal_throttle_seconds=_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS,
+    )
+    defaults.update(overrides)
+    return BFFConfig(**defaults)
+
+
+def _build_app(
+    *,
+    config: BFFConfig,
+    repository: SessionRepository,
+    codec: CookieCodec,
+    refresh_client: Any,
+    cache: Optional[SessionCache] = None,
+) -> FastAPI:
+    app = FastAPI()
+    app.add_middleware(
+        SessionRefreshMiddleware,
+        config=config,
+        repository=repository,
+        cookie_codec=codec,
+        refresh_client=refresh_client,
+        cache=cache or SessionCache(ttl_seconds=60),
+    )
+
+    @app.get("/echo")
+    async def echo(request: Request) -> dict:
+        record = getattr(request.state, "bff_session", None)
+        return {
+            "has_session": record is not None,
+            "session_id": record.session_id if record else None,
+        }
+
+    return app
+
+
+@pytest.fixture(autouse=True)
+def _reset_session_state() -> Any:
+    """Clear process-wide state between tests so storm/coalescing behavior
+    stays independent across cases."""
+    lock_module._reset_for_tests()
+    _reset_secret_cache_for_tests()
+    yield
+    lock_module._reset_for_tests()
+    _reset_secret_cache_for_tests()
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Sub-condition 1.1 — SessionRepository.* must offload sync boto3 to a threadpool
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize(
+    "method_name",
+    ["get", "touch_last_seen", "update_tokens", "put", "delete"],
+)
+async def test_1_1_session_repository_methods_offload_sync_boto3(
+    method_name: str,
+) -> None:
+    """(1.1) Repository offload.
+
+    Each SessionRepository async method that wraps boto3 must execute its
+    boto3 call off the event loop thread. We prove this by running the
+    method concurrently with a 50ms marker coroutine against a 500ms
+    slow-stubbed table.
+
+    - Fixed code: marker completes in ~0.05s while repo call is still in flight.
+    - Unfixed code: sync boto3 freezes the loop for the full 500ms, starving
+      the marker so it only completes once the method returns.
+
+    Expected Behavior 2.1 (design.md).
+    """
+    record = _make_record(session_id=f"sess-1-1-{method_name}")
+    table = InstrumentedTable(record=record, delay_s=0.5)
+    repo = _make_repo(table)
+
+    now = int(time.time())
+    if method_name == "get":
+        op = repo.get(record.session_id)
+    elif method_name == "touch_last_seen":
+        op = repo.touch_last_seen(record.session_id, last_seen_at=now)
+    elif method_name == "update_tokens":
+        op = repo.update_tokens(
+            session_id=record.session_id,
+            access_token="access.rotated",
+            refresh_token="refresh.rotated",
+            id_token=None,
+            access_token_exp=now + 3600,
+            last_seen_at=now,
+        )
+    elif method_name == "put":
+        op = repo.put(record)
+    elif method_name == "delete":
+        op = repo.delete(record.session_id)
+    else:
+        pytest.fail(f"unknown method_name: {method_name}")
+
+    marker_elapsed: dict[str, float] = {}
+
+    async def marker(start: float) -> None:
+        await asyncio.sleep(0.05)
+        marker_elapsed["t"] = time.monotonic() - start
+
+    t0 = time.monotonic()
+    marker_task = asyncio.create_task(marker(t0))
+    await op
+    op_elapsed = time.monotonic() - t0
+    await marker_task
+
+    # Sanity: the stubbed boto3 call really took ~500ms.
+    assert op_elapsed >= 0.4, (
+        f"[1.1/{method_name}] Sanity: stubbed {method_name} should take ~500ms, "
+        f"got {op_elapsed:.3f}s — the InstrumentedTable delay may not be wired."
+    )
+    # Counterexample: on unfixed code, the marker sits behind the frozen loop.
+    assert "t" in marker_elapsed, (
+        f"[1.1/{method_name}] Marker coroutine never completed — "
+        f"event loop fully frozen by sync boto3."
+    )
+    assert marker_elapsed["t"] < 0.25, (
+        f"[1.1/{method_name}] Marker coroutine starved by sync boto3: "
+        f"marker elapsed={marker_elapsed['t']:.3f}s, "
+        f"op elapsed={op_elapsed:.3f}s. "
+        f"SessionRepository.{method_name} must offload its boto3 call via "
+        "asyncio.to_thread so the event loop continues scheduling other "
+        "coroutines for the round-trip duration."
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Sub-condition 1.2 — CognitoRefreshClient.refresh must offload initiate_auth
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@pytest.mark.asyncio
+async def test_1_2_cognito_refresh_offloads_sync_initiate_auth() -> None:
+    """(1.2) Cognito offload.
+
+    CognitoRefreshClient.refresh must execute cognito-idp:initiate_auth
+    off the event loop thread, including while the per-session
+    get_session_lock(session_id) is held. We prove this by running
+    refresh concurrently with:
+      (a) a 50ms marker coroutine;
+      (b) an unrelated get_session_lock(other_session_id) acquisition.
+
+    - Fixed code: both complete promptly while refresh is still in flight.
+    - Unfixed code: the sync initiate_auth freezes the loop, starving
+      the marker and delaying the unrelated lock acquisition.
+
+    Expected Behavior 2.2 (design.md).
+    """
+    slow_cognito = MagicMock()
+
+    def slow_initiate_auth(**_kwargs: Any) -> dict:
+        time.sleep(0.5)
+        return {
+            "AuthenticationResult": {
+                "AccessToken": "access.fresh",
+                "RefreshToken": "refresh.fresh",
+                "IdToken": "id.fresh",
+                "ExpiresIn": 3600,
+            }
+        }
+
+    slow_cognito.initiate_auth.side_effect = slow_initiate_auth
+
+    slow_secrets = MagicMock()
+    slow_secrets.get_secret_value.return_value = {"SecretString": "client-secret"}
+
+    client = CognitoRefreshClient(
+        app_client_id="client-id",
+        app_client_secret_arn="arn:secret",
+        cognito_idp_client=slow_cognito,
+        secrets_manager_client=slow_secrets,
+    )
+
+    marker_elapsed: dict[str, float] = {}
+    lock_elapsed: dict[str, float] = {}
+    refresh_elapsed: dict[str, float] = {}
+
+    async def call_refresh(start: float) -> None:
+        result = client.refresh(username="alice", refresh_token="refresh.original")
+        # Support both the unfixed (sync) and fixed (coroutine) shape.
+        if asyncio.iscoroutine(result):
+            result = await result
+        refresh_elapsed["t"] = time.monotonic() - start
+
+    async def marker(start: float) -> None:
+        await asyncio.sleep(0.05)
+        marker_elapsed["t"] = time.monotonic() - start
+
+    async def acquire_other_lock(start: float) -> None:
+        other_lock = get_session_lock("other-session-id")
+        async with other_lock:
+            pass
+        lock_elapsed["t"] = time.monotonic() - start
+
+    t0 = time.monotonic()
+    marker_task = asyncio.create_task(marker(t0))
+    other_lock_task = asyncio.create_task(acquire_other_lock(t0))
+    await call_refresh(t0)
+    await marker_task
+    await other_lock_task
+
+    # Sanity: the stubbed initiate_auth really took ~500ms.
+    assert refresh_elapsed.get("t", 0.0) >= 0.4, (
+        f"[1.2] Sanity: stubbed refresh should take ~500ms, "
+        f"got {refresh_elapsed.get('t', 0.0):.3f}s — stub not wired."
+    )
+    assert "t" in marker_elapsed, (
+        "[1.2] Marker coroutine never completed — loop fully frozen."
+    )
+    assert marker_elapsed["t"] < 0.25, (
+        f"[1.2] Marker coroutine starved by sync Cognito initiate_auth: "
+        f"marker elapsed={marker_elapsed['t']:.3f}s, "
+        f"refresh elapsed={refresh_elapsed['t']:.3f}s. "
+        "CognitoRefreshClient.refresh must offload initiate_auth via "
+        "asyncio.to_thread so other coroutines — including those for "
+        "different session_ids — make progress while the per-session "
+        "asyncio.Lock is held."
+    )
+    assert lock_elapsed["t"] < 0.25, (
+        f"[1.2] Unrelated get_session_lock('other-session-id') acquisition "
+        f"starved by sync Cognito call: lock elapsed={lock_elapsed['t']:.3f}s, "
+        f"refresh elapsed={refresh_elapsed['t']:.3f}s. "
+        "Even uncontended locks for different sessions block when the "
+        "event loop thread is frozen."
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Sub-condition 1.3 — Resolve-path coalescing: N concurrent reqs → 1 get_item
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize("fanout", [8])
+async def test_1_3_concurrent_same_session_fanout_coalesces_to_one_get_item(
+    fanout: int,
+) -> None:
+    """(1.3) Resolve-path coalescing.
+
+    N concurrent SessionRefreshMiddleware.dispatch calls for the same
+    session_id with a cold SessionCache and a valid sealed cookie must
+    result in exactly ONE DynamoDB get_item invocation. The upstream
+    unseal → SessionCache.get → SessionRepository.get path needs
+    coalescing via a per-session single-flight primitive.
+
+    - Fixed code: 1 get_item (single-flight leader + followers).
+    - Unfixed code: N get_item calls — the existing get_session_lock only
+      wraps the Cognito exchange, not the resolve path.
+
+    Expected Behavior 2.3 (design.md).
+    """
+    record = _make_record(session_id="sess-1-3")
+    # Small delay so concurrent dispatches overlap long enough for each
+    # to observe cache-miss independently on unfixed code.
+    table = InstrumentedTable(record=record, delay_s=0.05)
+    repo = _make_repo(table)
+    codec = _make_codec()
+    refresh_client = MagicMock()
+    cache = SessionCache(ttl_seconds=60)  # cold → cache miss
+    app = _build_app(
+        config=_enabled_config(),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+        cache=cache,
+    )
+
+    sealed = codec.seal(CookiePayload(session_id=record.session_id))
+    transport = httpx.ASGITransport(app=app)
+
+    async with httpx.AsyncClient(
+        transport=transport, base_url="http://test"
+    ) as client:
+        client.cookies.set(SESSION_COOKIE_NAME, sealed)
+        responses = await asyncio.gather(
+            *(client.get("/echo") for _ in range(fanout))
+        )
+
+    for r in responses:
+        assert r.status_code == 200
+
+    assert table.get_item_calls == 1, (
+        f"[1.3] Fan-out of {fanout} concurrent same-session requests against "
+        f"a cold cache must coalesce to exactly one get_item call. "
+        f"Observed: {table.get_item_calls} get_item calls (bug target: {fanout}). "
+        "A per-session asyncio.Future single-flight is required upstream of "
+        "SessionRepository.get."
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Sub-condition 1.4 — Cache window and slide throttle must be de-aligned
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@given(
+    throttle=st.just(_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS),
+    leeway=st.just(_DEFAULT_REFRESH_LEEWAY_SECONDS),
+)
+@settings(max_examples=1, deadline=None, suppress_health_check=[HealthCheck.function_scoped_fixture])
+def test_1_4a_default_throttle_is_strict_multiple_of_leeway(
+    throttle: int, leeway: int
+) -> None:
+    """(1.4) Window de-alignment — config invariant.
+
+    _DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS must be a strict multiple of
+    _DEFAULT_REFRESH_LEEWAY_SECONDS AND strictly greater. This de-aligns
+    cache-expiry (TTL = leeway) from slide-throttle expiry so a single
+    request crossing one boundary does not also cross the other.
+
+    - Fixed code: throttle=300, leeway=60 → 300 > 60 and 300 % 60 == 0.
+    - Unfixed code: both default to 60 → 60 > 60 is False.
+
+    Expected Behavior 2.4 (design.md).
+    """
+    assert throttle > leeway, (
+        f"[1.4a] Sliding-renewal throttle ({throttle}s) must be strictly "
+        f"greater than refresh leeway ({leeway}s) to de-align boundaries."
+    )
+    assert throttle % leeway == 0, (
+        f"[1.4a] Sliding-renewal throttle ({throttle}s) must be a strict "
+        f"multiple of refresh leeway ({leeway}s)."
+    )
+
+
+@pytest.mark.asyncio
+async def test_1_4b_single_request_at_boundary_skips_slide_write() -> None:
+    """(1.4) Window de-alignment — runtime behavior.
+
+    A single request with SessionCache TTL just elapsed AND
+    (now - last_seen_at) == refresh_leeway_seconds must issue AT MOST ONE
+    of {get_item, update_item} on the critical path. On unfixed code the
+    aligned 60s windows guarantee BOTH calls on the same request (the
+    cache miss drives get_item AND the past-throttle state drives
+    update_item).
+
+    Expected Behavior 2.4 (design.md).
+    """
+    now = int(time.time())
+    record = _make_record(
+        session_id="sess-1-4b",
+        last_seen_at=now - _DEFAULT_REFRESH_LEEWAY_SECONDS,
+    )
+    table = InstrumentedTable(record=record, delay_s=0.01)
+    repo = _make_repo(table)
+    codec = _make_codec()
+    refresh_client = MagicMock()
+    cache = SessionCache(ttl_seconds=60)  # cold → cache miss
+    # Use the real default throttle so the test fails on unfixed code
+    # (throttle == leeway == 60s) and passes on fixed code (throttle=300s,
+    # leeway=60s).
+    app = _build_app(
+        config=_enabled_config(
+            sliding_renewal_throttle_seconds=_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS,
+        ),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+        cache=cache,
+    )
+
+    sealed = codec.seal(CookiePayload(session_id=record.session_id))
+    transport = httpx.ASGITransport(app=app)
+
+    async with httpx.AsyncClient(
+        transport=transport, base_url="http://test"
+    ) as client:
+        client.cookies.set(SESSION_COOKIE_NAME, sealed)
+        response = await client.get("/echo")
+    assert response.status_code == 200
+
+    ddb_calls = table.get_item_calls + table.update_item_calls
+    assert ddb_calls <= 1, (
+        f"[1.4b] Single request at cache/throttle boundary issued "
+        f"{table.get_item_calls} get_item + {table.update_item_calls} "
+        f"update_item = {ddb_calls} DDB calls on critical path. "
+        "Windows must be de-aligned (throttle > leeway, strict multiple) "
+        "so a cache miss never also triggers a slide write."
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Sub-condition 1.5 — _maybe_slide must fire-and-forget the DDB write
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@pytest.mark.asyncio
+async def test_1_5_slide_write_is_fire_and_forget() -> None:
+    """(1.5) Fire-and-forget slide.
+
+    When a slide is warranted, the response path must NOT wait on
+    touch_last_seen. Stubbing update_item with a 500ms delay, the total
+    dispatch elapsed must stay well under 500ms.
+
+    - Fixed code: _maybe_slide schedules touch_last_seen as an
+      asyncio.Task and returns synchronously → elapsed ~= handler time.
+    - Unfixed code: _maybe_slide awaits touch_last_seen inline →
+      elapsed >= 500ms.
+
+    Expected Behavior 2.5 (design.md).
+    """
+    now = int(time.time())
+    record = _make_record(
+        session_id="sess-1-5",
+        last_seen_at=now - 3600,  # past any reasonable throttle window
+    )
+    table = InstrumentedTable(record=record, delay_s=0.5)
+    repo = _make_repo(table)
+    codec = _make_codec()
+    refresh_client = MagicMock()
+
+    # Pre-seed the cache so repo.get is not on the path — this test isolates
+    # the slide-write-on-response-path question from the coalescing question.
+    cache = SessionCache(ttl_seconds=60)
+    cache.set(record)
+
+    # Use a small throttle so the slide is warranted (last_seen == now-3600).
+    app = _build_app(
+        config=_enabled_config(sliding_renewal_throttle_seconds=60),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+        cache=cache,
+    )
+
+    sealed = codec.seal(CookiePayload(session_id=record.session_id))
+    transport = httpx.ASGITransport(app=app)
+
+    async with httpx.AsyncClient(
+        transport=transport, base_url="http://test"
+    ) as client:
+        client.cookies.set(SESSION_COOKIE_NAME, sealed)
+        t0 = time.monotonic()
+        response = await client.get("/echo")
+        elapsed = time.monotonic() - t0
+
+    assert response.status_code == 200
+    # Sanity: the slide write was in fact requested (at least once; in the
+    # fixed scenario it is still counted on the fake table, it just does
+    # not block the response path).
+    assert table.update_item_calls >= 1, (
+        f"[1.5] Sanity: the slide path should have fired update_item at least "
+        f"once, got {table.update_item_calls}. Check last_seen_at setup."
+    )
+    assert elapsed < 0.25, (
+        f"[1.5] Dispatch elapsed={elapsed:.3f}s; the response waited on the "
+        "500ms stubbed update_item. _maybe_slide must dispatch the DDB write "
+        "as a detached asyncio.Task so the response returns without blocking."
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Sub-condition 1.6 — Production deployment must have concurrency slack
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+def test_1_6_cdk_app_api_desired_count_at_least_two() -> None:
+    """(1.6) Concurrency slack at deployment.
+
+    infrastructure/cdk.context.json must set appApi.desiredCount >= 2 so
+    a single blocked event loop on one ECS task cannot stall all ingress.
+
+    Expected Behavior 2.6 (design.md).
+    """
+    cdk_context_path = (
+        Path(__file__).resolve().parents[5] / "infrastructure" / "cdk.context.json"
+    )
+    assert cdk_context_path.exists(), (
+        f"[1.6] Expected cdk.context.json at {cdk_context_path}"
+    )
+    ctx = json.loads(cdk_context_path.read_text())
+    app_api = ctx.get("appApi", {})
+    desired = app_api.get("desiredCount")
+    assert isinstance(desired, int) and desired >= 2, (
+        f"[1.6] appApi.desiredCount must be >= 2 in the production context "
+        f"(found: {desired!r}). Single-task deployment cannot absorb a "
+        "blocked event loop — a slow AWS call on one task halts every "
+        "concurrent request."
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Sub-condition 1.7 — Fan-out at cache boundary must not amplify to N*2 DDB calls
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize("fanout", [8])
+async def test_1_7_fanout_at_boundary_bounded_blocking_ddb_calls(
+    fanout: int,
+) -> None:
+    """(1.7) Fan-out amplification.
+
+    N concurrent requests for the same session at a cache-boundary moment
+    must produce AT MOST 2 blocking DDB calls across the entire fan-out
+    (ideally 1 get_item and 0 slide-writes when windows are de-aligned).
+
+    - Fixed code: single-flight + de-aligned windows → ≤ 1 get_item +
+      ≤ 1 update_item = ≤ 2.
+    - Unfixed code: each coroutine observes cache miss + past-throttle
+      independently on its local SessionRecord copy and issues its own
+      get_item + update_item → 2*N blocking calls.
+
+    Expected Behavior 2.7 (design.md).
+    """
+    now = int(time.time())
+    record = _make_record(
+        session_id="sess-1-7",
+        last_seen_at=now - _DEFAULT_REFRESH_LEEWAY_SECONDS,  # past aligned throttle on unfixed
+    )
+    table = InstrumentedTable(record=record, delay_s=0.01)
+    repo = _make_repo(table)
+    codec = _make_codec()
+    refresh_client = MagicMock()
+    cache = SessionCache(ttl_seconds=60)  # cold → cache miss
+    app = _build_app(
+        config=_enabled_config(
+            sliding_renewal_throttle_seconds=_DEFAULT_SLIDING_RENEWAL_THROTTLE_SECONDS,
+        ),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+        cache=cache,
+    )
+
+    sealed = codec.seal(CookiePayload(session_id=record.session_id))
+    transport = httpx.ASGITransport(app=app)
+
+    async with httpx.AsyncClient(
+        transport=transport, base_url="http://test"
+    ) as client:
+        client.cookies.set(SESSION_COOKIE_NAME, sealed)
+        responses = await asyncio.gather(
+            *(client.get("/echo") for _ in range(fanout))
+        )
+
+    for r in responses:
+        assert r.status_code == 200
+
+    blocking_calls = table.get_item_calls + table.update_item_calls
+    assert blocking_calls <= 2, (
+        f"[1.7] Fan-out of {fanout} concurrent same-session requests at a "
+        f"cache-boundary moment produced {table.get_item_calls} get_item + "
+        f"{table.update_item_calls} update_item = {blocking_calls} blocking "
+        f"DDB calls (bug: ~{2 * fanout}). Single-flight coalescing AND "
+        "window de-alignment are required."
+    )
diff --git a/backend/tests/apis/shared/middleware/test_session_refresh_preservation.py b/backend/tests/apis/shared/middleware/test_session_refresh_preservation.py
new file mode 100644
index 00000000..4f42c8db
--- /dev/null
+++ b/backend/tests/apis/shared/middleware/test_session_refresh_preservation.py
@@ -0,0 +1,1213 @@
+"""Preservation property tests for SessionRefreshMiddleware.
+
+Property 2: BFF Middleware Contracts Unchanged for Non-Buggy Inputs.
+
+This file encodes the observable contracts (Preservation Requirements 3.1–3.11)
+that the event-loop-blocking fix MUST preserve. Tests are run on UNFIXED code
+first and MUST PASS — confirming the baseline behavior to lock in. After the
+fix lands (task 3.x series) these same tests must continue to pass with no
+modifications.
+
+Observation-first methodology: each preservation test encodes behavior
+OBSERVED on today's code — response status, `Set-Cookie` headers (including
+every attribute), `request.state.bff_session`, `request.state.bff_csrf_token`,
+DDB call counts, Cognito call counts, KMS/Secrets Manager call counts — rather
+than re-derived from the spec.
+
+The hypothesis strategies cover the axes that exist today: `is_enabled()`
+true/false, `__Host-bff_session` cookie present/absent, cookie seal
+valid/invalid/expired, `SessionCache` hit/miss, `needs_refresh` yes/no,
+refresh-token rotation yes/no, slide warranted yes/no, absolute-lifetime cap
+passed yes/no, request method safe/unsafe. Inputs that themselves reproduce
+an isBugCondition sub-condition (fan-outs at aligned boundaries, slide timing
+vs response timing, etc.) are avoided — preservation is about the externally
+observable contract, not about how many DDB calls happen under bug-triggering
+inputs.
+
+Validates: Requirements 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11
+"""
+
+from __future__ import annotations
+
+import asyncio
+import secrets
+import time
+from typing import Any, Optional
+from unittest.mock import AsyncMock, MagicMock
+
+import httpx
+import pytest
+from cryptography.hazmat.primitives.ciphers.aead import AESGCM
+from fastapi import FastAPI, Request
+from fastapi.testclient import TestClient
+from hypothesis import HealthCheck, given, settings
+from hypothesis import strategies as st
+
+from apis.shared.middleware.csrf import CSRFMiddleware
+from apis.shared.middleware.session_refresh import SessionRefreshMiddleware
+from apis.shared.sessions_bff import cache as cache_module
+from apis.shared.sessions_bff import cookie as cookie_module
+from apis.shared.sessions_bff import lock as lock_module
+from apis.shared.sessions_bff import refresh as refresh_module
+from apis.shared.sessions_bff.cache import SessionCache
+from apis.shared.sessions_bff.config import (
+    BFFConfig,
+    CSRF_COOKIE_NAME,
+    CSRF_HEADER_NAME,
+    SESSION_COOKIE_NAME,
+    _DEFAULT_REFRESH_LEEWAY_SECONDS,
+)
+from apis.shared.sessions_bff.cookie import CookieCodec, get_default_codec
+from apis.shared.sessions_bff.csrf import CSRFHelper
+from apis.shared.sessions_bff.models import CookiePayload, SessionRecord
+from apis.shared.sessions_bff.refresh import (
+    CognitoRefreshClient,
+    CognitoRefreshError,
+    RefreshResult,
+    _reset_secret_cache_for_tests,
+    resolve_bff_client_secret,
+)
+from apis.shared.sessions_bff.repository import SessionRepository
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Shared helpers — duplicated from test_session_refresh_bug_condition.py for
+# test-file isolation. Keep the two files' helper shapes in sync.
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class InstrumentedTable:
+    """Synchronous fake of a boto3 DynamoDB Table.
+
+    Records call counts so preservation tests can assert "zero AWS calls"
+    for dormant / no-cookie pass-through paths, and "exactly one get_item"
+    for the refresh-storm coalescing contract.
+
+    `update_item` writes are classified into three kinds by inspecting the
+    `UpdateExpression`:
+      - `lock_acquire_calls`: cross-task refresh-lock acquisition (writes
+        `refresh_lock_owner` + `refresh_lock_until`, no token columns).
+      - `token_persist_calls`: token rotation write (sets
+        `cognito_access_token` etc., usually also REMOVE-ing the lock).
+      - `slide_calls`: sliding-renewal touch (writes only `last_seen_at`
+        and optionally `ttl`).
+    `update_item_calls` remains the total (sum) so existing assertions on
+    "any update_item issued" continue to hold. The injected side-effect is
+    applied only to the token-persist path so tests that simulate "DDB
+    throttled during persist" don't accidentally fail at the lock-acquire
+    write — that's a different code path with different recovery semantics.
+    """
+
+    def __init__(
+        self,
+        *,
+        record: Optional[SessionRecord] = None,
+        delay_s: float = 0.0,
+        update_item_side_effect: Optional[Exception] = None,
+    ) -> None:
+        self._delay_s = delay_s
+        self._record = record
+        self._update_item_side_effect = update_item_side_effect
+        self.get_item_calls = 0
+        self.update_item_calls = 0
+        self.lock_acquire_calls = 0
+        self.token_persist_calls = 0
+        self.slide_calls = 0
+        self.put_item_calls = 0
+        self.delete_item_calls = 0
+
+    def _sleep(self) -> None:
+        if self._delay_s > 0:
+            time.sleep(self._delay_s)
+
+    def get_item(self, Key: dict) -> dict:
+        self.get_item_calls += 1
+        self._sleep()
+        if self._record is None:
+            return {}
+        return {"Item": _record_to_item(self._record)}
+
+    @staticmethod
+    def _classify_update(update_expr: str) -> str:
+        """Classify which middleware path issued this update_item.
+
+        Token persist writes always set `cognito_access_token`. Pure lock
+        acquires write `refresh_lock_owner` without touching tokens. Slide
+        writes touch only `last_seen_at` (+ optionally `ttl`).
+        """
+        if "cognito_access_token" in update_expr:
+            return "token_persist"
+        if "refresh_lock_owner" in update_expr:
+            return "lock_acquire"
+        return "slide"
+
+    def update_item(self, **kwargs: Any) -> dict:
+        self.update_item_calls += 1
+        kind = self._classify_update(kwargs.get("UpdateExpression", ""))
+        if kind == "token_persist":
+            self.token_persist_calls += 1
+        elif kind == "lock_acquire":
+            self.lock_acquire_calls += 1
+        else:
+            self.slide_calls += 1
+        self._sleep()
+        # Side-effect injection applies only to the token-persist path —
+        # tests that simulate "rotation persist exhausted" mean exactly
+        # that write, not the upstream lock-acquire.
+        if self._update_item_side_effect is not None and kind == "token_persist":
+            raise self._update_item_side_effect
+        return {}
+
+    def put_item(self, Item: dict) -> dict:
+        self.put_item_calls += 1
+        self._sleep()
+        return {}
+
+    def delete_item(self, Key: dict) -> dict:
+        self.delete_item_calls += 1
+        self._sleep()
+        return {}
+
+
+def _record_to_item(r: SessionRecord) -> dict:
+    return {
+        "PK": f"SESSION#{r.session_id}",
+        "SK": "META",
+        "session_id": r.session_id,
+        "user_id": r.user_id,
+        "username": r.username,
+        "cognito_access_token": r.cognito_access_token,
+        "cognito_refresh_token": r.cognito_refresh_token,
+        "id_token": r.id_token,
+        "access_token_exp": r.access_token_exp,
+        "csrf_secret": r.csrf_secret,
+        "created_at": r.created_at,
+        "last_seen_at": r.last_seen_at,
+        "ttl": r.ttl,
+    }
+
+
+def _make_repo(table: InstrumentedTable) -> SessionRepository:
+    """SessionRepository backed by an InstrumentedTable.
+
+    Bypasses boto3.resource() by starting disabled, then flipping `_enabled`
+    and injecting the fake table. Exercises the real repository async-method
+    bodies so preservation tests see the production code path.
+    """
+    repo = SessionRepository(table_name="")
+    repo._enabled = True
+    repo._table = table  # type: ignore[assignment]
+    repo._table_name = "test-bff-sessions"
+    return repo
+
+
+def _make_codec() -> CookieCodec:
+    codec = CookieCodec(kms_key_arn="arn:aws:kms:fake")
+    codec._cipher = AESGCM(secrets.token_bytes(32))
+    return codec
+
+
+def _make_record(
+    *,
+    session_id: str = "sess-pres-001",
+    access_token_exp: Optional[int] = None,
+    last_seen_at: Optional[int] = None,
+    created_at: Optional[int] = None,
+    ttl: Optional[int] = None,
+) -> SessionRecord:
+    now = int(time.time())
+    return SessionRecord(
+        session_id=session_id,
+        user_id="user-sub-001",
+        username="alice",
+        cognito_access_token="access.original",
+        cognito_refresh_token="refresh.original",
+        id_token="id.original",
+        access_token_exp=access_token_exp if access_token_exp is not None else now + 3600,
+        csrf_secret="csrf-secret-deadbeef",
+        created_at=created_at if created_at is not None else now,
+        last_seen_at=last_seen_at if last_seen_at is not None else now,
+        ttl=ttl if ttl is not None else now + 28800,
+    )
+
+
+def _enabled_config(**overrides: Any) -> BFFConfig:
+    defaults: dict[str, Any] = dict(
+        sessions_table_name="tbl",
+        cookie_signing_key_arn="arn:aws:kms:fake",
+        session_ttl_seconds=28800,
+        refresh_leeway_seconds=_DEFAULT_REFRESH_LEEWAY_SECONDS,
+        cognito_bff_app_client_id="client-id",
+        cognito_bff_app_client_secret_arn="arn:secret",
+        inference_api_url=None,
+        absolute_lifetime_seconds=30 * 24 * 3600,
+        sliding_renewal_throttle_seconds=60,
+    )
+    defaults.update(overrides)
+    return BFFConfig(**defaults)
+
+
+def _disabled_config() -> BFFConfig:
+    return BFFConfig(
+        sessions_table_name=None,
+        cookie_signing_key_arn=None,
+        session_ttl_seconds=28800,
+        refresh_leeway_seconds=60,
+        cognito_bff_app_client_id=None,
+        cognito_bff_app_client_secret_arn=None,
+        inference_api_url=None,
+    )
+
+
+def _build_app(
+    *,
+    config: BFFConfig,
+    repository: Any,
+    codec: CookieCodec,
+    refresh_client: Any,
+    cache: Optional[SessionCache] = None,
+    include_csrf: bool = False,
+) -> FastAPI:
+    app = FastAPI()
+    if include_csrf:
+        # Added first → innermost relative to SessionRefreshMiddleware.
+        # Request order: SessionRefresh → CSRF → route.
+        app.add_middleware(CSRFMiddleware)
+    app.add_middleware(
+        SessionRefreshMiddleware,
+        config=config,
+        repository=repository,
+        cookie_codec=codec,
+        refresh_client=refresh_client,
+        cache=cache or SessionCache(ttl_seconds=60),
+    )
+
+    @app.get("/echo")
+    async def echo_get(request: Request) -> dict:
+        record = getattr(request.state, "bff_session", None)
+        csrf = getattr(request.state, "bff_csrf_token", None)
+        return {
+            "has_session": record is not None,
+            "session_id": record.session_id if record else None,
+            "access_token": record.cognito_access_token if record else None,
+            "csrf_token": csrf,
+        }
+
+    @app.post("/submit")
+    async def submit_post(request: Request) -> dict:
+        record = getattr(request.state, "bff_session", None)
+        return {
+            "has_session": record is not None,
+            "session_id": record.session_id if record else None,
+        }
+
+    return app
+
+
+@pytest.fixture(autouse=True)
+def _reset_session_state() -> Any:
+    """Clear process-wide state between tests."""
+    lock_module._reset_for_tests()
+    _reset_secret_cache_for_tests()
+    cache_module._reset_default_cache_for_tests()
+    cookie_module._reset_default_codec_for_tests()
+    yield
+    lock_module._reset_for_tests()
+    _reset_secret_cache_for_tests()
+    cache_module._reset_default_cache_for_tests()
+    cookie_module._reset_default_codec_for_tests()
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Set-Cookie parsing helpers — the preservation contract on cookie attributes
+# is observed from the raw `Set-Cookie` header, so we parse it here.
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+def _parse_set_cookie(header: str) -> dict[str, Any]:
+    """Parse a raw Set-Cookie header into {name, value, attributes}.
+
+    Attributes are keyed case-folded for reliable membership checks.
+    Boolean attributes (HttpOnly, Secure) map to True.
+    """
+    parts = [p.strip() for p in header.split(";")]
+    name, _, value = parts[0].partition("=")
+    attrs: dict[str, Any] = {}
+    for attr in parts[1:]:
+        if "=" in attr:
+            k, _, v = attr.partition("=")
+            attrs[k.strip().lower()] = v.strip()
+        else:
+            attrs[attr.strip().lower()] = True
+    return {"name": name.strip(), "value": value.strip(), "attrs": attrs}
+
+
+def _find_set_cookies(
+    response_headers: Any, cookie_name: str
+) -> list[dict[str, Any]]:
+    """Return every parsed Set-Cookie for a given cookie name."""
+    parsed = []
+    for header in response_headers.get_list("set-cookie"):
+        pc = _parse_set_cookie(header)
+        if pc["name"] == cookie_name:
+            parsed.append(pc)
+    return parsed
+
+
+def _wait_for(predicate: Any, *, timeout_s: float = 1.0, interval_s: float = 0.01) -> bool:
+    """Poll ``predicate`` until it returns truthy or ``timeout_s`` elapses.
+
+    The slide-write path became fire-and-forget in task 3.5: `_maybe_slide`
+    schedules the DDB `touch_last_seen` as a detached `asyncio.create_task`
+    task and returns the Max-Age synchronously. `TestClient` can return the
+    response before the scheduled task has run on slower CI schedulers,
+    so assertions about `update_item_calls == 1` must poll rather than
+    sample immediately. The observable external contract (cookie attributes,
+    Max-Age, response body) is unchanged — only the internal timing of the
+    background write moves.
+    """
+    deadline = time.monotonic() + timeout_s
+    while time.monotonic() < deadline:
+        if predicate():
+            return True
+        time.sleep(interval_s)
+    return predicate()
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Requirement 3.1 — Dormant pass-through with zero AWS calls
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+# Cookie-safe printable ASCII, excluding ; , " \ plus whitespace and control chars.
+# httpx's cookiejar only accepts ASCII values and rejects the RFC 6265 separators.
+_COOKIE_SAFE_ALPHABET = st.characters(
+    min_codepoint=0x21,
+    max_codepoint=0x7E,
+    blacklist_characters=";, \t\"\\",
+)
+
+
+@given(
+    method=st.sampled_from(["GET", "POST", "PUT", "PATCH", "DELETE", "HEAD", "OPTIONS"]),
+    path=st.sampled_from(["/echo", "/submit"]),
+    with_cookie=st.booleans(),
+    cookie_value=st.text(alphabet=_COOKIE_SAFE_ALPHABET, min_size=0, max_size=64),
+)
+@settings(
+    max_examples=30,
+    deadline=None,
+    suppress_health_check=[HealthCheck.function_scoped_fixture],
+)
+def test_3_1_dormant_passthrough_zero_aws_calls(
+    method: str, path: str, with_cookie: bool, cookie_value: str
+) -> None:
+    """(3.1) Dormant pass-through.
+
+    When `BFFConfig.is_enabled() == False`, every request shape (method,
+    path, cookie present/absent) short-circuits through `call_next(request)`
+    with zero DDB calls and zero Cognito calls.
+    """
+    table = InstrumentedTable()
+    repo = _make_repo(table)
+    # Wire in the same instrumented repo used in the enabled posture so a call
+    # would be observed if the middleware went past its `is_enabled()` guard.
+    codec = _make_codec()
+    refresh_client = MagicMock()
+    app = _build_app(
+        config=_disabled_config(),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+    )
+
+    cookies: dict[str, str] = {}
+    if with_cookie:
+        cookies[SESSION_COOKIE_NAME] = cookie_value
+
+    with TestClient(app) as client:
+        response = client.request(method, path, cookies=cookies)
+
+    # OPTIONS/HEAD may be allowed or not depending on route — we only care
+    # that the middleware did not touch AWS regardless of status.
+    assert response.status_code < 500, (
+        f"[3.1] dormant pass-through produced 5xx for {method} {path}: "
+        f"{response.status_code}"
+    )
+    assert table.get_item_calls == 0, (
+        f"[3.1] dormant middleware issued {table.get_item_calls} get_item "
+        f"calls — must be zero when is_enabled() == False"
+    )
+    assert table.update_item_calls == 0, (
+        f"[3.1] dormant middleware issued {table.update_item_calls} "
+        "update_item calls — must be zero when is_enabled() == False"
+    )
+    assert table.put_item_calls == 0
+    assert table.delete_item_calls == 0
+    refresh_client.refresh.assert_not_called()
+    # No Set-Cookie emitted by the middleware when dormant.
+    assert response.headers.get_list("set-cookie") == []
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Requirement 3.2 — No-cookie pass-through with zero AWS calls
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@given(
+    method=st.sampled_from(["GET", "POST", "PUT", "PATCH", "DELETE"]),
+    path=st.sampled_from(["/echo", "/submit"]),
+)
+@settings(
+    max_examples=20,
+    deadline=None,
+    suppress_health_check=[HealthCheck.function_scoped_fixture],
+)
+def test_3_2_no_cookie_passthrough_zero_aws_calls(
+    method: str, path: str
+) -> None:
+    """(3.2) No-cookie pass-through.
+
+    When `is_enabled() == True` but no `__Host-bff_session` cookie is present
+    (Bearer-token requests, anonymous endpoints), the middleware must pass
+    through with zero AWS calls and no `request.state.bff_session`.
+    """
+    table = InstrumentedTable()
+    repo = _make_repo(table)
+    codec = _make_codec()
+    refresh_client = MagicMock()
+    app = _build_app(
+        config=_enabled_config(),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+    )
+
+    with TestClient(app) as client:
+        response = client.request(method, path)
+
+    assert response.status_code < 500
+    # When the call returns 200 with a JSON body, the handler must report has_session=False.
+    if response.status_code == 200 and response.headers.get(
+        "content-type", ""
+    ).startswith("application/json"):
+        body = response.json()
+        assert body["has_session"] is False, (
+            "[3.2] state.bff_session must NOT be set when no cookie is present"
+        )
+    assert table.get_item_calls == 0, (
+        f"[3.2] no-cookie path issued {table.get_item_calls} get_item calls"
+    )
+    assert table.update_item_calls == 0
+    assert table.put_item_calls == 0
+    assert table.delete_item_calls == 0
+    refresh_client.refresh.assert_not_called()
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Requirement 3.3 — Unrecoverable cookie clears BOTH cookies with matching attrs
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+def _assert_clear_cookie_attrs(parsed: dict[str, Any]) -> None:
+    """Attributes observed today on a cleared BFF cookie:
+
+        Max-Age=0; Path=/; SameSite=lax; Secure
+
+    HttpOnly is present on the session cookie only (intentional: the CSRF
+    cookie is JS-readable). All other attributes are identical across both
+    cookies.
+    """
+    attrs = parsed["attrs"]
+    assert attrs.get("max-age") == "0", (
+        f"[3.3] clear must set Max-Age=0; got attrs={attrs}"
+    )
+    assert attrs.get("path") == "/", (
+        f"[3.3] clear must set Path=/; got attrs={attrs}"
+    )
+    assert attrs.get("samesite") == "lax", (
+        f"[3.3] clear must set SameSite=lax; got attrs={attrs}"
+    )
+    assert attrs.get("secure") is True, (
+        f"[3.3] clear must set Secure; got attrs={attrs}"
+    )
+
+
+@pytest.mark.parametrize(
+    "scenario",
+    ["bad_seal", "missing_row", "expired_row", "terminal_refresh_error"],
+)
+def test_3_3_unrecoverable_cookie_clears_both_cookies_with_matching_attrs(
+    scenario: str,
+) -> None:
+    """(3.3) Unrecoverable cookie → clear both.
+
+    Bad-seal, missing-row, expired-row, and terminal-`CognitoRefreshError`
+    inputs all produce Set-Cookie for both `__Host-bff_session` and
+    `__Host-bff_csrf` with `Max-Age=0` and the today-observed attribute set.
+    The HttpOnly attribute intentionally differs between the two (session
+    is HttpOnly; CSRF is JS-readable by design); all other attrs match.
+    """
+    codec = _make_codec()
+    refresh_client = MagicMock()
+
+    if scenario == "bad_seal":
+        table = InstrumentedTable()
+        cookie_value = "not-a-sealed-cookie"
+    elif scenario == "missing_row":
+        # No record on the table — get_item returns {} → record None.
+        table = InstrumentedTable(record=None)
+        cookie_value = codec.seal(CookiePayload(session_id="sess-gone"))
+    elif scenario == "expired_row":
+        # TTL in the past — repository treats as missing (defense in depth).
+        expired = _make_record(ttl=int(time.time()) - 10)
+        table = InstrumentedTable(record=expired)
+        cookie_value = codec.seal(CookiePayload(session_id=expired.session_id))
+    elif scenario == "terminal_refresh_error":
+        # Access token within leeway → refresh path → Cognito raises.
+        rec = _make_record(access_token_exp=int(time.time()) + 5)
+        table = InstrumentedTable(record=rec)
+        cookie_value = codec.seal(CookiePayload(session_id=rec.session_id))
+        refresh_client.refresh.side_effect = CognitoRefreshError("rotated-dead")
+    else:
+        pytest.fail(f"unknown scenario: {scenario}")
+
+    repo = _make_repo(table)
+    app = _build_app(
+        config=_enabled_config(),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+    )
+
+    with TestClient(app) as client:
+        response = client.get("/echo", cookies={SESSION_COOKIE_NAME: cookie_value})
+
+    assert response.status_code == 200
+    assert response.json()["has_session"] is False, (
+        f"[3.3/{scenario}] state.bff_session must NOT be set after clear"
+    )
+
+    session_clears = _find_set_cookies(response.headers, SESSION_COOKIE_NAME)
+    csrf_clears = _find_set_cookies(response.headers, CSRF_COOKIE_NAME)
+    assert len(session_clears) == 1, (
+        f"[3.3/{scenario}] expected exactly one Set-Cookie for "
+        f"{SESSION_COOKIE_NAME}; got {len(session_clears)}"
+    )
+    assert len(csrf_clears) == 1, (
+        f"[3.3/{scenario}] expected exactly one Set-Cookie for "
+        f"{CSRF_COOKIE_NAME}; got {len(csrf_clears)}"
+    )
+
+    # Each cleared cookie carries Max-Age=0 and the shared attribute set.
+    _assert_clear_cookie_attrs(session_clears[0])
+    _assert_clear_cookie_attrs(csrf_clears[0])
+
+    # HttpOnly is the one documented difference between the two cookies.
+    assert session_clears[0]["attrs"].get("httponly") is True, (
+        f"[3.3/{scenario}] session cookie must remain HttpOnly on clear"
+    )
+    assert csrf_clears[0]["attrs"].get("httponly") is not True, (
+        f"[3.3/{scenario}] CSRF cookie must NOT be HttpOnly (JS must read it)"
+    )
+
+    # Shared (non-HttpOnly) attribute set is identical across the two clears.
+    shared_keys = {"max-age", "path", "samesite", "secure"}
+    sess_shared = {k: session_clears[0]["attrs"].get(k) for k in shared_keys}
+    csrf_shared = {k: csrf_clears[0]["attrs"].get(k) for k in shared_keys}
+    assert sess_shared == csrf_shared, (
+        f"[3.3/{scenario}] shared clear attrs diverge: "
+        f"session={sess_shared}, csrf={csrf_shared}"
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Requirement 3.4 — Max-Age re-emit contract (slide path)
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@given(
+    # Session TTL bounded so it always fits well within the absolute cap.
+    session_ttl=st.integers(min_value=120, max_value=28800),
+    # Time since the last touch — past the throttle so a slide is warranted.
+    seconds_since_last_seen=st.integers(min_value=61, max_value=3600),
+)
+@settings(
+    max_examples=15,
+    deadline=None,
+    suppress_health_check=[HealthCheck.function_scoped_fixture],
+)
+def test_3_4_slide_max_age_matches_on_both_cookies(
+    session_ttl: int, seconds_since_last_seen: int
+) -> None:
+    """(3.4) Max-Age re-emit contract.
+
+    When `_maybe_slide` returns a non-None Max-Age, the Set-Cookie headers
+    for BOTH `__Host-bff_session` and `__Host-bff_csrf` carry that exact
+    Max-Age and the attribute set observed today on `_reemit_cookies`:
+
+        Session:  HttpOnly; Max-Age=<n>; Path=/; SameSite=lax; Secure
+        CSRF:                Max-Age=<n>; Path=/; SameSite=lax; Secure
+    """
+    now = int(time.time())
+    record = _make_record(last_seen_at=now - seconds_since_last_seen)
+    table = InstrumentedTable(record=record)
+    repo = _make_repo(table)
+    codec = _make_codec()
+    refresh_client = MagicMock()
+    # Large absolute lifetime so the slide is not capped — the Max-Age we
+    # get back must equal session_ttl_seconds exactly.
+    app = _build_app(
+        config=_enabled_config(
+            session_ttl_seconds=session_ttl,
+            absolute_lifetime_seconds=30 * 24 * 3600,
+            sliding_renewal_throttle_seconds=60,
+        ),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+    )
+
+    sealed = codec.seal(CookiePayload(session_id=record.session_id))
+    with TestClient(app) as client:
+        response = client.get("/echo", cookies={SESSION_COOKIE_NAME: sealed})
+        # Slide-write is fire-and-forget (task 3.5): poll for the background
+        # write, and if it has not landed, drive the event loop with a second
+        # request so the first request's task can flush. MUST happen inside the
+        # `with TestClient` block because TestClient tears down its anyio portal
+        # (and the event loop) on `__exit__`, which cancels any pending tasks.
+        _wait_for(lambda: table.update_item_calls >= 1)
+        if table.update_item_calls == 0:
+            # A no-op second request keeps the event loop alive long
+            # enough for the pending slide task to run.
+            client.get("/echo")
+            _wait_for(lambda: table.update_item_calls >= 1)
+
+    assert response.status_code == 200
+    # Slide must have fired exactly once (one DDB update_item).
+    assert table.update_item_calls == 1, (
+        f"[3.4] slide must issue exactly one update_item; got "
+        f"{table.update_item_calls}"
+    )
+
+    session_emits = _find_set_cookies(response.headers, SESSION_COOKIE_NAME)
+    csrf_emits = _find_set_cookies(response.headers, CSRF_COOKIE_NAME)
+    assert len(session_emits) == 1, (
+        f"[3.4] expected exactly one Set-Cookie for {SESSION_COOKIE_NAME}"
+    )
+    assert len(csrf_emits) == 1, (
+        f"[3.4] expected exactly one Set-Cookie for {CSRF_COOKIE_NAME}"
+    )
+
+    sess_attrs = session_emits[0]["attrs"]
+    csrf_attrs = csrf_emits[0]["attrs"]
+
+    # Max-Age equals session_ttl_seconds on BOTH cookies (no absolute cap).
+    assert sess_attrs.get("max-age") == str(session_ttl), (
+        f"[3.4] session cookie Max-Age mismatch: expected {session_ttl}, "
+        f"got {sess_attrs.get('max-age')}"
+    )
+    assert csrf_attrs.get("max-age") == str(session_ttl), (
+        f"[3.4] csrf cookie Max-Age mismatch: expected {session_ttl}, "
+        f"got {csrf_attrs.get('max-age')}"
+    )
+
+    # Attribute set observed on today's _reemit_cookies:
+    assert sess_attrs.get("path") == "/"
+    assert sess_attrs.get("samesite") == "lax"
+    assert sess_attrs.get("secure") is True
+    assert sess_attrs.get("httponly") is True
+
+    assert csrf_attrs.get("path") == "/"
+    assert csrf_attrs.get("samesite") == "lax"
+    assert csrf_attrs.get("secure") is True
+    # CSRF is JS-readable → MUST NOT be HttpOnly.
+    assert csrf_attrs.get("httponly") is not True
+
+    # Shared (non-HttpOnly) attribute set is identical.
+    shared = {"max-age", "path", "samesite", "secure"}
+    assert {k: sess_attrs.get(k) for k in shared} == {
+        k: csrf_attrs.get(k) for k in shared
+    }
+
+    # The sealed value on the session cookie is the exact same value the
+    # browser already held — slide doesn't mint a new seal.
+    assert session_emits[0]["value"] == sealed, (
+        "[3.4] slide must re-emit the same sealed session value, not a new seal"
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Requirement 3.5 — Refresh-storm coalescing preserved (one initiate_auth per
+# session per leeway window)
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@pytest.mark.asyncio
+async def test_3_5_refresh_storm_coalesces_to_single_initiate_auth() -> None:
+    """(3.5) Refresh-storm coalescing.
+
+    10 concurrent same-session requests crossing the refresh-leeway window
+    must drive exactly ONE `cognito-idp:initiate_auth` call (the existing
+    per-session lock coalescing contract). The fix MUST preserve this.
+    """
+    now = int(time.time())
+    record = _make_record(access_token_exp=now + 5)  # within 60s leeway
+    table = InstrumentedTable(record=record)
+    repo = _make_repo(table)
+    codec = _make_codec()
+
+    refresh_call_count = {"n": 0}
+
+    async def _refresh(*, username: str, refresh_token: str) -> RefreshResult:
+        refresh_call_count["n"] += 1
+        return RefreshResult(
+            access_token=f"access.fresh.{refresh_call_count['n']}",
+            refresh_token="refresh.original",  # no rotation
+            id_token="id.fresh",
+            access_token_exp=int(time.time()) + 3600,
+        )
+
+    refresh_client = MagicMock()
+    refresh_client.refresh = AsyncMock(side_effect=_refresh)
+
+    # After the first refresh lands, later repo.get calls should observe
+    # a record that no longer needs refresh (the update_item write is a
+    # no-op on the fake, so we pre-refresh the in-memory record copy).
+    fresh = _make_record(
+        session_id=record.session_id, access_token_exp=now + 3600
+    )
+    fresh.cognito_access_token = "access.fresh.1"
+    # Sequential responses: first few see the stale record, then the fresh one.
+    table._record = record  # starts stale
+    original_get_item = table.get_item
+
+    get_item_counter = {"n": 0}
+
+    def counting_get_item(Key: dict) -> dict:
+        get_item_counter["n"] += 1
+        # After the leader's update_item bumps tokens, followers arriving
+        # late should see the fresh record. Flip after 2 calls so both
+        # pre-lock and post-lock rechecks on the leader path see the stale row.
+        if get_item_counter["n"] > 2:
+            table._record = fresh
+        return original_get_item(Key)
+
+    table.get_item = counting_get_item  # type: ignore[assignment]
+
+    app = _build_app(
+        config=_enabled_config(),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+    )
+
+    sealed = codec.seal(CookiePayload(session_id=record.session_id))
+    transport = httpx.ASGITransport(app=app)
+
+    async with httpx.AsyncClient(
+        transport=transport, base_url="http://test"
+    ) as client:
+        client.cookies.set(SESSION_COOKIE_NAME, sealed)
+        responses = await asyncio.gather(
+            *(client.get("/echo") for _ in range(10))
+        )
+
+    for r in responses:
+        assert r.status_code == 200
+
+    assert refresh_call_count["n"] == 1, (
+        f"[3.5] 10 concurrent same-session requests drove "
+        f"{refresh_call_count['n']} Cognito initiate_auth calls — exactly "
+        "one is required per session per leeway window (existing "
+        "get_session_lock coalescing)."
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Requirement 3.6 — Codec singleton, zero per-request KMS GenerateDataKey
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+def test_3_6_get_default_codec_is_singleton_with_no_per_request_kms() -> None:
+    """(3.6) Codec singleton.
+
+    `get_default_codec()` returns the same instance across calls. The
+    underlying `secretsmanager:GetSecretValue` call happens at most once
+    per process. Hot seal/unseal traffic must not re-fetch.
+
+    (This contract held under the original `kms:GenerateDataKey`-per-process
+    design and the interim KMS-wrap design too; only the underlying AWS
+    APIs and KDF changed when the codec was moved to a shared
+    Secrets-Manager-generated secret for cross-task seal/unseal.)
+    """
+    sm_client = MagicMock()
+    sm_client.get_secret_value.return_value = {
+        "SecretString": "secret-3-6-high-entropy-1234567890ABCDEFGHIJ"
+    }
+
+    codec = CookieCodec(
+        kms_key_arn="arn:aws:kms:fake-3.6",
+        data_key_secret_arn="arn:aws:secretsmanager:fake-3.6",
+        secrets_manager_client=sm_client,
+    )
+    cookie_module._set_default_codec_for_tests(codec)
+
+    first = get_default_codec()
+    for _ in range(25):
+        other = get_default_codec()
+        assert other is first, (
+            "[3.6] get_default_codec() must return the same instance each call"
+        )
+
+    payload = CookiePayload(session_id="sess-3-6")
+    for _ in range(20):
+        sealed = first.seal(payload)
+        roundtripped = first.unseal(sealed)
+        assert roundtripped.session_id == "sess-3-6"
+
+    assert sm_client.get_secret_value.call_count <= 1, (
+        f"[3.6] Secrets Manager get_secret_value invoked "
+        f"{sm_client.get_secret_value.call_count} times — must be at most "
+        "one per process."
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Requirement 3.7 — Client-secret cache, one Secrets Manager hit per process
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+def test_3_7_client_secret_cache_one_secrets_manager_hit_per_process() -> None:
+    """(3.7) Client-secret cache.
+
+    `resolve_bff_client_secret()` must hit Secrets Manager exactly once per
+    process regardless of how many times it is called.
+    """
+    sm_client = MagicMock()
+    sm_client.get_secret_value.return_value = {"SecretString": "client-secret-A"}
+
+    first = resolve_bff_client_secret(
+        secret_arn="arn:secret",
+        region="us-east-1",
+        secrets_manager_client=sm_client,
+    )
+    assert first == "client-secret-A"
+
+    # Many subsequent calls (reusing the same mock SM client) must not drive
+    # a new GetSecretValue, because the first call populated the cache.
+    for _ in range(50):
+        value = resolve_bff_client_secret(
+            secret_arn="arn:secret",
+            region="us-east-1",
+            secrets_manager_client=sm_client,
+        )
+        assert value == "client-secret-A"
+
+    assert sm_client.get_secret_value.call_count == 1, (
+        f"[3.7] Secrets Manager get_secret_value called "
+        f"{sm_client.get_secret_value.call_count} times — must be exactly one."
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Requirement 3.8 — CSRFMiddleware accept/reject unchanged, no new I/O
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@pytest.mark.parametrize(
+    "case",
+    ["matching", "mismatched", "header_only", "cookie_only", "forged_pair", "missing"],
+)
+def test_3_8_csrf_decision_unchanged_with_zero_new_io(case: str) -> None:
+    """(3.8) CSRF path unchanged.
+
+    With `SessionRefreshMiddleware` upstream populating `state.bff_session`,
+    the `CSRFMiddleware` accept/reject decision on unsafe-method requests
+    matches today's observed behavior across all six CSRF token cases.
+    No new DDB / Cognito / KMS / Secrets Manager I/O is introduced on the
+    CSRF path.
+    """
+    record = _make_record()
+    table = InstrumentedTable(record=record)
+    repo = _make_repo(table)
+    codec = _make_codec()
+    refresh_client = MagicMock()
+    app = _build_app(
+        config=_enabled_config(),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+        include_csrf=True,
+    )
+
+    sealed = codec.seal(CookiePayload(session_id=record.session_id))
+    valid_token = CSRFHelper.derive_token(record.csrf_secret, record.session_id)
+    forged_token = "0" * 32
+
+    headers: dict[str, str] = {}
+    cookies: dict[str, str] = {SESSION_COOKIE_NAME: sealed}
+
+    if case == "matching":
+        headers[CSRF_HEADER_NAME] = valid_token
+        cookies[CSRF_COOKIE_NAME] = valid_token
+        expected_status = 200
+    elif case == "mismatched":
+        headers[CSRF_HEADER_NAME] = valid_token
+        cookies[CSRF_COOKIE_NAME] = "different-value"
+        expected_status = 403
+    elif case == "header_only":
+        headers[CSRF_HEADER_NAME] = valid_token
+        expected_status = 403
+    elif case == "cookie_only":
+        cookies[CSRF_COOKIE_NAME] = valid_token
+        expected_status = 403
+    elif case == "forged_pair":
+        headers[CSRF_HEADER_NAME] = forged_token
+        cookies[CSRF_COOKIE_NAME] = forged_token
+        expected_status = 403
+    elif case == "missing":
+        expected_status = 403
+    else:
+        pytest.fail(f"unknown case: {case}")
+
+    # Snapshot AWS call counters BEFORE the CSRF-exercising request.
+    # (Session resolve may have happened on-open via middleware init; we
+    # expect exactly one get_item for the resolve, and zero writes.)
+    initial_refresh_calls = refresh_client.refresh.call_count
+    initial_update_calls = table.update_item_calls
+
+    with TestClient(app) as client:
+        response = client.post("/submit", headers=headers, cookies=cookies)
+
+    assert response.status_code == expected_status, (
+        f"[3.8/{case}] unexpected CSRF decision: expected {expected_status}, "
+        f"got {response.status_code}"
+    )
+    # Zero NEW Cognito / DDB write I/O on the CSRF path itself.
+    assert refresh_client.refresh.call_count == initial_refresh_calls, (
+        f"[3.8/{case}] CSRF path triggered an unexpected Cognito refresh"
+    )
+    # CSRF itself never writes to DDB; at most one slide write from session resolve is allowed.
+    assert table.update_item_calls - initial_update_calls <= 1, (
+        f"[3.8/{case}] more than one update_item observed — at most the "
+        "preceding session-resolve slide is expected."
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Requirement 3.9 — Absolute-lifetime cap returns None from _maybe_slide
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@pytest.mark.asyncio
+async def test_3_9_maybe_slide_returns_none_past_absolute_cap() -> None:
+    """(3.9) Absolute-lifetime cap.
+
+    When `now > created_at + absolute_lifetime_seconds`, `_maybe_slide`
+    returns `None` (no cookie re-emit, no DDB write).
+    """
+    now = int(time.time())
+    # Session was created 200s ago with an absolute lifetime of 100s → cap
+    # was reached 100s ago. last_seen_at is past the throttle so otherwise
+    # a slide would be warranted.
+    record = _make_record(
+        created_at=now - 200,
+        last_seen_at=now - 120,
+    )
+    table = InstrumentedTable(record=record)
+    repo = _make_repo(table)
+    codec = _make_codec()
+    refresh_client = MagicMock()
+    config = _enabled_config(
+        absolute_lifetime_seconds=100,
+        sliding_renewal_throttle_seconds=60,
+    )
+
+    # Build the middleware directly so we can invoke _maybe_slide in
+    # isolation — the preservation contract is specifically that the
+    # method returns None past the cap.
+    middleware = SessionRefreshMiddleware(
+        app=FastAPI(),
+        config=config,
+        repository=repo,
+        cookie_codec=codec,
+        refresh_client=refresh_client,
+        cache=SessionCache(ttl_seconds=60),
+    )
+    middleware._ensure_collaborators()
+
+    result = await middleware._maybe_slide(record)
+    assert result is None, (
+        f"[3.9] _maybe_slide must return None past the absolute cap; "
+        f"got {result!r}"
+    )
+    assert table.update_item_calls == 0, (
+        f"[3.9] _maybe_slide must NOT schedule a DDB write past the cap; "
+        f"observed {table.update_item_calls} update_item calls."
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Requirement 3.10 — Fail-closed rotation: cache invalidated AND cookies cleared
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+def test_3_10_rotation_persist_exhausts_invalidates_cache_and_clears_cookies() -> None:
+    """(3.10) Fail-closed rotation.
+
+    When refresh-token rotation kicks in AND `_persist_refresh` exhausts all
+    retries (update_item fails every time), the middleware MUST:
+      (a) invalidate the cache entry for this session
+      (b) clear BOTH BFF cookies on the response
+    so the user is forced to re-authenticate before their next request
+    hits a dead refresh token.
+    """
+    now = int(time.time())
+    # Access token within leeway → refresh path.
+    record = _make_record(access_token_exp=now + 5)
+    table = InstrumentedTable(
+        record=record,
+        update_item_side_effect=RuntimeError("DDB throttled"),
+    )
+    repo = _make_repo(table)
+    codec = _make_codec()
+    refresh_client = MagicMock()
+    # Rotation kicks in — refresh_token differs from current.
+    refresh_client.refresh = AsyncMock(
+        return_value=RefreshResult(
+            access_token="access.fresh",
+            refresh_token="refresh.ROTATED",
+            id_token="id.fresh",
+            access_token_exp=now + 3600,
+        )
+    )
+
+    cache = SessionCache(ttl_seconds=60)
+    # Pre-seed the cache so we can verify invalidation.
+    cache.set(record)
+    assert cache.get(record.session_id) is not None
+
+    app = _build_app(
+        config=_enabled_config(),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+        cache=cache,
+    )
+
+    sealed = codec.seal(CookiePayload(session_id=record.session_id))
+    with TestClient(app) as client:
+        response = client.get("/echo", cookies={SESSION_COOKIE_NAME: sealed})
+
+    assert response.status_code == 200
+    assert response.json()["has_session"] is False, (
+        "[3.10] state.bff_session must NOT be set after fail-closed rotation"
+    )
+
+    # (a) Cache entry invalidated.
+    assert cache.get(record.session_id) is None, (
+        "[3.10] cache entry must be invalidated after exhausted rotation persist"
+    )
+
+    # (b) Both cookies cleared.
+    session_clears = _find_set_cookies(response.headers, SESSION_COOKIE_NAME)
+    csrf_clears = _find_set_cookies(response.headers, CSRF_COOKIE_NAME)
+    assert len(session_clears) == 1 and len(csrf_clears) == 1, (
+        f"[3.10] both BFF cookies must be cleared; got "
+        f"session={len(session_clears)}, csrf={len(csrf_clears)}"
+    )
+    _assert_clear_cookie_attrs(session_clears[0])
+    _assert_clear_cookie_attrs(csrf_clears[0])
+
+    # Sanity: update_tokens was retried 3 times on rotation. Use the
+    # token_persist sub-counter so we measure persist attempts only,
+    # not the (also-incrementing) lock_acquire write that precedes them.
+    assert table.token_persist_calls == 3, (
+        f"[3.10] rotation must retry update_tokens 3 times; got "
+        f"{table.token_persist_calls}"
+    )
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# Requirement 3.11 — Cookie decode uniformity (no new timing/shape oracle)
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+@given(
+    garbage=st.one_of(
+        # Arbitrary non-empty ASCII cookie-safe strings — typical "bad seal"
+        # wire shape. Excludes '' because an empty cookie value is treated
+        # as "no cookie present" by the middleware (requirement 3.2), not
+        # as a decode failure.
+        st.text(alphabet=_COOKIE_SAFE_ALPHABET, min_size=1, max_size=64),
+        # Hex-encoded random bytes: base64url-safe characters, but not a valid seal.
+        st.binary(min_size=1, max_size=48).map(lambda b: b.hex()),
+    ),
+)
+@settings(
+    max_examples=25,
+    deadline=None,
+    suppress_health_check=[HealthCheck.function_scoped_fixture],
+)
+def test_3_11_cookie_decode_failure_produces_uniform_response_shape(
+    garbage: str,
+) -> None:
+    """(3.11) Cookie decode uniformity.
+
+    Every `CookieDecodeError` branch — bad base64, bad tag, truncated blob,
+    wrong version, non-JSON body — produces the SAME externally observable
+    response shape: identical status, identical Set-Cookie clearing pattern
+    for both BFF cookies, identical handler body (has_session=False).
+
+    The middleware must NOT surface any oracle that lets a caller
+    distinguish decode failure modes.
+    """
+    table = InstrumentedTable()
+    repo = _make_repo(table)
+    codec = _make_codec()
+    refresh_client = MagicMock()
+    app = _build_app(
+        config=_enabled_config(),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh_client,
+    )
+
+    with TestClient(app) as client:
+        response = client.get(
+            "/echo", cookies={SESSION_COOKIE_NAME: garbage}
+        )
+
+    assert response.status_code == 200, (
+        f"[3.11] bad-seal path must return 200 with cleared cookie; "
+        f"got {response.status_code}"
+    )
+    assert response.json() == {
+        "has_session": False,
+        "session_id": None,
+        "access_token": None,
+        "csrf_token": None,
+    }, (
+        f"[3.11] handler body diverges for garbage cookie {garbage!r}: "
+        f"{response.json()}"
+    )
+
+    # Both cookies cleared with the same attribute set.
+    session_clears = _find_set_cookies(response.headers, SESSION_COOKIE_NAME)
+    csrf_clears = _find_set_cookies(response.headers, CSRF_COOKIE_NAME)
+    assert len(session_clears) == 1, (
+        f"[3.11] expected one session-cookie clear; got {len(session_clears)}"
+    )
+    assert len(csrf_clears) == 1, (
+        f"[3.11] expected one csrf-cookie clear; got {len(csrf_clears)}"
+    )
+    _assert_clear_cookie_attrs(session_clears[0])
+    _assert_clear_cookie_attrs(csrf_clears[0])
+
+    # Zero AWS calls — decode failure is caught before any DDB / Cognito I/O.
+    assert table.get_item_calls == 0, (
+        f"[3.11] bad-seal path must NOT reach DDB; observed "
+        f"{table.get_item_calls} get_item calls."
+    )
+    refresh_client.refresh.assert_not_called()
diff --git a/backend/tests/apis/shared/sessions_bff/test_cookie.py b/backend/tests/apis/shared/sessions_bff/test_cookie.py
index afeaf61a..49f4d5bc 100644
--- a/backend/tests/apis/shared/sessions_bff/test_cookie.py
+++ b/backend/tests/apis/shared/sessions_bff/test_cookie.py
@@ -1,21 +1,33 @@
 """Tests for the AES-GCM cookie codec.
 
-Uses an injected `AESGCM` cipher to avoid mocking KMS — `CookieCodec` exposes
-the `_cipher` attribute which we set directly. (Production callers always go
-through `_ensure_cipher`, which is what the KMS-integration test exercises.)
+Two layers of coverage:
+
+  1. Round-trip / decode tests — use an injected `AESGCM` cipher (set on
+     `_cipher` directly) so we don't need to mock Secrets Manager.
+  2. `_ensure_cipher` path — exercises the deploy-time-bootstrapped data
+     key flow (`secretsmanager:GetSecretValue` -> SHA-256 -> AESGCM cipher)
+     with mock clients. This is the path that runs in production every
+     time a task starts.
+
+The cross-task seal/unseal regression — a cookie sealed by one process
+unsealing on a *different* process — is locked in by
+`test_two_codecs_with_same_secret_derive_the_same_cipher`.
 """
 
 from __future__ import annotations
 
 import base64
+import hashlib
 import os
 import secrets
+from unittest.mock import MagicMock
 
 import pytest
 from cryptography.hazmat.primitives.ciphers.aead import AESGCM
 
 from apis.shared.sessions_bff.cookie import (
     CookieCodec,
+    CookieDataKeyUnavailable,
     CookieDecodeError,
     _reset_default_codec_for_tests,
     _set_default_codec_for_tests,
@@ -110,17 +122,24 @@ def test_seal_preserves_extras() -> None:
 def test_default_codec_is_a_singleton() -> None:
     """The auth/callback route seals with this codec and the
     `SessionRefreshMiddleware` unseals with it on the next request — they
-    must be the *same* instance, since each `CookieCodec` derives its own
-    random AES key. A second instance would fail every unseal as 'bad seal'.
+    must be the *same* instance within a process so we don't refetch the
+    data-key secret on every cookie operation.
+
+    Cross-process consistency (Task A's seal unsealing on Task B) is locked
+    in by `test_two_codecs_with_same_secret_derive_the_same_cipher`.
     """
     _reset_default_codec_for_tests()
     try:
         os.environ["BFF_COOKIE_SIGNING_KEY_ARN"] = "arn:aws:kms:fake"
+        os.environ["BFF_COOKIE_DATA_KEY_SECRET_ARN"] = (
+            "arn:aws:secretsmanager:us-east-1:0:secret:bff-data-key"
+        )
         first = get_default_codec()
         second = get_default_codec()
         assert first is second
     finally:
         os.environ.pop("BFF_COOKIE_SIGNING_KEY_ARN", None)
+        os.environ.pop("BFF_COOKIE_DATA_KEY_SECRET_ARN", None)
         _reset_default_codec_for_tests()
 
 
@@ -142,14 +161,136 @@ def test_default_codec_round_trip_seals_and_unseals() -> None:
         _reset_default_codec_for_tests()
 
 
-def test_unseal_propagates_kms_infrastructure_errors() -> None:
-    """KMS unavailable is not a decode error — it must surface so the caller
-    can return 5xx instead of clearing the cookie and forcing re-login."""
-    from unittest.mock import MagicMock
+# =====================================================================
+# `_ensure_cipher` — Secrets Manager fetch + SHA-256 derivation path.
+# =====================================================================
+
+KMS_KEY_ARN = "arn:aws:kms:us-east-1:0:key/test"
+DATA_KEY_SECRET_ARN = "arn:aws:secretsmanager:us-east-1:0:secret:bff-data-key"
+
+
+def _make_sm_mock(secret_string: str) -> MagicMock:
+    sm = MagicMock()
+    sm.get_secret_value.return_value = {"SecretString": secret_string}
+    return sm
+
+
+def test_ensure_cipher_fetches_secret_and_derives_key() -> None:
+    """Happy path: codec fetches the secret from Secrets Manager, derives
+    a 32-byte AES-256 key with SHA-256, then seals/unseals successfully."""
+    secret_string = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKL012345"  # 44 chars
+    sm = _make_sm_mock(secret_string)
+
+    codec = CookieCodec(
+        kms_key_arn=KMS_KEY_ARN,
+        data_key_secret_arn=DATA_KEY_SECRET_ARN,
+        secrets_manager_client=sm,
+    )
+    sealed = codec.seal(CookiePayload(session_id="sess-bootstrapped"))
+    assert codec.unseal(sealed).session_id == "sess-bootstrapped"
+
+    sm.get_secret_value.assert_called_once_with(SecretId=DATA_KEY_SECRET_ARN)
 
-    fake_kms = MagicMock()
-    fake_kms.generate_data_key.side_effect = RuntimeError("KMS unreachable")
 
-    codec = CookieCodec(kms_key_arn="arn:aws:kms:fake", kms_client=fake_kms)
-    with pytest.raises(RuntimeError, match="KMS unreachable"):
-        codec.unseal("doesnt-matter")
+def test_ensure_cipher_derived_key_matches_sha256_of_secret() -> None:
+    """Lock the KDF: a future change must keep the same derivation, or
+    every cookie sealed by an old task fails to unseal on a new task
+    after deploy."""
+    secret_string = "deterministic-secret-for-kdf-pinning-test-1234"
+    sm = _make_sm_mock(secret_string)
+
+    codec = CookieCodec(
+        kms_key_arn=KMS_KEY_ARN,
+        data_key_secret_arn=DATA_KEY_SECRET_ARN,
+        secrets_manager_client=sm,
+    )
+    # Force initialization without exposing _cipher's key directly: use a
+    # parallel cipher with the expected key, encrypt, and decrypt with the
+    # codec. If the codec didn't derive via SHA-256, decrypt fails.
+    codec.seal(CookiePayload(session_id="x"))
+    expected_key = hashlib.sha256(secret_string.encode("utf-8")).digest()
+    expected_cipher = AESGCM(expected_key)
+    nonce = secrets.token_bytes(12)
+    ciphertext = expected_cipher.encrypt(nonce, b'{"sid":"y"}', bytes([1]))
+    blob = bytes([1]) + nonce + ciphertext
+    sealed = base64.urlsafe_b64encode(blob).rstrip(b"=").decode("ascii")
+    decoded = codec.unseal(sealed)
+    assert decoded.session_id == "y"
+
+
+def test_ensure_cipher_caches_after_first_call() -> None:
+    """Hot-path requirement: only one Secrets Manager call per process."""
+    sm = _make_sm_mock("a" * 44)
+    codec = CookieCodec(
+        kms_key_arn=KMS_KEY_ARN,
+        data_key_secret_arn=DATA_KEY_SECRET_ARN,
+        secrets_manager_client=sm,
+    )
+    for _ in range(5):
+        codec.seal(CookiePayload(session_id="x"))
+    assert sm.get_secret_value.call_count == 1
+
+
+def test_two_codecs_with_same_secret_derive_the_same_cipher() -> None:
+    """Regression lock for the dev `bad seal` 401 storm.
+
+    Two independent `CookieCodec` instances simulate two ECS tasks. Both
+    fetch the SAME secret string from Secrets Manager and derive the same
+    32-byte key via SHA-256. A cookie sealed on `task_a` MUST unseal on
+    `task_b`. Pre-fix, each task generated its own random data key and
+    this failed.
+    """
+    secret_string = "shared-secret-across-tasks-1234567890ABCDEFGH"
+    sm_a = _make_sm_mock(secret_string)
+    sm_b = _make_sm_mock(secret_string)
+
+    task_a = CookieCodec(
+        kms_key_arn=KMS_KEY_ARN,
+        data_key_secret_arn=DATA_KEY_SECRET_ARN,
+        secrets_manager_client=sm_a,
+    )
+    task_b = CookieCodec(
+        kms_key_arn=KMS_KEY_ARN,
+        data_key_secret_arn=DATA_KEY_SECRET_ARN,
+        secrets_manager_client=sm_b,
+    )
+
+    sealed_on_a = task_a.seal(CookiePayload(session_id="sess-cross-task"))
+    decoded_on_b = task_b.unseal(sealed_on_a)
+    assert decoded_on_b.session_id == "sess-cross-task"
+
+
+def test_ensure_cipher_propagates_secrets_manager_failure() -> None:
+    """Secrets Manager unreachable must surface as `CookieDataKeyUnavailable`
+    so the request returns 5xx — never as a decode error that clears the
+    user's cookie."""
+    sm = MagicMock()
+    sm.get_secret_value.side_effect = RuntimeError("Secrets Manager unreachable")
+    codec = CookieCodec(
+        kms_key_arn=KMS_KEY_ARN,
+        data_key_secret_arn=DATA_KEY_SECRET_ARN,
+        secrets_manager_client=sm,
+    )
+    with pytest.raises(CookieDataKeyUnavailable):
+        codec.unseal("anything")
+
+
+def test_ensure_cipher_rejects_empty_secret_string() -> None:
+    """Bootstrap not yet completed (or secret manually wiped) — fail loud
+    rather than silently invalidate every active session."""
+    sm = _make_sm_mock("")
+    codec = CookieCodec(
+        kms_key_arn=KMS_KEY_ARN,
+        data_key_secret_arn=DATA_KEY_SECRET_ARN,
+        secrets_manager_client=sm,
+    )
+    with pytest.raises(CookieDataKeyUnavailable, match="bootstrap missing"):
+        codec.unseal("anything")
+
+
+def test_ensure_cipher_missing_config_surfaces_as_decode_error() -> None:
+    """No KMS ARN or no secret ARN — same shape as today's "BFF disabled"
+    path. Treated as `bad seal` so the middleware clears the cookie."""
+    codec = CookieCodec(kms_key_arn="", data_key_secret_arn="")
+    with pytest.raises(CookieDecodeError):
+        codec.unseal("anything")
diff --git a/backend/tests/apis/shared/sessions_bff/test_repository.py b/backend/tests/apis/shared/sessions_bff/test_repository.py
index ec6c771b..b20c33cf 100644
--- a/backend/tests/apis/shared/sessions_bff/test_repository.py
+++ b/backend/tests/apis/shared/sessions_bff/test_repository.py
@@ -77,3 +77,302 @@ async def test_disabled_repository_is_inert() -> None:
     # All ops succeed silently — no exceptions, no AWS calls.
     assert await repo.get("any") is None
     await repo.delete("any")
+
+
+# =====================================================================
+# Cross-task refresh lock — try_acquire_refresh_lock / release_refresh_lock
+# =====================================================================
+
+
+@pytest.mark.asyncio
+async def test_try_acquire_refresh_lock_succeeds_on_unlocked_row(
+    repository, sample_record
+) -> None:
+    """The first contender claims the lock when no peer is holding one."""
+    record = sample_record()
+    await repository.put(record)
+
+    acquired = await repository.try_acquire_refresh_lock(
+        session_id=record.session_id,
+        owner="task-A",
+        lock_ttl_seconds=30,
+    )
+    assert acquired is True
+
+
+@pytest.mark.asyncio
+async def test_try_acquire_refresh_lock_blocks_concurrent_peer(
+    repository, sample_record
+) -> None:
+    """While task-A's lock is fresh, task-B's acquisition MUST fail.
+
+    This is the cross-task coalescing primitive — without it, two tasks
+    would each call cognito-idp:initiate_auth with the same refresh token
+    under desiredCount > 1.
+    """
+    record = sample_record()
+    await repository.put(record)
+
+    a = await repository.try_acquire_refresh_lock(
+        session_id=record.session_id,
+        owner="task-A",
+        lock_ttl_seconds=30,
+    )
+    b = await repository.try_acquire_refresh_lock(
+        session_id=record.session_id,
+        owner="task-B",
+        lock_ttl_seconds=30,
+    )
+    assert a is True
+    assert b is False
+
+
+@pytest.mark.asyncio
+async def test_try_acquire_refresh_lock_takes_over_after_ttl_expires(
+    repository, sample_record
+) -> None:
+    """A leader that crashed mid-refresh strands the lock for at most
+    `lock_ttl_seconds`. After that, any peer can re-acquire — no manual
+    cleanup required, no permanent stuck state."""
+    record = sample_record()
+    await repository.put(record)
+
+    # task-A acquires with a 0-second TTL → lock_until = now, so any
+    # contender at a later second sees `refresh_lock_until < :now`.
+    a = await repository.try_acquire_refresh_lock(
+        session_id=record.session_id,
+        owner="task-A",
+        lock_ttl_seconds=0,
+    )
+    assert a is True
+
+    # Sleep 1s so the next contender's :now is strictly greater.
+    time.sleep(1)
+
+    b = await repository.try_acquire_refresh_lock(
+        session_id=record.session_id,
+        owner="task-B",
+        lock_ttl_seconds=30,
+    )
+    assert b is True
+
+
+@pytest.mark.asyncio
+async def test_try_acquire_refresh_lock_distinct_sessions_dont_block(
+    repository, sample_record
+) -> None:
+    rec_a = sample_record(session_id="sess-A")
+    rec_b = sample_record(session_id="sess-B")
+    await repository.put(rec_a)
+    await repository.put(rec_b)
+
+    a = await repository.try_acquire_refresh_lock(
+        session_id=rec_a.session_id, owner="task-1", lock_ttl_seconds=30
+    )
+    b = await repository.try_acquire_refresh_lock(
+        session_id=rec_b.session_id, owner="task-1", lock_ttl_seconds=30
+    )
+    assert a is True
+    assert b is True
+
+
+@pytest.mark.asyncio
+async def test_release_refresh_lock_clears_attrs_for_owner(
+    repository, sample_record
+) -> None:
+    record = sample_record()
+    await repository.put(record)
+    await repository.try_acquire_refresh_lock(
+        session_id=record.session_id, owner="task-A", lock_ttl_seconds=30
+    )
+
+    await repository.release_refresh_lock(record.session_id, owner="task-A")
+
+    # After release a peer can immediately acquire.
+    b = await repository.try_acquire_refresh_lock(
+        session_id=record.session_id, owner="task-B", lock_ttl_seconds=30
+    )
+    assert b is True
+
+
+@pytest.mark.asyncio
+async def test_release_refresh_lock_is_no_op_for_non_owner(
+    repository, sample_record
+) -> None:
+    """Best-effort release: if a peer has already taken over the lock
+    (because ours TTL'd), the release MUST NOT clear their lock attrs."""
+    record = sample_record()
+    await repository.put(record)
+    await repository.try_acquire_refresh_lock(
+        session_id=record.session_id, owner="task-A", lock_ttl_seconds=30
+    )
+
+    # task-B (who never held the lock) calls release — must not blow away
+    # task-A's lock.
+    await repository.release_refresh_lock(record.session_id, owner="task-B")
+
+    # task-A's lock is still in force; a third contender can't acquire.
+    c = await repository.try_acquire_refresh_lock(
+        session_id=record.session_id, owner="task-C", lock_ttl_seconds=30
+    )
+    assert c is False
+
+
+@pytest.mark.asyncio
+async def test_update_tokens_with_lock_owner_clears_lock_atomically(
+    repository, sample_record
+) -> None:
+    """Successful refresh persist clears the lock attributes in the same
+    write so peers don't have to wait for the TTL to retry."""
+    record = sample_record()
+    await repository.put(record)
+    await repository.try_acquire_refresh_lock(
+        session_id=record.session_id, owner="task-A", lock_ttl_seconds=30
+    )
+
+    await repository.update_tokens(
+        session_id=record.session_id,
+        access_token="access.fresh",
+        refresh_token="refresh.rotated",
+        id_token="id.fresh",
+        access_token_exp=int(time.time()) + 3600,
+        last_seen_at=int(time.time()),
+        expected_lock_owner="task-A",
+    )
+
+    # Lock cleared → another contender can acquire immediately.
+    b = await repository.try_acquire_refresh_lock(
+        session_id=record.session_id, owner="task-B", lock_ttl_seconds=30
+    )
+    assert b is True
+
+
+@pytest.mark.asyncio
+async def test_update_tokens_rejects_persist_when_peer_owns_the_lock(
+    repository, sample_record
+) -> None:
+    """Stale-leader guard: if our lock TTL'd and a peer took over, we must
+    NOT overwrite their freshly persisted tokens. ConditionalCheckFailed
+    propagates so the caller can re-read DDB and adopt the peer's state."""
+    from botocore.exceptions import ClientError
+
+    record = sample_record()
+    await repository.put(record)
+    # Peer task acquired the lock.
+    await repository.try_acquire_refresh_lock(
+        session_id=record.session_id, owner="peer-task", lock_ttl_seconds=30
+    )
+
+    with pytest.raises(ClientError) as exc_info:
+        await repository.update_tokens(
+            session_id=record.session_id,
+            access_token="access.stale",
+            refresh_token="refresh.stale",
+            id_token="id.stale",
+            access_token_exp=int(time.time()) + 3600,
+            last_seen_at=int(time.time()),
+            expected_lock_owner="our-task",  # ≠ peer-task
+        )
+    assert (
+        exc_info.value.response.get("Error", {}).get("Code")
+        == "ConditionalCheckFailedException"
+    )
+
+
+@pytest.mark.asyncio
+async def test_update_tokens_rejects_persist_when_peer_already_cleared_the_lock(
+    repository, sample_record
+) -> None:
+    """The other half of the stale-leader guard: our lock TTL'd, and a
+    peer took over, refreshed, and successfully persisted (which atomically
+    REMOVEs the lock attrs), so the row now has NO lock attributes at all.
+    A stale leader trying to persist with `expected_lock_owner=our-task`
+    must still fail closed; otherwise our older Cognito tokens would
+    silently overwrite the peer's freshly rotated ones, and the next
+    request would get NotAuthorizedException from Cognito (our refresh
+    token was revoked when the peer's rotation was issued).
+
+    Sequence:
+        1. Task A acquires lock at T0.
+        2. Task A's Cognito call hangs.
+        3. Task B sees lock TTL'd, acquires, refreshes, persists (clears).
+        4. Task A's Cognito finally returns; A tries to persist.
+        => MUST fail with ConditionalCheckFailedException.
+    """
+    from botocore.exceptions import ClientError
+
+    record = sample_record()
+    await repository.put(record)
+
+    # Peer acquired the lock and successfully persisted (clearing it).
+    await repository.try_acquire_refresh_lock(
+        session_id=record.session_id, owner="peer-task", lock_ttl_seconds=30
+    )
+    await repository.update_tokens(
+        session_id=record.session_id,
+        access_token="access.peer-fresh",
+        refresh_token="refresh.peer-rotated",
+        id_token="id.peer",
+        access_token_exp=int(time.time()) + 3600,
+        last_seen_at=int(time.time()),
+        expected_lock_owner="peer-task",
+    )
+
+    # Stale leader (our-task): its lock is no longer on the row, but it
+    # still presents its old `lock_owner` from before the TTL. Must fail.
+    with pytest.raises(ClientError) as exc_info:
+        await repository.update_tokens(
+            session_id=record.session_id,
+            access_token="access.stale-leader",
+            refresh_token="refresh.stale-leader",
+            id_token="id.stale-leader",
+            access_token_exp=int(time.time()) + 3600,
+            last_seen_at=int(time.time()),
+            expected_lock_owner="our-task",
+        )
+    assert (
+        exc_info.value.response.get("Error", {}).get("Code")
+        == "ConditionalCheckFailedException"
+    )
+
+    # Peer's tokens are still intact on the row.
+    fetched = await repository.get(record.session_id)
+    assert fetched is not None
+    assert fetched.cognito_access_token == "access.peer-fresh"
+    assert fetched.cognito_refresh_token == "refresh.peer-rotated"
+
+
+@pytest.mark.asyncio
+async def test_try_acquire_refresh_lock_does_not_create_phantom_row(
+    repository, moto_bff_dynamodb
+) -> None:
+    """Logout-during-refresh guard: if the session row was deleted between
+    `repository.get()` and `try_acquire_refresh_lock`, UpdateItem would
+    upsert a phantom row containing only the lock attrs (and crucially no
+    `ttl`, so DDB TTL would never reap it). The `attribute_exists(PK)`
+    guard turns that into a clean False return.
+
+    Asserts via raw DDB get_item — `repository.get` would mask a phantom
+    behind its post-read TTL check (a row with no `ttl` attribute reads
+    as `int(item.get("ttl", 0)) <= now`, treated as missing), so we
+    bypass that and look at the raw item.
+    """
+    # Session row never existed (or was just deleted by a logout from
+    # another task between this request's repository.get() and here).
+    acquired = await repository.try_acquire_refresh_lock(
+        session_id="never-existed",
+        owner="task-A",
+        lock_ttl_seconds=30,
+    )
+    assert acquired is False
+
+    # No phantom row was created — check the raw table, since
+    # repository.get() would also return None for a phantom (no `ttl`).
+    table = moto_bff_dynamodb.Table("test-bff-sessions")
+    response = table.get_item(
+        Key={"PK": "SESSION#never-existed", "SK": "META"}
+    )
+    assert "Item" not in response, (
+        "try_acquire_refresh_lock created a phantom row with no `ttl` — "
+        "DDB TTL would never reap it"
+    )
diff --git a/backend/tests/apis/shared/sessions_bff/test_session_refresh_cross_task.py b/backend/tests/apis/shared/sessions_bff/test_session_refresh_cross_task.py
new file mode 100644
index 00000000..946e357c
--- /dev/null
+++ b/backend/tests/apis/shared/sessions_bff/test_session_refresh_cross_task.py
@@ -0,0 +1,480 @@
+"""Cross-task refresh-coalescing tests for SessionRefreshMiddleware.
+
+Regression lock for the bug that PR #264 created and that the cookie-codec
+fix would *expose* once dev started working again. With `desiredCount: 2`,
+two `SessionRefreshMiddleware` instances running in two ECS tasks would each
+see a cookie crossing the refresh-leeway boundary and each call
+`cognito-idp:initiate_auth` with the same refresh token. One of them would
+lose the rotation race: Cognito revokes the original token on the winner's
+exchange, the loser gets `NotAuthorizedException`, and the loser's
+middleware clears the user's cookie. Page-load fan-outs become routine
+silent logouts.
+
+The fix coalesces the refresh exchange across tasks via a DynamoDB
+conditional-write lock (`refresh_lock_owner` + `refresh_lock_until` on
+the session row). These tests instantiate two repository + middleware
+pairs against ONE moto-backed DDB table so we can drive the leader and
+follower paths deterministically without spinning real ECS tasks.
+
+What's covered:
+    - Leader-only Cognito refresh under same-time contention from two tasks
+    - Follower adoption of the leader's persisted tokens (no Cognito call)
+    - Leader crash (Cognito error) releases the lock so peers can retry
+    - Lock TTL recovery: a crashed leader's lock unblocks peers after TTL
+    - Refresh-token rotation: peer's rotated tokens propagate to follower
+"""
+
+from __future__ import annotations
+
+import asyncio
+import secrets
+import time
+from typing import Optional
+from unittest.mock import AsyncMock, MagicMock
+
+import boto3
+import pytest
+from cryptography.hazmat.primitives.ciphers.aead import AESGCM
+from fastapi import FastAPI, Request
+from fastapi.testclient import TestClient
+from moto import mock_aws
+
+from apis.shared.middleware.session_refresh import SessionRefreshMiddleware
+from apis.shared.sessions_bff import lock as lock_module
+from apis.shared.sessions_bff import single_flight as single_flight_module
+from apis.shared.sessions_bff.cache import SessionCache
+from apis.shared.sessions_bff.config import (
+    BFFConfig,
+    SESSION_COOKIE_NAME,
+)
+from apis.shared.sessions_bff.cookie import CookieCodec
+from apis.shared.sessions_bff.models import CookiePayload, SessionRecord
+from apis.shared.sessions_bff.refresh import (
+    CognitoRefreshError,
+    RefreshResult,
+)
+from apis.shared.sessions_bff.repository import SessionRepository
+
+# Single shared DDB table — both "tasks" attach to the same backing store,
+# matching production where two ECS tasks read/write one BFFSessionsTable.
+TABLE_NAME = "test-bff-sessions"
+
+
+@pytest.fixture(autouse=True)
+def _reset_module_state():
+    """Drop process-wide locks + single-flight registries between tests so
+    a leftover Future or asyncio lock from one test can't influence the
+    next case's contention behavior."""
+    lock_module._reset_for_tests()
+    single_flight_module._reset_for_tests()
+    yield
+    lock_module._reset_for_tests()
+    single_flight_module._reset_for_tests()
+
+
+@pytest.fixture
+def two_task_setup(monkeypatch):
+    """Spin up two `SessionRefreshMiddleware` instances over one moto DDB
+    table so each represents a distinct ECS task in the same fleet."""
+    monkeypatch.setenv("AWS_DEFAULT_REGION", "us-east-1")
+    monkeypatch.setenv("AWS_ACCESS_KEY_ID", "testing")
+    monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "testing")
+
+    with mock_aws():
+        dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
+        dynamodb.create_table(
+            TableName=TABLE_NAME,
+            KeySchema=[
+                {"AttributeName": "PK", "KeyType": "HASH"},
+                {"AttributeName": "SK", "KeyType": "RANGE"},
+            ],
+            AttributeDefinitions=[
+                {"AttributeName": "PK", "AttributeType": "S"},
+                {"AttributeName": "SK", "AttributeType": "S"},
+            ],
+            BillingMode="PAY_PER_REQUEST",
+        )
+
+        # Both tasks share the data-key secret (otherwise the cookie sealed
+        # by Task A would unseal as `bad seal` on Task B — that's the OTHER
+        # bug in this branch, exercised by test_cookie). We pre-inject one
+        # AES key here to keep the test focused on the refresh-lock path.
+        shared_aes_key = secrets.token_bytes(32)
+
+        def _make_codec() -> CookieCodec:
+            codec = CookieCodec(
+                kms_key_arn="arn:aws:kms:fake",
+                data_key_secret_arn="arn:aws:secretsmanager:fake",
+            )
+            codec._cipher = AESGCM(shared_aes_key)
+            return codec
+
+        def _make_task(*, refresh_client) -> dict:
+            repo = SessionRepository(table_name=TABLE_NAME)
+            codec = _make_codec()
+            cache = SessionCache(ttl_seconds=60)
+            config = _enabled_config()
+
+            app = FastAPI()
+            app.add_middleware(
+                SessionRefreshMiddleware,
+                config=config,
+                repository=repo,
+                cookie_codec=codec,
+                refresh_client=refresh_client,
+                cache=cache,
+                refresh_lock_ttl_seconds=2,  # short for tests
+            )
+
+            @app.get("/echo")
+            async def echo(request: Request):
+                record = getattr(request.state, "bff_session", None)
+                return {
+                    "has_session": record is not None,
+                    "access_token": (
+                        record.cognito_access_token if record else None
+                    ),
+                    "refresh_token": (
+                        record.cognito_refresh_token if record else None
+                    ),
+                }
+
+            return {
+                "app": app,
+                "repository": repo,
+                "codec": codec,
+                "cache": cache,
+                "refresh_client": refresh_client,
+            }
+
+        yield {
+            "make_task": _make_task,
+            "table_name": TABLE_NAME,
+            "shared_aes_key": shared_aes_key,
+            "make_codec": _make_codec,
+        }
+
+
+def _enabled_config() -> BFFConfig:
+    return BFFConfig(
+        sessions_table_name="tbl",
+        cookie_signing_key_arn="arn:aws:kms:fake",
+        session_ttl_seconds=28800,
+        refresh_leeway_seconds=60,
+        cognito_bff_app_client_id="client-id",
+        cognito_bff_app_client_secret_arn="arn:secret",
+        inference_api_url=None,
+        absolute_lifetime_seconds=30 * 24 * 3600,
+        sliding_renewal_throttle_seconds=300,
+    )
+
+
+def _seed_session_in_refresh_window(repository: SessionRepository) -> SessionRecord:
+    """Persist a session whose access token is inside the refresh leeway,
+    so the middleware MUST hit the refresh path."""
+    now = int(time.time())
+    record = SessionRecord(
+        session_id="sess-cross-task",
+        user_id="user-001",
+        username="alice",
+        cognito_access_token="access.original",
+        cognito_refresh_token="refresh.original",
+        id_token="id.original",
+        access_token_exp=now + 5,  # within 60s leeway
+        csrf_secret="csrf-secret",
+        created_at=now,
+        last_seen_at=now,
+        ttl=now + 28800,
+    )
+    asyncio.run(repository.put(record))
+    return record
+
+
+def test_only_the_leader_calls_cognito_under_cross_task_contention(
+    two_task_setup,
+) -> None:
+    """Two tasks see the same cookie in the refresh window. Exactly one
+    calls Cognito (the leader). The other adopts the leader's tokens
+    from DDB without ever calling Cognito.
+
+    Pre-fix: BOTH tasks would call Cognito with the same refresh token,
+    and the loser would get NotAuthorizedException → clear cookie → 401.
+    """
+    # Refresh client A is the leader's; refresh client B simulates the
+    # follower's. We assert that B is NEVER called.
+    leader_refresh = AsyncMock(
+        return_value=RefreshResult(
+            access_token="access.fresh-from-leader",
+            refresh_token="refresh.rotated-by-leader",
+            id_token="id.fresh",
+            access_token_exp=int(time.time()) + 3600,
+        )
+    )
+    follower_refresh = AsyncMock(
+        side_effect=AssertionError(
+            "Follower MUST NOT call Cognito — peer holds the refresh lock"
+        )
+    )
+
+    task_a = two_task_setup["make_task"](refresh_client=MagicMock(refresh=leader_refresh))
+    task_b = two_task_setup["make_task"](refresh_client=MagicMock(refresh=follower_refresh))
+
+    record = _seed_session_in_refresh_window(task_a["repository"])
+    sealed = task_a["codec"].seal(CookiePayload(session_id=record.session_id))
+
+    # Drive task_a first (it'll grab the lock and refresh). Then drive
+    # task_b — it must observe the lock as held (or just released, with
+    # tokens already rotated on the row) and adopt rather than refresh.
+    with TestClient(task_a["app"]) as client_a:
+        response_a = client_a.get(
+            "/echo", cookies={SESSION_COOKIE_NAME: sealed}
+        )
+    with TestClient(task_b["app"]) as client_b:
+        response_b = client_b.get(
+            "/echo", cookies={SESSION_COOKIE_NAME: sealed}
+        )
+
+    assert response_a.status_code == 200
+    assert response_b.status_code == 200
+    assert response_a.json()["has_session"] is True
+    assert response_b.json()["has_session"] is True
+    # Both tasks see the leader's freshly rotated tokens.
+    assert response_a.json()["access_token"] == "access.fresh-from-leader"
+    assert response_b.json()["access_token"] == "access.fresh-from-leader"
+    assert response_b.json()["refresh_token"] == "refresh.rotated-by-leader"
+
+    leader_refresh.assert_called_once()
+    follower_refresh.assert_not_called()
+
+
+def test_follower_polls_until_leader_persists_then_adopts(
+    two_task_setup,
+) -> None:
+    """Simulates near-simultaneous arrival: task_a gets the lock just
+    before task_b runs. Task_b's `_wait_for_peer_refresh` polls DDB
+    and adopts task_a's tokens once they land.
+
+    To force the follower to actually poll (rather than fall through
+    a fully-completed leader path), we make the leader's Cognito refresh
+    take a measurable amount of time and start the follower while the
+    leader is still in flight.
+    """
+    leader_done = asyncio.Event()
+    follower_started = asyncio.Event()
+
+    async def slow_leader_refresh(*args, **kwargs) -> RefreshResult:
+        # Wait for the follower to be inside its poll loop, then complete.
+        await follower_started.wait()
+        await asyncio.sleep(0.05)
+        leader_done.set()
+        return RefreshResult(
+            access_token="access.fresh-leader",
+            refresh_token="refresh.rotated-leader",
+            id_token="id.fresh",
+            access_token_exp=int(time.time()) + 3600,
+        )
+
+    leader_refresh = AsyncMock(side_effect=slow_leader_refresh)
+    follower_refresh = AsyncMock(
+        side_effect=AssertionError("Follower must NOT call Cognito")
+    )
+
+    task_a = two_task_setup["make_task"](refresh_client=MagicMock(refresh=leader_refresh))
+    task_b = two_task_setup["make_task"](refresh_client=MagicMock(refresh=follower_refresh))
+    record = _seed_session_in_refresh_window(task_a["repository"])
+    sealed = task_a["codec"].seal(CookiePayload(session_id=record.session_id))
+
+    async def drive_both() -> tuple[dict, dict]:
+        async def hit(client_app):
+            from httpx import ASGITransport, AsyncClient
+
+            async with AsyncClient(
+                transport=ASGITransport(app=client_app), base_url="http://t"
+            ) as client:
+                response = await client.get(
+                    "/echo", cookies={SESSION_COOKIE_NAME: sealed}
+                )
+                return response.json()
+
+        async def driven_follower():
+            # Start the follower a tick later, so the leader has the lock.
+            await asyncio.sleep(0.02)
+            follower_started.set()
+            return await hit(task_b["app"])
+
+        a, b = await asyncio.gather(hit(task_a["app"]), driven_follower())
+        return a, b
+
+    a_body, b_body = asyncio.run(drive_both())
+
+    assert a_body["has_session"] is True
+    assert b_body["has_session"] is True
+    assert a_body["access_token"] == "access.fresh-leader"
+    assert b_body["access_token"] == "access.fresh-leader"
+    leader_refresh.assert_called_once()
+    follower_refresh.assert_not_called()
+
+
+def test_lock_ttl_lets_a_peer_retry_after_a_dead_leader(
+    two_task_setup,
+) -> None:
+    """Leader's Cognito call fails → lock is released → peer can refresh
+    on its next request without waiting for the full TTL.
+
+    This guards against the worst case where a leader crashes mid-refresh
+    and never persists tokens. We don't want every subsequent request to
+    fail closed for the duration of the lock TTL.
+    """
+    leader_refresh = AsyncMock(side_effect=CognitoRefreshError("Cognito down"))
+    follower_refresh = AsyncMock(
+        return_value=RefreshResult(
+            access_token="access.peer-fresh",
+            refresh_token="refresh.peer-rotated",
+            id_token="id.peer",
+            access_token_exp=int(time.time()) + 3600,
+        )
+    )
+
+    task_a = two_task_setup["make_task"](
+        refresh_client=MagicMock(refresh=leader_refresh)
+    )
+    task_b = two_task_setup["make_task"](
+        refresh_client=MagicMock(refresh=follower_refresh)
+    )
+    record = _seed_session_in_refresh_window(task_a["repository"])
+    sealed = task_a["codec"].seal(CookiePayload(session_id=record.session_id))
+
+    # Task A: leader fails. The middleware clears its cookie for THIS
+    # request but releases the lock (so a peer can retry).
+    with TestClient(task_a["app"]) as client_a:
+        response_a = client_a.get(
+            "/echo", cookies={SESSION_COOKIE_NAME: sealed}
+        )
+    assert response_a.status_code == 200
+    assert response_a.json()["has_session"] is False
+    set_cookies_a = response_a.headers.get_list("set-cookie")
+    assert any(
+        "__Host-bff_session=" in c and "Max-Age=0" in c for c in set_cookies_a
+    ), "Task A must clear cookie after its own refresh failed"
+
+    # Task B (different request): lock is released; peer becomes the new
+    # leader and refreshes successfully.
+    with TestClient(task_b["app"]) as client_b:
+        response_b = client_b.get(
+            "/echo", cookies={SESSION_COOKIE_NAME: sealed}
+        )
+    assert response_b.status_code == 200
+    assert response_b.json()["has_session"] is True
+    assert response_b.json()["access_token"] == "access.peer-fresh"
+    leader_refresh.assert_called_once()
+    follower_refresh.assert_called_once()
+
+
+def test_follower_falls_back_terminal_when_leader_disappears_mid_refresh(
+    two_task_setup,
+) -> None:
+    """Pathological case: leader holds the lock but never persists tokens
+    AND never releases (e.g. process killed). The follower's poll deadline
+    is bounded by `refresh_lock_ttl_seconds`; after that, this request
+    fails closed (clear cookie). The user re-auths.
+
+    The next request after this one will see the lock TTL'd and can
+    re-acquire. The related explicit-release retry path is covered by
+    `test_lock_ttl_lets_a_peer_retry_after_a_dead_leader`.
+    """
+    follower_refresh = AsyncMock(
+        side_effect=AssertionError("Follower must NOT call Cognito while a peer holds the lock")
+    )
+    task_b = two_task_setup["make_task"](
+        refresh_client=MagicMock(refresh=follower_refresh)
+    )
+    record = _seed_session_in_refresh_window(task_b["repository"])
+
+    # Manually park a lock on the row as if some other task is mid-refresh
+    # but hasn't persisted yet (and won't, for the duration of this test).
+    asyncio.run(
+        task_b["repository"].try_acquire_refresh_lock(
+            session_id=record.session_id,
+            owner="ghost-task",
+            lock_ttl_seconds=2,  # matches make_task's middleware TTL
+        )
+    )
+
+    sealed = task_b["codec"].seal(CookiePayload(session_id=record.session_id))
+    with TestClient(task_b["app"]) as client_b:
+        response = client_b.get(
+            "/echo", cookies={SESSION_COOKIE_NAME: sealed}
+        )
+
+    assert response.status_code == 200
+    assert response.json()["has_session"] is False
+    set_cookies = response.headers.get_list("set-cookie")
+    assert any(
+        "__Host-bff_session=" in c and "Max-Age=0" in c for c in set_cookies
+    ), "Follower must clear cookie after polling timed out on a stuck leader"
+    follower_refresh.assert_not_called()
+
+
+def test_two_tasks_in_parallel_call_cognito_at_most_once(
+    two_task_setup,
+) -> None:
+    """Pure asyncio gather of one request per task at the same instant.
+    Whichever wins the conditional UpdateItem becomes the leader; the
+    other adopts. Combined Cognito call count must be exactly 1.
+
+    This is the closest analogue to the page-load fan-out behavior we
+    care about in production — two tasks each receive their share of
+    the 8-endpoint fan-out at the moment the cookie crosses the leeway
+    window.
+    """
+    refresh_count = {"calls": 0}
+
+    async def counted_refresh(*args, **kwargs):
+        refresh_count["calls"] += 1
+        await asyncio.sleep(0.05)
+        return RefreshResult(
+            access_token="access.fresh",
+            refresh_token="refresh.rotated",
+            id_token="id.fresh",
+            access_token_exp=int(time.time()) + 3600,
+        )
+
+    refresh_a = AsyncMock(side_effect=counted_refresh)
+    refresh_b = AsyncMock(side_effect=counted_refresh)
+
+    task_a = two_task_setup["make_task"](
+        refresh_client=MagicMock(refresh=refresh_a)
+    )
+    task_b = two_task_setup["make_task"](
+        refresh_client=MagicMock(refresh=refresh_b)
+    )
+    record = _seed_session_in_refresh_window(task_a["repository"])
+    sealed = task_a["codec"].seal(CookiePayload(session_id=record.session_id))
+
+    async def drive() -> tuple[dict, dict]:
+        from httpx import ASGITransport, AsyncClient
+
+        async def hit(app):
+            async with AsyncClient(
+                transport=ASGITransport(app=app), base_url="http://t"
+            ) as client:
+                response = await client.get(
+                    "/echo", cookies={SESSION_COOKIE_NAME: sealed}
+                )
+                return response.json()
+
+        return await asyncio.gather(hit(task_a["app"]), hit(task_b["app"]))
+
+    a_body, b_body = asyncio.run(drive())
+
+    # Both succeeded.
+    assert a_body["has_session"] is True
+    assert b_body["has_session"] is True
+    # Both got the same fresh tokens (one set, sourced from the leader).
+    assert a_body["access_token"] == b_body["access_token"] == "access.fresh"
+    assert a_body["refresh_token"] == b_body["refresh_token"] == "refresh.rotated"
+    # CRITICAL: across BOTH tasks, Cognito refresh was called exactly once.
+    assert refresh_count["calls"] == 1, (
+        f"Cross-task coalescing violated — Cognito refresh was called "
+        f"{refresh_count['calls']} times across two tasks"
+    )
diff --git a/backend/tests/apis/shared/sessions_bff/test_session_refresh_middleware.py b/backend/tests/apis/shared/sessions_bff/test_session_refresh_middleware.py
index cdc68f7e..7e64f02d 100644
--- a/backend/tests/apis/shared/sessions_bff/test_session_refresh_middleware.py
+++ b/backend/tests/apis/shared/sessions_bff/test_session_refresh_middleware.py
@@ -226,12 +226,16 @@ def test_near_expiry_session_triggers_refresh_once() -> None:
     repo = AsyncMock()
     repo.get.return_value = record
     codec = _make_codec()
+    # `refresh_client.refresh` is now `async` (task 3.2) — use AsyncMock so
+    # `await self._refresh_client.refresh(...)` in the middleware resolves.
     refresh = MagicMock()
-    refresh.refresh.return_value = RefreshResult(
-        access_token="access.fresh",
-        refresh_token="refresh.fresh",
-        id_token="id.fresh",
-        access_token_exp=int(time.time()) + 3600,
+    refresh.refresh = AsyncMock(
+        return_value=RefreshResult(
+            access_token="access.fresh",
+            refresh_token="refresh.fresh",
+            id_token="id.fresh",
+            access_token_exp=int(time.time()) + 3600,
+        )
     )
     app = _build_app(
         config=_enabled_config(), repository=repo, codec=codec, refresh_client=refresh
@@ -245,7 +249,7 @@ def test_near_expiry_session_triggers_refresh_once() -> None:
     assert body["has_session"] is True
     # The refreshed token should be exposed downstream.
     assert body["access_token"] == "access.fresh"
-    refresh.refresh.assert_called_once_with(
+    refresh.refresh.assert_awaited_once_with(
         username="alice", refresh_token="refresh.original"
     )
     repo.update_tokens.assert_awaited_once()
@@ -259,7 +263,7 @@ def test_refresh_failure_clears_cookie() -> None:
     repo.get.return_value = record
     codec = _make_codec()
     refresh = MagicMock()
-    refresh.refresh.side_effect = CognitoRefreshError("rotated")
+    refresh.refresh = AsyncMock(side_effect=CognitoRefreshError("rotated"))
     app = _build_app(
         config=_enabled_config(), repository=repo, codec=codec, refresh_client=refresh
     )
@@ -298,7 +302,7 @@ def slow_refresh(*, username: str, refresh_token: str) -> RefreshResult:
         )
 
     refresh = MagicMock()
-    refresh.refresh.side_effect = slow_refresh
+    refresh.refresh = AsyncMock(side_effect=slow_refresh)
 
     # After the first refresh, repo.get returns the *fresh* record so other
     # waiters short-circuit out of the refresh branch.
@@ -381,7 +385,13 @@ def test_slide_within_throttle_window_does_not_write_or_reemit() -> None:
 def test_slide_past_throttle_writes_ddb_and_reemits_cookie() -> None:
     """Once `last_seen_at` is older than the throttle window, the slide
     fires: one DDB touch with a fresh ttl, plus a Set-Cookie carrying a
-    fresh Max-Age = session_ttl_seconds."""
+    fresh Max-Age = session_ttl_seconds.
+
+    The slide-write is fire-and-forget (task 3.5) — we poll for the
+    background task's side effect rather than sample immediately. The
+    observable external contract (Set-Cookie Max-Age) is unchanged; only
+    the internal timing of the write moves off the request path.
+    """
     record = _make_record()
     record.last_seen_at = int(time.time()) - 120  # past the 60s throttle
     repo = AsyncMock()
@@ -393,7 +403,21 @@ def test_slide_past_throttle_writes_ddb_and_reemits_cookie() -> None:
     )
 
     sealed = codec.seal(CookiePayload(session_id=record.session_id))
-    response = TestClient(app).get("/echo", cookies={SESSION_COOKIE_NAME: sealed})
+    with TestClient(app) as client:
+        response = client.get("/echo", cookies={SESSION_COOKIE_NAME: sealed})
+        # Poll for the fire-and-forget slide-write (task 3.5) INSIDE the
+        # `with` block — TestClient tears down its anyio portal (and the
+        # event loop) on `__exit__`, cancelling any unfinished tasks.
+        # Drive the loop with a second GET if the first request's
+        # background task hasn't flushed yet.
+        deadline = time.monotonic() + 1.0
+        while time.monotonic() < deadline and repo.touch_last_seen.await_count == 0:
+            time.sleep(0.01)
+        if repo.touch_last_seen.await_count == 0:
+            client.get("/echo")
+            deadline = time.monotonic() + 1.0
+            while time.monotonic() < deadline and repo.touch_last_seen.await_count == 0:
+                time.sleep(0.01)
 
     assert response.status_code == 200
     # Exactly one slide-write, and it carries a ttl bumped by ~session_ttl_seconds.
@@ -472,6 +496,44 @@ def test_slide_max_age_capped_by_remaining_absolute_lifetime() -> None:
     assert 350 <= max_age <= 400
 
 
+def test_refresh_path_past_absolute_cap_clears_cookie_without_calling_cognito() -> None:
+    """The refresh path must mirror the slide path's absolute-cap behavior:
+    once `created_at + absolute_lifetime` has passed, do NOT mint fresh
+    tokens. Persisting them would also write a past-dated `ttl`
+    (`min(now + session_ttl_seconds, absolute_cap)` is `< now` past the
+    cap), which would instantly TTL-evict the row right after the write
+    and silently log the user out one request later. Failing closed up
+    front avoids burning a Cognito refresh-token rotation we'd just
+    throw away."""
+    record = _make_record(access_token_exp=int(time.time()) + 5)  # within leeway
+    record.created_at = int(time.time()) - 200  # past 100s absolute cap
+    repo = AsyncMock()
+    repo.get.return_value = record
+    codec = _make_codec()
+    refresh = MagicMock()
+    refresh.refresh = AsyncMock(
+        side_effect=AssertionError(
+            "Cognito refresh MUST NOT be called past absolute lifetime"
+        )
+    )
+    app = _build_app(
+        config=_enabled_config(absolute_lifetime_seconds=100),
+        repository=repo,
+        codec=codec,
+        refresh_client=refresh,
+    )
+
+    sealed = codec.seal(CookiePayload(session_id=record.session_id))
+    response = TestClient(app).get("/echo", cookies={SESSION_COOKIE_NAME: sealed})
+
+    assert response.status_code == 200
+    assert response.json()["has_session"] is False
+    refresh.refresh.assert_not_called()
+    repo.update_tokens.assert_not_called()
+    cleared = " ".join(response.headers.get_list("set-cookie"))
+    assert SESSION_COOKIE_NAME in cleared and "Max-Age=0" in cleared
+
+
 def test_refresh_path_bumps_ttl_when_persisting_tokens() -> None:
     """The token-rotation write must also slide the row's ttl forward —
     otherwise a session that just refreshed could still expire moments
@@ -481,11 +543,13 @@ def test_refresh_path_bumps_ttl_when_persisting_tokens() -> None:
     repo.get.return_value = record
     codec = _make_codec()
     refresh = MagicMock()
-    refresh.refresh.return_value = RefreshResult(
-        access_token="access.fresh",
-        refresh_token="refresh.original",  # no rotation
-        id_token="id.fresh",
-        access_token_exp=int(time.time()) + 3600,
+    refresh.refresh = AsyncMock(
+        return_value=RefreshResult(
+            access_token="access.fresh",
+            refresh_token="refresh.original",  # no rotation
+            id_token="id.fresh",
+            access_token_exp=int(time.time()) + 3600,
+        )
     )
     app = _build_app(
         config=_enabled_config(), repository=repo, codec=codec, refresh_client=refresh
@@ -517,11 +581,13 @@ def test_rotation_persist_failure_invalidates_session() -> None:
     repo.update_tokens.side_effect = RuntimeError("DDB throttled")
     codec = _make_codec()
     refresh = MagicMock()
-    refresh.refresh.return_value = RefreshResult(
-        access_token="access.fresh",
-        refresh_token="refresh.ROTATED",  # rotation kicked in
-        id_token="id.fresh",
-        access_token_exp=int(time.time()) + 3600,
+    refresh.refresh = AsyncMock(
+        return_value=RefreshResult(
+            access_token="access.fresh",
+            refresh_token="refresh.ROTATED",  # rotation kicked in
+            id_token="id.fresh",
+            access_token_exp=int(time.time()) + 3600,
+        )
     )
     app = _build_app(
         config=_enabled_config(), repository=repo, codec=codec, refresh_client=refresh
@@ -550,11 +616,13 @@ def test_non_rotation_persist_failure_does_not_invalidate() -> None:
     repo.update_tokens.side_effect = RuntimeError("DDB throttled")
     codec = _make_codec()
     refresh = MagicMock()
-    refresh.refresh.return_value = RefreshResult(
-        access_token="access.fresh",
-        refresh_token="refresh.original",  # SAME — no rotation
-        id_token="id.fresh",
-        access_token_exp=int(time.time()) + 3600,
+    refresh.refresh = AsyncMock(
+        return_value=RefreshResult(
+            access_token="access.fresh",
+            refresh_token="refresh.original",  # SAME — no rotation
+            id_token="id.fresh",
+            access_token_exp=int(time.time()) + 3600,
+        )
     )
     app = _build_app(
         config=_enabled_config(), repository=repo, codec=codec, refresh_client=refresh
diff --git a/backend/tests/apis/shared/sessions_bff/test_single_flight.py b/backend/tests/apis/shared/sessions_bff/test_single_flight.py
new file mode 100644
index 00000000..e2159765
--- /dev/null
+++ b/backend/tests/apis/shared/sessions_bff/test_single_flight.py
@@ -0,0 +1,211 @@
+"""Unit tests for the per-session single-flight primitive.
+
+Covers the contract documented in
+`backend/src/apis/shared/sessions_bff/single_flight.py`:
+
+1. Two concurrent `resolve_once` calls for the same `session_id` share one
+   loader invocation; both receive the same result.
+2. An exception raised by the loader propagates to every current waiter
+   (leader + all followers).
+3. After a loader exception the registry entry is removed, so a subsequent
+   call starts a fresh leader.
+4. Distinct `session_id`s are independent (two different sessions produce two
+   loader invocations).
+5. Happy path: a single caller's result is returned correctly.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import time
+from typing import Optional, Tuple
+
+import pytest
+
+from apis.shared.sessions_bff import single_flight
+from apis.shared.sessions_bff.models import SessionRecord
+
+
+def _make_record(session_id: str = "sess-sf-001") -> SessionRecord:
+    now = int(time.time())
+    return SessionRecord(
+        session_id=session_id,
+        user_id=f"user-for-{session_id}",
+        username="alice",
+        cognito_access_token="access.token.value",
+        cognito_refresh_token="refresh.token.value",
+        id_token="id.token.value",
+        access_token_exp=now + 3600,
+        csrf_secret="csrf-secret-deadbeef",
+        created_at=now,
+        last_seen_at=now,
+        ttl=now + 28800,
+    )
+
+
+@pytest.fixture(autouse=True)
+def _reset_registry():
+    """Drop any residual in-flight Futures between tests."""
+    single_flight._reset_for_tests()
+    yield
+    single_flight._reset_for_tests()
+
+
+@pytest.mark.asyncio
+async def test_happy_path_single_caller_returns_loader_result():
+    """A lone caller receives the loader's exact return value."""
+    record = _make_record()
+    call_count = 0
+
+    async def loader() -> Tuple[Optional[SessionRecord], bool]:
+        nonlocal call_count
+        call_count += 1
+        return record, False
+
+    result = await single_flight.resolve_once("sess-sf-001", loader)
+
+    assert result == (record, False)
+    assert call_count == 1
+    # Registry is clean after success.
+    assert "sess-sf-001" not in single_flight._inflight
+
+
+@pytest.mark.asyncio
+async def test_concurrent_same_session_share_one_loader_invocation():
+    """N concurrent `resolve_once` calls on the same session call loader once."""
+    record = _make_record()
+    call_count = 0
+    gate = asyncio.Event()
+
+    async def loader() -> Tuple[Optional[SessionRecord], bool]:
+        nonlocal call_count
+        call_count += 1
+        # Hold the leader open long enough for followers to attach.
+        await gate.wait()
+        return record, False
+
+    async def release_after_followers_attach() -> None:
+        # Give followers a chance to see the existing Future.
+        await asyncio.sleep(0.05)
+        gate.set()
+
+    tasks = [
+        asyncio.create_task(single_flight.resolve_once("sess-sf-002", loader))
+        for _ in range(8)
+    ]
+    releaser = asyncio.create_task(release_after_followers_attach())
+
+    results = await asyncio.gather(*tasks)
+    await releaser
+
+    assert call_count == 1, "loader must be invoked exactly once for shared session"
+    for result in results:
+        assert result == (record, False)
+    assert "sess-sf-002" not in single_flight._inflight
+
+
+@pytest.mark.asyncio
+async def test_loader_exception_propagates_to_all_waiters():
+    """An exception from the loader reaches the leader and every follower."""
+    call_count = 0
+    gate = asyncio.Event()
+
+    class LoaderBoom(RuntimeError):
+        pass
+
+    async def loader() -> Tuple[Optional[SessionRecord], bool]:
+        nonlocal call_count
+        call_count += 1
+        await gate.wait()
+        raise LoaderBoom("cognito exploded")
+
+    async def release_after_followers_attach() -> None:
+        await asyncio.sleep(0.05)
+        gate.set()
+
+    tasks = [
+        asyncio.create_task(single_flight.resolve_once("sess-sf-003", loader))
+        for _ in range(5)
+    ]
+    releaser = asyncio.create_task(release_after_followers_attach())
+
+    results = await asyncio.gather(*tasks, return_exceptions=True)
+    await releaser
+
+    assert call_count == 1
+    assert len(results) == 5
+    for outcome in results:
+        assert isinstance(outcome, LoaderBoom)
+        assert str(outcome) == "cognito exploded"
+
+
+@pytest.mark.asyncio
+async def test_registry_entry_removed_after_exception_so_next_call_is_fresh_leader():
+    """After a loader failure, the next call must start a new leader."""
+    attempts = 0
+
+    async def failing_loader() -> Tuple[Optional[SessionRecord], bool]:
+        nonlocal attempts
+        attempts += 1
+        raise ValueError("transient ddb failure")
+
+    with pytest.raises(ValueError):
+        await single_flight.resolve_once("sess-sf-004", failing_loader)
+
+    # Registry entry must be gone so the next call is a new leader.
+    assert "sess-sf-004" not in single_flight._inflight
+
+    record = _make_record("sess-sf-004")
+
+    async def succeeding_loader() -> Tuple[Optional[SessionRecord], bool]:
+        nonlocal attempts
+        attempts += 1
+        return record, False
+
+    result = await single_flight.resolve_once("sess-sf-004", succeeding_loader)
+
+    assert result == (record, False)
+    assert attempts == 2, "both loaders ran; the failure did not sticky-cache"
+    assert "sess-sf-004" not in single_flight._inflight
+
+
+@pytest.mark.asyncio
+async def test_distinct_sessions_are_independent():
+    """Two different `session_id`s run two independent loader invocations."""
+    calls: list[str] = []
+    record_a = _make_record("sess-A")
+    record_b = _make_record("sess-B")
+
+    def loader_for(session_id: str, record: SessionRecord):
+        async def _loader() -> Tuple[Optional[SessionRecord], bool]:
+            calls.append(session_id)
+            # Small sleep to encourage interleaving.
+            await asyncio.sleep(0.01)
+            return record, False
+
+        return _loader
+
+    loader_a = loader_for("sess-A", record_a)
+    loader_b = loader_for("sess-B", record_b)
+
+    result_a, result_b = await asyncio.gather(
+        single_flight.resolve_once("sess-A", loader_a),
+        single_flight.resolve_once("sess-B", loader_b),
+    )
+
+    assert result_a == (record_a, False)
+    assert result_b == (record_b, False)
+    assert sorted(calls) == ["sess-A", "sess-B"], "each session's loader runs exactly once"
+    assert "sess-A" not in single_flight._inflight
+    assert "sess-B" not in single_flight._inflight
+
+
+@pytest.mark.asyncio
+async def test_clear_cookie_flag_is_preserved():
+    """`resolve_once` must faithfully propagate the `clear_cookie` bool."""
+
+    async def loader_none_clear() -> Tuple[Optional[SessionRecord], bool]:
+        return None, True
+
+    result = await single_flight.resolve_once("sess-sf-005", loader_none_clear)
+    assert result == (None, True)
diff --git a/backend/tests/auth/test_dependencies.py b/backend/tests/auth/test_dependencies.py
index c336d115..5e4fcf89 100644
--- a/backend/tests/auth/test_dependencies.py
+++ b/backend/tests/auth/test_dependencies.py
@@ -1,22 +1,18 @@
 """Tests for FastAPI auth dependencies.
 
 Covers:
-- get_current_user: Bearer token validation via CognitoJWTValidator
 - get_current_user_trusted: JWT decode without signature verification
 - get_current_user_id: convenience wrapper returning user_id string
 
 Requirements: 10.5, 10.6
 """
 
-import time
-from unittest.mock import AsyncMock, MagicMock, patch
+from unittest.mock import MagicMock, patch
 
-import jwt as pyjwt
 import pytest
 from fastapi import HTTPException
 
 from apis.shared.auth.dependencies import (
-    get_current_user,
     get_current_user_id,
     get_current_user_trusted,
 )
@@ -34,107 +30,6 @@ def _bearer(token: str):
     return creds
 
 
-# ---------------------------------------------------------------------------
-# get_current_user tests
-# ---------------------------------------------------------------------------
-
-
-class TestGetCurrentUser:
-    """Tests for the get_current_user dependency (Cognito-based)."""
-
-    @pytest.mark.asyncio
-    async def test_valid_bearer_token(self, make_jwt, make_user):
-        """Req 10.5: valid Bearer token validated by CognitoJWTValidator, returns User with raw_token."""
-        token = make_jwt()
-        expected_user = make_user(raw_token=None)
-
-        mock_validator = MagicMock()
-        mock_validator.validate_token = MagicMock(return_value=expected_user)
-
-        with patch(
-            "apis.shared.auth.dependencies._get_cognito_validator",
-            return_value=mock_validator,
-        ), patch(
-            "apis.shared.auth.dependencies._get_user_sync_service",
-            return_value=None,
-        ):
-            user = await get_current_user(credentials=_bearer(token))
-
-        assert isinstance(user, User)
-        assert user.raw_token == token
-        assert user.user_id == expected_user.user_id
-        mock_validator.validate_token.assert_called_once_with(token)
-
-    @pytest.mark.asyncio
-    async def test_no_credentials_401(self):
-        """Req 10.5: None credentials raises 401 with WWW-Authenticate header."""
-        with pytest.raises(HTTPException) as exc_info:
-            await get_current_user(credentials=None)
-
-        assert exc_info.value.status_code == 401
-        assert "WWW-Authenticate" in (exc_info.value.headers or {})
-
-    @pytest.mark.asyncio
-    async def test_failed_validation_401(self, make_jwt):
-        """Req 10.5: token that fails Cognito validation raises 401."""
-        token = make_jwt()
-
-        mock_validator = MagicMock()
-        mock_validator.validate_token = MagicMock(
-            side_effect=HTTPException(status_code=401, detail="Invalid token signature.")
-        )
-
-        with patch(
-            "apis.shared.auth.dependencies._get_cognito_validator",
-            return_value=mock_validator,
-        ), patch(
-            "apis.shared.auth.dependencies._get_user_sync_service",
-            return_value=None,
-        ):
-            with pytest.raises(HTTPException) as exc_info:
-                await get_current_user(credentials=_bearer(token))
-
-        assert exc_info.value.status_code == 401
-
-    @pytest.mark.asyncio
-    async def test_no_validator_500(self, make_jwt):
-        """Req 10.6: no Cognito validator available raises 500."""
-        token = make_jwt()
-
-        with patch(
-            "apis.shared.auth.dependencies._get_cognito_validator",
-            return_value=None,
-        ):
-            with pytest.raises(HTTPException) as exc_info:
-                await get_current_user(credentials=_bearer(token))
-
-        assert exc_info.value.status_code == 500
-        assert "Authentication service not configured" in exc_info.value.detail
-
-    @pytest.mark.asyncio
-    async def test_unexpected_exception_401(self, make_jwt):
-        """Unexpected exception during validation raises 401."""
-        token = make_jwt()
-
-        mock_validator = MagicMock()
-        mock_validator.validate_token = MagicMock(
-            side_effect=RuntimeError("unexpected")
-        )
-
-        with patch(
-            "apis.shared.auth.dependencies._get_cognito_validator",
-            return_value=mock_validator,
-        ), patch(
-            "apis.shared.auth.dependencies._get_user_sync_service",
-            return_value=None,
-        ):
-            with pytest.raises(HTTPException) as exc_info:
-                await get_current_user(credentials=_bearer(token))
-
-        assert exc_info.value.status_code == 401
-        assert exc_info.value.detail == "Authentication failed."
-
-
 # ---------------------------------------------------------------------------
 # get_current_user_trusted tests
 # ---------------------------------------------------------------------------
@@ -260,24 +155,11 @@ class TestGetCurrentUserId:
     """Tests for the get_current_user_id dependency."""
 
     @pytest.mark.asyncio
-    async def test_returns_string(self, make_jwt, make_user):
-        """get_current_user_id returns the user_id string."""
-        token = make_jwt()
+    async def test_returns_string(self, make_user):
+        """get_current_user_id returns the resolved user's user_id."""
         expected_user = make_user(user_id="uid-42")
 
-        mock_validator = MagicMock()
-        mock_validator.validate_token = MagicMock(return_value=expected_user)
-
-        with patch(
-            "apis.shared.auth.dependencies._get_cognito_validator",
-            return_value=mock_validator,
-        ), patch(
-            "apis.shared.auth.dependencies._get_user_sync_service",
-            return_value=None,
-        ):
-            user_id = await get_current_user_id(
-                user=await get_current_user(credentials=_bearer(token))
-            )
+        user_id = await get_current_user_id(user=expected_user)
 
         assert user_id == "uid-42"
         assert isinstance(user_id, str)
diff --git a/backend/tests/auth/test_skip_auth.py b/backend/tests/auth/test_skip_auth.py
new file mode 100644
index 00000000..bbf3a299
--- /dev/null
+++ b/backend/tests/auth/test_skip_auth.py
@@ -0,0 +1,252 @@
+"""Tests for the SKIP_AUTH=true local-dev bypass.
+
+Covers:
+- `_skip_auth_user()` returns None when disabled, fake User when enabled
+- All three auth dependencies bypass when SKIP_AUTH=true
+- `_validate_skip_auth_or_raise()` accepts localhost-only CORS_ORIGINS,
+  rejects empty CORS_ORIGINS, rejects any non-localhost origin
+- The CI-guard regex matches realistic leak strings and skips the
+  legitimate references in dependencies.py / main.py
+"""
+
+import importlib
+import re
+from unittest.mock import MagicMock, patch
+
+import pytest
+from fastapi import HTTPException
+
+from apis.shared.auth.dependencies import (
+    _skip_auth_user,
+    get_current_user_from_session,
+    get_current_user_trusted,
+)
+from apis.shared.auth.models import User
+
+
+# ---------------------------------------------------------------------------
+# Env helpers
+# ---------------------------------------------------------------------------
+
+@pytest.fixture
+def clean_skip_auth_env(monkeypatch):
+    """Clear all SKIP_AUTH_* env vars so each test starts from a known state."""
+    for key in (
+        "SKIP_AUTH",
+        "SKIP_AUTH_ROLES",
+        "SKIP_AUTH_USER_ID",
+        "SKIP_AUTH_EMAIL",
+        "CORS_ORIGINS",
+    ):
+        monkeypatch.delenv(key, raising=False)
+
+
+# ---------------------------------------------------------------------------
+# _skip_auth_user
+# ---------------------------------------------------------------------------
+
+
+class TestSkipAuthUser:
+    """Tests for the `_skip_auth_user()` helper."""
+
+    def test_returns_none_when_unset(self, clean_skip_auth_env):
+        assert _skip_auth_user() is None
+
+    @pytest.mark.parametrize("value", ["false", "0", "", "no", "FALSE"])
+    def test_returns_none_when_falsey(self, clean_skip_auth_env, monkeypatch, value):
+        monkeypatch.setenv("SKIP_AUTH", value)
+        assert _skip_auth_user() is None
+
+    @pytest.mark.parametrize("value", ["true", "TRUE", "True"])
+    def test_returns_user_when_true(self, clean_skip_auth_env, monkeypatch, value):
+        monkeypatch.setenv("SKIP_AUTH", value)
+        user = _skip_auth_user()
+        assert isinstance(user, User)
+        assert user.user_id == "local-dev"
+        assert user.email == "dev@local"
+        assert user.name == "Local Dev"
+        assert user.roles == ["admin"]
+
+    def test_overrides_via_env(self, clean_skip_auth_env, monkeypatch):
+        monkeypatch.setenv("SKIP_AUTH", "true")
+        monkeypatch.setenv("SKIP_AUTH_USER_ID", "phil")
+        monkeypatch.setenv("SKIP_AUTH_EMAIL", "phil@example.com")
+        monkeypatch.setenv("SKIP_AUTH_ROLES", "admin,DotNetDevelopers, ,QA ")
+
+        user = _skip_auth_user()
+        assert user is not None
+        assert user.user_id == "phil"
+        assert user.email == "phil@example.com"
+        # whitespace-only entries filtered, surrounding whitespace stripped
+        assert user.roles == ["admin", "DotNetDevelopers", "QA"]
+
+
+# ---------------------------------------------------------------------------
+# Dependency bypass
+# ---------------------------------------------------------------------------
+
+
+class TestDependencyBypass:
+    """Tests that the bypass short-circuits each auth dependency."""
+
+    @pytest.mark.asyncio
+    async def test_get_current_user_from_session_bypassed(
+        self, clean_skip_auth_env, monkeypatch
+    ):
+        monkeypatch.setenv("SKIP_AUTH", "true")
+        # Build a request whose state.bff_session is unset. Without the
+        # bypass this would 401; with the bypass we get the fake user.
+        request = MagicMock()
+        request.state = MagicMock(spec=[])  # no bff_session attr
+
+        user = await get_current_user_from_session(request)
+        assert user.user_id == "local-dev"
+
+    @pytest.mark.asyncio
+    async def test_get_current_user_trusted_bypassed(
+        self, clean_skip_auth_env, monkeypatch
+    ):
+        monkeypatch.setenv("SKIP_AUTH", "true")
+        # No credentials supplied — without the bypass this 401s.
+        user = await get_current_user_trusted(credentials=None)
+        assert user.user_id == "local-dev"
+
+    @pytest.mark.asyncio
+    async def test_get_current_user_from_session_still_401_when_disabled(
+        self, clean_skip_auth_env
+    ):
+        """Sanity check: with SKIP_AUTH unset, missing session still 401."""
+        request = MagicMock()
+        request.state = MagicMock(spec=[])  # no bff_session attr
+        with pytest.raises(HTTPException) as exc:
+            await get_current_user_from_session(request)
+        assert exc.value.status_code == 401
+
+
+# ---------------------------------------------------------------------------
+# Startup guard
+# ---------------------------------------------------------------------------
+
+
+def _import_main_module():
+    """Import (and reload) apis.app_api.main so it picks up current env.
+
+    The module calls `load_dotenv(..., override=True)` at import time,
+    which would clobber monkeypatched env vars on reload. Patch the symbol
+    at its source (`dotenv.load_dotenv`) so the module's re-executed
+    `from dotenv import load_dotenv` binds the no-op instead.
+    """
+    with patch("dotenv.load_dotenv", lambda *a, **kw: None):
+        import apis.app_api.main as m
+        return importlib.reload(m)
+
+
+class TestStartupGuard:
+    """Tests for `_validate_skip_auth_or_raise()` in app_api/main.py."""
+
+    def test_noop_when_skip_auth_off(self, clean_skip_auth_env):
+        m = _import_main_module()
+        # Doesn't raise even with no CORS_ORIGINS — guard is a no-op.
+        m._validate_skip_auth_or_raise()
+
+    @pytest.mark.parametrize(
+        "origins",
+        [
+            "http://localhost:4200",
+            "http://localhost:4200,http://127.0.0.1:8000",
+            "http://[::1]:4200",
+            "http://0.0.0.0:4200",
+            "http://localhost:4200, http://127.0.0.1:8000 ",  # whitespace tolerated
+        ],
+    )
+    def test_accepts_localhost_origins(
+        self, clean_skip_auth_env, monkeypatch, origins
+    ):
+        monkeypatch.setenv("SKIP_AUTH", "true")
+        monkeypatch.setenv("CORS_ORIGINS", origins)
+        m = _import_main_module()
+        m._validate_skip_auth_or_raise()  # no raise
+
+    def test_rejects_empty_cors_origins(self, clean_skip_auth_env, monkeypatch):
+        monkeypatch.setenv("SKIP_AUTH", "true")
+        monkeypatch.setenv("CORS_ORIGINS", "")
+        m = _import_main_module()
+        with pytest.raises(RuntimeError, match="localhost"):
+            m._validate_skip_auth_or_raise()
+
+    def test_rejects_unset_cors_origins(self, clean_skip_auth_env, monkeypatch):
+        monkeypatch.setenv("SKIP_AUTH", "true")
+        # CORS_ORIGINS deliberately unset
+        m = _import_main_module()
+        with pytest.raises(RuntimeError, match="localhost"):
+            m._validate_skip_auth_or_raise()
+
+    @pytest.mark.parametrize(
+        "origins",
+        [
+            "https://app.example.com",
+            "http://localhost:4200,https://app.example.com",  # one bad apple
+            "https://prod.boisestate.edu",
+        ],
+    )
+    def test_rejects_non_localhost(
+        self, clean_skip_auth_env, monkeypatch, origins
+    ):
+        monkeypatch.setenv("SKIP_AUTH", "true")
+        monkeypatch.setenv("CORS_ORIGINS", origins)
+        m = _import_main_module()
+        with pytest.raises(RuntimeError, match="localhost"):
+            m._validate_skip_auth_or_raise()
+
+
+# ---------------------------------------------------------------------------
+# CI-guard regex
+# ---------------------------------------------------------------------------
+
+
+class TestCIGuardPattern:
+    """Tests that mirror the grep pattern in skip-auth-guard.yml.
+
+    The CI workflow uses `grep -E` with this pattern; we validate the
+    same regex against representative leak strings and the legitimate
+    references in our own source so a future refactor of the workflow
+    has a behavioral spec to test against.
+    """
+
+    # Mirrors the PATTERN in .github/workflows/skip-auth-guard.yml
+    PATTERN = re.compile(r"""SKIP_AUTH[ \t]*[:=][ \t]*["']*true""")
+
+    @pytest.mark.parametrize(
+        "leak",
+        [
+            "SKIP_AUTH=true",
+            'SKIP_AUTH: "true"',
+            "SKIP_AUTH: true",
+            "SKIP_AUTH:true",
+            "SKIP_AUTH: 'true'",
+            "  SKIP_AUTH = true",
+            'SKIP_AUTH="true"',
+            "ENV SKIP_AUTH=true",  # Dockerfile
+            "      SKIP_AUTH: 'true'  # in some yaml",
+        ],
+    )
+    def test_matches_leak_strings(self, leak):
+        assert self.PATTERN.search(leak) is not None, f"missed leak: {leak!r}"
+
+    @pytest.mark.parametrize(
+        "benign",
+        [
+            'SKIP_AUTH = "false"',
+            "SKIP_AUTH=false",
+            "# Document SKIP_AUTH behaviour",
+            'os.environ.get("SKIP_AUTH", "")',
+            'if os.environ.get("SKIP_AUTH", "").lower() == "true":',
+        ],
+    )
+    def test_skips_benign_strings(self, benign):
+        # The legitimate dependencies.py / main.py references compare against
+        # "true" but don't *assign* SKIP_AUTH=true, so they shouldn't match.
+        # The one exception is the inline comparison string "== \"true\"",
+        # which the workflow excludes via path-based filtering, not regex.
+        # We only assert the pattern itself doesn't trip on these forms.
+        assert self.PATTERN.search(benign) is None, f"false positive: {benign!r}"
diff --git a/backend/tests/conftest.py b/backend/tests/conftest.py
index 5be3a462..1bb58882 100644
--- a/backend/tests/conftest.py
+++ b/backend/tests/conftest.py
@@ -4,10 +4,26 @@
 import sys
 from pathlib import Path
 
+import pytest
+
 # Ensure AWS region is set so that module-level boto3 calls don't fail
 # during import (e.g. agents.main_agent.quota -> boto3.resource('dynamodb'))
 os.environ.setdefault("AWS_DEFAULT_REGION", "us-east-1")
 
+# botocore >= 1.43 accesses Credentials.account_id during endpoint
+# construction. On a RefreshableCredentials object (e.g. resolved from a
+# real SSO profile) that property forces a credential _refresh() →
+# GetRoleCredentials, which moto does not implement, so mocked AWS calls
+# fail. Pin static dummy credentials so the chain builds a non-refreshable
+# Credentials object instead. The matching AWS_PROFILE scrub is done
+# per-test below (a process-wide pop here is not enough: tests that reload
+# `apis.app_api.main` run load_dotenv(override=True), which re-injects
+# AWS_PROFILE from backend/src/.env mid-suite). Mirrors moto's practice.
+os.environ.setdefault("AWS_ACCESS_KEY_ID", "testing")
+os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "testing")
+os.environ.setdefault("AWS_SESSION_TOKEN", "testing")
+os.environ.setdefault("AWS_SECURITY_TOKEN", "testing")
+
 # Add backend/src to Python path for imports
 # This file is in backend/tests/, so we need to go up one level to backend/
 BACKEND_DIR = Path(__file__).parent.parent
@@ -16,3 +32,89 @@
 if str(SRC_DIR) not in sys.path:
     sys.path.insert(0, str(SRC_DIR))
 
+
+# Scrub SKIP_AUTH bleed from local .env. Some tests reload
+# `apis.app_api.main`, which calls `load_dotenv(override=True)` and
+# clobbers process env with whatever `backend/src/.env` has set —
+# typically `SKIP_AUTH=true` for local dev. Without this fixture every
+# auth-aware test downstream of that reload returns the fake bypass
+# user. Tests that need SKIP_AUTH on can still set it via monkeypatch
+# (test-local setenv runs after this autouse delenv).
+#
+# Manages os.environ directly rather than depending on monkeypatch so
+# this autouse fixture doesn't perturb fixture-teardown ordering for
+# tests that already use monkeypatch + their own autouse fixtures
+# (e.g. tests/apis/app_api/test_connectors_routes.py).
+_SKIP_AUTH_ENV_KEYS = (
+    "SKIP_AUTH",
+    "SKIP_AUTH_ROLES",
+    "SKIP_AUTH_USER_ID",
+    "SKIP_AUTH_EMAIL",
+)
+
+
+@pytest.fixture(autouse=True)
+def _clear_skip_auth_env():
+    saved = {k: os.environ.pop(k, None) for k in _SKIP_AUTH_ENV_KEYS}
+    try:
+        yield
+    finally:
+        for k, v in saved.items():
+            if v is None:
+                os.environ.pop(k, None)
+            else:
+                os.environ[k] = v
+
+
+# Same load_dotenv(override=True) bleed as above, but for AWS_PROFILE.
+# backend/src/.env sets a real SSO profile for local dev; once a test
+# reloads `apis.app_api.main` it lands in process env and every later
+# test that builds a boto3 client resolves SSO credentials. Under
+# botocore >= 1.43 that fails all mocked AWS calls (see import-time note).
+# Scrub per-test so the static dummy credentials win the provider chain.
+_AWS_PROFILE_ENV_KEYS = (
+    "AWS_PROFILE",
+    "AWS_DEFAULT_PROFILE",
+)
+
+
+@pytest.fixture(autouse=True)
+def _clear_aws_profile_env():
+    saved = {k: os.environ.pop(k, None) for k in _AWS_PROFILE_ENV_KEYS}
+    try:
+        yield
+    finally:
+        for k, v in saved.items():
+            if v is None:
+                os.environ.pop(k, None)
+            else:
+                os.environ[k] = v
+
+
+# Same load_dotenv(override=True) bleed again, for the infra-resource
+# config families. backend/src/.env sets real DYNAMODB_*_TABLE_NAME and
+# COGNITO_* identifiers for local dev; once a test reloads
+# `apis.app_api.main` they land in process env. Repositories/services gate
+# their "configured" flag on `param or os.getenv("DYNAMODB_..."/"COGNITO_...")`,
+# so a leaked value makes "disabled when unconfigured" tests construct a
+# live client and attempt real AWS calls. Tests always inject their own
+# resource names via moto fixtures, so scrub the whole family per-test.
+_ENV_CONFIG_BLEED_PREFIXES = (
+    "DYNAMODB_",
+    "COGNITO_",
+)
+
+
+@pytest.fixture(autouse=True)
+def _clear_env_config_bleed():
+    saved = {
+        k: os.environ.pop(k)
+        for k in list(os.environ)
+        if k.startswith(_ENV_CONFIG_BLEED_PREFIXES)
+    }
+    try:
+        yield
+    finally:
+        for k, v in saved.items():
+            os.environ[k] = v
+
diff --git a/backend/tests/costs/test_calculator.py b/backend/tests/costs/test_calculator.py
new file mode 100644
index 00000000..0d6e8120
--- /dev/null
+++ b/backend/tests/costs/test_calculator.py
@@ -0,0 +1,282 @@
+"""Unit tests for CostCalculator — the source-of-truth for all USD math.
+
+These tests pin the per-bucket pricing formula, the cache-savings derivation,
+and the input-validation predicates. The aggregator and storage tests cover
+this code transitively, but only through mocks; this module is the only
+place the math itself is asserted directly.
+
+Conventions for cases:
+  - "Sonnet 4.5 pricing" reflects Bedrock's published rates so a regression
+    in the formula would be visible in dollar terms a reader can sanity-check.
+  - Floats are compared with ``pytest.approx`` to avoid 1e-15 drift.
+"""
+
+import pytest
+
+from apis.shared.costs.calculator import CostCalculator
+from apis.shared.costs.models import CostBreakdown
+
+
+# Bedrock rates for Claude Sonnet 4.5 ($/Mtok). Used as the "realistic"
+# baseline so dollar amounts in tests can be compared to a published source.
+SONNET_45_PRICING = {
+    "inputPricePerMtok": 3.0,
+    "outputPricePerMtok": 15.0,
+    "cacheWritePricePerMtok": 3.75,
+    "cacheReadPricePerMtok": 0.30,
+}
+
+
+class TestCalculateMessageCostBasic:
+    """Core formula: per-bucket pricing summed into total."""
+
+    def test_input_only(self):
+        usage = {"inputTokens": 1_000_000, "outputTokens": 0}
+        total, breakdown = CostCalculator.calculate_message_cost(usage, SONNET_45_PRICING)
+        assert total == pytest.approx(3.0)
+        assert breakdown.input_cost == pytest.approx(3.0)
+        assert breakdown.output_cost == 0.0
+        assert breakdown.cache_read_cost == 0.0
+        assert breakdown.cache_write_cost == 0.0
+
+    def test_output_only(self):
+        usage = {"inputTokens": 0, "outputTokens": 1_000_000}
+        total, breakdown = CostCalculator.calculate_message_cost(usage, SONNET_45_PRICING)
+        assert total == pytest.approx(15.0)
+        assert breakdown.output_cost == pytest.approx(15.0)
+        assert breakdown.input_cost == 0.0
+
+    def test_input_and_output_no_cache(self):
+        """Realistic short turn: 1k input + 500 output on Sonnet 4.5."""
+        usage = {"inputTokens": 1_000, "outputTokens": 500}
+        total, breakdown = CostCalculator.calculate_message_cost(usage, SONNET_45_PRICING)
+        # 1000/1M * 3.00 + 500/1M * 15.00 = 0.003 + 0.0075 = 0.0105
+        assert total == pytest.approx(0.0105)
+        assert breakdown.input_cost == pytest.approx(0.003)
+        assert breakdown.output_cost == pytest.approx(0.0075)
+
+    def test_breakdown_components_sum_to_total(self):
+        """The total in the breakdown must equal the sum of its parts."""
+        usage = {
+            "inputTokens": 1_234,
+            "outputTokens": 567,
+            "cacheReadInputTokens": 8_910,
+            "cacheWriteInputTokens": 2_345,
+        }
+        total, breakdown = CostCalculator.calculate_message_cost(usage, SONNET_45_PRICING)
+        component_sum = (
+            breakdown.input_cost
+            + breakdown.output_cost
+            + breakdown.cache_read_cost
+            + breakdown.cache_write_cost
+        )
+        assert breakdown.total_cost == pytest.approx(component_sum)
+        assert total == pytest.approx(component_sum)
+
+
+class TestCalculateMessageCostWithCache:
+    """Cache buckets price separately and add to the total."""
+
+    def test_cache_read_only(self):
+        """A subsequent turn hitting the prompt cache."""
+        usage = {
+            "inputTokens": 100,            # uncached suffix
+            "outputTokens": 200,
+            "cacheReadInputTokens": 5_000, # cached prefix
+            "cacheWriteInputTokens": 0,
+        }
+        total, breakdown = CostCalculator.calculate_message_cost(usage, SONNET_45_PRICING)
+        # input: 100/1M * 3 = 0.0003
+        # output: 200/1M * 15 = 0.003
+        # cache_read: 5000/1M * 0.30 = 0.0015
+        assert breakdown.input_cost == pytest.approx(0.0003)
+        assert breakdown.output_cost == pytest.approx(0.003)
+        assert breakdown.cache_read_cost == pytest.approx(0.0015)
+        assert breakdown.cache_write_cost == 0.0
+        assert total == pytest.approx(0.0048)
+
+    def test_cache_write_only(self):
+        """The first turn that establishes the cache pays the write premium."""
+        usage = {
+            "inputTokens": 0,
+            "outputTokens": 100,
+            "cacheReadInputTokens": 0,
+            "cacheWriteInputTokens": 5_000,
+        }
+        total, breakdown = CostCalculator.calculate_message_cost(usage, SONNET_45_PRICING)
+        # cache_write: 5000/1M * 3.75 = 0.01875
+        # output: 100/1M * 15 = 0.0015
+        assert breakdown.cache_write_cost == pytest.approx(0.01875)
+        assert breakdown.output_cost == pytest.approx(0.0015)
+        assert breakdown.cache_read_cost == 0.0
+        assert total == pytest.approx(0.02025)
+
+    def test_cache_read_and_write_mixed(self):
+        """A turn that hits part of the cache and writes a new section."""
+        usage = {
+            "inputTokens": 200,
+            "outputTokens": 300,
+            "cacheReadInputTokens": 10_000,
+            "cacheWriteInputTokens": 2_000,
+        }
+        total, breakdown = CostCalculator.calculate_message_cost(usage, SONNET_45_PRICING)
+        assert breakdown.input_cost == pytest.approx(200 / 1_000_000 * 3.0)
+        assert breakdown.output_cost == pytest.approx(300 / 1_000_000 * 15.0)
+        assert breakdown.cache_read_cost == pytest.approx(10_000 / 1_000_000 * 0.30)
+        assert breakdown.cache_write_cost == pytest.approx(2_000 / 1_000_000 * 3.75)
+        assert total == pytest.approx(
+            breakdown.input_cost
+            + breakdown.output_cost
+            + breakdown.cache_read_cost
+            + breakdown.cache_write_cost
+        )
+
+    def test_docstring_example_holds(self):
+        """The docstring example must match the implementation."""
+        usage = {
+            "inputTokens": 1_000,
+            "outputTokens": 500,
+            "cacheReadInputTokens": 200,
+            "cacheWriteInputTokens": 100,
+        }
+        total, breakdown = CostCalculator.calculate_message_cost(usage, SONNET_45_PRICING)
+        assert breakdown.input_cost == pytest.approx(0.003)
+        assert breakdown.output_cost == pytest.approx(0.0075)
+        assert breakdown.cache_read_cost == pytest.approx(0.00006)
+        assert breakdown.cache_write_cost == pytest.approx(0.000375)
+        assert total == pytest.approx(0.010935)
+
+
+class TestCalculateMessageCostDefensive:
+    """Missing or None fields should degrade to 0, never raise."""
+
+    def test_missing_pricing_fields_default_to_zero(self):
+        """Cache prices may be absent for non-Bedrock providers."""
+        pricing = {"inputPricePerMtok": 1.0, "outputPricePerMtok": 2.0}
+        usage = {
+            "inputTokens": 1_000_000,
+            "outputTokens": 1_000_000,
+            "cacheReadInputTokens": 1_000_000,
+            "cacheWriteInputTokens": 1_000_000,
+        }
+        total, breakdown = CostCalculator.calculate_message_cost(usage, pricing)
+        assert breakdown.input_cost == pytest.approx(1.0)
+        assert breakdown.output_cost == pytest.approx(2.0)
+        assert breakdown.cache_read_cost == 0.0
+        assert breakdown.cache_write_cost == 0.0
+        assert total == pytest.approx(3.0)
+
+    def test_none_pricing_values_default_to_zero(self):
+        """A managed-model row with explicit None for cache prices must not raise."""
+        pricing = {
+            "inputPricePerMtok": 3.0,
+            "outputPricePerMtok": 15.0,
+            "cacheReadPricePerMtok": None,
+            "cacheWritePricePerMtok": None,
+        }
+        usage = {
+            "inputTokens": 1_000,
+            "outputTokens": 500,
+            "cacheReadInputTokens": 1_000,
+            "cacheWriteInputTokens": 1_000,
+        }
+        total, breakdown = CostCalculator.calculate_message_cost(usage, pricing)
+        assert breakdown.cache_read_cost == 0.0
+        assert breakdown.cache_write_cost == 0.0
+
+    def test_none_usage_values_default_to_zero(self):
+        usage = {
+            "inputTokens": None,
+            "outputTokens": None,
+            "cacheReadInputTokens": None,
+            "cacheWriteInputTokens": None,
+        }
+        total, breakdown = CostCalculator.calculate_message_cost(usage, SONNET_45_PRICING)
+        assert total == 0.0
+        assert breakdown.input_cost == 0.0
+        assert breakdown.output_cost == 0.0
+        assert breakdown.cache_read_cost == 0.0
+        assert breakdown.cache_write_cost == 0.0
+
+    def test_empty_usage_and_pricing(self):
+        total, breakdown = CostCalculator.calculate_message_cost({}, {})
+        assert total == 0.0
+        assert isinstance(breakdown, CostBreakdown)
+
+
+class TestCalculateCacheSavings:
+    """Cache savings = (input_price - cache_read_price) * read_tokens / 1M."""
+
+    def test_typical_savings(self):
+        """200 read tokens at Sonnet 4.5 rates."""
+        savings = CostCalculator.calculate_cache_savings(200, 3.0, 0.30)
+        # standard: 200/1M * 3 = 0.0006; cached: 200/1M * 0.30 = 0.00006
+        assert savings == pytest.approx(0.00054)
+
+    def test_zero_reads_returns_zero(self):
+        assert CostCalculator.calculate_cache_savings(0, 3.0, 0.30) == 0.0
+
+    def test_none_reads_returns_zero(self):
+        """``None`` is the realistic shape from a model that didn't hit cache."""
+        assert CostCalculator.calculate_cache_savings(None, 3.0, 0.30) == 0.0
+
+    def test_none_prices_default_to_zero(self):
+        """None prices must not raise — the formula collapses cleanly to 0."""
+        assert CostCalculator.calculate_cache_savings(1_000, None, None) == 0.0
+
+    def test_savings_equals_full_input_cost_when_cache_is_free(self):
+        """If cache reads are priced at 0, savings is the full input cost."""
+        savings = CostCalculator.calculate_cache_savings(1_000_000, 3.0, 0.0)
+        assert savings == pytest.approx(3.0)
+
+
+class TestValidatePricing:
+    """validate_pricing requires inputPricePerMtok and outputPricePerMtok with non-None values."""
+
+    def test_complete_pricing_is_valid(self):
+        assert CostCalculator.validate_pricing(SONNET_45_PRICING) is True
+
+    def test_minimal_pricing_is_valid(self):
+        """Cache fields are not required."""
+        assert CostCalculator.validate_pricing({
+            "inputPricePerMtok": 1.0,
+            "outputPricePerMtok": 2.0,
+        }) is True
+
+    def test_missing_input_price_is_invalid(self):
+        assert CostCalculator.validate_pricing({"outputPricePerMtok": 2.0}) is False
+
+    def test_missing_output_price_is_invalid(self):
+        assert CostCalculator.validate_pricing({"inputPricePerMtok": 1.0}) is False
+
+    def test_none_value_is_invalid(self):
+        assert CostCalculator.validate_pricing({
+            "inputPricePerMtok": None,
+            "outputPricePerMtok": 2.0,
+        }) is False
+
+
+class TestValidateUsage:
+    """validate_usage requires inputTokens and outputTokens with non-None values."""
+
+    def test_complete_usage_is_valid(self):
+        assert CostCalculator.validate_usage({
+            "inputTokens": 100,
+            "outputTokens": 50,
+        }) is True
+
+    def test_zero_values_are_valid(self):
+        """Zero is a real measurement, not an absence."""
+        assert CostCalculator.validate_usage({
+            "inputTokens": 0,
+            "outputTokens": 0,
+        }) is True
+
+    def test_missing_input_tokens_is_invalid(self):
+        assert CostCalculator.validate_usage({"outputTokens": 50}) is False
+
+    def test_none_value_is_invalid(self):
+        assert CostCalculator.validate_usage({
+            "inputTokens": None,
+            "outputTokens": 50,
+        }) is False
diff --git a/backend/tests/lambdas/test_artifact_render.py b/backend/tests/lambdas/test_artifact_render.py
new file mode 100644
index 00000000..5817c760
--- /dev/null
+++ b/backend/tests/lambdas/test_artifact_render.py
@@ -0,0 +1,550 @@
+"""Tests for the artifact render Lambda.
+
+Two layers:
+  * Token verification matrix — pure stdlib HS256 logic, no AWS.
+  * Handler integration — full request flow against moto-backed
+    Secrets Manager, DynamoDB, and S3.
+"""
+
+from __future__ import annotations
+
+import base64
+import hashlib
+import hmac
+import json
+import time
+from typing import Any
+
+import boto3
+import pytest
+from moto import mock_aws
+
+from lambdas.artifact_render import handler
+
+KEY = "test-signing-key-44-chars-of-entropy-aaaaaaa"
+
+
+def _b64url(data: bytes) -> str:
+    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
+
+
+def _mint(
+    claims: dict[str, Any],
+    *,
+    key: str = KEY,
+    alg: str = "HS256",
+    tamper_sig: bool = False,
+) -> str:
+    header = _b64url(json.dumps({"alg": alg, "typ": "JWT"}).encode())
+    payload = _b64url(json.dumps(claims).encode())
+    signing_input = f"{header}.{payload}".encode("ascii")
+    sig = hmac.new(key.encode(), signing_input, hashlib.sha256).digest()
+    sig_b64 = _b64url(sig)
+    if tamper_sig:
+        sig_b64 = ("A" if sig_b64[0] != "A" else "B") + sig_b64[1:]
+    return f"{header}.{payload}.{sig_b64}"
+
+
+def _valid_claims(**overrides: Any) -> dict[str, Any]:
+    now = int(time.time())
+    base = {
+        "sub": "user-123",
+        "aid": "artifact-abc",
+        "ver": 1,
+        "sid": "session-xyz",
+        "iss": "app-api",
+        "aud": "artifact-render",
+        "iat": now,
+        "exp": now + 90,
+    }
+    base.update(overrides)
+    return base
+
+
+@pytest.fixture(autouse=True)
+def _reset_module_state(monkeypatch: pytest.MonkeyPatch) -> None:
+    """Each test starts from clean module-scoped caches."""
+    monkeypatch.setattr(handler, "_cached_signing_key", None)
+    monkeypatch.setattr(handler, "_secrets_client", None)
+    monkeypatch.setattr(handler, "_s3_client", None)
+    monkeypatch.setattr(handler, "_ddb_table", None)
+
+
+# --------------------------------------------------------------------------
+# Token verification matrix (no AWS — signing key injected directly).
+# --------------------------------------------------------------------------
+
+
+@pytest.fixture
+def _injected_key(monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.setattr(handler, "_cached_signing_key", KEY)
+
+
+def test_valid_token_returns_claims(_injected_key: None) -> None:
+    claims = handler._verify_token(_mint(_valid_claims()))
+    assert claims["sub"] == "user-123"
+    assert claims["aid"] == "artifact-abc"
+    assert claims["ver"] == 1
+
+
+def test_tampered_signature_rejected(_injected_key: None) -> None:
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(_valid_claims(), tamper_sig=True))
+
+
+def test_wrong_signing_key_rejected(_injected_key: None) -> None:
+    forged = _mint(_valid_claims(), key="a-different-key")
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(forged)
+
+
+def test_alg_none_rejected(_injected_key: None) -> None:
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(_valid_claims(), alg="none"))
+
+
+def test_alg_confusion_rejected(_injected_key: None) -> None:
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(_valid_claims(), alg="HS512"))
+
+
+def test_expired_token_rejected(_injected_key: None) -> None:
+    now = int(time.time())
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(_valid_claims(iat=now - 200, exp=now - 100)))
+
+
+def test_expiry_within_leeway_accepted(_injected_key: None) -> None:
+    now = int(time.time())
+    claims = handler._verify_token(_mint(_valid_claims(iat=now - 4, exp=now - 3)))
+    assert claims["sub"] == "user-123"
+
+
+def test_future_iat_rejected(_injected_key: None) -> None:
+    now = int(time.time())
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(_valid_claims(iat=now + 100, exp=now + 200)))
+
+
+def test_overlong_lifetime_rejected(_injected_key: None) -> None:
+    now = int(time.time())
+    over = handler._MAX_TOKEN_LIFETIME_SECONDS + 60
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(_valid_claims(iat=now, exp=now + over)))
+
+
+def test_wrong_issuer_rejected(_injected_key: None) -> None:
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(_valid_claims(iss="evil")))
+
+
+def test_wrong_audience_rejected(_injected_key: None) -> None:
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(_valid_claims(aud="some-other-service")))
+
+
+def test_missing_exp_rejected(_injected_key: None) -> None:
+    claims = _valid_claims()
+    del claims["exp"]
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(claims))
+
+
+def test_missing_iat_rejected(_injected_key: None) -> None:
+    # `iat` is mandatory — without it the lifetime cap can't be enforced.
+    claims = _valid_claims()
+    del claims["iat"]
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(claims))
+
+
+@pytest.mark.parametrize("bad_iat", ["123", True])
+def test_non_numeric_iat_rejected(_injected_key: None, bad_iat: Any) -> None:
+    # A string or bool `iat` must not slip past the numeric guard
+    # (bool is an int subclass).
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(_valid_claims(iat=bad_iat)))
+
+
+@pytest.mark.parametrize("bad_ver", [0, -1, True, "1", 1.0])
+def test_invalid_version_rejected(_injected_key: None, bad_ver: Any) -> None:
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(_valid_claims(ver=bad_ver)))
+
+
+@pytest.mark.parametrize("missing", ["sub", "aid"])
+def test_missing_identity_claim_rejected(_injected_key: None, missing: str) -> None:
+    claims = _valid_claims()
+    del claims[missing]
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(_mint(claims))
+
+
+@pytest.mark.parametrize("token", ["", "a.b", "a.b.c.d", "not-a-token"])
+def test_malformed_token_rejected(_injected_key: None, token: str) -> None:
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(token)
+
+
+def test_non_dict_header_rejected(_injected_key: None) -> None:
+    # Header decodes to a JSON array, not an object.
+    header = _b64url(json.dumps(["HS256"]).encode())
+    payload = _b64url(json.dumps(_valid_claims()).encode())
+    sig = _b64url(
+        hmac.new(
+            KEY.encode(), f"{header}.{payload}".encode("ascii"), hashlib.sha256
+        ).digest()
+    )
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(f"{header}.{payload}.{sig}")
+
+
+def test_non_dict_payload_rejected(_injected_key: None) -> None:
+    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
+    payload = _b64url(json.dumps(["not", "an", "object"]).encode())
+    sig = _b64url(
+        hmac.new(
+            KEY.encode(), f"{header}.{payload}".encode("ascii"), hashlib.sha256
+        ).digest()
+    )
+    with pytest.raises(handler._TokenError):
+        handler._verify_token(f"{header}.{payload}.{sig}")
+
+
+# --------------------------------------------------------------------------
+# Handler integration (moto-backed Secrets Manager + DynamoDB + S3).
+# --------------------------------------------------------------------------
+
+SECRET_ARN_NAME = "test-artifact-render-token-key"
+TABLE = "test-user-artifacts"
+BUCKET = "test-artifacts-content"
+CONTENT_KEY = "user-123/artifact-abc/v1/index.html"
+DOC = "<!doctype html><html><body><h1>hi</h1></body></html>"
+
+
+@pytest.fixture
+def aws_env(monkeypatch: pytest.MonkeyPatch):
+    with mock_aws():
+        sm = boto3.client("secretsmanager", region_name="us-east-1")
+        secret = sm.create_secret(Name=SECRET_ARN_NAME, SecretString=KEY)
+
+        ddb = boto3.client("dynamodb", region_name="us-east-1")
+        ddb.create_table(
+            TableName=TABLE,
+            KeySchema=[
+                {"AttributeName": "PK", "KeyType": "HASH"},
+                {"AttributeName": "SK", "KeyType": "RANGE"},
+            ],
+            AttributeDefinitions=[
+                {"AttributeName": "PK", "AttributeType": "S"},
+                {"AttributeName": "SK", "AttributeType": "S"},
+            ],
+            BillingMode="PAY_PER_REQUEST",
+        )
+
+        s3 = boto3.client("s3", region_name="us-east-1")
+        s3.create_bucket(Bucket=BUCKET)
+        s3.put_object(Bucket=BUCKET, Key=CONTENT_KEY, Body=DOC.encode())
+
+        monkeypatch.setattr(handler, "_RENDER_TOKEN_SECRET_ARN", secret["ARN"])
+        monkeypatch.setattr(handler, "_ARTIFACTS_TABLE", TABLE)
+        monkeypatch.setattr(handler, "_ARTIFACTS_BUCKET", BUCKET)
+        monkeypatch.setattr(handler, "_FRAME_ANCESTOR", "https://app.example.com")
+
+        yield {"ddb": boto3.resource("dynamodb", region_name="us-east-1")}
+
+
+def _put_record(ddb, **overrides: Any) -> None:
+    item = {
+        "PK": "USER#user-123",
+        "SK": "ARTIFACT#artifact-abc#V#00001",
+        "storage": "s3",
+        "content_key": CONTENT_KEY,
+        "content_type": "text/html; charset=utf-8",
+    }
+    item.update(overrides)
+    ddb.Table(TABLE).put_item(Item=item)
+
+
+def _event(
+    token: str | None, method: str = "GET", *, download: bool = False
+) -> dict[str, Any]:
+    qsp: dict[str, str] = {}
+    raw_parts: list[str] = []
+    if token:
+        qsp["t"] = token
+        raw_parts.append(f"t={token}")
+    if download:
+        qsp["download"] = "1"
+        raw_parts.append("download=1")
+    return {
+        "requestContext": {"http": {"method": method}},
+        "queryStringParameters": qsp,
+        "rawQueryString": "&".join(raw_parts),
+    }
+
+
+def test_happy_path_returns_content(aws_env) -> None:
+    _put_record(aws_env["ddb"])
+    resp = handler.handler(_event(_mint(_valid_claims())), None)
+    assert resp["statusCode"] == 200
+    assert resp["body"] == DOC
+    assert resp["headers"]["cache-control"] == "no-store"
+    assert "frame-ancestors https://app.example.com" in (
+        resp["headers"]["content-security-policy"]
+    )
+
+
+def test_secret_fetched_from_secrets_manager(aws_env) -> None:
+    # _cached_signing_key is None (reset fixture) so this exercises the
+    # real Secrets Manager round-trip, not an injected key.
+    _put_record(aws_env["ddb"])
+    resp = handler.handler(_event(_mint(_valid_claims())), None)
+    assert resp["statusCode"] == 200
+
+
+def test_head_request_omits_body(aws_env) -> None:
+    _put_record(aws_env["ddb"])
+    resp = handler.handler(_event(_mint(_valid_claims()), method="HEAD"), None)
+    assert resp["statusCode"] == 200
+    assert resp["body"] == ""
+
+
+def test_token_from_raw_query_string(aws_env) -> None:
+    _put_record(aws_env["ddb"])
+    token = _mint(_valid_claims())
+    event = {
+        "requestContext": {"http": {"method": "GET"}},
+        "queryStringParameters": None,
+        "rawQueryString": f"t={token}",
+    }
+    assert handler.handler(event, None)["statusCode"] == 200
+
+
+def test_missing_token_is_403(aws_env) -> None:
+    resp = handler.handler(_event(None), None)
+    assert resp["statusCode"] == 403
+
+
+def test_tampered_token_is_403(aws_env) -> None:
+    _put_record(aws_env["ddb"])
+    bad = _mint(_valid_claims(), tamper_sig=True)
+    assert handler.handler(_event(bad), None)["statusCode"] == 403
+
+
+def test_non_get_method_is_405(aws_env) -> None:
+    resp = handler.handler(_event(_mint(_valid_claims()), method="POST"), None)
+    assert resp["statusCode"] == 405
+
+
+def test_missing_version_record_is_404(aws_env) -> None:
+    resp = handler.handler(_event(_mint(_valid_claims())), None)
+    assert resp["statusCode"] == 404
+
+
+def test_unsupported_storage_is_500(aws_env) -> None:
+    _put_record(aws_env["ddb"], storage="inline")
+    resp = handler.handler(_event(_mint(_valid_claims())), None)
+    assert resp["statusCode"] == 500
+
+
+def test_record_without_content_key_is_404(aws_env) -> None:
+    ddb = aws_env["ddb"]
+    ddb.Table(TABLE).put_item(
+        Item={
+            "PK": "USER#user-123",
+            "SK": "ARTIFACT#artifact-abc#V#00001",
+            "storage": "s3",
+        }
+    )
+    resp = handler.handler(_event(_mint(_valid_claims())), None)
+    assert resp["statusCode"] == 404
+
+
+def test_missing_s3_object_is_404(aws_env) -> None:
+    _put_record(aws_env["ddb"], content_key="user-123/artifact-abc/v1/gone.html")
+    resp = handler.handler(_event(_mint(_valid_claims())), None)
+    assert resp["statusCode"] == 404
+
+
+def test_version_pins_exact_sk(aws_env) -> None:
+    # Token asks for v2; only v1 exists → must 404, never fall back to HEAD.
+    _put_record(aws_env["ddb"])
+    resp = handler.handler(_event(_mint(_valid_claims(ver=2))), None)
+    assert resp["statusCode"] == 404
+
+
+@pytest.mark.parametrize(
+    "stored,served",
+    [
+        ("text/markdown", "text/html; charset=utf-8"),
+        ("text/markdown; charset=utf-8", "text/html; charset=utf-8"),
+        ("text/x-markdown", "text/html; charset=utf-8"),
+        ("TEXT/MARKDOWN", "text/html; charset=utf-8"),
+        ("text/html; charset=utf-8", "text/html; charset=utf-8"),
+        ("image/svg+xml", "image/svg+xml"),
+        ("application/json", "application/json"),
+    ],
+)
+def test_serve_content_type_mapping(stored: str, served: str) -> None:
+    assert handler._serve_content_type(stored) == served
+
+
+def test_markdown_record_served_as_html(aws_env) -> None:
+    # S3 holds the writer's HTML render wrapper; the row is typed
+    # text/markdown so the SPA card/list stay truthful. The Lambda must
+    # serve the exact bytes but with a text/html HTTP content type.
+    wrapper = "<!doctype html><html><body>rendered md</body></html>"
+    boto3.client("s3", region_name="us-east-1").put_object(
+        Bucket=BUCKET, Key=CONTENT_KEY, Body=wrapper.encode()
+    )
+    _put_record(aws_env["ddb"], content_type="text/markdown; charset=utf-8")
+    resp = handler.handler(_event(_mint(_valid_claims())), None)
+    assert resp["statusCode"] == 200
+    assert resp["headers"]["content-type"] == "text/html; charset=utf-8"
+    assert resp["body"] == wrapper  # bytes are an exact pass-through
+
+
+def test_non_markdown_content_type_served_verbatim(aws_env) -> None:
+    _put_record(aws_env["ddb"], content_type="image/svg+xml")
+    resp = handler.handler(_event(_mint(_valid_claims())), None)
+    assert resp["statusCode"] == 200
+    assert resp["headers"]["content-type"] == "image/svg+xml"
+
+
+def test_oversized_content_is_500(aws_env, monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.setattr(handler, "_MAX_CONTENT_BYTES", 16)
+    boto3.client("s3", region_name="us-east-1").put_object(
+        Bucket=BUCKET, Key=CONTENT_KEY, Body=b"x" * 64
+    )
+    _put_record(aws_env["ddb"])
+    resp = handler.handler(_event(_mint(_valid_claims())), None)
+    assert resp["statusCode"] == 500
+
+
+# --------------------------------------------------------------------------
+# Download mode (`?download=1`): attachment disposition, no CSP, gated by
+# the same token as render.
+# --------------------------------------------------------------------------
+
+
+@pytest.mark.parametrize(
+    "stored,ext",
+    [
+        ("text/html; charset=utf-8", "html"),
+        ("text/markdown", "html"),  # S3 body is the HTML render wrapper
+        ("text/x-markdown", "html"),
+        ("TEXT/MARKDOWN; charset=utf-8", "html"),
+        ("image/svg+xml", "svg"),
+        ("application/json", "json"),
+        ("text/css", "css"),
+        ("application/javascript", "js"),
+        ("text/plain", "txt"),
+        ("application/x-weird", "bin"),
+    ],
+)
+def test_download_extension_mapping(stored: str, ext: str) -> None:
+    assert handler._download_extension(stored) == ext
+
+
+def test_content_disposition_sanitizes_title() -> None:
+    cd = handler._content_disposition("Q3 / Report: <draft>", "html")
+    assert cd.startswith('attachment; filename="')
+    # The ASCII fallback must not carry path/header-hostile characters.
+    fallback = cd.split('filename="', 1)[1].split('"', 1)[0]
+    assert fallback.endswith(".html")
+    for bad in ("/", ":", "<", ">", "\\", "\r", "\n"):
+        assert bad not in fallback
+    # The original title is preserved in the RFC 5987 form.
+    assert "filename*=UTF-8''" in cd
+
+
+def test_content_disposition_defaults_when_title_blank() -> None:
+    # A whitespace-only title is treated as absent: both forms fall back
+    # to "artifact" (never a file literally named "   .json").
+    assert handler._content_disposition("   ", "json") == (
+        "attachment; filename=\"artifact.json\"; "
+        "filename*=UTF-8''artifact.json"
+    )
+
+
+def test_content_disposition_preserves_unicode_title() -> None:
+    cd = handler._content_disposition("résumé", "txt")
+    # Non-ASCII collapses to '_' in the fallback but survives in filename*.
+    assert 'filename="r' in cd
+    assert "filename*=UTF-8''r%C3%A9sum%C3%A9.txt" in cd
+
+
+def test_download_returns_attachment(aws_env) -> None:
+    _put_record(aws_env["ddb"], title="My Page")
+    resp = handler.handler(
+        _event(_mint(_valid_claims()), download=True), None
+    )
+    assert resp["statusCode"] == 200
+    assert resp["body"] == DOC
+    cd = resp["headers"]["content-disposition"]
+    assert cd.startswith("attachment; ")
+    assert 'filename="My Page.html"' in cd
+    assert resp["headers"]["content-type"] == "text/html; charset=utf-8"
+    assert resp["headers"]["x-content-type-options"] == "nosniff"
+    assert resp["headers"]["cache-control"] == "no-store"
+    # An attachment is saved, never framed — no CSP/frame-ancestors.
+    assert "content-security-policy" not in resp["headers"]
+
+
+def test_download_markdown_uses_html_extension(aws_env) -> None:
+    wrapper = "<!doctype html><html><body>rendered md</body></html>"
+    boto3.client("s3", region_name="us-east-1").put_object(
+        Bucket=BUCKET, Key=CONTENT_KEY, Body=wrapper.encode()
+    )
+    _put_record(
+        aws_env["ddb"],
+        content_type="text/markdown; charset=utf-8",
+        title="Notes",
+    )
+    resp = handler.handler(
+        _event(_mint(_valid_claims()), download=True), None
+    )
+    assert resp["statusCode"] == 200
+    assert resp["body"] == wrapper
+    assert resp["headers"]["content-type"] == "text/html; charset=utf-8"
+    assert 'filename="Notes.html"' in resp["headers"]["content-disposition"]
+
+
+def test_download_default_filename_when_title_missing(aws_env) -> None:
+    _put_record(aws_env["ddb"], content_type="application/json")
+    resp = handler.handler(
+        _event(_mint(_valid_claims()), download=True), None
+    )
+    assert 'filename="artifact.json"' in resp["headers"]["content-disposition"]
+
+
+def test_head_download_omits_body_keeps_disposition(aws_env) -> None:
+    _put_record(aws_env["ddb"], title="Doc")
+    resp = handler.handler(
+        _event(_mint(_valid_claims()), method="HEAD", download=True), None
+    )
+    assert resp["statusCode"] == 200
+    assert resp["body"] == ""
+    assert resp["headers"]["content-disposition"].startswith("attachment; ")
+
+
+def test_download_still_requires_valid_token(aws_env) -> None:
+    _put_record(aws_env["ddb"])
+    bad = _mint(_valid_claims(), tamper_sig=True)
+    resp = handler.handler(_event(bad, download=True), None)
+    assert resp["statusCode"] == 403
+    assert "content-disposition" not in resp["headers"]
+
+
+def test_download_flag_from_raw_query_string(aws_env) -> None:
+    _put_record(aws_env["ddb"], title="Doc")
+    token = _mint(_valid_claims())
+    event = {
+        "requestContext": {"http": {"method": "GET"}},
+        "queryStringParameters": None,
+        "rawQueryString": f"t={token}&download=1",
+    }
+    resp = handler.handler(event, None)
+    assert resp["statusCode"] == 200
+    assert resp["headers"]["content-disposition"].startswith("attachment; ")
diff --git a/backend/tests/routes/conftest.py b/backend/tests/routes/conftest.py
index 850b2b4d..b023dc3d 100644
--- a/backend/tests/routes/conftest.py
+++ b/backend/tests/routes/conftest.py
@@ -25,7 +25,7 @@
 from fastapi import FastAPI, HTTPException, status
 from fastapi.testclient import TestClient
 
-from apis.shared.auth.dependencies import get_current_user, get_current_user_from_session
+from apis.shared.auth.dependencies import get_current_user_from_session
 from apis.shared.auth.models import User
 
 
@@ -96,26 +96,16 @@ def mock_auth_user(app: FastAPI, user: User) -> None:
 
     Requirement 1.1: authenticated TestClient with Auth_Dependency overridden.
 
-    Overrides BOTH `get_current_user` (Bearer-only, retained for the
-    `/chat/agent-stream` external-caller route) and
-    `get_current_user_from_session` (cookie auth — the SPA-facing
-    surface) so routes can be exercised regardless of which dep they
-    pull in. Without the cookie override they'd hit the real session
-    resolution path and 401.
+    Overrides `get_current_user_from_session` (cookie auth — the only
+    user-facing auth dependency in `app_api/` after the BFF migration).
     """
-    app.dependency_overrides[get_current_user] = lambda: user
     app.dependency_overrides[get_current_user_from_session] = lambda: user
 
 
 def mock_no_auth(app: FastAPI) -> None:
-    """Override the auth dependencies to raise HTTP 401.
+    """Override the auth dependency to raise HTTP 401.
 
     Requirement 1.2: unauthenticated TestClient behaviour.
-
-    Both Bearer (`get_current_user`) and cookie
-    (`get_current_user_from_session`) dependencies are overridden so the
-    "no auth provided" assertion holds regardless of which dep the route
-    uses.
     """
 
     def _raise_401():
@@ -124,7 +114,6 @@ def _raise_401():
             detail="Not authenticated",
         )
 
-    app.dependency_overrides[get_current_user] = _raise_401
     app.dependency_overrides[get_current_user_from_session] = _raise_401
 
 
diff --git a/backend/tests/routes/test_chat.py b/backend/tests/routes/test_chat.py
index 54c6a444..f8d3fa0e 100644
--- a/backend/tests/routes/test_chat.py
+++ b/backend/tests/routes/test_chat.py
@@ -3,20 +3,16 @@
 Endpoints under test:
 - POST /chat/generate-title  → 200 with generated title
 - POST /chat/generate-title  → 401 for unauthenticated request
-- POST /chat/agent-stream           → streaming response with text/event-stream
-- POST /chat/multimodal       → streaming response
 
-Requirements: 5.1, 5.2, 5.3, 5.4
+Requirements: 5.1, 5.2
 """
 
-from unittest.mock import AsyncMock, MagicMock, patch
+from unittest.mock import AsyncMock, patch
 
 import pytest
 from fastapi import FastAPI
-from fastapi.testclient import TestClient
 
 from apis.app_api.chat.routes import router
-from tests.routes.conftest import mock_auth_user, mock_no_auth
 
 
 # ---------------------------------------------------------------------------
@@ -92,124 +88,3 @@ def test_returns_401_for_unauthenticated(self, app, unauthenticated_client):
             json={"session_id": "sess-001", "input": "Hello"},
         )
         assert resp.status_code == 401
-
-
-# ---------------------------------------------------------------------------
-# Requirement 5.3: POST /chat/agent-stream returns streaming response
-# ---------------------------------------------------------------------------
-
-
-class TestChatStream:
-    """POST /chat/agent-stream returns a streaming response."""
-
-    def test_returns_streaming_response(self, app, make_user, authenticated_client):
-        """Req 5.3: Should return streaming response with text/event-stream."""
-        user = make_user()
-        client = authenticated_client(app, user)
-
-        # Mock the agent returned by get_agent
-        mock_agent = MagicMock()
-
-        async def fake_stream(*args, **kwargs):
-            yield 'event: message_start\ndata: {"role": "assistant"}\n\n'
-            yield "event: done\ndata: {}\n\n"
-
-        mock_agent.stream_async = fake_stream
-        mock_agent.session_manager = MagicMock()
-        mock_agent.session_manager.flush = MagicMock()
-
-        with patch(
-            "apis.app_api.chat.routes.get_agent",
-            return_value=mock_agent,
-        ), patch(
-            "apis.app_api.chat.routes.get_tool_access_service",
-        ) as mock_tool_svc, patch(
-            "apis.app_api.chat.routes.is_quota_enforcement_enabled",
-            return_value=False,
-        ), patch(
-            "apis.app_api.chat.routes.get_session_metadata",
-            new_callable=AsyncMock,
-            return_value=None,
-        ):
-            mock_tool_access = AsyncMock()
-            mock_tool_access.check_access_and_filter = AsyncMock(
-                return_value=(["tool1"], [])
-            )
-            mock_tool_svc.return_value = mock_tool_access
-
-            resp = client.post(
-                "/chat/agent-stream",
-                json={
-                    "session_id": "sess-001",
-                    "message": "Hello, how are you?",
-                },
-            )
-
-        assert resp.status_code == 200
-        assert "text/event-stream" in resp.headers["content-type"]
-
-    def test_returns_401_for_unauthenticated(self, app, unauthenticated_client):
-        """Req 5.3: Should return 401 when no auth is provided."""
-        client = unauthenticated_client(app)
-        resp = client.post(
-            "/chat/agent-stream",
-            json={"session_id": "sess-001", "message": "Hello"},
-        )
-        assert resp.status_code == 401
-
-
-# ---------------------------------------------------------------------------
-# Requirement 5.4: POST /chat/multimodal returns streaming response
-# ---------------------------------------------------------------------------
-
-
-class TestChatMultimodal:
-    """POST /chat/multimodal returns a streaming response."""
-
-    def test_returns_streaming_response(self, app, make_user, authenticated_client):
-        """Req 5.4: Should return streaming response for multimodal input."""
-        user = make_user()
-        client = authenticated_client(app, user)
-
-        resp = client.post(
-            "/chat/multimodal",
-            json={
-                "session_id": "sess-001",
-                "message": "Describe this image",
-                "files": [
-                    {
-                        "filename": "test.png",
-                        "content_type": "image/png",
-                        "bytes": "aGVsbG8=",
-                    }
-                ],
-            },
-        )
-
-        assert resp.status_code == 200
-        assert "text/event-stream" in resp.headers["content-type"]
-
-    def test_returns_streaming_response_without_files(self, app, make_user, authenticated_client):
-        """Req 5.4: Should return streaming response even without files."""
-        user = make_user()
-        client = authenticated_client(app, user)
-
-        resp = client.post(
-            "/chat/multimodal",
-            json={
-                "session_id": "sess-001",
-                "message": "Just a text message",
-            },
-        )
-
-        assert resp.status_code == 200
-        assert "text/event-stream" in resp.headers["content-type"]
-
-    def test_returns_401_for_unauthenticated(self, app, unauthenticated_client):
-        """Req 5.4: Should return 401 when no auth is provided."""
-        client = unauthenticated_client(app)
-        resp = client.post(
-            "/chat/multimodal",
-            json={"session_id": "sess-001", "message": "Hello"},
-        )
-        assert resp.status_code == 401
diff --git a/backend/tests/routes/test_inference.py b/backend/tests/routes/test_inference.py
index 1e08ef0b..c0daf0c6 100644
--- a/backend/tests/routes/test_inference.py
+++ b/backend/tests/routes/test_inference.py
@@ -65,11 +65,17 @@ def test_ping_returns_200(self, app):
         assert resp.status_code == 200
 
     def test_ping_response_contains_status(self, app):
-        """Req 15.1: /ping response should contain status field."""
+        """Req 15.1: /ping returns the AgentCore health contract.
+
+        Status must be a valid AgentCore PingStatus value, and the response
+        must carry an integer ``time_of_last_update``; without that field the
+        platform idle-reaps the microVM mid-stream
+        (bedrock-agentcore-sdk-python#471).
+        """
         client = TestClient(app)
         body = client.get("/ping").json()
-        assert "status" in body
-        assert body["status"] == "healthy"
+        assert body["status"] in {"Healthy", "HealthyBusy"}
+        assert isinstance(body["time_of_last_update"], int)
 
 
 # ---------------------------------------------------------------------------
diff --git a/backend/tests/routes/test_pbt_auth_sweep.py b/backend/tests/routes/test_pbt_auth_sweep.py
index f34247b0..2f966221 100644
--- a/backend/tests/routes/test_pbt_auth_sweep.py
+++ b/backend/tests/routes/test_pbt_auth_sweep.py
@@ -18,7 +18,7 @@
 from hypothesis import given, settings, HealthCheck
 from hypothesis import strategies as st
 
-from apis.shared.auth.dependencies import get_current_user, get_current_user_from_session
+from apis.shared.auth.dependencies import get_current_user_from_session
 from apis.shared.auth.models import User
 from apis.shared.auth.rbac import require_admin
 
@@ -157,16 +157,13 @@ def test_non_admin_roles_get_403(self, roles):
         app = FastAPI()
         app.include_router(admin_router)
 
-        # Override get_current_user to return a user with the generated roles
+        # Override the session dependency to return a user with the generated roles
         user = User(
             email="prop4@example.com",
             user_id="prop4-user",
             name="Property 4 User",
             roles=roles,
         )
-        # `require_admin` is cookie-only since Phase 7; the Bearer override
-        # is kept too for any test that still relies on it.
-        app.dependency_overrides[get_current_user] = lambda: user
         app.dependency_overrides[get_current_user_from_session] = lambda: user
 
         # Mock AppRoleService to return no admin AppRoles (simulates
@@ -231,7 +228,7 @@ def test_unauthenticated_request_returns_401(self, method, path):
         )
 
         # Clean up the override so it doesn't leak to other tests
-        app.dependency_overrides.pop(get_current_user, None)
+        app.dependency_overrides.pop(get_current_user_from_session, None)
 
     def test_health_endpoint_accessible_without_auth(self):
         """Requirement 17.3: Health endpoint remains accessible without auth.
@@ -240,7 +237,7 @@ def test_health_endpoint_accessible_without_auth(self):
         """
         app = _FULL_APP
         # Ensure no auth override is set
-        app.dependency_overrides.pop(get_current_user, None)
+        app.dependency_overrides.pop(get_current_user_from_session, None)
 
         client = TestClient(app, raise_server_exceptions=False)
         resp = client.get("/health")
diff --git a/backend/tests/shared/test_managed_models.py b/backend/tests/shared/test_managed_models.py
index 1064d8bd..f4144a13 100644
--- a/backend/tests/shared/test_managed_models.py
+++ b/backend/tests/shared/test_managed_models.py
@@ -92,3 +92,89 @@ async def test_supports_caching_default_openai(self):
         from apis.shared.models.managed_models import create_managed_model
         model = await create_managed_model(_make_model_data("gpt4", provider="openai"))
         assert model.supports_caching is False
+
+
+class TestMaxTokensCeiling:
+    """max_tokens spec must not exceed the model's declared output ceiling."""
+
+    def test_default_above_ceiling_rejected(self):
+        # Default 8192 passes the row's own bounds check (no bounds are set) but
+        # exceeds the model's 4096 ceiling, so only the cross-field rule should fire.
+        with pytest.raises(Exception):
+            _make_model_data(
+                maxOutputTokens=4096,
+                supportedParams={"params": {"max_tokens": {"supported": True, "default": 8192}}},
+            )
+
+    def test_max_above_ceiling_rejected(self):
+        with pytest.raises(Exception):
+            _make_model_data(
+                maxOutputTokens=4096,
+                supportedParams={"params": {"max_tokens": {"supported": True, "max": 8192}}},
+            )
+
+    def test_within_ceiling_ok(self):
+        m = _make_model_data(
+            maxOutputTokens=8192,
+            supportedParams={"params": {"max_tokens": {"supported": True, "max": 8192, "default": 8192}}},
+        )
+        assert m.max_output_tokens == 8192
+
+    def test_unsupported_row_not_ceiling_checked(self):
+        m = _make_model_data(
+            maxOutputTokens=4096,
+            supportedParams={"params": {"max_tokens": {"supported": False, "max": 999999, "default": 999999}}},
+        )
+        assert m.max_output_tokens == 4096
+
+    def test_update_payload_enforced(self):
+        from apis.shared.models.models import ManagedModelUpdate
+        with pytest.raises(Exception):
+            ManagedModelUpdate(
+                maxOutputTokens=4096,
+                supportedParams={"params": {"max_tokens": {"supported": True, "default": 8192}}},
+            )
+
+
+class TestEffortAllowed:
+    """Enum params carry an `allowed` set; `default` must be a member.
+
+    This is the per-model representation of the effort-tier difference
+    (Sonnet 4.6 vs Opus 4.7) — data, not model-family branching in code.
+    """
+
+    def test_default_in_allowed_ok(self):
+        m = _make_model_data(
+            supportedParams={"params": {"effort": {
+                "supported": True, "allowed": ["low", "medium", "high"], "default": "high",
+            }}},
+        )
+        spec = m.supported_params.params["effort"]
+        assert spec.allowed == ["low", "medium", "high"]
+        assert spec.default == "high"
+
+    def test_default_not_in_allowed_rejected(self):
+        with pytest.raises(Exception):
+            _make_model_data(
+                supportedParams={"params": {"effort": {
+                    "supported": True, "allowed": ["low", "medium", "high"], "default": "xhigh",
+                }}},
+            )
+
+    def test_empty_allowed_rejected(self):
+        with pytest.raises(Exception):
+            _make_model_data(
+                supportedParams={"params": {"effort": {
+                    "supported": True, "allowed": [], "default": None,
+                }}},
+            )
+
+    def test_allowed_without_default_ok(self):
+        # Omitting `default` is valid: the runtime sends nothing and the model
+        # falls back to its own API default (effort "high").
+        m = _make_model_data(
+            supportedParams={"params": {"effort": {
+                "supported": True, "allowed": ["low", "medium", "high", "xhigh", "max"],
+            }}},
+        )
+        assert m.supported_params.params["effort"].default is None
diff --git a/backend/tests/shared/test_mcp_apps_broker.py b/backend/tests/shared/test_mcp_apps_broker.py
new file mode 100644
index 00000000..e4e8ef03
--- /dev/null
+++ b/backend/tests/shared/test_mcp_apps_broker.py
@@ -0,0 +1,102 @@
+"""Tests for the per-conversation app-tool event broker (MCP Apps PR #5)."""
+
+import asyncio
+
+import pytest
+
+from apis.shared.mcp_apps.broker import (
+    AppToolEventBroker,
+    get_app_tool_event_broker,
+)
+
+
+def _ev(tag: str) -> dict:
+    return {"type": "tool_use", "data": {"tag": tag}}
+
+
+def test_singleton_accessor():
+    assert get_app_tool_event_broker() is get_app_tool_event_broker()
+
+
+def test_publish_to_active_subscriber_is_live():
+    b = AppToolEventBroker()
+    q = b.add_subscriber("s1")
+    b.publish("s1", _ev("a"))
+    b.publish("s1", _ev("b"))
+    assert [e["data"]["tag"] for e in b.drain(q)] == ["a", "b"]
+    assert b.drain(q) == []
+    b.remove_subscriber("s1", q)
+
+
+def test_publish_with_no_subscriber_buffers_then_flushes_on_subscribe():
+    b = AppToolEventBroker()
+    # No active stream — buffered.
+    b.publish("s1", _ev("early"))
+    q = b.add_subscriber("s1")
+    # The next stream to open drains what it missed.
+    assert [e["data"]["tag"] for e in b.drain(q)] == ["early"]
+    b.remove_subscriber("s1", q)
+
+
+def test_pending_ring_is_bounded():
+    b = AppToolEventBroker()
+    for i in range(150):
+        b.publish("s1", _ev(str(i)))
+    q = b.add_subscriber("s1")
+    drained = b.drain(q)
+    # Capped at 100, oldest dropped → tail retained.
+    assert len(drained) == 100
+    assert drained[0]["data"]["tag"] == "50"
+    assert drained[-1]["data"]["tag"] == "149"
+
+
+def test_sessions_are_isolated():
+    b = AppToolEventBroker()
+    qa = b.add_subscriber("a")
+    qb = b.add_subscriber("b")
+    b.publish("a", _ev("for-a"))
+    assert [e["data"]["tag"] for e in b.drain(qa)] == ["for-a"]
+    assert b.drain(qb) == []
+    b.remove_subscriber("a", qa)
+    b.remove_subscriber("b", qb)
+
+
+def test_fan_out_to_multiple_active_subscribers():
+    b = AppToolEventBroker()
+    q1 = b.add_subscriber("s1")
+    q2 = b.add_subscriber("s1")
+    b.publish("s1", _ev("x"))
+    assert b.drain(q1)[0]["data"]["tag"] == "x"
+    assert b.drain(q2)[0]["data"]["tag"] == "x"
+    b.remove_subscriber("s1", q1)
+    b.remove_subscriber("s1", q2)
+
+
+def test_remove_subscriber_prunes_session_then_buffers_again():
+    b = AppToolEventBroker()
+    q = b.add_subscriber("s1")
+    b.remove_subscriber("s1", q)
+    # With the subscriber gone the session falls back to buffering.
+    b.publish("s1", _ev("after"))
+    q2 = b.add_subscriber("s1")
+    assert [e["data"]["tag"] for e in b.drain(q2)] == ["after"]
+    b.remove_subscriber("s1", q2)
+
+
+def test_publish_empty_session_is_noop():
+    b = AppToolEventBroker()
+    b.publish("", _ev("x"))  # must not raise
+
+
+@pytest.mark.asyncio
+async def test_subscribe_context_manager_pairs_add_remove():
+    b = AppToolEventBroker()
+    b.publish("s1", _ev("buffered"))
+    async with b.subscribe("s1") as q:
+        assert [e["data"]["tag"] for e in b.drain(q)] == ["buffered"]
+        b.publish("s1", _ev("live"))
+        assert [e["data"]["tag"] for e in b.drain(q)] == ["live"]
+    # Context exit unsubscribed → back to buffering.
+    b.publish("s1", _ev("after"))
+    async with b.subscribe("s1") as q2:
+        assert [e["data"]["tag"] for e in b.drain(q2)] == ["after"]
diff --git a/backend/tests/shared/test_mcp_apps_card_store.py b/backend/tests/shared/test_mcp_apps_card_store.py
new file mode 100644
index 00000000..69ce17f2
--- /dev/null
+++ b/backend/tests/shared/test_mcp_apps_card_store.py
@@ -0,0 +1,128 @@
+"""Tests for the app-initiated tool-card store (MCP Apps PR #6, Option A).
+
+The store reuses the existing `sessions-metadata` table. No DynamoDB in
+tests — the no-table path is a silent no-op (matches dev), and a fake
+table asserts the record shape, the ownership re-check, and the size cap.
+"""
+
+from __future__ import annotations
+
+from decimal import Decimal
+
+from apis.shared.mcp_apps.card_store import AppCardStore
+
+
+class _FakeTable:
+    def __init__(self, items=None) -> None:
+        self.items = items or []
+        self.puts: list = []
+
+    def put_item(self, Item):  # noqa: N803 - boto3 kwarg name
+        self.puts.append(Item)
+
+    def query(self, **kwargs):
+        return {"Items": self.items}
+
+
+def _store_with(table) -> AppCardStore:
+    s = AppCardStore()  # __init__ sets _table=None without the env var
+    s._table = table
+    return s
+
+
+def test_no_table_is_silent_noop():
+    s = AppCardStore()
+    assert s.enabled is False
+    # Must not raise.
+    s.store(
+        user_id="u1",
+        session_id="s1",
+        tool_use_id="tu1",
+        tool_name="t",
+        arguments={},
+        content=[],
+        is_error=False,
+    )
+    assert s.list_for_session(session_id="s1", user_id="u1") == []
+
+
+def test_store_writes_appcard_record_shape():
+    table = _FakeTable()
+    s = _store_with(table)
+    s.store(
+        user_id="u1",
+        session_id="s1",
+        tool_use_id="tu1",
+        tool_name="widget_tool",
+        arguments={"q": "x", "n": 1.5},
+        content=[{"type": "text", "text": "ok"}],
+        is_error=False,
+    )
+    assert len(table.puts) == 1
+    item = table.puts[0]
+    assert item["PK"] == "USER#u1"
+    assert item["SK"].startswith("APPCARD#")
+    assert item["GSI_PK"] == "SESSION#s1"
+    assert item["GSI_SK"].startswith("APPCARD#")
+    assert item["toolName"] == "widget_tool"
+    assert item["isError"] is False
+    # Floats are stored as Decimal because DynamoDB does not accept Python floats.
+    assert item["arguments"]["n"] == Decimal("1.5")
+    assert "ttl" in item
+
+
+def test_store_caps_oversized_content():
+    table = _FakeTable()
+    s = _store_with(table)
+    huge = [{"type": "text", "text": "z" * 300_000}]
+    s.store(
+        user_id="u1",
+        session_id="s1",
+        tool_use_id="tu1",
+        tool_name="t",
+        arguments={},
+        content=huge,
+        is_error=False,
+    )
+    stored = table.puts[0]["content"]
+    assert stored == [
+        {"type": "text", "text": "[result omitted from history — too large to persist]"}
+    ]
+
+
+def test_list_filters_by_owner_and_cleans_record():
+    items = [
+        {
+            "PK": "USER#u1",
+            "SK": "APPCARD#2026-01-01T00:00:00#aaa",
+            "GSI_PK": "SESSION#s1",
+            "GSI_SK": "APPCARD#2026-01-01T00:00:00",
+            "ttl": 123,
+            "userId": "u1",
+            "sessionId": "s1",
+            "toolName": "mine",
+            "isError": False,
+            "producedByMessageIndex": Decimal("4"),
+        },
+        {
+            "PK": "USER#someone-else",
+            "SK": "APPCARD#2026-01-01T00:00:01#bbb",
+            "GSI_PK": "SESSION#s1",
+            "GSI_SK": "APPCARD#2026-01-01T00:00:01",
+            "userId": "other",
+            "toolName": "not-mine",
+            "isError": False,
+        },
+    ]
+    s = _store_with(_FakeTable(items))
+    cards = s.list_for_session(session_id="s1", user_id="u1")
+
+    assert len(cards) == 1
+    card = cards[0]
+    assert card["toolName"] == "mine"
+    # Key attributes and the ttl are stripped from the returned card.
+    for k in ("PK", "SK", "GSI_PK", "GSI_SK", "ttl"):
+        assert k not in card
+    # Decimals are converted back to native ints/floats.
+    assert card["producedByMessageIndex"] == 4
+    assert isinstance(card["producedByMessageIndex"], int)
diff --git a/backend/tests/shared/test_models_and_utils.py b/backend/tests/shared/test_models_and_utils.py
index 90e38a83..4c5a6432 100644
--- a/backend/tests/shared/test_models_and_utils.py
+++ b/backend/tests/shared/test_models_and_utils.py
@@ -244,6 +244,27 @@ def test_build_conversational_error_service_unavailable(self):
         evt = build_conversational_error_event(ErrorCode.SERVICE_UNAVAILABLE, Exception("down"))
         assert "unavailable" in evt.message.lower()
 
+    def test_build_conversational_error_max_tokens(self):
+        from apis.shared.errors import build_conversational_error_event, ErrorCode
+        # The raw SDK message carries a strandsagents.com URL we must NOT leak.
+        raw = Exception(
+            "Agent has reached an unrecoverable state due to max_tokens limit. "
+            "For more information see: https://strandsagents.com/x"
+        )
+        evt = build_conversational_error_event(
+            ErrorCode.MAX_TOKENS, raw, session_id="s1", recoverable=True
+        )
+        # The message is concise and not rendered as a bubble; the UI owns the wording.
+        assert "limit" in evt.message.lower()
+        # No leaked SDK URL or raw exception text.
+        assert "strandsagents.com" not in evt.message
+        assert "unrecoverable" not in evt.message.lower()
+        # Recoverable + machine-readable hint for the frontend affordance.
+        assert evt.recoverable is True
+        assert evt.code == ErrorCode.MAX_TOKENS
+        assert evt.metadata["error_kind"] == "max_tokens"
+        assert evt.metadata["session_id"] == "s1"
+
 
 # ===================================================================
 # sessions/metadata.py — pure functions
diff --git a/backend/tests/shared/test_sessions_metadata.py b/backend/tests/shared/test_sessions_metadata.py
index ac6027c2..7b80f10b 100644
--- a/backend/tests/shared/test_sessions_metadata.py
+++ b/backend/tests/shared/test_sessions_metadata.py
@@ -74,6 +74,51 @@ async def test_get_nonexistent(self, sessions_metadata_table):
         assert result is None
 
 
+class TestTruncatedTurnMarker:
+    """Refresh-survival marker for the max_tokens 'Continue' affordance."""
+
+    @pytest.mark.asyncio
+    async def test_set_then_clear(self, sessions_metadata_table):
+        from apis.shared.sessions.metadata import (
+            store_session_metadata,
+            get_session_metadata,
+            set_truncated_turn,
+            clear_truncated_turn,
+        )
+        await store_session_metadata(session_id="s1", user_id="u1", session_metadata=_make_session_metadata())
+
+        # Default: not continuable.
+        result = await get_session_metadata("s1", "u1")
+        assert not result.last_turn_continuable
+
+        await set_truncated_turn("s1", "u1")
+        result = await get_session_metadata("s1", "u1")
+        assert result.last_turn_continuable is True
+
+        await clear_truncated_turn("s1", "u1")
+        result = await get_session_metadata("s1", "u1")
+        assert not result.last_turn_continuable
+
+    @pytest.mark.asyncio
+    async def test_survives_response_round_trip(self, sessions_metadata_table):
+        # Exact contract the metadata endpoint uses:
+        # SessionMetadataResponse.model_validate(metadata.model_dump(by_alias=True))
+        from apis.shared.sessions.metadata import (
+            store_session_metadata,
+            get_session_metadata,
+            set_truncated_turn,
+        )
+        from apis.shared.sessions.models import SessionMetadataResponse
+
+        await store_session_metadata(session_id="s2", user_id="u1", session_metadata=_make_session_metadata(session_id="s2"))
+        await set_truncated_turn("s2", "u1")
+        meta = await get_session_metadata("s2", "u1")
+
+        resp = SessionMetadataResponse.model_validate(meta.model_dump(by_alias=True))
+        assert resp.last_turn_continuable is True
+        assert resp.model_dump(by_alias=True)["lastTurnContinuable"] is True
+
+
 class TestGetAllMessageMetadata:
     @pytest.mark.asyncio
     async def test_get_cost_records(self, sessions_metadata_table):
diff --git a/backend/tests/shared/test_user_menu_links.py b/backend/tests/shared/test_user_menu_links.py
new file mode 100644
index 00000000..c75682c6
--- /dev/null
+++ b/backend/tests/shared/test_user_menu_links.py
@@ -0,0 +1,259 @@
+"""Tests for the user-menu links shared module (repository + service)."""
+
+import boto3
+import pytest
+from pydantic import ValidationError
+
+from apis.shared.user_menu_links.models import (
+    UserMenuLink,
+    UserMenuLinkCreate,
+    UserMenuLinkUpdate,
+)
+from apis.shared.user_menu_links.repository import UserMenuLinksRepository
+from apis.shared.user_menu_links.service import UserMenuLinksService
+
+AWS_REGION = "us-west-2"
+
+
+@pytest.fixture()
+def user_menu_links_table(aws, monkeypatch):
+    ddb = boto3.client("dynamodb", region_name=AWS_REGION)
+    name = "test-user-menu-links"
+    ddb.create_table(
+        TableName=name,
+        KeySchema=[
+            {"AttributeName": "PK", "KeyType": "HASH"},
+            {"AttributeName": "SK", "KeyType": "RANGE"},
+        ],
+        AttributeDefinitions=[
+            {"AttributeName": "PK", "AttributeType": "S"},
+            {"AttributeName": "SK", "AttributeType": "S"},
+        ],
+        BillingMode="PAY_PER_REQUEST",
+    )
+    monkeypatch.setenv("DYNAMODB_USER_MENU_LINKS_TABLE_NAME", name)
+    return boto3.resource("dynamodb", region_name=AWS_REGION).Table(name)
+
+
+@pytest.fixture()
+def repo(user_menu_links_table):
+    return UserMenuLinksRepository(table_name="test-user-menu-links", region=AWS_REGION)
+
+
+@pytest.fixture()
+def service(repo):
+    return UserMenuLinksService(repo)
+
+
+def _external(**kw):
+    defaults = dict(label="Privacy policy", kind="external", url="https://x.example/p")
+    defaults.update(kw)
+    return UserMenuLinkCreate(**defaults)
+
+
+def _modal(**kw):
+    defaults = dict(label="About", kind="modal", body_markdown="# Hi")
+    defaults.update(kw)
+    return UserMenuLinkCreate(**defaults)
+
+
+# ----------------------------------------------------------------------
+# Pydantic validation
+# ----------------------------------------------------------------------
+
+
+class TestCreateValidation:
+    def test_external_requires_url(self):
+        with pytest.raises(ValidationError):
+            UserMenuLinkCreate(label="X", kind="external")
+
+    def test_modal_requires_body_markdown(self):
+        with pytest.raises(ValidationError):
+            UserMenuLinkCreate(label="X", kind="modal")
+
+    def test_external_ok_with_url(self):
+        link = UserMenuLinkCreate(label="X", kind="external", url="https://x.example")
+        assert link.kind == "external"
+
+    def test_modal_ok_with_body(self):
+        link = UserMenuLinkCreate(label="X", kind="modal", body_markdown="hi")
+        assert link.kind == "modal"
+
+    def test_order_bounds(self):
+        with pytest.raises(ValidationError):
+            UserMenuLinkCreate(label="X", kind="external", url="https://x.example", order=-1)
+        with pytest.raises(ValidationError):
+            UserMenuLinkCreate(label="X", kind="external", url="https://x.example", order=10_001)
+
+    @pytest.mark.parametrize(
+        "bad_url",
+        [
+            "javascript:alert(1)",
+            "data:text/html,<script>alert(1)</script>",
+            "file:///etc/passwd",
+            "ftp://example.com/x",
+            "//example.com/x",
+            "example.com/x",
+        ],
+    )
+    def test_external_rejects_non_http_url(self, bad_url):
+        with pytest.raises(ValidationError):
+            UserMenuLinkCreate(label="X", kind="external", url=bad_url)
+
+    def test_external_accepts_http_and_https(self):
+        UserMenuLinkCreate(label="X", kind="external", url="http://x.example")
+        UserMenuLinkCreate(label="X", kind="external", url="HTTPS://X.example")
+
+    def test_update_rejects_non_http_url(self):
+        with pytest.raises(ValidationError):
+            UserMenuLinkUpdate(url="javascript:alert(1)")
+
+
+# ----------------------------------------------------------------------
+# Repository
+# ----------------------------------------------------------------------
+
+
+class TestRepository:
+    @pytest.mark.asyncio
+    async def test_create_and_get(self, repo):
+        created = await repo.create_link(_external(), created_by="admin@x")
+        assert created.link_id
+        fetched = await repo.get_link(created.link_id)
+        assert fetched is not None
+        assert fetched.label == "Privacy policy"
+        assert fetched.created_by == "admin@x"
+
+    @pytest.mark.asyncio
+    async def test_get_missing_returns_none(self, repo):
+        assert await repo.get_link("nope") is None
+
+    @pytest.mark.asyncio
+    async def test_list_returns_all_then_enabled_only(self, repo):
+        a = await repo.create_link(_external(label="A", order=2))
+        b = await repo.create_link(_modal(label="B", order=1, enabled=False))
+        all_links = await repo.list_links()
+        assert {link.link_id for link in all_links} == {a.link_id, b.link_id}
+        enabled = await repo.list_links(enabled_only=True)
+        assert [link.link_id for link in enabled] == [a.link_id]
+
+    @pytest.mark.asyncio
+    async def test_list_sorted_by_order_then_label(self, repo):
+        await repo.create_link(_external(label="Beta", order=10))
+        await repo.create_link(_external(label="Alpha", order=10))
+        await repo.create_link(_external(label="First", order=0))
+        labels = [link.label for link in await repo.list_links()]
+        assert labels == ["First", "Alpha", "Beta"]
+
+    @pytest.mark.asyncio
+    async def test_update_partial_merges(self, repo):
+        created = await repo.create_link(_external())
+        updated = await repo.update_link(
+            created.link_id, UserMenuLinkUpdate(label="Renamed")
+        )
+        assert updated is not None
+        assert updated.label == "Renamed"
+        assert updated.url == "https://x.example/p"  # untouched
+
+    @pytest.mark.asyncio
+    async def test_update_to_modal_requires_body(self, repo):
+        created = await repo.create_link(_external())
+        # Switching to modal without supplying body_markdown should raise:
+        # the merged record has kind=modal but no body.
+        with pytest.raises(ValueError, match="modal links require body_markdown"):
+            await repo.update_link(created.link_id, UserMenuLinkUpdate(kind="modal"))
+
+    @pytest.mark.asyncio
+    async def test_update_to_modal_with_body_succeeds(self, repo):
+        created = await repo.create_link(_external())
+        updated = await repo.update_link(
+            created.link_id,
+            UserMenuLinkUpdate(kind="modal", body_markdown="hi"),
+        )
+        assert updated is not None
+        assert updated.kind == "modal"
+        assert updated.body_markdown == "hi"
+
+    @pytest.mark.asyncio
+    async def test_update_missing_returns_none(self, repo):
+        assert await repo.update_link("nope", UserMenuLinkUpdate(label="x")) is None
+
+    @pytest.mark.asyncio
+    async def test_delete(self, repo):
+        created = await repo.create_link(_external())
+        assert await repo.delete_link(created.link_id) is True
+        assert await repo.get_link(created.link_id) is None
+
+    @pytest.mark.asyncio
+    async def test_delete_missing_returns_false(self, repo):
+        assert await repo.delete_link("nope") is False
+
+    @pytest.mark.asyncio
+    async def test_update_rejects_non_http_url_on_merge(self, repo):
+        """Defense-in-depth: even if a bypass somehow stored a bad URL,
+        an update that merges it back through the repo must reject it.
+        We bypass Pydantic by mutating the existing record directly."""
+        created = await repo.create_link(_external())
+        # Force a bad URL into the persisted item to simulate corruption /
+        # an out-of-band write.
+        repo._table.update_item(
+            Key={"PK": "USER_MENU_LINKS", "SK": f"LINK#{created.link_id}"},
+            UpdateExpression="SET #u = :u",
+            ExpressionAttributeNames={"#u": "url"},
+            ExpressionAttributeValues={":u": "javascript:alert(1)"},
+        )
+        with pytest.raises(ValueError, match="url must start with"):
+            await repo.update_link(
+                created.link_id, UserMenuLinkUpdate(label="Renamed")
+            )
+
+
+# ----------------------------------------------------------------------
+# Service
+# ----------------------------------------------------------------------
+
+
+class TestService:
+    @pytest.mark.asyncio
+    async def test_create_then_list(self, service):
+        await service.create_link(_external())
+        links = await service.list_links()
+        assert len(links) == 1
+
+    @pytest.mark.asyncio
+    async def test_enabled_only_filter(self, service):
+        await service.create_link(_external(label="A"))
+        await service.create_link(_external(label="B", enabled=False))
+        enabled = await service.list_links(enabled_only=True)
+        labels = [link.label for link in enabled]
+        assert labels == ["A"]
+
+
+# ----------------------------------------------------------------------
+# Dataclass round-trip
+# ----------------------------------------------------------------------
+
+
+class TestDynamoRoundTrip:
+    def test_to_and_from_dynamo_item(self):
+        original = UserMenuLink(
+            link_id="abc",
+            label="Test",
+            kind="modal",
+            enabled=True,
+            order=5,
+            body_markdown="# Hi",
+            created_at="2026-05-14T00:00:00Z",
+            updated_at="2026-05-14T00:00:00Z",
+        )
+        item = original.to_dynamo_item()
+        round_tripped = UserMenuLink.from_dynamo_item(item)
+        assert round_tripped == original
+
+    def test_from_dynamo_item_requires_timestamps(self):
+        # Corrupted records (missing createdAt/updatedAt) should raise rather
+        # than silently substituting "now".
+        with pytest.raises(ValueError, match="missing required timestamp"):
+            UserMenuLink.from_dynamo_item(
+                {"linkId": "x", "label": "x", "kind": "external", "url": "https://x.example"}
+            )
diff --git a/backend/tests/shared/test_user_menu_links_routes.py b/backend/tests/shared/test_user_menu_links_routes.py
new file mode 100644
index 00000000..d283a806
--- /dev/null
+++ b/backend/tests/shared/test_user_menu_links_routes.py
@@ -0,0 +1,229 @@
+"""Route tests for user-menu links endpoints (admin CRUD + public read)."""
+
+import boto3
+import pytest
+from fastapi import APIRouter, FastAPI, HTTPException
+from fastapi.testclient import TestClient
+
+from apis.shared.auth import get_current_user_from_session, require_admin
+from apis.shared.auth.models import User
+from apis.shared.user_menu_links import repository as repo_module
+from apis.shared.user_menu_links import service as service_module
+
+AWS_REGION = "us-east-1"
+TABLE_NAME = "test-user-menu-links-routes"
+
+
+def _make_user(email: str = "user@example.com", roles=None) -> User:
+    return User(
+        email=email,
+        user_id="user-001",
+        name="Test User",
+        roles=roles if roles is not None else ["User"],
+    )
+
+
+@pytest.fixture()
+def user_menu_links_table(aws, monkeypatch):
+    """Moto-backed DynamoDB table + module-singleton reset so the routes pick
+    up this fresh table on first call inside each test."""
+    ddb = boto3.client("dynamodb", region_name=AWS_REGION)
+    ddb.create_table(
+        TableName=TABLE_NAME,
+        KeySchema=[
+            {"AttributeName": "PK", "KeyType": "HASH"},
+            {"AttributeName": "SK", "KeyType": "RANGE"},
+        ],
+        AttributeDefinitions=[
+            {"AttributeName": "PK", "AttributeType": "S"},
+            {"AttributeName": "SK", "AttributeType": "S"},
+        ],
+        BillingMode="PAY_PER_REQUEST",
+    )
+    monkeypatch.setenv("DYNAMODB_USER_MENU_LINKS_TABLE_NAME", TABLE_NAME)
+    monkeypatch.setenv("AWS_REGION", AWS_REGION)
+    # The service + repo are module-level singletons; reset them so the
+    # next get_*() call constructs a fresh instance against the moto table.
+    monkeypatch.setattr(repo_module, "_repository", None)
+    monkeypatch.setattr(service_module, "_service", None)
+    return boto3.resource("dynamodb", region_name=AWS_REGION).Table(TABLE_NAME)
+
+
+def _build_admin_app() -> FastAPI:
+    """Mount the admin router under /admin to mirror the real app."""
+    from apis.app_api.admin.user_menu_links.routes import router as admin_router
+
+    app = FastAPI()
+    parent = APIRouter(prefix="/admin")
+    parent.include_router(admin_router)
+    app.include_router(parent)
+    return app
+
+
+def _build_public_app() -> FastAPI:
+    from apis.app_api.user_menu_links.routes import router as public_router
+
+    app = FastAPI()
+    app.include_router(public_router)
+    return app
+
+
+# ----------------------------------------------------------------------
+# Admin routes
+# ----------------------------------------------------------------------
+
+
+class TestAdminRoutes:
+    def test_create_returns_201(self, user_menu_links_table):
+        app = _build_admin_app()
+        admin = _make_user(email="admin@example.com", roles=["system_admin"])
+        app.dependency_overrides[require_admin] = lambda: admin
+
+        client = TestClient(app)
+        resp = client.post(
+            "/admin/user-menu-links/",
+            json={"label": "Privacy", "kind": "external", "url": "https://x.example"},
+        )
+        assert resp.status_code == 201
+        body = resp.json()
+        assert body["label"] == "Privacy"
+        assert body["created_by"] == "admin@example.com"
+
+    def test_create_rejects_non_http_url_with_422(self, user_menu_links_table):
+        app = _build_admin_app()
+        admin = _make_user(email="admin@example.com", roles=["system_admin"])
+        app.dependency_overrides[require_admin] = lambda: admin
+
+        client = TestClient(app)
+        resp = client.post(
+            "/admin/user-menu-links/",
+            json={"label": "Bad", "kind": "external", "url": "javascript:alert(1)"},
+        )
+        assert resp.status_code == 422
+
+    def test_create_missing_url_for_external_returns_422(self, user_menu_links_table):
+        app = _build_admin_app()
+        admin = _make_user(email="admin@example.com", roles=["system_admin"])
+        app.dependency_overrides[require_admin] = lambda: admin
+
+        client = TestClient(app)
+        resp = client.post(
+            "/admin/user-menu-links/",
+            json={"label": "X", "kind": "external"},
+        )
+        assert resp.status_code == 422
+
+    def test_non_admin_gets_403(self, user_menu_links_table):
+        app = _build_admin_app()
+
+        def _forbid():
+            raise HTTPException(status_code=403, detail="Forbidden")
+
+        app.dependency_overrides[require_admin] = _forbid
+
+        client = TestClient(app)
+        resp = client.get("/admin/user-menu-links/")
+        assert resp.status_code == 403
+
+    def test_list_then_get_round_trips(self, user_menu_links_table):
+        app = _build_admin_app()
+        admin = _make_user(email="admin@example.com", roles=["system_admin"])
+        app.dependency_overrides[require_admin] = lambda: admin
+
+        client = TestClient(app)
+        created = client.post(
+            "/admin/user-menu-links/",
+            json={"label": "About", "kind": "modal", "body_markdown": "# Hi"},
+        ).json()
+
+        list_resp = client.get("/admin/user-menu-links/")
+        assert list_resp.status_code == 200
+        assert list_resp.json()["total"] == 1
+
+        get_resp = client.get(f"/admin/user-menu-links/{created['link_id']}")
+        assert get_resp.status_code == 200
+        assert get_resp.json()["label"] == "About"
+
+    def test_get_missing_returns_404(self, user_menu_links_table):
+        app = _build_admin_app()
+        admin = _make_user(email="admin@example.com", roles=["system_admin"])
+        app.dependency_overrides[require_admin] = lambda: admin
+
+        client = TestClient(app)
+        resp = client.get("/admin/user-menu-links/does-not-exist")
+        assert resp.status_code == 404
+
+    def test_update_returns_400_on_invariant_violation(self, user_menu_links_table):
+        app = _build_admin_app()
+        admin = _make_user(email="admin@example.com", roles=["system_admin"])
+        app.dependency_overrides[require_admin] = lambda: admin
+
+        client = TestClient(app)
+        created = client.post(
+            "/admin/user-menu-links/",
+            json={"label": "Privacy", "kind": "external", "url": "https://x.example"},
+        ).json()
+
+        # PATCH kind=modal without supplying body_markdown — the merged record
+        # fails the kind/body invariant in the repository, which raises
+        # ValueError → mapped to 400 by the handler.
+        resp = client.patch(
+            f"/admin/user-menu-links/{created['link_id']}",
+            json={"kind": "modal"},
+        )
+        assert resp.status_code == 400
+
+    def test_delete_returns_204_then_404(self, user_menu_links_table):
+        app = _build_admin_app()
+        admin = _make_user(email="admin@example.com", roles=["system_admin"])
+        app.dependency_overrides[require_admin] = lambda: admin
+
+        client = TestClient(app)
+        created = client.post(
+            "/admin/user-menu-links/",
+            json={"label": "X", "kind": "external", "url": "https://x.example"},
+        ).json()
+
+        del_resp = client.delete(f"/admin/user-menu-links/{created['link_id']}")
+        assert del_resp.status_code == 204
+
+        again = client.delete(f"/admin/user-menu-links/{created['link_id']}")
+        assert again.status_code == 404
+
+
+# ----------------------------------------------------------------------
+# Public read route
+# ----------------------------------------------------------------------
+
+
+class TestPublicRoute:
+    def test_returns_only_enabled_links(self, user_menu_links_table):
+        admin_app = _build_admin_app()
+        admin = _make_user(email="admin@example.com", roles=["system_admin"])
+        admin_app.dependency_overrides[require_admin] = lambda: admin
+        admin_client = TestClient(admin_app)
+
+        # Seed one enabled + one disabled link via the admin API.
+        admin_client.post(
+            "/admin/user-menu-links/",
+            json={"label": "Visible", "kind": "external", "url": "https://x.example"},
+        )
+        admin_client.post(
+            "/admin/user-menu-links/",
+            json={
+                "label": "Hidden",
+                "kind": "external",
+                "url": "https://y.example",
+                "enabled": False,
+            },
+        )
+
+        public_app = _build_public_app()
+        public_app.dependency_overrides[get_current_user_from_session] = (
+            lambda: _make_user()
+        )
+        public_client = TestClient(public_app)
+        resp = public_client.get("/user-menu-links/")
+        assert resp.status_code == 200
+        body = resp.json()
+        assert [link["label"] for link in body["links"]] == ["Visible"]
diff --git a/backend/tests/test_seed_system_admin_jwt.py b/backend/tests/test_seed_system_admin_jwt.py
index 2b605f9c..5ad50c90 100644
--- a/backend/tests/test_seed_system_admin_jwt.py
+++ b/backend/tests/test_seed_system_admin_jwt.py
@@ -117,7 +117,7 @@ def test_creates_default_tools(self, dynamodb_table):
         """Creates the default tool entries."""
         result = seed_default_tools(TABLE_NAME, REGION)
 
-        assert result.created == 4
+        assert result.created == 6
         assert result.failed == 0
 
         # Verify fetch_url_content
@@ -167,13 +167,39 @@ def test_creates_default_tools(self, dynamodb_table):
         assert item["category"] == "code"
         assert item["protocol"] == "local"
 
+        # Verify create_artifact
+        resp = dynamodb_table.get_item(
+            Key={"PK": "TOOL#create_artifact", "SK": "METADATA"}
+        )
+        item = resp["Item"]
+        assert item["toolId"] == "create_artifact"
+        assert item["displayName"] == "Create Artifact"
+        assert item["category"] == "document"
+        assert item["protocol"] == "local"
+        assert item["enabledByDefault"] is True
+        assert item["isPublic"] is True
+        assert item["GSI1PK"] == "CATEGORY#document"
+        assert item["GSI1SK"] == "TOOL#create_artifact"
+
+        # Verify update_artifact
+        resp = dynamodb_table.get_item(
+            Key={"PK": "TOOL#update_artifact", "SK": "METADATA"}
+        )
+        item = resp["Item"]
+        assert item["toolId"] == "update_artifact"
+        assert item["displayName"] == "Update Artifact"
+        assert item["category"] == "document"
+        assert item["protocol"] == "local"
+        assert item["enabledByDefault"] is True
+        assert item["isPublic"] is True
+
     def test_skips_existing_tools(self, dynamodb_table):
         """Skips tools that already exist."""
         seed_default_tools(TABLE_NAME, REGION)
 
         result = seed_default_tools(TABLE_NAME, REGION)
 
-        assert result.skipped == 4
+        assert result.skipped == 6
         assert result.created == 0
 
     def test_partial_skip(self, dynamodb_table):
@@ -187,5 +213,5 @@ def test_partial_skip(self, dynamodb_table):
 
         result = seed_default_tools(TABLE_NAME, REGION)
 
-        assert result.created == 3
+        assert result.created == 5
         assert result.skipped == 1
diff --git a/backend/tests/test_system_admin.py b/backend/tests/test_system_admin.py
deleted file mode 100644
index 2cdf5637..00000000
--- a/backend/tests/test_system_admin.py
+++ /dev/null
@@ -1,90 +0,0 @@
-"""Tests for require_system_admin dependency."""
-
-import pytest
-from unittest.mock import AsyncMock, patch
-from datetime import datetime, timezone
-
-from fastapi import HTTPException
-
-from apis.shared.auth.models import User
-from apis.shared.rbac.models import UserEffectivePermissions
-from apis.shared.rbac.system_admin import require_system_admin
-
-
-def _user(roles: list | None = None) -> User:
-    return User(
-        user_id="u-1",
-        email="test@example.com",
-        name="Test",
-        roles=roles or [],
-    )
-
-
-def _perms(app_roles: list) -> UserEffectivePermissions:
-    return UserEffectivePermissions(
-        user_id="u-1",
-        app_roles=app_roles,
-        tools=[],
-        models=[],
-        quota_tier=None,
-        resolved_at=datetime.now(timezone.utc).isoformat() + "Z",
-    )
-
-
-class TestRequireSystemAdmin:
-    @pytest.mark.asyncio
-    async def test_grants_access_when_system_admin_role_present(self):
-        mock_service = AsyncMock()
-        mock_service.resolve_user_permissions.return_value = _perms(["system_admin"])
-
-        with patch(
-            "apis.shared.rbac.service.get_app_role_service",
-            return_value=mock_service,
-        ):
-            result = await require_system_admin(user=_user(["Admin"]))
-
-        assert result.user_id == "u-1"
-        mock_service.resolve_user_permissions.assert_called_once()
-
-    @pytest.mark.asyncio
-    async def test_denies_access_without_system_admin_role(self):
-        mock_service = AsyncMock()
-        mock_service.resolve_user_permissions.return_value = _perms(["default"])
-
-        with patch(
-            "apis.shared.rbac.service.get_app_role_service",
-            return_value=mock_service,
-        ):
-            with pytest.raises(HTTPException) as exc_info:
-                await require_system_admin(user=_user(["Faculty"]))
-
-        assert exc_info.value.status_code == 403
-
-    @pytest.mark.asyncio
-    async def test_denies_access_on_service_error(self):
-        """Fail-closed: if AppRoleService raises, deny access."""
-        mock_service = AsyncMock()
-        mock_service.resolve_user_permissions.side_effect = Exception("DynamoDB down")
-
-        with patch(
-            "apis.shared.rbac.service.get_app_role_service",
-            return_value=mock_service,
-        ):
-            with pytest.raises(HTTPException) as exc_info:
-                await require_system_admin(user=_user(["Admin"]))
-
-        assert exc_info.value.status_code == 403
-
-    @pytest.mark.asyncio
-    async def test_denies_access_with_empty_app_roles(self):
-        mock_service = AsyncMock()
-        mock_service.resolve_user_permissions.return_value = _perms([])
-
-        with patch(
-            "apis.shared.rbac.service.get_app_role_service",
-            return_value=mock_service,
-        ):
-            with pytest.raises(HTTPException) as exc_info:
-                await require_system_admin(user=_user([]))
-
-        assert exc_info.value.status_code == 403
diff --git a/backend/uv.lock b/backend/uv.lock
index df65cc44..8afa59de 100644
--- a/backend/uv.lock
+++ b/backend/uv.lock
@@ -1,5 +1,5 @@
 version = 1
-revision = 2
+revision = 3
 requires-python = ">=3.10"
 resolution-markers = [
     "python_full_version >= '3.15'",
@@ -12,7 +12,7 @@ resolution-markers = [
 
 [[package]]
 name = "agentcore-stack"
-version = "1.0.0b24"
+version = "1.0.0b28"
 source = { editable = "." }
 dependencies = [
     { name = "aiofiles" },
@@ -25,6 +25,7 @@ dependencies = [
     { name = "httpx" },
     { name = "pillow" },
     { name = "pyjwt", extra = ["crypto"] },
+    { name = "pypdfium2" },
     { name = "python-dotenv" },
     { name = "python-multipart" },
     { name = "starlette" },
@@ -83,9 +84,9 @@ requires-dist = [
     { name = "aiohttp", specifier = "==3.13.5" },
     { name = "authlib", specifier = "==1.7.0" },
     { name = "aws-opentelemetry-distro", marker = "extra == 'agentcore'", specifier = "==0.17.0" },
-    { name = "bedrock-agentcore", marker = "extra == 'agentcore'", specifier = "==1.6.4" },
+    { name = "bedrock-agentcore", marker = "extra == 'agentcore'", specifier = "==1.9.1" },
     { name = "black", marker = "extra == 'dev'", specifier = "==26.3.1" },
-    { name = "boto3", specifier = "==1.42.96" },
+    { name = "boto3", specifier = "==1.43.9" },
     { name = "cachetools", specifier = "==6.2.4" },
     { name = "cryptography", specifier = "==47.0.0" },
     { name = "fastapi", specifier = "==0.136.1" },
@@ -98,6 +99,7 @@ requires-dist = [
     { name = "openai", marker = "extra == 'agentcore'", specifier = "==2.32.0" },
     { name = "pillow", specifier = "==12.2.0" },
     { name = "pyjwt", extras = ["crypto"], specifier = "==2.12.1" },
+    { name = "pypdfium2", specifier = "==4.30.0" },
     { name = "pytest", marker = "extra == 'dev'", specifier = "==9.0.3" },
     { name = "pytest-asyncio", marker = "extra == 'dev'", specifier = "==1.3.0" },
     { name = "pytest-cov", marker = "extra == 'dev'", specifier = "==7.1.0" },
@@ -105,11 +107,11 @@ requires-dist = [
     { name = "python-multipart", specifier = "==0.0.27" },
     { name = "ruff", marker = "extra == 'dev'", specifier = "==0.15.12" },
     { name = "starlette", specifier = "==1.0.0" },
-    { name = "strands-agents", marker = "extra == 'agentcore'", specifier = "==1.37.0" },
-    { name = "strands-agents", extras = ["bidi"], marker = "extra == 'bidi'", specifier = "==1.37.0" },
-    { name = "strands-agents-tools", marker = "extra == 'agentcore'", specifier = "==0.5.1" },
+    { name = "strands-agents", marker = "extra == 'agentcore'", specifier = "==1.40.0" },
+    { name = "strands-agents", extras = ["bidi"], marker = "extra == 'bidi'", specifier = "==1.40.0" },
+    { name = "strands-agents-tools", marker = "extra == 'agentcore'", specifier = "==0.5.2" },
     { name = "tiktoken", marker = "extra == 'dev'", specifier = "==0.12.0" },
-    { name = "types-aiofiles", marker = "extra == 'dev'", specifier = "==25.1.0.20251011" },
+    { name = "types-aiofiles", marker = "extra == 'dev'", specifier = "==25.1.0.20260409" },
     { name = "uvicorn", extras = ["standard"], specifier = "==0.46.0" },
 ]
 provides-extras = ["agentcore", "bidi", "dev", "all"]
@@ -513,7 +515,7 @@ wheels = [
 
 [[package]]
 name = "bedrock-agentcore"
-version = "1.6.4"
+version = "1.9.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "boto3" },
@@ -525,9 +527,9 @@ dependencies = [
     { name = "uvicorn" },
     { name = "websockets" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/54/6d/6dfd3e9f05fb3fff256312cf7a9cee11d849281dfd4d32fa7aaf3cea87e9/bedrock_agentcore-1.6.4.tar.gz", hash = "sha256:7b3e12361ca432ab1cada5e191e6f3cfa9536cd5cedafc37058f670b263bdabf", size = 521801, upload-time = "2026-04-23T20:08:25.401Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/08/ba/91b6ec49558755cccc5bfa5a64916995baed5490768bee33581b370a1e4e/bedrock_agentcore-1.9.1.tar.gz", hash = "sha256:f0e69b41c32c12e395d698299c96981d34035dafa90e0e79fcbd743574315c6a", size = 692593, upload-time = "2026-05-12T21:50:47.639Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/49/cc/298426f7601172fab91a7c4fe6c0f7a07ecbdaeb2413f1e8dc5d79aacbd7/bedrock_agentcore-1.6.4-py3-none-any.whl", hash = "sha256:a20f76f23cf08f4c081704eeb85c1899340163066b1612458c93963055a5e3dd", size = 168734, upload-time = "2026-04-23T20:08:23.467Z" },
+    { url = "https://files.pythonhosted.org/packages/34/05/a5fbaa2320c34f8df196c105ca1938848845216cacc36850c73d116f28a9/bedrock_agentcore-1.9.1-py3-none-any.whl", hash = "sha256:f323c3d943dfe1defd52febd1409f8c4d04c0fc37848dd100ede692c2a6addd2", size = 262193, upload-time = "2026-05-12T21:50:45.506Z" },
 ]
 
 [[package]]
@@ -576,30 +578,30 @@ wheels = [
 
 [[package]]
 name = "boto3"
-version = "1.42.96"
+version = "1.43.9"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "botocore" },
     { name = "jmespath" },
     { name = "s3transfer" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/a6/2d/69fb3acd50bab83fb295c167d33c4b653faeb5fb0f42bfca4d9b69d6fb68/boto3-1.42.96.tar.gz", hash = "sha256:b38a9e4a3fbbee9017252576f1379780d0a5814768676c08df2f539d31fcdd68", size = 113203, upload-time = "2026-04-24T19:47:18.677Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/b4/cc/42d798fc5305e4636170b50cdfb305ff0a81f470e35131f4a0d2641976ae/boto3-1.43.9.tar.gz", hash = "sha256:37dac72f2921095378c0200caf07918d5e10a82b7c1f611abb70e44f69d0b962", size = 113135, upload-time = "2026-05-15T19:28:31.167Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/2b/9d/b3f617d011c42eb804d993103b8fa9acdce153e181a3042f58bfe33d7cb4/boto3-1.42.96-py3-none-any.whl", hash = "sha256:2f4566da2c209a98bdbfc874d813ef231c84ad24e4f815e9bc91de5f63351a24", size = 140557, upload-time = "2026-04-24T19:47:15.824Z" },
+    { url = "https://files.pythonhosted.org/packages/f4/dc/51286e9551f7852a79ce5d2a57468d9d905c30d32bcace55204551db202d/boto3-1.43.9-py3-none-any.whl", hash = "sha256:5e967292d361482793471bd80fad1e714515b7401f65a0d5b4aa6ef9d009c030", size = 140523, upload-time = "2026-05-15T19:28:28.948Z" },
 ]
 
 [[package]]
 name = "botocore"
-version = "1.42.97"
+version = "1.43.9"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "jmespath" },
     { name = "python-dateutil" },
     { name = "urllib3" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/c6/95/c37edb602948fad2253ffd1bb3dba5b938645bd1845ee4160350136a0f41/botocore-1.42.97.tar.gz", hash = "sha256:5c0bb00e32d16ff6d278cc8c9e10dc3672d9c1d569031635ac3c908a60de8310", size = 15269348, upload-time = "2026-04-27T20:39:05.625Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/ca/e8/f696c80982685a4cdb3df5f0781919afa50262f40e1aac7066c9c2520deb/botocore-1.43.9.tar.gz", hash = "sha256:93e91c7160678182860f5902ee4cfe6d643cac0d9ee84d3eb65becc9f4c00228", size = 15357963, upload-time = "2026-05-15T19:28:19.342Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/e3/d2/8e025ba1a4e257879af72d06913272311af79673d82fa2581a351b924317/botocore-1.42.97-py3-none-any.whl", hash = "sha256:77d2c8ce1bc592d3fbd7c01c35836f4a5b0cac2ca03ccdf6ffc60faa16b5fadc", size = 14950367, upload-time = "2026-04-27T20:39:01.261Z" },
+    { url = "https://files.pythonhosted.org/packages/77/c9/a1b51a74d476f5cb2f555ce8274f0f6b9fb21d75cc3f57b87dd0632ee17a/botocore-1.43.9-py3-none-any.whl", hash = "sha256:b9bdcd9c87fc552aad30006f00167d9ebb3480e1b06f1902bac5b2c41014fdab", size = 15039827, upload-time = "2026-05-15T19:28:14.543Z" },
 ]
 
 [[package]]
@@ -3668,6 +3670,26 @@ crypto = [
     { name = "cryptography" },
 ]
 
+[[package]]
+name = "pypdfium2"
+version = "4.30.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/a1/14/838b3ba247a0ba92e4df5d23f2bea9478edcfd72b78a39d6ca36ccd84ad2/pypdfium2-4.30.0.tar.gz", hash = "sha256:48b5b7e5566665bc1015b9d69c1ebabe21f6aee468b509531c3c8318eeee2e16", size = 140239, upload-time = "2024-05-09T18:33:17.552Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/c7/9a/c8ff5cc352c1b60b0b97642ae734f51edbab6e28b45b4fcdfe5306ee3c83/pypdfium2-4.30.0-py3-none-macosx_10_13_x86_64.whl", hash = "sha256:b33ceded0b6ff5b2b93bc1fe0ad4b71aa6b7e7bd5875f1ca0cdfb6ba6ac01aab", size = 2837254, upload-time = "2024-05-09T18:32:48.653Z" },
+    { url = "https://files.pythonhosted.org/packages/21/8b/27d4d5409f3c76b985f4ee4afe147b606594411e15ac4dc1c3363c9a9810/pypdfium2-4.30.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:4e55689f4b06e2d2406203e771f78789bd4f190731b5d57383d05cf611d829de", size = 2707624, upload-time = "2024-05-09T18:32:51.458Z" },
+    { url = "https://files.pythonhosted.org/packages/11/63/28a73ca17c24b41a205d658e177d68e198d7dde65a8c99c821d231b6ee3d/pypdfium2-4.30.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4e6e50f5ce7f65a40a33d7c9edc39f23140c57e37144c2d6d9e9262a2a854854", size = 2793126, upload-time = "2024-05-09T18:32:53.581Z" },
+    { url = "https://files.pythonhosted.org/packages/d1/96/53b3ebf0955edbd02ac6da16a818ecc65c939e98fdeb4e0958362bd385c8/pypdfium2-4.30.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:3d0dd3ecaffd0b6dbda3da663220e705cb563918249bda26058c6036752ba3a2", size = 2591077, upload-time = "2024-05-09T18:32:55.99Z" },
+    { url = "https://files.pythonhosted.org/packages/ec/ee/0394e56e7cab8b5b21f744d988400948ef71a9a892cbeb0b200d324ab2c7/pypdfium2-4.30.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:cc3bf29b0db8c76cdfaac1ec1cde8edf211a7de7390fbf8934ad2aa9b4d6dfad", size = 2864431, upload-time = "2024-05-09T18:32:57.911Z" },
+    { url = "https://files.pythonhosted.org/packages/65/cd/3f1edf20a0ef4a212a5e20a5900e64942c5a374473671ac0780eaa08ea80/pypdfium2-4.30.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f1f78d2189e0ddf9ac2b7a9b9bd4f0c66f54d1389ff6c17e9fd9dc034d06eb3f", size = 2812008, upload-time = "2024-05-09T18:32:59.886Z" },
+    { url = "https://files.pythonhosted.org/packages/c8/91/2d517db61845698f41a2a974de90762e50faeb529201c6b3574935969045/pypdfium2-4.30.0-py3-none-musllinux_1_1_aarch64.whl", hash = "sha256:5eda3641a2da7a7a0b2f4dbd71d706401a656fea521b6b6faa0675b15d31a163", size = 6181543, upload-time = "2024-05-09T18:33:02.597Z" },
+    { url = "https://files.pythonhosted.org/packages/ba/c4/ed1315143a7a84b2c7616569dfb472473968d628f17c231c39e29ae9d780/pypdfium2-4.30.0-py3-none-musllinux_1_1_i686.whl", hash = "sha256:0dfa61421b5eb68e1188b0b2231e7ba35735aef2d867d86e48ee6cab6975195e", size = 6175911, upload-time = "2024-05-09T18:33:05.376Z" },
+    { url = "https://files.pythonhosted.org/packages/7a/c4/9e62d03f414e0e3051c56d5943c3bf42aa9608ede4e19dc96438364e9e03/pypdfium2-4.30.0-py3-none-musllinux_1_1_x86_64.whl", hash = "sha256:f33bd79e7a09d5f7acca3b0b69ff6c8a488869a7fab48fdf400fec6e20b9c8be", size = 6267430, upload-time = "2024-05-09T18:33:08.067Z" },
+    { url = "https://files.pythonhosted.org/packages/90/47/eda4904f715fb98561e34012826e883816945934a851745570521ec89520/pypdfium2-4.30.0-py3-none-win32.whl", hash = "sha256:ee2410f15d576d976c2ab2558c93d392a25fb9f6635e8dd0a8a3a5241b275e0e", size = 2775951, upload-time = "2024-05-09T18:33:10.567Z" },
+    { url = "https://files.pythonhosted.org/packages/25/bd/56d9ec6b9f0fc4e0d95288759f3179f0fcd34b1a1526b75673d2f6d5196f/pypdfium2-4.30.0-py3-none-win_amd64.whl", hash = "sha256:90dbb2ac07be53219f56be09961eb95cf2473f834d01a42d901d13ccfad64b4c", size = 2892098, upload-time = "2024-05-09T18:33:13.107Z" },
+    { url = "https://files.pythonhosted.org/packages/be/7a/097801205b991bc3115e8af1edb850d30aeaf0118520b016354cf5ccd3f6/pypdfium2-4.30.0-py3-none-win_arm64.whl", hash = "sha256:119b2969a6d6b1e8d55e99caaf05290294f2d0fe49c12a3f17102d01c441bd29", size = 2752118, upload-time = "2024-05-09T18:33:15.489Z" },
+]
+
 [[package]]
 name = "pytest"
 version = "9.0.3"
@@ -4195,14 +4217,14 @@ wheels = [
 
 [[package]]
 name = "s3transfer"
-version = "0.16.0"
+version = "0.17.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "botocore" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/05/04/74127fc843314818edfa81b5540e26dd537353b123a4edc563109d8f17dd/s3transfer-0.16.0.tar.gz", hash = "sha256:8e990f13268025792229cd52fa10cb7163744bf56e719e0b9cb925ab79abf920", size = 153827, upload-time = "2025-12-01T02:30:59.114Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/9b/ec/7c692cde9125b77e84b307354d4fb705f98b8ccad59a036d5957ca75bfc3/s3transfer-0.17.0.tar.gz", hash = "sha256:9edeb6d1c3c2f89d6050348548834ad8289610d886e5bf7b7207728bd43ce33a", size = 155337, upload-time = "2026-04-29T22:07:36.33Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/fc/51/727abb13f44c1fcf6d145979e1535a35794db0f6e450a0cb46aa24732fe2/s3transfer-0.16.0-py3-none-any.whl", hash = "sha256:18e25d66fed509e3868dc1572b3f427ff947dd2c56f844a5bf09481ad3f3b2fe", size = 86830, upload-time = "2025-12-01T02:30:57.729Z" },
+    { url = "https://files.pythonhosted.org/packages/87/72/c6c32d2b657fa3dad1de340254e14390b1e334ce38268b7ad51abda3c8c2/s3transfer-0.17.0-py3-none-any.whl", hash = "sha256:ce3801712acf4ad3e89fb9990df97b4972e93f4b3b0004d214be5bce12814c20", size = 86811, upload-time = "2026-04-29T22:07:34.966Z" },
 ]
 
 [[package]]
@@ -4362,7 +4384,7 @@ wheels = [
 
 [[package]]
 name = "strands-agents"
-version = "1.37.0"
+version = "1.40.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "boto3" },
@@ -4378,9 +4400,9 @@ dependencies = [
     { name = "typing-extensions" },
     { name = "watchdog" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/03/88/cf23aa713ea68c8a0ad5144341da7ee022e88ce6206512aeafddba257b75/strands_agents-1.37.0.tar.gz", hash = "sha256:3fe6821f730f0468eee91e1ff38eb27a5244046893ffba63e8f5345288096509", size = 824168, upload-time = "2026-04-22T19:18:01.378Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/07/fa/b5fdfa099b122fea98fc64b9923237077ed6b7c2a90f2c3a65cba00d7202/strands_agents-1.40.0.tar.gz", hash = "sha256:5d867c1255f8449f0030a9a9c085106c15b1704e871d0fea56d3c20b2309a4d3", size = 878176, upload-time = "2026-05-14T13:48:28.812Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/5f/ff/bede1b8d5fe1c776bd5ed33575505681b3b65ab20889fe6b8344b92fc82d/strands_agents-1.37.0-py3-none-any.whl", hash = "sha256:2fa12e22ed1dac228aa93e91c2ea5381d9b3f08416ed8162222b61b255fee0b1", size = 404526, upload-time = "2026-04-22T19:17:59.634Z" },
+    { url = "https://files.pythonhosted.org/packages/9e/ca/ce4c061d0fa007738f0ce4ebdb234969d9343322a089c24d5986620faa66/strands_agents-1.40.0-py3-none-any.whl", hash = "sha256:40c04f411e4082a6eb78b22d5b421757b27aac1f9a42e8198ff3db7fd4fcc13f", size = 432744, upload-time = "2026-05-14T13:48:26.639Z" },
 ]
 
 [package.optional-dependencies]
@@ -4391,7 +4413,7 @@ bidi = [
 
 [[package]]
 name = "strands-agents-tools"
-version = "0.5.1"
+version = "0.5.2"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "aiohttp" },
@@ -4412,9 +4434,9 @@ dependencies = [
     { name = "tzdata", marker = "sys_platform == 'win32'" },
     { name = "watchdog" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/b4/fc/8a9da78b5c4a8802367a8eeec046f98eda742b1ee1b2fff568c81c1b3479/strands_agents_tools-0.5.1.tar.gz", hash = "sha256:616ba88b5849d9fd495da057ccb670108580320b8cb0fc4faac5fc327f2622aa", size = 483123, upload-time = "2026-04-22T20:01:13.305Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/63/32/710a49ffd32b0a232ec1731620ee6105c045e9a77ecee1f3ecaa1a80a6cd/strands_agents_tools-0.5.2.tar.gz", hash = "sha256:96763c8ae75933c5dd327cca87561f573aed720c9c0f3d17fd20835910d11381", size = 483164, upload-time = "2026-04-30T17:08:13.151Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/64/59/79360f718683ae15cefeb8b0ca1e6d96608c4581280fb12b0f502375a705/strands_agents_tools-0.5.1-py3-none-any.whl", hash = "sha256:790865d073410e9a16ac44ce3a46c169b98e1f89844ce8670472b869257b7686", size = 316122, upload-time = "2026-04-22T20:01:11.599Z" },
+    { url = "https://files.pythonhosted.org/packages/59/ef/fe73b6d25d095784d2e1f6f33419265e796143100fb2f32a6e86f8ae68af/strands_agents_tools-0.5.2-py3-none-any.whl", hash = "sha256:8f85e4cb28d9411e62e1f159aa7e300d3a0f4b1d2b878a7cdfd5d746d9333343", size = 316178, upload-time = "2026-04-30T17:08:11.416Z" },
 ]
 
 [[package]]
@@ -4567,11 +4589,11 @@ wheels = [
 
 [[package]]
 name = "types-aiofiles"
-version = "25.1.0.20251011"
+version = "25.1.0.20260409"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/84/6c/6d23908a8217e36704aa9c79d99a620f2fdd388b66a4b7f72fbc6b6ff6c6/types_aiofiles-25.1.0.20251011.tar.gz", hash = "sha256:1c2b8ab260cb3cd40c15f9d10efdc05a6e1e6b02899304d80dfa0410e028d3ff", size = 14535, upload-time = "2025-10-11T02:44:51.237Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/6c/66/9e62a2692792bc96c0f423f478149f4a7b84720704c546c8960b0a047c89/types_aiofiles-25.1.0.20260409.tar.gz", hash = "sha256:49e67d72bdcf9fe406f5815758a78dc34a1249bb5aa2adba78a80aec0a775435", size = 14812, upload-time = "2026-04-09T04:22:35.308Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/71/0f/76917bab27e270bb6c32addd5968d69e558e5b6f7fb4ac4cbfa282996a96/types_aiofiles-25.1.0.20251011-py3-none-any.whl", hash = "sha256:8ff8de7f9d42739d8f0dadcceeb781ce27cd8d8c4152d4a7c52f6b20edb8149c", size = 14338, upload-time = "2025-10-11T02:44:50.054Z" },
+    { url = "https://files.pythonhosted.org/packages/27/d0/28236f869ba4dfb223ecdbc267eb2bdb634b81a561dd992230a4f9ec48fa/types_aiofiles-25.1.0.20260409-py3-none-any.whl", hash = "sha256:923fedb532c772cc0f62e0ce4282725afa82ca5b41cabd9857f06b55e5eee8de", size = 14372, upload-time = "2026-04-09T04:22:34.328Z" },
 ]
 
 [[package]]
diff --git a/docs/feature-summaries/MULTIMODAL_FILE_ATTACHMENTS.md b/docs/feature-summaries/MULTIMODAL_FILE_ATTACHMENTS.md
index 52a593a9..49fb58f3 100644
--- a/docs/feature-summaries/MULTIMODAL_FILE_ATTACHMENTS.md
+++ b/docs/feature-summaries/MULTIMODAL_FILE_ATTACHMENTS.md
@@ -42,7 +42,7 @@ Users can attach files to chat messages. Files are uploaded to S3 via pre-signed
 │  Frontend                      Backend                         AWS          │
 │  ────────                      ───────                         ───          │
 │                                                                             │
-│  1. POST /chat/agent-stream    2. FileResolver.resolve_files()              │
+│  1. POST /chat/stream (BFF)    2. FileResolver.resolve_files()              │
 │     {message, file_upload_ids}    ─────────────────────────────► S3        │
 │     ─────────────────────────►    - Fetch each file from S3                │
 │                                   - Base64 encode content                   │
diff --git a/docs/feature-summaries/RBAC_IMPLEMENTATION.md b/docs/feature-summaries/RBAC_IMPLEMENTATION.md
index 2b0f185b..0339f70d 100644
--- a/docs/feature-summaries/RBAC_IMPLEMENTATION.md
+++ b/docs/feature-summaries/RBAC_IMPLEMENTATION.md
@@ -116,10 +116,10 @@ async def critical_endpoint(user: User = Depends(require_all_roles("Admin", "Sec
 ### Conditional Features
 
 ```python
-from apis.shared.auth import get_current_user, has_any_role
+from apis.shared.auth import get_current_user_from_session, has_any_role
 
 @router.get("/dashboard")
-async def dashboard(user: User = Depends(get_current_user)):
+async def dashboard(user: User = Depends(get_current_user_from_session)):
     """All authenticated users can access, but admins see extra data."""
     response = {"user": user.email}
 
@@ -303,7 +303,7 @@ The dependency automatically:
 
 1. **Always use dependencies** - Never manually check roles
 2. **Log admin actions** - Audit trail for compliance
-3. **Use specific roles** - Prefer `require_admin` over `get_current_user` for sensitive operations
+3. **Use specific roles** - Prefer `require_admin` over `get_current_user_from_session` for sensitive operations
 4. **Never disable auth in production** - `ENABLE_AUTHENTICATION=false` is for development only
 5. **Validate on every request** - Stateless authentication, no sessions
 6. **Use HTTPS in production** - Protect tokens in transit
diff --git a/docs/kaizen/decisions.md b/docs/kaizen/decisions.md
new file mode 100644
index 00000000..ddd0eebe
--- /dev/null
+++ b/docs/kaizen/decisions.md
@@ -0,0 +1,26 @@
+# Kaizen Decisions Log
+
+Declined proposals and corrected premises. `kaizen-research` and `kaizen-review-prep`
+**must not re-propose** anything here without *materially new context* (a new capability,
+a changed upstream constraint, or a new exploit/failure path). Each entry records what
+the new context would have to be to re-open it.
+
+---
+
+### [2026-05-18] Declined — Add Reddit `.rss` or Reddit MCP to `kaizen-research`
+- **Origin**: review-queue.md (open since 2026-05-10) ▸ research/2026-05-10.md Risks; recommended Decline in reviews/2026-05-15.md ▸ Retirement Candidates.
+- **Decision**: Decline.
+- **Reasoning**: research/2026-05-15.md confirmed that Reddit is blocked at the **domain level** for WebFetch requests, not just on the HTML path. The proposal as scoped (add a Reddit `.rss` source to the research skill) is infeasible with current tooling.
+- **Re-open only if**: a Reddit MCP server becomes available, or a `curl`-via-Bash path with a custom User-Agent header is whitelisted. Absent one of those, do not re-surface.
+
+### [2026-05-18] Premise corrected — "Close #266 / #267 as phantom tech debt"
+- **Origin**: review-queue.md (open since 2026-05-10) ▸ reviews/2026-05-15.md ▸ Proposal #7 ("close phantom tech debt; features already in our Strands 1.39 pin"). Actioned via PR #338.
+- **Decision**: Premise rejected. Issues **#266** (large tool-result offload) and **#267** (context-window lookup fallback) are **not** phantom debt — they are live, well-specified Strands adoption/wiring tasks whose 1.39 precondition is now met. PR #338 posted "unblocked, keep open" comments on both rather than closing them.
+- **Reasoning**: The kaizen review assumed the upstream features being present in our pinned Strands version made the issues obsolete. They are not obsolete — they track the *wiring* work to actually adopt those features. Closing them would have silently dropped real, scoped backlog.
+- **Re-open only if**: there is no re-open path on the "already in our pin" basis; never re-propose *closing* #266/#267 on those grounds. They are valid open work; treat them as normal backlog, not kaizen retirement candidates. (Proposing to *implement* them is fine; this decision only rules out closing them.)
+
+### [2026-05-18] Scope note — "Adopt Strands built-in proactive compression, retire our custom `TurnBasedSessionManager` compaction"
+- **Origin**: the review-queue Strands-bump entry framed Strands 1.40 proactive compression (PR #2239) as a "library-native subtraction" reducing our custom session-manager compaction surface. Surfaced concretely in PR #340's "Subtraction opportunity (noted, NOT acted on)".
+- **Decision**: **Not a drop-in replacement.** Do not propose retiring our custom compaction on a bare "Strands now does this" basis.
+- **Reasoning**: Strands' built-in proactive compression operates on `ConversationManager` and only summarizes. Our `TurnBasedSessionManager` compaction additionally does: (1) tool-content truncation, (2) AgentCore-Memory long-term-summary retrieval, (3) DynamoDB-persisted checkpoint state — and drives the PR #243 `compaction` SSE event. The built-in managers do none of (1)–(3).
+- **Re-open only if**: a concrete migration design accounts for tool-content truncation, LTM summary retrieval, DynamoDB checkpoint persistence, and the `compaction` SSE-once invariant. A bare "adopt the built-in, delete ours" proposal is out of scope and should not be re-surfaced.
diff --git a/docs/kaizen/research/2026-05-10.md b/docs/kaizen/research/2026-05-10.md
new file mode 100644
index 00000000..b009ba2e
--- /dev/null
+++ b/docs/kaizen/research/2026-05-10.md
@@ -0,0 +1,265 @@
+# Kaizen Research — Sunday, May 10, 2026
+> Scan window: May 3 – May 10, 2026 (7 days; reference repo + UI/UX scan extended to 30 days for first-run baseline)
+> Web budget: 64 requests used against a target of 50 (the UX-lens scan added 10 requests after the initial run). The frontier-models scan also exceeded its sub-budget by ~5 requests because of two OpenAI WebFetch 403s.
+> **Bootstrap run** — first execution of the kaizen-research skill. Subsequent runs cover only the prior 7 days for the reference repo + UX sources too.
+
+## TL;DR
+
+Three converging signals this week:
+1. **MCP Apps is now the de-facto agentic UI standard**, and we don't host it yet. The spec (SEP-1865) is production-ready: tool results can declare a `ui://` resource that the host renders in a sandboxed iframe alongside the chat. Claude Desktop, ChatGPT, VS Code Copilot, Goose, Postman all ship support. Every third-party MCP server we connect could be shipping richer UX than text+JSON; we're leaving that on the table.
+2. **Upstream is shrinking our backlog for free**: our open issues #266/#267 were quietly solved in Strands v1.37/v1.38 (now in our 1.39 pin from #265); `bedrock-agentcore` is 3 minor versions behind (1.6.4 → 1.9.0; the latest was published May 7, inside the scan window) and likely includes fixes for two open SDK issues that affect us.
+3. **CI is broken**: 9 Nightly Build & Test failures + 6+ Deploy failures in 7 days, untriaged.
+
+**Recommended #1**: scope an MCP Apps host renderer in our chat (multi-PR initiative). It's the highest-leverage agentic-UX investment this week per the scan. **Recommended quick-win**: bump `bedrock-agentcore` 1.6.4 → 1.9.0.
+
+## External Scan
+
+### What's moving this week
+
+The week converged on two themes worth our attention. First, AWS shipped two AgentCore capabilities that map cleanly onto things we already do: **AgentCore Runtime BYO filesystem from S3/EFS** (cross-session filesystem persistence without custom mount code) and **AgentCore Memory metadata** (structured tags on long-term memory records for filtered retrieval). Both are direct value-adds to our `inference-api` and our `TurnBasedSessionManager` layer. Second, Strands has been cleaning up the long tail: v1.37 added a context-window lookup table (closes our open issue #267), v1.38 added large tool result offload (closes our open issue #266), and v1.39 — which we just pinned in #265 — added AWS-profile support for the OpenAI provider. We're caught up to the head, but we haven't yet *used* the v1.37/v1.38 features the upgrade unlocked.
+
+The reference repo (`aws-samples/sample-strands-agent-with-agentcore`) has diverged from us in one major direction (CDK → Terraform on Apr 19) and converged in several minor ones — most notably moving compaction state and per-message metadata onto Strands' own `agent.state` and `message.metadata` instead of a custom DynamoDB table. They also abandoned the `enabledTools` whitelist pattern that's still embedded in our CLAUDE.md, in favor of a `disabled_skills` blacklist read from DDB per-request. Those are architectural calls, not direct ports.
+
+The MCP spec is heading toward stateless transport (SEP-2567 sessionless MCP merged May 7), which is a strong fit for our SigV4 Gateway model — but our Python `mcp` library hasn't picked it up yet (current 1.27.1). Watch.
+
+### Notable items by source
+
+#### AWS Bedrock / AgentCore
+- **AgentCore Runtime BYO file system from S3 and EFS** — Attach S3/EFS to runtimes for cross-session persistence without custom mount code — https://aws.amazon.com/about-aws/whats-new/2026/05/amazon-bedrock-agentcore-runtime/ — *relevance*: directly applicable to `inference-api`; could replace future filesystem-staging glue
+- **AgentCore Memory adds metadata for long-term memory** — Long-term memory records now support structured metadata for filtered retrieval — https://aws.amazon.com/about-aws/whats-new/2026/05/agentcore-longterm-memory-metadata — *relevance*: `TurnBasedSessionManager` long-term flush could carry user/RBAC/conversation-type metadata for richer recall
+- **Secure AI agents with AgentCore Identity on Amazon ECS** — OAuth federation walkthrough for ECS-hosted agents — https://aws.amazon.com/blogs/machine-learning/secure-ai-agents-with-amazon-bedrock-agentcore-identity-on-amazon-ecs/ — *relevance*: useful reference; pattern mirrors our `apis/shared/oauth/agentcore_identity.py` mint-fallback
+- **OS-Level Actions in AgentCore Browser** — OS-level control for native UI agents — https://aws.amazon.com/blogs/machine-learning/introducing-os-level-actions-in-amazon-bedrock-agentcore-browser/ — *relevance*: informational; we don't use AgentCore Browser
+- **AgentCore Payments preview** — Wallet/auth/governance for transactional agents (Coinbase + Stripe partners) — https://aws.amazon.com/blogs/machine-learning/agents-that-transact-introducing-amazon-bedrock-agentcore-payments-built-with-coinbase-and-stripe/ — *relevance*: informational; no commerce path today
+
+**Open AgentCore SDK issues affecting us:**
+- **#456 — OTEL context detached across asyncio/thread boundaries in memory client + Strands session_manager** — https://github.com/aws/bedrock-agentcore-sdk-python/issues/456 — *applicability*: HIGH — we use Strands 1.39 + AgentCore Memory + `TurnBasedSessionManager`; X-Ray/OTEL traces likely show broken spans on memory writes
+- **#452 — AgentCoreMemorySessionManager: add `async_mode` to prevent event-loop blocking** — https://github.com/aws/bedrock-agentcore-sdk-python/issues/452 — *applicability*: HIGH — `inference-api` is FastAPI/async; sync flush on the loop could be hurting concurrency
+- **#453 — Auto-populate AgentCard.skills[] from ToolRegistry in serve_a2a** — *applicability*: medium; relevant if/when we expose A2A endpoints
+
+#### Strands Agents
+- **v1.39.0 (current pin)** — AWS profile support for OpenAI, MCP init error messaging, Bedrock token-counting enhancements, A2A task-lifecycle states — https://github.com/strands-agents/sdk-python/releases/tag/v1.39.0 — *informational*: just landed in #265
+- **v1.38.0 — large tool result offload + `CachePoint` TTL for prompt caching** — https://github.com/strands-agents/sdk-python/releases/tag/v1.38.0 — *closes our issue #266*
+- **v1.37.0 — context-window limit lookup tables + experimental checkpoint API** — https://github.com/strands-agents/sdk-python/releases/tag/v1.37.0 — *closes our issue #267*
+- **#2266 — `BedrockModel.stream` leaks inner task on outer cancellation (May 9, open)** — https://github.com/strands-agents/sdk-python/issues/2266 — *applicability*: HIGH — we cancel SSE streams on client disconnect; check for "Task exception was never retrieved" in stream_coordinator logs
+- **#2271 — Support dual cache prefixes in Bedrock auto caching strategy (May 10)** — https://github.com/strands-agents/sdk-python/issues/2271 — *applicability*: medium; pairs with issue #269 (prompt caching) if we move to Strands' built-in caching strategy
+- **#2243 — Tool-level suspend/resume for external async callbacks** — https://github.com/strands-agents/sdk-python/issues/2243 — *applicability*: medium; could simplify our `oauth_required` SSE handoff
+- **PR #2239 — Proactive Context Compression (merged May 8)** — https://github.com/strands-agents/sdk-python/pull/2239 — *applicability*: medium; could complement our SSE compaction surfacing
+
+#### Reference repo: aws-samples/sample-strands-agent-with-agentcore (last 30 days — bootstrap baseline)
+- **CDK → Terraform migration (Apr 19, c422fbf)** — https://github.com/aws-samples/sample-strands-agent-with-agentcore/commit/c422fbf — *applicability*: NOT relevant for porting; we're CDK-native. **Implication**: the reference repo is no longer a usable CDK template going forward. Anything CDK-shaped historically pulled from them is frozen at pre-Apr-19 state.
+- **Compaction state + metrics moved from custom DynamoDB to SDK `agent.state` + `message.metadata` (Apr 27, 2b1a13d)** — https://github.com/aws-samples/sample-strands-agent-with-agentcore/commit/2b1a13d — *applicability*: HIGH — our `TurnBasedSessionManager` could shed code by piggybacking on `agent.state` rather than maintaining parallel state; potential subtraction
+- **Force re-auth on OAuth 401/403 mid-tool-call (Apr 22, 9fcdb4c)** — https://github.com/aws-samples/sample-strands-agent-with-agentcore/commit/9fcdb4c — *applicability*: HIGH — verify our `oauth_required` SSE flow handles mid-conversation 401/403 from Google etc. by re-emitting `oauth_required` rather than streaming an error
+- **Supersede stale executions instead of 409-rejecting (May 6, d6c9516)** — https://github.com/aws-samples/sample-strands-agent-with-agentcore/commit/d6c9516 — *applicability*: medium; check how app-api handles concurrent submissions on the same conversation
+- **Use SDK `agent.cancel()` for stop-signal handling (May 6, fd9acec)** — https://github.com/aws-samples/sample-strands-agent-with-agentcore/commit/fd9acec — *applicability*: medium; if we have custom cancellation code, may simplify
+- **`enabledTools` whitelist replaced with `disabled_skills` blacklist (May 3, 092aa33)** — https://github.com/aws-samples/sample-strands-agent-with-agentcore/commit/092aa33 — *applicability*: monitor; our CLAUDE.md still mentions `enabled_tools` as a debug step. Inversion has UX upside but RBAC implications
+
+#### MCP ecosystem
+- **SEP-2567 Sessionless MCP merged (May 7)** — https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2567 — *implications*: drops `Mcp-Session-Id` and `session/create`; list endpoints become cacheable. Strong fit for our SigV4 Gateway model. Watch python `mcp` library for adoption.
+- **SEP-2575 init-removal track (companion)** — same thread — *implications*: stateless HTTP transport simplifies Lambda-backed Gateway servers
+- **Schema rename: `IncompleteResult` → `InputRequiredResult`** — typed-API break on next `mcp` lib bump
+- **MCPSafe — security scanner for MCP servers** — https://github.com/orgs/modelcontextprotocol/discussions — could scan our Gateway-hosted servers
+- **MCP servers repo (no new servers this week)** — discovery has moved to `registry.modelcontextprotocol.io`
+
+#### FastMCP (used by our externally hosted MCP servers, behind AgentCore Gateway)
+- **Latest release: 3.2.4** — published 2026-04-14 (~26 days ago) — https://pypi.org/project/fastmcp/ — *applicability*: cross-reference against our MCP server repos' pinned FastMCP version; if any are behind 3.x, evaluate the migration path.
+- **Bootstrap-run note**: FastMCP source category was added mid-bootstrap based on follow-up feedback. Full release-notes + issues scan (https://github.com/jlowin/fastmcp) deferred to the first regular Friday run (2026-05-15). For this bootstrap, only the PyPI version snapshot is captured.
+
+#### Agentic UI/UX patterns (30-day baseline scan for bootstrap)
+
+- **MCP Apps extension is production-ready (SEP-1865)** — https://modelcontextprotocol.io/extensions/apps/overview | https://blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps/ — *what it is*: spec letting MCP tools return a `_meta.ui.resourceUri` pointing to a `ui://` resource; host fetches the HTML and renders it in a sandboxed iframe alongside the chat with bidirectional `ui/`-prefixed JSON-RPC via `postMessage`. Claude Desktop, ChatGPT, VS Code Copilot, Goose, Postman, MCPJam already ship support. — *fit*: **direct port (high impact, high effort)** — this is the standard our chat is going to be measured against in 2026. — *where it'd land*: new SSE event (`ui_resource` carrying `{resourceUri, csp, permissions}`), Angular `<mcp-app-frame>` sandboxed-iframe component implementing the `ui/` host bridge, branch in tool-result rendering pipeline.
+- **MCP Apps host security model — sandboxed iframe + opt-in capabilities** — https://modelcontextprotocol.io/extensions/apps/overview — *what it is*: hosts declare capabilities (`sendOpenLink`, mic, camera) a given app can request; tool-call proxying goes through the host with user consent. — *fit*: **direct port** — maps cleanly onto our existing `oauth_required` consent pattern. — *where it'd land*: extend `oauth_required` SSE event family with `ui_consent_required`; reuse per-provider consent badge UI.
+- **MCP Apps example servers** — https://github.com/modelcontextprotocol/ext-apps/tree/main/examples — *what it is*: starter servers for data exploration (cohort heatmap, customer segmentation), forms (scenario modeler, budget allocator), media (PDF, video, sheet music), 3D (Cesium, Three.js). — *fit*: pattern-only — templates are React/Vue/Svelte but the protocol is framework-agnostic. — *informs*: the kinds of internal tools we'd expose as MCP Apps once we host.
+- **AI SDK "Render Visual Interface in Chat" recipe** — https://ai-sdk.dev/cookbook — *what it is*: pattern where tool results map to specific UI components on the client, model drives which component renders. — *fit*: pattern-only (React hook). — *Angular equivalent*: a `toolRenderers` registry keyed by tool name, with a signal-driven `<tool-result>` component doing `@switch (toolName())` over registered renderers. We do a coarse version today; the pattern argues for making it a first-class extension point so per-tool components live next to the tool definition rather than in a god-switch.
+- **AI SDK "Call Tools in Multiple Steps" / `streamText` multi-step** — https://ai-sdk.dev/cookbook — *fit*: pattern-only. — *Angular equivalent*: keep `signal()`-backed tool-call state mutable across the conversation (don't freeze at `tool_result`), so prior tool-call cards stay interactive as new steps stream in.
+- **assistant-ui @0.14.0 (2026-05-07)** — https://github.com/Yonom/assistant-ui/releases — API consolidation (`useAui` replaces deprecated naming). Also: `mcp-app-studio` package updated alongside — assistant-ui is shipping first-party MCP Apps authoring/preview tooling. — *signal*: **MCP Apps is the assumed UI surface** for serious agentic chat shells going forward.
+- **"Output isn't design" — Karri Saarinen, Linear (2026-04-17)** — https://linear.app/now/output-isn-t-design — *takeaway*: pointed pushback on generative-UI hype. "Plausible-looking generated interfaces unravel the moment you actually use them" because the work of resolving tensions and edge cases hasn't happened. — *implication for us*: when we add MCP Apps, treat the iframe as a vehicle for *purpose-built* UIs (forms, viewers), not as a "let the model generate a UI" shortcut.
+- **"Interact with agent-created visualizations in canvases" — Cursor (2026-04-15)** — https://www.cursor.com/blog/canvas — *takeaway*: agent output that's interactive (charts you can drill into, plots you can re-parameterize) is now table stakes in agentic IDEs. Maps to our PDF/markdown/spreadsheet preview surface — direction is "previews become interactive viewers," not static thumbnails.
+- **Linear Agent as named participant** — https://linear.app/now/how-we-use-linear-agent-at-linear (2026-04-10) + https://linear.app/changelog/2026-04-23-linear-agent-mcp-support — *pattern*: Linear's agent reads context via MCP and posts back as a structured agent identity in the issue thread (not as a chat message). **Agents as named participants with distinct affordances**, not just a stream of assistant text. — *fit for us*: worth considering for our multi-agent A2A flows — A2A sub-agents could render as distinct attributed turns rather than nested tool calls.
+- **"Claude Design by Anthropic Labs" (2026-04-17)** — https://www.anthropic.com/news/claude-design-anthropic-labs — *takeaway*: "collaborate with Claude to produce polished visual work" as a first-class output type. Validates investing in artifact-style rendering surfaces beyond plain markdown.
+- **NN/g "Designing AI Agents: 4 Lessons from China's Qwen Agent" (2026-05-08)** — https://www.nngroup.com/articles/designing-ai-agents/ — *evidence-based principles*: support discoverability, reuse familiar patterns, handle personal data carefully, protect user autonomy. — *applicability*: **discoverability** — tool-call rendering should surface available tools *before* the user has to phrase the right prompt (slash menu, suggestions from `enabled_tools`). **Autonomy** — our `oauth_required` consent event is on-pattern; extend the same explicit-consent model to MCP-Apps-initiated tool calls.
+- **OpenAI AgentKit / Agent Builder visual canvas** — https://openai.com/index/introducing-agentkit/ — *takeaway*: agent *authoring* is moving to visual node-graphs. Not directly applicable to our runtime chat, but a signal that **agent-state visibility** (which agent is running, which tool, what step) is increasingly expected at runtime too — relevant to how we render A2A and multi-step tool flows.
+
+#### Frontier model announcements
+- **Anthropic — higher Opus rate limits (May 6)** — https://www.anthropic.com/news/higher-limits-spacex — informational; we use Bedrock-hosted, not first-party
+- **Anthropic — finance-agents pack (May 5)** — https://www.anthropic.com/news/finance-agents — Moody's MCP server is a concrete public MCP we could register in Gateway if a finance use case emerges
+- **OpenAI — GPT-5.5 Instant displaces GPT-5.3 Instant (May 5)** — https://openai.com/index/gpt-5-5-instant/ — *risk*: confirm our model selector doesn't expose a deprecation-path 5.3 ID
+- **Google / Gemini** — quiet week (no new model/API deltas)
+- **Meta / Llama** — quiet week
+
+#### Agent harness patterns
+- **Claude Code 2.1.136 — skills-under-plugins fix + MCP content-block fix** — https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md — *relevance*: skill loading + MCP tool result rendering
+- **Claude Code 2.1.133 — hooks receive `effort.level` + `worktree.baseRef` setting** — same URL — pattern worth mirroring in Strands hook payloads
+- **Claude Code 2.1.132 — `CLAUDE_CODE_SESSION_ID` env into Bash subprocess** — same URL — session-id-everywhere pattern we already do loosely
+- **CMA_coordinate_specialist_team.ipynb (May 6)** — https://github.com/anthropics/claude-cookbooks/tree/main/managed_agents — coordinator + 3 specialists with per-role tool scoping
+- **CMA_verify_with_outcome_grader.ipynb (May 6)** — same repo — writer/grader loop with `user.define_outcome` rubrics; could bolt onto SSE for tool-result fact-checking
+- **Agent Development Lifecycle (LangChain blog, May 9)** — https://www.langchain.com/blog/the-agent-development-lifecycle — our kaizen cadence already covers most of this; gap is "online evals"
+
+#### Pricing / quota
+- No detected Bedrock or AgentCore pricing changes this week
+- Note: `https://aws.amazon.com/bedrock/whats-new/` returned **404** — page appears retired. Skill source URL needs replacement.
+
+#### Community + GitHub issues
+- HN: 0 hits for stack keywords (`bedrock`, `agentcore`, `strands`, `mcp`, `claude code`) in the 7-day window — quiet
+- Reddit: blocked from WebFetch in this environment — gap to address (`.rss` or Reddit MCP)
+
+#### Cookbook / courses
+- 4 new managed-agent cookbooks landed May 5–8 (vulnerability detection, coordinator/specialists, outcome grading, registry category)
+- `anthropics/courses` quiet (last commit Nov 2025) — candidate to drop from weekly scan
+
+#### Seasonal
+- Out of window — no re:Invent or NeurIPS items
+
+### Patterns worth considering
+
+- **Online evals via grader sub-agent** — sample N% of conversation turns, run a stateless grader, persist outcomes. Fits LangChain's Agent Development Lifecycle framing and the CMA outcome-grader cookbook. **Verdict**: monitor — interesting once we've shipped the core cleanups below.
+- **Brain/hands separation** (Anthropic Managed Agents direction) — push session/checkpoint store outside the agent process. We already do this via AgentCore Memory; fully aligned. **Verdict**: aligned, no action.
+- **Sessionless MCP** (SEP-2567) — list endpoints cacheable per (deployment, auth). Direct fit for SigV4 Gateway. **Verdict**: monitor; act when python `mcp` library adopts.
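+
+The "sample N% of conversation turns" step in the online-evals pattern above can be sketched as a deterministic hash bucket, so a given turn is always either in or out of the sample and the grader stays stateless. This is a minimal sketch under assumptions: `should_grade`, `turn_id`, and `sample_percent` are hypothetical names, not existing code in this repo.
+
+```python
+import hashlib
+
+
+def should_grade(turn_id: str, sample_percent: int) -> bool:
+    """Return True if this turn falls inside the sampled percentage.
+
+    Hypothetical helper: hashes the turn id into a bucket 0..99 so the
+    decision is stable across processes and retries.
+    """
+    digest = hashlib.sha256(turn_id.encode("utf-8")).digest()
+    bucket = int.from_bytes(digest[:4], "big") % 100
+    return bucket < sample_percent
+```
+
+Hashing the id rather than calling a random generator means a retried or replayed turn gets the same sampling decision, which keeps persisted grader outcomes from being duplicated.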
+
+## Internal Audit
+
+### Activity (last 7 days)
+- **Commits on develop**: 8 (all from squash-merged PRs)
+- **PRs opened**: 5 (4 dependabot — #237/#239/#241 still open, plus #276 docs)
+- **PRs merged**: 8
+- **PRs reverted**: 0
+- **Issues opened**: 4 (#266, #267, #268, #269 on May 9 — Strands-features and prompt caching)
+- **CI failures (workflow → count)**: Nightly Build & Test 9, Deploy Inference API 5, Deploy App API 6, Deploy Frontend 1, Version Check 6, Deploy Infrastructure 2
+
+### Repeated friction signals
+- **Nightly Build & Test failing 9× since May 6** — concentrated cluster; no signal it's been investigated. Could be the test flakiness from issue #220 (order-dependent flakiness in `test_cognito_idp_service`, `test_oauth_repositories`, `test_auth_providers*`) compounding, or a different cause. *Hypothesis*: untriaged. *Fix candidate*: triage one failure end-to-end; promote to a blocking issue if not already on the board.
+- **Deploy workflows failing 6+ times May 6–9** — Inference API, App API, Frontend deploys all hit failures. *Hypothesis*: BFF migration shipped this week (#272–#277) introduced env-var or stack drift not caught in synth. *Fix candidate*: cross-check most recent failed deploy log against beta.24 ↔ post-beta.24 stack diff.
+- **5 of 8 commits this week are BFF/auth fixes** (#270, #271, #273, #274, #275, #277) — the BFF migration shipped in beta.24 is still being patched. Healthy iteration, but the pace says "treat BFF as not-done-yet" before declaring beta.25.
+
+### Version-pin lag
+
+| Dep | Pinned | Latest | Lag | Notes |
+|---|---|---|---|---|
+| `bedrock-agentcore` | 1.6.4 | **1.9.0** | 3 minor / latest 2026-05-07 | Open issues #456 (OTEL detach) and #452 (event-loop blocking) may already be addressed |
+| `boto3` | 1.42.96 | 1.43.6 | 1 minor / ~10 patches | Routine bump |
+| `aws-cdk-lib` | 2.251.0 | 2.253.1 | 2 minor | Routine |
+| `aws-cdk` | 2.1120.0 | 2.1121.0 | 1 minor | Routine |
+| `@angular/core` | 21.2.11 | 21.2.12 | 1 patch | Routine |
+| `strands-agents` | 1.39.0 | 1.39.0 | current | Just upgraded in #265 |
+| `fastapi` | 0.136.1 | 0.136.1 | current | — |
+| `mcp` | (transitive) | 1.27.1 | n/a | Watch for SEP-2567 adoption |
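+
+The Lag column can be derived mechanically from the Pinned and Latest columns. The sketch below assumes plain `major.minor.patch` version strings and reports only the most significant component that differs; `parse` and `classify_lag` are hypothetical helpers, not part of the skill, and pre-release suffixes or a pin newer than latest are not handled.
+
+```python
+def parse(version: str) -> tuple[int, int, int]:
+    """Split a 'major.minor.patch' string into integers."""
+    major, minor, patch = (int(part) for part in version.split("."))
+    return major, minor, patch
+
+
+def classify_lag(pinned: str, latest: str) -> str:
+    """Describe the lag by the most significant differing component."""
+    p, l = parse(pinned), parse(latest)
+    if p == l:
+        return "current"
+    if l[0] != p[0]:
+        return f"{l[0] - p[0]} major"
+    if l[1] != p[1]:
+        return f"{l[1] - p[1]} minor"
+    return f"{l[2] - p[2]} patch"
+```
+
+For example, `classify_lag("1.6.4", "1.9.0")` yields `"3 minor"` for `bedrock-agentcore`, and `classify_lag("21.2.11", "21.2.12")` yields `"1 patch"` for `@angular/core`.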
+
+### Retirement candidates
+
+- **`enabled_tools` whitelist debug guidance in `CLAUDE.md`** — Reference repo abandoned this pattern May 3 (`092aa33`) for `disabled_skills` blacklist. Not urgent retirement, but worth a re-evaluation if we touch tool-enablement code.
+- **`anthropics/courses` source in `kaizen-research`** — Last commit Nov 2025; subagent reported "quiet". Drop from weekly scan list.
+- **`https://aws.amazon.com/bedrock/whats-new/` URL in `kaizen-research`** — 404'd on this run. Replace with the AWS What's New RSS feed only, or a different filtered URL.
+- **`https://docs.claude.com/en/docs/claude-code/release-notes` URL in `kaizen-research`** — 301→404. Replace with `https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md`.
+- **6 of 9 skills not modified in 60+ days** (angular-best-practices, tailwind-ui, frontend-design, cdk-infrastructure, versioning, cors-deployment) — modification freshness alone is a weak signal for skills since they encode stable conventions. **No retirement recommended without invocation telemetry.**
+
+### Risks introduced this week
+
+- **`bedrock-agentcore` 3 minor versions behind** with a release in the scan window — issues #456 (OTEL trace detach in Memory + Strands session_manager) and #452 (async-mode for AgentCoreMemorySessionManager) may be already-fixed in 1.7-1.9. *What breaks if ignored*: silent observability gaps (broken spans on memory writes); concurrency degradation under load. — https://pypi.org/project/bedrock-agentcore/
+- **OpenAI displaces GPT-5.3 Instant with GPT-5.5 Instant (May 5)** — our model selector exposes per-model IDs. *What breaks if ignored*: customers using a 5.3 default may hit a deprecation window. — https://openai.com/index/gpt-5-5-instant/
+- **Strands #2266 — `BedrockModel.stream` leaks inner task on outer cancellation** — we cancel SSE streams on client disconnect. *What breaks if ignored*: orphaned tasks, "Task exception was never retrieved" log noise, possible memory pressure under churn. — https://github.com/strands-agents/sdk-python/issues/2266
+- **Reddit blocked from WebFetch** in the kaizen-research environment — community-signal scan is half-blind. — *Fix*: switch to `https://www.reddit.com/r/<sub>/.rss` or a configured Reddit MCP server.
+
+## Ideas — Top 6 (ranked)
+
+> Bootstrap exceptionally lists 6 (vs the skill's nominal 5) because the UI/UX lens was added mid-run and surfaced an MCP Apps initiative worth ranking. Regular runs target 5.
+
+| # | Idea | Surface | Effort | Impact | Subtracts? |
+|---|---|---|---|---|---|
+| 1 | Scope an MCP Apps host renderer in our chat (multi-PR initiative) | frontend + backend (SSE event + component) | H | H | no — additive, but unlocks every future MCP server shipping a UI |
+| 2 | Bump `bedrock-agentcore` 1.6.4 → 1.9.0; verify SDK issues #456/#452 are addressed | backend | L | M | no — pure dep bump (justified: 3 versions of upstream fixes, latest in scan window) |
+| 3 | Promote tool-result rendering to a per-tool renderer registry (signal-backed) | frontend | M | M-H | partial — replaces an implicit switch with an explicit registry; bridges toward MCP Apps |
+| 4 | Audit `BedrockModel.stream` cancellation path against Strands #2266 | backend | L | M-H | no — defensive; SSE-disconnect path is hot |
+| 5 | Close issues #266 and #267 — features already in our Strands 1.39 pin; replace with smaller "wire upstream feature" tasks | cross-cutting | L | M | **yes — retires 2 build-from-scratch tickets (library-native subtraction)** |
+| 6 | Triage Nightly Build & Test failure cluster (9× since May 6) | cross-cutting / CI | L-M | M-H | possibly — if root is issue #220, fixing it simplifies suite |
+
+### 1. Scope an MCP Apps host renderer in our chat
+- **Source**: research/2026-05-10.md ▸ Agentic UI/UX ▸ MCP Apps (SEP-1865, production-ready); cross-confirmed by assistant-ui's `mcp-app-studio` direction
+- **Surface area**: frontend (new `<mcp-app-frame>` Angular component, tool-result rendering pipeline branch) + backend (new SSE event `ui_resource`; possibly extend `oauth_required` family with `ui_consent_required`)
+- **Change**: implement the host side of the MCP Apps spec — sandboxed iframe rendering `ui://` resources returned by MCP tools, with the `ui/`-prefixed JSON-RPC dialect over `postMessage`. Consent UX reuses the existing `oauth_required` pattern. Treat as a multi-PR initiative: (a) SSE event + plumbing, (b) iframe component + postMessage bridge, (c) consent UI, (d) end-to-end with one example MCP App from `ext-apps/examples`.
+- **Subtracts**: no — pure addition. Justified because every major host already ships this; without it, third-party MCP servers we connect can't deliver UI beyond text+JSON, and we become a less capable platform than the rest of the ecosystem.
+- **Effort × Impact**: High × High
+- **Verdict**: Worth scoping (formal scoping doc before any code). Could comfortably be a 3-4 week initiative spanning multiple sprints.
+
+### 2. Bump `bedrock-agentcore` 1.6.4 → 1.9.0
+- **Source**: PyPI (https://pypi.org/project/bedrock-agentcore/) + open SDK issues #456, #452
+- **Surface area**: `backend/pyproject.toml`, `backend/uv.lock`
+- **Change**: pin update + smoke-test memory + identity flows in dev; verify CHANGELOG between 1.6 and 1.9 for any breaking changes
+- **Subtracts**: addition only — justified by 3 versions of upstream fixes including likely-relevant OTEL trace detach (#456) and event-loop blocking (#452)
+- **Effort × Impact**: Low × Medium
+- **Verdict**: Worth trying
+
+### 3. Promote tool-result rendering to a per-tool renderer registry
+- **Source**: research/2026-05-10.md ▸ Agentic UI/UX ▸ AI SDK generative-UI recipes + Cursor canvases
+- **Surface area**: frontend (`<tool-result>` component or equivalent, plus a new `ToolRendererRegistry` service)
+- **Change**: today our tool-result rendering is (implicitly) a switch in one place. Promote to a signal-backed registry keyed by tool name; per-tool renderers live next to the tool definition. Bridges naturally toward MCP Apps (which would just be "another registered renderer that emits an iframe"). Lifts a chunk of switch-like code into a declarative table.
+- **Subtracts**: partial — replaces an implicit switch with an explicit registry; the registry's existence is more code, but it absorbs scattered tool-specific UI logic into one place.
+- **Effort × Impact**: Medium × Medium-High
+- **Verdict**: Worth trying — independently valuable AND pre-work for proposal #1.
+
+### 4. Audit `BedrockModel.stream` cancellation path
+- **Source**: Strands open issue #2266 (filed May 9)
+- **Surface area**: `backend/src/agents/main_agent/` stream coordinator + SSE handler
+- **Change**: locate where we cancel `BedrockModel.stream`; ensure we `await task` on cancel paths so tasks don't orphan; add a log assertion in dev to detect "Task exception was never retrieved"
+- **Subtracts**: addition only — defensive
+- **Effort × Impact**: Low × Medium-High
+- **Verdict**: Worth trying
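+
+The "`await task` on cancel paths" step in the change above can be sketched with plain `asyncio`. This is a minimal sketch, not the stream coordinator's actual code: `cancel_and_wait`, `fake_model_stream`, and `demo` are hypothetical names, and the long sleep stands in for an in-flight `BedrockModel.stream` call.
+
+```python
+import asyncio
+import contextlib
+
+
+async def cancel_and_wait(task: asyncio.Task) -> None:
+    """Cancel a task and wait for it to finish so it is never orphaned.
+
+    Awaiting the task retrieves its result or exception, which is what
+    avoids "Task exception was never retrieved" warnings. Note that the
+    suppress() also swallows a cancellation aimed at the caller itself;
+    a production version may want to re-raise in that case.
+    """
+    task.cancel()
+    with contextlib.suppress(asyncio.CancelledError):
+        await task
+
+
+async def fake_model_stream(cleaned_up: list) -> None:
+    """Stand-in for a long-running model stream with cleanup logic."""
+    try:
+        await asyncio.sleep(3600)
+    finally:
+        cleaned_up.append(True)
+
+
+async def demo() -> bool:
+    cleaned_up: list = []
+    task = asyncio.create_task(fake_model_stream(cleaned_up))
+    await asyncio.sleep(0)  # yield once so the task starts running
+    await cancel_and_wait(task)
+    # The task is cancelled and its cleanup block has already run.
+    return task.cancelled() and cleaned_up == [True]
+```
+
+Running `asyncio.run(demo())` exercises the cancel path end to end; the same helper could wrap whatever inner task the SSE handler creates on client disconnect.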
+
+### 5. Close issues #266 and #267 — features already in our Strands 1.39 pin
+- **Source**: Strands v1.37 (PR #2249, context-window lookup) + v1.38 (large tool result offload)
+- **Surface area**: GitHub issues + small wiring in `stream_coordinator` and tool-result handling for spreadsheet/Code Interpreter outputs
+- **Change**: close #266 and #267 with comments pointing at upstream PRs; replace with smaller "wire context-window lookup" and "wire large tool result offload" tasks if the wiring isn't automatic
+- **Subtracts**: **yes — retires 2 "build from scratch" issues; replaces with at-most 2 "use upstream feature" tasks. Library-native subtraction.**
+- **Effort × Impact**: Low × Medium
+- **Verdict**: Worth trying
+
+### 6. Audit `oauth_required` SSE flow against ref-repo's mid-tool-call 401/403 handling
+- **Source**: aws-samples/sample-strands-agent-with-agentcore commit `9fcdb4c`
+- **Surface area**: `backend/src/apis/shared/oauth/agentcore_identity.py`, SSE event emission in `inference-api`, MCP/A2A tool wrappers
+- **Change**: ensure mid-conversation 401/403 from Google/external OAuth providers re-emits `oauth_required` (consent-resume) rather than streaming a tool error to the user
+- **Subtracts**: addition only — defensive; closes a real UX gap when upstream tokens revoke mid-stream
+- **Effort × Impact**: Medium × High
+- **Verdict**: Worth trying
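+
+The change above amounts to one mapping decision at the point where a tool failure becomes an SSE event. The sketch below is an assumption-heavy illustration: `ToolHTTPError`, `to_sse_event`, the `tool_error` event name, and the payload shape are all hypothetical; only the `oauth_required` event name comes from our existing flow.
+
+```python
+class ToolHTTPError(Exception):
+    """Hypothetical error raised when an upstream provider rejects a call."""
+
+    def __init__(self, status: int, provider: str):
+        super().__init__(f"{provider} returned {status}")
+        self.status = status
+        self.provider = provider
+
+
+def to_sse_event(exc: ToolHTTPError) -> dict:
+    """Map a tool failure to the SSE event the client should receive."""
+    if exc.status in (401, 403):
+        # Token revoked or expired mid-conversation: ask for consent again
+        # instead of surfacing a raw tool error to the user.
+        return {"event": "oauth_required", "data": {"provider": exc.provider}}
+    # Hypothetical fallback event for every other failure.
+    return {"event": "tool_error", "data": {"message": str(exc)}}
+```
+
+Keeping the mapping in one function makes the audit concrete: every MCP/A2A tool wrapper either routes its HTTP failures through it or is a gap.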
+
+### 7. Triage Nightly Build & Test failure cluster
+- **Source**: 9 failures since May 6 in `gh run list --status=failure`
+- **Surface area**: `.github/workflows/nightly-*.yml`, possibly `tests/shared/test_cognito_idp_service.py` + adjacent (per issue #220)
+- **Change**: pull the most recent failure log; trace to root cause; fix flakiness OR document why it's failing if it's a real regression; promote to blocking issue
+- **Subtracts**: possibly — if root is issue #220 (test isolation), fixing it materially simplifies the suite
+- **Effort × Impact**: Low-Medium × Medium-High
+- **Verdict**: Worth trying
+
+## Take
+
+Two big themes this week. **First, agentic UI/UX has shifted under us.** MCP Apps shipped to production with adoption from Claude Desktop, ChatGPT, VS Code Copilot, Goose, and Postman; assistant-ui is building first-party MCP Apps tooling; the design conversation has moved from "what should an agent chat look like" to "how do we host other people's UIs in our chat." Our text+JSON tool-result rendering is now the baseline competitors are extending past. **Second, library-native subtraction is the kaizen loop's clearest win** — Strands 1.37/1.38 quietly closes our open issues #266/#267, and `bedrock-agentcore` 1.6.4 → 1.9.0 likely closes two open SDK issues we already feel. The single change that would matter most this week if scoped is **proposal #1 (MCP Apps host renderer)** — high effort, but the right strategic investment. Quick wins: **#2 (`bedrock-agentcore` bump)** and **#5 (close #266/#267)**. **#3 (renderer registry)** is the natural mid-ground that delivers value standalone AND pre-paves proposal #1.
+
+---
+
+## Sources Scanned
+
+| # | Source | URL | Accessed | Items |
+|---|---|---|---|---|
+| 1 | AWS Bedrock + AgentCore (RSS, blog, pricing, SDK issues) | aws.amazon.com / github.com/aws/bedrock-agentcore-* | 2026-05-10 | 5 announcements + 5 issues |
+| 2 | Strands Agents (releases, issues, PRs) | github.com/strands-agents/sdk-python | 2026-05-10 | 3 releases + 5 issues |
+| 3 | Reference repo | github.com/aws-samples/sample-strands-agent-with-agentcore | 2026-05-10 | 12 commits in 30-day window |
+| 4 | MCP ecosystem | modelcontextprotocol.io / github.com/modelcontextprotocol | 2026-05-10 | 4 spec items + 3 discussions |
+| 4a | FastMCP (bootstrap: PyPI snapshot only — full scan deferred to 2026-05-15) | pypi.org/project/fastmcp + github.com/jlowin/fastmcp | 2026-05-10 | latest 3.2.4 |
+| 4b | Agentic UI/UX (MCP Apps, AI SDK, assistant-ui, Linear/Cursor/Anthropic, NN/g) — 30-day baseline | modelcontextprotocol.io + ai-sdk.dev + assistant-ui.com + linear.app + cursor.com + anthropic.com + nngroup.com | 2026-05-10 | 11 items across MCP Apps spec, AI SDK patterns, assistant-ui, vendor product blogs, NN/g research |
+| 5 | Frontier models (Anthropic, OpenAI, Google, Meta) | anthropic.com / openai.com / blog.google / ai.meta.com | 2026-05-10 | 3 Anthropic + 1 OpenAI + 0 others |
+| 6 | Agent harness | github.com/anthropics + langchain.com + pydantic.dev | 2026-05-10 | 3 CC releases + 4 cookbook items |
+| 7 | Community (HN Algolia + Reddit) | hn.algolia.com + reddit.com | 2026-05-10 | 0 HN hits, Reddit blocked |
+| 8 | Version-pin diff | pypi.org / npmjs.com | 2026-05-10 | 8 deps checked, 5 lag |
+
+## Web Budget
+
+Used: 64 / 50 requests (50 is the target; the UX-lens scan added 10 to the original 54).
+
+Skipped (unreachable / rate-limited):
+- Reddit (`r/LocalLLaMA`, `r/MachineLearning`) — WebFetch blocked from this environment. Switch to `.rss` endpoint or configured Reddit MCP next run.
+- `https://aws.amazon.com/bedrock/whats-new/` — 404 (page appears retired). Drop or replace.
+- `https://docs.claude.com/en/docs/claude-code/release-notes` — 301→404. Replace with `github.com/anthropics/claude-code/blob/main/CHANGELOG.md`.
+- OpenAI blog returned 403 twice; backfilled via search.
+
+Skipped (other): Security advisories (external) and Internal security posture (Dependabot + CodeQL) sources were initially included in this bootstrap run but **removed per scope refinement** — security signals are handled by Dependabot and CodeQL directly and don't need a weekly kaizen lens. Future runs won't scan them.
+
+Notes:
+- Frontier-models sub-budget exceeded (11 vs ~6 target) due to two OpenAI WebFetch 403s requiring search backfill.
+- This is a **bootstrap run**: reference-repo + UX-lens scope extended to 30 days for baseline; Carried Over and prior-decisions sections in the review-prep doc are necessarily empty.
diff --git a/docs/kaizen/research/2026-05-15.md b/docs/kaizen/research/2026-05-15.md
new file mode 100644
index 00000000..bdc3c06d
--- /dev/null
+++ b/docs/kaizen/research/2026-05-15.md
@@ -0,0 +1,253 @@
+# Kaizen Research — Friday, May 15, 2026
+> Scan window: May 8 – May 15, 2026 (7 days).
+> Web budget: ~46/50 used (target). Modest overshoot only on the frontier-models sub-budget (OpenAI 403 backfilled via search).
+> First regular run after the 2026-05-10 bootstrap.
+
+## TL;DR
+
+Quiet external week with two outsized upstream events: **Strands v1.40.0 shipped (proactive context compression PR #2239 landed)** and carries a **breaking default flip** (`use_native_token_count` true → false) that affects our token accounting; and **`bedrock-agentcore` is now 4 releases behind** (1.6.4 → 1.9.1, latest May 12) with PR #478 in flight that directly resolves last week's flagged issue #452 (AgentCoreMemorySessionManager event-loop blocking). Internally the BFF parade settled into the beta.25 + beta.26 release pair, but **#293 disabled Dependabot version-update PRs on May 13** — the kaizen loop is now the *only* mechanism catching version-pin lag. **Recommended #1**: re-prioritize the `bedrock-agentcore` bump from last week's queue — it's stale and the lag is widening.
+
+## External Scan
+
+### What's moving this week
+
+Two ecosystem currents converged. **First, the upstream-shrinks-our-backlog pattern continued** but with a wrinkle: Strands v1.40 (May 14) lands the proactive context compression we'd noted as PR #2239, but it also flips `use_native_token_count` from `true` to `false` per PR #2284 — a silent latency regression fix for multimodal, but a behavior change for anyone reading Bedrock-native input-token counts. We'd need to audit our token-metric reads (recall last week's #270 bugfix touched per-message-cost + context-% semantics) before promoting. **Second, the MCP "stateless" track keeps consolidating**: SEP-2575 merged May 11 paired with SEP-2567 from last week's window — together they remove `Mcp-Session-Id` and the mandatory `initialize` handshake. Python `mcp` SDK 1.27.1 is patch-only and doesn't adopt either yet, so the watch-and-wait posture from last week stands.
+
+The reference repo (`aws-samples/sample-strands-agent-with-agentcore`) had 12 commits in window. The architecturally interesting cluster is around a Progressive-Disclosure "skill_executor" SSE wire-name unwrap (which doesn't map to us — we don't have a wrapper-tool layer) and two defensive fixes that *do* generalize: (a) `tool_use` SSE was being emitted twice per call (empty-args then populated-args), and (b) A2A AgentCard needed an explicit `capabilities={"streaming": True}` or the SDK silently fell back to non-streaming → 40-minute timeouts. Both are 30-second checks for us with non-trivial silent-failure modes.
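+
+The duplicate-`tool_use` check described above can be run against a captured event stream. This is a minimal sketch assuming each SSE event has been parsed into a dict with an `event` name and a `tool_use_id`; both field names and the `duplicated_tool_use_ids` helper are hypothetical, so adjust them to the actual wire format.
+
+```python
+from collections import Counter
+
+
+def duplicated_tool_use_ids(events: list[dict]) -> list[str]:
+    """Return tool-call ids that appear in more than one tool_use event."""
+    counts = Counter(
+        event["tool_use_id"]
+        for event in events
+        if event.get("event") == "tool_use"
+    )
+    return sorted(tool_id for tool_id, seen in counts.items() if seen > 1)
+```
+
+An empty result on a recorded conversation means each tool call was announced once; any id in the list points at the empty-args-then-populated-args double emission.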
+
+AgentCore SDK shipped **1.9.1 (May 12)** with a runtime parse-error fix and an entrypoint registration fix; **PR #478** adds the long-requested `async_mode` to `MemorySessionManager` — the fix for issue #452 we flagged HIGH last week. Likely lands in 1.10.0. Anthropic shipped the **`/goal` command + per-tool `duration_ms` on PostToolUse hooks** in Claude Code 2.1.139/141 — both directly inspire pieces of work we already have on deck (the planned context-attribution prototype + a long-arc-objective UX). Frontier-model side was quiet: only OpenAI's GPT-5.2/5.3 snapshot deprecation notice (no Anthropic / Google / Meta model releases). Agentic UI/UX was a lean week — only Linear Code Intelligence (validating the named-participant pattern with a 5× growth datapoint) and Cursor's pre-run elicitation wizard (informs MCP elicitation UX whenever we tackle it). HN and Reddit yielded nothing in window; Reddit `.rss` confirmed blocked at the domain level via WebFetch, not just the HTML path.
+
+### Notable items by source
+
+> **Annotation conventions:**
+> - `*relevance*:` — impact-on-existing-code lens. What construct/file does this affect? What does it replace, simplify, or obsolete?
+> - `*unlocks*:` — capability-unlock lens (where applicable). What net-new capability or enhancement does this make possible?
+
+#### AWS Bedrock / AgentCore
+- **AgentCore Browser + Code Interpreter — Chrome enterprise policies + custom root CA certificates** — `BrowserClient.create_browser(enterprise_policies=...)` and `CodeInterpreter.create_code_interpreter(certificates=[Certificate.from_secret_arn(...)])` for governed/corporate-proxy scenarios. — https://aws.amazon.com/blogs/machine-learning/control-where-your-ai-agents-can-browse-with-chrome-enterprise-policies-on-amazon-bedrock-agentcore/ — *relevance*: informational; we don't use AgentCore Browser or Code Interpreter primitives today (tool layer is direct-call + AWS-SDK + Gateway-MCP + A2A) — *unlocks*: domain-restricted browser agents (e.g. `*.boisestate.edu`) and code-exec tools that reach internal HTTPS endpoints behind a corporate CA without disabling cert verification — parking-lot for whenever we add a browsing or sandboxed-exec tool
+- **Bedrock advanced prompt optimizer + migration tool** — console feature that refines a prompt across multiple models and shows comparative perf + cost. — https://aws.amazon.com/about-aws/whats-new/2026/05/amazon-bedrock-advanced-prompt-optimization-migration-tool/ — *relevance*: informational; we author agent prompts in `backend/src/agents/main_agent/` (version-controlled), not Bedrock console. Could be a one-off tuning aid for the main-agent system prompt against a candidate model swap. No code impact.
+- **Real-time voice agents with Stream Vision Agents + Nova 2 Sonic** — integration walkthrough for Nova 2 Sonic on Bedrock. — https://aws.amazon.com/blogs/machine-learning/real-time-voice-agents-with-stream-vision-agents-and-amazon-nova-2-sonic/ — *relevance*: informational; we already have a voice mode (`apis/app_api/voice/` with voice-ticket cookie auth). Useful comp point if we ever evaluate Nova 2 Sonic.
+- **No AgentCore platform-level GA/preview announcements this week.** No movement on BYO filesystem, Memory metadata, Identity-on-ECS, or Payments (all flagged last week as new). Quietest AgentCore week in recent memory.
+
+#### Strands Agents
+- **v1.40.0 released 2026-05-14** — *applicability*: HIGH — touches `backend/src/agents/main_agent/` agent setup and our custom `TurnBasedSessionManager`. — https://github.com/strands-agents/sdk-python/releases/tag/v1.40.0
+  - Proactive context compression (PR #2239) landed — adjacent to our compaction-surfacing SSE path (`compaction` event). Verify it doesn't double-fire with our existing flush.
+  - Bedrock `count_tokens` AccessDenied caching (PR #2279).
+  - Swarm OTel context-detach fix (PR #2281) — same family as bedrock-agentcore SDK issue #456 we flagged last week.
+- **BREAKING — `use_native_token_count` default flipped to `false` (PR #2284, in v1.40.0)** — fixes #2277 silent-latency regression for image/multimodal workloads; previous default added a per-turn Bedrock `count_tokens` call. — *applicability*: HIGH — if we read native input-token counts anywhere in `apis/shared/` token metrics or compaction triggers, we now get heuristic counts unless we explicitly set `BedrockModel(use_native_token_count=True)`. Audit before bumping. — https://github.com/strands-agents/sdk-python/pull/2284 — *closes our issues?* no, but interacts with last week's #270 per-message-cost fix.
+- **Issue #2266 still open** — `BedrockModel.stream` cancel leak ("Task exception was never retrieved"); last updated 2026-05-09, no PR linked yet. — *applicability*: MEDIUM-HIGH (this is the same item we already queued as last week's #4 audit). — https://github.com/strands-agents/sdk-python/issues/2266
+- **PR #2290 — MCP progress notifications in MCPClient (open, updated 2026-05-14)** — adds `notifications/progress` to `MCPClient`. — *applicability*: MEDIUM — once merged, long-running MCP tools (Gateway-SigV4 + external OAuth servers) can stream progress to the user; would slot into our SSE event stream as a new event type. — *unlocks*: long-running-tool progress UX, e.g. for spreadsheet-analysis or future browsing tools. — https://github.com/strands-agents/sdk-python/pull/2290
+- **Issue #2286 — Strands SDK repos consolidating into a monorepo** — maintainer announcement opened 2026-05-12. — *applicability*: LOW today, HIGH later — import paths may move in a future major; pin and watch for v2.x messaging. — https://github.com/strands-agents/sdk-python/issues/2286
+
+#### Reference repo (aws-samples/sample-strands-agent-with-agentcore)
+- **fix(a2a): enable streaming capability on AgentCard** (`50c9112`) — one-line fix: `capabilities={"streaming": True}` on the A2A AgentCard. Without it, the A2A SDK client silently falls back to non-streaming and never receives the completed event → ~40-minute timeouts. — *applicability*: HIGH if we have any A2A AgentCard config. **Defensive 30-second check on our A2A construct(s)** (likely `backend/src/agents/` or infra). Failure mode is silent — exactly the kind of bug we don't want to discover in prod. — https://github.com/aws-samples/sample-strands-agent-with-agentcore/commit/50c9112cbc83a4517462d9e77d73e2239b22a004
+- **refactor(sse): emit TOOL_CALL_START once per tool_use, after unwrap** (`d668685`) — they discovered Strands' tool-use processor was firing `_format_tool_use` twice per call (registration with empty input, then again with populated args). They gated START until the populated call. — *applicability*: MEDIUM — quick audit of our SSE `tool_use` emission. If we emit twice (empty + populated), the frontend tool chip may flicker or render an "unknown args" intermediate state. — https://github.com/aws-samples/sample-strands-agent-with-agentcore/commit/d668685ddfd2e6164093ca2cca0e91852ad19876
+- **fix(skill): filter disabled skills out of the L1 catalog** (`a0753dc`) — generalizes to: *if a tool is filtered at execution, it must also be filtered out of any catalog/listing that gets injected into the system prompt*. Otherwise the model hallucinates availability. — *applicability*: MEDIUM — our system-prompt tool list should derive from the same post-RBAC filtered set the executor uses. Quick consistency check. — https://github.com/aws-samples/sample-strands-agent-with-agentcore/commit/a0753dca5ade0e613f44608c3824e002dc02bc03
+- **test(e2e): add L4 protocol-path integration suite** (`8d111d9`) — 13 protocol paths (local/builtin/gateway/a2a/skill/memory) tested via deployed BFF with event-based SSE assertions. — *applicability*: MEDIUM — we have unit + architecture tests but limited deployed-BFF e2e of the multi-protocol matrix. Event-based assertions sidestep LLM flakiness. — https://github.com/aws-samples/sample-strands-agent-with-agentcore/commit/8d111d9bb79b7b1d88cc03cfb223ef23e037a32e
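+
+The double-emission fix in `d668685` amounts to a small gate in front of the SSE encoder. A minimal sketch, assuming tool-use events arrive as dicts with `tool_use_id` and `input` keys (a hypothetical shape; neither the ref repo's nor our actual event schema is reproduced in this doc):
+
+```python
+from typing import Any, Dict, Iterable, Iterator, Set
+
+
+def gate_tool_call_start(events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
+    """Yield at most one start event per tool_use_id, skipping empty-input passes."""
+    started: Set[str] = set()
+    for event in events:
+        tool_use_id = event["tool_use_id"]
+        if tool_use_id in started:
+            continue  # this call was already announced
+        if not event.get("input"):
+            continue  # registration pass with empty args; wait for the populated one
+        started.add(tool_use_id)
+        yield {**event, "type": "TOOL_CALL_START"}
+```
+
+A tool that legitimately takes no arguments would never pass this gate, so a real implementation would need a second signal (for example, the end of the tool-use block) before relying on it.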
+
+#### MCP ecosystem
+- **SEP-2575 "Make MCP Stateless" merged 2026-05-11** — companion to SEP-2567 (sessionless transport, last week). Removes the mandatory `initialize` handshake; replaces with per-request `MCP-Protocol-Version` header + `_meta` capability bits, a new optional `server/discover` RPC, and `messages/listen` for client-initiated streaming. — *implications*: directly affects our Lambda+Gateway servers. Combined with SEP-2567 this formalizes the "stateless wire protocol" direction. **No action yet** — wait for python `mcp` SDK adoption. — https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2575
+- **python-sdk v1.27.1 released 2026-05-08 — NO SEP-2567/2575 adoption** — patch release only (pydantic 2.13 compat, OAuth client-metadata coercion, `httpx<1.0.0` pin, SSEError import refactor). — *implications*: safe transitive bump from 1.27.0; the `httpx<1.0.0` pin is worth noting if anything in `backend/pyproject.toml` is reaching toward httpx 1.x. — https://github.com/modelcontextprotocol/python-sdk/releases/tag/v1.27.1
+- **awslabs/mcp PR #3491 — auth-conflict detection + structured recovery (merged May 13)** — multi-tenant OAuth/MCP pattern. — *implications*: worth reading before our next Gateway-SigV4/OAuth bridging round in `backend/src/apis/shared/oauth/agentcore_identity.py` — same problem space (token vault + multiple identities). Reference pattern, not a blocker. — https://github.com/awslabs/mcp/pull/3491
+- **SEP-2577 "Deprecate Roots, Sampling, and Logging" still open** (active May 13) — signal the spec is consolidating around tools+prompts+resources only. — *implications*: informational; our Lambda servers don't implement those client-side features. — https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2577
+
+#### FastMCP
+- **v3.3.0 "Slim Reaper" released 2026-05-15** — headline: `fastmcp-slim`, a client-only distribution that drops Starlette/Uvicorn for installs that only consume MCP. Import namespace unchanged. — https://github.com/PrefectHQ/fastmcp/releases/tag/v3.3.0 — *implications for our MCP servers*: **informational, not breaking**. Our externally-hosted MCP servers are servers, so they stay on the full `fastmcp`. But anything in `apis.shared` that acts as a *client*, or future scripts/agents that just consume MCP, could shrink their install footprint by switching to `fastmcp-slim[client]`. No API changes.
+- **No transport / SEP-2567 / Lambda-adapter changes this week.** Active PRs are server hardening (docstring caching #4136, proactive token refresh #4142, event-store eviction crash #4144, cancelled-tool-call forwarding #4145) — nothing protocol-level. — https://github.com/PrefectHQ/fastmcp/pulls
+- **Latest FastMCP pin moved 3.2.4 → 3.3.0.** Update the kaizen tracker.
+
+#### Agentic UI/UX patterns
+- **Linear Agent — Code Intelligence (May 14)** — Linear Agent can now read codebases and answer technical questions; invoked via `@Linear` mention. **Usage jumped ~5× Feb→May (1,055 → ~5,200 queries/mo)** — strong validation of the named-participant pattern flagged last week. — https://linear.app/now/code-intelligence-for-linear-agent — *what it is*: agent-as-mention pattern, addressable by `@`, scoped to a thread, inline answers. — *fit*: pattern-only (Angular equivalent: render assistant turns with a distinct avatar + handle; support `@mention` addressing when multi-agent lands). — *where it'd land*: `chat-message` component (agent identity slot + mention-token renderer); no SSE change needed yet. **Strengthens the existing queue item for named A2A participants.**
+- **Cursor — Cloud Agent Development Environments (May 12)** — pre-run setup wizard where the agent gathers config/secrets via structured prompts before any task runs. — https://www.cursor.com/blog/cloud-agent-development-environments — *what it is*: productized elicitation flow, distinct from in-conversation tool consent. — *fit*: pattern-only; closest MCP analog is SEP-elicitation. Bookmark for whenever we tackle MCP elicitation UX. Not on the near-term roadmap.
+- **assistant-ui 0.14.2 (May 13)** — CI plumbing only (disable OIDC pre-flight verification). The substantive 0.14.0 release fell May 7 (one day outside window). — *fit*: not applicable this week; watch for an `mcp-app-studio` follow-up.
+- **blog.modelcontextprotocol.io — quiet (no new posts in window).** Last MCP-blog post was April 8.
+- **NN/g AI topic URL 404'd** — try `/articles/?topic=ai` or a search path next week.
+
+#### Frontier model announcements
+- **OpenAI — deprecation notice for `gpt-5.2-chat-latest` and `gpt-5.3-chat-latest` snapshots (May 8)** — pairs with last week's flag that GPT-5.5 Instant is displacing 5.3. — https://community.openai.com/t/deprecation-notice-upcoming-model-shutdowns-in-2026/1379553 — *relevance*: informational (we don't ship OpenAI in the selector); useful as a comp pattern for our own model-lifecycle UX in the model-selector. — *risk*: low; only material if we proxy OpenAI.
+- **OpenAI — Realtime API Beta removed (May 12)**, **DALL·E 2/3 snapshots removed (May 12)** — informational; no impact on `backend/src/apis/app_api/voice/` (own ticket flow) or any of our image paths.
+- **Anthropic — Claude for Small Business (May 13)** — packaging/connectors announcement; QuickBooks, PayPal, HubSpot, Canva, DocuSign, Google Workspace, Microsoft 365. — https://www.anthropic.com/news/claude-for-small-business — *relevance*: informational; foreshadows which integrations Anthropic considers "table stakes." Hints at first-party MCP tools we might otherwise build.
+- **Anthropic — $200M Gates Foundation partnership (May 14)** — non-technical; no capability deltas. — https://www.anthropic.com/news/gates-foundation-partnership
+- **Google DeepMind + Meta — quiet (two weeks running).** No Gemini or Llama posts in window. Frontier signal is now coming exclusively from Anthropic + OpenAI.
+
+#### Agent harness patterns
+- **Claude Code 2.1.139 — `/goal` command + persistent agent view** — sets a turn-spanning completion condition with live overlay (elapsed time / turns / tokens). Hooks declared in agent frontmatter fire when that agent runs as the main thread via `--agent`. — *applicability*: HIGH — maps cleanly onto our SSE pipeline (we already stream `metadata` tokens and emit `compaction`). *Inspires*: a `goal` SSE event + sidebar overlay showing user-declared objective + turn count + token-budget burn-down, particularly useful for long agentic threads. — https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md
+- **Claude Code 2.1.141 — `duration_ms` on PostToolUse hooks** — per-tool wall-clock cost without instrumenting each tool. — *applicability*: HIGH — Strands has a hook system we already lean on (planned context-attribution prototype is hook-based per our memory). *Inspires*: emit per-tool `duration_ms` in our `tool_result` SSE payload + a faint inline timing badge on tool blocks. Feeds the planned context-attribution work by separating tool latency from token cost.
+- **Claude Code 2.1.142 — `MCP_TOOL_TIMEOUT` per-request override** — removes the hardcoded 60s ceiling on remote MCP. — *applicability*: MEDIUM — our Gateway-MCP path has the 15-min Lambda cap as the real ceiling, but our app_api → inference_api SSE has its own 600s timeout. *Inspires*: per-MCP-target timeout in our Gateway target registry (today one global) + surfacing remaining-time in the `tool_use` event so the UI can show a progress hint on slow MCP calls.
+- **Anthropic engineering blog — no posts in window.**
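+
+The per-MCP-target timeout idea above can be sketched as a lookup with a global fallback plus a remaining-time helper for the `tool_use` event. All names and the 60-second default are illustrative assumptions, not our current registry or its values:
+
+```python
+import time
+from dataclasses import dataclass, field
+from typing import Dict, Optional
+
+GLOBAL_TIMEOUT_S = 60.0  # assumed global default, not our real setting
+
+
+@dataclass
+class TargetTimeouts:
+    """Per-target timeout overrides that fall back to one global default."""
+    default_s: float = GLOBAL_TIMEOUT_S
+    per_target_s: Dict[str, float] = field(default_factory=dict)
+
+    def for_target(self, target: str) -> float:
+        return self.per_target_s.get(target, self.default_s)
+
+
+def remaining_s(started_at: float, timeout_s: float, now: Optional[float] = None) -> float:
+    """Seconds left before the timeout fires, clamped at zero."""
+    now = time.monotonic() if now is None else now
+    return max(0.0, timeout_s - (now - started_at))
+```
+
+The `now` parameter exists only so the helper can be tested deterministically; callers would normally omit it.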
+
+#### Pricing / quota
+- **AgentCore Payments launch (May 11)** — preview only; AWS charges $0 for the service, wallet-provider passes through (Coinbase CDP ~$0.005/wallet op). — *impact*: none on inference_api model selection or cost-badge values today; relevant only if we wire an agent to autonomously pay for external APIs/MCP servers. — https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-bedrock-agentcore-payments-agent-toolkit-for-aws-and-more-may-11-2026/
+- **AgentCore base pricing unchanged.** Runtime/Browser/Code-Interpreter still $0.0895/vCPU-hr + $0.00945/GB-hr; Memory $0.25/1K events; Identity $0.010/1K token requests (free via Runtime/Gateway); Gateway $0.005/1K invocations.
+- **Bedrock model pricing page** — WebFetch extraction returned only Claude 3.5 entries (same limitation as last week). No model-pricing announcements detected. Worth a manual eyeball or direct curl next week. — https://aws.amazon.com/bedrock/pricing/
+
+#### Open AgentCore SDK issues affecting us
+- **PR #478 — feat: add async support to MemorySessionManager** — opt-in `async_mode: bool` on `AgentCoreMemoryConfig`; when true, wraps sync methods in `asyncio.to_thread()`. **Directly resolves last week's #452**. Likely lands in 1.10.0. — https://github.com/aws/bedrock-agentcore-sdk-python/pull/478 — *applicability*: HIGH
+- **Issue #471 — docs: `/ping` response requires undocumented `time_of_last_update` field** — without that field, AgentCore's idle reaper kills microVMs even while `/ping` returns `HealthyBusy`; AWS docs only show `{"status": "HealthyBusy"}`. — *applicability*: HIGH — we run long-running streaming responses on the inference-api runtime. If our `/ping` doesn't emit `time_of_last_update`, we may be experiencing silent microVM reaping on long generations. **Grep our ping handler today.** — https://github.com/aws/bedrock-agentcore-sdk-python/issues/471
+- **Issue #468 — BedrockAgentCoreApp SDK needs update for expanded Request Header Allowlist** — AgentCore CP now allows arbitrary HTTP headers; SDK doesn't yet surface them to the handler. — *applicability*: MEDIUM — unlocks passing trace IDs / tenant hints from BFF/app-api into inference-api without the `X-Amzn-...-Custom-` prefix dance. Not blocking. — https://github.com/aws/bedrock-agentcore-sdk-python/issues/468
+- **Issue #467 — Multi-agent (Graph) session support missing from AgentCoreMemorySessionManager** — `create_multi_agent` / `read_multi_agent` / `update_multi_agent` on Strands `SessionRepository` aren't implemented; Strands `Graph`-based flows fail at graph creation when paired with AgentCore Memory. — *applicability*: LOW-MEDIUM today (single-agent), HIGH if/when we adopt Strands Graph for sub-agent decomposition. — https://github.com/aws/bedrock-agentcore-sdk-python/issues/467
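+
+The `asyncio.to_thread()` wrapping that PR #478 is described as using is a standard-library technique. A minimal sketch with stand-in classes (`SyncMemoryStore` and `AsyncMemoryStore` are hypothetical, not the SDK's types):
+
+```python
+import asyncio
+import time
+
+
+class SyncMemoryStore:
+    """Stand-in for a blocking session manager (hypothetical, not the SDK class)."""
+
+    def read_session(self, session_id: str) -> dict:
+        time.sleep(0.05)  # simulate blocking network I/O
+        return {"session_id": session_id}
+
+
+class AsyncMemoryStore:
+    """Async facade that pushes each blocking call onto a worker thread."""
+
+    def __init__(self, inner: SyncMemoryStore) -> None:
+        self._inner = inner
+
+    async def read_session(self, session_id: str) -> dict:
+        # The event loop stays free while the sync call runs in a thread.
+        return await asyncio.to_thread(self._inner.read_session, session_id)
+
+
+async def main() -> list:
+    store = AsyncMemoryStore(SyncMemoryStore())
+    # Three reads overlap instead of blocking the loop one after another.
+    return await asyncio.gather(*(store.read_session(f"s{i}") for i in range(3)))
+
+
+if __name__ == "__main__":
+    print(asyncio.run(main()))
+```
+
+This keeps the event loop responsive but does not make the underlying I/O non-blocking; throughput is still bounded by the default thread pool.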
+
+#### Cookbook / courses
+- **Linear ↔ Managed Agents stateless webhook bridge** (May 13, `9644291`) — TS/Bun template: no held SSE, no session-map DB. `session.metadata` carries `linear_session_id` + `org_id`; uses `beta.webhooks.unwrap` with retrieve-then-filter and 10s ack. — https://github.com/anthropics/claude-cookbooks/commit/96442914bfee9842faa97b1d45ee7b43317f7391 — *what we'd borrow*: the stateless bridge is the cleanest approach we've seen for wrapping streaming backends behind webhook-style async callers. Worth comparing to how `apis.shared` correlates async tool callbacks today. *Inspires*: stash external correlation IDs in `session.metadata` instead of a session-map table; future Slack/Teams entry points slot in cheaply.
+- **CMA Sessions API as an MCP server (stdio + HTTP)** (May 13, `a090206`) — thin MCP wrapper with two entrypoints: `server.ts` for Claude Desktop stdio, `server-http.ts` for Streamable HTTP + bearer auth (claude.ai Connectors). `wait_for_idle` is an SSE→request/response shim. Authoring/destructive endpoints deliberately omitted. — *what we'd borrow*: the `wait_for_idle` SSE-to-request/response shim is a clean template for wrapping streaming backends as MCP tools — relevant if we ever expose our Strands agent loop as an MCP tool another agent can call. Also: deliberate-omission stance (no authoring/secrets) is a good template for our admin-endpoint MCP boundaries.
+- **Registry: "Claude Managed Agents" category** (May 13, `c8b30f3`) — schema addition + retag of managed_agents notebooks under a new top-level category. — *takeaway*: Anthropic is consolidating CMA as a first-class product surface. Strategic data point on whether to keep building bridges *to* CMA vs. treat it as a peer to Bedrock AgentCore.
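+
+The `session.metadata` correlation idea from the Linear bridge can be sketched in a few lines. The cookbook template is TS/Bun; this Python version models a session as a plain dict, which is an assumption made for illustration:
+
+```python
+from typing import Any, Dict, Optional
+
+
+def build_session_metadata(linear_session_id: str, org_id: str) -> Dict[str, str]:
+    """Correlation IDs stored on the session itself instead of in a session-map table."""
+    return {"linear_session_id": linear_session_id, "org_id": org_id}
+
+
+def route_webhook(session: Dict[str, Any], expected_org_id: str) -> Optional[str]:
+    """Retrieve-then-filter: return the external session ID, or None if not ours."""
+    metadata = session.get("metadata") or {}
+    if metadata.get("org_id") != expected_org_id:
+        return None  # session belongs to another org, or carries no metadata
+    return metadata.get("linear_session_id")
+```
+
+Because the lookup key travels with the session, a new entry point only has to agree on metadata field names rather than on a shared table schema.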
+
+#### Community + GitHub issues
+- **HN Algolia (May 8–15) — 0 in-window hits** for `bedrock`, `agentcore`, `strands`, `mcp`, `claude code`. Index is functioning — `agentcore` returns 1,490 lifetime results, with the most recent (Payments) at May 7, one day before window. Quiet week confirmed.
+- **Reddit `.rss` blocked at the domain level via WebFetch** (not just the HTML path). Last week's open queue item to "add Reddit `.rss`" is **infeasible via WebFetch** — needs a different transport (`curl` with UA header via Bash, or an RSS-fetching MCP). Recommend closing the queue item as infeasible.
+
+#### Seasonal
+- Out of window — no re:Invent / NeurIPS / ICLR / EMNLP items.
+
+### Patterns worth considering
+
+- **Per-tool timing as a first-class hook output** (Claude Code 2.1.141) — directly enables our planned context-attribution prototype to separate tool latency from token cost. Cheap, mechanical, signal-rich.
+- **`session.metadata` correlation-id pattern** (Linear↔CMA cookbook) — replaces dedicated session-map tables with a JSON blob the external system writes. Turns a future infra ask (Slack/Teams entry points → schema work) into a zero-table approach.
+- **Stateless MCP transport** (SEP-2567 + SEP-2575) — direct fit for our SigV4 Gateway model. Watch python `mcp` SDK for adoption; act when 1.28.0+ ships with the changes.
+- **Tool-catalog ↔ executor consistency** (ref repo `a0753dc`) — generalizes the principle: any tool list emitted to the model must derive from the same post-RBAC filtered set the executor uses. Worth a quick audit on our system-prompt assembly.
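+
+The tool-catalog ↔ executor consistency principle is easiest to keep when both consumers read one filtered mapping. A minimal sketch, assuming a simple role-per-tool RBAC model (the `Tool` type and role names are hypothetical, not our actual tool registry):
+
+```python
+from dataclasses import dataclass
+from typing import Callable, Dict, FrozenSet, List
+
+
+@dataclass(frozen=True)
+class Tool:
+    name: str
+    description: str
+    required_role: str
+    run: Callable[[str], str]
+
+
+def allowed_tools(tools: List[Tool], roles: FrozenSet[str]) -> Dict[str, Tool]:
+    """Single source of truth: the post-RBAC set of tools for this caller."""
+    return {t.name: t for t in tools if t.required_role in roles}
+
+
+def catalog_prompt(allowed: Dict[str, Tool]) -> str:
+    """System-prompt listing built from the same filtered set the executor uses."""
+    return "\n".join(f"- {t.name}: {t.description}" for t in allowed.values())
+
+
+def execute(allowed: Dict[str, Tool], name: str, arg: str) -> str:
+    if name not in allowed:
+        raise PermissionError(f"tool {name!r} is not available")
+    return allowed[name].run(arg)
+```
+
+With this shape the model cannot be told about a tool the executor would refuse, because there is no second list to drift out of sync.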
+
+## Internal Audit
+
+### Activity (last 7 days)
+- **Commits on develop**: 33 (across multiple squash-merged PRs and direct release/develop merges)
+- **PRs merged into develop in window**: 8 user-facing (#290, #293, #296, #297, #298, #299, #300, #301) + 2 release back-merges (beta.25, beta.26) + 1 docs (#276 — pre-existing branch)
+- **PRs opened**: 1 (#301 — session-list polish; the working branch this run is on)
+- **PRs reverted**: 0
+- **Issues opened**: 1 (#288 — inference-api deploy: new images reach ECR but live AgentCore Runtime isn't rolled)
+- **Releases**: beta.25 (May 12), beta.26 (May 14)
+- **CI failures (workflow → count, last 7 days)**:
+  - Version Check: 7 (all Dependabot — see below)
+  - Deploy App API: 5
+  - Deploy Inference API: 3
+  - Deploy Infrastructure: 2
+  - **Nightly Build & Test: 0** (last week's #6 idea — fixed via #290 on May 12)
+
+### Repeated friction signals
+- **Inference API deploy not rolling new Runtime versions** (issue #288, May 12) — new container images reach ECR but the live AgentCore Runtime doesn't pick them up. Correlates with the 3 Deploy Inference API failures this week. *Hypothesis*: the deploy step publishes the image but neither bumps the Runtime version pointer nor invokes `update_agent_runtime`. *Fix candidate*: trace the deploy workflow against the AgentCore Runtime versioning model — verify it calls the SDK's `update_agent_runtime` (or equivalent) after the ECR push.
+- **Deploy App API failures clustered (5 in 7 days)** — three on the `feature/fix-default-model-persistence` branch on May 12, two on `feature/skip-auth-local-bypass` on May 9 (pre-`#272` merge). Branch-level, not a `develop` regression — most are pre-merge breakages getting resolved by the author.
+- **Nightly Build & Test cluster from last week — RESOLVED.** Last failure was May 8; PR #290 landed May 12 and the cluster has been silent since. *Last week's #6 worked.*
+
+### Version-pin lag
+
+| Dep | Pinned | Latest | Lag | Notes |
+|---|---|---|---|---|
+| `bedrock-agentcore` | 1.6.4 | **1.9.1** | 4 releases (minor+patch) / latest 2026-05-12 | **Widening from last week (was 3).** PR #478 in flight adds `async_mode` to `MemorySessionManager`, which directly resolves the #452 we flagged last week. |
+| `strands-agents` | 1.39.0 | **1.40.0** | 1 minor / latest 2026-05-14 | **New release in window.** Headline: proactive context compression (PR #2239). **Carries breaking default flip on `use_native_token_count` (true → false)** — audit token-metric reads before bumping. |
+| `fastmcp` (transitive on external MCP servers) | n/a in this repo | **3.3.0** | n/a | New `fastmcp-slim` client distribution. Informational. |
+| `mcp` (transitive) | (transitive) | 1.27.1 | 1 patch | Patch-only release. NO SEP-2567/2575 adoption — watch 1.28.0. |
+| `boto3` | 1.42.96 | (not re-fetched this week) | — | Last week 1.43.6 was latest; routine bumps continue. |
+| `aws-cdk-lib` | 2.251.0 | (not re-fetched this week) | — | Routine. |
+| `@angular/core` | 21.2.11 | (not re-fetched this week) | — | Routine. |
+
+> **Note**: #293 disabled Dependabot version-update PRs on 2026-05-13. The kaizen loop is now the only mechanism catching version-pin lag — version-pin diff has been promoted from "routine" to "load-bearing."
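+
+With Dependabot off, the pin-vs-latest comparison in the table above has to be computed by the kaizen run itself. A minimal sketch of the comparison step, assuming plain `MAJOR.MINOR.PATCH` version strings (pre-release tags are not handled, and fetching the latest version is out of scope here):
+
+```python
+from typing import Tuple
+
+
+def parse_version(version: str) -> Tuple[int, int, int]:
+    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints."""
+    major, minor, patch = (int(part) for part in version.split("."))
+    return major, minor, patch
+
+
+def lag(pinned: str, latest: str) -> str:
+    """Describe the most significant version component that differs."""
+    pin, new = parse_version(pinned), parse_version(latest)
+    if new <= pin:
+        return "current"
+    if new[0] != pin[0]:
+        return f"{new[0] - pin[0]} major"
+    if new[1] != pin[1]:
+        return f"{new[1] - pin[1]} minor"
+    return f"{new[2] - pin[2]} patch"
+
+
+if __name__ == "__main__":
+    print(lag("1.39.0", "1.40.0"))
+```
+
+This measures distance in version numbers only. Counting how many releases were actually published between two pins needs the package index's release list, which this sketch does not fetch.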
+
+### Retirement candidates
+- **Last week's queue item "Add Reddit `.rss` or Reddit MCP to `kaizen-research`"** — confirmed Reddit is blocked at the domain level via WebFetch (not just the HTML path). Close the queue item as **infeasible via WebFetch**; revisit only if a different transport becomes available (Reddit MCP, or `curl`-via-Bash with UA header).
+- **`kaizen-research/SKILL.md` AgentCore starter-toolkit URL is wrong** — current URL references `aws/amazon-bedrock-agentcore-starter-toolkit`; correct slug is `aws/bedrock-agentcore-starter-toolkit` (no `amazon-` prefix). Fold into the existing queue item "Replace dead source URLs in `kaizen-research` skill."
+- **`anthropics/courses` source** — already queued for removal last week; confirmed quiet again this week. Leave queued.
+
+### Risks introduced this week
+- **Dependabot version-update PRs disabled (#293, May 13)** — no automated bump pressure. Kaizen loop is now the *only* mechanism catching version-pin lag. *What breaks if ignored*: version-pin lag silently widens; security patches in transitive deps may not arrive until something else surfaces them. — *mitigation*: tighten the version-pin diff section of this skill (already in scope); add direct fetches for `boto3`, `aws-cdk-lib`, `@angular/core`, `pydantic` every run.
+- **`bedrock-agentcore` now 4 releases behind, with a release in the scan window** (1.9.1, May 12) — *what breaks if ignored*: still-open OTEL trace detach (#456), event-loop blocking (#452 — fix in PR #478 likely 1.10.0). Same risk as last week but worse — 3rd week of carry-over would be embarrassing.
+- **Strands v1.40 `use_native_token_count` default flip** — if we bump without auditing, token-accounting code reading native counts gets heuristic values instead. — https://github.com/strands-agents/sdk-python/pull/2284
+- **AgentCore `/ping` may not emit `time_of_last_update`** (SDK issue #471) — silent microVM reaping on long generations. — *what breaks if ignored*: long agent turns get killed by the idle reaper even while we're streaming. **Grep our ping handler.** — https://github.com/aws/bedrock-agentcore-sdk-python/issues/471
+- **Inference-api deploy doesn't roll the live AgentCore Runtime** (our #288) — manual-redeploy band-aid; eventually a security patch or model-version bump will need to ship and won't.
+
+## Ideas — Top 5 (ranked)
+
+| # | Idea | Surface | Effort | Impact | Subtracts? | Unlocks? |
+|---|---|---|---|---|---|---|
+| 1 | Bump `bedrock-agentcore` 1.6.4 → 1.9.1 (now 4 releases behind — re-prioritized from last week) | backend | L | M-H | no — addition only, but retires open queue carry-over; sets up adoption of PR #478 / `async_mode` once 1.10.0 ships | — |
+| 2 | Audit and fix `/ping` to emit `time_of_last_update` per AgentCore SDK issue #471 | backend (inference-api `/ping`) | L | M-H | no — defensive against silent microVM reaping on long generations | — |
+| 3 | Strands 1.39 → 1.40 bump, gated on `use_native_token_count` audit + proactive-compression double-fire check | backend | M | M-H | **yes — adopting upstream proactive context compression (PR #2239) reduces our custom compaction surface area** | — |
+| 4 | Defensive A2A AgentCard `capabilities={"streaming": True}` check (ref repo `50c9112`) | backend (A2A construct) | L | M | no — defensive; silent-failure mode is 40-min timeouts | — |
+| 5 | Wire per-tool `duration_ms` into our `tool_result` SSE payload (Claude Code 2.1.141 pattern) | backend (Strands hook) + frontend (faint inline badge) | L-M | M-H | partial — replaces ad-hoc per-tool timing with a single hook-driven field; pre-paves the planned context-attribution prototype | per-tool timing visibility in the UI; the data substrate for context-attribution that separates latency from token cost |
+
+### 1. Bump `bedrock-agentcore` 1.6.4 → 1.9.1
+- **Source**: PyPI (https://pypi.org/project/bedrock-agentcore/) + open SDK issues #456, #452, #471. *Re-prioritized from last week's queue — lag widened 3 → 4 releases and Dependabot version-updates are now disabled (#293), so this won't get there on its own.*
+- **Surface area**: `backend/pyproject.toml`, `backend/uv.lock`
+- **Change**: pin update + smoke-test memory + identity flows in dev. Verify CHANGELOG between 1.6.4 and 1.9.1 for breaking changes. Verify whether 1.9.1 already addresses #456 (OTEL trace detach across asyncio boundaries) — if so, close. Track PR #478 (`async_mode`) for the 1.10.0 follow-up.
+- **Subtracts**: addition only — justified by 4 versions of upstream fixes and the kaizen-loop accountability for catching version-pin lag now that Dependabot is off
+- **Effort × Impact**: Low × Medium-High
+- **Verdict**: Worth trying. *This is a re-prioritization, not a new idea — the bootstrap-week version is still open in the queue and is now stale.*
+
+### 2. Audit and fix `/ping` to emit `time_of_last_update`
+- **Source**: AgentCore SDK issue #471 (https://github.com/aws/bedrock-agentcore-sdk-python/issues/471)
+- **Surface area**: `backend/src/apis/inference_api/` (the `/ping` handler) — this is one of the two routes the AgentCore Runtime data plane actually serves (per CLAUDE.md inference-api boundary section).
+- **Change**: grep our ping response shape; if it's just `{"status": "Healthy" | "HealthyBusy"}`, extend it to also emit `{"time_of_last_update": <iso-ts>}`. Without that field AgentCore's idle reaper can kill microVMs mid-long-generation even while we're streaming.
+- **Subtracts**: addition only — defensive; silent failure mode
+- **Effort × Impact**: Low × Medium-High
+- **Verdict**: Worth trying — cheap, surface area is tiny, failure mode is silent and bad
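+
+A minimal sketch of the extended `/ping` payload described above. The function name is hypothetical, and the timestamp encoding is an open question: the change note writes `<iso-ts>`, so ISO 8601 is the default here, with integer epoch seconds available behind a flag until the expected format is confirmed against issue #471:
+
+```python
+from datetime import datetime, timezone
+from typing import Dict, Optional, Union
+
+
+def ping_payload(
+    busy: bool,
+    last_update: Optional[datetime] = None,
+    epoch_seconds: bool = False,
+) -> Dict[str, Union[str, int]]:
+    """Build a /ping body that always carries time_of_last_update."""
+    last_update = last_update or datetime.now(timezone.utc)
+    stamp: Union[str, int] = (
+        int(last_update.timestamp()) if epoch_seconds else last_update.isoformat()
+    )
+    return {
+        "status": "HealthyBusy" if busy else "Healthy",
+        "time_of_last_update": stamp,
+    }
+```
+
+For the field to be useful, `last_update` should advance whenever the agent makes progress (for example, on each streamed chunk); a value frozen at process start would presumably not prevent reaping.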
+
+### 3. Strands 1.39 → 1.40 bump
+- **Source**: Strands v1.40.0 release notes + PR #2284 (breaking)
+- **Surface area**: `backend/pyproject.toml`, `backend/uv.lock`, our compaction-surfacing SSE path, any code reading native Bedrock input-token counts
+- **Change**: (a) audit `apis/shared/` and `agents/main_agent/streaming/` for native-token-count reads — if we depend on them, either pin `BedrockModel(use_native_token_count=True)` explicitly OR re-route through the heuristic and verify our cost-badge math (recall last week's #270 already touched this); (b) bump pin; (c) verify proactive context compression (PR #2239) doesn't double-fire with our existing `TurnBasedSessionManager` flush (our `compaction` SSE event should still emit cleanly); (d) smoke-test cost-badge values across a tool turn before promoting.
+- **Subtracts**: **yes — library-native subtraction.** Strands' proactive context compression replaces compaction logic we'd otherwise grow. Reduces the surface area in our custom session manager.
+- **Effort × Impact**: Medium × Medium-High
+- **Verdict**: Worth trying. The bump is straightforward; the audit is the careful part.
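+
+Step (a) of the audit can start as a scripted search. A minimal sketch, assuming the identifiers below are the ones worth flagging (the pattern list is a guess to extend, not a complete inventory of how our code reads token counts):
+
+```python
+import re
+from pathlib import Path
+from typing import Iterable, List, Tuple
+
+# Identifiers that hint at a native-token-count dependency (assumed list).
+PATTERN = re.compile(r"use_native_token_count|count_tokens|inputTokens")
+
+
+def find_token_count_reads(roots: Iterable[Path]) -> List[Tuple[str, int, str]]:
+    """Return (path, line number, stripped line) for every matching line under roots."""
+    hits: List[Tuple[str, int, str]] = []
+    for root in roots:
+        for path in sorted(root.rglob("*.py")):
+            text = path.read_text(encoding="utf-8", errors="ignore")
+            for lineno, line in enumerate(text.splitlines(), start=1):
+                if PATTERN.search(line):
+                    hits.append((str(path), lineno, line.strip()))
+    return hits
+```
+
+Pointing it at the two directories named in step (a) gives a reading list for the manual review; a match is a place to look, not proof that the code depends on native counts.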
+
+### 4. Defensive A2A AgentCard `capabilities={"streaming": True}` check
+- **Source**: aws-samples/sample-strands-agent-with-agentcore commit `50c9112`
+- **Surface area**: wherever we construct A2A AgentCards in this repo (search `AgentCard`, `capabilities=`, `agent_card`)
+- **Change**: 30-second grep + read; if the field is missing or `False`, set to `True`. Silent failure mode otherwise: A2A SDK client falls back to non-streaming, never receives `completed`, hangs 40 minutes.
+- **Subtracts**: addition only — defensive
+- **Effort × Impact**: Low × Medium
+- **Verdict**: Worth trying. Cheapest item in the list with a real silent-failure mode.
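+
+If the grep turns up more than a couple of construction sites, the same check can be done on the syntax tree. A minimal sketch using the standard `ast` module; it only recognizes the literal `capabilities={"streaming": True}` form quoted from the ref-repo commit:
+
+```python
+import ast
+from typing import List
+
+
+def _has_streaming_true(call: ast.Call) -> bool:
+    """True if the call passes capabilities={"streaming": True} as a literal dict."""
+    for kw in call.keywords:
+        if kw.arg != "capabilities" or not isinstance(kw.value, ast.Dict):
+            continue
+        for key, value in zip(kw.value.keys, kw.value.values):
+            if (
+                isinstance(key, ast.Constant)
+                and key.value == "streaming"
+                and isinstance(value, ast.Constant)
+                and value.value is True
+            ):
+                return True
+    return False
+
+
+def agent_cards_missing_streaming(source: str) -> List[int]:
+    """Line numbers of AgentCard(...) calls that lack the literal streaming flag."""
+    missing: List[int] = []
+    for node in ast.walk(ast.parse(source)):
+        if not isinstance(node, ast.Call):
+            continue
+        func = node.func
+        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
+        if name == "AgentCard" and not _has_streaming_true(node):
+            missing.append(node.lineno)
+    return sorted(missing)
+```
+
+A card whose capabilities are built some other way (a variable, a helper object) will be reported as missing, so every hit still needs a human read.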
+
+### 5. Wire per-tool `duration_ms` into `tool_result` SSE
+- **Source**: Claude Code 2.1.141 hook pattern (https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md)
+- **Surface area**: backend Strands hook (PostToolUse equivalent) emitting into our SSE `tool_result` payload; frontend `<tool-result>` component renders a faint inline timing badge
+- **Change**: register a Strands `AfterToolCall` hook, paired with its before-call counterpart to record the start time, that captures `(end - start)` wall-clock per tool invocation; emit on the existing `tool_result` SSE event as `duration_ms`. Frontend renders inline timing only if `> 250ms` (avoid noise).
+- **Subtracts**: partial — replaces ad-hoc per-tool timing (if any) with a single hook-driven field; *more importantly, it's the data substrate for the planned context-attribution prototype* (per our memory) — separating tool latency from token cost
+- **Unlocks**: per-tool timing visibility in the UI (which slow tool is the bottleneck on this turn?); the data substrate for context-attribution that distinguishes latency from token cost
+- **Effort × Impact**: Low-Medium × Medium-High
+- **Verdict**: Worth trying. Pairs naturally with the planned context-attribution work — landing this first means the prototype starts with cleaner inputs.
+
+## Take
+
+Two patterns are quietly reshaping the kaizen loop. **First, the upstream-shrinks-our-backlog play keeps paying off**: this week Strands shipped the proactive compression we'd flagged, AgentCore SDK has the `async_mode` fix for our #452 in flight, and the ref repo's `50c9112` A2A streaming-capability fix is a 30-second port. **Second, #293 disabled Dependabot version-update PRs** — the kaizen loop is now the only mechanism catching version-pin lag, which makes the *re-prioritization* of last week's bedrock-agentcore bump (now 4 releases behind, not 3) the single most important change this week. The agentic-UI/MCP-Apps storyline that dominated bootstrap week has gone quiet — slow week on UX, normal noise on ecosystem. Idea #1 is a carry-over but earns its #1 by being stale; idea #2 (the `/ping` fix) is the most consequential silent-failure mitigation we can land cheaply. If Phil ships only two this week, those are the two.
+
+---
+
+## Sources Scanned
+
+| # | Source | URL | Accessed | Items |
+|---|---|---|---|---|
+| 1 | AWS Bedrock + AgentCore (RSS, blog, AWS weekly roundup) | aws.amazon.com / github.com/aws/bedrock-agentcore-* | 2026-05-15 | 3 announcements + 4 SDK items (PR #478, issues #467/#468/#471) |
+| 2 | Strands Agents (releases, issues, PRs) | github.com/strands-agents/sdk-python | 2026-05-15 | 1 release (v1.40.0) + 4 issues/PRs (#2266, #2284, #2286, #2290) |
+| 3 | Reference repo (12 commits in window) | github.com/aws-samples/sample-strands-agent-with-agentcore | 2026-05-15 | 4 architecturally relevant commits (50c9112, d668685, a0753dc, 8d111d9) |
+| 4 | MCP ecosystem (SEPs, python-sdk, awslabs/mcp) | modelcontextprotocol.io / github.com/modelcontextprotocol / github.com/awslabs/mcp | 2026-05-15 | SEP-2575 merge + python-sdk v1.27.1 + 2 awslabs PRs + SEP-2577 |
+| 4a | FastMCP (v3.3.0 + active PRs) | github.com/PrefectHQ/fastmcp + pypi.org/project/fastmcp | 2026-05-15 | 1 release (3.3.0) + 4 active PRs |
+| 4b | Agentic UI/UX (MCP blog, assistant-ui, Linear, Cursor, Anthropic, NN/g) | blog.modelcontextprotocol.io + linear.app + cursor.com + anthropic.com + nngroup.com | 2026-05-15 | 2 in-window posts (Linear Code Intelligence, Cursor cloud envs) + 1 assistant-ui CI release + NN/g 404 noted |
+| 5 | Frontier models (Anthropic, OpenAI, Google, Meta) | anthropic.com / openai.com / blog.google / ai.meta.com | 2026-05-15 | 2 Anthropic non-technical + 3 OpenAI deprecations + 0 Google/Meta |
+| 6 | Agent harness (Claude Code CHANGELOG, claude-cookbooks, Anthropic engineering) | github.com/anthropics/* | 2026-05-15 | 3 Claude Code releases (2.1.139/141/142) + 3 new cookbook artifacts |
+| 7 | AWS Bedrock pricing + quota | aws.amazon.com/bedrock/pricing | 2026-05-15 | 0 detected changes; AgentCore Payments launch context only |
+| 8 | Community (HN Algolia + Reddit) | hn.algolia.com + reddit.com | 2026-05-15 | 0 HN in-window; Reddit `.rss` confirmed domain-blocked via WebFetch |
+
+## Web Budget
+
+Used: ~46 / 50 requests (target).
+- AWS Bedrock/AgentCore: 4–5 (one 404 on a `/whats-new/2026/05/` aggregator)
+- Strands: 4 (mostly `gh api`)
+- Reference repo: 4 (`gh api`)
+- MCP ecosystem: 4 (mostly `gh api`, 1 web)
+- FastMCP: 2 `gh api`
+- Agentic UI/UX: 6 (one 404 on NN/g topic URL)
+- Frontier models: 6 (1 over sub-budget — OpenAI 403 backfilled via search)
+- Agent harness: 4
+- Pricing: 3
+- AgentCore SDK: 4 (mostly `gh api`)
+- Community: 3 (HN + 2 Reddit attempts that failed fast)
+- Cookbook: 2 `gh api` (no web)
+
+Skipped (unreachable):
+- Reddit `.rss` — domain-blocked via WebFetch (confirmed). Closing the existing queue item as infeasible-via-WebFetch.
+- NN/g AI topic URL `https://www.nngroup.com/topic/artificial-intelligence/` 404'd. Try `/articles/?topic=ai` or search next week.
+- `https://aws.amazon.com/about-aws/whats-new/2026/05/` 404'd. RSS feed covers this anyway.
+
+Skipped (other): seasonal sources (out of window).
+
+Notes:
+- Frontier-models sub-budget overshot by 1 (OpenAI 403 required search backfill — same failure mode as bootstrap).
+- Bedrock pricing-page extraction returned only Claude 3.5 entries for the second week running. If a third scan misses, switch to `curl` + grep instead of WebFetch summarization.
diff --git a/docs/kaizen/review-queue.md b/docs/kaizen/review-queue.md
new file mode 100644
index 00000000..d97d94cd
--- /dev/null
+++ b/docs/kaizen/review-queue.md
@@ -0,0 +1,114 @@
+# Kaizen Review Queue
+
+Items added by `kaizen-research`, consumed by `kaizen-review-prep`.
+
+## Open
+
+### [2026-05-15] Wire per-tool `duration_ms` into `tool_result` SSE
+- **Source**: research/2026-05-15.md ▸ Top 5 #5 — Claude Code 2.1.141 hook pattern
+- **Surface**: backend (Strands `AfterToolCall` hook) + frontend (`<tool-result>` component — inline timing badge for `> 250ms`)
+- **Effort × Impact**: L-M × M-H
+- **Subtracts**: partial — single hook-driven field replaces any ad-hoc per-tool timing; pre-paves the planned context-attribution prototype
+- **Unlocks**:
+  - Per-tool timing visibility in the UI (which slow tool is the bottleneck on this turn?)
+  - Data substrate for the planned context-attribution prototype — separates tool latency from token cost
+- **Status**: open — surfaced in reviews/2026-05-15.md ▸ Proposal #3 (Ship); no decision logged yet
+
+### [2026-05-15] Investigate inference-api deploy — new images reach ECR but Runtime isn't rolled (issue #288)
+- **Source**: reviews/2026-05-15.md ▸ Proposal #10 (new from internal friction, issue #288 May 12). Pairs with the 1.6.4 → 1.9.1 bump (same SDK package owns `update_agent_runtime`).
+- **Surface**: cross-cutting — `.github/workflows/deploy-inference-api.yml` + bedrock-agentcore SDK `update_agent_runtime` call shape
+- **Effort × Impact**: L-M × M-H
+- **Subtracts**: possibly — removes the manual-redeploy band-aid that's been the workaround
+- **Status**: open — surfaced in reviews/2026-05-15.md ▸ Proposal #10 (Ship — recommended ship-first); no decision logged yet. **Friction intensifying**: 6+ "Deploy Inference API" failures May 15–17; a new "Deploy App API" failure cluster (8× May 16–17) may share a root cause.
+
+### [2026-05-10] Scope AgentCore Runtime BYO filesystem (S3 Files / EFS) for persistent agent workspaces
+- **Source**: research/2026-05-10.md ▸ AWS Bedrock / AgentCore (re-evaluated 2026-05-10 via strategic-lens follow-up — original framing under-weighted the capability-unlock angle)
+- **Surface**: backend (`inference-api` invocation handler reads/writes mount) + infrastructure (VPC config, IAM mount permissions, S3 Files or EFS access points, per-user prefix/access-point layout for RBAC); ADR-worthy
+- **Effort × Impact**: H × H
+- **Subtracts**: no — pure capability addition
+- **Unlocks**:
+  - Code-interpreter / persistent agent workspace (artifacts survive turn and session boundaries)
+  - Cross-session file uploads — PDFs/spreadsheets persist between conversations instead of re-staging per session
+  - Shared skill/template/prompt hot-swap without redeploying the runtime container
+  - A2A multi-agent intermediate-result handoff via shared mount
+  - Persistent vector indexes / embedding caches — avoids cold-start rebuild
+- **Open questions**: GA vs preview status (March 2026 managed session storage was preview; May 2026 BYO needs verification); VPC requirement is a new architectural surface for the runtime; multi-tenancy isolation strategy (per-user S3 prefix vs per-user EFS access point); RBAC mount-path layout; runtime data plane still only proxies `/invocations` + `/ping` so this doesn't unlock new HTTP routes
+- **Status**: open — deferred 4 weeks in reviews/2026-05-15.md (revisit 2026-06-12). MCP Apps host renderer is the dominant strategic initiative this cycle; layering another ADR-worthy bet on top would double the open architectural surface.
+
+### [2026-05-10] Audit `BedrockModel.stream` cancellation path against Strands #2266
+- **Source**: research/2026-05-10.md ▸ Top 6 #4
+- **Surface**: backend
+- **Effort × Impact**: L × M-H
+- **Subtracts**: no — defensive (SSE-disconnect path is hot)
+- **Status**: open — surfaced in reviews/2026-05-15.md ▸ Proposal #8 (Ship); no decision logged yet
+
+### [2026-05-10] Audit `oauth_required` SSE flow against ref-repo's mid-tool-call 401/403 handling
+- **Source**: research/2026-05-10.md ▸ Risks
+- **Surface**: backend
+- **Effort × Impact**: M × H
+- **Subtracts**: no — defensive
+- **Status**: open — deferred 2026-05-10 until 2026-05-24. BFF parade declared done via #297 (May 14), so deferral conditions have cleared a week early; reviews/2026-05-15.md holds to original revisit date to give one stable week.
+
+### [2026-05-10] Named A2A agent participants in the chat UI
+- **Source**: research/2026-05-10.md ▸ Agentic UI/UX ▸ Linear Agent pattern. Reinforced by research/2026-05-15.md Linear Code Intelligence 5× usage-growth datapoint.
+- **Surface**: frontend (extend message model with `agent_identity`, distinct avatar/name/styling)
+- **Effort × Impact**: L-M × M
+- **Subtracts**: no — additive but pattern-validated across Linear/ChatGPT/Cursor
+- **Status**: open — deferred 4 weeks in reviews/2026-05-15.md (revisit 2026-06-12). Earns its keep when an A2A construct lands.
+
+## Resolved
+
+### [2026-05-15] Strands 1.39 → 1.40 bump (token-count audit + compaction double-fire check) → RESOLVED — shipped
+- **Decision**: Ship — reviews/2026-05-15.md ▸ Proposal #6
+- **Reasoning**: Shipped in PR #340 (`chore(deps): bump strands-agents 1.39.0 → 1.40.0`, merged 2026-05-18). Audit outcome: **accept the new `use_native_token_count=False` default** — the flag gates only `BedrockModel.count_tokens()`, which nothing in our cost / context-% paths reads (those read native Bedrock Converse `usage`); pinning `True` would add a redundant CountTokens API call per invocation. Compaction double-fire **confirmed absent** — Strands proactive compression is opt-in (`proactive_compression=None` default), operates on `ConversationManager` not our `TurnBasedSessionManager`; the `compaction` SSE event still emits exactly once (PR #243 invariant preserved; new regression test `test_compaction_sse_emit_once.py`). Full local backend suite: 2887 passed / 3 skipped on 1.40.
+- **Reviewed-in**: reviews/2026-05-15.md ▸ Proposal #6
+
+### [2026-05-10] Promote tool-result rendering to a per-tool renderer registry (MCP Apps PR #0) → RESOLVED — shipped
+- **Decision**: Ship — reviews/2026-05-15.md ▸ Proposal #5
+- **Reasoning**: Shipped in PR #339 (`refactor(chat): tool-result renderer registry (MCP Apps PR #0)`, merged 2026-05-18). Pure refactor — implicit text/JSON/image switch lifted into a signal-backed `ToolRendererRegistryService` keyed by tool name; `DefaultToolResultComponent` reproduces prior markup verbatim (zero user-visible change); `calculator` / `fetch_url_content` / `create_visualization` migrated as proof points. 1014/1014 frontend tests green (14 new, DI-token overrides not `vi.mock`). Unblocks MCP Apps PR #1; the PR #4 MCP App renderer now plugs in as just-another-registered-renderer.
+- **Reviewed-in**: reviews/2026-05-15.md ▸ Proposal #5
+
+### [2026-05-15] Bump `bedrock-agentcore` 1.6.4 → 1.9.1 → RESOLVED — shipped
+- **Decision**: Ship — reviews/2026-05-15.md ▸ Proposal #1
+- **Reasoning**: Shipped in PR #337 (`chore(deps): bump bedrock-agentcore 1.6.4 → 1.9.1 (+ coupled boto3 1.43.9)`, merged 2026-05-18). Closes the structural version-pin lag now that Dependabot version-updates are disabled (#293); first proof the kaizen loop catches lag without Dependabot.
+- **Reviewed-in**: reviews/2026-05-15.md ▸ Proposal #1
+
+### [2026-05-15] Audit and fix `/ping` to emit `time_of_last_update` (#471) → RESOLVED — shipped
+- **Decision**: Ship — reviews/2026-05-15.md ▸ Proposal #2
+- **Reasoning**: Shipped in PR #338 (kaizen bundle, merged 2026-05-18). `/ping` now emits an integer `time_of_last_update` + corrected `Healthy` casing. Accepted trade-off documented in the PR: a fresh per-ping timestamp disables ping-based idle reaping for this runtime — we can't report `HealthyBusy` without async-task busy tracking (deferred `async_mode` work).
+- **Reviewed-in**: reviews/2026-05-15.md ▸ Proposal #2
+
+### [2026-05-15] Defensive A2A AgentCard `capabilities={"streaming": True}` check → RESOLVED — guard documented
+- **Decision**: Ship (docs-only) — reviews/2026-05-15.md ▸ Proposal #4
+- **Reasoning**: Resolved in PR #338 (merged 2026-05-18). A2A is client-only today (no server `AgentCard` exists), so there is no code site to patch. Added a forward-looking guard to `CLAUDE.md`: the first A2A server construct MUST advertise `capabilities` with `streaming=True`, else A2A clients hang ~40 min (ref-repo `50c9112`).
+- **Reviewed-in**: reviews/2026-05-15.md ▸ Proposal #4
+
+### [2026-05-10] Close issues #266 and #267 — features already in our Strands 1.39 pin → RESOLVED — decided (NOT closed; premise corrected)
+- **Decision**: Decided, premise corrected — reviews/2026-05-15.md ▸ Proposal #7 (via PR #338)
+- **Reasoning**: The review's "phantom tech debt — close them" framing was **wrong**. #266 (large tool-result offload) and #267 (context-window lookup fallback) are live, well-specified Strands adoption/wiring tasks whose 1.39 precondition is now met. Decision (PR #338, GitHub-only): posted "unblocked, keep open" comments on both — NOT closed. Logged in decisions.md so future research does not re-propose closing them.
+- **Reviewed-in**: reviews/2026-05-15.md ▸ Proposal #7
+
+### [2026-05-10] Replace dead source URLs in `kaizen-research` skill (+ starter-toolkit slug) → RESOLVED — shipped
+- **Decision**: Ship — reviews/2026-05-15.md ▸ Proposal #9
+- **Reasoning**: Shipped in PR #338 (merged 2026-05-18). Replaced/dropped dead source URLs in `kaizen-research/SKILL.md`; fixed `aws/amazon-bedrock-agentcore-*` → `aws/bedrock-agentcore-*` slug — the review flagged the starter-toolkit; the sdk-python line had the same typo and was also fixed.
+- **Reviewed-in**: reviews/2026-05-15.md ▸ Proposal #9
+
+### [2026-05-10] Add Reddit `.rss` or Reddit MCP to `kaizen-research` → RESOLVED — declined
+- **Decision**: Decline — reviews/2026-05-15.md ▸ Retirement Candidates
+- **Reasoning**: research/2026-05-15.md confirmed Reddit is blocked at the *domain* level via WebFetch (not just the HTML path), so the proposal as scoped is infeasible. Logged in decisions.md; revisit only if a Reddit MCP or `curl`-via-Bash-with-UA-header path becomes available.
+- **Reviewed-in**: reviews/2026-05-15.md ▸ Retirement Candidates
+
+### [2026-05-10] Scope an MCP Apps host renderer in our chat (multi-PR initiative) → RESOLVED — scoping landed
+- **Decision**: Ship (scope only) — reviews/2026-05-10.md ▸ Proposal #1
+- **Reasoning**: Scoping doc `docs/kaizen/scoping/mcp-apps-host-renderer.md` landed in PR #296 (May 14, 2026). Four open architectural questions locked: sandbox-proxy origin, app-initiated `tools/call` plumbing, `ui/update-model-context` storage in Strands `agent.state`, full v1 method scope. PR #0 → PR #6 sequence defined; build work is now tracked via the renderer-registry queue item (PR #0 of that sequence).
+- **Reviewed-in**: reviews/2026-05-10.md ▸ Proposal #1
+
+### [2026-05-10] Triage Nightly Build & Test failure cluster (9× since May 6) → RESOLVED — fixed
+- **Decision**: Ship — reviews/2026-05-10.md ▸ Proposal #6
+- **Reasoning**: PR #290 (`Fix e2e testing in nightly`, May 12) landed. The Nightly Build & Test workflow has been silent since — research/2026-05-15.md confirms 0 failures in the May 10–15 window. Loop caught and resolved CI hygiene.
+- **Reviewed-in**: reviews/2026-05-10.md ▸ Proposal #6
+
+### [2026-05-10] Bump `bedrock-agentcore` 1.6.4 → 1.9.0 → RESOLVED — superseded
+- **Decision**: Superseded
+- **Reasoning**: Replaced by the 2026-05-15 re-prioritized entry (`1.6.4 → 1.9.1`) — lag widened from 3 → 4 versions in window, and Dependabot version-updates were disabled by #293 (May 13), so the lag is now structural rather than incidental. The re-prioritized entry shipped in PR #337.
+- **Reviewed-in**: reviews/2026-05-15.md ▸ Proposal #1
diff --git a/docs/kaizen/reviews/2026-05-10.md b/docs/kaizen/reviews/2026-05-10.md
new file mode 100644
index 00000000..e48f395b
--- /dev/null
+++ b/docs/kaizen/reviews/2026-05-10.md
@@ -0,0 +1,207 @@
+# Kaizen Review — Sunday, May 10, 2026
+> Prepared 11:00am MT. Review window: May 3 – May 10, 2026 (7 days).
+> Source: research/2026-05-10.md + review-queue.md (8 open items).
+> **Bootstrap run** — no prior reviews, no prior-week POC findings, no Carried Over items. Scope evolved mid-bootstrap: added FastMCP, library-native subtraction lens, and Agentic UI/UX lens; removed security posture lens (security is handled by Dependabot/CodeQL and doesn't need a weekly kaizen pass).
+
+## Week in Review
+
+Two themes braid this week. **Agentic UI/UX has shifted under us**: MCP Apps shipped to production with adoption from Claude Desktop, ChatGPT, VS Code Copilot, Goose, and Postman; the design conversation has moved from "what should an agent chat look like" to "how do we host other people's UIs in our chat." Our text+JSON tool-result rendering is now the baseline competitors are extending past. **Upstream-shrinks-our-backlog**: Strands v1.37/v1.38 quietly closed our open issues #266 and #267, and `bedrock-agentcore` is 3 minor versions behind with likely fixes for two SDK issues we feel. The BFF migration is still v1 (5 of 8 commits this week are auth follow-ups) and CI is unreliable. Net read: a "scope the big UX investment, harvest the upstream gains, stabilize hygiene" week.
+
+## Friction — the week's signal
+
+### Repeated patterns (≥2 occurrences)
+- **CI deploy failures (6+ since May 6)** across Inference API, App API, and Frontend deploys.
+  - *Hypothesis*: BFF migration introduced env-var or stack drift not caught in synth (most failures cluster around the same auth/config landscape that's still being patched).
+  - *Candidate fix*: cross-check the most recent failed deploy log against the beta.23 → beta.24 stack diff; the most likely suspects are missing/renamed SSM parameters from the public-PKCE-client decommission (`/auth/cognito/app-client-id` removed) or the new `BFF_*` env vars.
+- **Nightly Build & Test failures (9× since May 6)** — concentrated, untriaged.
+  - *Hypothesis*: known flakiness from issue #220 (`test_cognito_idp_service`, `test_oauth_repositories`, `test_auth_providers*` order-dependent) compounding under the BFF-heavy week's churn.
+  - *Candidate fix*: triage one failure log end-to-end. Either the root is #220 (then #220 needs to land) or it's a real regression masked by the noise.
+- **BFF/auth fix churn (5 of 8 commits this week)** — #270, #271, #273, #274, #275, #277.
+  - *Hypothesis*: BFF migration is a v1, not a v1.1; expect 1–2 more weeks of follow-ups before declaring beta.25.
+  - *Candidate fix*: not a fix per se — adjust release-cut timing for beta.25 to wait for the churn to settle.
+
+### One-offs worth watching
+- **`bedrock-agentcore` 1.6.4 → 1.9.0 lag** with a release published *inside* the scan window (May 7) — see proposal #2.
+- **OpenAI displaces GPT-5.3 Instant with GPT-5.5 Instant** — model selector audit may be needed (not proposed this week; check first, then decide, per Risks Acknowledged But Not Acted On below).
+- **Strands #2266 `BedrockModel.stream` cancel leak** — filed May 9; see proposal #4.
+
+### Silence that matters
+- **No modifications to 6 of 9 skills in 60+ days** (angular-best-practices, tailwind-ui, frontend-design, cdk-infrastructure, versioning, cors-deployment) — modification freshness is a weak signal for skills since they encode stable conventions; **not enough to act**, but worth instrumenting invocation telemetry if we want to make this a reliable retirement signal in the future.
+- **HN was quiet on stack keywords this week** (0 hits in Algolia 7-day window) — not a problem; just a confirmation that this is an internal-momentum week, not a community-momentum week.
+- **`anthropics/courses` quiet since Nov 2025** — proposal #9 below proposes dropping it from the scan list.
+
+## Proposals — ranked
+
+### 1. Scope an MCP Apps host renderer in our chat (multi-PR initiative)
+- **Source**: research/2026-05-10.md ▸ Top 6 #1 ▸ Agentic UI/UX | review-queue.md (open)
+- **Surface area**: frontend (new `<mcp-app-frame>` Angular component, tool-result rendering pipeline) + backend (new SSE event `ui_resource`; likely a `ui_consent_required` cousin of `oauth_required`)
+- **Change**: implement the host side of MCP Apps (SEP-1865) — sandboxed iframe rendering `ui://` resources returned by MCP tools, with the `ui/`-prefixed JSON-RPC dialect over `postMessage`. Treat as a multi-PR initiative: (a) scoping/architecture doc + spec checklist, (b) SSE event + plumbing, (c) iframe component + postMessage bridge, (d) consent UX, (e) end-to-end demo with one example from `ext-apps/examples`.
+- **Subtracts**: no — pure addition. Justified: Claude Desktop, ChatGPT, VS Code Copilot, Goose, and Postman ship this. Every third-party MCP server we connect could be shipping UIs richer than text+JSON; without a host, we leave that on the table.
+- **Effort**: High (multi-PR; new SSE protocol event; sandboxed-iframe component; consent UX)
+- **Impact**: High (strategic — agentic UI standard our chat is being measured against)
+- **POC findings**: not POCed.
+- **Ship means**: **scope this week, build over 3-4 weeks.** Open a scoping issue + architecture doc this week — not a code PR yet. Code PRs follow in subsequent sprints.
+- **Decline means**: stay on text+JSON tool results; revisit when a third-party MCP server we connect ships an MCP App and we can't render it (the forcing function).
+- **Recommendation**: **Ship (scope this week).** Highest strategic value of any item. Proposal #3 pre-paves it.
+
+### 2. Bump `bedrock-agentcore` 1.6.4 → 1.9.0
+- **Source**: research/2026-05-10.md ▸ Top 6 #2 | review-queue.md (open)
+- **Surface area**: backend (`backend/pyproject.toml`, `backend/uv.lock`)
+- **Change**: pin update + smoke-test memory + identity flows in dev; verify upstream CHANGELOG between 1.6 and 1.9 for any breaking changes; close out open SDK issues #456 (OTEL trace detach) and #452 (event-loop blocking) if 1.9 addresses them.
+- **Subtracts**: addition only — justified by 3 versions of upstream fixes including likely-relevant ones to OTEL trace detach and AgentCoreMemorySessionManager event-loop blocking.
+- **Effort**: Low
+- **Impact**: Medium (observability + concurrency)
+- **POC findings**: not POCed — bootstrap run.
+- **Ship means**: open a PR updating `pyproject.toml` and `uv.lock`; smoke-test memory write/read + identity OAuth flows in dev; if smoke passes, merge.
+- **Decline means**: keep at 1.6.4 for another week; revisit after 1.10 ships or after we observe a memory/identity issue.
+- **Recommendation**: **Ship.** Lowest effort × medium impact; clean upstream-harvest win.
+
+### 3. Promote tool-result rendering to a per-tool renderer registry (signal-backed)
+- **Source**: research/2026-05-10.md ▸ Top 6 #3 ▸ Agentic UI/UX (AI SDK + Cursor canvases) | review-queue.md (open)
+- **Surface area**: frontend (`<tool-result>` component + new `ToolRendererRegistry` service)
+- **Change**: today our tool-result rendering is (implicitly) a switch in one place. Promote to a signal-backed registry keyed by tool name; per-tool renderers live next to the tool definition. Independently valuable AND the natural extension point for proposal #1 (MCP Apps becomes "just another registered renderer that emits an iframe").
+- **Subtracts**: partial — replaces an implicit switch with an explicit registry; the registry itself is new code, but it absorbs scattered tool-specific UI logic into one declarative table.
+- **Effort**: Medium
+- **Impact**: Medium-High (improves current tool-result UX AND pre-paves MCP Apps host)
+- **POC findings**: not POCed.
+- **Ship means**: open a PR that introduces the registry service + migrates 2-3 current tool renderers as a proof point.
+- **Decline means**: keep the implicit switch; pay the cost when proposal #1 lands.
+- **Recommendation**: **Ship.** Best risk-adjusted UX investment — value standalone AND scaffolding for proposal #1.
+
+### 4. Audit `BedrockModel.stream` cancellation path against Strands #2266
+- **Source**: research/2026-05-10.md ▸ Top 6 #4 | review-queue.md (open)
+- **Surface area**: backend (`backend/src/agents/main_agent/` stream coordinator + SSE handler)
+- **Change**: locate every `BedrockModel.stream` cancellation path; ensure each `await`s the inner task on cancel so it doesn't orphan; add a dev-only assertion / log filter to detect "Task exception was never retrieved" before it reaches prod.
+- **Subtracts**: addition only — defensive.
+- **Effort**: Low
+- **Impact**: Medium-High (SSE-disconnect path is hot; orphan-task pressure is silent until it isn't)
+- **POC findings**: not POCed.
+- **Ship means**: open a PR with the audit + fixes + a regression test that triggers cancel + asserts no orphan tasks.
+- **Decline means**: log a tech-debt issue; revisit if "Task exception was never retrieved" appears in CloudWatch.
+- **Recommendation**: **Ship.** Pairs naturally with proposal #2 (same backend area, same week's Strands signal). Cheap insurance.
+
+### 5. Close issues #266 and #267 — features already in our Strands 1.39 pin
+- **Source**: research/2026-05-10.md ▸ Top 6 #5 | review-queue.md (open)
+- **Surface area**: cross-cutting — GitHub issues + small wiring in `stream_coordinator` and Code Interpreter / spreadsheet tool-result handling
+- **Change**:
+  1. Comment on #266 + #267 pointing at upstream PRs (Strands #2249 for context-window lookup; v1.38.0 release notes for large tool result offload).
+  2. Verify the upstream features are *automatically* active under our 1.39 pin — if not, file replacement issues for the wiring work and link them.
+  3. Close #266 + #267.
+- **Subtracts**: **yes — library-native subtraction. Retires 2 "build from scratch" tickets; replaces with at-most 2 "wire upstream feature" tasks.**
+- **Effort**: Low
+- **Impact**: Medium (closes phantom tech debt; clears the issue list)
+- **POC findings**: not POCed.
+- **Ship means**: 30-minute issue-grooming pass; comment + close + (if needed) file 2 smaller follow-ups.
+- **Decline means**: leave #266 + #267 open; future kaizen runs will re-flag them.
+- **Recommendation**: **Ship.** Highest *subtraction* yield this week. The clearest demonstration of the kaizen loop earning its keep.
+
+### 6. Triage Nightly Build & Test failure cluster (9× since May 6)
+- **Source**: research/2026-05-10.md ▸ Top 6 #6 | review-queue.md (open)
+- **Surface area**: cross-cutting / CI (`.github/workflows/nightly-*.yml`, possibly `tests/shared/test_cognito_idp_service.py` per issue #220)
+- **Change**: pull the most recent failure log; trace to root cause; if root is issue #220 (test isolation), bump #220 to blocking and land it; if it's a different cause, file and resolve.
+- **Subtracts**: possibly — if root is #220, fixing it materially simplifies the suite (removes a tech-debt entry).
+- **Effort**: Low-Medium (worst case: a real regression hiding under the noise)
+- **Impact**: Medium-High (CI signal is currently unreliable; that affects *every* PR review)
+- **POC findings**: not POCed.
+- **Ship means**: 1-2 hour triage pass; either fix in a small PR or bump #220 to blocking.
+- **Decline means**: continue ignoring nightly failures; eventually a real regression will hide here.
+- **Recommendation**: **Ship.** This is independent of the kaizen-loop work above; it's hygiene. If the kaizen review is the venue that finally surfaces it, that's a kaizen win.
+
+### 7. Audit `oauth_required` SSE flow against ref-repo's mid-tool-call 401/403 handling
+- **Source**: research/2026-05-10.md ▸ Risks | review-queue.md (open, deferred)
+- **Surface area**: backend (`apis/shared/oauth/agentcore_identity.py`, SSE event emission in `inference-api`, MCP/A2A tool wrappers)
+- **Change**: walk through the code paths where an external OAuth provider (Google etc.) returns 401/403 mid-stream; confirm the response is `oauth_required` SSE re-emission (consent-resume), not a streamed tool error to the user. Add a regression test if missing.
+- **Subtracts**: addition only — defensive; closes a real UX gap when upstream tokens revoke.
+- **Effort**: Medium (audit + likely 1-2 small fixes)
+- **Impact**: High (user-visible UX; OAuth token revocation does happen)
+- **POC findings**: not POCed.
+- **Ship means**: open a tech-debt issue with the audit findings; fix in a follow-up PR if anything is broken.
+- **Decline means**: assume current behavior is correct; revisit if a user reports a stuck-OAuth conversation.
+- **Recommendation**: **Defer 2 weeks (revisit 2026-05-24).** Highest-impact backend proposal but BFF auth is still settling — landing this in the middle of the BFF patch parade risks compounding the churn. Wait for BFF to stabilize (likely beta.25), then audit cleanly.
+
+### 8. Named A2A agent participants in the chat UI
+- **Source**: research/2026-05-10.md ▸ Agentic UI/UX ▸ Linear Agent pattern | review-queue.md (open)
+- **Surface area**: frontend (extend message model with `agent_identity`; distinct avatar / name / styling for A2A sub-agent turns instead of nesting under generic `tool_use` cards)
+- **Change**: when an A2A sub-agent produces output, render it as a distinctly attributed turn (named, avatar'd) rather than as a nested tool result. SSE already carries enough information; this is mostly a rendering change.
+- **Subtracts**: no — additive but pattern-validated across Linear, ChatGPT agents, and Cursor multi-agent flows.
+- **Effort**: Low-Medium
+- **Impact**: Medium (legibility of multi-agent runs; user understanding of "who did what")
+- **POC findings**: not POCed.
+- **Ship means**: small PR extending the message model + a `<agent-turn>` Angular component variant.
+- **Decline means**: A2A sub-agent activity continues to be nested under tool cards; users can't easily tell when a different "actor" is responding.
+- **Recommendation**: **Ship.** Low-effort UX win that future-proofs the chat for the A2A direction we're already heading.
+
+### 9. Replace dead source URLs in `kaizen-research` skill + drop `anthropics/courses`
+- **Source**: research/2026-05-10.md ▸ Retirement candidates | review-queue.md (open)
+- **Surface area**: skills (`.claude/skills/kaizen-research/SKILL.md`)
+- **Change**:
+  - Replace `https://aws.amazon.com/bedrock/whats-new/` (404) with the AWS What's New RSS feed (already in the skill — drop the dead URL).
+  - Replace `https://docs.claude.com/en/docs/claude-code/release-notes` (301→404) with `https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md`.
+  - Drop `https://github.com/anthropics/courses` from the cookbook source (quiet since Nov 2025).
+- **Subtracts**: **yes — replaces 2 broken URLs with working ones; drops 1 stale source.**
+- **Effort**: Low
+- **Impact**: Low (skill quality)
+- **POC findings**: not POCed.
+- **Ship means**: 5-minute edit to `kaizen-research/SKILL.md`.
+- **Decline means**: leave dead URLs; future runs waste budget on them.
+- **Recommendation**: **Ship.** Trivial, pure subtraction. Bundle with proposal #10 in one skill-update PR.
+
+### 10. Add Reddit `.rss` or Reddit MCP to `kaizen-research`
+- **Source**: research/2026-05-10.md ▸ Risks ▸ "Reddit blocked from WebFetch" | review-queue.md (open)
+- **Surface area**: skills (`.claude/skills/kaizen-research/SKILL.md`)
+- **Change**: switch the community-signal subagent from raw `reddit.com` URLs to `https://www.reddit.com/r/<sub>/.rss` (try first — `.rss` may be allowed where the HTML page isn't), or wire a Reddit MCP server if available.
+- **Subtracts**: no — restores a half-blind source.
+- **Effort**: Low (try `.rss` first; fall back to MCP if blocked)
+- **Impact**: Low-Medium (community signal is one of 11 sources; useful but not load-bearing)
+- **POC findings**: not POCed.
+- **Ship means**: edit the skill's source list.
+- **Decline means**: keep accepting "Reddit blocked" as a known gap.
+- **Recommendation**: **Ship.** Bundle with proposal #9 in a single skill-update PR.
+
+## Carried Over From Prior Reviews
+
+Bootstrap run — none.
+
+## Retirement Candidates
+
+- **`enabled_tools` whitelist debug guidance in `CLAUDE.md`** — Reference repo abandoned this pattern May 3; ours isn't *wrong*, just diverging from the reference. **Recommendation**: monitor; revisit if/when we touch tool-enablement code. Not retire-this-week.
+- **Skills not modified in 60+ days (6 of 9)** — modification freshness alone isn't enough signal to retire skills that encode stable conventions (e.g. `tailwind-ui`, `cdk-infrastructure`). **Recommendation**: **don't retire.** If we want this to be a reliable retirement signal, we'd need invocation telemetry — that's a separate proposal worth filing for a future week.
+
+## Risks Acknowledged But Not Acted On
+
+- **OpenAI GPT-5.3 → 5.5 Instant displacement** — https://openai.com/index/gpt-5-5-instant/ — *what breaks if ignored*: customers using a 5.3 default may hit a deprecation window. **Recommendation**: **Watch until 2026-06-01.** Quick check next week to confirm whether OpenAI is publishing a deprecation date for 5.3; if yes, file a model-selector audit.
+- **MCP Apps adoption window** — every major host shipped support in Q1 2026. The longer we wait, the more we're shaping our tool-result UI in a direction that doesn't compose with where the ecosystem is going. **Recommendation**: scope this week (proposal #1); first code PR by 2026-05-31.
+
+## What Shipped This Week
+
+- **#277 — feat(auth): centralize 401 redirect + proactive session detection** (May 10) — *closes a real refresh-edge UX hole*
+- **#275 — fix(bff): tighten cross-task refresh-lock release + absolute-lifetime guard** — *prevents zombie refresh attempts after lock release*
+- **#274 — fix(bff): replace KMS-wrap data-key bootstrap with Secrets-Manager-generated secret** — *removes a bootstrap race the AES-GCM codec couldn't recover from*
+- **#273 — fix(bff): cross-task cookie-codec & refresh-lock correctness** — *cleanup*
+- **#272 — feat(auth): add SKIP_AUTH=true local-dev bypass with allowlist guard** — *unblocks local dev when Cognito is offline*
+- **#271 — fix(auth): make lava-lamp backdrop dark-mode aware** — *visual polish*
+- **#270 — fix(token-accounting): correct per-message cost and context-window semantics** — *fixes the cost badge accuracy*
+- **#265 — chore(deps): upgrade strands-agents to 1.39.0** — *the upgrade that quietly closes #266 and #267*
+
+## Take
+
+The week's most valuable shipping is the strands-agents 1.39.0 bump (#265) — the team probably doesn't yet know it closed two of our open issues. That's the kaizen loop earning its keep on the upstream-harvest side. The new UI/UX lens — added mid-bootstrap — earned its keep too: it surfaced **MCP Apps** as a production-ready agentic UI standard that every major host already ships, and that our chat doesn't. The most consequential change this week if scoped is **proposal #1 (MCP Apps host renderer)** — high effort but high strategic value. The best risk-adjusted move is **proposal #3 (per-tool renderer registry)** — independently valuable AND pre-paves #1. Quick wins: **#2 (`bedrock-agentcore` bump)** and **#5 (close #266/#267)** demonstrate library-native subtraction. CI (proposal #6) is the loudest non-kaizen problem; surface it here but fix it as hygiene.
+
+---
+
+## Review Protocol (for Phil)
+
+1. Read Friction (2 min).
+2. Mark each Proposal ✅ Ship / ❌ Decline / ⏸ Defer (4-6 min). **10 proposals**; my recommendations: 9 Ship, 1 Defer, 0 Decline (proposal #7 — `oauth_required` audit — is the defer until 2026-05-24).
+3. Same for Risks Acknowledged.
+4. Pick 1-3 to ship this week. Suggested if you only do 3: **#1 (scope MCP Apps host — scoping doc only this week), #2 (bedrock-agentcore bump), #5 (close #266/#267)** — covers strategic, quick-win, and subtraction. If 4: add **#3 (renderer registry)** as the bridge investment.
+
+Target: 12-17 minutes (slightly more than the nominal 10-15 because the bootstrap is larger than a normal weekly review).
+
+## Post-review (separate PRs)
+
+- ✅ Ship items → individual feature PRs over the week. The decision is logged in this doc; the implementation lives elsewhere.
+- ❌ Decline items → appended to `docs/kaizen/decisions.md` with reason so future research doesn't re-propose.
+- ⏸ Defer items → kept open in `review-queue.md` with a "revisit by" date; surface again in the next review when due.
+
+This skill produces the agenda. Implementation never happens here.
diff --git a/docs/kaizen/reviews/2026-05-15.md b/docs/kaizen/reviews/2026-05-15.md
new file mode 100644
index 00000000..610bf82a
--- /dev/null
+++ b/docs/kaizen/reviews/2026-05-15.md
@@ -0,0 +1,207 @@
+# Kaizen Review — Friday, May 15, 2026
+> Prepared 09:50am MT. Review window: May 10 – May 15, 2026 (5 days since the bootstrap review).
+> Source: research/2026-05-15.md + review-queue.md (15 open items at run start).
+> First *regular* review after the 2026-05-10 bootstrap. Two prior-week proposals already landed (#1 MCP Apps scoping → PR #296, #6 nightly-CI triage → PR #290).
+
+## Week in Review
+
+The week's defining move came not from external ecosystem noise but from a single internal decision: **#293 disabled Dependabot version-update PRs on May 13**, which silently promotes this kaizen loop from "nice-to-have radar" to "the only mechanism catching version-pin lag." The first cost of that promotion arrived the same week — `bedrock-agentcore` widened from 3 → 4 releases behind (1.6.4 → 1.9.1, latest May 12), Strands shipped v1.40 with a *breaking* default-flip on `use_native_token_count`, and four ref-repo and SDK signals pointed at silent-failure modes in our code (`/ping` reaping, A2A streaming capability, double-fired `tool_use` SSE, tool-catalog/executor RBAC drift). Externally the agentic-UI storyline that dominated bootstrap week went quiet, but the upstream-shrinks-our-backlog play kept paying off — Strands' proactive context compression and the AgentCore SDK `async_mode` PR #478 both directly intersect work we'd otherwise build. Net read: a "harvest the upstream gains, defuse the silent-failure modes, take responsibility for dependency drift" week.
+
+## Friction — the week's signal
+
+### Repeated patterns (≥2 occurrences)
+- **Inference-API deploy doesn't roll new AgentCore Runtime versions** (issue #288, May 12; 3 Deploy Inference API failures in window) — new container images reach ECR but the live Runtime isn't picked up.
+  - *Hypothesis*: the deploy workflow pushes the image but does not call `update_agent_runtime` (or the SDK equivalent) to bump the Runtime version pointer. Manual redeploys have been the band-aid.
+  - *Candidate fix*: trace the deploy workflow against the AgentCore Runtime versioning model; verify that the step after the ECR push invokes the SDK's `update_agent_runtime`. Pair with the bedrock-agentcore 1.9.1 bump, since the SDK this step would call has moved 4 versions without the workflow noticing.
+- **Version Check workflow failures clustered on Dependabot branches before #293 landed** (5 of 7 in window, May 11) — these are the failures that *prompted* the decision to disable Dependabot version-update PRs.
+  - *Hypothesis*: the Version Check workflow expects human-authored PRs that bump a project VERSION file; Dependabot PRs bump only pinned dependencies, so they tripped the check.
+  - *Candidate fix*: not actionable here — Phil's call to disable Dependabot version-updates (#293) was the decision; this review just inherits its consequence (load-bearing kaizen loop).
+
+### One-offs worth watching
+- **Nightly Build & Test silent since May 8 (last week's #6 worked)** — PR #290 landed May 12 and the cluster has been quiet since. Confirmation that the loop catches and resolves CI hygiene.
+- **BFF parade declared done with #297** (May 14) — `chore(auth): remove dead Bearer-only auth from app_api post-BFF migration` retires the parallel-auth path; beta.26 cut May 14 settles the BFF v1.
+
+### Silence that matters
+- **Zero merged work on prior-review proposals #2, #3, #4, #5, #8, #9** in the 5-day window despite 9 "Ship" recommendations. Phil shipped the two highest-leverage ones (#1 scoping doc, #6 nightly-CI), then the rest of the week's PR throughput (#297–#301) was on admin-shell + UX polish + dead-code removal. *Read*: the loop produced more "Ship" recommendations than the week absorbed. Either accept that recommendation density is meant to give Phil optionality and re-surface the unshipped items here (this review's approach), or the next research run trims to top-3 Ship suggestions. Pick one — drifting between both reads is the failure mode.
+- **AgentCore platform announcements** — zero new GA/preview items this week (last week had BYO filesystem, Memory metadata, Identity-on-ECS, Payments). The capability-unlock pipeline is shallow this week; lean on the carry-overs.
+
+## Proposals — ranked
+
+### 1. Bump `bedrock-agentcore` 1.6.4 → 1.9.1 (re-prioritized; lag widened 3 → 4)
+- **Source**: research/2026-05-15.md ▸ Top 5 #1 | review-queue.md (open since 2026-05-15, supersedes 2026-05-10 entry)
+- **Surface area**: backend (`backend/pyproject.toml`, `backend/uv.lock`)
+- **Change**: pin update + smoke-test memory + identity flows in dev. Verify CHANGELOG between 1.6.4 → 1.9.1 (May 12) for breaking changes. Verify whether 1.9.1 already addresses #456 (OTEL trace detach across asyncio boundaries) — if so, close. Track PR #478 (`async_mode`) for the 1.10.0 follow-up.
+- **Subtracts**: no — pure dep bump. Justified: 4 versions of upstream fixes; Dependabot version-updates are now off (#293), so this won't get there on its own; sets up adoption of PR #478 `async_mode` once 1.10.0 ships (which resolves last week's flagged #452).
+- **Effort**: Low
+- **Impact**: Medium-High
+- **POC findings**: not POCed; recommended in bootstrap review but no PR opened.
+- **Ship means**: open a PR updating `pyproject.toml` and `uv.lock`; smoke-test memory write/read + identity OAuth flows in dev; if smoke passes, merge.
+- **Decline means**: lag widens to 5 next week; eventually a security patch or `async_mode` adoption forces a multi-version jump.
+- **Recommendation**: **Ship.** Highest-priority cleanup of the week; letting it carry over into a third week would be embarrassing. Bundle with proposal #10 (deploy-rolls-runtime fix): same surface area, and the SDK methods are in the same package.
+
+### 2. Audit and fix `/ping` to emit `time_of_last_update` (AgentCore SDK issue #471)
+- **Source**: research/2026-05-15.md ▸ Top 5 #2 | review-queue.md (open since 2026-05-15) — https://github.com/aws/bedrock-agentcore-sdk-python/issues/471
+- **Surface area**: backend (`backend/src/apis/inference_api/` `/ping` handler — one of the two routes the AgentCore Runtime data plane actually serves per CLAUDE.md inference-api boundary)
+- **Change**: grep our ping response shape; if it's just `{"status": "Healthy" | "HealthyBusy"}`, extend to `{"status": ..., "time_of_last_update": <iso-ts>}`. Without that field AgentCore's idle reaper can kill microVMs mid-long-generation even while we're streaming.
+- **Subtracts**: no — defensive against silent microVM reaping on long generations.
+- **Effort**: Low
+- **Impact**: Medium-High (silent failure mode; long agent turns get killed mid-stream)
+- **POC findings**: not POCed.
+- **Ship means**: 15-minute PR — grep the handler, add the field, smoke-test against a long-running tool turn in dev.
+- **Decline means**: keep accepting potential silent microVM reaping on long generations; revisit if a user reports a stuck-mid-stream conversation.
+- **Recommendation**: **Ship.** Cheapest item in the list with a real silent-failure mode. This is exactly the kind of cheap-but-load-bearing fix the kaizen loop should be catching.
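+
+A minimal sketch of the extended response shape, assuming the handler can build its payload from a small helper. `build_ping_payload`, `mark_activity`, and the module-level `_last_update` tracker are hypothetical names, not code from our repo or the AgentCore SDK. The ISO-8601 format follows the `<iso-ts>` placeholder above; the exact timestamp format AgentCore expects should be confirmed against SDK issue #471 before shipping.
+
+```python
+from datetime import datetime, timezone
+
+# Hypothetical tracker: the handler would call mark_activity() whenever the
+# agent makes progress (e.g. a streamed chunk or tool result is emitted).
+_last_update = datetime.now(timezone.utc)
+
+
+def mark_activity() -> None:
+    """Record that the runtime did useful work just now."""
+    global _last_update
+    _last_update = datetime.now(timezone.utc)
+
+
+def build_ping_payload(busy: bool) -> dict:
+    """Return the /ping body with both status and time_of_last_update."""
+    return {
+        "status": "HealthyBusy" if busy else "Healthy",
+        # Assumption: ISO-8601 UTC, per the <iso-ts> placeholder in the proposal.
+        "time_of_last_update": _last_update.isoformat(),
+    }
+```
+
+Calling `mark_activity()` from the streaming loop is what would keep the timestamp fresh during a long generation; a payload whose timestamp never moves would presumably look idle to the reaper.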
+
+### 3. Wire per-tool `duration_ms` into `tool_result` SSE (Claude Code 2.1.141 pattern)
+- **Source**: research/2026-05-15.md ▸ Top 5 #5 | review-queue.md (open since 2026-05-15)
+- **Surface area**: backend (new Strands `AfterToolCall` hook → SSE event emitter) + frontend (`<tool-result>` component: faint inline timing badge when `duration_ms > 250`)
+- **Change**: register a Strands `AfterToolCall` hook capturing `(end - start)` wall-clock per tool invocation; emit on the existing `tool_result` SSE event as `duration_ms`. Frontend renders inline timing badge only above a noise threshold (default 250ms).
+- **Subtracts**: partial — single hook-driven field replaces any ad-hoc per-tool timing.
+- **Unlocks**:
+  - Per-tool timing visibility in the UI (which slow tool is the bottleneck on this turn?)
+  - Data substrate for the planned context-attribution prototype — separates tool latency from token cost
+- **Effort**: Low-Medium
+- **Impact**: Medium-High
+- **POC findings**: not POCed.
+- **Ship means**: one PR: backend hook + SSE field + frontend badge + a unit test that asserts the hook fires on tool completion.
+- **Decline means**: tool-latency stays invisible; context-attribution prototype starts with murkier inputs.
+- **Recommendation**: **Ship.** Pairs naturally with the planned context-attribution work — landing this first means the prototype starts clean. The "unlocks new UI surface" lens applies cleanly here.
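+
+The hook-and-threshold idea above can be sketched without committing to the Strands hook signatures. `ToolTimer`, its `before`/`after` methods, and `should_show_badge` are hypothetical; a real implementation would register equivalents on the Strands tool-call hooks and attach `duration_ms` to the existing `tool_result` SSE payload.
+
+```python
+import time
+
+NOISE_THRESHOLD_MS = 250  # default badge threshold from the proposal
+
+
+class ToolTimer:
+    """Hypothetical helper: tracks wall-clock time per tool invocation."""
+
+    def __init__(self, clock=time.perf_counter):
+        self._clock = clock
+        self._starts: dict[str, float] = {}
+
+    def before(self, tool_use_id: str) -> None:
+        # Called when a tool invocation starts.
+        self._starts[tool_use_id] = self._clock()
+
+    def after(self, tool_use_id: str) -> dict:
+        # Called when the tool completes; returns the extra SSE field.
+        start = self._starts.pop(tool_use_id)
+        elapsed_ms = round((self._clock() - start) * 1000)
+        return {"duration_ms": elapsed_ms}
+
+
+def should_show_badge(duration_ms: int, threshold_ms: int = NOISE_THRESHOLD_MS) -> bool:
+    """Badge rule from the proposal: show only when duration_ms > threshold."""
+    return duration_ms > threshold_ms
+```
+
+Keying the start times by tool-use id keeps overlapping tool calls from clobbering each other, and the injectable `clock` makes the unit test deterministic.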
+
+### 4. Defensive A2A AgentCard `capabilities={"streaming": True}` check
+- **Source**: research/2026-05-15.md ▸ Top 5 #4 | review-queue.md (open since 2026-05-15) — ref-repo commit `50c9112`
+- **Surface area**: backend (wherever we construct A2A AgentCards — search `AgentCard`, `capabilities=`, `agent_card`)
+- **Change**: 30-second grep + read; if the field is missing or `False`, set to `True`. Silent failure mode otherwise: A2A SDK client falls back to non-streaming, never receives `completed`, hangs ~40 minutes.
+- **Subtracts**: no — defensive; silent-failure mode of 40-min timeouts.
+- **Effort**: Low
+- **Impact**: Medium
+- **POC findings**: not POCed.
+- **Ship means**: 30-min grep + PR; if no A2A AgentCard exists in the repo today (single-agent baseline), file a note that the field must be set when the first A2A construct lands.
+- **Decline means**: leave a silent-failure mode active in any future A2A AgentCard work.
+- **Recommendation**: **Ship.** Cheapest defensive item this week.
+
+### 5. Promote tool-result rendering to a per-tool renderer registry (PR #0 of MCP Apps host sequence)
+- **Source**: review-queue.md (open since 2026-05-10) | scoping doc `docs/kaizen/scoping/mcp-apps-host-renderer.md` ▸ PR #0 (pre-work)
+- **Surface area**: frontend (`<tool-result>` / `tool-use.component.ts` + new `tool-renderer-registry.service.ts`)
+- **Change**: lift the implicit tool-result switch into a signal-backed registry keyed by tool name. Default renderer is today's behavior. Migrate 2–3 existing renderers as proof points.
+- **Subtracts**: partial — replaces implicit switch with declarative registry; absorbs scattered tool-specific UI logic into one table. Pre-paves MCP Apps PR #4 (the iframe renderer slots in as just-another-registered-renderer).
+- **Effort**: Medium
+- **Impact**: Medium-High (standalone UX value + scaffolding for the MCP Apps initiative)
+- **POC findings**: not POCed — but Phil locked it in as PR #0 of the MCP Apps sequence on May 14 (#296 scoping doc).
+- **Ship means**: open PR #0 from the scoping doc: registry service + 2–3 migrated renderers, with all existing tool rendering unchanged.
+- **Decline means**: MCP Apps PR sequence is blocked; the implicit switch grows further.
+- **Recommendation**: **Ship.** The scoping decision is already made; this is execution. Best risk-adjusted UX investment of the week.
+
+### 6. Strands 1.39 → 1.40 bump, gated on `use_native_token_count` audit + proactive-compression double-fire check
+- **Source**: research/2026-05-15.md ▸ Top 5 #3 | review-queue.md (open since 2026-05-15)
+- **Surface area**: backend (`backend/pyproject.toml`, `uv.lock`, `apis/shared/` token-metric reads, `agents/main_agent/streaming/`, `TurnBasedSessionManager`)
+- **Change**: (a) audit `apis/shared/` and `agents/main_agent/streaming/` for native-token-count reads and, if we depend on them, either pin `BedrockModel(use_native_token_count=True)` explicitly OR re-route through the heuristic and verify the cost-badge math (recall #270 just touched this); (b) bump the pin; (c) verify proactive context compression (PR #2239) doesn't double-fire with our existing `TurnBasedSessionManager` flush, so that our `compaction` SSE event still emits cleanly; (d) smoke-test cost-badge values across a tool turn before promoting.
+- **Subtracts**: **yes — library-native subtraction.** Strands' proactive context compression reduces the surface area of our custom session-manager compaction logic.
+- **Effort**: Medium
+- **Impact**: Medium-High
+- **POC findings**: not POCed.
+- **Ship means**: PR with the audit + pin bump + a regression test that asserts the `compaction` SSE event still emits exactly once per compaction event.
+- **Decline means**: stay on 1.39 for another week; revisit when 1.41 ships or after a token-accounting issue surfaces.
+- **Recommendation**: **Ship,** but second-priority of the bumps (after #1) — the audit is the careful part.
+
+### 7. Close issues #266 and #267 — features already in our Strands 1.39 pin
+- **Source**: review-queue.md (open since 2026-05-10; carry-over from bootstrap review proposal #5)
+- **Surface area**: cross-cutting — GitHub issues + minor wiring checks in `stream_coordinator` (#267) and large tool-result paths (#266)
+- **Change**: (1) comment on #266 + #267 pointing at upstream PRs; (2) verify the upstream features are *automatically* active under our 1.39 pin — if not, file replacement issues for the wiring work and link them; (3) close #266 + #267.
+- **Subtracts**: **yes — library-native subtraction.** Retires 2 build-from-scratch tickets; replaces with at-most 2 "wire upstream feature" tasks.
+- **Effort**: Low
+- **Impact**: Medium (closes phantom tech debt; clears the issue list)
+- **POC findings**: not POCed; recommended Ship in the bootstrap review but not picked up in the window.
+- **Ship means**: 30-minute issue-grooming pass.
+- **Decline means**: leave #266 + #267 open; future kaizen runs will re-flag them.
+- **Recommendation**: **Ship.** Highest subtraction yield this week. Trivial.
+
+### 8. Audit `BedrockModel.stream` cancellation path against Strands #2266
+- **Source**: review-queue.md (open since 2026-05-10; carry-over from bootstrap review proposal #4)
+- **Surface area**: backend (`backend/src/agents/main_agent/` stream coordinator + SSE handler)
+- **Change**: locate every `BedrockModel.stream` cancellation path; ensure each `await`s the inner task on cancel so it doesn't orphan; add a dev-only assertion / log filter to detect "Task exception was never retrieved" before it reaches prod.
+- **Subtracts**: no — defensive.
+- **Effort**: Low
+- **Impact**: Medium-High (SSE-disconnect path is hot)
+- **POC findings**: not POCed.
+- **Ship means**: PR with the audit + fixes + a regression test that triggers cancel + asserts no orphan tasks.
+- **Decline means**: log a tech-debt issue; revisit if "Task exception was never retrieved" appears in CloudWatch.
+- **Recommendation**: **Ship.** Pairs naturally with proposal #1 (same backend area). Cheap insurance.
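+
+The "await the inner task on cancel" rule can be sketched with plain `asyncio`. This is not the Strands or `BedrockModel.stream` code path; `produce_chunks`, `relay_stream`, and `demo` are hypothetical stand-ins for the model stream, our SSE handler, and a regression test.
+
+```python
+import asyncio
+
+
+async def produce_chunks(queue: asyncio.Queue) -> None:
+    """Hypothetical stand-in for the inner model-stream task."""
+    n = 0
+    while True:
+        await queue.put(f"chunk-{n}")
+        n += 1
+        await asyncio.sleep(0.01)
+
+
+async def relay_stream(queue: asyncio.Queue, state: dict) -> None:
+    """Hypothetical SSE handler that owns an inner producer task."""
+    inner = asyncio.create_task(produce_chunks(queue))
+    state["inner"] = inner
+    try:
+        while True:
+            await queue.get()  # forward the chunk to the client here
+    finally:
+        # On cancel (client disconnect), cancel the inner task AND await it,
+        # so it is never orphaned and its exception is always retrieved.
+        inner.cancel()
+        await asyncio.gather(inner, return_exceptions=True)
+
+
+async def demo() -> bool:
+    queue: asyncio.Queue = asyncio.Queue()
+    state: dict = {}
+    outer = asyncio.create_task(relay_stream(queue, state))
+    await asyncio.sleep(0.05)
+    outer.cancel()  # simulate the SSE client disconnecting
+    await asyncio.gather(outer, return_exceptions=True)
+    return state["inner"].done()
+```
+
+A regression test along the lines of `demo()` could assert that the inner task is done once the outer task has been cancelled, which is the "no orphan tasks" condition named in **Ship means**.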
+
+### 9. Replace dead source URLs in `kaizen-research` skill + AgentCore starter-toolkit slug typo
+- **Source**: review-queue.md (open since 2026-05-10; carry-over from bootstrap review proposal #9) + research/2026-05-15.md ▸ Retirement candidates
+- **Surface area**: skills (`.claude/skills/kaizen-research/SKILL.md`)
+- **Change**: (a) replace `https://aws.amazon.com/bedrock/whats-new/` (404) with the AWS What's New RSS feed; (b) replace `https://docs.claude.com/en/docs/claude-code/release-notes` (301→404) with `https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md`; (c) drop `https://github.com/anthropics/courses` (quiet since Nov 2025); (d) fix `aws/amazon-bedrock-agentcore-starter-toolkit` → `aws/bedrock-agentcore-starter-toolkit` slug (new this week).
+- **Subtracts**: **yes — replaces 2 broken URLs with working ones; drops 1 stale source; fixes 1 slug typo.**
+- **Effort**: Low
+- **Impact**: Low (skill quality; reduces web-budget waste)
+- **POC findings**: not POCed.
+- **Ship means**: 10-minute edit to `kaizen-research/SKILL.md`.
+- **Decline means**: keep wasting web-budget on dead URLs; future runs continue to miss the starter-toolkit repo via the wrong slug.
+- **Recommendation**: **Ship.** Trivial subtraction.
+
+### 10. Investigate inference-api deploy issue #288 — new images reach ECR but Runtime isn't rolled
+- **Source**: research/2026-05-15.md ▸ Internal Audit ▸ Repeated friction (3 Deploy Inference API failures in window) | issue #288 (May 12, open)
+- **Surface area**: cross-cutting — `.github/workflows/deploy-inference-api.yml` + the SDK call shape against the post-bump bedrock-agentcore 1.9.x API
+- **Change**: trace the deploy workflow end-to-end; confirm whether it calls `update_agent_runtime` (or equivalent) after the ECR push; if missing, add it; if present but failing, surface the failure. Naturally pairs with proposal #1 since the SDK that owns this call is the package we're bumping.
+- **Subtracts**: possibly — if the fix removes the manual-redeploy band-aid that's been the workaround.
+- **Effort**: Low-Medium (worst case: an IAM/permission gap on the deploy role)
+- **Impact**: Medium-High (eventually a security patch or model-version bump must ship via this path)
+- **POC findings**: not POCed.
+- **Ship means**: 1–2 hour triage; small PR fixing the workflow + closing #288 when verified.
+- **Decline means**: continue running manual redeploys; eventually something time-critical needs to ship through this path.
+- **Recommendation**: **Ship.** Pair with #1 for shared context on the bedrock-agentcore SDK surface area.
+
+## Carried Over From Prior Reviews
+
+- **`oauth_required` SSE flow audit against ref-repo's mid-tool-call 401/403 handling** (deferred 2026-05-10 until 2026-05-24) — original context: BFF auth was still settling. **Status this week**: BFF parade declared done with #297 (May 14) — the cleanup PR removed the parallel Bearer path entirely. Conditions for the original deferral have cleared a week early. *Recommendation*: keep deferred until 2026-05-24 per original commitment (one more stable week eliminates regression risk) — but surface here as an explicit hold rather than silently leaving on the queue.
+- **Named A2A agent participants in the chat UI** (open since bootstrap, recommended Ship) — not shipped in window. *Recommendation*: defer 4 weeks (revisit 2026-06-12) — single-agent mode is still our baseline; this earns its keep when an A2A construct lands. No re-rank without a trigger.
+- **Scope AgentCore Runtime BYO filesystem (S3 Files / EFS)** (open since bootstrap, no decision recorded) — high-effort high-impact capability unlock. *Recommendation*: defer 4 weeks (revisit 2026-06-12) — MCP Apps host renderer is the dominant strategic initiative; layering another VPC + IAM + storage-architecture push on top doubles the open ADR-worthy bets.
+
+## Retirement Candidates
+
+- **Queue item "Add Reddit `.rss` or Reddit MCP to `kaizen-research`"** — research/2026-05-15.md confirmed Reddit is blocked at the domain level via WebFetch, not just the HTML path. *Recommendation*: **Decline.** Move to `decisions.md` with reason "infeasible via WebFetch; revisit only if Reddit MCP or `curl`-via-Bash with UA header becomes available."
+- **(Soft) Bootstrap-era queue entry "Bump `bedrock-agentcore` 1.6.4 → 1.9.0"** — superseded by re-prioritized 2026-05-15 entry (1.9.1 + lag widened). *Recommendation*: mark Resolved as "Superseded by 1.9.1 entry" when moving queue items.
+- **(System-health check)** Three retirements landed across two weeks (kaizen-research lens churn, dead URLs queued for replacement, Reddit RSS declined this week). The subtraction muscle is exercising. No additional retirements needed *this* week.
+
+## Risks Acknowledged But Not Acted On
+
+- **Dependabot version-update PRs disabled (#293)** — https://github.com/Boise-State-Development/agentcore-public-stack/pull/293 — *what breaks if ignored*: version-pin lag silently widens; security patches in transitive deps don't arrive until something else surfaces them. The kaizen loop is the only mechanism left catching this. — *Recommendation*: **Address now via #1** (bedrock-agentcore bump validates the loop catches lag) + tighten the version-pin diff section of the research skill (direct fetches for `boto3`, `aws-cdk-lib`, `@angular/core`, `pydantic` every run — file as a kaizen-research skill update follow-up).
+- **Strands v1.40 `use_native_token_count` default flip** — https://github.com/strands-agents/sdk-python/pull/2284 — *what breaks if ignored*: if we bump without auditing, token-accounting reading native counts gets heuristic values instead, and #270's per-message-cost / context-% math may quietly drift. — *Recommendation*: **Address now via #6** (the audit is the careful part of the bump).
+- **AgentCore SDK PR #478 (`async_mode`) still in flight; #452 event-loop blocking unfixed in 1.9.1** — *what breaks if ignored*: under sustained load, AgentCoreMemorySessionManager can block the event loop. — *Recommendation*: **Watch until 1.10.0 ships** (likely lands the fix). Re-evaluate in the 2026-05-22 review.
+- **Strands SDK monorepo consolidation announced (issue #2286, May 12)** — *what breaks if ignored*: import paths likely move in a future major; no near-term impact at 1.40. — *Recommendation*: **Watch until v2.x messaging emerges.** Re-evaluate at next minor (1.41).
+
+## What Shipped This Week
+
+- **#301 — feat(sidebar): denser session list with skeleton and entry animation** (May 15) — *UX polish*
+- **#300 — feat(admin): persistent shell layout with grouped sidebar nav** (May 15) — *admin restructure*
+- **#299 — feat(frontend): copy-to-clipboard button on chat code blocks** (May 14) — *UX*
+- **#298 — feat(admin): admin-managed user-menu links** (May 14) — *new admin capability*
+- **#297 — chore(auth): remove dead Bearer-only auth from app_api post-BFF migration** (May 14) — *BFF v1 cleanup; declares the BFF parade done*
+- **#296 — chore(kaizen): add initial scoping document for MCP Apps Host Renderer** (May 14) — *kaizen proposal #1 from 2026-05-10 review; PR #0 → PR #6 sequence locked*
+- **#293 — chore: restrict contributions and disable Dependabot version-update PRs** (May 13) — *the load-bearing decision of the week*
+- **#290 — Fix e2e testing in nightly** (May 12) — *kaizen proposal #6 from 2026-05-10 review; nightly cluster silent since*
+- **beta.26** (May 14) + **beta.25** (May 12) — *two release cuts in window*
+
+## Take
+
+The system is trending toward trust on the loop side (two prior-review proposals shipped, both high-leverage) and toward friction on the dependency-management side (lag widened in the same week the loop became the only guard against it). **The single change that matters most this week is proposal #1 (`bedrock-agentcore` bump)**, not because the bump itself is interesting, but because it proves the kaizen loop can catch lag now that Dependabot can't. **The best risk-adjusted move this week is the bundle of proposals #2 + #4 + #7 + #9**: four items of 30 minutes or less that collectively close 1 silent-failure mode, 1 latent A2A trap, 2 phantom GitHub issues, and 4 dead/typo URLs. Phil should ship all four in a single afternoon and watch the kaizen loop earn its keep. The slower-burn items (#3 per-tool `duration_ms`, #5 renderer registry, #6 Strands 1.40 audit, #8 stream cancel audit, #10 deploy/runtime fix) are the week's real engineering work; pick 1–2 of those to carry through the week. **Do not** add new items to the queue this week: the queue is full, the carry-over count is healthy, and the loop is producing more recommendations than the week is absorbing. That imbalance is fine for now but worth watching.
+
+---
+
+## Review Protocol (for Phil)
+
+1. Read Friction (2 min).
+2. Mark each Proposal ✅ Ship / ❌ Decline / ⏸ Defer (4-6 min). **10 proposals**; my recommendations: 10 Ship, 0 Defer, 0 Decline.
+3. Mark Carried Over: keep oauth audit deferred to 2026-05-24; defer A2A participants + BYO filesystem 4 weeks (1-2 min).
+4. Confirm the Reddit RSS retirement → `decisions.md` (1 min).
+5. Resolve Risks block.
+6. **Suggested 1–3 to ship**: **#1 (bedrock-agentcore bump)**, **the bundle #2+#4+#7+#9** counted as one afternoon (4 cheap items, two of them subtractions), and **#3 (per-tool `duration_ms`)** for the week's real UX investment.
+
+Target: 12-15 minutes.
+
+## Post-review (separate PRs)
+
+- ✅ Ship items → individual feature PRs over the week. The decision is logged here; the implementation lives elsewhere.
+- ❌ Decline items → appended to `docs/kaizen/decisions.md` with reason so future research doesn't re-propose.
+- ⏸ Defer items → kept open in `review-queue.md` with a "revisit by" date; surface again in the next review when due.
+
+This skill produces the agenda. Implementation never happens here.
diff --git a/docs/kaizen/scoping/mcp-apps-budget-allocator.tool.json b/docs/kaizen/scoping/mcp-apps-budget-allocator.tool.json
new file mode 100644
index 00000000..52640ac0
--- /dev/null
+++ b/docs/kaizen/scoping/mcp-apps-budget-allocator.tool.json
@@ -0,0 +1,16 @@
+{
+  "_comment": "Example ToolCreateRequest body for registering the modelcontextprotocol/ext-apps `budget-allocator-server` as an MCP-Apps-capable external MCP tool. POST this to `/admin/tools/` on the App API (requires an admin session). The form-style App exercises ui/update-model-context + app-initiated tools/call without 3D/charting backend infra. Run the example server in HTTP mode first (see the runbook in .github/docs/deploy/step-04-deploy.md): `cd examples/budget-allocator-server && npm run start:http` → http://localhost:3001/mcp. Adjust serverUrl for a deployed server. authType `none` is only appropriate for an unauthenticated local/dev server — use aws-iam / api-key / oauth2 for anything real.",
+  "toolId": "budget_allocator",
+  "displayName": "Budget Allocator",
+  "description": "Interactive budget-allocation MCP App (modelcontextprotocol/ext-apps example) — sliders, presets, and benchmarks. Dogfood server for the MCP Apps host renderer.",
+  "category": "data",
+  "protocol": "mcp_external",
+  "status": "active",
+  "isPublic": false,
+  "enabledByDefault": false,
+  "mcpConfig": {
+    "serverUrl": "http://localhost:3001/mcp",
+    "transport": "streamable-http",
+    "authType": "none"
+  }
+}
diff --git a/docs/kaizen/scoping/mcp-apps-host-renderer.md b/docs/kaizen/scoping/mcp-apps-host-renderer.md
new file mode 100644
index 00000000..66d583c0
--- /dev/null
+++ b/docs/kaizen/scoping/mcp-apps-host-renderer.md
@@ -0,0 +1,170 @@
+# Scoping — MCP Apps Host Renderer
+
+> Status: Scoping (no code yet)
+> Owner: Phil Merrell
+> Source: research/2026-05-10.md ▸ Top 6 #1 ▸ Agentic UI/UX | reviews/2026-05-10.md ▸ Proposal #1 (Ship — scope this week) | review-queue.md (open)
+> Spec read: `specification/2026-01-26/apps.mdx` (normative). Pre-merge step: diff `specification/draft/apps.mdx` against the dated version to catch any movement before PR #1 lands.
+
+## Goal
+
+Implement the host side of the MCP Apps extension (SEP-1865) end-to-end and to spec, so that any MCP server we connect — Gateway-hosted or external — can return interactive UIs alongside text/JSON tool results, and so our chat sits on the agentic-UI standard that Claude Desktop, ChatGPT, VS Code Copilot, Goose, Postman, and MCPJam already meet.
+
+**Out of scope:** authoring MCP Apps (we are a host, not a server-of-apps), MCP-UI / `@mcp-ui/client` framework adoption (we implement the postMessage protocol directly), and any non-MCP-Apps "generative UI" pattern.
+
+## Architectural decisions (locked)
+
+These four questions were open during scoping. Each is now decided; the rationale follows.
+
+### 1. Sandbox origin — new subdomain (Sandbox Proxy pattern)
+
+Stand up a dedicated origin for the outer "sandbox proxy" iframe so `allow-same-origin` does not give iframe content access to the main `ai.client` origin. Pattern matches Claude.ai's web-host implementation.
+
+- **Origin:** `mcp-sandbox.<our-domain>` (exact name TBD in PR #1 — see CDK work).
+- **What it serves:** a single static `proxy.html` shell that itself creates the inner content iframe via `srcdoc` (the inner iframe is where the MCP App HTML actually runs). The outer page is what `ai.client` `postMessage`s to.
+- **Why two iframes:** the spec's "Sandbox Proxy pattern" for web hosts — the inner iframe takes the strict CSP from `_meta.ui.csp`, the outer iframe gives us a stable cross-origin boundary against the host page.
+- **CDK:** new stack `infrastructure/lib/mcp-sandbox-stack.ts` — CloudFront distribution, S3 bucket for `proxy.html`, ACM cert. Flowed through the `cors-deployment` skill for origin allowlisting.
+
+### 2. App-initiated `tools/call` — pipe through inference-api dispatch
+
+When the iframe calls `tools/call`, we surface it as a `tool_use` / `tool_result` event in the active conversation stream. Provenance is preserved — the chat history is a complete audit trail of what the embedded app ran on the user's behalf.
+
+- **Path:** iframe `postMessage` → frontend `mcp-app-frame` → app-api (new endpoint `POST /mcp-apps/proxy-call`) → inference-api → MCP server → reverse path. The inference-api side synthesizes a `tool_use` event into the conversation's SSE stream so it lands in the user's chat thread.
+- **Conversation correlation:** the iframe is bound to the originating `toolUseId` and conversation session at render time; proxied calls inherit that binding.
+- **Visibility enforcement:** the proxy endpoint MUST reject calls for tools whose `visibility` does not include `"app"` — at both the app-api boundary and the inference-api dispatch.
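+
+The visibility rule above can be sketched as a small guard shared by both boundaries. This is a minimal illustration, not the planned implementation: the `ToolDefinition` shape, `isVisibleTo`, `modelTools`, and `assertAppCallable` are hypothetical names; only the `visibility` field and its `"model"` / `"app"` values come from this doc.
+
+```typescript
+// Hypothetical shapes for illustration; only `visibility`, "model" and "app" come from the doc.
+type Visibility = "model" | "app";
+
+interface ToolDefinition {
+  toolId: string;
+  // When a server declares nothing, the tool is visible to both audiences.
+  visibility?: Visibility[];
+}
+
+const DEFAULT_VISIBILITY: Visibility[] = ["model", "app"];
+
+function isVisibleTo(tool: ToolDefinition, audience: Visibility): boolean {
+  return (tool.visibility ?? DEFAULT_VISIBILITY).includes(audience);
+}
+
+// Tools the agent is allowed to see in its tool list.
+function modelTools(tools: ToolDefinition[]): ToolDefinition[] {
+  return tools.filter((tool) => isVisibleTo(tool, "model"));
+}
+
+// Guard for proxied calls: throws unless the tool is app-visible.
+function assertAppCallable(tool: ToolDefinition): void {
+  if (!isVisibleTo(tool, "app")) {
+    throw new Error(`tool ${tool.toolId} is not callable from an MCP App`);
+  }
+}
+```
+
+Running the same guard at the app-api boundary and again at inference-api dispatch keeps the two checks from drifting apart.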
+
+### 3. `ui/update-model-context` storage — Strands `agent.state`
+
+App-supplied context (the structured/text payload from `ui/update-model-context`) lives in Strands `agent.state` under a dedicated key (e.g., `mcp_apps.context[resourceUri]`). This is where the upstream reference repo moved its compaction state on Apr 27 (commit `2b1a13d`) and it's where Strands is heading.
+
+- **Read path:** before each inference turn, merge any pending `agent.state.mcp_apps.context.*` entries into the prompt context, then clear them.
+- **Spec semantics honored:** "host MAY defer context until next user message" and "host SHOULD only send last update if multiple arrive before next user message" — we dedupe by `resourceUri` and apply last-write-wins between turns.
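+
+The dedupe-then-clear behavior described above can be sketched with a plain map keyed by `resourceUri`. This is an in-memory stand-in for the `mcp_apps.context` key, written in TypeScript for consistency with the other examples in this doc; `PendingAppContext`, `ContextPayload`, `update`, and `drain` are hypothetical names, and the real store would live in Strands `agent.state`.
+
+```typescript
+// Hypothetical payload shape: the doc only says the payload is structured and/or text.
+interface ContextPayload {
+  text?: string;
+  structured?: unknown;
+}
+
+class PendingAppContext {
+  private readonly pending = new Map<string, ContextPayload>();
+
+  // Last write wins: a later update for the same resourceUri replaces the earlier one.
+  update(resourceUri: string, payload: ContextPayload): void {
+    this.pending.set(resourceUri, payload);
+  }
+
+  // Called before an inference turn: hand back pending entries, then clear them.
+  drain(): Array<[string, ContextPayload]> {
+    const entries = Array.from(this.pending.entries());
+    this.pending.clear();
+    return entries;
+  }
+}
+```
+
+Because `drain()` clears the map, an update is merged into at most one turn, which matches the "merge, then clear" read path.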
+
+### 4. v1 method scope — full set, no deferrals
+
+Implement every `ui/` method the spec defines and every standard MCP method it permits inside the postMessage channel. Rationale: the user-facing payoff of MCP Apps is highest when the app can both *receive* context (host→app) and *push* it back (app→host) — half-implementing either side cuts off the workflows the spec exists to enable (`ui/message`, `ui/update-model-context`). One feature flag (`MCP_APPS_HOST_ENABLED`) gates the whole surface during rollout.
+
+## Spec compliance checklist
+
+Normative requirements from `apps.mdx` (2026-01-26). Items prefixed with `MUST` are spec-mandated; `SHOULD`/`MAY` items are captured in the PR-level acceptance criteria below.
+
+- **MUST** fetch UI resources via `resources/read` against the `ui://` URI from `_meta.ui.resourceUri` — never inline.
+- **MUST** treat `text/html;profile=mcp-app` as the resource MIME type.
+- **MUST** advertise `capabilities.extensions["io.modelcontextprotocol/ui"]` with `{ mimeTypes: ["text/html;profile=mcp-app"] }` on every outbound MCP `initialize` (Gateway client + external MCP client).
+- **MUST** filter tools whose `_meta.ui.visibility` excludes `"model"` from the agent's tool list (Strands tool registry filter).
+- **MUST** reject `tools/call` proxied from the iframe for tools whose visibility excludes `"app"`.
+- **MUST** set iframe `sandbox="allow-scripts allow-same-origin"` at minimum; grant `camera` / `microphone` / `geolocation` / `clipboard-write` through the iframe `allow` attribute (Permissions Policy; these are not `sandbox` tokens) only if the resource declares them in `_meta.ui.permissions`.
+- **MUST** build CSP from `_meta.ui.csp.{connectDomains, resourceDomains, frameDomains, baseUriDomains}` and apply the spec's deny-by-default defaults. **MUST NOT** allow undeclared domains.
+- **MUST** wait for `ui/notifications/initialized` from the app before sending any request or notification.
+- **MUST** send `ui/notifications/tool-input` with the complete arguments exactly once (before `tool-result`).
+- **MUST** send `ui/resource-teardown` before tearing the iframe down.
+- **MUST** accept `event.origin === "null"` from the sandbox iframe and rely on a per-frame nonce instead of origin matching.
+- **MUST** correlate JSON-RPC over postMessage using request `id` (standard JSON-RPC 2.0 envelope: `{jsonrpc, id, method, params}` / `{jsonrpc, id, result|error}`).
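+
+The last item, correlating responses to requests by `id`, can be sketched as a pending-request table. This is a minimal sketch under stated assumptions: `RpcCorrelator`, `request`, and `handleResponse` are hypothetical names, and the `send` callback stands in for the real `postMessage` call so the example stays transport-agnostic.
+
+```typescript
+interface JsonRpcRequest {
+  jsonrpc: "2.0";
+  id: number;
+  method: string;
+  params?: unknown;
+}
+
+interface JsonRpcResponse {
+  jsonrpc: "2.0";
+  id: number;
+  result?: unknown;
+  error?: { code: number; message: string };
+}
+
+interface PendingEntry {
+  resolve: (value: unknown) => void;
+  reject: (reason: Error) => void;
+}
+
+class RpcCorrelator {
+  private nextId = 1;
+  private readonly pending = new Map<number, PendingEntry>();
+
+  // `send` is whatever transports the envelope (postMessage in the host).
+  request(method: string, params: unknown, send: (msg: JsonRpcRequest) => void): Promise<unknown> {
+    const id = this.nextId++;
+    const promise = new Promise<unknown>((resolve, reject) => {
+      this.pending.set(id, { resolve, reject });
+    });
+    send({ jsonrpc: "2.0", id, method, params });
+    return promise;
+  }
+
+  // Returns false for unknown ids so the caller can drop stray or replayed responses.
+  handleResponse(msg: JsonRpcResponse): boolean {
+    const entry = this.pending.get(msg.id);
+    if (!entry) return false;
+    this.pending.delete(msg.id);
+    if (msg.error) entry.reject(new Error(msg.error.message));
+    else entry.resolve(msg.result);
+    return true;
+  }
+}
+```
+
+Deleting the entry before settling the promise means a duplicate response for the same `id` is ignored rather than resolved twice.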
+
+## PR sequence
+
+Targets `develop`. Each PR is independently mergeable behind `MCP_APPS_HOST_ENABLED=false` until PR #7 flips it on.
+
+### PR #0 — Tool-renderer registry (pre-work; proposal #3 from reviews/2026-05-10.md)
+
+- **Files:** [tool-use.component.ts](frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.ts) + new `tool-renderer-registry.service.ts`.
+- **Change:** lift the implicit tool-result switch in `ToolUseComponent` into a signal-backed registry keyed by tool name. Default renderer is today's behavior (text/JSON/image). Registry exposes `register(toolName, component)`.
+- **Why first:** the MCP App renderer in PR #4 plugs in as just-another-registered-renderer — no special-case branches in `tool-use.component.html`.
+- **Acceptance:** all existing tool renderings work unchanged; no MCP App code yet.
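+
+The registry shape described in this PR can be sketched without Angular. This is an illustration only: `Renderer` is a hypothetical stand-in for a component type, the signal backing is omitted, and `resolve` is an assumed lookup method; only `register(toolName, component)` and the "default renderer is today's behavior" rule come from this doc.
+
+```typescript
+// Stand-in for an Angular component type; a plain function keeps the sketch self-contained.
+type Renderer = (result: unknown) => string;
+
+class ToolRendererRegistry {
+  private readonly renderers = new Map<string, Renderer>();
+  private readonly fallback: Renderer;
+
+  constructor(fallback: Renderer) {
+    this.fallback = fallback;
+  }
+
+  // Later registrations for the same tool name replace earlier ones.
+  register(toolName: string, renderer: Renderer): void {
+    this.renderers.set(toolName, renderer);
+  }
+
+  // Unregistered tools fall back to the default (today's text/JSON/image behavior).
+  resolve(toolName: string): Renderer {
+    return this.renderers.get(toolName) ?? this.fallback;
+  }
+}
+```
+
+With this shape, the MCP App renderer in PR #4 becomes one more `register` call rather than a new branch in the template.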
+
+### PR #1 — Sandbox-proxy origin (CDK)
+
+- **Files:** new `infrastructure/lib/mcp-sandbox-stack.ts`, updates to [`bin/agentcore-public-stack.ts`](infrastructure/bin/agentcore-public-stack.ts) and `cors-deployment` workflow env vars.
+- **Change:** CloudFront + S3 + ACM for `mcp-sandbox.<domain>`; deploy a static `proxy.html` shell implementing the outer-iframe half of the Sandbox Proxy pattern. CSP `frame-ancestors` permits the `ai.client` origin only.
+- **Acceptance:** `mcp-sandbox.<domain>/proxy.html` serves; `ai.client` can `postMessage` to it; no MCP server wiring yet.
+- **Coordinates with** the [cors-deployment skill](.) — every new env var that names this origin flows through that skill.
+
+### PR #2 — Backend: MCP `initialize` extension advertisement + tool-visibility filter
+
+- **Files:** [external_mcp_client.py](backend/src/agents/main_agent/integrations/external_mcp_client.py), [gateway_mcp_client.py](backend/src/agents/main_agent/integrations/gateway_mcp_client.py), [models.py](backend/src/apis/shared/tools/models.py) (add `visibility` to `ToolDefinition`), Strands tool registry adapter.
+- **Change:** advertise `io.modelcontextprotocol/ui` on outbound MCP `initialize`; parse `_meta.ui` off `tools/list` responses onto `ToolDefinition`; filter model-invisible tools out of the Strands agent's tool list.
+- **Acceptance:** unit tests covering a fake MCP server returning a UI-bearing tool — confirm visibility filtering, confirm `_meta.ui.resourceUri` survives the round-trip into our tool catalog.
+
+### PR #3 — Backend: SSE `ui_resource` event + `resources/read` fetch path
+
+- **Files:** [event_formatter.py](backend/src/agents/main_agent/streaming/event_formatter.py), [tool_result_processor.py](backend/src/agents/main_agent/streaming/tool_result_processor.py), [stream_processor.py](backend/src/agents/main_agent/streaming/stream_processor.py), and a new helper for `resources/read` against the MCP server hosting the tool.
+- **Change:** when a tool result references `_meta.ui.resourceUri`, fetch the resource via `resources/read` and emit a new `ui_resource` SSE event: `{type, toolUseId, resourceUri, html, mimeType, csp, permissions}`. Update [CLAUDE.md](CLAUDE.md) SSE event table.
+- **Acceptance:** integration test — fake MCP server returns `_meta.ui.resourceUri`; backend emits `ui_resource` event with HTML body inline. **Spec note:** we still call `resources/read` (spec MUST); we just inline the HTML in the SSE event so the frontend doesn't need its own MCP client.
+
+### PR #4 — Frontend: `<mcp-app-frame>` component + postMessage bridge
+
+- **Files:** new `mcp-app-frame.component.ts`, [stream-parser-types.ts](frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-types.ts), [stream-parser-core.ts](frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-core.ts), [stream-parser.service.ts](frontend/ai.client/src/app/session/services/chat/stream-parser.service.ts), wire-in via PR #0's renderer registry.
+- **Change:** Angular component that:
+  - Renders the outer iframe pointed at `mcp-sandbox.<domain>/proxy.html` with the spec-mandated `sandbox` attribute.
+  - Posts a `sandbox-resource-ready` notification to the proxy with `{html, sandbox, csp, permissions}` from the SSE `ui_resource` event.
+  - Implements the host half of the JSON-RPC 2.0 envelope over postMessage with a per-frame nonce.
+  - Handles `ui/initialize`, `ui/notifications/initialized`, `ui/notifications/size-changed`, `ui/open-link`, `ui/request-display-mode` (inline/fullscreen/pip), `ui/notifications/host-context-changed`, `ui/resource-teardown`.
+  - Wires `ui/notifications/tool-input` + `tool-input-partial` + `tool-result` + `tool-cancelled` from the active SSE stream.
+- **Acceptance:** load the [basic-host](https://github.com/modelcontextprotocol/ext-apps/tree/main/examples/basic-host) reference's QR-server-style example against our component end-to-end in dev.
+
+### PR #5 — Backend + frontend: app-initiated `tools/call` proxying (decision #2)
+
+- **Files:** new `POST /mcp-apps/proxy-call` route in [app_api](backend/src/apis/app_api), tool-dispatch hook in inference-api to inject a synthesized `tool_use` into the active SSE stream, frontend wiring in `mcp-app-frame.component.ts`.
+- **Change:** iframe `tools/call` → app-api → inference-api → MCP server → result → synthesized `tool_use`/`tool_result` events on the conversation SSE stream → frontend pushes `ui/notifications/tool-result` back to the iframe.
+- **Acceptance:** clicking a button inside a hosted MCP App that triggers a server tool — the call shows up as a tool-use card in the chat *and* the iframe gets the result via `ui/notifications/tool-result`.
+- **Open implementation question:** how to inject a synthesized event into a *closed* SSE stream (i.e., when the iframe lives past the originating turn). Likely a per-conversation event broker the active SSE handler subscribes to; if no handler is active, the call still runs but the chat thread shows it when the user next opens a stream. Detailed design lives in the PR.
+
+### PR #6 — Backend: `ui/message`, `ui/update-model-context`, `ui/open-link` consent, capability gating
+
+- **Files:** [oauth_consent.py](backend/src/apis/shared/oauth/oauth_consent.py) (pattern model), new `ui_capability_consent.py` hook, Strands `agent.state` integration for `mcp_apps.context.*`, conversation-message injection for `ui/message`.
+- **Change:**
+  - `ui/update-model-context` writes to `agent.state.mcp_apps.context[resourceUri]`, merged into the next turn's context.
+  - `ui/message` injects a user-role message into the conversation (treated identically to a typed message).
+  - `ui/open-link` is gated by an `openLinks` capability declared in `hostCapabilities`; per-link consent reuses the [oauth-consent-prompt.component.ts](frontend/ai.client/src/app/session/components/message-list/components/oauth-consent-prompt/oauth-consent-prompt.component.ts) pattern (new `ui_consent_required` SSE event family).
+  - Camera / microphone / geolocation / clipboard-write capability gating wired through `hostCapabilities.sandbox.permissions`.
+- **Acceptance:** an MCP App can mutate model context, post a user message, request to open a link, and request mic access — each triggers the correct host behavior (deferred merge, conversation message, consent prompt).
+
+### PR #7 — Dogfood + enable flag flip
+
+- **Files:** documentation, example MCP App registration, [CLAUDE.md](CLAUDE.md) update, feature flag default.
+- **Change:** register one of the [ext-apps/examples](https://github.com/modelcontextprotocol/ext-apps/tree/main/examples) servers (recommended: `scenario-modeler-server` or `budget-allocator-server` — form-style, exercises `update-model-context` and `tools/call` proxying without 3D/charting infra). Flip `MCP_APPS_HOST_ENABLED=true`. Add a runbook entry to docs explaining how to register a new MCP App server.
+- **Acceptance:** end-to-end conversation in dev that invokes the example tool, renders the iframe, drives the form, calls back into MCP, mutates model context, and the model picks up the context on the next turn.
+
+## Defaults applied without explicit user call
+
+These were small enough that I'm noting them here rather than putting them in the question set:
+
+- `ToolDefinition` gets a new `visibility: list[Literal["model", "app"]]` field (default `["model", "app"]` per spec).
+- Outbound MCP clients advertise `io.modelcontextprotocol/ui` unconditionally — no per-server opt-in. Servers that don't understand the capability ignore it.
+- Iframes persist for the lifetime of the conversation; teardown happens on conversation reset, on explicit user dismiss, or on tab close. No per-turn teardown.
+- Default display mode is `inline`; fullscreen and PiP supported in PR #4.
+- Per-frame nonce, generated client-side, used to authenticate every postMessage exchange (origin will be `"null"` in srcdoc inner iframes; nonce is the real check).
+- Theming: `hostCapabilities.theme` exposes `light` | `dark` at initialize; `ui/notifications/host-context-changed` pushes updates when the user toggles theme.
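+
+The per-frame nonce default can be sketched as follows. This is a sketch under stated assumptions: the `{ nonce, payload }` envelope, `newFrameNonce`, and `acceptMessage` are hypothetical (this doc only says a client-generated nonce authenticates each exchange), and the code assumes a runtime that exposes the Web Crypto `crypto.getRandomValues` global.
+
+```typescript
+// Hypothetical result type: distinguishes a rejected message from an accepted one with no payload.
+type AcceptResult = { accepted: true; payload: unknown } | { accepted: false };
+
+// 128 random bits, hex-encoded; generated once per iframe at render time.
+function newFrameNonce(): string {
+  const bytes = new Uint8Array(16);
+  globalThis.crypto.getRandomValues(bytes);
+  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
+}
+
+// Origin is "null" for srcdoc frames, so the nonce is the only check applied here.
+function acceptMessage(expectedNonce: string, data: unknown): AcceptResult {
+  if (typeof data !== "object" || data === null) return { accepted: false };
+  const candidate = data as { nonce?: unknown; payload?: unknown };
+  if (typeof candidate.nonce !== "string" || candidate.nonce !== expectedNonce) {
+    return { accepted: false };
+  }
+  return { accepted: true, payload: candidate.payload };
+}
+```
+
+A `message` event handler would call `acceptMessage(frameNonce, event.data)` and ignore anything that comes back with `accepted: false`.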
+
+## Risks and unknowns
+
+- **CSP / `frame-ancestors` interplay.** The outer `mcp-sandbox` origin needs `frame-ancestors` permitting `ai.client`; the inner iframe needs CSP composed from `_meta.ui.csp`. We don't have prior art for nested CSP in our stack — expect 0.5–1 day of CSP debugging on PR #1.
+- **`tools/call` proxy when the SSE stream is idle.** PR #5's "inject synthesized event into a closed SSE stream" needs a small event broker. If we punt it, app-initiated tool calls work but the chat thread misses them until the user opens a new turn. Acceptable for a v1; flag as known limitation if we ship without the broker.
+- **Spec drift.** `specification/draft/apps.mdx` may have moved since 2026-01-26. Diff before PR #1 lands; if there's material movement, adjust PRs #2–#4 accordingly.
+- **AgentCore Gateway pass-through of `_meta`.** Confirm `_meta.ui.resourceUri` survives Gateway's MCP proxying — if Gateway strips unknown `_meta` keys, PR #2 needs Gateway-side work too. Verify in PR #2's integration test.
+- **Strands `agent.state` schema.** Our `TurnBasedSessionManager` doesn't currently round-trip `agent.state` through long-term memory. PR #6 may need a small adjacent change to ensure `mcp_apps.context.*` survives turn boundaries.
+
+## Definition of done
+
+- All eight PRs (#0–#7) land on `develop` behind `MCP_APPS_HOST_ENABLED=false`; PR #7 flips it on.
+- One example MCP App from `ext-apps/examples` runs end-to-end in dev.
+- Every MUST in the compliance checklist has a corresponding test (unit or integration).
+- The dogfood scenario in PR #7 exercises: resource fetch, iframe render, `tool-input` push, app-initiated `tools/call`, `ui/update-model-context` mutating the next turn, `ui/open-link` consent prompt.
+- CLAUDE.md SSE event table updated with `ui_resource` and `ui_consent_required` rows.
+- A runbook entry describes how to register a new MCP-Apps-capable MCP server (one section in the docs, no separate doc).
+
+## Timeline
+
+3–4 calendar weeks, depending on review cadence:
+
+| PR | Effort | Notes |
+|---|---|---|
+| #0 renderer registry | 0.5d | low-risk refactor |
+| #1 sandbox CDK | 1–1.5d | CDK + CORS skill + DNS + cert |
+| #2 backend MCP capabilities | 1d | + Gateway pass-through verification |
+| #3 backend SSE event | 1d | |
+| #4 frontend iframe + bridge | 2–3d | postMessage protocol surface is wide |
+| #5 tools/call proxying | 2d | + event broker for idle streams (or punt as known limit) |
+| #6 message/context/consent | 2d | reuses oauth-consent pattern |
+| #7 dogfood + flag flip | 0.5–1d | |
+
+Total: ~10–12 engineering days, sequenced; parallelization possible after PR #2 lands (frontend can race backend on #3–#4).
diff --git a/docs/kaizen/scoping/mcp-sandbox-dynamic-csp.md b/docs/kaizen/scoping/mcp-sandbox-dynamic-csp.md
new file mode 100644
index 00000000..5e67ff5d
--- /dev/null
+++ b/docs/kaizen/scoping/mcp-sandbox-dynamic-csp.md
@@ -0,0 +1,167 @@
+# Scoping — MCP Sandbox Dynamic Per-Resource CSP
+
+> Status: Shipping — feature/mcp-sandbox-dynamic-csp
+> Owner: Phil Merrell
+> Source: dogfood gotcha #3 in [[project-mcp-apps-pr-progress]] (Option 3 of the host-renderer CSP fix); follow-up to #353
+> Spec read: draft `specification/draft/apps.mdx` lines 283–296; reference implementation `modelcontextprotocol/ext-apps/examples/basic-host/serve.ts`
+
+## TL;DR — **Ship**
+
+PR #353 shipped Options 1+2 (broad static outer CSP + `document.write()` mount). That works for the 22/25 reference servers that don't declare `_meta.ui.csp`, but four real Apps (three reference servers plus **our PR #7 dogfood App, Excalidraw**) declare external domains the static CSP can't honor: they fail at runtime trying to fetch declared CDN scripts / tiles / fonts / soundfonts under our `connect-src 'self'`. Excalidraw's `create_view` is the canonical case: its server declares `resourceDomains: ['https://esm.sh']` + `connectDomains: ['https://esm.sh']` (see `excalidraw/excalidraw-mcp/src/server.ts`), and the dogfood console shows a wall of blocked esm.sh font / script / stylesheet loads.
+
+The spec's draft `apps.mdx` line 283 makes this a **host MUST**: "Host MUST construct CSP headers based on declared domains." We're not currently violating "MUST NOT allow undeclared domains" (we have no externals in our CSP at all), but we're failing the contract Apps rely on.
+
+Implementation: a CloudFront Function on viewer-response that reads `?csp=` and mirrors the upstream `examples/basic-host/serve.ts` `buildCspHeader`; ~50–100 LoC across `infrastructure/assets/mcp-sandbox/csp-function.js`, `mcp-sandbox-stack.ts`, frontend `proxy-url.ts`, plus tests. Cache stays simple (the function runs on viewer-response including cache hits; one cached `proxy.html` body, dynamic header per request).
+
+## Apps that need it
+
+Empirical scan of `modelcontextprotocol/ext-apps/examples/*-server/server.ts` and the Excalidraw MCP server for `_meta.ui.csp` declarations. Four servers declare external domains:
+
+### Excalidraw `create_view` (our dogfood)
+
+```typescript
+// excalidraw/excalidraw-mcp/src/server.ts
+const cspMeta = {
+  ui: {
+    csp: {
+      resourceDomains: ['https://esm.sh'],
+      connectDomains: ['https://esm.sh'],
+    },
+  },
+};
+```
+
+The view's HTML pulls React 19, ReactDOM, Excalidraw 0.18, and the font/CSS bundle from `esm.sh`. On broad static CSP every one of those loads is blocked (`script-src` / `style-src` / `font-src` allow only `'self' blob: data:` — no `esm.sh`). The dogfood demo is visibly broken until this lands.
+
+### map-server (CesiumJS globe + OSM tiles)
+
+```typescript
+const cspMeta = {
+  ui: {
+    csp: {
+      connectDomains: [
+        "https://*.openstreetmap.org",   // OSM tiles + Nominatim geocoding
+        "https://cesium.com",
+        "https://*.cesium.com",
+      ],
+      resourceDomains: [
+        "https://*.openstreetmap.org",   // OSM map tiles
+        "https://cesium.com",
+        "https://*.cesium.com",
+      ],
+    },
+  },
+};
+```
+
+Hard fail on broad static — Cesium needs the tile servers + CDN both for `connect` (XHR for tile bytes / geocoding) and `resource` (script-src for ion-loaded JS modules). Our `connect-src 'self'` blocks every tile request the moment the globe initialises.
+
+### pdf-server (PDF.js standard fonts)
+
+```typescript
+csp: {
+  // pdf.js loads the Standard-14 fonts TWO ways:
+  //   - fetch()s the .ttf bytes → connect-src
+  //   - creates FontFace('name', 'url(...)') → font-src
+  // resourceDomains maps to font-src; we need both.
+  connectDomains: [STANDARD_FONT_ORIGIN],
+  resourceDomains: [STANDARD_FONT_ORIGIN],
+},
+```
+
+`STANDARD_FONT_ORIGIN` resolves to the pdf.js CDN host. PDF body renders but every glyph that requires a Standard-14 font (Helvetica, Times, Courier, Symbol, ZapfDingbats) falls back to a substitute or renders as a box — a visible quality regression, not a hard fail.
+
+### sheet-music-server (audio soundfonts)
+
+```typescript
+csp: {
+  // Allow loading soundfonts for audio playback
+  connectDomains: ["https://paulrosen.github.io"],
+},
+```
+
+Visual sheet-music rendering works on broad static (abcjs is bundled). Only the "play audio" button silently fails, because soundfont fetches are blocked by `connect-src 'self'`.
+
+## Apps that don't need it
+
+22 of 25 reference servers declare no `_meta.ui.csp` at all. These work today on our broad static CSP because they:
+
+- Bundle everything (no external CDN fetches).
+- Use only same-origin postMessage to the host (no external network).
+- Use only `permissions` (mic/camera/clipboard) without external resource needs — covered by our `_meta.ui.permissions` plumbing, not CSP.
+
+Concrete list: `basic-server-*` (preact/react/solid/svelte/vanillajs/vue), budget-allocator-server, scenario-modeler-server, cohort-heatmap-server, customer-segmentation-server, integration-server, transcript-server, debug-server, qr-server, say-server, shadertoy-server, system-monitor-server, threejs-server, video-resource-server, wiki-explorer-server. **All five of the "rich UI" candidates the scoping doc considered for PR #7 dogfood** (budget-allocator, scenario-modeler, threejs, shadertoy, transcript) are in this set — none of them are blocked.
+
+Note: shadertoy / threejs being in this set is non-obvious — they're WebGL-heavy and you'd expect external asset CDNs, but in the reference repo they ship fully bundled.
+
+## Cost vs. benefit
+
+### Security gain — small, bordering on theatre
+
+The threat model: "untrusted App HTML escapes its CSP and exfiltrates / phishes from the user." Our current static CSP has:
+
+- `connect-src 'self'` — App cannot make any external network request from inside the iframe.
+- `frame-src 'none'` — App cannot frame anything else.
+- `base-uri 'none'`, `form-action 'none'`, `object-src 'none'` — no base / form / plugin injection.
+
+The remaining attack surface is `'unsafe-inline' 'unsafe-eval' blob: data:` on scripts/styles. But:
+
+1. The inner App iframe is **already cross-origin sandboxed** to the SPA (null origin under `sandbox` attribute). Even if an attacker fully owns the App's JS execution, they can't reach SPA cookies, localStorage, or DOM.
+2. The outer `proxy.html` ships **zero inline content** — every byte that runs is `proxy.js` loaded from same-origin (the dedicated mcp-sandbox CloudFront). `'unsafe-inline'`/`'unsafe-eval'` on the outer document can't be exploited unless an attacker can already inject into a static CloudFront asset, which is a much bigger compromise.
+3. Going dynamic would *narrow* `connect-src` and `script-src` to per-App declared domains. But for the 22/25 Apps without declared CSP, we'd use the spec's restrictive default (`connect-src 'none'`, `script-src 'self' 'unsafe-inline'` — *no* `'unsafe-eval' blob:`), which would **break** many of the bundled-but-eval-needing Apps we currently render fine. The reference implementation acknowledges this by baking `'unsafe-eval' blob: data:` into its default too.
+
+So dynamic CSP buys us: a tighter `connect-src` for the 3 Apps that actually declare it. That's a marginal defense-in-depth gain stacked behind the existing cross-origin sandbox boundary.
+
+### Spec-compliance gain — real but not violated today
+
+Draft `apps.mdx` line 283: **"Host MUST construct CSP headers based on declared domains."**
+Line 295: "No Loosening: Host MAY further restrict but MUST NOT allow undeclared domains."
+
+We don't violate "MUST NOT allow undeclared domains" — we have no external domains in our CSP at all. We *do* violate "MUST construct CSP headers based on declared domains" in the sense that we ignore declared `connectDomains`/`resourceDomains`. The user-visible consequence is that map-server / pdf-server / sheet-music-server can't fully function on us — they DECLARED what they need, we DIDN'T honor the declaration, the App fails. That's not "leaky security," it's "host doesn't implement the contract the App relied on."
+
+If someone is grading us on spec compliance (an external review, an audit, an MCP showcase), this gap is visible. If we're shipping internally, no one notices until we onboard a CSP-declaring App.
+
+### Implementation options compared
+
+| Option | Code | Deploy time | Runtime cost | Cache impact | On-call |
+|---|---|---|---|---|---|
+| **A. CloudFront Function (viewer-request → -response)** | ~30 LoC JS, no async, sanitize `?csp=`, emit header | Standard CFN deploy (~5 min) | $0.10/M invocations — pennies | proxy.html cache key adds `?csp`; hit rate drops to ~0 but origin is S3 (fast). proxy.js unaffected | Low — sync function, no cold start, no env vars |
+| **B. Lambda@Edge (viewer-response)** | ~50 LoC Node, full SDK, easier to test | Slower deploy (~10 min replication) | $0.60/M + duration; <$1/month at our traffic | same as A | Medium — Lambda@Edge logs land in *viewer* region CloudWatch, harder to follow; rollback is slower |
+| **C. Replace CloudFront+S3 with API Gateway + Lambda** | ~150 LoC + CDK rewrite | New stack | Higher | Lose CloudFront edge cache for proxy.js too | High — bigger surface |
+| **D. Origin Lambda behind CloudFront** | Lambda + CFN integration | Standard | Higher than A/B | proxy.js still cacheable; proxy.html per-request | Medium |
+
+Plus, for any option, frontend side: `mcp-app-frame.component.ts` already has `csp` from the `ui_resource` SSE event — it would build `${proxyOrigin}/proxy.html?csp=${encodeURIComponent(JSON.stringify(csp))}` before assigning `iframe.src`. ~10 LoC change, no new SSE event.
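+
+The frontend change described above amounts to a small URL builder. A minimal sketch, assuming a hypothetical `buildProxyUrl` helper and a `UiCsp` interface whose field names are taken from the `_meta.ui.csp` keys listed in this doc:
+
+```typescript
+// Field names follow the `_meta.ui.csp` keys; the interface name itself is hypothetical.
+interface UiCsp {
+  connectDomains?: string[];
+  resourceDomains?: string[];
+  frameDomains?: string[];
+  baseUriDomains?: string[];
+}
+
+// Returns the bare proxy URL when the resource declares no CSP, so those Apps
+// keep sharing a single query-free URL.
+function buildProxyUrl(proxyOrigin: string, csp?: UiCsp): string {
+  const base = `${proxyOrigin}/proxy.html`;
+  if (!csp) return base;
+  return `${base}?csp=${encodeURIComponent(JSON.stringify(csp))}`;
+}
+```
+
+For the Excalidraw declaration, `buildProxyUrl(origin, { resourceDomains: ["https://esm.sh"], connectDomains: ["https://esm.sh"] })` yields a URL whose `csp` parameter decodes back to the same JSON object.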
+
+**Recommended option if/when we ship: A (CloudFront Function).** It fits the constraint set (sync, no I/O, sanitize + concat into a header), is cheaper and lower-latency than Lambda@Edge, and has the simpler operational story. The sanitizer from `serve.ts` (`/[;\r\n'" ]/.test(d)` reject) is straightforwardly portable to the CFN JS runtime.
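+
+The sanitize-and-concatenate step can be sketched in isolation from the CloudFront runtime. This is not the upstream `buildCspHeader`: the directive list below is an assumption modeled on the static CSP described in this doc, dropping (rather than failing on) bad entries is an assumption, and `DeclaredCsp`, `sanitizeDomains`, and `buildCspHeaderSketch` are hypothetical names. `frame-ancestors` is left out on purpose, since it can stay in a separate static header.
+
+```typescript
+// Characters that could break out of a CSP source list; same reject rule as quoted above.
+const UNSAFE_DOMAIN = /[;\r\n'" ]/;
+
+// Values arrive from a query string, so nothing about their shape is trusted.
+interface DeclaredCsp {
+  connectDomains?: unknown;
+  resourceDomains?: unknown;
+  frameDomains?: unknown;
+  baseUriDomains?: unknown;
+}
+
+// Keep only non-empty strings with no unsafe characters.
+function sanitizeDomains(domains: unknown): string[] {
+  if (!Array.isArray(domains)) return [];
+  return domains.filter(
+    (d): d is string => typeof d === "string" && d.length > 0 && !UNSAFE_DOMAIN.test(d),
+  );
+}
+
+function buildCspHeaderSketch(csp: DeclaredCsp): string {
+  const connect = sanitizeDomains(csp.connectDomains);
+  const resource = sanitizeDomains(csp.resourceDomains);
+  const frame = sanitizeDomains(csp.frameDomains);
+  const baseUri = sanitizeDomains(csp.baseUriDomains);
+  const list = (sources: string[]): string => sources.join(" ");
+
+  return [
+    "default-src 'none'",
+    `script-src ${list(["'self'", "'unsafe-inline'", "'unsafe-eval'", "blob:", "data:", ...resource])}`,
+    `style-src ${list(["'self'", "'unsafe-inline'", "blob:", "data:", ...resource])}`,
+    `font-src ${list(["'self'", "blob:", "data:", ...resource])}`,
+    `connect-src ${list(["'self'", ...connect])}`,
+    `frame-src ${frame.length > 0 ? list(frame) : "'none'"}`,
+    `base-uri ${baseUri.length > 0 ? list(baseUri) : "'none'"}`,
+    "form-action 'none'",
+    "object-src 'none'",
+  ].join("; ");
+}
+```
+
+An entry such as `"https://evil.example; script-src *"` contains both `;` and a space, so it is dropped before it can inject a second directive.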
+
+The `ResponseHeadersPolicy` in `infrastructure/lib/mcp-sandbox-stack.ts` would need to drop its static `Content-Security-Policy` (the dynamic header would conflict with the policy's "override: true" semantics). Other security headers (HSTS, Referrer-Policy, X-Content-Type-Options) stay in the policy. `frame-ancestors` becomes part of the dynamic CSP since it's the security-critical bit — though it could also stay in a separate static `Content-Security-Policy` header alongside the dynamic one (CSPs combine via intersection).
+
+### Cache implications
+
+Today: `CacheQueryStringBehavior.none()` — every request to `proxy.html` returns the same cached body. Switch to `CacheQueryStringBehavior.allowList(['csp'])` and each unique `?csp=` value becomes a separate cache entry. With ~hundreds of distinct Apps in any deployed env, hit rate on `proxy.html` drops from ~100% to ~0%. proxy.html is ~2 KB, S3 origin response is sub-10ms — the cost is invisible at our traffic. `proxy.js` cache is untouched (no query param on its fetch).
+
+One real concern: cache *explosion* if Apps generate per-call unique `?csp=` query strings (e.g. dynamic per-conversation CSP). The 25 reference Apps all use static `_meta.ui.csp` at resource-declaration time, so in practice the cardinality is bounded by the number of distinct UI resources, not the number of conversations.
+
+## Trigger — what would change the recommendation
+
+**Ship if any of these happen:**
+
+1. We onboard map-server, pdf-server, or sheet-music-server (or any App declaring non-`'self'` `connectDomains` / `resourceDomains` / `frameDomains` / `baseUriDomains`). The CSP work goes in *that* PR — same author, fresh context, no need to reload prior state. **Most likely trigger: CesiumJS map-server when we want a "wow" demo.**
+2. The spec MUST tightens further (e.g. "Host MUST reject UI resources that declare CSP the host doesn't honor"). Skim the draft on each kaizen-research pass — currently line 283 is the relevant MUST; nothing has been added that makes it a *rejection* requirement yet.
+3. An external review / showcase / partner asks for SEP-1865 compliance attestation. The "declared domains not honored" gap is visible to anyone who reads the spec.
+4. We onboard an App that needs nested iframes (`frameDomains`) — our static `frame-src 'none'` blocks all nested framing absolutely. Reference Apps that fit this profile: none today, but anything embedding YouTube / a Tableau viz / a third-party widget would need it.
+
+**Don't ship for:**
+
+- "Defense-in-depth feels nice." The cross-origin sandbox is the real boundary. CSP tightening is icing.
+- "The reference does it, so should we." The reference is a demo host; we're a product. Match capability when we have a user-facing reason.
+- "It's in the scoping doc as a risk." The original scoping doc (`docs/kaizen/scoping/mcp-apps-host-renderer.md`) called out the CSP/`frame-ancestors` interplay as a 0.5–1d debug; we paid that debt. The dynamic-per-resource piece was always a follow-up.
+
+## Files that would change if we ship
+
+For reference (not implementation):
+
+- `infrastructure/lib/mcp-sandbox-stack.ts` — new CloudFront Function resource, drop static CSP from `ResponseHeadersPolicy`, update `CachePolicy` to include `?csp` in cache key on the `proxy.html` path behavior.
+- `infrastructure/lib/mcp-sandbox-function.js` (new) — the CFN handler: read `?csp=`, parse JSON, sanitize domains, build CSP string (mirror `buildCspHeader` from `examples/basic-host/serve.ts`), set response header.
+- `infrastructure/test/mcp-sandbox-stack.test.ts` — unit tests for sanitization (the `/[;\r\n'" ]/` reject rule is security-critical — every CSP-injection attack hides in domain entries with embedded `'`/`;`/space).
+- `frontend/ai.client/src/app/.../mcp-app-frame.component.ts` — build `?csp=` query before setting `iframe.src`; the bridge already receives `csp` on the `ui_resource` event.
+- `frontend/ai.client/src/app/.../mcp-app-frame.component.spec.ts` — unit test the query-string encoding.
+- No backend changes (the `ui_resource` SSE event already carries `csp`).
+
+Total: 1 new file (CFN handler), edits to 4 files, ~80–100 LoC + tests. 1–2 days, mostly testing the cache invalidation + redeploy behavior end-to-end.
diff --git a/frontend/ai.client/angular.json b/frontend/ai.client/angular.json
index fe26333a..e7490d83 100644
--- a/frontend/ai.client/angular.json
+++ b/frontend/ai.client/angular.json
@@ -41,7 +41,13 @@
             "scripts": [
               "node_modules/prismjs/prism.js",
               "node_modules/prismjs/components/prism-csharp.min.js",
+              "node_modules/prismjs/components/prism-javascript.min.js",
+              "node_modules/prismjs/components/prism-typescript.min.js",
+              "node_modules/prismjs/components/prism-python.min.js",
+              "node_modules/prismjs/components/prism-sql.min.js",
               "node_modules/prismjs/components/prism-css.min.js",
+              "node_modules/prismjs/components/prism-json.min.js",
+              "node_modules/prismjs/components/prism-markdown.min.js",
               "node_modules/mermaid/dist/mermaid.min.js",
               "node_modules/katex/dist/katex.min.js",
               "node_modules/katex/dist/contrib/auto-render.min.js",
diff --git a/frontend/ai.client/e2e/auth-admin.setup.ts b/frontend/ai.client/e2e/auth-admin.setup.ts
index 4d8c48c5..95a4f371 100644
--- a/frontend/ai.client/e2e/auth-admin.setup.ts
+++ b/frontend/ai.client/e2e/auth-admin.setup.ts
@@ -17,6 +17,8 @@ async function cognitoLogin(
 ) {
   await page.goto('/auth/login');
   await page.getByRole('button', { name: 'Sign in with Cognito' }).click();
+
+  // Wait for Cognito managed login page
   await page.getByRole('textbox', { name: 'Username' }).waitFor({ timeout: 15_000 });
   await page.getByRole('textbox', { name: 'Username' }).fill(username);
   await page.getByRole('textbox', { name: 'Password' }).fill(password);
@@ -31,7 +33,36 @@ async function cognitoLogin(
     );
   }
 
-  await page.waitForURL('**/', { timeout: 30_000 });
+  // Wait for the browser to leave Cognito and return to our app.
+  // After Cognito submit, the redirect chain is:
+  //   Cognito → /api/auth/callback → BFF token exchange → 302 to /
+  // If the BFF callback fails, it redirects to /?auth_error=... or /auth/login
+  // If cookies land on the wrong domain (ALB instead of CloudFront), the
+  // APP_INITIALIZER gets 401 and redirects back to /auth/login.
+
+  // Track the callback to diagnose cookie-domain issues
+  let callbackResponseUrl = '';
+  page.on('response', async (response) => {
+    if (response.url().includes('/auth/callback')) {
+      callbackResponseUrl = response.url();
+    }
+  });
+
+  try {
+    await page.waitForURL('**/', { timeout: 45_000 });
+  } catch {
+    const finalUrl = page.url();
+    const cookies = await page.context().cookies();
+    const bffCookies = cookies.filter(c => c.name.startsWith('__Host-bff'));
+    const cookieDetails = bffCookies.map(c => `${c.name}(domain=${c.domain},path=${c.path},secure=${c.secure})`).join('; ');
+    throw new Error(
+      `OAuth redirect chain failed. Final URL: ${finalUrl} | ` +
+      `Callback response URL: ${callbackResponseUrl || 'NEVER HIT'} | ` +
+      `BFF cookies: ${cookieDetails || 'NONE'} | ` +
+      `All cookie domains: ${[...new Set(cookies.map(c => c.domain))].join(', ')}`,
+    );
+  }
+
   await expect(page.locator('textarea#user-message')).toBeVisible({ timeout: 10_000 });
   await page.context().storageState({ path: storageStatePath });
 }
diff --git a/frontend/ai.client/e2e/auth-user.setup.ts b/frontend/ai.client/e2e/auth-user.setup.ts
index 584d6c6f..d9cfb0f7 100644
--- a/frontend/ai.client/e2e/auth-user.setup.ts
+++ b/frontend/ai.client/e2e/auth-user.setup.ts
@@ -17,6 +17,8 @@ async function cognitoLogin(
 ) {
   await page.goto('/auth/login');
   await page.getByRole('button', { name: 'Sign in with Cognito' }).click();
+
+  // Wait for Cognito managed login page
   await page.getByRole('textbox', { name: 'Username' }).waitFor({ timeout: 15_000 });
   await page.getByRole('textbox', { name: 'Username' }).fill(username);
   await page.getByRole('textbox', { name: 'Password' }).fill(password);
@@ -31,7 +33,51 @@ async function cognitoLogin(
     );
   }
 
-  await page.waitForURL('**/', { timeout: 30_000 });
+  // Wait for the browser to leave Cognito and return to our app.
+  // After Cognito submit, the redirect chain is:
+  //   Cognito → /api/auth/callback → BFF token exchange → 302 to /
+  // If the BFF callback fails, it redirects to /?auth_error=... or /auth/login
+
+  // Intercept the /auth/session request to see what's happening
+  let sessionResponseStatus = 0;
+  let sessionResponseBody = '';
+  let sessionRequestCookies = '';
+  // Track the callback redirect to diagnose cookie-domain issues
+  let callbackResponseUrl = '';
+  let callbackSetCookies: string[] = [];
+  page.on('response', async (response) => {
+    if (response.url().includes('/auth/session')) {
+      sessionResponseStatus = response.status();
+      sessionRequestCookies = (await response.request().headerValue('cookie').catch(() => null)) ?? 'NO COOKIE HEADER';
+      try { sessionResponseBody = await response.text(); } catch { sessionResponseBody = '<unreadable>'; }
+    }
+    // Capture the callback response to see where cookies are being set
+    if (response.url().includes('/auth/callback')) {
+      callbackResponseUrl = response.url();
+      // response.headers() omits cookie-related headers, so read Set-Cookie
+      // through headerValues(), which returns one entry per header line.
+      const setCookies = await response.headerValues('set-cookie').catch(() => []);
+      callbackSetCookies.push(...setCookies);
+    }
+  });
+
+  try {
+    await page.waitForURL('**/', { timeout: 45_000 });
+  } catch {
+    const finalUrl = page.url();
+    const cookies = await page.context().cookies();
+    const bffCookies = cookies.filter(c => c.name.startsWith('__Host-bff'));
+    const cookieDetails = bffCookies.map(c => `${c.name}(domain=${c.domain},path=${c.path},secure=${c.secure})`).join('; ');
+    throw new Error(
+      `OAuth redirect chain failed. Final URL: ${finalUrl} | ` +
+      `Callback response URL: ${callbackResponseUrl || 'NEVER HIT'} | ` +
+      `Session response: ${sessionResponseStatus} ${sessionResponseBody.substring(0, 100)} | ` +
+      `Cookie header sent: ${sessionRequestCookies.substring(0, 150)} | ` +
+      `BFF cookies in jar: ${cookieDetails || 'NONE'} | ` +
+      `All cookie domains: ${[...new Set(cookies.map(c => c.domain))].join(', ')}`,
+    );
+  }
+
   await expect(page.locator('textarea#user-message')).toBeVisible({ timeout: 10_000 });
   await page.context().storageState({ path: storageStatePath });
 }
diff --git a/frontend/ai.client/e2e/home-page/chat.user.spec.ts b/frontend/ai.client/e2e/home-page/chat.user.spec.ts
index 29e3b765..70f94466 100644
--- a/frontend/ai.client/e2e/home-page/chat.user.spec.ts
+++ b/frontend/ai.client/e2e/home-page/chat.user.spec.ts
@@ -31,8 +31,8 @@ async function sendMessageAndWaitForResponse(
   await page.getByRole('button', { name: 'Submit message' }).click();
 
   const assistantMessage = page.locator('app-assistant-message').last();
-  await expect(assistantMessage).toBeVisible({ timeout: 150_000 });
-  await expect(page.locator('app-pulsating-loader')).toBeHidden({ timeout: 250_000 });
+  await expect(assistantMessage).toBeVisible({ timeout: 300_000 });
+  await expect(page.locator('app-pulsating-loader')).toBeHidden({ timeout: 300_000 });
 
   return (await assistantMessage.innerText()).trim();
 }
@@ -41,6 +41,7 @@ async function sendMessageAndWaitForResponse(
 test.describe('Chat (user)', () => {
   test.describe.serial('Chat lifecycle with Claude Haiku 4.5', () => {
     test('should select Haiku, send a message, and receive a response', async ({ page }) => {
+      test.setTimeout(60_000); // 60s for this test
       await page.goto('/');
       await expect(page.locator('textarea#user-message')).toBeVisible({ timeout: 15_000 });
 
@@ -58,6 +59,7 @@ test.describe('Chat (user)', () => {
     });
 
     test('should send a second message in the same session', async ({ page }) => {
+      test.setTimeout(60_000); // 60s for this test
       await page.goto('/');
       await expect(page.locator('textarea#user-message')).toBeVisible({ timeout: 15_000 });
 
diff --git a/frontend/ai.client/e2e/manage-sessions.user.spec.ts b/frontend/ai.client/e2e/manage-sessions.user.spec.ts
index 9f25bc01..3d9d20b4 100644
--- a/frontend/ai.client/e2e/manage-sessions.user.spec.ts
+++ b/frontend/ai.client/e2e/manage-sessions.user.spec.ts
@@ -64,12 +64,18 @@ test.describe('Manage Sessions Page (user)', () => {
 
   test('should show the delete selected button disabled when nothing is selected', async ({ page }) => {
     await page.goto('/manage-sessions');
-    await expect(page.getByText('Loading conversations...')).toBeHidden({ timeout: 15_000 });
 
-    const deleteButton = page.getByRole('button', { name: /Delete Selected/i });
-    const hasButton = (await deleteButton.count()) > 0;
-    test.skip(!hasButton, 'No sessions available — delete button not rendered');
+    // Wait for the page to fully render before checking anything
+    await expect(
+      page.getByRole('heading', { name: 'Manage Conversations' }),
+    ).toBeVisible({ timeout: 15_000 });
+
+    // Wait for loading to finish
+    await expect(page.getByText('Loading conversations...')).toBeHidden({ timeout: 30_000 });
 
+    // The Delete Selected button is always rendered (not conditional on sessions existing)
+    const deleteButton = page.getByRole('button', { name: /Delete Selected/i });
+    await expect(deleteButton).toBeVisible({ timeout: 5_000 });
     await expect(deleteButton).toBeDisabled();
   });
 });
diff --git a/frontend/ai.client/package-lock.json b/frontend/ai.client/package-lock.json
index fb94ae19..fdff2c3f 100644
--- a/frontend/ai.client/package-lock.json
+++ b/frontend/ai.client/package-lock.json
@@ -1,12 +1,12 @@
 {
   "name": "ai.client",
-  "version": "1.0.0-beta.24",
+  "version": "1.0.0-beta.28",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "ai.client",
-      "version": "1.0.0-beta.24",
+      "version": "1.0.0-beta.28",
       "dependencies": {
         "@angular/cdk": "21.2.9",
         "@angular/common": "21.2.11",
diff --git a/frontend/ai.client/package.json b/frontend/ai.client/package.json
index 27283c25..a48c2245 100644
--- a/frontend/ai.client/package.json
+++ b/frontend/ai.client/package.json
@@ -1,6 +1,6 @@
 {
   "name": "ai.client",
-  "version": "1.0.0-beta.24",
+  "version": "1.0.0-beta.28",
   "scripts": {
     "ng": "ng",
     "start": "ng serve",
diff --git a/frontend/ai.client/playwright.ci.config.ts b/frontend/ai.client/playwright.ci.config.ts
index e38dce1a..083f3989 100644
--- a/frontend/ai.client/playwright.ci.config.ts
+++ b/frontend/ai.client/playwright.ci.config.ts
@@ -41,10 +41,12 @@ export default defineConfig({
     {
       name: 'admin-setup',
       testMatch: /auth-admin\.setup\.ts/,
+      timeout: 60_000,
     },
     {
       name: 'user-setup',
       testMatch: /auth-user\.setup\.ts/,
+      timeout: 60_000,
     },
 
     // --- Unauthenticated tests (no login needed) ---
diff --git a/frontend/ai.client/src/app/admin/admin.layout.ts b/frontend/ai.client/src/app/admin/admin.layout.ts
new file mode 100644
index 00000000..0920bcef
--- /dev/null
+++ b/frontend/ai.client/src/app/admin/admin.layout.ts
@@ -0,0 +1,174 @@
+import {
+  Component,
+  ChangeDetectionStrategy,
+  inject,
+} from '@angular/core';
+import { Router, RouterLink, RouterLinkActive, RouterOutlet } from '@angular/router';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import {
+  heroArrowLeft,
+  heroShieldCheck,
+  heroCurrencyDollar,
+  heroScale,
+  heroAcademicCap,
+  heroPencilSquare,
+  heroWrenchScrewdriver,
+  heroLink,
+  heroUsers,
+  heroKey,
+  heroFingerPrint,
+  heroBars3,
+} from '@ng-icons/heroicons/outline';
+
+interface NavItem {
+  label: string;
+  icon: string;
+  route: string;
+}
+
+interface NavGroup {
+  label: string;
+  items: NavItem[];
+}
+
+@Component({
+  selector: 'app-admin-layout',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [RouterLink, RouterLinkActive, RouterOutlet, NgIcon],
+  providers: [
+    provideIcons({
+      heroArrowLeft,
+      heroShieldCheck,
+      heroCurrencyDollar,
+      heroScale,
+      heroAcademicCap,
+      heroPencilSquare,
+      heroWrenchScrewdriver,
+      heroLink,
+      heroUsers,
+      heroKey,
+      heroFingerPrint,
+      heroBars3,
+    }),
+  ],
+  host: { class: 'block' },
+  template: `
+    <div class="min-h-dvh">
+      <!-- Top bar -->
+      <div class="sticky top-0 z-10 border-b border-gray-200 bg-gray-50/80 backdrop-blur-sm dark:border-white/10 dark:bg-gray-900/50">
+        <div class="flex h-14 items-center gap-4 px-4 sm:px-6 lg:px-8">
+          <a
+            routerLink="/"
+            class="flex items-center gap-2 text-sm/6 font-medium text-gray-500 transition-colors hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
+          >
+            <ng-icon name="heroArrowLeft" class="size-4" />
+            <span class="hidden sm:inline">Back to Chat</span>
+          </a>
+          <div class="h-5 w-px bg-gray-200 dark:bg-white/10"></div>
+          <div class="flex items-center gap-2">
+            <ng-icon name="heroShieldCheck" class="size-5 text-gray-400 dark:text-gray-500" />
+            <h1 class="text-base/7 font-semibold text-gray-900 dark:text-white">Admin</h1>
+          </div>
+        </div>
+      </div>
+
+      <div class="mx-auto max-w-[96rem] px-4 py-8 sm:px-6 lg:px-8">
+        <div class="lg:flex lg:gap-x-8">
+          <!-- Sidebar Navigation -->
+          <aside class="lg:w-60 lg:shrink-0">
+            <!-- Mobile dropdown (shown on small screens) -->
+            <div class="lg:hidden">
+              <label for="admin-nav" class="sr-only">Admin section</label>
+              <select
+                id="admin-nav"
+                class="block w-full rounded-sm border-gray-300 bg-white py-2 pl-3 pr-10 text-base text-gray-900 focus:border-blue-500 focus:outline-hidden focus:ring-blue-500 dark:border-gray-700 dark:bg-gray-800 dark:text-white"
+                (change)="onMobileNavChange($event)"
+              >
+                @for (group of navGroups; track group.label) {
+                  <optgroup [label]="group.label">
+                    @for (item of group.items; track item.route) {
+                      <option [value]="item.route">{{ item.label }}</option>
+                    }
+                  </optgroup>
+                }
+              </select>
+            </div>
+
+            <!-- Desktop sidebar -->
+            <nav class="hidden lg:block" aria-label="Admin navigation">
+              <div class="flex flex-col gap-6">
+                @for (group of navGroups; track group.label) {
+                  <div>
+                    <h2 class="px-3 text-xs/5 font-semibold uppercase tracking-wide text-gray-500 dark:text-gray-400">
+                      {{ group.label }}
+                    </h2>
+                    <ul role="list" class="mt-2 flex flex-col gap-1">
+                      @for (item of group.items; track item.route) {
+                        <li>
+                          <a
+                            [routerLink]="item.route"
+                            routerLinkActive="bg-gray-100 text-gray-900 dark:bg-white/10 dark:text-white"
+                            class="group flex items-center gap-x-3 whitespace-nowrap rounded-md px-3 py-2 text-sm/6 font-medium text-gray-700 transition-colors hover:bg-gray-100 hover:text-gray-900 dark:text-gray-400 dark:hover:bg-white/10 dark:hover:text-white"
+                          >
+                            <ng-icon [name]="item.icon" class="size-5 shrink-0 text-gray-400 group-hover:text-gray-500 dark:text-gray-500 dark:group-hover:text-gray-300" />
+                            {{ item.label }}
+                          </a>
+                        </li>
+                      }
+                    </ul>
+                  </div>
+                }
+              </div>
+            </nav>
+          </aside>
+
+          <!-- Content area -->
+          <main class="mt-8 min-w-0 lg:mt-0 lg:flex-1">
+            <router-outlet />
+          </main>
+        </div>
+      </div>
+    </div>
+  `,
+})
+export class AdminLayout {
+  private router = inject(Router);
+
+  readonly navGroups: NavGroup[] = [
+    {
+      label: 'Usage & Spend',
+      items: [
+        { label: 'Cost Analytics', icon: 'heroCurrencyDollar', route: '/admin/costs' },
+        { label: 'Quotas', icon: 'heroScale', route: '/admin/quota' },
+        { label: 'Fine-Tuning', icon: 'heroAcademicCap', route: '/admin/fine-tuning' },
+      ],
+    },
+    {
+      label: 'AI Configuration',
+      items: [
+        { label: 'Models', icon: 'heroPencilSquare', route: '/admin/manage-models' },
+        { label: 'Tools', icon: 'heroWrenchScrewdriver', route: '/admin/tools' },
+        { label: 'Connectors', icon: 'heroLink', route: '/admin/connectors' },
+      ],
+    },
+    {
+      label: 'Identity & Access',
+      items: [
+        { label: 'Users', icon: 'heroUsers', route: '/admin/users' },
+        { label: 'Roles', icon: 'heroKey', route: '/admin/roles' },
+        { label: 'Auth Providers', icon: 'heroFingerPrint', route: '/admin/auth-providers' },
+      ],
+    },
+    {
+      label: 'Customization',
+      items: [
+        { label: 'User Menu Links', icon: 'heroBars3', route: '/admin/manage-user-menu-links' },
+      ],
+    },
+  ];
+
+  onMobileNavChange(event: Event): void {
+    const select = event.target as HTMLSelectElement;
+    this.router.navigateByUrl(select.value);
+  }
+}
diff --git a/frontend/ai.client/src/app/admin/admin.page.css b/frontend/ai.client/src/app/admin/admin.page.css
deleted file mode 100644
index 6d1fe4e2..00000000
--- a/frontend/ai.client/src/app/admin/admin.page.css
+++ /dev/null
@@ -1 +0,0 @@
-/* Admin landing page styles */
diff --git a/frontend/ai.client/src/app/admin/admin.page.html b/frontend/ai.client/src/app/admin/admin.page.html
deleted file mode 100644
index 517a0bcf..00000000
--- a/frontend/ai.client/src/app/admin/admin.page.html
+++ /dev/null
@@ -1,79 +0,0 @@
-<div class="min-h-dvh">
-  <div class="mx-auto max-w-7xl px-4 py-8 sm:px-6 lg:px-8">
-    <!-- Page Header -->
-    <div class="mb-8">
-      <h1 class="text-3xl/9 font-bold text-gray-900 dark:text-white">Admin Dashboard</h1>
-      <p class="mt-2 text-base/7 text-gray-600 dark:text-gray-400">
-        Manage AI models, quotas, and system configuration
-      </p>
-    </div>
-
-    <!-- Feature Grid -->
-    <div class="grid grid-cols-1 gap-6 md:grid-cols-2 lg:grid-cols-3">
-      @for (feature of features; track feature.route; let i = $index) {
-        <a
-          [routerLink]="feature.route"
-          class="group relative flex flex-col gap-4 rounded-sm border border-gray-200 bg-white p-6 transition-all hover:border-blue-300 hover:shadow-sm dark:border-gray-700 dark:bg-gray-800 dark:hover:border-blue-700"
-        >
-          <!-- Icon -->
-          <div class="flex items-start">
-            <div [class]="'flex size-16 items-center justify-center rounded-sm border border-gray-300 dark:border-gray-600 ' + getIconBackgroundClasses(i)">
-              <ng-icon
-                [name]="feature.icon"
-                size="32"
-                [class]="getIconColorClasses(i)"
-              />
-            </div>
-          </div>
-
-          <!-- Content -->
-          <div class="flex-1">
-            <h3 class="text-lg/7 font-semibold text-gray-900 group-hover:text-blue-600 dark:text-white dark:group-hover:text-blue-400">
-              {{ feature.title }}
-            </h3>
-            <p class="mt-2 text-sm/6 text-gray-600 dark:text-gray-400">
-              {{ feature.description }}
-            </p>
-          </div>
-
-          <!-- Arrow indicator -->
-          <div class="flex items-center text-sm/6 font-medium text-blue-600 dark:text-blue-400">
-            <span>Open</span>
-            <svg class="ml-2 size-4 transition-transform group-hover:translate-x-1" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2">
-              <path stroke-linecap="round" stroke-linejoin="round" d="M13 7l5 5m0 0l-5 5m5-5H6" />
-            </svg>
-          </div>
-        </a>
-      }
-    </div>
-
-    <!-- Info Section -->
-    <div class="mt-8 rounded-sm border border-blue-200 bg-blue-50 p-6 dark:border-blue-800 dark:bg-blue-900/20">
-      <h2 class="text-lg/7 font-semibold text-blue-900 dark:text-blue-200">About Admin Features</h2>
-      <div class="mt-3 space-y-2 text-sm/6 text-blue-800 dark:text-blue-300">
-        <p>
-          <strong>Model Management:</strong> Configure which AI models are available to users. Control access by role and set pricing information.
-        </p>
-        <p>
-          <strong>Quota Management:</strong> Comprehensive quota system with tiered limits, role-based assignments, email domain matching, temporary overrides, and detailed monitoring.
-        </p>
-      </div>
-    </div>
-
-    <!-- Quick Stats (Optional - can be expanded later) -->
-    <!-- <div class="mt-8 grid grid-cols-1 gap-6 md:grid-cols-3">
-      <div class="rounded-sm border border-gray-200 bg-white p-6 dark:border-gray-700 dark:bg-gray-800">
-        <h3 class="text-sm/6 font-medium text-gray-600 dark:text-gray-400">Active Models</h3>
-        <p class="mt-2 text-3xl/9 font-bold text-gray-900 dark:text-white">12</p>
-      </div>
-      <div class="rounded-sm border border-gray-200 bg-white p-6 dark:border-gray-700 dark:bg-gray-800">
-        <h3 class="text-sm/6 font-medium text-gray-600 dark:text-gray-400">Quota Tiers</h3>
-        <p class="mt-2 text-3xl/9 font-bold text-gray-900 dark:text-white">5</p>
-      </div>
-      <div class="rounded-sm border border-gray-200 bg-white p-6 dark:border-gray-700 dark:bg-gray-800">
-        <h3 class="text-sm/6 font-medium text-gray-600 dark:text-gray-400">Active Overrides</h3>
-        <p class="mt-2 text-3xl/9 font-bold text-gray-900 dark:text-white">3</p>
-      </div>
-    </div> -->
-  </div>
-</div>
diff --git a/frontend/ai.client/src/app/admin/admin.page.ts b/frontend/ai.client/src/app/admin/admin.page.ts
deleted file mode 100644
index c5b4166b..00000000
--- a/frontend/ai.client/src/app/admin/admin.page.ts
+++ /dev/null
@@ -1,192 +0,0 @@
-import { Component, ChangeDetectionStrategy } from '@angular/core';
-import { RouterLink } from '@angular/router';
-import { NgIcon, provideIcons } from '@ng-icons/core';
-import {
-  heroCpuChip,
-  heroPencilSquare,
-  heroScale,
-  heroChartBar,
-  heroClipboardDocumentList,
-  heroMagnifyingGlass,
-  heroCalendar,
-  heroSparkles,
-  heroCurrencyDollar,
-  heroUsers,
-  heroShieldCheck,
-  heroWrenchScrewdriver,
-  heroLink,
-  heroFingerPrint,
-  heroAcademicCap,
-} from '@ng-icons/heroicons/outline';
-
-interface AdminFeature {
-  title: string;
-  description: string;
-  icon: string;
-  route: string;
-}
-
-@Component({
-  selector: 'app-admin-page',
-  imports: [RouterLink, NgIcon],
-  providers: [
-    provideIcons({
-      heroCpuChip,
-      heroPencilSquare,
-      heroScale,
-      heroChartBar,
-      heroClipboardDocumentList,
-      heroMagnifyingGlass,
-      heroCalendar,
-      heroSparkles,
-      heroCurrencyDollar,
-      heroUsers,
-      heroShieldCheck,
-      heroWrenchScrewdriver,
-      heroLink,
-      heroFingerPrint,
-      heroAcademicCap,
-    })
-  ],
-  templateUrl: './admin.page.html',
-  styleUrl: './admin.page.css',
-  changeDetection: ChangeDetectionStrategy.OnPush,
-})
-export class AdminPage {
-  readonly features: AdminFeature[] = [
-    {
-      title: 'Cost Analytics',
-      description: 'View system-wide usage metrics, top users by cost, model breakdowns, and cost trends. Export reports for analysis.',
-      icon: 'heroCurrencyDollar',
-      route: '/admin/costs',
-    },
-    {
-      title: 'Manage Models',
-      description: 'Configure and manage AI models available to users. Control model access by role, set pricing, and enable/disable models.',
-      icon: 'heroPencilSquare',
-      route: '/admin/manage-models',
-    },
-    {
-      title: 'Tool Catalog',
-      description: 'Manage the tool catalog, configure role-based access, and sync tools from the registry. Control which tools are available to users.',
-      icon: 'heroWrenchScrewdriver',
-      route: '/admin/tools',
-    },
-    // {
-    //   title: 'Bedrock Models',
-    //   description: 'Browse and explore AWS Bedrock foundation models. View model capabilities, pricing, and add models to your managed collection.',
-    //   icon: 'heroCpuChip',
-    //   route: '/admin/bedrock/models',
-    // },
-    // {
-    //   title: 'Gemini Models',
-    //   description: 'Browse and explore Google Gemini AI models. View model specifications, features, and add models to your managed collection.',
-    //   icon: 'heroSparkles',
-    //   route: '/admin/gemini/models',
-    // },
-    // {
-    //   title: 'OpenAI Models',
-    //   description: 'Browse and explore OpenAI models including GPT-4 and other offerings. View capabilities and add models to your managed collection.',
-    //   icon: 'heroCpuChip',
-    //   route: '/admin/openai/models',
-    // },
-    
-    {
-      title: 'User Lookup',
-      description: 'Search and browse users to view their profile, costs, and quota status. Manage user-specific overrides and assignments.',
-      icon: 'heroUsers',
-      route: '/admin/users',
-    },
-    {
-      title: 'Role Management',
-      description: 'Create and manage application roles with tool and model permissions. Configure JWT mappings and role inheritance.',
-      icon: 'heroShieldCheck',
-      route: '/admin/roles',
-    },
-    {
-      title: 'Auth Providers',
-      description: 'Configure OIDC authentication providers for user login. Manage issuer URLs, client credentials, claim mappings, and login page appearance.',
-      icon: 'heroFingerPrint',
-      route: '/admin/auth-providers',
-    },
-    {
-      title: 'Connectors',
-      description: 'Configure third-party OAuth integrations that users can connect for MCP tool authentication. Manage Google, Microsoft, GitHub, and custom connectors.',
-      icon: 'heroLink',
-      route: '/admin/connectors',
-    },
-    {
-      title: 'Fine-Tuning Access',
-      description: 'Manage which users can access fine-tuning. Grant or revoke access, set monthly compute hour quotas, and monitor usage.',
-      icon: 'heroAcademicCap',
-      route: '/admin/fine-tuning',
-    },
-    {
-      title: 'Fine-Tuning Costs',
-      description: 'View per-user GPU compute costs, hours used, and job counts for fine-tuning. Drill into monthly breakdowns.',
-      icon: 'heroChartBar',
-      route: '/admin/fine-tuning/costs',
-    },
-    {
-      title: 'Quota Tiers',
-      description: 'Create and manage quota tiers with cost limits and soft limit configurations. Define monthly/daily limits and warning thresholds.',
-      icon: 'heroScale',
-      route: '/admin/quota/tiers',
-    },
-    {
-      title: 'Quota Assignments',
-      description: 'Assign quota tiers to users, roles, or email domains. Control priority and manage default tier assignments.',
-      icon: 'heroClipboardDocumentList',
-      route: '/admin/quota/assignments',
-    },
-    {
-      title: 'Quota Overrides',
-      description: 'Create temporary quota exceptions for individual users. Set custom limits or unlimited access with expiration dates.',
-      icon: 'heroCalendar',
-      route: '/admin/quota/overrides',
-    },
-    {
-      title: 'Quota Inspector',
-      description: 'Debug and inspect quota resolution for individual users. View resolved quotas, current usage, and recent blocks.',
-      icon: 'heroMagnifyingGlass',
-      route: '/admin/quota/inspector',
-    },
-    {
-      title: 'Quota Events',
-      description: 'Monitor quota enforcement events including warnings, blocks, resets, and override applications. Export event data to CSV.',
-      icon: 'heroChartBar',
-      route: '/admin/quota/events',
-    },
-    
-  ];
-
-  getIconBackgroundClasses(index: number): string {
-    const backgrounds = [
-      'bg-purple-100 dark:bg-purple-900/30',
-      'bg-blue-100 dark:bg-blue-900/30',
-      'bg-green-100 dark:bg-green-900/30',
-      'bg-amber-100 dark:bg-amber-900/30',
-      'bg-pink-100 dark:bg-pink-900/30',
-      'bg-indigo-100 dark:bg-indigo-900/30',
-      'bg-teal-100 dark:bg-teal-900/30',
-      'bg-rose-100 dark:bg-rose-900/30',
-      'bg-emerald-100 dark:bg-emerald-900/30',
-    ];
-    return backgrounds[index % backgrounds.length];
-  }
-
-  getIconColorClasses(index: number): string {
-    const colors = [
-      'text-purple-600 dark:text-purple-400',
-      'text-blue-600 dark:text-blue-400',
-      'text-green-600 dark:text-green-400',
-      'text-amber-600 dark:text-amber-400',
-      'text-pink-600 dark:text-pink-400',
-      'text-indigo-600 dark:text-indigo-400',
-      'text-teal-600 dark:text-teal-400',
-      'text-rose-600 dark:text-rose-400',
-      'text-emerald-600 dark:text-emerald-400',
-    ];
-    return colors[index % colors.length];
-  }
-}
diff --git a/frontend/ai.client/src/app/admin/admin.routes.ts b/frontend/ai.client/src/app/admin/admin.routes.ts
new file mode 100644
index 00000000..49f1ed09
--- /dev/null
+++ b/frontend/ai.client/src/app/admin/admin.routes.ts
@@ -0,0 +1,139 @@
+import { Routes } from '@angular/router';
+import { FineTuningLayout } from './fine-tuning-access/fine-tuning.layout';
+
+export const adminRoutes: Routes = [
+  {
+    path: '',
+    redirectTo: 'costs',
+    pathMatch: 'full',
+  },
+  {
+    path: 'costs',
+    loadComponent: () => import('./costs/admin-costs.page').then(m => m.AdminCostsPage),
+  },
+  {
+    path: 'quota',
+    loadChildren: () => import('./quota-tiers/quota-routing.module').then(m => m.quotaRoutes),
+  },
+  {
+    path: 'fine-tuning',
+    component: FineTuningLayout,
+    children: [
+      {
+        path: '',
+        loadComponent: () => import('./fine-tuning-access/fine-tuning-access.page').then(m => m.FineTuningAccessPage),
+      },
+      {
+        path: 'costs',
+        loadComponent: () => import('./fine-tuning-costs/fine-tuning-costs.page').then(m => m.FineTuningCostsPage),
+      },
+    ],
+  },
+  {
+    path: 'manage-models',
+    loadComponent: () => import('./manage-models/manage-models.page').then(m => m.ManageModelsPage),
+  },
+  {
+    path: 'manage-models/new',
+    loadComponent: () => import('./manage-models/model-form.page').then(m => m.ModelFormPage),
+  },
+  {
+    path: 'manage-models/edit/:id',
+    loadComponent: () => import('./manage-models/model-form.page').then(m => m.ModelFormPage),
+  },
+  {
+    path: 'bedrock/models',
+    loadComponent: () => import('./bedrock-models/bedrock-models.page').then(m => m.BedrockModelsPage),
+  },
+  {
+    path: 'gemini/models',
+    loadComponent: () => import('./gemini-models/gemini-models.page').then(m => m.GeminiModelsPage),
+  },
+  {
+    path: 'openai/models',
+    loadComponent: () => import('./openai-models/openai-models.page').then(m => m.OpenAIModelsPage),
+  },
+  {
+    path: 'tools',
+    loadComponent: () => import('./tools/pages/tool-list.page').then(m => m.ToolListPage),
+  },
+  {
+    path: 'tools/new',
+    loadComponent: () => import('./tools/pages/tool-form.page').then(m => m.ToolFormPage),
+  },
+  {
+    path: 'tools/edit/:toolId',
+    loadComponent: () => import('./tools/pages/tool-form.page').then(m => m.ToolFormPage),
+  },
+  {
+    path: 'connectors',
+    loadComponent: () => import('./connectors/pages/connector-list.page').then(m => m.ConnectorListPage),
+  },
+  {
+    path: 'connectors/new',
+    loadComponent: () => import('./connectors/pages/connector-form.page').then(m => m.ConnectorFormPage),
+  },
+  {
+    path: 'connectors/edit/:providerId',
+    loadComponent: () => import('./connectors/pages/connector-form.page').then(m => m.ConnectorFormPage),
+  },
+  {
+    path: 'oauth-providers',
+    redirectTo: 'connectors',
+    pathMatch: 'full',
+  },
+  {
+    path: 'oauth-providers/new',
+    redirectTo: 'connectors/new',
+    pathMatch: 'full',
+  },
+  {
+    path: 'oauth-providers/edit/:providerId',
+    redirectTo: 'connectors/edit/:providerId',
+    pathMatch: 'full',
+  },
+  {
+    path: 'users',
+    loadComponent: () => import('./users/pages/user-list/user-list.page').then(m => m.UserListPage),
+  },
+  {
+    path: 'users/:userId',
+    loadComponent: () => import('./users/pages/user-detail/user-detail.page').then(m => m.UserDetailPage),
+  },
+  {
+    path: 'roles',
+    loadComponent: () => import('./roles/pages/role-list.page').then(m => m.RoleListPage),
+  },
+  {
+    path: 'roles/new',
+    loadComponent: () => import('./roles/pages/role-form.page').then(m => m.RoleFormPage),
+  },
+  {
+    path: 'roles/edit/:id',
+    loadComponent: () => import('./roles/pages/role-form.page').then(m => m.RoleFormPage),
+  },
+  {
+    path: 'auth-providers',
+    loadComponent: () => import('./auth-providers/pages/provider-list.page').then(m => m.AuthProviderListPage),
+  },
+  {
+    path: 'auth-providers/new',
+    loadComponent: () => import('./auth-providers/pages/provider-form.page').then(m => m.AuthProviderFormPage),
+  },
+  {
+    path: 'auth-providers/edit/:providerId',
+    loadComponent: () => import('./auth-providers/pages/provider-form.page').then(m => m.AuthProviderFormPage),
+  },
+  {
+    path: 'manage-user-menu-links',
+    loadComponent: () => import('./manage-user-menu-links/manage-user-menu-links.page').then(m => m.ManageUserMenuLinksPage),
+  },
+  {
+    path: 'manage-user-menu-links/new',
+    loadComponent: () => import('./manage-user-menu-links/user-menu-link-form.page').then(m => m.UserMenuLinkFormPage),
+  },
+  {
+    path: 'manage-user-menu-links/edit/:id',
+    loadComponent: () => import('./manage-user-menu-links/user-menu-link-form.page').then(m => m.UserMenuLinkFormPage),
+  },
+];
diff --git a/frontend/ai.client/src/app/admin/auth-providers/pages/provider-form.page.ts b/frontend/ai.client/src/app/admin/auth-providers/pages/provider-form.page.ts
index c2468ae9..654cc338 100644
--- a/frontend/ai.client/src/app/admin/auth-providers/pages/provider-form.page.ts
+++ b/frontend/ai.client/src/app/admin/auth-providers/pages/provider-form.page.ts
@@ -69,8 +69,7 @@ interface ProviderFormGroup {
     class: 'block',
   },
   template: `
-    <div class="min-h-dvh">
-      <div class="mx-auto max-w-4xl px-4 py-8 sm:px-6 lg:px-8">
+    <div class="max-w-4xl">
         <!-- Back Button -->
         <button
           (click)="goBack()"
@@ -95,7 +94,7 @@ interface ProviderFormGroup {
           <div class="flex h-64 items-center justify-center">
             <div class="flex flex-col items-center gap-4">
               <div
-                class="size-12 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600 dark:border-gray-600"
+                class="size-12 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600 dark:border-t-blue-400 dark:border-gray-600"
               ></div>
               <p class="text-sm text-gray-500 dark:text-gray-400">
                 Loading provider...
@@ -667,7 +666,6 @@ interface ProviderFormGroup {
             </div>
           </form>
         }
-      </div>
     </div>
   `,
 })
diff --git a/frontend/ai.client/src/app/admin/auth-providers/pages/provider-list.page.ts b/frontend/ai.client/src/app/admin/auth-providers/pages/provider-list.page.ts
index 2472f7b5..2fc47b63 100644
--- a/frontend/ai.client/src/app/admin/auth-providers/pages/provider-list.page.ts
+++ b/frontend/ai.client/src/app/admin/auth-providers/pages/provider-list.page.ts
@@ -45,15 +45,6 @@ import { AuthProvider } from '../models/auth-provider.model';
     class: 'block p-6',
   },
   template: `
-    <!-- Back Button -->
-    <a
-      routerLink="/admin"
-      class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
-    >
-      <ng-icon name="heroArrowLeft" class="size-4" />
-      Back to Admin
-    </a>
-
     <div class="mb-6 flex items-center justify-between">
       <div>
         <h1 class="text-3xl/9 font-bold">Authentication Providers</h1>
@@ -118,7 +109,7 @@ import { AuthProvider } from '../models/auth-provider.model';
       <div class="flex h-64 items-center justify-center">
         <div class="flex flex-col items-center gap-4">
           <div
-            class="size-12 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600 dark:border-gray-600"
+            class="size-12 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600 dark:border-t-blue-400 dark:border-gray-600"
           ></div>
           <p class="text-sm text-gray-500 dark:text-gray-400">
             Loading providers...
diff --git a/frontend/ai.client/src/app/admin/bedrock-models/bedrock-models.page.html b/frontend/ai.client/src/app/admin/bedrock-models/bedrock-models.page.html
index 608cc9d9..995e7713 100644
--- a/frontend/ai.client/src/app/admin/bedrock-models/bedrock-models.page.html
+++ b/frontend/ai.client/src/app/admin/bedrock-models/bedrock-models.page.html
@@ -1,149 +1,119 @@
 <div class="min-h-dvh">
-  <div class="mx-auto max-w-7xl px-4 py-8 sm:px-6 lg:px-8">
+  <div class="mx-auto max-w-5xl px-4 py-8 sm:px-6 lg:px-8">
     <!-- Back Button -->
     <a
       routerLink="/admin/manage-models"
       class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
     >
-      <ng-icon name="heroArrowLeft" class="size-4" />
+      <ng-icon name="heroArrowLeft" class="size-4" aria-hidden="true" />
       Back to Manage Models
     </a>
 
     <!-- Page Header -->
-    <div class="mb-8">
-      <h1 class="text-3xl/9 font-bold text-gray-900 dark:text-white">Bedrock Foundation Models</h1>
-      <p class="mt-2 text-base/7 text-gray-600 dark:text-gray-400">
-        View and filter available AWS Bedrock foundation models
+    <div class="mb-6">
+      <h1 class="text-2xl/8 font-bold text-gray-900 dark:text-white">Bedrock Foundation Models</h1>
+      <p class="mt-1 text-sm/6 text-gray-600 dark:text-gray-400">
+        Browse available AWS Bedrock foundation models and add them to your managed list.
       </p>
     </div>
 
-    <!-- Filters Section -->
-    <div class="mb-6 rounded-sm border border-gray-300 bg-white p-6 dark:border-gray-600 dark:bg-gray-800">
-      <h2 class="mb-4 text-lg/7 font-semibold text-gray-900 dark:text-white">Filters</h2>
-
-      <div class="grid grid-cols-1 gap-4 md:grid-cols-2 lg:grid-cols-3">
-        <!-- Search Filter -->
-        <div>
-          <label for="search" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Search Models
-          </label>
-          <input
-            type="text"
-            id="search"
-            [ngModel]="searchQuery()"
-            (ngModelChange)="searchQuery.set($event)"
-            placeholder="Search by ID, name, or provider..."
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-white dark:placeholder:text-gray-500"
-          />
-        </div>
-
-        <!-- Provider Filter -->
-        <div>
-          <label for="provider" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Provider
-          </label>
-          <select
-            id="provider"
-            [ngModel]="providerFilter()"
-            (ngModelChange)="providerFilter.set($event)"
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-white"
-          >
-            <option value="">All Providers</option>
-            @for (provider of availableProviders(); track provider) {
-              <option [value]="provider">{{ provider }}</option>
-            }
-          </select>
-        </div>
-
-        <!-- Output Modality Filter -->
-        <div>
-          <label for="outputModality" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Output Modality
-          </label>
-          <select
-            id="outputModality"
-            [ngModel]="outputModalityFilter()"
-            (ngModelChange)="outputModalityFilter.set($event)"
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-white"
-          >
-            <option value="">All Modalities</option>
-            @for (modality of availableOutputModalities(); track modality) {
-              <option [value]="modality">{{ modality }}</option>
-            }
-          </select>
-        </div>
-
-        <!-- Inference Type Filter -->
-        <div>
-          <label for="inferenceType" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Inference Type
-          </label>
-          <select
-            id="inferenceType"
-            [ngModel]="inferenceTypeFilter()"
-            (ngModelChange)="inferenceTypeFilter.set($event)"
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-white"
-          >
-            <option value="">All Types</option>
-            @for (type of availableInferenceTypes(); track type) {
-              <option [value]="type">{{ type }}</option>
-            }
-          </select>
-        </div>
-
-        <!-- Customization Type Filter -->
-        <div>
-          <label for="customizationType" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Customization Type
-          </label>
-          <select
-            id="customizationType"
-            [ngModel]="customizationTypeFilter()"
-            (ngModelChange)="customizationTypeFilter.set($event)"
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-white"
-          >
-            <option value="">All Types</option>
-            @for (type of availableCustomizationTypes(); track type) {
-              <option [value]="type">{{ type }}</option>
-            }
-          </select>
-        </div>
-
-        <!-- Max Results Filter -->
-        <div>
-          <label for="maxResults" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Max Results
-          </label>
-          <input
-            type="number"
-            id="maxResults"
-            [ngModel]="maxResultsFilter()"
-            (ngModelChange)="maxResultsFilter.set($event || undefined)"
-            placeholder="No limit"
-            min="1"
-            max="1000"
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-white dark:placeholder:text-gray-500"
-          />
-        </div>
+    <!-- Toolbar: search + filters inline -->
+    <div class="mb-3 flex flex-col gap-2 sm:flex-row sm:flex-wrap sm:items-center">
+      <div class="relative min-w-48 flex-1">
+        <ng-icon
+          name="heroMagnifyingGlass"
+          class="pointer-events-none absolute left-3 top-1/2 size-4 -translate-y-1/2 text-gray-400 dark:text-gray-500"
+          aria-hidden="true"
+        />
+        <label for="search" class="sr-only">Search models</label>
+        <input
+          type="text"
+          id="search"
+          [ngModel]="searchQuery()"
+          (ngModelChange)="searchQuery.set($event)"
+          placeholder="Search by ID, name, or provider…"
+          class="block w-full rounded-2xl border border-gray-300 bg-white py-2 pl-9 pr-3 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+        />
       </div>
 
-      <!-- Filter Actions -->
-      <div class="mt-4 flex gap-3">
+      <label for="provider" class="sr-only">Filter by provider</label>
+      <select
+        id="provider"
+        [ngModel]="providerFilter()"
+        (ngModelChange)="providerFilter.set($event)"
+        class="rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+      >
+        <option value="">All providers</option>
+        @for (provider of availableProviders(); track provider) {
+          <option [value]="provider">{{ provider }}</option>
+        }
+      </select>
+
+      <label for="outputModality" class="sr-only">Filter by output modality</label>
+      <select
+        id="outputModality"
+        [ngModel]="outputModalityFilter()"
+        (ngModelChange)="outputModalityFilter.set($event)"
+        class="rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+      >
+        <option value="">All modalities</option>
+        @for (modality of availableOutputModalities(); track modality) {
+          <option [value]="modality">{{ modality }}</option>
+        }
+      </select>
+
+      <label for="inferenceType" class="sr-only">Filter by inference type</label>
+      <select
+        id="inferenceType"
+        [ngModel]="inferenceTypeFilter()"
+        (ngModelChange)="inferenceTypeFilter.set($event)"
+        class="rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+      >
+        <option value="">All inference types</option>
+        @for (type of availableInferenceTypes(); track type) {
+          <option [value]="type">{{ type }}</option>
+        }
+      </select>
+
+      <label for="customizationType" class="sr-only">Filter by customization type</label>
+      <select
+        id="customizationType"
+        [ngModel]="customizationTypeFilter()"
+        (ngModelChange)="customizationTypeFilter.set($event)"
+        class="rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+      >
+        <option value="">All customizations</option>
+        @for (type of availableCustomizationTypes(); track type) {
+          <option [value]="type">{{ type }}</option>
+        }
+      </select>
+
+      <label for="maxResults" class="sr-only">Max results</label>
+      <input
+        type="number"
+        id="maxResults"
+        [ngModel]="maxResultsFilter()"
+        (ngModelChange)="maxResultsFilter.set($event || undefined)"
+        placeholder="Max"
+        min="1"
+        max="1000"
+        class="w-24 rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+      />
+
+      <button
+        (click)="applyFilters()"
+        class="rounded-2xl bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:bg-blue-500 dark:hover:bg-blue-600"
+      >
+        Apply
+      </button>
+      @if (hasActiveFilters()) {
         <button
-          (click)="applyFilters()"
-          class="rounded-sm bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:bg-blue-500 dark:hover:bg-blue-600"
+          (click)="resetFilters()"
+          class="rounded-2xl px-3 py-2 text-sm/6 font-medium text-gray-600 hover:bg-gray-100 hover:text-gray-900 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-gray-500 dark:text-gray-400 dark:hover:bg-gray-800 dark:hover:text-white"
         >
-          Apply Filters
+          Reset
         </button>
-        @if (hasActiveFilters()) {
-          <button
-            (click)="resetFilters()"
-            class="rounded-sm border border-gray-300 bg-white px-4 py-2 text-sm/6 font-medium text-gray-700 hover:bg-gray-50 focus:outline-hidden focus:ring-3 focus:ring-gray-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-gray-300 dark:hover:bg-gray-600"
-          >
-            Reset Filters
-          </button>
-        }
-      </div>
+      }
     </div>
 
     <!-- Loading State -->
@@ -155,157 +125,125 @@ <h2 class="mb-4 text-lg/7 font-semibold text-gray-900 dark:text-white">Filters</
 
     <!-- Error State -->
     @if (error()) {
-      <div class="rounded-sm border border-red-200 bg-red-50 p-4 dark:border-red-800 dark:bg-red-900/20">
+      <div class="rounded-2xl border border-red-200 bg-red-50 p-4 dark:border-red-800 dark:bg-red-900/20">
         <h3 class="text-sm/6 font-medium text-red-800 dark:text-red-400">Error loading models</h3>
         <p class="mt-1 text-sm/6 text-red-700 dark:text-red-500">{{ error() }}</p>
       </div>
     }
 
-    <!-- Results Header -->
     @if (!isLoading() && !error()) {
-      <div class="mb-4 flex items-center justify-between">
-        <p class="text-sm/6 text-gray-600 dark:text-gray-400">
-          Showing {{ models().length }} model{{ models().length !== 1 ? 's' : '' }}
-        </p>
-      </div>
+      <!-- Count -->
+      <p class="mb-3 text-xs/5 text-gray-500 dark:text-gray-400">
+        {{ models().length }} model{{ models().length !== 1 ? 's' : '' }}
+      </p>
 
       <!-- Models List -->
       @if (models().length === 0) {
-        <div class="rounded-sm border border-gray-200 bg-white p-12 text-center dark:border-gray-700 dark:bg-gray-800">
-          <p class="text-base/7 text-gray-500 dark:text-gray-400">
+        <div class="rounded-2xl border border-dashed border-gray-300 bg-white p-12 text-center dark:border-gray-700 dark:bg-gray-800">
+          <p class="text-sm/6 text-gray-500 dark:text-gray-400">
             No models found matching the current filters.
           </p>
         </div>
       } @else {
-        <div class="space-y-4">
+        <ul class="divide-y divide-gray-200 overflow-hidden rounded-2xl border border-gray-200 bg-white dark:divide-gray-700 dark:border-gray-700 dark:bg-gray-800">
           @for (model of models(); track model.modelId) {
-            <div class="rounded-sm border border-gray-200 bg-white p-6 hover:border-gray-300 dark:border-gray-700 dark:bg-gray-800 dark:hover:border-gray-600">
-              <!-- Model Header -->
-              <div class="mb-4 flex items-start justify-between">
-                <div class="flex-1">
-                  <h3 class="text-lg/7 font-semibold text-gray-900 dark:text-white">
+            <li>
+              <!-- Row -->
+              <div class="flex items-center gap-3 px-3 py-2.5 sm:px-4">
+                <button
+                  type="button"
+                  (click)="toggleExpand(model.modelId)"
+                  [attr.aria-expanded]="isExpanded(model.modelId)"
+                  [attr.aria-controls]="'model-detail-' + model.modelId"
+                  [attr.aria-label]="(isExpanded(model.modelId) ? 'Hide' : 'Show') + ' details for ' + model.modelName"
+                  class="flex size-7 shrink-0 items-center justify-center rounded-2xl text-gray-400 hover:bg-gray-100 hover:text-gray-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:text-gray-500 dark:hover:bg-gray-700 dark:hover:text-gray-200"
+                >
+                  <ng-icon
+                    name="heroChevronDown"
+                    class="size-4 transition-transform duration-150"
+                    [class.rotate-180]="isExpanded(model.modelId)"
+                    aria-hidden="true"
+                  />
+                </button>
+
+                <div class="min-w-0 flex-1">
+                  <span class="block truncate text-sm/6 font-medium text-gray-900 dark:text-white">
                     {{ model.modelName }}
-                  </h3>
-                  <p class="mt-1 font-mono text-sm/6 text-gray-600 dark:text-gray-400">
+                  </span>
+                  <p class="truncate font-mono text-xs/5 text-gray-500 dark:text-gray-400">
                     {{ model.modelId }}
                   </p>
                 </div>
-                <span class="inline-flex shrink-0 items-center rounded-sm bg-blue-100 px-3 py-1 text-xs/5 font-medium text-blue-800 dark:bg-blue-900/50 dark:text-blue-300">
+
+                <span class="hidden shrink-0 rounded-2xl bg-gray-100 px-2.5 py-0.5 text-xs/5 font-medium text-gray-600 sm:inline-block dark:bg-gray-700 dark:text-gray-300">
                   {{ model.providerName }}
                 </span>
-              </div>
-
-              <!-- Model Details Grid -->
-              <div class="grid grid-cols-1 gap-4 md:grid-cols-2">
-                <!-- Input Modalities -->
-                <div>
-                  <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Input Modalities</h4>
-                  <div class="mt-2 flex flex-wrap gap-2">
-                    @for (modality of model.inputModalities; track modality) {
-                      <span class="inline-flex items-center rounded-sm bg-gray-100 px-2 py-1 text-xs/5 text-gray-700 dark:bg-gray-700 dark:text-gray-300">
-                        {{ modality }}
-                      </span>
-                    }
-                    @if (model.inputModalities.length === 0) {
-                      <span class="text-sm/6 text-gray-500 dark:text-gray-400">None</span>
-                    }
-                  </div>
-                </div>
 
-                <!-- Output Modalities -->
-                <div>
-                  <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Output Modalities</h4>
-                  <div class="mt-2 flex flex-wrap gap-2">
-                    @for (modality of model.outputModalities; track modality) {
-                      <span class="inline-flex items-center rounded-sm bg-gray-100 px-2 py-1 text-xs/5 text-gray-700 dark:bg-gray-700 dark:text-gray-300">
-                        {{ modality }}
-                      </span>
-                    }
-                    @if (model.outputModalities.length === 0) {
-                      <span class="text-sm/6 text-gray-500 dark:text-gray-400">None</span>
-                    }
-                  </div>
-                </div>
-
-                <!-- Inference Types -->
-                <div>
-                  <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Inference Types</h4>
-                  <div class="mt-2 flex flex-wrap gap-2">
-                    @for (type of model.inferenceTypesSupported; track type) {
-                      <span class="inline-flex items-center rounded-sm bg-green-100 px-2 py-1 text-xs/5 text-green-700 dark:bg-green-900/50 dark:text-green-300">
-                        {{ type }}
-                      </span>
-                    }
-                    @if (model.inferenceTypesSupported.length === 0) {
-                      <span class="text-sm/6 text-gray-500 dark:text-gray-400">None</span>
-                    }
-                  </div>
-                </div>
-
-                <!-- Customizations Supported -->
-                <div>
-                  <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Customizations</h4>
-                  <div class="mt-2 flex flex-wrap gap-2">
-                    @for (customization of model.customizationsSupported; track customization) {
-                      <span class="inline-flex items-center rounded-sm bg-purple-100 px-2 py-1 text-xs/5 text-purple-700 dark:bg-purple-900/50 dark:text-purple-300">
-                        {{ customization }}
-                      </span>
-                    }
-                    @if (model.customizationsSupported.length === 0) {
-                      <span class="text-sm/6 text-gray-500 dark:text-gray-400">None</span>
-                    }
-                  </div>
-                </div>
-              </div>
-
-              <!-- Model Features and Actions -->
-              <div class="mt-4 flex items-center justify-between gap-4 border-t border-gray-200 pt-4 dark:border-gray-700">
-                <div class="flex gap-4">
-                  <div class="flex items-center gap-2">
-                    @if (model.responseStreamingSupported) {
-                      <svg class="size-5 text-green-600 dark:text-green-400" fill="currentColor" viewBox="0 0 20 20">
-                        <path fill-rule="evenodd" d="M10 18a8 8 0 100-16 8 8 0 000 16zm3.857-9.809a.75.75 0 00-1.214-.882l-3.483 4.79-1.88-1.88a.75.75 0 10-1.06 1.061l2.5 2.5a.75.75 0 001.137-.089l4-5.5z" clip-rule="evenodd" />
-                      </svg>
-                      <span class="text-sm/6 text-gray-700 dark:text-gray-300">Streaming supported</span>
-                    } @else {
-                      <svg class="size-5 text-gray-400" fill="currentColor" viewBox="0 0 20 20">
-                        <path fill-rule="evenodd" d="M10 18a8 8 0 100-16 8 8 0 000 16zM8.28 7.22a.75.75 0 00-1.06 1.06L8.94 10l-1.72 1.72a.75.75 0 101.06 1.06L10 11.06l1.72 1.72a.75.75 0 101.06-1.06L11.06 10l1.72-1.72a.75.75 0 00-1.06-1.06L10 8.94 8.28 7.22z" clip-rule="evenodd" />
-                      </svg>
-                      <span class="text-sm/6 text-gray-500 dark:text-gray-400">No streaming</span>
-                    }
-                  </div>
-
-                  @if (model.modelLifecycle) {
-                    <div class="flex items-center gap-2">
-                      <span class="text-sm/6 text-gray-600 dark:text-gray-400">Status:</span>
-                      <span class="font-medium text-sm/6 text-gray-900 dark:text-white">{{ model.modelLifecycle }}</span>
-                    </div>
-                  }
-                </div>
-
-                <!-- Add Model Button or Added Status -->
                 @if (isModelAdded(model.modelId)) {
-                  <div class="inline-flex items-center gap-2 rounded-sm bg-green-100 px-3 py-1.5 text-sm/6 font-medium text-green-800 dark:bg-green-900/50 dark:text-green-300">
-                    <svg class="size-4" fill="currentColor" viewBox="0 0 20 20">
-                      <path fill-rule="evenodd" d="M10 18a8 8 0 100-16 8 8 0 000 16zm3.857-9.809a.75.75 0 00-1.214-.882l-3.483 4.79-1.88-1.88a.75.75 0 10-1.06 1.061l2.5 2.5a.75.75 0 001.137-.089l4-5.5z" clip-rule="evenodd" />
-                    </svg>
+                  <span class="inline-flex shrink-0 items-center gap-1.5 rounded-2xl bg-green-100 px-3 py-1.5 text-xs/5 font-medium text-green-800 dark:bg-green-900/50 dark:text-green-300">
+                    <ng-icon name="heroCheckCircleSolid" class="size-4" aria-hidden="true" />
                     Added
-                  </div>
+                  </span>
                 } @else {
                   <button
                     (click)="addModelFromBedrock(model)"
-                    class="inline-flex items-center gap-2 rounded-sm bg-blue-600 px-3 py-1.5 text-sm/6 font-medium text-white hover:bg-blue-700 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:bg-blue-500 dark:hover:bg-blue-600"
+                    class="inline-flex shrink-0 items-center gap-1.5 rounded-2xl bg-blue-600 px-3 py-1.5 text-xs/5 font-medium text-white hover:bg-blue-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:bg-blue-500 dark:hover:bg-blue-600"
                   >
-                    <svg class="size-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2">
-                      <path stroke-linecap="round" stroke-linejoin="round" d="M12 4v16m8-8H4" />
-                    </svg>
-                    Add Model
+                    <ng-icon name="heroPlus" class="size-4" aria-hidden="true" />
+                    Add
                   </button>
                 }
               </div>
-            </div>
+
+              <!-- Expanded detail -->
+              @if (isExpanded(model.modelId)) {
+                <div
+                  [id]="'model-detail-' + model.modelId"
+                  class="border-t border-gray-100 bg-gray-50 px-4 py-3 sm:pl-14 dark:border-gray-700/60 dark:bg-gray-900/40"
+                >
+                  <dl class="grid grid-cols-1 gap-x-8 gap-y-3 sm:grid-cols-3">
+                    <div>
+                      <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Input modalities</dt>
+                      <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                        {{ model.inputModalities.length > 0 ? model.inputModalities.join(', ') : 'None' }}
+                      </dd>
+                    </div>
+                    <div>
+                      <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Output modalities</dt>
+                      <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                        {{ model.outputModalities.length > 0 ? model.outputModalities.join(', ') : 'None' }}
+                      </dd>
+                    </div>
+                    <div>
+                      <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Streaming</dt>
+                      <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                        {{ model.responseStreamingSupported ? 'Supported' : 'Not supported' }}
+                      </dd>
+                    </div>
+                    <div>
+                      <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Inference types</dt>
+                      <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                        {{ model.inferenceTypesSupported.length > 0 ? model.inferenceTypesSupported.join(', ') : 'None' }}
+                      </dd>
+                    </div>
+                    <div>
+                      <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Customizations</dt>
+                      <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                        {{ model.customizationsSupported.length > 0 ? model.customizationsSupported.join(', ') : 'None' }}
+                      </dd>
+                    </div>
+                    @if (model.modelLifecycle) {
+                      <div>
+                        <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Lifecycle</dt>
+                        <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">{{ model.modelLifecycle }}</dd>
+                      </div>
+                    }
+                  </dl>
+                </div>
+              }
+            </li>
           }
-        </div>
+        </ul>
       }
     }
   </div>
diff --git a/frontend/ai.client/src/app/admin/bedrock-models/bedrock-models.page.ts b/frontend/ai.client/src/app/admin/bedrock-models/bedrock-models.page.ts
index 6eec47d7..039c51e3 100644
--- a/frontend/ai.client/src/app/admin/bedrock-models/bedrock-models.page.ts
+++ b/frontend/ai.client/src/app/admin/bedrock-models/bedrock-models.page.ts
@@ -2,7 +2,13 @@ import { Component, ChangeDetectionStrategy, inject, signal, computed } from '@a
 import { Router, RouterLink } from '@angular/router';
 import { FormsModule } from '@angular/forms';
 import { NgIcon, provideIcons } from '@ng-icons/core';
-import { heroArrowLeft } from '@ng-icons/heroicons/outline';
+import {
+  heroArrowLeft,
+  heroPlus,
+  heroMagnifyingGlass,
+  heroChevronDown,
+} from '@ng-icons/heroicons/outline';
+import { heroCheckCircleSolid } from '@ng-icons/heroicons/solid';
 import { BedrockModelsService } from './services/bedrock-models.service';
 import { FoundationModelSummary } from './models/bedrock-model.model';
 import { ManagedModelsService } from '../manage-models/services/managed-models.service';
@@ -11,7 +17,15 @@ import { ThinkingDotsComponent } from '../../components/thinking-dots.component'
 @Component({
   selector: 'app-bedrock-models-page',
   imports: [FormsModule, ThinkingDotsComponent, RouterLink, NgIcon],
-  providers: [provideIcons({ heroArrowLeft })],
+  providers: [
+    provideIcons({
+      heroArrowLeft,
+      heroPlus,
+      heroMagnifyingGlass,
+      heroChevronDown,
+      heroCheckCircleSolid,
+    }),
+  ],
   templateUrl: './bedrock-models.page.html',
   styleUrl: './bedrock-models.page.css',
   changeDetection: ChangeDetectionStrategy.OnPush,
@@ -29,6 +43,9 @@ export class BedrockModelsPage {
   maxResultsFilter = signal<number | undefined>(undefined);
   searchQuery = signal<string>('');
 
+  // Row detail expansion state (set of model ids currently expanded)
+  private readonly expandedIds = signal<ReadonlySet<string>>(new Set());
+
   // Access the models resource from the service
   readonly modelsResource = this.bedrockModelsService.modelsResource;
 
@@ -125,6 +142,22 @@ export class BedrockModelsPage {
     );
   });
 
+  isExpanded(modelId: string): boolean {
+    return this.expandedIds().has(modelId);
+  }
+
+  toggleExpand(modelId: string): void {
+    this.expandedIds.update(current => {
+      const next = new Set(current);
+      if (next.has(modelId)) {
+        next.delete(modelId);
+      } else {
+        next.add(modelId);
+      }
+      return next;
+    });
+  }
+
   /**
    * Check if a model has already been added to the managed models list
    */
diff --git a/frontend/ai.client/src/app/admin/connectors/pages/connector-form.page.ts b/frontend/ai.client/src/app/admin/connectors/pages/connector-form.page.ts
index b43df68c..35adbbc6 100644
--- a/frontend/ai.client/src/app/admin/connectors/pages/connector-form.page.ts
+++ b/frontend/ai.client/src/app/admin/connectors/pages/connector-form.page.ts
@@ -103,8 +103,7 @@ const ICON_ACCEPTED_MIME_TYPES = [
   ],
   host: { class: 'block' },
   template: `
-    <div class="min-h-dvh">
-      <div class="mx-auto max-w-3xl px-4 py-8 sm:px-6 lg:px-8">
+    <div class="max-w-3xl">
         <button
           type="button"
           (click)="goBack()"
@@ -126,7 +125,7 @@ const ICON_ACCEPTED_MIME_TYPES = [
         @if (loading()) {
           <div class="flex h-64 items-center justify-center">
             <div class="flex flex-col items-center gap-4">
-              <div class="size-12 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600 dark:border-gray-600"></div>
+              <div class="size-12 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600 dark:border-t-blue-400 dark:border-gray-600"></div>
               <p class="text-sm/6 text-gray-500 dark:text-gray-400">Loading connector...</p>
             </div>
           </div>
@@ -572,7 +571,6 @@ const ICON_ACCEPTED_MIME_TYPES = [
             </div>
           </form>
         }
-      </div>
     </div>
   `,
 })
diff --git a/frontend/ai.client/src/app/admin/connectors/pages/connector-list.page.ts b/frontend/ai.client/src/app/admin/connectors/pages/connector-list.page.ts
index 720b9e0f..eb9db52a 100644
--- a/frontend/ai.client/src/app/admin/connectors/pages/connector-list.page.ts
+++ b/frontend/ai.client/src/app/admin/connectors/pages/connector-list.page.ts
@@ -58,17 +58,7 @@ import {
     class: 'block',
   },
   template: `
-    <div class="min-h-dvh">
-      <div class="mx-auto max-w-6xl px-4 py-8 sm:px-6 lg:px-8">
-        <!-- Back Button -->
-        <a
-          routerLink="/admin"
-          class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
-        >
-          <ng-icon name="heroArrowLeft" class="size-4" />
-          Back to Admin
-        </a>
-
+    <div>
         <!-- Page Header -->
         <div class="mb-8 flex flex-col gap-4 sm:flex-row sm:items-center sm:justify-between">
           <div>
@@ -149,7 +139,7 @@ import {
           <div class="flex h-64 items-center justify-center">
             <div class="flex flex-col items-center gap-4">
               <div
-                class="size-12 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600 dark:border-gray-600"
+                class="size-12 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600 dark:border-t-blue-400 dark:border-gray-600"
               ></div>
               <p class="text-sm/6 text-gray-500 dark:text-gray-400">
                 Loading connectors...
@@ -364,7 +354,6 @@ import {
             </div>
           </div>
         }
-      </div>
     </div>
   `,
 })
diff --git a/frontend/ai.client/src/app/admin/costs/admin-costs.page.ts b/frontend/ai.client/src/app/admin/costs/admin-costs.page.ts
index 106ebd48..069a0998 100644
--- a/frontend/ai.client/src/app/admin/costs/admin-costs.page.ts
+++ b/frontend/ai.client/src/app/admin/costs/admin-costs.page.ts
@@ -39,18 +39,7 @@ import { ModelBreakdownComponent } from './components/model-breakdown.component'
   providers: [provideIcons({ heroArrowLeft, heroArrowDownTray })],
   changeDetection: ChangeDetectionStrategy.OnPush,
   template: `
-    <div class="min-h-dvh bg-gray-50 dark:bg-gray-900">
-      <!-- Content -->
-      <div class="mx-auto max-w-7xl px-4 py-8 sm:px-6 lg:px-8">
-        <!-- Back Button -->
-        <a
-          routerLink="/admin"
-          class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
-        >
-          <ng-icon name="heroArrowLeft" class="size-4" />
-          Back to Admin
-        </a>
-
+    <div>
         <!-- Page Header -->
         <div class="mb-6 flex flex-col gap-4 sm:flex-row sm:items-center sm:justify-between">
           <div>
@@ -83,7 +72,7 @@ import { ModelBreakdownComponent } from './components/model-breakdown.component'
           <div class="flex items-center justify-center h-64">
             <div class="flex flex-col items-center gap-4">
               <div
-                class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600"
+                class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600 dark:border-t-blue-400"
               ></div>
               <p class="text-sm text-gray-500 dark:text-gray-400">
                 Loading dashboard data...
@@ -128,7 +117,7 @@ import { ModelBreakdownComponent } from './components/model-breakdown.component'
           </div>
         } @else {
           <!-- Summary Cards -->
-          <div class="grid grid-cols-1 gap-6 sm:grid-cols-2 lg:grid-cols-4">
+          <div class="grid grid-cols-1 gap-6 sm:grid-cols-2 xl:grid-cols-4">
             <app-system-summary-card
               title="Total Cost"
               [value]="formattedTotalCost()"
@@ -172,7 +161,6 @@ import { ModelBreakdownComponent } from './components/model-breakdown.component'
             />
           </div>
         }
-      </div>
     </div>
   `,
 })
diff --git a/frontend/ai.client/src/app/admin/costs/components/system-summary-card.component.ts b/frontend/ai.client/src/app/admin/costs/components/system-summary-card.component.ts
index a724f5cb..b25ba1b3 100644
--- a/frontend/ai.client/src/app/admin/costs/components/system-summary-card.component.ts
+++ b/frontend/ai.client/src/app/admin/costs/components/system-summary-card.component.ts
@@ -46,52 +46,50 @@ export type SummaryCardIcon =
     <div
       class="bg-white dark:bg-gray-800 rounded-lg shadow-xs border border-gray-200 dark:border-gray-700 p-6"
     >
-      <div class="flex items-center justify-between">
-        <div class="flex-1">
-          <p class="text-sm font-medium text-gray-500 dark:text-gray-400">
-            {{ title() }}
-          </p>
-          <p class="mt-2 text-3xl font-semibold text-gray-900 dark:text-white">
-            {{ value() }}
-          </p>
-
-          @if (trend() !== null && trend() !== undefined) {
-            <div class="mt-2 flex items-center gap-1">
-              @if (trend()! > 0) {
-                <ng-icon
-                  name="heroArrowTrendingUp"
-                  class="size-4 text-green-500"
-                />
-                <span class="text-sm text-green-600 dark:text-green-400">
-                  +{{ trend() | number : '1.1-1' }}%
-                </span>
-              } @else if (trend()! < 0) {
-                <ng-icon
-                  name="heroArrowTrendingDown"
-                  class="size-4 text-red-500"
-                />
-                <span class="text-sm text-red-600 dark:text-red-400">
-                  {{ trend() | number : '1.1-1' }}%
-                </span>
-              } @else {
-                <span class="text-sm text-gray-500 dark:text-gray-400">
-                  No change
-                </span>
-              }
-              <span class="text-sm text-gray-400 dark:text-gray-500">
-                vs last period
-              </span>
-            </div>
-          }
-        </div>
-
+      <div class="flex items-start justify-between gap-3">
+        <p class="text-sm/6 font-medium text-gray-500 dark:text-gray-400">
+          {{ title() }}
+        </p>
         <div
-          class="flex size-12 items-center justify-center rounded-lg"
+          class="flex size-8 shrink-0 items-center justify-center rounded-md"
           [class]="iconBackgroundClass()"
         >
-          <ng-icon [name]="icon()" class="size-6" [class]="iconColorClass()" />
+          <ng-icon [name]="icon()" class="size-4" [class]="iconColorClass()" />
         </div>
       </div>
+
+      <p class="mt-3 text-3xl/9 font-semibold text-gray-900 dark:text-white">
+        {{ value() }}
+      </p>
+
+      @if (trend() !== null && trend() !== undefined) {
+        <div class="mt-2 flex items-center gap-1">
+          @if (trend()! > 0) {
+            <ng-icon
+              name="heroArrowTrendingUp"
+              class="size-4 text-green-500"
+            />
+            <span class="text-sm text-green-600 dark:text-green-400">
+              +{{ trend() | number : '1.1-1' }}%
+            </span>
+          } @else if (trend()! < 0) {
+            <ng-icon
+              name="heroArrowTrendingDown"
+              class="size-4 text-red-500"
+            />
+            <span class="text-sm text-red-600 dark:text-red-400">
+              {{ trend() | number : '1.1-1' }}%
+            </span>
+          } @else {
+            <span class="text-sm text-gray-500 dark:text-gray-400">
+              No change
+            </span>
+          }
+          <span class="text-sm text-gray-400 dark:text-gray-500">
+            vs last period
+          </span>
+        </div>
+      }
     </div>
   `,
 })
diff --git a/frontend/ai.client/src/app/admin/costs/components/top-users-table.component.ts b/frontend/ai.client/src/app/admin/costs/components/top-users-table.component.ts
index aca037d8..d801cb2a 100644
--- a/frontend/ai.client/src/app/admin/costs/components/top-users-table.component.ts
+++ b/frontend/ai.client/src/app/admin/costs/components/top-users-table.component.ts
@@ -303,7 +303,7 @@ type SortDirection = 'asc' | 'desc';
         >
           <div class="inline-flex items-center gap-2 text-sm text-gray-500">
             <div
-              class="animate-spin rounded-full size-4 border-2 border-gray-300 dark:border-gray-600 border-t-blue-600"
+              class="animate-spin rounded-full size-4 border-2 border-gray-300 dark:border-gray-600 border-t-blue-600 dark:border-t-blue-400"
             ></div>
             Loading more users...
           </div>
diff --git a/frontend/ai.client/src/app/admin/fine-tuning-access/fine-tuning-access.page.html b/frontend/ai.client/src/app/admin/fine-tuning-access/fine-tuning-access.page.html
index 010a0fc0..1d052338 100644
--- a/frontend/ai.client/src/app/admin/fine-tuning-access/fine-tuning-access.page.html
+++ b/frontend/ai.client/src/app/admin/fine-tuning-access/fine-tuning-access.page.html
@@ -1,10 +1,4 @@
 <div class="mx-auto max-w-5xl px-4 py-8 sm:px-6 lg:px-8">
-  <!-- Back link -->
-  <a routerLink="/admin" class="mb-6 inline-flex items-center gap-1 text-sm/6 text-gray-500 hover:text-gray-700 dark:text-gray-400 dark:hover:text-gray-200">
-    <ng-icon name="heroArrowLeft" class="size-4" />
-    Back to Admin
-  </a>
-
   <!-- Page header -->
   <div class="mb-6 flex items-start justify-between">
     <div>
@@ -94,7 +88,7 @@ <h2 class="mb-3 text-sm/6 font-semibold text-blue-900 dark:text-blue-200">Grant
   <!-- Loading spinner (only when no data yet) -->
   @if (state.loading() && state.grantCount() === 0) {
     <div class="flex items-center justify-center py-12">
-      <div class="size-8 animate-spin rounded-full border-4 border-gray-200 border-t-blue-600" role="status" aria-label="Loading"></div>
+      <div class="size-8 animate-spin rounded-full border-4 border-gray-200 border-t-blue-600 dark:border-t-blue-400" role="status" aria-label="Loading"></div>
     </div>
   }
 
diff --git a/frontend/ai.client/src/app/admin/fine-tuning-access/fine-tuning.layout.ts b/frontend/ai.client/src/app/admin/fine-tuning-access/fine-tuning.layout.ts
new file mode 100644
index 00000000..0d366a6e
--- /dev/null
+++ b/frontend/ai.client/src/app/admin/fine-tuning-access/fine-tuning.layout.ts
@@ -0,0 +1,48 @@
+import { Component, ChangeDetectionStrategy } from '@angular/core';
+import { RouterLink, RouterLinkActive, RouterOutlet } from '@angular/router';
+
+interface FineTuningTab {
+  label: string;
+  route: string;
+  exact: boolean;
+}
+
+@Component({
+  selector: 'app-fine-tuning-layout',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [RouterLink, RouterLinkActive, RouterOutlet],
+  host: { class: 'block' },
+  template: `
+    <div class="mb-6">
+      <h1 class="text-2xl/8 font-bold text-gray-900 dark:text-white">Fine-Tuning</h1>
+      <p class="mt-1 text-sm/6 text-gray-600 dark:text-gray-400">
+        Manage who can access fine-tuning and review the resulting compute spend.
+      </p>
+    </div>
+
+    <div class="mb-6 border-b border-gray-200 dark:border-white/10">
+      <nav class="-mb-px flex flex-wrap gap-x-6" aria-label="Fine-tuning sections">
+        @for (tab of tabs; track tab.route) {
+          <a
+            [routerLink]="tab.route"
+            routerLinkActive="border-blue-500 text-blue-600 dark:border-blue-400 dark:text-blue-400"
+            [routerLinkActiveOptions]="{ exact: tab.exact }"
+            #rla="routerLinkActive"
+            [attr.aria-current]="rla.isActive ? 'page' : null"
+            class="whitespace-nowrap border-b-2 border-transparent px-1 py-3 text-sm/6 font-medium text-gray-500 hover:border-gray-300 hover:text-gray-700 dark:text-gray-400 dark:hover:border-white/20 dark:hover:text-gray-200"
+          >
+            {{ tab.label }}
+          </a>
+        }
+      </nav>
+    </div>
+
+    <router-outlet />
+  `,
+})
+export class FineTuningLayout {
+  readonly tabs: FineTuningTab[] = [
+    { label: 'Access', route: '.', exact: true },
+    { label: 'Costs', route: 'costs', exact: false },
+  ];
+}
diff --git a/frontend/ai.client/src/app/admin/fine-tuning-costs/fine-tuning-costs.page.html b/frontend/ai.client/src/app/admin/fine-tuning-costs/fine-tuning-costs.page.html
index dc634d59..de655b87 100644
--- a/frontend/ai.client/src/app/admin/fine-tuning-costs/fine-tuning-costs.page.html
+++ b/frontend/ai.client/src/app/admin/fine-tuning-costs/fine-tuning-costs.page.html
@@ -1,14 +1,5 @@
 <div class="min-h-dvh bg-gray-50 dark:bg-gray-900">
   <div class="mx-auto max-w-7xl px-4 py-8 sm:px-6 lg:px-8">
-    <!-- Back Button -->
-    <a
-      routerLink="/admin"
-      class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
-    >
-      <ng-icon name="heroArrowLeft" class="size-4" />
-      Back to Admin
-    </a>
-
     <!-- Page Header -->
     <div class="mb-6 flex flex-col gap-4 sm:flex-row sm:items-center sm:justify-between">
       <div>
@@ -48,7 +39,7 @@ <h1 class="text-3xl/9 font-bold text-gray-900 dark:text-white">
       <div class="flex items-center justify-center h-64">
         <div class="flex flex-col items-center gap-4">
           <div
-            class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600"
+            class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600 dark:border-t-blue-400"
             role="status"
           >
             <span class="sr-only">Loading cost data</span>
diff --git a/frontend/ai.client/src/app/admin/gemini-models/gemini-models.page.html b/frontend/ai.client/src/app/admin/gemini-models/gemini-models.page.html
index d77e21d5..39f9251e 100644
--- a/frontend/ai.client/src/app/admin/gemini-models/gemini-models.page.html
+++ b/frontend/ai.client/src/app/admin/gemini-models/gemini-models.page.html
@@ -1,77 +1,67 @@
 <div class="min-h-dvh">
-  <div class="mx-auto max-w-7xl px-4 py-8 sm:px-6 lg:px-8">
+  <div class="mx-auto max-w-5xl px-4 py-8 sm:px-6 lg:px-8">
     <!-- Back Button -->
     <a
       routerLink="/admin/manage-models"
       class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
     >
-      <ng-icon name="heroArrowLeft" class="size-4" />
+      <ng-icon name="heroArrowLeft" class="size-4" aria-hidden="true" />
       Back to Manage Models
     </a>
 
     <!-- Page Header -->
-    <div class="mb-8">
-      <h1 class="text-3xl/9 font-bold text-gray-900 dark:text-white">Google Gemini Models</h1>
-      <p class="mt-2 text-base/7 text-gray-600 dark:text-gray-400">
-        View and manage available Google Gemini models
+    <div class="mb-6">
+      <h1 class="text-2xl/8 font-bold text-gray-900 dark:text-white">Google Gemini Models</h1>
+      <p class="mt-1 text-sm/6 text-gray-600 dark:text-gray-400">
+        Browse available Google Gemini models and add them to your managed list.
       </p>
     </div>
 
-    <!-- Filters Section -->
-    <div class="mb-6 rounded-sm border border-gray-300 bg-white p-6 dark:border-gray-600 dark:bg-gray-800">
-      <h2 class="mb-4 text-lg/7 font-semibold text-gray-900 dark:text-white">Filters</h2>
-
-      <div class="grid grid-cols-1 gap-4 md:grid-cols-2 lg:grid-cols-3">
-        <!-- Search Filter -->
-        <div>
-          <label for="search" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Search Models
-          </label>
-          <input
-            type="text"
-            id="search"
-            [ngModel]="searchQuery()"
-            (ngModelChange)="searchQuery.set($event)"
-            placeholder="Search by name or description..."
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-white dark:placeholder:text-gray-500"
-          />
-        </div>
-
-        <!-- Max Results Filter -->
-        <div>
-          <label for="maxResults" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Max Results
-          </label>
-          <input
-            type="number"
-            id="maxResults"
-            [ngModel]="maxResultsFilter()"
-            (ngModelChange)="maxResultsFilter.set($event || undefined)"
-            placeholder="No limit"
-            min="1"
-            max="1000"
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-white dark:placeholder:text-gray-500"
-          />
-        </div>
+    <!-- Toolbar: search + filters inline -->
+    <div class="mb-3 flex flex-col gap-2 sm:flex-row sm:items-center">
+      <div class="relative flex-1">
+        <ng-icon
+          name="heroMagnifyingGlass"
+          class="pointer-events-none absolute left-3 top-1/2 size-4 -translate-y-1/2 text-gray-400 dark:text-gray-500"
+          aria-hidden="true"
+        />
+        <label for="search" class="sr-only">Search models</label>
+        <input
+          type="text"
+          id="search"
+          [ngModel]="searchQuery()"
+          (ngModelChange)="searchQuery.set($event)"
+          placeholder="Search by name or description…"
+          class="block w-full rounded-2xl border border-gray-300 bg-white py-2 pl-9 pr-3 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-hidden focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+        />
       </div>
 
-      <!-- Filter Actions -->
-      <div class="mt-4 flex gap-3">
+      <label for="maxResults" class="sr-only">Max results</label>
+      <input
+        type="number"
+        id="maxResults"
+        [ngModel]="maxResultsFilter()"
+        (ngModelChange)="maxResultsFilter.set($event || undefined)"
+        placeholder="Max"
+        min="1"
+        max="1000"
+        class="w-24 rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-hidden focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+      />
+
+      <button
+        (click)="applyFilters()"
+        class="rounded-2xl bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:bg-blue-500 dark:hover:bg-blue-600"
+      >
+        Apply
+      </button>
+      @if (hasActiveFilters()) {
         <button
-          (click)="applyFilters()"
-          class="rounded-sm bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:bg-blue-500 dark:hover:bg-blue-600"
+          (click)="resetFilters()"
+          class="rounded-2xl px-3 py-2 text-sm/6 font-medium text-gray-600 hover:bg-gray-100 hover:text-gray-900 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-gray-500 dark:text-gray-400 dark:hover:bg-gray-800 dark:hover:text-white"
         >
-          Apply Filters
+          Reset
         </button>
-        @if (hasActiveFilters()) {
-          <button
-            (click)="resetFilters()"
-            class="rounded-sm border border-gray-300 bg-white px-4 py-2 text-sm/6 font-medium text-gray-700 hover:bg-gray-50 focus:outline-hidden focus:ring-3 focus:ring-gray-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-gray-300 dark:hover:bg-gray-600"
-          >
-            Reset Filters
-          </button>
-        }
-      </div>
+      }
     </div>
 
     <!-- Loading State -->
@@ -83,148 +73,133 @@ <h2 class="mb-4 text-lg/7 font-semibold text-gray-900 dark:text-white">Filters</
 
     <!-- Error State -->
     @if (error()) {
-      <div class="rounded-sm border border-red-200 bg-red-50 p-4 dark:border-red-800 dark:bg-red-900/20">
+      <div class="rounded-2xl border border-red-200 bg-red-50 p-4 dark:border-red-800 dark:bg-red-900/20">
         <h3 class="text-sm/6 font-medium text-red-800 dark:text-red-400">Error loading models</h3>
         <p class="mt-1 text-sm/6 text-red-700 dark:text-red-500">{{ error() }}</p>
       </div>
     }
 
-    <!-- Results Header -->
     @if (!isLoading() && !error()) {
-      <div class="mb-4 flex items-center justify-between">
-        <p class="text-sm/6 text-gray-600 dark:text-gray-400">
-          Showing {{ models().length }} model{{ models().length !== 1 ? 's' : '' }}
-        </p>
-      </div>
+      <!-- Count -->
+      <p class="mb-3 text-xs/5 text-gray-500 dark:text-gray-400">
+        {{ models().length }} model{{ models().length !== 1 ? 's' : '' }}
+      </p>
 
       <!-- Models List -->
       @if (models().length === 0) {
-        <div class="rounded-sm border border-gray-200 bg-white p-12 text-center dark:border-gray-700 dark:bg-gray-800">
-          <p class="text-base/7 text-gray-500 dark:text-gray-400">
-            No models found.
-          </p>
+        <div class="rounded-2xl border border-dashed border-gray-300 bg-white p-12 text-center dark:border-gray-700 dark:bg-gray-800">
+          <p class="text-sm/6 text-gray-500 dark:text-gray-400">No models found.</p>
         </div>
       } @else {
-        <div class="space-y-4">
+        <ul class="divide-y divide-gray-200 overflow-hidden rounded-2xl border border-gray-200 bg-white dark:divide-gray-700 dark:border-gray-700 dark:bg-gray-800">
           @for (model of models(); track model.name) {
-            <div class="rounded-sm border border-gray-200 bg-white p-6 hover:border-gray-300 dark:border-gray-700 dark:bg-gray-800 dark:hover:border-gray-600">
-              <!-- Model Header -->
-              <div class="mb-4 flex items-start justify-between">
-                <div class="flex-1">
-                  <h3 class="text-lg/7 font-semibold text-gray-900 dark:text-white">
-                    {{ model.displayName }}
-                  </h3>
-                  <p class="mt-1 font-mono text-sm/6 text-gray-600 dark:text-gray-400">
-                    {{ model.name }}
-                  </p>
-                  @if (model.description) {
-                    <p class="mt-2 text-sm/6 text-gray-700 dark:text-gray-300">
-                      {{ model.description }}
-                    </p>
-                  }
-                </div>
-                <span class="inline-flex shrink-0 items-center rounded-sm bg-blue-100 px-3 py-1 text-xs/5 font-medium text-blue-800 dark:bg-blue-900/50 dark:text-blue-300">
-                  Google
-                </span>
-              </div>
+            <li>
+              <!-- Row -->
+              <div class="flex items-center gap-3 px-3 py-2.5 sm:px-4">
+                <button
+                  type="button"
+                  (click)="toggleExpand(model.name)"
+                  [attr.aria-expanded]="isExpanded(model.name)"
+                  [attr.aria-controls]="'model-detail-' + model.name"
+                  [attr.aria-label]="(isExpanded(model.name) ? 'Hide' : 'Show') + ' details for ' + model.displayName"
+                  class="flex size-7 shrink-0 items-center justify-center rounded-2xl text-gray-400 hover:bg-gray-100 hover:text-gray-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:text-gray-500 dark:hover:bg-gray-700 dark:hover:text-gray-200"
+                >
+                  <ng-icon
+                    name="heroChevronDown"
+                    class="size-4 transition-transform duration-150"
+                    [class.rotate-180]="isExpanded(model.name)"
+                    aria-hidden="true"
+                  />
+                </button>
 
-              <!-- Model Details Grid -->
-              <div class="grid grid-cols-1 gap-4 md:grid-cols-2">
-                <!-- Token Limits -->
-                <div>
-                  <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Token Limits</h4>
-                  <div class="mt-2 space-y-1">
-                    @if (model.inputTokenLimit) {
-                      <div class="flex items-center gap-2">
-                        <span class="text-xs/5 text-gray-600 dark:text-gray-400">Input:</span>
-                        <span class="text-sm/6 font-medium text-gray-900 dark:text-white">{{ model.inputTokenLimit | number }}</span>
-                      </div>
-                    }
-                    @if (model.outputTokenLimit) {
-                      <div class="flex items-center gap-2">
-                        <span class="text-xs/5 text-gray-600 dark:text-gray-400">Output:</span>
-                        <span class="text-sm/6 font-medium text-gray-900 dark:text-white">{{ model.outputTokenLimit | number }}</span>
-                      </div>
-                    }
-                    @if (!model.inputTokenLimit && !model.outputTokenLimit) {
-                      <span class="text-sm/6 text-gray-500 dark:text-gray-400">Not specified</span>
+                <div class="min-w-0 flex-1">
+                  <div class="flex items-center gap-1.5">
+                    <span class="truncate text-sm/6 font-medium text-gray-900 dark:text-white">
+                      {{ model.displayName }}
+                    </span>
+                    @if (model.thinking) {
+                      <span class="shrink-0 rounded-2xl bg-purple-100 px-2 py-0.5 text-xs/5 font-medium text-purple-700 dark:bg-purple-900/50 dark:text-purple-300">
+                        Thinking
+                      </span>
                     }
                   </div>
+                  <p class="truncate font-mono text-xs/5 text-gray-500 dark:text-gray-400">
+                    {{ model.name }}
+                  </p>
                 </div>
 
-                <!-- Model Parameters -->
-                @if (model.temperature !== null || model.topP !== null || model.topK !== null) {
-                  <div>
-                    <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Parameters</h4>
-                    <div class="mt-2 space-y-1">
-                      @if (model.temperature !== null) {
-                        <div class="flex items-center gap-2">
-                          <span class="text-xs/5 text-gray-600 dark:text-gray-400">Temperature:</span>
-                          <span class="text-sm/6 font-medium text-gray-900 dark:text-white">{{ model.temperature }}</span>
-                        </div>
-                      }
-                      @if (model.topP !== null) {
-                        <div class="flex items-center gap-2">
-                          <span class="text-xs/5 text-gray-600 dark:text-gray-400">Top-P:</span>
-                          <span class="text-sm/6 font-medium text-gray-900 dark:text-white">{{ model.topP }}</span>
-                        </div>
-                      }
-                      @if (model.topK !== null) {
-                        <div class="flex items-center gap-2">
-                          <span class="text-xs/5 text-gray-600 dark:text-gray-400">Top-K:</span>
-                          <span class="text-sm/6 font-medium text-gray-900 dark:text-white">{{ model.topK }}</span>
-                        </div>
-                      }
-                    </div>
-                  </div>
-                }
-
-                <!-- Version -->
-                @if (model.version) {
-                  <div>
-                    <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Version</h4>
-                    <p class="mt-2 text-sm/6 text-gray-900 dark:text-white">{{ model.version }}</p>
-                  </div>
-                }
-              </div>
-
-              <!-- Model Features and Actions -->
-              <div class="mt-4 flex items-center justify-between gap-4 border-t border-gray-200 pt-4 dark:border-gray-700">
-                <div class="flex flex-wrap gap-4">
-                  <!-- Thinking/Reasoning Capability -->
-                  @if (model.thinking) {
-                    <div class="flex items-center gap-2">
-                      <svg class="size-5 text-purple-600 dark:text-purple-400" fill="currentColor" viewBox="0 0 20 20">
-                        <path d="M10 9a3 3 0 100-6 3 3 0 000 6zM6 8a2 2 0 11-4 0 2 2 0 014 0zM1.49 15.326a.78.78 0 01-.358-.442 3 3 0 014.308-3.516 6.484 6.484 0 00-1.905 3.959c-.023.222-.014.442.025.654a4.97 4.97 0 01-2.07-.655zM16.44 15.98a4.97 4.97 0 002.07-.654.78.78 0 00.357-.442 3 3 0 00-4.308-3.517 6.484 6.484 0 011.907 3.96 2.32 2.32 0 01-.026.654zM18 8a2 2 0 11-4 0 2 2 0 014 0zM5.304 16.19a.844.844 0 01-.277-.71 5 5 0 019.947 0 .843.843 0 01-.277.71A6.975 6.975 0 0110 18a6.974 6.974 0 01-4.696-1.81z" />
-                      </svg>
-                      <span class="text-sm/6 text-purple-700 dark:text-purple-300">Thinking model</span>
-                    </div>
-                  }
-                </div>
+                <span class="hidden shrink-0 rounded-2xl bg-gray-100 px-2.5 py-0.5 text-xs/5 font-medium text-gray-600 sm:inline-block dark:bg-gray-700 dark:text-gray-300">
+                  Google
+                </span>
 
-                <!-- Add Model Button or Added Status -->
                 @if (isModelAdded(model.name)) {
-                  <div class="inline-flex items-center gap-2 rounded-sm bg-green-100 px-3 py-1.5 text-sm/6 font-medium text-green-800 dark:bg-green-900/50 dark:text-green-300">
-                    <svg class="size-4" fill="currentColor" viewBox="0 0 20 20">
-                      <path fill-rule="evenodd" d="M10 18a8 8 0 100-16 8 8 0 000 16zm3.857-9.809a.75.75 0 00-1.214-.882l-3.483 4.79-1.88-1.88a.75.75 0 10-1.06 1.061l2.5 2.5a.75.75 0 001.137-.089l4-5.5z" clip-rule="evenodd" />
-                    </svg>
+                  <span class="inline-flex shrink-0 items-center gap-1.5 rounded-2xl bg-green-100 px-3 py-1.5 text-xs/5 font-medium text-green-800 dark:bg-green-900/50 dark:text-green-300">
+                    <ng-icon name="heroCheckCircleSolid" class="size-4" aria-hidden="true" />
                     Added
-                  </div>
+                  </span>
                 } @else {
                   <button
                     (click)="addModelFromGemini(model)"
-                    class="inline-flex items-center gap-2 rounded-sm bg-blue-600 px-3 py-1.5 text-sm/6 font-medium text-white hover:bg-blue-700 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:bg-blue-500 dark:hover:bg-blue-600"
+                    class="inline-flex shrink-0 items-center gap-1.5 rounded-2xl bg-blue-600 px-3 py-1.5 text-xs/5 font-medium text-white hover:bg-blue-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:bg-blue-500 dark:hover:bg-blue-600"
                   >
-                    <svg class="size-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2">
-                      <path stroke-linecap="round" stroke-linejoin="round" d="M12 4v16m8-8H4" />
-                    </svg>
-                    Add Model
+                    <ng-icon name="heroPlus" class="size-4" aria-hidden="true" />
+                    Add
                   </button>
                 }
               </div>
-            </div>
+
+              <!-- Expanded detail -->
+              @if (isExpanded(model.name)) {
+                <div
+                  [id]="'model-detail-' + model.name"
+                  class="border-t border-gray-100 bg-gray-50 px-4 py-3 sm:pl-14 dark:border-gray-700/60 dark:bg-gray-900/40"
+                >
+                  @if (model.description) {
+                    <p class="mb-3 text-sm/6 text-gray-700 dark:text-gray-300">{{ model.description }}</p>
+                  }
+                  <dl class="grid grid-cols-1 gap-x-8 gap-y-3 sm:grid-cols-3">
+                    <div>
+                      <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Token limits</dt>
+                      <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                        @if (model.inputTokenLimit || model.outputTokenLimit) {
+                          {{ (model.inputTokenLimit || 0) | number }} in
+                          ·
+                          {{ (model.outputTokenLimit || 0) | number }} out
+                        } @else {
+                          Not specified
+                        }
+                      </dd>
+                    </div>
+
+                    @if (model.temperature != null || model.topP != null || model.topK != null) {
+                      <div>
+                        <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Parameters</dt>
+                        <dd class="mt-0.5 space-y-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                          @if (model.temperature != null) {
+                            <div>Temperature: {{ model.temperature }}</div>
+                          }
+                          @if (model.topP != null) {
+                            <div>Top-P: {{ model.topP }}</div>
+                          }
+                          @if (model.topK != null) {
+                            <div>Top-K: {{ model.topK }}</div>
+                          }
+                        </dd>
+                      </div>
+                    }
+
+                    @if (model.version) {
+                      <div>
+                        <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Version</dt>
+                        <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">{{ model.version }}</dd>
+                      </div>
+                    }
+                  </dl>
+                </div>
+              }
+            </li>
           }
-        </div>
+        </ul>
       }
     }
   </div>
diff --git a/frontend/ai.client/src/app/admin/gemini-models/gemini-models.page.ts b/frontend/ai.client/src/app/admin/gemini-models/gemini-models.page.ts
index 2a9eeb50..caca6ae6 100644
--- a/frontend/ai.client/src/app/admin/gemini-models/gemini-models.page.ts
+++ b/frontend/ai.client/src/app/admin/gemini-models/gemini-models.page.ts
@@ -3,7 +3,13 @@ import { Router, RouterLink } from '@angular/router';
 import { FormsModule } from '@angular/forms';
 import { DecimalPipe } from '@angular/common';
 import { NgIcon, provideIcons } from '@ng-icons/core';
-import { heroArrowLeft } from '@ng-icons/heroicons/outline';
+import {
+  heroArrowLeft,
+  heroPlus,
+  heroMagnifyingGlass,
+  heroChevronDown,
+} from '@ng-icons/heroicons/outline';
+import { heroCheckCircleSolid } from '@ng-icons/heroicons/solid';
 import { GeminiModelsService } from './services/gemini-models.service';
 import { GeminiModelSummary } from './models/gemini-model.model';
 import { ManagedModelsService } from '../manage-models/services/managed-models.service';
@@ -12,7 +18,15 @@ import { ThinkingDotsComponent } from '../../components/thinking-dots.component'
 @Component({
   selector: 'app-gemini-models-page',
   imports: [FormsModule, ThinkingDotsComponent, DecimalPipe, RouterLink, NgIcon],
-  providers: [provideIcons({ heroArrowLeft })],
+  providers: [
+    provideIcons({
+      heroArrowLeft,
+      heroPlus,
+      heroMagnifyingGlass,
+      heroChevronDown,
+      heroCheckCircleSolid,
+    }),
+  ],
   templateUrl: './gemini-models.page.html',
   styleUrl: './gemini-models.page.css',
   changeDetection: ChangeDetectionStrategy.OnPush,
@@ -26,6 +40,9 @@ export class GeminiModelsPage {
   maxResultsFilter = signal<number | undefined>(undefined);
   searchQuery = signal<string>('');
 
+  // Row detail expansion state (set of model names currently expanded)
+  private expandedIds = signal<ReadonlySet<string>>(new Set());
+
   // Access the models resource from the service
   readonly modelsResource = this.geminiModelsService.modelsResource;
 
@@ -82,6 +99,22 @@ export class GeminiModelsPage {
     return !!(this.maxResultsFilter() || this.searchQuery());
   });
 
+  isExpanded(modelName: string): boolean {
+    return this.expandedIds().has(modelName);
+  }
+
+  toggleExpand(modelName: string): void {
+    this.expandedIds.update(current => {
+      const next = new Set(current);
+      if (next.has(modelName)) {
+        next.delete(modelName);
+      } else {
+        next.add(modelName);
+      }
+      return next;
+    });
+  }
+
   /**
    * Check if a model has already been added to the managed models list
    */
diff --git a/frontend/ai.client/src/app/admin/manage-models/manage-models.page.html b/frontend/ai.client/src/app/admin/manage-models/manage-models.page.html
index 49c5651c..b16b08bb 100644
--- a/frontend/ai.client/src/app/admin/manage-models/manage-models.page.html
+++ b/frontend/ai.client/src/app/admin/manage-models/manage-models.page.html
@@ -1,243 +1,244 @@
 <div class="min-h-dvh">
-  <div class="mx-auto max-w-7xl px-4 py-8 sm:px-6 lg:px-8">
-    <!-- Back Button -->
-    <a
-      routerLink="/admin"
-      class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
-    >
-      <ng-icon name="heroArrowLeft" class="size-4" />
-      Back to Admin
-    </a>
-
+  <div class="mx-auto max-w-5xl px-4 py-8 sm:px-6 lg:px-8">
     <!-- Page Header -->
-    <div class="mb-8 flex flex-col gap-4 sm:flex-row sm:items-center sm:justify-between">
+    <div class="mb-6 flex flex-col gap-4 sm:flex-row sm:items-end sm:justify-between">
       <div>
-        <h1 class="text-3xl/9 font-bold text-gray-900 dark:text-white">Manage Models</h1>
-        <p class="mt-1 text-gray-600 dark:text-gray-400">
-          View and manage AI models available to users.
+        <h1 class="text-2xl/8 font-bold text-gray-900 dark:text-white">Manage Models</h1>
+        <p class="mt-1 text-sm/6 text-gray-600 dark:text-gray-400">
+          Enable, configure, and remove the models available to users.
         </p>
       </div>
       <a
         routerLink="/admin/manage-models/new"
-        class="inline-flex items-center gap-2 rounded-sm bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:bg-blue-500 dark:hover:bg-blue-600"
+        class="inline-flex shrink-0 items-center gap-2 rounded-2xl bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:bg-blue-500 dark:hover:bg-blue-600"
       >
-        <svg class="size-5" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2">
-          <path stroke-linecap="round" stroke-linejoin="round" d="M12 4v16m8-8H4" />
-        </svg>
+        <ng-icon name="heroPlus" class="size-5" aria-hidden="true" />
         Add Model
       </a>
     </div>
 
-    <!-- Filters Section -->
-    <div class="mb-6 rounded-sm border border-gray-300 bg-white p-6 dark:border-gray-600 dark:bg-gray-800">
-      <h2 class="mb-4 text-lg/7 font-semibold text-gray-900 dark:text-white">Search & Filters</h2>
-
-      <div class="grid grid-cols-1 gap-4 md:grid-cols-3">
-        <!-- Search -->
-        <div>
-          <label for="search" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Search
-          </label>
-          <input
-            type="text"
-            id="search"
-            [ngModel]="searchQuery()"
-            (ngModelChange)="searchQuery.set($event)"
-            placeholder="Search by name, ID, or provider..."
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-500 dark:bg-gray-700 dark:text-white dark:placeholder:text-gray-400"
-          />
-        </div>
+    <!-- Toolbar: search + filters inline -->
+    <div class="mb-3 flex flex-col gap-2 sm:flex-row sm:items-center">
+      <div class="relative flex-1">
+        <ng-icon
+          name="heroMagnifyingGlass"
+          class="pointer-events-none absolute left-3 top-1/2 size-4 -translate-y-1/2 text-gray-400 dark:text-gray-500"
+          aria-hidden="true"
+        />
+        <label for="search" class="sr-only">Search models</label>
+        <input
+          type="text"
+          id="search"
+          [ngModel]="searchQuery()"
+          (ngModelChange)="searchQuery.set($event)"
+          placeholder="Search by name, ID, or provider…"
+          class="block w-full rounded-2xl border border-gray-300 bg-white py-2 pl-9 pr-3 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+        />
+      </div>
 
-        <!-- Provider Filter -->
-        <div>
-          <label for="provider" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Provider
-          </label>
-          <select
-            id="provider"
-            [ngModel]="providerFilter()"
-            (ngModelChange)="providerFilter.set($event)"
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-500 dark:bg-gray-700 dark:text-white"
-          >
-            <option value="">All Providers</option>
-            @for (provider of availableProviders(); track provider) {
-              <option [value]="provider">{{ provider }}</option>
-            }
-          </select>
-        </div>
+      <label for="provider" class="sr-only">Filter by provider</label>
+      <select
+        id="provider"
+        [ngModel]="providerFilter()"
+        (ngModelChange)="providerFilter.set($event)"
+        class="rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+      >
+        <option value="">All providers</option>
+        @for (provider of availableProviders(); track provider) {
+          <option [value]="provider">{{ provider }}</option>
+        }
+      </select>
 
-        <!-- Enabled Filter -->
-        <div>
-          <label for="enabled" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Status
-          </label>
-          <select
-            id="enabled"
-            [ngModel]="enabledFilter()"
-            (ngModelChange)="enabledFilter.set($event)"
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-500 dark:bg-gray-700 dark:text-white"
-          >
-            <option value="">All Statuses</option>
-            <option value="enabled">Enabled</option>
-            <option value="disabled">Disabled</option>
-          </select>
-        </div>
-      </div>
+      <label for="enabled" class="sr-only">Filter by status</label>
+      <select
+        id="enabled"
+        [ngModel]="enabledFilter()"
+        (ngModelChange)="enabledFilter.set($event)"
+        class="rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+      >
+        <option value="">All statuses</option>
+        <option value="enabled">Enabled</option>
+        <option value="disabled">Disabled</option>
+      </select>
 
-      <!-- Filter Actions -->
       @if (hasActiveFilters()) {
-        <div class="mt-4">
-          <button
-            (click)="resetFilters()"
-            class="rounded-sm border border-gray-300 bg-gray-100 px-4 py-2 text-sm/6 font-medium text-gray-700 hover:bg-gray-200 focus:outline-none focus:ring-2 focus:ring-gray-500 dark:border-gray-500 dark:bg-gray-700 dark:text-gray-300 dark:hover:bg-gray-600"
-          >
-            Reset Filters
-          </button>
-        </div>
+        <button
+          (click)="resetFilters()"
+          class="rounded-2xl px-3 py-2 text-sm/6 font-medium text-gray-600 hover:bg-gray-100 hover:text-gray-900 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-gray-500 dark:text-gray-400 dark:hover:bg-gray-800 dark:hover:text-white"
+        >
+          Reset
+        </button>
       }
     </div>
 
-    <!-- Results Header -->
-    <div class="mb-4 flex flex-col gap-3 sm:flex-row sm:items-center sm:justify-between">
-      <p class="text-sm/6 text-gray-600 dark:text-gray-400">
-        Showing {{ filteredModels().length }} model{{ filteredModels().length !== 1 ? 's' : '' }}
+    <!-- Count + catalog links -->
+    <div class="mb-3 flex flex-col gap-1 text-xs/5 sm:flex-row sm:items-center sm:justify-between">
+      <p class="text-gray-500 dark:text-gray-400">
+        {{ filteredModels().length }} model{{ filteredModels().length !== 1 ? 's' : '' }}
       </p>
-      <div class="flex flex-wrap gap-3 sm:gap-4">
-        <a
-          routerLink="/admin/bedrock/models"
-          class="text-sm/6 font-medium text-blue-600 hover:text-blue-700 dark:text-blue-400 dark:hover:text-blue-300"
-        >
-          Browse Bedrock Models →
-        </a>
-        <a
-          routerLink="/admin/gemini/models"
-          class="text-sm/6 font-medium text-blue-600 hover:text-blue-700 dark:text-blue-400 dark:hover:text-blue-300"
-        >
-          Browse Gemini Models →
-        </a>
-        <a
-          routerLink="/admin/openai/models"
-          class="text-sm/6 font-medium text-blue-600 hover:text-blue-700 dark:text-blue-400 dark:hover:text-blue-300"
-        >
-          Browse OpenAI Models →
-        </a>
+      <div class="flex flex-wrap items-center gap-x-4 gap-y-1">
+        <span class="text-gray-400 dark:text-gray-500">Add from catalog:</span>
+        <a routerLink="/admin/bedrock/models" class="font-medium text-blue-600 hover:text-blue-700 dark:text-blue-400 dark:hover:text-blue-300">Bedrock</a>
+        <a routerLink="/admin/gemini/models" class="font-medium text-blue-600 hover:text-blue-700 dark:text-blue-400 dark:hover:text-blue-300">Gemini</a>
+        <a routerLink="/admin/openai/models" class="font-medium text-blue-600 hover:text-blue-700 dark:text-blue-400 dark:hover:text-blue-300">OpenAI</a>
       </div>
     </div>
 
     <!-- Models List -->
     @if (filteredModels().length === 0) {
-      <div class="rounded-sm border border-gray-300 bg-white p-12 text-center dark:border-gray-600 dark:bg-gray-800">
-        <p class="text-base/7 text-gray-500 dark:text-gray-400">
+      <div class="rounded-2xl border border-dashed border-gray-300 bg-white p-12 text-center dark:border-gray-700 dark:bg-gray-800">
+        <p class="text-sm/6 text-gray-500 dark:text-gray-400">
           No models found matching the current filters.
         </p>
       </div>
     } @else {
-      <div class="space-y-4">
+      <ul class="divide-y divide-gray-200 overflow-hidden rounded-2xl border border-gray-200 bg-white dark:divide-gray-700 dark:border-gray-700 dark:bg-gray-800">
         @for (model of filteredModels(); track model.id) {
-          <div class="rounded-sm border border-gray-300 bg-white p-6 dark:border-gray-600 dark:bg-gray-800">
-            <!-- Model Header -->
-            <div class="mb-4 flex items-start justify-between">
-              <div class="flex-1">
-                <div class="flex items-center gap-3">
-                  <h3 class="text-lg/7 font-semibold text-gray-900 dark:text-white">
+          <li>
+            <!-- Row -->
+            <div class="flex items-center gap-3 px-3 py-2.5 sm:px-4">
+              <!-- Expand toggle -->
+              <button
+                type="button"
+                (click)="toggleExpand(model.id)"
+                [attr.aria-expanded]="isExpanded(model.id)"
+                [attr.aria-controls]="'model-detail-' + model.id"
+                [attr.aria-label]="(isExpanded(model.id) ? 'Hide' : 'Show') + ' details for ' + model.modelName"
+                class="flex size-7 shrink-0 items-center justify-center rounded-2xl text-gray-400 hover:bg-gray-100 hover:text-gray-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:text-gray-500 dark:hover:bg-gray-700 dark:hover:text-gray-200"
+              >
+                <ng-icon
+                  name="heroChevronDown"
+                  class="size-4 transition-transform duration-150"
+                  [class.rotate-180]="isExpanded(model.id)"
+                  aria-hidden="true"
+                />
+              </button>
+
+              <!-- Name + model id -->
+              <div class="min-w-0 flex-1">
+                <div class="flex items-center gap-1.5">
+                  <span class="truncate text-sm/6 font-medium text-gray-900 dark:text-white">
                     {{ model.modelName }}
-                  </h3>
-                  @if (model.enabled) {
-                    <span class="inline-flex items-center rounded-sm bg-green-100 px-2 py-1 text-xs/5 font-medium text-green-800 dark:bg-green-900/50 dark:text-green-300">
-                      Enabled
-                    </span>
-                  } @else {
-                    <span class="inline-flex items-center rounded-sm bg-red-100 px-2 py-1 text-xs/5 font-medium text-red-800 dark:bg-red-900/50 dark:text-red-300">
-                      Disabled
-                    </span>
+                  </span>
+                  @if (model.isDefault) {
+                    <ng-icon
+                      name="heroStarSolid"
+                      class="size-4 shrink-0 text-amber-500 dark:text-amber-400"
+                      role="img" aria-label="Default model"
+                    />
                   }
                 </div>
-                <p class="mt-1 font-mono text-sm/6 text-gray-600 dark:text-gray-400">
+                <p class="truncate font-mono text-xs/5 text-gray-500 dark:text-gray-400">
                   {{ model.modelId }}
                 </p>
               </div>
-              <div class="flex shrink-0 gap-2">
-                <span class="inline-flex items-center rounded-sm bg-blue-100 px-3 py-1 text-xs/5 font-medium text-blue-800 dark:bg-blue-900/50 dark:text-blue-300">
-                  {{ model.provider }}
-                </span>
-                <span class="inline-flex items-center rounded-sm bg-gray-100 px-3 py-1 text-xs/5 font-medium text-gray-800 dark:bg-gray-700 dark:text-gray-300">
-                  {{ model.providerName }}
-                </span>
-              </div>
-            </div>
 
-            <!-- Model Details Grid -->
-            <div class="grid grid-cols-1 gap-4 md:grid-cols-2 lg:grid-cols-3">
-              <!-- Allowed AppRoles -->
-              <div>
-                <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Allowed Roles</h4>
-                <div class="mt-2 flex flex-wrap gap-2">
-                  @if (model.allowedAppRoles && model.allowedAppRoles.length > 0) {
-                    @for (roleId of model.allowedAppRoles; track roleId) {
-                      <span
-                        class="inline-flex items-center rounded-sm bg-purple-100 px-2 py-1 text-xs/5 text-purple-700 dark:bg-purple-900/50 dark:text-purple-300"
-                        [title]="roleId"
-                      >
-                        {{ getRoleDisplayName(roleId) }}
-                      </span>
-                    }
-                  } @else {
-                    <span class="text-xs/5 text-gray-500 dark:text-gray-400 italic">No roles assigned</span>
-                  }
-                </div>
-              </div>
+              <!-- Provider -->
+              <span class="hidden shrink-0 rounded-2xl bg-gray-100 px-2.5 py-0.5 text-xs/5 font-medium text-gray-600 sm:inline-block dark:bg-gray-700 dark:text-gray-300">
+                {{ model.providerName }}
+              </span>
 
-              <!-- Pricing -->
-              <div>
-                <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Pricing (per 1M tokens)</h4>
-                <div class="mt-2 space-y-1">
-                  <p class="text-sm/6 text-gray-900 dark:text-white">
-                    Input: <span class="font-semibold">${{ model.inputPricePerMillionTokens.toFixed(2) }}</span>
-                  </p>
-                  <p class="text-sm/6 text-gray-900 dark:text-white">
-                    Output: <span class="font-semibold">${{ model.outputPricePerMillionTokens.toFixed(2) }}</span>
-                  </p>
-                </div>
+              <!-- Enable toggle -->
+              <div class="flex shrink-0 items-center gap-2">
+                <span
+                  class="hidden w-14 text-right text-xs/5 font-medium sm:inline-block"
+                  [class]="model.enabled ? 'text-green-700 dark:text-green-400' : 'text-gray-500 dark:text-gray-400'"
+                >
+                  {{ model.enabled ? 'Enabled' : 'Disabled' }}
+                </span>
+                <button
+                  type="button"
+                  role="switch"
+                  [attr.aria-checked]="model.enabled"
+                  [attr.aria-label]="(model.enabled ? 'Disable' : 'Enable') + ' ' + model.modelName"
+                  [disabled]="isToggling(model.id)"
+                  (click)="toggleEnabled(model)"
+                  class="relative inline-flex h-6 w-11 shrink-0 items-center rounded-2xl transition-colors focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 disabled:cursor-not-allowed disabled:opacity-50"
+                  [class]="model.enabled ? 'bg-green-600 dark:bg-green-500' : 'bg-gray-300 dark:bg-gray-600'"
+                >
+                  <span
+                    class="inline-block size-5 transform rounded-full bg-white shadow transition-transform duration-150"
+                    [class]="model.enabled ? 'translate-x-5' : 'translate-x-0.5'"
+                  ></span>
+                </button>
               </div>
 
-              <!-- Modalities -->
-              <div>
-                <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Modalities</h4>
-                <div class="mt-2 space-y-1">
-                  <p class="text-sm/6 text-gray-700 dark:text-gray-300">
-                    <span class="font-medium">Input:</span> {{ model.inputModalities.join(', ') }}
-                  </p>
-                  <p class="text-sm/6 text-gray-700 dark:text-gray-300">
-                    <span class="font-medium">Output:</span> {{ model.outputModalities.join(', ') }}
-                  </p>
-                </div>
+              <!-- Actions -->
+              <div class="flex shrink-0 items-center gap-1">
+                <a
+                  [routerLink]="['/admin/manage-models/edit', model.id]"
+                  [attr.aria-label]="'Edit ' + model.modelName"
+                  [title]="'Edit ' + model.modelName"
+                  class="flex size-8 items-center justify-center rounded-2xl text-gray-400 hover:bg-gray-100 hover:text-gray-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:text-gray-500 dark:hover:bg-gray-700 dark:hover:text-gray-200"
+                >
+                  <ng-icon name="heroPencilSquare" class="size-4" aria-hidden="true" />
+                </a>
+                <button
+                  type="button"
+                  (click)="deleteModel(model.id)"
+                  [attr.aria-label]="'Delete ' + model.modelName"
+                  [title]="'Delete ' + model.modelName"
+                  class="flex size-8 items-center justify-center rounded-2xl text-gray-400 hover:bg-red-50 hover:text-red-600 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-red-500 dark:text-gray-500 dark:hover:bg-red-900/20 dark:hover:text-red-400"
+                >
+                  <ng-icon name="heroTrash" class="size-4" aria-hidden="true" />
+                </button>
               </div>
             </div>
 
-            <!-- Actions -->
-            <div class="mt-4 flex gap-3 border-t border-gray-200 pt-4 dark:border-gray-600">
-              <a
-                [routerLink]="['/admin/manage-models/edit', model.id]"
-                class="inline-flex items-center gap-2 rounded-sm border border-gray-300 bg-white px-3 py-1.5 text-sm/6 font-medium text-gray-700 hover:bg-gray-100 focus:outline-none focus:ring-2 focus:ring-gray-500 dark:border-gray-500 dark:bg-gray-700 dark:text-gray-300 dark:hover:bg-gray-600"
+            <!-- Expanded detail -->
+            @if (isExpanded(model.id)) {
+              <div
+                [id]="'model-detail-' + model.id"
+                class="border-t border-gray-100 bg-gray-50 px-4 py-3 sm:pl-14 dark:border-gray-700/60 dark:bg-gray-900/40"
               >
-                <svg class="size-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2">
-                  <path stroke-linecap="round" stroke-linejoin="round" d="M11 5H6a2 2 0 00-2 2v11a2 2 0 002 2h11a2 2 0 002-2v-5m-1.414-9.414a2 2 0 112.828 2.828L11.828 15H9v-2.828l8.586-8.586z" />
-                </svg>
-                Edit
-              </a>
-              <button
-                (click)="deleteModel(model.id)"
-                class="inline-flex items-center gap-2 rounded-sm border border-red-300 bg-white px-3 py-1.5 text-sm/6 font-medium text-red-700 hover:bg-red-50 focus:outline-none focus:ring-2 focus:ring-red-500 dark:border-red-500 dark:bg-gray-700 dark:text-red-400 dark:hover:bg-red-900/20"
-              >
-                <svg class="size-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2">
-                  <path stroke-linecap="round" stroke-linejoin="round" d="M19 7l-.867 12.142A2 2 0 0116.138 21H7.862a2 2 0 01-1.995-1.858L5 7m5 4v6m4-6v6m1-10V4a1 1 0 00-1-1h-4a1 1 0 00-1 1v3M4 7h16" />
-                </svg>
-                Delete
-              </button>
-            </div>
-          </div>
+                <dl class="grid grid-cols-1 gap-x-8 gap-y-3 sm:grid-cols-3">
+                  <div>
+                    <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">
+                      Pricing / 1M tokens
+                    </dt>
+                    <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                      <span class="font-semibold text-gray-900 dark:text-white">${{ model.inputPricePerMillionTokens.toFixed(2) }}</span> in
+                      ·
+                      <span class="font-semibold text-gray-900 dark:text-white">${{ model.outputPricePerMillionTokens.toFixed(2) }}</span> out
+                    </dd>
+                  </div>
+
+                  <div>
+                    <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">
+                      Modalities
+                    </dt>
+                    <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                      {{ model.inputModalities.join(', ') }}
+                      <span class="text-gray-400 dark:text-gray-500">→</span>
+                      {{ model.outputModalities.join(', ') }}
+                    </dd>
+                  </div>
+
+                  <div>
+                    <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">
+                      Allowed roles
+                    </dt>
+                    <dd class="mt-1 flex flex-wrap gap-1.5">
+                      @if (model.allowedAppRoles && model.allowedAppRoles.length > 0) {
+                        @for (roleId of model.allowedAppRoles; track roleId) {
+                          <span
+                            class="inline-flex items-center rounded-2xl bg-purple-100 px-2 py-0.5 text-xs/5 text-purple-700 dark:bg-purple-900/50 dark:text-purple-300"
+                            [title]="roleId"
+                          >
+                            {{ getRoleDisplayName(roleId) }}
+                          </span>
+                        }
+                      } @else {
+                        <span class="text-xs/5 italic text-gray-500 dark:text-gray-400">No roles assigned</span>
+                      }
+                    </dd>
+                  </div>
+                </dl>
+              </div>
+            }
+          </li>
         }
-      </div>
+      </ul>
     }
   </div>
 </div>
diff --git a/frontend/ai.client/src/app/admin/manage-models/manage-models.page.ts b/frontend/ai.client/src/app/admin/manage-models/manage-models.page.ts
index e67191d6..e46ca05d 100644
--- a/frontend/ai.client/src/app/admin/manage-models/manage-models.page.ts
+++ b/frontend/ai.client/src/app/admin/manage-models/manage-models.page.ts
@@ -2,14 +2,31 @@ import { Component, ChangeDetectionStrategy, signal, computed, inject } from '@a
 import { RouterLink } from '@angular/router';
 import { FormsModule } from '@angular/forms';
 import { NgIcon, provideIcons } from '@ng-icons/core';
-import { heroArrowLeft } from '@ng-icons/heroicons/outline';
+import {
+  heroPlus,
+  heroMagnifyingGlass,
+  heroChevronDown,
+  heroPencilSquare,
+  heroTrash,
+} from '@ng-icons/heroicons/outline';
+import { heroStarSolid } from '@ng-icons/heroicons/solid';
 import { ManagedModelsService } from './services/managed-models.service';
 import { AppRolesService } from '../roles/services/app-roles.service';
+import type { ManagedModel } from './models/managed-model.model';
 
 @Component({
   selector: 'app-manage-models-page',
   imports: [RouterLink, FormsModule, NgIcon],
-  providers: [provideIcons({ heroArrowLeft })],
+  providers: [
+    provideIcons({
+      heroPlus,
+      heroMagnifyingGlass,
+      heroChevronDown,
+      heroPencilSquare,
+      heroTrash,
+      heroStarSolid,
+    }),
+  ],
   templateUrl: './manage-models.page.html',
   styleUrl: './manage-models.page.css',
   changeDetection: ChangeDetectionStrategy.OnPush,
@@ -23,12 +40,17 @@ export class ManageModelsPage {
   providerFilter = signal<string>('');
   enabledFilter = signal<string>('');
 
-  // Get models from service
-  private mockModels = computed(() => this.managedModelsService.getManagedModels());
+  // Row detail expansion state (set of model ids currently expanded)
+  private expandedIds = signal<ReadonlySet<string>>(new Set());
+
+  // Models with an in-flight enable/disable request
+  private togglingIds = signal<ReadonlySet<string>>(new Set());
+
+  private allModels = computed(() => this.managedModelsService.getManagedModels());
 
   // Filtered models based on search and filters
   readonly filteredModels = computed(() => {
-    let models = this.mockModels();
+    let models = this.allModels();
     const query = this.searchQuery().toLowerCase();
     const provider = this.providerFilter();
     const enabled = this.enabledFilter();
@@ -56,7 +78,7 @@ export class ManageModelsPage {
 
   // Available providers for filter dropdown
   readonly availableProviders = computed(() => {
-    const providers = new Set(this.mockModels().map(m => m.providerName));
+    const providers = new Set(this.allModels().map(m => m.providerName));
     return Array.from(providers).sort();
   });
 
@@ -74,6 +96,48 @@ export class ManageModelsPage {
     this.enabledFilter.set('');
   }
 
+  isExpanded(modelId: string): boolean {
+    return this.expandedIds().has(modelId);
+  }
+
+  toggleExpand(modelId: string): void {
+    this.expandedIds.update(current => {
+      const next = new Set(current);
+      if (next.has(modelId)) {
+        next.delete(modelId);
+      } else {
+        next.add(modelId);
+      }
+      return next;
+    });
+  }
+
+  isToggling(modelId: string): boolean {
+    return this.togglingIds().has(modelId);
+  }
+
+  /**
+   * Flip a model's enabled state in place via a partial update.
+   */
+  async toggleEnabled(model: ManagedModel): Promise<void> {
+    if (this.isToggling(model.id)) {
+      return;
+    }
+    this.togglingIds.update(current => new Set(current).add(model.id));
+    try {
+      await this.managedModelsService.updateModel(model.id, { enabled: !model.enabled });
+    } catch (error) {
+      console.error('Error updating model status:', error);
+      alert('Failed to update model status. Please try again.');
+    } finally {
+      this.togglingIds.update(current => {
+        const next = new Set(current);
+        next.delete(model.id);
+        return next;
+      });
+    }
+  }
+
   /**
    * Delete a model
    */
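
Reviewer note on the `expandedIds` / `togglingIds` signals above: both store a `ReadonlySet` and always hand `update()` a fresh `Set`, so consumers see a new reference on every change. The copy-then-toggle step can be sketched as a pure function. `toggleInSet` is a hypothetical helper name used only for illustration; it is not part of this change.

```typescript
// Hypothetical helper (not in the PR): returns a NEW set with `id` toggled.
// The input set is never mutated, which mirrors how toggleExpand() copies
// the current value before changing it.
function toggleInSet<T>(current: ReadonlySet<T>, id: T): ReadonlySet<T> {
  const next = new Set(current);
  // Set.prototype.delete returns true when the element was present.
  if (!next.delete(id)) {
    next.add(id);
  }
  return next;
}

const before: ReadonlySet<string> = new Set(['a']);
const afterAdd = toggleInSet(before, 'b');      // contains 'a' and 'b'
const afterRemove = toggleInSet(afterAdd, 'a'); // contains only 'b'
```

With a helper like this, `toggleExpand` could plausibly reduce to `this.expandedIds.update(s => toggleInSet(s, modelId))`, assuming the team wants the extra indirection.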
diff --git a/frontend/ai.client/src/app/admin/manage-models/model-form.page.html b/frontend/ai.client/src/app/admin/manage-models/model-form.page.html
index 88748bcb..f5ec0a4c 100644
--- a/frontend/ai.client/src/app/admin/manage-models/model-form.page.html
+++ b/frontend/ai.client/src/app/admin/manage-models/model-form.page.html
@@ -552,6 +552,37 @@ <h2 class="text-xl/8 font-semibold text-gray-900 dark:text-white">Inference Para
                         Default to enabled
                       </label>
                     </div>
+                  } @else if (meta.kind === 'select') {
+                    <div class="md:col-span-2">
+                      <span class="block text-xs/5 font-medium text-gray-600 dark:text-gray-400">
+                        Levels this model supports
+                      </span>
+                      <div class="mt-1 flex flex-wrap gap-x-4 gap-y-1">
+                        @for (lvl of meta.options ?? []; track lvl) {
+                          <label class="flex items-center gap-2 text-xs/5 text-gray-700 dark:text-gray-300">
+                            <input
+                              type="checkbox"
+                              [checked]="isParamAllowed(i, lvl)"
+                              (change)="toggleParamAllowed(i, lvl)"
+                              class="size-4 rounded-xs border-gray-300 text-blue-600 focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700"
+                            />
+                            {{ lvl }}
+                          </label>
+                        }
+                      </div>
+                    </div>
+                    <div>
+                      <label class="block text-xs/5 font-medium text-gray-600 dark:text-gray-400">Default</label>
+                      <select
+                        formControlName="defaultValue"
+                        class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-2 py-1 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-white"
+                      >
+                        <option [ngValue]="null">— none (API default) —</option>
+                        @for (lvl of paramRowGroup(i).controls.allowed.value ?? []; track lvl) {
+                          <option [ngValue]="lvl">{{ lvl }}</option>
+                        }
+                      </select>
+                    </div>
                   }
                   <div>
                     <label class="flex items-center gap-2 pt-5 text-xs/5 font-medium text-gray-600 dark:text-gray-400">
diff --git a/frontend/ai.client/src/app/admin/manage-models/model-form.page.ts b/frontend/ai.client/src/app/admin/manage-models/model-form.page.ts
index a5817cb4..1d0acbdb 100644
--- a/frontend/ai.client/src/app/admin/manage-models/model-form.page.ts
+++ b/frontend/ai.client/src/app/admin/manage-models/model-form.page.ts
@@ -37,6 +37,12 @@ interface ParamRowGroup {
   supported: FormControl<boolean>;
   min: FormControl<number | null>;
   max: FormControl<number | null>;
+  /**
+   * Selectable subset for `kind: 'select'` params (e.g. `effort`). `null`
+   * on numeric/toggle rows. The per-model effort tier difference lives
+   * here as data, mirroring `ModelParamSpec.allowed` on the backend.
+   */
+  allowed: FormControl<(string | number)[] | null>;
   defaultValue: FormControl<number | boolean | string | null>;
   locked: FormControl<boolean>;
 }
@@ -76,6 +82,17 @@ function paramRowBoundsValidator(group: AbstractControl): ValidationErrors | nul
     if (typeof min === 'number' && def < min) errors['defaultBelowMin'] = true;
     if (typeof max === 'number' && def > max) errors['defaultAboveMax'] = true;
   }
+  // Enum rows (`kind: 'select'`, e.g. effort) carry an `allowed` array
+  // instead of min/max. The model must support at least one level, and the
+  // default has to be one of them. Mirrors `ModelParamSpec._check_bounds`.
+  const allowed = group.get('allowed')?.value;
+  if (Array.isArray(allowed)) {
+    if (allowed.length === 0) {
+      errors['allowedEmpty'] = true;
+    } else if (def !== null && def !== undefined && def !== '' && !allowed.includes(def)) {
+      errors['defaultNotAllowed'] = true;
+    }
+  }
   return Object.keys(errors).length > 0 ? errors : null;
 }
 
@@ -130,6 +147,58 @@ function thinkingInvariantsValidator(array: AbstractControl): ValidationErrors |
   return null;
 }
 
+/**
+ * FormArray-level validator pinning the `max_tokens` row to the model's
+ * declared output ceiling. The model-level `maxOutputTokens` control is a
+ * sibling of this FormArray (reached via `array.parent`) — neither the
+ * `max` bound nor the `default` the runtime sends may exceed what the
+ * model can physically produce.
+ *
+ * Errors land on the `max_tokens` row so the inline markup surfaces them
+ * next to the per-row bounds errors. Mirrored on the backend by
+ * `_max_tokens_within_ceiling` on `ManagedModelCreate`/`ManagedModelUpdate`.
+ */
+function maxTokensCeilingValidator(array: AbstractControl): ValidationErrors | null {
+  if (!(array instanceof FormArray)) return null;
+  let maxTokensRow: FormGroup | undefined;
+  for (const row of array.controls as FormGroup[]) {
+    if (row.get('key')?.value === 'max_tokens') {
+      maxTokensRow = row;
+      break;
+    }
+  }
+  if (!maxTokensRow) return null;
+
+  // Recompute only the two ceiling keys each pass, preserving the per-row
+  // bounds errors paramRowBoundsValidator sets independently.
+  const rewrite = (extra: Record<string, true>): void => {
+    const existing = { ...(maxTokensRow!.errors ?? {}) };
+    delete existing['maxTokensMaxAboveCeiling'];
+    delete existing['maxTokensDefaultAboveCeiling'];
+    const merged = { ...existing, ...extra };
+    maxTokensRow!.setErrors(Object.keys(merged).length > 0 ? merged : null);
+  };
+
+  if (!maxTokensRow.get('supported')?.value) {
+    rewrite({});
+    return null;
+  }
+
+  const ceiling = array.parent?.get('maxOutputTokens')?.value;
+  if (typeof ceiling !== 'number' || !Number.isFinite(ceiling) || ceiling < 1) {
+    rewrite({});
+    return null;
+  }
+
+  const errors: Record<string, true> = {};
+  const max = maxTokensRow.get('max')?.value;
+  const def = maxTokensRow.get('defaultValue')?.value;
+  if (typeof max === 'number' && max > ceiling) errors['maxTokensMaxAboveCeiling'] = true;
+  if (typeof def === 'number' && def > ceiling) errors['maxTokensDefaultAboveCeiling'] = true;
+  rewrite(errors);
+  return null;
+}
+
 /**
  * Helper used as a key on each known-param row so the FormArray validator
  * can find the `thinking` and `max_tokens` rows. Custom rows already store
@@ -217,7 +286,7 @@ export class ModelFormPage implements OnInit {
     knowledgeCutoffDate: this.fb.control<string | null>(null),
     supportsCaching: this.fb.control(false, { nonNullable: true }),
     inferenceParams: this.fb.array<FormGroup<ParamRowGroup>>([], {
-      validators: [thinkingInvariantsValidator],
+      validators: [thinkingInvariantsValidator, maxTokensCeilingValidator],
     }),
     customInferenceParams: this.fb.array<FormGroup<CustomParamRowGroup>>([]),
   });
@@ -307,6 +376,12 @@ export class ModelFormPage implements OnInit {
     this.modelForm.controls.provider.valueChanges.subscribe(provider => {
       this.rebuildInferenceParamRows(provider);
     });
+
+    // Keep the max_tokens row pinned to the model's output ceiling: pre-fill
+    // it on a fresh model and re-check the cap whenever the ceiling changes.
+    this.modelForm.controls.maxOutputTokens.valueChanges.subscribe(() => {
+      this.syncMaxTokensCeiling();
+    });
   }
 
   /**
@@ -354,6 +429,7 @@ export class ModelFormPage implements OnInit {
           supported: fromExisting.supported,
           min: fromExisting.min ?? row.controls.min.value,
           max: fromExisting.max ?? row.controls.max.value,
+          allowed: fromExisting.allowed ?? row.controls.allowed.value,
           defaultValue: fromExisting.default ?? null,
           locked: fromExisting.locked,
         });
@@ -397,6 +473,7 @@ export class ModelFormPage implements OnInit {
           supported: spec.supported,
           min: spec.min ?? row.controls.min.value,
           max: spec.max ?? row.controls.max.value,
+          allowed: spec.allowed ?? row.controls.allowed.value,
           defaultValue: spec.default ?? null,
           locked: spec.locked,
         });
@@ -422,6 +499,37 @@ export class ModelFormPage implements OnInit {
         }
       });
     }
+
+    this.syncMaxTokensCeiling();
+  }
+
+  /**
+   * Pin the `max_tokens` inference-param row to the model's declared output
+   * ceiling. Pre-fills the row's Max and Default from `maxOutputTokens` so a
+   * fresh model defaults to "request the full ceiling" — but only while
+   * those fields are untouched and weren't loaded from a persisted record,
+   * so deliberate admin edits and saved specs win. Always re-validates the
+   * array so the ceiling cap re-checks when only the model-level field
+   * changed (a sibling value change doesn't re-run the array validator on
+   * its own).
+   */
+  private syncMaxTokensCeiling(): void {
+    const idx = this.inferenceParamRows().findIndex(m => m.key === 'max_tokens');
+    if (idx < 0) return;
+    const row = this.paramRowGroup(idx);
+    const ceiling = this.modelForm.controls.maxOutputTokens.value;
+    const loaded = this.loadedKnownKeys.has('max_tokens');
+
+    if (!loaded && typeof ceiling === 'number' && Number.isFinite(ceiling) && ceiling >= 1) {
+      if (row.controls.max.pristine) {
+        row.controls.max.setValue(ceiling, { emitEvent: false });
+      }
+      if (row.controls.defaultValue.pristine) {
+        row.controls.defaultValue.setValue(ceiling, { emitEvent: false });
+      }
+    }
+
+    this.modelForm.controls.inferenceParams.updateValueAndValidity();
   }
 
   private buildCustomParamRow(key: string, seed: ModelParamSpec | null): FormGroup<CustomParamRowGroup> {
@@ -431,6 +539,9 @@ export class ModelFormPage implements OnInit {
         supported: this.fb.control(seed?.supported ?? true, { nonNullable: true }),
         min: this.fb.control<number | null>(seed?.min ?? null),
         max: this.fb.control<number | null>(seed?.max ?? null),
+        // Custom rows have no catalog kind, so they're never enum-select;
+        // round-trip a persisted `allowed` if one was stored, else null.
+        allowed: this.fb.control<(string | number)[] | null>(seed?.allowed ?? null),
         defaultValue: this.fb.control<number | boolean | string | null>(seed?.default ?? null),
         locked: this.fb.control(seed?.locked ?? false, { nonNullable: true }),
       },
@@ -486,6 +597,11 @@ export class ModelFormPage implements OnInit {
     const providerBounds = meta.defaults?.[provider];
     const seedMin = seed?.min ?? providerBounds?.min ?? meta.defaultMin ?? null;
     const seedMax = seed?.max ?? providerBounds?.max ?? meta.defaultMax ?? null;
+    // Enum-select rows (e.g. effort) carry an `allowed` subset instead of
+    // min/max. Empty array on a fresh row marks it as "select kind" for the
+    // validator/template and forces the admin to opt into levels explicitly.
+    const seedAllowed: (string | number)[] | null =
+      meta.kind === 'select' ? (seed?.allowed ?? []) : null;
     return this.fb.group<ParamRowGroup>(
       {
         // Catalog key is fixed for known rows — no validators, just a read-only
@@ -494,6 +610,7 @@ export class ModelFormPage implements OnInit {
         supported: this.fb.control(seed?.supported ?? false, { nonNullable: true }),
         min: this.fb.control<number | null>(seedMin),
         max: this.fb.control<number | null>(seedMax),
+        allowed: this.fb.control<(string | number)[] | null>(seedAllowed),
         defaultValue: this.fb.control<number | boolean | string | null>(seed?.default ?? null),
         locked: this.fb.control(seed?.locked ?? false, { nonNullable: true }),
       },
@@ -507,6 +624,7 @@ export class ModelFormPage implements OnInit {
       supported: v.supported,
       min: v.min,
       max: v.max,
+      allowed: v.allowed,
       default: v.defaultValue,
       locked: v.locked,
     };
@@ -580,9 +698,51 @@ export class ModelFormPage implements OnInit {
     if (row.errors['thinkingBudgetNotNumeric']) {
       out.push('Thinking budget must be a number — clear the value to disable, or enter an integer ≥ 1024.');
     }
+    if (row.errors['maxTokensMaxAboveCeiling'] || row.errors['maxTokensDefaultAboveCeiling']) {
+      const ceiling = this.modelForm.controls.maxOutputTokens.value;
+      if (row.errors['maxTokensMaxAboveCeiling']) {
+        out.push(`Max must be ≤ the model's Max Output Tokens (${ceiling}).`);
+      }
+      if (row.errors['maxTokensDefaultAboveCeiling']) {
+        out.push(`Default must be ≤ the model's Max Output Tokens (${ceiling}).`);
+      }
+    }
+    if (row.errors['allowedEmpty']) {
+      out.push('Select at least one level this model supports.');
+    }
+    if (row.errors['defaultNotAllowed']) {
+      out.push('Default must be one of the selected levels.');
+    }
     return out;
   }
 
+  /**
+   * Whether `value` is in the enum-select row's `allowed` subset. Backs the
+   * per-level checkboxes for `kind: 'select'` params (e.g. effort).
+   */
+  isParamAllowed(index: number, value: string): boolean {
+    return (this.paramRowGroup(index).controls.allowed.value ?? []).includes(value);
+  }
+
+  /**
+   * Toggle a level in the enum-select row's `allowed` subset. Clears the
+   * row default if the level backing it was just removed so the
+   * default-in-allowed invariant can't be left stale.
+   */
+  toggleParamAllowed(index: number, value: string): void {
+    const row = this.paramRowGroup(index);
+    const current = row.controls.allowed.value ?? [];
+    const next = current.includes(value)
+      ? current.filter(v => v !== value)
+      : [...current, value];
+    row.controls.allowed.setValue(next);
+    row.controls.allowed.markAsDirty();
+    if (row.controls.defaultValue.value != null && !next.includes(row.controls.defaultValue.value as string)) {
+      row.controls.defaultValue.setValue(null);
+    }
+    row.controls.allowed.updateValueAndValidity();
+  }
+
   /**
    * Load model data for editing
    */
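
Reviewer note on `maxTokensCeilingValidator` above: the `rewrite` closure recomputes only the two ceiling keys and leaves any per-row bounds errors in place. A minimal sketch of that merge rule as a pure function follows, under the assumption that row errors are a flat `Record<string, true>`. The names `mergeCeilingErrors` and `CEILING_KEYS` are hypothetical and exist only in this sketch.

```typescript
type RowErrors = Record<string, true>;

// The two keys owned by the ceiling validator (taken from the diff above).
const CEILING_KEYS = ['maxTokensMaxAboveCeiling', 'maxTokensDefaultAboveCeiling'] as const;

// Hypothetical pure version of the `rewrite` closure: drop stale ceiling
// keys, keep everything else, then layer the freshly computed keys on top.
function mergeCeilingErrors(existing: RowErrors | null, extra: RowErrors): RowErrors | null {
  const kept: RowErrors = { ...(existing ?? {}) };
  for (const key of CEILING_KEYS) {
    delete kept[key];
  }
  const merged: RowErrors = { ...kept, ...extra };
  // Angular treats `null` as "no errors", so an empty object collapses to null.
  return Object.keys(merged).length > 0 ? merged : null;
}

// A stale ceiling error is cleared while an unrelated bounds error survives.
const cleared = mergeCeilingErrors(
  { defaultAboveMax: true, maxTokensMaxAboveCeiling: true },
  {},
);
// A row with only a stale ceiling error becomes valid again.
const valid = mergeCeilingErrors({ maxTokensMaxAboveCeiling: true }, {});
// A fresh ceiling violation is added to a previously clean row.
const flagged = mergeCeilingErrors(null, { maxTokensDefaultAboveCeiling: true });
```

Keeping the merge separate from `setErrors` would make this rule unit-testable without building a `FormArray`; whether that is worth the extra function is a judgment call for the authors.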
diff --git a/frontend/ai.client/src/app/admin/manage-models/models/managed-model.model.ts b/frontend/ai.client/src/app/admin/manage-models/models/managed-model.model.ts
index 076a589d..704ff466 100644
--- a/frontend/ai.client/src/app/admin/manage-models/models/managed-model.model.ts
+++ b/frontend/ai.client/src/app/admin/manage-models/models/managed-model.model.ts
@@ -20,6 +20,13 @@ export interface ModelParamSpec {
   supported: boolean;
   min?: number | null;
   max?: number | null;
+  /**
+   * Permissible values for enum-style params (e.g. `effort`). When set,
+   * `default` and any user override must be a member; `min`/`max` don't
+   * apply. The per-model difference (Sonnet 4.6 vs Opus 4.7 effort tiers)
+   * lives here as data — no model-family branching in code.
+   */
+  allowed?: (string | number)[] | null;
   default?: number | boolean | string | null;
   locked?: boolean;
 }
@@ -165,7 +172,13 @@ export interface KnownParamMeta {
    * stored value is `null` (off) or an int budget (on). The runtime
    * translator wraps the int into the provider-native shape.
    */
-  kind: 'number' | 'integer' | 'toggle' | 'thinkingBudget';
+  kind: 'number' | 'integer' | 'toggle' | 'thinkingBudget' | 'select';
+  /**
+   * Universe of selectable values for `kind: 'select'`. The admin checks the
+   * subset this model supports (stored as `ModelParamSpec.allowed`); the
+   * default is chosen from that subset. Ordered low->high.
+   */
+  options?: string[];
   /** Catalog-wide fallback range, used when no provider-specific entry applies. */
   defaultMin?: number;
   defaultMax?: number;
@@ -236,6 +249,17 @@ export const KNOWN_PARAMS: KnownParamMeta[] = [
     providers: ['bedrock', 'gemini'],
     incompatibleWith: ['temperature', 'top_p', 'top_k'],
   },
+  {
+    key: 'effort',
+    label: 'Effort',
+    description:
+      'Reasoning/output effort (Anthropic output_config.effort). Higher = ' +
+      'more thorough, more tokens. On adaptive-thinking models it governs ' +
+      'thinking depth. Check the levels this model supports; pick a default.',
+    kind: 'select',
+    options: ['low', 'medium', 'high', 'xhigh', 'max'],
+    providers: ['bedrock'],
+  },
   {
     key: 'reasoning_effort',
     label: 'Reasoning Effort',
diff --git a/frontend/ai.client/src/app/admin/manage-user-menu-links/manage-user-menu-links.page.ts b/frontend/ai.client/src/app/admin/manage-user-menu-links/manage-user-menu-links.page.ts
new file mode 100644
index 00000000..768ff8d9
--- /dev/null
+++ b/frontend/ai.client/src/app/admin/manage-user-menu-links/manage-user-menu-links.page.ts
@@ -0,0 +1,148 @@
+import { ChangeDetectionStrategy, Component, computed, inject } from '@angular/core';
+import { RouterLink } from '@angular/router';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import {
+  heroArrowLeft,
+  heroPencil,
+  heroTrash,
+  heroPlus,
+  heroArrowTopRightOnSquare,
+  heroDocumentText,
+} from '@ng-icons/heroicons/outline';
+import { UserMenuLinksService } from './services/user-menu-links.service';
+import { UserMenuLink } from './models/user-menu-link.model';
+
+@Component({
+  selector: 'app-manage-user-menu-links-page',
+  imports: [RouterLink, NgIcon],
+  providers: [
+    provideIcons({
+      heroArrowLeft,
+      heroPencil,
+      heroTrash,
+      heroPlus,
+      heroArrowTopRightOnSquare,
+      heroDocumentText,
+    }),
+  ],
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  template: `
+    <div>
+      <div class="mb-8 flex flex-col gap-4 sm:flex-row sm:items-center sm:justify-between">
+          <div>
+            <h1 class="text-3xl/9 font-bold text-gray-900 dark:text-white">User Menu Links</h1>
+            <p class="mt-1 text-gray-600 dark:text-gray-400">
+              Manage links rendered in the user menu. Each link opens an external URL or an in-app modal with rich text.
+            </p>
+          </div>
+          <a
+            routerLink="/admin/manage-user-menu-links/new"
+            class="inline-flex items-center gap-2 rounded-sm bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:bg-blue-500 dark:hover:bg-blue-600"
+          >
+            <ng-icon name="heroPlus" class="size-5" />
+            New link
+          </a>
+        </div>
+
+        @if (loadError()) {
+          <div class="mb-4 rounded-sm border border-red-300 bg-red-50 p-4 text-sm/6 text-red-700 dark:border-red-700 dark:bg-red-900/20 dark:text-red-300">
+            Failed to load links. {{ loadError() }}
+          </div>
+        }
+
+        @if (links().length === 0 && !isLoading()) {
+          <div class="rounded-sm border border-gray-300 bg-white p-12 text-center dark:border-gray-600 dark:bg-gray-800">
+            <p class="text-base/7 text-gray-500 dark:text-gray-400">No user-menu links yet.</p>
+            <a
+              routerLink="/admin/manage-user-menu-links/new"
+              class="mt-4 inline-flex items-center gap-2 text-sm/6 font-medium text-blue-600 hover:text-blue-700 dark:text-blue-400 dark:hover:text-blue-300"
+            >
+              Add the first one →
+            </a>
+          </div>
+        } @else {
+          <div class="space-y-3">
+            @for (link of links(); track link.link_id) {
+              <div class="flex items-center justify-between gap-4 rounded-sm border border-gray-300 bg-white p-4 dark:border-gray-600 dark:bg-gray-800">
+                <div class="flex flex-1 items-center gap-3 min-w-0">
+                  <ng-icon
+                    [name]="link.kind === 'external' ? 'heroArrowTopRightOnSquare' : 'heroDocumentText'"
+                    class="size-5 shrink-0 text-gray-400 dark:text-gray-500"
+                  />
+                  <div class="min-w-0 flex-1">
+                    <div class="flex items-center gap-2">
+                      <span class="truncate text-sm/6 font-medium text-gray-900 dark:text-white">{{ link.label }}</span>
+                      @if (!link.enabled) {
+                        <span class="shrink-0 rounded-sm bg-gray-100 px-2 py-0.5 text-xs/5 font-medium text-gray-600 dark:bg-gray-700 dark:text-gray-300">Disabled</span>
+                      }
+                      <span class="shrink-0 rounded-sm bg-blue-100 px-2 py-0.5 text-xs/5 font-medium text-blue-700 dark:bg-blue-900/40 dark:text-blue-300">
+                        {{ link.kind === 'external' ? 'External' : 'Modal' }}
+                      </span>
+                    </div>
+                    @if (link.kind === 'external') {
+                      <p class="mt-0.5 truncate text-xs/5 text-gray-500 dark:text-gray-400">{{ link.url }}</p>
+                    } @else {
+                      <p class="mt-0.5 truncate text-xs/5 text-gray-500 dark:text-gray-400">{{ summarize(link.body_markdown) }}</p>
+                    }
+                  </div>
+                </div>
+                <div class="flex shrink-0 items-center gap-2">
+                  <span class="hidden text-xs text-gray-400 sm:inline dark:text-gray-500" [title]="'Order: ' + link.order">#{{ link.order }}</span>
+                  <a
+                    [routerLink]="['/admin/manage-user-menu-links/edit', link.link_id]"
+                    class="inline-flex items-center gap-1 rounded-sm border border-gray-300 bg-white px-2.5 py-1.5 text-sm/6 font-medium text-gray-700 hover:bg-gray-100 focus:outline-none focus:ring-2 focus:ring-gray-500 dark:border-gray-500 dark:bg-gray-700 dark:text-gray-300 dark:hover:bg-gray-600"
+                    [attr.aria-label]="'Edit ' + link.label"
+                  >
+                    <ng-icon name="heroPencil" class="size-4" />
+                    <span class="sr-only sm:not-sr-only">Edit</span>
+                  </a>
+                  <button
+                    type="button"
+                    (click)="onDelete(link)"
+                    class="inline-flex items-center gap-1 rounded-sm border border-red-300 bg-white px-2.5 py-1.5 text-sm/6 font-medium text-red-700 hover:bg-red-50 focus:outline-none focus:ring-2 focus:ring-red-500 dark:border-red-500 dark:bg-gray-700 dark:text-red-400 dark:hover:bg-red-900/20"
+                    [attr.aria-label]="'Delete ' + link.label"
+                  >
+                    <ng-icon name="heroTrash" class="size-4" />
+                    <span class="sr-only sm:not-sr-only">Delete</span>
+                  </button>
+                </div>
+              </div>
+            }
+          </div>
+        }
+    </div>
+  `,
+})
+export class ManageUserMenuLinksPage {
+  private readonly service = inject(UserMenuLinksService);
+
+  constructor() {
+    this.service.ensureAdminLinksLoaded();
+  }
+
+  protected readonly links = computed<UserMenuLink[]>(
+    () => this.service.adminLinksResource.value()?.links ?? [],
+  );
+  protected readonly isLoading = computed(() => this.service.adminLinksResource.isLoading());
+  protected readonly loadError = computed(() => {
+    const err = this.service.adminLinksResource.error();
+    if (!err) return null;
+    return err instanceof Error ? err.message : String(err);
+  });
+
+  protected summarize(markdown: string | null | undefined): string {
+    if (!markdown) return '(empty)';
+    const stripped = markdown.replace(/[#*_`>\-]/g, '').replace(/\s+/g, ' ').trim();
+    return stripped.length > 120 ? stripped.slice(0, 120) + '…' : stripped;
+  }
+
+  protected async onDelete(link: UserMenuLink): Promise<void> {
+    if (!confirm(`Delete "${link.label}"?`)) return;
+    try {
+      await this.service.deleteLink(link.link_id);
+    } catch (err) {
+      console.error('Failed to delete user-menu link', err);
+      alert('Failed to delete the link. Please try again.');
+    }
+  }
+}
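
Reviewer note on `summarize()` above: the character class removes every `#`, `*`, `_`, backtick, `>` and `-`, not only those used as Markdown syntax, so hyphens inside ordinary words disappear too. The standalone copy below reproduces the method body so the behaviour can be checked in isolation. The `limit` parameter is an addition for this sketch only; the original hard-codes 120.

```typescript
// Standalone copy of the component's summarize() logic. `limit` is a
// hypothetical parameter added here; the PR uses the literal 120.
function summarize(markdown: string | null | undefined, limit = 120): string {
  if (!markdown) return '(empty)';
  const stripped = markdown
    .replace(/[#*_`>\-]/g, '') // drop common Markdown punctuation
    .replace(/\s+/g, ' ')      // collapse whitespace runs
    .trim();
  return stripped.length > limit ? stripped.slice(0, limit) + '…' : stripped;
}

const heading = summarize('# Hello **world**'); // 'Hello world'
const hyphen = summarize('well-known issue');   // 'wellknown issue'
const empty = summarize(null);                  // '(empty)'
const long = summarize('a'.repeat(130));        // 120 chars plus the ellipsis
```

If preserving in-word hyphens matters for the list preview, one option is to strip `-` only at the start of a line; that would be a behaviour change and is not something this diff does.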
diff --git a/frontend/ai.client/src/app/admin/manage-user-menu-links/models/user-menu-link.model.ts b/frontend/ai.client/src/app/admin/manage-user-menu-links/models/user-menu-link.model.ts
new file mode 100644
index 00000000..ae054b34
--- /dev/null
+++ b/frontend/ai.client/src/app/admin/manage-user-menu-links/models/user-menu-link.model.ts
@@ -0,0 +1,28 @@
+export type UserMenuLinkKind = 'external' | 'modal';
+
+export interface UserMenuLink {
+  link_id: string;
+  label: string;
+  kind: UserMenuLinkKind;
+  enabled: boolean;
+  order: number;
+  url?: string | null;
+  body_markdown?: string | null;
+  created_at: string;
+  updated_at: string;
+  created_by?: string | null;
+}
+
+export interface UserMenuLinksListResponse {
+  links: UserMenuLink[];
+  total: number;
+}
+
+export interface UserMenuLinkFormData {
+  label: string;
+  kind: UserMenuLinkKind;
+  enabled: boolean;
+  order: number;
+  url?: string | null;
+  body_markdown?: string | null;
+}
diff --git a/frontend/ai.client/src/app/admin/manage-user-menu-links/services/user-menu-links.service.ts b/frontend/ai.client/src/app/admin/manage-user-menu-links/services/user-menu-links.service.ts
new file mode 100644
index 00000000..986eafa0
--- /dev/null
+++ b/frontend/ai.client/src/app/admin/manage-user-menu-links/services/user-menu-links.service.ts
@@ -0,0 +1,102 @@
+import { Injectable, inject, computed, resource, signal } from '@angular/core';
+import { HttpClient } from '@angular/common/http';
+import { firstValueFrom } from 'rxjs';
+import { ConfigService } from '../../../services/config.service';
+import {
+  UserMenuLink,
+  UserMenuLinkFormData,
+  UserMenuLinksListResponse,
+} from '../models/user-menu-link.model';
+
+/**
+ * Service for admin-managed user-menu links.
+ *
+ * Two API surfaces:
+ *  - Admin: `/admin/user-menu-links` (CRUD, includes disabled links)
+ *  - Public: `/user-menu-links` (enabled-only, used by `enabledLinksResource`)
+ *
+ * The public resource is what the user-dropdown component consumes. The
+ * dropdown takes a `User` as a required input and is only rendered by the
+ * topnav once the session bootstrap has resolved, so the resource's loader
+ * fires post-auth when the service is first injected; no explicit reload needed.
+ */
+@Injectable({ providedIn: 'root' })
+export class UserMenuLinksService {
+  private http = inject(HttpClient);
+  private config = inject(ConfigService);
+
+  private readonly adminBaseUrl = computed(
+    () => `${this.config.appApiUrl()}/admin/user-menu-links`,
+  );
+  private readonly publicBaseUrl = computed(
+    () => `${this.config.appApiUrl()}/user-menu-links`,
+  );
+
+  // The admin resource is gated: this service is `providedIn: 'root'` and is
+  // injected by the always-rendered user-dropdown, so an eager admin loader
+  // would fire `GET /admin/user-menu-links/` on every app load for every user
+  // (401/403 for non-admins). It only loads once the admin manage page calls
+  // `ensureAdminLinksLoaded()`.
+  private readonly adminLinksRequested = signal(false);
+
+  readonly adminLinksResource = resource({
+    params: () => (this.adminLinksRequested() ? {} : undefined),
+    loader: async () => this.fetchAdminLinks(),
+  });
+
+  /** Activates the admin links resource. Called by the admin manage page. */
+  ensureAdminLinksLoaded(): void {
+    this.adminLinksRequested.set(true);
+  }
+
+  readonly enabledLinksResource = resource({
+    loader: async () => this.fetchEnabledLinks(),
+  });
+
+  async fetchAdminLinks(): Promise<UserMenuLinksListResponse> {
+    return await firstValueFrom(
+      this.http.get<UserMenuLinksListResponse>(`${this.adminBaseUrl()}/`),
+    );
+  }
+
+  async fetchEnabledLinks(): Promise<UserMenuLinksListResponse> {
+    return await firstValueFrom(
+      this.http.get<UserMenuLinksListResponse>(`${this.publicBaseUrl()}/`),
+    );
+  }
+
+  async getLink(linkId: string): Promise<UserMenuLink> {
+    return await firstValueFrom(
+      this.http.get<UserMenuLink>(`${this.adminBaseUrl()}/${linkId}`),
+    );
+  }
+
+  async createLink(data: UserMenuLinkFormData): Promise<UserMenuLink> {
+    const created = await firstValueFrom(
+      this.http.post<UserMenuLink>(`${this.adminBaseUrl()}/`, data),
+    );
+    this.adminLinksResource.reload();
+    this.enabledLinksResource.reload();
+    return created;
+  }
+
+  async updateLink(
+    linkId: string,
+    updates: Partial<UserMenuLinkFormData>,
+  ): Promise<UserMenuLink> {
+    const updated = await firstValueFrom(
+      this.http.patch<UserMenuLink>(`${this.adminBaseUrl()}/${linkId}`, updates),
+    );
+    this.adminLinksResource.reload();
+    this.enabledLinksResource.reload();
+    return updated;
+  }
+
+  async deleteLink(linkId: string): Promise<void> {
+    await firstValueFrom(
+      this.http.delete<void>(`${this.adminBaseUrl()}/${linkId}`),
+    );
+    this.adminLinksResource.reload();
+    this.enabledLinksResource.reload();
+  }
+}
diff --git a/frontend/ai.client/src/app/admin/manage-user-menu-links/user-menu-link-form.page.ts b/frontend/ai.client/src/app/admin/manage-user-menu-links/user-menu-link-form.page.ts
new file mode 100644
index 00000000..307c7461
--- /dev/null
+++ b/frontend/ai.client/src/app/admin/manage-user-menu-links/user-menu-link-form.page.ts
@@ -0,0 +1,327 @@
+import { ChangeDetectionStrategy, Component, OnInit, computed, inject, signal } from '@angular/core';
+import { ActivatedRoute, Router, RouterLink } from '@angular/router';
+import {
+  FormControl,
+  FormGroup,
+  ReactiveFormsModule,
+  Validators,
+} from '@angular/forms';
+import { MarkdownComponent } from 'ngx-markdown';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import { heroArrowLeft } from '@ng-icons/heroicons/outline';
+import { UserMenuLinksService } from './services/user-menu-links.service';
+import {
+  UserMenuLinkFormData,
+  UserMenuLinkKind,
+} from './models/user-menu-link.model';
+
+const URL_PATTERN = /^https?:\/\/.+/i;
+
+@Component({
+  selector: 'app-user-menu-link-form-page',
+  imports: [RouterLink, ReactiveFormsModule, MarkdownComponent, NgIcon],
+  providers: [provideIcons({ heroArrowLeft })],
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  template: `
+    <div class="max-w-4xl">
+        <a
+          routerLink="/admin/manage-user-menu-links"
+          class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
+        >
+          <ng-icon name="heroArrowLeft" class="size-4" aria-hidden="true" />
+          Back to User Menu Links
+        </a>
+
+        <h1 class="mb-6 text-3xl/9 font-bold text-gray-900 dark:text-white">
+          {{ isEdit() ? 'Edit user-menu link' : 'New user-menu link' }}
+        </h1>
+
+        @if (loadError()) {
+          <div class="mb-4 rounded-sm border border-red-300 bg-red-50 p-4 text-sm/6 text-red-700 dark:border-red-700 dark:bg-red-900/20 dark:text-red-300">
+            {{ loadError() }}
+          </div>
+        }
+
+        <form [formGroup]="form" (ngSubmit)="onSubmit()" class="space-y-6">
+          <div class="rounded-sm border border-gray-300 bg-white p-6 dark:border-gray-600 dark:bg-gray-800">
+            <div class="grid grid-cols-1 gap-4 md:grid-cols-2">
+              <div class="md:col-span-2">
+                <label for="label" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                  Label <span class="text-red-600">*</span>
+                </label>
+                <input
+                  id="label"
+                  type="text"
+                  formControlName="label"
+                  maxlength="64"
+                  class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-500 dark:bg-gray-700 dark:text-white"
+                  placeholder="e.g. Privacy policy"
+                />
+                @if (showError('label')) {
+                  <p class="mt-1 text-xs text-red-600 dark:text-red-400">Label is required.</p>
+                }
+              </div>
+
+              <div>
+                <label for="kind" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                  Type
+                </label>
+                <select
+                  id="kind"
+                  formControlName="kind"
+                  class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-500 dark:bg-gray-700 dark:text-white"
+                >
+                  <option value="external">External URL (new tab)</option>
+                  <option value="modal">In-app modal (rich text)</option>
+                </select>
+              </div>
+
+              <div>
+                <label for="order" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                  Order
+                </label>
+                <input
+                  id="order"
+                  type="number"
+                  min="0"
+                  max="10000"
+                  formControlName="order"
+                  class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-500 dark:bg-gray-700 dark:text-white"
+                />
+                <p class="mt-1 text-xs text-gray-500 dark:text-gray-400">Lower numbers appear first.</p>
+              </div>
+
+              <div class="md:col-span-2 flex items-center gap-2">
+                <input
+                  id="enabled"
+                  type="checkbox"
+                  formControlName="enabled"
+                  class="size-4 rounded border-gray-300 text-blue-600 focus:ring-blue-500"
+                />
+                <label for="enabled" class="text-sm/6 text-gray-700 dark:text-gray-300">
+                  Visible to users
+                </label>
+              </div>
+            </div>
+          </div>
+
+          @if (kindValue() === 'external') {
+            <div class="rounded-sm border border-gray-300 bg-white p-6 dark:border-gray-600 dark:bg-gray-800">
+              <label for="url" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                URL <span class="text-red-600">*</span>
+              </label>
+              <input
+                id="url"
+                type="url"
+                formControlName="url"
+                maxlength="2048"
+                placeholder="https://example.com/privacy"
+                class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-500 dark:bg-gray-700 dark:text-white"
+              />
+              @if (showError('url')) {
+                <p class="mt-1 text-xs text-red-600 dark:text-red-400">A valid http(s) URL is required.</p>
+              }
+              <p class="mt-1 text-xs text-gray-500 dark:text-gray-400">Opens in a new tab with <code>rel="noopener noreferrer"</code>.</p>
+            </div>
+          } @else {
+            <div class="rounded-sm border border-gray-300 bg-white p-6 dark:border-gray-600 dark:bg-gray-800">
+              <label for="body_markdown" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                Body (Markdown) <span class="text-red-600">*</span>
+              </label>
+              <p class="mt-1 mb-3 text-xs text-gray-500 dark:text-gray-400">
+                Supports CommonMark: headings, lists, links, code, emphasis. Links open in a new tab.
+              </p>
+              <div class="grid grid-cols-1 gap-3 md:grid-cols-2">
+                <textarea
+                  id="body_markdown"
+                  formControlName="body_markdown"
+                  rows="14"
+                  maxlength="50000"
+                  class="block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 font-mono text-sm text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-500 dark:bg-gray-700 dark:text-white"
+                  placeholder="# Welcome&#10;&#10;Some **rich** text..."
+                ></textarea>
+                <div class="rounded-sm border border-gray-200 bg-gray-50 px-4 py-3 dark:border-gray-700 dark:bg-gray-900">
+                  <p class="mb-2 text-xs font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Preview</p>
+                  <div class="markdown-body prose prose-sm max-w-none dark:prose-invert">
+                    <markdown [data]="previewMarkdown()" />
+                  </div>
+                </div>
+              </div>
+              @if (showError('body_markdown')) {
+                <p class="mt-1 text-xs text-red-600 dark:text-red-400">Body is required for modal links.</p>
+              }
+            </div>
+          }
+
+          @if (submitError()) {
+            <div class="rounded-sm border border-red-300 bg-red-50 p-4 text-sm/6 text-red-700 dark:border-red-700 dark:bg-red-900/20 dark:text-red-300">
+              {{ submitError() }}
+            </div>
+          }
+
+          <div class="flex justify-end gap-3">
+            <a
+              routerLink="/admin/manage-user-menu-links"
+              class="rounded-sm border border-gray-300 bg-white px-4 py-2 text-sm/6 font-medium text-gray-700 hover:bg-gray-100 focus:outline-none focus:ring-2 focus:ring-gray-500 dark:border-gray-500 dark:bg-gray-700 dark:text-gray-300 dark:hover:bg-gray-600"
+            >
+              Cancel
+            </a>
+            <button
+              type="submit"
+              [disabled]="form.invalid || isSubmitting()"
+              class="inline-flex items-center gap-2 rounded-sm bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 disabled:cursor-not-allowed disabled:opacity-50 dark:bg-blue-500 dark:hover:bg-blue-600"
+            >
+              @if (isSubmitting()) {
+                <span class="size-4 animate-spin rounded-full border-2 border-white border-t-transparent" aria-hidden="true"></span>
+              }
+              {{ isEdit() ? 'Save changes' : 'Create link' }}
+            </button>
+          </div>
+        </form>
+    </div>
+  `,
+  styles: `
+    @import "tailwindcss";
+    @custom-variant dark (&:where(.dark, .dark *));
+
+    .markdown-body ::ng-deep a {
+      color: var(--color-primary-500);
+      text-decoration: underline;
+      text-underline-offset: 2px;
+    }
+    .markdown-body ::ng-deep a:hover {
+      color: var(--color-primary-700);
+    }
+    .markdown-body ::ng-deep a:focus-visible {
+      outline: 2px solid var(--color-primary-500);
+      outline-offset: 2px;
+      border-radius: 0.125rem;
+    }
+    :host-context(.dark) .markdown-body ::ng-deep a {
+      color: var(--color-primary-400);
+    }
+    :host-context(.dark) .markdown-body ::ng-deep a:hover {
+      color: var(--color-primary-300);
+    }
+  `,
+})
+export class UserMenuLinkFormPage implements OnInit {
+  private readonly service = inject(UserMenuLinksService);
+  private readonly route = inject(ActivatedRoute);
+  private readonly router = inject(Router);
+
+  protected readonly form = new FormGroup({
+    label: new FormControl<string>('', {
+      nonNullable: true,
+      validators: [Validators.required, Validators.maxLength(64)],
+    }),
+    kind: new FormControl<UserMenuLinkKind>('external', { nonNullable: true }),
+    enabled: new FormControl<boolean>(true, { nonNullable: true }),
+    order: new FormControl<number>(0, {
+      nonNullable: true,
+      validators: [Validators.min(0), Validators.max(10_000)],
+    }),
+    url: new FormControl<string>('', { nonNullable: true }),
+    body_markdown: new FormControl<string>('', { nonNullable: true }),
+  });
+
+  protected readonly isSubmitting = signal(false);
+  protected readonly submitError = signal<string | null>(null);
+  protected readonly loadError = signal<string | null>(null);
+  private readonly editingId = signal<string | null>(null);
+  protected readonly isEdit = computed(() => this.editingId() !== null);
+
+  // Mirrored signals for reactive template (FormControl.valueChanges is rxjs).
+  private readonly kindSig = signal<UserMenuLinkKind>('external');
+  private readonly bodySig = signal<string>('');
+  protected readonly kindValue = this.kindSig.asReadonly();
+  protected readonly previewMarkdown = computed(() => this.bodySig() || '*(empty preview)*');
+
+  async ngOnInit(): Promise<void> {
+    this.form.controls.kind.valueChanges.subscribe(value => {
+      this.kindSig.set(value);
+      this.syncKindValidators(value);
+    });
+    this.form.controls.body_markdown.valueChanges.subscribe(value => {
+      this.bodySig.set(value);
+    });
+    this.syncKindValidators(this.form.controls.kind.value);
+
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.editingId.set(id);
+      try {
+        const link = await this.service.getLink(id);
+        this.form.patchValue({
+          label: link.label,
+          kind: link.kind,
+          enabled: link.enabled,
+          order: link.order,
+          url: link.url ?? '',
+          body_markdown: link.body_markdown ?? '',
+        });
+        this.kindSig.set(link.kind);
+        this.bodySig.set(link.body_markdown ?? '');
+      } catch (err) {
+        this.loadError.set(
+          err instanceof Error ? err.message : 'Failed to load link.',
+        );
+      }
+    }
+  }
+
+  private syncKindValidators(kind: UserMenuLinkKind): void {
+    const url = this.form.controls.url;
+    const body = this.form.controls.body_markdown;
+    if (kind === 'external') {
+      url.setValidators([Validators.required, Validators.pattern(URL_PATTERN)]);
+      body.clearValidators();
+    } else {
+      body.setValidators([Validators.required]);
+      url.clearValidators();
+    }
+    url.updateValueAndValidity({ emitEvent: false });
+    body.updateValueAndValidity({ emitEvent: false });
+  }
+
+  protected showError(name: 'label' | 'url' | 'body_markdown'): boolean {
+    const c = this.form.get(name);
+    return !!c && c.invalid && (c.touched || c.dirty);
+  }
+
+  protected async onSubmit(): Promise<void> {
+    if (this.form.invalid) {
+      this.form.markAllAsTouched();
+      return;
+    }
+    this.isSubmitting.set(true);
+    this.submitError.set(null);
+
+    const raw = this.form.getRawValue();
+    const data: UserMenuLinkFormData = {
+      label: raw.label.trim(),
+      kind: raw.kind,
+      enabled: raw.enabled,
+      order: Number(raw.order ?? 0),
+      url: raw.kind === 'external' ? raw.url.trim() : null,
+      body_markdown: raw.kind === 'modal' ? raw.body_markdown : null,
+    };
+
+    try {
+      const id = this.editingId();
+      if (id) {
+        await this.service.updateLink(id, data);
+      } else {
+        await this.service.createLink(data);
+      }
+      await this.router.navigate(['/admin/manage-user-menu-links']);
+    } catch (err: unknown) {
+      const detail = (err as { error?: { detail?: string }; message?: string })?.error?.detail
+        ?? (err as Error)?.message
+        ?? 'Failed to save link.';
+      this.submitError.set(detail);
+    } finally {
+      this.isSubmitting.set(false);
+    }
+  }
+}
diff --git a/frontend/ai.client/src/app/admin/openai-models/openai-models.page.html b/frontend/ai.client/src/app/admin/openai-models/openai-models.page.html
index 7acb14ef..bd77b005 100644
--- a/frontend/ai.client/src/app/admin/openai-models/openai-models.page.html
+++ b/frontend/ai.client/src/app/admin/openai-models/openai-models.page.html
@@ -1,80 +1,69 @@
 <div class="min-h-dvh">
-  <div class="mx-auto max-w-7xl px-4 py-8 sm:px-6 lg:px-8">
+  <div class="mx-auto max-w-5xl px-4 py-8 sm:px-6 lg:px-8">
     <!-- Back Button -->
     <a
       routerLink="/admin/manage-models"
       class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
     >
-      <ng-icon name="heroArrowLeft" class="size-4" />
+      <ng-icon name="heroArrowLeft" class="size-4" aria-hidden="true" />
       Back to Manage Models
     </a>
 
     <!-- Page Header -->
-    <div class="mb-8">
-      <h1 class="text-3xl/9 font-bold text-gray-900 dark:text-white">OpenAI Models</h1>
-      <p class="mt-2 text-base/7 text-gray-600 dark:text-gray-400">
-        View and manage available OpenAI models. For detailed specifications, visit
+    <div class="mb-6">
+      <h1 class="text-2xl/8 font-bold text-gray-900 dark:text-white">OpenAI Models</h1>
+      <p class="mt-1 text-sm/6 text-gray-600 dark:text-gray-400">
+        Browse available OpenAI models and add them to your managed list. For detailed specs, see
         <a href="https://platform.openai.com/docs/models/compare" target="_blank" rel="noopener noreferrer" class="text-blue-600 hover:text-blue-700 dark:text-blue-400 dark:hover:text-blue-300">
-          OpenAI's model comparison page
-        </a>
+          OpenAI's model comparison page</a>.
       </p>
     </div>
 
-    <!-- Filters Section -->
-    <div class="mb-6 rounded-sm border border-gray-300 bg-white p-6 dark:border-gray-600 dark:bg-gray-800">
-      <h2 class="mb-4 text-lg/7 font-semibold text-gray-900 dark:text-white">Filters</h2>
-
-      <div class="grid grid-cols-1 gap-4 md:grid-cols-2 lg:grid-cols-3">
-        <!-- Search Filter -->
-        <div>
-          <label for="search" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Search Models
-          </label>
-          <input
-            type="text"
-            id="search"
-            [ngModel]="searchQuery()"
-            (ngModelChange)="searchQuery.set($event)"
-            placeholder="Search by ID or owner..."
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-white dark:placeholder:text-gray-500"
-          />
-        </div>
-
-        <!-- Max Results Filter -->
-        <div>
-          <label for="maxResults" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
-            Max Results
-          </label>
-          <input
-            type="number"
-            id="maxResults"
-            [ngModel]="maxResultsFilter()"
-            (ngModelChange)="maxResultsFilter.set($event || undefined)"
-            placeholder="No limit"
-            min="1"
-            max="1000"
-            class="mt-1 block w-full rounded-sm border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-white dark:placeholder:text-gray-500"
-          />
-        </div>
+    <!-- Toolbar: search + filters inline -->
+    <div class="mb-3 flex flex-col gap-2 sm:flex-row sm:items-center">
+      <div class="relative flex-1">
+        <ng-icon
+          name="heroMagnifyingGlass"
+          class="pointer-events-none absolute left-3 top-1/2 size-4 -translate-y-1/2 text-gray-400 dark:text-gray-500"
+          aria-hidden="true"
+        />
+        <label for="search" class="sr-only">Search models</label>
+        <input
+          type="text"
+          id="search"
+          [ngModel]="searchQuery()"
+          (ngModelChange)="searchQuery.set($event)"
+          placeholder="Search by ID or owner…"
+          class="block w-full rounded-2xl border border-gray-300 bg-white py-2 pl-9 pr-3 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+        />
       </div>
 
-      <!-- Filter Actions -->
-      <div class="mt-4 flex gap-3">
+      <label for="maxResults" class="sr-only">Max results</label>
+      <input
+        type="number"
+        id="maxResults"
+        [ngModel]="maxResultsFilter()"
+        (ngModelChange)="maxResultsFilter.set($event || undefined)"
+        placeholder="Max"
+        min="1"
+        max="1000"
+        class="w-24 rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+      />
+
+      <button
+        (click)="applyFilters()"
+        class="rounded-2xl bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:bg-blue-500 dark:hover:bg-blue-600"
+      >
+        Apply
+      </button>
+      @if (hasActiveFilters()) {
         <button
-          (click)="applyFilters()"
-          class="rounded-sm bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:bg-blue-500 dark:hover:bg-blue-600"
+          (click)="resetFilters()"
+          class="rounded-2xl px-3 py-2 text-sm/6 font-medium text-gray-600 hover:bg-gray-100 hover:text-gray-900 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-gray-500 dark:text-gray-400 dark:hover:bg-gray-800 dark:hover:text-white"
         >
-          Apply Filters
+          Reset
         </button>
-        @if (hasActiveFilters()) {
-          <button
-            (click)="resetFilters()"
-            class="rounded-sm border border-gray-300 bg-white px-4 py-2 text-sm/6 font-medium text-gray-700 hover:bg-gray-50 focus:outline-hidden focus:ring-3 focus:ring-gray-500/50 dark:border-gray-600 dark:bg-gray-700 dark:text-gray-300 dark:hover:bg-gray-600"
-          >
-            Reset Filters
-          </button>
-        }
-      </div>
+      }
     </div>
 
     <!-- Loading State -->
@@ -86,103 +75,105 @@ <h2 class="mb-4 text-lg/7 font-semibold text-gray-900 dark:text-white">Filters</
 
     <!-- Error State -->
     @if (error()) {
-      <div class="rounded-sm border border-red-200 bg-red-50 p-4 dark:border-red-800 dark:bg-red-900/20">
+      <div class="rounded-2xl border border-red-200 bg-red-50 p-4 dark:border-red-800 dark:bg-red-900/20">
         <h3 class="text-sm/6 font-medium text-red-800 dark:text-red-400">Error loading models</h3>
         <p class="mt-1 text-sm/6 text-red-700 dark:text-red-500">{{ error() }}</p>
       </div>
     }
 
-    <!-- Results Header -->
     @if (!isLoading() && !error()) {
-      <div class="mb-4 flex items-center justify-between">
-        <p class="text-sm/6 text-gray-600 dark:text-gray-400">
-          Showing {{ models().length }} model{{ models().length !== 1 ? 's' : '' }}
-        </p>
-      </div>
+      <!-- Count -->
+      <p class="mb-3 text-xs/5 text-gray-500 dark:text-gray-400">
+        {{ models().length }} model{{ models().length !== 1 ? 's' : '' }}
+      </p>
 
       <!-- Models List -->
       @if (models().length === 0) {
-        <div class="rounded-sm border border-gray-200 bg-white p-12 text-center dark:border-gray-700 dark:bg-gray-800">
-          <p class="text-base/7 text-gray-500 dark:text-gray-400">
-            No models found.
-          </p>
+        <div class="rounded-2xl border border-dashed border-gray-300 bg-white p-12 text-center dark:border-gray-700 dark:bg-gray-800">
+          <p class="text-sm/6 text-gray-500 dark:text-gray-400">No models found.</p>
         </div>
       } @else {
-        <div class="space-y-4">
+        <ul class="divide-y divide-gray-200 overflow-hidden rounded-2xl border border-gray-200 bg-white dark:divide-gray-700 dark:border-gray-700 dark:bg-gray-800">
           @for (model of models(); track model.id) {
-            <div class="rounded-sm border border-gray-200 bg-white p-6 hover:border-gray-300 dark:border-gray-700 dark:bg-gray-800 dark:hover:border-gray-600">
-              <!-- Model Header -->
-              <div class="mb-4 flex items-start justify-between">
-                <div class="flex-1">
-                  <h3 class="text-lg/7 font-semibold text-gray-900 dark:text-white">
+            <li>
+              <!-- Row -->
+              <div class="flex items-center gap-3 px-3 py-2.5 sm:px-4">
+                <button
+                  type="button"
+                  (click)="toggleExpand(model.id)"
+                  [attr.aria-expanded]="isExpanded(model.id)"
+                  [attr.aria-controls]="'model-detail-' + model.id"
+                  [attr.aria-label]="(isExpanded(model.id) ? 'Hide' : 'Show') + ' details for ' + model.id"
+                  class="flex size-7 shrink-0 items-center justify-center rounded-2xl text-gray-400 hover:bg-gray-100 hover:text-gray-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:text-gray-500 dark:hover:bg-gray-700 dark:hover:text-gray-200"
+                >
+                  <ng-icon
+                    name="heroChevronDown"
+                    class="size-4 transition-transform duration-150"
+                    [class.rotate-180]="isExpanded(model.id)"
+                    aria-hidden="true"
+                  />
+                </button>
+
+                <div class="min-w-0 flex-1">
+                  <span class="block truncate text-sm/6 font-medium text-gray-900 dark:text-white">
                     {{ model.id }}
-                  </h3>
-                  <p class="mt-1 text-sm/6 text-gray-600 dark:text-gray-400">
+                  </span>
+                  <p class="truncate text-xs/5 text-gray-500 dark:text-gray-400">
                     Owned by: {{ model.ownedBy }}
                   </p>
                 </div>
-                <span class="inline-flex shrink-0 items-center rounded-sm bg-green-100 px-3 py-1 text-xs/5 font-medium text-green-800 dark:bg-green-900/50 dark:text-green-300">
+
+                <span class="hidden shrink-0 rounded-2xl bg-gray-100 px-2.5 py-0.5 text-xs/5 font-medium text-gray-600 sm:inline-block dark:bg-gray-700 dark:text-gray-300">
                   OpenAI
                 </span>
-              </div>
-
-              <!-- Model Details Grid -->
-              <div class="grid grid-cols-1 gap-4 md:grid-cols-2">
-                <!-- Created Date -->
-                @if (model.created) {
-                  <div>
-                    <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Created</h4>
-                    <p class="mt-2 text-sm/6 text-gray-900 dark:text-white">{{ formatDate(model.created) }}</p>
-                  </div>
-                }
-
-                <!-- Object Type -->
-                @if (model.object) {
-                  <div>
-                    <h4 class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">Type</h4>
-                    <p class="mt-2 text-sm/6 text-gray-900 dark:text-white">{{ model.object }}</p>
-                  </div>
-                }
-              </div>
 
-              <!-- Model Features and Actions -->
-              <div class="mt-4 flex items-center justify-between gap-4 border-t border-gray-200 pt-4 dark:border-gray-700">
-                <div class="flex gap-4">
-                  <div class="flex items-center gap-2">
-                    <svg class="size-5 text-blue-600 dark:text-blue-400" fill="currentColor" viewBox="0 0 20 20">
-                      <path fill-rule="evenodd" d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-7-4a1 1 0 11-2 0 1 1 0 012 0zM9 9a.75.75 0 000 1.5h.253a.25.25 0 01.244.304l-.459 2.066A1.75 1.75 0 0010.747 15H11a.75.75 0 000-1.5h-.253a.25.25 0 01-.244-.304l.459-2.066A1.75 1.75 0 009.253 9H9z" clip-rule="evenodd" />
-                    </svg>
-                    <span class="text-sm/6 text-gray-700 dark:text-gray-300">
-                      <a href="https://platform.openai.com/docs/models/compare" target="_blank" rel="noopener noreferrer" class="text-blue-600 hover:text-blue-700 dark:text-blue-400 dark:hover:text-blue-300">
-                        View specifications
-                      </a>
-                    </span>
-                  </div>
-                </div>
-
-                <!-- Add Model Button or Added Status -->
                 @if (isModelAdded(model.id)) {
-                  <div class="inline-flex items-center gap-2 rounded-sm bg-green-100 px-3 py-1.5 text-sm/6 font-medium text-green-800 dark:bg-green-900/50 dark:text-green-300">
-                    <svg class="size-4" fill="currentColor" viewBox="0 0 20 20">
-                      <path fill-rule="evenodd" d="M10 18a8 8 0 100-16 8 8 0 000 16zm3.857-9.809a.75.75 0 00-1.214-.882l-3.483 4.79-1.88-1.88a.75.75 0 10-1.06 1.061l2.5 2.5a.75.75 0 001.137-.089l4-5.5z" clip-rule="evenodd" />
-                    </svg>
+                  <span class="inline-flex shrink-0 items-center gap-1.5 rounded-2xl bg-green-100 px-3 py-1.5 text-xs/5 font-medium text-green-800 dark:bg-green-900/50 dark:text-green-300">
+                    <ng-icon name="heroCheckCircleSolid" class="size-4" aria-hidden="true" />
                     Added
-                  </div>
+                  </span>
                 } @else {
                   <button
                     (click)="addModelFromOpenAI(model)"
-                    class="inline-flex items-center gap-2 rounded-sm bg-blue-600 px-3 py-1.5 text-sm/6 font-medium text-white hover:bg-blue-700 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:bg-blue-500 dark:hover:bg-blue-600"
+                    class="inline-flex shrink-0 items-center gap-1.5 rounded-2xl bg-blue-600 px-3 py-1.5 text-xs/5 font-medium text-white hover:bg-blue-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:bg-blue-500 dark:hover:bg-blue-600"
                   >
-                    <svg class="size-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2">
-                      <path stroke-linecap="round" stroke-linejoin="round" d="M12 4v16m8-8H4" />
-                    </svg>
-                    Add Model
+                    <ng-icon name="heroPlus" class="size-4" aria-hidden="true" />
+                    Add
                   </button>
                 }
               </div>
-            </div>
+
+              <!-- Expanded detail -->
+              @if (isExpanded(model.id)) {
+                <div
+                  [id]="'model-detail-' + model.id"
+                  class="border-t border-gray-100 bg-gray-50 px-4 py-3 sm:pl-14 dark:border-gray-700/60 dark:bg-gray-900/40"
+                >
+                  <dl class="grid grid-cols-1 gap-x-8 gap-y-3 sm:grid-cols-3">
+                    <div>
+                      <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Created</dt>
+                      <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">{{ formatDate(model.created) }}</dd>
+                    </div>
+                    @if (model.object) {
+                      <div>
+                        <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Type</dt>
+                        <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">{{ model.object }}</dd>
+                      </div>
+                    }
+                    <div>
+                      <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">Specifications</dt>
+                      <dd class="mt-0.5 text-sm/6">
+                        <a href="https://platform.openai.com/docs/models/compare" target="_blank" rel="noopener noreferrer" class="text-blue-600 hover:text-blue-700 dark:text-blue-400 dark:hover:text-blue-300">
+                          View on OpenAI →
+                        </a>
+                      </dd>
+                    </div>
+                  </dl>
+                </div>
+              }
+            </li>
           }
-        </div>
+        </ul>
       }
     }
   </div>
diff --git a/frontend/ai.client/src/app/admin/openai-models/openai-models.page.ts b/frontend/ai.client/src/app/admin/openai-models/openai-models.page.ts
index ed18dbfa..c95980f7 100644
--- a/frontend/ai.client/src/app/admin/openai-models/openai-models.page.ts
+++ b/frontend/ai.client/src/app/admin/openai-models/openai-models.page.ts
@@ -2,7 +2,13 @@ import { Component, ChangeDetectionStrategy, inject, signal, computed } from '@a
 import { Router, RouterLink } from '@angular/router';
 import { FormsModule } from '@angular/forms';
 import { NgIcon, provideIcons } from '@ng-icons/core';
-import { heroArrowLeft } from '@ng-icons/heroicons/outline';
+import {
+  heroArrowLeft,
+  heroPlus,
+  heroMagnifyingGlass,
+  heroChevronDown,
+} from '@ng-icons/heroicons/outline';
+import { heroCheckCircleSolid } from '@ng-icons/heroicons/solid';
 import { OpenAIModelsService } from './services/openai-models.service';
 import { OpenAIModelSummary } from './models/openai-model.model';
 import { ManagedModelsService } from '../manage-models/services/managed-models.service';
@@ -11,7 +17,15 @@ import { ThinkingDotsComponent } from '../../components/thinking-dots.component'
 @Component({
   selector: 'app-openai-models-page',
   imports: [FormsModule, ThinkingDotsComponent, RouterLink, NgIcon],
-  providers: [provideIcons({ heroArrowLeft })],
+  providers: [
+    provideIcons({
+      heroArrowLeft,
+      heroPlus,
+      heroMagnifyingGlass,
+      heroChevronDown,
+      heroCheckCircleSolid,
+    }),
+  ],
   templateUrl: './openai-models.page.html',
   styleUrl: './openai-models.page.css',
   changeDetection: ChangeDetectionStrategy.OnPush,
@@ -25,6 +39,9 @@ export class OpenAIModelsPage {
   maxResultsFilter = signal<number | undefined>(undefined);
   searchQuery = signal<string>('');
 
+  // Row detail expansion state (set of model ids currently expanded)
+  private expandedIds = signal<ReadonlySet<string>>(new Set());
+
   // Access the models resource from the service
   readonly modelsResource = this.openaiModelsService.modelsResource;
 
@@ -80,6 +97,22 @@ export class OpenAIModelsPage {
     return !!(this.maxResultsFilter() || this.searchQuery());
   });
 
+  isExpanded(modelId: string): boolean {
+    return this.expandedIds().has(modelId);
+  }
+
+  toggleExpand(modelId: string): void {
+    this.expandedIds.update(current => {
+      const next = new Set(current);
+      if (next.has(modelId)) {
+        next.delete(modelId);
+      } else {
+        next.add(modelId);
+      }
+      return next;
+    });
+  }
+
   /**
    * Check if a model has already been added to the managed models list
    */
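Review note on the `expandedIds` change above: `toggleExpand` copies the set before changing it, which matters because Angular signals compare values with `Object.is` by default, so mutating the existing `Set` and returning the same reference would not notify the template. The copy-then-toggle step can be sketched as a standalone pure function. `toggleInSet` and the `'model-a'` id are hypothetical names used only for illustration and are not part of this change.

```typescript
// Hypothetical helper (not in the diff): returns a NEW set with `id` toggled,
// leaving the input set untouched so reference-equality checks see a change.
function toggleInSet<T>(current: ReadonlySet<T>, id: T): ReadonlySet<T> {
  const next = new Set(current);
  // Set.prototype.delete returns true only if the element was present,
  // so a false result means the id was absent and should be added.
  if (!next.delete(id)) {
    next.add(id);
  }
  return next;
}

// Usage: expanding and then collapsing the same row.
const empty: ReadonlySet<string> = new Set<string>();
const opened = toggleInSet(empty, 'model-a');  // contains 'model-a'
const closed = toggleInSet(opened, 'model-a'); // empty again, new reference
```

With a helper like this, the component method would reduce to `this.expandedIds.update(current => toggleInSet(current, modelId))`, assuming the signal keeps its `ReadonlySet<string>` type.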
diff --git a/frontend/ai.client/src/app/admin/quota-tiers/pages/tier-list/tier-list.component.html b/frontend/ai.client/src/app/admin/quota-tiers/pages/tier-list/tier-list.component.html
index 80affc86..df50e210 100644
--- a/frontend/ai.client/src/app/admin/quota-tiers/pages/tier-list/tier-list.component.html
+++ b/frontend/ai.client/src/app/admin/quota-tiers/pages/tier-list/tier-list.component.html
@@ -103,7 +103,7 @@ <h2 class="mb-4 text-lg/7 font-semibold text-gray-900 dark:text-white">Search &
       <div class="flex items-center justify-center h-64">
         <div class="flex flex-col items-center gap-4">
           <div
-            class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600"
+            class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600 dark:border-t-blue-400"
           ></div>
           <p class="text-sm text-gray-500 dark:text-gray-400">
             Loading tiers...
diff --git a/frontend/ai.client/src/app/admin/quota-tiers/quota-routing.module.ts b/frontend/ai.client/src/app/admin/quota-tiers/quota-routing.module.ts
index 1878d3d9..baac7875 100644
--- a/frontend/ai.client/src/app/admin/quota-tiers/quota-routing.module.ts
+++ b/frontend/ai.client/src/app/admin/quota-tiers/quota-routing.module.ts
@@ -1,17 +1,52 @@
 import { Routes } from '@angular/router';
+import { QuotaLayout } from './quota.layout';
 
 export const quotaRoutes: Routes = [
   {
     path: '',
-    redirectTo: 'tiers',
-    pathMatch: 'full',
-  },
-  {
-    path: 'tiers',
-    loadComponent: () =>
-      import('./pages/tier-list/tier-list.component').then(
-        (m) => m.TierListComponent
-      ),
+    component: QuotaLayout,
+    children: [
+      {
+        path: '',
+        redirectTo: 'tiers',
+        pathMatch: 'full',
+      },
+      {
+        path: 'tiers',
+        loadComponent: () =>
+          import('./pages/tier-list/tier-list.component').then(
+            (m) => m.TierListComponent
+          ),
+      },
+      {
+        path: 'assignments',
+        loadComponent: () =>
+          import('./pages/assignment-list/assignment-list.component').then(
+            (m) => m.AssignmentListComponent
+          ),
+      },
+      {
+        path: 'overrides',
+        loadComponent: () =>
+          import('./pages/override-list/override-list.component').then(
+            (m) => m.OverrideListComponent
+          ),
+      },
+      {
+        path: 'inspector',
+        loadComponent: () =>
+          import('./pages/quota-inspector/quota-inspector.component').then(
+            (m) => m.QuotaInspectorComponent
+          ),
+      },
+      {
+        path: 'events',
+        loadComponent: () =>
+          import('./pages/event-viewer/event-viewer.component').then(
+            (m) => m.EventViewerComponent
+          ),
+      },
+    ],
   },
   {
     path: 'tiers/:tierId',
@@ -20,13 +55,6 @@ export const quotaRoutes: Routes = [
         (m) => m.TierDetailComponent
       ),
   },
-  {
-    path: 'assignments',
-    loadComponent: () =>
-      import('./pages/assignment-list/assignment-list.component').then(
-        (m) => m.AssignmentListComponent
-      ),
-  },
   {
     path: 'assignments/:assignmentId',
     loadComponent: () =>
@@ -34,13 +62,6 @@ export const quotaRoutes: Routes = [
         (m) => m.AssignmentDetailComponent
       ),
   },
-  {
-    path: 'overrides',
-    loadComponent: () =>
-      import('./pages/override-list/override-list.component').then(
-        (m) => m.OverrideListComponent
-      ),
-  },
   {
     path: 'overrides/:overrideId',
     loadComponent: () =>
@@ -48,18 +69,4 @@ export const quotaRoutes: Routes = [
         (m) => m.OverrideDetailComponent
       ),
   },
-  {
-    path: 'inspector',
-    loadComponent: () =>
-      import('./pages/quota-inspector/quota-inspector.component').then(
-        (m) => m.QuotaInspectorComponent
-      ),
-  },
-  {
-    path: 'events',
-    loadComponent: () =>
-      import('./pages/event-viewer/event-viewer.component').then(
-        (m) => m.EventViewerComponent
-      ),
-  },
 ];
diff --git a/frontend/ai.client/src/app/admin/quota-tiers/quota.layout.ts b/frontend/ai.client/src/app/admin/quota-tiers/quota.layout.ts
new file mode 100644
index 00000000..ddef7f7d
--- /dev/null
+++ b/frontend/ai.client/src/app/admin/quota-tiers/quota.layout.ts
@@ -0,0 +1,49 @@
+import { Component, ChangeDetectionStrategy } from '@angular/core';
+import { RouterLink, RouterLinkActive, RouterOutlet } from '@angular/router';
+
+interface QuotaTab {
+  label: string;
+  route: string;
+}
+
+@Component({
+  selector: 'app-quota-layout',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [RouterLink, RouterLinkActive, RouterOutlet],
+  host: { class: 'block' },
+  template: `
+    <div class="mb-6">
+      <h1 class="text-2xl/8 font-bold text-gray-900 dark:text-white">Quotas</h1>
+      <p class="mt-1 text-sm/6 text-gray-600 dark:text-gray-400">
+        Tiers, assignments, overrides, and runtime visibility for usage limits.
+      </p>
+    </div>
+
+    <div class="mb-6 border-b border-gray-200 dark:border-white/10">
+      <nav class="-mb-px flex flex-wrap gap-x-6" aria-label="Quota sections">
+        @for (tab of tabs; track tab.route) {
+          <a
+            [routerLink]="tab.route"
+            routerLinkActive="border-blue-500 text-blue-600 dark:border-blue-400 dark:text-blue-400"
+            #rla="routerLinkActive"
+            [attr.aria-current]="rla.isActive ? 'page' : null"
+            class="whitespace-nowrap border-b-2 border-transparent px-1 py-3 text-sm/6 font-medium text-gray-500 hover:border-gray-300 hover:text-gray-700 dark:text-gray-400 dark:hover:border-white/20 dark:hover:text-gray-200"
+          >
+            {{ tab.label }}
+          </a>
+        }
+      </nav>
+    </div>
+
+    <router-outlet />
+  `,
+})
+export class QuotaLayout {
+  readonly tabs: QuotaTab[] = [
+    { label: 'Tiers', route: 'tiers' },
+    { label: 'Assignments', route: 'assignments' },
+    { label: 'Overrides', route: 'overrides' },
+    { label: 'Inspector', route: 'inspector' },
+    { label: 'Events', route: 'events' },
+  ];
+}
diff --git a/frontend/ai.client/src/app/admin/roles/pages/role-form.page.ts b/frontend/ai.client/src/app/admin/roles/pages/role-form.page.ts
index 0389f55f..32a1514b 100644
--- a/frontend/ai.client/src/app/admin/roles/pages/role-form.page.ts
+++ b/frontend/ai.client/src/app/admin/roles/pages/role-form.page.ts
@@ -47,8 +47,7 @@ interface RoleFormGroup {
     class: 'block',
   },
   template: `
-    <div class="min-h-dvh">
-      <div class="mx-auto max-w-4xl px-4 py-8 sm:px-6 lg:px-8">
+    <div class="max-w-4xl">
         <!-- Back Button -->
         <button
           (click)="goBack()"
@@ -73,7 +72,7 @@ interface RoleFormGroup {
           <div class="flex items-center justify-center h-64">
             <div class="flex flex-col items-center gap-4">
               <div
-                class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600"
+                class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600 dark:border-t-blue-400"
               ></div>
               <p class="text-sm text-gray-500 dark:text-gray-400">
                 Loading role...
@@ -396,7 +395,6 @@ interface RoleFormGroup {
             </div>
           </form>
         }
-      </div>
     </div>
   `,
 })
diff --git a/frontend/ai.client/src/app/admin/roles/pages/role-list.page.ts b/frontend/ai.client/src/app/admin/roles/pages/role-list.page.ts
index b641a875..b0e57d0d 100644
--- a/frontend/ai.client/src/app/admin/roles/pages/role-list.page.ts
+++ b/frontend/ai.client/src/app/admin/roles/pages/role-list.page.ts
@@ -44,14 +44,6 @@ import { ToolsService } from '../../tools/services/tools.service';
     class: 'block p-6',
   },
   template: `
-    <!-- Back Button -->
-    <a
-      routerLink="/admin"
-      class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
-    >
-      <ng-icon name="heroArrowLeft" class="size-4" />
-      Back to Admin
-    </a>
 
     <div class="mb-6 flex items-center justify-between">
       <div>
@@ -117,7 +109,7 @@ import { ToolsService } from '../../tools/services/tools.service';
       <div class="flex items-center justify-center h-64">
         <div class="flex flex-col items-center gap-4">
           <div
-            class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600"
+            class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600 dark:border-t-blue-400"
           ></div>
           <p class="text-sm text-gray-500 dark:text-gray-400">
             Loading roles...
diff --git a/frontend/ai.client/src/app/admin/tools/pages/tool-form.page.ts b/frontend/ai.client/src/app/admin/tools/pages/tool-form.page.ts
index 42484e75..ac757988 100644
--- a/frontend/ai.client/src/app/admin/tools/pages/tool-form.page.ts
+++ b/frontend/ai.client/src/app/admin/tools/pages/tool-form.page.ts
@@ -12,7 +12,6 @@ import { FormArray, FormBuilder, FormGroup, Validators, ReactiveFormsModule } fr
 import { NgIcon, provideIcons } from '@ng-icons/core';
 import {
   heroArrowLeft,
-  heroCheck,
   heroServer,
   heroUserGroup,
   heroLink,
@@ -39,586 +38,618 @@ import {
   selector: 'app-tool-form',
   changeDetection: ChangeDetectionStrategy.OnPush,
   imports: [RouterLink, ReactiveFormsModule, NgIcon],
-  providers: [provideIcons({ heroArrowLeft, heroCheck, heroServer, heroUserGroup, heroLink, heroShieldCheck, heroPlus, heroTrash })],
-  host: {
-    class: 'block p-6',
-  },
+  providers: [provideIcons({ heroArrowLeft, heroServer, heroUserGroup, heroLink, heroShieldCheck, heroPlus, heroTrash })],
   template: `
-    <div class="max-w-2xl">
-      <!-- Header -->
-      <div class="mb-6">
+    <div class="min-h-dvh">
+      <div class="mx-auto max-w-3xl px-4 py-8 sm:px-6 lg:px-8">
+        <!-- Back link -->
         <a
           routerLink="/admin/tools"
-          class="inline-flex items-center gap-1 text-sm text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-gray-200 mb-4"
+          class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
         >
-          <ng-icon name="heroArrowLeft" class="size-4" />
+          <ng-icon name="heroArrowLeft" class="size-4" aria-hidden="true" />
           Back to Tools
         </a>
-        <h1 class="text-3xl/9 font-bold">
-          {{ isEditMode() ? 'Edit Tool' : 'Create Tool' }}
-        </h1>
-        <p class="text-gray-600 dark:text-gray-400">
-          {{ isEditMode() ? 'Update tool metadata and settings.' : 'Add a new tool to the catalog.' }}
-        </p>
-      </div>
 
-      <!-- Loading State -->
-      @if (loading()) {
-        <div class="flex items-center justify-center h-64">
-          <div class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600"></div>
+        <!-- Page Header -->
+        <div class="mb-8">
+          <h1 class="text-2xl/8 font-bold text-gray-900 dark:text-white">
+            {{ isEditMode() ? 'Edit Tool' : 'Create Tool' }}
+          </h1>
+          <p class="mt-1 text-sm/6 text-gray-600 dark:text-gray-400">
+            {{ isEditMode() ? 'Update tool metadata and settings.' : 'Add a new tool to the catalog.' }}
+          </p>
         </div>
-      } @else {
-        <!-- Form -->
-        <form [formGroup]="form" (ngSubmit)="onSubmit()" class="space-y-6">
-          <!-- Tool ID (only for create) -->
-          @if (!isEditMode()) {
-            <div>
-              <label for="toolId" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                Tool ID
-              </label>
-              <input
-                id="toolId"
-                type="text"
-                formControlName="toolId"
-                class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-                placeholder="e.g., my_custom_tool"
-              />
-              @if (form.get('toolId')?.invalid && form.get('toolId')?.touched) {
-                <p class="mt-1 text-sm text-red-600 dark:text-red-400">
-                  Tool ID must be 3-50 characters, lowercase letters, numbers, and underscores only.
-                </p>
-              }
-            </div>
-          }
-
-          <!-- Display Name -->
-          <div>
-            <label for="displayName" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-              Display Name
-            </label>
-            <input
-              id="displayName"
-              type="text"
-              formControlName="displayName"
-              class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-              placeholder="e.g., My Custom Tool"
-            />
-            @if (form.get('displayName')?.invalid && form.get('displayName')?.touched) {
-              <p class="mt-1 text-sm text-red-600 dark:text-red-400">
-                Display name is required (1-100 characters).
-              </p>
-            }
-          </div>
 
-          <!-- Description -->
-          <div>
-            <label for="description" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-              Description
-            </label>
-            <textarea
-              id="description"
-              formControlName="description"
-              rows="3"
-              class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-              placeholder="Describe what this tool does..."
-            ></textarea>
-            @if (form.get('description')?.invalid && form.get('description')?.touched) {
-              <p class="mt-1 text-sm text-red-600 dark:text-red-400">
-                Description is required (max 500 characters).
-              </p>
-            }
+        <!-- Loading State -->
+        @if (loading()) {
+          <div class="flex h-64 items-center justify-center">
+            <div class="size-10 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600 dark:border-gray-700 dark:border-t-blue-500"></div>
           </div>
-
-          <!-- Category and Protocol Row -->
-          <div class="grid grid-cols-2 gap-4">
-            <div>
-              <label for="category" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                Category
-              </label>
-              <select
-                id="category"
-                formControlName="category"
-                class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-              >
-                @for (cat of categories; track cat.value) {
-                  <option [value]="cat.value">{{ cat.label }}</option>
-                }
-              </select>
-            </div>
-
-            <div>
-              <label for="protocol" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                Protocol
-              </label>
-              <select
-                id="protocol"
-                formControlName="protocol"
-                class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-              >
-                @for (proto of protocols; track proto.value) {
-                  <option [value]="proto.value">{{ proto.label }}</option>
-                }
-              </select>
-              @if (selectedProtocol()) {
-                <p class="mt-1 text-xs text-gray-500 dark:text-gray-400">
-                  {{ getProtocolDescription(selectedProtocol()) }}
-                </p>
+        } @else {
+          <!-- Form -->
+          <form [formGroup]="form" (ngSubmit)="onSubmit()" class="space-y-8">
+            <!-- Basic Information -->
+            <section class="space-y-4">
+              <h2 class="text-base/7 font-semibold text-gray-900 dark:text-white">Basic information</h2>
+
+              <!-- Tool ID (only for create) -->
+              @if (!isEditMode()) {
+                <div>
+                  <label for="toolId" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                    Tool ID <span class="text-red-600">*</span>
+                  </label>
+                  <input
+                    id="toolId"
+                    type="text"
+                    formControlName="toolId"
+                    placeholder="e.g., my_custom_tool"
+                    class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+                    [class.border-red-500]="form.get('toolId')?.invalid && form.get('toolId')?.touched"
+                  />
+                  @if (form.get('toolId')?.invalid && form.get('toolId')?.touched) {
+                    <p class="mt-1 text-sm/6 text-red-600 dark:text-red-400">
+                      Tool ID must be 3-50 characters, lowercase letters, numbers, and underscores only.
+                    </p>
+                  }
+                </div>
               }
-            </div>
-          </div>
 
-          <!-- MCP External Server Configuration -->
-          @if (selectedProtocol() === 'mcp_external') {
-            <div class="border border-blue-200 dark:border-blue-800 rounded-lg p-4 bg-blue-50/50 dark:bg-blue-900/20">
-              <div class="flex items-center gap-2 mb-4">
-                <ng-icon name="heroServer" class="size-5 text-blue-600 dark:text-blue-400" />
-                <h3 class="text-lg font-semibold text-blue-900 dark:text-blue-100">MCP Server Configuration</h3>
-              </div>
-
-              <!-- Server URL -->
-              <div class="mb-4">
-                <label for="mcpServerUrl" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                  Server URL <span class="text-red-500">*</span>
+              <!-- Display Name -->
+              <div>
+                <label for="displayName" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                  Display Name <span class="text-red-600">*</span>
                 </label>
                 <input
-                  id="mcpServerUrl"
-                  type="url"
-                  formControlName="mcpServerUrl"
-                  class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-                  placeholder="https://xxx.lambda-url.us-west-2.on.aws/"
+                  id="displayName"
+                  type="text"
+                  formControlName="displayName"
+                  placeholder="e.g., My Custom Tool"
+                  class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+                  [class.border-red-500]="form.get('displayName')?.invalid && form.get('displayName')?.touched"
                 />
-                <p class="mt-1 text-xs text-gray-500 dark:text-gray-400">
-                  Lambda Function URL or API Gateway endpoint
-                </p>
+                @if (form.get('displayName')?.invalid && form.get('displayName')?.touched) {
+                  <p class="mt-1 text-sm/6 text-red-600 dark:text-red-400">
+                    Display name is required (1-100 characters).
+                  </p>
+                }
+              </div>
+
+              <!-- Description -->
+              <div>
+                <label for="description" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                  Description <span class="text-red-600">*</span>
+                </label>
+                <textarea
+                  id="description"
+                  formControlName="description"
+                  rows="3"
+                  placeholder="Describe what this tool does..."
+                  class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+                  [class.border-red-500]="form.get('description')?.invalid && form.get('description')?.touched"
+                ></textarea>
+                @if (form.get('description')?.invalid && form.get('description')?.touched) {
+                  <p class="mt-1 text-sm/6 text-red-600 dark:text-red-400">
+                    Description is required (max 500 characters).
+                  </p>
+                }
               </div>
 
-              <!-- Transport and Auth Row -->
-              <div class="grid grid-cols-2 gap-4 mb-4">
+              <!-- Category and Protocol Row -->
+              <div class="grid grid-cols-1 gap-4 sm:grid-cols-2">
                 <div>
-                  <label for="mcpTransport" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                    Transport
+                  <label for="category" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                    Category
                   </label>
                   <select
-                    id="mcpTransport"
-                    formControlName="mcpTransport"
-                    class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
+                    id="category"
+                    formControlName="category"
+                    class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
                   >
-                    @for (transport of mcpTransports; track transport.value) {
-                      <option [value]="transport.value">{{ transport.label }}</option>
+                    @for (cat of categories; track cat.value) {
+                      <option [value]="cat.value">{{ cat.label }}</option>
                     }
                   </select>
                 </div>
 
                 <div>
-                  <label for="mcpAuthType" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                    Authentication
+                  <label for="protocol" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                    Protocol
                   </label>
                   <select
-                    id="mcpAuthType"
-                    formControlName="mcpAuthType"
-                    class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
+                    id="protocol"
+                    formControlName="protocol"
+                    class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
                   >
-                    @for (auth of mcpAuthTypes; track auth.value) {
-                      <option [value]="auth.value">{{ auth.label }}</option>
+                    @for (proto of protocols; track proto.value) {
+                      <option [value]="proto.value">{{ proto.label }}</option>
                     }
                   </select>
+                  @if (selectedProtocol()) {
+                    <p class="mt-1 text-xs/5 text-gray-500 dark:text-gray-400">
+                      {{ getProtocolDescription(selectedProtocol()) }}
+                    </p>
+                  }
                 </div>
               </div>
+            </section>
+
+            <!-- MCP External Server Configuration -->
+            @if (selectedProtocol() === 'mcp_external') {
+              <section class="space-y-4 border-t border-gray-200 pt-8 dark:border-gray-700">
+                <div class="flex items-center gap-2">
+                  <ng-icon name="heroServer" class="size-5 text-blue-600 dark:text-blue-400" aria-hidden="true" />
+                  <h2 class="text-base/7 font-semibold text-gray-900 dark:text-white">MCP server configuration</h2>
+                </div>
 
-              <!-- AWS Region (shown for aws-iam auth) -->
-              @if (form.get('mcpAuthType')?.value === 'aws-iam') {
-                <div class="mb-4">
-                  <label for="mcpAwsRegion" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                    AWS Region
+                <!-- Server URL -->
+                <div>
+                  <label for="mcpServerUrl" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                    Server URL <span class="text-red-600">*</span>
                   </label>
                   <input
-                    id="mcpAwsRegion"
-                    type="text"
-                    formControlName="mcpAwsRegion"
-                    class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-                    placeholder="us-west-2 (auto-detected from URL if blank)"
+                    id="mcpServerUrl"
+                    type="url"
+                    formControlName="mcpServerUrl"
+                    placeholder="https://xxx.lambda-url.us-west-2.on.aws/"
+                    class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
                   />
+                  <p class="mt-1 text-xs/5 text-gray-500 dark:text-gray-400">
+                    Lambda Function URL or API Gateway endpoint
+                  </p>
                 </div>
-              }
 
-              <!-- API Key Header (shown for api-key auth) -->
-              @if (form.get('mcpAuthType')?.value === 'api-key') {
-                <div class="grid grid-cols-2 gap-4 mb-4">
+                <!-- Transport and Auth Row -->
+                <div class="grid grid-cols-1 gap-4 sm:grid-cols-2">
                   <div>
-                    <label for="mcpApiKeyHeader" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                      API Key Header
+                    <label for="mcpTransport" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                      Transport
                     </label>
-                    <input
-                      id="mcpApiKeyHeader"
-                      type="text"
-                      formControlName="mcpApiKeyHeader"
-                      class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-                      placeholder="x-api-key"
-                    />
+                    <select
+                      id="mcpTransport"
+                      formControlName="mcpTransport"
+                      class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+                    >
+                      @for (transport of mcpTransports; track transport.value) {
+                        <option [value]="transport.value">{{ transport.label }}</option>
+                      }
+                    </select>
                   </div>
+
                   <div>
-                    <label for="mcpSecretArn" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                      Secret ARN
+                    <label for="mcpAuthType" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                      Authentication
                     </label>
-                    <input
-                      id="mcpSecretArn"
-                      type="text"
-                      formControlName="mcpSecretArn"
-                      class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-                      placeholder="arn:aws:secretsmanager:..."
-                    />
+                    <select
+                      id="mcpAuthType"
+                      formControlName="mcpAuthType"
+                      class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+                    >
+                      @for (auth of mcpAuthTypes; track auth.value) {
+                        <option [value]="auth.value">{{ auth.label }}</option>
+                      }
+                    </select>
                   </div>
                 </div>
-              }
 
-              <!-- MCP Tools -->
-              <div class="mb-4" formArrayName="mcpTools">
-                <div class="flex items-center justify-between mb-2">
-                  <label class="block text-sm font-medium text-gray-700 dark:text-gray-300">
-                    Available Tools
-                  </label>
-                  <div class="flex items-center gap-2">
-                    <button
-                      type="button"
-                      (click)="discoverMcpTools()"
-                      [disabled]="discovering() || !form.get('mcpServerUrl')?.value"
-                      class="inline-flex items-center gap-1 px-2 py-1 text-sm text-blue-700 hover:text-blue-900 dark:text-blue-300 dark:hover:text-blue-100 disabled:opacity-50 disabled:cursor-not-allowed"
-                    >
-                      {{ discovering() ? 'Discovering…' : 'Discover from server' }}
-                    </button>
-                    <button
-                      type="button"
-                      (click)="addMcpTool()"
-                      class="inline-flex items-center gap-1 px-2 py-1 text-sm text-blue-700 hover:text-blue-900 dark:text-blue-300 dark:hover:text-blue-100"
-                    >
-                      <ng-icon name="heroPlus" class="size-4" />
-                      Add Tool
-                    </button>
+                <!-- AWS Region (shown for aws-iam auth) -->
+                @if (form.get('mcpAuthType')?.value === 'aws-iam') {
+                  <div>
+                    <label for="mcpAwsRegion" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                      AWS Region
+                    </label>
+                    <input
+                      id="mcpAwsRegion"
+                      type="text"
+                      formControlName="mcpAwsRegion"
+                      placeholder="us-west-2 (auto-detected from URL if blank)"
+                      class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+                    />
                   </div>
-                </div>
-                @if (discoverError()) {
-                  <p class="mb-2 text-sm text-red-600 dark:text-red-400">
-                    {{ discoverError() }}
-                  </p>
                 }
 
-                @if (mcpToolsArray.length === 0) {
-                  <p class="text-xs text-gray-500 dark:text-gray-400 italic">
-                    No tools listed. Leave empty to discover tools at runtime — per-tool approval flags will not apply.
-                  </p>
-                } @else {
-                  <div class="space-y-2">
-                    @for (row of mcpToolsArray.controls; track $index) {
-                      <div [formGroupName]="$index" class="flex items-start gap-2 p-2 bg-white dark:bg-gray-900 border border-gray-200 dark:border-gray-700 rounded-sm">
-                        <div class="flex-1">
-                          <input
-                            type="text"
-                            formControlName="name"
-                            class="w-full px-2 py-1 text-sm font-mono border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-                            placeholder="tool_name"
-                            [attr.aria-label]="'Tool name ' + ($index + 1)"
-                          />
-                        </div>
-                        <label class="flex items-center gap-1.5 text-xs text-gray-700 dark:text-gray-300 whitespace-nowrap pt-1.5">
-                          <input
-                            type="checkbox"
-                            formControlName="needsApproval"
-                            class="size-4 rounded border-gray-300 text-amber-600 focus:ring-amber-500"
-                          />
-                          <span>Needs approval</span>
-                        </label>
-                        <button
-                          type="button"
-                          (click)="removeMcpTool($index)"
-                          class="p-1 text-gray-500 hover:text-red-600 dark:text-gray-400 dark:hover:text-red-400"
-                          [attr.aria-label]="'Remove tool ' + ($index + 1)"
-                        >
-                          <ng-icon name="heroTrash" class="size-4" />
-                        </button>
-                      </div>
-                    }
+                <!-- API Key Header (shown for api-key auth) -->
+                @if (form.get('mcpAuthType')?.value === 'api-key') {
+                  <div class="grid grid-cols-1 gap-4 sm:grid-cols-2">
+                    <div>
+                      <label for="mcpApiKeyHeader" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                        API Key Header
+                      </label>
+                      <input
+                        id="mcpApiKeyHeader"
+                        type="text"
+                        formControlName="mcpApiKeyHeader"
+                        placeholder="x-api-key"
+                        class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+                      />
+                    </div>
+                    <div>
+                      <label for="mcpSecretArn" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                        Secret ARN
+                      </label>
+                      <input
+                        id="mcpSecretArn"
+                        type="text"
+                        formControlName="mcpSecretArn"
+                        placeholder="arn:aws:secretsmanager:..."
+                        class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+                      />
+                    </div>
                   </div>
                 }
-                <p class="mt-2 text-xs text-gray-500 dark:text-gray-400">
-                  Tools flagged "Needs approval" will pause the agent for user confirmation before invocation.
-                </p>
-              </div>
 
-              <!-- Health Check -->
-              <label class="flex items-center gap-2 mb-4">
-                <input
-                  type="checkbox"
-                  formControlName="mcpHealthCheckEnabled"
-                  class="size-4 rounded border-gray-300 text-blue-600 focus:ring-blue-500"
-                />
-                <span class="text-sm font-medium text-gray-700 dark:text-gray-300">
-                  Enable Health Checks
-                </span>
-              </label>
-            </div>
-
-            <!-- OIDC Token Forwarding -->
-            <div class="border border-amber-200 dark:border-amber-800 rounded-lg p-4 bg-amber-50/50 dark:bg-amber-900/20">
-              <div class="flex items-center gap-2 mb-3">
-                <ng-icon name="heroShieldCheck" class="size-5 text-amber-600 dark:text-amber-400" />
-                <h3 class="text-lg font-semibold text-amber-900 dark:text-amber-100">Forward App Authentication Token</h3>
-              </div>
-
-              <label class="flex items-start gap-2">
-                <input
-                  type="checkbox"
-                  formControlName="forwardAuthToken"
-                  class="size-4 mt-0.5 rounded border-gray-300 text-amber-600 focus:ring-amber-500"
-                />
-                <div class="flex-1">
-                  <span class="text-sm font-medium text-gray-700 dark:text-gray-300">
-                    Forward user's OIDC token to MCP server
-                  </span>
-                  <p class="text-sm text-gray-600 dark:text-gray-400 mt-1">
-                    The user's authentication token from app login will be sent in the Authorization header.
-                    The MCP server validates the JWT and extracts user identity from claims.
-                  </p>
-                </div>
-              </label>
+                <!-- MCP Tools -->
+                <div formArrayName="mcpTools">
+                  <div class="mb-2 flex items-center justify-between">
+                    <span class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                      Available Tools
+                    </span>
+                    <div class="flex items-center gap-1">
+                      <button
+                        type="button"
+                        (click)="discoverMcpTools()"
+                        [disabled]="discovering() || !form.get('mcpServerUrl')?.value"
+                        class="inline-flex items-center gap-1 rounded-2xl px-2.5 py-1 text-sm/6 font-medium text-blue-600 hover:bg-blue-50 hover:text-blue-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 disabled:cursor-not-allowed disabled:opacity-50 dark:text-blue-400 dark:hover:bg-blue-900/20"
+                      >
+                        {{ discovering() ? 'Discovering…' : 'Discover from server' }}
+                      </button>
+                      <button
+                        type="button"
+                        (click)="addMcpTool()"
+                        class="inline-flex items-center gap-1 rounded-2xl px-2.5 py-1 text-sm/6 font-medium text-blue-600 hover:bg-blue-50 hover:text-blue-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:text-blue-400 dark:hover:bg-blue-900/20"
+                      >
+                        <ng-icon name="heroPlus" class="size-4" aria-hidden="true" />
+                        Add Tool
+                      </button>
+                    </div>
+                  </div>
+                  @if (discoverError()) {
+                    <p class="mb-2 text-sm/6 text-red-600 dark:text-red-400">
+                      {{ discoverError() }}
+                    </p>
+                  }
 
-              @if (form.get('forwardAuthToken')?.value) {
-                <div class="mt-3 p-3 bg-amber-100 dark:bg-amber-900/30 border border-amber-300 dark:border-amber-700 rounded-sm">
-                  <p class="text-sm font-medium text-amber-900 dark:text-amber-100 mb-1">
-                    Security Notice
-                  </p>
-                  <p class="text-sm text-amber-800 dark:text-amber-200">
-                    Only enable this for MCP servers you control. The user's authentication token will be sent
-                    in the Authorization header. The MCP server should validate the JWT signature and extract
-                    user identity from the token claims. Set the MCP Authentication Type to "None" above.
+                  @if (mcpToolsArray.length === 0) {
+                    <p class="text-xs/5 italic text-gray-500 dark:text-gray-400">
+                      No tools listed. Leave empty to discover tools at runtime — per-tool approval flags will not apply.
+                    </p>
+                  } @else {
+                    <div class="space-y-2">
+                      @for (row of mcpToolsArray.controls; track $index) {
+                        <div [formGroupName]="$index" class="flex items-start gap-2 rounded-2xl border border-gray-200 bg-white p-2 dark:border-gray-700 dark:bg-gray-800">
+                          <div class="flex-1">
+                            <input
+                              type="text"
+                              formControlName="name"
+                              placeholder="tool_name"
+                              [attr.aria-label]="'Tool name ' + ($index + 1)"
+                              class="block w-full rounded-2xl border border-gray-300 bg-white px-3 py-1.5 font-mono text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-900 dark:text-white"
+                            />
+                          </div>
+                          <label class="flex items-center gap-1.5 whitespace-nowrap pt-1.5 text-xs/5 text-gray-700 dark:text-gray-300">
+                            <input
+                              type="checkbox"
+                              formControlName="needsApproval"
+                              class="size-4 rounded border-gray-300 text-amber-600 focus:ring-2 focus:ring-amber-500 dark:border-gray-600 dark:bg-gray-800"
+                            />
+                            <span>Needs approval</span>
+                          </label>
+                          <button
+                            type="button"
+                            (click)="removeMcpTool($index)"
+                            [attr.aria-label]="'Remove tool ' + ($index + 1)"
+                            class="flex size-8 shrink-0 items-center justify-center rounded-2xl text-gray-400 hover:bg-red-50 hover:text-red-600 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-red-500 dark:text-gray-500 dark:hover:bg-red-900/20 dark:hover:text-red-400"
+                          >
+                            <ng-icon name="heroTrash" class="size-4" aria-hidden="true" />
+                          </button>
+                        </div>
+                      }
+                    </div>
+                  }
+                  <p class="mt-2 text-xs/5 text-gray-500 dark:text-gray-400">
+                    Tools flagged "Needs approval" will pause the agent for user confirmation before invocation.
                   </p>
                 </div>
-              }
-            </div>
 
-            <!-- OAuth Provider Requirement -->
-            <div class="border border-emerald-200 dark:border-emerald-800 rounded-lg p-4 bg-emerald-50/50 dark:bg-emerald-900/20">
-              <div class="flex items-center gap-2 mb-4">
-                <ng-icon name="heroLink" class="size-5 text-emerald-600 dark:text-emerald-400" />
-                <h3 class="text-lg font-semibold text-emerald-900 dark:text-emerald-100">User OAuth Connector</h3>
-              </div>
-              <p class="text-sm text-gray-600 dark:text-gray-400 mb-4">
-                If this tool requires access to a user's external account (e.g., Google Workspace, Microsoft 365),
-                select the OAuth provider. The user's access token will be passed to the MCP server.
-              </p>
-              <div>
-                <label for="requiresOauthProvider" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                  Required OAuth Provider
+                <!-- Health Check -->
+                <label class="flex items-center gap-3">
+                  <input
+                    type="checkbox"
+                    formControlName="mcpHealthCheckEnabled"
+                    class="size-4 rounded border-gray-300 text-blue-600 focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800"
+                  />
+                  <span class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                    Enable health checks
+                  </span>
                 </label>
-                <select
-                  id="requiresOauthProvider"
-                  formControlName="requiresOauthProvider"
-                  class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-emerald-500 focus:border-emerald-500 dark:bg-gray-800 dark:border-gray-600"
-                >
-                  <option [value]="''">None - No user OAuth required</option>
-                  @for (provider of oauthProviders(); track provider.providerId) {
-                    <option [value]="provider.providerId">{{ provider.displayName }}</option>
-                  }
-                </select>
-                <p class="mt-1 text-xs text-gray-500 dark:text-gray-400">
-                  Users must connect this connector before using the tool. Manage connectors in
-                  <a routerLink="/admin/connectors" class="text-emerald-600 hover:underline">Connectors</a>.
-                </p>
-              </div>
-            </div>
-          }
+              </section>
 
-          <!-- A2A Agent Configuration -->
-          @if (selectedProtocol() === 'a2a') {
-            <div class="border border-purple-200 dark:border-purple-800 rounded-lg p-4 bg-purple-50/50 dark:bg-purple-900/20">
-              <div class="flex items-center gap-2 mb-4">
-                <ng-icon name="heroUserGroup" class="size-5 text-purple-600 dark:text-purple-400" />
-                <h3 class="text-lg font-semibold text-purple-900 dark:text-purple-100">Agent-to-Agent Configuration</h3>
-              </div>
-
-              <!-- Agent URL -->
-              <div class="mb-4">
-                <label for="a2aAgentUrl" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                  Agent URL <span class="text-red-500">*</span>
-                </label>
-                <input
-                  id="a2aAgentUrl"
-                  type="url"
-                  formControlName="a2aAgentUrl"
-                  class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-                  placeholder="https://agent-endpoint.example.com/"
-                />
-              </div>
+              <!-- Forward App Authentication Token -->
+              <section class="space-y-3 border-t border-gray-200 pt-8 dark:border-gray-700">
+                <div class="flex items-center gap-2">
+                  <ng-icon name="heroShieldCheck" class="size-5 text-amber-600 dark:text-amber-400" aria-hidden="true" />
+                  <h2 class="text-base/7 font-semibold text-gray-900 dark:text-white">Forward app authentication token</h2>
+                </div>
 
-              <!-- Agent ID and Auth Row -->
-              <div class="grid grid-cols-2 gap-4 mb-4">
-                <div>
-                  <label for="a2aAgentId" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                    Agent ID
-                  </label>
+                <label class="flex items-start gap-3">
                   <input
-                    id="a2aAgentId"
-                    type="text"
-                    formControlName="a2aAgentId"
-                    class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-                    placeholder="AgentCore Runtime ID (optional)"
+                    type="checkbox"
+                    formControlName="forwardAuthToken"
+                    class="mt-0.5 size-4 rounded border-gray-300 text-amber-600 focus:ring-2 focus:ring-amber-500 dark:border-gray-600 dark:bg-gray-800"
                   />
-                </div>
+                  <span class="flex-1">
+                    <span class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                      Forward user's OIDC token to MCP server
+                    </span>
+                    <span class="mt-1 block text-sm/6 text-gray-600 dark:text-gray-400">
+                      The user's authentication token from app login will be sent in the Authorization header.
+                      The MCP server validates the JWT and extracts user identity from claims.
+                    </span>
+                  </span>
+                </label>
 
+                @if (form.get('forwardAuthToken')?.value) {
+                  <div class="rounded-2xl border border-amber-300 bg-amber-50 p-4 dark:border-amber-700 dark:bg-amber-900/30">
+                    <p class="mb-1 text-sm/6 font-medium text-amber-900 dark:text-amber-100">
+                      Security notice
+                    </p>
+                    <p class="text-sm/6 text-amber-800 dark:text-amber-200">
+                      Only enable this for MCP servers you control. The user's authentication token will be sent
+                      in the Authorization header. The MCP server should validate the JWT signature and extract
+                      user identity from the token claims. Set the MCP Authentication Type to "None" above.
+                    </p>
+                  </div>
+                }
+              </section>
+
+              <!-- User OAuth Connector -->
+              <section class="space-y-3 border-t border-gray-200 pt-8 dark:border-gray-700">
+                <div class="flex items-center gap-2">
+                  <ng-icon name="heroLink" class="size-5 text-emerald-600 dark:text-emerald-400" aria-hidden="true" />
+                  <h2 class="text-base/7 font-semibold text-gray-900 dark:text-white">User OAuth connector</h2>
+                </div>
+                <p class="text-sm/6 text-gray-600 dark:text-gray-400">
+                  If this tool requires access to a user's external account (e.g., Google Workspace, Microsoft 365),
+                  select the OAuth provider. The user's access token will be passed to the MCP server.
+                </p>
                 <div>
-                  <label for="a2aAuthType" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                    Authentication
+                  <label for="requiresOauthProvider" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                    Required OAuth provider
                   </label>
                   <select
-                    id="a2aAuthType"
-                    formControlName="a2aAuthType"
-                    class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
+                    id="requiresOauthProvider"
+                    formControlName="requiresOauthProvider"
+                    class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
                   >
-                    @for (auth of a2aAuthTypes; track auth.value) {
-                      <option [value]="auth.value">{{ auth.label }}</option>
+                    <option [value]="''">None - No user OAuth required</option>
+                    @for (provider of oauthProviders(); track provider.providerId) {
+                      <option [value]="provider.providerId">{{ provider.displayName }}</option>
                     }
                   </select>
+                  <p class="mt-1 text-xs/5 text-gray-500 dark:text-gray-400">
+                    Users must connect this connector before using the tool. Manage connectors in
+                    <a routerLink="/admin/connectors" class="font-medium text-blue-600 hover:text-blue-700 dark:text-blue-400 dark:hover:text-blue-300">Connectors</a>.
+                  </p>
+                </div>
+              </section>
+            }
+
+            <!-- A2A Agent Configuration -->
+            @if (selectedProtocol() === 'a2a') {
+              <section class="space-y-4 border-t border-gray-200 pt-8 dark:border-gray-700">
+                <div class="flex items-center gap-2">
+                  <ng-icon name="heroUserGroup" class="size-5 text-purple-600 dark:text-purple-400" aria-hidden="true" />
+                  <h2 class="text-base/7 font-semibold text-gray-900 dark:text-white">Agent-to-agent configuration</h2>
                 </div>
-              </div>
 
-              <!-- AWS Region (shown for aws-iam or agentcore auth) -->
-              @if (form.get('a2aAuthType')?.value === 'aws-iam' || form.get('a2aAuthType')?.value === 'agentcore') {
-                <div class="mb-4">
-                  <label for="a2aAwsRegion" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                    AWS Region
+                <!-- Agent URL -->
+                <div>
+                  <label for="a2aAgentUrl" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                    Agent URL <span class="text-red-600">*</span>
                   </label>
                   <input
-                    id="a2aAwsRegion"
-                    type="text"
-                    formControlName="a2aAwsRegion"
-                    class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-                    placeholder="us-west-2"
+                    id="a2aAgentUrl"
+                    type="url"
+                    formControlName="a2aAgentUrl"
+                    placeholder="https://agent-endpoint.example.com/"
+                    class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
                   />
                 </div>
-              }
 
-              <!-- Capabilities -->
-              <div class="mb-4">
-                <label for="a2aCapabilities" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                  Capabilities
-                </label>
-                <textarea
-                  id="a2aCapabilities"
-                  formControlName="a2aCapabilities"
-                  rows="3"
-                  class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600 font-mono text-sm"
-                  placeholder="report_generation&#10;data_analysis&#10;document_creation"
-                ></textarea>
-                <p class="mt-1 text-xs text-gray-500 dark:text-gray-400">
-                  One capability per line
-                </p>
-              </div>
+                <!-- Agent ID and Auth Row -->
+                <div class="grid grid-cols-1 gap-4 sm:grid-cols-2">
+                  <div>
+                    <label for="a2aAgentId" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                      Agent ID
+                    </label>
+                    <input
+                      id="a2aAgentId"
+                      type="text"
+                      formControlName="a2aAgentId"
+                      placeholder="AgentCore Runtime ID (optional)"
+                      class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+                    />
+                  </div>
+
+                  <div>
+                    <label for="a2aAuthType" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                      Authentication
+                    </label>
+                    <select
+                      id="a2aAuthType"
+                      formControlName="a2aAuthType"
+                      class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+                    >
+                      @for (auth of a2aAuthTypes; track auth.value) {
+                        <option [value]="auth.value">{{ auth.label }}</option>
+                      }
+                    </select>
+                  </div>
+                </div>
+
+                <!-- AWS Region (shown for aws-iam or agentcore auth) -->
+                @if (form.get('a2aAuthType')?.value === 'aws-iam' || form.get('a2aAuthType')?.value === 'agentcore') {
+                  <div>
+                    <label for="a2aAwsRegion" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                      AWS Region
+                    </label>
+                    <input
+                      id="a2aAwsRegion"
+                      type="text"
+                      formControlName="a2aAwsRegion"
+                      placeholder="us-west-2"
+                      class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+                    />
+                  </div>
+                }
 
-              <!-- Timeout and Retries -->
-              <div class="grid grid-cols-2 gap-4">
+                <!-- Capabilities -->
                 <div>
-                  <label for="a2aTimeoutSeconds" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                    Timeout (seconds)
+                  <label for="a2aCapabilities" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                    Capabilities
                   </label>
+                  <textarea
+                    id="a2aCapabilities"
+                    formControlName="a2aCapabilities"
+                    rows="3"
+                    placeholder="report_generation&#10;data_analysis&#10;document_creation"
+                    class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 font-mono text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+                  ></textarea>
+                  <p class="mt-1 text-xs/5 text-gray-500 dark:text-gray-400">
+                    One capability per line
+                  </p>
+                </div>
+
+                <!-- Timeout and Retries -->
+                <div class="grid grid-cols-1 gap-4 sm:grid-cols-2">
+                  <div>
+                    <label for="a2aTimeoutSeconds" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                      Timeout (seconds)
+                    </label>
+                    <input
+                      id="a2aTimeoutSeconds"
+                      type="number"
+                      formControlName="a2aTimeoutSeconds"
+                      min="1"
+                      max="600"
+                      class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+                    />
+                  </div>
+                  <div>
+                    <label for="a2aMaxRetries" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                      Max Retries
+                    </label>
+                    <input
+                      id="a2aMaxRetries"
+                      type="number"
+                      formControlName="a2aMaxRetries"
+                      min="0"
+                      max="10"
+                      class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+                    />
+                  </div>
+                </div>
+              </section>
+            }
+
+            <!-- Status & Visibility -->
+            <section class="space-y-6 border-t border-gray-200 pt-8 dark:border-gray-700">
+              <h2 class="text-base/7 font-semibold text-gray-900 dark:text-white">Status &amp; visibility</h2>
+
+              <div>
+                <label for="status" class="block text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                  Status
+                </label>
+                <select
+                  id="status"
+                  formControlName="status"
+                  class="mt-1 block w-full rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 sm:max-w-xs dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+                >
+                  @for (stat of statuses; track stat.value) {
+                    <option [value]="stat.value">{{ stat.label }}</option>
+                  }
+                </select>
+              </div>
+
+              <div>
+                <label class="flex items-center gap-3">
                   <input
-                    id="a2aTimeoutSeconds"
-                    type="number"
-                    formControlName="a2aTimeoutSeconds"
-                    class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-                    min="1"
-                    max="600"
+                    type="checkbox"
+                    formControlName="isPublic"
+                    class="size-4 rounded border-gray-300 text-blue-600 focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800"
                   />
-                </div>
-                <div>
-                  <label for="a2aMaxRetries" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-                    Max Retries
-                  </label>
+                  <span class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                    Public tool
+                  </span>
+                </label>
+                <p class="ml-7 mt-1 text-xs/5 text-gray-500 dark:text-gray-400">
+                  Available to all authenticated users.
+                </p>
+              </div>
+
+              <div>
+                <label class="flex items-center gap-3">
                   <input
-                    id="a2aMaxRetries"
-                    type="number"
-                    formControlName="a2aMaxRetries"
-                    class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-                    min="0"
-                    max="10"
+                    type="checkbox"
+                    formControlName="enabledByDefault"
+                    class="size-4 rounded border-gray-300 text-blue-600 focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800"
                   />
-                </div>
+                  <span class="text-sm/6 font-medium text-gray-700 dark:text-gray-300">
+                    Enabled by default
+                  </span>
+                </label>
+                <p class="ml-7 mt-1 text-xs/5 text-gray-500 dark:text-gray-400">
+                  Tool is enabled when a user first accesses it.
+                </p>
               </div>
-            </div>
-          }
+            </section>
 
-          <!-- Status -->
-          <div>
-            <label for="status" class="block text-sm font-medium text-gray-700 dark:text-gray-300 mb-1">
-              Status
-            </label>
-            <select
-              id="status"
-              formControlName="status"
-              class="w-full px-3 py-2 border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-600"
-            >
-              @for (stat of statuses; track stat.value) {
-                <option [value]="stat.value">{{ stat.label }}</option>
+            <!-- Form Actions -->
+            <div class="flex flex-col gap-4 border-t border-gray-200 pt-6 dark:border-gray-700">
+              @if (error()) {
+                <div class="rounded-2xl border border-red-200 bg-red-50 p-4 text-sm/6 text-red-800 dark:border-red-800 dark:bg-red-900/20 dark:text-red-200">
+                  {{ error() }}
+                </div>
               }
-            </select>
-          </div>
-
-          <!-- Checkboxes -->
-          <div class="space-y-3">
-            <label class="flex items-center gap-2">
-              <input
-                type="checkbox"
-                formControlName="isPublic"
-                class="size-4 rounded border-gray-300 text-blue-600 focus:ring-blue-500"
-              />
-              <span class="text-sm font-medium text-gray-700 dark:text-gray-300">
-                Public Tool
-              </span>
-              <span class="text-sm text-gray-500 dark:text-gray-400">
-                (Available to all authenticated users)
-              </span>
-            </label>
-
-            <label class="flex items-center gap-2">
-              <input
-                type="checkbox"
-                formControlName="enabledByDefault"
-                class="size-4 rounded border-gray-300 text-blue-600 focus:ring-blue-500"
-              />
-              <span class="text-sm font-medium text-gray-700 dark:text-gray-300">
-                Enabled by Default
-              </span>
-              <span class="text-sm text-gray-500 dark:text-gray-400">
-                (Tool is enabled when user first accesses it)
-              </span>
-            </label>
 
-          </div>
+              @if (form.invalid) {
+                <div class="rounded-2xl border border-amber-200 bg-amber-50 p-4 dark:border-amber-800 dark:bg-amber-900/20">
+                  <p class="text-sm/6 font-medium text-amber-800 dark:text-amber-200">
+                    Please fix the following before saving:
+                  </p>
+                  <ul class="mt-1 list-inside list-disc text-sm/6 text-amber-700 dark:text-amber-300">
+                    @if (form.get('toolId')?.invalid && !isEditMode()) {
+                      <li>Tool ID is required (3-50 characters: lowercase letters, numbers, underscores)</li>
+                    }
+                    @if (form.get('displayName')?.invalid) {
+                      <li>Display name is required (1-100 characters)</li>
+                    }
+                    @if (form.get('description')?.invalid) {
+                      <li>Description is required (max 500 characters)</li>
+                    }
+                  </ul>
+                </div>
+              }
 
-          <!-- Error Message -->
-          @if (error()) {
-            <div class="p-4 bg-red-50 border border-red-200 rounded-sm text-red-800 dark:bg-red-900/20 dark:border-red-800 dark:text-red-200">
-              {{ error() }}
+              <div class="flex gap-2">
+                <button
+                  type="submit"
+                  [disabled]="form.invalid || saving()"
+                  class="inline-flex items-center justify-center rounded-2xl bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 disabled:cursor-not-allowed disabled:opacity-50 dark:bg-blue-500 dark:hover:bg-blue-600"
+                >
+                  {{ saving() ? 'Saving…' : (isEditMode() ? 'Update Tool' : 'Create Tool') }}
+                </button>
+                <a
+                  routerLink="/admin/tools"
+                  class="inline-flex items-center justify-center rounded-2xl px-4 py-2 text-sm/6 font-medium text-gray-600 hover:bg-gray-100 hover:text-gray-900 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-gray-500 dark:text-gray-400 dark:hover:bg-gray-800 dark:hover:text-white"
+                >
+                  Cancel
+                </a>
+              </div>
             </div>
-          }
-
-          <!-- Actions -->
-          <div class="flex items-center justify-end gap-3 pt-4">
-            <a
-              routerLink="/admin/tools"
-              class="px-4 py-2 border border-gray-300 rounded-sm hover:bg-gray-100 dark:border-gray-600 dark:hover:bg-gray-700"
-            >
-              Cancel
-            </a>
-            <button
-              type="submit"
-              [disabled]="form.invalid || saving()"
-              class="flex items-center gap-2 px-4 py-2 bg-blue-600 text-white rounded-sm hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed"
-            >
-              <ng-icon name="heroCheck" class="size-5" />
-              {{ saving() ? 'Saving...' : (isEditMode() ? 'Update Tool' : 'Create Tool') }}
-            </button>
-          </div>
-        </form>
-      }
+          </form>
+        }
+      </div>
     </div>
   `,
 })
diff --git a/frontend/ai.client/src/app/admin/tools/pages/tool-list.page.ts b/frontend/ai.client/src/app/admin/tools/pages/tool-list.page.ts
index 8bb25c75..6fd43dd6 100644
--- a/frontend/ai.client/src/app/admin/tools/pages/tool-list.page.ts
+++ b/frontend/ai.client/src/app/admin/tools/pages/tool-list.page.ts
@@ -5,7 +5,7 @@ import {
   signal,
   computed,
 } from '@angular/core';
-import { Router, RouterLink } from '@angular/router';
+import { RouterLink } from '@angular/router';
 import { FormsModule } from '@angular/forms';
 import { Dialog } from '@angular/cdk/dialog';
 import { firstValueFrom } from 'rxjs';
@@ -13,277 +13,382 @@ import { NgIcon, provideIcons } from '@ng-icons/core';
 import {
   heroPlus,
   heroMagnifyingGlass,
+  heroChevronDown,
   heroPencilSquare,
   heroTrash,
   heroUserGroup,
-  heroXMark,
   heroGlobeAlt,
-  heroCheck,
-  heroXCircle,
-  heroArrowLeft,
 } from '@ng-icons/heroicons/outline';
+import { heroStarSolid } from '@ng-icons/heroicons/solid';
 import { AdminToolService } from '../services/admin-tool.service';
-import { AdminTool, TOOL_CATEGORIES, TOOL_STATUSES } from '../models/admin-tool.model';
+import {
+  AdminTool,
+  TOOL_CATEGORIES,
+  TOOL_STATUSES,
+  TOOL_PROTOCOLS,
+} from '../models/admin-tool.model';
+import { AppRolesService } from '../../roles/services/app-roles.service';
 import { ToolRoleDialogComponent, ToolRoleDialogData, ToolRoleDialogResult } from '../components/tool-role-dialog.component';
 import { DeleteToolDialogComponent, DeleteToolDialogData, DeleteToolDialogResult } from '../components/delete-tool-dialog.component';
-import { TooltipDirective } from '../../../components/tooltip';
 
 @Component({
   selector: 'app-tool-list',
   changeDetection: ChangeDetectionStrategy.OnPush,
-  imports: [RouterLink, FormsModule, NgIcon, TooltipDirective],
+  imports: [RouterLink, FormsModule, NgIcon],
   providers: [
     provideIcons({
       heroPlus,
       heroMagnifyingGlass,
+      heroChevronDown,
       heroPencilSquare,
       heroTrash,
       heroUserGroup,
-      heroXMark,
       heroGlobeAlt,
-      heroCheck,
-      heroXCircle,
-      heroArrowLeft,
+      heroStarSolid,
     }),
   ],
-  host: {
-    class: 'block p-6',
-  },
   template: `
-    <!-- Back Button -->
-    <a
-      routerLink="/admin"
-      class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
-    >
-      <ng-icon name="heroArrowLeft" class="size-4" />
-      Back to Admin
-    </a>
-
-    <div class="mb-6 flex items-center justify-between">
-      <div>
-        <h1 class="text-3xl/9 font-bold">Tool Catalog</h1>
-        <p class="text-gray-600 dark:text-gray-400">
-          Manage tool metadata and role assignments.
-        </p>
-      </div>
-      <div class="flex gap-2">
-        <a
-          routerLink="/admin/tools/new"
-          class="inline-flex items-center gap-2 rounded-sm bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus:outline-hidden focus:ring-3 focus:ring-blue-500/50 dark:bg-blue-500 dark:hover:bg-blue-600"
-        >
-          <ng-icon name="heroPlus" class="size-5" />
-          Add Tool
-        </a>
-      </div>
-    </div>
+    <div class="min-h-dvh">
+      <div class="mx-auto max-w-5xl px-4 py-8 sm:px-6 lg:px-8">
+        <!-- Page Header -->
+        <div class="mb-6 flex flex-col gap-4 sm:flex-row sm:items-end sm:justify-between">
+          <div>
+            <h1 class="text-2xl/8 font-bold text-gray-900 dark:text-white">Tool Catalog</h1>
+            <p class="mt-1 text-sm/6 text-gray-600 dark:text-gray-400">
+              Manage tool metadata and role assignments.
+            </p>
+          </div>
+          <a
+            routerLink="/admin/tools/new"
+            class="inline-flex shrink-0 items-center gap-2 rounded-2xl bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:bg-blue-500 dark:hover:bg-blue-600"
+          >
+            <ng-icon name="heroPlus" class="size-5" aria-hidden="true" />
+            Add Tool
+          </a>
+        </div>
+
+        <!-- Toolbar: search + filters inline -->
+        <div class="mb-3 flex flex-col gap-2 sm:flex-row sm:items-center">
+          <div class="relative flex-1">
+            <ng-icon
+              name="heroMagnifyingGlass"
+              class="pointer-events-none absolute left-3 top-1/2 size-4 -translate-y-1/2 text-gray-400 dark:text-gray-500"
+              aria-hidden="true"
+            />
+            <label for="search" class="sr-only">Search tools</label>
+            <input
+              type="text"
+              id="search"
+              [ngModel]="searchQuery()"
+              (ngModelChange)="searchQuery.set($event)"
+              placeholder="Search by name, ID, or description…"
+              class="block w-full rounded-2xl border border-gray-300 bg-white py-2 pl-9 pr-3 text-sm/6 text-gray-900 placeholder:text-gray-400 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:placeholder:text-gray-500"
+            />
+          </div>
 
-    <!-- Filters -->
-    <div class="mb-6 flex flex-wrap items-center gap-4">
-      <div class="relative flex-1 min-w-64">
-        <ng-icon
-          name="heroMagnifyingGlass"
-          class="absolute left-3 top-1/2 -translate-y-1/2 size-5 text-gray-400"
-        />
-        <input
-          type="text"
-          [(ngModel)]="searchQuery"
-          placeholder="Search by name or ID..."
-          class="w-full pl-10 pr-10 py-2 bg-white border border-gray-300 rounded-sm focus:ring-2 focus:ring-blue-500 focus:border-blue-500 dark:bg-gray-800 dark:border-gray-500 dark:text-white dark:placeholder-gray-400"
-        />
-        @if (searchQuery()) {
-          <button
-            (click)="searchQuery.set('')"
-            class="absolute right-3 top-1/2 -translate-y-1/2 text-gray-400 hover:text-gray-600"
+          <label for="status" class="sr-only">Filter by status</label>
+          <select
+            id="status"
+            [ngModel]="statusFilter()"
+            (ngModelChange)="statusFilter.set($event)"
+            class="rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
           >
-            <ng-icon name="heroXMark" class="size-5" />
-          </button>
-        }
-      </div>
+            <option value="">All statuses</option>
+            @for (status of statuses; track status.value) {
+              <option [value]="status.value">{{ status.label }}</option>
+            }
+          </select>
 
-      <select
-        [ngModel]="statusFilter()"
-        (ngModelChange)="statusFilter.set($event)"
-        class="px-3 py-2 bg-white border border-gray-300 rounded-sm dark:bg-gray-800 dark:border-gray-500 dark:text-white"
-      >
-        <option value="">All Statuses</option>
-        @for (status of statuses; track status.value) {
-          <option [value]="status.value">{{ status.label }}</option>
-        }
-      </select>
-
-      <select
-        [ngModel]="categoryFilter()"
-        (ngModelChange)="categoryFilter.set($event)"
-        class="px-3 py-2 bg-white border border-gray-300 rounded-sm dark:bg-gray-800 dark:border-gray-500 dark:text-white"
-      >
-        <option value="">All Categories</option>
-        @for (cat of categories; track cat.value) {
-          <option [value]="cat.value">{{ cat.label }}</option>
-        }
-      </select>
-
-      @if (hasActiveFilters()) {
-        <button
-          (click)="resetFilters()"
-          class="text-sm text-blue-600 hover:text-blue-800 dark:text-blue-400 dark:hover:text-blue-300"
-        >
-          Clear Filters
-        </button>
-      }
-    </div>
+          <label for="category" class="sr-only">Filter by category</label>
+          <select
+            id="category"
+            [ngModel]="categoryFilter()"
+            (ngModelChange)="categoryFilter.set($event)"
+            class="rounded-2xl border border-gray-300 bg-white px-3 py-2 text-sm/6 text-gray-900 focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white"
+          >
+            <option value="">All categories</option>
+            @for (cat of categories; track cat.value) {
+              <option [value]="cat.value">{{ cat.label }}</option>
+            }
+          </select>
 
-    <!-- Loading State -->
-    @if (toolsResource.isLoading() && tools().length === 0) {
-      <div class="flex items-center justify-center h-64">
-        <div class="flex flex-col items-center gap-4">
-          <div
-            class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600"
-          ></div>
-          <p class="text-sm text-gray-500 dark:text-gray-400">
-            Loading tools...
-          </p>
+          @if (hasActiveFilters()) {
+            <button
+              (click)="resetFilters()"
+              class="rounded-2xl px-3 py-2 text-sm/6 font-medium text-gray-600 hover:bg-gray-100 hover:text-gray-900 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-gray-500 dark:text-gray-400 dark:hover:bg-gray-800 dark:hover:text-white"
+            >
+              Reset
+            </button>
+          }
         </div>
-      </div>
-    }
 
-    <!-- Error State -->
-    @if (toolsResource.error()) {
-      <div class="mb-6 p-4 bg-red-50 border border-red-200 rounded-sm text-red-800 dark:bg-red-900/20 dark:border-red-800 dark:text-red-200">
-        <p>Failed to load tools. Please try again.</p>
-        <button
-          (click)="adminToolService.reload()"
-          class="mt-2 text-sm underline hover:no-underline"
-        >
-          Retry
-        </button>
-      </div>
-    }
+        <!-- Count -->
+        <div class="mb-3 text-xs/5 text-gray-500 dark:text-gray-400">
+          {{ filteredTools().length }} tool{{ filteredTools().length !== 1 ? 's' : '' }}
+        </div>
 
-    <!-- Tools Table -->
-    @if (!toolsResource.isLoading() || tools().length > 0) {
-      <div class="bg-white dark:bg-gray-800 rounded-sm shadow-xs overflow-hidden border border-gray-200 dark:border-gray-700">
-        <table class="min-w-full divide-y divide-gray-200 dark:divide-gray-700">
-          <thead class="bg-gray-50 dark:bg-gray-700">
-            <tr>
-              <th class="px-6 py-3 text-left text-xs font-medium text-gray-500 dark:text-gray-300 uppercase tracking-wider">
-                Tool
-              </th>
-              <th class="px-6 py-3 text-left text-xs font-medium text-gray-500 dark:text-gray-300 uppercase tracking-wider">
-                Category
-              </th>
-              <th class="px-6 py-3 text-left text-xs font-medium text-gray-500 dark:text-gray-300 uppercase tracking-wider">
-                Access
-              </th>
-              <th class="px-6 py-3 text-left text-xs font-medium text-gray-500 dark:text-gray-300 uppercase tracking-wider">
-                Default
-              </th>
-              <th class="px-6 py-3 text-left text-xs font-medium text-gray-500 dark:text-gray-300 uppercase tracking-wider">
-                Status
-              </th>
-              <th class="px-6 py-3 text-right text-xs font-medium text-gray-500 dark:text-gray-300 uppercase tracking-wider">
-                Actions
-              </th>
-            </tr>
-          </thead>
-          <tbody class="divide-y divide-gray-200 dark:divide-gray-700">
-            @for (tool of filteredTools(); track tool.toolId) {
-              <tr class="hover:bg-gray-50 dark:hover:bg-gray-700">
-                <td class="px-6 py-4">
-                  <div>
-                    <div class="font-medium">{{ tool.displayName }}</div>
-                    <div class="text-sm text-gray-500 dark:text-gray-400">{{ tool.toolId }}</div>
-                  </div>
-                </td>
-                <td class="px-6 py-4">
-                  <span class="px-2 py-1 text-xs rounded-xs bg-gray-100 dark:bg-gray-600 capitalize">
-                    {{ tool.category }}
-                  </span>
-                </td>
-                <td class="px-6 py-4">
-                  @if (tool.isPublic) {
-                    <span class="inline-flex items-center gap-1 text-green-600 dark:text-green-400">
-                      <ng-icon name="heroGlobeAlt" class="size-4" />
-                      Public
-                    </span>
-                  } @else {
-                    <span class="text-gray-600 dark:text-gray-400">
-                      {{ tool.allowedAppRoles.length }} roles
-                    </span>
-                  }
-                </td>
-                <td class="px-6 py-4">
-                  @if (tool.enabledByDefault) {
-                    <ng-icon name="heroCheck" class="size-5 text-green-600 dark:text-green-400" />
-                  } @else {
-                    <ng-icon name="heroXCircle" class="size-5 text-gray-400" />
-                  }
-                </td>
-                <td class="px-6 py-4">
-                  <span [class]="getStatusClass(tool.status)">
-                    {{ tool.status }}
-                  </span>
-                </td>
-                <td class="px-6 py-4 text-right">
-                  <div class="flex items-center justify-end gap-1">
-                    <button
-                      (click)="openRoleDialog(tool)"
-                      class="p-2 text-gray-500 hover:text-blue-600 hover:bg-gray-100 rounded-sm dark:hover:bg-gray-600"
-                      [appTooltip]="'Manage Role Access'"
-                      appTooltipPosition="top"
-                    >
-                      <ng-icon name="heroUserGroup" class="size-5" />
-                    </button>
-                    <a
-                      [routerLink]="['/admin/tools/edit', tool.toolId]"
-                      class="p-2 text-gray-500 hover:text-blue-600 hover:bg-gray-100 rounded-sm dark:hover:bg-gray-600"
-                      [appTooltip]="'Edit Tool'"
-                      appTooltipPosition="top"
-                    >
-                      <ng-icon name="heroPencilSquare" class="size-5" />
-                    </a>
+        <!-- Loading State -->
+        @if (toolsResource.isLoading() && tools().length === 0) {
+          <div class="flex h-64 items-center justify-center">
+            <div class="flex flex-col items-center gap-4">
+              <div
+                class="size-12 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600 dark:border-gray-600 dark:border-t-blue-400"
+              ></div>
+              <p class="text-sm/6 text-gray-500 dark:text-gray-400">Loading tools…</p>
+            </div>
+          </div>
+        }
+
+        <!-- Error State -->
+        @if (toolsResource.error()) {
+          <div class="mb-6 rounded-2xl border border-red-200 bg-red-50 p-4 text-red-800 dark:border-red-800 dark:bg-red-900/20 dark:text-red-200">
+            <p class="text-sm/6">Failed to load tools. Please try again.</p>
+            <button
+              (click)="adminToolService.reload()"
+              class="mt-2 text-sm/6 font-medium underline hover:no-underline"
+            >
+              Retry
+            </button>
+          </div>
+        }
+
+        <!-- Tools List -->
+        @if (!toolsResource.isLoading() || tools().length > 0) {
+          @if (filteredTools().length === 0) {
+            <div class="rounded-2xl border border-dashed border-gray-300 bg-white p-12 text-center dark:border-gray-700 dark:bg-gray-800">
+              @if (hasActiveFilters()) {
+                <p class="text-sm/6 text-gray-500 dark:text-gray-400">
+                  No tools match the current filters.
+                </p>
+              } @else {
+                <p class="text-sm/6 text-gray-500 dark:text-gray-400">
+                  No tools in the catalog yet.
+                </p>
+                <a
+                  routerLink="/admin/tools/new"
+                  class="mt-4 inline-flex items-center gap-2 rounded-2xl bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700 dark:bg-blue-500 dark:hover:bg-blue-600"
+                >
+                  <ng-icon name="heroPlus" class="size-5" aria-hidden="true" />
+                  Add Tool
+                </a>
+              }
+            </div>
+          } @else {
+            <ul class="divide-y divide-gray-200 overflow-hidden rounded-2xl border border-gray-200 bg-white dark:divide-gray-700 dark:border-gray-700 dark:bg-gray-800">
+              @for (tool of filteredTools(); track tool.toolId) {
+                <li>
+                  <!-- Row -->
+                  <div class="flex items-center gap-3 px-3 py-2.5 sm:px-4">
+                    <!-- Expand toggle -->
                     <button
-                      (click)="deleteTool(tool)"
-                      class="p-2 text-gray-500 hover:text-red-600 hover:bg-gray-100 rounded-sm dark:hover:bg-gray-600"
-                      [appTooltip]="'Delete Tool'"
-                      appTooltipPosition="top"
+                      type="button"
+                      (click)="toggleExpand(tool.toolId)"
+                      [attr.aria-expanded]="isExpanded(tool.toolId)"
+                      [attr.aria-controls]="'tool-detail-' + tool.toolId"
+                      [attr.aria-label]="(isExpanded(tool.toolId) ? 'Hide' : 'Show') + ' details for ' + tool.displayName"
+                      class="flex size-7 shrink-0 items-center justify-center rounded-2xl text-gray-400 hover:bg-gray-100 hover:text-gray-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:text-gray-500 dark:hover:bg-gray-700 dark:hover:text-gray-200"
                     >
-                      <ng-icon name="heroTrash" class="size-5" />
+                      <ng-icon
+                        name="heroChevronDown"
+                        class="size-4 transition-transform duration-150"
+                        [class.rotate-180]="isExpanded(tool.toolId)"
+                        aria-hidden="true"
+                      />
                     </button>
+
+                    <!-- Name + tool id -->
+                    <div class="min-w-0 flex-1">
+                      <div class="flex items-center gap-1.5">
+                        <span class="truncate text-sm/6 font-medium text-gray-900 dark:text-white">
+                          {{ tool.displayName }}
+                        </span>
+                        @if (tool.enabledByDefault) {
+                          <ng-icon
+                            name="heroStarSolid"
+                            class="size-4 shrink-0 text-amber-500 dark:text-amber-400"
+                            role="img" aria-label="Enabled by default"
+                          />
+                        }
+                      </div>
+                      <p class="truncate font-mono text-xs/5 text-gray-500 dark:text-gray-400">
+                        {{ tool.toolId }}
+                      </p>
+                    </div>
+
+                    <!-- Category -->
+                    <span class="hidden shrink-0 rounded-2xl bg-gray-100 px-2.5 py-0.5 text-xs/5 font-medium capitalize text-gray-600 sm:inline-block dark:bg-gray-700 dark:text-gray-300">
+                      {{ getCategoryLabel(tool.category) }}
+                    </span>
+
+                    <!-- Access -->
+                    <span class="hidden w-20 shrink-0 justify-end text-right text-xs/5 sm:flex">
+                      @if (tool.isPublic) {
+                        <span class="inline-flex items-center gap-1 font-medium text-green-700 dark:text-green-400">
+                          <ng-icon name="heroGlobeAlt" class="size-4" aria-hidden="true" />
+                          Public
+                        </span>
+                      } @else {
+                        <span class="text-gray-500 dark:text-gray-400">
+                          {{ tool.allowedAppRoles.length }} role{{ tool.allowedAppRoles.length !== 1 ? 's' : '' }}
+                        </span>
+                      }
+                    </span>
+
+                    <!-- Status -->
+                    <span [class]="getStatusClass(tool.status)">
+                      {{ tool.status }}
+                    </span>
+
+                    <!-- Actions -->
+                    <div class="flex shrink-0 items-center gap-1">
+                      <button
+                        type="button"
+                        (click)="openRoleDialog(tool)"
+                        [attr.aria-label]="'Manage role access for ' + tool.displayName"
+                        [title]="'Manage role access for ' + tool.displayName"
+                        class="flex size-8 items-center justify-center rounded-2xl text-gray-400 hover:bg-gray-100 hover:text-gray-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:text-gray-500 dark:hover:bg-gray-700 dark:hover:text-gray-200"
+                      >
+                        <ng-icon name="heroUserGroup" class="size-4" aria-hidden="true" />
+                      </button>
+                      <a
+                        [routerLink]="['/admin/tools/edit', tool.toolId]"
+                        [attr.aria-label]="'Edit ' + tool.displayName"
+                        [title]="'Edit ' + tool.displayName"
+                        class="flex size-8 items-center justify-center rounded-2xl text-gray-400 hover:bg-gray-100 hover:text-gray-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:text-gray-500 dark:hover:bg-gray-700 dark:hover:text-gray-200"
+                      >
+                        <ng-icon name="heroPencilSquare" class="size-4" aria-hidden="true" />
+                      </a>
+                      <button
+                        type="button"
+                        (click)="deleteTool(tool)"
+                        [attr.aria-label]="'Delete ' + tool.displayName"
+                        [title]="'Delete ' + tool.displayName"
+                        class="flex size-8 items-center justify-center rounded-2xl text-gray-400 hover:bg-red-50 hover:text-red-600 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-red-500 dark:text-gray-500 dark:hover:bg-red-900/20 dark:hover:text-red-400"
+                      >
+                        <ng-icon name="heroTrash" class="size-4" aria-hidden="true" />
+                      </button>
+                    </div>
                   </div>
-                </td>
-              </tr>
-            }
-          </tbody>
-        </table>
-      </div>
 
-      <!-- Empty State -->
-      @if (filteredTools().length === 0 && !toolsResource.isLoading()) {
-        <div class="text-center py-12 text-gray-500">
-          <ng-icon name="heroPlus" class="size-12 mx-auto mb-4 text-gray-300" />
-          @if (hasActiveFilters()) {
-            <p class="text-lg/7">No tools match your filters</p>
-            <p class="text-sm/6">Try adjusting your search or filter criteria</p>
-          } @else {
-            <p class="text-lg/7">No tools in catalog</p>
-            <p class="text-sm/6 mb-4">Add a tool to get started.</p>
-            <a
-              routerLink="/admin/tools/new"
-              class="inline-flex items-center gap-2 rounded-sm bg-blue-600 px-4 py-2 text-sm/6 font-medium text-white hover:bg-blue-700"
-            >
-              <ng-icon name="heroPlus" class="size-5" />
-              Add Tool
-            </a>
+                  <!-- Expanded detail -->
+                  @if (isExpanded(tool.toolId)) {
+                    <div
+                      [id]="'tool-detail-' + tool.toolId"
+                      class="border-t border-gray-100 bg-gray-50 px-4 py-3 sm:pl-14 dark:border-gray-700/60 dark:bg-gray-900/40"
+                    >
+                      <dl class="grid grid-cols-1 gap-x-8 gap-y-3 sm:grid-cols-3">
+                        <div class="sm:col-span-3">
+                          <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">
+                            Description
+                          </dt>
+                          <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                            {{ tool.description || 'No description provided.' }}
+                          </dd>
+                        </div>
+
+                        <div>
+                          <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">
+                            Protocol
+                          </dt>
+                          <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                            {{ getProtocolLabel(tool.protocol) }}
+                          </dd>
+                        </div>
+
+                        <div>
+                          <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">
+                            Default
+                          </dt>
+                          <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                            {{ tool.enabledByDefault ? 'On by default' : 'Off by default' }}
+                          </dd>
+                        </div>
+
+                        <div>
+                          <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">
+                            OAuth
+                          </dt>
+                          <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                            {{ tool.requiresOauthProvider || 'None' }}
+                            @if (tool.forwardAuthToken) {
+                              <span class="text-gray-400 dark:text-gray-500">· forwards auth token</span>
+                            }
+                          </dd>
+                        </div>
+
+                        <div class="sm:col-span-3">
+                          <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">
+                            Access
+                          </dt>
+                          @if (tool.isPublic) {
+                            <dd class="mt-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                              Public — available to all authenticated users.
+                            </dd>
+                          } @else {
+                            <dd class="mt-1 flex flex-wrap gap-1.5">
+                              @if (tool.allowedAppRoles.length > 0) {
+                                @for (roleId of tool.allowedAppRoles; track roleId) {
+                                  <span
+                                    class="inline-flex items-center rounded-2xl bg-purple-100 px-2 py-0.5 text-xs/5 text-purple-700 dark:bg-purple-900/50 dark:text-purple-300"
+                                    [title]="roleId"
+                                  >
+                                    {{ getRoleDisplayName(roleId) }}
+                                  </span>
+                                }
+                              } @else {
+                                <span class="text-xs/5 italic text-gray-500 dark:text-gray-400">No roles assigned</span>
+                              }
+                            </dd>
+                          }
+                        </div>
+
+                        @if (tool.mcpConfig) {
+                          <div class="sm:col-span-3">
+                            <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">
+                              MCP server
+                            </dt>
+                            <dd class="mt-0.5 space-y-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                              <p class="break-all font-mono text-xs/5">{{ tool.mcpConfig.serverUrl }}</p>
+                              <p class="text-xs/5 text-gray-500 dark:text-gray-400">
+                                {{ tool.mcpConfig.transport }} · auth: {{ tool.mcpConfig.authType }} ·
+                                {{ tool.mcpConfig.tools.length }} tool{{ tool.mcpConfig.tools.length !== 1 ? 's' : '' }}
+                              </p>
+                            </dd>
+                          </div>
+                        }
+
+                        @if (tool.a2aConfig) {
+                          <div class="sm:col-span-3">
+                            <dt class="text-xs/5 font-medium uppercase tracking-wide text-gray-500 dark:text-gray-400">
+                              A2A agent
+                            </dt>
+                            <dd class="mt-0.5 space-y-0.5 text-sm/6 text-gray-700 dark:text-gray-300">
+                              <p class="break-all font-mono text-xs/5">{{ tool.a2aConfig.agentUrl }}</p>
+                              <p class="text-xs/5 text-gray-500 dark:text-gray-400">
+                                auth: {{ tool.a2aConfig.authType }}
+                                @if (tool.a2aConfig.capabilities.length > 0) {
+                                  · {{ tool.a2aConfig.capabilities.join(', ') }}
+                                }
+                              </p>
+                            </dd>
+                          </div>
+                        }
+                      </dl>
+                    </div>
+                  }
+                </li>
+              }
+            </ul>
           }
-        </div>
-      }
-    }
+        }
+      </div>
+    </div>
   `,
 })
 export class ToolListPage {
   adminToolService = inject(AdminToolService);
-  private router = inject(Router);
   private dialog = inject(Dialog);
+  private appRolesService = inject(AppRolesService);
 
   readonly toolsResource = this.adminToolService.toolsResource;
   readonly categories = TOOL_CATEGORIES;
@@ -294,6 +399,9 @@ export class ToolListPage {
   statusFilter = signal('');
   categoryFilter = signal('');
 
+  // Row detail expansion state (set of tool ids currently expanded)
+  private expandedIds = signal<ReadonlySet<string>>(new Set());
+
   // Computed
   readonly tools = computed(() => this.adminToolService.getTools());
 
@@ -338,18 +446,53 @@ export class ToolListPage {
     this.categoryFilter.set('');
   }
 
+  isExpanded(toolId: string): boolean {
+    return this.expandedIds().has(toolId);
+  }
+
+  toggleExpand(toolId: string): void {
+    this.expandedIds.update(current => {
+      const next = new Set(current);
+      if (next.has(toolId)) {
+        next.delete(toolId);
+      } else {
+        next.add(toolId);
+      }
+      return next;
+    });
+  }
+
+  getCategoryLabel(category: string): string {
+    return this.categories.find(c => c.value === category)?.label ?? category;
+  }
+
+  getProtocolLabel(protocol: string): string {
+    return TOOL_PROTOCOLS.find(p => p.value === protocol)?.label ?? protocol;
+  }
+
+  /**
+   * Get the display name for a role ID.
+   * Falls back to the role ID if not found.
+   */
+  getRoleDisplayName(roleId: string): string {
+    const role = this.appRolesService.getRoleById(roleId);
+    return role?.displayName ?? roleId;
+  }
+
   getStatusClass(status: string): string {
+    const base =
+      'shrink-0 rounded-2xl px-2.5 py-0.5 text-xs/5 font-medium';
     switch (status) {
       case 'active':
-        return 'px-2 py-1 text-xs rounded-xs bg-green-100 text-green-800 dark:bg-green-900/30 dark:text-green-300';
+        return `${base} bg-green-100 text-green-800 dark:bg-green-900/30 dark:text-green-300`;
       case 'deprecated':
-        return 'px-2 py-1 text-xs rounded-xs bg-yellow-100 text-yellow-800 dark:bg-yellow-900/30 dark:text-yellow-300';
+        return `${base} bg-yellow-100 text-yellow-800 dark:bg-yellow-900/30 dark:text-yellow-300`;
       case 'disabled':
-        return 'px-2 py-1 text-xs rounded-xs bg-red-100 text-red-800 dark:bg-red-900/30 dark:text-red-300';
+        return `${base} bg-red-100 text-red-800 dark:bg-red-900/30 dark:text-red-300`;
       case 'coming_soon':
-        return 'px-2 py-1 text-xs rounded-xs bg-blue-100 text-blue-800 dark:bg-blue-900/30 dark:text-blue-300';
+        return `${base} bg-blue-100 text-blue-800 dark:bg-blue-900/30 dark:text-blue-300`;
       default:
-        return 'px-2 py-1 text-xs rounded-xs bg-gray-100 text-gray-800 dark:bg-gray-700 dark:text-gray-300';
+        return `${base} bg-gray-100 text-gray-800 dark:bg-gray-700 dark:text-gray-300`;
     }
   }
 
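A note on `toggleExpand` above: it copies the set before changing it because Angular signals use `Object.is` equality by default, so mutating the existing `Set` and returning the same reference would not notify the template. A minimal, framework-free sketch of that copy-on-write toggle follows; `toggleInSet` is a hypothetical helper name that does not appear in this diff.

```typescript
// Hypothetical helper (not part of the diff): returns a NEW set with `id`
// toggled, leaving the input set untouched.
function toggleInSet<T>(current: ReadonlySet<T>, id: T): ReadonlySet<T> {
  const next = new Set(current);
  // Set.prototype.delete returns false when the value was absent,
  // in which case the toggle means "add".
  if (!next.delete(id)) {
    next.add(id);
  }
  return next;
}

// Usage: each call yields a fresh reference, which is what lets a
// reference-equality check detect the change.
const before: ReadonlySet<string> = new Set(['a']);
const opened = toggleInSet(before, 'b'); // contains 'a' and 'b'
const closed = toggleInSet(opened, 'a'); // contains only 'b'
```

With a helper like this, the component method could reduce to `this.expandedIds.update(s => toggleInSet(s, toolId))`, assuming the same signal shape as in the diff.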
diff --git a/frontend/ai.client/src/app/admin/users/pages/user-detail/user-detail.page.ts b/frontend/ai.client/src/app/admin/users/pages/user-detail/user-detail.page.ts
index 2dcd02be..8eea79e3 100644
--- a/frontend/ai.client/src/app/admin/users/pages/user-detail/user-detail.page.ts
+++ b/frontend/ai.client/src/app/admin/users/pages/user-detail/user-detail.page.ts
@@ -64,7 +64,7 @@ import { QuotaEventSummary } from '../../models';
       <div class="flex items-center justify-center h-64">
         <div class="flex flex-col items-center gap-4">
           <div
-            class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600"
+            class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600 dark:border-t-blue-400"
           ></div>
           <p class="text-sm text-gray-500 dark:text-gray-400">
             Loading user details...
diff --git a/frontend/ai.client/src/app/admin/users/pages/user-list/user-list.page.ts b/frontend/ai.client/src/app/admin/users/pages/user-list/user-list.page.ts
index 7e3539e5..685f4f93 100644
--- a/frontend/ai.client/src/app/admin/users/pages/user-list/user-list.page.ts
+++ b/frontend/ai.client/src/app/admin/users/pages/user-list/user-list.page.ts
@@ -28,15 +28,6 @@ import { UserListItem, UserStatus } from '../../models';
     class: 'block p-6',
   },
   template: `
-    <!-- Back Button -->
-    <a
-      routerLink="/admin"
-      class="mb-6 inline-flex items-center gap-2 text-sm/6 font-medium text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-white"
-    >
-      <ng-icon name="heroArrowLeft" class="size-4" />
-      Back to Admin
-    </a>
-
     <div class="mb-6">
       <h1 class="text-3xl/9 font-bold mb-2">User Lookup</h1>
       <p class="text-gray-600 dark:text-gray-400">
@@ -100,7 +91,7 @@ import { UserListItem, UserStatus } from '../../models';
       <div class="flex items-center justify-center h-64">
         <div class="flex flex-col items-center gap-4">
           <div
-            class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600"
+            class="animate-spin rounded-full size-12 border-4 border-gray-300 dark:border-gray-600 border-t-blue-600 dark:border-t-blue-400"
           ></div>
           <p class="text-sm text-gray-500 dark:text-gray-400">
             Loading users...
diff --git a/frontend/ai.client/src/app/app.config.ts b/frontend/ai.client/src/app/app.config.ts
index 76c528ac..66b6db32 100644
--- a/frontend/ai.client/src/app/app.config.ts
+++ b/frontend/ai.client/src/app/app.config.ts
@@ -8,6 +8,8 @@ import { errorInterceptor } from './auth/error.interceptor';
 import { withCredentialsInterceptor } from './auth/with-credentials.interceptor';
 import { MARKED_OPTIONS, MarkedOptions, MarkedRenderer, provideMarkdown } from 'ngx-markdown';
 import { SessionService } from './auth/session.service';
+import { ThemeService } from './components/topnav/components/theme-toggle/theme.service';
+import { provideBuiltInToolRenderers } from './session/components/message-list/components/tool-use/built-in-renderers';
 
 function markedOptionsFactory(): MarkedOptions {
   const renderer = new MarkedRenderer();
@@ -47,5 +49,18 @@ export const appConfig: ApplicationConfig = {
     // the user can pick a provider. Transport errors leave the SPA in a clean
     // unauthenticated state without redirecting.
     provideAppInitializer(() => inject(SessionService).bootstrap()),
+
+    // ThemeService applies the persisted/system theme to <html> in its
+    // constructor. It's providedIn:'root' but only injected by the topnav
+    // and authed pages, so on a cold load to /auth/login or /auth/first-boot
+    // it would never run and the dark-mode CSS on those screens would sit
+    // dormant. Inject it at bootstrap so the lava-lamp backdrop honors the
+    // user's preference (and prefers-color-scheme) on every route.
+    provideAppInitializer(() => { inject(ThemeService); }),
+
+    // Register the built-in tool-result renderers (text/JSON/image default
+    // plus the migrated proof-point renderers) into the renderer registry
+    // before the first message renders.
+    provideBuiltInToolRenderers(),
   ]
 };
diff --git a/frontend/ai.client/src/app/app.css b/frontend/ai.client/src/app/app.css
index 059e3c76..e07ed461 100644
--- a/frontend/ai.client/src/app/app.css
+++ b/frontend/ai.client/src/app/app.css
@@ -54,3 +54,14 @@
 .sidenav-backdrop-exit {
   animation: fadeOut 0.3s ease-in forwards;
 }
+
+/* Docked artifact pane: reserve right-side space equal to the pane's
+   width (--artifact-pane-width, falling back to max-w-2xl = 42rem) so the
+   fixed pane sits beside the chat instead of over it. Desktop only: below
+   lg the pane is a full-width takeover and the chat is not shown alongside
+   it. The wrapper carries `transition-[padding]`, so this eases in/out with the nav. */
+@media (min-width: 1024px) {
+  .artifact-pane-open {
+    padding-right: var(--artifact-pane-width, 42rem);
+  }
+}
diff --git a/frontend/ai.client/src/app/app.html b/frontend/ai.client/src/app/app.html
index 9a9ca710..09f5b8ff 100644
--- a/frontend/ai.client/src/app/app.html
+++ b/frontend/ai.client/src/app/app.html
@@ -103,8 +103,10 @@
 
   <div
     class="transition-[padding] duration-300"
+    [style.--artifact-pane-width]="artifactPaneWidthCss()"
     [class.lg:pl-72]="!sidenavService.isCollapsed() && !sidenavService.isHidden()"
-    [class.lg:pl-0]="sidenavService.isCollapsed() || sidenavService.isHidden()">
+    [class.lg:pl-0]="sidenavService.isCollapsed() || sidenavService.isHidden()"
+    [class.artifact-pane-open]="artifactPanelOpen()">
     <main class="flex flex-col">
       <!-- Scrollable Content -->
       <div class="flex-1 overflow-y-auto">
diff --git a/frontend/ai.client/src/app/app.routes.ts b/frontend/ai.client/src/app/app.routes.ts
index 92e0646d..cdfd51c0 100644
--- a/frontend/ai.client/src/app/app.routes.ts
+++ b/frontend/ai.client/src/app/app.routes.ts
@@ -30,38 +30,9 @@ export const routes: Routes = [
     },
     {
         path: 'admin',
-        loadComponent: () => import('./admin/admin.page').then(m => m.AdminPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/bedrock/models',
-        loadComponent: () => import('./admin/bedrock-models/bedrock-models.page').then(m => m.BedrockModelsPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/gemini/models',
-        loadComponent: () => import('./admin/gemini-models/gemini-models.page').then(m => m.GeminiModelsPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/openai/models',
-        loadComponent: () => import('./admin/openai-models/openai-models.page').then(m => m.OpenAIModelsPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/manage-models',
-        loadComponent: () => import('./admin/manage-models/manage-models.page').then(m => m.ManageModelsPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/manage-models/new',
-        loadComponent: () => import('./admin/manage-models/model-form.page').then(m => m.ModelFormPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/manage-models/edit/:id',
-        loadComponent: () => import('./admin/manage-models/model-form.page').then(m => m.ModelFormPage),
+        loadComponent: () => import('./admin/admin.layout').then(m => m.AdminLayout),
         canActivate: [adminGuard],
+        loadChildren: () => import('./admin/admin.routes').then(m => m.adminRoutes),
     },
     {
         path: 'assistants/new',
@@ -103,111 +74,6 @@ export const routes: Routes = [
         canActivate: [authGuard],
         loadChildren: () => import('./settings/settings.routes').then(m => m.settingsRoutes),
     },
-    {
-        path: 'admin/quota',
-        loadChildren: () => import('./admin/quota-tiers/quota-routing.module').then(m => m.quotaRoutes),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/costs',
-        loadComponent: () => import('./admin/costs/admin-costs.page').then(m => m.AdminCostsPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/users',
-        loadComponent: () => import('./admin/users/pages/user-list/user-list.page').then(m => m.UserListPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/users/:userId',
-        loadComponent: () => import('./admin/users/pages/user-detail/user-detail.page').then(m => m.UserDetailPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/roles',
-        loadComponent: () => import('./admin/roles/pages/role-list.page').then(m => m.RoleListPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/roles/new',
-        loadComponent: () => import('./admin/roles/pages/role-form.page').then(m => m.RoleFormPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/roles/edit/:id',
-        loadComponent: () => import('./admin/roles/pages/role-form.page').then(m => m.RoleFormPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/tools',
-        loadComponent: () => import('./admin/tools/pages/tool-list.page').then(m => m.ToolListPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/tools/new',
-        loadComponent: () => import('./admin/tools/pages/tool-form.page').then(m => m.ToolFormPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/tools/edit/:toolId',
-        loadComponent: () => import('./admin/tools/pages/tool-form.page').then(m => m.ToolFormPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/auth-providers',
-        loadComponent: () => import('./admin/auth-providers/pages/provider-list.page').then(m => m.AuthProviderListPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/auth-providers/new',
-        loadComponent: () => import('./admin/auth-providers/pages/provider-form.page').then(m => m.AuthProviderFormPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/auth-providers/edit/:providerId',
-        loadComponent: () => import('./admin/auth-providers/pages/provider-form.page').then(m => m.AuthProviderFormPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/oauth-providers',
-        redirectTo: 'admin/connectors',
-        pathMatch: 'full',
-    },
-    {
-        path: 'admin/oauth-providers/new',
-        redirectTo: 'admin/connectors/new',
-        pathMatch: 'full',
-    },
-    {
-        path: 'admin/oauth-providers/edit/:providerId',
-        redirectTo: 'admin/connectors/edit/:providerId',
-        pathMatch: 'full',
-    },
-    {
-        path: 'admin/connectors',
-        loadComponent: () => import('./admin/connectors/pages/connector-list.page').then(m => m.ConnectorListPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/connectors/new',
-        loadComponent: () => import('./admin/connectors/pages/connector-form.page').then(m => m.ConnectorFormPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/connectors/edit/:providerId',
-        loadComponent: () => import('./admin/connectors/pages/connector-form.page').then(m => m.ConnectorFormPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/fine-tuning',
-        loadComponent: () => import('./admin/fine-tuning-access/fine-tuning-access.page').then(m => m.FineTuningAccessPage),
-        canActivate: [adminGuard],
-    },
-    {
-        path: 'admin/fine-tuning/costs',
-        loadComponent: () => import('./admin/fine-tuning-costs/fine-tuning-costs.page').then(m => m.FineTuningCostsPage),
-        canActivate: [adminGuard],
-    },
     {
         path: 'fine-tuning',
         loadComponent: () => import('./fine-tuning/pages/dashboard/fine-tuning-dashboard.page').then(m => m.FineTuningDashboardPage),
diff --git a/frontend/ai.client/src/app/app.ts b/frontend/ai.client/src/app/app.ts
index d2183f39..2c920224 100644
--- a/frontend/ai.client/src/app/app.ts
+++ b/frontend/ai.client/src/app/app.ts
@@ -1,4 +1,4 @@
-import { Component, inject, signal } from '@angular/core';
+import { Component, DestroyRef, computed, inject, signal } from '@angular/core';
 import { Router, RouterOutlet } from '@angular/router';
 import { Sidenav } from './components/sidenav/sidenav';
 import { ErrorToastComponent } from './components/error-toast/error-toast.component';
@@ -6,14 +6,16 @@ import { ToastComponent } from './components/toast';
 import { SidenavService } from './services/sidenav/sidenav.service';
 import { HeaderService } from './services/header/header.service';
 import { TooltipDirective } from './components/tooltip/tooltip.directive';
+import { SessionService } from './auth/session.service';
+import { ArtifactStateService } from './session/services/artifacts/artifact-state.service';
 
 @Component({
   selector: 'app-root',
   imports: [
-    RouterOutlet, 
-    Sidenav, 
-    ErrorToastComponent, 
-    ToastComponent, 
+    RouterOutlet,
+    Sidenav,
+    ErrorToastComponent,
+    ToastComponent,
     TooltipDirective
   ],
   templateUrl: './app.html',
@@ -24,6 +26,38 @@ export class App {
   protected sidenavService = inject(SidenavService);
   protected headerService = inject(HeaderService);
   private router = inject(Router);
+  private session = inject(SessionService);
+  private artifactState = inject(ArtifactStateService);
+
+  /** True while an artifact pane is docked — content reserves right-side
+   *  space for it (desktop only) so the fixed panel doesn't occlude chat. */
+  protected readonly artifactPanelOpen = computed(
+    () => this.artifactState.openArtifact() !== null,
+  );
+
+  /** Exposed as a CSS var on the content wrapper so the desktop-only
+   *  media-query rules (here and in chat-container) reserve exactly the
+   *  user-chosen pane width. */
+  protected readonly artifactPaneWidthCss = computed(
+    () => `${this.artifactState.paneWidth()}px`,
+  );
+
+  constructor() {
+    // Re-probe the BFF session whenever the tab regains focus. A session
+    // that expired while the tab was backgrounded surfaces immediately
+    // (redirect to /auth/login) instead of waiting for the next user
+    // action to 401. SSR-safe via the document guard.
+    if (typeof document !== 'undefined') {
+      const destroyRef = inject(DestroyRef);
+      const handler = () => {
+        if (document.visibilityState === 'visible') {
+          this.session.recheck();
+        }
+      };
+      document.addEventListener('visibilitychange', handler);
+      destroyRef.onDestroy(() => document.removeEventListener('visibilitychange', handler));
+    }
+  }
 
   newChat() {
     this.router.navigate(['']);
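The constructor change above registers a `visibilitychange` listener and removes it on destroy. The same subscribe-and-return-teardown shape can be sketched without Angular or a real DOM; `VisibilitySource` and `recheckOnVisible` are hypothetical names used only for illustration, and the interface lists just the members the handler touches.

```typescript
// Hypothetical minimal view of `document` (illustrative only).
interface VisibilitySource {
  visibilityState: string;
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Calls `recheck` whenever the source reports it became visible.
// Returns a teardown function that unregisters the same handler instance.
function recheckOnVisible(source: VisibilitySource, recheck: () => void): () => void {
  const handler = () => {
    if (source.visibilityState === 'visible') {
      recheck();
    }
  };
  source.addEventListener('visibilitychange', handler);
  return () => source.removeEventListener('visibilitychange', handler);
}
```

In the component, the returned teardown would play the role of the callback passed to `destroyRef.onDestroy`. Keeping a reference to the exact `handler` matters because `removeEventListener` only removes a listener it was given by identity.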
diff --git a/frontend/ai.client/src/app/assistants/assistant-form/components/assistant-preview.component.ts b/frontend/ai.client/src/app/assistants/assistant-form/components/assistant-preview.component.ts
index 4fc906dc..3e96ace5 100644
--- a/frontend/ai.client/src/app/assistants/assistant-form/components/assistant-preview.component.ts
+++ b/frontend/ai.client/src/app/assistants/assistant-form/components/assistant-preview.component.ts
@@ -69,6 +69,7 @@ import { AssistantCardComponent } from '../../components/assistant-card.componen
                 [sessionId]="previewChatService.sessionId()"
                 [isChatLoading]="previewChatService.isLoading()"
                 [showFileControls]="false"
+                [autoFocus]="false"
                 (messageSubmitted)="onMessageSubmitted($event)"
                 (messageCancelled)="onMessageCancelled()"
               />
diff --git a/frontend/ai.client/src/app/auth/error.interceptor.spec.ts b/frontend/ai.client/src/app/auth/error.interceptor.spec.ts
index e389a13d..bde70a6c 100644
--- a/frontend/ai.client/src/app/auth/error.interceptor.spec.ts
+++ b/frontend/ai.client/src/app/auth/error.interceptor.spec.ts
@@ -2,6 +2,7 @@ import { TestBed } from '@angular/core/testing';
 import { HttpRequest, HttpResponse, HttpErrorResponse, HttpHandlerFn } from '@angular/common/http';
 import { errorInterceptor } from './error.interceptor';
 import { ErrorService } from '../services/error/error.service';
+import { SessionService } from './session.service';
 import { of, throwError } from 'rxjs';
 import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest';
 
@@ -9,16 +10,23 @@ describe('errorInterceptor', () => {
   let errorService: {
     handleHttpError: ReturnType<typeof vi.fn>;
   };
+  let sessionService: {
+    handleUnauthorized: ReturnType<typeof vi.fn>;
+  };
 
   beforeEach(() => {
     TestBed.resetTestingModule();
     errorService = {
       handleHttpError: vi.fn(),
     };
+    sessionService = {
+      handleUnauthorized: vi.fn(),
+    };
 
     TestBed.configureTestingModule({
       providers: [
         { provide: ErrorService, useValue: errorService },
+        { provide: SessionService, useValue: sessionService },
       ],
     });
   });
@@ -147,6 +155,26 @@ describe('errorInterceptor', () => {
     });
   });
 
+  it('should call sessionService.handleUnauthorized on 401 and skip the toast', async () => {
+    const error = new HttpErrorResponse({ status: 401, url: 'http://localhost:8000/api/sessions' });
+    const nextFn: HttpHandlerFn = vi.fn().mockReturnValue(throwError(() => error));
+    const req = new HttpRequest('GET', 'http://localhost:8000/api/sessions');
+
+    await new Promise<void>((resolve) => {
+      TestBed.runInInjectionContext(() => {
+        errorInterceptor(req, nextFn).subscribe({
+          error: (err: unknown) => {
+            expect(sessionService.handleUnauthorized).toHaveBeenCalledTimes(1);
+            expect(errorService.handleHttpError).not.toHaveBeenCalled();
+            // Caller still sees the error so any local cleanup runs.
+            expect(err).toBe(error);
+            resolve();
+          },
+        });
+      });
+    });
+  });
+
   /**
    * Validates: Requirements 14.8 (success case)
    * Successful responses pass through without interception
diff --git a/frontend/ai.client/src/app/auth/error.interceptor.ts b/frontend/ai.client/src/app/auth/error.interceptor.ts
index 83aad8b9..31cacfd8 100644
--- a/frontend/ai.client/src/app/auth/error.interceptor.ts
+++ b/frontend/ai.client/src/app/auth/error.interceptor.ts
@@ -2,6 +2,7 @@ import { HttpInterceptorFn, HttpErrorResponse } from '@angular/common/http';
 import { inject } from '@angular/core';
 import { catchError, throwError } from 'rxjs';
 import { ErrorService } from '../services/error/error.service';
+import { SessionService } from './session.service';
 
 /**
  * HTTP interceptor that handles errors from non-streaming HTTP requests
@@ -17,6 +18,7 @@ import { ErrorService } from '../services/error/error.service';
  */
 export const errorInterceptor: HttpInterceptorFn = (req, next) => {
   const errorService = inject(ErrorService);
+  const sessionService = inject(SessionService);
 
   // Skip error handling for SSE streaming endpoints
   // These are handled by fetchEventSource's onerror callback.
@@ -43,12 +45,12 @@ export const errorInterceptor: HttpInterceptorFn = (req, next) => {
           req.url.includes(endpoint)
         );
 
-        // 401s mean the BFF session is missing or expired. SessionService
-        // handles that by routing the user to /auth/login — a toast on top
-        // is just noise and tends to flash before the redirect lands.
-        const isUnauthorized = error.status === 401;
-
-        if (!isSilentEndpoint && !isUnauthorized) {
+        // 401s mean the BFF session is missing or expired. Route to the
+        // SPA login page (idempotent across concurrent 401s) and skip the
+        // toast — it just flashes before the redirect lands.
+        if (error.status === 401) {
+          sessionService.handleUnauthorized();
+        } else if (!isSilentEndpoint) {
           // Use ErrorService to display the error
           errorService.handleHttpError(error);
         }
diff --git a/frontend/ai.client/src/app/auth/first-boot/first-boot.page.css b/frontend/ai.client/src/app/auth/first-boot/first-boot.page.css
index 72609321..1659783b 100644
--- a/frontend/ai.client/src/app/auth/first-boot/first-boot.page.css
+++ b/frontend/ai.client/src/app/auth/first-boot/first-boot.page.css
@@ -1,5 +1,247 @@
-/* First-boot page specific styles */
+/* First-boot page styles — mirrors the login page's lava-lamp parallax
+   backdrop and frosted-glass card so the two auth screens feel like one
+   system. Class names are component-scoped (Emulated view encapsulation)
+   so they don't collide with the login page's identical names. */
 
 @import "tailwindcss";
 
 @custom-variant dark (&:where(.dark, .dark *));
+
+/* ---------- Background canvas ---------- */
+.login-shell {
+  background:
+    radial-gradient(120% 80% at 0% 0%, color-mix(in oklab, var(--color-primary-50) 70%, white) 0%, transparent 60%),
+    radial-gradient(120% 80% at 100% 100%, color-mix(in oklab, var(--color-primary-100) 60%, white) 0%, transparent 55%),
+    var(--color-gray-50);
+}
+
+:host-context(html.dark) .login-shell {
+  background:
+    radial-gradient(120% 80% at 0% 0%, color-mix(in oklab, var(--color-primary-900) 50%, black) 0%, transparent 60%),
+    radial-gradient(120% 80% at 100% 100%, color-mix(in oklab, var(--color-primary-800) 35%, black) 0%, transparent 55%),
+    var(--color-gray-900);
+}
+
+.login-bg {
+  position: absolute;
+  inset: 0;
+  overflow: hidden;
+  pointer-events: none;
+}
+
+.login-lava {
+  position: absolute;
+  inset: 0;
+  overflow: hidden;
+}
+
+.login-blob {
+  position: absolute;
+  will-change: transform, border-radius;
+  border-radius: 58% 42% 60% 40% / 50% 55% 45% 50%;
+}
+
+/* ----- Far tier: huge, slow, heavy blur, low opacity ----- */
+.login-blob--a {
+  width: 70vw;
+  height: 86vw;
+  max-width: 880px;
+  max-height: 1080px;
+  bottom: -38vw;
+  left: -18vw;
+  filter: blur(110px);
+  opacity: 0.4;
+  background: radial-gradient(circle at 35% 35%, var(--color-primary-400), var(--color-primary-700) 60%, transparent 78%);
+  animation:
+    login-rise-a 52s ease-in-out infinite alternate,
+    login-morph-a 28s ease-in-out infinite alternate;
+}
+
+.login-blob--b {
+  width: 62vw;
+  height: 76vw;
+  max-width: 800px;
+  max-height: 960px;
+  top: -34vw;
+  right: -20vw;
+  filter: blur(100px);
+  opacity: 0.36;
+  background: radial-gradient(circle at 65% 65%, var(--color-primary-500), var(--color-primary-800) 65%, transparent 82%);
+  animation:
+    login-rise-b 60s ease-in-out infinite alternate,
+    login-morph-b 32s ease-in-out infinite alternate;
+}
+
+/* ----- Mid tier ----- */
+.login-blob--c {
+  width: 32vw;
+  height: 40vw;
+  max-width: 420px;
+  max-height: 520px;
+  top: 28%;
+  left: 42%;
+  filter: blur(60px);
+  opacity: 0.5;
+  background: radial-gradient(circle, color-mix(in oklab, var(--color-primary-300) 75%, white), transparent 72%);
+  animation:
+    login-rise-c 30s ease-in-out infinite alternate,
+    login-morph-c 18s ease-in-out infinite alternate;
+}
+
+.login-blob--d {
+  width: 28vw;
+  height: 36vw;
+  max-width: 360px;
+  max-height: 460px;
+  bottom: -12vw;
+  right: 18vw;
+  filter: blur(55px);
+  opacity: 0.55;
+  background: radial-gradient(circle at 50% 50%, var(--color-primary-300), var(--color-primary-500) 60%, transparent 80%);
+  animation:
+    login-rise-d 26s ease-in-out infinite alternate,
+    login-morph-a 16s ease-in-out infinite alternate -3s;
+}
+
+/* ----- Near tier ----- */
+.login-blob--e {
+  width: 16vw;
+  height: 22vw;
+  max-width: 220px;
+  max-height: 300px;
+  top: -6vw;
+  left: 32vw;
+  filter: blur(32px);
+  opacity: 0.65;
+  background: radial-gradient(circle at 50% 50%, var(--color-primary-400), var(--color-primary-700) 65%, transparent 82%);
+  animation:
+    login-rise-e 14s ease-in-out infinite alternate,
+    login-morph-b 11s ease-in-out infinite alternate -5s;
+}
+
+.login-blob--f {
+  width: 12vw;
+  height: 16vw;
+  max-width: 160px;
+  max-height: 220px;
+  bottom: -4vw;
+  left: 14vw;
+  filter: blur(26px);
+  opacity: 0.7;
+  background: radial-gradient(circle at 45% 45%, var(--color-primary-300), var(--color-primary-600) 65%, transparent 84%);
+  animation:
+    login-rise-f 11s ease-in-out infinite alternate,
+    login-morph-c 9s ease-in-out infinite alternate -2s;
+}
+
+:host-context(html.dark) .login-blob--a { opacity: 0.32; }
+:host-context(html.dark) .login-blob--b { opacity: 0.28; }
+:host-context(html.dark) .login-blob--c { opacity: 0.38; }
+:host-context(html.dark) .login-blob--d { opacity: 0.42; }
+:host-context(html.dark) .login-blob--e { opacity: 0.5; }
+:host-context(html.dark) .login-blob--f { opacity: 0.55; }
+
+.login-grid {
+  position: absolute;
+  inset: 0;
+  background-image:
+    linear-gradient(to right, color-mix(in oklab, var(--color-primary-500) 8%, transparent) 1px, transparent 1px),
+    linear-gradient(to bottom, color-mix(in oklab, var(--color-primary-500) 8%, transparent) 1px, transparent 1px);
+  background-size: 64px 64px;
+  mask-image: radial-gradient(ellipse 70% 60% at 50% 45%, black 30%, transparent 75%);
+  -webkit-mask-image: radial-gradient(ellipse 70% 60% at 50% 45%, black 30%, transparent 75%);
+  opacity: 0.6;
+}
+
+:host-context(html.dark) .login-grid {
+  background-image:
+    linear-gradient(to right, color-mix(in oklab, var(--color-primary-300) 6%, transparent) 1px, transparent 1px),
+    linear-gradient(to bottom, color-mix(in oklab, var(--color-primary-300) 6%, transparent) 1px, transparent 1px);
+  opacity: 0.5;
+}
+
+/* Far: minimal travel, lazy sway */
+@keyframes login-rise-a {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(2vw, -12vh, 0) scale(1.04, 0.96) rotate(4deg); }
+  100% { transform: translate3d(-1vw, -22vh, 0) scale(0.97, 1.05) rotate(-3deg); }
+}
+@keyframes login-rise-b {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(-2vw, 10vh, 0) scale(0.96, 1.05) rotate(-4deg); }
+  100% { transform: translate3d(1vw, 20vh, 0) scale(1.05, 0.96) rotate(3deg); }
+}
+
+/* Mid: moderate travel */
+@keyframes login-rise-c {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(-5vw, -25vh, 0) scale(1.1, 0.94) rotate(-10deg); }
+  100% { transform: translate3d(4vw, -50vh, 0) scale(0.92, 1.1) rotate(8deg); }
+}
+@keyframes login-rise-d {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(6vw, -35vh, 0) scale(1.05, 0.95) rotate(12deg); }
+  100% { transform: translate3d(-3vw, -68vh, 0) scale(0.92, 1.08) rotate(-7deg); }
+}
+
+/* Near: dramatic travel */
+@keyframes login-rise-e {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(8vw, 55vh, 0) scale(0.88, 1.14) rotate(-18deg); }
+  100% { transform: translate3d(-6vw, 100vh, 0) scale(1.16, 0.86) rotate(14deg); }
+}
+@keyframes login-rise-f {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(-9vw, -55vh, 0) scale(1.18, 0.84) rotate(20deg); }
+  100% { transform: translate3d(7vw, -105vh, 0) scale(0.85, 1.18) rotate(-16deg); }
+}
+
+/* Surface morph */
+@keyframes login-morph-a {
+  0%   { border-radius: 58% 42% 60% 40% / 50% 55% 45% 50%; }
+  50%  { border-radius: 42% 58% 38% 62% / 60% 40% 60% 40%; }
+  100% { border-radius: 50% 50% 65% 35% / 45% 55% 50% 50%; }
+}
+@keyframes login-morph-b {
+  0%   { border-radius: 50% 50% 40% 60% / 55% 45% 55% 45%; }
+  50%  { border-radius: 65% 35% 55% 45% / 40% 60% 40% 60%; }
+  100% { border-radius: 38% 62% 50% 50% / 60% 50% 50% 40%; }
+}
+@keyframes login-morph-c {
+  0%   { border-radius: 60% 40% 50% 50% / 45% 60% 40% 55%; }
+  50%  { border-radius: 40% 60% 65% 35% / 55% 40% 60% 45%; }
+  100% { border-radius: 55% 45% 38% 62% / 50% 55% 45% 50%; }
+}
+
+@media (prefers-reduced-motion: reduce) {
+  .login-blob,
+  .login-blob--a,
+  .login-blob--b,
+  .login-blob--c,
+  .login-blob--d,
+  .login-blob--e,
+  .login-blob--f {
+    animation: none;
+  }
+}
+
+/* ---------- Frosted glass card ---------- */
+.login-card {
+  background: color-mix(in oklab, white 65%, transparent);
+  backdrop-filter: blur(24px) saturate(160%);
+  -webkit-backdrop-filter: blur(24px) saturate(160%);
+  border: 1px solid color-mix(in oklab, white 70%, transparent);
+  box-shadow:
+    0 1px 0 0 rgba(255, 255, 255, 0.6) inset,
+    0 20px 50px -20px color-mix(in oklab, var(--color-primary-900) 35%, transparent),
+    0 8px 24px -12px rgba(0, 0, 0, 0.15);
+}
+
+:host-context(html.dark) .login-card {
+  background: color-mix(in oklab, var(--color-gray-900) 55%, transparent);
+  border-color: color-mix(in oklab, white 12%, transparent);
+  box-shadow:
+    0 1px 0 0 rgba(255, 255, 255, 0.06) inset,
+    0 20px 50px -20px rgba(0, 0, 0, 0.6),
+    0 8px 24px -12px rgba(0, 0, 0, 0.5);
+}
diff --git a/frontend/ai.client/src/app/auth/first-boot/first-boot.page.ts b/frontend/ai.client/src/app/auth/first-boot/first-boot.page.ts
index 4c8f81e7..494ed489 100644
--- a/frontend/ai.client/src/app/auth/first-boot/first-boot.page.ts
+++ b/frontend/ai.client/src/app/auth/first-boot/first-boot.page.ts
@@ -11,8 +11,23 @@ import { SystemService, FirstBootError } from '../../services/system.service';
   styleUrl: './first-boot.page.css',
   changeDetection: ChangeDetectionStrategy.OnPush,
   template: `
-    <div class="fixed inset-0 flex items-center justify-center bg-gray-50 dark:bg-gray-900 overflow-y-auto">
-      <div class="w-full max-w-md px-4 py-12">
+    <div class="login-shell fixed inset-0 flex items-center justify-center overflow-y-auto">
+      <!-- Decorative background: lava-lamp blobs across three depth tiers
+           (far/mid/near) for parallax — size, blur, speed, and travel
+           distance all scale with depth. -->
+      <div class="login-bg" aria-hidden="true">
+        <div class="login-lava">
+          <div class="login-blob login-blob--a"></div>
+          <div class="login-blob login-blob--b"></div>
+          <div class="login-blob login-blob--c"></div>
+          <div class="login-blob login-blob--d"></div>
+          <div class="login-blob login-blob--e"></div>
+          <div class="login-blob login-blob--f"></div>
+        </div>
+        <div class="login-grid"></div>
+      </div>
+
+      <div class="relative w-full max-w-md px-4 py-12">
         <!-- Logo -->
         <div class="mb-8 flex justify-center">
           <img
@@ -25,13 +40,13 @@ import { SystemService, FirstBootError } from '../../services/system.service';
             class="hidden size-16 dark:block">
         </div>
 
-        <div class="bg-white dark:bg-gray-800 rounded-lg shadow-sm p-8">
+        <div class="login-card rounded-2xl p-8">
           <div class="flex flex-col items-center gap-6">
             <div class="flex flex-col items-center gap-2">
-              <h1 class="text-2xl font-semibold text-gray-900 dark:text-gray-100">
+              <h1 class="text-2xl font-semibold text-gray-900 dark:text-gray-50">
                 Welcome
               </h1>
-              <p class="text-base/7 text-gray-600 dark:text-gray-400 text-center">
+              <p class="text-base/7 text-gray-700 dark:text-gray-300 text-center">
                 Create your admin account to get started
               </p>
             </div>
diff --git a/frontend/ai.client/src/app/auth/login/login.page.css b/frontend/ai.client/src/app/auth/login/login.page.css
index da8ffc08..6167c8d4 100644
--- a/frontend/ai.client/src/app/auth/login/login.page.css
+++ b/frontend/ai.client/src/app/auth/login/login.page.css
@@ -10,7 +10,7 @@
     var(--color-gray-50);
 }
 
-html.dark .login-shell {
+:host-context(html.dark) .login-shell {
   background:
     radial-gradient(120% 80% at 0% 0%, color-mix(in oklab, var(--color-primary-900) 50%, black) 0%, transparent 60%),
     radial-gradient(120% 80% at 100% 100%, color-mix(in oklab, var(--color-primary-800) 35%, black) 0%, transparent 55%),
@@ -24,53 +24,122 @@ html.dark .login-shell {
   pointer-events: none;
 }
 
+/* The .login-lava wrapper holds the morphing blobs. Keeping it as a separate
+   layer lets us isolate transforms and (in the future) drop a goo SVG filter
+   on it without affecting the grid overlay. */
+.login-lava {
+  position: absolute;
+  inset: 0;
+  overflow: hidden;
+}
+
 .login-blob {
   position: absolute;
-  border-radius: 9999px;
-  filter: blur(80px);
-  opacity: 0.55;
-  will-change: transform;
+  will-change: transform, border-radius;
+  /* Asymmetric border-radius gives an organic, non-circular blob silhouette;
+     keyframes morph these values to make the surface "wobble" as it rises. */
+  border-radius: 58% 42% 60% 40% / 50% 55% 45% 50%;
 }
 
+/* ----- Far tier: huge, slow, heavy blur, low opacity ----- */
 .login-blob--a {
-  width: 60vw;
-  height: 60vw;
-  max-width: 720px;
-  max-height: 720px;
-  top: -15vw;
-  left: -10vw;
-  background: radial-gradient(circle at 30% 30%, var(--color-primary-400), var(--color-primary-600) 60%, transparent 75%);
-  animation: login-drift-a 18s ease-in-out infinite alternate;
+  width: 70vw;
+  height: 86vw;
+  max-width: 880px;
+  max-height: 1080px;
+  bottom: -38vw;
+  left: -18vw;
+  filter: blur(110px);
+  opacity: 0.4;
+  background: radial-gradient(circle at 35% 35%, var(--color-primary-400), var(--color-primary-700) 60%, transparent 78%);
+  animation:
+    login-rise-a 52s ease-in-out infinite alternate,
+    login-morph-a 28s ease-in-out infinite alternate;
 }
 
 .login-blob--b {
-  width: 55vw;
-  height: 55vw;
-  max-width: 640px;
-  max-height: 640px;
-  bottom: -18vw;
-  right: -12vw;
-  background: radial-gradient(circle at 70% 70%, var(--color-primary-500), var(--color-primary-800) 65%, transparent 80%);
-  opacity: 0.45;
-  animation: login-drift-b 22s ease-in-out infinite alternate;
+  width: 62vw;
+  height: 76vw;
+  max-width: 800px;
+  max-height: 960px;
+  top: -34vw;
+  right: -20vw;
+  filter: blur(100px);
+  opacity: 0.36;
+  background: radial-gradient(circle at 65% 65%, var(--color-primary-500), var(--color-primary-800) 65%, transparent 82%);
+  animation:
+    login-rise-b 60s ease-in-out infinite alternate,
+    login-morph-b 32s ease-in-out infinite alternate;
 }
 
+/* ----- Mid tier: medium, moderate speed/blur ----- */
 .login-blob--c {
-  width: 38vw;
-  height: 38vw;
-  max-width: 480px;
-  max-height: 480px;
-  top: 35%;
-  left: 50%;
-  transform: translate(-50%, -50%);
-  background: radial-gradient(circle, color-mix(in oklab, var(--color-primary-300) 70%, white), transparent 70%);
-  opacity: 0.35;
-  animation: login-drift-c 26s ease-in-out infinite alternate;
-}
-
-html.dark .login-blob--a { opacity: 0.45; }
-html.dark .login-blob--b { opacity: 0.4; }
-html.dark .login-blob--c { opacity: 0.25; }
+  width: 32vw;
+  height: 40vw;
+  max-width: 420px;
+  max-height: 520px;
+  top: 28%;
+  left: 42%;
+  filter: blur(60px);
+  opacity: 0.5;
+  background: radial-gradient(circle, color-mix(in oklab, var(--color-primary-300) 75%, white), transparent 72%);
+  animation:
+    login-rise-c 30s ease-in-out infinite alternate,
+    login-morph-c 18s ease-in-out infinite alternate;
+}
+
+.login-blob--d {
+  width: 28vw;
+  height: 36vw;
+  max-width: 360px;
+  max-height: 460px;
+  bottom: -12vw;
+  right: 18vw;
+  filter: blur(55px);
+  opacity: 0.55;
+  background: radial-gradient(circle at 50% 50%, var(--color-primary-300), var(--color-primary-500) 60%, transparent 80%);
+  animation:
+    login-rise-d 26s ease-in-out infinite alternate,
+    login-morph-a 16s ease-in-out infinite alternate -3s;
+}
+
+/* ----- Near tier: small, fast, sharper, more opaque ----- */
+.login-blob--e {
+  width: 16vw;
+  height: 22vw;
+  max-width: 220px;
+  max-height: 300px;
+  top: -6vw;
+  left: 32vw;
+  filter: blur(32px);
+  opacity: 0.65;
+  background: radial-gradient(circle at 50% 50%, var(--color-primary-400), var(--color-primary-700) 65%, transparent 82%);
+  animation:
+    login-rise-e 14s ease-in-out infinite alternate,
+    login-morph-b 11s ease-in-out infinite alternate -5s;
+}
+
+.login-blob--f {
+  width: 12vw;
+  height: 16vw;
+  max-width: 160px;
+  max-height: 220px;
+  bottom: -4vw;
+  left: 14vw;
+  filter: blur(26px);
+  opacity: 0.7;
+  background: radial-gradient(circle at 45% 45%, var(--color-primary-300), var(--color-primary-600) 65%, transparent 84%);
+  animation:
+    login-rise-f 11s ease-in-out infinite alternate,
+    login-morph-c 9s ease-in-out infinite alternate -2s;
+}
+
+:host-context(html.dark) .login-blob--a { opacity: 0.32; }
+:host-context(html.dark) .login-blob--b { opacity: 0.28; }
+:host-context(html.dark) .login-blob--c { opacity: 0.38; }
+:host-context(html.dark) .login-blob--d { opacity: 0.42; }
+:host-context(html.dark) .login-blob--e { opacity: 0.5; }
+:host-context(html.dark) .login-blob--f { opacity: 0.55; }
 
 .login-grid {
   position: absolute;
@@ -84,31 +153,80 @@ html.dark .login-blob--c { opacity: 0.25; }
   opacity: 0.6;
 }
 
-html.dark .login-grid {
+:host-context(html.dark) .login-grid {
   background-image:
     linear-gradient(to right, color-mix(in oklab, var(--color-primary-300) 6%, transparent) 1px, transparent 1px),
     linear-gradient(to bottom, color-mix(in oklab, var(--color-primary-300) 6%, transparent) 1px, transparent 1px);
   opacity: 0.5;
 }
 
-@keyframes login-drift-a {
-  from { transform: translate3d(0, 0, 0) scale(1); }
-  to   { transform: translate3d(4vw, 3vw, 0) scale(1.08); }
+/* Rise/fall trajectories — primary motion is vertical with gentle horizontal
+   sway and squish/stretch via non-uniform scale. Travel distance scales with
+   depth: far tier barely budges, mid tier drifts, near tier traverses most
+   of the viewport — that contrast is what sells the parallax effect. */
+
+/* Far: minimal travel, lazy sway */
+@keyframes login-rise-a {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(2vw, -12vh, 0) scale(1.04, 0.96) rotate(4deg); }
+  100% { transform: translate3d(-1vw, -22vh, 0) scale(0.97, 1.05) rotate(-3deg); }
+}
+@keyframes login-rise-b {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(-2vw, 10vh, 0) scale(0.96, 1.05) rotate(-4deg); }
+  100% { transform: translate3d(1vw, 20vh, 0) scale(1.05, 0.96) rotate(3deg); }
+}
+
+/* Mid: moderate travel */
+@keyframes login-rise-c {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(-5vw, -25vh, 0) scale(1.1, 0.94) rotate(-10deg); }
+  100% { transform: translate3d(4vw, -50vh, 0) scale(0.92, 1.1) rotate(8deg); }
+}
+@keyframes login-rise-d {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(6vw, -35vh, 0) scale(1.05, 0.95) rotate(12deg); }
+  100% { transform: translate3d(-3vw, -68vh, 0) scale(0.92, 1.08) rotate(-7deg); }
+}
+
+/* Near: dramatic travel, snappy squish/stretch */
+@keyframes login-rise-e {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(8vw, 55vh, 0) scale(0.88, 1.14) rotate(-18deg); }
+  100% { transform: translate3d(-6vw, 100vh, 0) scale(1.16, 0.86) rotate(14deg); }
+}
+@keyframes login-rise-f {
+  0%   { transform: translate3d(0, 0, 0) scale(1, 1) rotate(0deg); }
+  50%  { transform: translate3d(-9vw, -55vh, 0) scale(1.18, 0.84) rotate(20deg); }
+  100% { transform: translate3d(7vw, -105vh, 0) scale(0.85, 1.18) rotate(-16deg); }
+}
+
+/* Morphing border-radius makes each blob's surface wobble independently of
+   its trajectory — the signature lava-lamp "skin" deformation. */
+@keyframes login-morph-a {
+  0%   { border-radius: 58% 42% 60% 40% / 50% 55% 45% 50%; }
+  50%  { border-radius: 42% 58% 38% 62% / 60% 40% 60% 40%; }
+  100% { border-radius: 50% 50% 65% 35% / 45% 55% 50% 50%; }
 }
-@keyframes login-drift-b {
-  from { transform: translate3d(0, 0, 0) scale(1); }
-  to   { transform: translate3d(-3vw, -4vw, 0) scale(1.1); }
+@keyframes login-morph-b {
+  0%   { border-radius: 50% 50% 40% 60% / 55% 45% 55% 45%; }
+  50%  { border-radius: 65% 35% 55% 45% / 40% 60% 40% 60%; }
+  100% { border-radius: 38% 62% 50% 50% / 60% 50% 50% 40%; }
 }
-@keyframes login-drift-c {
-  from { transform: translate(-50%, -50%) scale(1); }
-  to   { transform: translate(-48%, -52%) scale(1.06); }
+@keyframes login-morph-c {
+  0%   { border-radius: 60% 40% 50% 50% / 45% 60% 40% 55%; }
+  50%  { border-radius: 40% 60% 65% 35% / 55% 40% 60% 45%; }
+  100% { border-radius: 55% 45% 38% 62% / 50% 55% 45% 50%; }
 }
 
 @media (prefers-reduced-motion: reduce) {
   .login-blob,
   .login-blob--a,
   .login-blob--b,
-  .login-blob--c {
+  .login-blob--c,
+  .login-blob--d,
+  .login-blob--e,
+  .login-blob--f {
     animation: none;
   }
 }
@@ -125,7 +243,7 @@ html.dark .login-grid {
     0 8px 24px -12px rgba(0, 0, 0, 0.15);
 }
 
-html.dark .login-card {
+:host-context(html.dark) .login-card {
   background: color-mix(in oklab, var(--color-gray-900) 55%, transparent);
   border-color: color-mix(in oklab, white 12%, transparent);
   box-shadow:
@@ -142,6 +260,6 @@ html.dark .login-card {
   border-radius: 9999px;
 }
 
-html.dark .login-divider-text {
+:host-context(html.dark) .login-divider-text {
   background: color-mix(in oklab, var(--color-gray-900) 70%, transparent);
 }
diff --git a/frontend/ai.client/src/app/auth/login/login.page.ts b/frontend/ai.client/src/app/auth/login/login.page.ts
index 1b1ca589..a92a5d3e 100644
--- a/frontend/ai.client/src/app/auth/login/login.page.ts
+++ b/frontend/ai.client/src/app/auth/login/login.page.ts
@@ -26,11 +26,21 @@ interface AuthProviderPublicListResponse {
   changeDetection: ChangeDetectionStrategy.OnPush,
   template: `
     <div class="login-shell fixed inset-0 flex items-center justify-center overflow-y-auto">
-      <!-- Decorative background: large primary-color blobs with soft blur -->
+      <!-- Decorative background: lava-lamp blobs across three depth tiers
+           (far/mid/near) for parallax — size, blur, speed, and travel
+           distance all scale with depth. -->
       <div class="login-bg" aria-hidden="true">
-        <div class="login-blob login-blob--a"></div>
-        <div class="login-blob login-blob--b"></div>
-        <div class="login-blob login-blob--c"></div>
+        <div class="login-lava">
+          <!-- Far layer: huge, slow, heavily blurred -->
+          <div class="login-blob login-blob--a"></div>
+          <div class="login-blob login-blob--b"></div>
+          <!-- Mid layer -->
+          <div class="login-blob login-blob--c"></div>
+          <div class="login-blob login-blob--d"></div>
+          <!-- Near layer: small, fast, sharper -->
+          <div class="login-blob login-blob--e"></div>
+          <div class="login-blob login-blob--f"></div>
+        </div>
         <div class="login-grid"></div>
       </div>
 
diff --git a/frontend/ai.client/src/app/auth/session.service.spec.ts b/frontend/ai.client/src/app/auth/session.service.spec.ts
index 6c96a425..a9d039a7 100644
--- a/frontend/ai.client/src/app/auth/session.service.spec.ts
+++ b/frontend/ai.client/src/app/auth/session.service.spec.ts
@@ -21,8 +21,37 @@ describe('SessionService', () => {
     csrf_token: 'csrf-secret-abc',
   };
 
+  // Helpers — bootstrap takes the network path only when the JS-readable
+  // CSRF cookie is present; otherwise the fast-path bounces straight to
+  // login. Tests that exercise the network path set the cookie first.
+  //
+  // jsdom enforces `__Host-` cookie prefix rules (Secure is required, which
+  // http://localhost cannot satisfy), so a real `document.cookie` write is
+  // silently rejected. Install a minimal one-cookie shim before each test.
+  let cookieStore = '';
+  const installCookieShim = () => {
+    cookieStore = '';
+    Object.defineProperty(document, 'cookie', {
+      configurable: true,
+      get: () => cookieStore,
+      set: (input: string) => {
+        const [pair, ...attrs] = input.split(';');
+        const expired = attrs.some((a) => /expires=Thu, 01 Jan 1970/i.test(a));
+        cookieStore = expired ? '' : pair.trim();
+      },
+    });
+  };
+  const setCsrfCookie = (value = 'test-csrf-token') => {
+    document.cookie = `__Host-bff_csrf=${value}; path=/`;
+  };
+  const clearCsrfCookie = () => {
+    document.cookie =
+      '__Host-bff_csrf=; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT';
+  };
+
   beforeEach(() => {
     TestBed.resetTestingModule();
+    installCookieShim();
 
     Object.defineProperty(window, 'location', {
       value: {
@@ -53,6 +82,7 @@ describe('SessionService', () => {
 
   afterEach(() => {
     httpMock.verify();
+    clearCsrfCookie();
     TestBed.resetTestingModule();
     vi.restoreAllMocks();
   });
@@ -68,6 +98,7 @@ describe('SessionService', () => {
 
   describe('bootstrap', () => {
     it('populates user and csrfToken on a successful 200', async () => {
+      setCsrfCookie();
       const promise = service.bootstrap();
 
       const req = httpMock.expectOne('http://localhost:8000/auth/session');
@@ -94,6 +125,7 @@ describe('SessionService', () => {
     });
 
     it('redirects to the SPA /auth/login page on 401 and clears state', async () => {
+      setCsrfCookie();
       window.location.pathname = '/admin/users';
       window.location.search = '?tab=roles';
 
@@ -120,6 +152,7 @@ describe('SessionService', () => {
     });
 
     it('does not redirect when the 401 lands on /auth/login itself', async () => {
+      setCsrfCookie();
       window.location.pathname = '/auth/login';
       window.location.search = '';
 
@@ -137,6 +170,7 @@ describe('SessionService', () => {
     });
 
     it('does not redirect on a non-401 transport error', async () => {
+      setCsrfCookie();
       const promise = service.bootstrap();
 
       const req = httpMock.expectOne('http://localhost:8000/auth/session');
@@ -151,6 +185,7 @@ describe('SessionService', () => {
     });
 
     it('uses same-origin path when appApiUrl is configured as /api', async () => {
+      setCsrfCookie();
       (configService.appApiUrl as any).set('/api');
 
       const promise = service.bootstrap();
@@ -164,6 +199,7 @@ describe('SessionService', () => {
     });
 
     it('strips a trailing slash from appApiUrl', async () => {
+      setCsrfCookie();
       (configService.appApiUrl as any).set('http://localhost:8000/');
 
       const promise = service.bootstrap();
@@ -175,6 +211,150 @@ describe('SessionService', () => {
 
       expect(service.isAuthenticated()).toBe(true);
     });
+
+    it('skips the /auth/session round-trip and redirects when no CSRF cookie is present', async () => {
+      window.location.pathname = '/files';
+      window.location.search = '';
+      // Cookie absent (the shim store is reset in beforeEach). The fast-path
+      // should detect this and bounce without making any HTTP request.
+      void service.bootstrap();
+
+      await new Promise(resolve => setTimeout(resolve, 0));
+
+      httpMock.expectNone('http://localhost:8000/auth/session');
+      expect(service.bootstrapped()).toBe(false);
+      expect(window.location.href).toBe(
+        `/auth/login?returnUrl=${encodeURIComponent('/files')}`,
+      );
+    });
+
+    it('still resolves on /auth/login when the CSRF cookie is absent', async () => {
+      window.location.pathname = '/auth/login';
+      window.location.search = '';
+      // No cookie — fast-path triggers, but handleUnauthorized returns
+      // false on /auth/login so bootstrap completes without hanging.
+      await service.bootstrap();
+
+      httpMock.expectNone('http://localhost:8000/auth/session');
+      expect(service.bootstrapped()).toBe(true);
+      expect(window.location.href).toBe('');
+    });
+  });
+
+  describe('handleUnauthorized', () => {
+    it('redirects to /auth/login with the current path as returnUrl', () => {
+      window.location.pathname = '/manage-sessions';
+      window.location.search = '?id=abc';
+
+      const navigated = service.handleUnauthorized();
+
+      expect(navigated).toBe(true);
+      expect(window.location.href).toBe(
+        `/auth/login?returnUrl=${encodeURIComponent('/manage-sessions?id=abc')}`,
+      );
+      expect(service.user()).toBeNull();
+      expect(service.csrfToken()).toBeNull();
+    });
+
+    it('is a no-op when already on /auth/login', () => {
+      window.location.pathname = '/auth/login';
+
+      const navigated = service.handleUnauthorized();
+
+      expect(navigated).toBe(false);
+      expect(window.location.href).toBe('');
+    });
+
+    it('dedupes concurrent calls — only the first navigates', () => {
+      window.location.pathname = '/files';
+
+      expect(service.handleUnauthorized()).toBe(true);
+
+      // Mid-burst 401s shouldn't queue more navigations.
+      window.location.href = ''; // reset href so a second navigation would be detectable
+      expect(service.handleUnauthorized()).toBe(false);
+      expect(window.location.href).toBe('');
+    });
+  });
+
+  describe('hasSessionCookie', () => {
+    it('returns false when the cookie is absent', () => {
+      expect(service.hasSessionCookie()).toBe(false);
+    });
+
+    it('returns true when the cookie is present', () => {
+      setCsrfCookie();
+      expect(service.hasSessionCookie()).toBe(true);
+    });
+  });
+
+  describe('recheck', () => {
+    const bootstrapAuthenticated = async () => {
+      setCsrfCookie();
+      const promise = service.bootstrap();
+      httpMock.expectOne('http://localhost:8000/auth/session').flush(sessionResponse);
+      await promise;
+    };
+
+    it('is a no-op before bootstrap has resolved', async () => {
+      setCsrfCookie();
+      await service.recheck();
+      httpMock.expectNone('http://localhost:8000/auth/session');
+    });
+
+    it('refreshes session state on a successful 200', async () => {
+      await bootstrapAuthenticated();
+      window.location.pathname = '/files';
+
+      const promise = service.recheck();
+      const req = httpMock.expectOne('http://localhost:8000/auth/session');
+      req.flush({ ...sessionResponse, csrf_token: 'rotated-csrf' });
+      await promise;
+
+      expect(service.csrfToken()).toBe('rotated-csrf');
+      expect(service.isAuthenticated()).toBe(true);
+      expect(window.location.href).toBe('');
+    });
+
+    it('redirects when the BFF returns 401', async () => {
+      await bootstrapAuthenticated();
+      window.location.pathname = '/files';
+
+      const promise = service.recheck();
+      const req = httpMock.expectOne('http://localhost:8000/auth/session');
+      req.flush('unauthorized', { status: 401, statusText: 'Unauthorized' });
+      await promise;
+
+      expect(window.location.href).toBe(
+        `/auth/login?returnUrl=${encodeURIComponent('/files')}`,
+      );
+    });
+
+    it('redirects without a network call when the CSRF cookie is gone', async () => {
+      await bootstrapAuthenticated();
+      window.location.pathname = '/files';
+      clearCsrfCookie();
+
+      await service.recheck();
+
+      httpMock.expectNone('http://localhost:8000/auth/session');
+      expect(window.location.href).toBe(
+        `/auth/login?returnUrl=${encodeURIComponent('/files')}`,
+      );
+    });
+
+    it('stays silent on a transient network error', async () => {
+      await bootstrapAuthenticated();
+      window.location.pathname = '/files';
+
+      const promise = service.recheck();
+      const req = httpMock.expectOne('http://localhost:8000/auth/session');
+      req.error(new ProgressEvent('network'), { status: 0, statusText: '' });
+      await promise;
+
+      expect(window.location.href).toBe('');
+      expect(service.isAuthenticated()).toBe(true);
+    });
   });
 
   describe('csrfHeaders', () => {
@@ -183,6 +363,7 @@ describe('SessionService', () => {
     });
 
     it('returns X-CSRF-Token after a successful bootstrap', async () => {
+      setCsrfCookie();
       const promise = service.bootstrap();
       httpMock.expectOne('http://localhost:8000/auth/session').flush(sessionResponse);
       await promise;
@@ -239,6 +420,7 @@ describe('SessionService', () => {
 
   describe('logout', () => {
     it('POSTs /auth/logout, clears local state, and navigates to the Cognito logout URL', async () => {
+      setCsrfCookie();
       const bootPromise = service.bootstrap();
       httpMock.expectOne('http://localhost:8000/auth/session').flush(sessionResponse);
       await bootPromise;
@@ -264,6 +446,7 @@ describe('SessionService', () => {
     });
 
     it('does not navigate when post_logout_url is null', async () => {
+      setCsrfCookie();
       const bootPromise = service.bootstrap();
       httpMock.expectOne('http://localhost:8000/auth/session').flush(sessionResponse);
       await bootPromise;
@@ -279,6 +462,7 @@ describe('SessionService', () => {
     });
 
     it('clears local state even when /auth/logout fails', async () => {
+      setCsrfCookie();
       const bootPromise = service.bootstrap();
       httpMock.expectOne('http://localhost:8000/auth/session').flush(sessionResponse);
       await bootPromise;
diff --git a/frontend/ai.client/src/app/auth/session.service.ts b/frontend/ai.client/src/app/auth/session.service.ts
index cbb78cd5..c416bd85 100644
--- a/frontend/ai.client/src/app/auth/session.service.ts
+++ b/frontend/ai.client/src/app/auth/session.service.ts
@@ -35,6 +35,12 @@ export class SessionService {
   private readonly _csrfToken = signal<string | null>(null);
   private readonly _bootstrapped = signal(false);
 
+  /**
+   * Latch flipped the first time we trigger a 401 redirect, so concurrent
+   * 401s from in-flight requests don't queue multiple navigations.
+   */
+  private redirecting = false;
+
   /** Current BFF session user, or null when no session is active. */
   readonly user = this._user.asReadonly();
 
@@ -55,11 +61,28 @@ export class SessionService {
    * to the SPA's `/auth/login` page (unless we're already there) so the
    * user can pick a provider before we hand off to Cognito Hosted UI.
    *
+   * Fast path: the JS-readable `__Host-bff_csrf` cookie shares its
+   * lifetime with the httpOnly session cookie (BFF sets and clears them
+   * together). If it's gone, the session is gone too — skip the round-trip
+   * and bounce straight to login.
+   *
    * Network errors leave the service in a clean unauthenticated state
    * without redirecting — a transient failure shouldn't kick the user
    * out.
    */
   async bootstrap(): Promise<void> {
+    if (!this.hasSessionCookie()) {
+      this._user.set(null);
+      this._csrfToken.set(null);
+      if (this.handleUnauthorized()) {
+        // Hang so APP_INITIALIZER blocks the SPA from rendering a route
+        // before the queued navigation tears the page down.
+        await new Promise<never>(() => {});
+      }
+      this._bootstrapped.set(true);
+      return;
+    }
+
     const url = `${this.baseUrl()}/auth/session`;
     try {
       const response = await firstValueFrom(
@@ -72,24 +95,88 @@ export class SessionService {
       this._user.set(null);
       this._csrfToken.set(null);
       if (error instanceof HttpErrorResponse && error.status === 401) {
-        if (window.location.pathname === '/auth/login') {
-          // Already on the SPA login page — let it render so the user
-          // can pick a provider.
-          return;
+        if (this.handleUnauthorized()) {
+          await new Promise<never>(() => {});
         }
-        const returnUrl = `${window.location.pathname}${window.location.search}`;
-        const params = new URLSearchParams({ returnUrl });
-        window.location.href = `/auth/login?${params.toString()}`;
-        // window.location.href only queues the navigation. Hang the promise
-        // so APP_INITIALIZER blocks the SPA from rendering a route before
-        // the browser tears the page down.
-        await new Promise<never>(() => {});
       }
     } finally {
       this._bootstrapped.set(true);
     }
   }
 
+  /**
+   * Centralized 401 handler. Clears local session state and navigates the
+   * browser to the SPA's `/auth/login` page with a `returnUrl` so the user
+   * lands back where they were after re-auth.
+   *
+   * Idempotent — a concurrent burst of 401s from in-flight requests only
+   * queues a single navigation. Skipped (returns false) when we're already
+   * on `/auth/login` so the page can render.
+   *
+   * @returns true when a navigation was queued, false otherwise. Bootstrap
+   *   uses the boolean to decide whether to hang APP_INITIALIZER.
+   */
+  handleUnauthorized(): boolean {
+    if (this.redirecting) return false;
+    if (window.location.pathname === '/auth/login') return false;
+    this.redirecting = true;
+    this._user.set(null);
+    this._csrfToken.set(null);
+    const returnUrl = `${window.location.pathname}${window.location.search}`;
+    const params = new URLSearchParams({ returnUrl });
+    window.location.href = `/auth/login?${params.toString()}`;
+    return true;
+  }
+
+  /**
+   * Re-probe the session against the BFF. Called by the app shell on tab
+   * refocus so a session that expired while the tab was backgrounded
+   * surfaces immediately instead of on the next user action.
+   *
+   * Cookie-presence check first — if the JS-readable CSRF cookie is gone,
+   * the session cookie is gone too (they share lifetime), so we redirect
+   * without spending a round-trip. Otherwise it hits `/auth/session`, which
+   * has no side effects on the BFF (it neither slides nor refreshes).
+   *
+   * Network errors are silent — a transient failure mid-tab-focus is not a
+   * reason to bounce the user.
+   */
+  async recheck(): Promise<void> {
+    if (this.redirecting) return;
+    if (!this._bootstrapped()) return;
+    if (!this.hasSessionCookie()) {
+      this.handleUnauthorized();
+      return;
+    }
+    const url = `${this.baseUrl()}/auth/session`;
+    try {
+      const response = await firstValueFrom(
+        this.http.get<BffSessionResponse>(url, { withCredentials: true }),
+      );
+      const { csrf_token, ...user } = response;
+      this._user.set(user);
+      this._csrfToken.set(csrf_token);
+    } catch (error) {
+      if (error instanceof HttpErrorResponse && error.status === 401) {
+        this.handleUnauthorized();
+      }
+    }
+  }
+
+  /**
+   * True when the JS-readable `__Host-bff_csrf` cookie is present. The BFF
+   * sets and clears it alongside the httpOnly session cookie with the same
+   * `Max-Age`, so absence ⇒ session is also gone. Presence is a weak
+   * positive — server-side revocation can leave the cookie behind until
+   * its TTL elapses — so callers should still verify with the BFF.
+   */
+  hasSessionCookie(): boolean {
+    if (typeof document === 'undefined') return true; // no DOM to inspect: defer to the BFF check
+    return document.cookie
+      .split(';')
+      .some((entry) => entry.trim().startsWith('__Host-bff_csrf='));
+  }
+
   /**
    * Navigate the browser to `{appApiUrl}/auth/login`. The BFF stashes
    * a state cookie and 302s on to Cognito Hosted UI.
diff --git a/frontend/ai.client/src/app/components/model-settings/model-settings.html b/frontend/ai.client/src/app/components/model-settings/model-settings.html
index ed791784..9466b05c 100644
--- a/frontend/ai.client/src/app/components/model-settings/model-settings.html
+++ b/frontend/ai.client/src/app/components/model-settings/model-settings.html
@@ -153,10 +153,6 @@ <h3 class="text-sm/6 font-medium text-gray-900 dark:text-white">Advanced</h3>
 
           @if (isAdvancedOpen()) {
             <div id="advanced-params-body" class="space-y-4 px-4 pb-4">
-              <p class="text-xs/5 text-gray-500 dark:text-gray-400">
-                Per-model inference parameters. Values are clamped to the model's allowed range on the server.
-              </p>
-
               @for (row of advancedRows(); track row.key) {
                 <div class="space-y-1.5">
                   <div class="flex items-baseline justify-between gap-2">
@@ -179,9 +175,7 @@ <h3 class="text-sm/6 font-medium text-gray-900 dark:text-white">Advanced</h3>
                       </button>
                     }
                   </div>
-                  <p
-                    [id]="'param-desc-' + row.key"
-                    class="text-xs/5 text-gray-500 dark:text-gray-400">
+                  <p [id]="'param-desc-' + row.key" class="sr-only">
                     {{ row.meta.description }}
                   </p>
 
@@ -253,6 +247,19 @@ <h3 class="text-sm/6 font-medium text-gray-900 dark:text-white">Advanced</h3>
                           class="w-32 rounded-sm border border-gray-300 bg-white px-2 py-1 text-sm/5 text-gray-900 focus-visible:outline-2 focus-visible:-outline-offset-2 focus-visible:outline-primary-600 disabled:cursor-not-allowed disabled:bg-gray-50 disabled:text-gray-500 dark:border-white/10 dark:bg-white/5 dark:text-white dark:disabled:bg-white/0" />
                       </div>
                     }
+                    @case ('select') {
+                      <select
+                        [id]="'param-input-' + row.key"
+                        [attr.aria-describedby]="'param-desc-' + row.key"
+                        [disabled]="row.locked || row.disabledByConflict"
+                        (change)="onParamSelectChange(row, $event)"
+                        class="w-40 rounded-sm border border-gray-300 bg-white px-2 py-1 text-sm/5 text-gray-900 focus-visible:outline-2 focus-visible:-outline-offset-2 focus-visible:outline-primary-600 disabled:cursor-not-allowed disabled:bg-gray-50 disabled:text-gray-500 dark:border-white/10 dark:bg-white/5 dark:text-white dark:disabled:bg-white/0">
+                        <option value="" [selected]="row.value === null || row.value === undefined">Use admin default</option>
+                        @for (lvl of row.spec.allowed ?? []; track lvl) {
+                          <option [value]="lvl" [selected]="row.value === lvl">{{ lvl }}</option>
+                        }
+                      </select>
+                    }
                     @default {
                       <input
                         type="number"
diff --git a/frontend/ai.client/src/app/components/model-settings/model-settings.ts b/frontend/ai.client/src/app/components/model-settings/model-settings.ts
index bb1faba2..8996ae27 100644
--- a/frontend/ai.client/src/app/components/model-settings/model-settings.ts
+++ b/frontend/ai.client/src/app/components/model-settings/model-settings.ts
@@ -309,6 +309,20 @@ export class ModelSettings {
     this.modelService.setInferenceParamOverride(row.key, next);
   }
 
+  /**
+   * Enum-select params (e.g. `effort`). The empty option clears the override
+   * (fall back to the admin default), mirroring how emptying a number input
+   * clears it. Any non-empty value is sent verbatim; the server gates it
+   * against the model's `allowed` set, so an out-of-domain value can't slip
+   * through even if the option list is momentarily stale.
+   */
+  onParamSelectChange(row: AdvancedParamRow, event: Event): void {
+    if (row.locked || row.disabledByConflict) return;
+    const target = event.target as HTMLSelectElement | null;
+    const raw = target?.value ?? '';
+    this.modelService.setInferenceParamOverride(row.key, raw === '' ? null : raw);
+  }
+
   /**
    * Extended thinking enable/disable. The stored value is `null` (off) or an
    * int budget (on). Default budget falls back to the admin default, then to
diff --git a/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.css b/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.css
index 6ac7e403..1ee9fd99 100644
--- a/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.css
+++ b/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.css
@@ -17,3 +17,25 @@
     transform: scale(1);
   }
 }
+
+/* Loaded-state entry animation */
+.session-list-enter {
+  animation: session-list-enter 280ms cubic-bezier(0.16, 1, 0.3, 1);
+}
+
+@keyframes session-list-enter {
+  from {
+    opacity: 0;
+    transform: translateY(4px);
+  }
+  to {
+    opacity: 1;
+    transform: translateY(0);
+  }
+}
+
+@media (prefers-reduced-motion: reduce) {
+  .session-list-enter {
+    animation: none;
+  }
+}
diff --git a/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.html b/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.html
index 9a1a55e9..07eab159 100644
--- a/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.html
+++ b/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.html
@@ -1,24 +1,39 @@
 <nav class="flex h-full flex-col">
     @if (isLoading()) {
-        <div class="flex items-center justify-center py-8">
-            <span class="text-sm text-gray-500 dark:text-gray-400">Loading sessions...</span>
+        <!-- Loading skeleton: mirrors the real list's group + row layout -->
+        <div class="flex flex-col gap-y-3" aria-hidden="true">
+            @for (group of skeletonGroups; track $index) {
+                <section>
+                    <div class="px-2 pb-0.5">
+                        <div class="h-3 w-12 rounded-sm bg-gray-200 animate-pulse motion-reduce:animate-none dark:bg-white/5"></div>
+                    </div>
+                    <ul role="list" class="flex flex-col gap-y-0.5">
+                        @for (width of group; track $index) {
+                            <li class="px-2 py-1.5">
+                                <div class="h-4 rounded-md bg-gray-200 animate-pulse motion-reduce:animate-none dark:bg-white/5" [style.width]="width"></div>
+                            </li>
+                        }
+                    </ul>
+                </section>
+            }
         </div>
+        <span class="sr-only" role="status">Loading sessions</span>
     } @else if (error()) {
         <div class="flex items-center justify-center py-8">
             <span class="text-sm text-red-500 dark:text-red-400">Failed to load sessions</span>
         </div>
     } @else {
         @if (groupedSessions().length > 0) {
-            <div class="flex flex-col gap-y-4">
+            <div class="session-list-enter flex flex-col gap-y-3">
                 @for (group of groupedSessions(); track group.label) {
                     <section>
-                        <h3 class="px-2 pb-1 text-[11px]/5 font-medium text-gray-400 dark:text-gray-500">{{ group.label }}</h3>
-                        <ul role="list" class="flex flex-col gap-y-1">
+                        <h3 class="px-2 pb-0.5 text-[11px]/5 font-medium text-gray-400 dark:text-gray-500">{{ group.label }}</h3>
+                        <ul role="list" class="flex flex-col gap-y-0.5">
                             @for (session of group.sessions; track session.sessionId) {
                                 <li class="group relative">
                                     @if (renamingSessionId() === session.sessionId) {
                                         <!-- Inline rename input -->
-                                        <div class="flex items-center gap-1 rounded-md px-2 py-1.5">
+                                        <div class="flex items-center gap-1 rounded-md px-2 py-1">
                                             <input
                                                 #renameInput
                                                 type="text"
@@ -26,22 +41,19 @@ <h3 class="px-2 pb-1 text-[11px]/5 font-medium text-gray-400 dark:text-gray-500"
                                                 (input)="renameValue.set(renameInput.value)"
                                                 (keydown)="onRenameKeydown($event, session)"
                                                 (blur)="onRenameSubmit(session)"
-                                                class="min-w-0 flex-1 rounded-md border border-gray-300 bg-white px-2 py-1 text-sm/6 font-medium text-gray-900 outline-none focus:border-primary-500 focus:ring-1 focus:ring-primary-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:focus:border-primary-400 dark:focus:ring-primary-400"
+                                                class="min-w-0 flex-1 rounded-md border border-gray-300 bg-white px-2 py-1 text-sm/5 font-medium text-gray-900 outline-none focus:border-primary-500 focus:ring-1 focus:ring-primary-500 dark:border-gray-600 dark:bg-gray-800 dark:text-white dark:focus:border-primary-400 dark:focus:ring-primary-400"
                                                 aria-label="Rename conversation"
                                             />
                                         </div>
                                     } @else {
                                         <a
                                             [routerLink]="['/s', getSessionId(session.sessionId)]"
-                                            routerLinkActive="bg-gray-200 !text-secondary-500 dark:bg-white/5 dark:!text-white"
+                                            [queryParams]="getSessionQueryParams(session)"
+                                            routerLinkActive="bg-gray-200 !font-medium !text-secondary-500 dark:bg-white/5 dark:!text-white"
                                             (click)="onSessionClick()"
-                                            class="flex gap-x-3 rounded-md px-2 py-2 pr-10 text-sm/6 font-medium text-gray-700 hover:bg-gray-50 hover:text-secondary-500 dark:text-gray-400 dark:hover:bg-white/5 dark:hover:text-white"
+                                            class="block truncate rounded-md px-2 py-1.5 pr-10 text-sm/5 font-normal text-gray-700 hover:bg-gray-50 hover:text-secondary-500 dark:text-gray-400 dark:hover:bg-white/5 dark:hover:text-white"
                                         >
-                                            <div class="flex min-w-0 flex-1 flex-col gap-y-1">
-                                                <div class="min-w-0 flex-auto truncate">
-                                                    <span>{{ getSessionTitle(session) }}</span>
-                                                </div>
-                                            </div>
+                                            {{ getSessionTitle(session) }}
                                         </a>
                                         <!-- Ellipsis menu button - visible on hover with fade-in animation -->
                                         <button
diff --git a/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.ts b/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.ts
index f104758e..c235424b 100644
--- a/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.ts
+++ b/frontend/ai.client/src/app/components/sidenav/components/session-list/session-list.ts
@@ -49,6 +49,16 @@ export class SessionList {
    */
   protected renameValue = signal('');
 
+  /**
+   * Placeholder row widths used by the loading skeleton.
+   * Each inner array is a "group" (e.g. Today, Yesterday) and each string
+   * is the width of one row, varied to mimic real session-title lengths.
+   */
+  protected readonly skeletonGroups: readonly (readonly string[])[] = [
+    ['78%', '52%', '64%'],
+    ['70%', '58%'],
+  ];
+
   /**
    * Reactive resource for fetching sessions (base API data).
    */
@@ -159,6 +169,21 @@ export class SessionList {
     return sessionId;
   }
 
+  /**
+   * Gets the queryParams for a session's routerLink. When the session has
+   * an assistant attached in preferences, we include it in the URL so the
+   * session page can load the assistant without a second round-trip. Keeping
+   * the URL the single source of truth also avoids a race where the user
+   * sends a message before the metadata fetch hydrates preferences.
+   *
+   * @param session - Session metadata from the list
+   * @returns queryParams object for routerLink, or null when no assistant
+   */
+  protected getSessionQueryParams(session: SessionMetadata): Record<string, string> | null {
+    const assistantId = session.preferences?.assistantId;
+    return assistantId ? { assistantId } : null;
+  }
+
   /**
    * Formats a timestamp for display.
    * Shows relative time if recent, otherwise shows date.
diff --git a/frontend/ai.client/src/app/components/topnav/components/user-dropdown.component.ts b/frontend/ai.client/src/app/components/topnav/components/user-dropdown.component.ts
index cb574a5d..f98e4091 100644
--- a/frontend/ai.client/src/app/components/topnav/components/user-dropdown.component.ts
+++ b/frontend/ai.client/src/app/components/topnav/components/user-dropdown.component.ts
@@ -3,6 +3,7 @@ import { Component, input, output, ChangeDetectionStrategy, computed, inject } f
 import { RouterLink } from '@angular/router';
 import { CdkMenuTrigger, CdkMenu, CdkMenuItem } from '@angular/cdk/menu';
 import { ConnectedPosition } from '@angular/cdk/overlay';
+import { Dialog } from '@angular/cdk/dialog';
 import { NgIcon, provideIcons } from '@ng-icons/core';
 import {
   heroChevronUpDown,
@@ -16,9 +17,17 @@ import {
   heroDocument,
   heroBriefcase,
   heroCog6Tooth,
+  heroArrowTopRightOnSquare,
+  heroDocumentText,
 } from '@ng-icons/heroicons/outline';
 import { ThemeService, ThemePreference } from './theme-toggle/theme.service';
 import { VERSION } from '../../../../version';
+import { UserMenuLinksService } from '../../../admin/manage-user-menu-links/services/user-menu-links.service';
+import { UserMenuLink } from '../../../admin/manage-user-menu-links/models/user-menu-link.model';
+import {
+  UserMenuLinkModalComponent,
+  UserMenuLinkModalData,
+} from './user-menu-link-modal/user-menu-link-modal.component';
 
 export interface User {
   firstName: string;
@@ -45,6 +54,8 @@ export interface User {
       heroChatBubbleLeftRight,
       heroDocument,
       heroCog6Tooth,
+      heroArrowTopRightOnSquare,
+      heroDocumentText,
     })
   ],
   template: `
@@ -135,6 +146,44 @@ export interface User {
               </a>
             </div>
 
+            <!-- Admin-managed custom links -->
+            @if (customLinks().length > 0) {
+              <div class="border-t border-gray-200 py-1 dark:border-gray-700">
+                @for (link of customLinks(); track link.link_id) {
+                  @if (link.kind === 'external') {
+                    <a
+                      cdkMenuItem
+                      [href]="link.url"
+                      target="_blank"
+                      rel="noopener noreferrer"
+                      class="flex w-full items-center gap-3 px-3 py-2 text-sm/6 text-gray-700 hover:bg-gray-50 focus:bg-gray-50 dark:text-gray-300 dark:hover:bg-gray-700 dark:focus:bg-gray-700 rounded-xs outline-hidden"
+                      role="menuitem"
+                    >
+                      <ng-icon
+                        name="heroArrowTopRightOnSquare"
+                        class="size-5 text-gray-400 dark:text-gray-500"
+                      />
+                      <span class="flex-1 truncate">{{ link.label }}</span>
+                    </a>
+                  } @else {
+                    <button
+                      cdkMenuItem
+                      type="button"
+                      (click)="openLinkModal(link)"
+                      class="flex w-full items-center gap-3 px-3 py-2 text-sm/6 text-gray-700 hover:bg-gray-50 focus:bg-gray-50 dark:text-gray-300 dark:hover:bg-gray-700 dark:focus:bg-gray-700 rounded-xs outline-hidden text-left"
+                      role="menuitem"
+                    >
+                      <ng-icon
+                        name="heroDocumentText"
+                        class="size-5 text-gray-400 dark:text-gray-500"
+                      />
+                      <span class="flex-1 truncate">{{ link.label }}</span>
+                    </button>
+                  }
+                }
+              </div>
+            }
+
             <!-- Logout section -->
             <div class="border-t border-gray-200 py-1 dark:border-gray-700">
               <button
@@ -198,11 +247,29 @@ export interface User {
 })
 export class UserDropdownComponent {
   private readonly themeService = inject(ThemeService);
+  private readonly userMenuLinksService = inject(UserMenuLinksService);
+  private readonly dialog = inject(Dialog);
 
   // Inputs
   user = input.required<User>();
   isAdmin = input.required<boolean>();
 
+  // Admin-managed custom links (already enabled-filtered + ordered by backend).
+  protected readonly customLinks = computed<UserMenuLink[]>(
+    () => this.userMenuLinksService.enabledLinksResource.value()?.links ?? [],
+  );
+
+  protected openLinkModal(link: UserMenuLink): void {
+    this.dialog.open<void, UserMenuLinkModalData>(UserMenuLinkModalComponent, {
+      data: {
+        label: link.label,
+        bodyMarkdown: link.body_markdown ?? '',
+      },
+      hasBackdrop: false, // dialog component owns its own backdrop
+      panelClass: 'user-menu-link-modal-panel',
+    });
+  }
+
   // Outputs
   logout = output<void>();
 
diff --git a/frontend/ai.client/src/app/components/topnav/components/user-menu-link-modal/user-menu-link-modal.component.ts b/frontend/ai.client/src/app/components/topnav/components/user-menu-link-modal/user-menu-link-modal.component.ts
new file mode 100644
index 00000000..28b66223
--- /dev/null
+++ b/frontend/ai.client/src/app/components/topnav/components/user-menu-link-modal/user-menu-link-modal.component.ts
@@ -0,0 +1,120 @@
+import { ChangeDetectionStrategy, Component, inject } from '@angular/core';
+import { DIALOG_DATA, DialogRef } from '@angular/cdk/dialog';
+import { MarkdownComponent } from 'ngx-markdown';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import { heroXMark } from '@ng-icons/heroicons/outline';
+
+export interface UserMenuLinkModalData {
+  label: string;
+  bodyMarkdown: string;
+}
+
+/**
+ * Generic rich-text modal opened from admin-managed user-menu links.
+ * Renders markdown via ngx-markdown (same renderer used for assistant
+ * messages, so heading/list/link styling is consistent).
+ */
+@Component({
+  selector: 'app-user-menu-link-modal',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [MarkdownComponent, NgIcon],
+  providers: [provideIcons({ heroXMark })],
+  host: {
+    class: 'block',
+    '(keydown.escape)': 'onClose()',
+  },
+  template: `
+    <div
+      class="dialog-backdrop fixed inset-0 bg-gray-500/75 dark:bg-gray-900/80"
+      aria-hidden="true"
+      (click)="onClose()"
+    ></div>
+
+    <div class="fixed inset-0 z-10 flex min-h-full items-end justify-center p-4 sm:items-center sm:p-0">
+      <div
+        class="dialog-panel relative w-full transform overflow-hidden rounded-lg bg-white text-left shadow-xl sm:my-8 sm:max-w-2xl dark:bg-gray-800 dark:outline dark:-outline-offset-1 dark:outline-white/10"
+        role="dialog"
+        aria-modal="true"
+        [attr.aria-labelledby]="titleId"
+      >
+        <div class="flex items-center justify-between border-b border-gray-200 px-6 py-4 dark:border-gray-700">
+          <h3 [id]="titleId" class="text-base/7 font-semibold text-gray-900 dark:text-white">
+            {{ data.label }}
+          </h3>
+          <button
+            type="button"
+            (click)="onClose()"
+            class="rounded-md text-gray-400 hover:text-gray-500 focus:outline-2 focus:outline-offset-2 focus:outline-[var(--color-primary)] dark:hover:text-gray-300"
+            aria-label="Close dialog"
+          >
+            <ng-icon name="heroXMark" class="size-5" aria-hidden="true" />
+          </button>
+        </div>
+
+        <div class="max-h-[70vh] overflow-y-auto px-6 py-5">
+          <div class="markdown-body prose prose-sm max-w-none dark:prose-invert">
+            <markdown [data]="data.bodyMarkdown" />
+          </div>
+        </div>
+
+        <div class="flex justify-end border-t border-gray-200 px-6 py-3 dark:border-gray-700">
+          <button
+            type="button"
+            (click)="onClose()"
+            class="rounded-md bg-white px-3 py-2 text-sm/6 font-semibold text-gray-900 shadow-xs ring-1 ring-gray-300 ring-inset hover:bg-gray-50 dark:bg-white/10 dark:text-white dark:shadow-none dark:ring-white/5 dark:hover:bg-white/20"
+          >
+            Close
+          </button>
+        </div>
+      </div>
+    </div>
+  `,
+  styles: `
+    @import "tailwindcss";
+    @custom-variant dark (&:where(.dark, .dark *));
+
+    .dialog-backdrop {
+      animation: backdrop-fade-in 200ms ease-out;
+    }
+    @keyframes backdrop-fade-in {
+      from { opacity: 0; }
+      to { opacity: 1; }
+    }
+    .dialog-panel {
+      animation: dialog-fade-in-up 200ms ease-out;
+    }
+    @keyframes dialog-fade-in-up {
+      from { opacity: 0; transform: translateY(1rem) scale(0.98); }
+      to { opacity: 1; transform: translateY(0) scale(1); }
+    }
+
+    .markdown-body ::ng-deep a {
+      color: var(--color-primary-500);
+      text-decoration: underline;
+      text-underline-offset: 2px;
+    }
+    .markdown-body ::ng-deep a:hover {
+      color: var(--color-primary-700);
+    }
+    .markdown-body ::ng-deep a:focus-visible {
+      outline: 2px solid var(--color-primary-500);
+      outline-offset: 2px;
+      border-radius: 0.125rem;
+    }
+    :host-context(.dark) .markdown-body ::ng-deep a {
+      color: var(--color-primary-400);
+    }
+    :host-context(.dark) .markdown-body ::ng-deep a:hover {
+      color: var(--color-primary-300);
+    }
+  `,
+})
+export class UserMenuLinkModalComponent {
+  private dialogRef = inject(DialogRef<void>);
+  protected data = inject<UserMenuLinkModalData>(DIALOG_DATA);
+  protected readonly titleId = `user-menu-link-modal-title-${crypto.randomUUID()}`;
+
+  protected onClose(): void {
+    this.dialogRef.close();
+  }
+}
diff --git a/frontend/ai.client/src/app/fine-tuning/pages/create-inference-job/create-inference-job.page.html b/frontend/ai.client/src/app/fine-tuning/pages/create-inference-job/create-inference-job.page.html
index 08f46d5b..7184c1b3 100644
--- a/frontend/ai.client/src/app/fine-tuning/pages/create-inference-job/create-inference-job.page.html
+++ b/frontend/ai.client/src/app/fine-tuning/pages/create-inference-job/create-inference-job.page.html
@@ -29,7 +29,7 @@ <h2 class="mb-1 text-base/7 font-semibold text-gray-900 dark:text-white">1. Sele
 
       @if (state.loading() && state.trainedModels().length === 0) {
         <div class="flex items-center justify-center py-8">
-          <div class="size-6 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600" role="status" aria-label="Loading trained models"></div>
+          <div class="size-6 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600 dark:border-t-blue-400" role="status" aria-label="Loading trained models"></div>
         </div>
       } @else if (state.trainedModels().length === 0) {
         <div class="rounded-sm border border-gray-200 bg-white py-8 text-center dark:border-gray-700 dark:bg-gray-900">
@@ -73,7 +73,7 @@ <h2 class="mb-1 text-base/7 font-semibold text-gray-900 dark:text-white">2. Uplo
               @if (upload.status === 'complete') {
                 <ng-icon name="heroCheck" class="size-5 text-green-600 dark:text-green-400" />
               } @else if (upload.status === 'uploading') {
-                <div class="size-5 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600" role="status" aria-label="Uploading"></div>
+                <div class="size-5 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600 dark:border-t-blue-400" role="status" aria-label="Uploading"></div>
               } @else if (upload.status === 'error') {
                 <ng-icon name="heroXMark" class="size-5 text-red-600 dark:text-red-400" />
               }
diff --git a/frontend/ai.client/src/app/fine-tuning/pages/create-training-job/create-training-job.page.html b/frontend/ai.client/src/app/fine-tuning/pages/create-training-job/create-training-job.page.html
index 33be342a..c3738a09 100644
--- a/frontend/ai.client/src/app/fine-tuning/pages/create-training-job/create-training-job.page.html
+++ b/frontend/ai.client/src/app/fine-tuning/pages/create-training-job/create-training-job.page.html
@@ -33,7 +33,7 @@ <h2 class="mb-1 text-base/7 font-semibold text-gray-900 dark:text-white">1. Uplo
             @if (upload.status === 'complete') {
               <ng-icon name="heroCheck" class="size-5 text-green-600 dark:text-green-400" />
             } @else if (upload.status === 'uploading') {
-              <div class="size-5 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600" role="status" aria-label="Uploading"></div>
+              <div class="size-5 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600 dark:border-t-blue-400" role="status" aria-label="Uploading"></div>
             } @else if (upload.status === 'error') {
               <ng-icon name="heroXMark" class="size-5 text-red-600 dark:text-red-400" />
             }
@@ -154,7 +154,7 @@ <h2 class="mb-1 text-base/7 font-semibold text-gray-900 dark:text-white">2. Sele
 
     @if (state.loading() && state.availableModels().length === 0) {
       <div class="flex items-center justify-center py-8">
-        <div class="size-6 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600" role="status" aria-label="Loading models"></div>
+        <div class="size-6 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600 dark:border-t-blue-400" role="status" aria-label="Loading models"></div>
       </div>
     } @else {
       <!-- Custom HuggingFace Model — prominent option at top -->
@@ -266,7 +266,7 @@ <h2 class="mb-1 text-base/7 font-semibold text-gray-900 dark:text-white">2. Sele
                 <div class="absolute z-20 mt-1 max-h-60 w-full overflow-auto rounded-sm border border-gray-200 bg-white shadow-lg dark:border-gray-600 dark:bg-gray-800">
                   @if (searchingModels()) {
                     <div class="flex items-center gap-2 px-3 py-3 text-sm/6 text-gray-500 dark:text-gray-400">
-                      <div class="size-4 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600" role="status" aria-label="Searching"></div>
+                      <div class="size-4 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600 dark:border-t-blue-400" role="status" aria-label="Searching"></div>
                       Searching HuggingFace...
                     </div>
                   } @else if (hfSearchResults().length > 0) {
diff --git a/frontend/ai.client/src/app/fine-tuning/pages/dashboard/fine-tuning-dashboard.page.html b/frontend/ai.client/src/app/fine-tuning/pages/dashboard/fine-tuning-dashboard.page.html
index 81873db3..484c47cc 100644
--- a/frontend/ai.client/src/app/fine-tuning/pages/dashboard/fine-tuning-dashboard.page.html
+++ b/frontend/ai.client/src/app/fine-tuning/pages/dashboard/fine-tuning-dashboard.page.html
@@ -75,7 +75,7 @@ <h1 class="text-2xl/8 font-bold text-gray-900 dark:text-white">Fine-Tuning</h1>
   <!-- Loading spinner (only when no data yet) -->
   @if (state.loading() && !state.access()) {
     <div class="flex items-center justify-center py-12">
-      <div class="size-8 animate-spin rounded-full border-4 border-gray-200 border-t-blue-600" role="status" aria-label="Loading"></div>
+      <div class="size-8 animate-spin rounded-full border-4 border-gray-200 border-t-blue-600 dark:border-t-blue-400" role="status" aria-label="Loading"></div>
     </div>
   }
 
diff --git a/frontend/ai.client/src/app/fine-tuning/pages/inference-job-detail/inference-job-detail.page.html b/frontend/ai.client/src/app/fine-tuning/pages/inference-job-detail/inference-job-detail.page.html
index 65d2d5b8..3b03c404 100644
--- a/frontend/ai.client/src/app/fine-tuning/pages/inference-job-detail/inference-job-detail.page.html
+++ b/frontend/ai.client/src/app/fine-tuning/pages/inference-job-detail/inference-job-detail.page.html
@@ -33,7 +33,7 @@
   <!-- Loading spinner -->
   @if (state.loading() && !state.currentInferenceJob()) {
     <div class="flex items-center justify-center py-12">
-      <div class="size-8 animate-spin rounded-full border-4 border-gray-200 border-t-blue-600" role="status" aria-label="Loading"></div>
+      <div class="size-8 animate-spin rounded-full border-4 border-gray-200 border-t-blue-600 dark:border-t-blue-400" role="status" aria-label="Loading"></div>
     </div>
   }
 
@@ -198,7 +198,7 @@ <h2 class="text-base/7 font-semibold text-gray-900 dark:text-white">CloudWatch L
 
       @if (loadingLogs() && state.currentLogs().length === 0) {
         <div class="flex items-center justify-center rounded-sm border border-gray-200 bg-white py-8 dark:border-gray-700 dark:bg-gray-900">
-          <div class="size-6 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600" role="status" aria-label="Loading logs"></div>
+          <div class="size-6 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600 dark:border-t-blue-400" role="status" aria-label="Loading logs"></div>
         </div>
       } @else if (state.currentLogs().length > 0) {
         <div class="max-h-96 overflow-auto rounded-sm border border-gray-200 bg-gray-950 dark:border-gray-700">
diff --git a/frontend/ai.client/src/app/fine-tuning/pages/training-job-detail/training-job-detail.page.html b/frontend/ai.client/src/app/fine-tuning/pages/training-job-detail/training-job-detail.page.html
index 07128405..a8a78e4b 100644
--- a/frontend/ai.client/src/app/fine-tuning/pages/training-job-detail/training-job-detail.page.html
+++ b/frontend/ai.client/src/app/fine-tuning/pages/training-job-detail/training-job-detail.page.html
@@ -33,7 +33,7 @@
   <!-- Loading spinner -->
   @if (state.loading() && !state.currentTrainingJob()) {
     <div class="flex items-center justify-center py-12">
-      <div class="size-8 animate-spin rounded-full border-4 border-gray-200 border-t-blue-600" role="status" aria-label="Loading"></div>
+      <div class="size-8 animate-spin rounded-full border-4 border-gray-200 border-t-blue-600 dark:border-t-blue-400" role="status" aria-label="Loading"></div>
     </div>
   }
 
@@ -220,7 +220,7 @@ <h2 class="text-base/7 font-semibold text-gray-900 dark:text-white">CloudWatch L
 
       @if (loadingLogs() && state.currentLogs().length === 0) {
         <div class="flex items-center justify-center rounded-sm border border-gray-200 bg-white py-8 dark:border-gray-700 dark:bg-gray-900">
-          <div class="size-6 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600" role="status" aria-label="Loading logs"></div>
+          <div class="size-6 animate-spin rounded-full border-2 border-gray-200 border-t-blue-600 dark:border-t-blue-400" role="status" aria-label="Loading logs"></div>
         </div>
       } @else if (state.currentLogs().length > 0) {
         <div class="max-h-96 overflow-auto rounded-sm border border-gray-200 bg-gray-950 dark:border-gray-700">
diff --git a/frontend/ai.client/src/app/services/error/error.service.ts b/frontend/ai.client/src/app/services/error/error.service.ts
index 57fb15f1..3fb06f71 100644
--- a/frontend/ai.client/src/app/services/error/error.service.ts
+++ b/frontend/ai.client/src/app/services/error/error.service.ts
@@ -23,6 +23,7 @@ export enum ErrorCode {
   TOOL_ERROR = 'tool_error',
   MODEL_ERROR = 'model_error',
   STREAM_ERROR = 'stream_error',
+  MAX_TOKENS = 'max_tokens',
 
   // Client-side errors
   NETWORK_ERROR = 'network_error',
@@ -298,6 +299,7 @@ export class ErrorService {
       [ErrorCode.TOOL_ERROR]: 'Tool Error',
       [ErrorCode.MODEL_ERROR]: 'Model Error',
       [ErrorCode.STREAM_ERROR]: 'Stream Error',
+      [ErrorCode.MAX_TOKENS]: 'Response Truncated',
       [ErrorCode.NETWORK_ERROR]: 'Network Error',
       [ErrorCode.UNKNOWN_ERROR]: 'Error',
     };
diff --git a/frontend/ai.client/src/app/services/file-upload/file-upload.service.ts b/frontend/ai.client/src/app/services/file-upload/file-upload.service.ts
index 6058b4dd..fe96195a 100644
--- a/frontend/ai.client/src/app/services/file-upload/file-upload.service.ts
+++ b/frontend/ai.client/src/app/services/file-upload/file-upload.service.ts
@@ -77,6 +77,47 @@ export interface CompleteUploadResponse {
   sizeBytes: number;
 }
 
+/**
+ * Response from GET /files/{uploadId}/preview-url
+ */
+export interface PreviewUrlResponse {
+  uploadId: string;
+  url: string;
+  expiresAt: string;
+  mimeType: string;
+  filename: string;
+}
+
+/**
+ * Response from GET /files/{uploadId}/text-snippet
+ */
+export interface TextSnippetResponse {
+  uploadId: string;
+  snippet: string;
+  truncated: boolean;
+  mimeType: string;
+}
+
+/**
+ * Response from GET /files/{uploadId}/thumbnail
+ */
+export interface ThumbnailResponse {
+  uploadId: string;
+  url: string;
+  expiresAt: string;
+  cached: boolean;
+}
+
+/**
+ * Outcome of a thumbnail fetch — `unsupported` (415) and `unavailable`
+ * (404/422/network) collapse into typed states the UI can switch on
+ * without parsing HTTP errors at the call site.
+ */
+export type ThumbnailFetchResult =
+  | { status: 'ready'; response: ThumbnailResponse }
+  | { status: 'unsupported' }
+  | { status: 'unavailable' };
+
 /**
  * File metadata from list/get operations
  */
@@ -550,6 +591,61 @@ export class FileUploadService {
     return results;
   }
 
+  /**
+   * Fetch a short-lived presigned GET URL for a file.
+   *
+   * Used by the UI to render inline image previews and the lightbox.
+   * The URL expires after a few minutes; refetch on expiry.
+   */
+  async getPreviewUrl(uploadId: string): Promise<PreviewUrlResponse> {
+    try {
+      return await firstValueFrom(
+        this.http.get<PreviewUrlResponse>(`${this.baseUrl()}/${uploadId}/preview-url`)
+      );
+    } catch (err) {
+      throw this.handleApiError(err, 'Failed to get preview URL');
+    }
+  }
+
+  /**
+   * Fetch a UTF-8 text snippet from the start of a file.
+   *
+   * Returns an empty snippet for non-text MIME types so the UI can fall
+   * back to a skeleton mockup.
+   */
+  async getTextSnippet(uploadId: string): Promise<TextSnippetResponse> {
+    try {
+      return await firstValueFrom(
+        this.http.get<TextSnippetResponse>(`${this.baseUrl()}/${uploadId}/text-snippet`)
+      );
+    } catch (err) {
+      throw this.handleApiError(err, 'Failed to get text snippet');
+    }
+  }
+
+  /**
+   * Fetch a presigned URL for a PNG thumbnail of the file's first page.
+   *
+   * Backend lazy-renders on first call and caches the result, so subsequent
+   * calls return quickly. Distinguishes "this file type can never have a
+   * thumbnail" (415 → `unsupported`) from "we tried but it didn't work"
+   * (any other failure, e.g. 404/422/network → `unavailable`) so the UI
+   * can decide whether to retry or give up.
+   */
+  async getThumbnail(uploadId: string): Promise<ThumbnailFetchResult> {
+    try {
+      const response = await firstValueFrom(
+        this.http.get<ThumbnailResponse>(`${this.baseUrl()}/${uploadId}/thumbnail`)
+      );
+      return { status: 'ready', response };
+    } catch (err) {
+      if (err instanceof HttpErrorResponse && err.status === 415) {
+        return { status: 'unsupported' };
+      }
+      return { status: 'unavailable' };
+    }
+  }
+
   /**
    * Get user's quota status.
    */
diff --git a/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.css b/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.css
index d20c092a..f961bf28 100644
--- a/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.css
+++ b/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.css
@@ -14,7 +14,7 @@
   left: 0;
   right: 0;
   z-index: 40;
-  transition: left 300ms;
+  transition: left 300ms, right 300ms;
 }
 
 /* Chat input footer for full-page mode */
@@ -25,7 +25,7 @@
   left: 0;
   right: 0;
   animation: fade-in 0.3s ease-out forwards;
-  transition: left 300ms;
+  transition: left 300ms, right 300ms;
   background-color: var(--color-gray-50);
 }
 
@@ -55,7 +55,7 @@
   display: flex;
   align-items: center;
   justify-content: center;
-  transition: left 300ms;
+  transition: left 300ms, right 300ms;
   background-color: var(--color-gray-50);
   background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='100' height='100' viewBox='0 0 100 100'%3E%3Cg fill-rule='evenodd'%3E%3Cg fill='%23d1d5db' fill-opacity='0.4'%3E%3Cpath opacity='.5' d='M96 95h4v1h-4v4h-1v-4h-9v4h-1v-4h-9v4h-1v-4h-9v4h-1v-4h-9v4h-1v-4h-9v4h-1v-4h-9v4h-1v-4h-9v4h-1v-4h-9v4h-1v-4H0v-1h15v-9H0v-1h15v-9H0v-1h15v-9H0v-1h15v-9H0v-1h15v-9H0v-1h15v-9H0v-1h15v-9H0v-1h15v-9H0v-1h15V0h1v15h9V0h1v15h9V0h1v15h9V0h1v15h9V0h1v15h9V0h1v15h9V0h1v15h9V0h1v15h9V0h1v15h4v1h-4v9h4v1h-4v9h4v1h-4v9h4v1h-4v9h4v1h-4v9h4v1h-4v9h4v1h-4v9h4v1h-4v9zm-1 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-9-10h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm9-10v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-9-10h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm9-10v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-9-10h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm9-10v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-10 0v-9h-9v9h9zm-9-10h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9zm10 0h9v-9h-9v9z'/%3E%3Cpath d='M6 5V0H5v5H0v1h5v94h1V6h94V5H6z'/%3E%3C/g%3E%3C/g%3E%3C/svg%3E");
 }
@@ -98,6 +98,14 @@
   .chat-topnav-wrapper.sidenav-expanded {
     left: 18rem; /* 72 in Tailwind = 18rem */
   }
+
+  /* Reserve room for the right-docked artifact pane (max-w-2xl = 42rem)
+     so the fixed footer/topnav sit beside it instead of under it. */
+  .chat-input-footer.full-page.artifact-pane-open,
+  .chat-container-empty.full-page.artifact-pane-open,
+  .chat-topnav-wrapper.artifact-pane-open {
+    right: var(--artifact-pane-width, 42rem);
+  }
 }
 
 /* ============================================================
diff --git a/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.html b/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.html
index e170548e..ddf1e91c 100644
--- a/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.html
+++ b/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.html
@@ -4,7 +4,8 @@
     <!-- Full-page mode: fixed topnav with sidenav awareness -->
     <div
       class="chat-topnav-wrapper"
-      [class.sidenav-expanded]="!isSidenavCollapsed()">
+      [class.sidenav-expanded]="!isSidenavCollapsed()"
+      [class.artifact-pane-open]="artifactPanelOpen()">
       <app-topnav />
     </div>
   }
@@ -82,7 +83,8 @@
       <!-- Assistant is loading: show skeleton card -->
       <div
         class="chat-container-empty full-page"
-        [class.sidenav-expanded]="!isSidenavCollapsed()">
+        [class.sidenav-expanded]="!isSidenavCollapsed()"
+        [class.artifact-pane-open]="artifactPanelOpen()">
         <div class="w-full max-w-sm px-4 -mt-20">
           <div class="animate-pulse rounded-2xl bg-gray-100 dark:bg-gray-800 border border-gray-200 dark:border-gray-700 overflow-hidden">
             <!-- Skeleton gradient header -->
@@ -107,7 +109,8 @@
       <!-- Input at bottom while loading -->
       <div
         class="chat-input-footer full-page"
-        [class.sidenav-expanded]="!isSidenavCollapsed()">
+        [class.sidenav-expanded]="!isSidenavCollapsed()"
+        [class.artifact-pane-open]="artifactPanelOpen()">
         <div class="mx-auto px-4 max-w-[720px]">
           <!-- Skeleton indicator chip -->
           <div class="flex justify-center mb-2">
@@ -136,7 +139,8 @@
       <!-- Assistant selected: show only assistant card with fixed input at bottom -->
       <div
         class="chat-container-empty full-page"
-        [class.sidenav-expanded]="!isSidenavCollapsed()">
+        [class.sidenav-expanded]="!isSidenavCollapsed()"
+        [class.artifact-pane-open]="artifactPanelOpen()">
         <div class="w-full max-w-[720px] animate-fade-in px-4 -mt-20 relative">
           <!-- Assistant Card -->
           <div class="mb-6 relative">
@@ -171,7 +175,8 @@
       <!-- Fixed input at bottom when assistant is selected -->
       <div
         class="chat-input-footer full-page"
-        [class.sidenav-expanded]="!isSidenavCollapsed()">
+        [class.sidenav-expanded]="!isSidenavCollapsed()"
+        [class.artifact-pane-open]="artifactPanelOpen()">
         <div class="mx-auto px-4 max-w-[720px]">
           <app-chat-input
             [sessionId]="sessionId()"
@@ -189,7 +194,8 @@
       <!-- No assistant: show greeting with inline input -->
       <div
         class="chat-container-empty full-page"
-        [class.sidenav-expanded]="!isSidenavCollapsed()">
+        [class.sidenav-expanded]="!isSidenavCollapsed()"
+        [class.artifact-pane-open]="artifactPanelOpen()">
         <div class="w-full max-w-[720px] animate-fade-in px-4 -mt-20 relative">
           <div class="mb-8 flex items-center justify-center gap-4 relative" data-testid="greeting-message">
             <img
@@ -246,7 +252,8 @@
             [messages]="messages()"
             [isChatLoading]="isChatLoading()"
             [streamingMessageId]="streamingMessageId()"
-            [embeddedMode]="true" />
+            [embeddedMode]="true"
+            (continueRequested)="continueRequested.emit()" />
         </div>
       </div>
 
@@ -294,7 +301,8 @@
       <!-- Header (fixed, content scrolls underneath) -->
       <div
         class="chat-topnav-wrapper"
-        [class.sidenav-expanded]="!isSidenavCollapsed()">
+        [class.sidenav-expanded]="!isSidenavCollapsed()"
+        [class.artifact-pane-open]="artifactPanelOpen()">
         <app-topnav />
       </div>
     }
@@ -311,14 +319,16 @@
         <app-message-list
           [messages]="messages()"
           [isChatLoading]="isChatLoading()"
-          [streamingMessageId]="streamingMessageId()" />
+          [streamingMessageId]="streamingMessageId()"
+          (continueRequested)="continueRequested.emit()" />
       </div>
     </div>
 
     <!-- Chat input footer -->
     <div
       class="chat-input-footer full-page"
-      [class.sidenav-expanded]="!isSidenavCollapsed()">
+      [class.sidenav-expanded]="!isSidenavCollapsed()"
+      [class.artifact-pane-open]="artifactPanelOpen()">
       <div class="mx-auto px-4 max-w-[720px]">
         <!-- Assistant Indicator (floating badge above input) -->
         @if (isLoadingAssistant()) {
diff --git a/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.ts b/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.ts
index 58457c03..14006892 100644
--- a/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.ts
+++ b/frontend/ai.client/src/app/session/components/chat-container/chat-container.component.ts
@@ -16,6 +16,7 @@ import { AnimatedTextComponent } from '../../../components/animated-text';
 import { ParagraphSkeletonComponent } from '../../../components/paragraph-skeleton';
 import { Topnav } from '../../../components/topnav/topnav';
 import { SidenavService } from '../../../services/sidenav/sidenav.service';
+import { ArtifactStateService } from '../../services/artifacts/artifact-state.service';
 import { Assistant } from '../../../assistants/models/assistant.model';
 import { AssistantCardComponent } from '../../../assistants/components/assistant-card.component';
 import { AssistantIndicatorComponent } from '../assistant-indicator/assistant-indicator.component';
@@ -72,6 +73,7 @@ export interface ChatContainerConfig {
 export class ChatContainerComponent {
   // Inject sidenav service for full-page mode positioning
   protected sidenavService = inject(SidenavService);
+  private artifactState = inject(ArtifactStateService);
   private voiceChatService = inject(VoiceChatService);
   protected readonly isVoiceActive = this.voiceChatService.isVoiceActive;
 
@@ -106,6 +108,7 @@ export class ChatContainerComponent {
 
   // Output events
   messageSubmitted = output<{ content: string; timestamp: Date; fileUploadIds?: string[] }>();
+  continueRequested = output<void>();
   messageCancelled = output<void>();
   fileAttached = output<File>();
   settingsToggled = output<void>();
@@ -130,6 +133,11 @@ export class ChatContainerComponent {
   protected readonly isSidenavCollapsed = computed(() =>
     this.sidenavService.isCollapsed()
   );
+  /** True while the docked artifact pane is open — the fixed footer /
+   *  topnav reserve right-side space so the pane doesn't cover them. */
+  protected readonly artifactPanelOpen = computed(
+    () => this.artifactState.openArtifact() !== null
+  );
   protected readonly isAssistantOwner = computed(() => {
     const a = this.assistant();
     if (!a) return false;
diff --git a/frontend/ai.client/src/app/session/components/chat-input/chat-input.component.html b/frontend/ai.client/src/app/session/components/chat-input/chat-input.component.html
index d219ddf8..638c80d4 100644
--- a/frontend/ai.client/src/app/session/components/chat-input/chat-input.component.html
+++ b/frontend/ai.client/src/app/session/components/chat-input/chat-input.component.html
@@ -50,6 +50,7 @@
     <div class="relative">
       <label for="user-message" class="sr-only">How can I help you today?</label>
       <textarea
+        #messageInput
         id="user-message"
         name="user-message"
         rows="1"
diff --git a/frontend/ai.client/src/app/session/components/chat-input/chat-input.component.ts b/frontend/ai.client/src/app/session/components/chat-input/chat-input.component.ts
index 0d67ce96..ed22d5b5 100644
--- a/frontend/ai.client/src/app/session/components/chat-input/chat-input.component.ts
+++ b/frontend/ai.client/src/app/session/components/chat-input/chat-input.component.ts
@@ -1,4 +1,15 @@
-import { Component, signal, output, inject, input, computed } from '@angular/core';
+import {
+  Component,
+  signal,
+  output,
+  inject,
+  input,
+  computed,
+  viewChild,
+  effect,
+  afterNextRender,
+  ElementRef,
+} from '@angular/core';
 import { FormsModule } from '@angular/forms';
 import { NgIcon, provideIcons } from '@ng-icons/core';
 import {
@@ -22,6 +33,7 @@ import {
   formatBytes
 } from '../../../services/file-upload';
 import { ToastService } from '../../../services/toast/toast.service';
+import { ToolService } from '../../../services/tool/tool.service';
 import { VoiceChatService, type VoiceStatus } from '../../services/voice';
 
 interface Message {
@@ -50,6 +62,7 @@ export class ChatInputComponent {
   // Service injection
   private readonly fileUploadService = inject(FileUploadService);
   private readonly toastService = inject(ToastService);
+  private readonly toolService = inject(ToolService);
   private readonly voiceChatService = inject(VoiceChatService);
 
   // Input: session ID for file uploads
@@ -61,6 +74,12 @@ export class ChatInputComponent {
   // Input: show file attachment controls (defaults to true)
   readonly showFileControls = input<boolean>(true);
 
+  // Input: auto-focus the textarea on load and session change (defaults to true).
+  // Disabled where the input sits beside an editable form (e.g. assistant preview).
+  readonly autoFocus = input<boolean>(true);
+
+  private readonly messageInput = viewChild<ElementRef<HTMLTextAreaElement>>('messageInput');
+
   // Use the input directly - parent controls loading state
   protected readonly isLoading = computed(() => this.isChatLoading());
 
@@ -138,6 +157,24 @@ export class ChatInputComponent {
     }
   });
 
+  constructor() {
+    // Focus the textarea on first mount...
+    afterNextRender(() => this.focusInput());
+    // ...and whenever the session changes (new or existing). When switching
+    // between sessions in the messages view the component instance is reused,
+    // so afterNextRender alone would not refocus.
+    effect(() => {
+      this.sessionId(); // read only to register the signal as a dependency
+      this.focusInput();
+    });
+  }
+
+  private focusInput(): void {
+    if (this.autoFocus()) {
+      this.messageInput()?.nativeElement.focus();
+    }
+  }
+
   onSubmit() {
     if (this.isLoading()) {
       this.cancelChatRequest();
@@ -342,6 +379,12 @@ export class ChatInputComponent {
       return;
     }
 
+    // Nudge the user once per batch if they're attaching tabular files
+    // without the Spreadsheet Analysis tool enabled — the backend routes
+    // these to the tool instead of inline Bedrock document blocks (#206),
+    // so the user needs the tool enabled to get answers about the data.
+    let tabularNudgeShown = false;
+
     // Validate and upload each file
     for (const file of newFiles) {
       // Check file size
@@ -363,6 +406,19 @@ export class ChatInputComponent {
         continue;
       }
 
+      if (!tabularNudgeShown && this.isTabularFile(file)) {
+        const enabled = this.toolService
+          .enabledToolIds()
+          .includes('analyze_spreadsheet');
+        if (!enabled) {
+          this.toastService.info(
+            'Enable Spreadsheet Analysis',
+            'To analyze spreadsheets, enable "Spreadsheet Analysis" in the Tools section of the settings panel.'
+          );
+          tabularNudgeShown = true;
+        }
+      }
+
       // Upload file
       try {
         await this.fileUploadService.uploadFile(sessionId, file);
@@ -372,4 +428,16 @@ export class ChatInputComponent {
       }
     }
   }
+
+  private isTabularFile(file: File): boolean {
+    const tabularExts = ['.csv', '.xls', '.xlsx'];
+    const tabularMimes = [
+      'text/csv',
+      'application/vnd.ms-excel',
+      'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
+    ];
+    const lower = file.name.toLowerCase();
+    if (tabularExts.some(ext => lower.endsWith(ext))) return true;
+    return tabularMimes.includes((file.type || '').toLowerCase());
+  }
 }
\ No newline at end of file
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-card.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-card.component.ts
new file mode 100644
index 00000000..33b3fbc8
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-card.component.ts
@@ -0,0 +1,486 @@
+import {
+  ChangeDetectionStrategy,
+  Component,
+  computed,
+  inject,
+  input,
+  signal,
+} from '@angular/core';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import {
+  heroCodeBracket,
+  heroDocumentText,
+  heroArrowDownTray,
+  heroArrowPath,
+} from '@ng-icons/heroicons/outline';
+import type { Artifact } from '../../../../services/artifacts/artifact.model';
+import { ArtifactStateService } from '../../../../services/artifacts/artifact-state.service';
+import { ArtifactDownloadService } from '../../../../services/artifacts/artifact-download.service';
+
+/** Visual treatment derived from an artifact's content type. */
+interface ArtifactKind {
+  /** Short type label (HTML, JS, MD…). */
+  label: string;
+  /** Heroicon name for the type stamp. */
+  icon: string;
+  /** Single accent color (CSS). Used sparingly — a 2px edge rule, the
+   *  stamp outline, and the type glyph — never as a fill. One mid tone
+   *  that holds up on both the light and dark surface. */
+  accent: string;
+}
+
+/**
+ * One artifact, presented as a calm, rounded card — deliberately
+ * un-"component-kit": a borderless tinted surface (radius matched to the
+ * chat input), a hairline type stamp, and a quiet sans metadata line.
+ * The lone accent color appears only as a thin left rule, the stamp
+ * outline, and the glyph — not as a filled tile or pill.
+ *
+ * The artifact's content is never inlined here — opening asks the panel
+ * to mint a short-lived render token and load it in a sandboxed iframe.
+ *
+ * Anchored inline after its producing assistant message by the
+ * message-list (via `Artifact.producedByMessageIndex`); only legacy /
+ * unanchorable artifacts fall back to the end-of-conversation strip.
+ *
+ * Accent is applied via `[style.color]` (not a bound class string) so
+ * the structural classes stay static and there's no class-merge
+ * ambiguity; `currentColor` then carries it into the rule/stamp/glyph.
+ */
+@Component({
+  selector: 'app-artifact-card',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [NgIcon],
+  providers: [
+    provideIcons({
+      heroCodeBracket,
+      heroDocumentText,
+      heroArrowDownTray,
+      heroArrowPath,
+    }),
+  ],
+  template: `
+    <div class="artifact-card">
+      <button
+        type="button"
+        class="artifact-card__hit"
+        [attr.aria-label]="ariaLabel()"
+        (click)="open()"
+      ></button>
+
+      <span class="artifact-card__surface">
+        <span
+          class="artifact-card__rule"
+          [style.color]="kind().accent"
+          aria-hidden="true"
+        ></span>
+
+        <span
+          class="artifact-card__stamp"
+          [style.color]="kind().accent"
+          aria-hidden="true"
+        >
+          <ng-icon [name]="kind().icon" />
+        </span>
+
+        <span class="artifact-card__body" aria-hidden="true">
+          <span class="artifact-card__title">{{
+            artifact().title || 'Untitled artifact'
+          }}</span>
+          <span class="artifact-card__meta">
+            <span class="artifact-card__type">{{ kind().label }}</span>
+            <span class="artifact-card__sep">·</span>
+            <span>v{{ artifact().version }}</span>
+            @if (updatedLabel()) {
+              <span class="artifact-card__sep">·</span>
+              <span>{{ updatedLabel() }}</span>
+            }
+          </span>
+        </span>
+
+        <button
+          type="button"
+          class="artifact-card__download"
+          [class.is-busy]="downloading()"
+          [attr.aria-label]="downloadAriaLabel()"
+          [attr.aria-busy]="downloading()"
+          [disabled]="downloading()"
+          (click)="download()"
+        >
+          <ng-icon
+            [name]="downloading() ? 'heroArrowPath' : 'heroArrowDownTray'"
+            aria-hidden="true"
+          />
+          <span class="artifact-card__download-label">Download</span>
+        </button>
+      </span>
+    </div>
+  `,
+  styles: `
+    :host {
+      display: block;
+    }
+
+    /* Card shell: a positioning context for the stretched open button
+       and the download button. No chrome of its own. isolation:isolate
+       keeps the internal z-index (hit=1, surface=2) scoped to the card
+       so it can't paint over page overlays (e.g. the context popover) —
+       the card as a whole stacks at its normal flow level. */
+    .artifact-card {
+      position: relative;
+      isolation: isolate;
+      display: block;
+      width: 100%;
+      /* matches the chat input's rounded-2xl so the focus ring and the
+         surface share the app's corner radius */
+      border-radius: 1rem;
+    }
+
+    /* Primary action: an invisible button stretched over the whole card.
+       It owns the focus ring (an un-clipped rectangle so the rounded
+       corner can't eat it). The surface above is pointer-events:none, so
+       a click anywhere but the download button falls through to here. */
+    .artifact-card__hit {
+      position: absolute;
+      inset: 0;
+      z-index: 1;
+      appearance: none;
+      -webkit-appearance: none;
+      margin: 0;
+      padding: 0;
+      border: 0;
+      background: none;
+      font: inherit;
+      border-radius: inherit;
+      cursor: pointer;
+    }
+
+    .artifact-card__hit:focus-visible {
+      outline: 2px solid #2563eb;
+      outline-offset: 3px;
+    }
+
+    :host-context(html.dark) .artifact-card__hit:focus-visible {
+      outline-color: #60a5fa;
+    }
+
+    /* The visible body: a borderless, tinted, generously rounded card.
+       overflow:hidden so the left rule conforms to the rounded edge.
+       pointer-events:none delegates clicks to the stretched button
+       beneath; the download button (col 3) re-enables them. The grid's
+       auto last column sizes itself to the button — no manual gutter. */
+    .artifact-card__surface {
+      position: relative;
+      z-index: 2;
+      pointer-events: none;
+      display: grid;
+      grid-template-columns: auto minmax(0, 1fr) auto;
+      align-items: center;
+      gap: 0.875rem;
+      padding: 0.8rem 1rem 0.8rem 1.1rem;
+      background: #f1f2f4;
+      border-radius: 1rem;
+      overflow: hidden;
+      transition: background-color 0.18s ease;
+    }
+
+    :host-context(html.dark) .artifact-card__surface {
+      background: rgba(255, 255, 255, 0.045);
+    }
+
+    .artifact-card:hover .artifact-card__surface {
+      background: #e7e8ec;
+    }
+
+    :host-context(html.dark) .artifact-card:hover .artifact-card__surface {
+      background: rgba(255, 255, 255, 0.08);
+    }
+
+    /* Thin left rule: a short tick at rest, runs the full height on
+       hover/focus. The card's only structural line. */
+    .artifact-card__rule {
+      position: absolute;
+      left: 0;
+      top: 50%;
+      width: 2px;
+      height: 1.15rem;
+      transform: translateY(-50%);
+      background: currentColor;
+      opacity: 0.65;
+      transition:
+        height 0.2s ease,
+        opacity 0.2s ease;
+    }
+
+    .artifact-card:hover .artifact-card__rule,
+    .artifact-card__hit:focus-visible
+      ~ .artifact-card__surface
+      .artifact-card__rule {
+      height: 100%;
+      opacity: 1;
+    }
+
+    /* Hairline type stamp — outline only, no fill. */
+    .artifact-card__stamp {
+      display: grid;
+      place-items: center;
+      width: 2rem;
+      height: 2rem;
+      border: 1px solid color-mix(in srgb, currentColor 32%, transparent);
+      border-radius: 4px;
+    }
+
+    .artifact-card__stamp ng-icon {
+      font-size: 1rem;
+      line-height: 1;
+    }
+
+    .artifact-card__body {
+      min-width: 0;
+    }
+
+    .artifact-card__title {
+      display: block;
+      font-size: 0.875rem;
+      font-weight: 600;
+      letter-spacing: -0.006em;
+      color: #1f2430;
+      overflow: hidden;
+      text-overflow: ellipsis;
+      white-space: nowrap;
+    }
+
+    :host-context(html.dark) .artifact-card__title {
+      color: #eceef2;
+    }
+
+    /* Metadata line — the app's sans, small and quiet. */
+    .artifact-card__meta {
+      display: flex;
+      align-items: center;
+      gap: 0.4rem;
+      margin-top: 0.2rem;
+      font-size: 0.75rem;
+      color: #4b5563;
+    }
+
+    :host-context(html.dark) .artifact-card__meta {
+      color: #9aa3b2;
+    }
+
+    .artifact-card__type {
+      text-transform: uppercase;
+      font-weight: 600;
+      letter-spacing: 0.07em;
+      color: #353c4a;
+    }
+
+    :host-context(html.dark) .artifact-card__type {
+      color: #c4cbd8;
+    }
+
+    .artifact-card__sep {
+      opacity: 0.4;
+    }
+
+    /* Secondary action: a bordered, labelled download button in the
+       grid's last column. It lives inside the pointer-events:none
+       surface but re-enables them for itself, so it captures its own
+       clicks while the rest of the card falls through to the open
+       button. Resting colour clears the 3:1 non-text contrast bar. */
+    .artifact-card__download {
+      pointer-events: auto;
+      display: inline-flex;
+      align-items: center;
+      gap: 0.4rem;
+      margin: 0;
+      padding: 0.34rem 0.7rem;
+      border: 1px solid color-mix(in srgb, currentColor 35%, transparent);
+      border-radius: 7px;
+      background: none;
+      color: #6b7280;
+      font: inherit;
+      font-size: 0.75rem;
+      font-weight: 600;
+      line-height: 1;
+      white-space: nowrap;
+      cursor: pointer;
+      transition:
+        color 0.18s ease,
+        border-color 0.18s ease,
+        background-color 0.18s ease;
+    }
+
+    .artifact-card__download ng-icon {
+      font-size: 0.95rem;
+      line-height: 1;
+    }
+
+    .artifact-card:hover .artifact-card__download,
+    .artifact-card__download:hover {
+      color: #374151;
+    }
+
+    .artifact-card__download:hover {
+      background: rgba(0, 0, 0, 0.05);
+    }
+
+    .artifact-card__download:focus-visible {
+      outline: 2px solid #2563eb;
+      outline-offset: 2px;
+    }
+
+    .artifact-card__download:disabled {
+      cursor: default;
+    }
+
+    .artifact-card__download.is-busy ng-icon {
+      animation: artifact-card-spin 0.8s linear infinite;
+    }
+
+    :host-context(html.dark) .artifact-card__download {
+      color: #9aa3b2;
+    }
+
+    :host-context(html.dark) .artifact-card:hover .artifact-card__download,
+    :host-context(html.dark) .artifact-card__download:hover {
+      color: #cbd2dd;
+    }
+
+    :host-context(html.dark) .artifact-card__download:hover {
+      background: rgba(255, 255, 255, 0.08);
+    }
+
+    :host-context(html.dark) .artifact-card__download:focus-visible {
+      outline-color: #60a5fa;
+    }
+
+    @keyframes artifact-card-spin {
+      to {
+        transform: rotate(360deg);
+      }
+    }
+
+    @media (prefers-reduced-motion: reduce) {
+      .artifact-card__surface,
+      .artifact-card__rule,
+      .artifact-card__download {
+        transition: none;
+      }
+      .artifact-card__download.is-busy ng-icon {
+        animation: none;
+      }
+    }
+  `,
+})
+export class ArtifactCardComponent {
+  artifact = input.required<Artifact>();
+
+  private artifactState = inject(ArtifactStateService);
+  private artifactDownload = inject(ArtifactDownloadService);
+
+  protected readonly downloading = signal(false);
+
+  protected readonly kind = computed<ArtifactKind>(() =>
+    classifyContentType(this.artifact().contentType),
+  );
+
+  /** Short, human relative time for the meta line. Empty when the
+   *  timestamp is missing or unparseable so the row just omits it. */
+  protected readonly updatedLabel = computed<string>(() =>
+    relativeTime(this.artifact().updatedAt),
+  );
+
+  protected readonly ariaLabel = computed(
+    () =>
+      `Open ${this.kind().label} artifact ${this.artifact().title || 'Untitled'}, version ${this.artifact().version}`,
+  );
+
+  /** Same in idle and busy states so the visible "Download" label is always
+   *  contained in the accessible name (WCAG 2.5.3); busy rides `aria-busy`. */
+  protected readonly downloadAriaLabel = computed(
+    () =>
+      `Download ${this.kind().label} artifact ${this.artifact().title || 'Untitled'}, version ${this.artifact().version}`,
+  );
+
+  protected open(): void {
+    const a = this.artifact();
+    this.artifactState.openArtifactPanel({
+      artifactId: a.artifactId,
+      version: a.version,
+      title: a.title,
+    });
+  }
+
+  protected async download(): Promise<void> {
+    if (this.downloading()) return;
+    const a = this.artifact();
+    this.downloading.set(true);
+    try {
+      await this.artifactDownload.download({
+        artifactId: a.artifactId,
+        version: a.version,
+      });
+    } finally {
+      this.downloading.set(false);
+    }
+  }
+}
+
+const CODE = 'heroCodeBracket';
+const DOC = 'heroDocumentText';
+
+/** Map a MIME type to a label + accent. The match is on the bare type
+ *  (parameters like `; charset=utf-8` are ignored). One mid-tone accent
+ *  per type — legible on both the light and dark surface, used only as
+ *  a thin rule / outline / glyph. */
+function classifyContentType(contentType: string): ArtifactKind {
+  const mime = (contentType || '').split(';')[0].trim().toLowerCase();
+
+  switch (mime) {
+    case 'text/html':
+    case 'application/xhtml+xml':
+      return { label: 'HTML', icon: CODE, accent: '#e8762a' };
+    case 'text/javascript':
+    case 'application/javascript':
+      return { label: 'JS', icon: CODE, accent: '#cf9a13' };
+    case 'text/css':
+      return { label: 'CSS', icon: CODE, accent: '#2f9bd6' };
+    case 'application/json':
+      return { label: 'JSON', icon: CODE, accent: '#1f9d6b' };
+    case 'image/svg+xml':
+      return { label: 'SVG', icon: CODE, accent: '#d6519a' };
+    case 'text/markdown':
+      return { label: 'MD', icon: DOC, accent: '#7c6cf0' };
+    default:
+      return {
+        label: mime.startsWith('text/') ? 'TEXT' : 'DOC',
+        icon: DOC,
+        accent: '#8a93a3',
+      };
+  }
+}
+
+/** "just now" / "5m ago" / "3h ago" / "2d ago", else a short date.
+ *  Returns '' for a missing or unparseable timestamp. */
+function relativeTime(iso: string): string {
+  if (!iso) return '';
+  const then = Date.parse(iso);
+  if (Number.isNaN(then)) return '';
+
+  const diffMs = Date.now() - then;
+  const min = Math.floor(diffMs / 60000);
+  if (min < 1) return 'just now';
+  if (min < 60) return `${min}m ago`;
+
+  const hr = Math.floor(min / 60);
+  if (hr < 24) return `${hr}h ago`;
+
+  const day = Math.floor(hr / 24);
+  if (day < 7) return `${day}d ago`;
+
+  return new Date(then).toLocaleDateString(undefined, {
+    month: 'short',
+    day: 'numeric',
+  });
+}
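Review note: the bucketing in `relativeTime` above can be exercised in isolation. The sketch below is a hypothetical variant, not part of this change. It assumes a `nowMs` parameter (added here only so the buckets are deterministic; the component itself reads `Date.now()`) and otherwise mirrors the thresholds shown in the diff.

```typescript
// Hypothetical, test-friendly variant of the card's relativeTime helper.
// `relativeTimeAt` and `nowMs` are illustrative names, not part of the change.
function relativeTimeAt(iso: string, nowMs: number): string {
  if (!iso) return '';
  const then = Date.parse(iso);
  if (Number.isNaN(then)) return '';

  const min = Math.floor((nowMs - then) / 60000);
  if (min < 1) return 'just now'; // also covers timestamps in the future
  if (min < 60) return `${min}m ago`;

  const hr = Math.floor(min / 60);
  if (hr < 24) return `${hr}h ago`;

  const day = Math.floor(hr / 24);
  if (day < 7) return `${day}d ago`;

  // A week or older: locale-dependent short date, so no fixed output is shown.
  return new Date(then).toLocaleDateString(undefined, {
    month: 'short',
    day: 'numeric',
  });
}

const now = Date.parse('2026-05-15T12:00:00Z');
console.log(relativeTimeAt('2026-05-15T11:59:30Z', now)); // just now
console.log(relativeTimeAt('2026-05-15T11:55:00Z', now)); // 5m ago
console.log(relativeTimeAt('2026-05-15T09:00:00Z', now)); // 3h ago
console.log(relativeTimeAt('2026-05-13T12:00:00Z', now)); // 2d ago
console.log(relativeTimeAt('not a date', now) === '');    // true
```

Passing the clock in keeps the helper pure, which is one way a unit test could pin the minute, hour, and day boundaries without mocking `Date.now()`.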
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-panel.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-panel.component.ts
new file mode 100644
index 00000000..4cb1c2e5
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-panel.component.ts
@@ -0,0 +1,822 @@
+import {
+  ChangeDetectionStrategy,
+  Component,
+  ElementRef,
+  computed,
+  effect,
+  inject,
+  signal,
+  viewChild,
+} from '@angular/core';
+import { DomSanitizer, SafeResourceUrl } from '@angular/platform-browser';
+import { HttpErrorResponse } from '@angular/common/http';
+import { NgTemplateOutlet } from '@angular/common';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import {
+  heroXMark,
+  heroArrowPath,
+  heroExclamationTriangle,
+  heroArrowDownTray,
+  heroEye,
+  heroCodeBracket,
+  heroClipboard,
+  heroCheck,
+  heroChevronUpDown,
+} from '@ng-icons/heroicons/outline';
+import type { Artifact } from '../../../../services/artifacts/artifact.model';
+import { ArtifactStateService } from '../../../../services/artifacts/artifact-state.service';
+import {
+  ArtifactHttpService,
+  type ArtifactContent,
+} from '../../../../services/artifacts/artifact-http.service';
+import { ArtifactDownloadService } from '../../../../services/artifacts/artifact-download.service';
+import { SessionService } from '../../../../services/session/session.service';
+import { ArtifactSourceComponent } from './artifact-source.component';
+
+/**
+ * Right-docked pane that renders one artifact version in a sandboxed
+ * iframe. Non-modal: the side nav collapses to free the space (handled
+ * by `ArtifactStateService`) and the chat stays visible and interactive
+ * alongside it. Open state lives in `ArtifactStateService`; this component
+ * mints a fresh short-lived render token each time it opens (the token
+ * is a ~120s bearer credential — never cached) and points the iframe at
+ * the returned artifact-origin URL.
+ *
+ * Isolation model (from the #306 design): the artifact origin is a
+ * separate subdomain, the render Lambda + CloudFront stamp a strict CSP,
+ * and the iframe carries `sandbox="allow-scripts"` *without*
+ * `allow-same-origin`, so the framed document gets an opaque origin and
+ * cannot reach the artifact origin's storage/cookies. `bypassSecurityTrustResourceUrl`
+ * is safe here: the URL comes from our own authenticated app-api, is
+ * single-use, and the sandbox + CSP contain the content.
+ */
+@Component({
+  selector: 'app-artifact-panel',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [NgIcon, NgTemplateOutlet, ArtifactSourceComponent],
+  providers: [
+    provideIcons({
+      heroXMark,
+      heroArrowPath,
+      heroExclamationTriangle,
+      heroArrowDownTray,
+      heroEye,
+      heroCodeBracket,
+      heroClipboard,
+      heroCheck,
+      heroChevronUpDown,
+    }),
+  ],
+  host: {
+    '(document:keydown.escape)': 'onEscape()',
+    '(document:pointerdown)': 'onDocumentPointerDown($event)',
+  },
+  template: `
+    @if (open(); as ref) {
+      <aside
+        class="fixed inset-y-0 right-0 z-40 flex w-full flex-col border-l border-gray-200 bg-white dark:border-gray-700 dark:bg-gray-900"
+        [style.maxWidth]="paneWidthCss()"
+        [class.select-none]="dragging()"
+        [attr.aria-label]="'Artifact: ' + (ref.title || 'Untitled artifact')"
+      >
+        <div
+          role="separator"
+          aria-orientation="vertical"
+          tabindex="0"
+          aria-label="Resize artifact panel"
+          [attr.aria-valuemin]="paneWidthMin"
+          [attr.aria-valuemax]="paneWidthMax"
+          [attr.aria-valuenow]="paneWidth()"
+          class="group absolute inset-y-0 left-0 z-10 flex w-2 -translate-x-1/2 cursor-col-resize touch-none items-center justify-center focus-visible:outline-none"
+          (pointerdown)="onHandlePointerDown($event)"
+          (pointermove)="onHandlePointerMove($event)"
+          (pointerup)="onHandlePointerUp($event)"
+          (pointercancel)="onHandlePointerUp($event)"
+          (keydown)="onHandleKeydown($event)"
+        >
+          <span
+            aria-hidden="true"
+            class="h-12 w-1 rounded-full bg-gray-300 transition-colors group-hover:bg-blue-500 group-focus-visible:bg-blue-500 dark:bg-gray-600 dark:group-hover:bg-blue-400"
+          ></span>
+        </div>
+        <header
+          class="flex items-center gap-3 border-b border-gray-200 px-4 py-3 dark:border-gray-700"
+        >
+          <div class="min-w-0 flex-1">
+            <h2
+              class="truncate text-sm font-semibold text-gray-900 dark:text-gray-100"
+            >
+              {{ ref.title || 'Untitled artifact' }}
+            </h2>
+            @if (versions().length > 1) {
+              <div #versionControl class="relative">
+                <button
+                  type="button"
+                  class="-ml-1 flex items-center gap-0.5 rounded px-1 py-0.5 text-xs text-gray-500 transition-colors hover:text-gray-900 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-blue-500 dark:text-gray-400 dark:hover:text-gray-100"
+                  aria-haspopup="menu"
+                  [attr.aria-expanded]="menuOpen()"
+                  aria-controls="artifact-version-menu"
+                  [attr.aria-label]="
+                    'Version ' +
+                    ref.version +
+                    ' of ' +
+                    versions().length +
+                    ', change version'
+                  "
+                  (click)="toggleMenu()"
+                >
+                  <span>Version {{ ref.version }}</span>
+                  <ng-icon
+                    name="heroChevronUpDown"
+                    class="text-sm"
+                    aria-hidden="true"
+                  />
+                </button>
+                @if (menuOpen()) {
+                  <div
+                    id="artifact-version-menu"
+                    role="menu"
+                    aria-label="Artifact versions"
+                    class="absolute left-0 z-20 mt-1 max-h-72 w-60 overflow-auto rounded-lg border border-gray-200 bg-white p-1 shadow-lg dark:border-gray-700 dark:bg-gray-800"
+                    (keydown)="onMenuKeydown($event)"
+                  >
+                    @for (v of versions(); track v.version) {
+                      <button
+                        type="button"
+                        role="menuitemradio"
+                        [attr.aria-checked]="v.version === ref.version"
+                        class="flex w-full items-center gap-2 rounded-md px-2 py-1.5 text-left text-xs transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-inset focus-visible:ring-blue-500"
+                        [class]="
+                          v.version === ref.version
+                            ? 'bg-gray-100 text-gray-900 dark:bg-gray-700 dark:text-gray-100'
+                            : 'text-gray-700 hover:bg-gray-100 dark:text-gray-300 dark:hover:bg-gray-700'
+                        "
+                        (click)="selectVersion(v)"
+                      >
+                        <span
+                          class="flex h-4 w-4 shrink-0 items-center justify-center"
+                          aria-hidden="true"
+                        >
+                          @if (v.version === ref.version) {
+                            <ng-icon name="heroCheck" class="text-sm" />
+                          }
+                        </span>
+                        <span class="min-w-0 flex-1">
+                          <span class="font-medium">Version {{ v.version }}</span>
+                          @if (relativeLabel(v.updatedAt); as t) {
+                            <span class="ml-1 text-gray-400 dark:text-gray-500"
+                              >· {{ t }}</span
+                            >
+                          }
+                        </span>
+                        @if (v.version === latestVersion()) {
+                          <span
+                            class="shrink-0 rounded bg-blue-50 px-1.5 py-0.5 text-[10px] font-semibold uppercase tracking-wide text-blue-700 dark:bg-blue-500/15 dark:text-blue-300"
+                            >Latest</span
+                          >
+                        }
+                      </button>
+                    }
+                  </div>
+                }
+              </div>
+            } @else {
+              <p class="text-xs text-gray-500 dark:text-gray-400">
+                Version {{ ref.version }}
+              </p>
+            }
+          </div>
+          <div
+            role="group"
+            aria-label="Artifact view mode"
+            class="flex items-center gap-0.5 rounded-md border border-gray-200 p-0.5 dark:border-gray-700"
+          >
+            <button
+              type="button"
+              class="flex h-7 w-7 items-center justify-center rounded transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-blue-500"
+              [class]="
+                view() === 'preview'
+                  ? 'bg-gray-100 text-gray-900 dark:bg-gray-800 dark:text-gray-100'
+                  : 'text-gray-500 hover:text-gray-900 dark:text-gray-400 dark:hover:text-gray-100'
+              "
+              [attr.aria-pressed]="view() === 'preview'"
+              aria-label="Preview"
+              title="Preview"
+              (click)="setView('preview')"
+            >
+              <ng-icon name="heroEye" class="text-base" aria-hidden="true" />
+            </button>
+            <button
+              type="button"
+              class="flex h-7 w-7 items-center justify-center rounded transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-blue-500"
+              [class]="
+                view() === 'code'
+                  ? 'bg-gray-100 text-gray-900 dark:bg-gray-800 dark:text-gray-100'
+                  : 'text-gray-500 hover:text-gray-900 dark:text-gray-400 dark:hover:text-gray-100'
+              "
+              [attr.aria-pressed]="view() === 'code'"
+              aria-label="View code"
+              title="View code"
+              (click)="setView('code')"
+            >
+              <ng-icon
+                name="heroCodeBracket"
+                class="text-base"
+                aria-hidden="true"
+              />
+            </button>
+          </div>
+          @if (view() === 'code') {
+            <button
+              type="button"
+              class="flex h-8 w-8 items-center justify-center rounded-md text-gray-500 transition-colors hover:bg-gray-100 hover:text-gray-900 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-blue-500 disabled:cursor-not-allowed disabled:opacity-50 dark:text-gray-400 dark:hover:bg-gray-800 dark:hover:text-gray-100"
+              [attr.aria-label]="copied() ? 'Copied' : 'Copy code'"
+              [disabled]="!source()"
+              (click)="copy()"
+            >
+              <ng-icon
+                [name]="copied() ? 'heroCheck' : 'heroClipboard'"
+                class="text-lg"
+                [class.text-green-600]="copied()"
+                aria-hidden="true"
+              />
+            </button>
+          }
+          <button
+            type="button"
+            class="flex h-8 w-8 items-center justify-center rounded-md text-gray-500 transition-colors hover:bg-gray-100 hover:text-gray-900 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-blue-500 disabled:cursor-not-allowed disabled:opacity-50 dark:text-gray-400 dark:hover:bg-gray-800 dark:hover:text-gray-100"
+            [attr.aria-label]="
+              downloading() ? 'Downloading artifact…' : 'Download artifact'
+            "
+            [attr.aria-busy]="downloading()"
+            [disabled]="downloading() || !safeUrl()"
+            (click)="download()"
+          >
+            <ng-icon
+              [name]="downloading() ? 'heroArrowPath' : 'heroArrowDownTray'"
+              class="text-lg"
+              [class.animate-spin]="downloading()"
+              aria-hidden="true"
+            />
+          </button>
+          <button
+            type="button"
+            class="flex h-8 w-8 items-center justify-center rounded-md text-gray-500 transition-colors hover:bg-gray-100 hover:text-gray-900 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-blue-500 dark:text-gray-400 dark:hover:bg-gray-800 dark:hover:text-gray-100"
+            aria-label="Close artifact"
+            (click)="close()"
+          >
+            <ng-icon name="heroXMark" class="text-lg" aria-hidden="true" />
+          </button>
+        </header>
+
+        <div class="relative min-h-0 flex-1">
+          @if (view() === 'code') {
+            @if (sourceError()) {
+              <div
+                class="absolute inset-0 flex flex-col items-center justify-center gap-3 px-6 text-center"
+                role="alert"
+              >
+                <ng-icon
+                  name="heroExclamationTriangle"
+                  class="text-3xl text-amber-500"
+                  aria-hidden="true"
+                />
+                <p class="text-sm text-gray-700 dark:text-gray-300">
+                  {{ sourceError() }}
+                </p>
+                <button
+                  type="button"
+                  class="rounded-md bg-blue-600 px-3 py-1.5 text-sm font-medium text-white transition-colors hover:bg-blue-700 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-blue-500 focus-visible:ring-offset-1 dark:focus-visible:ring-offset-gray-900"
+                  (click)="retrySource()"
+                >
+                  Try again
+                </button>
+              </div>
+            } @else if (source(); as src) {
+              <app-artifact-source
+                [content]="src.content"
+                [contentType]="src.contentType"
+              />
+            } @else {
+              <ng-container
+                [ngTemplateOutlet]="skeleton"
+                [ngTemplateOutletContext]="{ label: 'Building source view…' }"
+              />
+            }
+          } @else {
+            @if (error()) {
+              <div
+                class="absolute inset-0 flex flex-col items-center justify-center gap-3 px-6 text-center"
+                role="alert"
+              >
+                <ng-icon
+                  name="heroExclamationTriangle"
+                  class="text-3xl text-amber-500"
+                  aria-hidden="true"
+                />
+                <p class="text-sm text-gray-700 dark:text-gray-300">
+                  {{ error() }}
+                </p>
+                <button
+                  type="button"
+                  class="rounded-md bg-blue-600 px-3 py-1.5 text-sm font-medium text-white transition-colors hover:bg-blue-700 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-blue-500 focus-visible:ring-offset-1 dark:focus-visible:ring-offset-gray-900"
+                  (click)="retry()"
+                >
+                  Try again
+                </button>
+              </div>
+            } @else {
+              @if (safeUrl(); as url) {
+                <iframe
+                  [src]="url"
+                  class="h-full w-full border-0 bg-white"
+                  [class.pointer-events-none]="dragging()"
+                  [title]="ref.title || 'Artifact'"
+                  sandbox="allow-scripts"
+                  referrerpolicy="no-referrer"
+                  loading="lazy"
+                  (load)="onIframeLoad()"
+                ></iframe>
+              }
+              @if (!previewReady()) {
+                <ng-container
+                  [ngTemplateOutlet]="skeleton"
+                  [ngTemplateOutletContext]="{ label: 'Rendering artifact…' }"
+                />
+              }
+            }
+          }
+        </div>
+      </aside>
+      <ng-template #skeleton let-label="label">
+        <div
+          class="absolute inset-0 overflow-hidden bg-white p-8 dark:bg-gray-900"
+          role="status"
+          [attr.aria-label]="label"
+        >
+          <div aria-hidden="true" class="mx-auto flex max-w-2xl flex-col gap-6">
+            <div class="flex flex-col gap-3">
+              <div
+                class="skeleton-shimmer h-8 w-1/2 rounded-lg bg-gray-200 dark:bg-gray-700"
+              ></div>
+              <div
+                class="skeleton-shimmer h-4 w-1/4 rounded bg-gray-200 dark:bg-gray-700"
+              ></div>
+            </div>
+            <div class="flex flex-col gap-3">
+              <div
+                class="skeleton-shimmer h-3.5 w-full rounded bg-gray-200 dark:bg-gray-700"
+              ></div>
+              <div
+                class="skeleton-shimmer h-3.5 w-11/12 rounded bg-gray-200 dark:bg-gray-700"
+              ></div>
+              <div
+                class="skeleton-shimmer h-3.5 w-4/5 rounded bg-gray-200 dark:bg-gray-700"
+              ></div>
+            </div>
+            <div
+              class="skeleton-shimmer h-48 w-full rounded-xl bg-gray-200 dark:bg-gray-700"
+            ></div>
+            <div class="flex flex-col gap-3">
+              <div
+                class="skeleton-shimmer h-3.5 w-full rounded bg-gray-200 dark:bg-gray-700"
+              ></div>
+              <div
+                class="skeleton-shimmer h-3.5 w-10/12 rounded bg-gray-200 dark:bg-gray-700"
+              ></div>
+              <div
+                class="skeleton-shimmer h-3.5 w-2/3 rounded bg-gray-200 dark:bg-gray-700"
+              ></div>
+            </div>
+          </div>
+          <span class="sr-only">{{ label }}</span>
+        </div>
+      </ng-template>
+    }
+  `,
+  styles: `
+    :host {
+      display: contents;
+    }
+    .skeleton-shimmer {
+      background-image: linear-gradient(
+        90deg,
+        transparent 0%,
+        rgba(255, 255, 255, 0.45) 50%,
+        transparent 100%
+      );
+      background-size: 220% 100%;
+      background-repeat: no-repeat;
+      animation: artifact-skeleton-shimmer 1.5s ease-in-out infinite;
+    }
+    @keyframes artifact-skeleton-shimmer {
+      0% {
+        background-position: 130% 0;
+      }
+      100% {
+        background-position: -130% 0;
+      }
+    }
+    @media (prefers-reduced-motion: reduce) {
+      .skeleton-shimmer {
+        animation: none;
+      }
+    }
+  `,
+})
+export class ArtifactPanelComponent {
+  private artifactState = inject(ArtifactStateService);
+  private artifactHttp = inject(ArtifactHttpService);
+  private sessionService = inject(SessionService);
+  private sanitizer = inject(DomSanitizer);
+  private artifactDownload = inject(ArtifactDownloadService);
+
+  protected readonly open = this.artifactState.openArtifact;
+
+  protected readonly safeUrl = signal<SafeResourceUrl | null>(null);
+  protected readonly error = signal<string | null>(null);
+  protected readonly downloading = signal(false);
+
+  /** Flips when the preview iframe fires `load`: the framed document has
+   *  actually loaded, rather than the render token merely being minted. */
+  protected readonly iframeLoaded = signal(false);
+  /** Skeleton clears only when a URL exists AND the iframe has loaded. */
+  protected readonly previewReady = computed(
+    () => !!this.safeUrl() && this.iframeLoaded(),
+  );
+
+  // Code-view state. The source is fetched lazily the first time the
+  // user switches to 'code' for a given artifact (the preview iframe
+  // path is untouched and stays the default).
+  protected readonly view = signal<'preview' | 'code'>('preview');
+  protected readonly source = signal<ArtifactContent | null>(null);
+  protected readonly sourceLoading = signal(false);
+  protected readonly sourceError = signal<string | null>(null);
+  protected readonly copied = signal(false);
+  /** Bumped per source fetch so a slow response that resolves after the
+   *  panel closed or switched artifact is discarded. */
+  private sourceSeq = 0;
+  /** Which `currentKey()` the loaded source belongs to, so re-opening
+   *  the same version doesn't refetch but switching does. */
+  private loadedSourceKey: string | null = null;
+  private copiedTimer: ReturnType<typeof setTimeout> | null = null;
+
+  // Resize handle state. Width itself lives in ArtifactStateService so
+  // the layout (content padding + fixed footer/topnav) can reserve the
+  // same amount via a CSS var.
+  protected readonly paneWidth = this.artifactState.paneWidth;
+  protected readonly paneWidthMin = this.artifactState.paneWidthMin;
+  protected readonly paneWidthMax = this.artifactState.paneWidthMax;
+  protected readonly paneWidthCss = computed(
+    () => `${this.artifactState.paneWidth()}px`,
+  );
+  protected readonly dragging = signal(false);
+  private dragStartX = 0;
+  private dragStartWidth = 0;
+
+  /** Bumped per open/retry so a slow mint that resolves after the panel
+   *  closed (or moved to another artifact) is discarded. */
+  private requestSeq = 0;
+
+  protected readonly currentKey = computed(() => {
+    const r = this.open();
+    return r ? `${r.artifactId}#${r.version}` : null;
+  });
+
+  // Version picker. All versions live in ArtifactStateService (one
+  // registry entry per `artifactId#version`); selecting one just
+  // re-points openRef and the currentKey effect reloads it.
+  protected readonly versions = computed<Artifact[]>(() => {
+    const r = this.open();
+    return r ? this.artifactState.versionsFor(r.artifactId) : [];
+  });
+  /** Highest version number (the list is desc) — drives the badge. */
+  protected readonly latestVersion = computed(
+    () => this.versions()[0]?.version,
+  );
+  protected readonly menuOpen = signal(false);
+  private readonly versionControl =
+    viewChild<ElementRef<HTMLElement>>('versionControl');
+
+  constructor() {
+    // Mint a fresh token whenever the panel opens or switches artifact.
+    // Closing (open() === null) clears the iframe so the credential URL
+    // doesn't linger in the DOM.
+    effect(() => {
+      const key = this.currentKey();
+      // Any open / version-switch / close also dismisses the menu.
+      this.menuOpen.set(false);
+      if (!key) {
+        this.reset();
+        return;
+      }
+      // New artifact/version: drop any stale source and return to the
+      // preview default so the toggle doesn't carry across artifacts.
+      this.resetSource();
+      this.view.set('preview');
+      void this.load();
+    });
+  }
+
+  protected onHandlePointerDown(e: PointerEvent): void {
+    e.preventDefault();
+    // Capture so the drag keeps tracking even while the cursor is over
+    // the sandboxed iframe (which would otherwise swallow the events).
+    try {
+      (e.target as Element).setPointerCapture(e.pointerId);
+    } catch {
+      /* not all pointer types/environments allow capture — drag still works */
+    }
+    this.dragStartX = e.clientX;
+    this.dragStartWidth = this.artifactState.paneWidth();
+    this.dragging.set(true);
+  }
+
+  protected onHandlePointerMove(e: PointerEvent): void {
+    if (!this.dragging()) return;
+    // Pane is docked to the right edge: dragging the handle left (a
+    // smaller clientX) widens it.
+    const delta = this.dragStartX - e.clientX;
+    this.applyWidth(this.dragStartWidth + delta);
+  }
+
+  protected onHandlePointerUp(e: PointerEvent): void {
+    if (!this.dragging()) return;
+    this.dragging.set(false);
+    try {
+      const el = e.target as Element;
+      if (el.hasPointerCapture(e.pointerId)) {
+        el.releasePointerCapture(e.pointerId);
+      }
+    } catch {
+      /* capture may never have been established — nothing to release */
+    }
+  }
+
+  protected onHandleKeydown(e: KeyboardEvent): void {
+    const step = e.shiftKey ? 64 : 16;
+    const w = this.artifactState.paneWidth();
+    switch (e.key) {
+      case 'ArrowLeft': // widen (boundary moves left)
+        this.applyWidth(w + step);
+        break;
+      case 'ArrowRight': // narrow
+        this.applyWidth(w - step);
+        break;
+      case 'Home':
+        this.applyWidth(this.paneWidthMax);
+        break;
+      case 'End':
+        this.applyWidth(this.paneWidthMin);
+        break;
+      default:
+        return;
+    }
+    e.preventDefault();
+  }
+
+  /** Clamp to the absolute service bounds and to a viewport-relative
+   *  ceiling so the chat column can never be squeezed away entirely. */
+  private applyWidth(px: number): void {
+    let target = px;
+    if (typeof window !== 'undefined') {
+      const maxByViewport = Math.max(
+        this.paneWidthMin,
+        window.innerWidth - 360,
+      );
+      target = Math.min(target, maxByViewport);
+    }
+    this.artifactState.setPaneWidth(target);
+  }
+
+  protected onEscape(): void {
+    if (this.menuOpen()) {
+      this.closeMenuAndRefocus();
+      return;
+    }
+    if (this.open()) this.close();
+  }
+
+  protected toggleMenu(): void {
+    const next = !this.menuOpen();
+    this.menuOpen.set(next);
+    // Let the menu render, then move focus into it for keyboard users.
+    if (next) setTimeout(() => this.focusSelectedItem());
+  }
+
+  protected selectVersion(v: Artifact): void {
+    this.menuOpen.set(false);
+    // Re-points openRef; the currentKey effect re-mints the token and
+    // reloads (preview) / refetches (code) for the chosen version, and
+    // the header title tracks that version.
+    this.artifactState.openArtifactPanel({
+      artifactId: v.artifactId,
+      version: v.version,
+      title: v.title,
+    });
+  }
+
+  protected onMenuKeydown(e: KeyboardEvent): void {
+    const items = this.menuItems();
+    if (!items.length) return;
+    const idx = items.indexOf(
+      document.activeElement as HTMLButtonElement,
+    );
+    switch (e.key) {
+      case 'ArrowDown':
+        e.preventDefault();
+        items[(Math.max(idx, -1) + 1) % items.length]?.focus();
+        break;
+      case 'ArrowUp':
+        e.preventDefault();
+        items[(idx <= 0 ? items.length : idx) - 1]?.focus();
+        break;
+      case 'Home':
+        e.preventDefault();
+        items[0]?.focus();
+        break;
+      case 'End':
+        e.preventDefault();
+        items[items.length - 1]?.focus();
+        break;
+      case 'Tab':
+        // Don't leave an orphaned menu when focus tabs away.
+        this.menuOpen.set(false);
+        break;
+      default:
+        break;
+    }
+  }
+
+  protected onDocumentPointerDown(e: PointerEvent): void {
+    if (!this.menuOpen()) return;
+    const root = this.versionControl()?.nativeElement;
+    if (root && !root.contains(e.target as Node)) {
+      this.menuOpen.set(false);
+    }
+  }
+
+  /** Short relative time for a version row; '' (hidden) if unparseable. */
+  protected relativeLabel(iso: string): string {
+    if (!iso) return '';
+    const then = Date.parse(iso);
+    if (Number.isNaN(then)) return '';
+    const min = Math.floor((Date.now() - then) / 60000);
+    if (min < 1) return 'just now';
+    if (min < 60) return `${min}m ago`;
+    const hr = Math.floor(min / 60);
+    if (hr < 24) return `${hr}h ago`;
+    return `${Math.floor(hr / 24)}d ago`;
+  }
+
+  private menuItems(): HTMLButtonElement[] {
+    const root = this.versionControl()?.nativeElement;
+    if (!root) return [];
+    return Array.from(
+      root.querySelectorAll<HTMLButtonElement>('[role="menuitemradio"]'),
+    );
+  }
+
+  private focusSelectedItem(): void {
+    const items = this.menuItems();
+    const current =
+      items.find((b) => b.getAttribute('aria-checked') === 'true') ??
+      items[0];
+    current?.focus();
+  }
+
+  private closeMenuAndRefocus(): void {
+    this.menuOpen.set(false);
+    this.versionControl()
+      ?.nativeElement.querySelector<HTMLButtonElement>(
+        '[aria-haspopup="menu"]',
+      )
+      ?.focus();
+  }
+
+  protected close(): void {
+    this.artifactState.closeArtifactPanel();
+  }
+
+  protected retry(): void {
+    void this.load();
+  }
+
+  protected onIframeLoad(): void {
+    this.iframeLoaded.set(true);
+  }
+
+  protected setView(next: 'preview' | 'code'): void {
+    if (this.view() === next) return;
+    this.view.set(next);
+    if (next === 'code') void this.ensureSource();
+  }
+
+  protected retrySource(): void {
+    this.loadedSourceKey = null;
+    void this.ensureSource();
+  }
+
+  /** Fetch the raw source once per artifact version. No-op if it's
+   *  already loaded for the current key or a fetch is in flight. */
+  private async ensureSource(): Promise<void> {
+    const ref = this.open();
+    if (!ref) return;
+    const key = this.currentKey();
+    if (this.sourceLoading()) return;
+    if (this.source() && this.loadedSourceKey === key) return;
+
+    const seq = ++this.sourceSeq;
+    this.sourceLoading.set(true);
+    this.sourceError.set(null);
+    this.source.set(null);
+    try {
+      const content = await this.artifactHttp.getArtifactContent(
+        ref.artifactId,
+        ref.version,
+      );
+      if (seq !== this.sourceSeq || this.currentKey() !== key) return; // superseded, switched, or closed: drop
+      this.source.set(content);
+      this.loadedSourceKey = key;
+    } catch (err) {
+      if (seq !== this.sourceSeq || this.currentKey() !== key) return;
+      this.sourceError.set(
+        err instanceof HttpErrorResponse && err.status === 413
+          ? 'This artifact is too large to preview here — download it instead.'
+          : "This artifact's source couldn't be loaded. It may have expired or been removed.",
+      );
+    } finally {
+      if (seq === this.sourceSeq) this.sourceLoading.set(false);
+    }
+  }
+
+  protected async copy(): Promise<void> {
+    const src = this.source();
+    if (!src) return;
+    try {
+      await navigator.clipboard.writeText(src.content);
+      this.copied.set(true);
+      if (this.copiedTimer) clearTimeout(this.copiedTimer);
+      this.copiedTimer = setTimeout(() => this.copied.set(false), 2000);
+    } catch {
+      /* clipboard blocked (permissions/insecure context) — no-op;
+         the user can still select the visible source manually */
+    }
+  }
+
+  private resetSource(): void {
+    this.source.set(null);
+    this.sourceError.set(null);
+    this.sourceLoading.set(false);
+    this.copied.set(false);
+    this.loadedSourceKey = null;
+  }
+
+  protected async download(): Promise<void> {
+    const ref = this.open();
+    if (!ref || this.downloading()) return;
+    this.downloading.set(true);
+    try {
+      await this.artifactDownload.download({
+        artifactId: ref.artifactId,
+        version: ref.version,
+      });
+    } finally {
+      this.downloading.set(false);
+    }
+  }
+
+  private reset(): void {
+    this.safeUrl.set(null);
+    this.error.set(null);
+    this.iframeLoaded.set(false);
+    this.resetSource();
+    this.view.set('preview');
+  }
+
+  private async load(): Promise<void> {
+    const ref = this.open();
+    if (!ref) return;
+    const seq = ++this.requestSeq;
+    this.error.set(null);
+    this.safeUrl.set(null);
+    this.iframeLoaded.set(false);
+    try {
+      const sessionId = this.sessionService.currentSession().sessionId;
+      const token = await this.artifactHttp.mintRenderToken(
+        ref.artifactId,
+        ref.version,
+        sessionId,
+      );
+      if (seq !== this.requestSeq || !this.open()) return; // superseded or closed: drop
+      this.safeUrl.set(
+        this.sanitizer.bypassSecurityTrustResourceUrl(token.url),
+      );
+    } catch {
+      if (seq !== this.requestSeq || !this.open()) return;
+      this.error.set(
+        "This artifact couldn't be loaded. It may have expired or been removed.",
+      );
+    }
+  }
+}
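Reviewer note on the `requestSeq` / `sourceSeq` counters in `ArtifactPanelComponent`: both implement a "latest request wins" guard. The sketch below isolates that idea under stated assumptions. `LatestGuard`, `begin`, `isCurrent`, and `invalidate` are hypothetical names that do not exist in this diff; the component inlines the same logic with plain number fields. It is a minimal illustration, not the component's implementation.

```typescript
// Hypothetical helper (not part of the diff): a "latest request wins" guard.
class LatestGuard {
  private seq = 0;

  /** Start a new request and return its ticket. */
  begin(): number {
    return ++this.seq;
  }

  /** True only for the most recently issued ticket. */
  isCurrent(ticket: number): boolean {
    return ticket === this.seq;
  }

  /** Invalidate every outstanding ticket without starting a new request. */
  invalidate(): void {
    this.seq++;
  }
}

const guard = new LatestGuard();
const first = guard.begin(); // e.g. fetch for artifact A
const second = guard.begin(); // user switched to artifact B
console.log(guard.isCurrent(first), guard.isCurrent(second)); // false true

guard.invalidate(); // e.g. the panel was closed
console.log(guard.isCurrent(second)); // false
```

An async caller would capture the ticket before `await` and check `isCurrent` afterwards, as `load()` and `ensureSource()` do with `seq`. The `invalidate` step shows one way to also discard in-flight work when the panel closes without a new request starting.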
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-source.component.spec.ts b/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-source.component.spec.ts
new file mode 100644
index 00000000..8ab133e5
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-source.component.spec.ts
@@ -0,0 +1,66 @@
+import { describe, it, expect, beforeEach } from 'vitest';
+import { TestBed, ComponentFixture } from '@angular/core/testing';
+import { ArtifactSourceComponent } from './artifact-source.component';
+
+/**
+ * Prism is loaded as a global angular.json script, absent under vitest,
+ * so `getPrism()` returns null and the component falls back to escaped
+ * plain text. That makes the XSS-safety assertion deterministic here
+ * regardless of grammar availability.
+ */
+describe('ArtifactSourceComponent', () => {
+  let fixture: ComponentFixture<ArtifactSourceComponent>;
+
+  function render(content: string, contentType: string): HTMLElement {
+    fixture = TestBed.createComponent(ArtifactSourceComponent);
+    fixture.componentRef.setInput('content', content);
+    fixture.componentRef.setInput('contentType', contentType);
+    fixture.detectChanges();
+    return fixture.nativeElement as HTMLElement;
+  }
+
+  beforeEach(() => {
+    TestBed.resetTestingModule();
+    TestBed.configureTestingModule({
+      imports: [ArtifactSourceComponent],
+    });
+  });
+
+  it('renders one gutter number per source line', () => {
+    const el = render('a\nb\nc', 'text/css');
+    const gutter = el.querySelectorAll('[aria-hidden="true"] > div');
+    expect(gutter.length).toBe(3);
+    expect(Array.from(gutter).map((d) => d.textContent)).toEqual([
+      '1',
+      '2',
+      '3',
+    ]);
+  });
+
+  it('counts a trailing newline as a final (empty) line', () => {
+    const el = render('only\n', 'text/css');
+    expect(el.querySelectorAll('[aria-hidden="true"] > div').length).toBe(2);
+  });
+
+  it('maps the content type onto a Prism language class', () => {
+    const el = render('<p>x</p>', 'text/html; charset=utf-8');
+    expect(el.querySelector('code')?.className).toContain(
+      'language-markup',
+    );
+  });
+
+  it('falls back to a plain-text language for unknown types', () => {
+    const el = render('plain', 'application/octet-stream');
+    expect(el.querySelector('code')?.className).toContain('language-none');
+  });
+
+  it('escapes HTML so artifact source can never execute', () => {
+    const el = render('<script>alert(1)</script>', 'text/html');
+    const code = el.querySelector('code');
+    // No live <script> element was injected…
+    expect(code?.querySelector('script')).toBeNull();
+    // …the angle brackets were rendered as text instead.
+    expect(code?.innerHTML).toContain('&lt;script&gt;');
+    expect(code?.textContent).toBe('<script>alert(1)</script>');
+  });
+});
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-source.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-source.component.ts
new file mode 100644
index 00000000..57712146
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/artifact/artifact-source.component.ts
@@ -0,0 +1,127 @@
+import {
+  ChangeDetectionStrategy,
+  Component,
+  computed,
+  input,
+} from '@angular/core';
+
+/**
+ * Read-only source view for the artifact panel's "code" mode. Highlights
+ * inert artifact text with the globally-loaded Prism (same build/theme
+ * the chat code blocks use) and renders a line-number gutter beside it.
+ *
+ * Safety: the content is never executed. Prism.highlight HTML-escapes
+ * the source and only injects its own `<span class="token">` wrappers,
+ * and the `[innerHTML]` binding is additionally run through Angular's
+ * sanitizer (which strips scripts/handlers but keeps the token spans) —
+ * so this is XSS-safe without `bypassSecurityTrustHtml`.
+ */
+interface PrismLike {
+  languages: Record<string, unknown>;
+  highlight(text: string, grammar: unknown, language: string): string;
+}
+
+function getPrism(): PrismLike | null {
+  const p = (globalThis as { Prism?: PrismLike }).Prism;
+  return p && typeof p.highlight === 'function' ? p : null;
+}
+
+function escapeHtml(value: string): string {
+  return value
+    .replace(/&/g, '&amp;')
+    .replace(/</g, '&lt;')
+    .replace(/>/g, '&gt;');
+}
+
+/** Bare MIME type → Prism language id. Unknown types render as escaped
+ *  plain text (no grammar) so the view never breaks on an exotic type. */
+function mimeToPrismLang(contentType: string): string {
+  const bare = (contentType || '').split(';')[0].trim().toLowerCase();
+  switch (bare) {
+    case 'text/html':
+    case 'application/xhtml+xml':
+    case 'image/svg+xml':
+      return 'markup';
+    case 'text/javascript':
+    case 'application/javascript':
+      return 'javascript';
+    case 'text/css':
+      return 'css';
+    case 'application/json':
+      return 'json';
+    case 'text/markdown':
+    case 'text/x-markdown':
+      return 'markdown';
+    default:
+      return 'none';
+  }
+}
+
+@Component({
+  selector: 'app-artifact-source',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  template: `
+    <div
+      class="h-full w-full overflow-auto bg-[#272822] text-[13px]"
+      tabindex="0"
+      role="region"
+      [attr.aria-label]="'Artifact source (' + language() + ')'"
+    >
+      <div class="flex min-h-full min-w-full w-max">
+        <div
+          aria-hidden="true"
+          class="sticky left-0 z-10 select-none border-r border-white/10 bg-[#272822] px-3 py-4 text-right font-mono leading-[1.6] text-white/35 tabular-nums"
+        >
+          @for (n of lineNumbers(); track n) {
+            <div>{{ n }}</div>
+          }
+        </div>
+        <pre
+          class="m-0 flex-1 px-4 py-4 font-mono leading-[1.6]"
+        ><code [class]="'language-' + language()" [innerHTML]="highlighted()"></code></pre>
+      </div>
+    </div>
+  `,
+  styles: `
+    :host {
+      display: block;
+      height: 100%;
+    }
+    /* The global okaidia theme paints its own background/padding on
+       pre[class*="language-"]; we host the surface on the scroll
+       container instead so the gutter and code share it. */
+    :host pre {
+      background: transparent;
+      white-space: pre;
+      tab-size: 2;
+    }
+  `,
+})
+export class ArtifactSourceComponent {
+  readonly content = input.required<string>();
+  readonly contentType = input.required<string>();
+
+  protected readonly language = computed(() =>
+    mimeToPrismLang(this.contentType()),
+  );
+
+  protected readonly lineNumbers = computed(() => {
+    const lines = this.content().split('\n').length;
+    return Array.from({ length: lines }, (_, i) => i + 1);
+  });
+
+  protected readonly highlighted = computed(() => {
+    const code = this.content();
+    const lang = this.language();
+    const prism = getPrism();
+    const grammar = prism?.languages[lang];
+    if (lang === 'none' || !prism || !grammar) {
+      return escapeHtml(code);
+    }
+    try {
+      return prism.highlight(code, grammar, lang);
+    } catch {
+      return escapeHtml(code);
+    }
+  });
+}
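Reviewer note on the fallback path of `ArtifactSourceComponent`: the spec above relies on three small behaviours (ampersand-first escaping, a trailing newline counting as an extra line, and MIME parameters being stripped before lookup). The standalone sketch below reproduces them for quick experimentation. `escapeHtmlSketch`, `countLines`, and `bareMime` are illustrative names chosen here; they mirror the logic in the diff but are not exported by it.

```typescript
// Illustrative copies of the pure helpers; this block stands alone.
function escapeHtmlSketch(value: string): string {
  return value
    .replace(/&/g, '&amp;') // must run first, otherwise it would re-escape the entities added below
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Gutter length: one entry per '\n'-separated segment.
function countLines(source: string): number {
  return source.split('\n').length;
}

// Drop parameters such as "; charset=utf-8" and normalise case.
function bareMime(contentType: string): string {
  return (contentType || '').split(';')[0].trim().toLowerCase();
}

console.log(escapeHtmlSketch('<b>&lt;</b>')); // &lt;b&gt;&amp;lt;&lt;/b&gt;
console.log(countLines('only\n')); // 2
console.log(countLines('')); // 1
console.log(bareMime('Text/HTML; charset=utf-8')); // text/html
```

Because an empty string still yields one segment, the gutter always shows at least line 1, which matches how the component computes `lineNumbers`.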
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/assistant-message.component.spec.ts b/frontend/ai.client/src/app/session/components/message-list/components/assistant-message.component.spec.ts
index eaa9a488..d2562992 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/assistant-message.component.spec.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/assistant-message.component.spec.ts
@@ -3,6 +3,8 @@ import { describe, it, expect, beforeEach } from 'vitest';
 import { provideMarkdown, MarkdownService } from 'ngx-markdown';
 import { AssistantMessageComponent } from './assistant-message.component';
 import { Message, ContentBlock } from '../../../services/models/message.model';
+import { McpAppStateService } from '../../../services/mcp-apps/mcp-app-state.service';
+import { UiResourceEvent } from '../../../../shared/utils/stream-parser/stream-parser-types';
 
 function makeMessage(content: ContentBlock[]): Message {
   return {
@@ -212,6 +214,104 @@ describe('AssistantMessageComponent', () => {
     });
   });
 
+  // Regression: a tool whose result content carries no inline ui_type/ui_display
+  // marker (i.e. not a legacy promoted-visual tool) was being folded into the
+  // collapsed tool_group, so no <app-tool-use> ever existed for it — and the
+  // MCP App frame (which lives inside <app-tool-use>) never instantiated even
+  // though the backend correctly emitted ui_resource. Surfaced dogfooding the
+  // excalidraw-mcp server, whose `create_view` returns plain text. The fix
+  // promotes any tool whose toolUseId is recorded in McpAppStateService.
+  describe('MCP Apps promote tool out of tool_group', () => {
+    function makeUiResource(toolUseId: string): UiResourceEvent {
+      return {
+        type: 'ui_resource',
+        toolUseId,
+        resourceUri: 'ui://example/app.html',
+        html: '<!doctype html><html><body>app</body></html>',
+        mimeType: 'text/html;profile=mcp-app',
+        csp: {},
+        permissions: {},
+        sandboxOrigin: 'https://mcp-sandbox.example.com',
+      };
+    }
+
+    it('folds a plain-text-result tool into tool_group when there is no ui_resource', () => {
+      const tool = makeToolBlock('create_view', {
+        result: { status: 'success', content: [{ text: 'Diagram displayed!' }] },
+      });
+      fixture.componentRef.setInput('message', makeMessage([tool]));
+      fixture.detectChanges();
+
+      const blocks = component.displayBlocks();
+      expect(blocks.length).toBe(1);
+      expect(blocks[0].type).toBe('tool_group');
+      expect(blocks[0].group!.calls.length).toBe(1);
+    });
+
+    it('promotes the same tool to tool_use_minimized when its toolUseId is in McpAppStateService', () => {
+      const mcpAppState = TestBed.inject(McpAppStateService);
+      const tool = makeToolBlock('create_view', {
+        toolUseId: 'tooluse_mcp_app_1',
+        result: { status: 'success', content: [{ text: 'Diagram displayed!' }] },
+      });
+      mcpAppState.recordLive(makeUiResource('tooluse_mcp_app_1'));
+
+      fixture.componentRef.setInput('message', makeMessage([tool]));
+      fixture.detectChanges();
+
+      const blocks = component.displayBlocks();
+      // Two blocks: the minimized tool card (provenance) plus a sibling
+      // mcp_app_frame block that hosts the iframe at the same level as
+      // text/visuals. No `promoted_visual` — MCP Apps have their own type.
+      expect(blocks.length).toBe(2);
+      expect(blocks[0].type).toBe('tool_use_minimized');
+      expect(blocks[1].type).toBe('mcp_app_frame');
+      expect(blocks[1].toolUseId).toBe('tooluse_mcp_app_1');
+      expect(blocks.some((b) => b.type === 'promoted_visual')).toBe(false);
+
+      mcpAppState.reset();
+    });
+
+    it('retroactively promotes when ui_resource arrives AFTER tool_result (late-arrival reactivity)', () => {
+      const mcpAppState = TestBed.inject(McpAppStateService);
+      const tool = makeToolBlock('create_view', {
+        toolUseId: 'tooluse_mcp_app_2',
+        result: { status: 'success', content: [{ text: 'Diagram displayed!' }] },
+      });
+
+      // Initial render: ui_resource hasn't arrived yet → tool folded into group.
+      fixture.componentRef.setInput('message', makeMessage([tool]));
+      fixture.detectChanges();
+      expect(component.displayBlocks()[0].type).toBe('tool_group');
+
+      // ui_resource arrives ~40ms after tool_result on the wire. The
+      // displayBlocks computed must re-run on the McpAppStateService signal
+      // update, or the tool stays folded forever.
+      mcpAppState.recordLive(makeUiResource('tooluse_mcp_app_2'));
+      fixture.detectChanges();
+
+      const blocks = component.displayBlocks();
+      expect(blocks.length).toBe(2);
+      expect(blocks[0].type).toBe('tool_use_minimized');
+      expect(blocks[1].type).toBe('mcp_app_frame');
+
+      mcpAppState.reset();
+    });
+
+    it('promoted-visual tool still emits both tool_use_minimized AND promoted_visual blocks', () => {
+      // Sanity: the new gate must not regress the legacy promoted-visual path.
+      fixture.componentRef.setInput('message', makeMessage([
+        makePromotedVisualToolBlock('chart_tool'),
+      ]));
+      fixture.detectChanges();
+
+      const blocks = component.displayBlocks();
+      expect(blocks.length).toBe(2);
+      expect(blocks[0].type).toBe('tool_use_minimized');
+      expect(blocks[1].type).toBe('promoted_visual');
+    });
+  });
+
   describe('reasoning content', () => {
     it('should render reasoning blocks and flush tool groups', () => {
       fixture.componentRef.setInput('message', makeMessage([
@@ -277,5 +377,33 @@ describe('AssistantMessageComponent', () => {
       const call = blocks[0].group!.calls[0];
       expect(call.status).toBe('pending');
     });
+
+    it('should carry streamingContent through to the ToolCallDisplay', () => {
+      fixture.componentRef.setInput('message', makeMessage([
+        makeToolBlock('create_artifact', {
+          status: 'pending',
+          streamingContent: '<!DOCTYPE html><html><body>partial',
+          // no result yet — still generating
+          result: undefined,
+        }),
+      ]));
+      fixture.detectChanges();
+
+      const blocks = component.displayBlocks();
+      const call = blocks[0].group!.calls[0];
+      expect(call.streamingContent).toBe('<!DOCTYPE html><html><body>partial');
+      expect(call.status).toBe('pending');
+    });
+
+    it('should leave streamingContent undefined for ordinary tool calls', () => {
+      fixture.componentRef.setInput('message', makeMessage([
+        makeToolBlock('my_tool'),
+      ]));
+      fixture.detectChanges();
+
+      const blocks = component.displayBlocks();
+      const call = blocks[0].group!.calls[0];
+      expect(call.streamingContent).toBeUndefined();
+    });
   });
 });
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/assistant-message.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/assistant-message.component.ts
index f0b13b65..908be669 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/assistant-message.component.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/assistant-message.component.ts
@@ -7,10 +7,13 @@ import { ReasoningContentComponent } from './reasoning-content';
 import { StreamingTextComponent } from './streaming-text.component';
 import { InlineVisualComponent } from './inline-visual';
 import { OAuthConsentPromptComponent } from './oauth-consent-prompt/oauth-consent-prompt.component';
+import { McpAppFrameComponent } from './tool-use/renderers/mcp-app-frame.component';
 import {
   OAuthConsentRequest,
   OAuthConsentService,
 } from '../../../../services/oauth-consent/oauth-consent.service';
+import { McpAppStateService } from '../../../services/mcp-apps/mcp-app-state.service';
+import type { ToolResultData } from './tool-use/tool-renderer-registry.service';
 
 // ──────────────────────────────────────────────────────────────
 // 🔧 MOCK FLAG — set to true to render 10 fake tool calls
@@ -108,6 +111,7 @@ interface DisplayBlock {
     | 'tool_group'
     | 'tool_use_minimized'
     | 'promoted_visual'
+    | 'mcp_app_frame'
     | 'reasoningContent'
     | 'oauth_required';
   data?: ContentBlock;
@@ -117,6 +121,8 @@ interface DisplayBlock {
   uiType?: string;
   payload?: unknown;
   toolUseId?: string;
+  // For MCP App frames (SEP-1865): the tool result re-shaped for the renderer.
+  mcpResult?: ToolResultData;
   // For inline OAuth consent prompts
   oauthRequest?: OAuthConsentRequest;
 }
@@ -131,6 +137,7 @@ interface DisplayBlock {
     StreamingTextComponent,
     InlineVisualComponent,
     OAuthConsentPromptComponent,
+    McpAppFrameComponent,
   ],
   template: `
     <div class="block-container">
@@ -196,6 +203,18 @@ interface DisplayBlock {
               />
             </div>
           }
+          @case ('mcp_app_frame') {
+            <div
+              class="message-block visual-block"
+              [style.animation-delay]="$index * 0.1 + 's'"
+            >
+              <app-mcp-app-frame
+                class="block w-full"
+                [result]="block.mcpResult!"
+                [toolUseId]="block.toolUseId!"
+              />
+            </div>
+          }
           @case ('oauth_required') {
             <div
               class="message-block oauth-block"
@@ -259,6 +278,7 @@ export class AssistantMessageComponent {
   isStreaming = input<boolean>(false);
 
   private consentService = inject(OAuthConsentService);
+  private mcpAppState = inject(McpAppStateService);
 
   /**
    * Transforms content blocks into display blocks.
@@ -321,9 +341,23 @@ export class AssistantMessageComponent {
       if ((block.type === 'toolUse' || block.type === 'tool_use') && block.toolUse) {
         const toolUse = block.toolUse as ToolUseData;
         const promotedVisual = this.extractPromotedVisual(toolUse);
+        // An MCP App tool (SEP-1865) renders its sandbox-proxy iframe in a
+        // sibling `mcp_app_frame` block (pushed below), so it must escape the
+        // collapsed tool_group exactly like a promoted visual.
+        // `extractPromotedVisual` only fires on the legacy in-result
+        // `ui_type`/`ui_display` marker; MCP Apps deliver UI via a separate
+        // `ui_resource` SSE event that arrives *after* `tool_result` and
+        // their tool result content carries no inline marker. Reading the
+        // signal here keeps `displayBlocks` reactive to a late-arriving
+        // `ui_resource` — the computed re-runs when McpAppStateService
+        // updates and the tool gets promoted retroactively (vs. staying
+        // folded into the group forever).
+        const hasMcpAppResource = this.mcpAppState.has(toolUse.toolUseId);
 
-        if (promotedVisual) {
-          // Promoted visuals break the tool group and render separately
+        if (promotedVisual || hasMcpAppResource) {
+          // Promoted visuals and MCP Apps both need their own first-class
+          // sibling block (the iframe is not "tool output", it's a primary
+          // UI surface); break the tool group here.
           flushToolGroup();
 
           result.push({
@@ -332,12 +366,22 @@ export class AssistantMessageComponent {
             toolUseId: toolUse.toolUseId
           });
 
-          result.push({
-            type: 'promoted_visual',
-            uiType: promotedVisual.uiType,
-            payload: promotedVisual.payload,
-            toolUseId: toolUse.toolUseId
-          });
+          if (promotedVisual) {
+            result.push({
+              type: 'promoted_visual',
+              uiType: promotedVisual.uiType,
+              payload: promotedVisual.payload,
+              toolUseId: toolUse.toolUseId
+            });
+          }
+
+          if (hasMcpAppResource) {
+            result.push({
+              type: 'mcp_app_frame',
+              toolUseId: toolUse.toolUseId,
+              mcpResult: this.toResultData(toolUse),
+            });
+          }
         } else {
           // Accumulate into the current tool group. A tool_use with no result
           // on a message that has a pending OAuth interrupt is the row that
@@ -355,6 +399,7 @@ export class AssistantMessageComponent {
             input: toolUse.input || {},
             result: toolUse.result,
             status,
+            streamingContent: toolUse.streamingContent,
           });
         }
         continue;
@@ -374,6 +419,16 @@ export class AssistantMessageComponent {
     return result;
   });
 
+  /**
+   * Reshape a tool-use's `result` into the renderer's `ToolResultData`
+   * contract. Until the `tool_result` event arrives we pass an empty
+   * success stub — the renderer holds the iframe until the result comes
+   * in (and re-pushes it via the `refreshToolResult` effect).
+   */
+  private toResultData(toolUse: ToolUseData): ToolResultData {
+    return toolUse.result ?? { content: [], status: 'success' };
+  }
+
   /**
    * Extract promoted visual data from a tool use result.
    * Returns null if not a promoted visual (no ui_type or ui_display !== 'inline').
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/code-block-clipboard-button.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/code-block-clipboard-button.component.ts
new file mode 100644
index 00000000..01405d4c
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/code-block-clipboard-button.component.ts
@@ -0,0 +1,59 @@
+import { ChangeDetectionStrategy, Component, OnDestroy, signal } from '@angular/core';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import { heroSquare2Stack, heroCheck } from '@ng-icons/heroicons/outline';
+import { TooltipDirective } from '../../../../components/tooltip';
+
+/**
+ * Custom button rendered by ngx-markdown for the copy-to-clipboard toolbar
+ * that wraps each `<pre>` code block. ngx-markdown attaches ClipboardJS to
+ * the rendered button's root node — this component only owns the visual
+ * "Copied" feedback state.
+ */
+@Component({
+  selector: 'app-code-block-clipboard-button',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [NgIcon, TooltipDirective],
+  providers: [provideIcons({ heroSquare2Stack, heroCheck })],
+  template: `
+    <button
+      type="button"
+      class="inline-flex items-center justify-center rounded-md bg-gray-100/80 p-1.5 text-gray-500 backdrop-blur-sm transition-colors hover:bg-gray-200 hover:text-gray-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:bg-gray-800/80 dark:text-gray-400 dark:hover:bg-gray-700 dark:hover:text-gray-200"
+      [appTooltip]="copied() ? 'Copied' : 'Copy code'"
+      appTooltipPosition="top"
+      [attr.aria-label]="copied() ? 'Copied to clipboard' : 'Copy code'"
+      (click)="onClick()"
+    >
+      @if (copied()) {
+        <ng-icon name="heroCheck" class="size-4" aria-hidden="true" />
+      } @else {
+        <ng-icon name="heroSquare2Stack" class="size-4" aria-hidden="true" />
+      }
+    </button>
+  `,
+  styles: `
+    :host {
+      display: contents;
+    }
+  `,
+})
+export class CodeBlockClipboardButtonComponent implements OnDestroy {
+  protected copied = signal(false);
+  private resetTimeout: ReturnType<typeof setTimeout> | null = null;
+
+  protected onClick(): void {
+    this.copied.set(true);
+    if (this.resetTimeout) {
+      clearTimeout(this.resetTimeout);
+    }
+    this.resetTimeout = setTimeout(() => {
+      this.copied.set(false);
+      this.resetTimeout = null;
+    }, 2000);
+  }
+
+  ngOnDestroy(): void {
+    if (this.resetTimeout) {
+      clearTimeout(this.resetTimeout);
+    }
+  }
+}
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/file-attachment-badge.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/file-attachment-badge.component.ts
index 025bd930..603681f9 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/file-attachment-badge.component.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/file-attachment-badge.component.ts
@@ -1,4 +1,4 @@
-import { Component, ChangeDetectionStrategy, input, computed } from '@angular/core';
+import { Component, ChangeDetectionStrategy, computed, effect, inject, input, signal } from '@angular/core';
 import { NgIcon, provideIcons } from '@ng-icons/core';
 import {
   heroDocument,
@@ -6,125 +6,126 @@ import {
   heroTableCells,
   heroCodeBracket,
   heroPhoto,
+  heroArrowTopRightOnSquare,
 } from '@ng-icons/heroicons/outline';
-import { formatBytes } from '../../../../../services/file-upload';
+import { MarkdownComponent } from 'ngx-markdown';
+import { formatBytes, FileUploadService } from '../../../../../services/file-upload';
 import { FileAttachmentData } from '../../../../services/models/message.model';
+import { MarkdownPreviewModalComponent } from './markdown-preview-modal.component';
 
-/**
- * Check if MIME type is an image
- */
-function isImageMimeType(mimeType: string): boolean {
-  return mimeType.startsWith('image/');
+interface FileTypeStyle {
+  icon: string;
+  label: string;
+  /** Accent color used for the type chip and the icon */
+  accent_text: string;
+  /** Header strip background tint (subtle) */
+  header_bg: string;
 }
 
-/**
- * File type to icon mapping
- */
-const FILE_TYPE_ICONS: Record<string, string> = {
-  'application/pdf': 'heroDocument',
-  'application/vnd.openxmlformats-officedocument.wordprocessingml.document': 'heroDocumentText',
-  'text/plain': 'heroDocumentText',
-  'text/html': 'heroCodeBracket',
-  'text/csv': 'heroTableCells',
-  'application/vnd.ms-excel': 'heroTableCells',
-  'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': 'heroTableCells',
-  'text/markdown': 'heroDocumentText',
-  'image/png': 'heroPhoto',
-  'image/jpeg': 'heroPhoto',
-  'image/gif': 'heroPhoto',
-  'image/webp': 'heroPhoto',
+const DEFAULT_STYLE: FileTypeStyle = {
+  icon: 'heroDocument',
+  label: 'FILE',
+  accent_text: 'text-gray-600 dark:text-gray-300',
+  header_bg: 'bg-gray-50 dark:bg-gray-700/50',
 };
 
-/**
- * File type to color mapping for icon container
- */
-const FILE_TYPE_COLORS: Record<string, { bg: string; text: string; border: string }> = {
+const FILE_TYPE_STYLES: Record<string, FileTypeStyle> = {
   'application/pdf': {
-    bg: 'bg-rose-100 dark:bg-rose-900/60',
-    text: 'text-rose-600 dark:text-rose-300',
-    border: 'border-rose-300 dark:border-rose-700'
+    icon: 'heroDocument',
+    label: 'PDF',
+    accent_text: 'text-rose-600 dark:text-rose-300',
+    header_bg: 'bg-rose-50 dark:bg-rose-950/40',
   },
   'application/vnd.openxmlformats-officedocument.wordprocessingml.document': {
-    bg: 'bg-blue-100 dark:bg-blue-900/60',
-    text: 'text-blue-600 dark:text-blue-300',
-    border: 'border-blue-300 dark:border-blue-700'
+    icon: 'heroDocumentText',
+    label: 'DOCX',
+    accent_text: 'text-blue-600 dark:text-blue-300',
+    header_bg: 'bg-blue-50 dark:bg-blue-950/40',
   },
   'text/plain': {
-    bg: 'bg-gray-100 dark:bg-gray-600',
-    text: 'text-gray-600 dark:text-gray-200',
-    border: 'border-gray-300 dark:border-gray-500'
+    icon: 'heroDocumentText',
+    label: 'TXT',
+    accent_text: 'text-gray-600 dark:text-gray-300',
+    header_bg: 'bg-gray-50 dark:bg-gray-700/50',
   },
   'text/html': {
-    bg: 'bg-orange-100 dark:bg-orange-900/60',
-    text: 'text-orange-600 dark:text-orange-300',
-    border: 'border-orange-300 dark:border-orange-700'
+    icon: 'heroCodeBracket',
+    label: 'HTML',
+    accent_text: 'text-orange-600 dark:text-orange-300',
+    header_bg: 'bg-orange-50 dark:bg-orange-950/40',
   },
   'text/csv': {
-    bg: 'bg-green-100 dark:bg-green-900/60',
-    text: 'text-green-600 dark:text-green-300',
-    border: 'border-green-300 dark:border-green-700'
+    icon: 'heroTableCells',
+    label: 'CSV',
+    accent_text: 'text-green-600 dark:text-green-300',
+    header_bg: 'bg-green-50 dark:bg-green-950/40',
   },
   'application/vnd.ms-excel': {
-    bg: 'bg-green-100 dark:bg-green-900/60',
-    text: 'text-green-600 dark:text-green-300',
-    border: 'border-green-300 dark:border-green-700'
+    icon: 'heroTableCells',
+    label: 'XLS',
+    accent_text: 'text-green-600 dark:text-green-300',
+    header_bg: 'bg-green-50 dark:bg-green-950/40',
   },
   'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': {
-    bg: 'bg-green-100 dark:bg-green-900/60',
-    text: 'text-green-600 dark:text-green-300',
-    border: 'border-green-300 dark:border-green-700'
+    icon: 'heroTableCells',
+    label: 'XLSX',
+    accent_text: 'text-green-600 dark:text-green-300',
+    header_bg: 'bg-green-50 dark:bg-green-950/40',
   },
   'text/markdown': {
-    bg: 'bg-purple-100 dark:bg-purple-900/60',
-    text: 'text-purple-600 dark:text-purple-300',
-    border: 'border-purple-300 dark:border-purple-700'
+    icon: 'heroDocumentText',
+    label: 'MD',
+    accent_text: 'text-purple-600 dark:text-purple-300',
+    header_bg: 'bg-purple-50 dark:bg-purple-950/40',
   },
   'image/png': {
-    bg: 'bg-indigo-100 dark:bg-indigo-900/60',
-    text: 'text-indigo-600 dark:text-indigo-300',
-    border: 'border-indigo-300 dark:border-indigo-700'
+    icon: 'heroPhoto',
+    label: 'PNG',
+    accent_text: 'text-indigo-600 dark:text-indigo-300',
+    header_bg: 'bg-indigo-50 dark:bg-indigo-950/40',
   },
   'image/jpeg': {
-    bg: 'bg-indigo-100 dark:bg-indigo-900/60',
-    text: 'text-indigo-600 dark:text-indigo-300',
-    border: 'border-indigo-300 dark:border-indigo-700'
+    icon: 'heroPhoto',
+    label: 'JPG',
+    accent_text: 'text-indigo-600 dark:text-indigo-300',
+    header_bg: 'bg-indigo-50 dark:bg-indigo-950/40',
   },
   'image/gif': {
-    bg: 'bg-indigo-100 dark:bg-indigo-900/60',
-    text: 'text-indigo-600 dark:text-indigo-300',
-    border: 'border-indigo-300 dark:border-indigo-700'
+    icon: 'heroPhoto',
+    label: 'GIF',
+    accent_text: 'text-indigo-600 dark:text-indigo-300',
+    header_bg: 'bg-indigo-50 dark:bg-indigo-950/40',
   },
   'image/webp': {
-    bg: 'bg-indigo-100 dark:bg-indigo-900/60',
-    text: 'text-indigo-600 dark:text-indigo-300',
-    border: 'border-indigo-300 dark:border-indigo-700'
+    icon: 'heroPhoto',
+    label: 'WEBP',
+    accent_text: 'text-indigo-600 dark:text-indigo-300',
+    header_bg: 'bg-indigo-50 dark:bg-indigo-950/40',
   },
 };
 
-const DEFAULT_COLORS = {
-  bg: 'bg-gray-100 dark:bg-gray-600',
-  text: 'text-gray-600 dark:text-gray-200',
-  border: 'border-gray-300 dark:border-gray-500'
-};
+const TEXT_PREVIEW_MIMES = new Set(['text/plain', 'text/markdown', 'text/csv', 'text/html']);
+
+/** MIME types where the backend can produce a real first-page thumbnail. */
+const THUMBNAIL_PREVIEW_MIMES = new Set(['application/pdf']);
+
+/** Skeleton "lines of text" widths (percent), tuned to look like a paragraph. */
+const SKELETON_LINE_WIDTHS = [92, 78, 88, 64, 95, 70, 84, 58];
 
 /**
- * Compact file attachment badge for displaying in user messages.
+ * Document-style preview card for a non-image file attachment.
  *
- * This is a read-only display component (no remove/retry actions)
- * used for showing files that were attached to historical messages.
- * Styled consistently with the FileCardComponent.
+ * Renders an iMessage-inspired "paper" mockup: a tinted header strip with the
+ * type chip and accent icon, a page area showing a first-page thumbnail (PDF),
+ * a text excerpt (txt/md/csv/html), or skeleton lines (other types, or while
+ * loading), a folded top-right corner, and a footer with filename + size.
  *
- * @example
- * ```html
- * <app-file-attachment-badge
- *   [attachment]="fileAttachment"
- * />
- * ```
+ * Clicking opens the file in a new tab via a short-lived presigned URL; Markdown files open in an in-app preview modal instead.
  */
 @Component({
   selector: 'app-file-attachment-badge',
   changeDetection: ChangeDetectionStrategy.OnPush,
-  imports: [NgIcon],
+  imports: [NgIcon, MarkdownComponent, MarkdownPreviewModalComponent],
   providers: [
     provideIcons({
       heroDocument,
@@ -132,25 +133,167 @@ const DEFAULT_COLORS = {
       heroTableCells,
       heroCodeBracket,
       heroPhoto,
-    })
+      heroArrowTopRightOnSquare,
+    }),
   ],
-  host: {
-    'class': 'contents'
-  },
+  host: { class: 'contents' },
+  styles: `
+    .corner-fold {
+      width: 18px;
+      height: 18px;
+      background: linear-gradient(225deg, var(--corner-bg, #f3f4f6) 50%, transparent 50%);
+      box-shadow: -1px 1px 1px rgb(0 0 0 / 0.05);
+    }
+    :host-context(.dark) .corner-fold {
+      --corner-bg: #374151;
+    }
+
+    /* Compact markdown styling for the small in-card preview. The card body
+       is only ~128px tall so we shrink everything aggressively and strip the
+       margins that the global .message-block prose styles add. */
+    .md-card-preview {
+      font-size: 9px;
+      line-height: 1.45;
+      color: rgb(55 65 81);
+    }
+    :host-context(.dark) .md-card-preview {
+      color: rgb(209 213 219);
+    }
+    .md-card-preview :is(h1, h2, h3, h4, h5, h6) {
+      font-weight: 700;
+      line-height: 1.25;
+      margin: 0 0 2px;
+      color: rgb(17 24 39);
+    }
+    :host-context(.dark) .md-card-preview :is(h1, h2, h3, h4, h5, h6) {
+      color: rgb(243 244 246);
+    }
+    .md-card-preview h1 { font-size: 12px; }
+    .md-card-preview h2 { font-size: 11px; }
+    .md-card-preview h3,
+    .md-card-preview h4,
+    .md-card-preview h5,
+    .md-card-preview h6 { font-size: 10px; }
+    .md-card-preview p { margin: 0 0 4px; }
+    .md-card-preview ul,
+    .md-card-preview ol {
+      margin: 0 0 4px;
+      padding-left: 14px;
+    }
+    .md-card-preview li { margin: 0 0 1px; }
+    .md-card-preview code {
+      font-size: 8.5px;
+      background: rgb(243 244 246);
+      padding: 0 2px;
+      border-radius: 2px;
+      font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, monospace;
+    }
+    .md-card-preview pre {
+      font-size: 8.5px;
+      background: rgb(243 244 246);
+      padding: 4px;
+      border-radius: 4px;
+      overflow: hidden;
+      margin: 0 0 4px;
+    }
+    :host-context(.dark) .md-card-preview code,
+    :host-context(.dark) .md-card-preview pre {
+      background: rgb(31 41 55);
+    }
+    .md-card-preview a {
+      color: rgb(99 102 241);
+      text-decoration: underline;
+    }
+    .md-card-preview strong { font-weight: 600; }
+    .md-card-preview blockquote {
+      border-left: 2px solid rgb(209 213 219);
+      padding-left: 6px;
+      margin: 0 0 4px;
+      color: rgb(107 114 128);
+    }
+  `,
   template: `
-    <div
-      class="flex w-48 shrink-0 items-center gap-2 rounded-lg border border-gray-200 bg-gray-50 px-3 py-2 dark:border-gray-600 dark:bg-gray-800"
+    <button
+      type="button"
+      (click)="openFile()"
+      class="group flex w-60 shrink-0 flex-col overflow-hidden rounded-xl border border-gray-200 bg-white text-left shadow-sm transition-all hover:-translate-y-0.5 hover:shadow-md focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-primary-500 dark:border-gray-700 dark:bg-gray-800"
+      [attr.aria-label]="'Open ' + attachment().filename"
     >
-      <!-- File icon -->
+      <!-- Header strip -->
       <div
-        class="flex size-8 shrink-0 items-center justify-center rounded-md border"
-        [class]="iconContainerClass()"
+        class="flex items-center justify-between border-b border-gray-200 px-3 py-2 dark:border-gray-700"
+        [class]="style().header_bg"
       >
-        <ng-icon [name]="iconName()" class="size-5" [class]="iconClass()" aria-hidden="true" />
+        <div class="flex items-center gap-2">
+          <ng-icon
+            [name]="style().icon"
+            class="size-4"
+            [class]="style().accent_text"
+            aria-hidden="true"
+          />
+          <span
+            class="text-[10px] font-bold tracking-wider"
+            [class]="style().accent_text"
+          >
+            {{ style().label }}
+          </span>
+        </div>
+        <ng-icon
+          name="heroArrowTopRightOnSquare"
+          class="size-4 text-gray-400 opacity-0 transition-opacity group-hover:opacity-100"
+          aria-hidden="true"
+        />
+      </div>
+
+      <!-- Paper page area -->
+      <div class="relative h-32 overflow-hidden bg-white dark:bg-gray-900/40">
+        <!-- Folded corner -->
+        <div
+          class="corner-fold absolute right-0 top-0"
+          aria-hidden="true"
+        ></div>
+
+        @if (thumbnailUrl(); as url) {
+          <img
+            [src]="url"
+            [alt]="'First page of ' + attachment().filename"
+            class="size-full object-cover object-top"
+            loading="lazy"
+            decoding="async"
+          />
+        } @else if (snippetState() === 'ready' && hasSnippet()) {
+          @if (isMarkdown()) {
+            <div class="md-card-preview h-full overflow-hidden px-3 py-2">
+              <markdown [data]="truncatedSnippet()" />
+            </div>
+          } @else {
+            <pre
+              class="m-0 max-h-full overflow-hidden whitespace-pre-wrap break-words px-3 py-2 font-mono text-[9px] leading-snug text-gray-700 dark:text-gray-300"
+            >{{ truncatedSnippet() }}</pre>
+          }
+        } @else {
+          <div class="space-y-1.5 px-3 py-2.5" aria-hidden="true">
+            @for (width of skeletonWidths; track $index) {
+              <div
+                class="h-1.5 rounded-full bg-gray-200 dark:bg-gray-700"
+                [style.width.%]="width"
+              ></div>
+            }
+          </div>
+        }
+
+        <!-- Bottom fade for long text. Suppressed when a thumbnail is shown
+             so the rendered page edge stays crisp. -->
+        @if (!thumbnailUrl()) {
+          <div
+            class="pointer-events-none absolute inset-x-0 bottom-0 h-6 bg-gradient-to-t from-white to-transparent dark:from-gray-900/40"
+            aria-hidden="true"
+          ></div>
+        }
       </div>
 
-      <!-- File info -->
-      <div class="min-w-0 flex-1">
+      <!-- Footer -->
+      <div class="min-w-0 border-t border-gray-100 px-3 py-2 dark:border-gray-700/60">
         <p class="truncate text-sm font-medium text-gray-900 dark:text-white">
           {{ attachment().filename }}
         </p>
@@ -158,37 +301,87 @@ const DEFAULT_COLORS = {
           {{ formattedSize() }}
         </p>
       </div>
-    </div>
+    </button>
+
+    @if (markdownModalOpen()) {
+      <app-markdown-preview-modal
+        [uploadId]="attachment().uploadId"
+        [filename]="attachment().filename"
+        (close)="markdownModalOpen.set(false)"
+      />
+    }
   `,
 })
 export class FileAttachmentBadgeComponent {
-  /** File attachment data */
   readonly attachment = input.required<FileAttachmentData>();
 
-  protected readonly formattedSize = computed(() =>
-    formatBytes(this.attachment().sizeBytes)
-  );
+  private readonly fileUploadService = inject(FileUploadService);
+
+  protected readonly skeletonWidths = SKELETON_LINE_WIDTHS;
+
+  protected readonly snippetState = signal<'idle' | 'loading' | 'ready' | 'error'>('idle');
+  private readonly snippet = signal<string>('');
+  protected readonly markdownModalOpen = signal(false);
+
+  /** Presigned URL for a real first-page thumbnail (PDFs today). null on
+      unsupported types or render failure — caller falls back to skeleton. */
+  protected readonly thumbnailUrl = signal<string | null>(null);
 
-  protected readonly isImage = computed(() =>
-    isImageMimeType(this.attachment().mimeType)
+  protected readonly formattedSize = computed(() => formatBytes(this.attachment().sizeBytes));
+
+  protected readonly style = computed<FileTypeStyle>(
+    () => FILE_TYPE_STYLES[this.attachment().mimeType] ?? DEFAULT_STYLE,
   );
 
-  protected readonly iconName = computed(() => {
-    const mime = this.attachment().mimeType;
-    return FILE_TYPE_ICONS[mime] ?? 'heroDocument';
-  });
+  protected readonly hasSnippet = computed(() => this.snippet().trim().length > 0);
 
-  protected readonly colors = computed(() => {
-    const mime = this.attachment().mimeType;
-    return FILE_TYPE_COLORS[mime] ?? DEFAULT_COLORS;
-  });
+  protected readonly isMarkdown = computed(() => this.attachment().mimeType === 'text/markdown');
 
-  protected readonly iconContainerClass = computed(() => {
-    const colors = this.colors();
-    return `${colors.bg} ${colors.border}`;
+  /** Cap chars so very long unbroken lines don't blow out the card. */
+  protected readonly truncatedSnippet = computed(() => {
+    const raw = this.snippet();
+    return raw.length > 600 ? raw.slice(0, 600) : raw;
   });
 
-  protected readonly iconClass = computed(() => {
-    return this.colors().text;
-  });
+  constructor() {
+    effect(() => {
+      const att = this.attachment();
+      if (TEXT_PREVIEW_MIMES.has(att.mimeType)) {
+        this.loadSnippet(att.uploadId);
+      }
+      if (THUMBNAIL_PREVIEW_MIMES.has(att.mimeType)) {
+        this.loadThumbnail(att.uploadId);
+      }
+    });
+  }
+
+  private async loadSnippet(uploadId: string): Promise<void> {
+    this.snippetState.set('loading');
+    try {
+      const response = await this.fileUploadService.getTextSnippet(uploadId);
+      this.snippet.set(response.snippet);
+      this.snippetState.set('ready');
+    } catch {
+      this.snippetState.set('error');
+    }
+  }
+
+  private async loadThumbnail(uploadId: string): Promise<void> {
+    const result = await this.fileUploadService.getThumbnail(uploadId);
+    this.thumbnailUrl.set(result.status === 'ready' ? result.response.url : null);
+  }
+
+  protected async openFile(): Promise<void> {
+    if (this.isMarkdown()) {
+      this.markdownModalOpen.set(true);
+      return;
+    }
+    try {
+      const response = await this.fileUploadService.getPreviewUrl(this.attachment().uploadId);
+      window.open(response.url, '_blank', 'noopener,noreferrer');
+    } catch {
+      // Silent failure — the broken link state is rare and the message stream
+      // surfaces backend errors separately.
+    }
+  }
 }
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/image-attachment-group.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/image-attachment-group.component.ts
new file mode 100644
index 00000000..e7e90474
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/image-attachment-group.component.ts
@@ -0,0 +1,214 @@
+import {
+  ChangeDetectionStrategy,
+  Component,
+  computed,
+  inject,
+  input,
+  signal,
+} from '@angular/core';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import { heroPhoto, heroExclamationTriangle } from '@ng-icons/heroicons/outline';
+import { FileAttachmentData } from '../../../../services/models/message.model';
+import { FileUploadService } from '../../../../../services/file-upload';
+import { ImageLightboxComponent, LightboxImage } from './image-lightbox.component';
+
+interface PreviewState {
+  url: string | null;
+  status: 'idle' | 'loading' | 'ready' | 'error';
+}
+
+/**
+ * iMessage-style group renderer for one or more image attachments.
+ *
+ * Layouts:
+ * - 1 image: large 4:3 bubble (max 280px tall), cropped to fill
+ * - 2 images: side-by-side equal columns
+ * - 3 images: 1 large tile on the left + 2 stacked in a column on the right
+ * - 4 images: 2x2 grid
+ * - 5+ images: 2x2 grid with "+N" overlay on the last tile
+ *
+ * Each image lazy-fetches a presigned GET URL on first render. Clicking any
+ * tile opens a full-screen lightbox with arrow-key navigation.
+ */
+@Component({
+  selector: 'app-image-attachment-group',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [NgIcon, ImageLightboxComponent],
+  providers: [provideIcons({ heroPhoto, heroExclamationTriangle })],
+  host: { class: 'contents' },
+  template: `
+    <div
+      class="overflow-hidden rounded-2xl"
+      [class]="layoutClass()"
+      [style.max-width.px]="maxWidthPx()"
+    >
+      @for (item of visibleImages(); track item.attachment.uploadId; let i = $index) {
+        <button
+          type="button"
+          class="group relative block overflow-hidden bg-gray-100 dark:bg-gray-800 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-primary-500"
+          [class]="tileClass(i)"
+          (click)="openLightbox(i)"
+          [attr.aria-label]="'Open ' + item.attachment.filename"
+        >
+          @if (item.state.status === 'ready' && item.state.url) {
+            <img
+              [src]="item.state.url"
+              [alt]="item.attachment.filename"
+              class="size-full object-cover transition-transform duration-200 group-hover:scale-[1.02]"
+              loading="lazy"
+              decoding="async"
+            />
+          } @else if (item.state.status === 'error') {
+            <div
+              class="flex size-full flex-col items-center justify-center gap-1 bg-red-50 p-2 text-red-500 dark:bg-red-950/30 dark:text-red-400"
+            >
+              <ng-icon name="heroExclamationTriangle" class="size-6" aria-hidden="true" />
+              <span class="px-2 text-center text-xs">Preview unavailable</span>
+            </div>
+          } @else {
+            <div class="flex size-full items-center justify-center">
+              <div
+                class="size-8 animate-pulse rounded-full bg-gray-300 dark:bg-gray-600"
+                aria-hidden="true"
+              ></div>
+              <span class="sr-only">Loading {{ item.attachment.filename }}</span>
+            </div>
+          }
+
+          @if (showOverflowOnLast() && $last) {
+            <div
+              class="pointer-events-none absolute inset-0 flex items-center justify-center bg-black/50 text-2xl font-semibold text-white"
+            >
+              +{{ overflowCount() }}
+            </div>
+          }
+        </button>
+      }
+    </div>
+
+    @if (lightboxOpenAt() !== null) {
+      <app-image-lightbox
+        [images]="lightboxImages()"
+        [startIndex]="lightboxOpenAt() ?? 0"
+        (close)="closeLightbox()"
+      />
+    }
+  `,
+})
+export class ImageAttachmentGroupComponent {
+  readonly attachments = input.required<FileAttachmentData[]>();
+
+  private readonly fileUploadService = inject(FileUploadService);
+
+  /** Map of uploadId -> preview state. Signal updates trigger a re-render. */
+  protected readonly previews = signal<Map<string, PreviewState>>(new Map());
+
+  protected readonly lightboxOpenAt = signal<number | null>(null);
+
+  /** All attachments are eligible for the lightbox; we cap visible tiles at 4. */
+  private readonly maxVisible = 4;
+
+  protected readonly visibleImages = computed(() => {
+    const all = this.attachments();
+    const visible = all.slice(0, this.maxVisible);
+    const map = this.previews();
+    return visible.map((attachment) => ({
+      attachment,
+      state: map.get(attachment.uploadId) ?? { url: null, status: 'idle' as const },
+    }));
+  });
+
+  protected readonly overflowCount = computed(() =>
+    Math.max(0, this.attachments().length - this.maxVisible),
+  );
+
+  protected readonly showOverflowOnLast = computed(() => this.overflowCount() > 0);
+
+  protected readonly lightboxImages = computed<LightboxImage[]>(() => {
+    const map = this.previews();
+    return this.attachments().map((a) => ({
+      url: map.get(a.uploadId)?.url ?? '',
+      filename: a.filename,
+    }));
+  });
+
+  protected readonly maxWidthPx = computed(() => {
+    const count = Math.min(this.attachments().length, this.maxVisible);
+    if (count === 1) return 320;
+    return 360;
+  });
+
+  protected readonly layoutClass = computed(() => {
+    const count = Math.min(this.attachments().length, this.maxVisible);
+    if (count === 1) return 'block';
+    if (count === 2) return 'grid grid-cols-2 gap-0.5';
+    if (count === 3) return 'grid grid-cols-2 grid-rows-2 gap-0.5';
+    return 'grid grid-cols-2 grid-rows-2 gap-0.5';
+  });
+
+  constructor() {
+    queueMicrotask(() => this.loadPreviews());
+  }
+
+  protected tileClass(index: number): string {
+    const count = Math.min(this.attachments().length, this.maxVisible);
+    if (count === 1) {
+      return 'aspect-[4/3] max-h-[280px] w-full';
+    }
+    if (count === 2) {
+      return 'aspect-square';
+    }
+    if (count === 3) {
+      // First tile spans 2 rows on left; tiles 2 and 3 stack on right
+      if (index === 0) return 'row-span-2 aspect-[3/4]';
+      return 'aspect-square';
+    }
+    // 4+
+    return 'aspect-square';
+  }
+
+  protected openLightbox(visibleIndex: number): void {
+    const map = this.previews();
+    const attachment = this.attachments()[visibleIndex];
+    if (!attachment) return;
+    const state = map.get(attachment.uploadId);
+    if (state?.status !== 'ready') return;
+    this.lightboxOpenAt.set(visibleIndex);
+  }
+
+  protected closeLightbox(): void {
+    this.lightboxOpenAt.set(null);
+  }
+
+  private async loadPreviews(): Promise<void> {
+    const all = this.attachments();
+    const current = this.previews();
+    const next = new Map(current);
+    for (const a of all) {
+      if (!next.has(a.uploadId)) {
+        next.set(a.uploadId, { url: null, status: 'loading' });
+      }
+    }
+    this.previews.set(next);
+
+    await Promise.all(
+      all.map(async (a) => {
+        if (current.get(a.uploadId)?.status === 'ready') return;
+        try {
+          const response = await this.fileUploadService.getPreviewUrl(a.uploadId);
+          this.updatePreview(a.uploadId, { url: response.url, status: 'ready' });
+        } catch {
+          this.updatePreview(a.uploadId, { url: null, status: 'error' });
+        }
+      }),
+    );
+  }
+
+  private updatePreview(uploadId: string, state: PreviewState): void {
+    this.previews.update((m) => {
+      const next = new Map(m);
+      next.set(uploadId, state);
+      return next;
+    });
+  }
+}
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/image-lightbox.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/image-lightbox.component.ts
new file mode 100644
index 00000000..89dda414
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/image-lightbox.component.ts
@@ -0,0 +1,138 @@
+import {
+  ChangeDetectionStrategy,
+  Component,
+  computed,
+  input,
+  output,
+  signal,
+} from '@angular/core';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import { heroXMark, heroArrowLeft, heroArrowRight } from '@ng-icons/heroicons/outline';
+
+export interface LightboxImage {
+  url: string;
+  filename: string;
+}
+
+/**
+ * Full-screen image lightbox with keyboard navigation.
+ *
+ * Renders a fixed overlay above the page when an image is selected.
+ * Supports left/right arrow keys to step through a group of images and
+ * Escape to dismiss.
+ */
+@Component({
+  selector: 'app-image-lightbox',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [NgIcon],
+  providers: [provideIcons({ heroXMark, heroArrowLeft, heroArrowRight })],
+  host: {
+    '(document:keydown)': 'onKeydown($event)',
+  },
+  template: `
+    <div
+      class="fixed inset-0 z-[9999] flex items-center justify-center bg-black/85 p-4 backdrop-blur-sm"
+      role="dialog"
+      aria-modal="true"
+      [attr.aria-label]="'Image preview: ' + currentImage().filename"
+      (click)="onBackdropClick($event)"
+    >
+      <button
+        type="button"
+        class="absolute right-4 top-4 flex size-10 items-center justify-center rounded-full bg-white/10 text-white transition-colors hover:bg-white/20 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-white"
+        (click)="close.emit()"
+        aria-label="Close preview"
+      >
+        <ng-icon name="heroXMark" class="size-6" aria-hidden="true" />
+      </button>
+
+      @if (hasMultiple()) {
+        <button
+          type="button"
+          class="absolute left-4 top-1/2 flex size-12 -translate-y-1/2 items-center justify-center rounded-full bg-white/10 text-white transition-colors hover:bg-white/20 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-white"
+          (click)="prev($event)"
+          aria-label="Previous image"
+        >
+          <ng-icon name="heroArrowLeft" class="size-6" aria-hidden="true" />
+        </button>
+        <button
+          type="button"
+          class="absolute right-4 top-1/2 flex size-12 -translate-y-1/2 items-center justify-center rounded-full bg-white/10 text-white transition-colors hover:bg-white/20 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-white"
+          (click)="next($event)"
+          aria-label="Next image"
+        >
+          <ng-icon name="heroArrowRight" class="size-6" aria-hidden="true" />
+        </button>
+      }
+
+      <figure class="flex max-h-full max-w-full flex-col items-center gap-3">
+        <img
+          [src]="currentImage().url"
+          [alt]="currentImage().filename"
+          class="max-h-[85vh] max-w-full rounded-lg object-contain shadow-2xl"
+          (click)="$event.stopPropagation()"
+        />
+        <figcaption class="max-w-full truncate text-sm text-white/80">
+          {{ currentImage().filename }}
+          @if (hasMultiple()) {
+            <span class="ml-2 text-white/50">{{ activeIndex() + 1 }} / {{ images().length }}</span>
+          }
+        </figcaption>
+      </figure>
+    </div>
+  `,
+})
+export class ImageLightboxComponent {
+  readonly images = input.required<LightboxImage[]>();
+  readonly startIndex = input<number>(0);
+  readonly close = output<void>();
+
+  protected readonly activeIndex = signal(0);
+
+  protected readonly currentImage = computed(() => {
+    const list = this.images();
+    const idx = Math.min(Math.max(this.activeIndex(), 0), list.length - 1);
+    return list[idx];
+  });
+
+  protected readonly hasMultiple = computed(() => this.images().length > 1);
+
+  constructor() {
+    queueMicrotask(() => this.activeIndex.set(this.startIndex()));
+  }
+
+  protected onKeydown(event: KeyboardEvent): void {
+    if (event.key === 'Escape') {
+      event.preventDefault();
+      this.close.emit();
+    } else if (event.key === 'ArrowLeft' && this.hasMultiple()) {
+      event.preventDefault();
+      this.step(-1);
+    } else if (event.key === 'ArrowRight' && this.hasMultiple()) {
+      event.preventDefault();
+      this.step(1);
+    }
+  }
+
+  protected onBackdropClick(event: MouseEvent): void {
+    if (event.target === event.currentTarget) {
+      this.close.emit();
+    }
+  }
+
+  protected prev(event: Event): void {
+    event.stopPropagation();
+    this.step(-1);
+  }
+
+  protected next(event: Event): void {
+    event.stopPropagation();
+    this.step(1);
+  }
+
+  private step(delta: number): void {
+    const len = this.images().length;
+    if (len === 0) return;
+    this.activeIndex.update((i) => (i + delta + len) % len);
+  }
+}
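Review note on the lightbox index math: `step` wraps around the list and `currentImage` clamps the index. The sketch below restates both as pure functions so the edge cases are easy to check. `stepIndex` and `clampIndex` are hypothetical helper names, not code from this diff.

```typescript
// Wraps an index around a list of `len` items, mirroring `(i + delta + len) % len`.
// Assumes |delta| <= len, which holds for the +1 / -1 steps the lightbox uses.
function stepIndex(current: number, delta: number, len: number): number {
  if (len === 0) return current;
  return (current + delta + len) % len;
}

// Clamps an index into [0, len - 1]. For an empty list this yields -1.
function clampIndex(index: number, len: number): number {
  return Math.min(Math.max(index, 0), len - 1);
}
```

One thing the clamp makes visible: with an empty `images` array the clamped index is `-1`, so `currentImage()` would be `undefined` and the template's `.filename` read would fail. The component presumably is only rendered with at least one image, but that is an assumption worth confirming at the call site.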
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/index.ts b/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/index.ts
index 52165759..a0d9b1c1 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/index.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/index.ts
@@ -1 +1,5 @@
 export { FileAttachmentBadgeComponent } from './file-attachment-badge.component';
+export { ImageAttachmentGroupComponent } from './image-attachment-group.component';
+export { ImageLightboxComponent } from './image-lightbox.component';
+export type { LightboxImage } from './image-lightbox.component';
+export { MarkdownPreviewModalComponent } from './markdown-preview-modal.component';
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/markdown-preview-modal.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/markdown-preview-modal.component.ts
new file mode 100644
index 00000000..6302b825
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/file-attachment/markdown-preview-modal.component.ts
@@ -0,0 +1,183 @@
+import {
+  ChangeDetectionStrategy,
+  Component,
+  computed,
+  effect,
+  inject,
+  input,
+  output,
+  signal,
+} from '@angular/core';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import { heroXMark, heroArrowTopRightOnSquare } from '@ng-icons/heroicons/outline';
+import { MarkdownComponent } from 'ngx-markdown';
+import { FileUploadService } from '../../../../../services/file-upload';
+
+/** Hard cap on preview size; compared against `text.length` (UTF-16 code units), so only approximately bytes. */
+const MAX_PREVIEW_BYTES = 1024 * 1024;
+
+/**
+ * Full-screen modal that fetches a markdown file via a short-lived presigned
+ * URL and renders it through ngx-markdown. Used in place of opening the raw
+ * source in a new tab when a user clicks a `.md` attachment card.
+ */
+@Component({
+  selector: 'app-markdown-preview-modal',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [NgIcon, MarkdownComponent],
+  providers: [provideIcons({ heroXMark, heroArrowTopRightOnSquare })],
+  host: {
+    '(document:keydown)': 'onKeydown($event)',
+  },
+  template: `
+    <div
+      class="fixed inset-0 z-[9999] flex items-center justify-center bg-black/70 p-4 backdrop-blur-sm"
+      role="dialog"
+      aria-modal="true"
+      [attr.aria-label]="'Markdown preview: ' + filename()"
+      (click)="onBackdropClick($event)"
+    >
+      <div
+        class="flex max-h-[90vh] w-full max-w-3xl flex-col overflow-hidden rounded-xl bg-white shadow-2xl dark:bg-gray-900"
+        (click)="$event.stopPropagation()"
+      >
+        <!-- Header -->
+        <div
+          class="flex items-center justify-between gap-3 border-b border-gray-200 px-5 py-3 dark:border-gray-700"
+        >
+          <h2
+            class="min-w-0 truncate text-sm font-semibold text-gray-900 dark:text-white"
+            [title]="filename()"
+          >
+            {{ filename() }}
+          </h2>
+          <div class="flex items-center gap-1">
+            @if (sourceUrl()) {
+              <a
+                [href]="sourceUrl()!"
+                target="_blank"
+                rel="noopener noreferrer"
+                class="flex size-9 items-center justify-center rounded-full text-gray-500 transition-colors hover:bg-gray-100 hover:text-gray-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-primary-500 dark:text-gray-400 dark:hover:bg-gray-800 dark:hover:text-gray-200"
+                aria-label="Open raw markdown in a new tab"
+              >
+                <ng-icon
+                  name="heroArrowTopRightOnSquare"
+                  class="size-5"
+                  aria-hidden="true"
+                />
+              </a>
+            }
+            <button
+              type="button"
+              class="flex size-9 items-center justify-center rounded-full text-gray-500 transition-colors hover:bg-gray-100 hover:text-gray-700 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-primary-500 dark:text-gray-400 dark:hover:bg-gray-800 dark:hover:text-gray-200"
+              (click)="close.emit()"
+              aria-label="Close preview"
+            >
+              <ng-icon name="heroXMark" class="size-5" aria-hidden="true" />
+            </button>
+          </div>
+        </div>
+
+        <!-- Body -->
+        <div class="message-block min-h-0 flex-1 overflow-auto px-6 py-5">
+          @switch (state()) {
+            @case ('loading') {
+              <div
+                class="flex h-full min-h-[200px] items-center justify-center text-sm text-gray-500 dark:text-gray-400"
+                role="status"
+              >
+                <span
+                  class="size-5 animate-spin rounded-full border-2 border-gray-300 border-t-primary-500"
+                  aria-hidden="true"
+                ></span>
+                <span class="ml-3">Loading preview…</span>
+              </div>
+            }
+            @case ('error') {
+              <div
+                class="flex h-full min-h-[200px] items-center justify-center text-sm text-red-600 dark:text-red-400"
+                role="alert"
+              >
+                Couldn't load the markdown preview.
+              </div>
+            }
+            @case ('ready') {
+              @if (truncated()) {
+                <p
+                  class="mb-4 rounded-md bg-amber-50 px-3 py-2 text-xs text-amber-800 dark:bg-amber-950/40 dark:text-amber-200"
+                >
+                  Showing the first {{ formattedLimit }} of this file. Open in a
+                  new tab for the full source.
+                </p>
+              }
+              <markdown [data]="content()" />
+            }
+          }
+        </div>
+      </div>
+    </div>
+  `,
+  styles: `
+    :host {
+      display: contents;
+    }
+  `,
+})
+export class MarkdownPreviewModalComponent {
+  readonly uploadId = input.required<string>();
+  readonly filename = input.required<string>();
+  readonly close = output<void>();
+
+  private readonly fileUploadService = inject(FileUploadService);
+
+  protected readonly state = signal<'loading' | 'ready' | 'error'>('loading');
+  protected readonly content = signal<string>('');
+  protected readonly sourceUrl = signal<string | null>(null);
+  protected readonly truncated = signal(false);
+
+  protected readonly formattedLimit = computed(() => {
+    const kb = MAX_PREVIEW_BYTES / 1024;
+    return kb >= 1024 ? `${kb / 1024} MB` : `${kb} KB`;
+  })();
+
+  constructor() {
+    effect(() => {
+      const id = this.uploadId();
+      if (id) this.load(id);
+    });
+  }
+
+  protected onKeydown(event: KeyboardEvent): void {
+    if (event.key === 'Escape') {
+      event.preventDefault();
+      this.close.emit();
+    }
+  }
+
+  protected onBackdropClick(event: MouseEvent): void {
+    if (event.target === event.currentTarget) {
+      this.close.emit();
+    }
+  }
+
+  private async load(uploadId: string): Promise<void> {
+    this.state.set('loading');
+    try {
+      const presigned = await this.fileUploadService.getPreviewUrl(uploadId);
+      this.sourceUrl.set(presigned.url);
+
+      const response = await fetch(presigned.url);
+      if (!response.ok) {
+        throw new Error(`Fetch failed: ${response.status}`);
+      }
+
+      const text = await response.text();
+      const isTruncated = text.length > MAX_PREVIEW_BYTES;
+      this.content.set(isTruncated ? text.slice(0, MAX_PREVIEW_BYTES) : text);
+      this.truncated.set(isTruncated);
+      this.state.set('ready');
+    } catch {
+      this.state.set('error');
+    }
+  }
+}
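Review note on the preview cap: `load` compares `text.length` with `MAX_PREVIEW_BYTES`, which counts UTF-16 code units rather than bytes, and `slice` can cut between the two halves of a surrogate pair. If a true byte budget is wanted, one possible approach is sketched below. `truncateToBytes` and `utf8Size` are hypothetical helpers, not part of this diff, and the sketch counts UTF-8 bytes by hand so it has no platform dependencies.

```typescript
// Number of UTF-8 bytes needed to encode one Unicode code point.
function utf8Size(codePoint: number): number {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}

// Keeps the longest prefix of `text` whose UTF-8 encoding fits in `maxBytes`,
// cutting only on code point boundaries.
function truncateToBytes(
  text: string,
  maxBytes: number,
): { text: string; truncated: boolean } {
  let bytes = 0;
  let i = 0;
  while (i < text.length) {
    const codePoint = text.codePointAt(i) as number;
    const size = utf8Size(codePoint);
    if (bytes + size > maxBytes) {
      return { text: text.slice(0, i), truncated: true };
    }
    bytes += size;
    // Code points above U+FFFF occupy two UTF-16 code units.
    i += codePoint > 0xffff ? 2 : 1;
  }
  return { text, truncated: false };
}
```

For mostly-ASCII markdown the existing character-based check is close enough; this only matters if the 1 MB limit is meant to be exact or if files with many non-ASCII characters are common.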
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/mcp-app-card/mcp-app-card.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/mcp-app-card/mcp-app-card.component.ts
new file mode 100644
index 00000000..e80cbeb4
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/mcp-app-card/mcp-app-card.component.ts
@@ -0,0 +1,103 @@
+import {
+  ChangeDetectionStrategy,
+  Component,
+  computed,
+  input,
+} from '@angular/core';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import {
+  heroExclamationTriangle,
+  heroPuzzlePiece,
+} from '@ng-icons/heroicons/outline';
+import type { McpAppCard } from '../../../../services/mcp-apps/mcp-app-card-state.service';
+
+/**
+ * Static historical card for an app-initiated tool call (MCP Apps PR #6,
+ * Option A). Rendered on reload from persisted provenance — read-only by
+ * design: the App iframe itself is not re-instantiated, so this records
+ * *what the embedded app ran on the user's behalf*, not an interactive
+ * surface. Live app-initiated calls render through the normal tool path
+ * (PR #5); this only appears after a refresh.
+ */
+@Component({
+  selector: 'app-mcp-app-card',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [NgIcon],
+  providers: [
+    provideIcons({ heroExclamationTriangle, heroPuzzlePiece }),
+  ],
+  host: { class: 'block' },
+  template: `
+    <div
+      class="flex max-w-xl flex-col gap-1.5 rounded-lg border border-gray-200/80 bg-white px-3 py-2 text-sm dark:border-white/10 dark:bg-slate-800/70"
+      role="group"
+      [attr.aria-label]="'App ran ' + card().toolName"
+    >
+      <div class="flex items-center gap-2">
+        <ng-icon
+          name="heroPuzzlePiece"
+          class="size-4 shrink-0 text-primary-600 dark:text-primary-300"
+          aria-hidden="true"
+        />
+        <span
+          class="text-[10px] font-semibold uppercase tracking-[0.08em] text-primary-600 dark:text-primary-300"
+        >
+          Ran by app
+        </span>
+        <span
+          class="truncate font-mono text-xs text-gray-900 dark:text-gray-100"
+        >
+          {{ card().toolName }}
+        </span>
+        @if (card().isError) {
+          <ng-icon
+            name="heroExclamationTriangle"
+            class="size-4 shrink-0 text-red-600 dark:text-red-400"
+            aria-label="errored"
+          />
+        }
+      </div>
+
+      @if (argsPreview(); as args) {
+        <pre
+          class="overflow-x-auto rounded bg-gray-50 px-2 py-1 text-[11px] leading-relaxed text-gray-700 dark:bg-slate-900 dark:text-gray-300"
+        >{{ args }}</pre>
+      }
+
+      @if (resultText(); as text) {
+        <p
+          class="whitespace-pre-wrap text-xs/5 text-gray-700 dark:text-gray-300"
+        >
+          {{ text }}
+        </p>
+      }
+    </div>
+  `,
+})
+export class McpAppCardComponent {
+  readonly card = input.required<McpAppCard>();
+
+  protected readonly argsPreview = computed<string | null>(() => {
+    const args = this.card().arguments;
+    if (!args || Object.keys(args).length === 0) return null;
+    let text: string;
+    try {
+      text = JSON.stringify(args);
+    } catch {
+      return null;
+    }
+    return text.length > 300 ? `${text.slice(0, 300)}…` : text;
+  });
+
+  protected readonly resultText = computed<string | null>(() => {
+    const blocks = this.card().content ?? [];
+    const parts: string[] = [];
+    for (const block of blocks) {
+      const text = (block as { text?: unknown }).text;
+      if (typeof text === 'string' && text) parts.push(text);
+    }
+    const joined = parts.join('\n').trim();
+    if (!joined) return null;
+    return joined.length > 500 ? `${joined.slice(0, 500)}…` : joined;
+  });
+}
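Review note on the two computed previews: both follow the same pattern of serialising, bailing out to `null` when there is nothing to show, and ellipsising past a fixed length. A framework-free sketch of that logic follows. `ContentBlock`, `ellipsize`, `previewArgs` and `joinTextBlocks` are hypothetical names; the real `McpAppCard` type lives in the service and may differ.

```typescript
// Hypothetical minimal block shape; only `text` is inspected.
type ContentBlock = { type?: string; text?: unknown };

// Cuts `text` to `max` UTF-16 code units and appends an ellipsis when it was longer.
function ellipsize(text: string, max: number): string {
  return text.length > max ? `${text.slice(0, max)}…` : text;
}

// Serialises tool arguments for display; null means "render nothing".
function previewArgs(args: Record<string, unknown> | null | undefined): string | null {
  if (!args || Object.keys(args).length === 0) return null;
  try {
    return ellipsize(JSON.stringify(args), 300);
  } catch {
    return null; // JSON.stringify throws on circular references
  }
}

// Joins the non-empty string `text` fields of the blocks, one per line.
function joinTextBlocks(blocks: readonly ContentBlock[]): string | null {
  const parts = blocks
    .map((block) => block.text)
    .filter((text): text is string => typeof text === 'string' && text.length > 0);
  const joined = parts.join('\n').trim();
  return joined ? ellipsize(joined, 500) : null;
}
```

Pulling the logic out like this would also make the 300 and 500 limits testable without mounting the component, though whether that is worth a separate file is a judgment call.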
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/mcp-app-consent-prompt/mcp-app-consent-prompt.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/mcp-app-consent-prompt/mcp-app-consent-prompt.component.ts
new file mode 100644
index 00000000..6b6c048b
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/mcp-app-consent-prompt/mcp-app-consent-prompt.component.ts
@@ -0,0 +1,227 @@
+import {
+  ChangeDetectionStrategy,
+  Component,
+  computed,
+  inject,
+  input,
+} from '@angular/core';
+import { NgIcon, provideIcons } from '@ng-icons/core';
+import {
+  heroArrowTopRightOnSquare,
+  heroCheck,
+  heroShieldCheck,
+  heroVideoCamera,
+  heroXMark,
+} from '@ng-icons/heroicons/outline';
+import {
+  McpAppConsentService,
+  PendingConsent,
+} from '../../../../services/mcp-apps/mcp-app-consent.service';
+
+/**
+ * Inline consent prompt for an App-initiated action (MCP Apps PR #6).
+ *
+ * Frontend-only: the request came from a postMessage on the embedded App,
+ * not a backend turn, so this is purely a client gate (see
+ * {@link McpAppConsentService}). Visually mirrors the OAuth consent prompt
+ * so the two read as one family; renders in the message-list strip with
+ * the unanchored OAuth prompts.
+ */
+@Component({
+  selector: 'app-mcp-app-consent-prompt',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [NgIcon],
+  providers: [
+    provideIcons({
+      heroArrowTopRightOnSquare,
+      heroCheck,
+      heroShieldCheck,
+      heroVideoCamera,
+      heroXMark,
+    }),
+  ],
+  host: { class: 'block' },
+  template: `
+    <div
+      class="mcp-consent group relative flex max-w-xl items-center gap-2.5 overflow-hidden rounded-lg border border-gray-200/80 bg-white py-1.5 pr-1.5 pl-3 shadow-[0_1px_2px_rgba(15,23,42,0.04)] dark:border-white/10 dark:bg-slate-800/70"
+      role="region"
+      aria-live="polite"
+      [attr.aria-label]="ariaLabel()"
+    >
+      <span
+        class="absolute inset-y-0 left-0 w-[2px] bg-primary-500 dark:bg-primary-400"
+        aria-hidden="true"
+      ></span>
+
+      <div
+        class="flex size-9 shrink-0 items-center justify-center overflow-hidden rounded-md bg-gray-50 ring-1 ring-gray-200/70 dark:bg-slate-900 dark:ring-white/10"
+      >
+        <ng-icon
+          [name]="isLink() ? 'heroArrowTopRightOnSquare' : 'heroVideoCamera'"
+          class="size-5 text-gray-700 dark:text-gray-300"
+          aria-hidden="true"
+        />
+      </div>
+
+      <div class="min-w-0 flex-1">
+        <p
+          class="inline-flex items-center gap-1 text-[10px] leading-none font-semibold uppercase tracking-[0.08em] text-primary-600 dark:text-primary-300"
+        >
+          <ng-icon name="heroShieldCheck" class="size-3" aria-hidden="true" />
+          Permission requested
+        </p>
+        <p class="truncate text-xs/5 text-gray-900 dark:text-gray-100">
+          @if (isLink()) {
+            This app wants to open
+            <span class="font-semibold">{{ linkHost() }}</span>
+          } @else {
+            This app requests
+            <span class="font-semibold">{{ capabilityList() }}</span>
+          }
+        </p>
+      </div>
+
+      <div class="flex shrink-0 items-center gap-1">
+        <button
+          type="button"
+          (click)="allow()"
+          class="action-btn"
+          [attr.aria-label]="'Allow: ' + summary()"
+        >
+          <ng-icon name="heroCheck" class="size-3" aria-hidden="true" />
+          <span>Allow</span>
+        </button>
+        <button
+          type="button"
+          (click)="deny()"
+          class="dismiss-btn flex size-6 items-center justify-center rounded-md text-gray-400 transition-colors hover:bg-gray-100 hover:text-gray-600 focus-visible:opacity-100 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-secondary-500 dark:hover:bg-white/10 dark:hover:text-gray-200"
+          [attr.aria-label]="'Deny: ' + summary()"
+        >
+          <ng-icon name="heroXMark" class="size-3.5" aria-hidden="true" />
+        </button>
+      </div>
+    </div>
+  `,
+  styles: `
+    @import 'tailwindcss';
+    @custom-variant dark (&:where(.dark, .dark *));
+
+    :host {
+      display: block;
+    }
+
+    .mcp-consent {
+      animation: mcp-consent-rise 0.32s cubic-bezier(0.16, 1, 0.3, 1);
+    }
+
+    .mcp-consent p {
+      margin-bottom: 0;
+    }
+
+    .action-btn {
+      display: inline-flex;
+      align-items: center;
+      gap: 0.25rem;
+      border-radius: 0.375rem;
+      padding: 0.25rem 0.625rem;
+      font-size: 0.75rem;
+      font-weight: 600;
+      color: white;
+      background: var(--color-secondary-500);
+      transition:
+        background-color 120ms ease,
+        transform 120ms ease;
+    }
+
+    .action-btn:hover {
+      background: var(--color-secondary-600);
+    }
+
+    .action-btn:active {
+      transform: translateY(1px);
+    }
+
+    .action-btn:focus-visible {
+      outline: 2px solid var(--color-secondary-500);
+      outline-offset: 2px;
+    }
+
+    .dismiss-btn {
+      opacity: 0;
+    }
+
+    .group:hover .dismiss-btn,
+    .group:focus-within .dismiss-btn {
+      opacity: 1;
+    }
+
+    @keyframes mcp-consent-rise {
+      from {
+        opacity: 0;
+        transform: translateY(6px);
+      }
+      to {
+        opacity: 1;
+        transform: translateY(0);
+      }
+    }
+
+    @media (prefers-reduced-motion: reduce) {
+      .mcp-consent {
+        animation: none;
+      }
+      .action-btn {
+        transition: none;
+      }
+    }
+  `,
+})
+export class McpAppConsentPromptComponent {
+  readonly prompt = input.required<PendingConsent>();
+
+  private readonly consentService = inject(McpAppConsentService);
+
+  protected readonly isLink = computed(
+    () => this.prompt().request.kind === 'open-link',
+  );
+
+  protected readonly linkHost = computed<string>(() => {
+    const req = this.prompt().request;
+    if (req.kind !== 'open-link') return '';
+    try {
+      return new URL(req.url).host;
+    } catch {
+      return req.url;
+    }
+  });
+
+  protected readonly capabilityList = computed<string>(() => {
+    const req = this.prompt().request;
+    if (req.kind !== 'capabilities') return '';
+    const labels: Record<string, string> = {
+      camera: 'camera',
+      microphone: 'microphone',
+      geolocation: 'location',
+      clipboardWrite: 'clipboard',
+    };
+    return req.capabilities.map((c) => labels[c] ?? c).join(', ');
+  });
+
+  protected readonly summary = computed<string>(() =>
+    this.isLink()
+      ? `open ${this.linkHost()}`
+      : `${this.capabilityList()} access`,
+  );
+
+  protected readonly ariaLabel = computed<string>(
+    () => `App permission requested: ${this.summary()}`,
+  );
+
+  allow(): void {
+    this.consentService.answer(this.prompt().id, true);
+  }
+
+  deny(): void {
+    this.consentService.answer(this.prompt().id, false);
+  }
+}
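Review note on how the prompt text is derived: the component turns a request into a short summary, using the URL host for link requests and friendly labels for capabilities, with unknown capabilities passed through unchanged. The sketch below restates that as plain functions. The `ConsentRequest` union is an assumption inferred from the `kind`, `url` and `capabilities` reads in the component; the actual `PendingConsent` type is defined in `McpAppConsentService` and is not shown in this diff.

```typescript
// Assumed request shape, inferred from how the component reads it.
type ConsentRequest =
  | { kind: 'open-link'; url: string }
  | { kind: 'capabilities'; capabilities: string[] };

const CAPABILITY_LABELS: Record<string, string> = {
  camera: 'camera',
  microphone: 'microphone',
  geolocation: 'location',
  clipboardWrite: 'clipboard',
};

// Host part of a URL, or the raw string when it cannot be parsed.
function hostOf(url: string): string {
  try {
    return new URL(url).host;
  } catch {
    return url;
  }
}

// Short human-readable description used in the visible text and aria labels.
function summarize(request: ConsentRequest): string {
  if (request.kind === 'open-link') return `open ${hostOf(request.url)}`;
  const labels = request.capabilities.map((c) => CAPABILITY_LABELS[c] ?? c);
  return `${labels.join(', ')} access`;
}
```

Note that `URL.host` keeps a non-default port, so a link to `https://example.com:8443/` would be shown as `example.com:8443`. Showing only the host hides the path, which keeps the prompt compact but means the user approves a destination site rather than a specific page.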
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/message-actions.component.spec.ts b/frontend/ai.client/src/app/session/components/message-list/components/message-actions.component.spec.ts
new file mode 100644
index 00000000..ffa22ddd
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/message-actions.component.spec.ts
@@ -0,0 +1,60 @@
+import { ComponentFixture, TestBed } from '@angular/core/testing';
+import { describe, it, expect, beforeEach } from 'vitest';
+import { provideMarkdown, MarkdownService } from 'ngx-markdown';
+import { MessageActionsComponent } from './message-actions.component';
+import { Message } from '../../../services/models/message.model';
+
+function makeMessage(text: string): Message {
+  return {
+    id: 'msg-1',
+    role: 'assistant',
+    content: [{ type: 'text', text }],
+  };
+}
+
+function continueButton(fixture: ComponentFixture<MessageActionsComponent>): HTMLButtonElement | null {
+  return fixture.nativeElement.querySelector(
+    'button[aria-label="Continue the truncated response"]',
+  );
+}
+
+describe('MessageActionsComponent — Continue affordance', () => {
+  let fixture: ComponentFixture<MessageActionsComponent>;
+  let component: MessageActionsComponent;
+
+  beforeEach(async () => {
+    await TestBed.configureTestingModule({
+      imports: [MessageActionsComponent],
+      providers: [provideMarkdown()],
+    }).compileComponents();
+
+    const markdownService = TestBed.inject(MarkdownService);
+    markdownService.parse = () => '';
+
+    fixture = TestBed.createComponent(MessageActionsComponent);
+    component = fixture.componentInstance;
+    fixture.componentRef.setInput('message', makeMessage('partial answer'));
+  });
+
+  it('hides the Continue button by default', () => {
+    fixture.detectChanges();
+    expect(continueButton(fixture)).toBeNull();
+  });
+
+  it('shows the Continue button when canContinue is true', () => {
+    fixture.componentRef.setInput('canContinue', true);
+    fixture.detectChanges();
+    expect(continueButton(fixture)).not.toBeNull();
+  });
+
+  it('emits continueRequested when the Continue button is clicked', () => {
+    fixture.componentRef.setInput('canContinue', true);
+    fixture.detectChanges();
+
+    let emitted = 0;
+    component.continueRequested.subscribe(() => emitted++);
+
+    continueButton(fixture)!.click();
+    expect(emitted).toBe(1);
+  });
+});
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/message-actions.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/message-actions.component.ts
index 5d632d70..beaeb55b 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/message-actions.component.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/message-actions.component.ts
@@ -4,12 +4,13 @@ import {
   computed,
   inject,
   input,
+  output,
   PLATFORM_ID,
   signal,
 } from '@angular/core';
 import { isPlatformBrowser } from '@angular/common';
 import { NgIcon, provideIcons } from '@ng-icons/core';
-import { heroSquare2Stack, heroCheck } from '@ng-icons/heroicons/outline';
+import { heroSquare2Stack, heroCheck, heroArrowPath } from '@ng-icons/heroicons/outline';
 import { MarkdownService } from 'ngx-markdown';
 import { Message, isTextContentBlock } from '../../../services/models/message.model';
 import { TooltipDirective } from '../../../../components/tooltip';
@@ -18,7 +19,7 @@ import { TooltipDirective } from '../../../../components/tooltip';
   selector: 'app-message-actions',
   changeDetection: ChangeDetectionStrategy.OnPush,
   imports: [NgIcon, TooltipDirective],
-  providers: [provideIcons({ heroSquare2Stack, heroCheck })],
+  providers: [provideIcons({ heroSquare2Stack, heroCheck, heroArrowPath })],
   template: `
     <div class="flex items-center gap-1">
       <button
@@ -36,6 +37,23 @@ import { TooltipDirective } from '../../../../components/tooltip';
           <ng-icon name="heroSquare2Stack" class="size-4" aria-hidden="true" />
         }
       </button>
+
+      @if (canContinue()) {
+        <span class="pl-1 text-xs text-gray-500 dark:text-gray-400">
+          Response length limit reached
+        </span>
+        <button
+          type="button"
+          class="inline-flex items-center gap-1.5 rounded-md px-2 py-1 text-sm font-medium text-blue-700 transition-colors hover:bg-blue-50 focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-blue-500 dark:text-blue-300 dark:hover:bg-blue-950/40"
+          appTooltip="Resume response"
+          appTooltipPosition="top"
+          aria-label="Continue the truncated response"
+          (click)="continueRequested.emit()"
+        >
+          <ng-icon name="heroArrowPath" class="size-4" aria-hidden="true" />
+          <span>Continue</span>
+        </button>
+      }
     </div>
   `,
   styles: `
@@ -59,6 +77,13 @@ export class MessageActionsComponent {
 
   message = input.required<Message>();
 
+  /** Show a "Continue" button when this is the last assistant message of a
+   *  recoverable max_tokens-truncated turn. */
+  canContinue = input<boolean>(false);
+
+  /** Emitted when the user asks to continue the truncated response. */
+  continueRequested = output<void>();
+
   protected copied = signal(false);
   private resetTimeout: ReturnType<typeof setTimeout> | null = null;
 
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/message-metadata-badges.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/message-metadata-badges.component.ts
index 07fcdf4f..1329c058 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/message-metadata-badges.component.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/message-metadata-badges.component.ts
@@ -17,7 +17,11 @@ interface CostBreakdown {
 
 interface MessageMetadata {
     latency?: {
-        timeToFirstToken?: number;
+        // null = not measured (provider didn't emit timeToFirstByteMs and
+        // we couldn't compute it locally). Distinct from 0, which is
+        // physically impossible — keep them separate so any future
+        // analytics can filter unmeasured samples cleanly.
+        timeToFirstToken?: number | null;
         endToEndLatency?: number;
     };
     tokenUsage?: {
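Review note on `timeToFirstToken?: number | null`: the comment above argues for keeping "not measured" distinct from `0`. A small sketch of what that buys downstream, assuming a hypothetical `Latency` shape that mirrors the `latency` field and hypothetical helpers `measuredTtft` and `meanTtft`:

```typescript
// Mirrors the `latency` field of MessageMetadata for illustration only.
type Latency = { timeToFirstToken?: number | null; endToEndLatency?: number };

// Keeps only samples where time-to-first-token was actually measured.
function measuredTtft(samples: readonly Latency[]): number[] {
  return samples
    .map((sample) => sample.timeToFirstToken)
    .filter((value): value is number => typeof value === 'number');
}

// Mean over measured samples, or null when nothing was measured.
function meanTtft(samples: readonly Latency[]): number | null {
  const measured = measuredTtft(samples);
  if (measured.length === 0) return null;
  return measured.reduce((sum, value) => sum + value, 0) / measured.length;
}
```

Had unmeasured samples been stored as `0`, they would be indistinguishable from real values in this filter and would pull the mean down.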
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/streaming-text.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/streaming-text.component.ts
index 72c31ae1..15f46602 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/streaming-text.component.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/streaming-text.component.ts
@@ -10,6 +10,7 @@ import {
 } from '@angular/core';
 import { isPlatformBrowser } from '@angular/common';
 import { MarkdownComponent } from 'ngx-markdown';
+import { CodeBlockClipboardButtonComponent } from './code-block-clipboard-button.component';
 
 /**
  * StreamingTextComponent provides smooth character-by-character typing animation
@@ -27,7 +28,8 @@ import { MarkdownComponent } from 'ngx-markdown';
   template: `
     <markdown
       class="min-w-0 max-w-full overflow-hidden"
-      clipboard
+      [clipboard]="!isStreaming()"
+      [clipboardButtonComponent]="ClipboardButton"
       mermaid
       katex
       [data]="displayedText()"
@@ -43,6 +45,8 @@ export class StreamingTextComponent implements OnDestroy {
   private platformId = inject(PLATFORM_ID);
   private isBrowser = isPlatformBrowser(this.platformId);
 
+  protected readonly ClipboardButton = CodeBlockClipboardButtonComponent;
+
   /** The full text content to display */
   text = input.required<string>();
 
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/index.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/index.ts
index d1121485..79c40bd9 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/index.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/index.ts
@@ -1,2 +1,3 @@
 export * from './tool-rail.component';
 export * from './tool-rail.model';
+export * from './pin-scroll-to-bottom.directive';
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/pin-scroll-to-bottom.directive.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/pin-scroll-to-bottom.directive.ts
new file mode 100644
index 00000000..4aed6b09
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/pin-scroll-to-bottom.directive.ts
@@ -0,0 +1,25 @@
+import { Directive, ElementRef, effect, inject, input } from '@angular/core';
+
+/**
+ * Keeps the host element scrolled to the bottom whenever the bound value
+ * changes. Used for live-streaming output (e.g. artifact generation) so the
+ * latest content stays visible inside a fixed-height scroll box.
+ */
+@Directive({
+  selector: '[appPinScrollToBottom]',
+})
+export class PinScrollToBottomDirective {
+  /** Bind the streamed value; the host re-pins to the bottom on each change. */
+  readonly appPinScrollToBottom = input<string>('');
+
+  private readonly host = inject<ElementRef<HTMLElement>>(ElementRef);
+
+  constructor() {
+    effect(() => {
+      // Read the streamed value so the effect re-runs for every chunk.
+      this.appPinScrollToBottom();
+      const el = this.host.nativeElement;
+      el.scrollTop = el.scrollHeight;
+    });
+  }
+}
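Review note on `PinScrollToBottomDirective`: as written, every new chunk forces the box to the bottom, even if the reader has scrolled up to inspect earlier output. One possible variant, which is not what this diff implements, is to re-pin only when the box was already near the bottom before the new chunk arrived. The sketch uses a hypothetical `ScrollBox` interface (a structural subset of `HTMLElement`) and hypothetical helpers so it can run without a DOM.

```typescript
// Structural subset of HTMLElement, so the helpers can be exercised without a DOM.
interface ScrollBox {
  scrollTop: number;
  scrollHeight: number;
  clientHeight: number;
}

// True when the viewport bottom is within `tolerancePx` of the content bottom.
function isNearBottom(box: ScrollBox, tolerancePx = 16): boolean {
  return box.scrollHeight - box.scrollTop - box.clientHeight <= tolerancePx;
}

// Re-pins only if the reader had not scrolled away before the new chunk arrived.
function pinIfFollowing(box: ScrollBox, wasNearBottom: boolean): void {
  if (wasNearBottom) box.scrollTop = box.scrollHeight;
}
```

The `wasNearBottom` value has to be captured before the new content is rendered, for example in a scroll listener, because once the content grows the old position no longer looks like the bottom. Whether this is needed for a short-lived 14rem output box is a product decision; the simpler always-pin behaviour may be acceptable here.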
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.component.html b/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.component.html
index c7fc563a..cb14ad9f 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.component.html
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.component.html
@@ -101,6 +101,24 @@
                 </div>
               }
 
+              <!-- Live streaming output (e.g. artifact being generated) -->
+              @if (isGenerating(call)) {
+                <div class="flex items-start gap-1.5">
+                  <span class="text-xs text-gray-400 dark:text-gray-500 shrink-0 mt-1">output:</span>
+                  <div class="w-full">
+                    <div class="flex items-center gap-1.5 mb-1 text-xs text-gray-400 dark:text-gray-500">
+                      <span class="status-dot bg-amber-400 shimmer"></span>
+                      <span>Generating output…</span>
+                    </div>
+                    <pre
+                      class="max-h-56 overflow-auto whitespace-pre-wrap break-all font-mono text-xs text-gray-500 dark:text-gray-400 bg-gray-50 dark:bg-gray-800/60 border border-gray-100 dark:border-gray-700/30 rounded-xs px-2 py-1.5"
+                      aria-label="Tool output being generated"
+                      [appPinScrollToBottom]="call.streamingContent ?? ''"
+                    >{{ call.streamingContent }}</pre>
+                  </div>
+                </div>
+              }
+
               <!-- Result -->
               @if (call.result) {
                 <div class="flex items-start gap-1.5">
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.component.ts
index 35c51277..72d2d106 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.component.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.component.ts
@@ -8,6 +8,7 @@ import {
 import { KeyValuePipe } from '@angular/common';
 import { JsonSyntaxHighlightPipe } from '../tool-use/json-syntax-highlight.pipe';
 import { ToolCallGroup, ToolCallDisplay } from './tool-rail.model';
+import { PinScrollToBottomDirective } from './pin-scroll-to-bottom.directive';
 import { ToolResultContent } from '../../../../services/models/message.model';
 
 @Component({
@@ -15,7 +16,7 @@ import { ToolResultContent } from '../../../../services/models/message.model';
   templateUrl: './tool-rail.component.html',
   styleUrl: './tool-rail.component.css',
   changeDetection: ChangeDetectionStrategy.OnPush,
-  imports: [JsonSyntaxHighlightPipe, KeyValuePipe],
+  imports: [JsonSyntaxHighlightPipe, KeyValuePipe, PinScrollToBottomDirective],
 })
 export class ToolRailComponent {
   /** The grouped tool calls to display */
@@ -89,6 +90,14 @@ export class ToolRailComponent {
     }
   }
 
+  /**
+   * True while a tool is still streaming its long output (e.g. an artifact)
+   * and has not yet returned a result — drives the live "generating" preview.
+   */
+  isGenerating(call: ToolCallDisplay): boolean {
+    return !!call.streamingContent && !call.result;
+  }
+
   /** Format duration for display */
   formatDuration(ms: number): string {
     return ms >= 1000 ? `${(ms / 1000).toFixed(1)}s` : `${ms}ms`;
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.model.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.model.ts
index c669c333..932e312a 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.model.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-rail/tool-rail.model.ts
@@ -26,6 +26,14 @@ export interface ToolCallDisplay {
    *  for the user to authorize. */
   status: 'pending' | 'complete' | 'error' | 'awaiting_auth';
 
+  /**
+   * Partially generated long output (e.g. an artifact's HTML) decoded from the
+   * still-incomplete tool-call JSON while the model is streaming it. Present
+   * only while the call is in flight; cleared once a result arrives. Used to
+   * show live "generating output" feedback in the rail.
+   */
+  streamingContent?: string;
+
   /** Optional LLM-generated one-line summary of this tool call's result */
   summary?: string;
 
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/built-in-renderers.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/built-in-renderers.ts
new file mode 100644
index 00000000..a0ee86f6
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/built-in-renderers.ts
@@ -0,0 +1,20 @@
+import { EnvironmentProviders, inject, provideAppInitializer } from '@angular/core';
+import { ToolRendererRegistryService } from './tool-renderer-registry.service';
+import { CalculatorToolResultComponent } from './renderers/calculator-tool-result.component';
+import { FetchUrlContentToolResultComponent } from './renderers/fetch-url-content-tool-result.component';
+import { CreateVisualizationToolResultComponent } from './renderers/create-visualization-tool-result.component';
+
+/**
+ * Registers the built-in proof-point renderers into
+ * {@link ToolRendererRegistryService} at bootstrap. New renderers (including
+ * the future MCP App renderer) register here — or via their own
+ * `provideAppInitializer` — with no host-template changes.
+ */
+export function provideBuiltInToolRenderers(): EnvironmentProviders {
+  return provideAppInitializer(() => {
+    const registry = inject(ToolRendererRegistryService);
+    registry.register('calculator', CalculatorToolResultComponent);
+    registry.register('fetch_url_content', FetchUrlContentToolResultComponent);
+    registry.register('create_visualization', CreateVisualizationToolResultComponent);
+  });
+}
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/index.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/index.ts
index 0b888c06..8fd250c8 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/index.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/index.ts
@@ -1,2 +1,5 @@
 export * from './tool-use.component';
 export * from './json-syntax-highlight.pipe';
+export * from './tool-renderer-registry.service';
+export * from './renderers/default-tool-result.component';
+export * from './built-in-renderers';
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/calculator-tool-result.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/calculator-tool-result.component.ts
new file mode 100644
index 00000000..7a4f9e05
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/calculator-tool-result.component.ts
@@ -0,0 +1,20 @@
+import { ChangeDetectionStrategy, Component, input } from '@angular/core';
+import { DefaultToolResultComponent } from './default-tool-result.component';
+import type { ToolResultData, ToolResultRenderer } from '../tool-renderer-registry.service';
+
+/**
+ * Registry proof point for the `calculator` tool. Renders identically to the
+ * default today; it exists to validate that a distinct, tool-named component
+ * resolves and slots in with zero visual change — the exact mechanism the
+ * MCP App renderer will use.
+ */
+@Component({
+  selector: 'app-calculator-tool-result',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [DefaultToolResultComponent],
+  template: `<app-default-tool-result [result]="result()" [minimized]="minimized()" />`,
+})
+export class CalculatorToolResultComponent implements ToolResultRenderer {
+  result = input.required<ToolResultData>();
+  minimized = input<boolean>(false);
+}
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/create-visualization-tool-result.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/create-visualization-tool-result.component.ts
new file mode 100644
index 00000000..cf83c34f
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/create-visualization-tool-result.component.ts
@@ -0,0 +1,19 @@
+import { ChangeDetectionStrategy, Component, input } from '@angular/core';
+import { DefaultToolResultComponent } from './default-tool-result.component';
+import type { ToolResultData, ToolResultRenderer } from '../tool-renderer-registry.service';
+
+/**
+ * Registry proof point for the `create_visualization` tool. Renders
+ * identically to the default today; it exists to validate the registry
+ * shape with a distinct, tool-named component.
+ */
+@Component({
+  selector: 'app-create-visualization-tool-result',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [DefaultToolResultComponent],
+  template: `<app-default-tool-result [result]="result()" [minimized]="minimized()" />`,
+})
+export class CreateVisualizationToolResultComponent implements ToolResultRenderer {
+  result = input.required<ToolResultData>();
+  minimized = input<boolean>(false);
+}
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/default-tool-result.component.spec.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/default-tool-result.component.spec.ts
new file mode 100644
index 00000000..1ab4005e
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/default-tool-result.component.spec.ts
@@ -0,0 +1,83 @@
+import { ComponentFixture, TestBed } from '@angular/core/testing';
+import { describe, it, expect, beforeEach } from 'vitest';
+import { DefaultToolResultComponent } from './default-tool-result.component';
+import { ToolResultData } from '../tool-renderer-registry.service';
+
+function makeResult(content: ToolResultData['content']): ToolResultData {
+  return { status: 'success', content };
+}
+
+describe('DefaultToolResultComponent', () => {
+  let fixture: ComponentFixture<DefaultToolResultComponent>;
+
+  beforeEach(async () => {
+    TestBed.resetTestingModule();
+    await TestBed.configureTestingModule({
+      imports: [DefaultToolResultComponent],
+    }).compileComponents();
+
+    fixture = TestBed.createComponent(DefaultToolResultComponent);
+  });
+
+  it('renders plain text content', () => {
+    fixture.componentRef.setInput('result', makeResult([{ text: 'hello world' }]));
+    fixture.detectChanges();
+
+    const textEl = fixture.nativeElement.querySelector('.whitespace-pre-wrap');
+    expect(textEl).toBeTruthy();
+    expect(textEl.textContent).toContain('hello world');
+  });
+
+  it('renders JSON content as a highlighted code block', () => {
+    fixture.componentRef.setInput('result', makeResult([{ json: { answer: 42 } }]));
+    fixture.detectChanges();
+
+    const codeEl = fixture.nativeElement.querySelector('pre code');
+    expect(codeEl).toBeTruthy();
+    expect(codeEl.textContent).toContain('answer');
+    expect(codeEl.textContent).toContain('42');
+    // The plain-text branch must not be used for JSON items.
+    expect(fixture.nativeElement.querySelector('.whitespace-pre-wrap')).toBeNull();
+  });
+
+  it('renders image content in full (non-minimized) mode', () => {
+    fixture.componentRef.setInput(
+      'result',
+      makeResult([{ image: { format: 'png', data: 'iVBORw0KGgo=' } }]),
+    );
+    fixture.detectChanges();
+
+    const img = fixture.nativeElement.querySelector('img');
+    expect(img).toBeTruthy();
+    expect(img.getAttribute('src')).toBe('data:image/png;base64,iVBORw0KGgo=');
+  });
+
+  it('uses rounded-sm cards in full mode', () => {
+    fixture.componentRef.setInput('result', makeResult([{ text: 'x' }]));
+    fixture.detectChanges();
+
+    const card = fixture.nativeElement.querySelector('.space-y-2 > div');
+    expect(card.classList.contains('rounded-sm')).toBe(true);
+    expect(card.classList.contains('rounded-xs')).toBe(false);
+  });
+
+  it('omits images and uses rounded-xs cards in minimized mode', () => {
+    fixture.componentRef.setInput(
+      'result',
+      makeResult([
+        { text: 'visible text' },
+        { image: { format: 'png', data: 'abc' } },
+      ]),
+    );
+    fixture.componentRef.setInput('minimized', true);
+    fixture.detectChanges();
+
+    // Image is suppressed in the minimized variant (historical behavior).
+    expect(fixture.nativeElement.querySelector('img')).toBeNull();
+
+    const card = fixture.nativeElement.querySelector('.space-y-2 > div');
+    expect(card.classList.contains('rounded-xs')).toBe(true);
+    expect(card.classList.contains('rounded-sm')).toBe(false);
+    expect(card.textContent).toContain('visible text');
+  });
+});
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/default-tool-result.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/default-tool-result.component.ts
new file mode 100644
index 00000000..f79ef7d4
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/default-tool-result.component.ts
@@ -0,0 +1,75 @@
+import { ChangeDetectionStrategy, Component, computed, input } from '@angular/core';
+import { JsonSyntaxHighlightPipe } from '../json-syntax-highlight.pipe';
+import { ToolResultContent } from '../../../../../services/models/message.model';
+import type { ToolResultData, ToolResultRenderer } from '../tool-renderer-registry.service';
+
+/**
+ * Fallback tool-result renderer. Reproduces the historical text / JSON /
+ * image rendering verbatim — this is what every unregistered tool resolves
+ * to, so its output must stay byte-for-byte identical to the markup that
+ * previously lived inline in `tool-use.component.html`.
+ *
+ * `minimized` reproduces the two pre-existing variants: the minimized view
+ * used `rounded-xs` cards and never rendered image content; the full view
+ * used `rounded-sm` cards and appended images after the text/JSON items.
+ */
+@Component({
+  selector: 'app-default-tool-result',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [JsonSyntaxHighlightPipe],
+  styles: ':host { display: block; }',
+  template: `
+    <div class="space-y-2">
+      @for (item of textItems(); track $index) {
+        <div
+          class="p-2 bg-gray-100 dark:bg-gray-800 border border-gray-300 dark:border-gray-600"
+          [class.rounded-xs]="minimized()"
+          [class.rounded-sm]="!minimized()"
+        >
+          @if (item.json) {
+            <pre class="font-mono text-xs overflow-x-auto"><code [innerHTML]="formatContent(item) | jsonSyntaxHighlight"></code></pre>
+          } @else {
+            <div class="text-sm/6 text-gray-900 dark:text-gray-100 whitespace-pre-wrap">{{ formatContent(item) }}</div>
+          }
+        </div>
+      }
+
+      @if (!minimized()) {
+        @for (item of imageItems(); track $index) {
+          <div class="rounded-sm overflow-hidden bg-gray-100 dark:bg-gray-800 p-2 border border-gray-300 dark:border-gray-600">
+            <img
+              [src]="imageDataUrl(item)"
+              alt="Tool result image"
+              class="max-w-full h-auto rounded-xs"
+            />
+          </div>
+        }
+      }
+    </div>
+  `,
+})
+export class DefaultToolResultComponent implements ToolResultRenderer {
+  result = input.required<ToolResultData>();
+  minimized = input<boolean>(false);
+
+  /** Text and JSON content items (images are rendered separately). */
+  textItems = computed(() =>
+    this.result().content.filter((item) => Boolean(item.text || item.json)),
+  );
+
+  /** Image content items. */
+  imageItems = computed(() =>
+    this.result().content.filter((item) => Boolean(item.image)),
+  );
+
+  formatContent(item: ToolResultContent): string {
+    if (item.text) return item.text;
+    if (item.json) return JSON.stringify(item.json, null, 2);
+    return '';
+  }
+
+  imageDataUrl(item: ToolResultContent): string {
+    if (!item.image) return '';
+    return `data:image/${item.image.format};base64,${item.image.data}`;
+  }
+}
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/fetch-url-content-tool-result.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/fetch-url-content-tool-result.component.ts
new file mode 100644
index 00000000..1a05f74f
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/fetch-url-content-tool-result.component.ts
@@ -0,0 +1,19 @@
+import { ChangeDetectionStrategy, Component, input } from '@angular/core';
+import { DefaultToolResultComponent } from './default-tool-result.component';
+import type { ToolResultData, ToolResultRenderer } from '../tool-renderer-registry.service';
+
+/**
+ * Registry proof point for the `fetch_url_content` tool. Renders identically
+ * to the default today; it exists to validate the registry shape with a
+ * distinct, tool-named component.
+ */
+@Component({
+  selector: 'app-fetch-url-content-tool-result',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [DefaultToolResultComponent],
+  template: `<app-default-tool-result [result]="result()" [minimized]="minimized()" />`,
+})
+export class FetchUrlContentToolResultComponent implements ToolResultRenderer {
+  result = input.required<ToolResultData>();
+  minimized = input<boolean>(false);
+}
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/mcp-app-frame.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/mcp-app-frame.component.ts
new file mode 100644
index 00000000..34fa938b
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/renderers/mcp-app-frame.component.ts
@@ -0,0 +1,330 @@
+import {
+  ChangeDetectionStrategy,
+  Component,
+  DestroyRef,
+  ElementRef,
+  computed,
+  effect,
+  inject,
+  input,
+  signal,
+  viewChild,
+} from '@angular/core';
+import { DOCUMENT } from '@angular/common';
+import type {
+  ToolResultData,
+  ToolResultRenderer,
+} from '../tool-renderer-registry.service';
+import { McpAppStateService } from '../../../../../services/mcp-apps/mcp-app-state.service';
+import { StreamParserService } from '../../../../../services/chat/stream-parser.service';
+import { ThemeService } from '../../../../../../components/topnav/components/theme-toggle/theme.service';
+import { McpAppBridge } from '../../../../../services/mcp-apps/mcp-app-bridge';
+import { McpAppProxyService } from '../../../../../services/mcp-apps/mcp-app-proxy.service';
+import { McpAppMessageService } from '../../../../../services/mcp-apps/mcp-app-message.service';
+import { McpAppConsentService } from '../../../../../services/mcp-apps/mcp-app-consent.service';
+import { buildProxyUrl } from '../../../../../services/mcp-apps/proxy-url';
+import { McpAppConsentPromptComponent } from '../../mcp-app-consent-prompt/mcp-app-consent-prompt.component';
+import type { CapabilityKey } from '../../../../../services/mcp-apps/mcp-app-protocol';
+import { ChatRequestService } from '../../../../../services/chat/chat-request.service';
+import { SessionService } from '../../../../../services/session/session.service';
+
+/**
+ * MCP App renderer (SEP-1865), PR #4 of
+ * `docs/kaizen/scoping/mcp-apps-host-renderer.md`.
+ *
+ * Resolves to this component (instead of the default text/JSON renderer)
+ * when the tool invocation produced a `ui_resource` event — see the
+ * `resultRenderer` computed in `ToolUseComponent`. Renders the outer
+ * sandbox-proxy iframe at the deployed `sandboxOrigin` and drives the host
+ * half of the postMessage bridge; the proxy loads the actual App HTML in
+ * its inner null-origin iframe with a per-resource CSP.
+ *
+ * The whole surface stays inactive until the backend host flag is flipped
+ * (PR #7), so in practice no `ui_resource` arrives and the registry never
+ * resolves here. When the component has no resource for its `toolUseId`
+ * (e.g. after a reload, since the inline event doesn't re-hydrate) it
+ * renders nothing and the tool-use card falls back to the default renderer.
+ */
+@Component({
+  selector: 'app-mcp-app-frame',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  imports: [McpAppConsentPromptComponent],
+  styles: ':host { display: block; }',
+  template: `
+    @if (currentPrompt(); as prompt) {
+      <div class="mb-2 flex justify-start">
+        <app-mcp-app-consent-prompt [prompt]="prompt" />
+      </div>
+    }
+    @if (proxyUrl(); as url) {
+      <div
+        #host
+        class="overflow-hidden rounded-sm border border-gray-300 dark:border-gray-600 bg-white"
+      ></div>
+    }
+  `,
+})
+export class McpAppFrameComponent implements ToolResultRenderer {
+  /** Tool result payload (the renderer contract). Mapped to CallToolResult. */
+  readonly result = input.required<ToolResultData>();
+  readonly minimized = input<boolean>(false);
+  /** Originating tool-use id — keys the resource + correlates tool data. */
+  readonly toolUseId = input<string>();
+
+  private readonly mcpAppState = inject(McpAppStateService);
+  private readonly mcpAppProxy = inject(McpAppProxyService);
+  private readonly mcpAppMessage = inject(McpAppMessageService);
+  private readonly mcpAppConsent = inject(McpAppConsentService);
+  private readonly chatRequest = inject(ChatRequestService);
+  private readonly conversation = inject(SessionService);
+  private readonly streamParser = inject(StreamParserService);
+  private readonly theme = inject(ThemeService);
+  private readonly destroyRef = inject(DestroyRef);
+  private readonly doc = inject(DOCUMENT);
+  private readonly win = this.doc.defaultView;
+
+  private readonly hostRef =
+    viewChild<ElementRef<HTMLDivElement>>('host');
+  private iframeEl: HTMLIFrameElement | null = null;
+
+  /** Initial height; the App drives it via `ui/notifications/size-changed`. */
+  protected readonly frameHeight = signal(360);
+
+  private bridge: McpAppBridge | null = null;
+  private readonly nonce =
+    this.win?.crypto?.randomUUID?.() ?? `n-${Math.random().toString(36).slice(2)}`;
+
+  /** The UI resource for this tool invocation (undefined ⇒ render nothing). */
+  protected readonly resource = computed(() => {
+    const id = this.toolUseId();
+    return id ? this.mcpAppState.get(id) : undefined;
+  });
+
+  /** Capabilities the resource declares (`_meta.ui.permissions`). */
+  private readonly requestedCaps = computed<CapabilityKey[]>(() => {
+    const p = this.resource()?.permissions ?? {};
+    const caps: CapabilityKey[] = [];
+    if (p.camera) caps.push('camera');
+    if (p.microphone) caps.push('microphone');
+    if (p.geolocation) caps.push('geolocation');
+    if (p.clipboardWrite) caps.push('clipboardWrite');
+    return caps;
+  });
+
+  /**
+   * Render-time capability consent (PR #6). `null` = undecided. When the
+   * resource requests sensitive sandbox features we hold the frame until
+   * the user answers an inline prompt, then grant only what was approved
+   * (all-or-nothing for v1). Frontend-only — the request never reaches a
+   * backend turn, same justification as `McpAppConsentService`.
+   */
+  private readonly capabilityGrant = signal<boolean | null>(null);
+  private capabilityAsked = false;
+
+  /** Id of this frame's currently-open consent prompt (capability ask or
+   *  open-link), used to render it inline above the iframe instead of in
+   *  an unanchored message-list strip. */
+  private readonly openPromptId = signal<string | null>(null);
+  protected readonly currentPrompt = computed(() => {
+    const id = this.openPromptId();
+    if (!id) return null;
+    return this.mcpAppConsent.pending().find((p) => p.id === id) ?? null;
+  });
+
+  /** True once requested capabilities are decided (or none were asked). */
+  private readonly capabilitiesResolved = computed(
+    () => this.requestedCaps().length === 0 || this.capabilityGrant() !== null,
+  );
+
+  /** Permissions actually applied to the frame: declared ∩ consent. */
+  private readonly effectivePermissions = computed(() => {
+    const declared = this.resource()?.permissions ?? {};
+    if (this.requestedCaps().length === 0) return declared;
+    return this.capabilityGrant() ? declared : {};
+  });
+
+  /**
+   * Plain string URL the iframe `src` is set to. Stays null until consent
+   * resolves so the @if-gated host div doesn't appear. Trusted single
+   * value from our authenticated backend (SSM-sourced); the imperative
+   * sandbox attribute + the proxy's per-resource CSP are the real
+   * containment, same justification as the artifact panel.
+   *
+   * The `?csp=` query the proxy CFN reads is built from the resource's
+   * declared `_meta.ui.csp` (`buildProxyUrl`). Apps that declare nothing
+   * get the bare URL and the proxy's default CSP — no cache fragmentation
+   * for the no-declaration majority.
+   */
+  protected readonly proxyUrl = computed<string | null>(() => {
+    const res = this.resource();
+    if (!res || !res.sandboxOrigin) return null;
+    if (!this.capabilitiesResolved()) return null;
+    return buildProxyUrl(res.sandboxOrigin, res.csp);
+  });
+
+  /** Permissions-Policy `allow` for the outer frame (delegates to inner). */
+  protected readonly allowAttr = computed(() => {
+    const p = this.effectivePermissions();
+    const feats: string[] = [];
+    if (p.camera) feats.push('camera');
+    if (p.microphone) feats.push('microphone');
+    if (p.geolocation) feats.push('geolocation');
+    if (p.clipboardWrite) feats.push('clipboard-write');
+    return feats.length ? feats.join('; ') : null;
+  });
+
+  constructor() {
+    // Push theme changes to the App as a host-context-changed partial.
+    effect(() => {
+      const theme = this.theme.theme();
+      this.bridge?.notifyHostContextChanged({ theme });
+    });
+    // Re-push the tool result if it lands/changes after the App initialized.
+    effect(() => {
+      this.result();
+      this.bridge?.refreshToolResult();
+    });
+    // Render-time capability consent: when the resource requests sensitive
+    // sandbox features, ask once and hold the frame until answered.
+    effect(() => {
+      const caps = this.requestedCaps();
+      if (caps.length === 0 || this.capabilityAsked) return;
+      this.capabilityAsked = true;
+      const { id, granted } = this.mcpAppConsent.request({
+        kind: 'capabilities',
+        capabilities: caps,
+      });
+      this.openPromptId.set(id);
+      granted
+        .then((g) => this.capabilityGrant.set(g))
+        .catch(() => this.capabilityGrant.set(false))
+        .finally(() => this.openPromptId.set(null));
+    });
+    // Imperatively create the iframe once the host div mounts and consent
+    // is resolved. Angular 21 forbids dynamic `[attr.allow]` on <iframe>
+    // (NG0910), so we build the element by hand with all attributes set
+    // before src — the browser only consults `allow` at load-start.
+    effect(() => {
+      const host = this.hostRef();
+      const url = this.proxyUrl();
+      if (!host || !url) {
+        if (this.iframeEl) {
+          this.iframeEl.remove();
+          this.iframeEl = null;
+        }
+        return;
+      }
+      if (this.iframeEl) return;
+      const iframe = this.doc.createElement('iframe');
+      iframe.setAttribute('title', 'MCP App');
+      iframe.setAttribute('sandbox', 'allow-scripts allow-same-origin');
+      iframe.setAttribute('referrerpolicy', 'no-referrer');
+      iframe.setAttribute('loading', 'lazy');
+      const allow = this.allowAttr();
+      if (allow) iframe.setAttribute('allow', allow);
+      iframe.className = 'block w-full border-0 bg-white';
+      iframe.style.height = `${this.frameHeight()}px`;
+      // Append BEFORE setting src so contentWindow exists, then start the
+      // bridge so the host listener is registered before the proxy script
+      // posts its `sandbox-proxy-ready` notification. Doing this in the
+      // (load) callback races: the proxy fires ready as soon as its IIFE
+      // runs, which is before the host's load event handler dispatches —
+      // miss that and the inner App iframe is never mounted (blank frame).
+      host.nativeElement.appendChild(iframe);
+      this.iframeEl = iframe;
+      this.startBridge();
+      iframe.src = url;
+    });
+    // Keep the iframe height in sync as the App reports size changes.
+    effect(() => {
+      const h = this.frameHeight();
+      if (this.iframeEl) this.iframeEl.style.height = `${h}px`;
+    });
+    this.destroyRef.onDestroy(() => this.bridge?.dispose('component-destroyed'));
+  }
+
+  private startBridge(): void {
+    const res = this.resource();
+    if (!res || this.bridge || !this.win) return;
+    // Hand the bridge a resource whose permissions are already narrowed to
+    // what the user consented to, so sandbox-resource-ready + the
+    // initialize `hostCapabilities.sandbox.permissions` advertise only the
+    // granted subset (consistent with the outer iframe's `allow`).
+    const effectiveRes = { ...res, permissions: this.effectivePermissions() };
+    this.bridge = new McpAppBridge({
+      hostWindow: this.win,
+      getProxyWindow: () => this.iframeEl?.contentWindow ?? null,
+      sandboxOrigin: res.sandboxOrigin.replace(/\/$/, ''),
+      resource: effectiveRes,
+      nonce: this.nonce,
+      getToolInput: () => this.lookupToolInput(),
+      getToolResult: () => this.toCallToolResult(),
+      getHostContext: () => ({
+        theme: this.theme.theme(),
+        locale: this.win?.navigator?.language,
+        userAgent: 'agentcore-public-stack',
+      }),
+      openLink: (url) => {
+        this.win?.open(url, '_blank', 'noopener,noreferrer');
+      },
+      proxyToolCall: (toolName, args) =>
+        this.mcpAppProxy.proxyToolCall(
+          this.toolUseId() ?? '',
+          toolName,
+          args,
+        ),
+      sendMessage: (text) =>
+        this.chatRequest.submitChatRequest(
+          text,
+          this.conversation.currentSession().sessionId || null,
+        ),
+      updateModelContext: (payload) =>
+        this.mcpAppMessage.updateModelContext(res.resourceUri, payload),
+      requestConsent: (req) => {
+        const { id, granted } = this.mcpAppConsent.request(req);
+        this.openPromptId.set(id);
+        return granted.finally(() => this.openPromptId.set(null));
+      },
+    });
+    this.bridge.onSizeChanged((_w, h) => {
+      if (h > 0) this.frameHeight.set(Math.ceil(h));
+    });
+    this.bridge.start();
+  }
+
+  /** Complete tool-call arguments, found by toolUseId in the live stream. */
+  private lookupToolInput(): Record<string, unknown> {
+    const id = this.toolUseId();
+    if (!id) return {};
+    for (const msg of this.streamParser.allMessages()) {
+      for (const block of msg.content ?? []) {
+        const tu = (block as { toolUse?: { toolUseId?: string; input?: unknown } })
+          .toolUse;
+        if (tu && tu.toolUseId === id && tu.input && typeof tu.input === 'object') {
+          return tu.input as Record<string, unknown>;
+        }
+      }
+    }
+    return {};
+  }
+
+  /** Map the renderer's `ToolResultData` to an MCP `CallToolResult`. */
+  private toCallToolResult(): unknown {
+    const r = this.result();
+    if (!r) return null;
+    const content = (r.content ?? []).map((item) => {
+      if (item.image) {
+        return {
+          type: 'image',
+          data: item.image.data,
+          mimeType: `image/${item.image.format}`,
+        };
+      }
+      if (item.json !== undefined) {
+        return { type: 'text', text: JSON.stringify(item.json) };
+      }
+      return { type: 'text', text: item.text ?? '' };
+    });
+    return { content, isError: r.status === 'error' };
+  }
+}
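Reviewer note: the mapping in `toCallToolResult()` above can be exercised in isolation. The following is a minimal sketch, not the component's code: `ToolResultItem`, `ToolResult`, and `McpContent` are hypothetical stand-ins for the real `ToolResultContent` / `ToolResultData` models, assumed to carry only the fields the method reads.

```typescript
// Hypothetical stand-ins for the app's result models (only the fields used here).
interface ToolResultItem {
  text?: string;
  json?: unknown;
  image?: { format: string; data: string };
}
interface ToolResult {
  content?: ToolResultItem[];
  status: 'success' | 'error';
}

// MCP content blocks produced by this mapping: text and image only.
type McpContent =
  | { type: 'text'; text: string }
  | { type: 'image'; data: string; mimeType: string };

function toCallToolResult(
  r: ToolResult | null,
): { content: McpContent[]; isError: boolean } | null {
  if (!r) return null;
  const content = (r.content ?? []).map((item): McpContent => {
    if (item.image) {
      // The stored format ("png", "jpeg", ...) becomes a MIME type.
      return { type: 'image', data: item.image.data, mimeType: `image/${item.image.format}` };
    }
    if (item.json !== undefined) {
      // JSON payloads are flattened to compact text.
      return { type: 'text', text: JSON.stringify(item.json) };
    }
    return { type: 'text', text: item.text ?? '' };
  });
  return { content, isError: r.status === 'error' };
}

const mapped = toCallToolResult({
  status: 'error',
  content: [{ text: 'hi' }, { json: { total: 3 } }, { image: { format: 'png', data: 'AAAA' } }],
});
const missing = toCallToolResult(null);
```

Note that a JSON item keeps no trace of having been JSON once mapped; the embedded app receives it as plain text.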
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-renderer-registry.service.spec.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-renderer-registry.service.spec.ts
new file mode 100644
index 00000000..6276ed59
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-renderer-registry.service.spec.ts
@@ -0,0 +1,70 @@
+import { ChangeDetectionStrategy, Component, computed, input } from '@angular/core';
+import { TestBed } from '@angular/core/testing';
+import { describe, it, expect, beforeEach } from 'vitest';
+import {
+  ToolRendererRegistryService,
+  ToolResultData,
+  ToolResultRenderer,
+} from './tool-renderer-registry.service';
+import { DefaultToolResultComponent } from './renderers/default-tool-result.component';
+
+@Component({
+  selector: 'app-stub-renderer',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  template: `<div class="stub-renderer">stub</div>`,
+})
+class StubRendererComponent implements ToolResultRenderer {
+  result = input.required<ToolResultData>();
+  minimized = input<boolean>(false);
+}
+
+@Component({
+  selector: 'app-other-renderer',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  template: `<div class="other-renderer">other</div>`,
+})
+class OtherRendererComponent implements ToolResultRenderer {
+  result = input.required<ToolResultData>();
+  minimized = input<boolean>(false);
+}
+
+describe('ToolRendererRegistryService', () => {
+  let registry: ToolRendererRegistryService;
+
+  beforeEach(() => {
+    TestBed.resetTestingModule();
+    TestBed.configureTestingModule({});
+    registry = TestBed.inject(ToolRendererRegistryService);
+  });
+
+  it('resolves unregistered tools to the default renderer', () => {
+    expect(registry.resolve('not_registered')).toBe(DefaultToolResultComponent);
+    expect(registry.has('not_registered')).toBe(false);
+  });
+
+  it('resolves a registered tool to its component', () => {
+    registry.register('calculator', StubRendererComponent);
+
+    expect(registry.has('calculator')).toBe(true);
+    expect(registry.resolve('calculator')).toBe(StubRendererComponent);
+    // Other tools still fall back to the default.
+    expect(registry.resolve('fetch_url_content')).toBe(DefaultToolResultComponent);
+  });
+
+  it('lets a later registration override an earlier one', () => {
+    registry.register('calculator', StubRendererComponent);
+    registry.register('calculator', OtherRendererComponent);
+
+    expect(registry.resolve('calculator')).toBe(OtherRendererComponent);
+  });
+
+  it('stays reactive inside a computed when a renderer registers late', () => {
+    const resolved = computed(() => registry.resolve('calculator'));
+
+    expect(resolved()).toBe(DefaultToolResultComponent);
+
+    registry.register('calculator', StubRendererComponent);
+
+    expect(resolved()).toBe(StubRendererComponent);
+  });
+});
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-renderer-registry.service.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-renderer-registry.service.ts
new file mode 100644
index 00000000..f329eeb0
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-renderer-registry.service.ts
@@ -0,0 +1,61 @@
+import { Injectable, InputSignal, Type, signal } from '@angular/core';
+import { ToolResultContent } from '../../../../services/models/message.model';
+import { DefaultToolResultComponent } from './renderers/default-tool-result.component';
+
+/**
+ * The tool result payload handed to every renderer. Mirrors the inline
+ * `result` shape on {@link ToolUseData} so renderers don't depend on the
+ * surrounding content-block envelope.
+ */
+export interface ToolResultData {
+  content: ToolResultContent[];
+  status: 'success' | 'error';
+}
+
+/**
+ * Structural contract every registered tool-result renderer must satisfy.
+ * A renderer is just a standalone component that exposes these two signal
+ * inputs; the host binds them via `NgComponentOutlet`. The future MCP App
+ * renderer plugs in as one of these with no host-template changes.
+ */
+export interface ToolResultRenderer {
+  readonly result: InputSignal<ToolResultData>;
+  readonly minimized: InputSignal<boolean>;
+  /**
+   * Originating tool-use id. Optional: existing renderers don't declare it
+   * and the host only binds it when the resolved renderer is one that
+   * declares it (the MCP App frame) — NgComponentOutlet throws if asked to
+   * set an input a component doesn't expose, so it must not be passed
+   * uniformly.
+   */
+  readonly toolUseId?: InputSignal<string | undefined>;
+}
+
+/**
+ * Signal-backed lookup of tool name → result-renderer component.
+ *
+ * Unregistered tools resolve to {@link DefaultToolResultComponent}, which
+ * reproduces the historical text/JSON/image rendering verbatim. `resolve`
+ * reads the backing signal, so a `computed()` that calls it stays reactive
+ * to registrations that happen after first render.
+ */
+@Injectable({ providedIn: 'root' })
+export class ToolRendererRegistryService {
+  private readonly renderers = signal<ReadonlyMap<string, Type<ToolResultRenderer>>>(
+    new Map(),
+  );
+
+  register(toolName: string, component: Type<ToolResultRenderer>): void {
+    const next = new Map(this.renderers());
+    next.set(toolName, component);
+    this.renderers.set(next);
+  }
+
+  resolve(toolName: string): Type<ToolResultRenderer> {
+    return this.renderers().get(toolName) ?? DefaultToolResultComponent;
+  }
+
+  has(toolName: string): boolean {
+    return this.renderers().has(toolName);
+  }
+}
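Reviewer note: the lookup semantics pinned down by the spec (default fallback, last registration wins, copy-on-write updates) can be modelled without Angular. This is a hedged sketch only: `RendererRegistry` and the string renderers are hypothetical, and the real service keeps component types inside a signal rather than a plain field.

```typescript
// Hypothetical, framework-free model of the registry's lookup semantics.
// Plain strings stand in for component types so the sketch runs anywhere.
type Renderer = string;

const DEFAULT_RENDERER: Renderer = 'default';

class RendererRegistry {
  // Replaced wholesale on every registration (copy-on-write), mirroring how
  // the service sets a fresh Map so signal readers see a new reference.
  private renderers: ReadonlyMap<string, Renderer> = new Map();

  register(toolName: string, renderer: Renderer): void {
    const next = new Map(this.renderers);
    next.set(toolName, renderer); // a later registration overrides an earlier one
    this.renderers = next;
  }

  resolve(toolName: string): Renderer {
    // Unregistered tools fall back to the default renderer.
    return this.renderers.get(toolName) ?? DEFAULT_RENDERER;
  }

  has(toolName: string): boolean {
    return this.renderers.has(toolName);
  }
}

const registry = new RendererRegistry();
const beforeRegistration = registry.resolve('calculator');
registry.register('calculator', 'stub');
registry.register('calculator', 'other');
const afterOverride = registry.resolve('calculator');
const unrelatedTool = registry.resolve('fetch_url_content');
```

Copying the map on each `register` matters in the signal-backed version: mutating the existing `Map` in place would keep the same reference, so a `computed()` reading it would not be notified.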
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.html b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.html
index 5947ea1d..c55945d4 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.html
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.html
@@ -95,17 +95,10 @@
               <span class="text-red-600 dark:text-red-400">(Error)</span>
             }
           </div>
-          <div class="space-y-2">
-            @for (item of resultTextContent(); track $index) {
-              <div class="p-2 bg-gray-100 dark:bg-gray-800 rounded-xs border border-gray-300 dark:border-gray-600">
-                @if (item.json) {
-                  <pre class="font-mono text-xs overflow-x-auto"><code [innerHTML]="formatResultContent(item) | jsonSyntaxHighlight"></code></pre>
-                } @else {
-                  <div class="text-sm/6 text-gray-900 dark:text-gray-100 whitespace-pre-wrap">{{ formatResultContent(item) }}</div>
-                }
-              </div>
-            }
-          </div>
+          <ng-container
+            [ngComponentOutlet]="resultRenderer()"
+            [ngComponentOutletInputs]="rendererInputs()"
+          />
         </div>
       }
     </div>
@@ -242,29 +235,10 @@
                   <span class="text-red-600 dark:text-red-400">(Error)</span>
                 }
               </div>
-              <div class="space-y-2">
-                <!-- Text/JSON content -->
-                @for (item of resultTextContent(); track $index) {
-                  <div class="p-2 bg-gray-100 dark:bg-gray-800 rounded-sm border border-gray-300 dark:border-gray-600">
-                    @if (item.json) {
-                      <pre class="font-mono text-xs overflow-x-auto"><code [innerHTML]="formatResultContent(item) | jsonSyntaxHighlight"></code></pre>
-                    } @else {
-                      <div class="text-sm/6 text-gray-900 dark:text-gray-100 whitespace-pre-wrap">{{ formatResultContent(item) }}</div>
-                    }
-                  </div>
-                }
-
-                <!-- Image content -->
-                @for (item of resultImageContent(); track $index) {
-                  <div class="rounded-sm overflow-hidden bg-gray-100 dark:bg-gray-800 p-2 border border-gray-300 dark:border-gray-600">
-                    <img
-                      [src]="getImageDataUrl(item)"
-                      alt="Tool result image"
-                      class="max-w-full h-auto rounded-xs"
-                    />
-                  </div>
-                }
-              </div>
+              <ng-container
+                [ngComponentOutlet]="resultRenderer()"
+                [ngComponentOutletInputs]="rendererInputs()"
+              />
             </div>
           }
         </div>
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.spec.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.spec.ts
new file mode 100644
index 00000000..291f308c
--- /dev/null
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.spec.ts
@@ -0,0 +1,136 @@
+import { ChangeDetectionStrategy, Component, input } from '@angular/core';
+import { ComponentFixture, TestBed } from '@angular/core/testing';
+import { describe, it, expect, beforeEach } from 'vitest';
+import { ToolUseComponent } from './tool-use.component';
+import {
+  ToolRendererRegistryService,
+  ToolResultData,
+  ToolResultRenderer,
+} from './tool-renderer-registry.service';
+import { CalculatorToolResultComponent } from './renderers/calculator-tool-result.component';
+import { ContentBlock } from '../../../../services/models/message.model';
+
+@Component({
+  selector: 'app-spec-stub-renderer',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  template: `<div class="spec-stub">STUB RENDERER</div>`,
+})
+class SpecStubRendererComponent implements ToolResultRenderer {
+  result = input.required<ToolResultData>();
+  minimized = input<boolean>(false);
+}
+
+function makeBlock(
+  name: string,
+  result: ToolResultData['content'],
+): ContentBlock {
+  return {
+    type: 'toolUse',
+    toolUse: {
+      toolUseId: `tool-${name}`,
+      name,
+      input: { query: 'test' },
+      status: 'complete',
+      result: { status: 'success', content: result },
+    },
+  };
+}
+
+describe('ToolUseComponent', () => {
+  let fixture: ComponentFixture<ToolUseComponent>;
+  let registry: ToolRendererRegistryService;
+
+  beforeEach(async () => {
+    TestBed.resetTestingModule();
+    await TestBed.configureTestingModule({
+      imports: [ToolUseComponent],
+    }).compileComponents();
+
+    registry = TestBed.inject(ToolRendererRegistryService);
+    fixture = TestBed.createComponent(ToolUseComponent);
+  });
+
+  it('renders a non-migrated tool through the default renderer', () => {
+    fixture.componentRef.setInput(
+      'toolUse',
+      makeBlock('some_unregistered_tool', [{ text: 'default path output' }]),
+    );
+    fixture.detectChanges();
+
+    const def = fixture.nativeElement.querySelector('app-default-tool-result');
+    expect(def).toBeTruthy();
+    expect(def.textContent).toContain('default path output');
+  });
+
+  it('renders a migrated tool through its registered renderer (identical output)', () => {
+    registry.register('calculator', CalculatorToolResultComponent);
+
+    fixture.componentRef.setInput(
+      'toolUse',
+      makeBlock('calculator', [{ text: 'the answer is 42' }]),
+    );
+    fixture.detectChanges();
+
+    // The registered proof-point component is used...
+    const calc = fixture.nativeElement.querySelector('app-calculator-tool-result');
+    expect(calc).toBeTruthy();
+    // ...and it delegates to the default renderer, so output is unchanged.
+    const def = calc.querySelector('app-default-tool-result');
+    expect(def).toBeTruthy();
+    expect(calc.textContent).toContain('the answer is 42');
+  });
+
+  it('renders identical result text for a migrated vs default-path tool', () => {
+    registry.register('calculator', CalculatorToolResultComponent);
+
+    fixture.componentRef.setInput(
+      'toolUse',
+      makeBlock('calculator', [{ text: 'shared output' }]),
+    );
+    fixture.detectChanges();
+    const migratedText = fixture.nativeElement
+      .querySelector('.whitespace-pre-wrap')
+      .textContent.trim();
+
+    const otherFixture = TestBed.createComponent(ToolUseComponent);
+    otherFixture.componentRef.setInput(
+      'toolUse',
+      makeBlock('plain_tool', [{ text: 'shared output' }]),
+    );
+    otherFixture.detectChanges();
+    const defaultText = otherFixture.nativeElement
+      .querySelector('.whitespace-pre-wrap')
+      .textContent.trim();
+
+    expect(migratedText).toBe(defaultText);
+    expect(migratedText).toBe('shared output');
+  });
+
+  it('uses a custom registered renderer instead of the default', () => {
+    registry.register('weird_tool', SpecStubRendererComponent);
+
+    fixture.componentRef.setInput(
+      'toolUse',
+      makeBlock('weird_tool', [{ text: 'ignored by stub' }]),
+    );
+    fixture.detectChanges();
+
+    expect(fixture.nativeElement.querySelector('.spec-stub')).toBeTruthy();
+    expect(fixture.nativeElement.querySelector('app-default-tool-result')).toBeNull();
+  });
+
+  it('does not render a result section when the tool has no result', () => {
+    fixture.componentRef.setInput('toolUse', {
+      type: 'toolUse',
+      toolUse: {
+        toolUseId: 'tool-pending',
+        name: 'pending_tool',
+        input: {},
+        status: 'pending',
+      },
+    } as ContentBlock);
+    fixture.detectChanges();
+
+    expect(fixture.nativeElement.querySelector('app-default-tool-result')).toBeNull();
+  });
+});
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.ts
index 6f5e2614..edbd6758 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/tool-use/tool-use.component.ts
@@ -3,17 +3,20 @@ import {
   input,
   signal,
   computed,
+  inject,
   ChangeDetectionStrategy,
 } from '@angular/core';
+import { NgComponentOutlet } from '@angular/common';
 import { JsonSyntaxHighlightPipe } from './json-syntax-highlight.pipe';
-import { ContentBlock, ToolUseData, ToolResultContent } from '../../../../services/models/message.model';
+import { ContentBlock, ToolUseData } from '../../../../services/models/message.model';
+import { ToolRendererRegistryService } from './tool-renderer-registry.service';
 
 @Component({
   selector: 'app-tool-use',
   templateUrl: './tool-use.component.html',
   styleUrl: './tool-use.component.css',
   changeDetection: ChangeDetectionStrategy.OnPush,
-  imports: [JsonSyntaxHighlightPipe],
+  imports: [JsonSyntaxHighlightPipe, NgComponentOutlet],
 })
 export class ToolUseComponent {
   /** The content block containing tool use data */
@@ -25,6 +28,8 @@ export class ToolUseComponent {
   /** Whether the details section is expanded (both input and output) */
   isDetailsExpanded = signal(false);
 
+  private readonly rendererRegistry = inject(ToolRendererRegistryService);
+
   /** Extract tool use data from the content block */
   toolUseData = computed(() => {
     const block = this.toolUse();
@@ -40,6 +45,9 @@ export class ToolUseComponent {
     return this.toolUseData().name || 'Unknown Tool';
   });
 
+  /** Originating tool-use id (correlates the MCP App resource + tool data). */
+  toolUseId = computed(() => this.toolUseData().toolUseId);
+
   /** Tool input */
   toolInput = computed(() => {
     return this.toolUseData().input || {};
@@ -60,6 +68,23 @@ export class ToolUseComponent {
     return !!this.toolResult();
   });
 
+  /**
+   * Result-renderer component for this tool. Resolved from the name-keyed
+   * registry (default = text/JSON/image). MCP Apps (SEP-1865) deliberately
+   * do NOT route through here — they render as their own first-class
+   * `mcp_app_frame` block in <app-assistant-message>, with the tool card
+   * still showing the call's input/output as provenance.
+   */
+  resultRenderer = computed(() =>
+    this.rendererRegistry.resolve(this.toolName()),
+  );
+
+  /** Inputs bound onto the resolved renderer via NgComponentOutlet. */
+  rendererInputs = computed(() => ({
+    result: this.toolResult(),
+    minimized: this.minimized(),
+  }));
+
   /** Preview of input keys for collapsed state */
   inputKeysPreview = computed(() => {
     const keys = Object.keys(this.toolInput());
@@ -97,35 +122,6 @@ export class ToolUseComponent {
     return 'text-blue-600 dark:text-blue-400';
   });
 
-  /** Get result text content */
-  resultTextContent = computed(() => {
-    const result = this.toolResult();
-    if (!result) return [];
-
-    return result.content.filter(item => item.text || item.json);
-  });
-
-  /** Get result image content */
-  resultImageContent = computed(() => {
-    const result = this.toolResult();
-    if (!result) return [];
-
-    return result.content.filter(item => item.image);
-  });
-
-  /** Format result content for display */
-  formatResultContent(item: ToolResultContent): string {
-    if (item.text) return item.text;
-    if (item.json) return JSON.stringify(item.json, null, 2);
-    return '';
-  }
-
-  /** Get image data URL */
-  getImageDataUrl(item: ToolResultContent): string {
-    if (!item.image) return '';
-    return `data:image/${item.image.format};base64,${item.image.data}`;
-  }
-
   /** Toggle the details expanded state */
   toggleDetailsExpanded(): void {
     this.isDetailsExpanded.update((expanded) => !expanded);
diff --git a/frontend/ai.client/src/app/session/components/message-list/components/user-message.component.ts b/frontend/ai.client/src/app/session/components/message-list/components/user-message.component.ts
index b630fa59..3aeb6922 100644
--- a/frontend/ai.client/src/app/session/components/message-list/components/user-message.component.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/components/user-message.component.ts
@@ -10,15 +10,19 @@ import {
   inject,
 } from '@angular/core';
 import { ContentBlock, Message, FileAttachmentData } from '../../../services/models/message.model';
-import { FileAttachmentBadgeComponent } from './file-attachment';
+import { FileAttachmentBadgeComponent, ImageAttachmentGroupComponent } from './file-attachment';
 import { LocalSettingsService } from '../../../../services/local-settings.service';
 
+function isImageMimeType(mimeType: string): boolean {
+  return mimeType.startsWith('image/');
+}
+
 const MAX_HEIGHT_PX = 200;
 
 @Component({
   selector: 'app-user-message',
   changeDetection: ChangeDetectionStrategy.OnPush,
-  imports: [FileAttachmentBadgeComponent],
+  imports: [FileAttachmentBadgeComponent, ImageAttachmentGroupComponent],
   template: `
     @if (hasTextContent() || hasFileAttachments()) {
       <div class="flex w-full flex-col items-end gap-2">
@@ -61,10 +65,17 @@ const MAX_HEIGHT_PX = 200;
           </div>
         }
 
-        <!-- File attachments (below message bubble) -->
-        @if (hasFileAttachments()) {
+        <!-- Image attachments (iMessage-style mosaic) -->
+        @if (imageAttachments().length > 0) {
+          <div class="flex max-w-[80%] justify-end">
+            <app-image-attachment-group [attachments]="imageAttachments()" />
+          </div>
+        }
+
+        <!-- Non-image file attachments (below message bubble) -->
+        @if (nonImageAttachments().length > 0) {
           <div class="flex max-w-[80%] flex-wrap justify-end gap-2">
-            @for (attachment of fileAttachments(); track attachment.uploadId) {
+            @for (attachment of nonImageAttachments(); track attachment.uploadId) {
               <app-file-attachment-badge [attachment]="attachment" />
             }
           </div>
@@ -119,6 +130,14 @@ export class UserMessageComponent implements AfterViewInit {
       .map((block: ContentBlock) => block.fileAttachment as FileAttachmentData);
   });
 
+  imageAttachments = computed((): FileAttachmentData[] =>
+    this.fileAttachments().filter((a) => isImageMimeType(a.mimeType)),
+  );
+
+  nonImageAttachments = computed((): FileAttachmentData[] =>
+    this.fileAttachments().filter((a) => !isImageMimeType(a.mimeType)),
+  );
+
   ngAfterViewInit(): void {
     this.checkOverflow();
   }
diff --git a/frontend/ai.client/src/app/session/components/message-list/message-list.component.html b/frontend/ai.client/src/app/session/components/message-list/message-list.component.html
index c2edaaaf..e7ec525f 100644
--- a/frontend/ai.client/src/app/session/components/message-list/message-list.component.html
+++ b/frontend/ai.client/src/app/session/components/message-list/message-list.component.html
@@ -19,7 +19,10 @@
                 <app-assistant-message class="min-w-0 block" [message]="message" [isStreaming]="message.id === streamingMessageId()"></app-assistant-message>
             </div>
             <div class="mt-1 flex justify-start">
-                <app-message-actions [message]="message"></app-message-actions>
+                <app-message-actions
+                  [message]="message"
+                  [canContinue]="canContinueFor(message.id)"
+                  (continueRequested)="continueRequested.emit()"></app-message-actions>
             </div>
             <div class="invisible mt-1 flex flex-wrap items-center gap-2 opacity-0 transition-opacity duration-200 group-hover:visible group-hover:opacity-100 group-focus-within:visible group-focus-within:opacity-100" aria-live="polite" aria-atomic="true">
                 @if (message.metadata) {
@@ -32,6 +35,25 @@
         </div>
     }
 
+    <!-- Artifacts produced by this turn, anchored to their final
+         assistant message via producedByMessageIndex. Renders both live
+         (the `artifact` SSE event carries the index) and after a session
+         reload (the list endpoint persists it on the HEAD row). -->
+    @let messageArtifacts = artifactsForMessageId(message.id);
+    @if (messageArtifacts.length > 0) {
+      <section
+        class="flex flex-col gap-2"
+        aria-label="Artifacts from this response"
+      >
+        @for (
+          artifact of messageArtifacts;
+          track artifact.artifactId + '#' + artifact.version
+        ) {
+          <app-artifact-card [artifact]="artifact" />
+        }
+      </section>
+    }
+
   }
 
   <!-- Single subtle compaction summary at the end of the conversation,
@@ -45,6 +67,24 @@
     />
   }
 
+  <!-- Fallback strip for artifacts that can't be anchored to a loaded
+       message: legacy rows written before linkage existed, or an index
+       pointing outside the currently paginated page. Anchored artifacts
+       render inline after their producing message (see the @for above). -->
+  @if (hasOrphanArtifacts()) {
+    <section
+      class="mt-3 flex flex-col gap-2"
+      aria-label="Artifacts created in this conversation"
+    >
+      @for (
+        artifact of orphanArtifacts();
+        track artifact.artifactId + '#' + artifact.version
+      ) {
+        <app-artifact-card [artifact]="artifact" />
+      }
+    </section>
+  }
+
   <!-- OAuth consent prompts not anchored to a loaded message — usually
        hydrated from session metadata after a refresh, when the partial
        assistant message wasn't persisted. -->
@@ -54,6 +94,21 @@
     </div>
   }
 
+  <!-- Static historical cards for app-initiated tool calls (PR #6,
+       Option A). The PR #5 broker is in-memory, so on reload the live
+       tool_use/tool_result are gone; these replay them as a read-only
+       record at the end of the conversation. -->
+  @if (hasMcpAppCards()) {
+    <section
+      class="mt-3 flex flex-col gap-2"
+      aria-label="Tool calls run by embedded apps in this conversation"
+    >
+      @for (card of mcpAppCards(); track card.cardId) {
+        <app-mcp-app-card [card]="card" />
+      }
+    </section>
+  }
+
   <!-- Per-tool approval prompts (catalog-flagged needs_approval). Tied to
        the live SSE turn; refresh during the prompt orphans it. -->
   @for (request of pendingToolApprovals(); track request.interruptId) {
@@ -85,4 +140,7 @@
       class="pointer-events-none">
     </div>
   }
+
+  <!-- Fixed-position slide-over; renders only when an artifact is open. -->
+  <app-artifact-panel />
 </div>
diff --git a/frontend/ai.client/src/app/session/components/message-list/message-list.component.ts b/frontend/ai.client/src/app/session/components/message-list/message-list.component.ts
index e9f1c21a..55e924c5 100644
--- a/frontend/ai.client/src/app/session/components/message-list/message-list.component.ts
+++ b/frontend/ai.client/src/app/session/components/message-list/message-list.component.ts
@@ -1,6 +1,7 @@
-import { Component, computed, input, signal, effect, OnDestroy, inject, PLATFORM_ID } from '@angular/core';
+import { Component, computed, input, output, signal, effect, OnDestroy, inject, PLATFORM_ID } from '@angular/core';
 import { isPlatformBrowser } from '@angular/common';
 import { Message } from '../../services/models/message.model';
+import type { Artifact } from '../../services/artifacts/artifact.model';
 import { UserMessageComponent } from './components/user-message.component';
 import { AssistantMessageComponent } from './components/assistant-message.component';
 import { MessageMetadataBadgesComponent } from './components/message-metadata-badges.component';
@@ -10,6 +11,11 @@ import { PulsatingLoaderComponent } from '../../../components/pulsating-loader.c
 import { OAuthConsentPromptComponent } from './components/oauth-consent-prompt/oauth-consent-prompt.component';
 import { ToolApprovalPromptComponent } from './components/tool-approval-prompt/tool-approval-prompt.component';
 import { CompactionSummaryComponent } from './components/compaction-summary/compaction-summary.component';
+import { ArtifactCardComponent } from './components/artifact/artifact-card.component';
+import { ArtifactPanelComponent } from './components/artifact/artifact-panel.component';
+import { ArtifactStateService } from '../../services/artifacts/artifact-state.service';
+import { McpAppCardComponent } from './components/mcp-app-card/mcp-app-card.component';
+import { McpAppCardStateService } from '../../services/mcp-apps/mcp-app-card-state.service';
 import {
   OAuthConsentRequest,
   OAuthConsentService,
@@ -19,6 +25,7 @@ import {
   ToolApprovalService,
 } from '../../../services/tool-approval/tool-approval.service';
 import { CompactionSummaryService } from '../../services/chat/compaction-summary.service';
+import { ChatStateService } from '../../services/chat/chat-state.service';
 
 @Component({
   selector: 'app-message-list',
@@ -32,6 +39,9 @@ import { CompactionSummaryService } from '../../services/chat/compaction-summary
     OAuthConsentPromptComponent,
     ToolApprovalPromptComponent,
     CompactionSummaryComponent,
+    ArtifactCardComponent,
+    ArtifactPanelComponent,
+    McpAppCardComponent,
   ],
   templateUrl: './message-list.component.html',
   styleUrl: './message-list.component.css',
@@ -50,9 +60,113 @@ export class MessageListComponent implements OnDestroy {
   streamingMessageId = input<string | null>(null);
   embeddedMode = input<boolean>(false);
 
+  /** Bubbled up when the user clicks "Continue" on a max_tokens-truncated
+   *  assistant message. The page reuses the normal submit path with a
+   *  canned prompt. */
+  continueRequested = output<void>();
+
   private consentService = inject(OAuthConsentService);
   private toolApprovalService = inject(ToolApprovalService);
   private compactionSummary = inject(CompactionSummaryService);
+  private artifactState = inject(ArtifactStateService);
+  private mcpAppCardState = inject(McpAppCardStateService);
+  private chatStateService = inject(ChatStateService);
+
+  /** Persisted app-initiated tool cards, hydrated on reload (PR #6). */
+  protected mcpAppCards = this.mcpAppCardState.cards;
+  protected hasMcpAppCards = this.mcpAppCardState.hasCards;
+
+  /** Only the final message of a recoverable max_tokens turn gets the
+   *  "Continue" affordance. Live-only state, never shown while a new
+   *  response is streaming. */
+  private readonly lastMessageId = computed<string | null>(() => {
+    const m = this.messages();
+    return m.length ? m[m.length - 1].id : null;
+  });
+
+  protected canContinueFor(messageId: string): boolean {
+    return (
+      this.chatStateService.lastTurnContinuable() &&
+      !this.isChatLoading() &&
+      messageId === this.lastMessageId()
+    );
+  }
+
+  /** Session artifacts, newest first. Anchored ones render inline after
+   *  their producing assistant message (`producedByMessageIndex` matches
+   *  the `msg-{sessionId}-{index}` id); the rest fall back to the
+   *  end-of-conversation strip. */
+  protected artifacts = this.artifactState.artifacts;
+
+  /** Trailing 0-based index from a `msg-{sessionId}-{index}` id. Splits
+   *  on the last `-` so a session id containing dashes is irrelevant.
+   *  Null unless the tail is all digits (`Number('')` would yield 0). */
+  private parseMessageIndex(id: string): number | null {
+    const dash = id.lastIndexOf('-');
+    if (dash < 0) return null;
+    const tail = id.slice(dash + 1);
+    return /^\d+$/.test(tail) ? Number(tail) : null;
+  }
+
+  /** The message index an artifact anchors to. Live events carry a
+   *  concrete producing message id (stable as later turns append);
+   *  reload hydration only has the AgentCore-Memory numeric index, which
+   *  is exact there. Prefer the live id, fall back to the index. */
+  private resolveArtifactIndex(a: Artifact): number | null {
+    if (a.producedByMessageId) {
+      const live = this.parseMessageIndex(a.producedByMessageId);
+      if (live !== null) return live;
+    }
+    return a.producedByMessageIndex ?? null;
+  }
+
+  private readonly loadedMessageIndices = computed<ReadonlySet<number>>(() => {
+    const s = new Set<number>();
+    for (const m of this.messages()) {
+      const n = this.parseMessageIndex(m.id);
+      if (n !== null) s.add(n);
+    }
+    return s;
+  });
+
+  /** Artifacts grouped by the message index that produced them, limited
+   *  to indices that are actually in the loaded (possibly paginated)
+   *  message list. */
+  private readonly artifactsByMessageIndex = computed<
+    ReadonlyMap<number, Artifact[]>
+  >(() => {
+    const loaded = this.loadedMessageIndices();
+    const map = new Map<number, Artifact[]>();
+    for (const a of this.artifacts()) {
+      const idx = this.resolveArtifactIndex(a);
+      if (idx == null || !loaded.has(idx)) continue;
+      const list = map.get(idx);
+      if (list) list.push(a);
+      else map.set(idx, [a]);
+    }
+    return map;
+  });
+
+  /** Artifacts with no usable anchor (legacy rows written before linkage,
+   *  or an index pointing outside the loaded page) — keep them visible in
+   *  the end-of-conversation strip so nothing silently disappears. */
+  protected readonly orphanArtifacts = computed<Artifact[]>(() => {
+    const loaded = this.loadedMessageIndices();
+    return this.artifacts().filter((a) => {
+      const idx = this.resolveArtifactIndex(a);
+      return idx == null || !loaded.has(idx);
+    });
+  });
+
+  protected readonly hasOrphanArtifacts = computed(
+    () => this.orphanArtifacts().length > 0,
+  );
+
+  protected artifactsForMessageId(id: string): Artifact[] {
+    const n = this.parseMessageIndex(id);
+    if (n === null) return [];
+    return this.artifactsByMessageIndex().get(n) ?? [];
+  }
 
   /** Single end-of-conversation compaction summary inputs. Sourced from
    *  live SSE events plus session-metadata hydration on load. The fade-in
diff --git a/frontend/ai.client/src/app/session/components/session-cost-badge/session-cost-badge.component.ts b/frontend/ai.client/src/app/session/components/session-cost-badge/session-cost-badge.component.ts
index 57ef193e..1a23b962 100644
--- a/frontend/ai.client/src/app/session/components/session-cost-badge/session-cost-badge.component.ts
+++ b/frontend/ai.client/src/app/session/components/session-cost-badge/session-cost-badge.component.ts
@@ -119,7 +119,7 @@ const RING_FILL_DELAY_MS = 750;
                 {{ tokensTotalLabel() }} tokens
               </span>
               <span class="mt-2 block text-[11px] leading-snug text-gray-500 dark:text-gray-400">
-                Includes system prompt and tool definitions.
+                Reflects the most recent turn (includes system prompt and tool definitions). May shrink after a context compaction.
               </span>
             </span>
           </span>
@@ -246,6 +246,6 @@ export class SessionCostBadgeComponent {
   protected readonly contextAriaLabel = computed(() => {
     const tokens = this.contextTokens().toLocaleString();
     const window = this.contextWindow().toLocaleString();
-    return `Context window: ${this.contextLabel()} used (${tokens} of ${window} tokens, includes system prompt and tools)`;
+    return `Context window: ${this.contextLabel()} used by the most recent turn (${tokens} of ${window} tokens, includes system prompt and tools)`;
   });
 }
diff --git a/frontend/ai.client/src/app/session/services/artifacts/artifact-download.service.ts b/frontend/ai.client/src/app/session/services/artifacts/artifact-download.service.ts
new file mode 100644
index 00000000..ec304903
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/artifacts/artifact-download.service.ts
@@ -0,0 +1,65 @@
+import { Injectable, inject } from '@angular/core';
+import { ArtifactHttpService } from './artifact-http.service';
+import { SessionService } from '../session/session.service';
+import { ToastService } from '../../../services/toast/toast.service';
+
+/** Minimal identity of the artifact version to save. */
+export interface DownloadableArtifact {
+  artifactId: string;
+  version: number;
+}
+
+/**
+ * Saves an artifact version to disk. Shared by the inline card and the
+ * docked panel so the credential handling stays in exactly one place.
+ *
+ * The content only lives on the artifact origin (no CORS,
+ * `connect-src 'none'`), so the SPA can't fetch the bytes into a blob.
+ * Instead it mints a fresh single-use render token and points a
+ * throwaway hidden iframe at the render URL with `download=1`; the
+ * render Lambda answers that with `Content-Disposition: attachment`, so
+ * the browser saves the file without navigating this document. A
+ * bad/expired token lands as an error page inside the discarded iframe,
+ * never the SPA.
+ */
+@Injectable({ providedIn: 'root' })
+export class ArtifactDownloadService {
+  private artifactHttp = inject(ArtifactHttpService);
+  private sessionService = inject(SessionService);
+  private toast = inject(ToastService);
+
+  /** Mint a token and kick off the save. Surfaces a toast and resolves
+   *  `false` on failure so callers can clear their own busy state. */
+  async download(ref: DownloadableArtifact): Promise<boolean> {
+    try {
+      const sessionId = this.sessionService.currentSession().sessionId;
+      const token = await this.artifactHttp.mintRenderToken(
+        ref.artifactId,
+        ref.version,
+        sessionId,
+      );
+      const sep = token.url.includes('?') ? '&' : '?';
+      this.trigger(`${token.url}${sep}download=1`);
+      return true;
+    } catch {
+      this.toast.error(
+        'Download failed',
+        "This artifact couldn't be downloaded. It may have expired or been removed.",
+      );
+      return false;
+    }
+  }
+
+  private trigger(url: string): void {
+    if (typeof document === 'undefined') return;
+    const frame = document.createElement('iframe');
+    frame.setAttribute('aria-hidden', 'true');
+    frame.style.display = 'none';
+    frame.src = url;
+    document.body.appendChild(frame);
+    // The token is single-use and short-lived; the save kicks off on
+    // load. Keep the frame briefly so the transfer can start, then drop
+    // it so the credential URL doesn't linger in the DOM.
+    setTimeout(() => frame.remove(), 60_000);
+  }
+}
diff --git a/frontend/ai.client/src/app/session/services/artifacts/artifact-http.service.spec.ts b/frontend/ai.client/src/app/session/services/artifacts/artifact-http.service.spec.ts
new file mode 100644
index 00000000..1cb66eb8
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/artifacts/artifact-http.service.spec.ts
@@ -0,0 +1,140 @@
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import { TestBed } from '@angular/core/testing';
+import { provideHttpClient } from '@angular/common/http';
+import {
+  HttpTestingController,
+  provideHttpClientTesting,
+} from '@angular/common/http/testing';
+import { signal } from '@angular/core';
+import { ArtifactHttpService } from './artifact-http.service';
+import { ConfigService } from '../../../services/config.service';
+
+const API = 'http://localhost:8000';
+
+describe('ArtifactHttpService', () => {
+  let service: ArtifactHttpService;
+  let httpMock: HttpTestingController;
+
+  beforeEach(() => {
+    TestBed.resetTestingModule();
+    TestBed.configureTestingModule({
+      providers: [
+        provideHttpClient(),
+        provideHttpClientTesting(),
+        ArtifactHttpService,
+        { provide: ConfigService, useValue: { appApiUrl: signal(API) } },
+      ],
+    });
+    service = TestBed.inject(ArtifactHttpService);
+    httpMock = TestBed.inject(HttpTestingController);
+  });
+
+  afterEach(() => {
+    httpMock.verify();
+  });
+
+  it('mints a render token and maps expires_at', async () => {
+    const p = service.mintRenderToken('art-1', 2, 'sess-9');
+    const req = httpMock.expectOne(`${API}/artifacts/art-1/render-token`);
+    expect(req.request.method).toBe('POST');
+    expect(req.request.body).toEqual({ version: 2, sessionId: 'sess-9' });
+    req.flush({ url: 'https://artifacts.x/?t=jwt', expires_at: '2026-05-15T12:02:00+00:00' });
+    await expect(p).resolves.toEqual({
+      url: 'https://artifacts.x/?t=jwt',
+      expiresAt: '2026-05-15T12:02:00+00:00',
+    });
+  });
+
+  it('url-encodes the artifact id in the token path', async () => {
+    const p = service.mintRenderToken('a/b 1', 1, 's');
+    const req = httpMock.expectOne(`${API}/artifacts/a%2Fb%201/render-token`);
+    req.flush({ url: 'u', expires_at: 'e' });
+    await p;
+  });
+
+  it('lists session artifacts and normalizes snake_case to camelCase', async () => {
+    const p = service.listSessionArtifacts('sess-9');
+    const req = httpMock.expectOne(
+      (r) => r.url === `${API}/artifacts` && r.params.get('session_id') === 'sess-9',
+    );
+    expect(req.request.method).toBe('GET');
+    req.flush({
+      artifacts: [
+        {
+          artifact_id: 'art-1',
+          version: 3,
+          title: 'Report',
+          content_type: 'text/html; charset=utf-8',
+          updated_at: '2026-05-15T12:00:00+00:00',
+          created_at: '2026-05-15T10:00:00+00:00',
+          produced_by_message_index: 4,
+        },
+        {
+          artifact_id: 'legacy',
+          version: 1,
+          title: 'Old',
+          content_type: 'text/html; charset=utf-8',
+          updated_at: '2026-05-15T11:00:00+00:00',
+          created_at: '2026-05-15T11:00:00+00:00',
+        },
+      ],
+    });
+    await expect(p).resolves.toEqual([
+      {
+        artifactId: 'art-1',
+        version: 3,
+        title: 'Report',
+        contentType: 'text/html; charset=utf-8',
+        updatedAt: '2026-05-15T12:00:00+00:00',
+        createdAt: '2026-05-15T10:00:00+00:00',
+        producedByMessageIndex: 4,
+      },
+      {
+        artifactId: 'legacy',
+        version: 1,
+        title: 'Old',
+        contentType: 'text/html; charset=utf-8',
+        updatedAt: '2026-05-15T11:00:00+00:00',
+        createdAt: '2026-05-15T11:00:00+00:00',
+        producedByMessageIndex: null,
+      },
+    ]);
+  });
+
+  it('tolerates a missing artifacts array', async () => {
+    const p = service.listSessionArtifacts('sess-9');
+    httpMock
+      .expectOne((r) => r.url === `${API}/artifacts`)
+      .flush({});
+    await expect(p).resolves.toEqual([]);
+  });
+
+  it('fetches artifact content and maps snake_case to camelCase', async () => {
+    const p = service.getArtifactContent('art-1', 2);
+    const req = httpMock.expectOne(
+      (r) =>
+        r.url === `${API}/artifacts/art-1/content` &&
+        r.params.get('version') === '2',
+    );
+    expect(req.request.method).toBe('GET');
+    req.flush({
+      content: '# Title\n\nbody',
+      content_type: 'text/markdown',
+      version: 2,
+    });
+    await expect(p).resolves.toEqual({
+      content: '# Title\n\nbody',
+      contentType: 'text/markdown',
+      version: 2,
+    });
+  });
+
+  it('url-encodes the artifact id in the content path', async () => {
+    const p = service.getArtifactContent('a/b 1', 1);
+    const req = httpMock.expectOne(
+      (r) => r.url === `${API}/artifacts/a%2Fb%201/content`,
+    );
+    req.flush({ content: '', content_type: 'text/html', version: 1 });
+    await p;
+  });
+});
diff --git a/frontend/ai.client/src/app/session/services/artifacts/artifact-http.service.ts b/frontend/ai.client/src/app/session/services/artifacts/artifact-http.service.ts
new file mode 100644
index 00000000..a83cdda0
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/artifacts/artifact-http.service.ts
@@ -0,0 +1,113 @@
+import { Injectable, inject } from '@angular/core';
+import { HttpClient } from '@angular/common/http';
+import { firstValueFrom } from 'rxjs';
+import { ConfigService } from '../../../services/config.service';
+import type { Artifact } from './artifact.model';
+
+interface RenderTokenResponseDto {
+  url: string;
+  expires_at: string;
+}
+
+interface ArtifactSummaryDto {
+  artifact_id: string;
+  version: number;
+  title: string;
+  content_type: string;
+  updated_at: string;
+  created_at?: string | null;
+  produced_by_message_index?: number | null;
+}
+
+interface ArtifactListResponseDto {
+  artifacts: ArtifactSummaryDto[];
+}
+
+interface ArtifactContentResponseDto {
+  content: string;
+  content_type: string;
+  version: number;
+}
+
+export interface RenderToken {
+  url: string;
+  expiresAt: string;
+}
+
+/** Raw source of one artifact version, for the panel's code view. */
+export interface ArtifactContent {
+  content: string;
+  contentType: string;
+  version: number;
+}
+
+/**
+ * app-api client for the artifacts feature. Auth rides the httpOnly BFF
+ * session cookie + csrfInterceptor automatically (HttpClient pipeline),
+ * same as every other app-api call — no Bearer here.
+ *
+ * The render token is a short-lived (~120s) bearer credential embedded
+ * in the returned URL; the panel sets it as the iframe `src` and
+ * re-mints on each open rather than caching it.
+ */
+@Injectable({ providedIn: 'root' })
+export class ArtifactHttpService {
+  private http = inject(HttpClient);
+  private config = inject(ConfigService);
+
+  /** Mint a single-version render URL for the sandboxed iframe. */
+  async mintRenderToken(
+    artifactId: string,
+    version: number,
+    sessionId: string,
+  ): Promise<RenderToken> {
+    const res = await firstValueFrom(
+      this.http.post<RenderTokenResponseDto>(
+        `${this.config.appApiUrl()}/artifacts/${encodeURIComponent(artifactId)}/render-token`,
+        { version, sessionId },
+      ),
+    );
+    return { url: res.url, expiresAt: res.expires_at };
+  }
+
+  /**
+   * Fetch one artifact version's raw source for the code view. The
+   * bytes are inert text the panel highlights client-side. The backend
+   * unwraps Markdown back to the authored source and 413s anything too
+   * large to highlight inline (callers steer to download on that).
+   */
+  async getArtifactContent(
+    artifactId: string,
+    version: number,
+  ): Promise<ArtifactContent> {
+    const res = await firstValueFrom(
+      this.http.get<ArtifactContentResponseDto>(
+        `${this.config.appApiUrl()}/artifacts/${encodeURIComponent(artifactId)}/content`,
+        { params: { version } },
+      ),
+    );
+    return {
+      content: res.content,
+      contentType: res.content_type,
+      version: res.version,
+    };
+  }
+
+  /** List the current HEAD of every artifact in a chat session. */
+  async listSessionArtifacts(sessionId: string): Promise<Artifact[]> {
+    const res = await firstValueFrom(
+      this.http.get<ArtifactListResponseDto>(`${this.config.appApiUrl()}/artifacts`, {
+        params: { session_id: sessionId },
+      }),
+    );
+    return (res.artifacts ?? []).map((a) => ({
+      artifactId: a.artifact_id,
+      version: a.version,
+      title: a.title,
+      contentType: a.content_type,
+      updatedAt: a.updated_at,
+      createdAt: a.created_at ?? undefined,
+      producedByMessageIndex: a.produced_by_message_index ?? null,
+    }));
+  }
+}
diff --git a/frontend/ai.client/src/app/session/services/artifacts/artifact-state.service.spec.ts b/frontend/ai.client/src/app/session/services/artifacts/artifact-state.service.spec.ts
new file mode 100644
index 00000000..c201bb18
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/artifacts/artifact-state.service.spec.ts
@@ -0,0 +1,233 @@
+import { describe, it, expect, beforeEach } from 'vitest';
+import { TestBed } from '@angular/core/testing';
+import { ArtifactStateService } from './artifact-state.service';
+import type { ArtifactEvent } from '../../../shared/utils/stream-parser';
+import type { Artifact } from './artifact.model';
+
+function ev(over: Partial<ArtifactEvent> = {}): ArtifactEvent {
+  return {
+    type: 'artifact',
+    artifactId: 'art-1',
+    version: 1,
+    title: 'Doc',
+    contentType: 'text/html; charset=utf-8',
+    sessionId: 'sess-9',
+    updatedAt: '2026-05-15T12:00:00+00:00',
+    action: 'created',
+    ...over,
+  };
+}
+
+describe('ArtifactStateService', () => {
+  let svc: ArtifactStateService;
+
+  beforeEach(() => {
+    // ArtifactStateService injects SidenavService (both providedIn:
+    // 'root'), so it must be created in an injection context rather
+    // than via `new`.
+    TestBed.configureTestingModule({});
+    svc = TestBed.inject(ArtifactStateService);
+  });
+
+  it('starts empty', () => {
+    expect(svc.hasArtifacts()).toBe(false);
+    expect(svc.artifacts()).toEqual([]);
+    expect(svc.openArtifact()).toBeNull();
+  });
+
+  it('records a live artifact event', () => {
+    svc.recordLive(ev());
+    expect(svc.hasArtifacts()).toBe(true);
+    expect(svc.get('art-1')?.version).toBe(1);
+  });
+
+  it('threads producedByMessageIndex from the live event', () => {
+    svc.recordLive(ev({ producedByMessageIndex: 7 }));
+    expect(svc.get('art-1')?.producedByMessageIndex).toBe(7);
+  });
+
+  it('defaults producedByMessageIndex to null when the event omits it', () => {
+    svc.recordLive(ev());
+    expect(svc.get('art-1')?.producedByMessageIndex).toBeNull();
+  });
+
+  it('stores the live producing message id when provided', () => {
+    svc.recordLive(ev(), 'msg-sess-9-3');
+    expect(svc.get('art-1')?.producedByMessageId).toBe('msg-sess-9-3');
+  });
+
+  it('defaults producedByMessageId to null on the hydration-style call', () => {
+    svc.recordLive(ev());
+    expect(svc.get('art-1')?.producedByMessageId).toBeNull();
+  });
+
+  it('keeps every version of an artifact as its own entry', () => {
+    svc.recordLive(ev({ version: 1 }));
+    svc.recordLive(ev({ version: 3, updatedAt: '2026-05-15T12:05:00+00:00' }));
+    svc.recordLive(ev({ version: 2, updatedAt: '2026-05-15T12:02:00+00:00' }));
+    // All three versions coexist (one card per version); get() resolves
+    // the highest.
+    expect(svc.artifacts().length).toBe(3);
+    expect(
+      svc
+        .artifacts()
+        .map((a) => a.version)
+        .sort((x, y) => x - y),
+    ).toEqual([1, 2, 3]);
+    expect(svc.get('art-1')?.version).toBe(3);
+  });
+
+  it('orders versions of one artifact by version when timestamps tie', () => {
+    const at = '2026-05-15T12:00:00+00:00';
+    svc.recordLive(ev({ version: 1, updatedAt: at }));
+    svc.recordLive(ev({ version: 2, updatedAt: at }));
+    expect(svc.artifacts().map((a) => a.version)).toEqual([2, 1]);
+  });
+
+  it('versionsFor returns one artifact’s versions, newest first', () => {
+    svc.recordLive(ev({ artifactId: 'a', version: 1 }));
+    svc.recordLive(ev({ artifactId: 'a', version: 2 }));
+    svc.recordLive(ev({ artifactId: 'a', version: 3 }));
+    svc.recordLive(ev({ artifactId: 'b', version: 1 }));
+    expect(svc.versionsFor('a').map((v) => v.version)).toEqual([3, 2, 1]);
+    expect(svc.versionsFor('b').map((v) => v.version)).toEqual([1]);
+    expect(svc.versionsFor('missing')).toEqual([]);
+  });
+
+  it('orders artifacts newest update first', () => {
+    svc.recordLive(ev({ artifactId: 'old', updatedAt: '2026-05-15T10:00:00+00:00' }));
+    svc.recordLive(ev({ artifactId: 'new', updatedAt: '2026-05-15T12:00:00+00:00' }));
+    expect(svc.artifacts().map((a) => a.artifactId)).toEqual(['new', 'old']);
+  });
+
+  it('hydration adds older versions alongside a newer live one', () => {
+    svc.recordLive(ev({ version: 5, updatedAt: '2026-05-15T12:10:00+00:00' }));
+    const v2: Artifact = {
+      artifactId: 'art-1',
+      version: 2,
+      title: 'Old',
+      contentType: 'text/html; charset=utf-8',
+      updatedAt: '2026-05-15T11:00:00+00:00',
+    };
+    svc.seedFromHydration([v2]);
+    // Distinct versions coexist (history), get() still resolves latest.
+    expect(svc.artifacts().map((a) => a.version).sort((x, y) => x - y)).toEqual([2, 5]);
+    expect(svc.get('art-1')?.version).toBe(5);
+  });
+
+  it('live anchor wins when hydration repeats the same (id, version)', () => {
+    svc.recordLive(ev({ version: 1 }), 'msg-sess-9-3');
+    svc.seedFromHydration([
+      {
+        artifactId: 'art-1',
+        version: 1,
+        title: 'Doc',
+        contentType: 'text/html; charset=utf-8',
+        updatedAt: '2026-05-15T12:00:00+00:00',
+        producedByMessageIndex: 4,
+      },
+    ]);
+    // The live entry's precise per-turn anchor must survive a later
+    // index-only hydration of the same version.
+    expect(svc.get('art-1')?.producedByMessageId).toBe('msg-sess-9-3');
+  });
+
+  it('a live event fills in the anchor a prior hydration row lacked', () => {
+    svc.seedFromHydration([
+      {
+        artifactId: 'art-1',
+        version: 1,
+        title: 'Doc',
+        contentType: 'text/html; charset=utf-8',
+        updatedAt: '2026-05-15T12:00:00+00:00',
+      },
+    ]);
+    svc.recordLive(ev({ version: 1 }), 'msg-sess-9-3');
+    expect(svc.get('art-1')?.producedByMessageId).toBe('msg-sess-9-3');
+  });
+
+  it('hydration adds artifacts not seen live', () => {
+    svc.seedFromHydration([
+      {
+        artifactId: 'h1',
+        version: 2,
+        title: 'Hydrated',
+        contentType: 'text/html; charset=utf-8',
+        updatedAt: '2026-05-15T09:00:00+00:00',
+      },
+    ]);
+    expect(svc.get('h1')?.title).toBe('Hydrated');
+  });
+
+  it('opens and closes the panel', () => {
+    svc.openArtifactPanel({ artifactId: 'art-1', version: 1, title: 'Doc' });
+    expect(svc.openArtifact()).toEqual({ artifactId: 'art-1', version: 1, title: 'Doc' });
+    svc.closeArtifactPanel();
+    expect(svc.openArtifact()).toBeNull();
+  });
+
+  it('auto-opens the panel for a created artifact', () => {
+    svc.recordLive(ev({ artifactId: 'art-9', version: 1, title: 'Generated' }));
+    expect(svc.openArtifact()).toEqual({
+      artifactId: 'art-9',
+      version: 1,
+      title: 'Generated',
+    });
+  });
+
+  it('re-points the panel to the new version when an artifact is updated', () => {
+    svc.recordLive(ev({ version: 1, action: 'created' }));
+    svc.recordLive(
+      ev({
+        version: 2,
+        action: 'updated',
+        title: 'Doc v2',
+        updatedAt: '2026-05-15T12:05:00+00:00',
+      }),
+    );
+    expect(svc.get('art-1')?.version).toBe(2);
+    expect(svc.openArtifact()).toEqual({
+      artifactId: 'art-1',
+      version: 2,
+      title: 'Doc v2',
+    });
+  });
+
+  it('re-opens the panel on an update even after the user closed it', () => {
+    svc.recordLive(ev({ version: 1, action: 'created' }));
+    svc.closeArtifactPanel();
+    expect(svc.openArtifact()).toBeNull();
+    svc.recordLive(
+      ev({ version: 2, action: 'updated', updatedAt: '2026-05-15T12:05:00+00:00' }),
+    );
+    expect(svc.openArtifact()?.version).toBe(2);
+  });
+
+  it('keeps the panel on the highest version for an out-of-order update', () => {
+    svc.recordLive(ev({ version: 3, updatedAt: '2026-05-15T12:10:00+00:00' }));
+    svc.recordLive(ev({ version: 2, action: 'updated' })); // stale: stored, but the panel stays on v3
+    expect(svc.get('art-1')?.version).toBe(3);
+    expect(svc.openArtifact()?.version).toBe(3);
+  });
+
+  it('hydration on reload does not auto-open the panel', () => {
+    svc.seedFromHydration([
+      {
+        artifactId: 'h1',
+        version: 1,
+        title: 'Hydrated',
+        contentType: 'text/html; charset=utf-8',
+        updatedAt: '2026-05-15T09:00:00+00:00',
+      },
+    ]);
+    expect(svc.openArtifact()).toBeNull();
+  });
+
+  it('reset clears artifacts and open panel', () => {
+    svc.recordLive(ev());
+    svc.openArtifactPanel({ artifactId: 'art-1', version: 1, title: 'Doc' });
+    svc.reset();
+    expect(svc.hasArtifacts()).toBe(false);
+    expect(svc.openArtifact()).toBeNull();
+  });
+});
diff --git a/frontend/ai.client/src/app/session/services/artifacts/artifact-state.service.ts b/frontend/ai.client/src/app/session/services/artifacts/artifact-state.service.ts
new file mode 100644
index 00000000..991b494a
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/artifacts/artifact-state.service.ts
@@ -0,0 +1,184 @@
+import { Injectable, computed, inject, signal } from '@angular/core';
+import { SidenavService } from '../../../services/sidenav/sidenav.service';
+import type { ArtifactEvent } from '../../../shared/utils/stream-parser';
+import type { Artifact, OpenArtifactRef } from './artifact.model';
+
+/**
+ * Per-session artifact registry + open-panel state. Structural sibling of
+ * `CompactionSummaryService`: a live SSE path (`recordLive`), a
+ * session-load hydration path (`seedFromHydration`), and a `reset()`
+ * called on session change.
+ *
+ * Keyed by `artifactId#version` — every version is its own entry so the
+ * conversation can show one card per version. `update_artifact` emits a
+ * new event for a new version (a new key), not a replacement. When the
+ * same (id, version) arrives from both a live event and reload
+ * hydration, the live entry wins: it carries `producedByMessageId`, the
+ * precise per-turn anchor the index-only hydration row lacks.
+ */
+@Injectable({ providedIn: 'root' })
+export class ArtifactStateService {
+  private readonly sidenav = inject(SidenavService);
+
+  private readonly byKey = signal<Map<string, Artifact>>(new Map());
+  private readonly openRef = signal<OpenArtifactRef | null>(null);
+
+  /** Sidenav collapsed state captured the moment the panel opened, so a
+   *  user who had the nav open isn't left with it collapsed after they
+   *  close the artifact (and one who had it collapsed keeps it that way). */
+  private navWasCollapsed = false;
+
+  /** User-controlled docked pane width in px. Shared with the layout
+   *  (content padding + fixed footer/topnav offset) via a CSS var so the
+   *  chat never ends up under the pane. Survives open/close within a
+   *  session (it's a root singleton); resets on full reload. */
+  private static readonly MIN_PANE_WIDTH = 360;
+  private static readonly MAX_PANE_WIDTH = 1200;
+  private static readonly DEFAULT_PANE_WIDTH = 672; // 42rem, prior fixed size
+  private readonly paneWidthSignal = signal(
+    ArtifactStateService.DEFAULT_PANE_WIDTH,
+  );
+
+  /** Every artifact version for the current session, newest first.
+   *  Tie-broken by version so versions of one artifact stay ordered
+   *  even when their timestamps are equal or missing. */
+  readonly artifacts = computed<Artifact[]>(() =>
+    Array.from(this.byKey().values()).sort(
+      (a, b) =>
+        b.updatedAt.localeCompare(a.updatedAt) || b.version - a.version,
+    ),
+  );
+
+  readonly hasArtifacts = computed(() => this.byKey().size > 0);
+
+  /** The artifact the side panel is showing, or null when closed. */
+  readonly openArtifact = this.openRef.asReadonly();
+
+  /** Current docked pane width in px (clamped). */
+  readonly paneWidth = this.paneWidthSignal.asReadonly();
+
+  get paneWidthMin(): number {
+    return ArtifactStateService.MIN_PANE_WIDTH;
+  }
+
+  get paneWidthMax(): number {
+    return ArtifactStateService.MAX_PANE_WIDTH;
+  }
+
+  /** Set the docked pane width, clamped to the allowed range. The caller
+   *  is responsible for any viewport-relative ceiling (the service only
+   *  knows absolute bounds, not the window size). */
+  setPaneWidth(px: number): void {
+    const clamped = Math.min(
+      ArtifactStateService.MAX_PANE_WIDTH,
+      Math.max(ArtifactStateService.MIN_PANE_WIDTH, Math.round(px)),
+    );
+    this.paneWidthSignal.set(clamped);
+  }
+
+  /** Highest known version of an artifact, if any. */
+  get(artifactId: string): Artifact | undefined {
+    let latest: Artifact | undefined;
+    for (const a of this.byKey().values()) {
+      if (a.artifactId !== artifactId) continue;
+      if (!latest || a.version > latest.version) latest = a;
+    }
+    return latest;
+  }
+
+  /** Every known version of an artifact, newest first. */
+  versionsFor(artifactId: string): Artifact[] {
+    const out: Artifact[] = [];
+    for (const a of this.byKey().values()) {
+      if (a.artifactId === artifactId) out.push(a);
+    }
+    return out.sort((x, y) => y.version - x.version);
+  }
+
+  /**
+   * Record a live `artifact` SSE event as its own (id, version) entry.
+   *
+   * `producedByMessageId` is the concrete id of the assistant message
+   * that just streamed (resolved by the caller the same way oauth /
+   * tool-approval prompts are). Live placement keys off this rather than
+   * the numeric index, which only lines up after a reload.
+   */
+  recordLive(event: ArtifactEvent, producedByMessageId?: string | null): void {
+    this.upsert({
+      artifactId: event.artifactId,
+      version: event.version,
+      title: event.title,
+      contentType: event.contentType,
+      updatedAt: event.updatedAt,
+      producedByMessageIndex: event.producedByMessageIndex ?? null,
+      producedByMessageId: producedByMessageId ?? null,
+    });
+    // Created or updated, surface the artifact at its newest version. Read
+    // the registry (not event.version) so a stale out-of-order event can't
+    // pin the panel to an older version. Reopening a past session uses
+    // seedFromHydration, not this, so it never pops the panel.
+    const latest = this.get(event.artifactId);
+    if (latest) {
+      this.openArtifactPanel({
+        artifactId: latest.artifactId,
+        version: latest.version,
+        title: latest.title,
+      });
+    }
+  }
+
+  /**
+   * Replay the persisted session artifact list on load. Idempotent and
+   * non-clobbering: an entry for the same (id, version) that already
+   * carries a live `producedByMessageId` anchor is left untouched.
+   */
+  seedFromHydration(list: readonly Artifact[]): void {
+    for (const a of list) this.upsert(a);
+  }
+
+  openArtifactPanel(ref: OpenArtifactRef): void {
+    // Collapse the side nav to make room for the docked pane. Only capture
+    // the prior state on a genuine closed -> open transition: switching
+    // between artifacts while the panel is already open must not overwrite
+    // it (the nav is collapsed by us at that point).
+    if (this.openRef() === null) {
+      this.navWasCollapsed = this.sidenav.isCollapsed();
+      this.sidenav.collapse();
+    }
+    this.openRef.set(ref);
+  }
+
+  closeArtifactPanel(): void {
+    if (this.openRef() === null) return;
+    this.openRef.set(null);
+    this.restoreNav();
+  }
+
+  /** Clear all state — called on session change. */
+  reset(): void {
+    this.byKey.set(new Map());
+    if (this.openRef() !== null) {
+      this.openRef.set(null);
+      this.restoreNav();
+    }
+  }
+
+  /** Return the side nav to whatever state it was in before the panel
+   *  opened. If the user had it collapsed already, leave it collapsed. */
+  private restoreNav(): void {
+    if (!this.navWasCollapsed) this.sidenav.expand();
+  }
+
+  private upsert(a: Artifact): void {
+    const key = `${a.artifactId}#${a.version}`;
+    const existing = this.byKey().get(key);
+    // A later reload-hydration call for the same (id, version) must not
+    // drop the live entry's precise per-turn anchor.
+    if (existing?.producedByMessageId && !a.producedByMessageId) return;
+    this.byKey.update((m) => {
+      const next = new Map(m);
+      next.set(key, a);
+      return next;
+    });
+  }
+}
diff --git a/frontend/ai.client/src/app/session/services/artifacts/artifact.model.ts b/frontend/ai.client/src/app/session/services/artifacts/artifact.model.ts
new file mode 100644
index 00000000..990780b1
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/artifacts/artifact.model.ts
@@ -0,0 +1,46 @@
+/**
+ * Normalized client model for an agent-authored artifact.
+ *
+ * Both delivery paths converge here: the live `artifact` SSE event
+ * (camelCase wire shape) and the app-api session-list endpoint
+ * (snake_case REST shape, normalized by ArtifactHttpService). The panel
+ * never holds artifact HTML — it mints a short-lived render token and
+ * loads the content in a sandboxed iframe on the artifact origin.
+ */
+export interface Artifact {
+  artifactId: string;
+  version: number;
+  title: string;
+  contentType: string;
+  updatedAt: string;
+  createdAt?: string;
+  /**
+   * 0-based index of the assistant message that produced (or last
+   * updated) this artifact, in AgentCore Memory ordering — matches the
+   * messages endpoint's `msg-{sessionId}-{index}` id on **reload**, when
+   * the SPA enumerates the same memory messages the backend indexed.
+   *
+   * Not reliable for live placement: a tool-using turn is one folded
+   * assistant message client-side, but this index counts the interleaved
+   * tool_use/tool_result memory messages, so it drifts. Live placement
+   * uses `producedByMessageId` instead. Null/undefined for artifacts
+   * written before linkage existed — those fall back to the strip.
+   */
+  producedByMessageIndex?: number | null;
+
+  /**
+   * Concrete client message id of the assistant turn that produced this
+   * artifact, resolved at live-event time (same way oauth/tool-approval
+   * prompts anchor). Set only on the live path; absent on reload
+   * hydration (the numeric index governs there). Stable as later turns
+   * append, so the card stays put without a refresh.
+   */
+  producedByMessageId?: string | null;
+}
+
+/** Which artifact the side panel is currently showing. */
+export interface OpenArtifactRef {
+  artifactId: string;
+  version: number;
+  title: string;
+}
diff --git a/frontend/ai.client/src/app/session/services/chat/chat-http.service.spec.ts b/frontend/ai.client/src/app/session/services/chat/chat-http.service.spec.ts
index 53028af4..256902aa 100644
--- a/frontend/ai.client/src/app/session/services/chat/chat-http.service.spec.ts
+++ b/frontend/ai.client/src/app/session/services/chat/chat-http.service.spec.ts
@@ -28,7 +28,7 @@ describe('ChatHttpService', () => {
         // Phase 6c: chat-http now reads CSRF from the BFF SessionService
         // and lets cookie auth ride along with the request rather than
         // attaching a Bearer manually.
-        { provide: BffSessionService, useValue: { csrfHeaders: vi.fn().mockReturnValue({}) } },
+        { provide: BffSessionService, useValue: { csrfHeaders: vi.fn().mockReturnValue({}), handleUnauthorized: vi.fn() } },
         { provide: SessionService, useValue: { currentSession: signal({ sessionId: 's1' }), updateSessionTitleInCache: vi.fn() } },
         { provide: StreamParserService, useValue: {} },
         { provide: ChatStateService, useValue: { isStreaming: signal(false), streamingSessionId: signal(null), abortCurrentRequest: vi.fn(), setChatLoading: vi.fn(), resetState: vi.fn(), getAbortController: vi.fn().mockReturnValue(new AbortController()) } },
diff --git a/frontend/ai.client/src/app/session/services/chat/chat-http.service.ts b/frontend/ai.client/src/app/session/services/chat/chat-http.service.ts
index b785faab..0973dec4 100644
--- a/frontend/ai.client/src/app/session/services/chat/chat-http.service.ts
+++ b/frontend/ai.client/src/app/session/services/chat/chat-http.service.ts
@@ -22,6 +22,17 @@ class FatalError extends Error {
     this.name = 'FatalError';
   }
 }
+/**
+ * Thrown by the SSE `onopen` when the BFF returned 401. The session
+ * redirect is already in flight; `onerror` uses this marker to suppress
+ * the toast that would otherwise flash before the page tears down.
+ */
+class UnauthorizedError extends Error {
+  constructor() {
+    super('Session expired');
+    this.name = 'UnauthorizedError';
+  }
+}
 
 interface GenerateTitleRequest {
   session_id: string;
@@ -69,6 +80,10 @@ export class ChatHttpService {
     // an empty object before bootstrap or in the Bearer rollback path.
     const csrfHeaders = this.bffSession.csrfHeaders();
 
+    // Capture into a local: `onopen` is a method-shorthand on the config
+    // object, so `this` inside it is not this service.
+    const bffSession = this.bffSession;
+
     return fetchEventSource(`${baseUrl}/chat/stream`, {
       method: 'POST',
       // Send the BFF session cookie (`__Host-bff_session`) on cross-origin
@@ -88,6 +103,12 @@ export class ChatHttpService {
       async onopen(response) {
         if (response.ok && response.headers.get('content-type')?.includes('text/event-stream')) {
           return; // everything's good
+        } else if (response.status === 401) {
+          // BFF session is missing or expired. Bounce to /auth/login —
+          // `handleUnauthorized` is idempotent, so a 401 here that races
+          // with one from a parallel request only navigates once.
+          bffSession.handleUnauthorized();
+          throw new UnauthorizedError();
         } else if (response.status === 403) {
           // Handle forbidden (e.g., usage limit exceeded)
           let errorMessage = 'Access forbidden';
@@ -162,6 +183,12 @@ export class ChatHttpService {
         this.messageMapService.endStreaming();
         this.chatStateService.setChatLoading(false);
 
+        // 401 already triggered the redirect — skip the toast so it
+        // doesn't flash before the page tears down.
+        if (err instanceof UnauthorizedError) {
+          throw err;
+        }
+
         // Display error message to user using ErrorService
         if (err instanceof FatalError) {
           this.errorService.addError('Chat Request Failed', err.message, undefined, undefined);
diff --git a/frontend/ai.client/src/app/session/services/chat/chat-request.service.spec.ts b/frontend/ai.client/src/app/session/services/chat/chat-request.service.spec.ts
index 851fa385..7927697a 100644
--- a/frontend/ai.client/src/app/session/services/chat/chat-request.service.spec.ts
+++ b/frontend/ai.client/src/app/session/services/chat/chat-request.service.spec.ts
@@ -43,8 +43,8 @@ describe('ChatRequestService', () => {
         ChatRequestService,
         { provide: ChatHttpService, useValue: mockChatHttpService },
         { provide: Router, useValue: mockRouter },
-        { provide: ChatStateService, useValue: { setChatLoading: vi.fn() } },
-        { provide: MessageMapService, useValue: { addUserMessage: vi.fn(), startStreaming: vi.fn(), endStreaming: vi.fn() } },
+        { provide: ChatStateService, useValue: { setChatLoading: vi.fn(), setLastTurnContinuable: vi.fn(), createNewAbortController: vi.fn() } },
+        { provide: MessageMapService, useValue: { addUserMessage: vi.fn(), startStreaming: vi.fn(), beginContinuationStreaming: vi.fn(), endStreaming: vi.fn() } },
         { provide: SessionService, useValue: { addSessionToCache: vi.fn() } },
         { provide: UserService, useValue: { getUser: vi.fn().mockReturnValue({ user_id: 'user1' }) } },
         { provide: ModelService, useValue: mockModelService },
@@ -103,4 +103,33 @@ describe('ChatRequestService', () => {
       'No model selected. Please select a model before sending a message.'
     );
   });
+
+  describe('continueTruncatedTurn', () => {
+    it('sends continue_truncated with an empty message', async () => {
+      await service.continueTruncatedTurn('session1', 'assistant1');
+
+      expect(mockChatHttpService.sendChatRequest).toHaveBeenCalledWith(
+        expect.objectContaining({
+          message: '',
+          session_id: 'session1',
+          continue_truncated: true,
+          rag_assistant_id: 'assistant1',
+        }),
+      );
+    });
+
+    it('does NOT add a user message (no visible bubble); uses continuation streaming', async () => {
+      const messageMap = TestBed.inject(MessageMapService) as any;
+      await service.continueTruncatedTurn('session1');
+
+      expect(messageMap.addUserMessage).not.toHaveBeenCalled();
+      expect(messageMap.startStreaming).not.toHaveBeenCalled();
+      expect(messageMap.beginContinuationStreaming).toHaveBeenCalledWith('session1');
+    });
+
+    it('is a no-op without a session id', async () => {
+      await service.continueTruncatedTurn(null);
+      expect(mockChatHttpService.sendChatRequest).not.toHaveBeenCalled();
+    });
+  });
 });
\ No newline at end of file
diff --git a/frontend/ai.client/src/app/session/services/chat/chat-request.service.ts b/frontend/ai.client/src/app/session/services/chat/chat-request.service.ts
index d08a7c99..02442273 100644
--- a/frontend/ai.client/src/app/session/services/chat/chat-request.service.ts
+++ b/frontend/ai.client/src/app/session/services/chat/chat-request.service.ts
@@ -68,6 +68,10 @@ export class ChatRequestService implements OnDestroy {
   ): Promise<void> {
     // Ensure conversation exists and get its ID
     // Update URL to reflect current conversation
+    // Any new send (including a "Continue") retires the previous turn's
+    // max_tokens "Continue" affordance immediately, before the stream starts.
+    this.chatStateService.setLastTurnContinuable(false);
+
     const isNewSession = !sessionId;
     sessionId = sessionId || uuidv4();
 
@@ -115,6 +119,52 @@ export class ChatRequestService implements OnDestroy {
     }
   }
 
+  /**
+   * "Continue" after a max_tokens truncation. Modeled on the OAuth/tool
+   * resume flow, NOT on submitChatRequest: no user message is added (no
+   * visible bubble, no new user turn) so the model resumes the truncated
+   * assistant message in restored history instead of answering a fresh
+   * instruction. The full request object is sent so the backend rebuilds
+   * the same agent shape; `continue_truncated` makes it re-enter the loop
+   * with an empty prompt.
+   */
+  async continueTruncatedTurn(
+    sessionId: string | null,
+    assistantId?: string,
+  ): Promise<void> {
+    if (!sessionId) {
+      return;
+    }
+
+    // Hide the affordance immediately; retire any stale continuable state.
+    this.chatStateService.setLastTurnContinuable(false);
+
+    // Continuation streaming: pins the existing messages (history +
+    // truncated partial + error bubble) as a stable prefix and appends the
+    // continuation after them, instead of the normal sync which would
+    // truncate back to the last user message and drop the partial. Also
+    // resets the parser (with the correct starting count) so the resumed
+    // stream is treated as a fresh batch.
+    this.messageMapService.beginContinuationStreaming(sessionId);
+    this.chatStateService.createNewAbortController();
+    this.chatStateService.setChatLoading(true);
+
+    // Reuse the normal request shape so the backend rebuilds the same
+    // model/tools/assistant agent, but with an empty message and the
+    // continuation flag. No addUserMessage call → no user bubble.
+    const requestObject = this.buildChatRequestObject('', sessionId, undefined, assistantId);
+    requestObject['message'] = '';
+    requestObject['continue_truncated'] = true;
+
+    try {
+      await this.chatHttpService.sendChatRequest(requestObject);
+    } catch (error) {
+      this.chatStateService.setChatLoading(false);
+      this.messageMapService.endStreaming();
+      throw error;
+    }
+  }
+
   /**
    * Navigates to the conversation route
    * @param sessionId The conversation ID to navigate to
diff --git a/frontend/ai.client/src/app/session/services/chat/chat-state.service.spec.ts b/frontend/ai.client/src/app/session/services/chat/chat-state.service.spec.ts
index 96dd7895..29f683b8 100644
--- a/frontend/ai.client/src/app/session/services/chat/chat-state.service.spec.ts
+++ b/frontend/ai.client/src/app/session/services/chat/chat-state.service.spec.ts
@@ -40,15 +40,31 @@ describe('ChatStateService', () => {
     });
   });
 
+  describe('setLastTurnContinuable', () => {
+    it('defaults to false', () => {
+      expect(service.lastTurnContinuable()).toBe(false);
+    });
+
+    it('toggles the continuable flag', () => {
+      service.setLastTurnContinuable(true);
+      expect(service.lastTurnContinuable()).toBe(true);
+
+      service.setLastTurnContinuable(false);
+      expect(service.lastTurnContinuable()).toBe(false);
+    });
+  });
+
   describe('resetState', () => {
     it('should reset all state to initial values', () => {
       service.setChatLoading(true);
       service.setStopReason('stop');
-      
+      service.setLastTurnContinuable(true);
+
       service.resetState();
-      
+
       expect(service.isChatLoading()).toBe(false);
       expect(service.currentStopReason()).toBeNull();
+      expect(service.lastTurnContinuable()).toBe(false);
     });
   });
 
diff --git a/frontend/ai.client/src/app/session/services/chat/chat-state.service.ts b/frontend/ai.client/src/app/session/services/chat/chat-state.service.ts
index 19688354..933a6457 100644
--- a/frontend/ai.client/src/app/session/services/chat/chat-state.service.ts
+++ b/frontend/ai.client/src/app/session/services/chat/chat-state.service.ts
@@ -12,6 +12,13 @@ export class ChatStateService {
     private readonly stopReason = signal<string | null>(null);
     readonly currentStopReason: Signal<string | null> = this.stopReason.asReadonly();
 
+    // True when the most recent turn ended in a recoverable max_tokens
+    // truncation. Drives the "Continue" affordance on the last assistant
+    // message. Live-only (not hydrated on reload); set from the stream_error
+    // event and cleared the moment a new turn starts.
+    private readonly lastTurnContinuableSignal = signal(false);
+    readonly lastTurnContinuable: Signal<boolean> = this.lastTurnContinuableSignal.asReadonly();
+
     // ----- Session-level cost / context aggregates ---------------------------
     // Drive the cost badge above the composer. Seeded from session metadata
     // on route change, then incrementally updated via the SSE metadata event
@@ -49,6 +56,14 @@ export class ChatStateService {
         this.stopReason.set(reason);
     }
 
+    /**
+     * Marks (or clears) whether the last turn ended in a recoverable
+     * max_tokens truncation that the user can continue from.
+     */
+    setLastTurnContinuable(continuable: boolean): void {
+        this.lastTurnContinuableSignal.set(continuable);
+    }
+
     /**
      * Seed the session-level cost/context signals from a session metadata
      * payload (e.g. when navigating to an existing session). Called BEFORE
@@ -87,6 +102,7 @@ export class ChatStateService {
     resetState(): void {
         this.chatLoading.set(false);
         this.stopReason.set(null);
+        this.lastTurnContinuableSignal.set(false);
         this.costDollarsSignal.set(0);
         this.contextTokensSignal.set(0);
         this.contextWindowSignal.set(0);
diff --git a/frontend/ai.client/src/app/session/services/chat/stream-parser.service.spec.ts b/frontend/ai.client/src/app/session/services/chat/stream-parser.service.spec.ts
index 48bea5ed..933801fb 100644
--- a/frontend/ai.client/src/app/session/services/chat/stream-parser.service.spec.ts
+++ b/frontend/ai.client/src/app/session/services/chat/stream-parser.service.spec.ts
@@ -349,3 +349,86 @@ describe('StreamParserService - Citation Handling', () => {
     });
   });
 });
+
+describe('StreamParserService - max_tokens Continue affordance', () => {
+  let service: StreamParserService;
+  let chatState: ChatStateService;
+
+  beforeEach(() => {
+    TestBed.resetTestingModule();
+    TestBed.configureTestingModule({
+      providers: [
+        StreamParserService,
+        ChatStateService,
+        ErrorService,
+        QuotaWarningService,
+      ],
+    });
+    service = TestBed.inject(StreamParserService);
+    chatState = TestBed.inject(ChatStateService);
+    service.reset();
+  });
+
+  afterEach(() => {
+    TestBed.resetTestingModule();
+  });
+
+  it('marks the last turn continuable on a max_tokens stream_error', () => {
+    expect(chatState.lastTurnContinuable()).toBe(false);
+
+    service.parseEventSourceMessage('stream_error', {
+      type: 'stream_error',
+      code: 'max_tokens',
+      message: 'I reached my response-length limit.',
+      recoverable: true,
+      metadata: { error_kind: 'max_tokens' },
+    });
+
+    expect(chatState.lastTurnContinuable()).toBe(true);
+  });
+
+  it('does not mark continuable for a non-max_tokens stream_error', () => {
+    service.parseEventSourceMessage('stream_error', {
+      type: 'stream_error',
+      code: 'stream_error',
+      message: 'Something went wrong.',
+      recoverable: false,
+    });
+
+    expect(chatState.lastTurnContinuable()).toBe(false);
+  });
+
+  it('retires the affordance when the next assistant turn starts streaming', () => {
+    service.parseEventSourceMessage('stream_error', {
+      type: 'stream_error',
+      code: 'max_tokens',
+      message: 'truncated',
+      recoverable: true,
+      metadata: { error_kind: 'max_tokens' },
+    });
+    expect(chatState.lastTurnContinuable()).toBe(true);
+
+    service.parseEventSourceMessage('message_start', { role: 'assistant' });
+    expect(chatState.lastTurnContinuable()).toBe(false);
+  });
+
+  it('processes a terminal stream_error even after the stream completed', () => {
+    // Reproduces the dropped-affordance bug: the parser reaches a
+    // terminal state (message_start sets currentStreamId; done →
+    // Completed), then the max_tokens stream_error arrives last. It must
+    // still be processed (always-allowed) so Continue appears.
+    service.parseEventSourceMessage('message_start', { role: 'assistant' });
+    service.parseEventSourceMessage('done', null);
+    expect(chatState.lastTurnContinuable()).toBe(false);
+
+    service.parseEventSourceMessage('stream_error', {
+      type: 'stream_error',
+      code: 'max_tokens',
+      message: 'Response length limit reached.',
+      recoverable: true,
+      metadata: { error_kind: 'max_tokens' },
+    });
+
+    expect(chatState.lastTurnContinuable()).toBe(true);
+  });
+});
diff --git a/frontend/ai.client/src/app/session/services/chat/stream-parser.service.ts b/frontend/ai.client/src/app/session/services/chat/stream-parser.service.ts
index 3ccac558..59e7bcfd 100644
--- a/frontend/ai.client/src/app/session/services/chat/stream-parser.service.ts
+++ b/frontend/ai.client/src/app/session/services/chat/stream-parser.service.ts
@@ -6,6 +6,7 @@ import { ChatStateService } from './chat-state.service';
 import { v4 as uuidv4 } from 'uuid';
 import {
   ErrorService,
+  ErrorCode,
   StreamErrorEvent,
   ConversationalStreamError,
 } from '../../../services/error/error.service';
@@ -17,15 +18,20 @@ import {
 import { OAuthConsentService } from '../../../services/oauth-consent/oauth-consent.service';
 import { ToolApprovalService } from '../../../services/tool-approval/tool-approval.service';
 import { CompactionSummaryService } from './compaction-summary.service';
+import { ArtifactStateService } from '../artifacts/artifact-state.service';
+import { McpAppStateService } from '../mcp-apps/mcp-app-state.service';
 import type {
   OAuthRequiredEvent,
   ToolApprovalRequiredEvent,
   CompactionEvent,
+  ArtifactEvent,
+  UiResourceEvent,
 } from '../../../shared/utils/stream-parser';
 import {
   processStreamEvent,
   createStreamLineParser,
   inferContentBlockType,
+  extractStreamingStringField,
   parseToolResultContent,
   type StreamParserCallbacks,
   type ContentBlockBuilder,
@@ -49,6 +55,12 @@ enum StreamState {
 // Re-export ToolProgress for backwards compatibility
 export type { ToolProgress };
 
+/**
+ * Tools whose `content` input is a long document worth surfacing live (as a
+ * "generating…" preview) while the model is still streaming the tool call.
+ */
+const STREAMING_CONTENT_TOOLS = new Set(['create_artifact', 'update_artifact']);
+
 @Injectable({
   providedIn: 'root',
 })
@@ -59,6 +71,8 @@ export class StreamParserService {
   private oauthConsentService = inject(OAuthConsentService);
   private toolApprovalService = inject(ToolApprovalService);
   private compactionSummary = inject(CompactionSummaryService);
+  private artifactState = inject(ArtifactStateService);
+  private mcpAppState = inject(McpAppStateService);
 
   // =========================================================================
   // State Signals
@@ -207,8 +221,14 @@ export class StreamParserService {
     // Check if we should process this event
     // oauth_required arrives after message_stop/done by design (see CLAUDE.md SSE
     // table) — allow it through even when the stream state is Completed.
+    // stream_error is a terminal signal that likewise arrives after
+    // message_stop (e.g. max_tokens truncation) and must never be dropped by
+    // state gating, or recovery affordances (Continue) silently disappear.
     const isAlwaysAllowedEvent =
-      event === 'message_start' || event === 'error' || event === 'oauth_required';
+      event === 'message_start' ||
+      event === 'error' ||
+      event === 'oauth_required' ||
+      event === 'stream_error';
     if (!isAlwaysAllowedEvent && !this.shouldProcessEvent()) {
       return;
     }
@@ -327,6 +347,31 @@ export class StreamParserService {
 
       onCompaction: (data: CompactionEvent) => this.compactionSummary.recordLive(data),
 
+      onArtifact: (data: ArtifactEvent) => {
+        // Same post-message_stop timing as oauth_required: the producing
+        // assistant message is the last assistant message in the list.
+        // Anchor live placement to its concrete id — the numeric index
+        // only lines up after a reload (it counts the memory tool
+        // messages the folded client message doesn't have).
+        const messages = this.allMessages();
+        let lastAssistantId: string | undefined;
+        for (let i = messages.length - 1; i >= 0; i--) {
+          if (messages[i].role === 'assistant') {
+            lastAssistantId = messages[i].id;
+            break;
+          }
+        }
+        this.artifactState.recordLive(data, lastAssistantId);
+      },
+
+      onUiResource: (data: UiResourceEvent) => {
+        // Inline event (arrives right after its tool_result, mid-stream),
+        // unlike the post-message_stop side channels above — just record
+        // it keyed by toolUseId. The tool-use renderer picks it up
+        // reactively and swaps in the MCP App frame.
+        this.mcpAppState.recordLive(data);
+      },
+
       onToolApprovalRequired: (data: ToolApprovalRequiredEvent) => {
         const messages = this.allMessages();
         let lastAssistantId: string | undefined;
@@ -348,8 +393,19 @@ export class StreamParserService {
       },
 
       onError: (data) => this.handleError(data),
-      onStreamError: (data) =>
-        this.errorService.handleConversationalStreamError(data as ConversationalStreamError),
+      onStreamError: (data) => {
+        const streamError = data as ConversationalStreamError;
+        this.errorService.handleConversationalStreamError(streamError);
+        // A max_tokens truncation is recoverable: Strands already persisted
+        // the partial assistant turn, so the user can continue from it.
+        // Surface the "Continue" affordance on the last assistant message.
+        const isMaxTokens =
+          streamError.code === ErrorCode.MAX_TOKENS ||
+          streamError.metadata?.['error_kind'] === 'max_tokens';
+        if (isMaxTokens) {
+          this.chatStateService.setLastTurnContinuable(true);
+        }
+      },
 
       onParseError: (message) => this.setError(message),
     };
@@ -380,6 +436,10 @@ export class StreamParserService {
     // Clear stopReason in ChatStateService
     this.chatStateService.setStopReason(null);
 
+    // A new assistant turn is streaming — retire any stale "Continue"
+    // affordance from a previous max_tokens truncation.
+    this.chatStateService.setLastTurnContinuable(false);
+
     // Compute predictable message ID
     const completedCount = this.completedMessages().length;
     const messageIndex = this.startingMessageCount + completedCount;
@@ -805,7 +865,7 @@ export class StreamParserService {
 
     // Check if we need to update
     const existingMetadata = lastMessage.metadata as Record<string, unknown>;
-    const existingLatency = existingMetadata['latency'] as { timeToFirstToken?: number } | undefined;
+    const existingLatency = existingMetadata['latency'] as { timeToFirstToken?: number | null } | undefined;
     const existingTTFT = existingLatency?.timeToFirstToken;
     const existingCost = existingMetadata['cost'] as number | undefined;
     const existingTokenUsage = existingMetadata['tokenUsage'] as {
@@ -813,7 +873,7 @@ export class StreamParserService {
       cacheWriteInputTokens?: number;
     } | undefined;
 
-    const newLatency = newMetadata['latency'] as { timeToFirstToken?: number } | undefined;
+    const newLatency = newMetadata['latency'] as { timeToFirstToken?: number | null } | undefined;
     const newTTFT = newLatency?.timeToFirstToken;
     const newCost = newMetadata['cost'] as number | undefined;
     const newTokenUsage = newMetadata['tokenUsage'] as {
@@ -901,8 +961,11 @@ export class StreamParserService {
     }
 
     if (metadataEvent.metrics) {
+      // Preserve `null` for unmeasured TTFT instead of coercing to 0 — a
+      // real time-to-first-token can never be 0ms, and the badge below
+      // already hides itself for null/undefined/0 via a truthy check.
       result['latency'] = {
-        timeToFirstToken: metadataEvent.metrics.timeToFirstByteMs ?? 0,
+        timeToFirstToken: metadataEvent.metrics.timeToFirstByteMs ?? null,
         endToEndLatency: metadataEvent.metrics.latencyMs,
       };
     }
@@ -961,6 +1024,21 @@ export class StreamParserService {
         toolUseData['status'] = builder.status;
       }
 
+      // While an artifact tool is still streaming (no result yet), surface the
+      // partially-generated `content` so the UI can show live progress. The
+      // full tool-input JSON is incomplete during this window, so JSON.parse
+      // above yields {} — we extract the in-flight value directly instead.
+      if (
+        !builder.result &&
+        builder.toolName &&
+        STREAMING_CONTENT_TOOLS.has(builder.toolName)
+      ) {
+        const streaming = extractStreamingStringField(inputStr, 'content');
+        if (streaming) {
+          toolUseData['streamingContent'] = streaming;
+        }
+      }
+
       return {
         type: 'toolUse',
         toolUse: toolUseData,
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-bridge.spec.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-bridge.spec.ts
new file mode 100644
index 00000000..3c127b2c
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-bridge.spec.ts
@@ -0,0 +1,549 @@
+import { describe, it, expect, beforeEach, vi } from 'vitest';
+import { McpAppBridge } from './mcp-app-bridge';
+import type { UiResourceEvent } from '../../../shared/utils/stream-parser';
+
+const SANDBOX_ORIGIN = 'https://mcp-sandbox.example.com';
+const NONCE = 'nonce-abc';
+
+function resource(): UiResourceEvent {
+  return {
+    type: 'ui_resource',
+    toolUseId: 'tu-1',
+    resourceUri: 'ui://srv/widget',
+    html: '<h1>app</h1>',
+    mimeType: 'text/html;profile=mcp-app',
+    csp: { connectDomains: ['https://api.test'] },
+    permissions: { clipboardWrite: {} },
+    sandboxOrigin: SANDBOX_ORIGIN,
+  };
+}
+
+/** A postMessage sink standing in for the proxy iframe's contentWindow. */
+class FakeProxyWindow {
+  readonly sent: Array<{ msg: any; targetOrigin: string }> = [];
+  postMessage(msg: unknown, targetOrigin: string): void {
+    this.sent.push({ msg, targetOrigin });
+  }
+  last(): any {
+    return this.sent[this.sent.length - 1]?.msg;
+  }
+  byMethod(method: string): any[] {
+    return this.sent.map((s) => s.msg).filter((m) => m?.method === method);
+  }
+  /** Returns the first sent message carrying the given JSON-RPC id; throws if none was sent. */
+  byId(id: string | number): any {
+    const found = this.sent.find((s) => s.msg?.id === id);
+    if (!found) throw new Error(`no message with id ${id}`);
+    return found.msg;
+  }
+}
+
+/** Host window: lets the test deliver `message` events to the bridge. */
+class FakeHostWindow {
+  private listener: ((ev: MessageEvent) => void) | null = null;
+  addEventListener(_t: 'message', cb: (ev: MessageEvent) => void): void {
+    this.listener = cb;
+  }
+  removeEventListener(): void {
+    this.listener = null;
+  }
+  get attached(): boolean {
+    return this.listener !== null;
+  }
+  deliver(data: unknown, source: unknown, origin = SANDBOX_ORIGIN): void {
+    this.listener?.({ data, source, origin } as MessageEvent);
+  }
+}
+
+interface Harness {
+  bridge: McpAppBridge;
+  host: FakeHostWindow;
+  proxy: FakeProxyWindow;
+  openLink: ReturnType<typeof vi.fn>;
+  proxyToolCall: ReturnType<typeof vi.fn>;
+  sendMessage: ReturnType<typeof vi.fn>;
+  updateModelContext: ReturnType<typeof vi.fn>;
+  requestConsent: ReturnType<typeof vi.fn>;
+  warn: ReturnType<typeof vi.fn>;
+  toolResult: { value: unknown | null };
+}
+
+function makeBridge(
+  opts: { withProxy?: boolean; pr6?: boolean } = {},
+): Harness {
+  const host = new FakeHostWindow();
+  const proxy = new FakeProxyWindow();
+  const openLink = vi.fn();
+  const proxyToolCall = vi.fn(async () => ({
+    content: [{ type: 'text', text: 'tool-ok' }],
+    isError: false,
+  }));
+  // PR #6 deps. Wired only when opts.pr6 — otherwise the bridge must
+  // degrade per JSON-RPC (method-not-found / direct open) so older hosts
+  // keep working.
+  const sendMessage = vi.fn(async () => undefined);
+  const updateModelContext = vi.fn(async () => undefined);
+  const requestConsent = vi.fn(async () => true);
+  const warn = vi.fn();
+  const toolResult: { value: unknown | null } = {
+    value: { content: [{ type: 'text', text: 'ok' }], isError: false },
+  };
+  const bridge = new McpAppBridge({
+    hostWindow: host,
+    getProxyWindow: () => proxy as unknown as Window,
+    sandboxOrigin: SANDBOX_ORIGIN,
+    resource: resource(),
+    nonce: NONCE,
+    getToolInput: () => ({ q: 'paris' }),
+    getToolResult: () => toolResult.value,
+    getHostContext: () => ({ theme: 'dark' }),
+    openLink,
+    // Default harness can proxy; opt out to assert the no-capability path.
+    ...(opts.withProxy === false ? {} : { proxyToolCall }),
+    ...(opts.pr6 ? { sendMessage, updateModelContext, requestConsent } : {}),
+    onWarn: warn,
+  });
+  bridge.start();
+  return {
+    bridge,
+    host,
+    proxy,
+    openLink,
+    proxyToolCall,
+    sendMessage,
+    updateModelContext,
+    requestConsent,
+    warn,
+    toolResult,
+  };
+}
+
+/** Drive only the first handshake step: the proxy announces `sandbox-proxy-ready`. */
+function handshake(h: Harness): void {
+  h.host.deliver(
+    { jsonrpc: '2.0', method: 'ui/notifications/sandbox-proxy-ready', params: {} },
+    h.proxy,
+  );
+}
+
+describe('McpAppBridge', () => {
+  let h: Harness;
+  beforeEach(() => {
+    h = makeBridge();
+  });
+
+  it('sends sandbox-resource-ready (with nonce + html + csp) on proxy-ready', () => {
+    handshake(h);
+    const msg = h.proxy.byMethod('ui/notifications/sandbox-resource-ready')[0];
+    expect(msg).toBeTruthy();
+    expect(msg.params.html).toBe('<h1>app</h1>');
+    expect(msg.params.nonce).toBe(NONCE);
+    expect(msg.params.csp).toEqual({ connectDomains: ['https://api.test'] });
+    expect(msg.params.permissions).toEqual({ clipboardWrite: {} });
+    expect(msg.params.sandbox).toBe('allow-scripts allow-same-origin allow-forms');
+    // Strict targetOrigin — never '*'.
+    expect(h.proxy.sent[0].targetOrigin).toBe(SANDBOX_ORIGIN);
+  });
+
+  it('rejects messages from the wrong source or origin', () => {
+    handshake(h);
+    h.host.deliver(
+      { jsonrpc: '2.0', id: 1, method: 'ping', nonce: NONCE },
+      {} /* not the proxy window */,
+    );
+    h.host.deliver(
+      { jsonrpc: '2.0', id: 2, method: 'ping', nonce: NONCE },
+      h.proxy,
+      'https://evil.example.com',
+    );
+    // No ping response went out for either rejected message.
+    expect(h.proxy.sent.some((s) => s.msg.id === 1 || s.msg.id === 2)).toBe(false);
+  });
+
+  it('drops post-handshake messages without the nonce', () => {
+    handshake(h);
+    h.host.deliver({ jsonrpc: '2.0', id: 9, method: 'ping' }, h.proxy);
+    expect(h.warn).toHaveBeenCalled();
+    expect(h.proxy.sent.some((s) => s.msg.id === 9)).toBe(false);
+  });
+
+  it('answers ui/initialize with protocol version + host capabilities', () => {
+    handshake(h);
+    h.host.deliver(
+      { jsonrpc: '2.0', id: 'i1', method: 'ui/initialize', nonce: NONCE, params: {} },
+      h.proxy,
+    );
+    const resp = h.proxy.sent.map((s) => s.msg).find((m) => m.id === 'i1');
+    expect(resp.result.protocolVersion).toBe('2026-01-26');
+    expect(resp.result.hostCapabilities.openLinks).toBeDefined();
+    expect(resp.result.hostCapabilities.sandbox.csp).toEqual({
+      connectDomains: ['https://api.test'],
+    });
+    expect(resp.result.hostContext.displayMode).toBe('inline');
+    expect(resp.result.hostContext.theme).toBe('dark');
+  });
+
+  it('does NOT push tool-input before initialized, then flushes on initialized', () => {
+    handshake(h);
+    h.host.deliver(
+      { jsonrpc: '2.0', id: 'i1', method: 'ui/initialize', nonce: NONCE, params: {} },
+      h.proxy,
+    );
+    expect(h.proxy.byMethod('ui/notifications/tool-input')).toHaveLength(0);
+
+    h.host.deliver(
+      { jsonrpc: '2.0', method: 'ui/notifications/initialized', nonce: NONCE },
+      h.proxy,
+    );
+    const input = h.proxy.byMethod('ui/notifications/tool-input');
+    const result = h.proxy.byMethod('ui/notifications/tool-result');
+    expect(input).toHaveLength(1);
+    expect(input[0].params).toEqual({ arguments: { q: 'paris' } });
+    expect(result).toHaveLength(1);
+    expect(result[0].params).toEqual({
+      content: [{ type: 'text', text: 'ok' }],
+      isError: false,
+    });
+    // tool-input MUST precede tool-result.
+    const order = h.proxy.sent
+      .map((s) => s.msg.method)
+      .filter((m) => m && m.startsWith('ui/notifications/tool-'));
+    expect(order.indexOf('ui/notifications/tool-input')).toBeLessThan(
+      order.indexOf('ui/notifications/tool-result'),
+    );
+  });
+
+  it('forwards size-changed to the registered sink', () => {
+    const sizes: Array<[number, number]> = [];
+    h.bridge.onSizeChanged((w, ht) => sizes.push([w, ht]));
+    handshake(h);
+    h.host.deliver(
+      {
+        jsonrpc: '2.0',
+        method: 'ui/notifications/size-changed',
+        nonce: NONCE,
+        params: { width: 320, height: 540 },
+      },
+      h.proxy,
+    );
+    expect(sizes).toEqual([[320, 540]]);
+  });
+
+  it('handles ui/open-link: opens valid https, rejects bad URLs', () => {
+    handshake(h);
+    h.host.deliver(
+      {
+        jsonrpc: '2.0',
+        id: 'l1',
+        method: 'ui/open-link',
+        nonce: NONCE,
+        params: { url: 'https://example.com/x' },
+      },
+      h.proxy,
+    );
+    expect(h.openLink).toHaveBeenCalledWith('https://example.com/x');
+    expect(h.proxy.byId('l1').result).toEqual({});
+
+    h.host.deliver(
+      {
+        jsonrpc: '2.0',
+        id: 'l2',
+        method: 'ui/open-link',
+        nonce: NONCE,
+        params: { url: 'javascript:alert(1)' },
+      },
+      h.proxy,
+    );
+    expect(h.openLink).toHaveBeenCalledTimes(1);
+    expect(h.proxy.byId('l2').error.code).toBe(-32000);
+  });
+
+  it('answers ui/request-display-mode with the resulting mode', () => {
+    handshake(h);
+    h.host.deliver(
+      {
+        jsonrpc: '2.0',
+        id: 'd1',
+        method: 'ui/request-display-mode',
+        nonce: NONCE,
+        params: { mode: 'fullscreen' },
+      },
+      h.proxy,
+    );
+    expect(h.proxy.byId('d1').result).toEqual({
+      mode: 'inline',
+    });
+  });
+
+  it('degrades ui/message + ui/update-model-context to method-not-found when the host lacks the deps', () => {
+    // Default harness wires no PR #6 deps → an older host that doesn't
+    // support these methods. Per JSON-RPC the App should get -32601 and
+    // fall back, regardless of payload.
+    handshake(h);
+    for (const [id, method] of [
+      ['m1', 'ui/message'],
+      ['m2', 'ui/update-model-context'],
+    ] as const) {
+      h.host.deliver(
+        { jsonrpc: '2.0', id, method, nonce: NONCE, params: {} },
+        h.proxy,
+      );
+      expect(h.proxy.byId(id).error.code).toBe(-32601);
+    }
+  });
+
+  describe('PR #6 — implemented (deps wired)', () => {
+    let p: Harness;
+    beforeEach(() => {
+      p = makeBridge({ pr6: true });
+      handshake(p);
+    });
+
+    it('ui/message relays a valid user text as a real turn', async () => {
+      p.host.deliver(
+        {
+          jsonrpc: '2.0',
+          id: 'm1',
+          method: 'ui/message',
+          nonce: NONCE,
+          params: { role: 'user', content: { type: 'text', text: '  hi  ' } },
+        },
+        p.proxy,
+      );
+      await Promise.resolve();
+      await Promise.resolve();
+      expect(p.sendMessage).toHaveBeenCalledWith('hi');
+      expect(p.proxy.byId('m1').result).toEqual({});
+    });
+
+    it('ui/message with bad params is rejected (-32000), not relayed', () => {
+      p.host.deliver(
+        {
+          jsonrpc: '2.0',
+          id: 'm2',
+          method: 'ui/message',
+          nonce: NONCE,
+          params: { role: 'assistant' },
+        },
+        p.proxy,
+      );
+      expect(p.sendMessage).not.toHaveBeenCalled();
+      expect(p.proxy.byId('m2').error.code).toBe(-32000);
+    });
+
+    it('ui/update-model-context relays structuredContent', async () => {
+      p.host.deliver(
+        {
+          jsonrpc: '2.0',
+          id: 'c1',
+          method: 'ui/update-model-context',
+          nonce: NONCE,
+          params: { structuredContent: { picked: 'X' } },
+        },
+        p.proxy,
+      );
+      await Promise.resolve();
+      await Promise.resolve();
+      expect(p.updateModelContext).toHaveBeenCalledWith({
+        content: undefined,
+        structuredContent: { picked: 'X' },
+      });
+      expect(p.proxy.byId('c1').result).toEqual({});
+    });
+
+    it('ui/update-model-context with neither content nor structured is -32000', () => {
+      p.host.deliver(
+        {
+          jsonrpc: '2.0',
+          id: 'c2',
+          method: 'ui/update-model-context',
+          nonce: NONCE,
+          params: {},
+        },
+        p.proxy,
+      );
+      expect(p.updateModelContext).not.toHaveBeenCalled();
+      expect(p.proxy.byId('c2').error.code).toBe(-32000);
+    });
+
+    it('ui/open-link asks for consent and opens only when granted', async () => {
+      p.host.deliver(
+        {
+          jsonrpc: '2.0',
+          id: 'l1',
+          method: 'ui/open-link',
+          nonce: NONCE,
+          params: { url: 'https://example.com/x' },
+        },
+        p.proxy,
+      );
+      await Promise.resolve();
+      await Promise.resolve();
+      expect(p.requestConsent).toHaveBeenCalledWith({
+        kind: 'open-link',
+        url: 'https://example.com/x',
+      });
+      expect(p.openLink).toHaveBeenCalledWith('https://example.com/x');
+      expect(p.proxy.byId('l1').result).toEqual({});
+    });
+
+    it('ui/open-link denied → not opened, JSON-RPC error', async () => {
+      p.requestConsent.mockResolvedValueOnce(false);
+      p.host.deliver(
+        {
+          jsonrpc: '2.0',
+          id: 'l2',
+          method: 'ui/open-link',
+          nonce: NONCE,
+          params: { url: 'https://example.com/y' },
+        },
+        p.proxy,
+      );
+      await Promise.resolve();
+      await Promise.resolve();
+      expect(p.openLink).not.toHaveBeenCalled();
+      expect(p.proxy.byId('l2').error.code).toBe(-32000);
+    });
+
+    it('a rejected sendMessage surfaces a JSON-RPC error', async () => {
+      p.sendMessage.mockRejectedValueOnce(new Error('quota exceeded'));
+      p.host.deliver(
+        {
+          jsonrpc: '2.0',
+          id: 'm3',
+          method: 'ui/message',
+          nonce: NONCE,
+          params: { role: 'user', content: { type: 'text', text: 'go' } },
+        },
+        p.proxy,
+      );
+      await Promise.resolve();
+      await Promise.resolve();
+      const err = p.proxy.byId('m3').error;
+      expect(err.code).toBe(-32000);
+      expect(err.message).toBe('quota exceeded');
+    });
+  });
+
+  it('answers ping with an empty result', () => {
+    handshake(h);
+    h.host.deliver(
+      { jsonrpc: '2.0', id: 'p1', method: 'ping', nonce: NONCE },
+      h.proxy,
+    );
+    expect(h.proxy.byId('p1').result).toEqual({});
+  });
+
+  it('dispose() sends resource-teardown after init and detaches the listener', () => {
+    handshake(h);
+    h.host.deliver(
+      { jsonrpc: '2.0', method: 'ui/notifications/initialized', nonce: NONCE },
+      h.proxy,
+    );
+    h.bridge.dispose('bye');
+    const td = h.proxy.byMethod('ui/resource-teardown');
+    expect(td).toHaveLength(1);
+    expect(td[0].params).toEqual({ reason: 'bye' });
+    expect(h.host.attached).toBe(false);
+  });
+
+  it('queues host-context-changed before init and flushes after', () => {
+    handshake(h);
+    h.bridge.notifyHostContextChanged({ theme: 'light' });
+    expect(h.proxy.byMethod('ui/notifications/host-context-changed')).toHaveLength(0);
+    h.host.deliver(
+      { jsonrpc: '2.0', method: 'ui/notifications/initialized', nonce: NONCE },
+      h.proxy,
+    );
+    const hc = h.proxy.byMethod('ui/notifications/host-context-changed');
+    expect(hc).toHaveLength(1);
+    expect(hc[0].params).toEqual({ theme: 'light' });
+  });
+
+  // ── PR #5: app-initiated tools/call proxying ───────────────────────────
+
+  it('advertises serverTools only when a proxy is available', () => {
+    handshake(h);
+    h.host.deliver(
+      { jsonrpc: '2.0', id: 'i1', method: 'ui/initialize', nonce: NONCE, params: {} },
+      h.proxy,
+    );
+    expect(h.proxy.byId('i1').result.hostCapabilities.serverTools).toBeDefined();
+
+    const noProxy = makeBridge({ withProxy: false });
+    handshake(noProxy);
+    noProxy.host.deliver(
+      { jsonrpc: '2.0', id: 'i2', method: 'ui/initialize', nonce: NONCE, params: {} },
+      noProxy.proxy,
+    );
+    expect(
+      noProxy.proxy.byId('i2').result.hostCapabilities.serverTools,
+    ).toBeUndefined();
+  });
+
+  it('proxies a tools/call and returns the CallToolResult to the View', async () => {
+    handshake(h);
+    h.host.deliver(
+      {
+        jsonrpc: '2.0',
+        id: 'c1',
+        method: 'tools/call',
+        nonce: NONCE,
+        params: { name: 'widget_tool', arguments: { q: 'x' } },
+      },
+      h.proxy,
+    );
+    expect(h.proxyToolCall).toHaveBeenCalledWith('widget_tool', { q: 'x' });
+    // proxyToolCall is async; let the microtasks settle.
+    await Promise.resolve();
+    await Promise.resolve();
+    expect(h.proxy.byId('c1').result).toEqual({
+      content: [{ type: 'text', text: 'tool-ok' }],
+      isError: false,
+    });
+  });
+
+  it('answers tools/call with an error when the proxy rejects', async () => {
+    h.proxyToolCall.mockRejectedValueOnce(new Error('not app-visible'));
+    handshake(h);
+    h.host.deliver(
+      {
+        jsonrpc: '2.0',
+        id: 'c2',
+        method: 'tools/call',
+        nonce: NONCE,
+        params: { name: 'blocked', arguments: {} },
+      },
+      h.proxy,
+    );
+    await Promise.resolve();
+    await Promise.resolve();
+    expect(h.proxy.byId('c2').error.message).toBe('not app-visible');
+  });
+
+  it('rejects tools/call with no tool name', () => {
+    handshake(h);
+    h.host.deliver(
+      { jsonrpc: '2.0', id: 'c3', method: 'tools/call', nonce: NONCE, params: {} },
+      h.proxy,
+    );
+    expect(h.proxy.byId('c3').error.code).toBe(-32000);
+    expect(h.proxyToolCall).not.toHaveBeenCalled();
+  });
+
+  it('answers tools/call method-not-found when the host cannot proxy', () => {
+    const np = makeBridge({ withProxy: false });
+    handshake(np);
+    np.host.deliver(
+      {
+        jsonrpc: '2.0',
+        id: 'c4',
+        method: 'tools/call',
+        nonce: NONCE,
+        params: { name: 'widget_tool' },
+      },
+      np.proxy,
+    );
+    expect(np.proxy.byId('c4').error.code).toBe(-32601);
+  });
+});
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-bridge.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-bridge.ts
new file mode 100644
index 00000000..702b21e7
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-bridge.ts
@@ -0,0 +1,583 @@
+/**
+ * MCP Apps host bridge (SEP-1865), PR #4 of
+ * `docs/kaizen/scoping/mcp-apps-host-renderer.md`.
+ *
+ * The host half of the JSON-RPC-2.0-over-postMessage protocol. It talks to
+ * the deployed sandbox-proxy (`proxy.html`, a different origin); the proxy
+ * forwards to/from the App View running in its inner null-origin iframe.
+ *
+ * Security model (spec + scoping doc decision #1/#5):
+ *  - Outer proxy iframe lives at a dedicated origin; messages to/from it are
+ *    validated by `event.source === proxyWindow` AND
+ *    `event.origin === sandboxOrigin`.
+ *  - A per-frame nonce, minted here and handed to the proxy in
+ *    `sandbox-resource-ready`, authenticates every subsequent exchange (the
+ *    inner View is null-origin, so the nonce — not origin — is the real
+ *    check the spec mandates).
+ *  - The host MUST NOT send any request/notification toward the View before
+ *    it observes the View's `ui/notifications/initialized`; pre-init sends
+ *    are queued and flushed on initialized.
+ *
+ * Framework-free and DOM-light on purpose: `hostWindow` and the proxy window
+ * accessor are injected so specs drive it with plain fakes (no vi.mock of
+ * globals — house rule).
+ *
+ * PR #6 implements the remaining View→host methods: `ui/message` (relayed
+ * to the chat send path — a real streaming turn, identical to a typed
+ * message), `ui/update-model-context` (relayed to app-api → stashed on the
+ * agent's Strands state for the next turn), and `ui/open-link` consent
+ * gating (frontend-only — the request originates here, so there is no
+ * backend turn to pause; the host obtains an inline in-thread consent
+ * before opening). Each new dep is optional: when absent the bridge keeps
+ * PR #4/#5 behavior (method-not-found / direct open) so older hosts and
+ * specs degrade gracefully.
+ */
+
+import type {
+  UiResourceEvent,
+} from '../../../shared/utils/stream-parser';
+import {
+  ConsentRequest,
+  HostContext,
+  JsonRpcId,
+  JsonRpcMessage,
+  JsonRpcRequest,
+  JSONRPC_IMPL_ERROR,
+  JSONRPC_METHOD_NOT_FOUND,
+  M_HOST_CONTEXT_CHANGED,
+  M_MESSAGE,
+  M_OPEN_LINK,
+  M_PING,
+  M_REQUEST_DISPLAY_MODE,
+  M_RESOURCE_TEARDOWN,
+  M_SANDBOX_PROXY_READY,
+  M_SANDBOX_RESOURCE_READY,
+  M_SIZE_CHANGED,
+  M_TOOLS_CALL,
+  M_TOOL_CANCELLED,
+  M_TOOL_INPUT,
+  M_TOOL_RESULT,
+  M_UI_INITIALIZE,
+  M_UI_INITIALIZED,
+  M_UPDATE_MODEL_CONTEXT,
+  MCP_UI_PROTOCOL_VERSION,
+  MessageParams,
+  SizeChangedParams,
+  UpdateModelContextParams,
+  isJsonRpc,
+  isRequest,
+} from './mcp-app-protocol';
+
+/** Minimal window surface the bridge needs (eases testing). */
+export interface BridgeHostWindow {
+  addEventListener(
+    type: 'message',
+    listener: (ev: MessageEvent) => void,
+  ): void;
+  removeEventListener(
+    type: 'message',
+    listener: (ev: MessageEvent) => void,
+  ): void;
+}
+
+export interface McpAppBridgeDeps {
+  /** The window that hosts the outer proxy iframe (for `message` events). */
+  hostWindow: BridgeHostWindow;
+  /** Lazily resolves the proxy iframe's `contentWindow` (null until load). */
+  getProxyWindow: () => Window | null;
+  /** Origin the proxy is served from; targetOrigin + inbound origin check. */
+  sandboxOrigin: string;
+  /** The resource (html/csp/permissions) to render. */
+  resource: UiResourceEvent;
+  /** Per-frame nonce (mint once per frame; never reused). */
+  nonce: string;
+  /** Complete tool-call arguments for `ui/notifications/tool-input`. */
+  getToolInput: () => Record<string, unknown>;
+  /** Tool result as an MCP `CallToolResult`, or null if not yet available. */
+  getToolResult: () => unknown | null;
+  /** Current host UI context (theme, displayMode, …). */
+  getHostContext: () => HostContext;
+  /** Open an external URL once consent (if wired) is granted. */
+  openLink: (url: string) => void;
+  /**
+   * Proxy an App-initiated `tools/call` to the MCP server (PR #5) via
+   * app-api. Resolves with the `CallToolResult`; rejects with an Error
+   * whose message is safe to return to the App. Optional so older hosts
+   * (and tests) can omit it — absent ⇒ `tools/call` is method-not-found.
+   */
+  proxyToolCall?: (
+    toolName: string,
+    args: Record<string, unknown>,
+  ) => Promise<{ content: unknown[]; isError: boolean }>;
+  /**
+   * Relay `ui/message` as a real user turn (PR #6). The host treats it
+   * identically to a typed message — it starts a normal streaming turn.
+   * Absent (older hosts / specs) ⇒ `ui/message` is method-not-found.
+   */
+  sendMessage?: (text: string) => Promise<void>;
+  /**
+   * Relay `ui/update-model-context` (PR #6) to app-api, which stashes it
+   * on the conversation agent's state for the next turn. Resolves on
+   * acceptance; rejects with a safe Error message. Absent ⇒
+   * `ui/update-model-context` is method-not-found.
+   */
+  updateModelContext?: (
+    payload: UpdateModelContextParams,
+  ) => Promise<void>;
+  /**
+   * Obtain user consent for an App-initiated action (PR #6, frontend-only
+   * — no backend turn to pause). Resolves `true` to proceed. Absent ⇒ the
+   * bridge keeps PR #4 behavior and opens links directly (back-compat for
+   * older hosts / specs). Capability consent is handled host-side before
+   * the frame renders, so the bridge only ever asks for `open-link`.
+   */
+  requestConsent?: (req: ConsentRequest) => Promise<boolean>;
+  /** Non-fatal diagnostics (validation drops, protocol slips). */
+  onWarn?: (message: string) => void;
+}
+
+export class McpAppBridge {
+  private readonly d: McpAppBridgeDeps;
+  private listener: ((ev: MessageEvent) => void) | null = null;
+
+  /** Set once `sandbox-resource-ready` is sent — nonce now required. */
+  private nonceArmed = false;
+  /** Set on the View's `ui/notifications/initialized`. */
+  private viewInitialized = false;
+  private disposed = false;
+
+  /** Notifications deferred until the View reports `initialized`. */
+  private readonly preInitQueue: Array<{ method: string; params: unknown }> = [];
+
+  /** Whether tool-input was already pushed (spec: at most once). */
+  private toolInputSent = false;
+
+  /** Pending host→View requests awaiting a JSON-RPC response, by id. */
+  private readonly pending = new Map<
+    JsonRpcId,
+    { resolve: (v: unknown) => void; reject: (e: unknown) => void }
+  >();
+  private nextRequestId = 1;
+
+  constructor(deps: McpAppBridgeDeps) {
+    this.d = deps;
+  }
+
+  /** Begin listening. Call once the outer iframe element exists. */
+  start(): void {
+    if (this.listener) return;
+    this.listener = (ev: MessageEvent) => this.onMessage(ev);
+    this.d.hostWindow.addEventListener('message', this.listener);
+  }
+
+  /**
+   * Tear down: best-effort `ui/resource-teardown` toward the View, then
+   * detach. Safe to call multiple times.
+   */
+  dispose(reason = 'host-teardown'): void {
+    if (this.disposed) return;
+    this.disposed = true;
+    if (this.viewInitialized) {
+      // Fire-and-forget: we're going away regardless of the ack.
+      this.sendRequest(M_RESOURCE_TEARDOWN, { reason }).catch(() => undefined);
+    }
+    if (this.listener) {
+      this.d.hostWindow.removeEventListener('message', this.listener);
+      this.listener = null;
+    }
+    for (const { reject } of this.pending.values()) {
+      reject(new Error('bridge disposed'));
+    }
+    this.pending.clear();
+  }
+
+  /** Push a `host-context-changed` partial (e.g., theme toggle). */
+  notifyHostContextChanged(partial: Partial<HostContext>): void {
+    this.sendNotification(M_HOST_CONTEXT_CHANGED, partial);
+  }
+
+  // --- inbound ------------------------------------------------------------
+
+  private onMessage(ev: MessageEvent): void {
+    if (this.disposed) return;
+    const proxyWindow = this.d.getProxyWindow();
+    // Source + origin gate. The proxy page is served from sandboxOrigin, so
+    // its window's origin is a real URL (the null-origin inner frame only
+    // ever talks to the proxy, never to us).
+    if (!proxyWindow || ev.source !== proxyWindow) return;
+    if (ev.origin !== this.d.sandboxOrigin) return;
+
+    const data = ev.data;
+    if (!isJsonRpc(data)) return;
+
+    // Nonce gate: armed the moment we hand the proxy the nonce. The only
+    // legitimately pre-nonce message is the proxy's first ready ping.
+    const method = 'method' in data ? (data as { method?: string }).method : undefined;
+    if (this.nonceArmed) {
+      if ((data as { nonce?: string }).nonce !== this.d.nonce) {
+        this.d.onWarn?.(`dropped message with bad/absent nonce (${method ?? 'response'})`);
+        return;
+      }
+    } else if (method !== M_SANDBOX_PROXY_READY) {
+      this.d.onWarn?.(`dropped pre-handshake message (${method ?? 'response'})`);
+      return;
+    }
+
+    this.route(data);
+  }
+
+  private route(msg: JsonRpcMessage): void {
+    // Response to a host-initiated request (e.g. resource-teardown).
+    if (!('method' in msg) && 'id' in msg) {
+      const p = this.pending.get(msg.id);
+      if (!p) return;
+      this.pending.delete(msg.id);
+      if ('error' in msg) p.reject(msg.error);
+      else p.resolve((msg as { result: unknown }).result);
+      return;
+    }
+
+    const method = (msg as { method: string }).method;
+    switch (method) {
+      case M_SANDBOX_PROXY_READY:
+        this.sendSandboxResourceReady();
+        return;
+
+      case M_UI_INITIALIZE:
+        this.handleInitialize(msg as JsonRpcRequest);
+        return;
+
+      case M_UI_INITIALIZED:
+        this.viewInitialized = true;
+        this.flushPreInit();
+        this.pushToolData();
+        return;
+
+      case M_SIZE_CHANGED: {
+        const p = (msg as { params?: SizeChangedParams }).params;
+        if (p && typeof p.height === 'number') {
+          this.sizeChangedCb?.(p.width, p.height);
+        }
+        return;
+      }
+
+      case M_PING:
+        if (isRequest(msg)) this.respond(msg.id, {});
+        return;
+
+      case M_TOOLS_CALL: {
+        if (!isRequest(msg)) return;
+        const p = msg.params as
+          | { name?: string; arguments?: Record<string, unknown> }
+          | undefined;
+        const name = p?.name;
+        if (typeof name !== 'string' || !name) {
+          this.respondError(msg.id, JSONRPC_IMPL_ERROR, 'Invalid tool name');
+          return;
+        }
+        if (!this.d.proxyToolCall) {
+          this.respondError(
+            msg.id,
+            JSONRPC_METHOD_NOT_FOUND,
+            'tools/call not supported by this host',
+          );
+          return;
+        }
+        const reqId = msg.id;
+        // app-api enforces auth + the conversation binding; inference-api
+        // is the authoritative spec-MUST app-visibility gate. We forward the
+        // CallToolResult back to the View verbatim (content + isError).
+        this.d
+          .proxyToolCall(name, p?.arguments ?? {})
+          .then((result) => this.respond(reqId, result))
+          .catch((err: unknown) =>
+            this.respondError(
+              reqId,
+              JSONRPC_IMPL_ERROR,
+              err instanceof Error ? err.message : 'Tool call failed',
+            ),
+          );
+        return;
+      }
+
+      case M_OPEN_LINK: {
+        if (!isRequest(msg)) return;
+        const url = (msg.params as { url?: string } | undefined)?.url;
+        if (typeof url !== 'string' || !/^https?:\/\//i.test(url)) {
+          this.respondError(msg.id, JSONRPC_IMPL_ERROR, 'Invalid URL');
+          return;
+        }
+        const reqId = msg.id;
+        if (!this.d.requestConsent) {
+          // No consent gate wired (older host / specs): PR #4 behavior.
+          this.d.openLink(url);
+          this.respond(reqId, {});
+          return;
+        }
+        // PR #6: frontend-only consent. Hold the JSON-RPC response open
+        // until the user answers the inline in-thread prompt.
+        this.d
+          .requestConsent({ kind: 'open-link', url })
+          .then((granted) => {
+            if (granted) {
+              this.d.openLink(url);
+              this.respond(reqId, {});
+            } else {
+              this.respondError(
+                reqId,
+                JSONRPC_IMPL_ERROR,
+                'User declined to open the link',
+              );
+            }
+          })
+          .catch(() =>
+            this.respondError(
+              reqId,
+              JSONRPC_IMPL_ERROR,
+              'Consent could not be obtained',
+            ),
+          );
+        return;
+      }
+
+      case M_REQUEST_DISPLAY_MODE: {
+        if (!isRequest(msg)) return;
+        // PR #4 host only renders inline; spec: MUST return the resulting
+        // mode (the current one when the request can't be honored).
+        this.respond(msg.id, { mode: 'inline' });
+        return;
+      }
+
+      case M_MESSAGE: {
+        if (!isRequest(msg)) return;
+        // Capability check before params: a host without the dep simply
+        // doesn't support the method (older host / tests degrade per
+        // JSON-RPC, regardless of payload).
+        if (!this.d.sendMessage) {
+          this.respondError(
+            msg.id,
+            JSONRPC_METHOD_NOT_FOUND,
+            'ui/message not supported by this host',
+          );
+          return;
+        }
+        const p = msg.params as Partial<MessageParams> | undefined;
+        const text =
+          p?.role === 'user' &&
+          p?.content?.type === 'text' &&
+          typeof p.content.text === 'string'
+            ? p.content.text.trim()
+            : '';
+        if (!text) {
+          this.respondError(
+            msg.id,
+            JSONRPC_IMPL_ERROR,
+            'Invalid ui/message params',
+          );
+          return;
+        }
+        const reqId = msg.id;
+        // Treated identically to a typed message: a real streaming turn.
+        this.d
+          .sendMessage(text)
+          .then(() => this.respond(reqId, {}))
+          .catch((err: unknown) =>
+            this.respondError(
+              reqId,
+              JSONRPC_IMPL_ERROR,
+              err instanceof Error ? err.message : 'Failed to send message',
+            ),
+          );
+        return;
+      }
+
+      case M_UPDATE_MODEL_CONTEXT: {
+        if (!isRequest(msg)) return;
+        if (!this.d.updateModelContext) {
+          this.respondError(
+            msg.id,
+            JSONRPC_METHOD_NOT_FOUND,
+            'ui/update-model-context not supported by this host',
+          );
+          return;
+        }
+        const p = msg.params as UpdateModelContextParams | undefined;
+        const hasContent =
+          Array.isArray(p?.content) && p!.content!.length > 0;
+        const hasStructured =
+          !!p?.structuredContent &&
+          typeof p.structuredContent === 'object';
+        if (!hasContent && !hasStructured) {
+          this.respondError(
+            msg.id,
+            JSONRPC_IMPL_ERROR,
+            'Invalid ui/update-model-context params',
+          );
+          return;
+        }
+        const reqId = msg.id;
+        this.d
+          .updateModelContext({
+            content: hasContent ? p!.content : undefined,
+            structuredContent: hasStructured
+              ? p!.structuredContent
+              : undefined,
+          })
+          .then(() => this.respond(reqId, {}))
+          .catch((err: unknown) =>
+            this.respondError(
+              reqId,
+              JSONRPC_IMPL_ERROR,
+              err instanceof Error
+                ? err.message
+                : 'Failed to update model context',
+            ),
+          );
+        return;
+      }
+
+      default:
+        // Unknown request → method-not-found; unknown notification → ignore.
+        if (isRequest(msg)) {
+          this.respondError(
+            msg.id,
+            JSONRPC_METHOD_NOT_FOUND,
+            `Unknown method: ${method}`,
+          );
+        }
+        return;
+    }
+  }
+
+  private handleInitialize(req: JsonRpcRequest): void {
+    // A response (not a request/notification toward the View) — allowed
+    // before `initialized`.
+    this.respond(req.id, {
+      protocolVersion: MCP_UI_PROTOCOL_VERSION,
+      hostInfo: { name: 'agentcore-public-stack', version: '1.0.0' },
+      hostCapabilities: {
+        openLinks: {},
+        // Only advertise serverTools when the host can actually proxy
+        // (PR #5). Absent ⇒ the App won't attempt tools/call.
+        ...(this.d.proxyToolCall ? { serverTools: {} } : {}),
+        sandbox: {
+          permissions: this.d.resource.permissions,
+          csp: this.d.resource.csp,
+        },
+      },
+      hostContext: {
+        ...this.d.getHostContext(),
+        displayMode: 'inline',
+        availableDisplayModes: ['inline'],
+      },
+    });
+  }
+
+  // --- outbound -----------------------------------------------------------
+
+  private sizeChangedCb: ((w: number, h: number) => void) | null = null;
+  /** Register the size-changed sink (the component resizes the iframe). */
+  onSizeChanged(cb: (w: number, h: number) => void): void {
+    this.sizeChangedCb = cb;
+  }
+
+  private sendSandboxResourceReady(): void {
+    // Reserved host→proxy notification; consumed by the proxy, not forwarded.
+    // Inner-iframe sandbox matches the ext-apps basic-host reference
+    // (`examples/basic-host/src/sandbox.ts`): allow-scripts + allow-same-origin
+    // + allow-forms. allow-same-origin is required for proxy.js to populate
+    // the inner doc via document.write and for typical App bundles
+    // (Excalidraw, Cesium, etc.) to access localStorage at the sandbox origin.
+    // The mcp-sandbox origin is a static CDN with no shared state, so this
+    // does not weaken the cross-origin boundary against the SPA. Apps wanting
+    // stricter isolation can opt into null-origin via `_meta.ui.sandbox` once
+    // that pass-through lands.
+    this.postToProxy({
+      jsonrpc: '2.0',
+      method: M_SANDBOX_RESOURCE_READY,
+      params: {
+        html: this.d.resource.html,
+        sandbox: 'allow-scripts allow-same-origin allow-forms',
+        csp: this.d.resource.csp,
+        permissions: this.d.resource.permissions,
+        nonce: this.d.nonce,
+      },
+    });
+    // From here on every inbound message MUST carry the nonce.
+    this.nonceArmed = true;
+  }
+
+  /** Push `tool-input` (once) then `tool-result` if it's available. */
+  private pushToolData(): void {
+    if (!this.toolInputSent) {
+      this.toolInputSent = true;
+      this.sendNotification(M_TOOL_INPUT, { arguments: this.d.getToolInput() });
+    }
+    const result = this.d.getToolResult();
+    if (result != null) {
+      this.sendNotification(M_TOOL_RESULT, result);
+    }
+  }
+
+  /** Re-push the tool result if it arrives/changes after init. */
+  refreshToolResult(): void {
+    if (!this.viewInitialized) return;
+    const result = this.d.getToolResult();
+    if (result != null) this.sendNotification(M_TOOL_RESULT, result);
+  }
+
+  notifyToolCancelled(reason: string): void {
+    this.sendNotification(M_TOOL_CANCELLED, { reason });
+  }
+
+  private sendNotification(method: string, params: unknown): void {
+    // Spec: the host MUST NOT send any request/notification toward the View
+    // before `initialized`. (The `sandbox-resource-ready` notification and
+    // the `ui/initialize` response are bootstrap — they go straight through
+    // `postToProxy`/`respond`, never here.)
+    if (!this.viewInitialized) {
+      this.preInitQueue.push({ method, params });
+      return;
+    }
+    this.postToProxy({ jsonrpc: '2.0', method, params, nonce: this.d.nonce });
+  }
+
+  private sendRequest(method: string, params: unknown): Promise<unknown> {
+    const id = `host-${this.nextRequestId++}`;
+    const promise = new Promise<unknown>((resolve, reject) => {
+      this.pending.set(id, { resolve, reject });
+    });
+    this.postToProxy({ jsonrpc: '2.0', id, method, params, nonce: this.d.nonce });
+    return promise;
+  }
+
+  private respond(id: JsonRpcId, result: unknown): void {
+    this.postToProxy({ jsonrpc: '2.0', id, result, nonce: this.d.nonce });
+  }
+
+  private respondError(id: JsonRpcId, code: number, message: string): void {
+    this.postToProxy({
+      jsonrpc: '2.0',
+      id,
+      error: { code, message },
+      nonce: this.d.nonce,
+    });
+  }
+
+  private flushPreInit(): void {
+    const queued = this.preInitQueue.splice(0, this.preInitQueue.length);
+    for (const { method, params } of queued) {
+      this.postToProxy({ jsonrpc: '2.0', method, params, nonce: this.d.nonce });
+    }
+  }
+
+  private postToProxy(msg: JsonRpcMessage): void {
+    const w = this.d.getProxyWindow();
+    if (!w) {
+      this.d.onWarn?.('proxy window unavailable; message dropped');
+      return;
+    }
+    // Strict targetOrigin: we know exactly where the proxy lives.
+    w.postMessage(msg, this.d.sandboxOrigin);
+  }
+}
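Reviewer note: the pre-`initialized` gating in `sendNotification` / `flushPreInit` above can be sketched in isolation as follows. This is a minimal, framework-free illustration, not the bridge itself: `QueuedNotification`, `QueueingSender`, `markInitialized`, and the `post` callback are hypothetical names standing in for the bridge's private members and `postToProxy`.

```typescript
// Hypothetical stand-ins for the bridge internals (assumptions, not the real API).
interface QueuedNotification {
  method: string;
  params: unknown;
}

class QueueingSender {
  private initialized = false;
  private readonly queue: QueuedNotification[] = [];
  private readonly post: (n: QueuedNotification) => void;

  constructor(post: (n: QueuedNotification) => void) {
    this.post = post;
  }

  /** Hold notifications until the View reports `initialized`, then send directly. */
  send(method: string, params: unknown): void {
    if (!this.initialized) {
      this.queue.push({ method, params });
      return;
    }
    this.post({ method, params });
  }

  /** Mark the View ready and drain the queue in arrival order. */
  markInitialized(): void {
    this.initialized = true;
    // splice(0) empties the array in place and returns the removed items,
    // so a re-entrant send() during the drain cannot be delivered twice.
    for (const n of this.queue.splice(0)) this.post(n);
  }
}
```

Draining with `splice` rather than iterating the live array is the same choice the bridge makes: the queue is empty before the first queued message is posted.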
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-card-http.service.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-card-http.service.ts
new file mode 100644
index 00000000..336c5d30
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-card-http.service.ts
@@ -0,0 +1,51 @@
+import { Injectable, inject } from '@angular/core';
+import { HttpClient } from '@angular/common/http';
+import { firstValueFrom } from 'rxjs';
+import { ConfigService } from '../../../services/config.service';
+import type { McpAppCard } from './mcp-app-card-state.service';
+
+interface McpAppCardDto {
+  cardId: string;
+  toolUseId: string;
+  toolName: string;
+  arguments?: Record<string, unknown>;
+  content?: Array<Record<string, unknown>>;
+  isError?: boolean;
+  createdAt: string;
+  producedByMessageIndex?: number | null;
+}
+
+interface McpAppCardListDto {
+  cards: McpAppCardDto[];
+}
+
+/**
+ * app-api client for Option A reload hydration (MCP Apps PR #6). Auth
+ * rides the BFF session cookie via the HttpClient pipeline (GET is a safe
+ * method — no CSRF needed), same as `ArtifactHttpService.listSessionArtifacts`.
+ */
+@Injectable({ providedIn: 'root' })
+export class McpAppCardHttpService {
+  private readonly http = inject(HttpClient);
+  private readonly config = inject(ConfigService);
+
+  /** List this user's persisted app-initiated tool cards for a session. */
+  async listSessionCards(sessionId: string): Promise<McpAppCard[]> {
+    const res = await firstValueFrom(
+      this.http.get<McpAppCardListDto>(
+        `${this.config.appApiUrl()}/mcp-apps/cards`,
+        { params: { session_id: sessionId } },
+      ),
+    );
+    return (res.cards ?? []).map((c) => ({
+      cardId: c.cardId,
+      toolUseId: c.toolUseId,
+      toolName: c.toolName,
+      arguments: c.arguments ?? {},
+      content: c.content ?? [],
+      isError: !!c.isError,
+      createdAt: c.createdAt,
+      producedByMessageIndex: c.producedByMessageIndex ?? null,
+    }));
+  }
+}
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-card-state.service.spec.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-card-state.service.spec.ts
new file mode 100644
index 00000000..5fa203a8
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-card-state.service.spec.ts
@@ -0,0 +1,57 @@
+import { describe, it, expect, beforeEach } from 'vitest';
+import {
+  McpAppCardStateService,
+  McpAppCard,
+} from './mcp-app-card-state.service';
+
+function card(over: Partial<McpAppCard> = {}): McpAppCard {
+  return {
+    cardId: 'c1',
+    toolUseId: 'tu1',
+    toolName: 'widget_tool',
+    arguments: {},
+    content: [{ type: 'text', text: 'ok' }],
+    isError: false,
+    createdAt: '2026-01-01T00:00:00Z',
+    producedByMessageIndex: null,
+    ...over,
+  };
+}
+
+describe('McpAppCardStateService', () => {
+  let svc: McpAppCardStateService;
+
+  beforeEach(() => {
+    svc = new McpAppCardStateService();
+  });
+
+  it('seeds cards and sorts oldest-first', () => {
+    svc.seedFromHydration([
+      card({ cardId: 'b', createdAt: '2026-01-02T00:00:00Z' }),
+      card({ cardId: 'a', createdAt: '2026-01-01T00:00:00Z' }),
+    ]);
+    expect(svc.hasCards()).toBe(true);
+    expect(svc.cards().map((c) => c.cardId)).toEqual(['a', 'b']);
+  });
+
+  it('seedFromHydration is non-clobbering by cardId', () => {
+    svc.seedFromHydration([card({ cardId: 'a', toolName: 'first' })]);
+    // A later (e.g. slower) response must not overwrite an existing card.
+    svc.seedFromHydration([card({ cardId: 'a', toolName: 'second' })]);
+    expect(svc.cards()).toHaveLength(1);
+    expect(svc.cards()[0].toolName).toBe('first');
+  });
+
+  it('empty seed is a no-op', () => {
+    svc.seedFromHydration([]);
+    expect(svc.hasCards()).toBe(false);
+    expect(svc.cards()).toEqual([]);
+  });
+
+  it('reset() clears all cards', () => {
+    svc.seedFromHydration([card()]);
+    svc.reset();
+    expect(svc.hasCards()).toBe(false);
+    expect(svc.cards()).toEqual([]);
+  });
+});
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-card-state.service.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-card-state.service.ts
new file mode 100644
index 00000000..15e3c7a6
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-card-state.service.ts
@@ -0,0 +1,67 @@
+import { Injectable, computed, signal } from '@angular/core';
+
+/**
+ * A persisted app-initiated tool-call card (MCP Apps PR #6, Option A).
+ * Mirrors the backend `card_store` record (key attrs stripped).
+ */
+export interface McpAppCard {
+  cardId: string;
+  toolUseId: string;
+  toolName: string;
+  arguments: Record<string, unknown>;
+  content: Array<Record<string, unknown>>;
+  isError: boolean;
+  createdAt: string;
+  producedByMessageIndex?: number | null;
+}
+
+/**
+ * Reload hydration for app-initiated tool-call cards (MCP Apps PR #6).
+ *
+ * PR #5's broker is in-memory: an App's `tools/call` surfaces *live* as
+ * `tool_use`/`tool_result` in the thread, but those synthesized events are
+ * never persisted to AgentCore Memory (persisting a synthetic tool turn
+ * would break Bedrock role alternation). So on a page reload they vanish.
+ * Option A closes that gap exactly like the Artifacts feature: app-api
+ * persists a side-channel provenance record and the SPA replays it here as
+ * a **static historical card** (the App iframe itself is not
+ * re-instantiated on reload — the realistic target is a read-only record).
+ *
+ * Structural sibling of `ArtifactStateService` / `McpAppStateService`: a
+ * session-load `seedFromHydration` path and a `reset()` the session page
+ * calls on conversation change. There is deliberately no `recordLive`:
+ * live app-initiated calls already render through the normal tool path
+ * (PR #5), so a live card would double-render.
+ */
+@Injectable({ providedIn: 'root' })
+export class McpAppCardStateService {
+  private readonly byId = signal<ReadonlyMap<string, McpAppCard>>(new Map());
+
+  /** Cards for the current conversation, oldest-first (stable order). */
+  readonly cards = computed<McpAppCard[]>(() =>
+    [...this.byId().values()].sort((a, b) =>
+      a.createdAt < b.createdAt ? -1 : a.createdAt > b.createdAt ? 1 : 0,
+    ),
+  );
+
+  readonly hasCards = computed(() => this.byId().size > 0);
+
+  /**
+   * Seed cards fetched from the app-api list endpoint on conversation
+   * load. Non-clobbering by `cardId` so a slow response can't undo state
+   * (matches `ArtifactStateService.seedFromHydration` semantics).
+   */
+  seedFromHydration(list: readonly McpAppCard[]): void {
+    if (!list.length) return;
+    const next = new Map(this.byId());
+    for (const card of list) {
+      if (!next.has(card.cardId)) next.set(card.cardId, card);
+    }
+    this.byId.set(next);
+  }
+
+  /** Drop all cards — called on conversation change. */
+  reset(): void {
+    if (this.byId().size) this.byId.set(new Map());
+  }
+}
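Reviewer note: the non-clobbering seed and oldest-first ordering in `McpAppCardStateService` can be exercised without Angular signals. The sketch below assumes a reduced `Card` shape and two hypothetical helpers, `seedNonClobbering` and `oldestFirst`; neither exists in this PR. The plain string comparison gives chronological order only if every `createdAt` uses the same ISO 8601 format, timezone, and precision, which is an assumption about the backend and not something this diff confirms.

```typescript
// Reduced, hypothetical card shape for illustration only.
interface Card {
  cardId: string;
  createdAt: string; // assumed: uniform ISO 8601 UTC timestamps
  toolName: string;
}

/** Merge `incoming` into a copy of `current`; existing cardIds always win. */
function seedNonClobbering(
  current: ReadonlyMap<string, Card>,
  incoming: readonly Card[],
): Map<string, Card> {
  const next = new Map(current);
  for (const card of incoming) {
    if (!next.has(card.cardId)) next.set(card.cardId, card);
  }
  return next;
}

/** Sort by createdAt ascending using plain string comparison. */
function oldestFirst(cards: Iterable<Card>): Card[] {
  return Array.from(cards).sort((a, b) =>
    a.createdAt < b.createdAt ? -1 : a.createdAt > b.createdAt ? 1 : 0,
  );
}
```

Returning a fresh `Map` rather than mutating in place mirrors how the service replaces the signal value, so consumers see a new reference on every change.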
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-consent.service.spec.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-consent.service.spec.ts
new file mode 100644
index 00000000..6d45898b
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-consent.service.spec.ts
@@ -0,0 +1,65 @@
+import { describe, it, expect, beforeEach } from 'vitest';
+import { McpAppConsentService } from './mcp-app-consent.service';
+
+describe('McpAppConsentService', () => {
+  let svc: McpAppConsentService;
+
+  beforeEach(() => {
+    svc = new McpAppConsentService();
+  });
+
+  it('surfaces a pending prompt and resolves it on answer(true)', async () => {
+    const { id, granted } = svc.request({
+      kind: 'open-link',
+      url: 'https://x.test',
+    });
+    expect(svc.pending()).toHaveLength(1);
+    const entry = svc.pending()[0];
+    expect(entry.id).toBe(id);
+    expect(entry.request).toEqual({ kind: 'open-link', url: 'https://x.test' });
+
+    svc.answer(entry.id, true);
+    await expect(granted).resolves.toBe(true);
+    expect(svc.pending()).toHaveLength(0);
+  });
+
+  it('resolves false on deny', async () => {
+    const { granted } = svc.request({
+      kind: 'capabilities',
+      capabilities: ['microphone'],
+    });
+    svc.answer(svc.pending()[0].id, false);
+    await expect(granted).resolves.toBe(false);
+  });
+
+  it('answer() on an unknown id is a no-op', () => {
+    svc.request({ kind: 'open-link', url: 'https://x.test' });
+    svc.answer('nope', true);
+    expect(svc.pending()).toHaveLength(1);
+  });
+
+  it('reset() denies every open prompt (fail-closed) and clears them', async () => {
+    const a = svc.request({ kind: 'open-link', url: 'https://a.test' });
+    const b = svc.request({ kind: 'open-link', url: 'https://b.test' });
+    expect(svc.pending()).toHaveLength(2);
+
+    svc.reset();
+    expect(svc.pending()).toHaveLength(0);
+    await expect(a.granted).resolves.toBe(false);
+    await expect(b.granted).resolves.toBe(false);
+  });
+
+  it('keeps multiple prompts independently addressable', async () => {
+    const a = svc.request({ kind: 'open-link', url: 'https://a.test' });
+    const b = svc.request({ kind: 'open-link', url: 'https://b.test' });
+    const [ea, eb] = svc.pending();
+
+    svc.answer(eb.id, true);
+    await expect(b.granted).resolves.toBe(true);
+    expect(svc.pending()).toHaveLength(1);
+    expect(svc.pending()[0].id).toBe(ea.id);
+
+    svc.answer(ea.id, false);
+    await expect(a.granted).resolves.toBe(false);
+  });
+});
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-consent.service.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-consent.service.ts
new file mode 100644
index 00000000..9461362c
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-consent.service.ts
@@ -0,0 +1,76 @@
+import { Injectable, computed, signal } from '@angular/core';
+import type { ConsentRequest } from './mcp-app-protocol';
+
+/**
+ * Frontend-only consent for App-initiated actions (MCP Apps PR #6).
+ *
+ * Decision (`docs/kaizen/scoping/mcp-apps-host-renderer.md`, PR #6): unlike
+ * the OAuth-tool `oauth_required` SSE family — which exists because OAuth
+ * consent fires *inside a backend agent turn* and pauses it via a Strands
+ * interrupt — `ui/open-link` and capability requests originate from a
+ * postMessage on a possibly-idle iframe. There is no backend turn to
+ * pause, so routing through the backend would be pure round-trip
+ * ceremony. The host obtains consent entirely on the client: a pending
+ * request is surfaced as an inline in-thread prompt (structural sibling of
+ * `OAuthConsentService` + the unanchored-prompt strip) and the bridge's
+ * JSON-RPC response is held open until the user answers.
+ *
+ * Transient by design: prompts are interaction gates, not provenance, so
+ * they are NOT persisted and do not survive a reload (consistent with the
+ * App iframe itself not being re-instantiated on reload).
+ */
+export interface PendingConsent {
+  /** Stable id for `@for` tracking + resolution. */
+  id: string;
+  request: ConsentRequest;
+  receivedAt: number;
+  resolve: (granted: boolean) => void;
+}
+
+@Injectable({ providedIn: 'root' })
+export class McpAppConsentService {
+  private readonly pendingSignal = signal<readonly PendingConsent[]>([]);
+
+  /** Pending consent prompts; each App frame renders its own prompt by `id`. */
+  readonly pending = computed(() => this.pendingSignal());
+
+  private seq = 0;
+
+  /**
+   * Surface a consent prompt and resolve once the user answers. The bridge
+   * awaits `granted` before opening a link; the frame uses `id` to render
+   * the prompt inline next to its own iframe (PR #6 follow-up — replaces
+   * the unanchored message-list strip). If the conversation changes while
+   * a prompt is open, `reset()` resolves it as denied (fail-closed).
+   */
+  request(request: ConsentRequest): { id: string; granted: Promise<boolean> } {
+    const id = `mcp-consent-${++this.seq}`;
+    const granted = new Promise<boolean>((resolve) => {
+      const entry: PendingConsent = {
+        id,
+        request,
+        receivedAt: Date.now(),
+        resolve,
+      };
+      this.pendingSignal.set([...this.pendingSignal(), entry]);
+    });
+    return { id, granted };
+  }
+
+  /** User answered a prompt (Allow/Deny). Idempotent. */
+  answer(id: string, granted: boolean): void {
+    const entry = this.pendingSignal().find((p) => p.id === id);
+    if (!entry) return;
+    this.pendingSignal.set(this.pendingSignal().filter((p) => p.id !== id));
+    entry.resolve(granted);
+  }
+
+  /** Drop all prompts on conversation change — fail closed (deny). */
+  reset(): void {
+    const open = this.pendingSignal();
+    if (open.length) {
+      this.pendingSignal.set([]);
+      for (const e of open) e.resolve(false);
+    }
+  }
+}
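Reviewer note: the held-open promise pattern in `McpAppConsentService` (resolve on answer, deny everything on reset) can be sketched without signals. `ConsentGate` and `pendingCount` below are hypothetical names for illustration; the real service also stores the request payload and a timestamp, which are omitted here.

```typescript
type Resolver = (granted: boolean) => void;

// Hypothetical, framework-free analogue of the consent service.
class ConsentGate {
  private readonly open = new Map<string, Resolver>();
  private seq = 0;

  /** Register a prompt; the returned promise settles when it is answered. */
  request(): { id: string; granted: Promise<boolean> } {
    const id = `consent-${++this.seq}`;
    // The executor runs synchronously, so the resolver is stored before return.
    const granted = new Promise<boolean>((resolve) => {
      this.open.set(id, resolve);
    });
    return { id, granted };
  }

  /** Resolve one prompt; unknown or already-answered ids are ignored. */
  answer(id: string, granted: boolean): void {
    const resolve = this.open.get(id);
    if (!resolve) return;
    this.open.delete(id);
    resolve(granted);
  }

  /** Fail closed: every still-open prompt resolves as denied. */
  reset(): void {
    const resolvers = Array.from(this.open.values());
    this.open.clear();
    for (const resolve of resolvers) resolve(false);
  }

  get pendingCount(): number {
    return this.open.size;
  }
}
```

The promise never rejects, so a caller awaiting `granted` only has to branch on a boolean, and an abandoned conversation cannot leave a link-open request hanging.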
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-message.service.spec.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-message.service.spec.ts
new file mode 100644
index 00000000..8c972e5b
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-message.service.spec.ts
@@ -0,0 +1,98 @@
+import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest';
+import { TestBed } from '@angular/core/testing';
+import { provideHttpClient } from '@angular/common/http';
+import {
+  HttpTestingController,
+  provideHttpClientTesting,
+} from '@angular/common/http/testing';
+import { signal } from '@angular/core';
+import { McpAppMessageService } from './mcp-app-message.service';
+import { ConfigService } from '../../../services/config.service';
+import { SessionService as BffSessionService } from '../../../auth/session.service';
+import { SessionService } from '../session/session.service';
+import { ToolService } from '../../../services/tool/tool.service';
+import { ModelService } from '../model/model.service';
+
+describe('McpAppMessageService', () => {
+  let svc: McpAppMessageService;
+  let httpMock: HttpTestingController;
+  let usingDefault = false;
+
+  beforeEach(() => {
+    usingDefault = false;
+    TestBed.configureTestingModule({
+      providers: [
+        provideHttpClient(),
+        provideHttpClientTesting(),
+        McpAppMessageService,
+        { provide: ConfigService, useValue: { appApiUrl: signal('http://localhost:8000') } },
+        { provide: BffSessionService, useValue: { csrfHeaders: vi.fn().mockReturnValue({ 'X-CSRF-Token': 't' }) } },
+        { provide: SessionService, useValue: { currentSession: signal({ sessionId: 's1' }) } },
+        { provide: ToolService, useValue: { getEnabledToolIds: vi.fn().mockReturnValue(['gw_x']) } },
+        {
+          provide: ModelService,
+          useValue: {
+            isUsingDefaultModel: () => usingDefault,
+            selectedModel: signal({ modelId: 'm1' }),
+          },
+        },
+      ],
+    });
+    svc = TestBed.inject(McpAppMessageService);
+    httpMock = TestBed.inject(HttpTestingController);
+  });
+
+  afterEach(() => httpMock.verify());
+
+  it('posts the conversation binding + resourceUri + payload', async () => {
+    const promise = svc.updateModelContext('ui://srv/w', {
+      structuredContent: { picked: 'X' },
+    });
+
+    const req = httpMock.expectOne(
+      'http://localhost:8000/mcp-apps/update-context',
+    );
+    expect(req.request.method).toBe('POST');
+    expect(req.request.withCredentials).toBe(true);
+    expect(req.request.headers.get('X-CSRF-Token')).toBe('t');
+    expect(req.request.body).toEqual({
+      sessionId: 's1',
+      resourceUri: 'ui://srv/w',
+      content: null,
+      structuredContent: { picked: 'X' },
+      enabledTools: ['gw_x'],
+      modelId: 'm1',
+    });
+
+    req.flush({ resourceUri: 'ui://srv/w', status: 'stored' });
+    await expect(promise).resolves.toBeUndefined();
+  });
+
+  it('sends modelId null when using the default model', async () => {
+    usingDefault = true;
+    const promise = svc.updateModelContext('ui://srv/w', {
+      content: [{ type: 'text', text: 'note' }],
+    });
+    const req = httpMock.expectOne(
+      'http://localhost:8000/mcp-apps/update-context',
+    );
+    expect(req.request.body.modelId).toBeNull();
+    expect(req.request.body.content).toEqual([{ type: 'text', text: 'note' }]);
+    req.flush({ status: 'stored' });
+    await promise;
+  });
+
+  it('rejects with the inference error message on a non-2xx', async () => {
+    const promise = svc.updateModelContext('ui://srv/w', {
+      structuredContent: {},
+    });
+    const req = httpMock.expectOne(
+      'http://localhost:8000/mcp-apps/update-context',
+    );
+    req.flush(
+      { error: 'needs content' },
+      { status: 400, statusText: 'Bad Request' },
+    );
+    await expect(promise).rejects.toThrow('needs content');
+  });
+});
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-message.service.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-message.service.ts
new file mode 100644
index 00000000..a24ed296
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-message.service.ts
@@ -0,0 +1,77 @@
+import { HttpClient, HttpErrorResponse } from '@angular/common/http';
+import { Injectable, inject } from '@angular/core';
+import { firstValueFrom } from 'rxjs';
+import { ConfigService } from '../../../services/config.service';
+import { SessionService as BffSessionService } from '../../../auth/session.service';
+import { ToolService } from '../../../services/tool/tool.service';
+import { SessionService } from '../session/session.service';
+import { ModelService } from '../model/model.service';
+import type { UpdateModelContextParams } from './mcp-app-protocol';
+
+/**
+ * App-pushed `ui/update-model-context` relay (MCP Apps PR #6).
+ *
+ * Structural sibling of {@link McpAppProxyService}: it owns the authenticated
+ * POST to app-api `/mcp-apps/update-context` (session cookie + CSRF) and
+ * assembles the conversation binding so inference-api rebuilds the same cached
+ * agent and stashes the payload on its Strands state for the next turn.
+ * Kept as a service so the bridge stays framework-free — the component
+ * injects this and hands the bridge a plain `updateModelContext` callback.
+ *
+ * `ui/message` is intentionally NOT here: it is "treated identically to a
+ * typed message", so it goes through the normal chat send path
+ * (`ChatRequestService.submitChatRequest`), starting a real streaming turn.
+ */
+@Injectable({ providedIn: 'root' })
+export class McpAppMessageService {
+  private readonly http = inject(HttpClient);
+  private readonly config = inject(ConfigService);
+  private readonly bffSession = inject(BffSessionService);
+  private readonly toolService = inject(ToolService);
+  private readonly sessionService = inject(SessionService);
+  private readonly modelService = inject(ModelService);
+
+  /**
+   * Relay one model-context update. Resolves on acceptance; rejects with
+   * an Error whose message is safe to surface to the App.
+   *
+   * @param resourceUri the bound App resource (`ui://…`) — the host's
+   *   per-App dedupe key (last-write-wins between turns).
+   */
+  async updateModelContext(
+    resourceUri: string,
+    payload: UpdateModelContextParams,
+  ): Promise<void> {
+    const appApiUrl = this.config.appApiUrl();
+    const sessionId = this.sessionService.currentSession().sessionId;
+    if (!appApiUrl || !sessionId) {
+      throw new Error('No active conversation for the context update');
+    }
+
+    const usingDefault = this.modelService.isUsingDefaultModel();
+    const selected = this.modelService.selectedModel();
+    const body = {
+      sessionId,
+      resourceUri,
+      content: payload.content ?? null,
+      structuredContent: payload.structuredContent ?? null,
+      enabledTools: this.toolService.getEnabledToolIds(),
+      modelId: usingDefault ? null : (selected?.modelId ?? null),
+    };
+
+    try {
+      await firstValueFrom(
+        this.http.post(`${appApiUrl}/mcp-apps/update-context`, body, {
+          headers: this.bffSession.csrfHeaders(),
+          withCredentials: true,
+        }),
+      );
+    } catch (err) {
+      const message =
+        err instanceof HttpErrorResponse
+          ? (err.error?.error ?? `Context update failed (${err.status})`)
+          : 'Context update failed';
+      throw new Error(message);
+    }
+  }
+}
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-protocol.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-protocol.ts
new file mode 100644
index 00000000..fd76f0d4
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-protocol.ts
@@ -0,0 +1,195 @@
+/**
+ * MCP Apps (SEP-1865) host-half protocol types.
+ *
+ * JSON-RPC 2.0 envelopes exchanged over `postMessage` between this host and
+ * the sandbox-proxy (and, through it, the App View). Normative reference:
+ * `specification/2026-01-26/apps.mdx` in `modelcontextprotocol/ext-apps`.
+ *
+ * PR #4 of `docs/kaizen/scoping/mcp-apps-host-renderer.md`. Pure types +
+ * constants; no Angular, no DOM — kept framework-free so the bridge is
+ * unit-testable with plain fakes.
+ */
+
+import type { McpUiCsp, McpUiPermissions } from '../../../shared/utils/stream-parser';
+
+/** Protocol version this host implements (the normative dated spec). */
+export const MCP_UI_PROTOCOL_VERSION = '2026-01-26';
+
+/** JSON-RPC id (spec allows string or number). */
+export type JsonRpcId = string | number;
+
+export interface JsonRpcRequest {
+  jsonrpc: '2.0';
+  id: JsonRpcId;
+  method: string;
+  params?: unknown;
+  /** Per-frame transport nonce (host↔proxy auth; not part of JSON-RPC). */
+  nonce?: string;
+}
+
+export interface JsonRpcNotification {
+  jsonrpc: '2.0';
+  method: string;
+  params?: unknown;
+  nonce?: string;
+}
+
+export interface JsonRpcSuccess {
+  jsonrpc: '2.0';
+  id: JsonRpcId;
+  result: unknown;
+  nonce?: string;
+}
+
+export interface JsonRpcError {
+  jsonrpc: '2.0';
+  id: JsonRpcId;
+  error: { code: number; message: string; data?: unknown };
+  nonce?: string;
+}
+
+export type JsonRpcMessage =
+  | JsonRpcRequest
+  | JsonRpcNotification
+  | JsonRpcSuccess
+  | JsonRpcError;
+
+/** JSON-RPC reserved code for "method not found". */
+export const JSONRPC_METHOD_NOT_FOUND = -32601;
+/** Implementation-defined error (spec uses -32000 for ui/* denials). */
+export const JSONRPC_IMPL_ERROR = -32000;
+
+// --- Reserved sandbox-proxy methods (host ↔ proxy only) --------------------
+
+/** Sandbox proxy → host: proxy bootstrapped, ready for the resource. */
+export const M_SANDBOX_PROXY_READY = 'ui/notifications/sandbox-proxy-ready';
+/** Host → sandbox proxy: raw HTML + sandbox/CSP/permissions to load. */
+export const M_SANDBOX_RESOURCE_READY = 'ui/notifications/sandbox-resource-ready';
+
+// --- Lifecycle / app methods (host ↔ View, forwarded by the proxy) --------
+
+export const M_UI_INITIALIZE = 'ui/initialize';
+export const M_UI_INITIALIZED = 'ui/notifications/initialized';
+export const M_PING = 'ping';
+/** Standard MCP method an App calls to invoke a server tool (PR #5). */
+export const M_TOOLS_CALL = 'tools/call';
+
+export const M_TOOL_INPUT = 'ui/notifications/tool-input';
+export const M_TOOL_INPUT_PARTIAL = 'ui/notifications/tool-input-partial';
+export const M_TOOL_RESULT = 'ui/notifications/tool-result';
+export const M_TOOL_CANCELLED = 'ui/notifications/tool-cancelled';
+export const M_RESOURCE_TEARDOWN = 'ui/resource-teardown';
+export const M_HOST_CONTEXT_CHANGED = 'ui/notifications/host-context-changed';
+export const M_SIZE_CHANGED = 'ui/notifications/size-changed';
+
+export const M_OPEN_LINK = 'ui/open-link';
+export const M_REQUEST_DISPLAY_MODE = 'ui/request-display-mode';
+export const M_MESSAGE = 'ui/message';
+export const M_UPDATE_MODEL_CONTEXT = 'ui/update-model-context';
+
+export type DisplayMode = 'inline' | 'fullscreen' | 'pip';
+
+/** Capabilities this host advertises to the View in `ui/initialize`. */
+export interface HostCapabilities {
+  /** Host can open external links (consent gating lands in PR #6). */
+  openLinks?: Record<string, never>;
+  /** Host can proxy the App's `tools/call` to the MCP server (PR #5). */
+  serverTools?: Record<string, never>;
+  /** Sandbox config the host applied (mirrors what we asked the proxy for). */
+  sandbox?: {
+    permissions?: McpUiPermissions;
+    csp?: McpUiCsp;
+  };
+}
+
+export interface HostContext {
+  theme?: 'light' | 'dark';
+  displayMode?: DisplayMode;
+  availableDisplayModes?: DisplayMode[];
+  locale?: string;
+  timeZone?: string;
+  userAgent?: string;
+}
+
+export interface McpUiInitializeResult {
+  protocolVersion: string;
+  hostCapabilities: HostCapabilities;
+  hostInfo: { name: string; version: string };
+  hostContext: HostContext;
+}
+
+/** Params for `ui/notifications/sandbox-resource-ready` (host → proxy). */
+export interface SandboxResourceReadyParams {
+  html: string;
+  /**
+   * Inner-iframe `sandbox` attribute. Host default matches the ext-apps
+   * basic-host reference: `allow-scripts allow-same-origin allow-forms`.
+   */
+  sandbox?: string;
+  csp?: McpUiCsp;
+  permissions?: McpUiPermissions;
+  /** Per-frame nonce the proxy must echo on every later message. */
+  nonce: string;
+}
+
+export interface SizeChangedParams {
+  width: number;
+  height: number;
+}
+
+export interface OpenLinkParams {
+  url: string;
+}
+
+export interface RequestDisplayModeParams {
+  mode: DisplayMode;
+}
+
+/** Spec shape of `ui/message` params (View → host). */
+export interface MessageParams {
+  role: 'user';
+  content: { type: 'text'; text: string };
+}
+
+/** Spec shape of `ui/update-model-context` params (View → host). */
+export interface UpdateModelContextParams {
+  content?: unknown[];
+  structuredContent?: Record<string, unknown>;
+}
+
+/** Sandbox capability keys an App may request (`_meta.ui.permissions`). */
+export type CapabilityKey =
+  | 'camera'
+  | 'microphone'
+  | 'geolocation'
+  | 'clipboardWrite';
+
+/**
+ * A user-consent decision the host must obtain before acting on an
+ * App-initiated request. PR #6 of
+ * `docs/kaizen/scoping/mcp-apps-host-renderer.md`, decision: consent is
+ * **frontend-only** — these requests originate from a postMessage on a
+ * possibly-idle iframe, so there is no backend agent turn to pause (unlike
+ * the OAuth-tool `oauth_required` SSE family). The host resolves them with
+ * an inline in-thread prompt and gates the bridge response on the answer.
+ */
+export type ConsentRequest =
+  | { kind: 'open-link'; url: string }
+  | { kind: 'capabilities'; capabilities: CapabilityKey[] };
+
+/** Narrowing helpers. */
+export function isJsonRpc(data: unknown): data is JsonRpcMessage {
+  return (
+    !!data &&
+    typeof data === 'object' &&
+    (data as { jsonrpc?: unknown }).jsonrpc === '2.0'
+  );
+}
+
+export function isRequest(m: JsonRpcMessage): m is JsonRpcRequest {
+  return 'method' in m && 'id' in m;
+}
+
+export function isNotification(m: JsonRpcMessage): m is JsonRpcNotification {
+  return 'method' in m && !('id' in m);
+}
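Reviewer note: the narrowing helpers above distinguish requests from notifications by the presence of `id`. A receiver typically also needs to recognise responses, so the sketch below shows one way to classify an untrusted `postMessage` payload into all three JSON-RPC 2.0 kinds. The `classify` function and `MessageKind` type are hypothetical and not part of this PR; the rules follow the JSON-RPC 2.0 envelope shapes declared in this file.

```typescript
type MessageKind = 'request' | 'notification' | 'response' | 'invalid';

/** Classify an untrusted payload by its JSON-RPC 2.0 envelope shape. */
function classify(data: unknown): MessageKind {
  if (!data || typeof data !== 'object') return 'invalid';
  const m = data as Record<string, unknown>;
  if (m['jsonrpc'] !== '2.0') return 'invalid';

  const hasMethod = typeof m['method'] === 'string';
  const hasId = 'id' in m;

  // method + id is a request; method without id is a notification.
  if (hasMethod) return hasId ? 'request' : 'notification';
  // No method: only a response (id plus result or error) is valid.
  if (hasId && ('result' in m || 'error' in m)) return 'response';
  return 'invalid';
}
```

Checking the envelope before dispatch keeps malformed or foreign messages out of the method handlers; origin and nonce validation would still be needed separately.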
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-proxy.service.spec.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-proxy.service.spec.ts
new file mode 100644
index 00000000..c1c764aa
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-proxy.service.spec.ts
@@ -0,0 +1,97 @@
+import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest';
+import { TestBed } from '@angular/core/testing';
+import { provideHttpClient } from '@angular/common/http';
+import {
+  HttpTestingController,
+  provideHttpClientTesting,
+} from '@angular/common/http/testing';
+import { signal } from '@angular/core';
+import { McpAppProxyService } from './mcp-app-proxy.service';
+import { ConfigService } from '../../../services/config.service';
+import { SessionService as BffSessionService } from '../../../auth/session.service';
+import { SessionService } from '../session/session.service';
+import { ToolService } from '../../../services/tool/tool.service';
+import { ModelService } from '../model/model.service';
+
+describe('McpAppProxyService', () => {
+  let svc: McpAppProxyService;
+  let httpMock: HttpTestingController;
+  let usingDefault = false;
+
+  beforeEach(() => {
+    usingDefault = false;
+    TestBed.configureTestingModule({
+      providers: [
+        provideHttpClient(),
+        provideHttpClientTesting(),
+        McpAppProxyService,
+        { provide: ConfigService, useValue: { appApiUrl: signal('http://localhost:8000') } },
+        { provide: BffSessionService, useValue: { csrfHeaders: vi.fn().mockReturnValue({ 'X-CSRF-Token': 't' }) } },
+        { provide: SessionService, useValue: { currentSession: signal({ sessionId: 's1' }) } },
+        { provide: ToolService, useValue: { getEnabledToolIds: vi.fn().mockReturnValue(['gw_x']) } },
+        {
+          provide: ModelService,
+          useValue: {
+            isUsingDefaultModel: () => usingDefault,
+            selectedModel: signal({ modelId: 'm1' }),
+          },
+        },
+      ],
+    });
+    svc = TestBed.inject(McpAppProxyService);
+    httpMock = TestBed.inject(HttpTestingController);
+  });
+
+  afterEach(() => httpMock.verify());
+
+  it('posts the conversation binding + directive and returns the result', async () => {
+    const promise = svc.proxyToolCall('tu-1', 'widget_tool', { q: 'x' });
+
+    const req = httpMock.expectOne(
+      'http://localhost:8000/mcp-apps/proxy-call',
+    );
+    expect(req.request.method).toBe('POST');
+    expect(req.request.withCredentials).toBe(true);
+    expect(req.request.headers.get('X-CSRF-Token')).toBe('t');
+    expect(req.request.body).toEqual({
+      sessionId: 's1',
+      toolUseId: 'tu-1',
+      toolName: 'widget_tool',
+      arguments: { q: 'x' },
+      enabledTools: ['gw_x'],
+      modelId: 'm1',
+    });
+
+    req.flush({
+      toolUseId: 'tu-1',
+      result: { content: [{ type: 'text', text: 'ok' }], isError: false },
+    });
+    await expect(promise).resolves.toEqual({
+      content: [{ type: 'text', text: 'ok' }],
+      isError: false,
+    });
+  });
+
+  it('sends modelId null when using the default model', async () => {
+    usingDefault = true;
+    const promise = svc.proxyToolCall('tu-1', 'widget_tool', {});
+    const req = httpMock.expectOne(
+      'http://localhost:8000/mcp-apps/proxy-call',
+    );
+    expect(req.request.body.modelId).toBeNull();
+    req.flush({ toolUseId: 'tu-1', result: { content: [], isError: false } });
+    await promise;
+  });
+
+  it('rejects with the inference error message on a non-2xx', async () => {
+    const promise = svc.proxyToolCall('tu-1', 'blocked', {});
+    const req = httpMock.expectOne(
+      'http://localhost:8000/mcp-apps/proxy-call',
+    );
+    req.flush(
+      { error: 'Tool is not callable from an MCP App' },
+      { status: 403, statusText: 'Forbidden' },
+    );
+    await expect(promise).rejects.toThrow('Tool is not callable from an MCP App');
+  });
+});
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-proxy.service.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-proxy.service.ts
new file mode 100644
index 00000000..5d76dc1c
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-proxy.service.ts
@@ -0,0 +1,80 @@
+import { HttpClient, HttpErrorResponse } from '@angular/common/http';
+import { Injectable, inject } from '@angular/core';
+import { firstValueFrom } from 'rxjs';
+import { ConfigService } from '../../../services/config.service';
+import { SessionService as BffSessionService } from '../../../auth/session.service';
+import { ToolService } from '../../../services/tool/tool.service';
+import { SessionService } from '../session/session.service';
+import { ModelService } from '../model/model.service';
+
+/** Shape returned to the bridge → forwarded to the App as the JSON-RPC
+ *  `tools/call` result (a minimal MCP `CallToolResult`). */
+export interface ProxyToolCallResult {
+  content: unknown[];
+  isError: boolean;
+}
+
+/**
+ * App-initiated `tools/call` proxy (MCP Apps PR #5).
+ *
+ * Centralizes the authenticated POST to app-api `/mcp-apps/proxy-call`
+ * (session cookie + CSRF, same pattern as the chat send path) and the
+ * conversation-binding assembly (sessionId + the conversation's enabled
+ * tools + model). Kept as an Angular service so the bridge stays
+ * framework-free: the component injects this and hands the bridge a plain
+ * `proxyToolCall` callback.
+ */
+@Injectable({ providedIn: 'root' })
+export class McpAppProxyService {
+  private readonly http = inject(HttpClient);
+  private readonly config = inject(ConfigService);
+  private readonly bffSession = inject(BffSessionService);
+  private readonly toolService = inject(ToolService);
+  private readonly sessionService = inject(SessionService);
+  private readonly modelService = inject(ModelService);
+
+  /**
+   * Relay one App-initiated tool call. Resolves with the CallToolResult;
+   * rejects with an Error (message safe to surface to the App) on a
+   * non-2xx response or transport failure.
+   */
+  async proxyToolCall(
+    toolUseId: string,
+    toolName: string,
+    args: Record<string, unknown>,
+  ): Promise<ProxyToolCallResult> {
+    const appApiUrl = this.config.appApiUrl();
+    const sessionId = this.sessionService.currentSession().sessionId;
+    if (!appApiUrl || !sessionId) {
+      throw new Error('No active conversation for the tool call');
+    }
+
+    const usingDefault = this.modelService.isUsingDefaultModel();
+    const selected = this.modelService.selectedModel();
+    const body = {
+      sessionId,
+      toolUseId,
+      toolName,
+      arguments: args ?? {},
+      enabledTools: this.toolService.getEnabledToolIds(),
+      modelId: usingDefault ? null : selected?.modelId ?? null,
+    };
+
+    try {
+      const resp = await firstValueFrom(
+        this.http.post<{ toolUseId: string; result: ProxyToolCallResult }>(
+          `${appApiUrl}/mcp-apps/proxy-call`,
+          body,
+          { headers: this.bffSession.csrfHeaders(), withCredentials: true },
+        ),
+      );
+      return resp.result;
+    } catch (err) {
+      const message =
+        err instanceof HttpErrorResponse
+          ? (err.error?.error ?? `Tool call failed (${err.status})`)
+          : 'Tool call failed';
+      throw new Error(message);
+    }
+  }
+}
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-state.service.spec.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-state.service.spec.ts
new file mode 100644
index 00000000..b3740027
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-state.service.spec.ts
@@ -0,0 +1,61 @@
+import { TestBed } from '@angular/core/testing';
+import { describe, it, expect, beforeEach } from 'vitest';
+import { McpAppStateService } from './mcp-app-state.service';
+import type { UiResourceEvent } from '../../../shared/utils/stream-parser';
+
+function ev(toolUseId: string, html = '<h1>hi</h1>'): UiResourceEvent {
+  return {
+    type: 'ui_resource',
+    toolUseId,
+    resourceUri: `ui://srv/${toolUseId}`,
+    html,
+    mimeType: 'text/html;profile=mcp-app',
+    csp: {},
+    permissions: {},
+    sandboxOrigin: 'https://mcp-sandbox.example.com',
+  };
+}
+
+describe('McpAppStateService', () => {
+  let svc: McpAppStateService;
+
+  beforeEach(() => {
+    TestBed.resetTestingModule();
+    TestBed.configureTestingModule({});
+    svc = TestBed.inject(McpAppStateService);
+  });
+
+  it('starts empty', () => {
+    expect(svc.hasApps()).toBe(false);
+    expect(svc.has('tu-1')).toBe(false);
+    expect(svc.get('tu-1')).toBeUndefined();
+  });
+
+  it('records and retrieves a resource by toolUseId', () => {
+    const e = ev('tu-1');
+    svc.recordLive(e);
+    expect(svc.has('tu-1')).toBe(true);
+    expect(svc.get('tu-1')).toEqual(e);
+    expect(svc.hasApps()).toBe(true);
+  });
+
+  it('last write wins for the same toolUseId', () => {
+    svc.recordLive(ev('tu-1', '<old>'));
+    svc.recordLive(ev('tu-1', '<new>'));
+    expect(svc.get('tu-1')?.html).toBe('<new>');
+  });
+
+  it('keeps distinct invocations separate', () => {
+    svc.recordLive(ev('tu-1'));
+    svc.recordLive(ev('tu-2'));
+    expect(svc.get('tu-1')?.resourceUri).toBe('ui://srv/tu-1');
+    expect(svc.get('tu-2')?.resourceUri).toBe('ui://srv/tu-2');
+  });
+
+  it('reset() drops everything (conversation teardown)', () => {
+    svc.recordLive(ev('tu-1'));
+    svc.reset();
+    expect(svc.hasApps()).toBe(false);
+    expect(svc.has('tu-1')).toBe(false);
+  });
+});
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-state.service.ts b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-state.service.ts
new file mode 100644
index 00000000..48f8a3f7
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/mcp-app-state.service.ts
@@ -0,0 +1,57 @@
+import { Injectable, computed, signal } from '@angular/core';
+import type { UiResourceEvent } from '../../../shared/utils/stream-parser';
+
+/**
+ * Per-conversation registry of MCP App UI resources (SEP-1865), keyed by the
+ * originating `toolUseId`. Structural sibling of `ArtifactStateService` /
+ * `CompactionSummaryService`: a live SSE path (`recordLive`) and a `reset()`
+ * the session page calls on conversation change.
+ *
+ * The `ui_resource` event is INLINE (it arrives right after its
+ * `tool_result`), so unlike artifacts there is no session-load hydration
+ * path — a refreshed conversation re-streams nothing, and a persisted MCP
+ * App is out of scope until a later PR. Iframes persist for the lifetime of
+ * the conversation per the scoping doc; teardown is on `reset()`.
+ *
+ * The whole surface is dark until the backend `AGENTCORE_MCP_APPS_HOST_ENABLED`
+ * flag is flipped (PR #7), so in practice nothing is recorded here yet.
+ */
+@Injectable({ providedIn: 'root' })
+export class McpAppStateService {
+  private readonly byToolUseId = signal<ReadonlyMap<string, UiResourceEvent>>(
+    new Map(),
+  );
+
+  /** True once any MCP App resource has been recorded this conversation. */
+  readonly hasApps = computed(() => this.byToolUseId().size > 0);
+
+  /**
+   * Record the UI resource for a tool invocation. Last write wins — a tool
+   * that re-emits for the same `toolUseId` replaces the prior resource
+   * (the iframe rebinds to the new HTML). New invocations get new ids.
+   */
+  recordLive(event: UiResourceEvent): void {
+    const next = new Map(this.byToolUseId());
+    next.set(event.toolUseId, event);
+    this.byToolUseId.set(next);
+  }
+
+  /** The UI resource for a tool invocation, or undefined. */
+  get(toolUseId: string): UiResourceEvent | undefined {
+    return this.byToolUseId().get(toolUseId);
+  }
+
+  /**
+   * Whether this tool invocation has an MCP App. Reads the backing signal,
+   * so a `computed()` that calls it stays reactive to the `ui_resource`
+   * event arriving after the tool-use block first renders.
+   */
+  has(toolUseId: string): boolean {
+    return this.byToolUseId().has(toolUseId);
+  }
+
+  /** Drop all resources — called on conversation change (teardown). */
+  reset(): void {
+    this.byToolUseId.set(new Map());
+  }
+}
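Reviewer note on the copy-on-write in `recordLive`: Angular signals compare values with `Object.is` by default, so mutating the existing `Map` in place and setting the same reference would most likely not notify dependents. The sketch below illustrates that rule with a hypothetical `TinyStore`. It is a stand-in with the same equality check, not Angular's API, and `recordInto` is an illustrative name that only mirrors the shape of `recordLive`.

```typescript
// Hypothetical stand-in for a signal: notifies only when the reference changes.
class TinyStore<T> {
  private value: T;
  private readonly listeners: Array<(v: T) => void> = [];

  constructor(initial: T) {
    this.value = initial;
  }

  get(): T {
    return this.value;
  }

  set(next: T): void {
    if (Object.is(next, this.value)) return; // same reference: nobody is notified
    this.value = next;
    for (const listener of this.listeners) listener(next);
  }

  subscribe(listener: (v: T) => void): void {
    this.listeners.push(listener);
  }
}

// Copy-on-write update, mirroring the shape of recordLive():
// clone the map, write the entry, publish the new reference.
function recordInto(
  store: TinyStore<ReadonlyMap<string, string>>,
  toolUseId: string,
  html: string,
): void {
  const next = new Map(store.get());
  next.set(toolUseId, html);
  store.set(next);
}
```

Cloning on every write costs one map copy per `ui_resource` event, which should be negligible given that a conversation holds only a handful of MCP App resources.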
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/proxy-url.spec.ts b/frontend/ai.client/src/app/session/services/mcp-apps/proxy-url.spec.ts
new file mode 100644
index 00000000..b7b393cc
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/proxy-url.spec.ts
@@ -0,0 +1,118 @@
+import { describe, expect, test } from 'vitest';
+import { buildProxyUrl } from './proxy-url';
+
+describe('buildProxyUrl', () => {
+  const ORIGIN = 'https://mcp-sandbox.example.com';
+
+  test('no csp at all → bare proxy.html (matches the default-CSP fast path)', () => {
+    expect(buildProxyUrl(ORIGIN, null)).toBe(
+      'https://mcp-sandbox.example.com/proxy.html',
+    );
+    expect(buildProxyUrl(ORIGIN, undefined)).toBe(
+      'https://mcp-sandbox.example.com/proxy.html',
+    );
+    expect(buildProxyUrl(ORIGIN, {})).toBe(
+      'https://mcp-sandbox.example.com/proxy.html',
+    );
+  });
+
+  test('declared-but-empty arrays → bare proxy.html (no ?csp=)', () => {
+    // The CloudFront Function would produce the default CSP for any of these,
+    // so sending ?csp= would just bust the CloudFront cache for no gain.
+    expect(
+      buildProxyUrl(ORIGIN, {
+        connectDomains: [],
+        resourceDomains: [],
+        frameDomains: [],
+        baseUriDomains: [],
+      }),
+    ).toBe('https://mcp-sandbox.example.com/proxy.html');
+  });
+
+  test('Excalidraw shape: connect+resource domains → ?csp= present and JSON-encoded', () => {
+    const url = buildProxyUrl(ORIGIN, {
+      connectDomains: ['https://esm.sh'],
+      resourceDomains: ['https://esm.sh'],
+    });
+    expect(url.startsWith('https://mcp-sandbox.example.com/proxy.html?csp=')).toBe(true);
+    const encoded = url.split('?csp=')[1];
+    const decoded = JSON.parse(decodeURIComponent(encoded));
+    expect(decoded).toEqual({
+      connectDomains: ['https://esm.sh'],
+      resourceDomains: ['https://esm.sh'],
+    });
+  });
+
+  test('CesiumJS map-server shape: multi-entry arrays survive round-trip intact', () => {
+    const csp = {
+      connectDomains: [
+        'https://*.openstreetmap.org',
+        'https://cesium.com',
+        'https://*.cesium.com',
+      ],
+      resourceDomains: [
+        'https://*.openstreetmap.org',
+        'https://cesium.com',
+        'https://*.cesium.com',
+      ],
+    };
+    const url = buildProxyUrl(ORIGIN, csp);
+    const encoded = url.split('?csp=')[1];
+    expect(JSON.parse(decodeURIComponent(encoded))).toEqual(csp);
+  });
+
+  test('any single declared array suffices to attach ?csp=', () => {
+    expect(buildProxyUrl(ORIGIN, { frameDomains: ['https://x.test'] })).toContain('?csp=');
+    expect(buildProxyUrl(ORIGIN, { baseUriDomains: ['https://x.test'] })).toContain(
+      '?csp=',
+    );
+    expect(buildProxyUrl(ORIGIN, { connectDomains: ['https://x.test'] })).toContain(
+      '?csp=',
+    );
+    expect(buildProxyUrl(ORIGIN, { resourceDomains: ['https://x.test'] })).toContain(
+      '?csp=',
+    );
+  });
+
+  test('trailing slash on origin is stripped (no double slash)', () => {
+    expect(buildProxyUrl('https://mcp-sandbox.example.com/', null)).toBe(
+      'https://mcp-sandbox.example.com/proxy.html',
+    );
+    expect(
+      buildProxyUrl('https://mcp-sandbox.example.com/', {
+        connectDomains: ['https://esm.sh'],
+      }),
+    ).toBe(
+      'https://mcp-sandbox.example.com/proxy.html?csp=' +
+        encodeURIComponent('{"connectDomains":["https://esm.sh"]}'),
+    );
+  });
+
+  test('non-object csp inputs degrade safely', () => {
+    // Defensive: SSE event payloads come from a typed channel but JS lets
+    // anything through at runtime. Anything not an object → no ?csp.
+    expect(buildProxyUrl(ORIGIN, 'not-an-object' as unknown as null)).toBe(
+      'https://mcp-sandbox.example.com/proxy.html',
+    );
+    expect(buildProxyUrl(ORIGIN, 42 as unknown as null)).toBe(
+      'https://mcp-sandbox.example.com/proxy.html',
+    );
+  });
+
+  test('encodes URL-special characters in domain entries', () => {
+    // The query value is the JSON serialization, then percent-encoded —
+    // so a `&` or `=` inside a domain entry (shouldn't happen, but still)
+    // gets encoded and can't break out of the query param.
+    const url = buildProxyUrl(ORIGIN, {
+      connectDomains: ['https://api.example.com?token=abc&other=xyz'],
+    });
+    const encoded = url.split('?csp=')[1];
+    // The encoded value contains neither a bare `&` nor a bare `=` — both
+    // would be percent-encoded (`%26`, `%3D`).
+    expect(encoded).not.toMatch(/&/);
+    expect(encoded).not.toMatch(/=/); // no literal `=` in the encoded value
+    expect(JSON.parse(decodeURIComponent(encoded))).toEqual({
+      connectDomains: ['https://api.example.com?token=abc&other=xyz'],
+    });
+  });
+});
diff --git a/frontend/ai.client/src/app/session/services/mcp-apps/proxy-url.ts b/frontend/ai.client/src/app/session/services/mcp-apps/proxy-url.ts
new file mode 100644
index 00000000..a399bfc7
--- /dev/null
+++ b/frontend/ai.client/src/app/session/services/mcp-apps/proxy-url.ts
@@ -0,0 +1,42 @@
+import type { McpUiCsp } from '../../../shared/utils/stream-parser';
+
+/**
+ * Build the proxy.html URL the `<mcp-app-frame>` iframe `src` is set to.
+ *
+ * The sandbox-proxy origin runs a CloudFront Function on viewer-response
+ * (`infrastructure/assets/mcp-sandbox/csp-function.js`) that composes the
+ * `Content-Security-Policy` header from a `?csp=` query parameter. Apps
+ * that declare `_meta.ui.csp` get a per-resource CSP that honors their
+ * declared `connectDomains` / `resourceDomains` / `frameDomains` /
+ * `baseUriDomains`; Apps that omit it (or declare an empty shape) get the
+ * default CSP from the function with no query string at all — matching
+ * the upstream `examples/basic-host/serve.ts` reference.
+ *
+ * We only attach `?csp=` when the resource actually declares at least one
+ * non-empty domain array. An empty `{}` or `{ connectDomains: [] }` would
+ * produce identical CSP output from the function either way, and omitting
+ * the param keeps CloudFront cache keys uniform across the no-declaration
+ * majority of Apps.
+ */
+export function buildProxyUrl(
+  sandboxOrigin: string,
+  csp: McpUiCsp | null | undefined,
+): string {
+  const base = `${sandboxOrigin.replace(/\/$/, '')}/proxy.html`;
+  if (!hasDeclaredDomains(csp)) return base;
+  return `${base}?csp=${encodeURIComponent(JSON.stringify(csp))}`;
+}
+
+function hasDeclaredDomains(csp: McpUiCsp | null | undefined): csp is McpUiCsp {
+  if (!csp || typeof csp !== 'object') return false;
+  return (
+    isNonEmptyArray(csp.connectDomains) ||
+    isNonEmptyArray(csp.resourceDomains) ||
+    isNonEmptyArray(csp.frameDomains) ||
+    isNonEmptyArray(csp.baseUriDomains)
+  );
+}
+
+function isNonEmptyArray(value: unknown): value is string[] {
+  return Array.isArray(value) && value.length > 0;
+}
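Reviewer note on the `?csp=` encoding: the value is JSON first, then percent-encoded, so a reader has to undo those steps in reverse order and should tolerate a missing or malformed parameter. The sketch below shows that round trip. `CspDeclaration`, `encodeCspParam`, and `readCspParam` are hypothetical names for illustration. This is not the code in `csp-function.js`, whose actual parsing is not shown in this diff.

```typescript
// Local stand-in for McpUiCsp (the real type lives in stream-parser).
interface CspDeclaration {
  connectDomains?: string[];
  resourceDomains?: string[];
  frameDomains?: string[];
  baseUriDomains?: string[];
}

// Same encoding order as buildProxyUrl: JSON first, then percent-encoding.
function encodeCspParam(csp: CspDeclaration): string {
  return encodeURIComponent(JSON.stringify(csp));
}

// Hypothetical reader: recover the declaration from a proxy URL, or return
// null when the parameter is absent or malformed.
function readCspParam(url: string): CspDeclaration | null {
  const marker = '?csp=';
  const at = url.indexOf(marker);
  if (at === -1) return null;
  try {
    const raw = url.slice(at + marker.length);
    const parsed: unknown = JSON.parse(decodeURIComponent(raw));
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
      return null;
    }
    return parsed as CspDeclaration;
  } catch {
    return null; // bad percent-escape or invalid JSON
  }
}
```

Returning `null` for every failure mode matches the "degrade safely" intent of the spec above: an unreadable declaration is treated the same as no declaration.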
diff --git a/frontend/ai.client/src/app/session/services/model/model.service.spec.ts b/frontend/ai.client/src/app/session/services/model/model.service.spec.ts
index aa77881f..7ed1edd7 100644
--- a/frontend/ai.client/src/app/session/services/model/model.service.spec.ts
+++ b/frontend/ai.client/src/app/session/services/model/model.service.spec.ts
@@ -3,12 +3,14 @@ import { TestBed } from '@angular/core/testing';
 import { HttpClientTestingModule, HttpTestingController } from '@angular/common/http/testing';
 import { ModelService } from './model.service';
 import { ConfigService } from '../../../services/config.service';
+import { UserSettingsService } from '../../../services/user-settings.service';
 import { ManagedModel } from '../../../admin/manage-models/models/managed-model.model';
 import { signal } from '@angular/core';
 
 describe('ModelService', () => {
   let service: ModelService;
   let httpMock: HttpTestingController;
+  let mockUserSettings: { fetchSettings: ReturnType<typeof vi.fn> };
 
   const mockModels: ManagedModel[] = [
     { id: 'm1', modelId: 'claude-haiku', modelName: 'Claude Haiku', provider: 'bedrock', providerName: 'Anthropic', inputModalities: ['TEXT'], outputModalities: ['TEXT'], maxInputTokens: 200000, maxOutputTokens: 4096, allowedAppRoles: [], availableToRoles: [], enabled: true, inputPricePerMillionTokens: 0.25, outputPricePerMillionTokens: 1.25, knowledgeCutoffDate: null, supportsCaching: true, isDefault: false },
@@ -27,11 +29,16 @@ describe('ModelService', () => {
       removeItem: vi.fn((k: string) => { delete sessionStore[k]; }),
     });
 
+    mockUserSettings = {
+      fetchSettings: vi.fn().mockResolvedValue({ defaultModelId: null }),
+    };
+
     TestBed.configureTestingModule({
       imports: [HttpClientTestingModule],
       providers: [
         ModelService,
         { provide: ConfigService, useValue: { appApiUrl: signal('http://localhost:8000') } },
+        { provide: UserSettingsService, useValue: mockUserSettings },
       ],
     });
 
@@ -80,6 +87,55 @@ describe('ModelService', () => {
       await promise;
       expect(service.selectedModel().modelId).toBe('claude-haiku');
     });
+
+    it('should apply user persisted defaultModelId when no session selection exists', async () => {
+      // Drop in-memory + sessionStorage so the user-default branch runs.
+      service['_selectedModel'].set(null);
+      service['usingDefaultModel'].set(true);
+      delete sessionStore['selectedModelId'];
+      mockUserSettings.fetchSettings.mockResolvedValueOnce({ defaultModelId: 'claude-haiku' });
+
+      const promise = service.loadModels();
+      await vi.waitFor(() => {
+        httpMock.expectOne('http://localhost:8000/models').flush(mockResponse);
+      });
+      await promise;
+
+      // claude-haiku wins even though claude-sonnet has isDefault=true,
+      // because the user's persisted preference is consulted first.
+      expect(service.selectedModel().modelId).toBe('claude-haiku');
+      expect(mockUserSettings.fetchSettings).toHaveBeenCalled();
+    });
+
+    it('should fall back to admin default when user setting is null', async () => {
+      service['_selectedModel'].set(null);
+      service['usingDefaultModel'].set(true);
+      delete sessionStore['selectedModelId'];
+      mockUserSettings.fetchSettings.mockResolvedValueOnce({ defaultModelId: null });
+
+      const promise = service.loadModels();
+      await vi.waitFor(() => {
+        httpMock.expectOne('http://localhost:8000/models').flush(mockResponse);
+      });
+      await promise;
+
+      expect(service.selectedModel().modelId).toBe('claude-sonnet'); // isDefault: true
+    });
+
+    it('should fall back to admin default when user setting points to a missing model', async () => {
+      service['_selectedModel'].set(null);
+      service['usingDefaultModel'].set(true);
+      delete sessionStore['selectedModelId'];
+      mockUserSettings.fetchSettings.mockResolvedValueOnce({ defaultModelId: 'no-longer-here' });
+
+      const promise = service.loadModels();
+      await vi.waitFor(() => {
+        httpMock.expectOne('http://localhost:8000/models').flush(mockResponse);
+      });
+      await promise;
+
+      expect(service.selectedModel().modelId).toBe('claude-sonnet');
+    });
   });
 
   describe('setSelectedModel', () => {
diff --git a/frontend/ai.client/src/app/session/services/model/model.service.ts b/frontend/ai.client/src/app/session/services/model/model.service.ts
index f599954c..9c39feb8 100644
--- a/frontend/ai.client/src/app/session/services/model/model.service.ts
+++ b/frontend/ai.client/src/app/session/services/model/model.service.ts
@@ -3,6 +3,7 @@ import { HttpClient } from '@angular/common/http';
 import { firstValueFrom } from 'rxjs';
 import { ConfigService } from '../../../services/config.service';
 import { ManagedModel } from '../../../admin/manage-models/models/managed-model.model';
+import { UserSettingsService } from '../../../services/user-settings.service';
 
 interface ManagedModelsListResponse {
   models: ManagedModel[];
@@ -15,6 +16,7 @@ interface ManagedModelsListResponse {
 export class ModelService {
   private http = inject(HttpClient);
   private config = inject(ConfigService);
+  private userSettings = inject(UserSettingsService);
   private readonly baseUrl = computed(() => `${this.config.appApiUrl()}/models`);
 
   // Session storage key for persisting model selection
@@ -141,10 +143,20 @@ export class ModelService {
           this._selectedModel.set(savedModel);
           this.usingDefaultModel.set(false);
         } else {
-          // Find admin-configured default model, or fall back to first available
-          const defaultModel = enabledModels.find(m => m.isDefault);
-          this._selectedModel.set(defaultModel || enabledModels[0]);
-          this.usingDefaultModel.set(false);
+          // Check the user's persisted default from the settings API before
+          // falling back to the admin-configured default. Settings live in
+          // DynamoDB and survive across sessions / browsers, whereas the
+          // session storage checked above is tab-scoped only.
+          const userDefaultModel = await this.findUserDefaultModel(enabledModels);
+          if (userDefaultModel) {
+            this._selectedModel.set(userDefaultModel);
+            this.usingDefaultModel.set(false);
+          } else {
+            // Find admin-configured default model, or fall back to first available
+            const defaultModel = enabledModels.find(m => m.isDefault);
+            this._selectedModel.set(defaultModel || enabledModels[0]);
+            this.usingDefaultModel.set(false);
+          }
         }
       } else {
         // No models available, use system default
@@ -247,6 +259,25 @@ export class ModelService {
     }
   }
 
+  /**
+   * Look up the user's persisted defaultModelId from the settings API and
+   * resolve it to a model in the supplied enabled list. Returns null when
+   * the user has no default set, the saved model is no longer available,
+   * or the settings call fails. Failures are logged and swallowed because
+   * the caller falls back to the admin default, then the first available.
+   */
+  private async findUserDefaultModel(enabledModels: ManagedModel[]): Promise<ManagedModel | null> {
+    try {
+      const settings = await this.userSettings.fetchSettings();
+      const id = settings?.defaultModelId;
+      if (!id) return null;
+      return enabledModels.find(m => m.modelId === id) ?? null;
+    } catch (e) {
+      console.warn('Could not load user settings to apply default model:', e);
+      return null;
+    }
+  }
+
   /**
    * Set a single inference param override on the currently selected model.
    * Pass `null` / `undefined` to clear the override and fall back to the
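Reviewer note on the resolution order this change introduces: once no session-storage selection applies, the user's persisted default wins if it is still enabled, then the admin default, then the first enabled model. The sketch below restates that order as a pure function. `ModelLite` and `resolveInitialModel` are hypothetical names, and the session-storage step that runs earlier in `loadModels` is deliberately left out.

```typescript
// Minimal stand-in for ManagedModel with only the fields this sketch needs.
interface ModelLite {
  modelId: string;
  isDefault: boolean;
}

// Hypothetical pure restatement of the fallback chain:
// user default (if still enabled) -> admin default -> first enabled model.
function resolveInitialModel(
  enabledModels: readonly ModelLite[],
  userDefaultId: string | null | undefined,
): ModelLite | null {
  if (enabledModels.length === 0) return null;
  const userDefault = userDefaultId
    ? enabledModels.find(m => m.modelId === userDefaultId)
    : undefined;
  return (
    userDefault ??
    enabledModels.find(m => m.isDefault) ??
    enabledModels[0] ??
    null
  );
}
```

A stale `defaultModelId` that points at a removed model simply falls through to the admin default, which is the behavior the third new spec case exercises.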
diff --git a/frontend/ai.client/src/app/session/services/models/message.model.ts b/frontend/ai.client/src/app/session/services/models/message.model.ts
index c468f9c0..5e3bf3aa 100644
--- a/frontend/ai.client/src/app/session/services/models/message.model.ts
+++ b/frontend/ai.client/src/app/session/services/models/message.model.ts
@@ -71,6 +71,13 @@ export interface ToolUseData {
   };
   /** Tool execution status */
   status?: 'pending' | 'complete' | 'error';
+  /**
+   * Best-effort decoded value of a long string input field (e.g. an artifact's
+   * `content`) extracted from the still-incomplete tool-call JSON while the
+   * model is streaming it. Used to give the user live "generating" feedback
+   * before the full tool input has arrived. Unset once the tool completes.
+   */
+  streamingContent?: string;
 }
 
 /**
diff --git a/frontend/ai.client/src/app/session/services/models/session-metadata.model.ts b/frontend/ai.client/src/app/session/services/models/session-metadata.model.ts
index 45ab2b45..f2dd8745 100644
--- a/frontend/ai.client/src/app/session/services/models/session-metadata.model.ts
+++ b/frontend/ai.client/src/app/session/services/models/session-metadata.model.ts
@@ -44,6 +44,10 @@ export interface SessionMetadata {
    *  summary in this session. Drives the end-of-conversation summary
    *  indicator after a refresh. */
   totalSummarizedTurns?: number;
+  /** True when the last turn ended in a recoverable max_tokens truncation.
+   *  Lets the "Continue" affordance survive a page refresh. Cleared
+   *  server-side at the start of any new (non-interrupt-resume) turn. */
+  lastTurnContinuable?: boolean;
 }
 
 // Request model for updating session metadata
diff --git a/frontend/ai.client/src/app/session/services/session/message-map.service.ts b/frontend/ai.client/src/app/session/services/session/message-map.service.ts
index d778518a..089c6722 100644
--- a/frontend/ai.client/src/app/session/services/session/message-map.service.ts
+++ b/frontend/ai.client/src/app/session/services/session/message-map.service.ts
@@ -21,6 +21,17 @@ export class MessageMapService {
   private messageMap = signal<MessageMap>({});
   private activeStreamSessionId = signal<string | null>(null);
 
+  /**
+   * Continuation-after-max_tokens state. A "Continue" turn has NO new user
+   * message, so the normal sync (which truncates back to the last user
+   * message and replaces everything after it) would discard the truncated
+   * partial + error bubbles and show only the continuation. In continuation
+   * mode we instead pin a stable prefix (the messages that existed when the
+   * continuation started) and append the continuation stream after it.
+   */
+  private continuationSessionId: string | null = null;
+  private continuationPrefix: Message[] = [];
+
   /**
    * Tracks which session is currently loading messages from the API.
    * Used to show skeleton loading state when navigating to a session.
@@ -66,6 +77,10 @@ export class MessageMapService {
   startStreaming(sessionId: string): void {
     this.activeStreamSessionId.set(sessionId);
 
+    // A normal turn uses the standard (truncate-to-last-user) sync.
+    this.continuationSessionId = null;
+    this.continuationPrefix = [];
+
     // Get current message count for this session to enable predictable ID generation
     const currentMessages = this.messageMap()[sessionId]?.() ?? [];
     const messageCount = currentMessages.length;
@@ -82,6 +97,31 @@ export class MessageMapService {
     }
   }
 
+  /**
+   * Start a "Continue" stream after a max_tokens truncation. Unlike
+   * {@link startStreaming}, the messages that already exist (including the
+   * truncated partial and the error bubble) are pinned as a stable prefix
+   * and the continuation stream is appended after them — nothing is
+   * truncated away. The parser is reset with the current message count so
+   * the continuation's message IDs follow the prefix instead of colliding.
+   */
+  beginContinuationStreaming(sessionId: string): void {
+    this.activeStreamSessionId.set(sessionId);
+
+    const currentMessages = this.messageMap()[sessionId]?.() ?? [];
+    this.continuationSessionId = sessionId;
+    this.continuationPrefix = [...currentMessages];
+
+    this.streamParser.reset(sessionId, currentMessages.length);
+
+    if (!this.messageMap()[sessionId]) {
+      this.messageMap.update(map => ({
+        ...map,
+        [sessionId]: signal<Message[]>([])
+      }));
+    }
+  }
+
   /**
    * End streaming for the current session.
    * Finalizes messages and clears streaming state.
@@ -89,7 +129,8 @@ export class MessageMapService {
   endStreaming(): void {
     const sessionId = this.activeStreamSessionId();
     if (sessionId) {
-      // Ensure final messages are synced
+      // Ensure final messages are synced (continuation merge still active
+      // here so the final flush keeps the pinned prefix).
       const finalMessages = this.streamParser.allMessages();
       if (finalMessages.length > 0) {
         this.syncStreamingMessages(sessionId, finalMessages);
@@ -97,6 +138,8 @@ export class MessageMapService {
     }
 
     this.activeStreamSessionId.set(null);
+    this.continuationSessionId = null;
+    this.continuationPrefix = [];
   }
 
   /**
@@ -214,6 +257,15 @@ export class MessageMapService {
     if (!sessionSignal) return;
 
     sessionSignal.update(existingMessages => {
+      // Continuation-after-max_tokens: there is no new user message to
+      // truncate back to. Pin the prefix captured when the continuation
+      // started (history + truncated partial + error bubble) and append the
+      // continuation stream after it. Rebuilt from the stable prefix every
+      // tick so repeated syncs stay idempotent.
+      if (this.continuationSessionId === sessionId) {
+        return [...this.continuationPrefix, ...streamMessages];
+      }
+
       // Find the index of the last user message
       let lastUserMessageIndex = -1;
       for (let i = existingMessages.length - 1; i >= 0; i--) {
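Reviewer note on why the continuation branch is needed: with no new user message, a truncate-to-last-user sync would drop the truncated partial and the error bubble. The sketch below contrasts the two merges as pure functions. `Msg`, `syncStandard`, and `syncContinuation` are hypothetical names, `syncStandard` is a simplified reading of the existing sync based on the comments in this diff, and modeling the error bubble as an assistant message is an assumption.

```typescript
interface Msg {
  id: string;
  role: 'user' | 'assistant';
  text: string;
}

// Simplified reading of the standard sync: keep history up to and including
// the last user message, then replace everything after it with the stream.
function syncStandard(existing: readonly Msg[], stream: readonly Msg[]): Msg[] {
  let lastUser = -1;
  for (let i = existing.length - 1; i >= 0; i--) {
    if (existing[i]?.role === 'user') {
      lastUser = i;
      break;
    }
  }
  return [...existing.slice(0, lastUser + 1), ...stream];
}

// Continuation merge: always rebuild from the pinned prefix, never from the
// previous tick's output, so repeated syncs cannot duplicate messages.
function syncContinuation(prefix: readonly Msg[], stream: readonly Msg[]): Msg[] {
  return [...prefix, ...stream];
}
```

Because the result depends only on the pinned prefix and the current stream snapshot, calling the continuation merge on every tick yields the same list length as calling it once with the final snapshot.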
diff --git a/frontend/ai.client/src/app/session/session.page.html b/frontend/ai.client/src/app/session/session.page.html
index 18431c70..2f0a228a 100644
--- a/frontend/ai.client/src/app/session/session.page.html
+++ b/frontend/ai.client/src/app/session/session.page.html
@@ -10,6 +10,7 @@
   [greetingMessage]="greetingMessage()"
   [config]="chatConfig"
   (messageSubmitted)="onMessageSubmitted($event)"
+  (continueRequested)="onContinueRequested()"
   (messageCancelled)="onMessageCancelled()"
   (fileAttached)="onFileAttached($event)"
   (settingsToggled)="toggleSettings()"
diff --git a/frontend/ai.client/src/app/session/session.page.ts b/frontend/ai.client/src/app/session/session.page.ts
index 771ce93b..4937f248 100644
--- a/frontend/ai.client/src/app/session/session.page.ts
+++ b/frontend/ai.client/src/app/session/session.page.ts
@@ -15,6 +15,12 @@ import { UserService } from '../auth/user.service';
 import { ChatHttpService } from './services/chat/chat-http.service';
 import { StreamParserService } from './services/chat/stream-parser.service';
 import { CompactionSummaryService } from './services/chat/compaction-summary.service';
+import { ArtifactStateService } from './services/artifacts/artifact-state.service';
+import { ArtifactHttpService } from './services/artifacts/artifact-http.service';
+import { McpAppStateService } from './services/mcp-apps/mcp-app-state.service';
+import { McpAppCardStateService } from './services/mcp-apps/mcp-app-card-state.service';
+import { McpAppCardHttpService } from './services/mcp-apps/mcp-app-card-http.service';
+import { McpAppConsentService } from './services/mcp-apps/mcp-app-consent.service';
 import { Dialog } from '@angular/cdk/dialog';
 import { AssistantService } from '../assistants/services/assistant.service';
 import { Assistant } from '../assistants/models/assistant.model';
@@ -44,6 +50,12 @@ export class ConversationPage implements OnDestroy {
   private chatHttpService = inject(ChatHttpService);
   private streamParserService = inject(StreamParserService);
   private compactionSummary = inject(CompactionSummaryService);
+  private artifactState = inject(ArtifactStateService);
+  private mcpAppState = inject(McpAppStateService);
+  private mcpAppCardState = inject(McpAppCardStateService);
+  private mcpAppCardHttp = inject(McpAppCardHttpService);
+  private mcpAppConsent = inject(McpAppConsentService);
+  private artifactHttp = inject(ArtifactHttpService);
   private assistantService = inject(AssistantService);
   private router = inject(Router);
   private dialog = inject(Dialog);
@@ -185,6 +197,14 @@ export class ConversationPage implements OnDestroy {
         lastContextTokens: session.lastContextTokens,
         contextWindow: session.contextWindow,
       });
+      // Refresh-survival for the max_tokens "Continue" affordance: the
+      // truncated partial is already in restored history; this flag is the
+      // missing piece. Only set true here — the route-change reset clears
+      // it (cross-session safety) and the live stream_error path owns the
+      // in-turn signal, so we never clobber a live true with stale metadata.
+      if (session.lastTurnContinuable) {
+        this.chatStateService.setLastTurnContinuable(true);
+      }
     });
 
     // Hydrate the compaction summary indicator from persisted session
@@ -202,41 +222,70 @@ export class ConversationPage implements OnDestroy {
       }
     });
 
-    // Priority-based assistant loading: URL query param first, then session preferences
+    // Single source of truth: the URL's `assistantId` query parameter.
+    //
+    // The URL is authoritative for which assistant is attached to the
+    // current view. Session preferences on the backend still record the
+    // attached assistant (so we can rebuild the URL after a user lands on
+    // a bare `/s/:id` URL from a bookmark or legacy link), but that
+    // rebuild is handled by a dedicated self-heal redirect below — not by
+    // reading preferences here. Keeping this effect URL-only removes an
+    // entire class of races around metadata fetch timing and component
+    // recreation (see #205).
     effect(() => {
       const queryAssistantId = this.assistantIdFromQuery();
-      const session = this.sessionConversation();
-      const sessionAssistantId = session?.preferences?.assistantId;
-      const currentSessionId = this.sessionId();
-      
-      // Priority 1: URL query parameter (highest priority)
+      const loadedAssistant = this.assistant();
+
       if (queryAssistantId) {
-        // Validate: Can only attach to new sessions (no messages)
-        if (currentSessionId && this.hasMessages()) {
-          this.assistantError.set('Assistants can only be attached to new sessions');
-          this.assistant.set(null);
-          this.clearAssistantIdFromUrl();
+        // Already loaded — avoid a redundant fetch and the transient null
+        // state while the fetch would resolve.
+        if (loadedAssistant?.assistantId === queryAssistantId) {
           return;
         }
-        // Load from query param (existence check only, no access validation)
-        this.loadAssistant(queryAssistantId, false).catch(error => {
+        // Existence check only; access is validated on the backend when
+        // the next message is sent.
+        this.loadAssistant(queryAssistantId).catch(error => {
           console.error('Failed to load assistant from query param:', error);
         });
         return;
       }
-      
-      // Priority 2: Session preferences (fallback for existing sessions)
-      if (sessionAssistantId && currentSessionId) {
-        // Load from preferences - allow even if session has messages (persisted assistant)
-        this.loadAssistant(sessionAssistantId, true).catch(error => {
-          console.error('Failed to load assistant from session preferences:', error);
+
+      // No assistant in the URL — clear any stale state from a prior load.
+      if (loadedAssistant || this.assistantError()) {
+        this.assistant.set(null);
+        this.assistantError.set(null);
+      }
+    });
+
+    // Self-heal effect: when the user lands on `/s/:id` without an
+    // `assistantId` query param but the session's stored preferences carry
+    // one (bookmarks, legacy URLs, shared session links), redirect to the
+    // same session with the param filled in. From that point on, the URL
+    // is the sole source of truth for the assistant-loading effect above.
+    //
+    // Intentionally narrow: we only redirect when the URL is empty. If the
+    // URL already carries an `assistantId`, we trust it — including when
+    // it differs from preferences (the backend will reject a conflict on
+    // the next message).
+    effect(() => {
+      const session = this.sessionConversation();
+      const sessionAssistantId = session?.preferences?.assistantId;
+      const currentSessionId = this.sessionId();
+      const queryAssistantId = this.assistantIdFromQuery();
+
+      if (
+        currentSessionId &&
+        session?.sessionId === currentSessionId &&
+        sessionAssistantId &&
+        !queryAssistantId
+      ) {
+        this.router.navigate([], {
+          relativeTo: this.route,
+          queryParams: { assistantId: sessionAssistantId },
+          queryParamsHandling: 'merge',
+          replaceUrl: true,
         });
-        return;
       }
-      
-      // No assistant to load
-      this.assistant.set(null);
-      this.assistantError.set(null);
     });
 
     // Subscribe to route parameter changes
@@ -249,12 +298,33 @@ export class ConversationPage implements OnDestroy {
       // flash on the badge while the new metadata is in flight.
       this.chatStateService.seedSessionAggregates({});
 
+      // Retire any prior session's "Continue" affordance before the new
+      // session's metadata lands; the seed effect re-sets it from
+      // metadata.lastTurnContinuable when applicable.
+      this.chatStateService.setLastTurnContinuable(false);
+
       // Compaction summary is session-scoped — clear before loading the
       // next session's metadata so the previous session's totals don't
       // bleed in. The hydration effect above will reseed from
       // currentSession.totalSummarizedTurns once the metadata fetch lands.
       this.compactionSummary.reset();
 
+      // Artifacts are session-scoped — clear before the next session
+      // loads so a prior session's cards don't bleed in, then re-hydrate
+      // from the app-api list endpoint below.
+      this.artifactState.reset();
+
+      // MCP App frames persist for the conversation's lifetime per the
+      // scoping doc; teardown is on conversation change. No re-hydration:
+      // the inline `ui_resource` event only arrives live during a stream.
+      this.mcpAppState.reset();
+
+      // Option A (PR #6): app-initiated tool cards DO re-hydrate (the
+      // broker is in-memory). Any open consent prompt for the prior
+      // conversation is dropped fail-closed.
+      this.mcpAppCardState.reset();
+      this.mcpAppConsent.reset();
+
       if (id) {
         // Update the messages signal reference (this triggers reactivity)
         this.messagesSignal.set(this.messageMapService.getMessagesForSession(id));
@@ -271,6 +341,13 @@ export class ConversationPage implements OnDestroy {
         } catch (error) {
           console.error('Failed to load messages for session:', id, error);
         }
+
+        // Re-hydrate artifact cards and app-initiated MCP App tool cards for
+        // this session so they survive a refresh / deep link. Best-effort and
+        // non-blocking: a 404 (feature off for this environment) or any
+        // network error just means no cards; never disrupt the session.
+        this.hydrateArtifacts(id);
+        this.hydrateMcpAppCards(id);
       } else {
         // No session selected, clear the session metadata
         this.sessionService.setSessionMetadataId(null);
@@ -289,14 +366,58 @@ export class ConversationPage implements OnDestroy {
     this.queryParamSubscription?.unsubscribe();
   }
 
+  /**
+   * Best-effort: pull the session's artifacts and seed the registry so
+   * cards render on load. `seedFromHydration` is non-clobbering, so a
+   * slow response that lands after a live `artifact` event won't undo a
+   * newer version. A guard re-checks the active session because the
+   * fetch is async and the user may have navigated away.
+   */
+  private hydrateArtifacts(sessionId: string): void {
+    this.artifactHttp
+      .listSessionArtifacts(sessionId)
+      .then(artifacts => {
+        if (artifacts.length && this.sessionId() === sessionId) {
+          this.artifactState.seedFromHydration(artifacts);
+        }
+      })
+      .catch(() => {
+        // Feature disabled (404) or transient error — no cards, no noise.
+      });
+  }
+
+  /**
+   * Best-effort: pull persisted app-initiated tool cards and seed the
+   * registry so they survive a reload (the PR #5 broker is in-memory).
+   * Mirrors {@link hydrateArtifacts}: non-clobbering, session-guarded,
+   * silent on 404 (MCP Apps host flag off) or transient error.
+   */
+  private hydrateMcpAppCards(sessionId: string): void {
+    this.mcpAppCardHttp
+      .listSessionCards(sessionId)
+      .then(cards => {
+        if (cards.length && this.sessionId() === sessionId) {
+          this.mcpAppCardState.seedFromHydration(cards);
+        }
+      })
+      .catch(() => {
+        // Feature disabled (404) or transient error — no cards, no noise.
+      });
+  }
+
   onMessageSubmitted(message: { content: string, timestamp: Date, fileUploadIds?: string[] }) {
     // Use the effective session ID (route sessionId or staged sessionId)
     const sessionIdToUse = this.effectiveSessionId();
 
-    // Get assistantId from query param (priority 1) or session preferences (priority 2)
-    const queryAssistantId = this.assistantIdFromQuery();
-    const sessionAssistantId = this.sessionConversation()?.preferences?.assistantId;
-    const assistantIdToUse = queryAssistantId || sessionAssistantId || undefined;
+    // The URL's `assistantId` query param is the sole source of truth. It's
+    // set on initial navigation (assistant card, share link) and kept in
+    // sync by the self-heal redirect in the constructor for existing
+    // sessions opened without one. Falling back to the in-memory
+    // `assistant()` signal guards the brief window during the `/` → `/s/:id`
+    // route transition where the component is recreated and the query
+    // param hasn't yet propagated to the new instance.
+    const assistantIdToUse =
+      this.assistantIdFromQuery() || this.assistant()?.assistantId || undefined;
 
     // Set loading state before submitting
     this.chatStateService.setChatLoading(true);
@@ -317,6 +438,23 @@ export class ConversationPage implements OnDestroy {
     }
   }
 
+  /**
+   * "Continue" affordance on a max_tokens-truncated assistant message.
+   * NOT a new user turn: it resumes the truncated assistant message via the
+   * continuation path (no visible user bubble, empty prompt) so the model
+   * picks up where it stopped instead of re-answering the original request.
+   */
+  onContinueRequested() {
+    const sessionIdToUse = this.effectiveSessionId();
+    const assistantIdToUse =
+      this.assistantIdFromQuery() || this.assistant()?.assistantId || undefined;
+    this.chatRequestService
+      .continueTruncatedTurn(sessionIdToUse, assistantIdToUse)
+      .catch((error) => {
+        console.error('Error continuing truncated turn:', error);
+      });
+  }
+
   /**
    * Called when user selects a file to attach.
    * Creates a staged session if one doesn't exist yet.
@@ -382,7 +520,13 @@ export class ConversationPage implements OnDestroy {
       const user = this.userService.currentUser();
       const userId = user?.user_id || 'anonymous';
       this.sessionService.addSessionToCache(sessionId, userId);
-      this.router.navigate(['s', sessionId], { replaceUrl: true });
+      // Carry the assistant id forward if one is attached to this view —
+      // keeps the URL the single source of truth after voice ends.
+      const assistantId = this.assistantIdFromQuery() || this.assistant()?.assistantId;
+      this.router.navigate(['s', sessionId], {
+        replaceUrl: true,
+        queryParams: assistantId ? { assistantId } : {},
+      });
     }
 
     // Generate title for new voice sessions (fire and forget)
@@ -408,22 +552,16 @@ export class ConversationPage implements OnDestroy {
   }
 
   /**
-   * Load assistant by ID - only checks existence, not access
-   * Access validation happens on backend when message is sent
-   * @param assistantId - Assistant ID to load
-   * @param fromPreferences - If true, this is from session preferences (skip message check)
+   * Load assistant by ID - only checks existence, not access.
+   * Access and mid-session conflicts are validated on the backend when the
+   * next message is sent (the inference-api compares the request's
+   * rag_assistant_id against the session's stored assistant and rejects
+   * mismatches). Doing the same check here is unreliable because the
+   * component is recreated on the `/` → `/s/:id` route transition, so any
+   * "session has messages" guard fires on the normal first-turn flow and
+   * clears the assistant that the user just opened (#205).
    */
-  private async loadAssistant(assistantId: string, fromPreferences: boolean = false): Promise<void> {
-    // Validation: Only check messages for new attachments (not from preferences)
-    if (!fromPreferences) {
-      const sessionId = this.sessionId();
-      if (sessionId && this.hasMessages()) {
-        this.assistantError.set('Assistants can only be attached to new sessions');
-        this.assistant.set(null);
-        return;
-      }
-    }
-
+  private async loadAssistant(assistantId: string): Promise<void> {
     try {
       this.assistantError.set(null);
       this.isLoadingAssistant.set(true);
@@ -436,10 +574,6 @@ export class ConversationPage implements OnDestroy {
       // Only handle existence errors (404) - access errors (403) will be handled on backend
       if (error?.status === 404) {
         this.assistantError.set('Assistant not found');
-        // If from preferences and assistant doesn't exist, optionally clear it
-        if (fromPreferences) {
-          // TODO: Optionally clear assistantId from session preferences via API
-        }
       } else {
         // Other errors (network, etc.) - show generic error but don't block
         this.assistantError.set('Failed to load assistant');
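Review note: the two constructor effects in `conversation.page.ts` reduce to two small decisions, which can be easier to reason about (and unit-test) outside Angular. The sketch below is a framework-free distillation under stated assumptions; `decideAssistantAction`, `selfHealTarget`, and the input shapes are hypothetical names that do not appear in the PR.

```typescript
// Hypothetical distillation of the URL-only loading effect and the
// self-heal redirect. None of these names exist in the PR itself.

type AssistantAction =
  | { kind: 'load'; assistantId: string } // fetch the assistant named in the URL
  | { kind: 'keep' }                      // URL already matches the loaded assistant
  | { kind: 'clear' }                     // no assistant in the URL; drop stale state
  | { kind: 'none' };                     // nothing loaded and nothing requested

interface ViewState {
  queryAssistantId: string | null;  // value of the `assistantId` query param
  loadedAssistantId: string | null; // id of the assistant currently in memory
  hasAssistantError: boolean;       // whether a load error is being shown
}

function decideAssistantAction(state: ViewState): AssistantAction {
  if (state.queryAssistantId) {
    // Skip the fetch when the URL names the assistant we already hold.
    return state.loadedAssistantId === state.queryAssistantId
      ? { kind: 'keep' }
      : { kind: 'load', assistantId: state.queryAssistantId };
  }
  // Empty URL: only write to state when there is something stale to clear.
  return state.loadedAssistantId || state.hasAssistantError
    ? { kind: 'clear' }
    : { kind: 'none' };
}

interface SelfHealInput {
  routeSessionId: string | null;       // `:id` from the route
  metadataSessionId: string | null;    // session id on the fetched metadata
  preferredAssistantId: string | null; // session.preferences.assistantId
  queryAssistantId: string | null;     // value of the `assistantId` query param
}

// Returns the assistant id to write into the URL, or null when no redirect
// is needed. Redirects only when the URL carries no assistant at all.
function selfHealTarget(input: SelfHealInput): string | null {
  const metadataMatchesRoute =
    Boolean(input.routeSessionId) && input.metadataSessionId === input.routeSessionId;
  if (metadataMatchesRoute && input.preferredAssistantId && !input.queryAssistantId) {
    return input.preferredAssistantId;
  }
  return null;
}
```

The `metadataMatchesRoute` check mirrors the `session?.sessionId === currentSessionId` guard in the diff: without it, metadata from the previous session could trigger a redirect on the new one.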
diff --git a/frontend/ai.client/src/app/settings/pages/chat-preferences/chat-preferences-settings.page.spec.ts b/frontend/ai.client/src/app/settings/pages/chat-preferences/chat-preferences-settings.page.spec.ts
new file mode 100644
index 00000000..81fa318b
--- /dev/null
+++ b/frontend/ai.client/src/app/settings/pages/chat-preferences/chat-preferences-settings.page.spec.ts
@@ -0,0 +1,101 @@
+import { describe, it, expect, beforeEach, vi } from 'vitest';
+import { TestBed } from '@angular/core/testing';
+import { signal } from '@angular/core';
+import { ChatPreferencesSettingsPage } from './chat-preferences-settings.page';
+import { ModelService } from '../../../session/services/model/model.service';
+import { UserSettingsService } from '../../../services/user-settings.service';
+import { LocalSettingsService } from '../../../services/local-settings.service';
+
+/**
+ * Regression tests for #161 — default model selection silently reverting on
+ * page reload. The dropdown was bound via `[value]` on a native <select>,
+ * which is a one-time DOM property write. If the saved `defaultModelId` is
+ * emitted before the @for loop has rendered the matching <option>, the
+ * browser silently resets the <select> to "" and Angular never re-applies
+ * the binding when the options finally arrive (the computed input has not
+ * changed).
+ *
+ * Fix: `currentDefaultModelId` returns '' until BOTH the user's settings
+ * AND the model list have loaded. These tests pin that contract so a
+ * future refactor can't quietly undo it.
+ */
+describe('ChatPreferencesSettingsPage — currentDefaultModelId', () => {
+  let availableModels: ReturnType<typeof signal<{ id: string; modelId: string; modelName: string; providerName: string }[]>>;
+  let settingsValue: ReturnType<typeof signal<{ defaultModelId: string | null } | undefined>>;
+  let modelsLoading: ReturnType<typeof signal<boolean>>;
+
+  beforeEach(() => {
+    availableModels = signal<{ id: string; modelId: string; modelName: string; providerName: string }[]>([]);
+    settingsValue = signal<{ defaultModelId: string | null } | undefined>(undefined);
+    modelsLoading = signal<boolean>(false);
+
+    const mockModelService = {
+      availableModels,
+      modelsLoading,
+    };
+
+    const mockUserSettingsService = {
+      settingsResource: {
+        value: () => settingsValue(),
+        reload: vi.fn(),
+      },
+      updateSettings: vi.fn(),
+    };
+
+    const mockLocalSettings = {
+      showTokenCount: signal(false),
+      showDebugOutput: signal(false),
+      setShowTokenCount: vi.fn(),
+      setShowDebugOutput: vi.fn(),
+    };
+
+    TestBed.configureTestingModule({
+      providers: [
+        ChatPreferencesSettingsPage,
+        { provide: ModelService, useValue: mockModelService },
+        { provide: UserSettingsService, useValue: mockUserSettingsService },
+        { provide: LocalSettingsService, useValue: mockLocalSettings },
+      ],
+    });
+  });
+
+  it("returns '' while neither data source has loaded", () => {
+    const page = TestBed.inject(ChatPreferencesSettingsPage);
+    expect(page.currentDefaultModelId()).toBe('');
+  });
+
+  it("returns '' when settings have loaded but the model list is still empty", () => {
+    // This is the exact race the bug describes: settings resolve first,
+    // model list is still empty, so binding the saved id at this moment
+    // would target an <option> that does not yet exist.
+    settingsValue.set({ defaultModelId: 'claude-haiku' });
+    const page = TestBed.inject(ChatPreferencesSettingsPage);
+    expect(page.currentDefaultModelId()).toBe('');
+  });
+
+  it("returns '' when the model list has loaded but settings are still pending", () => {
+    availableModels.set([
+      { id: '1', modelId: 'claude-haiku', modelName: 'Claude Haiku', providerName: 'Anthropic' },
+    ]);
+    const page = TestBed.inject(ChatPreferencesSettingsPage);
+    expect(page.currentDefaultModelId()).toBe('');
+  });
+
+  it('returns the saved modelId once both data sources have loaded', () => {
+    settingsValue.set({ defaultModelId: 'claude-haiku' });
+    availableModels.set([
+      { id: '1', modelId: 'claude-haiku', modelName: 'Claude Haiku', providerName: 'Anthropic' },
+    ]);
+    const page = TestBed.inject(ChatPreferencesSettingsPage);
+    expect(page.currentDefaultModelId()).toBe('claude-haiku');
+  });
+
+  it("returns '' when the user has explicitly cleared their default", () => {
+    settingsValue.set({ defaultModelId: null });
+    availableModels.set([
+      { id: '1', modelId: 'claude-haiku', modelName: 'Claude Haiku', providerName: 'Anthropic' },
+    ]);
+    const page = TestBed.inject(ChatPreferencesSettingsPage);
+    expect(page.currentDefaultModelId()).toBe('');
+  });
+});
diff --git a/frontend/ai.client/src/app/settings/pages/chat-preferences/chat-preferences-settings.page.ts b/frontend/ai.client/src/app/settings/pages/chat-preferences/chat-preferences-settings.page.ts
index 11c24f19..2e4eb7a6 100644
--- a/frontend/ai.client/src/app/settings/pages/chat-preferences/chat-preferences-settings.page.ts
+++ b/frontend/ai.client/src/app/settings/pages/chat-preferences/chat-preferences-settings.page.ts
@@ -41,19 +41,33 @@ import { LocalSettingsService } from '../../../services/local-settings.service';
           <div class="mt-4">
             @if (modelService.modelsLoading()) {
               <div class="flex items-center gap-2 text-sm/6 text-gray-500 dark:text-gray-400">
-                <div class="size-4 animate-spin rounded-full border-2 border-gray-300 border-t-blue-600"></div>
+                <div class="size-4 animate-spin rounded-full border-2 border-gray-300 border-t-blue-600 dark:border-t-blue-400"></div>
                 Loading models...
               </div>
             } @else {
               <select
                 class="block w-full rounded-sm border-0 bg-white py-1.5 pl-3 pr-10 text-sm/6 text-gray-900 shadow-xs ring-1 ring-gray-300 focus:ring-2 focus:ring-blue-600 dark:bg-white/5 dark:text-white dark:ring-white/10 dark:focus:ring-blue-500"
                 aria-label="Default model"
-                [value]="currentDefaultModelId()"
                 (change)="onModelChange($event)"
               >
-                <option value="">No default (use first available)</option>
+                <!--
+                  We bind [selected] on each <option> rather than [value] on
+                  the <select>. Native <select>.value is a one-time DOM
+                  property write: if Angular evaluates it before @for has
+                  rendered the matching <option> (same change-detection
+                  tick), the browser silently resets the select to the
+                  first option and never resyncs when options arrive. With
+                  [selected], the binding fires as each option mounts, so
+                  the saved modelId reliably wins regardless of which
+                  data source — settings or model list — resolves first
+                  (#161).
+                -->
+                <option value="" [selected]="currentDefaultModelId() === ''">No default (use first available)</option>
                 @for (model of modelService.availableModels(); track model.id) {
-                  <option [value]="model.modelId">{{ model.modelName }} ({{ model.providerName }})</option>
+                  <option
+                    [value]="model.modelId"
+                    [selected]="model.modelId === currentDefaultModelId()"
+                  >{{ model.modelName }} ({{ model.providerName }})</option>
                 }
               </select>
             }
@@ -185,7 +199,14 @@ export class ChatPreferencesSettingsPage {
 
   readonly currentDefaultModelId = computed(() => {
     const settings = this.userSettingsService.settingsResource.value();
-    return settings?.defaultModelId ?? '';
+    const models = this.modelService.availableModels();
+    // Wait for both data sources before reporting a saved default. If we
+    // emit the saved modelId before the @for loop has rendered the matching
+    // <option>, there is nothing for the dropdown to select yet: the browser
+    // falls back to the first option, and the previous [value] binding on
+    // the <select> was never re-applied because the computed was unchanged.
+    if (!settings || models.length === 0) return '';
+    return settings.defaultModelId ?? '';
   });
 
   async onModelChange(event: Event): Promise<void> {
diff --git a/frontend/ai.client/src/app/settings/pages/connectors-settings/connectors-settings.page.ts b/frontend/ai.client/src/app/settings/pages/connectors-settings/connectors-settings.page.ts
index e8dc615c..cccda7b8 100644
--- a/frontend/ai.client/src/app/settings/pages/connectors-settings/connectors-settings.page.ts
+++ b/frontend/ai.client/src/app/settings/pages/connectors-settings/connectors-settings.page.ts
@@ -62,7 +62,7 @@ type ConnectState =
 
       @if (resource.isLoading()) {
         <div class="flex items-center gap-3 text-sm/6 text-gray-500 dark:text-gray-400">
-          <div class="size-4 animate-spin rounded-full border-2 border-gray-300 border-t-blue-600 dark:border-gray-600"></div>
+          <div class="size-4 animate-spin rounded-full border-2 border-gray-300 border-t-blue-600 dark:border-t-blue-400 dark:border-gray-600"></div>
           Loading connectors...
         </div>
       } @else if (resource.error()) {
diff --git a/frontend/ai.client/src/app/settings/pages/usage/usage-settings.page.ts b/frontend/ai.client/src/app/settings/pages/usage/usage-settings.page.ts
index 930543c9..d44e703b 100644
--- a/frontend/ai.client/src/app/settings/pages/usage/usage-settings.page.ts
+++ b/frontend/ai.client/src/app/settings/pages/usage/usage-settings.page.ts
@@ -65,7 +65,7 @@ import { UserCostSummary } from './models/cost-summary.model';
       @if ((selectedPeriodType() === 'current' && costSummary.isLoading()) || isLoadingCustomReport()) {
         <div class="flex items-center justify-center py-12">
           <div class="flex flex-col items-center gap-3">
-            <div class="size-8 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600"></div>
+            <div class="size-8 animate-spin rounded-full border-4 border-gray-300 border-t-blue-600 dark:border-t-blue-400"></div>
             <p class="text-sm/6 text-gray-500 dark:text-gray-400">Loading cost data...</p>
           </div>
         </div>
diff --git a/frontend/ai.client/src/app/shared/utils/stream-parser/index.ts b/frontend/ai.client/src/app/shared/utils/stream-parser/index.ts
index d85c738b..63cafd1c 100644
--- a/frontend/ai.client/src/app/shared/utils/stream-parser/index.ts
+++ b/frontend/ai.client/src/app/shared/utils/stream-parser/index.ts
@@ -38,6 +38,7 @@ export {
   processStreamEvent,
   createStreamLineParser,
   inferContentBlockType,
+  extractStreamingStringField,
   parseToolResultContent,
   type StreamParserCallbacks,
 } from './stream-parser-core';
@@ -58,6 +59,8 @@ export {
   validateOAuthRequiredEvent,
   validateToolApprovalRequiredEvent,
   validateCompactionEvent,
+  validateArtifactEvent,
+  validateUiResourceEvent,
 } from './stream-parser-core';
 
 // Types
@@ -80,6 +83,10 @@ export type {
   OAuthRequiredEvent,
   ToolApprovalRequiredEvent,
   CompactionEvent,
+  ArtifactEvent,
+  UiResourceEvent,
+  McpUiCsp,
+  McpUiPermissions,
   StreamEventType,
   StreamEventData,
   ParsedStreamEvent,
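Review note: `validateArtifactEvent` is newly exported here, but its implementation is not part of this portion of the diff. A type guard consistent with the spec cases for it could look like the sketch below. The `ArtifactEventSketch` shape is inferred from the spec fixture only, and the choice to allow empty `title`, `contentType`, `sessionId`, and `updatedAt` strings is an assumption; the real `ArtifactEvent` type and validator may be stricter.

```typescript
// Hypothetical shape inferred from the spec fixture; not the real ArtifactEvent.
interface ArtifactEventSketch {
  type: 'artifact';
  artifactId: string;
  version: number;
  title: string;
  contentType: string;
  sessionId: string;
  updatedAt: string;
  action: 'created' | 'updated';
}

function isArtifactEventSketch(data: unknown): data is ArtifactEventSketch {
  // Rejects null, undefined, and primitives up front.
  if (typeof data !== 'object' || data === null) return false;
  const d = data as Record<string, unknown>;

  const artifactId = d['artifactId'];
  const version = d['version'];
  const action = d['action'];

  return (
    d['type'] === 'artifact' &&
    // The id must be a non-empty string.
    typeof artifactId === 'string' &&
    artifactId.length > 0 &&
    // Versions are positive integers (1, 2, 3, ...).
    typeof version === 'number' &&
    Number.isInteger(version) &&
    version >= 1 &&
    // Assumption: the remaining string fields only need to be present.
    typeof d['title'] === 'string' &&
    typeof d['contentType'] === 'string' &&
    typeof d['sessionId'] === 'string' &&
    typeof d['updatedAt'] === 'string' &&
    // Only the two actions exercised by the spec are accepted.
    (action === 'created' || action === 'updated')
  );
}
```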
diff --git a/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-core.spec.ts b/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-core.spec.ts
index f77c56e2..34eb1475 100644
--- a/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-core.spec.ts
+++ b/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-core.spec.ts
@@ -11,9 +11,12 @@ import {
   validateQuotaExceededEvent,
   validateConversationalStreamError,
   validateCitation,
+  validateArtifactEvent,
+  validateUiResourceEvent,
   processStreamEvent,
   createStreamLineParser,
   inferContentBlockType,
+  extractStreamingStringField,
   parseToolResultContent,
   StreamParserCallbacks
 } from './stream-parser-core';
@@ -370,6 +373,99 @@ describe('stream-parser-core', () => {
     });
   });
 
+  describe('validateArtifactEvent', () => {
+    const valid = {
+      type: 'artifact',
+      artifactId: 'art-1',
+      version: 1,
+      title: 'Sales Dashboard',
+      contentType: 'text/html; charset=utf-8',
+      sessionId: 'sess-9',
+      updatedAt: '2026-05-15T12:00:05+00:00',
+      action: 'created'
+    };
+
+    it('should return true for a valid created artifact', () => {
+      expect(validateArtifactEvent(valid)).toBe(true);
+    });
+
+    it('should return true for an updated artifact (version > 1)', () => {
+      expect(validateArtifactEvent({ ...valid, version: 4, action: 'updated' })).toBe(true);
+    });
+
+    it('should return false for null/undefined', () => {
+      expect(validateArtifactEvent(null)).toBe(false);
+      expect(validateArtifactEvent(undefined)).toBe(false);
+    });
+
+    it('should return false when type is not "artifact"', () => {
+      expect(validateArtifactEvent({ ...valid, type: 'compaction' })).toBe(false);
+    });
+
+    it('should return false for empty artifactId', () => {
+      expect(validateArtifactEvent({ ...valid, artifactId: '' })).toBe(false);
+    });
+
+    it('should return false for version < 1 or non-integer', () => {
+      expect(validateArtifactEvent({ ...valid, version: 0 })).toBe(false);
+      expect(validateArtifactEvent({ ...valid, version: 1.5 })).toBe(false);
+    });
+
+    it('should return false for an unknown action', () => {
+      expect(validateArtifactEvent({ ...valid, action: 'deleted' })).toBe(false);
+    });
+
+    it('should return false for missing fields', () => {
+      expect(validateArtifactEvent({ type: 'artifact', artifactId: 'art-1' })).toBe(false);
+    });
+  });
+
+  describe('validateUiResourceEvent', () => {
+    const valid = {
+      type: 'ui_resource',
+      toolUseId: 'tu-1',
+      resourceUri: 'ui://srv/widget',
+      html: '<h1>hi</h1>',
+      mimeType: 'text/html;profile=mcp-app',
+      csp: { connectDomains: ['https://api.test'] },
+      permissions: { clipboardWrite: {} },
+      sandboxOrigin: 'https://mcp-sandbox.example.com'
+    };
+
+    it('should return true for a valid ui_resource event', () => {
+      expect(validateUiResourceEvent(valid)).toBe(true);
+    });
+
+    it('should accept empty html and empty sandboxOrigin', () => {
+      expect(validateUiResourceEvent({ ...valid, html: '', sandboxOrigin: '' })).toBe(true);
+    });
+
+    it('should return false for null/undefined', () => {
+      expect(validateUiResourceEvent(null)).toBe(false);
+      expect(validateUiResourceEvent(undefined)).toBe(false);
+    });
+
+    it('should return false when type is wrong', () => {
+      expect(validateUiResourceEvent({ ...valid, type: 'artifact' })).toBe(false);
+    });
+
+    it('should return false for empty toolUseId or resourceUri', () => {
+      expect(validateUiResourceEvent({ ...valid, toolUseId: '' })).toBe(false);
+      expect(validateUiResourceEvent({ ...valid, resourceUri: '' })).toBe(false);
+    });
+
+    it('should return false when csp/permissions are not objects', () => {
+      expect(validateUiResourceEvent({ ...valid, csp: null })).toBe(false);
+      expect(validateUiResourceEvent({ ...valid, permissions: 'x' })).toBe(false);
+    });
+
+    it('should return false for missing fields', () => {
+      expect(
+        validateUiResourceEvent({ type: 'ui_resource', toolUseId: 'tu-1' }),
+      ).toBe(false);
+    });
+  });
+
   describe('processStreamEvent', () => {
     let callbacks: StreamParserCallbacks;
 
@@ -386,6 +482,8 @@ describe('stream-parser-core', () => {
         onQuotaExceeded: vi.fn(),
         onStreamError: vi.fn(),
         onCitation: vi.fn(),
+        onArtifact: vi.fn(),
+        onUiResource: vi.fn(),
         onParseError: vi.fn(),
         onDone: vi.fn(),
         onError: vi.fn(),
@@ -438,6 +536,46 @@ describe('stream-parser-core', () => {
       processStreamEvent('unknown_event', {}, callbacks);
       expect(callbacks.onParseError).not.toHaveBeenCalled();
     });
+
+    it('should call onArtifact for a valid artifact event', () => {
+      const data = {
+        type: 'artifact',
+        artifactId: 'art-1',
+        version: 2,
+        title: 'Report',
+        contentType: 'text/html; charset=utf-8',
+        sessionId: 'sess-9',
+        updatedAt: '2026-05-15T12:00:05+00:00',
+        action: 'updated'
+      };
+      processStreamEvent('artifact', data, callbacks);
+      expect(callbacks.onArtifact).toHaveBeenCalledWith(data);
+    });
+
+    it('should call onParseError for an invalid artifact event', () => {
+      processStreamEvent('artifact', { type: 'artifact', artifactId: '' }, callbacks);
+      expect(callbacks.onParseError).toHaveBeenCalledWith('artifact: invalid data structure');
+    });
+
+    it('should call onUiResource for a valid ui_resource event', () => {
+      const data = {
+        type: 'ui_resource',
+        toolUseId: 'tu-1',
+        resourceUri: 'ui://srv/widget',
+        html: '<main>app</main>',
+        mimeType: 'text/html;profile=mcp-app',
+        csp: {},
+        permissions: {},
+        sandboxOrigin: ''
+      };
+      processStreamEvent('ui_resource', data, callbacks);
+      expect(callbacks.onUiResource).toHaveBeenCalledWith(data);
+    });
+
+    it('should call onParseError for an invalid ui_resource event', () => {
+      processStreamEvent('ui_resource', { type: 'ui_resource', toolUseId: '' }, callbacks);
+      expect(callbacks.onParseError).toHaveBeenCalledWith('ui_resource: invalid data structure');
+    });
   });
 
   describe('createStreamLineParser', () => {
@@ -510,6 +648,65 @@ describe('stream-parser-core', () => {
     });
   });
 
+  describe('extractStreamingStringField', () => {
+    it('returns null when input is empty', () => {
+      expect(extractStreamingStringField('', 'content')).toBeNull();
+    });
+
+    it('returns null when the field has not started streaming', () => {
+      expect(extractStreamingStringField('{"title":"Hi"', 'content')).toBeNull();
+      expect(extractStreamingStringField('{"title":"Hi","content"', 'content')).toBeNull();
+      expect(extractStreamingStringField('{"title":"Hi","content":', 'content')).toBeNull();
+    });
+
+    it('returns the partial value while the string is still open', () => {
+      expect(
+        extractStreamingStringField('{"title":"Hi","content":"<!DOCTYPE htm', 'content'),
+      ).toBe('<!DOCTYPE htm');
+    });
+
+    it('returns the full value once the closing quote arrives', () => {
+      expect(
+        extractStreamingStringField('{"content":"<h1>Hello</h1>","x":1}', 'content'),
+      ).toBe('<h1>Hello</h1>');
+    });
+
+    it('decodes JSON string escapes', () => {
+      expect(
+        extractStreamingStringField('{"content":"line1\\nline2\\t\\"q\\"\\\\","', 'content'),
+      ).toBe('line1\nline2\t"q"\\');
+    });
+
+    it('decodes unicode escapes', () => {
+      expect(extractStreamingStringField('{"content":"\\u00e9\\u4e2d', 'content')).toBe(
+        'é中',
+      );
+    });
+
+    it('drops a dangling backslash that has not finished streaming', () => {
+      expect(extractStreamingStringField('{"content":"abc\\', 'content')).toBe('abc');
+    });
+
+    it('drops an incomplete unicode escape', () => {
+      expect(extractStreamingStringField('{"content":"abc\\u00e', 'content')).toBe('abc');
+    });
+
+    it('does not match a different field with a shared prefix', () => {
+      // `content_type` must not be mistaken for `content`
+      expect(
+        extractStreamingStringField('{"content_type":"text/html","content":"body', 'content'),
+      ).toBe('body');
+    });
+
+    it('tolerates whitespace between key, colon, and value', () => {
+      expect(extractStreamingStringField('{"content"  :  "hi', 'content')).toBe('hi');
+    });
+
+    it('returns empty string for an empty completed value', () => {
+      expect(extractStreamingStringField('{"content":""}', 'content')).toBe('');
+    });
+  });
+
   describe('parseToolResultContent', () => {
     it('should parse text content', () => {
       const result = parseToolResultContent([{ text: 'hello world' }]);
diff --git a/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-core.ts b/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-core.ts
index d5f117d9..75247c45 100644
--- a/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-core.ts
+++ b/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-core.ts
@@ -39,6 +39,8 @@ import type {
   OAuthRequiredEvent,
   ToolApprovalRequiredEvent,
   CompactionEvent,
+  ArtifactEvent,
+  UiResourceEvent,
   ToolProgress,
 } from './stream-parser-types';
 import type { MetadataEvent } from '../../../session/services/models/content-types';
@@ -87,6 +89,15 @@ export interface StreamParserCallbacks {
   // Compaction (backend rolled older turns into a summary on this turn)
   onCompaction?: (data: CompactionEvent) => void;
 
+  // Artifact created/updated this turn (existence signal; content is
+  // fetched out-of-band via a render token + sandboxed iframe)
+  onArtifact?: (data: ArtifactEvent) => void;
+
+  // MCP App UI resource for a tool result (SEP-1865). Inline event,
+  // correlated to its tool-use block by toolUseId; carries the HTML to
+  // render in the sandbox-proxy iframe.
+  onUiResource?: (data: UiResourceEvent) => void;
+
   // Error handling
   onError?: (data: StreamErrorEvent | ConversationalStreamErrorEvent | string) => void;
   onStreamError?: (data: ConversationalStreamErrorEvent) => void;
@@ -393,6 +404,62 @@ export function validateCompactionEvent(data: unknown): data is CompactionEvent
   );
 }
 
+/**
+ * Validate ArtifactEvent structure
+ */
+export function validateArtifactEvent(data: unknown): data is ArtifactEvent {
+  if (!data || typeof data !== 'object') {
+    return false;
+  }
+
+  const event = data as Partial<ArtifactEvent>;
+
+  return (
+    event.type === 'artifact' &&
+    typeof event.artifactId === 'string' &&
+    event.artifactId.length > 0 &&
+    typeof event.version === 'number' &&
+    Number.isInteger(event.version) &&
+    event.version >= 1 &&
+    typeof event.title === 'string' &&
+    typeof event.contentType === 'string' &&
+    typeof event.sessionId === 'string' &&
+    typeof event.updatedAt === 'string' &&
+    (event.action === 'created' || event.action === 'updated')
+  );
+}
+
+/**
+ * Validate UiResourceEvent structure (SEP-1865 MCP App, PR #3 wire shape).
+ *
+ * `html` should only be empty in degenerate cases; we require it to be a
+ * string but accept the empty string so a future server that streams an
+ * empty shell still round-trips. `csp`/`permissions` must be non-null
+ * objects; `sandboxOrigin` may be '' until the sandbox stack is deployed.
+ */
+export function validateUiResourceEvent(data: unknown): data is UiResourceEvent {
+  if (!data || typeof data !== 'object') {
+    return false;
+  }
+
+  const event = data as Partial<UiResourceEvent>;
+
+  return (
+    event.type === 'ui_resource' &&
+    typeof event.toolUseId === 'string' &&
+    event.toolUseId.length > 0 &&
+    typeof event.resourceUri === 'string' &&
+    event.resourceUri.length > 0 &&
+    typeof event.html === 'string' &&
+    typeof event.mimeType === 'string' &&
+    typeof event.sandboxOrigin === 'string' &&
+    typeof event.csp === 'object' &&
+    event.csp !== null &&
+    typeof event.permissions === 'object' &&
+    event.permissions !== null
+  );
+}
+
 /**
  * Validate Citation structure
  */
@@ -582,6 +649,22 @@ export function processStreamEvent(
         }
         break;
 
+      case 'artifact':
+        if (validateArtifactEvent(data)) {
+          callbacks.onArtifact?.(data);
+        } else {
+          callbacks.onParseError?.('artifact: invalid data structure');
+        }
+        break;
+
+      case 'ui_resource':
+        if (validateUiResourceEvent(data)) {
+          callbacks.onUiResource?.(data);
+        } else {
+          callbacks.onParseError?.('ui_resource: invalid data structure');
+        }
+        break;
+
       default:
         // Ignore unknown events (ping, etc.)
         break;
@@ -695,6 +778,120 @@ export function inferContentBlockType(
   return 'text';
 }
 
+/**
+ * Best-effort extraction of a single string field's value from an incomplete
+ * JSON object that is still streaming in (e.g. a tool call's `input` arriving
+ * as `input_json_delta` chunks).
+ *
+ * Returns the decoded string value so far — even when the closing quote has
+ * not yet arrived — so callers can render live "generating" feedback. Returns
+ * `null` if the field's string value has not started streaming yet.
+ *
+ * Never throws: malformed / truncated input yields the portion decoded so far.
+ * Only string-valued fields are handled (non-string values return `null`).
+ *
+ * @param partialJson Accumulated (possibly incomplete) JSON text
+ * @param field       Field name to extract (e.g. "content"); the first match wins and nesting depth is not checked
+ */
+export function extractStreamingStringField(
+  partialJson: string,
+  field: string,
+): string | null {
+  if (!partialJson) {
+    return null;
+  }
+
+  // Locate `"field" : "` allowing JSON-permitted whitespace. Escaping the
+  // field name keeps this safe even though callers pass simple identifiers.
+  const escapedField = field.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+  const opener = new RegExp(`"${escapedField}"\\s*:\\s*"`);
+  const match = opener.exec(partialJson);
+  if (!match) {
+    return null;
+  }
+
+  let i = match.index + match[0].length;
+  let result = '';
+
+  while (i < partialJson.length) {
+    const ch = partialJson[i];
+
+    if (ch === '"') {
+      // Unescaped closing quote -> value is complete.
+      return result;
+    }
+
+    if (ch === '\\') {
+      // Need at least one more char to know the escape; if it hasn't
+      // streamed yet, drop the dangling backslash and return what we have.
+      if (i + 1 >= partialJson.length) {
+        return result;
+      }
+      const esc = partialJson[i + 1];
+      switch (esc) {
+        case '"':
+          result += '"';
+          i += 2;
+          break;
+        case '\\':
+          result += '\\';
+          i += 2;
+          break;
+        case '/':
+          result += '/';
+          i += 2;
+          break;
+        case 'b':
+          result += '\b';
+          i += 2;
+          break;
+        case 'f':
+          result += '\f';
+          i += 2;
+          break;
+        case 'n':
+          result += '\n';
+          i += 2;
+          break;
+        case 'r':
+          result += '\r';
+          i += 2;
+          break;
+        case 't':
+          result += '\t';
+          i += 2;
+          break;
+        case 'u': {
+          // Need 4 hex digits; if the buffer cuts off mid-sequence, drop
+          // the incomplete escape and return the decoded prefix.
+          if (i + 6 > partialJson.length) {
+            return result;
+          }
+          const hex = partialJson.slice(i + 2, i + 6);
+          if (!/^[0-9a-fA-F]{4}$/.test(hex)) {
+            return result;
+          }
+          result += String.fromCharCode(parseInt(hex, 16));
+          i += 6;
+          break;
+        }
+        default:
+          // Invalid escape — emit the char literally and move on.
+          result += esc;
+          i += 2;
+          break;
+      }
+      continue;
+    }
+
+    result += ch;
+    i += 1;
+  }
+
+  // Buffer exhausted before the closing quote — value still streaming.
+  return result;
+}
+
 /**
  * Parse tool result content array into normalized format
  */
diff --git a/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-types.ts b/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-types.ts
index eec961c9..4efdf1ba 100644
--- a/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-types.ts
+++ b/frontend/ai.client/src/app/shared/utils/stream-parser/stream-parser-types.ts
@@ -143,6 +143,89 @@ export interface CompactionEvent {
   inputTokens: number;
 }
 
+/**
+ * Artifact event — emitted once per artifact created or updated during a
+ * turn, after the final `metadata`/`compaction` events and before `done`
+ * (same post-`message_stop` side-channel placement as `oauth_required`).
+ *
+ * The artifact's HTML content is never carried on the wire: it lives in
+ * S3 and renders in a sandboxed iframe via the artifact render origin.
+ * This event only signals existence so the SPA can show an inline card
+ * and open the panel (which mints a short-lived render token on demand).
+ *
+ * `action` is `created` for v1, `updated` for any later version. Cards
+ * also hydrate on session load via the app-api list endpoint; the SPA
+ * dedupes by `artifactId` keeping the highest `version`.
+ *
+ * `producedByMessageIndex` is the 0-based index of the turn's final
+ * assistant message (`msg-{sessionId}-{index}`), stamped by the stream
+ * coordinator so the SPA can anchor the card inline after that message.
+ * Null when the index couldn't be resolved — the SPA falls back to the
+ * end-of-conversation strip.
+ */
+export interface ArtifactEvent {
+  type: 'artifact';
+  artifactId: string;
+  version: number;
+  title: string;
+  contentType: string;
+  sessionId: string;
+  updatedAt: string;
+  action: 'created' | 'updated';
+  producedByMessageIndex?: number | null;
+}
+
+/**
+ * CSP domain allowlists declared by an MCP App resource (SEP-1865
+ * `McpUiResourceCsp`). The sandbox proxy composes the inner iframe's CSP
+ * from these plus the spec's deny-by-default fallbacks.
+ */
+export interface McpUiCsp {
+  connectDomains?: string[];
+  resourceDomains?: string[];
+  frameDomains?: string[];
+  baseUriDomains?: string[];
+}
+
+/**
+ * Sandbox permissions an MCP App resource requested (SEP-1865). Each key,
+ * when present (as an empty object), maps to a Permissions-Policy feature on
+ * the inner iframe's `allow` attribute. Absence = not requested.
+ */
+export interface McpUiPermissions {
+  camera?: Record<string, never>;
+  microphone?: Record<string, never>;
+  geolocation?: Record<string, never>;
+  clipboardWrite?: Record<string, never>;
+}
+
+/**
+ * UI resource event — emitted by the backend (PR #3) right after the
+ * correlated `tool_result` when the tool declared a `ui://` MCP App
+ * resource (SEP-1865). Unlike `artifact`/`oauth_required` this is an
+ * INLINE event during streaming, correlated to its tool-use block by
+ * `toolUseId`. The HTML is fetched server-side via `resources/read` and
+ * inlined here so the frontend needs no MCP client of its own.
+ *
+ * `sandboxOrigin` is the origin of the deployed sandbox-proxy (proxy.html)
+ * the SPA frames the App in; empty until that stack is deployed + wired
+ * (the whole surface is inert behind the backend host flag until then).
+ *
+ * The entire MCP Apps surface stays dark until PR #7 flips the backend
+ * `AGENTCORE_MCP_APPS_HOST_ENABLED` flag, so in practice this event does
+ * not arrive in production yet.
+ */
+export interface UiResourceEvent {
+  type: 'ui_resource';
+  toolUseId: string;
+  resourceUri: string;
+  html: string;
+  mimeType: string;
+  csp: McpUiCsp;
+  permissions: McpUiPermissions;
+  sandboxOrigin: string;
+}
+
 /**
  * Tool result event data structure
  */
@@ -182,7 +265,9 @@ export type StreamEventType =
   | 'stream_error'
   | 'citation'
   | 'oauth_required'
-  | 'compaction';
+  | 'compaction'
+  | 'artifact'
+  | 'ui_resource';
 
 /**
  * Union type of all possible event data types
@@ -204,6 +289,8 @@ export type StreamEventData =
   | Citation
   | OAuthRequiredEvent
   | CompactionEvent
+  | ArtifactEvent
+  | UiResourceEvent
   | null
   | undefined;
 
diff --git a/infrastructure/.gitignore b/infrastructure/.gitignore
index f60797b6..a94d06dd 100644
--- a/infrastructure/.gitignore
+++ b/infrastructure/.gitignore
@@ -3,6 +3,11 @@
 *.d.ts
 node_modules
 
+# Hand-written static assets deployed verbatim to S3 (e.g. the MCP Apps
+# sandbox-proxy shell) are SOURCE, not compiled CDK output — keep them tracked.
+!assets/
+!assets/**
+
 # CDK asset staging directory
 .cdk.staging
 cdk.out
diff --git a/infrastructure/assets/mcp-sandbox/csp-function.js b/infrastructure/assets/mcp-sandbox/csp-function.js
new file mode 100644
index 00000000..c1cec3fb
--- /dev/null
+++ b/infrastructure/assets/mcp-sandbox/csp-function.js
@@ -0,0 +1,157 @@
+/**
+ * MCP Apps sandbox — dynamic per-resource Content-Security-Policy.
+ *
+ * CloudFront Function attached to the proxy.html behavior on
+ * **viewer-response**. Reads the `?csp=<urlencoded-json>` query string
+ * (where the JSON matches the spec's `McpUiResourceCsp` shape from
+ * `_meta.ui.csp`), composes a per-resource CSP header that honors the
+ * declared `connectDomains` / `resourceDomains` / `frameDomains` /
+ * `baseUriDomains`, and stamps it on the response — replacing any CSP
+ * coming from the origin / ResponseHeadersPolicy.
+ *
+ * Mirrors `modelcontextprotocol/ext-apps/examples/basic-host/serve.ts`'s
+ * `buildCspHeader` so a UI resource gets the same CSP from us that it
+ * would on the spec's reference host. Failure paths (missing param,
+ * malformed JSON, non-object payload) degrade silently to the default
+ * (un-declared) CSP — same behavior as if the App had omitted `_meta.ui.csp`.
+ *
+ * Runtime: CloudFront Functions JavaScript runtime 2.0 (ES5.1 plus selected
+ * ES6-ES12 features; no network or file-system access, no dynamic code evaluation).
+ *
+ * Frame-ancestors is the security-critical bit that doesn't vary per
+ * resource. It's substituted into this file at CDK synth time by
+ * `loadMcpSandboxCspFunctionCode` (lib/mcp-sandbox-stack.ts), which
+ * replaces the __INJECT_FRAME_ANCESTORS__ string literal below with a
+ * properly JSON-escaped JS literal. The loader asserts the quoted form
+ * appears exactly once; this comment uses the bare token so it does not
+ * count as a second occurrence. Tests run the file as-is and pass
+ * `frameAncestors` directly to `buildCspHeader`.
+ *
+ * The trailing `if (typeof module !== 'undefined')` block is a no-op in
+ * the CloudFront Function runtime (`module` is undeclared, `typeof`
+ * returns `'undefined'`) and exposes the pure helpers for Node unit
+ * tests in `infrastructure/test/mcp-sandbox-csp-function.test.ts`. Don't
+ * delete it — it's how we keep the runtime code and the test surface in
+ * the same file with no duplication.
+ */
+'use strict';
+
+var FRAME_ANCESTORS = '__INJECT_FRAME_ANCESTORS__';
+
+/**
+ * Reject domain entries containing CSP-injection characters. Mirrors the
+ * upstream reference's regex exactly: `;` / CR / LF break out to a new
+ * directive; quotes inject CSP keywords (e.g. `'unsafe-eval'`); space
+ * smuggles multiple sources into one entry.
+ */
+function sanitizeCspDomains(domains) {
+  if (!Array.isArray(domains)) return [];
+  var out = [];
+  for (var i = 0; i < domains.length; i++) {
+    var d = domains[i];
+    if (typeof d === 'string' && d.length > 0 && !/[;\r\n'" ]/.test(d)) {
+      out.push(d);
+    }
+  }
+  return out;
+}
+
+/**
+ * Compose the CSP header. Defaults mirror the ext-apps basic-host
+ * reference (script-src ... 'unsafe-eval' blob: data:), broader than the
+ * spec's "Restrictive Default" so bundled Apps that omit `ui.csp` still
+ * run. Declared resource/connect/frame/baseUri domains are appended to
+ * the corresponding directives.
+ *
+ * `frame-ancestors`, `form-action`, and `object-src` are security-critical
+ * and never vary per resource.
+ */
+function buildCspHeader(cspConfig, frameAncestors) {
+  var csp = cspConfig || {};
+  var resourceDomains = sanitizeCspDomains(csp.resourceDomains).join(' ');
+  var connectDomains = sanitizeCspDomains(csp.connectDomains).join(' ');
+  var frameDomains = sanitizeCspDomains(csp.frameDomains).join(' ');
+  var baseUriDomains = sanitizeCspDomains(csp.baseUriDomains).join(' ');
+
+  var directives = [
+    "default-src 'self' 'unsafe-inline'",
+    ("script-src 'self' 'unsafe-inline' 'unsafe-eval' blob: data: " + resourceDomains).trim(),
+    ("style-src 'self' 'unsafe-inline' blob: data: " + resourceDomains).trim(),
+    ("img-src 'self' data: blob: " + resourceDomains).trim(),
+    ("font-src 'self' data: blob: " + resourceDomains).trim(),
+    ("media-src 'self' data: blob: " + resourceDomains).trim(),
+    ("connect-src 'self' " + connectDomains).trim(),
+    ("worker-src 'self' blob: " + resourceDomains).trim(),
+    frameDomains ? ("frame-src " + frameDomains) : "frame-src 'none'",
+    "object-src 'none'",
+    baseUriDomains ? ("base-uri " + baseUriDomains) : "base-uri 'none'",
+    "form-action 'none'",
+    "frame-ancestors " + frameAncestors,
+  ];
+
+  return directives.join('; ');
+}
+
+/**
+ * Try to parse a string as a JSON object (not array, not scalar). Returns
+ * the parsed object or null. Never throws.
+ */
+function tryParseObject(s) {
+  try {
+    var parsed = JSON.parse(s);
+    if (parsed && typeof parsed === 'object' && !Array.isArray(parsed)) {
+      return parsed;
+    }
+    return null;
+  } catch (e) {
+    return null;
+  }
+}
+
+/**
+ * Extract and parse the `csp` query parameter. CloudFront Functions
+ * deliver `request.querystring[name].value` as the parameter value;
+ * whether that value is URL-decoded depends on runtime/event-type and
+ * doesn't always match the docs in practice. We try the value as-is
+ * first, then fall back to an explicit `decodeURIComponent` if that
+ * parse fails. Returns the parsed CSP object or null on absent /
+ * malformed / non-object input — never throws.
+ */
+function parseCspParam(querystring) {
+  if (!querystring) return null;
+  var entry = querystring.csp;
+  if (!entry || typeof entry.value !== 'string' || entry.value.length === 0) {
+    return null;
+  }
+  var asIs = tryParseObject(entry.value);
+  if (asIs !== null) return asIs;
+  var decoded;
+  try {
+    decoded = decodeURIComponent(entry.value);
+  } catch (e) {
+    return null;
+  }
+  if (decoded === entry.value) return null;
+  return tryParseObject(decoded);
+}
+
+function handler(event) {
+  var request = event.request || {};
+  var response = event.response || {};
+  response.headers = response.headers || {};
+
+  var cspConfig = parseCspParam(request.querystring);
+  var csp = buildCspHeader(cspConfig, FRAME_ANCESTORS);
+
+  response.headers['content-security-policy'] = { value: csp };
+  return response;
+}
+
+if (typeof module !== 'undefined' && module.exports) {
+  module.exports = {
+    sanitizeCspDomains: sanitizeCspDomains,
+    buildCspHeader: buildCspHeader,
+    parseCspParam: parseCspParam,
+    handler: handler,
+  };
+}
diff --git a/infrastructure/assets/mcp-sandbox/proxy.html b/infrastructure/assets/mcp-sandbox/proxy.html
new file mode 100644
index 00000000..8cd8dfe6
--- /dev/null
+++ b/infrastructure/assets/mcp-sandbox/proxy.html
@@ -0,0 +1,50 @@
+<!DOCTYPE html>
+<!--
+  MCP Apps host renderer — Sandbox Proxy shell (OUTER iframe)
+
+  PR #1 (origin + shell) + PR #4 (proxy.js protocol) of
+  docs/kaizen/scoping/mcp-apps-host-renderer.md.
+
+  This is the static shell served at mcp-sandbox.{domain}/proxy.html. It is
+  the OUTER half of the spec's "Sandbox Proxy pattern" for web hosts:
+
+    ai.client (SPA)
+      └─ <iframe src="https://mcp-sandbox.{domain}/proxy.html">   ← THIS FILE
+           └─ <iframe>  inner content frame (created by proxy.js)
+
+  Why two iframes: this outer page is a stable cross-origin boundary against
+  the host page, so the INNER frame (same-origin with this proxy by default,
+  null-origin if the App drops allow-same-origin) never gets access to the
+  SPA origin (cookies / localStorage / app API). proxy.js gives the inner
+  frame the per-resource CSP composed from `_meta.ui.csp` (carried on the `ui_resource` SSE event).
+
+  proxy.js (PR #4) implements the sandbox-proxy half of the JSON-RPC-2.0
+  postMessage protocol: it announces readiness, receives the resource, loads
+  the View under a composed CSP, and forwards messages host<->View with
+  per-frame nonce auth. Until PR #7 flips MCP_APPS_HOST_ENABLED nothing in
+  the SPA points at this page, so deploying this stack is inert.
+
+  CSP NOTE: the security-critical control for this PR is the response
+  `Content-Security-Policy: frame-ancestors <SPA origin only>` stamped by the
+  CloudFront ResponseHeadersPolicy in mcp-sandbox-stack.ts — that is what
+  prevents any site other than the SPA from embedding this proxy. This
+  document loads no inline scripts/styles so the served CSP can stay
+  `script-src 'self'` with no `'unsafe-inline'`.
+-->
+<html lang="en">
+  <head>
+    <meta charset="utf-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1" />
+    <meta name="referrer" content="no-referrer" />
+    <title>MCP Apps Sandbox Proxy</title>
+  </head>
+  <body>
+    <!--
+      The inner content frame is created at runtime by proxy.js (document.write,
+      or srcdoc as a fallback), not declared here: its HTML / sandbox / CSP / allow are derived
+      per-resource from the `ui_resource` SSE event the host relays in the
+      `ui/notifications/sandbox-resource-ready` message.
+    -->
+    <script src="proxy.js"></script>
+  </body>
+</html>
diff --git a/infrastructure/assets/mcp-sandbox/proxy.js b/infrastructure/assets/mcp-sandbox/proxy.js
new file mode 100644
index 00000000..371e7553
--- /dev/null
+++ b/infrastructure/assets/mcp-sandbox/proxy.js
@@ -0,0 +1,323 @@
+/*
+ * MCP Apps host renderer — Sandbox Proxy (OUTER iframe).
+ *
+ * PR #4 of docs/kaizen/scoping/mcp-apps-host-renderer.md (supersedes the
+ * PR #1 liveness shell). Normative spec: ext-apps
+ * specification/2026-01-26/apps.mdx, "Sandbox proxy".
+ *
+ * This file is served from the dedicated mcp-sandbox origin and runs inside
+ * the OUTER iframe the SPA created with sandbox="allow-scripts
+ * allow-same-origin". It is the stable cross-origin boundary between the
+ * host (ai.client) and the untrusted App View (an inner iframe this script
+ * creates, mounted via document.write per the ext-apps basic-host
+ * reference). The inner iframe defaults to allow-scripts +
+ * allow-same-origin + allow-forms (matches the basic-host reference) so
+ * document.write can populate it and typical App bundles can use
+ * localStorage at the sandbox origin; the App may override its
+ * `_meta.ui.sandbox` to opt back into null-origin (we'll fall back to
+ * srcdoc when contentDocument isn't writable).
+ *
+ * Responsibilities (spec §"Sandbox proxy"):
+ *   3. Announce readiness to the host (ui/notifications/sandbox-proxy-ready).
+ *   4. Receive the raw HTML + sandbox/CSP/permissions
+ *      (ui/notifications/sandbox-resource-ready).
+ *   5. Load the View HTML in the inner iframe via document.write (falls
+ *      back to srcdoc if the App opted into a stricter cross-origin
+ *      sandbox). Inject a per-resource <meta> CSP composed from
+ *      _meta.ui.csp; map _meta.ui.permissions onto the inner iframe's
+ *      `allow` attribute. NOTE: the inner doc also INHERITS this proxy's
+ *      HTTP CSP (CSP3 local-scheme rule applies to srcdoc + document.write-
+ *      populated about:blank alike); that inherited policy — set in
+ *      mcp-sandbox-stack.ts — is the load-bearing security bound. The
+ *      injected <meta> can only further-restrict via intersection.
+ *   6. Forward every JSON-RPC message host<->View whose method does not
+ *      start with "ui/notifications/sandbox-". The host enforces the
+ *      "no sends before initialized" rule; the proxy is a dumb pipe.
+ *
+ * Auth: a per-frame nonce, minted by the host and delivered in
+ * sandbox-resource-ready, authenticates the host<->proxy leg. The proxy
+ * adds the nonce on View->host forwards and strips it on host->View
+ * forwards (the View speaks plain spec JSON-RPC and never sees transport
+ * auth). proxy.html itself ships zero inline content so 'unsafe-inline'
+ * on the inherited CSP can't be exploited against the shell.
+ */
+(function () {
+  'use strict';
+
+  var PROXY_READY = 'ui/notifications/sandbox-proxy-ready';
+  var RESOURCE_READY = 'ui/notifications/sandbox-resource-ready';
+  var SANDBOX_RESERVED_PREFIX = 'ui/notifications/sandbox-';
+
+  var hostWindow = window.parent;
+  var hostOrigin = null; // learned from the first sandbox-resource-ready
+  var nonce = null;
+  var inner = null;
+  var innerReady = false;
+  var pendingToInner = []; // host->View messages queued until inner loads
+
+  // --- CSP composition (spec §"Sandbox proxy" point 5 + Host Behavior) ----
+
+  function list(domains) {
+    return Array.isArray(domains)
+      ? domains.filter(function (d) {
+          return typeof d === 'string' && d.length > 0;
+        })
+      : [];
+  }
+
+  // Restrictive default when no _meta.ui.csp is supplied (verbatim from the
+  // normative spec), hardened with object-src/frame-src/base-uri.
+  function defaultCsp() {
+    return [
+      "default-src 'none'",
+      "script-src 'self' 'unsafe-inline'",
+      "style-src 'self' 'unsafe-inline'",
+      "img-src 'self' data:",
+      "media-src 'self' data:",
+      "font-src 'self'",
+      "connect-src 'none'",
+      "frame-src 'none'",
+      "base-uri 'self'",
+      "object-src 'none'",
+      "form-action 'none'"
+    ].join('; ');
+  }
+
+  // Compose from declared domains. resourceDomains maps to the static
+  // resource directives; connectDomains to connect-src; frameDomains to
+  // frame-src; baseUriDomains to base-uri. Undeclared => deny (spec: MUST
+  // NOT allow undeclared domains; MAY further restrict).
+  function composeCsp(csp) {
+    if (!csp || typeof csp !== 'object') {
+      return defaultCsp();
+    }
+    var res = list(csp.resourceDomains).join(' ');
+    var conn = list(csp.connectDomains).join(' ');
+    var frame = list(csp.frameDomains).join(' ');
+    var base = list(csp.baseUriDomains).join(' ');
+    return [
+      "default-src 'none'",
+      ("script-src 'self' 'unsafe-inline'" + (res ? ' ' + res : '')),
+      ("style-src 'self' 'unsafe-inline'" + (res ? ' ' + res : '')),
+      ("img-src 'self' data:" + (res ? ' ' + res : '')),
+      ("font-src 'self'" + (res ? ' ' + res : '')),
+      ("media-src 'self' data:" + (res ? ' ' + res : '')),
+      ('connect-src ' + (conn || "'none'")),
+      ('frame-src ' + (frame || "'none'")),
+      ('base-uri ' + (base || "'self'")),
+      "object-src 'none'",
+      "form-action 'none'"
+    ].join('; ');
+  }
+
+  // Map _meta.ui.permissions (object form, SEP-1865) to a Permissions-Policy
+  // `allow` attribute value for the inner iframe.
+  function allowAttr(permissions) {
+    if (!permissions || typeof permissions !== 'object') {
+      return '';
+    }
+    var feats = [];
+    if (permissions.camera) feats.push('camera');
+    if (permissions.microphone) feats.push('microphone');
+    if (permissions.geolocation) feats.push('geolocation');
+    if (permissions.clipboardWrite) feats.push('clipboard-write');
+    return feats.join('; ');
+  }
+
+  // Inject the composed CSP as the first <head> child so it governs the
+  // whole document. Relies on the App being a valid HTML5 document (spec
+  // MUST); falls back to wrapping if no <head> is present.
+  function withCsp(html, cspValue) {
+    var meta =
+      '<meta http-equiv="Content-Security-Policy" content="' +
+      cspValue.replace(/"/g, '&quot;') +
+      '">';
+    if (/<head[^>]*>/i.test(html)) {
+      return html.replace(/(<head[^>]*>)/i, '$1' + meta);
+    }
+    if (/<html[^>]*>/i.test(html)) {
+      return html.replace(/(<html[^>]*>)/i, '$1<head>' + meta + '</head>');
+    }
+    return '<!doctype html><html><head>' + meta + '</head><body>' + html +
+      '</body></html>';
+  }
+
+  // --- inner iframe (the View) -------------------------------------------
+
+  function mountView(params) {
+    // Default matches the ext-apps basic-host reference
+    // (`examples/basic-host/src/sandbox.ts`): allow-scripts +
+    // allow-same-origin + allow-forms. allow-same-origin lets document.write
+    // populate the inner doc (contentDocument is only accessible when the
+    // inner is same-origin to this proxy — fine, the proxy origin is a
+    // static CDN with no shared state) and lets typical bundled Apps reach
+    // localStorage at the sandbox origin. The App can override via
+    // `_meta.ui.sandbox` to opt back into null-origin for stricter isolation;
+    // we'll fall back to srcdoc when contentDocument isn't writable.
+    var sandbox =
+      typeof params.sandbox === 'string' && params.sandbox
+        ? params.sandbox
+        : 'allow-scripts allow-same-origin allow-forms';
+    var allow = allowAttr(params.permissions);
+
+    inner = document.createElement('iframe');
+    inner.id = 'mcp-app-content';
+    inner.title = 'MCP App content';
+    inner.setAttribute('sandbox', sandbox);
+    if (allow) {
+      inner.setAttribute('allow', allow);
+    }
+    inner.setAttribute('referrerpolicy', 'no-referrer');
+    inner.style.cssText =
+      'border:0;width:100%;height:100%;display:block;background:#fff';
+    inner.addEventListener('load', function () {
+      innerReady = true;
+      var queued = pendingToInner.splice(0, pendingToInner.length);
+      for (var i = 0; i < queued.length; i++) {
+        postToInner(queued[i]);
+      }
+    });
+    document.body.appendChild(inner);
+
+    // Build the App document. Per the ext-apps basic-host reference
+    // (examples/basic-host/src/sandbox.ts): "Use document.write instead
+    // of srcdoc (which the CesiumJS Map won't work with)". The inner
+    // document inherits this proxy's HTTP CSP either way — that's the
+    // load-bearing security boundary (see buildMcpSandboxProxyCsp in
+    // mcp-sandbox-stack.ts). The per-App CSP meta tag we still inject
+    // *intersects* the inherited policy, so Apps can further restrict
+    // but not loosen. Per-frame nonce is the channel auth.
+    var html = withCsp(String(params.html || ''), composeCsp(params.csp));
+    var doc = null;
+    try {
+      doc = inner.contentDocument || (inner.contentWindow && inner.contentWindow.document);
+    } catch (_) {
+      // Cross-origin access throws; we'll fall back to srcdoc below.
+      doc = null;
+    }
+    if (doc) {
+      doc.open();
+      doc.write(html);
+      doc.close();
+    } else {
+      // Fallback path: the App opted into a stricter sandbox without
+      // allow-same-origin, so contentDocument is cross-origin. srcdoc
+      // works for the vast majority of Apps; CesiumJS-class outliers
+      // would need to relax their declared sandbox.
+      inner.setAttribute('srcdoc', html);
+    }
+  }
+
+  // --- message plumbing ---------------------------------------------------
+
+  function isJsonRpc(d) {
+    return d && typeof d === 'object' && d.jsonrpc === '2.0';
+  }
+
+  function methodOf(d) {
+    return d && typeof d.method === 'string' ? d.method : null;
+  }
+
+  function isSandboxReserved(method) {
+    return !!method && method.indexOf(SANDBOX_RESERVED_PREFIX) === 0;
+  }
+
+  function postToInner(msg) {
+    if (!inner || !inner.contentWindow) {
+      return;
+    }
+    // Inner can be null-origin (no allow-same-origin), so targetOrigin must
+    // be "*". Strip the nonce so the View only sees spec-clean JSON-RPC.
+    var clean = {};
+    for (var k in msg) {
+      if (Object.prototype.hasOwnProperty.call(msg, k) && k !== 'nonce') {
+        clean[k] = msg[k];
+      }
+    }
+    inner.contentWindow.postMessage(clean, '*');
+  }
+
+  function postToHost(msg) {
+    if (!hostWindow) {
+      return;
+    }
+    var withNonce = {};
+    for (var k in msg) {
+      if (Object.prototype.hasOwnProperty.call(msg, k)) {
+        withNonce[k] = msg[k];
+      }
+    }
+    if (nonce) {
+      withNonce.nonce = nonce;
+    }
+    hostWindow.postMessage(withNonce, hostOrigin || '*');
+  }
+
+  function onHostMessage(event) {
+    var data = event.data;
+    if (!isJsonRpc(data)) {
+      return;
+    }
+    var method = methodOf(data);
+
+    if (method === RESOURCE_READY) {
+      // First authenticated host message: lock onto the host origin and
+      // the per-frame nonce, then mount the View.
+      if (inner) {
+        return; // one resource per proxy instance
+      }
+      hostOrigin = event.origin && event.origin !== 'null' ? event.origin : null;
+      nonce =
+        data.params && typeof data.params.nonce === 'string'
+          ? data.params.nonce
+          : null;
+      mountView(data.params || {});
+      return;
+    }
+
+    // Reserved sandbox-* messages are proxy-private and never forwarded.
+    if (isSandboxReserved(method)) {
+      return;
+    }
+
+    // Everything else is host->View. Authenticate the nonce once armed.
+    if (nonce && data.nonce !== nonce) {
+      return;
+    }
+    if (innerReady) {
+      postToInner(data);
+    } else {
+      pendingToInner.push(data);
+    }
+  }
+
+  function onInnerMessage(event) {
+    if (!inner || event.source !== inner.contentWindow) {
+      return;
+    }
+    var data = event.data;
+    if (!isJsonRpc(data)) {
+      return;
+    }
+    // The View must not speak the reserved sandbox channel.
+    if (isSandboxReserved(methodOf(data))) {
+      return;
+    }
+    postToHost(data);
+  }
+
+  window.addEventListener('message', function (event) {
+    if (event.source === hostWindow) {
+      onHostMessage(event);
+    } else if (inner && event.source === inner.contentWindow) {
+      onInnerMessage(event);
+    }
+  });
+
+  // Step 3: announce readiness. targetOrigin "*" is acceptable — this
+  // carries no secret and the host validates by source window + origin
+  // before sending the nonce-bearing resource.
+  if (hostWindow && hostWindow !== window) {
+    hostWindow.postMessage({ jsonrpc: '2.0', method: PROXY_READY, params: {} }, '*');
+  }
+})();
diff --git a/infrastructure/bin/infrastructure.ts b/infrastructure/bin/infrastructure.ts
index 9ecad100..898b76ca 100644
--- a/infrastructure/bin/infrastructure.ts
+++ b/infrastructure/bin/infrastructure.ts
@@ -7,6 +7,8 @@ import { InferenceApiStack } from '../lib/inference-api-stack';
 import { GatewayStack } from '../lib/gateway-stack';
 import { RagIngestionStack } from '../lib/rag-ingestion-stack';
 import { SageMakerFineTuningStack } from '../lib/sagemaker-fine-tuning-stack';
+import { ArtifactsStack } from '../lib/artifacts-stack';
+import { McpSandboxStack } from '../lib/mcp-sandbox-stack';
 import { loadConfig, getStackEnv } from '../lib/config';
 
 const app = new cdk.App();
@@ -23,6 +25,32 @@ new InfrastructureStack(app, 'InfrastructureStack', {
   stackName: `${config.projectPrefix}-InfrastructureStack`,
 });
 
+// Artifacts Stack - iframe-isolated artifact rendering (deploy AFTER Infrastructure,
+// BEFORE Inference API / App API / Frontend, which read its SSM exports).
+// Parallel-safe with RAG Ingestion and Fine-Tuning.
+if (config.artifacts.enabled) {
+  new ArtifactsStack(app, 'ArtifactsStack', {
+    config,
+    env,
+    description: `${config.projectPrefix} Artifacts Stack - DDB, S3, CloudFront + Lambda render service`,
+    stackName: `${config.projectPrefix}-ArtifactsStack`,
+  });
+}
+
+// MCP Sandbox Stack - S3 + CloudFront + Route53 serving the MCP Apps
+// sandbox-proxy shell at mcp-sandbox.{domain}. PR #1 of the MCP Apps host
+// renderer initiative; deploy tier 1, parallel-safe with Artifacts / RAG /
+// Gateway / Fine-Tuning (reads no cross-stack SSM). Inert until the SPA
+// wiring (PR #4) and MCP_APPS_HOST_ENABLED (PR #7) land.
+if (config.mcpSandbox.enabled) {
+  new McpSandboxStack(app, 'McpSandboxStack', {
+    config,
+    env,
+    description: `${config.projectPrefix} MCP Sandbox Stack - S3, CloudFront, Route53 (MCP Apps proxy origin)`,
+    stackName: `${config.projectPrefix}-McpSandboxStack`,
+  });
+}
+
 // Frontend Stack - S3 + CloudFront + Route53
 if (config.frontend.enabled) {
   new FrontendStack(app, 'FrontendStack', {
diff --git a/infrastructure/cdk.context.json b/infrastructure/cdk.context.json
index e4b5b41f..524b8eaa 100644
--- a/infrastructure/cdk.context.json
+++ b/infrastructure/cdk.context.json
@@ -24,7 +24,7 @@
     "enabled": true,
     "cpu": 512,
     "memory": 1024,
-    "desiredCount": 1,
+    "desiredCount": 2,
     "maxCapacity": 10
   },
   "inferenceApi": {
@@ -64,6 +64,11 @@
   "fineTuning": {
     "enabled": false
   },
+  "artifacts": {
+    "enabled": false,
+    "certificateArn": "",
+    "retentionDays": 90
+  },
   "tags": {
     "ManagedBy": "CDK"
   },
diff --git a/infrastructure/lib/app-api-stack.ts b/infrastructure/lib/app-api-stack.ts
index 669a4b41..b3532bb5 100644
--- a/infrastructure/lib/app-api-stack.ts
+++ b/infrastructure/lib/app-api-stack.ts
@@ -314,6 +314,14 @@ export class AppApiStack extends cdk.Stack {
       this,
       `/${config.projectPrefix}/auth/auth-providers-table-arn`
     );
+    const userMenuLinksTableName = ssm.StringParameter.valueForStringParameter(
+      this,
+      `/${config.projectPrefix}/admin/user-menu-links-table-name`
+    );
+    const userMenuLinksTableArn = ssm.StringParameter.valueForStringParameter(
+      this,
+      `/${config.projectPrefix}/admin/user-menu-links-table-arn`
+    );
     const authProviderSecretsArn = ssm.StringParameter.valueForStringParameter(
       this,
       `/${config.projectPrefix}/auth/auth-provider-secrets-arn`
@@ -332,10 +340,8 @@ export class AppApiStack extends cdk.Stack {
     );
     // Phase 7 retired the public PKCE SPA client; the BFF confidential
     // client is the only one left. `COGNITO_APP_CLIENT_ID` is still wired
-    // because `get_current_user` (Bearer auth on `/chat/agent-stream`)
-    // needs *some* client_id to validate against — point it at the BFF
-    // client so any Bearer token minted via the BFF token-exchange path
-    // is accepted there too. The cookie-auth dependency uses
+    // because `CognitoIdentityProviderService` (federated OIDC IdP
+    // management) reads it. The cookie-auth dependency uses
     // `COGNITO_BFF_APP_CLIENT_ID` (same value) via its own validator.
     const cognitoAppClientId = ssm.StringParameter.valueForStringParameter(
       this,
@@ -367,6 +373,10 @@ export class AppApiStack extends cdk.Stack {
       this,
       `/${config.projectPrefix}/auth/bff-cookie-signing-key-arn`
     );
+    const bffCookieDataKeySecretArn = ssm.StringParameter.valueForStringParameter(
+      this,
+      `/${config.projectPrefix}/auth/bff-cookie-data-key-secret-arn`
+    );
     const cognitoBFFAppClientId = ssm.StringParameter.valueForStringParameter(
       this,
       `/${config.projectPrefix}/auth/cognito/bff-app-client-id`
@@ -520,6 +530,7 @@ export class AppApiStack extends cdk.Stack {
         DYNAMODB_AUTH_PROVIDERS_TABLE_NAME: authProvidersTableName,
         AUTH_PROVIDER_SECRETS_ARN: authProviderSecretsArn,
         DYNAMODB_USER_SETTINGS_TABLE_NAME: userSettingsTableName,
+        DYNAMODB_USER_MENU_LINKS_TABLE_NAME: userMenuLinksTableName,
         // Cognito configuration (imported from Infrastructure Stack)
         COGNITO_USER_POOL_ID: cognitoUserPoolId,
         COGNITO_APP_CLIENT_ID: cognitoAppClientId,
@@ -533,6 +544,11 @@ export class AppApiStack extends cdk.Stack {
         // BFF Token Handler — wired in Phase 1, used starting Phase 2.
         BFF_SESSIONS_TABLE_NAME: bffSessionsTableName,
         BFF_COOKIE_SIGNING_KEY_ARN: bffCookieSigningKeyArn,
+        // High-entropy secret string fetched once at task startup; SHA-256
+        // is applied at runtime to derive the 32-byte AES-256 cookie key.
+        // Generated once by Secrets Manager at stack create so every app-api
+        // task derives the same plaintext key (cross-task seal/unseal).
+        BFF_COOKIE_DATA_KEY_SECRET_ARN: bffCookieDataKeySecretArn,
         BFF_SESSION_TTL_SECONDS: '28800',
         BFF_SESSION_REFRESH_LEEWAY_SECONDS: '60',
         COGNITO_BFF_APP_CLIENT_ID: cognitoBFFAppClientId,
@@ -932,6 +948,87 @@ export class AppApiStack extends cdk.Stack {
       })
     );
 
+    // ============================================================
+    // Artifacts integration
+    // ============================================================
+    // App-api owns the user-facing artifact endpoints (list, get, share)
+    // and mints short-lived render-token JWTs that authorize the iframe to
+    // load a specific artifact version. The agent runtime in inference-api
+    // is the writer of new versions; app-api here is the user-facing reader
+    // plus token minter. Gated on `config.artifacts.enabled` so app-api
+    // remains deployable when the artifacts feature is off.
+    if (config.artifacts.enabled) {
+      const artifactsBucketName = ssm.StringParameter.valueForStringParameter(
+        this,
+        `/${config.projectPrefix}/artifacts/bucket-name`
+      );
+      const artifactsBucketArn = ssm.StringParameter.valueForStringParameter(
+        this,
+        `/${config.projectPrefix}/artifacts/bucket-arn`
+      );
+      const artifactsTableName = ssm.StringParameter.valueForStringParameter(
+        this,
+        `/${config.projectPrefix}/artifacts/table-name`
+      );
+      const artifactsTableArn = ssm.StringParameter.valueForStringParameter(
+        this,
+        `/${config.projectPrefix}/artifacts/table-arn`
+      );
+      const artifactsOrigin = ssm.StringParameter.valueForStringParameter(
+        this,
+        `/${config.projectPrefix}/artifacts/origin`
+      );
+      const artifactRenderTokenSecretArn = ssm.StringParameter.valueForStringParameter(
+        this,
+        `/${config.projectPrefix}/artifacts/render-token-key-arn`
+      );
+
+      taskDefinition.taskRole.addToPrincipalPolicy(
+        new iam.PolicyStatement({
+          sid: 'ArtifactsBucketReadWrite',
+          effect: iam.Effect.ALLOW,
+          actions: [
+            's3:GetObject',
+            's3:PutObject',
+            's3:PutObjectTagging',
+            's3:ListBucket',
+            // No DeleteObject — soft-delete via object tag + bucket lifecycle.
+          ],
+          resources: [artifactsBucketArn, `${artifactsBucketArn}/*`],
+        })
+      );
+
+      taskDefinition.taskRole.addToPrincipalPolicy(
+        new iam.PolicyStatement({
+          sid: 'ArtifactsTableReadWrite',
+          effect: iam.Effect.ALLOW,
+          actions: [
+            'dynamodb:GetItem',
+            'dynamodb:PutItem',
+            'dynamodb:UpdateItem',
+            'dynamodb:Query',
+            'dynamodb:BatchGetItem',
+          ],
+          resources: [artifactsTableArn, `${artifactsTableArn}/index/*`],
+        })
+      );
+
+      taskDefinition.taskRole.addToPrincipalPolicy(
+        new iam.PolicyStatement({
+          sid: 'ArtifactRenderTokenSecretRead',
+          effect: iam.Effect.ALLOW,
+          actions: ['secretsmanager:GetSecretValue', 'secretsmanager:DescribeSecret'],
+          // Trailing wildcard covers the random suffix on the secret ARN.
+          resources: [`${artifactRenderTokenSecretArn}*`],
+        })
+      );
+
+      container.addEnvironment('S3_ARTIFACTS_BUCKET_NAME', artifactsBucketName);
+      container.addEnvironment('DYNAMODB_ARTIFACTS_TABLE_NAME', artifactsTableName);
+      container.addEnvironment('ARTIFACTS_ORIGIN', artifactsOrigin);
+      container.addEnvironment('ARTIFACTS_RENDER_TOKEN_SECRET_ARN', artifactRenderTokenSecretArn);
+    }
+
     // Grant Bedrock permissions for title generation (Nova Micro)
     taskDefinition.taskRole.addToPrincipalPolicy(
       new iam.PolicyStatement({
@@ -1035,12 +1132,27 @@ export class AppApiStack extends cdk.Stack {
       new iam.PolicyStatement({
         sid: 'BFFCookieSigningKeyAccess',
         effect: iam.Effect.ALLOW,
-        // GenerateDataKey at startup; Decrypt is reserved for future key rotation.
-        actions: ['kms:GenerateDataKey', 'kms:Decrypt', 'kms:DescribeKey'],
+        // BFFCookieDataKeySecret is encrypted at rest with this CMK, so
+        // SecretsManager invokes kms:Decrypt on the caller's behalf when
+        // app-api calls GetSecretValue. kms:GenerateDataKey is intentionally
+        // NOT granted — the runtime never mints a fresh key, so a compromised
+        // task can't seal cookies under a parallel key.
+        actions: ['kms:Decrypt'],
         resources: [bffCookieSigningKeyArn],
       })
     );
 
+    taskDefinition.taskRole.addToPrincipalPolicy(
+      new iam.PolicyStatement({
+        sid: 'BFFCookieDataKeySecretAccess',
+        effect: iam.Effect.ALLOW,
+        // Read-only on the data-key secret. PutSecretValue is intentionally
+        // not granted — the runtime cannot rotate or substitute the value.
+        actions: ['secretsmanager:GetSecretValue', 'secretsmanager:DescribeSecret'],
+        resources: [`${bffCookieDataKeySecretArn}*`], // Wildcard for random suffix
+      })
+    );
+
     taskDefinition.taskRole.addToPrincipalPolicy(
       new iam.PolicyStatement({
         sid: 'CognitoBFFAppClientSecretAccess',
@@ -1229,6 +1341,22 @@ export class AppApiStack extends cdk.Stack {
       })
     );
 
+    // Grant CRUD on the user-menu links table (admin-managed)
+    taskDefinition.taskRole.addToPrincipalPolicy(
+      new iam.PolicyStatement({
+        sid: 'UserMenuLinksTableAccess',
+        effect: iam.Effect.ALLOW,
+        actions: [
+          'dynamodb:GetItem',
+          'dynamodb:PutItem',
+          'dynamodb:UpdateItem',
+          'dynamodb:DeleteItem',
+          'dynamodb:Query',
+        ],
+        resources: [userMenuLinksTableArn],
+      })
+    );
+
     // Grant Cognito permissions for identity provider management and first-boot
     taskDefinition.taskRole.addToPrincipalPolicy(
       new iam.PolicyStatement({
diff --git a/infrastructure/lib/artifacts-stack.ts b/infrastructure/lib/artifacts-stack.ts
new file mode 100644
index 00000000..2901ad2b
--- /dev/null
+++ b/infrastructure/lib/artifacts-stack.ts
@@ -0,0 +1,358 @@
+import * as cdk from 'aws-cdk-lib';
+import * as acm from 'aws-cdk-lib/aws-certificatemanager';
+import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
+import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';
+import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
+import * as iam from 'aws-cdk-lib/aws-iam';
+import * as lambda from 'aws-cdk-lib/aws-lambda';
+import * as route53 from 'aws-cdk-lib/aws-route53';
+import * as route53Targets from 'aws-cdk-lib/aws-route53-targets';
+import * as s3 from 'aws-cdk-lib/aws-s3';
+import * as ssm from 'aws-cdk-lib/aws-ssm';
+import { Construct } from 'constructs';
+import * as path from 'path';
+import {
+  AppConfig,
+  applyStandardTags,
+  getAutoDeleteObjects,
+  getRemovalPolicy,
+  getResourceName,
+} from './config';
+
+export interface ArtifactsStackProps extends cdk.StackProps {
+  config: AppConfig;
+}
+
+/**
+ * Artifacts Stack — iframe-isolated artifact rendering pipeline.
+ *
+ * Provisions everything required to serve user-generated artifacts
+ * (HTML, code, markdown, SVG) into a sandboxed cross-origin iframe:
+ *
+ *   - DynamoDB `user-artifacts` table (heads + version log, GSI by session)
+ *   - S3 `artifacts-content` bucket (private, no CORS — HTML served via CF)
+ *   - Render Lambda (validates render-token JWT, fetches blob, returns
+ *     HTML with strict CSP)
+ *   - CloudFront distribution for `artifacts.{domainName}` (terminates TLS,
+ *     attaches the security headers policy)
+ *   - Route53 A record aliasing the subdomain to CloudFront
+ *
+ * Cross-stack contract (SSM, all `/{projectPrefix}/artifacts/*`):
+ *
+ *   Consumes (published by InfrastructureStack):
+ *     /artifacts/render-token-key-arn   Secrets Manager ARN of HMAC key
+ *
+ *   Publishes (consumed by inference-api, app-api, frontend):
+ *     /artifacts/bucket-name            S3 bucket name
+ *     /artifacts/bucket-arn             S3 bucket ARN
+ *     /artifacts/table-name             DDB table name
+ *     /artifacts/table-arn              DDB table ARN
+ *     /artifacts/origin                 "https://artifacts.{domainName}"
+ *
+ * Dependency direction is one-way: ArtifactsStack reads InfrastructureStack
+ * via SSM and publishes its own SSM parameters. Consumers (inference-api,
+ * app-api, frontend) read those parameters. No consumer publishes anything
+ * that ArtifactsStack reads — this is what prevents CDK circular references.
+ *
+ * Deploy order: InfrastructureStack → ArtifactsStack → (inference-api,
+ * app-api, frontend). Parallel-safe with RagIngestionStack and
+ * SageMakerFineTuningStack which neither read nor write artifact SSM keys.
+ */
+export class ArtifactsStack extends cdk.Stack {
+  public readonly artifactsTable: dynamodb.Table;
+  public readonly artifactsBucket: s3.Bucket;
+  public readonly renderFunction: lambda.Function;
+  public readonly distribution: cloudfront.Distribution;
+
+  constructor(scope: Construct, id: string, props: ArtifactsStackProps) {
+    super(scope, id, props);
+
+    const { config } = props;
+    applyStandardTags(this, config);
+
+    // Validation in config.ts has already enforced these for enabled stacks.
+    // Non-null assertions are safe here.
+    const domainName = config.domainName!;
+    const hostedZoneDomain = config.infrastructureHostedZoneDomain!;
+    const certificateArn = config.artifacts.certificateArn!;
+    const artifactsSubdomain = `artifacts.${domainName}`;
+
+    // CSP frame-ancestors source list: the deployed SPA origin, plus any
+    // extra origins (e.g. http://localhost:4200 for a local SPA pointed at
+    // this env). Space-separated per the CSP grammar — consumed identically
+    // by the CloudFront response-headers-policy and the render Lambda's own
+    // defense-in-depth CSP.
+    const frameAncestors = [
+      `https://${domainName}`,
+      ...config.artifacts.extraFrameAncestors,
+    ].join(' ');
+
+    // ============================================================
+    // DynamoDB — artifact metadata + per-version log
+    // ============================================================
+    // PK: USER#{user_id}
+    // SK: ARTIFACT#{artifact_id}#HEAD            (current state, 1 per artifact)
+    // SK: ARTIFACT#{artifact_id}#V#{version:05d} (immutable version records)
+    //
+    // GSI SessionIndex:
+    //   PK: SESSION#{session_id}
+    //   SK: ARTIFACT#{updated_at}#{artifact_id}
+    // ...lets the SPA list artifacts for the current session newest-first.
+    this.artifactsTable = new dynamodb.Table(this, 'ArtifactsTable', {
+      tableName: getResourceName(config, 'user-artifacts'),
+      partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
+      sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
+      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
+      pointInTimeRecoverySpecification: {
+        pointInTimeRecoveryEnabled: config.production,
+      },
+      timeToLiveAttribute: 'ttl',
+      encryption: dynamodb.TableEncryption.AWS_MANAGED,
+      removalPolicy: getRemovalPolicy(config),
+    });
+
+    this.artifactsTable.addGlobalSecondaryIndex({
+      indexName: 'SessionIndex',
+      partitionKey: { name: 'GSI1PK', type: dynamodb.AttributeType.STRING },
+      sortKey: { name: 'GSI1SK', type: dynamodb.AttributeType.STRING },
+      projectionType: dynamodb.ProjectionType.ALL,
+    });
+
+    // ============================================================
+    // S3 — artifact content blobs
+    // ============================================================
+    // Layout: {user_id}/{artifact_id}/v{n}/index.html (+ sibling assets)
+    // Private, no CORS — the iframe loads HTML directly from CloudFront
+    // (which proxies to the render Lambda), never via XHR. Versioning is
+    // at the DDB layer (immutable per-version rows + content pointer),
+    // not S3 — keeps the S3 object lifecycle simple and predictable.
+    this.artifactsBucket = new s3.Bucket(this, 'ArtifactsContentBucket', {
+      bucketName: getResourceName(config, 'artifacts-content'),
+      encryption: s3.BucketEncryption.S3_MANAGED,
+      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
+      enforceSSL: true,
+      lifecycleRules: [
+        {
+          // Clean up failed multipart uploads (mostly large React bundles)
+          // so they don't accumulate storage charges.
+          id: 'abort-stale-multipart',
+          abortIncompleteMultipartUploadAfter: cdk.Duration.days(7),
+        },
+        {
+          // Soft-deleted artifacts: the backend tags objects with
+          // `lifecycle-class=deleted` on artifact delete, and this rule
+          // reaps them after the configured retention window. Keeps the
+          // "undelete" undo affordance feasible without unbounded storage.
+          id: 'expire-soft-deleted',
+          tagFilters: { 'lifecycle-class': 'deleted' },
+          expiration: cdk.Duration.days(config.artifacts.retentionDays),
+        },
+      ],
+      removalPolicy: getRemovalPolicy(config),
+      autoDeleteObjects: getAutoDeleteObjects(config),
+    });
+
+    // ============================================================
+    // Render Lambda — validates JWT, fetches blob, wraps in HTML+CSP
+    // ============================================================
+    // ARM64 for cost; Python to match the rest of the backend toolchain.
+    // The function ships with NO third-party deps in this scaffold —
+    // when JWT verification + S3 read are implemented, add a
+    // `requirements.txt` next to handler.py and switch to
+    // `lambda.Code.fromAsset` with a Python bundling option, or move to
+    // a `DockerImageFunction` (matches the rag-ingestion stack pattern).
+    const renderTokenKeyArn = ssm.StringParameter.valueForStringParameter(
+      this,
+      `/${config.projectPrefix}/artifacts/render-token-key-arn`,
+    );
+
+    this.renderFunction = new lambda.Function(this, 'RenderFunction', {
+      functionName: getResourceName(config, 'artifact-render'),
+      runtime: lambda.Runtime.PYTHON_3_12,
+      architecture: lambda.Architecture.ARM_64,
+      handler: 'handler.handler',
+      code: lambda.Code.fromAsset(
+        path.resolve(__dirname, '..', '..', 'backend', 'src', 'lambdas', 'artifact_render'),
+      ),
+      memorySize: 512,
+      timeout: cdk.Duration.seconds(5),
+      environment: {
+        ARTIFACTS_BUCKET: this.artifactsBucket.bucketName,
+        ARTIFACTS_TABLE: this.artifactsTable.tableName,
+        RENDER_TOKEN_SECRET_ARN: renderTokenKeyArn,
+        FRAME_ANCESTOR_ORIGIN: frameAncestors,
+        // Pinned CSP allow-list. Adjust here if/when the artifact runtime
+        // grows new permitted external script origins. Keep in exact sync
+        // with the `script-src` line in `cspDirectives` below — the render
+        // Lambda reads this env var, CloudFront stamps the literal, and the
+        // two must be identical (defense in depth).
+        CSP_SCRIPT_SRC:
+          "'self' 'unsafe-inline' https://cdn.tailwindcss.com https://esm.sh https://cdn.jsdelivr.net https://unpkg.com",
+      },
+    });
+
+    // Read access to artifact content + DDB metadata. No write access —
+    // writes flow from inference-api's agent tool, granted in InferenceApiStack.
+    this.artifactsBucket.grantRead(this.renderFunction);
+    this.artifactsTable.grantReadData(this.renderFunction);
+
+    // Read the HMAC signing key from Secrets Manager. The trailing wildcard
+    // covers the random suffix Secrets Manager appends to the secret ARN.
+    this.renderFunction.addToRolePolicy(
+      new iam.PolicyStatement({
+        sid: 'ReadRenderTokenSecret',
+        actions: ['secretsmanager:GetSecretValue'],
+        resources: [`${renderTokenKeyArn}*`],
+      }),
+    );
+
+    // Lambda Function URL — invoked by CloudFront only.
+    //
+    // AWS_IAM auth + Origin Access Control (OAC) below: CloudFront signs
+    // each origin request with SigV4 using a service-principal trust the
+    // Lambda accepts, and the Function URL refuses unsigned requests. This
+    // blocks direct invocation at the *.lambda-url.{region}.on.aws hostname,
+    // so no application-layer host check is needed. (Earlier draft used NONE +
+    // CloudFront-as-gatekeeper, but `FunctionUrlOrigin.withOriginAccessControl`
+    // requires AWS_IAM; CDK enforces this at synth time.)
+    const functionUrl = this.renderFunction.addFunctionUrl({
+      authType: lambda.FunctionUrlAuthType.AWS_IAM,
+    });
+
+    // ============================================================
+    // CloudFront — terminates TLS, attaches CSP, caches nothing
+    // ============================================================
+    const certificate = acm.Certificate.fromCertificateArn(
+      this,
+      'ArtifactsCertificate',
+      certificateArn,
+    );
+
+    // Strict CSP for the artifact origin. `connect-src 'none'` is the
+    // critical line: artifact JS cannot fetch/XHR/WebSocket to the app API
+    // or phone home (image loads remain open via `img-src https:`).
+    // `frame-ancestors` pins who may embed your users' artifacts.
+    const cspDirectives = [
+      `default-src 'none'`,
+      `script-src 'self' 'unsafe-inline' https://cdn.tailwindcss.com https://esm.sh https://cdn.jsdelivr.net https://unpkg.com`,
+      `style-src 'self' 'unsafe-inline'`,
+      `img-src 'self' data: https:`,
+      `font-src 'self' data:`,
+      `connect-src 'none'`,
+      `frame-ancestors ${frameAncestors}`,
+      `form-action 'none'`,
+      `base-uri 'none'`,
+    ].join('; ');
+
+    const responseHeadersPolicy = new cloudfront.ResponseHeadersPolicy(
+      this,
+      'ArtifactsResponseHeaders',
+      {
+        responseHeadersPolicyName: getResourceName(config, 'artifacts-headers'),
+        comment: 'Strict CSP + security headers for artifact iframe origin',
+        securityHeadersBehavior: {
+          contentSecurityPolicy: {
+            contentSecurityPolicy: cspDirectives,
+            override: true,
+          },
+          contentTypeOptions: { override: true },
+          // NOT setting frameOptions here — `frame-ancestors` above is the
+          // CSP-native equivalent and is what gets enforced cross-browser.
+          referrerPolicy: {
+            referrerPolicy: cloudfront.HeadersReferrerPolicy.NO_REFERRER,
+            override: true,
+          },
+          strictTransportSecurity: {
+            accessControlMaxAge: cdk.Duration.days(365),
+            includeSubdomains: true,
+            override: true,
+          },
+        },
+      },
+    );
+
+    // FunctionUrlOrigin proxies to the Lambda Function URL. Caching is
+    // disabled because each render-token JWT is per-version-per-session
+    // and tokens carry their own auth — no useful cache key exists.
+    this.distribution = new cloudfront.Distribution(this, 'ArtifactsDistribution', {
+      comment: getResourceName(config, 'artifacts-cdn'),
+      domainNames: [artifactsSubdomain],
+      certificate,
+      minimumProtocolVersion: cloudfront.SecurityPolicyProtocol.TLS_V1_2_2021,
+      defaultBehavior: {
+        origin: origins.FunctionUrlOrigin.withOriginAccessControl(functionUrl),
+        viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
+        cachePolicy: cloudfront.CachePolicy.CACHING_DISABLED,
+        originRequestPolicy: cloudfront.OriginRequestPolicy.ALL_VIEWER_EXCEPT_HOST_HEADER,
+        responseHeadersPolicy,
+        allowedMethods: cloudfront.AllowedMethods.ALLOW_GET_HEAD,
+        compress: true,
+      },
+      // Use the cheapest price class for artifacts: these aren't
+      // latency-critical and most of the audience is regional anyway.
+      priceClass: cloudfront.PriceClass.PRICE_CLASS_100,
+    });
+
+    // ============================================================
+    // Route53 — alias artifacts.{domainName} to the CloudFront distro
+    // ============================================================
+    const hostedZone = route53.HostedZone.fromLookup(this, 'HostedZone', {
+      domainName: hostedZoneDomain,
+    });
+
+    new route53.ARecord(this, 'ArtifactsAliasRecord', {
+      zone: hostedZone,
+      recordName: artifactsSubdomain,
+      target: route53.RecordTarget.fromAlias(new route53Targets.CloudFrontTarget(this.distribution)),
+      comment: 'Artifact iframe origin — proxies to CloudFront → render Lambda',
+    });
+
+    // ============================================================
+    // SSM exports — the outward contract
+    // ============================================================
+    new ssm.StringParameter(this, 'ArtifactsBucketNameParameter', {
+      parameterName: `/${config.projectPrefix}/artifacts/bucket-name`,
+      stringValue: this.artifactsBucket.bucketName,
+      description: 'S3 bucket holding artifact content blobs',
+      tier: ssm.ParameterTier.STANDARD,
+    });
+
+    new ssm.StringParameter(this, 'ArtifactsBucketArnParameter', {
+      parameterName: `/${config.projectPrefix}/artifacts/bucket-arn`,
+      stringValue: this.artifactsBucket.bucketArn,
+      description: 'ARN of the artifact content bucket (for IAM grants)',
+      tier: ssm.ParameterTier.STANDARD,
+    });
+
+    new ssm.StringParameter(this, 'ArtifactsTableNameParameter', {
+      parameterName: `/${config.projectPrefix}/artifacts/table-name`,
+      stringValue: this.artifactsTable.tableName,
+      description: 'DynamoDB table holding artifact heads + version log',
+      tier: ssm.ParameterTier.STANDARD,
+    });
+
+    new ssm.StringParameter(this, 'ArtifactsTableArnParameter', {
+      parameterName: `/${config.projectPrefix}/artifacts/table-arn`,
+      stringValue: this.artifactsTable.tableArn,
+      description: 'ARN of the artifacts table (for IAM grants)',
+      tier: ssm.ParameterTier.STANDARD,
+    });
+
+    new ssm.StringParameter(this, 'ArtifactsOriginParameter', {
+      parameterName: `/${config.projectPrefix}/artifacts/origin`,
+      stringValue: `https://${artifactsSubdomain}`,
+      description: 'Origin where artifact iframes are served (https://artifacts.{domain})',
+      tier: ssm.ParameterTier.STANDARD,
+    });
+
+    // Human-readable CloudFormation outputs for deploy-time visibility.
+    new cdk.CfnOutput(this, 'ArtifactsOrigin', {
+      value: `https://${artifactsSubdomain}`,
+      description: 'Artifact iframe origin URL',
+    });
+    new cdk.CfnOutput(this, 'ArtifactsDistributionId', {
+      value: this.distribution.distributionId,
+      description: 'CloudFront distribution ID for the artifact origin',
+    });
+  }
+}
diff --git a/infrastructure/lib/config.ts b/infrastructure/lib/config.ts
index 33d08039..b6ae1c1a 100644
--- a/infrastructure/lib/config.ts
+++ b/infrastructure/lib/config.ts
@@ -33,10 +33,55 @@ export interface AppConfig {
   fileUpload: FileUploadConfig;
   ragIngestion: RagIngestionConfig;
   fineTuning: FineTuningConfig;
+  artifacts: ArtifactsConfig;
+  mcpSandbox: McpSandboxConfig;
   appVersion: string;
   tags: { [key: string]: string };
 }
 
+/**
+ * MCP Apps host renderer — sandbox-proxy origin (PR #1 of the
+ * docs/kaizen/scoping/mcp-apps-host-renderer.md sequence).
+ *
+ * Provisions a dedicated cross-origin shell (mcp-sandbox.{domainName}) that
+ * the SPA's <mcp-app-frame> is pointed at. When `mcpSandbox.enabled`, the
+ * inference-api stack consumes this stack's SSM origin export into
+ * `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` (conditional-SSM pattern, mirrors
+ * artifacts). The host renderer is gated by MCP_APPS_HOST_ENABLED, flipped
+ * on in PR #7; with this stack disabled the surface stays dormant because
+ * the SPA has no proxy origin to frame an App in.
+ */
+export interface McpSandboxConfig {
+  // When false the stack is not instantiated at all (bin/infrastructure.ts
+  // skips it). Default false — the origin is opt-in per environment and the
+  // initiative is gated end-to-end until PR #7.
+  enabled: boolean;
+  // ACM certificate ARN for the proxy origin (mcp-sandbox.{domainName}).
+  // MUST be in us-east-1 — CloudFront requires its viewer certs there.
+  // Required (and region-validated) only when enabled; without it the stack
+  // still synthesizes on the CloudFront default domain so unit/synth tests
+  // and domain-less local stacks work.
+  certificateArn?: string;
+  // Extra origins (beyond https://{domainName}) allowed to embed the proxy
+  // iframe via CSP frame-ancestors — e.g. http://localhost:4200 for a local
+  // SPA pointed at this deployment. Empty on prod.
+  extraFrameAncestors: string[];
+}
+
+export interface ArtifactsConfig {
+  enabled: boolean;
+  // ACM certificate ARN for the artifact iframe origin (artifacts.{domainName}).
+  // MUST be in us-east-1 — CloudFront requires its certs there. Validation
+  // surfaces a clear error if the arn is in another region.
+  certificateArn?: string;
+  // Soft-delete retention window for objects tagged `lifecycle-class=deleted`.
+  retentionDays: number;
+  // Extra origins (beyond https://{domainName}) allowed to embed artifact
+  // iframes via CSP frame-ancestors — e.g. http://localhost:4200 for a
+  // local SPA pointed at this deployment. Empty on prod.
+  extraFrameAncestors: string[];
+}
+
 export interface FrontendConfig {
   certificateArn?: string;
   enabled: boolean;
@@ -261,6 +306,23 @@ export function loadConfig(scope: cdk.App): AppConfig {
       defaultQuotaHours: parseIntEnv(process.env.CDK_FINE_TUNING_DEFAULT_QUOTA_HOURS) ?? scope.node.tryGetContext('fineTuning')?.defaultQuotaHours ?? 0,
       additionalCorsOrigins: process.env.CDK_FINE_TUNING_CORS_ORIGINS || scope.node.tryGetContext('fineTuning')?.additionalCorsOrigins,
     },
+    artifacts: {
+      enabled: parseBooleanEnv(process.env.CDK_ARTIFACTS_ENABLED) ?? scope.node.tryGetContext('artifacts')?.enabled ?? false,
+      certificateArn: process.env.CDK_ARTIFACTS_CERTIFICATE_ARN || scope.node.tryGetContext('artifacts')?.certificateArn,
+      retentionDays: parseIntEnv(process.env.CDK_ARTIFACTS_RETENTION_DAYS) ?? scope.node.tryGetContext('artifacts')?.retentionDays ?? 90,
+      extraFrameAncestors: process.env.CDK_ARTIFACTS_EXTRA_FRAME_ANCESTORS?.split(',')
+        .map((s) => s.trim()).filter(Boolean)
+        || scope.node.tryGetContext('artifacts')?.extraFrameAncestors
+        || [],
+    },
+    mcpSandbox: {
+      enabled: parseBooleanEnv(process.env.CDK_MCP_SANDBOX_ENABLED) ?? scope.node.tryGetContext('mcpSandbox')?.enabled ?? false,
+      certificateArn: process.env.CDK_MCP_SANDBOX_CERTIFICATE_ARN || scope.node.tryGetContext('mcpSandbox')?.certificateArn,
+      extraFrameAncestors: process.env.CDK_MCP_SANDBOX_EXTRA_FRAME_ANCESTORS?.split(',')
+        .map((s) => s.trim()).filter(Boolean)
+        || scope.node.tryGetContext('mcpSandbox')?.extraFrameAncestors
+        || [],
+    },
     tags: {
       ...(scope.node.tryGetContext('tags') || {}),
     },
@@ -277,6 +339,8 @@ export function loadConfig(scope: cdk.App): AppConfig {
   console.log(`   Inference API Enabled: ${config.inferenceApi.enabled}`);
   console.log(`   Gateway Enabled: ${config.gateway.enabled}`);
   console.log(`   Fine-Tuning Enabled: ${config.fineTuning.enabled}`);
+  console.log(`   Artifacts Enabled: ${config.artifacts.enabled}`);
+  console.log(`   MCP Sandbox Enabled: ${config.mcpSandbox.enabled}`);
   console.log(`   App Version: ${config.appVersion}`);
 
   // Validate configuration
@@ -507,6 +571,70 @@ function validateConfig(config: AppConfig): void {
       throw new Error('Gateway stack requires "throttleBurstLimit" to be set.');
     }
   }
+
+  if (config.artifacts.enabled) {
+    if (!config.domainName) {
+      throw new Error(
+        'Artifacts stack requires CDK_DOMAIN_NAME to be set — the artifact origin ' +
+        'is derived as artifacts.{CDK_DOMAIN_NAME}.'
+      );
+    }
+    if (!config.infrastructureHostedZoneDomain) {
+      throw new Error(
+        'Artifacts stack requires CDK_HOSTED_ZONE_DOMAIN to be set — used to look ' +
+        'up the Route53 zone where the artifacts subdomain record is created.'
+      );
+    }
+    if (!config.artifacts.certificateArn) {
+      throw new Error(
+        'Artifacts stack requires CDK_ARTIFACTS_CERTIFICATE_ARN — an ACM cert ' +
+        'in us-east-1 for the artifacts.{domain} CloudFront distribution.'
+      );
+    }
+    // CloudFront requires the viewer cert in us-east-1. Catch the most common
+    // misconfiguration up front rather than letting CloudFormation reject it.
+    if (!/^arn:aws:acm:us-east-1:/.test(config.artifacts.certificateArn)) {
+      throw new Error(
+        `Artifacts certificate must be in us-east-1 (CloudFront requirement). ` +
+        `Got: ${config.artifacts.certificateArn}`
+      );
+    }
+    if (!Number.isInteger(config.artifacts.retentionDays) || config.artifacts.retentionDays < 1 || config.artifacts.retentionDays > 3650) {
+      throw new Error(
+        `Artifacts retentionDays must be an integer between 1 and 3650. Got: ${config.artifacts.retentionDays}`
+      );
+    }
+  }
+
+  if (config.mcpSandbox.enabled) {
+    if (!config.domainName) {
+      throw new Error(
+        'MCP Sandbox stack requires CDK_DOMAIN_NAME to be set — the proxy origin ' +
+        'is derived as mcp-sandbox.{CDK_DOMAIN_NAME} and the CSP frame-ancestors ' +
+        'is locked to the SPA origin https://{CDK_DOMAIN_NAME}.'
+      );
+    }
+    if (!config.infrastructureHostedZoneDomain) {
+      throw new Error(
+        'MCP Sandbox stack requires CDK_HOSTED_ZONE_DOMAIN to be set — used to ' +
+        'look up the Route53 zone where the mcp-sandbox subdomain record is created.'
+      );
+    }
+    if (!config.mcpSandbox.certificateArn) {
+      throw new Error(
+        'MCP Sandbox stack requires CDK_MCP_SANDBOX_CERTIFICATE_ARN — an ACM cert ' +
+        'in us-east-1 for the mcp-sandbox.{domain} CloudFront distribution.'
+      );
+    }
+    // CloudFront requires the viewer cert in us-east-1. Catch the most common
+    // misconfiguration up front rather than letting CloudFormation reject it.
+    if (!/^arn:aws:acm:us-east-1:/.test(config.mcpSandbox.certificateArn)) {
+      throw new Error(
+        `MCP Sandbox certificate must be in us-east-1 (CloudFront requirement). ` +
+        `Got: ${config.mcpSandbox.certificateArn}`
+      );
+    }
+  }
 }
 
 /**
diff --git a/infrastructure/lib/frontend-stack.ts b/infrastructure/lib/frontend-stack.ts
index de570cd9..c81beed2 100644
--- a/infrastructure/lib/frontend-stack.ts
+++ b/infrastructure/lib/frontend-stack.ts
@@ -138,7 +138,19 @@ export class FrontendStack extends cdk.Stack {
       enableAcceptEncodingBrotli: true,
     });
 
-    // Response headers policy for security
+    // Response headers policy for security.
+    //
+    // When artifacts are enabled, allow the SPA to embed iframes from the
+    // artifact origin. `frame-src` is the only CSP directive we set.
+    // Without `default-src`, the browser does not restrict any other
+    // resource types, so this addition constrains iframe embedding only
+    // (scripts, styles, fonts, etc. remain unrestricted by CSP). The
+    // existing X-Content-Type-Options, Referrer-Policy, and HSTS headers
+    // below continue to apply unchanged.
+    const artifactsOrigin = config.artifacts.enabled && config.domainName
+      ? `https://artifacts.${config.domainName}`
+      : undefined;
+
     const responseHeadersPolicy = new cloudfront.ResponseHeadersPolicy(
       this,
       'FrontendResponseHeadersPolicy',
@@ -165,6 +177,14 @@ export class FrontendStack extends cdk.Stack {
             modeBlock: true,
             override: true,
           },
+          ...(artifactsOrigin
+            ? {
+                contentSecurityPolicy: {
+                  contentSecurityPolicy: `frame-src 'self' ${artifactsOrigin}`,
+                  override: true,
+                },
+              }
+            : {}),
         },
       }
     );
diff --git a/infrastructure/lib/inference-api-stack.ts b/infrastructure/lib/inference-api-stack.ts
index 674a4be5..a45d59fa 100644
--- a/infrastructure/lib/inference-api-stack.ts
+++ b/infrastructure/lib/inference-api-stack.ts
@@ -386,9 +386,27 @@ export class InferenceApiStack extends cdk.Stack {
       ],
     }));
 
-    // S3 Assistants Documents Bucket permissions - NOT NEEDED by inference API
-    // Documents are only accessed during ingestion (Lambda function)
-    // Inference API only queries the vector store, not the raw documents
+    // S3 Assistants Documents Bucket permissions (READ-ONLY).
+    // The agent's spreadsheet_analysis tool downloads tabular KB files
+    // (CSV/XLSX) from this bucket to push into the Code Interpreter sandbox
+    // for analysis. Ingestion still happens via a separate Lambda; the
+    // runtime only needs read access.
+    const assistantsDocumentsBucketArn = ssm.StringParameter.valueForStringParameter(
+      this,
+      `/${config.projectPrefix}/rag/documents-bucket-arn`
+    );
+
+    runtimeExecutionRole.addToPolicy(new iam.PolicyStatement({
+      sid: 'AssistantsDocumentsBucketRead',
+      effect: iam.Effect.ALLOW,
+      actions: [
+        's3:GetObject',
+        's3:GetObjectVersion',
+      ],
+      resources: [
+        `${assistantsDocumentsBucketArn}/*`,
+      ],
+    }));
 
     // DynamoDB User Files Table permissions (imported from Infrastructure Stack)
     const userFilesTableArn = ssm.StringParameter.valueForStringParameter(
@@ -429,6 +447,52 @@ export class InferenceApiStack extends cdk.Stack {
       ],
     }));
 
+    // Artifacts: the agent's create_artifact / update_artifact tools write
+    // new versions to the artifact bucket and append rows to the artifacts
+    // DDB table. Read-back is handled by app-api (for listings) and the
+    // render Lambda (for iframe rendering), not by the agent runtime.
+    //
+    // Gated on `config.artifacts.enabled`: if the artifacts stack isn't
+    // deployed, we must not reference SSM parameters that don't exist. Synth
+    // would still pass, but CloudFormation would fail to resolve them at deploy.
+    if (config.artifacts.enabled) {
+      const artifactsBucketArn = ssm.StringParameter.valueForStringParameter(
+        this,
+        `/${config.projectPrefix}/artifacts/bucket-arn`
+      );
+      const artifactsTableArn = ssm.StringParameter.valueForStringParameter(
+        this,
+        `/${config.projectPrefix}/artifacts/table-arn`
+      );
+
+      runtimeExecutionRole.addToPolicy(new iam.PolicyStatement({
+        sid: 'ArtifactsBucketWrite',
+        effect: iam.Effect.ALLOW,
+        actions: [
+          's3:PutObject',
+          's3:PutObjectTagging',
+          // No DeleteObject — soft-delete is implemented via tagging plus
+          // the bucket lifecycle rule (`lifecycle-class=deleted` expiry).
+        ],
+        resources: [`${artifactsBucketArn}/*`],
+      }));
+
+      runtimeExecutionRole.addToPolicy(new iam.PolicyStatement({
+        sid: 'ArtifactsTableWrite',
+        effect: iam.Effect.ALLOW,
+        actions: [
+          'dynamodb:GetItem',
+          'dynamodb:PutItem',
+          'dynamodb:UpdateItem',
+          'dynamodb:Query',
+        ],
+        resources: [
+          artifactsTableArn,
+          `${artifactsTableArn}/index/*`,
+        ],
+      }));
+    }
+
     // S3 Vectors permissions for RAG (READ-ONLY for queries)
     const assistantsVectorBucketName = ssm.StringParameter.valueForStringParameter(
       this,
@@ -813,13 +877,21 @@ export class InferenceApiStack extends cdk.Stack {
       resources: [this.memory.attrMemoryArn],
     }));
 
-    // Grant Runtime permission to use Code Interpreter
+    // Grant Runtime permission to use the Custom Code Interpreter.
+    // Action list matches AWS's documented policy for Code Interpreter access
+    // (see docs.aws.amazon.com/bedrock-agentcore/latest/devguide/
+    // code-interpreter-getting-started.html). Scoped to this stack's Custom
+    // Code Interpreter only — we don't need account-wide discovery perms.
     runtimeExecutionRole.addToPolicy(new iam.PolicyStatement({
       sid: 'CodeInterpreterAccess',
       effect: iam.Effect.ALLOW,
       actions: [
+        'bedrock-agentcore:StartCodeInterpreterSession',
         'bedrock-agentcore:InvokeCodeInterpreter',
-        'bedrock-agentcore:CreateCodeInterpreterSession',
+        'bedrock-agentcore:StopCodeInterpreterSession',
+        'bedrock-agentcore:GetCodeInterpreter',
+        'bedrock-agentcore:GetCodeInterpreterSession',
+        'bedrock-agentcore:ListCodeInterpreterSessions',
       ],
       resources: [this.codeInterpreter.attrCodeInterpreterArn],
     }));
@@ -985,6 +1057,15 @@ export class InferenceApiStack extends cdk.Stack {
         // S3 storage
         S3_ASSISTANTS_VECTOR_STORE_BUCKET_NAME: vectorBucketName,
         S3_ASSISTANTS_VECTOR_STORE_INDEX_NAME: vectorIndexName,
+        // Assistants KB documents bucket — needed by the agent's spreadsheet
+        // analysis tool to download files from S3 before pushing them into
+        // the Code Interpreter sandbox. Imported from RagIngestionStack via
+        // SSM (same parameter app-api uses). Without this the agent fails
+        // with "S3_ASSISTANTS_DOCUMENTS_BUCKET_NAME not configured".
+        S3_ASSISTANTS_DOCUMENTS_BUCKET_NAME: ssm.StringParameter.valueForStringParameter(
+          this,
+          `/${config.projectPrefix}/rag/documents-bucket-name`
+        ),
 
         // Authentication
         ENABLE_AUTHENTICATION: 'true',
@@ -1019,6 +1100,26 @@ export class InferenceApiStack extends cdk.Stack {
           this,
           `/${config.projectPrefix}/oauth/platform-workload-identity-name`
         ),
+
+        // MCP Apps sandbox-proxy origin (PR #7 of
+        // docs/kaizen/scoping/mcp-apps-host-renderer.md). The agent emits
+        // it on the `ui_resource` SSE event as `sandboxOrigin` — the
+        // cross-origin shell the SPA frames a hosted App in. Gated on
+        // `config.mcpSandbox.enabled` so we don't issue an SSM read against
+        // a parameter that doesn't exist when the mcp-sandbox stack isn't
+        // deployed (same conditional-SSM pattern as artifacts above; the
+        // missing parameter would fail CloudFormation resolution at deploy).
+        // Without it `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` falls back to its
+        // empty Python default and the SPA has no origin to frame an App
+        // in — the host surface stays dormant even with the flag on.
+        ...(config.mcpSandbox.enabled
+          ? {
+              AGENTCORE_MCP_APPS_SANDBOX_ORIGIN: ssm.StringParameter.valueForStringParameter(
+                this,
+                `/${config.projectPrefix}/mcp-sandbox/origin`
+              ),
+            }
+          : {}),
       },
     });
     this.runtime.node.addDependency(runtimeExecutionRole);
diff --git a/infrastructure/lib/infrastructure-stack.ts b/infrastructure/lib/infrastructure-stack.ts
index 05accdba..2c297a9c 100644
--- a/infrastructure/lib/infrastructure-stack.ts
+++ b/infrastructure/lib/infrastructure-stack.ts
@@ -6,6 +6,7 @@ import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
 import * as ec2 from 'aws-cdk-lib/aws-ec2';
 import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
 import * as ecs from 'aws-cdk-lib/aws-ecs';
+import * as iam from 'aws-cdk-lib/aws-iam';
 import * as kms from 'aws-cdk-lib/aws-kms';
 import * as route53 from 'aws-cdk-lib/aws-route53';
 import * as route53Targets from 'aws-cdk-lib/aws-route53-targets';
@@ -692,6 +693,96 @@ export class InfrastructureStack extends cdk.Stack {
       tier: ssm.ParameterTier.STANDARD,
     });
 
+    // High-entropy random secret used to derive the AES-256 cookie-sealing
+    // key. Generated once at stack create by Secrets Manager itself
+    // (`generateSecretString`), so every app-api task — across desiredCount > 1
+    // and across rolling deploys where two task revisions briefly coexist —
+    // derives the same plaintext key. Without this, each task's CookieCodec
+    // singleton would mint its own random AES key and any cookie sealed by
+    // Task A would fail `bad seal` on Task B (a 401 storm under the
+    // desiredCount: 2 deployment shape).
+    //
+    // The runtime hashes the secret string with SHA-256 to produce the
+    // 32-byte AES-256 key — a single-shot KDF that's secure when the input
+    // already has ≥256 bits of entropy (a 44-char alphanumeric secret has
+    // ~261 bits). This avoids the AwsCustomResource binary-payload
+    // serialization bug that broke the prior KMS-wrap bootstrap design
+    // (the framework Lambda JSON-stringifies Uint8Array as `{"0":233,...}`,
+    // which exceeded the 4 KB CloudFormation response limit).
+    //
+    // The secret is encrypted at rest with `bffCookieSigningKey`; access
+    // requires both `secretsmanager:GetSecretValue` on this secret AND
+    // `kms:Decrypt` on the CMK (Secrets Manager invokes Decrypt on the
+    // caller's behalf using the secret-ARN encryption context).
+    const bffCookieDataKeySecret = new secretsmanager.Secret(this, "BFFCookieDataKeySecret", {
+      secretName: getResourceName(config, "bff-cookie-data-key"),
+      description:
+        "High-entropy random secret used to derive the AES-256 BFF cookie " +
+        "sealing key (SHA-256). Generated once at deploy time; rotation " +
+        "invalidates active cookies (no kid versioning yet).",
+      encryptionKey: bffCookieSigningKey,
+      generateSecretString: {
+        // 44 chars from the 62-char alphanumeric alphabet ≈ 261 bits of
+        // entropy — comfortably above the 256-bit AES-256 target after
+        // SHA-256 derivation. Punctuation/space excluded so the value
+        // round-trips cleanly through env-var-style consumers.
+        passwordLength: 44,
+        excludePunctuation: true,
+        includeSpace: false,
+      },
+      removalPolicy: getRemovalPolicy(config),
+    });
+
+    new ssm.StringParameter(this, "BFFCookieDataKeySecretArnParameter", {
+      parameterName: `/${config.projectPrefix}/auth/bff-cookie-data-key-secret-arn`,
+      stringValue: bffCookieDataKeySecret.secretArn,
+      description: "Secrets Manager ARN for the BFF cookie data-key secret",
+      tier: ssm.ParameterTier.STANDARD,
+    });
+
+    // ============================================================
+    // Artifact Render Token Signing Key
+    // ============================================================
+    // HMAC-SHA256 key shared between app-api (minter) and the artifact
+    // render Lambda (verifier). The app-api hands the SPA a short-lived
+    // JWT scoped to one (artifact_id, version); the SPA embeds it as
+    // ?t=... on the iframe src; the render Lambda validates the JWT,
+    // fetches content from S3/DDB, and returns HTML with a strict CSP.
+    //
+    // Lives here (not in ArtifactsStack) so app-api and the render Lambda
+    // both read it symmetrically from a foundation neither owns — which
+    // keeps ArtifactsStack a leaf producer with no back-edges into
+    // InfrastructureStack. If this moved into ArtifactsStack, app-api
+    // would gain a deploy-order dependency on ArtifactsStack.
+    //
+    // Provisioned conditionally, only when `config.artifacts.enabled`; if
+    // artifacts is off, the secret is not created and no SSM parameter
+    // is published (consumers gate their lookups on the same flag).
+    if (config.artifacts.enabled) {
+      const artifactRenderTokenSecret = new secretsmanager.Secret(this, "ArtifactRenderTokenSecret", {
+        secretName: getResourceName(config, "artifact-render-token-key"),
+        description:
+          "HMAC-SHA256 key for signing artifact iframe render tokens. " +
+          "Used by app-api to mint short-lived JWTs and by the artifact " +
+          "render Lambda to verify them.",
+        generateSecretString: {
+          // 44 chars from the 62-char alphanumeric alphabet ≈ 261 bits of
+          // entropy — same shape as the BFF cookie data key above.
+          passwordLength: 44,
+          excludePunctuation: true,
+          includeSpace: false,
+        },
+        removalPolicy: getRemovalPolicy(config),
+      });
+
+      new ssm.StringParameter(this, "ArtifactRenderTokenSecretArnParameter", {
+        parameterName: `/${config.projectPrefix}/artifacts/render-token-key-arn`,
+        stringValue: artifactRenderTokenSecret.secretArn,
+        description: "Secrets Manager ARN for the artifact render token signing key",
+        tier: ssm.ParameterTier.STANDARD,
+      });
+    }
+
     // OAuth Providers Table - Admin-configured OAuth provider settings
     const oauthProvidersTable = new dynamodb.Table(this, "OAuthProvidersTable", {
       tableName: getResourceName(config, "oauth-providers"),
@@ -1161,6 +1252,35 @@ export class InfrastructureStack extends cdk.Stack {
       tier: ssm.ParameterTier.STANDARD,
     });
 
+    // ============================================================
+    // User Menu Links Table
+    // Admin-managed links rendered in the SPA user menu.
+    // Fixed PK ``USER_MENU_LINKS``, SK ``LINK#<uuid>``.
+    // ============================================================
+    const userMenuLinksTable = new dynamodb.Table(this, "UserMenuLinksTable", {
+      tableName: getResourceName(config, "user-menu-links"),
+      partitionKey: { name: "PK", type: dynamodb.AttributeType.STRING },
+      sortKey: { name: "SK", type: dynamodb.AttributeType.STRING },
+      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
+      pointInTimeRecovery: true,
+      removalPolicy: getRemovalPolicy(config),
+      encryption: dynamodb.TableEncryption.AWS_MANAGED,
+    });
+
+    new ssm.StringParameter(this, "UserMenuLinksTableNameParameter", {
+      parameterName: `/${config.projectPrefix}/admin/user-menu-links-table-name`,
+      stringValue: userMenuLinksTable.tableName,
+      description: "User-menu links DynamoDB table name",
+      tier: ssm.ParameterTier.STANDARD,
+    });
+
+    new ssm.StringParameter(this, "UserMenuLinksTableArnParameter", {
+      parameterName: `/${config.projectPrefix}/admin/user-menu-links-table-arn`,
+      stringValue: userMenuLinksTable.tableArn,
+      description: "User-menu links DynamoDB table ARN",
+      tier: ssm.ParameterTier.STANDARD,
+    });
+
     // AuthProviders Table - OIDC authentication provider configuration
     const authProvidersTable = new dynamodb.Table(this, "AuthProvidersTable", {
       tableName: getResourceName(config, "auth-providers"),
diff --git a/infrastructure/lib/mcp-sandbox-stack.ts b/infrastructure/lib/mcp-sandbox-stack.ts
new file mode 100644
index 00000000..18912b9d
--- /dev/null
+++ b/infrastructure/lib/mcp-sandbox-stack.ts
@@ -0,0 +1,376 @@
+import * as cdk from 'aws-cdk-lib';
+import * as acm from 'aws-cdk-lib/aws-certificatemanager';
+import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
+import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';
+import * as route53 from 'aws-cdk-lib/aws-route53';
+import * as route53Targets from 'aws-cdk-lib/aws-route53-targets';
+import * as s3 from 'aws-cdk-lib/aws-s3';
+import * as s3deploy from 'aws-cdk-lib/aws-s3-deployment';
+import * as ssm from 'aws-cdk-lib/aws-ssm';
+import { Construct } from 'constructs';
+import * as fs from 'fs';
+import * as path from 'path';
+import {
+  AppConfig,
+  applyStandardTags,
+  getAutoDeleteObjects,
+  getRemovalPolicy,
+  getResourceName,
+} from './config';
+
+/**
+ * The full JS string literal in `assets/mcp-sandbox/csp-function.js` that
+ * we substitute at synth time. Matching the *quoted* literal (not the
+ * inner identifier) lets us replace it with `JSON.stringify(value)`,
+ * which handles quote-escaping correctly for any `frame-ancestors` source
+ * list — including `'none'` (which would otherwise produce `''none''`,
+ * a JS syntax error).
+ *
+ * The CloudFront Function unit tests don't depend on the substitution: they
+ * pass `frameAncestors` directly to `buildCspHeader`, and the placeholder is
+ * a valid string literal, so the unsubstituted file loads via `require()`.
+ */
+const FRAME_ANCESTORS_PLACEHOLDER_LITERAL = "'__INJECT_FRAME_ANCESTORS__'";
+
+/**
+ * Load the dynamic-CSP CloudFront Function source and inject the real
+ * `frame-ancestors` source list as a properly-escaped JS string literal.
+ * Asserts the placeholder is present exactly once so a future refactor
+ * that loses the marker fails loudly at synth, not at edge runtime.
+ *
+ * Exported for unit testing.
+ */
+export function loadMcpSandboxCspFunctionCode(frameAncestors: string): string {
+  const filePath = path.resolve(
+    __dirname,
+    '..',
+    'assets',
+    'mcp-sandbox',
+    'csp-function.js',
+  );
+  const source = fs.readFileSync(filePath, 'utf8');
+  const occurrences = source.split(FRAME_ANCESTORS_PLACEHOLDER_LITERAL).length - 1;
+  if (occurrences !== 1) {
+    throw new Error(
+      `Expected exactly one occurrence of ${FRAME_ANCESTORS_PLACEHOLDER_LITERAL} in csp-function.js, found ${occurrences}. Did the marker get renamed or duplicated?`,
+    );
+  }
+  return source.replace(FRAME_ANCESTORS_PLACEHOLDER_LITERAL, JSON.stringify(frameAncestors));
+}
+
+export interface McpSandboxStackProps extends cdk.StackProps {
+  config: AppConfig;
+}
+
+/**
+ * The subdomain label for the MCP Apps sandbox-proxy origin.
+ *
+ * Decision (the scoping doc explicitly leaves this "TBD in PR #1"): use
+ * `mcp-sandbox`, matching the working name in
+ * docs/kaizen/scoping/mcp-apps-host-renderer.md and paralleling the existing
+ * sibling iframe origin `artifacts.{domain}`. This single constant is the
+ * source of truth — it must stay in sync with the CDK_MCP_SANDBOX_* workflow
+ * env vars and the cors-deployment skill notes.
+ */
+export const MCP_SANDBOX_SUBDOMAIN_LABEL = 'mcp-sandbox';
+
+/**
+ * Build the CSP `frame-ancestors` source list for the proxy origin.
+ *
+ * The proxy may ONLY be embedded by the SPA (`ai.client`) origin, which is
+ * `https://{domainName}` plus any explicitly-allowed extras (e.g.
+ * http://localhost:4200 for a local SPA pointed at this env). This is the
+ * security-critical control for PR #1.
+ *
+ * Falls back to `'none'` (deny all framing) when there is no SPA origin to
+ * permit — keeps the stack synthesizable for unit/synth tests and
+ * domain-less local stacks without ever silently allowing `*`.
+ *
+ * Exported so the value is unit-testable directly and the stack body has a
+ * single source of truth.
+ */
+export function buildMcpSandboxFrameAncestors(
+  domainName: string | undefined,
+  extraFrameAncestors: string[],
+): string {
+  const sources: string[] = [];
+  if (domainName) {
+    sources.push(`https://${domainName}`);
+  }
+  for (const extra of extraFrameAncestors) {
+    const trimmed = extra.trim();
+    if (trimmed) {
+      sources.push(trimmed);
+    }
+  }
+  return sources.length > 0 ? sources.join(' ') : `'none'`;
+}
+
+/**
+ * MCP Apps host renderer — Sandbox Proxy origin.
+ *
+ * PR #1 of docs/kaizen/scoping/mcp-apps-host-renderer.md.
+ *
+ * Provisions a dedicated cross-origin static origin that serves a single
+ * shell, `proxy.html`, at `mcp-sandbox.{domainName}`:
+ *
+ *   - S3 bucket (private, OAC-only) holding proxy.html + proxy.js
+ *   - BucketDeployment that bakes those assets in at deploy time (the stack
+ *     is self-contained — no separate asset-sync step)
+ *   - CloudFront distribution terminating TLS and stamping the CSP
+ *     (`frame-ancestors` locked to the SPA origin)
+ *   - Route53 A record for the subdomain (when a custom domain + cert are
+ *     configured)
+ *
+ * The proxy is the OUTER half of the spec's Sandbox Proxy pattern; it itself
+ * creates the inner content iframe via `srcdoc` (see assets/mcp-sandbox/).
+ *
+ * INERT BY DESIGN: this stack writes exactly one SSM export
+ * (`/{prefix}/mcp-sandbox/origin`) that nothing consumes until the frontend
+ * `<mcp-app-frame>` lands (PR #4) and the whole host renderer stays gated
+ * behind `MCP_APPS_HOST_ENABLED` until PR #7. Deploying it changes nothing
+ * user-facing.
+ *
+ * Cross-stack contract: reads NOTHING from other stacks (cert ARN comes from
+ * config, the hosted zone via Route53 lookup). One-way SSM publish only —
+ * deploy tier 1, parallel-safe with Artifacts / RAG / Gateway / Fine-Tuning.
+ */
+export class McpSandboxStack extends cdk.Stack {
+  public readonly bucket: s3.Bucket;
+  public readonly distribution: cloudfront.Distribution;
+  public readonly proxyOrigin: string;
+
+  constructor(scope: Construct, id: string, props: McpSandboxStackProps) {
+    super(scope, id, props);
+
+    const { config } = props;
+    applyStandardTags(this, config);
+
+    // Custom domain + cert + Route53 are attached only when BOTH a domain and
+    // an ACM cert are configured (config.ts enforces both, plus the hosted
+    // zone, whenever the stack is *enabled*). Keeping it conditional — the
+    // FrontendStack pattern — lets the stack still synthesize on the
+    // CloudFront default domain for unit/synth tests and domain-less local
+    // stacks, while a real deploy always has the full custom-domain path.
+    const domainName = config.domainName;
+    const certificateArn = config.mcpSandbox.certificateArn;
+    const useCustomDomain = Boolean(domainName && certificateArn);
+    const proxySubdomain = domainName
+      ? `${MCP_SANDBOX_SUBDOMAIN_LABEL}.${domainName}`
+      : undefined;
+
+    // The SPA (ai.client) origin is the ONLY origin allowed to embed the
+    // proxy. Derived from the same domainName the CORS model is centred on
+    // (cors-deployment skill) so the framing allowlist and the CORS
+    // allowlist can never drift apart.
+    const frameAncestors = buildMcpSandboxFrameAncestors(
+      domainName,
+      config.mcpSandbox.extraFrameAncestors,
+    );
+
+    // ============================================================
+    // S3 — holds the static proxy shell (private, OAC-only)
+    // ============================================================
+    // No public access, no website hosting, no CORS: the shell is loaded
+    // only by being framed (an HTML document navigation), never via XHR.
+    // Content is fully reproducible from source, so removal policy follows
+    // the standard retain/destroy helper like every other bucket.
+    this.bucket = new s3.Bucket(this, 'McpSandboxBucket', {
+      bucketName: getResourceName(config, 'mcp-sandbox', config.awsAccount),
+      encryption: s3.BucketEncryption.S3_MANAGED,
+      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
+      enforceSSL: true,
+      removalPolicy: getRemovalPolicy(config),
+      autoDeleteObjects: getAutoDeleteObjects(config),
+    });
+
+    // ============================================================
+    // CloudFront — terminates TLS, runs the dynamic-CSP function
+    // ============================================================
+    // The CSP itself is composed PER-REQUEST by a CloudFront Function on
+    // viewer-response (see `assets/mcp-sandbox/csp-function.js`). The
+    // function reads the `?csp=` query param the SPA appends when framing
+    // proxy.html — the JSON shape matches the spec's `McpUiResourceCsp`
+    // (`_meta.ui.csp`) and the function honors declared
+    // `connectDomains`/`resourceDomains`/`frameDomains`/`baseUriDomains`.
+    // Apps that omit `_meta.ui.csp` fall through to the same reference-
+    // matching default the previous static CSP shipped, so the 22/25
+    // example servers that worked before continue to work.
+    //
+    // The ResponseHeadersPolicy is left intentionally CSP-less. Doing
+    // both (static via RHP + dynamic via function) would mean every
+    // response carries two `Content-Security-Policy` headers; browsers
+    // intersect them, which would silently re-deny anything an App
+    // legitimately declared. The CloudFront Function is the single source of
+    // truth for the CSP directive; other security headers (HSTS,
+    // Referrer-Policy, X-Content-Type-Options) remain on the RHP since they
+    // don't vary per resource.
+    const responseHeadersPolicy = new cloudfront.ResponseHeadersPolicy(
+      this,
+      'McpSandboxResponseHeaders',
+      {
+        responseHeadersPolicyName: getResourceName(config, 'mcp-sandbox-headers'),
+        // AWS caps ResponseHeadersPolicy `Comment` at 128 chars (same as
+        // CloudFront::Function). Full rationale lives in the code comments
+        // above; this field is just the AWS-visible label.
+        comment: 'HSTS + Referrer-Policy + X-Content-Type-Options for MCP Apps sandbox proxy. CSP via dynamic CloudFront Function.',
+        securityHeadersBehavior: {
+          contentTypeOptions: { override: true },
+          // Intentionally NOT setting frameOptions — `frame-ancestors`
+          // in the dynamic CSP is the modern equivalent and the control
+          // we care about. Setting X-Frame-Options too would only add a
+          // legacy, less expressive duplicate.
+          referrerPolicy: {
+            referrerPolicy: cloudfront.HeadersReferrerPolicy.NO_REFERRER,
+            override: true,
+          },
+          strictTransportSecurity: {
+            accessControlMaxAge: cdk.Duration.days(365),
+            includeSubdomains: true,
+            override: true,
+          },
+        },
+      },
+    );
+
+    // Dynamic-CSP CloudFront Function. The source ships the FRAME_ANCESTORS
+    // placeholder string; we replace it with the real source list here so
+    // unit tests can run the file as-is. The JS_2_0 runtime is selected
+    // because the function source targets the cloudfront-js-2.0 feature set,
+    // which the legacy JS_1_0 runtime does not fully support.
+    const cspFunctionCode = loadMcpSandboxCspFunctionCode(frameAncestors);
+    // AWS caps CloudFront Function `Comment` at 128 chars — design rationale
+    // lives in the docstring above and the scoping doc, not here.
+    const cspFunction = new cloudfront.Function(this, 'McpSandboxCspFunction', {
+      functionName: getResourceName(config, 'mcp-sandbox-csp'),
+      comment: 'Composes per-resource CSP header from ?csp= query (mirrors ext-apps basic-host/serve.ts).',
+      runtime: cloudfront.FunctionRuntime.JS_2_0,
+      code: cloudfront.FunctionCode.fromInline(cspFunctionCode),
+    });
+
+    // Static shell: a short cache is fine and BucketDeployment invalidates on
+    // every deploy so shell changes propagate immediately. No cookies / query
+    // / headers participate in the cache key.
+    const cachePolicy = new cloudfront.CachePolicy(this, 'McpSandboxCachePolicy', {
+      cachePolicyName: getResourceName(config, 'mcp-sandbox-cache'),
+      comment: 'Cache policy for the MCP Apps sandbox proxy shell',
+      defaultTtl: cdk.Duration.minutes(5),
+      minTtl: cdk.Duration.seconds(0),
+      maxTtl: cdk.Duration.hours(1),
+      cookieBehavior: cloudfront.CacheCookieBehavior.none(),
+      headerBehavior: cloudfront.CacheHeaderBehavior.none(),
+      queryStringBehavior: cloudfront.CacheQueryStringBehavior.none(),
+      enableAcceptEncodingGzip: true,
+      enableAcceptEncodingBrotli: true,
+    });
+
+    const distributionProps: cloudfront.DistributionProps = {
+      comment: getResourceName(config, 'mcp-sandbox-cdn'),
+      defaultRootObject: 'proxy.html',
+      minimumProtocolVersion: cloudfront.SecurityPolicyProtocol.TLS_V1_2_2021,
+      defaultBehavior: {
+        origin: origins.S3BucketOrigin.withOriginAccessControl(this.bucket),
+        viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
+        cachePolicy,
+        responseHeadersPolicy,
+        allowedMethods: cloudfront.AllowedMethods.ALLOW_GET_HEAD,
+        cachedMethods: cloudfront.CachedMethods.CACHE_GET_HEAD,
+        compress: true,
+        // The CSP function runs on viewer-response — including cache hits —
+        // so the CSP is composed fresh from the request's `?csp=` query
+        // every time. That lets us keep the cache key simple (no `?csp=`
+        // included) while still emitting per-resource CSPs: one cached
+        // body, dynamic header.
+        functionAssociations: [
+          {
+            function: cspFunction,
+            eventType: cloudfront.FunctionEventType.VIEWER_RESPONSE,
+          },
+        ],
+      },
+      // Cheapest price class — the shell is tiny and not latency-critical.
+      priceClass: cloudfront.PriceClass.PRICE_CLASS_100,
+      httpVersion: cloudfront.HttpVersion.HTTP2_AND_3,
+      enabled: true,
+      ...(useCustomDomain
+        ? {
+            domainNames: [proxySubdomain!],
+            certificate: acm.Certificate.fromCertificateArn(
+              this,
+              'McpSandboxCertificate',
+              certificateArn!,
+            ),
+          }
+        : {}),
+    };
+
+    this.distribution = new cloudfront.Distribution(
+      this,
+      'McpSandboxDistribution',
+      distributionProps,
+    );
+
+    // ============================================================
+    // Deploy the static shell into the bucket (self-contained)
+    // ============================================================
+    // Source.asset of plain files needs no Docker — CDK zips locally and the
+    // aws-cdk-lib BucketDeployment Lambda uploads it. distribution +
+    // distributionPaths wires an automatic CloudFront invalidation so a
+    // re-deployed shell is served immediately despite the cache policy.
+    new s3deploy.BucketDeployment(this, 'McpSandboxShellDeployment', {
+      sources: [
+        s3deploy.Source.asset(path.resolve(__dirname, '..', 'assets', 'mcp-sandbox')),
+      ],
+      destinationBucket: this.bucket,
+      distribution: this.distribution,
+      distributionPaths: ['/*'],
+      prune: true,
+    });
+
+    // ============================================================
+    // Route53 — alias the subdomain to CloudFront (custom domain only)
+    // ============================================================
+    if (useCustomDomain) {
+      const hostedZone = route53.HostedZone.fromLookup(this, 'HostedZone', {
+        domainName: config.infrastructureHostedZoneDomain!,
+      });
+
+      new route53.ARecord(this, 'McpSandboxAliasRecord', {
+        zone: hostedZone,
+        recordName: proxySubdomain!,
+        target: route53.RecordTarget.fromAlias(
+          new route53Targets.CloudFrontTarget(this.distribution),
+        ),
+        comment: 'MCP Apps sandbox-proxy origin — proxies to CloudFront → S3 shell',
+      });
+    }
+
+    // ============================================================
+    // SSM export: the outward contract (one-way; this stack never reads it back)
+    // ============================================================
+    this.proxyOrigin = useCustomDomain
+      ? `https://${proxySubdomain}`
+      : `https://${this.distribution.distributionDomainName}`;
+
+    new ssm.StringParameter(this, 'McpSandboxOriginParameter', {
+      parameterName: `/${config.projectPrefix}/mcp-sandbox/origin`,
+      stringValue: this.proxyOrigin,
+      description: 'Origin serving the MCP Apps sandbox proxy shell (https://mcp-sandbox.{domain})',
+      tier: ssm.ParameterTier.STANDARD,
+    });
+
+    // Human-readable CloudFormation outputs for deploy-time visibility.
+    new cdk.CfnOutput(this, 'McpSandboxOrigin', {
+      value: this.proxyOrigin,
+      description: 'MCP Apps sandbox-proxy origin URL',
+    });
+    new cdk.CfnOutput(this, 'McpSandboxProxyUrl', {
+      value: `${this.proxyOrigin}/proxy.html`,
+      description: 'Fully-qualified URL of the sandbox proxy shell',
+    });
+    new cdk.CfnOutput(this, 'McpSandboxDistributionId', {
+      value: this.distribution.distributionId,
+      description: 'CloudFront distribution ID for the sandbox-proxy origin',
+    });
+  }
+}
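The stack comments above describe a viewer-response function that composes the CSP per request from the `?csp=` query while the cached body stays shared. The following is a minimal sketch of that flow, not the shipped `csp-function.js`: the names `sanitizeDomains`, `composeCsp`, and the hard-coded `FRAME_ANCESTORS` value are hypothetical, the directive set is deliberately reduced, and URL-decoding the query value is an assumption about how the SPA encodes it.

```typescript
// Minimal shapes for the parts of a CloudFront Functions event used here.
interface CfValue { value: string }
interface ViewerResponseEvent {
  request: { querystring: Record<string, CfValue | undefined> };
  response: { headers: Record<string, CfValue> };
}

// Hypothetical value; the real stack injects the list at synth time.
const FRAME_ANCESTORS = 'https://alpha.example.com';

// Keep only non-empty strings free of characters that could break out of
// a directive (semicolons, whitespace, quotes).
function sanitizeDomains(domains: unknown): string[] {
  if (!Array.isArray(domains)) return [];
  return domains.filter(
    (d): d is string => typeof d === 'string' && d.length > 0 && !/[;\s'"]/.test(d),
  );
}

// Reduced directive set for illustration; the real default is broader.
function composeCsp(declared: { connectDomains?: unknown } | null): string {
  const connect = ["'self'", ...sanitizeDomains(declared?.connectDomains)];
  return [
    "default-src 'self'",
    `connect-src ${connect.join(' ')}`,
    `frame-ancestors ${FRAME_ANCESTORS}`,
  ].join('; ');
}

function handler(event: ViewerResponseEvent): ViewerResponseEvent['response'] {
  let declared: { connectDomains?: unknown } | null = null;
  const raw = event.request.querystring['csp']?.value;
  if (raw) {
    try {
      // Assumption: the SPA URL-encodes a JSON object into ?csp=.
      declared = JSON.parse(decodeURIComponent(raw));
    } catch {
      declared = null; // malformed input degrades to the default policy
    }
  }
  event.response.headers['content-security-policy'] = { value: composeCsp(declared) };
  return event.response;
}
```

Because the header is written on the response rather than derived from the cache key, two requests with different `?csp=` values can share one cached `proxy.html` body and still receive different policies.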
diff --git a/infrastructure/package-lock.json b/infrastructure/package-lock.json
index a58d3552..89bed4e6 100644
--- a/infrastructure/package-lock.json
+++ b/infrastructure/package-lock.json
@@ -1,12 +1,12 @@
 {
   "name": "infrastructure",
-  "version": "1.0.0-beta.24",
+  "version": "1.0.0-beta.28",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "infrastructure",
-      "version": "1.0.0-beta.24",
+      "version": "1.0.0-beta.28",
       "dependencies": {
         "aws-cdk-lib": "2.251.0",
         "constructs": "10.6.0"
diff --git a/infrastructure/package.json b/infrastructure/package.json
index f7e425c0..340eed0d 100644
--- a/infrastructure/package.json
+++ b/infrastructure/package.json
@@ -1,6 +1,6 @@
 {
   "name": "infrastructure",
-  "version": "1.0.0-beta.24",
+  "version": "1.0.0-beta.28",
   "bin": {
     "infrastructure": "bin/infrastructure.js"
   },
diff --git a/infrastructure/test/app-api-stack.test.ts b/infrastructure/test/app-api-stack.test.ts
index 883b756c..ba647a86 100644
--- a/infrastructure/test/app-api-stack.test.ts
+++ b/infrastructure/test/app-api-stack.test.ts
@@ -3,6 +3,28 @@ import { Template, Match } from 'aws-cdk-lib/assertions';
 import { AppApiStack } from '../lib/app-api-stack';
 import { createMockConfig, createMockApp, mockEnv } from './helpers/mock-config';
 
+/**
+ * CDK splits a task role's policy across an inline AWS::IAM::Policy and one
+ * or more AWS::IAM::ManagedPolicy "overflow" attachments once the IAM inline
+ * policy size limit is exceeded. Match.arrayWith on a single resource type
+ * misses statements that landed in the overflow. Walk both types and return
+ * every statement carrying the requested Sid.
+ */
+function _findStatementsBySid(template: Template, sid: string): any[] {
+  const found: any[] = [];
+  for (const resourceType of ['AWS::IAM::Policy', 'AWS::IAM::ManagedPolicy']) {
+    const policies = template.findResources(resourceType);
+    for (const policy of Object.values(policies)) {
+      const stmts =
+        (policy as any).Properties?.PolicyDocument?.Statement || [];
+      for (const stmt of stmts) {
+        if (stmt.Sid === sid) found.push(stmt);
+      }
+    }
+  }
+  return found;
+}
+
 describe('AppApiStack', () => {
   let template: Template;
   let config: ReturnType<typeof createMockConfig>;
@@ -52,6 +74,26 @@ describe('AppApiStack', () => {
         DesiredCount: config.appApi.desiredCount,
       });
     });
+
+    test('production context sets DesiredCount to 2 for event-loop blocking mitigation', () => {
+      // The production cdk.context.json sets appApi.desiredCount=2 so that a
+      // single blocked event loop on one ECS task can no longer halt all
+      // ingress (see .kiro/specs/bff-middleware-event-loop-blocking). This test
+      // injects that value via the mock config and pins the stack's pass-through.
+      const productionConfig = createMockConfig({
+        production: true,
+        appApi: { ...config.appApi, desiredCount: 2 },
+      });
+      const app = createMockApp(productionConfig, ['AppApiStack']);
+      const stack = new AppApiStack(app, 'ProdAppApiStack', {
+        config: productionConfig,
+        env: mockEnv(productionConfig),
+      });
+      const prodTemplate = Template.fromStack(stack);
+      prodTemplate.hasResourceProperties('AWS::ECS::Service', {
+        DesiredCount: 2,
+      });
+    });
   });
 
   // ============================================================
@@ -236,6 +278,55 @@ describe('AppApiStack', () => {
         }),
       });
     });
+
+    test('BFFCookieSigningKey grant is Decrypt-only — kms:GenerateDataKey is NOT granted to the runtime', () => {
+      // Least privilege: BFFCookieDataKeySecret is encrypted at rest with
+      // BFFCookieSigningKey, so SecretsManager invokes kms:Decrypt on the
+      // caller's behalf when app-api calls GetSecretValue. The runtime
+      // never mints a fresh key — a GenerateDataKey grant here would let a
+      // compromised task seal cookies under a parallel key.
+      const bffCookieGrants = _findStatementsBySid(template, 'BFFCookieSigningKeyAccess');
+      expect(bffCookieGrants.length).toBeGreaterThan(0);
+      for (const stmt of bffCookieGrants) {
+        const actions = Array.isArray(stmt.Action) ? stmt.Action : [stmt.Action];
+        expect(actions).not.toContain('kms:GenerateDataKey');
+        expect(actions).toContain('kms:Decrypt');
+      }
+    });
+
+    test('task role can read the BFF cookie data key secret (GetSecretValue)', () => {
+      const grants = _findStatementsBySid(template, 'BFFCookieDataKeySecretAccess');
+      expect(grants.length).toBeGreaterThan(0);
+      for (const stmt of grants) {
+        expect(stmt.Effect).toBe('Allow');
+        const actions = Array.isArray(stmt.Action) ? stmt.Action : [stmt.Action];
+        expect(actions).toContain('secretsmanager:GetSecretValue');
+        expect(actions).toContain('secretsmanager:DescribeSecret');
+      }
+    });
+  });
+
+  // ============================================================
+  // BFF Cookie Data Key — env var wiring
+  // ============================================================
+
+  describe('BFF Cookie Data Key', () => {
+    test('container env includes BFF_COOKIE_DATA_KEY_SECRET_ARN sourced from SSM', () => {
+      // Without this env var the CookieCodec falls back to "BFF disabled"
+      // and every cookie-bearing request fails to unseal (bad seal), which
+      // breaks the auth flow on every app-api task.
+      template.hasResourceProperties('AWS::ECS::TaskDefinition', {
+        ContainerDefinitions: Match.arrayWith([
+          Match.objectLike({
+            Environment: Match.arrayWith([
+              Match.objectLike({
+                Name: 'BFF_COOKIE_DATA_KEY_SECRET_ARN',
+              }),
+            ]),
+          }),
+        ]),
+      });
+    });
   });
 
   // ============================================================
diff --git a/infrastructure/test/cors.test.ts b/infrastructure/test/cors.test.ts
index 98b3fc78..af119ece 100644
--- a/infrastructure/test/cors.test.ts
+++ b/infrastructure/test/cors.test.ts
@@ -1,5 +1,6 @@
 import * as cdk from 'aws-cdk-lib';
 import { loadConfig, buildCorsOrigins, AppConfig } from '../lib/config';
+import { buildMcpSandboxFrameAncestors } from '../lib/mcp-sandbox-stack';
 import { createMockConfig } from './helpers/mock-config';
 
 /**
@@ -256,3 +257,55 @@ describe('loadConfig CORS derivation', () => {
     expect(origins).toEqual(['http://localhost:4200']);
   });
 });
+
+// ============================================================
+// MCP Apps sandbox-proxy origin allowlisting (PR #1)
+//
+// The proxy is not a CORS consumer (buildCorsOrigins) — it is an origin
+// whose CSP frame-ancestors must permit ONLY the SPA. Per the
+// cors-deployment skill, every new env var that names this origin /
+// allowlists it flows through this file: these tests pin that the proxy's
+// framing allowlist is derived from the SAME domainName the CORS model is
+// centred on, so the two allowlists can never silently drift apart.
+// ============================================================
+
+describe('MCP sandbox proxy frame-ancestors vs CORS domain', () => {
+  test('frame-ancestors matches the domain-derived SPA CORS origin', () => {
+    const config = createMockConfig({
+      corsOrigins: 'https://alpha.example.com',
+      domainName: 'alpha.example.com',
+    });
+    const corsOrigins = buildCorsOrigins(config);
+    const frameAncestors = buildMcpSandboxFrameAncestors(
+      config.domainName,
+      config.mcpSandbox.extraFrameAncestors,
+    );
+    // The single SPA origin the CORS layer allows is exactly the single
+    // origin permitted to frame the proxy.
+    expect(corsOrigins).toContain('https://alpha.example.com');
+    expect(frameAncestors).toBe('https://alpha.example.com');
+  });
+
+  test('never widens to * and denies framing when there is no SPA origin', () => {
+    const config = createMockConfig({ corsOrigins: '', domainName: undefined });
+    const frameAncestors = buildMcpSandboxFrameAncestors(
+      config.domainName,
+      config.mcpSandbox.extraFrameAncestors,
+    );
+    expect(frameAncestors).toBe("'none'");
+    expect(frameAncestors).not.toContain('*');
+  });
+
+  test('extra frame ancestors (e.g. localhost SPA) are additive, like CDK_CORS_ORIGINS', () => {
+    const config = createMockConfig({
+      corsOrigins: 'https://alpha.example.com,http://localhost:4200',
+      domainName: 'alpha.example.com',
+      mcpSandbox: { enabled: true, extraFrameAncestors: ['http://localhost:4200'] },
+    });
+    const frameAncestors = buildMcpSandboxFrameAncestors(
+      config.domainName,
+      config.mcpSandbox.extraFrameAncestors,
+    );
+    expect(frameAncestors).toBe('https://alpha.example.com http://localhost:4200');
+  });
+});
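The three tests above pin the observable contract of `buildMcpSandboxFrameAncestors`. As a reading aid, here is a minimal sketch of a function that would satisfy them. It is a hypothetical stand-in named `sketchFrameAncestors`, not the helper in `lib/mcp-sandbox-stack.ts`; the rule of skipping a bare `*` and duplicates is an assumption added for illustration.

```typescript
// Hypothetical stand-in for buildMcpSandboxFrameAncestors; the real helper
// may differ in detail.
function sketchFrameAncestors(
  domainName: string | undefined,
  extraFrameAncestors: string[] = [],
): string {
  const sources: string[] = [];
  // The SPA origin is derived from the same domainName the CORS model uses.
  if (domainName) sources.push(`https://${domainName}`);
  for (const extra of extraFrameAncestors) {
    // Assumption: skip blanks, duplicates, and a bare '*' so the list
    // never widens to "any origin".
    if (extra && extra !== '*' && !sources.includes(extra)) sources.push(extra);
  }
  // With no permitted origin at all, deny framing outright.
  return sources.length > 0 ? sources.join(' ') : "'none'";
}

console.log(sketchFrameAncestors('alpha.example.com', ['http://localhost:4200']));
```

Returning `'none'` for the empty case means a misconfigured environment fails closed instead of leaving the proxy frameable by anyone.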
diff --git a/infrastructure/test/helpers/mock-config.ts b/infrastructure/test/helpers/mock-config.ts
index 9506c97e..6622293c 100644
--- a/infrastructure/test/helpers/mock-config.ts
+++ b/infrastructure/test/helpers/mock-config.ts
@@ -77,6 +77,15 @@ export function createMockConfig(overrides: Partial<AppConfig> = {}): AppConfig
       enabled: false,
       defaultQuotaHours: 0,
     },
+    artifacts: {
+      enabled: false,
+      retentionDays: 90,
+      extraFrameAncestors: [],
+    },
+    mcpSandbox: {
+      enabled: false,
+      extraFrameAncestors: [],
+    },
     cognito: {
       domainPrefix: MOCK_PREFIX,
       passwordMinLength: 8,
@@ -113,6 +122,10 @@ const SSM_READS_BY_STACK: Record<string, string[]> = {
     'network/private-subnet-ids',
     'network/availability-zones',
   ],
+  ArtifactsStack: [
+    'artifacts/render-token-key-arn',
+  ],
+  McpSandboxStack: [],
   InferenceApiStack: [
     'inference-api/image-tag',
     'oauth/client-secrets-arn',
diff --git a/infrastructure/test/inference-api-stack.test.ts b/infrastructure/test/inference-api-stack.test.ts
index 2a9cee2a..f1bfd8db 100644
--- a/infrastructure/test/inference-api-stack.test.ts
+++ b/infrastructure/test/inference-api-stack.test.ts
@@ -160,6 +160,39 @@ describe('InferenceApiStack', () => {
         }),
       });
     });
+
+    // MCP Apps sandbox-proxy origin (PR #7 of
+    // docs/kaizen/scoping/mcp-apps-host-renderer.md). Conditional-SSM
+    // pattern: only wired when the mcp-sandbox stack is deployed, so a
+    // disabled environment never issues an SSM read against a parameter
+    // that doesn't exist.
+    test('omits AGENTCORE_MCP_APPS_SANDBOX_ORIGIN when mcp-sandbox disabled', () => {
+      // Default mock config has mcpSandbox.enabled = false.
+      template.hasResourceProperties('AWS::BedrockAgentCore::Runtime', {
+        EnvironmentVariables: Match.objectLike({
+          AGENTCORE_MCP_APPS_SANDBOX_ORIGIN: Match.absent(),
+        }),
+      });
+    });
+
+    test('wires AGENTCORE_MCP_APPS_SANDBOX_ORIGIN when mcp-sandbox enabled', () => {
+      const enabledConfig = createMockConfig({
+        mcpSandbox: { enabled: true, extraFrameAncestors: [] },
+      });
+      const app = createMockApp(enabledConfig, ['InferenceApiStack']);
+      const stack = new InferenceApiStack(app, 'SandboxEnabledStack', {
+        config: enabledConfig,
+        env: mockEnv(enabledConfig),
+      });
+      Template.fromStack(stack).hasResourceProperties(
+        'AWS::BedrockAgentCore::Runtime',
+        {
+          EnvironmentVariables: Match.objectLike({
+            AGENTCORE_MCP_APPS_SANDBOX_ORIGIN: Match.anyValue(),
+          }),
+        }
+      );
+    });
   });
 
   describe('AgentCore Memory', () => {
diff --git a/infrastructure/test/infrastructure-stack.test.ts b/infrastructure/test/infrastructure-stack.test.ts
index b1770606..bc54586b 100644
--- a/infrastructure/test/infrastructure-stack.test.ts
+++ b/infrastructure/test/infrastructure-stack.test.ts
@@ -108,10 +108,58 @@ describe('InfrastructureStack', () => {
   });
 
   // ------------------------------------------------------------------
-  // 6. All DynamoDB tables are created (count)
+  // 6. All DynamoDB tables are created — enumerated, not counted
   // ------------------------------------------------------------------
-  test('creates all 16 DynamoDB tables', () => {
-    template.resourceCountIs('AWS::DynamoDB::Table', 16);
+  test('provisions exactly the expected set of DynamoDB tables', () => {
+    // A bare resourceCountIs() number rots silently: a table gets added
+    // and the count just sits red (or worse, someone bumps the number
+    // without checking what changed). Instead, enumerate every table by
+    // the name-suffix passed to getResourceName() in
+    // infrastructure-stack.ts, with the reason it exists. When you add or
+    // remove a `new dynamodb.Table(...)`, update this map in the SAME
+    // change — the assertion below diffs the synthesized table names
+    // against this set, so an unjustified add/remove fails with the exact
+    // table name, not an opaque "expected 18 got 19".
+    const expectedTables: Record<string, string> = {
+      'oidc-state': 'OIDC login state/nonce (TTL)',
+      'bff-sessions': 'BFF httpOnly session store',
+      'voice-ticket-replay': 'voice-mode ticket replay guard (TTL)',
+      users: 'user directory + profile',
+      'app-roles': 'RBAC role definitions and mappings',
+      'api-keys': 'hashed API keys',
+      'oauth-providers': 'external OAuth provider registry',
+      'oauth-user-tokens': 'per-user external OAuth tokens (CMK)',
+      'user-quotas': 'usage quota assignments',
+      'quota-events': 'quota consumption ledger',
+      'sessions-metadata': 'conversation session metadata (TTL)',
+      'user-cost-summary': 'per-user cost rollup',
+      'system-cost-rollup': 'system-wide cost rollup',
+      'managed-models': 'admin-managed model catalog',
+      'user-settings': 'per-user UI/app settings',
+      'user-menu-links': 'admin-configured user menu links',
+      'auth-providers': 'OIDC auth provider config (stream)',
+      'user-file-uploads': 'uploaded file metadata (TTL, stream)',
+      'shared-conversations': 'publicly shared conversation snapshots',
+    };
+
+    const prefix = config.projectPrefix;
+    const expectedNames = Object.keys(expectedTables)
+      .map((suffix) => `${prefix}-${suffix}`)
+      .sort();
+
+    const tables = template.findResources('AWS::DynamoDB::Table');
+    const actualNames = Object.values(tables)
+      .map((r) => (r as any).Properties.TableName as string)
+      .sort();
+
+    // Exact set equality — the failure diff names the rogue table.
+    expect(actualNames).toEqual(expectedNames);
+    // Count is derived from the justified set, so it can no longer drift
+    // independently of the enumeration above.
+    template.resourceCountIs(
+      'AWS::DynamoDB::Table',
+      Object.keys(expectedTables).length,
+    );
   });
 
   // ------------------------------------------------------------------
@@ -268,8 +316,13 @@ describe('InfrastructureStack', () => {
   // ------------------------------------------------------------------
   // 8. Secrets Manager Secrets exist
   // ------------------------------------------------------------------
-  test('creates 3 Secrets Manager secrets', () => {
-    template.resourceCountIs('AWS::SecretsManager::Secret', 3);
+  test('creates 6 Secrets Manager secrets', () => {
+    // Today: AuthenticationSecret, VoiceTicketSigningSecret,
+    // BFFCookieDataKeySecret (high-entropy random secret for cross-task
+    // cookie seal/unseal — runtime derives the AES-256 key via SHA-256),
+    // OAuthClientSecretsSecret, AuthProviderSecretsSecret,
+    // CognitoBFFAppClientSecret.
+    template.resourceCountIs('AWS::SecretsManager::Secret', 6);
   });
 
   test('authentication secret generates a 64-char random string', () => {
@@ -448,4 +501,58 @@ describe('InfrastructureStack', () => {
       DeletionPolicy: 'Delete',
     });
   });
+
+  // ------------------------------------------------------------------
+  // BFF Cookie Data Key — shared high-entropy secret for cross-task seal/unseal
+  // ------------------------------------------------------------------
+  describe('BFF Cookie Data Key', () => {
+    test('provisions a Secrets Manager secret for the data key, encrypted with the BFFCookieSigningKey CMK', () => {
+      template.hasResource('AWS::SecretsManager::Secret', {
+        Properties: Match.objectLike({
+          Description: Match.stringLikeRegexp('AES-256 BFF cookie sealing key'),
+          KmsKeyId: Match.anyValue(),
+        }),
+      });
+    });
+
+    test('data key secret is generated by Secrets Manager (no chained custom resource)', () => {
+      // Replaces the prior chained AwsCustomResource bootstrap, which broke
+      // on first deploy because the framework Lambda JSON-stringifies KMS's
+      // Uint8Array CiphertextBlob as `{"0":233,...}`, which exceeds the 4 KB
+      // CloudFormation response-object limit. `generateSecretString` runs
+      // inside Secrets Manager itself, so the bootstrap value never crosses
+      // a CFN response payload.
+      template.hasResource('AWS::SecretsManager::Secret', {
+        Properties: Match.objectLike({
+          Description: Match.stringLikeRegexp('AES-256 BFF cookie sealing key'),
+          GenerateSecretString: Match.objectLike({
+            // 44 chars from the 62-char alphanumeric alphabet ≈ 262 bits of
+            // entropy (44 × log2 62), above the 256 bits SHA-256 can carry,
+            // so the derived 32-byte AES-256 key is full strength.
+            PasswordLength: 44,
+            ExcludePunctuation: true,
+            IncludeSpace: false,
+          }),
+        }),
+      });
+    });
+
+    test('no AwsCustomResource emits a generateDataKey or putSecretValue call', () => {
+      // Negative lock — the broken bootstrap design is gone. Reintroducing
+      // it would re-trigger the "Response object is too long" deploy
+      // failure on stack create.
+      const customResources = template.findResources('Custom::AWS');
+      for (const r of Object.values(customResources)) {
+        const text = JSON.stringify(r);
+        expect(text).not.toContain('"action":"generateDataKey"');
+        expect(text).not.toContain('"action":"putSecretValue"');
+      }
+    });
+
+    test('publishes the data key secret ARN to SSM for app-api to consume', () => {
+      template.hasResourceProperties('AWS::SSM::Parameter', {
+        Name: `/${config.projectPrefix}/auth/bff-cookie-data-key-secret-arn`,
+      });
+    });
+  });
 });
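The entropy claim in the `GenerateSecretString` test above can be checked with a few lines of arithmetic. The sketch below also shows one way a runtime could derive a 32-byte key from the secret string with SHA-256; `entropyBits` and `deriveAes256Key` are hypothetical names, and the actual derivation in the app-api runtime is not shown in this diff and may differ.

```typescript
import { createHash } from 'node:crypto';

// Entropy of a string drawn uniformly at random from an alphabet:
// length × log2(alphabet size).
function entropyBits(length: number, alphabetSize: number): number {
  return length * Math.log2(alphabetSize);
}

// Hypothetical derivation: hash the secret string down to a 32-byte key.
function deriveAes256Key(secret: string): Buffer {
  return createHash('sha256').update(secret, 'utf8').digest();
}

// 44 alphanumeric characters carry slightly under 262 bits, which is more
// than the 256 bits a SHA-256 digest can hold.
const bits = entropyBits(44, 62);

// Placeholder input only; a real secret comes from Secrets Manager.
const key = deriveAes256Key('A'.repeat(44));

console.log(Math.round(bits), key.length); // 262 32
```

Since the input entropy exceeds the digest size, the hash step is not the limiting factor for key strength under the uniform-generation assumption.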
diff --git a/infrastructure/test/mcp-sandbox-csp-function.test.ts b/infrastructure/test/mcp-sandbox-csp-function.test.ts
new file mode 100644
index 00000000..2ee3620e
--- /dev/null
+++ b/infrastructure/test/mcp-sandbox-csp-function.test.ts
@@ -0,0 +1,357 @@
+/**
+ * Unit tests for the MCP Apps sandbox dynamic-CSP CloudFront Function.
+ *
+ * The function source (`assets/mcp-sandbox/csp-function.js`) is written in
+ * the CloudFront Functions JavaScript runtime v2.0 subset, but also
+ * exports its pure helpers under `typeof module !== 'undefined'` so we
+ * can require it directly from Node. The CDK upload path substitutes the
+ * `FRAME_ANCESTORS_PLACEHOLDER` sentinel before the function runs at
+ * edge; here we pass `frameAncestors` directly to the builder.
+ *
+ * Coverage targets:
+ *   - Default CSP (no `_meta.ui.csp`) matches the ext-apps basic-host
+ *     reference's `buildCspHeader` output. This is what 22/25 reference
+ *     example servers run on.
+ *   - Declared `connectDomains` / `resourceDomains` / `frameDomains` /
+ *     `baseUriDomains` are honored on the right CSP directives.
+ *   - Sanitizer rejects every character class the reference rejects
+ *     (CSP-injection prevention — this is the security-critical bit).
+ *   - The `handler(event)` entry point degrades to default on missing /
+ *     malformed input and returns a CloudFront Functions v2.0–shaped
+ *     response object.
+ */
+
+// eslint-disable-next-line @typescript-eslint/no-var-requires
+const {
+  sanitizeCspDomains,
+  buildCspHeader,
+  parseCspParam,
+  handler,
+} = require('../assets/mcp-sandbox/csp-function');
+
+const FRAME_ANCESTORS = 'https://alpha.example.com';
+
+describe('sanitizeCspDomains', () => {
+  test('returns empty array when domains is not an array', () => {
+    expect(sanitizeCspDomains(undefined)).toEqual([]);
+    expect(sanitizeCspDomains(null)).toEqual([]);
+    expect(sanitizeCspDomains('https://example.com')).toEqual([]);
+    expect(sanitizeCspDomains({})).toEqual([]);
+  });
+
+  test('keeps well-formed origin entries unchanged', () => {
+    expect(
+      sanitizeCspDomains([
+        'https://example.com',
+        'https://*.cesium.com',
+        'https://esm.sh',
+      ]),
+    ).toEqual([
+      'https://example.com',
+      'https://*.cesium.com',
+      'https://esm.sh',
+    ]);
+  });
+
+  test('rejects semicolons (CSP directive break-out)', () => {
+    expect(
+      sanitizeCspDomains(['https://example.com', "https://evil.com; script-src *"]),
+    ).toEqual(['https://example.com']);
+  });
+
+  test('rejects newlines (header injection)', () => {
+    expect(sanitizeCspDomains(['https://evil.com\nscript-src *'])).toEqual([]);
+    expect(sanitizeCspDomains(['https://evil.com\rscript-src *'])).toEqual([]);
+  });
+
+  test("rejects quotes (smuggling CSP keywords like 'unsafe-eval')", () => {
+    expect(sanitizeCspDomains(["https://evil.com 'unsafe-eval'"])).toEqual([]);
+    expect(sanitizeCspDomains(['https://evil.com "x"'])).toEqual([]);
+  });
+
+  test('rejects spaces (multi-source smuggling within one entry)', () => {
+    expect(sanitizeCspDomains(['https://example.com https://evil.com'])).toEqual(
+      [],
+    );
+  });
+
+  test('rejects non-string entries', () => {
+    expect(sanitizeCspDomains(['https://example.com', 42, null, undefined, {}])).toEqual([
+      'https://example.com',
+    ]);
+  });
+
+  test('rejects empty strings', () => {
+    expect(sanitizeCspDomains(['', 'https://example.com', ''])).toEqual([
+      'https://example.com',
+    ]);
+  });
+});
+
+describe('buildCspHeader — default (no _meta.ui.csp)', () => {
+  const csp = buildCspHeader(null, FRAME_ANCESTORS);
+
+  test('matches the ext-apps basic-host reference default tokens', () => {
+    // These are the broader-than-spec defaults the upstream reference
+    // ships in serve.ts so bundled Apps that omit ui.csp still run.
+    // Tightening these would silently break 22/25 of the reference
+    // example servers.
+    expect(csp).toContain("default-src 'self' 'unsafe-inline'");
+    expect(csp).toContain("script-src 'self' 'unsafe-inline' 'unsafe-eval' blob: data:");
+    expect(csp).toContain("style-src 'self' 'unsafe-inline' blob: data:");
+    expect(csp).toContain("img-src 'self' data: blob:");
+    expect(csp).toContain("font-src 'self' data: blob:");
+    expect(csp).toContain("media-src 'self' data: blob:");
+    expect(csp).toContain("connect-src 'self'");
+    expect(csp).toContain("worker-src 'self' blob:");
+  });
+
+  test('locks down un-declared frame / base-uri / object / form-action', () => {
+    expect(csp).toContain("frame-src 'none'");
+    expect(csp).toContain("object-src 'none'");
+    expect(csp).toContain("base-uri 'none'");
+    expect(csp).toContain("form-action 'none'");
+  });
+
+  test('carries frame-ancestors from the synth-time injection', () => {
+    expect(csp).toContain('frame-ancestors https://alpha.example.com');
+  });
+
+  test('emits directives joined by "; " (no trailing semicolon)', () => {
+    expect(csp).toMatch(/^[^;]+(; [^;]+)+$/);
+  });
+});
+
+describe('buildCspHeader — declared domains', () => {
+  test('Excalidraw: connect+resource domains added to all asset directives', () => {
+    // From excalidraw/excalidraw-mcp/src/server.ts:
+    //   csp: { resourceDomains: ['https://esm.sh'], connectDomains: ['https://esm.sh'] }
+    const csp = buildCspHeader(
+      {
+        resourceDomains: ['https://esm.sh'],
+        connectDomains: ['https://esm.sh'],
+      },
+      FRAME_ANCESTORS,
+    );
+    expect(csp).toContain(
+      "script-src 'self' 'unsafe-inline' 'unsafe-eval' blob: data: https://esm.sh",
+    );
+    expect(csp).toContain("style-src 'self' 'unsafe-inline' blob: data: https://esm.sh");
+    expect(csp).toContain("font-src 'self' data: blob: https://esm.sh");
+    expect(csp).toContain("img-src 'self' data: blob: https://esm.sh");
+    expect(csp).toContain("media-src 'self' data: blob: https://esm.sh");
+    expect(csp).toContain("worker-src 'self' blob: https://esm.sh");
+    expect(csp).toContain('connect-src \'self\' https://esm.sh');
+  });
+
+  test('CesiumJS map-server: multiple domains on connect-src and resource-* directives', () => {
+    // From modelcontextprotocol/ext-apps/examples/map-server/server.ts
+    const csp = buildCspHeader(
+      {
+        connectDomains: [
+          'https://*.openstreetmap.org',
+          'https://cesium.com',
+          'https://*.cesium.com',
+        ],
+        resourceDomains: [
+          'https://*.openstreetmap.org',
+          'https://cesium.com',
+          'https://*.cesium.com',
+        ],
+      },
+      FRAME_ANCESTORS,
+    );
+    expect(csp).toContain(
+      'connect-src \'self\' https://*.openstreetmap.org https://cesium.com https://*.cesium.com',
+    );
+    expect(csp).toContain(
+      "script-src 'self' 'unsafe-inline' 'unsafe-eval' blob: data: https://*.openstreetmap.org https://cesium.com https://*.cesium.com",
+    );
+  });
+
+  test('frameDomains: when declared, replaces "frame-src none" — otherwise stays denied', () => {
+    const withFrames = buildCspHeader(
+      { frameDomains: ['https://youtube.com', 'https://*.youtube.com'] },
+      FRAME_ANCESTORS,
+    );
+    expect(withFrames).toContain('frame-src https://youtube.com https://*.youtube.com');
+    expect(withFrames).not.toContain("frame-src 'none'");
+
+    const withoutFrames = buildCspHeader({}, FRAME_ANCESTORS);
+    expect(withoutFrames).toContain("frame-src 'none'");
+  });
+
+  test('baseUriDomains: when declared, replaces "base-uri none"', () => {
+    const withBase = buildCspHeader(
+      { baseUriDomains: ['https://example.com'] },
+      FRAME_ANCESTORS,
+    );
+    expect(withBase).toContain('base-uri https://example.com');
+    expect(withBase).not.toContain("base-uri 'none'");
+  });
+
+  test('connectDomains alone does NOT widen resource-* directives', () => {
+    // Spec separation: connectDomains → connect-src only; resourceDomains
+    // → script/style/img/font/media/worker. An App that only declares
+    // network destinations should not get static-resource permission as
+    // a side effect.
+    const csp = buildCspHeader(
+      { connectDomains: ['https://api.example.com'] },
+      FRAME_ANCESTORS,
+    );
+    expect(csp).toContain('connect-src \'self\' https://api.example.com');
+    expect(csp).toContain("script-src 'self' 'unsafe-inline' 'unsafe-eval' blob: data:");
+    expect(csp).not.toMatch(/script-src[^;]*https:\/\/api\.example\.com/);
+  });
+
+  test('injection attempts in domain entries are silently dropped (not echoed)', () => {
+    const csp = buildCspHeader(
+      {
+        connectDomains: [
+          'https://good.com',
+          "https://evil.com; script-src *",
+          "https://evil.com 'unsafe-eval'",
+          'https://evil.com\nX-Injected: yes',
+        ],
+      },
+      FRAME_ANCESTORS,
+    );
+    expect(csp).toContain('connect-src \'self\' https://good.com');
+    expect(csp).not.toContain('evil.com');
+    expect(csp).not.toContain('X-Injected');
+    // And the directive separator structure is intact.
+    expect(csp).toMatch(/^[^;]+(; [^;]+)+$/);
+  });
+});
+
+describe('parseCspParam', () => {
+  test('null / undefined querystring → null', () => {
+    expect(parseCspParam(undefined)).toBeNull();
+    expect(parseCspParam(null)).toBeNull();
+    expect(parseCspParam({})).toBeNull();
+  });
+
+  test('missing csp key → null', () => {
+    expect(parseCspParam({ other: { value: 'x' } })).toBeNull();
+  });
+
+  test('empty value → null', () => {
+    expect(parseCspParam({ csp: { value: '' } })).toBeNull();
+  });
+
+  test('valid JSON object → parsed', () => {
+    expect(
+      parseCspParam({ csp: { value: '{"connectDomains":["https://esm.sh"]}' } }),
+    ).toEqual({ connectDomains: ['https://esm.sh'] });
+  });
+
+  test('malformed JSON → null (no throw)', () => {
+    expect(parseCspParam({ csp: { value: 'not-json' } })).toBeNull();
+    expect(parseCspParam({ csp: { value: '{unclosed' } })).toBeNull();
+  });
+
+  test('non-object JSON (array / scalar) → null', () => {
+    // We accept only the spec-shaped McpUiResourceCsp object — never an
+    // array (would let an attacker control directive composition).
+    expect(parseCspParam({ csp: { value: '["https://evil.com"]' } })).toBeNull();
+    expect(parseCspParam({ csp: { value: '"https://evil.com"' } })).toBeNull();
+    expect(parseCspParam({ csp: { value: '42' } })).toBeNull();
+    expect(parseCspParam({ csp: { value: 'null' } })).toBeNull();
+  });
+
+  test('URL-encoded value falls back to decodeURIComponent', () => {
+    // CloudFront Functions don't reliably URL-decode querystring values in
+    // every runtime/event-type combo; the function must handle both.
+    const encoded =
+      '%7B%22resourceDomains%22%3A%5B%22https%3A%2F%2Fesm.sh%22%5D%2C' +
+      '%22connectDomains%22%3A%5B%22https%3A%2F%2Fesm.sh%22%5D%7D';
+    expect(parseCspParam({ csp: { value: encoded } })).toEqual({
+      resourceDomains: ['https://esm.sh'],
+      connectDomains: ['https://esm.sh'],
+    });
+  });
+
+  test('URL-encoded but not valid JSON after decode → null', () => {
+    expect(parseCspParam({ csp: { value: '%7Bbroken' } })).toBeNull();
+  });
+});
+
+describe('handler', () => {
+  function makeEvent(querystring?: Record<string, { value: string }>) {
+    return {
+      request: { querystring: querystring ?? {} },
+      response: { statusCode: 200, headers: {} as Record<string, { value: string }> },
+    };
+  }
+
+  test('with no ?csp= param, emits the default (un-declared) CSP', () => {
+    const event = makeEvent();
+    const result = handler(event);
+    expect(result.headers['content-security-policy']).toBeDefined();
+    expect(result.headers['content-security-policy'].value).toContain(
+      "connect-src 'self'",
+    );
+    // Default → connect-src carries no sources beyond 'self'
+    expect(result.headers['content-security-policy'].value).not.toMatch(
+      /connect-src 'self' \S+/,
+    );
+  });
+
+  test('with declared csp, builds and replaces the response CSP header', () => {
+    const event = makeEvent({
+      csp: {
+        value: JSON.stringify({
+          resourceDomains: ['https://esm.sh'],
+          connectDomains: ['https://esm.sh'],
+        }),
+      },
+    });
+    // Simulate a CSP header already present on the response; the
+    // CloudFront Function must replace it with the per-request policy.
+    event.response.headers['content-security-policy'] = {
+      value: "default-src 'self'",
+    };
+    const result = handler(event);
+    expect(result.headers['content-security-policy'].value).toContain(
+      'connect-src \'self\' https://esm.sh',
+    );
+    expect(result.headers['content-security-policy'].value).not.toBe(
+      "default-src 'self'",
+    );
+  });
+
+  test('with malformed ?csp=, falls back to default without throwing', () => {
+    const event = makeEvent({ csp: { value: 'not-json' } });
+    const result = handler(event);
+    expect(result.headers['content-security-policy'].value).toContain("connect-src 'self'");
+    expect(result.headers['content-security-policy'].value).not.toMatch(
+      /connect-src 'self' \S/,
+    );
+  });
+
+  test('always emits frame-ancestors so framing control is not lost on the dynamic path', () => {
+    const event = makeEvent();
+    const result = handler(event);
+    // The placeholder is what's in source; in production CDK substitutes
+    // it with the real SPA origin before upload.
+    expect(result.headers['content-security-policy'].value).toContain(
+      'frame-ancestors __INJECT_FRAME_ANCESTORS__',
+    );
+  });
+
+  test('returns the response object in the CloudFront Functions v2.0 shape', () => {
+    const event = makeEvent({
+      csp: { value: JSON.stringify({ connectDomains: ['https://api.example.com'] }) },
+    });
+    const result = handler(event);
+    expect(result).toBe(event.response);
+    expect(typeof result.headers['content-security-policy'].value).toBe('string');
+  });
+
+  test('handles missing response.headers (defensive)', () => {
+    const event = { request: { querystring: {} }, response: {} as any };
+    const result = handler(event);
+    expect(result.headers).toBeDefined();
+    expect(result.headers['content-security-policy']).toBeDefined();
+  });
+});
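Reviewer note: the rejection rules pinned down by the `sanitizeCspDomains` tests above can be restated compactly. The following is a Python restatement for illustration only, not the CloudFront Function source (which is JavaScript and does not appear in this part of the diff). The name `sanitize_csp_domains` and the exact forbidden-character set are assumptions inferred from the test cases.

```python
from typing import Any, List

# Characters the tests treat as disqualifying: the directive separator,
# line breaks, quotes, and whitespace. Inferred from the test cases,
# not copied from the real implementation.
_FORBIDDEN = set(";\r\n'\" \t")


def sanitize_csp_domains(entries: Any) -> List[str]:
    """Keep only non-empty strings that contain no forbidden characters."""
    if not isinstance(entries, list):
        return []  # e.g. an object passed where a list was expected
    kept: List[str] = []
    for entry in entries:
        if not isinstance(entry, str) or entry == "":
            continue  # non-string and empty entries are dropped
        if any(ch in _FORBIDDEN for ch in entry):
            continue  # possible directive break-out or header injection
        kept.append(entry)
    return kept
```

Dropping a bad entry instead of raising matches the expectation in the tests that injection attempts are silently left out of the header while well-formed entries still pass through.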
diff --git a/infrastructure/test/mcp-sandbox-stack.test.ts b/infrastructure/test/mcp-sandbox-stack.test.ts
new file mode 100644
index 00000000..66927a1a
--- /dev/null
+++ b/infrastructure/test/mcp-sandbox-stack.test.ts
@@ -0,0 +1,264 @@
+import { Template, Match } from 'aws-cdk-lib/assertions';
+import {
+  McpSandboxStack,
+  buildMcpSandboxFrameAncestors,
+  loadMcpSandboxCspFunctionCode,
+  MCP_SANDBOX_SUBDOMAIN_LABEL,
+} from '../lib/mcp-sandbox-stack';
+import { createMockConfig, createMockApp, mockEnv } from './helpers/mock-config';
+
+describe('McpSandboxStack', () => {
+  // Default mock config has a domainName but no mcpSandbox.certificateArn, so
+  // the stack synthesizes on the CloudFront default domain (no ACM import, no
+  // Route53 lookup) while still deriving the real frame-ancestors from the
+  // domain — exactly the unit/synth path.
+  let template: Template;
+
+  beforeEach(() => {
+    const config = createMockConfig({
+      domainName: 'test.example.com',
+      mcpSandbox: {
+        enabled: true,
+        extraFrameAncestors: ['http://localhost:4200'],
+      },
+    });
+    const app = createMockApp(config, ['McpSandboxStack']);
+    const stack = new McpSandboxStack(app, 'TestMcpSandboxStack', {
+      config,
+      env: mockEnv(config),
+    });
+    template = Template.fromStack(stack);
+  });
+
+  test('synthesizes without errors', () => {
+    expect(template.toJSON()).toBeDefined();
+  });
+
+  test('creates a private, encrypted S3 bucket that blocks all public access', () => {
+    template.hasResourceProperties('AWS::S3::Bucket', {
+      PublicAccessBlockConfiguration: {
+        BlockPublicAcls: true,
+        BlockPublicPolicy: true,
+        IgnorePublicAcls: true,
+        RestrictPublicBuckets: true,
+      },
+      BucketEncryption: {
+        ServerSideEncryptionConfiguration: Match.arrayWith([
+          Match.objectLike({
+            ServerSideEncryptionByDefault: Match.objectLike({
+              SSEAlgorithm: Match.anyValue(),
+            }),
+          }),
+        ]),
+      },
+    });
+  });
+
+  test('creates exactly one CloudFront distribution', () => {
+    template.resourceCountIs('AWS::CloudFront::Distribution', 1);
+  });
+
+  test('CloudFront serves proxy.html as the default root object', () => {
+    template.hasResourceProperties('AWS::CloudFront::Distribution', {
+      DistributionConfig: Match.objectLike({
+        DefaultRootObject: 'proxy.html',
+      }),
+    });
+  });
+
+  test('ResponseHeadersPolicy keeps non-CSP security headers but does NOT emit CSP (CSP is now per-request via the CloudFront Function)', () => {
+    // A CSP set via the response-headers policy would be a SECOND
+    // `Content-Security-Policy` header alongside the dynamic one from the
+    // CloudFront Function; browsers enforce both (the effective policy is
+    // their intersection), which would silently re-deny anything an App
+    // legitimately declared in `_meta.ui.csp`. So the policy carries only
+    // the headers that don't vary per resource.
+    template.hasResourceProperties('AWS::CloudFront::ResponseHeadersPolicy', {
+      ResponseHeadersPolicyConfig: Match.objectLike({
+        SecurityHeadersConfig: Match.objectLike({
+          StrictTransportSecurity: Match.objectLike({ Override: true }),
+          ReferrerPolicy: Match.objectLike({ Override: true }),
+          ContentTypeOptions: Match.objectLike({ Override: true }),
+        }),
+      }),
+    });
+
+    const policies = template.findResources('AWS::CloudFront::ResponseHeadersPolicy');
+    const policy = Object.values(policies)[0] as any;
+    const security = policy.Properties.ResponseHeadersPolicyConfig.SecurityHeadersConfig;
+    expect(security.ContentSecurityPolicy).toBeUndefined();
+  });
+
+  test('does NOT set legacy X-Frame-Options (frame-ancestors via the CSP function is the control)', () => {
+    const policies = template.findResources('AWS::CloudFront::ResponseHeadersPolicy');
+    const policy = Object.values(policies)[0] as any;
+    const security = policy.Properties.ResponseHeadersPolicyConfig.SecurityHeadersConfig;
+    expect(security.FrameOptions).toBeUndefined();
+  });
+
+  test('creates exactly one CloudFront Function for dynamic CSP, on the JS_2_0 runtime', () => {
+    template.resourceCountIs('AWS::CloudFront::Function', 1);
+    template.hasResourceProperties('AWS::CloudFront::Function', {
+      FunctionConfig: Match.objectLike({
+        Runtime: 'cloudfront-js-2.0',
+      }),
+    });
+  });
+
+  test('CFN function body has the frame-ancestors placeholder substituted with the real source list', () => {
+    const fns = template.findResources('AWS::CloudFront::Function');
+    const fn = Object.values(fns)[0] as any;
+    const code = fn.Properties.FunctionCode as string;
+    expect(code).toContain('https://test.example.com http://localhost:4200');
+    // The substitutable JS literal must be gone; the bare token still
+    // appears in a top-of-file comment and that's intentional.
+    expect(code).not.toContain("'__INJECT_FRAME_ANCESTORS__'");
+  });
+
+  test('CFN function comment fits within the AWS-enforced 128-char limit', () => {
+    // CloudFront::Function.FunctionConfig.Comment maxes at 128 chars and
+    // CloudFormation rejects the create with a 400 if exceeded — see
+    // alpha deploy 2026-05-19 which rolled back on this. Catching at
+    // synth time prevents another wasted deploy round-trip.
+    const fns = template.findResources('AWS::CloudFront::Function');
+    const fn = Object.values(fns)[0] as any;
+    const comment = fn.Properties.FunctionConfig.Comment as string;
+    expect(comment.length).toBeLessThanOrEqual(128);
+  });
+
+  test('ResponseHeadersPolicy comment fits within the AWS-enforced 128-char limit', () => {
+    // ResponseHeadersPolicy.ResponseHeadersPolicyConfig.Comment shares
+    // the same 128-char cap. The 2026-05-19 alpha redeploy hit this
+    // (generic "Invalid request" — AWS doesn't echo the field name) so
+    // we lock both sibling Comments in the stack down with the same
+    // guard.
+    const policies = template.findResources('AWS::CloudFront::ResponseHeadersPolicy');
+    const policy = Object.values(policies)[0] as any;
+    const comment = policy.Properties.ResponseHeadersPolicyConfig.Comment as string;
+    expect(comment.length).toBeLessThanOrEqual(128);
+  });
+
+  test('CFN function is wired to viewer-response on the default behavior', () => {
+    template.hasResourceProperties('AWS::CloudFront::Distribution', {
+      DistributionConfig: Match.objectLike({
+        DefaultCacheBehavior: Match.objectLike({
+          FunctionAssociations: Match.arrayWith([
+            Match.objectLike({
+              EventType: 'viewer-response',
+              FunctionARN: Match.anyValue(),
+            }),
+          ]),
+        }),
+      }),
+    });
+  });
+
+  test('bakes the shell in via a BucketDeployment with CloudFront invalidation', () => {
+    template.resourceCountIs('Custom::CDKBucketDeployment', 1);
+    template.hasResourceProperties('Custom::CDKBucketDeployment', {
+      DistributionPaths: ['/*'],
+    });
+  });
+
+  test('writes the one-way /mcp-sandbox/origin SSM export', () => {
+    template.hasResourceProperties('AWS::SSM::Parameter', {
+      Name: '/test-project/mcp-sandbox/origin',
+      Type: 'String',
+    });
+  });
+
+  test('does not create a Route53 record without a custom domain cert', () => {
+    template.resourceCountIs('AWS::Route53::RecordSet', 0);
+  });
+
+  test('domain-less config bakes "frame-ancestors none" into the CSP function code', () => {
+    const config = createMockConfig({
+      domainName: undefined,
+      mcpSandbox: { enabled: true, extraFrameAncestors: [] },
+    });
+    const app = createMockApp(config, ['McpSandboxStack']);
+    const stack = new McpSandboxStack(app, 'NoDomainMcpSandboxStack', {
+      config,
+      env: mockEnv(config),
+    });
+    const t = Template.fromStack(stack);
+    const fns = t.findResources('AWS::CloudFront::Function');
+    const fn = Object.values(fns)[0] as any;
+    const code = fn.Properties.FunctionCode as string;
+    // JSON.stringify wraps single-quoted CSP keywords in double quotes —
+    // the resulting JS literal is `"'none'"` (no escaping needed since
+    // JSON.stringify never emits backslashed single quotes).
+    expect(code).toContain('var FRAME_ANCESTORS = "\'none\'";');
+  });
+});
+
+describe('buildMcpSandboxFrameAncestors', () => {
+  test('prod: SPA origin derived from the domain', () => {
+    expect(buildMcpSandboxFrameAncestors('alpha.example.com', [])).toBe(
+      'https://alpha.example.com',
+    );
+  });
+
+  test('prod + extra origins (e.g. local SPA pointed at this env)', () => {
+    expect(
+      buildMcpSandboxFrameAncestors('alpha.example.com', ['http://localhost:4200']),
+    ).toBe('https://alpha.example.com http://localhost:4200');
+  });
+
+  test('no domain, no extras → deny all framing', () => {
+    expect(buildMcpSandboxFrameAncestors(undefined, [])).toBe("'none'");
+  });
+
+  test('extras only (domain-less local stack)', () => {
+    expect(buildMcpSandboxFrameAncestors(undefined, ['http://localhost:4200'])).toBe(
+      'http://localhost:4200',
+    );
+  });
+
+  test('blank extras are filtered out, never widening to *', () => {
+    expect(buildMcpSandboxFrameAncestors('alpha.example.com', ['', '  '])).toBe(
+      'https://alpha.example.com',
+    );
+  });
+});
+
+describe('loadMcpSandboxCspFunctionCode', () => {
+  test('substitutes the FRAME_ANCESTORS placeholder with the real source list (as a JSON-escaped JS literal)', () => {
+    const code = loadMcpSandboxCspFunctionCode('https://alpha.example.com');
+    expect(code).toContain('var FRAME_ANCESTORS = "https://alpha.example.com";');
+    // The replaceable quoted literal must be gone; the bare token may
+    // still appear in a comment, which is fine.
+    expect(code).not.toContain("'__INJECT_FRAME_ANCESTORS__'");
+  });
+
+  test('preserves the runtime helpers (sanitize / build / parse / handler)', () => {
+    const code = loadMcpSandboxCspFunctionCode("'none'");
+    expect(code).toContain('function sanitizeCspDomains(');
+    expect(code).toContain('function buildCspHeader(');
+    expect(code).toContain('function parseCspParam(');
+    expect(code).toContain('function handler(');
+  });
+
+  test('handles the deny-all frame-ancestors value without producing invalid JS', () => {
+    const code = loadMcpSandboxCspFunctionCode("'none'");
+    // JSON.stringify yields `"'none'"` (double-quoted, inner single
+    // quotes unescaped), a valid JS string literal that evaluates to
+    // the CSP source `'none'` at runtime. A naive replace that kept the
+    // placeholder's own single quotes would produce `''none''`, a
+    // syntax error.
+    expect(code).toContain('var FRAME_ANCESTORS = "\'none\'";');
+  });
+
+  test('handles multiple space-separated source list entries (the common production shape)', () => {
+    const code = loadMcpSandboxCspFunctionCode(
+      'https://alpha.example.com http://localhost:4200',
+    );
+    expect(code).toContain(
+      'var FRAME_ANCESTORS = "https://alpha.example.com http://localhost:4200";',
+    );
+  });
+});
+
+describe('subdomain decision', () => {
+  test('is the documented "mcp-sandbox" label (TBD resolved in PR #1)', () => {
+    expect(MCP_SANDBOX_SUBDOMAIN_LABEL).toBe('mcp-sandbox');
+  });
+});
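Reviewer note: the two behaviours exercised above (building the `frame-ancestors` source list, then embedding it into the function source as a JSON-encoded literal) can be sketched as follows. This is a Python restatement under stated assumptions, not the code in `mcp-sandbox-stack.ts`: the function names, the one-line template in the usage note, and the choice to trim whitespace from extra origins are all hypothetical.

```python
import json
from typing import Iterable, Optional

# The quoted literal that the tests expect to disappear after substitution.
PLACEHOLDER = "'__INJECT_FRAME_ANCESTORS__'"


def build_frame_ancestors(domain: Optional[str], extras: Iterable[str]) -> str:
    """Join the SPA origin and any extra origins; deny framing if none remain."""
    sources = []
    if domain:
        sources.append(f"https://{domain}")
    # Blank extras are filtered out so an empty entry can never widen the list.
    sources.extend(e.strip() for e in extras if e.strip())
    return " ".join(sources) if sources else "'none'"


def inject_frame_ancestors(template: str, source_list: str) -> str:
    """Replace the quoted placeholder with a JSON-encoded string literal."""
    # json.dumps emits a double-quoted literal, so the inner single quotes
    # of a keyword such as 'none' need no escaping and the JS stays valid.
    return template.replace(PLACEHOLDER, json.dumps(source_list))
```

For example, applying `inject_frame_ancestors` to the hypothetical template line `var FRAME_ANCESTORS = '__INJECT_FRAME_ANCESTORS__';` with the value `'none'` gives `var FRAME_ANCESTORS = "'none'";`, the shape asserted in the deny-all test.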
diff --git a/infrastructure/test/rag-ingestion-stack.test.ts b/infrastructure/test/rag-ingestion-stack.test.ts
index 38b3d92b..c1d2260a 100644
--- a/infrastructure/test/rag-ingestion-stack.test.ts
+++ b/infrastructure/test/rag-ingestion-stack.test.ts
@@ -83,6 +83,15 @@ describe('RagIngestionStack', () => {
         enabled: false,
         defaultQuotaHours: 0,
       },
+      artifacts: {
+        enabled: false,
+        retentionDays: 90,
+        extraFrameAncestors: [],
+      },
+      mcpSandbox: {
+        enabled: false,
+        extraFrameAncestors: [],
+      },
       cognito: {
         domainPrefix: 'test-project',
         passwordMinLength: 8,
diff --git a/infrastructure/test/stack-dependencies.test.ts b/infrastructure/test/stack-dependencies.test.ts
index a620f184..ffaff10f 100644
--- a/infrastructure/test/stack-dependencies.test.ts
+++ b/infrastructure/test/stack-dependencies.test.ts
@@ -34,6 +34,8 @@ const STACK_FILES: Record<string, string> = {
   'rag-ingestion-stack.ts': 'RagIngestionStack',
   'gateway-stack.ts': 'GatewayStack',
   'sagemaker-fine-tuning-stack.ts': 'SageMakerFineTuningStack',
+  'artifacts-stack.ts': 'ArtifactsStack',
+  'mcp-sandbox-stack.ts': 'McpSandboxStack',
   'inference-api-stack.ts': 'InferenceApiStack',
   'app-api-stack.ts': 'AppApiStack',
   'frontend-stack.ts': 'FrontendStack',
@@ -57,6 +59,11 @@ const DEPLOYMENT_TIERS: Record<string, number> = {
   RagIngestionStack: 1,
   GatewayStack: 1,
   SageMakerFineTuningStack: 1,
+  ArtifactsStack: 1,
+  // Reads no cross-stack SSM (cert from config, hosted zone via lookup);
+  // writes only its own /mcp-sandbox/origin. Parallel-safe with the other
+  // tier-1 stacks.
+  McpSandboxStack: 1,
   InferenceApiStack: 2,
   AppApiStack: 3,
   FrontendStack: 4,
diff --git a/scripts/backup-data/README.md b/scripts/backup-data/README.md
new file mode 100644
index 00000000..3ad56926
--- /dev/null
+++ b/scripts/backup-data/README.md
@@ -0,0 +1,247 @@
+# Pre-Migration Backup Tool
+
+One-shot tooling that takes a complete, restore-friendly backup of all
+application data and users in an AgentCore Public Stack deployment, so the
+infrastructure can be torn down and redeployed in a new shape without losing
+anything.
+
+**Restore is intentionally a separate step.** This tool produces portable,
+transformable artifacts (DynamoDB JSON, raw S3 objects, JSON config blobs);
+the restore script — written against the *new* infrastructure once it
+exists — reads those artifacts and maps them into the new shape.
+
+## What is backed up
+
+| Component | Source | How |
+|---|---|---|
+| Application DynamoDB tables (~20) | `/{prefix}/.../...-table-name` SSM params + the conventionally-named `{prefix}-assistants` table | `ExportTableToPointInTime` → gzipped DynamoDB-JSON in S3 |
+| User-content S3 buckets | `/{prefix}/.../bucket-name` SSM params (user file uploads, RAG documents, conditional artifacts/fine-tuning) | `aws s3 sync` |
+| Cognito User Pool config | `/{prefix}/auth/cognito/user-pool-id` SSM | `DescribeUserPool` JSON |
+| **Cognito Identity Providers** | enumerate + describe each | full `ProviderDetails` incl. **OIDC `client_secret`** for automated re-registration |
+| **Cognito App Clients** | enumerate + describe each | full record incl. **`ClientSecret`** |
+| Cognito Resource Servers, Domain, UI Customization | describe APIs | JSON |
+| Cognito Users, Groups, Group Memberships | paginated list APIs | gzipped JSONL |
+| AgentCore Memory | `/{prefix}/inference-api/memory-id` SSM | memory config + best-effort per-actor events |
+
+Every run lands in a freshly created S3 bucket named
+`{project_prefix}-backup-{utc_timestamp}` (versioned, server-side encrypted,
+TLS-only, all public access blocked) with this layout:
+
+```
+s3://{project_prefix}-backup-{ts}/{project_prefix}/{ts}/
+  manifest.json                            # see manifest_schema.json
+  dynamodb/
+    {logical}.schema.json                  # KeySchema, GSIs, TTL, etc.
+    {logical}/AWSDynamoDB/{export-id}/...  # DynamoDB-managed export tree
+  s3/
+    {logical}/                             # mirror of source bucket
+  cognito/
+    user-pool.json
+    identity-providers.json                # ⚠ contains client_secrets
+    app-clients.json                       # ⚠ contains ClientSecrets
+    resource-servers.json
+    domain.json
+    ui-customization.json
+    users.jsonl.gz
+    groups.jsonl.gz
+    group-memberships.jsonl.gz
+  agentcore-memory/
+    memory.json
+    events.jsonl.gz                        # best-effort
+    NOTES.md                               # only if partial / unavailable
+```
+
+## What is NOT backed up
+
+- **Cognito password hashes** — Cognito does not expose them. Native-password
+  users will need a "forgot password" reset on first login after restore.
+  Federated (OIDC/SAML) users are unaffected.
+- Ephemeral tables (`bff-sessions`, `oidc-state`, `voice-ticket-replay`) —
+  TTL-driven, so there is no value in preserving them across a migration.
+  Pass `--include-ephemeral` to back them up anyway.
+- Vector index bucket (`/{prefix}/rag/vector-bucket-name`) — derived from
+  the RAG documents bucket; re-ingest after restore.
+- Frontend, MCP-sandbox, and other build-artifact buckets — rebuilt from
+  source.
+- KMS keys, Secrets Manager values, ECR images, CloudWatch logs — infra,
+  not application data.
+
+## ⚠ Sensitive data warning
+
+The backup bucket contains:
+- Every user's email, custom attributes, and (federated) identity links.
+- Every user's uploaded files and RAG documents.
+- **OIDC client secrets** for every federated IdP.
+- **App client secrets** for every Cognito app client.
+- **OAuth user tokens** (in the `oauth-user-tokens` DynamoDB table).
+
+Treat the backup bucket like a production secret store. By default the
+bucket policy denies non-TLS access and the public access block denies all
+public access, but you are responsible for restricting who can read it.
+After restore is complete and verified, delete the bucket (and all of its
+object versions).
+
+## Running it
+
+### From GitHub Actions (recommended)
+
+`Actions → Backup Data (Pre-Migration) → Run workflow`. Inputs:
+
+- `project_prefix` — e.g. `boisestate-prod`
+- `aws_region` — defaults to `us-west-2`
+- `aws_environment` — selects the GitHub Environment (and thus
+  `AWS_ROLE_ARN` secret + any required approvals)
+- `include_ephemeral` / `dry_run` / `allow_partial` — usually leave off
+
+The workflow uses the existing `.github/actions/configure-aws-credentials`
+composite action for OIDC role assumption.
+
+### Locally
+
+```bash
+cd scripts/backup-data
+uv sync
+uv run python backup.py \
+  --project-prefix boisestate-prod \
+  --region us-west-2 \
+  --dry-run            # start here to see what would happen
+```
+
+Requires AWS credentials with the IAM policy below. The script also shells
+out to `aws s3 sync`, so the AWS CLI must be on PATH.
+
+## IAM policy
+
+The OIDC role (or local IAM user) needs:
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Sid": "DiscoverViaSSM",
+      "Effect": "Allow",
+      "Action": ["ssm:GetParameter", "ssm:GetParametersByPath"],
+      "Resource": "arn:aws:ssm:*:*:parameter/PROJECT_PREFIX/*"
+    },
+    {
+      "Sid": "DynamoDBExport",
+      "Effect": "Allow",
+      "Action": [
+        "dynamodb:DescribeTable",
+        "dynamodb:DescribeContinuousBackups",
+        "dynamodb:DescribeTimeToLive",
+        "dynamodb:ExportTableToPointInTime",
+        "dynamodb:DescribeExport",
+        "dynamodb:ListExports"
+      ],
+      "Resource": [
+        "arn:aws:dynamodb:*:*:table/PROJECT_PREFIX-*",
+        "arn:aws:dynamodb:*:*:table/PROJECT_PREFIX-*/export/*"
+      ]
+    },
+    {
+      "Sid": "S3Source",
+      "Effect": "Allow",
+      "Action": ["s3:ListBucket", "s3:GetObject", "s3:GetObjectTagging"],
+      "Resource": [
+        "arn:aws:s3:::PROJECT_PREFIX-*",
+        "arn:aws:s3:::PROJECT_PREFIX-*/*"
+      ]
+    },
+    {
+      "Sid": "S3BackupBucket",
+      "Effect": "Allow",
+      "Action": [
+        "s3:CreateBucket",
+        "s3:PutBucketVersioning",
+        "s3:PutBucketEncryption",
+        "s3:PutBucketTagging",
+        "s3:PutBucketPolicy",
+        "s3:PutBucketPublicAccessBlock",
+        "s3:ListBucket",
+        "s3:PutObject",
+        "s3:AbortMultipartUpload",
+        "s3:GetObject"
+      ],
+      "Resource": [
+        "arn:aws:s3:::PROJECT_PREFIX-backup-*",
+        "arn:aws:s3:::PROJECT_PREFIX-backup-*/*"
+      ]
+    },
+    {
+      "Sid": "Cognito",
+      "Effect": "Allow",
+      "Action": [
+        "cognito-idp:DescribeUserPool",
+        "cognito-idp:ListIdentityProviders",
+        "cognito-idp:DescribeIdentityProvider",
+        "cognito-idp:ListUserPoolClients",
+        "cognito-idp:DescribeUserPoolClient",
+        "cognito-idp:ListResourceServers",
+        "cognito-idp:DescribeUserPoolDomain",
+        "cognito-idp:GetUICustomization",
+        "cognito-idp:ListUsers",
+        "cognito-idp:ListGroups",
+        "cognito-idp:ListUsersInGroup"
+      ],
+      "Resource": "arn:aws:cognito-idp:*:*:userpool/*"
+    },
+    {
+      "Sid": "AgentCoreMemory",
+      "Effect": "Allow",
+      "Action": [
+        "bedrock-agentcore:GetMemory",
+        "bedrock-agentcore:ListActors",
+        "bedrock-agentcore:ListSessions",
+        "bedrock-agentcore:ListEvents"
+      ],
+      "Resource": "*"
+    },
+    {
+      "Sid": "STSCallerIdentity",
+      "Effect": "Allow",
+      "Action": "sts:GetCallerIdentity",
+      "Resource": "*"
+    },
+    {
+      "Sid": "KMSForEncryptedResources",
+      "Effect": "Allow",
+      "Action": ["kms:Decrypt", "kms:DescribeKey"],
+      "Resource": "*"
+    }
+  ]
+}
+```
+
+Replace `PROJECT_PREFIX` with your actual project prefix (e.g.
+`boisestate-prod`).
+
+## Verification built into the tool
+
+- DynamoDB: each export records `ItemCount`, `BilledSizeBytes`, and the
+  `ExportManifest` location.
+- S3: source object count + total bytes are compared to destination count
+  after sync; mismatch fails that component.
+- Cognito: per-component counts; OIDC IdPs missing a `client_secret`
+  fail loudly (rather than silently writing an unrestorable backup).
+- `manifest.json` rolls everything up into `summary.{ok,skipped,failed}`.
+  The workflow fails on any `failed` row unless `--allow-partial` is set.
+
+## Restore (placeholder)
+
+Out of scope here. The restore script will:
+
+1. Read `manifest.json` to know what exists.
+2. For each `dynamodb/{logical}/` export, read DynamoDB-JSON line by line
+   and `BatchWriteItem` into the new table with whatever attribute
+   re-mapping the new schema requires.
+3. For each `s3/{logical}/`, `aws s3 sync` back into the new bucket (or
+   re-key on the way through).
+4. For Cognito: re-create the user pool config (via CDK in the new infra),
+   then iterate `identity-providers.json` + `app-clients.json` to
+   `CreateIdentityProvider` / `CreateUserPoolClient` using the preserved
+   secrets — which is the whole reason we capture them.
+5. For users: `AdminCreateUser` with `MessageAction=SUPPRESS` then optionally
+   trigger a forced password reset.
+6. For AgentCore Memory: re-create the memory resource in the new infra,
+   then replay `events.jsonl.gz` if needed.
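A minimal sketch of restore step 2 above, not part of this change. It assumes the `DYNAMODB_JSON` export layout, where each data file is gzipped and each line is a JSON object with a single `Item` key holding the attribute map. The names `iter_export_items`, `to_write_batches`, and `remap` are hypothetical, and `remap` stands in for whatever attribute re-mapping the new schema requires.

```python
from __future__ import annotations

import gzip
import io
import json
from typing import Any, Callable, Iterable, Iterator

# BatchWriteItem accepts at most 25 put/delete requests per call.
BATCH_LIMIT = 25

Item = dict[str, Any]


def iter_export_items(raw: bytes) -> Iterator[Item]:
    """Yield the attribute map from each line of one gzipped export data file."""
    with gzip.open(io.BytesIO(raw), "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)["Item"]


def to_write_batches(
    items: Iterable[Item],
    remap: Callable[[Item], Item] = lambda item: item,  # hypothetical schema hook
) -> Iterator[list[dict[str, Any]]]:
    """Group items into lists of PutRequest entries, at most 25 per list."""
    batch: list[dict[str, Any]] = []
    for item in items:
        batch.append({"PutRequest": {"Item": remap(item)}})
        if len(batch) == BATCH_LIMIT:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each yielded list could be passed to boto3's `batch_write_item` as `RequestItems={new_table_name: batch}`. The caller would still need to resubmit anything returned in `UnprocessedItems`.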
diff --git a/scripts/backup-data/backup.py b/scripts/backup-data/backup.py
new file mode 100644
index 00000000..e58499eb
--- /dev/null
+++ b/scripts/backup-data/backup.py
@@ -0,0 +1,1010 @@
+"""Pre-migration backup tool for AgentCore Public Stack.
+
+Discovers all application data sources for a given CDK_PROJECT_PREFIX via SSM
+Parameter Store, creates a dedicated S3 backup bucket, and dumps:
+
+* All DynamoDB tables (via ExportTableToPointInTime — portable DynamoDB-JSON,
+  NOT AWS-Backup snapshots, so the restore step can transform freely into the
+  new schema).
+* All S3 user-content buckets (raw object copy via `aws s3 sync`).
+* The full Cognito User Pool: pool config, identity providers (with OIDC
+  client_secret preserved so future IdP re-registration can be automated),
+  app clients (with ClientSecret preserved), resource servers, domain, UI
+  customization, users, groups, group memberships.
+* AgentCore Memory: best-effort enumeration of strategies + per-actor events.
+
+Every run writes to a fresh bucket: {project_prefix}-backup-{utc_timestamp}.
+A single `manifest.json` at the root describes everything and records
+per-component pass/fail + counts. Intended to be invoked from the
+`.github/workflows/backup-data.yml` workflow but is fully runnable locally
+with valid AWS credentials.
+
+Restore is intentionally out of scope — see scripts/backup-data/README.md.
+"""
+
+from __future__ import annotations
+
+import argparse
+import dataclasses
+import gzip
+import io
+import json
+import logging
+import os
+import re
+import subprocess
+import sys
+import time
+import traceback
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from typing import Any
+
+import boto3
+import botocore
+from botocore.config import Config as BotoConfig
+from botocore.exceptions import ClientError
+
+LOG = logging.getLogger("backup")
+
+BOTO_CONFIG = BotoConfig(
+    retries={"max_attempts": 10, "mode": "adaptive"},
+    user_agent_extra="agentcore-backup/1.0",
+)
+
+# --------------------------------------------------------------------------- #
+# SSM-discoverable resources.                                                 #
+# `ssm` is the parameter path under `/{prefix}/...`; `logical` is the stable  #
+# name used in the manifest + on-disk layout. Optional resources are skipped  #
+# without error when the SSM parameter is missing.                            #
+# --------------------------------------------------------------------------- #
+DYNAMODB_TABLES: list[dict[str, Any]] = [
+    {"logical": "users",                "ssm": "/users/users-table-name"},
+    {"logical": "app-roles",            "ssm": "/rbac/app-roles-table-name"},
+    {"logical": "api-keys",             "ssm": "/auth/api-keys-table-name"},
+    {"logical": "auth-providers",       "ssm": "/auth/auth-providers-table-name"},
+    {"logical": "oauth-providers",      "ssm": "/oauth/providers-table-name"},
+    {"logical": "oauth-user-tokens",    "ssm": "/oauth/user-tokens-table-name"},
+    {"logical": "user-quotas",          "ssm": "/quota/user-quotas-table-name"},
+    {"logical": "quota-events",         "ssm": "/quota/quota-events-table-name"},
+    {"logical": "sessions-metadata",    "ssm": "/cost-tracking/sessions-metadata-table-name"},
+    {"logical": "user-cost-summary",    "ssm": "/cost-tracking/user-cost-summary-table-name"},
+    {"logical": "system-cost-rollup",   "ssm": "/cost-tracking/system-cost-rollup-table-name"},
+    {"logical": "managed-models",       "ssm": "/admin/managed-models-table-name"},
+    {"logical": "user-menu-links",      "ssm": "/admin/user-menu-links-table-name"},
+    {"logical": "user-settings",        "ssm": "/settings/user-settings-table-name"},
+    {"logical": "user-file-uploads",    "ssm": "/user-file-uploads/table-name"},
+    {"logical": "shared-conversations", "ssm": "/shares/shared-conversations-table-name"},
+    {"logical": "rag-assistants",       "ssm": "/rag/assistants-table-name"},
+    {"logical": "artifacts",            "ssm": "/artifacts/table-name",                 "optional": True},
+    {"logical": "fine-tuning-jobs",     "ssm": "/fine-tuning/jobs-table-name",          "optional": True},
+    {"logical": "fine-tuning-access",   "ssm": "/fine-tuning/access-table-name",        "optional": True},
+]
+
+# The app `assistants` table is NOT published to SSM; falls back to the
+# deterministic CDK naming convention `{prefix}-assistants`.
+DYNAMODB_TABLES_BY_CONVENTION: list[dict[str, str]] = [
+    {"logical": "assistants", "suffix": "assistants"},
+]
+
+# Ephemeral / TTL-driven tables. Excluded by default; include with --include-ephemeral.
+DYNAMODB_TABLES_EPHEMERAL: list[dict[str, str]] = [
+    {"logical": "bff-sessions",         "suffix": "bff-sessions"},
+    {"logical": "oidc-state",           "suffix": "oidc-state"},
+    {"logical": "voice-ticket-replay",  "suffix": "voice-ticket-replay"},
+]
+
+S3_BUCKETS: list[dict[str, Any]] = [
+    {"logical": "user-file-uploads",    "ssm": "/user-file-uploads/bucket-name"},
+    {"logical": "rag-documents",        "ssm": "/rag/documents-bucket-name"},
+    {"logical": "artifacts",            "ssm": "/artifacts/bucket-name",                "optional": True},
+    {"logical": "fine-tuning-data",     "ssm": "/fine-tuning/data-bucket-name",         "optional": True},
+]
+
+SSM_USER_POOL_ID = "/auth/cognito/user-pool-id"
+SSM_MEMORY_ID = "/inference-api/memory-id"
+
+
+# --------------------------------------------------------------------------- #
+# Data classes for the manifest.                                              #
+# --------------------------------------------------------------------------- #
+@dataclass
+class ComponentResult:
+    """One row in the manifest. status is 'ok' | 'skipped' | 'failed'."""
+    component: str
+    logical_name: str
+    status: str
+    detail: dict[str, Any] = field(default_factory=dict)
+    error: str | None = None
+
+
+@dataclass
+class BackupContext:
+    project_prefix: str
+    region: str
+    timestamp: str               # UTC, e.g. 20260120T173042Z
+    bucket: str                  # destination bucket
+    root_prefix: str             # bucket key prefix, e.g. {prefix}/{ts}
+    account_id: str
+    include_ephemeral: bool
+    dry_run: bool
+    allow_partial: bool
+    session: boto3.Session
+    results: list[ComponentResult] = field(default_factory=list)
+
+
+# --------------------------------------------------------------------------- #
+# Discovery                                                                   #
+# --------------------------------------------------------------------------- #
+def get_ssm_param(session: boto3.Session, name: str) -> str | None:
+    ssm = session.client("ssm", config=BOTO_CONFIG)
+    try:
+        return ssm.get_parameter(Name=name)["Parameter"]["Value"]
+    except ClientError as exc:
+        if exc.response.get("Error", {}).get("Code") == "ParameterNotFound":
+            return None
+        raise
+
+
+# --------------------------------------------------------------------------- #
+# Bucket creation                                                             #
+# --------------------------------------------------------------------------- #
+def ensure_backup_bucket(ctx: BackupContext) -> None:
+    """Create the destination bucket with Block Public Access, versioning,
+    SSE-S3, tags, and a policy allowing DynamoDB exports and denying non-TLS."""
+    s3 = ctx.session.client("s3", config=BOTO_CONFIG)
+    LOG.info("Creating backup bucket s3://%s", ctx.bucket)
+
+    if ctx.dry_run:
+        LOG.info("[dry-run] would create bucket %s", ctx.bucket)
+        return
+
+    create_kwargs: dict[str, Any] = {"Bucket": ctx.bucket}
+    if ctx.region != "us-east-1":
+        create_kwargs["CreateBucketConfiguration"] = {"LocationConstraint": ctx.region}
+    try:
+        s3.create_bucket(**create_kwargs)
+    except ClientError as exc:
+        code = exc.response.get("Error", {}).get("Code")
+        if code == "BucketAlreadyOwnedByYou":
+            LOG.warning("Bucket %s already exists; reusing", ctx.bucket)
+        else:
+            raise  # BucketAlreadyExists: the name belongs to another account
+
+    s3.get_waiter("bucket_exists").wait(Bucket=ctx.bucket)
+
+    s3.put_public_access_block(
+        Bucket=ctx.bucket,
+        PublicAccessBlockConfiguration={
+            "BlockPublicAcls": True,
+            "IgnorePublicAcls": True,
+            "BlockPublicPolicy": True,
+            "RestrictPublicBuckets": True,
+        },
+    )
+    s3.put_bucket_versioning(Bucket=ctx.bucket, VersioningConfiguration={"Status": "Enabled"})
+    s3.put_bucket_encryption(
+        Bucket=ctx.bucket,
+        ServerSideEncryptionConfiguration={
+            "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
+        },
+    )
+
+    s3.put_bucket_tagging(
+        Bucket=ctx.bucket,
+        Tagging={"TagSet": [
+            {"Key": "Project", "Value": ctx.project_prefix},
+            {"Key": "Purpose", "Value": "pre-migration-backup"},
+            {"Key": "CreatedAt", "Value": ctx.timestamp},
+        ]},
+    )
+
+    # Bucket policy: allow DynamoDB ExportTableToPointInTime to write under
+    # the dynamodb/ prefix, and deny any non-TLS access.
+    policy = {
+        "Version": "2012-10-17",
+        "Statement": [
+            {
+                "Sid": "AllowDynamoDBExportFromAccount",
+                "Effect": "Allow",
+                "Principal": {"Service": "dynamodb.amazonaws.com"},
+                "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
+                "Resource": f"arn:aws:s3:::{ctx.bucket}/{ctx.root_prefix}/dynamodb/*",
+                "Condition": {"StringEquals": {"aws:SourceAccount": ctx.account_id}},
+            },
+            {
+                "Sid": "DenyInsecureTransport",
+                "Effect": "Deny",
+                "Principal": "*",
+                "Action": "s3:*",
+                "Resource": [
+                    f"arn:aws:s3:::{ctx.bucket}",
+                    f"arn:aws:s3:::{ctx.bucket}/*",
+                ],
+                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
+            },
+        ],
+    }
+    s3.put_bucket_policy(Bucket=ctx.bucket, Policy=json.dumps(policy))
+
+
+# --------------------------------------------------------------------------- #
+# DynamoDB                                                                    #
+# --------------------------------------------------------------------------- #
+def submit_dynamo_export(ctx: BackupContext, logical: str, table_name: str) -> ComponentResult:
+    dynamo = ctx.session.client("dynamodb", config=BOTO_CONFIG)
+    try:
+        desc = dynamo.describe_table(TableName=table_name)["Table"]
+    except ClientError as exc:
+        return ComponentResult(
+            "dynamodb", logical, "failed",
+            detail={"table_name": table_name},
+            error=f"DescribeTable: {exc.response.get('Error', {}).get('Code')}",
+        )
+
+    table_arn = desc["TableArn"]
+    item_count_estimate = desc.get("ItemCount", 0)
+    pitr_enabled = False
+    try:
+        pitr_desc = dynamo.describe_continuous_backups(TableName=table_name)
+        pitr_enabled = (
+            pitr_desc["ContinuousBackupsDescription"]
+            ["PointInTimeRecoveryDescription"]["PointInTimeRecoveryStatus"]
+            == "ENABLED"
+        )
+    except ClientError as exc:
+        LOG.warning("DescribeContinuousBackups failed for %s: %s", table_name, exc)
+
+    if not pitr_enabled:
+        return ComponentResult(
+            "dynamodb", logical, "failed",
+            detail={"table_name": table_name, "table_arn": table_arn,
+                    "item_count_estimate": item_count_estimate},
+            error="PointInTimeRecovery not enabled — cannot use ExportTableToPointInTime",
+        )
+
+    if ctx.dry_run:
+        return ComponentResult(
+            "dynamodb", logical, "skipped",
+            detail={"table_name": table_name, "table_arn": table_arn,
+                    "item_count_estimate": item_count_estimate, "reason": "dry-run"},
+        )
+
+    s3_prefix = f"{ctx.root_prefix}/dynamodb/{logical}"
+    try:
+        resp = dynamo.export_table_to_point_in_time(
+            TableArn=table_arn,
+            S3Bucket=ctx.bucket,
+            S3Prefix=s3_prefix,
+            ExportFormat="DYNAMODB_JSON",
+            S3SseAlgorithm="AES256",
+        )
+    except ClientError as exc:
+        return ComponentResult(
+            "dynamodb", logical, "failed",
+            detail={"table_name": table_name, "table_arn": table_arn},
+            error=f"ExportTableToPointInTime: {exc.response.get('Error', {}).get('Code')}",
+        )
+
+    export_arn = resp["ExportDescription"]["ExportArn"]
+    schema_blob = {
+        "table_name": table_name,
+        "table_arn": table_arn,
+        "key_schema": desc.get("KeySchema", []),
+        "attribute_definitions": desc.get("AttributeDefinitions", []),
+        "global_secondary_indexes": [
+            {"index_name": gsi["IndexName"], "key_schema": gsi["KeySchema"]}
+            for gsi in desc.get("GlobalSecondaryIndexes", []) or []
+        ],
+        "local_secondary_indexes": [
+            {"index_name": lsi["IndexName"], "key_schema": lsi["KeySchema"]}
+            for lsi in desc.get("LocalSecondaryIndexes", []) or []
+        ],
+        "stream_specification": desc.get("StreamSpecification"),
+        "ttl_attribute": _get_ttl_attribute(dynamo, table_name),
+        "billing_mode": desc.get("BillingModeSummary", {}).get("BillingMode"),
+    }
+    put_json(ctx, f"dynamodb/{logical}.schema.json", schema_blob)
+
+    return ComponentResult(
+        "dynamodb", logical, "ok",
+        detail={
+            "table_name": table_name,
+            "table_arn": table_arn,
+            "export_arn": export_arn,
+            "s3_prefix": s3_prefix,
+            "item_count_estimate": item_count_estimate,
+            "status": "IN_PROGRESS",
+        },
+    )
+
+
+def _get_ttl_attribute(dynamo: Any, table_name: str) -> str | None:
+    try:
+        ttl = dynamo.describe_time_to_live(TableName=table_name)["TimeToLiveDescription"]
+        if ttl.get("TimeToLiveStatus") in {"ENABLED", "ENABLING"}:
+            return ttl.get("AttributeName")
+    except ClientError:
+        pass
+    return None
+
+
+def wait_for_dynamo_exports(ctx: BackupContext, results: list[ComponentResult]) -> None:
+    """Poll all in-progress exports until they reach a terminal state."""
+    dynamo = ctx.session.client("dynamodb", config=BOTO_CONFIG)
+    pending = [r for r in results
+               if r.component == "dynamodb" and r.status == "ok"
+               and r.detail.get("status") == "IN_PROGRESS"]
+    if not pending:
+        return
+
+    LOG.info("Waiting for %d DynamoDB exports to complete…", len(pending))
+    while pending:
+        time.sleep(30)
+        still_pending: list[ComponentResult] = []
+        for r in pending:
+            try:
+                desc = dynamo.describe_export(
+                    ExportArn=r.detail["export_arn"]
+                )["ExportDescription"]
+            except ClientError as exc:
+                r.status = "failed"
+                r.error = f"DescribeExport: {exc.response.get('Error', {}).get('Code')}"
+                continue
+            state = desc["ExportStatus"]
+            r.detail["status"] = state
+            if state == "COMPLETED":
+                r.detail["item_count_exported"] = desc.get("ItemCount", 0)
+                r.detail["billed_size_bytes"] = desc.get("BilledSizeBytes", 0)
+                r.detail["export_manifest"] = desc.get("ExportManifest")
+                LOG.info("  %s: COMPLETED (%d items)", r.logical_name,
+                         r.detail["item_count_exported"])
+            elif state == "FAILED":
+                r.status = "failed"
+                r.error = (
+                    f"Export FAILED: {desc.get('FailureCode')} "
+                    f"{desc.get('FailureMessage')}"
+                )
+                LOG.error("  %s: FAILED — %s", r.logical_name, r.error)
+            else:
+                still_pending.append(r)
+        pending = still_pending
+        if pending:
+            LOG.info("  …still waiting on %d", len(pending))
+
+
+# --------------------------------------------------------------------------- #
+# S3                                                                          #
+# --------------------------------------------------------------------------- #
+def backup_s3_bucket(ctx: BackupContext, logical: str, source_bucket: str) -> ComponentResult:
+    """Mirror a source bucket into the backup bucket under s3/{logical}/."""
+    s3 = ctx.session.client("s3", config=BOTO_CONFIG)
+    paginator = s3.get_paginator("list_objects_v2")
+    obj_count = 0
+    total_bytes = 0
+    try:
+        for page in paginator.paginate(Bucket=source_bucket):
+            for o in page.get("Contents", []) or []:
+                obj_count += 1
+                total_bytes += o.get("Size", 0)
+    except ClientError as exc:
+        return ComponentResult(
+            "s3", logical, "failed",
+            detail={"source_bucket": source_bucket},
+            error=f"ListObjects: {exc.response.get('Error', {}).get('Code')}",
+        )
+
+    detail: dict[str, Any] = {
+        "source_bucket": source_bucket,
+        "object_count": obj_count,
+        "total_bytes": total_bytes,
+        "destination_prefix": f"s3://{ctx.bucket}/{ctx.root_prefix}/s3/{logical}/",
+    }
+    if ctx.dry_run:
+        return ComponentResult("s3", logical, "skipped",
+                               detail={**detail, "reason": "dry-run"})
+
+    dest_uri = f"s3://{ctx.bucket}/{ctx.root_prefix}/s3/{logical}/"
+    cmd = ["aws", "s3", "sync", f"s3://{source_bucket}/", dest_uri,
+           "--region", ctx.region, "--only-show-errors"]
+    LOG.info("Syncing %s → %s (%d objects, %.1f MiB)",
+             source_bucket, dest_uri, obj_count, total_bytes / 1024 / 1024)
+    proc = subprocess.run(cmd, capture_output=True, text=True)
+    if proc.returncode != 0:
+        return ComponentResult(
+            "s3", logical, "failed",
+            detail=detail,
+            error=f"aws s3 sync exit {proc.returncode}: {proc.stderr.strip()[:500]}",
+        )
+
+    dest_count = 0
+    for page in paginator.paginate(
+        Bucket=ctx.bucket, Prefix=f"{ctx.root_prefix}/s3/{logical}/"
+    ):
+        dest_count += len(page.get("Contents", []) or [])
+    detail["destination_object_count"] = dest_count
+    if dest_count < obj_count:
+        return ComponentResult(
+            "s3", logical, "failed",
+            detail=detail,
+            error=f"Destination count {dest_count} < source count {obj_count}",
+        )
+    return ComponentResult("s3", logical, "ok", detail=detail)
+
+
+# --------------------------------------------------------------------------- #
+# Cognito                                                                     #
+# --------------------------------------------------------------------------- #
+def backup_cognito(ctx: BackupContext, user_pool_id: str) -> list[ComponentResult]:
+    """Full Cognito dump. Each piece becomes its own ComponentResult."""
+    idp = ctx.session.client("cognito-idp", config=BOTO_CONFIG)
+    results: list[ComponentResult] = []
+
+    # 1. User pool config
+    try:
+        pool = idp.describe_user_pool(UserPoolId=user_pool_id)["UserPool"]
+        put_json(ctx, "cognito/user-pool.json", _scrub_datetimes(pool))
+        results.append(ComponentResult(
+            "cognito", "user-pool", "ok",
+            detail={"user_pool_id": user_pool_id,
+                    "estimated_users": pool.get("EstimatedNumberOfUsers")},
+        ))
+    except ClientError as exc:
+        results.append(ComponentResult(
+            "cognito", "user-pool", "failed",
+            detail={"user_pool_id": user_pool_id},
+            error=str(exc),
+        ))
+        return results  # Nothing else works without this.
+
+    # 2. Identity providers — describe each so ProviderDetails (incl.
+    # client_secret for OIDC/social) is preserved verbatim.
+    idps_out: list[dict[str, Any]] = []
+    redacted_secrets: list[str] = []
+    try:
+        names: list[str] = []
+        paginator = idp.get_paginator("list_identity_providers")
+        for page in paginator.paginate(UserPoolId=user_pool_id):
+            names.extend(p["ProviderName"] for p in page.get("Providers", []))
+        for name in names:
+            full = idp.describe_identity_provider(
+                UserPoolId=user_pool_id, ProviderName=name
+            )["IdentityProvider"]
+            details = full.get("ProviderDetails", {}) or {}
+            if full.get("ProviderType") in {"OIDC", "Google", "Facebook",
+                                            "SignInWithApple", "LoginWithAmazon"}:
+                if "client_secret" in details and not details["client_secret"]:
+                    redacted_secrets.append(name)
+            idps_out.append(_scrub_datetimes(full))
+        put_json(ctx, "cognito/identity-providers.json", {"providers": idps_out})
+        status = "ok"
+        error: str | None = None
+        if redacted_secrets:
+            status = "failed"
+            error = (f"client_secret missing for IdP(s): "
+                     f"{', '.join(redacted_secrets)} — backup would be unrestorable")
+        results.append(ComponentResult("cognito", "identity-providers",
+                                       status, detail={"count": len(idps_out)},
+                                       error=error))
+    except ClientError as exc:
+        results.append(ComponentResult(
+            "cognito", "identity-providers", "failed", detail={}, error=str(exc),
+        ))
+
+    # 3. App clients — describe each so ClientSecret is preserved.
+    clients_out: list[dict[str, Any]] = []
+    try:
+        ids: list[str] = []
+        paginator = idp.get_paginator("list_user_pool_clients")
+        for page in paginator.paginate(UserPoolId=user_pool_id):
+            ids.extend(c["ClientId"] for c in page.get("UserPoolClients", []))
+        for cid in ids:
+            full = idp.describe_user_pool_client(
+                UserPoolId=user_pool_id, ClientId=cid
+            )["UserPoolClient"]
+            clients_out.append(_scrub_datetimes(full))
+        put_json(ctx, "cognito/app-clients.json", {"clients": clients_out})
+        results.append(ComponentResult(
+            "cognito", "app-clients", "ok", detail={"count": len(clients_out)},
+        ))
+    except ClientError as exc:
+        results.append(ComponentResult(
+            "cognito", "app-clients", "failed", detail={}, error=str(exc),
+        ))
+
+    # 4. Resource servers
+    try:
+        out = []
+        paginator = idp.get_paginator("list_resource_servers")
+        for page in paginator.paginate(UserPoolId=user_pool_id, MaxResults=50):
+            out.extend(page.get("ResourceServers", []))
+        put_json(ctx, "cognito/resource-servers.json", {"resource_servers": out})
+        results.append(ComponentResult("cognito", "resource-servers", "ok",
+                                       detail={"count": len(out)}))
+    except ClientError as exc:
+        results.append(ComponentResult("cognito", "resource-servers", "failed",
+                                       detail={}, error=str(exc)))
+
+    # 5. Domain
+    try:
+        domain = pool.get("Domain") or pool.get("CustomDomain")
+        domain_blob: dict[str, Any] = {"domain": domain}
+        if domain:
+            d = idp.describe_user_pool_domain(Domain=domain)["DomainDescription"]
+            domain_blob["description"] = _scrub_datetimes(d)
+        put_json(ctx, "cognito/domain.json", domain_blob)
+        results.append(ComponentResult("cognito", "domain", "ok",
+                                       detail={"domain": domain}))
+    except ClientError as exc:
+        results.append(ComponentResult("cognito", "domain", "failed",
+                                       detail={}, error=str(exc)))
+
+    # 6. UI customization (low value; warn-only on failure)
+    try:
+        ui = idp.get_ui_customization(UserPoolId=user_pool_id)["UICustomization"]
+        put_json(ctx, "cognito/ui-customization.json", _scrub_datetimes(ui))
+        results.append(ComponentResult("cognito", "ui-customization", "ok", detail={}))
+    except ClientError as exc:
+        results.append(ComponentResult("cognito", "ui-customization", "skipped",
+                                       detail={}, error=str(exc)))
+
+    # 7. Users
+    users_count = 0
+    try:
+        with _GzS3Writer(ctx, "cognito/users.jsonl.gz") as fh:
+            paginator = idp.get_paginator("list_users")
+            for page in paginator.paginate(UserPoolId=user_pool_id):
+                for u in page.get("Users", []):
+                    fh.write(json.dumps(_scrub_datetimes(u),
+                                        separators=(",", ":")).encode())
+                    fh.write(b"\n")
+                    users_count += 1
+        results.append(ComponentResult(
+            "cognito", "users", "ok",
+            detail={"count": users_count,
+                    "note": "Password hashes are not exportable from Cognito; "
+                            "native-password users will need a reset on first login."},
+        ))
+    except ClientError as exc:
+        results.append(ComponentResult("cognito", "users", "failed",
+                                       detail={"count": users_count}, error=str(exc)))
+
+    # 8. Groups
+    groups_count = 0
+    group_names: list[str] = []
+    try:
+        with _GzS3Writer(ctx, "cognito/groups.jsonl.gz") as fh:
+            paginator = idp.get_paginator("list_groups")
+            for page in paginator.paginate(UserPoolId=user_pool_id):
+                for g in page.get("Groups", []):
+                    group_names.append(g["GroupName"])
+                    fh.write(json.dumps(_scrub_datetimes(g),
+                                        separators=(",", ":")).encode())
+                    fh.write(b"\n")
+                    groups_count += 1
+        results.append(ComponentResult("cognito", "groups", "ok",
+                                       detail={"count": groups_count}))
+    except ClientError as exc:
+        results.append(ComponentResult("cognito", "groups", "failed",
+                                       detail={"count": groups_count}, error=str(exc)))
+
+    # 9. Group memberships
+    membership_count = 0
+    try:
+        with _GzS3Writer(ctx, "cognito/group-memberships.jsonl.gz") as fh:
+            for gname in group_names:
+                paginator = idp.get_paginator("list_users_in_group")
+                for page in paginator.paginate(UserPoolId=user_pool_id, GroupName=gname):
+                    for u in page.get("Users", []):
+                        rec = {
+                            "GroupName": gname,
+                            "Username": u.get("Username"),
+                            "UserAttributes": u.get("Attributes", []),
+                        }
+                        fh.write(json.dumps(rec, separators=(",", ":")).encode())
+                        fh.write(b"\n")
+                        membership_count += 1
+        results.append(ComponentResult("cognito", "group-memberships", "ok",
+                                       detail={"count": membership_count}))
+    except ClientError as exc:
+        results.append(ComponentResult(
+            "cognito", "group-memberships", "failed",
+            detail={"count": membership_count}, error=str(exc),
+        ))
+
+    return results
+
+
+# --------------------------------------------------------------------------- #
+# AgentCore Memory                                                            #
+# --------------------------------------------------------------------------- #
+def backup_agentcore_memory(ctx: BackupContext, memory_id: str) -> ComponentResult:
+    detail: dict[str, Any] = {"memory_id": memory_id}
+    try:
+        cp = ctx.session.client("bedrock-agentcore-control", config=BOTO_CONFIG)
+    except botocore.exceptions.UnknownServiceError as exc:
+        return ComponentResult("agentcore-memory", "memory", "failed",
+                               detail=detail,
+                               error=f"boto3 lacks bedrock-agentcore-control: {exc}")
+
+    try:
+        mem = cp.get_memory(memoryId=memory_id)["memory"]
+        put_json(ctx, "agentcore-memory/memory.json", _scrub_datetimes(mem))
+        detail["status"] = mem.get("status")
+    except (ClientError, AttributeError) as exc:
+        return ComponentResult("agentcore-memory", "memory", "failed",
+                               detail=detail, error=str(exc))
+
+    try:
+        dp = ctx.session.client("bedrock-agentcore", config=BOTO_CONFIG)
+    except botocore.exceptions.UnknownServiceError as exc:
+        notes = (
+            f"boto3 lacks bedrock-agentcore data plane client ({exc}); "
+            "memory config preserved but per-actor events not exported."
+        )
+        put_text(ctx, "agentcore-memory/NOTES.md", notes)
+        return ComponentResult("agentcore-memory", "memory", "ok",
+                               detail={**detail, "actors_exported": 0,
+                                       "events_exported": 0, "note": notes})
+
+    actors_count = 0
+    events_count = 0
+    try:
+        with _GzS3Writer(ctx, "agentcore-memory/events.jsonl.gz") as fh:
+            paginator = dp.get_paginator("list_actors")
+            for page in paginator.paginate(memoryId=memory_id):
+                for actor in page.get("actorSummaries", []):
+                    actors_count += 1
+                    actor_id = actor.get("actorId")
+                    sessions_paginator = dp.get_paginator("list_sessions")
+                    for s_page in sessions_paginator.paginate(
+                        memoryId=memory_id, actorId=actor_id
+                    ):
+                        for sess in s_page.get("sessionSummaries", []):
+                            sess_id = sess.get("sessionId")
+                            ev_paginator = dp.get_paginator("list_events")
+                            for e_page in ev_paginator.paginate(
+                                memoryId=memory_id,
+                                actorId=actor_id,
+                                sessionId=sess_id,
+                            ):
+                                for ev in e_page.get("events", []):
+                                    rec = {
+                                        "actorId": actor_id,
+                                        "sessionId": sess_id,
+                                        "event": _scrub_datetimes(ev),
+                                    }
+                                    fh.write(json.dumps(rec, separators=(",", ":")).encode())
+                                    fh.write(b"\n")
+                                    events_count += 1
+    except (ClientError, KeyError, AttributeError) as exc:
+        notes = (
+            f"Partial AgentCore Memory event export: {exc}. "
+            f"Exported {actors_count} actors / {events_count} events before error."
+        )
+        put_text(ctx, "agentcore-memory/NOTES.md", notes)
+        return ComponentResult(
+            "agentcore-memory", "memory", "ok",
+            detail={**detail, "actors_exported": actors_count,
+                    "events_exported": events_count, "note": notes},
+        )
+
+    return ComponentResult(
+        "agentcore-memory", "memory", "ok",
+        detail={**detail, "actors_exported": actors_count,
+                "events_exported": events_count},
+    )
+
+
+# --------------------------------------------------------------------------- #
+# Output helpers                                                              #
+# --------------------------------------------------------------------------- #
+def put_json(ctx: BackupContext, rel_key: str, obj: Any) -> None:
+    body = json.dumps(obj, indent=2, default=str).encode()
+    _put_bytes(ctx, rel_key, body, "application/json")
+
+
+def put_text(ctx: BackupContext, rel_key: str, text: str) -> None:
+    _put_bytes(ctx, rel_key, text.encode(), "text/plain")
+
+
+def _put_bytes(ctx: BackupContext, rel_key: str, body: bytes, content_type: str) -> None:
+    if ctx.dry_run:
+        LOG.info("[dry-run] would PUT s3://%s/%s/%s (%d bytes)",
+                 ctx.bucket, ctx.root_prefix, rel_key, len(body))
+        return
+    s3 = ctx.session.client("s3", config=BOTO_CONFIG)
+    s3.put_object(
+        Bucket=ctx.bucket,
+        Key=f"{ctx.root_prefix}/{rel_key}",
+        Body=body,
+        ContentType=content_type,
+    )
+
+
+class _GzS3Writer:
+    """Buffer a gzipped JSONL file in memory and upload it to S3 in a single PUT on exit."""
+    def __init__(self, ctx: BackupContext, rel_key: str) -> None:
+        self.ctx = ctx
+        self.rel_key = rel_key
+        self.buf = io.BytesIO()
+        self.gz = gzip.GzipFile(fileobj=self.buf, mode="wb")
+
+    def write(self, data: bytes) -> None:
+        self.gz.write(data)
+
+    def __enter__(self) -> "_GzS3Writer":
+        return self
+
+    def __exit__(self, exc_type: Any, exc: Any, tb: Any) -> None:
+        self.gz.close()
+        body = self.buf.getvalue()
+        if self.ctx.dry_run:
+            LOG.info("[dry-run] would PUT %s (%d bytes gz)", self.rel_key, len(body))
+            return
+        s3 = self.ctx.session.client("s3", config=BOTO_CONFIG)
+        s3.put_object(
+            Bucket=self.ctx.bucket,
+            Key=f"{self.ctx.root_prefix}/{self.rel_key}",
+            Body=body,
+            ContentType="application/x-ndjson",
+            ContentEncoding="gzip",
+        )
+
+
+def _scrub_datetimes(obj: Any) -> Any:
+    """boto3 returns datetime objects; convert to ISO strings for JSON."""
+    if isinstance(obj, datetime):
+        return obj.astimezone(timezone.utc).isoformat()
+    if isinstance(obj, dict):
+        return {k: _scrub_datetimes(v) for k, v in obj.items()}
+    if isinstance(obj, list):
+        return [_scrub_datetimes(v) for v in obj]
+    if isinstance(obj, (bytes, bytearray)):
+        return obj.decode("utf-8", errors="replace")
+    return obj
+
+
+# --------------------------------------------------------------------------- #
+# Orchestration                                                               #
+# --------------------------------------------------------------------------- #
+def run(ctx: BackupContext) -> int:
+    LOG.info("Backup starting: project_prefix=%s region=%s bucket=%s prefix=%s",
+             ctx.project_prefix, ctx.region, ctx.bucket, ctx.root_prefix)
+    ensure_backup_bucket(ctx)
+
+    # ---- DynamoDB: discover + submit exports in parallel ----
+    dynamo_targets: list[tuple[str, str]] = []
+    for cfg in DYNAMODB_TABLES:
+        logical = cfg["logical"]
+        path = f"/{ctx.project_prefix}{cfg['ssm']}"
+        name = get_ssm_param(ctx.session, path)
+        if not name:
+            if cfg.get("optional"):
+                ctx.results.append(ComponentResult(
+                    "dynamodb", logical, "skipped",
+                    detail={"ssm_param": path, "reason": "optional, not present"},
+                ))
+            else:
+                ctx.results.append(ComponentResult(
+                    "dynamodb", logical, "failed",
+                    detail={"ssm_param": path},
+                    error="Required SSM parameter not found",
+                ))
+            continue
+        dynamo_targets.append((logical, name))
+
+    for cfg in DYNAMODB_TABLES_BY_CONVENTION:
+        dynamo_targets.append((cfg["logical"], f"{ctx.project_prefix}-{cfg['suffix']}"))
+
+    if ctx.include_ephemeral:
+        for cfg in DYNAMODB_TABLES_EPHEMERAL:
+            dynamo_targets.append((cfg["logical"], f"{ctx.project_prefix}-{cfg['suffix']}"))
+
+    with ThreadPoolExecutor(max_workers=8) as pool:
+        futures = {pool.submit(submit_dynamo_export, ctx, lg, nm): lg
+                   for lg, nm in dynamo_targets}
+        for fut in as_completed(futures):
+            ctx.results.append(fut.result())
+
+    # ---- S3 buckets (parallel) ----
+    s3_targets: list[tuple[str, str]] = []
+    for cfg in S3_BUCKETS:
+        logical = cfg["logical"]
+        path = f"/{ctx.project_prefix}{cfg['ssm']}"
+        name = get_ssm_param(ctx.session, path)
+        if not name:
+            if cfg.get("optional"):
+                ctx.results.append(ComponentResult(
+                    "s3", logical, "skipped",
+                    detail={"ssm_param": path, "reason": "optional, not present"},
+                ))
+            else:
+                ctx.results.append(ComponentResult(
+                    "s3", logical, "failed",
+                    detail={"ssm_param": path},
+                    error="Required SSM parameter not found",
+                ))
+            continue
+        s3_targets.append((logical, name))
+
+    with ThreadPoolExecutor(max_workers=4) as pool:
+        futures = {pool.submit(backup_s3_bucket, ctx, lg, nm): lg
+                   for lg, nm in s3_targets}
+        for fut in as_completed(futures):
+            ctx.results.append(fut.result())
+
+    # ---- Cognito ----
+    user_pool_id = get_ssm_param(ctx.session, f"/{ctx.project_prefix}{SSM_USER_POOL_ID}")
+    if not user_pool_id:
+        ctx.results.append(ComponentResult(
+            "cognito", "user-pool", "failed",
+            detail={"ssm_param": f"/{ctx.project_prefix}{SSM_USER_POOL_ID}"},
+            error="Required SSM parameter not found",
+        ))
+    else:
+        ctx.results.extend(backup_cognito(ctx, user_pool_id))
+
+    # ---- AgentCore Memory ----
+    memory_id = get_ssm_param(ctx.session, f"/{ctx.project_prefix}{SSM_MEMORY_ID}")
+    if not memory_id:
+        ctx.results.append(ComponentResult(
+            "agentcore-memory", "memory", "skipped",
+            detail={"ssm_param": f"/{ctx.project_prefix}{SSM_MEMORY_ID}",
+                    "reason": "not present"},
+        ))
+    else:
+        ctx.results.append(backup_agentcore_memory(ctx, memory_id))
+
+    # ---- Wait for DynamoDB exports to complete ----
+    wait_for_dynamo_exports(ctx, ctx.results)
+
+    # ---- Write manifest + summary ----
+    manifest = build_manifest(ctx)
+    put_json(ctx, "manifest.json", manifest)
+
+    summary = manifest["summary"]
+    LOG.info("Backup complete: %s", summary)
+    _write_github_summary(ctx, manifest)
+
+    if summary["failed"] > 0 and not ctx.allow_partial:
+        return 1
+    return 0
+
+
+def build_manifest(ctx: BackupContext) -> dict[str, Any]:
+    by_component: dict[str, list[dict[str, Any]]] = {}
+    counts = {"ok": 0, "skipped": 0, "failed": 0}
+    for r in ctx.results:
+        by_component.setdefault(r.component, []).append(dataclasses.asdict(r))
+        counts[r.status] = counts.get(r.status, 0) + 1
+    return {
+        "version": 1,
+        "tool": "agentcore-backup-data/1.0",
+        "project_prefix": ctx.project_prefix,
+        "region": ctx.region,
+        "account_id": ctx.account_id,
+        "timestamp": ctx.timestamp,
+        "bucket": ctx.bucket,
+        "root_prefix": ctx.root_prefix,
+        "include_ephemeral": ctx.include_ephemeral,
+        "dry_run": ctx.dry_run,
+        "summary": {
+            "total": len(ctx.results),
+            "ok": counts["ok"],
+            "skipped": counts["skipped"],
+            "failed": counts["failed"],
+        },
+        "components": by_component,
+    }
+
+
+def _write_github_summary(ctx: BackupContext, manifest: dict[str, Any]) -> None:
+    path = os.environ.get("GITHUB_STEP_SUMMARY")
+    if not path:
+        return
+    summary = manifest["summary"]
+    lines = [
+        "# Backup Summary",
+        f"- **Project prefix:** `{ctx.project_prefix}`",
+        f"- **Region:** `{ctx.region}`",
+        f"- **Bucket:** `s3://{ctx.bucket}/{ctx.root_prefix}/`",
+        f"- **Timestamp:** `{ctx.timestamp}`",
+        f"- **Totals:** {summary['ok']} ok · {summary['skipped']} skipped · {summary['failed']} failed",
+        "",
+        "| Component | Logical | Status | Detail |",
+        "|---|---|---|---|",
+    ]
+    for component_rows in manifest["components"].values():
+        for row in component_rows:
+            d = row.get("detail") or {}
+            blurb = row.get("error") or ", ".join(
+                f"{k}={v}" for k, v in d.items()
+                if k in {"item_count_estimate", "item_count_exported",
+                         "object_count", "total_bytes", "count",
+                         "actors_exported", "events_exported"}
+            )
+            lines.append(
+                f"| {row['component']} | {row['logical_name']} | {row['status']} | {blurb} |"
+            )
+    with open(path, "a", encoding="utf-8") as fh:
+        fh.write("\n".join(lines) + "\n")
+
+
+# --------------------------------------------------------------------------- #
+# CLI                                                                         #
+# --------------------------------------------------------------------------- #
+_PREFIX_RE = re.compile(r"^[a-z][a-z0-9-]{1,20}$")
+
+
+def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
+    p = argparse.ArgumentParser(
+        description="Backup all AgentCore Public Stack data for a given project prefix.",
+    )
+    p.add_argument("--project-prefix", required=True,
+                   help="The CDK_PROJECT_PREFIX of the environment to back up.")
+    p.add_argument("--region", required=True, help="AWS region.")
+    p.add_argument("--include-ephemeral", action="store_true",
+                   help="Also back up TTL-driven session/state tables.")
+    p.add_argument("--dry-run", action="store_true",
+                   help="Discover and list sources without performing any writes.")
+    p.add_argument("--allow-partial", action="store_true",
+                   help="Exit 0 even if some components failed (manifest still reflects state).")
+    p.add_argument("--bucket-override", default=None,
+                   help="Use this exact bucket name instead of computing one. "
+                        "Bucket must already exist.")
+    p.add_argument("--verbose", "-v", action="store_true")
+    return p.parse_args(argv)
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = parse_args(argv)
+    logging.basicConfig(
+        level=logging.DEBUG if args.verbose else logging.INFO,
+        format="%(asctime)s %(levelname)s %(message)s",
+    )
+
+    if not _PREFIX_RE.match(args.project_prefix):
+        LOG.error("Invalid --project-prefix '%s' (must match %s)",
+                  args.project_prefix, _PREFIX_RE.pattern)
+        return 2
+
+    session = boto3.Session(region_name=args.region)
+    sts = session.client("sts", config=BOTO_CONFIG)
+    account_id = sts.get_caller_identity()["Account"]
+
+    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    bucket = args.bucket_override or f"{args.project_prefix}-backup-{timestamp.lower()}"
+    if len(bucket) > 63:
+        LOG.error("Bucket name '%s' exceeds 63 chars; pass a shorter --bucket-override", bucket)
+        return 2
+
+    ctx = BackupContext(
+        project_prefix=args.project_prefix,
+        region=args.region,
+        timestamp=timestamp,
+        bucket=bucket,
+        root_prefix=f"{args.project_prefix}/{timestamp}",
+        account_id=account_id,
+        include_ephemeral=args.include_ephemeral,
+        dry_run=args.dry_run,
+        allow_partial=args.allow_partial,
+        session=session,
+    )
+
+    try:
+        return run(ctx)
+    except Exception:  # noqa: BLE001 — top-level catch so manifest still writes if possible
+        LOG.error("Unhandled error:\n%s", traceback.format_exc())
+        try:
+            manifest = build_manifest(ctx)
+            manifest["fatal_error"] = traceback.format_exc()
+            put_json(ctx, "manifest.json", manifest)
+        except Exception:  # noqa: BLE001
+            LOG.exception("Could not write manifest.json after fatal error")
+        return 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/scripts/backup-data/manifest_schema.json b/scripts/backup-data/manifest_schema.json
new file mode 100644
index 00000000..426617f1
--- /dev/null
+++ b/scripts/backup-data/manifest_schema.json
@@ -0,0 +1,49 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://github.com/Boise-State-Development/agentcore-public-stack/scripts/backup-data/manifest_schema.json",
+  "title": "AgentCore backup manifest",
+  "description": "Single-source-of-truth descriptor written at the root of every backup. The restore step reads this to know what is present, what failed, and how to address each artifact.",
+  "type": "object",
+  "required": ["version", "tool", "project_prefix", "region", "account_id", "timestamp", "bucket", "root_prefix", "summary", "components"],
+  "properties": {
+    "version": {"const": 1},
+    "tool": {"type": "string"},
+    "project_prefix": {"type": "string", "pattern": "^[a-z][a-z0-9-]{1,20}$"},
+    "region": {"type": "string"},
+    "account_id": {"type": "string", "pattern": "^[0-9]{12}$"},
+    "timestamp": {"type": "string", "pattern": "^[0-9]{8}T[0-9]{6}Z$"},
+    "bucket": {"type": "string"},
+    "root_prefix": {"type": "string"},
+    "include_ephemeral": {"type": "boolean"},
+    "dry_run": {"type": "boolean"},
+    "fatal_error": {"type": "string", "description": "Present only if the run terminated with an unhandled exception."},
+    "summary": {
+      "type": "object",
+      "required": ["total", "ok", "skipped", "failed"],
+      "properties": {
+        "total":   {"type": "integer", "minimum": 0},
+        "ok":      {"type": "integer", "minimum": 0},
+        "skipped": {"type": "integer", "minimum": 0},
+        "failed":  {"type": "integer", "minimum": 0}
+      }
+    },
+    "components": {
+      "type": "object",
+      "description": "Keys are component names (dynamodb, s3, cognito, agentcore-memory).",
+      "additionalProperties": {
+        "type": "array",
+        "items": {
+          "type": "object",
+          "required": ["component", "logical_name", "status"],
+          "properties": {
+            "component":    {"type": "string"},
+            "logical_name": {"type": "string"},
+            "status":       {"enum": ["ok", "skipped", "failed"]},
+            "detail":       {"type": "object"},
+            "error":        {"type": ["string", "null"]}
+          }
+        }
+      }
+    }
+  }
+}
diff --git a/scripts/backup-data/pyproject.toml b/scripts/backup-data/pyproject.toml
new file mode 100644
index 00000000..3d7f2454
--- /dev/null
+++ b/scripts/backup-data/pyproject.toml
@@ -0,0 +1,18 @@
+[project]
+name = "agentcore-backup-data"
+version = "1.0.0"
+description = "Pre-migration backup tool for AgentCore Public Stack data and users"
+requires-python = ">=3.13"
+dependencies = [
+    "boto3==1.43.9",
+    "botocore==1.43.9",
+]
+
+[dependency-groups]
+dev = [
+    "moto[dynamodb,s3,cognitoidp,ssm]==5.1.4",
+    "pytest==8.4.2",
+]
+
+[tool.uv]
+package = false
diff --git a/scripts/backup-data/uv.lock b/scripts/backup-data/uv.lock
new file mode 100644
index 00000000..702f3c2c
--- /dev/null
+++ b/scripts/backup-data/uv.lock
@@ -0,0 +1,586 @@
+version = 1
+revision = 3
+requires-python = ">=3.13"
+
+[[package]]
+name = "agentcore-backup-data"
+version = "1.0.0"
+source = { virtual = "." }
+dependencies = [
+    { name = "boto3" },
+    { name = "botocore" },
+]
+
+[package.dev-dependencies]
+dev = [
+    { name = "moto", extra = ["cognitoidp", "dynamodb", "s3", "ssm"] },
+    { name = "pytest" },
+]
+
+[package.metadata]
+requires-dist = [
+    { name = "boto3", specifier = "==1.43.9" },
+    { name = "botocore", specifier = "==1.43.9" },
+]
+
+[package.metadata.requires-dev]
+dev = [
+    { name = "moto", extras = ["dynamodb", "s3", "cognitoidp", "ssm"], specifier = "==5.1.4" },
+    { name = "pytest", specifier = "==8.4.2" },
+]
+
+[[package]]
+name = "boto3"
+version = "1.43.9"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "botocore" },
+    { name = "jmespath" },
+    { name = "s3transfer" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/b4/cc/42d798fc5305e4636170b50cdfb305ff0a81f470e35131f4a0d2641976ae/boto3-1.43.9.tar.gz", hash = "sha256:37dac72f2921095378c0200caf07918d5e10a82b7c1f611abb70e44f69d0b962", size = 113135, upload-time = "2026-05-15T19:28:31.167Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/f4/dc/51286e9551f7852a79ce5d2a57468d9d905c30d32bcace55204551db202d/boto3-1.43.9-py3-none-any.whl", hash = "sha256:5e967292d361482793471bd80fad1e714515b7401f65a0d5b4aa6ef9d009c030", size = 140523, upload-time = "2026-05-15T19:28:28.948Z" },
+]
+
+[[package]]
+name = "botocore"
+version = "1.43.9"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "jmespath" },
+    { name = "python-dateutil" },
+    { name = "urllib3" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/ca/e8/f696c80982685a4cdb3df5f0781919afa50262f40e1aac7066c9c2520deb/botocore-1.43.9.tar.gz", hash = "sha256:93e91c7160678182860f5902ee4cfe6d643cac0d9ee84d3eb65becc9f4c00228", size = 15357963, upload-time = "2026-05-15T19:28:19.342Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/77/c9/a1b51a74d476f5cb2f555ce8274f0f6b9fb21d75cc3f57b87dd0632ee17a/botocore-1.43.9-py3-none-any.whl", hash = "sha256:b9bdcd9c87fc552aad30006f00167d9ebb3480e1b06f1902bac5b2c41014fdab", size = 15039827, upload-time = "2026-05-15T19:28:14.543Z" },
+]
+
+[[package]]
+name = "certifi"
+version = "2026.5.20"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f3/ce/ee2ecad540810a79593028e88299baeae54d346cc7a0d94b6199988b89b1/certifi-2026.5.20.tar.gz", hash = "sha256:69dea482ab64caa7b9f6aba1c6bf48bb6a5448d1c0f1b17ab42ad8c763a5344d", size = 135422, upload-time = "2026-05-20T11:46:50.073Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/59/8c/57e832b7af6d7c5abe66eb3fbe3a3a32f4d11ea23a1aa7131371035be991/certifi-2026.5.20-py3-none-any.whl", hash = "sha256:3c52e209ba0a4ad7aebe60436a4ab349c39e1e602e8c134221e546902ad25897", size = 134134, upload-time = "2026-05-20T11:46:48.578Z" },
+]
+
+[[package]]
+name = "cffi"
+version = "2.0.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "pycparser", marker = "implementation_name != 'PyPy'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/eb/56/b1ba7935a17738ae8453301356628e8147c79dbb825bcbc73dc7401f9846/cffi-2.0.0.tar.gz", hash = "sha256:44d1b5909021139fe36001ae048dbdde8214afa20200eda0f64c068cac5d5529", size = 523588, upload-time = "2025-09-08T23:24:04.541Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/4b/8d/a0a47a0c9e413a658623d014e91e74a50cdd2c423f7ccfd44086ef767f90/cffi-2.0.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:00bdf7acc5f795150faa6957054fbbca2439db2f775ce831222b66f192f03beb", size = 185230, upload-time = "2025-09-08T23:23:00.879Z" },
+    { url = "https://files.pythonhosted.org/packages/4a/d2/a6c0296814556c68ee32009d9c2ad4f85f2707cdecfd7727951ec228005d/cffi-2.0.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:45d5e886156860dc35862657e1494b9bae8dfa63bf56796f2fb56e1679fc0bca", size = 181043, upload-time = "2025-09-08T23:23:02.231Z" },
+    { url = "https://files.pythonhosted.org/packages/b0/1e/d22cc63332bd59b06481ceaac49d6c507598642e2230f201649058a7e704/cffi-2.0.0-cp313-cp313-manylinux1_i686.manylinux2014_i686.manylinux_2_17_i686.manylinux_2_5_i686.whl", hash = "sha256:07b271772c100085dd28b74fa0cd81c8fb1a3ba18b21e03d7c27f3436a10606b", size = 212446, upload-time = "2025-09-08T23:23:03.472Z" },
+    { url = "https://files.pythonhosted.org/packages/a9/f5/a2c23eb03b61a0b8747f211eb716446c826ad66818ddc7810cc2cc19b3f2/cffi-2.0.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:d48a880098c96020b02d5a1f7d9251308510ce8858940e6fa99ece33f610838b", size = 220101, upload-time = "2025-09-08T23:23:04.792Z" },
+    { url = "https://files.pythonhosted.org/packages/f2/7f/e6647792fc5850d634695bc0e6ab4111ae88e89981d35ac269956605feba/cffi-2.0.0-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:f93fd8e5c8c0a4aa1f424d6173f14a892044054871c771f8566e4008eaa359d2", size = 207948, upload-time = "2025-09-08T23:23:06.127Z" },
+    { url = "https://files.pythonhosted.org/packages/cb/1e/a5a1bd6f1fb30f22573f76533de12a00bf274abcdc55c8edab639078abb6/cffi-2.0.0-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:dd4f05f54a52fb558f1ba9f528228066954fee3ebe629fc1660d874d040ae5a3", size = 206422, upload-time = "2025-09-08T23:23:07.753Z" },
+    { url = "https://files.pythonhosted.org/packages/98/df/0a1755e750013a2081e863e7cd37e0cdd02664372c754e5560099eb7aa44/cffi-2.0.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:c8d3b5532fc71b7a77c09192b4a5a200ea992702734a2e9279a37f2478236f26", size = 219499, upload-time = "2025-09-08T23:23:09.648Z" },
+    { url = "https://files.pythonhosted.org/packages/50/e1/a969e687fcf9ea58e6e2a928ad5e2dd88cc12f6f0ab477e9971f2309b57c/cffi-2.0.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:d9b29c1f0ae438d5ee9acb31cadee00a58c46cc9c0b2f9038c6b0b3470877a8c", size = 222928, upload-time = "2025-09-08T23:23:10.928Z" },
+    { url = "https://files.pythonhosted.org/packages/36/54/0362578dd2c9e557a28ac77698ed67323ed5b9775ca9d3fe73fe191bb5d8/cffi-2.0.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:6d50360be4546678fc1b79ffe7a66265e28667840010348dd69a314145807a1b", size = 221302, upload-time = "2025-09-08T23:23:12.42Z" },
+    { url = "https://files.pythonhosted.org/packages/eb/6d/bf9bda840d5f1dfdbf0feca87fbdb64a918a69bca42cfa0ba7b137c48cb8/cffi-2.0.0-cp313-cp313-win32.whl", hash = "sha256:74a03b9698e198d47562765773b4a8309919089150a0bb17d829ad7b44b60d27", size = 172909, upload-time = "2025-09-08T23:23:14.32Z" },
+    { url = "https://files.pythonhosted.org/packages/37/18/6519e1ee6f5a1e579e04b9ddb6f1676c17368a7aba48299c3759bbc3c8b3/cffi-2.0.0-cp313-cp313-win_amd64.whl", hash = "sha256:19f705ada2530c1167abacb171925dd886168931e0a7b78f5bffcae5c6b5be75", size = 183402, upload-time = "2025-09-08T23:23:15.535Z" },
+    { url = "https://files.pythonhosted.org/packages/cb/0e/02ceeec9a7d6ee63bb596121c2c8e9b3a9e150936f4fbef6ca1943e6137c/cffi-2.0.0-cp313-cp313-win_arm64.whl", hash = "sha256:256f80b80ca3853f90c21b23ee78cd008713787b1b1e93eae9f3d6a7134abd91", size = 177780, upload-time = "2025-09-08T23:23:16.761Z" },
+    { url = "https://files.pythonhosted.org/packages/92/c4/3ce07396253a83250ee98564f8d7e9789fab8e58858f35d07a9a2c78de9f/cffi-2.0.0-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:fc33c5141b55ed366cfaad382df24fe7dcbc686de5be719b207bb248e3053dc5", size = 185320, upload-time = "2025-09-08T23:23:18.087Z" },
+    { url = "https://files.pythonhosted.org/packages/59/dd/27e9fa567a23931c838c6b02d0764611c62290062a6d4e8ff7863daf9730/cffi-2.0.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c654de545946e0db659b3400168c9ad31b5d29593291482c43e3564effbcee13", size = 181487, upload-time = "2025-09-08T23:23:19.622Z" },
+    { url = "https://files.pythonhosted.org/packages/d6/43/0e822876f87ea8a4ef95442c3d766a06a51fc5298823f884ef87aaad168c/cffi-2.0.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:24b6f81f1983e6df8db3adc38562c83f7d4a0c36162885ec7f7b77c7dcbec97b", size = 220049, upload-time = "2025-09-08T23:23:20.853Z" },
+    { url = "https://files.pythonhosted.org/packages/b4/89/76799151d9c2d2d1ead63c2429da9ea9d7aac304603de0c6e8764e6e8e70/cffi-2.0.0-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:12873ca6cb9b0f0d3a0da705d6086fe911591737a59f28b7936bdfed27c0d47c", size = 207793, upload-time = "2025-09-08T23:23:22.08Z" },
+    { url = "https://files.pythonhosted.org/packages/bb/dd/3465b14bb9e24ee24cb88c9e3730f6de63111fffe513492bf8c808a3547e/cffi-2.0.0-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:d9b97165e8aed9272a6bb17c01e3cc5871a594a446ebedc996e2397a1c1ea8ef", size = 206300, upload-time = "2025-09-08T23:23:23.314Z" },
+    { url = "https://files.pythonhosted.org/packages/47/d9/d83e293854571c877a92da46fdec39158f8d7e68da75bf73581225d28e90/cffi-2.0.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:afb8db5439b81cf9c9d0c80404b60c3cc9c3add93e114dcae767f1477cb53775", size = 219244, upload-time = "2025-09-08T23:23:24.541Z" },
+    { url = "https://files.pythonhosted.org/packages/2b/0f/1f177e3683aead2bb00f7679a16451d302c436b5cbf2505f0ea8146ef59e/cffi-2.0.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:737fe7d37e1a1bffe70bd5754ea763a62a066dc5913ca57e957824b72a85e205", size = 222828, upload-time = "2025-09-08T23:23:26.143Z" },
+    { url = "https://files.pythonhosted.org/packages/c6/0f/cafacebd4b040e3119dcb32fed8bdef8dfe94da653155f9d0b9dc660166e/cffi-2.0.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:38100abb9d1b1435bc4cc340bb4489635dc2f0da7456590877030c9b3d40b0c1", size = 220926, upload-time = "2025-09-08T23:23:27.873Z" },
+    { url = "https://files.pythonhosted.org/packages/3e/aa/df335faa45b395396fcbc03de2dfcab242cd61a9900e914fe682a59170b1/cffi-2.0.0-cp314-cp314-win32.whl", hash = "sha256:087067fa8953339c723661eda6b54bc98c5625757ea62e95eb4898ad5e776e9f", size = 175328, upload-time = "2025-09-08T23:23:44.61Z" },
+    { url = "https://files.pythonhosted.org/packages/bb/92/882c2d30831744296ce713f0feb4c1cd30f346ef747b530b5318715cc367/cffi-2.0.0-cp314-cp314-win_amd64.whl", hash = "sha256:203a48d1fb583fc7d78a4c6655692963b860a417c0528492a6bc21f1aaefab25", size = 185650, upload-time = "2025-09-08T23:23:45.848Z" },
+    { url = "https://files.pythonhosted.org/packages/9f/2c/98ece204b9d35a7366b5b2c6539c350313ca13932143e79dc133ba757104/cffi-2.0.0-cp314-cp314-win_arm64.whl", hash = "sha256:dbd5c7a25a7cb98f5ca55d258b103a2054f859a46ae11aaf23134f9cc0d356ad", size = 180687, upload-time = "2025-09-08T23:23:47.105Z" },
+    { url = "https://files.pythonhosted.org/packages/3e/61/c768e4d548bfa607abcda77423448df8c471f25dbe64fb2ef6d555eae006/cffi-2.0.0-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:9a67fc9e8eb39039280526379fb3a70023d77caec1852002b4da7e8b270c4dd9", size = 188773, upload-time = "2025-09-08T23:23:29.347Z" },
+    { url = "https://files.pythonhosted.org/packages/2c/ea/5f76bce7cf6fcd0ab1a1058b5af899bfbef198bea4d5686da88471ea0336/cffi-2.0.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:7a66c7204d8869299919db4d5069a82f1561581af12b11b3c9f48c584eb8743d", size = 185013, upload-time = "2025-09-08T23:23:30.63Z" },
+    { url = "https://files.pythonhosted.org/packages/be/b4/c56878d0d1755cf9caa54ba71e5d049479c52f9e4afc230f06822162ab2f/cffi-2.0.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:7cc09976e8b56f8cebd752f7113ad07752461f48a58cbba644139015ac24954c", size = 221593, upload-time = "2025-09-08T23:23:31.91Z" },
+    { url = "https://files.pythonhosted.org/packages/e0/0d/eb704606dfe8033e7128df5e90fee946bbcb64a04fcdaa97321309004000/cffi-2.0.0-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:92b68146a71df78564e4ef48af17551a5ddd142e5190cdf2c5624d0c3ff5b2e8", size = 209354, upload-time = "2025-09-08T23:23:33.214Z" },
+    { url = "https://files.pythonhosted.org/packages/d8/19/3c435d727b368ca475fb8742ab97c9cb13a0de600ce86f62eab7fa3eea60/cffi-2.0.0-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:b1e74d11748e7e98e2f426ab176d4ed720a64412b6a15054378afdb71e0f37dc", size = 208480, upload-time = "2025-09-08T23:23:34.495Z" },
+    { url = "https://files.pythonhosted.org/packages/d0/44/681604464ed9541673e486521497406fadcc15b5217c3e326b061696899a/cffi-2.0.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:28a3a209b96630bca57cce802da70c266eb08c6e97e5afd61a75611ee6c64592", size = 221584, upload-time = "2025-09-08T23:23:36.096Z" },
+    { url = "https://files.pythonhosted.org/packages/25/8e/342a504ff018a2825d395d44d63a767dd8ebc927ebda557fecdaca3ac33a/cffi-2.0.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:7553fb2090d71822f02c629afe6042c299edf91ba1bf94951165613553984512", size = 224443, upload-time = "2025-09-08T23:23:37.328Z" },
+    { url = "https://files.pythonhosted.org/packages/e1/5e/b666bacbbc60fbf415ba9988324a132c9a7a0448a9a8f125074671c0f2c3/cffi-2.0.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:6c6c373cfc5c83a975506110d17457138c8c63016b563cc9ed6e056a82f13ce4", size = 223437, upload-time = "2025-09-08T23:23:38.945Z" },
+    { url = "https://files.pythonhosted.org/packages/a0/1d/ec1a60bd1a10daa292d3cd6bb0b359a81607154fb8165f3ec95fe003b85c/cffi-2.0.0-cp314-cp314t-win32.whl", hash = "sha256:1fc9ea04857caf665289b7a75923f2c6ed559b8298a1b8c49e59f7dd95c8481e", size = 180487, upload-time = "2025-09-08T23:23:40.423Z" },
+    { url = "https://files.pythonhosted.org/packages/bf/41/4c1168c74fac325c0c8156f04b6749c8b6a8f405bbf91413ba088359f60d/cffi-2.0.0-cp314-cp314t-win_amd64.whl", hash = "sha256:d68b6cef7827e8641e8ef16f4494edda8b36104d79773a334beaa1e3521430f6", size = 191726, upload-time = "2025-09-08T23:23:41.742Z" },
+    { url = "https://files.pythonhosted.org/packages/ae/3a/dbeec9d1ee0844c679f6bb5d6ad4e9f198b1224f4e7a32825f47f6192b0c/cffi-2.0.0-cp314-cp314t-win_arm64.whl", hash = "sha256:0a1527a803f0a659de1af2e1fd700213caba79377e27e4693648c2923da066f9", size = 184195, upload-time = "2025-09-08T23:23:43.004Z" },
+]
+
+[[package]]
+name = "charset-normalizer"
+version = "3.4.7"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/e7/a1/67fe25fac3c7642725500a3f6cfe5821ad557c3abb11c9d20d12c7008d3e/charset_normalizer-3.4.7.tar.gz", hash = "sha256:ae89db9e5f98a11a4bf50407d4363e7b09b31e55bc117b4f7d80aab97ba009e5", size = 144271, upload-time = "2026-04-02T09:28:39.342Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/c1/3b/66777e39d3ae1ddc77ee606be4ec6d8cbd4c801f65e5a1b6f2b11b8346dd/charset_normalizer-3.4.7-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:f496c9c3cc02230093d8330875c4c3cdfc3b73612a5fd921c65d39cbcef08063", size = 309627, upload-time = "2026-04-02T09:26:45.198Z" },
+    { url = "https://files.pythonhosted.org/packages/2e/4e/b7f84e617b4854ade48a1b7915c8ccfadeba444d2a18c291f696e37f0d3b/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0ea948db76d31190bf08bd371623927ee1339d5f2a0b4b1b4a4439a65298703c", size = 207008, upload-time = "2026-04-02T09:26:46.824Z" },
+    { url = "https://files.pythonhosted.org/packages/c4/bb/ec73c0257c9e11b268f018f068f5d00aa0ef8c8b09f7753ebd5f2880e248/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:a277ab8928b9f299723bc1a2dabb1265911b1a76341f90a510368ca44ad9ab66", size = 228303, upload-time = "2026-04-02T09:26:48.397Z" },
+    { url = "https://files.pythonhosted.org/packages/85/fb/32d1f5033484494619f701e719429c69b766bfc4dbc61aa9e9c8c166528b/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:3bec022aec2c514d9cf199522a802bd007cd588ab17ab2525f20f9c34d067c18", size = 224282, upload-time = "2026-04-02T09:26:49.684Z" },
+    { url = "https://files.pythonhosted.org/packages/fa/07/330e3a0dda4c404d6da83b327270906e9654a24f6c546dc886a0eb0ffb23/charset_normalizer-3.4.7-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e044c39e41b92c845bc815e5ae4230804e8e7bc29e399b0437d64222d92809dd", size = 215595, upload-time = "2026-04-02T09:26:50.915Z" },
+    { url = "https://files.pythonhosted.org/packages/e3/7c/fc890655786e423f02556e0216d4b8c6bcb6bdfa890160dc66bf52dee468/charset_normalizer-3.4.7-cp313-cp313-manylinux_2_31_armv7l.whl", hash = "sha256:f495a1652cf3fbab2eb0639776dad966c2fb874d79d87ca07f9d5f059b8bd215", size = 201986, upload-time = "2026-04-02T09:26:52.197Z" },
+    { url = "https://files.pythonhosted.org/packages/d8/97/bfb18b3db2aed3b90cf54dc292ad79fdd5ad65c4eae454099475cbeadd0d/charset_normalizer-3.4.7-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:e712b419df8ba5e42b226c510472b37bd57b38e897d3eca5e8cfd410a29fa859", size = 211711, upload-time = "2026-04-02T09:26:53.49Z" },
+    { url = "https://files.pythonhosted.org/packages/6f/a5/a581c13798546a7fd557c82614a5c65a13df2157e9ad6373166d2a3e645d/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:7804338df6fcc08105c7745f1502ba68d900f45fd770d5bdd5288ddccb8a42d8", size = 210036, upload-time = "2026-04-02T09:26:54.975Z" },
+    { url = "https://files.pythonhosted.org/packages/8c/bf/b3ab5bcb478e4193d517644b0fb2bf5497fbceeaa7a1bc0f4d5b50953861/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:481551899c856c704d58119b5025793fa6730adda3571971af568f66d2424bb5", size = 202998, upload-time = "2026-04-02T09:26:56.303Z" },
+    { url = "https://files.pythonhosted.org/packages/e7/4e/23efd79b65d314fa320ec6017b4b5834d5c12a58ba4610aa353af2e2f577/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:f59099f9b66f0d7145115e6f80dd8b1d847176df89b234a5a6b3f00437aa0832", size = 230056, upload-time = "2026-04-02T09:26:57.554Z" },
+    { url = "https://files.pythonhosted.org/packages/b9/9f/1e1941bc3f0e01df116e68dc37a55c4d249df5e6fa77f008841aef68264f/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:f59ad4c0e8f6bba240a9bb85504faa1ab438237199d4cce5f622761507b8f6a6", size = 211537, upload-time = "2026-04-02T09:26:58.843Z" },
+    { url = "https://files.pythonhosted.org/packages/80/0f/088cbb3020d44428964a6c97fe1edfb1b9550396bf6d278330281e8b709c/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:3dedcc22d73ec993f42055eff4fcfed9318d1eeb9a6606c55892a26964964e48", size = 226176, upload-time = "2026-04-02T09:27:00.437Z" },
+    { url = "https://files.pythonhosted.org/packages/6a/9f/130394f9bbe06f4f63e22641d32fc9b202b7e251c9aef4db044324dac493/charset_normalizer-3.4.7-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:64f02c6841d7d83f832cd97ccf8eb8a906d06eb95d5276069175c696b024b60a", size = 217723, upload-time = "2026-04-02T09:27:02.021Z" },
+    { url = "https://files.pythonhosted.org/packages/73/55/c469897448a06e49f8fa03f6caae97074fde823f432a98f979cc42b90e69/charset_normalizer-3.4.7-cp313-cp313-win32.whl", hash = "sha256:4042d5c8f957e15221d423ba781e85d553722fc4113f523f2feb7b188cc34c5e", size = 148085, upload-time = "2026-04-02T09:27:03.192Z" },
+    { url = "https://files.pythonhosted.org/packages/5d/78/1b74c5bbb3f99b77a1715c91b3e0b5bdb6fe302d95ace4f5b1bec37b0167/charset_normalizer-3.4.7-cp313-cp313-win_amd64.whl", hash = "sha256:3946fa46a0cf3e4c8cb1cc52f56bb536310d34f25f01ca9b6c16afa767dab110", size = 158819, upload-time = "2026-04-02T09:27:04.454Z" },
+    { url = "https://files.pythonhosted.org/packages/68/86/46bd42279d323deb8687c4a5a811fd548cb7d1de10cf6535d099877a9a9f/charset_normalizer-3.4.7-cp313-cp313-win_arm64.whl", hash = "sha256:80d04837f55fc81da168b98de4f4b797ef007fc8a79ab71c6ec9bc4dd662b15b", size = 147915, upload-time = "2026-04-02T09:27:05.971Z" },
+    { url = "https://files.pythonhosted.org/packages/97/c8/c67cb8c70e19ef1960b97b22ed2a1567711de46c4ddf19799923adc836c2/charset_normalizer-3.4.7-cp314-cp314-macosx_10_15_universal2.whl", hash = "sha256:c36c333c39be2dbca264d7803333c896ab8fa7d4d6f0ab7edb7dfd7aea6e98c0", size = 309234, upload-time = "2026-04-02T09:27:07.194Z" },
+    { url = "https://files.pythonhosted.org/packages/99/85/c091fdee33f20de70d6c8b522743b6f831a2f1cd3ff86de4c6a827c48a76/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1c2aed2e5e41f24ea8ef1590b8e848a79b56f3a5564a65ceec43c9d692dc7d8a", size = 208042, upload-time = "2026-04-02T09:27:08.749Z" },
+    { url = "https://files.pythonhosted.org/packages/87/1c/ab2ce611b984d2fd5d86a5a8a19c1ae26acac6bad967da4967562c75114d/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:54523e136b8948060c0fa0bc7b1b50c32c186f2fceee897a495406bb6e311d2b", size = 228706, upload-time = "2026-04-02T09:27:09.951Z" },
+    { url = "https://files.pythonhosted.org/packages/a8/29/2b1d2cb00bf085f59d29eb773ce58ec2d325430f8c216804a0a5cd83cbca/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:715479b9a2802ecac752a3b0efa2b0b60285cf962ee38414211abdfccc233b41", size = 224727, upload-time = "2026-04-02T09:27:11.175Z" },
+    { url = "https://files.pythonhosted.org/packages/47/5c/032c2d5a07fe4d4855fea851209cca2b6f03ebeb6d4e3afdb3358386a684/charset_normalizer-3.4.7-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bd6c2a1c7573c64738d716488d2cdd3c00e340e4835707d8fdb8dc1a66ef164e", size = 215882, upload-time = "2026-04-02T09:27:12.446Z" },
+    { url = "https://files.pythonhosted.org/packages/2c/c2/356065d5a8b78ed04499cae5f339f091946a6a74f91e03476c33f0ab7100/charset_normalizer-3.4.7-cp314-cp314-manylinux_2_31_armv7l.whl", hash = "sha256:c45e9440fb78f8ddabcf714b68f936737a121355bf59f3907f4e17721b9d1aae", size = 200860, upload-time = "2026-04-02T09:27:13.721Z" },
+    { url = "https://files.pythonhosted.org/packages/0c/cd/a32a84217ced5039f53b29f460962abb2d4420def55afabe45b1c3c7483d/charset_normalizer-3.4.7-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:3534e7dcbdcf757da6b85a0bbf5b6868786d5982dd959b065e65481644817a18", size = 211564, upload-time = "2026-04-02T09:27:15.272Z" },
+    { url = "https://files.pythonhosted.org/packages/44/86/58e6f13ce26cc3b8f4a36b94a0f22ae2f00a72534520f4ae6857c4b81f89/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:e8ac484bf18ce6975760921bb6148041faa8fef0547200386ea0b52b5d27bf7b", size = 211276, upload-time = "2026-04-02T09:27:16.834Z" },
+    { url = "https://files.pythonhosted.org/packages/8f/fe/d17c32dc72e17e155e06883efa84514ca375f8a528ba2546bee73fc4df81/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:a5fe03b42827c13cdccd08e6c0247b6a6d4b5e3cdc53fd1749f5896adcdc2356", size = 201238, upload-time = "2026-04-02T09:27:18.229Z" },
+    { url = "https://files.pythonhosted.org/packages/6a/29/f33daa50b06525a237451cdb6c69da366c381a3dadcd833fa5676bc468b3/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:2d6eb928e13016cea4f1f21d1e10c1cebd5a421bc57ddf5b1142ae3f86824fab", size = 230189, upload-time = "2026-04-02T09:27:19.445Z" },
+    { url = "https://files.pythonhosted.org/packages/b6/6e/52c84015394a6a0bdcd435210a7e944c5f94ea1055f5cc5d56c5fe368e7b/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:e74327fb75de8986940def6e8dee4f127cc9752bee7355bb323cc5b2659b6d46", size = 211352, upload-time = "2026-04-02T09:27:20.79Z" },
+    { url = "https://files.pythonhosted.org/packages/8c/d7/4353be581b373033fb9198bf1da3cf8f09c1082561e8e922aa7b39bf9fe8/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:d6038d37043bced98a66e68d3aa2b6a35505dc01328cd65217cefe82f25def44", size = 227024, upload-time = "2026-04-02T09:27:22.063Z" },
+    { url = "https://files.pythonhosted.org/packages/30/45/99d18aa925bd1740098ccd3060e238e21115fffbfdcb8f3ece837d0ace6c/charset_normalizer-3.4.7-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:7579e913a5339fb8fa133f6bbcfd8e6749696206cf05acdbdca71a1b436d8e72", size = 217869, upload-time = "2026-04-02T09:27:23.486Z" },
+    { url = "https://files.pythonhosted.org/packages/5c/05/5ee478aa53f4bb7996482153d4bfe1b89e0f087f0ab6b294fcf92d595873/charset_normalizer-3.4.7-cp314-cp314-win32.whl", hash = "sha256:5b77459df20e08151cd6f8b9ef8ef1f961ef73d85c21a555c7eed5b79410ec10", size = 148541, upload-time = "2026-04-02T09:27:25.146Z" },
+    { url = "https://files.pythonhosted.org/packages/48/77/72dcb0921b2ce86420b2d79d454c7022bf5be40202a2a07906b9f2a35c97/charset_normalizer-3.4.7-cp314-cp314-win_amd64.whl", hash = "sha256:92a0a01ead5e668468e952e4238cccd7c537364eb7d851ab144ab6627dbbe12f", size = 159634, upload-time = "2026-04-02T09:27:26.642Z" },
+    { url = "https://files.pythonhosted.org/packages/c6/a3/c2369911cd72f02386e4e340770f6e158c7980267da16af8f668217abaa0/charset_normalizer-3.4.7-cp314-cp314-win_arm64.whl", hash = "sha256:67f6279d125ca0046a7fd386d01b311c6363844deac3e5b069b514ba3e63c246", size = 148384, upload-time = "2026-04-02T09:27:28.271Z" },
+    { url = "https://files.pythonhosted.org/packages/94/09/7e8a7f73d24dba1f0035fbbf014d2c36828fc1bf9c88f84093e57d315935/charset_normalizer-3.4.7-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:effc3f449787117233702311a1b7d8f59cba9ced946ba727bdc329ec69028e24", size = 330133, upload-time = "2026-04-02T09:27:29.474Z" },
+    { url = "https://files.pythonhosted.org/packages/8d/da/96975ddb11f8e977f706f45cddd8540fd8242f71ecdb5d18a80723dcf62c/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fbccdc05410c9ee21bbf16a35f4c1d16123dcdeb8a1d38f33654fa21d0234f79", size = 216257, upload-time = "2026-04-02T09:27:30.793Z" },
+    { url = "https://files.pythonhosted.org/packages/e5/e8/1d63bf8ef2d388e95c64b2098f45f84758f6d102a087552da1485912637b/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:733784b6d6def852c814bce5f318d25da2ee65dd4839a0718641c696e09a2960", size = 234851, upload-time = "2026-04-02T09:27:32.44Z" },
+    { url = "https://files.pythonhosted.org/packages/9b/40/e5ff04233e70da2681fa43969ad6f66ca5611d7e669be0246c4c7aaf6dc8/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:a89c23ef8d2c6b27fd200a42aa4ac72786e7c60d40efdc76e6011260b6e949c4", size = 233393, upload-time = "2026-04-02T09:27:34.03Z" },
+    { url = "https://files.pythonhosted.org/packages/be/c1/06c6c49d5a5450f76899992f1ee40b41d076aee9279b49cf9974d2f313d5/charset_normalizer-3.4.7-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6c114670c45346afedc0d947faf3c7f701051d2518b943679c8ff88befe14f8e", size = 223251, upload-time = "2026-04-02T09:27:35.369Z" },
+    { url = "https://files.pythonhosted.org/packages/2b/9f/f2ff16fb050946169e3e1f82134d107e5d4ae72647ec8a1b1446c148480f/charset_normalizer-3.4.7-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:a180c5e59792af262bf263b21a3c49353f25945d8d9f70628e73de370d55e1e1", size = 206609, upload-time = "2026-04-02T09:27:36.661Z" },
+    { url = "https://files.pythonhosted.org/packages/69/d5/a527c0cd8d64d2eab7459784fb4169a0ac76e5a6fc5237337982fd61347e/charset_normalizer-3.4.7-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:3c9a494bc5ec77d43cea229c4f6db1e4d8fe7e1bbffa8b6f0f0032430ff8ab44", size = 220014, upload-time = "2026-04-02T09:27:38.019Z" },
+    { url = "https://files.pythonhosted.org/packages/7e/80/8a7b8104a3e203074dc9aa2c613d4b726c0e136bad1cc734594b02867972/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:8d828b6667a32a728a1ad1d93957cdf37489c57b97ae6c4de2860fa749b8fc1e", size = 218979, upload-time = "2026-04-02T09:27:39.37Z" },
+    { url = "https://files.pythonhosted.org/packages/02/9a/b759b503d507f375b2b5c153e4d2ee0a75aa215b7f2489cf314f4541f2c0/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:cf1493cd8607bec4d8a7b9b004e699fcf8f9103a9284cc94962cb73d20f9d4a3", size = 209238, upload-time = "2026-04-02T09:27:40.722Z" },
+    { url = "https://files.pythonhosted.org/packages/c2/4e/0f3f5d47b86bdb79256e7290b26ac847a2832d9a4033f7eb2cd4bcf4bb5b/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:0c96c3b819b5c3e9e165495db84d41914d6894d55181d2d108cc1a69bfc9cce0", size = 236110, upload-time = "2026-04-02T09:27:42.33Z" },
+    { url = "https://files.pythonhosted.org/packages/96/23/bce28734eb3ed2c91dcf93abeb8a5cf393a7b2749725030bb630e554fdd8/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:752a45dc4a6934060b3b0dab47e04edc3326575f82be64bc4fc293914566503e", size = 219824, upload-time = "2026-04-02T09:27:43.924Z" },
+    { url = "https://files.pythonhosted.org/packages/2c/6f/6e897c6984cc4d41af319b077f2f600fc8214eb2fe2d6bcb79141b882400/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:8778f0c7a52e56f75d12dae53ae320fae900a8b9b4164b981b9c5ce059cd1fcb", size = 233103, upload-time = "2026-04-02T09:27:45.348Z" },
+    { url = "https://files.pythonhosted.org/packages/76/22/ef7bd0fe480a0ae9b656189ec00744b60933f68b4f42a7bb06589f6f576a/charset_normalizer-3.4.7-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:ce3412fbe1e31eb81ea42f4169ed94861c56e643189e1e75f0041f3fe7020abe", size = 225194, upload-time = "2026-04-02T09:27:46.706Z" },
+    { url = "https://files.pythonhosted.org/packages/c5/a7/0e0ab3e0b5bc1219bd80a6a0d4d72ca74d9250cb2382b7c699c147e06017/charset_normalizer-3.4.7-cp314-cp314t-win32.whl", hash = "sha256:c03a41a8784091e67a39648f70c5f97b5b6a37f216896d44d2cdcb82615339a0", size = 159827, upload-time = "2026-04-02T09:27:48.053Z" },
+    { url = "https://files.pythonhosted.org/packages/7a/1d/29d32e0fb40864b1f878c7f5a0b343ae676c6e2b271a2d55cc3a152391da/charset_normalizer-3.4.7-cp314-cp314t-win_amd64.whl", hash = "sha256:03853ed82eeebbce3c2abfdbc98c96dc205f32a79627688ac9a27370ea61a49c", size = 174168, upload-time = "2026-04-02T09:27:49.795Z" },
+    { url = "https://files.pythonhosted.org/packages/de/32/d92444ad05c7a6e41fb2036749777c163baf7a0301a040cb672d6b2b1ae9/charset_normalizer-3.4.7-cp314-cp314t-win_arm64.whl", hash = "sha256:c35abb8bfff0185efac5878da64c45dafd2b37fb0383add1be155a763c1f083d", size = 153018, upload-time = "2026-04-02T09:27:51.116Z" },
+    { url = "https://files.pythonhosted.org/packages/db/8f/61959034484a4a7c527811f4721e75d02d653a35afb0b6054474d8185d4c/charset_normalizer-3.4.7-py3-none-any.whl", hash = "sha256:3dce51d0f5e7951f8bb4900c257dad282f49190fdbebecd4ba99bcc41fef404d", size = 61958, upload-time = "2026-04-02T09:28:37.794Z" },
+]
+
+[[package]]
+name = "colorama"
+version = "0.4.6"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697, upload-time = "2022-10-25T02:36:22.414Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" },
+]
+
+[[package]]
+name = "cryptography"
+version = "48.0.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "cffi", marker = "platform_python_implementation != 'PyPy'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9f/a9/db8f313fdcd85d767d4973515e1db101f9c71f95fced83233de224673757/cryptography-48.0.0.tar.gz", hash = "sha256:5c3932f4436d1cccb036cb0eaef46e6e2db91035166f1ad6505c3c9d5a635920", size = 832984, upload-time = "2026-05-04T22:59:38.133Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/df/3d/01f6dd9190170a5a241e0e98c2d04be3664a9e6f5b9b872cde63aff1c3dd/cryptography-48.0.0-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:0c558d2cdffd8f4bbb30fc7134c74d2ca9a476f830bb053074498fbc86f41ed6", size = 8001587, upload-time = "2026-05-04T22:57:36.803Z" },
+    { url = "https://files.pythonhosted.org/packages/b2/6e/e90527eef33f309beb811cf7c982c3aeffcce8e3edb178baa4ca3ae4a6fa/cryptography-48.0.0-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:f5333311663ea94f75dd408665686aaf426563556bb5283554a3539177e03b8c", size = 4690433, upload-time = "2026-05-04T22:57:40.373Z" },
+    { url = "https://files.pythonhosted.org/packages/90/04/673510ed51ddff56575f306cf1617d80411ee76831ccd3097599140efdfe/cryptography-48.0.0-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7995ef305d7165c3f11ae07f2517e5a4f1d5c18da1376a0a9ed496336b69e5f3", size = 4710620, upload-time = "2026-05-04T22:57:42.935Z" },
+    { url = "https://files.pythonhosted.org/packages/14/d5/e9c4ef932c8d800490c34d8bd589d64a31d5890e27ec9e9ad532be893294/cryptography-48.0.0-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:40ba1f85eaa6959837b1d51c9767e230e14612eea4ef110ee8854ada22da1bf5", size = 4696283, upload-time = "2026-05-04T22:57:45.294Z" },
+    { url = "https://files.pythonhosted.org/packages/0c/29/174b9dfb60b12d59ecfc6cfa04bc88c21b42a54f01b8aae09bb6e51e4c7f/cryptography-48.0.0-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:369a6348999f94bbd53435c894377b20ab95f25a9065c283570e70150d8abc3c", size = 5296573, upload-time = "2026-05-04T22:57:47.933Z" },
+    { url = "https://files.pythonhosted.org/packages/95/38/0d29a6fd7d0d1373f0c0c88a04ba20e359b257753ac497564cd660fc1d55/cryptography-48.0.0-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:a0e692c683f4df67815a2d258b324e66f4738bd7a96a218c826dce4f4bd05d8f", size = 4743677, upload-time = "2026-05-04T22:57:50.067Z" },
+    { url = "https://files.pythonhosted.org/packages/30/be/eef653013d5c63b6a490529e0316f9ac14a37602965d4903efed1399f32b/cryptography-48.0.0-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:18349bbc56f4743c8b12dc32e2bccb2cf83ee8b69a3bba74ef8ae857e26b3d25", size = 4330808, upload-time = "2026-05-04T22:57:52.301Z" },
+    { url = "https://files.pythonhosted.org/packages/84/9e/500463e87abb7a0a0f9f256ec21123ecde0a7b5541a15e840ea54551fd81/cryptography-48.0.0-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:7e8eac43dfca5c4cccc6dad9a80504436fca53bb9bc3100a2386d730fbe6b602", size = 4695941, upload-time = "2026-05-04T22:57:54.603Z" },
+    { url = "https://files.pythonhosted.org/packages/e3/dc/7303087450c2ec9e7fbb750e17c2abfbc658f23cbd0e54009509b7cc4091/cryptography-48.0.0-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:9ccdac7d40688ecb5a3b4a604b8a88c8002e3442d6c60aead1db2a89a041560c", size = 5252579, upload-time = "2026-05-04T22:57:57.207Z" },
+    { url = "https://files.pythonhosted.org/packages/d0/c0/7101d3b7215edcdc90c45da544961fd8ed2d6448f77577460fa75a8443f7/cryptography-48.0.0-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:bd72e68b06bb1e96913f97dd4901119bc17f39d4586a5adf2d3e47bc2b9d58b5", size = 4743326, upload-time = "2026-05-04T22:57:59.535Z" },
+    { url = "https://files.pythonhosted.org/packages/ac/d8/5b833bad13016f562ab9d063d68199a4bd121d18458e439515601d3357ec/cryptography-48.0.0-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:59baa2cb386c4f0b9905bd6eb4c2a79a69a128408fd31d32ca4d7102d4156321", size = 4826672, upload-time = "2026-05-04T22:58:01.996Z" },
+    { url = "https://files.pythonhosted.org/packages/98/e1/7074eb8bf3c135558c73fc2bcf0f5633f912e6fb87e868a55c454080ef09/cryptography-48.0.0-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:9249e3cd978541d665967ac2cb2787fd6a62bddf1e75b3e347a594d7dacf4f74", size = 4972574, upload-time = "2026-05-04T22:58:03.968Z" },
+    { url = "https://files.pythonhosted.org/packages/04/70/e5a1b41d325f797f39427aa44ef8baf0be500065ab6d8e10369d850d4a4f/cryptography-48.0.0-cp311-abi3-win32.whl", hash = "sha256:9c459db21422be75e2809370b829a87eb37f74cd785fc4aa9ea1e5f43b47cda4", size = 3294868, upload-time = "2026-05-04T22:58:06.467Z" },
+    { url = "https://files.pythonhosted.org/packages/f4/ac/8ac51b4a5fc5932eb7ee5c517ba7dc8cd834f0048962b6b352f00f41ebf9/cryptography-48.0.0-cp311-abi3-win_amd64.whl", hash = "sha256:5b012212e08b8dd5edc78ef54da83dd9892fd9105323b3993eff6bea65dc21d7", size = 3817107, upload-time = "2026-05-04T22:58:08.845Z" },
+    { url = "https://files.pythonhosted.org/packages/6b/84/70e3feea9feea87fd7cbe77efb2712ae1e3e6edf10749dc6e95f4e60e455/cryptography-48.0.0-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:3cb07a3ed6431663cd321ea8a000a1314c74211f823e4177fefa2255e057d1ec", size = 7986556, upload-time = "2026-05-04T22:58:11.172Z" },
+    { url = "https://files.pythonhosted.org/packages/89/6e/18e07a618bb5442ba10cf4df16e99c071365528aa570dfcb8c02e25a303b/cryptography-48.0.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:8c7378637d7d88016fa6791c159f698b3d3eed28ebf844ac36b9dc04a14dae18", size = 4684776, upload-time = "2026-05-04T22:58:13.712Z" },
+    { url = "https://files.pythonhosted.org/packages/be/6a/4ea3b4c6c6759794d5ee2103c304a5076dc4b19ae1f9fe47dba439e159e9/cryptography-48.0.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:cc90c0b39b2e3c65ef52c804b72e3c58f8a04ab2a1871272798e5f9572c17d20", size = 4698121, upload-time = "2026-05-04T22:58:16.448Z" },
+    { url = "https://files.pythonhosted.org/packages/2f/59/6ff6ad6cae03bb887da2a5860b2c9805f8dac969ef01ce563336c49bd1d1/cryptography-48.0.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:76341972e1eff8b4bea859f09c0d3e64b96ce931b084f9b9b7db8ef364c30eff", size = 4690042, upload-time = "2026-05-04T22:58:18.544Z" },
+    { url = "https://files.pythonhosted.org/packages/ca/b4/fc334ed8cfd705aca282fe4d8f5ae64a8e0f74932e9feecb344610cf6e4d/cryptography-48.0.0-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:55b7718303bf06a5753dcdccf2f3945cf18ad7bffde41b61226e4db31ab89a9c", size = 5282526, upload-time = "2026-05-04T22:58:20.75Z" },
+    { url = "https://files.pythonhosted.org/packages/11/08/9f8c5386cc4cd90d8255c7cdd0f5baf459a08502a09de30dc51f553d38dc/cryptography-48.0.0-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:a64697c641c7b1b2178e573cbc31c7c6684cd56883a478d75143dbb7118036db", size = 4733116, upload-time = "2026-05-04T22:58:23.627Z" },
+    { url = "https://files.pythonhosted.org/packages/b8/77/99307d7574045699f8805aa500fa0fb83422d115b5400a064ddd306d7750/cryptography-48.0.0-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:561215ea3879cb1cbbf272867e2efda62476f240fb58c64de6b393ae19246741", size = 4316030, upload-time = "2026-05-04T22:58:25.581Z" },
+    { url = "https://files.pythonhosted.org/packages/fd/36/a608b98337af3cb2aff4818e406649d30572b7031918b04c87d979495348/cryptography-48.0.0-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:ad64688338ed4bc1a6618076ba75fd7194a5f1797ac60b47afe926285adb3166", size = 4689640, upload-time = "2026-05-04T22:58:27.747Z" },
+    { url = "https://files.pythonhosted.org/packages/dd/a6/825010a291b4438aecc1f568bc428189fc1175515223632477c07dc0a6df/cryptography-48.0.0-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:906cbf0670286c6e0044156bc7d4af9cbb0ef6db9f73e52c3ec56ba6bdde5336", size = 5237657, upload-time = "2026-05-04T22:58:29.848Z" },
+    { url = "https://files.pythonhosted.org/packages/b9/09/4e76a09b4caa29aad535ddc806f5d4c5d01885bd978bd984fbc6ca032cae/cryptography-48.0.0-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:ea8990436d914540a40ab24b6a77c0969695ed52f4a4874c5137ccf7045a7057", size = 4732362, upload-time = "2026-05-04T22:58:32.009Z" },
+    { url = "https://files.pythonhosted.org/packages/18/78/444fa04a77d0cb95f417dda20d450e13c56ba8e5220fc892a1658f44f882/cryptography-48.0.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:c18684a7f0cc9a3cb60328f496b8e3372def7c5d2df39ac267878b05565aaaae", size = 4819580, upload-time = "2026-05-04T22:58:34.254Z" },
+    { url = "https://files.pythonhosted.org/packages/38/85/ea67067c70a1fd4be2c63d35eeed82658023021affccc7b17705f8527dd2/cryptography-48.0.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:9be5aafa5736574f8f15f262adc81b2a9869e2cfe9014d52a44633905b40d52c", size = 4963283, upload-time = "2026-05-04T22:58:36.376Z" },
+    { url = "https://files.pythonhosted.org/packages/75/54/cc6d0f3deac3e81c7f847e8a189a12b6cdd65059b43dad25d4316abd849a/cryptography-48.0.0-cp314-cp314t-win32.whl", hash = "sha256:c17dfe85494deaeddc5ce251aebd1d60bbe6afc8b62071bb0b469431a000124f", size = 3270954, upload-time = "2026-05-04T22:58:38.791Z" },
+    { url = "https://files.pythonhosted.org/packages/49/67/cc947e288c0758a4e5473d1dcb743037ab7785541265a969240b8885441a/cryptography-48.0.0-cp314-cp314t-win_amd64.whl", hash = "sha256:27241b1dc9962e056062a8eef1991d02c3a24569c95975bd2322a8a52c6e5e12", size = 3797313, upload-time = "2026-05-04T22:58:40.746Z" },
+    { url = "https://files.pythonhosted.org/packages/f2/63/61d4a4e1c6b6bab6ce1e213cd36a24c415d90e76d78c5eb8577c5541d2e8/cryptography-48.0.0-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:58d00498e8933e4a194f3076aee1b4a97dfec1a6da444535755822fe5d8b0b86", size = 7983482, upload-time = "2026-05-04T22:58:43.769Z" },
+    { url = "https://files.pythonhosted.org/packages/d5/ac/f5b5995b87770c693e2596559ffafe195b4033a57f14a82268a2842953f3/cryptography-48.0.0-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:614d0949f4790582d2cc25553abd09dd723025f0c0e7c67376a1d77196743d6e", size = 4683266, upload-time = "2026-05-04T22:58:46.064Z" },
+    { url = "https://files.pythonhosted.org/packages/ec/c6/8b14f67e18338fbc4adb76f66c001f5c3610b3e2d1837f268f47a347dbbb/cryptography-48.0.0-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7ce4bfae76319a532a2dc68f82cc32f5676ee792a983187dac07183690e5c66f", size = 4696228, upload-time = "2026-05-04T22:58:48.22Z" },
+    { url = "https://files.pythonhosted.org/packages/ea/73/f808fbae9514bd91b47875b003f13e284c8c6bdfd904b7944e803937eec1/cryptography-48.0.0-cp39-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:2eb992bbd4661238c5a397594c83f5b4dc2bc5b848c365c8f991b6780efcc5c7", size = 4689097, upload-time = "2026-05-04T22:58:50.9Z" },
+    { url = "https://files.pythonhosted.org/packages/93/01/d86632d7d28db8ae83221995752eeb6639ffb374c2d22955648cf8d52797/cryptography-48.0.0-cp39-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:22a5cb272895dce158b2cacdfdc3debd299019659f42947dbdac6f32d68fe832", size = 5283582, upload-time = "2026-05-04T22:58:53.017Z" },
+    { url = "https://files.pythonhosted.org/packages/02/e1/50edc7a50334807cc4791fc4a0ce7468b4a1416d9138eab358bfc9a3d70b/cryptography-48.0.0-cp39-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:2b4d59804e8408e2fea7d1fbaf218e5ec984325221db76e6a241a9abd6cdd95c", size = 4730479, upload-time = "2026-05-04T22:58:55.611Z" },
+    { url = "https://files.pythonhosted.org/packages/6f/af/99a582b1b1641ff5911ac559beb45097cf79efd4ead4657f578ef1af2d47/cryptography-48.0.0-cp39-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:984a20b0f62a26f48a3396c72e4bc34c66e356d356bf370053066b3b6d54634a", size = 4326481, upload-time = "2026-05-04T22:58:57.607Z" },
+    { url = "https://files.pythonhosted.org/packages/90/ee/89aa26a06ef0a7d7611788ffd571a7c50e368cc6a4d5eef8b4884e866edb/cryptography-48.0.0-cp39-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:5a5ed8fde7a1d09376ca0b40e68cd59c69fe23b1f9768bd5824f54681626032a", size = 4688713, upload-time = "2026-05-04T22:59:00.077Z" },
+    { url = "https://files.pythonhosted.org/packages/70/ba/bcb1b0bb7a33d4c7c0c4d4c7874b4a62ae4f56113a5f4baefa362dfb1f0f/cryptography-48.0.0-cp39-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:8cd666227ef7af430aa5914a9910e0ddd703e75f039cef0825cd0da71b6b711a", size = 5238165, upload-time = "2026-05-04T22:59:02.317Z" },
+    { url = "https://files.pythonhosted.org/packages/c9/70/ca4003b1ce5ca3dc3186ada51908c8a9b9ff7d5cab83cc0d43ee14ec144f/cryptography-48.0.0-cp39-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:9071196d81abc88b3516ac8cdfad32e2b66dd4a5393a8e68a961e9161ddc6239", size = 4729947, upload-time = "2026-05-04T22:59:05.255Z" },
+    { url = "https://files.pythonhosted.org/packages/44/a0/4ec7cf774207905aef1a8d11c3750d5a1db805eb380ee4e16df317870128/cryptography-48.0.0-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:1e2d54c8be6152856a36f0882ab231e70f8ec7f14e93cf87db8a2ed056bf160c", size = 4822059, upload-time = "2026-05-04T22:59:07.802Z" },
+    { url = "https://files.pythonhosted.org/packages/1e/75/a2e55f99c16fcac7b5d6c1eb19ad8e00799854d6be5ca845f9259eae1681/cryptography-48.0.0-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a5da777e32ffed6f85a7b2b3f7c5cbc88c146bfcd0a1d7baf5fcc6c52ee35dd4", size = 4960575, upload-time = "2026-05-04T22:59:09.851Z" },
+    { url = "https://files.pythonhosted.org/packages/b8/23/6e6f32143ab5d8b36ca848a502c4bcd477ae75b9e1677e3530d669062578/cryptography-48.0.0-cp39-abi3-win32.whl", hash = "sha256:77a2ccbbe917f6710e05ba9adaa25fb5075620bf3ea6fb751997875aff4ae4bd", size = 3279117, upload-time = "2026-05-04T22:59:12.019Z" },
+    { url = "https://files.pythonhosted.org/packages/9d/9a/0fea98a70cf1749d41d738836f6349d97945f7c89433a259a6c2642eefeb/cryptography-48.0.0-cp39-abi3-win_amd64.whl", hash = "sha256:16cd65b9330583e4619939b3a3843eec1e6e789744bb01e7c7e2e62e33c239c8", size = 3792100, upload-time = "2026-05-04T22:59:14.884Z" },
+]
+
+[[package]]
+name = "docker"
+version = "7.1.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "pywin32", marker = "sys_platform == 'win32'" },
+    { name = "requests" },
+    { name = "urllib3" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/91/9b/4a2ea29aeba62471211598dac5d96825bb49348fa07e906ea930394a83ce/docker-7.1.0.tar.gz", hash = "sha256:ad8c70e6e3f8926cb8a92619b832b4ea5299e2831c14284663184e200546fa6c", size = 117834, upload-time = "2024-05-23T11:13:57.216Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/e3/26/57c6fb270950d476074c087527a558ccb6f4436657314bfb6cdf484114c4/docker-7.1.0-py3-none-any.whl", hash = "sha256:c96b93b7f0a746f9e77d325bcfb87422a3d8bd4f03136ae8a85b37f1898d5fc0", size = 147774, upload-time = "2024-05-23T11:13:55.01Z" },
+]
+
+[[package]]
+name = "idna"
+version = "3.15"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/82/77/7b3966d0b9d1d31a36ddf1746926a11dface89a83409bf1483f0237aa758/idna-3.15.tar.gz", hash = "sha256:ca962446ea538f7092a95e057da437618e886f4d349216d2b1e294abfdb65fdc", size = 199245, upload-time = "2026-05-12T22:45:57.011Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/d2/23/408243171aa9aaba178d3e2559159c24c1171a641aa83b67bdd3394ead8e/idna-3.15-py3-none-any.whl", hash = "sha256:048adeaf8c2d788c40fee287673ccaa74c24ffd8dcf09ffa555a2fbb59f10ac8", size = 72340, upload-time = "2026-05-12T22:45:55.733Z" },
+]
+
+[[package]]
+name = "iniconfig"
+version = "2.3.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/72/34/14ca021ce8e5dfedc35312d08ba8bf51fdd999c576889fc2c24cb97f4f10/iniconfig-2.3.0.tar.gz", hash = "sha256:c76315c77db068650d49c5b56314774a7804df16fee4402c1f19d6d15d8c4730", size = 20503, upload-time = "2025-10-18T21:55:43.219Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12", size = 7484, upload-time = "2025-10-18T21:55:41.639Z" },
+]
+
+[[package]]
+name = "jinja2"
+version = "3.1.6"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "markupsafe" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/df/bf/f7da0350254c0ed7c72f3e33cef02e048281fec7ecec5f032d4aac52226b/jinja2-3.1.6.tar.gz", hash = "sha256:0137fb05990d35f1275a587e9aee6d56da821fc83491a0fb838183be43f66d6d", size = 245115, upload-time = "2025-03-05T20:05:02.478Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67", size = 134899, upload-time = "2025-03-05T20:05:00.369Z" },
+]
+
+[[package]]
+name = "jmespath"
+version = "1.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/d3/59/322338183ecda247fb5d1763a6cbe46eff7222eaeebafd9fa65d4bf5cb11/jmespath-1.1.0.tar.gz", hash = "sha256:472c87d80f36026ae83c6ddd0f1d05d4e510134ed462851fd5f754c8c3cbb88d", size = 27377, upload-time = "2026-01-22T16:35:26.279Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/14/2f/967ba146e6d58cf6a652da73885f52fc68001525b4197effc174321d70b4/jmespath-1.1.0-py3-none-any.whl", hash = "sha256:a5663118de4908c91729bea0acadca56526eb2698e83de10cd116ae0f4e97c64", size = 20419, upload-time = "2026-01-22T16:35:24.919Z" },
+]
+
+[[package]]
+name = "joserfc"
+version = "1.6.5"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "cryptography" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/3b/dc/5f768c2e391e9afabe5d18e3221346deb5fb6338565f1ccc9e7c6d7befdd/joserfc-1.6.5.tar.gz", hash = "sha256:1482a7db78fb4602e44ed89e51b599d052e091288c7c532c5b694e20149dec48", size = 231881, upload-time = "2026-05-06T04:58:13.408Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/54/3b/ad1cb22e75c963b1f07c8a2329bf47227ce7e4361df5eb2fb101b2ce33ef/joserfc-1.6.5-py3-none-any.whl", hash = "sha256:e9878a0f8243fe7b95e11fdda81374ca9f7a689e302751579d3dfdeec559675e", size = 70464, upload-time = "2026-05-06T04:58:11.668Z" },
+]
+
+[[package]]
+name = "markupsafe"
+version = "3.0.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/7e/99/7690b6d4034fffd95959cbe0c02de8deb3098cc577c67bb6a24fe5d7caa7/markupsafe-3.0.3.tar.gz", hash = "sha256:722695808f4b6457b320fdc131280796bdceb04ab50fe1795cd540799ebe1698", size = 80313, upload-time = "2025-09-27T18:37:40.426Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/38/2f/907b9c7bbba283e68f20259574b13d005c121a0fa4c175f9bed27c4597ff/markupsafe-3.0.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:e1cf1972137e83c5d4c136c43ced9ac51d0e124706ee1c8aa8532c1287fa8795", size = 11622, upload-time = "2025-09-27T18:36:41.777Z" },
+    { url = "https://files.pythonhosted.org/packages/9c/d9/5f7756922cdd676869eca1c4e3c0cd0df60ed30199ffd775e319089cb3ed/markupsafe-3.0.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:116bb52f642a37c115f517494ea5feb03889e04df47eeff5b130b1808ce7c219", size = 12029, upload-time = "2025-09-27T18:36:43.257Z" },
+    { url = "https://files.pythonhosted.org/packages/00/07/575a68c754943058c78f30db02ee03a64b3c638586fba6a6dd56830b30a3/markupsafe-3.0.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:133a43e73a802c5562be9bbcd03d090aa5a1fe899db609c29e8c8d815c5f6de6", size = 24374, upload-time = "2025-09-27T18:36:44.508Z" },
+    { url = "https://files.pythonhosted.org/packages/a9/21/9b05698b46f218fc0e118e1f8168395c65c8a2c750ae2bab54fc4bd4e0e8/markupsafe-3.0.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ccfcd093f13f0f0b7fdd0f198b90053bf7b2f02a3927a30e63f3ccc9df56b676", size = 22980, upload-time = "2025-09-27T18:36:45.385Z" },
+    { url = "https://files.pythonhosted.org/packages/7f/71/544260864f893f18b6827315b988c146b559391e6e7e8f7252839b1b846a/markupsafe-3.0.3-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:509fa21c6deb7a7a273d629cf5ec029bc209d1a51178615ddf718f5918992ab9", size = 21990, upload-time = "2025-09-27T18:36:46.916Z" },
+    { url = "https://files.pythonhosted.org/packages/c2/28/b50fc2f74d1ad761af2f5dcce7492648b983d00a65b8c0e0cb457c82ebbe/markupsafe-3.0.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:a4afe79fb3de0b7097d81da19090f4df4f8d3a2b3adaa8764138aac2e44f3af1", size = 23784, upload-time = "2025-09-27T18:36:47.884Z" },
+    { url = "https://files.pythonhosted.org/packages/ed/76/104b2aa106a208da8b17a2fb72e033a5a9d7073c68f7e508b94916ed47a9/markupsafe-3.0.3-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:795e7751525cae078558e679d646ae45574b47ed6e7771863fcc079a6171a0fc", size = 21588, upload-time = "2025-09-27T18:36:48.82Z" },
+    { url = "https://files.pythonhosted.org/packages/b5/99/16a5eb2d140087ebd97180d95249b00a03aa87e29cc224056274f2e45fd6/markupsafe-3.0.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:8485f406a96febb5140bfeca44a73e3ce5116b2501ac54fe953e488fb1d03b12", size = 23041, upload-time = "2025-09-27T18:36:49.797Z" },
+    { url = "https://files.pythonhosted.org/packages/19/bc/e7140ed90c5d61d77cea142eed9f9c303f4c4806f60a1044c13e3f1471d0/markupsafe-3.0.3-cp313-cp313-win32.whl", hash = "sha256:bdd37121970bfd8be76c5fb069c7751683bdf373db1ed6c010162b2a130248ed", size = 14543, upload-time = "2025-09-27T18:36:51.584Z" },
+    { url = "https://files.pythonhosted.org/packages/05/73/c4abe620b841b6b791f2edc248f556900667a5a1cf023a6646967ae98335/markupsafe-3.0.3-cp313-cp313-win_amd64.whl", hash = "sha256:9a1abfdc021a164803f4d485104931fb8f8c1efd55bc6b748d2f5774e78b62c5", size = 15113, upload-time = "2025-09-27T18:36:52.537Z" },
+    { url = "https://files.pythonhosted.org/packages/f0/3a/fa34a0f7cfef23cf9500d68cb7c32dd64ffd58a12b09225fb03dd37d5b80/markupsafe-3.0.3-cp313-cp313-win_arm64.whl", hash = "sha256:7e68f88e5b8799aa49c85cd116c932a1ac15caaa3f5db09087854d218359e485", size = 13911, upload-time = "2025-09-27T18:36:53.513Z" },
+    { url = "https://files.pythonhosted.org/packages/e4/d7/e05cd7efe43a88a17a37b3ae96e79a19e846f3f456fe79c57ca61356ef01/markupsafe-3.0.3-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:218551f6df4868a8d527e3062d0fb968682fe92054e89978594c28e642c43a73", size = 11658, upload-time = "2025-09-27T18:36:54.819Z" },
+    { url = "https://files.pythonhosted.org/packages/99/9e/e412117548182ce2148bdeacdda3bb494260c0b0184360fe0d56389b523b/markupsafe-3.0.3-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:3524b778fe5cfb3452a09d31e7b5adefeea8c5be1d43c4f810ba09f2ceb29d37", size = 12066, upload-time = "2025-09-27T18:36:55.714Z" },
+    { url = "https://files.pythonhosted.org/packages/bc/e6/fa0ffcda717ef64a5108eaa7b4f5ed28d56122c9a6d70ab8b72f9f715c80/markupsafe-3.0.3-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4e885a3d1efa2eadc93c894a21770e4bc67899e3543680313b09f139e149ab19", size = 25639, upload-time = "2025-09-27T18:36:56.908Z" },
+    { url = "https://files.pythonhosted.org/packages/96/ec/2102e881fe9d25fc16cb4b25d5f5cde50970967ffa5dddafdb771237062d/markupsafe-3.0.3-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8709b08f4a89aa7586de0aadc8da56180242ee0ada3999749b183aa23df95025", size = 23569, upload-time = "2025-09-27T18:36:57.913Z" },
+    { url = "https://files.pythonhosted.org/packages/4b/30/6f2fce1f1f205fc9323255b216ca8a235b15860c34b6798f810f05828e32/markupsafe-3.0.3-cp313-cp313t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:b8512a91625c9b3da6f127803b166b629725e68af71f8184ae7e7d54686a56d6", size = 23284, upload-time = "2025-09-27T18:36:58.833Z" },
+    { url = "https://files.pythonhosted.org/packages/58/47/4a0ccea4ab9f5dcb6f79c0236d954acb382202721e704223a8aafa38b5c8/markupsafe-3.0.3-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:9b79b7a16f7fedff2495d684f2b59b0457c3b493778c9eed31111be64d58279f", size = 24801, upload-time = "2025-09-27T18:36:59.739Z" },
+    { url = "https://files.pythonhosted.org/packages/6a/70/3780e9b72180b6fecb83a4814d84c3bf4b4ae4bf0b19c27196104149734c/markupsafe-3.0.3-cp313-cp313t-musllinux_1_2_riscv64.whl", hash = "sha256:12c63dfb4a98206f045aa9563db46507995f7ef6d83b2f68eda65c307c6829eb", size = 22769, upload-time = "2025-09-27T18:37:00.719Z" },
+    { url = "https://files.pythonhosted.org/packages/98/c5/c03c7f4125180fc215220c035beac6b9cb684bc7a067c84fc69414d315f5/markupsafe-3.0.3-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:8f71bc33915be5186016f675cd83a1e08523649b0e33efdb898db577ef5bb009", size = 23642, upload-time = "2025-09-27T18:37:01.673Z" },
+    { url = "https://files.pythonhosted.org/packages/80/d6/2d1b89f6ca4bff1036499b1e29a1d02d282259f3681540e16563f27ebc23/markupsafe-3.0.3-cp313-cp313t-win32.whl", hash = "sha256:69c0b73548bc525c8cb9a251cddf1931d1db4d2258e9599c28c07ef3580ef354", size = 14612, upload-time = "2025-09-27T18:37:02.639Z" },
+    { url = "https://files.pythonhosted.org/packages/2b/98/e48a4bfba0a0ffcf9925fe2d69240bfaa19c6f7507b8cd09c70684a53c1e/markupsafe-3.0.3-cp313-cp313t-win_amd64.whl", hash = "sha256:1b4b79e8ebf6b55351f0d91fe80f893b4743f104bff22e90697db1590e47a218", size = 15200, upload-time = "2025-09-27T18:37:03.582Z" },
+    { url = "https://files.pythonhosted.org/packages/0e/72/e3cc540f351f316e9ed0f092757459afbc595824ca724cbc5a5d4263713f/markupsafe-3.0.3-cp313-cp313t-win_arm64.whl", hash = "sha256:ad2cf8aa28b8c020ab2fc8287b0f823d0a7d8630784c31e9ee5edea20f406287", size = 13973, upload-time = "2025-09-27T18:37:04.929Z" },
+    { url = "https://files.pythonhosted.org/packages/33/8a/8e42d4838cd89b7dde187011e97fe6c3af66d8c044997d2183fbd6d31352/markupsafe-3.0.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:eaa9599de571d72e2daf60164784109f19978b327a3910d3e9de8c97b5b70cfe", size = 11619, upload-time = "2025-09-27T18:37:06.342Z" },
+    { url = "https://files.pythonhosted.org/packages/b5/64/7660f8a4a8e53c924d0fa05dc3a55c9cee10bbd82b11c5afb27d44b096ce/markupsafe-3.0.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c47a551199eb8eb2121d4f0f15ae0f923d31350ab9280078d1e5f12b249e0026", size = 12029, upload-time = "2025-09-27T18:37:07.213Z" },
+    { url = "https://files.pythonhosted.org/packages/da/ef/e648bfd021127bef5fa12e1720ffed0c6cbb8310c8d9bea7266337ff06de/markupsafe-3.0.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f34c41761022dd093b4b6896d4810782ffbabe30f2d443ff5f083e0cbbb8c737", size = 24408, upload-time = "2025-09-27T18:37:09.572Z" },
+    { url = "https://files.pythonhosted.org/packages/41/3c/a36c2450754618e62008bf7435ccb0f88053e07592e6028a34776213d877/markupsafe-3.0.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:457a69a9577064c05a97c41f4e65148652db078a3a509039e64d3467b9e7ef97", size = 23005, upload-time = "2025-09-27T18:37:10.58Z" },
+    { url = "https://files.pythonhosted.org/packages/bc/20/b7fdf89a8456b099837cd1dc21974632a02a999ec9bf7ca3e490aacd98e7/markupsafe-3.0.3-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:e8afc3f2ccfa24215f8cb28dcf43f0113ac3c37c2f0f0806d8c70e4228c5cf4d", size = 22048, upload-time = "2025-09-27T18:37:11.547Z" },
+    { url = "https://files.pythonhosted.org/packages/9a/a7/591f592afdc734f47db08a75793a55d7fbcc6902a723ae4cfbab61010cc5/markupsafe-3.0.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:ec15a59cf5af7be74194f7ab02d0f59a62bdcf1a537677ce67a2537c9b87fcda", size = 23821, upload-time = "2025-09-27T18:37:12.48Z" },
+    { url = "https://files.pythonhosted.org/packages/7d/33/45b24e4f44195b26521bc6f1a82197118f74df348556594bd2262bda1038/markupsafe-3.0.3-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:0eb9ff8191e8498cca014656ae6b8d61f39da5f95b488805da4bb029cccbfbaf", size = 21606, upload-time = "2025-09-27T18:37:13.485Z" },
+    { url = "https://files.pythonhosted.org/packages/ff/0e/53dfaca23a69fbfbbf17a4b64072090e70717344c52eaaaa9c5ddff1e5f0/markupsafe-3.0.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:2713baf880df847f2bece4230d4d094280f4e67b1e813eec43b4c0e144a34ffe", size = 23043, upload-time = "2025-09-27T18:37:14.408Z" },
+    { url = "https://files.pythonhosted.org/packages/46/11/f333a06fc16236d5238bfe74daccbca41459dcd8d1fa952e8fbd5dccfb70/markupsafe-3.0.3-cp314-cp314-win32.whl", hash = "sha256:729586769a26dbceff69f7a7dbbf59ab6572b99d94576a5592625d5b411576b9", size = 14747, upload-time = "2025-09-27T18:37:15.36Z" },
+    { url = "https://files.pythonhosted.org/packages/28/52/182836104b33b444e400b14f797212f720cbc9ed6ba34c800639d154e821/markupsafe-3.0.3-cp314-cp314-win_amd64.whl", hash = "sha256:bdc919ead48f234740ad807933cdf545180bfbe9342c2bb451556db2ed958581", size = 15341, upload-time = "2025-09-27T18:37:16.496Z" },
+    { url = "https://files.pythonhosted.org/packages/6f/18/acf23e91bd94fd7b3031558b1f013adfa21a8e407a3fdb32745538730382/markupsafe-3.0.3-cp314-cp314-win_arm64.whl", hash = "sha256:5a7d5dc5140555cf21a6fefbdbf8723f06fcd2f63ef108f2854de715e4422cb4", size = 14073, upload-time = "2025-09-27T18:37:17.476Z" },
+    { url = "https://files.pythonhosted.org/packages/3c/f0/57689aa4076e1b43b15fdfa646b04653969d50cf30c32a102762be2485da/markupsafe-3.0.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:1353ef0c1b138e1907ae78e2f6c63ff67501122006b0f9abad68fda5f4ffc6ab", size = 11661, upload-time = "2025-09-27T18:37:18.453Z" },
+    { url = "https://files.pythonhosted.org/packages/89/c3/2e67a7ca217c6912985ec766c6393b636fb0c2344443ff9d91404dc4c79f/markupsafe-3.0.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1085e7fbddd3be5f89cc898938f42c0b3c711fdcb37d75221de2666af647c175", size = 12069, upload-time = "2025-09-27T18:37:19.332Z" },
+    { url = "https://files.pythonhosted.org/packages/f0/00/be561dce4e6ca66b15276e184ce4b8aec61fe83662cce2f7d72bd3249d28/markupsafe-3.0.3-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1b52b4fb9df4eb9ae465f8d0c228a00624de2334f216f178a995ccdcf82c4634", size = 25670, upload-time = "2025-09-27T18:37:20.245Z" },
+    { url = "https://files.pythonhosted.org/packages/50/09/c419f6f5a92e5fadde27efd190eca90f05e1261b10dbd8cbcb39cd8ea1dc/markupsafe-3.0.3-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:fed51ac40f757d41b7c48425901843666a6677e3e8eb0abcff09e4ba6e664f50", size = 23598, upload-time = "2025-09-27T18:37:21.177Z" },
+    { url = "https://files.pythonhosted.org/packages/22/44/a0681611106e0b2921b3033fc19bc53323e0b50bc70cffdd19f7d679bb66/markupsafe-3.0.3-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:f190daf01f13c72eac4efd5c430a8de82489d9cff23c364c3ea822545032993e", size = 23261, upload-time = "2025-09-27T18:37:22.167Z" },
+    { url = "https://files.pythonhosted.org/packages/5f/57/1b0b3f100259dc9fffe780cfb60d4be71375510e435efec3d116b6436d43/markupsafe-3.0.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:e56b7d45a839a697b5eb268c82a71bd8c7f6c94d6fd50c3d577fa39a9f1409f5", size = 24835, upload-time = "2025-09-27T18:37:23.296Z" },
+    { url = "https://files.pythonhosted.org/packages/26/6a/4bf6d0c97c4920f1597cc14dd720705eca0bf7c787aebc6bb4d1bead5388/markupsafe-3.0.3-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:f3e98bb3798ead92273dc0e5fd0f31ade220f59a266ffd8a4f6065e0a3ce0523", size = 22733, upload-time = "2025-09-27T18:37:24.237Z" },
+    { url = "https://files.pythonhosted.org/packages/14/c7/ca723101509b518797fedc2fdf79ba57f886b4aca8a7d31857ba3ee8281f/markupsafe-3.0.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:5678211cb9333a6468fb8d8be0305520aa073f50d17f089b5b4b477ea6e67fdc", size = 23672, upload-time = "2025-09-27T18:37:25.271Z" },
+    { url = "https://files.pythonhosted.org/packages/fb/df/5bd7a48c256faecd1d36edc13133e51397e41b73bb77e1a69deab746ebac/markupsafe-3.0.3-cp314-cp314t-win32.whl", hash = "sha256:915c04ba3851909ce68ccc2b8e2cd691618c4dc4c4232fb7982bca3f41fd8c3d", size = 14819, upload-time = "2025-09-27T18:37:26.285Z" },
+    { url = "https://files.pythonhosted.org/packages/1a/8a/0402ba61a2f16038b48b39bccca271134be00c5c9f0f623208399333c448/markupsafe-3.0.3-cp314-cp314t-win_amd64.whl", hash = "sha256:4faffd047e07c38848ce017e8725090413cd80cbc23d86e55c587bf979e579c9", size = 15426, upload-time = "2025-09-27T18:37:27.316Z" },
+    { url = "https://files.pythonhosted.org/packages/70/bc/6f1c2f612465f5fa89b95bead1f44dcb607670fd42891d8fdcd5d039f4f4/markupsafe-3.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:32001d6a8fc98c8cb5c947787c5d08b0a50663d139f1305bac5885d98d9b40fa", size = 14146, upload-time = "2025-09-27T18:37:28.327Z" },
+]
+
+[[package]]
+name = "moto"
+version = "5.1.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "boto3" },
+    { name = "botocore" },
+    { name = "cryptography" },
+    { name = "jinja2" },
+    { name = "python-dateutil" },
+    { name = "requests" },
+    { name = "responses" },
+    { name = "werkzeug" },
+    { name = "xmltodict" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/4e/17/059cd6fa5962fa75575f74bb8d31d0e9cb3e656414cb79b4b3736b2b40eb/moto-5.1.4.tar.gz", hash = "sha256:b339c3514f2986ebefa465671b688bdbf51796705702214b1bad46490b68507a", size = 6796440, upload-time = "2025-04-20T20:16:19.165Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/44/98/346e3252767e1fe78a0842804cfffbbbef103b960149d4bb217b4cc31a19/moto-5.1.4-py3-none-any.whl", hash = "sha256:9a19d7a64c3f03824389cfbd478b64c82bd4d8da21b242a34259360d66cd108b", size = 4898363, upload-time = "2025-04-20T20:16:16.849Z" },
+]
+
+[package.optional-dependencies]
+cognitoidp = [
+    { name = "joserfc" },
+]
+dynamodb = [
+    { name = "docker" },
+    { name = "py-partiql-parser" },
+]
+s3 = [
+    { name = "py-partiql-parser" },
+    { name = "pyyaml" },
+]
+ssm = [
+    { name = "pyyaml" },
+]
+
+[[package]]
+name = "packaging"
+version = "26.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/d7/f1/e7a6dd94a8d4a5626c03e4e99c87f241ba9e350cd9e6d75123f992427270/packaging-26.2.tar.gz", hash = "sha256:ff452ff5a3e828ce110190feff1178bb1f2ea2281fa2075aadb987c2fb221661", size = 228134, upload-time = "2026-04-24T20:15:23.917Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/df/b2/87e62e8c3e2f4b32e5fe99e0b86d576da1312593b39f47d8ceef365e95ed/packaging-26.2-py3-none-any.whl", hash = "sha256:5fc45236b9446107ff2415ce77c807cee2862cb6fac22b8a73826d0693b0980e", size = 100195, upload-time = "2026-04-24T20:15:22.081Z" },
+]
+
+[[package]]
+name = "pluggy"
+version = "1.6.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f9/e2/3e91f31a7d2b083fe6ef3fa267035b518369d9511ffab804f839851d2779/pluggy-1.6.0.tar.gz", hash = "sha256:7dcc130b76258d33b90f61b658791dede3486c3e6bfb003ee5c9bfb396dd22f3", size = 69412, upload-time = "2025-05-15T12:30:07.975Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
+]
+
+[[package]]
+name = "py-partiql-parser"
+version = "0.6.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/58/a1/0a2867e48b232b4f82c4929ef7135f2a5d72c3886b957dccf63c70aa2fcb/py_partiql_parser-0.6.1.tar.gz", hash = "sha256:8583ff2a0e15560ef3bc3df109a7714d17f87d81d33e8c38b7fed4e58a63215d", size = 17120, upload-time = "2024-12-25T22:06:41.327Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/97/84/0e410c20bbe9a504fc56e97908f13261c2b313d16cbb3b738556166f044a/py_partiql_parser-0.6.1-py2.py3-none-any.whl", hash = "sha256:ff6a48067bff23c37e9044021bf1d949c83e195490c17e020715e927fe5b2456", size = 23520, upload-time = "2024-12-25T22:06:39.106Z" },
+]
+
+[[package]]
+name = "pycparser"
+version = "3.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/1b/7d/92392ff7815c21062bea51aa7b87d45576f649f16458d78b7cf94b9ab2e6/pycparser-3.0.tar.gz", hash = "sha256:600f49d217304a5902ac3c37e1281c9fe94e4d0489de643a9504c5cdfdfc6b29", size = 103492, upload-time = "2026-01-21T14:26:51.89Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/0c/c3/44f3fbbfa403ea2a7c779186dc20772604442dde72947e7d01069cbe98e3/pycparser-3.0-py3-none-any.whl", hash = "sha256:b727414169a36b7d524c1c3e31839a521725078d7b2ff038656844266160a992", size = 48172, upload-time = "2026-01-21T14:26:50.693Z" },
+]
+
+[[package]]
+name = "pygments"
+version = "2.20.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/c3/b2/bc9c9196916376152d655522fdcebac55e66de6603a76a02bca1b6414f6c/pygments-2.20.0.tar.gz", hash = "sha256:6757cd03768053ff99f3039c1a36d6c0aa0b263438fcab17520b30a303a82b5f", size = 4955991, upload-time = "2026-03-29T13:29:33.898Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151, upload-time = "2026-03-29T13:29:30.038Z" },
+]
+
+[[package]]
+name = "pytest"
+version = "8.4.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "colorama", marker = "sys_platform == 'win32'" },
+    { name = "iniconfig" },
+    { name = "packaging" },
+    { name = "pluggy" },
+    { name = "pygments" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a3/5c/00a0e072241553e1a7496d638deababa67c5058571567b92a7eaa258397c/pytest-8.4.2.tar.gz", hash = "sha256:86c0d0b93306b961d58d62a4db4879f27fe25513d4b969df351abdddb3c30e01", size = 1519618, upload-time = "2025-09-04T14:34:22.711Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/a8/a4/20da314d277121d6534b3a980b29035dcd51e6744bd79075a6ce8fa4eb8d/pytest-8.4.2-py3-none-any.whl", hash = "sha256:872f880de3fc3a5bdc88a11b39c9710c3497a547cfa9320bc3c5e62fbf272e79", size = 365750, upload-time = "2025-09-04T14:34:20.226Z" },
+]
+
+[[package]]
+name = "python-dateutil"
+version = "2.9.0.post0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "six" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/66/c0/0c8b6ad9f17a802ee498c46e004a0eb49bc148f2fd230864601a86dcf6db/python-dateutil-2.9.0.post0.tar.gz", hash = "sha256:37dd54208da7e1cd875388217d5e00ebd4179249f90fb72437e91a35459a0ad3", size = 342432, upload-time = "2024-03-01T18:36:20.211Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl", hash = "sha256:a8b2bc7bffae282281c8140a97d3aa9c14da0b136dfe83f850eea9a5f7470427", size = 229892, upload-time = "2024-03-01T18:36:18.57Z" },
+]
+
+[[package]]
+name = "pywin32"
+version = "311"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/a5/be/3fd5de0979fcb3994bfee0d65ed8ca9506a8a1260651b86174f6a86f52b3/pywin32-311-cp313-cp313-win32.whl", hash = "sha256:f95ba5a847cba10dd8c4d8fefa9f2a6cf283b8b88ed6178fa8a6c1ab16054d0d", size = 8705700, upload-time = "2025-07-14T20:13:26.471Z" },
+    { url = "https://files.pythonhosted.org/packages/e3/28/e0a1909523c6890208295a29e05c2adb2126364e289826c0a8bc7297bd5c/pywin32-311-cp313-cp313-win_amd64.whl", hash = "sha256:718a38f7e5b058e76aee1c56ddd06908116d35147e133427e59a3983f703a20d", size = 9494700, upload-time = "2025-07-14T20:13:28.243Z" },
+    { url = "https://files.pythonhosted.org/packages/04/bf/90339ac0f55726dce7d794e6d79a18a91265bdf3aa70b6b9ca52f35e022a/pywin32-311-cp313-cp313-win_arm64.whl", hash = "sha256:7b4075d959648406202d92a2310cb990fea19b535c7f4a78d3f5e10b926eeb8a", size = 8709318, upload-time = "2025-07-14T20:13:30.348Z" },
+    { url = "https://files.pythonhosted.org/packages/c9/31/097f2e132c4f16d99a22bfb777e0fd88bd8e1c634304e102f313af69ace5/pywin32-311-cp314-cp314-win32.whl", hash = "sha256:b7a2c10b93f8986666d0c803ee19b5990885872a7de910fc460f9b0c2fbf92ee", size = 8840714, upload-time = "2025-07-14T20:13:32.449Z" },
+    { url = "https://files.pythonhosted.org/packages/90/4b/07c77d8ba0e01349358082713400435347df8426208171ce297da32c313d/pywin32-311-cp314-cp314-win_amd64.whl", hash = "sha256:3aca44c046bd2ed8c90de9cb8427f581c479e594e99b5c0bb19b29c10fd6cb87", size = 9656800, upload-time = "2025-07-14T20:13:34.312Z" },
+    { url = "https://files.pythonhosted.org/packages/c0/d2/21af5c535501a7233e734b8af901574572da66fcc254cb35d0609c9080dd/pywin32-311-cp314-cp314-win_arm64.whl", hash = "sha256:a508e2d9025764a8270f93111a970e1d0fbfc33f4153b388bb649b7eec4f9b42", size = 8932540, upload-time = "2025-07-14T20:13:36.379Z" },
+]
+
+[[package]]
+name = "pyyaml"
+version = "6.0.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/05/8e/961c0007c59b8dd7729d542c61a4d537767a59645b82a0b521206e1e25c2/pyyaml-6.0.3.tar.gz", hash = "sha256:d76623373421df22fb4cf8817020cbb7ef15c725b9d5e45f17e189bfc384190f", size = 130960, upload-time = "2025-09-25T21:33:16.546Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/d1/11/0fd08f8192109f7169db964b5707a2f1e8b745d4e239b784a5a1dd80d1db/pyyaml-6.0.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:8da9669d359f02c0b91ccc01cac4a67f16afec0dac22c2ad09f46bee0697eba8", size = 181669, upload-time = "2025-09-25T21:32:23.673Z" },
+    { url = "https://files.pythonhosted.org/packages/b1/16/95309993f1d3748cd644e02e38b75d50cbc0d9561d21f390a76242ce073f/pyyaml-6.0.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:2283a07e2c21a2aa78d9c4442724ec1eb15f5e42a723b99cb3d822d48f5f7ad1", size = 173252, upload-time = "2025-09-25T21:32:25.149Z" },
+    { url = "https://files.pythonhosted.org/packages/50/31/b20f376d3f810b9b2371e72ef5adb33879b25edb7a6d072cb7ca0c486398/pyyaml-6.0.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ee2922902c45ae8ccada2c5b501ab86c36525b883eff4255313a253a3160861c", size = 767081, upload-time = "2025-09-25T21:32:26.575Z" },
+    { url = "https://files.pythonhosted.org/packages/49/1e/a55ca81e949270d5d4432fbbd19dfea5321eda7c41a849d443dc92fd1ff7/pyyaml-6.0.3-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:a33284e20b78bd4a18c8c2282d549d10bc8408a2a7ff57653c0cf0b9be0afce5", size = 841159, upload-time = "2025-09-25T21:32:27.727Z" },
+    { url = "https://files.pythonhosted.org/packages/74/27/e5b8f34d02d9995b80abcef563ea1f8b56d20134d8f4e5e81733b1feceb2/pyyaml-6.0.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0f29edc409a6392443abf94b9cf89ce99889a1dd5376d94316ae5145dfedd5d6", size = 801626, upload-time = "2025-09-25T21:32:28.878Z" },
+    { url = "https://files.pythonhosted.org/packages/f9/11/ba845c23988798f40e52ba45f34849aa8a1f2d4af4b798588010792ebad6/pyyaml-6.0.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f7057c9a337546edc7973c0d3ba84ddcdf0daa14533c2065749c9075001090e6", size = 753613, upload-time = "2025-09-25T21:32:30.178Z" },
+    { url = "https://files.pythonhosted.org/packages/3d/e0/7966e1a7bfc0a45bf0a7fb6b98ea03fc9b8d84fa7f2229e9659680b69ee3/pyyaml-6.0.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:eda16858a3cab07b80edaf74336ece1f986ba330fdb8ee0d6c0d68fe82bc96be", size = 794115, upload-time = "2025-09-25T21:32:31.353Z" },
+    { url = "https://files.pythonhosted.org/packages/de/94/980b50a6531b3019e45ddeada0626d45fa85cbe22300844a7983285bed3b/pyyaml-6.0.3-cp313-cp313-win32.whl", hash = "sha256:d0eae10f8159e8fdad514efdc92d74fd8d682c933a6dd088030f3834bc8e6b26", size = 137427, upload-time = "2025-09-25T21:32:32.58Z" },
+    { url = "https://files.pythonhosted.org/packages/97/c9/39d5b874e8b28845e4ec2202b5da735d0199dbe5b8fb85f91398814a9a46/pyyaml-6.0.3-cp313-cp313-win_amd64.whl", hash = "sha256:79005a0d97d5ddabfeeea4cf676af11e647e41d81c9a7722a193022accdb6b7c", size = 154090, upload-time = "2025-09-25T21:32:33.659Z" },
+    { url = "https://files.pythonhosted.org/packages/73/e8/2bdf3ca2090f68bb3d75b44da7bbc71843b19c9f2b9cb9b0f4ab7a5a4329/pyyaml-6.0.3-cp313-cp313-win_arm64.whl", hash = "sha256:5498cd1645aa724a7c71c8f378eb29ebe23da2fc0d7a08071d89469bf1d2defb", size = 140246, upload-time = "2025-09-25T21:32:34.663Z" },
+    { url = "https://files.pythonhosted.org/packages/9d/8c/f4bd7f6465179953d3ac9bc44ac1a8a3e6122cf8ada906b4f96c60172d43/pyyaml-6.0.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:8d1fab6bb153a416f9aeb4b8763bc0f22a5586065f86f7664fc23339fc1c1fac", size = 181814, upload-time = "2025-09-25T21:32:35.712Z" },
+    { url = "https://files.pythonhosted.org/packages/bd/9c/4d95bb87eb2063d20db7b60faa3840c1b18025517ae857371c4dd55a6b3a/pyyaml-6.0.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:34d5fcd24b8445fadc33f9cf348c1047101756fd760b4dacb5c3e99755703310", size = 173809, upload-time = "2025-09-25T21:32:36.789Z" },
+    { url = "https://files.pythonhosted.org/packages/92/b5/47e807c2623074914e29dabd16cbbdd4bf5e9b2db9f8090fa64411fc5382/pyyaml-6.0.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:501a031947e3a9025ed4405a168e6ef5ae3126c59f90ce0cd6f2bfc477be31b7", size = 766454, upload-time = "2025-09-25T21:32:37.966Z" },
+    { url = "https://files.pythonhosted.org/packages/02/9e/e5e9b168be58564121efb3de6859c452fccde0ab093d8438905899a3a483/pyyaml-6.0.3-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:b3bc83488de33889877a0f2543ade9f70c67d66d9ebb4ac959502e12de895788", size = 836355, upload-time = "2025-09-25T21:32:39.178Z" },
+    { url = "https://files.pythonhosted.org/packages/88/f9/16491d7ed2a919954993e48aa941b200f38040928474c9e85ea9e64222c3/pyyaml-6.0.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c458b6d084f9b935061bc36216e8a69a7e293a2f1e68bf956dcd9e6cbcd143f5", size = 794175, upload-time = "2025-09-25T21:32:40.865Z" },
+    { url = "https://files.pythonhosted.org/packages/dd/3f/5989debef34dc6397317802b527dbbafb2b4760878a53d4166579111411e/pyyaml-6.0.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:7c6610def4f163542a622a73fb39f534f8c101d690126992300bf3207eab9764", size = 755228, upload-time = "2025-09-25T21:32:42.084Z" },
+    { url = "https://files.pythonhosted.org/packages/d7/ce/af88a49043cd2e265be63d083fc75b27b6ed062f5f9fd6cdc223ad62f03e/pyyaml-6.0.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:5190d403f121660ce8d1d2c1bb2ef1bd05b5f68533fc5c2ea899bd15f4399b35", size = 789194, upload-time = "2025-09-25T21:32:43.362Z" },
+    { url = "https://files.pythonhosted.org/packages/23/20/bb6982b26a40bb43951265ba29d4c246ef0ff59c9fdcdf0ed04e0687de4d/pyyaml-6.0.3-cp314-cp314-win_amd64.whl", hash = "sha256:4a2e8cebe2ff6ab7d1050ecd59c25d4c8bd7e6f400f5f82b96557ac0abafd0ac", size = 156429, upload-time = "2025-09-25T21:32:57.844Z" },
+    { url = "https://files.pythonhosted.org/packages/f4/f4/a4541072bb9422c8a883ab55255f918fa378ecf083f5b85e87fc2b4eda1b/pyyaml-6.0.3-cp314-cp314-win_arm64.whl", hash = "sha256:93dda82c9c22deb0a405ea4dc5f2d0cda384168e466364dec6255b293923b2f3", size = 143912, upload-time = "2025-09-25T21:32:59.247Z" },
+    { url = "https://files.pythonhosted.org/packages/7c/f9/07dd09ae774e4616edf6cda684ee78f97777bdd15847253637a6f052a62f/pyyaml-6.0.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:02893d100e99e03eda1c8fd5c441d8c60103fd175728e23e431db1b589cf5ab3", size = 189108, upload-time = "2025-09-25T21:32:44.377Z" },
+    { url = "https://files.pythonhosted.org/packages/4e/78/8d08c9fb7ce09ad8c38ad533c1191cf27f7ae1effe5bb9400a46d9437fcf/pyyaml-6.0.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:c1ff362665ae507275af2853520967820d9124984e0f7466736aea23d8611fba", size = 183641, upload-time = "2025-09-25T21:32:45.407Z" },
+    { url = "https://files.pythonhosted.org/packages/7b/5b/3babb19104a46945cf816d047db2788bcaf8c94527a805610b0289a01c6b/pyyaml-6.0.3-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6adc77889b628398debc7b65c073bcb99c4a0237b248cacaf3fe8a557563ef6c", size = 831901, upload-time = "2025-09-25T21:32:48.83Z" },
+    { url = "https://files.pythonhosted.org/packages/8b/cc/dff0684d8dc44da4d22a13f35f073d558c268780ce3c6ba1b87055bb0b87/pyyaml-6.0.3-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:a80cb027f6b349846a3bf6d73b5e95e782175e52f22108cfa17876aaeff93702", size = 861132, upload-time = "2025-09-25T21:32:50.149Z" },
+    { url = "https://files.pythonhosted.org/packages/b1/5e/f77dc6b9036943e285ba76b49e118d9ea929885becb0a29ba8a7c75e29fe/pyyaml-6.0.3-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:00c4bdeba853cc34e7dd471f16b4114f4162dc03e6b7afcc2128711f0eca823c", size = 839261, upload-time = "2025-09-25T21:32:51.808Z" },
+    { url = "https://files.pythonhosted.org/packages/ce/88/a9db1376aa2a228197c58b37302f284b5617f56a5d959fd1763fb1675ce6/pyyaml-6.0.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:66e1674c3ef6f541c35191caae2d429b967b99e02040f5ba928632d9a7f0f065", size = 805272, upload-time = "2025-09-25T21:32:52.941Z" },
+    { url = "https://files.pythonhosted.org/packages/da/92/1446574745d74df0c92e6aa4a7b0b3130706a4142b2d1a5869f2eaa423c6/pyyaml-6.0.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:16249ee61e95f858e83976573de0f5b2893b3677ba71c9dd36b9cf8be9ac6d65", size = 829923, upload-time = "2025-09-25T21:32:54.537Z" },
+    { url = "https://files.pythonhosted.org/packages/f0/7a/1c7270340330e575b92f397352af856a8c06f230aa3e76f86b39d01b416a/pyyaml-6.0.3-cp314-cp314t-win_amd64.whl", hash = "sha256:4ad1906908f2f5ae4e5a8ddfce73c320c2a1429ec52eafd27138b7f1cbe341c9", size = 174062, upload-time = "2025-09-25T21:32:55.767Z" },
+    { url = "https://files.pythonhosted.org/packages/f1/12/de94a39c2ef588c7e6455cfbe7343d3b2dc9d6b6b2f40c4c6565744c873d/pyyaml-6.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:ebc55a14a21cb14062aa4162f906cd962b28e2e9ea38f9b4391244cd8de4ae0b", size = 149341, upload-time = "2025-09-25T21:32:56.828Z" },
+]
+
+[[package]]
+name = "requests"
+version = "2.34.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "certifi" },
+    { name = "charset-normalizer" },
+    { name = "idna" },
+    { name = "urllib3" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/ac/c3/e2a2b89f2d3e2179abd6d00ebd70bff6273f37fb3e0cc209f48b39d00cbf/requests-2.34.2.tar.gz", hash = "sha256:f288924cae4e29463698d6d60bc6a4da69c89185ad1e0bcc4104f584e960b9ed", size = 142856, upload-time = "2026-05-14T19:25:27.735Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/a0/f4/c67b0b3f1b9245e8d266f0f112c500d50e5b4e83cb6f3b71b6528104182a/requests-2.34.2-py3-none-any.whl", hash = "sha256:2a0d60c172f83ac6ab31e4554906c0f3b3588d37b5cb939b1c061f4907e278e0", size = 73075, upload-time = "2026-05-14T19:25:26.443Z" },
+]
+
+[[package]]
+name = "responses"
+version = "0.26.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "pyyaml" },
+    { name = "requests" },
+    { name = "urllib3" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9f/b4/b7e040379838cc71bf5aabdb26998dfbe5ee73904c92c1c161faf5de8866/responses-0.26.0.tar.gz", hash = "sha256:c7f6923e6343ef3682816ba421c006626777893cb0d5e1434f674b649bac9eb4", size = 81303, upload-time = "2026-02-19T14:38:05.574Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/ce/04/7f73d05b556da048923e31a0cc878f03be7c5425ed1f268082255c75d872/responses-0.26.0-py3-none-any.whl", hash = "sha256:03ec4409088cd5c66b71ecbbbd27fe2c58ddfad801c66203457b3e6a04868c37", size = 35099, upload-time = "2026-02-19T14:38:03.847Z" },
+]
+
+[[package]]
+name = "s3transfer"
+version = "0.17.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "botocore" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9b/ec/7c692cde9125b77e84b307354d4fb705f98b8ccad59a036d5957ca75bfc3/s3transfer-0.17.0.tar.gz", hash = "sha256:9edeb6d1c3c2f89d6050348548834ad8289610d886e5bf7b7207728bd43ce33a", size = 155337, upload-time = "2026-04-29T22:07:36.33Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/87/72/c6c32d2b657fa3dad1de340254e14390b1e334ce38268b7ad51abda3c8c2/s3transfer-0.17.0-py3-none-any.whl", hash = "sha256:ce3801712acf4ad3e89fb9990df97b4972e93f4b3b0004d214be5bce12814c20", size = 86811, upload-time = "2026-04-29T22:07:34.966Z" },
+]
+
+[[package]]
+name = "six"
+version = "1.17.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/94/e7/b2c673351809dca68a0e064b6af791aa332cf192da575fd474ed7d6f16a2/six-1.17.0.tar.gz", hash = "sha256:ff70335d468e7eb6ec65b95b99d3a2836546063f63acc5171de367e834932a81", size = 34031, upload-time = "2024-12-04T17:35:28.174Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl", hash = "sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274", size = 11050, upload-time = "2024-12-04T17:35:26.475Z" },
+]
+
+[[package]]
+name = "urllib3"
+version = "2.7.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/53/0c/06f8b233b8fd13b9e5ee11424ef85419ba0d8ba0b3138bf360be2ff56953/urllib3-2.7.0.tar.gz", hash = "sha256:231e0ec3b63ceb14667c67be60f2f2c40a518cb38b03af60abc813da26505f4c", size = 433602, upload-time = "2026-05-07T16:13:18.596Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/7f/3e/5db95bcf282c52709639744ca2a8b149baccf648e39c8cc87553df9eae0c/urllib3-2.7.0-py3-none-any.whl", hash = "sha256:9fb4c81ebbb1ce9531cce37674bbc6f1360472bc18ca9a553ede278ef7276897", size = 131087, upload-time = "2026-05-07T16:13:17.151Z" },
+]
+
+[[package]]
+name = "werkzeug"
+version = "3.1.8"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "markupsafe" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/dd/b2/381be8cfdee792dd117872481b6e378f85c957dd7c5bca38897b08f765fd/werkzeug-3.1.8.tar.gz", hash = "sha256:9bad61a4268dac112f1c5cd4630a56ede601b6ed420300677a869083d70a4c44", size = 875852, upload-time = "2026-04-02T18:49:14.268Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/93/8c/2e650f2afeb7ee576912636c23ddb621c91ac6a98e66dc8d29c3c69446e1/werkzeug-3.1.8-py3-none-any.whl", hash = "sha256:63a77fb8892bf28ebc3178683445222aa500e48ebad5ec77b0ad80f8726b1f50", size = 226459, upload-time = "2026-04-02T18:49:12.72Z" },
+]
+
+[[package]]
+name = "xmltodict"
+version = "1.0.4"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/19/70/80f3b7c10d2630aa66414bf23d210386700aa390547278c789afa994fd7e/xmltodict-1.0.4.tar.gz", hash = "sha256:6d94c9f834dd9e44514162799d344d815a3a4faec913717a9ecbfa5be1bb8e61", size = 26124, upload-time = "2026-02-22T02:21:22.074Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/38/34/98a2f52245f4d47be93b580dae5f9861ef58977d73a79eb47c58f1ad1f3a/xmltodict-1.0.4-py3-none-any.whl", hash = "sha256:a4a00d300b0e1c59fc2bfccb53d7b2e88c32f200df138a0dd2229f842497026a", size = 13580, upload-time = "2026-02-22T02:21:21.039Z" },
+]
diff --git a/scripts/common/load-env.sh b/scripts/common/load-env.sh
index 002ed0b5..bce05086 100644
--- a/scripts/common/load-env.sh
+++ b/scripts/common/load-env.sh
@@ -206,6 +206,17 @@ build_cdk_context_params() {
         context_params="${context_params} --context fineTuning.enabled=\"${CDK_FINE_TUNING_ENABLED}\""
     fi
 
+    # Artifacts optional parameters
+    if [ -n "${CDK_ARTIFACTS_ENABLED:-}" ]; then
+        context_params="${context_params} --context artifacts.enabled=\"${CDK_ARTIFACTS_ENABLED}\""
+    fi
+    if [ -n "${CDK_ARTIFACTS_CERTIFICATE_ARN:-}" ]; then
+        context_params="${context_params} --context artifacts.certificateArn=\"${CDK_ARTIFACTS_CERTIFICATE_ARN}\""
+    fi
+    if [ -n "${CDK_ARTIFACTS_RETENTION_DAYS:-}" ]; then
+        context_params="${context_params} --context artifacts.retentionDays=\"${CDK_ARTIFACTS_RETENTION_DAYS}\""
+    fi
+
     echo "${context_params}"
 }
 
@@ -272,6 +283,11 @@ export CDK_RAG_LAMBDA_TIMEOUT="${CDK_RAG_LAMBDA_TIMEOUT:-$(get_json_value "ragIn
 # SageMaker Fine-Tuning configuration
 export CDK_FINE_TUNING_ENABLED="${CDK_FINE_TUNING_ENABLED:-$(get_json_value "fineTuning.enabled" "${CONTEXT_FILE}")}"
 
+# Artifacts configuration
+export CDK_ARTIFACTS_ENABLED="${CDK_ARTIFACTS_ENABLED:-$(get_json_value "artifacts.enabled" "${CONTEXT_FILE}")}"
+export CDK_ARTIFACTS_CERTIFICATE_ARN="${CDK_ARTIFACTS_CERTIFICATE_ARN:-$(get_json_value "artifacts.certificateArn" "${CONTEXT_FILE}")}"
+export CDK_ARTIFACTS_RETENTION_DAYS="${CDK_ARTIFACTS_RETENTION_DAYS:-$(get_json_value "artifacts.retentionDays" "${CONTEXT_FILE}")}"
+
 # Cognito configuration (optional — defaults to projectPrefix for domain prefix)
 export CDK_COGNITO_DOMAIN_PREFIX="${CDK_COGNITO_DOMAIN_PREFIX:-$(get_json_value "cognito.domainPrefix" "${CONTEXT_FILE}")}"
 
diff --git a/scripts/nightly/e2e-test.sh b/scripts/nightly/e2e-test.sh
index f0006019..9d667e46 100755
--- a/scripts/nightly/e2e-test.sh
+++ b/scripts/nightly/e2e-test.sh
@@ -443,6 +443,45 @@ print(json.dumps(register_input))
             status_code=$(curl -s -o /dev/null -w "%{http_code}" "${alb_url}/health" --max-time 10 || echo "000")
             if [ "${status_code}" = "200" ]; then
                 log_success "  App API healthy after CORS patch (HTTP 200)"
+                # Verify the new task is actually serving by checking the
+                # redirect_uri in /auth/login. The health check can pass on
+                # the new task while the old task is still draining — we need
+                # to confirm the BFF env var patch is active.
+                local verify_redirect
+                verify_redirect=$(curl -s -o /dev/null -w "%{redirect_url}" \
+                    "${alb_url}/auth/login" --max-time 10 || true)
+                local verify_uri=""
+                if [ -n "${verify_redirect}" ] && echo "${verify_redirect}" | grep -qF "redirect_uri="; then
+                    verify_uri=$(echo "${verify_redirect}" | python3 -c "
+import sys, urllib.parse
+url = sys.stdin.read().strip()
+parsed = urllib.parse.urlparse(url)
+params = urllib.parse.parse_qs(parsed.query)
+print(params.get('redirect_uri', [''])[0])
+" 2>/dev/null || true)
+                fi
+
+                if [ -n "${verify_uri}" ] && [ "${verify_uri}" = "${new_callback_url}" ]; then
+                    log_success "  BFF redirect_uri confirmed: ${verify_uri}"
+                else
+                    log_warn "  Health OK but redirect_uri not yet updated: ${verify_uri:-<empty>}"
+                    log_warn "  Expected: ${new_callback_url}"
+                    log_warn "  Old task may still be in the target group — waiting..."
+                    retries=$((retries + 1))
+                    if [ ${retries} -lt ${max_retries} ]; then
+                        sleep 15
+                        continue
+                    fi
+                fi
+
+                # Wait for the ALB deregistration delay (30s configured in CDK)
+                # to ensure the old task is fully drained and no longer serving
+                # requests. Without this, the old task (with a different cookie
+                # encryption key) may handle the OAuth callback, producing a
+                # cookie that the new task cannot unseal.
+                log_info "  Waiting 45s for old task deregistration to complete..."
+                sleep 45
+                log_info "  Deregistration wait complete — only new task should be serving"
                 return 0
             fi
             retries=$((retries + 1))
@@ -494,11 +533,14 @@ main() {
     log_info "Frontend responded with HTTP ${response_code}"
 
     # --- Patch App API CORS to allow requests from the CloudFront origin ---
+    # Only patch BFF env vars if they're still set to localhost defaults.
+    # When CDK configured a custom domain, the BFF env vars are already correct
+    # and we only need to add the base_url to CORS_ORIGINS.
     log_info "Patching App API env vars (CORS, BFF redirect URLs) for CloudFront..."
     patch_app_api_cors "${base_url}"
 
-    # --- Ensure Cognito allows the dynamic CloudFront callback URL ---
-    log_info "Patching Cognito app client with CloudFront callback URL..."
+    # --- Ensure Cognito allows the callback URL ---
+    log_info "Patching Cognito app client with callback URL..."
     patch_cognito_callback_urls "${base_url}"
 
     # --- Seed bootstrap data (models, tools, roles, quotas) ---
@@ -551,10 +593,15 @@ main() {
 
     # --- Verify BFF auth configuration ---
     # Smoke-test the BFF login redirect to confirm the patched env vars are
-    # active. The BFF's /auth/login should 302 to Cognito with a redirect_uri
-    # pointing at the CloudFront-fronted callback URL.
+    # active. We verify BOTH through the ALB directly AND through CloudFront.
+    # The CloudFront verification is critical: the Playwright tests go through
+    # CloudFront, so we must confirm the patched task is actually serving
+    # requests that arrive via CloudFront's /api/* behavior.
     if [ -n "${alb_url}" ] && [ "${alb_url}" != "None" ]; then
-        log_info "Verifying BFF auth login redirect..."
+        local expected_callback="${base_url}/api/auth/callback"
+
+        # --- Verify via ALB (direct) ---
+        log_info "Verifying BFF auth login redirect (via ALB)..."
         local login_redirect
         login_redirect=$(curl -s -o /dev/null -w "%{redirect_url}" \
             "${alb_url}/auth/login" --max-time 10 || true)
@@ -571,16 +618,93 @@ params = urllib.parse.parse_qs(parsed.query)
 print(params.get('redirect_uri', [''])[0])
 ")
                 log_info "  redirect_uri in authorize request: ${actual_redirect_uri}"
-                local expected_callback="${base_url}/api/auth/callback"
                 if [ "${actual_redirect_uri}" != "${expected_callback}" ]; then
                     log_error "  MISMATCH! Expected: ${expected_callback}"
                     log_error "  BFF_AUTH_CALLBACK_URL patch may not have taken effect."
-                    log_error "  This will cause token exchange failures after Cognito login."
+                    log_error "  This will cause cookies to be set on the wrong domain."
                 fi
             fi
         else
             log_warn "  Could not verify BFF login redirect (no redirect URL captured)"
         fi
+
+        # --- Verify via CloudFront (the path Playwright actually takes) ---
+        # This is the critical check. The browser goes through CloudFront, so
+        # we must confirm that CloudFront is routing to the NEW task (with the
+        # patched BFF_AUTH_CALLBACK_URL). If the old task is still draining or
+        # CloudFront has a stale connection, this will catch it.
+        log_info "Verifying BFF auth login redirect (via CloudFront)..."
+        local cf_retries=0
+        local cf_max_retries=12
+        local cf_verified=false
+        while [ ${cf_retries} -lt ${cf_max_retries} ]; do
+            local cf_login_redirect
+            cf_login_redirect=$(curl -s -o /dev/null -w "%{redirect_url}" \
+                "${base_url}/api/auth/login" --max-time 15 || true)
+
+            # Also capture the HTTP status code for diagnostics
+            local cf_status_code
+            cf_status_code=$(curl -s -o /dev/null -w "%{http_code}" \
+                "${base_url}/api/auth/login" --max-time 15 || echo "000")
+
+            if [ -n "${cf_login_redirect}" ] && echo "${cf_login_redirect}" | grep -qF "redirect_uri="; then
+                local cf_actual_redirect_uri
+                cf_actual_redirect_uri=$(echo "${cf_login_redirect}" | python3 -c "
+import sys, urllib.parse
+url = sys.stdin.read().strip()
+parsed = urllib.parse.urlparse(url)
+params = urllib.parse.parse_qs(parsed.query)
+print(params.get('redirect_uri', [''])[0])
+")
+                if [ "${cf_actual_redirect_uri}" = "${expected_callback}" ]; then
+                    log_success "  CloudFront redirect_uri verified: ${cf_actual_redirect_uri}"
+                    cf_verified=true
+                    break
+                else
+                    log_warn "  CloudFront redirect_uri mismatch (attempt $((cf_retries + 1))/${cf_max_retries}): got ${cf_actual_redirect_uri}"
+                    log_warn "  Expected: ${expected_callback}"
+                    log_warn "  Old task may still be draining — retrying in 10s..."
+                fi
+            elif [ -n "${cf_login_redirect}" ]; then
+                # Got a redirect, but not to Cognito: likely an ALB HTTP→HTTPS redirect.
+                # This indicates CloudFront is connecting to the ALB over HTTP and getting
+                # a 301 redirect to HTTPS instead of reaching the BFF directly.
+                log_warn "  CloudFront /api/auth/login returned HTTP ${cf_status_code} redirect to: ${cf_login_redirect:0:120}"
+                log_warn "  This looks like an ALB HTTP→HTTPS redirect. CloudFront may be using HTTP_ONLY protocol."
+                log_warn "  Fix: Ensure CDK_CERTIFICATE_ARN is set when deploying FrontendStack so CloudFront uses HTTPS to ALB."
+            else
+                log_warn "  CloudFront /api/auth/login returned HTTP ${cf_status_code} with no redirect (attempt $((cf_retries + 1))/${cf_max_retries}) — retrying in 10s..."
+            fi
+
+            cf_retries=$((cf_retries + 1))
+            if [ ${cf_retries} -lt ${cf_max_retries} ]; then
+                sleep 10
+            fi
+        done
+
+        if [ "${cf_verified}" = "false" ]; then
+            log_error "  CRITICAL: CloudFront did not return the expected redirect_uri after ${cf_max_retries} attempts (old task may still be serving)."
+            log_error "  The OAuth callback will land on the wrong domain and cookies will not work."
+            log_error "  This is the likely root cause of the 'cookies on wrong domain' E2E failure."
+            # Don't exit — let the tests run so we get diagnostic output from Playwright
+        fi
+
+        # Verify /auth/session returns 401 (not 500/503) when no cookie is sent.
+        # A 500/503 would indicate the BFF middleware or JWT validator is misconfigured.
+        log_info "Verifying BFF /auth/session endpoint is functional..."
+        local session_status
+        session_status=$(curl -s -o /dev/null -w "%{http_code}" \
+            "${alb_url}/auth/session" --max-time 10 || echo "000")
+        if [ "${session_status}" = "401" ]; then
+            log_info "  /auth/session returns 401 (expected — no cookie sent)"
+        else
+            log_error "  /auth/session returned HTTP ${session_status} (expected 401)"
+            log_error "  This suggests the BFF middleware or JWT validator is broken."
+            # Fetch the response body for diagnostics
+            local session_body
+            session_body=$(curl -s "${alb_url}/auth/session" --max-time 10 || true)
+            log_error "  Response body: ${session_body:0:200}"
+        fi
     fi
 
     # --- Change to frontend directory ---
diff --git a/scripts/stack-artifacts/build-cdk.sh b/scripts/stack-artifacts/build-cdk.sh
new file mode 100755
index 00000000..c9cc3f05
--- /dev/null
+++ b/scripts/stack-artifacts/build-cdk.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+set -euo pipefail
+
+# Script: Build CDK Code for Artifacts Stack
+# Description: Compiles TypeScript CDK code to JavaScript.
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+log_info() { echo "[INFO] $1"; }
+log_error() { echo "[ERROR] $1" >&2; }
+log_success() { echo "[SUCCESS] $1"; }
+
+main() {
+    log_info "Building Artifacts Stack CDK code..."
+
+    cd "${PROJECT_ROOT}/infrastructure"
+
+    if [ ! -d "node_modules" ]; then
+        log_error "node_modules not found. Run install.sh first."
+        exit 1
+    fi
+
+    log_info "Compiling TypeScript..."
+    npm run build
+
+    log_success "Artifacts Stack CDK build completed"
+}
+
+main "$@"
diff --git a/scripts/stack-artifacts/deploy.sh b/scripts/stack-artifacts/deploy.sh
new file mode 100755
index 00000000..4ccd5e8c
--- /dev/null
+++ b/scripts/stack-artifacts/deploy.sh
@@ -0,0 +1,62 @@
+#!/bin/bash
+
+#============================================================
+# Artifacts Stack - Deploy
+#
+# Deploys the Artifacts Stack (DDB, S3, CloudFront, Lambda).
+#
+# Deploy order: Infrastructure → Artifacts → (Inference API, App API,
+# Frontend). Parallel-safe with RAG Ingestion and Fine-Tuning.
+#============================================================
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+source "${PROJECT_ROOT}/scripts/common/load-env.sh"
+source "${PROJECT_ROOT}/scripts/common/recover-stack.sh"
+
+log_success() {
+    echo -e "${GREEN}[SUCCESS]${NC} $1"
+}
+
+if [ "${CDK_ARTIFACTS_ENABLED:-false}" != "true" ]; then
+    log_info "CDK_ARTIFACTS_ENABLED is not 'true' — Artifacts Stack is disabled; skipping deploy."
+    exit 0
+fi
+
+log_info "Deploying Artifacts Stack..."
+cd "${PROJECT_ROOT}/infrastructure"
+
+if [ ! -d "node_modules" ]; then
+    log_info "node_modules not found in CDK directory. Installing dependencies..."
+    npm ci
+fi
+
+# Recover from DELETE_FAILED state if a previous teardown left the stack broken.
+recover_delete_failed_stack "${CDK_PROJECT_PREFIX}-ArtifactsStack"
+
+CONTEXT_PARAMS=$(build_cdk_context_params)
+
+# Prefer the pre-synthesized template when available (CI path) so deploy
+# matches exactly what was reviewed in cdk diff.
+if [ -d "cdk.out" ] && [ -f "cdk.out/ArtifactsStack.template.json" ]; then
+    log_info "Using pre-synthesized template from cdk.out/"
+    eval "cdk deploy ArtifactsStack ${CONTEXT_PARAMS} \
+        --app \"cdk.out/\" \
+        --require-approval never \
+        --outputs-file artifacts-outputs.json"
+else
+    log_info "Synthesizing and deploying in one step..."
+    eval "cdk deploy ArtifactsStack ${CONTEXT_PARAMS} \
+        --require-approval never \
+        --outputs-file artifacts-outputs.json"
+fi
+
+log_success "Artifacts Stack deployed"
+
+if [ -f "artifacts-outputs.json" ]; then
+    log_info "Stack outputs:"
+    cat artifacts-outputs.json
+fi
diff --git a/scripts/stack-artifacts/install.sh b/scripts/stack-artifacts/install.sh
new file mode 100755
index 00000000..f54ece31
--- /dev/null
+++ b/scripts/stack-artifacts/install.sh
@@ -0,0 +1,52 @@
+#!/bin/bash
+set -euo pipefail
+
+# Script: Install Dependencies for Artifacts Stack
+# Description: Installs Node.js dependencies for CDK synthesis and deployment.
+# Note: The artifact render Lambda is pure Python with no runtime deps in the
+# scaffold; CDK bundles handler.py directly. No Docker or pip step required.
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+log_info() { echo "[INFO] $1"; }
+log_error() { echo "[ERROR] $1" >&2; }
+log_success() { echo "[SUCCESS] $1"; }
+
+main() {
+    log_info "Installing Artifacts Stack dependencies..."
+
+    cd "${PROJECT_ROOT}/infrastructure"
+
+    if [ ! -f "package.json" ]; then
+        log_error "package.json not found in ${PROJECT_ROOT}/infrastructure"
+        exit 1
+    fi
+
+    if ! command -v node &> /dev/null; then
+        log_error "Node.js is not installed. Please install Node.js 18 or higher."
+        exit 1
+    fi
+
+    NODE_VERSION=$(node --version)
+    log_info "Using Node.js ${NODE_VERSION}"
+
+    if [ -f "package-lock.json" ]; then
+        log_info "Running npm ci (clean install from package-lock.json)..."
+        npm ci
+    else
+        log_error "package-lock.json not found. Cannot run npm ci."
+        exit 1
+    fi
+
+    if npm list aws-cdk-lib &> /dev/null; then
+        log_success "aws-cdk-lib installed successfully"
+    else
+        log_error "aws-cdk-lib installation verification failed"
+        exit 1
+    fi
+
+    log_success "All Artifacts Stack dependencies installed successfully!"
+}
+
+main "$@"
diff --git a/scripts/stack-artifacts/synth.sh b/scripts/stack-artifacts/synth.sh
new file mode 100755
index 00000000..ebc0751b
--- /dev/null
+++ b/scripts/stack-artifacts/synth.sh
@@ -0,0 +1,46 @@
+#!/bin/bash
+
+#============================================================
+# Artifacts Stack - Synthesize
+#
+# Synthesizes the Artifacts Stack CloudFormation template.
+# Skips (with an info log) unless CDK_ARTIFACTS_ENABLED is 'true', so the
+# workflow can run unconditionally before the feature is turned on.
+#============================================================
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+source "${PROJECT_ROOT}/scripts/common/load-env.sh"
+
+log_success() {
+    echo -e "${GREEN}[SUCCESS]${NC} $1"
+}
+
+if [ "${CDK_ARTIFACTS_ENABLED:-false}" != "true" ]; then
+    log_info "CDK_ARTIFACTS_ENABLED is not 'true' — Artifacts Stack is disabled; skipping synth."
+    exit 0
+fi
+
+log_info "Synthesizing Artifacts Stack CloudFormation template..."
+cd "${PROJECT_ROOT}/infrastructure"
+
+if [ ! -d "node_modules" ]; then
+    log_info "node_modules not found in CDK directory. Installing dependencies..."
+    npm ci
+fi
+
+log_info "Running CDK synth for ArtifactsStack..."
+
+CONTEXT_PARAMS=$(build_cdk_context_params)
+
+eval "cdk synth ArtifactsStack ${CONTEXT_PARAMS} --output \"${PROJECT_ROOT}/infrastructure/cdk.out\""
+
+log_success "Artifacts Stack CloudFormation template synthesized successfully"
+
+if [ -d "${PROJECT_ROOT}/infrastructure/cdk.out" ]; then
+    log_info "Synthesized stacks:"
+    ls -lh "${PROJECT_ROOT}/infrastructure/cdk.out"/*.template.json 2>/dev/null || log_info "No template files found"
+fi
diff --git a/scripts/stack-artifacts/test-cdk.sh b/scripts/stack-artifacts/test-cdk.sh
new file mode 100755
index 00000000..eea3c4ad
--- /dev/null
+++ b/scripts/stack-artifacts/test-cdk.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+#============================================================
+# Artifacts Stack - Test CDK
+#
+# Compares the synthesized CloudFormation template with the deployed stack via cdk diff.
+#============================================================
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+source "${PROJECT_ROOT}/scripts/common/load-env.sh"
+
+log_success() {
+    echo -e "${GREEN}[SUCCESS]${NC} $1"
+}
+
+if [ "${CDK_ARTIFACTS_ENABLED:-false}" != "true" ]; then
+    log_info "CDK_ARTIFACTS_ENABLED is not 'true' — Artifacts Stack is disabled; skipping test."
+    exit 0
+fi
+
+log_info "Validating synthesized CloudFormation template..."
+cd "${PROJECT_ROOT}/infrastructure"
+
+if [ ! -d "cdk.out" ] || [ ! -f "cdk.out/ArtifactsStack.template.json" ]; then
+    log_error "Synthesized template not found. Run synth.sh first."
+    exit 1
+fi
+
+log_info "Running cdk diff to compare synthesized template with deployed stack..."
+cdk diff ArtifactsStack --app "cdk.out/"
+
+log_success "CloudFormation template validation completed"
diff --git a/scripts/stack-frontend/build.sh b/scripts/stack-frontend/build.sh
index c0feb5e4..d75d437f 100644
--- a/scripts/stack-frontend/build.sh
+++ b/scripts/stack-frontend/build.sh
@@ -144,6 +144,12 @@ if [ -d "dist" ]; then
     rm -rf dist
 fi
 
+# Bake the real version into src/version.ts from the root VERSION file.
+# Invoked explicitly because we call `ng build` directly below, which bypasses
+# the npm `prebuild` lifecycle hook that normally runs gen-version.js.
+log_info "Generating version from root VERSION file..."
+node scripts/gen-version.js
+
 # Build the Angular application
 log_info "Running: ng build --configuration ${BUILD_CONFIG}"
 ./node_modules/.bin/ng build --configuration "${BUILD_CONFIG}"
diff --git a/scripts/stack-frontend/deploy-cdk.sh b/scripts/stack-frontend/deploy-cdk.sh
index 4a80b70a..fb703916 100644
--- a/scripts/stack-frontend/deploy-cdk.sh
+++ b/scripts/stack-frontend/deploy-cdk.sh
@@ -122,6 +122,7 @@ else
         --context vpcCidr="${CDK_VPC_CIDR}" \
         --context infrastructureHostedZoneDomain="${CDK_HOSTED_ZONE_DOMAIN}" \
         --context domainName="${CDK_DOMAIN_NAME}" \
+        --context certificateArn="${CDK_CERTIFICATE_ARN:-}" \
         --context frontend.certificateArn="${CDK_FRONTEND_CERTIFICATE_ARN}" \
         --context frontend.bucketName="${CDK_FRONTEND_BUCKET_NAME}" \
         --context frontend.enabled="${CDK_FRONTEND_ENABLED}" \
diff --git a/scripts/stack-frontend/synth.sh b/scripts/stack-frontend/synth.sh
index d0a22087..72220b4a 100644
--- a/scripts/stack-frontend/synth.sh
+++ b/scripts/stack-frontend/synth.sh
@@ -45,6 +45,7 @@ cdk synth FrontendStack \
     --context vpcCidr="${CDK_VPC_CIDR}" \
     --context infrastructureHostedZoneDomain="${CDK_HOSTED_ZONE_DOMAIN}" \
     --context domainName="${CDK_DOMAIN_NAME}" \
+    --context certificateArn="${CDK_CERTIFICATE_ARN:-}" \
     --context frontend.certificateArn="${CDK_FRONTEND_CERTIFICATE_ARN}" \
     --context frontend.bucketName="${CDK_FRONTEND_BUCKET_NAME}" \
     --context frontend.enabled="${CDK_FRONTEND_ENABLED}" \
diff --git a/scripts/stack-mcp-sandbox/build-cdk.sh b/scripts/stack-mcp-sandbox/build-cdk.sh
new file mode 100755
index 00000000..4d2eec76
--- /dev/null
+++ b/scripts/stack-mcp-sandbox/build-cdk.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+set -euo pipefail
+
+# Script: Build CDK Code for MCP Sandbox Stack
+# Description: Compiles TypeScript CDK code to JavaScript.
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+log_info() { echo "[INFO] $1"; }
+log_error() { echo "[ERROR] $1" >&2; }
+log_success() { echo "[SUCCESS] $1"; }
+
+main() {
+    log_info "Building MCP Sandbox Stack CDK code..."
+
+    cd "${PROJECT_ROOT}/infrastructure"
+
+    if [ ! -d "node_modules" ]; then
+        log_error "node_modules not found. Run install.sh first."
+        exit 1
+    fi
+
+    log_info "Compiling TypeScript..."
+    npm run build
+
+    log_success "MCP Sandbox Stack CDK build completed"
+}
+
+main "$@"
diff --git a/scripts/stack-mcp-sandbox/deploy.sh b/scripts/stack-mcp-sandbox/deploy.sh
new file mode 100755
index 00000000..f34862d3
--- /dev/null
+++ b/scripts/stack-mcp-sandbox/deploy.sh
@@ -0,0 +1,64 @@
+#!/bin/bash
+
+#============================================================
+# MCP Sandbox Stack - Deploy
+#
+# Deploys the MCP Sandbox Stack (S3, CloudFront, Route53) that serves the
+# MCP Apps sandbox-proxy shell at mcp-sandbox.{domain}.
+#
+# Deploy tier 1: reads no cross-stack SSM. Parallel-safe with Artifacts,
+# RAG Ingestion, Gateway, and Fine-Tuning. Inert until the SPA wiring
+# (PR #4) and MCP_APPS_HOST_ENABLED (PR #7) land.
+#============================================================
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+source "${PROJECT_ROOT}/scripts/common/load-env.sh"
+source "${PROJECT_ROOT}/scripts/common/recover-stack.sh"
+
+log_success() {
+    echo -e "${GREEN}[SUCCESS]${NC} $1"
+}
+
+if [ "${CDK_MCP_SANDBOX_ENABLED:-false}" != "true" ]; then
+    log_info "CDK_MCP_SANDBOX_ENABLED is not 'true' — MCP Sandbox Stack is disabled; skipping deploy."
+    exit 0
+fi
+
+log_info "Deploying MCP Sandbox Stack..."
+cd "${PROJECT_ROOT}/infrastructure"
+
+if [ ! -d "node_modules" ]; then
+    log_info "node_modules not found in CDK directory. Installing dependencies..."
+    npm ci
+fi
+
+# Recover from DELETE_FAILED state if a previous teardown left the stack broken.
+recover_delete_failed_stack "${CDK_PROJECT_PREFIX}-McpSandboxStack"
+
+CONTEXT_PARAMS=$(build_cdk_context_params)
+
+# Prefer the pre-synthesized template when available (CI path) so deploy
+# matches exactly what was reviewed in cdk diff.
+if [ -d "cdk.out" ] && [ -f "cdk.out/McpSandboxStack.template.json" ]; then
+    log_info "Using pre-synthesized template from cdk.out/"
+    eval "cdk deploy McpSandboxStack ${CONTEXT_PARAMS} \
+        --app \"cdk.out/\" \
+        --require-approval never \
+        --outputs-file mcp-sandbox-outputs.json"
+else
+    log_info "Synthesizing and deploying in one step..."
+    eval "cdk deploy McpSandboxStack ${CONTEXT_PARAMS} \
+        --require-approval never \
+        --outputs-file mcp-sandbox-outputs.json"
+fi
+
+log_success "MCP Sandbox Stack deployed"
+
+if [ -f "mcp-sandbox-outputs.json" ]; then
+    log_info "Stack outputs:"
+    cat mcp-sandbox-outputs.json
+fi
diff --git a/scripts/stack-mcp-sandbox/install.sh b/scripts/stack-mcp-sandbox/install.sh
new file mode 100755
index 00000000..3b911428
--- /dev/null
+++ b/scripts/stack-mcp-sandbox/install.sh
@@ -0,0 +1,52 @@
+#!/bin/bash
+set -euo pipefail
+
+# Script: Install Dependencies for MCP Sandbox Stack
+# Description: Installs Node.js dependencies for CDK synthesis and deployment.
+# Note: the static proxy shell (assets/mcp-sandbox/) is plain HTML/JS bundled
+# by CDK BucketDeployment — no Docker or pip step required.
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+log_info() { echo "[INFO] $1"; }
+log_error() { echo "[ERROR] $1" >&2; }
+log_success() { echo "[SUCCESS] $1"; }
+
+main() {
+    log_info "Installing MCP Sandbox Stack dependencies..."
+
+    cd "${PROJECT_ROOT}/infrastructure"
+
+    if [ ! -f "package.json" ]; then
+        log_error "package.json not found in ${PROJECT_ROOT}/infrastructure"
+        exit 1
+    fi
+
+    if ! command -v node &> /dev/null; then
+        log_error "Node.js is not installed. Please install Node.js 18 or higher."
+        exit 1
+    fi
+
+    NODE_VERSION=$(node --version)
+    log_info "Using Node.js ${NODE_VERSION}"
+
+    if [ -f "package-lock.json" ]; then
+        log_info "Running npm ci (clean install from package-lock.json)..."
+        npm ci
+    else
+        log_error "package-lock.json not found. Cannot run npm ci."
+        exit 1
+    fi
+
+    if npm list aws-cdk-lib &> /dev/null; then
+        log_success "aws-cdk-lib installed successfully"
+    else
+        log_error "aws-cdk-lib installation verification failed"
+        exit 1
+    fi
+
+    log_success "All MCP Sandbox Stack dependencies installed successfully!"
+}
+
+main "$@"
diff --git a/scripts/stack-mcp-sandbox/synth.sh b/scripts/stack-mcp-sandbox/synth.sh
new file mode 100755
index 00000000..c6eead5a
--- /dev/null
+++ b/scripts/stack-mcp-sandbox/synth.sh
@@ -0,0 +1,46 @@
+#!/bin/bash
+
+#============================================================
+# MCP Sandbox Stack - Synthesize
+#
+# Synthesizes the MCP Sandbox Stack CloudFormation template.
+# Exits 0 with an info message unless CDK_MCP_SANDBOX_ENABLED is 'true', so
+# the workflow can run unconditionally before the feature is turned on.
+#============================================================
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+source "${PROJECT_ROOT}/scripts/common/load-env.sh"
+
+log_success() {
+    echo -e "${GREEN}[SUCCESS]${NC} $1"
+}
+
+if [ "${CDK_MCP_SANDBOX_ENABLED:-false}" != "true" ]; then
+    log_info "CDK_MCP_SANDBOX_ENABLED is not 'true' — MCP Sandbox Stack is disabled; skipping synth."
+    exit 0
+fi
+
+log_info "Synthesizing MCP Sandbox Stack CloudFormation template..."
+cd "${PROJECT_ROOT}/infrastructure"
+
+if [ ! -d "node_modules" ]; then
+    log_info "node_modules not found in CDK directory. Installing dependencies..."
+    npm ci
+fi
+
+log_info "Running CDK synth for McpSandboxStack..."
+
+CONTEXT_PARAMS=$(build_cdk_context_params)
+
+eval "cdk synth McpSandboxStack ${CONTEXT_PARAMS} --output \"${PROJECT_ROOT}/infrastructure/cdk.out\""
+
+log_success "MCP Sandbox Stack CloudFormation template synthesized successfully"
+
+if [ -d "${PROJECT_ROOT}/infrastructure/cdk.out" ]; then
+    log_info "Synthesized stacks:"
+    ls -lh "${PROJECT_ROOT}/infrastructure/cdk.out"/*.template.json 2>/dev/null || log_info "No template files found"
+fi
diff --git a/scripts/stack-mcp-sandbox/test-cdk.sh b/scripts/stack-mcp-sandbox/test-cdk.sh
new file mode 100755
index 00000000..685ab463
--- /dev/null
+++ b/scripts/stack-mcp-sandbox/test-cdk.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+#============================================================
+# MCP Sandbox Stack - Test CDK
+#
+# Compares the pre-synthesized template in cdk.out/ with the deployed stack via cdk diff.
+#============================================================
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+source "${PROJECT_ROOT}/scripts/common/load-env.sh"
+
+log_success() {
+    echo -e "${GREEN}[SUCCESS]${NC} $1"
+}
+
+if [ "${CDK_MCP_SANDBOX_ENABLED:-false}" != "true" ]; then
+    log_info "CDK_MCP_SANDBOX_ENABLED is not 'true' — MCP Sandbox Stack is disabled; skipping test."
+    exit 0
+fi
+
+log_info "Validating synthesized CloudFormation template..."
+cd "${PROJECT_ROOT}/infrastructure"
+
+if [ ! -d "cdk.out" ] || [ ! -f "cdk.out/McpSandboxStack.template.json" ]; then
+    log_error "Synthesized template not found. Run synth.sh first."
+    exit 1
+fi
+
+log_info "Running cdk diff to compare synthesized template with deployed stack..."
+cdk diff McpSandboxStack --app "cdk.out/"
+
+log_success "CloudFormation template validation completed"