Release/1.0.0#506

Merged
colinmxs merged 1353 commits into main from release/1.0.0
Jun 24, 2026
Conversation

@colinmxs

Contributor

see release notes

Oscar Filson and others added 30 commits May 5, 2026 14:43
Co-authored-by: colinmxs <colinmxs@users.noreply.github.com>
* fix: resolve open CodeQL alerts

- Remove user-controlled values from log statements (py/log-injection)
- Remove unused imports: lambda, lambdaEventSources, sns, events, targets (js/unused-local-variable)
- Remove unused variable execution_output (py/unused-local-variable)
- Remove dead-code variable result_seen (py/unused-local-variable)
- Rename _period to _ for intentionally unused placeholder (py/unused-local-variable)
- Add explanatory comment to empty except block (py/empty-except)

* fix: upgrade Pygments 2.19.2 -> 2.20.0 (Dependabot alert #71)

Addresses ReDoS vulnerability in GUID matching regex.
Other Dependabot alerts (pillow, aiohttp, cryptography, python-multipart,
vite, postcss, dompurify, hono, lodash-es) are already at fixed versions
in the lockfiles — those alerts are stale and should auto-close on next scan.

* chore(ci): target main branch for CodeQL and Dependabot

- CodeQL workflow: push/PR triggers changed from develop to main
- Dependabot: all four update groups (pip, npm frontend, npm infra,
  github-actions) target main instead of develop

* test: update dependabot config test to expect main branch

Aligns test assertion with the CI config change that retargets Dependabot
from develop to main.

---------

Co-authored-by: colinmxs <colinmxs@users.noreply.github.com>
Bumps [types-aiofiles](https://github.com/python/typeshed) from 25.1.0.20251011 to 25.1.0.20260409.
- [Commits](https://github.com/python/typeshed/commits)

---
updated-dependencies:
- dependency-name: types-aiofiles
  dependency-version: 25.1.0.20260409
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…conversation cost badge, compaction events (#249)

Co-authored-by: colinmxs <colinmxs@users.noreply.github.com>
* feat(login): lava-lamp motion for backdrop blobs

Replace the static circular drift with organic morphing blob shapes that
rise and fall vertically with squish/stretch and gentle rotation. Bumps
blob count from 3 to 5 with offset animation delays so morph cycles don't
deform in lockstep. Honors prefers-reduced-motion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(login): three-tier parallax for backdrop blobs

Restructure the 5 lava blobs into 6 across 3 depth tiers (far/mid/near).
Size, blur, opacity, animation duration, and travel distance all scale
with depth so the velocity contrast reads as parallax: huge soft far
blobs barely budge while small sharp near blobs traverse the viewport.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(first-boot): apply lava-lamp parallax backdrop and frosted card

Mirror the login page's shell so first-boot and login feel like one
system: three-tier parallax blobs, primary-color radial wash, faint grid
overlay, and the frosted-glass card. Class names are reused under
component-scoped styles, so they don't collide with login.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
#254)

Replace the dense badge with a richer attachment renderer in user message
history:

- Images render as an iMessage-style mosaic (1-bubble, 2-col, 1+2 split,
  2x2 grid, 5+ with "+N" overlay) and open in a full-screen lightbox with
  arrow-key navigation.
- Non-image files render as a document-style card: tinted header strip
  with type chip, white "page" body with a folded corner, and filename
  + size footer. Text-based files (txt, md, csv, html) show a real
  content excerpt; binary types (pdf, docx, xls/xlsx) get skeleton lines.

Backend additions to support the UI:

- GET /files/{upload_id}/preview-url — short-lived presigned GET URL,
  scoped to the file owner, used for inline images and the lightbox.
- GET /files/{upload_id}/text-snippet — first 2KB of a text-based file,
  decoded as UTF-8, for the document card content peek.
…tion

- Add spreadsheet_analysis module with factory-produced tools for listing and analyzing tabular data
- Implement make_list_spreadsheets_tool to enumerate CSV/Excel files from knowledge bases and chat attachments
- Implement make_analyze_tool to download files from S3, execute Python analysis via Code Interpreter, and return results
- Add intelligent schema detection with skiprows probing to handle report-style exports with metadata rows
- Implement stderr cleaning to filter pandas/numpy internal frames and show only user-relevant errors
- Add output truncation (10K chars) and error truncation (600 chars) to prevent context window overflow
- Update ToolRegistry integration to inject spreadsheet tools per-request via extra_tools parameter
- Update chat routes (app_api and inference_api) to pass conversation context to tool factories
- Add comprehensive docstrings and logging for debugging file discovery and Code Interpreter execution
- Enables agents to analyze user-provided spreadsheets without manual file handling or external dependencies
- Add S3 read permissions (GetObject, GetObjectVersion) to runtime execution role for assistants documents bucket
- Enable agent's spreadsheet_analysis tool to download tabular KB files (CSV/XLSX) from S3 for Code Interpreter sandbox analysis
- Add S3_ASSISTANTS_DOCUMENTS_BUCKET_NAME environment variable to runtime configuration via SSM parameter
- Update documentation comments to clarify that documents bucket is now accessed at runtime by the agent, not just during ingestion
- Resolves agent failures when attempting spreadsheet analysis due to missing bucket configuration
…mentation

- Add StartCodeInterpreterSession, StopCodeInterpreterSession, GetCodeInterpreter, GetCodeInterpreterSession, and ListCodeInterpreterSessions actions to runtime execution role
- Replace CreateCodeInterpreterSession with StartCodeInterpreterSession to align with AWS API
- Add detailed inline documentation referencing AWS Bedrock Agent Core policy guide
- Scope permissions to this stack's Custom Code Interpreter resource only, removing need for account-wide discovery permissions
…sion navigation

- Add loaded assistant check to prevent re-fetching already-loaded assistant on metadata signal changes
- Prioritize in-memory loaded assistant over query params and session preferences when determining which assistant to use
- Add cross-session navigation detection to clear stale assistant state before new session metadata loads
- Prevent mid-session assistant attachment validation error when query param persists after first message
- Add conditional clearing of assistant state to avoid wiping in-memory assistant on first turn of new sessions
- Improve assistant resolution priority: loaded assistant → query param → session preferences
- Add detailed comments explaining RAG continuation across follow-up messages and state management edge cases
- Fixes issue where assistant would be lost after first message submission or when navigating between sessions (#205)
…d using URL as source of truth

- Fix assistant_id storage in SessionMetadata by updating preferences sub-model instead of top-level model, resolving silent failures under extra="allow"
- Remove redundant assistant_id resolution logic that attempted fallback to session preferences, simplifying to trust URL query parameter
- Update session list to pass assistantId query param when navigating to sessions with attached assistants
- Refactor session page effect to use URL as single source of truth for assistant attachment, eliminating race conditions from metadata fetch timing
- Add self-heal redirect to rebuild URL with assistantId when landing on bare `/s/:id` URLs from bookmarks or legacy links
- Prevents mid-session assistant validation failures on turn 2+ by ensuring preferences.assistant_id is properly persisted and accessible

Fixes #205
- Add comprehensive section on handling missing/disabled tools in main agent system prompt
- Include step-by-step instructions for identifying user intent and suggesting tool enablement
- Provide mapping of common user intents to corresponding tools (spreadsheet analysis, code interpreter, web search, knowledge base)
- Add concrete example response showing how to guide users to enable Spreadsheet Analysis tool
- Improve user experience by directing users to settings panel rather than refusing requests
- Enable graceful degradation with fallback suggestions when tools are unavailable
…files (#262)

Render parsed markdown in the attachment card excerpt instead of raw text,
and open a full-screen modal viewer when a .md card is clicked rather than
opening the raw source in a new tab. Reuses ngx-markdown (already wired up
for assistant messages) and the existing presigned preview-url flow.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Render real first-page thumbnails for PDF attachments instead of the
skeleton mockup. Page rasterization runs in app-api via pypdfium2
(Apache 2.0 / BSD, bundled PDFium binary, no system poppler/ghostscript).

Backend:
- New `ThumbnailRenderer` with a MIME-type dispatcher; PDF only today.
  Class docstring documents the recommended out-of-process design for
  .docx / .xlsx so the dispatcher stays small.
- New `GET /files/{upload_id}/thumbnail` endpoint. Lazy: HEAD-checks for
  a cached `_thumb.png` sibling next to the original, renders + stores
  on miss, returns a short-lived presigned GET URL. 415 for unsupported
  MIME types, 422 for unreadable / corrupt PDFs.
- Render runs in `loop.run_in_executor` so request workers aren't blocked.
- Single-file and session-cascade deletes now also remove the thumbnail
  sibling.
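
The lazy render-on-miss flow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's `ThumbnailRenderer`: the in-memory `store` dict stands in for the object store, `render_first_page` stands in for the real PDF rasterizer, and the exact sibling key format is a guess.

```python
import asyncio


class UnsupportedMimeType(Exception):
    """Hypothetical error a route could map to HTTP 415."""


render_calls = 0


def render_first_page(pdf_bytes: bytes) -> bytes:
    # Stand-in for the real rasterizer; only counts invocations.
    global render_calls
    render_calls += 1
    return b"PNG:" + pdf_bytes[:4]


async def get_thumbnail_key(store: dict, key: str, mime: str) -> str:
    if mime != "application/pdf":
        raise UnsupportedMimeType(mime)
    thumb_key = f"{key}_thumb.png"  # assumed sibling naming
    if thumb_key not in store:  # cache miss: render and store once
        loop = asyncio.get_running_loop()
        # Run the CPU-bound render in the default executor so the
        # event loop is not blocked while it works.
        store[thumb_key] = await loop.run_in_executor(
            None, render_first_page, store[key]
        )
    return thumb_key


async def demo():
    store = {"uploads/u1/report.pdf": b"%PDF-1.7"}
    first = await get_thumbnail_key(store, "uploads/u1/report.pdf", "application/pdf")
    second = await get_thumbnail_key(store, "uploads/u1/report.pdf", "application/pdf")
    return first, second, store


first, second, store = asyncio.run(demo())
```

The second request finds the cached sibling and skips the render entirely.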

Frontend:
- `FileUploadService.getThumbnail()` returns a typed result so callers
  can switch on `ready` / `unsupported` / `unavailable` without parsing
  HTTP errors.
- File attachment badge fetches the thumbnail on mount for PDFs and
  renders it as an `object-cover` image in the card body, suppressing
  the bottom fade. Silently falls back to the existing skeleton on any
  error.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…for Code Interpreter

- Add targeted error hints for XLSX→CSV filename mismatches in sandbox environment
- Implement tolerant file matching for CSV↔XLSX aliasing to prevent retry loops
- Expand analyze_tool docstring with critical guidance on filename vs. sandbox paths
- Add schema footer preservation on errors for better retry context
- Enhance list_spreadsheets_tool with file size and MIME type metadata
- Update system prompt builder to clarify file handling for spreadsheet analysis
- Improve stream processor error handling for Code Interpreter responses
- Add file metadata models and utilities for consistent attachment handling
- Update chat input component to support file metadata in message attachments

Fixes: #206
# Conflicts:
#	backend/pyproject.toml
#	backend/src/apis/inference_api/chat/routes.py
#	backend/src/apis/inference_api/chat/service.py
#	frontend/ai.client/src/app/auth/login/login.page.css
#	frontend/ai.client/src/app/auth/login/login.page.ts
#	frontend/ai.client/src/app/session/session.page.ts
* docs(spec): bugfix requirements for BFF middleware event-loop blocking

Regression surfaced after v1.0.0-beta.24 deploy (commit 258193d). The new
SessionRefreshMiddleware runs sync boto3 (DynamoDB + Cognito) inside async
handlers on the hot path of every cookie-bearing request, on a single
uvicorn worker in a single ECS task, with aligned cache TTL and
sliding-renewal throttle defaults. Page-load fan-out produces ~16
serialized blocking AWS calls per active user per minute.

Spec captures the defect (7 clauses), corrected behavior (7 clauses), and
regression-prevention invariants (11 clauses) to carry into design.

* docs(spec): Add BFF middleware event-loop blocking bugfix design and tasks

- Add comprehensive design specification for SessionRefreshMiddleware event-loop blocking issue
- Document root causes: sync boto3 I/O, missing fan-out coalescing, aligned cache/throttle windows, inline awaited writes
- Include formal bug condition specification with 7 sub-conditions and observable symptoms
- Add detailed glossary of key components and terminology
- Document preservation requirements and public contracts that must remain unchanged
- Add implementation tasks with acceptance criteria and verification steps
- Include deployment configuration changes (CDK worker count, environment variables)
- Provide testing strategy for concurrency, performance, and regression validation

* fix(bff-middleware): Resolve event-loop blocking and fan-out amplification

- Offload SessionRepository boto3 calls via asyncio.to_thread to prevent event-loop blocking
- Offload CognitoRefreshClient.refresh via asyncio.to_thread for non-blocking auth operations
- Add per-session single-flight primitive module to coalesce concurrent refresh requests
- Wire single-flight into SessionRefreshMiddleware._resolve_session to eliminate duplicate work
- Convert _maybe_slide to fire-and-forget DDB write with synchronous cache update
- De-align cache/leeway and throttle windows in config (throttle: 60s → 300s)
- Raise production appApi.desiredCount to 2 for distributed request handling
- Add comprehensive bug condition and preservation property tests
- Update task completion checklist and infrastructure configuration
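
A minimal sketch of the two ideas in this commit, offloading a blocking call with `asyncio.to_thread` and coalescing concurrent same-key callers. The helper name `single_flight`, its signature, and `blocking_lookup` are assumptions made for illustration; the project's actual primitive may differ.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict

# Hypothetical single-flight helper: concurrent callers with the same key
# await one shared in-flight task instead of each starting their own.
_inflight: Dict[str, "asyncio.Task[str]"] = {}


async def single_flight(key: str, loader: Callable[[], Awaitable[str]]) -> str:
    task = _inflight.get(key)
    if task is None:
        task = asyncio.create_task(loader())
        _inflight[key] = task
        # Drop the entry once the work finishes so later calls reload.
        task.add_done_callback(lambda _t: _inflight.pop(key, None))
    return await task


calls = 0


def blocking_lookup() -> str:
    # Stand-in for a synchronous boto3 call.
    global calls
    calls += 1
    time.sleep(0.05)
    return "session-row"


async def load_session() -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_lookup)


async def main() -> list:
    return await asyncio.gather(
        *(single_flight("session-123", load_session) for _ in range(5))
    )


results = asyncio.run(main())
```

Five concurrent callers receive the same row while the blocking lookup runs once.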

* test(bff): poll for fire-and-forget slide-write under slower schedulers

Task 3.5 moved the slide-write DDB call off the request path via
`asyncio.create_task`, but `test_3_4_slide_max_age_matches_on_both_cookies`
and `test_slide_past_throttle_writes_ddb_and_reemits_cookie` still sampled
`update_item_calls` / `touch_last_seen.await_count` immediately after
`TestClient` returned the response. On CI's slower scheduling (Python 3.12
runners), the detached task hadn't run yet, so Hypothesis tripped a
`FlakyFailure` on the 3.4 property strategy.

Fix: poll the counter up to 1s before asserting. The observable external
contract (cookie attributes, Max-Age, response body) is unchanged; only
the internal timing of the DDB write moved, which is exactly what
task 3.5 intends.

* fix(bff): keep strong ref on fire-and-forget slide-write tasks

Task 3.5 dispatched the slide-write via `asyncio.create_task` but discarded
the returned Task reference. Python's docs explicitly warn about this —
without a strong reference, the task can be garbage-collected mid-execution.
On Python 3.12 CI runners this was racing: the preservation test
`test_3_4_slide_max_age_matches_on_both_cookies` saw 0 `update_item` calls
(Hypothesis flagged it as FlakyFailure — failed on first run, passed on
retry).

Fix: hold a set of pending tasks on the middleware instance and attach an
`add_done_callback(self._slide_tasks.discard)` so the set doesn't leak.

This is the canonical pattern from the asyncio docs:
  https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task

Verified locally by running the exact CI test scripts inside the
agentcore-dev container:
  - scripts/stack-app-api/test.sh       -> 2459 passed
  - scripts/stack-inference-api/test.sh -> 2459 passed
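
For reference, the strong-reference pattern described above looks roughly like this. It is a sketch only: `SlideWriter` and `_write` are hypothetical stand-ins for the middleware and its DDB write, while `_slide_tasks` and the `add_done_callback(...discard)` call follow the commit text.

```python
import asyncio


class SlideWriter:
    """Hypothetical stand-in for the middleware's slide-write dispatch."""

    def __init__(self) -> None:
        self._slide_tasks: set = set()
        self.writes: list = []

    async def _write(self, session_id: str) -> None:
        await asyncio.sleep(0)  # stand-in for the DDB write
        self.writes.append(session_id)

    def dispatch(self, session_id: str) -> None:
        task = asyncio.create_task(self._write(session_id))
        # Hold a strong reference so the task cannot be garbage-collected
        # before it runs, and drop it once the task is done.
        self._slide_tasks.add(task)
        task.add_done_callback(self._slide_tasks.discard)


async def demo():
    writer = SlideWriter()
    writer.dispatch("s1")
    pending_during = len(writer._slide_tasks)
    await asyncio.gather(*list(writer._slide_tasks))
    await asyncio.sleep(0)  # give done-callbacks a turn
    return writer.writes, pending_during, len(writer._slide_tasks)


writes, pending_during, pending_after = asyncio.run(demo())
```

The set holds the task while it is pending and is empty again once the write completes, so it does not grow without bound.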

* test(bff): poll inside TestClient context so background task can run

CI was still failing `test_3_4_slide_max_age_matches_on_both_cookies` on
Python 3.12 despite the strong-reference fix in 78891e2. The production
change was correct — the task reference prevents GC. But the test was
polling OUTSIDE the `with TestClient(app)` block, and TestClient's
`__exit__` shuts down the anyio portal (and the event loop) before the
polling even starts. Any pending asyncio.Task on that loop is cancelled
on teardown, never runs, update_item_calls stays 0.

Fix: poll INSIDE the `with` block. If the task hasn't flushed yet,
drive the event loop with a second GET to give the pending task a
chance to run. Same pattern applied to
test_slide_past_throttle_writes_ddb_and_reemits_cookie.

Reproduced the race locally by setting up a Python 3.12 venv inside
the agentcore-dev container (CI's exact version). Ran the full test
suite on both 3.12 and 3.13: 2459 passed on each.

Also includes the code review report written earlier.

---------

Co-authored-by: colinmxs <colinmxs@users.noreply.github.com>
* feat(agents): upgrade strands to 1.39.0 and enable Bedrock prompt caching

Bumps strands-agents 1.37.0 → 1.39.0 and strands-agents-tools 0.5.1 → 0.5.2.
Re-enables CacheConfig(strategy="auto") on the BedrockModel: the original
blocker (strands PR #1438 — cachePoint blocks alongside non-PDF document
attachments) is now included in v1.39.0, so the workaround is no longer
needed. Updates the corresponding model_config test to assert caching is
emitted rather than suppressed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(model-config): defer Bedrock prompt caching enablement

Backs out the CacheConfig(strategy="auto") activation. The SDK-side
blocker (strands PR #1438) is resolved in 1.39.0, so the technical
barrier is gone — but the user-visible cost/badge impact warrants a
separate, scoped rollout. The version bump itself stays.

The deferral comment in model_config.py replaces the outdated
"Bedrock limitation" rationale; the test now documents intentional
deferral instead of the SDK limitation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…mantics (#270)

* fix(token-accounting): correct per-message cost and context-window semantics

Two related bugs were inflating cost and context-% reporting on tool-use
turns:

1. Per-message cost double-count. Strands emits both per-LLM-call
   metadata (each call's tokens) and a final AgentResultEvent whose
   EventLoopMetrics.accumulated_usage is summed across every call in
   the turn. Both were emitted as `metadata` events and routed into
   per_message_metadata[current_assistant_message_index]["usage"] via
   .update(). Because the AgentResult event arrives after every
   message_stop, the index still pointed at the last assistant
   message — so cumulative tokens overwrote that message's per-call
   values, double-counting earlier messages' input tokens when each
   entry was priced and summed.

   Fix: route the result-extracted cumulative on the existing
   `metadata_summary` (turn-summary) track instead of `metadata`. The
   stream_processor main loop now consumes both event types into its
   accumulated_metadata so the final summary still carries true totals.

2. Context-% inflation within a tool turn. Bedrock reports each
   per-LLM-call inputTokens as the FULL context size sent on that
   call. For a 2-call tool turn (call_1.input=1000,
   call_2.input=2500), Strands' accumulated_usage reports 3500 — but
   the actual current context occupancy is 2500. The final SSE
   `usage` field (which drives the context-% badge and compaction
   trigger) was inheriting Strands' summed value via the
   metadata_summary handler in stream_coordinator.

   Fix: stream_coordinator no longer accumulates `metadata_summary`
   into accumulated_metadata. Per-call `metadata` events
   last-write-wins via .update(), so accumulated_metadata.usage
   equals the most recent call's full input = current context.
   Added a CAUTION comment noting AgentResult.context_size /
   EventLoopMetrics.latest_context_size return only `inputTokens`
   (excluding cacheRead/cacheWrite) — under prompt caching they
   under-report by 99%+, so we deliberately sum all three buckets.
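
The arithmetic behind fix 2 can be checked with a small sketch. The 1000 / 2500 figures come from the description above; the usage key names and the cached-call numbers are assumptions chosen for illustration.

```python
# Assumed key names for the three input buckets.
INPUT_BUCKETS = ("inputTokens", "cacheReadInputTokens", "cacheWriteInputTokens")


def full_input(usage: dict) -> int:
    """Sum all three input buckets; missing or None counts as 0."""
    return sum(usage.get(bucket) or 0 for bucket in INPUT_BUCKETS)


# Two LLM calls in one tool turn. Each reports the FULL context it was sent.
calls = [
    {"inputTokens": 1000},
    {"inputTokens": 2500},
]

# Summing across calls counts the first 1000 tokens twice.
summed = sum(full_input(call) for call in calls)

# Last-write-wins via .update() keeps only the most recent call's usage.
accumulated: dict = {}
for call in calls:
    accumulated.update(call)
current_context = full_input(accumulated)

# Under prompt caching most of the context lands in the cache-read bucket,
# so inputTokens alone badly under-reports occupancy (hypothetical numbers).
cached_call = {"inputTokens": 20, "cacheReadInputTokens": 2480}
cached_context = full_input(cached_call)
```

Here `summed` is 3500 while `current_context` is 2500, and the cached call still reports 2500 even though its `inputTokens` is only 20.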

Also folded in: TTFT placeholder of 0 → null. A real time-to-first-token
can never be 0ms, and aggregations need to distinguish absence from a
real value. LatencyMetrics.time_to_first_token is now Optional[int] in
both shared/sessions and app_api/messages models. Frontend stream
parser preserves null instead of coercing; badge component already
hides via truthy check. Existing zero-valued data deserializes fine.

Tooltip on the context-% badge clarified: "Reflects the most recent
turn ... May shrink after a context compaction." Aria-label matches.

Regression tests in test_per_message_cost_attribution.py:
- TestPerMessageAttributionTwoCallTurn (3 cases) — locks the
  metadata vs metadata_summary contract; without the fix,
  per_message_metadata[1].usage = (2300, 130, 2430) instead of the
  expected per-call (1300, 80, 1380).
- TestSummaryAccumulatorAcceptsBothTracks — main loop accumulator
  must consume both tracks for cumulative totals.
- TestStreamCoordinatorContextOccupancy (2 cases) — pin "current
  context" semantic in stream_coordinator and verify the all-three-
  bucket sum (cacheRead/cacheWrite included) matches the most recent
  call's full input.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(costs): add unit tests for CostCalculator math

CostCalculator is the source-of-truth for all USD math, but the existing
costs/ test suite only exercises it transitively through aggregator and
storage tests with mocks. Add a direct test file with 26 cases covering:

- Per-bucket pricing (input/output/cacheRead/cacheWrite) and component
  sums equaling the total
- Cache scenarios (read-only, write-only, mixed) priced against
  Sonnet 4.5 rates so dollar values can be sanity-checked
- Defensive cases: missing pricing keys, None values throughout,
  empty dicts — all degrade to 0 without raising
- calculate_cache_savings correctness and None-tolerance
- validate_pricing / validate_usage required-field predicates
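
The properties these cases pin down can be illustrated with a small sketch. `calculate_cost`, its bucket keys, and the per-million-token rates below are hypothetical; they are not the project's CostCalculator API or real model pricing.

```python
from typing import Mapping, Optional

BUCKETS = ("input", "output", "cacheRead", "cacheWrite")


def calculate_cost(
    usage: Optional[Mapping[str, Optional[int]]],
    pricing: Optional[Mapping[str, Optional[float]]],
) -> dict:
    """Price each bucket at USD per million tokens; gaps degrade to 0."""
    breakdown = {}
    for bucket in BUCKETS:
        tokens = (usage or {}).get(bucket) or 0
        rate = (pricing or {}).get(bucket) or 0.0
        breakdown[bucket] = tokens * rate / 1_000_000
    # The total is defined as the sum of the per-bucket components.
    breakdown["total"] = sum(breakdown[bucket] for bucket in BUCKETS)
    return breakdown


cost = calculate_cost(
    {"input": 1_000_000, "output": 500_000, "cacheRead": None},
    {"input": 2.0, "output": 10.0},  # made-up rates
)
empty = calculate_cost(None, None)
```

Missing pricing keys and `None` values contribute nothing instead of raising, and the components always add up to the total.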

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…oot (#271)

The dark-mode CSS for the auth pages' lava-lamp backdrop and frosted-glass
card never applied: hand-written `html.dark .X` selectors don't match
correctly under Angular's emulated view encapsulation, and ThemeService
(providedIn:'root') was never injected by anything in the pre-auth tree
so the `dark` class wasn't reaching <html> on a cold load.

- Switch the auth-page CSS to `:host-context(html.dark) .X`, the pattern
  the rest of the codebase already uses for component-scoped dark rules.
- Force ThemeService to construct at bootstrap via provideAppInitializer
  so the persisted/system theme is applied to <html> before any route
  renders, including /auth/login and /auth/first-boot on cold load.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…272)

* feat(auth): add SKIP_AUTH=true local-dev bypass with allowlist guard

Adds a single-env-var bypass so unattended local dev (and Claude Code) can
hit protected routes without the Cognito redirect to an external IdP. The
bypass returns a fake admin user from the three auth dependencies in
apis.shared.auth.dependencies; everything else (CSRF middleware, RBAC,
profile cache) flows naturally because no `bff_session` is resolved.

Two safeguards keep the bypass scoped to local dev:

1. Allowlist startup guard in app_api/main.py — refuses to boot when
   SKIP_AUTH=true is paired with any non-localhost entry in CORS_ORIGINS.
   An empty CORS_ORIGINS is also refused. This fails closed for deploy
   targets we haven't anticipated, instead of relying on a blocklist of
   known cloud env vars.

2. CI guard (.github/workflows/skip-auth-guard.yml) — greps CDK source,
   workflows, and Dockerfiles for SKIP_AUTH=true / SKIP_AUTH: true
   patterns and fails the build if any leak into deployed config.

Why an allowlist of CORS origins: CORS_ORIGINS must be set correctly per
environment for the app to function at all, so it's a reliable positive
signal of "this is local dev" — far stronger than enumerating Lambda /
ECS / EKS / App Runner / AgentCore Runtime indicators.
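
A rough sketch of what an allowlist guard of this kind could look like. The function names, the comma-separated CORS_ORIGINS format, and the exact set of localhost hosts are assumptions; the real check in app_api/main.py may differ.

```python
from urllib.parse import urlparse

# Assumed set of hosts that count as local dev.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def check_skip_auth_guard(env: dict) -> None:
    """Raise if SKIP_AUTH=true is paired with anything but localhost origins."""
    if env.get("SKIP_AUTH", "").strip().lower() != "true":
        return  # bypass not requested, nothing to guard
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    if not origins:
        raise RuntimeError("SKIP_AUTH=true requires a localhost-only CORS_ORIGINS")
    for origin in origins:
        # Anything that does not parse to a known local host fails closed.
        if urlparse(origin).hostname not in LOCAL_HOSTS:
            raise RuntimeError(f"SKIP_AUTH=true refused for origin {origin!r}")


def guard_passes(env: dict) -> bool:
    try:
        check_skip_auth_guard(env)
    except RuntimeError:
        return False
    return True


local_ok = guard_passes(
    {"SKIP_AUTH": "true", "CORS_ORIGINS": "http://localhost:4200, http://127.0.0.1:8000"}
)
mixed_refused = not guard_passes(
    {"SKIP_AUTH": "true", "CORS_ORIGINS": "http://localhost:4200,https://app.example.com"}
)
empty_refused = not guard_passes({"SKIP_AUTH": "true"})
no_bypass_ok = guard_passes({"CORS_ORIGINS": "https://app.example.com"})
```

Because the check is a positive match against known local hosts, an unrecognized or malformed origin is refused by default.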

Inference-api is intentionally not bypassed; all SPA traffic flows
through app-api per the BFF pattern, so the single bypass on app-api is
sufficient.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): SHA-pin checkout and use ubuntu-24.04 in skip-auth-guard workflow

Match the supply-chain conventions enforced by
tests/supply_chain/{test_action_pinning,test_runner_pinning}.py: pin
actions/checkout to the canonical repo SHA and replace ubuntu-latest
with ubuntu-24.04.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(auth): defer SKIP_AUTH startup guard to lifespan + add tests

Move the CORS_ORIGINS allowlist check from import-time into lifespan()
so tests that import or reload apis.app_api.main (e.g.
tests/routes/test_pbt_auth_sweep.py) don't trip the guard. The runtime
behaviour is unchanged — uvicorn still invokes lifespan at boot.

Add tests/auth/test_skip_auth.py covering:
- _skip_auth_user(): None when unset/falsey, fake User when truthy,
  honors all SKIP_AUTH_* env overrides.
- All three auth dependencies bypass when enabled, still 401 when not.
- Startup guard accepts every localhost variant, rejects empty /
  unset / non-localhost CORS_ORIGINS.
- The skip-auth-guard.yml regex matches realistic leak strings and
  skips benign ones.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: scrub SKIP_AUTH bleed from local .env in pytest conftest

Tests that reload apis.app_api.main (e.g. test_pbt_auth_sweep.py)
re-run load_dotenv(override=True), which copies SKIP_AUTH=true from a
developer's backend/src/.env into os.environ for the rest of the
process. Downstream auth-aware tests then silently take the bypass
path and return a fake user.

Add a session-wide autouse fixture that delenvs SKIP_AUTH_* per test.
Test-local monkeypatch.setenv still wins (autouse runs first).
Mirrors the existing pattern at tests/apis/shared/oauth/conftest.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): manage SKIP_AUTH env directly so autouse doesn't break sibling fixtures

The monkeypatch-based scrub fixture changed pytest's fixture
dependency graph: tests/apis/app_api/test_connectors_routes.py uses
monkeypatch.setattr(routes, "_agentcore_control_client", lambda) and
relies on its own autouse `_reset_control_client` tearing down AFTER
that monkeypatch reverts. Adding a sibling autouse fixture that also
depends on monkeypatch flipped the teardown order, leaving
`_agentcore_control_client` as a plain lambda when
`_reset_control_client` calls `cache_clear()` on it — 9 errors in CI.

Manage os.environ directly via save/restore in a try/finally so the
new fixture is independent of monkeypatch and doesn't perturb
ordering for tests that compose monkeypatch with their own fixtures.
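
The save/restore approach can be sketched as below. It is written as a plain context manager so it runs without pytest; in a fixture the same body would sit around the `yield`. The variable list is taken from the SKIP_AUTH knobs named in this PR, and the helper name is hypothetical.

```python
import os
from contextlib import contextmanager

SKIP_AUTH_VARS = (
    "SKIP_AUTH",
    "SKIP_AUTH_ROLES",
    "SKIP_AUTH_USER_ID",
    "SKIP_AUTH_EMAIL",
)


@contextmanager
def scrubbed_skip_auth_env():
    # Save and remove any SKIP_AUTH_* values without touching monkeypatch.
    saved = {k: os.environ.pop(k) for k in SKIP_AUTH_VARS if k in os.environ}
    try:
        yield
    finally:
        # Drop anything set in the meantime, then restore what was there.
        for k in SKIP_AUTH_VARS:
            os.environ.pop(k, None)
        os.environ.update(saved)


os.environ["SKIP_AUTH"] = "true"  # simulate bleed from a local .env
with scrubbed_skip_auth_env():
    inside = os.environ.get("SKIP_AUTH")
    os.environ["SKIP_AUTH_EMAIL"] = "dev@example.com"  # set mid-test
after = os.environ.get("SKIP_AUTH")
leaked = os.environ.get("SKIP_AUTH_EMAIL")
os.environ.pop("SKIP_AUTH", None)  # clean up the demo value
```

Since nothing here depends on `monkeypatch`, the helper adds no edge to pytest's fixture graph and cannot reorder other fixtures' teardown.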

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(env): document SKIP_AUTH local-dev bypass in .env.example

Add a DEVELOPMENT SETTINGS section entry covering SKIP_AUTH and its
optional SKIP_AUTH_ROLES / SKIP_AUTH_USER_ID / SKIP_AUTH_EMAIL knobs.
Calls out the boot-time CORS_ORIGINS allowlist, the CI guard workflow,
and the inference-api carve-out so a new dev landing in the file
sees the safety story alongside the feature.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* fix(bff): share AES-256 cookie data key across tasks via Secrets Manager

PR #264 raised app-api desiredCount from 1 → 2 for concurrency slack but
left CookieCodec calling kms:GenerateDataKey on first use per process.
That generates a fresh random AES key per task, so a cookie sealed by
Task A unseals as `bad seal` on Task B — every page-load fan-out under
the new deployment shape becomes a 401 storm. Dev confirmed: /sessions
returns 200 from one task while /permissions, /models, /tools, /quota,
/connectors all return 401 from the other.

This commit moves the data key out of per-process state and into a
shared Secrets Manager secret, bootstrapped once at deploy time.

Infra (CDK):
  - New `BFFCookieDataKeySecret`, encrypted at rest with the existing
    `BFFCookieSigningKey` CMK.
  - Two chained `AwsCustomResource`s bootstrap the wrapped data key on
    Create only: `kms:GenerateDataKey` -> `secretsmanager:PutSecretValue`.
    `outputPaths: ["CiphertextBlob"]` whitelists the field returned to
    CFN so the response Plaintext (the AES key itself) never enters
    CloudFormation state.
  - SSM parameter publishes the secret ARN for app-api to consume.

App-api:
  - `CookieCodec._ensure_cipher` now reads the wrapped blob from Secrets
    Manager, calls `kms:Decrypt(KeyId=BFFCookieSigningKey, ...)` to
    unwrap, and caches the AESGCM cipher as before. KeyId is pinned to
    defend against blob substitution if the secret is ever tampered with.
  - Distinguish infra failure (`CookieDataKeyUnavailable` -> 5xx) from
    decode failure (`CookieDecodeError` -> clear cookie). Empty / non-
    base64 / wrong-size key all surface as infra errors.
  - Drop `kms:GenerateDataKey` from the runtime task role (least
    privilege; runtime no longer needs it). The bootstrap custom
    resource carries its own narrow grant.

Tests:
  - Cross-task seal/unseal regression locked in:
    `test_two_codecs_with_same_wrapped_blob_decrypt_to_the_same_cipher`
    — two CookieCodec instances simulate two ECS tasks; cookie sealed
    on one MUST unseal on the other.
  - New `_ensure_cipher` battery: happy path, KeyId pin, hot-path
    caching, Secrets Manager / KMS failure propagation, empty / bad
    base64 / wrong-size key rejection, missing config -> decode error.
  - Updated test_3_6 preservation contract to match the new code path
    (one Secrets Manager + one KMS Decrypt per process, was: one
    GenerateDataKey).
  - CDK tests for the bootstrap custom resources (KeySpec=AES_256,
    outputPaths whitelisted, narrow IAM grants), the new env var on
    app-api, and the IAM grant changes (Decrypt-only on the CMK,
    GetSecretValue on the data-key secret).
  - Fixed two pre-existing stale resource-count assertions in
    infrastructure-stack tests (16 → 18 DDB tables, 3 → 6 secrets).

* fix(bff): coalesce Cognito refresh across tasks via DDB conditional-write lock

The in-process `single_flight` and `get_session_lock` introduced by
PR #264 only coalesce same-session callers within a single Python
process. Once the cookie-codec fix lands and dev's two app-api tasks
can share cookies again, two tasks under desiredCount: 2 will each
see the same cookie cross the refresh-leeway boundary and each call
`cognito-idp:initiate_auth` with the same refresh token. Cognito
rotates on the winning call; the loser receives `NotAuthorizedException`,
the loser's middleware clears the user's cookie, and the user is
silently logged out.

This commit adds a cross-task lock so exactly one Cognito refresh per
session per leeway window happens across the entire fleet.

Repository (DDB):
  - New `try_acquire_refresh_lock(session_id, owner, lock_ttl_seconds)`:
    conditional UpdateItem that succeeds iff `attribute_not_exists(
    refresh_lock_until) OR refresh_lock_until < :now`. Loser returns
    False; non-condition errors propagate.
  - `update_tokens` gains `expected_lock_owner=...` — when supplied,
    the write conditionally requires the row's `refresh_lock_owner` to
    match (or be absent), and atomically REMOVE-es the lock attrs in
    the same write. ConditionalCheckFailed propagates so a stale leader
    can't stomp on a successor's freshly persisted tokens.
  - `release_refresh_lock(session_id, owner)`: best-effort cleanup for
    the leader-failed path so a peer doesn't have to wait the full TTL
    before retrying. No-op if the lock has TTL'd or another task owns
    it. Other DDB errors logged-and-swallowed.
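
The lock semantics above can be modelled in memory to make the condition concrete. This is an illustrative stand-in, not the repository code: the class, the injected `now` parameter, and the dict-backed rows are assumptions, while the method and attribute names follow the description.

```python
import time
from typing import Dict, Optional


class InMemorySessionTable:
    """Hypothetical in-memory stand-in for the BFF sessions table."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def try_acquire_refresh_lock(
        self,
        session_id: str,
        owner: str,
        lock_ttl_seconds: int,
        now: Optional[float] = None,
    ) -> bool:
        now = time.time() if now is None else now
        row = self.rows.setdefault(session_id, {})
        until = row.get("refresh_lock_until")
        # Mirrors the condition:
        #   attribute_not_exists(refresh_lock_until) OR refresh_lock_until < :now
        if until is not None and until >= now:
            return False  # a peer holds an unexpired lock
        row["refresh_lock_until"] = now + lock_ttl_seconds
        row["refresh_lock_owner"] = owner
        return True

    def release_refresh_lock(self, session_id: str, owner: str) -> None:
        row = self.rows.get(session_id, {})
        # Only the current owner may release; otherwise this is a no-op.
        if row.get("refresh_lock_owner") == owner:
            row.pop("refresh_lock_until", None)
            row.pop("refresh_lock_owner", None)


table = InMemorySessionTable()
leader = table.try_acquire_refresh_lock("s1", "task-a", 30, now=100.0)
contender = table.try_acquire_refresh_lock("s1", "task-b", 30, now=110.0)
after_ttl = table.try_acquire_refresh_lock("s1", "task-b", 30, now=131.0)
table.release_refresh_lock("s1", "task-a")  # stale owner: no effect
owner_now = table.rows["s1"].get("refresh_lock_owner")
```

In DynamoDB the check and the write happen atomically inside one conditional UpdateItem; the model above only reproduces the decision logic.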

Middleware:
  - Two-tier coalescing inside `_resolve_session._loader`:
      1. existing `get_session_lock` (in-process) collapses N concurrent
         same-session callers within one task to one contender.
      2. NEW `try_acquire_refresh_lock` (cross-process via DDB) elects
         exactly one leader across the entire fleet. Followers poll the
         row via `_wait_for_peer_refresh` and adopt the leader's tokens
         (rotation detected by refresh-token mismatch; non-rotation
         detected by access-token mismatch + future-dated exp).
  - Leader path: lock owner threaded through `_persist_refresh` so the
    write is conditional on still-being-leader. ConditionalCheckFailed
    on persist → re-read DDB and adopt the peer's tokens rather than
    invalidating the cache.
  - Cognito refresh failure on leader path: lock is released eagerly
    (best-effort) so peer requests don't have to wait for the full TTL.
  - Configurable `refresh_lock_ttl_seconds` (default 30s) — bounds the
    worst case where a leader crashes mid-refresh.
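
A follower's adoption check, as described above, might look roughly like this. The `Tokens` dataclass, its field names, and `peer_has_refreshed` are illustrative assumptions, not the middleware's actual types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tokens:
    access_token: str
    refresh_token: str
    access_exp: float  # epoch seconds at which the access token expires


def peer_has_refreshed(seen: Tokens, current: Tokens, now: float) -> bool:
    """True when `current` (re-read from the row) reflects a peer's completed refresh."""
    if current.refresh_token != seen.refresh_token:
        return True  # rotation: the leader persisted a new refresh token
    # Non-rotation: same refresh token, but a new access token that is still valid.
    return current.access_token != seen.access_token and current.access_exp > now
```

A follower would poll the row and call a check like this on each read, adopting `current` as soon as it returns True.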

Tests:
  - 8 new repository tests for the lock primitive: acquire on unlocked
    row, contention blocks peer, TTL recovery, distinct-session isolation,
    release-by-owner-only, atomic clear on token persist, condition fails
    when peer owns the lock.
  - 5 new integration-level cross-task tests
    (`test_session_refresh_cross_task.py`) running two `SessionRefreshMiddleware`
    instances over one moto DDB table — covers leader/follower paths,
    follower-polling-then-adopting, lock TTL recovery after dead leader,
    follower-falls-back-terminal when leader is stuck, and the headline
    invariant: two tasks racing in parallel call Cognito at most once.
  - Updated `test_session_refresh_preservation.py`'s `InstrumentedTable`
    to differentiate lock-acquire / token-persist / slide writes so
    `update_item_side_effect` injection only fires on the persist path
    (preserving the original test intent).

No IAM change required: app-api task role already has `dynamodb:UpdateItem`
on the BFF sessions table.
…nerated secret (#274)

PR #273 introduced an `AwsCustomResource`-chained bootstrap
(kms:GenerateDataKey -> secretsmanager:PutSecretValue) to materialize a
wrapped AES-256 data key for cross-task cookie sealing. That design
fails on first stack create with:

    Custom::AWS BFFCookieDataKeyGenerate CREATE_FAILED
    Response object is too long.

Root cause: the AwsCustomResource framework Lambda JSON-stringifies the
AWS-SDK response BEFORE applying `outputPaths`. KMS returns
`CiphertextBlob` as a Uint8Array, which serializes as `{"0":233,"1":18,
...}` — for a ~200-byte ciphertext that's ~1.5 KB, blowing past
CloudFormation's 4 KB response-object limit. Even if it had landed
under the limit, the value threaded into PutSecretValue via
`getResponseField` would have been the JSON-object form, not a base64
ciphertext — runtime base64-decode would have failed at first cookie
seal attempt.
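
To give a feel for the size blow-up, the sketch below reproduces the index-keyed object shape in Python and compares it with base64. The 200 placeholder bytes are an assumption standing in for a real ciphertext, and `json.dumps` with compact separators only approximates what the framework Lambda emits.

```python
import base64
import json

ciphertext = bytes(range(200))  # placeholder bytes, not a real KMS ciphertext

# Shape a stringified Uint8Array takes: an object keyed by index.
object_form = json.dumps(
    {str(i): b for i, b in enumerate(ciphertext)}, separators=(",", ":")
)
base64_form = base64.b64encode(ciphertext).decode("ascii")

print(len(object_form), len(base64_form))
```

The object form comes out several times larger than the base64 form for the same bytes, and it is also not something a base64 decoder can consume.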

Fix: drop the chained custom resources entirely. Use Secrets Manager's
own `generateSecretString` (44-char alphanumeric, ~261 bits of entropy)
and derive the AES-256 key at runtime via SHA-256. Single-shot SHA-256
of a >=256-bit-entropy random input is a sound KDF — the output is
statistically indistinguishable from random for AES-256 use.
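
A minimal sketch of the derivation and the entropy estimate, assuming the secret string is UTF-8 encoded before hashing (the encoding is not stated above). `derive_aes256_key` and `entropy_bits` are hypothetical helper names, and the secret is generated locally rather than by Secrets Manager.

```python
import hashlib
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def derive_aes256_key(secret_string: str) -> bytes:
    """Single-shot SHA-256 of the secret string; the digest is exactly 32 bytes."""
    return hashlib.sha256(secret_string.encode("utf-8")).digest()


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random string of `length` symbols."""
    return length * math.log2(alphabet_size)


# Local stand-in for the 44-character value Secrets Manager would generate.
example_secret = "".join(secrets.choice(ALPHABET) for _ in range(44))
key = derive_aes256_key(example_secret)
```

Because the derivation is deterministic, every task that reads the same secret string ends up with the same 32-byte key, which is what makes cross-task unsealing work.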

Threat model is preserved:
  - Secret is still encrypted at rest with the customer-managed
    `BFFCookieSigningKey` CMK.
  - Reading it requires both `secretsmanager:GetSecretValue` AND
    `kms:Decrypt` on the CMK (Secrets Manager invokes Decrypt on the
    caller's behalf using the secret-ARN encryption context).
  - Runtime never gets `kms:GenerateDataKey`, so a compromised task
    can't seal cookies under a parallel key.
  - Cross-task seal/unseal regression lock
    (`test_two_codecs_with_same_secret_derive_the_same_cipher`) still
    holds.

Infra (CDK):
  - Removed `BFFCookieDataKeyGenerate` and `BFFCookieDataKeyStore`
    AwsCustomResources and their narrow IAM grants.
  - `BFFCookieDataKeySecret` now uses `generateSecretString` directly.
  - Dropped `kms:DescribeKey` from app-api task role; kept `kms:Decrypt`
    (Secrets Manager invokes it on the caller's behalf when reading a
    CMK-encrypted secret).
  - Removed the `AwsCustomResource` import; cleaned up obsolete
    bootstrap-related comments.

App-api:
  - `CookieCodec._ensure_cipher` now reads the secret string from
    Secrets Manager and applies SHA-256 to derive the 32-byte AES-256
    key. No KMS round trip, no per-cold-start `kms:Decrypt` call.
  - `CookieCodec` constructor lost the `kms_client` parameter; only
    `secrets_manager_client` is needed for testing.
  - Updated module docstring and `CookieDataKeyUnavailable` comment.

Tests:
  - `test_cookie.py`: rewrote `_ensure_cipher` test battery for the
    no-KMS path. New `test_ensure_cipher_derived_key_matches_sha256_of_secret`
    pins the KDF — a future change must keep the same derivation, or
    every cookie sealed by an old task fails to unseal on a new task
    after deploy. Cross-task regression lock renamed to
    `test_two_codecs_with_same_secret_derive_the_same_cipher`.
  - `test_session_refresh_preservation.py`: 3.6 contract no longer
    asserts `kms.decrypt.call_count` or KeyId-pinning; only the
    Secrets Manager singleton-call invariant remains.
  - `test_session_refresh_cross_task.py`: comment updated to match new
    vocabulary (data-key secret vs wrapped data key).
  - `infrastructure-stack.test.ts`: dropped bootstrap-CR assertions
    (`generateDataKey` / `putSecretValue`); added negative lock that
    no AwsCustomResource emits those actions, plus a positive
    assertion on the `generateSecretString` shape (PasswordLength: 44,
    ExcludePunctuation, IncludeSpace: false).
  - `app-api-stack.test.ts`: comment-only update for the
    `kms:GenerateDataKey` exclusion; same negative assertion still
    holds under the new design.

Net diff: -153 lines. No more chained custom resources, no
per-cold-start KMS round trip, simpler runtime IAM surface.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
… guard (#275)

Two correctness improvements layered on top of PR #273's cross-task
refresh-lock work.

1. Strict-owner lock release (repository.py)

The post-#273 release condition was:

    attribute_not_exists(refresh_lock_owner) OR refresh_lock_owner = :owner

That has a stale-leader stomp bug:

    Task A acquires the lock.
    Task A's lock TTLs (slow Cognito refresh, ECS eviction, etc.).
    Task B acquires the lock, refreshes, and persists tokens — which
      REMOVEs the lock attrs in the same write.
    Task A returns from Cognito and calls update_tokens with its
      (older) tokens.
    `attribute_not_exists(refresh_lock_owner)` matches — Task A's
      stale tokens overwrite Task B's freshly rotated ones.
    Next request: Cognito rejects Task A's now-revoked refresh token;
      user silently logged out.

Fix: tighten to strictly `refresh_lock_owner = :owner`. The leader
always sets these attrs in `try_acquire_refresh_lock`, so the strict
form is correct in every legitimate flow and surfaces every stale-
leader case as `ConditionalCheckFailedException` for the caller to
re-read and adopt the peer's tokens.
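
The stale-leader sequence can be replayed against a plain dict to compare the two conditions. This is an illustrative model only: `update_tokens` here is a simplified stand-in with an invented `strict` switch, and `ConditionalCheckFailed` stands in for the DynamoDB exception.

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


def update_tokens(row: dict, tokens: str, expected_lock_owner: str, strict: bool) -> None:
    owner = row.get("refresh_lock_owner")
    if strict:
        allowed = owner == expected_lock_owner
    else:
        # Earlier form: attribute_not_exists(refresh_lock_owner) OR owner = :owner
        allowed = owner is None or owner == expected_lock_owner
    if not allowed:
        raise ConditionalCheckFailed(expected_lock_owner)
    row["tokens"] = tokens
    # Persisting tokens clears the lock attributes in the same write.
    row.pop("refresh_lock_owner", None)
    row.pop("refresh_lock_until", None)


def replay_stale_leader(strict: bool) -> str:
    row = {"tokens": "t0", "refresh_lock_owner": "task-a", "refresh_lock_until": 30}
    # Task A's lock expires; Task B takes over, refreshes, and persists.
    row.update(refresh_lock_owner="task-b", refresh_lock_until=60)
    update_tokens(row, "tokens-from-b", "task-b", strict)
    # Task A finally returns from Cognito with older tokens.
    try:
        update_tokens(row, "tokens-from-a", "task-a", strict)
    except ConditionalCheckFailed:
        pass  # the caller would re-read the row and adopt Task B's tokens
    return row["tokens"]
```

With the lax condition the row ends up holding Task A's stale tokens; with the strict one Task B's tokens survive and Task A gets the conditional failure.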

Also adds `try_acquire_refresh_lock` test coverage to lock in that the
acquire path uses `attribute_exists(PK)` so it never creates phantom
rows for sessions that don't exist.

2. Absolute-lifetime guard before refresh (session_refresh.py)

Mirrors the existing `_maybe_slide` short-circuit. If a session is
past `created_at + absolute_lifetime_seconds`, don't burn a Cognito
refresh-token rotation on a session whose row would TTL-evict
immediately after the write — clear the cookie instead. Otherwise
we silently rotate a token we'll never read again.
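
The guard reduces to a timestamp comparison made before any lock or Cognito call. A rough sketch, assuming epoch-second timestamps and a strict "past the cap" boundary; `past_absolute_cap` and `next_action` are hypothetical names, not functions from session_refresh.py.

```python
def past_absolute_cap(
    created_at: float, absolute_lifetime_seconds: float, now: float
) -> bool:
    """True once the session has outlived its absolute lifetime."""
    return now > created_at + absolute_lifetime_seconds


def next_action(created_at: float, absolute_lifetime_seconds: float, now: float) -> str:
    # Check the cap first so no lock is taken and no refresh token is rotated
    # for a session that is about to be evicted anyway.
    if past_absolute_cap(created_at, absolute_lifetime_seconds, now):
        return "clear_cookie"
    return "refresh"
```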

Plus INFO logging on cross-task adoption success so CloudWatch can
answer "how often is cross-task coalescing actually firing?" without
needing a debug deploy.

Tests:
  - test_repository.py: new
    test_update_tokens_rejects_persist_when_peer_already_cleared_the_lock
    locks in the strict-owner condition.
    test_try_acquire_refresh_lock_does_not_create_phantom_row pins
    the acquire-path attribute_exists guard.
  - test_session_refresh_middleware.py: new
    test_refresh_path_past_absolute_cap_clears_cookie_without_calling_cognito
    pins the absolute-lifetime guard ahead of the lock acquisition.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Previously only the SessionService bootstrap path redirected on 401; a
session that expired while in use left the user stranded with a generic
toast (CRUD endpoints) or no feedback (SSE chat stream). Now every 401
flows through SessionService.handleUnauthorized(), which dedupes
concurrent calls and queues a single navigation to /auth/login with a
returnUrl preserved.

Also surfaces session loss proactively rather than waiting for the next
HTTP call to fail:

- Cookie-presence fast-path in bootstrap and recheck — when the JS-readable
  __Host-bff_csrf cookie is gone, the session cookie is gone too (BFF
  sets/clears them together with matching Max-Age), so we skip the
  /auth/session round-trip and bounce straight to login.
- Visibility re-probe in the app shell — on tab refocus, recheck() runs
  the cookie check and falls back to /auth/session, so a session that
  expired while the tab was backgrounded is caught immediately.

Deferred to follow-ups: cross-tab BroadcastChannel coordination, draft
preservation across login redirect, periodic background polling.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
… bootstrap run 2026-05-10 (#278)

Adds two new repo-level skills under .claude/skills/:
- kaizen-research: Friday early-AM external + internal scan (AWS Bedrock/AgentCore,
  Strands, aws-samples reference repo, MCP, frontier models, agent-harness patterns;
  internal git/PR/CI/version-pin signals). Outputs dated research doc + queues ideas
  in docs/kaizen/review-queue.md.
- kaizen-review-prep: Friday late-AM ranked decision agenda. Consumes research +
  open queue + last-week's POC findings (from prior research PR comments) +
  recent merges/CI signal. Every item has Ship/Decline/Defer recommendation.

Both skills open PRs into develop on kaizen/* branches. Web budget soft-target
50/run; subagent fan-out for external sources; explicit handoff contract via
review-queue.md.

This commit also includes the first bootstrap output:
- docs/kaizen/research/2026-05-10.md
- docs/kaizen/reviews/2026-05-10.md
- docs/kaizen/review-queue.md (7 open items)

Top finding: bedrock-agentcore is 3 minor versions behind (1.6.4 -> 1.9.0,
released inside scan window) and our open issues #266/#267 were quietly closed
by Strands v1.37/v1.38 (already in our 1.39 pin from #265). CI failure cluster
(9 nightly + 6 deploy failures since May 6) is the loudest internal signal.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…ative subtraction clarification (#279)

Three additions to the kaizen-research skill (and corresponding refresh of
the 2026-05-10 bootstrap output):

1. **FastMCP** added as source category 4a — tracks upstream releases for
   the externally hosted MCP servers this stack consumes via AgentCore
   Gateway. Not pinned in this repo's pyproject.toml; lives in the MCP
   server repos. Source: https://github.com/jlowin/fastmcp + PyPI.

2. **Library-native subtraction** explicitly named in the Subtraction-first
   philosophy. When upstream ships a capability we built or filed an issue
   for, the win is closing our version and adopting upstream. The 2026-05-10
   bootstrap surfaces a canonical example: Strands v1.37/v1.38 silently
   closed our open issues #266 and #267.

3. **Security posture audit** added as internal source 18a. Snapshots open
   Dependabot alerts, open CodeQL findings (with severity + rule + path),
   open security-labeled issues, recent auth-surface commit churn, and the
   most recent CHANGELOG security block. Cross-references open Dependabot
   alerts against external advisories scan to surface "we already know
   it's hitting us" overlap. Surfaces in the doc as a new "Security
   posture" section between Version-pin lag and Retirement candidates.

The bootstrap output (2026-05-10) was re-walked through the new lenses:
- Security posture section added: 9 open Dependabot alerts (4 high — all
  `fast-uri`, confirmed real by the external advisories scan), 10 open
  CodeQL (3 error-severity `py/log-injection` in real backend paths),
  0 security-labeled issues, 7-commit BFF auth-surface churn.
- 2 new Top-7 proposals: #1 patch `fast-uri` Dependabot alerts; #3 fix
  `py/log-injection` CodeQL findings. Both Low effort, High/Med-High
  impact — they jump to the top of the ranked agenda.
- TL;DR, Take, and Review Protocol updated to reflect security findings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
colinmxs and others added 21 commits June 18, 2026 15:11
* fix(deps): remediate all 22 HIGH Dependabot findings

Bumps vulnerable dependencies across backend, scripts, and frontend to
patched versions. All 22 open HIGH-severity Dependabot alerts addressed.

Backend (pyproject.toml + uv.lock):
- cryptography 47.0.0 -> 48.0.1 (GHSA-537c-gmf6-5ccf)
- starlette 1.0.0 -> 1.3.1 (CVE-2026-48818, CVE-2026-54283)
- python-multipart 0.0.27 -> 0.0.30 (CVE-2026-53539)
- pyjwt[crypto] 2.12.1 -> 2.13.0 (CVE-2026-48526)
- urllib3 pinned 2.7.0 (CVE-2026-44431, CVE-2026-44432)

Scripts (backup-data, restore-data):
- add uv constraint cryptography>=48.0.1 -> 49.0.0 (GHSA-537c-gmf6-5ccf)

Frontend (package.json + package-lock.json):
- @angular/* framework 21.2.11 -> 21.2.17 (CVE-2026-50170/50171/54266/54267/54268)
  (cdk 21.2.14, build/cli 21.2.16 -- latest existing patch for those packages)
- hono override -> 4.12.26 (CVE-2026-54290)
- piscina override -> 5.2.0 (CVE-2026-55388)
- undici override -> 7.28.0 (CVE-2026-9697)
- vite override -> 8.0.16 (CVE-2026-53571)

Verified: backend pytest 3935 passed/3 skipped; frontend build OK;
frontend unit tests 1216 passed. All locked versions >= patched thresholds.

* fix(deps): remediate easy MEDIUM/LOW Dependabot findings

Straightforward in-range/direct dependency bumps for remaining alerts.
Risky or blocked findings deliberately left out (see below).

Backend (pyproject.toml + uv.lock):
- aiohttp 3.13.5 -> 3.14.1
- authlib 1.7.0 -> 1.7.1
- python-multipart 0.0.30 -> 0.0.31
- idna pinned 3.15 (transitive)

Scripts (backup-data, restore-data):
- pytest 8.4.2 -> 9.0.3 (dev)

Frontend (package.json + package-lock.json):
- mermaid 11.14.0 -> 11.15.0 (direct)
- @babel/core override ">=7.29.6 <8.0.0" -> 7.29.7
  (bounded to 7.x; open-ended pulled babel 8 which broke the Angular build)

Infrastructure (package-lock.json, lock-only):
- @babel/core -> 7.29.7 (within range)

Deliberately NOT included (not "easy"):
- esbuild 0.28.1: pinned by vite 8; override risks build breakage (low sev)
- js-yaml 4.2.0: major bump (consumers pin ^3)
- dompurify: one alert has no patched version published
- infra fast-uri (HIGH) / brace-expansion / js-yaml: bundled inside
  aws-cdk-lib@2.251.0/node_modules; npm overrides can't reach them.
  Requires an aws-cdk-lib upgrade -- separate, deliberate change.

Verified: backend pytest 3935 passed/3 skipped; frontend build OK +
1216 unit tests passed; infra tsc build OK + 396 jest tests passed.

* fix(deps): upgrade aws-cdk-lib 2.251.0 -> 2.260.0 to clear bundled CVEs

The remaining infra Dependabot HIGH/MEDIUM alerts were in deps bundled
inside aws-cdk-lib's own node_modules (unreachable by npm overrides):
- fast-uri 3.1.0 -> 3.1.2 (HIGH, 2 advisories) — bundled via ajv
- brace-expansion 5.0.5 -> 5.0.6 (MEDIUM) — bundled via minimatch

Bumping aws-cdk-lib to 2.260.0 (latest 2.x) re-bundles both at patched
versions. constructs 10.6.0 already satisfies the ^10.5.0 peer (unchanged).

Verified: infra tsc build clean; jest 396 passed/18 suites (full
PlatformStack construction + Template.fromStack assertions, no template
regressions). cdk-lib v2 minor bump, backward compatible.
The deploy workflows (backend.yml, platform.yml, frontend-deploy.yml) only
run on push to develop/main and run their test jobs as a pre-deploy gate,
so unit tests never executed on the PR itself. PRs into develop ran only
skip-auth-guard.

Adds .github/workflows/ci.yml triggered on pull_request -> [develop, main]
with three parallel test jobs reusing the existing commands:
- test-backend:  uv sync + uv run pytest tests/
- test-frontend: npm ci + npm run test:ci (vitest)
- test-infra:    npm ci + npx jest

No build/deploy/AWS steps — deploys stay push-only. Actions are SHA-pinned
with the shared checkout SHA, runners pinned to ubuntu-24.04, and
cancel-in-progress: true (safe; no CDK deploy). Conforms to all
tests/supply_chain checks (31 passed).
…across edge origins (#491)

Collapse the three-separate-cert first-deploy footgun into one shared
wildcard, and make cert handling consistent and fail-loud across all
CloudFront origins.

- config.ts: add CDK_CLOUDFRONT_CERTIFICATE_ARN (top-level
  cloudfrontCertificateArn). frontend/artifacts/mcpSandbox certs fall
  back to it when their section-specific ARN is unset; section-specific
  wins. ALB cert stays separate (region-specific). One us-east-1 wildcard
  ({domain}+*.{domain}) now satisfies all three origins.
- artifacts-distribution-construct: add domain-set-but-cert-missing guard
  mirroring mcp-sandbox (replaces the false 'config.ts already enforced'
  comment + opaque fromCertificateArn(undefined) crash), and add a
  domain-less fallback to the CloudFront default domain so domain-less
  synth no longer crashes with 'reading startsWith'.
- load-env.sh: forward cloudfrontCertificateArn + mcpSandbox.certificateArn
  context params (were missing, breaking the cdk.context.json path).
- workflows: wire CDK_CLOUDFRONT_CERTIFICATE_ARN job-level env in
  platform / nightly / teardown.
- docs: step-02/step-03/ACTIONS-REFERENCE recommend the single shared
  cert and reframe per-origin vars as optional overrides; troubleshooting
  entry for the synth cert-guard failure.
- tests: CloudFront cert resolution (config), artifacts cert guard +
  domain-less fallback, and end-to-end shared-cert PlatformStack synth.
  Full infra suite: 20 suites / 406 tests green.
* fix(infra): bump aws-cdk CLI 2.1120.0 -> 2.1128.0 to match aws-cdk-lib 2.260.0

aws-cdk-lib 2.260.0 emits cloud-assembly schema 54.0.0, but the pinned
aws-cdk CLI (2.1120.0) only reads up to schema 53 — so synth/deploy
failed with 'CDK CLI is not compatible with the CDK library ... Maximum
schema version supported is 53.x.x, but found 54.0.0. You need at least
CLI version 2.1128.0'. The library was bumped without bumping the CLI.

Scripts invoke 'npx cdk', which resolves the local aws-cdk devDependency
on the CI runner, so bumping the pin + regenerating the lockfile is the
fix. Also bump the devcontainer global CDK pin (Dockerfile) and the
version tables (README, dev-environment steering) that are documented to
track package.json, so the interactive 'cdk' in the container doesn't
drift and reproduce the same error locally.

Verified in the devcontainer: npx cdk --version -> 2.1128.0; a synth of
an aws-cdk-lib 2.260.0 assembly (manifest schema 54.0.0) is read by the
2.1128.0 CLI with exit 0; full infra suite 20 suites / 406 tests green.

* ci(platform): pin Node 22 in the PlatformStack deploy jobs

The deploy jobs in platform.yml and nightly-deploy-pipeline.yml were the
only jobs without actions/setup-node — they ran scripts/platform/deploy.sh
(which does npm ci + cdk via deploy.sh -> scripts/cdk/install.sh) on the
runner's ambient Node instead of the Node 22 every other job and the
devcontainer pin. Add setup-node (node 22 + npm cache) so the deploy
toolchain is pinned and reproducible.

Note: deps were already being installed (deploy.sh calls install.sh ->
npm ci); the recent schema-mismatch failure was the stale aws-cdk pin
(2.1120.0), fixed in 4339f26 by bumping to 2.1128.0. This change is the
toolchain-pinning gap the #396 refactor left in the deploy jobs.
* fix(infra): auto-generate IAM role names to avoid first-deploy collisions

Drop explicit roleName from the AgentCore memory/code-interpreter/browser/
gateway/runtime execution roles and the SageMaker execution role. Fixed
physical names collide with orphaned roles left by a rolled-back/partial
deploy. Every consumer references these roles by .roleArn (or resolves them
at runtime via GetGateway / SAGEMAKER_EXECUTION_ROLE_ARN), so auto-generated
names are safe.

* chore(infra): replace deprecated pointInTimeRecovery with pointInTimeRecoverySpecification

Silences the aws_dynamodb.TableOptions#pointInTimeRecovery deprecation
warnings across all DynamoDB table constructs. Synthesized CloudFormation
output is unchanged (still PointInTimeRecoveryEnabled: true).

* fix(deploy): re-seed image-tag SSM param when the referenced ECR image is missing

The seed guard skipped any URI-shaped value, trusting the build pipeline.
But image-tag params are not CFN-managed and survive teardown, so a stale
project-repo URI could outlive its ECR repo and break the AgentCore Runtime
/ ECS task def with 'repository does not exist'. Verify the image actually
exists (ecr describe-images) before skipping; otherwise overwrite with the
bootstrap URI.

* fix(infra): grant Cognito group + delete actions for first-boot admin setup

Add cognito-idp:CreateGroup and AdminAddUserToGroup (the first-boot flow
creates the system_admin group and adds the initial admin) plus
AdminDeleteUser (so the rollback path doesn't orphan a Cognito user and
block retry with UsernameExistsException) to the app-api task role.
…es (#495)

Reverts the roleName removal from 7107cf9 for the AgentCore memory/
code-interpreter/browser/gateway/runtime and SageMaker execution roles.

Auto-generating these names is unsafe on an already-deployed stack: the
role ARN feeds create-only properties on the AgentCore resources
(BrowserCustom/CodeInterpreterCustom executionRoleArn, Memory
memoryExecutionRoleArn, Gateway/Runtime roleArn). Renaming the role
replaces it (new ARN) -> forces replacement of the dependent AgentCore
resource -> CFN re-creates it with the same create-only Name -> 'already
exists' collision -> UPDATE_ROLLBACK. Confirmed via ai-sbmt-api-PlatformStack
events (BrowserCustom/CodeInterpreterCustom DELETE_COMPLETE with empty
PhysicalResourceId during rollback).

Orphaned fixed-name roles on a *fresh* deploy are handled by deleting the
orphans before deploying, not by renaming. Added comments on each role to
prevent re-introducing the auto-name change.
The RAG ingestion Lambda and AgentCore Runtime are arm64, but the
post-refactor backend.yml ran their build jobs on amd64 (ubuntu-24.04).
inference-api compensated with QEMU emulation (platform: linux/arm64);
rag-ingestion had neither the platform input nor a build-one.sh PLATFORM,
so it produced an amd64 image -> the arm64 Lambda failed every invoke with
Runtime.InvalidEntrypoint (file uploads stuck 'uploading', no embeddings).

Restore the pre-refactor (main) approach: build both arm images on native
ubuntu-24.04-arm runners instead of emulating on amd64.

- backend.yml: build-inference-api and build-rag-ingestion -> runs-on
  ubuntu-24.04-arm; drop the QEMU-triggering platform input from
  inference-api (native build needs no emulation).
- build-one.sh: rag-ingestion PLATFORM=linux/arm64 (explicit native build).

Deploy jobs stay on ubuntu-24.04 (API-only, no docker build). Note: the
stale amd64 rag-ingestion ECR image must be deleted once so the content-hash
build doesn't skip the corrected arm64 build.
… off) (#497)

* feat(skills): gate skills feature behind SKILLS_ENABLED flag (default off)

Defer the skills feature (user picker, admin catalog, skills mode) for a
release without removing any merged code. A new
apis/shared/feature_flags.py::skills_enabled() reads SKILLS_ENABLED
(default false), mirroring the FINE_TUNING_ENABLED precedent. Deployed
environments go dark automatically because the env var is absent; set
SKILLS_ENABLED=true on both app-api and inference-api to re-enable.
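
A default-off flag reader along these lines would match the description. This is a sketch, not the contents of feature_flags.py: the set of accepted truthy strings is an assumption.

```python
import os

# Assumption: the real parser may accept a narrower set of values.
_TRUTHY = {"1", "true", "yes", "on"}


def skills_enabled() -> bool:
    """Read SKILLS_ENABLED at call time; unset or unrecognized values mean off."""
    return os.environ.get("SKILLS_ENABLED", "false").strip().lower() in _TRUTHY
```

Reading the variable on each call, rather than caching at import, lets tests opt in by setting the environment variable before exercising a gated path.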

Gated surfaces (code and data left intact):
- app-api main.py: user-facing skills router mounted only when enabled.
- app-api admin/routes.py: admin skills + chat-mode-policy routers gated.
- app-api system/routes.py: GET /system/chat-settings reports chat/no-toggle/
  skillsEnabled:false when off (new skills_enabled field on the response).
- inference-api chat/routes.py: force skill -> chat when off (voice/other
  agent types untouched).
- SPA: ChatModeService.skillsEnabled drives admin nav hide; mode toggle and
  skills section auto-hide via allowModeToggle:false; SkillService eager load
  gated by an effect so a disabled env never fires the now-404 GET /skills/.

Tests default off; skills-mode tests opt in via SKILLS_ENABLED=true and new
off-behavior tests cover the forced-chat path and admin mount gating.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(model-settings): mock Skill/ChatMode services to stop teardown rejection

model-settings.spec instantiated the real SkillService and ChatModeService,
which fire /skills/ (now via an effect) and /system/chat-settings on
construction. With no httpMock those requests fail asynchronously, and the
SKILLS_ENABLED change shifted the timing so a console.error landed during
worker teardown — surfacing as an unhandled EnvironmentTeardownError that
failed the vitest run with exit 1 even though all 1218 tests passed.

Provide minimal mocks for both services so the spec fires no stray async
work. Full suite exits 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
The admin "discover from server" button signs its request as the app-api
task role (SigV4, service=lambda), but that role has no
lambda:InvokeFunctionUrl — only inference-api does. Against an
AuthType=AWS_IAM Lambda Function URL the signed request is rejected with
403, which surfaces during MCP client init as an anyio TaskGroup
ExceptionGroup and falls through to a generic 502.

For same-team MCP servers that validate a forwarded user JWT (Lambda URL
AuthType=NONE), discovery should mirror the runtime forward_auth_token
path and authenticate with the admin's own OIDC token instead of SigV4. Add a
forward_auth_token flag to MCPDiscoverRequest; when set, the discover
route forwards admin.raw_token as the bearer (400 if unavailable) and
skips SigV4. Provider-gated OAuth (3LO) discovery is still rejected — the
admin session can't supply an end-user provider token.
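
The branching described above could be expressed as a small pure function. `discovery_auth`, `BadRequest`, and the returned dict shape are hypothetical, chosen only to illustrate the decision; the real route works with MCPDiscoverRequest and the admin principal.

```python
from typing import Optional


class BadRequest(Exception):
    """Stand-in for an HTTP 400 response."""


def discovery_auth(forward_auth_token: bool, raw_token: Optional[str]) -> dict:
    """Pick the auth mode for a discovery call (hypothetical helper)."""
    if not forward_auth_token:
        # Default path: the request is SigV4-signed as the task role.
        return {"mode": "sigv4", "headers": {}}
    if not raw_token:
        raise BadRequest("forward_auth_token set but no admin token is available")
    # Forward the admin's token as a bearer and skip SigV4 entirely.
    return {"mode": "bearer", "headers": {"Authorization": f"Bearer {raw_token}"}}
```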

Wire the flag through the admin tool form's discover call so the existing
"Forward app authentication token" checkbox governs discovery too.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…or single-stack (#499)

Reconcile the nightly pipeline and teardown scripts with the single
PlatformStack, API-driven-deploy architecture.

nightly-deploy-pipeline.yml: restore the workflow_call input contract the
orchestrator still passes (ref, project-prefix, alb-subdomain, skip-teardown,
label, source-project-prefix, run-e2e). Every job now checks out inputs.ref and
deploys to the ephemeral inputs.project-prefix (never the shared environment).
Add an always() teardown job (needs all deploy/test jobs, gated on
skip-teardown) so every nightly stack is destroyed even on partial/failed
deploys -- no paying for idle resources. Ephemeral env runs with no custom
domain and an unset CDK_COGNITO_DOMAIN_PREFIX (defaults to the unique prefix).

scripts/nightly/teardown.sh: delete <prefix>-PlatformStack via cloudformation
delete-stack + wait (was a dead cdk-destroy loop over removed per-stack names).

scripts/teardown/destroy.sh: add PlatformStack to the foundation phase while
keeping legacy InfrastructureStack/app-stack handling, so the manual teardown
works for both single-stack and legacy deployments.
…cret (#501)

The admin auth-providers endpoints (POST/DELETE /admin/auth-providers) write
the provider client-secret bag back to the auth-provider-secrets secret via
PutSecretValue (apis/shared/auth_providers/repository.py), but the App API
task role was only granted GetSecretValue. Configuring/removing an auth
provider failed with AccessDeniedException on secretsmanager:PutSecretValue.

Add a least-privilege PutSecretValue statement scoped to just the
auth-provider-secrets secret (trailing wildcard matches the random ARN
suffix). No other runtime-written secret needs this.
* fix(nightly): auto-teardown ephemeral nightly deploys; fix teardown for single-stack

* fix(nightly): restore test-infra/backend/frontend gates in deploy pipeline

The pipeline rewrite for ephemeral auto-teardown dropped the per-pipeline
test gates, breaking infrastructure/test/repo-shape.test.ts which requires:
  - build-*/code-deploy-* jobs gate on test-backend
  - deploy-frontend gates on test-frontend
  - deploy-platform gates on test-infra

Re-add the three test jobs (checking out inputs.ref) and the needs edges,
keeping the input-driven ephemeral deploy + always() teardown intact. Also
add the test jobs to teardown's needs so teardown waits for the whole graph.

Verified: npx jest repo-shape passes (49/49); both nightly workflow YAMLs
parse and all orchestrator calls pass only declared inputs.
…502)

The McpSandboxDistributionConstruct doc comment claimed it publishes the
proxy origin to SSM at `/{prefix}/mcp-sandbox/origin`. That was true of the
pre-#396 standalone McpSandboxStack, but the single-stack consolidation
dropped the SSM publication: the origin is now exposed as `proxyOrigin` and
threaded through PlatformComputeRefs straight into inference-api's
`AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` env var. The stale comment misleads
anyone debugging the sandbox (a missing SSM param looks like a broken
deploy when it is expected). Update the comment to match the code.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Generated by the kaizen-research skill. Top 5 ideas appended to
docs/kaizen/review-queue.md for the kaizen-review-prep run later this morning.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The ui/message bridge handler read `params.content` as a single block
({type,text}), but per SEP-1865 / the ext-apps SDK the View sends an ARRAY
of content blocks (content: [{type:'text',text}]). Every spec-compliant
widget message was therefore rejected with -32000 "Invalid ui/message
params" (e.g. an MCP App's app.sendMessage()).

Read `content` as an array and concatenate its text blocks (mirrors the
ui/update-model-context handler). Update MessageParams to the array shape,
fix the two bridge specs that sent single objects, and add a multi-block
concatenation regression test.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…rt (#504)

A single TLS handshake blip (e.g. SSLV3_ALERT_HANDSHAKE_FAILURE from a
TLS-inspecting middlebox) or connection reset when starting an external
MCP client otherwise fails the whole agent build: Strands' start() raises
MCPClientInitializationError, the tool fails to load, and agent creation
errors out for the user.

UICapableMCPClient.start() now retries transient transport failures —
ConnectError/SSLError/timeouts, detected by walking the
MCPClientInitializationError -> ExceptionGroup -> httpx.ConnectError chain
— up to 3 attempts with exponential backoff. Non-transient errors (bad
URL, auth, protocol) are re-raised on the first attempt. Strands resets
its init future + background thread on failure (stop()), so re-invoking
start() is safe. Covers both external and gateway clients.
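
A minimal sketch of the retry shape described above, assuming stand-in exception types so it runs without httpx or Strands installed. `iter_chain`, `is_transient`, `start_with_retry`, and the 0.5 s base delay are hypothetical names and values chosen for illustration; only the three-attempt limit, the exponential backoff, and the chain walk come from the commit message.

```python
import ssl
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")

# Assumption: built-in stand-ins for the transport errors named above
# (httpx.ConnectError, SSL errors, timeouts).
TRANSIENT_TYPES = (ConnectionError, ssl.SSLError, TimeoutError)


def iter_chain(exc: BaseException) -> Iterator[BaseException]:
    """Yield exc and everything reachable via __cause__, __context__, or group members."""
    seen = set()
    stack = [exc]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        yield current
        for linked in (current.__cause__, current.__context__):
            if linked is not None:
                stack.append(linked)
        # Exception groups (Python 3.11+) expose their members as .exceptions.
        stack.extend(getattr(current, "exceptions", ()))


def is_transient(exc: BaseException) -> bool:
    """True if any exception in the chain is a transient transport failure."""
    return any(isinstance(e, TRANSIENT_TYPES) for e in iter_chain(exc))


def start_with_retry(
    start: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call start(), retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return start()
        except Exception as exc:
            # Non-transient errors and the final attempt propagate unchanged.
            if attempt == attempts or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets tests record the backoff delays instead of waiting on them.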

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…rdances (#505)

A user turn initiated by an MCP App widget (ui/message → submitChatRequest)
appeared in the conversation but skipped the two affordances the composer
path triggers: the loading indicator (the page sets chatLoading before
submitting) and the scroll-to-top of the new user message (chat-container's
post-submit setTimeout). The widget delegate called the service directly,
bypassing the chat-input → chat-container → page chain.

- ChatStateService: add a `scrollToLastUserTick` signal + `requestScrollToLastUser()`.
- ChatContainer: react to the tick by scrolling the last user message to the
  top (mirrors onMessageSubmitted; skips the initial 0 so it doesn't scroll on
  mount).
- mcp-app-frame widget delegate: set chatLoading(true) and request the scroll
  around submitChatRequest (the user message is added synchronously inside it).

The composer path is untouched (it keeps its own setTimeout scroll), so no
existing behavior changes.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

Drop the stale '6.' numeric prefix from the workflow name. It was a
leftover from the pre-refactor numbered pipeline ordering; no workflows
use numeric prefixes anymore and nothing referenced the old name.

Promote from 1.0.0-beta.27 to the 1.0.0 GA release. Synced via
scripts/common/sync-version.sh across backend/pyproject.toml,
frontend and infrastructure package.json + lockfiles, uv.lock, and
the README badge/current-release text.

Document the full delta since beta.27 (142 commits) for the 1.0.0 GA:
single-stack platform-as-bootstrap architecture, admin-managed Skills
(gated off), Conversation Modes, file-source connectors + website
crawling, assistant share permissions, Gateway MCP self-service targets,
curated model catalog + Bedrock Mantle provider, per-turn context
attribution, MCP Apps hardening, backup/restore tooling, a security
sweep, and all 22 HIGH Dependabot remediations. Replaces the stale
'Unreleased' section in RELEASE_NOTES.md.

main carried 3 prior-release squash commits (beta.25, beta.26, beta.27)
that were never merged back to develop, so release/1.0.0 could not
auto-merge into main. Every file main has that release/1.0.0 lacks is
the OLDER pre-1.0.0 version (verified: superseded skills/Gateway/Mantle
code, removed per-component CDK_*_ENABLED flags, and the legacy
multi-stack files deleted by #396). release/1.0.0 is the content
superset, so we merge with -s ours: main is recorded as a parent for
ancestry while release/1.0.0's tree is kept exactly — no resurrected
legacy files, no reverted 1.0.0 work.
@colinmxs colinmxs requested a review from a team June 24, 2026 15:35
Comment thread .github/workflows/nightly-deploy-pipeline.yml Dismissed
Comment on lines +105 to +108
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
path: infrastructure/node_modules
key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }}

check-stack-dependencies:
name: "[${{ inputs.label }}] Check Stack Dependencies"
ref: ${{ inputs.ref }}
- uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
Comment on lines +126 to +129
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.ref }}

- name: Run stack dependency tests
run: bash scripts/common/test-stack-dependencies.sh

deploy-infrastructure:
name: "[${{ inputs.label }}] Deploy Infrastructure"
- uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0

@github-advanced-security github-advanced-security AI left a comment

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

{ name: 'Artifacts', factory: () => new ArtifactsDataConstruct(testStack(), 'X', { config }) },
];

for (const { name, factory } of constructFactories) {


def test_delete_missing_404(client):
assert client.delete("/skills/nope").status_code == 404
Comment on lines +172 to +175
assert (
client.delete("/skills/pdf_workflows/resources/nope.md").status_code
== 404
)
Comment on lines +90 to +96
r"""
os\.(?:environ\.get|getenv|environ\[) # os.environ.get / os.getenv / os.environ[
\s*\(?\s* # optional whitespace + optional (
['"] # opening quote
([A-Z][A-Z0-9_]+) # NAME
['"] # closing quote
""",
colinmxs added 2 commits June 24, 2026 15:50
- Remove all references to the not-yet-ready Skills feature (gated off
  behind SKILLS_ENABLED) from RELEASE_NOTES.md and CHANGELOG.md; it will
  be announced in a future release. Conversation Modes (ships enabled)
  is retained.
- Add a prominent 'Upgrading an existing deployment' section that
  explains the destructive backup -> teardown -> redeploy -> restore
  migration and links to the full step-by-step guides (published docs
  site + in-repo upgrade-from-multi-stack.md) so operators can navigate
  straight to instructions for their stacks.
- Fix an incorrect PR citation for the web-sources dependencies (#378).
- Comment out the push: triggers on platform.yml, backend.yml, and
  frontend-deploy.yml so forking or syncing the codebase never
  auto-deploys infrastructure or code into a user's AWS account.
  Deploys are now manual via the Actions tab; re-enable by uncommenting.
- Document the gating in RELEASE_NOTES.md (CI/CD + deployment notes) and
  CHANGELOG.md.
- Add justification for the single-stack overhaul to the release notes:
  the app is a monolith, not a microservice fleet; the nine-stack layout
  bought all the operational cost of microservices (cross-stack
  ImportValue ordering, inter-stack drift, first-deploy chicken-and-egg)
  with none of the benefits, and consolidating removes that class of
  deployment gotchas.
custom_prompt = await get_system_prompts_service().get_enabled_prompt(active_prompt_id)
if not custom_prompt:
logger.info(
f"Custom prompt {active_prompt_id!r} not found or disabled — skipping"
logger.info(
"Kicked off crawl %s for assistant %s (root_document=%s url=%s)",
job.crawl_id,
assistant_id,
):
logger.warning(
"PUT /sessions/%s/metadata: session id is taken under a different user; refusing",
session_id,
try:
return await adapter.search(access_token, query, cursor)
except FileSourceError as err:
logger.warning("search failed for connector %s: %s", provider_id, err)
Run with: pytest tests/supply_chain/test_backup_coverage.py -v
"""

import ast
cgnat = ipaddress.IPv4Network("100.64.0.0/10")
if addr in cgnat:
return True
except ValueError:
parsed = json.loads(candidate)
if isinstance(parsed, dict):
return parsed
except (ValueError, TypeError):
import time
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List, Optional, Set, Tuple
from urllib.parse import urljoin

import asyncio
import logging
import os
@colinmxs colinmxs merged commit 815de11 into main Jun 24, 2026
10 of 11 checks passed

5 participants