Release/1.0.0 #506
Merged
Co-authored-by: colinmxs <colinmxs@users.noreply.github.com>
* fix: resolve open CodeQL alerts

- Remove user-controlled values from log statements (py/log-injection)
- Remove unused imports: lambda, lambdaEventSources, sns, events, targets (js/unused-local-variable)
- Remove unused variable execution_output (py/unused-local-variable)
- Remove dead-code variable result_seen (py/unused-local-variable)
- Rename _period to _ for intentionally unused placeholder (py/unused-local-variable)
- Add explanatory comment to empty except block (py/empty-except)

* fix: upgrade Pygments 2.19.2 -> 2.20.0 (Dependabot alert #71)

Addresses ReDoS vulnerability in GUID matching regex. Other Dependabot alerts (pillow, aiohttp, cryptography, python-multipart, vite, postcss, dompurify, hono, lodash-es) are already at fixed versions in the lockfiles; those alerts are stale and should auto-close on next scan.

* chore(ci): target main branch for CodeQL and Dependabot

- CodeQL workflow: push/PR triggers changed from develop to main
- Dependabot: all four update groups (pip, npm frontend, npm infra, github-actions) target main instead of develop

* test: update dependabot config test to expect main branch

Aligns test assertion with the CI config change that retargets Dependabot from develop to main.

---------

Co-authored-by: colinmxs <colinmxs@users.noreply.github.com>
Bumps [types-aiofiles](https://github.com/python/typeshed) from 25.1.0.20251011 to 25.1.0.20260409.
- [Commits](https://github.com/python/typeshed/commits)

---
updated-dependencies:
- dependency-name: types-aiofiles
  dependency-version: 25.1.0.20260409
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…conversation cost badge, compaction events (#249) Co-authored-by: colinmxs <colinmxs@users.noreply.github.com>
* feat(login): lava-lamp motion for backdrop blobs Replace the static circular drift with organic morphing blob shapes that rise and fall vertically with squish/stretch and gentle rotation. Bumps blob count from 3 to 5 with offset animation delays so morph cycles don't deform in lockstep. Honors prefers-reduced-motion. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(login): three-tier parallax for backdrop blobs Restructure the 5 lava blobs into 6 across 3 depth tiers (far/mid/near). Size, blur, opacity, animation duration, and travel distance all scale with depth so the velocity contrast reads as parallax: huge soft far blobs barely budge while small sharp near blobs traverse the viewport. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(first-boot): apply lava-lamp parallax backdrop and frosted card Mirror the login page's shell so first-boot and login feel like one system: three-tier parallax blobs, primary-color radial wash, faint grid overlay, and the frosted-glass card. Class names are reused under component-scoped styles, so they don't collide with login. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
#254) Replace the dense badge with a richer attachment renderer in user message history: - Images render as an iMessage-style mosaic (1-bubble, 2-col, 1+2 split, 2x2 grid, 5+ with "+N" overlay) and open in a full-screen lightbox with arrow-key navigation. - Non-image files render as a document-style card: tinted header strip with type chip, white "page" body with a folded corner, and filename + size footer. Text-based files (txt, md, csv, html) show a real content excerpt; binary types (pdf, docx, xls/xlsx) get skeleton lines. Backend additions to support the UI: - GET /files/{upload_id}/preview-url — short-lived presigned GET URL, scoped to the file owner, used for inline images and the lightbox. - GET /files/{upload_id}/text-snippet — first 2KB of a text-based file, decoded as UTF-8, for the document card content peek.
…tion - Add spreadsheet_analysis module with factory-produced tools for listing and analyzing tabular data - Implement make_list_spreadsheets_tool to enumerate CSV/Excel files from knowledge bases and chat attachments - Implement make_analyze_tool to download files from S3, execute Python analysis via Code Interpreter, and return results - Add intelligent schema detection with skiprows probing to handle report-style exports with metadata rows - Implement stderr cleaning to filter pandas/numpy internal frames and show only user-relevant errors - Add output truncation (10K chars) and error truncation (600 chars) to prevent context window overflow - Update ToolRegistry integration to inject spreadsheet tools per-request via extra_tools parameter - Update chat routes (app_api and inference_api) to pass conversation context to tool factories - Add comprehensive docstrings and logging for debugging file discovery and Code Interpreter execution - Enables agents to analyze user-provided spreadsheets without manual file handling or external dependencies
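The output and error caps listed above can be sketched as follows. This is a minimal illustration, assuming a simple character cut with a trailing marker; the helper names `truncate` and `shape_tool_result` and the marker wording are hypothetical, and only the 10K and 600 character limits come from the commit message.

```python
MAX_OUTPUT_CHARS = 10_000  # output cap named in the commit message
MAX_ERROR_CHARS = 600      # error cap named in the commit message


def truncate(text: str, limit: int) -> str:
    """Return text unchanged if it fits, else cut it and append a marker.

    The marker wording is hypothetical.
    """
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    return f"{text[:limit]}\n... [truncated {omitted} chars]"


def shape_tool_result(stdout: str, stderr: str) -> dict[str, str]:
    """Apply both caps before the result is handed back to the model."""
    return {
        "output": truncate(stdout, MAX_OUTPUT_CHARS),
        "error": truncate(stderr, MAX_ERROR_CHARS),
    }
```

Stating how many characters were dropped gives the agent a hint that it should narrow its analysis rather than retry the same call.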
- Add S3 read permissions (GetObject, GetObjectVersion) to runtime execution role for assistants documents bucket - Enable agent's spreadsheet_analysis tool to download tabular KB files (CSV/XLSX) from S3 for Code Interpreter sandbox analysis - Add S3_ASSISTANTS_DOCUMENTS_BUCKET_NAME environment variable to runtime configuration via SSM parameter - Update documentation comments to clarify that documents bucket is now accessed at runtime by the agent, not just during ingestion - Resolves agent failures when attempting spreadsheet analysis due to missing bucket configuration
…mentation - Add StartCodeInterpreterSession, StopCodeInterpreterSession, GetCodeInterpreter, GetCodeInterpreterSession, and ListCodeInterpreterSessions actions to runtime execution role - Replace CreateCodeInterpreterSession with StartCodeInterpreterSession to align with AWS API - Add detailed inline documentation referencing AWS Bedrock Agent Core policy guide - Scope permissions to this stack's Custom Code Interpreter resource only, removing need for account-wide discovery permissions
…sion navigation - Add loaded assistant check to prevent re-fetching already-loaded assistant on metadata signal changes - Prioritize in-memory loaded assistant over query params and session preferences when determining which assistant to use - Add cross-session navigation detection to clear stale assistant state before new session metadata loads - Prevent mid-session assistant attachment validation error when query param persists after first message - Add conditional clearing of assistant state to avoid wiping in-memory assistant on first turn of new sessions - Improve assistant resolution priority: loaded assistant → query param → session preferences - Add detailed comments explaining RAG continuation across follow-up messages and state management edge cases - Fixes issue where assistant would be lost after first message submission or when navigating between sessions (#205)
…d using URL as source of truth - Fix assistant_id storage in SessionMetadata by updating preferences sub-model instead of top-level model, resolving silent failures under extra="allow" - Remove redundant assistant_id resolution logic that attempted fallback to session preferences, simplifying to trust URL query parameter - Update session list to pass assistantId query param when navigating to sessions with attached assistants - Refactor session page effect to use URL as single source of truth for assistant attachment, eliminating race conditions from metadata fetch timing - Add self-heal redirect to rebuild URL with assistantId when landing on bare `/s/:id` URLs from bookmarks or legacy links - Prevents mid-session assistant validation failures on turn 2+ by ensuring preferences.assistant_id is properly persisted and accessible Fixes #205
- Add comprehensive section on handling missing/disabled tools in main agent system prompt - Include step-by-step instructions for identifying user intent and suggesting tool enablement - Provide mapping of common user intents to corresponding tools (spreadsheet analysis, code interpreter, web search, knowledge base) - Add concrete example response showing how to guide users to enable Spreadsheet Analysis tool - Improve user experience by directing users to settings panel rather than refusing requests - Enable graceful degradation with fallback suggestions when tools are unavailable
…files (#262) Render parsed markdown in the attachment card excerpt instead of raw text, and open a full-screen modal viewer when a .md card is clicked rather than opening the raw source in a new tab. Reuses ngx-markdown (already wired up for assistant messages) and the existing presigned preview-url flow. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Render real first-page thumbnails for PDF attachments instead of the
skeleton mockup. Page rasterization runs in app-api via pypdfium2
(Apache 2.0 / BSD, bundled PDFium binary, no system poppler/ghostscript).
Backend:
- New `ThumbnailRenderer` with a MIME-type dispatcher; PDF only today.
Class docstring documents the recommended out-of-process design for
.docx / .xlsx so the dispatcher stays small.
- New `GET /files/{upload_id}/thumbnail` endpoint. Lazy: HEAD-checks for
a cached `_thumb.png` sibling next to the original, renders + stores
on miss, returns a short-lived presigned GET URL. 415 for unsupported
MIME types, 422 for unreadable / corrupt PDFs.
- Render runs in `loop.run_in_executor` so request workers aren't blocked.
- Single-file and session-cascade deletes now also remove the thumbnail
sibling.
Frontend:
- `FileUploadService.getThumbnail()` returns a typed result so callers
can switch on `ready` / `unsupported` / `unavailable` without parsing
HTTP errors.
- File attachment badge fetches the thumbnail on mount for PDFs and
renders it as an `object-cover` image in the card body, suppressing
the bottom fade. Silently falls back to the existing skeleton on any
error.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…for Code Interpreter - Add targeted error hints for XLSX→CSV filename mismatches in sandbox environment - Implement tolerant file matching for CSV↔XLSX aliasing to prevent retry loops - Expand analyze_tool docstring with critical guidance on filename vs. sandbox paths - Add schema footer preservation on errors for better retry context - Enhance list_spreadsheets_tool with file size and MIME type metadata - Update system prompt builder to clarify file handling for spreadsheet analysis - Improve stream processor error handling for Code Interpreter responses - Add file metadata models and utilities for consistent attachment handling - Update chat input component to support file metadata in message attachments Fixes: #206
# Conflicts:
#   backend/pyproject.toml
#   backend/src/apis/inference_api/chat/routes.py
#   backend/src/apis/inference_api/chat/service.py
#   frontend/ai.client/src/app/auth/login/login.page.css
#   frontend/ai.client/src/app/auth/login/login.page.ts
#   frontend/ai.client/src/app/session/session.page.ts
* docs(spec): bugfix requirements for BFF middleware event-loop blocking

Regression surfaced after v1.0.0-beta.24 deploy (commit 258193d). The new SessionRefreshMiddleware runs sync boto3 (DynamoDB + Cognito) inside async handlers on the hot path of every cookie-bearing request, on a single uvicorn worker in a single ECS task, with aligned cache TTL and sliding-renewal throttle defaults. Page-load fan-out produces ~16 serialized blocking AWS calls per active user per minute.

Spec captures the defect (7 clauses), corrected behavior (7 clauses), and regression-prevention invariants (11 clauses) to carry into design.

* docs(spec): Add BFF middleware event-loop blocking bugfix design and tasks

- Add comprehensive design specification for SessionRefreshMiddleware event-loop blocking issue
- Document root causes: sync boto3 I/O, missing fan-out coalescing, aligned cache/throttle windows, inline awaited writes
- Include formal bug condition specification with 7 sub-conditions and observable symptoms
- Add detailed glossary of key components and terminology
- Document preservation requirements and public contracts that must remain unchanged
- Add implementation tasks with acceptance criteria and verification steps
- Include deployment configuration changes (CDK worker count, environment variables)
- Provide testing strategy for concurrency, performance, and regression validation

* fix(bff-middleware): Resolve event-loop blocking and fan-out amplification

- Offload SessionRepository boto3 calls via asyncio.to_thread to prevent event-loop blocking
- Offload CognitoRefreshClient.refresh via asyncio.to_thread for non-blocking auth operations
- Add per-session single-flight primitive module to coalesce concurrent refresh requests
- Wire single-flight into SessionRefreshMiddleware._resolve_session to eliminate duplicate work
- Convert _maybe_slide to fire-and-forget DDB write with synchronous cache update
- De-align cache/leeway and throttle windows in config (throttle: 60s → 300s)
- Raise production appApi.desiredCount to 2 for distributed request handling
- Add comprehensive bug condition and preservation property tests
- Update task completion checklist and infrastructure configuration

* test(bff): poll for fire-and-forget slide-write under slower schedulers

Task 3.5 moved the slide-write DDB call off the request path via `asyncio.create_task`, but `test_3_4_slide_max_age_matches_on_both_cookies` and `test_slide_past_throttle_writes_ddb_and_reemits_cookie` still sampled `update_item_calls` / `touch_last_seen.await_count` immediately after `TestClient` returned the response. On CI's slower scheduling (Python 3.12 runners), the detached task hadn't run yet, so Hypothesis tripped a `FlakyFailure` on the 3.4 property strategy.

Fix: poll the counter up to 1s before asserting. The observable external contract (cookie attributes, Max-Age, response body) is unchanged; only the internal timing of the DDB write moved, which is exactly what task 3.5 intends.

* fix(bff): keep strong ref on fire-and-forget slide-write tasks

Task 3.5 dispatched the slide-write via `asyncio.create_task` but discarded the returned Task reference. Python's docs explicitly warn about this: without a strong reference, the task can be garbage-collected mid-execution. On Python 3.12 CI runners this was racing: the preservation test `test_3_4_slide_max_age_matches_on_both_cookies` saw 0 `update_item` calls (Hypothesis flagged it as FlakyFailure: failed on first run, passed on retry).

Fix: hold a set of pending tasks on the middleware instance and attach an `add_done_callback(self._slide_tasks.discard)` so the set doesn't leak. This is the canonical pattern from the asyncio docs: https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task

Verified locally by running the exact CI test scripts inside the agentcore-dev container:
- scripts/stack-app-api/test.sh -> 2459 passed
- scripts/stack-inference-api/test.sh -> 2459 passed

* test(bff): poll inside TestClient context so background task can run

CI was still failing `test_3_4_slide_max_age_matches_on_both_cookies` on Python 3.12 despite the strong-reference fix in 78891e2. The production change was correct: the task reference prevents GC. But the test was polling OUTSIDE the `with TestClient(app)` block, and TestClient's `__exit__` shuts down the anyio portal (and the event loop) before the polling even starts. Any pending asyncio.Task on that loop is cancelled on teardown, never runs, update_item_calls stays 0.

Fix: poll INSIDE the `with` block. If the task hasn't flushed yet, drive the event loop with a second GET to give the pending task a chance to run. Same pattern applied to test_slide_past_throttle_writes_ddb_and_reemits_cookie.

Reproduced the race locally by setting up a Python 3.12 venv inside the agentcore-dev container (CI's exact version). Ran the full test suite on both 3.12 and 3.13: 2459 passed on each.

Also includes the code review report written earlier.

---------

Co-authored-by: colinmxs <colinmxs@users.noreply.github.com>
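The strong-reference fix for fire-and-forget tasks can be sketched as below. This is a minimal, self-contained illustration of the documented asyncio pattern; `SlideWriter`, `dispatch`, and `_write` are hypothetical stand-ins for the middleware, and only the `_slide_tasks` set and the `add_done_callback(...discard)` call mirror names from the commit message.

```python
import asyncio


class SlideWriter:
    """Hypothetical stand-in for the middleware's slide-write dispatch."""

    def __init__(self) -> None:
        self._slide_tasks: set[asyncio.Task] = set()
        self.writes: list[str] = []

    async def _write(self, session_id: str) -> None:
        await asyncio.sleep(0)  # stands in for the DDB update
        self.writes.append(session_id)

    def dispatch(self, session_id: str) -> None:
        task = asyncio.create_task(self._write(session_id))
        self._slide_tasks.add(task)  # strong reference keeps the task alive
        task.add_done_callback(self._slide_tasks.discard)  # drop it when done


async def main() -> SlideWriter:
    writer = SlideWriter()
    writer.dispatch("s1")
    writer.dispatch("s2")
    # Wait for the detached tasks before inspecting results, as the tests do
    # by polling inside the client context.
    await asyncio.gather(*writer._slide_tasks)
    await asyncio.sleep(0)  # give done-callbacks a turn on the loop
    return writer
```

Holding the tasks in a set on the instance, rather than in a local variable, means the reference outlives the request handler that created the task.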
* feat(agents): upgrade strands to 1.39.0 and enable Bedrock prompt caching Bumps strands-agents 1.37.0 → 1.39.0 and strands-agents-tools 0.5.1 → 0.5.2. Re-enables CacheConfig(strategy="auto") on the BedrockModel: the original blocker (strands PR #1438 — cachePoint blocks alongside non-PDF document attachments) is now included in v1.39.0, so the workaround is no longer needed. Updates the corresponding model_config test to assert caching is emitted rather than suppressed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(model-config): defer Bedrock prompt caching enablement Backs out the CacheConfig(strategy="auto") activation. The SDK-side blocker (strands PR #1438) is resolved in 1.39.0, so the technical barrier is gone — but the user-visible cost/badge impact warrants a separate, scoped rollout. The version bump itself stays. The deferral comment in model_config.py replaces the outdated "Bedrock limitation" rationale; the test now documents intentional deferral instead of the SDK limitation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…mantics (#270) * fix(token-accounting): correct per-message cost and context-window semantics Two related bugs were inflating cost and context-% reporting on tool-use turns: 1. Per-message cost double-count. Strands emits both per-LLM-call metadata (each call's tokens) and a final AgentResultEvent whose EventLoopMetrics.accumulated_usage is summed across every call in the turn. Both were emitted as `metadata` events and routed into per_message_metadata[current_assistant_message_index]["usage"] via .update(). Because the AgentResult event arrives after every message_stop, the index still pointed at the last assistant message — so cumulative tokens overwrote that message's per-call values, double-counting earlier messages' input tokens when each entry was priced and summed. Fix: route the result-extracted cumulative on the existing `metadata_summary` (turn-summary) track instead of `metadata`. The stream_processor main loop now consumes both event types into its accumulated_metadata so the final summary still carries true totals. 2. Context-% inflation within a tool turn. Bedrock reports each per-LLM-call inputTokens as the FULL context size sent on that call. For a 2-call tool turn (call_1.input=1000, call_2.input=2500), Strands' accumulated_usage reports 3500 — but the actual current context occupancy is 2500. The final SSE `usage` field (which drives the context-% badge and compaction trigger) was inheriting Strands' summed value via the metadata_summary handler in stream_coordinator. Fix: stream_coordinator no longer accumulates `metadata_summary` into accumulated_metadata. Per-call `metadata` events last-write-wins via .update(), so accumulated_metadata.usage equals the most recent call's full input = current context. Added a CAUTION comment noting AgentResult.context_size / EventLoopMetrics.latest_context_size return only `inputTokens` (excluding cacheRead/cacheWrite) — under prompt caching they under-report by 99%+, so we deliberately sum all three buckets. 
Also folded in: TTFT placeholder of 0 → null. A real time-to-first-token can never be 0ms, and aggregations need to distinguish absence from a real value. LatencyMetrics.time_to_first_token is now Optional[int] in both shared/sessions and app_api/messages models. Frontend stream parser preserves null instead of coercing; badge component already hides via truthy check. Existing zero-valued data deserializes fine. Tooltip on the context-% badge clarified: "Reflects the most recent turn ... May shrink after a context compaction." Aria-label matches. Regression tests in test_per_message_cost_attribution.py: - TestPerMessageAttributionTwoCallTurn (3 cases) — locks the metadata vs metadata_summary contract; without the fix, per_message_metadata[1].usage = (2300, 130, 2430) instead of the expected per-call (1300, 80, 1380). - TestSummaryAccumulatorAcceptsBothTracks — main loop accumulator must consume both tracks for cumulative totals. - TestStreamCoordinatorContextOccupancy (2 cases) — pin "current context" semantic in stream_coordinator and verify the all-three- bucket sum (cacheRead/cacheWrite included) matches the most recent call's full input. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(costs): add unit tests for CostCalculator math CostCalculator is the source-of-truth for all USD math, but the existing costs/ test suite only exercises it transitively through aggregator and storage tests with mocks. 
Add a direct test file with 26 cases covering: - Per-bucket pricing (input/output/cacheRead/cacheWrite) and component sums equaling the total - Cache scenarios (read-only, write-only, mixed) priced against Sonnet 4.5 rates so dollar values can be sanity-checked - Defensive cases: missing pricing keys, None values throughout, empty dicts — all degrade to 0 without raising - calculate_cache_savings correctness and None-tolerance - validate_pricing / validate_usage required-field predicates Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
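The context-occupancy arithmetic from the token-accounting fix can be worked through in a few lines. This sketch uses the two-call numbers from the commit message; the usage key names (`inputTokens`, `cacheReadInputTokens`, `cacheWriteInputTokens`) and the helper `context_tokens` are assumptions for illustration, not the repository's actual field names.

```python
# Per-call usage for a two-call tool turn (numbers from the commit message).
calls = [
    {"inputTokens": 1000},
    {"inputTokens": 2500},
]

# Summing across calls counts the first call's context a second time.
summed = sum(c["inputTokens"] for c in calls)

# Last-write-wins: each per-call metadata event overwrites the previous one,
# so the accumulator ends up holding the most recent call's full input.
accumulated: dict[str, int] = {}
for call in calls:
    accumulated.update(call)
current_context = accumulated["inputTokens"]


def context_tokens(usage: dict[str, int]) -> int:
    """Sum the three input buckets; the key names are assumptions."""
    return (
        usage.get("inputTokens", 0)
        + usage.get("cacheReadInputTokens", 0)
        + usage.get("cacheWriteInputTokens", 0)
    )
```

Under prompt caching most of the context lands in the cache-read bucket, which is why reading `inputTokens` alone would under-report occupancy.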
…oot (#271) The dark-mode CSS for the auth pages' lava-lamp backdrop and frosted-glass card never applied: hand-written `html.dark .X` selectors don't match correctly under Angular's emulated view encapsulation, and ThemeService (providedIn:'root') was never injected by anything in the pre-auth tree so the `dark` class wasn't reaching <html> on a cold load. - Switch the auth-page CSS to `:host-context(html.dark) .X`, the pattern the rest of the codebase already uses for component-scoped dark rules. - Force ThemeService to construct at bootstrap via provideAppInitializer so the persisted/system theme is applied to <html> before any route renders, including /auth/login and /auth/first-boot on cold load. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…272) * feat(auth): add SKIP_AUTH=true local-dev bypass with allowlist guard Adds a single-env-var bypass so unattended local dev (and Claude Code) can hit protected routes without the Cognito redirect to an external IdP. The bypass returns a fake admin user from the three auth dependencies in apis.shared.auth.dependencies; everything else (CSRF middleware, RBAC, profile cache) flows naturally because no `bff_session` is resolved. Two safeguards keep the bypass scoped to local dev: 1. Allowlist startup guard in app_api/main.py — refuses to boot when SKIP_AUTH=true is paired with any non-localhost entry in CORS_ORIGINS. Empty CORS_ORIGINS also refused. Fails closed for deploy targets we haven't anticipated, instead of a blocklist of known cloud env vars. 2. CI guard (.github/workflows/skip-auth-guard.yml) — greps CDK source, workflows, and Dockerfiles for SKIP_AUTH=true / SKIP_AUTH: true patterns and fails the build if any leak into deployed config. Why an allowlist of CORS origins: CORS_ORIGINS must be set correctly per environment for the app to function at all, so it's a reliable positive signal of "this is local dev" — far stronger than enumerating Lambda / ECS / EKS / App Runner / AgentCore Runtime indicators. Inference-api is intentionally not bypassed; all SPA traffic flows through app-api per the BFF pattern, so the single bypass on app-api is sufficient. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(ci): SHA-pin checkout and use ubuntu-24.04 in skip-auth-guard workflow Match the supply-chain conventions enforced by tests/supply_chain/{test_action_pinning,test_runner_pinning}.py: pin actions/checkout to the canonical repo SHA and replace ubuntu-latest with ubuntu-24.04. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(auth): defer SKIP_AUTH startup guard to lifespan + add tests Move the CORS_ORIGINS allowlist check from import-time into lifespan() so tests that import or reload apis.app_api.main (e.g. 
tests/routes/test_pbt_auth_sweep.py) don't trip the guard. The runtime behaviour is unchanged — uvicorn still invokes lifespan at boot. Add tests/auth/test_skip_auth.py covering: - _skip_auth_user(): None when unset/falsey, fake User when truthy, honors all SKIP_AUTH_* env overrides. - All three auth dependencies bypass when enabled, still 401 when not. - Startup guard accepts every localhost variant, rejects empty / unset / non-localhost CORS_ORIGINS. - The skip-auth-guard.yml regex matches realistic leak strings and skips benign ones. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test: scrub SKIP_AUTH bleed from local .env in pytest conftest Tests that reload apis.app_api.main (e.g. test_pbt_auth_sweep.py) re-run load_dotenv(override=True), which copies SKIP_AUTH=true from a developer's backend/src/.env into os.environ for the rest of the process. Downstream auth-aware tests then silently take the bypass path and return a fake user. Add a session-wide autouse fixture that delenvs SKIP_AUTH_* per test. Test-local monkeypatch.setenv still wins (autouse runs first). Mirrors the existing pattern at tests/apis/shared/oauth/conftest.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): manage SKIP_AUTH env directly so autouse doesn't break sibling fixtures The monkeypatch-based scrub fixture changed pytest's fixture dependency graph: tests/apis/app_api/test_connectors_routes.py uses monkeypatch.setattr(routes, "_agentcore_control_client", lambda) and relies on its own autouse `_reset_control_client` tearing down AFTER that monkeypatch reverts. Adding a sibling autouse fixture that also depends on monkeypatch flipped the teardown order, leaving `_agentcore_control_client` as a plain lambda when `_reset_control_client` calls `cache_clear()` on it — 9 errors in CI. 
Manage os.environ directly via save/restore in a try/finally so the new fixture is independent of monkeypatch and doesn't perturb ordering for tests that compose monkeypatch with their own fixtures. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(env): document SKIP_AUTH local-dev bypass in .env.example Add a DEVELOPMENT SETTINGS section entry covering SKIP_AUTH and its optional SKIP_AUTH_ROLES / SKIP_AUTH_USER_ID / SKIP_AUTH_EMAIL knobs. Calls out the boot-time CORS_ORIGINS allowlist, the CI guard workflow, and the inference-api carve-out so a new dev landing in the file sees the safety story alongside the feature. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
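The CORS_ORIGINS allowlist guard for SKIP_AUTH can be sketched as follows. This is an illustration only, assuming comma-separated origins and a fixed set of local hostnames; the function name `boot_allowed` and the `LOCAL_HOSTS` set are hypothetical, and the real guard in app_api/main.py may parse its configuration differently.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}  # assumed allowlist


def boot_allowed(env: dict[str, str]) -> bool:
    """Return True when startup may proceed under the SKIP_AUTH guard."""
    if env.get("SKIP_AUTH", "").strip().lower() != "true":
        return True  # bypass not requested, nothing to guard
    raw = env.get("CORS_ORIGINS", "")
    origins = [o.strip() for o in raw.split(",") if o.strip()]
    if not origins:
        return False  # empty or unset CORS_ORIGINS is refused
    # Every origin must resolve to a local hostname; one outsider fails closed.
    return all(urlparse(o).hostname in LOCAL_HOSTS for o in origins)
```

Because the check requires every origin to be local, an unanticipated deploy target is refused by default rather than needing to be listed.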
* fix(bff): share AES-256 cookie data key across tasks via Secrets Manager PR #264 raised app-api desiredCount from 1 → 2 for concurrency slack but left CookieCodec calling kms:GenerateDataKey on first use per process. That generates a fresh random AES key per task, so a cookie sealed by Task A unseals as `bad seal` on Task B — every page-load fan-out under the new deployment shape becomes a 401 storm. Dev confirmed: /sessions returns 200 from one task while /permissions, /models, /tools, /quota, /connectors all return 401 from the other. This commit moves the data key out of per-process state and into a shared Secrets Manager secret, bootstrapped once at deploy time. Infra (CDK): - New `BFFCookieDataKeySecret`, encrypted at rest with the existing `BFFCookieSigningKey` CMK. - Two chained `AwsCustomResource`s bootstrap the wrapped data key on Create only: `kms:GenerateDataKey` -> `secretsmanager:PutSecretValue`. `outputPaths: ["CiphertextBlob"]` whitelists the field returned to CFN so the response Plaintext (the AES key itself) never enters CloudFormation state. - SSM parameter publishes the secret ARN for app-api to consume. App-api: - `CookieCodec._ensure_cipher` now reads the wrapped blob from Secrets Manager, calls `kms:Decrypt(KeyId=BFFCookieSigningKey, ...)` to unwrap, and caches the AESGCM cipher as before. KeyId is pinned to defend against blob substitution if the secret is ever tampered. - Distinguish infra failure (`CookieDataKeyUnavailable` -> 5xx) from decode failure (`CookieDecodeError` -> clear cookie). Empty / non- base64 / wrong-size key all surface as infra errors. - Drop `kms:GenerateDataKey` from the runtime task role (least privilege; runtime no longer needs it). The bootstrap custom resource carries its own narrow grant. Tests: - Cross-task seal/unseal regression locked in: `test_two_codecs_with_same_wrapped_blob_decrypt_to_the_same_cipher` — two CookieCodec instances simulate two ECS tasks; cookie sealed on one MUST unseal on the other. 
- New `_ensure_cipher` battery: happy path, KeyId pin, hot-path caching, Secrets Manager / KMS failure propagation, empty / bad base64 / wrong-size key rejection, missing config -> decode error. - Updated test_3_6 preservation contract to match the new code path (one Secrets Manager + one KMS Decrypt per process, was: one GenerateDataKey). - CDK tests for the bootstrap custom resources (KeySpec=AES_256, outputPaths whitelisted, narrow IAM grants), the new env var on app-api, and the IAM grant changes (Decrypt-only on the CMK, GetSecretValue on the data-key secret). - Fixed two pre-existing stale resource-count assertions in infrastructure-stack tests (16 → 18 DDB tables, 3 → 6 secrets). * fix(bff): coalesce Cognito refresh across tasks via DDB conditional-write lock The in-process `single_flight` and `get_session_lock` introduced by PR #264 only coalesce same-session callers within a single Python process. Once the cookie-codec fix lands and dev's two app-api tasks can share cookies again, two tasks under desiredCount: 2 will each see the same cookie cross the refresh-leeway boundary and each call `cognito-idp:initiate_auth` with the same refresh token. Cognito rotates on the winning call; the loser receives `NotAuthorizedException`, the loser's middleware clears the user's cookie, and the user is silently logged out. This commit adds a cross-task lock so exactly one Cognito refresh per session per leeway window happens across the entire fleet. Repository (DDB): - New `try_acquire_refresh_lock(session_id, owner, lock_ttl_seconds)`: conditional UpdateItem that succeeds iff `attribute_not_exists( refresh_lock_until) OR refresh_lock_until < :now`. Loser returns False; non-condition errors propagate. - `update_tokens` gains `expected_lock_owner=...` — when supplied, the write conditionally requires the row's `refresh_lock_owner` to match (or be absent), and atomically REMOVE-es the lock attrs in the same write. 
ConditionalCheckFailed propagates so a stale leader can't stomp on a successor's freshly persisted tokens. - `release_refresh_lock(session_id, owner)`: best-effort cleanup for the leader-failed path so a peer doesn't have to wait the full TTL before retrying. No-op if the lock has TTL'd or another task owns it. Other DDB errors logged-and-swallowed. Middleware: - Two-tier coalescing inside `_resolve_session._loader`: 1. existing `get_session_lock` (in-process) collapses N concurrent same-session callers within one task to one contender. 2. NEW `try_acquire_refresh_lock` (cross-process via DDB) elects exactly one leader across the entire fleet. Followers poll the row via `_wait_for_peer_refresh` and adopt the leader's tokens (rotation detected by refresh-token mismatch; non-rotation detected by access-token mismatch + future-dated exp). - Leader path: lock owner threaded through `_persist_refresh` so the write is conditional on still-being-leader. ConditionalCheckFailed on persist → re-read DDB and adopt the peer's tokens rather than invalidating the cache. - Cognito refresh failure on leader path: lock is released eagerly (best-effort) so peer requests don't have to wait for the full TTL. - Configurable `refresh_lock_ttl_seconds` (default 30s) — bounds the worst case where a leader crashes mid-refresh. Tests: - 8 new repository tests for the lock primitive: acquire on unlocked row, contention blocks peer, TTL recovery, distinct-session isolation, release-by-owner-only, atomic clear on token persist, condition fails when peer owns the lock. - 5 new integration-level cross-task tests (`test_session_refresh_cross_task.py`) running two `SessionRefreshMiddleware` instances over one moto DDB table — covers leader/follower paths, follower-polling-then-adopting, lock TTL recovery after dead leader, follower-falls-back-terminal when leader is stuck, and the headline invariant: two tasks racing in parallel call Cognito at most once. 
- Updated `test_session_refresh_preservation.py`'s `InstrumentedTable` to differentiate lock-acquire / token-persist / slide writes so `update_item_side_effect` injection only fires on the persist path (preserving the original test intent). No IAM change required: app-api task role already has `dynamodb:UpdateItem` on the BFF sessions table.
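The acquire rule from the repository section above can be sketched against an in-memory row. This is only an illustration of the condition (`attribute_not_exists(refresh_lock_until) OR refresh_lock_until < :now`); the real code issues a conditional DynamoDB UpdateItem, and the function name `acquire_lock_in_memory` and the dict-based row are assumptions made for this example.

```python
import time
from typing import Optional


def acquire_lock_in_memory(
    row: dict,
    owner: str,
    lock_ttl_seconds: int,
    now: Optional[float] = None,
) -> bool:
    """Hypothetical stand-in for the conditional write; not the repository code."""
    now = time.time() if now is None else now
    lock_until = row.get("refresh_lock_until")
    # Condition fails when a lock exists and has not yet expired.
    if lock_until is not None and lock_until >= now:
        return False
    # Winner records itself as owner and sets the expiry in the same "write".
    row["refresh_lock_owner"] = owner
    row["refresh_lock_until"] = now + lock_ttl_seconds
    return True
```

A peer that loses the race gets `False` and would fall back to polling the row; once the expiry passes, the next caller can take over from a crashed leader.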
…nerated secret (#274) PR #273 introduced an `AwsCustomResource`-chained bootstrap (kms:GenerateDataKey -> secretsmanager:PutSecretValue) to materialize a wrapped AES-256 data key for cross-task cookie sealing. That design fails on first stack create with: Custom::AWS BFFCookieDataKeyGenerate CREATE_FAILED Response object is too long. Root cause: the AwsCustomResource framework Lambda JSON-stringifies the AWS-SDK response BEFORE applying `outputPaths`. KMS returns `CiphertextBlob` as a Uint8Array, which serializes as `{"0":233,"1":18, ...}` — for a ~200-byte ciphertext that's ~1.5 KB, blowing past CloudFormation's 4 KB response-object limit. Even if it had landed under the limit, the value threaded into PutSecretValue via `getResponseField` would have been the JSON-object form, not a base64 ciphertext — runtime base64-decode would have failed at first cookie seal attempt. Fix: drop the chained custom resources entirely. Use Secrets Manager's own `generateSecretString` (44-char alphanumeric, ~261 bits of entropy) and derive the AES-256 key at runtime via SHA-256. Single-shot SHA-256 of a >=256-bit-entropy random input is a sound KDF — the output is statistically indistinguishable from random for AES-256 use. Threat model is preserved: - Secret is still encrypted at rest with the customer-managed `BFFCookieSigningKey` CMK. - Reading it requires both `secretsmanager:GetSecretValue` AND `kms:Decrypt` on the CMK (Secrets Manager invokes Decrypt on the caller's behalf using the secret-ARN encryption context). - Runtime never gets `kms:GenerateDataKey`, so a compromised task can't seal cookies under a parallel key. - Cross-task seal/unseal regression lock (`test_two_codecs_with_same_secret_derive_the_same_cipher`) still holds. Infra (CDK): - Removed `BFFCookieDataKeyGenerate` and `BFFCookieDataKeyStore` AwsCustomResources and their narrow IAM grants. - `BFFCookieDataKeySecret` now uses `generateSecretString` directly. 
- Dropped `kms:DescribeKey` from app-api task role; kept `kms:Decrypt` (Secrets Manager invokes it on the caller's behalf when reading a CMK-encrypted secret). - Removed the `AwsCustomResource` import; cleaned up obsolete bootstrap-related comments. App-api: - `CookieCodec._ensure_cipher` now reads the secret string from Secrets Manager and applies SHA-256 to derive the 32-byte AES-256 key. No KMS round trip, no per-cold-start `kms:Decrypt` call. - `CookieCodec` constructor lost the `kms_client` parameter; only `secrets_manager_client` is needed for testing. - Updated module docstring and `CookieDataKeyUnavailable` comment. Tests: - `test_cookie.py`: rewrote `_ensure_cipher` test battery for the no-KMS path. New `test_ensure_cipher_derived_key_matches_sha256_of_secret` pins the KDF — a future change must keep the same derivation, or every cookie sealed by an old task fails to unseal on a new task after deploy. Cross-task regression lock renamed to `test_two_codecs_with_same_secret_derive_the_same_cipher`. - `test_session_refresh_preservation.py`: 3.6 contract no longer asserts `kms.decrypt.call_count` or KeyId-pinning; only the Secrets Manager singleton-call invariant remains. - `test_session_refresh_cross_task.py`: comment updated to match new vocabulary (data-key secret vs wrapped data key). - `infrastructure-stack.test.ts`: dropped bootstrap-CR assertions (`generateDataKey` / `putSecretValue`); added negative lock that no AwsCustomResource emits those actions, plus a positive assertion on the `generateSecretString` shape (PasswordLength: 44, ExcludePunctuation, IncludeSpace: false). - `app-api-stack.test.ts`: comment-only update for the `kms:GenerateDataKey` exclusion; same negative assertion still holds under the new design. Net diff: -153 lines. No more chained custom resources, no per-cold- start KMS round trip, simpler runtime IAM surface. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
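A minimal sketch of the derivation described above, assuming the secret string is UTF-8 encoded before hashing (the encoding is not stated in the commit message). The helper names are hypothetical. The second function reproduces the entropy estimate for a 44-character secret drawn from a 62-character alphanumeric alphabet.

```python
import hashlib
import math


def derive_aes256_key(secret_string: str) -> bytes:
    """Single-shot SHA-256 of the secret string; yields a 32-byte key."""
    # Assumption: UTF-8 encoding of the secret before hashing.
    return hashlib.sha256(secret_string.encode("utf-8")).digest()


def secret_entropy_bits(length: int = 44, alphabet_size: int = 62) -> float:
    """Entropy of a uniformly random string: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)
```

Because the derivation is deterministic, two tasks that read the same secret derive the same key, which is the property the cross-task regression test pins.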
… guard (#275)
Two correctness improvements layered on top of PR #273's cross-task refresh-lock work.
1. Strict-owner lock release (repository.py)
The post-#273 release condition was:
attribute_not_exists(refresh_lock_owner) OR refresh_lock_owner = :owner
That has a stale-leader stomp bug:
- Task A acquires the lock.
- Task A's lock TTLs (slow Cognito refresh, ECS eviction, etc.).
- Task B acquires the lock, refreshes, and persists tokens, which REMOVEs the lock attrs in the same write.
- Task A returns from Cognito and calls update_tokens with its (older) tokens. `attribute_not_exists(refresh_lock_owner)` matches, so Task A's stale tokens overwrite Task B's freshly rotated ones.
- Next request: Cognito rejects Task A's now-revoked refresh token; the user is silently logged out.
Fix: tighten to strictly `refresh_lock_owner = :owner`. The leader always sets these attrs in `try_acquire_refresh_lock`, so the strict form is correct in every legitimate flow and surfaces every stale-leader case as `ConditionalCheckFailedException` for the caller to re-read and adopt the peer's tokens.
Also adds `try_acquire_refresh_lock` test coverage to lock in that the acquire path uses `attribute_exists(PK)` so it never creates phantom rows for sessions that don't exist.
2. Absolute-lifetime guard before refresh (session_refresh.py)
Mirrors the existing `_maybe_slide` short-circuit. If a session is past `created_at + absolute_lifetime_seconds`, don't burn a Cognito refresh-token rotation on a session whose row would TTL-evict immediately after the write; clear the cookie instead. Otherwise we silently rotate a token we'll never read again.
Plus INFO logging on cross-task adoption success so CloudWatch can answer "how often is cross-task coalescing actually firing?" without needing a debug deploy.
Tests:
- test_repository.py: new test_update_tokens_rejects_persist_when_peer_already_cleared_the_lock locks in the strict-owner condition. test_try_acquire_refresh_lock_does_not_create_phantom_row pins the acquire-path attribute_exists guard.
- test_session_refresh_middleware.py: new test_refresh_path_past_absolute_cap_clears_cookie_without_calling_cognito pins the absolute-lifetime guard ahead of the lock acquisition.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
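The difference between the two persist conditions can be illustrated with plain dicts. This is a sketch under the assumption that a row is a dict and that a missing `refresh_lock_owner` key models `attribute_not_exists`; both function names are hypothetical and neither is the repository code.

```python
def persist_allowed_lenient(row: dict, owner: str) -> bool:
    """Old condition: owner attribute absent OR equal to the caller."""
    return "refresh_lock_owner" not in row or row["refresh_lock_owner"] == owner


def persist_allowed_strict(row: dict, owner: str) -> bool:
    """New condition: owner attribute must equal the caller."""
    return row.get("refresh_lock_owner") == owner


# State after Task B persisted its tokens and removed the lock attrs.
row_after_task_b = {"refresh_token": "token-from-task-b"}

# Task A, a stale leader, now tries to persist its older tokens.
stale_write_accepted_before_fix = persist_allowed_lenient(row_after_task_b, "task-a")
stale_write_accepted_after_fix = persist_allowed_strict(row_after_task_b, "task-a")
```

Under the lenient form the stale write goes through; under the strict form it is rejected, which corresponds to the `ConditionalCheckFailedException` path where the caller re-reads the row and adopts the peer's tokens.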
Previously only the SessionService bootstrap path redirected on 401 — a session that expired mid-session left the user stranded with a generic toast (CRUD endpoints) or no feedback (SSE chat stream). Now every 401 flows through SessionService.handleUnauthorized(), which dedupes concurrent calls and queues a single navigation to /auth/login with a returnUrl preserved. Also surfaces session loss proactively rather than waiting for the next HTTP call to fail: - Cookie-presence fast-path in bootstrap and recheck — when the JS-readable __Host-bff_csrf cookie is gone, the session cookie is gone too (BFF sets/clears them together with matching Max-Age), so we skip the /auth/session round-trip and bounce straight to login. - Visibility re-probe in the app shell — on tab refocus, recheck() runs the cookie check and falls back to /auth/session, so a session that expired while the tab was backgrounded is caught immediately. Deferred to follow-ups: cross-tab BroadcastChannel coordination, draft preservation across login redirect, periodic background polling. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
… bootstrap run 2026-05-10 (#278) Adds two new repo-level skills under .claude/skills/: - kaizen-research: Friday early-AM external + internal scan (AWS Bedrock/AgentCore, Strands, aws-samples reference repo, MCP, frontier models, agent-harness patterns; internal git/PR/CI/version-pin signals). Outputs dated research doc + queues ideas in docs/kaizen/review-queue.md. - kaizen-review-prep: Friday late-AM ranked decision agenda. Consumes research + open queue + last-week's POC findings (from prior research PR comments) + recent merges/CI signal. Every item has Ship/Decline/Defer recommendation. Both skills open PRs into develop on kaizen/* branches. Web budget soft-target 50/run; subagent fan-out for external sources; explicit handoff contract via review-queue.md. This commit also includes the first bootstrap output: - docs/kaizen/research/2026-05-10.md - docs/kaizen/reviews/2026-05-10.md - docs/kaizen/review-queue.md (7 open items) Top finding: bedrock-agentcore is 3 minor versions behind (1.6.4 -> 1.9.0, released inside scan window) and our open issues #266/#267 were quietly closed by Strands v1.37/v1.38 (already in our 1.39 pin from #265). CI failure cluster (9 nightly + 6 deploy failures since May 6) is the loudest internal signal. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
…ative subtraction clarification (#279) Three additions to the kaizen-research skill (and corresponding refresh of the 2026-05-10 bootstrap output): 1. **FastMCP** added as source category 4a — tracks upstream releases for the externally hosted MCP servers this stack consumes via AgentCore Gateway. Not pinned in this repo's pyproject.toml; lives in the MCP server repos. Source: https://github.com/jlowin/fastmcp + PyPI. 2. **Library-native subtraction** explicitly named in the Subtraction-first philosophy. When upstream ships a capability we built or filed an issue for, the win is closing our version and adopting upstream. The 2026-05-10 bootstrap surfaces a canonical example: Strands v1.37/v1.38 silently closed our open issues #266 and #267. 3. **Security posture audit** added as internal source 18a. Snapshots open Dependabot alerts, open CodeQL findings (with severity + rule + path), open security-labeled issues, recent auth-surface commit churn, and the most recent CHANGELOG security block. Cross-references open Dependabot alerts against external advisories scan to surface "we already know it's hitting us" overlap. Surfaces in the doc as a new "Security posture" section between Version-pin lag and Retirement candidates. The bootstrap output (2026-05-10) was re-walked through the new lenses: - Security posture section added: 9 open Dependabot alerts (4 high — all `fast-uri`, confirmed real by the external advisories scan), 10 open CodeQL (3 error-severity `py/log-injection` in real backend paths), 0 security-labeled issues, 7-commit BFF auth-surface churn. - 2 new Top-7 proposals: #1 patch `fast-uri` Dependabot alerts; #3 fix `py/log-injection` CodeQL findings. Both Low effort, High/Med-High impact — they jump to the top of the ranked agenda. - TL;DR, Take, and Review Protocol updated to reflect security findings. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
* fix(deps): remediate all 22 HIGH Dependabot findings Bumps vulnerable dependencies across backend, scripts, and frontend to patched versions. All 22 open HIGH-severity Dependabot alerts addressed. Backend (pyproject.toml + uv.lock): - cryptography 47.0.0 -> 48.0.1 (GHSA-537c-gmf6-5ccf) - starlette 1.0.0 -> 1.3.1 (CVE-2026-48818, CVE-2026-54283) - python-multipart 0.0.27 -> 0.0.30 (CVE-2026-53539) - pyjwt[crypto] 2.12.1 -> 2.13.0 (CVE-2026-48526) - urllib3 pinned 2.7.0 (CVE-2026-44431, CVE-2026-44432) Scripts (backup-data, restore-data): - add uv constraint cryptography>=48.0.1 -> 49.0.0 (GHSA-537c-gmf6-5ccf) Frontend (package.json + package-lock.json): - @angular/* framework 21.2.11 -> 21.2.17 (CVE-2026-50170/50171/54266/54267/54268) (cdk 21.2.14, build/cli 21.2.16 -- latest existing patch for those packages) - hono override -> 4.12.26 (CVE-2026-54290) - piscina override -> 5.2.0 (CVE-2026-55388) - undici override -> 7.28.0 (CVE-2026-9697) - vite override -> 8.0.16 (CVE-2026-53571) Verified: backend pytest 3935 passed/3 skipped; frontend build OK; frontend unit tests 1216 passed. All locked versions >= patched thresholds. * fix(deps): remediate easy MEDIUM/LOW Dependabot findings Straightforward in-range/direct dependency bumps for remaining alerts. Risky or blocked findings deliberately left out (see below). 
Backend (pyproject.toml + uv.lock): - aiohttp 3.13.5 -> 3.14.1 - authlib 1.7.0 -> 1.7.1 - python-multipart 0.0.30 -> 0.0.31 - idna pinned 3.15 (transitive) Scripts (backup-data, restore-data): - pytest 8.4.2 -> 9.0.3 (dev) Frontend (package.json + package-lock.json): - mermaid 11.14.0 -> 11.15.0 (direct) - @babel/core override ">=7.29.6 <8.0.0" -> 7.29.7 (bounded to 7.x; open-ended pulled babel 8 which broke the Angular build) Infrastructure (package-lock.json, lock-only): - @babel/core -> 7.29.7 (within range) Deliberately NOT included (not "easy"): - esbuild 0.28.1: pinned by vite 8; override risks build breakage (low sev) - js-yaml 4.2.0: major bump (consumers pin ^3) - dompurify: one alert has no patched version published - infra fast-uri (HIGH) / brace-expansion / js-yaml: bundled inside aws-cdk-lib@2.251.0/node_modules; npm overrides can't reach them. Requires an aws-cdk-lib upgrade -- separate, deliberate change. Verified: backend pytest 3935 passed/3 skipped; frontend build OK + 1216 unit tests passed; infra tsc build OK + 396 jest tests passed. * fix(deps): upgrade aws-cdk-lib 2.251.0 -> 2.260.0 to clear bundled CVEs The remaining infra Dependabot HIGH/MEDIUM alerts were in deps bundled inside aws-cdk-lib's own node_modules (unreachable by npm overrides): - fast-uri 3.1.0 -> 3.1.2 (HIGH, 2 advisories) — bundled via ajv - brace-expansion 5.0.5 -> 5.0.6 (MEDIUM) — bundled via minimatch Bumping aws-cdk-lib to 2.260.0 (latest 2.x) re-bundles both at patched versions. constructs 10.6.0 already satisfies the ^10.5.0 peer (unchanged). Verified: infra tsc build clean; jest 396 passed/18 suites (full PlatformStack construction + Template.fromStack assertions, no template regressions). cdk-lib v2 minor bump, backward compatible.
The deploy workflows (backend.yml, platform.yml, frontend-deploy.yml) only run on push to develop/main and run their test jobs as a pre-deploy gate, so unit tests never executed on the PR itself. PRs into develop ran only skip-auth-guard. Adds .github/workflows/ci.yml triggered on pull_request -> [develop, main] with three parallel test jobs reusing the existing commands: - test-backend: uv sync + uv run pytest tests/ - test-frontend: npm ci + npm run test:ci (vitest) - test-infra: npm ci + npx jest No build/deploy/AWS steps — deploys stay push-only. Actions are SHA-pinned with the shared checkout SHA, runners pinned to ubuntu-24.04, and cancel-in-progress: true (safe; no CDK deploy). Conforms to all tests/supply_chain checks (31 passed).
…across edge origins (#491) Collapse the three-separate-cert first-deploy footgun into one shared wildcard, and make cert handling consistent and fail-loud across all CloudFront origins. - config.ts: add CDK_CLOUDFRONT_CERTIFICATE_ARN (top-level cloudfrontCertificateArn). frontend/artifacts/mcpSandbox certs fall back to it when their section-specific ARN is unset; section-specific wins. ALB cert stays separate (region-specific). One us-east-1 wildcard ({domain}+*.{domain}) now satisfies all three origins. - artifacts-distribution-construct: add domain-set-but-cert-missing guard mirroring mcp-sandbox (replaces the false 'config.ts already enforced' comment + opaque fromCertificateArn(undefined) crash), and add a domain-less fallback to the CloudFront default domain so domain-less synth no longer crashes with 'reading startsWith'. - load-env.sh: forward cloudfrontCertificateArn + mcpSandbox.certificateArn context params (were missing, breaking the cdk.context.json path). - workflows: wire CDK_CLOUDFRONT_CERTIFICATE_ARN job-level env in platform / nightly / teardown. - docs: step-02/step-03/ACTIONS-REFERENCE recommend the single shared cert and reframe per-origin vars as optional overrides; troubleshooting entry for the synth cert-guard failure. - tests: CloudFront cert resolution (config), artifacts cert guard + domain-less fallback, and end-to-end shared-cert PlatformStack synth. Full infra suite: 20 suites / 406 tests green.
* fix(infra): bump aws-cdk CLI 2.1120.0 -> 2.1128.0 to match aws-cdk-lib 2.260.0 aws-cdk-lib 2.260.0 emits cloud-assembly schema 54.0.0, but the pinned aws-cdk CLI (2.1120.0) only reads up to schema 53 — so synth/deploy failed with 'CDK CLI is not compatible with the CDK library ... Maximum schema version supported is 53.x.x, but found 54.0.0. You need at least CLI version 2.1128.0'. The library was bumped without bumping the CLI. Scripts invoke 'npx cdk', which resolves the local aws-cdk devDependency on the CI runner, so bumping the pin + regenerating the lockfile is the fix. Also bump the devcontainer global CDK pin (Dockerfile) and the version tables (README, dev-environment steering) that are documented to track package.json, so the interactive 'cdk' in the container doesn't drift and reproduce the same error locally. Verified in the devcontainer: npx cdk --version -> 2.1128.0; a synth of an aws-cdk-lib 2.260.0 assembly (manifest schema 54.0.0) is read by the 2.1128.0 CLI with exit 0; full infra suite 20 suites / 406 tests green. * ci(platform): pin Node 22 in the PlatformStack deploy jobs The deploy jobs in platform.yml and nightly-deploy-pipeline.yml were the only jobs without actions/setup-node — they ran scripts/platform/deploy.sh (which does npm ci + cdk via deploy.sh -> scripts/cdk/install.sh) on the runner's ambient Node instead of the Node 22 every other job and the devcontainer pin. Add setup-node (node 22 + npm cache) so the deploy toolchain is pinned and reproducible. Note: deps were already being installed (deploy.sh calls install.sh -> npm ci); the recent schema-mismatch failure was the stale aws-cdk pin (2.1120.0), fixed in 4339f26 by bumping to 2.1128.0. This change is the toolchain-pinning gap the #396 refactor left in the deploy jobs.
* fix(infra): auto-generate IAM role names to avoid first-deploy collisions Drop explicit roleName from the AgentCore memory/code-interpreter/browser/ gateway/runtime execution roles and the SageMaker execution role. Fixed physical names collide with orphaned roles left by a rolled-back/partial deploy. Every consumer references these roles by .roleArn (or resolves them at runtime via GetGateway / SAGEMAKER_EXECUTION_ROLE_ARN), so auto-generated names are safe. * chore(infra): replace deprecated pointInTimeRecovery with pointInTimeRecoverySpecification Silences the aws_dynamodb.TableOptions#pointInTimeRecovery deprecation warnings across all DynamoDB table constructs. Synthesized CloudFormation output is unchanged (still PointInTimeRecoveryEnabled: true). * fix(deploy): re-seed image-tag SSM param when the referenced ECR image is missing The seed guard skipped any URI-shaped value, trusting the build pipeline. But image-tag params are not CFN-managed and survive teardown, so a stale project-repo URI could outlive its ECR repo and break the AgentCore Runtime / ECS task def with 'repository does not exist'. Verify the image actually exists (ecr describe-images) before skipping; otherwise overwrite with the bootstrap URI. * fix(infra): grant Cognito group + delete actions for first-boot admin setup Add cognito-idp:CreateGroup and AdminAddUserToGroup (the first-boot flow creates the system_admin group and adds the initial admin) plus AdminDeleteUser (so the rollback path doesn't orphan a Cognito user and block retry with UsernameExistsException) to the app-api task role.
…es (#495) Reverts the roleName removal from 7107cf9 for the AgentCore memory/code-interpreter/browser/gateway/runtime and SageMaker execution roles. Auto-generating these names is unsafe on an already-deployed stack: the role ARN feeds create-only properties on the AgentCore resources (BrowserCustom/CodeInterpreterCustom executionRoleArn, Memory memoryExecutionRoleArn, Gateway/Runtime roleArn). Renaming the role replaces it (new ARN) -> forces replacement of the dependent AgentCore resource -> CFN re-creates it with the same create-only Name -> 'already exists' collision -> UPDATE_ROLLBACK. Confirmed via ai-sbmt-api-PlatformStack events (BrowserCustom/CodeInterpreterCustom DELETE_COMPLETE with empty PhysicalResourceId during rollback). Orphaned fixed-name roles on a *fresh* deploy are handled by deleting the orphans before deploying, not by renaming. Added comments on each role to prevent re-introducing the auto-name change.
The RAG ingestion Lambda and AgentCore Runtime are arm64, but the post-refactor backend.yml ran their build jobs on amd64 (ubuntu-24.04). inference-api compensated with QEMU emulation (platform: linux/arm64); rag-ingestion had neither the platform input nor a build-one.sh PLATFORM, so it produced an amd64 image -> the arm64 Lambda failed every invoke with Runtime.InvalidEntrypoint (file uploads stuck 'uploading', no embeddings). Restore the pre-refactor (main) approach: build both arm images on native ubuntu-24.04-arm runners instead of emulating on amd64. - backend.yml: build-inference-api and build-rag-ingestion -> runs-on ubuntu-24.04-arm; drop the QEMU-triggering platform input from inference-api (native build needs no emulation). - build-one.sh: rag-ingestion PLATFORM=linux/arm64 (explicit native build). Deploy jobs stay on ubuntu-24.04 (API-only, no docker build). Note: the stale amd64 rag-ingestion ECR image must be deleted once so the content-hash build doesn't skip the corrected arm64 build.
… off) (#497) * feat(skills): gate skills feature behind SKILLS_ENABLED flag (default off) Defer the skills feature (user picker, admin catalog, skills mode) for a release without removing any merged code. A new apis/shared/feature_flags.py::skills_enabled() reads SKILLS_ENABLED (default false), mirroring the FINE_TUNING_ENABLED precedent. Deployed environments go dark automatically because the env var is absent; set SKILLS_ENABLED=true on both app-api and inference-api to re-enable. Gated surfaces (code and data left intact): - app-api main.py: user-facing skills router mounted only when enabled. - app-api admin/routes.py: admin skills + chat-mode-policy routers gated. - app-api system/routes.py: GET /system/chat-settings reports chat/no-toggle/ skillsEnabled:false when off (new skills_enabled field on the response). - inference-api chat/routes.py: force skill -> chat when off (voice/other agent types untouched). - SPA: ChatModeService.skillsEnabled drives admin nav hide; mode toggle and skills section auto-hide via allowModeToggle:false; SkillService eager load gated by an effect so a disabled env never fires the now-404 GET /skills/. Tests default off; skills-mode tests opt in via SKILLS_ENABLED=true and new off-behavior tests cover the forced-chat path and admin mount gating. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(model-settings): mock Skill/ChatMode services to stop teardown rejection model-settings.spec instantiated the real SkillService and ChatModeService, which fire /skills/ (now via an effect) and /system/chat-settings on construction. With no httpMock those requests fail asynchronously, and the SKILLS_ENABLED change shifted the timing so a console.error landed during worker teardown — surfacing as an unhandled EnvironmentTeardownError that failed the vitest run with exit 1 even though all 1218 tests passed. Provide minimal mocks for both services so the spec fires no stray async work. Full suite exits 0. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
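A default-off flag reader of the kind described for `skills_enabled()` might look like the sketch below. The commit message only states that the flag defaults to false and that `SKILLS_ENABLED=true` enables it; the set of accepted truthy spellings and the optional `environ` parameter are assumptions added for illustration and testability.

```python
import os
from typing import Mapping, Optional

# Assumption: which spellings count as "on" is not specified in the source.
_TRUTHY = {"1", "true", "yes", "on"}


def skills_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when SKILLS_ENABLED is explicitly set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get("SKILLS_ENABLED", "false").strip().lower() in _TRUTHY
```

With this shape, an environment that never sets the variable stays dark, which matches the stated behavior for deployed environments.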
The admin "discover from server" button signs its request as the app-api task role (SigV4, service=lambda), but that role has no lambda:InvokeFunctionUrl — only inference-api does. Against an AuthType=AWS_IAM Lambda Function URL the signed request is rejected with 403, which surfaces during MCP client init as an anyio TaskGroup ExceptionGroup and falls through to a generic 502. For same-team MCP servers that validate a forwarded user JWT (Lambda URL AuthType=NONE), discovery should mirror the runtime forward_auth_token path and sign with the admin's own OIDC token instead of SigV4. Add a forward_auth_token flag to MCPDiscoverRequest; when set, the discover route forwards admin.raw_token as the bearer (400 if unavailable) and skips SigV4. Provider-gated OAuth (3LO) discovery is still rejected — the admin session can't supply an end-user provider token. Wire the flag through the admin tool form's discover call so the existing "Forward app authentication token" checkbox governs discovery too. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…or single-stack (#499) Reconcile the nightly pipeline and teardown scripts with the single PlatformStack, API-driven-deploy architecture. nightly-deploy-pipeline.yml: restore the workflow_call input contract the orchestrator still passes (ref, project-prefix, alb-subdomain, skip-teardown, label, source-project-prefix, run-e2e). Every job now checks out inputs.ref and deploys to the ephemeral inputs.project-prefix (never the shared environment). Add an always() teardown job (needs all deploy/test jobs, gated on skip-teardown) so every nightly stack is destroyed even on partial/failed deploys -- no paying for idle resources. Ephemeral env runs with no custom domain and an unset CDK_COGNITO_DOMAIN_PREFIX (defaults to the unique prefix). scripts/nightly/teardown.sh: delete <prefix>-PlatformStack via cloudformation delete-stack + wait (was a dead cdk-destroy loop over removed per-stack names). scripts/teardown/destroy.sh: add PlatformStack to the foundation phase while keeping legacy InfrastructureStack/app-stack handling, so the manual teardown works for both single-stack and legacy deployments.
…cret (#501) The admin auth-providers endpoints (POST/DELETE /admin/auth-providers) write the provider client-secret bag back to the auth-provider-secrets secret via PutSecretValue (apis/shared/auth_providers/repository.py), but the App API task role was only granted GetSecretValue. Configuring/removing an auth provider failed with AccessDeniedException on secretsmanager:PutSecretValue. Add a least-privilege PutSecretValue statement scoped to just the auth-provider-secrets secret (trailing wildcard matches the random ARN suffix). No other runtime-written secret needs this.
* fix(nightly): auto-teardown ephemeral nightly deploys; fix teardown for single-stack Reconcile the nightly pipeline and teardown scripts with the single PlatformStack, API-driven-deploy architecture. nightly-deploy-pipeline.yml: restore the workflow_call input contract the orchestrator still passes (ref, project-prefix, alb-subdomain, skip-teardown, label, source-project-prefix, run-e2e). Every job now checks out inputs.ref and deploys to the ephemeral inputs.project-prefix (never the shared environment). Add an always() teardown job (needs all deploy/test jobs, gated on skip-teardown) so every nightly stack is destroyed even on partial/failed deploys -- no paying for idle resources. Ephemeral env runs with no custom domain and an unset CDK_COGNITO_DOMAIN_PREFIX (defaults to the unique prefix). scripts/nightly/teardown.sh: delete <prefix>-PlatformStack via cloudformation delete-stack + wait (was a dead cdk-destroy loop over removed per-stack names). scripts/teardown/destroy.sh: add PlatformStack to the foundation phase while keeping legacy InfrastructureStack/app-stack handling, so the manual teardown works for both single-stack and legacy deployments. * fix(nightly): restore test-infra/backend/frontend gates in deploy pipeline The pipeline rewrite for ephemeral auto-teardown dropped the per-pipeline test gates, breaking infrastructure/test/repo-shape.test.ts which requires: - build-*/code-deploy-* jobs gate on test-backend - deploy-frontend gates on test-frontend - deploy-platform gates on test-infra Re-add the three test jobs (checking out inputs.ref) and the needs edges, keeping the input-driven ephemeral deploy + always() teardown intact. Also add the test jobs to teardown's needs so teardown waits for the whole graph. Verified: npx jest repo-shape passes (49/49); both nightly workflow YAMLs parse and all orchestrator calls pass only declared inputs.
…502) The McpSandboxDistributionConstruct doc comment claimed it publishes the proxy origin to SSM at `/{prefix}/mcp-sandbox/origin`. That was true of the pre-#396 standalone McpSandboxStack, but the single-stack consolidation dropped the SSM publication: the origin is now exposed as `proxyOrigin` and threaded through PlatformComputeRefs straight into inference-api's `AGENTCORE_MCP_APPS_SANDBOX_ORIGIN` env var. The stale comment misleads anyone debugging the sandbox (a missing SSM param looks like a broken deploy when it is expected). Update the comment to match the code. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Generated by the kaizen-research skill. Top 5 ideas appended to docs/kaizen/review-queue.md for the kaizen-review-prep run later this morning. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The ui/message bridge handler read `params.content` as a single block
({type,text}), but per SEP-1865 / the ext-apps SDK the View sends an ARRAY
of content blocks (content: [{type:'text',text}]). Every spec-compliant
widget message was therefore rejected with -32000 "Invalid ui/message
params" (e.g. an MCP App's app.sendMessage()).
Read `content` as an array and concatenate its text blocks (mirrors the
ui/update-model-context handler). Update MessageParams to the array shape,
fix the two bridge specs that sent single objects, and add a multi-block
concatenation regression test.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
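The array-of-blocks handling can be sketched as follows. The actual bridge handler lives in the frontend, so this Python version is only an illustration of the logic; the function name, the empty-string join separator, and the choice to raise on a non-list `content` are assumptions.

```python
def extract_message_text(params: dict) -> str:
    """Concatenate the text of all text blocks in params['content']."""
    content = params.get("content")
    if not isinstance(content, list):
        # Assumption: anything other than an array is treated as invalid params.
        raise ValueError("Invalid ui/message params")
    return "".join(
        block["text"]
        for block in content
        if isinstance(block, dict)
        and block.get("type") == "text"
        and isinstance(block.get("text"), str)
    )
```

A spec-shaped payload such as `{"content": [{"type": "text", "text": "hi"}]}` yields `"hi"`, while the old single-object shape is rejected in this sketch.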
…rt (#504) A single TLS handshake blip (e.g. SSLV3_ALERT_HANDSHAKE_FAILURE from a TLS-inspecting middlebox) or connection reset when starting an external MCP client otherwise fails the whole agent build: Strands' start() raises MCPClientInitializationError, the tool fails to load, and agent creation errors out for the user. UICapableMCPClient.start() now retries transient transport failures — ConnectError/SSLError/timeouts, detected by walking the MCPClientInitializationError -> ExceptionGroup -> httpx.ConnectError chain — up to 3 attempts with exponential backoff. Non-transient errors (bad URL, auth, protocol) are re-raised on the first attempt. Strands resets its init future + background thread on failure (stop()), so re-invoking start() is safe. Covers both external and gateway clients. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
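The retry behavior described above can be sketched with the standard library alone. This is not the `UICapableMCPClient` code: the transient set here uses built-in exception types as stand-ins for the httpx and Strands errors, and the function names, the 0.5 s base delay, and the injectable `sleep` are assumptions.

```python
import ssl
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")

# Stand-ins for the connect/SSL/timeout errors named in the commit message.
TRANSIENT = (ConnectionError, ssl.SSLError, TimeoutError)


def iter_chain(exc: BaseException) -> Iterator[BaseException]:
    """Yield exc plus everything reachable via causes, contexts, and group members."""
    seen = set()
    stack = [exc]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        yield current
        # Exception groups expose their members on `.exceptions`.
        stack.extend(getattr(current, "exceptions", ()))
        for linked in (current.__cause__, current.__context__):
            if linked is not None:
                stack.append(linked)


def is_transient(exc: BaseException) -> bool:
    return any(isinstance(e, TRANSIENT) for e in iter_chain(exc))


def start_with_retry(
    start: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call start(); retry transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return start()
        except Exception as exc:
            # Non-transient errors and the final attempt re-raise immediately.
            if attempt == attempts or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays; a wrapper error whose cause is a connection reset is retried, while a plain `ValueError` is raised on the first attempt.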
…rdances (#505) A user turn initiated by an MCP App widget (ui/message → submitChatRequest) appeared in the conversation but skipped the two affordances the composer path triggers: the loading indicator (the page sets chatLoading before submitting) and the scroll-to-top of the new user message (chat-container's post-submit setTimeout). The widget delegate called the service directly, bypassing the chat-input → chat-container → page chain. - ChatStateService: add a `scrollToLastUserTick` signal + `requestScrollToLastUser()`. - ChatContainer: react to the tick by scrolling the last user message to the top (mirrors onMessageSubmitted; skips the initial 0 so it doesn't scroll on mount). - mcp-app-frame widget delegate: set chatLoading(true) and request the scroll around submitChatRequest (the user message is added synchronously inside it). The composer path is untouched (it keeps its own setTimeout scroll), so no existing behavior changes. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Drop the stale '6.' numeric prefix from the workflow name. It was a leftover from the pre-refactor numbered pipeline ordering; no workflows use numeric prefixes anymore and nothing referenced the old name.
Promote from 1.0.0-beta.27 to the 1.0.0 GA release. Synced via scripts/common/sync-version.sh across backend/pyproject.toml, frontend and infrastructure package.json + lockfiles, uv.lock, and the README badge/current-release text.
Document the full delta since beta.27 (142 commits) for the 1.0.0 GA: single-stack platform-as-bootstrap architecture, admin-managed Skills (gated off), Conversation Modes, file-source connectors + website crawling, assistant share permissions, Gateway MCP self-service targets, curated model catalog + Bedrock Mantle provider, per-turn context attribution, MCP Apps hardening, backup/restore tooling, a security sweep, and all 22 HIGH Dependabot remediations. Replaces the stale 'Unreleased' section in RELEASE_NOTES.md.
main carried 3 prior-release squash commits (beta.25, beta.26, beta.27) that were never merged back to develop, so release/1.0.0 could not auto-merge into main. Every file main has that release/1.0.0 lacks is the OLDER pre-1.0.0 version (verified: superseded skills/Gateway/Mantle code, removed per-component CDK_*_ENABLED flags, and the legacy multi-stack files deleted by #396). release/1.0.0 is the content superset, so we merge with -s ours: main is recorded as a parent for ancestry while release/1.0.0's tree is kept exactly — no resurrected legacy files, no reverted 1.0.0 work.
Comment on lines
+105
to
+108
| - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| with: | ||
| path: infrastructure/node_modules | ||
| key: infrastructure-node-modules-${{ hashFiles('infrastructure/package-lock.json') }} | ||
| check-stack-dependencies: | ||
| name: "[${{ inputs.label }}] Check Stack Dependencies" | ||
| ref: ${{ inputs.ref }} | ||
| - uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0 |
Comment on lines
+126
to
+129
| - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| with: | ||
| ref: ${{ inputs.ref }} | ||
| - name: Run stack dependency tests | ||
| run: bash scripts/common/test-stack-dependencies.sh | ||
| deploy-infrastructure: | ||
| name: "[${{ inputs.label }}] Deploy Infrastructure" | ||
| - uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0 |
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
| { name: 'Artifacts', factory: () => new ArtifactsDataConstruct(testStack(), 'X', { config }) }, | ||
| ]; | ||
| for (const { name, factory } of constructFactories) { |
| def test_delete_missing_404(client): | ||
| assert client.delete("/skills/nope").status_code == 404 |
Comment on lines
+172
to
+175
| assert ( | ||
| client.delete("/skills/pdf_workflows/resources/nope.md").status_code | ||
| == 404 | ||
| ) |
Comment on lines
+90
to
+96
| r""" | ||
| os\.(?:environ\.get|getenv|environ\[) # os.environ.get / os.getenv / os.environ[ | ||
| \s*\(?\s* # optional whitespace + optional ( | ||
| ['"] # opening quote | ||
| ([A-Z][A-Z0-9_]+) # NAME | ||
| ['"] # closing quote | ||
| """, |
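To show what the pattern in the snippet above matches, here is a small runnable sketch. The `re.VERBOSE` flag is an assumption (the snippet is cut off before the flags, but the inline `#` comments only work in verbose mode), and `ENV_READ_RE`, `find_env_names`, and `SAMPLE` are hypothetical names used only for illustration.

```python
import re

# Same pattern as in the reviewed lines; re.VERBOSE is assumed.
ENV_READ_RE = re.compile(
    r"""
    os\.(?:environ\.get|getenv|environ\[)   # os.environ.get / os.getenv / os.environ[
    \s*\(?\s*                               # optional whitespace + optional (
    ['"]                                    # opening quote
    ([A-Z][A-Z0-9_]+)                       # NAME
    ['"]                                    # closing quote
    """,
    re.VERBOSE,
)


def find_env_names(source: str) -> list[str]:
    """Return the environment variable names read in `source`, in first-seen order."""
    seen: list[str] = []
    for name in ENV_READ_RE.findall(source):
        if name not in seen:
            seen.append(name)
    return seen


SAMPLE = '''
url = os.environ.get("DATABASE_URL")
key = os.getenv('API_KEY', "")
region = os.environ["REGION_NAME"]
lower = os.getenv("lowercase_name")
'''

print(find_env_names(SAMPLE))
```

Two properties of the pattern are worth noting: the name must start with an uppercase letter and be at least two characters long, so `lowercase_name` is skipped, and the opening and closing quotes are not required to be the same character.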
- Remove all references to the not-yet-ready Skills feature (gated off behind SKILLS_ENABLED) from RELEASE_NOTES.md and CHANGELOG.md; it will be announced in a future release. Conversation Modes (ships enabled) is retained.
- Add a prominent 'Upgrading an existing deployment' section that explains the destructive backup -> teardown -> redeploy -> restore migration and links to the full step-by-step guides (published docs site + in-repo upgrade-from-multi-stack.md) so operators can navigate straight to instructions for their stacks.
- Fix an incorrect PR citation for the web-sources dependencies (#378).
- Comment out the push: triggers on platform.yml, backend.yml, and frontend-deploy.yml so forking or syncing the codebase never auto-deploys infrastructure or code into a user's AWS account. Deploys are now manual via the Actions tab; re-enable by uncommenting.
- Document the gating in RELEASE_NOTES.md (CI/CD + deployment notes) and CHANGELOG.md.
- Add justification for the single-stack overhaul to the release notes: the app is a monolith, not a microservice fleet; the nine-stack layout bought all the operational cost of microservices (cross-stack ImportValue ordering, inter-stack drift, first-deploy chicken-and-egg) with none of the benefits, and consolidating removes that class of deployment gotchas.
| custom_prompt = await get_system_prompts_service().get_enabled_prompt(active_prompt_id) | ||
| if not custom_prompt: | ||
| logger.info( | ||
| f"Custom prompt {active_prompt_id!r} not found or disabled — skipping" |
| logger.info( | ||
| "Kicked off crawl %s for assistant %s (root_document=%s url=%s)", | ||
| job.crawl_id, | ||
| assistant_id, |
| ): | ||
| logger.warning( | ||
| "PUT /sessions/%s/metadata: session id is taken under a different user; refusing", | ||
| session_id, |
| try: | ||
| return await adapter.search(access_token, query, cursor) | ||
| except FileSourceError as err: | ||
| logger.warning("search failed for connector %s: %s", provider_id, err) |
| Run with: pytest tests/supply_chain/test_backup_coverage.py -v | ||
| """ | ||
| import ast |
| cgnat = ipaddress.IPv4Network("100.64.0.0/10") | ||
| if addr in cgnat: | ||
| return True | ||
| except ValueError: |
| parsed = json.loads(candidate) | ||
| if isinstance(parsed, dict): | ||
| return parsed | ||
| except (ValueError, TypeError): |
| import time | ||
| from collections import defaultdict | ||
| from typing import Awaitable, Callable, Dict, List, Optional, Set, Tuple | ||
| from urllib.parse import urljoin |
| import asyncio | ||
| import logging | ||
| import os |
see release notes