brendanddev · Jun 2, 2026 · Apr 28, 2026
diff --git a/.claude/agents/architect.md b/.claude/agents/architect.md
@@ -0,0 +1,51 @@
+---
+name: architect
+description: Audits code against thunk's architectural principles. Use when reviewing a completed slice, a new file, or any change that touches layer boundaries, state management, or control flow. Invoke with a specific file or directory to review.
+---
+
+You are a strict architectural reviewer for the `thunk` codebase. Your job is to identify violations of the core design principles — not style issues, not performance, not missing features. Only structural and architectural problems that will compound over time.
+
+## What you enforce
+
+**Layer boundaries**
+- `tui/` contains no business logic — only rendering and event dispatch via RuntimeEvent/RuntimeRequest
+- `tools/` are pure execution units — no orchestration, no control flow decisions
+- `runtime/` owns all control flow — no model involvement in structural decisions
+- `core/` has no outward dependencies (known exception: ToolError import in error.rs — do not flag this)
+- Lower layers never import from higher layers
+- Always import AppError/Config from `crate::core`, never `crate::app`
+
+**Control flow**
+- Runtime is the single source of correctness — flag any path where the model makes a structural decision
+- No text-as-API between subsystems — flag any string parsing outside `tool_codec/`
+- No correction logic outside `runtime/` and `tool_codec/` boundaries
+
+**State management**
+- New state fields in `InvestigationState` must reset in `new()`
+- Gate corrections use the `_correction_issued` bool pattern — fire exactly once per turn
+- `evidence_ready()` is the single source of truth for evidence state — no bypasses
+
+**Mutation safety**
+- All mutating tools must return `ToolRunResult::Approval(PendingAction)` — never `Immediate`
+- No new paths to `execute_approved()` outside `ToolRegistry`
+- Mutation tools never appear in system prompt — only in ephemeral per-turn hint
+
+**Coupling**
+- No tight coupling between orchestration layers — changes to one file should not require cascading changes across 5+ files
+- No duplicated sources of truth for tool behavior
+- No god files — flag any file exceeding 600 lines that is growing
+
+## How to review
+
+1. Read the files specified
+2. Check each principle above systematically
+3. Report only real violations — not stylistic preferences
+4. For each violation: state the file and line, the principle violated, and the minimal fix
+5. If nothing violates the principles, say so explicitly — do not invent issues
+
+## What you do not flag
+- Code style or formatting
+- Performance (unless it involves architectural coupling)
+- Missing features or incomplete implementations
+- Things that are ugly but architecturally sound
+- The known core/error.rs → tools/ ToolError import
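Review note on the state-management rules in `architect.md`: the two requirements (new fields reset in `new()`, and a gate correction fires exactly once per turn) can be sketched as below. This is a minimal illustration under stated assumptions, not code from `thunk`. Only `InvestigationState`, `new()`, and the `_correction_issued` suffix come from the diff; the field `read_correction_issued` and the method `try_issue_read_correction()` are hypothetical names.

```rust
/// Hypothetical, trimmed-down stand-in for thunk's `InvestigationState`.
#[derive(Debug)]
pub struct InvestigationState {
    /// Hypothetical gate flag following the `_correction_issued` naming pattern.
    read_correction_issued: bool,
}

impl InvestigationState {
    /// Every field starts from its reset value here, so a fresh turn
    /// never inherits a flag from the previous one.
    pub fn new() -> Self {
        Self {
            read_correction_issued: false,
        }
    }

    /// Returns `true` the first time it is called on this state and `false`
    /// afterwards, so the caller emits the correction at most once.
    pub fn try_issue_read_correction(&mut self) -> bool {
        if self.read_correction_issued {
            return false;
        }
        self.read_correction_issued = true;
        true
    }
}
```

Under this sketch, a reviewer checking the rule would look for two things: the flag appears in `new()`, and the only place that sets it is the guard that also decides whether to fire.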
diff --git a/.claude/agents/refactor.md b/.claude/agents/refactor.md
@@ -0,0 +1,55 @@
+---
+name: refactor
+description: Analyzes files and modules for size, mixed responsibilities, and separation of concerns violations. Use when a file feels too large, a function is doing too much, or a module owns more than one distinct concern. Invoke with a specific file, directory, or line threshold.
+---
+
+You are a refactor reviewer for the `thunk` codebase. Your job is to identify files and functions that should be split — not for line count alone, but because they own more than one distinct responsibility or mix concerns that belong in separate layers.
+
+## What you analyze
+
+**File size**
+- Any `.rs` file over 1000 lines is a candidate for review
+- Flag files that are growing across phases — size trend matters more than absolute count
+- `src/runtime/orchestration/tool_round.rs` and `src/runtime/orchestration/engine.rs` are known large files — analyze carefully before flagging
+
+**Function size**
+- Any function over 100 lines likely owns more than one responsibility
+- Flag functions that mix policy decisions with execution, or parsing with dispatch
+
+**Separation of concerns**
+- Policy mixed with execution in the same function
+- Parsing logic outside `tool_codec/`
+- Orchestration logic inside `tools/`
+- Multiple unrelated responsibilities in the same module
+
+**Layering violations**
+- Read `.claude/dev/module-map.md` before analyzing — ownership boundaries are defined there
+- Flag any split that would require a lower layer to import from a higher layer
+- Flag any proposed split that creates circular dependencies
+
+## How to review
+
+1. Read `.claude/rules/invariants.md` and `.claude/dev/module-map.md` first
+2. If a specific file was given, analyze that file only
+3. Otherwise run: `find src -name "*.rs" | xargs wc -l | sort -rn | head -20`
+4. For each candidate file:
+   - List the distinct responsibilities it owns
+   - Identify functions over 100 lines
+   - Flag mixed concerns
+5. For each proposed split:
+   - Name the new module and what moves there
+   - Identify all cross-module import changes required
+   - Estimate risk: low / medium / high
+   - Flag if the split touches public APIs
+6. Prioritize by risk — highest impact splits first
+
+## What you do not flag
+- Line count alone without mixed responsibilities
+- Style or formatting issues
+- Performance concerns
+- Incomplete implementations
+- Known architectural exceptions documented in `.claude/rules/invariants.md`
+- The known `core/error.rs` → `tools/` ToolError import
+
+## Output format
+For each file: state the file, its line count, the distinct responsibilities it owns, and whether a split is warranted. For each proposed split: state what moves where, the risk level, and what changes are required. If nothing warrants splitting, say so explicitly.
diff --git a/.claude/commands/refactor.md b/.claude/commands/refactor.md
@@ -0,0 +1,33 @@
+# /refactor
+
+Analyze the codebase for files and functions that should be split for
+modularity, separation of concerns, and maintainability.
+
+## Usage
+- `/refactor` — scan all source files, report anything over threshold
+- `/refactor src/runtime/orchestration/tool_round.rs` — analyze specific file
+- `/refactor 300` — use custom line threshold instead of default 500
+
+## Steps
+
+1. Read `.claude/rules/invariants.md` and `.claude/dev/module-map.md` first
+2. If a specific file was given, analyze that file only
+3. Otherwise, find all `.rs` files over the line threshold (set `t` to the custom threshold if one was given):
+   `find src -name "*.rs" | xargs wc -l | awk -v t=500 '$1 > t && $2 != "total"' | sort -rn`
+4. For each file over threshold:
+   - List distinct responsibilities it owns
+   - Identify functions over 100 lines
+   - Flag any separation of concerns violations
+   - Flag any layering violations per module-map.md
+5. For each candidate split:
+   - Propose new module name and what moves there
+   - Estimate risk: low / medium / high
+   - Note any cross-module import changes required
+6. Output a prioritized list — highest risk files first
+
+## Constraints
+- Never suggest splitting for line count alone — only when distinct
+  responsibilities exist
+- Never propose changes that violate `.claude/rules/invariants.md`
+- Flag any split that touches public APIs or cross-module imports
+- Do not modify any files — analysis only unless explicitly asked
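Review note on the `/refactor` usage forms: the three invocations (no argument, a file path, a bare number) imply a small dispatch rule, sketched here. This is an assumption-laden illustration rather than how the command is actually interpreted. `RefactorTarget`, `parse_refactor_arg()`, and `over_threshold()` are hypothetical names; only the default of 500 and the three usage forms come from the diff.

```rust
use std::path::PathBuf;

/// Default line threshold named in the usage text.
const DEFAULT_THRESHOLD: usize = 500;

/// What a `/refactor` invocation asks for (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum RefactorTarget {
    /// Scan all source files and report those over the threshold.
    Scan { threshold: usize },
    /// Analyze one specific file only.
    File(PathBuf),
}

/// Interprets the text after `/refactor`: empty means a default scan,
/// a bare number is a custom threshold, anything else is a file path.
pub fn parse_refactor_arg(arg: &str) -> RefactorTarget {
    let arg = arg.trim();
    if arg.is_empty() {
        return RefactorTarget::Scan {
            threshold: DEFAULT_THRESHOLD,
        };
    }
    match arg.parse::<usize>() {
        Ok(threshold) => RefactorTarget::Scan { threshold },
        Err(_) => RefactorTarget::File(PathBuf::from(arg)),
    }
}

/// Keeps files strictly over the threshold, largest first.
pub fn over_threshold(mut files: Vec<(String, usize)>, threshold: usize) -> Vec<(String, usize)> {
    files.retain(|(_, lines)| *lines > threshold);
    files.sort_by(|a, b| b.1.cmp(&a.1));
    files
}
```

One consequence of this rule is that a file whose name is purely numeric could not be passed as a path; whether that matters depends on how the command is really parsed.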
diff --git a/.claude/commands/sync-claude.md b/.claude/commands/sync-claude.md
@@ -0,0 +1,45 @@
+# /sync-claude
+
+Audit the current state of `.claude/` and `CLAUDE.md` against the actual codebase and update anything stale. This command keeps the AI development environment in sync with reality.
+
+## What to check and update
+
+**1. Test baseline in `CLAUDE.md`**
+Run `cargo test --no-default-features 2>&1 | grep "^test result"` and update the test count in CLAUDE.md if it has changed.
+
+**2. Invariant locations in `.claude/rules/invariants.md`**
+Verify these line number references are still accurate:
+- `is_permitted_shell_command()` in `src/runtime/investigation/prompt_analysis.rs`
+- `execute_approved()` in `src/tools/registry.rs`
+- `evidence_ready()` in `src/runtime/investigation/investigation.rs`
+- `tool_allowed_for_surface()` in `src/runtime/investigation/tool_surface.rs`
+Update any stale line references.
+
+**3. Layer boundaries in `.claude/rules/architecture.md`**
+Check if the known `core/ → tools/` violation still exists:
+`grep -n "ToolError" src/core/error.rs`
+If it's been fixed, remove the "Known Exception" section. If new violations exist, document them.
+
+**4. Test command accuracy**
+Verify `just verify` still runs `cargo test --no-default-features`:
+`grep "test" justfile`
+Update CLAUDE.md or slice-discipline.md if the command has changed.
+
+**5. New tools or surfaces**
+Check if new tools have been added since last sync:
+`ls src/tools/`
+If new tools exist that aren't documented in `rules/invariants.md` (under Surface Enforcement), add them.
+
+**6. Key files table in `CLAUDE.md`**
+Verify all referenced files still exist at the listed paths:
+`find src -name "*.rs" | grep -E "registry|prompt_analysis|tool_surface|investigation|prompt|engine|tool_round"`
+Update any moved or renamed files.
+
+**7. Phase references**
+Check the current phase from recent git log:
+`git log --oneline -5`
+If CLAUDE.md or any rules file references a stale phase number, update it.
+
+## After auditing
+Report what was checked, what was stale, and what was updated. Do not touch any Rust source files. Apart from the test-baseline count in step 1, do not run `cargo test`; use the grep/find commands above for verification.
+
diff --git a/.claude/dev/core-loop.md b/.claude/dev/core-loop.md
@@ -0,0 +1,30 @@
+# Core Loop
+
+## System Mental Model
+
+- The runtime is the state machine. It owns request handling, turn classification, tool dispatch, approval suspension, answer admission, deterministic terminal answers, anchor state, project snapshot caching, and conversation trimming. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/conversation.rs`, `src/runtime/types.rs`.
+- The backend does not execute tools or decide whether a response is valid. `ModelBackend::generate()` only receives a `GenerateRequest` and emits `BackendEvent`s; the runtime parses the returned text, discards invalid protocol, and decides whether to keep or replace the assistant output. Code: `src/llm/backend.rs`, `src/runtime/orchestration/generation.rs`, `src/runtime/protocol/tool_codec/`, `src/runtime/orchestration/engine.rs`.
+- The runtime injects turn-local policy before every generation. `run_generate_turn()` appends a system message naming the active `ToolSurface`, and may append a bounded project snapshot hint. These hints are request-local and are not persisted in `Conversation`. Code: `src/runtime/orchestration/generation.rs`, `src/runtime/investigation/tool_surface.rs`, `src/runtime/project/project_snapshot.rs`, `src/runtime/protocol/prompt.rs`.
+- The runtime, not the backend, chooses when tools are available. `select_tool_surface()` selects one of `RetrievalFirst`, `GitReadOnly`, `AnswerOnly`, or `MutationEnabled`. `tool_allowed_for_surface()` enforces surface membership before dispatch. Code: `src/runtime/investigation/tool_surface.rs`.
+- The runtime guarantees project confinement. All tool inputs are converted from raw `ToolInput` into `ResolvedToolInput` before dispatch; read, list, and search scopes must stay inside `ProjectRoot`; mutation targets also reject symlink parents and symlink targets. On Windows, `ProjectRoot::new()` strips the `\\?\` extended-length path prefix after `fs::canonicalize`. Code: `src/runtime/project/resolved_input.rs`, `src/runtime/project/resolver.rs`, `src/runtime/project/project_root.rs`.
+- The runtime guarantees that mutations do not execute during the proposal phase. `edit_file`, `write_file`, and `shell` return `ToolRunResult::Approval(PendingAction)` from `run()`, and only `execute_approved()` performs the actual action. Code: `src/tools/mod.rs`, `src/tools/types.rs`, `src/tools/edit_file.rs`, `src/tools/write_file.rs`, `src/tools/shell.rs`.
+- The runtime guarantees that investigation answers are grounded in read evidence, not search text alone. Search-only answers, unread file citations, out-of-scope citations, repeated tool drift after evidence, and repeated malformed protocol all terminate through runtime-owned branches. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/orchestration/tool_round.rs`, `src/runtime/investigation/investigation.rs`, `src/runtime/protocol/response_text.rs`.
+- The runtime guarantees bounded context growth. Tool results are capped through `cap_tool_result_blocks()` (driven by `ContextPolicy` derived from `BackendCapabilities.context_window_tokens`), old tool exchanges are live-trimmed without removing conversational messages, context usage is estimated, `/context stats` reports live usage, `/compact` prunes stale tool results, a warning fires at 75%, and auto-prune runs at 90%. Summarization is deferred. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/orchestration/context_cap.rs`, `src/runtime/orchestration/context_policy.rs`, `src/runtime/orchestration/command_handlers.rs`, `src/runtime/conversation.rs`.
+
+## Core Runtime Loop
+
+- `Runtime::handle()` is the single request entrypoint. It dispatches `Submit`, `Reset`, `Approve`, `Reject`, `QueryLast`, `QueryAnchors`, `QueryHistory`, `ReadFile`, `SearchCode`, `Undo`, `ProvidersList`, `ProvidersUse`, `GitBranch`, `GitStatus`, `GitDiff`, `GitLog`, `ListDir`, `LspStatus`, `IndexBuild`, `IndexStatus`, `ContextStats`, and `Compact` requests. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/types.rs`.
+- Slash-command requests (`GitBranch`, `GitStatus`, `GitDiff`, `GitLog`, `ReadFile`, `SearchCode`, `ListDir`) are dispatched through the `CommandTool` allowlist in `command_handlers.rs`. Mutating tools are excluded from this allowlist by construction. Code: `src/runtime/orchestration/command_handlers.rs`.
+- `handle_submit()` rejects empty prompts and new submits while a `PendingAction` exists. It also special-cases exact anchor prompts and routes them into `run_last_read_file_anchor()` or `run_last_search_anchor()` instead of the normal turn loop. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/orchestration/anchor_resolution.rs`, `src/runtime/investigation/anchors.rs`.
+- A normal submit enters `run_turns_with_initial_reads()`. That function computes turn state once from the original user prompt: retrieval intent, direct-read mode, whether investigation is required, whether mutation is allowed, the `ToolSurface`, the `InvestigationMode`, and an optional prompt-derived path scope. State is collected into `TurnContext` and `TurnState`. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/orchestration/turn_state.rs`, `src/runtime/investigation/prompt_analysis.rs`, `src/runtime/investigation/investigation.rs`, `src/runtime/investigation/tool_surface.rs`.
+- Before any backend generation, the runtime may seed the first tool call itself. This happens for narrow natural-language edits (`requested_simple_edit()`), direct reads, directory listings, and permitted shell commands. The seeded call is stored as `PendingRuntimeCall { seeded_pre_generation: true }`, so the first tool can run with no backend round. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/orchestration/turn_state.rs`, `src/runtime/investigation/prompt_analysis.rs`.
+- Each loop iteration chooses an `effective_surface`. If `answer_phase` (`AnswerPhaseKind::PostRead` or `InvestigationEvidenceReady`) is active, `effective_surface` is forced to `AnswerOnly`; otherwise it uses the prompt-selected surface. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/orchestration/turn_state.rs`, `src/runtime/investigation/tool_surface.rs`.
+- `run_generate_turn()` builds the request from `Conversation::snapshot()`, appends the surface hint, optionally appends the project snapshot hint, sends the request to the backend, buffers streamed text, and only writes the assistant reply into `Conversation` after a complete response is available. Code: `src/runtime/orchestration/generation.rs`.
+- After generation, the runtime parses the assistant text with `tool_codec::parse_all_tool_inputs()`. If no tool calls are parsed, the runtime either admits the answer or replaces it through guard branches. Code: `src/runtime/protocol/tool_codec/tool_parser.rs`, `src/runtime/orchestration/engine.rs`.
+- If tool calls are present, the runtime increments `tool_rounds` unless the call was seeded before generation. The round limit is `MAX_TOOL_ROUNDS = 10`; hitting it emits `AnswerSource::ToolLimitReached`. Code: `src/runtime/orchestration/engine.rs`.
+- Tool execution is delegated to `run_tool_round()`, which returns one of four outcomes. `Completed` means all calls finished immediately. `ApprovalRequired` means the turn pauses with a `PendingAction`. `RuntimeDispatch` means the runtime selected the next tool call itself. `TerminalAnswer` means the runtime has enough information to end the turn without another backend round. Code: `src/runtime/orchestration/tool_round.rs`.
+- The search-to-read transition can happen in three ways: the backend emits `[read_file: ...]` after a search result, `run_tool_round()` returns `RuntimeDispatch` to the preferred candidate after search, or a direct-read request is seeded before any generation. Code: `src/runtime/orchestration/tool_round.rs`, `src/runtime/orchestration/engine.rs`.
+- The read-to-answer transition is runtime-owned. After a completed tool round, the runtime sets `answer_phase = InvestigationEvidenceReady` when `investigation.evidence_ready()` becomes true, or `answer_phase = PostRead` for non-investigation read flows. The next generation then runs under `AnswerOnly`. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/investigation/investigation.rs`.
+- Raw direct reads are a separate terminal path. If a seeded direct read completes in `DirectReadMode::Raw`, the runtime strips the tool-result wrapper with `direct_read_fallback_answer()` and finishes immediately. No synthesis generation is performed. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/protocol/response_text.rs`, `src/runtime/tests/finalization.rs`.
+- Approved mutation success does not re-enter the backend. `handle_approve()` executes the approved tool, commits the tool result, invalidates the project snapshot cache, trims context, and finishes with `mutation_complete_final_answer()`. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/protocol/response_text.rs`.
+- Provider switching is session-only. `ProvidersList` and `ProvidersUse` requests list or swap the active `ModelBackend` without persisting the change. Code: `src/runtime/orchestration/engine.rs`, `src/runtime/types.rs`.
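Review note on surface gating in `core-loop.md`: the rules that an active answer phase forces `AnswerOnly` and that surface membership is enforced before dispatch can be sketched as below. This is an illustration, not the code in `tool_surface.rs`. The surface and answer-phase names come from the diff. The `Tool` enum, both function signatures, and in particular the membership table are assumptions, since the diff does not say which tools belong to which surface.

```rust
/// The four surfaces named in core-loop.md.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolSurface {
    RetrievalFirst,
    GitReadOnly,
    AnswerOnly,
    MutationEnabled,
}

/// The answer phases named in core-loop.md.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AnswerPhaseKind {
    PostRead,
    InvestigationEvidenceReady,
}

/// Hypothetical tool identifiers; the runtime's real tool type is not shown in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tool {
    ReadFile,
    SearchCode,
    ListDir,
    GitStatus,
    EditFile,
    WriteFile,
    Shell,
}

/// An active answer phase forces `AnswerOnly`; otherwise the
/// prompt-selected surface is used unchanged.
pub fn effective_surface(
    selected: ToolSurface,
    answer_phase: Option<AnswerPhaseKind>,
) -> ToolSurface {
    match answer_phase {
        Some(_) => ToolSurface::AnswerOnly,
        None => selected,
    }
}

/// Surface membership check run before dispatch.
/// ASSUMPTION: the membership table below is illustrative only.
pub fn tool_allowed_for_surface(tool: Tool, surface: ToolSurface) -> bool {
    match surface {
        ToolSurface::AnswerOnly => false,
        ToolSurface::GitReadOnly => matches!(tool, Tool::GitStatus),
        ToolSurface::RetrievalFirst => {
            matches!(tool, Tool::ReadFile | Tool::SearchCode | Tool::ListDir)
        }
        ToolSurface::MutationEnabled => matches!(
            tool,
            Tool::ReadFile
                | Tool::SearchCode
                | Tool::ListDir
                | Tool::EditFile
                | Tool::WriteFile
                | Tool::Shell
        ),
    }
}
```

Keeping both decisions in pure functions of runtime state is what lets the runtime, rather than the model, decide when tools are available: a tool call parsed from model output is checked against the effective surface and can be refused without consulting the backend again.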