feat(dispatch): enqueue DELETE_BRANCH compensation when a saga plants its loop branch (#433)#437
Conversation
… its loop branch (#433) Every branch a worker creates must be registered for reclamation at the moment it is created, so no failed worker can leak a `loop/<n>` branch the control plane doesn't know to delete. At dispatch, the saga seeded for a worker now carries a DELETE_BRANCH compensation (target = the exact branch name planted for that worker) alongside the pre-existing REMOVE_WORKTREE one. Both seed paths — the shared-store `create` and the `task_store is None` fallback — build the tuple through a single new `_worker_compensations` helper so they can never drift apart. The recovery *handler* for DELETE_BRANCH is a deferred sibling issue in the same epic. Until it lands, a stale saga carrying this kind is quarantined by recovery (parked for a human) rather than falsely marked COMPENSATED — recovery already does this for any unhandled kind. To keep the #360 exhaustiveness contract honest, DELETE_BRANCH is registered in a new `_DEFERRED_COMPENSATION_KINDS` set: every CompensationKind must now be classified as either handled or explicitly deferred, so a genuinely forgotten kind still turns the contract red. Tests: - _worker_compensations carries both kinds; DELETE_BRANCH target is the branch (never the worktree path) and worktree is reaped before branch. - adversarial: empty branch/worktree strings don't crash or merge entries. - both seed paths (_seed_worker_saga + record_worker_task_policy fallback) register DELETE_BRANCH with the exact branch. - integration (primary acceptance): a saga created by _dispatch_one_worker for issue #n carries a DELETE_BRANCH compensation whose target equals the exact planted branch (`loop/7-do-the-thing`). - recovery exhaustiveness contract updated: handled ∪ deferred == all kinds, and the two sets are disjoint. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
hadamrd
left a comment
There was a problem hiding this comment.
Minimal path to green (must-fix to merge)
- Fix the recovery regression in src/forge_loop/control/recovery.py: enqueuing DELETE_BRANCH on every worker saga now makes reconcile_stale_sagas (lines 119-135) treat every dead-worker saga as 'unhandled' → QUARANTINED with the worktree reap skipped, regressing the REMOVE_WORKTREE reclamation that used to run. Make recovery distinguish a DEFERRED kind (run the handled REMOVE_WORKTREE reap now, defer only the branch) from a truly-unhandled kind — i.e. actually consult _DEFERRED_COMPENSATION_KINDS — so dead-worker worktrees are still reaped before the branch handler lands.
- Add a recovery test that a stale dispatch-shaped saga carrying BOTH REMOVE_WORKTREE and DELETE_BRANCH still reaps its worktree on recovery, pinning the corrected interim behavior (current tests only exercise manually-seeded REMOVE_WORKTREE-only sagas, so the production dispatch tuple is uncovered).
Optional follow-ups (do NOT block merge)
- [sev3/correctness] src/forge_loop/control/recovery.py:137 — The reap loop iterates every compensation and calls reap_worktree(saga.issue) per entry while appending compensation.target to
reapedregardless of kind. Once DELETE_BRANCH can reach this loop it will double-fire the reaper and record a branch name as a 'worktree reaped'. Filter by kind == REMOVE_WORKTREE. - [sev3/docs] src/forge_loop/control/recovery.py:30 — Comment references '#272 adds' / 'delete-branch kinds #272 adds' but this work is #433/#431; reconcile the issue reference.
hadamrd
left a comment
There was a problem hiding this comment.
src/forge_loop/control/recovery.py:120 — [sev1/correctness] Enqueuing DELETE_BRANCH on every worker saga breaks existing worktree recovery. reconcile_stale_sagas computes unhandled as any kind not in _HANDLED_COMPENSATION_KINDS; DELETE_BRANCH is deferred (not handled), so every dead-worker saga now short-circuits to mark_quarantined and continue, skipping the REMOVE_WORKTREE reap (the code explicitly 'refuses to half-compensate'). Pre-PR these sagas carried only REMOVE_WORKTREE and were reaped+COMPENSATED. test_recovery.py:161-190 already proves the regression (REMOVE_WORKTREE + unhandled kind => reaped==[], QUARANTINED). Net effect in normal operation: every abandoned worker's worktree leaks and is parked for a human instead of auto-reaped. recovery must treat DEFERRED kinds distinctly (run handled comps, defer only the branch).
hadamrd
left a comment
There was a problem hiding this comment.
Critic findings:
- [sev2/tests] tests/test_recovery.py — No recovery test exercises the new dispatch-shaped compensation tuple (REMOVE_WORKTREE + DELETE_BRANCH). The existing reap/compensate tests seed sagas manually with REMOVE_WORKTREE only, so they stay green while production dispatch sagas now behave differently. Add a test asserting a stale saga with both kinds still reaps the worktree, which will both pin the corrected behavior and have caught the sev1 above.
…es still reap (#433) Enqueuing DELETE_BRANCH on every worker saga regressed boot/tick recovery: reconcile_stale_sagas computed `unhandled` against `_HANDLED_COMPENSATION_KINDS` only, so the deferred DELETE_BRANCH made every dead-worker saga short-circuit to QUARANTINED — skipping the REMOVE_WORKTREE reap that used to run. Every abandoned worker's worktree leaked and was parked for a human instead of auto-reaped. Recovery now consults `_DEFERRED_COMPENSATION_KINDS`: only a kind that is in NEITHER the handled nor the deferred set taints a saga into quarantine. A saga carrying handled + deferred kinds still runs its handled compensations and is driven COMPENSATED; the deferred branch deletion is knowingly skipped (never falsely claimed undone) until its reclamation handler lands. The reap loop now filters to REMOVE_WORKTREE so a non-worktree entry can't double-fire the reaper or record its target as a reaped worktree. Tests: - New regression guard: a dispatch-shaped saga (REMOVE_WORKTREE + DELETE_BRANCH) still reaps its worktree exactly once and reaches COMPENSATED, recording only the worktree path. - Existing quarantine tests re-pointed off `delete-branch` (now a registered deferred kind) onto genuinely-unhandled kinds, preserving their intent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes #433
Parent: #431 — epic "Compensate the branch a failed worker abandons".
What
At dispatch, the task saga seeded for a worker now carries a
DELETE_BRANCHcompensation alongside the pre-existing
REMOVE_WORKTREEone. Itstargetisthe exact branch name planted for that worker (
loop/<n>-<slug>), so afailed worker can never leak a branch the control plane doesn't know to delete.
Acceptance criteria
Covered by
test_dispatch_one_worker_saga_carries_delete_branch_for_exact_branch(integration through the real
_dispatch_one_workerpath), which asserts theseeded saga carries a
DELETE_BRANCHcompensation whosetarget ==_branch_for_issue(issue)(loop/7-do-the-thing).Changes
tasks/saga.py: addCompensationKind.DELETE_BRANCH.runner/dispatch.py: both seed paths (_seed_worker_sagavia storecreate,and the
record_worker_task_policytask_store is Nonefallback) build thecompensation tuple through one new
_worker_compensations(worktree_path, branch)helper — REMOVE_WORKTREE first (reap worktree), then DELETE_BRANCH (reclaim the
branch). Single source so the two paths can never drift.
control/recovery.py: registerDELETE_BRANCHin a new_DEFERRED_COMPENSATION_KINDSset. Its recovery handler is a deferred siblingissue; until then a stale saga carrying it is quarantined (parked for a
human), never falsely marked COMPENSATED — recovery already does this for any
unhandled kind. The Recovery drives sagas with an unhandled CompensationKind to a terminal state, enforced by an exhaustiveness test #360 exhaustiveness contract now requires every
CompensationKindto be classified as handled or explicitly deferred, so agenuinely forgotten kind still turns the contract red.
Tests (all pass:
PYTHONPATH=src python -m pytest tests/test_dispatch_branch_compensation.py tests/test_recovery.py→ 22 passed)_worker_compensationscarries both kinds; DELETE_BRANCH target is the branch,never the worktree path (the exact falsifiable bug guarded against), and the
worktree is ordered before the branch.
handled ∪ deferred == set(CompensationKind)and the two sets are disjoint.
Gates
ruff check src tests→ clean.pyright src/forge_loop→ only 2 pre-existing errors (dispatch.py:1002pr_changed_filesrepo arg;tick_checks.py:173GhClientLike protocol). Bothreproduce on the base commit with this diff reverted; this change introduces
zero new type errors and touches neither line.
PYTHONPATH=srcbecause the operator's editableinstall shadows the worktree
src/; the runner's canonical worker env resolvesthe worktree source directly.
Out of scope (follow-up sub-tickets for the epic)
DELETE_BRANCHfrom_DEFERRED_→_HANDLED_COMPENSATION_KINDS).