Skip to content

feat(dispatch): enqueue DELETE_BRANCH compensation when a saga plants its loop branch (#433)#437

Merged
hadamrd merged 2 commits into
trunkfrom
loop/433-enqueue-a-delete-branch-compensation-whe
Jun 9, 2026
Merged

feat(dispatch): enqueue DELETE_BRANCH compensation when a saga plants its loop branch (#433)#437
hadamrd merged 2 commits into
trunkfrom
loop/433-enqueue-a-delete-branch-compensation-whe

Conversation

@hadamrd

@hadamrd hadamrd commented Jun 8, 2026

Copy link
Copy Markdown
Owner

Fixes #433
Parent: #431 — epic "Compensate the branch a failed worker abandons".

What

At dispatch, the task saga seeded for a worker now carries a DELETE_BRANCH
compensation alongside the pre-existing REMOVE_WORKTREE one. Its target is
the exact branch name planted for that worker (loop/<n>-<slug>), so a
failed worker can never leak a branch the control plane doesn't know to delete.

Acceptance criteria

A saga created by the dispatch path for issue #n carries a DELETE_BRANCH
compensation whose target equals the exact branch name planted for that worker.

Covered by test_dispatch_one_worker_saga_carries_delete_branch_for_exact_branch
(integration through the real _dispatch_one_worker path), which asserts the
seeded saga carries a DELETE_BRANCH compensation whose target ==
_branch_for_issue(issue) (loop/7-do-the-thing).

Changes

  • tasks/saga.py: add CompensationKind.DELETE_BRANCH.
  • runner/dispatch.py: both seed paths (_seed_worker_saga via store create,
    and the record_worker_task_policy task_store is None fallback) build the
    compensation tuple through one new _worker_compensations(worktree_path, branch)
    helper — REMOVE_WORKTREE first (reap worktree), then DELETE_BRANCH (reclaim the
    branch). Single source so the two paths can never drift.
  • control/recovery.py: register DELETE_BRANCH in a new
    _DEFERRED_COMPENSATION_KINDS set. Its recovery handler is a deferred sibling
    issue; until then a stale saga carrying it is quarantined (parked for a
    human), never falsely marked COMPENSATED — recovery already does this for any
    unhandled kind. The Recovery drives sagas with an unhandled CompensationKind to a terminal state, enforced by an exhaustiveness test #360 exhaustiveness contract now requires every
    CompensationKind to be classified as handled or explicitly deferred, so a
    genuinely forgotten kind still turns the contract red.

Tests (all pass: PYTHONPATH=src python -m pytest tests/test_dispatch_branch_compensation.py tests/test_recovery.py → 22 passed)

  • _worker_compensations carries both kinds; DELETE_BRANCH target is the branch,
    never the worktree path
    (the exact falsifiable bug guarded against), and the
    worktree is ordered before the branch.
  • Adversarial: empty branch/worktree strings don't crash or merge the two entries.
  • Both seed paths register DELETE_BRANCH with the exact branch.
  • Integration / primary acceptance (above).
  • Recovery exhaustiveness contract: handled ∪ deferred == set(CompensationKind)
    and the two sets are disjoint.

Gates

  • ruff check src tests → clean.
  • pyright src/forge_loop → only 2 pre-existing errors (dispatch.py:1002
    pr_changed_files repo arg; tick_checks.py:173 GhClientLike protocol). Both
    reproduce on the base commit with this diff reverted; this change introduces
    zero new type errors
    and touches neither line.
  • Tests were validated with PYTHONPATH=src because the operator's editable
    install shadows the worktree src/; the runner's canonical worker env resolves
    the worktree source directly.

Out of scope (follow-up sub-tickets for the epic)

  • The recovery handler that actually deletes the branch on a stale saga (moves
    DELETE_BRANCH from _DEFERRED__HANDLED_COMPENSATION_KINDS).

… its loop branch (#433)

Every branch a worker creates must be registered for reclamation at the
moment it is created, so no failed worker can leak a `loop/<n>` branch the
control plane doesn't know to delete.

At dispatch, the saga seeded for a worker now carries a DELETE_BRANCH
compensation (target = the exact branch name planted for that worker)
alongside the pre-existing REMOVE_WORKTREE one. Both seed paths — the
shared-store `create` and the `task_store is None` fallback — build the
tuple through a single new `_worker_compensations` helper so they can
never drift apart.

The recovery *handler* for DELETE_BRANCH is a deferred sibling issue in
the same epic. Until it lands, a stale saga carrying this kind is
quarantined by recovery (parked for a human) rather than falsely marked
COMPENSATED — recovery already does this for any unhandled kind. To keep
the #360 exhaustiveness contract honest, DELETE_BRANCH is registered in a
new `_DEFERRED_COMPENSATION_KINDS` set: every CompensationKind must now be
classified as either handled or explicitly deferred, so a genuinely
forgotten kind still turns the contract red.

Tests:
- _worker_compensations carries both kinds; DELETE_BRANCH target is the
  branch (never the worktree path) and worktree is reaped before branch.
- adversarial: empty branch/worktree strings don't crash or merge entries.
- both seed paths (_seed_worker_saga + record_worker_task_policy fallback)
  register DELETE_BRANCH with the exact branch.
- integration (primary acceptance): a saga created by _dispatch_one_worker
  for issue #n carries a DELETE_BRANCH compensation whose target equals the
  exact planted branch (`loop/7-do-the-thing`).
- recovery exhaustiveness contract updated: handled ∪ deferred == all kinds,
  and the two sets are disjoint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@hadamrd hadamrd left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minimal path to green (must-fix to merge)

  1. Fix the recovery regression in src/forge_loop/control/recovery.py: enqueuing DELETE_BRANCH on every worker saga now makes reconcile_stale_sagas (lines 119-135) treat every dead-worker saga as 'unhandled' → QUARANTINED with the worktree reap skipped, regressing the REMOVE_WORKTREE reclamation that used to run. Make recovery distinguish a DEFERRED kind (run the handled REMOVE_WORKTREE reap now, defer only the branch) from a truly-unhandled kind — i.e. actually consult _DEFERRED_COMPENSATION_KINDS — so dead-worker worktrees are still reaped before the branch handler lands.
  2. Add a recovery test that a stale dispatch-shaped saga carrying BOTH REMOVE_WORKTREE and DELETE_BRANCH still reaps its worktree on recovery, pinning the corrected interim behavior (current tests only exercise manually-seeded REMOVE_WORKTREE-only sagas, so the production dispatch tuple is uncovered).

Optional follow-ups (do NOT block merge)

  • [sev3/correctness] src/forge_loop/control/recovery.py:137 — The reap loop iterates every compensation and calls reap_worktree(saga.issue) per entry while appending compensation.target to reaped regardless of kind. Once DELETE_BRANCH can reach this loop it will double-fire the reaper and record a branch name as a 'worktree reaped'. Filter by kind == REMOVE_WORKTREE.
  • [sev3/docs] src/forge_loop/control/recovery.py:30 — Comment references '#272 adds' / 'delete-branch kinds #272 adds' but this work is #433/#431; reconcile the issue reference.

@hadamrd hadamrd added the critic:blocking Critic found blocking issues label Jun 8, 2026

@hadamrd hadamrd left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/forge_loop/control/recovery.py:120[sev1/correctness] Enqueuing DELETE_BRANCH on every worker saga breaks existing worktree recovery. reconcile_stale_sagas computes unhandled as any kind not in _HANDLED_COMPENSATION_KINDS; DELETE_BRANCH is deferred (not handled), so every dead-worker saga now short-circuits to mark_quarantined and continue, skipping the REMOVE_WORKTREE reap (the code explicitly 'refuses to half-compensate'). Pre-PR these sagas carried only REMOVE_WORKTREE and were reaped+COMPENSATED. test_recovery.py:161-190 already proves the regression (REMOVE_WORKTREE + unhandled kind => reaped==[], QUARANTINED). Net effect in normal operation: every abandoned worker's worktree leaks and is parked for a human instead of auto-reaped. recovery must treat DEFERRED kinds distinctly (run handled comps, defer only the branch).

@hadamrd hadamrd left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critic findings:

  • [sev2/tests] tests/test_recovery.py — No recovery test exercises the new dispatch-shaped compensation tuple (REMOVE_WORKTREE + DELETE_BRANCH). The existing reap/compensate tests seed sagas manually with REMOVE_WORKTREE only, so they stay green while production dispatch sagas now behave differently. Add a test asserting a stale saga with both kinds still reaps the worktree, which will both pin the corrected behavior and have caught the sev1 above.

…es still reap (#433)

Enqueuing DELETE_BRANCH on every worker saga regressed boot/tick recovery:
reconcile_stale_sagas computed `unhandled` against `_HANDLED_COMPENSATION_KINDS`
only, so the deferred DELETE_BRANCH made every dead-worker saga short-circuit to
QUARANTINED — skipping the REMOVE_WORKTREE reap that used to run. Every abandoned
worker's worktree leaked and was parked for a human instead of auto-reaped.

Recovery now consults `_DEFERRED_COMPENSATION_KINDS`: only a kind that is in
NEITHER the handled nor the deferred set taints a saga into quarantine. A saga
carrying handled + deferred kinds still runs its handled compensations and is
driven COMPENSATED; the deferred branch deletion is knowingly skipped (never
falsely claimed undone) until its reclamation handler lands. The reap loop now
filters to REMOVE_WORKTREE so a non-worktree entry can't double-fire the reaper
or record its target as a reaped worktree.

Tests:
- New regression guard: a dispatch-shaped saga (REMOVE_WORKTREE + DELETE_BRANCH)
  still reaps its worktree exactly once and reaches COMPENSATED, recording only
  the worktree path.
- Existing quarantine tests re-pointed off `delete-branch` (now a registered
  deferred kind) onto genuinely-unhandled kinds, preserving their intent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@hadamrd hadamrd merged commit c082994 into trunk Jun 9, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

critic:blocking Critic found blocking issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enqueue a DELETE_BRANCH compensation when a saga plants its loop branch

1 participant