Expire stale epics past a TTL with an audit close #435

Description

@hadamrd

Problem

epic_sweep.sweep (src/forge_loop/epic_sweep.py) only closes an epic when it has ≥1 tracked sub-issue and every sub-issue is CLOSED. Two populations are therefore immortal:

  1. An epic that never gets broken into sub-issues (skipped_no_subs).
  2. An epic whose sub-issues stall open indefinitely (skipped_open_subs — though this one legitimately has open work).

Population (1) is the leak: those epics land in skipped_no_subs every pass and stay open forever. The LLM groomer is explicitly told to SKIP epic-labelled issues (runner/tick.py DEFAULT_BRIEF STEP 2), so nothing reaps them. Result: the open-epic count climbs monotonically and the operational-entropy open_epics metric (operational_entropy.snapshot, surfaced via control/status._backlog_entropy) never converges.

Concrete example: epic #390 is opened on 2026-01-01, never decomposed into sub-issues. Today is 2026-06-08. With epic_ttl_days=90 it is 158 days old with zero open sub-issues — it should have been auto-closed ~68 days ago with an expiry audit comment, but sweep leaves it open indefinitely under skipped_no_subs.
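
The arithmetic in the example above can be checked with a few lines of standard-library Python. This is only a worked calculation under the assumption that age is counted in whole UTC days; the variable names are illustrative and not taken from the codebase.

```python
from datetime import datetime, timezone

# Values taken from the concrete example above.
created_at = datetime(2026, 1, 1, tzinfo=timezone.utc)
now = datetime(2026, 6, 8, tzinfo=timezone.utc)
epic_ttl_days = 90

# timedelta.days gives whole days, which matches "exceeds epic_ttl_days whole days".
age_days = (now - created_at).days
overdue_days = age_days - epic_ttl_days

print(f"open {age_days} days, TTL {epic_ttl_days}, overdue by {overdue_days}")
```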

Acceptance criteria

  • A new TTL pass closes an epic iff: it is state=open, it has zero open sub-issues, AND its age (now − created_at) exceeds epic_ttl_days whole days. The close posts an expiry audit comment that is textually distinct from the all-subs-resolved comment (it must name the TTL/age, e.g. "open 158 days, TTL 90").
  • Fail-safe (load-bearing): an epic with ≥1 open sub-issue is NEVER expired, regardless of age — it stays under skipped_open_subs. This dominates the TTL check.
  • An epic within the TTL is left open (reported as skipped_no_subs / skipped_open_subs, not closed).
  • The existing all-subs-resolved close path is unchanged: an epic whose subs are all CLOSED still closes via build_close_comment with reason="completed", NOT via the TTL path, even if also old.
  • An epic whose created_at is None (timestamp unknown) is NEVER expired (cannot prove age → fail safe), and the sweep does not raise.
  • TTL is configurable via epic_ttl_days (Maintenance settings + frozen Config); a value <= 0 disables the TTL pass entirely (pure no-op), preserving pre-change behaviour.
  • The sweep stays pure/deterministic: now is injected (param with sensible default), never read from the wall clock inside the pure function — so tests are clock-free. Every GhClient error is still caught; the sweep never raises and the tick survives.
  • run_epic_sweep (runner/tick_checks.py) threads config + now through, and the typed EpicSweepDoneEvent reports expired epics (add an expired: list[int] field) distinctly from closed.
  • Each evaluated epic lands in exactly ONE bucket of the report (closed / expired / skipped_open_subs / skipped_no_subs / errors) — mutual exclusivity preserved.
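
One way to read the criteria above is as a pure classification step with a fixed precedence. The sketch below is an assumption about how that ordering could look, not the actual epic_sweep code: `EpicView` and `classify` are hypothetical names, the real sweep works on GhClient data, and the `errors` bucket (filled when a client call fails) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class EpicView:
    """Hypothetical flattened view of an epic; not a type from forge_loop."""
    number: int
    created_at: Optional[datetime]
    open_subs: int
    closed_subs: int


def classify(epic: EpicView, now: datetime, epic_ttl_days: int) -> str:
    """Return exactly one bucket name for the epic. `now` is injected."""
    # Fail-safe dominates everything: open work is never expired.
    if epic.open_subs > 0:
        return "skipped_open_subs"
    # All tracked subs closed: existing completed path, even if the epic is old.
    if epic.closed_subs > 0:
        return "closed"
    # Zero tracked subs from here on.
    # TTL disabled or age unprovable: keep pre-change behaviour.
    if epic_ttl_days <= 0 or epic.created_at is None:
        return "skipped_no_subs"
    age_days = (now - epic.created_at).days
    # Strictly greater: an epic exactly at the TTL is still within it.
    return "expired" if age_days > epic_ttl_days else "skipped_no_subs"
```

Because each branch returns immediately, an epic can only ever land in one bucket, which is the mutual-exclusivity property the last criterion asks for.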

Test matrix

Unit (tests/test_epic_sweep.py, mirror existing MockGhClient style):

  • test_stale_epic_with_no_open_subs_is_expired — zero open subs, age > TTL → in report.expired, close_issue called with the expiry reason, expiry comment posted. Primary falsifiable acceptance.
  • test_epic_within_ttl_is_not_expired — zero open subs, age < TTL → not closed, no comment.
  • test_epic_with_open_sub_is_never_expired_even_when_ancient — ≥1 open sub + age ≫ TTL → skipped_open_subs, never closed (fail-safe).
  • test_epic_with_no_created_at_is_never_expired — created_at=None, old by any measure → not closed, no raise.
  • test_ttl_zero_disables_expiry_pass — epic_ttl_days=0 → behaviour identical to today (no expiry closes).
  • test_all_subs_resolved_still_uses_completed_path_not_ttl — all subs CLOSED + old → closes via the completed comment, NOT the expiry comment.
  • test_expiry_comment_is_distinct_from_resolved_comment — assert the expiry body differs from build_close_comment output and names the age/TTL.
  • Adversarial / sad path: test_ttl_close_issue_error_is_recorded_not_raised — close_issue raises on a TTL-eligible epic → recorded in report.errors, sweep returns normally, tick not crashed.
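
As a rough illustration of the sad-path test and the "names the age/TTL" requirement, the sketch below uses a stand-in client whose close call always fails. Everything here is hypothetical: `FailingCloseClient`, `comment_issue`, `build_expiry_comment`, `expire_one`, the dict-shaped report, and the `"not_planned"` reason are assumptions for illustration, not the real MockGhClient, GhClient, or EpicSweepReport API.

```python
from datetime import datetime, timedelta, timezone


class FailingCloseClient:
    """Hypothetical stand-in client: comments succeed, close always fails."""

    def __init__(self):
        self.comments = []

    def comment_issue(self, number, body):
        self.comments.append((number, body))

    def close_issue(self, number, reason):
        raise RuntimeError("simulated gh failure")


def build_expiry_comment(age_days, epic_ttl_days):
    # Names both the age and the TTL so it cannot be mistaken for the resolved comment.
    return f"Expiring epic: open {age_days} days, TTL {epic_ttl_days}, no open sub-issues."


def expire_one(client, number, created_at, now, epic_ttl_days, report):
    """Hypothetical TTL close step for an epic already known to have zero subs."""
    age_days = (now - created_at).days
    if epic_ttl_days <= 0 or age_days <= epic_ttl_days:
        report["skipped_no_subs"].append(number)
        return
    try:
        client.comment_issue(number, build_expiry_comment(age_days, epic_ttl_days))
        client.close_issue(number, reason="not_planned")  # assumed reason value
    except Exception as exc:  # the sweep must never raise into the tick
        report["errors"].append((number, str(exc)))
        return
    report["expired"].append(number)


def test_ttl_close_issue_error_is_recorded_not_raised():
    now = datetime(2026, 6, 8, tzinfo=timezone.utc)  # injected, no wall clock
    report = {"expired": [], "skipped_no_subs": [], "errors": []}
    expire_one(FailingCloseClient(), 390, now - timedelta(days=158), now, 90, report)
    assert report["expired"] == []
    assert report["errors"] == [(390, "simulated gh failure")]
```

The epic is appended to `expired` only after both client calls succeed, so a failure leaves it in `errors` alone and the one-bucket rule still holds.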

Integration (tick wiring): extend test_run_epic_sweep_* so EpicSweepDoneEvent carries the expired list and run_epic_sweep passes epic_ttl_days + injected now from cadence config.
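
For the event side of the wiring, a minimal sketch of an event carrying `expired` separately from `closed` might look like the following. The class name, the other fields, and the dict-shaped report are assumptions; the real EpicSweepDoneEvent in events.py may be defined differently (for example with another base class or more fields).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EpicSweepDoneEventSketch:
    """Hypothetical stand-in for EpicSweepDoneEvent with the new field."""
    closed: list[int] = field(default_factory=list)
    expired: list[int] = field(default_factory=list)  # new: TTL closes only
    errors: list[str] = field(default_factory=list)


def event_from_report(report: dict) -> EpicSweepDoneEventSketch:
    # Copy the lists so later mutation of the report cannot alter the event.
    return EpicSweepDoneEventSketch(
        closed=list(report.get("closed", [])),
        expired=list(report.get("expired", [])),
        errors=[str(e) for e in report.get("errors", [])],
    )


event = event_from_report({"closed": [12], "expired": [390]})
```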

Out of scope

  • Do NOT change the all-subs-resolved close condition or its comment text.
  • Do NOT touch the LLM groomer / DEFAULT_BRIEF epic-skip behaviour.
  • Do NOT add re-open / un-expire automation, notifications, or a grace-period warning comment.
  • Do NOT make TTL per-epic-label-configurable or read TTL from issue body — a single global epic_ttl_days only.
  • Do NOT introduce a new event KIND; extend EpicSweepDoneEvent in place.
  • Do NOT add wall-clock reads inside the pure sweep function — now is injected.

File pointers

  • src/forge_loop/epic_sweep.py — add TTL branch to sweep (inject now, add epic_ttl_days param), add an expiry-comment builder beside build_close_comment, add expired to EpicSweepReport, export it.
  • src/forge_loop/runner/tick_checks.py — run_epic_sweep: pass epic_ttl_days + now, populate expired on the event.
  • src/forge_loop/events.py — EpicSweepDoneEvent: add expired: list[int] field.
  • src/forge_loop/settings.py — MaintenanceSettings: add epic_ttl_days: int = 0 (next to epic_label).
  • src/forge_loop/config.py — frozen Maintenance config block: add epic_ttl_days and plumb from settings.
  • tests/test_epic_sweep.py — new + extended tests above.
  • Age helper to mirror (don't reinvent): src/forge_loop/control/status.py::_oldest_age_days/_parse_iso and src/forge_loop/snapshot.py::_age_seconds. Issue.created_at is already populated by the gh_client builder (gh_client.py ~line 487) — verify issues_by_label's query actually surfaces created_at before relying on it; if it doesn't, that's part of this ticket.
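
The settings and config pointers above amount to one knob flowing from a mutable settings object into a frozen config. The sketch below shows that shape with plain dataclasses, which is an assumption: the real MaintenanceSettings and Config may use a different settings library, and the class names, the `freeze` helper, the `ttl_enabled` property, and the `"epic"` default label are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceSettingsSketch:
    """Hypothetical stand-in for MaintenanceSettings."""
    epic_label: str = "epic"  # assumed default, for illustration only
    epic_ttl_days: int = 0    # 0 keeps the TTL pass disabled by default


@dataclass(frozen=True)
class MaintenanceConfigSketch:
    """Hypothetical stand-in for the frozen Maintenance config block."""
    epic_label: str
    epic_ttl_days: int

    @property
    def ttl_enabled(self) -> bool:
        # Any value <= 0 disables the TTL pass entirely.
        return self.epic_ttl_days > 0


def freeze(settings: MaintenanceSettingsSketch) -> MaintenanceConfigSketch:
    # Plumb the knob through unchanged; the sweep decides what <= 0 means.
    return MaintenanceConfigSketch(
        epic_label=settings.epic_label,
        epic_ttl_days=settings.epic_ttl_days,
    )
```

Keeping the default at 0 means an existing deployment sees no behaviour change until an operator opts in with a positive value.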

Worker note

AC is wide — it spans epic_sweep, tick_checks, events, settings, and config (5 modules) plus feature + config + event-field changes. Worker, you are at high risk of running out of turns before pushing. Apply COMMIT DISCIPLINE (wip-commit every 20 turns / 5 file-edits) aggressively from the start, and run the EXIT CHECKLIST even if implementation feels incomplete. Suggested order: (1) config/settings knob, (2) pure sweep TTL branch + unit tests (the load-bearing falsifiable AC), commit, (3) event field + tick wiring + integration tests, commit.

Original report

epic_sweep.sweep only closes an epic when all of its tracked sub-issues are resolved. An epic that never receives sub-tickets, or whose subs stall indefinitely, stays open forever, so the open-epic count grows monotonically and the operational-entropy snapshot's open_epics metric never converges. Add a TTL pass: an epic with zero open sub-issues that has been open longer than a configurable max age (e.g. epic_ttl_days) is auto-closed with an audit comment explaining the expiry, distinct from the all-subs-resolved close path. Fail-safe: an epic with >=1 open sub-issue is never expired. Primary falsifiable acceptance: an epic with no open sub-issues whose age exceeds the TTL is closed with an expiry audit comment, while one within the TTL (or with an open sub) is left open.

Customer story: As a forge-loop operator who must not drown in the loop's own exhaust, I see the epic pile shrink only when every sub-issue resolves, so abandoned or sub-less epics pile up. I need stale epics to expire by TTL automatically so the backlog converges, instead of the open-epic count climbing forever.

Metadata

Assignees

No one assigned

Labels

  • axis:operational-convergence: "Value axis: operational convergence (GC the loop's own exhaust)"
  • po:expanded: "PO subagent expanded the issue body"
