
feat(epic_sweep): expire stale epics past a TTL with an audit close (#435) #436

Merged
hadamrd merged 1 commit into trunk from loop/435-expire-stale-epics-past-a-ttl-with-an-au on Jun 8, 2026

hadamrd commented Jun 8, 2026

Owner

Fixes #435

Problem

epic_sweep.sweep only closed an epic with ≥1 tracked sub-issue where every sub-issue is CLOSED. Two populations were immortal: (1) epics never decomposed into sub-issues (skipped_no_subs) and (2) epics whose subs stall open. Population (1) is the leak — the LLM groomer is told to SKIP epic-labelled issues, so nothing reaps an undecomposed epic and the operational-entropy open_epics metric climbs monotonically and never converges.

Change

A new TTL expiry pass in the pure, deterministic sweep:

  • Closes an epic iff state=open AND zero open sub-issues AND age (now − created_at) > epic_ttl_days whole days. Posts a new build_expiry_comment (textually distinct from build_close_comment — names the age/TTL, e.g. "open 158 days, TTL 90 days") and closes with reason="not_planned". Reported in a new EpicSweepReport.expired bucket.
  • Fail-safe (load-bearing): an epic with ≥1 open sub-issue is never expired, regardless of age; skipped_open_subs dominates the TTL check.
  • The existing all-subs-resolved path is unchanged and dominates TTL: an old, fully-resolved epic still closes via build_close_comment with reason="completed", not the expiry path.
  • created_at=None → age unprovable → never expired (fail safe), no raise.
  • epic_ttl_days <= 0 disables the pass entirely (pre-change no-op).
  • now is injected (param, sensible UTC default) so the sweep stays clock-free; every GhClient error is still caught — the sweep never raises.
  • Each evaluated epic lands in exactly one bucket (closed / expired / skipped_open_subs / skipped_no_subs / errors).
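
The ordering rules above can be read as a single decision function. The following is a minimal sketch under stated assumptions, not the code from `epic_sweep.py`: the `Epic` shape, the `Bucket` enum, and the names `classify` and `age_days` are hypothetical, and it assumes an undecomposed epic that is still within its TTL lands in `skipped_no_subs`. The `errors` bucket is left out because this sketch performs no I/O.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Bucket(Enum):
    CLOSED = "closed"
    EXPIRED = "expired"
    SKIPPED_OPEN_SUBS = "skipped_open_subs"
    SKIPPED_NO_SUBS = "skipped_no_subs"


@dataclass(frozen=True)
class Epic:  # hypothetical shape, not the real forge_loop model
    number: int
    created_at: datetime | None
    open_subs: int
    closed_subs: int


def age_days(created_at: datetime, now: datetime) -> int:
    """Whole days between created_at and now, floored at 0."""
    return max((now - created_at).days, 0)


def classify(epic: Epic, now: datetime, epic_ttl_days: int) -> Bucket:
    # Fail-safe first: any open sub-issue blocks every close path.
    if epic.open_subs > 0:
        return Bucket.SKIPPED_OPEN_SUBS
    # The existing all-subs-resolved path dominates the TTL check.
    if epic.closed_subs > 0:
        return Bucket.CLOSED
    # From here on the epic has no tracked sub-issues at all.
    # TTL disabled or age unprovable: never expire.
    if epic_ttl_days <= 0 or epic.created_at is None:
        return Bucket.SKIPPED_NO_SUBS
    # Strictly greater than the TTL, measured in whole days.
    if age_days(epic.created_at, now) > epic_ttl_days:
        return Bucket.EXPIRED
    return Bucket.SKIPPED_NO_SUBS
```

Because every branch returns exactly one `Bucket`, the one-bucket-per-epic property holds by construction in this sketch; the real sweep additionally has to route GhClient failures to `errors`.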

Wiring

  • runner/tick_checks.run_epic_sweep threads cfg.epic_ttl_days + an injected now through, and EpicSweepDoneEvent gains an expired: list[int] field (extended in place — no new event KIND).
  • epic_ttl_days: int = 0 added to MaintenanceSettings + frozen Config (default 0 = disabled, preserving pre-change behaviour).
  • Verified issues_by_label's REST query already surfaces created_at (gh_client.py ~line 487) — no gh_client change needed.
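
As a rough illustration of the wiring, the sketch below threads a TTL and an injected `now` from a frozen config into a sweep callable and copies the result onto the event. Only `epic_ttl_days`, `expired`, `run_epic_sweep`, `EpicSweepReport`, and `EpicSweepDoneEvent` are names from this PR; the field subsets, the `sweep` callable parameter, and the function signature are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class Config:  # hypothetical subset of the frozen Config
    epic_ttl_days: int = 0  # 0 = TTL pass disabled (pre-change behaviour)


@dataclass
class EpicSweepReport:  # hypothetical subset of the real report
    closed: list[int] = field(default_factory=list)
    expired: list[int] = field(default_factory=list)


@dataclass
class EpicSweepDoneEvent:  # extended in place: same event kind, one new field
    closed: list[int] = field(default_factory=list)
    expired: list[int] = field(default_factory=list)


# Assumed signature for the pure sweep: (epic_ttl_days, now) -> report.
SweepFn = Callable[[int, datetime], EpicSweepReport]


def run_epic_sweep(
    cfg: Config, sweep: SweepFn, now: datetime | None = None
) -> EpicSweepDoneEvent:
    # The wall-clock default lives at the tick boundary, not in the sweep.
    clock = now if now is not None else datetime.now(timezone.utc)
    report = sweep(cfg.epic_ttl_days, clock)
    return EpicSweepDoneEvent(
        closed=list(report.closed), expired=list(report.expired)
    )
```

With the default `Config()`, `epic_ttl_days` is 0, so a sweep that honours the `<= 0` rule leaves `expired` empty and the event is indistinguishable from the pre-change one.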

Reuse / scope discipline

  • Shared _post_and_close helper backs both the completed and TTL close paths (no duplicated comment-then-close-then-record; manifesto Q7).
  • _age_days mirrors control/status._oldest_age_days for a single issue (aware-datetime math, floored at 0) without pulling that heavyweight module into the lean epic_sweep.
  • Net diff is the single TTL mechanism only.
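
A shared comment-then-close-then-record helper might look roughly like the sketch below. It is an illustration only: the real `_post_and_close` signature is not shown in this PR, the `GhLike` protocol and its `comment` method are hypothetical stand-ins for GhClient, and catching bare `Exception` is an assumption about how "every GhClient error is still caught" is implemented.

```python
from __future__ import annotations

from typing import Protocol


class GhLike(Protocol):  # hypothetical stand-in for the GhClient surface used here
    def comment(self, number: int, body: str) -> None: ...
    def close_issue(self, number: int, reason: str) -> None: ...


def post_and_close(
    gh: GhLike,
    number: int,
    body: str,
    reason: str,
    bucket: list[int],
    errors: list[tuple[int, str]],
) -> None:
    """Comment, close, then record the epic; never raise on a client error."""
    try:
        gh.comment(number, body)
        gh.close_issue(number, reason)
    except Exception as exc:  # assumption: all client errors are swallowed here
        errors.append((number, str(exc)))
        return
    # Only record success after both calls went through.
    bucket.append(number)
```

Both close paths would then differ only in their arguments: the completed path passes the close comment, `reason="completed"`, and the `closed` bucket, while the TTL path passes the expiry comment, `reason="not_planned"`, and the `expired` bucket.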

Tests (tests/test_epic_sweep.py)

Unit (T1 edges + adversarial default arm, T2 false-case, clock-free):

  • test_stale_epic_with_no_open_subs_is_expired (primary falsifiable acceptance): the epic appears in report.expired, close_issue is called with not_planned, and the expiry comment names 158 days / TTL 90.
  • test_epic_within_ttl_is_not_expired
  • test_epic_with_open_sub_is_never_expired_even_when_ancient — fail-safe
  • test_epic_with_no_created_at_is_never_expired
  • test_ttl_zero_disables_expiry_pass
  • test_all_subs_resolved_still_uses_completed_path_not_ttl
  • test_expiry_comment_is_distinct_from_resolved_comment
  • test_ttl_close_issue_error_is_recorded_not_raised — adversarial sad-path
  • Integration: test_run_epic_sweep_event_carries_expired_with_ttl_and_injected_now — tick wiring threads epic_ttl_days + injected now, event carries expired.
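
The clock-free style of these tests can be sketched as follows. This is not the content of `tests/test_epic_sweep.py`: `FakeGh`, `expire_if_stale`, and the two test bodies are hypothetical stand-ins that only mirror the pattern described above (a recording fake client, a fixed injected `now`, and assertions on observable side effects).

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

NOW = datetime(2026, 6, 8, tzinfo=timezone.utc)  # fixed instant; no wall-clock reads


class FakeGh:
    """Records calls; optionally fails on close to exercise the sad path."""

    def __init__(self, fail_close: bool = False) -> None:
        self.calls: list[tuple] = []
        self.fail_close = fail_close

    def comment(self, number: int, body: str) -> None:
        self.calls.append(("comment", number, body))

    def close_issue(self, number: int, reason: str) -> None:
        if self.fail_close:
            raise RuntimeError("close failed")
        self.calls.append(("close_issue", number, reason))


def expire_if_stale(gh, number, created_at, now, ttl_days) -> str:
    """Toy stand-in for the TTL arm: returns 'expired', 'skipped', or 'error'."""
    if ttl_days <= 0 or created_at is None:
        return "skipped"
    age = max((now - created_at).days, 0)
    if age <= ttl_days:
        return "skipped"
    try:
        gh.comment(number, f"open {age} days, TTL {ttl_days} days")
        gh.close_issue(number, "not_planned")
    except Exception:
        return "error"
    return "expired"


def test_stale_epic_is_expired() -> None:
    gh = FakeGh()
    outcome = expire_if_stale(gh, 1, NOW - timedelta(days=158), NOW, 90)
    assert outcome == "expired"
    assert ("close_issue", 1, "not_planned") in gh.calls
    comment_body = gh.calls[0][2]
    assert "158 days" in comment_body and "TTL 90" in comment_body


def test_close_error_is_recorded_not_raised() -> None:
    gh = FakeGh(fail_close=True)
    assert expire_if_stale(gh, 1, NOW - timedelta(days=158), NOW, 90) == "error"
```

Asserting on `gh.calls` and on the comment text, rather than on internal state, is what keeps such tests from being tautological.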

Gates

  • ruff check src tests → clean.
  • pyright src/forge_loop/epic_sweep.py → clean. Full pyright src/forge_loop reports 2 pre-existing errors (dispatch.py:985, run_branch_sweep in tick_checks.py) present identically on origin/trunk with the trunk versions of those untouched files — not introduced here.
  • tests/test_epic_sweep.py → 22 passed. Adjacent test_events.py/test_config.py pass. test_settings.py::test_no_loop_env_reads_outside_settings fails on a pre-existing os.environ.get in console_api.py (on trunk, untouched by this PR).

Lumen semantic discovery was skipped — mcp__lumen__semantic_search was not available in the worker session; ran the events/config/settings test files directly instead.

…435)

The epic sweep only ever closed an epic with >=1 tracked sub-issue whose
subs were ALL closed. Two populations were therefore immortal: epics never
broken into sub-issues (skipped_no_subs) and epics with stalled-open subs.
Population (1) is a pure leak -- the LLM groomer is told to SKIP epic-labelled
issues, so nothing reaps an undecomposed epic and the operational-entropy
open_epics metric never converges.

Add a TTL expiry pass to the pure, deterministic sweep:

- An open epic with ZERO open sub-issues whose age (now - created_at) exceeds
  epic_ttl_days whole days is closed via a new build_expiry_comment (textually
  distinct from build_close_comment -- it names the age/TTL, e.g. "open 158
  days, TTL 90") with reason="not_planned", and reported under a new
  EpicSweepReport.expired bucket.
- Fail-safe ordering (load-bearing): >=1 open sub NEVER expires regardless of
  age (skipped_open_subs dominates); the all-subs-resolved completed path is
  unchanged and dominates the TTL check; created_at=None never expires (age
  unprovable -> fail safe); epic_ttl_days<=0 disables the pass entirely.
- now is INJECTED (param, sensible UTC default) so tests are clock-free; every
  GhClient error is still caught -- the sweep never raises.

Wiring: run_epic_sweep threads cfg.epic_ttl_days + an injected now through, and
EpicSweepDoneEvent gains an expired: list[int] field (extended in place, no new
KIND). epic_ttl_days added to MaintenanceSettings + frozen Config (default 0 =
disabled, preserving pre-change behaviour).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

hadamrd left a comment

Owner Author

Minimal path to green (must-fix to merge)

Nothing blocks merge.

Optional follow-ups (do NOT block merge)

  • [sev3/architecture] src/forge_loop/epic_sweep.py — Non-blocking nit: sweep resolves clock = now if now is not None else datetime.now(UTC) internally, but run_epic_sweep (tick_checks.py) already defaults now to datetime.now(UTC) before calling. The in-sweep fallback is redundant in production and in mild tension with the issue's out-of-scope line 'do not add wall-clock reads inside the pure sweep function'. Consider making the sweep param required (now: datetime) or leaving the wall-clock default solely at the tick_checks boundary, so the pure function is strictly clock-free. Tests already inject now, so behaviour is unaffected either way.
  • [sev3/tests] tests/test_epic_sweep.py — Reviewed for tautology/sad-path coverage: tests assert observable side effects (gh.calls reason='not_planned'/'completed', report buckets, comment text incl. '158 days'/'TTL 90') and cover the adversarial close-raises and created_at=None arms — no action needed. Optional: a test pinning exactly one bucket per epic across a mixed batch would make the mutual-exclusivity invariant explicit, but the per-case tests already cover it.
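
The optional mixed-batch test could lean on a small checker like the one sketched below. It is a hypothetical helper, not part of this PR: the `EpicSweepReport` shape here assumes every bucket, including `errors`, is a list of issue numbers, which may not match the real report.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EpicSweepReport:  # hypothetical shape; every bucket holds issue numbers
    closed: list[int] = field(default_factory=list)
    expired: list[int] = field(default_factory=list)
    skipped_open_subs: list[int] = field(default_factory=list)
    skipped_no_subs: list[int] = field(default_factory=list)
    errors: list[int] = field(default_factory=list)


def bucket_violations(
    report: EpicSweepReport, evaluated: set[int]
) -> dict[str, list[int]]:
    """List epics that break the exactly-one-bucket invariant."""
    counts: Counter = Counter()
    for bucket in (
        report.closed,
        report.expired,
        report.skipped_open_subs,
        report.skipped_no_subs,
        report.errors,
    ):
        counts.update(bucket)
    return {
        # Evaluated but reported nowhere.
        "missing": sorted(n for n in evaluated if counts[n] == 0),
        # Reported more than once, in the same or different buckets.
        "duplicated": sorted(n for n in counts if counts[n] > 1),
        # Reported but never evaluated.
        "unexpected": sorted(n for n in counts if n not in evaluated),
    }
```

A mixed-batch test would run the sweep over one epic per arm and assert that all three lists come back empty.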

hadamrd merged commit 6aa6e29 into trunk Jun 8, 2026
2 checks passed
hadamrd deleted the loop/435-expire-stale-epics-past-a-ttl-with-an-au branch June 8, 2026 18:11

Development

Successfully merging this pull request may close these issues.

Expire stale epics past a TTL with an audit close

1 participant