feat(epic_sweep): expire stale epics past a TTL with an audit close (#435) #436
Merged
…435) The epic sweep only ever closed an epic with >=1 tracked sub-issue whose subs were ALL closed. Two populations were therefore immortal: (1) epics never broken into sub-issues (skipped_no_subs) and (2) epics with stalled-open subs. Population (1) is a pure leak -- the LLM groomer is told to SKIP epic-labelled issues, so nothing reaps an undecomposed epic and the operational-entropy open_epics metric never converges.

Add a TTL expiry pass to the pure, deterministic sweep:

- An open epic with ZERO open sub-issues whose age (now - created_at) exceeds epic_ttl_days whole days is closed via a new build_expiry_comment (textually distinct from build_close_comment -- it names the age/TTL, e.g. "open 158 days, TTL 90") with reason="not_planned", and reported under a new EpicSweepReport.expired bucket.
- Fail-safe ordering (load-bearing): >=1 open sub NEVER expires regardless of age (skipped_open_subs dominates); the all-subs-resolved completed path is unchanged and dominates the TTL check; created_at=None never expires (age unprovable -> fail safe); epic_ttl_days<=0 disables the pass entirely.
- now is INJECTED (param, sensible UTC default) so tests are clock-free; every GhClient error is still caught -- the sweep never raises.

Wiring: run_epic_sweep threads cfg.epic_ttl_days + an injected now through, and EpicSweepDoneEvent gains an expired: list[int] field (extended in place, no new KIND). epic_ttl_days added to MaintenanceSettings + frozen Config (default 0 = disabled, preserving pre-change behaviour).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
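The fail-safe ordering in the commit message reads naturally as a single classification function whose checks run in a fixed order, so each epic lands in exactly one bucket. The sketch below is illustrative only: `Epic`, `classify`, `age_days` and `expiry_comment` are hypothetical stand-ins (not the PR's `sweep`, `_age_days` or `build_expiry_comment`), the epic record is reduced to a creation time plus a tuple of sub-issue states, and the comment wording is invented. It only shows the ordering and the "strictly greater than N whole days" rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


@dataclass(frozen=True)
class Epic:
    # Hypothetical minimal view of an epic issue; field names are illustrative.
    number: int
    created_at: datetime | None
    sub_states: tuple[str, ...]  # "open" / "closed" for each tracked sub-issue


def age_days(created_at: datetime, now: datetime) -> int:
    # Whole days between two aware datetimes, floored at 0 so clock skew
    # (created_at slightly in the future) never yields a negative age.
    return max((now - created_at).days, 0)


def classify(epic: Epic, *, ttl_days: int, now: datetime) -> str:
    """Return exactly one bucket name; the order of checks is the fail-safe ordering."""
    if any(state == "open" for state in epic.sub_states):
        return "skipped_open_subs"  # >=1 open sub never expires, regardless of age
    if epic.sub_states:
        return "closed"  # all subs resolved: completed path dominates the TTL check
    if ttl_days <= 0 or epic.created_at is None:
        return "skipped_no_subs"  # TTL disabled, or age unprovable -> fail safe
    if age_days(epic.created_at, now) > ttl_days:
        return "expired"  # would be closed with reason="not_planned"
    return "skipped_no_subs"


def expiry_comment(epic: Epic, *, ttl_days: int, now: datetime) -> str:
    # Hypothetical wording; the only requirement is that it names the age and the TTL.
    if epic.created_at is None:
        raise ValueError("expiry requires a known created_at")
    return (
        f"Closing stale epic: open {age_days(epic.created_at, now)} days, "
        f"TTL {ttl_days} days; no open sub-issues."
    )


# Usage: a fixed, injected `now` keeps the example clock-free.
now = datetime(2026, 6, 8, tzinfo=UTC)
stale = Epic(number=1, created_at=now - timedelta(days=158), sub_states=())
print(classify(stale, ttl_days=90, now=now))  # expired
```

Putting the open-subs and completed checks before any age arithmetic is what makes the TTL pass unable to override them: an ancient epic with one open sub never reaches the age comparison at all.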
hadamrd commented Jun 8, 2026
hadamrd (Owner, Author) left a comment
Minimal path to green (must-fix to merge)
Nothing blocks merge.
Optional follow-ups (do NOT block merge)
- [sev3/architecture] src/forge_loop/epic_sweep.py (non-blocking nit): `sweep` resolves `clock = now if now is not None else datetime.now(UTC)` internally, but `run_epic_sweep` (tick_checks.py) already defaults `now` to `datetime.now(UTC)` before calling. The in-sweep fallback is redundant in production and in mild tension with the issue's out-of-scope line 'do not add wall-clock reads inside the pure sweep function'. Consider making the `sweep` param required (`now: datetime`) or leaving the wall-clock default solely at the tick_checks boundary, so the pure function is strictly clock-free. Tests already inject `now`, so behaviour is unaffected either way.
- [sev3/tests] tests/test_epic_sweep.py: Reviewed for tautology/sad-path coverage. Tests assert observable side effects (`gh.calls` reason='not_planned'/'completed', report buckets, comment text incl. '158 days'/'TTL 90') and cover the adversarial close-raises and `created_at=None` arms; no action needed. Optional: a test pinning exactly one bucket per epic across a mixed batch would make the mutual-exclusivity invariant explicit, but the per-case tests already cover it.
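The first nit amounts to moving the wall-clock default out of the pure function and keeping it only at the tick boundary. A minimal sketch of that split, assuming two hypothetical functions: `is_past_ttl` (pure, with `now` as a required keyword argument) and `tick_boundary` (a stand-in for the role `run_epic_sweep` plays). Neither is the PR's actual code.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def is_past_ttl(created_at: datetime | None, ttl_days: int, *, now: datetime) -> bool:
    """Pure check: `now` is a required keyword, so this never reads the wall clock."""
    if ttl_days <= 0 or created_at is None:
        return False  # pass disabled, or age unprovable -> fail safe
    return max((now - created_at).days, 0) > ttl_days


def tick_boundary(
    created_at: datetime | None, ttl_days: int, now: datetime | None = None
) -> bool:
    """Hypothetical tick-level caller: the only place a wall-clock default lives."""
    clock = now if now is not None else datetime.now(UTC)
    return is_past_ttl(created_at, ttl_days, now=clock)


# Usage: tests pass a fixed instant; production relies on the boundary default.
fixed = datetime(2026, 6, 8, tzinfo=UTC)
print(is_past_ttl(fixed - timedelta(days=158), 90, now=fixed))  # True
```

With no default on the pure function, forgetting to pass `now` fails loudly with a `TypeError` at the call site instead of silently reading the clock.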
Fixes #435
Problem
`epic_sweep.sweep` only closed an epic with ≥1 tracked sub-issue where every sub-issue is CLOSED. Two populations were immortal: (1) epics never decomposed into sub-issues (`skipped_no_subs`) and (2) epics whose subs stall open. Population (1) is the leak: the LLM groomer is told to SKIP `epic`-labelled issues, so nothing reaps an undecomposed epic and the operational-entropy `open_epics` metric climbs monotonically and never converges.
Change
A new TTL expiry pass in the pure, deterministic `sweep`:
- An epic is expired when `state=open` AND zero open sub-issues AND age (`now − created_at`) > `epic_ttl_days` whole days. The pass posts a new `build_expiry_comment` (textually distinct from `build_close_comment`: it names the age/TTL, e.g. "open 158 days, TTL 90 days") and closes with `reason="not_planned"`. Expired epics are reported in a new `EpicSweepReport.expired` bucket.
- An epic with ≥1 open sub-issue is never expired regardless of age → `skipped_open_subs` dominates the TTL check.
- An epic whose sub-issues are all resolved still closes via `build_close_comment` with `reason="completed"`, not the expiry path.
- `created_at=None` → age unprovable → never expired (fail safe), no raise.
- `epic_ttl_days <= 0` disables the pass entirely (pre-change no-op).
- `now` is injected (param, sensible UTC default) so the sweep stays clock-free; every GhClient error is still caught, so the sweep never raises.
- Each epic lands in exactly one report bucket (`closed`/`expired`/`skipped_open_subs`/`skipped_no_subs`/`errors`).
Wiring
- `runner/tick_checks.run_epic_sweep` threads `cfg.epic_ttl_days` + an injected `now` through, and `EpicSweepDoneEvent` gains an `expired: list[int]` field (extended in place, no new event KIND).
- `epic_ttl_days: int = 0` added to `MaintenanceSettings` + frozen `Config` (default `0` = disabled, preserving pre-change behaviour).
- `issues_by_label`'s REST query already surfaces `created_at` (gh_client.py ~line 487), so no gh_client change is needed.
Reuse / scope discipline
- The `_post_and_close` helper backs both the completed and TTL close paths (no duplicated comment-then-close-then-record; manifesto Q7).
- `_age_days` mirrors `control/status._oldest_age_days` for a single issue (aware-datetime math, floored at 0) without pulling that heavyweight module into the lean `epic_sweep`.
Tests (`tests/test_epic_sweep.py`)
Unit (T1 edges + adversarial default arm, T2 false-case, clock-free):
- `test_stale_epic_with_no_open_subs_is_expired` (primary falsifiable acceptance): the epic is in `report.expired`, `close_issue` is called with `not_planned`, and the expiry comment names `158 days`/`TTL 90`.
- `test_epic_within_ttl_is_not_expired`
- `test_epic_with_open_sub_is_never_expired_even_when_ancient` (fail-safe)
- `test_epic_with_no_created_at_is_never_expired`
- `test_ttl_zero_disables_expiry_pass`
- `test_all_subs_resolved_still_uses_completed_path_not_ttl`
- `test_expiry_comment_is_distinct_from_resolved_comment`
- `test_ttl_close_issue_error_is_recorded_not_raised` (adversarial sad-path)
- `test_run_epic_sweep_event_carries_expired_with_ttl_and_injected_now`: tick wiring threads `epic_ttl_days` + injected `now`, and the event carries `expired`.
Gates
- `ruff check src tests` → clean.
- `pyright src/forge_loop/epic_sweep.py` → clean. Full `pyright src/forge_loop` reports 2 pre-existing errors (`dispatch.py:985`, `run_branch_sweep` in `tick_checks.py`), present identically on `origin/trunk` with the trunk versions of those untouched files; not introduced here.
- `tests/test_epic_sweep.py` → 22 passed. Adjacent `test_events.py`/`test_config.py` pass.
- `test_settings.py::test_no_loop_env_reads_outside_settings` fails on a pre-existing `os.environ.get` in `console_api.py` (on trunk, untouched by this PR).

Lumen semantic discovery was skipped: `mcp__lumen__semantic_search` was not available in the worker session, so the events/config/settings test files were run directly instead.