feat(doctor): surface prunable-memory backlog in memory_integrity probe (#429) #430

Merged
hadamrd merged 1 commit into trunk from loop/429-surface-prunable-memory-backlog-in-the-f
Jun 8, 2026

@hadamrd hadamrd commented Jun 8, 2026

Owner

Summary

Enriches the existing memory_integrity doctor probe (src/forge_loop/control/doctor.py) so it surfaces the prunable-memory backlog — the active, non-load-bearing episodic items that accumulate one-per-issue (parent epic #426) until they drown the load-bearing decisions in boot context. Previously the probe printed episodes=600 and returned PASS, giving a maestro resuming after context loss no signal that compaction was overdue.

No new probe, CHECK_NAMES entry, or CLI command — the existing probe is enriched only. Read-only: no writes to .forge.

Dependency on #427 — which path I took

is_load_bearing_memory (the pure predicate) is #427, which is OPEN / not merged with no PR at the time of writing. Per the ticket's "Do not re-implement the classification here" constraint (and manifesto Q7), this probe consumes the predicate via injection and resolves it dynamically in production (_resolve_load_bearing_predicate → getattr(forge_loop.memory, "is_load_bearing_memory", None)).
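The resolve-then-degrade pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in doctor.py: `resolve_predicate`, `prunable_field`, the `SimpleNamespace` stand-ins for `forge_loop.memory`, and the `kind == "decision"` classification are all hypothetical. Only the `getattr(..., None)` lookup and the `prunable=unmeasured` degrade string come from this PR.

```python
import types
from typing import Any, Callable, Optional

Predicate = Callable[[Any], bool]


def resolve_predicate(module: Any, name: str = "is_load_bearing_memory") -> Optional[Predicate]:
    """Return the predicate if the module exposes it, else None."""
    candidate = getattr(module, name, None)
    return candidate if callable(candidate) else None


def prunable_field(active: list, predicate: Optional[Predicate]) -> str:
    """Render the detail fragment, degrading when the predicate is absent."""
    if predicate is None:
        return "prunable=unmeasured"
    return f"prunable={sum(1 for item in active if not predicate(item))}"


# Hypothetical stand-ins for the memory module before and after #427 lands.
before = types.SimpleNamespace()
after = types.SimpleNamespace(
    # Assumed classification, for illustration only.
    is_load_bearing_memory=lambda item: item["kind"] == "decision"
)

items = [{"kind": "decision"}, {"kind": "episode"}, {"kind": "episode"}]
print(prunable_field(items, resolve_predicate(before)))  # prunable=unmeasured
print(prunable_field(items, resolve_predicate(after)))   # prunable=2
```

Because the predicate is passed in as a parameter, a test can supply a fake without the real symbol existing, which is the seam the tests in this PR rely on.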

Acceptance criteria & how they're tested

| Criterion | Implementation | Test |
| --- | --- | --- |
| Named threshold constant, no bare literal | PRUNABLE_MEMORY_WARN_THRESHOLD | referenced by all unit tests |
| Named compaction remediation constant | COMPACT_REMEDIATION alongside *_REMEDIATION | asserted == COMPACT_REMEDIATION |
| Detail reports prunable count; existing fields preserved | appended prunable=<n> to the breakdown | test_prunable_backlog_reported_below_threshold_passes, test_warn_detail_preserves_existing_breakdown_fields |
| > threshold → WARN naming count + compaction, non-None remediation | strict > branch | test_prunable_backlog_above_threshold_warns |
| <= threshold → unchanged PASS, remediation None | else branch | boundary + below-threshold tests |
| Strict > (not >=) boundary | at-threshold PASS, +1 WARN | test_prunable_threshold_boundary_uses_strict_greater_than |
| Absent/corrupt/unreadable branches untouched; prunable runs only on clean-open | logic gated after clean open | test_prunable_logic_does_not_mask_corrupt_store_fail, existing test_fails_cleanly_on_corrupt_db |
| doctor --json over above-threshold store → warn + non-null remediation; human table renders count + remediation | flows through existing renderer | test_json_memory_integrity_warns_on_above_threshold_prunable_backlog, test_human_table_renders_prunable_count_and_compaction_remediation |

Prunable count is computed over the already-fetched active list (no extra DB round-trip; manifesto Q9).
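As a rough sketch of the gate the criteria describe, the count is taken in a single pass over an in-memory list and compared with a strict `>`. The constant values, the `Verdict` dataclass, `judge_backlog`, and the wording of the detail string are assumptions made for illustration. The real constants and verdict types live in doctor.py and are not shown in this PR description.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence

# Illustrative values only; the real ones are defined in doctor.py.
PRUNABLE_MEMORY_WARN_THRESHOLD = 3
COMPACT_REMEDIATION = "compact the memory store"


@dataclass
class Verdict:
    status: str
    detail: str
    remediation: Optional[str]


def judge_backlog(active: Sequence[Any], is_load_bearing: Callable[[Any], bool]) -> Verdict:
    """Count prunable items in the already-fetched list and apply the gate."""
    prunable = sum(1 for item in active if not is_load_bearing(item))
    detail = f"episodes={len(active)} prunable={prunable}"
    if prunable > PRUNABLE_MEMORY_WARN_THRESHOLD:  # strict: at-threshold stays PASS
        return Verdict("WARN", f"{detail} (compaction overdue)", COMPACT_REMEDIATION)
    return Verdict("PASS", detail, None)


# Items are plain booleans here: True means load-bearing.
at_threshold = [True] + [False] * PRUNABLE_MEMORY_WARN_THRESHOLD
over_threshold = at_threshold + [False]

print(judge_backlog(at_threshold, bool).status)    # PASS
print(judge_backlog(over_threshold, bool).status)  # WARN
```

Keeping the threshold and remediation as named module-level constants lets tests build their fixtures relative to the constant instead of hard-coding a number.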

Test plan

  • ruff check src tests — passes
  • pyright src/forge_loop: 0 errors in the changed doctor.py. 2 errors remain in runner/dispatch.py and runner/tick_checks.py, both pre-existing on trunk in files this PR does not touch (verified by checking out the trunk versions).
  • Targeted tests: all 10 new/affected tests pass (TestMemoryIntegrity + the two cross-boundary tests).

Note on the full suite: python -m pytest -q was not run to completion in the worker worktree because this environment's disk fsyncs each SQLite store-open at ~7s, making a full-suite run take hours (a single SqliteMemoryStore(...) open measured at 7.3s here). Per the worker contract I targeted the specific tests I touched; CI on a normal disk runs the full suite. Test seeding sets PRAGMA synchronous=OFF on the throwaway test DBs to keep per-row inserts fast.
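For readers unfamiliar with the seeding trick, here is a minimal sketch using only the standard-library `sqlite3` module. It does not use `SqliteMemoryStore` or the actual test helper; the `seed_rows` function, the `items` table, and its columns are hypothetical. `PRAGMA synchronous=OFF` tells SQLite not to wait for the operating system to flush writes to disk, which is only reasonable for a database that can be thrown away.

```python
import sqlite3
import tempfile
from pathlib import Path


def seed_rows(db_path: Path, count: int) -> int:
    """Insert `count` rows with per-row commits and return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        # Set before any transaction is open; trades durability for speed.
        conn.execute("PRAGMA synchronous=OFF")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, body TEXT)"
        )
        for i in range(count):
            conn.execute("INSERT INTO items (body) VALUES (?)", (f"episode {i}",))
            conn.commit()  # one commit per row, the pattern that is slow with fsync
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    print(seed_rows(Path(tmp) / "throwaway.db", 50))  # 50
```

With synchronous writes disabled, a crash or power loss can corrupt the file, so this setting belongs in test fixtures and not in a production store.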

Out of scope (respected)

Fixes #429

…be (#429)

The `memory_integrity` doctor probe reported the memory store's
decisions/rejected/episodes/skills breakdown and returned PASS whenever the
store opened cleanly — with no visibility into the *prunable* backlog: the
active, non-load-bearing episodic items that accumulate one-per-issue (epic
#426) until they drown the load-bearing decisions in boot context. A loop that
has run 300 issues showed `episodes=600` and a green light, giving a maestro
resuming after context loss no signal that compaction was overdue.

This enriches the existing probe (no new probe / CHECK_NAMES / CLI command):

- `PRUNABLE_MEMORY_WARN_THRESHOLD` and `COMPACT_REMEDIATION` are declared once
  alongside the existing named `*_REMEDIATION` / verdict constants (no bare
  literal at the call site).
- The detail line additionally reports `prunable=<n>` (active items where
  `is_load_bearing_memory(item)` is False), computed over the already-fetched
  `active` list (no extra DB round-trip). Existing fields are unchanged.
- Prunable strictly > threshold -> WARN naming the count + compaction, with
  `COMPACT_REMEDIATION`; <= threshold -> unchanged PASS / remediation None.
- The absent (WARN), corrupt (FAIL) and unreadable (FAIL) branches are
  untouched; the prunable logic runs only on the clean-open path. Read-only.

Dependency on sibling #427: `is_load_bearing_memory` is NOT yet merged. Per the
ticket's "do not re-implement the classification" constraint (also manifesto
Q7), the probe consumes the predicate via injection and resolves it dynamically
in production (`_resolve_load_bearing_predicate`). Until #427 lands the symbol
is absent, so the probe degrades — reporting `prunable=unmeasured` and staying
PASS — exactly as `mutation_survivors_check` degrades without its #379 checker
(declared-degrade per Q11). When #427 merges the WARN gate lights up with no
further change here. Tests inject a fake predicate to exercise the full
PASS/WARN/boundary matrix now, and monkeypatch the module symbol to drive the
real CLI path end-to-end.

Tests (tests/test_doctor_control_plane.py): below-threshold PASS, above-
threshold WARN, strict `>` boundary (at-threshold PASS, +1 WARN), existing-
fields-preserved, unavailable-predicate degrade, corrupt-store FAIL not masked,
plus `doctor --json` WARN integration and human-table rendering of the prunable
count + remediation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@hadamrd hadamrd left a comment

Owner Author

Minimal path to green (must-fix to merge)

Nothing blocks merge.

Optional follow-ups (do NOT block merge)

  • [sev3/tests] Full pytest suite was not run to completion in the worker worktree (slow-fsync disk); only targeted tests + ruff + pyright were run. The change is additive and well-isolated so this is low-risk, but confirm CI runs the full suite green before merge.
  • [sev3/tests] tests/test_doctor_control_plane.py:143 — _episodic_is_prunable encodes an assumed semantics of #427's not-yet-merged is_load_bearing_memory. The seam test will stay green even if #427 ships different classification. When #427 lands, replace the monkeypatched fake with a real end-to-end assertion against the actual predicate.
  • [sev3/style] tests/test_doctor_control_plane.py:165 — _seed_memory_backlog reaches into the private store._connection to set PRAGMA synchronous=OFF. Acceptable for a throwaway test db, but if this seeding pattern spreads, consider a small store/test helper rather than touching a private attribute.

Development

Successfully merging this pull request may close these issues.

Surface prunable-memory backlog in the forge-loop doctor memory_integrity probe
