Host test memory leak fixes + leak-hunting skill by lukemelia · Pull Request #4397 · cardstack/boxel

lukemelia · 2026-04-13T23:51:06Z

Summary

Fixes a stack of memory leaks in the host test suite that were driving CI shards toward V8 OOM in long runs, plus the per-test instrumentation that surfaces future regressions and a Claude Code Skill that documents the hunting workflow.

The dominant leak (~7.5 MB / test) was App._applicationInstances retaining destroyed-but-not-unwatched ApplicationInstances. The fix sweeps them out in QUnit.testDone. Smaller fixes address listener-removal mismatches, globalThis closures that pinned the test owner, and per-test state held by setupMockMatrix.

Validation

Local card-basics probe (90 tests):

Metric	Before fix stack	After fix stack
Heap growth t=10→t=90	158 → 775 MB	116 → 177 MB
Per-test slope (steady state)	~7.5 MB/test	~0.125 MB/test
`App._applicationInstances` size	climbs 11 → 51 (+1/test)	stays at 0
Base-realm module-source duplication	+31 copies / 40 tests	+1 copy / 40 tests

With the leak gone, this PR also reverts the temporary 30-shard CI bump back to 20.

What's in the PR

The fixes (9 commits):

Sweep destroyed ApplicationInstances from App._applicationInstances
MEMPROBE instrumentation in setup-qunit.js (per-test gc + every-10 log line)
setupMockMatrix testState cleanup
globalThis closure + listener capture flag fixes
Resize handle modifier listener cleanup

The tooling (1 commit):

.claude/skills/host-test-memory-leak-hunting/ — Skill (workflow) + known-leaks catalog
packages/host/scripts/{heap-snapshot-runner,snapshot-diff,snapshot-by-class,snapshot-retainers}.js — CDP-driven snapshot capture and analysis
packages/host/scripts/HEAP_PROBE.md — invocation reference

The scripts are dev-time only — never invoked from CI or production builds.

Open follow-up

super.willDestroy() in ApplicationInstance throws somewhere in the chain — that's why _unwatchInstance is skipped and the workaround sweep is needed. The sweep is provably safe (it only removes instances @glimmer/destroyable has already marked DESTROYED), but the underlying error is worth tracking down so we can file an upstream Ember issue.

Test plan

All 20 host-test shards pass on this PR
Shard 3 (the historical OOM canary) passes well within memory limits
MEMPROBE log lines in CI output show app_instances=0 and used= flat across the run
Spot-check: load tests/index.html?filter=card-basics locally, confirm no test failures

The `manageHandleRegistration` modifier on `ResizablePanelGroup`'s Handle component called `registerResizeHandle()` (which returns an unregister function) but never returned a cleanup from the modifier. When a Handle was destroyed, its `setResizeHandlerState` closure stayed in the module-scope `registeredResizeHandlers` Set, and the global pointerdown/pointermove/pointerup/dblclick listeners on document.body stayed live. Because `setResizeHandlerState` captures `this` (the Handle component) and transitively the Ember owner, every mount/unmount of a ResizablePanelGroup accumulated a pinned owner graph. In host tests this showed up as every service (LoaderService, StoreService, MatrixService, etc.) growing +1 per test; in production it would leak listeners and component references any time a user mounted and unmounted resizable panels. Fix: `registerHandle()` now returns the unregister function from `registerResizeHandle()`, and the modifier returns a cleanup that calls it on destroy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…owners Three related fixes for owner-leak retainers found via heap snapshot analysis. Each closes a path where ApplicationInstance (and its services) remained reachable across test boundaries. 1. boxelTransitionTo helper closure (registerBoxelTransitionTo, render.ts #setupTransitionHelper): The helper closures captured `router` (and in render.ts `this`), were stashed on globalThis, and were never cleared on owner destroy. Each callsite now passes a destroyable owner; a registerDestructor removes the global iff it still points at the function we installed. 2. render-route globals (__boxelRenderContext, __renderModel, __docsInFlight, __waitForRenderLoadStability): Cleared in deactivate() but not on owner destroy. In tests the owner is destroyed without a normal route teardown, leaving these closures pinning `this` on globalThis. Added registerDestructor in beforeModel for RenderRoute, ModuleRoute, CommandRunnerRoute, and RenderFileExtractRoute to clear them on owner destroy. 3. panel-resize-handle-registry capture-flag mismatch: pointerdown was added with `{ capture: true }` but removed without the capture flag, so removeEventListener silently no-op'd. The listener stayed on document.body, retaining its module-level closure scope (which contained the registered handles' modifier functions and through them the entire ApplicationInstance).

Collapses two long conditional/function-signature lines to single-line form per the project's prettier config. No behavioral change.

The module-level `testState` object inside `setupMockMatrix` held a reference to the LAST test's `owner` and `sdk` until the next test's `beforeEach` overwrote it. Between tests (and after the final test in each module), that stale reference kept the entire owner subgraph alive: services, Realm, TestRealmAdapter, queue handlers, bound Worker methods, MockSDK event listeners, etc. Heap-snapshot retainer chain confirmed the leak: ApplicationInstance ← testState object (property "owner") ← MockUtils (property "testState") ← TestRealmAdapter (private #mockMatrixUtils) ← Realm (private #adapter) Clearing `testState.owner`/`sdk`/`opts` in an afterEach (after `settled()` so any in-flight async completes first) lets the previous test's owner be garbage-collected before the next test starts.

Temporary instrumentation: after each test, force GC (if --expose-gc exposes it) and emit "PROBE t=N used=XMB total=YMB name=..." so CI logs show per-test heap deltas. Chrome args now include --expose-gc and --enable-precise-memory-info so the numbers are usable. To be reverted once the residual per-test leak is eliminated.

Per-test gc() calls (via --expose-gc) are what keep shard 3 under the 4GB old-space ceiling — V8's automatic GC couldn't reclaim fast enough and the heap drifted into OOM around test 139. The heap still grows ~20MB/test (155MB → 3.5GB across shard 3), so there is an underlying ~1-owner-subgraph-per-test leak that remains to be eliminated, but the forced GC keeps us below the ceiling. Cutting the log output to every 10 tests (MEMPROBE lines) — the per-test noise was pushing log size past useful.

- mock-matrix.ts: blank line between import groups (import/order) - setup-qunit.js: multi-line console.log (prettier) and drop an unused eslint-disable-next-line directive that was no longer needed after the log was already wrapped

Add a WeakRef to each test's this.owner in testDone (after the test body has run), then after gc() count how many prior owners are still dereferenceable. If live_owners grows linearly with test count, the ApplicationInstance is still leaking; if it stays near 0 (or bounded by gc cadence), the leak lives below the owner — e.g., in module-level caches that survive owner destruction.

Heap snapshot analysis showed `App._applicationInstances` accumulating ~1 prior-test ApplicationInstance per host test (11→51 over 40 tests). Each pinned instance retained its Registry → template factories → FieldComponent closures → Box trees → ~7.5MB of base-realm module sources, driving the host test suite into V8 OOM in long shards. Root cause: `ApplicationInstance.willDestroy()` calls `super.willDestroy()` THEN `_unwatchInstance(this)`. When something in the super chain throws, `_unwatchInstance` is skipped — but `@glimmer/destroyable` still marks the instance as `isDestroyed`. The Set keeps a strong reference to a destroyed corpse forever. Fix: in `QUnit.testDone`, after gc(), iterate `_applicationInstances` and delete any entry where `isDestroyed(inst)` is true. Provably safe — only sweeps instances already in the DESTROYED state per @glimmer/destroyable. Validated locally on `card-basics` (90 tests): - Heap growth t=10→t=90: 158→775 MB → 116→177 MB - Per-test slope (steady state): ~7.5 MB → ~0.125 MB - `card-api` source string copies: +31/40 tests → +1/40 tests - `app_instances` Set size: climbs to 50+ → stays at 0 Replaces the WeakRef-on-context.owner instrumentation (which always read 0 because `getContext()` returns undefined at testDone after `unsetContext()` runs in teardownContext) with a direct readout of `getApplication()._applicationInstances.size` plus per-state counts. With the leak gone, revert the temporary 30-shard CI bump back to 20.

Documents the local probe loop used to find the leaks in this branch and includes the snapshot-analysis scripts so the next person can repeat the investigation without re-deriving the workflow from scratch. - `.claude/skills/host-test-memory-leak-hunting/SKILL.md` — when to use, the probe loop step-by-step, pitfalls, and how to validate a fix. - `.claude/skills/host-test-memory-leak-hunting/known-leaks.md` — catalog of leaks found so far (App._applicationInstances Set, mock-matrix testState, globalThis closures + listener capture-flag mismatch, resize handle modifier) plus patterns to look for when adding new code. - `packages/host/scripts/heap-snapshot-runner.js` — CDP client that watches Chrome console for `MEMPROBE t=N` and writes `/tmp/snap-tN.heapsnapshot` (streams chunks to disk; joining blows V8's max string length past ~300MB). - `packages/host/scripts/snapshot-diff.js` — constructor-name count deltas between two snapshots, sorted by retained-size delta. - `packages/host/scripts/snapshot-by-class.js` — same idea, name-truncated for at-a-glance scanning, sorted by count delta. - `packages/host/scripts/snapshot-retainers.js` — backward BFS from matching nodes to GC roots; `--strong` flag (almost always required) skips weak/shortcut edges and WeakMap "part of key" internal edges that would otherwise mask the true retainer chain. - `packages/host/scripts/HEAP_PROBE.md` — invocation reference for the scripts, linked from the skill. Dev-time tooling only — not invoked from CI or production builds.

github-actions · 2026-04-13T23:55:20Z

Preview deployments

The leak sweep in setup-qunit.js brings growth down from ~7.5MB/test to ~0.125MB/test, so 20 shards is sufficient again.

github-actions · 2026-04-14T00:26:08Z

Realm Server Test Results

1 files ±0 1 suites ±0 16m 0s ⏱️ + 2m 33s
844 tests ±0 844 ✅ ±0 0 💤 ±0 0 ❌ ±0
915 runs ±0 915 ✅ ±0 0 💤 ±0 0 ❌ ±0

Results for commit 2f7f67f. ± Comparison against base commit 9131738.

♻️ This comment has been updated with latest results.

github-actions · 2026-04-14T00:34:07Z

Host Test Results

2 194 tests ±0 2 179 ✅ ±0 2h 31m 38s ⏱️ + 2m 24s
1 suites ±0 15 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit 2f7f67f. ± Comparison against base commit 9131738.

♻️ This comment has been updated with latest results.

Copilot

Pull request overview

This PR reduces memory growth in the host test suite by fixing several leak sources and adding per-test memory instrumentation, plus adding dev-time heap snapshot tooling and documentation to make future leak regressions easier to diagnose.

Changes:

Add QUnit.testDone cleanup/instrumentation: sweep destroyed ApplicationInstances, force GC, and periodically log MEMPROBE heap + instance stats.
Fix additional per-test retention sources (mock-matrix testState cleanup, globalThis closures cleaned up with destructors, and event listener removal mismatch).
Add dev-time heap snapshot capture/analysis scripts and documentation (plus a Claude skill describing the workflow).

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
packages/host/tests/helpers/setup-qunit.js	Adds per-test GC + `MEMPROBE` logging and sweeps destroyed `ApplicationInstance`s from `App._applicationInstances`.
packages/host/tests/helpers/mock-matrix.ts	Clears module-level `testState` after each test to allow GC of owner/sdk graphs.
packages/host/testem.js	Enables `--expose-gc` and precise memory info for Chrome test runs.
packages/host/scripts/heap-snapshot-runner.js	New CDP-based snapshot runner that captures `.heapsnapshot` files at chosen probe indices.
packages/host/scripts/snapshot-diff.js	New tool to diff two heap snapshots by constructor counts and retained-size deltas.
packages/host/scripts/snapshot-by-class.js	New tool to report constructor count deltas with truncated names for readability.
packages/host/scripts/snapshot-retainers.js	New tool to find retainer paths from targets back to GC roots.
packages/host/scripts/HEAP_PROBE.md	Usage/reference docs for snapshot tooling.
packages/host/app/utils/register-boxel-transition.ts	Registers global transition helper with destructor-based cleanup to avoid global retention.
packages/host/app/routes/standby.ts	Updates callsite to pass an owner for destructor registration.
packages/host/app/routes/render/file-extract.ts	Clears `__boxelRenderContext` on owner destroy (tests) to prevent per-test retention.
packages/host/app/routes/render.ts	Adds destructor-based cleanup for globals + transition helper to avoid globalThis pinning in tests.
packages/host/app/routes/module.ts	Adds owner into model context and clears `__boxelRenderContext` on destroy (tests).
packages/host/app/routes/command-runner.ts	Clears `__boxelRenderContext` on destroy and updates transition helper registration signature.
packages/host/app/components/card-prerender.gts	Passes `owner` through module model context so global cleanup can be tied to a destroyable.
packages/boxel-ui/addon/src/components/resizable-panel-group/utils/panel-resize-handle-registry.ts	Fixes `removeEventListener` capture mismatch so listeners are actually removed.
packages/boxel-ui/addon/src/components/resizable-panel-group/handle.gts	Ensures resize-handle registration is unregistered when the modifier is destroyed.
.claude/skills/host-test-memory-leak-hunting/known-leaks.md	Adds catalog of known leak patterns and fixes.
.claude/skills/host-test-memory-leak-hunting/SKILL.md	Adds documented workflow for leak hunting and validating fixes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

packages/host/scripts/heap-snapshot-runner.js

packages/host/scripts/snapshot-retainers.js

packages/host/scripts/heap-snapshot-runner.js

- Pre-filter raw message strings before JSON.parse in cdpCall and the console listener so HeapProfiler chunk notifications aren't parsed redundantly - Replace queue.shift() with head-index pointer in BFS retainer walk to avoid O(n²) on large snapshots

lukemelia and others added 10 commits April 13, 2026 19:47

Fix prettier formatting in memory-leak fixes

161f388

Collapses two long conditional/function-signature lines to single-line form per the project's prettier config. No behavioral change.

lukemelia added 2 commits April 13, 2026 20:01

Revert temporary 30-shard bump now that leak is fixed

ac02861

The leak sweep in setup-qunit.js brings growth down from ~7.5MB/test to ~0.125MB/test, so 20 shards is sufficient again.

Fix prettier formatting in snapshot-retainers.js

8c2f1e4

lukemelia requested a review from Copilot April 14, 2026 00:45

Copilot started reviewing on behalf of lukemelia April 14, 2026 00:45 View session

Copilot AI reviewed Apr 14, 2026

View reviewed changes

packages/host/scripts/heap-snapshot-runner.js Show resolved Hide resolved

packages/host/scripts/snapshot-retainers.js Outdated Show resolved Hide resolved

packages/host/scripts/heap-snapshot-runner.js Show resolved Hide resolved

lukemelia requested review from a team, backspace and habdelra April 14, 2026 01:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Host test memory leak fixes + leak-hunting skill#4397

Host test memory leak fixes + leak-hunting skill#4397
lukemelia wants to merge 13 commits intomainfrom
cs-host-test-memory-leak-fixes

lukemelia commented Apr 13, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 13, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 14, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 14, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lukemelia commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

What's in the PR

Open follow-up

Test plan

Uh oh!

github-actions bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Preview deployments

Uh oh!

github-actions bot commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Realm Server Test Results

Uh oh!

github-actions bot commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Host Test Results

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lukemelia commented Apr 13, 2026 •

edited

Loading

github-actions bot commented Apr 13, 2026 •

edited

Loading

github-actions bot commented Apr 14, 2026 •

edited

Loading

github-actions bot commented Apr 14, 2026 •

edited

Loading