Host test memory leak fixes + leak-hunting skill#4397
Open
Host test memory leak fixes + leak-hunting skill#4397
Conversation
The `manageHandleRegistration` modifier on `ResizablePanelGroup`'s Handle component called `registerResizeHandle()` (which returns an unregister function) but never returned a cleanup from the modifier. When a Handle was destroyed, its `setResizeHandlerState` closure stayed in the module-scope `registeredResizeHandlers` Set, and the global pointerdown/pointermove/pointerup/dblclick listeners on document.body stayed live. Because `setResizeHandlerState` captures `this` (the Handle component) and transitively the Ember owner, every mount/unmount of a ResizablePanelGroup accumulated a pinned owner graph. In host tests this showed up as every service (LoaderService, StoreService, MatrixService, etc.) growing +1 per test; in production it would leak listeners and component references any time a user mounted and unmounted resizable panels. Fix: `registerHandle()` now returns the unregister function from `registerResizeHandle()`, and the modifier returns a cleanup that calls it on destroy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…owners
Three related fixes for owner-leak retainers found via heap snapshot
analysis. Each closes a path where ApplicationInstance (and its services)
remained reachable across test boundaries.
1. boxelTransitionTo helper closure (registerBoxelTransitionTo,
render.ts #setupTransitionHelper):
The helper closures captured `router` (and in render.ts `this`),
were stashed on globalThis, and were never cleared on owner destroy.
Each callsite now passes a destroyable owner; a registerDestructor
removes the global iff it still points at the function we installed.
2. render-route globals (__boxelRenderContext, __renderModel,
__docsInFlight, __waitForRenderLoadStability):
Cleared in deactivate() but not on owner destroy. In tests the owner
is destroyed without a normal route teardown, leaving these closures
pinning `this` on globalThis. Added registerDestructor in beforeModel
for RenderRoute, ModuleRoute, CommandRunnerRoute, and
RenderFileExtractRoute to clear them on owner destroy.
3. panel-resize-handle-registry capture-flag mismatch:
pointerdown was added with `{ capture: true }` but removed without
the capture flag, so removeEventListener silently no-op'd. The
listener stayed on document.body, retaining its module-level closure
scope (which contained the registered handles' modifier functions
and through them the entire ApplicationInstance).
Collapses two long conditional/function-signature lines to single-line form per the project's prettier config. No behavioral change.
The module-level `testState` object inside `setupMockMatrix` held a
reference to the LAST test's `owner` and `sdk` until the next test's
`beforeEach` overwrote it. Between tests (and after the final test in
each module), that stale reference kept the entire owner subgraph
alive: services, Realm, TestRealmAdapter, queue handlers, bound Worker
methods, MockSDK event listeners, etc.
Heap-snapshot retainer chain confirmed the leak:
ApplicationInstance
← testState object (property "owner")
← MockUtils (property "testState")
← TestRealmAdapter (private #mockMatrixUtils)
← Realm (private #adapter)
Clearing `testState.owner`/`sdk`/`opts` in an afterEach (after
`settled()` so any in-flight async completes first) lets the previous
test's owner be garbage-collected before the next test starts.
Temporary instrumentation: after each test, force GC (if --expose-gc exposes it) and emit "PROBE t=N used=XMB total=YMB name=..." so CI logs show per-test heap deltas. Chrome args now include --expose-gc and --enable-precise-memory-info so the numbers are usable. To be reverted once the residual per-test leak is eliminated.
Per-test gc() calls (via --expose-gc) are what keep shard 3 under the 4GB old-space ceiling — V8's automatic GC couldn't reclaim fast enough and the heap drifted into OOM around test 139. The heap still grows ~20MB/test (155MB → 3.5GB across shard 3), so there is an underlying ~1-owner-subgraph-per-test leak that remains to be eliminated, but the forced GC keeps us below the ceiling. Cutting the log output to every 10 tests (MEMPROBE lines) — the per-test noise was pushing log size past useful.
- mock-matrix.ts: blank line between import groups (import/order) - setup-qunit.js: multi-line console.log (prettier) and drop an unused eslint-disable-next-line directive that was no longer needed after the log was already wrapped
Add a WeakRef to each test's this.owner in testDone (after the test body has run), then after gc() count how many prior owners are still dereferenceable. If live_owners grows linearly with test count, the ApplicationInstance is still leaking; if it stays near 0 (or bounded by gc cadence), the leak lives below the owner — e.g., in module-level caches that survive owner destruction.
Heap snapshot analysis showed `App._applicationInstances` accumulating ~1 prior-test ApplicationInstance per host test (11→51 over 40 tests). Each pinned instance retained its Registry → template factories → FieldComponent closures → Box trees → ~7.5MB of base-realm module sources, driving the host test suite into V8 OOM in long shards. Root cause: `ApplicationInstance.willDestroy()` calls `super.willDestroy()` THEN `_unwatchInstance(this)`. When something in the super chain throws, `_unwatchInstance` is skipped — but `@glimmer/destroyable` still marks the instance as `isDestroyed`. The Set keeps a strong reference to a destroyed corpse forever. Fix: in `QUnit.testDone`, after gc(), iterate `_applicationInstances` and delete any entry where `isDestroyed(inst)` is true. Provably safe — only sweeps instances already in the DESTROYED state per @glimmer/destroyable. Validated locally on `card-basics` (90 tests): - Heap growth t=10→t=90: 158→775 MB → 116→177 MB - Per-test slope (steady state): ~7.5 MB → ~0.125 MB - `card-api` source string copies: +31/40 tests → +1/40 tests - `app_instances` Set size: climbs to 50+ → stays at 0 Replaces the WeakRef-on-context.owner instrumentation (which always read 0 because `getContext()` returns undefined at testDone after `unsetContext()` runs in teardownContext) with a direct readout of `getApplication()._applicationInstances.size` plus per-state counts. With the leak gone, revert the temporary 30-shard CI bump back to 20.
Documents the local probe loop used to find the leaks in this branch and includes the snapshot-analysis scripts so the next person can repeat the investigation without re-deriving the workflow from scratch. - `.claude/skills/host-test-memory-leak-hunting/SKILL.md` — when to use, the probe loop step-by-step, pitfalls, and how to validate a fix. - `.claude/skills/host-test-memory-leak-hunting/known-leaks.md` — catalog of leaks found so far (App._applicationInstances Set, mock-matrix testState, globalThis closures + listener capture-flag mismatch, resize handle modifier) plus patterns to look for when adding new code. - `packages/host/scripts/heap-snapshot-runner.js` — CDP client that watches Chrome console for `MEMPROBE t=N` and writes `/tmp/snap-tN.heapsnapshot` (streams chunks to disk; joining blows V8's max string length past ~300MB). - `packages/host/scripts/snapshot-diff.js` — constructor-name count deltas between two snapshots, sorted by retained-size delta. - `packages/host/scripts/snapshot-by-class.js` — same idea, name-truncated for at-a-glance scanning, sorted by count delta. - `packages/host/scripts/snapshot-retainers.js` — backward BFS from matching nodes to GC roots; `--strong` flag (almost always required) skips weak/shortcut edges and WeakMap "part of key" internal edges that would otherwise mask the true retainer chain. - `packages/host/scripts/HEAP_PROBE.md` — invocation reference for the scripts, linked from the skill. Dev-time tooling only — not invoked from CI or production builds.
Preview deployments |
The leak sweep in setup-qunit.js brings growth down from ~7.5MB/test to ~0.125MB/test, so 20 shards is sufficient again.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR reduces memory growth in the host test suite by fixing several leak sources and adding per-test memory instrumentation, plus adding dev-time heap snapshot tooling and documentation to make future leak regressions easier to diagnose.
Changes:
- Add
QUnit.testDonecleanup/instrumentation: sweep destroyedApplicationInstances, force GC, and periodically logMEMPROBEheap + instance stats. - Fix additional per-test retention sources (mock-matrix
testStatecleanup, globalThis closures cleaned up with destructors, and event listener removal mismatch). - Add dev-time heap snapshot capture/analysis scripts and documentation (plus a Claude skill describing the workflow).
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/host/tests/helpers/setup-qunit.js | Adds per-test GC + MEMPROBE logging and sweeps destroyed ApplicationInstances from App._applicationInstances. |
| packages/host/tests/helpers/mock-matrix.ts | Clears module-level testState after each test to allow GC of owner/sdk graphs. |
| packages/host/testem.js | Enables --expose-gc and precise memory info for Chrome test runs. |
| packages/host/scripts/heap-snapshot-runner.js | New CDP-based snapshot runner that captures .heapsnapshot files at chosen probe indices. |
| packages/host/scripts/snapshot-diff.js | New tool to diff two heap snapshots by constructor counts and retained-size deltas. |
| packages/host/scripts/snapshot-by-class.js | New tool to report constructor count deltas with truncated names for readability. |
| packages/host/scripts/snapshot-retainers.js | New tool to find retainer paths from targets back to GC roots. |
| packages/host/scripts/HEAP_PROBE.md | Usage/reference docs for snapshot tooling. |
| packages/host/app/utils/register-boxel-transition.ts | Registers global transition helper with destructor-based cleanup to avoid global retention. |
| packages/host/app/routes/standby.ts | Updates callsite to pass an owner for destructor registration. |
| packages/host/app/routes/render/file-extract.ts | Clears __boxelRenderContext on owner destroy (tests) to prevent per-test retention. |
| packages/host/app/routes/render.ts | Adds destructor-based cleanup for globals + transition helper to avoid globalThis pinning in tests. |
| packages/host/app/routes/module.ts | Adds owner into model context and clears __boxelRenderContext on destroy (tests). |
| packages/host/app/routes/command-runner.ts | Clears __boxelRenderContext on destroy and updates transition helper registration signature. |
| packages/host/app/components/card-prerender.gts | Passes owner through module model context so global cleanup can be tied to a destroyable. |
| packages/boxel-ui/addon/src/components/resizable-panel-group/utils/panel-resize-handle-registry.ts | Fixes removeEventListener capture mismatch so listeners are actually removed. |
| packages/boxel-ui/addon/src/components/resizable-panel-group/handle.gts | Ensures resize-handle registration is unregistered when the modifier is destroyed. |
| .claude/skills/host-test-memory-leak-hunting/known-leaks.md | Adds catalog of known leak patterns and fixes. |
| .claude/skills/host-test-memory-leak-hunting/SKILL.md | Adds documented workflow for leak hunting and validating fixes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Pre-filter raw message strings before JSON.parse in cdpCall and the console listener so HeapProfiler chunk notifications aren't parsed redundantly - Replace queue.shift() with head-index pointer in BFS retainer walk to avoid O(n²) on large snapshots
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes a stack of memory leaks in the host test suite that were driving CI shards toward V8 OOM in long runs, plus the per-test instrumentation that surfaces future regressions and a Claude Code Skill that documents the hunting workflow.
The dominant leak (~7.5 MB / test) was
App._applicationInstancesretaining destroyed-but-not-unwatched ApplicationInstances. The fix sweeps them out inQUnit.testDone. Smaller fixes address listener-removal mismatches, globalThis closures that pinned the test owner, and per-test state held bysetupMockMatrix.Validation
Local
card-basicsprobe (90 tests):App._applicationInstancessizeWith the leak gone, this PR also reverts the temporary 30-shard CI bump back to 20.
What's in the PR
The fixes (9 commits):
ApplicationInstances fromApp._applicationInstancesMEMPROBEinstrumentation insetup-qunit.js(per-test gc + every-10 log line)setupMockMatrixtestState cleanupglobalThisclosure + listenercaptureflag fixesThe tooling (1 commit):
.claude/skills/host-test-memory-leak-hunting/— Skill (workflow) + known-leaks catalogpackages/host/scripts/{heap-snapshot-runner,snapshot-diff,snapshot-by-class,snapshot-retainers}.js— CDP-driven snapshot capture and analysispackages/host/scripts/HEAP_PROBE.md— invocation referenceThe scripts are dev-time only — never invoked from CI or production builds.
Open follow-up
super.willDestroy()inApplicationInstancethrows somewhere in the chain — that's why_unwatchInstanceis skipped and the workaround sweep is needed. The sweep is provably safe (it only removes instances@glimmer/destroyablehas already marked DESTROYED), but the underlying error is worth tracking down so we can file an upstream Ember issue.Test plan
MEMPROBElog lines in CI output showapp_instances=0andused=flat across the runtests/index.html?filter=card-basicslocally, confirm no test failures