fix(skstore): purge DeletedDir tombstones at end of update transaction by zaccharyfavere · Pull Request #1245 · SkipLabs/skip

zaccharyfavere · 2026-05-27T13:52:31Z

DeletedDir entries were created in Dirs.state by removeDir() to mark directories as deleted, but were never cleaned up. They accumulated indefinitely, causing a persistent memory leak proportional to the number of subdirectories created and destroyed over time.

The fix adds a purgeDeletedDirs() method on the Dirs class and calls it at the end of updateWithStatus(), right after the cascade finishes and fromContext is reset. At that point the tombstones have served their purpose and can safely be removed.

This is the natural cleanup point: it runs after each merge of changes into the context state, matching the lifecycle of the tombstones themselves.

cc @skiplabsdaniel

CLAassistant · 2026-05-27T13:52:49Z

All committers have signed the CLA.

mbouaziz · 2026-05-27T14:05:44Z

Hi, thanks for your contribution!
Sorry, CI failures might be because I switched jobs to use gen 2 circle ci instances but they may be counting memory differently

gen2 CircleCI machines enforce virtual memory limits at the cgroup level. The Skip runtime's default SKIP_CAPACITY of 16 GB (palloc.c) fails to mmap on any class with <16 GB RAM, producing "ERROR (MAP FAILED): Cannot allocate memory" early in the job. Observed on skipruntime/large.gen2 in PR #1245 and on an earlier skdb-wasm/large.gen2 run that died at 61 s. gen1 tolerated the same 16 GB virtual mmap because only RSS was enforced and the mmap is lazy. Set SKIP_CAPACITY explicitly per gen2 job, leaving ~25% RAM headroom for OS and tooling: check-ts medium.gen2 (4 GB) -> 3G skdb large.gen2 (8 GB) -> 6G skdb-wasm large.gen2 (8 GB) -> 6G skipruntime large.gen2 (8 GB)* -> 6G compiler xlarge.gen2 (16 GB) -> 12G * postgres + kafka sidecars get their own RAM allocation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## Summary Gen2 CircleCI machines enforce virtual memory limits at the cgroup level. The Skip runtime's default `SKIP_CAPACITY` of 16 GB (`skiplang/prelude/runtime/palloc.c:449-451`) fails to `mmap` on any class with <16 GB RAM, producing `ERROR (MAP FAILED): Cannot allocate memory` early in the job. Observed on `skipruntime` / `large.gen2` in PR #1245 (job 16682) and on an earlier `skdb-wasm` / `large.gen2` run that died at 61 s (workflow `f78dd673`, 2026-05-27 09:37). Gen1 tolerated the same 16 GB virtual mmap because only RSS was enforced and the mmap is lazy — gen2 evidently does not. Set `SKIP_CAPACITY` explicitly per gen2 job, leaving ~25% RAM headroom for OS + tooling: | Job | Class | RAM | `SKIP_CAPACITY` | |---|---|---|---| | check-ts | medium.gen2 | 4 GB | `3G` | | skdb | large.gen2 | 8 GB | `6G` | | skdb-wasm | large.gen2 | 8 GB | `6G` | | skipruntime | large.gen2 (+ pg/kafka sidecars) | 8 GB primary | `6G` | | compiler | xlarge.gen2 | 16 GB | `12G` | Why also setting the ones that haven't (yet) failed: same `mmap` happens on every Skip-toolchain invocation, so any gen2 job using `skiplabs/skip*` images is one coin flip away from the same failure. `skipruntime` on `large.gen2` succeeded 3× in a row on the Phase 1 PR before failing on #1245 — the variance is real. `check-examples` is on `large.gen2` but uses `cimg/base` and only runs the Skip toolchain inside docker-compose containers (separate cgroups), so it's not affected. ## Test plan - [ ] Land this, then trigger a PR that exercises each gen2 workflow (touch `skiplang/prelude/src/foo` for compiler + skdb + skdb-wasm + skipruntime, and a ts workspace file for check-ts). - [ ] Confirm `skipruntime` no longer dies with `MAP FAILED` at `skargo build`. - [ ] Confirm `compiler` still passes — 12 G cap is well above measured peak (~10 GB). - [ ] Watch `skdb` and `skdb-wasm` over the next few PRs for any new failures specifically at allocation time; if they appear, drop the cap further. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

zaccharyfavere · 2026-05-27T14:25:58Z

Hi, alright thanks for your return !

mbouaziz · 2026-05-27T16:34:05Z

Sorry, still fixing the CI...
Very bad timing for a broken CI, we haven't had any external contribution for months 😆

zaccharyfavere · 2026-05-27T16:51:24Z

No worries, I just re pushed because I realised there was a format error still in my code. That should be fixed now since it made one of the CI pass

mbouaziz · 2026-05-28T08:39:39Z

I think the CI is back to normal, I've rebased your commit to check

mbouaziz · 2026-05-28T08:47:07Z

getDirsChangesAfter(tick) relies on DeletedDir tombstones to report deletions to downstream consumers after updateWithStatus returns. Purging them here means subsequent calls (e.g. importNext/import in the synchronizer's exportToRoot path, notifyAll, CHANGED_DIRS computation in resolveContext) won't see the deletion.

Concretely, a DROP TABLE/DELETE followed by update() would purge the tombstone before exportToRoot -> importNext scans for DeletedDir, so the drop never propagates to root.

I understand the memory leak but a safe purge needs to be bounded by the slowest consumer of the change history, e.g. track the min tick across active synchronizers/subscribers and only drop tombstones below it, or do the compaction at a synchronization boundary that's provably past the last reader.

Previously, DeletedDir tombstones accumulated indefinitely in the context's directory map, causing memory growth proportional to the number of removed directories. Then, I naively tried purging at the end of updateWithStatus, and it broke Synchronizer-based propagation: exportToRoot/importNext rely on DeletedDir tombstones in the source context after updateWithStatus returns to propagate deletions to root. This commit purges at the start of the two entry points that obtain a mutable context for an update cycle (runWithGc and runWithResult), just after mclone. At that point, the tombstones present in the context are leftovers from the previous tick, and all downstream consumers from that previous tick have already had the chance to observe them. A dedicated set of marked-deleted dir names is maintained to avoid scanning the full directory map at each update.

mbouaziz mentioned this pull request May 27, 2026

ci: set SKIP_CAPACITY on gen2 jobs to fit class RAM #1246

Merged

4 tasks

zaccharyfavere force-pushed the fix-deleteddir-leak branch from eb65f41 to a04d30d Compare May 27, 2026 16:28

mbouaziz force-pushed the fix-deleteddir-leak branch from a04d30d to 1b2a3bb Compare May 28, 2026 08:39

mbouaziz requested a review from skiplabsdaniel May 28, 2026 14:21

zaccharyfavere force-pushed the fix-deleteddir-leak branch from 1b2a3bb to 7671ddd Compare June 2, 2026 13:49

zaccharyfavere force-pushed the fix-deleteddir-leak branch from 7671ddd to a8191ad Compare June 2, 2026 15:07

Merge branch 'main' into fix-deleteddir-leak

2064dbf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(skstore): purge DeletedDir tombstones at end of update transaction#1245

fix(skstore): purge DeletedDir tombstones at end of update transaction#1245
zaccharyfavere wants to merge 2 commits into
SkipLabs:mainfrom
zaccharyfavere:fix-deleteddir-leak

zaccharyfavere commented May 27, 2026

Uh oh!

CLAassistant commented May 27, 2026 •

edited

Loading

Uh oh!

mbouaziz commented May 27, 2026

Uh oh!

zaccharyfavere commented May 27, 2026

Uh oh!

mbouaziz commented May 27, 2026

Uh oh!

zaccharyfavere commented May 27, 2026

Uh oh!

mbouaziz commented May 28, 2026

Uh oh!

mbouaziz commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

zaccharyfavere commented May 27, 2026

Uh oh!

CLAassistant commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mbouaziz commented May 27, 2026

Uh oh!

zaccharyfavere commented May 27, 2026

Uh oh!

mbouaziz commented May 27, 2026

Uh oh!

zaccharyfavere commented May 27, 2026

Uh oh!

mbouaziz commented May 28, 2026

Uh oh!

mbouaziz commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

CLAassistant commented May 27, 2026 •

edited

Loading