Skip to content

fix(datastore): reclaim catalog space via manual rebuild instead of VACUUM (swamp-club#494)#1480

Merged
stack72 merged 1 commit into
mainfrom
fix/issue-494-vacuum-rebuild
May 31, 2026
Merged

fix(datastore): reclaim catalog space via manual rebuild instead of VACUUM (swamp-club#494)#1480
stack72 merged 1 commit into
mainfrom
fix/issue-494-vacuum-rebuild

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented May 31, 2026

Summary

swamp datastore compact always failed its VACUUM step with VACUUM skipped: too many attached databases - max 0 (ERR_SQLITE_ERROR), so it never reclaimed fragmented pages — only the WAL checkpoint freed any space.

Root cause: SQLite's native VACUUM (and VACUUM INTO) cannot run under Deno's node:sqlite binding. Both ATTACH a temporary database internally, but the binding hardcodes SQLITE_LIMIT_ATTACHED to 0, so the attach is rejected. Confirmed reproducible under both deno run and the compiled binary — VACUUM has never functioned in this runtime (PR #1340 only made the failure skip gracefully). Reproduced end-to-end on a fresh repo with no datastore configured, confirming the fault is purely in the local catalog.

Fix: Replace the VACUUM call in CatalogStore.vacuum() with a manual rebuild that never uses ATTACH:

  • Copy all DDL from sqlite_master (tables before indexes/triggers/views) into a fresh single-file database via a second connection.
  • Stream every row of every user table by rowid in ITERATE_PAGE_SIZE (1000) batches inside one transaction, so a large (giga-swamp) catalog is never materialized in memory.
  • Atomically swap the rebuilt file into place (POSIX rename-replace, with a remove-then-rename fallback for Windows), dropping the old WAL/SHM sidecars, then reopen and restore WAL mode.
  • close() is now idempotent so the CLI's finally { close() } cannot mask a reopen failure (node:sqlite throws on double-close).

catalog_meta (schema_version, populated) is copied by the generic table pass, so reopening does not trigger a destructive re-migration or needless backfill. The vacuum() (): boolean contract and the DatastoreCompactDeps boundary are unchanged, so the libswamp compact operation and CLI command are untouched. The renderer's skip wording is softened (JSON shape unchanged).

Test Plan

  • New unit tests in catalog_store_test.ts:
    • reclaims space and preserves rows (3000+ rows, deletes half, crosses the page boundary, asserts file size strictly decreases and a reopened store sees identical data)
    • vacuum on an empty catalog returns true and keeps the schema
    • store is usable after vacuum, and a second close() is a no-op
  • deno check, deno lint, deno fmt --check all clean.
  • Full suite: 6524 passed, 0 failed.
  • Compiled-binary end-to-end against a scratch repo: the VACUUM skipped warning is gone and swamp datastore compact --json reports "vacuumSkipped": false (previously true).

Fixes swamp-club#494.

🤖 Generated with Claude Code

…ACUUM (swamp-club#494)

SQLite's native VACUUM (and VACUUM INTO) cannot run under Deno's node:sqlite
binding: both ATTACH a temporary database internally, but the binding hardcodes
SQLITE_LIMIT_ATTACHED to 0, so the attach is rejected with "too many attached
databases - max 0". As a result `swamp datastore compact` always skipped VACUUM
and never reclaimed fragmented pages — only the WAL checkpoint freed space.

Replace the VACUUM call with a manual rebuild that never uses ATTACH: copy the
schema and stream every row into a fresh single-file database via a second
connection, then atomically swap it into place and reopen. Rows are paged by
rowid in ITERATE_PAGE_SIZE batches so a large catalog is never loaded into
memory. close() is now idempotent so the CLI's finally{close()} cannot mask a
reopen failure.

Co-authored-by: Sean Escriva <webframp@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI UX Review

Blocking

None.

Suggestions

  1. datastore_compact.ts:37 — progress message still says "Vacuuming"
    The vacuuming event still logs Vacuuming catalog database (this may take a moment).... The underlying operation is now a manual row-by-row rebuild, not a SQLite VACUUM. Updating to something like Rebuilding catalog database (this may take a moment)... would keep terminology consistent with the updated skip message (Catalog rebuild skipped). Low priority since users just want compaction to work, but the mismatch between the progress line and the skip/success lines is mildly confusing.

  2. vacuumSkipped JSON field name — now that the operation is a rebuild, the field name is slightly misleading for script consumers. Renaming to rebuildSkipped would be cleaner, but that is a breaking change to the JSON contract and not worth the churn here. Worth noting in a follow-up.

Verdict

PASS — the one changed user-visible line is an improvement (removes the technical VACUUM skipped (runtime limitation) jargon in favour of plain Catalog rebuild skipped). JSON shape is unchanged. No new flags, help text, or error-message regressions.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Clean, well-tested fix. The manual-rebuild approach correctly works around the SQLITE_LIMIT_ATTACHED=0 limitation. Changes are properly contained to the infrastructure persistence layer — the domain/application boundary (DatastoreCompactDeps, vacuum(): boolean contract) is unchanged, which is good DDD practice.

Blocking Issues

None.

Suggestions

  1. closed flag not set during vacuum() swap (catalog_store.ts:626): vacuum() calls this.db.close() without setting this.closed = true. If the subsequent rename or reopen throws, the store has a closed db handle but closed === false, so a caller's finally { store.close() } would attempt a double-close and crash — defeating the idempotent-close guarantee this PR introduces. Consider setting this.closed = true before this.db.close() at line 626, then resetting this.closed = false after successful reopen at line 645.

  2. SQL identifier quoting in copyTableRows (catalog_store.ts:698-708): Table and column names are interpolated with double-quote wrapping ("${table}", "${c}") but embedded double-quotes are not escaped (the standard SQL escape is ""). The names all come from sqlite_master/PRAGMA table_info on the application's own schema, so this is safe for the known catalog tables — but if a future migration ever introduced a name containing ", the rebuild would silently produce broken SQL. Low risk, minor hardening.

  3. Verbose JSDoc blocks: Several new multi-line comment blocks were added (e.g., vacuum(), openConnection(), rebuildInto(), copyTableRows()). CLAUDE.md says "one short line max" for comments. The why here is genuinely non-obvious (VACUUM limitation, rebuild strategy), which justifies commenting, but the blocks could be condensed. Existing code in this file uses similar JSDoc style so this is consistent with local convention.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

None found that block this merge.

The finding below is notable but ultimately MEDIUM because the preconditions (filesystem failures during rename/reopen) are unlikely on any platform where the compact command can successfully write the rebuilt temp file.

Medium

  1. vacuum() bypasses the closed flag, allowing double-close to mask errorscatalog_store.ts:626

    The PR adds closed specifically so finally { close() } cannot mask a reopen failure. But vacuum() calls this.db.close() directly on line 626, never setting this.closed = true. If any operation between line 626 and the successful reopen (line 644) throws — e.g. Deno.renameSync fails on both the POSIX and Windows paths, or openConnection/PRAGMA journal_mode=WAL fails — the exception propagates to the caller. The CLI's finally { catalogStore.close() } (datastore_compact.ts:92-94) then enters close(), sees this.closed === false, and calls this.db.close() on the already-closed handle. node:sqlite throws "database is not open", which replaces the original exception.

    Breaking scenario: Disk full after rebuildInto succeeds — Deno.renameSync fails. The user sees "database is not open" instead of the real ENOSPC error.

    Suggested fix: Set this.closed = true immediately after this.db.close() on line 626, and set this.closed = false after the successful reopen on line 644. This way the finally { close() } is a safe no-op if the swap/reopen fails.

    Classified as MEDIUM rather than HIGH because: (a) the precondition (rename or open failing after a successful rebuild-and-write of the same filesystem) is unlikely, and (b) the user still sees a crash either way — only the error message is worse.

  2. removeDbFiles swallows all errors, not just "file not found"catalog_store.ts:729-738

    The catch block ignores every exception from Deno.removeSync, including PermissionDenied or Busy. If sidecars can't be removed (e.g. antivirus holding a lock on Windows), the stale -wal or -shm could shadow the rebuilt database after the rename, causing WAL corruption or stale reads on the next open. A targeted catch for Deno.errors.NotFound would be safer.

Low

  1. Number(row.__rowid) truncates bigint rowids above 2^53catalog_store.ts:719

    SQLite rowids are 64-bit. Number() loses precision above Number.MAX_SAFE_INTEGER. Two distinct high rowids could map to the same Number, making lastRowid never advance — an infinite loop. In practice, no catalog will have > 9 quadrillion rows, so this is theoretical only.

  2. Identifier quoting in copyTableRows doesn't escape embedded double-quotescatalog_store.ts:700,706

    Column and table names are interpolated with "${name}" quoting. A name containing " would break the SQL (SQLite requires "" escaping). All current names are simple ASCII identifiers from createSchema(), so this is safe today but would silently break if a future schema introduces quoted identifiers.

Verdict

PASS — The core rebuild logic is well-structured: schema-then-data ordering, transaction-wrapped row copy, paged cursor to bound memory, cross-platform rename fallback, and idempotent close for the happy path. The medium findings are edge cases in error paths that are unlikely to trigger in practice. The test coverage for the happy path (including multi-page batches and empty catalogs) is solid.

@stack72 stack72 merged commit 47fe96f into main May 31, 2026
11 checks passed
@stack72 stack72 deleted the fix/issue-494-vacuum-rebuild branch May 31, 2026 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant