
feat(gate): DescriptionLengthGate — MDL promotion gate (sibling to HeldOutGate)#230

Merged: drewstone merged 1 commit into main from feat/description-length-gate on Jun 7, 2026


@drewstone


What

Adds DescriptionLengthGate, a Minimum-Description-Length (MDL) promotion gate and sibling to HeldOutGate, in src/description-length-gate.ts. It implements the Builder/Breaker acceptance rule from Wang & Buehler, Self-Revising Discovery Systems for Science (arXiv:2606.01444, MIT LAMM):

L(M, D) = λ·L_model(M) + L_data(D | M)
accept M' over M  iff  L(M', D) < L(M, D)
  • L_model = gzip-bit size of the candidate's model text (a deterministic Kolmogorov-complexity proxy).
  • L_data = residual surprise −Σ log2(score) over the shared tasks, with each score clamped from below at a score floor (a failed task costs ~10 bits, never ∞).
  • Both candidate and baseline are scored on the same enlarged evidence (every shared task, not just the held-out split).
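
The two terms above can be sketched in a few lines. This is a minimal illustration, not the code in src/description-length-gate.ts: the function names, the default lambda of 1, and the floor value 2^-10 (chosen only because it makes a zero score cost exactly 10 bits, matching the "~10 bits" figure) are all assumptions.

```typescript
import { gzipSync } from "node:zlib";

// All names here are hypothetical; the real gate's API may differ.
// Assumed floor: -log2(2^-10) = 10 bits for a failed task.
const SCORE_FLOOR = 2 ** -10;

/** L_model: size of the gzip-compressed model text, in bits. */
function modelBits(modelText: string): number {
  return gzipSync(Buffer.from(modelText, "utf8")).length * 8;
}

/** L_data: residual surprise, -sum(log2(score)), with scores clamped to [floor, 1]. */
function dataBits(scores: readonly number[], floor: number = SCORE_FLOOR): number {
  let bits = 0;
  for (const score of scores) {
    const clamped = Math.min(1, Math.max(floor, score));
    bits -= Math.log2(clamped);
  }
  return bits;
}

/** L(M, D) = lambda * L_model(M) + L_data(D | M). */
function descriptionLength(modelText: string, scores: readonly number[], lambda = 1): number {
  return lambda * modelBits(modelText) + dataBits(scores);
}

/** Accept the candidate iff its total description length is strictly lower. */
function acceptsCandidate(
  baseline: { modelText: string; scores: readonly number[] },
  candidate: { modelText: string; scores: readonly number[] },
  lambda = 1,
): boolean {
  return (
    descriptionLength(candidate.modelText, candidate.scores, lambda) <
    descriptionLength(baseline.modelText, baseline.scores, lambda)
  );
}
```

Both score arrays are assumed to cover the same shared tasks in the same order, so the comparison is made on identical evidence.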

Why merge it

HeldOutGate answers "does the candidate generalize on held-out items?" via a paired-delta CI plus an overfit-gap heuristic. DescriptionLengthGate answers a different, complementary question: "does the candidate explain the evidence more cheaply?" It charges the candidate for its own size, so a model that scores better only by bloating (memorizing counterexamples) grows L_model faster than it shrinks L_data and is rejected with rejectionCode: 'model_bloat'. That is the principled, information-theoretic form of the overfit penalty that HeldOutGate approximates, and it applies whenever the model text whose size you want to penalize is available (a prompt, skill, profile, or symbolic model). It needs no held-out split, which makes it usable in self-improvement loops where every task is accumulated evidence.
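
To make the bloat case concrete, here is a rough sketch of the decision on precomputed bit counts. Only the 'model_bloat' code comes from this PR; the `Cost` and `Decision` shapes, the `decide` function, the 'no_improvement' code, and the rule used to pick between the two codes are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real gate's result type may look different.
type RejectionCode = "model_bloat" | "no_improvement"; // 'no_improvement' is invented here

interface Cost {
  modelBits: number; // L_model, e.g. gzip size of the model text in bits
  dataBits: number; // L_data, residual surprise over the shared tasks
}

interface Decision {
  promote: boolean;
  deltaBits: number; // L(candidate) - L(baseline); negative means cheaper
  rejectionCode?: RejectionCode;
}

function decide(baseline: Cost, candidate: Cost, lambda = 1): Decision {
  const total = (c: Cost): number => lambda * c.modelBits + c.dataBits;
  const deltaBits = total(candidate) - total(baseline);
  if (deltaBits < 0) {
    return { promote: true, deltaBits };
  }
  // Assumed rule: if the candidate fits the data better but still loses
  // overall, the extra model bits are what sank it.
  const fitsBetter = candidate.dataBits < baseline.dataBits;
  return {
    promote: false,
    deltaBits,
    rejectionCode: fitsBetter ? "model_bloat" : "no_improvement",
  };
}

const baseline: Cost = { modelBits: 2000, dataBits: 40 };

// Saves 28 bits of surprise but pays 600 extra model bits: net +572, rejected.
const bloated = decide(baseline, { modelBits: 2600, dataBits: 12 });

// Saves the same 28 bits for only 16 extra model bits: net -12, promoted.
const compact = decide(baseline, { modelBits: 2016, dataBits: 12 });
```

The two candidates reach the same scores; only the price paid in model text separates the rejection from the promotion.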

It's a pure, additive primitive: a substrate gate alongside the existing one; callers opt in by constructing it.

Notes

  • Zero new dependencies (uses only the built-in node:zlib); no upward deps, so it stays substrate-pure.
  • Pure & deterministic; 9 unit tests, including the anti-overfit core (a higher-scoring but bloated candidate is rejected) and the calibration property (a larger model promotes only once accumulated evidence pays for its bits; lambda is the lever).
  • Full suite green locally: 1913 passed, biome + typecheck + build + verify:package all clean.
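
The calibration property in the notes can be read as a break-even count: how many tasks must accumulate before a larger model's extra bits are paid back. The sketch below assumes, for simplicity, that every task has the same baseline score and the same candidate score; `breakEvenTasks` and the floor value are hypothetical and are not part of the gate.

```typescript
/**
 * Smallest number of tasks n such that
 *   n * (log2(candidateScore) - log2(baselineScore)) > lambda * extraModelBits,
 * i.e. the point where L(M', D) first drops strictly below L(M, D).
 * Returns Infinity if the candidate does not score better per task.
 */
function breakEvenTasks(
  extraModelBits: number,
  baselineScore: number,
  candidateScore: number,
  lambda = 1,
  floor = 2 ** -10, // assumed score floor
): number {
  const clamp = (s: number): number => Math.min(1, Math.max(floor, s));
  const savedBitsPerTask = Math.log2(clamp(candidateScore)) - Math.log2(clamp(baselineScore));
  if (savedBitsPerTask <= 0) return Infinity;
  const cost = lambda * extraModelBits;
  if (cost <= 0) return 1; // no extra cost to recover: one task suffices
  return Math.floor(cost / savedBitsPerTask) + 1;
}

// 800 extra model bits, per-task score improving from 0.5 to 0.8
// (about 0.678 bits saved per task).
const atLambda1 = breakEvenTasks(800, 0.5, 0.8); // 1180 tasks
const atLambdaHalf = breakEvenTasks(800, 0.5, 0.8, 0.5); // 590 tasks
```

Halving lambda halves the evidence needed for promotion, which is the sense in which lambda is the lever.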

A Minimum-Description-Length promotion gate, sibling to HeldOutGate,
implementing the Builder/Breaker acceptance rule from Wang & Buehler,
'Self-Revising Discovery Systems for Science' (arXiv:2606.01444):

  L(M, D) = lambda * L_model(M) + L_data(D | M)
  accept M' over M  iff  L(M', D) < L(M, D)

Both candidate and baseline are scored on the same enlarged evidence
(every shared task, not just held-out). L_model is the gzip-bit size of
the model text (a Kolmogorov-complexity proxy); L_data is the residual
surprise -sum log2(score), capped at the score floor. A candidate that
improves outcomes by bloating its model grows L_model faster than it
shrinks L_data and is rejected (rejectionCode 'model_bloat') — a
complexity-penalized alternative to HeldOutGate's overfit-gap heuristic,
for when the model text whose size to penalize is available.

Pure, deterministic, zero new deps (node:zlib). 9 tests.
drewstone merged commit 9e52e08 into main on Jun 7, 2026
1 check passed