feat(gate): DescriptionLengthGate — MDL promotion gate (sibling to HeldOutGate) by drewstone · Pull Request #230 · tangle-network/agent-eval

drewstone · 2026-06-07T11:37:49Z

What

Adds DescriptionLengthGate — a Minimum-Description-Length promotion gate, sibling to HeldOutGate, in src/description-length-gate.ts. Implements the Builder/Breaker acceptance rule from Wang & Buehler, Self-Revising Discovery Systems for Science (arXiv:2606.01444, MIT LAMM):

L(M, D) = λ·L_model(M) + L_data(D | M)
accept M' over M  iff  L(M', D) < L(M, D)

L_model = gzip-bit size of the candidate's model text (a deterministic Kolmogorov-complexity proxy).
L_data = residual surprise −Σ log2(score) over the shared tasks, with each score clamped to a floor so that a failed task costs ~10 bits, never ∞.
Both candidate and baseline are scored on the same enlarged evidence (every shared task, not just the held-out split).
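
The three definitions above can be sketched in a few lines. This is a minimal illustration, not the code in src/description-length-gate.ts: the names `modelBits`, `dataBits`, `descriptionLength`, and `SCORE_FLOOR` are hypothetical, and the floor value 2^-10 is an assumption inferred from the "~10 bits" figure.

```typescript
import { gzipSync } from "node:zlib";

// Assumed floor: a score of 0 is clamped to 2^-10, so it costs about 10 bits.
const SCORE_FLOOR = 2 ** -10;

/** L_model: size of the gzip-compressed model text, in bits. */
function modelBits(modelText: string): number {
  return gzipSync(Buffer.from(modelText, "utf8")).length * 8;
}

/** L_data: -sum(log2(score)), with every score clamped into [floor, 1]. */
function dataBits(scores: number[], floor: number = SCORE_FLOOR): number {
  let bits = 0;
  for (const score of scores) {
    const clamped = Math.min(1, Math.max(floor, score));
    bits -= Math.log2(clamped);
  }
  return bits;
}

/** L(M, D) = lambda * L_model(M) + L_data(D | M). */
function descriptionLength(modelText: string, scores: number[], lambda = 1): number {
  return lambda * modelBits(modelText) + dataBits(scores);
}

// Both models are scored on the same shared tasks; the lower total wins.
const baselineTotal = descriptionLength("answer concisely", [0.5, 0.5, 0]);
const candidateTotal = descriptionLength("answer concisely, cite sources", [0.9, 0.8, 0.7]);
console.log({ baselineTotal, candidateTotal, accept: candidateTotal < baselineTotal });
```

A perfect score contributes 0 bits and a score of 0.5 contributes 1 bit, so L_data stays finite and comparable across candidates even when some tasks fail outright.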

Why merge it

HeldOutGate answers "does the candidate generalize on held-out items?" using a paired-delta CI plus an overfit-gap heuristic. DescriptionLengthGate answers a different, complementary question: "does the candidate explain the evidence more cheaply?" It charges the candidate for its own size, so a model that scores better only by bloating (for example, by memorizing counterexamples) grows L_model faster than it shrinks L_data and is rejected with rejectionCode: 'model_bloat'. This is the principled, information-theoretic form of the overfit penalty that HeldOutGate approximates, and it applies whenever the model text whose size you want to penalize is available (a prompt, skill, profile, or symbolic model). It needs no held-out split, which makes it usable in self-improvement loops where every task is accumulated evidence.
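
To make the bloat rejection concrete, here is a hedged sketch of the decision step on precomputed bit counts. Only the `'model_bloat'` code comes from this PR; the `Cost` and `Decision` types, the `decide` function, and the `'no_improvement'` code are hypothetical names used for illustration.

```typescript
// Hypothetical shapes; the real gate's types may differ.
interface Cost {
  modelBits: number; // L_model
  dataBits: number; // L_data
}

type Decision =
  | { accepted: true; savedBits: number }
  | { accepted: false; rejectionCode: "model_bloat" | "no_improvement" };

function totalBits(cost: Cost, lambda: number): number {
  return lambda * cost.modelBits + cost.dataBits;
}

function decide(baseline: Cost, candidate: Cost, lambda = 1): Decision {
  const savedBits = totalBits(baseline, lambda) - totalBits(candidate, lambda);
  if (savedBits > 0) return { accepted: true, savedBits };
  // The candidate fits the evidence better, but its extra size costs more than it saves.
  const fitsBetter = candidate.dataBits < baseline.dataBits;
  return { accepted: false, rejectionCode: fitsBetter ? "model_bloat" : "no_improvement" };
}

const baseline: Cost = { modelBits: 2000, dataBits: 120 };

// Fits 40 bits better but is 600 bits larger: rejected as bloat.
const bloated = decide(baseline, { modelBits: 2600, dataBits: 80 });

// Fits equally well and is 200 bits smaller: accepted.
const compact = decide(baseline, { modelBits: 1800, dataBits: 120 });

console.log(bloated, compact);
```

The distinction between the two rejection branches is an assumption about how a gate might report its reasoning; the acceptance test itself is just the strict inequality L(M', D) < L(M, D).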

It's a pure, additive primitive: a substrate gate that sits alongside the existing one. Callers opt in by constructing it.

Notes

Zero new dependencies (uses only node:zlib); no upward deps, so it stays substrate-pure.
Pure and deterministic; 9 unit tests, including the anti-overfit core (a higher-scoring but bloated candidate is rejected) and the calibration property (a larger model is promoted only once accumulated evidence pays for its bits; lambda is the lever).
Full suite green locally: 1913 tests passed; biome, typecheck, build, and verify:package are all clean.
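
The calibration property can be illustrated with simple arithmetic. If a candidate is larger by some number of model bits and saves a fixed number of data bits on each shared task, it is promoted only once lambda times the extra model bits is less than the total saving. The function below is a hypothetical helper written for this note, not part of the gate, and it assumes a constant per-task gain.

```typescript
/**
 * Smallest number of shared tasks n such that
 *   lambda * extraModelBits < n * gainBitsPerTask,
 * i.e. the point at which accumulated evidence pays for the larger model.
 */
function breakEvenTasks(extraModelBits: number, gainBitsPerTask: number, lambda = 1): number {
  if (gainBitsPerTask <= 0) return Infinity; // no per-task saving: never promoted
  const n = Math.floor((lambda * extraModelBits) / gainBitsPerTask) + 1;
  return Math.max(0, n);
}

// A candidate 800 bits larger that saves 1 bit per task (e.g. score 0.4 -> 0.8).
console.log(breakEvenTasks(800, 1)); // 801
// Halving lambda halves the price of model size.
console.log(breakEvenTasks(800, 1, 0.5)); // 401
```

Lowering lambda makes the gate more permissive toward larger models, and raising it demands more evidence before promotion.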

A Minimum-Description-Length promotion gate, sibling to HeldOutGate, implementing the Builder/Breaker acceptance rule from Wang & Buehler, 'Self-Revising Discovery Systems for Science' (arXiv:2606.01444):

L(M, D) = lambda * L_model(M) + L_data(D | M)
accept M' over M  iff  L(M', D) < L(M, D)

Both candidate and baseline are scored on the same enlarged evidence (every shared task, not just held-out). L_model is the gzip-bit size of the model text (a Kolmogorov-complexity proxy); L_data is the residual surprise -sum log2(score), capped at the score floor.

A candidate that improves outcomes by bloating its model grows L_model faster than it shrinks L_data and is rejected (rejectionCode 'model_bloat'). This is a complexity-penalized alternative to HeldOutGate's overfit-gap heuristic, for when the model text whose size to penalize is available.

Pure, deterministic, zero new deps (node:zlib). 9 tests.

drewstone merged commit 9e52e08 into main Jun 7, 2026
1 check passed

drewstone merged 1 commit into main from feat/description-length-gate
