Port response-quality scoring and consensus fan-out to TypeScript by BillJr99 · Pull Request #66 · BillJr99/CognitiveLoopKernel

BillJr99 · 2026-05-24T16:34:22Z

Summary

This PR ports the Python harness's response-quality scoring and stochastic consensus logic into the Pi extension as real, enforceable tools. Previously, orchestration policy lived only in the chief's prompt; now it's enforced in code so the chief cannot accidentally skip critical steps like parallel sampling, quality validation, or Ralph branch creation.

Key Changes

Response-quality scorer (src/quality.ts): TypeScript port of clk_harness/orchestration/response_quality.py. Detects empty responses, refusals, malformed blocks, low confidence, and missing declared outputs. Generates repair hints for recoverable failures so that re-rolls target the specific issues instead of retrying blindly.
Consensus and quality-dispatch primitives (src/consensus.ts):
- dispatchWithQuality(): wraps a single subagent dispatch in an automatic quality re-dispatch loop (up to maxRetries attempts with repair preambles)
- runConsensus(): fans out N parallel tmux subagent samples, scores each, and returns the highest-scoring winner plus all candidates for traceability
New orchestration tools (src/tools.ts):
- clk_consensus: fans out N parallel samples (default 3, clamped to 1..6) for high-stakes decisions
- clk_subagent_quality: single subagent with a quality gate and automatic repair re-rolls
- clk_autoresearch: bounded researcher + critic alternation (default 2 iterations)
- clk_ralph: one-call Ralph iteration that creates the branch, fans out consensus, and returns the winner (branch creation and fan-out happen in one step and cannot be skipped)
Git push integration (src/git.ts):
- hasRemote(): checks whether a remote exists
- commitsAhead(): counts local commits not yet on upstream
- pushBestEffort(): best-effort git push that never throws; returns success/failure with a reason
- pushIfEnabled() helper in tools.ts: auto-pushes when CLK_GITHUB_PUSH_ON_COMMIT=true and surfaces the ↑N ahead count in the status bar
Updated chief primer (src/prompts.ts): New dispatch tool quick reference explaining when to use each tool (subagent vs. quality vs. consensus vs. autoresearch vs. ralph).
Comprehensive test coverage:
- tests/quality.test.ts: unit tests for scoring logic (mirrors the Python harness tests)
- tests/consensus.test.ts: tests for the quality-dispatch loop and consensus fan-out, using an injectable spawn function
- tests/git.test.ts: tests for the remote/push/ahead helpers
- Updated tests/index.test.ts and tests/prompts.test.ts to verify that the new tools are registered
Documentation (pi-extension/README.md): Expanded with a tool reference section, quality-scoring rules, and a clarification that orchestration policy is now enforced in code, not just in the prompt.
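
For readers who have not seen the Python scorer, the rules listed above can be pictured as a pure function over the response text. The following is a minimal sketch, not the contents of src/quality.ts: the function names echo those mentioned in this PR, but their signatures, the flag names, the refusal phrases, the 0.5 confidence threshold, and the score weighting are all assumptions made for illustration. It covers only four of the failure modes.

```typescript
// Illustrative sketch only. Flag names, refusal phrases, the weighting, and
// the 0.5 confidence threshold are assumptions, not values from src/quality.ts.
type Flag = "empty" | "refusal" | "low_confidence" | "needs_review";

interface QualityResult {
  ok: boolean;
  score: number; // 1 means no flags; each flag lowers the score
  flags: Flag[];
  reasons: string[];
}

const REFUSAL_PATTERN = /\b(i can(?:no|')t (?:help|assist)|i(?: am|'m) unable to)\b/i;
const CONFIDENCE_PATTERN = /^CONFIDENCE:\s*([0-9]*\.?[0-9]+)\s*$/im;
const NEEDS_REVIEW_PATTERN = /^NEEDS_REVIEW:\s*true\s*$/im;

// Pure string/regex checks: no I/O, so it can be tested without mocks.
function scoreResponse(text: string, minConfidence = 0.5): QualityResult {
  const flags: Flag[] = [];
  const reasons: string[] = [];
  const add = (flag: Flag, reason: string): void => {
    flags.push(flag);
    reasons.push(reason);
  };

  if (text.trim().length === 0) {
    add("empty", "response body is empty");
  } else {
    if (REFUSAL_PATTERN.test(text)) add("refusal", "response looks like a refusal");
    const m = CONFIDENCE_PATTERN.exec(text);
    if (m !== null && Number(m[1]) < minConfidence) {
      add("low_confidence", `CONFIDENCE ${m[1]} is below ${minConfidence}`);
    }
    if (NEEDS_REVIEW_PATTERN.test(text)) add("needs_review", "worker set NEEDS_REVIEW: true");
  }

  const score = Math.max(0, 1 - 0.4 * flags.length);
  return { ok: flags.length === 0, score, flags, reasons };
}

// Assumption: a refusal will not improve on a re-roll, other flags might.
function isRecoverable(result: QualityResult): boolean {
  return !result.ok && result.flags.indexOf("refusal") === -1;
}

// Quote every failure reason back to the worker so the retry targets it.
function repairHint(result: QualityResult): string {
  if (result.ok) return "";
  const bullets = result.reasons.map((r) => `- ${r}`).join("\n");
  return `Your previous response was rejected for these reasons:\n${bullets}\nFix each one and answer again.`;
}
```

Keeping the scorer free of I/O is what lets the same function serve both the single-dispatch retry loop and the consensus ranking.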

Notable Implementation Details

Quality scorer uses pure regex/string operations (no I/O) so it's fast and testable without mocking.
Repair hints quote every failure reason back to the worker so it understands what to fix.
Consensus fan-out uses a configurable maxParallel to limit concurrent tmux sessions (default min(4, samples)).
All push operations are best-effort and never block tool results on git bookkeeping failures.
The chief can no longer accidentally skip Ralph branch creation or consensus fan-out by misreading the prompt — the tools enforce the shape.
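
The maxParallel cap and winner selection described above can be sketched as a small worker pool. This is a hedged illustration rather than the code in src/consensus.ts: the helper names (clampSamples, runPool, fanOut), the Candidate shape, and the choice to score failed samples as -1 are hypothetical. Only the defaults (3 samples, clamped to 1..6, maxParallel of min(4, samples)) come from the PR text.

```typescript
// Sketch of a bounded-concurrency fan-out. Helper names and the Candidate
// shape are hypothetical; only the quoted defaults come from the PR text.
interface Candidate {
  index: number;
  text: string;
  score: number;
  error?: string;
}

// PR text: samples default to 3 and are clamped to 1..6.
function clampSamples(requested?: number): number {
  const n = requested === undefined || isNaN(requested) ? 3 : Math.floor(requested);
  return Math.min(6, Math.max(1, n));
}

// Run `tasks` with at most `limit` in flight; results keep input order.
async function runPool<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array<T>(tasks.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < tasks.length) {
      const i = next++; // claimed synchronously, so two workers never share an index
      results[i] = await tasks[i]();
    }
  };
  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  const workers: Array<Promise<void>> = [];
  for (let w = 0; w < workerCount; w++) workers.push(worker());
  await Promise.all(workers);
  return results;
}

async function fanOut(
  sample: (index: number) => Promise<string>,
  score: (text: string) => number,
  opts: { samples?: number; maxParallel?: number } = {},
): Promise<{ winner: Candidate; candidates: Candidate[] }> {
  const n = clampSamples(opts.samples);
  const limit = opts.maxParallel ?? Math.min(4, n);
  const tasks: Array<() => Promise<Candidate>> = [];
  for (let index = 0; index < n; index++) {
    tasks.push(async (): Promise<Candidate> => {
      try {
        const text = await sample(index);
        return { index, text, score: score(text) };
      } catch (err) {
        // Assumption: a failed sample is kept for traceability but ranked
        // below any real score (scores are assumed to be non-negative).
        return { index, text: "", score: -1, error: String(err) };
      }
    });
  }
  const candidates = await runPool(tasks, limit);
  // Ties go to the earliest sample because only a strictly higher score wins.
  const winner = candidates.reduce((best, c) => (c.score > best.score ? c : best));
  return { winner, candidates };
}
```

Because sample is injected, a test can pass a fake that counts in-flight calls and confirm the cap without tmux or pi installed.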

https://claude.ai/code/session_012nKhcka2fhuazbVbhQpRm1

…arness

Two of the recent CLK harness PRs have a direct parallel in pi-extension:

* push-on-commit + ahead counter (756723c). pi-extension already commits on every clk_checkpoint / clk_merge call but never pushes, so a remote-backed Pi workspace silently accumulated local commits.
  - src/git.ts: hasRemote, commitsAhead, pushBestEffort (best-effort, never throws; mirrors clk_harness/git_ops.py).
  - src/tools.ts: pushIfEnabled helper called after clk_checkpoint and clk_merge. Gated on CLK_GITHUB_PUSH_ON_COMMIT=true to match the Python TUI; surfaces an ↑N ahead count on push failure or when auto-push is disabled but commits exist.
  - src/index.ts: /clk-doctor now reports the ahead count and warns when local commits haven't reached origin.
* multi-line objective truncation (24f379b). idea.slice(0, 60) was being done before splitting on newlines, so a multi-line idea could leak a fragment of line 2 into the status bar.
  - src/index.ts: new firstLineShort helper, used at every ctx.ui.setStatus("clk-idea", …) site and in /clk-doctor.

Tests: tests/git.test.ts covers the no-remote/sync/unreachable cases for pushBestEffort and commitsAhead. tests/index.test.ts asserts that firstLineShort returns single-line, capped output for multi-line input.
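
The truncation fix comes down to the order of two operations: take the first line, then cap its length. A minimal sketch of such a helper follows. The name firstLineShort and the 60-character cap come from the commit message, but the signature, the trimming, and the absence of an ellipsis are assumptions.

```typescript
// Sketch only: the real helper in src/index.ts may differ in signature,
// trimming, and how it marks truncated text.
function firstLineShort(idea: string, max = 60): string {
  // Split first so that nothing after the first newline can survive.
  const firstLine = (idea.split(/\r?\n/)[0] ?? "").trim();
  // Cap afterwards; the result is at most `max` characters and single-line.
  return firstLine.length > max ? firstLine.slice(0, max) : firstLine;
}

// For contrast, truncating without isolating the first line keeps the
// newline and part of line 2 whenever line 1 is shorter than the cap.
function sliceOnly(idea: string, max = 60): string {
  return idea.slice(0, max);
}
```

With the input "Build a CLI\nwith plugins", sliceOnly returns the whole two-line string, whereas firstLineShort returns only "Build a CLI".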

… Ralph

Ports the Python harness's orchestration loops into the TypeScript extension so the chief can drive real, code-enforced fan-out instead of emitting parallel clk_subagent calls and hoping it followed the prompt.

src/quality.ts (new)
Port of clk_harness/orchestration/response_quality.py. Pure regex / string scorer: no I/O, no provider calls. Detects empty bodies, refusal phrases, malformed ACTION / POST blocks, missing declared POST PRODUCES keys, low CONFIDENCE: <n> values, and NEEDS_REVIEW: true. Exposes scoreResponse, repairHint, isRecoverable, summarise.

src/consensus.ts (new)
Two primitives, both with an injectable spawn function so tests can drive them without tmux / pi installed:
* dispatchWithQuality: wraps a single spawnSubagent in the quality re-dispatch loop. Re-runs with a repair preamble on every recoverable failure, up to maxRetries.
* runConsensus: fans out N parallel tmux samples for the same task, scores each, and returns all candidates plus the winner. The pool runner caps concurrent in-flight sessions via maxParallel.

src/subagent.ts
Exposes spawnSubagent + SpawnOptions so consensus.ts can call them. Behaviour unchanged.

src/tools.ts (+428 LOC)
Four new tools registered alongside the existing roster:
* clk_subagent_quality: one subagent + quality re-rolls.
* clk_consensus: N samples, scored, winner returned.
* clk_autoresearch: researcher + critic alternation (iterations are recorded in progress.md).
* clk_ralph: branch + consensus fan-out in one call; the chief then calls clk_merge or clk_revert based on validation.
Each tool surfaces a structured details payload so the chief sees scores, attempts, and flags rather than just the winning text.

src/prompts.ts
Updated chief primer to direct the chief through the new tools (Dispatch tool quick reference, restated rules 3, 4, 5A). The old "emit 3-5 clk_subagent calls in the same message" guidance is replaced by "call clk_consensus" so fan-out is enforced in code, not by chief compliance.

src/index.ts
/clk-help lists every orchestration tool and notes the CLK_GITHUB_PUSH_ON_COMMIT auto-push behaviour that landed in the prior commit.

Tests: 24 new tests across quality.test.ts (happy paths, every failure mode, repairHint / isRecoverable / summarise) and consensus.test.ts (injected spawn covers ok / retry / max-retries / non-recoverable refusal / fan-out winner picking / sample clamping / error capture / maxParallel concurrency). index.test.ts and prompts.test.ts updated to assert that the new tools are registered and named in the chief primer. All 94 tests pass, typecheck clean.
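
The quality re-dispatch loop described for dispatchWithQuality can be sketched with the spawn and scoring steps passed in as plain functions. This is an illustration under stated assumptions, not the code in src/consensus.ts: the parameter list, the Verdict and DispatchResult shapes, the preamble wording, and the reading of maxRetries as "re-runs after the first attempt" are all hypothetical.

```typescript
// Sketch of a quality re-dispatch loop with an injectable spawn function.
// Shapes, parameter order, and preamble wording are assumptions.
interface Verdict {
  ok: boolean;
  recoverable: boolean;
  reasons: string[];
}

type Spawn = (prompt: string) => Promise<string>;

interface DispatchResult {
  text: string;
  ok: boolean;
  attempts: number;
  reasons: string[];
}

async function dispatchWithQuality(
  task: string,
  spawn: Spawn,
  judge: (text: string) => Verdict,
  maxRetries = 2, // assumed to count re-runs after the first attempt
): Promise<DispatchResult> {
  let prompt = task;
  let attempts = 0;
  for (;;) {
    const text = await spawn(prompt);
    attempts++;
    const verdict = judge(text);
    const retriesUsed = attempts - 1;
    // Stop on success, on a failure a re-roll cannot fix, or when out of retries.
    if (verdict.ok || !verdict.recoverable || retriesUsed >= maxRetries) {
      return { text, ok: verdict.ok, attempts, reasons: verdict.reasons };
    }
    // Prefix the original task with the quoted failure reasons.
    const bullets = verdict.reasons.map((r) => `- ${r}`).join("\n");
    prompt = `Your previous attempt was rejected:\n${bullets}\n\n${task}`;
  }
}
```

Injecting spawn is what allows the ok / retry / max-retries / non-recoverable cases to be exercised with a scripted fake instead of a live tmux session.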

…, doctor

Updates both READMEs to reflect the orchestration work that just landed in pi-extension and the recent main-line PRs (push-on-commit, doctor / diag CLI, multi-line truncation fix) that already shipped to master but weren't fully cross-referenced.

pi-extension/README.md (full rewrite, +293 net lines)
* Replaces the "8 small tools" narrative with a proper Tool Reference that groups roster / dispatch / iterative-refinement and explains when to pick clk_subagent vs clk_subagent_quality vs clk_consensus vs clk_autoresearch vs clk_ralph.
* New "Response-quality scoring" section listing every flag the detector raises and how the repair-preamble loop quotes them back to the worker. Cross-references the Python harness's response_quality.py so behaviour drift between the two implementations is one diff away from being noticed.
* New "Auto-push (opt-in)" section covering CLK_GITHUB_PUSH_ON_COMMIT, the ↑N ahead counter, and the pre-push secret-scanner interaction.
* Commands table extended with /clk-help, /clk-doctor, /clk-undo (these existed in the code but the README only listed /clk and /clk-abort).
* "What you keep / what changes" tables rewritten: stochastic consensus, quality re-dispatch, and Ralph refinement are now described as code-enforced (not chief-compliance dependent), and the comparison row about robustness loops names the new tools as the per-call equivalents of the Python harness's clk.config.json::robustness.* knobs.
* Repository layout updated with src/quality.ts, src/consensus.ts, the new test files, and explicit per-file purposes.
* "Testing" section reflects the real 96-test count and notes the suite runs entirely offline (consensus tests inject a fake spawn).

README.md (main): targeted updates
* Pi extension section: brief but accurate rundown of the new orchestration tools, a Commands table that matches /clk-help, the CLK_GITHUB_PUSH_ON_COMMIT env var, and an updated example transcript that uses clk_consensus / clk_autoresearch / clk_ralph by name rather than the "fans out to 3 subagents" abstraction.
* Layout section: pi-extension/ subtree expanded to show every src/ file with a one-line purpose, including the new quality.ts and consensus.ts.
* Testing section: pi-extension test count corrected from 53 to 96 (~1s → ~2s), and the per-suite description rewritten to name the new modules (quality / consensus / git auto-push helpers / firstLineShort) so a contributor browsing the README knows what is and isn't covered.
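
The opt-in auto-push behaviour that the README now documents can be pictured roughly as follows. This is a sketch under assumptions, not the code in src/git.ts or src/tools.ts: the GitRunner type, the result shapes, the exact git arguments, and the returned status string are hypothetical. The git commands themselves (git remote, git push, git rev-list --count @{u}..HEAD) are standard.

```typescript
// Sketch only: runner type, result shapes, and status format are assumptions.
interface RunResult {
  code: number;
  stdout: string;
  stderr: string;
}

// Injected so the helpers can be tested without a real repository.
type GitRunner = (args: string[]) => Promise<RunResult>;

async function hasRemote(git: GitRunner): Promise<boolean> {
  const r = await git(["remote"]);
  return r.code === 0 && r.stdout.trim().length > 0;
}

// Counts commits reachable from HEAD but not from the upstream branch.
async function commitsAhead(git: GitRunner): Promise<number> {
  try {
    const r = await git(["rev-list", "--count", "@{u}..HEAD"]);
    const n = parseInt(r.stdout.trim(), 10);
    return r.code === 0 && !isNaN(n) ? n : 0;
  } catch {
    return 0;
  }
}

// Never throws: every failure becomes { ok: false, reason }.
async function pushBestEffort(git: GitRunner): Promise<{ ok: boolean; reason: string }> {
  try {
    if (!(await hasRemote(git))) return { ok: false, reason: "no remote configured" };
    const r = await git(["push"]);
    if (r.code === 0) return { ok: true, reason: "pushed" };
    return { ok: false, reason: r.stderr.trim() || `git push exited with code ${r.code}` };
  } catch (err) {
    return { ok: false, reason: String(err) };
  }
}

// Returns a status-bar fragment: empty after a successful push, otherwise
// an ahead count when local commits have not reached upstream.
async function pushIfEnabled(
  git: GitRunner,
  env: Record<string, string | undefined>,
): Promise<string> {
  if (env["CLK_GITHUB_PUSH_ON_COMMIT"] === "true") {
    const res = await pushBestEffort(git);
    if (res.ok) return "";
  }
  const ahead = await commitsAhead(git);
  return ahead > 0 ? `↑${ahead}` : "";
}
```

In a real extension the runner would wrap a child-process call to git; here it is left abstract so that a failed or disabled push can never block the tool result.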

claude added 3 commits May 24, 2026 16:19

BillJr99 merged commit 66d8ee6 into master May 24, 2026
6 checks passed

BillJr99 merged 3 commits into master from claude/pr-changes-pi-extension-wwEZP
