Add robustness loops: quality validation, consensus, and critic-judge refinement by BillJr99 · Pull Request #64 · BillJr99/CognitiveLoopKernel

BillJr99 · 2026-05-24T15:35:11Z

Summary

This PR introduces a comprehensive robustness framework that wraps agent dispatches with three layers of quality assurance and adaptive retry logic:

Response Quality Validation — scores responses for emptiness, malformed blocks, missing outputs, and low confidence
Auto-Consensus Fan-Out — stages marked careful: true (or all stages when configured) fan into N parallel samples with chief coalescing
Critic-Judge Refinement — draft → critic → revise inner loop that iterates until the critic approves or max rounds reached

Additionally, the Ralph and autoresearch loops now detect plateaus and regressions, escalating to consensus and reframing rather than burning the full iteration budget.

Key Changes

Core Robustness Layers

response_quality.py (new) — Validator module that detects:
- Empty or sub-threshold responses
- Malformed ACTION/POST blocks
- Missing declared outputs
- Low confidence or needs-review flags
- Refusal patterns
- Returns ResponseQuality verdict with recoverable/non-recoverable distinction and repair hints
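
To make the checks listed above concrete, here is a minimal sketch of what such a validator could look like. It is not the code from response_quality.py: the `score_response` name, the field names on `ResponseQuality`, the refusal regex, the `NEEDS_REVIEW:` line format, and the choice to treat refusals as non-recoverable are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical refusal phrases; the real module's patterns are not shown in this PR.
_REFUSAL_RE = re.compile(r"\b(I cannot|I can't|I am unable to)\b", re.IGNORECASE)
# Assumed footer format for the needs-review flag.
_NEEDS_REVIEW_RE = re.compile(r"^NEEDS_REVIEW:\s*(true|yes)\s*$", re.IGNORECASE | re.MULTILINE)


@dataclass
class ResponseQuality:
    ok: bool
    recoverable: bool = True
    issues: list[str] = field(default_factory=list)
    repair_hints: list[str] = field(default_factory=list)


def score_response(text: str, declared_outputs=(), min_response_chars: int = 40) -> ResponseQuality:
    """Return a verdict listing every problem found in an agent response."""
    issues, hints = [], []

    # Empty or sub-threshold responses.
    if len(text.strip()) < min_response_chars:
        issues.append("too_short")
        hints.append(f"Respond with at least {min_response_chars} characters.")

    # Declared outputs that the response never mentions.
    for name in declared_outputs:
        if name not in text:
            issues.append(f"missing_output:{name}")
            hints.append(f"Produce the declared output '{name}'.")

    # Self-reported needs-review flag.
    if _NEEDS_REVIEW_RE.search(text):
        issues.append("needs_review")
        hints.append("Resolve the points you flagged for review.")

    # Refusals are assumed here to be non-recoverable by a simple retry.
    refused = bool(_REFUSAL_RE.search(text))
    if refused:
        issues.append("refusal")

    return ResponseQuality(ok=not issues, recoverable=not refused,
                           issues=issues, repair_hints=hints)
```

The repair hints are collected rather than raised so that a caller can fold them into a repair preamble on the next attempt.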
agent.py — Refactored dispatch entry point:
- run() now wraps _dispatch_once() with quality and consensus layers
- _dispatch_with_quality_loop() — retries failed responses with repair preambles, escalates to consensus on final retry
- _dispatch_auto_consensus() — fans out to N samples + chief coalescing
- _should_auto_consensus() — proactive fan-out decision logic
- _META_PHASES frozenset prevents recursion (consensus, checkpoint, recovery, critic, etc. never re-trigger loops)
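
The retry-then-escalate behaviour described for agent.py can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the function signature, the callables passed in, the repair-preamble wording, and the exact members of `META_PHASES` beyond the four named above are hypothetical.

```python
# Phases that must never re-trigger the loops (subset named in the PR text).
META_PHASES = frozenset({"consensus", "checkpoint", "recovery", "critic"})


def dispatch_with_quality_loop(dispatch_once, is_acceptable, fan_out,
                               prompt, phase, max_quality_retries=2):
    """Dispatch once, then retry with a repair preamble; the final retry is a fan-out.

    dispatch_once(prompt) -> str   single provider call
    is_acceptable(response) -> bool quality verdict
    fan_out(prompt) -> str          consensus fan-out plus coalescing
    """
    # Meta phases bypass the quality loop entirely to prevent recursion.
    if phase in META_PHASES:
        return dispatch_once(prompt)

    response = dispatch_once(prompt)
    for attempt in range(1, max_quality_retries + 1):
        if is_acceptable(response):
            return response
        if attempt == max_quality_retries:
            # Last retry: escalate to consensus instead of another single call.
            return fan_out(prompt)
        response = dispatch_once("REPAIR: the previous response failed validation.\n\n" + prompt)
    return response
```

With `max_quality_retries=0` the loop body never runs, so the first response is returned unchecked, which is one way a kill switch could behave.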
workflow.py — Critic-judge refinement:
- WorkflowStage.refine field for explicit critic configuration
- _refine_enabled() — decides when to run critic based on stage config or robustness.auto_refine policy
- _refine_loop() — dispatches critic, scores response, re-dispatches worker with feedback up to max_rounds
- Integrated into _run_stage() after initial dispatch
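
The draft → critic → revise cycle can be pictured with the sketch below. It assumes the critic returns a numeric score plus feedback and that acceptance means the score reaches a threshold; the `refine_loop` signature and return shape are hypothetical and are not taken from workflow.py.

```python
def refine_loop(draft, critic, revise, max_rounds=3, accept_threshold=0.8):
    """Iterate critic and worker until the critic accepts or rounds run out.

    critic(draft) -> (score, feedback)
    revise(draft, feedback) -> new draft
    Returns (final_draft, rounds_used, accepted).
    """
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        score, feedback = critic(draft)
        if score >= accept_threshold:
            return draft, rounds, True
        # Re-dispatch the worker with the critic's feedback.
        draft = revise(draft, feedback)
    # Budget exhausted: return the last revision without a further critique.
    return draft, rounds, False
```

Returning the `accepted` flag lets the caller distinguish a draft the critic approved from one that simply hit the round cap.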

Adaptive Loop Control

ralph_loop.py — Plateau and regression detection:
- _progress_verdict() — classifies iterations as improving/plateau/regressing
- _handle_plateau() — escalates to consensus fan-out, then reframes workflow
- _should_terminate_for_plateau() — graceful termination when stuck
- _adaptive_extra() — injects careful=true into next iteration's dispatches when escalating
autoresearch_loop.py — Similar plateau detection for experiment loops
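
One plausible way to classify iterations as improving, plateau, or regressing is shown below. The PR does not say how progress is measured, so this sketch assumes a per-iteration numeric score where higher is better; the `progress_verdict` signature, the `epsilon` tolerance, and the windowing rule are illustrative assumptions.

```python
def progress_verdict(scores, window=3, epsilon=1e-6):
    """Classify the latest iteration from a history of scores (higher is better)."""
    if len(scores) < 2:
        return "improving"  # nothing to compare against yet

    # A drop relative to the previous iteration counts as a regression.
    if scores[-1] < scores[-2] - epsilon:
        return "regressing"

    # Plateau: none of the last `window` iterations beat the score before them.
    recent = scores[-(window + 1):]
    if len(recent) == window + 1 and max(recent[1:]) <= recent[0] + epsilon:
        return "plateau"

    # Without a full window of evidence, give the loop the benefit of the doubt.
    return "improving"
```

A caller could escalate on the first "plateau" verdict and terminate if a later verdict is still "plateau" after escalation.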

Blackboard & Q&A Protocol

blackboard.py — Extended Post class:
- target_agent and urgency fields for directed questions
- find_unanswered_questions() filters by target agent
- Questions with urgency: blocking are answered inline; async questions are surfaced to the chief
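
A minimal sketch of the directed-question filter follows. Only `target_agent`, `urgency`, and the `find_unanswered_questions` name come from the PR text; every other field on `Post` (`post_id`, `kind`, `author`, `body`, `reply_to`) and the free-function form are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    post_id: str                        # hypothetical identifier
    kind: str                           # hypothetical: "question", "answer", ...
    author: str
    body: str
    target_agent: Optional[str] = None  # who the question is addressed to
    urgency: str = "async"              # "blocking" or "async"
    reply_to: Optional[str] = None      # hypothetical link from answer to question


def find_unanswered_questions(posts, target_agent=None):
    """Return questions with no answer yet, optionally only those for one agent."""
    answered = {p.reply_to for p in posts if p.kind == "answer" and p.reply_to}
    return [
        p for p in posts
        if p.kind == "question"
        and p.post_id not in answered
        and (target_agent is None or p.target_agent == target_agent)
    ]
```

A router could then answer the `urgency == "blocking"` subset inline and hand the rest to the chief.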

Configuration & Documentation

config.py — New robustness block in DEFAULT_CLK_CONFIG:
- auto_consensus (off | on_careful | always)
- auto_refine (off | careful_only | all)
- max_quality_retries, min_response_chars
- refine_max_rounds, refine_accept_threshold
- max_qa_depth, plateau_window, plateau_action
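
The three auto_consensus modes suggest a small decision function along these lines. This is a sketch only: the `should_auto_consensus` signature, the fallback to "off" when the key is missing, and the contents of `META_PHASES` are assumptions rather than the code in agent.py or the defaults in config.py.

```python
# Phases assumed never to fan out (subset named in the PR text).
META_PHASES = frozenset({"consensus", "checkpoint", "recovery", "critic"})


def should_auto_consensus(robustness, phase, careful):
    """Decide whether a dispatch should proactively fan out into N samples.

    robustness: the `robustness` config block as a dict
    phase:      the dispatch phase name
    careful:    whether the stage is marked careful: true
    """
    if phase in META_PHASES:
        return False  # meta phases never re-trigger the loops
    mode = robustness.get("auto_consensus", "off")
    if mode == "always":
        return True
    if mode == "on_careful":
        return bool(careful)
    return False  # "off" or any unrecognised value
```

Treating unrecognised values as "off" keeps a typo in the config from silently multiplying cost.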
.env.example — Documented all CLK_ROBUSTNESS_* env vars with cost multipliers
kickoff.sh — Added env-var override block that maps CLK_ROBUSTNESS_* into clk.config.json
README.md — Comprehensive sections:
- "Robustness loops" explaining all four layers
- "Robustness-loop multipliers"
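
The env-var override step can be illustrated with a short sketch of dotted-path assignment. It assumes each CLK_ROBUSTNESS_* name maps to the lowercase key of the same name under `robustness`, and that values are coerced by trying JSON first; the helper names and the coercion rule are hypothetical and are not taken from kickoff.sh.

```python
import json


def set_dotted(config, path, value):
    """Assign value at a dotted path such as 'robustness.auto_refine'."""
    node = config
    *parents, leaf = path.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def coerce(raw):
    """Turn '3' into 3 and 'true' into True; leave words like 'on_careful' as strings."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def apply_env_overrides(config, env, prefix="CLK_ROBUSTNESS_"):
    """Copy every set, non-empty prefixed variable into the config; skip the rest."""
    for name, raw in env.items():
        if name.startswith(prefix) and raw != "":
            key = name[len(prefix):].lower()
            set_dotted(config, "robustness." + key, coerce(raw))
    return config
```

In real use `env` would be `os.environ`; unset variables are simply absent from the mapping, so the existing defaults stay in place.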

https://claude.ai/code/session_01XYLA51b49wz4S2THibSjaF

…ch, Q&A, refine, plateau adaptation

Adds five reinforcing robustness layers around the existing dispatch chokepoint in AgentRunner.run() and the workflow stage runner, all gated by a new `robustness` config block with kill switches.

* Layer 1: auto-consensus + output-quality re-dispatch (agent.py + new response_quality.py). Every non-meta dispatch is now scored after the provider returns; empty / malformed / contract-violating / low-confidence responses are re-dispatched with a repair preamble, escalating to a stochastic consensus fan-out on the final retry. Stages marked `careful: true` (or all stages when set to "always") fan out into N samples proactively without needing the agent to emit PROPOSE_CONSENSUS.
* Layer 2: inter-agent Q&A protocol (blackboard.py + agent.py). POST: question blocks now carry TO/URGENCY; blocking questions are routed inline to the target peer so the asker effectively sees the answer in subsequent rounds. Caps Q&A chain depth via robustness.max_qa_depth.
* Layer 3: critic-judge refinement loop (workflow.py). New `refine:` stage attribute runs draft → critic → revise until the critic accepts or max_rounds is hit, with auto_refine triggering the same for `careful: true` stages by default.
* Layer 4: adaptive Ralph + autoresearch (ralph_loop.py, autoresearch_loop.py). Both loops now guard against malformed planner output (skip the iteration rather than commit garbage), detect plateau / regression, escalate via consensus, reframe via the chief, and terminate gracefully when escalation can't break the plateau. Autoresearch gains an evaluator-gated rollback parallel to Ralph's.
* Layer 5: prompt updates (templates/prompts.py). Adds CONFIDENCE / NEEDS_REVIEW footer to every base agent and teaches Q&A + plateau awareness in the shared blackboard protocol and ralph.md.

29 new tests cover response-quality scoring, POST: question parsing with TO/URGENCY, find_unanswered_questions, workflow YAML parsing of refine:, quality-retry firing on empty responses, retry capping at max_quality_retries, meta-phase bypass of the quality loop, and the auto-consensus mode matrix (off / on_careful / always). All 180 existing tests continue to pass.

https://claude.ai/code/session_01XYLA51b49wz4S2THibSjaF

…tency test

Docs-everywhere pass that completes the robustness-loops work. Anyone cloning the repo can now read the README + the install scripts and know what each loop does, how it triggers, and how to dial it up/down, including the prior knobs (provider retry, supervise cycles, consensus caps, recovery, meta-prompting) that had no single home in the docs before.

README.md
* New "## Robustness loops" section (after "## Loops") that walks through every loop in order (provider retry, stage retry, supervise cycles, recovery on unmet deps, review/checkpoint stages, auto-quality re-dispatch, stochastic consensus (opt-in + automatic), inter-agent Q&A, critic-judge refinement, adaptive Ralph & autoresearch), each with the YAML/config snippet that tunes it, the activity-log event name to grep for, and a kill switch.
* New "Putting it together" subsection traces a careful stage end to end so the user can see all layers compose.
* "What's new" gets a top-of-file changelog entry summarising the layers and pointing at the new section.
* "## Self-healing on unmet deps" cross-links the new section (dependency vs. content failures).
* "## Dynamic agents (casting)" mentions the new Q&A POST type.
* "## Cost guardrails" grows a "Robustness-loop multipliers" subsection with a worst-case-cost table per knob plus cost-minimal and cost-maximal config recipes.

.env.example
* New "Robustness loops" block with CLK_ROBUSTNESS_* lines for every knob: AUTO_CONSENSUS, AUTO_REFINE, MAX_QUALITY_RETRIES, MIN_RESPONSE_CHARS, REFINE_MAX_ROUNDS, REFINE_ACCEPT_THRESHOLD, QA_PARALLEL_JUDGES, MAX_QA_DEPTH, PLATEAU_WINDOW, PLATEAU_ACTION.
* New "Prior-knob reference" block documenting the legacy knobs that the docs claimed parity for but had no single source: provider timeouts and retry policy, supervise cycles, consensus caps, casting cap, auto-commit, validation batch caps, meta-prompt mode, review per-stage, recovery cap.

kickoff.sh
* After `clk init`, a new Python block reads every CLK_ROBUSTNESS_* and prior CLK_* env var and writes it into .clk/config/clk.config.json via dotted-path assignment. Unset vars fall through to DEFAULT_CLK_CONFIG; partially-set envs are honored. Header comment enumerates the full env-var surface so future contributors can find it.

scripts/install_local.sh
* Header expanded from ~13 lines to a self-contained narrative that describes (a) what the script does, (b) the .clk/ directory layout it creates, (c) all three install strategies with their fallbacks, (d) the extras-group syntax, (e) what the script does NOT install (provider CLIs, telegram, docker, github) with pointers to the README sections that cover each, (f) the related entry points (scripts/clk, install_tool.sh, run_loop.sh, kickoff.sh).

scripts/clk, install_tool.sh, run_loop.sh
* Header cross-references added so any one of them lands a reader on the canonical install procedure (install_local.sh) and the README sections (Loops, Robustness loops, Provider and authentication).

tests/test_docs_consistency.py
* New mechanical assertions that the four sources stay aligned: DEFAULT_CLK_CONFIG['robustness'] keys ↔ CLK_ROBUSTNESS_* lines in .env.example ↔ env-var mapping in kickoff.sh ↔ README mentions in the Robustness-loops + Cost-guardrails sections. Plus parity checks for the prior-knob inventory and the install-script header content. Eight tests, all passing; adding a new robustness knob in the future requires touching all four files or the suite fails.

All 188 unit tests pass.

https://claude.ai/code/session_01XYLA51b49wz4S2THibSjaF

- README ## Docker section: add a callout block explaining that CLK_RUN_INSTALL should stay false (the default) in Docker because the Dockerfile already installs all deps at build time.
- README ## First-run setup wizard: expand the 'Loop settings' bullet to explain what the install flag does and explicitly say to leave it false in Docker.
- kickoff.sh setup wizard: extend the _sv_explain text for the 'run install' prompt to tell users running in Docker that false is correct because the image ships with deps pre-installed.

https://claude.ai/code/session_01XYLA51b49wz4S2THibSjaF

claude added 3 commits May 24, 2026 15:06

BillJr99 merged commit c07c5bb into master May 24, 2026
5 checks passed

BillJr99 merged 3 commits into master from claude/agent-robustness-looping-g9eBe

2 participants
