Local-only multi-agent development harness. Drop clk into an empty
directory, capture an idea, and let a team of agents iterate the idea
into a working system through repeated agentic development cycles. The
chief casts the team dynamically per project, the agents emit machine-
parsed ACTION: blocks that the harness executes, and every change is
committed automatically.
Experimental software — use at your own risk. CLK is a research prototype. It is not intended for, and has not been evaluated or deemed suitable for, any particular purpose, production use, or critical workload. No warranty is provided, express or implied. By using this software you accept all associated risks.
Contributions, bug reports, and ideas are very welcome — feel free to open an issue or pull request!
If you've used CLK before, the highlights of this release:
- Robustness loops by default. Every meaningful dispatch is now scored after the provider returns; empty / malformed / contract-violating / low-confidence responses are re-dispatched with a repair preamble, escalating to a stochastic consensus fan-out on the final retry. Stages marked careful: true fan into N parallel samples proactively (configurable via robustness.auto_consensus). The critic-judge inner loop (refine: stage attribute, or default-on for careful stages) drives draft → critic → revise until the critic signs off. Ralph and autoresearch detect plateau / regression and escalate-then-reframe instead of burning the full iteration budget. Agents can ask peers directed clarifying questions via POST: question TO: <peer> URGENCY: blocking and the harness routes the answer inline. Everything is gated by clk.config.json::robustness.* (or CLK_ROBUSTNESS_* env vars) so you can throttle cost; see Robustness loops below.
- The setup wizard explains itself. kickoff.sh --setup is now a series of explain-then-ask blocks (provider, loop settings, tool detection, telegram, GitHub, git identity); every question is preceded by a short block telling you what the value does. Modeled on scripts/install_local.sh's narration style.
- Tool auto-install. Pick a provider whose CLI isn't installed and the wizard surfaces the canonical install command (npm install -g …, curl -fsSL https://ollama.ai/install.sh | sh, etc.) and asks before running it. The same registry powers /install from inside the TUI.
- First-use configuration. After install, every tool goes through the same four-step shape: auth → upstream route → model → verify. Pi prompts for its upstream provider (openrouter / anthropic / openai / google) and the right env-var receives your API key. Ollama runs ollama list, lets you pick a local model or pull a new one with progress streaming. Re-run any time via /configure [tool].
- GitHub integration. The wizard offers to skip, link an existing repo, or create a new private one. A hardened .gitignore and a pre-push secret scanner protect against accidental .env / API-key leaks. CLK_GITHUB_PUSH_ON_COMMIT=true makes each agent commit push automatically.
- Friendlier TUI. First-run welcome banner, /help modal overlay (F1 or ?), state-aware hint bar above the input, in-title USD cost estimate, narrative status snapshots, and follow-on suggestions after every workflow and loop ("next: /loop ralph 5 to refine, /undo to revert, or type a follow-up message"). The user always knows the next move.
- Recoverability everywhere. Atomic .env and JSON writes with .bak rotation; kickoff.sh --restore swaps it back. Per-step resume in the wizard via .clk/.setup-progress. Crashed-session detection in the TUI surfaces "recovered from a crashed session" and points at the preserved conversation.md. /undo reverts the last clk-authored commit after explicit confirm.
- /doctor and /diag. Health-check every provider and config; --fix prompts before repairing. /diag builds a redacted tarball for bug reports: API keys are replaced with <redacted: N chars>.
- /tutorial. A 30-second sample idea against the shell provider so first-time users see agents working end-to-end without spending a cent.
- Workspace management. ./kickoff.sh --list, --clean 7d, /workspaces inside the TUI. Old kickoff dirs no longer pile up.
- Always-confirm policy. Every install, push, undo, ollama pull, cost-cap crossing, or --clean removal asks [y/N] every single time. There is no "remember my answer" setting, by design.
See the Recoverability, GitHub integration, Diagnostics, Workspaces, and Cost guardrails sections below for the full walkthroughs.
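The score-then-repair behaviour listed under "Robustness loops by default" can be pictured roughly as follows. This is a minimal sketch, not the harness's code: dispatch_with_repair, the score callback, the 0.5 threshold, the wording of the repair preamble, and the majority-vote consensus step are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable


def dispatch_with_repair(
    provider: Callable[[str], str],
    prompt: str,
    score: Callable[[str], float],
    max_retries: int = 2,
    fanout: int = 3,
    threshold: float = 0.5,
) -> str:
    """Return the first response whose score reaches the threshold.

    Hypothetical flow: one plain dispatch, then retries that prepend a
    repair preamble; the final retry fans out into several samples and
    keeps the most common answer.
    """
    response = provider(prompt)
    for attempt in range(1, max_retries + 1):
        if score(response) >= threshold:
            return response
        repaired = "Your previous reply was rejected; follow the contract.\n" + prompt
        if attempt < max_retries:
            response = provider(repaired)
        else:
            # Final retry: sample several times and take a majority vote.
            samples = [provider(repaired) for _ in range(fanout)]
            response = Counter(samples).most_common(1)[0][0]
    return response
```

A scorer in this sketch is any function that maps a response to a number, so empty or malformed output can simply score 0.0. The real settings live under robustness.* in the config, whose exact keys are not reproduced here.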
- Local-first. Everything lives under .clk/ in the project directory. No global installs, no sudo.
- Provider-agnostic. Works with Claude Code, OpenAI Codex, Google Gemini, OpenWebUI (any OpenAI-compatible HTTP server), Pi, local Ollama, or a built-in dummy "shell" provider for testing.
- Dynamic team. A baseline of three agents (chief, qa, ralph) ships with the harness; the chief invents project-specific specialists on the fly (including engineer when an implementer is needed), writes their prompts, and authors the workflow YAML that wires them together.
- Real actions, not just descriptions. Agents emit ACTION: blocks (write/edit/append/delete/run/done) that the harness applies with path-safety checks, automatic backups, and per-agent git commits.
- Self-healing. When a stage's dependencies fail, the chief is dispatched in recovery mode (capped) to fix or re-cast rather than silently skipping.
- Iterative by design. Ships with Archon-style YAML workflows and a Ralph/gnhf-style improvement loop; the same ralph agent also drives Karpathy-style autoresearch cycles when the state has open questions.
- Memory through git. Every successful milestone (and every action batch) is committed with a structured message so future agent runs can mine the log for context. A separate .clk/state/casting.log records every roster decision, and .clk/logs/session.log mirrors the TUI status pane.
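The "path-safety checks" mentioned for ACTION: blocks are not specified in this README. One common way to implement such a check is to resolve the requested path and refuse anything that lands outside the project directory. The helper below is a hypothetical sketch of that idea; is_safe_path is not a function from the harness.

```python
from pathlib import Path


def is_safe_path(project_root: str, candidate: str) -> bool:
    """Return True only if `candidate` resolves to a location inside the root.

    Assumption for illustration: agent-supplied paths are interpreted
    relative to the project root, and absolute paths or `..` escapes
    must be rejected before any write/edit/delete is applied.
    """
    root = Path(project_root).resolve()
    # Joining with an absolute candidate discards the root, so absolute
    # paths outside the project fail the containment test below.
    target = (root / candidate).resolve()
    return root in target.parents
```

For example, is_safe_path("/tmp/proj", "src/app.py") is accepted while "../etc/passwd" and "/etc/passwd" are rejected.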
Skim this matrix to jump straight to the right tutorial. Every path
goes through the same kickoff.sh --setup wizard at some point, so once
you've configured CLK in one place you can mix and match the rest.
| Platform / mode | Tutorial |
|---|---|
| Local Linux / macOS / WSL (Python) | Quick start → Lower-level CLI |
| Docker container (build locally) | Docker → First-run setup |
| Pre-built image from GHCR | Docker → Pull from GHCR |
| Raspberry Pi (pi runtime) | Pi extension |
| REST API (drive CLK from code) | REST API |
| Chat-control from your phone | Telegram Bot |
Every tutorial ends with a "You should now see…" verification step. If something differs, check the Troubleshooting notes inline in the section you followed.
The fastest path is the kickoff script, which copies the harness into a
fresh workspace/kickoff-<timestamp>/ directory, gives it its own git repo, and
launches the TUI dashboard. The source tree is never modified.
Want chat control? After running --setup once, see the Telegram Bot section to drive CLK from your phone with live status updates.
# Optional: copy .env.example to .env to set defaults non-interactively.
./kickoff.sh "A local-first journaling app that summarizes my week"
# First time? Run the setup wizard to create your .env:
./kickoff.sh --setup
# Or omit the prompt and type your idea into the TUI:
./kickoff.sh
kickoff.sh reads all settings from .env (and optional CLI overrides) and
requires no interactive prompts during a normal run. If required config is
missing it prints exactly what's needed and offers to run --setup for you.
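The precedence described above (.env supplies defaults, CLI flags override them for a single run) could be modeled like this. It is a sketch under assumptions: kickoff.sh is a shell script and its real parsing is not shown in this README, and parse_env and effective_settings are hypothetical names.

```python
from typing import Dict, Optional


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop one layer of surrounding quotes, if present.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def effective_settings(
    env_text: str, cli_overrides: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Merge .env values with per-run overrides; None means 'flag not given'."""
    merged = parse_env(env_text)
    merged.update({k: v for k, v in cli_overrides.items() if v is not None})
    return merged
```

With CLK_PROVIDER=shell in .env, passing {"CLK_PROVIDER": "claude"} as the override yields claude for that run only; the file itself is untouched.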
# CLI overrides (override any .env value for a single run)
./kickoff.sh --provider claude --max-iterations 10 "My idea"
./kickoff.sh --no-tui "My idea"
# Re-run setup at any time to update your .env:
./kickoff.sh --setup
The TUI shows live agent cards (idle / working / done / failed), a
status log that updates in place, and a Claude-Code-style > input
field. Use it to type follow-ups; each message dispatches another
engineering cycle so the agents react to the new context.
| TUI command | Effect |
|---|---|
| free text | first message becomes the idea, then auto-runs casting + engineering; later messages append to the conversation and re-cast + re-run |
| /help (or F1, or ? when empty) | open the in-place help overlay with every command listed |
| /idea <text> | replace the captured idea |
| /cast | force a fresh chief casting pass against the current state |
| /roles list | print the current roster (baseline + dynamic) |
| /roles add NAME "role description" | add a dynamic role (the chief usually does this for you) |
| /roles drop NAME | remove a dynamic role (baseline cannot be removed) |
| /run [workflow] | run a single workflow cycle (default engineering) |
| /loop ralph 5 | start a Ralph refinement loop with 5 iterations |
| /loop autoresearch 3 | start a Karpathy-style research loop (ralph agent, research mode) |
| /stop | request the active loop to stop after the current iteration |
| /abort | SIGTERM any running CLI subprocess (use when an agent is genuinely hung; the heartbeat tells you when this is likely) |
| /provider <name> | switch the active provider; verifies it's reachable and warns if not |
| /install [tool] | install a missing provider CLI (claude, pi, ollama, …) via the registry in scripts/install_tool.sh |
| /configure [tool] | (re-)run a tool's first-use config: auth, upstream route, model picking |
| /github | inspect the current remote and link instructions for adding one |
| /undo | preview the last clk-authored commit; /undo confirm reverts it |
| /doctor [--fix] | health-check every provider, config, and git state; --fix prompts before repairing |
| /diag | bundle the logs, last 3 runs, and a redacted .env into clk-diag-<ts>.tar.gz for bug reports |
| /tutorial | run a 30-second sample idea on the shell provider (costs nothing) |
| /workspaces list\|rename\|switch\|clean | manage past kickoff dirs under workspace/ |
| /status | print a narrative session snapshot (idea, agents, tokens, files, per-provider cost) |
| /quit | exit the TUI |
PgUp/PgDn scroll the log pane; Backspace edits the input; Enter sends.
The input area wraps when you type past one row and the status log
word-wraps every entry. A one-line hint bar above the input adapts to
state: if no idea is captured yet it says "type your idea, or
/tutorial, or /help"; if a run failed with a missing CLI it says
"try /install <provider> to fix"; if an agent is working it points
at /abort. You always know your next move.
The title bar shows: project, active provider, current phase, total
tokens, estimated USD cost for the session (via the per-provider
table in clk_harness/pricing.py), files written, and a ↑N counter
for commits not yet pushed to the GitHub remote (when configured).
CLI providers (claude, codex, gemini, pi) stream their
subprocess stdout/stderr live: every line the CLI prints (auth status,
"Connecting...", retries, etc.) appears in the status pane within
milliseconds, and each agent card has a "live" rotating view showing
PID + bytes received + the most recent line. The heartbeat fires every
~15s while an agent is working and tells you whether the subprocess is
actively streaming or silent — and if it's been silent for more than
two minutes it suggests typing /abort. So you can immediately tell
"this is just a slow model call" from "this is genuinely hung."
If you'd rather drive the harness without the TUI:
./scripts/install_local.sh # local pip install (optional)
./scripts/clk init
./scripts/clk idea "A local-first journaling app that summarizes my week"
./scripts/clk plan
./scripts/clk run
./scripts/clk loop --max-iterations 10
./scripts/clk status
./scripts/clk providers
Set CLK_NO_TUI=true in your environment (or .env) to make kickoff.sh
fall back to this non-interactive pipeline.
The shell/dummy provider is the default and always works, so you can
exercise the entire harness with no API keys. Switch providers by
editing .clk/config/providers.json, via the TUI's /provider command,
or:
./scripts/clk configure --set default_provider=claude
CLK ships a FastAPI-based HTTP server that exposes a subset of CLI
commands programmatically — specifically: init, idea, plan, run,
loop, and status (see /api/capabilities for the authoritative list).
Use it to integrate CLK into your own tooling, drive it from a web UI,
or orchestrate it from CI pipelines without spawning a terminal.
pip install "clk-harness[api]"
The REST API starts automatically in the background whenever you run
any clk sub-command (provided the optional [api] extras are installed).
A [clk] REST API listening on http://… banner is printed to stderr at
startup. You can also start it standalone:
# Using the console-script entry point (recommended)
clk-api
# Or via the module entry point
python -m clk_harness.api
# Or via uvicorn directly
uvicorn clk_harness.api:app --host 0.0.0.0 --port 8001
The server listens on port 8001 by default. Override with
CLK_API_PORT=<port>.
Warning: the REST API has no authentication and binds to 0.0.0.0 (all interfaces) by default. This default suits sandbox / container environments where network isolation is provided by the runtime. Do not expose the API port to an untrusted network without additional access controls. For local development, restrict the server to loopback (127.0.0.1) using the mechanisms below.
When the CLI starts, the REST API auto-starts on a background daemon thread
and prints a [clk] banner to stderr. Override the bind address or disable
the API entirely:
| Mechanism | Effect |
|---|---|
| CLK_API_HOST=127.0.0.1 | Restrict the API to loopback (recommended for local dev) |
| CLK_API_PORT=<port> | Change the listen port (default 8001) |
| clk --no-api <cmd> | Skip the background API for this invocation |
| CLK_DISABLE_API=1 | Disable the background API for all CLI invocations |
If the optional [api] extras (fastapi, uvicorn) are not installed,
the background thread is silently skipped and the CLI works normally.
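To make the interplay of those switches concrete, here is a small sketch of how a bind address might be derived from them. The defaults (0.0.0.0, port 8001) and the variable names come from this section; resolve_api_bind itself is a hypothetical helper, and the real startup logic in clk_harness may differ.

```python
from typing import Mapping, Optional, Tuple


def resolve_api_bind(
    env: Mapping[str, str], no_api_flag: bool = False
) -> Optional[Tuple[str, int]]:
    """Return (host, port) for the background API, or None when disabled.

    `no_api_flag` stands in for `clk --no-api`; CLK_DISABLE_API=1 turns
    the API off for every invocation.
    """
    if no_api_flag or env.get("CLK_DISABLE_API") == "1":
        return None
    host = env.get("CLK_API_HOST", "0.0.0.0")  # documented default: all interfaces
    port = int(env.get("CLK_API_PORT", "8001"))  # documented default port
    return host, port
```

For local development, resolve_api_bind({"CLK_API_HOST": "127.0.0.1"}) gives ("127.0.0.1", 8001), which keeps the unauthenticated API off the network.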
# Health check
curl http://localhost:8001/api/healthz
# Create a workspace
WS=$(curl -s -X POST http://localhost:8001/api/workspaces \
-H 'Content-Type: application/json' \
-d '{"name": "my-project"}' \
| python3 -c "import sys,json; print(json.load(sys.stdin)['workspace_id'])")
# Capture an idea
TASK=$(curl -s -X POST http://localhost:8001/api/research \
-H 'Content-Type: application/json' \
-d "{\"command\":\"idea\",\"args\":[\"A local-first journaling app\"],\"workspace_id\":\"$WS\"}" \
| python3 -c "import sys,json; print(json.load(sys.stdin)['task_id'])")
# Stream live output
curl -sN http://localhost:8001/api/research/$TASK/stream
See docs/REST_API.md for the full endpoint reference, SSE event format, and more examples.
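The same workspace-then-idea sequence can be driven from Python with only the standard library. This sketch reuses the endpoints and JSON fields from the curl commands above (workspace_id, task_id, command, args); the function names are hypothetical, and error handling is left out on purpose.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # default port from this README


def idea_payload(workspace_id: str, idea: str) -> dict:
    """Build the body sent to POST /api/research for the `idea` command."""
    return {"command": "idea", "args": [idea], "workspace_id": workspace_id}


def post_json(path: str, payload: dict, base_url: str = BASE_URL) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def capture_idea(name: str, idea: str) -> str:
    """Create a workspace, submit an idea, and return the task ID."""
    workspace_id = post_json("/api/workspaces", {"name": name})["workspace_id"]
    return post_json("/api/research", idea_payload(workspace_id, idea))["task_id"]
```

Calling capture_idea("my-project", "A local-first journaling app") needs a running clk-api; the returned task ID can then be passed to the stream endpoint.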
The harness ships with a Dockerfile. Kickoff directories are created under
workspace/ inside the container; mount a volume there to keep them after
the container exits.
The default mode is the interactive TUI dashboard — run with -it so the
container has a terminal. If no .env is present it will prompt for provider
and settings before launching. Pass your idea as the first argument to skip
the prompt and go straight to the engineering workflow.
install_local.sh is not needed inside Docker. The Dockerfile runs pip install -e . at image-build time, so all Python dependencies are already present. Keep CLK_RUN_INSTALL=false (the default); setting it to true in a Docker environment would redundantly re-create a .clk/venv that the container doesn't need.
All examples below assume the image is tagged clk locally — either
build it from source or pull a prebuilt image and re-tag it (see the next
two sections).
docker build -t clk .
Prebuilt images are published to GitHub Container Registry on every push to
main (tagged latest and main), every semver tag (vX.Y.Z → X.Y.Z,
X.Y), and every commit (sha-<short>):
docker pull ghcr.io/billjr99/cognitiveloopkernel:latest
docker tag ghcr.io/billjr99/cognitiveloopkernel:latest clk
The docker tag step lets every later command in this README refer to the
image simply as clk. If you'd rather not re-tag, substitute
ghcr.io/billjr99/cognitiveloopkernel:latest for clk in the examples
below.
kickoff.sh loads /app/.env at startup, so any setting that can be
configured via CLK_* env vars (provider, API keys, git identity, etc.)
can also live in a single file. There are two ways to provide it:
Bind-mount a host file at /app/.env — recommended when you want the
setup wizard's edits to persist back to disk:
touch ~/clk.env # create empty file first (Docker quirk)
docker run --rm -it \
-v ~/clk.env:/app/.env \
-v clk-workspace:/app/workspace \
clk "My idea here"
Pass it via --env-file — simpler when the file is read-only config:
docker run --rm -it \
--env-file ~/clk.env \
-v clk-workspace:/app/workspace \
clk "My idea here"
The bind-mount approach is required if you want to use --setup (the wizard
writes back into /app/.env); --env-file only injects vars at start.
Run the setup wizard to create your .env. The wizard is structured
as a series of explain-then-ask blocks — each section tells you
what the value does before asking for it, modeled on the
scripts/install_local.sh narration style. Sections (in order):
- Provider: pick the AI that writes code (shell, claude, codex, gemini, pi, ollama, openwebui). One-liner per choice.
- Loop settings: max iterations, project name, install flag, TUI/no-TUI. The install flag (CLK_RUN_INSTALL) controls whether scripts/install_local.sh runs inside each kickoff directory to create a local .clk/venv. Leave it false (the default) when running in Docker: the image already has all Python dependencies installed at build time, so the local venv step is unnecessary.
- Auth mode: only for CLI providers; cli reuses your local claude login / codex login / gemini login, apikey prompts for a key directly.
- Tool detection + auto-install: checks whether the chosen provider's CLI is on PATH; if not, surfaces the canonical install command and asks before running it. Backed by scripts/install_tool.sh's registry, the same commands the TUI's /install uses.
- First-use configure: auth → upstream route → model → verify. Pi picks openrouter / anthropic / openai / google and sets the right {ROUTE}_API_KEY env var. Ollama runs ollama list, lets you pick a local model or pull a new one (progress streamed). State is recorded in .clk/state/configured-tools.json so the wizard knows not to re-prompt next time.
- Telegram: same flow as before. Saying yes here triggers the dedicated bot wizard at scripts/telegram_setup_wizard.sh.
- GitHub: optional remote (skip / existing / create); writes a hardened .gitignore and a pre-push secret scan hook. See GitHub integration.
- Git identity: CLK_GIT_NAME / CLK_GIT_EMAIL for the in-container fallback.
Atomic writes. Every answer is persisted to .env immediately
via env_set (sourced from scripts/lib_env.sh). The previous
content rotates to .env.bak. If the wizard crashes mid-flow, the
next run looks at .clk/.setup-progress and offers to resume from
the last completed step. To undo a bad wizard run entirely, run
./kickoff.sh --restore.
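The write-then-rename pattern behind these atomic writes can be illustrated in a few lines. The wizard itself does this in shell via env_set; the Python function below is only an analogous sketch (atomic_write is a hypothetical name), showing why an interrupted write leaves either the old or the new file, never a partial one.

```python
import os
import shutil
import tempfile


def atomic_write(path: str, content: str) -> None:
    """Replace `path` atomically, keeping the previous version as `path.bak`."""
    directory = os.path.dirname(os.path.abspath(path))
    if os.path.exists(path):
        # Rotate the current content to .bak before touching anything.
        shutil.copy2(path, path + ".bak")
    # The temp file must live in the same directory so the final rename
    # stays on one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Restoring is then just the reverse copy from the .bak file, which is the kind of swap that ./kickoff.sh --restore performs for .env.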
Always-confirm. Every install, push, ollama pull, and
destructive step asks [y/N] every single time. Pressing Enter
defaults to the safe option.
# Create an empty config file on the host (once)
touch ~/clk.env
# Run the wizard — writes into the bind-mounted file
docker run --rm -it \
-v ~/clk.env:/app/.env \
-v clk-workspace:/app/workspace \
clk --setup
--setup also works locally (outside Docker) and updates ./kickoff.sh's
own .env in-place.
Named volume — kickoffs persist in a Docker-managed volume across runs:
docker volume create clk-workspace
docker run --rm -it \
-v clk-workspace:/app/workspace \
clk "A local-first journaling app that summarizes my week"
Host directory: kickoffs written directly to a directory on your machine:
docker run --rm -it \
-v /path/to/my/projects:/app/workspace \
clk "A local-first journaling app that summarizes my week"
Anonymous volume: Docker allocates a temporary volume that is
automatically removed when the container exits (--rm handles cleanup):
docker run --rm -it \
-v /app/workspace \
clk "A local-first journaling app that summarizes my week"
Ephemeral: no explicit volume mount; Docker creates an anonymous volume
for /app/workspace (declared in the image) and removes it with --rm:
docker run --rm -it clk "A local-first journaling app that summarizes my week"
Pass any CLK_* variable or API key with -e:
docker run --rm -it \
-v clk-workspace:/app/workspace \
-e CLK_PROVIDER=claude \
-e CLK_AUTH_MODE=apikey \
-e ANTHROPIC_API_KEY=sk-ant-... \
clk "A local-first journaling app that summarizes my week"
For the pi provider with an OpenRouter key:
docker run --rm -it \
-v clk-workspace:/app/workspace \
-e CLK_PROVIDER=pi \
-e CLK_PI_MODEL=openrouter/free \
-e CLK_PI_KEY_TYPE=openrouter \
-e CLK_PI_API_KEY=sk-or-... \
clk "A local-first journaling app that summarizes my week"
For ollama or openwebui running on the host, use host.docker.internal
as the endpoint (macOS/Windows) or --network host (Linux):
docker run --rm -it \
-v clk-workspace:/app/workspace \
-e CLK_PROVIDER=ollama \
-e CLK_OLLAMA_ENDPOINT=http://host.docker.internal:11434 \
clk "My idea"
For scripted or CI use, skip the TUI entirely. The pipeline runs
init → idea → plan → run → loop without any curses UI:
docker run --rm \
-v clk-workspace:/app/workspace \
-e CLK_NO_TUI=true \
-e CLK_PROVIDER=claude \
-e CLK_AUTH_MODE=apikey \
-e ANTHROPIC_API_KEY=sk-ant-... \
clk "A local-first journaling app that summarizes my week"
To run the REST API server inside the container instead of the TUI, override the entrypoint command:
docker run --rm -p 8001:8001 \
-v clk-workspaces:/workspaces \
clk python -m clk_harness.api
Mount /workspaces to persist workspace directories across container
restarts.
Note: workspace state is in-memory and is NOT recoverable after restart. Even when the /workspaces volume is mounted, the in-memory registry of workspace IDs and task history is lost every time the container restarts. The files inside /workspaces survive on disk, but you must create new workspace registrations via POST /api/workspaces after each restart; previous workspace IDs and task IDs will not be recognised by the new container instance.
Override the workspace root with CLK_WORKSPACES_DIR.
Two-way chat control for CLK. The bot lets you kick off runs, watch live status updates, tail the activity log, and cancel tasks from anywhere Telegram works — no SSH, no port forwarding, no public URL. It connects via long polling, so it works behind NAT (your home network, a Pi behind a router, a Docker container).
clk-telegram-bot is a separate process that:
- Long-polls Telegram's servers for messages from allowlisted users.
- Translates commands into calls against the local CLK REST API (clk-api, default http://127.0.0.1:8001).
- Tails .clk/logs/activity.jsonl and pushes interesting events (agent dispatches, action applied, iteration outcomes, errors) to subscribed chats in real time.
Access is gated by a numeric-user-ID allowlist. Unknown users get a single canned reply that prints their own user ID (so the operator can add them) and are otherwise ignored.
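A gate of this kind is simple to express. The sketch below assumes the comma-separated format of CLK_TELEGRAM_ALLOWED_USERS shown later in this README; parse_allowlist, gate, and the exact reply wording are hypothetical and not taken from the bot's source.

```python
from typing import Optional, Set, Tuple


def parse_allowlist(raw: str) -> Set[int]:
    """Turn '123, 456' into {123, 456}, ignoring empty entries."""
    return {int(part) for part in raw.split(",") if part.strip()}


def gate(
    user_id: int, allowlist: Set[int], already_notified: Set[int]
) -> Tuple[bool, Optional[str]]:
    """Return (allowed, reply). Strangers get one notice, then silence."""
    if user_id in allowlist:
        return True, None
    if user_id in already_notified:
        return False, None
    already_notified.add(user_id)
    # Echo the sender's ID so the operator can add it to the allowlist.
    return False, f"Not allowlisted. Your user ID is {user_id}."
```

Keeping the "already notified" set separate from the allowlist means an unknown user cannot make the bot reply repeatedly.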
Three steps. The wizard automates the last two:
- Create the bot with @BotFather:
  - Open Telegram, message @BotFather.
  - Send /newbot. Pick a display name and a unique username that ends in bot (e.g. my_clk_bot).
  - BotFather replies with an HTTP API token like 123456789:AAH...xyz. Copy it.
- Run the wizard:
  ./scripts/telegram_setup_wizard.sh
  The wizard:
  - Validates the token by calling getMe against Telegram.
  - Prints "Send any message to your new bot, then press Enter".
  - Reads getUpdates to capture your numeric user ID automatically (you can also enter one manually).
  - Writes CLK_TELEGRAM_BOT_TOKEN, CLK_TELEGRAM_ALLOWED_USERS, and CLK_TELEGRAM_ENABLED=true to .env (preserving other keys).
- Start the bot:
  # Make sure the REST API is running first (so the bot has something to drive):
  clk-api &
  # Then start the bot:
  clk-telegram-bot
The wizard is idempotent: re-run any time to rotate the token, add more allowed users, or re-discover your ID after switching accounts.
You should now see: in your Telegram chat with the new bot, sending
/start replies with your user ID and the help text. Sending /status
lists workspaces.
kickoff.sh offers Telegram setup automatically the first time it runs
without a token configured. The image already includes
python-telegram-bot, the wizard script, and the clk-telegram-bot
entry point.
# 1. Create an empty config file on the host (once).
touch ~/clk.env
# 2. Run kickoff with --setup; answer "y" at the Telegram prompt.
docker run --rm -it \
-v ~/clk.env:/app/.env \
-v clk-workspace:/app/workspace \
clk --setup
To run only the Telegram wizard (no kickoff prompts):
docker run --rm -it \
-v ~/clk.env:/app/.env \
--entrypoint scripts/telegram_setup_wizard.sh \
clk
Once ~/clk.env has the Telegram keys, run the bot in its own container
alongside clk-api:
# REST API server (port 8001 published so the bot container can reach it)
docker run -d --name clk-api \
-v ~/clk.env:/app/.env \
-v clk-workspaces:/workspaces \
-p 8001:8001 \
--entrypoint python clk -m clk_harness.api
# Telegram bot — talks to clk-api via Docker's bridge network
docker run -d --name clk-telegram-bot \
--link clk-api \
-v ~/clk.env:/app/.env \
-v clk-workspaces:/workspaces \
-e CLK_API_HOST=clk-api \
-e CLK_API_PORT=8001 \
--entrypoint clk-telegram-bot clk
The bot makes outbound HTTPS calls to api.telegram.org, so no
inbound port forwarding is needed. The default Docker bridge network is
enough.
Install CLK via the Pi extension or pip install 'clk-harness[api,telegram]', then drop two systemd units:
# /etc/systemd/system/clk-api.service
[Unit]
Description=CLK REST API
After=network-online.target
[Service]
User=pi
WorkingDirectory=/home/pi/clk
EnvironmentFile=/home/pi/clk/.env
ExecStart=/usr/local/bin/clk-api
Restart=on-failure
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/clk-telegram-bot.service
[Unit]
Description=CLK Telegram bot
After=clk-api.service
Requires=clk-api.service
[Service]
User=pi
WorkingDirectory=/home/pi/clk
EnvironmentFile=/home/pi/clk/.env
ExecStart=/usr/local/bin/clk-telegram-bot
Restart=on-failure
[Install]
WantedBy=multi-user.target
Enable both: sudo systemctl enable --now clk-api clk-telegram-bot.
You should now see: from your phone, /status returns the current
workspace list. Sending /run improve the README kicks off a CLK run and
the bot replies with a task ID.
| Command | Effect |
|---|---|
| /start | Greet, show your user ID, indicate whether allowlisted |
| /help | Show this command list |
| /status | List workspaces and last task ID |
| /run <objective> | Start a single CLK run with the given objective |
| /loop [args] | Start the Ralph / autoresearch loop |
| /plan <topic> | Run the planning workflow |
| /idea <text> | Capture an idea |
| /cancel [task_id] | Cancel a running task (latest if omitted) |
| /tail [N] | Print the last N lines of activity.jsonl (default 20) |
| /subscribe | Receive live event pushes in this chat |
| /unsubscribe | Stop receiving live event pushes |
| /workspace <id> | Set the default workspace for this chat |
Any plain text (no slash) from an allowlisted user is treated as
/run <text> — so you can just describe what you want.
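The routing rule above (slash commands are parsed, anything else becomes a run) might look like this in outline. parse_message is a hypothetical helper; stripping an @botname suffix reflects how Telegram clients can address commands in group chats, not something this README documents about the bot.

```python
from typing import Tuple


def parse_message(text: str) -> Tuple[str, str]:
    """Split a chat message into (command, argument).

    Plain text without a leading slash is treated as `/run <text>`.
    """
    stripped = text.strip()
    if stripped.startswith("/"):
        command, _, argument = stripped[1:].partition(" ")
        # '/status@my_clk_bot' and '/status' name the same command.
        command = command.split("@", 1)[0].lower()
        return command, argument.strip()
    return "run", stripped
```

So "improve the README" and "/run improve the README" both resolve to the run command with the same objective.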
Either re-run scripts/telegram_setup_wizard.sh (it appends new IDs to
the existing list) or edit CLK_TELEGRAM_ALLOWED_USERS in .env
directly:
# .env
CLK_TELEGRAM_ALLOWED_USERS=123456789,987654321,555666777
Restart clk-telegram-bot to pick up the change.
- Bot doesn't reply. Send /start and check the reply for your user ID. If you get the "Not allowlisted" message, add the ID to CLK_TELEGRAM_ALLOWED_USERS and restart the bot.
- token rejected by Telegram (during the wizard). The token is wrong or was revoked. Get a fresh one from BotFather with /token.
- No live updates even after /subscribe. Confirm that the bot can read the activity log: CLK_TELEGRAM_ACTIVITY_LOG overrides the default path, or the bot auto-detects $CLK_WORKSPACES_DIR/<workspace>/.clk/logs/activity.jsonl.
- clk-telegram-bot --check-config exits non-zero. It prints which variable is missing (2 = token, 3 = empty allowlist).
- Kickoff prompts every run. Set CLK_TELEGRAM_SKIP=true in .env to permanently suppress the "Set up Telegram bot now?" prompt.
CLK tries hard to never leave you with a broken setup or a stuck session. The safety nets:
| Safety net | When it kicks in | How to use it |
|---|---|---|
| .env.bak rotation | Every wizard run rotates the old .env to .env.bak before writing. | ./kickoff.sh --restore swaps it back. |
| Atomic .env writes | Wizards write to .env.tmp and rename; Ctrl-C mid-write leaves either the old or the new file intact, never half. | Automatic; no user action. |
| Atomic JSON config writes | Same pattern for .clk/config/*.json and any agent-written JSON, with .bak rotation. | Implemented in clk_harness.config.save_json. |
| Per-step wizard resume | Wizard tracks last completed step in .clk/.setup-progress. If you Ctrl-C, the next run offers to resume. | ./kickoff.sh --setup prompts "Resume from after step X? [Y/n]". |
| Crashed-session detection | The TUI writes its PID to .clk/state/.tui-active. If a previous TUI exited uncleanly, the next launch surfaces "recovered from a crashed session" and points to the preserved .clk/state/conversation.md. | Automatic. |
| /undo | After every agent commit, /undo lets you preview and revert the last commit. Two-step (preview first, then /undo confirm) so it's never accidental. | Type /undo in the TUI. |
| /abort | When an agent subprocess is stuck, SIGTERM it without killing the TUI. The provider returns a timeout error, the cycle reports the failure cleanly. | Type /abort in the TUI. |
| /install / /configure | Recover from "CLI not found" / "auth failed" without leaving the dashboard. | /install [provider] then /configure [provider]. |
| Pre-push secret scanner | Installed in the kickoff dir's .git/hooks/pre-push. Greps for ANTHROPIC_API_KEY=, OPENAI_API_KEY=, sk-…, private-key headers. Bypass with git push --no-verify when sure. | Automatic in every kickoff dir. |
Confirmation policy. Every install, push, undo, cost-cap
crossing, ollama pull, and destructive --clean action asks [y/N]
every single time. There is no "remember my answer" shortcut — by
design.
kickoff.sh --setup offers to wire each kickoff workspace up to a
GitHub remote so every CLK commit is checkpointed off your machine.
Three modes:
- skip: no GitHub, local commits only (default).
- existing: paste a https://github.com/OWNER/REPO or git@github.com:OWNER/REPO.git URL; the wizard validates it via gh repo view (or git ls-remote if gh isn't on PATH).
- create: provide owner/repo (default $USER/$CLK_PROJECT_NAME-kickoff) and the wizard runs gh repo create … --private from inside the kickoff dir. Default visibility is private; making it public requires an explicit choice.
Auth. Prefer the gh CLI if it's on PATH and authenticated. If
not, the wizard offers to install gh and drops you into a shell
for gh auth login (same pattern as pi login). PATs are stashed
in ~/.config/clk/github-token (chmod 600), never .env.
Hardened .gitignore. Written before the first push so secrets
can't leak. Blocks .env, .env.bak, .env.local, *.pem,
*.key, *_id_rsa*, /secrets/, plus editor / OS junk.
Pre-push hook. .git/hooks/pre-push greps the about-to-push
objects for obvious key patterns (Anthropic / OpenAI / OpenRouter /
Gemini / Google keys, generic sk-… strings, Slack xoxb- tokens,
private key headers). On a hit the push aborts with the offending
lines and the bypass instructions. Bypass once with git push --no-verify.
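The scanning idea can be shown with a handful of regular expressions. The real hook is a shell script using grep and its exact patterns are not listed here, so the patterns, minimum lengths, and the find_secrets name below are illustrative assumptions covering only some of the key families mentioned above.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; the hook's actual list may differ.
SECRET_PATTERNS = [
    re.compile(r"\b(?:ANTHROPIC|OPENAI)_API_KEY\s*=\s*\S+"),
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),
    re.compile(r"\bxoxb-[A-Za-z0-9-]{10,}"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line matching a secret pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((number, line))
    return hits
```

A hook built on this would abort the push whenever find_secrets returns a non-empty list and print the offending lines, matching the behaviour described above.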
CLK_GITHUB_PUSH_ON_COMMIT=true makes the harness follow every
auto-commit with a git push origin HEAD. Failures are non-fatal —
the commit stays local until the network or remote is back. The TUI
title bar shows ↑N for the count of unpushed commits.
Re-link from the TUI. Type /github to see current remotes and
re-link instructions.
Two new commands help when something feels off.
Health-check every provider, validate .env against known-bad
combos, and check git/GitHub state.
- Reports each finding as ok | warn | fail.
- Exits non-zero on any fail so it slots into CI.
- /doctor --fix prompts before each automated remedy (running /install, re-running configure_tool, writing a missing key).
Common findings:
| Finding | Meaning | Fix |
|---|---|---|
| `claude: unavailable` | `claude` CLI not on PATH or API key missing | `/install claude` then `/configure claude` |
| `anthropic_key: fail` | `CLK_AUTH_MODE=apikey` but `ANTHROPIC_API_KEY` is empty | `/configure claude` to set it |
| `git: warn` | no git repo at project root; auto-commit disabled | `git init` |
| `ollama: unavailable` | endpoint not reachable | `/install ollama`, then `ollama serve &` |
Bundles the current state into a clk-diag-<ts>.tar.gz for sharing
in bug reports. Contents:
- `.clk/logs/*` (recent only; capped so the bundle stays small)
- `.clk/runs/<last-3>/`
- `.clk/state/*.{md,json}`
- `clk doctor` output
- `pyproject.toml` version, `python --version`, `git --version`, `uname -a`
- A redacted copy of `.env`: every value under a key containing `KEY`, `TOKEN`, `SECRET`, or `PASS` is replaced with `<redacted: N chars>` so the recipient can confirm you had a key without seeing it.
Always confirms before writing the tarball.
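The `.env` redaction rule is simple enough to sketch in a few lines of Python. This is an illustration of the rule as stated, not the harness's code: `redact_env` is a hypothetical name, and matching the markers case-insensitively is an assumption.

```python
SENSITIVE_MARKERS = ("KEY", "TOKEN", "SECRET", "PASS")


def redact_env(text):
    """Replace values whose key contains a sensitive marker."""
    redacted = []
    for line in text.splitlines():
        stripped = line.strip()
        # Leave blanks, comments, and lines without '=' untouched.
        if not stripped or stripped.startswith("#") or "=" not in line:
            redacted.append(line)
            continue
        key, value = line.split("=", 1)
        # Assumption: markers are matched case-insensitively.
        if any(marker in key.upper() for marker in SENSITIVE_MARKERS):
            value = f"<redacted: {len(value)} chars>"
        redacted.append(f"{key}={value}")
    return "\n".join(redacted)
```

Note that under this rule a non-secret setting such as `CLK_PI_KEY_TYPE` is also redacted, because its name contains `KEY`.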
First-time users can type /tutorial in the TUI to run a
30-second sample idea — "Add a hello() function to greeter.py" —
against the shell provider. Costs nothing, takes no API keys,
demonstrates the cast → engineer → qa → commit loop end-to-end so
the user knows what a "real" run will look like.
The tutorial backs up your active provider, runs one engineering
cycle in .clk/state/.tutorial/, then restores. A marker at
.clk/state/.seen-tutorial suppresses the "type /tutorial" hint
in the welcome banner on subsequent runs.
Each kickoff.sh creates workspace/kickoff-<timestamp>/. To keep
the directory navigable:
./kickoff.sh --list # show every kickoff with its idea
./kickoff.sh --clean 7d # delete kickoff dirs older than 7 days (after y/N)
./kickoff.sh --clean 30m # same, in minutes
./kickoff.sh --restore # roll .env back to .env.bak (undo last wizard run)

From inside the TUI:
/workspaces list # numbered list, * marks the current one
/workspaces rename old-name new # rename a kickoff dir
/workspaces switch <name> # prints instructions (/quit, then cd)
/workspaces clean # points at ./kickoff.sh --clean
The kickoff manifest at KICKOFF.md (written by kickoff.sh into
each new workspace) records timestamp, source dir, project name,
provider, max iterations, install flag, and idea.
Title-bar dollar cost is computed from the per-provider table in
clk_harness/pricing.py:
| Provider | Default $/1k in | Default $/1k out |
|---|---|---|
| claude (sonnet-4-5) | $0.003 | $0.015 |
| claude (haiku-latest) | $0.0008 | $0.004 |
| claude (opus-latest) | $0.015 | $0.075 |
| codex (gpt-4o) | $0.0025 | $0.010 |
| codex (gpt-4o-mini) | $0.00015 | $0.0006 |
| codex (o1) | $0.015 | $0.060 |
| gemini (1.5-pro) | $0.00125 | $0.005 |
| gemini (1.5-flash) | $0.000075 | $0.0003 |
| pi | $0.003 | $0.015 (blended default; override per route) |
| ollama / shell | $0.00 | $0.00 |
Override per project by adding to .clk/config/providers.json:
Or per model:
"pricing_by_model": { "openrouter/free": { "input_per_1k": 0.0, "output_per_1k": 0.0 } }

/status prints the per-provider breakdown so you can see which
provider is eating the budget. Updated lazily from the same numbers
the title bar shows.
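As a worked example of the arithmetic, the sketch below computes a dollar estimate from per-1k rates. The two rates are copied from the pricing table; the dictionary layout and the `estimate_cost` name are assumptions for illustration, not the contents of `clk_harness/pricing.py`.

```python
# Per-1k-token USD rates copied from two rows of the pricing table.
PRICING = {
    "claude (sonnet-4-5)": {"input_per_1k": 0.003, "output_per_1k": 0.015},
    "codex (gpt-4o-mini)": {"input_per_1k": 0.00015, "output_per_1k": 0.0006},
}


def estimate_cost(provider, tokens_in, tokens_out, pricing=PRICING):
    """Return the estimated USD cost of one dispatch."""
    rates = pricing[provider]
    cost_in = (tokens_in / 1000) * rates["input_per_1k"]
    cost_out = (tokens_out / 1000) * rates["output_per_1k"]
    return cost_in + cost_out
```

For instance, 2,000 input tokens and 500 output tokens on the sonnet row come to 2 × $0.003 + 0.5 × $0.015 = $0.0135.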
The robustness loops (see Robustness loops) trade tokens for quality. Use this table to pick a regime:
| Knob | Worst-case multiplier per affected dispatch | Recommended starting point |
|---|---|---|
| `robustness.auto_consensus` | `off` → ×1; `on_careful` → ×(N+1) on careful stages only; `always` → ×(N+1) on every dispatch (where N = `consensus.max_samples`, default 6) | `on_careful` (default) |
| `robustness.auto_refine` | `off` → ×1; `careful_only` → ×(1 + 1 worker revision + 1 critic) on careful stages; `all` → that on every stage | `careful_only` (default) |
| `robustness.max_quality_retries` | At most this many extra dispatches when a response fails the quality check; `0` disables | `2` (default) |
| `robustness.refine_max_rounds` | Cap on critic↔worker round-trips inside a refine loop | `3` (default) |
| `robustness.max_qa_depth` | Cap on inter-agent Q&A chain depth (each peer answer can ask one peer) | `3` (default) |
| `robustness.plateau_window` | How many no-improvement Ralph/autoresearch iterations before escalation | `3` (default) |
| `robustness.plateau_action` | `off` disables adaptive loop termination entirely | `escalate_then_reframe` |
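The first two rows of the table can be read as small functions. The Python sketch below restates those multipliers for one dispatch; the function names are hypothetical, and how the two multipliers combine when both apply is not specified in this document, so the sketch keeps them separate.

```python
def consensus_multiplier(mode, careful, max_samples=6):
    """Worst-case dispatch multiplier from auto_consensus alone."""
    if mode == "always" or (mode == "on_careful" and careful):
        return max_samples + 1  # N samples + 1 chief coalescing dispatch
    return 1


def refine_multiplier(mode, careful):
    """Worst-case multiplier from auto_refine with a single critic pass."""
    if mode == "all" or (mode == "careful_only" and careful):
        return 3  # original draft + 1 worker revision + 1 critic
    return 1
```

With the defaults, a careful stage can cost up to 7 dispatches from consensus alone, while a stage without `careful: true` stays at 1.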
Cost-minimal regime (closest to legacy CLK behavior, no extra tokens):
"robustness": {
"auto_consensus": "off",
"auto_refine": "off",
"max_quality_retries": 0,
"plateau_action": "off"
}

Cost-maximal "lean into the loop" regime (every dispatch fans out, critic gates every careful stage, plateau detection on, Q&A protocol fully open):
"robustness": {
"auto_consensus": "always",
"auto_refine": "all",
"max_quality_retries": 3,
"refine_max_rounds": 4,
"plateau_action": "escalate_then_reframe"
}

A native pi.dev extension that brings the full CLK
orchestration model — dynamic casting, stochastic consensus, Ralph
refinement, and Karpathy-style autoresearch — into Pi behind a single
/clk command. No Python harness required at runtime.
The TypeScript extension now ports the harness's response-quality
scoring and consensus fan-out as real tools (clk_consensus,
clk_subagent_quality, clk_autoresearch, clk_ralph) rather than
relying on chief compliance — every parallel sample is scored by the
same rules clk_harness/orchestration/response_quality.py uses, the
winner is picked in code, and Ralph branches are created by the tool so
the protocol can't be skipped. See pi-extension/README.md
for the full tool reference, state layout, error handling, and
customisation notes.
Requirements: Pi on PATH; tmux on PATH; Git on PATH.
Install:
| Option | Command | When to use |
|---|---|---|
| Quick test | `pi -e /path/to/CognitiveLoopKernel/pi-extension/src/index.ts` | Try it out; reloads on `/reload` |
| Project-local | `mkdir -p .pi/extensions && ln -s /path/to/CognitiveLoopKernel/pi-extension .pi/extensions/clk` | Version-controlled per project |
| Global | `mkdir -p ~/.pi/agent/extensions && ln -s /path/to/CognitiveLoopKernel/pi-extension ~/.pi/agent/extensions/clk` | Available in every Pi session |
Commands:
| Command | Effect |
|---|---|
| `/clk <idea>` | Capture the idea and hand off to the chief. Resumes if state exists. |
| `/clk-abort` | End the active run. State is preserved; resume with `/clk` later. |
| `/clk-help` | List every CLK slash command, every orchestration tool the chief uses, and the active safety nets. |
| `/clk-doctor` | Health-check tmux, git, the workspace `.clk/` layout, the pre-push hook, and (when a remote exists) the count of local commits not yet pushed. |
| `/clk-undo` | Preview the last CLK commit; `/clk-undo confirm` creates a revert commit on top of it. |
Orchestration tools the chief uses (you don't call these directly):
| Tool | Purpose |
|---|---|
| `clk_cast` | Persist a roster of project-specific specialist roles. |
| `clk_subagent` | Raw single-subagent dispatch via a detached tmux pi session. |
| `clk_subagent_quality` | One subagent + automatic repair-preamble re-rolls on quality failures. |
| `clk_consensus` | Fan out N parallel samples (default 3, max 6), score each, return the winner plus every candidate's score. |
| `clk_autoresearch` | Bounded researcher + critic alternation; each iteration recorded on the progress log. |
| `clk_ralph` | Create a `ralph/<iter>` branch and run a consensus fan-out in one call; chief then calls `clk_merge` or `clk_revert`. |
| `clk_branch` / `clk_merge` / `clk_revert` / `clk_checkpoint` | Git plumbing for the Ralph iteration cycle. |
| `clk_progress` | Append a one-line entry to `.clk/state/progress.md`. |
| `clk_done` | Mark the run complete and write `.clk/state/done.md`. |
Optional env vars:
| Variable | Effect |
|---|---|
| `CLK_GITHUB_PUSH_ON_COMMIT=true` | After every `clk_checkpoint` and `clk_merge`, run `git push origin HEAD` best-effort and surface an `↑N` ahead counter if the push fails. Same env var as the Python TUI. |
A typical session:
> /clk a local-first journaling app that summarizes my week
[CLK run started. The chief is taking over.]
[chief casts engineer, ux_writer, summarizer, qa]
[chief calls clk_consensus({agent:"architect", samples:3, task:"... storage design ..."})]
[harness fans out 3 parallel tmux pi subagents, scores each, returns the winner]
[chief calls clk_autoresearch({question:"sync model: append-only vs CRDT?"})]
[chief calls clk_ralph({iterationName:"iter-1-mvp", agent:"engineer", task:"... build MVP ..."})]
[chief calls bash: pytest -q]
[chief calls clk_merge: "ralph win: MVP capture+persist+summarize"]
[chief calls clk_done: "MVP runs; tests pass; README + deploy plan present"]
The package itself:
clk_harness/
  api.py # FastAPI REST API server
  _api_launcher.py # background daemon thread launcher (auto-start on CLI)
  _api_shim.py # console-script shim for clk-api (guards ImportError)
  cli.py # argparse entrypoint
  config.py # paths, default configs, JSON load/save
  git_ops.py # init, commit, revert, status helpers
  providers/ # claude, codex, pi, ollama, shell adapters
  orchestration/ # agent runner, workflow runner, ralph loop (refinement + autoresearch)
  templates/ # bundled prompts and workflows
  utils/ # logging
scripts/
  clk # launcher (prefers .clk/venv/bin/python)
  install_local.sh # creates .clk/venv and installs PyYAML
  run_loop.sh # convenience wrapper around clk loop
  run_all_tests.sh # orchestrator: build + test in ephemeral Docker
tests/ # pytest regression suite (CI-gated)
user_tests/ # pytest end-to-end suite (drives CLI + REST API)
pi-extension/ # standalone Pi extension (TypeScript)
  src/
    index.ts # /clk + /clk-help + /clk-doctor + /clk-undo, session lifecycle
    prompts.ts # the chief's operator's manual
    tools.ts # clk_cast / clk_progress / clk_checkpoint / clk_branch /
             # clk_merge / clk_revert / clk_consensus / clk_subagent_quality /
             # clk_autoresearch / clk_ralph / clk_done
    subagent.ts # raw clk_subagent; spawnSubagent() exposed for consensus
    consensus.ts # dispatchWithQuality + runConsensus (port of agent.py)
    quality.ts # scoreResponse + repairHint (port of response_quality.py)
    git.ts # checkpoint, branch, merge, revert + hasRemote / commitsAhead /
           # pushBestEffort (port of git_ops.py auto-push helpers)
    state.ts / abort.ts / errors.ts / types.ts
  tests/ # node --test suites covering every file in src/
docs/
  REST_API.md # full REST API reference
The harness state, written by clk init and grown by every command:
.clk/
  config/
    clk.config.json # project-wide config (incl. casting + recovery caps)
    providers.json # provider registry + active provider
    agents.json # agent -> prompt + provider mapping (mutable)
    workflows/*.yaml # Archon-style workflows (chief authors per project)
  prompts/ # editable prompt templates (one per agent;
           # dynamic roles get a generated file here)
  state/
    idea.json # captured idea
    system_brief.md # initial brief
    prd.json # product manager output
    progress.md # human-readable timeline
    decisions.md # decisions log
    experiments.jsonl # per-iteration outcomes
    agent_memory.jsonl # all agent invocations (incl. token usage)
    casting.log # JSONL of every roster decision (add/update/remove)
    done.md # written only when completion criteria met
  logs/
    activity.jsonl # detailed agent activity log
    session.log # mirror of the TUI status pane
    <cmd>-<ts>.log # per-command log files
  runs/ # per-invocation prompt + response capture
  tools/ # locally-cloned external tools (e.g. pi)
  venv/ # local python venv
  backups/ # safety copies of overwritten files (per run)
| Provider | Detection | Notes |
|---|---|---|
| shell | always available | dummy; echoes prompts and writes stub files. Use for tests, CI, dry runs. |
| claude | `claude` on PATH | runs `claude --print` non-interactively. Add `"args": ["--print", "--output-format", "json"]` to `providers.json` to get real token counts. |
| codex | `codex` on PATH | runs `codex exec`. |
| gemini | `gemini` on PATH | runs the Google Gemini CLI; prompt fed on stdin. |
| pi | `pi` on PATH or `.clk/tools/pi/bin/pi` | pi.dev terminal harness; supports model selection, OpenRouter, and any API-key provider. See below. |
| ollama | TCP reachable at endpoint | local-only LLM via HTTP. Use a ≥14B model (e.g. `qwen3:14b`); see Ollama provider for why. |
| openwebui | TCP reachable at endpoint | any OpenAI-compatible server. Configure `endpoint`, `api_key`, `model` in `providers.json`; kickoff offers a numbered model picker fetched from `/api/models`. |
./scripts/clk providers prints availability as JSON. Customize per
provider in .clk/config/providers.json.
For the CLI-driven providers (claude, codex, gemini) you can
choose how authentication works at kickoff:
- `CLK_AUTH_MODE=cli` (default): spawn the provider's local CLI as a subprocess and trust whatever auth that CLI already has. If you've run `claude login` / `codex login` / Gemini sign-in, no API key is required and kickoff will not prompt for one. Persisted to `providers.json` as `"mode": "cli"`.
- `CLK_AUTH_MODE=apikey`: call the upstream HTTP API directly (no local CLI is spawned at all). Kickoff prompts for the standard env var (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY` / `GOOGLE_API_KEY`) and stores it in `providers.json` along with `"mode": "api"`. Each provider has a built-in HTTP client (Anthropic Messages, OpenAI Chat Completions, Gemini `generateContent`).
The other providers don't need this knob: shell and ollama are
local, openwebui uses an explicit bearer token, and pi has its own
authentication model described below.
Ollama is local and free — no API key, no rate limits — which makes
it tempting to default to. The catch is that CLK asks the chief to
emit machine-parseable YAML workflows, and small open-weight models
(≤8B parameters) are inconsistent at this. Specifically, the chief
will occasionally produce a PROPOSE_WORKFLOW block where a list
item contains an unquoted colon (e.g. [type:finding, stage:create_file]), which YAML can't parse.
What you'll see when this happens:
[workflow] PROPOSE_WORKFLOW parse failed: mapping values are not
allowed here :: keeping prior workflow
[supervise] cycle N/M no progress (workflow still has zero new stages)
The harness handles this safely — it detects the bad YAML, refuses to clobber the existing workflow file, falls back to the bundled engineering template, and the supervise loop keeps the run alive until its cap. But the visible symptom is a loop that "spins" without forward progress, which is frustrating.
Recommendation: use qwen3:14b or larger as the minimum. It
follows the structured-output instructions reliably enough that the
chief's proposals parse on the first try. Pull it with:
ollama pull qwen3:14b

Other ≥14B options that work well: llama3.1:70b, qwen2.5-coder:32b,
deepseek-r1:14b. Models ≤8B (llama3.2, gemma2, qwen2.5:7b,
phi3) are fine for chat but flaky for workflow generation — they'll
get through some cycles cleanly but fail the YAML contract often
enough that the loop won't make steady progress.
Memory rule of thumb: a 14B Q4 model needs ~10 GB of RAM/VRAM; 32B
needs ~20 GB; 70B needs ~40 GB. The setup wizard's ollama section
streams ollama pull progress so you can see download size before
it lands.
pi (from pi.dev) is an extensible terminal
harness. CLK drives it as a subprocess, piping the prompt on stdin and
capturing stdout as the agent response.
Model selection
Pass a model to pi via CLK_PI_MODEL:
CLK_PI_MODEL=openrouter/free # free tier via OpenRouter
CLK_PI_MODEL=openrouter/auto # let OpenRouter pick the best available free model
CLK_PI_MODEL=anthropic/claude-3-5-sonnet # specific model via OpenRouter

Leave CLK_PI_MODEL blank to use pi's own active profile or default.
The value is forwarded to pi as pi --model <value>.
API keys
Pi reads provider-specific environment variables — one per backend. Two settings control this:
| Setting | Purpose |
|---|---|
| `CLK_PI_KEY_TYPE` | The provider your key belongs to (default: `openrouter`) |
| `CLK_PI_API_KEY` | The actual key value |
The harness derives the env var name by convention:
{CLK_PI_KEY_TYPE.upper()}_API_KEY. So:
| `CLK_PI_KEY_TYPE` | Env var set for pi |
|---|---|
| `openrouter` | `OPENROUTER_API_KEY` |
| `openai` | `OPENAI_API_KEY` |
| `anthropic` | `ANTHROPIC_API_KEY` |
| `mistral` | `MISTRAL_API_KEY` |
| any future provider | `{NAME}_API_KEY` automatically |
This means new providers require no code changes — just set
CLK_PI_KEY_TYPE to the provider name and CLK_PI_API_KEY to your key.
Leave CLK_PI_API_KEY blank if you have already run pi login and pi
has its own stored credentials.
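The naming convention can be restated as a short Python sketch. The function names are hypothetical, and the code mirrors the convention described here rather than the harness's actual implementation.

```python
def pi_env_var_name(key_type="openrouter"):
    """Derive the env var name: {CLK_PI_KEY_TYPE.upper()}_API_KEY."""
    return f"{key_type.upper()}_API_KEY"


def pi_environment(key_type, api_key, base_env):
    """Return a copy of base_env with the derived variable set when a key is given."""
    env = dict(base_env)
    if api_key:  # blank means: rely on credentials stored by `pi login`
        env[pi_env_var_name(key_type)] = api_key
    return env
```

For example, `pi_env_var_name("mistral")` yields `MISTRAL_API_KEY`, which is why a new provider needs no code change under this convention.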
Interactive pi setup
If you need to run pi login, configure a profile, or verify your
setup interactively, kickoff offers to open pi's TUI before launching
the harness. You'll be prompted at the end of the pi configuration
questions during both --setup and a normal kickoff run (when pi is
on PATH). Exit pi normally when done and kickoff will continue.
This is useful for first-time Docker sessions where pi has no stored credentials yet:
# Run the setup wizard — it will offer to open pi if found on PATH
./kickoff.sh --setup

Pi's own state (credentials, profiles) is stored in pi's own config
directory (e.g. ~/.pi/) — no extra Docker volume is required for
CLK's harness state, but if you want pi credentials to persist across
container restarts, mount the pi config directory:
docker run --rm -it \
-v ~/.pi:/root/.pi \
-v clk-workspace:/app/workspace \
-e CLK_PROVIDER=pi \
-e CLK_PI_MODEL=openrouter/free \
-e CLK_PI_KEY_TYPE=openrouter \
-e OPENROUTER_API_KEY=sk-or-... \
clk "My idea"

Alternatively, pass the API key directly via CLK_PI_API_KEY and skip
pi login altogether — kickoff will set the right env var for you.
The kickoff dir lays the agents' work out as a normal project tree
with all harness machinery folded under .clk/:
workspace/kickoff-<ts>/
  src/, tests/, README.md ... # the project the agents are building
                              # (agents write directly to project root)
  scripts/clk # convenience launcher shim
  KICKOFF.md # provenance manifest
  .gitignore # hardened: blocks .env, .env.bak, *.pem, …
  .git/hooks/pre-push # secret scanner; aborts on key patterns
  .clk/ # ALL harness state, sandboxed off
    .setup-progress # per-step resume marker for the wizard
    harness/clk_harness/ # harness sources copied from parent
    harness/scripts/ # original launcher / installer
    harness/pyproject.toml # package metadata for pip install -e
    config/ # clk.config.json, providers.json, agents.json
            # each written atomically with a .bak rotation
    state/ # idea.json, prd.json, decisions.md ...
           # plus:
           # .seen-welcome first-run banner marker
           # .seen-tutorial /tutorial done marker
           # .tui-active PID lock (crashed-session detection)
           # configured-tools.json which tools have had configure_tool run
           # session-cost.json persisted USD totals
    prompts/ # per-agent system prompts
    blackboard/ # cross-agent shared scratchpad (POST blocks land here)
    runs/ # per-dispatch prompt + response logs
    backups/ # pre-write copies of mutated files
    cache/, logs/, venv/ # local-only artifacts
The repo root also adds:
- `scripts/lib_env.sh`: shared atomic-write helpers (`env_set`, `env_get`, `env_atomic_write`, `env_restore`) sourced by both wizards.
- `scripts/install_tool.sh`: install + check + configure registry for every supported tool. Used by `kickoff.sh --setup` and by the TUI's `/install` / `/configure` commands.
- `clk_harness/pricing.py`: per-provider USD pricing table backing the title-bar cost estimate.
- `~/.config/clk/github-token`: when present (chmod 600), used in place of the `gh` CLI for GitHub operations.
ACTION blocks resolve relative to the project root. The harness rejects
any path that resolves into .clk/ so agents can't accidentally (or
intentionally) write into harness state. run commands cwd into the
project root. To share findings across agents, workers emit POST
blocks; the harness routes those into .clk/blackboard/ even though
agents cannot write there directly.
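One way to picture the path check is the Python sketch below. It is an assumption about how such a guard could work, not the harness's code; `resolve_action_path` is a hypothetical name, and `Path.is_relative_to` requires Python 3.9 or newer.

```python
from pathlib import Path


def resolve_action_path(project_root, relative_path):
    """Resolve an ACTION path and reject anything outside the root or inside .clk/."""
    root = Path(project_root).resolve()
    target = (root / relative_path).resolve()  # collapses ".." and follows symlinks
    if not target.is_relative_to(root):
        raise ValueError(f"path escapes project root: {relative_path}")
    if target.is_relative_to(root / ".clk"):
        raise ValueError(f"path resolves into .clk/: {relative_path}")
    return target
```

Resolving before checking matters: a path such as `src/../.clk/config/agents.json` looks harmless as a string but lands inside harness state once normalized.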
The kickoff .gitignore keeps .clk/ out of git except for the
curated state files (idea.json, system_brief.md, prd.json,
decisions.md, progress.md, casting.log, done.md, plus the
blackboard) so git log in the kickoff dir tells the project's story
without harness chatter. Deleting .clk/ resets the harness without
touching the project tree.
The default engineering workflow ends with a supervise stage where
the chief evaluates whether the user's prompt has been fully addressed.
The chief either:
- emits `ACTION: done` with a one-line reason, which writes `.clk/state/done.md` and terminates the loop, or
- emits `PROPOSE_WORKFLOW` with the next iteration's stages, which the workflow runner picks up to run another cycle.
So no agent is ever truly "done" until the chief signals completion.
Capped at clk.config.json::supervise.max_cycles (default 5) to avoid
runaway loops.
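The control flow of the supervise loop can be sketched as follows. This is a simplified illustration with hypothetical names: `run_cycle` and `supervise` stand in for the workflow runner and the chief's supervise stage, and the verdict tuples are an assumed representation.

```python
def run_supervised(run_cycle, supervise, max_cycles):
    """Run workflow cycles until the chief signals done or the cap is hit.

    run_cycle(stages) executes one workflow cycle; supervise() returns
    ("done", reason) or ("propose", next_stages).
    """
    stages = None  # None means: run the current workflow as-is
    for cycle in range(1, max_cycles + 1):
        run_cycle(stages)
        verdict, payload = supervise()
        if verdict == "done":
            return {"status": "done", "cycles": cycle, "reason": payload}
        stages = payload  # stages proposed for the next cycle
    return {"status": "capped", "cycles": max_cycles, "reason": None}
```

The cap is the only exit other than an explicit `done`, which is what keeps a chief that never signals completion from looping forever.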
The harness ships with three baseline agents that cannot be removed:
- `chief`: decomposes objectives, casts the team, authors workflow YAML.
- `qa`: default validator.
- `ralph`: drives both the Ralph refinement loop and Karpathy-style autoresearch cycles; the mode is inferred from the current project state.
Everything else is dynamic. On the first user message, the chief is
auto-dispatched with the captured idea and casts the project-specific team,
including engineer when an implementer is needed (e.g. data_steward,
ml_evaluator, api_contract, ux_writer, security_auditor).
The name engineer is reserved: the harness actively rejects any attempt
to create engineering, coder, developer, or other aliases, and
reports the denial directly to the chief via its $casting_feedback context
so it learns to use engineer directly. Each role decision is
applied immediately and persisted to .clk/config/agents.json plus
.clk/state/casting.log (JSONL, one entry per add/update/remove).
Type /cast in the TUI to force a re-cast at any time, or run
clk cast from the CLI. To inspect or edit by hand:
clk roles list|add --name X --role "..."|remove --name X.
Agents communicate via a blackboard at .clk/blackboard/ — short
markdown POST blocks each agent emits at the end of its run, filtered
into peers' prompts based on each stage's inputs: selectors.
Directed clarifying questions are a special POST type
(POST: question TO: <peer> URGENCY: blocking) routed inline by the
harness — see Robustness loops for the protocol details and depth
caps.
Agents drive real changes by emitting ACTION: blocks the harness
parses and applies — descriptions alone do nothing. Supported kinds:
- `ACTION: write/edit/append/delete`: file mutations (paths must resolve inside the project root; originals are backed up to `.clk/backups/<run_id>/`).
- `ACTION: run`: shell command, runs in project root, output captured to the log; rejects `sudo` and obvious-foot-gun patterns.
- `ACTION: done`: writes `.clk/state/done.md`, signaling the loops to stop.
Every agent run that mutates files produces an immediate structured
git commit ([agent] objective with files, commands, token totals in
the body). A cap from clk.config.json::validation.max_files_per_batch
(default 25) limits damage from a runaway agent.
When a workflow stage's dependencies fail, the harness dispatches the
chief in recovery mode with the exact failure reasons (agent error,
validation output) and asks them to either re-cast the workflow,
emit ACTION blocks that fix the upstream failure, or PROPOSE_ROLE
a specialist that can. Capped at 3 recovery passes per stage
(configurable via clk.config.json::recovery::max_per_stage).
This section is about dependency failures. Content failures (empty,
malformed, or low-confidence agent output that nonetheless returned
ok=True) are handled by the response-quality re-dispatch loop
documented in Robustness loops below.
YAML workflows live in .clk/config/workflows/. The default
engineering.yaml is intentionally minimal (chief → engineer → qa);
the chief overwrites it on first cast with a project-tailored cycle.
The bundled scaffolds:
- `discovery.yaml` - validate problem, users, landscape.
- `product.yaml` - PRD + technical architecture.
- `engineering.yaml` - baseline cycle; chief replaces this per project.
- `validation.yaml` - drive toward a green test suite.
- `deployment.yaml` - deployment recipe + checklist.
- `ralph_loop.yaml` - single Ralph iteration (use `clk loop` to repeat).
Stage schema:
- id: implement
  agent: engineer
  objective: Implement the smallest vertical slice.
  depends_on: [architect]
  validation: "pytest -q"
  commit: true

When validation is set, the command must exit 0 before the harness
will commit. Failed validations leave the working tree untouched (and
in the Ralph loop, are reverted to the pre-iteration HEAD).
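The validation gate amounts to "run the command, commit only on exit 0". A minimal Python sketch, assuming hypothetical `commit` and `revert` callbacks in place of the harness's git helpers:

```python
import subprocess


def validate_then_commit(validation, commit, revert=None):
    """Run the validation command; commit on exit 0, otherwise optionally revert."""
    if validation:
        result = subprocess.run(validation, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            if revert is not None:  # e.g. a Ralph iteration resetting to its starting HEAD
                revert()
            return False
    commit()  # no validation configured, or validation passed
    return True
```

A stage without a `validation` key commits unconditionally in this sketch, which matches the "when validation is set" wording above.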
Ralph runs in two modes (selected automatically based on project state,
or forced via /loop):
- Refinement mode (`/loop ralph N`, default). Each iteration: ralph picks one measurable improvement, the engineer implements it, QA validates, and the harness commits or reverts.
- Autoresearch mode (`/loop autoresearch N`). Each iteration: ralph surveys state, picks the highest-value open question, designs and runs a small experiment, and records the learning regardless of pass/fail.
Both modes respect max_iterations and stop early when
.clk/state/done.md is created. Both also auto-detect plateau and
regression and adapt — see Robustness loops below.
CLK leans into the loop: every dispatch is wrapped in self-correcting
behavior so the harness does not just accept the first thing a
sub-agent returns. This section is a single index of every loop the
harness runs — old and new — with the config knob that tunes each
one and the activity-log event you can grep for in .clk/logs/.
All knobs live under clk.config.json::robustness.* (and the
parallel CLK_ROBUSTNESS_* env-var family — see .env.example).
Every layer has an off-switch so you can throttle cost.
Transient provider errors (rate limits, timeouts, "no endpoints available", HTTP 429) are retried with exponential backoff before the response surfaces at the workflow layer.
- Code: `clk_harness/orchestration/agent.py::AgentRunner._should_retry_provider`
- Config: `clk.config.json::provider_retry.{max_retries, backoff_s}`
- Logged events: `provider_attempt`, `provider_retry`
- Kill switch: set `provider_retry.max_retries: 0`
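Retry with exponential backoff can be sketched as below. This is a generic illustration, not the code in `agent.py`: `TransientProviderError`, `call_with_retry`, the doubling schedule, and the default values are all assumptions; only the `max_retries` and `backoff_s` names come from the config keys above.

```python
import time


class TransientProviderError(Exception):
    """Hypothetical stand-in for rate limits, timeouts, and HTTP 429."""


def call_with_retry(call, max_retries=3, backoff_s=1.0, sleep=time.sleep):
    """Call the provider, retrying transient errors with a doubling delay."""
    attempt = 0
    while True:
        try:
            return call()
        except TransientProviderError:
            if attempt >= max_retries:
                raise  # budget exhausted: surface the error to the caller
            sleep(backoff_s * (2 ** attempt))  # 1x, 2x, 4x, ... the base delay
            attempt += 1
```

Setting `max_retries=0` makes the first transient error propagate immediately, which corresponds to the kill switch.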
When a workflow stage fails with a retryable provider error after the inner provider-retry budget is exhausted, the workflow runner retries the entire stage with a larger backoff before giving up on the stage.
- Code: `workflow.py::WorkflowRunner._is_retryable_stage_error`
- Config: `clk.config.json::provider_retry.{stage_max_retries, stage_backoff_s}`
- Logged events: `workflow_stage_retry`
- Kill switch: set `provider_retry.stage_max_retries: 0`
The chief's supervise stage decides whether the user's prompt has
been fully addressed; if not, it emits a PROPOSE_WORKFLOW and the
whole workflow re-runs. See Chief supervisor loop for the full
description.
- Config: `clk.config.json::supervise.max_cycles` (default 20)
- Kill switch: set `supervise.max_cycles: 1`
When a stage's dependencies fail, the chief is dispatched in recovery mode to re-cast, remediate, or accept the gap. See Self-healing on unmet deps. This handles dependency failures; content failures are handled by Layer 6 below.
- Config: `clk.config.json::recovery.max_per_stage` (default 3)
Stages marked phase: review automatically receive a chief-authored
review prompt containing the upstream stages' POST blocks, and the
chief emits a verdict (continue / redirect / abort). Stages marked
careful: true add a post-stage checkpoint and (when configured)
trigger meta-prompt drafting on dispatch.
Example:
- id: design_spec
  agent: architect
  careful: true
  outputs: [design_brief]
  objective: Draft the API contract.
- id: review_design
  agent: chief
  phase: review
  depends_on: [design_spec]

- Config: `clk.config.json::review.per_stage` (apply to every stage)
- Logged events: `workflow_checkpoint`, `consensus_coalesced`
After every dispatch, the response is scored against
response_quality:
- empty / sub-threshold text
- malformed `ACTION:` or `POST:` blocks
- missing declared `outputs` (the stage's contract keys)
- self-reported low confidence (`CONFIDENCE: <0..1>` parsed from the response)
- refusal patterns (treated as not-recoverable; surfaces to the chief instead of retrying blindly)
Recoverable failures are re-dispatched with a repair preamble that
quotes the specific reasons back to the worker, up to
robustness.max_quality_retries. On the final retry, when
auto_consensus is not "off", the dispatch escalates to a
stochastic consensus fan-out rather than another single-shot retry.
- Code: `orchestration/response_quality.py`, `agent.py::_dispatch_with_quality_loop`
- Config: `robustness.{max_quality_retries, min_response_chars}`
- Logged events: `agent_quality_retry`, `agent_quality_final`
- Kill switch: `robustness.max_quality_retries: 0`
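A few of the checks listed above can be sketched in Python. This is not the logic in `response_quality.py`: `quality_issues` is a hypothetical name, the thresholds (40 characters, confidence 0.5) are assumed defaults, and the malformed-block and refusal checks are omitted.

```python
import re

# Matches a line such as "CONFIDENCE: 0.7".
CONFIDENCE_RE = re.compile(r"^CONFIDENCE:\s*([01](?:\.\d+)?)\s*$", re.MULTILINE)


def quality_issues(response, declared_outputs=(), posted_outputs=(),
                   min_response_chars=40, min_confidence=0.5):
    """Return human-readable reasons a response fails the quality check."""
    issues = []
    if len(response.strip()) < min_response_chars:
        issues.append("empty or sub-threshold text")
    missing = [key for key in declared_outputs if key not in posted_outputs]
    if missing:
        issues.append("missing declared outputs: " + ", ".join(missing))
    match = CONFIDENCE_RE.search(response)
    if match and float(match.group(1)) < min_confidence:
        issues.append(f"low self-reported confidence: {match.group(1)}")
    return issues
```

The returned reasons are the kind of text a repair preamble could quote back to the worker on re-dispatch.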
Any agent can emit PROPOSE_CONSENSUS to fan a question into N
independent samples; the harness runs them in parallel, logs them,
and dispatches the chief to coalesce. New in this release:
robustness.auto_consensus makes the fan-out automatic.
| `auto_consensus` | Behavior |
|---|---|
| `off` | Only `PROPOSE_CONSENSUS` triggers fan-out (legacy behavior). |
| `on_careful` (default) | Stages marked `careful: true` fan out automatically. |
| `always` | Every non-chief dispatch fans out (×N samples; most expensive setting). |
Cost: a fan-out costs roughly N + 1 dispatches (N samples + 1 chief
coalescing). Caps at consensus.max_samples (default 6) and
consensus.max_parallel (default 4).
- Logged events: `consensus_started`, `consensus_sample_dispatch`, `consensus_samples_completed`, `consensus_coalesced`
- Kill switch: `robustness.auto_consensus: "off"`
Agents emit:
POST: question
TO: architect
URGENCY: blocking
BODY:
Are user IDs opaque strings or integers?
END_POST
With URGENCY: blocking, the harness dispatches the target peer
immediately to answer; the peer's POST: answer lists the question
id in its CONSUMES, and the asker sees the answer in the next
blackboard digest. URGENCY: async records the question for the
chief to schedule in a later cycle.
Chain depth is capped at robustness.max_qa_depth (default 3) so a
question can't trigger an unbounded chain of clarifications.
- Code: `agent.py::_route_blocking_questions`, `blackboard.py`
- Config: `robustness.{max_qa_depth, qa_parallel_judges}`
- Logged events: `qa_dispatch`, `qa_chain_capped`, `qa_chain_cycle`, `qa_target_unknown`
- Kill switch: omit the `TO:` field in your `POST: question` blocks; no protocol-level off-switch (Q&A is opt-in per post).
Stages may declare a refinement loop that threads a critic between
worker rounds. The critic scores the worker's output 0..1; if below
the accept threshold, the worker is re-dispatched with the critic's
feedback until accept or max_rounds is reached.
- id: design_spec
  agent: architect
  refine:
    critic: critic
    max_rounds: 4
    accept_threshold: 0.8
  objective: Draft the spec.

When the stage has no explicit refine: block, robustness.auto_refine
decides whether one round runs anyway:
| `auto_refine` | Behavior |
|---|---|
| `off` | Only stages with `refine:` use the inner loop. |
| `careful_only` (default) | Stages marked `careful: true` get one critic pass. |
| `all` | Every non-chief, non-qa, non-critic stage gets one pass. |
The critic's last two lines must be:
VERDICT: accept # or `revise`
SCORE: <0..1>
- Code: `workflow.py::WorkflowRunner._refine_loop`
- Config: `robustness.{auto_refine, refine_max_rounds, refine_accept_threshold}`
- Logged events: `refine_critic_verdict`
- Kill switch: `robustness.auto_refine: "off"` AND remove any `refine:` blocks from your workflow YAML.
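The critic contract and the surrounding loop can be sketched in Python. Names are hypothetical, and requiring both an `accept` verdict and a score at or above the threshold is an assumption; the document does not say how the two signals combine.

```python
import re


def parse_critic_verdict(text):
    """Read VERDICT and SCORE from the last two non-empty lines, or return None."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) < 2:
        return None
    verdict = re.fullmatch(r"VERDICT:\s*(accept|revise)", lines[-2])
    score = re.fullmatch(r"SCORE:\s*([0-9]*\.?[0-9]+)", lines[-1])
    if not verdict or not score:
        return None
    return verdict.group(1), float(score.group(1))


def refine(draft, revise, critique, max_rounds=3, accept_threshold=0.8):
    """Alternate critic and worker until the critic accepts or rounds run out."""
    for _ in range(max_rounds):
        feedback = critique(draft)
        parsed = parse_critic_verdict(feedback)
        if parsed is not None and parsed[0] == "accept" and parsed[1] >= accept_threshold:
            return draft
        draft = revise(draft, feedback)  # worker sees the critic's feedback
    return draft
```

An unparseable critic response counts as "not accepted" here, so the worker is revised rather than the draft being waved through.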
Both loops record every iteration's outcome to
.clk/state/experiments.jsonl. After robustness.plateau_window
consecutive iterations without measurable improvement, the loop:
- Escalates: the next iteration's dispatches carry `careful=true` in their extra, which (via Layer 7) fans them into stochastic consensus.
- Reframes: the chief is dispatched with a "plateau dispatch" prompt asking it to re-cast roles or re-author the workflow with a qualitatively different approach (new metric, new experiment family) rather than another marginal tweak.
- Terminates gracefully: if escalation + reframe fail to break the plateau across two more iterations, `done.md` is written with reason "plateau" rather than burning the full iteration budget.
Regression (last iteration failed after at least one earlier success in the window) triggers an additional critic dispatch on the failing diff before the next plan, so the next iteration starts from an informed view of what broke.
Autoresearch additionally gains an evaluator gate (previously only in Ralph): if the analyst's writes break the build, the working tree is reverted rather than committed.
Both loops also short-circuit when a planner or surveyor returns
empty / unrecoverable output; rather than commit garbage, the
iteration is recorded with improved=False.
- Code: `ralph_loop.py::RalphLoop`, `autoresearch_loop.py::AutoresearchLoop`
- Config: `robustness.{plateau_window, plateau_action}` (`escalate_then_reframe` | `escalate_only` | `reframe_only` | `off`)
- Logged events: `ralph_plateau_detected`, `ralph_plateau_escalate`, `ralph_plateau_terminated`, `ralph_regression_detected`, `ralph_iteration_skipped_low_quality`, `autoresearch_step_skipped_low_quality`, `autoresearch_revert`
- Kill switch: `robustness.plateau_action: "off"`

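
The four `plateau_action` values and the regression rule can be modeled as below. The mapping is inferred from the value names alone, and `plateau_steps` and `is_regression` are hypothetical helpers, so read this as a sketch of the documented behavior rather than the loops' real control flow.

```python
from typing import Dict, List

# Assumed mapping from each documented plateau_action value to its steps.
PLATEAU_STEPS: Dict[str, List[str]] = {
    "escalate_then_reframe": ["escalate", "reframe"],
    "escalate_only": ["escalate"],
    "reframe_only": ["reframe"],
    "off": [],
}


def plateau_steps(plateau_action: str) -> List[str]:
    """Return the ordered steps to attempt once a plateau is detected."""
    if plateau_action not in PLATEAU_STEPS:
        raise ValueError(f"unknown plateau_action: {plateau_action!r}")
    return list(PLATEAU_STEPS[plateau_action])


def is_regression(successes: List[bool]) -> bool:
    """True when the last iteration failed after an earlier success in the window."""
    return bool(successes) and not successes[-1] and any(successes[:-1])
```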
A typical "careful" engineering stage now runs:
- Stage dispatched with `careful: true`.
- `auto_consensus=on_careful` → N samples fan out in parallel.
- Chief coalesces the samples.
- `auto_refine=careful_only` → critic scores the coalesced output; the worker is revised until the critic accepts or `max_rounds` is reached.
- Stage validation runs.
- Checkpoint (if enabled): chief CONTINUE / REDIRECT / ABORT verdict.
- Outputs contract check; warn if any declared key was not posted.

Tracing this in `.clk/logs/`:

```bash
grep -E '^(consensus_|refine_|workflow_checkpoint|agent_quality_)' \
  .clk/logs/activity.jsonl | jq .
```

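
The same filter can be applied from Python when `jq` is not available. The sketch assumes each line of `activity.jsonl` is a JSON object with the event name under an `event` key. That layout is a guess, so adjust the key to match your logs.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple

TRACE_PREFIXES: Tuple[str, ...] = (
    "consensus_", "refine_", "workflow_checkpoint", "agent_quality_",
)


def filter_events(lines: Iterable[str],
                  prefixes: Tuple[str, ...] = TRACE_PREFIXES) -> Iterator[Dict[str, Any]]:
    """Yield records whose (assumed) `event` field starts with a traced prefix."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines instead of aborting
        if isinstance(record, dict) and str(record.get("event", "")).startswith(prefixes):
            yield record
```

A typical call would be `list(filter_events(open(".clk/logs/activity.jsonl")))`.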
CLK considers the system "done" when .clk/state/done.md exists. By
convention you create it only when:
- the MVP runs locally,
- the test suite passes,
- the README explains setup,
- a deployment plan exists,
- a deployment checklist exists,
- at least one user-facing interaction path exists.
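
Since completion is signaled by a single marker file, a script can check it directly. A minimal sketch, assuming `workspace` is the directory that contains `.clk/` (`is_done` is a hypothetical helper, not a CLK API):

```python
from pathlib import Path


def is_done(workspace: Path) -> bool:
    """True when the done marker exists under the workspace's .clk/state/."""
    return (workspace / ".clk" / "state" / "done.md").is_file()
```

The check says nothing about whether the six conventions above were met. It only reports that someone, or some loop, wrote the marker.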
CLK ships three test suites and a one-command orchestrator that runs them all in an ephemeral Docker container.
| Suite | What it covers | Runner |
|---|---|---|
| `tests/` | Unit + integration regression tests (CI-gated) | `pytest` |
| `user_tests/` | End-to-end CLI / REST API / `kickoff.sh` user tests | `pytest` |
| `pi-extension/tests/` | TypeScript Node tests for the Pi extension | `npm` |


```bash
# Interactive: prompts for LLM provider, API key, base URL, model.
# Builds an ephemeral Docker image, runs every suite inside, then tears
# the container down (success or failure).
./scripts/run_all_tests.sh

# CI / scripted use: skip the prompts and use the shell provider.
./scripts/run_all_tests.sh --non-interactive

# Single suite (no Docker, runs directly on the host):
./scripts/run_all_tests.sh --local --suite=user
./scripts/run_all_tests.sh --local --suite=ci
./scripts/run_all_tests.sh --local --suite=pi
```

The interactive menu asks four questions:
- LLM provider (shell / claude / codex / gemini / pi / ollama / openwebui)
- Auth mode (cli vs apikey) for the CLI-driven providers
- API key, base URL, model name — only for the chosen provider
- Confirm + go
All deterministic tests (CLI plumbing, REST API contract, etc.) run
against the shell provider regardless — they need no credentials and
always succeed. The opt-in real-provider smoke test
(test_kickoff_with_user_selected_provider in user_tests/) runs
kickoff.sh end-to-end with whatever provider you selected, and the
pi-extension runtime smoke verifies the pi CLI is reachable when you
chose pi and gave it a model + key.
`run_all_tests.sh` (Docker mode):
- Builds `clk:tests-<pid>` from the project `Dockerfile`.
- Mounts the repo read-only at `/repo`, copies it into a writable `/work` inside the container.
- Runs `pytest tests/` then `pytest user_tests/` then `npm test` inside `pi-extension/`.
- Always tears down the container on exit (success, failure, or ^C) and removes the ephemeral image, unless `--keep` is passed.

Useful flags:
| Flag | Effect |
|---|---|
| `--local` | Run on the host directly; no Docker daemon required. |
| `--non-interactive` | Skip all prompts; force `CLK_PROVIDER=shell`. |
| `--suite=all` | Default: run all three test directories. |
| `--suite=ci` | Only `tests/` (regression). |
| `--suite=user` | Only `user_tests/`. |
| `--suite=pi` | Only `pi-extension/tests/`. |
| `--keep` | Don't remove the container or image on exit. |
| `--no-build` | Reuse a pre-built `clk:tests-latest` image. |
| `-k <expr>` | Forward a `-k` filter to pytest. |
| `-- <args>` | Pass remaining args verbatim to pytest. |

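
When the runner is driven from another script, the flags compose into an argument list. The helper below is a hypothetical convenience wrapper that only uses flags from the table. The flag order follows the examples in this README, and whether the script accepts other orders is not something this sketch verifies.

```python
from typing import List, Optional

VALID_SUITES = {"all", "ci", "user", "pi"}


def build_test_command(suite: str = "all", local: bool = False,
                       non_interactive: bool = False, keep: bool = False,
                       pytest_filter: Optional[str] = None) -> List[str]:
    """Assemble an argv list for scripts/run_all_tests.sh."""
    if suite not in VALID_SUITES:
        raise ValueError(f"unknown suite: {suite!r}")
    cmd = ["./scripts/run_all_tests.sh"]
    if local:
        cmd.append("--local")
    if non_interactive:
        cmd.append("--non-interactive")
    cmd.append(f"--suite={suite}")
    if keep:
        cmd.append("--keep")
    if pytest_filter:
        cmd += ["-k", pytest_filter]
    return cmd
```

The result can be handed to `subprocess.run(cmd, check=True)` from the repository root.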
Each suite is just pytest / npm and can be invoked on its own:

```bash
# Regression suite (existing CI tests)
pip install -e ".[api,dev]" pytest pytest-asyncio httpx
pytest tests/ -v

# User-perspective end-to-end suite (CLI subprocess + live REST API +
# real kickoff.sh runs). Uses the shell provider, so no API keys are needed.
pytest user_tests/ -v

# Pi extension TypeScript suite
cd pi-extension
npm install
npm test             # unit + integration tests (96 tests, ~2s)
npm run test:strict  # also runs `tsc --noEmit`
```

The `user_tests/` suite verifies, from a real user's vantage point:
- Every `clk` sub-command (`init`, `idea`, `cast`, `roles`, `plan`, `run`, `loop`, `status`, `providers`, `configure`) exits cleanly and writes the documented `.clk/` artefacts.
- All seven shipped providers register and the `shell` provider is always available.
- The REST API serves health, capabilities, workflows, workspace CRUD, research task creation, SSE streaming, artifact listing, path traversal blocking, and cancellation.
- `kickoff.sh` produces a self-contained workspace dir with its own git repo, and respects `--provider` / `CLK_PROVIDER` overrides.
- Filesystem invariants (commit history, `.clk/runs/shell-stubs/`, per-command `.clk/logs/<cmd>-<ts>.log`, etc.).

The `pi-extension/tests/` suite verifies:
- `classifyError`, `withRetry`, `looksRedacted`, `isMaxTurnsResult`, and all `recoveryHint` branches.
- `clkChiefPrimer` renders the captured idea + every CLK tool name (`clk_cast`, `clk_subagent`, `clk_subagent_quality`, `clk_consensus`, `clk_autoresearch`, `clk_ralph`, `clk_checkpoint`, `clk_done`).
- `scoreResponse` flags every documented failure mode (empty / refusal / malformed ACTION / malformed POST / missing outputs / low confidence / needs-review / missing-confidence) and `repairHint` quotes each reason to the worker.
- `runConsensus` fans out N samples, scores them, picks the winner, caps to `maxParallel`, and captures spawn errors without throwing.
- `dispatchWithQuality` retries with a repair preamble on recoverable failures and stops on refusal or `maxRetries`.
- `setIdea`, `setRoster`, `appendProgress`, `markDone`, `isDone` round-trip state through `.clk/state/*.json` and `progress.md`.
- The `git` wrapper does init, checkpoint, branch, merge, revert, `hasRemote`, `commitsAhead`, and `pushBestEffort` correctly against a real `git` binary (including the bare-upstream sync, the unreachable-remote failure path, and the no-remote no-op).
- The extension's `default` export registers every documented tool (`clk_cast`, `clk_progress`, `clk_checkpoint`, `clk_revert`, `clk_branch`, `clk_merge`, `clk_done`, `clk_consensus`, `clk_subagent_quality`, `clk_autoresearch`, `clk_ralph`, `clk_subagent`) and the `/clk` slash command, and handles an empty-idea invocation cleanly.
- `firstLineShort` returns single-line, capped output so a multi-line idea never bleeds line 2 into the Pi status bar.

- Edit prompts in `.clk/prompts/` to change agent behavior.
- Edit `.clk/config/agents.json` to bind specific agents to specific providers (e.g. `engineer` -> `claude`, `researcher` -> `ollama`).
- Edit `.clk/config/workflows/*.yaml` to add new stages or new workflows. Reference any new workflow with `clk run --workflow NAME`.
- `clk configure --set key=value` updates `.clk/config/clk.config.json`.

- Failed work is never silently deleted. The Ralph loop reverts via `git reset --hard <pre-iter-sha>`; failed agent outputs remain in `.clk/runs/<run_id>/`.
- Operations that touch more than 5 files are logged before execution (warning) and refused above 25 (configurable).
- All exceptions are logged with `[location] message` and a full traceback.

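
The file-count guard amounts to a two-threshold check. The sketch below uses the documented 5 and 25 as defaults. The function name, the parameter names, and the string return values are illustrative assumptions, not the harness's configuration keys.

```python
def classify_operation(file_count: int, warn_above: int = 5,
                       refuse_above: int = 25) -> str:
    """Classify an operation by how many files it would touch."""
    if file_count > refuse_above:
        return "refuse"
    if file_count > warn_above:
        return "warn"   # log a warning before executing
    return "allow"
```

Both bounds are exclusive, matching "more than 5" and "above 25": an operation touching exactly 25 files is still only warned about.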
Every loop and workflow command accepts --dry-run. Providers honor it
and skip side effects. Use it to preview prompt rendering and stage
ordering without writing files or committing.
MIT.