feat(loop): <promise> completion tag for early exit by paulnsorensen · Pull Request #46 · computerlovetech/ralphify

paulnsorensen · 2026-04-17T21:22:41Z

Summary

Adds opt-in early-exit to the loop via a structured completion tag. When
stop_on_completion_signal: true is set in frontmatter and the agent
emits a matching <promise>TEXT</promise> tag in its output, ralphify
stops after the current iteration instead of continuing to the iteration
budget. The default inner text is RALPH_PROMISE_COMPLETE; override it
with the optional completion_signal field.

Why the `<promise>` tag format

The tag format intentionally mirrors what the Claude Code ralphify
plugin (Ralph-Wiggum) already uses to signal structured completion
back to its harness. Aligning on the same <promise>...</promise> shape
means a single prompt body works on both sides — a ralph written against
the Claude plugin can stop a ralphify loop with no adapter code, and
vice versa. Ralphify still runs its own command/prompt loop; only the
completion-tag format is shared.

What's new

src/ralphify/_promise.py — strict <promise>TEXT</promise> parser
with whitespace normalization (bare marker text no longer counts)
Two optional frontmatter fields:
- completion_signal — inner text of the promise tag (default
  RALPH_PROMISE_COMPLETE)
- stop_on_completion_signal — bool gate for early exit (default
  false)
examples/promise-completion/RALPH.md — canonical example, surfaced
from ralph scaffold help and the cookbook
ralph run --help and ralph scaffold --help now explain the pattern
Docs updated across getting-started, cli, how-it-works, agents,
cookbook, quick-reference
Regression coverage in tests/test_promise.py,
tests/test_agent.py, tests/test_cli.py, tests/test_engine.py
(changed modules land at ~98% coverage)

Backwards compatibility

Default behavior is unchanged. Loops still run to their iteration budget
unless both stop_on_completion_signal: true and the configured tag
are present. Existing ralphs are untouched.

Test plan

uv run pytest — 661 passed, 1 xpassed
uv run ruff check .
uv run ruff format --check .
uv run ty check
uv run mkdocs build --strict
Example ralph in examples/promise-completion/ exits early on
<promise>COMPLETE</promise>

Note on help-text test robustness

tests/test_cli.py::TestHelp asserts that <promise>...</promise>
appears in --help output. Rich wraps help text differently in headless
CI vs. a local TTY, which splits the tag across lines and breaks a
naïve substring check. The tests strip ANSI codes and collapse
whitespace before asserting, so they're stable on both sides.

Adds opt-in early-exit to the loop via a structured completion tag. When stop_on_completion_signal: true is set in frontmatter and the agent emits a matching <TEXT> tag in its output, ralphify stops after the current iteration instead of continuing to the iteration budget. - _promise.py: strict TEXT parser with whitespace normalization - Two optional frontmatter fields: completion_signal and stop_on_completion_signal - examples/promise-completion/RALPH.md: canonical example - Docs updated across getting-started, cli, how-it-works, agents, cookbook, quick-reference - Tests in test_promise.py, test_agent.py, test_cli.py, test_engine.py (~98% coverage)

#6) * feat: seed CLI adapter layer with Claude, Codex, Copilot adapters Introduces a pluggable CLIAdapter Protocol and lands the three concrete adapters that exercise it, so the abstraction ships with demonstrated value on day one. - src/ralphify/adapters/__init__.py — CLIAdapter Protocol, AdapterEvent, first-match ADAPTERS registry, select_adapter() dispatch with GenericAdapter fallback. - src/ralphify/adapters/_generic.py — blocking-path fallback adapter. - src/ralphify/adapters/claude.py — stem "claude", --output-format stream-json --verbose, parses tool_use blocks in assistant messages, extracts completion signal from terminal result event. - src/ralphify/adapters/codex.py — stem "codex", --json, maps CollabToolCall/McpToolCall/CommandExecution to tool_use events, scans TurnCompleted/TaskComplete for promise tags. - src/ralphify/adapters/copilot.py — stem "copilot" (not "gh"), --output-format json, renders_structured=False, supports_soft_windown =False. install_wind_down_hook is part of the Protocol surface but every concrete adapter currently raises NotImplementedError — the actual hook plumbing (soft wind-down, AgentHook fanout, max_turns enforcement) lands in subsequent PRs. This PR is pure additions with no engine wiring: adapters are registered and discoverable, but the run loop still uses the old hard-coded Claude path. Tests: 50 new tests covering each adapter's event parsing, command building, completion-signal extraction, and registry dispatch. Co-Authored-By: WOZCODE <contact@withwoz.com> * feat: route Claude detection and promise completion through adapter Wire the CLI adapter layer into the three sites that previously hard-coded Claude knowledge: - _agent.py: execute_agent resolves the adapter (or accepts one from the caller) and delegates flag construction to adapter.build_command. _run_agent_streaming no longer appends --output-format/--verbose itself — the cmd it receives is already flagged. - _console_emitter.py: startup-hint text picks "live activity on" vs "live output on" via adapter.renders_structured instead of a claude-binary check. - engine.py: promise completion runs through adapter.extract_completion_signal(captured_stdout, signal). ClaudeAdapter scans the terminal result event only; GenericAdapter scans full stdout to preserve behavior for unknown CLIs. The engine-side _record_promise_fragment incremental scanner is gone. Tightens ClaudeAdapter.extract_completion_signal to return False when no result event is present so embedded <promise> tags inside status or assistant JSON can't trigger early completion. * fix: address copilot review feedback on adapter seed PR - Rename `supports_soft_windown` → `supports_soft_wind_down` across the Protocol and all adapters to match the `install_wind_down_hook` / "wind-down" terminology used elsewhere (caught before it baked into the public adapter surface). - Claude and Copilot `build_command()` now detect the `--output-format` flag pair and overwrite a user-supplied value instead of treating the flag and value tokens independently (which produced invalid argv when a caller pre-set `--output-format <other>`). - Align Claude `parse_event()` docstring with actual behavior: `result` events return `kind="result"` and non-assistant events return `kind="message"` rather than `None`. - Flip Codex `renders_structured` to False so the peek panel stays in raw-line mode until the emitter is adapter-driven; Codex's streaming path currently feeds an `_IterationPanel` that only understands Claude's stream-json schema. * refactor: split renders_structured into supports_streaming and renders_structured_peek Addresses the architectural concern in PR #6 review: the old `renders_structured` flag conflated two orthogonal capabilities: 1. Whether the adapter emits a structured JSON stream that the execution path can parse into on_activity callbacks. 2. Whether the console peek panel understands the adapter's event schema and should render it via `_IterationPanel` instead of raw lines. Before the split, an adapter had to choose between a useful streaming path with an empty peek panel (Codex's previous state) or a working raw peek with no activity callbacks. The split makes the correct combination representable: - ClaudeAdapter: supports_streaming=True, renders_structured_peek=True - CodexAdapter: supports_streaming=True, renders_structured_peek=False - CopilotAdapter / GenericAdapter: both False Codex now takes the streaming path again (on_activity callbacks fire) while the peek panel stays in raw-line mode until the emitter is adapter-driven. Callers updated: - _agent.execute_agent dispatches on adapter.supports_streaming - _console_emitter consumes adapter.renders_structured_peek via _agent_renders_structured_peek (renamed) - tests/conftest autouse fixture forces both flags to False * perf: skip stdout capture when adapter exposes result_text The previous adapter seed forced capture_stdout=True on every iteration so the engine could feed the buffered transcript into adapter.extract_completion_signal. For verbose streaming agents this regressed memory vs the prior tail-scan approach, and Claude in particular was paying for a full transcript buffer it did not need: agent.result_text already carries the terminal assistant message. Add a requires_full_stdout_for_completion capability flag and a keyword-only extract_completion_signal(result_text=, stdout=, user_signal=) signature so each adapter declares its own input. Claude sets the flag False and reads result_text directly; Codex / Copilot / Generic keep the full-stdout path. The engine now gates capture on log_dir or (stop_on_completion_signal AND requires_full_stdout_for_completion), so streaming agents skip the buffer entirely unless logging is on, and non-streaming agents only buffer when the user opts into completion signalling. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: WOZCODE <contact@withwoz.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

) * refactor: extract adapter Protocol and registry into _protocol.py The adapter package's __init__ both imports concrete adapter modules (to populate ADAPTERS) and defines the Protocol those modules depend on. That's a latent circular-import hazard: it works today only because __init__ defines the Protocol before it imports concrete adapters. As soon as an adapter grows a dependency that could re-enter the package namespace, the order-of-execution becomes load-bearing. Split the Protocol, the AdapterEvent NamedTuple, the Literal kind enums, and the ADAPTERS list into a new leaf module _protocol.py. Concrete adapters (claude, codex, copilot, _generic) now import from _protocol directly. __init__.py re-exports the public names so existing `from ralphify.adapters import CLIAdapter` imports keep working, and keeps select_adapter plus the builtin-registration bootstrap. No behaviour change. * feat: add turn-cap event types and max_turns frontmatter fields Preparatory foundation for the forthcoming turn-cap enforcement work: no enforcement yet, just the event-type surface and the validated RunConfig fields that enforcement will read. - _events.py: TOOL_USE, ITERATION_TURN_APPROACHING_LIMIT, and ITERATION_TURN_CAPPED event kinds with typed data payloads (ToolUseData, TurnApproachingLimitData, TurnCappedData) threaded through the EventData union. - _frontmatter.py: FIELD_MAX_TURNS and FIELD_MAX_TURNS_GRACE constants. - _run_types.py: max_turns (int | None, default None) and max_turns_grace (int, default 2) fields on RunConfig. - cli.py: _validate_max_turns / _validate_max_turns_grace helpers with range and type checks; grace must be strictly less than the cap when the cap is set. _build_run_config threads both values into the returned RunConfig. - tests/test_cli_frontmatter_fields.py: unit coverage for the validators plus end-to-end _build_run_config assertions for the default, valid, and invalid cases. Nothing consumes max_turns yet; it will be read by the adapter-driven enforcement path in a follow-up PR. * fix: mock shutil.which in frontmatter field tests CI runners don't have 'claude' on PATH, so _build_run_config's agent validation was failing before reaching the turn-cap assertions.

#10) * workspace: bootstrap improve-codebase Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant `if parts else ""` — empty join already returns "" `" · ".join([])` returns `""`, so the explicit empty-case guard in `_format_params` is dead. Behavior is identical either way. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_format_params` simplification + verified-live audit - iterations.md: log 4ccfa9a - backlog.md: note vulture false-positives confirmed live; add two future-win candidates to triage - coverage/_console_emitter.md: first coverage note for this module Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: extract `_stop_compact_live_unlocked` to dedupe compact-Live teardown The three-line `if self._live is not None: self._live.stop(); self._live = None` pattern appeared in three call sites (`_stop_live_unlocked`, `enter_fullscreen`, `_on_iteration_ended`). Collapsed into a single helper — behavior unchanged. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_stop_compact_live_unlocked` extraction + backlog note Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant `_iteration_order` list — dict insertion order suffices Python dicts preserve insertion order since 3.7, so the parallel `_iteration_order` list tracking the same sequence as `_iteration_history` was pure duplication. Archive re-orders via pop-then-insert, eviction iterates the dict (oldest-first), and enter_fullscreen's most-recent lookup uses `reversed()`. Behavior is unchanged; the only test hit was a direct field reference. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_iteration_order` removal Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: extract `_call_safely` helper for best-effort observer callbacks The same guarded-callback pattern (`if cb is not None: try: cb(...); except Exception: pass`, with identical "best-effort; draining must not stop" comment) was duplicated three times across `_read_agent_stream` and `_pump_stream`. Consolidate into a single `_call_safely` helper defined next to the callback type aliases. Behavior unchanged — the helper preserves the None guard, suppresses all exceptions, and only fires the callback once with the provided arguments. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_call_safely` extraction + seed `_agent.py` coverage note Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant max(total - visible, 1) guard in `_scrollbar_metrics` The early return `if total <= visible` above guarantees `total > visible`, so `total - visible` is always ≥ 1. Inline the subtraction directly and note the invariant in a comment. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record scrollbar-metrics simplification Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant empty-dict guard in `_format_categories` `" · ".join([])` already returns `""`, so the early-return on an empty `_tool_categories` dict was dead code — same pattern as 4ccfa9a's `_format_params` cleanup. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_format_categories` empty-dict guard removal Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: extract `_step_iteration` to dedupe prev/next iteration browsing `prev_iteration` and `next_iteration` were 12-line mirror images differing only in step direction and eviction-fallback choice (oldest vs newest). Extracted to a single direction-parametric helper — behavior preserved. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_step_iteration` extraction Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: use public `iteration_id` property in history eviction The one cross-class `_iteration_id` access in the module now goes through the public property, keeping `_FullscreenPeek`'s private attribute private. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `iteration_id` property-access cleanup Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop `_reset_view` — identical to `scroll_to_bottom` `_FullscreenPeek._reset_view` set `_offset = 0` and `_auto_scroll = True`, the same two assignments as `scroll_to_bottom`. Dropped the private helper and called `scroll_to_bottom()` from the two `_step_iteration` sites instead. Added the docstring that used to live on `_reset_view` to the surviving method so the "snap to newest line + follow" intent is still recorded. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_reset_view` removal Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: defer `panel_for` guard to `is_live` to dedupe identical condition Both methods checked `self._current_iteration == iteration_id and self._active_renderable is not None`. Rewrite `panel_for` to call `is_live` so the condition lives in exactly one place — if the "this is the live iteration" check ever needs to grow (e.g. add a paused state), `panel_for` follows automatically. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `panel_for` / `is_live` dedup Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `total_in` alias in `_format_tokens` The local `total_in` variable was a pointless rename of `self._input_tokens` — it hinted at an aggregated "total" that no longer exists (cache-read tokens are intentionally excluded). Read `self._input_tokens` directly so the label reflects what the value actually is. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `total_in` alias inlining Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant f-string wrap around `_plural` result `_plural` already returns a str, so `f"{_plural(...)}"` constructs the same string via an extra format step. Pass the value straight through, matching the style of the sibling `header.append(...)` calls. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_plural` f-string wrap cleanup Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant empty-user_args branch in `resolve_args` `_ARGS_RE.sub` with a callable that returns `dict.get(name, "")` already resolves every placeholder to the empty string when the dict is empty, so the `if not user_args: return _ARGS_RE.sub("", prompt)` early return was dead code producing the same output as the general path. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `resolve_args` empty-branch cleanup Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: promote 40-line fallback height to `_DEFAULT_CONSOLE_HEIGHT` constant Both `_FullscreenPeek._console_height` (pre-render default) and `_fullscreen_page_size`'s except-branch fallback used the literal 40 to mean "reasonable terminal height when the real value isn't available." Naming the shared intent in one place keeps the two in lockstep. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_DEFAULT_CONSOLE_HEIGHT` constant promotion Shift current phase from Phase 1 (dead code — exhausted) to Phase 3 (magic values), update the _console_emitter.py coverage note with the new sha, and log the iteration. Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: narrow `name_col` scope to `if arg:` in `_apply_assistant` The tool_use branch computed the padded `name_col` unconditionally, then only used it when `arg` was truthy — the `else` branch uses the raw `name` instead. Move the pad-to-column formatting inside `if arg:` so the helper variable only exists where it's actually rendered, and no unnecessary f-string work happens when a tool call has no primary arg. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `name_col` scope narrowing and open Phase 4 Phase 3 (magic values) is essentially drained — mark it so and open Phase 4 (complex conditionals & long functions) as the current phase. The 134078d narrowing is the first concrete Phase 4 item. Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant `_active_renderable` guard in `_on_iteration_started` `_archive_current_iteration_unlocked` already no-ops when no iteration is active (guarded at its entry), so wrapping the call in `if self._active_renderable is not None:` added nothing. Dropped the `if` and updated the surrounding comment to note the archive call's no-op behavior. Same shape as 5337d88 and 4ccfa9a's dead-guard removals. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_on_iteration_started` guard removal Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: move `_structured_agent` check out of lock in `_on_agent_output_line` Mirrors the pattern already used in `_on_agent_activity`: the `_structured_agent` flag is write-once (set in `_on_run_started` before any iteration events can flow), so the short-circuit check is safe lock-free. Also avoids acquiring the console lock for every stdout line when running Claude in structured mode. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_on_agent_output_line` lock-free check move Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: skip dead `"".join(...)` in `_run_agent_streaming` when not logging The trailing block joined `stream.stdout_lines` and `stderr_lines` unconditionally, then the AgentResult ternary discarded both joined strings when `log_dir is None` (they were only ever consumed by `_write_log` and the `captured_stdout`/`captured_stderr` fields). Fold the `if log_dir is not None` check up into the join so the no-log path skips the work entirely — same shape as the already-lazy `_run_agent_blocking` tail, which uses `"".join(...) if x is not None else None` exactly once. Drops the duplicated ternary on the AgentResult fields too. Same observable behavior (verified by `test_captured_output_set_when_logging` and `test_no_log_when_dir_not_set`). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_run_agent_streaming` lazy-join refactor Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop `line_count` alias in `_IterationSpinner._build_footer` The local was computed unconditionally but only read on the truthy branch (both as predicate and as `_plural` arg). Inline the truthy check on `self._scroll_lines` and call `len()` only inside the branch that needs it — matches the sibling `_IterationPanel._build_footer` style (`if self._tool_count > 0:` with direct attribute references). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_IterationSpinner._build_footer` alias drop Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `agent` alias in `_on_run_started` The local was read exactly once — the only use was the immediately following `_is_claude_command(agent)` call. Reading `data["agent"]` directly matches the style of the sibling fc5e1cb inline of `total_in` and keeps the function focused on what it does. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `agent` alias inline in `_on_run_started` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop `new_offset` alias in `_FullscreenPeek.scroll_down` The local was assigned immediately to `self._offset`, then the follow-mode check re-read it as `new_offset == 0`. Reading `self._offset` directly after assignment has identical semantics — the alias added no value. Sibling `scroll_up` keeps its local because it compares the new value against the old one *before* the assignment (to conditionally disable auto-scroll), which requires two distinct values. `scroll_down` has no such comparison, so the alias is dead. Same shape as ef176bf (`line_count`) and 134078d (`name_col`). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_FullscreenPeek.scroll_down` alias drop Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `msg` alias in `_apply_assistant` The `msg = raw.get("message", {})` local was read exactly once on the next line as `msg.get("usage")`. Reading `raw.get("message", {}).get("usage")` directly matches the chained-get style already used in `_iter_content_blocks` and the inline-alias style established by 497c028 (`agent`) and fc5e1cb (`total_in`). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `msg` alias inline in `_apply_assistant` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `_fullscreen_page_size()` into space/b lambdas The `page` local in `_handle_fullscreen_key` was computed unconditionally in the non-exit branch but only consumed by the space/b action lambdas (page-down / page-up). Inlining `self._fullscreen_page_size()` into those two lambdas means it's only evaluated when space or b is actually pressed — j/k/g/G/[/] now skip the call entirely. Behavior is identical: the dict is built and a single action is invoked per keypress, and the page value is read at action-invocation time under the same lock. Same Phase 4 shape as 134078d / ef176bf / b19625e — narrow the scope of a local that was only live on one branch. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_fullscreen_page_size()` lambda-inline refactor Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: use `next(reversed(...), None)` in `enter_fullscreen` history fallback The compound `if initial_id is None and self._iteration_history:` guard was doing two jobs: pick the fallback only when there's no live iteration, *and* avoid `next(reversed({}))` raising StopIteration on an empty dict. Switching to `next(reversed(...), None)` lets the standard sentinel default handle the empty case so the outer `if` only encodes the "fall back when nothing is live" intent. Same observable behavior: when both `_current_iteration` is None and `_iteration_history` is empty, `initial_id` ends up None and the existing `if initial_id is None or self.panel_for(...) is None:` branch prints "Full peek: no iterations yet" and returns False — covered by `test_enter_without_iteration_prints_hint`. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `enter_fullscreen` history-fallback simplification Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant BOM-startswith guard before `removeprefix` `str.removeprefix` is already a no-op when the prefix is absent — the `if text.startswith(_UTF8_BOM)` guard was dead defensive code. Same shape as 5337d88 / 4ccfa9a / 8cb0d47's "drop the guard when the operation already handles the empty/no-match case." Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record BOM-guard drop in `parse_frontmatter` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: replace `parsed = None` sentinel with try/except/else in `_read_agent_stream` The sentinel was a no-op placeholder for the error path: setting `parsed = None` just so `isinstance(parsed, dict)` would skip the forwarding block on the next line. Using the `try/except/else` structure expresses the same intent directly — the dict-handling logic runs only when `json.loads` succeeds — and drops the sentinel assignment plus the redundant dict check on the JSON-error path. Behavior preserved: the inner `isinstance(parsed, dict)` still gates forwarding for valid-but-non-dict JSON (lists, strings, numbers). Covered by the existing stream tests (`test_agent.py::test_stream_json_*`). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record try/except/else refactor in `_read_agent_stream` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: replace `source._outcome` cross-class access with public `outcome` property `_FullscreenPeek._build_header` was reading `source._outcome` directly on a `_LivePanelBase` instance, crossing class boundaries into a private attribute. Expose an `outcome` property on `_LivePanelBase` (matching the `iteration_id` property added in ef9a178 for the parallel case) so the fullscreen header reads the value through a public API. Same observable behavior; the attribute is still the single source of truth inside `freeze`. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `outcome` property refactor on `_LivePanelBase` Iteration note, coverage update, and backlog entry for d0060b3 — the ef9a178-style public-property substitution applied to the remaining `source._outcome` cross-class access in the fullscreen header. Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: narrow `secs` scope into `if minutes < _MINUTES_PER_HOUR:` branch `secs = total % _SECONDS_PER_MINUTE` was computed unconditionally but only consumed by the `f"{minutes}m {secs}s"` return on the truthy branch. The hours branch derives its own `mins`/`hours` pair and never references `secs`. Moving the computation inside the branch that uses it keeps the local co-located with its use site. Same Phase 4 narrow-the-scope shape as 134078d (`name_col`), ef176bf (`line_count`), 59b0e34 (`page`), b19625e (`new_offset`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `secs` scope-narrowing in `format_duration` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `text` alias in `collect_output` The local was assigned from `ensure_str(stream)` and read exactly once on the next line as `parts.append(text)`. Inlining matches the alias-inline style from fc5e1cb / 497c028 / 52e0272 — the helper name already documents the decode step. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `text` alias inline in `collect_output` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: narrow `line` scope past early-return in `_on_agent_output_line` `escape_markup(data["line"])` was computed unconditionally before the `if not isinstance(target, _IterationSpinner): return` guard, so the work was wasted whenever the event hit the early-return branch (no active panel, or structured panel for a non-structured-agent edge case). Moving the binding below the guard keeps the same output for the _IterationSpinner branch and drops the dead work for the other branch. Same Phase 4 shape as 134078d's `name_col` narrowing and 7730dd4's `secs` narrowing. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `line` scope narrowing in `_on_agent_output_line` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `reason` alias in `run_loop` stop-event emit The `reason = state.status.reason` local was read exactly once on the next line as the `reason=` kwarg to `RunStoppedData(...)`. Collapsing to `reason=state.status.reason` matches the inline-alias pattern from ce487d3 (`text`) / 52e0272 (`msg`) / 497c028 (`agent`) / fc5e1cb (`total_in`). The property access is safe at this call site: the immediately-preceding `if state.status == RunStatus.RUNNING: state.status = RunStatus.COMPLETED` ensures `state.status` is always a terminal status (STOPPED / COMPLETED / FAILED), so `RunStatus.reason` cannot raise. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `reason` alias inline in `run_loop` Added coverage/engine.md noting the inline-alias refactor and the safety argument for the `state.status.reason` property access. Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `binary` alias in `_supports_stream_json` The local was read exactly once on the next line as `binary == CLAUDE_BINARY`. Collapsing to `Path(cmd[0]).stem == CLAUDE_BINARY` matches the sibling check in `_console_emitter.py:_is_claude_command`, which already uses the chained form, and the inline-alias pattern from ce487d3 / 52e0272 / 497c028 / fc5e1cb. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `binary` alias inline in `_supports_stream_json` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `visible` alias in `_LivePanelBase._build_body` The `visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:]` local was read exactly once as the iterable of the very next `for line in ...` loop. Collapsing to `for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]:` matches the inline-alias pattern from 497c028 / fc5e1cb / 52e0272 / ce487d3 / e1ad87a — the slice already documents "the last N scroll lines" without needing a named intermediate. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `visible` alias inline in `_build_body` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `reader` thread handle in `_read_agent_stream` The `reader` local served only to call `.start()` — the thread is never joined explicitly because termination is signalled through the queue's None sentinel and the daemon flag. Collapsing into the fluent `Thread(...).start()` form matches the fire-and-forget intent and drops an unused binding. Python's `threading` module keeps live threads reachable via `threading._active`, so dropping the local reference does not affect thread lifetime. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `reader` thread handle inline in `_read_agent_stream` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `remaining` alias in `_read_agent_stream` timeout calc The `remaining = deadline - time.monotonic()` local served a single use on the very next line as `max(remaining, 0)`. Collapsing to `max(deadline - time.monotonic(), 0)` matches the inline-alias style established by e1ad87a (`binary`), 497c028 (`agent`), fc5e1cb (`total_in`), 52e0272 (`msg`), and ce487d3 (`text`). Updated the adjacent comment to refer to "clamp to 0" since the `remaining` name no longer exists. Behavior preserved — same value reaches `line_q.get(timeout=...)` on every iteration of the read loop. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `remaining` alias inline in `_read_agent_stream` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `stream_cmd` into Popen call in `_run_agent_streaming` The local was constructed and read exactly once on the very next statement as the first positional arg to `subprocess.Popen(...)`. The three appended tokens are already named constants (`_OUTPUT_FORMAT_FLAG`, `_STREAM_FORMAT`, `_VERBOSE_FLAG`) so the intent reads clearly at the call site without the intermediate binding. Same Phase 4 inline-alias shape as 66d6c60 (`remaining`), b24accf (`reader`), 2fda4f0 (`visible`), and e1ad87a (`binary`). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `stream_cmd` inline in `_run_agent_streaming` Co-authored-by: Ralphify <noreply@ralphify.co> --------- Co-authored-by: Kasper Junge <kasperjunge@Kaspers-MacBook-Pro.local> Co-authored-by: Ralphify <noreply@ralphify.co> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: support embedding ralphify as a library so callers can run loops in-process Make typer/rich a `[cli]` extra so the core (pyyaml-only) imports cleanly in library contexts; `ralph` CLI raises a clear install hint when the extra is missing. Add in-memory `RunConfig.prompt` to run a loop without a RALPH.md file, plus a RunManager embedding lifecycle (wait_for_any/all, get_result, shutdown) and an immutable RunResult snapshot. Make Event generic over its payload and export the typed event payloads so embedders can annotate handlers without casting. Documented in docs/embedding.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: address PR review findings for library embedding API - manager: guard wait_for_any/wait_for_all against infinite blocking when run_ids is empty or contains only unregistered IDs with timeout=None — nothing can ever notify the condition, so return immediately - __init__: narrow the CLI-import guard in main() to ModuleNotFoundError and only emit the [cli]-extra hint for missing rich/typer; re-raise any other missing-module error so real bugs in ralphify.cli aren't masked - _events: replace the unsound cast(EventData, {}) with an explicit empty NoData payload type added to the EventData union; drop the unused cast - _events: reword an internal "curd 1" note to plain language Adds no-hang regression tests for the wait helpers (timeout=None) and a test that main() re-raises unrelated import errors. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

New `Invocation` abstraction and `deliver_prompt` method let adapters choose where the prompt goes: stdin or as a positional argument. Arg-delivery adapters (opencode) return `stdin_text=None` for DEVNULL stdin and no writer thread. Enables opencode as first-class with `--format json` event parsing and safe positional-argument delivery. Co-authored-by: Claude <noreply@anthropic.com>

The opencode adapter mapped `tool_result` to a message event, but opencode never emits that type — verified against run.ts emit() call sites in both v1.15.10 and v0.15.31. Meanwhile the newer `reasoning` and `error` events were unhandled, so they were silently dropped from the live peek panel. Map the real informational event set (step_start / text / reasoning / error) so they render without counting against the turn cap. Tool output rides on the existing `tool_use` event once a tool part reaches the completed/error status, so no separate result-event handling is needed. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

feat: add Crush CLI adapter so users can loop Charm Crush Crush is TUI-first but ships a `crush run` subcommand for non-interactive single-prompt use, reading the prompt from stdin and auto-approving all permissions (no --yolo needed). It emits plain text only — no JSON or streaming-event mode — so the adapter sits in the blocking tier alongside the generic adapter: counts_what="none", supports_streaming=False, requires_full_stdout_for_completion=True. Completion is detected by scanning stdout for the <promise> tag. build_command injects `--quiet` (idempotent) to suppress the spinner for clean piped output. Verified against crush v0.71.0 source. Also gitignore .cheese/ and .serena/ (skill scratch + LSP cache). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…own (#20) * feat: enforce max_turns and add agent lifecycle hooks Completes Phase 3 of the CLI adapter layer: fills in the turn-cap enforcement, lifecycle hooks, and per-CLI soft wind-down that the merged adapter seed (#6, #7) left scaffolded as NotImplementedError stubs and unwired events. - max_turns is now enforced: streaming adapters that count tool uses (Claude, Codex, Copilot, opencode) are SIGTERM'd at the cap and the iteration is recorded as completed-at-cap; Crush (counts_what="none") treats the cap as a no-op. - New `hooks` frontmatter field + AgentHook protocol (ShellAgentHook, CombinedAgentHook, NoOpAgentHook). Hooks observe iteration lifecycle, tool use, turn-approaching-limit, turn cap, and completion; a failing hook is logged but never aborts the run. - Per-CLI soft wind-down: Claude PreToolUse / Codex PostToolUse hooks are installed into a per-iteration tempdir (CLAUDE_CONFIG_DIR / CODEX_HOME) so the user's real config stays untouched. Adapters with no hook system downgrade to hard-cap-only via NotImplementedError. Re-sliced from the original explore-cli-hooks branch (PRs #5/#9) onto current main, reconciled with the prompt-delivery-as-adapter (#17) and library-embedding (#16) refactors, and extended with turn-cap coverage for the Crush and opencode adapters that landed after those PRs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: correct turn-cap enforcement, wind-down clamps, and Codex hook config Addresses review findings on PR #20: - Force stdout buffering on the blocking path when max_turns is set on a tool-use-counting adapter, so post-hoc counting actually reports tool_use_count and sets turn_capped (previously always 0 for Copilot). - Route the wind-down counter callback through _call_safely so a raising subscriber cannot crash the agent loop. - Clamp the wind-down grace below max_turns in both _setup_wind_down and the engine's approaching-limit threshold, so a Python-API grace >= max_turns no longer fires wind-down / approaching events at count 0. - Log the first soft wind-down counter-write failure at WARNING instead of swallowing all OSError silently. - Write Codex's hook feature flag to the [features] table (the one Codex reads; Stable, default-on) instead of the inert [experimental] table. - Clean up the orphaned counter file when install_wind_down_hook raises NotImplementedError and the counter lives in log_dir. - Reword CHANGELOG and quick-reference to distinguish streaming SIGTERM preemption from blocking post-hoc turn_capped reporting (Copilot is not SIGTERM'd). - Inline single-use hook constants in the Claude/Codex adapters. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test: lock turn-cap and wind-down fixes with regression tests Hardening tests for the PR #20 review fixes, each proven to fail on the pre-fix implementation: - blocking-path forced stdout buffering so a tool-use-counting adapter with max_turns reports tool_use_count and turn_capped (plus a no-cap negative control). - wrapped counter callback swallows a raising subscriber via _call_safely. - wind-down grace is clamped below the cap before reaching the shim. - the log_dir counter file is cleaned up when install_wind_down_hook raises NotImplementedError. - the first counter-write failure logs one WARNING; repeats stay quiet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

test_run_resolves_installed_ralph_by_name mocked only shutil.which, not the subprocess, so it spawned a real `claude` agent and asserted on its non-deterministic output — the run took ~35s and could flake when the model's response happened to contain "installed ralph". Wrap the run in the standard subprocess mock so resolution is exercised without a live agent call. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

paulnsorensen force-pushed the main branch from 6523677 to 85ff945 Compare April 18, 2026 21:22

paulnsorensen and others added 9 commits April 18, 2026 20:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(loop): <promise> completion tag for early exit#46

feat(loop): <promise> completion tag for early exit#46
paulnsorensen wants to merge 10 commits into
computerlovetech:mainfrom
paulnsorensen:main

paulnsorensen commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

paulnsorensen commented Apr 17, 2026

Summary

Why the <promise> tag format

What's new

Backwards compatibility

Test plan

Note on help-text test robustness

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Why the `<promise>` tag format