feat(loop): <promise> completion tag for early exit#46
Open
paulnsorensen wants to merge 10 commits into
Open
Conversation
Adds opt-in early-exit to the loop via a structured completion tag. When stop_on_completion_signal: true is set in frontmatter and the agent emits a matching <TEXT> tag in its output, ralphify stops after the current iteration instead of continuing to the iteration budget. - _promise.py: strict TEXT parser with whitespace normalization - Two optional frontmatter fields: completion_signal and stop_on_completion_signal - examples/promise-completion/RALPH.md: canonical example - Docs updated across getting-started, cli, how-it-works, agents, cookbook, quick-reference - Tests in test_promise.py, test_agent.py, test_cli.py, test_engine.py (~98% coverage)
#6) * feat: seed CLI adapter layer with Claude, Codex, Copilot adapters Introduces a pluggable CLIAdapter Protocol and lands the three concrete adapters that exercise it, so the abstraction ships with demonstrated value on day one. - src/ralphify/adapters/__init__.py — CLIAdapter Protocol, AdapterEvent, first-match ADAPTERS registry, select_adapter() dispatch with GenericAdapter fallback. - src/ralphify/adapters/_generic.py — blocking-path fallback adapter. - src/ralphify/adapters/claude.py — stem "claude", --output-format stream-json --verbose, parses tool_use blocks in assistant messages, extracts completion signal from terminal result event. - src/ralphify/adapters/codex.py — stem "codex", --json, maps CollabToolCall/McpToolCall/CommandExecution to tool_use events, scans TurnCompleted/TaskComplete for promise tags. - src/ralphify/adapters/copilot.py — stem "copilot" (not "gh"), --output-format json, renders_structured=False, supports_soft_windown =False. install_wind_down_hook is part of the Protocol surface but every concrete adapter currently raises NotImplementedError — the actual hook plumbing (soft wind-down, AgentHook fanout, max_turns enforcement) lands in subsequent PRs. This PR is pure additions with no engine wiring: adapters are registered and discoverable, but the run loop still uses the old hard-coded Claude path. Tests: 50 new tests covering each adapter's event parsing, command building, completion-signal extraction, and registry dispatch. Co-Authored-By: WOZCODE <contact@withwoz.com> * feat: route Claude detection and promise completion through adapter Wire the CLI adapter layer into the three sites that previously hard-coded Claude knowledge: - _agent.py: execute_agent resolves the adapter (or accepts one from the caller) and delegates flag construction to adapter.build_command. _run_agent_streaming no longer appends --output-format/--verbose itself — the cmd it receives is already flagged. - _console_emitter.py: startup-hint text picks "live activity on" vs "live output on" via adapter.renders_structured instead of a claude-binary check. - engine.py: promise completion runs through adapter.extract_completion_signal(captured_stdout, signal). ClaudeAdapter scans the terminal result event only; GenericAdapter scans full stdout to preserve behavior for unknown CLIs. The engine-side _record_promise_fragment incremental scanner is gone. Tightens ClaudeAdapter.extract_completion_signal to return False when no result event is present so embedded <promise> tags inside status or assistant JSON can't trigger early completion. * fix: address copilot review feedback on adapter seed PR - Rename `supports_soft_windown` → `supports_soft_wind_down` across the Protocol and all adapters to match the `install_wind_down_hook` / "wind-down" terminology used elsewhere (caught before it baked into the public adapter surface). - Claude and Copilot `build_command()` now detect the `--output-format` flag pair and overwrite a user-supplied value instead of treating the flag and value tokens independently (which produced invalid argv when a caller pre-set `--output-format <other>`). - Align Claude `parse_event()` docstring with actual behavior: `result` events return `kind="result"` and non-assistant events return `kind="message"` rather than `None`. - Flip Codex `renders_structured` to False so the peek panel stays in raw-line mode until the emitter is adapter-driven; Codex's streaming path currently feeds an `_IterationPanel` that only understands Claude's stream-json schema. * refactor: split renders_structured into supports_streaming and renders_structured_peek Addresses the architectural concern in PR #6 review: the old `renders_structured` flag conflated two orthogonal capabilities: 1. Whether the adapter emits a structured JSON stream that the execution path can parse into on_activity callbacks. 2. Whether the console peek panel understands the adapter's event schema and should render it via `_IterationPanel` instead of raw lines. Before the split, an adapter had to choose between a useful streaming path with an empty peek panel (Codex's previous state) or a working raw peek with no activity callbacks. The split makes the correct combination representable: - ClaudeAdapter: supports_streaming=True, renders_structured_peek=True - CodexAdapter: supports_streaming=True, renders_structured_peek=False - CopilotAdapter / GenericAdapter: both False Codex now takes the streaming path again (on_activity callbacks fire) while the peek panel stays in raw-line mode until the emitter is adapter-driven. Callers updated: - _agent.execute_agent dispatches on adapter.supports_streaming - _console_emitter consumes adapter.renders_structured_peek via _agent_renders_structured_peek (renamed) - tests/conftest autouse fixture forces both flags to False * perf: skip stdout capture when adapter exposes result_text The previous adapter seed forced capture_stdout=True on every iteration so the engine could feed the buffered transcript into adapter.extract_completion_signal. For verbose streaming agents this regressed memory vs the prior tail-scan approach, and Claude in particular was paying for a full transcript buffer it did not need: agent.result_text already carries the terminal assistant message. Add a requires_full_stdout_for_completion capability flag and a keyword-only extract_completion_signal(result_text=, stdout=, user_signal=) signature so each adapter declares its own input. Claude sets the flag False and reads result_text directly; Codex / Copilot / Generic keep the full-stdout path. The engine now gates capture on log_dir or (stop_on_completion_signal AND requires_full_stdout_for_completion), so streaming agents skip the buffer entirely unless logging is on, and non-streaming agents only buffer when the user opts into completion signalling. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: WOZCODE <contact@withwoz.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
) * refactor: extract adapter Protocol and registry into _protocol.py The adapter package's __init__ both imports concrete adapter modules (to populate ADAPTERS) and defines the Protocol those modules depend on. That's a latent circular-import hazard: it works today only because __init__ defines the Protocol before it imports concrete adapters. As soon as an adapter grows a dependency that could re-enter the package namespace, the order-of-execution becomes load-bearing. Split the Protocol, the AdapterEvent NamedTuple, the Literal kind enums, and the ADAPTERS list into a new leaf module _protocol.py. Concrete adapters (claude, codex, copilot, _generic) now import from _protocol directly. __init__.py re-exports the public names so existing `from ralphify.adapters import CLIAdapter` imports keep working, and keeps select_adapter plus the builtin-registration bootstrap. No behaviour change. * feat: add turn-cap event types and max_turns frontmatter fields Preparatory foundation for the forthcoming turn-cap enforcement work: no enforcement yet, just the event-type surface and the validated RunConfig fields that enforcement will read. - _events.py: TOOL_USE, ITERATION_TURN_APPROACHING_LIMIT, and ITERATION_TURN_CAPPED event kinds with typed data payloads (ToolUseData, TurnApproachingLimitData, TurnCappedData) threaded through the EventData union. - _frontmatter.py: FIELD_MAX_TURNS and FIELD_MAX_TURNS_GRACE constants. - _run_types.py: max_turns (int | None, default None) and max_turns_grace (int, default 2) fields on RunConfig. - cli.py: _validate_max_turns / _validate_max_turns_grace helpers with range and type checks; grace must be strictly less than the cap when the cap is set. _build_run_config threads both values into the returned RunConfig. - tests/test_cli_frontmatter_fields.py: unit coverage for the validators plus end-to-end _build_run_config assertions for the default, valid, and invalid cases. Nothing consumes max_turns yet; it will be read by the adapter-driven enforcement path in a follow-up PR. * fix: mock shutil.which in frontmatter field tests CI runners don't have 'claude' on PATH, so _build_run_config's agent validation was failing before reaching the turn-cap assertions.
#10) * workspace: bootstrap improve-codebase Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant `if parts else ""` — empty join already returns "" `" · ".join([])` returns `""`, so the explicit empty-case guard in `_format_params` is dead. Behavior is identical either way. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_format_params` simplification + verified-live audit - iterations.md: log 4ccfa9a - backlog.md: note vulture false-positives confirmed live; add two future-win candidates to triage - coverage/_console_emitter.md: first coverage note for this module Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: extract `_stop_compact_live_unlocked` to dedupe compact-Live teardown The three-line `if self._live is not None: self._live.stop(); self._live = None` pattern appeared in three call sites (`_stop_live_unlocked`, `enter_fullscreen`, `_on_iteration_ended`). Collapsed into a single helper — behavior unchanged. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_stop_compact_live_unlocked` extraction + backlog note Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant `_iteration_order` list — dict insertion order suffices Python dicts preserve insertion order since 3.7, so the parallel `_iteration_order` list tracking the same sequence as `_iteration_history` was pure duplication. Archive re-orders via pop-then-insert, eviction iterates the dict (oldest-first), and enter_fullscreen's most-recent lookup uses `reversed()`. Behavior is unchanged; the only test hit was a direct field reference. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_iteration_order` removal Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: extract `_call_safely` helper for best-effort observer callbacks The same guarded-callback pattern (`if cb is not None: try: cb(...); except Exception: pass`, with identical "best-effort; draining must not stop" comment) was duplicated three times across `_read_agent_stream` and `_pump_stream`. Consolidate into a single `_call_safely` helper defined next to the callback type aliases. Behavior unchanged — the helper preserves the None guard, suppresses all exceptions, and only fires the callback once with the provided arguments. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_call_safely` extraction + seed `_agent.py` coverage note Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant max(total - visible, 1) guard in `_scrollbar_metrics` The early return `if total <= visible` above guarantees `total > visible`, so `total - visible` is always ≥ 1. Inline the subtraction directly and note the invariant in a comment. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record scrollbar-metrics simplification Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant empty-dict guard in `_format_categories` `" · ".join([])` already returns `""`, so the early-return on an empty `_tool_categories` dict was dead code — same pattern as 4ccfa9a's `_format_params` cleanup. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_format_categories` empty-dict guard removal Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: extract `_step_iteration` to dedupe prev/next iteration browsing `prev_iteration` and `next_iteration` were 12-line mirror images differing only in step direction and eviction-fallback choice (oldest vs newest). Extracted to a single direction-parametric helper — behavior preserved. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_step_iteration` extraction Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: use public `iteration_id` property in history eviction The one cross-class `_iteration_id` access in the module now goes through the public property, keeping `_FullscreenPeek`'s private attribute private. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `iteration_id` property-access cleanup Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop `_reset_view` — identical to `scroll_to_bottom` `_FullscreenPeek._reset_view` set `_offset = 0` and `_auto_scroll = True`, the same two assignments as `scroll_to_bottom`. Dropped the private helper and called `scroll_to_bottom()` from the two `_step_iteration` sites instead. Added the docstring that used to live on `_reset_view` to the surviving method so the "snap to newest line + follow" intent is still recorded. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_reset_view` removal Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: defer `panel_for` guard to `is_live` to dedupe identical condition Both methods checked `self._current_iteration == iteration_id and self._active_renderable is not None`. Rewrite `panel_for` to call `is_live` so the condition lives in exactly one place — if the "this is the live iteration" check ever needs to grow (e.g. add a paused state), `panel_for` follows automatically. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `panel_for` / `is_live` dedup Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `total_in` alias in `_format_tokens` The local `total_in` variable was a pointless rename of `self._input_tokens` — it hinted at an aggregated "total" that no longer exists (cache-read tokens are intentionally excluded). Read `self._input_tokens` directly so the label reflects what the value actually is. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `total_in` alias inlining Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant f-string wrap around `_plural` result `_plural` already returns a str, so `f"{_plural(...)}"` constructs the same string via an extra format step. Pass the value straight through, matching the style of the sibling `header.append(...)` calls. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_plural` f-string wrap cleanup Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant empty-user_args branch in `resolve_args` `_ARGS_RE.sub` with a callable that returns `dict.get(name, "")` already resolves every placeholder to the empty string when the dict is empty, so the `if not user_args: return _ARGS_RE.sub("", prompt)` early return was dead code producing the same output as the general path. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `resolve_args` empty-branch cleanup Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: promote 40-line fallback height to `_DEFAULT_CONSOLE_HEIGHT` constant Both `_FullscreenPeek._console_height` (pre-render default) and `_fullscreen_page_size`'s except-branch fallback used the literal 40 to mean "reasonable terminal height when the real value isn't available." Naming the shared intent in one place keeps the two in lockstep. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_DEFAULT_CONSOLE_HEIGHT` constant promotion Shift current phase from Phase 1 (dead code — exhausted) to Phase 3 (magic values), update the _console_emitter.py coverage note with the new sha, and log the iteration. Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: narrow `name_col` scope to `if arg:` in `_apply_assistant` The tool_use branch computed the padded `name_col` unconditionally, then only used it when `arg` was truthy — the `else` branch uses the raw `name` instead. Move the pad-to-column formatting inside `if arg:` so the helper variable only exists where it's actually rendered, and no unnecessary f-string work happens when a tool call has no primary arg. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `name_col` scope narrowing and open Phase 4 Phase 3 (magic values) is essentially drained — mark it so and open Phase 4 (complex conditionals & long functions) as the current phase. The 134078d narrowing is the first concrete Phase 4 item. Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant `_active_renderable` guard in `_on_iteration_started` `_archive_current_iteration_unlocked` already no-ops when no iteration is active (guarded at its entry), so wrapping the call in `if self._active_renderable is not None:` added nothing. Dropped the `if` and updated the surrounding comment to note the archive call's no-op behavior. Same shape as 5337d88 and 4ccfa9a's dead-guard removals. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_on_iteration_started` guard removal Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: move `_structured_agent` check out of lock in `_on_agent_output_line` Mirrors the pattern already used in `_on_agent_activity`: the `_structured_agent` flag is write-once (set in `_on_run_started` before any iteration events can flow), so the short-circuit check is safe lock-free. Also avoids acquiring the console lock for every stdout line when running Claude in structured mode. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_on_agent_output_line` lock-free check move Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: skip dead `"".join(...)` in `_run_agent_streaming` when not logging The trailing block joined `stream.stdout_lines` and `stderr_lines` unconditionally, then the AgentResult ternary discarded both joined strings when `log_dir is None` (they were only ever consumed by `_write_log` and the `captured_stdout`/`captured_stderr` fields). Fold the `if log_dir is not None` check up into the join so the no-log path skips the work entirely — same shape as the already-lazy `_run_agent_blocking` tail, which uses `"".join(...) if x is not None else None` exactly once. Drops the duplicated ternary on the AgentResult fields too. Same observable behavior (verified by `test_captured_output_set_when_logging` and `test_no_log_when_dir_not_set`). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_run_agent_streaming` lazy-join refactor Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop `line_count` alias in `_IterationSpinner._build_footer` The local was computed unconditionally but only read on the truthy branch (both as predicate and as `_plural` arg). Inline the truthy check on `self._scroll_lines` and call `len()` only inside the branch that needs it — matches the sibling `_IterationPanel._build_footer` style (`if self._tool_count > 0:` with direct attribute references). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_IterationSpinner._build_footer` alias drop Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `agent` alias in `_on_run_started` The local was read exactly once — the only use was the immediately following `_is_claude_command(agent)` call. Reading `data["agent"]` directly matches the style of the sibling fc5e1cb inline of `total_in` and keeps the function focused on what it does. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `agent` alias inline in `_on_run_started` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop `new_offset` alias in `_FullscreenPeek.scroll_down` The local was assigned immediately to `self._offset`, then the follow-mode check re-read it as `new_offset == 0`. Reading `self._offset` directly after assignment has identical semantics — the alias added no value. Sibling `scroll_up` keeps its local because it compares the new value against the old one *before* the assignment (to conditionally disable auto-scroll), which requires two distinct values. `scroll_down` has no such comparison, so the alias is dead. Same shape as ef176bf (`line_count`) and 134078d (`name_col`). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_FullscreenPeek.scroll_down` alias drop Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `msg` alias in `_apply_assistant` The `msg = raw.get("message", {})` local was read exactly once on the next line as `msg.get("usage")`. Reading `raw.get("message", {}).get("usage")` directly matches the chained-get style already used in `_iter_content_blocks` and the inline-alias style established by 497c028 (`agent`) and fc5e1cb (`total_in`). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `msg` alias inline in `_apply_assistant` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `_fullscreen_page_size()` into space/b lambdas The `page` local in `_handle_fullscreen_key` was computed unconditionally in the non-exit branch but only consumed by the space/b action lambdas (page-down / page-up). Inlining `self._fullscreen_page_size()` into those two lambdas means it's only evaluated when space or b is actually pressed — j/k/g/G/[/] now skip the call entirely. Behavior is identical: the dict is built and a single action is invoked per keypress, and the page value is read at action-invocation time under the same lock. Same Phase 4 shape as 134078d / ef176bf / b19625e — narrow the scope of a local that was only live on one branch. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `_fullscreen_page_size()` lambda-inline refactor Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: use `next(reversed(...), None)` in `enter_fullscreen` history fallback The compound `if initial_id is None and self._iteration_history:` guard was doing two jobs: pick the fallback only when there's no live iteration, *and* avoid `next(reversed({}))` raising StopIteration on an empty dict. Switching to `next(reversed(...), None)` lets the standard sentinel default handle the empty case so the outer `if` only encodes the "fall back when nothing is live" intent. Same observable behavior: when both `_current_iteration` is None and `_iteration_history` is empty, `initial_id` ends up None and the existing `if initial_id is None or self.panel_for(...) is None:` branch prints "Full peek: no iterations yet" and returns False — covered by `test_enter_without_iteration_prints_hint`. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `enter_fullscreen` history-fallback simplification Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: drop redundant BOM-startswith guard before `removeprefix` `str.removeprefix` is already a no-op when the prefix is absent — the `if text.startswith(_UTF8_BOM)` guard was dead defensive code. Same shape as 5337d88 / 4ccfa9a / 8cb0d47's "drop the guard when the operation already handles the empty/no-match case." Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record BOM-guard drop in `parse_frontmatter` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: replace `parsed = None` sentinel with try/except/else in `_read_agent_stream` The sentinel was a no-op placeholder for the error path: setting `parsed = None` just so `isinstance(parsed, dict)` would skip the forwarding block on the next line. Using the `try/except/else` structure expresses the same intent directly — the dict-handling logic runs only when `json.loads` succeeds — and drops the sentinel assignment plus the redundant dict check on the JSON-error path. Behavior preserved: the inner `isinstance(parsed, dict)` still gates forwarding for valid-but-non-dict JSON (lists, strings, numbers). Covered by the existing stream tests (`test_agent.py::test_stream_json_*`). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record try/except/else refactor in `_read_agent_stream` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: replace `source._outcome` cross-class access with public `outcome` property `_FullscreenPeek._build_header` was reading `source._outcome` directly on a `_LivePanelBase` instance, crossing class boundaries into a private attribute. Expose an `outcome` property on `_LivePanelBase` (matching the `iteration_id` property added in ef9a178 for the parallel case) so the fullscreen header reads the value through a public API. Same observable behavior; the attribute is still the single source of truth inside `freeze`. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `outcome` property refactor on `_LivePanelBase` Iteration note, coverage update, and backlog entry for d0060b3 — the ef9a178-style public-property substitution applied to the remaining `source._outcome` cross-class access in the fullscreen header. Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: narrow `secs` scope into `if minutes < _MINUTES_PER_HOUR:` branch `secs = total % _SECONDS_PER_MINUTE` was computed unconditionally but only consumed by the `f"{minutes}m {secs}s"` return on the truthy branch. The hours branch derives its own `mins`/`hours` pair and never references `secs`. Moving the computation inside the branch that uses it keeps the local co-located with its use site. Same Phase 4 narrow-the-scope shape as 134078d (`name_col`), ef176bf (`line_count`), 59b0e34 (`page`), b19625e (`new_offset`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `secs` scope-narrowing in `format_duration` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `text` alias in `collect_output` The local was assigned from `ensure_str(stream)` and read exactly once on the next line as `parts.append(text)`. Inlining matches the alias-inline style from fc5e1cb / 497c028 / 52e0272 — the helper name already documents the decode step. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `text` alias inline in `collect_output` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: narrow `line` scope past early-return in `_on_agent_output_line` `escape_markup(data["line"])` was computed unconditionally before the `if not isinstance(target, _IterationSpinner): return` guard, so the work was wasted whenever the event hit the early-return branch (no active panel, or structured panel for a non-structured-agent edge case). Moving the binding below the guard keeps the same output for the _IterationSpinner branch and drops the dead work for the other branch. Same Phase 4 shape as 134078d's `name_col` narrowing and 7730dd4's `secs` narrowing. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `line` scope narrowing in `_on_agent_output_line` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `reason` alias in `run_loop` stop-event emit The `reason = state.status.reason` local was read exactly once on the next line as the `reason=` kwarg to `RunStoppedData(...)`. Collapsing to `reason=state.status.reason` matches the inline-alias pattern from ce487d3 (`text`) / 52e0272 (`msg`) / 497c028 (`agent`) / fc5e1cb (`total_in`). The property access is safe at this call site: the immediately-preceding `if state.status == RunStatus.RUNNING: state.status = RunStatus.COMPLETED` ensures `state.status` is always a terminal status (STOPPED / COMPLETED / FAILED), so `RunStatus.reason` cannot raise. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `reason` alias inline in `run_loop` Added coverage/engine.md noting the inline-alias refactor and the safety argument for the `state.status.reason` property access. Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `binary` alias in `_supports_stream_json` The local was read exactly once on the next line as `binary == CLAUDE_BINARY`. Collapsing to `Path(cmd[0]).stem == CLAUDE_BINARY` matches the sibling check in `_console_emitter.py:_is_claude_command`, which already uses the chained form, and the inline-alias pattern from ce487d3 / 52e0272 / 497c028 / fc5e1cb. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `binary` alias inline in `_supports_stream_json` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `visible` alias in `_LivePanelBase._build_body` The `visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:]` local was read exactly once as the iterable of the very next `for line in ...` loop. Collapsing to `for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]:` matches the inline-alias pattern from 497c028 / fc5e1cb / 52e0272 / ce487d3 / e1ad87a — the slice already documents "the last N scroll lines" without needing a named intermediate. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `visible` alias inline in `_build_body` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `reader` thread handle in `_read_agent_stream` The `reader` local served only to call `.start()` — the thread is never joined explicitly because termination is signalled through the queue's None sentinel and the daemon flag. Collapsing into the fluent `Thread(...).start()` form matches the fire-and-forget intent and drops an unused binding. Python's `threading` module keeps live threads reachable via `threading._active`, so dropping the local reference does not affect thread lifetime. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `reader` thread handle inline in `_read_agent_stream` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `remaining` alias in `_read_agent_stream` timeout calc The `remaining = deadline - time.monotonic()` local served a single use on the very next line as `max(remaining, 0)`. Collapsing to `max(deadline - time.monotonic(), 0)` matches the inline-alias style established by e1ad87a (`binary`), 497c028 (`agent`), fc5e1cb (`total_in`), 52e0272 (`msg`), and ce487d3 (`text`). Updated the adjacent comment to refer to "clamp to 0" since the `remaining` name no longer exists. Behavior preserved — same value reaches `line_q.get(timeout=...)` on every iteration of the read loop. Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `remaining` alias inline in `_read_agent_stream` Co-authored-by: Ralphify <noreply@ralphify.co> * refactor: inline `stream_cmd` into Popen call in `_run_agent_streaming` The local was constructed and read exactly once on the very next statement as the first positional arg to `subprocess.Popen(...)`. The three appended tokens are already named constants (`_OUTPUT_FORMAT_FLAG`, `_STREAM_FORMAT`, `_VERBOSE_FLAG`) so the intent reads clearly at the call site without the intermediate binding. Same Phase 4 inline-alias shape as 66d6c60 (`remaining`), b24accf (`reader`), 2fda4f0 (`visible`), and e1ad87a (`binary`). Co-authored-by: Ralphify <noreply@ralphify.co> * workspace: record `stream_cmd` inline in `_run_agent_streaming` Co-authored-by: Ralphify <noreply@ralphify.co> --------- Co-authored-by: Kasper Junge <kasperjunge@Kaspers-MacBook-Pro.local> Co-authored-by: Ralphify <noreply@ralphify.co> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: support embedding ralphify as a library so callers can run loops in-process
Make typer/rich a `[cli]` extra so the core (pyyaml-only) imports cleanly
in library contexts; `ralph` CLI raises a clear install hint when the extra
is missing. Add in-memory `RunConfig.prompt` to run a loop without a
RALPH.md file, plus a RunManager embedding lifecycle (wait_for_any/all,
get_result, shutdown) and an immutable RunResult snapshot. Make Event
generic over its payload and export the typed event payloads so embedders
can annotate handlers without casting. Documented in docs/embedding.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: address PR review findings for library embedding API
- manager: guard wait_for_any/wait_for_all against infinite blocking when
run_ids is empty or contains only unregistered IDs with timeout=None —
nothing can ever notify the condition, so return immediately
- __init__: narrow the CLI-import guard in main() to ModuleNotFoundError
and only emit the [cli]-extra hint for missing rich/typer; re-raise any
other missing-module error so real bugs in ralphify.cli aren't masked
- _events: replace the unsound cast(EventData, {}) with an explicit empty
NoData payload type added to the EventData union; drop the unused cast
- _events: reword an internal "curd 1" note to plain language
Adds no-hang regression tests for the wait helpers (timeout=None) and a
test that main() re-raises unrelated import errors.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
New `Invocation` abstraction and `deliver_prompt` method let adapters choose where the prompt goes: stdin or as a positional argument. Arg-delivery adapters (opencode) return `stdin_text=None` for DEVNULL stdin and no writer thread. Enables opencode as first-class with `--format json` event parsing and safe positional-argument delivery. Co-authored-by: Claude <noreply@anthropic.com>
The opencode adapter mapped `tool_result` to a message event, but opencode never emits that type — verified against run.ts emit() call sites in both v1.15.10 and v0.15.31. Meanwhile the newer `reasoning` and `error` events were unhandled, so they were silently dropped from the live peek panel. Map the real informational event set (step_start / text / reasoning / error) so they render without counting against the turn cap. Tool output rides on the existing `tool_use` event once a tool part reaches the completed/error status, so no separate result-event handling is needed. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
feat: add Crush CLI adapter so users can loop Charm Crush Crush is TUI-first but ships a `crush run` subcommand for non-interactive single-prompt use, reading the prompt from stdin and auto-approving all permissions (no --yolo needed). It emits plain text only — no JSON or streaming-event mode — so the adapter sits in the blocking tier alongside the generic adapter: counts_what="none", supports_streaming=False, requires_full_stdout_for_completion=True. Completion is detected by scanning stdout for the <promise> tag. build_command injects `--quiet` (idempotent) to suppress the spinner for clean piped output. Verified against crush v0.71.0 source. Also gitignore .cheese/ and .serena/ (skill scratch + LSP cache). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…own (#20) * feat: enforce max_turns and add agent lifecycle hooks Completes Phase 3 of the CLI adapter layer: fills in the turn-cap enforcement, lifecycle hooks, and per-CLI soft wind-down that the merged adapter seed (#6, #7) left scaffolded as NotImplementedError stubs and unwired events. - max_turns is now enforced: streaming adapters that count tool uses (Claude, Codex, Copilot, opencode) are SIGTERM'd at the cap and the iteration is recorded as completed-at-cap; Crush (counts_what="none") treats the cap as a no-op. - New `hooks` frontmatter field + AgentHook protocol (ShellAgentHook, CombinedAgentHook, NoOpAgentHook). Hooks observe iteration lifecycle, tool use, turn-approaching-limit, turn cap, and completion; a failing hook is logged but never aborts the run. - Per-CLI soft wind-down: Claude PreToolUse / Codex PostToolUse hooks are installed into a per-iteration tempdir (CLAUDE_CONFIG_DIR / CODEX_HOME) so the user's real config stays untouched. Adapters with no hook system downgrade to hard-cap-only via NotImplementedError. Re-sliced from the original explore-cli-hooks branch (PRs #5/#9) onto current main, reconciled with the prompt-delivery-as-adapter (#17) and library-embedding (#16) refactors, and extended with turn-cap coverage for the Crush and opencode adapters that landed after those PRs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: correct turn-cap enforcement, wind-down clamps, and Codex hook config Addresses review findings on PR #20: - Force stdout buffering on the blocking path when max_turns is set on a tool-use-counting adapter, so post-hoc counting actually reports tool_use_count and sets turn_capped (previously always 0 for Copilot). - Route the wind-down counter callback through _call_safely so a raising subscriber cannot crash the agent loop. - Clamp the wind-down grace below max_turns in both _setup_wind_down and the engine's approaching-limit threshold, so a Python-API grace >= max_turns no longer fires wind-down / approaching events at count 0. - Log the first soft wind-down counter-write failure at WARNING instead of swallowing all OSError silently. - Write Codex's hook feature flag to the [features] table (the one Codex reads; Stable, default-on) instead of the inert [experimental] table. - Clean up the orphaned counter file when install_wind_down_hook raises NotImplementedError and the counter lives in log_dir. - Reword CHANGELOG and quick-reference to distinguish streaming SIGTERM preemption from blocking post-hoc turn_capped reporting (Copilot is not SIGTERM'd). - Inline single-use hook constants in the Claude/Codex adapters. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test: lock turn-cap and wind-down fixes with regression tests Hardening tests for the PR #20 review fixes, each proven to fail on the pre-fix implementation: - blocking-path forced stdout buffering so a tool-use-counting adapter with max_turns reports tool_use_count and turn_capped (plus a no-cap negative control). - wrapped counter callback swallows a raising subscriber via _call_safely. - wind-down grace is clamped below the cap before reaching the shim. - the log_dir counter file is cleaned up when install_wind_down_hook raises NotImplementedError. - the first counter-write failure logs one WARNING; repeats stay quiet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
test_run_resolves_installed_ralph_by_name mocked only shutil.which, not the subprocess, so it spawned a real `claude` agent and asserted on its non-deterministic output — the run took ~35s and could flake when the model's response happened to contain "installed ralph". Wrap the run in the standard subprocess mock so resolution is exercised without a live agent call. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds opt-in early-exit to the loop via a structured completion tag. When
stop_on_completion_signal: trueis set in frontmatter and the agentemits a matching
<promise>TEXT</promise>tag in its output, ralphifystops after the current iteration instead of continuing to the iteration
budget. The default inner text is
RALPH_PROMISE_COMPLETE; override itwith the optional
completion_signalfield.Why the
<promise>tag formatThe tag format intentionally mirrors what the Claude Code ralphify
plugin (Ralph-Wiggum) already uses to signal structured completion
back to its harness. Aligning on the same
<promise>...</promise>shapemeans a single prompt body works on both sides — a ralph written against
the Claude plugin can stop a ralphify loop with no adapter code, and
vice versa. Ralphify still runs its own command/prompt loop; only the
completion-tag format is shared.
What's new
src/ralphify/_promise.py— strict<promise>TEXT</promise>parserwith whitespace normalization (bare marker text no longer counts)
completion_signal— inner text of the promise tag (defaultRALPH_PROMISE_COMPLETE)stop_on_completion_signal— bool gate for early exit (defaultfalse)examples/promise-completion/RALPH.md— canonical example, surfacedfrom
ralph scaffoldhelp and the cookbookralph run --helpandralph scaffold --helpnow explain the patterncookbook, quick-reference
tests/test_promise.py,tests/test_agent.py,tests/test_cli.py,tests/test_engine.py(changed modules land at ~98% coverage)
Backwards compatibility
Default behavior is unchanged. Loops still run to their iteration budget
unless both
stop_on_completion_signal: trueand the configured tagare present. Existing ralphs are untouched.
Test plan
uv run pytest— 661 passed, 1 xpasseduv run ruff check .uv run ruff format --check .uv run ty checkuv run mkdocs build --strictexamples/promise-completion/exits early on<promise>COMPLETE</promise>Note on help-text test robustness
tests/test_cli.py::TestHelpasserts that<promise>...</promise>appears in
--helpoutput. Rich wraps help text differently in headlessCI vs. a local TTY, which splits the tag across lines and breaks a
naïve substring check. The tests strip ANSI codes and collapse
whitespace before asserting, so they're stable on both sides.