feat(agents): opencode host runtime — serve lifecycle + drive_turn (#588 phase 2)#592
Conversation
) OpenCodeServer manages a host 'opencode serve' process (writes the LiteLLM provider config pointing at the proxy with the agent's own key, spawns the server, waits on /doc health) and drive_turn runs one turn through OpenCodeAdapter, wiring its reply dicts to the sink. Mirrors openclaw_acp_runtime; never raises out of drive_turn (degrades to an error reply).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds ChangesOpenCode Runtime Integration
Sequence DiagramsequenceDiagram
participant Caller
participant drive_turn
participant OpenCodeAdapter
participant Sink
Caller->>drive_turn: text, trace_id, sink, base_url, model_id, ...
drive_turn->>OpenCodeAdapter: create with config
drive_turn->>OpenCodeAdapter: ensure_session()
drive_turn->>OpenCodeAdapter: prompt(text)
alt Success
OpenCodeAdapter->>Sink: delta dict with trace_id
OpenCodeAdapter->>Sink: final dict with trace_id
else Adapter Exception
drive_turn->>Sink: error dict with trace_id
else ensure_session Exception
drive_turn->>Sink: error dict with trace_id
end
drive_turn->>OpenCodeAdapter: close()
drive_turn->>Caller: return (no raise)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/test_opencode_runtime.py (1)
124-215: ⚡ Quick winAdd a regression for the live-but-unhealthy startup path.
The suite covers healthy and dead processes, but not the case where
_procis still alive while_health_check()stays false for a few polls. That branch is whereensure_running()can currently double-spawn on the same port and miss its deadline budget, so it needs a targeted regression test.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_opencode_runtime.py` around lines 124 - 215, Add a regression test that exercises the “live-but-unhealthy” startup path in OpenCodeServer.ensure_running: simulate asyncio.create_subprocess_exec returning a live _FakeProcess (with returncode None) while OpenCodeServer._health_check initially returns False for several polls then True (or stays False to trigger timeout), and assert that ensure_running does not double-spawn on the same port (i.e., spawn_count increments only once) and respects the deadline behavior; reference the existing test helpers/_symbols: OpenCodeServer.ensure_running, OpenCodeServer._health_check, asyncio.create_subprocess_exec, _FakeProcess, and server._proc.returncode to implement the scenario and assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tinyagentos/opencode_runtime.py`:
- Around line 81-99: The config file is written with default permissions,
leaking the LiteLLM key; ensure the directory is created with owner-only
permissions and the file is created with owner-read/write only. Change the
config_dir.mkdir call to set mode=0o700 (and keep parents=True, exist_ok=True)
and, after writing the payload (where config_path.write_text is used), call
config_path.chmod(0o600) (or atomically open/write the file with owner-only
mode) so the directory and opencode.json are owner-only; use os.umask if needed
to enforce modes on creation.
- Around line 145-146: The subprocess for the long-lived "opencode serve" child
is created with stdout=asyncio.subprocess.PIPE and
stderr=asyncio.subprocess.PIPE but those pipes are never drained; change the
creation in opencode_runtime.py so the child won't block: either set stdout and
stderr to asyncio.subprocess.DEVNULL (or None to inherit parent fds) when
calling create_subprocess_exec/create_subprocess_shell, or keep the PIPEs and
spawn async reader tasks that continuously read from process.stdout and
process.stderr (ensure you reference the same process variable used when
awaiting the child, e.g., the proc/process returned by create_subprocess_*).
Make one of these fixes so the OS pipe buffers cannot fill and stall the
runtime.
- Around line 127-148: The code currently may spawn a new opencode process while
self._proc is still alive; update the startup logic in the method that uses
self._proc/is_running() and _health_check() so it first polls the existing child
while it is alive and only spawns a new process if the child has exited.
Implement a real deadline using asyncio.get_running_loop().time() (compute
deadline = now + deadline_s) and loop: while loop.time() < deadline and
self._proc is not None and self.is_running(): await _health_check() with short
sleeps between checks; if the child exits during that loop fail fast and do not
spawn; only call asyncio.create_subprocess_exec(self._cfg.binary, "serve", ...)
when there is no live child. Apply the same change to the second similar block
(the one around the 154-166 region) so both startup paths use the same
live-child polling and time-based deadline logic.
---
Nitpick comments:
In `@tests/test_opencode_runtime.py`:
- Around line 124-215: Add a regression test that exercises the
“live-but-unhealthy” startup path in OpenCodeServer.ensure_running: simulate
asyncio.create_subprocess_exec returning a live _FakeProcess (with returncode
None) while OpenCodeServer._health_check initially returns False for several
polls then True (or stays False to trigger timeout), and assert that
ensure_running does not double-spawn on the same port (i.e., spawn_count
increments only once) and respects the deadline behavior; reference the existing
test helpers/_symbols: OpenCodeServer.ensure_running,
OpenCodeServer._health_check, asyncio.create_subprocess_exec, _FakeProcess, and
server._proc.returncode to implement the scenario and assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b7fc76b8-8d47-4400-8cd3-4527aefd252f
📒 Files selected for processing (2)
tests/test_opencode_runtime.pytinyagentos/opencode_runtime.py
| config_dir = Path(self._cfg.home) / ".config" / "opencode" | ||
| config_dir.mkdir(parents=True, exist_ok=True) | ||
| models = {mid: {} for mid in self._cfg.model_ids} | ||
| payload = { | ||
| "$schema": "https://opencode.ai/config.json", | ||
| "provider": { | ||
| "litellm": { | ||
| "npm": "@ai-sdk/openai-compatible", | ||
| "name": "LiteLLM", | ||
| "options": { | ||
| "baseURL": self._cfg.litellm_base_url, | ||
| "apiKey": self._cfg.litellm_key, | ||
| }, | ||
| "models": models, | ||
| } | ||
| }, | ||
| } | ||
| config_path = config_dir / "opencode.json" | ||
| config_path.write_text(json.dumps(payload, indent=2), encoding="utf-8") |
There was a problem hiding this comment.
Restrict the on-disk config permissions.
This file persists the agent-scoped LiteLLM key in plaintext, but the directory and file permissions are whatever the current umask allows. On a shared host, that can leave opencode.json readable by other local users; create the config dir/file with owner-only permissions before writing the key.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tinyagentos/opencode_runtime.py` around lines 81 - 99, The config file is
written with default permissions, leaking the LiteLLM key; ensure the directory
is created with owner-only permissions and the file is created with
owner-read/write only. Change the config_dir.mkdir call to set mode=0o700 (and
keep parents=True, exist_ok=True) and, after writing the payload (where
config_path.write_text is used), call config_path.chmod(0o600) (or atomically
open/write the file with owner-only mode) so the directory and opencode.json are
owner-only; use os.umask if needed to enforce modes on creation.
| "serve", | ||
| "--port", str(self._cfg.port), | ||
| "--hostname", "127.0.0.1", | ||
| stdout=asyncio.subprocess.PIPE, |
There was a problem hiding this comment.
CRITICAL: Subprocess output pipes not consumed
The subprocess is created with stdout=asyncio.subprocess.PIPE and stderr=asyncio.subprocess.PIPE, but these pipes are never read. If the subprocess produces enough output to fill the OS pipe buffers, it will block waiting for the pipes to be read.
This could cause the opencode serve process to hang indefinitely.
Fix: Either redirect output to devnull (stdout=asyncio.subprocess.DEVNULL, stderr=asyncio.subprocess.DEVNULL) or create a task to read and log the output.
| TimeoutError: if the server does not become healthy within *deadline_s*. | ||
| """ | ||
| # Fast path: already running and healthy. | ||
| if self.is_running() and await self._health_check(): |
There was a problem hiding this comment.
WARNING: Potential for multiple opencode serve processes on same port
If an existing opencode serve process is unhealthy (not responding to /doc) but still running, the fast path check will fail, causing write_config() to be called and a new process to be spawned without stopping the existing unhealthy process.
This could result in two processes trying to bind to the same port, likely causing the new process to fail to start.
Fix: Before spawning a new process, stop any existing unhealthy process by calling await self.stop() if self.is_running() is True but the health check failed.
| if await self._health_check(): | ||
| logger.info("opencode_runtime: server healthy after %.1fs", elapsed) | ||
| return | ||
| await asyncio.sleep(poll_s) |
There was a problem hiding this comment.
WARNING: Health check polling loop may miss server becoming healthy
The polling loop sleeps for a fixed poll_s interval and then increments elapsed by poll_s. If the server becomes healthy just after a health check and just before the sleep, we will miss it and potentially wait nearly an entire poll_s interval longer than necessary, or even exceed the deadline and raise a false TimeoutError.
Fix: Calculate the remaining time before each sleep and sleep for the minimum of poll_s and the remaining time.
…respawn Addresses two CodeRabbit Majors on #592: - The opencode.json config embeds the agent's LiteLLM key in plaintext; chmod it 0600 (mirrors install_hermes.sh on ~/.hermes/.env), with a test asserting the mode. - ensure_running reaps an existing live-but-unhealthy process before respawning (was orphaning it + colliding on the port), and polls against a wall-clock deadline so a slow health check can't overrun deadline_s.
… PIPEs Addresses the kilo CRITICAL / CodeRabbit Major on #592: spawning the long-lived server with stdout/stderr=PIPE deadlocks it once the OS pipe buffer fills (no one drains it). Redirect both to a 0600 serve.log under the config dir (no unread pipe, and the host taOS agent's serve logs are kept for diagnosis); close the handle on stop().
Phase 2 of the opencode taOS agent (#588), on top of the merged adapter (#591).
What
tinyagentos/opencode_runtime.py:OpenCodeServermanages a hostopencode serveprocess:write_config(the LiteLLM provider config pointing at the proxy with the agent's own scoped key),ensure_running(idempotent — spawns the server, pollsGET /docfor health up to a deadline),stop,is_running,base_url.drive_turnruns one turn throughOpenCodeAdapter, wiring its reply dicts to the sink. Mirrorsopenclaw_acp_runtime: lifecycle (server) is separate from turn execution (drive_turn takes abase_url), and it never raises — failures degrade to a singleerrorreply (no raw-exception leak, matching the openclaw pattern).Tests
tests/test_opencode_runtime.py(14):write_configJSON shape,ensure_runningidempotency (spawns once, no re-spawn when healthy) + health-timeout, anddrive_turnwiring the sink + the error-on-failure path (one error reply, no raise). Full file set 38 green (with the adapter tests).Next (#588): the taOS agent as an agent record with its own LiteLLM key, configurable in the Agents app, then rewire
routes/taos_agent.pyto drive this runtime + rename "taOS Assistant" to "taOS agent".Summary by CodeRabbit
New Features
Tests