refactor(agent,runtime): unify PATH/env handling via command prelude#63
Open
zpzjzj wants to merge 8 commits into
Open
refactor(agent,runtime): unify PATH/env handling via command prelude#63zpzjzj wants to merge 8 commits into
zpzjzj wants to merge 8 commits into
Conversation
Collaborator
Author
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 226344a804
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
8cae05a to
4ee2bda
Compare
…updates Lets setup-time orchestrators (notably each agent's Install) seed the runtime's persistent env baseline with values resolved through the target shell. Subsequent Exec calls inherit these vars unless overridden by opts.Env. The intended use case is one-time setup before the runtime is exercised in parallel — concurrent MergeEnv vs. Exec is not safe and documented as such. The three real runtimes (NoneRuntime, DockerRuntime, OpenSandboxRuntime) implement MergeEnv as a simple cfg.Env overlay. For DockerRuntime, the container's entrypoint (sleep infinity) is started with the env present at Create time, so post-Create MergeEnv only affects subsequent `docker exec` calls — the entrypoint runs unaffected. Six in-tree test mocks (mockRuntime, scriptJudgeRuntime, mockJudgeTestRuntime, claudeCodeTestRuntime, codexTestRuntime, qoderTestRuntime) gain trivial no-op stubs to keep implementing the interface; tests that need to observe MergeEnv behaviour will get real recording in later commits. No caller changes — this commit only widens the interface and adds the plumbing. Tests for the new behaviour live next to each real runtime. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…a MergeEnv
Move PATH provisioning from "agent injects $-string into env map and hopes
the runtime expands it" to "agent asks the runtime's own shell to expand
the string, then writes the literal result to the runtime's env baseline".
Concretely:
- agent.go: `mergeExecOptionsEnv` no longer injects PATH; it now only
merges credentials, observability, and caller env. The
`agentExecutablePath` constant is gone — it was the shell-expansion
string that all the runtime-side compensation code existed to handle.
- agent.go: new private helper `probeAndMergePATH(ctx, rt, probeCmd)`
runs the supplied probe (a `printf '%s' "$HOME/..."`-style command)
through the runtime, then calls `rt.MergeEnv({"PATH": <result>})`.
Probe failure logs a warning and leaves the runtime's default PATH
in effect.
- Each agent's `Install` calls the helper at the top with its own
per-agent probe constant:
- claude-code & codex: include $HOME/.local/bin AND
$HOME/.nvm/current/bin (their CLIs are node scripts whose shebang
needs node, which lives under the nvm symlink).
- qodercli: $HOME/.local/bin only (qodercli is a self-contained
binary placed by its official installer; no nvm needed).
The probe constants are deliberately not shared even where claude
and codex have identical values — a future divergence shouldn't
silently cross-contaminate.
Type=none semantic change: the evaluator skips `ag.Install` when
environment.type=none, so the probe doesn't run there either. The
host shell's PATH is now the contract for type=none, matching the
Unix expectation that tools inherit your shell env. This drops the
previously-forced `$HOME/.local/bin:$HOME/.nvm/current/bin:$PATH`
for type=none users; documented in a follow-up commit's CHANGELOG.
Tests: agent_test.go gains three TestProbeAndMergePATH_* cases for
happy/failure/empty-stdout paths. The three per-agent install tests
have their `lastExecEnv["PATH"] == agentExecutablePath` assertion
replaced with the new shape: PATH absent from opts.Env (agent doesn't
inject) AND `mergedEnv["PATH"]` populated (probe→MergeEnv ran). The
three fake runtimes (claudeCodeTestRuntime, codexTestRuntime,
qoderTestRuntime) gain a probeResponseStdout field, a recording
MergeEnv impl, and a guard that detects the probe command shape so
it isn't recorded as the agent's "real" command. e2e/pipeline_test.go
comment updated to describe the new mechanism (and to note why the
mock-engine path is unaffected: it uses type=none with explicit
mockEngineEnv-supplied PATH).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…x Exec Both runtimes carried matching hacks to detect PATH values containing \$, strip them from the literal env list, and prepend `export PATH=...` to the command so the inner shell would expand them. Docker also rejected \$(...) to guard against command substitution firing inside that auto-generated export. The agent layer no longer pushes \$-bearing PATH (see prior commit: each agent's Install probes the runtime shell for the literal PATH and hands it to runtime.MergeEnv). With no caller pushing such values, the runtime can pass every env key literally — no key special-cased, no \$(...) rejection needed because the runtime never constructs a shell command from a caller-supplied env value. Removes: - docker.go: PATH-with-\$ skip in buildCreateArgs and the pathPrefix block in Exec (including the \$(...) rejection and the comment that explained why it was needed). - opensandbox.go: literalRemoteEnv, withRemoteEnvExpansion, and the shellDoubleQuote helper they shared. opensandbox Exec now just passes Command + Envs straight through to the SDK. Tests updated to assert literal pass-through (including \$-bearing PATH and \$(...) forms) rather than the deleted special-case behaviour. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mergeEnv previously routed values through expandEnvMap, which used
os.Expand to substitute \$VAR / \${VAR} against the host env before
handing the result to cmd.Env. Docker and opensandbox have always
forwarded values literally, so the same eval.yaml could behave
differently under different runtimes.
Drop expandEnvMap and forward values literally in NoneRuntime as well.
Callers that need shell-side expansion should resolve the value first
(see internal/agent.probeAndMergePATH) or prepend `export X=...` to
their command. CHANGELOG documents the migration.
Tests: TestNoneRuntime_ExecExpandsPathFromRuntimeEnv replaced with
TestNoneRuntime_ForwardsEnvLiterally, which asserts the inverse
invariant — \$-bearing values reach the child process as the literal
string, with no host-side substitution.
CHANGELOG: [Unreleased] section now lists the three concrete changes
shipped by this PR — `runtime.MergeEnv` added, agent layer no longer
injects PATH, NoneRuntime stops expanding `\$VAR`. Includes the
type=none migration note about the host shell's PATH now being the
contract.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…est probe match Two follow-ups from PR review: 1. CHANGELOG was silent on the docker/opensandbox side of the "values forward literally" rule. The doc only called out NoneRuntime dropping host-side $VAR expansion, but the same PR also stops silently dropping `environment.env.PATH` values containing `$` in docker (previously buildCreateArgs skipped such keys to keep the container's entrypoint executable). Repo grep finds zero users relying on this — phantom scenario, deliberately accepted in PR discussion — but the breakage mode (container fails to start) is loud enough that users hitting it deserve to see the migration note alongside the NoneRuntime one. 2. The three agent test fakes used `strings.HasPrefix(command, "printf '%s' \"$HOME/")` to detect probe Exec calls. Loose prefix match would silently swallow any future test that legitimately issued such a command for non-probe reasons, returning the canned fake PATH instead of recording the real exec. Tighten to exact match against each agent's known probe constant (claudeCodeExecPathProbeCmd / codexExecPathProbeCmd / qoderExecPathProbeCmd). Production code never had this issue — real runtimes don't dispatch on command shape — so this is purely a test-isolation hardening. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…eanup The earlier "drop PATH-with-$ special-case in docker/opensandbox Exec" commit deleted literalRemoteEnv / withRemoteEnvExpansion / shellDoubleQuote from opensandbox.go but didn't notice that PR #71 (merged into main while this branch was in flight) had just added a TestOpenSandboxCommandHelpers test specifically exercising those three helpers. After rebasing onto the new main the symbols are referenced from a test but defined nowhere — compile fails on every CI job that hits the runtime package. The helpers were intentionally removed (no caller now constructs a shell command from a $-bearing env value, so neither the auto-quote nor the $(...) rejection has anything to guard). Drop the orphaned test. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rgeEnv impls
The three runtimes' MergeEnv methods were byte-identical:
if len(env) == 0 { return }
if r.cfg.Env == nil { r.cfg.Env = make(map[string]string, len(env)) }
maps.Copy(r.cfg.Env, env)
Hoist the body into a package-level `mergeIntoEnvBaseline` helper in
runtime.go and shrink each runtime's MergeEnv to a one-line delegation.
The methods stay (the Runtime interface still demands them, and the
docstrings document each runtime's specific caveat — e.g. docker's
note that post-Create MergeEnv only affects subsequent `docker exec`
calls, not the long-running container entrypoint), but the behaviour
lives in exactly one place.
Reasoning for delegation over embedding: Go embedding would also work
(`type envBaseline struct{...}; MergeEnv(...)` embedded into each
concrete runtime), but that would require moving the env state out of
cfg.Env into a new field, threading initialization through every
construction site (including struct-literal `&NoneRuntime{cfg:...}`
patterns in tests), and renaming `r.cfg.Env` reads in every Exec
path. The free-helper form gives the same "one implementation"
property with zero changes to existing constructors or readers.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hods CI lint surfaced two classes of nits the local incremental lint missed: 1. gofumpt formatting on the trailing close-brace of claude_code_test.go, codex_test.go, and qodercli_test.go. Applied verbatim. 2. dupl flagged the three agent Install methods as near-duplicates of each other (49-68 lines pair-wise). They are intentionally similar: each agent's Install shares the same probe→merge→exec lifecycle and only varies in the per-agent probe constant and default install command builder, which are already pulled out into per-agent constants and functions. Extracting the orchestration into a BaseAgent helper would add an indirection layer that obscures the per-agent control flow without removing real complexity. Match the existing pattern (codex.go:232 / qodercli.go:71 on Run) and pin `//nolint:dupl` with the rationale on each Install. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
lbfsc
reviewed
May 27, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The agent layer was injecting a shell-expansion string —
$HOME/.local/bin:$HOME/.nvm/current/bin:$PATH— as the literal value ofthe
PATHkey inmap[string]stringenv tables. Process-level env varsare not shell-expressions, so every runtime had to compensate:
os.Expandagainstthe host environment to coincidentally produce the right PATH.
PATHvalues containing
$, stripped them from the literal env list, andprepended an
export PATH="…"line to the command so the innershell would expand them. Docker additionally rejected
$(…)toguard against command substitution firing inside that auto-generated
export.
Three structurally identical patches lived in three files (~23 lines of
defensive commentary in
docker.goalone), plusNoneRuntimebehaveddifferently from the other two for user-supplied
environment.envvalues.
What
Move PATH from the env-map layer to the command layer, where shell
expansion lives.
withAgentExecPath(cmd)helper prependsexport PATH="$HOME/.local/bin:$HOME/.nvm/current/bin:$PATH"to thecommand itself.
mergeExecOptionsEnvno longer injectsPATHintothe env map. Applied to all 8 agent Install / Run / Check call sites
(claude-code, codex, qodercli, generic CLI).
literally; no key gets special handling. The
pathPrefixbranch,the
$(…)rejection,shellDoubleQuote,literalRemoteEnv,withRemoteEnvExpansion, and the matching cfg-time PATH skip areall deleted.
mergeEnvno longer routes values throughexpandEnvMap/os.Expand. Values forward literally, matchingdocker and opensandbox. This is the user-visible breaking change
(see CHANGELOG).
Commits are atomic, in the order an audit will read them:
Safety / attack surface
The deleted
$(…)rejection indocker.goguarded an attack vectorthat only existed because the runtime was constructing
export PATH="<user-value>"from caller-supplied env. With env passedvia
--env KEY=VALUE/ sandboxEnvs(process-level env-var setting,no shell parsing), that vector no longer exists. The agent layer's
withAgentExecPathuses a compile-time constant (agentExecutablePath),never user input.
Breaking change
NoneRuntimeno longer expands\$VAR/\${VAR}inenvironment.envoropts.Envvalues. Configs that depended onhost-side expansion (e.g.
MY_VAR: \"\$HOME/foo\") should switch toliteral values or move the export into a
setup_stepsshell snippet.Documented in CHANGELOG.
Test plan
go build ./...go vet ./...go test ./...(all packages green)go test -tags=e2e ./e2e/... -run TestPipeline— end-to-end mock-engine path, which exercises the agent → runtime PATH prelude chainagent_test.go/claude_code_test.go/codex_test.go/qodercli_test.goflipped to negative: env map must NOT carry PATH, command must start with the export preludeTestDockerRuntime_ExecPassesEnvLiterallyconfirms PATH and\$(…)-containing values forward literallyTestNoneRuntime_ForwardsEnvLiterallyconfirms no host-side expansion🤖 Generated with Claude Code