Skip to content

feat: add Custom Engine with local transport#24

Open
zpzjzj wants to merge 3 commits into
mainfrom
feat/custom-engine-local
Open

feat: add Custom Engine with local transport#24
zpzjzj wants to merge 3 commits into
mainfrom
feat/custom-engine-local

Conversation

@zpzjzj
Copy link
Copy Markdown
Collaborator

@zpzjzj zpzjzj commented May 19, 2026

Summary

Adds a Custom Engine mechanism so users can plug in their own agent executors purely through engine.custom in eval.yaml, without modifying skill-up. Previously any engine.name not matching a built-in agent (claude_code, codex, qodercli) failed with unsupported agent.

This is Phase 1 — the local transport: skill-eval runs a user-provided command inside the current runtime via runtime.Exec, hands it a standard SessionInput, and parses a standard SessionResult. The http transport is fully designed (see docs/design/custom-engine.md) and its config schema is parsed/validated, but selecting it returns an explicit "not yet implemented" error — HTTP lands in a follow-up PR.

Related issues

Changes

  • docs: docs/design/custom-engine.md — full design (local + HTTP), config schema, SessionInput/SessionResult contracts, env/template variables, artifact rules.
  • config: CustomEngineConfig schema (engine.custom); load-time resolution of ${VAR} / ${VAR:-default} / ${VAR?msg} env references over the engine.custom tree (built-in template vars left intact for run time); validation of transport and required fields.
  • agent: CustomAgent — builds SessionInput, renders template variables, executes the command via runtime.Exec, parses SessionResult from stdout or output_file, supports session_result and text response formats, and maps structured artifacts back. SessionArtifacts gains a structured files field.
  • factory: non-built-in engine names with an engine.custom block dispatch to CustomAgent; otherwise the error is unsupported agent "x": missing engine.custom.
  • wiring: the custom config and per-case metadata (case_id, variant, max_turns, timeout) are threaded through credentialcliruntime.ExecOptionsevaluator into the agent.

ResolveRunnerInitParams now takes config.EngineConfig instead of config.ModelConfig so it can carry the custom block.

Test plan

  • make test passes (go test ./... — all 13 packages green)
  • make verify passes (fmt + vet + golangci-lint — 0 issues)
  • Manual testing steps:
    • Ran a real custom local engine (a shell script emitting a SessionResult) through skill-eval run — the case executed and passed, final_message flowed into the report.
    • Verified a missing ${VAR} env reference fails at config load with a clear error.
    • Unit tests cover env-ref forms, validation, local stdout/output_file/text/non-zero-exit/unparseable/missing-exit_code, HTTP not-implemented, template rendering, and factory dispatch.

Notes for reviewers

  • Built-in engine names are duplicated as a small constant set in internal/config/validator.go (with an inline comment) to avoid a configagent import cycle, since agent now imports config for CustomEngineConfig.
  • transport: http is intentionally a no-op stub in this PR — config is parsed and validated, but Run returns "not yet implemented".
  • Pre-existing test TestResolveJudgeInitParams_FallsBackToRunnerBaseURLBeforeCredentialFallback is environment-sensitive (fails when ANTHROPIC_BASE_URL is set in the ambient env); unrelated to this change.

🤖 Generated with Claude Code

@zpzjzj zpzjzj requested a review from hittyt as a code owner May 19, 2026 02:17
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 477d2a1a82

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/agent/custom.go Outdated
Comment thread internal/config/loader.go Outdated
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e1950d882

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go Outdated
Comment thread internal/config/validator.go Outdated
Comment thread internal/agent/custom.go
@zpzjzj zpzjzj requested review from jwx0925, lbfsc and lijunfeng722 May 19, 2026 02:57
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 469dd61d12

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67ba0076b3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go Fixed
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc1421f0dc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go Outdated
Comment thread internal/runtime/none.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef5174722c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/agent/custom.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f583aa05df

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/agent/factory.go
Comment thread internal/agent/factory.go
Comment thread internal/agent/custom.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 189a278edf

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/config/loader.go Outdated
Comment thread internal/config/customengine.go
Comment thread internal/agent/custom.go Outdated
Comment thread internal/credential/agent_init.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0974c1d479

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/cli/run.go
Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go
Comment thread internal/credential/agent_init.go Outdated
@zpzjzj zpzjzj self-assigned this May 19, 2026
@zpzjzj zpzjzj added the enhancement New feature or request label May 19, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7347ec23ba

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/agent/custom.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 899560be8e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/config/customengine.go
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 515054cafa

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/agent/custom.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 201dd2fbbc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/agent/custom.go Outdated
Comment thread internal/config/customengine.go
Comment thread internal/agent/custom.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 978997c1bd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/config/customengine.go
Comment thread internal/config/customengine.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c6d3972627

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/config/customengine.go Outdated
Comment thread internal/config/customengine.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 546ca961e4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/agent/custom.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e282bffb2b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/agent.go
Comment thread internal/config/validator.go Outdated
Copy link
Copy Markdown
Collaborator Author

@zpzjzj zpzjzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

代码评审报告(自动)

项目 内容
评审人 @zpzjzj (via Claude Code github-code-review skill)
评审分支 feat/custom-engine-localmain
HEAD 8984f3a6
规模 27 files, +3925 / -31

问题汇总

  • 🔴 Blocker: 1
  • 🟠 Major: 14
  • 🔵 Info: 11

整体评价

❌ 不建议合并 —— 存在 1 个 Blocker(output_file / input_file 路径穿越,可让框架在 clearStaleOutputFile 阶段对 host 任意文件 os.Remove)。需修复后才能进入下一轮评审。

CI 维度评审时 8 个 CI run / 3 个 CodeQL run 仍处 QUEUED,未给出失败信号;待 CI 完成后请复核。license/cla 已通过。

关键改进方向

  1. 安全护栏:完成 workspacePath 的边界校验是合并的硬门槛;同时建议把 containsAPIKey 扩展为「所有已知凭据字符串」的 mask 器,覆盖 OTel span / stderr 日志 / 进程列表三条泄漏路径。
  2. transport 抽象前置:在 Phase 1 落地 customTransport 接口,避免 http transport PR 大改 CustomAgent 主体;同时把 sessionInput 升级为公开契约 SessionInput
  3. OOM 收口:output_file 读取链路(custom.go:340)和 NoneRuntime.Execbytes.Buffer(runtime/none.go:196)都缺上限;并发评测时是真实风险,建议同步引入 MaxStdout/StderrBytes
  4. 架构债CustomAgent 嵌入 CLIAgent、builtin engine 常量在 config/agent 复制、ArtifactFile 「exactly one of」无校验、Check 永远返回 nil —— 趁 Phase 1 一起收掉成本最低。
  5. 用户向文档:README / README.zh / CHANGELOG / docs/guide/writing-evals.md / docs/index.md 完全没动,这能力对外发现度近 0。

已具备的防护(无需更改)

  • shellQuote 单引号转义到位
  • 配置加载期 strict 模式拒绝在 local.command/args/cwd/input_file/output_file 中引用 secret-name env 或 ${api_key} / ${kwargs} 聚合变量
  • 迭代式 env 展开防止 wrapper-var 走私
  • containsAPIKey 在 args/command 中拦截 Cfg.APIKey 字面量
  • clearStaleOutputFile 只在显式配置 output_file 时触发
  • http transport 在 validator 阶段被拒绝
  • 进程组 + WaitDelay 让 timeout 能真正生效

行评论无法挂载的 Major(PR 未触碰原行)

  • 🟠 OTel span 泄漏 + bash -c 进程列表暴露internal/runtime/none.go:174,189。问题是已存在代码,但 CustomAgent 把"用户提供的任意命令/env"变成主要入口,secret 泄漏面被显著放大。建议把 OTel process.command 字符串与 stderr 落日志(none.go:215)一并过 mask 处理。
  • 🟠 NoneRuntime.Exec stdout/stderr 无上限internal/runtime/none.go:196-205。同样旧代码,但 CustomAgent 让脚本完全用户可控,OOM 风险首次具备触发条件。

Major(已发行评论)

# 文件:行 问题
1 custom.go:41 嵌入 CLIAgent 但 override 大多方法,脆弱继承
2 custom.go:67 Check 永远返回 nil,validate 裸奔
3 custom.go:79 transport 分发用 switch 而非策略接口
4 custom.go:113 sessionInput 被双重 JSON 序列化
5 custom.go:223 finishLocal 控制流分裂、参数臃肿
6 custom.go:340 output_file 全量加载 + 多余落盘往返
7 custom.go:588 sessionInput 应升级为公开契约
8 agent.go:52 ArtifactFile 「exactly one of」无运行时校验
9 validator.go:32 builtin engine 常量在 config / agent 重复
10 custom.go:692 🔴 Blocker:workspacePath 路径穿越

Info 级建议(汇总)

代码质量:

  • containsAPIKey 用裸子串匹配,短 API key 易误报;建议加最小长度门槛(custom.go:564)
  • readRawResultoutputFile == "" 是死分支(custom.go:341);DownloadFile 失败 vs ReadFile 失败的 produced 语义不一致
  • 常量名 customResponseSessionJSON 值是 "session_result",命名不符(custom.go:25),建议改 customResponseSessionResult
  • 模板解析(${X} / ${X:-default} / ${X?msg})在 custom.go:756-818customengine.go:348-429 几乎重复实现;建议抽 internal/template 子包

安全:

  • renderTemplate 在 vars 命中但值为空时落回 os.Getenv(custom.go:801),绕过 strict 模式;建议同样对命中名做 isSensitiveEnvName 校验或直接去掉 fallback
  • stderr 原文以 WARN 落日志(none.go:215);建议长度截断 + 凭据 mask,或仅 ExitCode≠0 时打印
  • custom.Env key 未做危险变量过滤(custom.go:212);建议 deny-list LD_PRELOAD / DYLD_INSERT_LIBRARIES / BASH_ENV / PYTHONPATH

架构:

  • ResolveRunnerInitParams 取整个 EngineConfig 过宽(credential/agent_init.go:74),内部只读 Model + Custom;建议改显式参数
  • http transport「not yet implemented」文案在 custom.go:83validator.go:221 双源;建议抽常量
  • 设计文档第 96 行处建议补一句「未配置 output_file 时结果取自 stdout,默认路径不被读取」

文档:

  • CHANGELOG.md 完全未更新(Unreleased / Added 段应记 Custom Engine + local transport)
  • README.md / README.zh.md Features 区仍只列三个内置引擎
  • docs/guide/writing-evals.md Section 5 没有 engine.custom 示例 / link
  • docs/index.md features 区未露出 Custom Engine
  • 设计文档字段表(docs/design/custom-engine.md:196-200)未在 custom.http.* 行内标注「Phase 2; rejected today」
  • custom.http.method 描述「First version only supports POST」措辞陈旧
  • 缺 quickstart / minimal 完整示例的用户向文档(仅 e2e/testdata/custom-engine/ fixture,文档未引用)
  • CustomEngineConfig 字段级 godoc 较薄(schema.go:106-140)
  • 设计文档错误分类(728-740)未列对应 sentinel 常量名

说明:本评审通过 6 个并行子 Agent 完成代码质量、安全性、性能、CI、架构、文档六个维度的检查。如对某条结论有异议或想深入讨论,欢迎在对应行评论下回复。

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1fb11ba4ce

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 406660c075

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go Outdated
Comment thread internal/agent/custom.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aba832cb71

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/config/customengine.go Outdated
Comment thread internal/runtime/none.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: de0a540919

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go
Comment thread internal/agent/custom.go
Comment thread internal/config/customengine.go Outdated
zpzjzj and others added 2 commits May 26, 2026 17:24
Adds a Custom Engine mechanism so users can plug in their own agent
executors purely through `engine.custom` in `eval.yaml`, without
modifying `skill-up`. Previously any `engine.name` not matching a
built-in agent (`claude_code`, `codex`, `qodercli`) failed with
`unsupported agent`.

This is Phase 1 — the `local` transport: `skill-eval` runs a
user-provided command inside the current runtime via `runtime.Exec`,
hands it a standard `SessionInput`, and parses a standard
`SessionResult`. The `http` transport is fully designed (see
`docs/design/custom-engine.md`) and its config schema is
parsed/validated, but selecting it returns an explicit "not yet
implemented" error — HTTP lands in a follow-up PR.

Changes:

- docs: `docs/design/custom-engine.md` — full design (local + HTTP),
  config schema, `SessionInput`/`SessionResult` contracts, env/template
  variables, artifact rules.
- config: `CustomEngineConfig` schema (`engine.custom`); load-time
  resolution of `${VAR}` / `${VAR:-default}` / `${VAR?msg}` env
  references over the `engine.custom` tree (built-in template vars
  left intact for run time); validation of transport and required
  fields; strict mode rejects secret-shaped env names and credential
  literals from reaching a command line.
- agent: `CustomAgent` — builds `SessionInput`, renders template
  variables, executes the command via `runtime.Exec`, parses
  `SessionResult` from stdout or `output_file`, supports
  `session_result` and `text` response formats, and maps structured
  artifacts back. `SessionArtifacts` gains a structured `files` field.
  Path I/O is confined to the runtime workspace through `workspacePath`
  which resolves symlinks on every call to close the TOCTOU window;
  inline artifact size is capped per-file (50 MiB) and per-result
  (200 MiB).
- factory: non-built-in engine names with an `engine.custom` block
  dispatch to `CustomAgent`; otherwise the error is
  `unsupported agent "x": missing engine.custom`. Built-in engine
  names are listed in `internal/agentkind` as a single source of truth
  shared by the agent factory and config validator.
- wiring: the custom config and per-case metadata (`case_id`,
  `variant`, `max_turns`, timeout) are threaded through `credential`
  → `cli` → `runtime.ExecOptions` → `evaluator` into the agent.
  `ResolveRunnerInitParams` now takes `config.EngineConfig` instead of
  `config.ModelConfig` so it can carry the custom block.
- runtime/none: process-group + `WaitDelay` so a timed-out command's
  descendants are reliably killed and a clean exit whose pipes
  outlive the parent is not mis-reported as a failure.

Security hardening: API key and `engine.custom.env` values are masked
in `FinalMessage`, `Stderr`, and `Transcript` content before they
enter reports or judges; the rendered command line is rejected if it
embeds the configured API key; the custom timeout is clamped to the
outer case deadline so the value advertised to the agent matches the
real wall-clock budget.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…sport

Two narrow follow-ups surfaced by a fresh self-review against the
squashed PR. None are field-reported bugs; both are defense-in-depth
on the masking and API-misuse boundaries introduced by this PR.

1. internal/agent/custom.go — mask parsed.Engine / parsed.Model.

   `parseSessionResult` was already passing the other engine-controlled
   string fields (FinalMessage, Stderr, Transcript content via
   maskTranscript) through maskAPIKey before they reach reports,
   judges, and OTel span attributes. The Engine and Model fields were
   the only ones taken verbatim:

       Engine:       firstNonEmpty(parsed.Engine, a.Name()),
       Model:        firstNonEmpty(parsed.Model, ...),

   A custom engine that echoes a credential into its self-identification
   fields (e.g. a misconfigured wrapper that builds the engine name
   from a banner string, or a hostile engine returning
   `Model: "anthropic/sk-realkey..."`) would have bypassed the masking
   discipline applied elsewhere in the same struct literal. Route both
   through maskAPIKey to close the gap.

2. internal/config/validator.go — doc Validator.ValidateAll's split
   with ResolveCustomEngineConfig.

   The split exists for a real reason (the final engine name is only
   known after CLI overrides like --engine settle, and engine.custom
   must only be enforced for non-built-in engines), but the function
   name `ValidateAll` invites future API consumers to assume it
   validates everything. Today cli/run.go and cli/validate.go both
   call ResolveCustomEngineConfig after ValidateAll; document that
   contract on ValidateAll itself so a new caller does not silently
   skip engine.custom validation.

A third self-review finding (re-order ResolveCustomEngineConfig so a
`transport: http` config fails fast with "not yet implemented" rather
than a confusing env-missing diagnostic on http.url) was deliberately
not applied: removing the resolveHTTPEnv call site leaves the
function unused (lint failure under revive's strict rules), and
keeping it alive with //nolint trades one minor UX wart for code
churn that Phase 2 will need to undo. Leaving as documented Info.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@zpzjzj zpzjzj force-pushed the feat/custom-engine-local branch from 9e9ccfa to 989c115 Compare May 26, 2026 09:30
# Conflicts:
#	docs/guide/writing-evals.md
#	internal/agent/agent.go
#	internal/agent/factory.go
#	internal/config/schema.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants