Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 24 additions & 25 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Codex plugin for Claude Code

Use Codex from inside Claude Code for code reviews or to delegate tasks to Codex.
Use Codex from inside Claude Code as a second model family — to critique a design before you build it, adversarially review completed work, or delegate a task to Codex.

This plugin is for Claude Code users who want an easy way to start using Codex from the workflow
they already have.
Expand All @@ -9,9 +9,11 @@ they already have.

## What You Get

- `/codex:review` for a normal read-only Codex review
- `/codex:adversarial-review` for a steerable challenge review
- `/codex:rescue`, `/codex:status`, `/codex:result`, and `/codex:cancel` to delegate work and manage background jobs
Codex is a **second model family** you reach for in two situations, plus a workhorse for delegation:

- `/codex:critique` — hand it a **design, before you build it**. Codex reads the code *and* queries the live database, then attacks the design where the data doesn't back it up.
- `/codex:adversarial-review` — an adversarial review of **completed work**. The one review command: it challenges the approach, not just the diff.
- `/codex:rescue`, `/codex:status`, `/codex:result`, and `/codex:cancel` — delegate a task to Codex and manage background jobs.

## Requirements

Expand Down Expand Up @@ -62,50 +64,41 @@ If Codex is installed but not logged in yet, run:
After install, you should see:

- the slash commands listed below
- the `codex:codex-rescue` subagent in `/agents`
- the `codex:codex-rescue` and `codex:codex-critique` subagents in `/agents`

One simple first run is:

```bash
/codex:review --background
/codex:adversarial-review --background
/codex:status
/codex:result
```

## Usage

### `/codex:review`
### `/codex:critique`

Runs a normal Codex review on your current work. It gives you the same quality of code review as running `/review` inside Codex directly.
Hands Codex a **design — before you build it** — for an independent, second-family critique. Codex has full read access to the repository *and*, where a database or MCP tool is available, the live data. It verifies each load-bearing claim in the design against the code (`file:line`) and the data (the actual query and its result), and leads with where the design does *not* hold up.

> [!NOTE]
> Code review especially for multi-file changes might take a while. It's generally recommended to run it in the background.
Reach for this when Claude has produced or endorsed a design and you want a different model to attack it before it gets built — not to review a finished diff (that's [`/codex:adversarial-review`](#codexadversarial-review)) and not to run a task (that's [`/codex:rescue`](#codexrescue)).

Use it when you want:

- a review of your current uncommitted changes
- a review of your branch compared to a base branch like `main`

Use `--base <ref>` for branch review. It also supports `--wait` and `--background`. It is not steerable and does not take custom focus text. Use [`/codex:adversarial-review`](#codexadversarial-review) when you want to challenge a specific decision or risk area.

Examples:
Pass the design inline, or point at a design doc:

```bash
/codex:review
/codex:review --base main
/codex:review --background
/codex:critique docs/design/new-cache-layer.md
/codex:critique --focus "the n is tiny — does the data even support this split?" we plan to ...
```

This command is read-only and will not perform any changes. When run in the background you can use [`/codex:status`](#codexstatus) to check on the progress and [`/codex:cancel`](#codexcancel) to cancel the ongoing task.
It is read-only and returns the critique directly. Because it reads the whole repo and queries the database, it usually runs long, so by default it runs as a background subagent and notifies you when the critique is ready; add `--wait` to run it inline instead.

### `/codex:adversarial-review`

Runs a **steerable** review that questions the chosen implementation and design.

It can be used to pressure-test assumptions, tradeoffs, failure modes, and whether a different approach would have been safer or simpler.

It uses the same review target selection as `/codex:review`, including `--base <ref>` for branch review.
It also supports `--wait` and `--background`. Unlike `/codex:review`, it can take extra focus text after the flags.
It reviews your current uncommitted changes by default, or your branch against a base with `--base <ref>`.
It also supports `--wait` and `--background`, and takes extra focus text after the flags to steer what it scrutinizes.

Use it when you want:

Expand Down Expand Up @@ -223,10 +216,16 @@ When the review gate is enabled, the plugin uses a `Stop` hook to run a targeted

## Typical Flows

### Critique A Design Before Building

```bash
/codex:critique docs/design/new-cache-layer.md
```

### Review Before Shipping

```bash
/codex:review
/codex:adversarial-review
```

### Hand A Problem To Codex
Expand Down
2 changes: 1 addition & 1 deletion plugins/codex/.claude-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"name": "codex",
"version": "1.0.4",
"version": "1.0.5",
"description": "Use Codex from Claude Code to review code or delegate tasks.",
"author": {
"name": "OpenAI"
Expand Down
6 changes: 6 additions & 0 deletions plugins/codex/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
# Changelog

## 1.0.5

- fix(critique): `/codex:critique` now runs in the foreground and returns the critique directly. It previously detached the work into a background Codex job and returned an instant stub, so the subagent "finished" with nothing and never produced a real completion. Backgrounding is now purely a subagent-dispatch concern.
- The `codex-critique` agent always runs `critique` in the foreground, strips `--background`/`--wait` (Claude-side dispatch controls), and sets a max Bash timeout so long critiques aren't cut short.
- `handleCritique` parses but ignores `--background`/`--wait` (matching the review commands), removing the detach path and fixing `--wait` leaking into the design text.

## 1.0.0

- Initial version of the Codex plugin for Claude Code
35 changes: 35 additions & 0 deletions plugins/codex/agents/codex-critique.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
---
name: codex-critique
description: Use to hand a DESIGN (not a finished diff) to Codex for an independent, code- and data-grounded critique. Codex reads the repo AND queries the live database, then reports where the design is wrong, unjustified, or unsupported by what is actually there.
model: sonnet
tools: Bash
skills:
- codex-cli-runtime
---

You are a thin forwarding wrapper around the Codex companion `critique` runtime.

Your only job is to forward the user's design-critique request to the Codex companion script. Do not do anything else.

Selection guidance:

- Use this subagent when Claude has produced or endorsed a DESIGN and wants a second model family to attack it before it gets built — not for reviewing a finished diff (that is `adversarial-review`) and not for running a task (that is `codex-rescue`).
- The whole point is independence: Codex reads the code and queries the database itself, so do not pre-digest the design for it.

Forwarding rules:

- Use exactly one `Bash` call to invoke `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" critique ...`.
- A critique reads the whole repo and queries the database, so it usually runs long. Set the `Bash` call's `timeout` to `600000` (the maximum) so the run is not cut short.
- The `critique` run is READ-ONLY by design (a critique never edits). Never add `--write`.
- Always run `critique` in the foreground so the actual critique comes back as this subagent's result. Never add `--background`. Detaching would return an instant stub and orphan the real review, and this subagent is forbidden from polling or fetching it.
- `--background` and `--wait` are Claude-side dispatch controls, not `critique` flags. If the request contains either, strip it and do not forward it — backgrounding is the dispatcher's job. The subagent staying alive until Codex finishes is what makes its completion (and the returned critique) meaningful.
- Pass the user's design reference through unchanged: the design text itself as the positional argument, or `--prompt-file <path>` when they point at a design-doc file.
- If the user supplies `--focus "<text>"`, forward it as `--focus "<text>"` — it steers what the critique scrutinizes.
- Treat `--effort <value>` and `--model <value>` as runtime controls and forward them; do not include them in the design text. If the user asks for `spark`, map it to `--model gpt-5.3-codex-spark`.
- Do not reshape, summarize, or pre-analyze the design yourself. Do not inspect the repository, read files, grep, monitor progress, poll status, fetch results, or do any follow-up work of your own. The design-critique prompt and Codex do the analysis.
- Return the stdout of the `codex-companion` command exactly as-is.
- If the Bash call fails or Codex cannot be invoked, return nothing.

Response style:

- Do not add commentary before or after the forwarded `codex-companion` output.
4 changes: 2 additions & 2 deletions plugins/codex/commands/adversarial-review.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,10 +39,10 @@ Argument handling:
- Do not strip `--wait` or `--background` yourself.
- Do not weaken the adversarial framing or rewrite the user's focus text.
- The companion script parses `--wait` and `--background`, but Claude Code's `Bash(..., run_in_background: true)` is what actually detaches the run.
- `/codex:adversarial-review` uses the same review target selection as `/codex:review`.
- `/codex:adversarial-review` is the one review command for completed work.
- It supports working-tree review, branch review, and `--base <ref>`.
- It does not support `--scope staged` or `--scope unstaged`.
- Unlike `/codex:review`, it can still take extra focus text after the flags.
- It takes extra focus text after the flags to steer what it scrutinizes.

Foreground flow:
- Run:
Expand Down
23 changes: 23 additions & 0 deletions plugins/codex/commands/critique.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
---
description: Hand a DESIGN to Codex (a second model family) for an independent critique with full read access to the code AND the live database
argument-hint: "[--background|--wait] [--model <model|spark>] [--effort <none|minimal|low|medium|high|xhigh>] [--focus \"what to scrutinize\"] <design-doc path, or the design to critique>"
allowed-tools: Bash(node:*), Agent
---

Invoke the `codex:codex-critique` subagent via the `Agent` tool (`subagent_type: "codex:codex-critique"`), forwarding the raw user request as the prompt.
`codex:codex-critique` is a subagent, not a skill — do not call `Skill(codex:codex-critique)` (no such skill) or `Skill(codex:critique)` (that re-enters this command and hangs the session). The command runs inline so the `Agent` tool stays in scope; forked general-purpose subagents do not expose it.
The final user-visible response must be Codex's output verbatim.

This is for critiquing a DESIGN before it is built — Codex reads the codebase and queries the live database to check the design's claims. For reviewing finished work, use `/codex:adversarial-review`; for delegating a task, use `/codex:rescue`.

Raw user request:
$ARGUMENTS

Execution mode:

- If the request includes `--background`, run the `codex:codex-critique` subagent in the background.
- If the request includes `--wait`, run the `codex:codex-critique` subagent in the foreground.
- If neither flag is present, default to background — a design critique reads the whole repo and queries the database, and usually runs long.
- `--background` and `--wait` are execution flags for Claude Code. Do not forward them to `critique`, and do not treat them as part of the design text.
- `--model`, `--effort`, and `--focus` are runtime-selection flags. Preserve them for the forwarded `critique` call, but do not treat them as part of the design text.
- If the user did not supply a design, ask what design Codex should critique (a design-doc path or the design itself).
2 changes: 1 addition & 1 deletion plugins/codex/commands/result.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,4 @@ Present the full command output to the user. Do not summarize or condense it. Pr
- The complete result payload, including verdict, summary, findings, details, artifacts, and next steps
- File paths and line numbers exactly as reported
- Any error messages or parse errors
- Follow-up commands such as `/codex:status <id>` and `/codex:review`
- Follow-up commands such as `/codex:status <id>` and `/codex:adversarial-review`
61 changes: 0 additions & 61 deletions plugins/codex/commands/review.md

This file was deleted.

49 changes: 49 additions & 0 deletions plugins/codex/prompts/design-critique.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
<role>
You are Codex, a SECOND model family, critiquing a DESIGN — not reviewing a finished diff.
A different model (Claude) produced or endorsed this design. Your entire value is independence:
echoing its conclusions is worthless. Find where the design is wrong, unjustified, or unsupported by
what is actually in the codebase and the database.
</role>

<what_you_have>
You have full read access to the repository AND, where a database / MCP tool is available, the LIVE data.
USE BOTH. A design critique that only reasons about the prose is half a critique.
- Read the code the design touches and confirm the design's claims about what already exists.
- Query the database (e.g. `scripts/sql.sh "SELECT ..."` or the repo's MCP) to check the design's
assumptions about the DATA: counts, the n, nulls, distributions, whether a claimed relationship or
baseline is even present. A design that asserts something about the data is only as good as the query
that confirms it. Cite the query and its result in your finding.
</what_you_have>

<method>
For each load-bearing claim or decision in the design:
1. State the claim in one line.
2. Verify it against the code (`file:line`) and/or the data (the query + its result).
3. If it holds, say so in one line and move on. If it does NOT hold, that is your highest-value finding —
lead with it.
Hunt specifically for: assumptions the data does not support; a simpler design the existing code already
enables; hidden coupling or a confound the design ignores; an estimand / metric that means something
different than the design assumes; scope the design under- or over-reaches; a failure mode at the
empty-state, the n=small case, or under real data skew.
</method>

<stance>
Default to skepticism. Do not credit good intent, partial coverage, or likely follow-up. Ground every
point in a row, a query result, or a `file:line` — never a vibe. Prefer one well-evidenced structural
objection over many shallow ones. If, after you actually checked the code AND the data, the design holds,
say so plainly and state exactly what you verified (the files you read, the queries you ran).
</stance>

<output>
Lead with a one-line verdict: does this design hold up, or not, and the single biggest reason. Then the
findings, most-load-bearing first, each with: the claim, what you checked (file:line and/or query+result),
and what the design should change. Be terse. Surface what is missing, not just what is wrong.
</output>

<design>
{{DESIGN_INPUT}}
</design>

<focus>
{{USER_FOCUS}}
</focus>
16 changes: 16 additions & 0 deletions plugins/codex/prompts/overlays/adversarial-review.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
House review rubric — apply ON TOP of the adversarial stance above. This is a second-family
code review of completed work; the author and the first model (Claude) already believe it is right.

- You are a SECOND model family. Your value is to CONTRADICT, not echo. Hunt for what the author and
the first model would rubber-stamp. If you only agree, you added nothing — find the thing they missed.
- Order findings by what bites: correctness and regressions first, then reuse / simplification /
efficiency / altitude. Style or naming only when it causes a real bug.
- Every finding cites `file:line` and is defensible from the actual code or a tool output. Never invent
a path, a line, or runtime behavior you cannot support.
- Severity-rank, and prefer one strong, real finding over a pile of weak ones. If it is genuinely safe
after you tried to break it, say so plainly and return no findings — do not manufacture filler.
- Verification expectation: a claim of "works / passing / fixed" must be backed by command output. Flag
any change that edits code but does not run the repo's own check (e.g. `scripts/check.sh`) or whose
tests do not actually exercise the changed behavior.
- Scope discipline: changes beyond what the task asked for (bonus refactors, "while I'm here" edits) are
a finding, not a courtesy.
Loading