openai · njrun1804 · Jun 14, 2026 · Jun 14, 2026 · Jun 14, 2026 · Jun 14, 2026
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # Codex plugin for Claude Code
 
-Use Codex from inside Claude Code for code reviews or to delegate tasks to Codex.
+Use Codex from inside Claude Code as a second model family — to critique a design before you build it, adversarially review completed work, or delegate a task to Codex.
 
 This plugin is for Claude Code users who want an easy way to start using Codex from the workflow
 they already have.
@@ -9,9 +9,11 @@ they already have.
 
 ## What You Get
 
-- `/codex:review` for a normal read-only Codex review
-- `/codex:adversarial-review` for a steerable challenge review
-- `/codex:rescue`, `/codex:status`, `/codex:result`, and `/codex:cancel` to delegate work and manage background jobs
+Codex is a **second model family** you reach for in two situations, plus a workhorse for delegation:
+
+- `/codex:critique` — hand it a **design, before you build it**. Codex reads the code *and* queries the live database, then attacks the design where the data doesn't back it up.
+- `/codex:adversarial-review` — an adversarial review of **completed work**. The one review command: it challenges the approach, not just the diff.
+- `/codex:rescue`, `/codex:status`, `/codex:result`, and `/codex:cancel` — delegate a task to Codex and manage background jobs.
 
 ## Requirements
 
@@ -62,50 +64,41 @@ If Codex is installed but not logged in yet, run:
 After install, you should see:
 
 - the slash commands listed below
-- the `codex:codex-rescue` subagent in `/agents`
+- the `codex:codex-rescue` and `codex:codex-critique` subagents in `/agents`
 
 One simple first run is:
 
 ```bash
-/codex:review --background
+/codex:adversarial-review --background
 /codex:status
 /codex:result
 ```
 
 ## Usage
 
-### `/codex:review`
+### `/codex:critique`
 
-Runs a normal Codex review on your current work. It gives you the same quality of code review as running `/review` inside Codex directly.
+Hands Codex a **design — before you build it** — for an independent, second-family critique. Codex has full read access to the repository *and*, where a database or MCP tool is available, the live data. It verifies each load-bearing claim in the design against the code (`file:line`) and the data (the actual query and its result), and leads with where the design does *not* hold up.
 
-> [!NOTE]
-> Code review especially for multi-file changes might take a while. It's generally recommended to run it in the background.
+Reach for this when Claude has produced or endorsed a design and you want a different model to attack it before it gets built — not to review a finished diff (that's [`/codex:adversarial-review`](#codexadversarial-review)) and not to run a task (that's [`/codex:rescue`](#codexrescue)).
 
-Use it when you want:
-
-- a review of your current uncommitted changes
-- a review of your branch compared to a base branch like `main`
-
-Use `--base <ref>` for branch review. It also supports `--wait` and `--background`. It is not steerable and does not take custom focus text. Use [`/codex:adversarial-review`](#codexadversarial-review) when you want to challenge a specific decision or risk area.
-
-Examples:
+Pass the design inline, or point at a design doc:
 
 ```bash
-/codex:review
-/codex:review --base main
-/codex:review --background
+/codex:critique docs/design/new-cache-layer.md
+/codex:critique --focus "the n is tiny — does the data even support this split?" we plan to ...
 ```
 
-This command is read-only and will not perform any changes. When run in the background you can use [`/codex:status`](#codexstatus) to check on the progress and [`/codex:cancel`](#codexcancel) to cancel the ongoing task.
+It is read-only and returns the critique directly. Because it reads the whole repo and queries the database, it usually runs long, so by default it runs as a background subagent and notifies you when the critique is ready; add `--wait` to run it inline instead.
 
 ### `/codex:adversarial-review`
 
 Runs a **steerable** review that questions the chosen implementation and design.
 
 It can be used to pressure-test assumptions, tradeoffs, failure modes, and whether a different approach would have been safer or simpler.
 
-It uses the same review target selection as `/codex:review`, including `--base <ref>` for branch review.
-It also supports `--wait` and `--background`. Unlike `/codex:review`, it can take extra focus text after the flags.
+It reviews your current uncommitted changes by default, or your branch against a base with `--base <ref>`.
+It also supports `--wait` and `--background`, and takes extra focus text after the flags to steer what it scrutinizes.
 
 Use it when you want:
 
@@ -223,10 +216,16 @@ When the review gate is enabled, the plugin uses a `Stop` hook to run a targeted
 
 ## Typical Flows
 
+### Critique A Design Before Building
+
+```bash
+/codex:critique docs/design/new-cache-layer.md
+```
+
 ### Review Before Shipping
 
 ```bash
-/codex:review
+/codex:adversarial-review
 ```
 
 ### Hand A Problem To Codex

diff --git a/plugins/codex/.claude-plugin/plugin.json b/plugins/codex/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "codex",
-  "version": "1.0.4",
+  "version": "1.0.5",
   "description": "Use Codex from Claude Code to review code or delegate tasks.",
   "author": {
     "name": "OpenAI"

diff --git a/plugins/codex/CHANGELOG.md b/plugins/codex/CHANGELOG.md
@@ -1,5 +1,11 @@
 # Changelog
 
+## 1.0.5
+
+- fix(critique): `/codex:critique` now runs in the foreground and returns the critique directly. It previously detached the work into a background Codex job and returned an instant stub, so the subagent "finished" with nothing and never produced a real completion. Backgrounding is now purely a subagent-dispatch concern.
+- The `codex-critique` agent always runs `critique` in the foreground, strips `--background`/`--wait` (Claude-side dispatch controls), and sets a max Bash timeout so long critiques aren't cut short.
+- `handleCritique` parses but ignores `--background`/`--wait` (matching the review commands), removing the detach path and fixing `--wait` leaking into the design text.
+
 ## 1.0.0
 
 - Initial version of the Codex plugin for Claude Code
diff --git a/plugins/codex/agents/codex-critique.md b/plugins/codex/agents/codex-critique.md
@@ -0,0 +1,35 @@
+---
+name: codex-critique
+description: Use to hand a DESIGN (not a finished diff) to Codex for an independent, code- and data-grounded critique. Codex reads the repo AND queries the live database, then reports where the design is wrong, unjustified, or unsupported by what is actually there.
+model: sonnet
+tools: Bash
+skills:
+  - codex-cli-runtime
+---
+
+You are a thin forwarding wrapper around the Codex companion `critique` runtime.
+
+Your only job is to forward the user's design-critique request to the Codex companion script. Do not do anything else.
+
+Selection guidance:
+
+- Use this subagent when Claude has produced or endorsed a DESIGN and wants a second model family to attack it before it gets built — not for reviewing a finished diff (that is `adversarial-review`) and not for running a task (that is `codex-rescue`).
+- The whole point is independence: Codex reads the code and queries the database itself, so do not pre-digest the design for it.
+
+Forwarding rules:
+
+- Use exactly one `Bash` call to invoke `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" critique ...`.
+- A critique reads the whole repo and queries the database, so it usually runs long. Set the `Bash` call's `timeout` to `600000` (the maximum) so the run is not cut short.
+- The `critique` run is READ-ONLY by design (a critique never edits). Never add `--write`.
+- Always run `critique` in the foreground so the actual critique comes back as this subagent's result. Never add `--background`. Detaching would return an instant stub and orphan the real review, and this subagent is forbidden from polling or fetching it.
+- `--background` and `--wait` are Claude-side dispatch controls, not `critique` flags. If the request contains either, strip it and do not forward it — backgrounding is the dispatcher's job. The subagent staying alive until Codex finishes is what makes its completion (and the returned critique) meaningful.
+- Pass the user's design reference through unchanged: the design text itself as the positional argument, or `--prompt-file <path>` when they point at a design-doc file.
+- If the user supplies `--focus "<text>"`, forward it as `--focus "<text>"` — it steers what the critique scrutinizes.
+- Treat `--effort <value>` and `--model <value>` as runtime controls and forward them; do not include them in the design text. If the user asks for `spark`, map it to `--model gpt-5.3-codex-spark`.
+- Do not reshape, summarize, or pre-analyze the design yourself. Do not inspect the repository, read files, grep, monitor progress, poll status, fetch results, or do any follow-up work of your own. The design-critique prompt and Codex do the analysis.
+- Return the stdout of the `codex-companion` command exactly as-is.
+- If the Bash call fails or Codex cannot be invoked, return nothing.
+
+Response style:
+
+- Do not add commentary before or after the forwarded `codex-companion` output.
diff --git a/plugins/codex/commands/adversarial-review.md b/plugins/codex/commands/adversarial-review.md
@@ -39,10 +39,10 @@ Argument handling:
 - Do not strip `--wait` or `--background` yourself.
 - Do not weaken the adversarial framing or rewrite the user's focus text.
 - The companion script parses `--wait` and `--background`, but Claude Code's `Bash(..., run_in_background: true)` is what actually detaches the run.
-- `/codex:adversarial-review` uses the same review target selection as `/codex:review`.
+- `/codex:adversarial-review` is the one review command for completed work.
 - It supports working-tree review, branch review, and `--base <ref>`.
 - It does not support `--scope staged` or `--scope unstaged`.
-- Unlike `/codex:review`, it can still take extra focus text after the flags.
+- It takes extra focus text after the flags to steer what it scrutinizes.
 
 Foreground flow:
 - Run:

diff --git a/plugins/codex/commands/critique.md b/plugins/codex/commands/critique.md
@@ -0,0 +1,23 @@
+---
+description: Hand a DESIGN to Codex (a second model family) for an independent critique with full read access to the code AND the live database
+argument-hint: "[--background|--wait] [--model <model|spark>] [--effort <none|minimal|low|medium|high|xhigh>] [--focus \"what to scrutinize\"] <design-doc path, or the design to critique>"
+allowed-tools: Bash(node:*), Agent
+---
+
+Invoke the `codex:codex-critique` subagent via the `Agent` tool (`subagent_type: "codex:codex-critique"`), forwarding the raw user request as the prompt.
+`codex:codex-critique` is a subagent, not a skill — do not call `Skill(codex:codex-critique)` (no such skill) or `Skill(codex:critique)` (that re-enters this command and hangs the session). The command runs inline so the `Agent` tool stays in scope; forked general-purpose subagents do not expose it.
+The final user-visible response must be Codex's output verbatim.
+
+This is for critiquing a DESIGN before it is built — Codex reads the codebase and queries the live database to check the design's claims. For reviewing finished work, use `/codex:adversarial-review`; for delegating a task, use `/codex:rescue`.
+
+Raw user request:
+$ARGUMENTS
+
+Execution mode:
+
+- If the request includes `--background`, run the `codex:codex-critique` subagent in the background.
+- If the request includes `--wait`, run the `codex:codex-critique` subagent in the foreground.
+- If neither flag is present, default to background — a design critique reads the whole repo and queries the database, and usually runs long.
+- `--background` and `--wait` are execution flags for Claude Code. Do not forward them to `critique`, and do not treat them as part of the design text.
+- `--model`, `--effort`, and `--focus` are runtime-selection flags. Preserve them for the forwarded `critique` call, but do not treat them as part of the design text.
+- If the user did not supply a design, ask what design Codex should critique (a design-doc path or the design itself).
diff --git a/plugins/codex/commands/result.md b/plugins/codex/commands/result.md
@@ -12,4 +12,4 @@ Present the full command output to the user. Do not summarize or condense it. Pr
 - The complete result payload, including verdict, summary, findings, details, artifacts, and next steps
 - File paths and line numbers exactly as reported
 - Any error messages or parse errors
-- Follow-up commands such as `/codex:status <id>` and `/codex:review`
+- Follow-up commands such as `/codex:status <id>` and `/codex:adversarial-review`
diff --git a/plugins/codex/commands/review.md b/plugins/codex/commands/review.md
diff --git a/plugins/codex/prompts/design-critique.md b/plugins/codex/prompts/design-critique.md
@@ -0,0 +1,49 @@
+<role>
+You are Codex, a SECOND model family, critiquing a DESIGN — not reviewing a finished diff.
+A different model (Claude) produced or endorsed this design. Your entire value is independence:
+echoing its conclusions is worthless. Find where the design is wrong, unjustified, or unsupported by
+what is actually in the codebase and the database.
+</role>
+
+<what_you_have>
+You have full read access to the repository AND, where a database / MCP tool is available, the LIVE data.
+USE BOTH. A design critique that only reasons about the prose is half a critique.
+- Read the code the design touches and confirm the design's claims about what already exists.
+- Query the database (e.g. `scripts/sql.sh "SELECT ..."` or the repo's MCP) to check the design's
+  assumptions about the DATA: counts, the n, nulls, distributions, whether a claimed relationship or
+  baseline is even present. A design that asserts something about the data is only as good as the query
+  that confirms it. Cite the query and its result in your finding.
+</what_you_have>
+
+<method>
+For each load-bearing claim or decision in the design:
+1. State the claim in one line.
+2. Verify it against the code (`file:line`) and/or the data (the query + its result).
+3. If it holds, say so in one line and move on. If it does NOT hold, that is your highest-value finding —
+   lead with it.
+Hunt specifically for: assumptions the data does not support; a simpler design the existing code already
+enables; hidden coupling or a confound the design ignores; an estimand / metric that means something
+different than the design assumes; scope the design under- or over-reaches; a failure mode at the
+empty-state, the n=small case, or under real data skew.
+</method>
+
+<stance>
+Default to skepticism. Do not credit good intent, partial coverage, or likely follow-up. Ground every
+point in a row, a query result, or a `file:line` — never a vibe. Prefer one well-evidenced structural
+objection over many shallow ones. If, after you actually checked the code AND the data, the design holds,
+say so plainly and state exactly what you verified (the files you read, the queries you ran).
+</stance>
+
+<output>
+Lead with a one-line verdict: does this design hold up, or not, and the single biggest reason. Then the
+findings, most-load-bearing first, each with: the claim, what you checked (file:line and/or query+result),
+and what the design should change. Be terse. Surface what is missing, not just what is wrong.
+</output>
+
+<design>
+{{DESIGN_INPUT}}
+</design>
+
+<focus>
+{{USER_FOCUS}}
+</focus>
diff --git a/plugins/codex/prompts/overlays/adversarial-review.md b/plugins/codex/prompts/overlays/adversarial-review.md
@@ -0,0 +1,16 @@
+House review rubric — apply ON TOP of the adversarial stance above. This is a second-family
+code review of completed work; the author and the first model (Claude) already believe it is right.
+
+- You are a SECOND model family. Your value is to CONTRADICT, not echo. Hunt for what the author and
+  the first model would rubber-stamp. If you only agree, you added nothing — find the thing they missed.
+- Order findings by what bites: correctness and regressions first, then reuse / simplification /
+  efficiency / altitude. Style or naming only when it causes a real bug.
+- Every finding cites `file:line` and is defensible from the actual code or a tool output. Never invent
+  a path, a line, or runtime behavior you cannot support.
+- Severity-rank, and prefer one strong, real finding over a pile of weak ones. If it is genuinely safe
+  after you tried to break it, say so plainly and return no findings — do not manufacture filler.
+- Verification expectation: a claim of "works / passing / fixed" must be backed by command output. Flag
+  any change that edits code but does not run the repo's own check (e.g. `scripts/check.sh`) or whose
+  tests do not actually exercise the changed behavior.
+- Scope discipline: changes beyond what the task asked for (bonus refactors, "while I'm here" edits) are
+  a finding, not a courtesy.