Task system, agent interface, and planning workflow redesign#29
FuZhiyu wants to merge 909 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 708a571f79
    input_files = input_files or []
    output_files = output_files or []

    task_dir = plan_root / task_path
Reject task paths that escape the plan root
create_task builds task_dir via plan_root / task_path without normalizing or checking containment, so inputs like --path ../escaped create and write task.md outside the plan tree. This breaks the task-system boundary and allows accidental or malicious writes to arbitrary sibling directories from normal CLI usage; the same safety invariant should be enforced here (e.g., resolve + verify under plan_root) before any filesystem mutation.
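The "resolve + verify under plan_root" check suggested above could look roughly like the following. This is a minimal sketch, not the repository's actual fix: the helper name `resolve_task_dir` is hypothetical, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_task_dir(plan_root: Path, task_path: str) -> Path:
    """Return the task directory, refusing paths that leave plan_root.

    Hypothetical helper for illustration; the real create_task may differ.
    """
    root = plan_root.resolve()
    # Joining with an absolute task_path discards root, so the containment
    # check below also rejects absolute paths.
    candidate = (root / task_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"task path escapes the plan root: {task_path!r}")
    return candidate


if __name__ == "__main__":
    root = Path("plan_root_example")  # assumed location, need not exist
    print(resolve_task_dir(root, "task-system/new-task"))
    try:
        resolve_task_dir(root, "../escaped")
    except ValueError as exc:
        print("rejected:", exc)
```

Running the check before any `mkdir` or write keeps a rejected path from leaving partial state on disk.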
…note Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sync PR #29 follow-up base. Incoming range 75a86cc..876178e carries:
- b36d563: not-started plans for path-containment + nested review-planning-protocol
- 876178e: dynamic-workflows task move (deletions) + tree-cleanup edit
This branch already implemented and approved path-containment and the recursive-context work (at top-level superRA/review-planning-protocol/). Conflict on path-containment/task.md resolved to this branch's implemented superset. Propagation commits reconcile stale incoming statuses and remove the duplicate nested review-planning-protocol task.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
# superRA/task-system/cli-scripts/path-containment/task.md
… Sync Map + impact notes) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed-surfaces: manifest preload baseline false for reviewer; status revise Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…t-commit-model: dangling path A/B vocabulary in worktree-return line; status revise Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nterface sibling references in objective; status revise Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nherited; .plan phrasing; empty-JSON crash-path hole; status revise Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…non-inherited top-level sections; stale hook claims; status revise Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ctive findings (review_status results claim, agent-orchestration narration, archived row) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…empty branch Results) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…gs (leftover approval prose in review notes, dead planning.md refs) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…seded review notes survived approval; replaced) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…tale implementation-workflow path) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…: minor finding (stale hooks.json command claim) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…spec-slim: minor finding (Editing Etiquette heading claim stale) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…tree-slim: minor finding (planning.md co-existence caveat resolved-by-deletion) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ync: minor finding (contributor policy in claude-tools adapter) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ucture-specs: minor finding (dispatch wording leaks into direct-mode output) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…-task-read: minor finding (stale lean-interface paths in Results) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…li: minor finding (stale lean-interface path) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ssor: minor finding (legacy-sidecar mutation path raises uncaught) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…or finding (.plan phrasing stated as current) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…inor findings (prose output: field, .plan phrasing) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…r findings (e2e lacks task-hook evidence, generated-artifact inventory gaps) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…dings (dangling Revision Notes pointer, dead planning.md citation) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ing (dead edit-guidance citations) Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…plan-as-noun instructions, stale README section pointer, broken Results links Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ine JS unlintable, stale .plan gitignore refs); stays approved
…iene The reconciliation paragraph read as a design essay defending against task 08's superseded "keep status out of subject" rule — a tension invisible to a using-superra reader. Cut to the one useful disambiguation (STATE = what this commit did; live status stays in frontmatter); drops the DRY echo of the §Report Format return. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The task's output is documentation, so its ## Results re-summarized the grammar that now lives in using-superra §Commit Hygiene — and had already drifted (it described the reconciliation paragraph trimmed in 0fb275d). Distil to a pointer at the source of truth plus the touched-surface list; keep the task as its own leaf. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
agent-loading-tests: fold 13 children into parent, mature Results to a README pointer + coverage narrative (researcher-approved). role-spec-always-load: keep, trim transient integration self-check from Results. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ature results
Mature & Consolidate (researcher-approved):
- agent-loading-tests: folded all 13 numbered children (01-13) into the parent; removed their directories via git rm. Parent is now a leaf, status: approved. Rewrote ## Objective and ## Results to current state: a reader-facing four-layer coverage narrative pointing at tests/harness-instruction-following/README.md as the canonical LC001-LC023 matrix. Dropped per-turn integration scaffolding (Final Diff Self-Check, integration-pass-edit log) and the transient ## Revision Notes. Corrected stale numbers to 140 passed / LC001-LC023.
- role-spec-always-load: kept in place; removed the transient integration self-check paragraph and the ## Revision Notes section; repointed stale 'task 09/10' cross-refs to the folded suite's LC001 canary.
Tree validates clean (no orphans/broken edges); CI-safe suite green (140 passed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Review Notes Maturation-review coherence cleanups: the caveats line still read 'All 22 entries' after LC023 was added on the trunk sync (the matrix was already corrected to LC001-LC023); remove the stale empty Review Notes header the fold left on the consolidated agent-loading-tests task. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Base advanced during integration with the commit-state-grammar work under agent-surface-redesign. Role-spec edits are non-overlapping: my role-spec-always-load conditional-load instruction lives in 'Before You Start'; the incoming change adds commit-subject STATE grammar in the commit section. Both survive. Generated artifacts (.codex, direct-mode) regenerated from merged sources — drift check clean. Incoming 72420b3 hook-payload fix was already on HEAD (no-op). Harness suite 140 passed, task-tree tests 284 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cit always-loaded skill body load
Squash-merge of the test-improvement branch.
Harness instruction-following suite (tests/harness-instruction-following/):
structural evidence that agents load the files/skills the dispatch, role spec,
stage, domain, task tree, and workflow triggers require before acting. Four
layers — static CI, fixture/parser, live-claude (claude-agent-sdk in-process
Skill/Read hooks), live-codex (canary + SubagentStart hook) — mapped to the
LC001-LC023 load contract in load_contract.json with a covered_by matrix.
CI-safe layer (140 tests) runs with no model/credentials; live smokes gate on
RUN_LIVE_HARNESS=1 and stay out of CI. Deferred-import isolation keeps
claude-agent-sdk off the default pytest path.
Replaces the prose-regression shell tests (test-{mistral,slide-design,zotero}
-skill-text.sh, test-sync-integration-contract.sh) with structural assertions.
role-spec-always-load: agents/{implementer,reviewer}.md now explicitly load both
always-loaded skills (using-superra + report-in-markdown) if not already in
context, so non-autoload harnesses (Codex) get them; generated .codex agents and
direct-mode references regenerated in sync.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
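The `RUN_LIVE_HARNESS=1` gate and the deferred-import isolation described in the commit above might be sketched as follows. This is an illustration under stated assumptions, not the suite's code: the helper names are hypothetical, and the default module name `claude_agent_sdk` is assumed to be the import name of the claude-agent-sdk package.

```python
import importlib
import os


def live_harness_enabled(environ=None) -> bool:
    """True only when the opt-in flag is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get("RUN_LIVE_HARNESS") == "1"


def load_live_sdk(module_name: str = "claude_agent_sdk", environ=None):
    """Import the SDK lazily, and only when live runs are enabled.

    Keeping the import inside the function means collecting the CI-safe
    tests never touches the optional dependency.
    """
    if not live_harness_enabled(environ):
        return None
    return importlib.import_module(module_name)


if __name__ == "__main__":
    # With the flag unset, nothing is imported and the live layer is skipped.
    print(load_live_sdk(environ={}))
```

A live test would call `load_live_sdk()` first and skip itself when it returns `None`.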
Plan the trim of skills/using-superra/references/main-agent.md: replace the Workflow Frontier Resolver (which assumes a durable workflow-stage that does not exist — status + git are the only state) with a status-driven Resuming Work model, trim the scenario-enumerating pause/proceed prose to principle, and drop redundant inbound pointers. Splits out the frontier-completeness code question (whether compute_frontier should surface implemented/revise) as a dependency. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ier surfaces implemented/revise Broaden compute_frontier's leaf-inclusion set to all actionable statuses (not-started, in-progress, implemented, revise) so `task frontier` is the complete "what needs doing now" surface — the per-task status already in the output distinguishes implement vs review vs fix work. Dependency satisfaction stays approved-only, so dependents of unreviewed work remain blocked. Replace the revise-exclusion test with one asserting the new behavior; fix the empty-frontier message. Only consumer is the `task frontier` CLI (dashboard does not call compute_frontier). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
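The frontier rule in this commit (include leaves in any actionable status, but count a dependency as satisfied only when it is approved) can be illustrated with a small sketch. The `Task` shape and the function name `sketch_frontier` are assumptions made for this example; the real `compute_frontier` works on the repository's own task model.

```python
from dataclasses import dataclass, field

# Statuses that still need someone to act: implement, review, or fix.
ACTIONABLE = {"not-started", "in-progress", "implemented", "revise"}


@dataclass
class Task:
    path: str
    status: str
    depends_on: list[str] = field(default_factory=list)
    is_leaf: bool = True


def sketch_frontier(tasks: list[Task]) -> list[Task]:
    """Return actionable leaves whose dependencies are all approved."""
    status_by_path = {t.path: t.status for t in tasks}
    frontier = []
    for task in tasks:
        if not task.is_leaf or task.status not in ACTIONABLE:
            continue
        # Unreviewed work (e.g. "implemented") does not unblock dependents.
        if all(status_by_path.get(dep) == "approved" for dep in task.depends_on):
            frontier.append(task)
    return frontier


if __name__ == "__main__":
    tasks = [
        Task("a", "approved"),
        Task("b", "implemented", ["a"]),
        Task("c", "not-started", ["b"]),
        Task("d", "revise"),
    ]
    print([t.path for t in sketch_frontier(tasks)])  # ['b', 'd']
```

Here `c` stays off the frontier because `b` is implemented but not yet approved, which matches the "dependents of unreviewed work remain blocked" behavior.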
…ier surfaces actionable in-flight work Verified: implemented/revise leaves now appear on the frontier (live + unit tests, 13 frontier cases green); dependency satisfaction stays approved-only so dependents of unreviewed work remain blocked; sole consumer is the task frontier CLI. Minimal, correct. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing Work model, distilled pauses, dropped pointers Replace main-agent.md's Workflow Frontier Resolver (which assumed a durable workflow-stage that does not exist) with a status+git Resuming Work model; distill the scenario-enumerating pause/proceed prose to two essences (a researcher-only decision that materially changes a task objective; a pre-set workflow gate); fix the dangling banned-phrasings reference and the Execution Modes typo. Drop all eight inbound Frontier Resolver pointers across superimplement/superintegrate/superplan/using-superra/agent-orchestration, keeping one survivor (the using-superra master map → Resuming Work); renumber superimplement's pause-class references to the new two-class model. main-agent.md shrinks ~100 lines. superplan's step-6 pointer edit left unstaged (intermingled with concurrent researcher edits). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing Work rewrite verified Verified: zero live Frontier Resolver / Three-Pause-Class references remain in skills/agents; the §Resuming Work and §Proceeding and Pausing anchors resolve; each owning workflow states its own next move so dropping the inbound pointers strands nothing; the all-approved resume path is covered by superimplement Step 2. One non-blocking cosmetic nit (dangling colon in §Resuming Work after the researcher's bullet trim) noted, not blocking. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Plan the full removal of script/input/output/tags/created from the task.md frontmatter schema as a new subtree under task-tree, modeled on the codex-task-hooks (code task + docs task) precedent:
- 01-code-and-compat: drop the five fields from the data model, serializer, write_task, parse_task, task_read/query/check, conftest, plan_migrate, and ~143 test refs, keeping the parser tolerant of legacy/unknown keys (no strict validation, no bulk migration) as a hard back-compat requirement.
- 02-docs-propagation: sweep the remaining field references out of the instruction prose.
Parent task-tree status rolls up approved -> in-progress (PostToolUse hook).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both subtasks are approved and their deliverables are permanent in the branch diff (the compute_frontier broadening + tests, and the main-agent.md rewrite). Fold the two execution-unit children into the parent: remove the 01-frontier-completeness and 02-resuming-work-rewrite directories and distil their results into a short matured ## Results on the parent recording both outcomes. Decision rationale for the frontier change lives at the compute_frontier docstring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fix grammar/typos in the reworked welcome and quickstart pages, and fill in the three author TODO notes (refactor detail, mature/consolidate rationale, concrete iterate-the-tree examples) that were left inline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d-section and dispatch-ownership refs
- task-tree-design.md: complete the placement-in-tree rewrite (incomplete sentence, grammar, Descent->Descend, broken Retroactive numbering); fold the one-line root-workstream rule back into the descent since superplan User Review still depends on it.
- superplan/SKILL.md + consolidation.md: retarget the four dangling "Placing Work by Durable Home" pointers to the renamed section (depends_on semantics now point at Parent and sibling context); restore the lost Review Mode definition pointer beside the inlined planning reviewer block.
- CLAUDE.md: planning-review dispatch template moved out of the trimmed agent-orchestration into superplan SKILL.md Agent Review; update the ownership table to record the new home and the exception.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ructions The pdf2image figure-conversion command in report-in-markdown used a bare `uv run --with` (no --script/--no-project), so running it inside a research project would discover and provision that project's .venv — a side effect on a throwaway command. Add --no-project. In econ-data-analysis, the jupytext rendering guidance hardcoded `uv run jupytext` and a specific pyproject.toml dev-dependency setup. Replace with high-level guidance: render in whatever environment carries the project's packages, following the project's existing setup, and flag that bare `uv run` provisions a venv as a documented side effect rather than the prescribed path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…DONE — drop five frontmatter fields from data layer, CLI, tests; back-compat preserved Remove script/input/output/tags/created from the Task dataclass, serializer, parser reads, readable/JSON output, placement smells (root-leaf-fields and cross-subtree output-overlap), dashboard pills, migration, and fixtures. The parser stays tolerant of legacy/unknown keys: parse_frontmatter still reads every key into the dict, parse_task picks only the three known ones, and a legacy file sheds the retired keys on its next write_task rewrite. Add TestParseTask::test_legacy_fields_parse_and_are_dropped_on_rewrite as the executable back-compat guarantee. Delete three tests for the removed behavior. Suite: 693 passed, 2 skipped. task check parses 304 legacy-field-carrying task.md files in this repo with no field errors. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
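The back-compat behavior described here (read every key, keep only the known ones, shed retired keys on the next rewrite) can be sketched in a few lines. The functions below reuse the names from the commit message but are simplified stand-ins written for illustration: they treat frontmatter as flat `key: value` lines rather than real YAML, and the sample file contents are made up.

```python
KNOWN_FIELDS = ("title", "status", "depends_on")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read every key between the leading '---' fences, known or not."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_task(text: str) -> dict[str, str]:
    """Keep only the known fields; legacy or unknown keys are ignored."""
    raw = parse_frontmatter(text)
    return {key: raw[key] for key in KNOWN_FIELDS if key in raw}


def write_task(task: dict[str, str], body: str) -> str:
    """Serialize only what parse_task kept, so retired keys are shed."""
    header = "\n".join(f"{key}: {value}" for key, value in task.items())
    return f"---\n{header}\n---\n{body}"


if __name__ == "__main__":
    legacy = (
        "---\ntitle: Demo\nstatus: approved\ntags: []\n"
        "created: 2024-01-01\n---\n## Objective\n"
    )
    print(write_task(parse_task(legacy), "## Objective\n"))
```

Because nothing is validated strictly, a legacy file parses without error and only loses its retired keys when it is next rewritten.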
…ROVE — fields dropped, back-compat verified end-to-end Walked commit eb6ddc1: five fields gone from data model/CLI/validators/ dashboard/migration/tests; parse_frontmatter unchanged; _STALE_FIELDS lists all five. Back-compat regression test is non-vacuous (real in-place round-trip + unknown key). Re-ran suite (693 passed, 2 skipped) and task check (exit 0 on 260 legacy-field-carrying task.md files). Deleted tests covered intentionally- removed Smell 1 and Smell 4 behavior; Smell 4 removal documented in Results. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… removal Both callers of today_str() (the created-field writers) were removed in eb6ddc1; a repo-wide grep shows zero remaining references. Drop the helper and its now-unused 'from datetime import date' import. Resolves the reviewer's non-blocking MINOR on 01-code-and-compat. Suite: 693 passed, 2 skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… DONE — drop script/input/output/tags/created from instruction prose Propagate the narrowed title/status/depends_on field set across skill and reference docs so no instruction tells an agent to maintain a removed field. Recast §Stale Content (output: frontmatter -> ## Objective/## Results), trim the Task dataclass + PLAN.md-migration FIELD_RE table in internals.md to match the post-01 code, strip the plan-file template fields, recast consolidation's scope-update prose, and reduce the superplan material-change item. Two extra hits (internals migration table, task-tree-design scope bullet) found via the sweep. _task_io.py docstring and task-tree SKILL.md confirmed already correct. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…VISE — flag residual **Script:** field-prose in internals.md migration checklist internals.md:189 still tells an agent to add **Script:** *(none)* during legacy PLAN.md normalization, a field the migrator's FIELD_RE no longer consumes — the same migration path where the FIELD_RE table rows were correctly dropped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…PROVE — drop residual **Script:** field-prose from migration checklist Resolves the reviewer's MAJOR inline: internals.md:189 told an agent preparing a legacy PLAN.md to add a '**Script:** *(none)*' default, a field the migrator no longer consumes. Re-swept active docs for mid-line **Script:**/**Input:**/ **Output:** field-prose — only legacy-PLAN.md test fixtures remain (migrator input data, not instructions). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tags/created on rewrite frontmatter-field-narrowing is fully approved (both subtasks), so the status propagates up through task-tree to the superRA root. The same rewrite sheds the legacy 'tags: []' and 'created:' keys those root task.md files still carried — the back-compat field-shedding from 01-code-and-compat, observed in production. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tegrated) test-improvement's harness-instruction-following test suite and agent-loading work were already pulled into better-handoff via the earlier cross-merges (217ffa7, f91582e, 17c359b) and test-improvement's own back-merge 9fd8fc1. A 3-way merge here resolves with no conflicts to a tree byte-identical to HEAD: all 74 files test-improvement changed since the merge-base 178cc86 match better-handoff exactly. This merge records the integration link so the branch can be retired; it changes no files. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
- `task.md` with YAML frontmatter + markdown body. Python CLI for CRUD, frontier computation, migration, and HTML dashboard generation.
- `.plan/` task-tree operations natively (planning, implementation, integration workflows + agent role specs).
- `## Revision Notes` replacing `## Decisions` log, and complete PLAN.md/RESULTS.md reference sweep across 43+ files.
Test plan
- `sync_codex_agents.py --check`)
- `grep 'PLAN\.md' skills/`: only migration/historical refs remain
- `grep '## Decisions' skills/ agents/`: only legacy data model annotation
🤖 Generated with Claude Code