
Task system, agent interface, and planning workflow redesign#29

Open
FuZhiyu wants to merge 909 commits into main from better-handoff

Conversation

@FuZhiyu

@FuZhiyu FuZhiyu commented May 25, 2026

Owner

Summary

  • Task system skill — filesystem-based task hierarchy replacing flat PLAN.md/RESULTS.md. Each task is a self-contained task.md with YAML frontmatter + markdown body. Python CLI for CRUD, frontier computation, migration, and HTML dashboard generation.
  • Agent interface integration — updated all workflow skills, agent specs, and orchestration to use .plan/ task-tree operations natively (planning, implementation, integration workflows + agent role specs).
  • Planning workflow redesign — new 5-phase structure (Discovery → Exploration → Domain Setup → Design → Review & Commit), harness plan mode compatibility reference, "plan is the verb" terminology convention, ## Revision Notes replacing ## Decisions log, and complete PLAN.md/RESULTS.md reference sweep across 43+ files.

Test plan

  • 92 task-system unit tests pass
  • Generated files in sync (sync_codex_agents.py --check)
  • Contract tests unchanged (48 pass, 6 pre-existing failures)
  • grep 'PLAN\.md' skills/ — only migration/historical refs remain
  • grep '## Decisions' skills/ agents/ — only legacy data model annotation
  • All 15 tasks individually reviewed and approved
  • Integration review passed (cross-reference consistency, terminology, generated files)

🤖 Generated with Claude Code

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 708a571f79

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

input_files = input_files or []
output_files = output_files or []

task_dir = plan_root / task_path


P1 Badge Reject task paths that escape the plan root

create_task builds task_dir via plan_root / task_path without normalizing or checking containment, so inputs like --path ../escaped create and write task.md outside the plan tree. This breaks the task-system boundary and allows accidental or malicious writes to arbitrary sibling directories from normal CLI usage; the same safety invariant should be enforced here (e.g., resolve + verify under plan_root) before any filesystem mutation.
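A minimal sketch of the containment check the review asks for, assuming a helper that runs before any filesystem mutation. `resolve_task_dir` and `TaskPathError` are hypothetical names for illustration, not the actual fix in `task_create.py`:

```python
from pathlib import Path


class TaskPathError(ValueError):
    """Raised when a task path would resolve outside the plan root (hypothetical)."""


def resolve_task_dir(plan_root: Path, task_path: str) -> Path:
    """Return the absolute task directory, or raise if it escapes plan_root."""
    root = plan_root.resolve()
    # Joining an absolute task_path discards plan_root entirely, and ".."
    # segments can climb out of it; resolving first exposes both cases.
    candidate = (root / task_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise TaskPathError(
            f"task path {task_path!r} escapes plan root {root}"
        ) from None
    return candidate
```

With a check like this, `--path ../escaped` would fail before `task.md` is written, while an ordinary nested path such as `a/b` resolves to a directory under the plan root.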


FuZhiyu added a commit that referenced this pull request Jun 2, 2026
 follow-ups

Codex P1 (task_create.py path traversal) → new superRA/task-system/cli-path-safety
task. Heading/description rename-justification comment → scope bullet in
review-planning-protocol/05-recursive-context-conventions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FuZhiyu added a commit that referenced this pull request Jun 2, 2026
…note

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FuZhiyu added a commit that referenced this pull request Jun 2, 2026
Sync PR #29 follow-up base. Incoming range 75a86cc..876178e carries:
- b36d563: not-started plans for path-containment + nested review-planning-protocol
- 876178e: dynamic-workflows task move (deletions) + tree-cleanup edit

This branch already implemented and approved path-containment and the
recursive-context work (at top-level superRA/review-planning-protocol/).
Conflict on path-containment/task.md resolved to this branch's implemented
superset. Propagation commits reconcile stale incoming statuses and remove
the duplicate nested review-planning-protocol task.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

# Conflicts:
#	superRA/task-system/cli-scripts/path-containment/task.md
FuZhiyu added a commit that referenced this pull request Jun 2, 2026
… Sync Map + impact notes)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FuZhiyu and others added 25 commits June 10, 2026 15:48
…ed-surfaces: manifest preload baseline false for reviewer; status revise

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…t-commit-model: dangling path A/B vocabulary in worktree-return line; status revise

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nterface sibling references in objective; status revise

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nherited; .plan phrasing; empty-JSON crash-path hole; status revise

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…non-inherited top-level sections; stale hook claims; status revise

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ctive findings (review_status results claim, agent-orchestration narration, archived row)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…empty branch Results)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…gs (leftover approval prose in review notes, dead planning.md refs)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…seded review notes survived approval; replaced)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…tale implementation-workflow path)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…: minor finding (stale hooks.json command claim)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…spec-slim: minor finding (Editing Etiquette heading claim stale)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…tree-slim: minor finding (planning.md co-existence caveat resolved-by-deletion)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ync: minor finding (contributor policy in claude-tools adapter)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ucture-specs: minor finding (dispatch wording leaks into direct-mode output)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…-task-read: minor finding (stale lean-interface paths in Results)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…li: minor finding (stale lean-interface path)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ssor: minor finding (legacy-sidecar mutation path raises uncaught)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…or finding (.plan phrasing stated as current)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…inor findings (prose output: field, .plan phrasing)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…r findings (e2e lacks task-hook evidence, generated-artifact inventory gaps)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…dings (dangling Revision Notes pointer, dead planning.md citation)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ing (dead edit-guidance citations)

Retrospective adversarial review of the better-handoff agent-interface work: findings recorded in ## Review Notes per reviewer protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…plan-as-noun instructions, stale README section pointer, broken Results links

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ine JS unlintable, stale .plan gitignore refs); stays approved
FuZhiyu and others added 30 commits June 21, 2026 13:57
…iene

The reconciliation paragraph read as a design essay defending against task
08's superseded "keep status out of subject" rule — a tension invisible to a
using-superra reader. Cut to the one useful disambiguation (STATE = what this
commit did; live status stays in frontmatter); drops the DRY echo of the
§Report Format return.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The task's output is documentation, so its ## Results re-summarized the grammar
that now lives in using-superra §Commit Hygiene — and had already drifted (it
described the reconciliation paragraph trimmed in 0fb275d). Distil to a pointer
at the source of truth plus the touched-surface list; keep the task as its own
leaf.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
agent-loading-tests: fold 13 children into parent, mature Results to a README
pointer + coverage narrative (researcher-approved). role-spec-always-load: keep,
trim transient integration self-check from Results.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ature results

Mature & Consolidate (researcher-approved):
- agent-loading-tests: folded all 13 numbered children (01-13) into the parent; removed their directories via git rm. Parent is now a leaf, status: approved. Rewrote ## Objective and ## Results to current state — a reader-facing four-layer coverage narrative pointing at tests/harness-instruction-following/README.md as the canonical LC001-LC023 matrix. Dropped per-turn integration scaffolding (Final Diff Self-Check, integration-pass-edit log) and the transient ## Revision Notes. Corrected stale numbers to 140 passed / LC001-LC023.
- role-spec-always-load: kept in place; removed the transient integration self-check paragraph and the ## Revision Notes section; repointed stale 'task 09/10' cross-refs to the folded suite's LC001 canary.
Tree validates clean (no orphans/broken edges); CI-safe suite green (140 passed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Review Notes

Maturation-review coherence cleanups: the caveats line still read 'All 22
entries' after LC023 was added on the trunk sync (the matrix was already
corrected to LC001-LC023); remove the stale empty Review Notes header the fold
left on the consolidated agent-loading-tests task.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Base advanced during integration with the commit-state-grammar work under
agent-surface-redesign. Role-spec edits are non-overlapping: my role-spec-always-load
conditional-load instruction lives in 'Before You Start'; the incoming change adds
commit-subject STATE grammar in the commit section. Both survive. Generated artifacts
(.codex, direct-mode) regenerated from merged sources — drift check clean. Incoming
72420b3 hook-payload fix was already on HEAD (no-op). Harness suite 140 passed,
task-tree tests 284 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cit always-loaded skill body load

Squash-merge of the test-improvement branch.

Harness instruction-following suite (tests/harness-instruction-following/):
structural evidence that agents load the files/skills the dispatch, role spec,
stage, domain, task tree, and workflow triggers require before acting. Four
layers — static CI, fixture/parser, live-claude (claude-agent-sdk in-process
Skill/Read hooks), live-codex (canary + SubagentStart hook) — mapped to the
LC001-LC023 load contract in load_contract.json with a covered_by matrix.
CI-safe layer (140 tests) runs with no model/credentials; live smokes gate on
RUN_LIVE_HARNESS=1 and stay out of CI. Deferred-import isolation keeps
claude-agent-sdk off the default pytest path.

Replaces the prose-regression shell tests (test-{mistral,slide-design,zotero}
-skill-text.sh, test-sync-integration-contract.sh) with structural assertions.

role-spec-always-load: agents/{implementer,reviewer}.md now explicitly load both
always-loaded skills (using-superra + report-in-markdown) if not already in
context, so non-autoload harnesses (Codex) get them; generated .codex agents and
direct-mode references regenerated in sync.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Plan the trim of skills/using-superra/references/main-agent.md: replace the
Workflow Frontier Resolver (which assumes a durable workflow-stage that does not
exist — status + git are the only state) with a status-driven Resuming Work model,
trim the scenario-enumerating pause/proceed prose to principle, and drop redundant
inbound pointers. Splits out the frontier-completeness code question (whether
compute_frontier should surface implemented/revise) as a dependency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ier surfaces implemented/revise

Broaden compute_frontier's leaf-inclusion set to all actionable statuses
(not-started, in-progress, implemented, revise) so `task frontier` is the
complete "what needs doing now" surface — the per-task status already in the
output distinguishes implement vs review vs fix work. Dependency satisfaction
stays approved-only, so dependents of unreviewed work remain blocked. Replace
the revise-exclusion test with one asserting the new behavior; fix the
empty-frontier message. Only consumer is the `task frontier` CLI (dashboard
does not call compute_frontier).
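The rule described in this commit can be sketched as follows. This is an illustration under assumptions, not the repository's `compute_frontier`: the `Task` shape, the `is_leaf` flag, and the function name `compute_frontier_sketch` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Statuses that put a leaf on the frontier, per the commit message.
ACTIONABLE = {"not-started", "in-progress", "implemented", "revise"}


@dataclass
class Task:
    path: str
    status: str
    depends_on: list[str] = field(default_factory=list)
    is_leaf: bool = True  # assumption: only leaves are eligible


def compute_frontier_sketch(tasks: dict[str, Task]) -> list[Task]:
    """Return actionable leaves whose dependencies are all approved."""

    def satisfied(dep: str) -> bool:
        # Approved-only: an implemented but unreviewed dependency still blocks.
        # A dependency missing from the tree is treated as unsatisfied.
        target = tasks.get(dep)
        return target is not None and target.status == "approved"

    frontier = [
        task
        for task in tasks.values()
        if task.is_leaf
        and task.status in ACTIONABLE
        and all(satisfied(dep) for dep in task.depends_on)
    ]
    return sorted(frontier, key=lambda task: task.path)
```

In this sketch an `implemented` leaf appears on the frontier because it needs review, while a task that depends on it stays off the frontier until that leaf is `approved`.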

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ier surfaces actionable in-flight work

Verified: implemented/revise leaves now appear on the frontier (live + unit
tests, 13 frontier cases green); dependency satisfaction stays approved-only so
dependents of unreviewed work remain blocked; sole consumer is the task frontier
CLI. Minimal, correct.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing Work model, distilled pauses, dropped pointers

Replace main-agent.md's Workflow Frontier Resolver (which assumed a durable
workflow-stage that does not exist) with a status+git Resuming Work model;
distill the scenario-enumerating pause/proceed prose to two essences (a
researcher-only decision that materially changes a task objective; a pre-set
workflow gate); fix the dangling banned-phrasings reference and the Execution
Modes typo. Drop all eight inbound Frontier Resolver pointers across
superimplement/superintegrate/superplan/using-superra/agent-orchestration,
keeping one survivor (the using-superra master map → Resuming Work); renumber
superimplement's pause-class references to the new two-class model. main-agent.md
shrinks ~100 lines. superplan's step-6 pointer edit left unstaged (intermingled
with concurrent researcher edits).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing Work rewrite verified

Verified: zero live Frontier Resolver / Three-Pause-Class references remain in
skills/agents; the §Resuming Work and §Proceeding and Pausing anchors resolve;
each owning workflow states its own next move so dropping the inbound pointers
strands nothing; the all-approved resume path is covered by superimplement Step
2. One cosmetic nit (dangling colon in §Resuming Work after the
researcher's bullet trim) noted, not blocking.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Plan the full removal of script/input/output/tags/created from the
task.md frontmatter schema as a new subtree under task-tree, modeled on
the codex-task-hooks (code task + docs task) precedent:

- 01-code-and-compat: drop the five fields from the data model, serializer,
  write_task, parse_task, task_read/query/check, conftest, plan_migrate, and
  ~143 test refs — keeping the parser tolerant of legacy/unknown keys (no
  strict validation, no bulk migration) as a hard back-compat requirement.
- 02-docs-propagation: sweep the remaining field references out of the
  instruction prose.

Parent task-tree status rolls up approved -> in-progress (PostToolUse hook).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both subtasks are approved and their deliverables are permanent in the branch
diff (the compute_frontier broadening + tests, and the main-agent.md rewrite).
Fold the two execution-unit children into the parent: remove the
01-frontier-completeness and 02-resuming-work-rewrite directories and distil
their results into a short matured ## Results on the parent recording both
outcomes. Decision rationale for the frontier change lives at the
compute_frontier docstring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fix grammar/typos in the reworked welcome and quickstart pages, and
fill in the three author TODO notes (refactor detail, mature/consolidate
rationale, concrete iterate-the-tree examples) that were left inline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d-section and dispatch-ownership refs

- task-tree-design.md: complete the placement-in-tree rewrite (incomplete
  sentence, grammar, Descent->Descend, broken Retroactive numbering); fold
  the one-line root-workstream rule back into the descent since superplan
  User Review still depends on it.
- superplan/SKILL.md + consolidation.md: retarget the four dangling
  "Placing Work by Durable Home" pointers to the renamed section (depends_on
  semantics now point at Parent and sibling context); restore the lost
  Review Mode definition pointer beside the inlined planning reviewer block.
- CLAUDE.md: planning-review dispatch template moved out of the trimmed
  agent-orchestration into superplan SKILL.md Agent Review; update the
  ownership table to record the new home and the exception.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ructions

The pdf2image figure-conversion command in report-in-markdown used a bare
`uv run --with` (no --script/--no-project), so running it inside a research
project would discover and provision that project's .venv — a side effect on a
throwaway command. Add --no-project.

In econ-data-analysis, the jupytext rendering guidance hardcoded `uv run
jupytext` and a specific pyproject.toml dev-dependency setup. Replace with
high-level guidance: render in whatever environment carries the project's
packages, following the project's existing setup, and flag that bare `uv run`
provisions a venv as a documented side effect rather than the prescribed path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…DONE — drop five frontmatter fields from data layer, CLI, tests; back-compat preserved

Remove script/input/output/tags/created from the Task dataclass, serializer,
parser reads, readable/JSON output, placement smells (root-leaf-fields and
cross-subtree output-overlap), dashboard pills, migration, and fixtures. The
parser stays tolerant of legacy/unknown keys: parse_frontmatter still reads
every key into the dict, parse_task picks only the three known ones, and a
legacy file sheds the retired keys on its next write_task rewrite.

Add TestParseTask::test_legacy_fields_parse_and_are_dropped_on_rewrite as the
executable back-compat guarantee. Delete three tests for the removed behavior.
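A rough sketch of the tolerant-parse, shed-on-rewrite behavior described above, assuming simple `key: value` frontmatter lines. The `_sketch` functions are hypothetical stand-ins for `parse_frontmatter`, `parse_task`, and `write_task`; the real code presumably parses YAML rather than splitting on colons.

```python
from __future__ import annotations

# The three fields that remain in the schema, per the commit message.
KNOWN_FIELDS = ("title", "status", "depends_on")


def parse_frontmatter_sketch(text: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited frontmatter from the body, keeping every key."""
    _, raw, body = text.split("---\n", 2)
    meta: dict[str, str] = {}
    for line in raw.splitlines():
        key, _, value = line.partition(":")
        if key.strip():
            # Values are kept as raw strings; no validation, so legacy and
            # unknown keys parse without error.
            meta[key.strip()] = value.strip()
    return meta, body


def parse_task_sketch(meta: dict[str, str]) -> dict[str, str]:
    """Pick only the known fields; everything else is ignored, not rejected."""
    return {key: meta[key] for key in KNOWN_FIELDS if key in meta}


def rewrite_sketch(text: str) -> str:
    """Round-trip a task file; retired keys are shed because they are never re-emitted."""
    meta, body = parse_frontmatter_sketch(text)
    task = parse_task_sketch(meta)
    header = "".join(f"{key}: {value}\n" for key, value in task.items())
    return f"---\n{header}---\n{body}"
```

The point of the design is that no bulk migration is needed: a legacy file keeps parsing as-is and loses the retired keys only the next time it is rewritten.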

Suite: 693 passed, 2 skipped. task check parses 304 legacy-field-carrying
task.md files in this repo with no field errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ROVE — fields dropped, back-compat verified end-to-end

Walked commit eb6ddc1: five fields gone from data model/CLI/validators/
dashboard/migration/tests; parse_frontmatter unchanged; _STALE_FIELDS lists
all five. Back-compat regression test is non-vacuous (real in-place round-trip
+ unknown key). Re-ran suite (693 passed, 2 skipped) and task check (exit 0 on
260 legacy-field-carrying task.md files). Deleted tests covered intentionally-
removed Smell 1 and Smell 4 behavior; Smell 4 removal documented in Results.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… removal

Both callers of today_str() (the created-field writers) were removed in
eb6ddc1; a repo-wide grep shows zero remaining references. Drop the helper
and its now-unused 'from datetime import date' import. Resolves the reviewer's
non-blocking MINOR on 01-code-and-compat. Suite: 693 passed, 2 skipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… DONE — drop script/input/output/tags/created from instruction prose

Propagate the narrowed title/status/depends_on field set across skill and
reference docs so no instruction tells an agent to maintain a removed field.
Recast §Stale Content (output: frontmatter -> ## Objective/## Results), trim
the Task dataclass + PLAN.md-migration FIELD_RE table in internals.md to match
the post-01 code, strip the plan-file template fields, recast consolidation's
scope-update prose, and reduce the superplan material-change item. Two extra
hits (internals migration table, task-tree-design scope bullet) found via the
sweep. _task_io.py docstring and task-tree SKILL.md confirmed already correct.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…VISE — flag residual **Script:** field-prose in internals.md migration checklist

internals.md:189 still tells an agent to add **Script:** *(none)* during legacy
PLAN.md normalization, a field the migrator's FIELD_RE no longer consumes — the
same migration path where the FIELD_RE table rows were correctly dropped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…PROVE — drop residual **Script:** field-prose from migration checklist

Resolves the reviewer's MAJOR inline: internals.md:189 told an agent preparing
a legacy PLAN.md to add a '**Script:** *(none)*' default, a field the migrator
no longer consumes. Re-swept active docs for mid-line **Script:**/**Input:**/
**Output:** field-prose — only legacy-PLAN.md test fixtures remain (migrator
input data, not instructions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tags/created on rewrite

frontmatter-field-narrowing is fully approved (both subtasks), so the status
propagates up through task-tree to the superRA root. The same rewrite sheds the
legacy 'tags: []' and 'created:' keys those root task.md files still carried —
the back-compat field-shedding from 01-code-and-compat, observed in production.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tegrated)

test-improvement's harness-instruction-following test suite and agent-loading
work were already pulled into better-handoff via the earlier cross-merges
(217ffa7, f91582e, 17c359b) and test-improvement's own back-merge 9fd8fc1.
A 3-way merge here resolves with no conflicts to a tree byte-identical to HEAD:
all 74 files test-improvement changed since the merge-base 178cc86 match
better-handoff exactly. This merge records the integration link so the branch
can be retired; it changes no files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
