feat: add research-project-setup skill (interactive scaffolder + retrofit + CLI test suite) #28
FuZhiyu wants to merge 42 commits into
…o superRA
Materialize the approved plan into committed handoff docs. Six tasks:
1. Scaffold skill dir + move template/, template-share/, scripts/
2. Update create_project.sh paths + register absolute share path with .claude/settings.local.json and .codex/config.toml
3. Author SKILL.md + feature-catalog + retrofit-playbooks references
4. Register the new skill in CATEGORIES.md, README.md, using-superRA SKILL.md
5. End-to-end verification (standalone script + agent fresh-setup + retrofit)
6. Deprecate the standalone ResearchProjectTemplate repo (separate repo, deferred to after superRA-side changes are verified)
Plan approved by researcher via ExitPlanMode on 2026-05-21. Five user decisions logged in PLAN.md §Decisions (template-file consolidation, deprecate-old-repo, retrofit catalog parity, two-folder preservation, share-path sandbox registration).
… share skeleton
Move the canonical project skeleton from the standalone ResearchProjectTemplate
repo into the new superRA skill, restructured to match the codex-superra-setup
pattern (SKILL.md + scripts/ + references). Preserve the two-folder design by
keeping template/ (Git skeleton, with internal symlinks retargeted at
../template-share/{Data,Notes,Output}) plus template-share/ side-by-side under
the skill root.
Fold the duplicate *-template.md files into single-source CLAUDE.md and
README.md inside template/: drop the example-specific "For Codex Only" block
from CLAUDE.md, adopt the generic template-style phrasing in README.md.
Add a negation pair in the root .gitignore to whitelist
skills/research-project-setup/template/.claude/ — without it the repo-level
".claude/" rule silently swallowed the bundled settings.json, agents, and
sub-skills that scaffolded projects need to inherit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
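The whitelist rule from this commit can be checked in a throwaway repository. This is a minimal sketch, not the PR's actual `.gitignore` lines (they are not shown here): the function name `demo_claude_whitelist` and the sample files are hypothetical, and it assumes `git` is on `PATH`.

```bash
#!/usr/bin/env bash
# Sketch: a negation rule re-includes one nested .claude/ directory while the
# repo-wide ".claude/" ignore rule still applies everywhere else.
# Function name and sample files are illustrative, not taken from the PR.
demo_claude_whitelist() {
  local repo
  repo=$(mktemp -d) || return 1
  git init -q "$repo" 2>/dev/null || return 1

  # Ignore every .claude/ directory, then re-include the bundled template copy.
  printf '%s\n' \
    '.claude/' \
    '!skills/research-project-setup/template/.claude/' \
    > "$repo/.gitignore"

  mkdir -p "$repo/.claude" \
           "$repo/skills/research-project-setup/template/.claude"
  echo '{}' > "$repo/.claude/settings.local.json"
  echo '{}' > "$repo/skills/research-project-setup/template/.claude/settings.json"

  # Stage everything, then list the files Git agreed to track.
  git -C "$repo" add -A
  git -C "$repo" ls-files
  rm -rf -- "$repo"
}
```

Calling `demo_claude_whitelist` should list the template's `settings.json` but not the root-level `.claude/settings.local.json`. The negation works because it re-includes the directory itself, and that directory's parent is not ignored.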
… registration
Rewires create_project.sh to read from the in-skill template/ + template-share/ siblings (replacing the legacy $SCRIPT_DIR/ProjectExample/... and *-template.md paths).
Adds a register_share_path_with_agents helper that registers the absolute share-folder path with Claude (.claude/settings.local.json additionalDirectories) and Codex (.codex/config.toml writable_roots), so agents can write into Data/Notes/Output regardless of share-folder location. The same helper is mirrored into template/setup_mac.sh so coauthor machines get the registration on first setup.
Smoke-tested with --share-path /tmp/SmokeShare --with-overleaf --with-ci; idempotency verified (re-run adds no duplicates).
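For illustration, the idempotent registration on the Claude side could look roughly like the sketch below. It is not the PR's `register_share_path_with_agents`: the function name `register_claude_share_path` is hypothetical, nesting `additionalDirectories` under a `permissions` key is an assumption about the settings layout, and merging into a pre-existing file is deliberately left to a JSON-aware tool.

```bash
#!/usr/bin/env bash
# Sketch only: register an absolute share path for Claude without duplicates.
# Assumptions (not confirmed by the PR): the function name, and that
# "additionalDirectories" lives under a "permissions" object.
register_claude_share_path() {
  local project_dir=$1 share_path=$2
  local settings="$project_dir/.claude/settings.local.json"

  # Only absolute paths are registered, matching the commit's description.
  case "$share_path" in
    /*) ;;
    *) echo "share path must be absolute: $share_path" >&2; return 2 ;;
  esac

  mkdir -p "$project_dir/.claude" || return 1

  # Fresh project: write a minimal settings file.
  # (Paths containing quotes or backslashes would need JSON escaping.)
  if [ ! -f "$settings" ]; then
    printf '{\n  "permissions": {\n    "additionalDirectories": ["%s"]\n  }\n}\n' \
      "$share_path" > "$settings"
    return 0
  fi

  # Re-run: the path is already present, so do nothing (idempotent).
  if grep -qF "\"$share_path\"" "$settings"; then
    return 0
  fi

  # Existing file without the path: merging needs a real JSON parser.
  echo "settings file exists; merge $share_path with a JSON-aware tool" >&2
  return 3
}
```

Running `register_claude_share_path "$PWD" /abs/path/to/Share` twice leaves a single entry in the file, which is the property the smoke test above checks for.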
…ofit playbooks
Task 3 of the research-project-setup move: author the three markdown surfaces for the new skill. SKILL.md follows the codex-superra-setup shape (lean body + on-demand references) and applies the /CLAUDE.md DRY + Necessity tests on every line. Six retrofit playbooks moved from the source skill with paths rewritten to $SKILL_DIR/template/...; the decoupled .share-path playbook is extended to also rewrite .claude/settings.local.json and .codex/config.toml so post-move sandbox writes work the same way as a fresh scaffold.
…pending
Ran Steps 1 (standalone scaffolder + sandbox registration), 2 (opt-out flags), 6 (codex-superra-setup regression), 8 (cleanup); all PASS. Steps 3, 4, 5, 7 require a fresh Claude Code/Codex session in a different directory; documented concrete reproduction commands in RESULTS.md for the researcher. Flagged one cosmetic Codex-toml layout quirk in the registration helper (Task 2 territory). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verified Steps 1, 2, 6 by independent re-run; matched-line evidence in RESULTS.md is honest. Confirmed Codex config.toml parses cleanly despite cosmetic line-18 layout quirk. Steps 3, 4, 5, 7 remain researcher-gated as the implementer documented. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
….bib)
User-initiated scope addition. Insert new Task 6 (LaTeX bundle) before the deprecation step, renumber the old Task 6 → Task 7. Mirrors the IntermediaryDemand preamble per researcher's pointer: 12pt article + biblatex (biber, authoryear) + booktabs/multirow/makecell/subcaption + tikz/pgfplots + math operator macros + colored author-note macros; slides use beamer + metropolis. references.bib at project root, shared across paper and slides. Decision logged in §Decisions.
No prior task statuses invalidated: Tasks 1-5 APPROVED stay valid (new task is independent), Task 7 (deprecation) was Not started and remains so with updated Depends on.
…in scaffolded skeleton
Adds Paper/manuscript.tex (article + biblatex + standard math/figure packages + theorem envs), Slides/slides.tex (beamer + metropolis), and shared references.bib at project root. Wires create_project.sh to copy them with the existing ProjectExample sed substitution. Both compile cleanly under pdfLaTeX + biber.
Replaces Task 5's deferred manual verification (Steps 3, 4, 5, 7) with a scripted suite that exercises both Claude Code CLI and Codex CLI headlessly against four scenarios using the cheapest model per CLI. Test A (sandbox write to share folder) uses a strict permission profile (no bypass) so it actually exercises register_share_path_with_agents; Tests B/C/D use a permissive bypass profile since they test skill routing, not permissions. A mandatory negative-control step breaks the registered share path and confirms Test A then FAILs, proving the assertion is load-bearing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… REVISE)
Captures earlier implementer's work-in-progress: tests/ directory layout, common.sh helpers, four case scripts, README, SKILL.md verification pointer. Marked Task 8 REVISE in PLAN.md because the earlier run used /tmp/ scratch dirs (which match codex default writable_roots ~/.cache, ~/.venvs, /tmp, /private/tmp, /var/folders, ~/.local/share/uv) and --permission-mode acceptEdits for Claude strict (auto-accepts edits), both of which make Test A vacuously pass. PLAN.md now mandates $HOME/rps-tests/ for all scratch dirs and NO permission-mode flag for Claude strict; negative-control step has tighter timeouts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Path discipline: all scratch dirs under $HOME/rps-tests/ (outside the template's default writable_roots, so register_share_path_with_agents must run for Test A to PASS). cleanup_paths guard tightened to refuse anything not under that subtree.
Claude strict-profile flag: kept --permission-mode acceptEdits after an empirical sweep showed the dispatch's "no flag" premise does not hold on claude 2.1.147 -- default/auto/dontAsk all record headless writes to additionalDirectories paths as permission_denials regardless of the registration. acceptEdits is the only mode where writes inside the registered set succeed and writes outside still produce denials. Negative control (additionalDirectories=[]) reproduces FAIL under acceptEdits, proving the assertion is load-bearing.
Codex strict-profile flags unchanged. CODEX_MODEL stays gpt-5.4-mini (codex doctor 0.132.0 still lists it as the cheapest mini-tier alias; gpt-5-mini not exposed on a ChatGPT-account install).
8/8 PASS at the corrected discipline; negative control: both Claude and Codex EXPECTED FAIL when the registration is stripped. Full matrix, wall-times, cost, and empirical sweep recorded in RESULTS.md Task 8. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
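A guard of the kind described for `cleanup_paths` might be sketched as follows. The function name `cleanup_path_sketch` and the `SCRATCH_ROOT` variable are hypothetical; only the `$HOME/rps-tests` default comes from the commit message, and the real helper in `tests/lib/common.sh` may differ.

```bash
#!/usr/bin/env bash
# Sketch: delete a scratch directory only if it sits strictly under the
# scratch root. SCRATCH_ROOT and the function name are illustrative.
SCRATCH_ROOT="${SCRATCH_ROOT:-$HOME/rps-tests}"

cleanup_path_sketch() {
  local target=$1

  # Reject ".." components so a path cannot climb back out of the root.
  case "$target" in
    */..|*/../*)
      echo "refusing path containing '..': $target" >&2
      return 1 ;;
  esac

  # Require at least one character after "$SCRATCH_ROOT/", so the root
  # itself and anything outside it are refused.
  case "$target" in
    "$SCRATCH_ROOT"/?*)
      rm -rf -- "$target" ;;
    *)
      echo "refusing to delete outside $SCRATCH_ROOT: $target" >&2
      return 1 ;;
  esac
}
```

Quoting `"$SCRATCH_ROOT"` inside the `case` pattern keeps the prefix literal, so glob characters in the root path are not interpreted.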
…eout
Addresses MAJOR (broken README recipe) + MINOR (perl-alarm fallback hang risk) review findings on Task 8.
- Add cases/test_a_negative_control.sh: standalone script that scaffolds, strips registration on both .claude/settings.local.json and .codex/config.toml, and replays Test A's prompt+assertions against the broken project. Reports "expected FAIL got FAIL ✓" per CLI; exits 1 on UNEXPECTED PASS.
- Replace with_timeout perl-alarm fallback with a setpgid + process-group-kill pure-bash watchdog: child runs in its own process group under set -m, watcher sends SIGTERM at <secs> then SIGKILL after 10s grace. Verified against a stubborn child trapping SIGTERM (killed at ~12s) and against a subprocess tree (no orphan sleeps).
- Rewrite tests/README.md §Negative-control to recommend the new script and drop the broken `test_a_sandbox.sh + pre-broken NegCtrl` recipe.
- Update PLAN.md Task 8 Step 8 to point at the new script, annotate the three review items with → implemented markers, flip Review status to IMPLEMENTED.
- Record the empirical Claude `acceptEdits` strict-profile correction as a methodology decision in PLAN.md §Decisions so future agents don't re-revert it back to the dispatch's "no flag" assumption.
- Update RESULTS.md Task 8 with the new 203s full-suite re-run timings, the new negative-control timings (claude=10s, codex=20s), and platform notes on the with_timeout hardening.
Verified live: full 8/8 PASS suite re-runs at parity; negative control produces expected-FAIL on both CLIs with 31s total wall-time.
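The watchdog pattern named in this commit (own process group under `set -m`, SIGTERM at the deadline, SIGKILL after a grace period) can be sketched in pure bash as below. This is an illustration rather than the PR's `with_timeout`: the name `with_timeout_sketch` and the `GRACE_SECS` override are hypothetical, and only the 10s default grace is taken from the commit message.

```bash
#!/usr/bin/env bash
# Sketch: run a command with a deadline, killing its whole process group.
# Usage: with_timeout_sketch <secs> <command> [args...]
# GRACE_SECS (hypothetical knob) is the pause between SIGTERM and SIGKILL.
with_timeout_sketch() {
  local secs=$1 grace=${GRACE_SECS:-10} status=0
  shift

  # With job control on, each background job gets its own process group,
  # so "kill -- -PID" reaches the job and all of its descendants.
  set -m
  "$@" &
  local child=$!
  (
    sleep "$secs"
    kill -TERM -- "-$child" 2>/dev/null   # polite request at the deadline
    sleep "$grace"
    kill -KILL -- "-$child" 2>/dev/null   # forced kill after the grace period
  ) &
  local watcher=$!
  set +m

  # Collect the command's exit status (128 + signal number if it was killed).
  wait "$child" || status=$?

  # Stop the watcher and its pending sleep so nothing is left orphaned.
  kill -TERM -- "-$watcher" 2>/dev/null
  wait "$watcher" 2>/dev/null || true
  return "$status"
}
```

For example, `with_timeout_sketch 1 sleep 30` returns non-zero after about one second, while `with_timeout_sketch 5 true` returns 0 immediately. Killing the watcher's process group as well is what avoids leftover `sleep` processes.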
All in-scope tasks APPROVED. Task 8's 8/8 PASS run + load-bearing negative control supersede Task 5's deferred manual steps. Task 7 (standalone repo deprecation) remains deferred to post-PR per original plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Researcher chose Option 1 (proceed with integration) from the Step 4 completion menu after Task 8 approval. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skill-authoring work has no empirical results to lock; Task 8's CLI test suite covers every invariant Protect would protect (sandbox registration end-to-end via Claude + Codex, skill discoverability from trigger phrase, Overleaf retrofit playbook). 8/8 PASS at 25c5a83; doc-only commits since. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PRE_SYNC_BASE_SHA=08b68c85, BASE_HEAD_SHA=d861089. One incoming commit ([codex] add Codex plugin hooks #27) touches hooks/, package.json, using-superRA + agent-orchestration references, and adds tests/hooks/* — no overlap with skills/research-project-setup/ expected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream PR #27 ships the polished version of the Codex lifecycle hook bundle (advisory reminder text, hardened ${CLAUDE_PLUGIN_ROOT} fallback, PATH-sparse bash discovery, stop_hook_active guard, anchored proposed-plan marker matching, expanded test coverage 7 -> 15). This branch had its own older snapshot of the same work, plus the disjoint research-project-setup skill. All conflicting hook files / tests / packaging / handoff plans resolved by taking upstream (strict superset, no intent change). README hook-table rows take upstream wording; research-project-setup rows preserved. PLAN.md Sync Map records the cluster. Sync verification: tests/hooks/test-codex-hooks.sh 15/15 PASS.
Backfill **Sync commits** field with e739f2c.
Walked semantic-merge §Semantic Coherence Checklist top to bottom; no [BLOCKING] findings. Independent verification:
- Anchors reproduce: PRE_SYNC_BASE_SHA=08b68c85, BASE_HEAD_SHA=d861089, merge-base(bda48c5,d861089)=08b68c85.
- All 7 "took upstream" files (hooks/ask-user-question-logger, hooks/codex-plan-stop, hooks/hooks-codex.json, .codex-plugin/plugin.json, docs/plans/2026-05-21-codex-hooks-{plan,results}.md, tests/hooks/test-codex-hooks.sh) verified identical to d861089 via `git diff d861089..e739f2c`.
- All 4 "auto-merged with upstream polish" files (hooks/exit-plan-mode, hooks/run-hook.cmd, skills/agent-orchestration/.../worktree-harness-fallback.md, skills/using-superRA/references/codex-instructions.md) also identical to d861089.
- README.md synthesis verified: upstream hook-table wording adopted; branch's intro item #4 and Utility-skills row for research-project-setup preserved.
- skills/research-project-setup/ tree shows 0 lines of diff between bda48c5 and HEAD; current-branch intent preserved unchanged.
- No premature codebase-coherence refactoring; sync diff stat matches expected file set only.
- `bash tests/hooks/test-codex-hooks.sh` -> 15/15 PASS on merged tree.
- `git diff --check bda48c5..HEAD` clean; no conflict markers anywhere.
- Stale-reference sweep: grep for hook names across skills/research-project-setup/ returns no matches; bundled template/.codex/config.toml has no hook references, so no task-local Sync impact propagation is required.
- Sync target base decision logged in §Decisions; no other user-decision escalations were required since upstream is a strict superset for all overlapping content.
Sync Map cluster `codex-hooks-supersede` correctly explains the branch-level thesis (branch's older snapshot superseded by upstream's polished PR #27) and names the now-redundant branch commits 9eb21be, 740c876, 9bb7d8a, 2509ee9, 3b5de66, 6e67655. Flipping Sync review status to APPROVED. Integrate may proceed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (RESULTS.md doc-currency)
Address integration reviewer REVISE on Task 5:
1. Refresh RESULTS.md top-line stamp to record current state (Tasks 1-6 and 8 APPROVED; Task 5 manual steps superseded by Task 8 automated coverage; Task 7 deferred to post-PR) and flip Status to integration phase. Rewrite Task 5 Steps 3/4/5/7 in place as SUPERSEDED by Task 8 Tests A/B/C/D respectively (8/8 PASS + load-bearing negative control), removing the stale manual-reproduction recipes and keeping the automated outcomes as the active record.
2. Add Final Diff Self-Check section to PLAN.md per refactor-and-integrate, recording the governing range d861089..HEAD, the surviving net contribution against origin/main (skills/research-project-setup/**, README/CATEGORIES/using-superRA inventory rows, PLAN.md, RESULTS.md, .gitignore whitelist), and an attestation that no refactor was needed under the no-code-change posture; tests re-ran 8/8 PASS at HEAD after sync.
Flip Task 5 Integration status APPROVED and annotate both review items with implemented markers.
Re-review confirms both blockquote items from the previous REVISE round are fixed at commit aeebd89:
1. RESULTS.md:6 Last updated stamp + Status line refreshed to current integration state. No more "manual steps pending researcher action" wording.
2. RESULTS.md Task 5 Steps 3/4/5/7 rewritten in place as SUPERSEDED by Task 8 Tests A/B/C/D respectively (with assertions, profiles, and wall-time outcomes). No strikethroughs, no "Update:" framing, no parallel-doc patterns.
Supersession mapping verified consistent with Task 8 matrix: Step 3 ↔ Test A sandbox, Step 4 ↔ Test B fresh-setup, Step 5 ↔ Test C retrofit-Overleaf, Step 7 ↔ Test D discovery.
PLAN.md `## Final Diff Self-Check` section present with governing range (d861089..HEAD), surviving-net-contribution enumeration against origin/main, no-refactor attestation, and explicit .gitignore whitelist justification.
Task 5 Integration status APPROVED. All other tasks (1, 2, 3, 4, 6, 8) remain APPROVED and were not touched in this fix commit. Task 7 remains *(not started)* per the deferred-to-post-PR decision.
Removed the now-empty Task 5 review-notes blockquote per re-review etiquette (both items confirmed fixed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sync Map cluster was branch-scope context only; integration review APPROVED at 4de21bb with the surviving net contribution justified by PLAN.md §Final Diff Self-Check. Removing the temporary section per integration-workflow Step 5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User asked to create the PR first; Document step deferred until after PR is up. Handoff docs stay at worktree root for now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8d39dbeaa1
# Copy .env to Notes/ in the share folder
if [ -f "$SCRIPT_DIR/../template-share/Notes/.env" ]; then
    cp "$SCRIPT_DIR/../template-share/Notes/.env" "$SHARE_PATH/Notes/.env"
    sed -i '' "s/ProjectExample/$PROJECT_NAME/g" "$SHARE_PATH/Notes/.env"
Use portable sed in scaffolder replacements
Replace the BSD-only `sed -i ''` form with a cross-platform in-place edit strategy, because on GNU sed (typical Linux environments) this invocation is parsed incorrectly and exits non-zero, causing `create_project.sh` to abort before scaffolding completes. In practice this breaks fresh project setup for non-macOS users and for Linux-based automation/harness runs.
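One common way to make the substitution portable is to attach a backup suffix directly to `-i`, a form that both GNU and BSD sed accept, and then delete the backup. This is a sketch of one possible fix, not the change actually adopted in the PR; the helper name `replace_in_file` is hypothetical, and it assumes the pattern and replacement contain no `/` or other sed metacharacters.

```bash
#!/usr/bin/env bash
# Sketch: in-place substitution that works with both GNU and BSD sed.
# Usage: replace_in_file <pattern> <replacement> <file>
# Assumes neither argument contains "/" or other sed metacharacters.
replace_in_file() {
  local pattern=$1 replacement=$2 file=$3

  # "-i.bak" (suffix attached, no space) is accepted by both sed flavours;
  # the backup is removed only when the edit succeeded.
  sed -i.bak -e "s/$pattern/$replacement/g" "$file" && rm -f -- "$file.bak"
}
```

In the scaffolder, the flagged line would then read `replace_in_file ProjectExample "$PROJECT_NAME" "$SHARE_PATH/Notes/.env"`. Writing to a temporary file and moving it into place is an equally portable alternative.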
Summary
Adds a new top-level `research-project-setup` skill that owns interactive project scaffolding and feature retrofit for academic research projects. Bundles the canonical project skeleton (`template/` + `template-share/`), the `create_project.sh` scaffolder, six retrofit playbooks, default LaTeX templates (manuscript + slides + references.bib), and a complete automated CLI test suite that exercises both Claude Code and Codex headlessly. Replaces the standalone `ResearchProjectTemplate` repo (deprecation deferred to a follow-up after this lands).

What lands
- `skills/research-project-setup/SKILL.md`: interactive skill with trigger phrases for fresh-setup and six retrofit playbooks (superRA plugin, Codex agents, Overleaf sync, GitHub Actions CI, Figures/Tables restructure, decoupled `.share-path`).
- `skills/research-project-setup/scripts/create_project.sh`: scaffolder. New: registers the absolute share-folder path in `.claude/settings.local.json` (`additionalDirectories`) and `.codex/config.toml` (`[sandbox_workspace_write] writable_roots`) so agents can write into arbitrarily-located share folders without permission prompts.
- `skills/research-project-setup/template/` + `template-share/`: canonical two-folder project skeleton with `Data`/`Notes`/`Output` symlinks preserved. Includes bundled LaTeX templates (article-class manuscript with biblatex/biber/authoryear, beamer/metropolis slides, shared `references.bib`).
- `skills/research-project-setup/references/feature-catalog.md` + `retrofit-playbooks.md`: agent-facing reference docs.
- `skills/research-project-setup/tests/`: automated CLI test suite (4 scenarios × Claude + Codex = 8 cases) running on cheapest models (`claude-haiku-4-5-20251001`, `gpt-5.4-mini`). Includes a load-bearing negative-control script (`test_a_negative_control.sh`) that proves Test A actually exercises sandbox registration. Suite runs 8/8 PASS in ~3.5 min.
- `skills/CATEGORIES.md` (Utility row), `README.md` (intro feature + utility-skills row), `skills/using-superRA/SKILL.md` (Skill Inventory row).

Key design decisions (logged in PLAN.md §Decisions)
- … (`template/` + `template-share/`): visually documents the canonical design.
- … `Data/`/`Notes/`/`Output/`.
- … `$HOME/rps-tests/` (NOT `/tmp/` or `~/.cache/`, both of which are in default `writable_roots` and would make Test A vacuously pass).
- … `claude 2.1.147`, only `--permission-mode acceptEdits` honors `additionalDirectories` in headless mode (`default`/`auto`/`dontAsk` all record out-of-CWD writes as denials regardless of registration). Documented in `tests/lib/common.sh` `run_claude` comment block.

Test plan
- `bash skills/research-project-setup/scripts/create_project.sh /tmp/SmokeProj --share-path /tmp/SmokeShare --with-overleaf --with-ci` produces the expected structure with absolute share paths registered. (See PLAN.md Task 5 Steps 1–2 + RESULTS.md.)
- `bash skills/research-project-setup/tests/run_tests.sh` runs 8/8 PASS against the live Claude and Codex CLIs. (See PLAN.md Task 8 + RESULTS.md Task 8 for the matrix.)
- `bash skills/research-project-setup/tests/cases/test_a_negative_control.sh both` confirms Test A FAILs on both CLIs when `additionalDirectories` and the absolute `writable_roots` entries are surgically stripped. Proves the assertion is not vacuous.
- … (`## Final Diff Self-Check` in PLAN.md.)
- `python3 skills/codex-superra-setup/scripts/sync_codex_agents.py --scope project --check` still passes.

Follow-ups
- … `ResearchProjectTemplate` repo (separate repo, separate commit): README pointing at this superRA install + delete superseded files.
- … `PLAN.md` and `RESULTS.md` into permanent records under `docs/plans/2026-05-21-research-project-setup-{plan,results}.md` (precedent: the recently-merged Codex-hooks PR "[codex] add Codex plugin hooks" #27 used the same convention).

🤖 Generated with Claude Code