feat: add research-project-setup skill (interactive scaffolder + retrofit + CLI test suite) by FuZhiyu · Pull Request #28 · FuZhiyu/superRA

FuZhiyu · 2026-05-22T01:48:11Z

Summary

Adds a new top-level research-project-setup skill that owns interactive project scaffolding and feature retrofit for academic research projects. Bundles the canonical project skeleton (template/ + template-share/), the create_project.sh scaffolder, six retrofit playbooks, default LaTeX templates (manuscript + slides + references.bib), and a complete automated CLI test suite that exercises both Claude Code and Codex headlessly. Replaces the standalone ResearchProjectTemplate repo (deprecation deferred to a follow-up after this lands).

What lands

skills/research-project-setup/SKILL.md — interactive skill with trigger phrases for fresh-setup and six retrofit playbooks (superRA plugin, Codex agents, Overleaf sync, GitHub Actions CI, Figures/Tables restructure, decoupled .share-path).
skills/research-project-setup/scripts/create_project.sh — scaffolder. New: registers the absolute share-folder path in .claude/settings.local.json (additionalDirectories) and .codex/config.toml ([sandbox_workspace_write] writable_roots) so agents can write into arbitrarily-located share folders without permission prompts.
skills/research-project-setup/template/ + template-share/ — canonical two-folder project skeleton with Data/Notes/Output symlinks preserved. Includes bundled LaTeX templates (article-class manuscript with biblatex/biber/authoryear, beamer/metropolis slides, shared references.bib).
skills/research-project-setup/references/feature-catalog.md + retrofit-playbooks.md — agent-facing reference docs.
skills/research-project-setup/tests/ — automated CLI test suite (4 scenarios × Claude + Codex = 8 cases) running on cheapest models (claude-haiku-4-5-20251001, gpt-5.4-mini). Includes a load-bearing negative-control script (test_a_negative_control.sh) that proves Test A actually exercises sandbox registration. Suite runs 8/8 PASS in ~3.5 min.
Inventory wiring: skills/CATEGORIES.md (Utility row), README.md (intro feature + utility-skills row), skills/using-superRA/SKILL.md (Skill Inventory row).

Key design decisions (logged in PLAN.md §Decisions)

Two-folder structure preserved inside the skill (template/ + template-share/) — visually documents the canonical design.
Sandbox registration is a first-class responsibility of the scaffolder, not just a documentation note: without it, scaffolded projects with non-sibling share folders trigger permission prompts on every write into Data/, Notes/, or Output/.
LaTeX bundle included by default (not opt-in) — mirrors the IntermediaryDemand setup (12pt article + 1in margins + biblatex authoryear + booktabs/multirow/makecell/subcaption/hyperref/tikz/pgfplots + standard math operator macros).
CLI test suite runs against $HOME/rps-tests/ (NOT /tmp/ or ~/.cache/, both of which are in the default writable_roots and would make Test A pass vacuously).
Empirical Claude flag discovery: on claude 2.1.147, only --permission-mode acceptEdits honors additionalDirectories in headless mode (default/auto/dontAsk all record out-of-CWD writes as denials regardless of registration). Documented in tests/lib/common.sh run_claude comment block.

Test plan

Standalone scaffolder smoke test — bash skills/research-project-setup/scripts/create_project.sh /tmp/SmokeProj --share-path /tmp/SmokeShare --with-overleaf --with-ci produces the expected structure with absolute share paths registered. (See PLAN.md Task 5 Steps 1–2 + RESULTS.md.)
Automated CLI test suite — bash skills/research-project-setup/tests/run_tests.sh runs 8/8 PASS against the live Claude and Codex CLIs. (See PLAN.md Task 8 + RESULTS.md Task 8 for the matrix.)
Load-bearing negative control: bash skills/research-project-setup/tests/cases/test_a_negative_control.sh confirms that Test A FAILs on both CLIs when additionalDirectories and the absolute writable_roots entries are surgically stripped, which proves the assertion is not vacuous.
Post-sync regression check: same 8/8 PASS after merging upstream Codex-hooks PR #27 ([codex] add Codex plugin hooks). (See ## Final Diff Self-Check in PLAN.md.)
Codex named-agent sync regression — python3 skills/codex-superra-setup/scripts/sync_codex_agents.py --scope project --check still passes.
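The negative control in the test plan inverts the usual pass/fail sense: a failing write is the passing outcome. A minimal sketch of that inversion follows. The helper name `expect_fail` and the stand-in check `stripped_project_write_succeeds` are hypothetical and are not taken from test_a_negative_control.sh; only the two report strings come from the commit message that describes the script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a check and treat its failure as the passing outcome.
expect_fail() {
  local label="$1"; shift
  if "$@"; then
    echo "$label: UNEXPECTED PASS"
    return 1
  fi
  echo "$label: expected FAIL got FAIL ✓"
}

# Stand-in for "replay Test A against the stripped project". A real run would
# invoke the CLI and check whether the file landed in the share folder.
stripped_project_write_succeeds() {
  return 1  # with the registration stripped, the write should be denied
}
```

Calling `expect_fail claude stripped_project_write_succeeds` prints the expected-FAIL line and returns 0, while a check that unexpectedly succeeds makes the helper return 1 so the calling script can exit non-zero.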

Follow-ups

Deprecate the standalone ResearchProjectTemplate repo (separate repo, separate commit) — README pointing at this superRA install + delete superseded files.
Mature PLAN.md and RESULTS.md into permanent records under docs/plans/2026-05-21-research-project-setup-{plan,results}.md (precedent: the recently merged Codex-hooks PR #27, [codex] add Codex plugin hooks, used the same convention).

🤖 Generated with Claude Code

…o superRA Materialize the approved plan into committed handoff docs. Six tasks:
1. Scaffold skill dir + move template/, template-share/, scripts/
2. Update create_project.sh paths + register absolute share path with .claude/settings.local.json and .codex/config.toml
3. Author SKILL.md + feature-catalog + retrofit-playbooks references
4. Register the new skill in CATEGORIES.md, README.md, using-superRA SKILL.md
5. End-to-end verification (standalone script + agent fresh-setup + retrofit)
6. Deprecate the standalone ResearchProjectTemplate repo (separate repo, deferred to after superRA-side changes are verified)
Plan approved by researcher via ExitPlanMode on 2026-05-21. Five user decisions logged in PLAN.md §Decisions (template-file consolidation, deprecate-old-repo, retrofit catalog parity, two-folder preservation, share-path sandbox registration).

… share skeleton Move the canonical project skeleton from the standalone ResearchProjectTemplate repo into the new superRA skill, restructured to match the codex-superra-setup pattern (SKILL.md + scripts/ + references). Preserve the two-folder design by keeping template/ (Git skeleton, with internal symlinks retargeted at ../template-share/{Data,Notes,Output}) plus template-share/ side-by-side under the skill root. Fold the duplicate *-template.md files into single-source CLAUDE.md and README.md inside template/: drop the example-specific "For Codex Only" block from CLAUDE.md, adopt the generic template-style phrasing in README.md. Add a negation pair in the root .gitignore to whitelist skills/research-project-setup/template/.claude/ — without it the repo-level ".claude/" rule silently swallowed the bundled settings.json, agents, and sub-skills that scaffolded projects need to inherit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
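The interaction between the repo-level `.claude/` rule and the whitelist can be observed in a throwaway repository. The sketch below is illustrative only: the two negation lines are an assumed shape for the "negation pair", since the PR's actual .gitignore lines are not quoted here, and the file names inside the directories are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a throwaway repo to observe how the rules interact.
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Assumed rule shapes; the real .gitignore lines may differ.
printf '%s\n' \
  '.claude/' \
  '!skills/research-project-setup/template/.claude/' \
  '!skills/research-project-setup/template/.claude/**' > .gitignore

# One file under the repo-level .claude/ and one under the bundled template.
mkdir -p .claude skills/research-project-setup/template/.claude
echo '{}' > .claude/settings.local.json
echo '{}' > skills/research-project-setup/template/.claude/settings.json

# List untracked files that survive the ignore rules.
visible="$(git ls-files --others --exclude-standard)"
echo "$visible"
```

The template's settings.json shows up in the listing while the repo-level .claude/ contents stay hidden. The re-include works because none of the template directory's parents are themselves ignored.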

… registration Rewires create_project.sh to read from the in-skill template/ + template-share/ siblings (replacing the legacy $SCRIPT_DIR/ProjectExample/... and *-template.md paths). Adds a register_share_path_with_agents helper that registers the absolute share-folder path with Claude (.claude/settings.local.json additionalDirectories) and Codex (.codex/config.toml writable_roots), so agents can write into Data/Notes/Output regardless of share-folder location. The same helper is mirrored into template/setup_mac.sh so coauthor machines get the registration on first setup. Smoke-tested with --share-path /tmp/SmokeShare --with-overleaf --with-ci; idempotency verified (re-run adds no duplicates).
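The idempotency property mentioned above (a re-run adds no duplicates) can be sketched for the Codex side as follows. This is not the PR's register_share_path_with_agents helper: the function name `register_writable_root` is hypothetical, and the sketch assumes `writable_roots` is a non-empty array written on a single line and that the path contains no `&` or backslash characters. The Claude side would need a JSON-aware edit of additionalDirectories and is not shown.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: append an absolute path to a single-line, non-empty
# `writable_roots = [...]` array, doing nothing if it is already listed.
register_writable_root() {
  local config="$1" share_path="$2" tmp
  # Idempotency check: a quoted occurrence means the path is registered.
  if grep -Fq "\"$share_path\"" "$config"; then
    return 0
  fi
  tmp="$(mktemp)"
  # awk rewrites the first writable_roots line, turning the closing
  # bracket into `, "<path>"]`.
  awk -v p="$share_path" '
    /^writable_roots[ \t]*=/ && !done {
      sub(/\][ \t]*$/, ", \"" p "\"]")
      done = 1
    }
    { print }
  ' "$config" > "$tmp"
  cat "$tmp" > "$config"   # overwrite contents, keep the file's mode
  rm -f "$tmp"
}
```

Calling the helper twice with the same path leaves exactly one entry, which is the behaviour the smoke test above checks for.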

…ofit playbooks Task 3 of the research-project-setup move: author the three markdown surfaces for the new skill. SKILL.md follows the codex-superra-setup shape (lean body + on-demand references) and applies the /CLAUDE.md DRY + Necessity tests on every line. Six retrofit playbooks moved from the source skill with paths rewritten to $SKILL_DIR/template/...; the decoupled .share-path playbook is extended to also rewrite .claude/settings.local.json and .codex/config.toml so post-move sandbox writes work the same way as a fresh scaffold.

…pending Ran Steps 1 (standalone scaffolder + sandbox registration), 2 (opt-out flags), 6 (codex-superra-setup regression), 8 (cleanup) — all PASS. Steps 3, 4, 5, 7 require a fresh Claude Code/Codex session in a different directory; documented concrete reproduction commands in RESULTS.md for the researcher. Flagged one cosmetic Codex-toml layout quirk in the registration helper (Task 2 territory). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Verified Steps 1, 2, 6 by independent re-run; matched-line evidence in RESULTS.md is honest. Confirmed Codex config.toml parses cleanly despite cosmetic line-18 layout quirk. Steps 3, 4, 5, 7 remain researcher-gated as the implementer documented. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

….bib) User-initiated scope addition. Insert new Task 6 (LaTeX bundle) before the deprecation step, renumber the old Task 6 → Task 7. Mirrors the IntermediaryDemand preamble per researcher's pointer: 12pt article + biblatex (biber, authoryear) + booktabs/multirow/makecell/subcaption + tikz/pgfplots + math operator macros + colored author-note macros; slides use beamer + metropolis. references.bib at project root, shared across paper and slides. Decision logged in §Decisions. No prior task statuses invalidated: Tasks 1-5 APPROVED stay valid (new task is independent), Task 7 (deprecation) was Not started and remains so with updated Depends on.

…in scaffolded skeleton Adds Paper/manuscript.tex (article + biblatex + standard math/figure packages + theorem envs), Slides/slides.tex (beamer + metropolis), and shared references.bib at project root. Wires create_project.sh to copy them with the existing ProjectExample sed substitution. Both compile cleanly under pdfLaTeX + biber.

Replaces Task 5's deferred manual verification (Steps 3, 4, 5, 7) with a scripted suite that exercises both Claude Code CLI and Codex CLI headlessly against four scenarios using the cheapest model per CLI. Test A (sandbox write to share folder) uses a strict permission profile (no bypass) so it actually exercises register_share_path_with_agents; Tests B/C/D use a permissive bypass profile since they test skill routing, not permissions. A mandatory negative-control step breaks the registered share path and confirms Test A then FAILs, proving the assertion is load-bearing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… REVISE) Captures the earlier implementer's work-in-progress: tests/ directory layout, common.sh helpers, four case scripts, README, SKILL.md verification pointer. Marked Task 8 REVISE in PLAN.md because the earlier run used /tmp/ scratch dirs (which match codex default writable_roots ~/.cache, ~/.venvs, /tmp, /private/tmp, /var/folders, ~/.local/share/uv) and --permission-mode acceptEdits for Claude strict (auto-accepts edits), both of which make Test A vacuously pass. PLAN.md now mandates $HOME/rps-tests/ for all scratch dirs and NO permission-mode flag for Claude strict; negative-control step has tighter timeouts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Path discipline: all scratch dirs under $HOME/rps-tests/ (outside the template's default writable_roots, so register_share_path_with_agents must run for Test A to PASS). cleanup_paths guard tightened to refuse anything not under that subtree. Claude strict-profile flag: kept --permission-mode acceptEdits after an empirical sweep showed the dispatch's "no flag" premise does not hold on claude 2.1.147 -- default/auto/dontAsk all record headless writes to additionalDirectories paths as permission_denials regardless of the registration. acceptEdits is the only mode where writes inside the registered set succeed and writes outside still produce denials. Negative control (additionalDirectories=[]) reproduces FAIL under acceptEdits, proving the assertion is load-bearing. Codex strict-profile flags unchanged. CODEX_MODEL stays gpt-5.4-mini (codex doctor 0.132.0 still lists it as the cheapest mini-tier alias; gpt-5-mini not exposed on a ChatGPT-account install). 8/8 PASS at the corrected discipline; negative control: both Claude and Codex EXPECTED FAIL when the registration is stripped. Full matrix, wall-times, cost, and empirical sweep recorded in RESULTS.md Task 8. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
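A guard of the kind described for cleanup_paths could look roughly like this. It is a sketch under assumptions, not the code in tests/lib/common.sh: the variable `RPS_TEST_ROOT`, the function name `cleanup_path`, and the blanket rejection of `..` are all illustrative choices.

```shell
#!/usr/bin/env bash

# Assumed scratch root, following the $HOME/rps-tests/ convention.
RPS_TEST_ROOT="${RPS_TEST_ROOT:-${HOME:-/nonexistent}/rps-tests}"

# Hypothetical guard: delete a scratch dir only if it sits under RPS_TEST_ROOT.
cleanup_path() {
  local target="$1"
  case "$target" in
    *..*)
      # Reject anything that could climb out of the subtree.
      echo "refusing $target: contains '..'" >&2
      return 1 ;;
    "$RPS_TEST_ROOT"/?*)
      rm -rf -- "$target" ;;
    *)
      echo "refusing $target: not under $RPS_TEST_ROOT" >&2
      return 1 ;;
  esac
}
```

Because the root itself does not match `"$RPS_TEST_ROOT"/?*`, the guard also refuses to remove the scratch root in one call; only its children are eligible.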

…eout Addresses MAJOR (broken README recipe) + MINOR (perl-alarm fallback hang risk) review findings on Task 8.
- Add cases/test_a_negative_control.sh: standalone script that scaffolds, strips registration on both .claude/settings.local.json and .codex/config.toml, and replays Test A's prompt+assertions against the broken project. Reports "expected FAIL got FAIL ✓" per CLI; exits 1 on UNEXPECTED PASS.
- Replace with_timeout perl-alarm fallback with a setpgid + process-group-kill pure-bash watchdog: child runs in its own process group under set -m, watcher sends SIGTERM at <secs> then SIGKILL after 10s grace. Verified against a stubborn child trapping SIGTERM (killed at ~12s) and against a subprocess tree (no orphan sleeps).
- Rewrite tests/README.md §Negative-control to recommend the new script and drop the broken `test_a_sandbox.sh + pre-broken NegCtrl` recipe.
- Update PLAN.md Task 8 Step 8 to point at the new script, annotate the three review items with → implemented markers, flip Review status to IMPLEMENTED.
- Record the empirical Claude `acceptEdits` strict-profile correction as a methodology decision in PLAN.md §Decisions so future agents don't re-revert it back to the dispatch's "no flag" assumption.
- Update RESULTS.md Task 8 with the new 203s full-suite re-run timings, the new negative-control timings (claude=10s, codex=20s), and platform notes on the with_timeout hardening.
Verified live: full 8/8 PASS suite re-runs at parity; negative control produces expected-FAIL on both CLIs with 31s total wall-time.
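The watchdog pattern named in this commit (own process group under set -m, SIGTERM at the deadline, SIGKILL after a grace period) can be sketched in pure bash as below. This is an illustration under assumptions, not the with_timeout in tests/lib/common.sh: the `WITH_TIMEOUT_GRACE` override is hypothetical, and the sketch assumes the caller does not already rely on job control being enabled.

```shell
#!/usr/bin/env bash

# Sketch: run "$@" with a deadline, killing its whole process group on expiry.
with_timeout() {
  local secs="$1"; shift
  local grace="${WITH_TIMEOUT_GRACE:-10}"   # hypothetical override knob
  local child watcher status=0

  set -m   # job control: each background job becomes its own process group
  "$@" &
  child=$!
  (
    sleep "$secs"
    # A negative PID targets the whole group led by the child.
    kill -TERM -- "-$child" 2>/dev/null || exit 0
    sleep "$grace"
    kill -KILL -- "-$child" 2>/dev/null
  ) &
  watcher=$!
  set +m   # assumes job control was off before the call

  wait "$child" || status=$?
  # Stop the watcher and its pending sleep so nothing is orphaned.
  kill -TERM -- "-$watcher" 2>/dev/null || true
  wait "$watcher" 2>/dev/null || true
  return "$status"
}
```

For example, `with_timeout 1 sleep 30` returns 143 (128 + SIGTERM) after about one second, while a command that finishes before the deadline returns its own exit status. Bash may print a "Terminated" job notice on stderr when the watcher is stopped.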

All in-scope tasks APPROVED. Task 8's 8/8 PASS run + load-bearing negative control supersede Task 5's deferred manual steps. Task 7 (standalone repo deprecation) remains deferred to post-PR per original plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Researcher chose Option 1 (proceed with integration) from the Step 4 completion menu after Task 8 approval. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Skill-authoring work has no empirical results to lock; Task 8's CLI test suite covers every invariant Protect would protect (sandbox registration end-to-end via Claude + Codex, skill discoverability from trigger phrase, Overleaf retrofit playbook). 8/8 PASS at 25c5a83; doc-only commits since. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PRE_SYNC_BASE_SHA=08b68c85, BASE_HEAD_SHA=d861089. One incoming commit ([codex] add Codex plugin hooks #27) touches hooks/, package.json, using-superRA + agent-orchestration references, and adds tests/hooks/* — no overlap with skills/research-project-setup/ expected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Upstream PR #27 ships the polished version of the Codex lifecycle hook bundle (advisory reminder text, hardened ${CLAUDE_PLUGIN_ROOT} fallback, PATH-sparse bash discovery, stop_hook_active guard, anchored proposed-plan marker matching, expanded test coverage 7 -> 15). This branch had its own older snapshot of the same work, plus the disjoint research-project-setup skill. All conflicting hook files / tests / packaging / handoff plans resolved by taking upstream (strict superset, no intent change). README hook-table rows take upstream wording; research-project-setup rows preserved. PLAN.md Sync Map records the cluster. Sync verification: tests/hooks/test-codex-hooks.sh 15/15 PASS.

Backfill **Sync commits** field with e739f2c.

Walked semantic-merge §Semantic Coherence Checklist top to bottom; no [BLOCKING] findings. Independent verification:
- Anchors reproduce: PRE_SYNC_BASE_SHA=08b68c85, BASE_HEAD_SHA=d861089, merge-base(bda48c5,d861089)=08b68c85.
- All 7 "took upstream" files (hooks/ask-user-question-logger, hooks/codex-plan-stop, hooks/hooks-codex.json, .codex-plugin/plugin.json, docs/plans/2026-05-21-codex-hooks-{plan,results}.md, tests/hooks/test-codex-hooks.sh) verified identical to d861089 via `git diff d861089..e739f2c`.
- All 4 "auto-merged with upstream polish" files (hooks/exit-plan-mode, hooks/run-hook.cmd, skills/agent-orchestration/.../worktree-harness-fallback.md, skills/using-superRA/references/codex-instructions.md) also identical to d861089.
- README.md synthesis verified: upstream hook-table wording adopted; branch's intro item #4 and Utility-skills row for research-project-setup preserved.
- skills/research-project-setup/ tree shows 0 lines of diff between bda48c5 and HEAD; current-branch intent preserved unchanged.
- No premature codebase-coherence refactoring; sync diff stat matches expected file set only.
- `bash tests/hooks/test-codex-hooks.sh` -> 15/15 PASS on merged tree.
- `git diff --check bda48c5..HEAD` clean; no conflict markers anywhere.
- Stale-reference sweep: grep for hook names across skills/research-project-setup/ returns no matches; bundled template/.codex/config.toml has no hook references, so no task-local Sync impact propagation is required.
- Sync target base decision logged in §Decisions; no other user-decision escalations were required since upstream is a strict superset for all overlapping content.
Sync Map cluster `codex-hooks-supersede` correctly explains the branch-level thesis (branch's older snapshot superseded by upstream's polished PR #27) and names the now-redundant branch commits 9eb21be, 740c876, 9bb7d8a, 2509ee9, 3b5de66, 6e67655. Flipping Sync review status to APPROVED. Integrate may proceed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… (RESULTS.md doc-currency)

Address integration reviewer REVISE on Task 5:
1. Refresh RESULTS.md top-line stamp to record current state (Tasks 1-6 and 8 APPROVED; Task 5 manual steps superseded by Task 8 automated coverage; Task 7 deferred to post-PR) and flip Status to integration phase. Rewrite Task 5 Steps 3/4/5/7 in place as SUPERSEDED by Task 8 Tests A/B/C/D respectively (8/8 PASS + load-bearing negative control), removing the stale manual-reproduction recipes and keeping the automated outcomes as the active record.
2. Add Final Diff Self-Check section to PLAN.md per refactor-and-integrate, recording the governing range d861089..HEAD, the surviving net contribution against origin/main (skills/research-project-setup/**, README/CATEGORIES/using-superRA inventory rows, PLAN.md, RESULTS.md, .gitignore whitelist), and an attestation that no refactor was needed under the no-code-change posture; tests re-ran 8/8 PASS at HEAD after sync.
Flip Task 5 Integration status APPROVED and annotate both review items with implemented markers.

Re-review confirms both blockquote items from the previous REVISE round are fixed at commit aeebd89:
1. RESULTS.md:6 Last updated stamp + Status line refreshed to current integration state. No more "manual steps pending researcher action" wording.
2. RESULTS.md Task 5 Steps 3/4/5/7 rewritten in place as SUPERSEDED by Task 8 Tests A/B/C/D respectively (with assertions, profiles, and wall-time outcomes). No strikethroughs, no "Update:" framing, no parallel-doc patterns. Supersession mapping verified consistent with Task 8 matrix: Step 3 ↔ Test A sandbox, Step 4 ↔ Test B fresh-setup, Step 5 ↔ Test C retrofit-Overleaf, Step 7 ↔ Test D discovery.
PLAN.md ## Final Diff Self-Check section present with governing range (d861089..HEAD), surviving-net-contribution enumeration against origin/main, no-refactor attestation, and explicit .gitignore whitelist justification. Task 5 Integration status APPROVED. All other tasks (1, 2, 3, 4, 6, 8) remain APPROVED and were not touched in this fix commit. Task 7 remains *(not started)* per the deferred-to-post-PR decision. Removed the now-empty Task 5 review-notes blockquote per re-review etiquette (both items confirmed fixed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sync Map cluster was branch-scope context only; integration review APPROVED at 4de21bb with the surviving net contribution justified by PLAN.md §Final Diff Self-Check. Removing the temporary section per integration-workflow Step 5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

User asked to create the PR first; Document step deferred until after PR is up. Handoff docs stay at worktree root for now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d39dbeaa1


chatgpt-codex-connector · 2026-05-22T01:53:05Z

+# Copy .env to Notes/ in the share folder
+if [ -f "$SCRIPT_DIR/../template-share/Notes/.env" ]; then
+    cp "$SCRIPT_DIR/../template-share/Notes/.env" "$SHARE_PATH/Notes/.env"
+    sed -i '' "s/ProjectExample/$PROJECT_NAME/g" "$SHARE_PATH/Notes/.env"


Use portable sed in scaffolder replacements

Replace the BSD-only sed -i '' form with a cross-platform in-place edit strategy, because on GNU sed (typical Linux environments) this invocation is parsed incorrectly and exits non-zero, causing create_project.sh to abort before scaffolding completes. In practice this breaks fresh project setup for non-macOS users and for Linux-based automation/harness runs.
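One cross-platform strategy of the kind this comment asks for is to avoid `-i` altogether and write through a temporary file. The sketch below is an illustration, not the fix adopted in create_project.sh; the helper name `sed_inplace` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: apply a sed expression "in place" without -i,
# whose argument syntax differs between GNU and BSD sed.
sed_inplace() {
  local expr="$1" file="$2" tmp
  tmp="$(mktemp)"
  sed "$expr" "$file" > "$tmp"
  cat "$tmp" > "$file"   # overwrite contents, keeping the file's mode and owner
  rm -f "$tmp"
}
```

With this helper, the flagged line would become `sed_inplace "s/ProjectExample/$PROJECT_NAME/g" "$SHARE_PATH/Notes/.env"`. Another option that both GNU and BSD sed accept is an attached backup suffix, `sed -i.bak ...`, followed by removing the `.bak` file.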


FuZhiyu and others added 30 commits May 21, 2026 13:29

hooks: add codex lifecycle hooks

9eb21be

hooks: address codex hook review findings

740c876

integration: log base and finish decisions

43ed2be

integration: record codex hook sync evidence

46f242f

review: Task 1 integration revise

e07463c

integration: fix codex hook handoff scope

9bb7d8a

review: Task 1 integration approve

b77f940

docs: archive codex hook handoff records

2509ee9

docs: record codex hooks pr

3b5de66

review: Task 1 approve

eed3af8

review: Task 2 approve

417f425

review: Task 3 approve

569b9a7

task 4: register research-project-setup in inventory surfaces

d93bbce

review: Task 4 approve

10c4a32

review: Task 6 approve

a3d6e0b

decision: defer integration until Task 5 manual steps pass

7510ae0

review: Task 8 revise

74361e9

review: Task 8 approve

45e595d

FuZhiyu and others added 12 commits May 21, 2026 19:29

decision: log proceed-with-integration choice

f68e358


sync: record merge commit SHA in PLAN.md Sync Map

4f48878


review: post-sync integration — Tasks 1-4,6,8 APPROVED; Task 5 REVISE…

9beefd8

… (RESULTS.md doc-currency)

decision: open PR before doc maturation

8d39dbe


FuZhiyu wants to merge 42 commits into main from superRA-project-template
