feat: add research-project-setup skill (interactive scaffolder + retrofit + CLI test suite) #28

Open
FuZhiyu wants to merge 42 commits into main from superRA-project-template
FuZhiyu (Owner) commented May 22, 2026

Summary

Adds a new top-level research-project-setup skill that owns interactive project scaffolding and feature retrofit for academic research projects. Bundles the canonical project skeleton (template/ + template-share/), the create_project.sh scaffolder, six retrofit playbooks, default LaTeX templates (manuscript + slides + references.bib), and a complete automated CLI test suite that exercises both Claude Code and Codex headlessly. Replaces the standalone ResearchProjectTemplate repo (deprecation deferred to a follow-up after this lands).

What lands

  • skills/research-project-setup/SKILL.md — interactive skill with trigger phrases for fresh-setup and six retrofit playbooks (superRA plugin, Codex agents, Overleaf sync, GitHub Actions CI, Figures/Tables restructure, decoupled .share-path).
  • skills/research-project-setup/scripts/create_project.sh — scaffolder. New: registers the absolute share-folder path in .claude/settings.local.json (additionalDirectories) and .codex/config.toml ([sandbox_workspace_write] writable_roots) so agents can write into arbitrarily-located share folders without permission prompts.
  • skills/research-project-setup/template/ + template-share/ — canonical two-folder project skeleton with Data/Notes/Output symlinks preserved. Includes bundled LaTeX templates (article-class manuscript with biblatex/biber/authoryear, beamer/metropolis slides, shared references.bib).
  • skills/research-project-setup/references/feature-catalog.md + retrofit-playbooks.md — agent-facing reference docs.
  • skills/research-project-setup/tests/ — automated CLI test suite (4 scenarios × Claude + Codex = 8 cases) running on cheapest models (claude-haiku-4-5-20251001, gpt-5.4-mini). Includes a load-bearing negative-control script (test_a_negative_control.sh) that proves Test A actually exercises sandbox registration. Suite runs 8/8 PASS in ~3.5 min.
  • Inventory wiring: skills/CATEGORIES.md (Utility row), README.md (intro feature + utility-skills row), skills/using-superRA/SKILL.md (Skill Inventory row).

Key design decisions (logged in PLAN.md §Decisions)

  • Two-folder structure preserved inside the skill (template/ + template-share/) — visually documents the canonical design.
  • Sandbox registration is a first-class responsibility of the scaffolder, not just a documentation note. Without it, scaffolded projects with non-sibling share folders trigger permission prompts on every write into Data/, Notes/, or Output/.
  • LaTeX bundle included by default (not opt-in) — mirrors the IntermediaryDemand setup (12pt article + 1in margins + biblatex authoryear + booktabs/multirow/makecell/subcaption/hyperref/tikz/pgfplots + standard math operator macros).
  • CLI test suite runs against $HOME/rps-tests/ (NOT /tmp/ or ~/.cache/, both of which are in default writable_roots and would make Test A vacuously pass).
  • Empirical Claude flag discovery: on claude 2.1.147, only --permission-mode acceptEdits honors additionalDirectories in headless mode (default/auto/dontAsk all record out-of-CWD writes as denials regardless of registration). Documented in tests/lib/common.sh run_claude comment block.
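As a rough illustration of the sandbox-registration decision above, the Codex half can be sketched as an idempotent append to .codex/config.toml. This is a simplified sketch, not the scaffolder's actual register_share_path_with_agents helper: the function name register_codex_root is hypothetical, it assumes the share path contains no quotes or backslashes, and it refuses rather than merges when a [sandbox_workspace_write] table already exists. The Claude side (.claude/settings.local.json) would need a JSON-aware tool and is not shown.

```bash
#!/usr/bin/env bash
# Hypothetical, simplified sketch of the Codex half of sandbox registration.
# The real helper in create_project.sh may be implemented differently.
register_codex_root() {
    local project=$1 share=$2
    local cfg="$project/.codex/config.toml"
    mkdir -p "$project/.codex"
    touch "$cfg"

    # Idempotency: if the quoted absolute path is already present, do nothing.
    if grep -qF "\"$share\"" "$cfg"; then
        return 0
    fi
    # This sketch does not merge into an existing table; it refuses instead.
    if grep -q '^\[sandbox_workspace_write\]' "$cfg"; then
        echo "existing [sandbox_workspace_write] table in $cfg; merge by hand" >&2
        return 2
    fi
    printf '\n[sandbox_workspace_write]\nwritable_roots = ["%s"]\n' "$share" >> "$cfg"
}
```

Running it twice with the same arguments leaves a single writable_roots entry, which is the re-run property the smoke test checks for.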

Test plan

  • Standalone scaffolder smoke test: bash skills/research-project-setup/scripts/create_project.sh /tmp/SmokeProj --share-path /tmp/SmokeShare --with-overleaf --with-ci produces the expected structure with absolute share paths registered. (See PLAN.md Task 5 Steps 1–2 + RESULTS.md.)
  • Automated CLI test suite: bash skills/research-project-setup/tests/run_tests.sh runs 8/8 PASS against the live Claude and Codex CLIs. (See PLAN.md Task 8 + RESULTS.md Task 8 for the matrix.)
  • Load-bearing negative control: bash skills/research-project-setup/tests/cases/test_a_negative_control.sh both confirms Test A FAILs on both CLIs when additionalDirectories and the absolute writable_roots entries are surgically stripped. Proves the assertion is not vacuous.
  • Post-sync regression check: same 8/8 PASS after merging the upstream Codex-hooks PR #27 ([codex] add Codex plugin hooks). (See ## Final Diff Self-Check in PLAN.md.)
  • Codex named-agent sync regression: python3 skills/codex-superra-setup/scripts/sync_codex_agents.py --scope project --check still passes.
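The negative-control item in the test plan above relies on an inverted assertion: the run counts as a success only when the wrapped check fails. A minimal sketch of that pattern follows. The helper name expect_fail and the UNEXPECTED PASS wording are illustrative assumptions; only the "expected FAIL got FAIL ✓" message is taken from the commit log, and the real test_a_negative_control.sh may be structured differently.

```bash
#!/usr/bin/env bash
# Hypothetical negative-control wrapper: succeeds only when the wrapped
# command fails. Names are illustrative, not the test suite's own.
expect_fail() {
    local label=$1; shift
    if "$@"; then
        # The check passed even though the registration was stripped.
        echo "$label: expected FAIL got PASS (UNEXPECTED PASS)"
        return 1
    fi
    echo "$label: expected FAIL got FAIL ✓"
    return 0
}
```

A caller would pass the CLI label followed by the command that replays Test A against the deliberately broken project, and exit non-zero if expect_fail returns 1.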

Follow-ups

  • Deprecate the standalone ResearchProjectTemplate repo (separate repo, separate commit) — README pointing at this superRA install + delete superseded files.
  • Mature PLAN.md and RESULTS.md into permanent records under docs/plans/2026-05-21-research-project-setup-{plan,results}.md (precedent: the recently merged Codex-hooks PR #27, [codex] add Codex plugin hooks, used the same convention).

🤖 Generated with Claude Code

FuZhiyu and others added 30 commits May 21, 2026 13:29
…o superRA

Materialize the approved plan into committed handoff docs. Six tasks:
1. Scaffold skill dir + move template/, template-share/, scripts/
2. Update create_project.sh paths + register absolute share path with
   .claude/settings.local.json and .codex/config.toml
3. Author SKILL.md + feature-catalog + retrofit-playbooks references
4. Register the new skill in CATEGORIES.md, README.md, using-superRA SKILL.md
5. End-to-end verification (standalone script + agent fresh-setup + retrofit)
6. Deprecate the standalone ResearchProjectTemplate repo (separate repo,
   deferred to after superRA-side changes are verified)

Plan approved by researcher via ExitPlanMode on 2026-05-21. Five user
decisions logged in PLAN.md §Decisions (template-file consolidation,
deprecate-old-repo, retrofit catalog parity, two-folder preservation,
share-path sandbox registration).

… share skeleton

Move the canonical project skeleton from the standalone ResearchProjectTemplate
repo into the new superRA skill, restructured to match the codex-superra-setup
pattern (SKILL.md + scripts/ + references). Preserve the two-folder design by
keeping template/ (Git skeleton, with internal symlinks retargeted at
../template-share/{Data,Notes,Output}) plus template-share/ side-by-side under
the skill root.

Fold the duplicate *-template.md files into single-source CLAUDE.md and
README.md inside template/: drop the example-specific "For Codex Only" block
from CLAUDE.md, adopt the generic template-style phrasing in README.md.

Add a negation pair in the root .gitignore to whitelist
skills/research-project-setup/template/.claude/ — without it the repo-level
".claude/" rule silently swallowed the bundled settings.json, agents, and
sub-skills that scaffolded projects need to inherit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… registration

Rewires create_project.sh to read from the in-skill template/ + template-share/
siblings (replacing the legacy $SCRIPT_DIR/ProjectExample/... and *-template.md
paths). Adds a register_share_path_with_agents helper that registers the
absolute share-folder path with Claude (.claude/settings.local.json
additionalDirectories) and Codex (.codex/config.toml writable_roots), so
agents can write into Data/Notes/Output regardless of share-folder location.
The same helper is mirrored into template/setup_mac.sh so coauthor machines
get the registration on first setup. Smoke-tested with --share-path
/tmp/SmokeShare --with-overleaf --with-ci; idempotency verified (re-run
adds no duplicates).

…ofit playbooks

Task 3 of the research-project-setup move: author the three markdown surfaces
for the new skill. SKILL.md follows the codex-superra-setup shape (lean body +
on-demand references) and applies the /CLAUDE.md DRY + Necessity tests on every
line. Six retrofit playbooks moved from the source skill with paths rewritten
to $SKILL_DIR/template/...; the decoupled .share-path playbook is extended to
also rewrite .claude/settings.local.json and .codex/config.toml so post-move
sandbox writes work the same way as a fresh scaffold.

…pending

Ran Steps 1 (standalone scaffolder + sandbox registration), 2 (opt-out
flags), 6 (codex-superra-setup regression), 8 (cleanup) — all PASS.
Steps 3, 4, 5, 7 require a fresh Claude Code/Codex session in a
different directory; documented concrete reproduction commands in
RESULTS.md for the researcher. Flagged one cosmetic Codex-toml layout
quirk in the registration helper (Task 2 territory).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Verified Steps 1, 2, 6 by independent re-run; matched-line evidence
in RESULTS.md is honest. Confirmed Codex config.toml parses cleanly
despite cosmetic line-18 layout quirk. Steps 3, 4, 5, 7 remain
researcher-gated as the implementer documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

….bib)

User-initiated scope addition. Insert new Task 6 (LaTeX bundle) before
the deprecation step, renumber the old Task 6 → Task 7. Mirrors the
IntermediaryDemand preamble per researcher's pointer: 12pt article +
biblatex (biber, authoryear) + booktabs/multirow/makecell/subcaption
+ tikz/pgfplots + math operator macros + colored author-note macros;
slides use beamer + metropolis. references.bib at project root, shared
across paper and slides.

Decision logged in §Decisions. No prior task statuses invalidated:
Tasks 1-5 APPROVED stay valid (new task is independent), Task 7
(deprecation) was Not started and remains so with updated Depends on.

…in scaffolded skeleton

Adds Paper/manuscript.tex (article + biblatex + standard math/figure
packages + theorem envs), Slides/slides.tex (beamer + metropolis), and
shared references.bib at project root. Wires create_project.sh to copy
them with the existing ProjectExample sed substitution. Both compile
cleanly under pdfLaTeX + biber.
Replaces Task 5's deferred manual verification (Steps 3, 4, 5, 7) with a
scripted suite that exercises both Claude Code CLI and Codex CLI
headlessly against four scenarios using the cheapest model per CLI. Test
A (sandbox write to share folder) uses a strict permission profile (no
bypass) so it actually exercises register_share_path_with_agents; Tests
B/C/D use a permissive bypass profile since they test skill routing, not
permissions. A mandatory negative-control step breaks the registered
share path and confirms Test A then FAILs, proving the assertion is
load-bearing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… REVISE)

Captures earlier implementer's work-in-progress: tests/ directory layout,
common.sh helpers, four case scripts, README, SKILL.md verification
pointer. Marked Task 8 REVISE in PLAN.md because the earlier run used
/tmp/ scratch dirs (which match codex default writable_roots ~/.cache,
~/.venvs, /tmp, /private/tmp, /var/folders, ~/.local/share/uv) and
--permission-mode acceptEdits for Claude strict (auto-accepts edits),
both of which make Test A vacuously pass. PLAN.md now mandates
$HOME/rps-tests/ for all scratch dirs and NO permission-mode flag for
Claude strict; negative-control step has tighter timeouts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Path discipline: all scratch dirs under $HOME/rps-tests/ (outside the
template's default writable_roots, so register_share_path_with_agents
must run for Test A to PASS). cleanup_paths guard tightened to refuse
anything not under that subtree.
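
A guard of the kind described above might look like the following sketch. It is an assumption about shape only: the real cleanup_paths in tests/lib/common.sh may differ, SCRATCH_ROOT is a hypothetical variable name, and the sketch compares path strings without resolving symlinks.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a scratch-dir guard. SCRATCH_ROOT is an assumed
# variable name; the suite's own cleanup_paths may be written differently.
SCRATCH_ROOT="${SCRATCH_ROOT:-$HOME/rps-tests}"

cleanup_paths() {
    local p
    for p in "$@"; do
        case "$p" in
            *..*)
                # Reject traversal so "$SCRATCH_ROOT/../x" cannot escape.
                echo "refusing path with '..': $p" >&2
                return 1 ;;
            "$SCRATCH_ROOT"/?*)
                # Strictly below the scratch root: safe to delete.
                rm -rf -- "$p" ;;
            *)
                echo "refusing path outside $SCRATCH_ROOT: $p" >&2
                return 1 ;;
        esac
    done
}
```

The pattern requires at least one character after the root's trailing slash, so the scratch root itself is also refused.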

Claude strict-profile flag: kept --permission-mode acceptEdits after
an empirical sweep showed the dispatch's "no flag" premise does not
hold on claude 2.1.147 -- default/auto/dontAsk all record headless
writes to additionalDirectories paths as permission_denials regardless
of the registration. acceptEdits is the only mode where writes inside
the registered set succeed and writes outside still produce denials.
Negative control (additionalDirectories=[]) reproduces FAIL under
acceptEdits, proving the assertion is load-bearing.

Codex strict-profile flags unchanged. CODEX_MODEL stays gpt-5.4-mini
(codex doctor 0.132.0 still lists it as the cheapest mini-tier alias;
gpt-5-mini not exposed on a ChatGPT-account install).

8/8 PASS at the corrected discipline; negative control: both Claude
and Codex EXPECTED FAIL when the registration is stripped. Full
matrix, wall-times, cost, and empirical sweep recorded in
RESULTS.md Task 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eout

Addresses MAJOR (broken README recipe) + MINOR (perl-alarm fallback hang risk)
review findings on Task 8.

- Add cases/test_a_negative_control.sh — standalone script that scaffolds,
  strips registration on both .claude/settings.local.json and .codex/config.toml,
  and replays Test A's prompt+assertions against the broken project. Reports
  "expected FAIL got FAIL ✓" per CLI; exits 1 on UNEXPECTED PASS.
- Replace with_timeout perl-alarm fallback with a setpgid + process-group-kill
  pure-bash watchdog: child runs in its own process group under set -m,
  watcher sends SIGTERM at <secs> then SIGKILL after 10s grace. Verified
  against a stubborn child trapping SIGTERM (killed at ~12s) and against
  a subprocess tree (no orphan sleeps).
- Rewrite tests/README.md §Negative-control to recommend the new script
  and drop the broken `test_a_sandbox.sh + pre-broken NegCtrl` recipe.
- Update PLAN.md Task 8 Step 8 to point at the new script, annotate the
  three review items with → implemented markers, flip Review status to
  IMPLEMENTED.
- Record the empirical Claude `acceptEdits` strict-profile correction as
  a methodology decision in PLAN.md §Decisions so future agents don't
  re-revert it back to the dispatch's "no flag" assumption.
- Update RESULTS.md Task 8 with the new 203s full-suite re-run timings,
  the new negative-control timings (claude=10s, codex=20s), and
  platform notes on the with_timeout hardening.

Verified live: full 8/8 PASS suite re-runs at parity; negative control
produces expected-FAIL on both CLIs with 31s total wall-time.
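
The process-group watchdog described in the with_timeout item above can be sketched roughly as follows. This is not the suite's implementation: WT_GRACE is an assumed variable name, the function toggles job control off on exit regardless of the caller's prior setting, and it stops the watcher as soon as the direct child exits, so a descendant that ignores SIGTERM after its parent has already died would not receive the follow-up SIGKILL.

```bash
#!/usr/bin/env bash
# Sketch of a pure-bash process-group watchdog. WT_GRACE (seconds between
# SIGTERM and SIGKILL) is an assumed name, defaulting to 10.
with_timeout() {
    local secs=$1; shift
    local grace=${WT_GRACE:-10}
    local pid watcher rc

    set -m                      # job control: each background job gets its own process group
    "$@" &
    pid=$!
    (
        sleep "$secs"
        kill -TERM -- "-$pid" 2>/dev/null   # signal the whole child group
        sleep "$grace"
        kill -KILL -- "-$pid" 2>/dev/null   # escalate if anything survived
    ) >/dev/null 2>&1 &
    watcher=$!
    set +m

    wait "$pid" 2>/dev/null
    rc=$?
    kill -TERM -- "-$watcher" 2>/dev/null   # stop the watcher and its pending sleep
    wait "$watcher" 2>/dev/null
    return "$rc"
}
```

Signalling the negative PID targets the process group rather than a single process, which is how a subprocess tree is brought down without leaving orphaned children.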
FuZhiyu and others added 12 commits May 21, 2026 19:29
All in-scope tasks APPROVED. Task 8's 8/8 PASS run + load-bearing
negative control supersede Task 5's deferred manual steps. Task 7
(standalone repo deprecation) remains deferred to post-PR per original
plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Researcher chose Option 1 (proceed with integration) from the Step 4
completion menu after Task 8 approval.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Skill-authoring work has no empirical results to lock; Task 8's CLI test
suite covers every invariant Protect would protect (sandbox registration
end-to-end via Claude + Codex, skill discoverability from trigger phrase,
Overleaf retrofit playbook). 8/8 PASS at 25c5a83; doc-only commits since.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PRE_SYNC_BASE_SHA=08b68c85, BASE_HEAD_SHA=d861089. One incoming commit
([codex] add Codex plugin hooks #27) touches hooks/, package.json,
using-superRA + agent-orchestration references, and adds tests/hooks/* —
no overlap with skills/research-project-setup/ expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Upstream PR #27 ships the polished version of the Codex lifecycle hook
bundle (advisory reminder text, hardened ${CLAUDE_PLUGIN_ROOT} fallback,
PATH-sparse bash discovery, stop_hook_active guard, anchored proposed-plan
marker matching, expanded test coverage 7 -> 15). This branch had its own
older snapshot of the same work, plus the disjoint research-project-setup
skill. All conflicting hook files / tests / packaging / handoff plans
resolved by taking upstream (strict superset, no intent change). README
hook-table rows take upstream wording; research-project-setup rows
preserved. PLAN.md Sync Map records the cluster.

Sync verification: tests/hooks/test-codex-hooks.sh 15/15 PASS.
Backfill **Sync commits** field with e739f2c.
Walked semantic-merge §Semantic Coherence Checklist top to bottom; no
[BLOCKING] findings.

Independent verification:
- Anchors reproduce: PRE_SYNC_BASE_SHA=08b68c85, BASE_HEAD_SHA=d861089,
  merge-base(bda48c5,d861089)=08b68c85.
- All 7 "took upstream" files (hooks/ask-user-question-logger,
  hooks/codex-plan-stop, hooks/hooks-codex.json, .codex-plugin/plugin.json,
  docs/plans/2026-05-21-codex-hooks-{plan,results}.md,
  tests/hooks/test-codex-hooks.sh) verified identical to d861089 via
  `git diff d861089..e739f2c`.
- All 4 "auto-merged with upstream polish" files (hooks/exit-plan-mode,
  hooks/run-hook.cmd, skills/agent-orchestration/.../worktree-harness-fallback.md,
  skills/using-superRA/references/codex-instructions.md) also identical to d861089.
- README.md synthesis verified: upstream hook-table wording adopted;
  branch's intro item #4 and Utility-skills row for research-project-setup
  preserved.
- skills/research-project-setup/ tree shows 0 lines of diff between bda48c5
  and HEAD — current-branch intent preserved unchanged.
- No premature codebase-coherence refactoring; sync diff stat matches
  expected file set only.
- `bash tests/hooks/test-codex-hooks.sh` -> 15/15 PASS on merged tree.
- `git diff --check bda48c5..HEAD` clean; no conflict markers anywhere.
- Stale-reference sweep: grep for hook names across skills/research-project-setup/
  returns no matches; bundled template/.codex/config.toml has no hook references,
  so no task-local Sync impact propagation is required.
- Sync target base decision logged in §Decisions; no other user-decision
  escalations were required since upstream is a strict superset for all
  overlapping content.

Sync Map cluster `codex-hooks-supersede` correctly explains the branch-level
thesis (branch's older snapshot superseded by upstream's polished PR #27)
and names the now-redundant branch commits 9eb21be, 740c876, 9bb7d8a,
2509ee9, 3b5de66, 6e67655.

Flipping Sync review status to APPROVED. Integrate may proceed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address integration reviewer REVISE on Task 5:

1. Refresh RESULTS.md top-line stamp to record current state (Tasks 1-6
   and 8 APPROVED; Task 5 manual steps superseded by Task 8 automated
   coverage; Task 7 deferred to post-PR) and flip Status to integration
   phase. Rewrite Task 5 Steps 3/4/5/7 in place as SUPERSEDED by Task 8
   Tests A/B/C/D respectively (8/8 PASS + load-bearing negative control),
   removing the stale manual-reproduction recipes and keeping the
   automated outcomes as the active record.

2. Add Final Diff Self-Check section to PLAN.md per
   refactor-and-integrate, recording the governing range
   d861089..HEAD, the surviving net contribution against origin/main
   (skills/research-project-setup/**, README/CATEGORIES/using-superRA
   inventory rows, PLAN.md, RESULTS.md, .gitignore whitelist), and an
   attestation that no refactor was needed under the no-code-change
   posture; tests re-ran 8/8 PASS at HEAD after sync.

Flip Task 5 Integration status APPROVED and annotate both review items
with implemented markers.
Re-review confirms both blockquote items from the previous REVISE round
are fixed at commit aeebd89:

1. RESULTS.md:6 Last updated stamp + Status line refreshed to current
   integration state. No more "manual steps pending researcher action"
   wording.
2. RESULTS.md Task 5 Steps 3/4/5/7 rewritten in place as SUPERSEDED by
   Task 8 Tests A/B/C/D respectively (with assertions, profiles, and
   wall-time outcomes). No strikethroughs, no "Update:" framing,
   no parallel-doc patterns.

Supersession mapping verified consistent with Task 8 matrix:
Step 3 ↔ Test A sandbox, Step 4 ↔ Test B fresh-setup, Step 5 ↔ Test C
retrofit-Overleaf, Step 7 ↔ Test D discovery.

PLAN.md ## Final Diff Self-Check section present with governing range
(d861089..HEAD), surviving-net-contribution enumeration against
origin/main, no-refactor attestation, and explicit .gitignore
whitelist justification.

Task 5 Integration status APPROVED. All other tasks (1, 2, 3, 4, 6, 8)
remain APPROVED and were not touched in this fix commit. Task 7
remains *(not started)* per the deferred-to-post-PR decision.

Removed the now-empty Task 5 review-notes blockquote per
re-review etiquette (both items confirmed fixed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sync Map cluster was branch-scope context only; integration review
APPROVED at 4de21bb with the surviving net contribution justified by
PLAN.md §Final Diff Self-Check. Removing the temporary section per
integration-workflow Step 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

User asked to create the PR first; Document step deferred until after PR
is up. Handoff docs stay at worktree root for now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d39dbeaa1

# Copy .env to Notes/ in the share folder
if [ -f "$SCRIPT_DIR/../template-share/Notes/.env" ]; then
    cp "$SCRIPT_DIR/../template-share/Notes/.env" "$SHARE_PATH/Notes/.env"
    sed -i '' "s/ProjectExample/$PROJECT_NAME/g" "$SHARE_PATH/Notes/.env"

P1: Use portable sed in scaffolder replacements

Replace the BSD-only sed -i '' form with a cross-platform in-place edit strategy, because on GNU sed (typical Linux environments) this invocation is parsed incorrectly and exits non-zero, causing create_project.sh to abort before scaffolding completes. In practice this breaks fresh project setup for non-macOS users and for Linux-based automation/harness runs.
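
One common cross-platform strategy is to give sed -i an attached backup suffix, a form accepted by both GNU sed and BSD/macOS sed, and then delete the backup. The sketch below shows that approach under stated assumptions: replace_in_file is a hypothetical helper that is not part of create_project.sh, and it assumes neither the pattern nor the replacement contains a slash or other sed metacharacters.

```bash
#!/usr/bin/env bash
# Portable in-place substitution: "sed -i.bak" (suffix attached to -i) works
# on both GNU sed and BSD/macOS sed; the backup file is removed afterwards.
# replace_in_file is a hypothetical helper, not part of create_project.sh.
replace_in_file() {
    local pattern=$1 replacement=$2 file=$3
    sed -i.bak "s/${pattern}/${replacement}/g" "$file" && rm -f "${file}.bak"
}

# Usage mirroring the flagged line (variable names taken from the snippet above):
# replace_in_file "ProjectExample" "$PROJECT_NAME" "$SHARE_PATH/Notes/.env"
```

Writing to a temporary file and moving it into place would be an alternative that avoids -i entirely; the suffix form is shown here because it is the smallest change to the existing call.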
