An iterative MCP-based test harness for Claude Code plugins and MCP servers.
Plugin Test Harness (PTH) runs a structured test/fix/reload loop against a target plugin entirely inside Claude Code. It creates a git session branch and worktree, auto-discovers tool schemas from the live MCP server, generates test cases, executes them via Claude, and applies code fixes, iterating until convergence. Test suites and session artifacts persist across sessions in ~/.pth/PLUGIN_NAME/ so state is never lost when a session ends or a process restarts.
[P1] Act on Intent: Invoking a session tool is consent to its full scope. Session gates block only the narrow pre-session window; once a session is active, all tools operate without confirmation.
[P2] Scope Fidelity: Each tool does exactly one thing: generate, record, fix, reload, or inspect. No tool performs a superset of its stated scope without being asked.
[P3] Succeed Quietly, Fail Transparently: Normal iterations emit compact summaries. On critical failures (build error, git conflict, missing tool) PTH stops immediately and surfaces raw output alongside a recovery plan.
[P4] Use the Full Toolkit: The convergence trend (improving, plateaued, oscillating, declining) is a structured signal, not prose, enabling Claude to pick the right next action without guessing.
[P5] Convergence is the Contract: Sessions iterate toward zero failing tests without check-ins. PTH surfaces the trend and a recommended next action at each iteration boundary; the AI drives toward the goal.
[P6] Composable, Focused Units: Each MCP tool is independently callable. Orchestration is assembled at runtime by the AI agent, not baked into PTH itself.
- Node.js 20+
- Target plugin must be on the local filesystem inside a git repository
- For
mcpmode: target server must be startable via its.mcp.json(PTH spawns the server to discover tool schemas) - For
pluginmode: target plugin must have a.claude-plugin/directory
/plugin marketplace add L3DigitalNet/Claude-Code-Plugins
/plugin install plugin-test-harness@l3digitalnet-plugins
For local development:
claude --plugin-dir ./plugins/plugin-test-harnessPTH is distributed with a pre-built dist/ directory. If you are running from source or a clean checkout, build first:
cd ~/.claude/plugins/cache/l3digitalnet-plugins/plugin-test-harness/<version>/
npm install
npm run buildflowchart TD
User([User]) -->|"pth_preflight"| Preflight[Check path, git, build system, lock]
Preflight -->|"pth_start_session"| Start[Create branch + worktree<br/>Detect mode, load saved tests<br/>Run gap analysis]
Start --> Generate[pth_generate_tests<br/>Auto-discover schemas<br/>Upsert test suite]
Generate --> Run[Claude executes each test<br/>against live plugin tools]
Run -->|"pth_record_result × N"| Record[Record pass / fail / skipped<br/>Update session state]
Record -->|"pth_get_iteration_status"| Iter{Convergence<br/>trend?}
Iter -->|improving| Run
Iter -->|plateaued / oscillating / declining| Fix[pth_apply_fix<br/>Write + commit files]
Fix -->|"pth_reload_plugin"| Reload[Build → sync cache → kill process<br/>Claude Code restarts server]
Reload --> Run
Iter -->|all passing| End[pth_end_session<br/>Persist artifacts, remove worktree]
End --> Report((~/.pth/PLUGIN/<br/>SESSION-REPORT.md))
Call pth_preflight with the absolute path to your plugin.
Then call pth_start_session to create the session branch.
Call pth_generate_tests; schemas are auto-discovered.
Execute each test by calling the plugin's tools, then pth_record_result.
Call pth_get_iteration_status to checkpoint and see the trend.
Apply fixes with pth_apply_fix, reload with pth_reload_plugin, repeat.
End with pth_end_session.
If the Claude Code process restarts mid-session, use:
pth_resume_session branch: "pth/my-plugin-2026-02-18-abc123" pluginPath: "/path/to/plugin"
Session branches follow the pattern pth/<plugin>-<date>-<6-char-hex>. The branch remains in the git repository after the session ends, preserving the full fix history.
Each session runs in a git worktree at /tmp/pth-worktree-<branch-slug>. The main working tree is never modified during a session. Fix commits land on the session branch; pth_diff_session shows the cumulative diff against the branch point.
| Tool | Description |
|---|---|
pth_preflight |
Validate prerequisites: plugin path exists, git repo detected, plugin mode recognized, no active session lock, persistent store status. |
pth_start_session |
Create session branch and worktree, detect plugin mode, load saved tests from persistent store, run gap analysis vs last snapshot. |
pth_resume_session |
Re-attach to an existing session branch after a process restart. Loads tests from persistent store. |
pth_end_session |
Persist tests and artifacts to ~/.pth/, generate SESSION-REPORT.md, remove worktree, release session lock. |
pth_get_session_status |
Current iteration count, pass/fail totals, convergence trend, mode, and start time. |
| Tool | Description |
|---|---|
pth_generate_tests |
Auto-discover tool schemas by spawning the target MCP server, then generate test cases (valid input + missing-required-field variants). Upserts existing tests; skips tests with pinned: true (hand-edited via pth_edit_test). Accepts optional tools[] for gap-targeted generation. |
pth_list_tests |
List tests with optional filters: mode (mcp/plugin), status (passing/failing/pending), tag. |
pth_create_test |
Add a new test from a YAML definition. |
pth_edit_test |
Update an existing test by ID with a new YAML definition. Sets pinned: true on the test, protecting it from being overwritten by pth_generate_tests. |
pth_delete_test |
Remove a test from the suite by ID. Also clears its result history from the in-session results tracker. |
| Tool | Description |
|---|---|
pth_record_result |
Record a test outcome (passing/failing/skipped) after Claude executes the test. Accepts optional durationMs, failureReason, and claudeNotes. |
pth_get_results |
Full pass/fail listing for all tests with failure reasons. |
pth_get_test_impact |
Identify which tests exercise a given set of source files, used to target re-runs after a fix. |
pth_get_iteration_status |
Checkpoint the current iteration: snapshot pass/fail counts, advance the iteration counter, emit the convergence table and trend recommendation. |
| Tool | Description |
|---|---|
pth_apply_fix |
Write file changes and commit to the session branch with PTH git trailers (PTH-Test, PTH-Category, PTH-Iteration, PTH-Files). |
pth_sync_to_cache |
Sync worktree changes to the plugin's installed cache directory so hook script changes take effect immediately without a rebuild. |
pth_reload_plugin |
Build the plugin, sync the new binary to cache, then send SIGTERM to the MCP server process. Claude Code auto-restarts the server from the updated cache. |
pth_get_fix_history |
List all fix commits on the session branch with their PTH trailers. |
pth_revert_fix |
Undo a specific fix commit via git revert. Handles session-state.json conflicts automatically. |
pth_diff_session |
Show the cumulative diff of all changes on the session branch vs the branch point (truncated at 200 lines; use pth_get_fix_history for a structured summary on large diffs). |
| Mode | Detected when | How PTH tests |
|---|---|---|
mcp |
.mcp.json present at plugin root |
Spawns the MCP server via .mcp.json, calls tools/list to discover all tool schemas, generates test calls via Claude using the discovered tools |
plugin |
.claude-plugin/ directory present (no .mcp.json) |
Inspects commands/, skills/, agents/ directories; generates file-existence validation tests for hook scripts |
If both signals are present, mcp takes precedence.
Tests are defined in YAML and parsed into PthTest objects. The id field is auto-generated from the name via slugification if not explicitly provided.
id: pkg-info-valid-input # optional; slugified from name if omitted
name: pkg_info - valid input
mode: mcp
type: single
tool: pkg_info
input:
package: bash
expect:
success: true
output_contains: "Description"
timeout_seconds: 10
tags: [smoke]name: apply fix then inspect history
mode: mcp
type: scenario
steps:
- tool: pth_apply_fix
input:
files:
- path: src/stub.ts
content: "// stub\n"
commitTitle: "test: stub commit for scenario"
expect:
success: true
capture:
commitHash: "text:Fix committed: (\\w+)"
- tool: pth_get_fix_history
input: {}
expect:
success: true
output_contains: ${commitHash}
expect:
success: true
timeout_seconds: 30
generated_from: schemaname: write-guard.sh - script exists and is readable
mode: plugin
type: validate
checks:
- type: file-exists
files: [scripts/write-guard.sh]
expect: {}
generated_from: source_analysisname: build succeeds
mode: plugin
type: exec
command: npm run build
expect:
exit_code: 0
stdout_contains: "Build succeeded"
timeout_seconds: 60| Field | Type | Description |
|---|---|---|
success |
boolean | Tool call returns without error |
output_contains |
string | Response text contains substring |
output_equals |
string | Response text exact match |
output_matches |
string | Response text matches regex |
output_json |
any | Response parses as JSON equal to value |
output_json_contains |
any | Response JSON contains subset |
error_contains |
string | Error message contains substring |
exit_code |
number | Process exit code (exec type) |
stdout_contains |
string | stdout contains substring (exec type) |
stdout_matches |
string | stdout matches regex (exec type) |
Branch naming: pth/<plugin-name>-<YYYY-MM-DD>-<6-hex-chars>
Example: pth/linux-sysadmin-mcp-2026-02-18-a3f9c2
Worktrees are created at: /tmp/pth-worktree-<branch-slug>
The branch remains in the git repository after pth_end_session; you can browse fix history with:
git log pth/my-plugin-2026-02-18-a3f9c2 --oneline
git show pth/my-plugin-2026-02-18-a3f9c2:src/some-file.tsEach fix commit carries PTH git trailers:
fix: handle null group
PTH-Test: pkg_info_missing_required
PTH-Category: runtime-exception
PTH-Iteration: 3
PTH-Files: src/tools/pkg-info.ts
All session artifacts are written to ~/.pth/<plugin-name>/ at pth_end_session:
~/.pth/<plugin-name>/
├── index.json # metadata + session count
├── tests/ # authoritative test suite (loaded by pth_start_session)
├── results-history.json # per-test pass/fail history across all sessions
├── plugin-snapshot.json # tool schemas + component list for gap analysis
└── sessions/<branch>/
├── SESSION-REPORT.md # iteration table, fix list, final results
├── iteration-history.json
└── fix-history.json
Tests in ~/.pth/ are the authoritative source. When resuming or starting a new session, PTH loads the saved test suite; no tests are lost across process restarts.
pth_get_iteration_status evaluates the last four iteration snapshots and returns one of five trend values:
| Trend | Meaning | Recommended action |
|---|---|---|
unknown |
Fewer than two iterations recorded | Keep running tests |
improving |
Pass count rising over the window | Continue iterating |
plateaued |
Last 2–3 snapshots have identical pass counts | Try a different fix strategy |
oscillating |
Direction reverses 2+ times in the window | Use pth_get_test_impact to find the regressing fix |
declining |
Pass count fell from first to last snapshot | Use pth_revert_fix before continuing |
The convergence table rendered by pth_get_iteration_status:
| Iteration | Passing | Failing | Fixes |
|-----------|---------|---------|-------|
| 1 | 12 | 8 | 0 |
| 2 | 16 | 4 | 2 |
| 3 | 18 | 2 | 1 |
When a persistent store exists for a plugin, pth_start_session compares the saved plugin snapshot against the current source to surface structural changes without requiring a live server. It flags new tools (present in source but absent from the snapshot), modified tools (source files changed since capture), and removed tools. Tests whose target tool was removed are surfaced as stale test IDs.
Use pth_generate_tests with the tools[] parameter to generate tests only for new or modified components from the gap report.
- CI export: write
pth-results.jsonto a configurable output path for integration with CI pipelines - Doc patch generation: auto-propose README updates when generated tests reveal undocumented behavior
- Parallel test execution: run independent tests concurrently to reduce iteration time on large suites
pth_reload_pluginterminates the MCP server process by pattern-matching againstpsoutput. On systems where the process pattern cannot be found (non-standard Claude Code installation paths), passprocessPatternexplicitly.- Convergence trend requires at least two distinct iteration snapshots. A session with a single
pth_get_iteration_statuscall always returnsunknown. - Plugin mode (
hook-scriptandvalidatetests) currently generates only file-existence checks. Behavioral testing of hook scripts requiresexectype tests written manually viapth_create_test. - The
capturefield in scenario steps uses text regex extraction from the tool response string, not JSON path extraction. Complex output formats may require manual test authoring.
- Repository: L3DigitalNet/Claude-Code-Plugins
- Changelog: CHANGELOG.md
- Issues: GitHub Issues
- Design document: docs/PTH-DESIGN.md