Plugin Test Harness

An iterative MCP-based test harness for Claude Code plugins and MCP servers.

Summary

Plugin Test Harness (PTH) runs a structured test/fix/reload loop against a target plugin entirely inside Claude Code. It creates a git session branch and worktree, auto-discovers tool schemas from the live MCP server, generates test cases, executes them via Claude, and applies code fixes, iterating until convergence. Test suites and session artifacts persist across sessions in ~/.pth/PLUGIN_NAME/ so state is never lost when a session ends or a process restarts.

Principles

[P1] Act on Intent: Invoking a session tool is consent to its full scope. Session gates block only the narrow pre-session window; once a session is active, all tools operate without confirmation.

[P2] Scope Fidelity: Each tool does exactly one thing: generate, record, fix, reload, or inspect. No tool performs a superset of its stated scope without being asked.

[P3] Succeed Quietly, Fail Transparently: Normal iterations emit compact summaries. On critical failures (build error, git conflict, missing tool) PTH stops immediately and surfaces raw output alongside a recovery plan.

[P4] Use the Full Toolkit: The convergence trend (improving, plateaued, oscillating, declining) is a structured signal, not prose, enabling Claude to pick the right next action without guessing.

[P5] Convergence is the Contract: Sessions iterate toward zero failing tests without check-ins. PTH surfaces the trend and a recommended next action at each iteration boundary; the AI drives toward the goal.

[P6] Composable, Focused Units: Each MCP tool is independently callable. Orchestration is assembled at runtime by the AI agent, not baked into PTH itself.

Requirements

Node.js 20+
Target plugin must be on the local filesystem inside a git repository
For mcp mode: target server must be startable via its .mcp.json (PTH spawns the server to discover tool schemas)
For plugin mode: target plugin must have a .claude-plugin/ directory

Installation

/plugin marketplace add L3DigitalNet/Claude-Code-Plugins
/plugin install plugin-test-harness@l3digitalnet-plugins

For local development:

claude --plugin-dir ./plugins/plugin-test-harness

Post-Install Steps

PTH is distributed with a pre-built dist/ directory. If you are running from source or a clean checkout, build first:

cd ~/.claude/plugins/cache/l3digitalnet-plugins/plugin-test-harness/<version>/
npm install
npm run build

How It Works

flowchart TD
    User([User]) -->|"pth_preflight"| Preflight[Check path, git, build system, lock]
    Preflight -->|"pth_start_session"| Start[Create branch + worktree<br/>Detect mode, load saved tests<br/>Run gap analysis]
    Start --> Generate[pth_generate_tests<br/>Auto-discover schemas<br/>Upsert test suite]
    Generate --> Run[Claude executes each test<br/>against live plugin tools]
    Run -->|"pth_record_result × N"| Record[Record pass / fail / skipped<br/>Update session state]
    Record -->|"pth_get_iteration_status"| Iter{Convergence<br/>trend?}
    Iter -->|improving| Run
    Iter -->|plateaued / oscillating / declining| Fix[pth_apply_fix<br/>Write + commit files]
    Fix -->|"pth_reload_plugin"| Reload[Build → sync cache → kill process<br/>Claude Code restarts server]
    Reload --> Run
    Iter -->|all passing| End[pth_end_session<br/>Persist artifacts, remove worktree]
    End --> Report((~/.pth/PLUGIN/<br/>SESSION-REPORT.md))

Usage

Starting a session

Call pth_preflight with the absolute path to your plugin.
Then call pth_start_session to create the session branch.
Call pth_generate_tests; schemas are auto-discovered.
Execute each test by calling the plugin's tools, then pth_record_result.
Call pth_get_iteration_status to checkpoint and see the trend.
Apply fixes with pth_apply_fix, reload with pth_reload_plugin, repeat.
End with pth_end_session.

Resuming an interrupted session

If the Claude Code process restarts mid-session, use:

pth_resume_session  branch: "pth/my-plugin-2026-02-18-abc123"  pluginPath: "/path/to/plugin"

Session branches follow the pattern pth/<plugin>-<date>-<6-char-hex>. The branch remains in the git repository after the session ends, preserving the full fix history.

Session isolation

Each session runs in a git worktree at /tmp/pth-worktree-<branch-slug>. The main working tree is never modified during a session. Fix commits land on the session branch; pth_diff_session shows the cumulative diff against the branch point.

Tools

Session Management

Tool	Description
`pth_preflight`	Validate prerequisites: plugin path exists, git repo detected, plugin mode recognized, no active session lock, persistent store status.
`pth_start_session`	Create session branch and worktree, detect plugin mode, load saved tests from persistent store, run gap analysis vs last snapshot.
`pth_resume_session`	Re-attach to an existing session branch after a process restart. Loads tests from persistent store.
`pth_end_session`	Persist tests and artifacts to `~/.pth/`, generate SESSION-REPORT.md, remove worktree, release session lock.
`pth_get_session_status`	Current iteration count, pass/fail totals, convergence trend, mode, and start time.

Test Management

Tool	Description
`pth_generate_tests`	Auto-discover tool schemas by spawning the target MCP server, then generate test cases (valid input + missing-required-field variants). Upserts existing tests; skips tests with `pinned: true` (hand-edited via `pth_edit_test`). Accepts optional `tools[]` for gap-targeted generation.
`pth_list_tests`	List tests with optional filters: `mode` (mcp/plugin), `status` (passing/failing/pending), `tag`.
`pth_create_test`	Add a new test from a YAML definition.
`pth_edit_test`	Update an existing test by ID with a new YAML definition. Sets `pinned: true` on the test, protecting it from being overwritten by `pth_generate_tests`.
`pth_delete_test`	Remove a test from the suite by ID. Also clears its result history from the in-session results tracker.

Execution and Results

Tool	Description
`pth_record_result`	Record a test outcome (passing/failing/skipped) after Claude executes the test. Accepts optional `durationMs`, `failureReason`, and `claudeNotes`.
`pth_get_results`	Full pass/fail listing for all tests with failure reasons.
`pth_get_test_impact`	Identify which tests exercise a given set of source files, used to target re-runs after a fix.
`pth_get_iteration_status`	Checkpoint the current iteration: snapshot pass/fail counts, advance the iteration counter, emit the convergence table and trend recommendation.

Fix Management

Tool	Description
`pth_apply_fix`	Write file changes and commit to the session branch with PTH git trailers (`PTH-Test`, `PTH-Category`, `PTH-Iteration`, `PTH-Files`).
`pth_sync_to_cache`	Sync worktree changes to the plugin's installed cache directory so hook script changes take effect immediately without a rebuild.
`pth_reload_plugin`	Build the plugin, sync the new binary to cache, then send SIGTERM to the MCP server process. Claude Code auto-restarts the server from the updated cache.
`pth_get_fix_history`	List all fix commits on the session branch with their PTH trailers.
`pth_revert_fix`	Undo a specific fix commit via `git revert`. Handles `session-state.json` conflicts automatically.
`pth_diff_session`	Show the cumulative diff of all changes on the session branch vs the branch point (truncated at 200 lines; use `pth_get_fix_history` for a structured summary on large diffs).

Modes

Mode	Detected when	How PTH tests
`mcp`	`.mcp.json` present at plugin root	Spawns the MCP server via `.mcp.json`, calls `tools/list` to discover all tool schemas, generates test calls via Claude using the discovered tools
`plugin`	`.claude-plugin/` directory present (no `.mcp.json`)	Inspects `commands/`, `skills/`, `agents/` directories; generates file-existence validation tests for hook scripts

If both signals are present, mcp takes precedence.

Test YAML Format

Tests are defined in YAML and parsed into PthTest objects. The id field is auto-generated from the name via slugification if not explicitly provided.

Single MCP tool call

id: pkg-info-valid-input          # optional; slugified from name if omitted
name: pkg_info - valid input
mode: mcp
type: single
tool: pkg_info
input:
  package: bash
expect:
  success: true
  output_contains: "Description"
timeout_seconds: 10
tags: [smoke]

Multi-step scenario (captures output between steps)

name: apply fix then inspect history
mode: mcp
type: scenario
steps:
  - tool: pth_apply_fix
    input:
      files:
        - path: src/stub.ts
          content: "// stub\n"
      commitTitle: "test: stub commit for scenario"
    expect:
      success: true
    capture:
      commitHash: "text:Fix committed: (\\w+)"
  - tool: pth_get_fix_history
    input: {}
    expect:
      success: true
      output_contains: ${commitHash}
expect:
  success: true
timeout_seconds: 30
generated_from: schema

Plugin hook script validation

name: write-guard.sh - script exists and is readable
mode: plugin
type: validate
checks:
  - type: file-exists
    files: [scripts/write-guard.sh]
expect: {}
generated_from: source_analysis

Custom exec test

name: build succeeds
mode: plugin
type: exec
command: npm run build
expect:
  exit_code: 0
  stdout_contains: "Build succeeded"
timeout_seconds: 60

Available `expect` fields

Field	Type	Description
`success`	boolean	Tool call returns without error
`output_contains`	string	Response text contains substring
`output_equals`	string	Response text exact match
`output_matches`	string	Response text matches regex
`output_json`	any	Response parses as JSON equal to value
`output_json_contains`	any	Response JSON contains subset
`error_contains`	string	Error message contains substring
`exit_code`	number	Process exit code (exec type)
`stdout_contains`	string	stdout contains substring (exec type)
`stdout_matches`	string	stdout matches regex (exec type)

Session Branches

Branch naming: pth/<plugin-name>-<YYYY-MM-DD>-<6-hex-chars>

Example: pth/linux-sysadmin-mcp-2026-02-18-a3f9c2

Worktrees are created at: /tmp/pth-worktree-<branch-slug>

The branch remains in the git repository after pth_end_session; you can browse fix history with:

git log pth/my-plugin-2026-02-18-a3f9c2 --oneline
git show pth/my-plugin-2026-02-18-a3f9c2:src/some-file.ts

Each fix commit carries PTH git trailers:

fix: handle null group

PTH-Test: pkg_info_missing_required
PTH-Category: runtime-exception
PTH-Iteration: 3
PTH-Files: src/tools/pkg-info.ts

Persistent Store

All session artifacts are written to ~/.pth/<plugin-name>/ at pth_end_session:

~/.pth/<plugin-name>/
├── index.json                   # metadata + session count
├── tests/                       # authoritative test suite (loaded by pth_start_session)
├── results-history.json         # per-test pass/fail history across all sessions
├── plugin-snapshot.json         # tool schemas + component list for gap analysis
└── sessions/<branch>/
    ├── SESSION-REPORT.md        # iteration table, fix list, final results
    ├── iteration-history.json
    └── fix-history.json

Tests in ~/.pth/ are the authoritative source. When resuming or starting a new session, PTH loads the saved test suite; no tests are lost across process restarts.

Convergence

pth_get_iteration_status evaluates the last four iteration snapshots and returns one of five trend values:

Trend	Meaning	Recommended action
`unknown`	Fewer than two iterations recorded	Keep running tests
`improving`	Pass count rising over the window	Continue iterating
`plateaued`	Last 2–3 snapshots have identical pass counts	Try a different fix strategy
`oscillating`	Direction reverses 2+ times in the window	Use `pth_get_test_impact` to find the regressing fix
`declining`	Pass count fell from first to last snapshot	Use `pth_revert_fix` before continuing

The convergence table rendered by pth_get_iteration_status:

| Iteration | Passing | Failing | Fixes |
|-----------|---------|---------|-------|
| 1         | 12      | 8       | 0     |
| 2         | 16      | 4       | 2     |
| 3         | 18      | 2       | 1     |

Gap Analysis

When a persistent store exists for a plugin, pth_start_session compares the saved plugin snapshot against the current source to surface structural changes without requiring a live server. It flags new tools (present in source but absent from the snapshot), modified tools (source files changed since capture), and removed tools. Tests whose target tool was removed are surfaced as stale test IDs.

Use pth_generate_tests with the tools[] parameter to generate tests only for new or modified components from the gap report.

Planned Features

CI export: write pth-results.json to a configurable output path for integration with CI pipelines
Doc patch generation: auto-propose README updates when generated tests reveal undocumented behavior
Parallel test execution: run independent tests concurrently to reduce iteration time on large suites

Known Issues

pth_reload_plugin terminates the MCP server process by pattern-matching against ps output. On systems where the process pattern cannot be found (non-standard Claude Code installation paths), pass processPattern explicitly.
Convergence trend requires at least two distinct iteration snapshots. A session with a single pth_get_iteration_status call always returns unknown.
Plugin mode (hook-script and validate tests) currently generates only file-existence checks. Behavioral testing of hook scripts requires exec type tests written manually via pth_create_test.
The capture field in scenario steps uses text regex extraction from the tool response string, not JSON path extraction. Complex output formats may require manual test authoring.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Plugin Test Harness

Summary

Principles

Requirements

Installation

Post-Install Steps

How It Works

Usage

Starting a session

Resuming an interrupted session

Session isolation

Tools

Session Management

Test Management

Execution and Results

Fix Management

Modes

Test YAML Format

Single MCP tool call

Multi-step scenario (captures output between steps)

Plugin hook script validation

Custom exec test

Available `expect` fields

Session Branches

Persistent Store

Convergence

Gap Analysis

Planned Features

Known Issues

Links

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Plugin Test Harness

Summary

Principles

Requirements

Installation

Post-Install Steps

How It Works

Usage

Starting a session

Resuming an interrupted session

Session isolation

Tools

Session Management

Test Management

Execution and Results

Fix Management

Modes

Test YAML Format

Single MCP tool call

Multi-step scenario (captures output between steps)

Plugin hook script validation

Custom exec test

Available expect fields

Session Branches

Persistent Store

Convergence

Gap Analysis

Planned Features

Known Issues

Links

Available `expect` fields