Add end-to-end integration test harness against a real server (#54) by VijitSingh97 · Pull Request #114 · p2pool-starter-stack/pithead

VijitSingh97 · 2026-06-04T05:00:24Z

Partially addresses #54 (the harness, the full config matrix, and the lifecycle/edge slices; the `make release` wiring lands with #44).

What

A new tests/integration/ suite, the runtime/integration half of our testing. It drives a real, already-synced Pithead server through the config matrix and asserts that the stack behaves. Today everything is client-side or unit-level (the pithead shell tests stub docker/sudo, the compose test only checks interpolation, and the dashboard pytest suite mocks its clients); none of it proves that a real apply → sync-gate → mine → status flow works on a real host.

| File | Role |
|---|---|
| `tests/integration/run.sh` | Entry point: connects (SSH or `--local`), iterates the matrix, asserts, captures artifacts, restores |
| `tests/integration/scenarios.sh` | The declarative config matrix (adding a case is a one-line data edit) |
| `tests/integration/lib.sh` | Target I/O (SSH/local), assertions, readiness waiters, config rendering, secret redaction |
| `tests/integration/selftest.sh` | Pure-logic self-test: no server, runs in CI on every PR |

How it works

The box is assumed to be already deployed and synced, with miners connected; the whole point of a dedicated test box is that the full Monero and Tari nodes are synced once and reused. So the harness moves between scenarios with a non-interactive `pithead apply -y`, which recreates only the changed containers, reuses the synced chain data dirs (never re-syncs, never re-provisions Tor), and preserves secrets. It waits on real readiness signals (container health, `pithead status`, dashboard sync %, miner released) with timeouts, never fixed sleeps. All reads happen on the box (`pithead status`/`doctor`, `curl …/api/state`), so SSH and `--local` behave identically.
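The "readiness signals with timeouts, never fixed sleeps" pattern can be sketched as a polling helper. This is a minimal illustration, not the harness's lib.sh code: `wait_until` and its argument order are hypothetical, and the probe can be any command that exits 0 when ready (for example `pithead status` run on the box).

```bash
#!/usr/bin/env bash
# Hypothetical readiness waiter: re-run a probe until it succeeds or a deadline
# passes. Not the harness's actual helper; names and arguments are illustrative.
# Usage: wait_until <timeout_s> <interval_s> <description> <probe command...>
wait_until() {
  local timeout=$1 interval=$2 desc=$3
  shift 3
  local deadline=$(( SECONDS + timeout ))
  while true; do
    # Probe first, so a condition that is already true returns immediately.
    if "$@"; then
      echo "ready: $desc"
      return 0
    fi
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s: $desc" >&2
      return 1
    fi
    sleep "$interval"   # poll interval, not a fixed wait for the condition
  done
}

# Example (on the box): wait up to 10 minutes for a healthy status.
# wait_until 600 10 "pithead status healthy" pithead status
```

The timeout bounds the total wait rather than each attempt, so a fast box moves on as soon as the probe passes and a broken one fails with a named reason instead of hanging.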

It snapshots the box's original config.json up front and restores it at the end.
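A snapshot-and-restore around the run might look like this in bash, assuming the config is a plain file on the box. `snapshot_config`, `restore_config`, and `run_with_restore` are hypothetical names; the EXIT trap is what makes the restore happen on both success and failure.

```bash
#!/usr/bin/env bash
# Hypothetical snapshot/restore wrapper: copy a config file aside, run a
# command that may mutate it, and restore the original on exit whether the
# command succeeded or failed. Function names are illustrative only.
snapshot_config() {
  local config=$1 snapshot
  snapshot=$(mktemp) || return 1
  cp -p -- "$config" "$snapshot" || return 1
  printf '%s\n' "$snapshot"
}

restore_config() {
  local config=$1 snapshot=$2
  [[ -f $snapshot ]] || return 0        # nothing was snapshotted
  cp -p -- "$snapshot" "$config" && rm -f -- "$snapshot"
}

# Usage: run_with_restore <config-file> <command...>
# Returns the command's exit status; the EXIT trap in the subshell restores
# the config without changing that status.
run_with_restore() {
  local config=$1
  shift
  (
    snap=$(snapshot_config "$config") || exit 1
    trap 'restore_config "$config" "$snap"' EXIT
    "$@"
  )
}
```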

What each scenario asserts

- Expected containers up, unexpected ones absent (no monerod in remote mode)
- `pithead status` exits 0 for a healthy config
- The dashboard reads live state: Monero synced, pruned/full display (#32), sidechain `pool.type` matches `p2pool.pool`
- End-to-end mining: workers online (>= `--workers`, default 2), stratum connected, hashes accumulating (#28)
- Posture propagated to `.env` (RPC bind, `DASHBOARD_SECURE`, `XVB_ENABLED`, `TARI_REQUIRED`); the Caddy scheme matches `dashboard.secure`
- Idempotency: a second `apply -y` is a clean no-op
- Secrets preserved (proxy token and onions unchanged across every apply)
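Comparing secrets by hash can be sketched as follows, assuming the secrets are files readable on the box. `secret_fingerprint` and `assert_secrets_preserved` are hypothetical helpers; the point is that only `sha256sum` digests are ever printed, so when the fingerprint function runs on the box over SSH, only digests come back.

```bash
#!/usr/bin/env bash
# Hypothetical secret-preservation check: fingerprint secret files by hash so
# they can be compared before/after an apply without printing their contents.
# The file list is up to the caller (e.g. a proxy token file, onion hostnames).
secret_fingerprint() {
  local f
  for f in "$@"; do
    if [[ -r $f ]]; then
      sha256sum -- "$f"                  # "<digest>  <path>", never the secret
    else
      printf 'MISSING  %s\n' "$f"
    fi
  done
}

# Usage: assert_secrets_preserved "<before fingerprints>" "<after fingerprints>"
assert_secrets_preserved() {
  local before=$1 after=$2
  if [[ $before == "$after" ]]; then
    echo "secrets preserved"
  else
    echo "secret changed or missing:" >&2
    diff <(printf '%s\n' "$before") <(printf '%s\n' "$after") >&2
    return 1
  fi
}
```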

`--lifecycle` adds a restart, a sidechain-change apply (secrets preserved and the change reflected on the dashboard), and node-down failover (#31): stop monerod → status non-zero and workers rejected → start monerod → workers readmitted → status 0.

Matrix coverage

The matrix covers the realistic combinations and guarantees that every value of every axis (monero.mode, monero.prune, monero.rpc_lan_access, p2pool.pool main/mini/nano, xvb.enabled, dashboard.secure, dashboard.tari_required) is exercised at least once. The selftest enforces this, and `run.sh --list` prints the matrix.
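The axis-coverage guarantee that the selftest enforces can be expressed as a small bash check. The `AXES` table, its `true`/`false` values, and the `name key=value ...` scenario strings are assumptions made for this sketch; the real scenarios.sh format is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical axis-coverage check: fail if any value of any config axis is
# never exercised by some scenario. AXES, its values, and the scenario format
# are illustrative assumptions, not the real scenarios.sh layout.
declare -A AXES=(
  [monero.mode]="local remote"
  [monero.prune]="true false"
  [monero.rpc_lan_access]="true false"
  [p2pool.pool]="main mini nano"
  [xvb.enabled]="true false"
  [dashboard.secure]="true false"
  [dashboard.tari_required]="true false"
)

SCENARIOS=(
  "local-main-pruned monero.mode=local monero.prune=true monero.rpc_lan_access=false p2pool.pool=main xvb.enabled=false dashboard.secure=false dashboard.tari_required=false"
  "local-mini-full monero.mode=local monero.prune=false monero.rpc_lan_access=true p2pool.pool=mini xvb.enabled=true dashboard.secure=true dashboard.tari_required=true"
  "remote-nano monero.mode=remote p2pool.pool=nano xvb.enabled=false dashboard.secure=true dashboard.tari_required=false"
)

# Usage: check_axis_coverage <scenario>...   (non-zero if anything is uncovered)
check_axis_coverage() {
  local axis value s hit missing=0
  for axis in "${!AXES[@]}"; do
    # Word-splitting the value list is intentional (space-separated values).
    for value in ${AXES[$axis]}; do
      hit=0
      for s in "$@"; do
        # Pad with spaces so only a whole "axis=value" token matches.
        if [[ " $s " == *" $axis=$value "* ]]; then
          hit=1
          break
        fi
      done
      if (( hit == 0 )); then
        echo "uncovered: $axis=$value" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```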

Safety (the box holds real keys)

- Never mutates the canonical chains. The destructive prune axis (pruned and full are different DBs) runs only against a separate `--pruned-data-dir`/`--full-data-dir`; otherwise the case is SKIPPED with a reason. It is never silently dropped and never run against the canonical DB.
- Secrets hygiene: the proxy token, RPC creds, and onions are never printed. Preservation is checked by hashing on the box, so plaintext never crosses the wire, and all artifacts are redacted.
- Continue-on-error: the harness runs the whole matrix and prints a summary, with per-scenario artifacts for failures.
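The prune gate and continue-on-error behavior can be combined in one loop. This sketch assumes scenarios arrive as `name prune_mode` pairs and that the caller defines `run_scenario`; `gate_prune`, `run_matrix`, and the PASS/FAIL/SKIP lines are hypothetical, not run.sh's actual interface.

```bash
#!/usr/bin/env bash
# Hypothetical continue-on-error matrix runner with a prune safety gate.
# run_scenario is supplied by the caller; the entry format, gate rules, and
# result lines are illustrative assumptions, not run.sh's real interface.
PRUNED_DATA_DIR=${PRUNED_DATA_DIR:-}
FULL_DATA_DIR=${FULL_DATA_DIR:-}

# Allow a scenario only if it keeps the live prune mode, or if an isolated
# data dir exists for the mode it would switch to. Never touch the canonical DB.
gate_prune() {
  local want=$1 live=$2
  [[ $want == "$live" ]] && return 0
  [[ $want == true && -n $PRUNED_DATA_DIR ]] && return 0
  [[ $want == false && -n $FULL_DATA_DIR ]] && return 0
  return 1
}

# Usage: run_matrix <live_prune_mode> "<name> <prune_mode>"...
run_matrix() {
  local live=$1 entry name want
  local -a results=()
  shift
  for entry in "$@"; do
    read -r name want <<<"$entry"
    if ! gate_prune "$want" "$live"; then
      results+=("SKIP $name (prune flip needs a separate data dir)")
      continue   # recorded with a reason, never silently dropped
    fi
    if run_scenario "$name"; then
      results+=("PASS $name")
    else
      results+=("FAIL $name")   # keep going; collect the whole matrix
    fi
  done
  printf '%s\n' "${results[@]}"
  # Fail the run only if some scenario failed; skips alone do not fail it.
  ! printf '%s\n' "${results[@]}" | grep -q '^FAIL '
}
```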

Testing & wiring

- `tests/integration/selftest.sh`: 48 pure-logic checks (config rendering and typing, profile-gated expectations, redaction, axis coverage, SSH/local exec, JSON parsing). It caught a real bug during development: `jq_get` used `// empty`, and jq's `//` alternative operator treats `false` like `null`, so a boolean `false` was silently swallowed. That broke the full-vs-pruned axis. Fixed.
- `make test-integration` (live, manual; the blocking pre-release gate per #44) and `make test-integration-selftest`.
- CI runs the selftest and shellcheck on the new scripts on every PR (shell job). The live matrix stays a gated, manual, self-hosted job because it needs the real box.
- New docs/integration-testing.md (provisioning, safety model, matrix, artifacts, release wiring), plus updates to the docs index, the releasing.md status, and the CHANGELOG.
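Redaction of captured artifacts (one of the selftest's checks) could be done with a `sed -E` filter like the one below. The key patterns (`*_PASSWORD`, `*_SECRET`, `*_TOKEN`), the monerod `--rpc-login` argument, and v3 onion hostnames (56 base32 characters) are assumptions about what needs masking; this is not the harness's own redaction code.

```bash
#!/usr/bin/env bash
# Hypothetical artifact redaction filter (stdin -> stdout). The patterns are
# assumptions about what must be masked, not the harness's actual list:
#  - KEY=value where KEY ends in PASSWORD, SECRET or TOKEN
#  - the value after monerod's --rpc-login
#  - v3 onion hostnames (56 base32 characters + ".onion")
redact() {
  sed -E \
    -e 's/(^|[^A-Za-z0-9_])([A-Za-z0-9_]*(PASSWORD|SECRET|TOKEN))=[^[:space:]"]*/\1\2=<redacted>/g' \
    -e 's/(--rpc-login[= ])[^[:space:]"]+/\1<redacted>/g' \
    -e 's/[a-z2-7]{56}\.onion/<redacted>.onion/g'
}

# Example: redact < raw-status.log > artifacts/status.log
```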

Verification

- `shellcheck --severity=warning` clean across all stack and integration scripts
- `tests/integration/selftest.sh`: 48 passed
- Existing `tests/stack/run.sh`: 72 passed (no regressions)

Note: the live run.sh paths can't be exercised in CI (no real box). Pure logic is covered by the selftest; the live matrix is validated by running make test-integration against the test server.

🤖 Generated with Claude Code

Stand up the runtime/integration half of our testing: a suite that drives a REAL, already-synced Pithead server through the config matrix and asserts the stack behaves, not just the client-side/unit checks we have today.

The box is assumed already deployed and synced with miners connected, so the harness moves between scenarios with non-interactive `pithead apply -y` (recreates only changed containers, reuses the synced chain data dirs; never re-syncs, never re-provisions Tor, preserves secrets). It waits on real readiness signals (container health, `pithead status`, dashboard sync %, miner-released) with timeouts, never fixed sleeps, then asserts per scenario: expected containers up / unexpected absent (no monerod in remote mode), Monero synced + pruned/full display, sidechain selection, end-to-end mining (workers online, hashes accumulating), posture propagated to .env, status exit codes, apply idempotency, and secret preservation. `--lifecycle` adds restart, a pool-change apply, and node-down failover (#31).

tests/integration/:
- run.sh: entry point; SSH or --local, iterate matrix, assert, restore
- scenarios.sh: declarative config matrix (data, not code)
- lib.sh: target I/O, assertions, readiness waiters, config render, redaction
- selftest.sh: pure-logic self-test (no server); runs in CI on every PR

Safety: never mutates the canonical chains; the destructive prune axis only runs against a separate --pruned/--full-data-dir (else SKIPPED, never silent); secrets are hashed on-box for comparison and redacted from all artifacts; continue-on-error collects the whole matrix.

Wired as `make test-integration` (the blocking pre-release gate per #44) with the pure-logic selftest in CI. New docs/integration-testing.md plus index/releasing/changelog updates.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…-37494f # Conflicts: # CHANGELOG.md

…unit gaps (#54)

Simulate every runtime situation the live synced box can't show, at the cheapest honest tier. Documented in docs/testing-strategy.md with a full scenario catalog.

- Tier 1 (unit, every PR): backfill the genuine gaps the audit found: the required-Tari sync gate (monero synced but tari syncing → still held), the #35 one-way-latch × #31 failover interaction after release, and a simultaneous double outage. +3 dashboard tests; suite 381 green at 92.86% coverage.
- Tier 2 (contract, every PR, docker-free): controllable fake monerod (HTTP get_info) and fake Tari (gRPC BaseNode, via the vendored stubs) under tests/integration/fakes/, plus a contract test that points the REAL Monero/Tari clients at them and asserts they parse every state (synced/syncing/down). This is the verifiable proof that the fakes speak the daemons' wire format. make test-fakes.
- Tier 3 (mini-stack, CI with docker): tests/integration/mini-stack/ runs the REAL dashboard + docker-control/-proxy against the fakes with lightweight p2pool/xmrig-proxy containers, and a scenario runner asserting the control plane end-to-end, namely sync hold→release (#35) and node-down reject→readmit (#31), actually stopping/starting real containers. Its own workflow (needs a docker daemon); compose validated with `docker compose config`. make test-mini-stack.
- Tier 4 (live matrix): add a --fault-injection phase to run.sh that deliberately breaks monerod (stop / SIGSTOP / remove) and asserts pithead status' verdicts (down / unhealthy / missing) and the full failover→recovery cycle, plus the service_state helper and parsing covered by the self-test (53 green).

Enabler: UPDATE_INTERVAL is now env-configurable so the mini-stack loops fast in CI. CI wires the contract test into the dashboard job (every PR) and the docker mini-stack as its own paths-filtered workflow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ness (#54)

Generated coverage inventory and a round of scenario hardening across the tiers, plus a production-readiness posture (CI gating, engineering standards, flake policy, honest gaps).

- Inventory: tests/inventory.sh enumerates every test/scenario across all suites into docs/test-inventory.md (make test-inventory). A CI drift check (make test-inventory-check) fails if a test is added/removed without regenerating it, so "what's covered" can't silently rot.
- Tier 2 (contract): +5 daemon edge states proven against the real clients: monerod BUSY (HTTP 200, non-OK status → distrusted), synced-by-height without the flag, db_size unknown; Tari syncing-without-reliable-target (no false 100%), and cached-last-reading on a brief gRPC blip. 7 → 12 contract tests.
- Tier 3 (mini-stack): 4 → 8 deterministic scenarios. Added required-Tari hold (holds while only monerod is synced), required-Tari down → reject → readmit, and a dashboard-restart persistence check (the one-way latch is not re-held). New assert_stays helper proves the gate does NOT act prematurely.
- docs/testing-strategy.md: a Production-readiness section covering what's blocking on every PR vs. the release gate, the engineering standards (deterministic waits, isolation, artifact capture, secrets hygiene, reproducibility), the quarantine-not-retry flake policy, and the known gaps (first real-hardware green run, CLI breadth in automation, soak, load, security review).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
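The idea behind an `assert_stays` helper, the inverse of a readiness waiter, can be sketched as polling a condition for a whole window and failing the moment it stops holding. The signature below is an assumption; only the helper's name comes from the commit text.

```bash
#!/usr/bin/env bash
# Hypothetical "assert_stays"-style check: a condition must keep holding for a
# whole window, proving the gate does NOT act early. The signature is an
# assumption, not the real helper's.
# Usage: assert_stays <duration_s> <interval_s> <description> <command...>
assert_stays() {
  local duration=$1 interval=$2 desc=$3
  shift 3
  local deadline=$(( SECONDS + duration ))
  while (( SECONDS < deadline )); do
    if ! "$@"; then
      echo "FAIL: '$desc' stopped holding before ${duration}s elapsed" >&2
      return 1
    fi
    sleep "$interval"
  done
  # One last check at the end of the window.
  if ! "$@"; then
    echo "FAIL: '$desc' did not hold at the end of the window" >&2
    return 1
  fi
  echo "ok: '$desc' held for ${duration}s"
}

# Example: the miner must stay held while only monerod is synced.
# assert_stays 60 5 "miner still held" miner_is_held   # miner_is_held is hypothetical
```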

, #32) First green run of the live integration harness against a real synced, mining box (gouda): 22/22. That run did its job: it surfaced three over-strict harness assertions and one real product bug.

Harness calibration (tests/integration/):
- Monero "synced" now trusts monerod's own get_info `synchronized` flag (via a new monero_caught_up helper; creds stay on the box) instead of the dashboard's `.sync.monero.state == "done"`: a synced LOCAL node has no target height, so that UI field reads "loading", not "done".
- Mining liveness keys off proxy_workers; stratum.conns is now informational (a healthy mining box can report conns 0).
- Pruned/full assertion is version-robust: a local node's mode must be determinate (Pruned|Full), not coupled to config (which can differ from the node's reality).
- New non-destructive `--check` mode: assert the box's current live state with no config change / apply / restore. The safe first run and an ongoing health check. Read-only assertions refactored into assert_running_state, shared by --check and the per-scenario battery.

Product fix (build/dashboard, #32): the pruned/full label always showed Full on local nodes. config.py parsed MONERO_PRUNE with == "true", but pithead writes 1/0. It now accepts 1/true/yes/on, with unit tests. Dashboard suite 383 green @ 92.91%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
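The MONERO_PRUNE fix above lives in the dashboard's Python config.py; as a shell illustration of the same accepted value set, a tolerant boolean check might look like this. `is_truthy` is a hypothetical name, and case-insensitivity is an assumption of this sketch rather than something the commit states.

```bash
#!/usr/bin/env bash
# Shell illustration of tolerant boolean parsing: 1/true/yes/on count as true
# (case-insensitive here, an assumption). This mirrors the accepted value set
# only; it is not the dashboard's config.py code.
is_truthy() {
  case "${1,,}" in
    1|true|yes|on) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: is_truthy "${MONERO_PRUNE:-0}" && echo "Pruned" || echo "Full"
```

A strict `[[ $MONERO_PRUNE == true ]]` comparison reads pithead's `MONERO_PRUNE=1` as false, which is the always-Full symptom described above.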

Namespace the mini-stack's controllable containers (itest-p2pool / itest-xmrig-proxy) and move the fakes' host control ports to 28081/28152. Before this, the mini-stack used bare container_name p2pool/xmrig-proxy and published 18081 — which on a host already running the real stack collides with the production containers and monerod, and its docker-control would target the real p2pool/xmrig-proxy. Now it's fully isolated: unique names, own network, non-colliding ports — safe to run beside a live deployment (e.g. the test box). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The standalone fake Tari bound its gRPC server to 127.0.0.1, so the dashboard container couldn't reach it across the docker network — Tari read as unreachable, TARI_REQUIRED held the gate, and the miner never released in the mini-stack. Bind 0.0.0.0 in the container (loopback stays the default for the in-process contract test). Found by running the mini-stack on a real docker host. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…54) Got the fake-daemon mini-stack to a clean pass on a real Docker host (gouda). Two real bugs were surfaced and fixed in the process (fake_tari binding 127.0.0.1; container-name/port collisions). The monerod-down failover scenario is removed from the mini-stack: the dashboard's monerod down-path falls back to log-scraping a real monerod container, which the fake stack has no equivalent of, so it can't be simulated honestly here — it's covered on real hardware by the tier-4 --fault-injection phase. The mini-stack now covers sync hold/release (monero+Tari) and Tari reject/readmit, end-to-end with real containers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ug, dev guide

Audited coverage for INTENT (not line count) and filled the genuine gaps the audit surfaced, plus a developer testing guide.

Persistence (storage_service, 82% → 87%): added intent tests for the upgrade path and long-running behavior that a fresh-DB test never exercises: schema migration timestamp backfill, in-memory + DB history retention pruning, and worker retention pruning. The migration test caught a REAL bug: _create_tables built idx_ts on history(timestamp) before _migrate_db added that column, so opening a pre-timestamp DB threw "no such column: timestamp" and aborted the whole migration. Indexes are now created after migrations run.

Integration harness self-coverage: moved resolve_overrides (the prerequisite gate that protects the canonical synced chains from a destructive prune flip) into lib.sh so the self-test can exercise it, and added cases for every gate path (skip-without-dir, augment-with-dir, remote endpoint, compound), plus pool_label edges, scenario lookup negatives, redact *_PASSWORD/*_SECRET variants, and overrides_to_jq empties. Self-test 53 → 78.

Docs: new docs/testing-guide.md, a practical, recipe-oriented guide for future developers (where to write each kind of test, conventions, and the calibration gotchas learned on real hardware). Refreshed CONTRIBUTING's testing section and the docs index.

Dashboard suite 387 green @ 93.64%; selftest 78; stack 85; contract 12.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…-37494f # Conflicts: # CHANGELOG.md # docs/releasing.md

Audited the project's bug/security history (CHANGELOG, fix commits, closed issues, code "broke once" comments) against the suite. Most past bugs were already covered: #70 XvB overshoot (VipReserve clamp tests), #65 chart time gaps, #27 hashrate averages, #58 build provenance, the monerod get_info parser, the Tari overload cache. Two genuine gaps are now closed:

- Security (#90): a new tests/stack/test_security.sh renders the canonical compose config and asserts the hardening invariants so a future edit can't silently undo them: RPC credentials never appear in a healthcheck command (the docker-inspect leak fix), no-new-privileges + cap_drop on the leaf containers (and the documented dashboard exception), the read socket-proxy can't POST while the control proxy is start/stop-only (both socket-ro, read-only rootfs), and p2pool has its liveness probe. Proven to bite (it fails on a re-introduced leak / a dashboard cap_drop). Runs on every PR (compose job); wired into make test + the inventory.
- The dashboard.host "auto"-revert fix (247c5a0) had no test: added one asserting a configured host is used verbatim and that "auto" reverts HOST_IP to the machine hostname (not a stale value).

make test green: dashboard 426 @ 93.83%, stack 93, security 23, selftest 78, contract 12.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… backup/restore gap)

Manages the "destructive matrix on a precious box" gap: `run.sh --safety-backup` takes a real `pithead backup` before the destructive scenarios, captures the archive, and if anything fails rolls the box all the way back (down → restore → up) before restoring the baseline config; on success it just removes the archive. This doubles as end-to-end coverage for backup/restore (the #102 CLI breadth gap): the safety backup asserts the archive contains config.json/.env, and the --lifecycle phase adds an explicit backup → restore round-trip that diverges the pool, restores, and asserts the pool reverted and secrets survived.

Also: moving resolve_overrides to lib.sh earlier left a few globals that are read only across files, so this adds scoped shellcheck directives and removes a genuinely-dead `prune` local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
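The `--safety-backup` control flow reduces to: back up, run the destructive phase, then either discard the archive or roll back before restoring the baseline config. In this sketch every step is a caller-defined hook (`do_backup`, `run_destructive`, `do_down`, `do_restore`, `do_up`, `restore_baseline_config`); these are hypothetical names standing in for pithead's backup/restore and up/down, whose exact CLI arguments are not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the --safety-backup control flow. Every step is a
# hook the caller defines; none of these are pithead's real commands or flags.
with_safety_backup() {
  local archive rb=0
  # No backup, no destructive phase.
  if ! archive=$(do_backup); then
    echo "backup failed; skipping destructive scenarios" >&2
    return 1
  fi
  if run_destructive; then
    rm -f -- "$archive"          # success: the archive is simply discarded
    return 0
  fi
  echo "destructive phase failed; rolling the box back" >&2
  do_down && do_restore "$archive" && do_up || rb=$?
  restore_baseline_config        # the original config goes back last
  if (( rb != 0 )); then
    echo "rollback step failed (status $rb)" >&2
  fi
  echo "archive kept for inspection: $archive" >&2
  return 1
}
```

Keeping the archive after a failed run is a choice made in this sketch so the rollback can be inspected or retried; the commit only specifies that the archive is removed on success.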

… hardening tests

Rename: pin `name: pithead` in docker-compose.yml so the stack's images, network and volumes are `pithead*` regardless of the checkout directory name (which is what left older checkouts showing the repo's previous name, e.g. p2pool-starter-stack-tor). Because the service containers use global container_names, a stack still running under the old directory-derived project would clash on `up`; `pithead` now runs migrate_compose_project on up/apply/upgrade. It detects containers from a different compose project holding our names and removes them so the renamed project takes over (bind-mounted chain data is untouched). test_compose.sh asserts the project name is pinned.

Cleanup: I'd added tests/stack/test_security.sh without noticing that test_compose.sh already had the #90 hardening assertions, which was duplication. Removed it and folded only the genuinely-new checks into test_compose.sh's hardening section: per-service Docker socket-proxy scoping (read proxy can't POST, control proxy is start/stop-only, socket mounted read-only) and the Tari [m]inotari self-match guard. Reverted the separate make target / CI step / inventory section.

make test green: dashboard 426, stack 93, compose+hardening 15 checks, selftest 78, contract 12.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…f-hosted gate

Answers "can GitHub do the full end-to-end, or do we need the dedicated server?" and gets the server ready safely.

- run.sh --readiness: a non-destructive assessment of whether a box is fit to be a release/validation server: chains synced (reusable), chain FS snapshot/reflink-capable, disk headroom, .env owner-only, dashboard bound to localhost, backup net usable. Ran it against the real box: 6/6, one actionable finding (chains on ext4, so the prune axis must copy or be skipped; otherwise ready: 497 GiB free, .env 600, dashboard localhost-only).
- docs/release-server.md: the analysis + hardening guide. GitHub-hosted runners can't hold a 95–270 GiB synced chain or sync for days, so the real-daemon tier 4 is impossible there, but tiers 1–3 (unit, contract, the fake-daemon mini-stack with real Docker) already run free on every PR as the merge gate. The dedicated server is the tier-4 release gate. Plus the hardening checklist (the box holds wallet/onion/RPC keys) and the SAFE self-hosted-runner model.
- .github/workflows/release-gate.yml: runs the tier-4 matrix on a self-hosted runner only on trusted code (workflow_dispatch / push to main), deliberately never on pull_request, because a fork PR could run arbitrary code on a runner that holds real keys (GitHub's own guidance).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
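One of the `--readiness` checks, ".env owner-only", can be sketched as a permission-bit test. This assumes GNU coreutils `stat -c %a`; `is_owner_only` is a hypothetical name, not the harness's function.

```bash
#!/usr/bin/env bash
# Hypothetical readiness probe: is a secrets file (e.g. .env) owner-only, i.e.
# no group or other permission bits? Assumes GNU coreutils `stat -c %a`
# (on BSD/macOS the equivalent is `stat -f %Lp`).
# Returns 0 = owner-only, 1 = group/other bits set, 2 = cannot stat.
is_owner_only() {
  local f=$1 mode
  mode=$(stat -c %a -- "$f" 2>/dev/null) || return 2
  # Mask the group and other octal digits (077); both must be zero.
  (( (8#$mode & 8#077) == 0 ))
}
```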

From a full review of the branch diff:

- HIGH: migrate_compose_project had unguarded `var=$(... jq ...)` substitutions; pithead runs `set -Eeuo pipefail`, so a jq/docker hiccup would abort pithead mid up/apply/upgrade with the scary ERR-trap message. Guarded every substitution (verified it survives non-JSON from `compose config`).
- MED: the helper force-removed ANY container sharing one of our service names (tor/caddy/dashboard/…) from a different compose project, a footgun for anyone running an unrelated `caddy`/`tor` on the same host. Now it removes ONLY the containers from the exact project THIS directory used to derive, so an unrelated container is never touched. Fixed the misleading log message + the inaccurate "Tor keys in volumes" comment (they're bind-mounted), and documented the one-time Caddy cert re-issue.
- LOW: config.py UPDATE_INTERVAL now tolerates a malformed override instead of crashing the dashboard at import (with a test). release-gate.yml passes workflow inputs via env: instead of interpolating them into the run script.

make test green; stack suite 93; dashboard suite incl. the new config test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
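The HIGH and MED fixes above can be illustrated in bash under `set -Eeuo pipefail`. `safe_capture` and `legacy_project_containers` are hypothetical names. The second function reads `<container> <project>` pairs on stdin; on a real host such pairs could come from `docker ps --format '{{.Names}} {{.Label "com.docker.compose.project"}}'`, but that wiring is an assumption, not the PR's code.

```bash
#!/usr/bin/env bash
set -Eeuo pipefail
# Hypothetical sketch of the two fixes for a script running under
# `set -Eeuo pipefail`. Function names and the input format are illustrative.

# 1. Guarded capture: print the command's output on success, print nothing on
#    failure, and never trip errexit. An unguarded `var=$(cmd)` aborts the
#    script (and fires the ERR trap) as soon as cmd fails.
safe_capture() {
  local out
  if out=$("$@" 2>/dev/null); then
    printf '%s\n' "$out"
  fi
  return 0
}

# 2. Exact-project selection: read "<container> <compose-project>" lines and
#    print only containers owned by the given legacy project, so an unrelated
#    container that merely shares a name (e.g. someone else's caddy) is kept.
legacy_project_containers() {
  local legacy=$1 name project
  while read -r name project; do
    if [[ -n $name && $project == "$legacy" ]]; then
      printf '%s\n' "$name"
    fi
  done
}
```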

…-37494f # Conflicts: # tests/stack/run.sh # tests/stack/test_compose.sh

Make the live test box (gouda) a proper dev+agent platform and capture the lessons from setting it up.

- run.sh --readiness: prune-mode-aware. Reports the live chain's prune mode, recognizes a snapshot-isolated same-mode copy on a CoW FS, and honestly flags when the opposite mode will be skipped (covered by the fakes). Reads MONERO_PRUNE directly so it works in standalone --readiness.
- tests/integration/{build-pruned-chain,compact-chain}.sh: chain ops. Build a pruned chain next to a full one; reclaim LMDB free-page bloat from an already-pruned chain via `monero-blockchain-prune --copy-pruned-database` (concurrent-safe, no downtime until the swap).
- tests/integration/system-info.sh + gouda-testbench-README.md: a re-runnable system snapshot and a build-server runbook for developers and AI agents (golden rules, layout, how to run the stack + harness, agent gotchas).
- docs/release-server.md, integration-testing.md: reconcile with the storage-speed lesson (active chains belong on SSD/NVMe; CoW is a bonus), the compaction gotcha (stock mdb_copy can't read Monero's patched LMDB), and an end-to-end coverage/gaps table with before-release priorities.
- docs/test-inventory.md: regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- docs/test-server-architecture.md: high-level structure (SSD-primary storage policy, sizing table, directory layout) + a "recreate on another box" runbook (clone -> config -> rsync the chains -> deploy) so migrating to a bigger box is a transfer + redeploy, no re-sync.
- gouda-testbench-README.md: reframe as a dev/agent test platform (not a production miner; downtime/teardown are fine); update paths to ~/pithead and the decoupled chains at /srv/code/pithead/data.
- system-info.sh: read chain locations from .env (decoupled from the checkout); default STACK_DIR to ~/pithead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ins (/srv/code/pithead-data) Reflect the clean re-homed layout in the build-server docs: checkout on the NVMe at /srv/code/pithead (= ~/code/pithead), chains decoupled at /srv/code/pithead-data, harness --dir code/pithead, system-info STACK_DIR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Benchmarks on the reference box show the '1TB SSD' is a SATA drive at ~37-98 MB/s (no /dev/nvme present). That bottlenecks monerod/builds and makes LMDB compaction impractical (~16h vs ~10min on NVMe). Document the finding, how to verify a disk is genuinely fast, and that a real m.2 PCIe NVMe is the #1 upgrade. The matching mdb_copy (LMDB 0.9.70) is staged for when a fast disk lands. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

VijitSingh97 and others added 21 commits June 3, 2026 23:59

Merge remote-tracking branch 'origin/main' into claude/magical-spence…

16fd634

…-37494f # Conflicts: # CHANGELOG.md

docs: record tier-3 mini-stack green run on real hardware (#54)

328e620

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into claude/magical-spence…

2ef67de

…-37494f # Conflicts: # CHANGELOG.md # docs/releasing.md

Merge remote-tracking branch 'origin/main' into claude/magical-spence…

61ff27b

…-37494f # Conflicts: # tests/stack/run.sh # tests/stack/test_compose.sh


VijitSingh97 wants to merge 21 commits into main from claude/magical-spence-37494f
