Add end-to-end integration test harness against a real server (#54)#114

Open
VijitSingh97 wants to merge 21 commits into main from claude/magical-spence-37494f

@VijitSingh97
Collaborator

Closes part of #54 (the harness + full config matrix + lifecycle/edge slices; the make release wiring lands with #44).

What

A new tests/integration/ suite — the runtime/integration half of our testing. It drives a real, already-synced Pithead server through the config matrix and asserts the stack behaves. Today everything is client-side/unit (the pithead shell tests stub docker/sudo, the compose test only checks interpolation, the dashboard pytest mocks its clients) — none prove a real apply → sync-gate → mine → status flow works on a real host.

| File | Role |
| --- | --- |
| tests/integration/run.sh | Entry point: connects (SSH or --local), iterates the matrix, asserts, captures artifacts, restores |
| tests/integration/scenarios.sh | The declarative config matrix (adding a case is a one-line data edit) |
| tests/integration/lib.sh | Target I/O (SSH/local), assertions, readiness waiters, config rendering, secret redaction |
| tests/integration/selftest.sh | Pure-logic self-test: no server, runs in CI on every PR |

How it works

The box is assumed already deployed and synced with miners connected — the whole point of a dedicated test box is that the full Monero + Tari nodes are synced once and reused. So the harness moves between scenarios with non-interactive pithead apply -y: recreates only changed containers, reuses the synced chain data dirs (never re-syncs / re-provisions Tor), preserves secrets. It waits on real readiness signals (container health, pithead status, dashboard sync %, miner-released) with timeouts — never fixed sleeps. All reads happen on-box (pithead status/doctor, curl …/api/state), so SSH and --local behave identically.
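
The "readiness signals with timeouts, never fixed sleeps" idea can be sketched as a small polling helper. This is only an illustration: `wait_until` and its arguments are hypothetical names, not the waiters that actually live in lib.sh.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not the harness's real API): poll a command until it
# succeeds or a deadline passes.
# usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  while true; do
    if "$@"; then
      return 0            # condition met: stop waiting immediately
    fi
    if (( SECONDS >= deadline )); then
      return 1            # timed out: the caller decides how to report it
    fi
    sleep "$interval"
  done
}
```

Under these assumptions a call such as `wait_until 300 5 pithead status` would return as soon as the status check exits 0, and fail only after the full timeout, so a fast box is never slowed down by a worst-case sleep.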

It snapshots the box's original config.json up front and restores it at the end.

What each scenario asserts

  • Expected containers up, unexpected absent (no monerod in remote mode)
  • pithead status exit code is 0 for a healthy config
  • Dashboard reads live state: Monero synced, pruned/full display (Show pruned vs full status in the Monero panel #32), sidechain pool.type matches p2pool.pool
  • End-to-end mining: workers online (>= --workers, default 2), stratum connected, hashes accumulating (Hashrate reflects zero and chart isn't painting #28)
  • Posture propagated to .env (RPC bind, DASHBOARD_SECURE, XVB_ENABLED, TARI_REQUIRED); Caddy scheme matches dashboard.secure
  • Idempotency: a second apply -y is a clean no-op
  • Secrets preserved (proxy token + onions unchanged across every apply)

--lifecycle adds: restart, a sidechain-change apply (secrets preserved + dashboard reflects it), and node-down failover (#31): stop monerod → status non-zero + workers rejected → start → readmitted → status 0.

Matrix coverage

Covers the realistic combinations and guarantees every value of every axis (monero.mode, monero.prune, monero.rpc_lan_access, p2pool.pool main/mini/nano, xvb.enabled, dashboard.secure, dashboard.tari_required) is exercised at least once — the selftest enforces this; run.sh --list prints it.
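
A coverage check of this kind might look like the sketch below. The scenario line format, the scenario names, and the `missing_axis_values` function are all assumptions made for illustration; the real scenarios.sh and selftest.sh may be organized differently, and only three of the axes are shown.

```shell
#!/usr/bin/env bash

# Illustrative data only: the real scenarios.sh format may differ.
# Each line is "<name>|<key=value>,<key=value>,...".
SCENARIOS=(
  "local-mini|monero.mode=local,p2pool.pool=mini,xvb.enabled=false"
  "remote-main|monero.mode=remote,p2pool.pool=main,xvb.enabled=true"
  "local-nano|monero.mode=local,p2pool.pool=nano,xvb.enabled=false"
)

# Every axis value that must be exercised at least once (a subset of the axes).
REQUIRED=(
  "monero.mode=local" "monero.mode=remote"
  "p2pool.pool=main" "p2pool.pool=mini" "p2pool.pool=nano"
  "xvb.enabled=true" "xvb.enabled=false"
)

# Print every required axis value that no scenario exercises.
# Returns 1 if anything is missing, 0 otherwise.
missing_axis_values() {
  local want line missing=0
  for want in "${REQUIRED[@]}"; do
    local found=0
    for line in "${SCENARIOS[@]}"; do
      # Wrap the override list in commas so "a=b" only matches a whole pair.
      case ",${line#*|}," in
        *",${want},"*) found=1; break ;;
      esac
    done
    if (( ! found )); then
      printf '%s\n' "$want"
      missing=1
    fi
  done
  return "$missing"
}
```

Keeping the matrix as plain data is what makes this kind of check cheap: dropping the only `nano` scenario would make `missing_axis_values` print `p2pool.pool=nano` and return non-zero.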

Safety (the box holds real keys)

  • Never mutates the canonical chains. The destructive prune axis (pruned vs full are different DBs) only runs against a separate --pruned-data-dir/--full-data-dir; otherwise the case is SKIPPED with a reason — never silently dropped, never run against the canonical DB.
  • Secrets hygiene — proxy token / RPC creds / onions are never printed; preservation is checked by hashing on the box (plaintext never crosses the wire); all artifacts are redacted.
  • Continue-on-error — collects the whole matrix and summarizes, with per-scenario artifacts for failures.
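
As a rough illustration of the secrets-hygiene points above, the two helpers below compare secrets by digest and mask sensitive values in captured text. The function names, the `KEY=value` layout, and the key patterns are assumptions, and `sha256sum` is assumed to be available (GNU coreutils).

```shell
#!/usr/bin/env bash

# Hypothetical helpers; names and the KEY=value layout are assumptions.

# Fingerprint a secret so two snapshots can be compared without ever
# printing the value itself.
secret_fingerprint() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Mask the value of any KEY=value line whose key looks sensitive.
redact() {
  sed -E 's/^([A-Za-z0-9_]*(TOKEN|PASSWORD|SECRET)[A-Za-z0-9_]*)=.*/\1=<redacted>/'
}
```

Run on the target, `secret_fingerprint` would return only a 64-character digest, so a before/after comparison can happen on the client without the plaintext crossing the wire. Piping an artifact through `redact` leaves a line such as `DASHBOARD_SECURE=true` untouched while masking `PROXY_TOKEN=...`.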

Testing & wiring

  • tests/integration/selftest.sh — 48 pure-logic checks (config rendering & typing, profile-gated expectations, redaction, axis coverage, SSH/local exec, JSON parsing). It caught a real bug during development: jq_get used // empty, which silently swallows boolean-false in jq — broke the full-vs-pruned axis. Fixed.
  • make test-integration (live, manual — the blocking pre-release gate per Release & versioning structure (single-product releases via GHCR) #44) and make test-integration-selftest.
  • CI runs the selftest + shellcheck of the new scripts on every PR (shell job). The live matrix stays a gated/manual self-hosted job (it needs the real box).
  • New docs/integration-testing.md (provisioning, safety model, matrix, artifacts, release wiring) + docs index / releasing.md status / CHANGELOG updates.
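
The jq pitfall mentioned in the selftest item is easy to reproduce. In jq, `a // b` falls through to `b` when `a` is `null` or `false`, so a boolean `false` is indistinguishable from a missing key. The sketch below contrasts that shape with one possible fix; both function names are illustrative, and the fix actually applied in lib.sh may differ.

```shell
#!/usr/bin/env bash

# Lossy shape: `//` treats both null and false as "no value",
# so a boolean false falls through to `empty`.
# usage: jq_get_lossy <jq-path> < file.json
jq_get_lossy() {
  jq -r "$1 // empty"
}

# One possible fix: drop only null, so false survives as the string "false".
# usage: jq_get_strict <jq-path> < file.json
jq_get_strict() {
  jq -r "$1 | select(. != null)"
}
```

With `{"monero":{"prune":false}}` on stdin, `jq_get_lossy '.monero.prune'` prints nothing while `jq_get_strict '.monero.prune'` prints `false`. Both print nothing when the key is absent.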

Verification

  • shellcheck --severity=warning clean across all stack + integration scripts
  • tests/integration/selftest.sh: 48 passed
  • Existing tests/stack/run.sh: 72 passed (no regressions)

Note: the live run.sh paths can't be exercised in CI (no real box). Pure logic is covered by the selftest; the live matrix is validated by running make test-integration against the test server.

🤖 Generated with Claude Code

VijitSingh97 and others added 21 commits June 3, 2026 23:59
Stand up the runtime/integration half of our testing: a suite that drives a
REAL, already-synced Pithead server through the config matrix and asserts the
stack behaves — not just the client-side/unit checks we have today.

The box is assumed already deployed and synced with miners connected, so the
harness moves between scenarios with non-interactive `pithead apply -y`
(recreates only changed containers, reuses the synced chain data dirs — never
re-syncs, never re-provisions Tor, preserves secrets). It waits on real
readiness signals (container health, `pithead status`, dashboard sync %,
miner-released) with timeouts — never fixed sleeps — then asserts per scenario:
expected containers up / unexpected absent (no monerod in remote mode), Monero
synced + pruned/full display, sidechain selection, end-to-end mining (workers
online, hashes accumulating), posture propagated to .env, status exit codes,
apply idempotency, and secret preservation. `--lifecycle` adds restart, a
pool-change apply, and node-down failover (#31).

tests/integration/:
  run.sh        entry point — SSH or --local, iterate matrix, assert, restore
  scenarios.sh  declarative config matrix (data, not code)
  lib.sh        target I/O, assertions, readiness waiters, config render, redaction
  selftest.sh   pure-logic self-test (no server) — runs in CI on every PR

Safety: never mutates the canonical chains; the destructive prune axis only runs
against a separate --pruned/--full-data-dir (else SKIPPED, never silent); secrets
are hashed on-box for comparison and redacted from all artifacts; continue-on-error
collects the whole matrix.

Wired as `make test-integration` (the blocking pre-release gate per #44) with the
pure-logic selftest in CI. New docs/integration-testing.md plus index/releasing/
changelog updates.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…unit gaps (#54)

Simulate every runtime situation the live synced box can't show, at the cheapest
honest tier. Documented in docs/testing-strategy.md with a full scenario catalog.

Tier 1 (unit, every PR): backfill the genuine gaps the audit found — the
required-Tari sync gate (monero synced but tari syncing → still held), the #35
one-way-latch × #31 failover interaction after release, and a simultaneous
double outage. +3 dashboard tests; suite 381 green at 92.86% coverage.

Tier 2 (contract, every PR, docker-free): controllable fake monerod (HTTP
get_info) and fake Tari (gRPC BaseNode, via the vendored stubs) under
tests/integration/fakes/, plus a contract test that points the REAL Monero/Tari
clients at them and asserts they parse every state (synced/syncing/down). This
is the verifiable proof the fakes speak the daemons' wire format. make test-fakes.

Tier 3 (mini-stack, CI with docker): tests/integration/mini-stack/ runs the REAL
dashboard + docker-control/-proxy against the fakes with lightweight p2pool/
xmrig-proxy containers, and a scenario runner asserting the control plane
end-to-end — sync hold→release (#35) and node-down reject→readmit (#31) — actually
stopping/starting real containers. Its own workflow (needs a docker daemon);
compose validated with `docker compose config`. make test-mini-stack.

Tier 4 (live matrix): add a --fault-injection phase to run.sh that deliberately
breaks monerod (stop / SIGSTOP / remove) and asserts pithead status' verdicts
(down / unhealthy / missing) and the full failover→recovery cycle, plus the
service_state helper and parsing covered by the self-test (53 green).

Enabler: UPDATE_INTERVAL is now env-configurable so the mini-stack loops fast in
CI. CI wires the contract test into the dashboard job (every PR) and the docker
mini-stack as its own paths-filtered workflow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ness (#54)

Generated coverage inventory and a round of scenario hardening across the tiers,
plus a production-readiness posture (CI gating, engineering standards, flake
policy, honest gaps).

- Inventory: tests/inventory.sh enumerates every test/scenario across all suites
  into docs/test-inventory.md (make test-inventory). A CI drift check
  (make test-inventory-check) fails if a test is added/removed without
  regenerating it, so "what's covered" can't silently rot.

- Tier 2 (contract): +5 daemon edge states proven against the real clients —
  monerod BUSY (HTTP 200, non-OK status → distrusted), synced-by-height without
  the flag, db_size unknown; Tari syncing-without-reliable-target (no false 100%),
  and cached-last-reading on a brief gRPC blip. 7 → 12 contract tests.

- Tier 3 (mini-stack): 4 → 8 deterministic scenarios — added required-Tari hold
  (holds while only monerod is synced), required-Tari down → reject → readmit, and
  a dashboard-restart persistence check (the one-way latch is not re-held). New
  assert_stays helper proves the gate does NOT act prematurely.

- docs/testing-strategy.md: a Production-readiness section — what's blocking on
  every PR vs. the release gate, the engineering standards (deterministic waits,
  isolation, artifact capture, secrets hygiene, reproducibility), the
  quarantine-not-retry flake policy, and the known gaps (first real-hardware green
  run, CLI breadth in automation, soak, load, security review).
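
A helper like assert_stays is the mirror image of a readiness waiter: rather than waiting for a condition to become true, it checks that a condition remains true for a whole window. The PR text does not show its signature, so the version below is a guess at the general shape, not the real implementation.

```shell
#!/usr/bin/env bash

# Sketch only: the real assert_stays signature is not shown in this PR text.
# Succeeds only if the command keeps succeeding for the whole window.
# usage: assert_stays <duration_seconds> <interval_seconds> <command> [args...]
assert_stays() {
  local duration=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + duration ))
  while (( SECONDS < deadline )); do
    if ! "$@"; then
      return 1          # the state changed early: the gate acted prematurely
    fi
    sleep "$interval"
  done
  "$@"                  # final check at the end of the window
}
```

Given some predicate for "the miner is still held" (a hypothetical `miner_is_held`, say), `assert_stays 20 2 miner_is_held` would fail the moment the miner is released during the 20-second window.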

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
, #32)

First green run of the live integration harness against a real synced, mining
box (gouda): 22/22. That run did its job — it surfaced three over-strict harness
assertions and one real product bug.

Harness calibration (tests/integration/):
- Monero "synced" now trusts monerod's own get_info `synchronized` flag (via a
  new monero_caught_up helper, creds stay on the box) instead of the dashboard's
  `.sync.monero.state == "done"`: a synced LOCAL node has no target height, so
  that UI field reads "loading", not "done".
- Mining liveness keys off proxy_workers; stratum.conns is now informational (a
  healthy mining box can report conns 0).
- Pruned/full assertion is version-robust: a local node's mode must be
  determinate (Pruned|Full), not coupled to config (which can differ from the
  node's reality).
- New non-destructive `--check` mode: assert the box's current live state with no
  config change / apply / restore. The safe first run and an ongoing health check.
  Read-only assertions refactored into assert_running_state, shared by --check and
  the per-scenario battery.

Product fix (build/dashboard, #32): the pruned/full label always showed Full on
local nodes — config.py parsed MONERO_PRUNE with == "true" but pithead writes 1/0.
Now accepts 1/true/yes/on, with unit tests. Dashboard suite 383 green @ 92.91%.
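
The actual fix lives in the dashboard's Python config.py, which is not shown here. As a shell analogue of the same rule, a tolerant boolean parser could look like this; the function name and the case-insensitive matching are assumptions.

```shell
#!/usr/bin/env bash

# Shell analogue only (the real fix is in Python): treat 1/true/yes/on as true.
# Case-insensitive matching is an assumption, not something the PR states.
is_truthy() {
  local v
  v=$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]')
  case "$v" in
    1|true|yes|on) return 0 ;;
    *)             return 1 ;;
  esac
}
```

The point of the bug is visible here: a strict comparison against `"true"` rejects the `1` that pithead writes, whereas `is_truthy 1` succeeds and `is_truthy 0` fails.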

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Namespace the mini-stack's controllable containers (itest-p2pool / itest-xmrig-proxy)
and move the fakes' host control ports to 28081/28152. Before this, the mini-stack used
bare container_name p2pool/xmrig-proxy and published 18081 — which on a host already
running the real stack collides with the production containers and monerod, and its
docker-control would target the real p2pool/xmrig-proxy. Now it's fully isolated: unique
names, own network, non-colliding ports — safe to run beside a live deployment (e.g. the
test box).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The standalone fake Tari bound its gRPC server to 127.0.0.1, so the dashboard
container couldn't reach it across the docker network — Tari read as unreachable,
TARI_REQUIRED held the gate, and the miner never released in the mini-stack. Bind
0.0.0.0 in the container (loopback stays the default for the in-process contract
test). Found by running the mini-stack on a real docker host.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…54)

Got the fake-daemon mini-stack to a clean pass on a real Docker host (gouda). Two
real bugs were surfaced and fixed in the process (fake_tari binding 127.0.0.1;
container-name/port collisions). The monerod-down failover scenario is removed
from the mini-stack: the dashboard's monerod down-path falls back to log-scraping
a real monerod container, which the fake stack has no equivalent of, so it can't
be simulated honestly here — it's covered on real hardware by the tier-4
--fault-injection phase. The mini-stack now covers sync hold/release (monero+Tari)
and Tari reject/readmit, end-to-end with real containers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ug, dev guide

Audited coverage for INTENT (not line count) and filled the genuine gaps the
audit surfaced — plus a developer testing guide.

Persistence (storage_service, 82% → 87%): added intent tests for the upgrade
path and long-running behavior that a fresh-DB test never exercises — schema
migration timestamp backfill, in-memory + DB history retention pruning, and
worker retention pruning. The migration test caught a REAL bug: _create_tables
built idx_ts on history(timestamp) before _migrate_db added that column, so
opening a pre-timestamp DB threw "no such column: timestamp" and aborted the
whole migration. Indexes now run after migrations.

Integration harness self-coverage: moved resolve_overrides (the prerequisite
gate that protects the canonical synced chains from a destructive prune flip)
into lib.sh so the self-test can exercise it, and added cases for every gate
path (skip-without-dir, augment-with-dir, remote endpoint, compound), plus
pool_label edges, scenario lookup negatives, redact *_PASSWORD/*_SECRET
variants, and overrides_to_jq empties. Self-test 53 → 78.
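
The skip-without-dir and augment-with-dir paths can be pictured with the simplified gate below. Everything in it is hypothetical: the function name, the comma-separated override format, the SKIP message, and the `monero.data_dir` key are placeholders, and the real resolve_overrides also handles the remote-endpoint and compound cases.

```shell
#!/usr/bin/env bash

# Hypothetical sketch of a prune-axis gate; the real resolve_overrides
# signature, output format and the "monero.data_dir" key are assumptions.
# usage: gate_prune_override <overrides> <pruned_dir> <full_dir>
# Prints the (possibly augmented) overrides, or "SKIP: <reason>" and returns 1.
gate_prune_override() {
  local overrides=$1 pruned_dir=$2 full_dir=$3 dir=""
  case ",$overrides," in
    *",monero.prune=true,"*)  dir=$pruned_dir ;;
    *",monero.prune=false,"*) dir=$full_dir ;;
    *) printf '%s\n' "$overrides"; return 0 ;;   # prune axis untouched
  esac
  if [[ -z $dir ]]; then
    # No separate data dir was supplied: refuse instead of touching the canonical DB.
    printf 'SKIP: prune case needs a separate data dir\n'
    return 1
  fi
  printf '%s,monero.data_dir=%s\n' "$overrides" "$dir"
}
```

Because the gate is a pure function of its arguments, each path can be exercised in a self-test without a server.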

Docs: new docs/testing-guide.md — a practical, recipe-oriented guide for future
developers (where to write each kind of test, conventions, and the calibration
gotchas learned on real hardware). Refreshed CONTRIBUTING's testing section and
the docs index.

Dashboard suite 387 green @ 93.64%; selftest 78; stack 85; contract 12.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-37494f

# Conflicts:
#	CHANGELOG.md
#	docs/releasing.md
Audited the project's bug/security history (CHANGELOG, fix commits, closed
issues, code "broke once" comments) against the suite. Most past bugs were
already covered — #70 XvB overshoot (VipReserve clamp tests), #65 chart time
gaps, #27 hashrate averages, #58 build provenance, the monerod get_info parser,
the Tari overload cache. Two genuine gaps are now closed:

- Security (#90): a new tests/stack/test_security.sh renders the canonical
  compose config and asserts the hardening invariants so a future edit can't
  silently undo them — RPC credentials never appear in a healthcheck command
  (the docker-inspect leak fix), no-new-privileges + cap_drop on the leaf
  containers (and the documented dashboard exception), the read socket-proxy
  can't POST while the control proxy is start/stop-only (both socket-ro,
  read-only rootfs), and p2pool has its liveness probe. Proven to bite (it fails
  on a re-introduced leak / a dashboard cap_drop). Runs on every PR (compose
  job); wired into make test + the inventory.

- The dashboard.host "auto"-revert fix (247c5a0) had no test: added one asserting
  a configured host is used verbatim and that "auto" reverts HOST_IP to the
  machine hostname (not a stale value).

make test green: dashboard 426 @ 93.83%, stack 93, security 23, selftest 78,
contract 12.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… backup/restore gap)

Manages the "destructive matrix on a precious box" gap: `run.sh --safety-backup`
takes a real `pithead backup` before the destructive scenarios, captures the
archive, and if anything fails rolls the box all the way back (down → restore →
up) before restoring the baseline config; on success it just removes the
archive. This doubles as end-to-end coverage for backup/restore (the #102 CLI
breadth gap): the safety backup asserts the archive contains config.json/.env,
and the --lifecycle phase adds an explicit backup → restore round-trip that
diverges the pool, restores, and asserts the pool reverted and secrets survived.

Also: the earlier move of resolve_overrides to lib.sh left a few globals that are
read only cross-file, so this adds scoped shellcheck directives and removes a
genuinely dead `prune` local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… hardening tests

Rename: pin `name: pithead` in docker-compose.yml so the stack's images, network
and volumes are `pithead*` regardless of the checkout directory name (which is
what left older checkouts showing the repo's previous name, e.g.
p2pool-starter-stack-tor). Because the service containers use global
container_names, a stack still running under the old directory-derived project
would clash on `up`; `pithead` now runs migrate_compose_project on up/apply/
upgrade — it detects containers from a different compose project holding our
names and removes them so the renamed project takes over (bind-mounted chain data
is untouched). test_compose.sh asserts the project name is pinned.

Cleanup: I'd added tests/stack/test_security.sh, not noticing test_compose.sh
already had the #90 hardening assertions — that was duplication. Removed it and
folded only the genuinely-new checks into test_compose.sh's hardening section:
per-service Docker socket-proxy scoping (read proxy can't POST, control proxy is
start/stop-only, socket mounted read-only) and the Tari [m]inotari self-match
guard. Reverted the separate make target / CI step / inventory section.

make test green: dashboard 426, stack 93, compose+hardening 15 checks, selftest
78, contract 12.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…f-hosted gate

Answers "can GitHub do the full end-to-end, or do we need the dedicated server?"
and gets the server ready safely.

- run.sh --readiness: a non-destructive assessment of whether a box is fit to be a
  release/validation server — chains synced (reusable), chain FS snapshot/reflink-
  capable, disk headroom, .env owner-only, dashboard bound to localhost, backup net
  usable. Ran it against the real box: 6/6, one actionable finding (chains on ext4,
  so the prune axis must copy or be skipped; otherwise ready — 497 GiB free, .env
  600, dashboard localhost-only).

- docs/release-server.md: the analysis + hardening guide. GitHub-hosted runners
  can't hold a 95–270 GiB synced chain or sync for days, so the real-daemon tier 4
  is impossible there — but tiers 1–3 (unit, contract, the fake-daemon mini-stack
  with real Docker) already run free on every PR as the merge gate. The dedicated
  server is the tier-4 release gate. Plus the hardening checklist (the box holds
  wallet/onion/RPC keys) and the SAFE self-hosted-runner model.

- .github/workflows/release-gate.yml: runs the tier-4 matrix on a self-hosted
  runner only on trusted code (workflow_dispatch / push to main) — deliberately
  never on pull_request, because a fork PR could run arbitrary code on a runner
  that holds real keys (GitHub's own guidance).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
From a full review of the branch diff:

- HIGH: migrate_compose_project had unguarded `var=$(... jq ...)` substitutions;
  pithead runs `set -Eeuo pipefail`, so a jq/docker hiccup would abort pithead
  mid up/apply/upgrade with the scary ERR-trap message. Guarded every
  substitution (verified it survives non-JSON from `compose config`).
- MED: the helper force-removed ANY container sharing one of our service names
  (tor/caddy/dashboard/…) from a different compose project — a footgun for anyone
  running an unrelated `caddy`/`tor` on the same host. Now it removes ONLY the
  containers from the exact project THIS directory used to derive, so an unrelated
  container is never touched. Fixed the misleading log message + the inaccurate
  "Tor keys in volumes" comment (they're bind-mounted), and documented the
  one-time Caddy cert re-issue.
- LOW: config.py UPDATE_INTERVAL now tolerates a malformed override instead of
  crashing the dashboard at import (with a test). release-gate.yml passes
  workflow inputs via env: instead of interpolating them into the run script.
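
The HIGH finding comes down to how strict mode treats a failed command substitution: under `set -Eeuo pipefail`, a bare `var=$(...)` whose pipeline fails aborts the script. A minimal sketch of the guard pattern follows. `extract_name`, `guarded_name` and `demo` are illustrative names, and `grep` (which exits 1 on no match) stands in for a failing jq or docker call.

```shell
#!/usr/bin/env bash

# Hypothetical helper: pull a "name" value out of compose-config-like text.
# grep exits 1 when nothing matches, standing in for a jq/docker failure.
extract_name() {
  grep -o '"name": *"[^"]*"' | head -n1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Guarded capture: a failed substitution yields an empty value instead of
# aborting the caller.
guarded_name() {
  local input=$1 name
  name=$(printf '%s\n' "$input" | extract_name) || name=""
  printf '%s' "$name"
}

# Run the guarded capture under strict mode in a subshell, then keep going.
demo() (
  set -Eeuo pipefail
  guarded_name "$1"
  printf '|survived\n'
)
```

Here `demo '{"name": "pithead"}'` prints `pithead|survived` and `demo 'not json'` prints `|survived`. Without the `|| name=""` guard, the second call would stop before reaching the final `printf`.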

make test green; stack suite 93; dashboard suite incl. the new config test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-37494f

# Conflicts:
#	tests/stack/run.sh
#	tests/stack/test_compose.sh
Make the live test box (gouda) a proper dev+agent platform and capture the
lessons from setting it up.

- run.sh --readiness: prune-mode-aware. Reports the live chain's prune mode,
  recognizes a snapshot-isolated same-mode copy on a CoW FS, and honestly flags
  when the opposite mode will be skipped (covered by the fakes). Reads
  MONERO_PRUNE directly so it works in standalone --readiness.
- tests/integration/{build-pruned-chain,compact-chain}.sh: chain ops — build a
  pruned chain next to a full one; reclaim LMDB free-page bloat from an
  already-pruned chain via `monero-blockchain-prune --copy-pruned-database`
  (concurrent-safe, no downtime until the swap).
- tests/integration/system-info.sh + gouda-testbench-README.md: a re-runnable
  system snapshot and a build-server runbook for developers and AI agents
  (golden rules, layout, how to run the stack + harness, agent gotchas).
- docs/release-server.md, integration-testing.md: reconcile with the
  storage-speed lesson (active chains belong on SSD/NVMe; CoW is a bonus), the
  compaction gotcha (stock mdb_copy can't read Monero's patched LMDB), and an
  end-to-end coverage/gaps table with before-release priorities.
- docs/test-inventory.md: regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- docs/test-server-architecture.md: high-level structure (SSD-primary storage
  policy, sizing table, directory layout) + a "recreate on another box" runbook
  (clone -> config -> rsync the chains -> deploy) so migrating to a bigger box
  is a transfer + redeploy, no re-sync.
- gouda-testbench-README.md: reframe as a dev/agent test platform (not a
  production miner — downtime/teardown are fine); update paths to ~/pithead and
  the decoupled chains at /srv/code/pithead/data.
- system-info.sh: read chain locations from .env (decoupled from the checkout);
  default STACK_DIR to ~/pithead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ins (/srv/code/pithead-data)

Reflect the clean re-homed layout in the build-server docs: checkout on the
NVMe at /srv/code/pithead (= ~/code/pithead), chains decoupled at
/srv/code/pithead-data, harness --dir code/pithead, system-info STACK_DIR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Benchmarks on the reference box show the '1TB SSD' is a SATA drive at
~37-98 MB/s (no /dev/nvme present). That bottlenecks monerod/builds and makes
LMDB compaction impractical (~16h vs ~10min on NVMe). Document the finding, how
to verify a disk is genuinely fast, and that a real M.2 PCIe NVMe is the #1
upgrade. The matching mdb_copy (LMDB 0.9.70) is staged for when a fast disk lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>