Add end-to-end integration test harness against a real server (#54) #114
Open
VijitSingh97 wants to merge 21 commits into
Stand up the runtime/integration half of our testing: a suite that drives a REAL, already-synced Pithead server through the config matrix and asserts the stack behaves, not just the client-side/unit checks we have today.

The box is assumed already deployed and synced with miners connected, so the harness moves between scenarios with non-interactive `pithead apply -y` (recreates only changed containers, reuses the synced chain data dirs, never re-syncs, never re-provisions Tor, preserves secrets). It waits on real readiness signals (container health, `pithead status`, dashboard sync %, miner-released) with timeouts, never fixed sleeps, then asserts per scenario: expected containers up and unexpected ones absent (no monerod in remote mode), Monero synced + pruned/full display, sidechain selection, end-to-end mining (workers online, hashes accumulating), posture propagated to .env, status exit codes, apply idempotency, and secret preservation. `--lifecycle` adds restart, a pool-change apply, and node-down failover (#31).

tests/integration/:
- run.sh: entry point; SSH or --local, iterate matrix, assert, restore
- scenarios.sh: declarative config matrix (data, not code)
- lib.sh: target I/O, assertions, readiness waiters, config render, redaction
- selftest.sh: pure-logic self-test (no server); runs in CI on every PR

Safety: never mutates the canonical chains; the destructive prune axis only runs against a separate --pruned/--full-data-dir (else SKIPPED, never silent); secrets are hashed on-box for comparison and redacted from all artifacts; continue-on-error collects the whole matrix.

Wired as `make test-integration` (the blocking pre-release gate per #44) with the pure-logic selftest in CI. New docs/integration-testing.md plus index/releasing/changelog updates.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
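The "real readiness signals with timeouts, never fixed sleeps" rule can be sketched as a poll-until-deadline helper. This is a minimal illustration, not the harness's lib.sh code: the name `wait_until` and its argument order are hypothetical, and the probe can be any command that exits 0 once the signal is ready (for example `pithead status`).

```bash
#!/usr/bin/env bash
# Hypothetical readiness waiter (a sketch, not the harness's lib.sh helper).
# wait_until <timeout_secs> <interval_secs> <description> <probe command...>
# Returns 0 as soon as the probe exits 0, or 1 once the deadline passes.
wait_until() {
  local timeout=$1 interval=$2 desc=$3
  shift 3
  local deadline=$(( $(date +%s) + timeout ))
  while :; do
    # The probe IS the readiness signal; its output is discarded.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "TIMEOUT after ${timeout}s waiting for: $desc" >&2
      return 1
    fi
    # Only paces the polling; it never stands in for the check itself.
    sleep "$interval"
  done
}
```

Usage, assuming the probe is on PATH: `wait_until 900 10 "pithead status healthy" pithead status`. A fast box finishes early and a slow one fails loudly at the deadline, instead of racing a fixed delay.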
…-37494f

# Conflicts:
# CHANGELOG.md
…unit gaps (#54)

Simulate every runtime situation the live synced box can't show, at the cheapest honest tier. Documented in docs/testing-strategy.md with a full scenario catalog.

Tier 1 (unit, every PR): backfill the genuine gaps the audit found: the required-Tari sync gate (monero synced but tari syncing → still held), the #35 one-way-latch × #31 failover interaction after release, and a simultaneous double outage. +3 dashboard tests; suite 381 green at 92.86% coverage.

Tier 2 (contract, every PR, docker-free): controllable fake monerod (HTTP get_info) and fake Tari (gRPC BaseNode, via the vendored stubs) under tests/integration/fakes/, plus a contract test that points the REAL Monero/Tari clients at them and asserts they parse every state (synced/syncing/down). This is the verifiable proof that the fakes speak the daemons' wire format. make test-fakes.

Tier 3 (mini-stack, CI with docker): tests/integration/mini-stack/ runs the REAL dashboard + docker-control/-proxy against the fakes with lightweight p2pool/xmrig-proxy containers, and a scenario runner asserting the control plane end-to-end: sync hold→release (#35) and node-down reject→readmit (#31), actually stopping/starting real containers. Its own workflow (needs a docker daemon); compose validated with `docker compose config`. make test-mini-stack.

Tier 4 (live matrix): add a --fault-injection phase to run.sh that deliberately breaks monerod (stop / SIGSTOP / remove) and asserts pithead status' verdicts (down / unhealthy / missing) and the full failover→recovery cycle, plus the service_state helper and parsing covered by the self-test (53 green).

Enabler: UPDATE_INTERVAL is now env-configurable so the mini-stack loops fast in CI. CI wires the contract test into the dashboard job (every PR) and the docker mini-stack as its own paths-filtered workflow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
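A sketch of how stop / SIGSTOP / remove could map onto the down / unhealthy / missing verdicts, assuming the verdict is derived from Docker's `.State.Status` and `.State.Health.Status` (a SIGSTOPped process leaves the container `running` but fails its healthcheck). The function names are hypothetical; the PR's actual `service_state` helper may work differently.

```bash
#!/usr/bin/env bash
# Hypothetical verdict mapping (a sketch; not pithead's service_state helper).
# classify_service <status> <health>
#   status: Docker .State.Status, or "" when the container does not exist
#   health: Docker .State.Health.Status, or "" when no healthcheck is defined
classify_service() {
  local status=$1 health=$2
  if [ -z "$status" ]; then
    echo missing                       # e.g. the container was removed
    return
  fi
  case "$status" in
    running)
      case "$health" in
        unhealthy) echo unhealthy ;;   # e.g. SIGSTOP: running, probe failing
        starting)  echo starting ;;
        *)         echo up ;;
      esac
      ;;
    *) echo down ;;                    # exited, dead, created, paused, ...
  esac
}

# Gathers the inputs from Docker (needs the docker CLI; not used in tests).
docker_verdict() {
  local name=$1 status health
  status=$(docker inspect -f '{{.State.Status}}' "$name" 2>/dev/null) || status=""
  health=$(docker inspect -f '{{if .State.Health}}{{.State.Health.Status}}{{end}}' "$name" 2>/dev/null) || health=""
  classify_service "$status" "$health"
}
```

Keeping the pure mapping separate from the Docker calls is what lets a self-test cover every verdict without a daemon.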
…ness (#54)

Generated coverage inventory and a round of scenario hardening across the tiers, plus a production-readiness posture (CI gating, engineering standards, flake policy, honest gaps).

- Inventory: tests/inventory.sh enumerates every test/scenario across all suites into docs/test-inventory.md (make test-inventory). A CI drift check (make test-inventory-check) fails if a test is added/removed without regenerating it, so "what's covered" can't silently rot.
- Tier 2 (contract): +5 daemon edge states proven against the real clients: monerod BUSY (HTTP 200, non-OK status → distrusted), synced-by-height without the flag, db_size unknown; Tari syncing-without-reliable-target (no false 100%), and cached-last-reading on a brief gRPC blip. 7 → 12 contract tests.
- Tier 3 (mini-stack): 4 → 8 deterministic scenarios. Added required-Tari hold (holds while only monerod is synced), required-Tari down → reject → readmit, and a dashboard-restart persistence check (the one-way latch is not re-held). New assert_stays helper proves the gate does NOT act prematurely.
- docs/testing-strategy.md: a Production-readiness section covering what's blocking on every PR vs. the release gate, the engineering standards (deterministic waits, isolation, artifact capture, secrets hygiene, reproducibility), the quarantine-not-retry flake policy, and the known gaps (first real-hardware green run, CLI breadth in automation, soak, load, security review).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
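`assert_stays` is the inverse of a readiness wait: it fails if a condition changes too early. A minimal sketch of the idea, assuming the probe succeeds while the gate is still holding (for example "miner still held"); the signature is hypothetical and not the mini-stack's actual helper.

```bash
#!/usr/bin/env bash
# Hypothetical negative assertion (a sketch, not the mini-stack helper).
# assert_stays <window_secs> <interval_secs> <description> <probe command...>
# Passes only if the probe keeps succeeding for the whole window; fails the
# first time it stops succeeding (i.e. the gate acted prematurely).
assert_stays() {
  local window=$1 interval=$2 desc=$3
  shift 3
  local deadline=$(( $(date +%s) + window ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if ! "$@" >/dev/null 2>&1; then
      echo "FAIL: condition changed before ${window}s elapsed: $desc" >&2
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

The window is the design choice: it must be longer than the control loop's interval (UPDATE_INTERVAL in the mini-stack), or the assertion could pass before the gate even had a chance to act.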
, #32)

First green run of the live integration harness against a real synced, mining box (gouda): 22/22. That run did its job: it surfaced three over-strict harness assertions and one real product bug.

Harness calibration (tests/integration/):
- Monero "synced" now trusts monerod's own get_info `synchronized` flag (via a new monero_caught_up helper; creds stay on the box) instead of the dashboard's `.sync.monero.state == "done"`: a synced LOCAL node has no target height, so that UI field reads "loading", not "done".
- Mining liveness keys off proxy_workers; stratum.conns is now informational (a healthy mining box can report conns 0).
- Pruned/full assertion is version-robust: a local node's mode must be determinate (Pruned|Full), not coupled to config (which can differ from the node's reality).
- New non-destructive `--check` mode: assert the box's current live state with no config change / apply / restore. It is the safe first run and an ongoing health check. Read-only assertions are refactored into assert_running_state, shared by --check and the per-scenario battery.

Product fix (build/dashboard, #32): the pruned/full label always showed Full on local nodes, because config.py parsed MONERO_PRUNE with == "true" but pithead writes 1/0. It now accepts 1/true/yes/on, with unit tests. Dashboard suite 383 green @ 92.91%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
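The MONERO_PRUNE fix itself lives in config.py (Python). For comparison, here is a hypothetical shell helper that applies the same accepted set when a harness check reads the flag straight from `.env`; the name `is_truthy` is an illustration, not part of the codebase.

```bash
#!/usr/bin/env bash
# The bug, in shell terms: pithead writes MONERO_PRUNE=1, so a strict test
# like [ "$MONERO_PRUNE" = "true" ] would read a pruned node as Full.

# Hypothetical helper: exit 0 for 1/true/yes/on (case-insensitive), else 1.
is_truthy() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    1|true|yes|on) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `if is_truthy "$MONERO_PRUNE"; then echo Pruned; else echo Full; fi`.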
Namespace the mini-stack's controllable containers (itest-p2pool / itest-xmrig-proxy) and move the fakes' host control ports to 28081/28152. Before this, the mini-stack used bare container_name p2pool/xmrig-proxy and published 18081, which on a host already running the real stack collides with the production containers and monerod, and its docker-control would target the real p2pool/xmrig-proxy. Now it's fully isolated (unique names, own network, non-colliding ports), so it is safe to run beside a live deployment (e.g. the test box).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The standalone fake Tari bound its gRPC server to 127.0.0.1, so the dashboard container couldn't reach it across the docker network: Tari read as unreachable, TARI_REQUIRED held the gate, and the miner never released in the mini-stack. Bind 0.0.0.0 in the container (loopback stays the default for the in-process contract test). Found by running the mini-stack on a real docker host.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…54)

Got the fake-daemon mini-stack to a clean pass on a real Docker host (gouda). Two real bugs were surfaced and fixed in the process (fake_tari binding 127.0.0.1; container-name/port collisions).

The monerod-down failover scenario is removed from the mini-stack: the dashboard's monerod down-path falls back to log-scraping a real monerod container, which the fake stack has no equivalent of, so it can't be simulated honestly here. It's covered on real hardware by the tier-4 --fault-injection phase. The mini-stack now covers sync hold/release (monero+Tari) and Tari reject/readmit, end-to-end with real containers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ug, dev guide

Audited coverage for INTENT (not line count) and filled the genuine gaps the audit surfaced, plus a developer testing guide.

Persistence (storage_service, 82% → 87%): added intent tests for the upgrade path and long-running behavior that a fresh-DB test never exercises: schema migration timestamp backfill, in-memory + DB history retention pruning, and worker retention pruning. The migration test caught a REAL bug: _create_tables built idx_ts on history(timestamp) before _migrate_db added that column, so opening a pre-timestamp DB threw "no such column: timestamp" and aborted the whole migration. Indexes now run after migrations.

Integration harness self-coverage: moved resolve_overrides (the prerequisite gate that protects the canonical synced chains from a destructive prune flip) into lib.sh so the self-test can exercise it, and added cases for every gate path (skip-without-dir, augment-with-dir, remote endpoint, compound), plus pool_label edges, scenario lookup negatives, redact *_PASSWORD/*_SECRET variants, and overrides_to_jq empties. Self-test 53 → 78.

Docs: new docs/testing-guide.md, a practical, recipe-oriented guide for future developers (where to write each kind of test, conventions, and the calibration gotchas learned on real hardware). Refreshed CONTRIBUTING's testing section and the docs index.

Dashboard suite 387 green @ 93.64%; selftest 78; stack 85; contract 12.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
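A minimal sketch of artifact redaction for dotenv-style lines, assuming secrets are identified by a `_PASSWORD` or `_SECRET` key suffix as in the self-test cases this commit adds. The function name and exact pattern are hypothetical; the real redact in lib.sh may cover more key shapes.

```bash
#!/usr/bin/env bash
# Hypothetical redaction filter (a sketch, not lib.sh's redact).
# Reads KEY=value lines on stdin; for any key ending in _PASSWORD or _SECRET
# (optionally prefixed by "export "), replaces the value with [REDACTED].
redact_env() {
  sed -E 's/^([[:space:]]*(export[[:space:]]+)?[A-Za-z0-9_]*_(PASSWORD|SECRET))=.*/\1=[REDACTED]/'
}
```

Usage: `redact_env < .env > artifacts/env.redacted`. Matching on the key rather than on known secret values means a newly added `*_SECRET` is redacted without touching the filter.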
…-37494f

# Conflicts:
# CHANGELOG.md
# docs/releasing.md
Audited the project's bug/security history (CHANGELOG, fix commits, closed issues, code "broke once" comments) against the suite. Most past bugs were already covered: #70 XvB overshoot (VipReserve clamp tests), #65 chart time gaps, #27 hashrate averages, #58 build provenance, the monerod get_info parser, the Tari overload cache. Two genuine gaps are now closed:

- Security (#90): a new tests/stack/test_security.sh renders the canonical compose config and asserts the hardening invariants so a future edit can't silently undo them: RPC credentials never appear in a healthcheck command (the docker-inspect leak fix), no-new-privileges + cap_drop on the leaf containers (and the documented dashboard exception), the read socket-proxy can't POST while the control proxy is start/stop-only (both socket-ro, read-only rootfs), and p2pool has its liveness probe. Proven to bite (it fails on a re-introduced leak / a dashboard cap_drop). Runs on every PR (compose job); wired into make test + the inventory.
- The dashboard.host "auto"-revert fix (247c5a0) had no test: added one asserting a configured host is used verbatim and that "auto" reverts HOST_IP to the machine hostname (not a stale value).

make test green: dashboard 426 @ 93.83%, stack 93, security 23, selftest 78, contract 12.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
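The `dashboard.host` rule under test (a configured host used verbatim; `"auto"` recomputed from the machine's hostname rather than a previously written HOST_IP) fits in a few lines. This is a hypothetical sketch: `resolve_host_ip` is not the pithead function, and `uname -n` stands in for whatever hostname lookup pithead actually uses.

```bash
#!/usr/bin/env bash
# Hypothetical resolver (a sketch, not pithead's implementation).
# resolve_host_ip <configured_host>
# "auto" is always recomputed from this machine's node name; any other value
# is returned verbatim. It takes no old HOST_IP, so it cannot echo a stale one.
resolve_host_ip() {
  if [ "$1" = "auto" ]; then
    uname -n
  else
    printf '%s\n' "$1"
  fi
}
```

Taking only the configured value as input is the point of the design: the stale-value bug cannot come back through this function because the previous HOST_IP is never consulted.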
… backup/restore gap)

Manages the "destructive matrix on a precious box" gap: `run.sh --safety-backup` takes a real `pithead backup` before the destructive scenarios and captures the archive. If anything fails, it rolls the box all the way back (down → restore → up) before restoring the baseline config; on success it just removes the archive.

This doubles as end-to-end coverage for backup/restore (the #102 CLI breadth gap): the safety backup asserts the archive contains config.json/.env, and the --lifecycle phase adds an explicit backup → restore round-trip that diverges the pool, restores, and asserts the pool reverted and secrets survived.

Also: moving resolve_overrides to lib.sh earlier left a few globals read only cross-file, so this adds scoped shellcheck directives and removes a genuinely-dead `prune` local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
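The `--safety-backup` control flow can be sketched with the steps passed in as functions, so the ordering (back up, run, then either clean up or roll back) is testable without a real box. The function name and arguments are hypothetical, and refusing to run when the backup itself fails is an assumption of this sketch, not something the commit states.

```bash
#!/usr/bin/env bash
# Hypothetical orchestration (a sketch, not run.sh's implementation).
# run_with_safety_backup <backup_fn> <rollback_fn> <cleanup_fn> <matrix_fn>
run_with_safety_backup() {
  local backup=$1 rollback=$2 cleanup=$3 matrix=$4
  # Assumption: without a verified backup, no destructive scenario runs.
  if ! "$backup"; then
    echo "safety backup failed; skipping destructive scenarios" >&2
    return 2
  fi
  if "$matrix"; then
    "$cleanup"          # success: just remove the archive
    return 0
  fi
  echo "matrix failed; rolling back (down -> restore -> up)" >&2
  "$rollback"           # failure: restore the box from the archive
  return 1
}
```

On a real box the four callbacks would wrap `pithead backup`, the down/restore/up sequence, archive removal, and the destructive matrix respectively.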
… hardening tests

Rename: pin `name: pithead` in docker-compose.yml so the stack's images, network and volumes are `pithead*` regardless of the checkout directory name (which is what left older checkouts showing the repo's previous name, e.g. p2pool-starter-stack-tor). Because the service containers use global container_names, a stack still running under the old directory-derived project would clash on `up`; `pithead` now runs migrate_compose_project on up/apply/upgrade. It detects containers from a different compose project holding our names and removes them so the renamed project takes over (bind-mounted chain data is untouched). test_compose.sh asserts the project name is pinned.

Cleanup: I'd added tests/stack/test_security.sh without noticing that test_compose.sh already had the #90 hardening assertions, which was duplication. Removed it and folded only the genuinely-new checks into test_compose.sh's hardening section: per-service Docker socket-proxy scoping (read proxy can't POST, control proxy is start/stop-only, socket mounted read-only) and the Tari [m]inotari self-match guard. Reverted the separate make target / CI step / inventory section.

make test green: dashboard 426, stack 93, compose+hardening 15 checks, selftest 78, contract 12.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
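How the migration's target set might be computed in its narrowed form (only containers from the exact project this directory used to derive), as a sketch. It assumes Compose v2's default project-name rule: lowercase the directory basename, keep only `a-z`, `0-9`, `_` and `-`, then trim leading `-`/`_`. Treat that as an approximation to check against your Compose version. All helper names are hypothetical, and the input is one `name project` pair per line.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (a sketch, not pithead's migrate_compose_project).

# legacy_project_name <dir>: approximate Compose v2's default project name.
legacy_project_name() {
  basename "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9_-' | sed -E 's/^[-_]+//'
}

# containers_to_migrate <project>: read "name project" lines on stdin and print
# only names whose project label is exactly <project>.
containers_to_migrate() {
  awk -v p="$1" '$2 == p { print $1 }'
}

# legacy_containers <dir> <pinned_project>: names to remove before `up`.
legacy_containers() {
  local legacy
  legacy=$(legacy_project_name "$1")
  # An empty name would match unlabelled containers; the pinned name means
  # there is nothing to migrate. Either way, remove nothing.
  if [ -z "$legacy" ] || [ "$legacy" = "$2" ]; then
    return 0
  fi
  containers_to_migrate "$legacy"
}
```

On a real host the input could come from `docker ps -a --format '{{.Names}} {{.Label "com.docker.compose.project"}}'`. Matching the project label exactly, rather than the container name alone, is what leaves an unrelated `caddy` or `tor` untouched.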
…f-hosted gate

Answers "can GitHub do the full end-to-end, or do we need the dedicated server?" and gets the server ready safely.

- run.sh --readiness: a non-destructive assessment of whether a box is fit to be a release/validation server: chains synced (reusable), chain FS snapshot/reflink-capable, disk headroom, .env owner-only, dashboard bound to localhost, backup net usable. Ran it against the real box: 6/6, with one actionable finding (chains on ext4, so the prune axis must copy or be skipped; otherwise ready: 497 GiB free, .env 600, dashboard localhost-only).
- docs/release-server.md: the analysis + hardening guide. GitHub-hosted runners can't hold a 95–270 GiB synced chain or sync for days, so the real-daemon tier 4 is impossible there, but tiers 1–3 (unit, contract, the fake-daemon mini-stack with real Docker) already run free on every PR as the merge gate. The dedicated server is the tier-4 release gate. Plus the hardening checklist (the box holds wallet/onion/RPC keys) and the SAFE self-hosted-runner model.
- .github/workflows/release-gate.yml: runs the tier-4 matrix on a self-hosted runner only on trusted code (workflow_dispatch / push to main), deliberately never on pull_request, because a fork PR could run arbitrary code on a runner that holds real keys (GitHub's own guidance).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
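The ".env owner-only" readiness check reduces to a file-mode test. A portable sketch using POSIX `find -perm` with an exact-mode match, assuming "owner-only" means mode 0600 as reported in the run above; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical readiness check (a sketch, not run.sh --readiness itself).
# env_is_owner_only <file>: exit 0 only if <file> is a regular file whose
# permission bits are exactly 0600 (owner read/write, nothing for others).
env_is_owner_only() {
  # `-perm 600` without a prefix matches the exact mode; `-prune` keeps find
  # from descending if <file> ever turns out to be a directory.
  [ -f "$1" ] && [ -n "$(find "$1" -prune -perm 600)" ]
}
```

Using `find` rather than `stat` avoids the GNU (`stat -c`) vs BSD (`stat -f`) flag split, so the same check runs on Linux and macOS.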
From a full review of the branch diff:

- HIGH: migrate_compose_project had unguarded `var=$(... jq ...)` substitutions; pithead runs `set -Eeuo pipefail`, so a jq/docker hiccup would abort pithead mid up/apply/upgrade with the scary ERR-trap message. Guarded every substitution (verified it survives non-JSON from `compose config`).
- MED: the helper force-removed ANY container sharing one of our service names (tor/caddy/dashboard/…) from a different compose project, a footgun for anyone running an unrelated `caddy`/`tor` on the same host. Now it removes ONLY the containers from the exact project THIS directory used to derive, so an unrelated container is never touched. Fixed the misleading log message + the inaccurate "Tor keys in volumes" comment (they're bind-mounted), and documented the one-time Caddy cert re-issue.
- LOW: config.py UPDATE_INTERVAL now tolerates a malformed override instead of crashing the dashboard at import (with a test). release-gate.yml passes workflow inputs via env: instead of interpolating them into the run script.

make test green; stack suite 93; dashboard suite incl. the new config test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
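The HIGH finding is a classic `set -e` hazard: an unguarded `var=$(cmd)` whose command fails makes the assignment fail, which aborts the whole script. A minimal sketch of the guarded pattern, with a stand-in `flaky` command in place of docker/jq; both function names are hypothetical.

```bash
#!/usr/bin/env bash
set -Eeuo pipefail

# Stand-in for docker/jq: prints "value" and succeeds only when asked to.
flaky() {
  if [ "$1" = ok ]; then
    echo "value"
  else
    return 1
  fi
}

# UNGUARDED (the bug pattern):   name=$(flaky bad)
# Under `set -e` that assignment fails and the script exits on that line.

# GUARDED: the failure is handled where it happens.
get_or_default() {
  local out   # declared separately: `local out=$(cmd)` returns local's status
              # (0) and would hide the failure of cmd
  if ! out=$("$@"); then
    out="unknown"   # degrade instead of aborting mid up/apply/upgrade
  fi
  printf '%s\n' "$out"
}
```

Commands in an `if` condition are exempt from `errexit`, which is why the `if ! var=$(...)` form is safe even with `set -Eeuo pipefail` active.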
…-37494f

# Conflicts:
# tests/stack/run.sh
# tests/stack/test_compose.sh
Make the live test box (gouda) a proper dev+agent platform and capture the
lessons from setting it up.
- run.sh --readiness: prune-mode-aware. Reports the live chain's prune mode,
recognizes a snapshot-isolated same-mode copy on a CoW FS, and honestly flags
when the opposite mode will be skipped (covered by the fakes). Reads
MONERO_PRUNE directly so it works in standalone --readiness.
- tests/integration/{build-pruned-chain,compact-chain}.sh: chain ops — build a
pruned chain next to a full one; reclaim LMDB free-page bloat from an
already-pruned chain via `monero-blockchain-prune --copy-pruned-database`
(concurrent-safe, no downtime until the swap).
- tests/integration/system-info.sh + gouda-testbench-README.md: a re-runnable
system snapshot and a build-server runbook for developers and AI agents
(golden rules, layout, how to run the stack + harness, agent gotchas).
- docs/release-server.md, integration-testing.md: reconcile with the
storage-speed lesson (active chains belong on SSD/NVMe; CoW is a bonus), the
compaction gotcha (stock mdb_copy can't read Monero's patched LMDB), and an
end-to-end coverage/gaps table with before-release priorities.
- docs/test-inventory.md: regenerated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
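Reading MONERO_PRUNE "directly" in standalone `--readiness` implies parsing `.env` without sourcing it (sourcing would execute whatever the file contains). A sketch of one way to do that with awk, assuming simple `KEY=value` lines with an optional `export` and optional surrounding quotes, where the last assignment wins; `env_get` is a hypothetical name, not the script's actual helper.

```bash
#!/usr/bin/env bash
# Hypothetical dotenv reader (a sketch, not run.sh's implementation).
# env_get <file> <KEY>: print KEY's value without sourcing the file; prints
# nothing if KEY is absent.
env_get() {
  awk -v k="$2" '
    /^[ \t]*#/ { next }                       # skip comment lines
    {
      line = $0
      sub(/^[ \t]*(export[ \t]+)?/, "", line) # drop indentation and export
      eq = index(line, "=")
      if (eq == 0) next                       # not an assignment
      if (substr(line, 1, eq - 1) != k) next  # a different key
      v = substr(line, eq + 1)
      # strip one pair of surrounding double or single quotes
      if (v ~ /^".*"$/ || v ~ /^'\''.*'\''$/) v = substr(v, 2, length(v) - 2)
      val = v; found = 1                      # last assignment wins
    }
    END { if (found) print val }
  ' "$1"
}
```

Usage: `env_get "$STACK_DIR/.env" MONERO_PRUNE`, which pairs naturally with a truthiness check that accepts pithead's `1`/`0`.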
- docs/test-server-architecture.md: high-level structure (SSD-primary storage policy, sizing table, directory layout) + a "recreate on another box" runbook (clone -> config -> rsync the chains -> deploy), so migrating to a bigger box is a transfer + redeploy, with no re-sync.
- gouda-testbench-README.md: reframe as a dev/agent test platform (not a production miner; downtime/teardown are fine); update paths to ~/pithead and the decoupled chains at /srv/code/pithead/data.
- system-info.sh: read chain locations from .env (decoupled from the checkout); default STACK_DIR to ~/pithead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ins (/srv/code/pithead-data)

Reflect the clean re-homed layout in the build-server docs: checkout on the NVMe at /srv/code/pithead (= ~/code/pithead), chains decoupled at /srv/code/pithead-data, harness --dir code/pithead, system-info STACK_DIR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Benchmarks on the reference box show the '1TB SSD' is a SATA drive at ~37-98 MB/s (no /dev/nvme present). That bottlenecks monerod/builds and makes LMDB compaction impractical (~16h vs ~10min on NVMe). Document the finding, how to verify a disk is genuinely fast, and that a real M.2 PCIe NVMe is the #1 upgrade. The matching mdb_copy (LMDB 0.9.70) is staged for when a fast disk lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Closes part of #54 (the harness + full config matrix + lifecycle/edge slices; the `make release` wiring lands with #44).

## What

A new `tests/integration/` suite: the runtime/integration half of our testing. It drives a real, already-synced Pithead server through the config matrix and asserts the stack behaves. Today everything is client-side/unit (the `pithead` shell tests stub `docker`/`sudo`, the compose test only checks interpolation, the dashboard pytest mocks its clients); none of it proves that a real `apply → sync-gate → mine → status` flow works on a real host.

| File | Purpose |
|---|---|
| `tests/integration/run.sh` | Entry point: runs over SSH (or `--local`), iterates the matrix, asserts, captures artifacts, restores |
| `tests/integration/scenarios.sh` | Declarative config matrix (data, not code) |
| `tests/integration/lib.sh` | Target I/O, assertions, readiness waiters, config render, redaction |
| `tests/integration/selftest.sh` | Pure-logic self-test (no server); runs in CI on every PR |

## How it works
The box is assumed already deployed and synced with miners connected; the whole point of a dedicated test box is that the full Monero + Tari nodes are synced once and reused. So the harness moves between scenarios with non-interactive `pithead apply -y`: it recreates only changed containers, reuses the synced chain data dirs (never re-syncs or re-provisions Tor), and preserves secrets. It waits on real readiness signals (container health, `pithead status`, dashboard sync %, miner-released) with timeouts, never fixed sleeps. All reads happen on-box (`pithead status`/`doctor`, `curl …/api/state`), so SSH and `--local` behave identically.

It snapshots the box's original `config.json` up front and restores it at the end.

## What each scenario asserts
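- Containers: expected ones up, unexpected ones absent (no `monerod` in remote mode)
- `pithead status` exit code is `0` for a healthy config
- Monero synced, with the pruned/full display shown
- Sidechain selection: `pool.type` matches `p2pool.pool`
- End-to-end mining: workers online (`>= --workers`, default 2), stratum connected, hashes accumulating (#28, "Hashrate reflects zero and chart isn't painting")
- Posture propagated to `.env` (RPC bind, `DASHBOARD_SECURE`, `XVB_ENABLED`, `TARI_REQUIRED`); Caddy scheme matches `dashboard.secure`
- Idempotency: a second `apply -y` is a clean no-op
- Secrets preserved across applies

`--lifecycle` adds: restart, a sidechain-change `apply` (secrets preserved + dashboard reflects it), and node-down failover (#31): stop `monerod` → `status` non-zero + workers rejected → start → readmitted → `status` 0.

## Matrix coverage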
Covers the realistic combinations and guarantees that every value of every axis (`monero.mode`, `monero.prune`, `monero.rpc_lan_access`, `p2pool.pool` main/mini/nano, `xvb.enabled`, `dashboard.secure`, `dashboard.tari_required`) is exercised at least once; the selftest enforces this, and `run.sh --list` prints it.

## Safety (the box holds real keys)
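- The canonical chains are never mutated: the destructive prune axis only runs against a separate `--pruned-data-dir`/`--full-data-dir`; otherwise the case is SKIPPED with a reason, never silently dropped and never run against the canonical DB.
- Secrets are hashed on-box for comparison and redacted from all artifacts.
- Continue-on-error collects the whole matrix.

## Testing & wiring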
- `tests/integration/selftest.sh`: 48 pure-logic checks (config rendering & typing, profile-gated expectations, redaction, axis coverage, SSH/local exec, JSON parsing). It caught a real bug during development: `jq_get` used `// empty`, which silently swallows boolean `false` in jq (the `//` operator treats both `null` and `false` as missing), and that broke the full-vs-pruned axis. Fixed.
- `make test-integration` (live, manual; the blocking pre-release gate per #44, "Release & versioning structure (single-product releases via GHCR)") and `make test-integration-selftest`, which runs in CI (`shell` job). The live matrix stays a gated/manual self-hosted job (it needs the real box).
- `docs/integration-testing.md` (provisioning, safety model, matrix, artifacts, release wiring) + docs index / `releasing.md` status / `CHANGELOG` updates.

## Verification
- `shellcheck --severity=warning` clean across all stack + integration scripts
- `tests/integration/selftest.sh`: 48 passed
- `tests/stack/run.sh`: 72 passed (no regressions)

🤖 Generated with Claude Code