fix(replication): gate audits by mature repair proofs #106
Track key-specific repair hints per close group and require proof before auditing or prune-confirming a peer for a key. SemVer: patch
Pull request overview
This PR tightens replication audit and prune-confirmation eligibility by requiring key-specific “repair proof” evidence (tied to a close-group snapshot and local sync-cycle epoch) before a peer can be audited or used for prune-confirmation. It also adds lifecycle invalidation for those proofs (close-group changes, local key deletion, peer removal) and wires proof tracking through neighbor-sync, audit, and pruning.
Changes:
- Introduce RepairProofs to record per-(peer, key) replica-hint evidence, including maturity by local sync-cycle epoch and invalidation rules.
- Gate normal audits and prune-confirmation audits on mature repair proofs; clear proofs on record deletion and peer removal.
- Update replication internals and e2e tests to pass new context (repair proofs + epoch) through pruning and audit flows.
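A minimal sketch of the per-(peer, key) proof table described above, in Rust to match the codebase. The `PeerId`/`Key` aliases, the method names, and the maturity rule (`current_epoch > recorded_epoch`, i.e. at least one sync cycle completed after recording) are assumptions for illustration, not the PR's actual `RepairProofs` API. A proof recorded under a different close-group snapshot is treated as not mature, and `record` keeps the original epoch when the snapshot is unchanged so that re-sending the same hint every cycle does not reset maturity (also an assumed choice).

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-ins for the real peer-id and record-key types.
type PeerId = u64;
type Key = [u8; 32];

/// Evidence that a replica hint for one key was sent to one peer.
#[derive(Clone, Debug, PartialEq)]
struct RepairProof {
    /// Close-group membership for the key when the hint was sent.
    snapshot: BTreeSet<PeerId>,
    /// Local sync-cycle epoch at which the proof was recorded.
    recorded_epoch: u64,
}

#[derive(Default)]
struct RepairProofs {
    by_key: HashMap<Key, HashMap<PeerId, RepairProof>>,
}

impl RepairProofs {
    /// Record a proof; an existing proof under the same snapshot keeps its epoch.
    fn record(&mut self, peer: PeerId, key: Key, snapshot: BTreeSet<PeerId>, epoch: u64) {
        let peers = self.by_key.entry(key).or_default();
        let unchanged = peers.get(&peer).map_or(false, |p| p.snapshot == snapshot);
        if !unchanged {
            peers.insert(peer, RepairProof { snapshot, recorded_epoch: epoch });
        }
    }

    /// Mature (assumed rule): recorded under the current close group AND at
    /// least one full sync cycle has completed since recording.
    fn is_mature(&self, peer: PeerId, key: &Key, current: &BTreeSet<PeerId>, current_epoch: u64) -> bool {
        self.by_key
            .get(key)
            .and_then(|peers| peers.get(&peer))
            .map_or(false, |p| &p.snapshot == current && current_epoch > p.recorded_epoch)
    }

    /// Drop every proof for a key that was deleted locally.
    fn clear_key(&mut self, key: &Key) {
        self.by_key.remove(key);
    }

    /// Drop every proof naming a peer that was removed.
    fn remove_peer(&mut self, peer: PeerId) {
        for peers in self.by_key.values_mut() {
            peers.remove(&peer);
        }
        self.by_key.retain(|_, peers| !peers.is_empty());
    }
}
```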
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/e2e/replication.rs | Updates prune-pass e2e test to construct/seed RepairProofs and provide sync epoch context. |
| src/replication/types.rs | Adds RepairProofs data structure + unit tests for proof recording, maturity, and invalidation. |
| src/replication/pruning.rs | Refactors prune pass to take a context struct and requires mature proofs before prune-confirmation audits; clears proofs on deletion. |
| src/replication/neighbor_sync.rs | Returns NeighborSyncOutcome to track which replica hints were sent (for proof recording). |
| src/replication/mod.rs | Adds sync-cycle epoch + proof table to the engine, records sent hints as proofs, clears proofs on peer removal, and threads new parameters into audit/prune. |
| src/replication/audit.rs | Filters audited (peer, key) pairs to those with mature repair proofs. |
| docs/REPLICATION_DESIGN.md | Updates design doc invariants/rules to describe key-specific RepairProof and maturity gating. |
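The audit.rs change amounts to a filter over candidate (peer, key) pairs. A hedged sketch, assuming a flat `ProofTable` keyed by `(PeerId, Key)` that stores the recording snapshot and epoch; the table shape and `auditable_pairs` are hypothetical, and the real layout may differ:

```rust
use std::collections::{BTreeSet, HashMap};

type PeerId = u64;
type Key = u32; // hypothetical compact key handle

/// Hypothetical proof table: (peer, key) -> (close-group snapshot, recorded epoch).
type ProofTable = HashMap<(PeerId, Key), (BTreeSet<PeerId>, u64)>;

/// Keep only pairs whose proof was recorded under the key's current close
/// group and has survived at least one completed sync cycle.
fn auditable_pairs(
    candidates: &[(PeerId, Key)],
    proofs: &ProofTable,
    current_groups: &HashMap<Key, BTreeSet<PeerId>>,
    current_epoch: u64,
) -> Vec<(PeerId, Key)> {
    candidates
        .iter()
        .copied()
        .filter(|pair| match (proofs.get(pair), current_groups.get(&pair.1)) {
            (Some((snapshot, epoch)), Some(group)) => snapshot == group && current_epoch > *epoch,
            // No proof, or no known close group for the key: never audit.
            _ => false,
        })
        .collect()
}
```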
Invalidate key repair proofs when close-group membership changes and require a completed sync-cycle epoch before audits or prune-confirmation challenges can use a proof. SemVer: patch
Force-pushed from 99846c5 to 98c20ea.
Avoid cloning full neighbor-sync responses when recording repair hints, and carry close-group snapshots from hint construction into proof recording instead of performing another DHT lookup per hinted key. Keep the public audit, prune, and neighbor-sync entry points compatible while routing the engine through proof-aware internal variants. SemVer: patch
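One way to carry the close-group snapshot from hint construction into proof recording, so no second DHT lookup per hinted key is needed. `SentHint`, this field layout for `NeighborSyncOutcome`, and `proofs_to_record` are hypothetical; the sketch borrows snapshots from the outcome instead of cloning it.

```rust
use std::collections::BTreeSet;

type PeerId = u64;
type Key = u32;

/// Hypothetical: each hint remembers the close-group snapshot that justified it.
struct SentHint {
    peer: PeerId,
    key: Key,
    close_group: BTreeSet<PeerId>,
}

/// Hypothetical shape of the neighbor-sync outcome: the hints that were sent.
struct NeighborSyncOutcome {
    hints: Vec<SentHint>,
}

/// Produce (peer, key, snapshot, epoch) records straight from the outcome,
/// borrowing each snapshot rather than looking the close group up again.
fn proofs_to_record(outcome: &NeighborSyncOutcome, epoch: u64) -> Vec<(PeerId, Key, &BTreeSet<PeerId>, u64)> {
    outcome
        .hints
        .iter()
        .map(|h| (h.peer, h.key, &h.close_group, epoch))
        .collect()
}
```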
Document the send success boolean used by repair proof recording and make the conservative behavior of the legacy prune wrapper explicit. SemVer: patch
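The conservative behavior of the legacy prune wrapper can be illustrated as follows, assuming a hypothetical `PruneContext` that exposes the set of (peer, key) pairs holding mature proofs. Because the wrapper has no proofs to offer, it passes an empty set and therefore never selects any peer for prune-confirmation.

```rust
use std::collections::HashSet;

type PeerId = u64;
type Key = u32;

/// Hypothetical context for the proof-aware prune pass.
struct PruneContext<'a> {
    /// (peer, key) pairs that currently hold a mature repair proof.
    mature: &'a HashSet<(PeerId, Key)>,
}

/// Proof-aware variant: only peers with a mature proof for `key` may be
/// challenged to confirm a prune.
fn prune_confirm_targets(ctx: &PruneContext, key: Key, holders: &[PeerId]) -> Vec<PeerId> {
    holders.iter().copied().filter(|p| ctx.mature.contains(&(*p, key))).collect()
}

/// Legacy wrapper: it has no proof table, so it passes an empty one and
/// deliberately selects nobody.
fn prune_confirm_targets_legacy(key: Key, holders: &[PeerId]) -> Vec<PeerId> {
    let empty = HashSet::new();
    prune_confirm_targets(&PruneContext { mature: &empty }, key, holders)
}
```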
grumbach left a comment
Approving. Clean change: it replaces the weak peer-level RepairOpportunity with a per-(peer, key) RepairProof tied to a close-group snapshot and sync epoch, and the design-doc updates land in lockstep with the code. The audit trust-penalty flow now requires fresh responsibility confirmation AND a mature key-specific proof under the current snapshot, which is exactly the right gate.
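The gate described here is a conjunction of two checks. A small sketch with a hypothetical `PenaltyDecision` enum that also records which check blocked the penalty; the names are illustrative, not the PR's types:

```rust
/// Hypothetical outcome of the trust-penalty gate for a failed audit.
#[derive(Debug, PartialEq)]
enum PenaltyDecision {
    Penalize,
    SkipNotResponsible,
    SkipNoMatureProof,
}

/// Penalize only when the peer is freshly confirmed responsible for the key
/// AND a mature key-specific proof exists under the current close group.
fn penalty_decision(freshly_responsible: bool, mature_proof_under_current_snapshot: bool) -> PenaltyDecision {
    if !freshly_responsible {
        PenaltyDecision::SkipNotResponsible
    } else if !mature_proof_under_current_snapshot {
        PenaltyDecision::SkipNoMatureProof
    } else {
        PenaltyDecision::Penalize
    }
}
```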
Minor notes:
1. Memory bound. Doc-acknowledged worst case is local_key_count × CLOSE_GROUP_SIZE. Fine for normal nodes; at very large local stores (millions of keys × 7) it adds up. Worth confirming RepairProofs uses a compact internal layout (no full peer/key copies per entry) — happy to take your word for it if you've already eyeballed it.
2. Snapshot-invalidation under RT churn. Every close-group membership change for a key wipes all proofs for that key. On a busy/churny network this could leave audits idle longer than expected. Trades audit-grief-via-eviction for audit-stall-via-churn, which is probably the right call, but worth keeping an eye on once running on a real testnet.
3. audit_tick compat wrapper passes an empty proofs table → silently never audits. Any caller still on the old API stops auditing without surfacing a warning. Mark #[deprecated] (or just remove once the engine fully migrates) so we don't accidentally regress later.
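To put note 1 in numbers: the worst case is `local_key_count × CLOSE_GROUP_SIZE` entries, with `CLOSE_GROUP_SIZE = 7` as in the note. The sketch compares a hypothetical entry holding full 32-byte key and peer-id copies (the 32-byte peer id is an assumption) with a hypothetical compact entry of interned indices. Neither is confirmed to be the PR's layout, and hash-map overhead is ignored.

```rust
use std::mem::size_of;

/// Hypothetical compact entry: interned indices instead of full copies.
#[allow(dead_code)]
struct CompactProof {
    key_idx: u32,  // index into the local key store
    peer_idx: u16, // index into an interned peer table
    epoch: u32,    // sync-cycle epoch when recorded
}

/// Hypothetical non-compact entry: full 32-byte key and peer-id copies.
#[allow(dead_code)]
struct FullProof {
    key: [u8; 32],
    peer: [u8; 32],
    epoch: u64,
}

const CLOSE_GROUP_SIZE: u64 = 7;

/// Worst-case bytes for proof entries alone (no map overhead).
fn worst_case_bytes(local_key_count: u64, entry_size: usize) -> u64 {
    local_key_count * CLOSE_GROUP_SIZE * entry_size as u64
}
```

With 1,000,000 local keys, the full-copy layout needs 1,000,000 × 7 × 72 = 504,000,000 bytes for entries alone; a 12-byte compact entry brings the same worst case to 84,000,000 bytes.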
Ship it.
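For the compat-wrapper note, Rust's built-in `#[deprecated]` attribute turns every remaining call into a compile-time warning. A sketch in which `audit_tick_with_proofs` is a hypothetical name for the proof-aware entry point and the pair types are simplified:

```rust
use std::collections::HashSet;

type PeerId = u64;
type Key = u32;

/// Proof-aware entry point (hypothetical name): audits only pairs with a mature proof.
fn audit_tick_with_proofs(candidates: &[(PeerId, Key)], mature: &HashSet<(PeerId, Key)>) -> Vec<(PeerId, Key)> {
    candidates.iter().copied().filter(|pair| mature.contains(pair)).collect()
}

/// Compat wrapper: with an empty proof table it never selects anything, so
/// the deprecation warning makes that visible at every remaining call site.
#[deprecated(note = "passes an empty proof table and never audits; use audit_tick_with_proofs")]
fn audit_tick(candidates: &[(PeerId, Key)]) -> Vec<(PeerId, Key)> {
    audit_tick_with_proofs(candidates, &HashSet::new())
}
```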
Keep best-effort response call sites on a unit-returning helper while the repair-proof path calls the checked variant that reports send acceptance. SemVer: patch
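The split between best-effort and checked send helpers might look like this. `Transport`, `send_response_checked`, and `send_response` are hypothetical stand-ins; the point is that only the repair-proof path branches on the acceptance boolean, and a proof is recorded only when the send was accepted.

```rust
/// Hypothetical transport stand-in that either accepts or rejects sends.
struct Transport {
    accept: bool,
    sent: Vec<String>,
}

impl Transport {
    /// Checked variant: returns whether the send was accepted.
    fn send_response_checked(&mut self, msg: &str) -> bool {
        if self.accept {
            self.sent.push(msg.to_string());
        }
        self.accept
    }

    /// Best-effort variant for call sites that ignore the outcome.
    fn send_response(&mut self, msg: &str) {
        let _ = self.send_response_checked(msg);
    }
}

/// Repair-proof path: record a proof only when the hint send was accepted.
fn send_hint_and_maybe_record(t: &mut Transport, hint: &str, proofs: &mut Vec<String>) {
    if t.send_response_checked(hint) {
        proofs.push(hint.to_string());
    }
}
```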
Reconcile repair-proof snapshots by dropping peers that left the current close group while retaining mature proofs for peers that remained stable. SemVer: patch
Follow-up on snapshot-invalidation under RT churn: addressed in 425de06. RepairProofs now reconcile close-group snapshots by retaining mature proofs for peers that remain in the current close group and dropping only peers that left. That keeps the eviction/re-entry safety property without stalling audits for stable peers during unrelated churn.
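A sketch of the reconciliation step under stated assumptions: one key's proofs live in a `HashMap<PeerId, Proof>`, and the hypothetical `reconcile` runs on every close-group change for that key. Stable peers keep their original `recorded_epoch` (so a mature proof stays mature) and are re-anchored to the new snapshot; peers that left are dropped, so a peer that is evicted and later re-enters needs a fresh proof. Whether the commit also discards not-yet-mature proofs for stable peers is not stated; this sketch keeps them.

```rust
use std::collections::{BTreeSet, HashMap};

type PeerId = u64;

/// Hypothetical per-key proof entry.
#[derive(Clone, Debug, PartialEq)]
struct Proof {
    snapshot: BTreeSet<PeerId>,
    recorded_epoch: u64,
}

/// Drop proofs for peers no longer in `current`; keep the rest with their
/// original epoch and re-anchor them to the new snapshot.
fn reconcile(proofs: &mut HashMap<PeerId, Proof>, current: &BTreeSet<PeerId>) {
    proofs.retain(|peer, proof| {
        if current.contains(peer) {
            proof.snapshot = current.clone();
            true
        } else {
            false
        }
    });
}
```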
Summary:
SemVer: patch
Tests:
Note: full all-target clippy is still blocked locally by the existing duplicate saorsa_core path override in unstaged Cargo.toml/Cargo.lock.