fix(replication): gate audits by mature repair proofs#106

Merged: mickvandijke merged 9 commits into WithAutonomi:main from mickvandijke:fix/replication-repair-proof-epochs on May 21, 2026.
@mickvandijke
Collaborator

Summary:

  • Track repair proofs with close-group snapshots and local sync-cycle epochs.
  • Require mature key-specific proofs for normal audits and prune-confirmation audits.
  • Invalidate stale proofs on close-group changes, local key deletion, and peer removal so re-entry requires a fresh hint.

SemVer: patch

Tests:

  • cargo fmt --all -- --check
  • cargo check --lib
  • cargo test --lib replication::types::tests::repair_proofs
  • cargo test --lib replication::
  • cargo clippy --lib --all-features -- -D warnings

Note: full all-target clippy is still blocked locally by the existing duplicate saorsa_core path override in unstaged Cargo.toml/Cargo.lock.

Copilot AI review requested due to automatic review settings May 20, 2026 13:47
Track key-specific repair hints per close group and require proof before auditing or prune-confirming a peer for a key.

SemVer: patch

Copilot AI left a comment

Pull request overview

This PR tightens replication audit and prune-confirmation eligibility by requiring key-specific “repair proof” evidence (tied to a close-group snapshot and local sync-cycle epoch) before a peer can be audited or used for prune-confirmation. It also adds lifecycle invalidation for those proofs (close-group changes, local key deletion, peer removal) and wires proof tracking through neighbor-sync, audit, and pruning.

Changes:

  • Introduce RepairProofs to record per-(peer, key) replica-hint evidence, including maturity by local sync-cycle epoch and invalidation rules.
  • Gate normal audits and prune-confirmation audits on mature repair proofs; clear proofs on record deletion and peer removal.
  • Update replication internals and e2e tests to pass new context (repair proofs + epoch) through pruning and audit flows.
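The proof lifecycle described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from `src/replication/types.rs`: the `PeerId` and `Key` aliases, the field and method names, the "strictly later epoch" maturity rule, and the choice to keep the earliest recorded epoch are all assumptions made for the example.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; the crate's real peer and key types are not shown in this thread.
type PeerId = u64;
type Key = [u8; 32];

/// Evidence that a replica hint for a key was sent to a peer.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RepairProof {
    /// Local sync-cycle epoch at which the hint was sent.
    recorded_epoch: u64,
}

/// Per-(peer, key) proof table (assumed layout, for illustration only).
#[derive(Debug, Default)]
struct RepairProofs {
    proofs: HashMap<(PeerId, Key), RepairProof>,
}

impl RepairProofs {
    /// Record a proof; an existing entry keeps its earlier epoch (assumed policy).
    fn record(&mut self, peer: PeerId, key: Key, epoch: u64) {
        self.proofs
            .entry((peer, key))
            .or_insert(RepairProof { recorded_epoch: epoch });
    }

    /// Assumed rule: a proof is mature once the local epoch has advanced past
    /// the epoch in which it was recorded.
    fn is_mature(&self, peer: PeerId, key: &Key, current_epoch: u64) -> bool {
        self.proofs
            .get(&(peer, *key))
            .is_some_and(|p| current_epoch > p.recorded_epoch)
    }

    /// Local key deletion: drop every proof for that key.
    fn remove_key(&mut self, key: &Key) {
        self.proofs.retain(|(_, k), _| k != key);
    }

    /// Peer removal: drop every proof held for that peer.
    fn remove_peer(&mut self, peer: PeerId) {
        self.proofs.retain(|(p, _), _| *p != peer);
    }
}
```

Under these assumptions, a peer that is removed and later re-enters has no entry in the table, so it cannot be audited for a key until a fresh hint is recorded and a further epoch has passed.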

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/e2e/replication.rs | Updates prune-pass e2e test to construct/seed RepairProofs and provide sync epoch context. |
| src/replication/types.rs | Adds RepairProofs data structure + unit tests for proof recording, maturity, and invalidation. |
| src/replication/pruning.rs | Refactors prune pass to take a context struct and requires mature proofs before prune-confirmation audits; clears proofs on deletion. |
| src/replication/neighbor_sync.rs | Returns NeighborSyncOutcome to track which replica hints were sent (for proof recording). |
| src/replication/mod.rs | Adds sync-cycle epoch + proof table to the engine, records sent hints as proofs, clears proofs on peer removal, and threads new parameters into audit/prune. |
| src/replication/audit.rs | Filters audited (peer, key) pairs to those with mature repair proofs. |
| docs/REPLICATION_DESIGN.md | Updates design doc invariants/rules to describe key-specific RepairProof and maturity gating. |
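As a rough illustration of the audit-side change summarized for `src/replication/audit.rs`, the filter could look something like the sketch below. The function name `filter_auditable`, the type aliases, and the flat epoch map are hypothetical; the real code works against the engine's own proof table.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the crate's peer and key types.
type PeerId = u64;
type Key = [u8; 32];

/// Keep only the (peer, key) pairs whose proof was recorded in an earlier epoch.
/// Pairs with no proof, or with a proof from the current epoch, are skipped.
fn filter_auditable(
    candidates: &[(PeerId, Key)],
    recorded_epochs: &HashMap<(PeerId, Key), u64>,
    current_epoch: u64,
) -> Vec<(PeerId, Key)> {
    candidates
        .iter()
        .copied()
        .filter(|pair| {
            recorded_epochs
                .get(pair)
                .is_some_and(|&recorded| current_epoch > recorded)
        })
        .collect()
}
```

With an empty map the function returns an empty list, which mirrors the conservative behaviour discussed later in the thread for callers that supply no proofs.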

Comment thread src/replication/mod.rs Outdated
Comment thread src/replication/mod.rs Outdated
Comment thread src/replication/pruning.rs Outdated
Comment thread src/replication/neighbor_sync.rs
Comment thread src/replication/audit.rs
Invalidate key repair proofs when close-group membership changes and require a completed sync-cycle epoch before audits or prune-confirmation challenges can use a proof.

SemVer: patch
@mickvandijke mickvandijke force-pushed the fix/replication-repair-proof-epochs branch from 99846c5 to 98c20ea Compare May 20, 2026 14:01
Avoid cloning full neighbor-sync responses when recording repair hints, and carry close-group snapshots from hint construction into proof recording instead of performing another DHT lookup per hinted key.

Keep the public audit, prune, and neighbor-sync entry points compatible while routing the engine through proof-aware internal variants.

SemVer: patch
Copilot AI review requested due to automatic review settings May 20, 2026 14:40

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread src/replication/mod.rs Outdated
Comment thread src/replication/pruning.rs
Document the send success boolean used by repair proof recording and make the conservative behavior of the legacy prune wrapper explicit.

SemVer: patch
Copilot AI review requested due to automatic review settings May 20, 2026 16:43

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread src/replication/types.rs
Comment thread src/replication/pruning.rs
Copilot AI review requested due to automatic review settings May 20, 2026 17:18

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread src/replication/mod.rs Outdated
Collaborator

@grumbach grumbach left a comment

Approving. Clean change — replaces the weak peer-level RepairOpportunity with per-(peer, key) RepairProof tied to close-group snapshot + sync epoch, and the design-doc updates land in lockstep with the code. The audit trust-penalty flow now requires fresh responsibility confirmation AND a mature key-specific proof under the current snapshot, which is exactly the right gate.

Minor notes:

1. Memory bound. Doc-acknowledged worst case is local_key_count × CLOSE_GROUP_SIZE. Fine for normal nodes; at very large local stores (millions of keys × 7) it adds up. Worth confirming RepairProofs uses a compact internal layout (no full peer/key copies per entry) — happy to take your word for it if you've already eyeballed it.

2. Snapshot-invalidation under RT churn. Every close-group membership change for a key wipes all proofs for that key. On a busy/churny network this could leave audits idle longer than expected. Trades audit-grief-via-eviction for audit-stall-via-churn, which is probably the right call, but worth keeping an eye on once running on a real testnet.

3. audit_tick compat wrapper passes an empty proofs table → silently never audits. Any caller still on the old API stops auditing without surfacing a warning. Mark #[deprecated] (or just remove once the engine fully migrates) so we don't accidentally regress later.

Ship it.
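Note 3 in the review above might be addressed along these lines. Only the name `audit_tick` comes from the thread; its signature here, the `audit_tick_with_proofs` name, and the set-based proof table are assumptions made so the example compiles on its own.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for the crate's peer and key types.
type PeerId = u64;
type Key = [u8; 32];

/// Proof-aware entry point (hypothetical name): audits only pairs in `mature`.
fn audit_tick_with_proofs(
    candidates: &[(PeerId, Key)],
    mature: &HashSet<(PeerId, Key)>,
) -> Vec<(PeerId, Key)> {
    candidates
        .iter()
        .copied()
        .filter(|pair| mature.contains(pair))
        .collect()
}

/// Legacy wrapper: supplies an empty proof table, so it never selects a pair.
/// The attribute makes the compiler warn at every remaining call site.
#[deprecated(note = "passes an empty proof table and never audits; use audit_tick_with_proofs")]
fn audit_tick(candidates: &[(PeerId, Key)]) -> Vec<(PeerId, Key)> {
    audit_tick_with_proofs(candidates, &HashSet::new())
}
```

The `#[deprecated]` attribute leaves the wrapper's behaviour unchanged; it only turns the silent no-op into a compile-time warning for callers still on the old API.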

Keep best-effort response call sites on a unit-returning helper while the repair-proof path calls the checked variant that reports send acceptance.

SemVer: patch
Reconcile repair-proof snapshots by dropping peers that left the current close group while retaining mature proofs for peers that remained stable.

SemVer: patch
@mickvandijke
Collaborator Author

Follow-up on snapshot-invalidation under RT churn: addressed in 425de06. RepairProofs now reconcile close-group snapshots by retaining mature proofs for peers that remain in the current close group and dropping only peers that left. That keeps the eviction/re-entry safety property without stalling audits for stable peers during unrelated churn.
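The reconciliation described in this follow-up can be pictured with a small sketch. It is not the code from 425de06: the `KeyProofs` struct, its fields, and the `reconcile` method are hypothetical, and the sketch keeps every proof for a peer that stayed in the close group, since the thread does not say how immature proofs are treated.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for the crate's peer type.
type PeerId = u64;

/// Proof state for a single key (assumed layout, for illustration only).
#[derive(Debug, Default)]
struct KeyProofs {
    /// Close-group members observed when the proofs were last reconciled.
    snapshot: HashSet<PeerId>,
    /// Epoch at which each peer's hint was recorded.
    recorded: HashMap<PeerId, u64>,
}

impl KeyProofs {
    /// Adopt `current` as the new snapshot. Proofs for peers that left the
    /// close group are dropped; proofs for peers that stayed are kept.
    fn reconcile(&mut self, current: &HashSet<PeerId>) {
        self.recorded.retain(|peer, _| current.contains(peer));
        self.snapshot = current.clone();
    }
}
```

Compared with wiping all proofs for a key on any membership change, this keeps stable peers auditable during unrelated churn, while a peer that left and re-entered still has no proof until a fresh hint is recorded.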

@mickvandijke mickvandijke merged commit 5a052fb into WithAutonomi:main May 21, 2026
12 of 13 checks passed
3 participants