docs(design): proposed txn secondary-commit idempotency (Jepsen 26198185540) by bootjp · Pull Request #796 · bootjp/elastickv

bootjp · 2026-05-21T13:33:43Z

Summary

Design doc for fixing the :duplicate-elements anomaly surfaced by scheduled Jepsen run 26198185540.

What's in the doc

Root cause walkthrough: a single :append 14 230 from worker 3 produced TWO storage entries with value 230, immediately visible in the next read. Same-commit_ts replay is dedup'd by content-addressed list-delta keys; different-commit_ts replay (regenerated by runTransaction's retryRedisWrite on partial-commit) is NOT, because RPUSH/LPUSH derive their seq from a fresh meta read at attempt time.
TiKV reference: Percolator's three layers — content-addressed writes, CheckTxnStatus before retry, LockResolver — and which one elastickv is missing.
Proposed fix: server-side per-txn applied marker keyed by adapter-allocated UUIDv7, written atomically with the commit mutations and checked at FSM apply entry. Optional client-side commit-status check (TiKV-style CheckTxnStatus) as a follow-up.
Six implementation milestones with scope estimates.
Caller audit table covering every commit-request entry point.
Four risks with mitigations.

Why now / why this layer

PR fix(redis): NOTLEADER prefix for gRPC-wrapped leader errors (closes Jepsen run 26035515694 crash) #789 (NOTLEADER prefix) unmasked this anomaly — workers no longer crash, so the :duplicate-elements is now visible to the checker rather than being silently classified as :info.
A pure client-side fix is not viable: the trigger spans runTransaction's retry loop AND the gRPC Forward redirect, and no client-supplied txn ID exists. Server-side marker is the natural enforcement point.
Existing applyCommitWithIdempotencyFallback only catches same-commit_ts replay; the regenerated-input replay shape needs a logical-txn-identity marker.

Companion PR

PR #795 ("upload full demo cluster log on Jepsen failure") is open in parallel. The design here is independent of which exact code path produces the partial-commit (that's what PR #795 will let us pinpoint); the fix works for any such path.

Status

Open for review. Implementation milestones (M1–M6) follow once this lands.

…6198185540) Triggered by the scheduled Jepsen run 26198185540 surfacing a real :duplicate-elements anomaly on the Redis list-append workload. The duplicate is a single client-level :append being applied twice in storage with DIFFERENT commit_ts: the EXEC body's retry loop (runTransaction's retryRedisWrite) regenerates startTS / commitTS / list index on each iteration, so content- addressing — which catches identical-input replay — does not catch the regenerated-input retry-after-partial-success shape. Design proposes a server-side per-txn applied marker keyed by an adapter-allocated UUIDv7, written atomically with the commit mutations and checked at FSM apply entry. TiKV's CheckTxnStatus + content-addressed Write CF design is the reference; this proposal lifts the same correctness guarantee into the FSM layer because elastickv's RPUSH/LPUSH-derived storage keys are NOT pure content-addressed (the seq number depends on a meta read at attempt time). Doc covers: - Why the same-commit_ts replay does NOT duplicate (content-addressed delta keys) - Why the different-commit_ts replay DOES duplicate (runTransaction's EXEC-body retry regenerates everything) - Why applyCommitWithIdempotencyFallback does NOT catch this (works only on identical-input replay) - How TiKV's three layers (content-addressed writes + CheckTxnStatus + LockResolver) compose, and which one elastickv is missing - Server-side marker design (key prefix, GC TTL, batch atomicity) - Optional client-side commit-status check - Six implementation milestones (M1 proto/UUID, M2 FSM marker, M3 adapter sentinel handling, M4 GC, M5 optional check, M6 Jepsen validation) - Caller audit table covering all commit-request entry points - Four risks (GC timing, proto overhead, reconstruction intricacy, LockResolver interaction) The doc is independent of PR #795 (full Jepsen log artifact); the fix design works for any partial-commit code path that might be the actual trigger.

coderabbitai · 2026-05-21T13:33:53Z

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 27 minutes and 38 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 20648c60-3471-4b8c-bad3-8fc3852e0fa6

📥 Commits

Reviewing files that changed from the base of the PR and between fe96c88 and 22753ab.

📒 Files selected for processing (1)

docs/design/2026_05_21_proposed_txn_secondary_idempotency.md

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch docs/txn-idempotency-design

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This design document proposes a server-side transaction-applied marker and an optional client-side check to ensure exactly-once application of multi-shard transaction commits, addressing a duplicate-element anomaly found during Jepsen testing. The proposed solution involves using a UUIDv7 as a transaction identifier to detect and handle retries that might otherwise result in duplicate writes. Review feedback highlights the need for the new error sentinel to include the committed timestamp to aid result reconstruction, suggests optimizing the marker garbage collection by leveraging the time-ordered nature of UUIDv7 with DeleteRange, and warns about the operational risks of adding synchronous reads to the secondary commit retry path.

gemini-code-assist · 2026-05-21T13:36:22Z

+        // Primary already committed. If our secondary's marker is
+        // present, this is a true no-op — return success. Otherwise
+        // retry the secondary commit (it really did fail).
+        if secondary_marker_present(adapter_txn_id) { return Ok }


The secondary_marker_present check in milestone M5 introduces a significant operational risk: a synchronous read from the secondary shard before every retry. If the secondary shard is experiencing issues (e.g., leader election or network partition), this check will likely fail or time out, delaying the retry loop and impacting availability. As per repository guidelines for operational risks, the design should detail mitigation strategies; a 'blind' retry might be more robust as it avoids the added latency and dependency in the failure path.

References

When a design document identifies a significant operational risk, it must also detail potential mitigation strategies to ensure system stability.

gemini-code-assist · 2026-05-21T13:36:22Z

+2. If present and the recorded commit_ts equals the request's commit_ts:
+   no-op (return nil). The mutations would be a duplicate.
+3. If present and the recorded commit_ts **differs**: return
+   `ErrTxnAlreadyCommittedAtDifferentTS` (a new sentinel). This catches the


To facilitate the result reconstruction described in milestone M3, the ErrTxnAlreadyCommittedAtDifferentTS sentinel should carry the recorded commit_ts in its response payload. Since the adapter regenerates startTS on every retry (as noted in the walkthrough), it will not know the original startTS used by the successful attempt and thus cannot look up the primary's commit record via txnCommitKey(primary_key, original_start_ts). Returning the commit_ts directly from the FSM marker eliminates this lookup dependency and supports smoother rolling upgrades by reducing the need for consistent state lookups across different node versions.

References

Design documents should address operational risks such as rolling upgrades by detailing strategies that ensure compatibility across different node versions.

gemini-code-assist · 2026-05-21T13:36:22Z

+
+A conservative TTL of `2 × (lockResolverInterval + defaultTxnLockTTLms)`
+= 80 s covers both. Cleanup runs in `kv/compactor.go`'s existing periodic
+sweep — add a new prefix scan that drops markers whose `applied_at` is


Since adapter_txn_id is a UUIDv7, it is naturally time-ordered (the most significant 48 bits contain a millisecond timestamp). Instead of a prefix scan that reads every value to check applied_at_wall_time, the compactor can use a more efficient DeleteRange operation. By calculating the UUIDv7 prefix corresponding to the TTL cutoff, you can delete all expired markers in a single atomic operation without the O(N) read overhead during the periodic sweep.

gemini-code-assist Bot reviewed May 21, 2026

View reviewed changes

Merge branch 'main' into docs/txn-idempotency-design

22753ab

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(design): proposed txn secondary-commit idempotency (Jepsen 26198185540)#796

docs(design): proposed txn secondary-commit idempotency (Jepsen 26198185540)#796
bootjp wants to merge 2 commits into
mainfrom
docs/txn-idempotency-design

bootjp commented May 21, 2026

Uh oh!

coderabbitai Bot commented May 21, 2026 •

edited

Loading

Rate limit exceeded

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

gemini-code-assist Bot May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bootjp commented May 21, 2026

Summary

What's in the doc

Why now / why this layer

Companion PR

Status

Uh oh!

coderabbitai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 21, 2026 •

edited

Loading