Skip to content

feat: aborted-transaction retry policy and client integration#18

Open
mlwelles wants to merge 3 commits into
matthewmcneely:mainfrom
mlwelles:feature/retry-record
Open

feat: aborted-transaction retry policy and client integration#18
mlwelles wants to merge 3 commits into
matthewmcneely:mainfrom
mlwelles:feature/retry-record

Conversation

@mlwelles

@mlwelles mlwelles commented Jun 4, 2026

Copy link
Copy Markdown

What this adds

An opt-in retry layer for aborted Dgraph transactions. When concurrent
writers conflict on an indexed predicate, Dgraph aborts one of the transactions
with dgo.ErrAborted. Without a retry, that surfaces to the caller as a failed
write even though retrying would succeed.

  • RetryPolicy / DefaultRetryPolicy — exponential backoff with jitter,
    modeled after dgraph4j's policy.
  • Client.WithRetry(ctx, policy, fn) — runs fn, retrying on aborts per the
    policy; returns non-abort errors immediately; honors context cancellation
    during backoff.
err := client.WithRetry(ctx, modusgraph.DefaultRetryPolicy, func() error {
    return client.Insert(ctx, &entity)
})

Self-contained: retry.go plus the WithRetry method on Client. Builds and
tests green. The bulk-loader PR (#19) stacks on this.

The problem it solves

Concurrent inserts on the same indexed predicate race at commit time; Dgraph
lets one win and aborts the rest. The aborting transaction is safe to retry
against a fresh snapshot. WithRetry encapsulates that loop — backoff schedule,
jitter to avoid a thundering herd, and a retry ceiling — so callers wrap their
mutation once instead of hand-rolling retry logic at every write site.

Review round — changes since first push

Addresses the automated review (cubic):

  • Jitter could exceed MaxDelay (P2): delay() applied the MaxDelay cap
    before adding jitter, so the result could reach MaxDelay * (1 + Jitter)
    violating the documented "no single delay exceeds MaxDelay." Jitter is now
    added before a final clamp, and the exponential is capped before the shift so
    a large attempt count cannot overflow.
  • Negative MaxRetries skipped fn (P2): for attempt := range MaxRetries + 1
    ran zero times when MaxRetries < 0, falling through to the unreachable
    panic. WithRetry now clamps a negative MaxRetries to zero, so fn
    always runs at least once as documented.

Tests

  • TestRetryPolicyDelayJitterNeverExceedsMaxCap — asserts the cap invariant
    across attempts under large jitter.
  • TestWithRetryNegativeMaxRetries — asserts fn runs once with MaxRetries=-1.
  • TestWithRetryContextCancellation — rewritten to return dgo.ErrAborted from
    fn, so it actually enters the backoff sleep and exercises the ctx.Done()
    path it claims to cover (it previously returned ctx.Err() and returned
    before any backoff).

Documentation

Runnable, compile-checked examples for WithRetry and a custom RetryPolicy.

@mlwelles mlwelles requested a review from matthewmcneely as a code owner June 4, 2026 17:11
@mlwelles mlwelles force-pushed the feature/retry-record branch from 21760f3 to 67c0b31 Compare June 4, 2026 17:12

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 2 files

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread retry.go
Comment thread retry.go Outdated
Add RetryPolicy / DefaultRetryPolicy and a runner that re-executes a
function on aborted Dgraph transactions with exponential backoff
(retry.go), exposed on the client via a WithRetry method.
@mlwelles mlwelles force-pushed the feature/retry-record branch from 67c0b31 to b477c28 Compare June 4, 2026 17:24
@mlwelles mlwelles changed the title feat: aborted-transaction retry policy and runner feat: aborted-transaction retry policy and client integration Jun 4, 2026
…Retries

Addresses review feedback on the retry policy:

- delay() now adds jitter before the final MaxDelay clamp, so the documented
  invariant "no single delay exceeds MaxDelay" holds. Previously the cap was
  applied before jitter, letting the delay reach MaxDelay*(1+Jitter). The
  exponential is also capped before the shift to avoid overflow at large
  attempt counts.
- WithRetry clamps a negative MaxRetries to zero so fn always runs at least
  once, as documented, instead of skipping the loop and hitting the
  unreachable panic.

Tests:
- TestRetryPolicyDelayJitterNeverExceedsMaxCap asserts the cap invariant
  across attempts with large jitter.
- TestWithRetryNegativeMaxRetries asserts fn runs once with MaxRetries=-1.
- TestWithRetryContextCancellation now returns dgo.ErrAborted from fn so it
  actually enters the backoff sleep and exercises the ctx.Done() path it
  claims to cover (previously it returned ctx.Err() and bailed immediately).

Docs: add runnable examples for WithRetry and a custom RetryPolicy.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="retry_example_test.go">

<violation number="1" location="retry_example_test.go:20">
P2: Example function name does not match exported method: ExampleClient_withRetry should be ExampleClient_WithRetry to properly associate with Client.WithRetry in godoc.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread retry_example_test.go
// predicate — is retried with exponential backoff instead of surfacing to the
// caller. Non-abort errors return immediately; the context bounds the total
// wait.
func ExampleClient_withRetry() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Example function name does not match exported method: ExampleClient_withRetry should be ExampleClient_WithRetry to properly associate with Client.WithRetry in godoc.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At retry_example_test.go, line 20:

<comment>Example function name does not match exported method: ExampleClient_withRetry should be ExampleClient_WithRetry to properly associate with Client.WithRetry in godoc.</comment>

<file context>
@@ -0,0 +1,53 @@
+// predicate — is retried with exponential backoff instead of surfacing to the
+// caller. Non-abort errors return immediately; the context bounds the total
+// wait.
+func ExampleClient_withRetry() {
+	client, _ := modusgraph.NewClient("dgraph://localhost:9080")
+	defer client.Close()
</file context>
Suggested change
func ExampleClient_withRetry() {
func ExampleClient_WithRetry() {

Drop the int->uint shift conversion (Go allows a signed shift count, and
attempt is bounded to [0,63)), clearing gosec G115. Annotate the jitter RNG
with nolint:gosec — backoff jitter is not security-sensitive, so math/rand/v2
is appropriate. No behavior change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant