feat: surface transaction abort reason on AbortedError #307

Open
rahst12 wants to merge 1 commit into dgraph-io:main from rahst12:txn-abort-reason-surface-phase-1

rahst12 commented Jun 17, 2026

Problem

When Dgraph aborts a transaction, pydgraph collapses every cause into a single opaque
exception:

pydgraph.errors.AbortedError: Transaction has been aborted. Please retry

AbortedError exposes no way to tell why the transaction aborted, so an application
cannot distinguish the cases that warrant different responses:

  • a write-write conflict — retry immediately with a fresh transaction;
  • a predicate move — a predicate is relocating between groups and commits on it are
    temporarily blocked, so back off and retry once the move completes;
  • a stale start-ts — the transaction predates the current Zero leader (a leader
    change); retry with a fresh transaction.

The server now reports the category (see the companion dgraph PR), encoding it as a
"<code>: <detail>" prefix on the gRPC ABORTED status. Previously pydgraph discarded
even the server's message — it raised a bare AbortedError with a hardcoded default
string — so that information never reached the caller at all.

Fix

Surface the category as a typed attribute on AbortedError, parsed from the message the
server already sends.

  • AbortReason enum: CONFLICT, PREDICATE_MOVE, STALE_STARTTS, UNKNOWN
    (exported from the top-level pydgraph package).
  • AbortedError.reason — the parsed AbortReason, derived via parse_abort_reason()
    from the "<code>: <detail>" prefix. The full server text remains available via
    str(error).
  • Plumbing — the sync and async commit/mutate paths (txn.py, async_txn.py) now
    pass the server's message into AbortedError instead of raising a reasonless default;
    util.abort_error_message() extracts it from the gRPC error (preferring
    error.details(), falling back to str(error)).
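
As a rough illustration of the pieces listed above, the sketch below shows one way the enum, the prefix parser, and the message extraction could fit together. It is not the PR's code: the wire codes ("conflict", "predicate_move", "stale_startts") are placeholders, since this description does not spell out the strings the server sends, and the class and function bodies are assumptions that only mirror the names mentioned here.

```python
import enum


class AbortReason(enum.Enum):
    """Abort categories named in this PR; the string values are hypothetical."""

    CONFLICT = "conflict"              # assumed wire code
    PREDICATE_MOVE = "predicate_move"  # assumed wire code
    STALE_STARTTS = "stale_startts"    # assumed wire code
    UNKNOWN = "unknown"


def parse_abort_reason(message):
    """Map a "<code>: <detail>" message to an AbortReason, else UNKNOWN."""
    if not message or ":" not in message:
        return AbortReason.UNKNOWN
    code = message.split(":", 1)[0].strip().lower()
    for reason in AbortReason:
        if reason is not AbortReason.UNKNOWN and reason.value == code:
            return reason
    return AbortReason.UNKNOWN


class AbortedError(Exception):
    """Stand-in for pydgraph.errors.AbortedError with a parsed reason."""

    def __init__(self, message="Transaction has been aborted. Please retry"):
        super().__init__(message)
        self.reason = parse_abort_reason(message)


def abort_error_message(error):
    """Prefer the gRPC status details, falling back to str(error)."""
    details = getattr(error, "details", None)
    if callable(details):
        text = details()
        if text:
            return text
    return str(error)
```

In this sketch an unrecognized or missing prefix falls through to UNKNOWN, which matches the degradation behaviour described for older servers.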

Backward compatible by design:

  • Against an older server that reports no categorized prefix, reason is
    AbortReason.UNKNOWN, so callers degrade gracefully.
  • The exception type is unchanged and is still raised from the same paths, so existing
    except AbortedError / retry logic is unaffected.

Tests

  • tests/test_abort_reason.py — unit tests for parse_abort_reason() and
    AbortedError.reason: each category parses correctly, and an empty or prefix-less
    message degrades to UNKNOWN.
  • tests/test_abort_reason_live.py — end-to-end test that drives a real (locally
    patched) Dgraph cluster, forces each abort category, and asserts it propagates to
    AbortedError.reason. Skips gracefully when cluster prerequisites are unavailable.
    tests/docker-compose.multigroup.yml stands up the multi-group cluster needed for the
    predicate-move case.

Future work

This change surfaces the category only — the slice the server provides today. Once the
server enriches aborts with the contended predicate and UID/token (planned dgraph
follow-up via the gRPC rich-error model, then a first-class TxnContext field), this
client can add typed accessors (e.g. conflict_predicates) without breaking the reason
attribute introduced here. A later phase may also expand AbortReason (e.g. splitting
the cases reported as PREDICATE_MOVE that are not actual moves into
PREDICATE_UNAVAILABLE / INTERNAL); unknown codes already map to UNKNOWN, so adding
values stays backward compatible.

Example

Before, every abort looked the same — there was no way to branch on the cause:

try:
    txn.commit()
except pydgraph.AbortedError:
    retry_with_new_txn()  # the only option, regardless of why it aborted

Now the category is available via error.reason:

import logging

import pydgraph

try:
    txn.mutate(set_obj=data)  # or txn.commit()
except pydgraph.AbortedError as error:
    if error.reason is pydgraph.AbortReason.PREDICATE_MOVE:
        backoff_and_retry()   # predicate relocating — back off, then retry
    elif error.reason is pydgraph.AbortReason.UNKNOWN:
        logging.warning("Txn aborted: %s", error)  # full text still available
        retry_with_new_txn()
    else:  # CONFLICT or STALE_STARTTS — retry with a fresh txn
        retry_with_new_txn()
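
The branching above can be folded into a small retry helper. The following is a minimal sketch under stated assumptions: AbortReason and AbortedError are local stand-ins for the pydgraph classes this PR describes, and run_with_retry, its parameters, and the delay values are hypothetical, chosen only to illustrate backing off on a predicate move while retrying other aborts immediately.

```python
import enum
import time


class AbortReason(enum.Enum):
    # Stand-in for pydgraph.AbortReason as described in this PR.
    CONFLICT = enum.auto()
    PREDICATE_MOVE = enum.auto()
    STALE_STARTTS = enum.auto()
    UNKNOWN = enum.auto()


class AbortedError(Exception):
    # Stand-in for pydgraph.AbortedError carrying a parsed reason.
    def __init__(self, message, reason=AbortReason.UNKNOWN):
        super().__init__(message)
        self.reason = reason


def run_with_retry(work, max_attempts=5, move_backoff=0.5, sleep=time.sleep):
    """Call work() until it succeeds or max_attempts is exhausted.

    work must open a fresh transaction on every call. A predicate move
    triggers a growing back-off; every other reason retries immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except AbortedError as error:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the last abort
            if error.reason is AbortReason.PREDICATE_MOVE:
                sleep(move_backoff * attempt)  # linear back-off (assumed policy)
```

Taking sleep as a parameter keeps the helper testable without real delays; a production policy would likely also cap the total wait and log UNKNOWN aborts.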

@rahst12 rahst12 requested a review from a team as a code owner June 17, 2026 05:58
CLAassistant commented Jun 17, 2026

CLA assistant check
All committers have signed the CLA.
