fix(broadcast): bail on confirmed-with-err to avoid CF 504#302

Merged
rz1989s merged 1 commit into main from fix/issue-299-bail-on-confirmed-with-err
May 23, 2026
@rz1989s rz1989s commented May 23, 2026

Summary

Closes #299. When a transaction is confirmed on-chain but the program returns an error (e.g. AccountNotInitialized, slippage exceeded), sendAndConfirmWithRetry used to discard value.err from confirmTransaction and return the signature as a success. If the WS subscription was slow to fire, confirmTransaction polled until blockhash expiry (~90s), which could push the request past the 100s Cloudflare edge timeout. The user then saw a generic CF 504 HTML page instead of a structured error envelope.

This PR implements Option 3 from the issue:

  1. confirmInspected: wraps connection.confirmTransaction and throws a new TransactionFailedOnChainError when value.err is non-null, capturing both the signature and the err detail.
  2. pollForErr: runs in parallel, calling getSignatureStatuses every interval and returning the same typed error as soon as the RPC reports confirmed-with-err.

Whichever path settles first wins the race. /api/tx/broadcast catches the typed error and returns a structured 502 TX_FAILED_ON_CHAIN with the err payload so the FE can render an actionable message.
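For readers who have not opened the diff, the two helpers might look roughly like the sketch below. It is not the merged code: RpcLike is a hypothetical, simplified stand-in for the @solana/web3.js Connection (the real confirmTransaction takes a strategy and commitment), and the isStopped callback and intervalMs parameter are assumptions about how the stop flag and interval are passed in.

```typescript
// Hypothetical, simplified shapes. The real types live in @solana/web3.js.
type TxErr = unknown;

interface RpcLike {
  confirmTransaction(signature: string): Promise<{ value: { err: TxErr | null } }>;
  getSignatureStatuses(
    signatures: string[],
  ): Promise<{ value: Array<{ err: TxErr | null } | null> }>;
}

class TransactionFailedOnChainError extends Error {
  signature: string;
  err: TxErr;

  constructor(signature: string, err: TxErr) {
    super(
      `Transaction ${signature} confirmed on-chain but the program returned an error: ` +
        JSON.stringify(err),
    );
    this.name = "TransactionFailedOnChainError";
    this.signature = signature;
    this.err = err;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, TransactionFailedOnChainError.prototype);
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves with the signature on a clean confirmation; throws when value.err is set.
async function confirmInspected(rpc: RpcLike, signature: string): Promise<string> {
  const { value } = await rpc.confirmTransaction(signature);
  if (value.err !== null) {
    throw new TransactionFailedOnChainError(signature, value.err);
  }
  return signature;
}

// Returns the typed error once a status carries err; returns null only after stop.
async function pollForErr(
  rpc: RpcLike,
  signature: string,
  intervalMs: number,
  isStopped: () => boolean,
): Promise<TransactionFailedOnChainError | null> {
  while (!isStopped()) {
    const { value } = await rpc.getSignatureStatuses([signature]);
    const status = value[0]; // null means the RPC has not seen the tx yet
    if (status && status.err !== null) {
      return new TransactionFailedOnChainError(signature, status.err);
    }
    await sleep(intervalMs);
  }
  return null;
}
```

A null status (tx not yet seen) and a status with err set to null both keep the loop quiet, which matches test path (c) in the test plan.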

The poll loop is fire-and-forget on cleanup so successful broadcasts do not pay an extra interval (2s in prod) of latency waiting for the poll to observe stopped=true.
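The race and its cleanup can be sketched as follows. This is an illustration under assumptions, not the merged implementation: raceConfirmAgainstPoll and its two callback parameters are hypothetical names, and only the winner.kind / errOrNull shape and the stopped flag are taken from this PR's notes.

```typescript
type Winner<E> =
  | { kind: "confirmed"; signature: string }
  | { kind: "polled"; errOrNull: E | null };

// Hypothetical helper: races a confirmation promise against an error poll.
async function raceConfirmAgainstPoll<E extends Error>(
  confirm: () => Promise<string>,
  poll: (isStopped: () => boolean) => Promise<E | null>,
): Promise<string> {
  let stopped = false;

  const confirmed = confirm().then(
    (signature): Winner<E> => ({ kind: "confirmed", signature }),
  );
  const polled = poll(() => stopped).then(
    (errOrNull): Winner<E> => ({ kind: "polled", errOrNull }),
  );

  try {
    const winner = await Promise.race([confirmed, polled]);
    if (winner.kind === "confirmed") {
      return winner.signature;
    }
    if (winner.errOrNull === null) {
      // The poll should only return null after stopped=true, which is set below.
      throw new Error("invariant violated: poll returned null before stop");
    }
    throw winner.errOrNull;
  } finally {
    // Fire-and-forget: flip the flag but do not await `polled`, so a successful
    // confirmation is not delayed by up to one poll interval.
    stopped = true;
  }
}
```

Because Promise.race subscribes to both promises, a late rejection from the losing side is already handled and does not surface as an unhandled rejection. The poll loop simply exits the next time it checks the flag.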

Response envelope on a confirmed-with-err tx

Before:

  • Best case: 200 { signature } with a signature whose tx errored — FE proceeded as if success.
  • Worst case: CF 504 HTML page after 100s.

After:

{
  "error": {
    "code": "TX_FAILED_ON_CHAIN",
    "message": "Transaction <sig> confirmed on-chain but the program returned an error: {\"InstructionError\":[0,{\"Custom\":3012}]}",
    "signature": "<sig>",
    "err": { "InstructionError": [0, { "Custom": 3012 }] }
  }
}
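A minimal sketch of how a route could turn the typed error into this envelope is shown below. toTxFailedEnvelope is a hypothetical helper name and the error class is re-declared only to keep the example self-contained; the real route also passes the message through redact(), which is omitted here.

```typescript
class TransactionFailedOnChainError extends Error {
  signature: string;
  err: unknown;

  constructor(signature: string, err: unknown) {
    super(
      `Transaction ${signature} confirmed on-chain but the program returned an error: ` +
        JSON.stringify(err),
    );
    this.name = "TransactionFailedOnChainError";
    this.signature = signature;
    this.err = err;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, TransactionFailedOnChainError.prototype);
  }
}

interface TxFailedResponse {
  status: 502;
  body: {
    error: {
      code: "TX_FAILED_ON_CHAIN";
      message: string;
      signature: string;
      err: unknown;
    };
  };
}

// Hypothetical helper: returns the 502 envelope for the typed error,
// or null so the caller can fall through to its existing error handling.
function toTxFailedEnvelope(e: unknown): TxFailedResponse | null {
  if (!(e instanceof TransactionFailedOnChainError)) {
    return null;
  }
  return {
    status: 502,
    body: {
      error: {
        code: "TX_FAILED_ON_CHAIN",
        message: e.message,
        signature: e.signature,
        err: e.err,
      },
    },
  };
}
```

Returning null for every other error keeps this mapping narrow, so unrelated failures continue through whatever handling the route already has.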

Test plan

  • pnpm test -- --run in packages/agent — 1652 passed (was 1648, +4 new tests: 3 in sendWithRetry, 1 in tx-broadcast)
  • pnpm typecheck clean across root + sdk + app + agent
  • New tests cover three paths: (a) confirmTransaction returns { value: { err } } → throws; (b) getSignatureStatuses detects err first while confirmTransaction hangs → throws; (c) poll stays quiet on null (tx-not-yet-seen) status while confirmation resolves cleanly.
  • routes/tx-broadcast test asserts the structured 502 TX_FAILED_ON_CHAIN envelope shape (code, message, signature, err).
  • Post-merge: re-run /tmp/sipher-smoke/smoke5.mjs once VPS deploys — expect /api/tx/broadcast to return the new 502 envelope instead of 200 for the confirmed-with-3012 signature.

Notes

  • The winner.kind === 'polled' && !winner.errOrNull branch is structurally unreachable (poll only returns null when stopped=true, set in finally); the code throws an invariant error if it ever fires.
  • redact() is reused on err.message to ensure Helius API-key fragments don't escape if a future code path attaches them.
  • Smoke evidence from frontier_sip_18 + 19: confirmTransaction typically resolves in ~3s for confirmed-with-err txs, so this is primarily a correctness fix; the parallel poll adds defense-in-depth for the slow-subscription edge case the issue called out.

When a tx is confirmed on-chain but the program returns an error (e.g.
AccountNotInitialized, slippage exceeded), sendAndConfirmWithRetry used
to discard the value.err from confirmTransaction and return the
signature as success. If the WS subscription was slow to fire,
confirmTransaction would poll until blockhash expiry, which could push
the request past the 100s Cloudflare edge timeout. The user would then
see a generic CF 504 HTML page instead of a structured error envelope.

Two changes implement issue #299 Option 3:

1. confirmInspected wraps connection.confirmTransaction and throws a
   new TransactionFailedOnChainError when value.err is non-null,
   capturing both the signature and the err detail.

2. pollForErr runs in parallel, calling getSignatureStatuses every
   interval and returning the same typed error as soon as the RPC
   reports a confirmed-with-err status. Either path wins the race;
   /api/tx/broadcast catches the typed error and returns a structured
   502 TX_FAILED_ON_CHAIN with the err payload so the FE can render
   an actionable message.

The poll loop is fire-and-forget on cleanup so successful broadcasts
do not pay an extra interval of latency waiting for the poll to
observe stopped=true.

Closes #299
vercel Bot commented May 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| sipher | Ready | Preview, Comment | May 23, 2026 2:44pm |

@rz1989s rz1989s merged commit f0e4ddb into main May 23, 2026
8 checks passed
@rz1989s rz1989s deleted the fix/issue-299-bail-on-confirmed-with-err branch May 23, 2026 14:46
Development

Successfully merging this pull request may close these issues.

fix(broadcast): return structured 504 instead of generic CF 504 when tx confirms with program error
