[NET-752] [Alert L34X4c] ink-sepolia_Ingest_Behind_Head #489

Open
elina-chertova wants to merge 3 commits into open-beta from alert-fix/l34x4c-ink-sepolia-ingest-behind-head-squid-sdk

@elina-chertova

Automated fix proposal for alert L34X4c.

  • Alert: ink-sepolia_Ingest_Behind_Head
  • Base branch: open-beta
  • Investigation: /app/data/investigations/L34X4c
  • Report: /app/data/investigations/L34X4c/report.html

Reviewer quick view

  • Scope: 1 file(s) in evm

  • Root cause (agent): not explicitly captured

  • Summary: FINAL RESPONSE — ink-sepolia_Ingest_Behind_Head (L34X4c)


Fix metadata

  • Fix class: rca_fix
  • Confidence: low
  • Evidence basis: code, metrics, rpc
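  • Falsification: if dump pod restarts after the schema fix and crashes again at a 0x7e block where nonce is present (Alchemy-synthesized), the nonce schema is not the root cause; exit code 137 on the previous container would shift blame to OOM
  • Follow-up: kubectl logs dump-ink-sepolia-0 -n evm-archive --previous --tail=100 to confirm crash type; if exit 137 → add memory_limit: 512Mi to ink-sepolia.yaml as next incremental step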
    (Generated by the terminal-debate agent — values reflect the agent's self-assessment, not a verified verdict. Use them as a starting point for review.)

Summary

FINAL RESPONSE — ink-sepolia_Ingest_Behind_Head (L34X4c)

Fix class: rca_fix
Mitigation kind: rca_fix
Infra patch sufficient: true
Requires operator action: false
Confidence: low
Evidence basis: code, metrics, rpc
Falsification: if dump pod restarts after the schema fix and crashes again at a
0x7e block where nonce is present (Alchemy-synthesized), the nonce
schema is not the root cause; exit code 137 on the previous container
would shift blame to OOM
Follow-up: kubectl logs dump-ink-sepolia-0 -n evm-archive --previous --tail=100
to confirm crash type; if exit 137 → add memory_limit: 512Mi to
ink-sepolia.yaml as next incremental step
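
The follow-up rule above (exit 137 points towards OOM) can be sketched as a small classifier. This is a minimal sketch, assuming the exit code has already been read from the previous container's status; `classifyDumpExit` and its verdict labels are hypothetical names, not part of squid-sdk. Exit codes above 128 conventionally encode 128 plus a signal number, so 137 corresponds to SIGKILL (9), which is consistent with an OOM kill but does not prove one.

```typescript
// Hypothetical helper: maps a container exit code to the next step named in the follow-up.
type CrashVerdict =
  | "clean_exit"
  | "oom_suspected"
  | "killed_by_signal"
  | "application_error";

function classifyDumpExit(exitCode: number): CrashVerdict {
  if (exitCode === 0) return "clean_exit";
  // 137 = 128 + 9 (SIGKILL): the code usually seen when a container is OOM-killed.
  if (exitCode === 137) return "oom_suspected";
  // Other codes above 128 conventionally mean "terminated by signal (code - 128)".
  if (exitCode > 128) return "killed_by_signal";
  // Anything else is the process exiting on its own, e.g. after an unhandled exception.
  return "application_error";
}

console.log(classifyDumpExit(137));
console.log(classifyDumpExit(1));
```

Only the `oom_suspected` branch would justify the `memory_limit` change; an `application_error` result keeps the unhandled-RPC-error explanation in play.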

Verdict: accept_with_changes

Accept the implementer's staged diff (nonce: SMALL_QTY → nonce: option(SMALL_QTY) in
evm/evm-rpc/src/rpc-data.ts), with one material override: downgrade confidence from medium to low
and explicitly record that the nonce schema is not proven as the crash cause for this incident —
Alchemy provides synthetic nonces for type 0x7e deposit transactions, so the required-field check
passes. The fix is still correct and should be delivered, but the post-deploy validation step is
non-trivial: if the pod crashes again after the fix, root cause shifts to OOM or unhandled RPC
error.

What is known

┌──────────────────────────────────────────────┬────────────────────────────────────────┬──────────────────┐
│ Fact                                         │ Source                                 │ Tag              │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ Pipeline stalled at block 51,149,075 since   │ Grafana                                │ [evidence]       │
│ 14:46 UTC (2.5h before alert fired)          │ sqd_latest_processed_block_timestamp   │                  │
│                                              │ flat, 90-min window                    │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ Dump pod metric: 86/87 scrape points then    │ Grafana 3h window (86 points vs 91 for │ [evidence]       │
│ absent → pod crashed                         │ ingest)                                │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ Alchemy RPC alive: debug_traceBlockByNumber  │ RPC probe                              │ [evidence]       │
│ HTTP 200, <100ms                             │                                        │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ Block 51149075 (last processed): 1 tx, type  │ rpc.replay_block                       │ [evidence]       │
│ 0x7e, nonce: "0x30c7914" PRESENT             │                                        │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ Block 51149076 (next block): 1 tx, type      │ rpc.replay_block                       │ [evidence]       │
│ 0x7e, nonce: "0x30c7915" PRESENT             │                                        │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ Alchemy synthesizes nonces for 0x7e deposit  │ RPC probe (both blocks confirmed)      │ [evidence]       │
│ txs (equal to next block number)             │                                        │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ nonce: SMALL_QTY is required (not option())  │ evm/evm-rpc/src/rpc-data.ts            │ [evidence]       │
│ in Transaction schema, line 221              │ SHA 4932ad6                            │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ PR #486 (nonce → option(SMALL_QTY)) is OPEN, │ gh pr view 486 --repo                  │ [evidence]       │
│ unmerged                                     │ subsquid/squid-sdk                     │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ QuikNode commented out (PR #482,             │ ink-sepolia.yaml SHA 9999799           │ [evidence]       │
│ 2026-05-22); only Alchemy active,            │                                        │                  │
│ capacity: 2                                  │                                        │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ Heavy data profile:                          │ ink-sepolia.yaml                       │ [evidence]       │
│ traces+statediffs+debug_api_for_statediffs   │                                        │                  │
│ +batch_limit:1                               │                                        │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ Default dump memory: 256M; no override in    │ values.yaml line 14 + ink-sepolia.yaml │ [evidence]       │
│ ink-sepolia.yaml                             │                                        │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ kubectl unavailable → no crash log, no exit  │ k8s.logs all failed                    │ [evidence]       │
│ code                                         │                                        │                  │
├──────────────────────────────────────────────┼────────────────────────────────────────┼──────────────────┤
│ Prior incident odMjbK on same network        │ Memory refs                            │ [memory: odMjbK] │
│ resolved by identical nonce-option fix (PR   │                                        │                  │
│ #486 origin)                                 │                                        │                  │
└──────────────────────────────────────────────┴────────────────────────────────────────┴──────────────────┘
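
The "equal to next block number" observation in the table can be checked by decoding the two nonce values. A minimal sketch: the block numbers and nonce strings are copied from the table, while `hexQtyToNumber` is a hypothetical helper name introduced only for this check.

```typescript
// Values copied from the "What is known" table.
const observed = [
  { block: 51149075, nonce: "0x30c7914" },
  { block: 51149076, nonce: "0x30c7915" },
];

// Decode a 0x-prefixed hex quantity string into a number.
function hexQtyToNumber(qty: string): number {
  return Number.parseInt(qty, 16);
}

for (const { block, nonce } of observed) {
  const decoded = hexQtyToNumber(nonce);
  // For both probed blocks the decoded nonce is the block number plus one.
  console.log(block, decoded, decoded === block + 1);
}
```

Both rows decode to the block number plus one, which supports reading the value as provider-synthesized rather than a real account nonce.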

What is still uncertain

Root cause of the dump pod crash is unconfirmed. Two competing hypotheses:

H1 — Nonce schema on a provider that omits nonce (historical pattern, same network):

  • Supported by: nonce: SMALL_QTY is provably required; prior incident odMjbK on ink-sepolia used
    this exact fix; PR #486 ([NET-674] [Alert odMjbK] ink-sepolia_Writer_Short_Stall) proposes it
  • Undermined by: Alchemy synthesizes nonces for 0x7e txs — blocks 51149075 and 51149076 both have
    nonce present in the RPC response; the schema check would pass for Alchemy. This is a meaningful
    departure from odMjbK where a different provider or block may have omitted nonce.

H2 — OOM (cumulative memory pressure):

  • Supported by: 256M default + traces+statediffs+debug_api is the heaviest data profile; pod ran
    ~87 scrape intervals before crashing (cumulative, not per-block triggered); memory pressure is
    additive over long runs
  • Undermined by: block 51149076 is only 4.8 KB / 1 tx / 54 ms (no single large block); OOM from a
    single block is ruled out, but cumulative is not

Current best explanation

The exact crash cause is undetermined. Most likely: either (a) a transient Alchemy error or
edge-case in debug_traceBlockByNumber processing that produced an unhandled exception not visible
via RPC probe, or (b) cumulative OOM over a multi-hour run with the heavy data profile. The nonce
schema fix (H1) is the correct code action regardless — it is defensive against providers that
omit nonce — but it is not confirmed to prevent recurrence on Alchemy.

Track A — Safe action now

Merge PR #486 — nonce: SMALL_QTY → nonce: option(SMALL_QTY) in evm/evm-rpc/src/rpc-data.ts
(squid-sdk open-beta).

This is a one-line defensive fix that:

  1. Is already reviewed and staged in fixes/proposed/evm/evm-rpc/src/rpc-data.ts
  2. Uses the same option() pattern used by 10+ other optional Transaction fields
  3. Prevents crashes on any provider that omits nonce for 0x7e deposit txs (even if Alchemy
    happens to synthesize it today)
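  4. Carries no runtime risk for providers that already supply nonce: loosening the validator from
    required to optional accepts every response the old schema accepted. It does widen the
    TypeScript type of Transaction.nonce, so consumers that need a nonce must assert it is present.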
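
The difference between the required and the optional field can be sketched with two toy validators. This is an illustration under stated assumptions: `Validator`, `smallQty` and `option` below are hypothetical stand-ins written for this example and are not the squid-sdk validation API.

```typescript
// Hypothetical stand-ins for the schema helpers; NOT the squid-sdk validation API.
type Validator<T> = (value: unknown) => T;

// Required hex quantity: throws when the field is missing or malformed.
const smallQty: Validator<string> = (value) => {
  if (typeof value !== "string" || !/^0x[0-9a-fA-F]+$/.test(value)) {
    throw new Error(`expected hex quantity, got ${String(value)}`);
  }
  return value;
};

// option(v): accepts null/undefined, otherwise delegates to the wrapped validator.
function option<T>(inner: Validator<T>): Validator<T | null | undefined> {
  return (value) => (value == null ? (value as null | undefined) : inner(value));
}

// A deposit transaction as a provider that omits nonce might return it.
const depositTxWithoutNonce: { type: string; nonce?: unknown } = { type: "0x7e" };

let requiredThrows = false;
try {
  smallQty(depositTxWithoutNonce.nonce);
} catch {
  requiredThrows = true;
}

const optionalResult = option(smallQty)(depositTxWithoutNonce.nonce);
const presentResult = option(smallQty)("0x30c7914");

console.log(requiredThrows, optionalResult, presentResult);
```

The optional validator returns `T | null | undefined`, which is the same type widening that forces downstream callers to handle a missing nonce.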

Diff (staged in fixes/proposed/):

--- a/evm/evm-rpc/src/rpc-data.ts
+++ b/evm/evm-rpc/src/rpc-data.ts

Risk & rollout

  • Suggested rollout: canary / one-network-first, then broader rollout after signal is stable.
  • Rollback: revert this PR (or restore previous config values/files) if the incident signal worsens.

Reproduction status

Incident behavior was reproduced or corroborated strongly enough for a non-hypothesis fix proposal.

Validation checklist

  • Verify the original incident signal improves (logs/metrics/alerts) after deploy.
  • Verify no regression on sibling networks/providers/services touched by this change.
  • Confirm queue / delivery pipeline status returns to expected steady state.
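
The first checklist item can be made concrete as a flat-metric check over the processed-block timestamp. A minimal sketch, assuming the metric samples have already been fetched; the `Sample` shape and `isFlat` are hypothetical names, and no Grafana or Prometheus API is implied.

```typescript
// Hypothetical sample shape: one scraped value of the processed-block timestamp metric.
interface Sample {
  scrapedAtMs: number;
  value: number;
}

// Returns true when every sample inside the trailing window carries the same value,
// i.e. the pipeline has not advanced during that window.
function isFlat(samples: Sample[], windowMs: number, nowMs: number): boolean {
  const recent = samples.filter((s) => nowMs - s.scrapedAtMs <= windowMs);
  const first = recent[0];
  // With fewer than two samples there is not enough data to call the metric flat.
  if (first === undefined || recent.length < 2) return false;
  return recent.every((s) => s.value === first.value);
}

const minute = 60_000;
const stalled: Sample[] = [0, 1, 2, 3].map((i) => ({ scrapedAtMs: i * minute, value: 1000 }));
const advancing: Sample[] = [0, 1, 2, 3].map((i) => ({ scrapedAtMs: i * minute, value: 1000 + i }));

console.log(isFlat(stalled, 90 * minute, 3 * minute));
console.log(isFlat(advancing, 90 * minute, 3 * minute));
```

After deploy, the check should stop returning true for the 90-minute window used in the investigation.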

Changed files

  • evm/evm-rpc/src/rpc-data.ts

Notify

cc @tmcgroul (automation opened this PR.)

Elina Chertova and others added 2 commits June 1, 2026 09:56
…optional

Making `nonce: option(SMALL_QTY)` in rpc-data.ts (OP Stack 0x7e deposit txs
omit nonce) widened Transaction.nonce to `string | null | undefined`, which
broke `tsc` in every non-deposit consumer that requires a present nonce.

Assert non-null at the 15 affected call sites, matching how sibling fields
(gasPrice/value/v/r/s) are already handled:
- rpc.ts: qty2Int(tx.nonce) / qty2Int(candidate.nonce)
- verification.ts: 13x BigInt(tx.nonce) across tx-type encode branches

The 0x7e branch never references nonce, so deposit txs are unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…kage cascade)

Completes the nonce-optional cascade: after evm-rpc compiled, the dependent
package surfaced `mapping.ts(534,24)` TS2345 — `qty2Int(src.nonce)` on the now
`string | null | undefined` rpc.Transaction.nonce. Wrapped with assertNotNull,
matching the evm-rpc consumers.

Produced by the squid-sdk self-repair loop (a real Claude implementer run with a
full shell: it ran `rush build`, found this layered cross-package error, and
fixed it). Verified locally: `rush build --to @subsquid/evm-rpc --from
@subsquid/evm-rpc` → 0 TS errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
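
The call-site change described in the two commit messages above can be sketched as follows. This is a minimal sketch, assuming simplified local versions of the helpers: `assertNotNull` and `qty2Int` here are stand-ins written for this example, and `RpcTransaction` and `nonceOf` are hypothetical names, not the actual squid-sdk code.

```typescript
// Local stand-in: returns the value, or throws if it is null or undefined.
function assertNotNull<T>(value: T | null | undefined, msg = "value is null or undefined"): T {
  if (value == null) throw new Error(msg);
  return value;
}

// Local stand-in: decodes a 0x-prefixed hex quantity into a safe integer.
function qty2Int(qty: string): number {
  const n = Number.parseInt(qty, 16);
  if (!Number.isSafeInteger(n)) throw new Error(`invalid quantity: ${qty}`);
  return n;
}

// Simplified transaction shape after nonce became optional.
interface RpcTransaction {
  type: string;
  nonce?: string | null;
}

function nonceOf(tx: RpcTransaction): number {
  // qty2Int(tx.nonce) no longer type-checks because nonce may be null or undefined,
  // so the non-deposit path asserts presence before converting.
  return qty2Int(assertNotNull(tx.nonce, "nonce is missing"));
}

console.log(nonceOf({ type: "0x2", nonce: "0x1a" }));
```

A non-deposit transaction without a nonce now fails at the assertion with an explicit message instead of failing schema validation for the whole block.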