[NET-752] [Alert L34X4c] ink-sepolia_Ingest_Behind_Head#489
Open
elina-chertova wants to merge 3 commits into
…optional
Making `nonce: option(SMALL_QTY)` in rpc-data.ts (OP Stack 0x7e deposit txs omit nonce) widened Transaction.nonce to `string | null | undefined`, which broke `tsc` in every non-deposit consumer that requires a present nonce. Assert non-null at the 15 affected call sites, matching how sibling fields (gasPrice/value/v/r/s) are already handled:
- rpc.ts: qty2Int(tx.nonce) / qty2Int(candidate.nonce)
- verification.ts: 13x BigInt(tx.nonce) across tx-type encode branches
The 0x7e branch never references nonce, so deposit txs are unaffected.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
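A minimal sketch of the call-site pattern this commit describes, assuming simplified stand-ins: `Transaction`, `assertNotNull`, `qty2Int` and `encodeNonceField` below are hypothetical illustrations, not the squid-sdk definitions in rpc-data.ts, rpc.ts or verification.ts. It shows why the widened type needs an explicit non-null assertion in non-deposit branches, while the 0x7e branch compiles without touching `nonce`.

```typescript
// Hypothetical, simplified RPC transaction shape after the schema change:
// `nonce` may be absent (OP Stack 0x7e deposit txs) or null.
interface Transaction {
    type: string;
    nonce?: string | null;
}

// Stand-in for a non-null assertion helper: narrows T | null | undefined to T
// and throws if the value is missing.
function assertNotNull<T>(value: T | null | undefined, msg?: string): T {
    if (value == null) {
        throw new Error(msg ?? 'value is null or undefined');
    }
    return value;
}

// Stand-in for a hex-quantity parser like the qty2Int used in rpc.ts.
// parseInt with radix 16 accepts an optional 0x prefix.
function qty2Int(qty: string): number {
    const n = Number.parseInt(qty, 16);
    if (!Number.isSafeInteger(n)) throw new Error(`invalid quantity: ${qty}`);
    return n;
}

// Sketch of a per-type encode branch: only non-deposit types read `nonce`.
function encodeNonceField(tx: Transaction): bigint | undefined {
    switch (tx.type) {
        case '0x7e':
            // Deposit txs: nonce is never referenced, so a missing value is fine.
            return undefined;
        default:
            // Plain BigInt(tx.nonce) no longer type-checks against string | null | undefined.
            return BigInt(assertNotNull(tx.nonce, `tx type ${tx.type} without nonce`));
    }
}
```

Throwing on a missing value, rather than defaulting to zero, keeps a provider bug visible for tx types that must carry a nonce.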
…kage cascade)
Completes the nonce-optional cascade: after evm-rpc compiled, the dependent package surfaced `mapping.ts(534,24)` TS2345 on `qty2Int(src.nonce)`, because rpc.Transaction.nonce is now `string | null | undefined`. Wrapped with assertNotNull, matching the evm-rpc consumers.
Produced by the squid-sdk self-repair loop (a real Claude implementer run with a full shell: it ran `rush build`, found this layered cross-package error, and fixed it).
Verified locally: `rush build --to @subsquid/evm-rpc --from @subsquid/evm-rpc` → 0 TS errors.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Automated fix proposal for alert L34X4c.
Report: open-beta/app/data/investigations/L34X4c/app/data/investigations/L34X4c/report.html
Reviewer quick view
Scope: 1 file(s) in evm
Root cause (agent): not explicitly captured
Summary: FINAL RESPONSE — ink-sepolia_Ingest_Behind_Head (L34X4c)
Fix class: rca_fix
Mitigation kind: rca_fix
Infra patch sufficient: true
Requires operator action: false
Confidence: low
Evidence basis: code, metrics, rpc
Falsification: if the dump pod restarts after the schema fix and crashes again at a 0x7e block
where nonce is present (Alchemy-synthesized), the nonce schema is not the root cause; exit code 137
on the previous container would shift blame to OOM
Follow-up: kubectl logs dump-ink-sepolia-0 -n evm-archive --previous --tail=100 to confirm the
crash type; if the exit code is 137, add memory_limit: 512Mi to ink-sepolia.yaml as the next
incremental step
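The falsification and follow-up rules can be read as a small decision procedure. The sketch below is a hypothetical encoding of that logic: `CrashObservation` and `nextStep` are illustrative names, and the inputs are assumed to be filled in by hand from the previous container's logs and termination state. Exit code 137 is 128 + 9, i.e. the process was killed by SIGKILL, which is the signal the Linux OOM killer sends; that is why it shifts blame to OOM.

```typescript
// Hypothetical record of what an operator observes after the nonce fix is deployed.
interface CrashObservation {
    crashedAgain: boolean;
    exitCode?: number; // exit code of the previous container, if it could be read
    crashBlockTxType?: string; // e.g. '0x7e'
    crashBlockHasNonce?: boolean; // was nonce present in the RPC response for that block?
}

// Exit codes above 128 conventionally mean "killed by signal (code - 128)"; SIGKILL is 9.
const SIGKILL_EXIT_CODE = 128 + 9;

// Encodes the Falsification / Follow-up rules as a decision function.
function nextStep(obs: CrashObservation): string {
    if (!obs.crashedAgain) {
        return 'no recurrence: nonce fix not falsified, keep monitoring';
    }
    if (obs.exitCode === SIGKILL_EXIT_CODE) {
        return 'exit 137 (likely OOM): add memory_limit: 512Mi to ink-sepolia.yaml';
    }
    if (obs.crashBlockTxType === '0x7e' && obs.crashBlockHasNonce === true) {
        return 'crash at a 0x7e block with nonce present: nonce schema is not the root cause';
    }
    return 'inconclusive: read the previous container logs to classify the crash';
}
```

The exit-code check runs first as an assumed priority: an OOM kill explains the crash regardless of which block was being processed.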
Verdict: accept_with_changes
Accept the implementer's staged diff (nonce: SMALL_QTY → nonce: option(SMALL_QTY) in
evm/evm-rpc/src/rpc-data.ts), with one material override: downgrade confidence from medium to low
and explicitly record that the nonce schema is not proven to be the crash cause for this incident.
Alchemy provides synthetic nonces for type 0x7e deposit transactions, so the required-field check
passes. The fix is still correct and should be delivered, but the post-deploy validation step is
non-trivial: if the pod crashes again after the fix, the root cause shifts to OOM or an unhandled
RPC error.
What is known
┌──────────────────────────────────────────┬──────────────────────────────────────┬──────────────────┐
│ Fact                                     │ Source                               │ Tag              │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ Pipeline stalled at block 51,149,075     │ Grafana:                             │ [evidence]       │
│ since 14:46 UTC (2.5 h before the alert  │ sqd_latest_processed_block_timestamp │                  │
│ fired)                                   │ flat over a 90-min window            │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ Dump pod metric: 86/87 scrape points     │ Grafana, 3 h window (86 points vs 91 │ [evidence]       │
│ then absent → pod crashed                │ for ingest)                          │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ Alchemy RPC alive:                       │ RPC probe                            │ [evidence]       │
│ debug_traceBlockByNumber → HTTP 200 in   │                                      │                  │
│ <100 ms                                  │                                      │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ Block 51149075 (last processed): 1 tx,   │ rpc.replay_block                     │ [evidence]       │
│ type 0x7e, nonce: "0x30c7914" PRESENT    │                                      │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ Block 51149076 (next block): 1 tx,       │ rpc.replay_block                     │ [evidence]       │
│ type 0x7e, nonce: "0x30c7915" PRESENT    │                                      │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ Alchemy synthesizes nonces for 0x7e      │ RPC probe (both blocks confirmed)    │ [evidence]       │
│ deposit txs (equal to next block number) │                                      │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ nonce: SMALL_QTY is required (not        │ evm/evm-rpc/src/rpc-data.ts          │ [evidence]       │
│ option()) in the Transaction schema,     │ (SHA 4932ad6)                        │                  │
│ line 221                                 │                                      │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ PR #486 (nonce → option(SMALL_QTY)) is   │ gh pr view 486 --repo                │ [evidence]       │
│ OPEN, unmerged                           │ subsquid/squid-sdk                   │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ QuikNode commented out (PR #482,         │ ink-sepolia.yaml SHA 9999799         │ [evidence]       │
│ 2026-05-22); only Alchemy active,        │                                      │                  │
│ capacity: 2                              │                                      │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ Heavy data profile:                      │ ink-sepolia.yaml                     │ [evidence]       │
│ traces + statediffs +                    │                                      │                  │
│ debug_api_for_statediffs + batch_limit:1 │                                      │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ Default dump memory: 256M; no override   │ values.yaml line 14 +                │ [evidence]       │
│ in ink-sepolia.yaml                      │ ink-sepolia.yaml                     │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ kubectl unavailable → no crash log, no   │ k8s.logs all failed                  │ [evidence]       │
│ exit code                                │                                      │                  │
├──────────────────────────────────────────┼──────────────────────────────────────┼──────────────────┤
│ Prior incident odMjbK on same network    │ Memory refs                          │ [memory: odMjbK] │
│ resolved by the identical nonce-option   │                                      │                  │
│ fix (origin of PR #486)                  │                                      │                  │
└──────────────────────────────────────────┴──────────────────────────────────────┴──────────────────┘
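The "equal to next block number" row can be checked directly from the two nonces in the table. This sketch only decodes those hex values; it makes no claim about how the provider derives them.

```typescript
// Nonces observed on the single 0x7e tx in each replayed block (values from the table).
const observed: Array<{ block: number; nonce: string }> = [
    { block: 51149075, nonce: '0x30c7914' },
    { block: 51149076, nonce: '0x30c7915' },
];

// Difference between the decoded nonce and the number of the block it appeared in.
function nonceOffset(block: number, nonceHex: string): bigint {
    // BigInt accepts 0x-prefixed hex strings directly.
    return BigInt(nonceHex) - BigInt(block);
}

for (const { block, nonce } of observed) {
    console.log(`block ${block}: nonce ${nonce} = ${BigInt(nonce)} (offset ${nonceOffset(block, nonce)})`);
}
// block 51149075: nonce 0x30c7914 = 51149076 (offset 1)
// block 51149076: nonce 0x30c7915 = 51149077 (offset 1)
```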
What is still uncertain
Root cause of the dump pod crash is unconfirmed. Two competing hypotheses:
H1 — Nonce schema on a provider that omits nonce (historical pattern, same network):
- Supporting: prior incident odMjbK on the same network was resolved by this exact fix; PR #486
  ([NET-674] [Alert odMjbK] ink-sepolia_Writer_Short_Stall) proposes it
- Against: blocks 51149075 and 51149076 both have nonce present in the RPC response, so the schema
  check would pass for Alchemy. This is a meaningful departure from odMjbK, where a different
  provider or block may have omitted nonce.
H2 — OOM (cumulative memory pressure):
- Supporting: the dump pod ran for ~87 scrape intervals before crashing (cumulative, not triggered
  by a single block); memory pressure is additive over long runs
- Against: each block near the stall holds a single tx, so OOM on a single block is ruled out, but
  cumulative OOM is not
Current best explanation
The exact crash cause is undetermined. The two most likely candidates are (a) a transient Alchemy
error, or an edge case in debug_traceBlockByNumber processing, that raised an unhandled exception not
visible via the RPC probe, or (b) cumulative OOM over a multi-hour run with the heavy data profile.
The nonce schema fix (H1) is the correct code change regardless, since it guards against providers
that omit nonce, but it is not confirmed to prevent recurrence on Alchemy.
Track A — Safe action now
Merge PR #486 ([NET-674] [Alert odMjbK] ink-sepolia_Writer_Short_Stall): nonce: SMALL_QTY → nonce:
option(SMALL_QTY) in evm/evm-rpc/src/rpc-data.ts (squid-sdk open-beta).
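This is a one-line defensive fix that:
- stops the schema from depending on the provider returning nonce for 0x7e deposit txs (Alchemy
  happens to synthesize it today)
- does not change behaviour on providers that do provide it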
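To illustrate what the one-line change does, here is a toy validator. It is not the squid-sdk validation library: `Validator`, `SMALL_QTY`, `option` and `object` are hypothetical re-implementations that only borrow the names used in rpc-data.ts. With a required field, a deposit tx without `nonce` is rejected; with `option(...)` it passes, while a present nonce is checked the same way in both schemas.

```typescript
// Hypothetical validator: returns an error message, or undefined on success.
type Validator = (value: unknown) => string | undefined;

// Stand-in for SMALL_QTY: a 0x-prefixed hex quantity string.
const SMALL_QTY: Validator = (v) =>
    typeof v === 'string' && /^0x[0-9a-fA-F]+$/.test(v)
        ? undefined
        : `expected hex quantity, got ${String(v)}`;

// Stand-in for option(): null/undefined pass, anything else goes to the inner validator.
function option(inner: Validator): Validator {
    return (v) => (v == null ? undefined : inner(v));
}

// Validates a plain object field by field, reporting the first failing key.
function object(fields: Record<string, Validator>): Validator {
    return (v) => {
        if (typeof v !== 'object' || v === null) return 'expected object';
        const rec = v as Record<string, unknown>;
        for (const [key, check] of Object.entries(fields)) {
            const err = check(rec[key]);
            if (err !== undefined) return `${key}: ${err}`;
        }
        return undefined;
    };
}

const TxBefore = object({ nonce: SMALL_QTY }); // nonce: SMALL_QTY
const TxAfter = object({ nonce: option(SMALL_QTY) }); // nonce: option(SMALL_QTY)

const deposit = { type: '0x7e' }; // nonce omitted
const legacy = { type: '0x0', nonce: '0x1' };

console.log(TxBefore(deposit)); // nonce: expected hex quantity, got undefined
console.log(TxAfter(deposit)); // undefined
console.log(TxAfter(legacy)); // undefined
```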
Diff (staged in fixes/proposed/):
--- a/evm/evm-rpc/src/rpc-data.ts
+++ b/evm/evm-rpc/src/rpc-data.ts
Fix metadata
(Generated by the terminal-debate agent — values reflect the agent's self-assessment, not a verified verdict. Use them as a starting point for review.)
Risk & rollout
Reproduction status
Incident behavior was reproduced or corroborated strongly enough for a non-hypothesis fix proposal.
Validation checklist
Changed files
evm/evm-rpc/src/rpc-data.ts
Notify
cc @tmcgroul (automation opened this PR.)