Skip to content

fix: cherry-pick: validate legacy checkpoint block against L1 in CheckValidBlock (#1638) (#1639)#1645

Merged
joanestebanr merged 1 commit into
developfrom
cherry-pick/1639-checkvalidblock-l1
Jun 8, 2026
Merged

fix: cherry-pick: validate legacy checkpoint block against L1 in CheckValidBlock (#1638) (#1639)#1645
joanestebanr merged 1 commit into
developfrom
cherry-pick/1639-checkvalidblock-l1

Conversation

@joanestebanr

@joanestebanr joanestebanr commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

🔄 Changes Summary

Cherry-pick of #1639 (merged into release/0.10) onto develop.

  • Fixes a sync failure loop in l1infotreesync after upgrading from the legacy syncer (e.g. 0.8.x) to the multidownloader-based implementation (bug: updating aggkit sometimes produce a stuck situation 'error downloading blocks' #1638).
  • Root cause: on upgrade the processor reports a last-processed checkpoint block (an "empty" block the legacy syncer persisted to track progress) that the legacy syncer downloaded, but the multidownloader's own freshly-created storage never contained it. EVMMultidownloader.CheckValidBlock then failed with not found in storage or blocks_reorged, leaving the syncer stuck with no self-recovery (manual DB deletion was the only workaround).
  • Fix: when a block is neither in blocks nor in blocks_reorged and is at or below the finalized block, CheckValidBlock now validates it against L1 by fetching the canonical header at that height and comparing hashes:
    • match → the block is canonical, it just was never downloaded by this multidownloader (legacy checkpoint after upgrade, or pruned block) → valid, the syncer resumes from lastBlock+1.
    • differ → the block is on an orphaned branch of a finalized height → error (severe inconsistency requiring manual intervention).
    • Blocks above the finalized block are not stable and keep the previous error behaviour.

⚠️ Breaking Changes

  • None.

📋 Config Updates

  • None.

✅ Testing

  • 🤖 Automatic: TestEVMMultidownloader_CheckValidBlock_LegacyCheckpointAfterUpgrade (real empty storage) covering the three paths; existing CheckValidBlock mock tests updated for the new finalized lookup. Build and multidownloader package tests verified locally on develop.
  • 🖱️ Manual: N/A.

🐞 Issues

🔗 Related PRs

📝 Notes

  • The fix lives in the multidownloader layer (CheckValidBlock), where the error surfaces. The validation is gated on blockNumber <= finalized, so the extra RPC call only happens for stable blocks and only while the missing block is being resolved (transient, during the legacy→multidownloader transition).

🤖 Generated with Claude Code

…1638) (#1639)

- Fixes a sync failure loop in `l1infotreesync` after upgrading from the
legacy syncer (e.g. 0.8.x) to the multidownloader-based implementation
(#1638).
- Root cause: on upgrade the processor reports a last-processed
**checkpoint block** (an "empty" block the legacy syncer persisted to
track progress) that the legacy syncer downloaded, but the
multidownloader's own freshly-created storage never contained it.
`EVMMultidownloader.CheckValidBlock` then failed with `not found in
storage or blocks_reorged`, leaving the syncer stuck with no
self-recovery (manual DB deletion was the only workaround).
- Fix: when a block is neither in `blocks` nor in `blocks_reorged` and
is **at or below the finalized block**, `CheckValidBlock` now validates
it against L1 by fetching the canonical header at that height and
comparing hashes:
- **match** → the block is canonical, it just was never downloaded by
this multidownloader (legacy checkpoint after upgrade, or pruned block)
→ **valid**, the syncer resumes from `lastBlock+1`.
- **differ** → the block is on an orphaned branch of a finalized height
→ **error** (severe inconsistency requiring manual intervention).
- Blocks **above** the finalized block are not stable and keep the
previous error behaviour.

- None.

- None.

- 🤖 **Automatic**:
`TestEVMMultidownloader_CheckValidBlock_LegacyCheckpointAfterUpgrade`
(real empty storage) covering the three paths: canonical hash matches →
valid; canonical hash differs → error; block above finalized → error.
Existing `CheckValidBlock` mock tests updated for the new finalized
lookup.
- 🖱️ **Manual**: N/A.

- Closes #1638

- N/A

- The fix lives in the multidownloader layer (`CheckValidBlock`), where
the error surfaces. The validation is gated on `blockNumber <=
finalized` so the extra RPC call only happens for stable blocks, and
only while the missing block is being resolved (transient, during the
legacy→multidownloader transition; once the first blocks are downloaded
and stored, the processor advances into the known range and this branch
no longer fires).
- On a canonical hash mismatch of a finalized block, the multidownloader
cannot synthesize a reorg it never observed, so it surfaces a
descriptive error rather than fabricating reorg data.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown

Claude finished @joanestebanr's task in 1m 17s —— View job


Code Review Complete


🚀 What's New

This PR cherry-picks a critical fix from release/0.10 (#1639) to develop that resolves a sync loop bug in l1infotreesync after upgrading from legacy syncer (v0.8.x) to the multidownloader-based implementation. The fix adds L1 validation fallback in CheckValidBlock for blocks that aren't present in local storage but are at or below the finalized block number.

🐛 Bug Fixes

📋 Config Updates

None

⚠️ Breaking Changes

None


Review Feedback

🟢 Strengths

  1. Well-targeted fix: The solution correctly addresses the root cause by adding L1 validation only for finalized blocks that are missing from local storage.

  2. Excellent test coverage: The new TestEVMMultidownloader_CheckValidBlock_LegacyCheckpointAfterUpgrade test comprehensively covers all code paths:

    • Happy path (canonical hash matches)
    • Error path (canonical hash differs - orphaned block)
    • Block above finalized (not stable yet)
    • RPC failure scenarios (finalized block lookup, canonical header lookup, nil header)
  3. Proper separation of concerns: The new checkValidBlockAgainstL1 function is well-separated with clear documentation explaining the logic.

  4. Safe error handling: When RPC calls fail, the function returns isValid=true with an error, allowing for retry rather than treating transient failures as permanent inconsistencies.

  5. Good defensive check: The code at lines 40-49 in evm_multidownloader_reorg.go properly prevents the L1 fallback from running when a different block is stored locally, avoiding masking of real reorgs.


🟡 Suggestions for Improvement

  1. Import alias consistency (multidownloader/evm_multidownloader_reorg.go:8)

    • The new import uses aggkittypes which is good and follows the project's CLAUDE.md guidelines for avoiding import conflicts.
    • ✅ No action needed - this is already correct.
  2. Consider caching finalized block number (multidownloader/evm_multidownloader_reorg.go:71)

    • The GetFinalizedBlockNumber call happens on every CheckValidBlock invocation when a block is missing. During the transition period after an upgrade, this could result in multiple RPC calls.
    • 💡 Optional optimization: Consider caching the finalized block number with a short TTL (e.g., 30 seconds) to reduce RPC overhead during the upgrade transition period. However, given this is a transient scenario, the current implementation is acceptable.
  3. Error message clarity (multidownloader/evm_multidownloader_reorg.go:79-82)

    • The error message for blocks above finalized is clear, but consider whether the processor/caller can actually handle this retry scenario effectively.
    • Current behavior: returns isValid=false with error
    • Question: Should this also return isValid=true (like RPC failures) to allow retry, since the block might become finalized soon?
    • This is more of a design question - the current behavior seems reasonable for forcing the syncer to wait until the block is finalized before validating.
  4. Log level consideration (multidownloader/evm_multidownloader_reorg.go:97-99)

    • The success case logs at Infof level. During upgrade transitions with many legacy checkpoints, this could generate substantial log volume.
    • 💡 Consider using Debugf instead, or only log once per upgrade scenario rather than per block.

🟢 Code Quality Observations

  1. Naming: Function and variable names are clear and descriptive.
  2. Comments: Excellent inline documentation explaining the "why" behind the logic.
  3. Error wrapping: Proper use of fmt.Errorf with %w for error wrapping.
  4. Test structure: Tests follow Go conventions with table-driven patterns and clear subtests.

🟢 Security & Correctness

  1. No security issues identified: The L1 validation approach is sound - comparing hashes against canonical blocks is the correct way to validate finalized blocks.

  2. Race condition handling: The check for blockNumber > finalizedBlockNumber at line 77 properly handles the case where blocks haven't been finalized yet, avoiding validation against unstable state.

  3. Reorg masking prevention: The check at lines 40-49 correctly prevents masking of real reorgs by ensuring the L1 fallback only runs when no block is stored locally (not when a different block is stored).


🟢 Performance Considerations

  1. RPC call overhead: The additional RPC calls (GetFinalizedBlockNumber and CustomHeaderByNumber) only occur for missing blocks below the finalized threshold. This is acceptable because:

    • It's a transient scenario (only during upgrade)
    • The alternative (manual DB deletion) is worse UX
    • Normal operation (blocks in storage) has no additional overhead
  2. No database regression: The changes don't add any new database queries in happy paths.


✅ Overall Assessment

This is a well-crafted fix for a critical upgrade bug. The implementation is:

  • ✅ Correct and safe
  • ✅ Well-tested with comprehensive coverage
  • ✅ Properly documented
  • ✅ Minimal performance impact
  • ✅ No breaking changes

The code follows Go best practices and the repository's coding standards (as defined in CLAUDE.md). The cherry-pick from release/0.10 is clean and appropriate.

Recommendation: ✅ Approve - This PR is ready to merge pending CI completion.


Minor: .gitignore Changes

The .gitignore additions (bin, .claude/, test.env) are reasonable development artifacts to ignore. These are unrelated to the core fix but are harmless additions.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e5db97601

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// inconsistency and let the caller retry.
func (dh *EVMMultidownloader) checkValidBlockAgainstL1(ctx context.Context, blockNumber uint64,
blockHash common.Hash) (bool, uint64, error) {
finalizedBlockNumber, err := dh.GetFinalizedBlockNumber(ctx)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Query a real finalized height before trusting missing blocks

I checked the syncer configs and BlockFinality is user-configurable with LatestBlock, SafeBlock, and PendingBlock in addition to FinalizedBlock, but GetFinalizedBlockNumber here returns dh.cfg.BlockFinality rather than an immutable finalized height. In those configurations, a missing unfinalized block that matches the current canonical header is accepted as valid even though the multidownloader has no local block or reorged record for that height; if that block is later reorged, the resumed sync has skipped the only height it could use to detect/repair the processor state. Please gate this fallback on a truly finalized tag (or query aggkittypes.FinalizedBlock directly) before treating the absent block as valid.

Useful? React with 👍 / 👎.

@sonarqubecloud

sonarqubecloud Bot commented Jun 8, 2026

Copy link
Copy Markdown

@joanestebanr joanestebanr self-assigned this Jun 8, 2026
@joanestebanr joanestebanr added the bug Something isn't working label Jun 8, 2026
@joanestebanr joanestebanr merged commit b1b2f3d into develop Jun 8, 2026
26 of 27 checks passed
@joanestebanr joanestebanr deleted the cherry-pick/1639-checkvalidblock-l1 branch June 8, 2026 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: updating aggkit sometimes produce a stuck situation 'error downloading blocks'

2 participants