fix(aarch64): requeue instead of using stale cached ttbr0 #410

Merged: ryanbreen merged 1 commit into main from fix/aarch64-stale-cached-ttbr0-dispatch on Jun 2, 2026

@ryanbreen ryanbreen commented Jun 2, 2026

Summary

Fixes the aarch64 stale cached TTBR0 dispatch bug: when the process-manager lock is busy, the dispatcher no longer installs thread.cached_ttbr0 as an unverifiable fallback. It requeues/redirects instead of dispatching under a stale or orphaned userspace page-table root.
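The decision described above can be sketched roughly as follows. This is a minimal illustration, not the Breenix dispatcher: `ProcessManager`, `Thread`, `Dispatch`, and `choose_ttbr0` are hypothetical names, `std::sync::Mutex` stands in for the kernel's process-manager lock, and treating a missing process entry as a requeue is an assumption.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, TryLockError};

/// Outcome of trying to set up the user address space for a thread.
#[derive(Debug, PartialEq, Eq)]
enum Dispatch {
    /// Lock acquired and a live root was found: install this TTBR0 value.
    Install(u64),
    /// The root could not be verified: hand the thread back to the run queue.
    Requeue,
}

/// Hypothetical stand-in for the process manager: PID -> live TTBR0 value.
struct ProcessManager {
    roots: HashMap<u32, u64>,
}

/// Hypothetical thread record.
#[allow(dead_code)]
struct Thread {
    pid: u32,
    /// Last TTBR0 seen for this thread. It may be stale or orphaned, so the
    /// sketch never installs it without the lock.
    cached_ttbr0: u64,
}

/// Pick a TTBR0 only when the process-manager lock can be taken right now.
fn choose_ttbr0(pm: &Mutex<ProcessManager>, thread: &Thread) -> Dispatch {
    match pm.try_lock() {
        // Lock held: look up the root that currently belongs to this PID.
        Ok(guard) => match guard.roots.get(&thread.pid) {
            Some(&root) => Dispatch::Install(root),
            // Assumption: no live process means nothing safe to install.
            None => Dispatch::Requeue,
        },
        // Lock busy: previously this is where the cached value was used.
        Err(TryLockError::WouldBlock) => Dispatch::Requeue,
        Err(TryLockError::Poisoned(_)) => Dispatch::Requeue,
    }
}
```

The point of the shape is that `cached_ttbr0` never appears on the lock-busy branch, so an unverifiable root cannot reach the hardware register.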

Root-cause evidence from TURN 350 showed the residual post-spawn EC=0 fault was not PID 5 running under its own page table. The fault-time thread was PID 4/TID 14 with cached_ttbr0=0x100004407e000, whose untagged root 0x4407e000 no longer belonged to any live process.
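For readers checking the numbers, the relation between the cached value and its untagged root can be reproduced with a small sketch. It assumes the tag in the upper bits is the ASID field of the architectural TTBR0_EL1 layout (ASID in bits [63:48], table base in bits [47:1]); the helper names and the `root_is_live` check are hypothetical and are not taken from the Breenix source.

```rust
/// ASID field position in TTBR0_EL1 (assumed layout: bits [63:48]).
const TTBR_ASID_SHIFT: u32 = 48;
/// Translation table base address field, bits [47:1]; bit 0 (CnP) is dropped.
const TTBR_BADDR_MASK: u64 = 0x0000_FFFF_FFFF_FFFE;

/// Extract the ASID tag from a raw TTBR0 value.
fn ttbr0_asid(raw: u64) -> u16 {
    (raw >> TTBR_ASID_SHIFT) as u16
}

/// Strip the ASID tag and CnP bit, leaving the page-table root address.
fn ttbr0_root(raw: u64) -> u64 {
    raw & TTBR_BADDR_MASK
}

/// Hypothetical liveness check: does the untagged root match any root
/// currently owned by a live process?
fn root_is_live(raw: u64, live_roots: &[u64]) -> bool {
    let root = ttbr0_root(raw);
    live_roots.iter().any(|&r| ttbr0_root(r) == root)
}
```

Applied to the value from the fault, `ttbr0_root(0x100004407e000)` yields `0x4407e000` and `ttbr0_asid` yields `1`, which matches the untagged root quoted above.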

Corrected TURN 351 verification

The initial verification harness had a false-pass bug: it stopped on WAIT_STRESS_PASS and did not scan the whole boot. TURN 351 reclassified the saved boots using normalized/de-interleaved serial and a whole-boot rule: a pass requires WAIT_STRESS_PASS, the #404 non-contiguous-frame assertion, and zero UNHANDLED_EC / FATAL_POSTMORTEM / PANIC / DEFER_SNAP / trace dump / SOFT_LOCKUP / DATA_ABORT anywhere in the log.
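The whole-boot rule can be sketched as a small classifier. This is an illustration of the rule as stated, not the harness itself: `Rules`, `Verdict`, and `classify_boot` are hypothetical names, and the exact marker strings (in particular for the #404 assertion and the trace dump) are assumptions supplied by the caller.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Pass,
    Fail,
}

/// Marker strings to look for; the concrete values are caller-supplied
/// assumptions, not the harness's real strings.
struct Rules<'a> {
    pass_marker: &'a str,
    assertion_marker: &'a str,
    fault_markers: &'a [&'a str],
}

/// Classify a normalized, de-interleaved serial log by scanning every line.
/// Unlike the old harness, the scan does not stop at the pass marker, so a
/// fault that appears after it still fails the boot.
fn classify_boot(log: &str, rules: &Rules) -> Verdict {
    let mut saw_pass = false;
    let mut saw_assertion = false;
    for line in log.lines() {
        // Any fault marker anywhere in the log is an immediate failure.
        if rules.fault_markers.iter().any(|m| line.contains(*m)) {
            return Verdict::Fail;
        }
        saw_pass |= line.contains(rules.pass_marker);
        saw_assertion |= line.contains(rules.assertion_marker);
    }
    // A pass needs both positive markers and no fault markers.
    if saw_pass && saw_assertion {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}
```

A log that prints the pass marker and then a fault marker a few lines later is classified as `Fail` here, which is the verify-3 situation the old stop-on-pass harness missed.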

Corrected whole-boot results:

| Boot | Corrected verdict | Notes |
| --- | --- | --- |
| verify-2 | PASS | WAIT_STRESS_PASS + #404 assertion, no fault markers |
| verify-5 | PASS | WAIT_STRESS_PASS + #404 assertion, no fault markers |
| verify-7 | PASS | WAIT_STRESS_PASS + #404 assertion, no fault markers |
| verify-10 | PASS | WAIT_STRESS_PASS + #404 assertion, no fault markers |
| verify-11 | PASS | WAIT_STRESS_PASS + #404 assertion, no fault markers |
| verify-12 | PASS | WAIT_STRESS_PASS + #404 assertion, no fault markers |
| verify-3 | FAIL | Previously mislabeled PASS; whole-log scan catches recovered UNHANDLED_EC, 6x DEFER_SNAP, and trace dump after WAIT_STRESS_PASS |
| verify-8 | FAIL | Previously mislabeled PASS; whole-log scan catches UNHANDLED_EC and DEFER_SNAP |
| verify-9 | FAIL | Fault markers present |

Artifact table: /Users/wrb/Downloads/Ralph/breenix-interrupt-io-roadmap-1780056222/turn351-artifacts/whole_boot_reclassification.md.

Comparison notes:

  • main-compare-1 and main-compare-2: no fail window in the same whole-log classifier.
  • pr404-compare-1: no WAIT_STRESS_PASS; not a clean comparison pass.
  • pr404-compare-2: no fail window.

Build evidence: turn351-artifacts/final-fix-build-warning-error-grep.txt is 0 bytes.

Test plan

  • aarch64 build clean: zero warning/error grep in turn351-artifacts/final-fix-build-warning-error-grep.txt.
  • 6 corrected whole-boot-clean ARM64/Parallels WAIT_STRESS verification boots with the #404 (Fix ARM64 user stack frame mapping) assertion active: verify-2, verify-5, verify-7, verify-10, verify-11, verify-12.
  • Harness audit proved the old false-pass mode by reclassifying verify-3 as FAIL from de-interleaved whole-boot serial.

🤖 Generated with Claude Code / Codex Ralph verification loop.

Remove the PM-lock-busy fallback that restored cached TTBR0 values during dispatch. If TTBR0 setup cannot acquire the process-manager lock, the existing redirect/requeue path now handles the thread instead of risking a stale address-space root.

Co-authored-by: Ryan Breen <ryan@ryanbreen.com>

Co-authored-by: Claude Code <noreply@anthropic.com>
ryanbreen merged commit 134c532 into main on Jun 2, 2026
ryanbreen deleted the fix/aarch64-stale-cached-ttbr0-dispatch branch on June 2, 2026 00:08