Skip to content

feat(wasm-dist): k_mutex_unlock wasm-cross-LTO module (complete: build + consume)#60

Merged
avrabe merged 4 commits into
mainfrom
feat/wasm-mutex-module
Jun 14, 2026
Merged

feat(wasm-dist): k_mutex_unlock wasm-cross-LTO module (complete: build + consume)#60
avrabe merged 4 commits into
mainfrom
feat/wasm-mutex-module

Conversation

@avrabe

@avrabe avrabe commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Promotes the silicon-validated k_mutex_unlock to a complete, consumable wasm-cross-LTO module, mirroring the sem module (#59), now that synth v0.11.41 fixes the #331 spill-slot collision that previously deadlocked it on hardware.

Build pipeline + shim

  • zephyr/wasm/mutex_unlock_shim.c — production shim (faithful Zephyr v4.4.0 k_mutex layout: owner@8 / lock_count@12). Seam (gale_k_mutex_unlock_decide) inside the bundle so loom inlines it.
  • zephyr/wasm/gale_wasm_mutex_tramp.S — the r11=0 native-pointer trampoline.
  • scripts/build-wasm-dist.sh — refactored into build_module(); emits both sem + mutex + a combined manifest.

Consumption wiring (mirrors sem)

  • zephyr/gale_mutex.c#ifndef GALE_WASM_LTO_OVERRIDE_MUTEX_UNLOCK guard compiles out the native unlock when the module is linked.
  • zephyr/KconfigCONFIG_GALE_WASM_LTO_MUTEX.
  • zephyr/CMakeLists.txt — consumption block under CONFIG_GALE_KERNEL_MUTEX (artifact discovery via GALE_WASM_LTO_OBJ_DIR/_MUTEX_OBJ; links the .o + trampoline). gale_wasm_wrappers.c added only when the sem block didn't (shared gale_w_* defs).
  • release-wasm.yml — job name; dist/* already attaches both modules.

Multi-module symbol fix (found by building, not assuming)

Building tests/kernel/mutex/mutex_api with both CONFIG_GALE_WASM_LTO_{SEM,MUTEX}=y revealed that sem.o and mutex.o each carry synth's internal func_7/func_8 helpers as global symbols → multiple-definition at final link. #59 never hit this (sem shipped alone). Fix: build-wasm-dist.sh now exports only the body entry (objcopy --keep-global-symbol=<body>), localizing the decide + all func_N internals.

Verification (nucleo_g474re, cortex-m4f, synth 0.11.41)

  • Dist produces valid ET_REL objects exporting only synth_k_{sem_give,mutex_unlock}_body (func_N localized).
  • tests/kernel/mutex/mutex_api links with CONFIG_GALE_WASM_LTO_MUTEX=y — native unlock compiled out, synth_k_mutex_unlock_body + trampoline linked.
  • Both sem+mutex wasm on → links with no duplicate symbols.
  • First silicon measurement: k_mutex_unlock = 501 cyc, SELFCHECK rc=0 owner=0. synth#331 signature absent (arg0 home write-once); testbed primitives lane GREEN. Repro: jess repro/synth-331/.

🤖 Generated with Claude Code

…mpoline + dist build)

Promotes the silicon-validated mutex unlock to a shippable wasm-cross-LTO module
mirroring the sem module (#59), now that synth v0.11.41 fixes the #331 spill-slot
collision that previously deadlocked it.

- zephyr/wasm/mutex_unlock_shim.c: production shim (faithful Zephyr v4.4.0 k_mutex
  layout owner@8/lock_count@12; seam = gale_k_mutex_unlock_decide inside the bundle).
- zephyr/wasm/gale_wasm_mutex_tramp.S: r11=0 trampoline (--native-pointer-abi).
- scripts/build-wasm-dist.sh: refactored into a build_module() helper; now emits BOTH
  sem and mutex modules + a combined manifest. mutex uses --native-pointer-abi.

Verified: build produces a valid ET_REL mutex .o — synth_k_mutex_unlock_body global,
gale_k_mutex_unlock_decide localized, all kernel imports renamed to gale_w_*, no
un-renamed imports; arg0-home spill slot write-once (synth#331 signature absent).
First silicon measurement: k_mutex_unlock = 501 cyc (G474RE, synth 0.11.41), SELFCHECK
rc=0 owner=0. Testbed primitives lane GREEN.

Follow-up (separate PR): Kconfig CONFIG_GALE_WASM_LTO_MUTEX + CMakeLists consumption
wiring + release-wasm.yml attach, mirroring the sem module's.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 13, 2026
@codecov

codecov Bot commented Jun 13, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…odule symbol fix

Completes the k_mutex_unlock wasm-cross-LTO module (build #60) into a consumable
feature, mirroring the sem module:
- zephyr/gale_mutex.c: #ifndef GALE_WASM_LTO_OVERRIDE_MUTEX_UNLOCK guard around the
  native z_impl_k_mutex_unlock (mirrors gale_sem.c).
- zephyr/Kconfig: CONFIG_GALE_WASM_LTO_MUTEX.
- zephyr/CMakeLists.txt: consumption block under CONFIG_GALE_KERNEL_MUTEX (artifact
  discovery via GALE_WASM_LTO_OBJ_DIR / _MUTEX_OBJ; links the .o + r11=0 tramp).
  gale_wasm_wrappers.c added only when CONFIG_GALE_WASM_LTO_SEM didn't (shared
  gale_w_* defs — avoids double-definition when both modules are on).
- release-wasm.yml: job name (sem + mutex); dist/* already attaches both.

multi-module symbol fix (build-wasm-dist.sh): export ONLY the body entry via
objcopy --keep-global-symbol=<body>, localizing the decide AND synth's internal
func_N helpers. Without this, sem.o and mutex.o each carry global func_7/func_8
and collide at final link — caught by building tests/kernel/mutex/mutex_api with
BOTH CONFIG_GALE_WASM_LTO_{SEM,MUTEX}=y. #59 never hit it (sem shipped alone).

Verified on nucleo_g474re (cortex-m4f): tests/kernel/mutex/mutex_api links with
CONFIG_GALE_WASM_LTO_MUTEX=y (native unlock compiled out, synth_k_mutex_unlock_body
+ trampoline linked) AND with both sem+mutex wasm on (no duplicate symbols).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@avrabe avrabe changed the title feat(wasm-dist): k_mutex_unlock wasm-cross-LTO module — shim + trampoline + dist build feat(wasm-dist): k_mutex_unlock wasm-cross-LTO module (complete: build + consume) Jun 13, 2026
The option landed inside the 'if GALE_KERNEL_SEM ... endif' block, so it was
unreachable unless the SEM kernel was also enabled. Moved it under
GALE_KERNEL_MUTEX with 'depends on GALE_KERNEL_MUTEX'. Verified: with
CONFIG_GALE_KERNEL_SEM unset + CONFIG_GALE_KERNEL_MUTEX=y, tests/kernel/mutex/
mutex_api now accepts CONFIG_GALE_WASM_LTO_MUTEX=y and links (nucleo_g474re).
Also exercises the sem-off branch of the gale_wasm_wrappers.c dedup guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 13, 2026
… building (func_N collision, Kconfig mis-gate)
@avrabe avrabe marked this pull request as draft June 13, 2026 20:13
@avrabe

avrabe commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

⚠️ Converted to draft — the module FAULTS in the full mutex_api suite on silicon (blocking)

Ran the upstream tests/kernel/mutex/mutex_api on real NUCLEO-G474RE with CONFIG_GALE_WASM_LTO_MUTEX=y (the actual consume path, beyond the link-only check). It USAGE-FAULTs on test_complex_inversion:

Running TESTSUITE mutex_api
START - test_complex_inversion
E: ***** USAGE FAULT *****
E: Faulting instruction address (r15/pc): 0x080003f4
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0

Root cause (characterized, not a path/logic bug)

The faulting instruction at body+0xc is 0xfac8 0100 = <UNDEFINED>. In the working body that spot is movw r1, #0 — a R_ARM_MOVW_ABS_NC reloc against __synth_globals. The dissolved mutex body references wasm data-section statics (__synth_globals at .data+0x10008, __synth_wasm_data) via many MOVW_ABS/MOVT_ABS relocs, and the Zephyr link corrupts those movw instructions into undefined ones → fault on the first unlock that reaches the prologue.

Two facts localize it:

  • The sem module has ZERO MOVW_ABS relocs (no wasm statics) — that's why feat(release): ship wasm-cross-LTO modules with releases (Kconfig consumption + pipeline) #59 works in-kernel and this doesn't. The mutex .o also carries a ~64 KB .data section (the wasm linear-memory image) that sem doesn't.
  • The no-waiter mutex microbench links the same .o and runs clean (501 cyc, SELFCHECK rc=0) — so the .o is fine; it's the Zephyr-link consumption path (CMakeLists wiring + linker flags, e.g. --no-relax) that mis-applies the MOVW_ABS relocs against the 64 KB .data.

What this means

The consumption wiring in this PR is insufficient for a module that carries wasm statics. Two things to resolve before this can land:

  1. Linker/reloc: why the Zephyr link corrupts MOVW_ABS __synth_globals (the microbench link doesn't) — likely flag/placement of the 64 KB .data.
  2. 64 KB .data is itself a problem on a 128 KB-RAM target — the dissolved mutex object dragging the full wasm linear memory is wasteful; the decide uses very little of it (a synth-side concern worth raising separately).

The build-pipeline + shim parts of this PR are sound and the bug is well-characterized; holding as draft until the static-data consumption path is fixed. The no-waiter hot path (501 cyc) and the codegen/#331 guard remain valid. Repro: tests/kernel/mutex/mutex_api -b nucleo_g474re + CONFIG_GALE_WASM_LTO_MUTEX=y + the dist obj dir.

avrabe added a commit that referenced this pull request Jun 13, 2026
…_globals corrupted at Zephyr link, 64KB .data); -> draft, characterized, owed root-cause
@avrabe

avrabe commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Root-cause narrowing (firing follow-up)

Ruled out two suspects:

  • Not the objcopy step: built the .o with --keep-global-symbol=<body> (current) vs the old --localize-symbol=<decide>MOVW_ABS/MOVT_ABS reloc tables and .data size are byte-identical. So the func_N-dedup objcopy change didn't cause it.
  • Not a malformed synth reloc: the no-waiter microbench links the same .o and executes body+0xc every call without faulting → that movw __synth_globals is a valid instruction when linked in that image. An undefined instruction there would fault unconditionally. So synth's reloc is well-formed; it links correctly in one image and is corrupted in another.

So the corruption is link-context-specific: the mutex_api Zephyr image's MOVW_ABS_NC __synth_globals application turns movw r1,#… into 0xfac8… (undefined), while the microbench image links the identical reloc correctly. The distinguishing factor is the ~64 KB .data section the dissolved mutex object carries (the full wasm linear-memory image — __synth_globals at .data+0x10008) and where it lands in the larger image.

The deeper lever: the dissolved object shouldn't carry 64 KB of wasm linear memory for a decide that uses a handful of constants — that's both the source of these absolute relocs and a non-starter on a 128 KB-RAM target. Trimming the emitted linmem to what the decide actually references (synth-side) would remove the reloc shape entirely, the way the sem module has none. Holding #60 draft; this is the gate.

@avrabe

avrabe commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Root cause pinned — and the fix is on our side (mirror the sem module)

The corruption traces to the decide's return ABI, not the linker per se:

  • gale_k_mutex_unlock_decide returns a 12-byte struct {int32_t ret; uint8_t action; uint32_t new_lock_count}. clang/wasm32 lowers a >8-byte struct return via sret → the wasm needs a linear-memory stack frame: (memory 2) + (global $__stack_pointer i32.const 65536), with 12 i32.load/store ops. synth emits that memory as the 64 KB .data (__synth_wasm_data) and the stack-pointer global as __synth_globals, referenced via the MOVW_ABS/MOVT_ABS relocs that the Zephyr link mangles into the undefined instruction.
  • gale_k_sem_give_decide returns a packed uint64_t(param i32 i32 i32) (result i64), register return, no linear memory, zero MOVW_ABS relocs. That's exactly why the sem module is a clean drop-in and this isn't.

Fix (gale-side, mirrors sem): change gale_k_mutex_unlock_decide to return a packed u64 (e.g. ret / action / new_lock_count bit-packed) instead of a struct. That removes the sret → removes the 64 KB linmem .data → removes the __synth_globals/__synth_wasm_data abs-relocs → the mutex object becomes the same shape as the working sem object. Touches: the FFI decide + its Verus proof, the wasm shim decode, and the native gale_mutex.c decode (all the way sem already does it). Tracking as the unblock for this PR.

(Also avoids the 64 KB-RAM cost entirely — a separate non-starter on a 128 KB target.)

@avrabe

avrabe commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Correction — tested the u64-repack hypothesis at the artifact level; it's promising-but-partial, not a full fix

I prototyped the packed-u64 decide (FFI + shim) on scratch and rebuilt the wasm object to check the claim before committing. Result:

  • Removed all __synth_globals relocs (the sret stack-pointer frame). The instruction that USAGE-FAULTed — body+0xc movw r1,#0 against __synth_globals — is exactly that class, so the repack eliminates that reloc. Good progress.
  • But the 64 KB .data and 6 __synth_wasm_data MOVW_ABS/MOVT_ABS relocs persist in the body (MOVW_ABS went 10→6, not 0). The wasm still declares (memory 2) + (global $__stack_pointer 65536).

So returning u64 is necessary but not sufficient — it kills the sret/struct-return linmem cost, but the dissolved body still references __synth_wasm_data (the residual relocs the Zephyr link could still corrupt). My earlier "drop-in like sem" was over-stated; sem has zero such relocs, the mutex body still has six.

Where that leaves it:

  1. The packed-u64 ABI is still worth doing (removes sret + the __synth_globals faulting reloc) — but it's a full feature-loop change (FFI + 6 Kani harnesses + gale_mutex.h + native gale_mutex.c + tests, re-verified) and must be confirmed by a silicon mutex_api re-test, not assumed.
  2. The residual __synth_wasm_data relocs + 64 KB .data are the deeper issue — likely synth emitting the full wasm linear memory for a body that touches a few bytes. Trimming that (synth-side) is what would make it truly sem-shaped.

Reverted the scratch; #60 stays draft. Honest status: root cause is the link-context MOVW_ABS corruption against the dissolved body's wasm-data references; the u64 ABI removes one class of those but not all.

@avrabe

avrabe commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Filed the synth-side fix lever as pulseengine/synth#345 — dissolved --relocatable objects carrying the full 64 KB wasm linmem .data + absolute MOVW relocs (the root of the link corruption + a 128 KB-RAM non-starter). Kill-criterion there matches this PR: the dissolved k_mutex_unlock links + runs mutex_api on G474RE without fault, .data bounded by referenced bytes (sem-shaped). #60 stays draft pending that.

avrabe added a commit that referenced this pull request Jun 14, 2026
…full mutex_api on G474RE, both complex_inversion + recursive, VCP autodetect)

One-command verification to run the moment synth#345 (.bss + PC-rel) lands: rebuilds the dist
mutex .o (--native-pointer-abi) -> builds upstream tests/kernel/mutex/mutex_api with
CONFIG_GALE_WASM_LTO_MUTEX=y -> flash G474RE -> assert PROJECT EXECUTION SUCCESSFUL, no FAULT.
The no-waiter microbench only exercised UNLOCKED; this runs the full suite that caught the real
faults (complex_inversion link-corruption + the without-flag recursive MPU fault). Run from the
feat/wasm-mutex-module branch (needs the shim + CMakeLists wiring) until PR #60 merges.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 14, 2026
…ted prior 3-arg error) — sem-shaped/seam-folded, drop-in-ready; full module queued (modest value, #60 blocked)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… shim (gale#62)

The dissolved z_impl_k_mutex_unlock omitted the priority restoration the real
unlock does via adjust_owner_prio — the unlocking thread kept its inherited
boost, failing test_mutex_priority_inheritance (wasm-LTO FAIL / native PASS).

Add two gale_w_* wrappers (k_thread is opaque to the wasm shim):
  - gale_w_adjust_thread_prio: z_thread_prio_set(thread, new_prio) if changed
  - gale_w_thread_prio: read base.prio
Shim UNLOCKED path now restores cur to owner_orig_prio BEFORE handoff (while
mutex->owner is still cur), and records the new owner's orig prio on handoff.

Verified on G474RE silicon (synth v0.11.43): mutex_api 4/4 + mutex_api_1cpu 5/5
PASS incl test_mutex_priority_inheritance (6.001s, was FAIL 0.520s), no fault.
Module shape unchanged (.data=4, MOVW_ABS=0). Closes #62.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@avrabe avrabe marked this pull request as ready for review June 14, 2026 10:56
@avrabe

avrabe commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

Marking ready for review — both blockers resolved and the module is silicon-verified-correct.

Blockers cleared:

  • synth#345 (link survivability) — fixed in synth v0.11.43. The dissolved mutex object's reloc shape flipped: MOVW/MOVT_ABS 22→0, R_ARM_ABS32 0→11 (link-survivable pool words), .data 65548→4 (.bss=65536). The test_complex_inversion USAGE FAULT and test_mutex_recursive MPU FAULT are gone.
  • gale#62 (shim priority-inheritance) — fixed here (713362f): added gale_w_adjust_thread_prio + gale_w_thread_prio wrappers; the UNLOCKED path now restores the unlocking thread's inherited boost before handoff.

Verified on real G474RE silicon (synth v0.11.43, CONFIG_GALE_WASM_LTO_MUTEX=y): full upstream tests/kernel/mutex/mutex_apimutex_api 4/4 + mutex_api_1cpu 5/5 PASS (incl. test_mutex_priority_inheritance 6.0s), no fault — matches the native gale mutex.

CI: 60/60 green (the lone earlier stack (qemu_cortex_m3) red was a Rust-install setup flake — passed on re-run; PR touches nothing in the stack primitive).

Mutex joins sem (give+take) + pipe (rd/wr) as a verified-correct dissolved primitive; this one is full-suite-correct on hardware, not just shape.

@avrabe avrabe merged commit 222602e into main Jun 14, 2026
101 of 102 checks passed
@avrabe avrabe deleted the feat/wasm-mutex-module branch June 14, 2026 11:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant