Arg-register lowering drops/shifts the first arg for calls with 5 args + struct(sret) return (cortex-m4f, --native-pointer-abi) #359

@avrabe

Summary

A call to a function with 5 scalar args that returns a 16-byte struct by value gets mismatched argument-register assignment between the call site and the callee: the caller omits the first argument from the register sequence and shifts the remaining four into r0–r3, while the callee reads r0–r3 as args 1–4. The result is silently wrong values (no fault). It reproduces without loom (running synth directly), so it's a synth ARM-lowering issue, not a loom issue.

Found while extending the wasm-cross-LTO work to a 4th struct-return primitive (k_msgq_put). The decide functions of the 3 working primitives (sem, mutex, stack) all take ≤3 args; msgq's decide is the first with 5 args, which is what exposes this.

Repro

Function under test: gale_k_msgq_put_decide(write_idx, used_msgs, max_msgs, has_waiter, is_no_wait) -> GaleMsgqPutDecision (16-byte #[repr(C)]: i32 ret; u8 action; u32 new_write_idx; u32 new_used).

Pipeline: clang --target=wasm32 -O2 → wasm-ld → synth compile --target cortex-m4f --native-pointer-abi --all-exports --relocatable. (loom is not required; see Isolation below.)
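As a sanity check on the 16-byte figure, here is a minimal sketch that models the return struct with Python's ctypes. The field names and types are taken from the description above; the class itself is only an illustration, and it assumes the host's natural alignment for these field types matches the 32-bit target (4-byte alignment for the 32-bit fields), which holds on common hosts.

```python
import ctypes


class GaleMsgqPutDecision(ctypes.Structure):
    # Field order and widths as listed in the issue; the ctypes model
    # is an assumption used only to illustrate the C layout.
    _fields_ = [
        ("ret", ctypes.c_int32),
        ("action", ctypes.c_uint8),
        ("new_write_idx", ctypes.c_uint32),
        ("new_used", ctypes.c_uint32),
    ]


# The u8 field is followed by 3 bytes of padding so that the next
# u32 is 4-byte aligned: 4 + 1 + 3 + 4 + 4 = 16 bytes.
for name, _ in GaleMsgqPutDecision._fields_:
    print(name, getattr(GaleMsgqPutDecision, name).offset)
print("size:", ctypes.sizeof(GaleMsgqPutDecision))
```

A struct of this size does not fit in a register and is therefore returned through memory, which is the sret part of the trigger discussed under Isolation.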

Caller (the dissolved z_impl_k_msgq_put body) just before bl <gale_k_msgq_put_decide> — synth, NO loom:

ldr r5, [fp, msgq+32]   ; r5 = used_msgs
ldr r7, [fp, msgq+12]   ; r7 = max_msgs
... cmp reader,#0 ; r1 = has_waiter
ldr r4, [sp,#104]       ; r4 = is_no_wait
mov r0, r5              ; r0 <- used_msgs
mov r1, r7              ; r1 <- max_msgs
mov r2, r1(has_waiter)  ; r2 <- has_waiter
mov r3, r4              ; r3 <- is_no_wait
bl  <gale_k_msgq_put_decide>

write_idx (computed earlier) is never moved into an argument register. The caller passes args 2..5 in r0..r3 and drops arg1.

Callee gale_k_msgq_put_decide entry:

stmdb sp!, {r4,r5,r6,r7,r8,lr}
str r0,[sp,#24] ; str r1,[sp,#28] ; str r2,[sp,#32] ; str r3,[sp,#36]
... uses r1,r2,r3 (=args) in the used<max comparison

The callee reads r0..r3 as args 1..4 (write_idx, used, max, has_waiter).

Net effect

  • callee used_msgs ← caller max_msgs (8)
  • callee max_msgs ← caller has_waiter (0)
  • used(8) >= max(0) → not(used<max) → Full → returns -ENOMSG.
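
The net effect above can be reproduced with a small model of the two register views. This is a sketch, not synth's actual lowering: `decide` is a hypothetical stand-in that keeps only the used<max test from the report, the fifth argument's location is not modelled, and `is_no_wait=1` is an arbitrary assumed value.

```python
ENOMSG = 35  # matches the rc=-35 seen on silicon


def decide(write_idx, used_msgs, max_msgs, has_waiter):
    """Hypothetical stand-in for the callee: only the used<max test."""
    return 0 if used_msgs < max_msgs else -ENOMSG


def caller_regs_observed(write_idx, used_msgs, max_msgs, has_waiter, is_no_wait):
    # Observed caller: write_idx is dropped, args 2..5 land in r0..r3.
    return {"r0": used_msgs, "r1": max_msgs, "r2": has_waiter, "r3": is_no_wait}


def caller_regs_agreeing(write_idx, used_msgs, max_msgs, has_waiter, is_no_wait):
    # What the callee expects: args 1..4 in r0..r3.
    return {"r0": write_idx, "r1": used_msgs, "r2": max_msgs, "r3": has_waiter}


def callee(regs):
    # Observed callee: r0..r3 = write_idx, used_msgs, max_msgs, has_waiter.
    return decide(regs["r0"], regs["r1"], regs["r2"], regs["r3"])


# Freshly initialised empty queue with max_msgs = 8.
args = dict(write_idx=0, used_msgs=0, max_msgs=8, has_waiter=0, is_no_wait=1)
rc_observed = callee(caller_regs_observed(**args))
rc_agreeing = callee(caller_regs_agreeing(**args))
print(rc_observed, rc_agreeing)  # -35 0
```

With the observed caller, the callee sees used_msgs=8 and max_msgs=0, so the model returns -35. With agreeing register assignment, the same inputs return 0.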

On silicon (G474RE): k_msgq_put on a freshly k_msgq_init'd empty queue returns rc=-35 (-ENOMSG) and stores nothing, instead of returning rc=0. Native k_msgq_put takes 145 cycles and is correct; the wasm-cross-LTO build returns the wrong result without faulting. The msg layout was independently verified correct via DWARF, and hardcoding write_idx=0 does not change the outcome (so it's not the write_idx division): the args are mis-assigned at the ABI level.

Isolation

  • Reproduces with synth compile directly on the wasm-ld output (no loom) — identical caller/callee register mismatch. → synth, not loom.
  • The 3 struct-return decide functions with ≤3 args (sem/mutex/stack, 12-byte returns) lower correctly and run correctly on silicon. The trigger is 5 args (and/or the 5-args + sret interaction).

Kill-criterion

Wrong if, after the fix, the dissolved z_impl_k_msgq_put passes write_idx in the correct arg register (caller/callee agree) and the bench returns rc=0 + round-trips the value on an empty queue. Repro harness: gale-smart-data/.../wasm-testbed/msgq-microbench/ (one command: builds + flashes + measures native vs wasm).
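
The pass condition named in the criterion above can be written as a single predicate. This is a hypothetical sketch: `fix_verified` and the register-map dictionaries are illustrative names, not part of the repro harness.

```python
def fix_verified(caller_map, callee_map, rc, sent, received):
    """True when caller and callee agree on which register carries each
    argument, the bench returns rc=0, and the value round-trips."""
    return caller_map == callee_map and rc == 0 and sent == received


# Register views taken from the disassembly in this report.
callee_map = {"r0": "write_idx", "r1": "used_msgs",
              "r2": "max_msgs", "r3": "has_waiter"}
observed_caller_map = {"r0": "used_msgs", "r1": "max_msgs",
                       "r2": "has_waiter", "r3": "is_no_wait"}

# Current behaviour: maps disagree, rc=-35, nothing stored.
print(fix_verified(observed_caller_map, callee_map, -35, 0x1234, None))
# Behaviour the criterion asks for (0x1234 is an arbitrary test value).
print(fix_verified(callee_map, callee_map, 0, 0x1234, 0x1234))
```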

I'm the on-silicon gate (G474RE) — will re-measure the moment a fix lands.
