feat(test): automated Parallels launcher→terminal test harness by ryanbreen · Pull Request #411 · ryanbreen/breenix

ryanbreen · 2026-06-02T01:17:15Z

What this does

Host-side automation that drives the real Breenix GUI input path on a fresh Parallels VM and validates it with serial-log oracles — no kernel/userspace changes:

boot (run.sh --parallels) → BWM ready → double-tap SUPER →
  /bin/blauncher (Terminal pre-selected) → Enter → /bin/bterm

A run PASSes only when the serial log shows both [spawn] path='/bin/bterm' and [bterm] config:. "Launcher opened" alone ([spawn] path='/bin/blauncher') is an explicit FAIL — the oracle never accepts weaker evidence than "the terminal actually launched and initialized".
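The gate described above can be sketched as a small shell function. This is only an illustration, assuming the serial log is a readable text file; `oracle_verdict` and the FAIL reason strings are hypothetical names, not the harness's actual code.

```shell
# Sketch of the dual-oracle verdict over a serial-log excerpt.
# oracle_verdict and the FAIL reason wording are illustrative assumptions.
oracle_verdict() {
  local log="$1"
  # -a: treat the log as text even if it contains stray binary bytes.
  # -F: match each marker literally, so '[' is not read as a regex class.
  if grep -aqF "[spawn] path='/bin/bterm'" "$log" &&
     grep -aqF "[bterm] config:" "$log"; then
    echo "RESULT: PASS"
    return 0
  fi
  # Anything short of both markers is a FAIL, including "launcher opened".
  if grep -aqF "[spawn] path='/bin/blauncher'" "$log"; then
    echo "RESULT: FAIL: launcher opened but terminal did not launch"
  else
    echo "RESULT: FAIL: launcher did not open"
  fi
  return 1
}
```

Both markers are required together, so a log that contains only the launcher spawn line can never produce PASS.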

Files

- scripts/parallels/inject.sh: canonical prlctl send-key-event helper (PS/2 set-1 scancodes, extended-key aware for Super = 0xE0 0x5B). Errors loudly (exit 2) on an empty/unset $VM instead of silently no-op'ing.
- scripts/parallels/launcher-smoke.sh: one full run; prints exactly RESULT: PASS (exit 0) or RESULT: FAIL: <reason> (exit 1), with an evidence dir (serial excerpt + screenshots + result.txt). Includes a locked-screen preflight (refuses to run on a locked Mac, where Parallels silently drops injected keys) and a caffeinate -d keep-alive, both wired into the cleanup trap. The injection trigger is isolated to one config block at the top (SUPER_PREFIX/SUPER_CODE/INTER_TAP_MS/ENTER_CODE).
- .claude/workflows/parallels-launcher-test.js: runs the smoke test sequentially (one VM, never parallel) up to 15×; gate = 10 consecutive PASS; reports the streak + first failure.
- docs/planning/parallels-test-harness/README.md, RALPH_STATE.md: the proven recipe, host prerequisites, and known limitations.
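The gate logic (up to 15 sequential attempts, stop at 10 consecutive PASS, report streak and first failure) can be sketched in shell. The real workflow is a JavaScript file; `run_gate`, `GATE_NEED`, and `GATE_MAX` below are hypothetical names used only for this illustration.

```shell
# run_gate CMD [ARGS...]
# Runs CMD sequentially until GATE_NEED consecutive successes or GATE_MAX
# attempts, whichever comes first. Names and output format are assumptions.
run_gate() {
  local need="${GATE_NEED:-10}" max="${GATE_MAX:-15}"
  local streak=0 attempt=0 first_fail=""
  while [ "$attempt" -lt "$max" ] && [ "$streak" -lt "$need" ]; do
    attempt=$((attempt + 1))
    if "$@"; then
      streak=$((streak + 1))
    else
      streak=0                                  # any failure resets the streak
      [ -n "$first_fail" ] || first_fail="$attempt"
    fi
  done
  echo "attempts=$attempt streak=$streak first_failure=${first_fail:-none}"
  [ "$streak" -ge "$need" ]                     # exit status: gate met or not
}
```

A caller would pass the smoke script as the command, for example `run_gate bash scripts/parallels/launcher-smoke.sh`. Because the loop is a plain sequential `while`, two attempts can never overlap on the single VM.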

Recipe

The recipe is proven in code and was walked manually in a prior session: double-tap Super (bwm.rs load_defaults binds SUPER+SUPER → exec /bin/blauncher), blauncher pre-selects APPS[0] = "Terminal", Enter alone launches /bin/bterm. Injection is prlctl send-key-event <VM> --scancode <ps2-set1> --event press|release — NOT CGEvents.

⚠️ Validation status — live 10× run is PENDING AN UNLOCKED MAC

The harness is built and verified (both scripts bash -n clean; the recipe walked manually before), but the live 10-in-a-row validation has not run, because:

Root cause: prlctl send-key-event reaches the guest only when the macOS console is unlocked. With the screen locked, Parallels detaches the VM window and silently drops every injected keystroke (send-key-event returns rc=0, key never lands — proven functionally by injecting = into the Bounce demo with no effect). This is not a TCC/Accessibility issue: injection goes through the virtual xHCI HID via prl_disp_service, not macOS CGEvent — so no permissions grant fixes it. There is no non-interactive unlock bypass; the smoke script preflights this and fails fast with a clear message.

Exact steps to run the validation (operator, at the console):

1. Physically unlock the Mac at the console.
2. Disable auto-lock for unattended runs: System Settings → Lock Screen → "Require password after screen saver begins/display is turned off" = Never/Off.
3. caffeinate -d & (the smoke script also starts one, but disabling auto-lock is still required).
4. bash scripts/parallels/launcher-smoke.sh (single run), then invoke the parallels-launcher-test workflow for the 10× gate.

QEMU is not a substitute for this flow: BWM's ARM64 path needs the Parallels-specific VirGL 3D compositor (absent on QEMU here, so BWM never starts), and the SUPER hotkey reads SUPER_PRESSED only from the USB-HID/xHCI driver, which never enumerates on QEMU (the virtio-keyboard MMIO driver never tracks Super). Making QEMU viable would require kernel changes (software-compositor fallback for BWM + a virtio-keyboard→SUPER bridge) — out of scope for this host-side harness.

Test plan

- bash scripts/parallels/launcher-smoke.sh prints RESULT: PASS (requires unlocked Mac)
- parallels-launcher-test workflow reports consecutiveGreenAchieved: true, i.e. 10 consecutive green (requires unlocked Mac)

🤖 Generated with Claude Code

Host-side automation that drives the real Breenix GUI input path on a fresh Parallels VM and validates it with serial-log oracles:

boot (run.sh --parallels) -> BWM ready -> double-tap SUPER -> /bin/blauncher (Terminal pre-selected) -> Enter -> /bin/bterm

PASS requires real serial evidence that bterm spawned AND emitted its config line -- "launcher opened" alone is an explicit FAIL.

Files:
- scripts/parallels/inject.sh -- canonical prlctl send-key-event helper (PS/2 set-1 scancodes; extended-key aware; errors loudly on empty $VM).
- scripts/parallels/launcher-smoke.sh -- one full run, prints exactly "RESULT: PASS" / "RESULT: FAIL: <reason>". Locked-screen preflight (refuses to run on a locked Mac, where Parallels silently drops injected keys) plus a caffeinate -d keep-alive, both wired into the cleanup trap.
- .claude/workflows/parallels-launcher-test.js -- runs the smoke test sequentially (one VM, never parallel) up to 15x; gate = 10 consecutive PASS.
- docs/planning/parallels-test-harness/{README,RALPH_STATE}.md -- proven recipe, host prerequisites, and known limitations.

Documents the night's findings: the macOS console must be unlocked for prlctl send-key-event to reach the guest (it injects through the virtual xHCI HID via prl_disp_service, NOT macOS CGEvent/TCC -- so no permissions grant fixes a locked screen), the unattended-run requirements (disable auto-lock + caffeinate), and why QEMU is not a viable substitute for this flow (BWM needs the Parallels-specific VirGL compositor and SUPER is only read from the USB-HID/xHCI driver, which never enumerates on QEMU).

Validation status: the live 10x run is PENDING AN UNLOCKED MAC. The recipe is proven in code and was walked manually in a prior session.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…VM from run.sh stdout

Adversarial correctness review of the launcher-test harness (PR #411), which has never been run end-to-end. Found and fixed one dangerous race that could have caused a false readiness signal / wrong-VM injection on the very first real run:

1. Stale-serial false match (HIGH). The readiness poll grepped /tmp/breenix-parallels-serial.log for the BWM ready marker with no guarantee the log was the fresh one this boot created. run.sh only `rm -f`s and recreates the serial log late (right before `prlctl start`, after the whole build). A leftover prior-run log at that path already containing the marker (confirmed present on the test Mac right now) would be matched as "ready" before the VM even started, after which BASE_LINE/tail-since would be computed against the wrong file and the oracle greps would see nothing. Fix: snapshot the leftover log's inode before launching run.sh and only trust the marker once the log's inode changes (fresh file): serial_inode() + serial_is_fresh() gate the readiness poll.

2. Indirect VM-name resolution (MEDIUM). `prlctl list -a | grep breenix- | tail -1` could select a leftover/stuck breenix-* VM (run.sh's old-VM delete is best-effort). Fix: resolve the VM name authoritatively from run.sh's own `VM: breenix-<epoch>` stdout line in RUN_LOG (printed only after the fresh VM is created+started), falling back to the prlctl heuristic.

The proven recipe (double-tap SUPER trigger, Enter, and the dual-oracle PASS gate requiring BOTH `[spawn] path='/bin/bterm'` AND `[bterm] config:`) is unchanged. README updated to match. inject.sh and the workflow JS were reviewed and required no changes. bash -n, node --check, and shellcheck clean (only an SC2329 false positive on the trap-invoked cleanup()).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
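A minimal sketch of the two fixes, under stated assumptions. The function names serial_inode() and serial_is_fresh() come from the commit message, but the bodies below are guesses at one portable way to write them; `resolve_vm_name` is a hypothetical helper, and the exact position of the `VM:` line within run.sh's output is assumed.

```shell
# Print the inode number of a file, or nothing if it does not exist.
# `ls -i` is used because the stat flags differ between macOS and Linux.
serial_inode() {
  [ -e "$1" ] || return 0
  ls -i "$1" | awk '{print $1}'
}

# True only when the log exists AND its inode differs from the snapshot
# taken before run.sh was launched (an empty snapshot means "no leftover").
serial_is_fresh() {
  local log="$1" before="$2" now
  now="$(serial_inode "$log")"
  [ -n "$now" ] && [ "$now" != "$before" ]
}

# Hypothetical helper: take the VM name from the last
# "VM: breenix-<epoch>" line in the captured run.sh output.
resolve_vm_name() {
  sed -n 's/.*VM: \(breenix-[0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}
```

The snapshot has to be taken before run.sh starts; the readiness poll would then skip any marker match until `serial_is_fresh` succeeds, so a leftover log can never satisfy it.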

…idate bterm's own startup

Three fixes from the first live end-to-end runs on an unlocked Mac (the flow is now proven working: double-Ctrl opens /bin/blauncher, Enter launches /bin/bterm, terminal window + child shell come up; serial + screenshot evidence):

- set -e lock preflight: the python lock probe exits 1 when UNLOCKED (the required state); as a bare statement that tripped `set -e` and aborted before reading $?. Run it as an if-condition (set -e exempt). Previously the harness died in ~1s on an unlocked Mac, the one state in which it must run.
- injection: Parallels 26.3.3 rejects `--scancode 91` (0x5B Super) with "Invalid scan code sequence" and offers no way to send the 0xE0 0x5B extended pair as separate --scancode calls. Breenix's HID layer maps the Left-Ctrl bit to the SUPER modifier, so inject Left-Ctrl (scancode 29, no prefix): accepted by Parallels and the exact "double control key" the operator describes.
- oracle: blauncher launches bterm via fork+execv, which does NOT emit the kernel's "[spawn] path='/bin/bterm'" line. Validate bterm's OWN startup logs instead -- '[bterm] config:' AND '[bterm] spawned child pid=' (terminal started AND loaded its shell). Stronger, honest proof (the binary actually ran); never weakens the gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
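The `set -e` pitfall in the first fix is easy to reproduce in isolation. The sketch below uses a stand-in probe; `lock_probe`, `preflight`, and the `SCREEN_LOCKED` variable are hypothetical and only mimic the exit-code convention described above (exit 1 means unlocked).

```shell
set -e

# Stand-in for the real lock probe: exits 0 when locked, 1 when UNLOCKED.
# SCREEN_LOCKED is a hypothetical variable used only to drive this sketch.
lock_probe() {
  [ "${SCREEN_LOCKED:-0}" = "1" ]
}

preflight() {
  # Written as a bare statement, `lock_probe` returning 1 would abort the
  # script under `set -e` before $? could be inspected. A command tested
  # by `if` is exempt from errexit, so both outcomes are handled here.
  if lock_probe; then
    echo "RESULT: FAIL: screen is locked"
    return 1
  fi
  echo "preflight ok: screen unlocked"
}
```

With the probe in the `if` condition, the unlocked case falls through to the success message instead of killing the script.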

The double-tap trigger is timing-sensitive (bwm requires two Ctrl taps within a 400ms window). On a CPU-throttled / overloaded host, prlctl send-key-event latency balloons (observed 162s for a single doubletap at 4 VM cores), spreading the two taps far past the window so the launcher never opens. Log the injection wall-time and warn when it exceeds ~350ms, so a "launcher did not open" failure is diagnosable as a timing miss vs. the key never reaching the guest.

Conclusion from the throttled gate: do NOT throttle these runs. The flow works at full CPU (proven once end-to-end); reliability must be measured at full speed, which means running when the operator is away rather than throttled alongside them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… body, --no-build)

The generated workflow had two bugs that would have wrecked a real run:

- it invoked launcher-smoke.sh WITHOUT --no-build, so each of up to 15 attempts would trigger a full kernel+userspace+ext2 rebuild (~10 min each).
- it was written as `export default async function run()` calling `agent({prompt, schema})`, but the Workflow runtime executes the script BODY directly and agent() takes (promptString, {schema}) -- so as written the loop was never invoked.

Rewrite to the documented pattern: top-level body with phase()/await agent(), agent(prompt, {schema}), --no-build, a pre-run lock guard, and per-attempt injection-wall-time capture. Stops at 10 consecutive PASS or 15 attempts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…thout breaking injection)

The operator uses this Mac while runs happen, so the VM must not hog CPU, but throttling it breaks the timing-sensitive double-tap. Resolution:

- background_vm_proc: drop the VM to `renice 20` (perf cores, polite under contention) as soon as it boots, through the long boot/warmup phases.
- foreground_vm_proc: restore `renice 0` for the brief double-tap injection window.
- Use renice ONLY (no `taskpolicy -b`): E-core banishment starved the guest so it couldn't consume the two taps inside bwm's 400ms window (observed 1876ms).
- Add --no-background opt-out; bump default timeout to 1200s (backgrounded boots are slower).

NB: a separate, host-side issue gates reliability: `prlctl send-key-event` latency is variable and coupled to host load (seen 0.4s..166s/call); a double-tap needs each call <~100ms, which requires a responsive/quiet Parallels dispatcher. The renice toggle fires correctly; an end-to-end PASS with it is still pending a responsive dispatcher (run on a quiet host / after a Parallels restart).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…enix VM is already running

run.sh kills any existing breenix VM before creating its own, so two overlapping launcher-smoke runs would destroy each other's in-flight VM (and two VMs would fight the Parallels dispatcher). Add a preflight that emits RESULT: FAIL and exits if a breenix VM is already running, enforcing strictly-serial execution even if a caller accidentally launches runs concurrently.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…t -j` (load-independent double-tap)

ROOT CAUSE of the failing reliability gate (15/15 fail, every double-tap ~1.9s): the double-tap was 4 SEPARATE `prlctl send-key-event` spawns, each ~475ms on a loaded host, so the two taps landed ~1.9s apart, far outside bwm's 400ms window. Proof #3 only passed because the dispatcher was fast on an idle (5am) host.

FIX: send every command as ONE `prlctl send-key-event -j` batch (JSON event array on stdin). The inter-event delays are then applied by the Parallels dispatcher with precise timing, INDEPENDENT of prlctl's per-spawn latency, so the double-tap lands inside the 400ms window regardless of host load.

Validated: the whole double-tap is one ~0.6s call with the two taps spaced exactly 190ms by the dispatcher (vs ~1.9s and unreliable across 4 spawns).

- inject.sh: tap/doubletap/hold/type now build a JSON event array and send it via one `-j` stdin call.
- launcher-smoke.sh: the injection wall-time log is reworded (wall-time is now just prlctl overhead, not the tap spacing).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
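To make the batching idea concrete, here is a rough sketch of building one event array for a double-tap. The commit message only says that a JSON event array is sent on stdin; the field names (`scancode`, `event`, `delay`), the meaning of `delay`, and the helper name `doubletap_json` are all assumptions for illustration and should be checked against the Parallels documentation and the real inject.sh.

```shell
# doubletap_json SCANCODE HOLD_MS GAP_MS
# Emits ONE JSON array describing press/release/press/release.
# ASSUMPTION: the field names and "delay before this event, in ms"
# semantics are illustrative, not confirmed prlctl schema.
doubletap_json() {
  local code="$1" hold="$2" gap="$3"
  printf '[{"scancode":%d,"event":"press","delay":0},' "$code"
  printf '{"scancode":%d,"event":"release","delay":%d},' "$code" "$hold"
  printf '{"scancode":%d,"event":"press","delay":%d},' "$code" "$gap"
  printf '{"scancode":%d,"event":"release","delay":%d}]\n' "$code" "$hold"
}

# Intended use (not executed here): pipe the array into a single call,
# so only one prlctl process is spawned for the whole double-tap.
#   doubletap_json 29 30 190 | prlctl send-key-event "$VM" -j
```

The design point is that all four events travel in one process spawn, so the spacing between taps is decided by the values in the array rather than by how long each `prlctl` invocation takes to start.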

…m input drops

The 10/15 gate exposed two REAL Breenix intermittency bugs (not harness issues):

- ~25% double-tap drop: bwm never registers the (correctly batched, dispatcher-timed) double-tap: blauncher truly never spawns (verified absent across the whole boot, not late). Guest-side BWM/HID input intermittency.
- EC=0xe Illegal Execution State crash on the Enter->fork/exec->bterm path (run-124137): launcher opened, then [UNHANDLED_EC] cpu=5 + [FATAL_POSTMORTEM]; the handler parks the CPU in idle so heartbeats continue (looks "hung"). This is clone-exec/TTBR0 SMP territory, the area of this branch's in-flight fixes.

Make the harness an honest bug-detector: grep the post-injection serial for [UNHANDLED_EC]/[FATAL_POSTMORTEM]/panic and report "KERNEL FAULT ..." with the offending line, distinctly from a benign "double-tap dropped" or "terminal did not launch". No silent retry-to-green: the gate honestly reports the real reliability (and which failure mode), per the no-faking-tests policy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
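The failure classification described above might look roughly like the following. This is a sketch, assuming the post-injection serial excerpt is already in a file; `classify_failure`, the exact reason wording, and the use of the `[spawn] path='/bin/blauncher'` line as the "launcher opened" signal are assumptions.

```shell
# classify_failure LOG
# Distinguishes a kernel fault from the two benign failure modes.
classify_failure() {
  local log="$1" fault
  # First fault line, if any; `|| true` keeps a no-match from tripping set -e.
  fault="$(grep -aE -m1 '\[UNHANDLED_EC\]|\[FATAL_POSTMORTEM\]|panic' "$log" || true)"
  if [ -n "$fault" ]; then
    echo "RESULT: FAIL: KERNEL FAULT: $fault"
  elif ! grep -aqF "[spawn] path='/bin/blauncher'" "$log"; then
    echo "RESULT: FAIL: double-tap dropped"
  else
    echo "RESULT: FAIL: terminal did not launch"
  fi
}
```

Checking for faults first matters: a crash after the launcher opened would otherwise be reported as the milder "terminal did not launch".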

…postmortem

The EC=0xe (Illegal Execution State) catch-all previously printed only ELR, which is not enough to confirm WHY the ERET landed in an illegal state. Add, on the fatal park path only (interrupts already masked; lock-free raw-UART output like the existing [UNHANDLED_EC]/[DATA_ABORT] lines; nothing on hot paths):

- [FATAL_REGS]: spsr, esr, far, elr, sp, x0..x30 from the exception frame.
- [FATAL_THREAD]: current tid, saved_by_inline_schedule, ctx_elr_el1 via the deadlock-safe scheduler try_dump_state (try_lock; skips if busy), the same accessor the PC_ALIGN fatal handler already uses.

This makes the next capture of the intermittent crash decisive: SPSR shows the illegal PSTATE, and saved_by_inline_schedule + ctx_elr_el1 directly confirm/refute the stale-elr_el1-restored-on-dispatch-ERET hypothesis. Diagnostic only; exception.rs only (no gold-master / context_switch.rs / userspace).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The launcher-test harness reproduced an intermittent crash; forensic analysis (enhanced postmortem b196121 + symbolization + trace ring) confirms the proximate cause with high confidence: idle_loop_arm64's register file gets saved into a non-idle thread's Thread.context, which is later dispatched via ERET into .bss (0x269000=WAKE_SITE_SCHEDULE) -> EC=0x0 (UDF) or EC=0xe (illegal SPSR). Same bug, two exception classes. Unifies the prior crash hunt + the branch's TTBR0/clone-exec cluster.

Fix is in gold-master context_switch.rs and the obvious mitigation intersects the "NO EL0 dispatch guard" autopsy warning -> documented as a signoff proposal, not applied. Doc lays out both fix options, the upstream-writer candidates, the Parallels-only confirmation path, and how to validate via the harness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…'t dropped

The launcher double-tap was dropped ~22% of the time: the modifier path is polled-level (hid.rs SUPER_PRESSED.store), and bwm samples it once per (bursty, GPU-fenced) compositor wake, so a tap's ~30ms high window can fall entirely between two polls and be missed -> tap_count reaches 1 not 2 -> launcher never fires. The mouse path already solved this with a press-edge latch; modifiers lacked the equivalent.

Fix (mirrors the mouse latch; none of the 3 files are gold-master/prohibited):

- hid.rs: SUPER_TAP_COUNT atomic, incremented on the SUPER 0->1 rising edge at HID-report time (swap-based), plus a read-and-clear accessor; wakes the compositor on a Super edge. Lock-free, no logging on the path.
- graphics.rs: op=31 returns+clears the latched tap count; a keyboard-ready bit in compositor_ready_bits so a tap wakes compositor_wait.
- bwm.rs: drains the latch every frame and drives SUPER multi-tap from latched press-edges (combo semantics + 400ms window + cooldown preserved; a single tap cannot read as a double).

Validated via the launcher harness: drop rate ~22% -> ~9% (10/11 injected runs opened the launcher), no regressions, no spurious launches, injection load-independent. The residual ~9% showed zero guest HID activity post-injection (a host injection-delivery miss, not the latch) -- separate, host-side.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Run 3 of a validation batch hit a false readiness timeout (and leaked a VM) because concurrent serial writers split the one-shot marker mid-line ("[in[bwm] hotkeys: using built-TELNETD_STARTING"). Match EITHER the hotkeys-defaults line OR the recurring [bwm-fps] compositing line (printed ~180x/s once the desktop is live, so a clean instance appears within ms), via grep -aE. Removes the harness's own flaky failure mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
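A small sketch of the either-marker readiness check. The commit uses `grep -aE` alternation; this variant instead uses `-F` with two `-e` patterns so the brackets need no escaping. `bwm_ready` is a hypothetical name, and the full text of the one-shot hotkeys marker is not given in the source, so it is passed in as an argument rather than guessed.

```shell
# bwm_ready LOG ONE_SHOT_MARKER
# Succeeds if the log contains EITHER the intact one-shot marker
# OR any recurring "[bwm-fps]" line.
bwm_ready() {
  local log="$1" marker="$2"
  # -a: treat the log as text; -q: status only;
  # -F with two -e patterns matches either literal string.
  grep -aqF -e "$marker" -e "[bwm-fps]" "$log"
}
```

If an interleaved write splits the one-shot marker, the first pattern simply fails to match and the poll keeps waiting until a clean `[bwm-fps]` line arrives.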

ryanbreen and others added 13 commits June 1, 2026 21:16

ryanbreen wants to merge 13 commits into main from feat/parallels-launcher-test-harness
