Add opt-in human behavior simulation and OS input path by quantsquirrel · Pull Request #396 · browser-use/browser-harness

quantsquirrel · 2026-05-31T13:59:52Z

Summary

Adds an opt-in human-behavior simulation layer for browser-harness:

- human-like mouse trajectories, timing, tremor, Fitts'-law movement, hover/dwell, typing, scrolling, and wait pacing
- daemon-side batched input dispatch for ~60Hz pointer delivery without per-event IPC overhead
- removal of the default Runtime.enable call to avoid the console-serialization CDP tell, with a BH_CDP_ENABLE_RUNTIME=1 escape hatch
- live self-test probes for T1 (getCoalescedEvents), T2 (screenX == clientX), event rate, and isTrusted, plus validation notes
- a macOS OS-input path (human_move_os, human_click_os) that uses Quartz CGEvents for rare detection-sensitive clicks
- a Retina viewport fix: use cssLayoutViewport (CSS px) rather than the device-pixel layoutViewport
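The Fitts'-law timing and asymmetric velocity profile mentioned above can be sketched as follows. This is a minimal illustration, assuming the Shannon formulation MT = a + b·log2(D/W + 1) and using the CDF of Beta(2,3) as the path-progress curve; the constants `a_ms`, `b_ms`, the default width, and all function names are hypothetical, not the tuned values in agent_helpers.py, and the sketch omits tremor, Bezier curvature, and overshoot.

```python
import math


def fitts_movement_time_ms(distance_px: float, width_px: float = 40.0,
                           a_ms: float = 100.0, b_ms: float = 150.0) -> float:
    """Shannon form of Fitts' law: MT = a + b * log2(D/W + 1).

    a_ms, b_ms and the default target width are illustrative constants.
    """
    if distance_px <= 0:
        return a_ms
    return a_ms + b_ms * math.log2(distance_px / width_px + 1.0)


def beta23_progress(u: float) -> float:
    """Fraction of the path covered at normalized time u in [0, 1].

    This is the CDF of Beta(2, 3); its density 12*u*(1-u)**2 peaks at u = 1/3,
    so velocity rises quickly and decays slowly (unlike symmetric smoothstep).
    """
    u = min(max(u, 0.0), 1.0)
    return 6 * u**2 - 8 * u**3 + 3 * u**4


def trajectory(x0, y0, x1, y1, step_ms=16.0, width_px=40.0):
    """Straight-line points sampled every step_ms (~60Hz) along the Beta(2,3) profile."""
    dist = math.hypot(x1 - x0, y1 - y0)
    total_ms = fitts_movement_time_ms(dist, width_px)
    n = max(1, round(total_ms / step_ms))
    points = []
    for i in range(1, n + 1):
        p = beta23_progress(i / n)
        points.append((x0 + (x1 - x0) * p, y0 + (y1 - y0) * p))
    return points
```

Because beta23_progress(0.5) is 0.6875, roughly 69% of the distance is covered in the first half of the movement time, which is the fast-start, long-deceleration shape a symmetric profile cannot produce.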

Live validation

On local Chrome 148 / macOS:

- os_calibrate: error_px [0.0, 0]
- os_selftest(stress=True): repeated runs gave coalesced max 2 / 3 / 2, so the OS-input path closes the T1 getCoalescedEvents() tell
- CDP under the same renderer stress: coalesced max 1, confirming that the CDP path remains uncoalesced
- Only one display was active; multi-monitor mapping is guarded and unit-tested, but not live-tested on multi-monitor hardware

Tests

Validated on the feature branch, and again in a temporary worktree after merging the latest origin/main into it:

- `python3 tests/unit/test_human_behavior.py` → 33/33 passed
- `python3 tests/unit/test_daemon_input_sequence.py` → 3/3 passed
- `uv run --with pytest pytest tests/unit/test_admin.py tests/unit/test_helpers.py tests/unit/test_daemon.py tests/unit/test_run.py -q` → 81 passed after merging current origin/main
- `python3 -m py_compile agent-workspace/agent_helpers.py src/browser_harness/daemon.py src/browser_harness/helpers.py` → clean

Notes

- OS-input mode is intentionally opt-in: it foregrounds the browser and moves the physical cursor.
- os_selftest(stress=True) is the deterministic proof mode. Unstressed Chrome can legitimately report getCoalescedEvents().length == 1 even for real OS moves.

Summary by cubic

Adds an opt-in human behavior simulation layer to browser-harness with human-like mouse, typing, and scrolling input, plus a macOS OS-input path for detection-sensitive clicks. Improves realism, raises event cadence to ~60Hz via server-side dispatch, and no longer enables the CDP Runtime domain by default, to reduce detectability.

New Features
- Human-like input: Fitts-timed mouse moves with tremor/overshoot, correct key codes with a non-zero key hold, and discrete scroll detents.
- Server-side batched dispatch (~60Hz) with automatic client fallback and resume-on-failure.
- macOS OS-input mode: human_move_os, human_click_os, os_selftest, os_calibrate for detection-sensitive actions.
- Built-in self-tests (T1/T2/rate/isTrusted) to validate behavior on your Chrome.
Migration
- Opt-in only; no changes required for existing flows.
- OS mode needs macOS, pyobjc-framework-Quartz, and Accessibility permission; it foregrounds the browser and moves the real cursor.
- CDP Runtime.enable is now omitted by default; set BH_CDP_ENABLE_RUNTIME=1 if you need Runtime events.

Written for commit 306155f. Summary will update on new commits.

Rewrite the human-behavior simulation layer (agent-workspace/) with correctness and detection-realism fixes from a multi-lens review.

- typing: the default semantic mode now emits correct virtual-key codes (_vk_for_char) and a non-zero key hold; it no longer routes through the core press_key, which emitted 0ms holds and VK_NUMPAD codes for letters
- tremor: exact OU discretization, dt tied to the real per-event interval, amplitude recalibrated to ~0.8px (inside the 0.3-1.2px human band), anisotropic 2:1 axes
- motion: asymmetric ballistic velocity (Beta(2,3)) replacing a symmetric smoothstep; Fitts'-law movement time (optional target width); overshoot and correction on long moves
- scroll: cursor anchored before the wheel event; discrete detent multiples (wheel)
- idle: bounded cursor drift during human_wait (anchored, <= ~15px)
- click: <= 1px release micro-drift (clamped in-viewport); teleport invariant preserved (press == final move)
- session: cursor, click bias, and tremor orientation persist across -c calls via a per-BU_NAME atomic state file
- hardening: underscore-private config tables/class (no namespace leak), narrowed _viewport except, physical-typing dd >= hold constraint
- docs: STALE banners on the two design/review drafts; add HUMAN_SIM_VALIDATION.md as the authoritative validation artifact

Known ceilings documented in-module (not fixable in this layer): event rate ~20-40Hz (per-call IPC), getCoalescedEvents().length == 1 / no PointerEvent stream, CDP-presence detectability.

Adds tests/unit/test_human_behavior.py (17 hermetic tests, no browser). Reviewed in a separate lane (APPROVE; 0 regressions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
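The "exact OU discretization" with anisotropic axes can be sketched like this. It is a minimal illustration, assuming the standard Ornstein-Uhlenbeck transition x' = x·e^(-θ·dt) + σ·sqrt(1 - e^(-2θ·dt))·N(0,1), where σ is the stationary standard deviation; `OUTremor`, `theta`, and the parameter defaults are hypothetical, not the module's calibrated values. Because the update is exact, the offset statistics stay the same whatever real per-event interval `dt` is passed in.

```python
import math
import random


class OUTremor:
    """Anisotropic Ornstein-Uhlenbeck tremor offset, in pixels (illustrative)."""

    def __init__(self, sigma_major=0.8, ratio=2.0, theta=8.0, angle_rad=0.0, rng=None):
        # Stationary standard deviation per axis: major and minor (2:1 by default).
        self.sigmas = (sigma_major, sigma_major / ratio)
        self.theta = theta  # mean-reversion rate, 1/s
        # Orientation of the tremor ellipse (could be persisted per session).
        self.cos, self.sin = math.cos(angle_rad), math.sin(angle_rad)
        self.rng = rng if rng is not None else random.Random()
        self.state = [0.0, 0.0]  # offset in the tremor's own axes

    def step(self, dt: float) -> tuple:
        """Advance by dt seconds (the real interval since the previous event)."""
        decay = math.exp(-self.theta * dt)
        noise_scale = math.sqrt(1.0 - decay * decay)
        for i, sigma in enumerate(self.sigmas):
            self.state[i] = self.state[i] * decay + sigma * noise_scale * self.rng.gauss(0.0, 1.0)
        u, v = self.state
        # Rotate from tremor axes into screen axes.
        return (u * self.cos - v * self.sin, u * self.sin + v * self.cos)
```

An Euler step would instead make the amplitude depend on the event interval, which is why tying `dt` to the real cadence only works cleanly with the exact form.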

… dispatch + Runtime.enable drop

Tackles the three documented behavioral-detection ceilings with what is actually fixable (researched against Chromium source and the fingerprinting literature), and documents honestly what is not.

Event RATE (was ~28Hz), FIXED:
- daemon: a new `meta:"input_sequence"` handler dispatches a precomputed event list server-side over the persistent CDP WebSocket, sleeping delay_ms before each event. This decouples the rate from the per-call client<->daemon IPC round trip.
- helpers: `_send(timeout, raise_on_error)` + `dispatch_input_sequence()` (timeout sized to sum(delay) plus per-event slack).
- agent_helpers: human_move/human_click/human_scroll now build one event batch and emit it via `_emit()`; move_step_ms lowered to ~16ms (~60Hz). A mid-batch failure resumes the remainder client-side (resume-from-count: the already-dispatched prefix is never re-sent, so nothing double-fires); daemons that predate batching fall back to the client path automatically.

CDP presence, mitigated:
- The daemon omits Runtime.enable by default (`_enabled_domains()`), removing the console-serialization detection class. Nothing consumes Runtime events and Runtime.evaluate works without enable; BH_CDP_ENABLE_RUNTIME=1 restores it.

Documented as NOT fixable in software (would need a patched Chromium):
- getCoalescedEvents() never coalesces (length stays at 1): CDP injects via ForwardMouseEvent, bypassing the compositor coalescing queue. This is also why the target is ~60Hz rather than higher: extra uncoalesced events look more anomalous.
- screenX == clientX: CDP sets no window/desktop offset (Cloudflare Turnstile checks this), and the offset is not settable via CDP.

Corrected: pressure 0/0.5, tilt 0, and pointerType "mouse" are spec-correct for a real mouse, NOT a bot tell (the earlier overstatement was removed).

Tests: tests/unit/test_daemon_input_sequence.py (3, hermetic via a cdp_use stub); test_human_behavior.py grows to 22 (batch dispatch, ~60Hz rate, single-batch click invariant, fallback, resume-from-count). All 25 pass; py_compile clean.

Reviewed in a separate lane across two passes (APPROVE; the partial-failure double-dispatch found in pass 1 is fixed and regression-tested).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
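The batch-then-resume behaviour can be sketched as a small client-side function. This is a minimal illustration under stated assumptions: `InputEvent`, `batch_timeout_s`, `emit`, the slack constants, and the convention that `send_batch` returns `None` for a daemon without `meta:"input_sequence"` (and otherwise a `{"ok": bool, "count": int}` dict) are all hypothetical stand-ins for `dispatch_input_sequence()` and `_emit()`. Batch-call timeouts are not handled here.

```python
import time
from dataclasses import dataclass, field


@dataclass
class InputEvent:
    delay_ms: float                              # sleep before dispatching this event
    params: dict = field(default_factory=dict)   # e.g. Input.dispatchMouseEvent params


def batch_timeout_s(events, per_event_slack_s=0.05, base_s=2.0):
    """Timeout sized to the summed delays plus per-event slack (constants illustrative)."""
    return base_s + sum(e.delay_ms for e in events) / 1000.0 + per_event_slack_s * len(events)


def emit(events, send_batch, send_one, sleep=time.sleep):
    """Try server-side dispatch; resume only the undelivered tail client-side.

    Returns how many events had to be sent client-side.
    """
    result = send_batch(events, batch_timeout_s(events))
    # None means the daemon predates batching (hypothetical convention):
    # fall back to the client path from the first event.
    done = 0 if result is None else int(result.get("count", 0))
    if result is not None and result.get("ok") and done >= len(events):
        return 0
    # Resume from `done`; re-sending the dispatched prefix would double-fire
    # a press or release.
    tail = events[done:]
    for ev in tail:
        sleep(ev.delay_ms / 1000.0)
        send_one(ev.params)
    return len(tail)
```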

…on record

Turns the residual-CDP-tell question from speculation into measurement, and records the no-fork decision from the 6-lens investigation.

- agent_helpers: human_selftest() instruments the live page (transparent full-viewport overlay, clicks swallowed) while driving real human_* input, then reports for THIS Chrome: T2 screenX-vs-clientX delta, T1 getCoalescedEvents length, delivered pointer rate (~60Hz fast path / ~30Hz fallback), and isTrusted. chrome_version() reads the major version from the UA (no Runtime.enable needed). Both are exported; _eval stays private.
- CEILING_DECISIONS.md: research conclusions, the decision, and a phased plan.

Key findings:
- T2 (screenX == clientX) is already fixed upstream in Chrome >= 142 (crbug 40280325); verify, don't assume.
- T1 (getCoalescedEvents empty) has zero confirmed production deployments: theoretical, not shipped.
- Frida (breaks the macOS hardened runtime / keychain) and a Chromium fork (profile conflict + 4-week rebase) are both REJECTED for a personal tool.
- T1's only real fix is a sparingly used OS-injection (CGEvent) mode, left on the shelf until a confirmed target is shown to check it.

Tests: +2 selftest verdict-logic tests (exposed vs fixed-Chrome canned data); test_human_behavior.py now has 24, daemon 3. py_compile clean; load path verified (helpers auto-loads, human_selftest/chrome_version exported).

NOTE: the selftest verdict logic is unit-tested, but its LIVE behavior on real Chrome (whether CDP fires pointermove on the overlay, what getCoalescedEvents returns) is exactly what the tool is meant to reveal on first run, and has not yet been verified here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
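The verdict logic tested against "exposed vs fixed-Chrome canned data" might look like the following. This is a minimal sketch: `selftest_verdict`, the sample field names, and the "< 1px means screenX == clientX" threshold are assumptions, not human_selftest()'s actual implementation.

```python
def selftest_verdict(moves):
    """Classify pointermove samples captured by an overlay (live or canned).

    Each sample: {"clientX": float, "screenX": float, "coalesced": int, "isTrusted": bool},
    where "coalesced" is event.getCoalescedEvents().length.
    """
    if not moves:
        raise ValueError("no pointermove samples captured")
    screen_delta = max(abs(m["screenX"] - m["clientX"]) for m in moves)
    coalesced_max = max(m["coalesced"] for m in moves)
    return {
        "T2_exposed": screen_delta < 1,   # screenX == clientX: no window offset
        "T1_exposed": coalesced_max <= 1,  # nothing was ever coalesced
        "isTrusted": all(m["isTrusted"] for m in moves),
        "screen_delta_px": screen_delta,
        "coalesced_max": coalesced_max,
    }
```

Canned samples shaped like the Chrome 148 CDP run (a 121px offset, coalesced length 1) yield T2 not exposed and T1 exposed.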

Ran human_selftest() against the live machine's real Chrome 148.0.7778.181:

- T2 (screenX == clientX): NOT exposed; screen/client delta 121px (upstream fix live).
- T1 (getCoalescedEvents): EXPOSED; max 1 (CDP coalescing bypass confirmed on 148).
- isTrusted true; delivered pointer rate 41Hz with the server-side fast path verified active (dispatch_input_sequence returned {ok, count: 2} on a fresh daemon against real Chrome, so the daemon batch handler works end to end).

This confirms the no-fork decision empirically: the only exposed tell (T1) is the one with zero production deployments. Phases 1/2 not triggered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rate, robust click capture

Live runs surfaced two selftest flaws (the verdict logic was always correct; these are diagnostic-quality issues):

- The rate metric swung 19-41Hz from run to run: (n-1)/span counted the large gaps BETWEEN the move/move/click trajectories. Switched to the MEDIAN inter-move interval, which reports the true per-event cadence: now a stable ~48-56Hz (server-side fast path), matching the verified daemon dispatch.
- clicks_captured was intermittently 0, which looked like a bug. A catch-all probe verified that human_click DOES fire a full, correct event chain (pointerdown/mousedown/pointerup/mouseup/click), so it really clicks. The selftest's single-press capture can simply miss the read window, so it now captures pointerdown AND mousedown, retries the read briefly, and, decisively, derives the T1/T2/rate/isTrusted verdict from the deterministic MOVE stream only. Clicks are now labelled best-effort and never gate the verdict.

Recorded the live Chrome 148 measurement update in CEILING_DECISIONS.md.

Tests: 24/24 (verdict logic unchanged; the canned signal lives in the move stream); py_compile clean; verified across repeated live runs (T2 OK with delta 121px, T1 EXPOSED with coalesced <= 1, rate ~50Hz, isTrusted true, consistent regardless of click capture).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
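The difference between the two rate metrics is easy to see on synthetic timestamps. The sketch below uses hypothetical function names and illustrative numbers (three 10-event trajectories at 20ms spacing, separated by 500ms pauses), not data from the live runs.

```python
import statistics


def rate_span_hz(times_ms):
    """(n-1)/span: diluted by the idle gaps between separate trajectories."""
    span = times_ms[-1] - times_ms[0]
    return (len(times_ms) - 1) / span * 1000.0


def rate_median_hz(times_ms):
    """Per-event cadence from the median inter-move interval."""
    gaps = [b - a for a, b in zip(times_ms, times_ms[1:])]
    return 1000.0 / statistics.median(gaps)


# Three trajectories of 10 moves each, 20ms apart, with 500ms pauses between them.
times = [k * 680 + i * 20 for k in range(3) for i in range(10)]
```

Here rate_median_hz(times) gives 50.0, while rate_span_hz(times) reports under 19Hz because the two pauses dominate the span; the median ignores them as long as within-trajectory gaps are the majority.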

…o close the T1 coalesced tell

Real Quartz CGEvents traverse the full HID -> compositor -> renderer pipeline, so the page sees genuinely coalesced pointer events, a correct screenX, and isTrusted: the one thing CDP Input.dispatchMouseEvent provably cannot provide (it bypasses the compositor coalescing queue). Opt-in, macOS-only, with a lazy pyobjc import (the core stays pure stdlib).

API: human_click_os(x, y[, button, app_name]) / human_move_os(x, y) / os_selftest() / os_calibrate(). Reuses the Fitts/Bezier/tremor trajectory, posting real moves at ~125Hz so that Chrome's compositor coalesces them.

Safety (this posts REAL clicks on the live desktop, so it uses a three-layer guard and never acts blind):
- frontmost check: foregrounds BH_BROWSER_APP (default "Google Chrome") and refuses if the wrong app is frontmost (Brave/Edge users pass app_name or set the env var).
- display-bounds check: refuses if the mapped global point is off all displays (CGGetActiveDisplayList/CGDisplayBounds; handles negative multi-monitor origins).
- cursor-arrival check: posts the move, reads back the real cursor, and refuses if it didn't reach the target (i.e. Accessibility not granted) instead of silently doing nothing.

The click sets kCGMouseEventClickState=1 (otherwise clickState 0 may not register, or exposes MouseEvent.detail === 0).

Validated: 30 hermetic tests (mocked Quartz: capability, client->screen mapping, full move+down+up sequence with clickState, off-screen refusal, Accessibility-denied refusal). os_calibrate() run LIVE returned error_px [0.0, 0]: the client->screen mapping matches the browser's reported screenX/screenY EXACTLY on the primary display, so OS clicks land where intended (validated WITHOUT moving the cursor). Two adversarial review passes (APPROVE).

NOT yet run live: the CGEvent path needs `pip install pyobjc-framework-Quartz` in the env plus Accessibility permission; os_selftest() then measures whether getCoalescedEvents().length > 1 actually results. Multi-monitor mapping is unvalidated (os_calibrate covered the primary display only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
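The client->screen mapping and display-bounds guard can be sketched as pure geometry. This is a minimal illustration, assuming page zoom is 100% (so CSS px equal macOS points), that the viewport's screen origin comes from a probe event as (screenX - clientX, screenY - clientY), and that display rectangles come from CGDisplayBounds; `Rect`, `client_to_global`, `on_any_display`, and `checked_target` are hypothetical names, and no Quartz calls are made here.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """A display rectangle in global coordinates (origin may be negative)."""
    x: float
    y: float
    w: float
    h: float


def client_to_global(cx, cy, content_origin):
    """Map CSS-px client coordinates to global screen points (assumes 100% zoom)."""
    ox, oy = content_origin
    return ox + cx, oy + cy


def on_any_display(px, py, displays):
    """True if the point lies inside at least one display (right/bottom edges exclusive)."""
    return any(d.x <= px < d.x + d.w and d.y <= py < d.y + d.h for d in displays)


def checked_target(cx, cy, content_origin, displays):
    """Refuse, rather than post a click, when the mapped point is off every display."""
    gx, gy = client_to_global(cx, cy, content_origin)
    if not on_any_display(gx, gy, displays):
        raise ValueError(f"({gx}, {gy}) is off all displays; refusing to post")
    return gx, gy
```

A monitor placed to the left of the primary display has a negative x origin, for example Rect(-1920, 0, 1920, 1080), and the half-open bounds test handles it without special cases.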

… OS injection

Running os_selftest() live (pyobjc installed, Accessibility granted) drove real CGEvent moves, and the page reported getCoalescedEvents() max = 2 (> 1) with a real screenX offset (delta 164px). T1 (getCoalescedEvents empty) is GENUINELY CLOSED via the OS-injection path, the one tell CDP Input.dispatchMouseEvent provably cannot fix.

The live run also exposed a latent Retina bug: _viewport read layoutViewport, which is in DEVICE px (2x at devicePixelRatio 2), while CDP/CGEvent coordinates are CSS px. Every fractional target (e.g. 0.65*w) was therefore 2x too large and mapped off-screen, and the OS display-bounds guard correctly refused. Fixed _viewport to prefer cssLayoutViewport (CSS px), falling back to layoutViewport for older Chrome. This also corrects _clamp and cursor init on any dpr > 1 display (a latent bug that affected the CDP path too).

Tests: +1 Retina regression (test_viewport_uses_css_pixels_on_retina: the _FakeQuartz cdp now returns both cssLayoutViewport=1200x800 and layoutViewport=2400x1600, and the test asserts the CSS one is used). 31/31 + daemon 3. py_compile clean. os_calibrate live still [0.0, 0].

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
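The viewport fix amounts to preferring one field of the CDP Page.getLayoutMetrics result over another. The sketch below assumes that result is already available as a dict; `viewport_css_size` and `clamp_to_viewport` are hypothetical names, not the module's `_viewport` / `_clamp`.

```python
def viewport_css_size(metrics: dict) -> tuple:
    """Viewport (width, height) in CSS px from a Page.getLayoutMetrics result.

    cssLayoutViewport is in CSS pixels; the deprecated layoutViewport is in
    device pixels on HiDPI screens, so it is used only when the CSS field is absent.
    """
    vp = metrics.get("cssLayoutViewport") or metrics["layoutViewport"]
    return vp["clientWidth"], vp["clientHeight"]


def clamp_to_viewport(x, y, metrics, margin=1.0):
    """Keep a target inside the CSS-px viewport, `margin` px from each edge."""
    w, h = viewport_css_size(metrics)
    return min(max(x, margin), w - margin), min(max(y, margin), h - margin)
```

With the regression test's Retina metrics (cssLayoutViewport 1200x800, layoutViewport 2400x1600), a 0.65*w target lands at x = 780 CSS px instead of 1560.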

The live unstressed OS selftest was flaky on Chrome 148: real Quartz moves can legitimately report getCoalescedEvents().length == 1 when the renderer is idle. The proof now adds transient renderer stress so that the compositor-path difference is measured deterministically, while the actual OS movement model stays human-scale.

Constraint: Chrome 148 on a single-display Retina Mac produced unstressed OS coalescing inconsistently.
Rejected: increase the OS move rate or burst CGEvents | live repeats stayed flaky and it changed the movement model.
Confidence: high
Scope-risk: narrow
Directive: Treat os_selftest(stress=True) as the T1 proof mode; do not claim unstressed Chrome always returns coalesced events.
Tested: python3 tests/unit/test_human_behavior.py; python3 tests/unit/test_daemon_input_sequence.py; uv run --with pytest pytest tests/unit/test_admin.py tests/unit/test_helpers.py tests/unit/test_daemon.py tests/unit/test_run.py -q; python3 -m py_compile agent-workspace/agent_helpers.py src/browser_harness/daemon.py src/browser_harness/helpers.py; live os_calibrate ok error_px [0.0,0]; live os_selftest stress repeats coalesced_len_max 2/3/3; live CDP stress coalesced_len_max 1.
Not-tested: live multi-monitor hardware; the current machine exposes only one active display.

The stress-mode coalescing proof can make CGEventPost completion observable before the WindowServer has settled the final cursor location. The OS path now re-posts the final landing point in a short settle loop before treating the run as an Accessibility failure.

Constraint: live os_selftest(stress=True) once read the cursor 31px from the target immediately after the trajectory completed.
Rejected: loosen the 4px verification threshold | that would hide wrong-monitor or Accessibility failures instead of waiting for real cursor arrival.
Confidence: high
Scope-risk: narrow
Directive: Keep the final-arrival check strict; add settling, not broad tolerance, when WindowServer timing is the issue.
Tested: python3 tests/unit/test_human_behavior.py; python3 tests/unit/test_daemon_input_sequence.py; uv run --with pytest pytest tests/unit/test_admin.py tests/unit/test_helpers.py tests/unit/test_daemon.py tests/unit/test_run.py -q; python3 -m py_compile agent-workspace/agent_helpers.py src/browser_harness/daemon.py src/browser_harness/helpers.py; live os_calibrate ok error_px [0.0,0]; live os_selftest stress repeats coalesced_len_max 2/3/2; live CDP stress coalesced_len_max 1.
Not-tested: live multi-monitor hardware; the current machine exposes only one active display.
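The settle loop can be sketched with the cursor I/O injected as callables, which keeps it testable without Quartz. This is a minimal illustration: `settle_at`, `post_move`, `read_cursor`, and the attempt/wait defaults are hypothetical (on macOS the callables would wrap posting a CGEvent move and reading the current cursor location); only the strict 4px threshold comes from this commit.

```python
import math
import time


def settle_at(target, post_move, read_cursor, tol_px=4.0, attempts=5,
              wait_s=0.01, sleep=time.sleep):
    """Re-post the final landing point until the cursor reads back within tol_px.

    The threshold stays strict; only the waiting is added. Raises RuntimeError
    if the cursor never arrives (e.g. Accessibility not granted).
    """
    tx, ty = target
    cx, cy = read_cursor()
    for _ in range(attempts):
        if math.hypot(cx - tx, cy - ty) <= tol_px:
            return cx, cy
        post_move(tx, ty)  # re-post the landing point; do not widen the tolerance
        sleep(wait_s)      # give the WindowServer time to settle
        cx, cy = read_cursor()
    if math.hypot(cx - tx, cy - ty) <= tol_px:
        return cx, cy
    raise RuntimeError(f"cursor at ({cx:.0f}, {cy:.0f}), expected ({tx:.0f}, {ty:.0f})")
```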

cubic-dev-ai

No issues found across 10 files


quantsquirrel and others added 9 commits May 29, 2026 05:46

cubic-dev-ai Bot reviewed May 31, 2026

quantsquirrel wants to merge 9 commits into browser-use:main from quantsquirrel:feat/human-behavior-sim
