Add opt-in human behavior simulation and OS input path #396
Open
quantsquirrel wants to merge 9 commits into
Rewrite the human-behavior-simulation layer (agent-workspace/) with correctness and detection-realism fixes from a multi-lens review.

- typing: default semantic mode now emits correct virtual-key codes (_vk_for_char) and a non-zero key hold; no longer routes through the core press_key (which emitted 0ms holds + VK_NUMPAD codes for letters)
- tremor: exact OU discretization, dt tied to the real per-event interval, amplitude re-calibrated to ~0.8px (inside the 0.3-1.2px human band), anisotropic 2:1 axes
- motion: asymmetric ballistic velocity (Beta(2,3)) replacing symmetric smoothstep; Fitts' Law movement time (optional target width); overshoot + correction on long moves
- scroll: cursor anchored before wheel; discrete detent multiples (wheel)
- idle: bounded cursor drift during human_wait (anchored, <=~15px)
- click: <=1px release micro-drift (clamped in-viewport); teleport invariant preserved (press == final move)
- session: cursor/click-bias/tremor-orientation persist across -c calls via a per-BU_NAME atomic state file
- hardening: underscore-private config tables/class (no namespace leak), narrowed _viewport except, physical-typing dd>=hold constraint
- docs: STALE banners on the two design/review drafts; add HUMAN_SIM_VALIDATION.md as the authoritative validation artifact

Known ceilings documented in-module (not fixable in this layer): event rate ~20-40Hz (per-call IPC), getCoalescedEvents().length==1 / no PointerEvent stream, CDP-presence detectability.

Adds tests/unit/test_human_behavior.py (17 hermetic tests, no browser). Reviewed in a separate lane (APPROVE; 0 regressions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
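To make the tremor bullet concrete, here is a minimal sketch of an exact Ornstein-Uhlenbeck update whose step size is the real inter-event interval. It is not the module's code: the names `ou_step` and `tremor_offsets` and the reversion rate `theta=8.0` are assumptions for illustration; only the ~0.8px amplitude and the 2:1 axis ratio come from the commit message.

```python
import math
import random


def ou_step(x, dt, theta, stationary_std, rng):
    """Exact one-step update of an Ornstein-Uhlenbeck coordinate over dt seconds."""
    decay = math.exp(-theta * dt)
    # Std of the exact transition density; valid for any dt, not only small ones.
    noise_std = stationary_std * math.sqrt(1.0 - decay * decay)
    return x * decay + noise_std * rng.gauss(0.0, 1.0)


def tremor_offsets(intervals_s, theta=8.0, amplitude_px=0.8, seed=0):
    """Return (major, minor) tremor offsets, one per real inter-event interval.

    theta is an assumed reversion rate; amplitude_px is the stationary std of the
    major axis, and the minor axis uses half of it (2:1 anisotropy).
    """
    rng = random.Random(seed)
    major = minor = 0.0
    offsets = []
    for dt in intervals_s:
        major = ou_step(major, dt, theta, amplitude_px, rng)
        minor = ou_step(minor, dt, theta, amplitude_px / 2.0, rng)
        offsets.append((major, minor))
    return offsets
```

Because the update is exact rather than an Euler step, the long-run spread stays at the configured amplitude whether events arrive every 16ms or every 35ms, which is the point of tying dt to the measured interval.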
… dispatch + Runtime.enable drop

Tackles the three documented behavioral-detection ceilings with what is actually fixable (researched against Chromium source + fingerprinting lit), and documents honestly what is not.

Event RATE (was ~28Hz), FIXED:
- daemon: new `meta:"input_sequence"` handler dispatches a precomputed event list server-side over the persistent CDP WS, sleeping delay_ms before each. Decouples the rate from the per-call client<->daemon IPC round-trip.
- helpers: `_send(timeout, raise_on_error)` + `dispatch_input_sequence()` (timeout sized to sum(delay) + per-event slack).
- agent_helpers: human_move/human_click/human_scroll now build one event batch and emit via `_emit()`; move_step_ms lowered to ~16ms (~60Hz). A mid-batch failure resumes the remainder client-side (resume-from-count, never re-sends the dispatched prefix, so no double-fire); pre-batch daemons fall back to the client path automatically.

CDP-presence, mitigated:
- daemon omits Runtime.enable by default (`_enabled_domains()`), removing the console-serialization detection class. Nothing consumes Runtime events and Runtime.evaluate works without enable; BH_CDP_ENABLE_RUNTIME=1 restores it.

Documented as NOT fixable in software (need a patched Chromium):
- getCoalescedEvents() stays empty: CDP injects via ForwardMouseEvent, bypassing the compositor coalescing queue (also why we target ~60Hz, not higher: extra uncoalesced events look more anomalous).
- screenX==clientX: CDP sets no window/desktop offset (Cloudflare Turnstile checks this); not settable via CDP.

Corrected: pressure 0/0.5, tilt 0, pointerType "mouse" are spec-correct for a real mouse and NOT a bot tell (earlier over-statement removed).

Tests: tests/unit/test_daemon_input_sequence.py (3, hermetic via cdp_use stub) + test_human_behavior.py grows to 22 (batch dispatch, ~60Hz rate, single-batch click invariant, fallback, resume-from-count). All 25 pass; py_compile clean.

Reviewed in a separate lane across two passes (APPROVE; the partial-failure double-dispatch found in pass 1 is fixed and regression-tested).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
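The resume-from-count behavior described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the PR's `_emit()`: the callables `send_batch` and `send_one`, the `{"ok", "count"}` result shape on failure, the use of `NotImplementedError` to signal a pre-batch daemon, and the 50ms per-event slack are all hypothetical.

```python
import time


def batch_timeout_s(events, per_event_slack_ms=50.0):
    """Timeout sized to the sum of delays plus a per-event slack (slack is assumed)."""
    total_ms = sum(e["delay_ms"] for e in events) + per_event_slack_ms * len(events)
    return total_ms / 1000.0


def emit(events, send_batch, send_one, sleep=time.sleep):
    """Try one server-side batch; on failure, resume after the dispatched prefix.

    send_batch(events, timeout) is assumed to return {"ok": bool, "count": int},
    where count is how many events were dispatched before any failure.
    send_one(event) is the per-call client path.
    """
    try:
        result = send_batch(events, batch_timeout_s(events))
    except NotImplementedError:
        # Assumed signal for a daemon without the batch handler: nothing was sent.
        result = {"ok": False, "count": 0}
    if result.get("ok"):
        return len(events)
    done = int(result.get("count", 0))
    # Never re-send events[:done]; replaying them would double-fire input.
    for event in events[done:]:
        sleep(event["delay_ms"] / 1000.0)
        send_one(event)
    return len(events)
```

Slicing at the reported count is what keeps a partial failure from dispatching the same press or move twice.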
…on record

Turns the residual-CDP-tell question from speculation into measurement, and records the no-fork decision from the 6-lens investigation.

- agent_helpers: human_selftest() instruments the live page (transparent full-viewport overlay, clicks swallowed) while driving real human_* input, then reports for THIS Chrome: T2 screenX-vs-clientX delta, T1 getCoalescedEvents length, delivered pointer rate (~60 fast / ~30 fallback), isTrusted. chrome_version() reads the major via UA (Runtime.enable-free). Both exported; _eval stays private.
- CEILING_DECISIONS.md: research conclusions + decision + phased plan.

Key findings: T2 (screenX==clientX) is already fixed upstream in Chrome >=142 (crbug 40280325): verify, don't assume. T1 (getCoalescedEvents empty) has zero confirmed production deployments: theoretical, not shipped. Frida (macOS hardened-runtime / keychain break) and a Chromium fork (profile conflict + 4-week rebase) are both REJECTED for a personal tool. T1's only real fix is a sparingly-used OS-injection (CGEvent) mode, left on the shelf until a confirmed target is shown to check it.

Tests: +2 selftest verdict-logic tests (exposed vs fixed-Chrome canned data); test_human_behavior.py now 24, daemon 3. py_compile clean; load-path verified (helpers auto-loads, human_selftest/chrome_version exported).

NOTE: selftest verdict logic is unit-tested; its LIVE behavior on real Chrome (whether CDP fires pointermove on the overlay, getCoalescedEvents values) is exactly what the tool is meant to reveal on first run and is not yet verified here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
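As a rough picture of what "verdict logic" over canned data might look like, the sketch below classifies T1 and T2 from a list of captured pointermove samples. The function name, the sample field names, and the exact thresholds are assumptions for illustration; the commit only states which quantities `human_selftest()` reports.

```python
def selftest_verdict(move_samples):
    """Summarize which tells are visible in captured pointermove samples.

    Each sample is assumed to be a dict with screenX, clientX, coalesced_len
    and isTrusted keys (hypothetical field names).
    """
    if not move_samples:
        return {"ok": False, "reason": "no pointermove events captured"}
    max_coalesced = max(s["coalesced_len"] for s in move_samples)
    max_delta = max(abs(s["screenX"] - s["clientX"]) for s in move_samples)
    return {
        "ok": True,
        # T1: every event reports at most one coalesced entry.
        "t1_exposed": max_coalesced <= 1,
        # T2: screen and client coordinates never differ.
        "t2_exposed": max_delta == 0,
        "is_trusted": all(s["isTrusted"] for s in move_samples),
        "coalesced_len_max": max_coalesced,
        "screen_client_delta_px": max_delta,
    }
```

Feeding it one "exposed" and one "fixed-Chrome" fixture mirrors the two canned cases the commit mentions.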
Ran human_selftest() against the live machine's real Chrome 148.0.7778.181:
- T2 (screenX==clientX): NOT exposed — screen/client delta 121px (upstream fix live).
- T1 (getCoalescedEvents): EXPOSED — max 1 (CDP coalescing bypass confirmed on 148).
- isTrusted true; delivered pointer rate 41Hz with the server-side fast path
verified active (dispatch_input_sequence returned {ok,count:2} on a fresh daemon
against real Chrome — the daemon batch handler works end-to-end).
Confirms the no-fork decision empirically: the only exposed tell (T1) is the one
with zero production deployment. Phases 1/2 not triggered.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rate, robust click capture

Live runs surfaced two selftest flaws (verdict logic was always correct; these are diagnostic-quality issues):

- Rate metric swung 19-41Hz run-to-run: (n-1)/span counted the large gaps BETWEEN the move/move/click trajectories. Switched to the MEDIAN inter-move interval, which reports the true per-event cadence: now a stable ~48-56Hz (server-side fast path), matching the verified daemon dispatch.
- clicks_captured was intermittently 0, which read like a bug. Verified via a catch-all probe that human_click DOES fire a full, correct event chain (pointerdown/mousedown/pointerup/mouseup/click), so it really clicks. The selftest's single-press capture can just miss the read window, so: capture pointerdown AND mousedown, retry-read briefly, and, decisively, derive the T1/T2/rate/isTrusted verdict from the deterministic MOVE stream only. Clicks are now labelled best-effort and never gate the verdict.

Recorded the live Chrome-148 measurement update in CEILING_DECISIONS.md.

Tests: 24/24 (verdict logic unchanged; canned signal lives in the move stream); py_compile clean; verified across repeated live runs (T2 OK delta 121px, T1 EXPOSED coalesced<=1, rate ~50Hz, isTrusted true; consistent regardless of click capture).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
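The difference between the two rate metrics is easy to see on synthetic timestamps. The sketch below is illustrative only; `span_rate_hz` and `median_rate_hz` are hypothetical names, and the timestamps are made up to mimic two trajectories separated by a pause.

```python
from statistics import median


def span_rate_hz(timestamps_ms):
    """(n-1)/span: pauses between trajectories drag this estimate down."""
    if len(timestamps_ms) < 2:
        return 0.0
    span_ms = timestamps_ms[-1] - timestamps_ms[0]
    return (len(timestamps_ms) - 1) * 1000.0 / span_ms


def median_rate_hz(timestamps_ms):
    """Rate from the median inter-event gap: robust to a few long pauses."""
    if len(timestamps_ms) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return 1000.0 / median(gaps)


# Two 5-event trajectories at 20ms spacing with a 400ms pause between them.
timestamps = [0, 20, 40, 60, 80, 480, 500, 520, 540, 560]
```

On this input the span-based estimate comes out near 16Hz while the median-based one is 50Hz, which matches the per-event cadence inside each trajectory.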
…o close the T1 coalesced tell

Real Quartz CGEvents traverse the full HID->compositor->renderer pipeline, so the page sees genuine coalesced pointer events + correct screenX + isTrusted: the one thing CDP Input.dispatchMouseEvent provably cannot do (it bypasses the compositor coalescing queue). Opt-in, macOS-only, lazy pyobjc import (core stays pure-stdlib).

API: human_click_os(x,y[,button,app_name]) / human_move_os(x,y) / os_selftest() / os_calibrate(). Reuses the Fitts/Bezier/tremor trajectory, posting real moves at ~125Hz so Chrome's compositor coalesces them.

Safety (this posts REAL clicks on the live desktop; three-layer guard, never blind):
- frontmost check: foregrounds BH_BROWSER_APP (default "Google Chrome") and refuses if the wrong app is frontmost (Brave/Edge users pass app_name / set the env).
- display-bounds check: refuses if the mapped global point is off all displays (CGGetActiveDisplayList/CGDisplayBounds; handles negative multi-monitor origins).
- cursor-arrival check: posts the move, reads back the real cursor, refuses if it didn't reach the target (= Accessibility not granted) instead of a silent no-op.
- click uses kCGMouseEventClickState=1 (else clickState 0 may not register / exposes MouseEvent.detail===0).

Validated: 30 hermetic tests (mocked Quartz: capability, client->screen mapping, full move+down+up sequence with clickState, off-screen refusal, Accessibility-denied refusal). os_calibrate() run LIVE returned error_px [0.0, 0]: the client->screen mapping matches the browser's reported screenX/screenY EXACTLY on the primary display, so OS clicks land where intended (validated WITHOUT moving the cursor). Two adversarial review passes (APPROVE).

NOT yet run live: the CGEvent path needs `pip install pyobjc-framework-Quartz` into the env + Accessibility granted; os_selftest() then measures whether getCoalescedEvents()>1 actually results. Multi-monitor mapping unvalidated (os_calibrate covered primary only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
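The display-bounds guard can be illustrated without Quartz. The sketch below works on plain rectangles in global coordinates; in the PR the rectangles come from CGGetActiveDisplayList/CGDisplayBounds, whereas here `Rect`, `on_any_display`, and `require_on_display` are hypothetical names and the display list is made up.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float       # global origin; can be negative for a display left of the primary
    y: float
    width: float
    height: float


def on_any_display(px, py, displays):
    """True if the global point lies inside at least one display rectangle."""
    return any(
        d.x <= px < d.x + d.width and d.y <= py < d.y + d.height
        for d in displays
    )


def require_on_display(px, py, displays):
    """Refuse (raise) rather than post an OS event to an off-screen point."""
    if not on_any_display(px, py, displays):
        raise ValueError(f"target ({px}, {py}) is off all displays; refusing to post")
    return px, py


# Assumed layout: a primary display plus a second one to its left.
displays = [Rect(0, 0, 1440, 900), Rect(-1920, 0, 1920, 1080)]
```

Comparing against each display's own origin and size, instead of assuming a single (0, 0)-anchored screen, is what lets negative origins pass the check.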
… OS injection

Running os_selftest() live (pyobjc installed + Accessibility granted) drove real CGEvent moves and the page reported getCoalescedEvents() max = 2 (>1) with a real screenX offset (delta 164px): T1 (getCoalescedEvents empty) is GENUINELY CLOSED via the OS-injection path, the one tell CDP Input.dispatchMouseEvent provably cannot fix.

The live run also exposed a latent Retina bug: _viewport read layoutViewport, which is DEVICE px (2x at devicePixelRatio 2); CDP/CGEvent coordinates are CSS px, so every fractional target (e.g. 0.65*w) was 2x too large and mapped off-screen, and the OS display-bounds guard correctly refused. Fixed _viewport to prefer cssLayoutViewport (CSS px), falling back to layoutViewport for older Chrome. This also corrects _clamp and cursor-init on any dpr>1 display (affected the CDP path too, latent).

Tests: +1 Retina regression (test_viewport_uses_css_pixels_on_retina; _FakeQuartz cdp now returns both cssLayoutViewport=1200x800 and layoutViewport=2400x1600, asserts the CSS one is used). 31/31 + daemon 3. py_compile clean. os_calibrate live still [0.0,0].

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
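The Retina fix amounts to choosing the CSS-pixel viewport from a `Page.getLayoutMetrics` result when it is present. The sketch below shows that preference on a plain dict shaped like the CDP response; `viewport_css_px` and `fractional_target` are hypothetical helper names, and the 1200x800 / 2400x1600 values are the ones quoted in the regression test above.

```python
def viewport_css_px(metrics):
    """Return (width, height), preferring cssLayoutViewport over layoutViewport.

    metrics is assumed to be the dict returned by CDP Page.getLayoutMetrics.
    Older Chrome builds may lack cssLayoutViewport, hence the fallback.
    """
    viewport = metrics.get("cssLayoutViewport") or metrics["layoutViewport"]
    return viewport["clientWidth"], viewport["clientHeight"]


def fractional_target(metrics, fx, fy):
    """Map viewport fractions (0..1) to a point in CSS pixels."""
    width, height = viewport_css_px(metrics)
    return fx * width, fy * height


retina_metrics = {
    "cssLayoutViewport": {"clientWidth": 1200, "clientHeight": 800},
    "layoutViewport": {"clientWidth": 2400, "clientHeight": 1600},
}
```

With the CSS viewport, a 0.65 horizontal fraction lands at about x=780; using the device-pixel viewport instead would give about 1560, outside the 1200px-wide page.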
The live no-stress OS selftest was flaky on Chrome 148: real Quartz moves can legitimately report getCoalescedEvents()==1 when the renderer is idle. The proof now adds transient renderer stress so the compositor path difference is measured deterministically, while the actual OS movement model stays human-scale.

Constraint: Chrome 148 on a single-display Retina Mac produced no-stress OS coalescing inconsistently.
Rejected: Increase OS move rate or burst CGEvents | live repeats stayed flaky and changed the movement model.
Confidence: high
Scope-risk: narrow
Directive: Treat os_selftest(stress=True) as the T1 proof mode; do not claim unstressed Chrome always returns coalesced events.
Tested: python3 tests/unit/test_human_behavior.py; python3 tests/unit/test_daemon_input_sequence.py; uv run --with pytest pytest tests/unit/test_admin.py tests/unit/test_helpers.py tests/unit/test_daemon.py tests/unit/test_run.py -q; python3 -m py_compile agent-workspace/agent_helpers.py src/browser_harness/daemon.py src/browser_harness/helpers.py; live os_calibrate ok error_px [0.0,0]; live os_selftest stress repeats coalesced_len_max 2/3/3; live CDP stress coalesced_len_max 1.
Not-tested: live multi-monitor hardware; current machine exposes one active display only.
The stress-mode coalescing proof can make CGEventPost completion observable before the WindowServer has settled the final cursor location. The OS path now re-posts the final landing point with a short settle loop before treating the run as an Accessibility failure.

Constraint: live os_selftest(stress=True) once read the cursor 31px from target immediately after trajectory completion.
Rejected: Loosen the 4px verification threshold | would hide wrong-monitor or Accessibility failures instead of waiting for real cursor arrival.
Confidence: high
Scope-risk: narrow
Directive: Keep the final-arrival check strict; add settling, not broad tolerance, when WindowServer timing is the issue.
Tested: python3 tests/unit/test_human_behavior.py; python3 tests/unit/test_daemon_input_sequence.py; uv run --with pytest pytest tests/unit/test_admin.py tests/unit/test_helpers.py tests/unit/test_daemon.py tests/unit/test_run.py -q; python3 -m py_compile agent-workspace/agent_helpers.py src/browser_harness/daemon.py src/browser_harness/helpers.py; live os_calibrate ok error_px [0.0,0]; live os_selftest stress repeats coalesced_len_max 2/3/2; live CDP stress coalesced_len_max 1.
Not-tested: live multi-monitor hardware; current machine exposes one active display only.
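A settle loop of the kind this commit describes might look like the sketch below: keep the 4px tolerance strict, but re-post the landing point and wait briefly before declaring failure. The callables `post_move` and `read_cursor`, the attempt count, and the pause length are assumptions for illustration; in the PR they would be backed by Quartz calls.

```python
import math
import time


def settle_on_target(target, post_move, read_cursor,
                     tolerance_px=4.0, attempts=5, pause_s=0.02, sleep=time.sleep):
    """Re-post the final point until the real cursor is within tolerance.

    Returns True once the cursor has arrived, or False if it never does
    (for example, because Accessibility is not granted).
    """
    def arrived():
        cx, cy = read_cursor()
        return math.hypot(cx - target[0], cy - target[1]) <= tolerance_px

    for _ in range(attempts):
        if arrived():
            return True
        post_move(target)   # re-post the landing point
        sleep(pause_s)      # give the window server time to settle
    return arrived()
```

The tolerance is unchanged from the strict check; only time is added, so a cursor that lands on the wrong monitor or does not move at all still fails.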
Summary

Adds an opt-in human-behavior simulation layer for `browser-harness`:

- `Runtime.enable` removal to avoid the console-serialization CDP tell, with a `BH_CDP_ENABLE_RUNTIME=1` escape hatch
- macOS OS-input path (`human_move_os`, `human_click_os`) using Quartz CGEvents for rare detection-sensitive clicks
- viewport read from `cssLayoutViewport` rather than device-pixel `layoutViewport`

Live validation

On local Chrome 148 / macOS:

- `os_calibrate`: `error_px [0.0, 0]`
- `os_selftest(stress=True)`: repeated coalesced max `2 / 3 / 2`, so the OS-input path closes the T1 `getCoalescedEvents()` tell
- CDP stress: coalesced max `1`, confirming the CDP path remains uncoalesced
- active displays: `1`; multi-monitor mapping is guarded and unit-tested, but not live-tested on multi-monitor hardware

Tests

Validated on the feature branch and again after merging into latest `origin/main` in a temporary worktree:

- `python3 tests/unit/test_human_behavior.py` → `33/33 passed`
- `python3 tests/unit/test_daemon_input_sequence.py` → `3/3 passed`
- `uv run --with pytest pytest tests/unit/test_admin.py tests/unit/test_helpers.py tests/unit/test_daemon.py tests/unit/test_run.py -q` → `81 passed` after merge with current `origin/main`
- `python3 -m py_compile agent-workspace/agent_helpers.py src/browser_harness/daemon.py src/browser_harness/helpers.py`

Notes

`os_selftest(stress=True)` is the deterministic proof mode. Unstressed Chrome can legitimately report `getCoalescedEvents()==1` even for real OS moves.

Summary by cubic
Adds an opt-in human behavior simulation layer to `browser-harness` with human-like mouse, typing, and scrolling, plus a macOS OS-input path for sensitive clicks. Improves realism and raises event cadence via ~60Hz server-side dispatch, and drops default CDP Runtime to reduce detectability.

New Features

- OS-input path: `human_move_os`, `human_click_os`, `os_selftest`, `os_calibrate` for detection-sensitive actions.

Migration

- The OS-input path requires macOS, `pyobjc-framework-Quartz`, and Accessibility; it foregrounds the browser and moves the real cursor.
- `Runtime.enable` is now omitted by default; set `BH_CDP_ENABLE_RUNTIME=1` if you need Runtime events.

Written for commit 306155f. Summary will update on new commits.