OS Screen Observer

A prototype that exposes the operating system's UI accessibility tree, textual descriptions, and ASCII spatial sketches through two simultaneous interfaces:

Web inspector (human-facing) — a browser-based dashboard at localhost:5001
MCP server (AI-facing) — a stdio MCP server compatible with Claude Desktop and Claude Code

Both interfaces share the same underlying observer and can run simultaneously.

REST API

OSScreenObserver exposes a full REST API on port 5001 (configurable). Most /api/* endpoints return JSON; /api/metrics returns text/plain (Prometheus format) and / returns HTML.

Security & Network Bind Defaults

The default bind host is 0.0.0.0 (all network interfaces) on port 5001. This default is intended for testing inside an isolated sandbox or container (e.g., a disposable VM or dev container), where exposing the API to the container network is convenient and safe.

The REST API has NO authentication. Any client that can reach the port can call every endpoint, including /api/action, which can click, type, and otherwise control the host desktop.

For local-only use on a workstation, override the bind to loopback:
# Command-line override (recommended for local dev)
python main.py --mode both --host 127.0.0.1
Or edit config.json:
{
  "web_ui": { "host": "127.0.0.1", "port": 5001 }
}
Do not expose the default 0.0.0.0 bind on a workstation connected to an untrusted network (home Wi-Fi, café, corporate LAN, public cloud VM) without a firewall, reverse proxy with authentication, or VPN in front of it.
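As an illustration of the loopback rule above, a wrapper script could check the configured host before starting the server. This is a minimal sketch using only the standard library; `is_loopback_bind` is a hypothetical helper and is not part of OSScreenObserver.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """Return True if binding to `host` keeps the API reachable from this machine only.

    Hypothetical helper for illustration; not part of OSScreenObserver.
    """
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (some other hostname): treat it as potentially exposed.
        return False


# 0.0.0.0 listens on every interface, so it is not loopback-only.
for candidate in ("127.0.0.1", "::1", "0.0.0.0"):
    print(candidate, is_loopback_bind(candidate))
```

A launcher could refuse to start, or print a warning, whenever this returns False and no firewall or authenticating proxy is known to be in place.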

CORS warning: The Flask server enables permissive CORS for all routes by default (CORS(app)). Any website running in the user's browser can send cross-origin requests to the API — including destructive /api/action calls. Restrict CORS origins or add an authentication/proxy layer before exposing the server to a multi-user environment.

Startup modes

python main.py                         # Default: auto — TTY → inspect (web UI + interactive setup); piped (Claude Desktop) → both
python main.py --mode both             # Force REST API + MCP stdio simultaneously
python main.py --mode inspect          # Force web UI only
python main.py --mode mcp              # Force MCP stdio only
python main.py --mock                  # Mock mode with synthetic data (no OS access)
python main.py --mock --scenario scenarios_examples/login.yaml  # Scenario-driven mock

Health check (poll until ready)

curl http://127.0.0.1:5001/api/healthz
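The same readiness poll can be scripted in Python. The sketch below assumes that a ready server answers /api/healthz with HTTP 200 (the status code is not stated above); `wait_until_ready` and `fetch_status` are illustrative names, not part of the project.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.status


def wait_until_ready(url, timeout=30.0, interval=0.5, fetch=fetch_status):
    """Poll `url` until it answers 200 or `timeout` seconds elapse.

    Assumption: HTTP 200 means "ready". `fetch` is injectable so the loop
    can be exercised without a running server.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # server not listening yet (or returned an error); retry
        time.sleep(interval)
    return False


# Usage (requires a running server):
# wait_until_ready("http://127.0.0.1:5001/api/healthz")
```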

Endpoint quick reference

Method	Endpoint	Description
`GET`	`/api/windows`	List visible top-level windows
`GET`	`/api/structure`	Full accessibility element tree (JSON)
`GET`	`/api/description`	Combined screen description (accessibility + OCR + VLM). The `mode` query parameter is accepted but ignored; the endpoint always returns combined output.
`GET`	`/api/sketch`	ASCII spatial layout diagram
`GET`	`/api/screenshot`	Base64-encoded PNG screenshot
`POST`	`/api/action`	Execute click, type, key, or scroll action
`GET`	`/api/capabilities`	Server capabilities and platform info
`GET`	`/api/healthz`	Health and uptime
`GET`	`/api/metrics`	Prometheus-format metrics (`text/plain`)

Example workflow

# 1. List windows
curl http://127.0.0.1:5001/api/windows

# 2. Get combined description of focused window (all available sources)
curl http://127.0.0.1:5001/api/description

# 3. Get element tree for precise coordinates
curl http://127.0.0.1:5001/api/structure

# 4. Click a button at coordinates
curl -X POST http://127.0.0.1:5001/api/action \
  -H "Content-Type: application/json" \
  -d '{"action": "click_at", "x": 480, "y": 300}'
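For callers that prefer Python over curl, the workflow above might look like the following standard-library sketch. The endpoint paths and the click_at payload are taken from the curl commands; the helper names are illustrative, and the shape of each JSON response is not assumed here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5001"


def get_json(path: str):
    """GET a JSON endpoint such as /api/windows and return the decoded body."""
    with urllib.request.urlopen(BASE_URL + path, timeout=10) as resp:
        return json.load(resp)


def build_action_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/action request with a JSON body."""
    return urllib.request.Request(
        BASE_URL + "/api/action",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def click_at(x: int, y: int):
    """Send the click_at action shown in step 4 and return the decoded reply."""
    request = build_action_request({"action": "click_at", "x": x, "y": y})
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


# Usage (requires a running server; click_at produces a real input event):
# windows = get_json("/api/windows")
# tree = get_json("/api/structure")
# click_at(480, 300)
```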

Full API reference

See screen_observer_api_reference.md for complete endpoint documentation, including the v2 agentic endpoints (snapshots, tracing, replay, scenarios, oracles, budgets, redaction). As noted above, /api/metrics returns plain text and / returns an HTML page rather than JSON; the reference document covers these exceptions.

LLM tool integration

The REST API endpoints map directly to the SCREEN_TOOLS OpenAI/OpenWebUI function-calling schema documented in screen_observer_api_reference.md. Any system that supports OpenAI-compatible tool use can integrate OSScreenObserver using these tool schemas.

Installation

Quick start — automated launchers

The fastest path is the platform launcher, which detects missing dependencies (Python, Tesseract, Ollama, wmctrl on Linux), asks before installing each one, sets up a .venv/, installs requirements.txt, and starts the server:

# Linux
./start.sh

# macOS
./start-mac.sh

# Windows  (Command Prompt or PowerShell)
start.bat

The scripts use winget on Windows, Homebrew on macOS, and the native package manager on Linux (apt / dnf / pacman / zypper). Skip any prompt to install manually later; the launcher will still bring up whatever is already working.

For a manual install, follow the steps below.

1. Python environment

python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux/WSL:
source .venv/bin/activate

pip install -r requirements.txt

2. Platform-specific setup

Windows (primary — full UIA support)

pip install pywinauto pywin32 psutil

macOS

# mss and pyautogui handle screenshots and actions.
pip install pyobjc          # enables full AX accessibility tree

Linux

# Window enumeration: install wmctrl (Debian/Ubuntu shown; use the
# equivalent for your distro: `dnf install wmctrl`, `pacman -S wmctrl`,
# `zypper install wmctrl`, etc.).
sudo apt install wmctrl
# If wmctrl is unavailable, the adapter falls back to python-xlib:
#   pip install python-xlib

# AT-SPI accessibility tree (optional but recommended):
# Install the system package first — the PyPI `pyatspi` wheel requires the
# underlying GObject/AT-SPI libraries to be present.
sudo apt install python3-pyatspi   # Debian/Ubuntu/WSL
pip install pyatspi

WSL (Windows Subsystem for Linux)

The server auto-detects WSL and uses PowerShell for screenshots and window listing when no X11 display is available. Set DISPLAY for X11 forwarding to also enable accessibility tools.

3. Description sources (optional, all platforms)

get_screen_description always runs in combined mode and returns every source that is available. The web inspector's Description tab shows which sources ran and how to enable any that are missing.

OCR (Tesseract)

# Windows:  download from https://github.com/tesseract-ocr/tesseract/releases
# macOS:    brew install tesseract
# Linux:    sudo apt install tesseract-ocr

pip install pytesseract

On Windows the Tesseract installer does not add the binary to PATH. Point the server at it in config.json:

{
  "ocr": {
    "enabled": true,
    // forward slashes work on Windows too:
    "tesseract_cmd": "C:/Program Files/Tesseract-OCR/tesseract.exe"
  }
}

If the JSON parser rejects the file (forgotten backslash escape) the server logs a [main:load_config] error and reports config_error at GET /api/healthz.
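The backslash pitfall can be checked before restarting the server. The sketch below uses Python's strict `json` module, so the test strings carry no `//` comments; whether the project's own loader tolerates comments is not assumed. `config_parse_error` is a hypothetical helper.

```python
import json


def config_parse_error(text: str):
    """Return a short error string if `text` is not valid JSON, else None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


# Forward slashes need no escaping.
forward = '{"ocr": {"tesseract_cmd": "C:/Program Files/Tesseract-OCR/tesseract.exe"}}'
# Backslashes are valid only when doubled.
escaped = r'{"ocr": {"tesseract_cmd": "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"}}'
# Single backslashes form invalid escapes such as \P and are rejected.
unescaped = r'{"ocr": {"tesseract_cmd": "C:\Program Files\Tesseract-OCR\tesseract.exe"}}'

for label, text in (("forward", forward), ("escaped", escaped), ("unescaped", unescaped)):
    print(label, config_parse_error(text))
```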

VLM (optional, all platforms)

The VLM modality is reached through any OpenAI-compatible chat-completions endpoint. Common setups:

Setup	`base_url`	notes
OpenWebUI	`http://localhost:3000`	fronts Ollama, Anthropic, OpenAI, etc.
Ollama direct	`http://localhost:11434`	use a vision model such as `qwen2.5vl:7b`, `llama3.2-vision`, or `minicpm-v`
OpenAI / LiteLLM / other	your endpoint URL	standard `/v1` path

OSScreenObserver automatically probes /api/v1/models first (OpenWebUI convention) and falls back to /v1/models (Ollama / OpenAI convention), so pointing base_url straight at Ollama works without any extra configuration.
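The probe order described above can be sketched as follows. This is an illustration of the fallback idea, not the project's actual code: `probe_models` and `http_get_json` are hypothetical names, and treating any network or decode error as "try the next path" is an assumption.

```python
import json
import urllib.error
import urllib.request

# Order taken from the text: OpenWebUI convention first, then Ollama / OpenAI.
MODEL_LIST_PATHS = ("/api/v1/models", "/v1/models")


def http_get_json(url: str):
    """GET `url` and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def probe_models(base_url: str, fetch=http_get_json):
    """Return (path, payload) for the first model-list path that responds, else None."""
    base = base_url.rstrip("/")
    for path in MODEL_LIST_PATHS:
        try:
            return path, fetch(base + path)
        except (urllib.error.URLError, OSError, ValueError):
            continue  # unreachable, HTTP error, or non-JSON body: try the next path
    return None


# Usage (requires a reachable endpoint):
# probe_models("http://localhost:11434")
```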

The VLM channel has two operating modes:

single — one screenshot + one prompt, optionally grounded with the accessibility tree, OCR text, ASCII sketch, and focused-element hint as <X>...</X> envelopes appended to the prompt. Cheap (one HTTP call) and back-compatible with prior versions.
multipass — a three-pass pipeline (scene → controls → next-actions) with an optional verify pass. Returns a structured JSON envelope with summary, app, screen_type, focused, controls, next_actions, modal_open, sensitive_regions, and per-pass timings. The envelope travels in the legacy description field as pretty-printed JSON and is also exposed parsed under the new vlm_structured field for callers that prefer not to re-parse.
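A caller that wants the multipass envelope as a dictionary can prefer vlm_structured and fall back to parsing description. The sketch below assumes the response is a JSON object carrying those two fields as described above; `extract_vlm_envelope` is a hypothetical helper.

```python
import json


def extract_vlm_envelope(response: dict):
    """Return the multipass envelope as a dict, or None if none is present.

    Prefers the parsed `vlm_structured` field; otherwise tries to parse the
    pretty-printed JSON carried in the legacy `description` field.
    """
    structured = response.get("vlm_structured")
    if isinstance(structured, dict):
        return structured
    try:
        parsed = json.loads(response.get("description") or "")
    except (TypeError, ValueError):
        return None  # single-mode output is free text, not JSON
    return parsed if isinstance(parsed, dict) else None


# Example with made-up values; only the field names come from the text above.
reply = {"description": json.dumps({"summary": "Login form", "modal_open": False}, indent=2)}
envelope = extract_vlm_envelope(reply)
print(envelope["summary"] if envelope else "no structured envelope")
```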

// config.json — Ollama direct, recommended starting configuration
"vlm": {
  "enabled":  true,
  "base_url": "http://localhost:11434",
  "api_key":  null,

  "model":         "qwen2.5vl:7b",       // primary (Pass 2 / single-shot)
  "model_fast":    "qwen2.5vl:3b",       // Pass 1 + per-widget crop labels
  "model_actions": null,                 // Pass 3 (no image); falls back to primary
  "model_verify":  null,                 // optional second opinion

  "mode":          "multipass",          // or "single" for legacy one-shot
  "output_format": "json",
  "max_tokens":    2000,
  "temperature":   0.1,

  "ground_with_tree":   true,            // inject <ACCESSIBILITY_TREE>
  "ground_with_ocr":    true,            // inject <OCR_TEXT>
  "ground_with_sketch": true,            // inject <ASCII_SKETCH> with tab badges
  "ground_with_focus":  true             // inject <FOCUSED_ELEMENT>
}

Recommended Ollama models (24 GB RTX 4090, 128 GB RAM):

Role	Model	Tag	~VRAM	Notes
Primary, best overall	Qwen2.5-VL 7B	`qwen2.5vl:7b`	~8 GB	SOTA small open VLM for UI/document tasks; strong at small fonts.
Primary, premium	Qwen2.5-VL 32B (Q4_K_M)	`qwen2.5vl:32b`	~20 GB	Top-tier reasoning; fits 24 GB at Q4; slower per image.
Different family (verify)	Llama 3.2 Vision 11B	`llama3.2-vision:11b`	~9 GB	Good `model_verify` pair for genuine second opinion.
OCR-heavy screens	MiniCPM-V 2.6	`minicpm-v:8b`	~7 GB	Excellent on dense text and forms.
Pass 1 / crop labels	Qwen2.5-VL 3B	`qwen2.5vl:3b`	~4 GB	Cheap scene tagging; reused for the ASCII renderer's crop labeller.
Pass 3 (text-only)	Qwen2.5 14B	`qwen2.5:14b`	~9 GB	Pass 3 has no image; a text-only LLM is faster than a VLM.

Set OLLAMA_KEEP_ALIVE=30m and OLLAMA_MAX_LOADED_MODELS=2 so the primary and fast models stay resident across multipass calls.

The first time you run python main.py --mode inspect with vlm.enabled=true and vlm.model=null, OSScreenObserver fetches the model list from the endpoint, shows a paginated picker, and saves your choice back to config.json. When mode="multipass", the picker also prompts for the optional model_fast, model_actions, and model_verify slots (skip any of them to reuse the primary). In mcp/both mode the picker is suppressed (stdin is owned by the MCP framing channel); set the model keys directly in config.json for non-interactive use.

Running

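Both interfaces

python main.py --mode both
# Plain "python main.py" selects this mode automatically only when stdin is piped (e.g. Claude Desktop).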
# Web inspector: http://127.0.0.1:5001
# MCP server:    stdin/stdout (for Claude Desktop)

Web inspector only

python main.py --mode inspect

MCP server only

python main.py --mode mcp

Mock mode (no OS access required — useful for development)

python main.py --mock
# or
python main.py --mock --mode inspect

Custom port

python main.py --port 8080

Web Inspector

Open http://127.0.0.1:5001 in a browser after starting with --mode inspect or --mode both.

Tab	Content
STRUCTURE	Interactive collapsible JSON tree of the accessibility element hierarchy
DESCRIPTION	Combined description from all available sources (accessibility tree, OCR, VLM). Each source is shown in its own labeled section. A badge row at the top shows which sources ran and how to enable any that are missing.
SKETCH	ASCII spatial layout diagram (Unicode box-drawing characters)
SCREENSHOT	Pixel screenshot, visible-area bounding boxes, and ASCII sketch (all in one panel)
ACTIONS	Click at coordinates, type text, press key combinations

The sidebar lists all visible windows. Click one to select it. All tabs update to reflect the selected window. Enable AUTO-REFRESH to poll every 3 seconds.

MCP Integration (Claude Desktop)

Add the following block to your Claude Desktop configuration (%APPDATA%\Claude\claude_desktop_config.json on Windows, ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "os-screen-observer": {
      "command": "python",
      "args": [
        "/absolute/path/to/screen_observer/main.py",
        "--mode", "both"
      ]
    }
  }
}

To run with mock data during development:

{
  "mcpServers": {
    "os-screen-observer": {
      "command": "python",
      "args": [
        "/absolute/path/to/screen_observer/main.py",
        "--mode", "both",
        "--mock"
      ]
    }
  }
}
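If the Claude Desktop configuration file already lists other servers, the entry can be merged in rather than pasted by hand. This is a small sketch of that merge; `add_server_entry` is a hypothetical helper, and the entry it writes mirrors the two JSON blocks above.

```python
import json


def add_server_entry(config: dict, main_py: str, mock: bool = False,
                     name: str = "os-screen-observer") -> dict:
    """Insert or replace the OSScreenObserver entry under "mcpServers"."""
    args = [main_py, "--mode", "both"]
    if mock:
        args.append("--mock")
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": "python", "args": args}
    return config


# Example: merge into a config that already has another (made-up) server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
merged = add_server_entry(existing, "/absolute/path/to/screen_observer/main.py", mock=True)
print(json.dumps(merged, indent=2))
```

In practice the dictionary would be read from claude_desktop_config.json with `json.load` and written back with `json.dump(..., indent=2)`.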

Restart Claude Desktop after editing the config. The server will appear in the tools menu. You can then ask Claude to:

"List the windows currently open on my desktop."
"Show me the accessibility tree for the focused window."
"Give me an ASCII sketch of the Notepad window layout."
"Describe what is on the screen using OCR."
"Click at coordinates 400, 300."
"Type 'hello world' into the focused field."

Available MCP Tools

Tool	Description
`list_windows`	Enumerate all visible top-level windows
`get_window_structure`	Full accessibility element tree as JSON
`get_screen_description`	Combined description from all available sources (accessibility tree + OCR + VLM). No mode parameter needed — returns everything the platform supports.
`get_screen_sketch`	ASCII spatial layout diagram
`get_screenshot`	Screenshot as base64 PNG
`get_full_screenshot`	Screenshot + ASCII sketch in one call (sketch uses OCR overlay)
`get_visible_areas`	Visible (non-occluded, on-screen) bounding boxes for a window
`bring_to_foreground`	Raise a window using the platform focus API; falls back to title-bar click
`click_at`	Click at pixel coordinates
`type_text`	Type text into the focused element
`press_key`	Press a key combination (e.g., `ctrl+c`, `alt+f4`)
`scroll`	Scroll the mouse wheel at an optional screen position
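For clients other than Claude Desktop, a tool invocation travels as a JSON-RPC 2.0 tools/call message on the server's stdin, one JSON object per line. The sketch below only builds such a message. The argument names `x` and `y` are an assumption borrowed from the REST click_at payload; the tool's real input schema should be read from tools/list.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tools_call(name: str, arguments=None) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as one newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    # json.dumps emits no raw newlines, so the message stays on a single line.
    return json.dumps(message) + "\n"


# Argument names below are assumed, not confirmed by the tool table.
print(tools_call("click_at", {"x": 400, "y": 300}), end="")
```

A real client must complete the initialize handshake before sending tools/call.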

REST API Reference

The web inspector exposes the following endpoints (all GET unless noted):

Endpoint	Params	Description
`GET /api/windows`	—	List all visible windows
`GET /api/structure`	`window_index`	Accessibility element tree (JSON)
`GET /api/description`	`window_index`	Combined description (accessibility + OCR + VLM, whatever is available). `mode` query parameter is accepted but ignored — always returns combined output.
`GET /api/sketch`	`window_index`, `grid_width`, `grid_height`, `ocr`	ASCII layout sketch
`GET /api/screenshot`	`window_index`	Screenshot as base64 PNG
`GET /api/full_screenshot`	`window_index`, `grid_width`, `grid_height`	Screenshot + ASCII sketch (sketch uses OCR overlay)
`GET /api/visible_areas`	`window_index` (required)	Visible non-occluded bounding boxes
`GET /api/bring_to_foreground`	`window_index` (required)	Click the title bar to raise the window
`POST /api/action`	JSON body `{action, …}`	Execute click / type / key / scroll

`GET /api/full_screenshot`

Returns a combined response so callers don't need two round-trips:

{
  "window":   "Untitled — Notepad",
  "format":   "png",
  "encoding": "base64",
  "width":    800,
  "height":   600,
  "data":     "<base64 PNG>",
  "sketch":   "┌── Window ──…"
}
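On the client side, the data field has to be base64-decoded before it can be saved or passed to an image library. A minimal sketch, assuming the response shape shown above; `decode_screenshot` is a hypothetical helper.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(payload: dict) -> bytes:
    """Return the raw PNG bytes from a screenshot response dict."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    raw = base64.b64decode(payload["data"])
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not look like a PNG")
    return raw


# Usage, given `payload` parsed from GET /api/full_screenshot:
# with open("window.png", "wb") as fh:
#     fh.write(decode_screenshot(payload))
# print(payload["sketch"])
```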

`GET /api/visible_areas`

Returns the portions of a window that are visible on screen — not covered by other windows and within the monitor area:

{
  "window": "Untitled — Notepad",
  "visible_regions": [
    {"x": 80, "y": 60, "width": 800, "height": 400},
    {"x": 80, "y": 500, "width": 400, "height": 160}
  ]
}

Each entry is a rectangle in absolute screen pixels. If the window is fully visible a single region covering the entire window is returned. If the window is fully off-screen or completely covered, the list is empty.
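One use of this response is to estimate how much of a window is actually visible before acting on it. The sketch below assumes the returned rectangles do not overlap, and that the caller knows the window's full size from another source; both are assumptions.

```python
def visible_area(regions) -> int:
    """Total pixel area of the visible rectangles (assumed non-overlapping)."""
    return sum(r["width"] * r["height"] for r in regions)


def visible_fraction(regions, window_width: int, window_height: int) -> float:
    """Fraction of the window that is visible, between 0.0 and 1.0."""
    total = window_width * window_height
    if total <= 0:
        return 0.0
    return visible_area(regions) / total


# Regions from the example response above.
regions = [
    {"x": 80, "y": 60, "width": 800, "height": 400},
    {"x": 80, "y": 500, "width": 400, "height": 160},
]
print(visible_area(regions))  # 800*400 + 400*160 = 384000
```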

`GET /api/bring_to_foreground`

Raises a window by clicking in its title-bar area. The server selects the top-most visible region of the window and clicks ~20 px below its top edge:

{
  "window":    "Untitled — Notepad",
  "success":   true,
  "action":    "click_at",
  "clicked_x": 960,
  "clicked_y": 80
}

window_index is required. If the window has no visible area the response contains "success": false with an explanatory error message — the click is not attempted in that case.
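The click-point selection described above might be sketched like this. Choosing the horizontal center of the region is an assumption (the text only specifies the vertical offset), and `title_bar_click_point` is a hypothetical name rather than the server's function.

```python
def title_bar_click_point(regions, offset: int = 20):
    """Pick a point about `offset` px below the top edge of the top-most visible region.

    Returns (x, y) in absolute screen pixels, or None when nothing is visible,
    in which case no click should be attempted.
    """
    if not regions:
        return None
    top = min(regions, key=lambda r: r["y"])  # smallest y is highest on screen
    x = top["x"] + top["width"] // 2          # assumption: horizontal center
    y = top["y"] + min(offset, max(top["height"] - 1, 0))  # stay inside short regions
    return x, y


print(title_bar_click_point([{"x": 80, "y": 60, "width": 800, "height": 400}]))
```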

Platform notes

Platform	Occlusion detection
Windows	Real Z-order via `win32gui`: a fully-covered window returns `success: false`
macOS / Linux	Z-order unavailable; the window is assumed to be on top, so the screen-clipped bounds are used. A fully-covered window may still produce a click that lands on the covering window.

Configuration Reference (`config.json`)

The following shows the built-in defaults (when no config.json is provided). The shipped config.json overrides web_ui.host to 127.0.0.1 for loopback-only access.

{
  "web_ui": {
    "host":  "0.0.0.0",     // bind address; use "127.0.0.1" for loopback-only
    "port":  5001,           // HTTP port
    "debug": false
  },
  "mcp": {
    "server_name": "os-screen-observer",
    "version":     "0.1.0"
  },
  "ocr": {
    "enabled":         true,
    "tesseract_cmd":   null,   // e.g. "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
    "min_confidence":  30      // 0–100; words below this threshold are discarded
  },
  "vlm": {
    "enabled":    false,       // OpenWebUI-compatible chat-completions endpoint
    "base_url":   "http://localhost:3000",
    "api_key":    null,         // or set $OWUI_API_KEY
    "model":      null,         // null → interactive picker on first --mode inspect run
    "max_tokens": 1500
  },
  "ascii_sketch": {
    "grid_width":  110,        // output width in characters
    "grid_height":  38,        // output height in characters
    "unicode_box": true        // false → plain ASCII +/-/| instead of ┌─┐│└┘
  },
  "tree": {
    "max_depth": 8             // maximum depth to traverse (Windows only)
  },
  "logging": {
    "level": "INFO"            // DEBUG / INFO / WARNING / ERROR
  },
  "mock":    false,            // force mock adapter regardless of platform
  "platform": "auto"          // "auto" | "Windows" | "Darwin" | "Linux"
}
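To see how a partial config.json interacts with the built-in defaults, it can help to think of the result as a recursive merge. The sketch below illustrates that idea only; whether the server merges nested sections this way is an assumption, and `deep_merge` is a hypothetical helper.

```python
import copy


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a new dict where `overrides` replaces matching keys in `defaults`.

    Nested dicts are merged key by key; any other value is replaced wholesale.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A subset of the defaults listed above.
defaults = {"web_ui": {"host": "0.0.0.0", "port": 5001, "debug": False}, "mock": False}
# A user file that overrides only the bind host.
user = {"web_ui": {"host": "127.0.0.1"}}
print(deep_merge(defaults, user))
```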

Project Layout

screen_observer/
├── main.py            Entry point; argument parsing; thread coordination
├── config.json        Runtime configuration
├── requirements.txt   Python dependencies
├── observer.py        Platform adapters + ScreenObserver facade
├── ascii_renderer.py  ASCII spatial sketch renderer
├── description.py     Textual description generator (tree / OCR / VLM)
├── mcp_server.py      MCP JSON-RPC 2.0 stdio server
└── web_inspector.py   Flask REST API + embedded single-page UI

Platform Support Status

Feature	Windows	macOS	Linux	WSL
Window enumeration	Full (`win32gui`)	Supported (`Quartz` / AppleScript)	Supported (`wmctrl`)	Supported (`wmctrl`) or PowerShell fallback
Accessibility tree	Full (UIA + pywinauto)	Supported (`pyobjc` AXUIElement)	Supported (`pyatspi`)	Stub (no X11 without DISPLAY)
Screenshot	`PrintWindow` → `mss`	`mss`	`mss` → `scrot`	`mss` (if DISPLAY) or PowerShell
OCR	yes	yes	yes	yes
VLM description	yes	yes	yes	yes
ASCII sketch	yes	yes	yes	yes
Input actions	yes	yes	yes	yes (`pyautogui` needs DISPLAY)
Mock mode	yes	yes	yes	yes

get_screen_description always returns everything the current platform supports in a single call — no mode parameter required. The web inspector's Description tab shows which sources ran and how to enable missing ones.

All adapters degrade gracefully: if a library is not installed or a capability is unavailable, the server continues running and returns whatever it can. Optional dependencies for macOS (pyobjc) and Linux (pyatspi) are auto-installed via mac_adapter.py / linux_adapter.py when present.

Testing

OSScreenObserver ships with two test tiers:

Regression suite (`tests/`)

Runs in-process against the Flask test client, mock adapter, and the existing client / observer / app fixtures from tests/conftest.py. No subprocesses, no display, no LLM. Used by the default ci.yml.

pip install -r requirements.txt -r requirements-dev.txt
python -m pytest tests/ -m "not user"

User tests (`tests/user/`)

End-to-end tests that boot a real python main.py subprocess and drive it over the wire. Covers:

REST surface (test_rest_full.py) — every documented endpoint on Flask, including response envelopes, error codes, snapshot lifecycle, observe diff tokens, metrics in Prometheus format.
MCP stdio (test_mcp_protocol.py) — JSON-RPC 2.0 framing over stdio, initialize / tools/list / tools/call, smoke coverage of all 49 MCP tools, stdout purity (logs must go to stderr).
Scenarios (test_scenarios_user.py) — drives login.yaml from start to welcome via reactions; oracle pass/fail.
Trace/replay (test_trace_replay.py) — record + replay round trip with no divergences.
ASCII renderer (test_ascii_render_snapshot.py) — locks the sketch output against a stored snapshot.
All 9 assert_state predicate kinds (test_predicates_full.py) — element_exists, element_absent, value_equals, value_matches, text_visible, window_focused, window_exists, tree_hash_equals, and the AND combination.
Element actions (test_element_actions_full.py) — focus, set_value, invoke, select_option, hover, drag, key_into, clear_text, right_click, double_click, the propose-then-confirm flow.
OCR / VLM live tests — test_ocr_real_tesseract.py runs Tesseract against a generated PIL PNG; test_vlm_real_ollama.py exercises the multipass VLM pipeline against a reachable Ollama daemon (skipped if none is reachable).
Live X11 (test_xvfb_live.py) — boots OSO without --mock, spawns xterm via the fixture, and verifies the Linux adapter picks the window up.
Budgets / redaction / propose (test_budget_redaction_audit.py) — --max-actions enforcement, redaction status, propose_action token flow.
Config bootstrap + Ollama-setup live — test_setup_config_live.py, test_ollama_setup_live.py.

python -m pytest tests/user/ -m "user"

Docker harness (shared with AutoGUI)

The unified bash scripts/test-in-docker.sh in the AutoGUI repo runs both repos' regression + user tiers, the integration tier, and the pi-extension tier in a single image. The image bundles Xvfb + fluxbox so wmctrl / xdotool / scrot / Tesseract all work, optionally bundles Ollama with pre-pulled chat + VLM models, and tears down on exit even on Ctrl-C. See AutoGUI/README.md for the picker walkthrough and flag reference.

Marker plumbing

pytest.ini registers five markers:

Marker	Meaning
`user`	End-to-end tests that boot a real subprocess
`slow_llm`	Hits a real chat LLM (e.g. Ollama via VLM endpoint)
`slow_vlm`	Hits a real vision LLM
`needs_display`	Requires `$DISPLAY` pointing at an X server
`needs_tesseract`	Requires the `tesseract` binary on PATH

The default CI lane selects `not user`, so the user tier is opt-in.

Known Limitations (Prototype)

Accessibility-dark applications — Games, Electron apps with custom renderers, and applications that do not instrument UIA will produce sparse trees. The OCR and VLM modalities degrade more gracefully in these cases.
Prompt injection risk — Screen content is included verbatim in tool results. Malicious content on-screen could attempt to influence the AI's behavior. Apply appropriate trust boundaries when using this server in production contexts.
Performance — Full tree traversal on a complex window (e.g., a browser with many DOM-mapped UIA nodes) can take several seconds. The tree.max_depth config key limits traversal depth.
Action safety — The click_at, type_text, and press_key tools execute real input events. Apply appropriate authorization controls before exposing this server to an untrusted AI client in production.
