| 1 | +--- |
| 2 | +name: agent-browser |
| 3 | +description: >- |
| 4 | + Guides headless browser automation using the agent-browser CLI for web interaction, |
| 5 | + accessibility tree navigation, form filling, screenshots, and authenticated sessions. |
| 6 | + USE WHEN the user asks to "open a webpage", "navigate a site", "take a screenshot", |
| 7 | + "fill a form", "get page content", "interact with a website", "scrape a page", |
| 8 | + "automate browser", "accessibility tree", or works with agent-browser, Playwright, |
| 9 | + headless Chrome, CDP. DO NOT USE for static HTML parsing, curl/wget requests, |
| 10 | + or API-only interactions where WebFetch suffices. |
| 11 | +version: 0.1.0 |
| 12 | +allowed-tools: Bash |
| 13 | +argument-hint: "[url or action]" |
| 14 | +--- |
| 15 | + |
| 16 | +# Headless Browser Automation |
| 17 | + |
| 18 | +## Mental Model |
| 19 | + |
| 20 | +`agent-browser` is a **stateful CLI** — you `open` a page, interact with it through a series of commands, and `close` when done. There is one active browser session at a time. |
| 21 | + |
| 22 | +The accessibility tree (`snapshot`) is the primary way to "see" the page. It returns a structured tree of every element on the page, each tagged with a reference ID (`@e1`, `@e2`, etc.). You use these references to target elements for clicks, form fills, and selections. Think of it as a DOM you can address by short IDs rather than CSS selectors. The IDs are valid only for the page state in which they were captured. |
| 23 | + |
| 24 | +The core loop is: **open → snapshot → interact → snapshot → close**. Always snapshot before interacting so you know what elements are available. Always snapshot after interacting to verify the result. |
| 25 | + |
| 26 | +--- |
| 27 | + |
| 28 | +## Core Workflow |
| 29 | + |
| 30 | +Every browser task follows this pattern: |
| 31 | + |
| 32 | +```bash |
| 33 | +# 1. Open a page |
| 34 | +agent-browser open https://example.com |
| 35 | + |
| 36 | +# 2. Snapshot to see the page structure |
| 37 | +agent-browser snapshot |
| 38 | + |
| 39 | +# 3. Interact using element references from the snapshot |
| 40 | +agent-browser click @e2 |
| 41 | +agent-browser fill @e3 "search query" |
| 42 | + |
| 43 | +# 4. Snapshot again to see the result |
| 44 | +agent-browser snapshot |
| 45 | + |
| 46 | +# 5. Close when done |
| 47 | +agent-browser close |
| 48 | +``` |
| 49 | + |
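A minimal sketch of the workflow above as a reusable shell function, assuming the commands behave as described in this file. The `browse` function and the `AGENT_BROWSER` override variable are hypothetical names introduced here for illustration; they are not part of the agent-browser CLI. The sketch shows one way to make sure `close` still runs when the interaction step fails.

```shell
# Hypothetical helper (not part of agent-browser): run one interaction step
# inside the open -> snapshot -> interact -> snapshot -> close loop.
# AGENT_BROWSER is an assumed override for the binary name, useful for testing.
browse() {
  ab=${AGENT_BROWSER:-agent-browser}
  url=$1
  shift

  "$ab" open "$url" || return 1   # stop early if the page did not open
  "$ab" snapshot                  # see which elements are available

  step_rc=0
  "$@" || step_rc=$?              # the interaction step; remember its exit code

  "$ab" snapshot                  # verify the result
  "$ab" close                     # always end the session
  return "$step_rc"
}
```

Under these assumptions, a call such as `browse https://example.com agent-browser click @e2` would run a single click between the two snapshots and return the click's exit code after closing the session.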
| 50 | +--- |
| 51 | + |
| 52 | +## Commands Overview |
| 53 | + |
| 54 | +| Command | Purpose | Example | |
| 55 | +|---------|---------|---------| |
| 56 | +| `open <url>` | Navigate to a URL and start a session | `agent-browser open https://example.com` | |
| 57 | +| `snapshot` | Get the accessibility tree with element references | `agent-browser snapshot` | |
| 58 | +| `screenshot <path>` | Capture a PNG screenshot of the current page | `agent-browser screenshot page.png` | |
| 59 | +| `click @eN` | Click an element by its reference ID | `agent-browser click @e2` | |
| 60 | +| `fill @eN "text"` | Type text into an input element | `agent-browser fill @e3 "hello"` | |
| 61 | +| `select @eN "value"` | Select an option from a dropdown | `agent-browser select @e5 "option1"` | |
| 62 | +| `cookie set "..."` | Set a cookie for authenticated sessions | `agent-browser cookie set "session=abc123; domain=.example.com"` | |
| 63 | +| `connect <port>` | Connect to host Chrome via CDP | `agent-browser connect 9222` | |
| 64 | +| `close` | End the browser session | `agent-browser close` | |
| 65 | + |
| 66 | +> **Full details:** See `references/cli-reference.md` for complete command syntax, output formats, and all options. |
| 67 | + |
| 68 | +--- |
| 69 | + |
| 70 | +## Element References |
| 71 | + |
| 72 | +When you run `agent-browser snapshot`, the output is an accessibility tree in which elements are tagged with references such as `@e1` and `@e2`: |
| 73 | + |
| 74 | +``` |
| 75 | +document "Example Page" |
| 76 | + heading "Welcome" @e1 |
| 77 | + textbox "Search" @e2 |
| 78 | + button "Submit" @e3 |
| 79 | + link "About Us" @e4 |
| 80 | +``` |
| 81 | + |
| 82 | +Use these references in subsequent commands: |
| 83 | + |
| 84 | +- `agent-browser click @e3` — clicks the "Submit" button |
| 85 | +- `agent-browser fill @e2 "my query"` — types into the "Search" textbox |
| 86 | +- References are stable within a single page state. After navigation or significant DOM changes, run `snapshot` again to get updated references. |
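Because references change between page states, it can help to look them up by role and name each time rather than hard-coding them. The following is a rough sketch, assuming snapshot lines have the `role "Name" @eN` shape shown in the sample tree above; the real output format may differ (see `references/cli-reference.md`). `find_ref` and `SAMPLE_SNAPSHOT` are hypothetical names used only for this illustration.

```shell
# Sample snapshot text copied from the example tree above.
SAMPLE_SNAPSHOT='document "Example Page"
  heading "Welcome" @e1
  textbox "Search" @e2
  button "Submit" @e3
  link "About Us" @e4'

# Hypothetical helper: read snapshot text on stdin and print the reference
# of the first element whose role ($1) and accessible name ($2) match.
# Prints nothing when no line matches.
find_ref() {
  awk -v want="$1 \"$2\" " '
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # drop indentation
      if (index(line, want) == 1) {     # line starts with: role "Name"
        n = split(line, parts, " ")
        if (parts[n] ~ /^@e[0-9]+$/) {  # last field is the reference
          print parts[n]
          exit
        }
      }
    }'
}
```

If `agent-browser snapshot` writes the tree to standard output (an assumption here), the lookup and click could be combined as `ref=$(agent-browser snapshot | find_ref button "Submit") && agent-browser click "$ref"`.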
| 87 | + |
| 88 | +--- |
| 89 | + |
| 90 | +## Authentication Patterns |
| 91 | + |
| 92 | +For pages requiring authentication, inject cookies before opening the page: |
| 93 | + |
| 94 | +```bash |
| 95 | +# Set session cookie first |
| 96 | +agent-browser cookie set "session=abc123; domain=.example.com" |
| 97 | + |
| 98 | +# Then open the authenticated page |
| 99 | +agent-browser open https://example.com/dashboard |
| 100 | + |
| 101 | +# Proceed normally |
| 102 | +agent-browser snapshot |
| 103 | +``` |
| 104 | + |
| 105 | +This avoids needing to fill login forms when you already have valid session credentials. |
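To keep the session value out of script literals, the cookie string can be assembled from a variable. This is a small sketch that assumes the `name=value; domain=...` shape from the example above is the only format needed; `cookie_arg` and the `SESSION_TOKEN` environment variable are hypothetical names, not part of the CLI.

```shell
# Hypothetical helper: build the argument for `agent-browser cookie set`
# in the "name=value; domain=..." shape used in the example above.
# $1 = cookie name, $2 = cookie value, $3 = cookie domain
cookie_arg() {
  printf '%s=%s; domain=%s' "$1" "$2" "$3"
}
```

With the token exported beforehand, the call from the example would become `agent-browser cookie set "$(cookie_arg session "$SESSION_TOKEN" .example.com)"`, followed by `open` as before.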
| 106 | + |
| 107 | +--- |
| 108 | + |
| 109 | +## Containerized Usage |
| 110 | + |
| 111 | +### Headless Mode (Default) |
| 112 | + |
| 113 | +Headless mode uses the Chromium bundled in the container, so no display is needed. It works without extra setup: |
| 114 | + |
| 115 | +```bash |
| 116 | +agent-browser open https://example.com |
| 117 | +agent-browser snapshot |
| 118 | +agent-browser close |
| 119 | +``` |
| 120 | + |
| 121 | +### Host Chrome Connection |
| 122 | + |
| 123 | +Connect to Chrome running on your host machine via CDP (Chrome DevTools Protocol). This is useful when the container's bundled Chromium is insufficient (e.g., when specific browser extensions are needed): |
| 124 | + |
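| 125 | +1. Start Chrome on the host with remote debugging enabled (the binary may be named `google-chrome`, `chromium`, or similar, depending on the platform): |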
| 126 | + ```bash |
| 127 | + chrome --remote-debugging-port=9222 |
| 128 | + ``` |
| 129 | + |
| 130 | +2. Connect from the container: |
| 131 | + ```bash |
| 132 | + agent-browser connect 9222 |
| 133 | + ``` |
| 134 | + |
| 135 | +--- |
| 136 | + |
| 137 | +## Ambiguity Policy |
| 138 | + |
| 139 | +These defaults apply when the user does not specify a preference. State the assumption when applying a default: |
| 140 | + |
| 141 | +- **Mode:** Always use headless mode (bundled Chromium) unless the user explicitly requests host Chrome connection |
| 142 | +- **Snapshot first:** Always run `snapshot` before interacting with elements — never guess element references |
| 143 | +- **Snapshot after:** Always run `snapshot` after interactions to verify results |
| 144 | +- **Close when done:** Always `close` the browser session when the task is complete |
| 145 | +- **Screenshots:** Save to the current working directory unless the user specifies a path |
| 146 | +- **Cookie scope:** Set cookies before `open` so they apply to the initial page load |
| 147 | + |
| 148 | +--- |
| 149 | + |
| 150 | +## Reference Files |
| 151 | + |
| 152 | +| File | Contents | |
| 153 | +|------|----------| |
| 154 | +| [CLI Reference](references/cli-reference.md) | Complete command syntax, all flags and options, output format descriptions, connection modes, error handling | |
| 155 | +| [Workflow Patterns](references/workflow-patterns.md) | Common automation patterns: page inspection, form filling, multi-page navigation, authenticated sessions, screenshots, error recovery | |