Merged
34 commits
2e27811
feat: compact snapshot output with URL compression
mugorski May 18, 2026
30f0e97
feat: compact snapshot output with URL compression
mugorski May 18, 2026
7ee028b
feat: add python package for benchmarking
kzajac-opera May 19, 2026
4df1d09
feat: add python package for benchmarking
kzajac-opera May 19, 2026
1770e5b
feat: add benchmark report generation
kzajac-opera May 19, 2026
a5cc1e8
feat: add benchmark configuration files
kzajac-opera May 19, 2026
602cb59
docs: add benchmark documentation
kzajac-opera May 19, 2026
394aedd
chore: Add python CI and reformat code
kzajac-opera May 19, 2026
2726d16
chore: Use secure command parsing with shlex
kzajac-opera May 19, 2026
f7858b1
chore: Rename snapshot-efficiency to snapshot-agentic-use
kzajac-opera May 19, 2026
6c4f3da
chore: Move python configuration one-level up
kzajac-opera May 19, 2026
0bfdaab
chore: Add shared code with token counter
kzajac-opera May 19, 2026
3982ac5
docs: Update benchmark docs
kzajac-opera May 19, 2026
3a58b59
feat: Add benchmark config for snapshotting single page
kzajac-opera May 19, 2026
3aab459
docs: Document snapshot benchmark
kzajac-opera May 19, 2026
2d39628
fix: Fix linter issues after mv
kzajac-opera May 19, 2026
15c9b82
chore: Add more conditions to page-token-benchmark
kzajac-opera May 20, 2026
6a45631
feat: Add explicit --full option to agentic-use-benchmark
kzajac-opera May 20, 2026
5368860
chore: Move token_counter.py
kzajac-opera May 20, 2026
b501288
fix: Explicitly request open before snapshot in every mode
kzajac-opera May 20, 2026
5aa9f2f
feat: Update SKILL.md after token optimization
kzajac-opera May 20, 2026
be6a2fc
fix: Set devtools ports explicitly to avoid port collision
kzajac-opera May 20, 2026
6b012dd
fix: Update tasks for wikipedia extraction -> year change from 2024 t…
kzajac-opera May 20, 2026
856366c
chore: remove running benchmark with and without --full and always as…
kzajac-opera May 21, 2026
5ce1714
docs: Combine two CLAUDE files into one for benchmarks
kzajac-opera May 21, 2026
ac89c74
docs: Mention the results in main README
kzajac-opera May 21, 2026
440542f
fix: Fixes from review
kzajac-opera May 21, 2026
37b7625
fix: Add python safety check and verbose error handling
kzajac-opera May 21, 2026
3be0b73
fix: Error message readability fix
kzajac-opera May 22, 2026
44319ea
chore: execute in benchmark run in different directory
kzajac-opera May 22, 2026
55e0341
chore: use consistent table formatting in report and README
kzajac-opera May 22, 2026
4672b00
chore: add debug helper for CLAUDE
kzajac-opera May 22, 2026
33567f6
docs: Document experiment results in README
kzajac-opera May 22, 2026
485aa49
fix: strip whitespace from --conditions split
kzajac-opera May 22, 2026
13 changes: 13 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -7,6 +7,19 @@ on:
branches: [main]

jobs:
lint-benchmark:
runs-on: ubuntu-latest
defaults:
run:
working-directory: benchmarks
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install -r requirements-dev.txt
- run: make check

build-and-test:
runs-on: ubuntu-latest
steps:
32 changes: 32 additions & 0 deletions CLAUDE.md
@@ -15,6 +15,10 @@ Key files:
| `src/bridge.ts` → `runBridge()` | Entry point for the bridge process |
| `bin/opera-browser-cli-bridge.js` | Bridge binary entrypoint (calls `runBridge`) |

## Benchmarks

Token-cost and agentic-quality measurements live in `benchmarks/`. See `benchmarks/CLAUDE.md` for file roles and how to run them.

## Specs directory

Planned and in-progress fixes are documented as Markdown specs in `specs/`.
@@ -24,6 +28,34 @@ Always check there before starting implementation work.
|---|---|
| [`specs/fix-parallel-streaming-routing.md`](specs/fix-parallel-streaming-routing.md) | Planned — parallel chunk routing for concurrent Opera AI calls |

## Common issues

### Stale bridge process after update (`BRIDGE_NOT_READY` / "different server")

**Symptom:** `opera-browser-cli` commands fail with:
```
error: Port 9224 is in use by a different server (not opera-devtools-mcp).
code: BRIDGE_NOT_READY
```
even though the bridge is running (`lsof -i :9224` shows a `node` process).

**Cause:** The bridge process was started before `dist/src/bridge.js` was rebuilt. The
running process has old code in memory; its `/health` response is missing the
`server: "opera-browser-cli"` field that `checkPortStatus` (`client.ts`) requires to
recognise the bridge as healthy. Without that field the port is classified as a conflict.

**Fix:** Restart the bridge:
```sh
opera-browser-cli stop
# next command auto-starts a fresh bridge with current code
```

If `stop` does nothing (the bridge was started without a PID file, or the PID file was
deleted), kill it by port instead:
```sh
lsof -ti :9224 | xargs kill
```
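
The classification described under **Cause** can be pictured with a minimal Python sketch. It is not the code in `client.ts`: the function name `classify_port` and its return labels are hypothetical, and the only detail taken from the text is that a `/health` body must carry `server: "opera-browser-cli"` to count as the bridge.

```python
from typing import Any, Mapping, Optional

EXPECTED_SERVER = "opera-browser-cli"  # value named in the Cause paragraph


def classify_port(health: Optional[Mapping[str, Any]]) -> str:
    """Classify a port from the parsed JSON body of its /health response.

    `health` is None when nothing answered on the port.
    The return labels are hypothetical, chosen for this sketch.
    """
    if health is None:
        return "free"
    if health.get("server") == EXPECTED_SERVER:
        return "bridge"
    # A stale bridge built before the `server` field existed lands here.
    return "conflict"
```

Under these assumptions a stale process answers `/health` but is still reported as a conflict, which matches the symptom above.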

## Architecture notes

### Bridge transport model
95 changes: 67 additions & 28 deletions README.md
@@ -112,7 +112,7 @@ OPERA_CLI_BROWSER_URL=http://127.0.0.1:9222 opera-browser-cli open https://examp

```
┌───────────────────────┐
│ opera-browser-cli │ CLI — parse args, format output
│ opera-browser-cli │ CLI — parse args, format output
└──────────┬────────────┘
│ HTTP (localhost:9225)
@@ -122,7 +122,7 @@ OPERA_CLI_BROWSER_URL=http://127.0.0.1:9222 opera-browser-cli open https://examp
│ stdio
┌───────────────────────┐
│ opera-devtools-mcp │ Headless Chrome via DevTools Protocol
│ opera-devtools-mcp │ Headless Chrome via DevTools Protocol
└───────────────────────┘
```

@@ -136,7 +136,7 @@ OPERA_CLI_BROWSER_URL=http://127.0.0.1:9222 opera-browser-cli open https://examp
### Navigation

| Command | Description |
| ----------------- | -------------------------------------------- |
|-------------------|----------------------------------------------|
| `open <url>` | Navigate to URL and snapshot |
| `snapshot` | Capture current page state |
| `screenshot <p>` | Save a screenshot to a file |
@@ -156,7 +156,7 @@ opera-browser-cli eval "(() => { const rows = [...document.querySelectorAll('tr'
### Interaction

| Command | Description |
| -------------------------- | ------------------------------ |
|----------------------------|--------------------------------|
| `click @<uid>` | Click an element by ref |
| `fill @<uid> <text>` | Fill a form field |
| `type <text>` | Type text at current focus |
@@ -170,7 +170,7 @@ opera-browser-cli eval "(() => { const rows = [...document.querySelectorAll('tr'
### Page Management

| Command | Description |
| ----------------- | --------------------------- |
|-------------------|-----------------------------|
| `pages` | List all open tabs |
| `newpage <url>` | Open a new tab |
| `selectpage <id>` | Switch to a tab by ID |
@@ -180,13 +180,13 @@ opera-browser-cli eval "(() => { const rows = [...document.querySelectorAll('tr'
### Emulation

| Command | Description |
| --------- | ------------------------------- |
|-----------|---------------------------------|
| `emulate` | Emulate device/network/viewport |

### DevTools Debugging

| Command | Description |
| ------------------ | ------------------------------ |
|--------------------|--------------------------------|
| `console` | List console messages |
| `console-get <id>` | Get a specific console message |
| `network` | List network requests |
@@ -195,7 +195,7 @@ opera-browser-cli eval "(() => { const rows = [...document.querySelectorAll('tr'
### Performance

| Command | Description |
| --------------------------- | ----------------------------- |
|-----------------------------|-------------------------------|
| `lighthouse` | Run a Lighthouse audit |
| `perf-start` | Start a performance trace |
| `perf-stop` | Stop the performance trace |
@@ -204,27 +204,27 @@ opera-browser-cli eval "(() => { const rows = [...document.querySelectorAll('tr'

### Opera AI

| Command | Description | Requires |
| ------------------- | --------------------------------------------- | -------------- |
| `chat <prompt>` | Send a chat message to Opera's built-in AI | Any Opera |
| `invoke-do <prompt>`| Ask the AI to perform a complex browsing task | Opera Neon |
| `make <prompt>` | Ask the AI to build a webpage or app | Opera Neon |
| `research <prompt>` | Ask the AI to research a topic in depth | Opera Neon |
| Command | Description | Requires |
|----------------------|-----------------------------------------------|------------|
| `chat <prompt>` | Send a chat message to Opera's built-in AI | Any Opera |
| `invoke-do <prompt>` | Ask the AI to perform a complex browsing task | Opera Neon |
| `make <prompt>` | Ask the AI to build a webpage or app | Opera Neon |
| `research <prompt>` | Ask the AI to research a topic in depth | Opera Neon |

`research` accepts `--type local` (default), `--type one-minute`, or `--type deep`.

### Configuration

| Command | Description |
| -------- | ------------------------------------------------ |
|----------|--------------------------------------------------|
| `setup` | Interactive first-time setup (browser path, etc) |
| `doctor` | Check configuration and environment |
| `logs` | Show bridge server logs |

### Bridge

| Command | Description |
| ------- | ----------------------- |
|---------|-------------------------|
| `start` | Start the bridge server |
| `stop` | Stop the bridge server |

@@ -235,7 +235,7 @@ session is active or the no-session status/help block when one is not.
### Flags

| Flag | Description |
| --------------------------- | ------------------------------------------- |
|-----------------------------|---------------------------------------------|
| `--help` | Show usage information |
| `-v`, `-V`, `--version` | Show the installed CLI version |
| `--full` | Show complete output without truncation |
@@ -263,21 +263,21 @@ session is active or the no-session status/help block when one is not.

## Configuration

| Variable | Default | Purpose |
| --- | --- | --- |
| `OPERA_CLI_PORT` | `9225` | Bridge server port |
| `OPERA_CLI_MCP_BIN` | _(bundled `opera-devtools-mcp`)_ | Override the MCP server binary |
| `OPERA_CLI_EXECUTABLE_PATH` | _(system Chrome)_ | Custom browser binary |
| `OPERA_CLI_BROWSER_URL` | — | Connect to an existing browser instance instead of launching one |
| `OPERA_CLI_USER_DATA_DIR` | — | Persistent Chrome profile directory (skips isolated mode) |
| `OPERA_CLI_HEADED` | — | Set to `1` to run in headed (visible) mode |
| `OPERA_CLI_CHROME_ARGS` | — | Extra Chrome flags, space-separated |
| `OPERA_CLI_DISABLE_HOOKS` | — | Set to `1` to skip auto-installing session hooks |
| Variable | Default | Purpose |
|-----------------------------|----------------------------------|------------------------------------------------------------------|
| `OPERA_CLI_PORT` | `9225` | Bridge server port |
| `OPERA_CLI_MCP_BIN` | _(bundled `opera-devtools-mcp`)_ | Override the MCP server binary |
| `OPERA_CLI_EXECUTABLE_PATH` | _(system Chrome)_ | Custom browser binary |
| `OPERA_CLI_BROWSER_URL` | — | Connect to an existing browser instance instead of launching one |
| `OPERA_CLI_USER_DATA_DIR` | — | Persistent Chrome profile directory (skips isolated mode) |
| `OPERA_CLI_HEADED` | — | Set to `1` to run in headed (visible) mode |
| `OPERA_CLI_CHROME_ARGS` | — | Extra Chrome flags, space-separated |
| `OPERA_CLI_DISABLE_HOOKS` | — | Set to `1` to skip auto-installing session hooks |
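
As an illustration of how a wrapper script (for example a benchmark harness) might interpret a few of these variables, here is a small Python sketch. The helper `bridge_settings` is hypothetical and is not part of the CLI; it only mirrors the defaults and value conventions stated in the table.

```python
import os
from typing import Mapping

DEFAULT_PORT = 9225  # default from the table above


def bridge_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read a few OPERA_CLI_* variables the way the table describes them."""
    return {
        # Bridge server port, falling back to the documented default.
        "port": int(env.get("OPERA_CLI_PORT", DEFAULT_PORT)),
        # Headed mode is on only when the variable is exactly "1".
        "headed": env.get("OPERA_CLI_HEADED") == "1",
        # Extra Chrome flags are space-separated.
        "chrome_args": env.get("OPERA_CLI_CHROME_ARGS", "").split(),
    }
```

Calling `bridge_settings({})` yields the documented defaults; passing a custom mapping makes the helper easy to exercise without touching the real environment.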

State is stored in `~/.opera-browser-cli/`:

| File | Purpose |
| ------------ | ---------------------------------- |
|--------------|------------------------------------|
| `bridge.pid` | PID and port of the running bridge |

### Session Hooks
@@ -328,6 +328,45 @@ export OPERA_CLI_MCP_BIN=opera-devtools-mcp
export OPERA_CLI_HEADED=1
```

## Benchmarks

### Page Snapshot

Runs the `snapshot` command on 50 static pages (Wikipedia, GitHub, MDN, Python docs, RFC Editor) and counts output tokens with tiktoken. No LLM is involved; the measurement is purely mechanical.

**Results (50 runs each):**

| Condition | Avg tokens | Median tokens | p95 tokens |
|-----------------|------------|---------------|------------|
| `opera-compact` | 60.6k | 24.3k | 256.1k |
| `mcp-raw` | 94.7k | 45.0k | 391.3k |
| `opera-raw` | 94.9k | 45.1k | 381.4k |
| `axi` | 98.5k | 46.6k | 396.9k |

`--full` variants (no char limit) are also measured; see the [detailed README](page-token-benchmark/README.md) and [results report](page-token-benchmark/results/report.md).
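
The three columns in the table can be reproduced from a list of per-page token counts with a few lines of Python. This is a sketch under assumptions: the helper name `summarize` is hypothetical, the nearest-rank percentile is one common definition and may not be the one the report uses, and in practice the counts would come from tiktoken rather than being passed in by hand.

```python
import statistics
from typing import Sequence


def summarize(token_counts: Sequence[int]) -> dict:
    """Return avg, median, and nearest-rank p95 for a list of token counts."""
    ordered = sorted(token_counts)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "avg": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }
```

With 50 samples the nearest-rank p95 is the 48th smallest value, so a few very large pages can dominate that column, which is consistent with the gap between the median and p95 figures above.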

---

### Agentic Use

An LLM agent completes 7 browser tasks (adapted from the [axi bench-browser benchmark](https://github.com/kunchenguid/axi/tree/main/bench-browser)) across 4 conditions. Each run is graded pass/fail by an LLM judge and records input tokens, snapshot size, wall time, and tool call count.
For each tool call, the agent chose whether to pass the `--full` flag, depending on context.

**Results (35 runs each, 5 repeats × 7 tasks):**

| Condition | Pass [%] | Avg input length [tokens] | Avg snapshot length [chars] | Avg task time [seconds] | Avg tool calls |
|---------------|----------|---------------------------|-----------------------------|-------------------------|----------------|
| opera-compact | 100% | 36.3k | 83.1k | 6.8 | 1.4 |
| opera-raw | 100% | 107.5k | 198.1k | 8.5 | 1.6 |
| axi | 100% | 102.2k | 203.9k | 9.8 | 1.5 |
| mcp-raw | 100% | 179.2k | 218.7k | 9.4 | 2.1 |

> opera-compact saves **80%** total tokens vs the mcp-raw baseline (consistent with the average input lengths above: 36.3k vs 179.2k tokens).

See the [detailed README](snapshot-agentic-use/README.md) and [results report](snapshot-agentic-use/results/v6/report.md).
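
The headline saving can be checked against the table with a short calculation. The sketch below assumes the figure is derived from the "Avg input length" column, which the README does not state explicitly; `percent_saved` is a hypothetical helper name.

```python
def percent_saved(candidate: float, baseline: float) -> float:
    """Relative reduction of `candidate` versus `baseline`, in percent."""
    return (1 - candidate / baseline) * 100


# Average input lengths (in thousands of tokens) from the table above.
vs_mcp_raw = percent_saved(36.3, 179.2)
vs_opera_raw = percent_saved(36.3, 107.5)
print(f"{vs_mcp_raw:.1f}% vs mcp-raw, {vs_opera_raw:.1f}% vs opera-raw")
# prints "79.7% vs mcp-raw, 66.2% vs opera-raw"
```

Rounded to a whole percent, the first figure agrees with the 80% quoted above.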

---

## Development

```sh
30 changes: 28 additions & 2 deletions SKILL.md
@@ -5,7 +5,7 @@ description: Browser automation and web interaction using the opera-browser-cli

# Skill: opera-browser-cli Browser Automation

`opera-browser-cli` controls a Opera browser browser session.
`opera-browser-cli` controls an Opera browser session.

- **Standard commands** (`open`, `click`, `fill`, `screenshot`, etc.) — work with any Opera browser session.
- **`chat`** — available on any Opera browser.
@@ -17,4 +17,30 @@ Run `opera-browser-cli --help` for the full command list, or `opera-browser-cli
opera-browser-cli open https://example.com # start here — navigate and snapshot the page
```

If a user hits `Opera: user is not signed in` on an AI command, suggest they sign in to their Opera account. `invoke-do`, `make`, and `research` require Opera Neon with an active sign-in. Run `opera-browser-cli setup` or `opera-browser-cli doctor` to configure or diagnose.
## Snapshot format

By default snapshots are **compact**: role names are shortened, refs use `@PAGE.ELEM` form (e.g. `@2.4`), headings become markdown, and redundant ARIA attributes are stripped. Pass `--raw` to any snapshot-returning command to get the unprocessed MCP output instead.

Repeated or very long URLs in compact output are replaced with `$uN` tokens. A `urls:` trailer at the end of the snapshot lists what each token resolves to.

```
@2.4 link "Download" url=$u1
...
urls:
$u1 /downloads/installer-v3.2.1-x86_64.tar.gz
```

Use `opera-browser-cli url $uN` or `opera-browser-cli url @ref` to resolve a token or element ref to its full URL without taking a new snapshot.
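
To make the `$uN` scheme concrete, here is a minimal Python sketch of a URL look-up table. It is an illustration only, not the CLI's implementation: the function names, the `url=` regular expression, and the 32-character threshold for "very long" are all assumptions made for this example.

```python
import re
from collections import Counter

URL_ATTR = re.compile(r"url=(\S+)")
MIN_LENGTH = 32  # hypothetical threshold for a "very long" URL


def compress_urls(snapshot: str, min_length: int = MIN_LENGTH) -> str:
    """Replace repeated or long url= values with $uN tokens and append a trailer."""
    counts = Counter(URL_ATTR.findall(snapshot))
    table = {}  # url -> token, in first-seen order

    def swap(match: re.Match) -> str:
        url = match.group(1)
        if counts[url] < 2 and len(url) < min_length:
            return match.group(0)  # short and unique: leave inline
        token = table.setdefault(url, f"$u{len(table) + 1}")
        return f"url={token}"

    body = URL_ATTR.sub(swap, snapshot)
    if not table:
        return body
    trailer = "\n".join(f"{token} {url}" for url, token in table.items())
    return f"{body}\nurls:\n{trailer}"


def resolve(compact: str, token: str) -> str:
    """Look a $uN token up in the urls: trailer of a compact snapshot."""
    _, _, trailer = compact.partition("\nurls:\n")
    for line in trailer.splitlines():
        name, _, url = line.partition(" ")
        if name == token:
            return url
    raise KeyError(token)
```

In this sketch `resolve(compact, "$u1")` plays the role that `opera-browser-cli url $u1` plays for the real tool: it reads the trailer instead of taking a new snapshot.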

## Flags available on snapshot-returning commands

| Flag | Effect |
|----------|--------------------------------------------------------------|
| `--full` | Show complete snapshot without truncation |
| `--raw` | Unprocessed MCP output (disables compact format and URL LUT) |

Commands that accept these flags: `open`, `snapshot`, `click`, `fill`, `type`, `press`, `scroll`, `back`, `hover`, `drag`, `fillform`, `upload`, `newpage`, `selectpage`.

## Sign-in errors

If a user hits `Opera: user is not signed in` on an AI command, suggest that they sign in to their Opera account. Run `opera-browser-cli setup` or `opera-browser-cli doctor` to configure or diagnose.
5 changes: 5 additions & 0 deletions benchmarks/.flake8
@@ -0,0 +1,5 @@
[flake8]
max-line-length = 120
# E203: whitespace before ':' — conflicts with black's slice formatting
# W503: line break before binary operator — conflicts with black
extend-ignore = E203, W503
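
For context on the two ignored codes, the short Python snippet below shows the code shapes they would otherwise flag. It is an illustrative sketch with made-up variable names, not code from the benchmark package; the second expression is kept short, whereas black only splits before an operator when a line is too long.

```python
values = list(range(10))
lower, offset = 2, 1

# E203 would flag the space before ':' that black puts around complex slice bounds.
window = values[lower + offset : lower + offset + 3]

# W503 would flag a line break placed before a binary operator,
# which is where black breaks long expressions.
total = (
    sum(window)
    + len(values)
)
```

With both codes in `extend-ignore`, flake8 accepts this layout and the two tools no longer disagree.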