A friendlier front-end for OpenWebUI. Built for users who want the power of agentic AI — running commands, reading files, generating images and audio, calling MCP servers — without having to be a developer.
⚠ Experimental software — use in a sandbox environment only. BetterWebUI is a research prototype. It is not intended for, nor evaluated or deemed suitable for, any particular production use or critical workload. No warranty is provided, express or implied. Shell commands approved in the chat interface execute directly on your local machine; integrated services (CLK, AutoGUI, OSScreenObserver) may take real actions on your desktop. Run this software only in an isolated, sandboxed environment and review every command before approving it. By using this software you accept all associated risks.
Contributions, bug reports, and ideas are very welcome — feel free to open an issue or pull request!
- Connects to your LLM provider of choice — OpenWebUI, Ollama (direct), OpenAI, Anthropic, or any OpenAI-compatible endpoint. A friendly setup wizard runs on first launch, picks defaults per provider, and validates the connection before saving.
- Lets you pick from any model your provider knows about (scrollable, filterable picker: ↑↓ to navigate, type to filter).
- Workspaces — bundle a system prompt, chosen skills, MCP servers, CLI shortcuts, and persistent files into a saved configuration you can return to. "Grading", "Research", "Course prep": switch with one click.
- Skills — short markdown briefs telling the assistant how to do specific tasks. Loaded on demand when a request matches.
- System prompts — the assistant's role and tone.
- MCP servers — extend the assistant with tools from a curated registry (Filesystem, GitHub, Fetch, Brave Search, Memory, Git, …) or your own custom servers.
- CLI shortcuts — registered command-line tools (git, gh, pandoc, ffmpeg, …) the assistant knows are available.
- Math + markdown rendering — prose, tables, code, and LaTeX ($...$, $$...$$, \(...\), \[...\]) all render properly via KaTeX.
- Multimodal in — attach images and files to your messages.
- Multimodal out — generated images and audio download to your computer automatically; nothing is left lying on the server.
- Local file picker — when the assistant wants to read a file, you get a file picker. The assistant only sees what you choose to share.
- Local shell execution — bash on macOS/Linux, PowerShell on Windows. Every command requires a one-click approval before it runs.
BetterWebUI integrates with three external AI services via REST APIs, exposing them
at /api/services/* endpoints that the LLM can call through tool use or slash commands.
| Service | Env var | Default URL | Purpose |
|---|---|---|---|
| CognitiveLoopKernel (CLK) | CLK_BASE_URL | http://localhost:8001 | Deep research loops & multi-step workflows |
| AutoGUI | AUTOGUI_BASE_URL | http://localhost:8002 | Desktop GUI automation via ReAct |
| OSScreenObserver (OSSO) | OSSO_BASE_URL | http://localhost:5001 | Screen reading & accessibility inspection |
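As a rough illustration of how the table above could be consumed, the sketch below resolves each service's base URL from its environment variable and falls back to the documented default. The function name `resolve_service_urls` and the short keys `clk`, `autogui`, and `osso` are chosen for illustration; this is not taken from the BetterWebUI source.

```python
import os
from typing import Mapping

# Env var names and default URLs are copied from the table above.
SERVICE_DEFAULTS = {
    "clk": ("CLK_BASE_URL", "http://localhost:8001"),
    "autogui": ("AUTOGUI_BASE_URL", "http://localhost:8002"),
    "osso": ("OSSO_BASE_URL", "http://localhost:5001"),
}


def resolve_service_urls(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the base URL for each service, preferring the env var when set."""
    urls = {}
    for name, (var, default) in SERVICE_DEFAULTS.items():
        # Assumption: an empty value counts as unset; a trailing slash is
        # stripped so paths can be appended cleanly.
        urls[name] = (env.get(var) or default).rstrip("/")
    return urls
```

Calling `resolve_service_urls()` with no arguments reads the current process environment; passing a plain dict makes the behaviour easy to check in isolation.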
These services are included as git submodules (CognitiveLoopKernel/, AutoGUI/,
OSScreenObserver/). Running start.sh automatically pulls the submodules and starts
any service that is not already running; those services are stopped automatically when
the script exits. Override the ports with the CLK_PORT, AUTOGUI_PORT, and
OSSO_PORT environment variables.
Each service can be toggled on or off independently from Settings → Services (or via the API). Disabled services immediately return an HTTP 503 for all their routes, and the LLM is told the service is unavailable. Re-enabling restores normal operation without a restart.
| Method | Path | Purpose |
|---|---|---|
| GET | /api/services/status | Current enabled/disabled state for all services |
| POST | /api/services/{name}/enable | Enable a service (clk, autogui, osso) |
| POST | /api/services/{name}/disable | Disable a service |
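The toggle routes above can be exercised from a short script. The sketch below uses only the standard library and makes several assumptions: BetterWebUI is listening on its default address, the POST routes accept an empty body, and the status route returns JSON. The helper names `toggle_request` and `fetch_status` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8765"  # default BetterWebUI address from this README
SERVICES = ("clk", "autogui", "osso")


def toggle_request(name: str, enabled: bool, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that enables or disables a service."""
    if name not in SERVICES:
        raise ValueError(f"unknown service: {name!r}")
    action = "enable" if enabled else "disable"
    # Assumption: the endpoint needs no request body.
    return urllib.request.Request(
        f"{base_url}/api/services/{name}/{action}", data=b"", method="POST"
    )


def fetch_status(base_url: str = BASE_URL) -> dict:
    """Fetch the enabled/disabled state; assumes the response body is a JSON object."""
    with urllib.request.urlopen(f"{base_url}/api/services/status", timeout=5) as resp:
        return json.load(resp)
```

To send a request, pass it to `urllib.request.urlopen`, for example `urllib.request.urlopen(toggle_request("autogui", False))`, and then call `fetch_status()` to confirm the change.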
When an enabled service is not running or unreachable, BetterWebUI returns a descriptive HTTP 503 response rather than crashing. The LLM receives the error message and relays it to the user.
Tool calls that trigger side-effects require a one-click approval from the user in the chat interface before the action executes:
- clk_research — shows the workflow and command for approval
- autogui_task — shows the task description for approval
- screen_action — shows the action type and coordinates for approval
Read-only operations (screen_windows, screen_description,
screen_screenshot) run without an approval prompt.
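The approval rule described above amounts to a small lookup. The sketch below is one way to express it, not the project's actual implementation; the tool names come from this section, while the function name and the fail-safe default for unknown tools are assumptions.

```python
# Tool names are the ones listed in this section.
APPROVAL_REQUIRED = frozenset({"clk_research", "autogui_task", "screen_action"})
READ_ONLY = frozenset({"screen_windows", "screen_description", "screen_screenshot"})


def needs_approval(tool_name: str) -> bool:
    """Return True when a tool call should wait for the user's one-click approval."""
    if tool_name in READ_ONLY:
        return False
    if tool_name in APPROVAL_REQUIRED:
        return True
    # Assumption: a tool this sketch does not know about is gated as well,
    # so that a newly added tool fails safe.
    return True
```

Defaulting to approval for unknown names is a conservative choice made for this example; the real dispatcher may treat them differently.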
| Method | Path | Service |
|---|---|---|
| GET | /api/services/health | All (aggregated health check) |
| GET | /api/services/status | All (enable/disable state) |
| POST | /api/services/{name}/enable | All |
| POST | /api/services/{name}/disable | All |
| GET | /api/services/clk/workflows | CLK |
| POST | /api/services/clk/research | CLK |
| GET | /api/services/clk/research/{id} | CLK |
| GET | /api/services/clk/research/{id}/stream | CLK (SSE) |
| GET | /api/services/clk/research/{id}/artifacts | CLK |
| POST | /api/services/clk/research/{id}/cancel | CLK |
| POST | /api/services/autogui/task | AutoGUI |
| GET | /api/services/autogui/task/{id} | AutoGUI |
| GET | /api/services/autogui/task/{id}/stream | AutoGUI (SSE) |
| POST | /api/services/autogui/task/{id}/cancel | AutoGUI |
| GET | /api/services/autogui/tools | AutoGUI |
| GET | /api/services/osso/windows | OSSO |
| GET | /api/services/osso/description | OSSO |
| GET | /api/services/osso/structure | OSSO |
| GET | /api/services/osso/screenshot | OSSO |
| POST | /api/services/osso/action | OSSO |
| GET | /api/services/osso/capabilities | OSSO |
| GET | /api/services/tools | All (LLM tool specs) |
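The two routes marked SSE stream progress as server-sent events. The sketch below is a minimal, generic parser for the `text/event-stream` format that yields each event's data payload. What the payloads contain (plain text, JSON, or something else) is not documented here, so the function `iter_sse_data` leaves them as raw strings.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in a text/event-stream."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # One leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An event that is not terminated by a blank line is discarded.
```

When reading from `urllib.request.urlopen`, the response yields bytes, so decode first: `iter_sse_data(chunk.decode("utf-8") for chunk in resp)`.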
When typing in the chat, prefix your message with a slash command to route directly to a service:
- /research <topic> — starts a CLK research workflow
- /observe — returns a description of the current screen via OSSO
- /automate <task> — sends a GUI automation task to AutoGUI (dry-run by default)
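To make the routing concrete, here is a small sketch of how a chat message might be matched against the slash commands above. The command names come from this section; the function name, the return shape, and the handling of unknown commands are assumptions made for the example.

```python
from typing import Optional

# Each command maps to the short service name used by /api/services/{name}.
SLASH_COMMANDS = {"/research": "clk", "/observe": "osso", "/automate": "autogui"}


def route_slash_command(message: str) -> Optional[tuple[str, str]]:
    """Return (service, argument) for a slash command, or None for ordinary chat."""
    text = message.strip()
    if not text.startswith("/"):
        return None
    command, _, argument = text.partition(" ")
    service = SLASH_COMMANDS.get(command.lower())
    if service is None:
        # Assumption: unrecognised commands fall through as plain chat text.
        return None
    return service, argument.strip()
```

For instance, `route_slash_command("/research solar storage")` yields the CLK service with the topic as its argument, while a message that merely mentions `/observe` mid-sentence is left alone.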
See deploy/README.md for the full integration deployment guide,
including Docker Compose configuration and the bootstrap.sh script for cloning
sibling repositories.
pip install -r requirements.txt
pytest tests/ --ignore=tests/playwright
scripts/run-all-tests.sh is the single entry point. It drives the same
setup wizard the launchers use, then runs (in order) pytest, the existing
Playwright integration suite, the comprehensive browser-driven UI suite
(~155 tests, 55 spec files), and the curl smoke tests.
./scripts/run-all-tests.sh
Useful flags:
| Flag | What it does |
|---|---|
| --no-wizard | Skip the wizard; assume env is already set (CI mode) |
| --reconfigure | Force re-prompt for provider / URL / key / model |
| --docker | Bring up deploy/docker-compose.e2e.yml (Ollama + OpenWebUI) and tear it down on exit |
| --docker-compose <file> | Tear down the given compose stack on exit (assume it's already up) |
| --skip-python / --skip-playwright / --skip-ui / --skip-smoke | Selectively run stages |
| --keep-going | Don't fail fast; run every stage even if an earlier one fails |
| -- <args> | Pass remaining args to playwright test (e.g. -- --grep settings) |
The runner owns the lifecycle of any docker stack it uses: the cleanup
trap runs docker compose down -v --remove-orphans on EXIT/INT/TERM,
guaranteeing teardown even when tests fail or the script is interrupted.
Requires Docker Desktop and Node.js 18+. The script pulls the model on first run, starts the full stack, runs all tests, and tears everything down.
./scripts/run-e2e-docker.sh
# Override the model (default: tinyllama:1.1b):
OLLAMA_MODEL=phi3:mini ./scripts/run-e2e-docker.sh
Or run directly via npm (inside tests/playwright):
cd tests/playwright && npm run test:e2e
# Override model:
OLLAMA_MODEL=phi3:mini npm run test:e2e
Requires Python 3.10+, Node.js 18+, git, and a running OpenWebUI instance. The script clones the sibling repos, sets up virtual environments, starts all services, and runs the full Playwright suite (service-integration + chat).
./scripts/run-e2e-local.sh
The same setup wizard prompts for provider, base URL, API key, and model on
first run; subsequent runs reuse the saved configuration in deploy/.env.
Services started locally (all stopped automatically when the script exits):
| Service | Port | Mode |
|---|---|---|
| BetterWebUI | 8765 | normal |
| CognitiveLoopKernel | 8001 | normal |
| AutoGUI | 8002 | dry-run (no real desktop actions) |
| OSScreenObserver | 5001 | mock (synthetic screen data) |
Sibling repos are cloned (or updated) as siblings of this directory:
parent/
├── betterwebui/ ← this repo
├── cognitiveloopkernel/
├── autogui/
└── osscreenobserver/
You need an LLM endpoint you can reach and (for most providers) an API key. The bundled setup wizard supports:
| Provider | Default URL | API key needed? |
|---|---|---|
| OpenWebUI | http://localhost:3000 | yes (Settings → Account → API Keys) |
| Ollama (direct) | http://localhost:11434 | no |
| OpenAI | https://api.openai.com/v1 | yes |
| Anthropic | https://api.anthropic.com/v1 | yes |
| Custom (OpenAI-compatible) | (you supply) | yes |
The wizard runs automatically the first time you launch — it picks a
provider, validates the connection, lets you pick a default model from
a scrollable list, and writes the result to deploy/.env. To re-run it
later, pass --reconfigure to scripts/setup_wizard.py or use the
Settings → Connection tab in the UI.
Choose whichever installation method suits you:
- Install Docker Desktop and start it.
- Open a terminal, navigate to the folder you cloned/downloaded, and run:
docker compose up
That's it. Docker builds and starts the app. Open http://localhost:8765 in your browser.
To stop it: press Ctrl-C in the terminal. To start again later: docker compose up.
Your data (conversations, workspaces, skills) is saved in the data/ folder next to the app, not inside Docker. You can back it up, share it, or delete it freely.
./start-mac.sh
Checks for Homebrew and offers to install it, then installs Python 3 and git via Homebrew if they are missing (with a Y/n prompt for each). On subsequent launches it skips straight to starting the services.
You need Python 3.10+, git, and curl available in your PATH.
./start.sh
The first launch pulls the three service git submodules, creates .venv folders for each,
installs all Python packages, and starts every service. Later launches skip setup steps
that are already complete. Services that were already running before the script launched
are left alone; only the services it started are stopped when you press Ctrl-C.
Port overrides: CLK_PORT (default 8001), AUTOGUI_PORT (default 8002),
OSSO_PORT (default 5001), PORT for BetterWebUI itself (default 8765).
You need Python 3.10+ and git in your PATH. Install from python.org / git-scm.com, or:
winget install Python.Python.3.12
winget install Git.Git
Then double-click start.bat, or in a terminal:
start.bat
start.bat checks for Python and git, pulls submodules, installs packages, and opens each
service in a minimised terminal window. When BetterWebUI exits the service windows are
closed automatically.
When the server is running, open http://127.0.0.1:8765 in your browser.
The Python launchers (start.sh / start-mac.sh / start.bat) run the
setup wizard automatically before booting any services — so on first
launch you'll be walked through the four prompts (provider menu → base
URL → API key → model picker) and the rest is configured for you. You
can return to Settings → Connection in the UI at any time to change
values without re-running the wizard.
If you have CLK, AutoGUI, or OSScreenObserver running, scroll to Settings → Services to enable/disable each one. (All three are enabled by default; they degrade gracefully if not reachable.)
Optional, only if you want to use MCP servers:
- Node.js (for npx-based servers like Filesystem, GitHub, Memory)
- uv (for uvx-based servers like Fetch, Git, Time)
BetterWebUI runs locally on your computer. When you click start.sh
or start.bat, the server starts on your machine. That means:
- Shell commands the assistant runs → execute on your computer
- Files you pick → stay on your computer
- Files the assistant generates → download to your Downloads folder
- The LLM endpoint you configured (OpenWebUI, Ollama, OpenAI, Anthropic, …) is the only remote piece — it only ever sees the messages and base64'd attachments you send
If you want to host BetterWebUI on a remote server and have shell commands still execute locally, that's a different architecture (a local bridge agent). It's not built in yet — let us know if you need it.
A workspace is a saved bundle of:
- A system prompt
- A subset of your skills
- A subset of your MCP servers
- A subset of your CLI shortcuts
- Persistent files (attached to every new chat in that workspace)
- A default model (optional)
Open the Workspaces tab → + New workspace to create one. Examples:
- Grading: prompt = "You are a grading assistant…", skills = grading-rubric, files = [syllabus.pdf, rubric.docx].
- Research: prompt = "You are a research assistant…", skills = research-citations, MCP = fetch, brave-search.
- Course prep: prompt = "Help me prepare lecture materials…", CLI shortcuts = pandoc, files = [course-notes.md].
Switch the active workspace from the dropdown at the top of the chat.
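Seen as data, a workspace is a small record. The sketch below models the bundle described above as a Python dataclass, using the Research example. Every field name here is hypothetical; the actual schema stored in data/workspaces.json is not documented in this README and may differ.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Workspace:
    # All field names are hypothetical, chosen to mirror the bullet list above.
    name: str
    system_prompt: str = ""
    skills: list[str] = field(default_factory=list)
    mcp_servers: list[str] = field(default_factory=list)
    cli_shortcuts: list[str] = field(default_factory=list)
    files: list[str] = field(default_factory=list)  # attached to every new chat
    default_model: Optional[str] = None  # optional, per the list above


research = Workspace(
    name="Research",
    system_prompt="You are a research assistant…",
    skills=["research-citations"],
    mcp_servers=["fetch", "brave-search"],
)
```

`asdict(research)` turns the record into a plain dictionary, which is convenient for serialising to JSON.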
Skills are markdown files in the skills/ folder. Three are included as
examples (rubric helper, citation helper, computer helper). You can:
- Click Skills in the sidebar → New skill to write one in the UI
- Or drop a .md file into the skills/ folder directly
Each skill is a frontmatter header plus a body:
---
name: My Skill
description: When the assistant should load this skill
---
When this skill is loaded, do these things…
The assistant sees a list of available skills and their descriptions. When
a user request matches one, the assistant calls load_skill to read the
full instructions and follow them.
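The skill format above is simple enough to read without a YAML library. The following sketch splits a skill file into its frontmatter fields and body. It assumes flat `key: value` lines only and is not BetterWebUI's own loader; the name `parse_skill` is illustrative.

```python
def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (frontmatter fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: treat everything as body
    try:
        # Index of the closing '---' line.
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return {}, text  # unterminated header: treat everything as body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not 'key: value'
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

The `description` field is the part the assistant would see in its skill list; the body is what a `load_skill` call would return in full.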
Click Tools → + Add from registry to install one of:
- Filesystem — read/write files in a chosen directory (needs Node.js)
- GitHub — repos, issues, PRs (needs Node.js + a GitHub PAT)
- Fetch — retrieve and parse web pages (needs Python + uv)
- Brave Search — web search (needs Node.js + a Brave API key)
- Memory — a persistent knowledge graph (needs Node.js)
- Git — read a local Git repo's history (needs Python + uv)
- Sequential Thinking — stepped reasoning (needs Node.js)
- Time — accurate time + timezone conversion (needs Python + uv)
Or + Custom to register a server you've written or found elsewhere.
If a server fails to start (most often: missing npx or uvx), the UI
shows the error in the server's row — fix the prerequisite, then click
the row to reconcile.
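Before installing a registry server, it can help to check that its launcher is on your PATH. The sketch below maps each server listed above to the command it most likely needs, assuming "needs Node.js" means `npx` and "needs Python + uv" means `uvx`. The function name `missing_launchers` is hypothetical.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumed launcher per registry server, inferred from the prerequisites listed above.
LAUNCHERS = {
    "Filesystem": "npx", "GitHub": "npx", "Brave Search": "npx",
    "Memory": "npx", "Sequential Thinking": "npx",
    "Fetch": "uvx", "Git": "uvx", "Time": "uvx",
}


def missing_launchers(
    servers: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, str]:
    """Map each server whose launcher is not on PATH to the missing command."""
    return {s: LAUNCHERS[s] for s in servers if which(LAUNCHERS[s]) is None}
```

Running `missing_launchers(LAUNCHERS)` on your own machine lists every registry server that would fail to start for this reason; an empty result means both launchers were found.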
Pre-registered command templates the assistant can invoke through
cli_call. Each invocation goes through the same approval dialog as a
raw shell command. The curated registry includes git, gh, pandoc,
ffmpeg, yt-dlp, sqlite3, ripgrep, curl. Add your own with
+ Custom — use {args} in the template as the placeholder for
arguments the assistant fills in.
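The `{args}` placeholder can be pictured as a simple substitution. The sketch below fills a template with shell-quoted arguments using `shlex.quote`. How BetterWebUI actually quotes or validates arguments is not described here, so treat the function `render_cli_template` as an illustration only.

```python
import shlex


def render_cli_template(template: str, args: list[str]) -> str:
    """Fill the {args} placeholder with shell-quoted arguments."""
    # shlex.quote targets POSIX shells (bash on macOS/Linux);
    # PowerShell on Windows would need different quoting.
    quoted = " ".join(shlex.quote(a) for a in args)
    return template.replace("{args}", quoted)
```

With the template `pandoc {args}` and the arguments `notes.md`, `-o`, and `my notes.pdf`, the filename containing a space is wrapped in single quotes, so the command shown in the approval dialog is the command the shell will parse.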
The assistant's responses render as proper markdown — headings, lists, tables, code blocks, links. Mathematics renders via KaTeX. The assistant is told it can use:
- $inline$ and $$display$$
- \(inline\) and \[display\]
Try asking it to derive something or explain a formula and the equations will typeset nicely.
Every action that touches your computer is gated:
- Shell commands show a dialog with the exact command and the assistant's stated reason. You approve or deny each one.
- File saves show the filename and a preview before downloading.
- File reads open a file picker — you choose what the assistant sees.
- File generation (image/audio), skill loading, and MCP tool calls run without prompting (they make no destructive changes).
- Shell execution can be turned off entirely in Settings.
betterwebui/
├── app.py # backend (FastAPI)
├── static/ # frontend (HTML/CSS/JS, no build step)
├── skills/ # your skills, as .md files
├── services/ # integration clients (CLK, AutoGUI, OSSO)
├── CognitiveLoopKernel/ # git submodule — CLK service
├── AutoGUI/ # git submodule — AutoGUI service
├── OSScreenObserver/ # git submodule — OSScreenObserver service
├── data/
│ ├── config.json # your settings (API key lives here)
│ ├── system_prompts.json
│ ├── conversations.json
│ ├── workspaces.json
│ ├── mcp_servers.json
│ ├── cli_tools.json
│ └── uploads/ # files you attached
└── start.sh / start-mac.sh / start.bat
The data/ folder is yours — back it up if you've written prompts,
workspaces, or conversations you care about.
Generated images/audio are NOT stored on the server — they stream directly to your browser, which downloads them and displays them inline using a temporary blob URL.
- "Cannot reach OpenWebUI" — check the URL and that OpenWebUI is actually running. Try opening it in another browser tab first.
- "No working API endpoint detected" — the URL probably points at a web page rather than the API. Try just the host root.
- Image generation fails — your OpenWebUI instance needs an image backend configured (Image Generation in OpenWebUI's admin settings).
- Audio generation fails — OpenWebUI needs TTS configured (Audio settings in admin).
- MCP server won't start — usually npx or uvx is missing. Install Node.js (https://nodejs.org/) or uv (https://docs.astral.sh/uv/), then reconcile from the Tools tab.
- Math doesn't render — check the browser console for KaTeX errors; the CDN may be blocked by a firewall.
MIT license. Use freely within your institution.