
BillJr99/BetterWebUI


BetterWebUI

A friendlier front-end for OpenWebUI. Built for users who want the power of agentic AI — running commands, reading files, generating images and audio, calling MCP servers — without having to be a developer.

⚠ Experimental software — use in a sandbox environment only. BetterWebUI is a research prototype. It is not intended for, nor evaluated or deemed suitable for, any particular production use or critical workload. No warranty is provided, express or implied. Shell commands approved in the chat interface execute directly on your local machine; integrated services (CLK, AutoGUI, OSScreenObserver) may take real actions on your desktop. Run this software only in an isolated, sandboxed environment and review every command before approving it. By using this software you accept all associated risks.

Contributions, bug reports, and ideas are very welcome — feel free to open an issue or pull request!

What it does

  • Connects to your LLM provider of choice — OpenWebUI, Ollama (direct), OpenAI, Anthropic, or any OpenAI-compatible endpoint. A friendly setup wizard runs on first launch, picks defaults per provider, and validates the connection before saving.
  • Lets you pick from any model your provider knows about (scrollable, filterable picker — ↑↓ to navigate, type to filter).
  • Workspaces — bundle a system prompt, chosen skills, MCP servers, CLI shortcuts, and persistent files into a saved configuration you can return to. "Grading", "Research", "Course prep" — switch with one click.
  • Skills — short markdown briefs telling the assistant how to do specific tasks. Loaded on demand when a request matches.
  • System prompts — the assistant's role and tone.
  • MCP servers — extend the assistant with tools from a curated registry (Filesystem, GitHub, Fetch, Brave Search, Memory, Git, …) or your own custom servers.
  • CLI shortcuts — registered command-line tools (git, gh, pandoc, ffmpeg, …) the assistant knows are available.
  • Math + markdown rendering — prose, tables, code, and LaTeX ($...$, $$...$$, \(...\), \[...\]) all render properly via KaTeX.
  • Multimodal in — attach images and files to your messages.
  • Multimodal out — generated images and audio download to your computer automatically; nothing is left lying on the server.
  • Local file picker — when the assistant wants to read a file, you get a file picker. The assistant only sees what you choose to share.
  • Local shell execution — bash on macOS/Linux, PowerShell on Windows. Every command requires a one-click approval before it runs.

Service Integrations

BetterWebUI integrates with three external AI services via REST APIs, exposing them at /api/services/* endpoints that the LLM can call through tool use or slash commands.

| Service | Env var | Default URL | Purpose |
|---|---|---|---|
| CognitiveLoopKernel (CLK) | CLK_BASE_URL | http://localhost:8001 | Deep research loops & multi-step workflows |
| AutoGUI | AUTOGUI_BASE_URL | http://localhost:8002 | Desktop GUI automation via ReAct |
| OSScreenObserver (OSSO) | OSSO_BASE_URL | http://localhost:5001 | Screen reading & accessibility inspection |
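How BetterWebUI reads these variables internally isn't described here. As a minimal sketch, assuming each variable simply overrides its default when set and non-empty, resolving the three base URLs could look like this (`resolve_base_urls` is a hypothetical helper, not part of the project):

```python
import os
from typing import Dict, Mapping

# (env var, default URL) per service, taken from the table above.
DEFAULT_BASE_URLS = {
    "clk": ("CLK_BASE_URL", "http://localhost:8001"),
    "autogui": ("AUTOGUI_BASE_URL", "http://localhost:8002"),
    "osso": ("OSSO_BASE_URL", "http://localhost:5001"),
}


def resolve_base_urls(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return {service: base_url}, preferring a non-empty env var over the default."""
    urls = {}
    for service, (var, default) in DEFAULT_BASE_URLS.items():
        value = env.get(var, "").strip()
        # Drop any trailing slash so callers can append paths such as "/health".
        urls[service] = (value or default).rstrip("/")
    return urls
```

For example, `resolve_base_urls({"CLK_BASE_URL": "http://10.0.0.5:9000/"})` points CLK at `http://10.0.0.5:9000` and leaves the other two on their defaults.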

These services are included as git submodules (CognitiveLoopKernel/, AutoGUI/, OSScreenObserver/). Running start.sh automatically pulls the submodules and starts any service that is not already running; those services are stopped automatically when the script exits. Override the ports with the CLK_PORT, AUTOGUI_PORT, and OSSO_PORT environment variables.

Enable / disable services

Each service can be toggled on or off independently from Settings → Services (or via the API). While a service is disabled, every one of its routes immediately returns HTTP 503, and the LLM is told the service is unavailable. Re-enabling it restores normal operation without a restart.

| Method | Path | Purpose |
|---|---|---|
| GET | /api/services/status | Current enabled/disabled state for all services |
| POST | /api/services/{name}/enable | Enable a service (clk, autogui, osso) |
| POST | /api/services/{name}/disable | Disable a service |
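If you'd rather script the toggle than click through Settings, a small stdlib client is enough. This is a sketch that assumes the endpoints accept an empty POST body and need no authentication (neither is stated above); the helper names are hypothetical. The functions only build requests, so you decide when to send them:

```python
import urllib.request

SERVICES = {"clk", "autogui", "osso"}


def build_toggle_request(base_url: str, name: str, enable: bool) -> urllib.request.Request:
    """Build (but do not send) POST /api/services/{name}/enable or /disable."""
    if name not in SERVICES:
        raise ValueError(f"unknown service {name!r}; expected one of {sorted(SERVICES)}")
    action = "enable" if enable else "disable"
    url = f"{base_url.rstrip('/')}/api/services/{name}/{action}"
    return urllib.request.Request(url, data=b"", method="POST")


def build_status_request(base_url: str) -> urllib.request.Request:
    """Build GET /api/services/status."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/api/services/status", method="GET")
```

To actually disable OSSO on a local install: `urllib.request.urlopen(build_toggle_request("http://127.0.0.1:8765", "osso", enable=False))`. The shape of the JSON returned by the status endpoint isn't documented here, so inspect it before relying on specific fields.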

Graceful degradation

When an enabled service is not running or unreachable, BetterWebUI returns a descriptive HTTP 503 response rather than crashing. The LLM receives the error message and relays it to the user.
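The behaviour above can be pictured as a proxy that turns connection failures into a descriptive 503 instead of an unhandled exception. This is a minimal sketch of that pattern using only the standard library; BetterWebUI's real FastAPI handlers may differ, and `proxy_get` plus the injectable `fetch` parameter are hypothetical (the latter exists so the failure path can be tested without a network):

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Tuple

# A fetcher takes a URL and returns the raw response body.
Fetcher = Callable[[str], bytes]


def default_fetch(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read()


def proxy_get(service: str, url: str, fetch: Fetcher = default_fetch) -> Tuple[int, dict]:
    """Forward a GET to a service and return (status, JSON body)."""
    try:
        body = fetch(url)
    except urllib.error.HTTPError as exc:
        # The service answered, but with an error status: pass that status through.
        return exc.code, {"error": f"{service} returned HTTP {exc.code}"}
    except (urllib.error.URLError, TimeoutError, ConnectionError) as exc:
        # The service is down or unreachable: report it instead of crashing.
        reason = getattr(exc, "reason", exc)
        return 503, {"error": f"{service} is unreachable at {url}: {reason}"}
    return 200, json.loads(body)
```

Catching `HTTPError` first matters because it is a subclass of `URLError`; without that ordering, an upstream 404 would be misreported as "unreachable".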

Approval flow

Tool calls that trigger side effects require a one-click approval from the user in the chat interface before the action executes:

  • clk_research — shows the workflow and command for approval
  • autogui_task — shows the task description for approval
  • screen_action — shows the action type and coordinates for approval

Read-only operations (screen_windows, screen_description, screen_screenshot) run without an approval prompt.
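The split between prompted and unprompted tools can be expressed as a single decision function. The tool names below come from the lists above; the argument keys (`workflow`, `command`, `task`, `action`, `x`, `y`) are assumptions for illustration, and treating any unlisted tool as "ask first" is a conservative choice made for this sketch, not documented behaviour:

```python
from typing import Optional

READ_ONLY = {"screen_windows", "screen_description", "screen_screenshot"}


def approval_prompt(tool: str, args: dict) -> Optional[str]:
    """Return the text for the approval dialog, or None if the tool runs unprompted."""
    if tool in READ_ONLY:
        return None
    if tool == "clk_research":
        return f"Run CLK workflow {args.get('workflow')!r}: {args.get('command')}"
    if tool == "autogui_task":
        return f"AutoGUI task: {args.get('task')}"
    if tool == "screen_action":
        return f"Screen action {args.get('action')!r} at ({args.get('x')}, {args.get('y')})"
    # Unknown tools: show the raw arguments and ask, rather than run silently.
    return f"{tool}: {args}"
```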

Integrated endpoints

| Method | Path | Service |
|---|---|---|
| GET | /api/services/health | All (aggregated health check) |
| GET | /api/services/status | All (enable/disable state) |
| POST | /api/services/{name}/enable | All |
| POST | /api/services/{name}/disable | All |
| GET | /api/services/clk/workflows | CLK |
| POST | /api/services/clk/research | CLK |
| GET | /api/services/clk/research/{id} | CLK |
| GET | /api/services/clk/research/{id}/stream | CLK (SSE) |
| GET | /api/services/clk/research/{id}/artifacts | CLK |
| POST | /api/services/clk/research/{id}/cancel | CLK |
| POST | /api/services/autogui/task | AutoGUI |
| GET | /api/services/autogui/task/{id} | AutoGUI |
| GET | /api/services/autogui/task/{id}/stream | AutoGUI (SSE) |
| POST | /api/services/autogui/task/{id}/cancel | AutoGUI |
| GET | /api/services/autogui/tools | AutoGUI |
| GET | /api/services/osso/windows | OSSO |
| GET | /api/services/osso/description | OSSO |
| GET | /api/services/osso/structure | OSSO |
| GET | /api/services/osso/screenshot | OSSO |
| POST | /api/services/osso/action | OSSO |
| GET | /api/services/osso/capabilities | OSSO |
| GET | /api/services/tools | All (LLM tool specs) |
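The two `/stream` endpoints use Server-Sent Events. The event names and payloads that CLK and AutoGUI emit aren't documented here, so the following is a generic sketch of an SSE line parser following the WHATWG rules (`data:` lines are joined with newlines, a blank line dispatches the event, lines starting with `:` are comments). `parse_sse` and `SSEEvent` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class SSEEvent:
    event: str
    data: str
    id: Optional[str] = None


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from decoded SSE lines (trailing newlines already removed)."""
    event_type, data, event_id = "", [], None  # type: str, List[str], Optional[str]
    for line in lines:
        if line == "":
            # Blank line: dispatch the buffered event if any data was collected.
            if data:
                yield SSEEvent(event_type or "message", "\n".join(data), event_id)
            event_type, data = "", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips exactly one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            event_id = value  # persists across events, like the spec's last-event-ID
        # "retry" and unknown fields are ignored in this sketch.
```

To consume a live stream, decode each line of an `urllib.request.urlopen(...)` response with `.decode("utf-8").rstrip("\r\n")` and feed the resulting generator to `parse_sse`.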

Slash commands

When typing in the chat, prefix your message with a slash command to route directly to a service:

  • /research <topic> — starts a CLK research workflow
  • /observe — returns a description of the current screen via OSSO
  • /automate <task> — sends a GUI automation task to AutoGUI (dry-run by default)
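Routing a message by its slash command amounts to looking at the first word. Here is a minimal sketch of that routing; how BetterWebUI treats unknown slash commands isn't stated, so this version assumes they fall through as ordinary chat messages. `parse_slash` and `SlashCommand` are hypothetical:

```python
from typing import NamedTuple, Optional

# Slash command -> service, from the list above.
SLASH_COMMANDS = {"/research": "clk", "/observe": "osso", "/automate": "autogui"}
NEEDS_ARGUMENT = {"/research", "/automate"}


class SlashCommand(NamedTuple):
    service: str
    argument: str


def parse_slash(message: str) -> Optional[SlashCommand]:
    """Return (service, argument) for a routed command, or None for normal chat."""
    text = message.strip()
    if not text.startswith("/"):
        return None
    parts = text.split(maxsplit=1)
    command = parts[0].lower()
    service = SLASH_COMMANDS.get(command)
    if service is None:
        return None  # unknown command: treat as an ordinary message
    argument = parts[1].strip() if len(parts) > 1 else ""
    if command in NEEDS_ARGUMENT and not argument:
        raise ValueError(f"{command} needs an argument")
    return SlashCommand(service, argument)
```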

Deployment

See deploy/README.md for the full integration deployment guide, including Docker Compose configuration and the bootstrap.sh script for cloning sibling repositories.

Running the test suite

Unit + service-integration tests (no external dependencies)

pip install -r requirements.txt
pytest tests/ --ignore=tests/playwright

Everything — unified runner (recommended)

scripts/run-all-tests.sh is the single entry point. It drives the same setup wizard the launchers use, then runs (in order) pytest, the existing Playwright integration suite, the comprehensive browser-driven UI suite (~155 tests, 55 spec files), and the curl smoke tests.

./scripts/run-all-tests.sh

Useful flags:

| Flag | What it does |
|---|---|
| --no-wizard | Skip the wizard; assume env is already set (CI mode) |
| --reconfigure | Force re-prompt for provider / URL / key / model |
| --docker | Bring up deploy/docker-compose.e2e.yml (Ollama + OpenWebUI) and tear it down on exit |
| --docker-compose <file> | Tear down the given compose stack on exit (assume it's already up) |
| --skip-python / --skip-playwright / --skip-ui / --skip-smoke | Skip the corresponding stage |
| --keep-going | Don't fail fast; run every stage even if an earlier one fails |
| -- <args> | Pass remaining args to playwright test (e.g. -- --grep settings) |

The runner owns the lifecycle of any docker stack it uses: the cleanup trap runs docker compose down -v --remove-orphans on EXIT/INT/TERM, guaranteeing teardown even when tests fail or the script is interrupted.

End-to-end tests — Docker (Ollama + OpenWebUI, fully self-contained)

Requires Docker Desktop and Node.js 18+. The script pulls the model on first run, starts the full stack, runs all tests, and tears everything down.

./scripts/run-e2e-docker.sh

# Override the model (default: tinyllama:1.1b):
OLLAMA_MODEL=phi3:mini ./scripts/run-e2e-docker.sh

Or run directly via npm (inside tests/playwright):

cd tests/playwright && npm run test:e2e
# Override model:
OLLAMA_MODEL=phi3:mini npm run test:e2e

End-to-end tests — local (your own OpenWebUI, no Docker)

Requires Python 3.10+, Node.js 18+, git, and a running OpenWebUI instance. The script clones the sibling repos, sets up virtual environments, starts all services, and runs the full Playwright suite (service-integration + chat).

./scripts/run-e2e-local.sh

The same setup wizard prompts for provider, base URL, API key, and model on first run; subsequent runs reuse the saved configuration in deploy/.env.

Services started locally (all stopped automatically when the script exits):

| Service | Port | Mode |
|---|---|---|
| BetterWebUI | 8765 | normal |
| CognitiveLoopKernel | 8001 | normal |
| AutoGUI | 8002 | dry-run (no real desktop actions) |
| OSScreenObserver | 5001 | mock (synthetic screen data) |

Sibling repos are cloned (or updated) as siblings of this directory:

parent/
├── betterwebui/          ← this repo
├── cognitiveloopkernel/
├── autogui/
└── osscreenobserver/

First-time setup

You need an LLM endpoint you can reach and (for most providers) an API key. The bundled setup wizard supports:

| Provider | Default URL | API key needed? |
|---|---|---|
| OpenWebUI | http://localhost:3000 | yes (Settings → Account → API Keys) |
| Ollama (direct) | http://localhost:11434 | no |
| OpenAI | https://api.openai.com/v1 | yes |
| Anthropic | https://api.anthropic.com/v1 | yes |
| Custom (OpenAI-compatible) | (you supply) | yes |

The wizard runs automatically the first time you launch — it picks a provider, validates the connection, lets you pick a default model from a scrollable list, and writes the result to deploy/.env. To re-run it later, pass --reconfigure to scripts/setup_wizard.py or use the Settings → Connection tab in the UI.
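To make the wizard's output concrete, here is a sketch of turning the provider table into a `deploy/.env` file and reading it back. The variable names (`LLM_PROVIDER`, `LLM_BASE_URL`, `LLM_MODEL`, `LLM_API_KEY`) are invented for illustration; check your own `deploy/.env` for the names the real wizard writes. This sketch also only checks inputs; the real wizard additionally validates the connection:

```python
from typing import Dict, Optional

# (default base URL, API key required?) per provider, from the table above.
PROVIDERS = {
    "openwebui": ("http://localhost:3000", True),
    "ollama": ("http://localhost:11434", False),
    "openai": ("https://api.openai.com/v1", True),
    "anthropic": ("https://api.anthropic.com/v1", True),
    "custom": (None, True),
}


def render_env(provider: str, model: str, base_url: Optional[str] = None, api_key: str = "") -> str:
    """Render .env text; variable names here are hypothetical."""
    default_url, needs_key = PROVIDERS[provider]
    url = base_url or default_url
    if not url:
        raise ValueError("the custom provider needs an explicit base URL")
    if needs_key and not api_key:
        raise ValueError(f"{provider} requires an API key")
    lines = [f"LLM_PROVIDER={provider}", f"LLM_BASE_URL={url}", f"LLM_MODEL={model}"]
    if api_key:
        lines.append(f"LLM_API_KEY={api_key}")
    return "\n".join(lines) + "\n"


def parse_env(text: str) -> Dict[str, str]:
    """Read simple KEY=VALUE lines, skipping blanks and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs
```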

Choose whichever installation method suits you:


Option A — Docker (recommended, no Python needed)

  1. Install Docker Desktop and start it.
  2. Open a terminal, navigate to the folder you cloned/downloaded, and run:
docker compose up

That's it. Docker builds and starts the app. Open http://localhost:8765 in your browser.

To stop it: press Ctrl-C in the terminal. To start again later: docker compose up.

Your data (conversations, workspaces, skills) is saved in the data/ folder next to the app, not inside Docker. You can back it up, share it, or delete it freely.


Option B — Python (macOS)

./start-mac.sh

The script checks for Homebrew and offers to install it, then installs Python 3 and git via Homebrew if they are missing (with a Y/n prompt for each). On subsequent launches it skips straight to starting the services.

Option C — Python (Linux / generic Unix)

You need Python 3.10+, git, and curl available in your PATH.

./start.sh

Options B & C — what the script does

The first launch pulls the three service git submodules, creates .venv folders for each, installs all Python packages, and starts every service. Later launches skip setup steps that are already complete. Services that were already running before the script launched are left alone; only the services it started are stopped when you press Ctrl-C.

Port overrides: CLK_PORT (default 8001), AUTOGUI_PORT (default 8002), OSSO_PORT (default 5001), PORT for BetterWebUI itself (default 8765).

Option D — Python (Windows)

You need Python 3.10+ and git in your PATH. Install from python.org / git-scm.com, or:

winget install Python.Python.3.12
winget install Git.Git

Then double-click start.bat, or in a terminal:

start.bat

start.bat checks for Python and git, pulls submodules, installs packages, and opens each service in a minimised terminal window. When BetterWebUI exits, the service windows are closed automatically.


When the server is running, open http://127.0.0.1:8765 in your browser.

Configure on first run

The Python launchers (start.sh / start-mac.sh / start.bat) run the setup wizard automatically before booting any services — so on first launch you'll be walked through the four prompts (provider menu → base URL → API key → model picker) and the rest is configured for you. You can return to Settings → Connection in the UI at any time to change values without re-running the wizard.

If you have CLK, AutoGUI, or OSScreenObserver running, scroll to Settings → Services to enable/disable each one. (All three are enabled by default; they degrade gracefully if not reachable.)

Optional, only if you want to use MCP servers:

  • Node.js (for npx-based servers like Filesystem, GitHub, Memory)
  • uv (for uvx-based servers like Fetch, Git, Time)

Where things run

BetterWebUI runs locally on your computer. When you run start.sh, start-mac.sh, or start.bat, the server starts on your machine. That means:

  • Shell commands the assistant runs → execute on your computer
  • Files you pick → stay on your computer
  • Files the assistant generates → download to your Downloads folder
  • The LLM endpoint you configured (OpenWebUI, Ollama, OpenAI, Anthropic, …) is the only remote piece; it only ever sees the messages and base64-encoded attachments you send

If you want to host BetterWebUI on a remote server and have shell commands still execute locally, that's a different architecture (a local bridge agent). It's not built in yet — let us know if you need it.

Workspaces

A workspace is a saved bundle of:

  • A system prompt
  • A subset of your skills
  • A subset of your MCP servers
  • A subset of your CLI shortcuts
  • Persistent files (attached to every new chat in that workspace)
  • A default model (optional)

Open the Workspaces tab → + New workspace to create one. Examples:

  • Grading: prompt = "You are a grading assistant…", skills = grading-rubric, files = [syllabus.pdf, rubric.docx].
  • Research: prompt = "You are a research assistant…", skills = research-citations, MCP = fetch, brave-search.
  • Course prep: prompt = "Help me prepare lecture materials…", CLI shortcuts = pandoc, files = [course-notes.md].
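A workspace is essentially a record with one field per item in the list above. The sketch below models it as a dataclass and shows what a new chat in that workspace would start with; the field names are illustrative and are not the actual schema of `data/workspaces.json`:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workspace:
    """Hypothetical model of a workspace; not the real workspaces.json schema."""
    name: str
    system_prompt: str
    skills: List[str] = field(default_factory=list)
    mcp_servers: List[str] = field(default_factory=list)
    cli_shortcuts: List[str] = field(default_factory=list)
    files: List[str] = field(default_factory=list)  # attached to every new chat
    default_model: Optional[str] = None  # optional; falls back to the global default

    def new_chat_context(self, fallback_model: str) -> dict:
        """What a fresh chat in this workspace starts with."""
        return {
            "system_prompt": self.system_prompt,
            "attachments": list(self.files),
            "model": self.default_model or fallback_model,
            "tools": {"skills": self.skills, "mcp": self.mcp_servers, "cli": self.cli_shortcuts},
        }


# The "Grading" example from above (prompt text elided as in the original).
grading = Workspace(
    "Grading",
    "You are a grading assistant…",
    skills=["grading-rubric"],
    files=["syllabus.pdf", "rubric.docx"],
)
```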

Switch the active workspace from the dropdown at the top of the chat.

Skills

Skills are markdown files in the skills/ folder. Three are included as examples (rubric helper, citation helper, computer helper). You can:

  • Click Skills in the sidebar → New skill to write one in the UI
  • Or drop a .md file into the skills/ folder directly

Each skill is a frontmatter header plus a body:

---
name: My Skill
description: When the assistant should load this skill
---

When this skill is loaded, do these things…

The assistant sees a list of available skills and their descriptions. When a user request matches one, the assistant calls load_skill to read the full instructions and follow them.
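Here is a sketch of that two-step pattern: a catalog of names and descriptions that the assistant sees up front, and the full body that `load_skill` would return on demand. This minimal parser handles only simple `key: value` frontmatter like the example above; the real loader may use a full YAML parser, and `parse_skill` and `skill_catalog` are hypothetical names:

```python
from pathlib import Path
from typing import Dict, Tuple


def parse_skill(text: str) -> Tuple[Dict[str, str], str]:
    """Split a skill file into (frontmatter fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill must start with a '---' frontmatter line")
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            return meta, body
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("frontmatter is not closed with '---'")


def skill_catalog(skills_dir: Path) -> Dict[str, str]:
    """Name -> description for every .md file: the short list shown to the assistant."""
    catalog = {}
    for path in sorted(skills_dir.glob("*.md")):
        meta, _ = parse_skill(path.read_text(encoding="utf-8"))
        catalog[meta.get("name", path.stem)] = meta.get("description", "")
    return catalog
```

Keeping only the descriptions in the always-visible list keeps the prompt short; the body is read only when a request actually matches.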

MCP servers

Click Tools → + Add from registry to install one of:

  • Filesystem — read/write files in a chosen directory (needs Node.js)
  • GitHub — repos, issues, PRs (needs Node.js + a GitHub PAT)
  • Fetch — retrieve and parse web pages (needs Python + uv)
  • Brave Search — web search (needs Node.js + a Brave API key)
  • Memory — a persistent knowledge graph (needs Node.js)
  • Git — read a local Git repo's history (needs Python + uv)
  • Sequential Thinking — stepped reasoning (needs Node.js)
  • Time — accurate time + timezone conversion (needs Python + uv)

Or + Custom to register a server you've written or found elsewhere.

If a server fails to start (most often: missing npx or uvx), the UI shows the error in the server's row — fix the prerequisite, then click the row to reconcile.

CLI shortcuts

Pre-registered command templates the assistant can invoke through cli_call. Each invocation goes through the same approval dialog as a raw shell command. The curated registry includes git, gh, pandoc, ffmpeg, yt-dlp, sqlite3, ripgrep, and curl. Add your own with + Custom, using {args} in the template as the placeholder for the arguments the assistant fills in.
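How BetterWebUI substitutes `{args}` isn't spelled out here. One safe way to do it, shown as a sketch under that caveat, is to shell-quote each argument before substitution and then split the result into an argv list so that it can run without a shell. `render_cli` is hypothetical, and the quoting is POSIX-style, so PowerShell would need different handling:

```python
import shlex
from typing import List


def render_cli(template: str, args: List[str]) -> List[str]:
    """Fill a shortcut template's {args} placeholder and return an argv list."""
    if "{args}" not in template:
        raise ValueError("template has no {args} placeholder")
    # shlex.join quotes each argument, so "fix; rm -rf ~" stays one literal
    # argument instead of becoming a second command.
    command_line = template.replace("{args}", shlex.join(args))
    # Splitting back into argv lets the caller avoid shell=True entirely.
    return shlex.split(command_line)
```

For example, `render_cli("pandoc {args}", ["notes.md", "-o", "notes.pdf"])` gives `["pandoc", "notes.md", "-o", "notes.pdf"]`, which is also a readable string to show in the approval dialog via `shlex.join`.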

Math + markdown

The assistant's responses render as proper markdown — headings, lists, tables, code blocks, links. Mathematics renders via KaTeX. The assistant is told it can use:

  • $inline$ and $$display$$
  • \(inline\) and \[display\]

Try asking it to derive something or explain a formula and the equations will typeset nicely.

Safety

Every action that touches your computer is gated:

  • Shell commands show a dialog with the exact command and the assistant's stated reason. You approve or deny each one.
  • File saves show the filename and a preview before downloading.
  • File reads open a file picker — you choose what the assistant sees.
  • File generation (image/audio), skill loading, and MCP tool calls run without prompting, since they are not expected to do anything destructive.
  • Shell execution can be turned off entirely in Settings.
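The shell gate combines two checks: the Settings switch and a per-command approval. The sketch below wires both around `subprocess.run`. It assumes an argv list, while the real app runs bash or PowerShell command strings, and `run_gated` and `ask_on_console` are hypothetical stand-ins for the backend handler and the approval dialog:

```python
import shlex
import subprocess
from typing import Callable, List

# approve(argv, reason) returns True to run the command, False to deny it.
Approver = Callable[[List[str], str], bool]


def run_gated(command: List[str], reason: str, approve: Approver, shell_enabled: bool = True) -> dict:
    """Run `command` only if shell execution is enabled and the user approves it."""
    if not shell_enabled:
        return {"status": "disabled", "output": "Shell execution is turned off in Settings."}
    if not approve(command, reason):
        return {"status": "denied", "output": ""}
    proc = subprocess.run(command, capture_output=True, text=True, timeout=60)
    return {"status": "ok", "exit_code": proc.returncode, "output": proc.stdout + proc.stderr}


def ask_on_console(command: List[str], reason: str) -> bool:
    """Terminal stand-in for the approval dialog: show the exact command and reason."""
    print(f"The assistant wants to run: {shlex.join(command)}")
    print(f"Reason: {reason}")
    return input("Approve? [y/N] ").strip().lower() == "y"
```

For example, `run_gated(["git", "status"], "check for uncommitted work", ask_on_console)` prints the command and reason, then runs it only after you type `y`. Denial is the default, matching the one-click-approve design.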

Where things live

betterwebui/
├── app.py                    # backend (FastAPI)
├── static/                   # frontend (HTML/CSS/JS, no build step)
├── skills/                   # your skills, as .md files
├── services/                 # integration clients (CLK, AutoGUI, OSSO)
├── CognitiveLoopKernel/      # git submodule — CLK service
├── AutoGUI/                  # git submodule — AutoGUI service
├── OSScreenObserver/         # git submodule — OSScreenObserver service
├── data/
│   ├── config.json               # your settings (API key lives here)
│   ├── system_prompts.json
│   ├── conversations.json
│   ├── workspaces.json
│   ├── mcp_servers.json
│   ├── cli_tools.json
│   └── uploads/                  # files you attached
└── start.sh / start-mac.sh / start.bat

The data/ folder is yours — back it up if you've written prompts, workspaces, or conversations you care about.

Generated images/audio are NOT stored on the server — they stream directly to your browser, which downloads them and displays them inline using a temporary blob URL.

Troubleshooting

  • "Cannot reach OpenWebUI" — check the URL and that OpenWebUI is actually running. Try opening it in another browser tab first.
  • "No working API endpoint detected" — the URL probably points at a web page rather than the API. Try just the host root.
  • Image generation fails — your OpenWebUI instance needs an image backend configured (Image Generation in OpenWebUI's admin settings).
  • Audio generation fails — OpenWebUI needs TTS configured (Audio settings in admin).
  • MCP server won't start — usually npx or uvx is missing. Install Node.js (https://nodejs.org/) or uv (https://docs.astral.sh/uv/), then reconcile from the Tools tab.
  • Math doesn't render — check the browser console for KaTeX errors; CDN may be blocked by a firewall.

License

MIT license. Use freely within your institution.

About

Friendlier OpenWebUI front-end with cowork and code style functionality for a general audience: workspaces bundle system prompts, skills, MCP servers, and persistent files into one-click configurations; local shell runs with approval dialogs; multimodal in/out; and LaTeX math rendering via KaTeX. No developer experience required.
