OpenAI-compatible LLM gateway with API key management, budget enforcement, and usage tracking.
The gateway sits between your applications and LLM providers so you can control access, cost, and observability in one place.
- OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings, /v1/models)
- Virtual API key management (/v1/keys) for safe client access
- User and budget controls (/v1/users, /v1/budgets)
- Usage and pricing tracking (/v1/messages, /v1/pricing)
- Health and metrics endpoints (/health, optional /metrics)
- Built-in tools the gateway runs itself: otari_code_execution (sandboxed Python REPL) and otari_web_search. See Built-in tools.
uv venv
source .venv/bin/activate
uv sync --dev
cp config.example.yml config.yml

Edit config.yml and set at least:
- master_key
- one provider credential in providers (for example openai.api_key)

uv run gateway serve --config config.yml

Open API docs at http://localhost:8000/docs.
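
The minimum settings listed above can be checked before starting the gateway. The following is a minimal sketch, not the gateway's own validation: it assumes config.yml has already been parsed into a dictionary, that master_key is a top-level key, and that providers maps a provider name to a mapping with an api_key entry (inferred from the openai.api_key example). The function name missing_settings is hypothetical.

```python
from typing import Any


def missing_settings(config: dict[str, Any]) -> list[str]:
    """Return the required settings that are absent or empty (hypothetical helper)."""
    missing = []
    if not config.get("master_key"):
        missing.append("master_key")

    # Assumed layout: providers -> <provider name> -> api_key.
    providers = config.get("providers") or {}
    has_credential = any(
        isinstance(settings, dict) and settings.get("api_key")
        for settings in providers.values()
    )
    if not has_credential:
        missing.append("providers.<name>.api_key")
    return missing


example = {
    "master_key": "change-me",
    "providers": {"openai": {"api_key": "sk-placeholder"}},
}
print(missing_settings(example))  # []
```

An empty list means both minimum settings are present; some providers may use credentials other than api_key, which this sketch does not cover.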
Platform mode is enabled automatically when OTARI_AI_TOKEN is set.
- Export platform env vars:
export OTARI_AI_TOKEN=gw_xxx
- Start the gateway:
uv run gateway serve --config config.yml

Notes:
- GATEWAY_MODE is optional; effective mode is derived from OTARI_AI_TOKEN.
- If you explicitly set GATEWAY_MODE=platform, startup fails unless OTARI_AI_TOKEN is also set.
- In platform mode, local providers configuration is not used.
- The gateway/platform wire contract (resolve and usage endpoints, request/response shapes, retry semantics) is documented in docs/platform-protocol.md.
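
The mode rules in the notes above can be read as a small decision function. This is an illustrative sketch rather than the gateway's actual startup code: resolve_mode is a hypothetical name, and "standalone" is an assumed label for the non-platform mode, which this README does not name.

```python
from collections.abc import Mapping


def resolve_mode(env: Mapping[str, str]) -> str:
    """Derive the effective mode from environment variables (hypothetical helper)."""
    token = env.get("OTARI_AI_TOKEN", "")
    requested = env.get("GATEWAY_MODE", "")

    # A token alone is enough to enable platform mode.
    if token:
        return "platform"

    # Explicitly asking for platform mode without a token is a startup error.
    if requested == "platform":
        raise RuntimeError("GATEWAY_MODE=platform requires OTARI_AI_TOKEN to be set")

    # "standalone" is an assumed name for the default, non-platform mode.
    return "standalone"


print(resolve_mode({"OTARI_AI_TOKEN": "gw_xxx"}))  # platform
print(resolve_mode({}))                            # standalone
```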
On startup, the gateway can bootstrap an API key in logs. Export it as GATEWAY_API_KEY, then call the gateway as an OpenAI-compatible server:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GATEWAY_API_KEY"],
    base_url="http://localhost:8000/v1",
)
response = client.chat.completions.create(
    model="openai:gpt-4o",
    messages=[{"role": "user", "content": "Hello from gateway"}],
)
print(response.choices[0].message.content)

Run with hot reload and .env:
cp .env.example .env
make dev

make test
make lint
make typecheck

Run a single test file:
uv run pytest tests/unit/test_gateway_cli.py -v

The gateway image is published on Docker Hub.
cp config.example.yml config.yml
docker compose up -d

docker run --rm \
  -p 8000:8000 \
  -v "$(pwd)/config.yml:/app/config.yml:ro" \
  mzdotai/otari:latest \
  gateway serve --config /app/config.yml

Gateway will be available at http://localhost:8000.
The gateway can run a couple of tools itself so any model — including
open-weight ones — gets parity with what frontier APIs expose as managed
tools. Both are opt-in via the request's tools array and run inside
docker-compose profiles so operators who don't use them don't pull extra
images.
These use dedicated otari_* tool types. The keyword decides who runs the
code: an otari_* type means the gateway runs it. Every other tool type — the
legacy gateway short forms (code_execution, web_search) and the
provider-native keywords (code_interpreter, code_execution_<date>,
web_search_<date>) — is passed through to the upstream provider untouched, so
the provider runs it in its own native sandbox/search. (In particular, the bare
code_execution / web_search short forms no longer trigger the gateway
sandbox — use the otari_* types for that.) Either way the gateway still
handles routing, observability, and billing.
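
The routing rule above comes down to a prefix check on the tool type. The sketch below illustrates that rule only; runs_on_gateway and split_tools are hypothetical names, not functions from the gateway codebase.

```python
from typing import Any


def runs_on_gateway(tool: dict[str, Any]) -> bool:
    """True when the tool type carries the otari_ prefix (hypothetical helper)."""
    return str(tool.get("type", "")).startswith("otari_")


def split_tools(tools: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate gateway-run tools from tools passed through to the provider."""
    gateway_tools = [t for t in tools if runs_on_gateway(t)]
    passthrough = [t for t in tools if not runs_on_gateway(t)]
    return gateway_tools, passthrough


tools = [
    {"type": "otari_code_execution"},
    {"type": "web_search"},        # legacy short form: passed through
    {"type": "code_interpreter"},  # provider-native: passed through
]
gateway_tools, passthrough = split_tools(tools)
print([t["type"] for t in gateway_tools])  # ['otari_code_execution']
print([t["type"] for t in passthrough])    # ['web_search', 'code_interpreter']
```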
{
  "model": "anthropic:claude-sonnet-4-6",
  "messages": [{"role": "user", "content": "Compute 23 factorial."}],
  "tools": [{"type": "otari_code_execution"}]
}

Bring up with docker compose --profile code-exec up. See demo/code-exec/
for a runnable walkthrough of both the gateway-managed and native-passthrough
flows.
{
  "model": "anthropic:claude-sonnet-4-6",
  "messages": [{"role": "user", "content": "What's the latest stable Python release?"}],
  "tools": [{"type": "otari_web_search"}]
}

Bring up with docker compose --profile web-search up. See demo/web-search/
for a runnable walkthrough.
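
For clients that do not use the OpenAI SDK, the request bodies above can be posted directly to /v1/chat/completions. The sketch below only builds the request with the standard library. It assumes the gateway accepts the key as a Bearer token (the header the OpenAI client sends) and that it listens on http://localhost:8000 as in the quickstart; build_tool_request is a hypothetical helper name.

```python
import json
import os
import urllib.request
from typing import Optional


def build_tool_request(
    prompt: str,
    tool_type: str,
    model: str = "anthropic:claude-sonnet-4-6",
    base_url: str = "http://localhost:8000/v1",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with one built-in tool."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{"type": tool_type}],
    }
    key = api_key or os.environ.get("GATEWAY_API_KEY", "")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_tool_request("Compute 23 factorial.", "otari_code_execution")

# To send it against a running gateway:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Swapping the tool type for otari_web_search produces the second request shown above.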
The bundled backend is a SearXNG metasearch container restricted to engines
that don't forbid automated querying (duckduckgo, mojeek, qwant, wikipedia)
— see scripts/searxng/settings.yml. Top results are fetched and content
is extracted via trafilatura in-process so the model sees LLM-ready
Markdown, not raw SERP snippets.
The free SearXNG engines rate-limit/CAPTCHA automated queries by IP, so they
can be flaky for sustained use. For commercial or production use, swap the
SearXNG container for a backend that uses a licensed API (Tavily, Brave Search
API, Exa, Linkup, Serper). WebSearchBackend is configured purely by URL
(GATEWAY_WEB_SEARCH_URL), so any HTTP service that exposes a
SearXNG-compatible /search?format=json endpoint is a drop-in replacement
— including thin adapters in front of commercial APIs. Adapters that
already extract content can pass it through on the optional
extracted_content result field to bypass the gateway-side extraction.
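
A thin adapter mostly has to reshape a commercial provider's results into the SearXNG JSON layout. The sketch below shows that reshaping step only. The url, title, and content result fields follow SearXNG's JSON output, and extracted_content is the optional field described above; the provider-side field names (snippet, full_text) and the function name to_searxng_response are hypothetical and will differ per provider.

```python
from typing import Any


def to_searxng_response(query: str, provider_results: list[dict[str, Any]]) -> dict[str, Any]:
    """Map provider results onto a SearXNG-style /search?format=json body."""
    results = []
    for item in provider_results:
        result = {
            "url": item["url"],
            "title": item.get("title", ""),
            # "snippet" is an assumed provider field name.
            "content": item.get("snippet", ""),
        }
        # Pass pre-extracted text through so the gateway can skip its own
        # extraction; "full_text" is an assumed provider field name.
        if item.get("full_text"):
            result["extracted_content"] = item["full_text"]
        results.append(result)
    return {"query": query, "results": results}


response_body = to_searxng_response(
    "python release",
    [
        {"url": "https://example.com/a", "title": "A", "snippet": "short", "full_text": "# A\nlong"},
        {"url": "https://example.com/b", "title": "B", "snippet": "brief"},
    ],
)
print(len(response_body["results"]))  # 2
```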
A ready-to-run Brave Search adapter ships in
scripts/web-search-brave-adapter/: set BRAVE_API_KEY and
GATEWAY_WEB_SEARCH_URL=http://brave-adapter:8080, then
docker compose --profile web-search-brave up -d --build brave-adapter gateway.
See that folder's README for details and how to adapt it to another provider.
Per-tool overrides (max_results, allowed_domains, blocked_domains,
purpose_hint) live on the tool entry; operator-level env knobs
(GATEWAY_WEB_SEARCH_ENGINES, GATEWAY_WEB_SEARCH_MAX_RESULTS,
GATEWAY_WEB_SEARCH_EXTRACT, GATEWAY_WEB_SEARCH_PURPOSE_HINT) live
alongside GATEWAY_WEB_SEARCH_URL.
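
As an illustration of the per-tool overrides, the sketch below builds a tool entry for the request's tools array. It assumes the overrides sit as top-level keys next to type, and that their values are an integer, lists of domain strings, and a string respectively; neither the placement nor the value types are confirmed here. web_search_tool is a hypothetical helper name.

```python
from typing import Any

# Override names taken from the paragraph above.
ALLOWED_OVERRIDES = {"max_results", "allowed_domains", "blocked_domains", "purpose_hint"}


def web_search_tool(**overrides: Any) -> dict[str, Any]:
    """Build an otari_web_search tool entry, dropping overrides left as None."""
    unknown = set(overrides) - ALLOWED_OVERRIDES
    if unknown:
        raise ValueError(f"unknown overrides: {sorted(unknown)}")
    tool: dict[str, Any] = {"type": "otari_web_search"}
    tool.update({name: value for name, value in overrides.items() if value is not None})
    return tool


tool = web_search_tool(max_results=3, allowed_domains=["python.org"])
print(tool)
# {'type': 'otari_web_search', 'max_results': 3, 'allowed_domains': ['python.org']}
```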
- GET /health
- POST /v1/chat/completions
- POST /v1/embeddings
- POST /v1/moderations
- GET /v1/models
- POST/GET /v1/keys
- POST/GET /v1/users
- POST/GET /v1/budgets
- GET /v1/messages
- GET /v1/pricing

Full schema: docs/public/openapi.json
uv run gateway init-db --config config.yml
uv run gateway migrate --config config.yml
uv run gateway migrate --config config.yml --revision <rev>
uv run python scripts/generate_openapi.py --check

Apache 2.0. See LICENSE.