Local AI app and inference engine for agents. Run open-weight LLMs locally — private, on your machine.
Getting Started · Discord · X / Twitter · Bug Reports
Desktop
Mobile
or grab any build from atomic.chat · GitHub Releases — latest: v1.1.95
Atomic Chat is built by a small team and a handful of community contributors. Pull requests welcome — see CONTRIBUTING.md for how to get started.
Local models
- Run open-weight LLMs locally from HuggingFace — Llama, Gemma, Qwen, Mistral, Phi, and others
- Multi-Token Prediction (MTP) speculative decoding — 30–70% throughput boost on supported models, up to 3× on Gemma 4
- DFlash block-diffusion decoding — up to 6× faster on Qwen 3.6, Gemma 4, Kimi K2.5
- Flash Attention toggle (
on/off/auto) - Automatic reasoning-context tracking for chain-of-thought models
- Auto context-window expansion with overflow notifications
- EAGLE-3 speculative decoding for Gemma 4 on Apple Silicon (MLX)
- MTP on MLX for Qwen 3.5 / 3.6 and DeepSeek V4
- TurboQuant KV cache on MLX-VLM — smaller memory footprint via RHT-correct fast paths
Cloud models
- Built-in providers: OpenAI, Anthropic, Mistral, Groq, MiniMax, Qwen, Moonshot
- Bring your own key, switch model per chat, mix local and cloud freely
Tools & integrations
- One-click agent launch — launch OpenCode and GitHub Copilot CLI agents in one click from the Integrations tab
- Artifacts — live preview panel for HTML/CSS/JS code with copy, download and print
- Connect multiple MCP servers — bring your own tools, file access, web search
- In-app log viewer for MCP tool calls
- Custom assistants with per-assistant system prompts
- Projects with conversation tree view in the sidebar
Local API
- OpenAI-compatible server at
http://localhost:1337/v1— drop-in replacement for the OpenAI SDK - Works with any agent, CLI, or IDE plugin that speaks the OpenAI API
- Bound to
127.0.0.1by default; sethost: 0.0.0.0to expose on LAN
Privacy
- Everything runs locally when you want it to — local server is loopback-only by default
- Your conversations and keys stay on your machine
Three engines under the hood, all exposed through one OpenAI-compatible API at http://localhost:1337/v1:
- atomic-llama-cpp-turboquant — our
llama.cppfork with TurboQuant optimizations for faster quantized inference. Cross-platform (macOS, Windows, Linux), CPU and GPU. - Upstream llama.cpp — official
ggml-orgbuild, used on Windows by default for the widest hardware coverage and MTP support. - MLX-VLM — Apple Silicon-native engine for vision-language models, running on the Neural Engine and unified memory. Faster than llama.cpp on M-series chips for supported models.
Speculative-decoding features available across backends:
- MTP (Multi-Token Prediction) — a draft model predicts ahead, the full model verifies in one pass. Available on macOS and Windows.
- DFlash — block-diffusion speculative decoding for Qwen 3.6, Gemma 4, Kimi K2.5 and others. Apple Silicon only; can't be enabled together with MTP.
- Flash Attention — Settings →
on/off/auto.
Tools talking to http://localhost:1337/v1 don't need to know which backend is running underneath — switch engines without reconfiguring clients.
Atomic Chat runs an OpenAI-compatible server at http://localhost:1337/v1, so any agent, CLI, IDE plugin, or app that speaks the OpenAI API can run on top of your local models — no extra glue needed. Just point its base URL at Atomic Chat and you're done.
A few projects already ship first-class support with their own setup docs:
| Tool | What it is | Setup |
|---|---|---|
| OpenCode | Open-source TUI coding agent. Add Atomic Chat as a local provider in opencode.json. |
Setup guide → |
| OpenClaude | Open-source coding-agent CLI for cloud and local models. Lists Atomic Chat as a supported provider. | Providers list → |
| Goose | Open-source extensible AI agent (CLI, desktop, API). | Setup guide → |
| Hermes Desktop | Native desktop companion for Hermes Agent. Includes an Atomic Chat local preset at http://localhost:1337/v1. |
Repo → |
| Hermes Workspace | Local-first agent workspace built on Nous Research's Hermes. Uses Atomic Chat as its inference backend. | Repo → |
| nanobot | Ultra-lightweight personal AI agent with chat channels, MCP, and WebUI. | Repo → |
| nanoclaw | Containerized agent runtime that calls Atomic Chat as an MCP tool. | Skill guide → |
Built something that runs on Atomic Chat? Open a PR and we'll add it here.
- Node.js ≥ 20.0.0
- Yarn ≥ 4.5.3
- Make ≥ 3.81
- Rust (for Tauri)
- (Apple Silicon) MetalToolchain
xcodebuild -downloadComponent MetalToolchain
git clone https://github.com/AtomicBot-ai/Atomic-Chat
cd Atomic-Chat
make devThis handles everything: installs dependencies, builds core components, and launches the app.
Available make targets:
make dev— full development setup and launchmake build— production buildmake test— run tests and lintingmake clean— delete everything and start fresh
yarn install
yarn build:tauri:plugin:api
yarn build:core
yarn build:extensions
yarn dev- macOS: 13.6+ (8GB RAM for 3B models, 16GB for 7B, 32GB for 13B)
- Windows: 10/11 x64 (same RAM recommendations as macOS)
- Linux: x86_64, glibc ≥ 2.35 (Ubuntu 22.04+, Debian 12+, Fedora 40+, Arch, Mint, Pop!_OS — same RAM recommendations as macOS). Optional: a Vulkan loader (
vulkan-1package, ormesa-vulkan-drivers/ proprietary NVIDIA driver) for GPU acceleration. - iOS: 17+ (download from App Store)
- Android: download from Google Play
Atomic Chat ships as a single self-contained .AppImage — no installer, no root:
chmod +x Atomic.Chat_*_amd64.AppImage
./Atomic.Chat_*_amd64.AppImageIf prompted about FUSE on first launch: sudo apt install fuse libfuse2 (Debian/Ubuntu) or sudo dnf install fuse fuse-libs (Fedora). GPU acceleration (Vulkan) is auto-detected on first launch; only GGUF models run on Linux.
If something isn't working:
Apache 2.0 — see LICENSE for details.
Built on the shoulders of giants:
© 2026 Atomic Chat · Built with ❤️ · atomic.chat










