**12+ LLM Providers** · **60-80% Cost Reduction** · **699 Tests Passing** · **0 Code Changes Required**
AI coding tools lock you into one provider. Claude Code requires Anthropic. Codex requires OpenAI. You can't use your company's Databricks endpoint, your local Ollama models, or your AWS Bedrock account — at least, not without Lynkr.
The real costs:
- Anthropic API at $15/MTok output adds up fast for daily coding
- No way to use free local models (Ollama, llama.cpp) with Claude Code
- Enterprise teams can't route through their own cloud infrastructure
- Provider outages take your entire workflow down
Lynkr is a self-hosted proxy that sits between your AI coding tools and any LLM provider. One environment variable change, and your tools work with any model.
```
Claude Code / Cursor / Codex / Cline / Continue / Vercel AI SDK
                              |
                            Lynkr
                              |
Ollama | Bedrock | Databricks | OpenRouter | Azure | OpenAI | llama.cpp
```
```shell
# That's it. Three lines.
npm install -g lynkr
export ANTHROPIC_BASE_URL=http://localhost:8081
lynkr start
```

Optional: install `pino-pretty` alongside for human-readable logs:

```shell
npm install -g pino-pretty && npm install -g lynkr
```

### Free & Local (Ollama)

```shell
export MODEL_PROVIDER=ollama
export OLLAMA_MODEL=qwen2.5-coder:latest
lynkr start
```

### AWS Bedrock (100+ models)
```shell
export MODEL_PROVIDER=bedrock
export AWS_BEDROCK_API_KEY=your-key
export AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
lynkr start
```

### OpenRouter (cheapest cloud)
```shell
export MODEL_PROVIDER=openrouter
export OPENROUTER_API_KEY=sk-or-v1-your-key
lynkr start
```

### Claude Code
```shell
export ANTHROPIC_BASE_URL=http://localhost:8081
export ANTHROPIC_API_KEY=dummy
claude "Your prompt here"
```

### Codex CLI

Edit `~/.codex/config.toml`:
```toml
model_provider = "lynkr"
model = "gpt-4o"

[model_providers.lynkr]
name = "Lynkr Proxy"
base_url = "http://localhost:8081/v1"
wire_api = "responses"
```

### Cursor IDE
- Settings > Features > Models
- Base URL: `http://localhost:8081/v1`
- API Key: `sk-lynkr`

### Vercel AI SDK
```typescript
import { generateText } from "ai";
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";

const lynkr = createOpenAICompatible({
  baseURL: "http://localhost:8081/v1",
  name: "lynkr",
  apiKey: "sk-lynkr",
});

const { text } = await generateText({
  model: lynkr.chatModel("auto"),
  prompt: "Hello!",
});
```

Works with any OpenAI-compatible client: Cline, Continue.dev, ClawdBot, KiloCode, and more.
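Under the hood, every one of those clients is making the same kind of request. As a quick sanity check, you can hit the proxy directly with curl — this assumes Lynkr is running locally on port 8081 and exposes the standard OpenAI paths under `/v1`, as the client configs above imply; the model name `auto` and the `sk-lynkr` key mirror the examples above:

```shell
# Plain OpenAI-style chat completion sent straight to the local Lynkr proxy.
# Requires `lynkr start` to already be running on port 8081.
curl -s http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-lynkr" \
  -d '{
        "model": "auto",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```

If this returns a chat completion, any OpenAI-compatible tool pointed at the same base URL will work too.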
| Provider | Type | Models | Cost |
|---|---|---|---|
| Ollama | Local | Unlimited (free, offline) | Free |
| llama.cpp | Local | Any GGUF model | Free |
| LM Studio | Local | Local models with GUI | Free |
| MLX Server | Local | Apple Silicon optimized | Free |
| AWS Bedrock | Cloud | 100+ (Claude, Llama, Mistral, Titan) | $$ |
| OpenRouter | Cloud | 100+ (GPT, Claude, Llama, Gemini) | $ |
| Databricks | Cloud | Claude Sonnet 4.5, Opus 4.6 | $$$ |
| Azure OpenAI | Cloud | GPT-4o, o1, o3 | $$$ |
| Azure Anthropic | Cloud | Claude models | $$$ |
| OpenAI | Cloud | GPT-4o, o3, o4-mini | $$$ |
| Google Vertex | Cloud | Gemini 2.5 Pro/Flash | $$$ |
| Moonshot AI | Cloud | Kimi K2 Thinking/Turbo | $$ |
| Z.AI | Cloud | GLM-4.7 | $$ |
| DeepSeek | Cloud | DeepSeek Reasoner, R1 | $ |
4 local providers for 100% offline, free usage. 10+ cloud providers for scale.
| Feature | Lynkr | LiteLLM (42K stars) | OpenRouter | PortKey |
|---|---|---|---|---|
| Setup | `npm install -g lynkr` | Python + Docker + Postgres | Account signup | Docker + config |
| Claude Code support | Drop-in, native | Requires config | No CLI support | Requires config |
| Cursor support | Drop-in, native | Partial | Via API key | Partial |
| Codex CLI support | Drop-in, native | No | No | No |
| Built for coding tools | Yes (purpose-built) | No (general gateway) | No (general API) | No (general gateway) |
| Local models | Ollama, llama.cpp, LM Studio, MLX | Ollama only | No | No |
| Token optimization | Built-in (60-80% savings) | No | No | Caching only |
| Complexity routing | Auto-routes by task difficulty | Manual | Cost/latency only | Manual |
| Memory system | Titans-inspired long-term memory | No | No | No |
| Self-hosted | Yes (Node.js) | Yes (Python stack) | No (SaaS) | Yes (Docker) |
| Offline capable | Yes | Yes | No | No |
| Transaction fees | None | None (OSS) / Paid enterprise | 5.5% on credits | Free tier / Paid |
| Dependencies | Node.js only | Python, Prisma, PostgreSQL | N/A | Docker, Python |
| Format conversion | Anthropic <-> OpenAI (automatic) | Automatic | N/A | Automatic |
| Code intelligence | Graphify (19-lang AST graph) | No | No | No |
| Routing telemetry | Built-in (SQLite + REST API) | No | Dashboard | Dashboard |
| Admin hot-reload | Yes (no restart) | Requires restart | N/A | Requires restart |
| License | Apache 2.0 | MIT | Proprietary | MIT (gateway) |
Lynkr's edge: Purpose-built for AI coding tools. Not a general LLM gateway — a proxy that understands Claude Code, Cursor, and Codex natively, with built-in token optimization, complexity-based routing, and a memory system designed for coding workflows. Installs in one command, runs on Node.js, zero infrastructure required.
| Scenario | Direct Anthropic | Lynkr + Ollama | Lynkr + OpenRouter | Lynkr + Bedrock |
|---|---|---|---|---|
| Daily Claude Code usage | ~$10-30/day | $0 (free) | ~$2-8/day | ~$5-15/day |
| Token optimization savings | — | — | 60-80% further | 60-80% further |
| Monthly (heavy use) | $300-900 | $0 | $60-240 | $150-450 |
With token optimization enabled, Lynkr's smart tool selection, prompt caching, and memory deduplication reduce token usage by 60-80% on top of provider savings.
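To make the compounding concrete, here is the arithmetic for one hypothetical month — the $600 baseline, the 4x provider discount, and the 70% optimization rate are illustrative numbers chosen to match the table's ranges, not measurements:

```shell
# Hypothetical month: $600 direct Anthropic spend, a provider ~4x cheaper,
# then a 70% token reduction from the optimization layer on top of that.
awk 'BEGIN {
  base = 600                        # direct Anthropic spend ($/mo)
  cheaper = base / 4                # cheaper provider -> $150/mo
  optimized = cheaper * (1 - 0.70)  # 70% fewer tokens -> $45/mo
  printf "%.0f\n", optimized
}'
# prints 45
```

The two effects multiply: a 4x cheaper provider plus a 70% token cut yields roughly a 13x reduction overall.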
Lynkr isn't just a passthrough proxy. It's an optimization layer.
Routes requests to the right model based on 5-phase complexity analysis. Simple questions go to fast/cheap models. Complex architectural tasks go to powerful models. Includes Graphify structural analysis for code-aware routing.
- Complexity scoring — 15-dimension weighted scoring with agentic workflow detection
- Graphify integration — AST-based knowledge graph detects god nodes, community cohesion, blast radius across 19 languages
- Routing telemetry — every decision recorded with quality scoring (0-100) and latency tracking (P50/P95/P99)
- Smart tool selection — only sends tools relevant to the current task
- Code Mode — replaces 100+ MCP tools with 4 meta-tools (~96% token reduction)
- Distill compression — structural similarity, delta rendering, smart dedup of repetitive tool outputs
- Prompt caching — SHA-256 keyed LRU cache
- Memory deduplication — eliminates repeated information across turns
- History compression — sliding window with Distill-powered structural dedup
- Headroom sidecar — optional 47-92% ML-based compression (Smart Crusher, CCR, LLMLingua)
- Circuit breakers — automatic failover with half-open probe recovery
- Admin hot-reload — `POST /v1/admin/reload` reloads config and resets circuit breakers without a restart
- Load shedding — graceful degradation under high load
- Prometheus metrics — full observability at `/metrics`
- Health checks — K8s-ready endpoints at `/health`
- Performance timer — per-request timing breakdown with `PERF_TIMER=true`
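A quick way to exercise the `/health` and `/metrics` endpoints named above once the proxy is up (exact response bodies depend on your Lynkr version, so this is a sketch rather than a contract):

```shell
# Requires Lynkr to be running locally on port 8081.
curl -s http://localhost:8081/health           # K8s-style liveness/readiness check
curl -s http://localhost:8081/metrics | head   # Prometheus text exposition format
```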
Titans-inspired long-term memory with surprise-based filtering. The system remembers important context across sessions and forgets noise — reducing token waste from repeated context.
Cache responses for semantically similar prompts. Hit rate depends on your workflow, but repeat questions (common in coding) get instant responses.
```shell
SEMANTIC_CACHE_ENABLED=true
SEMANTIC_CACHE_THRESHOLD=0.95
```

Automatic Model Context Protocol (MCP) server discovery and orchestration. Your MCP tools work through Lynkr without configuration. Enable Code Mode to replace 100+ MCP tool definitions with 4 lightweight meta-tools:

```shell
CODE_MODE_ENABLED=true  # ~96% reduction in tool-catalog tokens
```

### NPM (recommended)

```shell
npm install -g lynkr && lynkr start
```

### Docker

```shell
docker-compose up -d
```

### Git Clone

```shell
git clone https://github.com/Fast-Editor/Lynkr.git
cd Lynkr && npm install && cp .env.example .env
npm start
```

### Homebrew

```shell
brew tap vishalveerareddy123/lynkr
brew install lynkr
```

| Guide | Description |
|---|---|
| Installation | All installation methods |
| Provider Config | Setup for all 12+ providers |
| Claude Code CLI | Detailed Claude Code integration |
| Codex CLI | Codex config.toml setup |
| Cursor IDE | Cursor integration + troubleshooting |
| Embeddings | @Codebase semantic search (4 options) |
| Token Optimization | 60-80% cost reduction strategies |
| Memory System | Titans-inspired long-term memory |
| Tools & Execution | Tool calling and execution modes |
| Smart Routing | Complexity-based model routing |
| Docker Deployment | docker-compose with GPU support |
| Production Hardening | Circuit breakers, metrics, load shedding |
| API Reference | All endpoints and formats |
| Troubleshooting | Common issues and solutions |
| FAQ | Frequently asked questions |
| Issue | Solution |
|---|---|
| Same response for all queries | Disable semantic cache: `SEMANTIC_CACHE_ENABLED=false` |
| Tool calls not executing | Increase threshold: `POLICY_TOOL_LOOP_THRESHOLD=15` |
| Slow first request | Keep Ollama loaded: `OLLAMA_KEEP_ALIVE=24h` |
| Connection refused | Ensure Lynkr is running: `lynkr start` |
We welcome contributions. See the Contributing Guide and Testing Guide.
Apache 2.0 — See LICENSE.
- GitHub Discussions — Questions and tips
- Report Issues — Bug reports and feature requests
- NPM Package — Official package
- DeepWiki — AI-powered docs search
Built by Vishal Veera Reddy — for developers who want control over their AI tools.