Unified inference orchestrator for local LLM deployments — intent classification, adaptive routing, quality-gated self-improvement, and SOTA web search.
ODO sits between user requests and a local llama-server, intelligently routing, enriching, and quality-gating every interaction.
ODO is designed to work with Chimere Distilled — our Claude Opus 4.6 distillation of Qwen3.5-35B-A3B (MoE).
| Metric | Score |
|---|---|
| HumanEval | 97% |
| BFCL tool-calling | 85% (+18 pts vs base) |
| IFEval | 80% |
| GGUF size | 15 GB (fits 16 GB VRAM) |
User → ODO (intent classify → enrich → route) → llama-server → ODO (quality gate) → User
- Intent Classification: 3-strategy cascade (regex → filetype → LLM)
- Context Enrichment: Web search, ChromaDB RAG, tool injection
- Adaptive Routing: Entropy-based compute profiles (thinking vs no-think)
- Quality Assessment: Scoring + continuous improvement
- Search Pipeline: 8-stage SOTA web search (QueryExpand → WebSearch → RRF → DeepFetch → Diversity → CRAG → Contradictions → Synthesis)
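The intent-classification cascade above can be sketched as follows. This is a minimal illustration, not ODO's actual implementation: the pattern table, filetype map, and `llm_classify` hook are all hypothetical stand-ins for whatever the real classifier uses. The point is the ordering — cheap regex checks first, filetype hints second, and an LLM call only when both are inconclusive.

```python
import re

# Hypothetical pattern and filetype tables (illustrative only).
CODE_PATTERNS = [r"```", r"\bdef\b", r"\bfn\b", r"#include"]
FILETYPE_INTENTS = {".py": "code", ".rs": "code", ".md": "docs"}

def classify_intent(text, filename=None, llm_classify=None):
    # Strategy 1: regex — zero-cost heuristics over the raw text.
    if any(re.search(p, text) for p in CODE_PATTERNS):
        return "code"
    # Strategy 2: filetype — extension of an attached file, if any.
    if filename:
        for ext, intent in FILETYPE_INTENTS.items():
            if filename.endswith(ext):
                return intent
    # Strategy 3: LLM — fall back to a model call only when needed.
    if llm_classify is not None:
        return llm_classify(text)
    return "chat"
```

Escalating through strategies in cost order means most requests never touch the LLM classifier at all.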
- Zero-config intent handling
- YAML-based pipeline definitions (hot-reload)
- Adaptive compute allocation (think/no-think routing)
- Autonomous self-improvement (overnight LoRA + DSPy)
- Engram memory management (semantic few-shot, n-gram bias)
- DVTS tree search with PRM scoring
- Knowledge ingestion (YouTube, Instagram, GLM-OCR)
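Entropy-based compute routing can be sketched like this. It is a toy under stated assumptions — that ODO obtains a next-token distribution from a cheap probe pass and compares its Shannon entropy against a threshold; the `choose_profile` function and the `2.0`-bit threshold are hypothetical, not ODO's real parameters.

```python
import math

def token_entropy(probs):
    # Shannon entropy in bits of a next-token distribution.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def choose_profile(probe_probs, threshold=2.0):
    # High entropy => the model is uncertain => allocate the "thinking"
    # profile; low entropy => the cheaper no-think profile suffices.
    return "thinking" if token_entropy(probe_probs) > threshold else "no_think"
```

A confidently peaked distribution routes to `no_think`, while a near-uniform one routes to `thinking`, which is the intuition behind adaptive compute allocation.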
```bash
docker compose up -d
# ODO on port 8084, llama-server on port 8081
```

Or run it directly:

```bash
pip install -r requirements.txt
export ODO_BACKEND=http://127.0.0.1:8081
python odo.py
```

All configuration is via environment variables:

- `ODO_BACKEND`: llama-server URL (default: `http://127.0.0.1:8081`)
- `ODO_PORT`: ODO listening port (default: `8084`)
- `CHIMERE_HOME`: data directory (default: `~/.chimere`)
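A minimal sketch of reading these three variables with their documented defaults (the variable names and defaults come from above; how ODO actually loads them internally is an assumption):

```python
import os
from pathlib import Path

# Documented defaults; fall back to them when the variable is unset.
ODO_BACKEND = os.environ.get("ODO_BACKEND", "http://127.0.0.1:8081")
ODO_PORT = int(os.environ.get("ODO_PORT", "8084"))
CHIMERE_HOME = Path(os.environ.get("CHIMERE_HOME", "~/.chimere")).expanduser()
```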
Pipeline YAMLs live in `pipelines/` for per-route customization.
- chimere — Rust inference runtime + distilled models
- Chimere Distilled GGUF — 15 GB GGUF for local inference
Apache 2.0 — Kevin Remondiere