Summary
I’d like Cotabby to include an Ollama inference backend so I can reuse my locally downloaded models, with a model selection UX that feels the same as the LM Studio/OpenAI-compatible endpoint flow.
Problem
Right now I can’t smoothly reuse the Ollama models I already manage. This overlaps with (and builds on) #627 (local OpenAI-compatible endpoint). I want to avoid duplicating model downloads and configuration between Ollama and Cotabby.
Proposed direction
- Add an Ollama engine option alongside existing engines.
- Connect to localhost:11434 by default, with settings to customize the base URL and model name.
- Use streaming completions for low-latency suggestions (SSE via Ollama's OpenAI-compatible endpoint, or newline-delimited JSON via its native API).
- Mirror the LM Studio picker UX so there’s a consistent mental model across “local OpenAI-compatible servers”.
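To make the direction above a bit more concrete, here is a rough Python sketch of the two pieces the backend would need: listing installed models for the picker, and reading streamed tokens. It is only an illustration, not a proposal for how Cotabby should implement it. `OllamaSettings`, `model_names`, `list_models`, and `iter_sse_deltas` are hypothetical names I made up. It assumes Ollama's `GET /api/tags` response (a `models` array whose entries have a `name`) and the SSE framing used by its OpenAI-compatible `/v1/chat/completions` endpoint (`data: {...}` lines ending with `data: [DONE]`).

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class OllamaSettings:
    # Hypothetical settings object; field names are illustrative only.
    base_url: str = "http://localhost:11434"  # Ollama's default address
    model: str = ""  # chosen by the user in the model picker


def model_names(tags_payload: dict) -> list[str]:
    """Extract sorted model names from a GET /api/tags response body."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def list_models(settings: OllamaSettings, timeout: float = 2.0) -> list[str]:
    """Ask the local Ollama server which models are already downloaded."""
    url = settings.base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_names(json.load(resp))


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        choices = json.loads(data).get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text
```

Because the picker only needs `list_models` and the suggestion loop only needs something like `iter_sse_deltas`, most of the existing LM Studio/OpenAI-compatible flow could probably be reused, with the base URL as the main difference.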
Additional context
This also enables MLX use indirectly: Ollama can use MLX, so an Ollama backend lets me benefit from MLX without Cotabby needing its own MLX pipeline.
Related issue: #457 (MLX inference backend).