OpenAI-Compatible Providers Guide

🎯 What Are OpenAI-Compatible Providers?

OpenAI-Compatible providers are services that implement the same API interface as OpenAI, allowing you to use them as drop-in replacements. This includes:

Local deployments (LocalAI, FastChat, vLLM)
Cloud services (Together AI, Anyscale, Groq, DeepInfra)
Custom implementations using the OpenAI SDK

⚡ Quick Setup

1. Via Frontend Dashboard

Navigate to: http://localhost:3000/dashboard/settings/llm-providers
Click "Add Provider"
Select "OpenAI-Compatible" from provider type
Choose a preset or enter custom URL
Enter API key (if required)
Click "Test Connection"
Click "Save"

2. Via API

curl -X POST http://localhost:8000/api/v1/llm-providers \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Together AI",
    "provider_type": "openai_compatible",
    "base_url": "https://api.together.xyz/v1",
    "api_key": "your-api-key",
    "model_name": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "is_active": true,
    "config": {
      "temperature": 0.7,
      "max_tokens": 4096
    }
  }'

📋 Available Presets

Local Deployments (No API Key Required)

Provider	Base URL	Description
LocalAI	`http://localhost:8080/v1`	Self-hosted AI, drop-in OpenAI replacement
FastChat	`http://localhost:8000/v1`	Berkeley's distributed LLM serving
vLLM	`http://localhost:8000/v1`	High-throughput LLM inference
Text Generation Inference	`http://localhost:8080`	Hugging Face's optimized inference

Cloud Services (API Key Required)

Provider	Base URL	Pricing	Description
Together AI	`https://api.together.xyz/v1`	~$0.90/1M tokens	Open-source models, fast inference
Anyscale Endpoints	`https://api.endpoints.anyscale.com/v1`	~$0.15/1M tokens	Ray-powered model serving
Groq	`https://api.groq.com/openai/v1`	Free (beta)	Ultra-fast LPU inference
DeepInfra	`https://api.deepinfra.com/v1`	~$0.08/1M tokens	Low-cost model hosting
Lepton AI	`https://<workspace>.lepton.run/api/v1`	~$0.50/1M tokens	Serverless LLM deployment

🔧 Setup Examples

LocalAI (Self-Hosted)

1. Install LocalAI:

# Using Docker
docker run -p 8080:8080 localai/localai:latest

# Or download from: https://localai.io/basics/getting_started/

2. Pull a Model:

# Models are auto-downloaded on first use
# Configure in LocalAI's models directory

3. Add Provider:

curl -X POST http://localhost:8000/api/v1/llm-providers \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "LocalAI",
    "provider_type": "openai_compatible",
    "base_url": "http://localhost:8080/v1",
    "model_name": "ggml-gpt4all-j",
    "is_active": true
  }'

4. Test:

curl -X POST http://localhost:8000/api/v1/llm-providers/{id}/test \
  -H "Authorization: Bearer YOUR_TOKEN"

Together AI (Cloud)

1. Get API Key:

Sign up: https://together.ai
Get API key: https://api.together.ai/settings/api-keys

2. Add Provider:

curl -X POST http://localhost:8000/api/v1/llm-providers \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Together AI - Mixtral",
    "provider_type": "openai_compatible",
    "base_url": "https://api.together.xyz/v1",
    "api_key": "your-together-api-key",
    "model_name": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "is_active": true,
    "config": {
      "temperature": 0.7,
      "max_tokens": 4096
    }
  }'

3. Available Models:

mistralai/Mixtral-8x7B-Instruct-v0.1 - Best quality
mistralai/Mistral-7B-Instruct-v0.2 - Fast & capable
togethercomputer/RedPajama-INCITE-7B-Chat - Lightweight
togethercomputer/LLaMA-2-7B-32K-Instruct - Long context

Docs: https://docs.together.ai/docs

Groq (Ultra-Fast)

1. Get API Key:

Sign up: https://console.groq.com
Get API key: https://console.groq.com/keys
Currently free during beta!

2. Add Provider:

curl -X POST http://localhost:8000/api/v1/llm-providers \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Groq",
    "provider_type": "openai_compatible",
    "base_url": "https://api.groq.com/openai/v1",
    "api_key": "your-groq-api-key",
    "model_name": "mixtral-8x7b-32768",
    "is_active": true,
    "config": {
      "temperature": 0.6,
      "max_tokens": 32768
    }
  }'

3. Available Models:

mixtral-8x7b-32768 - Mixtral 8x7B (32K context)
llama2-70b-4096 - Llama 2 70B (4K context)
gemma-7b-it - Google Gemma 7B

Why Groq? 10-100x faster than GPU inference!

Docs: https://console.groq.com/docs

Anyscale Endpoints

1. Get API Key:

Sign up: https://app.endpoints.anyscale.com
Get API key from dashboard

2. Add Provider:

curl -X POST http://localhost:8000/api/v1/llm-providers \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Anyscale",
    "provider_type": "openai_compatible",
    "base_url": "https://api.endpoints.anyscale.com/v1",
    "api_key": "your-anyscale-api-key",
    "model_name": "mistralai/Mistral-7B-Instruct-v0.2",
    "is_active": true
  }'

3. Available Models:

mistralai/Mistral-7B-Instruct-v0.2
meta-llama/Llama-2-7b-chat-hf
meta-llama/Llama-2-13b-chat-hf
meta-llama/Llama-2-70b-chat-hf

Docs: https://docs.endpoints.anyscale.com

DeepInfra (Low-Cost)

1. Get API Key:

Sign up: https://deepinfra.com
Get API key from dashboard

2. Add Provider:

curl -X POST http://localhost:8000/api/v1/llm-providers \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "DeepInfra",
    "provider_type": "openai_compatible",
    "base_url": "https://api.deepinfra.com/v1",
    "api_key": "your-deepinfra-api-key",
    "model_name": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "is_active": true,
    "config": {
      "temperature": 0.7,
      "max_tokens": 4096
    }
  }'

3. Pricing (per 1M tokens):

Mixtral 8x7B: ~$0.24
Mistral 7B: ~$0.08
Llama 2 70B: ~$0.28

Docs: https://deepinfra.com/docs

Custom OpenAI-Compatible API

If you have a custom OpenAI-compatible endpoint:

curl -X POST http://localhost:8000/api/v1/llm-providers \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Custom API",
    "provider_type": "openai_compatible",
    "base_url": "https://your-custom-domain.com/v1",
    "api_key": "your-api-key-or-null",
    "model_name": "your-model-name",
    "is_active": true,
    "config": {
      "temperature": 0.7,
      "max_tokens": 4096,
      "pricing": {
        "input_cost_per_token": 0.000001,
        "output_cost_per_token": 0.000002
      }
    }
  }'

💻 Usage Examples

Python SDK

from app.services.llm_client import create_llm_client
from app.services.llm_provider_service import LLMProviderService

# Initialize
llm_client = create_llm_client()
provider_service = LLMProviderService()

# Get provider
provider = await provider_service.get_provider(provider_id)
api_key = provider_service.decrypt_api_key(provider.api_key)

# Register provider
llm_client.register_provider(provider, api_key=api_key)

# Use Groq for ultra-fast response
response = await llm_client.chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    provider_id=provider.id,
    model="mixtral-8x7b-32768",
    temperature=0.6,
)

print(f"Response: {response.content}")
print(f"Time: {response.response_time_ms}ms")
print(f"Cost: ${response.cost_usd}")

Using Different Providers for Different Tasks

# Use Groq for fast chat
groq_response = await llm_client.chat_completion(
    messages=[{"role": "user", "content": "Quick question!"}],
    provider_id=groq_provider.id,
)

# Use Together AI for complex reasoning
together_response = await llm_client.chat_completion(
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
    provider_id=together_provider.id,
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
)

# Use LocalAI for development/testing (free)
local_response = await llm_client.chat_completion(
    messages=[{"role": "user", "content": "Test message"}],
    provider_id=localai_provider.id,
)

Streaming Response

async for chunk in await llm_client.chat_completion(
    messages=[{"role": "user", "content": "Write a story"}],
    provider_id=provider.id,
    stream=True,
):
    print(chunk.content, end="", flush=True)

📊 Comparison Table

Provider	Speed	Cost	Quality	Best For
Groq	⚡⚡⚡⚡⚡	Free (beta)	⭐⭐⭐⭐	Real-time apps
Together AI	⚡⚡⚡⚡	$	⭐⭐⭐⭐⭐	Production workloads
Anyscale	⚡⚡⚡	$	⭐⭐⭐⭐	Enterprise apps
DeepInfra	⚡⚡⚡	¢	⭐⭐⭐⭐	Cost-sensitive apps
LocalAI	⚡⚡	Free	⭐⭐⭐	Development/Testing
FastChat	⚡⚡⚡	Free	⭐⭐⭐⭐	Research
vLLM	⚡⚡⚡⚡	Free	⭐⭐⭐⭐	High-throughput

🔍 Troubleshooting

Issue: "Connection refused"

Solution:

Verify the service is running
Check the base URL is correct

Test with curl:

curl http://localhost:8080/v1/models  # LocalAI
curl https://api.together.xyz/v1/models -H "Authorization: Bearer YOUR_KEY"  # Together

Issue: "Invalid API key"

Solution:

Verify API key is correct (no extra spaces)
Check API key has sufficient permissions
Ensure API key hasn't expired
For cloud services, check billing is active

Issue: "Model not found"

Solution:

Verify model name matches provider's format
Check model is available for your account
Review provider documentation for supported models

Issue: "Rate limit exceeded"

Solution:

Wait and retry (auto-retries with backoff)
Upgrade your plan for higher limits
Use a different provider
Implement request queuing

💰 Cost Comparison

Cloud Provider Pricing (per 1M tokens)

Provider	Input	Output	Example Cost (100K tokens)
Groq	Free	Free	$0.00
DeepInfra	$0.08	$0.08	$0.008
Together AI	$0.90	$0.90	$0.09
Anyscale	$0.15	$0.15	$0.015
OpenAI GPT-4	$30.00	$60.00	$4.50
Anthropic Claude 3	$3.00	$15.00	$0.45

Note: Local deployments cost only electricity and hardware.

🎯 Best Practices

Use Local for Development: Save money by using LocalAI/FastChat during development
Use Groq for Speed: When you need ultra-fast responses
Use Together/Anyscale for Quality: For production workloads requiring high quality
Monitor Usage: Track token usage and costs in the dashboard
Set Up Multiple Providers: Have fallback providers for redundancy
Cache Responses: Cache frequent queries to reduce API calls
Batch Requests: Combine multiple queries when possible

📚 Additional Resources

OpenAI API Reference: https://platform.openai.com/docs/api-reference
LocalAI Docs: https://localai.io/basics/getting_started/
Together AI Docs: https://docs.together.ai/docs
Groq Docs: https://console.groq.com/docs
Anyscale Docs: https://docs.endpoints.anyscale.com
DeepInfra Docs: https://deepinfra.com/docs

🚀 Next Steps

Choose a Provider: Select based on your needs (speed, cost, quality)
Get API Key (if cloud): Sign up and get your API key
Add Provider: Use dashboard or API to configure
Test Connection: Verify it's working
Start Using: Integrate with your agents!

Version: 1.0.0
Last Updated: 2026-03-13
Status: Production Ready

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OpenAI-Compatible Providers Guide

🎯 What Are OpenAI-Compatible Providers?

⚡ Quick Setup

1. Via Frontend Dashboard

2. Via API

📋 Available Presets

Local Deployments (No API Key Required)

Cloud Services (API Key Required)

🔧 Setup Examples

LocalAI (Self-Hosted)

Together AI (Cloud)

Groq (Ultra-Fast)

Anyscale Endpoints

DeepInfra (Low-Cost)

Custom OpenAI-Compatible API

💻 Usage Examples

Python SDK

Using Different Providers for Different Tasks

Streaming Response

📊 Comparison Table

🔍 Troubleshooting

Issue: "Connection refused"

Issue: "Invalid API key"

Issue: "Model not found"

Issue: "Rate limit exceeded"

💰 Cost Comparison

Cloud Provider Pricing (per 1M tokens)

🎯 Best Practices

📚 Additional Resources

🚀 Next Steps

FilesExpand file tree

openai-compatible.md

Latest commit

History

openai-compatible.md

File metadata and controls

OpenAI-Compatible Providers Guide

🎯 What Are OpenAI-Compatible Providers?

⚡ Quick Setup

1. Via Frontend Dashboard

2. Via API

📋 Available Presets

Local Deployments (No API Key Required)

Cloud Services (API Key Required)

🔧 Setup Examples

LocalAI (Self-Hosted)

Together AI (Cloud)

Groq (Ultra-Fast)

Anyscale Endpoints

DeepInfra (Low-Cost)

Custom OpenAI-Compatible API

💻 Usage Examples

Python SDK

Using Different Providers for Different Tasks

Streaming Response

📊 Comparison Table

🔍 Troubleshooting

Issue: "Connection refused"

Issue: "Invalid API key"

Issue: "Model not found"

Issue: "Rate limit exceeded"

💰 Cost Comparison

Cloud Provider Pricing (per 1M tokens)

🎯 Best Practices

📚 Additional Resources

🚀 Next Steps