This guide shows you how to configure and use OpenAI-compatible APIs in the Document-Analyzer-Operator Platform.
First, install the backend dependencies:

```bash
cd backend
poetry install
```

```bash
# Generate encryption key for API keys
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
```

Save this key! You'll need it for the `.env` file.
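Provider API keys are stored encrypted with this key. As a rough sketch of what that looks like (the platform's actual helpers live in `LLMProviderService`; this is illustrative, not the exact implementation):

```python
import os
from cryptography.fernet import Fernet

# Illustrative: encrypt an API key with the ENCRYPTION_KEY from .env.
fernet = Fernet(os.environ["ENCRYPTION_KEY"].encode())

token = fernet.encrypt(b"sk-your-openai-api-key")  # ciphertext stored in the database
api_key = fernet.decrypt(token).decode()           # recovered when a request is made
```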
Next, edit `backend/.env`:

```bash
# Encryption Key (REQUIRED - from step 2)
ENCRYPTION_KEY=your_generated_encryption_key_here

# Default LLM Provider
DEFAULT_LLM_PROVIDER=openai

# OpenAI Configuration
OPENAI_API_KEY=sk-your-openai-api-key
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-4

# Anthropic Configuration (optional)
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
ANTHROPIC_MODEL=claude-3-sonnet-20240229

# Ollama Configuration (optional - local)
OLLAMA_BASE_URL=http://localhost:11434/v1
OLLAMA_MODEL=llama2

# LM Studio Configuration (optional - local)
LM_STUDIO_BASE_URL=http://localhost:1234/v1

# vLLM Configuration (optional - local)
VLLM_BASE_URL=http://localhost:8000/v1
```

Then run the database migrations:

```bash
cd backend
poetry run alembic upgrade head
```

This creates two new tables:

- `llm_providers` - Stores provider configurations
- `llm_usage_logs` - Tracks API usage and costs
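To confirm the migration applied, you can inspect the schema. A minimal sketch with SQLAlchemy; the connection URL is a placeholder for your own database settings:

```python
from sqlalchemy import create_engine, inspect

# Placeholder URL: substitute your actual DATABASE_URL.
engine = create_engine("postgresql://user:password@localhost:5432/document_analyzer")
tables = inspect(engine).get_table_names()

assert "llm_providers" in tables
assert "llm_usage_logs" in tables
print("Migration applied:", sorted(t for t in tables if t.startswith("llm_")))
```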
Start the backend server:

```bash
poetry run uvicorn app.main:app --reload
```

Add an OpenAI provider via the API:

```bash
curl -X POST http://localhost:8000/api/v1/llm-providers \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "OpenAI GPT-4",
    "provider_type": "openai",
    "base_url": "https://api.openai.com/v1",
    "api_key": "sk-your-openai-api-key",
    "model_name": "gpt-4",
    "is_active": true,
    "config": {
      "temperature": 0.7,
      "max_tokens": 4096
    }
  }'
```

Add an Anthropic provider:

```bash
curl -X POST http://localhost:8000/api/v1/llm-providers \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Anthropic Claude 3",
    "provider_type": "anthropic",
    "base_url": "https://api.anthropic.com/v1",
    "api_key": "sk-ant-your-anthropic-key",
    "model_name": "claude-3-sonnet-20240229",
    "is_active": true,
    "config": {
      "temperature": 0.7,
      "max_tokens": 4096
    }
  }'
```

Add a local Ollama provider:

```bash
curl -X POST http://localhost:8000/api/v1/llm-providers \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Local Ollama",
    "provider_type": "ollama",
    "base_url": "http://localhost:11434/v1",
    "model_name": "llama2",
    "is_active": true,
    "config": {
      "temperature": 0.7,
      "max_tokens": 4096
    }
  }'
```

Note: pull the model first:

```bash
ollama pull llama2
```

For LM Studio:

- Open LM Studio
- Load a model (e.g., `mistral-7b`)
- Start the local server
- Add provider:

```bash
curl -X POST http://localhost:8000/api/v1/llm-providers \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "LM Studio",
    "provider_type": "lm_studio",
    "base_url": "http://localhost:1234/v1",
    "model_name": "mistral-7b-instruct",
    "is_active": true,
    "config": {
      "temperature": 0.7,
      "max_tokens": 4096
    }
  }'
```

Alternatively, add providers from the dashboard:

- Navigate to: http://localhost:3000/dashboard/settings/llm-providers
- Click "Add Provider"
- Select provider type (OpenAI, Anthropic, Ollama, etc.)
- Fill in the configuration form
- Click "Test Connection" to verify
- Click "Save"
From backend code, use the shared LLM client:

```python
from app.services.llm_client import create_llm_client
from app.services.llm_provider_service import LLMProviderService

# Initialize
llm_client = create_llm_client(timeout=60.0, max_retries=3)
provider_service = LLMProviderService()

# Get provider from database
provider = await provider_service.get_provider(provider_id)
api_key = provider_service.decrypt_api_key(provider.api_key)

# Register provider
llm_client.register_provider(provider, api_key=api_key)

# Chat completion
response = await llm_client.chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    provider_id=provider.id,
    temperature=0.7,
    max_tokens=1024,
)
print(f"Response: {response.content}")
print(f"Tokens used: {response.usage.total_tokens}")
print(f"Estimated cost: ${response.cost_usd}")

# Streaming response
async for chunk in llm_client.chat_completion(
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    provider_id=provider.id,
    stream=True,
):
    print(chunk.content, end="", flush=True)

# Generate embeddings
embeddings = await llm_client.embeddings(
    text="The quick brown fox jumps over the lazy dog",
    provider_id=provider.id,
)
print(f"Embedding dimensions: {len(embeddings[0])}")

# List available models
models = await llm_client.list_models(provider_id=provider.id)
print(f"Available models: {models}")
```

To route different tasks to different providers, register more than one:

```python
# Register multiple providers
openai_key = provider_service.decrypt_api_key(openai_provider.api_key)
anthropic_key = provider_service.decrypt_api_key(anthropic_provider.api_key)
llm_client.register_provider(openai_provider, api_key=openai_key)
llm_client.register_provider(anthropic_provider, api_key=anthropic_key)

# Use OpenAI for one task
response1 = await llm_client.chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    provider_id=openai_provider.id,
)

# Use Anthropic for another task
response2 = await llm_client.chat_completion(
    messages=[{"role": "user", "content": "Hi there!"}],
    provider_id=anthropic_provider.id,
)
```

Handle provider errors explicitly:

```python
from app.services.llm_client import RateLimitError, AuthenticationError, TimeoutError

try:
    response = await llm_client.chat_completion(
        messages=[{"role": "user", "content": "Test"}],
        provider_id=provider.id,
    )
except RateLimitError as e:
    print(f"Rate limit exceeded. Retry after {e.retry_after} seconds")
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except TimeoutError as e:
    print(f"Request timed out after {e.timeout} seconds")
except Exception as e:
    print(f"Unexpected error: {e}")
```
The REST API exposes the following provider-management endpoints.

List providers:

```http
GET /api/v1/llm-providers
Authorization: Bearer <token>
```

Response:

```json
{
  "data": [
    {
      "id": "uuid",
      "name": "OpenAI GPT-4",
      "provider_type": "openai",
      "base_url": "https://api.openai.com/v1",
      "model_name": "gpt-4",
      "is_active": true,
      "is_default": true,
      "created_at": "2026-03-13T10:00:00Z"
    }
  ],
  "total": 1
}
```

Create a provider:

```http
POST /api/v1/llm-providers
Authorization: Bearer <token>
Content-Type: application/json

{
  "name": "My Provider",
  "provider_type": "openai",
  "base_url": "https://api.openai.com/v1",
  "api_key": "sk-...",
  "model_name": "gpt-4",
  "is_active": true,
  "config": {
    "temperature": 0.7,
    "max_tokens": 4096
  }
}
```

Test a provider connection:

```http
POST /api/v1/llm-providers/{id}/test
Authorization: Bearer <token>
Content-Type: application/json

{
  "test_message": "Hello, are you working?"
}
```

Response:

```json
{
  "success": true,
  "message": "Connection successful",
  "model": "gpt-4",
  "response_time_ms": 234,
  "test_response": "Hello! Yes, I'm working correctly."
}
```

Get usage statistics:

```http
GET /api/v1/llm-providers/usage?start_date=2026-03-01&end_date=2026-03-31
Authorization: Bearer <token>
```

Response:

```json
{
  "total_requests": 1250,
  "total_tokens_input": 125000,
  "total_tokens_output": 87500,
  "total_cost_usd": 12.50,
  "success_rate": 0.98,
  "average_response_time_ms": 345,
  "by_provider": [
    {
      "provider_name": "OpenAI GPT-4",
      "requests": 800,
      "tokens_input": 80000,
      "tokens_output": 56000,
      "cost_usd": 9.60
    }
  ],
  "by_model": [
    {
      "model": "gpt-4",
      "requests": 800,
      "avg_tokens": 170
    }
  ]
}
```
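The same endpoint is easy to poll from a script; a minimal sketch with `requests`, using the field names from the example response above:

```python
import requests

resp = requests.get(
    "http://localhost:8000/api/v1/llm-providers/usage",
    params={"start_date": "2026-03-01", "end_date": "2026-03-31"},
    headers={"Authorization": "Bearer YOUR_TOKEN"},  # placeholder token
    timeout=30,
)
resp.raise_for_status()
usage = resp.json()

print(f"Total cost: ${usage['total_cost_usd']:.2f}")
for entry in usage["by_provider"]:
    print(f"{entry['provider_name']}: {entry['requests']} requests, ${entry['cost_usd']:.2f}")
```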
Problem: API keys fail to decrypt (encryption key errors).

Solution:
- Verify `ENCRYPTION_KEY` in `.env` is correct
- Ensure it's a valid Fernet key (44 characters)
- Regenerate if needed (note: keys encrypted under the old value can no longer be decrypted and must be re-entered):

```bash
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
```
Problem: Cannot connect to a local provider.

Solution:
- Ensure Ollama/LM Studio/vLLM is running
- Check the base URL is correct
- Verify no firewall is blocking the connection
- Test with curl:

```bash
curl http://localhost:11434/v1/models  # Ollama
curl http://localhost:1234/v1/models   # LM Studio
```
Problem: Authentication errors from the provider.

Solution:
- Verify the API key is correct (no extra spaces)
- Check the API key has sufficient permissions
- Ensure the API key hasn't expired
- For OpenAI: check billing is active at https://platform.openai.com/account/billing
Problem: Rate limit errors.

Solution:
- Wait and retry (the client auto-retries with exponential backoff)
- Upgrade your API plan for higher limits
- Use a different provider
- Implement request queuing (see the sketch below)
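One simple way to queue requests is to cap in-flight calls with an asyncio semaphore. A minimal sketch; `queued_chat` is a hypothetical helper and `llm_client` is the client from the usage examples above:

```python
import asyncio

# At most 2 calls run against the provider at once; the rest wait their turn.
_llm_slots = asyncio.Semaphore(2)

async def queued_chat(messages, provider_id):
    async with _llm_slots:
        return await llm_client.chat_completion(
            messages=messages,
            provider_id=provider_id,
        )

# Example: many requests, throttled automatically.
# results = await asyncio.gather(
#     *[queued_chat([{"role": "user", "content": q}], provider.id) for q in questions]
# )
```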
Problem: Model not found.

Solution:
- Verify the model name is correct
- Check the model is available for your account
- For local providers, ensure the model is downloaded:

```bash
ollama pull llama2  # Ollama
```
Problem: Costs are higher than expected.

Solution:
- Monitor usage in the dashboard: http://localhost:3000/dashboard/settings/llm-providers/usage
- Set usage alerts
- Use cheaper models (gpt-3.5-turbo instead of gpt-4)
- Reduce `max_tokens` in the config
- Use local models (Ollama, LM Studio) for development
OpenAI setup:
- Get an API key: https://platform.openai.com/api-keys
- Add the provider via API or dashboard
- Test the connection

Models:
- `gpt-4` - Best quality, $0.03/1K input, $0.06/1K output
- `gpt-4-turbo` - Fast GPT-4, $0.01/1K input, $0.03/1K output
- `gpt-3.5-turbo` - Fast & cheap, $0.0005/1K input, $0.0015/1K output

Docs: https://platform.openai.com/docs
Anthropic setup:
- Get an API key: https://console.anthropic.com/settings/keys
- Add the provider via API or dashboard
- Test the connection

Models:
- `claude-3-opus-20240229` - Most powerful, $15/1M input, $75/1M output
- `claude-3-sonnet-20240229` - Balanced, $3/1M input, $15/1M output
- `claude-3-haiku-20240307` - Fast & cheap, $0.25/1M input, $1.25/1M output

Docs: https://docs.anthropic.com/claude/docs
Ollama setup:
- Install: https://ollama.ai/download
- Pull a model: `ollama pull llama2`
- Add the provider (no API key needed)
- Test the connection

Popular Models:
- `llama2` - General purpose
- `mistral` - Fast & capable
- `codellama` - Code generation
- `neural-chat` - Conversational

Docs: https://github.com/ollama/ollama
LM Studio setup:
- Install: https://lmstudio.ai/
- Download and load a model
- Start local server (port 1234)
- Add provider (no API key needed)
- Test connection
Supported Models: Any GGUF format model from Hugging Face
Docs: https://lmstudio.ai/docs
vLLM setup:
- Install: `pip install vllm`
- Deploy a model with the OpenAI-compatible server: `python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.2`
- Add the provider (no API key needed for local)
- Test the connection
Docs: https://docs.vllm.ai/
OpenAI:

| Model | Input (per 1K tokens) | Output (per 1K tokens) |
|---|---|---|
| gpt-4 | $0.03 | $0.06 |
| gpt-4-turbo | $0.01 | $0.03 |
| gpt-3.5-turbo | $0.0005 | $0.0015 |
Anthropic:

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| claude-3-opus | $15 | $75 |
| claude-3-sonnet | $3 | $15 |
| claude-3-haiku | $0.25 | $1.25 |
- Ollama: Free (uses your hardware)
- LM Studio: Free (uses your hardware)
- vLLM: Free (uses your hardware)
Note: Token counting uses ~4 characters per token for English text.
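Using that rule of thumb, here is a quick back-of-the-envelope estimator (gpt-4 prices from the table above; the 4-characters-per-token ratio is only approximate):

```python
# Rough gpt-4 cost estimate assuming ~4 characters per token.
GPT4_INPUT_PER_1K = 0.03
GPT4_OUTPUT_PER_1K = 0.06

def estimate_cost_usd(prompt: str, expected_output_chars: int) -> float:
    input_tokens = len(prompt) / 4
    output_tokens = expected_output_chars / 4
    return (input_tokens / 1000) * GPT4_INPUT_PER_1K + (output_tokens / 1000) * GPT4_OUTPUT_PER_1K

print(f"${estimate_cost_usd('Summarize this contract. ' * 200, 2000):.4f}")
```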
- Never commit API keys to version control
- Use environment variables for encryption key
- Enable HTTPS in production
- Rotate API keys periodically
- Monitor usage for anomalies
- Set usage limits with your provider
- Use separate keys for development and production
- Back up the encryption key securely (losing it means losing access to all stored API keys)
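On the rotation and backup points: the `cryptography` package's `MultiFernet` can re-encrypt stored ciphertexts under a fresh key, which is one way to rotate `ENCRYPTION_KEY` without losing saved API keys. A hedged sketch, not the platform's built-in behavior:

```python
import os
from cryptography.fernet import Fernet, MultiFernet

old = Fernet(os.environ["ENCRYPTION_KEY"].encode())
new_key = Fernet.generate_key()              # becomes the new ENCRYPTION_KEY
new = Fernet(new_key)

# MultiFernet encrypts with the first key and decrypts with any listed key,
# so rotate() re-encrypts an old ciphertext under the new key.
rotator = MultiFernet([new, old])

stored = old.encrypt(b"sk-example")          # stands in for a ciphertext from the DB
rotated = rotator.rotate(stored)
assert new.decrypt(rotated) == b"sk-example"
```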
- Backend Documentation: `backend/docs/LLM_PROVIDERS.md`
- API Documentation: http://localhost:8000/docs
- Frontend Dashboard: http://localhost:3000/dashboard/settings/llm-providers
- Usage Statistics: http://localhost:3000/dashboard/settings/llm-providers/usage
Version: 1.0.0
Last Updated: 2026-03-13
Status: Production Ready