[AI-Generated] feat: auto-discover models from custom OpenAI-compatible endpoints#527
HWiese1980 wants to merge 2 commits into
When OPENAI_BASE_URL is set, the /models/config endpoint now queries the custom endpoint's /v1/models API to dynamically discover available models instead of relying solely on the static generator.json config.

This enables seamless integration with self-hosted LLM servers like llama.cpp, vLLM, LocalAI, and text-generation-webui without requiring manual model configuration.

Key changes:
- Add _discover_openai_models() helper that queries /v1/models
- Modify get_model_config() to use discovery for the openai provider when OPENAI_BASE_URL is set
- Graceful fallback to generator.json if discovery fails
- Add unit tests for the discovery feature
- Document the feature in README.md
Code Review
This pull request introduces dynamic model discovery for custom OpenAI-compatible endpoints by querying their /models API when OPENAI_BASE_URL is configured, falling back gracefully to static configurations on failure. The feedback suggests improving the robustness of the _discover_openai_models helper function by passing the OPENAI_API_KEY in the request headers, reducing the timeout from 10 to 3 seconds to prevent UI hangs, and adding defensive type checks when parsing the JSON response.
    try:
        async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=10)) as session:
            async with session.get(models_url) as response:
                response.raise_for_status()
                data = await response.json()

        discovered = []
        # OpenAI /v1/models returns {"data": [{"id": "model-name", ...}, ...]}
        for model_info in data.get("data", []):
            model_id = model_info.get("id", "")
            if model_id:
                discovered.append(Model(id=model_id, name=model_id))

        if discovered:
            logger.info(f"Discovered {len(discovered)} models from {models_url}")
        return discovered
    except Exception as e:
        logger.warning(f"Failed to discover models from {models_url}: {e}")
        return []
This helper function can be improved in three key areas to enhance correctness, performance, and robustness:
- Authentication Support: Many OpenAI-compatible custom endpoints (such as OpenRouter, Together AI, or private endpoints with token authentication) require an Authorization header even for the /v1/models endpoint. Since OPENAI_API_KEY is configured in the environment, we should pass it as a Bearer token in the request headers.
- Reduced Timeout: A 10-second timeout is excessively long for a blocking GET endpoint (/models/config). If the custom endpoint is offline or slow, the entire UI/API will hang for 10 seconds before falling back to static models. Reducing this to 3 seconds provides a much better user experience.
- Defensive Parsing: We should defensively verify that the returned JSON data and each model_info item are dictionaries before attempting to access them, preventing potential AttributeError exceptions if a non-compliant endpoint returns an unexpected format.
    headers = {}
    api_key = os.environ.get("OPENAI_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    try:
        async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=3)) as session:
            async with session.get(models_url, headers=headers) as response:
                response.raise_for_status()
                data = await response.json()
        discovered = []
        if isinstance(data, dict):
            # OpenAI /v1/models returns {"data": [{"id": "model-name", ...}, ...]}
            for model_info in data.get("data", []):
                if isinstance(model_info, dict):
                    model_id = model_info.get("id", "")
                    if model_id:
                        discovered.append(Model(id=model_id, name=model_id))
        if discovered:
            logger.info(f"Discovered {len(discovered)} models from {models_url}")
        return discovered
    except Exception as e:
        logger.warning(f"Failed to discover models from {models_url}: {e}")
        return []

c7034c7 to
b533a39
…ve parsing
- Pass OPENAI_API_KEY as Bearer token in discovery request headers
- Reduce timeout from 10s to 3s to prevent UI hangs
- Add isinstance checks for defensive JSON parsing
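The isinstance checks named in this commit can be exercised without a network call if the parsing is factored into a pure function. The sketch below is only an illustration: `extract_model_ids` is a hypothetical name that does not appear in the PR, it returns plain strings rather than `Model` objects, and the extra check that `data` is a list goes beyond what the review suggestion does.

```python
from typing import Any, List


def extract_model_ids(payload: Any) -> List[str]:
    """Return model ids from an OpenAI-style /v1/models payload.

    Hypothetical helper (not part of the PR). Expected shape:
    {"data": [{"id": "model-name", ...}, ...]}
    """
    # A non-compliant endpoint may return a list, string, or null.
    if not isinstance(payload, dict):
        return []

    items = payload.get("data", [])
    # Assumption beyond the review suggestion: also reject a non-list "data".
    if not isinstance(items, list):
        return []

    ids: List[str] = []
    for item in items:
        # Skip entries that are not objects.
        if isinstance(item, dict):
            model_id = item.get("id", "")
            # Keep only non-empty string ids.
            if isinstance(model_id, str) and model_id:
                ids.append(model_id)
    return ids
```

Malformed entries are skipped instead of raising, so a partially valid response still yields the usable ids.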
b533a39 to
e10ac16
Human here: ready to be merged in my opinion... also happy to hear some feedback.
This PR was fully AI-generated (code, tests, documentation, and this PR description) using Claude (Anthropic) via Hermes Agent. A human reviewed and approved the intent and final result, but did not write any of the code manually.
Summary

When OPENAI_BASE_URL is set (indicating a custom OpenAI-compatible endpoint), the /models/config API endpoint now dynamically discovers available models by querying the endpoint's /v1/models API.

This enables seamless integration with self-hosted LLM servers like llama.cpp, vLLM, LocalAI, and text-generation-webui without requiring manual model configuration in generator.json.

Changes

- api/api.py: Add _discover_openai_models() async helper + modify get_model_config() to use it for the openai provider when OPENAI_BASE_URL is set
- tests/unit/test_model_discovery.py: Unit tests for discovery success, failure fallback, empty response, and URL construction
- README.md: Document custom endpoint configuration and automatic model discovery

Behavior

| Condition | Model source |
| --- | --- |
| OPENAI_BASE_URL not set | generator.json models (unchanged) |
| OPENAI_BASE_URL set + endpoint reachable | /v1/models |
| OPENAI_BASE_URL set + endpoint unreachable | generator.json |

Example
The model loaded in llama.cpp will automatically appear in the UI dropdown.
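The three cases in the Behavior section can be sketched as a single selection step. This is a minimal illustration, not the PR's code: `choose_models` is a hypothetical name, models are represented as plain strings, and treating an empty discovery result the same as an unreachable endpoint is an assumption based on the helper returning an empty list on failure.

```python
from typing import List, Optional


def choose_models(
    base_url: Optional[str],
    discovered: List[str],
    static_models: List[str],
) -> List[str]:
    """Pick the model list to serve (hypothetical helper, not from the PR)."""
    # No custom endpoint configured: behavior is unchanged.
    if not base_url:
        return static_models
    # Custom endpoint configured and discovery returned something.
    if discovered:
        return discovered
    # Discovery failed or returned nothing (assumed equivalent): fall back.
    return static_models
```

For example, `choose_models("http://localhost:8080/v1", [], ["static-model"])` returns the static list, which corresponds to the unreachable-endpoint case.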
No new dependencies
Uses aiohttp, which is already in the project's dependencies.
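The unit tests listed under Changes cover URL construction, but how the PR derives the models URL from OPENAI_BASE_URL is not shown in this thread. One plausible approach, offered only as a guess, is to tolerate base URLs given with or without a trailing `/v1`. `build_models_url` is a hypothetical name and the localhost address is a placeholder.

```python
def build_models_url(base_url: str) -> str:
    """Build the model-listing URL from a base URL.

    Hypothetical helper: the PR's actual URL logic may differ.
    """
    # Drop trailing slashes so joining does not produce "//".
    base = base_url.rstrip("/")
    # Assumption: some users include "/v1" in the base URL and some do not.
    if base.endswith("/v1"):
        return f"{base}/models"
    return f"{base}/v1/models"


if __name__ == "__main__":
    print(build_models_url("http://localhost:8080/v1/"))
    print(build_models_url("http://localhost:8080"))
```

Both calls resolve to the same `/v1/models` URL, so either way of writing the base URL would reach the same endpoint under this assumption.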