Add capability-aware routing & failover (tools/vision/reasoning/json) #33
Merged
Virtual models previously only failed over on HTTP status >= 400, so a free model that returns 200 while silently ignoring tools/function calls looked like a success and broke tool-using clients.

This adds a general model-capability framework (tools / vision / reasoning / json):

- Runtime (server.py): a capability registry with request detectors, strict detectors, and response validators. Virtual-model requests now (1) proactively prefer candidates that support the needed capabilities via a stable reorder that never drops candidates, and (2) reactively fail over when a *forced* capability isn't delivered by a 200 (tools forced but no tool_calls; JSON mode but non-JSON body). Reactive 200-body detection is non-streaming only; vision/reasoning rely on existing HTTP-error failover.
- New capability virtual endpoints: llmproxy__tools, llmproxy__tools/free, llmproxy__vision, llmproxy__vision/free (plus legacy llmproxy/ forms), advertised in /v1/models when backing candidates exist, with config hints.
- New optional model_capabilities config map (model -> [caps]), threaded through the scraper pipeline (Evidence, OpenRouter supported_parameters/input_modalities, aggregate/apply_updates/regenerate/reconcile), providers.py, config.example.json, and the setup wizard (tag/remove/view + auto-populate). Empty = full backward compat; reconcile is add-only so hand-set tags are never pruned.
- Docs: README capability section + endpoints + model_capabilities field.
- Tests: detectors, ordering, reactive failover, virtual dispatch/hints, and pipeline (OpenRouter mapping, apply_updates, add-only reconcile, wizard shape).

https://claude.ai/code/session_019YMQmPWsAUtALVqqY9FHPo
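For reviewers who want a picture of the proactive step, the following is a minimal sketch of a stable, never-dropping reorder. It is not the code in server.py: the function name `order_by_capability`, its signature, and the two-bucket ranking (fully tagged first, everything else after) are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def order_by_capability(
    candidates: List[str],
    needed: Iterable[str],
    caps_by_model: Dict[str, List[str]],
) -> List[str]:
    """Return candidates with capable models first, dropping none.

    Hypothetical helper: caps_by_model stands in for the
    model_capabilities config map (model -> [caps]).
    """
    needed_set = {cap.lower() for cap in needed}

    def rank(model: str) -> int:
        # Tags are compared case-insensitively.
        tags = {tag.lower() for tag in caps_by_model.get(model, [])}
        # 0 = tagged with every needed capability, 1 = everything else.
        return 0 if needed_set <= tags else 1

    # sorted() is stable, so the original order is preserved within
    # each bucket and no candidate is ever removed.
    return sorted(candidates, key=rank)


if __name__ == "__main__":
    caps = {"b": ["Tools"], "c": ["vision"]}
    print(order_by_capability(["a", "b", "c"], ["tools"], caps))
```

With an empty capability map every model lands in the same bucket, so the order is unchanged, which is consistent with the "Empty = full backward compat" note above.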
Summary
Adds capability-aware routing and failover to llmproxy. Requests are routed preferentially to models that support the capabilities they need (tools, vision, reasoning, json), and a request fails over to the next candidate when a model returns a 200 response that does not deliver a forced capability.
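The failover behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the logic of _proxy_cycling_non_streaming(): `try_candidates`, the injected `send` and `validate` callables, the simplified `(status, body)` response, and returning the last response when every candidate fails are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Simplified response for the sketch: (HTTP status, raw body).
Response = Tuple[int, str]


def try_candidates(
    candidates: List[str],
    send: Callable[[str], Response],
    forced: bool,
    validate: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[Response]]:
    """Try each candidate in order and return the first acceptable one.

    A response is rejected if its status is >= 400, or if a capability
    was forced and the 200 body fails the response validator.
    """
    last: Optional[Response] = None
    for model in candidates:
        last = send(model)
        status, body = last
        if status >= 400:
            continue  # existing HTTP-error failover
        if forced and not validate(body):
            continue  # 200, but the forced capability was not delivered
        return model, last
    # Assumption: when nothing succeeds, surface the last response seen.
    return None, last


if __name__ == "__main__":
    fake = {"a": (500, ""), "b": (200, "plain text"), "c": (200, "tool_calls")}
    winner, _ = try_candidates(
        ["a", "b", "c"], fake.__getitem__, True, lambda b: "tool_calls" in b
    )
    print(winner)
```

When `forced` is false the validator is never consulted, so any 200 is accepted exactly as before.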
Key Changes
Core Capability Detection & Routing
- _request_has_tools(), _tool_use_forced(), _response_has_tool_call() for tools
- _request_has_image() for vision
- _request_wants_reasoning() for reasoning
- _request_wants_json(), _response_is_json() for json
- _CAPABILITIES dict mapping each capability to its request detector, strict detector (for forced cases), and response validator
- model_capabilities config field allows tagging models with supported capabilities (case-insensitive, auto-populated from scrapers)
- _order_by_capability() reorders candidates so models supporting needed capabilities are tried first, with fallback to unknown-capability models
- _proxy_cycling_non_streaming() updated to detect when a 200 response failed to deliver a forced capability (e.g., tool_choice: "required" but no tool calls) and automatically try the next candidate

Virtual Endpoints
- _CAPABILITY_VIRTUALS constant defining capability-based virtual models: llmproxy__tools, llmproxy__vision, llmproxy__tools/free, llmproxy__vision/free (and legacy llmproxy/ forms)
- _get_capability_model_candidates() and _get_capability_free_candidates() selectors
- _get_virtual_candidates() updated to dispatch capability-based virtual models
- Capability virtual models are advertised in the /v1/models list when at least one model is tagged with that capability

Configuration & Setup
- model_capabilities field added to the config schema (optional, top-level object)
- _model_capabilities() handles missing/malformed config gracefully

Scraper Integration
- Capabilities are mapped from OpenRouter supported_parameters (tools/reasoning/structured_outputs) and architecture.input_modalities (image → vision)
- apply_updates() stores capabilities only for free-tier models (parallel to free_limits)

Testing
- tests/test_capabilities.py covering detectors, ordering, reactive failover, virtual dispatch/hints, and the scraper pipeline

Notable Implementation Details
- Response validators return True on malformed JSON or unexpected shapes, to avoid spurious failover on unparseable responses
- tool_choice: "auto" or "none" never triggers failover, even without tool calls (the model may legitimately answer without tools)
- provider/model forms are handled, matching existing model_reasoning behavior

https://claude.ai/code/session_019YMQmPWsAUtALVqqY9FHPo
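To make the first two implementation details concrete, here is a hedged sketch of a strict detector and a lenient response validator for tools. The names echo the PR's helpers, but the bodies are assumptions: they rely on the OpenAI-style chat completion shape (tool_choice in the request, choices[].message.tool_calls in the response) and treat a named-function tool_choice object as forced.

```python
import json


def tool_use_forced(request: dict) -> bool:
    """True only when the client requires a tool call.

    "auto" and "none" are not forced, so they never trigger failover.
    """
    choice = request.get("tool_choice")
    if isinstance(choice, str):
        return choice == "required"
    # Assumption: an object naming a specific function counts as forced.
    return isinstance(choice, dict)


def response_has_tool_call(body: str) -> bool:
    """Lenient validator: False only for a well-formed body without tool calls."""
    try:
        choices = json.loads(body)["choices"]
        if not isinstance(choices, list) or not choices:
            return True  # unexpected shape: do not fail over
        return any(bool(c["message"].get("tool_calls")) for c in choices)
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed JSON or an unexpected structure: report success so an
        # unparseable response does not cause a spurious failover.
        return True


if __name__ == "__main__":
    plain = json.dumps({"choices": [{"message": {"content": "hi"}}]})
    print(tool_use_forced({"tool_choice": "required"}), response_has_tool_call(plain))
```

Defaulting to True on anything unparseable errs toward keeping the current candidate, so only a clearly well-formed response without tool calls causes a retry.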