feat: robust Gemini model cascade — ListModels pruning + 404 advance#56
Merged
- Update _MODEL_CASCADE to use correct preview IDs (gemini-3-flash-preview, gemini-3.1-flash-lite-preview) based on current Gemini API model registry
- Add fetch_available_model_ids() that calls the ListModels REST endpoint at startup; _ModelCascade accepts available_model_ids and prunes entries that aren't in the account's model list before the first call
- Add ModelNotFoundEnrichError sentinel and is_model_not_found_error() to _quota.py so HTTP 404 (invalid model ID) is detected alongside 429 quota
- Wire 404 → cascade.advance() in both backfill.py and run.py: unknown or not-yet-rolled-out preview IDs burn through in one item each and land on gemini-2.5-flash, which is confirmed working
- Update sources.yaml gemini_model to gemini-3-flash-preview

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
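
The pruning step in this commit can be pictured with a small sketch. It is not the PR's `_ModelCascade`; the class name `ModelCascade`, the `DEFAULT_CASCADE` list, and the choice to keep the unpruned list when pruning would leave nothing are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional

# Hypothetical default order; the real _MODEL_CASCADE contents may differ.
DEFAULT_CASCADE: List[str] = [
    "gemini-3-flash-preview",
    "gemini-3.1-flash-lite-preview",
    "gemini-2.5-flash",
]


class ModelCascade:
    """Ordered list of model IDs with a cursor that only moves forward."""

    def __init__(
        self,
        models: Iterable[str],
        available_model_ids: Optional[Iterable[str]] = None,
    ) -> None:
        ordered = list(models)
        if available_model_ids is not None:
            available = set(available_model_ids)
            pruned = [m for m in ordered if m in available]
            # Assumption: if pruning would empty the cascade, keep the
            # original list instead of starting with nothing to try.
            ordered = pruned or ordered
        self._models = ordered
        self._idx = 0

    @property
    def current(self) -> Optional[str]:
        """Model to use right now, or None once the cascade is exhausted."""
        if self._idx < len(self._models):
            return self._models[self._idx]
        return None

    def advance(self) -> bool:
        """Step to the next model; return False when none are left."""
        if self._idx >= len(self._models):
            return False  # already exhausted, do not move the cursor
        self._idx += 1
        return self._idx < len(self._models)
```

With `available_model_ids` taken from a ListModels response, any entry the account cannot see is dropped before the first request is made, so only the surviving IDs can ever produce a 404.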
Replace the hardcoded _MODEL_CASCADE list with build_flash_model_cascade(), which queries the Gemini REST ListModels endpoint at runtime, filters to generateContent-capable flash models, and sorts them newest-first (highest major.minor, non-lite before lite, stable before experimental).

This means the cascade automatically picks up new models (e.g. Gemini 4 Flash) without any code or config changes, and never wastes an API call on a model that doesn't exist on the account. The configured gemini_model acts as the starting point; models sorted before it are dropped since they represent a newer but already-superseded capability tier.

Falls back to _FALLBACK_CASCADE (gemini-2.5-flash + lite) on any network or API error, with the configured starting_model prepended when it isn't in the fallback list.

Also fix pre-existing test failures:
- TestProcess mocks now return a (client, http_client) tuple and also patch _build_flash_model_cascade to avoid live HTTP calls in tests
- TestRateLimiter updated to import from src.pipeline._gemini (where _RateLimiter now lives) and patch the correct time module path

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
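
A minimal sketch of the ordering and fallback described in this commit, under stated assumptions: model IDs arrive without a `models/` prefix, any suffix after `-flash` or `-flash-lite` (such as `-preview`) counts as experimental, and the lister is passed in as a plain callable. The names `sort_flash_models`, `build_cascade` and `FALLBACK_CASCADE` are illustrative, not the PR's code.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Assumed fallback contents, taken from the commit text ("gemini-2.5-flash + lite").
FALLBACK_CASCADE = ["gemini-2.5-flash", "gemini-2.5-flash-lite"]

_FLASH_RE = re.compile(r"^gemini-(\d+)(?:\.(\d+))?-flash(-lite)?(.*)$")


def _flash_key(model_id: str) -> Tuple[int, int, int, int]:
    m = _FLASH_RE.match(model_id)
    if m is None:
        raise ValueError(f"not a versioned flash model: {model_id}")
    major, minor = int(m.group(1)), int(m.group(2) or 0)
    is_lite = 1 if m.group(3) else 0
    is_experimental = 1 if m.group(4) else 0  # any trailing suffix
    # Negated version numbers make the highest major.minor sort first.
    return (-major, -minor, is_lite, is_experimental)


def sort_flash_models(model_ids: Iterable[str]) -> List[str]:
    """Keep versioned flash models only, newest first."""
    return sorted((i for i in model_ids if _FLASH_RE.match(i)), key=_flash_key)


def build_cascade(
    list_model_ids: Callable[[], Iterable[str]], starting_model: str
) -> List[str]:
    try:
        ordered = sort_flash_models(list_model_ids())
    except Exception:
        # Any lister failure falls back to the static list.
        fallback = list(FALLBACK_CASCADE)
        if starting_model not in fallback:
            fallback.insert(0, starting_model)
        return fallback
    if starting_model in ordered:
        # Drop everything sorted before the configured starting point.
        ordered = ordered[ordered.index(starting_model):]
    return ordered
```

Passing the lister as a callable keeps the sketch testable without live HTTP calls, which mirrors the intent of the test fixes listed above.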
… guard

Three fixes from code review (both manual and sub-agent):
- Redact the API key (as `key=***`) in the httpx exception string before logging in build_flash_model_cascade: HTTPStatusError includes the full URL (with ?key=...) in its __str__, which would expose the API key in logs
- Fix module docstring: "three concerns" → "four concerns" (build_flash_model_cascade is the fourth, but the count line wasn't updated)
- Guard advance() against being called when already exhausted: an early return False prevents _idx from growing unboundedly and avoids repeated error-log spam on redundant advance() calls

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
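
The redaction fix can be sketched as a string-level helper. The function name `redact_api_key` and the example URL are hypothetical, and the sketch assumes the key travels as a `key` query parameter, as the commit text describes.

```python
import re

# Matches "?key=<value>" or "&key=<value>" up to the next "&", quote or space.
_KEY_PARAM_RE = re.compile(r"([?&]key=)[^&\s']+")


def redact_api_key(text: str) -> str:
    """Replace the value of any key= query parameter with ***."""
    return _KEY_PARAM_RE.sub(r"\1***", text)


# Usage: sanitise the exception text before it reaches the log.
message = (
    "Client error '403 Forbidden' for url "
    "'https://example.invalid/v1beta/models?key=SECRET123&pageSize=50'"
)
safe_message = redact_api_key(message)
```

Redacting the rendered string rather than the exception object keeps the helper independent of any particular HTTP client.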
…dence

The cascade now covers all versioned Pro and Flash models discovered via ListModels, sorted by the specified tier+version order:

  Tier 0: Pro (gemini-X.Y-pro), best capability, exhausted first
  Tier 1: Flash (gemini-X.Y-flash), balanced
  Tier 2: Flash-Lite (gemini-X.Y-flash-lite), highest daily quota, last resort

Within each tier: newest major.minor first, stable before preview/exp.

Concrete order once all models are live:
  gemini-3.1-pro → gemini-2.5-pro → gemini-3-flash → gemini-2.5-flash → gemini-3.1-flash-lite → gemini-2.5-flash-lite

The advance() trigger is daily-quota exhaustion only; RPM back-off is handled by the rate limiter, not the cascade.

Other changes:
- Remove the starting_model floor-slicing logic; the full API-discovered list is always used so no available model is silently excluded
- Rename build_flash_model_cascade → build_model_cascade (includes Pro)
- Filter: match gemini-\d+.*-(pro|flash) to exclude unversioned legacy IDs
- Expand _FALLBACK_CASCADE to include gemini-2.5-pro
- Update sources.yaml: gemini_model is now a fallback-only hint; docs updated to explain the tier ordering and cascade trigger

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
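
The tier-then-version ordering can be expressed as a single sort key. The sketch below is an illustration, not the PR's build_model_cascade: the regex is a stricter variant of the filter quoted above, the helper names are invented, and IDs are assumed to come without a `models/` prefix.

```python
import re
from typing import Iterable, List, Tuple

# "flash-lite" must precede "flash" so the longer alternative is tried first.
_MODEL_RE = re.compile(r"^gemini-(\d+)(?:\.(\d+))?-(pro|flash-lite|flash)(.*)$")
_TIER = {"pro": 0, "flash": 1, "flash-lite": 2}


def _cascade_key(model_id: str) -> Tuple[int, int, int, int]:
    m = _MODEL_RE.match(model_id)
    if m is None:
        raise ValueError(f"unversioned or unknown model: {model_id}")
    major, minor = int(m.group(1)), int(m.group(2) or 0)
    is_unstable = 1 if m.group(4) else 0  # e.g. -preview or -exp suffix
    # Tier first, then newest version, then stable before preview/exp.
    return (_TIER[m.group(3)], -major, -minor, is_unstable)


def order_cascade(model_ids: Iterable[str]) -> List[str]:
    """Drop unversioned IDs and sort the rest into cascade order."""
    return sorted((i for i in model_ids if _MODEL_RE.match(i)), key=_cascade_key)


discovered = [
    "gemini-2.5-flash-lite", "gemini-3-flash", "gemini-pro",
    "gemini-2.5-pro", "gemini-3.1-flash-lite", "gemini-2.5-flash",
    "gemini-3.1-pro",
]
ordered = order_cascade(discovered)
```

For the six versioned IDs above, `ordered` reproduces the concrete order listed in the commit message, and the unversioned `gemini-pro` is filtered out.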
…nt.models.list()

Switches from raw HTTP ListModels calls (which leaked the API key in URLs) to the authenticated genai SDK client's models.list() method: no key in a URL, no separate HTTP call, no httpx dependency for model discovery.

_DESIRED_CASCADE defines the user-specified precedence order as (fragment, fallback_rpm) tuples; resolve_cascade() matches each fragment against available model IDs with more-specific-fragment shadowing so "gemini-2.5-flash" cannot claim "gemini-2.5-flash-lite" IDs.

_ModelCascade now uses _model_rpm() to look up the verified free-tier RPM for each model in the cascade (_MODEL_RATES) rather than a single fixed default, so each model is paced correctly for its own quota pool.

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
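
One way to read "more-specific-fragment shadowing" is sketched below. Everything here is an assumption for illustration: the RPM numbers are placeholders rather than verified quotas, the tie-break (shortest matching ID wins) is a guess, and the function bodies are not the PR's resolve_cascade() or _model_rpm().

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Placeholder (fragment, fallback_rpm) pairs; the numbers are NOT real quotas.
DESIRED_CASCADE: List[Tuple[str, int]] = [
    ("gemini-2.5-pro", 5),
    ("gemini-2.5-flash", 10),
    ("gemini-2.5-flash-lite", 15),
]

# Placeholder per-model rates; a real table would hold verified values.
MODEL_RATES: Dict[str, int] = {"gemini-2.5-flash": 8}


def model_rpm(model_id: str, fallback_rpm: int) -> int:
    """Per-model rate if known, else the fragment's fallback."""
    return MODEL_RATES.get(model_id, fallback_rpm)


def _shadowed(model_id: str, fragment: str, fragments: Sequence[str]) -> bool:
    # True when a longer fragment that contains this one also matches the ID.
    return any(
        other != fragment and fragment in other and other in model_id
        for other in fragments
    )


def resolve_cascade(
    available_ids: Iterable[str],
    desired: Sequence[Tuple[str, int]] = DESIRED_CASCADE,
) -> List[Tuple[str, int]]:
    # Assumption: among matching IDs, prefer the shortest (least-suffixed).
    candidates = sorted(set(available_ids), key=lambda i: (len(i), i))
    fragments = [fragment for fragment, _ in desired]
    resolved: List[Tuple[str, int]] = []
    for fragment, fallback_rpm in desired:
        for model_id in candidates:
            if fragment in model_id and not _shadowed(model_id, fragment, fragments):
                resolved.append((model_id, model_rpm(model_id, fallback_rpm)))
                break
    return resolved
```

If only the lite variant is available, the `gemini-2.5-flash` fragment resolves to nothing instead of silently claiming `gemini-2.5-flash-lite`, so each cascade slot keeps its own pacing.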
Summary
- `_MODEL_CASCADE` now uses `gemini-3-flash-preview` and `gemini-3.1-flash-lite-preview` (the actual API identifiers per the Gemini model registry)
- `fetch_available_model_ids()` calls the ListModels REST endpoint at startup; `_ModelCascade` accepts the result and prunes entries not in the account's model list before the first call is ever made
- `ModelNotFoundEnrichError` sentinel + `is_model_not_found_error()` in `_quota.py`; both `backfill.py` and `run.py` now treat an HTTP 404 (unknown/not-yet-rolled-out model ID) the same as quota exhaustion: the cascade advances immediately rather than retrying three times and marking the item failed

Behaviour after this PR
On backfill startup the pipeline calls ListModels, prunes the cascade to only confirmed-available models, then begins enrichment. If a preview ID still 404s (e.g. not yet in the account's allowed list), it burns through in exactly one item and falls through to `gemini-2.5-flash`, which is confirmed working, so the full backfill completes.

Test plan
- … (`make check`)
- … (`make test`)
- … `gemini-2.5-flash` and enriches items

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
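
The 404 handling described under "Behaviour after this PR" can be sketched as a driver loop. This is a guess at the shape, not the code in `backfill.py` or `run.py`: the `status_code` attribute, the function names, and the decision to retry the same item on the next model (rather than defer it) are assumptions.

```python
from typing import Callable, Dict, List


def is_model_not_found(exc: Exception) -> bool:
    # Assumption: the client error exposes an integer `status_code`.
    return getattr(exc, "status_code", None) == 404


def enrich_all(
    items: List[str],
    models: List[str],
    call_model: Callable[[str, str], str],
) -> Dict[str, str]:
    """Enrich items in order, moving down `models` whenever a model 404s."""
    results: Dict[str, str] = {}
    idx = 0  # cascade cursor, shared across items so it never moves back
    for item in items:
        while idx < len(models):
            try:
                results[item] = call_model(models[idx], item)
                break  # success, go to the next item
            except Exception as exc:
                if not is_model_not_found(exc):
                    raise  # other failures are not the cascade's concern
                idx += 1  # unknown model: advance immediately, no retries
        else:
            break  # cascade exhausted, stop processing further items
    return results
```

Because the cursor is shared across items, each unknown model ID costs at most one failed call before the loop settles on a model that works.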
Generated by Claude Code