feat: robust Gemini model cascade — ListModels pruning + 404 advance by davidamitchell · Pull Request #56 · davidamitchell/Latest-developments-

davidamitchell · 2026-05-12T07:28:36Z

Summary

- Correct model IDs: _MODEL_CASCADE now uses gemini-3-flash-preview and gemini-3.1-flash-lite-preview (the actual API identifiers per the Gemini model registry)
- Dynamic model discovery: fetch_available_model_ids() calls the ListModels REST endpoint at startup; _ModelCascade accepts the result and prunes entries not in the account's model list before the first call is made
- 404 → cascade advance: new ModelNotFoundEnrichError sentinel and is_model_not_found_error() in _quota.py. Both backfill.py and run.py now treat an HTTP 404 (unknown or not-yet-rolled-out model ID) the same as quota exhaustion: the cascade advances immediately rather than retrying three times and marking the item failed

Behaviour after this PR

On backfill startup the pipeline calls ListModels, prunes the cascade to only confirmed-available models, then begins enrichment. If a preview ID still returns 404 (e.g. it is not yet in the account's allowed list), the cascade spends exactly one item on it and then falls through to gemini-2.5-flash, which is confirmed working, so the full backfill completes.
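The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: ModelNotFoundEnrichError is named in the summary, but its body here, the QuotaExhaustedError class, the ModelCascade shape, and the enrich_with_cascade() and call_model names are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


class ModelNotFoundEnrichError(Exception):
    """HTTP 404 for a model ID (name from the PR; body is illustrative)."""


class QuotaExhaustedError(Exception):
    """Hypothetical stand-in for the pipeline's quota-exhaustion error."""


@dataclass
class ModelCascade:
    """Ordered list of model IDs with a cursor (assumed shape)."""

    models: list[str]
    available: Optional[set[str]] = None
    _idx: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        # Prune entries the account cannot use before the first call is made.
        if self.available is not None:
            self.models = [m for m in self.models if m in self.available]

    @property
    def current(self) -> Optional[str]:
        return self.models[self._idx] if self._idx < len(self.models) else None

    def advance(self) -> bool:
        """Move to the next model; return False once nothing is left."""
        if self._idx >= len(self.models):
            return False  # already exhausted, do not grow the index
        self._idx += 1
        return self._idx < len(self.models)


def enrich_with_cascade(item: str, cascade: ModelCascade,
                        call_model: Callable[[str, str], str]) -> str:
    """Try the current model; on 404 or quota exhaustion, advance and retry."""
    while (model := cascade.current) is not None:
        try:
            return call_model(model, item)
        except (ModelNotFoundEnrichError, QuotaExhaustedError):
            cascade.advance()  # no per-model retries: move on immediately
    raise RuntimeError("all models in the cascade are exhausted")
```

Because the cursor lives on the cascade object, a model that returned 404 for one item is not tried again for later items in the same run.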

Test plan

- Lint passes (make check)
- Existing tests pass (make test)
- Manual: trigger backfill workflow on this branch, confirm it reaches gemini-2.5-flash and enriches items

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW

Generated by Claude Code

- Update _MODEL_CASCADE to use correct preview IDs (gemini-3-flash-preview, gemini-3.1-flash-lite-preview) based on current Gemini API model registry
- Add fetch_available_model_ids() that calls the ListModels REST endpoint at startup; _ModelCascade accepts available_model_ids and prunes entries that aren't in the account's model list before the first call
- Add ModelNotFoundEnrichError sentinel and is_model_not_found_error() to _quota.py so HTTP 404 (invalid model ID) is detected alongside 429 quota
- Wire 404 → cascade.advance() in both backfill.py and run.py: unknown or not-yet-rolled-out preview IDs burn through in one item each and land on gemini-2.5-flash, which is confirmed working
- Update sources.yaml gemini_model to gemini-3-flash-preview

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW

github-actions · 2026-05-12T07:28:47Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA a479303.

Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None


Replace the hardcoded _MODEL_CASCADE list with build_flash_model_cascade(), which queries the Gemini REST ListModels endpoint at runtime, filters to generateContent-capable flash models, and sorts them newest-first (highest major.minor, non-lite before lite, stable before experimental). This means the cascade automatically picks up new models (e.g. Gemini 4 Flash) without any code or config changes, and never wastes an API call on a model that doesn't exist on the account.

The configured gemini_model acts as the starting point; models sorted before it are dropped since they represent a newer but already-superseded capability tier. Falls back to _FALLBACK_CASCADE (gemini-2.5-flash + lite) on any network or API error, with the configured starting_model prepended when it isn't in the fallback list.

Also fix pre-existing test failures:
- TestProcess mocks now return (client, http_client) tuple and also patch _build_flash_model_cascade to avoid live HTTP calls in tests
- TestRateLimiter updated to import from src.pipeline._gemini (where _RateLimiter now lives) and patch the correct time module path

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW

… guard

Three fixes from code review (both manual and sub-agent):
- Redact `key=***` from httpx exception string before logging in build_flash_model_cascade: HTTPStatusError includes the full URL (with ?key=...) in its __str__, which would expose the API key in logs
- Fix module docstring: "three concerns" → "four concerns" (build_flash_model_cascade is the fourth, but the count line wasn't updated)
- Guard advance() against being called when already exhausted: early return False prevents _idx growing unboundedly and avoids repeat error log spam on redundant advance() calls

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
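One way to do the redaction described in the first fix is a small regex pass over the message before it reaches the logger. The redact_api_key() helper and its pattern are assumptions for illustration; the PR's actual implementation is not shown on this page.

```python
import re

# Matches a `key` query parameter and captures the "?key=" or "&key=" prefix.
_KEY_PARAM = re.compile(r"([?&]key=)[^&\s'\"]+")


def redact_api_key(message: str) -> str:
    """Replace the value of any key=... query parameter with ***."""
    return _KEY_PARAM.sub(r"\1***", message)
```

A caller would then log redact_api_key(str(exc)) instead of the raw exception string.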

…dence

The cascade now covers all versioned Pro and Flash models discovered via ListModels, sorted by the specified tier+version order:

- Tier 0, Pro (gemini-X.Y-pro): best capability, exhausted first
- Tier 1, Flash (gemini-X.Y-flash): balanced
- Tier 2, Flash-Lite (gemini-X.Y-flash-lite): highest daily quota, last resort

Within each tier: newest major.minor first, stable before preview/exp.

Concrete order once all models are live: gemini-3.1-pro → gemini-2.5-pro → gemini-3-flash → gemini-2.5-flash → gemini-3.1-flash-lite → gemini-2.5-flash-lite

The advance() trigger is daily-quota exhaustion only; RPM back-off is handled by the rate limiter, not the cascade.

Other changes:
- Remove the starting_model floor-slicing logic; the full API-discovered list is always used so no available model is silently excluded
- Rename build_flash_model_cascade → build_model_cascade (includes Pro)
- Filter: match gemini-\d+.*-(pro|flash) to exclude unversioned legacy IDs
- Expand _FALLBACK_CASCADE to include gemini-2.5-pro
- Update sources.yaml: gemini_model is now a fallback-only hint; docs updated to explain the tier ordering and cascade trigger

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
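The tier+version ordering above can be expressed as a tuple sort key. The sketch below is an assumption-laden illustration: cascade_sort_key(), order_cascade(), and the regex are hypothetical, and the pattern is stricter than the gemini-\d+.*-(pro|flash) filter quoted in the commit message.

```python
import re
from typing import Optional

# "flash-lite" is listed before "flash" so the longer tier name wins.
_VERSIONED = re.compile(r"^gemini-(\d+)(?:\.(\d+))?-(pro|flash-lite|flash)(.*)$")
_TIER = {"pro": 0, "flash": 1, "flash-lite": 2}


def cascade_sort_key(model_id: str) -> Optional[tuple]:
    """Return (tier, -major, -minor, is_unstable), or None for unversioned IDs."""
    match = _VERSIONED.match(model_id)
    if match is None:
        return None  # e.g. a legacy ID with no version number
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    suffix = match.group(4)
    is_unstable = "preview" in suffix or "exp" in suffix
    # Negated versions sort newest first; False sorts before True.
    return (_TIER[match.group(3)], -major, -minor, is_unstable)


def order_cascade(model_ids: list[str]) -> list[str]:
    """Drop unversioned IDs and sort the rest into cascade order."""
    keyed = [(cascade_sort_key(m), m) for m in model_ids]
    return [m for _, m in sorted((k, m) for k, m in keyed if k is not None)]
```

Fed the six IDs from the concrete order above in any shuffled order, order_cascade() returns them in that same sequence.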

…nt.models.list()

Switches from raw HTTP ListModels calls (which leaked the API key in URLs) to the authenticated genai SDK client's models.list() method: no key in a URL, no separate HTTP call, no httpx dependency for model discovery.

_DESIRED_CASCADE defines the user-specified precedence order as (fragment, fallback_rpm) tuples; resolve_cascade() matches each fragment against available model IDs with more-specific-fragment shadowing so "gemini-2.5-flash" cannot claim "gemini-2.5-flash-lite" IDs.

_ModelCascade now uses _model_rpm() to look up the verified free-tier RPM for each model in the cascade (_MODEL_RATES) rather than a single fixed default, so each model is paced correctly for its own quota pool.

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
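The fragment shadowing described here might look something like the sketch below. The names _DESIRED_CASCADE and resolve_cascade come from the commit message, but the signature, the RPM numbers (placeholders, not verified quotas), and the choice to take the first match in sorted order are assumptions. The available IDs are passed in as plain strings rather than fetched from the SDK.

```python
# (fragment, fallback_rpm) pairs in precedence order; RPM values are placeholders.
_DESIRED_CASCADE = [
    ("gemini-2.5-pro", 5),
    ("gemini-2.5-flash", 10),
    ("gemini-2.5-flash-lite", 15),
]


def resolve_cascade(
    available_ids: list[str],
    desired: list[tuple[str, int]] = _DESIRED_CASCADE,
) -> list[tuple[str, int]]:
    """Map each desired fragment to at most one available model ID."""
    fragments = [fragment for fragment, _ in desired]
    resolved = []
    for fragment, rpm in desired:
        # Longer fragments containing this one are more specific and shadow it.
        shadows = [f for f in fragments if f != fragment and fragment in f]
        for model_id in sorted(available_ids):
            if fragment not in model_id:
                continue
            if any(shadow in model_id for shadow in shadows):
                continue  # belongs to a more specific fragment
            resolved.append((model_id, rpm))
            break  # assumption: first match in sorted order wins
    return resolved
```

With only gemini-2.5-flash-lite available, the gemini-2.5-flash fragment resolves to nothing rather than borrowing the lite model's ID and quota pool.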

claude added 5 commits May 12, 2026 07:29

chore: update PROGRESS.md and CHANGELOG.md for PR #56

47def2b

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW

davidamitchell merged commit 6a1325b into main May 12, 2026
5 of 7 checks passed

davidamitchell deleted the claude/update-gemini-models-ANMpT branch May 12, 2026 20:38

davidamitchell merged 6 commits into main from claude/update-gemini-models-ANMpT