
feat: robust Gemini model cascade — ListModels pruning + 404 advance#56

Merged
davidamitchell merged 6 commits into main from claude/update-gemini-models-ANMpT
May 12, 2026

@davidamitchell
Owner

Summary

  • Correct model IDs: _MODEL_CASCADE now uses gemini-3-flash-preview and gemini-3.1-flash-lite-preview (the actual API identifiers per the Gemini model registry)
  • Dynamic model discovery: fetch_available_model_ids() calls the ListModels REST endpoint at startup; _ModelCascade accepts the result and prunes entries not in the account's model list before the first call is ever made
  • 404 → cascade advance: New ModelNotFoundEnrichError sentinel + is_model_not_found_error() in _quota.py; both backfill.py and run.py now treat an HTTP 404 (unknown/not-yet-rolled-out model ID) the same as quota exhaustion — the cascade advances immediately rather than retrying three times and marking the item failed

Behaviour after this PR

On backfill startup the pipeline calls ListModels, prunes the cascade to only confirmed-available models, then begins enrichment. If a preview ID still returns 404 (e.g. it is not yet in the account's allowed list), the cascade abandons it after exactly one item and falls through to gemini-2.5-flash, which is confirmed working, so the full backfill completes.
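
As a rough illustration of the behaviour described above, the sketch below prunes a cascade against a set of available model IDs and then advances past a model that still returns 404. It is not the code from this PR: `prune_cascade`, `run_backfill`, `call_model`, and the local `ModelNotFoundError` class are hypothetical names chosen for the example (the PR itself uses ModelNotFoundEnrichError in _quota.py), and the cascade order is simply the three model IDs named in this description.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Illustrative order only, built from the model IDs mentioned in this PR.
MODEL_CASCADE = [
    "gemini-3-flash-preview",
    "gemini-3.1-flash-lite-preview",
    "gemini-2.5-flash",
]


class ModelNotFoundError(Exception):
    """Hypothetical stand-in for an HTTP 404 on an unknown model ID."""


def prune_cascade(cascade: List[str], available_ids: Optional[Iterable[str]]) -> List[str]:
    """Drop cascade entries the account cannot see.

    None means the model list could not be fetched, so nothing is pruned.
    """
    if available_ids is None:
        return list(cascade)
    available = set(available_ids)
    return [model for model in cascade if model in available]


def run_backfill(
    items: Iterable[str],
    cascade: List[str],
    call_model: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Process items in order, advancing the cascade on a 404 without retrying."""
    idx = 0  # position in the cascade, shared across all items
    results = []
    for item in items:
        while True:
            if idx >= len(cascade):
                raise RuntimeError("model cascade exhausted")
            model = cascade[idx]
            try:
                results.append((model, call_model(model, item)))
                break
            except ModelNotFoundError:
                idx += 1  # the bad model is skipped for this and all later items
    return results
```

Because the index is shared across items, a model that returns 404 costs one failed call in total rather than one per item.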

Test plan

  • Lint passes (make check)
  • Existing tests pass (make test)
  • Manual: trigger backfill workflow on this branch, confirm it reaches gemini-2.5-flash and enriches items

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW


Generated by Claude Code

- Update _MODEL_CASCADE to use correct preview IDs (gemini-3-flash-preview,
  gemini-3.1-flash-lite-preview) based on current Gemini API model registry
- Add fetch_available_model_ids() that calls the ListModels REST endpoint at
  startup; _ModelCascade accepts available_model_ids and prunes entries that
  aren't in the account's model list before the first call
- Add ModelNotFoundEnrichError sentinel and is_model_not_found_error() to
  _quota.py so HTTP 404 (invalid model ID) is detected alongside 429 quota
- Wire 404 → cascade.advance() in both backfill.py and run.py — unknown or
  not-yet-rolled-out preview IDs burn through in one item each and land on
  gemini-2.5-flash which is confirmed working
- Update sources.yaml gemini_model to gemini-3-flash-preview

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
@github-actions
Contributor

github-actions Bot commented May 12, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA a479303.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

claude added 5 commits May 12, 2026 07:29
Replace the hardcoded _MODEL_CASCADE list with build_flash_model_cascade(),
which queries the Gemini REST ListModels endpoint at runtime, filters to
generateContent-capable flash models, and sorts them newest-first (highest
major.minor, non-lite before lite, stable before experimental).

This means the cascade automatically picks up new models (e.g. Gemini 4 Flash)
without any code or config changes, and never wastes an API call on a model
that doesn't exist on the account. The configured gemini_model acts as the
starting point; models sorted before it are dropped since they represent a
newer but already-superseded capability tier.

Falls back to _FALLBACK_CASCADE (gemini-2.5-flash + lite) on any network or
API error, with the configured starting_model prepended when it isn't in the
fallback list.
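
The ordering and fallback rules in this commit message could look roughly like the sketch below. It is an assumption-laden reconstruction, not the committed build_flash_model_cascade(): `list_model_ids` is a hypothetical callable that returns IDs already filtered to generateContent-capable models, the regular expression and the "preview"/"exp" substring check are guesses at how versions and stability are detected, and `build_flash_cascade` and `flash_sort_key` are names invented for the example.

```python
import re
from typing import Callable, Iterable, List, Tuple

_FALLBACK_CASCADE = ["gemini-2.5-flash", "gemini-2.5-flash-lite"]

# Assumed ID shape: gemini-<major>[.<minor>]-flash[...]
_FLASH_RE = re.compile(r"^gemini-(\d+)(?:\.(\d+))?-flash")


def flash_sort_key(model_id: str) -> Tuple[int, int, bool, bool]:
    """Newest major.minor first, non-lite before lite, stable before experimental."""
    match = _FLASH_RE.match(model_id)
    if match is None:
        raise ValueError(f"not a versioned flash model: {model_id}")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    is_lite = "-lite" in model_id
    is_unstable = "preview" in model_id or "exp" in model_id
    return (-major, -minor, is_lite, is_unstable)


def build_flash_cascade(
    list_model_ids: Callable[[], Iterable[str]],
    starting_model: str,
) -> List[str]:
    """Discover flash models, sort them, and start at the configured model."""
    try:
        ids = list(list_model_ids())
    except Exception:
        # Any network or API error: use the static list, with the configured
        # model prepended when it is not already present.
        fallback = list(_FALLBACK_CASCADE)
        if starting_model not in fallback:
            fallback.insert(0, starting_model)
        return fallback

    ordered = sorted((i for i in ids if _FLASH_RE.match(i)), key=flash_sort_key)
    if starting_model in ordered:
        # Models sorted ahead of the configured starting point are dropped.
        ordered = ordered[ordered.index(starting_model):]
    return ordered
```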

Also fix pre-existing test failures:
- TestProcess mocks now return (client, http_client) tuple and also patch
  _build_flash_model_cascade to avoid live HTTP calls in tests
- TestRateLimiter updated to import from src.pipeline._gemini (where
  _RateLimiter now lives) and patch the correct time module path

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
… guard

Three fixes from code review (both manual and sub-agent):

- Redact the API key (rewritten as `key=***`) in the httpx exception string
  before logging in build_flash_model_cascade: HTTPStatusError includes the
  full URL (with ?key=...) in its __str__, which would expose the API key in logs

- Fix module docstring: "three concerns" → "four concerns" (build_flash_model_cascade
  is the fourth, but the count line wasn't updated)

- Guard advance() against being called when already exhausted: an early
  return False prevents _idx from growing without bound and avoids repeated
  error-log spam on redundant advance() calls
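
Two of the fixes above are easy to picture in isolation. The sketch below shows one way to redact a `key=` query parameter from an error string and one way to make advance() a no-op once the cascade is exhausted. Both are illustrative assumptions: `redact_api_key`, the regular expression, and this minimal `Cascade` class are invented for the example and are not the committed code.

```python
import re
from typing import List

# Matches "?key=<value>" or "&key=<value>" up to the next "&", quote, or space.
_KEY_RE = re.compile(r"([?&]key=)[^&\s']+")


def redact_api_key(message: str) -> str:
    """Replace the value of any key= query parameter with ***."""
    return _KEY_RE.sub(r"\1***", message)


class Cascade:
    """Minimal cascade with a guarded advance()."""

    def __init__(self, models: List[str]) -> None:
        self._models = list(models)
        self._idx = 0

    @property
    def exhausted(self) -> bool:
        return self._idx >= len(self._models)

    def advance(self) -> bool:
        """Move to the next model; return False when none is left."""
        if self.exhausted:
            return False  # guard: do not grow _idx or log again
        self._idx += 1
        return not self.exhausted
```

A caller would pass the exception text through `redact_api_key(str(exc))` before handing it to the logger.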

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
…dence

The cascade now covers all versioned Pro and Flash models discovered via
ListModels, sorted by the specified tier+version order:

  Tier 0 — Pro (gemini-X.Y-pro)          best capability, exhausted first
  Tier 1 — Flash (gemini-X.Y-flash)       balanced
  Tier 2 — Flash-Lite (gemini-X.Y-flash-lite)  highest daily quota, last resort

  Within each tier: newest major.minor first, stable before preview/exp.

Concrete order once all models are live:
  gemini-3.1-pro → gemini-2.5-pro → gemini-3-flash → gemini-2.5-flash
  → gemini-3.1-flash-lite → gemini-2.5-flash-lite

The advance() trigger is daily-quota exhaustion only — RPM back-off is
handled by the rate limiter, not the cascade.
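
The tier and version ordering described above can be expressed as a single sort key. The following is a sketch under stated assumptions rather than the committed build_model_cascade(): the regular expression is a stricter guess than the filter quoted in this commit, treating a "preview" or "exp" suffix as non-stable is an assumption, and `cascade_key` and `order_models` are hypothetical names.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed ID shape: gemini-<major>[.<minor>]-<family>[-<suffix>]
# "flash-lite" is listed before "flash" so the longer family wins.
_ID_RE = re.compile(r"^gemini-(\d+)(?:\.(\d+))?-(pro|flash-lite|flash)(?:-(.+))?$")

_TIER = {"pro": 0, "flash": 1, "flash-lite": 2}


def cascade_key(model_id: str) -> Optional[Tuple[int, int, int, bool]]:
    """Return (tier, -major, -minor, unstable), or None for unversioned IDs."""
    match = _ID_RE.match(model_id)
    if match is None:
        return None
    major, minor, family, suffix = match.groups()
    unstable = suffix is not None and ("preview" in suffix or "exp" in suffix)
    return (_TIER[family], -int(major), -int(minor or 0), unstable)


def order_models(model_ids: Iterable[str]) -> List[str]:
    """Keep versioned Pro/Flash/Flash-Lite IDs and sort them into cascade order."""
    keyed = [(cascade_key(m), m) for m in model_ids]
    return [m for key, m in sorted(kv for kv in keyed if kv[0] is not None)]
```

Applied to the six stable IDs listed in the concrete order above, in any input order, this key reproduces that order and drops unversioned IDs such as a bare `gemini-pro`.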

Other changes:
- Remove the starting_model floor-slicing logic; the full API-discovered
  list is always used so no available model is silently excluded
- Rename build_flash_model_cascade → build_model_cascade (includes Pro)
- Filter: match gemini-\d+.*-(pro|flash) to exclude unversioned legacy IDs
- Expand _FALLBACK_CASCADE to include gemini-2.5-pro
- Update sources.yaml: gemini_model is now a fallback-only hint; docs
  updated to explain the tier ordering and cascade trigger

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
…nt.models.list()

Switches from raw HTTP ListModels calls (which leaked the API key in URLs)
to the authenticated genai SDK client's models.list() method — no key in a
URL, no separate HTTP call, no httpx dependency for model discovery.

_DESIRED_CASCADE defines the user-specified precedence order as
(fragment, fallback_rpm) tuples; resolve_cascade() matches each fragment
against available model IDs with more-specific-fragment shadowing so
"gemini-2.5-flash" cannot claim "gemini-2.5-flash-lite" IDs.

_ModelCascade now uses _model_rpm() to look up the verified free-tier RPM
for each model in the cascade (_MODEL_RATES) rather than a single fixed
default, so each model is paced correctly for its own quota pool.

https://claude.ai/code/session_01GssPv93FqES87fdsAbuvEW
@davidamitchell davidamitchell merged commit 6a1325b into main May 12, 2026
5 of 7 checks passed
@davidamitchell davidamitchell deleted the claude/update-gemini-models-ANMpT branch May 12, 2026 20:38