Skip to content

Harden Gemini 404 model fallback and persist pipeline output incrementally#58

Merged
davidamitchell merged 5 commits into
mainfrom
copilot/fix-pipeline-errors
May 14, 2026
Merged

Harden Gemini 404 model fallback and persist pipeline output incrementally#58
davidamitchell merged 5 commits into
mainfrom
copilot/fix-pipeline-errors

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 13, 2026

Pipeline failures were intermittently caused by Gemini 404 NOT_FOUND model errors that were not always classified as model-not-found, so cascade fallback did not reliably advance. The processing flow also behaved too all-or-nothing under interruption, increasing reprocessing risk.

  • Gemini 404 handling

    • Extended _quota.is_model_not_found_error() to detect model-not-found from textual error payloads (NOT_FOUND, embedded code: 404, model-path not-found strings), not only structured code/status_code attributes.
    • Tightened text-based matching with model-specific guards to reduce false positives from unrelated errors.
  • Incremental processed-data persistence

    • Added an optional per-item callback in process(...) and wired main() to merge/write each completed ProcessedItem immediately.
    • Callback failures are now logged at WARNING and processing continues, so persistence errors do not abort the batch.
    • Avoided redundant final full merge/write when per-item persistence succeeds for all items.
  • Workflow commit behavior on failure paths

    • Updated fetch-and-process.yml so the processed-data commit step runs with if: always().
    • Ensured git push origin HEAD always executes in that step, so previously created local commits are not stranded when the run fails.
  • Regression coverage

    • Expanded tests for model-not-found detection partitions:
      • code == 404 and status_code == 404
      • multiple supported text payload variants
      • negative cases (quota 429, generic transport errors, non-model not_found text)
    • Added callback behavior tests:
      • per-item callback invocation
      • on_item_processed=None
      • raising callback logs and processing continues
# src/pipeline/run.py
processed, ai_failures = process(
    items,
    gemini_api_key=api_key,
    fetch_date=today,
    rpm=cfg.pipeline.gemini_rpm,
    enrich_max_output_tokens=cfg.pipeline.enrich_max_output_tokens,
    gemini_model=cfg.pipeline.gemini_model,
    on_item_processed=_persist_item,
)

Copilot AI linked an issue May 13, 2026 that may be closed by this pull request
Copilot AI and others added 2 commits May 13, 2026 09:27
Copilot AI changed the title [WIP] Fix pipeline errors during AI enrichment process Harden Gemini 404 model fallback and persist pipeline output incrementally May 13, 2026
Copilot AI requested a review from davidamitchell May 13, 2026 09:32
Copy link
Copy Markdown
Owner

@davidamitchell davidamitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #58: Harden Gemini 404 model fallback + incremental persistence

The intent is sound and the three changes work together coherently. There are, however, a few correctness and resilience gaps that should be addressed before merge.


Overview

Three independent concerns addressed in one PR:

  1. is_model_not_found_error() extended to detect text-payload 404s
  2. process() gains a per-item callback for incremental persistence
  3. Workflow commit step now runs with if: always()

Each is the right call. The issues below are in the execution, not the direction.


Issues requiring changes

1. "not_found" text match is too broad — false-positive risk (_quota.py)

text = str(exc).lower()
return (
    "not_found" in text   # ← this matches ANY error containing "not_found"
    or ...
)

"not_found" will match quota errors, transport errors, or any SDK error whose message happens to include that string. A false positive here advances the model cascade when it should not — that is a silent correctness failure, not a logged one.

The more specific patterns below it ("model is not found", "models/" + "not found") already cover the real Gemini model-not-found shape. The broad "not_found" check should be tightened to something like "not_found" in text and "model" in text, or removed in favour of the already-present specific patterns.

The '"code": 404' / "'code': 404" checks have the same issue — they match any error whose serialised payload contains a 404 code, not necessarily a model-not-found 404. At minimum add "model" in text as a guard.

2. Callback exception is unhandled — a write failure aborts the whole batch (run.py)

results.append(processed)
if on_item_processed is not None:
    on_item_processed(processed)   # raises → kills the loop

If _merge_and_write fails (disk full, permissions, etc.) the exception propagates and kills the remaining items in the batch. This contradicts the project rule: "Fetcher failures for a single source must not abort the entire run." The same resilience should apply here — catch, log at WARNING, and continue.

if on_item_processed is not None:
    try:
        on_item_processed(processed)
    except Exception as exc:  # noqa: BLE001 — callback is caller-supplied; must not abort batch
        logger.warning("on_item_processed callback failed for item %s: %s", processed.id, exc)

3. Split imports from the same module will fail make check (run.py)

from src.pipeline._gemini import (_ModelCascade,)
from src.pipeline._gemini import (make_gemini_client as _make_gemini_client,)
from src.pipeline._gemini import (resolve_cascade as _resolve_cascade,)

Ruff collapses repeated imports from the same module into one block. This will be rejected by make check. Consolidate into a single from src.pipeline._gemini import (...).

4. Workflow push step not verified for if: always() (checklist item 6)

The diff only shows the commit step gaining if: always(). Per the workflow validation checklist in the project instructions:

If a Python script commits via subprocess and a later step is the only place that pushes: the later step must always push — otherwise locally-committed-but-not-pushed commits are silently discarded when the job ends.

The push step must also carry if: always() (or if: always() && steps.<commit-step>.outcome != 'skipped'). If it doesn't, partial commits made after a process failure will be lost when the job exits — exactly the scenario this PR is trying to fix.


Moderate concerns

5. Final _merge_and_write after the loop is redundant (run.py)

# In main():
on_item_processed=lambda item: _merge_and_write(out_path, [item]),  # writes N times
...
merged = _merge_and_write(out_path, processed)  # then writes all N again

If the callback ran for every item, the final call re-reads the file, re-merges all items that are already present, and re-writes. This is O(n) redundant I/O per run and will compound on large batches. Either skip the final call when a callback is set, or make the callback accumulate in memory and only flush at the end. The current behaviour is correct but wasteful.


Minor / test coverage

6. Test partitions are thin (test_pipeline_quota.py)

Per the project TDD mandate — "every function with conditional logic requires partitions for: typical value, boundary, empty/zero, and invalid/malformed" — the two tests cover only code == 404 (attribute) and one text-payload format. Missing partitions:

  • status_code == 404 attribute path
  • Negative case: a quota-exhausted 429 must not be detected as model-not-found
  • Negative case: a generic transport error must not be detected as model-not-found
  • Each text-pattern variant individually ("model is not found", models/...not found, dict-serialised "code": 404)

The negative cases are especially important given the false-positive risk in issue #1 above.

7. Callback test missing the None and raise paths (test_pipeline_run.py)

The new test confirms the happy path. Two additional partitions are needed once issue #2 is fixed:

  • on_item_processed=None — must not raise
  • Callback that raises — must log and continue, not abort the batch

What's good

  • The text-based fallback detection is the right approach — SDK error shapes vary and relying solely on .code / .status_code has already proven fragile.
  • Callable[[ProcessedItem], None] | None typing is clean and correct.
  • CHANGELOG, PROGRESS, and Mini-Retro all updated per the project mandates — good hygiene.
  • The workflow if: always() fix is the correct lever for commit-on-failure.

Summary

Issue Severity
"not_found" / '"code": 404' false-positive risk High
Callback exception unhandled — batch abort on write failure High
Split imports fail make check High
Push step not verified for if: always() High
Redundant final _merge_and_write Moderate
Thin test partitions (missing negatives) Moderate

Please address issues 1–4 before merge; 5–6 are strongly recommended.


Generated by Claude Code

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 13, 2026

Code Review — PR #58: Harden Gemini 404 model fallback + incremental persistence

The intent is sound and the three changes work together coherently. There are, however, a few correctness and res...

Addressed in commit a8197bd.

Changes made:

  • Tightened is_model_not_found_error() text-path detection with model-specific guards and added compact/space code:404 variants.
  • Added broader partition coverage in tests/test_pipeline_quota.py, including negative cases.
  • Wrapped on_item_processed callback in process() with warning-log + continue behavior.
  • Added callback partition tests in tests/test_pipeline_run.py for None and raising callback paths.
  • Reduced redundant final merge/write by skipping full merge when incremental per-item persistence succeeded.
  • Updated fetch-and-process.yml commit step to always run git push origin HEAD under if: always().

Ruff flagged the I001 import-sort issue and its formatter wrapped four
lines that exceeded the 100-char project limit. No logic changes.

https://claude.ai/code/session_01QVcqWDPsVT7EJ6LYF6bzzq
@github-actions
Copy link
Copy Markdown
Contributor

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 5733819.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Copy link
Copy Markdown
Owner

@davidamitchell davidamitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: Logic-against-intention verification + ruff fix

Checked out the branch, read the actual files (which include a8197bd — the review-feedback commit), and traced each change against its stated intent.


Fix applied: ruff line-length (commit 5733819)

ruff check --fix && ruff format wrapped four lines that exceeded the 100-char project limit inside run.py. No logic changes. All 9 test_pipeline_quota.py tests pass. The test_pipeline_run.py failures are pre-existing baseline issues (httpx, google.genai not installed in this environment) — confirmed unrelated to this PR.


Logic trace: is_model_not_found_error()

has_model_hint = "model" in text or "models/" in text
return (
    ("not_found" in text and has_model_hint)           # A: gRPC NOT_FOUND + model mention
    or "model is not found" in text                    # B: natural-language phrase
    or ("models/" in text and "not found" in text)     # C: model path format
    or (...'"code": 404'... and has_model_hint)        # D: JSON payload + model mention
)

Each pattern covers a distinct Gemini error shape:

  • A catches the gRPC NOT_FOUND status string (lowercased not_found with underscore, distinct from natural-language "not found" with a space). has_model_hint prevents matching generic resource-not-found errors.
  • B catches natural-language phrases like "model is not found" — not redundant with A because "not found" (space) ≠ "not_found" (underscore).
  • C catches the Gemini model-path format models/xxx is not found — specific and correct.
  • D catches serialised error payloads where the SDK stringifies the full JSON response into the exception message. Guarded by has_model_hint.

One residual edge case: an error like "API config for model 'x' returned code: 404" could trigger D. Theoretical false positive; in practice Gemini SDK only surfaces model-ID 404s through this path. Acceptable.


Logic trace: on_item_processed callback resilience ✅

In process(): item is appended to results before the callback fires — it is never lost from the in-memory batch even if the callback raises. The exception is caught, logged at WARNING, and swallowed — the loop continues to the next item.

In main() — the _persist_item/callback_failed design: the _persist_item closure sets callback_failed = True then re-raises; process() catches and swallows that raise. callback_failed bridges the swallowed exception back to main() to trigger the fallback _merge_and_write. Three scenarios all handled correctly:

Scenario callback_failed how merged
All writes succeed False read_processed_jsonl — no extra I/O
Write fails for item N, recovers later True _merge_and_write with full in-memory batch
All writes fail True _merge_and_write with full in-memory batch

Every processed item ends up on disk in all cases. The flag elegantly avoids a redundant write on the happy path — the concern about redundant I/O from the initial review is resolved.


Logic trace: workflow if: always()

git push origin HEAD was already outside the if git diff --cached --quiet; fi block before this PR — it always pushes unconditionally. The PR adds if: always() to the step itself so it runs even when the process job fails. Together they satisfy workflow checklist item 6. My original concern that the push might be missing if: always() was unfounded — the push was already unconditional inside the script.


All original issues resolved

Issue Status
"not_found" / '"code": 404' false-positive risk Fixed — has_model_hint guard (a8197bd)
Callback exception unhandled — batch abort on write failure Fixed — try/except in process() + callback_failed fallback (a8197bd)
Split imports failing ruff Fixed — ruff auto-format, one-alias-per-block is its canonical form (5733819)
Push step if: always() not verified Non-issue — push was already unconditional
Redundant final _merge_and_write Resolved — callback_failed skips it on the happy path
Thin test partitions (missing negatives) Fixed — 6 tests covering positives, negatives, boundary; None and raise paths for callback (a8197bd)

Logic matches intention across all three stated goals. Ready to merge.


Generated by Claude Code

@davidamitchell davidamitchell marked this pull request as ready for review May 14, 2026 08:52
@davidamitchell davidamitchell merged commit 5c4d819 into main May 14, 2026
5 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pipeline errors

3 participants