fix: switch site deployment to artifact-based Pages + path trigger #54

Merged
davidamitchell merged 8 commits into main from claude/enrich-items-ai-themes-eI58y on May 11, 2026

Conversation

@davidamitchell
Owner

No description provided.

claude added 2 commits May 10, 2026 02:26
After committing enriched data, trigger rebuild-site.yml automatically
so the site reflects the latest enrichment state without a manual step.

Requires actions:write permission (added) to dispatch the workflow.
The trigger fires on always() so it runs even if the backfill step
had errors, ensuring partial progress is visible on the site.

Also updates the comment at the top of the workflow — the manual
rebuild note is no longer accurate.
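
The dispatch described above can be sketched in Python as a call to GitHub's REST endpoint for `workflow_dispatch` events. This is only an illustration of why the `actions:write` permission is needed: the workflow itself may well use a different mechanism (for example the `gh` CLI), and `build_dispatch_request`, the owner, the repository name, and the token are all hypothetical placeholders.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(owner: str, repo: str, workflow_file: str,
                           ref: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that triggers a workflow_dispatch run.

    The token must carry actions:write, otherwise GitHub rejects the call.
    """
    url = (f"{GITHUB_API}/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/dispatches")
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage; owner, repo, and token are placeholders:
# req = build_dispatch_request("example-owner", "example-repo",
#                              "rebuild-site.yml", "main", "<token>")
# urllib.request.urlopen(req)  # GitHub answers 204 No Content on success
```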

https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
- rebuild-site.yml: replace workflow_call+commit-to-branch with
  artifact-based deploy (configure-pages / upload-pages-artifact /
  deploy-pages). Add push path trigger on data/processed/** so the
  site rebuilds automatically after every pipeline run or backfill.
  Add concurrency guard (group: pages) to prevent double deploys.

- pipeline.yml: remove site job. rebuild-site.yml is now self-triggering
  via the path filter, so the explicit workflow_call is no longer needed
  (and would fail since workflow_call was removed).

NOTE: GitHub Pages Settings must be changed to Source: "GitHub Actions"
(Settings → Pages → Source) for actions/deploy-pages to work.

https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
@github-actions
Contributor

github-actions Bot commented May 10, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 3 package(s) with unknown licenses.
See the Details below.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 60081da.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

License Issues

.github/workflows/rebuild-site.yml

| Package | Version | License | Issue Type |
| --- | --- | --- | --- |
| actions/configure-pages | 5.*.* | Null | Unknown License |
| actions/deploy-pages | 4.*.* | Null | Unknown License |
| actions/upload-pages-artifact | 3.*.* | Null | Unknown License |

OpenSSF Scorecard

| Package | Version | Score |
| --- | --- | --- |
| actions/actions/configure-pages | 5.*.* | 🟢 6.2 |

Details

| Check | Score | Reason |
| --- | --- | --- |
| Maintained | ⚠️ 1 | 2 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 1 |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Code-Review | 🟢 10 | all changesets reviewed |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Pinned-Dependencies | 🟢 6 | dependency not pinned by hash detected -- score normalized to 6 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Signed-Releases | ⚠️ -1 | no releases found |
| Security-Policy | 🟢 9 | security policy file detected |
| SAST | 🟢 7 | SAST tool detected but not run on all commits |
| Branch-Protection | 🟢 8 | branch protection is not maximal on development and all release branches |

| Package | Version | Score |
| --- | --- | --- |
| actions/actions/deploy-pages | 4.*.* | 🟢 5.4 |

Details

| Check | Score | Reason |
| --- | --- | --- |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Maintained | ⚠️ 0 | 1 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0 |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Code-Review | 🟢 10 | all changesets reviewed |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Pinned-Dependencies | 🟢 6 | dependency not pinned by hash detected -- score normalized to 6 |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Security-Policy | 🟢 9 | security policy file detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| SAST | 🟢 7 | SAST tool detected but not run on all commits |
| Branch-Protection | ⚠️ 1 | branch protection is not maximal on development and all release branches |

| Package | Version | Score |
| --- | --- | --- |
| actions/actions/upload-pages-artifact | 3.*.* | 🟢 5.8 |

Details

| Check | Score | Reason |
| --- | --- | --- |
| Maintained | 🟢 5 | 6 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 5 |
| Code-Review | 🟢 8 | Found 8/9 approved changesets -- score normalized to 8 |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Pinned-Dependencies | 🟢 3 | dependency not pinned by hash detected -- score normalized to 3 |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Security-Policy | 🟢 9 | security policy file detected |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |
| Branch-Protection | 🟢 8 | branch protection is not maximal on development and all release branches |

Scanned Files

  • .github/workflows/rebuild-site.yml

claude added 5 commits May 10, 2026 02:46
Root cause: GEMINI_API_KEY was not set as a GitHub Secret. The pipeline
silently skips enrichment (by design), the backfill exits with code 1, and
continue-on-error: true made the failure invisible — backfill showed green
with 0 items enriched.

backfill-enrichment.yml:
- Add "Verify GEMINI_API_KEY is configured" as step 1, fails immediately
  with ::error:: annotation if the secret is absent
- Remove continue-on-error: true — the if: always() on the commit step
  already preserves partial progress; failures must now be visible

fetch-and-process.yml:
- Add warning step that emits ::warning:: when GEMINI_API_KEY is absent
  so every daily pipeline run surfaces the degraded state in the UI

learnings.md: document root cause, rule, and fix class

https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
Root cause was quota exhaustion (exit code 2 swallowed by continue-on-error),
not a missing key. The backfill did enrich 3 March files before hitting a 429
on the first April file item. The non-zero exit was invisible to the user.

Remove the wrongly-added GEMINI_API_KEY verify step. Keep the
continue-on-error removal — that's the actual fix. The if: always() on the
commit step already handles partial-progress commits correctly.
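
To make the failure mode concrete, here is a minimal sketch of a backfill loop that keeps partial progress and still reports quota exhaustion through a non-zero exit code. `run_backfill` and `process_file` are hypothetical names, `QuotaExhaustedEnrichError` is a local stand-in for the pipeline's own exception, and the only detail taken from the message above is that exit code 2 signals quota exhaustion.

```python
import sys

EXIT_OK = 0
EXIT_QUOTA_EXHAUSTED = 2  # the exit code the message above says was swallowed


class QuotaExhaustedEnrichError(Exception):
    """Local stand-in for the pipeline's quota error (assumed shape)."""


def run_backfill(process_file, files):
    """Process files in order and return (files_enriched, exit_code).

    Work done before the quota error is kept, but the exit code stays
    non-zero so the CI step turns red instead of silently green.
    """
    enriched = 0
    for path in files:
        try:
            process_file(path)
        except QuotaExhaustedEnrichError:
            return enriched, EXIT_QUOTA_EXHAUSTED
        enriched += 1
    return enriched, EXIT_OK


# Hypothetical entry point:
# count, code = run_backfill(enrich_one_file, sorted_paths)
# sys.exit(code)
```

With `continue-on-error: true` on the step, the `2` returned here would never fail the job, which is why removing that flag is the relevant fix.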

https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
Switches per-item AI enrichment from gemini-2.5-flash (500 RPD, 5 RPM)
to gemini-2.0-flash (1500 RPD, 15 RPM), tripling both daily and per-minute
quota on the free tier. Rate limiter interval drops from 12s to 4s per call.
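
The 12s and 4s figures follow from dividing one minute by the per-minute quota (60 / 5 and 60 / 15). A minimal sketch of fixed-interval pacing built on that arithmetic is shown here; `FixedIntervalLimiter` and its injectable `clock` and `sleep` parameters are illustrative assumptions, not the pipeline's actual rate limiter.

```python
import time


def min_interval_seconds(rpm: int) -> float:
    """Seconds to leave between calls so that at most `rpm` calls fit in a minute."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60.0 / rpm


class FixedIntervalLimiter:
    """Blocks so that consecutive calls are at least one interval apart."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self._interval = min_interval_seconds(rpm)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous call was released

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```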

- Add gemini_model and gemini_rpm to PipelineConfig; load from sources.yaml
- Remove ThinkingConfig(thinking_budget=0) — not supported by 2.0-flash
- Thread model param through enrich(), backfill_file(), backfill_all()
- Update sources.yaml pipeline section with model + rpm defaults
- Fix test mocks: add model kwarg to all inline enrich() mock functions
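
As a rough illustration of the configuration change listed above, the sketch below loads the two new settings from an already-parsed `sources.yaml` mapping and passes the model name down as a keyword argument. The key names under `pipeline`, the defaults, and the body of `enrich()` are assumptions; the real `PipelineConfig` presumably has more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # Defaults mirror the values named in this commit message.
    gemini_model: str = "gemini-2.0-flash"
    gemini_rpm: int = 15


def load_pipeline_config(raw: dict) -> PipelineConfig:
    """Read the (assumed) `pipeline` section of a parsed sources.yaml."""
    section = raw.get("pipeline") or {}
    return PipelineConfig(
        gemini_model=str(section.get("gemini_model", PipelineConfig.gemini_model)),
        gemini_rpm=int(section.get("gemini_rpm", PipelineConfig.gemini_rpm)),
    )


def enrich(item: dict, *, model: str) -> dict:
    """Placeholder for the real enrich(); it only records which model was asked for."""
    return {**item, "model": model}
```

Test doubles for `enrich()` then need to accept the same `model` keyword, which is what the mock fix in the last bullet refers to.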

Theme clustering and email digest remain on gemini-2.5-flash (1 call/run,
quality matters more than throughput there; separate quota pool).

https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
Replaces fixed-interval pacing with header-driven adaptive pacing so the
rate limiter self-calibrates to any Gemini model without hardcoded RPM ceilings.

- _HeaderCapturingClient (httpx.Client subclass) captures x-ratelimit-*
  headers from every response via the supported HttpOptions.httpx_client hook
- _RateLimiter reads x-ratelimit-remaining-requests before each wait():
    remaining == 0  → wait for reset window (parses x-ratelimit-reset-requests)
    remaining <= 3  → triple interval to coast to window boundary
    remaining  > 3  → restore minimum interval (60 / rpm floor from config)
- _parse_reset_seconds handles bare float, "Xs" suffix, and ISO-8601 formats
- gemini_rpm in sources.yaml is now a minimum floor, not a ceiling
- Updated sources.yaml comments to list Gemini 2.5 Flash Lite, 3 Flash,
  3.1 Flash Lite as available pool (each with independent quota)
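
The pacing rules in the list above can be sketched as two small pure functions. This is an approximation written from the description only: the function names, the lowercase header dictionary, the 60-second fallback when the reset header is missing, and the choice to never go below the configured floor are all assumptions rather than the code in this PR.

```python
import re
from datetime import datetime, timezone


def parse_reset_seconds(value, now=None):
    """Parse a reset header: bare float, "Xs" suffix, or ISO-8601 timestamp.

    Returns seconds until the reset, or None if the value is not understood.
    """
    text = (value or "").strip()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)s?", text)
    if match:
        return float(match.group(1))
    try:
        when = datetime.fromisoformat(text.replace("Z", "+00:00"))
    except ValueError:
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_interval(headers, min_interval, default_reset=60.0):
    """Pick the wait before the next call from the last response's headers."""
    try:
        remaining = int(headers.get("x-ratelimit-remaining-requests"))
    except (TypeError, ValueError):
        return min_interval  # no usable header: fall back to the floor
    if remaining <= 0:
        reset = parse_reset_seconds(headers.get("x-ratelimit-reset-requests"))
        return max(min_interval, default_reset if reset is None else reset)
    if remaining <= 3:
        return min_interval * 3  # coast toward the window boundary
    return min_interval
```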

No test changes needed: _NullRateLimiter used in all unit tests; the new
http_client param is optional and defaults to None (no-op).

https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
…hausted

When a model's daily quota is exhausted the pipeline automatically advances
to the next model without stopping the run.  Two triggers for advance():

  1. QuotaExhaustedEnrichError / 429 with quota details (reliable RPD signal)
  2. x-ratelimit-remaining-*-per-day header == 0 (proactive, before the 429)

Cascade order (configured by gemini_model in sources.yaml):
  gemini-3-flash → gemini-3.1-flash-lite → gemini-2.5-flash → gemini-2.5-flash-lite
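
The second, proactive trigger can be pictured as a scan over the captured response headers. The sketch below treats the `x-ratelimit-remaining-*-per-day` pattern from the list above as a glob; the concrete header names in the usage comment are hypothetical, and skipping unparseable values is an assumption.

```python
import fnmatch

DAILY_REMAINING_PATTERN = "x-ratelimit-remaining-*-per-day"


def daily_quota_exhausted(headers) -> bool:
    """Return True if any per-day 'remaining' header has dropped to zero."""
    for name, value in headers.items():
        if not fnmatch.fnmatchcase(name.lower(), DAILY_REMAINING_PATTERN):
            continue
        try:
            if int(value) <= 0:
                return True
        except (TypeError, ValueError):
            continue  # assumption: ignore values that are not integers
    return False


# Hypothetical header names, for illustration only:
# daily_quota_exhausted({"x-ratelimit-remaining-requests-per-day": "0"})
```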

Each model in the cascade has its own _RateLimiter instance so per-minute
pacing resets cleanly on advance. Stale headers are cleared on advance so
the new model's first response is read without contamination.

_ModelCascade replaces the ai_disabled_for_quota flag in process() and the
quota_exhausted flag in backfill_file(). stop=True from backfill_file only
when the cascade is fully exhausted (all 4 models used up) or budget is hit.
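
A compact sketch of the cascade behaviour described in this message is given below. It is not the `_ModelCascade` from this PR: the class and function names, the `limiter_factory` parameter, and the `last_headers` attribute are illustrative assumptions, and `QuotaExhaustedEnrichError` is a local stand-in. Only the model order and the advance-on-quota behaviour come from the text above.

```python
MODEL_CASCADE = (
    "gemini-3-flash",
    "gemini-3.1-flash-lite",
    "gemini-2.5-flash",
    "gemini-2.5-flash-lite",
)


class QuotaExhaustedEnrichError(Exception):
    """Local stand-in for the pipeline's daily-quota error."""


class CascadeExhausted(Exception):
    """Raised once every model in the cascade has run out of quota."""


class ModelCascade:
    def __init__(self, models, limiter_factory):
        if not models:
            raise ValueError("at least one model is required")
        self._models = list(models)
        # One limiter per model, so per-minute pacing starts fresh on advance.
        self._limiters = [limiter_factory(m) for m in self._models]
        self._index = 0
        self.last_headers = {}

    @property
    def exhausted(self) -> bool:
        return self._index >= len(self._models)

    @property
    def model(self) -> str:
        if self.exhausted:
            raise CascadeExhausted("all models used up")
        return self._models[self._index]

    @property
    def limiter(self):
        if self.exhausted:
            raise CascadeExhausted("all models used up")
        return self._limiters[self._index]

    def advance(self) -> bool:
        """Switch to the next model; return False if none is left."""
        self._index = min(self._index + 1, len(self._models))
        self.last_headers.clear()  # drop stale headers from the old model
        return not self.exhausted


def call_with_cascade(cascade, call):
    """Run call(model), advancing through the cascade on quota errors."""
    while True:
        model = cascade.model  # raises CascadeExhausted when nothing is left
        try:
            return call(model)
        except QuotaExhaustedEnrichError:
            cascade.advance()
```

A caller such as `backfill_file()` would treat `CascadeExhausted` as the signal to return `stop=True`.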

Existing tests unchanged — cascade is an optional kwarg defaulting to None;
the _NullRateLimiter / model-string path still works for all unit tests.

https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
Comment thread src/pipeline/run.py Fixed
…RY/SOLID)

Before: _HeaderCapturingClient, _RateLimiter, _ModelCascade, and the client
factory were defined inline in run.py. summariser.py and themes.py constructed
genai.Client() directly — bypassing header capture entirely.

After: single canonical module (src/pipeline/_gemini.py) owns all shared Gemini
machinery. Every call site goes through make_gemini_client() so x-ratelimit-*
headers are captured universally.

- run.py: imports from _gemini, removes ~170 lines of inline definitions
- backfill.py: imports make_gemini_client from _gemini (not run)
- summariser.py: client, _ = make_gemini_client(api_key) — header capture now active
- themes.py: same — cluster_themes() now also captures rate-limit headers

Dead code removed: concept_extraction.py, media_id.py, summary_extraction.py,
theme_classification.py — replaced by enrich.py in an earlier commit, never
imported anywhere.

https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
Comment thread src/pipeline/_gemini.py
try:
    if int(val) <= 0:
        return True
except ValueError:
Comment thread src/pipeline/run.py
Comment on lines +50 to +56
from src.pipeline._gemini import (
    _HeaderCapturingClient,
    _ModelCascade,
    _MODEL_CASCADE,
    _RateLimiter,
    make_gemini_client as _make_gemini_client,
)
@davidamitchell davidamitchell merged commit b16ab10 into main May 11, 2026
5 of 7 checks passed

3 participants