fix: switch site deployment to artifact-based Pages + path trigger by davidamitchell · Pull Request #54 · davidamitchell/Latest-developments-

davidamitchell · 2026-05-10T02:43:18Z

No description provided.

After committing enriched data, trigger rebuild-site.yml automatically so the site reflects the latest enrichment state without a manual step. Dispatching a workflow requires the actions:write permission, which this commit adds. The dispatch step runs under if: always(), so it fires even if the backfill step had errors and partial progress still reaches the site. Also updates the comment at the top of the workflow, because its note about rebuilding manually is no longer accurate. https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc

- rebuild-site.yml: replace workflow_call + commit-to-branch with an artifact-based deploy (configure-pages / upload-pages-artifact / deploy-pages). Add a push path trigger on data/processed/** so the site rebuilds automatically after every pipeline run or backfill. Add a concurrency guard (group: pages) to prevent double deploys.
- pipeline.yml: remove the site job. rebuild-site.yml now triggers itself via the path filter, so the explicit workflow_call is no longer needed (and would fail, since workflow_call was removed).

NOTE: the GitHub Pages source must be changed to "GitHub Actions" (Settings → Pages → Source) for actions/deploy-pages to work.

https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
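GitHub evaluates a push trigger's paths filter against the files changed in the push. The sketch below mimics only the two wildcards this trigger relies on (** matches across "/", * stays inside one path segment) to show which pushes would start a rebuild. It is a simplified illustration, not GitHub's matcher (no ?, character classes, or ! negation), and glob_to_regex / should_rebuild are hypothetical helpers.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob into a regex (simplified: only ** and * are special)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")        # ** may cross directory separators
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # * matches within a single path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def should_rebuild(changed_paths, patterns=("data/processed/**",)) -> bool:
    """True if any changed file matches any of the path filters."""
    regexes = [glob_to_regex(p) for p in patterns]
    return any(r.match(path) for path in changed_paths for r in regexes)
```

With this filter, a commit that only touches src/ or data/raw/ would not redeploy the site, while any pipeline or backfill commit that writes under data/processed/ would.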

github-actions · 2026-05-10T02:43:30Z

Dependency Review

The following issues were found:

✅ 0 vulnerable package(s)
✅ 0 package(s) with incompatible licenses
✅ 0 package(s) with invalid SPDX license definitions
⚠️ 3 package(s) with unknown licenses.

See the Details below.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 60081da.

Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

License Issues

.github/workflows/rebuild-site.yml

Package	Version	License	Issue Type
actions/configure-pages	5.*.*	Null	Unknown License
actions/deploy-pages	4.*.*	Null	Unknown License
actions/upload-pages-artifact	3.*.*	Null	Unknown License

OpenSSF Scorecard

Package	Version	Score
actions/actions/configure-pages	5.*.*	🟢 6.2

Check	Score	Reason
Maintained	⚠️ 1	2 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 1
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Code-Review	🟢 10	all changesets reviewed
Packaging	⚠️ -1	packaging workflow not detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies	🟢 6	dependency not pinned by hash detected -- score normalized to 6
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
License	🟢 10	license file detected
Fuzzing	⚠️ 0	project is not fuzzed
Signed-Releases	⚠️ -1	no releases found
Security-Policy	🟢 9	security policy file detected
SAST	🟢 7	SAST tool detected but not run on all commits
Branch-Protection	🟢 8	branch protection is not maximal on development and all release branches

actions/actions/deploy-pages	4.*.*	🟢 5.4

Check	Score	Reason
Binary-Artifacts	🟢 10	no binaries found in the repo
Maintained	⚠️ 0	1 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Packaging	⚠️ -1	packaging workflow not detected
Code-Review	🟢 10	all changesets reviewed
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies	🟢 6	dependency not pinned by hash detected -- score normalized to 6
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Security-Policy	🟢 9	security policy file detected
Fuzzing	⚠️ 0	project is not fuzzed
SAST	🟢 7	SAST tool detected but not run on all commits
Branch-Protection	⚠️ 1	branch protection is not maximal on development and all release branches

actions/actions/upload-pages-artifact	3.*.*	🟢 5.8

Check	Score	Reason
Maintained	🟢 5	6 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 5
Code-Review	🟢 8	Found 8/9 approved changesets -- score normalized to 8
Binary-Artifacts	🟢 10	no binaries found in the repo
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Packaging	⚠️ -1	packaging workflow not detected
Pinned-Dependencies	🟢 3	dependency not pinned by hash detected -- score normalized to 3
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Security-Policy	🟢 9	security policy file detected
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0
Branch-Protection	🟢 8	branch protection is not maximal on development and all release branches

Scanned Files

.github/workflows/rebuild-site.yml

Root cause: GEMINI_API_KEY was not set as a GitHub Secret. The pipeline silently skips enrichment (by design), the backfill exits with code 1, and continue-on-error: true made the failure invisible: the backfill showed green with 0 items enriched.

backfill-enrichment.yml:
- Add "Verify GEMINI_API_KEY is configured" as step 1; it fails immediately with an ::error:: annotation if the secret is absent.
- Remove continue-on-error: true. The if: always() on the commit step already preserves partial progress; failures must now be visible.

fetch-and-process.yml:
- Add a warning step that emits ::warning:: when GEMINI_API_KEY is absent, so every daily pipeline run surfaces the degraded state in the UI.

learnings.md: document the root cause, rule, and fix class.

https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
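The ::warning:: and ::error:: annotations are GitHub Actions workflow commands: a step prints a line of the form ::warning::message or ::error::message and the runner turns it into an annotation in the run UI. A minimal Python sketch of the key check, assuming hypothetical gemini_key_annotation / main helpers (the PR's actual steps live in workflow YAML that is not shown here):

```python
import os


def gemini_key_annotation(env=None, *, strict=False):
    """Return (exit_code, annotation_line_or_None) for the GEMINI_API_KEY check.

    strict=False: warning-only, the run continues (exit code 0).
    strict=True:  hard failure, the step should exit non-zero (exit code 1).
    """
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY"):
        return 0, None  # key present: nothing to report
    level = "error" if strict else "warning"
    return (1 if strict else 0), f"::{level}::GEMINI_API_KEY is not set; AI enrichment will be skipped"


def main(strict=False) -> int:
    """Print the annotation (if any) to stdout, where the runner picks it up."""
    code, line = gemini_key_annotation(strict=strict)
    if line:
        print(line)
    return code
```

A step could end with sys.exit(main(strict=True)) for a hard failure, or call main() alone for the warning-only behaviour of the daily pipeline.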

Root cause was quota exhaustion (exit code 2, swallowed by continue-on-error), not a missing key. The backfill did enrich 3 March files before hitting a 429 while enriching the first April item. The non-zero exit was invisible to the user. Remove the wrongly added GEMINI_API_KEY verify step. Keep the continue-on-error removal; that is the actual fix. The if: always() on the commit step already handles partial-progress commits correctly. https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc

Switches per-item AI enrichment from gemini-2.5-flash (500 RPD, 5 RPM) to gemini-2.0-flash (1500 RPD, 15 RPM), tripling both the daily and per-minute quota on the free tier. The rate-limiter interval drops from 12 s to 4 s per call (60 / 5 RPM vs 60 / 15 RPM).
- Add gemini_model and gemini_rpm to PipelineConfig; load them from sources.yaml
- Remove ThinkingConfig(thinking_budget=0), which 2.0-flash does not support
- Thread the model param through enrich(), backfill_file(), backfill_all()
- Update the sources.yaml pipeline section with model + rpm defaults
- Fix test mocks: add the model kwarg to all inline enrich() mock functions

Theme clustering and the email digest remain on gemini-2.5-flash (1 call/run, where quality matters more than throughput; separate quota pool). https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
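The 12 s and 4 s figures come from spacing calls evenly at 60 / RPM seconds. A sketch of how gemini_model and gemini_rpm might sit on PipelineConfig; the defaults, the min_interval_s property, and the from_sources constructor are illustrative assumptions, and parsing the YAML file itself is left out (the function takes an already-loaded mapping):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # Hypothetical reconstruction: only the two fields named in the commit are shown.
    gemini_model: str = "gemini-2.0-flash"
    gemini_rpm: int = 15

    @property
    def min_interval_s(self) -> float:
        """Seconds between calls needed to stay at or under gemini_rpm."""
        return 60.0 / self.gemini_rpm

    @classmethod
    def from_sources(cls, sources: dict) -> "PipelineConfig":
        """Build from a parsed sources.yaml mapping, reading its 'pipeline' section."""
        section = sources.get("pipeline", {})
        defaults = cls()
        return cls(
            gemini_model=section.get("gemini_model", defaults.gemini_model),
            gemini_rpm=int(section.get("gemini_rpm", defaults.gemini_rpm)),
        )
```

Keeping the interval derived from gemini_rpm, rather than stored separately, means changing the model's RPM in one place also changes the pacing.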

Replaces fixed-interval pacing with header-driven adaptive pacing, so the rate limiter self-calibrates to any Gemini model without hardcoded RPM ceilings.
- _HeaderCapturingClient (an httpx.Client subclass) captures x-ratelimit-* headers from every response via the supported HttpOptions.httpx_client hook
- _RateLimiter reads x-ratelimit-remaining-requests before each wait():
  - remaining == 0 → wait for the reset window (parses x-ratelimit-reset-requests)
  - remaining <= 3 → triple the interval to coast to the window boundary
  - remaining > 3 → restore the minimum interval (60 / rpm floor from config)
- _parse_reset_seconds handles bare-float, "Xs"-suffix, and ISO-8601 formats
- gemini_rpm in sources.yaml is now a minimum floor, not a ceiling
- Updated sources.yaml comments to list Gemini 2.5 Flash Lite, 3 Flash, and 3.1 Flash Lite as the available pool (each with an independent quota)

No test changes needed: _NullRateLimiter is used in all unit tests, and the new http_client param is optional and defaults to None (no-op). https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
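A sketch of the reset parsing and the three-band policy described above, written as plain functions rather than the PR's _parse_reset_seconds / _RateLimiter (whose code is not shown here). Reading an ISO-8601 value as an absolute "resets at" timestamp, and falling back to the floor when no header has been seen or the reset value is unparseable, are assumptions:

```python
import re
from datetime import datetime, timezone


def parse_reset_seconds(value, now=None):
    """Convert an x-ratelimit-reset-* value to seconds from now, or None if unparseable.

    Accepts a bare number ("12"), an "Xs" suffix ("1.5s"), or an ISO-8601
    timestamp (assumed to mean "the window resets at this instant").
    """
    value = value.strip()
    try:
        return max(0.0, float(value))                      # bare float
    except ValueError:
        pass
    m = re.fullmatch(r"(\d+(?:\.\d+)?)s", value)
    if m:
        return float(m.group(1))                           # "Xs" suffix
    try:
        reset_at = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if reset_at.tzinfo is None:
        reset_at = reset_at.replace(tzinfo=timezone.utc)   # assume UTC if no offset
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())


def next_wait(remaining, reset_value, min_interval, now=None):
    """Seconds to sleep before the next call, following the three-band policy."""
    if remaining is None:
        return min_interval                                # no header yet: use the floor
    if remaining == 0:
        reset = parse_reset_seconds(reset_value, now) if reset_value else None
        return reset if reset is not None else min_interval
    if remaining <= 3:
        return 3 * min_interval                            # coast toward the window boundary
    return min_interval
```

Because min_interval is only a floor, a model with a generous quota is never slowed below 60 / gemini_rpm, while a nearly exhausted window automatically stretches the gap.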

…hausted

When a model's daily quota is exhausted, the pipeline automatically advances to the next model without stopping the run. Two triggers call advance():
1. QuotaExhaustedEnrichError / 429 with quota details (reliable RPD signal)
2. x-ratelimit-remaining-*-per-day header == 0 (proactive, before the 429)

Cascade order (configured by gemini_model in sources.yaml): gemini-3-flash → gemini-3.1-flash-lite → gemini-2.5-flash → gemini-2.5-flash-lite

Each model in the cascade has its own _RateLimiter instance, so per-minute pacing resets cleanly on advance. Stale headers are cleared on advance so the new model's first response is read without contamination.

_ModelCascade replaces the ai_disabled_for_quota flag in process() and the quota_exhausted flag in backfill_file(). backfill_file signals stop=True only when the cascade is fully exhausted (all 4 models used up) or the budget is hit.

Existing tests are unchanged: cascade is an optional kwarg defaulting to None, and the _NullRateLimiter / model-string path still works for all unit tests. https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc
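The cascade can be pictured as an index into the model list, one limiter per model, and a shared header store that is wiped on each advance. A minimal sketch; ModelCascade, _Limiter, and the shared headers dict are illustrative stand-ins for the PR's _ModelCascade / _RateLimiter, not their actual code:

```python
from dataclasses import dataclass

CASCADE = ("gemini-3-flash", "gemini-3.1-flash-lite", "gemini-2.5-flash", "gemini-2.5-flash-lite")


@dataclass
class _Limiter:
    # Stand-in for a per-model rate limiter; real pacing state would live here.
    model: str
    last_call: float = 0.0


class ModelCascade:
    """Hypothetical sketch: step through a fixed model list, one limiter per model."""

    def __init__(self, models=CASCADE, shared_headers=None):
        self._models = list(models)
        self._limiters = {m: _Limiter(m) for m in self._models}
        # Filled by the HTTP client with the latest x-ratelimit-* headers.
        self.headers = {} if shared_headers is None else shared_headers
        self._i = 0

    @property
    def exhausted(self) -> bool:
        return self._i >= len(self._models)

    @property
    def model(self):
        return None if self.exhausted else self._models[self._i]

    @property
    def limiter(self):
        return None if self.exhausted else self._limiters[self.model]

    def advance(self) -> bool:
        """Move to the next model; return False once every model is used up."""
        if not self.exhausted:
            self._i += 1
            self.headers.clear()  # the last response belonged to the previous model
        return not self.exhausted
```

In a processing loop, a quota error or a per-day header at zero would call advance(), and the run would stop only once it returns False.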

…RY/SOLID)

Before: _HeaderCapturingClient, _RateLimiter, _ModelCascade, and the client factory were defined inline in run.py. summariser.py and themes.py constructed genai.Client() directly, bypassing header capture entirely.

After: a single canonical module (src/pipeline/_gemini.py) owns all shared Gemini machinery. Every call site goes through make_gemini_client(), so x-ratelimit-* headers are captured universally.
- run.py: imports from _gemini; removes ~170 lines of inline definitions
- backfill.py: imports make_gemini_client from _gemini (not run)
- summariser.py: client, _ = make_gemini_client(api_key); header capture is now active
- themes.py: same; cluster_themes() now also captures rate-limit headers

Dead code removed: concept_extraction.py, media_id.py, summary_extraction.py, theme_classification.py. These were replaced by enrich.py in an earlier commit and never imported anywhere. https://claude.ai/code/session_01D5PMpjoQbrbgkVrk6DZaJc

+                try:
+                    if int(val) <= 0:
+                        return True
+                except ValueError:
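The fragment above is the core of trigger 2 (a per-day remaining header at zero). A self-contained sketch of the surrounding check; the function name, the case-insensitive header match, and skipping unparseable values are assumptions, since the rest of the diff hunk is not shown:

```python
def daily_quota_exhausted(headers) -> bool:
    """True if any x-ratelimit-remaining-*-per-day header reports zero or less."""
    for name, val in headers.items():
        name = name.lower()  # HTTP header names are case-insensitive
        if name.startswith("x-ratelimit-remaining-") and name.endswith("-per-day"):
            try:
                if int(val) <= 0:
                    return True
            except ValueError:
                continue  # ignore non-integer values rather than crash
    return False
```

Checking this before each call lets the cascade advance proactively instead of spending a request on a guaranteed 429.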


+from src.pipeline._gemini import (
+    _HeaderCapturingClient,
+    _ModelCascade,
+    _MODEL_CASCADE,
+    _RateLimiter,
+    make_gemini_client as _make_gemini_client,
+)


claude added 2 commits May 10, 2026 02:26

claude added 5 commits May 10, 2026 02:46

github-advanced-security AI found potential problems May 10, 2026


Comment thread on src/pipeline/run.py: Fixed

github-advanced-security AI found potential problems May 10, 2026


davidamitchell merged commit b16ab10 into main May 11, 2026
5 of 7 checks passed

davidamitchell merged 8 commits into main from claude/enrich-items-ai-themes-eI58y


3 participants
