Build: Designate a single Gradle cache writer across CI workflows #16356

Open
kevinjqliu wants to merge 4 commits into apache:main from kevinjqliu:kevinjqliu/fix-gradle-cache

@kevinjqliu
Contributor

Why

The shared Gradle caches managed by gradle/actions/setup-gradle are written by every job that doesn't explicitly opt out. With ~10 setup-gradle invocations across 12 workflows, parallel jobs race to save overlapping caches on every push to main, producing duplicated entries and accelerating LRU pressure against GitHub's 10 GB per-repo cap.

This causes cache thrashing: each commit produces ~3–4 GB of fresh per-job gradle-home-...-<sha> entries that immediately evict older entries (including hot dependency caches) under LRU. At that rate, roughly three pushes are enough to turn over the whole 10 GB allowance. The next build then misses on entries that should have been warm, re-downloads dependencies, writes new entries, and evicts again: a self-perpetuating churn loop that wastes minutes per build and keeps the cache permanently cold despite being at capacity.

What

Restrict cache writes to a single canonical job and make every other job read-only:

  • Sole writer: java-ci.yml → build-checks (17) on refs/heads/main only. This job runs ./gradlew -DallModules build and therefore resolves the union dependency closure that all other workflows need.
  • Read-only everywhere else: cache-read-only: true added to setup-gradle in java-ci.yml (3 other jobs), spark-ci.yml, flink-ci.yml, hive-ci.yml, kafka-connect-ci.yml, delta-conversion-ci.yml (×2), cve-scan.yml, api-binary-compatibility.yml, publish-snapshot.yml, publish-iceberg-rest-fixture-docker.yml, recurring-jmh-benchmarks.yml.
  • Cache-disabled: jmh-benchmarks.yml (workflow_dispatch on arbitrary repo/ref) → cache-disabled: true to avoid cache poisoning.

Read-only jobs still benefit from cache restores (including setup-gradle's restore-keys prefix walk that lets matrix variants pull a sibling job's gradle-home entry).
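
The writer/read-only split described above could look roughly like the excerpt below. This is a minimal sketch, not the actual diff: the `@v4` action refs, the matrix values, the Java distribution, and the exact form of the expression are assumptions. Only the `cache-read-only` input, the `build-checks` job name, the `./gradlew -DallModules build` command, and the "main + JVM 17" condition come from the description.

```yaml
# Hypothetical excerpt of .github/workflows/java-ci.yml (sketch only)
name: Java CI
on:
  push:
    branches: [main]
  pull_request:

jobs:
  build-checks:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        jvm: [17, 21]          # assumed matrix values
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: zulu   # assumed distribution
          java-version: ${{ matrix.jvm }}
      - uses: gradle/actions/setup-gradle@v4
        with:
          # Evaluates to false (cache is writable) only for the JVM 17 leg
          # on refs/heads/main; every other run restores but never saves.
          cache-read-only: ${{ !(github.ref == 'refs/heads/main' && matrix.jvm == 17) }}
      - run: ./gradlew -DallModules build
```

Jobs that should never write would set `cache-read-only: true` unconditionally on their own setup-gradle step.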

Validation

Validated on a fork (kevinjqliu/iceberg) across 4 rounds:

| Round | Trigger | Outcome |
| --- | --- | --- |
| R1 | Initial main push (cold) | New caches saved by build-checks (17) only |
| R2 | PR run (refs/pull/20/*) | Zero new cache entries created (PRs are read-only) |
| R3 | 2nd main push (warm) | build-checks (17) updated gradle-home and gradle-dependencies; no other writers |
| R4 | 3rd main push | All restores hit; only the single per-commit gradle-home entry created |

Job logs confirm "Cache is read-only: will not save state for use in subsequent builds." for every non-writer job, and "Saved cache entry with key gradle-home-v1|...build-checks[...]-<sha>" only from build-checks (17).
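
One way to spot-check which entries exist after each round is to list the cache from a small manually triggered workflow. The sketch below is an illustration, not part of this PR: the workflow and job names are hypothetical, and it assumes the `gh` CLI preinstalled on the runner supports `gh cache list` with the `--key` prefix filter.

```yaml
# Hypothetical helper workflow (not part of this PR)
name: List Gradle caches
on: workflow_dispatch

permissions:
  actions: read                # needed to read cache metadata

jobs:
  list-caches:
    runs-on: ubuntu-latest
    steps:
      - name: List gradle-home cache entries
        env:
          GH_TOKEN: ${{ github.token }}
        # --key filters by key prefix; --limit caps the number of rows
        run: gh cache list --repo "$GITHUB_REPOSITORY" --key gradle-home --limit 100
```

Comparing the listing before and after a PR run should show no new gradle-home entries if the read-only settings are in effect.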

Impact

  • Eliminates inter-job cache write races and duplicate entries across the matrix
  • Single deterministic write point makes cache contents predictable and debuggable
  • No build-time regression: read-only jobs still get full restore behavior
  • Reduces steady-state cache footprint and slows accumulation against the 10 GB cap

Files changed

12 workflows under .github/workflows/, +58 lines (comments + 1–2 new keys per file). No code changes, no test changes.

github-actions bot added the INFRA label on May 15, 2026

Contributor

Copilot AI left a comment

Pull request overview

This PR addresses Gradle cache thrashing in CI by designating a single canonical cache writer (java-ci.yml's build-checks (17) job on refs/heads/main) and making all other gradle/actions/setup-gradle invocations read-only (or cache-disabled for the workflow_dispatch-driven jmh-benchmarks.yml). This prevents parallel jobs from racing to save overlapping cache entries and accelerating LRU eviction against GitHub's 10 GB per-repo cache cap.

Changes:

  • Add cache-read-only: true to setup-gradle steps in 11 jobs across 10 workflows.
  • Make java-ci.yml build-checks job conditionally read-only — writes only on refs/heads/main with matrix.jvm == 17.
  • Set cache-disabled: true in jmh-benchmarks.yml to prevent cache poisoning from arbitrary repo/ref dispatch inputs.
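
For the dispatch-driven case in the last bullet, a rough sketch of the shape is shown here. The input names `repo` and `ref`, the job name, and the `@v4` refs are assumptions for illustration; only the `cache-disabled: true` setting and the fact that the workflow checks out an arbitrary repo/ref come from the PR.

```yaml
# Hypothetical excerpt of .github/workflows/jmh-benchmarks.yml (sketch only)
name: JMH Benchmarks
on:
  workflow_dispatch:
    inputs:
      repo:                    # assumed input name
        description: Repository to benchmark
        required: true
      ref:                     # assumed input name
        description: Ref to check out
        required: true

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          repository: ${{ inputs.repo }}
          ref: ${{ inputs.ref }}
      - uses: gradle/actions/setup-gradle@v4
        with:
          # Code from an arbitrary repo/ref should neither restore from
          # nor save to the shared cache.
          cache-disabled: true
```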

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| .github/workflows/java-ci.yml | Marks core-tests, build-javadoc, check-runtime-deps read-only; gates build-checks writes to main + JVM 17 |
| .github/workflows/spark-ci.yml | Adds cache-read-only: true to setup-gradle |
| .github/workflows/flink-ci.yml | Adds cache-read-only: true to setup-gradle |
| .github/workflows/hive-ci.yml | Adds cache-read-only: true to setup-gradle |
| .github/workflows/kafka-connect-ci.yml | Adds cache-read-only: true to setup-gradle |
| .github/workflows/delta-conversion-ci.yml | Adds cache-read-only: true to two scala-variant jobs |
| .github/workflows/cve-scan.yml | Adds cache-read-only: true to setup-gradle |
| .github/workflows/api-binary-compatibility.yml | Adds cache-read-only: true to setup-gradle |
| .github/workflows/publish-snapshot.yml | Adds cache-read-only: true to setup-gradle |
| .github/workflows/publish-iceberg-rest-fixture-docker.yml | Adds cache-read-only: true to setup-gradle |
| .github/workflows/recurring-jmh-benchmarks.yml | Adds cache-read-only: true to setup-gradle |
| .github/workflows/jmh-benchmarks.yml | Sets cache-disabled: true to prevent poisoning from arbitrary dispatch inputs |

Contributor

@huaxingao left a comment

LGTM

3 participants