Build: Speed up Spark CI with parallel test execution #16357

Draft
kevinjqliu wants to merge 6 commits into apache:main from kevinjqliu:kevinjqliu/parallelize-spark-ci

Conversation

@kevinjqliu (Contributor)

Reduces Spark CI wall-clock time by running tests in parallel, matching the
pattern already used in flink-ci.yml:

  • Add -DtestParallelism=auto to the ./gradlew check invocation so test
    execution scales with the runner's CPU count.
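
A minimal sketch of what the changed workflow step could look like; the step
name and Gradle project path are illustrative, only the -DtestParallelism=auto
flag is the change the PR describes:

```yaml
# Hedged sketch: module path and step name are assumptions for illustration.
- name: Run Spark tests
  run: ./gradlew -DtestParallelism=auto :iceberg-spark:check
```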


Copilot AI left a comment


Pull request overview

This PR reduces Spark CI wall-clock time by enabling Gradle test parallelism, aligning Spark CI behavior with the existing Flink CI workflow.

Changes:

  • Adds -DtestParallelism=auto to the Spark CI ./gradlew ... :check invocation to scale test execution to runner CPU capacity.


Run :iceberg-spark and :iceberg-spark-extensions (+ :iceberg-spark-runtime)
:check tasks in separate matrix jobs instead of serially within one Gradle
invocation. On the slowest matrix combo this removes ~26 min from the
critical path.
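
One way to express that split is a module axis in the job matrix, so each
:check runs in its own job; the axis name and values below are assumptions
for illustration, not the PR's exact diff:

```yaml
# Sketch of per-module matrix jobs replacing one serial Gradle invocation.
strategy:
  matrix:
    module:
      - ':iceberg-spark:check'
      - ':iceberg-spark-extensions:check :iceberg-spark-runtime:check'
steps:
  - name: Run module tests
    run: ./gradlew -DtestParallelism=auto ${{ matrix.module }}
```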

Bump -DtestParallelism from 'auto' (= ceil(nproc/2) = 2 forks on a 4 vCPU
ubuntu-24.04 runner) to nproc-1 (= 3 forks). Each Spark test fork uses
~3.16 GB heap, so 3 forks ~= 9.5 GB leaves headroom for the Gradle daemon
and Spark driver overhead inside each fork.
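
The nproc-1 value can be derived at runtime rather than hard-coded, so a
4 vCPU runner gets 3 forks; the step name and project path here are assumed:

```yaml
# Sketch: compute fork count as nproc-1 in the step's shell.
- name: Run tests with nproc-1 forks
  run: ./gradlew -DtestParallelism=$(( $(nproc) - 1 )) :iceberg-spark:check
```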

Raise max-parallel from 15 to 30 to accommodate the doubled job count
(2 modules x 12 surviving (jvm, spark, scala) combos = 24 jobs).
- fail-fast: false so an OOM/flake in one matrix combo doesn't cancel
  the other ~23 jobs and we can see whether failures are correlated.
- timeout-minutes: 60 so a hung Gradle daemon or stuck Spark driver
  bails out fast instead of waiting for the default 6 h job timeout.
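
Taken together, the job-level knobs described above could look like this
(job name illustrative):

```yaml
# Sketch of the strategy/timeout settings; only the values come from the PR.
spark-tests:
  runs-on: ubuntu-24.04
  timeout-minutes: 60      # kill hung Gradle daemons well before the 6 h default
  strategy:
    fail-fast: false       # let sibling matrix combos finish after one failure
    max-parallel: 30       # headroom for the doubled (24-job) matrix
```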

3 forks * ~3.16 GB heap (plus Spark off-heap and the 4 GB Gradle daemon)
exceeded the 16 GB ubuntu-24.04 runner ceiling and triggered the kernel
OOM-killer, surfacing as 'The runner has received a shutdown signal'.
Drop the spark-only step to 2 forks; iceberg-spark-extensions and
iceberg-spark-runtime keep nproc-1 since their forks use Gradle's default
~512 MB heap.

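A sketch of the resulting per-module fork counts (step names assumed; the
fork counts and heap figures are the ones stated above):

```yaml
# Heavy ~3.16 GB forks get 2; lighter default-heap forks keep nproc-1.
- name: iceberg-spark tests (2 forks)
  run: ./gradlew -DtestParallelism=2 :iceberg-spark:check
- name: extensions + runtime tests (nproc-1 forks)
  run: ./gradlew -DtestParallelism=$(( $(nproc) - 1 )) :iceberg-spark-extensions:check
```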
Inline a 10s-interval memory sampler inside each test step that writes
both a CSV (uploaded as the runner-monitor artifact) and per-sample
markdown rows into GITHUB_STEP_SUMMARY. A follow-up always-run step
mirrors free/ps/dmesg output and the monitor tail into its own summary
so diagnostics survive even if the test step's summary upload is lost
to a runner SIGKILL.
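
The sampler and always-run diagnostics steps described above might be shaped
like this; file names, step names, and the artifact action version are
assumptions, not the PR's exact code:

```yaml
# Sketch: 10s memory sampler writing a CSV plus markdown rows, mirrored by
# an always-run step so diagnostics survive a runner SIGKILL.
- name: Start memory sampler
  run: |
    echo "timestamp,mem_used_mb" > runner-monitor.csv
    while true; do
      used=$(free -m | awk '/^Mem:/ {print $3}')
      echo "$(date +%s),${used}" >> runner-monitor.csv
      echo "| $(date +%T) | ${used} MB |" >> "$GITHUB_STEP_SUMMARY"
      sleep 10
    done &
- name: Mirror diagnostics
  if: always()               # runs even if the test step's summary was lost
  run: |
    { free -h; ps aux --sort=-%mem | head -n 15; \
      sudo dmesg | tail -n 50; tail runner-monitor.csv; } >> "$GITHUB_STEP_SUMMARY"
- name: Upload memory samples
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: runner-monitor
    path: runner-monitor.csv
```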