
fix: repin fleet with normal push and propagate failures#220

Merged: joshua-temple merged 1 commit into main from ci/fix-fleet-repin on Jun 18, 2026

@joshua-temple

Problem

The fleet-e2e repin job had two bugs found by an end-to-end run:

  1. Push rejected. The repin ran git push --force-with-lease origin HEAD:main against each example repo, which the protected-main ruleset rejects (push declined due to repository rule violations). Write access is not the problem: the suites' own state-writes to the same main succeed via a normal push. Only the force push is blocked.
  2. False success. The rejection was swallowed: the echo after the push ran regardless, the subshell exited 0, and the job reported green while pushing nothing. Suites then validated a stale pin under a fresh-rc label.
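
The false success in item 2 can be reproduced without touching a remote. The sketch below is illustrative only: rejected_push is a hypothetical stand-in for the declined git push, and the two subshells contrast the swallowed form with one that checks the push status before the echo runs.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a push that the ruleset declines (no network).
rejected_push() {
  echo "remote: push declined due to repository rule violations" >&2
  return 1
}

# Swallowed: a subshell exits with the status of its last command, the echo.
swallowed_status=0
( rejected_push 2>/dev/null; echo "pushed" >/dev/null ) || swallowed_status=$?

# Propagated: check the push status before anything else can overwrite it.
propagated_status=0
( rejected_push 2>/dev/null || exit 1; echo "pushed" >/dev/null ) || propagated_status=$?

echo "swallowed=$swallowed_status propagated=$propagated_status"
# prints: swallowed=0 propagated=1
```

The first form is the shape that let the job report green: nothing after the failed push looked at its exit status.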

Fix

  • Replace the force push with a normal fast-forward push (git push origin HEAD:main). The repin clones fresh main, so the first push is a fast-forward and needs no force.
  • On a non-fast-forward rejection (a concurrent write landed), retry with a bound: git fetch origin main && git reset --hard origin/main, re-apply the cli_version sed + generate-workflow --force, re-commit, and push again, for up to 5 attempts. This mirrors the state-writer's commitWithApplicationRetry.
  • Propagate failures. Capture the push exit status explicitly, record per-repo failures, and exit non-zero if any repo's regen or push fails.
  • Post-push assertion. Read each repo's main cli_version back via the contents API and fail the job if it is not the rc, so a silent no-op can never report green.
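
The bounded retry in the second bullet could be sketched roughly as follows. This is not the workflow's actual script: the function names are hypothetical, reapply_and_commit is a placeholder the caller would have to define for the cli_version sed, generate-workflow --force and the re-commit, and for brevity the sketch retries on any push failure rather than only on a non-fast-forward rejection.

```shell
#!/usr/bin/env bash
# Sketch only. Function names are hypothetical, not taken from the workflow.
MAX_ATTEMPTS=5

# One attempt: a plain fast-forward push, no --force.
try_push() {
  git push origin HEAD:main
}

# Drop the local commit, take the latest main, then redo the pin on top of it.
# reapply_and_commit is a placeholder the caller must define.
resync_and_reapply() {
  git fetch origin main && git reset --hard origin/main && reapply_and_commit
}

# Push; on failure resync and try again, for at most MAX_ATTEMPTS pushes.
push_with_retry() {
  local attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    if try_push; then
      return 0
    fi
    echo "push attempt $attempt of $MAX_ATTEMPTS failed" >&2
    # Only resync if another attempt is still allowed.
    if [ "$attempt" -lt "$MAX_ATTEMPTS" ]; then
      resync_and_reapply || return 1
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

Keeping the push and the resync in separate functions means the caller always sees one exit status from push_with_retry, which is what the failure propagation in the third bullet relies on.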

Verification

  • actionlint clean.
  • Reasoned through: a rejected or failed push now records the per-repo failure, the final [ -n "$failed" ] && exit 1 turns the job red, and with it the fleet gate, so nothing is promoted on a stale fleet.
  • Maintainer CI only (no generator/CLI change). Workflow-only, so e2e is skipped and the Integration Gate passes.
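
The per-repo failure recording and final gate reasoned through above might look something like this. It is a sketch under assumptions: repin_one and the repo names are hypothetical, and the real job may structure the loop differently.

```shell
#!/usr/bin/env bash
# Sketch only. repin_one and the repo names are hypothetical.

# Placeholder for the per-repo work (clone, regen, push); returns its status.
repin_one() {
  echo "repinning $1"
}

# Run every repo, remember the ones that failed, and report a single status.
repin_all() {
  local failed="" repo
  for repo in "$@"; do
    if ! repin_one "$repo"; then
      failed="$failed $repo"
    fi
  done
  if [ -n "$failed" ]; then
    echo "repin failed for:$failed" >&2
    return 1
  fi
  return 0
}
```

A caller would end with something like repin_all example-a example-b || exit 1. The sketch uses an if block rather than a bare [ -n "$failed" ] && exit 1 because that bare list itself evaluates to non-zero when $failed is empty, which only matters if it happens to be the last command in the script.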

If a normal push is also rejected, the job now fails loudly with the exact rejection captured, signalling a genuine ruleset/write block that needs a maintainer protection bypass.
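
Returning to the post-push assertion from the Fix list, a read-back check could be sketched as below. Everything specific here is an assumption for illustration: the function names, the file path passed in, and the cli_version: <value> line format are not taken from the workflow. The fetch uses the gh CLI against the contents API with the raw media type.

```shell
#!/usr/bin/env bash
# Sketch only. The "cli_version: <value>" layout and all names are assumptions.

# Fetch a file from main via the contents API. Needs the gh CLI and a token.
fetch_main_file() {  # usage: fetch_main_file OWNER/REPO PATH
  gh api -H "Accept: application/vnd.github.raw+json" "repos/$1/contents/$2?ref=main"
}

# Extract the first pinned version from file content on stdin, minus quotes.
read_cli_version() {
  sed -n 's/^[[:space:]]*cli_version:[[:space:]]*//p' | head -n 1 | tr -d "\"'"
}

# Fail unless the version on main equals the expected rc.
assert_pinned() {  # usage: assert_pinned OWNER/REPO PATH EXPECTED
  local actual
  actual=$(fetch_main_file "$1" "$2" | read_cli_version)
  if [ "$actual" != "$3" ]; then
    echo "$1: main has cli_version '$actual', expected '$3'" >&2
    return 1
  fi
}
```

Because a failed fetch leaves the extracted value empty, it is treated as a mismatch, so an unreadable repo fails the check rather than passing it.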

The fleet-e2e repin job force-pushed to each example repo's protected main, which the ruleset rejects, and then swallowed the rejection so the job reported green while pushing nothing. Suites then validated a stale pin under a fresh-rc label.

Replace the force push with a normal fast-forward push plus a bounded fetch/reset/re-apply/retry loop, mirroring the state-writer's commitWithApplicationRetry. Capture the push exit status explicitly, record per-repo failures, and exit non-zero if any repo's regen or push fails. Add a post-push assertion that reads each repo's main cli_version back and fails the job if it is not the rc, so a silent no-op can never report green.

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
@joshua-temple joshua-temple merged commit 18b4936 into main Jun 18, 2026
13 checks passed
@joshua-temple joshua-temple deleted the ci/fix-fleet-repin branch June 18, 2026 04:15