Fix: add dry-run AzureCluster create to ensure CA bundle availability #6221

Open
mboersma wants to merge 2 commits into kubernetes-sigs:main from mboersma:fix-webhook-ca-flake

mboersma commented Apr 8, 2026

What type of PR is this?

/kind flake

What this PR does / why we need it:

Fixes a flaky e2e test failure in which the kube-apiserver has not yet picked up the updated webhook CA bundles in its informer cache, even though cert-manager's cainjector has already written them to the webhook configurations. The symptom is the well-known "x509" certificate error.

After the existing check that waits for CA bundle injection into all ValidatingWebhookConfigurations and MutatingWebhookConfigurations, this adds a dry-run AzureCluster create to verify the CAPZ mutating webhook is actually reachable end-to-end with valid TLS. This closes the race window between the CA bundle being written and the apiserver serving requests through the webhook with the new certificate.

Which issue(s) this PR fixes:
Fixes #5690 (hopefully)
See also #6144, which apparently didn't work. :-(

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

NONE

@k8s-ci-robot added the release-note-none and kind/flake labels Apr 8, 2026
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fabriziopandini for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/S and cncf-cla: yes labels Apr 8, 2026
codecov bot commented Apr 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.74%. Comparing base (d359ee9) to head (6a44fd3).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6221      +/-   ##
==========================================
+ Coverage   43.66%   43.74%   +0.08%     
==========================================
  Files         289      289              
  Lines       25495    25475      -20     
==========================================
+ Hits        11132    11145      +13     
+ Misses      13561    13527      -34     
- Partials      802      803       +1     

@mboersma changed the title from "Fix: add dry-run AzureCluster create to ensure CA bundle availability" to "[WIP] Fix: add dry-run AzureCluster create to ensure CA bundle availability" Apr 8, 2026
@k8s-ci-robot added the do-not-merge/work-in-progress label Apr 8, 2026
A validation error (Invalid/Forbidden) from the webhook proves
TLS is working end-to-end, which is all the probe needs to verify.
Only retry on errors that indicate TLS is not yet ready.
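The retry rule in this commit message can be sketched as a three-way classification. This is an illustration under stated assumptions, not the PR's implementation: `statusError`, `outcome`, and `classify` are hypothetical, the "x509:" substring check is an assumed heuristic for "TLS not yet ready", and real code would more likely use helpers such as `apierrors.IsInvalid` and `apierrors.IsForbidden` from `k8s.io/apimachinery/pkg/api/errors`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// statusError is a hypothetical, simplified stand-in for a Kubernetes API
// status error. Reason mirrors values such as "Invalid" or "Forbidden".
type statusError struct {
	Reason  string
	Message string
}

func (e *statusError) Error() string { return e.Message }

// outcome is what the probe loop should do with a given result.
type outcome int

const (
	ready outcome = iota // webhook answered, so TLS works end-to-end
	retry                // TLS is not ready yet; probe again
	fail                 // unrelated error; retrying will not help
)

// classify maps a probe error to an outcome.
func classify(err error) outcome {
	if err == nil {
		return ready
	}
	var se *statusError
	if errors.As(err, &se) && (se.Reason == "Invalid" || se.Reason == "Forbidden") {
		// A rejection can only come from the webhook itself, which means
		// the TLS handshake with it succeeded.
		return ready
	}
	if strings.Contains(err.Error(), "x509:") {
		// Assumed signal that the apiserver still has a stale CA bundle.
		return retry
	}
	return fail
}

func main() {
	invalid := &statusError{Reason: "Invalid", Message: "spec.location: Required value"}
	tlsErr := &statusError{
		Reason:  "InternalError",
		Message: "failed calling webhook: x509: certificate signed by unknown authority",
	}
	fmt.Println(classify(invalid) == ready, classify(tlsErr) == retry)
}
```

Treating a validation rejection as success keeps the probe from depending on the dry-run object being fully valid.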
@k8s-ci-robot added the size/M label and removed the size/S label Apr 9, 2026

mboersma commented Apr 9, 2026

/label tide/merge-method-squash

@k8s-ci-robot added the tide/merge-method-squash label Apr 9, 2026
@mboersma changed the title from "[WIP] Fix: add dry-run AzureCluster create to ensure CA bundle availability" to "Fix: add dry-run AzureCluster create to ensure CA bundle availability" Apr 9, 2026
@k8s-ci-robot removed the do-not-merge/work-in-progress label Apr 9, 2026

mboersma commented Apr 9, 2026

/test pull-cluster-api-provider-azure-e2e

This bug isn't deterministic, so we can't easily know if this fixes it. I'll run tests a few times and we can make a judgement call.


mboersma commented Apr 9, 2026

/test pull-cluster-api-provider-azure-e2e

No failures yet... 🤞🏻

Labels

  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • kind/flake - Categorizes issue or PR as related to a flaky test.
  • release-note-none - Denotes a PR that doesn't merit a release note.
  • size/M - Denotes a PR that changes 30-99 lines, ignoring generated files.
  • tide/merge-method-squash - Denotes a PR that should be squashed by tide when it merges.

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

Webhooks sometimes fail with certificate errors in e2e