Skip to content

feat: support config-gated in-place mutation of FirstPartyUsage IP tags#10133

Open
nilo19 wants to merge 4 commits intokubernetes-sigs:masterfrom
nilo19:worktree-iptags-mutate
Open

feat: support config-gated in-place mutation of FirstPartyUsage IP tags#10133
nilo19 wants to merge 4 commits intokubernetes-sigs:masterfrom
nilo19:worktree-iptags-mutate

Conversation

@nilo19
Copy link
Copy Markdown
Contributor

@nilo19 nilo19 commented Apr 2, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

When the service.beta.kubernetes.io/azure-pip-ip-tags annotation changes on a Service, the current code deletes the existing Public IP and creates a new one, causing unnecessary IP address changes and service disruption.

This PR adds a new config flag enableIPTagMutationForExistingPublicIP that enables in-place mutation of FirstPartyUsage IP tags on existing PIPs instead of deleting and recreating them.

Key design decisions:

  • Config-gated: Off by default — no behavior change without the config flag
  • FirstPartyUsage only: Only FirstPartyUsage-type IP tags (/NonProd, /Unprivileged) can be mutated in-place. Attempting to change other types (e.g. RoutingPreference) returns a reconciliation error
  • Original behavior preserved: When the config is off, IP tag mismatch still triggers delete-and-recreate via condition Enable e2e tests for Azure #4 in shouldReleaseExistingOwnedPublicIP

Changes:

  • Add enableIPTagMutationForExistingPublicIP bool config field in pkg/provider/config/azure.go
  • Redesign ensurePIPIPTagged as a *Cloud method returning (bool, error) — splits tags into FirstPartyUsage vs other, validates non-FPU tags, only mutates FPU in-place
  • Restore shouldReleaseExistingOwnedPublicIP condition Enable e2e tests for Azure #4 gated on !enableIPTagMutation
  • Wire ensurePIPIPTagged after the keep/delete decision in getPublicIPUpdates (not on the delete path)
  • Wire in ensurePublicIPExists before the DNS-label short-circuit

Validated against live Azure:

  • FirstPartyUsage=/NonProd/Unprivileged update: ✅ succeeds on Standard SKU zone-redundant PIPs
  • Clear FirstPartyUsage tags: ✅ succeeds
  • PIP identity (Name, IP, ResourceGUID) preserved through all mutations
  • RoutingPreference + FirstPartyUsage coexistence: ❌ rejected by Azure (MultipleIPTagWithRoutingPreferenceNotSupported)

Which issue(s) this PR fixes:

Fixes #9125

Special notes for your reviewer:

  • The ensurePIPIPTagged method is only called on PIPs that shouldReleaseExistingOwnedPublicIP decides to keep — never on the delete path
  • User-assigned PIPs are not modified (guarded by !isUserAssignedPIP)
  • Shared PIPs follow "last service wins" semantics, consistent with existing ensurePIPTagged
  • The ensurePublicIPExists call site is placed before the DNS-label short-circuit to ensure IP tags are reconciled on the early-return path
  • Test values use real Azure FirstPartyUsage values (/NonProd, /Unprivileged) validated against live Azure
  • Error assertions compare expected error messages, not just err != nil

Does this PR introduce a user-facing change?

Support config-gated in-place mutation of FirstPartyUsage IP tags on existing public IPs via `enableIPTagMutationForExistingPublicIP` config flag, avoiding unnecessary IP address changes and service disruption when the `service.beta.kubernetes.io/azure-pip-ip-tags` annotation changes.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 2, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nilo19

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions github-actions Bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Apr 2, 2026
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 2, 2026
@k8s-ci-robot k8s-ci-robot requested review from cheftako and elmiko April 2, 2026 04:35
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 2, 2026
Add enableIPTagMutationForExistingPublicIP config flag that enables
in-place mutation of FirstPartyUsage IP tags on existing public IPs
instead of deleting and recreating the PIP.

Only FirstPartyUsage-type IP tags (/NonProd, /Unprivileged) can be
mutated in-place. Attempting to change other IP tag types (e.g.
RoutingPreference) returns a reconciliation error. When the config
flag is disabled (default), the original delete-and-recreate behavior
is preserved.
@nilo19 nilo19 force-pushed the worktree-iptags-mutate branch from d9e7b85 to 29c5b54 Compare April 8, 2026 00:52
@nilo19 nilo19 changed the title feat: support in-place mutation of IP tags on existing public IPs feat: support config-gated in-place mutation of FirstPartyUsage IP tags Apr 8, 2026
@nilo19 nilo19 assigned nilo19 and unassigned nilo19 Apr 8, 2026
@nilo19 nilo19 requested a review from Copilot April 8, 2026 02:15
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a config-gated feature to avoid deleting/recreating Azure Public IPs when service.beta.kubernetes.io/azure-pip-ip-tags changes, by allowing in-place mutation of FirstPartyUsage IP tags on existing, CCM-managed PIPs.

Changes:

  • Introduces enableIPTagMutationForExistingPublicIP config to gate in-place mutation behavior (off by default).
  • Adds ensurePIPIPTagged mutation logic and wires it into the PIP reconcile flow (only for kept, non-user-assigned PIPs).
  • Adds/extends unit and e2e coverage for in-place IP tag mutation, including error cases for non-FirstPartyUsage tag changes.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
tests/e2e/network/service_annotations.go Adds an e2e that updates/clears FirstPartyUsage IP tags and verifies PIP identity is preserved; adds helper to poll/compare IP tags.
pkg/provider/config/azure.go Adds the EnableIPTagMutationForExistingPublicIP config field and documentation.
pkg/provider/azure_loadbalancer.go Gates delete/recreate behavior on the new config and adds in-place FirstPartyUsage IP tag reconciliation.
pkg/provider/azure_loadbalancer_test.go Adds unit tests covering mutation-enabled vs disabled behavior and ensures IP tag reconciliation runs before DNS short-circuit.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/e2e/network/service_annotations.go Outdated
Comment thread pkg/provider/azure_loadbalancer.go
Comment thread pkg/provider/azure_loadbalancer.go Outdated
Comment thread pkg/provider/azure_loadbalancer.go Outdated
Comment thread pkg/provider/azure_loadbalancer.go
Comment thread pkg/provider/azure_loadbalancer_test.go
nilo19 added 2 commits April 8, 2026 13:21
The existing comparator used `a < b || c < d` which is not a strict
weak ordering and can produce non-deterministic sort results. Fix to
compare by IPTagType first, then by Tag only when types are equal.
@nilo19
Copy link
Copy Markdown
Contributor Author

nilo19 commented Apr 11, 2026

/retest

Comment thread pkg/provider/azure_loadbalancer.go Outdated
@@ -3313,12 +3325,6 @@ func shouldReleaseExistingOwnedPublicIP(
// Latch some variables for readability purposes.
pipName := *existingPip.Name
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since currentIPTags is moved down. I think this can also be moved down.

Comment thread pkg/provider/azure_loadbalancer.go Outdated
return false
}

// Determine current IP tags for comparison.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the comment changed from "Assume the current IP Tags are empty by default unless properties specify otherwise."?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted.

return false, nil
}

if pip.Properties == nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe check this first before getServiceIPTagRequestForPublicIP?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored the func

return changed
}

// ensurePIPIPTagged ensures the PIP has the IP tags specified by the service annotation.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add comment about returning true if the IPTags were modified and can update in-place.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

Comment thread pkg/provider/azure_loadbalancer.go Outdated
}

// ensurePIPIPTagged ensures the PIP has the IP tags specified by the service annotation.
// When the config is off or annotation is absent, existing IP tags are preserved.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"config" is vague here and in test descriptions if I see the code outside of this PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed.

Comment thread pkg/provider/azure_loadbalancer_test.go Outdated
expectedClientGet: &deleteUnwantedPIPsAndCreateANewOneclientGet,
},
{
desc: "config on + RP mismatch should return error",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is RP and FPU, RoutingPreference and FirstPartyUsage? Should probably be spelled out in test descriptions. Maybe non-FirstPartyUsage as well for the "RP" cases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed.

Comment thread pkg/provider/azure_loadbalancer_test.go Outdated
shouldPutPIP: true,
},
{
desc: "shall update existed PIP IP tags in place before the dns short-circuit",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"before the dns short-circuit" is somewhat clear if I know the code. A more self-explanatory description would be better.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rephrased.

Comment thread pkg/provider/azure_loadbalancer_test.go Outdated
}{
{
// #1: config off — no-op regardless of annotation/tags
desc: "config off should be no-op",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments above each case are redundant with the desc field. Each case seem simple enough. Either remove the comments or merge the comment and desc if needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

Comment thread pkg/provider/azure_loadbalancer.go Outdated
desiredTag := ptr.Deref(desired.Tag, "")
found := false
for _, current := range currentOther {
if strings.EqualFold(ptr.Deref(current.IPTagType, ""), desiredType) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is using map better here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored the function.

service.Annotations = test.additionalAnnotations
mockPIPsClient := az.NetworkClientFactory.GetPublicIPAddressClient().(*mock_publicipaddressclient.MockInterface)
createOrUpdateCount := 0
if test.shouldPutPIP {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldPutPIP gates whether the mock expectation is set up at all. Setting the mock when shouldPutPIP=true with Times(1) instead of AnyTimes() would be a cleaner assertion without needing a separate counter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but that's something out of the scope of this fix, we can refactor this issue among all unit tests in the future with another change.

Comment on lines +3356 to +3358
(!enableIPTagMutation &&
ipTagRequest.IPTagsRequestedByAnnotation &&
!areIPTagsEquivalent(currentIPTags, ipTagRequest.IPTags))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ip tag mutation is enabled, current IP no tags, and request to add RoutingPreference tag type, this will be false which mean it won't be deleted. Later it will error since ensurePIPIPTagged validates only FirstPartyUsage tags can be updated in-place. Is this intended when ip tag mutation is enabled? Is there any guidance if user wants to add RoutingPreference tag type in this case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — addressed in the latest revision. We removed the client-side FirstPartyUsage validation in ensurePIPIPTagged; with mutation enabled we now PUT the requested tag set and let NRP accept or reject. For tag types NRP doesn't yet support in-place, the user gets Azure's error directly and can fall back to enableIPTagMutation=false for delete-and-recreate semantics.

}

ipTagRequest := getServiceIPTagRequestForPublicIP(service)
if !ipTagRequest.IPTagsRequestedByAnnotation {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should removing the azure-pip-ip-tags annotation have the same effect as setting it empty (i.e., "")? Currently removing it does not remove the IP tag, only setting it empty will remove the IP tag.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, should remove the tag by setting it empty, or it will be ignored.

Comment thread pkg/provider/azure_loadbalancer.go Outdated
// Validate: error if annotation explicitly requests a non-FPU tag that
// differs from the current value, or if a new non-FPU tag is requested.
pipName := ptr.Deref(pip.Name, "")
for _, desired := range desiredOther {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended?
Existing PIP: RoutingPreference=Internet
Annotation changes to: FirstPartyUsage=/NonProd
desiredOther is empty, validation passes
Merge produces [RoutingPreference=Internet, FirstPartyUsage=/NonProd]
CreateOrUpdatePIP will fail.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error returned here will trigger retry but the next retry succeeds. The PIP in Azure still only has RoutingPreference. I suspect this is because ensurePIPIPTagged modifies pip.Properties.IPTags to in-place on the cached object.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — this was indeed a problem with the merge approach. In the latest revision I've removed the FirstPartyUsage split/merge logic entirely. When EnableIPTagMutationForExistingPublicIP is on and the annotation is present, it becomes the source of truth for the full IP tag set (pip.Properties.IPTags = ipTagRequest.IPTags), and Azure NRP decides which mutations are accepted on an existing PIP. So in your scenario the resulting IPTags would be [FirstPartyUsage=/NonProd] rather than the merged pair — no client-side merge that Azure would reject. I've also dropped the client-side validation of non-FPU tag types for the same reason (forward-compat with whatever NRP chooses to allow).

For the cache-poisoning bug. Two fixes in the latest revision:

  1. listPIP now deep-copies entries before returning them (matching getPublicIPAddress), so any in-place mutation by callers can't leak back into the cache: azure_publicip_repo.go lines 159–164.
  2. ensurePIPIPTagged no longer builds a merged [non-FPU + desired FPU] slice. With mutation enabled, the annotation is the full source of truth and pip.Properties.IPTags is replaced wholesale with ipTagRequest.IPTags. So the failure mode you describe — Azure rejecting the merge, retry succeeding because the cached PIP got mutated — no longer applies, both because the merge is gone and because the cache can't be mutated by the caller.

expectedChanged: false,
expectedErrMsg: `cannot mutate IP tag type "RoutingPreference" on existing PIP "testPIP"`,
expectedIPTags: []*armnetwork.IPTag{},
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NetworkDomain + FirstPartyUsage tag type can coexist. Worth adding mixed tag types cases like:

  1. current PIP NetworkDomain, add FirstPartyUsage.
  2. current PIP NetworkDomain + FirstPartyUsage, modify FirstPartyUsage.
  3. current PIP NetworkDomain + FirstPartyUsage, remove FirstPartyUsage.
  4. current PIP NetworkDomain + FirstPartyUsage, modify NetworkDomain, error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have refactored ensurePIPIPTagged this cases are not needed.

Comment thread pkg/provider/azure_loadbalancer.go Outdated
serviceIPTagRequest serviceIPTagRequest,
serviceAnnotationRequestsNamedPublicIP,
isIPv6 bool,
ipTagRequest serviceIPTagRequest,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why rename the serviceIPTagRequest param and move its position?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted.

@nilo19 nilo19 force-pushed the worktree-iptags-mutate branch from 687c9c5 to c095504 Compare April 20, 2026 13:52
@nilo19 nilo19 force-pushed the worktree-iptags-mutate branch from c095504 to c226c11 Compare April 20, 2026 13:58
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@nilo19: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cloud-provider-azure-e2e-ccm-vmss-ip-lb-capz c226c11 link true /test pull-cloud-provider-azure-e2e-ccm-vmss-ip-lb-capz
pull-cloud-provider-azure-e2e-ccm-vmssflex-capz c226c11 link true /test pull-cloud-provider-azure-e2e-ccm-vmssflex-capz

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

// an existing PIP.
// Returns true if pip.Properties.IPTags was modified and the caller must persist
// the change via an in-place update; false if no change is needed.
func (az *Cloud) ensurePIPIPTagged(service *v1.Service, pip *armnetwork.PublicIPAddress) (bool, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error return is always nil, can be removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update pip tags as configured in service annotation instead of removing and re-creating a new pip

4 participants