feat: support config-gated in-place mutation of FirstPartyUsage IP tags#10133
feat: support config-gated in-place mutation of FirstPartyUsage IP tags#10133nilo19 wants to merge 4 commits intokubernetes-sigs:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: nilo19 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Add enableIPTagMutationForExistingPublicIP config flag that enables in-place mutation of FirstPartyUsage IP tags on existing public IPs instead of deleting and recreating the PIP. Only FirstPartyUsage-type IP tags (/NonProd, /Unprivileged) can be mutated in-place. Attempting to change other IP tag types (e.g. RoutingPreference) returns a reconciliation error. When the config flag is disabled (default), the original delete-and-recreate behavior is preserved.
d9e7b85 to
29c5b54
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds a config-gated feature to avoid deleting/recreating Azure Public IPs when service.beta.kubernetes.io/azure-pip-ip-tags changes, by allowing in-place mutation of FirstPartyUsage IP tags on existing, CCM-managed PIPs.
Changes:
- Introduces
enableIPTagMutationForExistingPublicIPconfig to gate in-place mutation behavior (off by default). - Adds
ensurePIPIPTaggedmutation logic and wires it into the PIP reconcile flow (only for kept, non-user-assigned PIPs). - Adds/extends unit and e2e coverage for in-place IP tag mutation, including error cases for non-FirstPartyUsage tag changes.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/e2e/network/service_annotations.go | Adds an e2e that updates/clears FirstPartyUsage IP tags and verifies PIP identity is preserved; adds helper to poll/compare IP tags. |
| pkg/provider/config/azure.go | Adds the EnableIPTagMutationForExistingPublicIP config field and documentation. |
| pkg/provider/azure_loadbalancer.go | Gates delete/recreate behavior on the new config and adds in-place FirstPartyUsage IP tag reconciliation. |
| pkg/provider/azure_loadbalancer_test.go | Adds unit tests covering mutation-enabled vs disabled behavior and ensures IP tag reconciliation runs before DNS short-circuit. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The existing comparator used `a < b || c < d` which is not a strict weak ordering and can produce non-deterministic sort results. Fix to compare by IPTagType first, then by Tag only when types are equal.
|
/retest |
| @@ -3313,12 +3325,6 @@ func shouldReleaseExistingOwnedPublicIP( | |||
| // Latch some variables for readability purposes. | |||
| pipName := *existingPip.Name | |||
There was a problem hiding this comment.
Since currentIPTags is moved down. I think this can also be moved down.
| return false | ||
| } | ||
|
|
||
| // Determine current IP tags for comparison. |
There was a problem hiding this comment.
Why is the comment changed from "Assume the current IP Tags are empty by default unless properties specify otherwise."?
| return false, nil | ||
| } | ||
|
|
||
| if pip.Properties == nil { |
There was a problem hiding this comment.
Nit: maybe check this first before getServiceIPTagRequestForPublicIP?
| return changed | ||
| } | ||
|
|
||
| // ensurePIPIPTagged ensures the PIP has the IP tags specified by the service annotation. |
There was a problem hiding this comment.
Maybe add comment about returning true if the IPTags were modified and can update in-place.
| } | ||
|
|
||
| // ensurePIPIPTagged ensures the PIP has the IP tags specified by the service annotation. | ||
| // When the config is off or annotation is absent, existing IP tags are preserved. |
There was a problem hiding this comment.
"config" is vague here and in test descriptions if I see the code outside of this PR.
| expectedClientGet: &deleteUnwantedPIPsAndCreateANewOneclientGet, | ||
| }, | ||
| { | ||
| desc: "config on + RP mismatch should return error", |
There was a problem hiding this comment.
is RP and FPU, RoutingPreference and FirstPartyUsage? Should probably be spelled out in test descriptions. Maybe non-FirstPartyUsage as well for the "RP" cases.
| shouldPutPIP: true, | ||
| }, | ||
| { | ||
| desc: "shall update existed PIP IP tags in place before the dns short-circuit", |
There was a problem hiding this comment.
"before the dns short-circuit" is somewhat clear if I know the code. A more self-explanatory description would be better.
| }{ | ||
| { | ||
| // #1: config off — no-op regardless of annotation/tags | ||
| desc: "config off should be no-op", |
There was a problem hiding this comment.
Comments above each case are redundant with the desc field. Each case seem simple enough. Either remove the comments or merge the comment and desc if needed.
| desiredTag := ptr.Deref(desired.Tag, "") | ||
| found := false | ||
| for _, current := range currentOther { | ||
| if strings.EqualFold(ptr.Deref(current.IPTagType, ""), desiredType) { |
There was a problem hiding this comment.
Is using map better here?
There was a problem hiding this comment.
refactored the function.
| service.Annotations = test.additionalAnnotations | ||
| mockPIPsClient := az.NetworkClientFactory.GetPublicIPAddressClient().(*mock_publicipaddressclient.MockInterface) | ||
| createOrUpdateCount := 0 | ||
| if test.shouldPutPIP { |
There was a problem hiding this comment.
shouldPutPIP gates whether the mock expectation is set up at all. Setting the mock when shouldPutPIP=true with Times(1) instead of AnyTimes() would be a cleaner assertion without needing a separate counter.
There was a problem hiding this comment.
yes, but that's something out of the scope of this fix, we can refactor this issue among all unit tests in the future with another change.
| (!enableIPTagMutation && | ||
| ipTagRequest.IPTagsRequestedByAnnotation && | ||
| !areIPTagsEquivalent(currentIPTags, ipTagRequest.IPTags)) |
There was a problem hiding this comment.
If the ip tag mutation is enabled, current IP no tags, and request to add RoutingPreference tag type, this will be false which mean it won't be deleted. Later it will error since ensurePIPIPTagged validates only FirstPartyUsage tags can be updated in-place. Is this intended when ip tag mutation is enabled? Is there any guidance if user wants to add RoutingPreference tag type in this case?
There was a problem hiding this comment.
Good catch — addressed in the latest revision. We removed the client-side FirstPartyUsage validation in ensurePIPIPTagged; with mutation enabled we now PUT the requested tag set and let NRP accept or reject. For tag types NRP doesn't yet support in-place, the user gets Azure's error directly and can fall back to enableIPTagMutation=false for delete-and-recreate semantics.
| } | ||
|
|
||
| ipTagRequest := getServiceIPTagRequestForPublicIP(service) | ||
| if !ipTagRequest.IPTagsRequestedByAnnotation { |
There was a problem hiding this comment.
Should removing the azure-pip-ip-tags annotation have the same effect as setting it empty (i.e., "")? Currently removing it does not remove the IP tag, only setting it empty will remove the IP tag.
There was a problem hiding this comment.
yes, should remove the tag by setting it empty, or it will be ignored.
| // Validate: error if annotation explicitly requests a non-FPU tag that | ||
| // differs from the current value, or if a new non-FPU tag is requested. | ||
| pipName := ptr.Deref(pip.Name, "") | ||
| for _, desired := range desiredOther { |
There was a problem hiding this comment.
Is this intended?
Existing PIP: RoutingPreference=Internet
Annotation changes to: FirstPartyUsage=/NonProd
desiredOther is empty, validation passes
Merge produces [RoutingPreference=Internet, FirstPartyUsage=/NonProd]
CreateOrUpdatePIP will fail.
There was a problem hiding this comment.
Error returned here will trigger retry but the next retry succeeds. The PIP in Azure still only has RoutingPreference. I suspect this is because ensurePIPIPTagged modifies pip.Properties.IPTags to in-place on the cached object.
There was a problem hiding this comment.
Good catch — this was indeed a problem with the merge approach. In the latest revision I've removed the FirstPartyUsage split/merge logic entirely. When EnableIPTagMutationForExistingPublicIP is on and the annotation is present, it becomes the source of truth for the full IP tag set (pip.Properties.IPTags = ipTagRequest.IPTags), and Azure NRP decides which mutations are accepted on an existing PIP. So in your scenario the resulting IPTags would be [FirstPartyUsage=/NonProd] rather than the merged pair — no client-side merge that Azure would reject. I've also dropped the client-side validation of non-FPU tag types for the same reason (forward-compat with whatever NRP chooses to allow).
For the cache-poisoning bug. Two fixes in the latest revision:
- listPIP now deep-copies entries before returning them (matching getPublicIPAddress), so any in-place mutation by callers can't leak back into the cache: azure_publicip_repo.go lines 159–164.
- ensurePIPIPTagged no longer builds a merged [non-FPU + desired FPU] slice. With mutation enabled, the annotation is the full source of truth and pip.Properties.IPTags is replaced wholesale with ipTagRequest.IPTags. So the failure mode you describe — Azure rejecting the merge, retry succeeding because the cached PIP got mutated — no longer applies, both because the merge is gone and because the cache can't be mutated by the caller.
| expectedChanged: false, | ||
| expectedErrMsg: `cannot mutate IP tag type "RoutingPreference" on existing PIP "testPIP"`, | ||
| expectedIPTags: []*armnetwork.IPTag{}, | ||
| }, |
There was a problem hiding this comment.
NetworkDomain + FirstPartyUsage tag type can coexist. Worth adding mixed tag types cases like:
- current PIP NetworkDomain, add FirstPartyUsage.
- current PIP NetworkDomain + FirstPartyUsage, modify FirstPartyUsage.
- current PIP NetworkDomain + FirstPartyUsage, remove FirstPartyUsage.
- current PIP NetworkDomain + FirstPartyUsage, modify NetworkDomain, error.
There was a problem hiding this comment.
Since we have refactored ensurePIPIPTagged this cases are not needed.
| serviceIPTagRequest serviceIPTagRequest, | ||
| serviceAnnotationRequestsNamedPublicIP, | ||
| isIPv6 bool, | ||
| ipTagRequest serviceIPTagRequest, |
There was a problem hiding this comment.
Why rename the serviceIPTagRequest param and move its position?
687c9c5 to
c095504
Compare
c095504 to
c226c11
Compare
|
@nilo19: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| // an existing PIP. | ||
| // Returns true if pip.Properties.IPTags was modified and the caller must persist | ||
| // the change via an in-place update; false if no change is needed. | ||
| func (az *Cloud) ensurePIPIPTagged(service *v1.Service, pip *armnetwork.PublicIPAddress) (bool, error) { |
There was a problem hiding this comment.
error return is always nil, can be removed.
What type of PR is this?
/kind feature
What this PR does / why we need it:
When the
service.beta.kubernetes.io/azure-pip-ip-tagsannotation changes on a Service, the current code deletes the existing Public IP and creates a new one, causing unnecessary IP address changes and service disruption.This PR adds a new config flag
enableIPTagMutationForExistingPublicIPthat enables in-place mutation ofFirstPartyUsageIP tags on existing PIPs instead of deleting and recreating them.Key design decisions:
FirstPartyUsage-type IP tags (/NonProd,/Unprivileged) can be mutated in-place. Attempting to change other types (e.g.RoutingPreference) returns a reconciliation errorshouldReleaseExistingOwnedPublicIPChanges:
enableIPTagMutationForExistingPublicIPbool config field inpkg/provider/config/azure.goensurePIPIPTaggedas a*Cloudmethod returning(bool, error)— splits tags into FirstPartyUsage vs other, validates non-FPU tags, only mutates FPU in-placeshouldReleaseExistingOwnedPublicIPcondition Enable e2e tests for Azure #4 gated on!enableIPTagMutationensurePIPIPTaggedafter the keep/delete decision ingetPublicIPUpdates(not on the delete path)ensurePublicIPExistsbefore the DNS-label short-circuitValidated against live Azure:
FirstPartyUsage=/NonProd→/Unprivilegedupdate: ✅ succeeds on Standard SKU zone-redundant PIPsRoutingPreference+FirstPartyUsagecoexistence: ❌ rejected by Azure (MultipleIPTagWithRoutingPreferenceNotSupported)Which issue(s) this PR fixes:
Fixes #9125
Special notes for your reviewer:
ensurePIPIPTaggedmethod is only called on PIPs thatshouldReleaseExistingOwnedPublicIPdecides to keep — never on the delete path!isUserAssignedPIP)ensurePIPTaggedensurePublicIPExistscall site is placed before the DNS-label short-circuit to ensure IP tags are reconciled on the early-return path/NonProd,/Unprivileged) validated against live Azureerr != nilDoes this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: