Introduce KANConfig resource for dynamic controller configuration#154
Introduce KANConfig resource for dynamic controller configuration#154SeptBlast wants to merge 2 commits intokubernetes-sigs:mainfrom
Conversation
- Added KANConfig CRD with fields for ProxyImage, WorkerCount, and AgenticIdentityTrustDomain. - Implemented KANConfig handling in the controller to apply configurations dynamically. - Updated controller to manage worker count and proxy image based on KANConfig. - Added event handlers for KANConfig to trigger updates on GatewayClasses. - Implemented tests for KANConfig application and status updates.
✅ Deploy Preview for kube-agentic-networking ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: SeptBlast The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @SeptBlast! |
|
Hi @SeptBlast. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Pull request overview
This PR introduces a KANConfig cluster-scoped custom resource and wires the controller to read configuration via GatewayClass.spec.parametersRef, enabling configuration changes without restarting the controller.
Changes:
- Add
KANConfigAPI types, CRD, generated client/informer/lister code, and deployment manifest updates. - Update the controller to resolve
GatewayClass.spec.parametersRefand applyKANConfigvalues (proxy image, worker count, trust domain) while writing backKANConfigstatus. - Add unit tests covering
applyKANConfigbehavior and related helpers.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 19 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/controller/kanconfig_test.go | Adds unit tests for applyKANConfig, mergeStringSlice, and Gateway ownership helper. |
| pkg/controller/gatewayclass.go | Resolves parametersRef, applies KANConfig, updates KANConfig status, and adds KANConfig event handlers. |
| pkg/controller/controller.go | Adds KANConfig lister/sync, removes Run worker arg, and initializes controller config defaults for KANConfig-driven config. |
| cmd/agentic-net-controller/main.go | Removes CLI flags for controller config, wires KANConfig informer, changes informer resync default, and updates controller Run call. |
| k8s/deploy/deployment.yaml | Updates RBAC for KANConfig, removes config CLI args, adds parametersRef and an example KANConfig resource. |
| k8s/crds/agentic.prototype.x-k8s.io_kanconfigs.yaml | Introduces the generated CRD for KANConfig. |
| api/v0alpha0/kanconfig_types.go | Adds Go API type definitions for KANConfig (spec/status). |
| api/v0alpha0/zz_generated.register.go | Registers KANConfig types with the scheme. |
| api/v0alpha0/zz_generated.deepcopy.go | Adds generated deep-copy implementations for KANConfig types. |
| k8s/client/listers/api/v0alpha0/kanconfig.go | Adds generated lister for KANConfig. |
| k8s/client/listers/api/v0alpha0/expansion_generated.go | Adds lister expansion interfaces for KANConfig. |
| k8s/client/informers/externalversions/generic.go | Adds generic informer mapping for kanconfigs. |
| k8s/client/informers/externalversions/api/v0alpha0/kanconfig.go | Adds generated shared informer for KANConfig. |
| k8s/client/informers/externalversions/api/v0alpha0/interface.go | Exposes KANConfigs() informer in the versioned informer interface. |
| k8s/client/clientset/versioned/typed/api/v0alpha0/kanconfig.go | Adds generated typed client for KANConfig, including UpdateStatus. |
| k8s/client/clientset/versioned/typed/api/v0alpha0/generated_expansion.go | Adds client expansion interface for KANConfig. |
| k8s/client/clientset/versioned/typed/api/v0alpha0/fake/fake_kanconfig.go | Adds fake client implementation for KANConfig. |
| k8s/client/clientset/versioned/typed/api/v0alpha0/fake/fake_api_client.go | Wires fake KANConfigs() getter into the fake API client. |
| k8s/client/clientset/versioned/typed/api/v0alpha0/api_client.go | Wires real KANConfigs() getter into the typed API client. |
Comments suppressed due to low confidence (1)
pkg/controller/controller.go:183
translator.Newis now constructed with an empty trust domain, but translator logic usesagenticIdentityTrustDomainwhen generating SPIFFE IDs for AccessPolicy. Without wiring KANConfig into the translator, policies will produce incorrect/empty SPIFFE IDs. Consider passing an initial trust domain (from a default KANConfig) and/or updating the translator whenapplyKANConfigchanges the trust domain.
c.translator = translator.New(
"",
kubeClientSet,
gwClientSet,
namespaceInformer.Lister(),
serviceInformer.Lister(),
secretInformer.Lister(),
gatewayInformer.Lister(),
httprouteInformer.Lister(),
accessPolicyInformer.Lister(),
backendInformer.Lister(),
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -117,14 +101,16 @@ func main() { | |||
| sharedGwInformers.Gateway().V1().HTTPRoutes(), | |||
| sharedGwInformers.Gateway().V1beta1().ReferenceGrants(), | |||
| sharedAgenticInformers.Agentic().V0alpha0().XBackends(), | |||
| sharedAgenticInformers.Agentic().V0alpha0().XAccessPolicies()) | |||
| sharedAgenticInformers.Agentic().V0alpha0().XAccessPolicies(), | |||
| sharedAgenticInformers.Agentic().V0alpha0().KANConfigs(), | |||
| ) | |||
There was a problem hiding this comment.
This change removes the agentic identity signer controller wiring/flags, but the new KANConfig API and the deployment manifest still include enableAgenticIdentitySigner and agenticIdentityTrustDomain. As-is, the feature is effectively disabled and the config field is unused. Either keep/start the signer controller based on KANConfig, or remove the field and manifest setting to avoid a confusing no-op.
k8s/deploy/deployment.yaml
Outdated
| resources: ["xbackends", "xaccesspolicies", "kanconfigs"] | ||
| verbs: ["get", "list", "watch", "update", "patch"] | ||
| - apiGroups: ["agentic.prototype.x-k8s.io"] |
There was a problem hiding this comment.
The ClusterRole grants update/patch on kanconfigs (spec) in addition to kanconfigs/status. The controller code only needs get/list/watch on kanconfigs plus update/patch on kanconfigs/status. Consider dropping spec mutation permissions to follow least-privilege RBAC.
| resources: ["xbackends", "xaccesspolicies", "kanconfigs"] | |
| verbs: ["get", "list", "watch", "update", "patch"] | |
| - apiGroups: ["agentic.prototype.x-k8s.io"] | |
| resources: ["xbackends", "xaccesspolicies"] | |
| verbs: ["get", "list", "watch", "update", "patch"] | |
| - apiGroups: ["agentic.prototype.x-k8s.io"] | |
| resources: ["kanconfigs"] | |
| verbs: ["get", "list", "watch"] | |
| - apiGroups: ["agentic.prototype.x-k8s.io"] |
k8s/deploy/deployment.yaml
Outdated
| apiVersion: agentic.prototype.x-k8s.io/v0alpha0 | ||
| kind: KANConfig | ||
| metadata: | ||
| name: kube-agentic-networking-config | ||
| spec: | ||
| proxyImage: envoyproxy/envoy:v1.36-latest | ||
| workerCount: 6 | ||
| agenticIdentityTrustDomain: cluster.local | ||
| enableAgenticIdentitySigner: true |
There was a problem hiding this comment.
The manifest sets spec.enableAgenticIdentitySigner: true in the KANConfig, but the controller binary no longer starts the signer controller and there is no handling of this field in the controller. This makes the manifest configuration a no-op and may be misleading for operators. Either wire this field into behavior (start/enable the signer) or remove it from the CR/spec in the deployment example.
| func (c *Controller) applyKANConfig(cfg *v0alpha0.KANConfig, referencingGatewayClass string) { | ||
| if cfg.Spec.ProxyImage != "" { | ||
| c.envoyImage = cfg.Spec.ProxyImage | ||
| } | ||
| if cfg.Spec.AgenticIdentityTrustDomain != "" { | ||
| c.agenticIdentityTrustDomain = cfg.Spec.AgenticIdentityTrustDomain | ||
| } | ||
| if cfg.Spec.WorkerCount > 0 { | ||
| c.workerCount = cfg.Spec.WorkerCount | ||
| } |
There was a problem hiding this comment.
applyKANConfig mutates c.envoyImage, c.agenticIdentityTrustDomain, and c.workerCount while Gateway reconcile workers concurrently read those fields (e.g., in syncHandler). This introduces a real data race under the Go race detector. Consider storing config in an immutable struct behind an atomic.Value (or protecting reads/writes with a mutex), and always read a snapshot inside reconciles.
pkg/controller/gatewayclass.go
Outdated
| newCfg := cfg.DeepCopy() | ||
| newCfg.Status.ObservedGeneration = cfg.Generation | ||
| newCfg.Status.ActiveWorkerCount = c.workerCount | ||
| newCfg.Status.ReferencedBy = mergeStringSlice(cfg.Status.ReferencedBy, referencingGatewayClass) |
There was a problem hiding this comment.
mergeStringSlice(cfg.Status.ReferencedBy, ...) can mutate the backing array of cfg.Status.ReferencedBy when it appends (if the slice has spare capacity). Since cfg is from a lister/informer cache and should be treated read-only, this can accidentally mutate the cached object. Consider copying the slice before appending (e.g., append([]string(nil), cfg.Status.ReferencedBy...)) or changing mergeStringSlice to always return a new slice.
| newCfg.Status.ReferencedBy = mergeStringSlice(cfg.Status.ReferencedBy, referencingGatewayClass) | |
| referencedBy := append([]string(nil), cfg.Status.ReferencedBy...) | |
| newCfg.Status.ReferencedBy = mergeStringSlice(referencedBy, referencingGatewayClass) |
pkg/controller/gatewayclass.go
Outdated
| func (c *Controller) applyKANConfig(cfg *v0alpha0.KANConfig, referencingGatewayClass string) { | ||
| if cfg.Spec.ProxyImage != "" { | ||
| c.envoyImage = cfg.Spec.ProxyImage | ||
| } | ||
| if cfg.Spec.AgenticIdentityTrustDomain != "" { | ||
| c.agenticIdentityTrustDomain = cfg.Spec.AgenticIdentityTrustDomain |
There was a problem hiding this comment.
Updating c.agenticIdentityTrustDomain here does not update c.translator (constructed in controller.New), but the translator uses agenticIdentityTrustDomain when generating SPIFFE IDs for AccessPolicy. As-is, config changes (and even initial config) won’t affect translation behavior. Consider either recreating/updating the translator when config changes or making translator read the trust domain from the controller’s shared config snapshot.
| func (c *Controller) applyKANConfig(cfg *v0alpha0.KANConfig, referencingGatewayClass string) { | |
| if cfg.Spec.ProxyImage != "" { | |
| c.envoyImage = cfg.Spec.ProxyImage | |
| } | |
| if cfg.Spec.AgenticIdentityTrustDomain != "" { | |
| c.agenticIdentityTrustDomain = cfg.Spec.AgenticIdentityTrustDomain | |
| // setAgenticIdentityTrustDomain updates the controller's agentic identity trust domain | |
| // and, if supported by the translator implementation, propagates the change to it. | |
| func (c *Controller) setAgenticIdentityTrustDomain(td string) { | |
| if td == "" { | |
| return | |
| } | |
| c.agenticIdentityTrustDomain = td | |
| type trustDomainUpdater interface { | |
| SetAgenticIdentityTrustDomain(string) | |
| } | |
| if updater, ok := any(c.translator).(trustDomainUpdater); ok && updater != nil { | |
| updater.SetAgenticIdentityTrustDomain(td) | |
| } | |
| } | |
| func (c *Controller) applyKANConfig(cfg *v0alpha0.KANConfig, referencingGatewayClass string) { | |
| if cfg.Spec.ProxyImage != "" { | |
| c.envoyImage = cfg.Spec.ProxyImage | |
| } | |
| if cfg.Spec.AgenticIdentityTrustDomain != "" { | |
| c.setAgenticIdentityTrustDomain(cfg.Spec.AgenticIdentityTrustDomain) |
pkg/controller/gatewayclass.go
Outdated
| if cfg.Spec.WorkerCount > 0 { | ||
| c.workerCount = cfg.Spec.WorkerCount | ||
| } | ||
| // Add more fields here as KANConfigSpec grows | ||
|
|
||
| // Update KANConfig status to reflect what was applied. | ||
| newCfg := cfg.DeepCopy() | ||
| newCfg.Status.ObservedGeneration = cfg.Generation | ||
| newCfg.Status.ActiveWorkerCount = c.workerCount | ||
| newCfg.Status.ReferencedBy = mergeStringSlice(cfg.Status.ReferencedBy, referencingGatewayClass) |
There was a problem hiding this comment.
workerCount is used only when starting goroutines in Controller.Run. Changing it here after startup will not change the number of running workers, but ActiveWorkerCount is still reported as c.workerCount, which can become inaccurate. Either make spec.workerCount startup-only (and report the real active count), or implement dynamic worker pool resizing.
pkg/controller/gatewayclass.go
Outdated
| newCfg := cfg.DeepCopy() | ||
| newCfg.Status.ObservedGeneration = cfg.Generation | ||
| newCfg.Status.ActiveWorkerCount = c.workerCount | ||
| newCfg.Status.ReferencedBy = mergeStringSlice(cfg.Status.ReferencedBy, referencingGatewayClass) |
There was a problem hiding this comment.
ReferencedBy is updated by only appending the current GatewayClass name, but there’s no corresponding removal when a GatewayClass changes parametersRef or is deleted. Over time this will accumulate stale entries and make status incorrect. Consider recomputing ReferencedBy from the current GatewayClass list (filtering by parametersRef) on each apply, or maintaining it with add/remove logic tied to GatewayClass events.
| newCfg.Status.ReferencedBy = mergeStringSlice(cfg.Status.ReferencedBy, referencingGatewayClass) | |
| newCfg.Status.ReferencedBy = []string{referencingGatewayClass} |
| // Update KANConfig status to reflect what was applied. | ||
| newCfg := cfg.DeepCopy() | ||
| newCfg.Status.ObservedGeneration = cfg.Generation | ||
| newCfg.Status.ActiveWorkerCount = c.workerCount | ||
| newCfg.Status.ReferencedBy = mergeStringSlice(cfg.Status.ReferencedBy, referencingGatewayClass) | ||
| meta.SetStatusCondition(&newCfg.Status.Conditions, metav1.Condition{ | ||
| Type: "Accepted", | ||
| Status: metav1.ConditionTrue, | ||
| Reason: "Accepted", | ||
| Message: "KANConfig is valid and has been accepted by the controller.", | ||
| ObservedGeneration: cfg.Generation, | ||
| }) | ||
| meta.SetStatusCondition(&newCfg.Status.Conditions, metav1.Condition{ | ||
| Type: "Applied", | ||
| Status: metav1.ConditionTrue, | ||
| Reason: "Applied", | ||
| Message: fmt.Sprintf("KANConfig has been applied to GatewayClass %q.", referencingGatewayClass), | ||
| ObservedGeneration: cfg.Generation, | ||
| }) | ||
| if _, err := c.agentic.client.AgenticV0alpha0().KANConfigs("").UpdateStatus( | ||
| context.Background(), newCfg, metav1.UpdateOptions{}, | ||
| ); err != nil { | ||
| klog.Errorf("failed to update KANConfig %q status: %v", cfg.Name, err) | ||
| } |
There was a problem hiding this comment.
applyKANConfig calls UpdateStatus on every reconcile even if the status content is unchanged. This can cause unnecessary API writes and reconciler churn. Consider checking whether ObservedGeneration/ActiveWorkerCount/ReferencedBy/Conditions actually changed (e.g., via semantic deep-equal) before issuing UpdateStatus.
- Introduced atomic configuration management for KANConfig in the controller. - Updated KANConfig CRD to include new fields for active worker count and referenced GatewayClasses. - Adjusted deployment permissions to reflect changes in KANConfig resource handling. - Improved test coverage for KANConfig application logic.
|
/ok-to-test |
|
@SeptBlast: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Thanks @SeptBlast ! /hold for review of the approach and new crd /cc |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
LiorLieberman
left a comment
There was a problem hiding this comment.
Hi @SeptBlast, thanks for picking this up and sorry for the delay!
I think workersCount probably still belongs in a flag.
The way we look at GatewayClass is that it represents class of Gateways that can be instantiated from that class. Hence some configs there (or using paramsRef) should just impact the Gateways that are created from this class.
workersCount is something that affects the controller itself, vs proxyImage, as an example which is something that is related to the Gateway -- i.e what proxyImage would be in the deployment that is created from a Gateway CRD.
Regarding agenticIdentityTrustDomain, I also think its a cluster level config, and not Gateway Level.
/cc @haiyanmeng @chuangw6 can you please confirm my understanding of how we use agenticIdentityTrustDomain.
@SeptBlast - how would you compare this KANConfig VS having a simple ConfigMap to begin with?
What type of PR is this?
/kind feature
What this PR does / why we need it:
Moves KAN controller configuration from CLI flags to a
KANConfigcustom resource referenced viaGatewayClass.spec.parametersRef. This allows operators to update controller configuration (proxy image, worker count, agentic identity trust domain) without restarting the controller binary. The controller now watches forKANConfigchanges and re-queues all ownedGatewayClassobjects to apply the updated configuration dynamically.A full
KANConfigStatussubresource is also introduced, trackingObservedGeneration,ActiveWorkerCount, whichGatewayClassobjects reference this config (ReferencedBy), andAccepted/Appliedconditions so operators can observe the current state of the configuration.The deployment.yaml manifest is updated to include the
KANConfigCR, aparametersRefon theGatewayClass, and the required RBAC rules (includingkanconfigs/status).Which issue(s) this PR fixes:
Fixes #135
Does this PR introduce a user-facing change?
Controller configuration (proxy image, worker count, agentic identity trust domain) is now managed via a
KANConfigcustom resource referenced byGatewayClass.spec.parametersRef, replacing the previous CLI flags. Configuration changes take effect dynamically without restarting the controller.