### Background
I'm developing a tool that systematically explores controller reconciliation ordering, staleness, and fault injection (kamera).
### Observed Behavior
I observe that when the disruption controller marks a node as `Consolidatable` before the node is fully initialized (pods scheduled, kubelet registered), a subsequent disruption evaluation treats the node as a valid consolidation candidate and enqueues deletion. The provisioner then creates a replacement, producing a create-delete cycle.

The `consolidation.ShouldDisrupt()` predicate checks `ConditionTypeConsolidatable` (line 122) but does not gate on `Initialized()`. An uninitialized node has no pods scheduled yet, so it appears "empty" and is marked `Consolidatable`, triggering deletion before the node has had a chance to receive workloads. This occurs whenever the disruption controller evaluates the node before initialization completes.
### Expected Behavior
The disruption controller should not evaluate nodes for consolidation until they are fully initialized (all expected pods scheduled, NodeClaim initialized, Node registered).
### Proposed Fix
The `consolidation.ShouldDisrupt()` predicate at `consolidation.go:88-123` checks `ConditionTypeConsolidatable` (line 122) but does not check whether the node is initialized. In contrast, `ValidateNodeDisruptable()` at `statenode.go:210` correctly gates on `Initialized()` (which checks for the `NodeInitializedLabelKey` label at `statenode.go:343-350`).
A possible fix is to add an `Initialized()` check at the top of `consolidation.ShouldDisrupt()`:
```go
func (c *Consolidation) ShouldDisrupt(_ context.Context, cn *Candidate) bool {
	if !cn.Initialized() {
		return false
	}
	// ... existing checks ...
}
```
This mirrors the pattern in `ValidateNodeDisruptable()` and ensures a node is never considered for consolidation before it is fully initialized, closing the window in which a not-yet-populated node looks "empty" and gets deleted before it can receive workloads.
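As a self-contained illustration of the guard (using a hypothetical `Candidate` struct and `shouldDisrupt` function, not Karpenter's actual types), the ordering bug and the proposed gate can be sketched as:

```go
package main

import "fmt"

// Candidate is a hypothetical stand-in for a disruption candidate,
// reduced to the two signals relevant to this ordering bug.
type Candidate struct {
	Initialized    bool // analogous to the NodeInitializedLabelKey label being set
	Consolidatable bool // analogous to ConditionTypeConsolidatable being true
}

// shouldDisrupt mirrors the proposed predicate: gate on initialization
// before honoring the Consolidatable condition.
func shouldDisrupt(cn Candidate) bool {
	if !cn.Initialized {
		// An uninitialized node looks "empty" only because no pods have
		// been scheduled yet; skipping it avoids the create-delete cycle.
		return false
	}
	return cn.Consolidatable
}

func main() {
	// Freshly provisioned node: marked Consolidatable before any pods land.
	fresh := Candidate{Initialized: false, Consolidatable: true}
	// Initialized node that is genuinely empty: a real consolidation candidate.
	empty := Candidate{Initialized: true, Consolidatable: true}

	fmt.Println(shouldDisrupt(fresh)) // false
	fmt.Println(shouldDisrupt(empty)) // true
}
```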
I'm happy to put up a PR for this if it would be helpful.
### Versions
- Karpenter: v1.8.0 (sigs.k8s.io/karpenter, commit 8ae07cf8)
- Kubernetes: simulated via kamera (based on k8s.io/client-go v0.35.0 / Kubernetes 1.35)