Hello Team
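In my cluster, I can't fix one of the pods: nvidia-cuda-validator-rr9qz is stuck in Init:Error, and nvidia-operator-validator-zfrz6 stays at Init:2/4.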
gpu-feature-discovery-9ztn5 1/1 Running 0 16m
gpu-feature-discovery-nlg6b 1/1 Running 0 16m
gpu-feature-discovery-rmvtr 1/1 Running 0 16m
gpu-operator-75b97bdcbc-fjmsm 1/1 Running 0 17m
nvidia-container-toolkit-daemonset-65m2r 1/1 Running 0 16m
nvidia-container-toolkit-daemonset-prg6k 1/1 Running 0 16m
nvidia-container-toolkit-daemonset-s9g8m 1/1 Running 0 16m
nvidia-cuda-validator-44rfx 0/1 Completed 0 14m
nvidia-cuda-validator-7bh8j 0/1 Completed 0 13m
nvidia-cuda-validator-rr9qz 0/1 Init:Error 4 (2m14s ago) 5m1s
nvidia-dcgm-exporter-bn65s 1/1 Running 0 16m
nvidia-dcgm-exporter-f454z 1/1 Running 0 16m
nvidia-dcgm-exporter-z8n4w 1/1 Running 0 16m
nvidia-device-plugin-daemonset-k4rl8 1/1 Running 0 16m
nvidia-device-plugin-daemonset-p7w7s 1/1 Running 0 16m
nvidia-device-plugin-daemonset-vt5d7 1/1 Running 0 16m
nvidia-driver-daemonset-9hcbr 1/1 Running 0 17m
nvidia-driver-daemonset-dhq4j 1/1 Running 0 16m
nvidia-driver-daemonset-t84wg 1/1 Running 0 16m
nvidia-gpu-operator-node-feature-discovery-gc-796ccf46c6-p9674 1/1 Running 0 17m
nvidia-gpu-operator-node-feature-discovery-master-65589f87tw7qq 1/1 Running 0 17m
nvidia-gpu-operator-node-feature-discovery-worker-62nv7 1/1 Running 1 (16m ago) 17m
nvidia-gpu-operator-node-feature-discovery-worker-gjvjg 1/1 Running 0 17m
nvidia-gpu-operator-node-feature-discovery-worker-jvmqm 1/1 Running 1 (16m ago) 17m
nvidia-gpu-operator-node-feature-discovery-worker-pxkgq 1/1 Running 0 17m
nvidia-gpu-operator-node-feature-discovery-worker-qgrbd 1/1 Running 1 (16m ago) 17m
nvidia-gpu-operator-node-feature-discovery-worker-t9rq6 1/1 Running 1 (16m ago) 17m
nvidia-mig-manager-jq47r 1/1 Running 0 15m
nvidia-mig-manager-l6wdd 1/1 Running 0 13m
nvidia-mig-manager-twltl 1/1 Running 0 14m
nvidia-operator-validator-hpqxf 1/1 Running 0 16m
nvidia-operator-validator-tc7b2 1/1 Running 0 16m
nvidia-operator-validator-zfrz6 0/1 Init:2/4 2 (5m13s ago) 16m
I restarted the nodes and the DaemonSets, but nothing helped.
GPU Operator version: v26.3.2
Driver: 580.159.04
GPU: NVIDIA B200
OS: Ubuntu 24.04
CUDA Version: 13.0