Add GPUClusterConfig CRD and controller for DRA-based stack #2513
Draft
karthikvetrivel wants to merge 4 commits into
Force-pushed from 0a080e3 to 5ddc1d5
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
Force-pushed from 73c9d30 to 94566b5
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
Force-pushed from 94566b5 to a4e79f6
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
1. Overview
We introduce a new CRD named `GPUClusterConfig` and a new controller for reconciling it. Like `ClusterPolicy` today, it is a singleton, cluster-scoped CRD that configures the operands needed to enable GPUs in Kubernetes. `GPUClusterConfig` represents the new DRA-based software-enablement stack; it is an evolution of `ClusterPolicy`.
Change Log
9f08bec:
`GPUClusterConfig` Go types in `api/nvidia/v1alpha1`, cluster-scoped + singleton, with kubebuilder validation/default markers for every operand block. Wire `AddToScheme`. Generated the CRD manifest + deepcopy.
`make manifests generate` produces the CRD yaml and deepcopy.
`kubectl apply` of the CRD succeeds.
73c9d30:
`state.Manager`/`SyncState()` engine (the same pattern `NVIDIADriver` uses), registered in `cmd/gpu-operator/main.go`.
instance is marked `Ignored` and skipped. Mirrors how `ClusterPolicy` handles duplicates.
ccc0f7a:
`gpus` and `computeDomains` containers start, validates that the NVIDIA driver is installed, and writes `/run/nvidia/validations/driver-ready` with the two env vars the kubelet-plugin containers source on startup (`NVIDIA_DRIVER_ROOT`, `DRIVER_ROOT_CTR_PATH`).
87fa6c0:
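Illustrative aside on the ccc0f7a entry above (not part of 87fa6c0): a minimal Go sketch of how a validation step might write the driver-ready file. It assumes a shell-sourceable `KEY=VALUE` line format, since the description only says the kubelet-plugin containers source the file on startup. The helper names `renderDriverReady` and `writeDriverReady` and the example values are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// renderDriverReady formats the two variables named in the PR description as
// KEY=VALUE lines. The line format itself is an assumption.
func renderDriverReady(driverRoot, driverRootCtrPath string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "NVIDIA_DRIVER_ROOT=%s\n", driverRoot)
	fmt.Fprintf(&b, "DRIVER_ROOT_CTR_PATH=%s\n", driverRootCtrPath)
	return b.String()
}

// writeDriverReady (hypothetical helper) writes the file via a temporary file
// in the same directory followed by a rename, so a container that sources the
// file never observes a partially written version.
func writeDriverReady(path, driverRoot, driverRootCtrPath string) error {
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, ".driver-ready-*")
	if err != nil {
		return err
	}
	// Cleans up the temp file on failure; harmless after a successful rename.
	defer os.Remove(tmp.Name())

	if _, err := tmp.WriteString(renderDriverReady(driverRoot, driverRootCtrPath)); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	// The demo writes under a temporary directory instead of
	// /run/nvidia/validations so it can run anywhere.
	dir, err := os.MkdirTemp("", "validations")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "driver-ready")
	// Example values only; the real paths depend on how the driver is installed.
	if err := writeDriverReady(path, "/run/nvidia/driver", "/driver-root"); err != nil {
		panic(err)
	}
	data, err := os.ReadFile(path)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(data))
}
```

The write-then-rename step is a design choice of this sketch rather than something the PR states. It matters only if a consumer could read the file while it is still being written.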