Skip to content

fix: drop containerd config mounts when NRI is enabled#2514

Open
frezbo wants to merge 1 commit into
NVIDIA:mainfrom
frezbo:fix/drop-containerd-config-mounts-when-nri-enabled
Open

fix: drop containerd config mounts when NRI is enabled#2514
frezbo wants to merge 1 commit into
NVIDIA:mainfrom
frezbo:fix/drop-containerd-config-mounts-when-nri-enabled

Conversation

@frezbo
Copy link
Copy Markdown

@frezbo frezbo commented Jun 3, 2026

When NRI is enabled there's no need to setup containerd config file path mounts as the installer is a no-op. This fixes issues on immutable hosts where the containerd paths are non-writable and prevents the NRI pod from coming up since kubelet tries to create mount-points for those.

Part of upstream discusssion at: #2385

Copilot AI review requested due to automatic review settings June 3, 2026 11:54
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Jun 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@frezbo
Copy link
Copy Markdown
Author

frezbo commented Jun 3, 2026

Relates to: NVIDIA/nvidia-container-toolkit#1861

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds NRI-plugin-aware runtime transformation so the toolkit installer avoids mounting/modifying runtime config on read-only hosts (e.g., Talos), and updates unit tests to cover the new behavior.

Changes:

  • Add early-return NRI plugin path in transformForRuntime that mounts only the NRI socket.
  • Extract runtime config/socket mounting logic into transformRuntimeConfigAndSocketMounts.
  • Extend TestTransformForRuntime with per-test ClusterPolicy input and a new “containerd with NRI plugin enabled” case.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
controllers/transforms_test.go Adds a new NRI-enabled containerd test case and allows per-test ClusterPolicy inputs.
controllers/object_controls.go Skips runtime config/socket mounts when NRI plugin is enabled; factors mount logic into a helper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread controllers/object_controls.go
Comment thread controllers/transforms_test.go
Comment thread controllers/transforms_test.go
Comment thread controllers/object_controls.go Outdated
@cdesiniotis cdesiniotis self-assigned this Jun 3, 2026
@frezbo frezbo force-pushed the fix/drop-containerd-config-mounts-when-nri-enabled branch from 7d78a7c to 3383f08 Compare June 4, 2026 13:09
@cdesiniotis
Copy link
Copy Markdown
Contributor

/ok to test 3383f08

@cdesiniotis
Copy link
Copy Markdown
Contributor

@tariq1890 requesting your review

Comment thread controllers/transforms_test.go Outdated
@frezbo frezbo force-pushed the fix/drop-containerd-config-mounts-when-nri-enabled branch from 3383f08 to c3feb2e Compare June 4, 2026 17:53
@cdesiniotis
Copy link
Copy Markdown
Contributor

/ok to test c3feb2e

@cdesiniotis cdesiniotis added this to the v26.7 milestone Jun 4, 2026
Comment thread controllers/object_controls.go
Comment thread controllers/object_controls.go Outdated
@tariq1890
Copy link
Copy Markdown
Contributor

Thank you very much @frezbo for your contribution here!

When NRI is enabled there's no need to setup containerd config file path
mounts as the installer is a no-op. This fixes issues on immutable hosts
where the containerd paths are non-writable and prevents the NRI pod from
coming up since kubelet tries to create mount-points for those.

Part of upstream discusssion at: NVIDIA#2385

Signed-off-by: Noel Georgi <git@frezbo.dev>
@frezbo frezbo force-pushed the fix/drop-containerd-config-mounts-when-nri-enabled branch from c3feb2e to 543bf7a Compare June 5, 2026 11:45
Comment on lines +1416 to +1420
// transformRuntimeConfigAndSocketMounts configures the toolkit container with the
// mounts and environment required for the toolkit installer to update the container
// runtime configuration (top-level config, drop-in config, and runtime socket). This
// is not required when the NRI plugin is enabled, since the installer does not modify
// the runtime configuration in that mode.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave out the parts that are more or less obvious.

Suggested change
// transformRuntimeConfigAndSocketMounts configures the toolkit container with the
// mounts and environment required for the toolkit installer to update the container
// runtime configuration (top-level config, drop-in config, and runtime socket). This
// is not required when the NRI plugin is enabled, since the installer does not modify
// the runtime configuration in that mode.
// transformRuntimeConfigAndSocketMounts configures the toolkit container with the
// mounts and environment required for the toolkit installer to update the container
// runtime configuration (top-level config, drop-in config, and runtime socket).

// hostPath mounts (e.g. /etc/containerd/conf.d with DirectoryOrCreate) break on
// read-only hosts such as Talos. Only the NRI socket mount (below) is required.
if config.CDI.IsNRIPluginEnabled() {
// setup mounts for the runtime NRI socket file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not move L1389-L1401 to a private the same way as you have done in transformRuntimeConfigAndSocketMounts?

testCases := []struct {
description string
runtime gpuv1.Runtime
clusterPolicy *gpuv1.ClusterPolicySpec
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of modifying the test case struct here, you can add a new test case to the TestTransformToolkit method in object_controls_test.go. The test case struct over there already has the ClusterPolicySpec object as a struct field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants