Skip to content

feat(snapshot): allow pmem backing file path override on restore#5896

Open
MohibShaikh wants to merge 1 commit into
firecracker-microvm:mainfrom
MohibShaikh:feat/pmem-snapshot-override
Open

feat(snapshot): allow pmem backing file path override on restore#5896
MohibShaikh wants to merge 1 commit into
firecracker-microvm:mainfrom
MohibShaikh:feat/pmem-snapshot-override

Conversation

@MohibShaikh

Copy link
Copy Markdown
Contributor

Allow overriding virtio-pmem device backing file paths when restoring from a snapshot. This is useful when the host path baked into the snapshot is no longer valid, for example when restoring on a different host or under a different jailer chroot.

Adds a pmem_overrides parameter to the snapshot load API, following the same pattern as the existing network_overrides (#4731) and vsock_override (#5323). Each entry identifies a pmem device by its pmem_id and supplies a new path_on_host. Both MMIO and PCI transports are covered. Overrides targeting an unknown pmem_id fail the restore with a clear error.

Part of #4992.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the automated style checks.
  • I have described what is done in these changes, why they are needed, and how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs) in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@JamesC1305 JamesC1305 self-assigned this May 20, 2026
@codecov

codecov Bot commented May 20, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 11.11111% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.86%. Comparing base (ce26972) to head (e53e035).

⚠️ Current head e53e035 differs from pull request most recent head 9c67102

Please upload reports for the commit 9c67102 to get more accurate results.

Files with missing lines Patch % Lines
src/vmm/src/persist.rs 5.88% 16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5896      +/-   ##
==========================================
- Coverage   83.03%   82.86%   -0.18%     
==========================================
  Files         277      277              
  Lines       30204    30076     -128     
==========================================
- Hits        25079    24921     -158     
- Misses       5125     5155      +30     
Flag Coverage Δ
5.10-m5n.metal 83.15% <11.11%> (-0.19%) ⬇️
5.10-m6a.metal 82.49% <11.11%> (-0.20%) ⬇️
5.10-m6g.metal 79.79% <11.11%> (-0.21%) ⬇️
5.10-m6i.metal 83.15% <11.11%> (-0.19%) ⬇️
5.10-m7a.metal-48xl 82.48% <11.11%> (-0.20%) ⬇️
5.10-m7g.metal 79.79% <11.11%> (-0.21%) ⬇️
5.10-m7i.metal-24xl 83.13% <11.11%> (-0.19%) ⬇️
5.10-m7i.metal-48xl 83.12% <11.11%> (-0.19%) ⬇️
5.10-m8g.metal-24xl 79.79% <11.11%> (-0.21%) ⬇️
5.10-m8g.metal-48xl 79.79% <11.11%> (-0.21%) ⬇️
5.10-m8i.metal-48xl 83.12% <11.11%> (-0.20%) ⬇️
5.10-m8i.metal-96xl 83.12% <11.11%> (-0.20%) ⬇️
6.1-m5n.metal 83.17% <11.11%> (-0.19%) ⬇️
6.1-m6a.metal 82.52% <11.11%> (-0.20%) ⬇️
6.1-m6g.metal 79.79% <11.11%> (-0.21%) ⬇️
6.1-m6i.metal 83.17% <11.11%> (-0.19%) ⬇️
6.1-m7a.metal-48xl 82.51% <11.11%> (-0.20%) ⬇️
6.1-m7g.metal 79.79% <11.11%> (-0.21%) ⬇️
6.1-m7i.metal-24xl 83.19% <11.11%> (-0.19%) ⬇️
6.1-m7i.metal-48xl 83.19% <11.11%> (-0.19%) ⬇️
6.1-m8g.metal-24xl 79.79% <11.11%> (-0.22%) ⬇️
6.1-m8g.metal-48xl 79.79% <11.11%> (-0.21%) ⬇️
6.1-m8i.metal-48xl 83.19% <11.11%> (-0.19%) ⬇️
6.1-m8i.metal-96xl 83.19% <11.11%> (-0.19%) ⬇️
6.18-m5n.metal ?
6.18-m6a.metal ?
6.18-m6g.metal ?
6.18-m6i.metal ?
6.18-m7a.metal-48xl ?
6.18-m7g.metal ?
6.18-m7i.metal-24xl ?
6.18-m7i.metal-48xl ?
6.18-m8g.metal-24xl ?
6.18-m8g.metal-48xl ?
6.18-m8i.metal-48xl ?
6.18-m8i.metal-96xl ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Allows for changing the backing host file of a virtio-pmem device
during snapshot restore.
required:
- pmem_id

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps we should use id for uniformity with the rest of the pmem API.

@JamesC1305

JamesC1305 commented May 21, 2026

Copy link
Copy Markdown
Contributor

Hi @MohibShaikh,

Thanks for the PR. Overall it looks in pretty good shape to me.

I've left one minor nit regarding parameter names. I'm going to checkout the code and just perform a sanity check to confirm that it works as intended, and double check the unit/integ tests are reasonable. I'll get back to you soon

edit: I forgot to mention that the buildkite run failed: https://buildkite.com/firecracker/firecracker-pr/builds/17149/canvas

Can you please fix the style issue, and the integration test issue? It looks to be related to permissions on the pmem backing file.

When you think it's ready, can you please run tools/devtool checkstyle and tools/devtool test -- integration_tests/functional/test_snapshot_basic.py to ensure that everything works as intended?

Thanks :)

@MohibShaikh MohibShaikh force-pushed the feat/pmem-snapshot-override branch from e53e035 to af64c99 Compare May 21, 2026 11:25
@MohibShaikh

Copy link
Copy Markdown
Contributor Author

Thanks for the review, @JamesC1305!

  • Renamed pmem_idid in PmemOverride (Rust struct, swagger, docs, API tests, integration tests) for uniformity with the rest of the pmem API.
  • Fixed the integration test perms issue: the override file was copied straight into the chroot via shutil.copyfile, so it kept root ownership and the jailer couldn't open it. Now staged via create_jailed_resource(...) so it gets hardlinked + chowned to the jailer's uid:gid like every other snapshot file.
  • Squashed the two commits into one, the doc(changelog): link PR... commit was missing a body and tripped gitlint B6.
  • Rebased on latest main.

Re-ran tools/devtool checkstyle and tools/devtool checkbuild --all locally —> both green. My local env can't pull the AWS test artifacts for integration tests, so relying on Buildkite for those. Force-pushed.

@JamesC1305 JamesC1305 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small things, mostly regarding documentation and testing.

Comment thread src/vmm/src/persist.rs Outdated
/// Unknown Vsock Device.
UnknownVsockDevice,
/// Unknown Pmem Device.
UnknownPmemDevice,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we convert this to provide information about the unknown device? I.e. something like:

// Unknown Pmem device: {0}
UnknownPmemDevice(String)

(We have similar feedback for the currently open block device PR)

Comment thread docs/pmem.md Outdated
Comment on lines +268 to +270
each `virtio-pmem` device at a different host file. The new file must be at
least as large as the backing file used when the snapshot was taken; using a
smaller file will result in the device failing to restore.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new file must be at
least as large as the backing file used when the snapshot was taken; using a
smaller file will result in the device failing to restore.

I think this misstates the purpose of path-renaming. Path renaming is only intended to be used for when the same pmem file is at a different location on the host. It is not intended for arbitrarily swapping out the backing file on snapshot restore. The backing file should therefore be the same size as the pre-snapshot version.

# Hardlink the override into the restored VM's chroot and chown it to the
# jailer's uid:gid so Firecracker can open it.
override_jailed = restored_vm.create_jailed_resource(override_src)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a similar modification to microvm.py as the Block Override PR, where, when pmem_overrides is not None, the original pmem file is not copied to the restored VM's jail. Combined with using shutil.move (rather than copy), we can ensure that if restoring from the overridden path didn't work, the test should fail.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our test framework has changed since this PR was opened, and fixtures such as uvm_plain_any no longer exist. Could you please update your tests accordingly? More information about the new uvm fixture and dimension pinning are available in tests/README.md.

Comment thread src/vmm/src/persist.rs
.map(|state| state.config.path_on_host.clone_from(&entry.path_on_host))
.ok_or(SnapshotStateFromFileError::UnknownPmemDevice)?;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it maybe worth having small tests to ensure the 'happy path' works as intended, and also that an error is returned for bad conditions? This is the crux of the change

Allow overriding virtio-pmem device backing file paths when restoring
from a snapshot. This is useful when the host file path baked into the
snapshot is no longer valid, for example when restoring on a different
host or under a different jailer chroot.

Add a pmem_overrides parameter to the snapshot load API, following the
same pattern as the existing network_overrides and vsock_override
mechanisms. Each entry identifies a pmem device by its id and provides
a new path_on_host. Both MMIO and PCI transports are covered. An
override targeting an unknown id fails the restore with a clear error.

Part of firecracker-microvm#4992.

Signed-off-by: MohibShaikh <mohibuddin9@gmail.com>
@MohibShaikh MohibShaikh force-pushed the feat/pmem-snapshot-override branch from af64c99 to 9c67102 Compare June 23, 2026 11:45
@MohibShaikh

MohibShaikh commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the detailed review @JamesC1305! I've pushed an update addressing all the points. The branch was also rebased onto latest main (it had fallen well behind, which is why the test fixtures needed updating).

  • persist.rs error variant: UnknownPmemDevice now carries the offending id —> Unknown Pmem device: {0} / UnknownPmemDevice(String).
  • Unit tests for the override logic: extracted the override loop into a small apply_pmem_overrides() helper and added tests for the happy path, the unknown-id error, and the empty/no-op case.
  • docs/pmem.md: reworded to make clear that path renaming is only for relocating the same backing file (which must therefore be the same size), not for swapping in a different file on restore.
  • Integration test now genuinely exercises the override: restore_from_snapshot no longer auto-jails pmem backing files that have an override (same approach as the Block Override PR), and the test uses shutil.move instead of copyfile. The original file is therefore unreachable at its baked-in path, so the restore can only succeed via the override.
  • Test framework: switched from the removed uvm_plain_any fixture to uvm_configured with @pin_guest_kernel(GUEST_KERNEL_DEFAULT), matching the surrounding tests.
  • Swagger naming: id (from the earlier round).

Locally I've run cargo build, cargo fmt --check and black --check (all green). My env has no KVM/AWS artifacts, so I'm relying on Buildkite for the unit + integration tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants