Skip to content

Implement Better VM revert using VDI revert (26.1-lcm)#7116

Open
psafont wants to merge 7 commits into
xapi-project:26.1-lcmfrom
psafont:dev/pau/vdi-revert
Open

Implement Better VM revert using VDI revert (26.1-lcm)#7116
psafont wants to merge 7 commits into
xapi-project:26.1-lcmfrom
psafont:dev/pau/vdi-revert

Conversation

@psafont

@psafont psafont commented Jun 5, 2026

Copy link
Copy Markdown
Member

This PR contains the changes needed to implement the proposal shown in https://github.com/xapi-project/xen-api/blob/master/doc/content/design/snapshot-revert.md

Please note that the storage backends need to implement VDI_REVERT in order to benefit from this change.

The fallback logic to the previous method complicates the implementation quite a bit, although it's made a bit more manageable using sets.
The complexity comes from the current infrastructure to clone and delete, which mixes VBDs and VDIs, and that two VMs are related: the 'live' or 'destination' VM and the snapshot VM.

Note that only disks are affected by the change, and both CDs and suspend VDIs are treated the same way as before.

There's one change in user-visible behaviour: now the UUID of the VDIs do not change when reverting if VDI.revert is used.

This PR is stacked on top of #7103

@psafont psafont force-pushed the dev/pau/vdi-revert branch from 97b3938 to 7f81a53 Compare June 5, 2026 14:00
Comment thread ocaml/xapi/xapi_vdi.ml Outdated
Comment thread ocaml/xapi/xapi_vm_snapshot.ml Outdated
Comment thread ocaml/xapi/xapi_vm_snapshot.ml Outdated
Comment thread ocaml/xapi/xapi_vm_snapshot.ml Outdated
@psafont psafont force-pushed the dev/pau/vdi-revert branch from 7f81a53 to 9752afa Compare June 5, 2026 15:05
@last-genius

Copy link
Copy Markdown
Contributor

All of the != need to be <> instead

@psafont psafont force-pushed the dev/pau/vdi-revert branch 2 times, most recently from 4b4bf79 to 6f791df Compare June 5, 2026 15:22
Comment thread ocaml/xapi/xapi_vdi.ml Outdated
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
@psafont psafont force-pushed the dev/pau/vdi-revert branch 3 times, most recently from 8e9d14e to 95b8cf0 Compare June 11, 2026 09:30
johnelse and others added 5 commits June 11, 2026 10:40
This is for both the live VDI (revert_from) and the snapshot (revert_to)

These correspond to the VM operations `reverting` and `revert`,
respectively.

Only snapshots backed by an SR that allows clones are allowed to be
reverted to. This is because SRs supporting revert is not enough to use
vdi revert without clone: having an incorrect snapshot_of field will
also cause the use of clone. With this we can ensure that if a bad edge
case is met the system has a working fallback.

Reverting to "live" snapshot VDIs is allowed, this happens when
reverting to a checkpoint.

Currently reverting requires the cloning feature because it's not
possible to discriminate between backends that support vdi revert and
ones that done. This will be possible once all of them do

Test that VDIs...
* can be reverted to a snapshot.
* can be reverted to an attached snapshot.
* cannot be reverted to a leaf VDI.
* cannot be reverted to an attached leaf.

Co-authored-by: John Else <john.else@citrix.com>
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
This operation allows storage backends to implement VDI revert natively,
as well as advertise it. The operation is unused by xapi for the time
being.

Co-authored-by: David Scott <dave.scott@eu.citrix.com>
Co-authored-by: John Else <john.else@citrix.com>
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
The API function is needed to be able to forward it across the pool and
to block other operations on the VDI while it's running.

It's unused for the time being.

Co-authored-by: John Else <john.else@citrix.com>
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
The code now tries to call VDI.revert on all the disk VDIs, and if that
fails, it falls back to the original revert method that destroys VM
disks, clones snapshot disks and then fixes up metadata.

This means that the successfully reverted disks must be excluded from
the original method. This is done by introducing sets of VDIs and VBDs
and using set logic on them.

The CD disks and the suspend VDI are treated like before: destroy +
clone.

The code is more convoluted that I would have liked because of existing
clone infrastructure mixes VBDs with VDIs, so they need to be converted
back and forth to be able to run the previous method correctly,
especially for updating the snapshot_of field of the cloned snapshot
disks.

Quicktests encodes the difference in behaviour from the original method
to the native one: now VDI UUIDs are not changed when reverting.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
@psafont psafont force-pushed the dev/pau/vdi-revert branch from 8a09224 to 97823b4 Compare June 11, 2026 09:41
@psafont

psafont commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Waiting to merge this until the prep PR has been tested by Xenserver and a build with these patches is tested on XCP-ng's CI this weekend

@psafont

psafont commented Jun 12, 2026

Copy link
Copy Markdown
Member Author

I've ran the quicktests with backends that have support vfi revert and ones that don't

In case where there's an error while reverting on the Storage Backend,
the state of the SR might become unstable, and a recovery using the
clone method for reverting might introduce bigger issues. Instead stop
the revert immediately.

The cases where ignoring the errors is acceptable is when either the SR
does not advertise support for VDI_REVERT, and when the SR does not
implement revert for a particular VDI. for example, if the SR implements
the revert for only some of the formats it implements.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
@psafont psafont force-pushed the dev/pau/vdi-revert branch from f40f110 to 8b526e7 Compare June 12, 2026 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants