[cifmw_backup_restore] Fix post-restore validation and cleanup #3947

Open: stuggi wants to merge 1 commit into openstack-k8s-operators:main from stuggi:backup_restore_improvement

stuggi (Contributor) commented May 20, 2026

  • Wait, using retry loops, for compute services and network agents to be ready before proceeding to workload validation, preventing tempest from running against a partially recovered control plane
  • Delete test-operator CRs (Tempest, Tobiko, AnsibleTest, HorizonTest) at the beginning of cleanup while controllers and dependencies are still running, so finalizers get processed properly
  • Wait for test-operator pods to terminate after CR deletion
  • Adapt GaleraRestore pod discovery to the shortened resource names from mariadb-operator, which drops the galera instance name prefix from generated resources (restore-<name> instead of <galera>-restore-<name>). Discovery uses the galerarestore/name label selector when available and falls back to the old naming convention, so this change can land independently of the mariadb-operator PR
  • Increase the control plane ready timeout from 10m to 30m
  • Fix loop_var collision with _delete_all_of_kind.yml
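
For readers unfamiliar with the pattern, the "wait with retry loops" idea from the first bullet can be sketched in Python roughly as follows. This is only an illustration and not the role's actual Ansible tasks: the helper names `wait_until` and `all_up`, the retry counts, and the `{"state": "up"}` record shape are all assumptions made for the example.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    retries: int,
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `retries` times, pausing `delay` seconds between tries.

    Returns True as soon as check() succeeds, False if every attempt fails.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    for attempt in range(retries):
        if check():
            return True
        # No point sleeping after the final failed attempt.
        if attempt < retries - 1:
            sleep(delay)
    return False


def all_up(services: list[dict]) -> bool:
    """True only if at least one record exists and every record reports "up".

    The record shape ({"state": "up"}) is hypothetical; a real check would
    parse whatever the service or agent listing returns.
    """
    return bool(services) and all(s.get("state") == "up" for s in services)
```

An empty listing is treated as "not ready" on purpose: right after a restore, a service list that has not repopulated yet should not count as healthy.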

Related-To: openstack-k8s-operators/mariadb-operator#463
Jira: OSPRH-30196
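
The label-first, name-fallback discovery described for GaleraRestore pods could look roughly like the Python sketch below. It is an assumption-laden illustration rather than the role's implementation: the function name `find_restore_pods` is hypothetical, the label value is assumed to equal the restore name, and the old convention is assumed to be matched as a name prefix. Only the `galerarestore/name` label key and the two naming schemes come from the description above.

```python
from typing import Iterable

# Label key named in the PR description; its value is ASSUMED to be the
# GaleraRestore name.
RESTORE_LABEL = "galerarestore/name"


def find_restore_pods(pods: Iterable[dict], galera: str, restore: str) -> list[str]:
    """Return the names of pods that belong to the given GaleraRestore.

    Pods carrying the label are preferred. If none are found, fall back to
    the old naming convention <galera>-restore-<name>, matched as a prefix.
    """
    pods = list(pods)
    labelled = [
        p["metadata"]["name"]
        for p in pods
        if (p["metadata"].get("labels") or {}).get(RESTORE_LABEL) == restore
    ]
    if labelled:
        return labelled

    # Fallback for operator versions that do not set the label yet.
    old_prefix = f"{galera}-restore-{restore}"
    return [
        p["metadata"]["name"]
        for p in pods
        if p["metadata"]["name"].startswith(old_prefix)
    ]
```

Trying the label first means the lookup does not depend on how the operator names generated resources, while the prefix fallback keeps it working against an operator that still uses the old names.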

openshift-ci Bot (Contributor) commented May 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign eshulman2 for approval. For more information see the Code Review Process.


stuggi requested review from abays and jhanzlic May 20, 2026 06:08
lmiccini previously approved these changes May 20, 2026
@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/98a258c888ad4a96b82b20e8fad96fd9

openstack-k8s-operators-content-provider FAILURE in 4m 09s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ cifmw-pod-zuul-files SUCCESS in 5m 35s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 40s
✔️ cifmw-pod-pre-commit SUCCESS in 9m 06s
cifmw-molecule-cifmw_backup_restore RETRY_LIMIT in 54s

stuggi (Contributor, Author) commented May 20, 2026

recheck

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/597b7e25f8854f30a44e423e6ecbb830

openstack-k8s-operators-content-provider FAILURE in 4m 03s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ cifmw-pod-zuul-files SUCCESS in 4m 33s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 16s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 53s
cifmw-molecule-cifmw_backup_restore RETRY_LIMIT in 53s

stuggi (Contributor, Author) commented May 20, 2026

recheck

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/db9090ce531d4300a52882ab747b6dcc

openstack-k8s-operators-content-provider FAILURE in 4m 43s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ cifmw-pod-zuul-files SUCCESS in 5m 15s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 49s
✔️ cifmw-pod-pre-commit SUCCESS in 10m 17s
cifmw-molecule-cifmw_backup_restore RETRY_LIMIT in 56s

stuggi (Contributor, Author) commented May 20, 2026

recheck

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/501b0963805241a3a4789ef02f8a0832

openstack-k8s-operators-content-provider FAILURE in 4m 04s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ cifmw-pod-zuul-files SUCCESS in 4m 53s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 03s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 46s
cifmw-molecule-cifmw_backup_restore RETRY_LIMIT in 55s

Comment thread roles/cifmw_backup_restore/tasks/restore.yml Outdated
- Wait for compute services and network agents to be ready with
  retry loops before proceeding to workload validation, preventing
  tempest from running against a partially recovered control plane
- Delete test-operator CRs (Tempest, Tobiko, AnsibleTest, HorizonTest)
  at the beginning of cleanup while controllers and dependencies are
  still running, so finalizers get processed properly
- Wait for test-operator pods to terminate after CR deletion
- Adapt GaleraRestore pod discovery to the shortened resource names
  from mariadb-operator which drops the galera instance name prefix
  from generated resources (restore-<name> instead of
  <galera>-restore-<name>). Uses the galerarestore/name label selector
  when available, with fallback to the old naming convention so this
  change can land independently of the mariadb-operator PR
- Increase control plane ready timeout from 10m to 30m
- Fix loop_var collision with _delete_all_of_kind.yml

Related-To: openstack-k8s-operators/mariadb-operator#463

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
stuggi force-pushed the backup_restore_improvement branch from 33df7ca to e85756a May 20, 2026 12:28
abays (Contributor) left a comment

/lgtm
