diff --git a/.gitignore b/.gitignore index 6801fbae54..9bbe5898c1 100644 --- a/.gitignore +++ b/.gitignore @@ -21,6 +21,7 @@ cpd-cli-workspace/* /node_modules package-lock.json package.json +test*.sh # Byte-compiled / optimized / DLL files __pycache__/ diff --git a/build/bin/gen_role_docs.py b/build/bin/gen_role_docs.py index ad404b7177..1fe3b79da0 100644 --- a/build/bin/gen_role_docs.py +++ b/build/bin/gen_role_docs.py @@ -136,5 +136,3 @@ print(f" ✗ Warning: {src_file} not found, skipping {role}") print(f"Role documentation generation complete!") - -# Made with Bob diff --git a/build/scripts/validate_readme.py b/build/scripts/validate_readme.py index 49042a9196..bd4b65244c 100644 --- a/build/scripts/validate_readme.py +++ b/build/scripts/validate_readme.py @@ -444,5 +444,3 @@ def main(): if __name__ == '__main__': main() - -# Made with Bob diff --git a/docs/playbooks/backup-restore.md b/docs/playbooks/backup-restore.md index 1a517befec..988eb4995e 100644 --- a/docs/playbooks/backup-restore.md +++ b/docs/playbooks/backup-restore.md @@ -1,599 +1,533 @@ Backup and Restore =============================================================================== -Overview -------------------------------------------------------------------------------- -MAS Devops Collection includes playbooks for backing up and restoring of the following MAS components and their dependencies: - -- [MongoDB](#backuprestore-for-mongodb) -- [Db2](#backuprestore-for-db2) -- [MAS Core](#backuprestore-for-mas-core) -- [Manage](#backuprestore-for-manage) -- [IoT](#backuprestore-for-iot) -- [Monitor](#backuprestore-for-monitor) -- [Health](#backuprestore-for-health) -- [Optimizer](#backuprestore-for-optimizer) -- [Visual Inspection](#backuprestore-for-visual-inspection) - - -Creation of both **full** and **incremental** backups are supported. The backup and restore Ansible roles can also be used individually, allowing you to build your own customized backup and restore playbook covering exactly what you need. For example, you can only [backup/restore Manage attachments](../roles/suite_app_backup_restore.md). - !!! important - The backup and restore playbooks in this collection are still work in progress, they are not suitable for production use at this time. You may track development progress using the [Backup & Restore](https://github.com/ibm-mas/ansible-devops/issues?q=label%3A%22Backup+%26+Restore%22+) label in the Github repository. + These playbooks are samples to demonstrate how to use the roles in this collection. - Production-ready backup and restore options are detailed in the [Backup and restore](https://www.ibm.com/docs/en/mas-cd/continuous-delivery?topic=administering-backing-up-restoring-maximo-application-suite) topic in the product documentation. - -Configuration - Storage -------------------------------------------------------------------------------- -You can save the backup files to a folder on your local file system by setting the following environment variables: - -| Envrionment variable | Required (Default Value) | Description | -| ------------------------------------ | -------------------------- | ----------- | -| MASBR_STORAGE_LOCAL_FOLDER | **Yes** | The local path to save the backup files | -| MASBR_LOCAL_TEMP_FOLDER | No (`/tmp/masbr`) | Local folder for saving the temporary backup/restore data, the data in this folder will be deleted after the backup/restore job completed. 
| + They are **not intended for production use** as-is, they are a starting point for power users to aid in the development of their own Ansible playbooks using the roles in this collection. + The recommended way to perform backup and restore operations for MAS is to use the [MAS CLI](https://ibm-mas.github.io/cli/), which uses this Ansible Collection to deliver a complete managed lifecycle for your MAS instance. -Configuration - Backup +Overview ------------------------------------------------------------------------------- +MAS Devops Collection includes playbooks/guidance for backing up and restoring MAS components and their dependencies. The backup and restore operations are designed to protect your MAS installation and enable disaster recovery, cluster migration, and testing scenarios. + +**Supported Components:** +- [MongoDB Community](#mongodb-community-backup-and-restore) +- [MAS Core](#mas-core-backup-and-restore) +- [Db2 Backup and Restore](#db2-backup-and-restore) +- [Manage Application](#manage-application-backup-and-restore) + +**Roles Supporting Backup/Restore:** +- [`ibm.mas_devops.cert_manager`](../roles/cert_manager.md) +- [`ibm.mas_devops.db2`](../roles/db2.md) +- [`ibm.mas_devops.ibm_catalogs`](../roles/ibm_catalogs.md) +- [`ibm.mas_devops.mongodb`](../roles/mongodb.md) +- [`ibm.mas_devops.sls`](../roles/sls.md) +- [`ibm.mas_devops.suite_backup`](../roles/suite_backup.md) +- [`ibm.mas_devops.suite_restore`](../roles/suite_restore.md) +- [`ibm.mas_devops.suite_app_backup`](../roles/suite_app_backup.md) +- [`ibm.mas_devops.suite_app_restore`](../roles/suite_app_restore.md) + +MongoDB Community Backup and Restore +=============================================================================== -| Envrionment variable | Required (Default Value) | Description | -| ------------------------------------ | ------------------------ | ----------- | -| MASBR_ACTION | **Yes** | Whether to run the playbook to perform a `backup` or a `restore` | -| MASBR_BACKUP_TYPE | No (`full`) | Set `full` or `incr` to indicate the playbook to create a **full** backup or **incremental** backup. | -| MASBR_BACKUP_FROM_VERSION | No | Set the full backup version to use in the incremental backup, this will be in the format of a `YYYMMDDHHMMSS` timestamp (e.g. `20240621021316`). | - -The playbooks are switched to backup mode by setting `MASBR_ACTION` to `backup`. +## Overview +This playbook performs backup and restore operations for MongoDB Community Edition instances. It supports backing up both the MongoDB instance configuration and database data. -### Full Backups -If you set environment variable `MASBR_BACKUP_TYPE=full` or do not specify a value for this variable, the playbook will take a full backup. +**Important**: +- Supports MongoDB Community Edition only +- Can backup/restore entire instance or individual databases +- Backup includes both Kubernetes resources and database data -### Incremental Backups -You can set environment variable `MASBR_BACKUP_TYPE=incr` to indicate the playbook to take an incremental backup. +## Playbook Content -!!! important - Only supports creating incremental backup for MonogDB, Db2 and persistent volume data. The playbook will always create a full backup for other type of data regardless of whether this variable be set to `incr`. +The playbook executes the following operations: -The environment variable `MASBR_BACKUP_FROM_VERSION` is only valid if `MASBR_BACKUP_TYPE=incr`. It indicates which backup version that the incremental backup to based on. 
If you do not set a value for this variable, the playbook will try to find the latest Completed Full backup from the specified storage location, and then take an incremental backup based on it. +### Backup Operation +1. [Backup MongoDB Instance Resources](../roles/mongodb.md) - Kubernetes resources (Deployment, Custom resources, ConfigMaps, Secrets) +2. [Backup MongoDB Database Data](../roles/mongodb.md) - Database data using mongodump -!!! important - The backup files you specified by `MASBR_BACKUP_FROM_VERSION` must be a Full backup. And the component name and data types in the specified Full backup file must be same as the current incremental backup job. +### Restore Operation +1. [Restore MongoDB Instance Resources](../roles/mongodb.md) - Recreate Kubernetes resources +2. [Restore MongoDB Database Data](../roles/mongodb.md) - Restore database data using mongorestore +## Required Environment Variables -Configuration - Restore -------------------------------------------------------------------------------- +### Common Variables (Backup and Restore) +- `MAS_INSTANCE_ID` - The instance ID of the MAS installation +- `MAS_BACKUP_DIR` - Directory where backup files will be stored/retrieved (e.g., `/tmp/mas_backups`) +- `MONGODB_ACTION` - Set to `backup`, `backup-database`, `restore`, or `restore-database` +- `MONGODB_INSTANCE_NAME` - MongoDB instance name (default: `mas-mongo-ce`) +- `MONGODB_NAMESPACE` - Namespace where MongoDB is installed (default: `mongoce`) -| Envrionment variable | Required (Default Value) | Description | -| ------------------------------------ | ------------------------ | ----------- | -| MASBR_ACTION | **Yes** | Whether to run the playbook to perform a `backup` or a `restore` | -| MASBR_RESTORE_FROM_VERSION | **Yes** | Set the backup version to use in the restore, this will be in the format of a `YYYMMDDHHMMSS` timestamp (e.g. `20240621021316`) | -| MASBR_RESTORE_OVERWRITE | **Yes** | Set whether the restore should **overwrite** any existing data or if we should stop and **FAIL** if there is data detected in the directory. **WARNING:** This will overwrite all data when restoring! | +### Backup-Specific Variables +- `MONGODB_BACKUP_VERSION` - (Optional) Custom version identifier for the backup. If not provided, defaults to timestamp format `YYYYMMDD-HHMMSS` -The playbooks are switched to restore mode by setting `MASBR_ACTION` to `restore`. You **must** specify the `MASBR_RESTORE_FROM_VERSION` environment variable to indicate which version of the backup files to use. +### Restore-Specific Variables +- `MONGODB_BACKUP_VERSION` - (Required) The backup version identifier to restore -In the case of restoring from an incremental backup, the corresponding full backup will be restored first before continuing to restore the incremental backup. +## Optional Environment Variables +### Storage Class Override (Restore) +- `OVERRIDE_STORAGECLASS` - Set to `true` to override storage class names from backup (default: `false`) +- `MONGODB_STORAGECLASS_NAME_RWO` - Custom RWO storage class for MongoDB -Backup/Restore for MongoDB -------------------------------------------------------------------------------- -This playbook `ibm.mas_devops.br_mongodb` will invoke the role [mongodb](../roles/mongodb.md) to backup/restore the MongoDB databases. +### Application-Specific +- `MAS_APP_ID` - (Optional) MAS application ID if backing up application-specific database -This playbook supports backing up and restoring databases for an in-cluster MongoDB CE instance. 
If you are using other MongoDB venders, such as IBM Cloud Databases for MongoDB, Amazon DocumentDB or MongoDB Altas Database, please refer to the corresponding vender's documentation for more information about their provided backup/restore service. +## Usage Examples -### Environment Variables -- `MONGODB_NAMESPACE`: By default the backup and restore processes will use a namespace of `mongoce`, if you have customized the install of MongoDb CE you must set this environment variable to the appropriate namespace you wish to backup from/restore to. -- `MAS_INSTANCE_ID`: **Required**. This playbook supports backup/restore MongoDB databases that belong to a specific MAS instance, call the playbook multiple times with different values for `MAS_INSTANCE_ID` if you wish to back up multiple MAS instances that use the same MongoDB CE instance. -- `MAS_APP_ID`: **Optional**. By default, this playbook will backup all databases belonging to the specified MAS instance. You can backup the databases only belong to a specific MAS application by setting this environment variable to a supported MAS application id `core`, `manage`, `iot`, `monitor`, `health`, `optimizer` or `visualinspection`. +### Backup MongoDB Instance +Create a complete backup of MongoDB instance and all databases: -### Examples ```bash -# Full backup all MongoDB data for the dev1 instance -export MASBR_ACTION=backup -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -ansible-playbook ibm.mas_devops.br_mongodb - -# Incremental backup all MongoDB data for the dev1 instance -export MASBR_ACTION=backup -export MASBR_BACKUP_TYPE=incr -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -ansible-playbook ibm.mas_devops.br_mongodb - -# Restore all MongoDB data for the dev1 instance -export MASBR_ACTION=restore -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MASBR_RESTORE_FROM_VERSION=20240630132439 -export MAS_INSTANCE_ID=dev -ansible-playbook ibm.mas_devops.br_mongodb +export MAS_INSTANCE_ID=inst1 +export MAS_BACKUP_DIR=/backup/mas +export MONGODB_ACTION=backup +export MONGODB_INSTANCE_NAME=mas-mongo-ce +export MONGODB_NAMESPACE=mongoce -# Backup just the IoT MongoDB data for the dev2 instance -export MASBR_ACTION=backup -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev2 -export MAS_APP_ID=iot +oc login --token=xxxx --server=https://myocpserver ansible-playbook ibm.mas_devops.br_mongodb ``` +### Backup with Custom Version +Create a backup with a custom version identifier: -Backup/Restore for Db2 -------------------------------------------------------------------------------- -This playbook `ibm.mas_devops.br_db2` will invoke the role [db2](../roles/db2.md) to backup/restore a single Db2 instance. +```bash +export MAS_INSTANCE_ID=inst1 +export MAS_BACKUP_DIR=/backup/mas +export MONGODB_ACTION=backup +export MONGODB_BACKUP_VERSION=pre-upgrade-mongo +export MONGODB_INSTANCE_NAME=mas-mongo-ce -### Environment Variables +oc login --token=xxxx --server=https://myocpserver +ansible-playbook ibm.mas_devops.br_mongodb +``` -- `DB2_INSTANCE_NAME`: **Required** This playbook only supports backing up specific Db2 instance at a time. If you want to backup all Db2 instances in the Db2 cluster, you need to run this playbook multiple times with different value of this environment variable. -- `MAS_INSTANCE_ID`: **Required** Set the instance ID for the MAS install. -- `MASBR_ACTION`: **Required** Set the action to be performed, `backup` or `restore`. 
-- `MASBR_STORAGE_LOCAL_FOLDER`: **Required** Set the local path to the directory to be used for backup and restore. -- `DB2_NAMESPACE`: **Optional** Set the DB2 namespace, defaults to `db2u` +### Backup Individual Database +Create a backup of a specific database only: -### Examples ```bash -# Full backup for the db2w-shared Db2 instance -export MAS_INSTANCE_ID=dev -export MASBR_ACTION=backup -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export DB2_INSTANCE_NAME=db2w-shared -ansible-playbook ibm.mas_devops.br_db2 - -# Incremental backup for the db2w-shared Db2 instance -export MAS_INSTANCE_ID=dev -export MASBR_ACTION=backup -export MASBR_BACKUP_TYPE=incr -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export DB2_INSTANCE_NAME=db2w-shared -ansible-playbook ibm.mas_devops.br_db2 +export MAS_INSTANCE_ID=inst1 +export MAS_BACKUP_DIR=/backup/mas +export MONGODB_ACTION=backup-database +export MONGODB_INSTANCE_NAME=mas-mongo-ce +export MAS_APP_ID=manage -# Restore for the db2w-shared Db2 instance -export MAS_INSTANCE_ID=dev -export MASBR_ACTION=restore -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MASBR_RESTORE_FROM_VERSION=20240630132439 -export DB2_INSTANCE_NAME=db2w-shared -ansible-playbook ibm.mas_devops.br_db2 +oc login --token=xxxx --server=https://myocpserver +ansible-playbook ibm.mas_devops.br_mongodb ``` -Backup/Restore for MAS Core -------------------------------------------------------------------------------- -This playbook `ibm.mas_devops.br_core` will backup the following components that MAS Core depends on in order: +### Restore MongoDB Instance +Restore MongoDB instance from a backup: -| Component | Ansible Role | Data included | -| --------- | -------------------------------------------------------- | ---------------------------------- | -| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core | -| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | +```bash +export MAS_INSTANCE_ID=inst1 +export MAS_BACKUP_DIR=/backup/mas +export MONGODB_ACTION=restore +export MONGODB_BACKUP_VERSION=20260122-131500 +export MONGODB_INSTANCE_NAME=mas-mongo-ce +export MONGODB_NAMESPACE=mongoce + +oc login --token=xxxx --server=https://myocpserver +ansible-playbook ibm.mas_devops.br_mongodb +``` +### Restore with Storage Class Override +Restore MongoDB to a different cluster with different storage class: -### Environment Variables +```bash +export MAS_INSTANCE_ID=inst1 +export MAS_BACKUP_DIR=/backup/mas +export MONGODB_ACTION=restore +export MONGODB_BACKUP_VERSION=20260122-131500 +export MONGODB_INSTANCE_NAME=mas-mongo-ce +export MONGODB_NAMESPACE=mongoce + +# Override storage class +export OVERRIDE_STORAGECLASS=true +export MONGODB_STORAGECLASS_NAME_RWO=ocs-storagecluster-ceph-rbd + +oc login --token=xxxx --server=https://myocpserver +ansible-playbook ibm.mas_devops.br_mongodb +``` -- `MAS_INSTANCE_ID` **Required**. The MAS instance ID to perform a backup for. 
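Whichever restore variant you run, it is worth confirming that the MongoDB instance reports healthy before reconnecting applications (see also the post-restore verification notes below). A minimal sketch of such a check, assuming the default `mas-mongo-ce` instance in the `mongoce` namespace used in the examples above:

```bash
# Hypothetical post-restore check (default instance name and namespace assumed):
# confirm the MongoDB pods are running and the MongoDBCommunity custom resource
# reports a Running phase before pointing applications back at the database
oc get pods -n mongoce
oc get mongodbcommunity mas-mongo-ce -n mongoce -o jsonpath='{.status.phase}'
```
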
+### Restore Individual Database +Restore a specific database only: -### Examples ```bash -# Full backup all core data for the dev1 instance -export MASBR_ACTION=backup -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -ansible-playbook ibm.mas_devops.br_core - -# Incremental backup all core data for the dev1 instance -export MASBR_ACTION=backup -export MASBR_BACKUP_TYPE=incr -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -ansible-playbook ibm.mas_devops.br_core - -# Restore all core data for the dev1 instance -export MASBR_ACTION=restore -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MASBR_RESTORE_FROM_VERSION=20240630132439 -export MAS_INSTANCE_ID=dev -ansible-playbook ibm.mas_devops.br_core +export MAS_INSTANCE_ID=inst1 +export MAS_BACKUP_DIR=/backup/mas +export MONGODB_ACTION=restore-database +export MONGODB_BACKUP_VERSION=20260122-131500 +export MONGODB_INSTANCE_NAME=mas-mongo-ce +export MAS_APP_ID=manage + +oc login --token=xxxx --server=https://myocpserver +ansible-playbook ibm.mas_devops.br_mongodb ``` +## Important Considerations + +### Backup Actions +- **backup**: Full backup of MongoDB instance (Kubernetes resources + all database data) +- **backup-database**: Backup specific database data only (requires `MAS_APP_ID`) + +### Restore Actions +- **restore**: Full restore of MongoDB instance (Kubernetes resources + all database data) +- **restore-database**: Restore specific database data only (requires `MAS_APP_ID`) + +### Prerequisites for Restore +- Target cluster must have MongoDB Community Operator installed +- Sufficient storage capacity for database restoration +- Same or compatible MongoDB version as the backup +- Target cluster must use the same MAS instance ID as the backup + +### Backup Best Practices +1. **Regular Schedule**: Perform backups regularly, especially before: + - MongoDB upgrades + - MAS upgrades + - Configuration changes +2. **Full vs Database Backups**: + - Use full backups for complete disaster recovery + - Use database backups for application-specific data protection +3. **Test Restores**: Periodically test restore procedures in non-production environments +4. **Secure Storage**: Store backups in a secure location separate from the cluster + +### Restore Best Practices +1. **Pre-Restore Validation**: + - Verify backup archive exists and is complete + - Confirm target cluster has sufficient resources + - Verify MongoDB instance name matches the backup +2. **Post-Restore Verification**: + - Verify MongoDB pods are running + - Test database connectivity + - Verify data integrity + - Check application connectivity to MongoDB + +### Storage Requirements +- Plan for sufficient storage for MongoDB backups +- Database backups can be large depending on data size +- Backup directory structure: `{mas_backup_dir}/backup-{version}-mongoce/` + +### Security Considerations +- Backup files contain sensitive data including database contents and credentials +- Secure backup directory with appropriate permissions (chmod 700 recommended) +- Consider encrypting backups for long-term storage +- Restrict access to backup files to authorized personnel only + +!!! 
tip + If you do not want to set up all the dependencies on your local system, you can run the playbook inside our docker image: `docker run -ti --pull always quay.io/ibmmas/cli` + +## Additional Resources + +For detailed information about MongoDB backup and restore operations, refer to the role documentation: +- [MongoDB Backup/Restore](../roles/mongodb.md) + +Db2 Backup and Restore +=============================================================================== -Backup/Restore for Manage -------------------------------------------------------------------------------- -This playbook `ibm.mas_devops.br_manage` will backup the following components that Manage depends on in order: +## Overview +This playbook performs backup and restore operations for IBM Db2 Universal Operator instances. It supports both online and offline backups, and can store backups either on disk or in S3-compatible object storage(database backups only). -| Component | Role | Data included | -| --------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------ | -| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core | -| db2 | [db2](../roles/db2.md) | Db2 instance used by Manage | -| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | -| manage | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | Manage namespace resources
Persistent volume data, such as attachments | +**Important**: The playbook supports multiple backup actions: +- `backup` - Full Db2 instance backup +- `backup-database` - Individual database backup +- `restore` - Full Db2 instance restore +- `restore-database` - Individual database restore +## Required Environment Variables -### Environment Variables +### Common Variables (Backup and Restore) +- `MAS_INSTANCE_ID` - The instance ID of the MAS installation +- `MAS_BACKUP_DIR` - Directory where backup files will be stored/retrieved (e.g., `/tmp/mas_backups`) +- `DB2_INSTANCE_NAME` - Name of the Db2 instance +- `DB2_ACTION` - Set to `backup`, `backup-database`, `restore`, or `restore-database` -- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. -- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. -- `DB2_INSTANCE_NAME` **Optional**. When defined, this playbook will backup the Db2 instance used by Manage. DB2 role is skipped when environment variable is not defined.. -- `DB2_NAMESPACE`: **Optional** Set the DB2 namespace, defaults to `db2u` +### Backup-Specific Variables +- `DB2_BACKUP_TYPE` - Set to `online` or `offline` (default: `online`) +- `BACKUP_VENDOR` - Set to `disk` or `s3` (default: `disk`) -### Examples +### Restore-Specific Variables +- `DB2_BACKUP_VERSION` - (Required) The backup version identifier to restore -```bash -# Full backup all manage data for the dev1 instance and ws1 workspace -export MASBR_ACTION=backup -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=mas-dev1-ws1-manage # set this to execute db2 backup role -ansible-playbook ibm.mas_devops.br_manage - -# Incremental backup all manage data for the dev1 instance and ws1 workspace -export MASBR_ACTION=backup -export MASBR_BACKUP_TYPE=incr -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=mas-dev1-ws1-manage # set this to execute db2 backup role -ansible-playbook ibm.mas_devops.br_manage - -# Restore all manage data for the dev1 instance and ws1 workspace -export MASBR_ACTION=restore -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MASBR_RESTORE_FROM_VERSION=20240630132439 -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=mas-dev1-ws1-manage # set this to execute db2 backup role -ansible-playbook ibm.mas_devops.br_manage -``` +### S3 Storage Variables (when BACKUP_VENDOR=s3) +- `BACKUP_S3_ALIAS` - S3 alias name (default: `S3DB2COS`) +- `BACKUP_S3_ENDPOINT` - S3 endpoint URL +- `BACKUP_S3_BUCKET` - S3 bucket name +- `BACKUP_S3_ACCESS_KEY` - S3 access key +- `BACKUP_S3_SECRET_KEY` - S3 secret key +## Optional Environment Variables -Backup/Restore for IoT -------------------------------------------------------------------------------- -This playbook `ibm.mas_devops.br_iot` will backup the following components that IoT depends on in order: +### Db2 Configuration +- `DB2_NAMESPACE` - Namespace where Db2 is installed (default: `db2u`) -| Component | Ansible 
Role | Data included | -| --------- | ---------------------------------------------------------------- | ------------------------------------------ | -| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core and IoT | -| db2 | [db2](../roles/db2.md) | Db2 instance used by IoT | -| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | -| iot | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | IoT namespace resources | +### Storage Class Override (Restore) +- `OVERRIDE_STORAGECLASS` - Set to `true` to override storage class names from backup (default: `false`) +- `CUSTOM_STORAGE_CLASS_RWO` - Storage class for Read-write-only +- `CUSTOM_STORAGE_CLASS_RWX` - Storage class for Read-write-many -### Environment Variables -- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. -- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. -- `DB2_INSTANCE_NAME` **Required**. This playbook will backup the the Db2 instance used by IoT, you need to set the correct Db2 instance name for this environment variable. -- `DB2_NAMESPACE`: **Optional** Set the DB2 namespace, defaults to `db2u` +## Usage Examples -### Examples +### Backup Db2 to Disk (Online) +Create an online backup of Db2 instance to local disk: ```bash -# Full backup all iot data for the dev1 instance -export MASBR_ACTION=backup -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=db2w-shared -ansible-playbook ibm.mas_devops.br_iot - -# Incremental backup all iot data for the dev1 instance -export MASBR_ACTION=backup -export MASBR_BACKUP_TYPE=incr -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 +export MAS_INSTANCE_ID=inst1 +export MAS_BACKUP_DIR=/backup/mas export DB2_INSTANCE_NAME=db2w-shared -ansible-playbook ibm.mas_devops.br_iot - -# Restore all iot data for the dev1 instance -export MASBR_ACTION=restore -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MASBR_RESTORE_FROM_VERSION=20240630132439 -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=db2w-shared -ansible-playbook ibm.mas_devops.br_iot +export DB2_ACTION=backup +export DB2_BACKUP_TYPE=online +export BACKUP_VENDOR=disk +oc login --token=xxxx --server=https://myocpserver +ansible-playbook ibm.mas_devops.br_db2 ``` - -Backup/Restore for Monitor -------------------------------------------------------------------------------- -This playbook `ibm.mas_devops.br_monitor` will backup the following components that Monitor depends on in order: - -| Component | Ansible Role | Data included | -| --------- | ---------------------------------------------------------------- | --------------------------------------------------- | -| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core, IoT and Monitor | -| db2 | [db2](../roles/db2.md) | Db2 instance used by IoT and Monitor | -| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core 
namespace resources | -| iot | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | IoT namespace resources | -| monitor | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | Monitor namespace resources | - - -### Environment Variables - -- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. -- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. -- `DB2_INSTANCE_NAME` **Required**. This playbook will backup the the Db2 instance used by IoT and Monitor, you need to set the correct Db2 instance name for this environment variable. -- `DB2_NAMESPACE`: **Optional** Set the DB2 namespace, defaults to `db2u` - -### Examples +### Backup Db2 to S3 +Create a backup of Db2 instance to S3 storage: ```bash -# Full backup all monitor data for the dev1 instance -export MASBR_ACTION=backup -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 +export MAS_INSTANCE_ID=inst1 +export MAS_BACKUP_DIR=/backup/mas export DB2_INSTANCE_NAME=db2w-shared -ansible-playbook ibm.mas_devops.br_monitor - -# Incremental backup all monitor data for the dev1 instance -export MASBR_ACTION=backup -export MASBR_BACKUP_TYPE=incr -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=db2w-shared -ansible-playbook ibm.mas_devops.br_monitor - -# Restore all monitor data for the dev1 instance -export MASBR_ACTION=restore -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MASBR_RESTORE_FROM_VERSION=20240630132439 -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=db2w-shared -ansible-playbook ibm.mas_devops.br_monitor +export DB2_ACTION=backup +export DB2_BACKUP_TYPE=online +export BACKUP_VENDOR=s3 +export BACKUP_S3_ENDPOINT=https://s3.us-east.cloud-object-storage.appdomain.cloud +export BACKUP_S3_BUCKET=mas-db2-backups +export BACKUP_S3_ACCESS_KEY=your-access-key +export BACKUP_S3_SECRET_KEY=your-secret-key + +oc login --token=xxxx --server=https://myocpserver +ansible-playbook ibm.mas_devops.br_db2 ``` - -Backup/Restore for Health -------------------------------------------------------------------------------- -This playbook `ibm.mas_devops.br_health` will backup the following components that Health depends on in order: - -| Component | Ansible Role | Data included | -| --------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------ | -| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core | -| db2 | [db2](../roles/db2.md) | Db2 instance used by Manage and Health | -| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | -| manage | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | Manage namespace resources
Persistent volume data, such as attachments | -| health | [suite_backup_restore](../roles/suite_backup_restore.md) | Health namespace resources
Watson Studio project assets | - -### Environment Variables - -- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. -- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. -- `DB2_INSTANCE_NAME` **Required**. This playbook will backup the the Db2 instance used by Manage and Health, you need to set the correct Db2 instance name for this environment variable. - -### Examples +### Restore Db2 from Backup +Restore Db2 instance from a previous backup: ```bash -# Full backup all health data for the dev1 instance and ws1 workspace -export MASBR_ACTION=backup -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=mas-dev1-ws1-manage -ansible-playbook ibm.mas_devops.br_health - -# Incremental backup all health data for the dev1 instance and ws1 workspace -export MASBR_ACTION=backup -export MASBR_BACKUP_TYPE=incr -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=mas-dev1-ws1-manage -ansible-playbook ibm.mas_devops.br_health - -# Restore all health data for the dev1 instance and ws1 workspace -export MASBR_ACTION=restore -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MASBR_RESTORE_FROM_VERSION=20240630132439 -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=mas-dev1-ws1-manage -ansible-playbook ibm.mas_devops.br_health -``` - - -Backup/Restore for Optimizer -------------------------------------------------------------------------------- -This playbook `ibm.mas_devops.br_optimizer` will backup the following components that Optimizer depends on in order: - -| Component | Ansible Role | Data included | -| --------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------ | -| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core and Optimizer | -| db2 | [db2](../roles/db2.md) | Db2 instance used by Manage | -| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | -| manage | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | Manage namespace resources
Persistent volume data, such as attachments | -| optimizer | [suite_backup_restore](../roles/suite_backup_restore.md) | Optimizer namespace resources | - -### Environment Variables - -- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. -- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. -- `DB2_INSTANCE_NAME` **Required**. This playbook will backup the the Db2 instance used by Manage, you need to set the correct Db2 instance name for this environment variable. -- `DB2_NAMESPACE`: **Optional** Set the DB2 namespace, defaults to `db2u` - -### Examples +export MAS_INSTANCE_ID=inst1 +export MAS_BACKUP_DIR=/backup/mas +export DB2_INSTANCE_NAME=db2w-shared +export DB2_ACTION=restore +export DB2_BACKUP_VERSION=20260122-131500 +export BACKUP_VENDOR=disk -```bash -# Full backup all optimizer data for the dev1 instance and ws1 workspace -export MASBR_ACTION=backup -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=mas-dev1-ws1-manage -ansible-playbook ibm.mas_devops.br_optimizer - -# Incremental backup all optimizer data for the dev1 instance and ws1 workspace -export MASBR_ACTION=backup -export MASBR_BACKUP_TYPE=incr -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=mas-dev1-ws1-manage -ansible-playbook ibm.mas_devops.br_optimizer - -# Restore all optimizer data for the dev1 instance and ws1 workspace -export MASBR_ACTION=restore -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MASBR_RESTORE_FROM_VERSION=20240630132439 -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -export DB2_INSTANCE_NAME=mas-dev1-ws1-manage -ansible-playbook ibm.mas_devops.br_optimizer +oc login --token=xxxx --server=https://myocpserver +ansible-playbook ibm.mas_devops.br_db2 ``` - -Backup/Restore for Visual Inspection -------------------------------------------------------------------------------- -This playbook `ibm.mas_devops.br_visualinspection` will backup the following components that Visual Inspection depends on in order: - -| Component | Ansible Role | Data included | -| ---------------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------- | -| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core and Visual Inspection | -| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | -| visualinspection | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | Visual Inspection namespace resources
Persistent volume data, such as images and models | - -### Environment Variables - -- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. -- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. - -### Examples +### Restore with Storage Class Override +Restore Db2 to a different cluster with different storage classes: ```bash -# Full backup all visual inspection data for the dev1 instance and ws1 workspace -export MASBR_ACTION=backup -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -ansible-playbook ibm.mas_devops.br_visualinspection - -# Incremental backup all visual inspection data for the dev1 instance and ws1 workspace -export MASBR_ACTION=backup -export MASBR_BACKUP_TYPE=incr -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -ansible-playbook ibm.mas_devops.br_visualinspection - -# Restore all visual inspection data for the dev1 instance and ws1 workspace -export MASBR_ACTION=restore -export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup -export MASBR_RESTORE_FROM_VERSION=20240630132439 -export MAS_INSTANCE_ID=dev -export MAS_WORKSPACE_ID=ws1 -ansible-playbook ibm.mas_devops.br_visualinspection +export MAS_INSTANCE_ID=inst1 +export MAS_BACKUP_DIR=/backup/mas +export DB2_INSTANCE_NAME=db2w-shared +export DB2_ACTION=restore +export DB2_BACKUP_VERSION=20260122-131500 +export BACKUP_VENDOR=disk + +# Override storage classes +export OVERRIDE_STORAGECLASS=true +export DB2_META_STORAGE_CLASS=ocs-storagecluster-ceph-rbd +export DB2_DATA_STORAGE_CLASS=ocs-storagecluster-ceph-rbd +export DB2_BACKUP_STORAGE_CLASS=ocs-storagecluster-ceph-rbd +export DB2_LOGS_STORAGE_CLASS=ocs-storagecluster-ceph-rbd +export DB2_TEMP_STORAGE_CLASS=ocs-storagecluster-ceph-rbd + +oc login --token=xxxx --server=https://myocpserver +ansible-playbook ibm.mas_devops.br_db2 ``` +## Important Considerations -Reference -------------------------------------------------------------------------------- -### Directory Structure -No matter what kind of storage systems you choose, the folder structure created in the storage system is same. 
+### Backup Types +- **Online Backup**: Database remains available during backup (recommended for production) +- **Offline Backup**: Database is taken offline during backup (faster but causes downtime) -Below is the sample folder structure for saving backup jobs: +### Storage Vendor Options +- **Disk**: Stores backups on local filesystem or mounted storage +- **S3**: Stores backups in S3-compatible object storage (recommended for production) -``` -/backups/mongodb-main-full-20240621122530 -├── backup.yml -├── database -│ ├── mongodb-main-full-20240621122530.tar.gz -│ └── query.json -└── log - ├── mongodb-main-full-20240621122530-backup-log.tar.gz - └── mongodb-main-full-20240621122530-ansible-log.tar.gz - -/backups/core-main-full-20240621122530 -├── backup.yml -├── log -│ ├── core-main-full-20240621122530-ansible-log.tar.gz -│ └── core-main-full-20240621122530-namespace-log.tar.gz -└── namespace - └── core-main-full-20240621122530-namespace.tar.gz -``` +### Prerequisites for Restore +- Target cluster must have Db2 Universal Operator installed +- Sufficient storage capacity for database restoration +- Same or compatible Db2 version as the backup -- ``: the root folder is specified by `MASBR_STORAGE_LOCAL_FOLDER` or `MASBR_STORAGE_CLOUD_BUCKET` -- The backup playbooks will create a seperated backup job folder under the `backups` folder for each component. The backup job folder is named by following this format: `{BACKUP COMPONENT}-{INSTANCE ID}-{BACKUP TYPE}-{BACKUP VERSION}`. -- When using playbook to backup multiple components at once, all backup job folders will be assigned to the same backup version. In above example, the same backup version `20240621122530` for backing up `mongodb` and `core` components. -- `backup.yml`: keep the backup job information -- `database`: data type for database. This folder save the backup files of MongoDB database, Db2 database. -- `namespace`: data type for namespace resources. This folder save the exported namespace resources. -- `pv`: data type for persistent volume. This folder save the persistent volume data, e.g. the Manage attachments, VI images and models. -- `log`: this folder save all job running log files -In addition to the backup jobs, we also save restore jobs in the specified storage location. For example: +MAS Core Backup and Restore +=============================================================================== -``` -/restores/mongodb-main-incr-20240622040201-20240622075501 -├── log -│ ├── mongodb-main-incr-20240622040201-20240622075501-ansible-log.tar.gz -│ └── mongodb-main-incr-20240622040201-20240622075501-restore-log.tar.gz -└── restore.yml - -/restores/core-main-incr-20240622040201-20240622075501 -├── log -│ ├── core-main-incr-20240622040201-20240622075501-ansible-log.tar.gz -│ └── core-main-incr-20240622040201-20240622075501-namespace-log.tar.gz -└── restore.yml -``` +## Overview +This guide shows backup and restore operations for IBM Maximo Application Suite Core and its dependencies. This guidance can be used to build your own playbooks to run against any OCP cluster regardless of its type; whether it's running in IBM Cloud, Azure, AWS, or your local datacenter. + +**Important**: Backup can only be restored to an instance with the same MAS instance ID. + +## Guidance Content + +Sequence of roles: + +### Backup Operation +1. [Backup IBM Operator Catalogs](../roles/ibm_catalogs.md) (~1 minute) +2. [Backup Certificate Manager](../roles/cert_manager.md) (~1 minute) +3. 
[Backup MongoDB Community Edition](../roles/mongodb.md) (~5-30 minutes depending on database size) +4. [Backup Suite License Service](../roles/sls.md) (~2 minutes, optional) +5. [Backup MAS Core](../roles/suite_backup.md) (~5 minutes) + +### Restore Operation +1. [Restore IBM Operator Catalogs](../roles/ibm_catalogs.md) (~2 minutes) +2. [Restore Certificate Manager](../roles/cert_manager.md) (~5 minutes) +3. [Install Grafana](../roles/grafana.md) (~10 minutes, optional) +4. [Restore MongoDB Community Edition](../roles/mongodb.md) (~10-60 minutes depending on database size) +5. [Restore Suite License Service](../roles/sls.md) (~10 minutes, optional) +6. [Install Data Reporter Operator](../roles/dro.md) (~10 minutes, optional) +7. [Restore MAS Core](../roles/suite_restore.md) (~30 minutes) + +All timings are estimates. See the individual role documentation for more information and full details of all configuration options. + +## Important Considerations + +### Prerequisites for Restore +- Target cluster must have sufficient resources (CPU, memory, storage) +- Certificate Manager must be installed (handled by playbook) +- Target cluster must use the same MAS instance ID as the backup +- Backup files must be accessible from the restore environment + +### Backup Best Practices +1. **Regular Schedule**: Perform backups regularly, especially before: + - MAS upgrades + - Configuration changes + - Application installations + - Cluster maintenance +2. **Test Restores**: Periodically test restore procedures in non-production environments +3. **Secure Storage**: Store backups in a secure location separate from the cluster +4. **Retention Policy**: Implement and document backup retention policies +5. **Verify Integrity**: Verify backup integrity after completion + +### Restore Best Practices +1. **Pre-Restore Validation**: + - Verify backup archive exists and is complete + - Confirm target cluster has sufficient resources + - Verify MAS instance ID matches the backup +2. **Dependency Coordination**: + - Ensure all external services (SLS, DRO, databases) are accessible + - Verify network connectivity to external services +3. **Post-Restore Verification**: + - Verify Suite status is Ready + - Verify all Workspaces are Ready + - Test application connectivity + - Test user authentication + +### Storage Requirements +- Ensure sufficient storage in the backup directory +- Plan for at least 2x the database size for MongoDB backups +- Monitor disk space during backup operations +- Backup directory structure: `{mas_backup_dir}/backup-{version}-{component}/` + +### Security Considerations +- Backup files contain sensitive data including credentials and certificates +- Secure backup directory with appropriate permissions (chmod 700 recommended) +- Consider encrypting backups for long-term storage +- Restrict access to backup files to authorized personnel only +- Ensure secure transfer of backup files to restore environment + +!!! 
tip + If you do not want to set up all the dependencies on your local system, you can run the playbook inside our docker image: `docker run -ti --pull always quay.io/ibmmas/cli` + +## Additional Resources + +For detailed information about individual backup and restore operations, refer to the role documentation: +- [IBM Operator Catalogs Backup/Restore](../roles/ibm_catalogs.md) +- [Certificate Manager Backup/Restore](../roles/cert_manager.md) +- [MongoDB Backup/Restore](../roles/mongodb.md) +- [SLS Backup/Restore](../roles/sls.md) +- [MAS Core Backup](../roles/suite_backup.md) +- [MAS Core Restore](../roles/suite_restore.md) +- [Db2 Backup/Restore](../roles/db2.md) + +Manage Application Backup and Restore +=============================================================================== -The restore playbooks will create a seperated restore job folder under the `restores` folder for each component. The restore job folder is named by following this format: `{BACKUP JOB NAME}-{RESTORE VERSION}`. - -- `restore.yml`: keep the restore job information -- `log`: this folder save all job running log files - -### Data Model -#### backup.yml -```yaml -kind: Backup -name: "core-main-incr-20240622040201" -version: "20240622040201" -type: "incr" -from: "core-main-full-20240621122530" -source: - domain: "source-cluster.mydomain.com" - suite: "8.11.11" - instance: "main" - workspace: "" -component: - name: "core" - instance: "main" - namespace: "mas-main-core" -data: - - seq: "1" - type: "namespace" - phase: "Completed" -status: - phase: "Completed" - startTimestamp: "2024-06-22T04:05:22" - completionTimestamp: "2024-06-22T04:06:04" - sentNotifications: - - type: "Slack" - channel: "#ansible-slack-dev" - timestamp: "2024-06-22T04:05:34" - phase: "InProgress" - - type: "Slack" - channel: "#ansible-slack-dev" - timestamp: "2024-06-22T04:06:10" - phase: "Completed" -``` +## Overview +This guide shows backup and restore operations for IBM Maximo Manage application. + +**Important**: +- Backup can only be restored to an instance with the same MAS instance ID +- You **MUST** run the [DB2 backup and restore playbook](#db2-backup-and-restore) as a prerequisite step before running Manage backup or restore operations + +## Content + +Executes the following operations: + +### Backup Operation +1. **[Backup Db2 Database](../roles/db2.md) - PREREQUISITE STEP** (run `br_db2.yml` playbook separately first) +2. [Backup Manage Application](../roles/suite_app_backup.md) + +### Restore Operation +1. **[Restore Db2 Database](../roles/db2.md) - PREREQUISITE STEP** (run `br_db2.yml` playbook separately first) +2. [Restore Manage Application](../roles/suite_app_restore.md) + +## Important Considerations + +### Prerequisites for Restore +- Target cluster must have MAS Core installed and configured +- Target cluster must have Db2 Universal Operator installed +- Workspace must exist with the same workspace ID +- Sufficient resources (CPU, memory, storage) for both Db2 and Manage +- Target cluster must use the same MAS instance ID as the backup + +### Backup Best Practices +1. **Two-Step Process**: Always backup DB2 first, then Manage application + - Run `br_db2.yml` playbook before running Manage application backup + - DB2 backup is NOT automatically included in Manage backup +2. **Version Alignment**: Use consistent version identifiers for both DB2 and Manage backups for easier tracking +3. 
**Regular Schedule**: Perform backups regularly, especially before: + - Manage upgrades or updates + - Configuration changes + - Data migrations +4. **Test Restores**: Periodically test restore procedures in non-production environments +5. **Secure Storage**: Store backups in a secure location, preferably using S3 storage + +### Restore Best Practices +1. **Pre-Restore Validation**: + - Verify both DB2 and Manage backup archives exist + - Confirm target cluster has sufficient resources + - Verify MAS instance ID and workspace ID match the backup +2. **Restore Order**: **CRITICAL** - Always restore DB2 first, then Manage application + - Run `br_db2.yml` playbook before running Manage application restore + - DB2 restore is NOT automatically included in Manage restore +3. **Post-Restore Verification**: + - Verify DB2 instance is running and accessible + - Verify Manage workspace status is Ready + - Test Manage application functionality + - Verify data integrity + +### Storage Requirements +- Plan for sufficient storage for both Db2 and Manage backups +- Db2 backups can be large depending on database size +- Manage application configuration is relatively small +- Consider using S3 storage for production backups + +### Security Considerations +- Backup files contain sensitive data including database contents and credentials +- Secure backup directory with appropriate permissions +- Consider encrypting backups for long-term storage +- Restrict access to backup files to authorized personnel only +- Ensure secure transfer of backup files to restore environment + +!!! tip + If you do not want to set up all the dependencies on your local system, you can run the playbook inside our docker image: `docker run -ti --pull always quay.io/ibmmas/cli` + +## Additional Resources + +For detailed information about individual backup and restore operations, refer to the role documentation: +- [Db2 Backup/Restore](../roles/db2.md) +- [Manage Application Backup](../roles/suite_app_backup.md) +- [Manage Application Restore](../roles/suite_app_restore.md) -#### restore.yml -```yaml -kind: Restore -name: "core-main-incr-20240622040201-20240622075501" -version: "20240622075501" -from: "core-main-incr-20240622040201" -target: - domain: "target-cluster.mydomain.com" -component: - name: "core" - instance: "main" - namespace: "mas-main-core" -data: - - seq: 1 - type: "namespace" - phase: "Completed" -status: - phase: "Completed" - startTimestamp: "2024-06-22T08:04:19" - completionTimestamp: "2024-06-22T08:04:33" -``` diff --git a/docs/playbooks/legacy-backup-restore.md b/docs/playbooks/legacy-backup-restore.md new file mode 100644 index 0000000000..51ce4ed12d --- /dev/null +++ b/docs/playbooks/legacy-backup-restore.md @@ -0,0 +1,604 @@ +Legacy Backup and Restore +=============================================================================== + +!!! important + The Legacy backup and restore playbooks are removed in CLI v19.0 and later. + Use older CLI versions to continue using the legacy playbooks. + Please refer to the [backup and restore documentation](../backup-and-restore.md) for the latest information. 
+ +Overview +------------------------------------------------------------------------------- +MAS Devops Collection includes playbooks for backing up and restoring of the following MAS components and their dependencies: + +- [MongoDB](#backuprestore-for-mongodb) +- [Db2](#backuprestore-for-db2) +- [MAS Core](#backuprestore-for-mas-core) +- [Manage](#backuprestore-for-manage) +- [IoT](#backuprestore-for-iot) +- [Monitor](#backuprestore-for-monitor) +- [Health](#backuprestore-for-health) +- [Optimizer](#backuprestore-for-optimizer) +- [Visual Inspection](#backuprestore-for-visual-inspection) + + +Creation of both **full** and **incremental** backups are supported. The backup and restore Ansible roles can also be used individually, allowing you to build your own customized backup and restore playbook covering exactly what you need. For example, you can only [backup/restore Manage attachments](../roles/suite_app_backup_restore.md). + +!!! important + The backup and restore playbooks in this collection are still work in progress, they are not suitable for production use at this time. You may track development progress using the [Backup & Restore](https://github.com/ibm-mas/ansible-devops/issues?q=label%3A%22Backup+%26+Restore%22+) label in the Github repository. + + Production-ready backup and restore options are detailed in the [Backup and restore](https://www.ibm.com/docs/en/mas-cd/continuous-delivery?topic=administering-backing-up-restoring-maximo-application-suite) topic in the product documentation. + +Configuration - Storage +------------------------------------------------------------------------------- +You can save the backup files to a folder on your local file system by setting the following environment variables: + +| Envrionment variable | Required (Default Value) | Description | +| ------------------------------------ | -------------------------- | ----------- | +| MASBR_STORAGE_LOCAL_FOLDER | **Yes** | The local path to save the backup files | +| MASBR_LOCAL_TEMP_FOLDER | No (`/tmp/masbr`) | Local folder for saving the temporary backup/restore data, the data in this folder will be deleted after the backup/restore job completed. | + + +Configuration - Backup +------------------------------------------------------------------------------- + +| Envrionment variable | Required (Default Value) | Description | +| ------------------------------------ | ------------------------ | ----------- | +| MASBR_ACTION | **Yes** | Whether to run the playbook to perform a `backup` or a `restore` | +| MASBR_BACKUP_TYPE | No (`full`) | Set `full` or `incr` to indicate the playbook to create a **full** backup or **incremental** backup. | +| MASBR_BACKUP_FROM_VERSION | No | Set the full backup version to use in the incremental backup, this will be in the format of a `YYYMMDDHHMMSS` timestamp (e.g. `20240621021316`). | + +The playbooks are switched to backup mode by setting `MASBR_ACTION` to `backup`. + +### Full Backups +If you set environment variable `MASBR_BACKUP_TYPE=full` or do not specify a value for this variable, the playbook will take a full backup. + +### Incremental Backups +You can set environment variable `MASBR_BACKUP_TYPE=incr` to indicate the playbook to take an incremental backup. + +!!! important + Only supports creating incremental backup for MonogDB, Db2 and persistent volume data. The playbook will always create a full backup for other type of data regardless of whether this variable be set to `incr`. 
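As a quick illustration of how these variables fit together (complete, component-specific examples are provided in the sections below), a full backup followed by an incremental backup of the same data might look like this:

```bash
# Take a full backup first (MASBR_BACKUP_TYPE defaults to full)
export MASBR_ACTION=backup
export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup
export MAS_INSTANCE_ID=dev
ansible-playbook ibm.mas_devops.br_mongodb

# Later, take an incremental backup; it is based on the latest completed full
# backup found in MASBR_STORAGE_LOCAL_FOLDER unless MASBR_BACKUP_FROM_VERSION is set
export MASBR_BACKUP_TYPE=incr
ansible-playbook ibm.mas_devops.br_mongodb
```
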
+
+The environment variable `MASBR_BACKUP_FROM_VERSION` is only valid when `MASBR_BACKUP_TYPE=incr`. It indicates which backup version the incremental backup is based on. If you do not set a value for this variable, the playbook will try to find the latest Completed Full backup in the specified storage location, and then take an incremental backup based on it.
+
+!!! important
+    The backup specified by `MASBR_BACKUP_FROM_VERSION` must be a Full backup, and the component name and data types in the specified Full backup must be the same as those of the current incremental backup job.
+
+
+Configuration - Restore
+-------------------------------------------------------------------------------
+
+| Environment variable       | Required (Default Value) | Description |
+| -------------------------- | ------------------------ | ----------- |
+| MASBR_ACTION               | **Yes**                  | Whether to run the playbook to perform a `backup` or a `restore` |
+| MASBR_RESTORE_FROM_VERSION | **Yes**                  | Set the backup version to use in the restore; this will be in the format of a `YYYYMMDDHHMMSS` timestamp (e.g. `20240621021316`) |
+| MASBR_RESTORE_OVERWRITE    | **Yes**                  | Set whether the restore should **overwrite** any existing data, or stop and **FAIL** if existing data is detected. **WARNING:** This will overwrite all data when restoring! |
+
+The playbooks are switched to restore mode by setting `MASBR_ACTION` to `restore`. You **must** specify the `MASBR_RESTORE_FROM_VERSION` environment variable to indicate which version of the backup files to use.
+
+When restoring from an incremental backup, the corresponding full backup is restored first before continuing with the incremental backup.
+
+
+Backup/Restore for MongoDB
+-------------------------------------------------------------------------------
+This playbook `ibm.mas_devops.br_mongodb` will invoke the role [mongodb](../roles/mongodb.md) to backup/restore the MongoDB databases.
+
+This playbook supports backing up and restoring databases for an in-cluster MongoDB CE instance. If you are using another MongoDB vendor, such as IBM Cloud Databases for MongoDB, Amazon DocumentDB or MongoDB Atlas, refer to that vendor's documentation for more information about the backup/restore service it provides.
+
+### Environment Variables
+- `MONGODB_NAMESPACE`: By default the backup and restore processes use the `mongoce` namespace; if you have customized the install of MongoDB CE you must set this environment variable to the namespace you wish to backup from/restore to.
+- `MAS_INSTANCE_ID`: **Required**. This playbook backs up/restores the MongoDB databases that belong to a specific MAS instance; call the playbook multiple times with different values for `MAS_INSTANCE_ID` if you wish to back up multiple MAS instances that use the same MongoDB CE instance.
+- `MAS_APP_ID`: **Optional**. By default, this playbook will backup all databases belonging to the specified MAS instance. You can backup only the databases belonging to a specific MAS application by setting this environment variable to a supported MAS application id: `core`, `manage`, `iot`, `monitor`, `health`, `optimizer` or `visualinspection`.
+ +### Examples +```bash +# Full backup all MongoDB data for the dev1 instance +export MASBR_ACTION=backup +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +ansible-playbook ibm.mas_devops.br_mongodb + +# Incremental backup all MongoDB data for the dev1 instance +export MASBR_ACTION=backup +export MASBR_BACKUP_TYPE=incr +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +ansible-playbook ibm.mas_devops.br_mongodb + +# Restore all MongoDB data for the dev1 instance +export MASBR_ACTION=restore +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MASBR_RESTORE_FROM_VERSION=20240630132439 +export MAS_INSTANCE_ID=dev +ansible-playbook ibm.mas_devops.br_mongodb + +# Backup just the IoT MongoDB data for the dev2 instance +export MASBR_ACTION=backup +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev2 +export MAS_APP_ID=iot +ansible-playbook ibm.mas_devops.br_mongodb +``` + + +Backup/Restore for Db2 +------------------------------------------------------------------------------- +This playbook `ibm.mas_devops.br_db2` will invoke the role [db2](../roles/db2.md) to backup/restore a single Db2 instance. + +### Environment Variables + +- `DB2_INSTANCE_NAME`: **Required** This playbook only supports backing up specific Db2 instance at a time. If you want to backup all Db2 instances in the Db2 cluster, you need to run this playbook multiple times with different value of this environment variable. +- `MAS_INSTANCE_ID`: **Required** Set the instance ID for the MAS install. +- `MASBR_ACTION`: **Required** Set the action to be performed, `backup` or `restore`. +- `MASBR_STORAGE_LOCAL_FOLDER`: **Required** Set the local path to the directory to be used for backup and restore. +- `DB2_NAMESPACE`: **Optional** Set the DB2 namespace, defaults to `db2u` + +### Examples +```bash +# Full backup for the db2w-shared Db2 instance +export MAS_INSTANCE_ID=dev +export MASBR_ACTION=backup +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export DB2_INSTANCE_NAME=db2w-shared +ansible-playbook ibm.mas_devops.br_db2 + +# Incremental backup for the db2w-shared Db2 instance +export MAS_INSTANCE_ID=dev +export MASBR_ACTION=backup +export MASBR_BACKUP_TYPE=incr +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export DB2_INSTANCE_NAME=db2w-shared +ansible-playbook ibm.mas_devops.br_db2 + +# Restore for the db2w-shared Db2 instance +export MAS_INSTANCE_ID=dev +export MASBR_ACTION=restore +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MASBR_RESTORE_FROM_VERSION=20240630132439 +export DB2_INSTANCE_NAME=db2w-shared +ansible-playbook ibm.mas_devops.br_db2 +``` + +Backup/Restore for MAS Core +------------------------------------------------------------------------------- +This playbook `ibm.mas_devops.br_core` will backup the following components that MAS Core depends on in order: + +| Component | Ansible Role | Data included | +| --------- | -------------------------------------------------------- | ---------------------------------- | +| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core | +| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | + + +### Environment Variables + +- `MAS_INSTANCE_ID` **Required**. The MAS instance ID to perform a backup for. 
+ +### Examples +```bash +# Full backup all core data for the dev1 instance +export MASBR_ACTION=backup +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +ansible-playbook ibm.mas_devops.br_core + +# Incremental backup all core data for the dev1 instance +export MASBR_ACTION=backup +export MASBR_BACKUP_TYPE=incr +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +ansible-playbook ibm.mas_devops.br_core + +# Restore all core data for the dev1 instance +export MASBR_ACTION=restore +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MASBR_RESTORE_FROM_VERSION=20240630132439 +export MAS_INSTANCE_ID=dev +ansible-playbook ibm.mas_devops.br_core +``` + + +Backup/Restore for Manage +------------------------------------------------------------------------------- +This playbook `ibm.mas_devops.br_manage` will backup the following components that Manage depends on in order: + +| Component | Role | Data included | +| --------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------ | +| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core | +| db2 | [db2](../roles/db2.md) | Db2 instance used by Manage | +| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | +| manage | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | Manage namespace resources
Persistent volume data, such as attachments | + + +### Environment Variables + +- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. +- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. +- `DB2_INSTANCE_NAME` **Optional**. When defined, this playbook will backup the Db2 instance used by Manage. DB2 role is skipped when environment variable is not defined.. +- `DB2_NAMESPACE`: **Optional** Set the DB2 namespace, defaults to `db2u` + +### Examples + +```bash +# Full backup all manage data for the dev1 instance and ws1 workspace +export MASBR_ACTION=backup +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=mas-dev1-ws1-manage # set this to execute db2 backup role +ansible-playbook ibm.mas_devops.br_manage + +# Incremental backup all manage data for the dev1 instance and ws1 workspace +export MASBR_ACTION=backup +export MASBR_BACKUP_TYPE=incr +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=mas-dev1-ws1-manage # set this to execute db2 backup role +ansible-playbook ibm.mas_devops.br_manage + +# Restore all manage data for the dev1 instance and ws1 workspace +export MASBR_ACTION=restore +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MASBR_RESTORE_FROM_VERSION=20240630132439 +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=mas-dev1-ws1-manage # set this to execute db2 backup role +ansible-playbook ibm.mas_devops.br_manage +``` + + +Backup/Restore for IoT +------------------------------------------------------------------------------- +This playbook `ibm.mas_devops.br_iot` will backup the following components that IoT depends on in order: + +| Component | Ansible Role | Data included | +| --------- | ---------------------------------------------------------------- | ------------------------------------------ | +| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core and IoT | +| db2 | [db2](../roles/db2.md) | Db2 instance used by IoT | +| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | +| iot | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | IoT namespace resources | + +### Environment Variables + +- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. +- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. +- `DB2_INSTANCE_NAME` **Required**. 
This playbook will backup the the Db2 instance used by IoT, you need to set the correct Db2 instance name for this environment variable. +- `DB2_NAMESPACE`: **Optional** Set the DB2 namespace, defaults to `db2u` + +### Examples + +```bash +# Full backup all iot data for the dev1 instance +export MASBR_ACTION=backup +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=db2w-shared +ansible-playbook ibm.mas_devops.br_iot + +# Incremental backup all iot data for the dev1 instance +export MASBR_ACTION=backup +export MASBR_BACKUP_TYPE=incr +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=db2w-shared +ansible-playbook ibm.mas_devops.br_iot + +# Restore all iot data for the dev1 instance +export MASBR_ACTION=restore +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MASBR_RESTORE_FROM_VERSION=20240630132439 +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=db2w-shared +ansible-playbook ibm.mas_devops.br_iot + +``` + + +Backup/Restore for Monitor +------------------------------------------------------------------------------- +This playbook `ibm.mas_devops.br_monitor` will backup the following components that Monitor depends on in order: + +| Component | Ansible Role | Data included | +| --------- | ---------------------------------------------------------------- | --------------------------------------------------- | +| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core, IoT and Monitor | +| db2 | [db2](../roles/db2.md) | Db2 instance used by IoT and Monitor | +| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | +| iot | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | IoT namespace resources | +| monitor | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | Monitor namespace resources | + + +### Environment Variables + +- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. +- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. +- `DB2_INSTANCE_NAME` **Required**. This playbook will backup the the Db2 instance used by IoT and Monitor, you need to set the correct Db2 instance name for this environment variable. 
+- `DB2_NAMESPACE`: **Optional** Set the DB2 namespace, defaults to `db2u` + +### Examples + +```bash +# Full backup all monitor data for the dev1 instance +export MASBR_ACTION=backup +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=db2w-shared +ansible-playbook ibm.mas_devops.br_monitor + +# Incremental backup all monitor data for the dev1 instance +export MASBR_ACTION=backup +export MASBR_BACKUP_TYPE=incr +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=db2w-shared +ansible-playbook ibm.mas_devops.br_monitor + +# Restore all monitor data for the dev1 instance +export MASBR_ACTION=restore +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MASBR_RESTORE_FROM_VERSION=20240630132439 +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=db2w-shared +ansible-playbook ibm.mas_devops.br_monitor +``` + + +Backup/Restore for Health +------------------------------------------------------------------------------- +This playbook `ibm.mas_devops.br_health` will backup the following components that Health depends on in order: + +| Component | Ansible Role | Data included | +| --------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------ | +| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core | +| db2 | [db2](../roles/db2.md) | Db2 instance used by Manage and Health | +| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | +| manage | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | Manage namespace resources
Persistent volume data, such as attachments | +| health | [suite_backup_restore](../roles/suite_backup_restore.md) | Health namespace resources
Watson Studio project assets | + +### Environment Variables + +- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. +- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. +- `DB2_INSTANCE_NAME` **Required**. This playbook will backup the the Db2 instance used by Manage and Health, you need to set the correct Db2 instance name for this environment variable. + +### Examples + +```bash +# Full backup all health data for the dev1 instance and ws1 workspace +export MASBR_ACTION=backup +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=mas-dev1-ws1-manage +ansible-playbook ibm.mas_devops.br_health + +# Incremental backup all health data for the dev1 instance and ws1 workspace +export MASBR_ACTION=backup +export MASBR_BACKUP_TYPE=incr +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=mas-dev1-ws1-manage +ansible-playbook ibm.mas_devops.br_health + +# Restore all health data for the dev1 instance and ws1 workspace +export MASBR_ACTION=restore +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MASBR_RESTORE_FROM_VERSION=20240630132439 +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=mas-dev1-ws1-manage +ansible-playbook ibm.mas_devops.br_health +``` + + +Backup/Restore for Optimizer +------------------------------------------------------------------------------- +This playbook `ibm.mas_devops.br_optimizer` will backup the following components that Optimizer depends on in order: + +| Component | Ansible Role | Data included | +| --------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------ | +| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core and Optimizer | +| db2 | [db2](../roles/db2.md) | Db2 instance used by Manage | +| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | +| manage | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | Manage namespace resources
Persistent volume data, such as attachments | +| optimizer | [suite_backup_restore](../roles/suite_backup_restore.md) | Optimizer namespace resources | + +### Environment Variables + +- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. +- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. +- `DB2_INSTANCE_NAME` **Required**. This playbook will backup the the Db2 instance used by Manage, you need to set the correct Db2 instance name for this environment variable. +- `DB2_NAMESPACE`: **Optional** Set the DB2 namespace, defaults to `db2u` + +### Examples + +```bash +# Full backup all optimizer data for the dev1 instance and ws1 workspace +export MASBR_ACTION=backup +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=mas-dev1-ws1-manage +ansible-playbook ibm.mas_devops.br_optimizer + +# Incremental backup all optimizer data for the dev1 instance and ws1 workspace +export MASBR_ACTION=backup +export MASBR_BACKUP_TYPE=incr +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=mas-dev1-ws1-manage +ansible-playbook ibm.mas_devops.br_optimizer + +# Restore all optimizer data for the dev1 instance and ws1 workspace +export MASBR_ACTION=restore +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MASBR_RESTORE_FROM_VERSION=20240630132439 +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +export DB2_INSTANCE_NAME=mas-dev1-ws1-manage +ansible-playbook ibm.mas_devops.br_optimizer +``` + + +Backup/Restore for Visual Inspection +------------------------------------------------------------------------------- +This playbook `ibm.mas_devops.br_visualinspection` will backup the following components that Visual Inspection depends on in order: + +| Component | Ansible Role | Data included | +| ---------------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------- | +| mongodb | [mongodb](../roles/mongodb.md) | MongoDB databases used by MAS Core and Visual Inspection | +| core | [suite_backup_restore](../roles/suite_backup_restore.md) | MAS Core namespace resources | +| visualinspection | [suite_app_backup_restore](../roles/suite_app_backup_restore.md) | Visual Inspection namespace resources
Persistent volume data, such as images and models | + +### Environment Variables + +- `MAS_INSTANCE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS instance at a time. If you have multiple MAS instances in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. +- `MAS_WORKSPACE_ID` **Required**. This playbook only supports backing up components belong to a specific MAS workspace at a time. If you have multiple MAS workspaces in the cluster to be backed up, you need to run this playbook multiple times with different value of this environment variable. + +### Examples + +```bash +# Full backup all visual inspection data for the dev1 instance and ws1 workspace +export MASBR_ACTION=backup +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +ansible-playbook ibm.mas_devops.br_visualinspection + +# Incremental backup all visual inspection data for the dev1 instance and ws1 workspace +export MASBR_ACTION=backup +export MASBR_BACKUP_TYPE=incr +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +ansible-playbook ibm.mas_devops.br_visualinspection + +# Restore all visual inspection data for the dev1 instance and ws1 workspace +export MASBR_ACTION=restore +export MASBR_STORAGE_LOCAL_FOLDER=/tmp/backup +export MASBR_RESTORE_FROM_VERSION=20240630132439 +export MAS_INSTANCE_ID=dev +export MAS_WORKSPACE_ID=ws1 +ansible-playbook ibm.mas_devops.br_visualinspection +``` + + +Reference +------------------------------------------------------------------------------- +### Directory Structure +No matter what kind of storage systems you choose, the folder structure created in the storage system is same. + +Below is the sample folder structure for saving backup jobs: + +``` +/backups/mongodb-main-full-20240621122530 +├── backup.yml +├── database +│ ├── mongodb-main-full-20240621122530.tar.gz +│ └── query.json +└── log + ├── mongodb-main-full-20240621122530-backup-log.tar.gz + └── mongodb-main-full-20240621122530-ansible-log.tar.gz + +/backups/core-main-full-20240621122530 +├── backup.yml +├── log +│ ├── core-main-full-20240621122530-ansible-log.tar.gz +│ └── core-main-full-20240621122530-namespace-log.tar.gz +└── namespace + └── core-main-full-20240621122530-namespace.tar.gz +``` + +- ``: the root folder is specified by `MASBR_STORAGE_LOCAL_FOLDER` or `MASBR_STORAGE_CLOUD_BUCKET` +- The backup playbooks will create a seperated backup job folder under the `backups` folder for each component. The backup job folder is named by following this format: `{BACKUP COMPONENT}-{INSTANCE ID}-{BACKUP TYPE}-{BACKUP VERSION}`. +- When using playbook to backup multiple components at once, all backup job folders will be assigned to the same backup version. In above example, the same backup version `20240621122530` for backing up `mongodb` and `core` components. +- `backup.yml`: keep the backup job information +- `database`: data type for database. This folder save the backup files of MongoDB database, Db2 database. +- `namespace`: data type for namespace resources. This folder save the exported namespace resources. +- `pv`: data type for persistent volume. This folder save the persistent volume data, e.g. the Manage attachments, VI images and models. +- `log`: this folder save all job running log files + +In addition to the backup jobs, we also save restore jobs in the specified storage location. 
For example: + +``` +/restores/mongodb-main-incr-20240622040201-20240622075501 +├── log +│ ├── mongodb-main-incr-20240622040201-20240622075501-ansible-log.tar.gz +│ └── mongodb-main-incr-20240622040201-20240622075501-restore-log.tar.gz +└── restore.yml + +/restores/core-main-incr-20240622040201-20240622075501 +├── log +│ ├── core-main-incr-20240622040201-20240622075501-ansible-log.tar.gz +│ └── core-main-incr-20240622040201-20240622075501-namespace-log.tar.gz +└── restore.yml +``` + +The restore playbooks will create a seperated restore job folder under the `restores` folder for each component. The restore job folder is named by following this format: `{BACKUP JOB NAME}-{RESTORE VERSION}`. + +- `restore.yml`: keep the restore job information +- `log`: this folder save all job running log files + +### Data Model +#### backup.yml +```yaml +kind: Backup +name: "core-main-incr-20240622040201" +version: "20240622040201" +type: "incr" +from: "core-main-full-20240621122530" +source: + domain: "source-cluster.mydomain.com" + suite: "8.11.11" + instance: "main" + workspace: "" +component: + name: "core" + instance: "main" + namespace: "mas-main-core" +data: + - seq: "1" + type: "namespace" + phase: "Completed" +status: + phase: "Completed" + startTimestamp: "2024-06-22T04:05:22" + completionTimestamp: "2024-06-22T04:06:04" + sentNotifications: + - type: "Slack" + channel: "#ansible-slack-dev" + timestamp: "2024-06-22T04:05:34" + phase: "InProgress" + - type: "Slack" + channel: "#ansible-slack-dev" + timestamp: "2024-06-22T04:06:10" + phase: "Completed" +``` + +#### restore.yml +```yaml +kind: Restore +name: "core-main-incr-20240622040201-20240622075501" +version: "20240622075501" +from: "core-main-incr-20240622040201" +target: + domain: "target-cluster.mydomain.com" +component: + name: "core" + instance: "main" + namespace: "mas-main-core" +data: + - seq: 1 + type: "namespace" + phase: "Completed" +status: + phase: "Completed" + startTimestamp: "2024-06-22T08:04:19" + completionTimestamp: "2024-06-22T08:04:33" +``` diff --git a/ibm/mas_devops/common_tasks/backup_restore/after_run_tasks.yml b/ibm/mas_devops/common_tasks/backup_restore/after_run_tasks.yml deleted file mode 100644 index f8ae4c50da..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/after_run_tasks.yml +++ /dev/null @@ -1,54 +0,0 @@ ---- -# After backup/restore component -# ------------------------------------------------------------------------- -- name: "After {{ masbr_job_type }} {{ masbr_job_component.name }}" - when: _component_after_task_path is defined and _component_after_task_path | length > 0 - include_tasks: "{{ _component_after_task_path }}" - -# Copy Ansible log file to storage location -# ----------------------------------------------------------------------------- -- name: "Set fact: Ansible log path" - set_fact: - masbr_ansible_log_path: "{{ lookup('env', 'ANSIBLE_LOG_PATH') }}" - masbr_ansible_log_name: "{{ masbr_job_name }}-ansible" - -- name: "Copy Ansible log file to storage location" - when: - - masbr_ansible_log_path is defined - - masbr_ansible_log_path | length > 0 - block: - - name: "Debug: Ansbile log path" - debug: - msg: "Ansible log path .................. 
{{ masbr_ansible_log_path }}" - - - name: "Create a tar.gz archive of Ansible log file" - shell: >- - mkdir -p {{ masbr_local_job_folder }}/log && - cp -f {{ masbr_ansible_log_path }} {{ masbr_local_job_folder }}/log/{{ masbr_ansible_log_name }}.log && - tar -czf {{ masbr_local_job_folder }}/log/{{ masbr_ansible_log_name }}-log.tar.gz - -C {{ masbr_local_job_folder }}/log {{ masbr_ansible_log_name }}.log - - - name: "Copy Ansible log file from local to storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "{{ masbr_job_type }}" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "log/{{ masbr_ansible_log_name }}-log.tar.gz" - dest_folder: "log" - -# Delete local job folder -# ----------------------------------------------------------------------------- -- name: "Delete local job folder" - file: - path: "{{ masbr_local_job_folder }}" - state: absent - -# Display summary of the running task results -# ----------------------------------------------------------------------------- -- name: "Summary" - debug: - msg: - - "Job name ........................... {{ masbr_job_name }}" - - "Job status ......................... {{ masbr_job_status.phase }}" - - "Job storage location ............... {{ masbr_storage_job_folder_final | default(masbr_storage_job_folder, true) }}" diff --git a/ibm/mas_devops/common_tasks/backup_restore/before_run_tasks.yml b/ibm/mas_devops/common_tasks/backup_restore/before_run_tasks.yml deleted file mode 100644 index ab977485f4..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/before_run_tasks.yml +++ /dev/null @@ -1,33 +0,0 @@ ---- -# Scenario 1 - when running a playbook: -# 1. playbook include this task -# 2. roles include this task -# Scenario 2 - when running a role: -# 1. 
the role include this task - -# Check common variables -# ----------------------------------------------------------------------------- -- name: "Check common variables" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/check_common_vars.yml" - -# Confirm cluster information -# ----------------------------------------------------------------------------- -- name: "Confirm the currently connected cluster information" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/confirm_cluster_info.yml" - -# Check common backup/restore variables -# ----------------------------------------------------------------------------- -- name: "Check common {{ _job_type }} variables" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/check_{{ _job_type }}_vars.yml" - -# Before backup/restore component -# ------------------------------------------------------------------------- -- name: "Before {{ _job_type }} {{ masbr_job_component.name }}" - when: _component_before_task_path is defined and _component_before_task_path | length > 0 - include_tasks: "{{ _component_before_task_path }}" - -# Set a flag to indicate these tasks are included -# ----------------------------------------------------------------------------- -- name: "Set fact: already included these tasks" - set_fact: - masbr_included_before_run_tasks: true diff --git a/ibm/mas_devops/common_tasks/backup_restore/check_backup_vars.yml b/ibm/mas_devops/common_tasks/backup_restore/check_backup_vars.yml deleted file mode 100644 index e964a61079..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/check_backup_vars.yml +++ /dev/null @@ -1,196 +0,0 @@ ---- -# Set below common job facts: -# masbr_task_type: backup, restore -# masbr_job_type: backup, restore -# masbr_job_name, masbr_job_name_final -# -# Set below backup job facts: -# masbr_backup_from -# masbr_backup_from_yaml - -# Backup environment variables -# ----------------------------------------------------------------------------- -- name: "Set fact: backup environment variables" - set_fact: - # Supported backup types: 'full', 'incr', 'delta' (Not support 'delta' by now) - masbr_backup_type: "{{ lookup('env', 'MASBR_BACKUP_TYPE') | default('full', true) }}" - - # Data type string separated by commas: e.g.'namespace,pv' - masbr_backup_data: "{{ lookup('env', 'MASBR_BACKUP_DATA') | default('', true) }}" - - # The version of the backup to create incremental backup based on - # only used when masbr_backup_type='incr' - masbr_backup_from_version: "{{ lookup('env', 'MASBR_BACKUP_FROM_VERSION') | default('', true) }}" - -# Check 'masbr_job_component' -# ----------------------------------------------------------------------------- -- name: "Fail if masbr_job_component is not provided" - assert: - that: - - masbr_job_component is defined - - ('name' in masbr_job_component) - - ('namespace' in masbr_job_component) - fail_msg: - - "masbr_job_component.name is required" - - "masbr_job_component.namespace is required" - -# Check 'masbr_job_data_list' -# ----------------------------------------------------------------------------- -- name: "Set fact: init masbr_job_data_list" - set_fact: - masbr_job_data_list: "{{ masbr_job_data_list | default([], true) }}" - -- name: "Set fact: specified backup data" - when: - - masbr_backup_data is defined - - masbr_backup_data | length > 0 - - (_ignore_masbr_backup_data is not defined) or (_ignore_masbr_backup_data is defined and not _ignore_masbr_backup_data) - block: - - name: "Set fact: reset masbr_job_data_specified" 
- set_fact: - masbr_job_data_specified: [] - - - name: "Get specified backup data" - set_fact: - masbr_job_data_specified: "{{ masbr_job_data_specified + [{ 'seq': (idx+1)|int, 'type': item|trim }] }}" - loop: "{{ masbr_backup_data | split(',') }}" - loop_control: - index_var: idx - - - name: "Set fact: override the default masbr_job_data_list" - set_fact: - masbr_job_data_list: "{{ masbr_job_data_specified }}" - -- name: "Set fact: set default phase to each backup data" - when: masbr_job_data_list is defined and masbr_job_data_list | length > 0 - block: - - name: "Set fact: reset masbr_job_data_init" - set_fact: - masbr_job_data_init: [] - - - name: "Set fact: set default phase to each backup data" - set_fact: - masbr_job_data_init: "{{ masbr_job_data_init + [ item | combine({ 'phase': 'New' }) ] }}" - loop: "{{ masbr_job_data_list }}" - - - name: "Set fact: backup data with default phase" - set_fact: - masbr_job_data_list: "{{ masbr_job_data_init }}" - -# Set 'masbr_task_type' -# ----------------------------------------------------------------------------- -- name: "Set fact: backup job variables" - set_fact: - masbr_task_type: "backup" - masbr_job_type: "backup" - -- name: "Set fact: job name include instance" - when: masbr_job_component.instance is defined and masbr_job_component.instance | length > 0 - set_fact: - # Format '---' - # 'mongodb-main-incr-20240509130354' - # 'db2-mas-main-masdev-manage-full-20240509130354' - # 'manage-ivt90x-01-full-20240509130354' - masbr_job_name_prefix: >- - {{ masbr_job_component.name }}-{{ masbr_job_component.instance }} - -- name: "Set fact: job name without instance" - when: masbr_job_component.instance is undefined or masbr_job_component.instance | length == 0 - set_fact: - masbr_job_name_prefix: "{{ masbr_job_component.name }}" - -- name: "Set fact: backup job name" - set_fact: - masbr_job_name: "{{ masbr_job_name_prefix }}-{{ masbr_backup_type }}-{{ masbr_job_version }}" - -- name: "Set fact: final backup job name" - set_fact: - # At this point, set it as the same value of masbr_job_name - masbr_job_name_final: "{{ masbr_job_name }}" - -# Create local job folder -# ----------------------------------------------------------------------------- -- name: "Create local job folder" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/create_local_job_folder.yml" - -# Check incremental backup -# ----------------------------------------------------------------------------- -- name: "Checks for incremental backup" - when: masbr_backup_type is defined and masbr_backup_type == "incr" - block: - # when 'masbr_backup_from_version' is not specified: find the latest full backup - - name: "Get the latest Full backup job name when masbr_backup_from_version not provided" - when: masbr_backup_from_version is not defined or masbr_backup_from_version | length == 0 - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/list_storage_job_folders.yml" - vars: - masbr_ls_job_type: "backup" - masbr_ls_filter: "| grep -P '^{{ masbr_job_name_prefix }}-full-.*(? 0 - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/list_storage_job_folders.yml" - vars: - masbr_ls_job_type: "backup" - masbr_ls_filter: "| grep '^{{ masbr_job_name_prefix }}-full-{{ masbr_backup_from_version }}$' | sort -r | head -1" - - - name: "Fail if not found any previous Full backup job" - assert: - that: masbr_ls_results is defined and masbr_ls_results | length == 1 - fail_msg: "Not found any previous Full backup job, please take a Full backup first!" 
- - - name: "Set fact: backup from job name" - set_fact: - masbr_backup_from: "{{ masbr_ls_results[0] }}" - - # Get backup from job information - - name: "Get backup from job information" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_storage_files_to_local.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_backup_from }}" - masbr_cf_paths: - - src_file: "backup.yml" - dest_folder: "from" - - - name: "Set fact: backup from job information" - set_fact: - masbr_backup_from_yaml: "{{ lookup('file', masbr_local_job_folder + '/from/backup.yml') | from_yaml }}" - - - name: "Debug: backup from job information" - debug: - msg: "{{ masbr_backup_from_yaml }}" - - # The backup from should be Completed - - name: "Fail if the backup from is not Completed" - assert: - that: masbr_backup_from_yaml.status.phase == "Completed" - fail_msg: "The specified backup job is not Completed, please specify a Completed backup job." - - # The backup from job should has the same component and data as current job - - name: "Fail if the component name of backup from job is not same as current job" - assert: - that: masbr_backup_from_yaml.component.name == masbr_job_component.name - fail_msg: "The component name of backup from job is not same as current job" - - - name: "Set fact: data list difference" - set_fact: - masbr_job_data_list_differ: >- - {{ masbr_job_data_list | map(attribute='type') | - difference(masbr_backup_from_yaml.data | map(attribute='type')) }} - - - name: "Fail if the data list of backup from job does not cover current job" - assert: - that: masbr_job_data_list_differ | length == 0 - fail_msg: "The data list of backup from job does not cover current job: {{ masbr_job_data_list_differ }}" - -# Show backup job information -# ----------------------------------------------------------------------------- -- name: "Debug: backup job information" - debug: - msg: - - "Backup job name ....................... {{ masbr_job_name }}" - - "Backup type ........................... {{ masbr_backup_type }}" - - "Backup from ........................... {{ masbr_backup_from | default('', true) }}" - - "Backup component ...................... {{ masbr_job_component }}" - - "Backup data ........................... {{ masbr_job_data_list }}" diff --git a/ibm/mas_devops/common_tasks/backup_restore/check_common_vars.yml b/ibm/mas_devops/common_tasks/backup_restore/check_common_vars.yml deleted file mode 100644 index 1f3eb049a4..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/check_common_vars.yml +++ /dev/null @@ -1,41 +0,0 @@ ---- -# Load default variables -# ----------------------------------------------------------------------------- -- name: "Load common variables" - include_vars: "{{ role_path }}/../../common_vars/backup_restore.yml" - -- name: "Set fact: internal used common variables" - set_fact: - # ONLY FOR DEV - __masbr_dev_create_env_file: "{{ lookup('env', '__MASBR_DEV_CREATE_ENV_FILE') | default(false, true) | bool }}" - - # Temp folder in the Pod for backup/restore - masbr_pod_temp_folder: "/tmp/masbr" - - # Timestamp display format - masbr_timestamp_format: "%Y%m%d%H%M%S" - -- name: "Set fact: job lock file" - set_fact: - masbr_pod_lock_file: "{{masbr_pod_temp_folder}}/running.lock" - -# Get 'masbr_job_version' in below order: -# 1. get from input 'masbr_job_version' -# 2. if not set, get from env 'MASBR_JOB_VERSION' -# 3. For schedule, always create a new version. 
-- name: "Get job version from env" - when: masbr_job_version is not defined - set_fact: - masbr_job_version: "{{ lookup('env', 'MASBR_JOB_VERSION') | default(masbr_timestamp_format | strftime, true) }}" - -# Storage location -# ----------------------------------------------------------------------------- -- name: "Fail if masbr_storage_local_folder is not provided" - assert: - that: masbr_storage_local_folder is defined and masbr_storage_local_folder != "" - fail_msg: "MASBR_STORAGE_LOCAL_FOLDER is required" - -- name: "Debug: variables for local backup storage" - debug: - msg: - - "Local storage folder ............... {{ masbr_storage_local_folder }}" diff --git a/ibm/mas_devops/common_tasks/backup_restore/check_restore_vars.yml b/ibm/mas_devops/common_tasks/backup_restore/check_restore_vars.yml deleted file mode 100644 index 49c5e1d22c..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/check_restore_vars.yml +++ /dev/null @@ -1,208 +0,0 @@ ---- -# Set below common job facts: -# masbr_task_type, masbr_job_type -# masbr_job_name, masbr_job_name_final -# -# Set below restore job facts: -# masbr_restore_from -# masbr_restore_basedon -# masbr_restore_from_yaml -# masbr_restore_from_incr: true|false -# masbr_restore_to_diff_domain: true|false -# masbr_restore_to_diff_instance: true|false - -# Restore environment variables -# ----------------------------------------------------------------------------- -- name: "Set fact: restore environment variables" - set_fact: - # The information of the backup to be restored from - # - masbr_restore_from_version: "{{ lookup('env', 'MASBR_RESTORE_FROM_VERSION') }}" - masbr_restore_overwrite: "{{ lookup('env', 'MASBR_RESTORE_OVERWRITE') }}" - - # Data type string separated by commas: e.g.'namespace,pv' - masbr_restore_data: "{{ lookup('env', 'MASBR_RESTORE_DATA') | default('', true) }}" - - # Also will restore the based on full backup when trying to restore from an incremental backup - # (Not used by now) - masbr_restore_include_basedon: "{{ lookup('env', 'MASBR_RESTORE_INCLUDE_BASEDON') | default(true, true) }}" - -- name: "Fail if masbr_restore_from_version is not provided" - assert: - that: masbr_restore_from_version is defined and masbr_restore_from_version != "" - fail_msg: "MASBR_RESTORE_FROM_VERSION is required for running restore job" - -- name: "Fail if masbr_restore_overwrite is not provided" - assert: - that: masbr_restore_overwrite is defined and masbr_restore_overwrite != "" - fail_msg: "MASBR_RESTORE_OVERWRITE is required for running restore job" - -# Check 'masbr_job_component' -# ----------------------------------------------------------------------------- -- name: "Fail if masbr_job_component is not provided" - assert: - that: - - masbr_job_component is defined - - ('name' in masbr_job_component) - - ('namespace' in masbr_job_component) - fail_msg: - - "masbr_job_component.name is required" - - "masbr_job_component.namespace is required" - -# Check 'masbr_job_data_list' -# ----------------------------------------------------------------------------- -- name: "Set fact: init masbr_job_data_list" - set_fact: - masbr_job_data_list: "{{ masbr_job_data_list | default([], true) }}" - -- name: "Set fact: specified restore data" - when: masbr_restore_data is defined and masbr_restore_data | length > 0 - block: - - name: "Set fact: reset masbr_job_data_specified" - set_fact: - masbr_job_data_specified: [] - - - name: "Get specified restore data" - set_fact: - masbr_job_data_specified: "{{ masbr_job_data_specified + [{ 'seq': (idx+1)|int, 
'type': item|trim }] }}" - loop: "{{ masbr_restore_data | split(',') }}" - loop_control: - index_var: idx - - - name: "Set fact: override the default masbr_job_data_list" - set_fact: - masbr_job_data_list: "{{ masbr_job_data_specified }}" - -- name: "Set fact: set default phase to each restore data" - when: masbr_job_data_list is defined and masbr_job_data_list | length > 0 - block: - - name: "Set fact: reset masbr_job_data_init" - set_fact: - masbr_job_data_init: [] - - - name: "Set fact: set default phase to each restore data" - set_fact: - masbr_job_data_init: "{{ masbr_job_data_init + [ item | combine({ 'phase': 'New' }) ] }}" - loop: "{{ masbr_job_data_list }}" - - - name: "Set fact: restore data with default phase" - set_fact: - masbr_job_data_list: "{{ masbr_job_data_init }}" - -# Find restore-from job name -# ----------------------------------------------------------------------------- -- name: "Find the restore-from job name" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/list_storage_job_folders.yml" - vars: - masbr_ls_job_type: "backup" - masbr_ls_filter: "| grep '^{{ masbr_job_component.name }}-.*-{{ masbr_restore_from_version }}$'" - -- name: "Fail if not found the restore-from job name" - assert: - that: masbr_ls_results is defined and masbr_ls_results | length == 1 - fail_msg: "Not found the job name specified by MASBR_RESTORE_FROM_VERSION" - -- name: "Set fact: restore-from job name" - set_fact: - masbr_restore_from: "{{ masbr_ls_results[0] }}" - -# Set restore job variables -# ----------------------------------------------------------------------------- -- name: "Set fact: restore job variables" - set_fact: - masbr_task_type: "restore" - masbr_job_type: "restore" - -- name: "Set fact: restore job name" - set_fact: - masbr_job_name: "{{ masbr_restore_from }}-{{ masbr_job_version }}" - -- name: "Set fact: final restore job name" - set_fact: - # At this point, set it as the same value of masbr_job_name - masbr_job_name_final: "{{ masbr_job_name }}" - -# Create local job folder -# ----------------------------------------------------------------------------- -- name: "Create local job folder" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/create_local_job_folder.yml" - -# Get restore-from job information -# ----------------------------------------------------------------------------- -- name: "Get restore-from job information" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_storage_files_to_local.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_restore_from }}" - masbr_cf_paths: - - src_file: "backup.yml" - dest_folder: "from" - -- name: "Set fact: restore-from job yaml" - set_fact: - masbr_restore_from_yaml: "{{ lookup('file', masbr_local_job_folder + '/from/backup.yml') | from_yaml }}" - -- name: "Debug: restore-from job yaml" - debug: - msg: "{{ masbr_restore_from_yaml }}" - -# The restore-from job should be Completed -- name: "Fail if the restore-from job is not Completed" - assert: - that: masbr_restore_from_yaml.status.phase == "Completed" - fail_msg: "The specified backup job is not Completed, please specify a Completed Full backup job." 
- -- name: "Set fact: restore-from job variables" - set_fact: - # Whether restore from an incremental backup - masbr_restore_from_incr: "{{ true if masbr_restore_from_yaml.type == 'incr' else false }}" - - # Whether restore to different domain (Disaster Recovery) - masbr_restore_to_diff_domain: >- - {{ true if masbr_restore_from_yaml.source.domain != masbr_cluster_domain else false }} - - # Whether restore to different mas instance (Data Migration) - masbr_restore_to_diff_instance: >- - {{ true if (masbr_job_component.instance is defined and masbr_job_component.instance | length > 0 and - masbr_restore_from_yaml.source.instance != masbr_job_component.instance) else false }} - -# Trying to restore from an incremental backup, also need to check the existance of the based on full backup -# ----------------------------------------------------------------------------- -- name: "Check the existence of the based on full backup job" - when: masbr_restore_from_incr - block: - - name: "Fail if not found 'from' specified in this incremental backup job information" - assert: - that: masbr_restore_from_yaml.from is defined and masbr_restore_from_yaml.from | length > 0 - fail_msg: "Not found 'from' specified in this incremental backup job information" - - - name: "Set fact: the based on full backup job name" - set_fact: - masbr_restore_basedon: "{{ masbr_restore_from_yaml.from }}" - - - name: "Check the existence of the based on full backup job" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/list_storage_job_folders.yml" - vars: - masbr_ls_job_type: "backup" - masbr_ls_filter: "| grep {{ masbr_restore_basedon }}" - - - name: "Fail if not found the based on full backup job" - assert: - that: masbr_ls_results is defined and masbr_ls_results | length == 1 - fail_msg: >- - Not found the based on full backup job folder: - {{ masbr_storage_job_type_folder }}/{{ masbr_restore_basedon }} - -# Show restore job information -# ----------------------------------------------------------------------------- -- name: "Debug: restore job information" - debug: - msg: - - "Restore job name ....................... {{ masbr_job_name }}" - - "Restore from ........................... {{ masbr_restore_from }}" - - "Restore overwrite existing data ........ {{ masbr_restore_overwrite }}" - - "Restore component ...................... {{ masbr_job_component }}" - - "Restore data ........................... {{ masbr_job_data_list }}" - - "Restore from incremental backup ........ {{ masbr_restore_from_incr }}" - - "Restore to different domain ............ {{ masbr_restore_to_diff_domain }}" - - "Restore to different instance .......... {{ masbr_restore_to_diff_instance }}" diff --git a/ibm/mas_devops/common_tasks/backup_restore/confirm_cluster_info.yml b/ibm/mas_devops/common_tasks/backup_restore/confirm_cluster_info.yml deleted file mode 100644 index 81ced28145..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/confirm_cluster_info.yml +++ /dev/null @@ -1,32 +0,0 @@ ---- -# Get cluster domain -# ----------------------------------------------------------------------------- -- name: "Get cluster domain" - kubernetes.core.k8s_info: - api_version: config.openshift.io/v1 - kind: DNS - name: cluster - register: _cluster_dns - -- name: "Set fact: cluster domain" - set_fact: - masbr_cluster_domain: "{{ _cluster_dns.resources[0].spec.baseDomain }}" - -- name: "Debug: cluster domain" - debug: - msg: "Cluster domain ........................ 
{{ masbr_cluster_domain }}" - -# Confirm the cluster information -# ----------------------------------------------------------------------------- -- name: "Confirm the connected cluster information" - when: masbr_confirm_cluster | bool - block: - - name: "Get user input" - pause: - prompt: "\nCurrently connected to cluster:\n {{ masbr_cluster_domain }}\nProceed on this cluster? [yes/no]" - register: _confirm_cluster_info - - - name: "Cancel task" - when: not _confirm_cluster_info.user_input | bool - fail: - msg: "User chooses to cancel task" diff --git a/ibm/mas_devops/common_tasks/backup_restore/copy_local_files_to_storage.yml b/ibm/mas_devops/common_tasks/backup_restore/copy_local_files_to_storage.yml deleted file mode 100644 index 058043a510..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/copy_local_files_to_storage.yml +++ /dev/null @@ -1,24 +0,0 @@ ---- -# Copy local job files to local storage -# ----------------------------------------------------------------------------- -- name: "Set fact: local storage job folder" - set_fact: - masbr_storage_job_folder: >- - {{ masbr_storage_local_folder }}/{{ masbr_cf_job_type }}s/{{ masbr_cf_job_name }} - -- name: "Debug: local storage job folder" - debug: - msg: "Local storage job folder .......... {{ masbr_storage_job_folder }}" - -- name: "Copy local job files to local storage job folder" - shell: >- - mkdir -p {{ [masbr_storage_job_folder, item.dest_folder] | path_join }} && - cp -rf {{ [masbr_local_job_folder, item.src_file] | path_join }} - {{ [masbr_storage_job_folder, item.dest_folder] | path_join }} && - ls -lA {{ [masbr_storage_job_folder, item.dest_folder] | path_join }} - loop: "{{ masbr_cf_paths }}" - register: _local_copy_output - -- name: "Debug: copy local job files to local storage job folder" - debug: - msg: "{{ _local_copy_output | json_query('results[*].stdout_lines') }}" diff --git a/ibm/mas_devops/common_tasks/backup_restore/copy_pod_files_to_storage.yml b/ibm/mas_devops/common_tasks/backup_restore/copy_pod_files_to_storage.yml deleted file mode 100644 index 1d22b4d351..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/copy_pod_files_to_storage.yml +++ /dev/null @@ -1,53 +0,0 @@ ---- -# Copy files from pod to local storage -# ----------------------------------------------------------------------------- -- name: "Set fact: local storage job folder" - set_fact: - masbr_storage_job_folder: >- - {{ masbr_storage_local_folder }}/{{ masbr_cf_job_type }}s/{{ masbr_cf_job_name }} - -- name: "Debug: All PV variables" - debug: - msg: - - "Pod Name ................... {{ masbr_cf_pod_name }}" - - "Local storage job folder ......... {{ masbr_storage_job_folder }}" - - "Folder setup .................... {{ masbr_cf_paths }}" - - "Source Folder .................... {{ item.src_folder | default('null', true) }}" - - "Destination Folder ............... {{ [masbr_storage_job_folder, item.dest_folder] | path_join }}" - loop: "{{ masbr_cf_paths }}" - -# Condition 1. 
src_folder -> dest_folder: copy src_folder/* to dest_folder/* -- name: "Copy files from pod folder to local storage folder" - when: - - item.src_folder is defined and item.src_folder | length > 0 - - item.dest_folder is defined and item.dest_folder | length > 0 - shell: >- - mkdir -p {{ [masbr_storage_job_folder, item.dest_folder] | path_join }} && - oc cp --retries=50 -c {{ masbr_cf_container_name }} - {{ masbr_cf_namespace }}/{{ masbr_cf_pod_name }}:{{ item.src_folder }} - {{ [masbr_storage_job_folder, item.dest_folder] | path_join }} - loop: "{{ masbr_cf_paths }}" - -# Condition 2. src_file -> dest_folder: copy src_file to dest_folder/src_file -- name: "Copy file from pod to local storage folder" - when: - - item.src_file is defined and item.src_file | length > 0 - - item.dest_folder is defined and item.dest_folder | length > 0 - shell: >- - mkdir -p {{ [masbr_storage_job_folder, item.dest_folder] | path_join }} && - oc cp --retries=50 -c {{ masbr_cf_container_name }} - {{ masbr_cf_namespace }}/{{ masbr_cf_pod_name }}:{{ item.src_file }} - {{ [masbr_storage_job_folder, item.dest_folder, item.src_file|basename] | path_join }} - loop: "{{ masbr_cf_paths }}" - -# Condition 3. src_file -> dest_file -- name: "Copy file from pod to local storage file" - when: - - item.src_file is defined and item.src_file | length > 0 - - item.dest_file is defined and item.dest_file | length > 0 - shell: >- - mkdir -p {{ [masbr_storage_job_folder, item.dest_file|dirname] | path_join }} && - oc cp --retries=50 -c {{ masbr_cf_container_name }} - {{ masbr_cf_namespace }}/{{ masbr_cf_pod_name }}:{{ [item.src_folder, item.src_file] | path_join }} - {{ [masbr_storage_job_folder, item.dest_file] | path_join }} - loop: "{{ masbr_cf_paths }}" diff --git a/ibm/mas_devops/common_tasks/backup_restore/copy_storage_files_to_local.yml b/ibm/mas_devops/common_tasks/backup_restore/copy_storage_files_to_local.yml deleted file mode 100644 index 363e4516ca..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/copy_storage_files_to_local.yml +++ /dev/null @@ -1,24 +0,0 @@ ---- -# Copy job files from local storage to local job folder -# ----------------------------------------------------------------------------- -- name: "Set fact: local storage job folder" - set_fact: - masbr_storage_job_folder: >- - {{ masbr_storage_local_folder }}/{{ masbr_cf_job_type }}s/{{ masbr_cf_job_name }} - -- name: "Debug: local storage job folder" - debug: - msg: "Local storage job folder .......... 
{{ masbr_storage_job_folder }}" - -- name: "Copy job files from local storage to local job folder" - shell: >- - mkdir -p {{ [masbr_local_job_folder, item.dest_folder] | path_join }} && - cp -rf {{ [masbr_storage_job_folder, item.src_file] | path_join }} - {{ [masbr_local_job_folder, item.dest_folder] | path_join }} && - ls -lA {{ [masbr_local_job_folder, item.dest_folder] | path_join }} - loop: "{{ masbr_cf_paths }}" - register: _local_copy_output - -- name: "Debug: copy job files from local storage to local job folder" - debug: - msg: "{{ _local_copy_output | json_query('results[*].stdout_lines') }}" diff --git a/ibm/mas_devops/common_tasks/backup_restore/copy_storage_files_to_pod.yml b/ibm/mas_devops/common_tasks/backup_restore/copy_storage_files_to_pod.yml deleted file mode 100644 index c9ff13ed7d..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/copy_storage_files_to_pod.yml +++ /dev/null @@ -1,76 +0,0 @@ ---- -# Copy files from local storage to pod -# ----------------------------------------------------------------------------- - -# Local storage job folder -- name: "Set fact: local storage job folder" - set_fact: - masbr_storage_job_folder: >- - {{ masbr_storage_local_folder }}/{{ masbr_cf_job_type }}s/{{ masbr_cf_job_name }} - -- name: "Debug: All PV variables" - debug: - msg: - - "Local storage job folder ......... {{ masbr_storage_job_folder }}" - - "Overwrite existing data .......... {{ masbr_restore_overwrite }}" - - "Folder setup .................... {{ masbr_cf_paths }}" - - "Source Folder .................... {{ [masbr_storage_job_folder, item.src_folder | default(item.src_file, true)] | path_join }}" - - "Destination Folder ............... {{ item.dest_folder }}" - loop: "{{ masbr_cf_paths }}" - -- name: "Erase all existing data found in destination folders" - when: - - item.src_folder is defined and item.src_folder | length > 0 - - item.dest_folder is defined and item.dest_folder | length > 0 - - masbr_restore_overwrite is defined and masbr_restore_overwrite - shell: >- - oc exec {{ masbr_cf_pod_name }} -c {{ masbr_cf_container_name }} -n {{ masbr_cf_namespace }} -- bash -c 'rm -rf {{ item.dest_folder }}/*' - loop: "{{ masbr_cf_paths }}" - -- name: "Detect if there is any data in destination folders" - when: - - item.src_folder is defined and item.src_folder | length > 0 - - item.dest_folder is defined and item.dest_folder | length > 0 - - masbr_restore_overwrite is defined and masbr_restore_overwrite == False - shell: >- - oc exec {{ masbr_cf_pod_name }} -c {{ masbr_cf_container_name }} -n {{ masbr_cf_namespace }} -- bash -c '[ "$(ls -A {{ item.dest_folder }})" ] && { echo "{{ item.dest_folder }} is not empty!" && exit 1; } || echo "{{ item.dest_folder }} is empty!";' - loop: "{{ masbr_cf_paths }}" - - -# Condition 1. 
src_folder -> dest_folder: copy src_folder/* to dest_folder/* -# -# - exec into masbr_cf_pod_name/masbr_cf_container_name, create temp folder -# - cp from src_folder to temp folder inside masbr_cf_pod_name/masbr_cf_container_name -# - exec into masbr_cf_pod_name/masbr_cf_container_name, move item.temp_dest_folder to dest_folder and delete temp_dest_folder -- name: "Copy files from local storage folder to pod folder" - when: - - item.src_folder is defined and item.src_folder | length > 0 - - item.dest_folder is defined and item.dest_folder | length > 0 - shell: >- - oc exec {{ masbr_cf_pod_name }} -c {{ masbr_cf_container_name }} -n {{ masbr_cf_namespace }} -- bash -c 'mkdir -p {{ [item.dest_folder, masbr_job_version] | path_join }}' \ - && oc cp --retries=50 -c {{ masbr_cf_container_name }} {{ [masbr_storage_job_folder, item.src_folder] | path_join }}/. {{ masbr_cf_namespace }}/{{ masbr_cf_pod_name }}:{{ [item.dest_folder, masbr_job_version] | path_join }} \ - && oc exec {{ masbr_cf_pod_name }} -c {{ masbr_cf_container_name }} -n {{ masbr_cf_namespace }} -- bash -c 'mv -f {{ [item.dest_folder, masbr_job_version] | path_join }}/* {{ item.dest_folder }} && rm -rf {{ [item.dest_folder, masbr_job_version] | path_join }}' - loop: "{{ masbr_cf_paths }}" - -# Condition 2. src_file -> dest_folder: copy src_file to dest_folder/src_file -- name: "Copy file from local storage folder to pod folder" - when: - - item.src_file is defined and item.src_file | length > 0 - - item.dest_folder is defined and item.dest_folder | length > 0 - shell: >- - oc exec {{ masbr_cf_pod_name }} -c {{ masbr_cf_container_name }} -n {{ masbr_cf_namespace }} -- bash -c 'mkdir -p {{ item.dest_folder }}' \ - && oc cp --retries=50 -c {{ masbr_cf_container_name }} {{ [masbr_storage_job_folder, item.src_file] | path_join }} {{ masbr_cf_namespace }}/{{ masbr_cf_pod_name }}:{{ item.dest_folder }} - loop: "{{ masbr_cf_paths }}" - -# Condition 3. 
src_file -> dest_file -# - exec into masbr_cf_pod_name/masbr_cf_container_name, create temp folder -# - cp from src_folder to temp folder inside masbr_cf_pod_name/masbr_cf_container_name -# - exec into masbr_cf_pod_name/masbr_cf_container_name, move temp_dest_folder to item.dest_folder and delete temp_dest_folder -- name: "Copy file from local storage folder to pod file" - when: - - item.src_file is defined and item.src_file | length > 0 - - item.dest_folder is defined and item.dest_folder | length > 0 - shell: >- - oc exec {{ masbr_cf_pod_name }} -c {{ masbr_cf_container_name }} -n {{ masbr_cf_namespace }} -- bash -c 'mkdir -p {{ [item.dest_folder, masbr_job_version] | path_join }}' \ - && oc cp --retries=50 -c {{ masbr_cf_container_name }} {{ [temp_src_folder, item.src_file] | path_join }} {{ masbr_cf_namespace }}/{{ masbr_cf_pod_name }}:{{ [item.dest_folder, masbr_job_version] | path_join }} \ - && oc exec {{ masbr_cf_pod_name }} -c {{ masbr_cf_container_name }} -n {{ masbr_cf_namespace }} -- bash -c 'mv -f {{ [item.dest_folder, masbr_job_version] | path_join }}/{{ item.src_file|basename }} {{ item.dest_folder }} && rm -rf {{ [item.dest_folder, masbr_job_version] | path_join }}' diff --git a/ibm/mas_devops/common_tasks/backup_restore/create_cleanup_job.yml b/ibm/mas_devops/common_tasks/backup_restore/create_cleanup_job.yml deleted file mode 100644 index eacd6e9153..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/create_cleanup_job.yml +++ /dev/null @@ -1,63 +0,0 @@ ---- -# Check if cleanup Job exists -# ----------------------------------------------------------------------------- -- name: "Get cleanup Job" - kubernetes.core.k8s_info: - api_version: batch/v1 - kind: CronJob - name: masbr-cleanup - namespace: "{{ masbr_cleanup_namespace }}" - register: _cleanup_job_info - -- name: "Set fact: cleanup Job exists" - when: - - _cleanup_job_info is defined - - _cleanup_job_info.resources is defined - - _cleanup_job_info.resources | length == 1 - set_fact: - masbr_cleanup_job_exists: true - -# Create script configmap if cleanup Job not exists -# ----------------------------------------------------------------------------- -- name: "Create script configmap if cleanup Job not exists" - when: masbr_cleanup_job_exists is not defined - block: - - name: "Create cleanup script" - template: - src: "{{ role_path }}/../../common_tasks/templates/backup_restore/cleanup_job.sh.j2" - dest: "{{ masbr_local_job_folder }}/cleanup_job.sh" - - - name: "Get cleanup script content" - shell: > - cat {{ masbr_local_job_folder }}/cleanup_job.sh - register: _cleanup_sh_content - - - name: "Create configmap to save cleanup script" - kubernetes.core.k8s: - definition: - apiVersion: v1 - kind: ConfigMap - metadata: - name: masbr-cleanup - namespace: "{{ masbr_cleanup_namespace }}" - data: - script: "{{ _cleanup_sh_content.stdout }}" - wait: true - -# Create or update cleanup Job -# ----------------------------------------------------------------------------- -- name: "Set fact: cleanup Job variables" - set_fact: - masbr_cleanup_env: - - name: "MASBR_CLEANUP_TTL_SEC" - value: "{{ masbr_cleanup_ttl_sec }}" - masbr_cleanup_cmds: >- - oc get cm/masbr-cleanup -n {{ masbr_cleanup_namespace }} -o yaml | yq '.data.script' > /tmp/masbr-cleanup.sh && - chmod +x /tmp/masbr-cleanup.sh && - /tmp/masbr-cleanup.sh - -- name: "Create or update cleanup Job" - kubernetes.core.k8s: - apply: true - template: "{{ role_path }}/../../common_tasks/templates/backup_restore/cleanup_job.yml.j2" - state: present diff --git 
a/ibm/mas_devops/common_tasks/backup_restore/create_local_job_folder.yml b/ibm/mas_devops/common_tasks/backup_restore/create_local_job_folder.yml deleted file mode 100644 index 56a57ad033..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/create_local_job_folder.yml +++ /dev/null @@ -1,21 +0,0 @@ ---- -# Create local job folder if not exists -# ----------------------------------------------------------------------------- -- name: "Set fact: local job folder" - set_fact: - masbr_local_job_folder: "{{ masbr_local_temp_folder }}/{{ masbr_job_name }}" - -- name: "Debug: local job folder" - debug: - msg: "Local job folder ...................... {{ masbr_local_job_folder }}" - -- name: "Check if local job folder exists" - stat: - path: "{{ masbr_local_job_folder }}" - register: _file_stat_output - -- name: "Create local job folder if not exists" - when: not _file_stat_output.stat.exists - file: - path: "{{ masbr_local_job_folder }}" - state: directory diff --git a/ibm/mas_devops/common_tasks/backup_restore/delete_storage_job_folder.yml b/ibm/mas_devops/common_tasks/backup_restore/delete_storage_job_folder.yml deleted file mode 100644 index 60d6a71fa0..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/delete_storage_job_folder.yml +++ /dev/null @@ -1,15 +0,0 @@ ---- -# Delete the job folder from local storage -# ----------------------------------------------------------------------------- -- name: "Set fact: local storage job folder" - set_fact: - masbr_storage_job_folder: "{{ masbr_storage_local_folder }}/{{ masbr_job_type }}s/{{ masbr_job_name }}" - -- name: "Debug: local storage job folder" - debug: - msg: "Local storage job folder .......... {{ masbr_storage_job_folder }}" - -- name: "Delete the job folder from local storage" - command: rm -rf {{ masbr_storage_job_folder }} - args: - removes: "{{ masbr_storage_job_folder }}" diff --git a/ibm/mas_devops/common_tasks/backup_restore/list_storage_job_folders.yml b/ibm/mas_devops/common_tasks/backup_restore/list_storage_job_folders.yml deleted file mode 100644 index 7115813af6..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/list_storage_job_folders.yml +++ /dev/null @@ -1,27 +0,0 @@ ---- -# List job folders in local storage -# ----------------------------------------------------------------------------- -- name: "Set fact: local storage job type folder" - set_fact: - masbr_storage_job_type_folder: "{{ masbr_storage_local_folder }}/{{ masbr_ls_job_type }}s" - -- name: "Debug: list job folders variables" - debug: - msg: - - "Search folder ...................... {{ masbr_storage_job_type_folder }}" - - "Search filter ...................... {{ masbr_ls_filter | default('', true) }}" - -- name: "List job folders in local storage" - changed_when: false - shell: >- - ls {{ masbr_storage_job_type_folder }} {{ masbr_ls_filter | default('') }}; - exit 0 - register: _ls_output - -- name: "Set fact: results of list job folders" - set_fact: - masbr_ls_results: "{{ _ls_output.stdout_lines }}" - -- name: "Debug: results of list job folders" - debug: - msg: "Results of list job folders ....... 
{{ masbr_ls_results }}" diff --git a/ibm/mas_devops/common_tasks/backup_restore/rename_storage_job_folder.yml b/ibm/mas_devops/common_tasks/backup_restore/rename_storage_job_folder.yml deleted file mode 100644 index fe869342e1..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/rename_storage_job_folder.yml +++ /dev/null @@ -1,23 +0,0 @@ ---- -- name: "Set fact: final job folder name" - set_fact: - masbr_job_name_final: "{{ masbr_job_name }}-{{ masbr_job_status.phase }}" - -# Rename the job folder in local storage -# ----------------------------------------------------------------------------- -- name: "Set fact: local storage job folder" - set_fact: - masbr_storage_job_folder: "{{ masbr_storage_local_folder }}/{{ masbr_job_type }}s/{{ masbr_job_name }}" - masbr_storage_job_folder_final: "{{ masbr_storage_local_folder }}/{{ masbr_job_type }}s/{{ masbr_job_name_final }}" - -- name: "Debug: rename local storage job folder" - debug: - msg: - - "Source job folder .................. {{ masbr_storage_job_folder }}" - - "Dest job folder .................... {{ masbr_storage_job_folder_final }}" - -- name: "Rename the job folder in local storage" - command: mv {{ masbr_storage_job_folder }} {{ masbr_storage_job_folder_final }} - args: - removes: "{{ masbr_storage_job_folder }}" - creates: "{{ masbr_storage_job_folder_final }}" diff --git a/ibm/mas_devops/common_tasks/backup_restore/restart_and_reconsiled.yml b/ibm/mas_devops/common_tasks/backup_restore/restart_and_reconsiled.yml deleted file mode 100644 index ff022cdd72..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/restart_and_reconsiled.yml +++ /dev/null @@ -1,23 +0,0 @@ ---- -- name: "Restart {{ _pod_keywords }} pod" - shell: > - oc get pods -n {{ _pod_namespace }} --no-headers=true | grep "{{ _pod_keywords }}" | awk '{print $1}' - | xargs oc delete pod -n {{ _pod_namespace }} - -- name: "Wait for {{ _pod_keywords }} pod to be ready (10s delay)" - shell: > - oc get pods -n {{ _pod_namespace }} --no-headers=true | grep "{{ _pod_keywords }}" - | grep -Evi "1/1|2/2|3/3|4/4|5/5|6/6|7/7|8/8|9/9|complete" | wc -l - register: _is_not_ready - until: _is_not_ready.stdout|int == 0 - retries: 30 - delay: 10 - -- name: "Wait for {{ _pod_keywords }} pod to be reconsiled (10s delay)" - shell: > - oc get pods -n {{ _pod_namespace }} --no-headers=true | grep "{{ _pod_keywords }}" | awk '{print $1}' - | xargs oc logs -c {{ _container_name }} -n {{ _pod_namespace }} | grep "ok=" | wc -l - register: _is_reconsiled - until: _is_reconsiled.stdout|int == 1 - retries: 60 - delay: 10 diff --git a/ibm/mas_devops/common_tasks/backup_restore/update_job_status.yml b/ibm/mas_devops/common_tasks/backup_restore/update_job_status.yml deleted file mode 100644 index 98f751b30f..0000000000 --- a/ibm/mas_devops/common_tasks/backup_restore/update_job_status.yml +++ /dev/null @@ -1,156 +0,0 @@ ---- -# Update job variables -# ----------------------------------------------------------------------------- -# Update 'masbr_job_component' -- name: "Update fact: masbr_job_component" - when: _job_component is defined - set_fact: - masbr_job_component: "{{ masbr_job_component | combine(_job_component) }}" - -# Update 'masbr_job_data_list' -- name: "Update fact: masbr_job_data_list" - when: - - _job_data_list is defined - - _job_data_list | length > 0 - block: - - name: "Update fact: masbr_job_data_list" - ansible.utils.update_fact: - updates: - - path: masbr_job_data_list[{{ item.seq|int -1 }}].phase - value: "{{ item.phase }}" - loop: "{{ _job_data_list }}" - 
register: _job_data_list_updated - - - name: "Set fact: masbr_job_data_list" - set_fact: - masbr_job_data_list: "{{ _job_data_list_updated.results[-1].masbr_job_data_list }}" - -# Get specified 'masbr_job_status.phase' -- name: "Get specified masbr_job_status.phase" - when: - - _job_status is defined - - _job_status.phase is defined - set_fact: - _job_status_: - phase: "{{ _job_status.phase }}" - -# Determine 'masbr_job_status.phase' based on 'masbr_job_data_list' -- name: "Determine masbr_job_status.phase based on masbr_job_data_list" - when: _job_status is not defined - block: - - name: "Get unique phases of all job data types" - set_fact: - masbr_job_data_phases: "{{ masbr_job_data_list | map(attribute='phase') | unique }}" - - - name: "Debug: unique phases of all job data types" - debug: - msg: "Job data phases ................... {{ masbr_job_data_phases }}" - - # masbr_job_data_phases: ['New'] - - name: "Set fact: masbr_job_status_phase ('New')" - when: - - ("New" in masbr_job_data_phases) - - masbr_job_data_phases | length == 1 - set_fact: - _job_status_: - phase: "New" - - # masbr_job_data_phases: ['InProgress'], ['InProgress', 'New'], ['Completed', 'InProgress', 'New'] - - name: "Set fact: masbr_job_status_phase ('InProgress')" - when: - - ("InProgress" in masbr_job_data_phases) - - ("Failed" not in masbr_job_data_phases) - set_fact: - _job_status_: - phase: "InProgress" - - # masbr_job_data_phases: ['Completed'] - - name: "Set fact: masbr_job_status_phase ('Completed')" - when: - - ("Completed" in masbr_job_data_phases) - - masbr_job_data_phases | length == 1 - set_fact: - _job_status_: - phase: "Completed" - - # masbr_job_data_phases: ['Failed', 'InProgress', 'New'] - - name: "Set fact: masbr_job_status_phase ('PartiallyFailed')" - when: - - ("Failed" in masbr_job_data_phases) - - masbr_job_data_phases | length > 1 - set_fact: - _job_status_: - phase: "PartiallyFailed" - - # masbr_job_data_phases: ['Failed'] - - name: "Set fact: masbr_job_status_phase ('Failed')" - when: - - ("Failed" in masbr_job_data_phases) - - masbr_job_data_phases | length == 1 - set_fact: - _job_status_: - phase: "Failed" - -# When Job status is "New" -- name: "Update fact: masbr_job_status.phase ('New')" - when: - - _job_status_ is defined - - _job_status_.phase is defined - - _job_status_.phase == "New" - set_fact: - masbr_job_status: - phase: "New" - startTimestamp: "{{ '%Y-%m-%dT%H:%M:%S' | strftime }}" - -# When Job status is "InProgress" -- name: "Update fact: masbr_job_status.phase ('InProgress')" - when: - - _job_status_ is defined - - _job_status_.phase is defined - - _job_status_.phase == "InProgress" - set_fact: - masbr_job_status: "{{ masbr_job_status | combine(_job_status_) }}" - -# When Job status is "Completed", "Failed" or "PartiallyFailed" -- name: "Update fact: masbr_job_status.phase ('Completed', 'Failed', 'PartiallyFailed')" - when: - - _job_status_ is defined - - _job_status_.phase is defined - - _job_status_.phase in ['Completed', 'Failed', 'PartiallyFailed'] - set_fact: - masbr_job_status: >- - {{ masbr_job_status | combine({ - 'phase': _job_status_.phase, - 'completionTimestamp': '%Y-%m-%dT%H:%M:%S' | strftime - }) }} - -# Create job file -# ----------------------------------------------------------------------------- -- name: "Debug: update job variables" - debug: - msg: - - "masbr_job_component .................... {{ masbr_job_component }}" - - "masbr_job_data_list .................... {{ masbr_job_data_list }}" - - "masbr_job_status ....................... 
{{ masbr_job_status }}" - -- name: "Create updated job file" - template: - src: "{{ role_path }}/../../common_tasks/templates/backup_restore/{{ masbr_job_type }}.yml.j2" - dest: "{{ masbr_local_job_folder }}/{{ masbr_job_type }}.yml" - -# Copy local job files to specified storage location -# ----------------------------------------------------------------------------- -- name: "Copy local job files to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "{{ masbr_job_type }}" - masbr_cf_job_name: "{{ masbr_job_name }}" - masbr_cf_paths: - - src_file: "{{ masbr_job_type }}.yml" - dest_folder: "" - -# Append job final status to the job folder name -# ----------------------------------------------------------------------------- -- name: "Append job final status to the job folder name" - when: masbr_job_status.phase in ['PartiallyFailed', 'Failed'] - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/rename_storage_job_folder.yml" diff --git a/ibm/mas_devops/common_tasks/templates/backup_restore/backup-namespace-resources.sh.j2 b/ibm/mas_devops/common_tasks/templates/backup_restore/backup-namespace-resources.sh.j2 deleted file mode 100644 index 89bf2fd25a..0000000000 --- a/ibm/mas_devops/common_tasks/templates/backup_restore/backup-namespace-resources.sh.j2 +++ /dev/null @@ -1,72 +0,0 @@ -#!/bin/bash - -BACKUP_FOLDER="{{ masbr_ns_backup_folder }}" -LOG_FILE="{{ masbr_ns_backup_log }}" - - -function backupSingleResource { - local RESOURCE_NAMESPACE=$1 - local RESOURCE_KIND=$2 - local RESOURCE_NAME=$3 - - echo "Backing up $RESOURCE_KIND/$RESOURCE_NAME in the $RESOURCE_NAMESPACE namespace..." | tee -a $LOG_FILE - - if [[ "$(oc get $RESOURCE_KIND/$RESOURCE_NAME -n $RESOURCE_NAMESPACE --ignore-not-found=true --no-headers=true | wc -l)" == "0" ]]; then - echo "Not found $RESOURCE_KIND/$RESOURCE_NAME in the $RESOURCE_NAMESPACE namespace!" | tee -a $LOG_FILE - - else - local resourceYaml=$(oc get $RESOURCE_KIND/$RESOURCE_NAME -n $RESOURCE_NAMESPACE -o yaml) - hasCredentials=$(echo "$resourceYaml" | yq '.spec.config.credentials | has("secretName")') - - if [ "$hasCredentials" == "true" ]; then - credentialsName=`(echo "$resourceYaml" | yq .spec.config.credentials.secretName)` - echo "The credentials $credentialsName will be backed up for the resource $RESOURCE_NAME" | tee -a $LOG_FILE - backupSingleResource $RESOURCE_NAMESPACE Secret $credentialsName - fi - - echo "$resourceYaml" | yq 'del(.metadata.creationTimestamp, .metadata.ownerReferences, .metadata.generation, .metadata.resourceVersion, .metadata.uid, .metadata.annotations["kubectl.kubernetes.io/last-applied-configuration"], .status)' > $BACKUP_FOLDER/$RESOURCE_KIND-$RESOURCE_NAME.yaml | tee -a $LOG_FILE - echo "Saved $RESOURCE_KIND/$RESOURCE_NAME to $BACKUP_FOLDER/$RESOURCE_KIND-$RESOURCE_NAME.yaml" | tee -a $LOG_FILE - fi -} - -function backupResources { - local RESOURCE_NAMESPACE=$1 - local RESOURCE_KIND=$2 - local RESOURCE_KEYWORDS=$3 - - if [[ $RESOURCE_KEYWORDS == "" ]]; then - echo "Backing up all $RESOURCE_KIND resources in the $RESOURCE_NAMESPACE namespace..." | tee -a $LOG_FILE - resourceNames=($(oc get $RESOURCE_KIND -n $RESOURCE_NAMESPACE --no-headers=true | awk '{print $1}')) - else - echo "Backing up $RESOURCE_KIND resources containing keywords \"$RESOURCE_KEYWORDS\" in the $RESOURCE_NAMESPACE namespace..." 
| tee -a $LOG_FILE - resourceNames=($(oc get $RESOURCE_KIND -n $RESOURCE_NAMESPACE --no-headers=true | awk '{print $1}' | grep "$RESOURCE_KEYWORDS")) - fi - - for resourceName in ${resourceNames[@]}; do - backupSingleResource $RESOURCE_NAMESPACE $RESOURCE_KIND $resourceName - done -} - -if command -v yq &> /dev/null; then - echo "yq is installed, continuing" -else - echo "yq not found! please install yq before performing a backup" - exit 1 -fi - - -{% if masbr_ns_backup_resources is defined and masbr_ns_backup_resources | length > 0 %} -{% for ns_resources in masbr_ns_backup_resources %} - -{% for resource in ns_resources.resources %} - -{% if resource.name is not defined or resource.name == "*" %} -backupResources "{{ ns_resources.namespace }}" "{{ resource.kind }}" "{{ resource.keywords|default('') }}" -{% else %} -backupSingleResource "{{ ns_resources.namespace }}" "{{ resource.kind }}" "{{ resource.name }}" -{% endif %} - -{% endfor %} # ns_resources.resources - -{% endfor %} # masbr_ns_backup_resources -{% endif %} \ No newline at end of file diff --git a/ibm/mas_devops/common_tasks/templates/backup_restore/backup.yml.j2 b/ibm/mas_devops/common_tasks/templates/backup_restore/backup.yml.j2 deleted file mode 100644 index 03c2148b85..0000000000 --- a/ibm/mas_devops/common_tasks/templates/backup_restore/backup.yml.j2 +++ /dev/null @@ -1,45 +0,0 @@ ---- -kind: Backup -name: "{{ masbr_job_name }}" -version: "{{ masbr_job_version }}" -type: "{{ masbr_backup_type }}" -{% if masbr_backup_from is defined and masbr_backup_from | length > 0 %} -from: "{{ masbr_backup_from }}" -{% endif %} -source: - domain: "{{ masbr_cluster_domain }}" - suite: "{{ mas_core_version | default('', true) }}" - instance: "{{ mas_instance_id | default('', true) }}" - workspace: "{{ mas_workspace_id | default('', true) }}" -{% if masbr_job_component is defined and masbr_job_component.items() %} -component: -{% for key, value in masbr_job_component.items() %} - {{ key }}: "{{ value }}" -{% endfor %} -{% endif %} -{% if masbr_job_data_list is defined and masbr_job_data_list | length > 0 %} -data: -{% for job_data in masbr_job_data_list %} - - seq: "{{ job_data.seq }}" - type: "{{ job_data.type }}" - phase: "{{ job_data.phase | default('New', true) }}" -{% endfor %} -{% endif %} -{% if masbr_backup_schedule is defined and masbr_backup_schedule | length > 0 %} -schedule: "{{ masbr_backup_schedule }}" -{% endif %} -status: - phase: "{{ masbr_job_status.phase | default('New', true) }}" - startTimestamp: "{{ masbr_job_status.startTimestamp | default('', true) }}" - completionTimestamp: "{{ masbr_job_status.completionTimestamp | default('', true) }}" -{% if masbr_job_status is defined - and masbr_job_status.sentNotifications is defined - and masbr_job_status.sentNotifications | length > 0 %} - sentNotifications: -{% for notification in masbr_job_status.sentNotifications %} - - type: "{{ notification.type }}" - channel: "{{ notification.channel }}" - timestamp: "{{ notification.timestamp }}" - phase: "{{ notification.phase }}" -{% endfor %} -{% endif %} diff --git a/ibm/mas_devops/common_tasks/templates/backup_restore/cleanup_job.sh.j2 b/ibm/mas_devops/common_tasks/templates/backup_restore/cleanup_job.sh.j2 deleted file mode 100644 index e07040d5be..0000000000 --- a/ibm/mas_devops/common_tasks/templates/backup_restore/cleanup_job.sh.j2 +++ /dev/null @@ -1,64 +0,0 @@ -namespace={{ masbr_cleanup_namespace }} -ttl=${MASBR_CLEANUP_TTL_SEC} - -current_time=$(date +%Y-%m-%dT%H:%M:%SZ) -current_ts=$(date +%s) - -echo "Start running 
cleanup job" -echo "Current time: ${current_time}" -echo "Current ts: ${current_ts}" -echo "TTL: ${ttl}s" - - -# Cleanup Jobs -job_names=($(oc get job -n ${namespace} --ignore-not-found=true --no-headers=true -l 'masbr-type in (backup,restore,schedule,copy)' | awk '{print $1}')) - -for job_name in ${job_names[@]}; do - echo "" - echo "Checking Job [ ${job_name} ] ..." - job_yaml=$(oc get job/${job_name} -n ${namespace} -o yaml) - job_complete=$(echo "${job_yaml}" | yq '.status.conditions.[] | select(.type == "Complete") | .status') - - if [[ "${job_complete}" == "True" ]]; then - job_complete_time=$(echo "${job_yaml}" | yq '.status.completionTime') - job_complete_ts=$(date +%s -d "${job_complete_time}") - echo "Job completion time: ${job_complete_time} (${job_complete_ts})" - - delta=$((current_ts - job_complete_ts)) - if [ ${delta} -gt ${ttl} ]; then - echo "Job exceed TTL (+$((delta - ttl))s)" - oc delete job/${job_name} -n ${namespace} - else - echo "Job not exceed TTL (-$((ttl - delta))s)" - fi - else - echo "Job not complete" - fi -done - - -# Cleanup ConfigMaps -cm_names=($(oc get cm -n ${namespace} --ignore-not-found=true --no-headers=true -l 'masbr-type in (backup,restore,schedule,copy)' | awk '{print $1}')) -for cm_name in ${cm_names[@]}; do - echo "" - echo "Checking ConfigMap [ ${cm_name} ] ..." - cm_yaml=$(oc get cm/${cm_name} -n ${namespace} -o yaml) - masbr_type=$(echo "${cm_yaml}" | yq '.metadata.labels.masbr-type') - - if [[ "${masbr_type}" == "schedule" ]]; then - if [[ "$(oc get cronjob -n ${namespace} --ignore-not-found=true --no-headers=true | grep ${cm_name} | wc -l)" == "0" ]]; then - echo "Not found related CronJob" - oc delete cm/${cm_name} -n ${namespace} - else - echo "Found related CronJob" - fi - - else - if [[ "$(oc get job -n ${namespace} --ignore-not-found=true --no-headers=true | grep ${cm_name} | wc -l)" == "0" ]]; then - echo "Not found related Job" - oc delete cm/${cm_name} -n ${namespace} - else - echo "Found related Job" - fi - fi -done diff --git a/ibm/mas_devops/common_tasks/templates/backup_restore/cleanup_job.yml.j2 b/ibm/mas_devops/common_tasks/templates/backup_restore/cleanup_job.yml.j2 deleted file mode 100644 index 144a9efb2c..0000000000 --- a/ibm/mas_devops/common_tasks/templates/backup_restore/cleanup_job.yml.j2 +++ /dev/null @@ -1,83 +0,0 @@ ---- -kind: ServiceAccount -apiVersion: v1 -metadata: - name: "masbr-sa" - namespace: "{{ masbr_cleanup_namespace }}" - labels: - mas.ibm.com/masbr: "" - ---- -kind: ClusterRoleBinding -apiVersion: rbac.authorization.k8s.io/v1 -metadata: - name: "masbr-{{ masbr_cleanup_namespace }}" - labels: - mas.ibm.com/masbr: "" -subjects: - - kind: ServiceAccount - name: "masbr-sa" - namespace: "{{ masbr_cleanup_namespace }}" -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: ClusterRole - name: cluster-admin - ---- -kind: NetworkPolicy -apiVersion: networking.k8s.io/v1 -metadata: - name: "masbr-network-policy" - namespace: "{{ masbr_cleanup_namespace }}" - labels: - mas.ibm.com/masbr: "" -spec: - podSelector: - matchLabels: - mas.ibm.com/masbr: "" - egress: - - {} - policyTypes: - - Egress - ---- -kind: CronJob -apiVersion: batch/v1 -metadata: - name: "masbr-cleanup" - namespace: "{{ masbr_cleanup_namespace }}" - labels: - mas.ibm.com/masbr: "" - masbr-type: "cleanup" -spec: - schedule: "{{ masbr_cleanup_schedule }}" -{% if masbr_job_timezone is defined and masbr_job_timezone | length > 0 %} - timeZone: "{{ masbr_job_timezone }}" -{% endif %} - successfulJobsHistoryLimit: 1 - jobTemplate: - spec: - 
backoffLimit: 1 - template: - metadata: - name: "masbr-cleanup" - labels: - mas.ibm.com/masbr: "" - masbr-type: "cleanup" - spec: - serviceAccountName: "masbr-sa" - containers: - - name: main - image: quay.io/ibmmas/cli:{{ masbr_mascli_image_tag }} -{% if masbr_mascli_image_pull_policy is defined and masbr_mascli_image_pull_policy | length > 0 %} - imagePullPolicy: "{{ masbr_mascli_image_pull_policy }}" -{% endif %} - command: - - sh - - '-c' - - >- - {{ masbr_cleanup_cmds }} -{% if masbr_cleanup_env is defined and masbr_cleanup_env | length > 0 %} - env: {{ masbr_cleanup_env }} -{% endif %} - restartPolicy: Never diff --git a/ibm/mas_devops/common_tasks/templates/backup_restore/copy_cloud_files_job.yml.j2 b/ibm/mas_devops/common_tasks/templates/backup_restore/copy_cloud_files_job.yml.j2 deleted file mode 100644 index d72c96c0d2..0000000000 --- a/ibm/mas_devops/common_tasks/templates/backup_restore/copy_cloud_files_job.yml.j2 +++ /dev/null @@ -1,95 +0,0 @@ ---- -kind: NetworkPolicy -apiVersion: networking.k8s.io/v1 -metadata: - name: "masbr-network-policy" - namespace: "{{ masbr_cf_namespace }}" - labels: - mas.ibm.com/masbr: "" -spec: - podSelector: - matchLabels: - mas.ibm.com/masbr: "" - egress: - - {} - policyTypes: - - Egress - ---- -apiVersion: batch/v1 -kind: Job -metadata: - name: "{{ masbr_cf_k8s_name }}" - namespace: "{{ masbr_cf_namespace }}" - labels: - mas.ibm.com/masbr: "" - masbr-type: "copy" - masbr-job: "{{ masbr_job_name }}" -spec: - backoffLimit: 1 - template: - metadata: - name: "{{ masbr_cf_k8s_name }}" - labels: - mas.ibm.com/masbr: "" - masbr-type: "copy" - masbr-job: "{{ masbr_job_name }}" - spec: -{% if masbr_cf_affinity is defined and masbr_cf_affinity %} - affinity: - podAffinity: - requiredDuringSchedulingIgnoredDuringExecution: - - labelSelector: - matchExpressions: - - key: statefulset.kubernetes.io/pod-name - operator: In - values: - - "{{ masbr_cf_pod_name }}" - topologyKey: kubernetes.io/hostname -{% endif %} -{% if masbr_cf_service_account_name is defined and masbr_cf_service_account_name | length > 0 %} - serviceAccountName: "{{ masbr_cf_service_account_name }}" -{% endif %} -{% if masbr_cf_service_account is defined and masbr_cf_service_account | length > 0 %} - serviceAccount: "{{ masbr_cf_service_account }}" -{% endif %} -{% if masbr_cf_pod_security_context is defined and masbr_cf_pod_security_context | length > 0 %} - securityContext: {{ masbr_cf_pod_security_context }} -{% endif %} - containers: - - name: main - image: quay.io/ibmmas/cli:{{ masbr_mascli_image_tag }} -{% if masbr_mascli_image_pull_policy is defined and masbr_mascli_image_pull_policy | length > 0 %} - imagePullPolicy: "{{ masbr_mascli_image_pull_policy }}" -{% endif %} - command: - - sh - - '-c' - - >- - {{ masbr_cf_cmds }} -{% if masbr_cf_env is defined and masbr_cf_env | length > 0 %} - env: {{ masbr_cf_env }} -{% endif %} - volumeMounts: - - name: tmp - mountPath: /tmp - - name: data-volume - mountPath: "{{ masbr_cf_pvc_mount_path }}" -{% if masbr_cf_pvc_sub_path is defined and masbr_cf_pvc_sub_path | length > 0 %} - subPath: "{{ masbr_cf_pvc_sub_path }}" -{% endif %} - - name: cm-volume - mountPath: /mnt/configmap -{% if masbr_cf_container_security_context is defined and masbr_cf_container_security_context | length > 0 %} - securityContext: {{ masbr_cf_container_security_context }} -{% endif %} - restartPolicy: Never - volumes: - - name: tmp - emptyDir: {} - - name: data-volume - persistentVolumeClaim: - claimName: "{{ masbr_cf_pvc_name }}" - - name: cm-volume - configMap: - name: 
"{{ masbr_cf_k8s_name }}" diff --git a/ibm/mas_devops/common_tasks/templates/backup_restore/restore.yml.j2 b/ibm/mas_devops/common_tasks/templates/backup_restore/restore.yml.j2 deleted file mode 100644 index 8ad544b404..0000000000 --- a/ibm/mas_devops/common_tasks/templates/backup_restore/restore.yml.j2 +++ /dev/null @@ -1,36 +0,0 @@ ---- -kind: Restore -name: "{{ masbr_job_name }}" -version: "{{ masbr_job_version }}" -from: "{{ masbr_restore_from }}" -target: - domain: "{{ masbr_cluster_domain }}" -{% if masbr_job_component is defined and masbr_job_component.items() %} -component: -{% for key, value in masbr_job_component.items() %} - {{ key }}: "{{ value }}" -{% endfor %} -{% endif %} -{% if masbr_job_data_list is defined and masbr_job_data_list | length > 0 %} -data: -{% for job_data in masbr_job_data_list %} - - seq: {{ job_data.seq }} - type: "{{ job_data.type }}" - phase: "{{ job_data.phase | default('New', true) }}" -{% endfor %} -{% endif %} -status: - phase: "{{ masbr_job_status.phase | default('New', true) }}" - startTimestamp: "{{ masbr_job_status.startTimestamp | default('', true) }}" - completionTimestamp: "{{ masbr_job_status.completionTimestamp | default('', true) }}" -{% if masbr_job_status is defined - and masbr_job_status.sentNotifications is defined - and masbr_job_status.sentNotifications | length > 0 %} - sentNotifications: -{% for notification in masbr_job_status.sentNotifications %} - - type: "{{ notification.type }}" - channel: "{{ notification.channel }}" - timestamp: "{{ notification.timestamp }}" - phase: "{{ notification.phase }}" -{% endfor %} -{% endif %} diff --git a/ibm/mas_devops/common_vars/backup_restore.yml b/ibm/mas_devops/common_vars/backup_restore.yml deleted file mode 100644 index 10c54bb15f..0000000000 --- a/ibm/mas_devops/common_vars/backup_restore.yml +++ /dev/null @@ -1,53 +0,0 @@ ---- -# Job management -# ----------------------------------------------------------------------------- -# Whether to confirm the currently connected cluster before run tasks -masbr_confirm_cluster: "{{ lookup('env', 'MASBR_CONFIRM_CLUSTER') | default(false, true) | bool }}" - -# Copy file timeout in seconds (default timeout is 12 hours: 3600 * 12) -masbr_copy_timeout_sec: "{{ lookup('env', 'MASBR_COPY_TIMEOUT_SEC') | default(43200, true) | int }}" - -# Whether to allow multiple backup/restore jobs to run simultaneously -masbr_allow_multi_jobs: "{{ lookup('env', 'MASBR_ALLOW_MULTI_JOBS') | default(true, true) | bool }}" - -# Cron expression of cleanup Job (default to run at 1:00 every day) -# https://en.wikipedia.org/wiki/Cron -masbr_cleanup_schedule: "{{ lookup('env', 'MASBR_CLEANUP_SCHEDULE') | default('0 1 * * *', true) }}" - -# The completed Jobs that exceed this time-to-live in seconds will be deleted (default ttl is 1 week: 3600 * 24 * 7) -masbr_cleanup_ttl_sec: "{{ lookup('env', 'MASBR_CLEANUP_TTL_SEC') | default('604800', true) }}" - -# Time zone of CronJob -# https://en.wikipedia.org/wiki/List_of_tz_database_time_zones -masbr_job_timezone: "{{ lookup('env', 'MASBR_JOB_TIMEZONE') | default('', true) }}" - -# Docker image tag -# ----------------------------------------------------------------------------- -masbr_mascli_image_tag: "{{ lookup('env', 'MASBR_MASCLI_IMAGE_TAG') | default('latest', true) }}" -masbr_mascli_image_pull_policy: "{{ lookup('env', 'MASBR_MASCLI_IMAGE_PULL_POLICY') | default('', true) }}" - -# Storage variables -# ----------------------------------------------------------------------------- -# Local temp folder for backup/restore 
-masbr_local_temp_folder: "{{ lookup('env', 'MASBR_LOCAL_TEMP_FOLDER') | default('/tmp/masbr', true) }}" -masbr_storage_local_folder: "{{ lookup('env', 'MASBR_STORAGE_LOCAL_FOLDER') }}" - -# Notification variables -# ----------------------------------------------------------------------------- -# Supported notification levels: -# - 'verbose': send notifications when job in all phases 'InProgress', 'Completed', 'Failed', 'PartiallyFailed' -# - 'info': send job final results 'Completed', 'Failed', 'PartiallyFailed' -# - 'failure': sent notifications only when job in the phase 'Failed', 'PartiallyFailed' -masbr_notification_levels: - verbose: - - "InProgress" - - "Completed" - - "Failed" - - "PartiallyFailed" - info: - - "Completed" - - "Failed" - - "PartiallyFailed" - failure: - - "Failed" - - "PartiallyFailed" diff --git a/ibm/mas_devops/meta/runtime.yml b/ibm/mas_devops/meta/runtime.yml index a32908d084..4500be7bed 100644 --- a/ibm/mas_devops/meta/runtime.yml +++ b/ibm/mas_devops/meta/runtime.yml @@ -11,6 +11,10 @@ action_groups: - verify_core_version - verify_subscriptions - verify_workloads + - verify_storage_class + - get_mongoce_info + - verify_backup_restore_vars + - verify_mongoce_version - wait_for_app_ready - wait_for_conditions - update_global_pull_secret diff --git a/ibm/mas_devops/playbooks/br_core.yml b/ibm/mas_devops/playbooks/br_core.yml deleted file mode 100644 index 1039ab6e15..0000000000 --- a/ibm/mas_devops/playbooks/br_core.yml +++ /dev/null @@ -1,76 +0,0 @@ -- name: "Backup/Restore MAS Core" - hosts: localhost - any_errors_fatal: true - - vars: - # Define the target for backup/restore - mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" - - # Define what action to perform - masbr_action: "{{ lookup('env', 'MASBR_ACTION') }}" - - # Define what to backup/restore - masbr_job_component: - name: "core" - instance: "{{ mas_instance_id }}" - namespace: "mas-{{ mas_instance_id }}-core" - - # Configure path to backup_restore tasks - role_path: "{{ [playbook_dir, '../common_tasks/backup_restore'] | path_join }}" - - pre_tasks: - # Display the notice that this is still a work in progress - # ------------------------------------------------------------------------- - - name: Important Notice - debug: - msg: | - ********************************************************************* - ************************* IMPORTANT NOTICE ************************** - ********************************************************************* - * * - * The backup and restore playbooks in this collection are still * - * work in progress, they are not suitable for production use at * - * this time. 
* - * * - * You may track development progress using the Backup & Restore * - * label in the GitHub repository: * - * * - * https://ibm.biz/BdGnfb * - * * - * Production-ready backup and restore options are detailed in the * - * Backup and restore topic in the product documentation: * - * * - * https://ibm.biz/BdGnf3 * - * * - ********************************************************************* - - # Check for required environment variables - # ------------------------------------------------------------------------- - - name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "MAS_INSTANCE_ID is required" - - - name: "Fail if masbr_action is not set to backup|restore" - assert: - that: masbr_action in ["backup", "restore"] - fail_msg: "MASBR_ACTION is required and must be set to 'backup' or 'restore'" - - # Common checks before run tasks - # ------------------------------------------------------------------------- - - name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "{{ masbr_action }}" - - tasks: - - name: "MongoDB: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.mongodb - vars: - mongodb_action: "{{ masbr_action }}" - mas_app_id: "core" - - - name: "MAS Core namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_backup_restore diff --git a/ibm/mas_devops/playbooks/br_db2.yml b/ibm/mas_devops/playbooks/br_db2.yml index d285c59175..5ac82d18e0 100644 --- a/ibm/mas_devops/playbooks/br_db2.yml +++ b/ibm/mas_devops/playbooks/br_db2.yml @@ -1,28 +1,30 @@ -- name: "Backup/Restore Db2 for MAS" - hosts: localhost +--- +- hosts: localhost any_errors_fatal: true - vars: # Define the target for backup/restore mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" + mas_application_id: "{{ lookup('env', 'MAS_APP_ID') }}" + mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" # eg: /tmp/mas_backups + db2_instance_name: "{{ lookup('env', 'DB2_INSTANCE_NAME') }}" db2_namespace: "{{ lookup('env', 'DB2_NAMESPACE') | default('db2u', true) }}" - db2_instance_id: "{{ lookup('env', 'DB2_INSTANCE_NAME') }}" - - # Define what action to perform - masbr_action: "{{ lookup('env', 'MASBR_ACTION') }}" - - # Define what to backup/restore - masbr_job_component: - name: "db2" - instance: "{{ db2_instance_id }}" - namespace: "{{ db2_namespace }}" - - # Configure path to backup_restore tasks - role_path: "{{ [playbook_dir, '../common_tasks/backup_restore'] | path_join }}" + db2_action: "{{ lookup('env', 'DB2_ACTION') }}" # backup or backup-database or restore or restore-database + db2_backup_version: "{{ lookup('env', 'DB2_BACKUP_VERSION') }}" # Required for restore action + backup_type: "{{ lookup('env', 'DB2_BACKUP_TYPE') | default('online', true) }}" # Supported values are online, offline and required for backup + backup_vendor: "{{ lookup('env', 'BACKUP_VENDOR') | default('disk', true) | lower }}" + backup_s3_alias: "{{ lookup('env', 'BACKUP_S3_ALIAS') | default('S3DB2COS', true) }}" # Required for backup_vendor=s3 + backup_s3_endpoint: "{{ lookup('env', 'BACKUP_S3_ENDPOINT') }}" # Required for backup_vendor=s3 + backup_s3_bucket: "{{ lookup('env', 'BACKUP_S3_BUCKET') }}" # Required for backup_vendor=s3 + backup_s3_access_key: "{{ lookup('env', 'BACKUP_S3_ACCESS_KEY') }}" # Required for backup_vendor=s3 + backup_s3_secret_key: "{{ lookup('env', 'BACKUP_S3_SECRET_KEY') }}" # Required for backup_vendor=s3 + # Set 
OVERRIDE_STORAGECLASS to true to override the storage class names in backup
+ override_storageclass: "{{ lookup('env', 'OVERRIDE_STORAGECLASS') | default(false, true) }}"
+ # Set OVERRIDE_STORAGECLASS to true and use the below storage classes.
+ # when OVERRIDE_STORAGECLASS is true and below classes are not set, then the cluster's default storage classes will be used.
+ custom_storage_class_rwo: "{{ lookup('env', 'CUSTOM_STORAGE_CLASS_RWO') }}"
+ custom_storage_class_rwx: "{{ lookup('env', 'CUSTOM_STORAGE_CLASS_RWX') }}"
pre_tasks:
- # Display the notice that this is still a work in progress
- # -------------------------------------------------------------------------
- name: Important Notice
debug:
msg: |
@@ -30,51 +32,59 @@
************************* IMPORTANT NOTICE **************************
*********************************************************************
* *
- * The backup and restore playbooks in this collection are still *
- * work in progress, they are not suitable for production use at *
- * this time. *
+ * These playbooks are samples to demonstrate how to use the roles *
+ * in this collection. *
* *
- * You may track development progress using the Backup & Restore *
- * label in the GitHub repository: *
+ * They are NOT INTENDED FOR PRODUCTION USE as-is, they are a *
+ * starting point for power users to aid in the development of *
+ * their own Ansible playbooks using the roles in this collection *
* *
- * https://ibm.biz/BdGnfb *
+ * The recommended way to install MAS is to use the MAS CLI, which *
+ * uses this Ansible Collection to deliver a complete managed *
+ * lifecycle for your MAS instance. *
* *
- * Production-ready backup and restore options are detailed in the *
- * Backup and restore topic in the product documentation: *
- * *
- * https://ibm.biz/BdGnf3 *
+ * https://ibm-mas.github.io/cli/ *
* *
*********************************************************************
- # Check for required environment variables
- # -------------------------------------------------------------------------
- - name: "Fail if mas_instance_id is not provided"
+ - name: "Fail if DB2_ACTION is not set to backup|restore"
assert:
- that: mas_instance_id is defined and mas_instance_id != ""
- fail_msg: "MAS_INSTANCE_ID is required"
+ that: db2_action in ["backup", "backup-database", "restore-database", "restore"]
+ fail_msg: "DB2_ACTION is required and must be set to 'backup' or 'backup-database' or 'restore-database' or 'restore'"
- - name: "Fail if db2_instance_id is not provided"
+ - name: "Fail if BACKUP_VENDOR is not set to s3|disk"
assert:
- that: db2_instance_id is defined and db2_instance_id != ""
- fail_msg: "DB2_INSTANCE_NAME is required"
+ that: backup_vendor in ["s3", "disk"]
+ fail_msg: "BACKUP_VENDOR is required and must be set to 's3' or 'disk'"
- - name: "Fail if masbr_action is not set to backup|restore"
+ - name: "Fail if DB2_BACKUP_TYPE is not set to online|offline"
assert:
- that: masbr_action in ["backup", "restore"]
- fail_msg: "MASBR_ACTION is required and must be set to 'backup' or 'restore'"
+ that: backup_type in ["online", "offline"]
+ fail_msg: "DB2_BACKUP_TYPE is required and must be set to 'online' or 'offline'"
+
+ - name: "Fail if mas_instance_id is not set"
+ ansible.builtin.assert:
+ that:
+ - mas_instance_id is defined
+ fail_msg: "mas_instance_id is required and must be set"
+
+ - name: "Fail if mas_application_id is not set"
+ ansible.builtin.assert:
+ that:
+ - mas_application_id is defined
+ fail_msg: "mas_application_id is required and must be
set" - # Common checks before run tasks - # ------------------------------------------------------------------------- - - name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "{{ masbr_action }}" + - name: "Fail if S3 variables are not set when backup_vendor is s3" + ibm.mas_devops.verify_backup_restore_vars: + backup_vendor: "{{ backup_vendor }}" + backup_s3_alias: "{{ backup_s3_alias }}" + backup_s3_endpoint: "{{ backup_s3_endpoint }}" + backup_s3_bucket: "{{ backup_s3_bucket }}" + backup_s3_access_key: "{{ backup_s3_access_key }}" + backup_s3_secret_key: "{{ backup_s3_secret_key }}" + action: "s3_setup" + component: "db2" + when: backup_vendor == "s3" - tasks: - # Run backup/restore tasks locally - # ------------------------------------------------------------------------- - - name: "Db2 {{ masbr_action }}" - include_role: - name: ibm.mas_devops.db2 - vars: - db2_action: "{{ masbr_action }}" + roles: + - role: ibm.mas_devops.db2 diff --git a/ibm/mas_devops/playbooks/br_health.yml b/ibm/mas_devops/playbooks/br_health.yml deleted file mode 100644 index e82b22fe2b..0000000000 --- a/ibm/mas_devops/playbooks/br_health.yml +++ /dev/null @@ -1,107 +0,0 @@ -- name: "Backup/Restore Maximo Health" - hosts: localhost - any_errors_fatal: true - - vars: - # Define the target for backup/restore - mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" - mas_workspace_id: "{{ lookup('env', 'MAS_WORKSPACE_ID') }}" - db2_instance_id: "{{ lookup('env', 'DB2_INSTANCE_NAME') }}" - db2_namespace: "{{ lookup('env', 'DB2_NAMESPACE') | default('db2u', true) }}" - - # Define what action to perform - masbr_action: "{{ lookup('env', 'MASBR_ACTION') }}" - - # Define what to backup/restore - masbr_job_component: - name: "health" - instance: "{{ mas_instance_id }}" - namespace: "mas-{{ mas_instance_id }}-manage" - - # Configure path to backup_restore tasks - role_path: "{{ [playbook_dir, '../common_tasks/backup_restore'] | path_join }}" - - pre_tasks: - # Display the notice that this is still a work in progress - # ------------------------------------------------------------------------- - - name: Important Notice - debug: - msg: | - ********************************************************************* - ************************* IMPORTANT NOTICE ************************** - ********************************************************************* - * * - * The backup and restore playbooks in this collection are still * - * work in progress, they are not suitable for production use at * - * this time. 
* - * * - * You may track development progress using the Backup & Restore * - * label in the GitHub repository: * - * * - * https://ibm.biz/BdGnfb * - * * - * Production-ready backup and restore options are detailed in the * - * Backup and restore topic in the product documentation: * - * * - * https://ibm.biz/BdGnf3 * - * * - ********************************************************************* - - # Check for required environment variables - # ------------------------------------------------------------------------- - - name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "MAS_INSTANCE_ID is required" - - - name: "Fail if mas_workspace_id is not provided" - assert: - that: mas_workspace_id is defined and mas_workspace_id != "" - fail_msg: "MAS_WORKSPACE_ID is required" - - - name: "Fail if db2_instance_id is not provided" - assert: - that: db2_instance_id is defined and db2_instance_id != "" - fail_msg: "DB2_INSTANCE_NAME is required" - - - name: "Fail if masbr_action is not set to backup|restore" - assert: - that: masbr_action in ["backup", "restore"] - fail_msg: "MASBR_ACTION is required and must be set to 'backup' or 'restore'" - - # Common checks before run tasks - # ------------------------------------------------------------------------- - - name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "{{ masbr_action }}" - - tasks: - - name: "MongoDB: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.mongodb - vars: - mongodb_action: "{{ masbr_action }}" - mas_app_id: "health" - - - name: "Db2: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.db2 - vars: - db2_action: "{{ masbr_action }}" - - - name: "MAS Core namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_backup_restore - - - name: "Manage namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_app_backup_restore - vars: - mas_app_id: "manage" - - - name: "Health namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_app_backup_restore - vars: - mas_app_id: "health" diff --git a/ibm/mas_devops/playbooks/br_iot.yml b/ibm/mas_devops/playbooks/br_iot.yml deleted file mode 100644 index 1d5ebdd9e3..0000000000 --- a/ibm/mas_devops/playbooks/br_iot.yml +++ /dev/null @@ -1,101 +0,0 @@ -- name: "Backup/Restore IoT tool" - hosts: localhost - any_errors_fatal: true - - vars: - # Define the target for backup/restore - mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" - mas_workspace_id: "{{ lookup('env', 'MAS_WORKSPACE_ID') }}" - db2_instance_id: "{{ lookup('env', 'DB2_INSTANCE_NAME') }}" - db2_namespace: "{{ lookup('env', 'DB2_NAMESPACE') | default('db2u', true) }}" - - # Define what action to perform - masbr_action: "{{ lookup('env', 'MASBR_ACTION') }}" - - # Define what to backup/restore - masbr_job_component: - name: "iot" - instance: "{{ mas_instance_id }}" - namespace: "mas-{{ mas_instance_id }}-iot" - - # Configure path to backup_restore tasks - role_path: "{{ [playbook_dir, '../common_tasks/backup_restore'] | path_join }}" - - pre_tasks: - # Display the notice that this is still a work in progress - # ------------------------------------------------------------------------- - - name: Important Notice - debug: - msg: | - ********************************************************************* - ************************* IMPORTANT NOTICE ************************** - 
********************************************************************* - * * - * The backup and restore playbooks in this collection are still * - * work in progress, they are not suitable for production use at * - * this time. * - * * - * You may track development progress using the Backup & Restore * - * label in the GitHub repository: * - * * - * https://ibm.biz/BdGnfb * - * * - * Production-ready backup and restore options are detailed in the * - * Backup and restore topic in the product documentation: * - * * - * https://ibm.biz/BdGnf3 * - * * - ********************************************************************* - - # Check for required environment variables - # ------------------------------------------------------------------------- - - name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "mas_instance_id is required" - - - name: "Fail if mas_workspace_id is not provided" - assert: - that: mas_workspace_id is defined and mas_workspace_id != "" - fail_msg: "mas_workspace_id is required" - - - name: "Fail if db2_instance_id is not provided" - assert: - that: db2_instance_id is defined and db2_instance_id != "" - fail_msg: "db2_instance_id is required" - - - name: "Fail if masbr_action is not set to backup|restore" - assert: - that: masbr_action in ["backup", "restore"] - fail_msg: "masbr_action is required and must be set to 'backup' or 'restore'" - - # Common checks before run tasks - # ------------------------------------------------------------------------- - - name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "{{ masbr_action }}" - - tasks: - - name: "MongoDB: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.mongodb - vars: - mongodb_action: "{{ masbr_action }}" - mas_app_id: "iot" - - - name: "Db2: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.db2 - vars: - db2_action: "{{ masbr_action }}" - - - name: "MAS Core namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_backup_restore - - - name: "IoT namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_app_backup_restore - vars: - mas_app_id: "iot" diff --git a/ibm/mas_devops/playbooks/br_manage.yml b/ibm/mas_devops/playbooks/br_manage.yml deleted file mode 100644 index 96f364b62f..0000000000 --- a/ibm/mas_devops/playbooks/br_manage.yml +++ /dev/null @@ -1,97 +0,0 @@ -- name: "Backup/Restore Maximo Manage" - hosts: localhost - any_errors_fatal: true - - vars: - # Define the target for backup/restore - mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" - mas_workspace_id: "{{ lookup('env', 'MAS_WORKSPACE_ID') }}" - db2_instance_id: "{{ lookup('env', 'DB2_INSTANCE_NAME') }}" - db2_namespace: "{{ lookup('env', 'DB2_NAMESPACE') | default('db2u', true) }}" - - # Define what action to perform - masbr_action: "{{ lookup('env', 'MASBR_ACTION') }}" - - # Define what to backup/restore - masbr_job_component: - name: "manage" - instance: "{{ mas_instance_id }}" - namespace: "mas-{{ mas_instance_id }}-manage" - - # Configure path to backup_restore tasks - role_path: "{{ [playbook_dir, '../common_tasks/backup_restore'] | path_join }}" - - pre_tasks: - # Display the notice that this is still a work in progress - # ------------------------------------------------------------------------- - - name: Important Notice - debug: - msg: | - ********************************************************************* - 
************************* IMPORTANT NOTICE ************************** - ********************************************************************* - * * - * The backup and restore playbooks in this collection are still * - * work in progress, they are not suitable for production use at * - * this time. * - * * - * You may track development progress using the Backup & Restore * - * label in the GitHub repository: * - * * - * https://ibm.biz/BdGnfb * - * * - * Production-ready backup and restore options are detailed in the * - * Backup and restore topic in the product documentation: * - * * - * https://ibm.biz/BdGnf3 * - * * - ********************************************************************* - - # Check for required environment variables - # ------------------------------------------------------------------------- - - name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "mas_instance_id is required" - - - name: "Fail if mas_workspace_id is not provided" - assert: - that: mas_workspace_id is defined and mas_workspace_id != "" - fail_msg: "mas_workspace_id is required" - - - name: "Fail if masbr_action is not set to backup|restore" - assert: - that: masbr_action in ["backup", "restore"] - fail_msg: "masbr_action is required and must be set to 'backup' or 'restore'" - - # Common checks before run tasks - # ------------------------------------------------------------------------- - - name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "{{ masbr_action }}" - - tasks: - - name: "MongoDB: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.mongodb - vars: - mongodb_action: "{{ masbr_action }}" - mas_app_id: "manage" - - - name: "Db2: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.db2 - vars: - db2_action: "{{ masbr_action }}" - when: db2_instance_id is defined and db2_instance_id != "" - - - name: "MAS Core namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_backup_restore - - - name: "Manage namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_app_backup_restore - vars: - mas_app_id: "manage" diff --git a/ibm/mas_devops/playbooks/br_mongodb.yml b/ibm/mas_devops/playbooks/br_mongodb.yml index 9fa399eae7..1c2b64dfd7 100644 --- a/ibm/mas_devops/playbooks/br_mongodb.yml +++ b/ibm/mas_devops/playbooks/br_mongodb.yml @@ -1,29 +1,22 @@ -- name: "Backup/Restore MongoDB for MAS" - hosts: localhost +--- +- hosts: localhost any_errors_fatal: true - vars: # Define the target for backup/restore mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" - mas_app_id: "{{ lookup('env', 'MAS_APP_ID') }}" - mongodb_provider: "{{ lookup('env', 'MONGODB_PROVIDER') | default('community', true) }}" + mongodb_action: "{{ lookup('env', 'MONGODB_ACTION') | default('backup', true) }}" # backup or backup-database or restore-database or install + mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" # eg: /tmp/mas_backups + mongodb_backup_version: "{{ lookup('env', 'MONGODB_BACKUP_VERSION') }}" # Required for restore or restore-database action + mongodb_instance_name: "{{ lookup('env', 'MONGODB_INSTANCE_NAME') | default('mas-mongo-ce', true) }}" mongodb_namespace: "{{ lookup('env', 'MONGODB_NAMESPACE') | default('mongoce', true) }}" - - # Define what action to perform - masbr_action: "{{ lookup('env', 'MASBR_ACTION') }}" - - # Define what to backup/restore - masbr_job_component: - name: "mongodb" - 
instance: "{{ mas_instance_id }}" - namespace: "{{ mongodb_namespace }}" - - # Configure path to backup_restore tasks - role_path: "{{ [playbook_dir, '../common_tasks/backup_restore'] | path_join }}" + mongodb_provider: "community" # only community is supported currently + mas_app_id: "{{ lookup('env', 'MAS_APP_ID') }}" # Optional + # mongo restore configuration + override_storageclass: "{{ lookup('env', 'OVERRIDE_STORAGECLASS') | default('false', true) | bool }}" # Set to true to override + # when OVERRIDE_MONGODB_STORAGECLASS to true, MONGODB_STORAGECLASS_NAME_RWO will be used + mongodb_storageclass_rwo: "{{ lookup('env', 'MONGODB_STORAGECLASS_NAME_RWO') | default('', true) }}" pre_tasks: - # Display the notice that this is still a work in progress - # ------------------------------------------------------------------------- - name: Important Notice debug: msg: | @@ -31,44 +24,26 @@ ************************* IMPORTANT NOTICE ************************** ********************************************************************* * * - * The backup and restore playbooks in this collection are still * - * work in progress, they are not suitable for production use at * - * this time. * - * * - * You may track development progress using the Backup & Restore * - * label in the GitHub repository: * + * These playbooks are samples to demonstrate how to use the roles * + * in this collection. * * * - * https://ibm.biz/BdGnfb * + * They are NOT INTENDED FOR PRODUCTION USE as-is, they are a * + * starting point for power users to aid in the development of * + * their own Ansible playbooks using the roles in this collection * * * - * Production-ready backup and restore options are detailed in the * - * Backup and restore topic in the product documentation: * + * The recommended way to install MAS is to use the MAS CLI, which * + * uses this Ansible Collection to deliver a complete managed * + * lifecycle for your MAS instance. 
* * * - * https://ibm.biz/BdGnf3 * + * https://ibm-mas.github.io/cli/ * * * ********************************************************************* - - # Check for required environment variables - # ------------------------------------------------------------------------- - - name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "mas_instance_id is required" - - - name: "Fail if masbr_action is not set to backup|restore" + - name: "Fail if mongodb_action is not set to backup|backup-database|restore|restore-database" assert: - that: masbr_action in ["backup", "restore"] - fail_msg: "masbr_action is required and must be set to 'backup' or 'restore'" - - # Common checks before run tasks - # ------------------------------------------------------------------------- - - name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "{{ masbr_action }}" + that: mongodb_action in ["backup", "backup-database", "restore-database", "restore"] + fail_msg: "mongodb_action is required and must be set to 'backup' or 'backup-database' or 'restore' or 'restore-database'" - tasks: - - name: "MongoDB: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.mongodb + roles: + - role: ibm.mas_devops.mongodb vars: - mongodb_action: "{{ masbr_action }}" + mongodb_storage_class: "{{ mongodb_storageclass_rwo }}" diff --git a/ibm/mas_devops/playbooks/br_monitor.yml b/ibm/mas_devops/playbooks/br_monitor.yml deleted file mode 100644 index d3835d1c9b..0000000000 --- a/ibm/mas_devops/playbooks/br_monitor.yml +++ /dev/null @@ -1,107 +0,0 @@ -- name: "Backup/Restore Maximo Monitor" - hosts: localhost - any_errors_fatal: true - - vars: - # Define the target for backup/restore - mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" - mas_workspace_id: "{{ lookup('env', 'MAS_WORKSPACE_ID') }}" - db2_instance_id: "{{ lookup('env', 'DB2_INSTANCE_NAME') }}" - db2_namespace: "{{ lookup('env', 'DB2_NAMESPACE') | default('db2u', true) }}" - - # Define what action to perform - masbr_action: "{{ lookup('env', 'MASBR_ACTION') }}" - - # Define what to backup/restore - masbr_job_component: - name: "monitor" - instance: "{{ mas_instance_id }}" - namespace: "mas-{{ mas_instance_id }}-monitor" - - # Configure path to backup_restore tasks - role_path: "{{ [playbook_dir, '../common_tasks/backup_restore'] | path_join }}" - - pre_tasks: - # Display the notice that this is still a work in progress - # ------------------------------------------------------------------------- - - name: Important Notice - debug: - msg: | - ********************************************************************* - ************************* IMPORTANT NOTICE ************************** - ********************************************************************* - * * - * The backup and restore playbooks in this collection are still * - * work in progress, they are not suitable for production use at * - * this time. 
* - * * - * You may track development progress using the Backup & Restore * - * label in the GitHub repository: * - * * - * https://ibm.biz/BdGnfb * - * * - * Production-ready backup and restore options are detailed in the * - * Backup and restore topic in the product documentation: * - * * - * https://ibm.biz/BdGnf3 * - * * - ********************************************************************* - - # Check for required environment variables - # ------------------------------------------------------------------------- - - name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "mas_instance_id is required" - - - name: "Fail if mas_workspace_id is not provided" - assert: - that: mas_workspace_id is defined and mas_workspace_id != "" - fail_msg: "mas_workspace_id is required" - - - name: "Fail if db2_instance_id is not provided" - assert: - that: db2_instance_id is defined and db2_instance_id != "" - fail_msg: "db2_instance_id is required" - - - name: "Fail if masbr_action is not set to backup|restore" - assert: - that: masbr_action in ["backup", "restore"] - fail_msg: "masbr_action is required and must be set to 'backup' or 'restore'" - - # Common checks before run tasks - # ------------------------------------------------------------------------- - - name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "{{ masbr_action }}" - - tasks: - - name: "MongoDB: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.mongodb - vars: - mongodb_action: "{{ masbr_action }}" - mas_app_id: "monitor" - - - name: "Db2: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.db2 - vars: - db2_action: "{{ masbr_action }}" - - - name: "MAS Core namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_backup_restore - - - name: "IoT namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_app_backup_restore - vars: - mas_app_id: "iot" - - - name: "Monitor namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_app_backup_restore - vars: - mas_app_id: "monitor" diff --git a/ibm/mas_devops/playbooks/br_optimizer.yml b/ibm/mas_devops/playbooks/br_optimizer.yml deleted file mode 100644 index cece75a761..0000000000 --- a/ibm/mas_devops/playbooks/br_optimizer.yml +++ /dev/null @@ -1,107 +0,0 @@ -- name: "Backup/Restore Maximo Optimizer" - hosts: localhost - any_errors_fatal: true - - vars: - # Define the target for backup/restore - mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" - mas_workspace_id: "{{ lookup('env', 'MAS_WORKSPACE_ID') }}" - db2_instance_id: "{{ lookup('env', 'DB2_INSTANCE_NAME') }}" - db2_namespace: "{{ lookup('env', 'DB2_NAMESPACE') | default('db2u', true) }}" - - # Define what action to perform - masbr_action: "{{ lookup('env', 'MASBR_ACTION') }}" - - # Define what to backup/restore - masbr_job_component: - name: "optimizer" - instance: "{{ mas_instance_id }}" - namespace: "mas-{{ mas_instance_id }}-optimizer" - - # Configure path to backup_restore tasks - role_path: "{{ [playbook_dir, '../common_tasks/backup_restore'] | path_join }}" - - pre_tasks: - # Display the notice that this is still a work in progress - # ------------------------------------------------------------------------- - - name: Important Notice - debug: - msg: | - ********************************************************************* - ************************* IMPORTANT NOTICE 
************************** - ********************************************************************* - * * - * The backup and restore playbooks in this collection are still * - * work in progress, they are not suitable for production use at * - * this time. * - * * - * You may track development progress using the Backup & Restore * - * label in the GitHub repository: * - * * - * https://ibm.biz/BdGnfb * - * * - * Production-ready backup and restore options are detailed in the * - * Backup and restore topic in the product documentation: * - * * - * https://ibm.biz/BdGnf3 * - * * - ********************************************************************* - - # Check for required environment variables - # ------------------------------------------------------------------------- - - name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "mas_instance_id is required" - - - name: "Fail if mas_workspace_id is not provided" - assert: - that: mas_workspace_id is defined and mas_workspace_id != "" - fail_msg: "mas_workspace_id is required" - - - name: "Fail if db2_instance_id is not provided" - assert: - that: db2_instance_id is defined and db2_instance_id != "" - fail_msg: "db2_instance_id is required" - - - name: "Fail if masbr_action is not set to backup|restore" - assert: - that: masbr_action in ["backup", "restore"] - fail_msg: "masbr_action is required and must be set to 'backup' or 'restore'" - - # Common checks before run tasks - # ------------------------------------------------------------------------- - - name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "{{ masbr_action }}" - - tasks: - - name: "MongoDB: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.mongodb - vars: - mongodb_action: "{{ masbr_action }}" - mas_app_id: "optimizer" - - - name: "Db2: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.db2 - vars: - db2_action: "{{ masbr_action }}" - - - name: "MAS Core namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_backup_restore - - - name: "Manage namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_app_backup_restore - vars: - mas_app_id: "manage" - - - name: "Optimizer namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_app_backup_restore - vars: - mas_app_id: "optimizer" diff --git a/ibm/mas_devops/playbooks/br_visualinspection.yml b/ibm/mas_devops/playbooks/br_visualinspection.yml deleted file mode 100644 index 4023ad752f..0000000000 --- a/ibm/mas_devops/playbooks/br_visualinspection.yml +++ /dev/null @@ -1,88 +0,0 @@ -- name: "Backup/Restore Maximo Visual Inspection" - hosts: localhost - any_errors_fatal: true - - vars: - # Define the target for backup/restore - mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" - mas_workspace_id: "{{ lookup('env', 'MAS_WORKSPACE_ID') }}" - - # Define what action to perform - masbr_action: "{{ lookup('env', 'MASBR_ACTION') }}" - - # Define what to backup/restore - masbr_job_component: - name: "visualinspection" - instance: "{{ mas_instance_id }}" - namespace: "mas-{{ mas_instance_id }}-visualinspection" - - # Configure path to backup_restore tasks - role_path: "{{ [playbook_dir, '../common_tasks/backup_restore'] | path_join }}" - - pre_tasks: - # Display the notice that this is still a work in progress - # ------------------------------------------------------------------------- - - name: 
Important Notice - debug: - msg: | - ********************************************************************* - ************************* IMPORTANT NOTICE ************************** - ********************************************************************* - * * - * The backup and restore playbooks in this collection are still * - * work in progress, they are not suitable for production use at * - * this time. * - * * - * You may track development progress using the Backup & Restore * - * label in the GitHub repository: * - * * - * https://ibm.biz/BdGnfb * - * * - * Production-ready backup and restore options are detailed in the * - * Backup and restore topic in the product documentation: * - * * - * https://ibm.biz/BdGnf3 * - * * - ********************************************************************* - - # Check for required environment variables - # ------------------------------------------------------------------------- - - name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "mas_instance_id is required" - - - name: "Fail if mas_workspace_id is not provided" - assert: - that: mas_workspace_id is defined and mas_workspace_id != "" - fail_msg: "mas_workspace_id is required" - - - name: "Fail if masbr_action is not set to backup|restore" - assert: - that: masbr_action in ["backup", "restore"] - fail_msg: "masbr_action is required and must be set to 'backup' or 'restore'" - - # Common checks before run tasks - # ------------------------------------------------------------------------- - - name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "{{ masbr_action }}" - - tasks: - - name: "MongoDB: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.mongodb - vars: - mongodb_action: "{{ masbr_action }}" - mas_app_id: "visualinspection" - - - name: "MAS Core namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_backup_restore - - - name: "Visual Inspection namespace: {{ masbr_action }}" - include_role: - name: ibm.mas_devops.suite_app_backup_restore - vars: - mas_app_id: "visualinspection" diff --git a/ibm/mas_devops/plugins/action/backup_resource.py b/ibm/mas_devops/plugins/action/backup_resource.py new file mode 100644 index 0000000000..805436f0de --- /dev/null +++ b/ibm/mas_devops/plugins/action/backup_resource.py @@ -0,0 +1,169 @@ +#!/usr/bin/env python3 + +import logging +import urllib3 + +from ansible_collections.kubernetes.core.plugins.module_utils.k8s.client import get_api_client +from ansible.plugins.action import ActionBase +from ansible.errors import AnsibleError +from ansible.utils.display import Display + +from mas.devops.backup import backupResources + +urllib3.disable_warnings() # Disabling warnings will prevent InsecureRequestWarnings from dynClient +logging.basicConfig(level=logging.INFO, format='%(asctime)s %(name)-20s %(levelname)-8s %(message)s', datefmt='%Y-%m-%d %H:%M:%S') + +display = Display() + + +class ActionModule(ActionBase): + """ + Backup Kubernetes resources based on a list of resource definitions + + Usage Example + ------------- + tasks: + - name: "Backup MAS Suite resources" + ibm.mas_devops.backup_resource: + backup_resources: + - namespace: "mas-inst1-core" + resources: + - kind: Subscription + api_version: operators.coreos.com/v1alpha1 + name: ibm-mas-operator + - kind: Suite + api_version: core.mas.ibm.com/v1 + - kind: Workspace + api_version: core.mas.ibm.com/v1 + backup_path: 
"/backup/mas-suite" + """ + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + # Initialize DynamicClient and grab the task args + host = self._task.args.get('host', None) + api_key = self._task.args.get('api_key', None) + + dynClient = get_api_client(api_key=api_key, host=host) + + backup_resources = self._task.args.get('backup_resources') + backup_path = self._task.args.get('backup_path') + + if backup_resources is None or not isinstance(backup_resources, list): + raise AnsibleError(f"Error: backup_resources argument must be provided as a list") + if backup_path is None or backup_path == "": + raise AnsibleError(f"Error: backup_path argument was not provided") + + display.v(f"Starting backup of MAS resources to '{backup_path}'") + + total_backed_up = 0 + total_failed = 0 + total_not_found = 0 + all_discovered_secrets = set() + secrets_by_namespace = {} # Track secrets per namespace + failed_resources = [] # Track failed resources with details + + # Process each namespace and its resources + for namespace_config in backup_resources: + namespace = namespace_config.get('namespace') + resources = namespace_config.get('resources', []) + + # Namespace can be blank for cluster-scoped resources + if namespace: + display.v(f"Processing namespace: {namespace}") + else: + display.v(f"Processing cluster-scoped resources") + + # Process each resource in the namespace (or cluster-scoped) + for resource_def in resources: + kind = resource_def.get('kind') + api_version = resource_def.get('api_version') + name = resource_def.get('name') + labels = resource_def.get('labels', []) + + if not kind: + display.v(f"Warning: Skipping resource without kind defined") + continue + + if not api_version: + raise AnsibleError(f"Error: api_version is required for resource kind '{kind}'") + + # Backup resources (either specific or all of that kind) + # Pass namespace, name, and labels as named optional arguments + backed_up, not_found, failed, discovered_secrets = backupResources( + dynClient, kind, api_version, backup_path, + namespace=namespace, name=name, labels=labels + ) + total_backed_up += backed_up + total_not_found += not_found + total_failed += failed + + # Track failed resources with details + if failed > 0: + scope = namespace if namespace else 'cluster-scoped' + resource_desc = f"{kind}/{name}" if name else f"{kind} (all)" + if labels: + resource_desc += f" with labels {labels}" + failed_resources.append({ + 'scope': scope, + 'kind': kind, + 'name': name if name else 'all', + 'api_version': api_version, + 'description': resource_desc + }) + + # Track discovered secrets by namespace (only if namespace is provided) + if discovered_secrets and namespace: + if namespace not in secrets_by_namespace: + secrets_by_namespace[namespace] = set() + secrets_by_namespace[namespace].update(discovered_secrets) + all_discovered_secrets.update(discovered_secrets) + + display.v(f"Backup complete for named resources: {total_backed_up} resources backed up, {total_not_found} not found, {total_failed} failed") + + # Now backup all discovered secrets per namespace + for ns, secret_names in secrets_by_namespace.items(): + if secret_names: + display.v(f"Backing up {len(secret_names)} discovered secret(s) in namespace '{ns}': {', '.join(sorted(secret_names))}") + + for secret_name in sorted(secret_names): + display.v(f"Backing up discovered secret: {secret_name}") + backed_up, not_found, failed, _ = backupResources( + dynClient, 'Secret', 'v1', backup_path, + namespace=ns, name=secret_name + 
) + total_backed_up += backed_up + total_not_found += not_found + total_failed += failed + + # Track failed secret backups + if failed > 0: + failed_resources.append({ + 'scope': ns, + 'kind': 'Secret', + 'name': secret_name, + 'api_version': 'v1', + 'description': f"Secret/{secret_name} (auto-discovered)" + }) + + if not_found: + display.v(f"Warning: Referenced secret '{secret_name}' not found in namespace '{ns}'") + + display.v(f"Backup complete for all: {total_backed_up} resources backed up, {total_not_found} not found, {total_failed} failed") + + # Determine if the backup was successful + has_failures = total_failed > 0 + + return dict( + message=f"Backed up {total_backed_up} MAS resources" + (f" with {total_failed} failures" if has_failures else ""), + failed=has_failures, + changed=False, + success=not has_failures, + backed_up_count=total_backed_up, + not_found_count=total_not_found, + failed_count=total_failed, + discovered_secrets_count=len(all_discovered_secrets), + discovered_secrets=sorted(list(all_discovered_secrets)), + failed_resources=failed_resources + ) + diff --git a/ibm/mas_devops/plugins/action/crd_exists.py b/ibm/mas_devops/plugins/action/crd_exists.py new file mode 100644 index 0000000000..741433aad5 --- /dev/null +++ b/ibm/mas_devops/plugins/action/crd_exists.py @@ -0,0 +1,43 @@ +#!/usr/bin/env python3 + +import logging +import urllib3 +from ansible_collections.kubernetes.core.plugins.module_utils.k8s.client import get_api_client +from ansible.errors import AnsibleError +from ansible.plugins.action import ActionBase + +from mas.devops.ocp import crdExists + +urllib3.disable_warnings() # Disabling warnings will prevent InsecureRequestWarnings from dynClient +logging.basicConfig(level=logging.INFO, format='%(asctime)s %(name)-20s %(levelname)-8s %(message)s', datefmt='%Y-%m-%d %H:%M:%S') + +class ActionModule(ActionBase): + """ + Usage Example + ------------- + tasks: + - name: "Check CRD exists" + ibm.mas_devops.crd_exists: + crdName: GrafanaDashboard + register: crd_exists + """ + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + # Target subscription + crdName = self._task.args.get('crdName', None) + + if crdName is None: + raise AnsibleError("Error: CRD Name argument was not provided") + + # Initialize DynamicClient and apply the Subscription + host = self._task.args.get('host', None) + api_key = self._task.args.get('api_key', None) + + dynClient = get_api_client(api_key=api_key, host=host) + isCRDPresent = crdExists(dynClient, crdName) + + return dict( + success=True, + exists=isCRDPresent + ) diff --git a/ibm/mas_devops/plugins/action/download_from_s3.py b/ibm/mas_devops/plugins/action/download_from_s3.py new file mode 100644 index 0000000000..df1410a9a1 --- /dev/null +++ b/ibm/mas_devops/plugins/action/download_from_s3.py @@ -0,0 +1,63 @@ +#!/usr/bin/env python3 + +import logging +import os +import urllib3 +from ansible.errors import AnsibleError +from ansible.plugins.action import ActionBase +from mas.devops.backup import downloadFromS3 + +urllib3.disable_warnings() # Disabling warnings will prevent InsecureRequestWarnings from dynClient + +def normalize_endpoint_url(endpoint) -> str|None: + if not endpoint: + return endpoint + if not endpoint.startswith(("http://", "https://")): + return f"https://{endpoint}" + return endpoint + +class ActionModule(ActionBase): + """ + Usage Example + ------------- + tasks: + - name: "Upload to S3 location" + ibm.mas_devops.upload_to_s3: + mas_catalog_version: "{{ catalog_tag }}" + 
fail_if_catalog_does_not_exist: true + register: mas_catalog_metadata + """ + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + local_dir = self._task.args.get('local_dir', None) + bucket_name = self._task.args.get('bucket_name', None) + object_name = self._task.args.get('object_name', None) + aws_access_key_id = self._task.args.get('aws_access_key_id', None) + aws_secret_access_key = self._task.args.get('aws_secret_access_key', None) + endpoint_url = self._task.args.get('endpoint_url', None) + region_name = self._task.args.get('region_name', None) + + if local_dir is None: + raise AnsibleError(f"Error: local_dir argument was not provided") + if bucket_name is None: + raise AnsibleError(f"Error: bucket_name argument was not provided") + if object_name is None: + raise AnsibleError(f"Error: object_name argument was not provided") + if aws_access_key_id is None: + raise AnsibleError(f"Error: aws_access_key_id argument was not provided") + if aws_secret_access_key is None: + raise AnsibleError(f"Error: aws_secret_access_key argument was not provided") + + endpoint_url = normalize_endpoint_url(endpoint=endpoint_url) + + download_status = downloadFromS3( + local_dir=local_dir, bucket_name=bucket_name, object_name=object_name, + endpoint_url=endpoint_url, aws_access_key_id=aws_access_key_id, + aws_secret_access_key=aws_secret_access_key, region_name=region_name + ) + + return dict( + success=download_status + ) + diff --git a/ibm/mas_devops/plugins/action/get_db2u_pod_name.py b/ibm/mas_devops/plugins/action/get_db2u_pod_name.py new file mode 100644 index 0000000000..d47e281c8b --- /dev/null +++ b/ibm/mas_devops/plugins/action/get_db2u_pod_name.py @@ -0,0 +1,71 @@ +#!/usr/bin/env python3 + +import urllib3 +from ansible_collections.kubernetes.core.plugins.module_utils.k8s.client import get_api_client +from ansible.plugins.action import ActionBase +from ansible.errors import AnsibleError +from ansible.utils.display import Display +from kubernetes.dynamic import DynamicClient +from kubernetes.dynamic.exceptions import NotFoundError + +from ansible_collections.ibm.mas_devops.plugins.module_utils.backuprestore import getDb2uInstance, getDb2VersionFromCR, isDb2uReady + + +# Disabling warnings will prevent InsecureRequestWarnings from dynClient +urllib3.disable_warnings() +display = Display() + +class ActionModule(ActionBase): + """ + Usage Example + ------------- + tasks: + - name: "Get Db2u pod name + ibm.mas_devops.get_db2u_pod_name: + """ + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + # Initialize DynamicClient and grab the task args + host = self._task.args.get('host', None) + api_key = self._task.args.get('api_key', None) + + dynClient = get_api_client(api_key=api_key, host=host) + + db2_instance_name = self._task.args.get('db2_instance_name') + db2_namespace = self._task.args.get('db2_namespace') + + if db2_instance_name is None: + raise AnsibleError(f"Error: db2_instance_name argument was not provided") + if db2_namespace is None: + raise AnsibleError(f"Error: db2_namespace argument was not provided") + + # First, check if the Db2u Instance is present and healthy state. 
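+        # getDb2uInstance and isDb2uReady are helpers imported from plugins/module_utils/backuprestore;
+        # validating the Db2uCluster CR up front gives the caller a clear error before we attempt the Pod lookup.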
+ db2u_cr = getDb2uInstance(dynClient, db2_instance_name, db2_namespace) + if not db2u_cr: + raise AnsibleError(f"Error: Db2u instance '{db2_instance_name}' not found in namespace '{db2_namespace}'") + if not isDb2uReady(db2u_cr): + raise AnsibleError(f"Error: Db2u instance '{db2_instance_name}' is not in 'Ready' state") + + # Next, get the Pod name for the Db2u instance + label_selector = f"app={db2_instance_name},type=engine" + display.v(f"Looking up Db2u Pod in namespace '{db2_namespace}' with labels '{label_selector}'") + try: + podAPI = dynClient.resources.get(api_version="v1", kind="Pod") + pods = podAPI.get(namespace=db2_namespace, label_selector=label_selector) + if pods.items: + pod_name = pods.items[0]["metadata"]["name"] + display.v(f"- Found Pod '{pod_name}' in namespace '{db2_namespace}' with labels '{label_selector}'") + return dict( + failed=False, + success=True, + pod_name=pod_name, + db2_version=getDb2VersionFromCR(db2u_cr), + msg="Db2u Pod found" + ) + else: + display.v(f"- No Pods found in namespace '{db2_namespace}' with labels '{label_selector}'") + except NotFoundError: + display.v(f"- No Pods found in namespace '{db2_namespace}' with labels '{label_selector}'") + return dict(failed=True, success=False, pod_name="", db2version="", msg="Db2u Pod not found") + diff --git a/ibm/mas_devops/plugins/action/get_mas_cluster_issuer.py b/ibm/mas_devops/plugins/action/get_mas_cluster_issuer.py new file mode 100644 index 0000000000..1c0b0458e8 --- /dev/null +++ b/ibm/mas_devops/plugins/action/get_mas_cluster_issuer.py @@ -0,0 +1,67 @@ +#!/usr/bin/env python3 + +import logging +import urllib3 +from ansible_collections.kubernetes.core.plugins.module_utils.k8s.client import get_api_client +from ansible.plugins.action import ActionBase +from ansible.errors import AnsibleError + +from mas.devops.mas import getMasPublicClusterIssuer + +urllib3.disable_warnings() # Disabling warnings will prevent InsecureRequestWarnings from dynClient +logging.basicConfig(level=logging.INFO, format='%(asctime)s %(name)-20s %(levelname)-8s %(message)s', datefmt='%Y-%m-%d %H:%M:%S') + + +class ActionModule(ActionBase): + """ + Retrieve the Public Cluster Issuer for a MAS instance. + + This action plugin queries the Suite custom resource and retrieves the + certificate issuer name from spec.certificateIssuer.name. If not specified, + it returns the default issuer name. 
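+    The lookup itself is delegated to getMasPublicClusterIssuer from the
+    mas.devops Python package; this plugin only validates the arguments and
+    shapes the result for Ansible.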
+ + Usage Example + ------------- + tasks: + - name: "Get MAS Cluster Issuer" + ibm.mas_devops.get_mas_cluster_issuer: + instance_id: "{{ mas_instance_id }}" + register: cluster_issuer + """ + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + # Initialize DynamicClient and grab the task args + host = self._task.args.get('host', None) + api_key = self._task.args.get('api_key', None) + + dynClient = get_api_client(api_key=api_key, host=host) + + instanceId = self._task.args.get('instance_id', None) + + if instanceId is None: + raise AnsibleError(f"Error: instance_id argument was not provided") + if not isinstance(instanceId, str): + raise AnsibleError(f"Error: instance_id argument is not a string") + + # Get the cluster issuer name + issuerName = getMasPublicClusterIssuer(dynClient, instanceId) + + if issuerName is None: + return dict( + message=f"Failed to retrieve cluster issuer for MAS instance '{instanceId}'", + success=False, + failed=True, + changed=False, + instance_id=instanceId + ) + + return dict( + message=f"Successfully retrieved cluster issuer for MAS instance '{instanceId}'", + success=True, + failed=False, + changed=False, + instance_id=instanceId, + issuer_name=issuerName + ) + diff --git a/ibm/mas_devops/plugins/action/get_mongoce_info.py b/ibm/mas_devops/plugins/action/get_mongoce_info.py new file mode 100644 index 0000000000..9817ce19a9 --- /dev/null +++ b/ibm/mas_devops/plugins/action/get_mongoce_info.py @@ -0,0 +1,192 @@ +#!/usr/bin/env python3 + +import logging +import urllib3 + +from ansible_collections.kubernetes.core.plugins.module_utils.k8s.client import get_api_client +from ansible.plugins.action import ActionBase +from ansible.errors import AnsibleError +from ansible.utils.display import Display +from kubernetes.dynamic import DynamicClient +from kubernetes.dynamic.exceptions import NotFoundError + +import yaml +import base64 + +from mas.devops.ocp import getCR, getSecret + +urllib3.disable_warnings() # Disabling warnings will prevent InsecureRequestWarnings from dynClient +logging.basicConfig(level=logging.INFO, format='%(asctime)s %(name)-20s %(levelname)-8s %(message)s', datefmt='%Y-%m-%d %H:%M:%S') + +display = Display() + +def display_information(mongoDBCommunityCR : dict): + display.v(f"MongoCE instance name .......................... {mongoDBCommunityCR['metadata']['name']}") + display.v(f"MongoCE namespace .............................. {mongoDBCommunityCR['metadata']['namespace']}") + display.v(f"MongoDB Version ................................ 
{mongoDBCommunityCR['spec']['version']}") + +def get_mongoce_admin_secretname(mongoDBCommunityCR): + """ + short_description: Get admin secret name from MongoDBCommunity CR + """ + for user in mongoDBCommunityCR['spec']['users']: + if 'db' in user: + if user['db'] == 'admin' and 'passwordSecretRef' in user: + return user['passwordSecretRef']['name'] + return None + +def isMongoRunning(mongoCR: dict) -> bool: + """ + Check if MongoDB Community instance is running + return True if running, else False + """ + display.v(f"Checking if MongoDB Community instance is in 'Running' state") + if 'status' in mongoCR: + if 'phase' in mongoCR['status']: + if mongoCR['status']['phase'] == 'Running': + display.v(f"MongoDB Community instance is in 'Running' state") + return True + display.v(f"MongoDB Community instance is not in 'Running' state") + return False + +def getMongoceCR(dynClient: DynamicClient, mongodb_instance_name: str, mongodb_namespace: str) -> dict: + """ + Check if MongoDB Community instance exists + return cr if exists, else return empty dict + """ + display.v(f"Checking if MongoDB Community instance '{mongodb_instance_name}' exists in namespace '{mongodb_namespace}'") + mongodbCR = getCR( + dynClient=dynClient, + cr_api_version="mongodbcommunity.mongodb.com/v1", + cr_kind="MongoDBCommunity", + cr_name=mongodb_instance_name, + namespace=mongodb_namespace + ) + if mongodbCR: + return mongodbCR.to_dict() + else: + return {} + +def getMongoVersion(mongoCR: dict) -> str: + """ + Get MongoDB version from MongoDB Community CR + """ + if 'spec' in mongoCR: + if 'version' in mongoCR['spec']: + return mongoCR['spec']['version'] + return "" + +def getMongoDBServiceName(mongoCR: dict) -> str: + """ + Get MongoDB Service name from MongoDB Community CR + """ + if 'spec' in mongoCR: + if 'statefulSet' in mongoCR['spec']: + if 'spec' in mongoCR['spec']['statefulSet']: + if 'serviceName' in mongoCR['spec']['statefulSet']['spec']: + return mongoCR['spec']['statefulSet']['spec']['serviceName'] + return "" + +def getPodNameFromLabels(dynClient: DynamicClient, namespace: str, label_selector: str) -> str: + """ + Get Pod name from labels + """ + display.v(f"Looking up Mongo Pod in namespace '{namespace}' with labels '{label_selector}'") + try: + podAPI = dynClient.resources.get(api_version="v1", kind="Pod") + pods = podAPI.get(namespace=namespace, label_selector=label_selector) + if pods.items: + pod_name = pods.items[0]["metadata"]["name"] + display.v(f"Found Pod '{pod_name}' in namespace '{namespace}' with labels '{label_selector}'") + return pod_name + else: + display.v(f"No Pods found in namespace '{namespace}' with labels '{label_selector}'") + except NotFoundError: + display.v(f"No Pods found in namespace '{namespace}' with labels '{label_selector}'") + return "" + +class ActionModule(ActionBase): + """ + Usage Example + ------------- + tasks: + - name: "Retrieve info from MongoDB instance CR and resources" + ibm.mas_devops.get_mongoce_info: + """ + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + # Initialize DynamicClient and grab the task args + host = self._task.args.get('host', None) + api_key = self._task.args.get('api_key', None) + + dynClient = get_api_client(api_key=api_key, host=host) + + mongodb_instance_name = self._task.args.get('mongodb_instance_name') + mongodb_namespace = self._task.args.get('mongodb_namespace') + + if mongodb_instance_name is None: + raise AnsibleError(f"Error: mongodb_instance_name argument was not provided") + if 
mongodb_namespace is None: + raise AnsibleError(f"Error: mongodb_namespace argument was not provided") + + display.v(f"Retrieving MongoDB Community instance '{mongodb_instance_name}' in namespace '{mongodb_namespace}'") + + # 1. Check if MongoDB Community instance exists + mongodb_cr = getMongoceCR(dynClient, mongodb_instance_name, mongodb_namespace) + if not mongodb_cr: + raise AnsibleError(f"Error: MongoDB Community instance '{mongodb_instance_name}' does not exist in namespace '{mongodb_namespace}'") + else: + display.v(f"MongoDB Community instance '{mongodb_instance_name}' exists in namespace '{mongodb_namespace}'") + + # 2. Check if MongoDB Community instance is in 'Running' state + if not isMongoRunning(mongodb_cr): + raise AnsibleError(f"Error: MongoDB Community instance '{mongodb_instance_name}' is not in 'Running' state") + + display_information(mongodb_cr) + + # 3. Lookup mongoce Pod to retrieve mongo pod name + mongoce_pod_name = getPodNameFromLabels( + dynClient=dynClient, + namespace=mongodb_namespace, + label_selector="apps.kubernetes.io/pod-index=0" + ) + if not mongoce_pod_name: + raise AnsibleError(f"Error: Could not find Pod for MongoDB Community instance '{mongodb_instance_name}' in namespace '{mongodb_namespace}'") + + # 4. Retrieve MongoDB admin Secret data + mongodb_admin_secretname = get_mongoce_admin_secretname(mongodb_cr) + mongodb_admin_secret = getSecret( + dynClient=dynClient, + namespace=mongodb_namespace, + secret_name=mongodb_admin_secretname + ) + if not mongodb_admin_secret: + raise AnsibleError(f"Error: Could not find admin Secret '{mongodb_admin_secretname}' for MongoDB Community instance '{mongodb_instance_name}' in namespace '{mongodb_namespace}'") + + # 5. Retrieve mongodb service name + mongodb_service_name = getMongoDBServiceName(mongodb_cr) + if not mongodb_service_name: + raise AnsibleError(f"Error: Could not find MongoDB Service name for MongoDB Community instance '{mongodb_instance_name}' in namespace '{mongodb_namespace}'") + + # 6. 
Construct mongodb_host + mongodb_host = f"{mongoce_pod_name}.{mongodb_service_name}.{mongodb_namespace}.svc.cluster.local:27017" + + mongo_info = dict( + mongoce_pod_name=mongoce_pod_name, + mongodb_admin_user="admin", + mongodb_admin_password = base64.b64decode(mongodb_admin_secret['data']['password']), + mongodb_service_name=mongodb_service_name, + mongodb_host=mongodb_host, + mongodb_version=getMongoVersion(mongodb_cr) + ) + + return dict( + message=f"Successfully set facts from MongoDB Community instance '{mongodb_instance_name}' resources", + failed=False, + changed=False, + success=True, + **mongo_info + ) + + diff --git a/ibm/mas_devops/plugins/action/restore_resource.py b/ibm/mas_devops/plugins/action/restore_resource.py new file mode 100644 index 0000000000..66597f6fca --- /dev/null +++ b/ibm/mas_devops/plugins/action/restore_resource.py @@ -0,0 +1,415 @@ +#!/usr/bin/env python3 + +import logging +import os +import urllib3 + +from ansible_collections.kubernetes.core.plugins.module_utils.k8s.client import get_api_client +from ansible.plugins.action import ActionBase +from ansible.errors import AnsibleError +from ansible.utils.display import Display + +from mas.devops.restore import loadYamlFile, restoreResource + +urllib3.disable_warnings() # Disabling warnings will prevent InsecureRequestWarnings from dynClient + +display = Display() + + +# Custom logging handler to forward Python logs to Ansible display +class AnsibleDisplayHandler(logging.Handler): + """Custom logging handler that forwards log messages to Ansible's display system""" + + def emit(self, record): + try: + msg = self.format(record) + if record.levelno >= logging.ERROR: + display.error(msg) + elif record.levelno >= logging.WARNING: + display.warning(msg) + else: + display.vvv(msg) # Use vvv for info/debug messages (visible with -vvv) + except Exception: + self.handleError(record) + + +# Configure logging to use both console and Ansible display +def setup_logging(): + """Setup logging to output to both console and Ansible display""" + # Create formatter + formatter = logging.Formatter('%(asctime)s %(name)-20s %(levelname)-8s %(message)s', datefmt='%Y-%m-%d %H:%M:%S') + + # Setup root logger + root_logger = logging.getLogger() + root_logger.setLevel(logging.INFO) + + # Add Ansible display handler + ansible_handler = AnsibleDisplayHandler() + ansible_handler.setFormatter(formatter) + root_logger.addHandler(ansible_handler) + + # Also add console handler for direct execution/debugging + console_handler = logging.StreamHandler() + console_handler.setFormatter(formatter) + root_logger.addHandler(console_handler) + + +# Initialize logging +setup_logging() + +def filter_fields(resource_data: dict, fields_to_filter: dict, resource_kind: str) -> dict: + """ + Filter out specified fields from a resource based on key paths, filtered by resource kind. 
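+    Field paths are dot-separated (for example 'spec.domain'); if any part of a
+    path is missing from the resource, that path is skipped silently.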
+ + Args: + + resource_data: The resource dictionary to modify + fields_to_filter: Dictionary mapping resource kinds to lists of field paths to filter out + resource_kind: The kind of the current resource (e.g., 'Suite', 'Secret', 'ConfigMap') + + Returns: + dict: Modified resource data with specified fields removed + + Example: + fields_to_filter = { + 'Suite': [ + 'spec.domain', 'spec.clusterIssuer.name' + + ] + } + """ + + if not fields_to_filter or resource_kind not in fields_to_filter: + return resource_data + + kind_fields = fields_to_filter[resource_kind] + + if not kind_fields: + return resource_data + + for field_path in kind_fields: + # Split the field path into parts + keys = field_path.split('.') + # Traverse the dictionary to find the parent of the field to remove + current = resource_data + + for key in keys[:-1]: + if key not in current: + # If any part of the path doesn't exist, skip this field + break + current = current[key] + + # Remove the field if it exists + if keys[-1] in current: + del current[keys[-1]] + + return resource_data + + +def apply_overrides(resource_data: dict, override_values: dict, resource_kind: str) -> dict: + """ + Apply override values to a resource based on key paths, filtered by resource kind. + + Args: + resource_data: The resource dictionary to modify + override_values: Dictionary mapping resource kinds to lists of override dictionaries + resource_kind: The kind of the current resource (e.g., 'Suite', 'Secret', 'ConfigMap') + + Returns: + dict: Modified resource data + + Example: + override_values = { + 'Suite': [ + {'spec.domain': 'mydomain.com'}, + {'spec.clusterIssuer.name': 'bob'} + ], + 'Secret': [ + {'data.value': 'newvalue'} + ] + } + """ + if not override_values or resource_kind not in override_values: + return resource_data + + kind_overrides = override_values[resource_kind] + if not kind_overrides: + return resource_data + + for override_dict in kind_overrides: + for key_path, new_value in override_dict.items(): + # Skip if value is NO_OVERRIDE (use backup value) + if new_value == "NO_OVERRIDE": + display.vvv(f"Skipping override for {resource_kind}: {key_path} (NO_OVERRIDE)") + continue + + # Split the key path by dots to navigate nested structure + keys = key_path.split('.') + + # Navigate to the parent of the target key + current = resource_data + for i, key in enumerate(keys[:-1]): + if key not in current: + # Create missing intermediate dictionaries + current[key] = {} + elif not isinstance(current[key], dict): + # Can't navigate further if intermediate value is not a dict + display.warning(f"Cannot apply override '{key_path}': '{'.'.join(keys[:i+1])}' is not a dictionary") + break + current = current[key] + else: + # Set the final value + final_key = keys[-1] + old_value = current.get(final_key, '') + current[final_key] = new_value + display.vvv(f"Applied override for {resource_kind}: {key_path}: {old_value} -> {new_value}") + + return resource_data + + +class ActionModule(ActionBase): + """ + Restore Kubernetes resources from a backup archive directory. + Automatically discovers and restores all resources found in the backup. 
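+    Resources are read from the 'resources/' subdirectory of the backup, with
+    one directory per resource kind (for example 'secrets/', 'configmaps/').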
+ + - If a resource doesn't exist, it will be created + - If a resource exists and replace_resource=True, it will be updated (replaced) + - If a resource exists and replace_resource=False, it will be skipped + + Usage Example + ------------- + tasks: + - name: "Restore and replace specific MAS Suite resources" + ibm.mas_devops.restore_resource: + backup_path: "/backup/backup-20250115-120000-suite" + resource_kinds: + - Secret + - ConfigMap + replace_resource: true + + - name: "Restore all resources (skip existing)" + ibm.mas_devops.restore_resource: + backup_path: "/backup/backup-20250115-120000-suite" + replace_resource: false + + - name: "Restore resources with overrides, filter_values and skip_files" + ibm.mas_devops.restore_resource: + backup_path: "/backup/backup-20250115-120000-suite" + resource_kinds: + - Suite + - Secret + - ConfigMap + replace_resource: true + filter_values: + Suite: + - spec.domain + - spec.clusterIssuer.name + override_values: + Suite: + - spec.domain: mydomain.com + - spec.clusterIssuer.name: bob + Secret: + - data.value: newvalue + skip_files: #skip applying these files + Secret: + - jdbc-credentials.yaml + """ + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + # Initialize DynamicClient and grab the task args + host = self._task.args.get('host', None) + api_key = self._task.args.get('api_key', None) + + dynClient = get_api_client(api_key=api_key, host=host) + + backup_path = self._task.args.get('backup_path') + replace_resource = self._task.args.get('replace_resource', True) + resource_kinds = self._task.args.get('resource_kinds', None) + override_values = self._task.args.get('override_values', None) + filter_values = self._task.args.get('filter_values', None) + skip_files = self._task.args.get('skip_files', None) + + if backup_path is None or backup_path == "": + raise AnsibleError(f"Error: backup_path argument was not provided") + + # Check if backup path exists + if not os.path.exists(backup_path): + raise AnsibleError(f"Error: backup_path does not exist: {backup_path}") + + # Check if resources directory exists + resources_path = os.path.join(backup_path, 'resources') + if not os.path.exists(resources_path): + raise AnsibleError(f"Error: resources directory not found in backup: {resources_path}") + + display.v(f"Starting restore of MAS resources from '{backup_path}'") + display.v(f"Replace existing resources: {'enabled' if replace_resource else 'disabled'}") + if override_values: + override_kinds = ', '.join(override_values.keys()) + display.v(f"Override values will be applied for resource kinds: {override_kinds}") + + if filter_values: + filter_kinds = ', '.join(filter_values.keys()) + display.v(f"Filter values will be applied for resource kinds: {filter_kinds}") + + skip_files_lower = None + if skip_files: + display.v(f"Skip files will be applied for resource kinds: {', '.join(skip_files.keys())}") + skip_files_lower = {f"{k.lower()}s": v for k, v in skip_files.items()} + + total_created = 0 + total_updated = 0 + total_skipped = 0 + total_failed = 0 + failed_resources = [] # Track failed resources with details + + # Discover all resource types in the backup + try: + resource_dirs = [d for d in os.listdir(resources_path) + if os.path.isdir(os.path.join(resources_path, d))] + except Exception as e: + raise AnsibleError(f"Error listing resource directories in {resources_path}: {e}") + + if not resource_dirs: + display.warning(f"No resource directories found in {resources_path}") + return dict( + message="No 
resources found to restore", + failed=False, + changed=False, + success=True, + created_count=0, + updated_count=0, + skipped_count=0, + failed_count=0, + failed_resources=[] + ) + + display.v(f"Found {len(resource_dirs)} resource type(s) in backup: {', '.join(sorted(resource_dirs))}") + + # Filter resource directories if specific kinds requested + if resource_kinds: + # Convert resource_kinds to lowercase directory names (add 's' suffix) + requested_dirs = set() + for kind in resource_kinds: + # Handle both singular and plural forms + dir_name = kind.lower() + if not dir_name.endswith('s'): + dir_name = dir_name + 's' + requested_dirs.add(dir_name) + + # Filter to only requested directories + resource_dirs = [d for d in resource_dirs if d in requested_dirs] + + if not resource_dirs: + display.warning(f"None of the requested resource kinds found in backup") + return dict( + message="No requested resources found to restore", + failed=False, + changed=False, + success=True, + created_count=0, + updated_count=0, + skipped_count=0, + failed_count=0, + failed_resources=[] + ) + + display.v(f"Restoring {len(resource_dirs)} requested resource type(s): {', '.join(sorted(resource_dirs))}") + + # Process each resource directory + for resource_dir in sorted(resource_dirs): + resource_dir_path = os.path.join(resources_path, resource_dir) + files_to_skip= [] + + if skip_files_lower: + files_to_skip = skip_files_lower[resource_dir] + + # Get all YAML files in this directory + try: + yaml_files = [f for f in os.listdir(resource_dir_path) if f.endswith('.yaml')] + except Exception as e: + display.warning(f"Error listing files in {resource_dir_path}: {e}") + continue + + if not yaml_files: + display.v(f"No YAML files found in {resource_dir}/") + continue + + display.v(f"Restoring {len(yaml_files)} resource(s) from {resource_dir}/") + + # Process each YAML file + for yaml_file in sorted(yaml_files): + + if yaml_file in files_to_skip: + display.v(f"Skipping {yaml_file} as it is in the skip list") + total_skipped += 1 + continue + + yaml_file_path = os.path.join(resource_dir_path, yaml_file) + + # Load the resource data + resource_data = loadYamlFile(yaml_file_path) + if not resource_data: + display.warning(f"Failed to load {yaml_file_path}") + total_failed += 1 + failed_resources.append({ + 'kind': 'Unknown', + 'name': yaml_file.replace('.yaml', ''), + 'description': f"Unknown/{yaml_file.replace('.yaml', '')}", + 'error': 'Failed to load YAML file' + }) + continue + + # Apply overrides if provided + if override_values: + resource_kind = resource_data.get('kind', 'Unknown') + resource_data = apply_overrides(resource_data, override_values, resource_kind) + + # Filter values if provided + if filter_values: + resource_kind = resource_data.get('kind', 'Unknown') + resource_data = filter_fields(resource_data, filter_values, resource_kind) + + # Restore the resource + success, resource_name, status_msg = restoreResource( + dynClient, resource_data, namespace=None, replace_resource=replace_resource + ) + + if success: + if status_msg == "updated": + # Resource was updated + total_updated += 1 + elif status_msg == "skipped": + # Resource was skipped + total_skipped += 1 + else: + # Resource was created + total_created += 1 + else: + total_failed += 1 + kind = resource_data.get('kind', 'Unknown') + failed_resources.append({ + 'kind': kind, + 'name': resource_name, + 'description': f"{kind}/{resource_name}", + 'error': status_msg + }) + + display.v(f"Progress: {total_created} created, {total_updated} updated, 
{total_skipped} skipped, {total_failed} failed") + + display.v(f"Restore complete: {total_created} resources created, {total_updated} updated, {total_skipped} skipped, {total_failed} failed") + + # Determine if the restore was successful + has_failures = total_failed > 0 + + return dict( + message=f"Restored {total_created + total_updated} MAS resources ({total_created} created, {total_updated} updated, {total_skipped} skipped)" + (f" with {total_failed} failures" if has_failures else ""), + failed=has_failures, + changed=(total_created + total_updated) > 0, + success=not has_failures, + created_count=total_created, + updated_count=total_updated, + skipped_count=total_skipped, + failed_count=total_failed, + failed_resources=failed_resources + ) diff --git a/ibm/mas_devops/plugins/action/save_sls_registration_info.py b/ibm/mas_devops/plugins/action/save_sls_registration_info.py new file mode 100644 index 0000000000..63cf044b48 --- /dev/null +++ b/ibm/mas_devops/plugins/action/save_sls_registration_info.py @@ -0,0 +1,52 @@ +#!/usr/bin/env python3 + +import logging +import yaml +import urllib3 +from ansible_collections.kubernetes.core.plugins.module_utils.k8s.client import get_api_client +from ansible.errors import AnsibleError +from ansible.plugins.action import ActionBase +import os + +from mas.devops.sls import getSLSRegistrationDetails + +urllib3.disable_warnings() # Disabling warnings will prevent InsecureRequestWarnings from dynClient +logging.basicConfig(level=logging.INFO, format='%(asctime)s %(name)-20s %(levelname)-8s %(message)s', datefmt='%Y-%m-%d %H:%M:%S') + +class ActionModule(ActionBase): + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + slsNamespace = self._task.args.get('namespace', None) + slsName = self._task.args.get('name', None) + backupPath = self._task.args.get('sls_backup_path', None) + + if slsNamespace is None: + raise AnsibleError(f"Error: slsNamespace argument was not provided") + if slsName is None: + raise AnsibleError(f"Error: slsName argument was not provided") + if backupPath is None: + raise AnsibleError(f"Error: sls_backup_dir argument was not provided") + + # Initialize DynamicClient, ensure the namespace exists, and create/update the entitlement secret + host = self._task.args.get('host', None) + api_key = self._task.args.get('api_key', None) + + dynClient = get_api_client(api_key=api_key, host=host) + + registrationDetails = getSLSRegistrationDetails(name=slsName, namespace=slsNamespace, dynClient=dynClient) + + if registrationDetails: + with open(os.path.join(backupPath, 'sls-registration.yaml'), 'w') as outfile: + yaml.dump(registrationDetails, outfile, default_flow_style=False) + return dict( + msg=f"Successfully stored SLS registration details to {os.path.join(backupPath, 'sls-registration.yml')}", + failed=False, + success=True + ) + else: + return dict( + msg=f"Couldn't get registration details from CR status of SLS {slsName} in namespace {slsNamespace}.", + failed=True, + success=False + ) diff --git a/ibm/mas_devops/plugins/action/upload_to_s3.py b/ibm/mas_devops/plugins/action/upload_to_s3.py new file mode 100644 index 0000000000..c9bd6e87d2 --- /dev/null +++ b/ibm/mas_devops/plugins/action/upload_to_s3.py @@ -0,0 +1,63 @@ +#!/usr/bin/env python3 + +import logging +import os +import urllib3 +from ansible.errors import AnsibleError +from ansible.plugins.action import ActionBase +from mas.devops.backup import uploadToS3 + +urllib3.disable_warnings() # Disabling warnings will prevent 
InsecureRequestWarnings from dynClient + +def normalize_endpoint_url(endpoint) -> str|None: + if not endpoint: + return endpoint + if not endpoint.startswith(("http://", "https://")): + return f"https://{endpoint}" + return endpoint + +class ActionModule(ActionBase): + """ + Usage Example + ------------- + tasks: + - name: "Upload to S3 location" + ibm.mas_devops.upload_to_s3: + mas_catalog_version: "{{ catalog_tag }}" + fail_if_catalog_does_not_exist: true + register: mas_catalog_metadata + """ + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + file_path = self._task.args.get('file_path', None) + bucket_name = self._task.args.get('bucket_name', None) + object_name = self._task.args.get('object_name', None) + aws_access_key_id = self._task.args.get('aws_access_key_id', None) + aws_secret_access_key = self._task.args.get('aws_secret_access_key', None) + endpoint_url = self._task.args.get('endpoint_url', None) + region_name = self._task.args.get('region_name', None) + + if file_path is None: + raise AnsibleError(f"Error: file_path argument was not provided") + if bucket_name is None: + raise AnsibleError(f"Error: bucket_name argument was not provided") + if object_name is None: + raise AnsibleError(f"Error: object_name argument was not provided") + if aws_access_key_id is None: + raise AnsibleError(f"Error: aws_access_key_id argument was not provided") + if aws_secret_access_key is None: + raise AnsibleError(f"Error: aws_secret_access_key argument was not provided") + + endpoint_url = normalize_endpoint_url(endpoint=endpoint_url) + + upload_status = uploadToS3( + file_path=file_path, bucket_name=bucket_name, object_name=object_name, + endpoint_url=endpoint_url, aws_access_key_id=aws_access_key_id, + aws_secret_access_key=aws_secret_access_key, region_name=region_name + ) + + return dict( + success=upload_status + ) + diff --git a/ibm/mas_devops/plugins/action/verify_backup_restore_vars.py b/ibm/mas_devops/plugins/action/verify_backup_restore_vars.py new file mode 100644 index 0000000000..03e7f06dac --- /dev/null +++ b/ibm/mas_devops/plugins/action/verify_backup_restore_vars.py @@ -0,0 +1,72 @@ +#!/usr/bin/env python3 + +from ansible.plugins.action import ActionBase +from ansible.errors import AnsibleError + + +class ActionModule(ActionBase): + + REQUIRED = { + "catalog": { + "backup": ["mas_backup_dir"], + "restore": ["mas_backup_dir", "ibm_catalogs_backup_version"] + }, + "certmanager": { + "backup": ["mas_backup_dir"], + "restore": ["mas_backup_dir", "certmanager_backup_version"] + }, + "db2": { + "backup": ["db2_instance_name", "db2_namespace", "mas_backup_dir", "mas_instance_id", "mas_application_id"], + "restore-instance": ["mas_backup_dir", "db2_backup_version", "mas_application_id"], + "restore-database": ["mas_backup_dir", "db2_backup_version", "db2_instance_name", "backup_vendor", "mas_application_id"], + "s3_setup": ["backup_vendor", "backup_s3_alias", "backup_s3_endpoint", "backup_s3_bucket", "backup_s3_access_key", "backup_s3_secret_key"] + }, + "grafana": { + "backup": ["mas_backup_dir"] + }, + "mongodb": { + "backup": ["mongodb_instance_name", "mas_backup_dir", "mas_instance_id"], # mongodb_instance_name has a default value + "restore": ["mas_backup_dir", "mongodb_backup_version"] + }, + "sls": { + "backup": ["mas_backup_dir", "sls_namespace", "sls_instance_name"], + "restore": ["mas_backup_dir", "sls_backup_version"] + }, + "suite": { + "backup": ["mas_instance_id", "mas_backup_dir"], + "restore": ["mas_instance_id", 
"mas_backup_dir", "suite_backup_version"] + }, + "manage": { + "backup": ["mas_instance_id", "mas_workspace_id", "mas_backup_dir"], + "restore": ["mas_instance_id", "mas_backup_dir", "mas_app_backup_version"] + } + } + + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + component = self._task.args.get('component', None) + action = self._task.args.get('action', None) + + if component not in self.REQUIRED: + raise AnsibleError(f"Unknown component '{component}'. Allowed: {list(self.REQUIRED)}") + + if action not in self.REQUIRED[component]: + raise AnsibleError(f"Unknown action '{action}' for component '{component}'. Allowed: {list(self.REQUIRED[component])}") + + missing_args = [] + for req_arg in self.REQUIRED[component][action]: + r_arg = self._task.args.get(req_arg, None) + if r_arg is None or r_arg == '': + missing_args.append(req_arg) + + if len(missing_args) > 0: + raise AnsibleError(f"Missing required arguments for component '{component}' action '{action}': {missing_args}") + else: + return dict( + changed=False, + failed=False, + msg=f"All required arguments for component '{component}' action '{action}' are provided." + ) + + \ No newline at end of file diff --git a/ibm/mas_devops/plugins/action/verify_mongoce_version.py b/ibm/mas_devops/plugins/action/verify_mongoce_version.py new file mode 100644 index 0000000000..913ced1662 --- /dev/null +++ b/ibm/mas_devops/plugins/action/verify_mongoce_version.py @@ -0,0 +1,110 @@ +#!/usr/bin/env python3 + +import urllib3 +from ansible_collections.kubernetes.core.plugins.module_utils.k8s.client import get_api_client +from ansible.plugins.action import ActionBase +from ansible.errors import AnsibleError +from ansible.utils.display import Display +from kubernetes.dynamic import DynamicClient +from kubernetes.dynamic.exceptions import NotFoundError + +from mas.devops.ocp import getCR + +# Disabling warnings will prevent InsecureRequestWarnings from dynClient +urllib3.disable_warnings() +display = Display() + +def isMongoRunning(mongoCR: dict) -> bool: + """ + Check if MongoDB Community instance is running + return True if running, else False + """ + display.v(f"Checking if MongoDB Community instance is in 'Running' state") + if 'status' in mongoCR: + if 'phase' in mongoCR['status']: + if mongoCR['status']['phase'] == 'Running': + display.v(f"MongoDB Community instance is in 'Running' state") + return True + display.v(f"MongoDB Community instance is not in 'Running' state") + return False + +def getMongoceCR(dynClient: DynamicClient, mongodb_instance_name: str, mongodb_namespace: str) -> dict: + """ + Check if MongoDB Community instance exists + return cr if exists, else return empty dict + """ + display.v(f"Checking if MongoDB Community instance '{mongodb_instance_name}' exists in namespace '{mongodb_namespace}'") + mongodbCR = getCR( + dynClient=dynClient, + cr_api_version="mongodbcommunity.mongodb.com/v1", + cr_kind="MongoDBCommunity", + cr_name=mongodb_instance_name, + namespace=mongodb_namespace + ) + if mongodbCR: + return mongodbCR.to_dict() + else: + return {} + + +class ActionModule(ActionBase): + """ + Usage Example + ------------- + tasks: + - name: "Verify Existing MongoDB version + ibm.mas_devops.verify_mongoce_version: + """ + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + # Initialize DynamicClient and grab the task args + host = self._task.args.get('host', None) + api_key = self._task.args.get('api_key', None) + + dynClient = 
get_api_client(api_key=api_key, host=host) + + mongodb_instance_name = self._task.args.get('mongodb_instance_name') + mongodb_namespace = self._task.args.get('mongodb_namespace') + + if mongodb_instance_name is None: + raise AnsibleError(f"Error: mongodb_instance_name argument was not provided") + if mongodb_namespace is None: + raise AnsibleError(f"Error: mongodb_namespace argument was not provided") + + # Check for existing MongoDb install + mongodbCR = getMongoceCR( + dynClient=dynClient, + mongodb_instance_name=mongodb_instance_name, + mongodb_namespace=mongodb_namespace + ) + + if not mongodbCR: + display.v(f"MongoDB Community instance '{mongodb_instance_name}' does NOT exist in namespace '{mongodb_namespace}'") + return dict( + message=f"MongoDB Community instance '{mongodb_instance_name}' does NOT exist in namespace '{mongodb_namespace}'", + success=True, + failed=False, + exist=False, + running=False + ) + elif isMongoRunning(mongodbCR): + display.v(f"MongoDB Community instance '{mongodb_instance_name}' is running version '{mongodbCR['spec']['version']}' in namespace '{mongodb_namespace}'") + return dict( + message=f"MongoDB Community instance '{mongodb_instance_name}' is running version '{mongodbCR['spec']['version']}' in namespace '{mongodb_namespace}'", + success=True, + failed=False, + exist=True, + running=True, + mongoce_version=mongodbCR['spec']['version'] + ) + else: + display.v(f"MongoDB Community instance '{mongodb_instance_name}' is NOT in 'Running' state in namespace '{mongodb_namespace}'") + return dict( + message=f"MongoDB Community instance '{mongodb_instance_name}' is NOT in 'Running' state in namespace '{mongodb_namespace}'", + success=True, + failed=False, + exist=True, + running=False, + mongoce_version=mongodbCR['spec']['version'] + ) diff --git a/ibm/mas_devops/plugins/action/verify_storage_class.py b/ibm/mas_devops/plugins/action/verify_storage_class.py new file mode 100644 index 0000000000..ef2e88dfb9 --- /dev/null +++ b/ibm/mas_devops/plugins/action/verify_storage_class.py @@ -0,0 +1,51 @@ +#!/usr/bin/env python3 + +import urllib3 +from ansible_collections.kubernetes.core.plugins.module_utils.k8s.client import get_api_client +from ansible.plugins.action import ActionBase + +from mas.devops.ocp import getStorageClass + + +# Disabling warnings will prevent InsecureRequestWarnings from dynClient +urllib3.disable_warnings() + + +class ActionModule(ActionBase): + """ + Usage Example + ------------- + tasks: + - name: "Verify if Storage class exist + ibm.mas_devops.verify_storage_class: + """ + def run(self, tmp=None, task_vars=None): + super(ActionModule, self).run(tmp, task_vars) + + # Initialize DynamicClient and grab the task args + host = self._task.args.get('host', None) + api_key = self._task.args.get('api_key', None) + + dynClient = get_api_client(api_key=api_key, host=host) + storageClass = self._task.args['storage_class'] + sc = getStorageClass(dynClient, storageClass) + + # We don't want to fail if we can't find the specific storage class, doing so will + # result in roles/playbooks failing in environments where none of the default + # storage classes are available. We use the success=false to track when we couldn't + # find a default storage class, which does not trigger Ansible treating the action as + # failed. 
+ if sc is None: + return dict( + message=f"Failed to find {storageClass} storage class in cluster", + success=False, + failed=True, + name=storageClass + ) + + return dict( + message=f"Successfully found {storageClass} storage class in cluster", + success=True, + failed=False, + name=storageClass + ) diff --git a/ibm/mas_devops/plugins/filter/filters.py b/ibm/mas_devops/plugins/filter/filters.py index cbe33e3daa..e827e37b4f 100644 --- a/ibm/mas_devops/plugins/filter/filters.py +++ b/ibm/mas_devops/plugins/filter/filters.py @@ -1,6 +1,6 @@ import yaml import re - +import datetime def private_vlan(vlans): """ @@ -474,6 +474,65 @@ def get_default_upgrade_channel(current: str, valid_paths: dict) -> str: print(f'Error: channel upgrade compatibility matrix is incorrectly defined') return default +def set_storage_classes_names(storage_list: list, storage_class_name_rwo: str, storage_class_name_rwx: str): + """ + Iterate through the storage_list list and set the storage_class_name for each storage item based on the access mode. + Expects data to be + storage: + - name: meta + spec: + accessModes: + - ReadWriteMany + resources: + requests: + storage: 20Gi + storageClassName: nfs-client + type: create + - name: backup + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 20Gi + storageClassName: nfs-client + type: create + """ + + for storage_item in storage_list: + if 'spec' in storage_item and 'accessModes' in storage_item['spec'] and 'storageClassName' in storage_item['spec']: + if storage_item['spec']['accessModes'][0] == 'ReadWriteMany': + storage_item['spec']['storageClassName'] = storage_class_name_rwx + else: + storage_item['spec']['storageClassName'] = storage_class_name_rwo + return storage_list + +def override_manage_persistent_volumes(volumes_list: list, storage_class_name_rwo: str, storage_class_name_rwx: str): + """ + Iterate through the volumes_list list and set the storage_class_name for each storage item based on the access mode. 
+ Expects data to be + storage: + - pvcName: manage-imagestitching + accessModes: + - ReadWriteMany + size: 20Gi + storageClassName: nfs-client + - pvcName: manage-2 + accessModes: + - ReadWriteMany + size: 20Gi + storageClassName: nfs-client + """ + + for volume_item in volumes_list: + if 'accessModes' in volume_item and 'storageClassName' in volume_item: + if volume_item['accessModes'][0] == 'ReadWriteMany': + volume_item['storageClassName'] = storage_class_name_rwx + else: + volume_item['storageClassName'] = storage_class_name_rwo + return volumes_list + + class FilterModule(object): def filters(self): return { @@ -497,5 +556,7 @@ def filters(self): 'get_db2_instance_name': get_db2_instance_name, 'get_ecr_repositories': get_ecr_repositories, 'is_channel_upgrade_path_valid': is_channel_upgrade_path_valid, - 'get_default_upgrade_channel': get_default_upgrade_channel + 'get_default_upgrade_channel': get_default_upgrade_channel, + 'set_storage_classes_names': set_storage_classes_names, + 'override_manage_persistent_volumes': override_manage_persistent_volumes } diff --git a/ibm/mas_devops/plugins/module_utils/__init__.py b/ibm/mas_devops/plugins/module_utils/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/ibm/mas_devops/plugins/module_utils/backuprestore.py b/ibm/mas_devops/plugins/module_utils/backuprestore.py new file mode 100644 index 0000000000..5e1f5f5983 --- /dev/null +++ b/ibm/mas_devops/plugins/module_utils/backuprestore.py @@ -0,0 +1,43 @@ +#!/usr/bin/env python3 + +from mas.devops.ocp import getSecret +from kubernetes.dynamic import DynamicClient +from kubernetes.dynamic.exceptions import NotFoundError + +from mas.devops.ocp import getCR + +def getDb2VersionFromCR(db2uCR: dict) -> str: + """ + Extract Db2 version from Db2uCluster/Db2uInstance CR + """ + try: + if 'spec' in db2uCR and 'version' in db2uCR['spec']: + return db2uCR['spec']['version'] + return None + except Exception as e: + return None + +def getDb2uInstance(dynClient: DynamicClient, db2_instance_name: str, db2_namespace: str): + """ + Retrieve Db2uCluster CR instance + """ + + db2uCR = getCR(dynClient, cr_api_version="db2u.databases.ibm.com/v1", cr_kind="Db2uCluster", cr_name=db2_instance_name, namespace=db2_namespace) + if db2uCR: + return db2uCR.to_dict() + else: + # Db2uCluster CR not found, try Db2uInstance + db2uCR = getCR(dynClient, cr_api_version="db2u.databases.ibm.com/v1", cr_kind="Db2uInstance", cr_name=db2_instance_name, namespace=db2_namespace) + if db2uCR: + return db2uCR.to_dict() + return None + +def isDb2uReady(db2uCR: dict) -> bool: + """ + Check if Db2uCluster/Db2uInstance is in ready state + """ + if 'status' in db2uCR: + status = db2uCR['status'] + if 'state' in status and status['state'] == 'Ready': + return True + return False \ No newline at end of file diff --git a/ibm/mas_devops/roles/cert_manager/README.md b/ibm/mas_devops/roles/cert_manager/README.md index b209ac825e..068291dac8 100644 --- a/ibm/mas_devops/roles/cert_manager/README.md +++ b/ibm/mas_devops/roles/cert_manager/README.md @@ -23,19 +23,154 @@ Specifies which operation to perform on the Certificate Manager operator. 
**When to use**: - Use `install` (default) for initial deployment or to ensure cert-manager is present - Use `uninstall` to remove cert-manager (use with extreme caution) +- Use `backup` to backup the cert-manager installation resources to an archive +- Use `restore` to restore the cert-manager installation resources from an archive created from the `backup` action - Use `none` to skip cert-manager operations while running broader playbooks -**Valid values**: `install`, `uninstall`, `none` +**Valid values**: `install`, `uninstall`, `backup`, `restore`, `none` **Impact**: - `install`: Deploys Red Hat Certificate Manager Operator to `cert-manager-operator` namespace and creates operand in `cert-manager` namespace - `uninstall`: Removes cert-manager operator and operand (destructive operation) +- `backup`: Stores the resources used for the installation (not certificate or secrets) in an archive location +- `restore`: Restores the resources used for the installation (not certificate or secrets) from an archive location - `none`: Role takes no action **Related variables**: None **Note**: **WARNING** - Certificate Manager is a cluster-wide dependency used by MAS, SLS, and other components. Uninstalling it will break certificate management for all dependent applications. Only use `uninstall` if you are certain no applications depend on it. +### Backup and Restore Variables + +#### mas_backup_dir +Directory path where Certificate Manager backup files will be stored. + +- **Required** for backup and restore operations +- Environment Variable: `MAS_BACKUP_DIR` +- Default: None + +**Purpose**: Specifies the local filesystem directory where backup archives will be created (for backup) or read from (for restore). This directory serves as the central location for all Certificate Manager backup data. + +**When to use**: +- Required when `cert_manager_action` is set to `backup` or `restore` +- Should be a persistent location with sufficient storage space +- Ensure the directory is accessible and has appropriate permissions + +**Valid values**: Any valid local filesystem path (e.g., `/backup/mas`, `/home/user/certmanager-backups`) + +**Impact**: All backup files and metadata will be stored in subdirectories under this path. The backup creates a timestamped directory structure: `{mas_backup_dir}/backup-{version}-certmanager/` + +**Related variables**: Works with `certmanager_backup_version` to create unique backup directories. + +**Note**: Ensure this directory has sufficient space for backup data and is regularly backed up to external storage for disaster recovery. + +#### certmanager_backup_version +Version identifier for the backup, used to create unique backup directories. + +- **Optional** for backup (auto-generated if not provided) +- **Required** for restore +- Environment Variable: `CERTMANAGER_BACKUP_VERSION` +- Default: Auto-generated timestamp in format `YYYYMMDD-HHMMSS` + +**Purpose**: Provides a unique identifier for each backup, allowing multiple backups to coexist and enabling point-in-time restore operations. + +**When to use**: +- For backup: Leave unset to auto-generate a timestamp-based version, or provide a custom identifier +- For restore: Must specify the exact version identifier of the backup to restore + +**Valid values**: Any string suitable for directory names (alphanumeric, hyphens, underscores). 
Auto-generated format: `YYYYMMDD-HHMMSS` (e.g., `20260122-131500`) + +**Impact**: +- For backup: Creates directory `{mas_backup_dir}/backup-{version}-certmanager/` +- For restore: Looks for backup in `{mas_backup_dir}/backup-{version}-certmanager/` + +**Related variables**: Works with `mas_backup_dir` to determine backup location. + +**Note**: When restoring, you must know the exact backup version identifier. List the contents of `mas_backup_dir` to see available backups. + +## Backup and Restore Operations +------------------------------------------------------------------------------- + +This section provides comprehensive information about Certificate Manager backup and restore operations. + +### Action Comparison + +| Action | Purpose | Instance Resources | Prerequisites | Use Case | +|--------|---------|-------------------|---------------|----------| +| `backup` | Create backup | Yes (operator and operand resources) | Running Certificate Manager instance | Regular backups, disaster recovery preparation | +| `restore` | Full restore | Yes (recreates operator and operand) | Backup archive | Disaster recovery, cluster migration, complete restoration | + +### Backup Process + +The Certificate Manager backup operation creates a backup of your Certificate Manager installation resources: + +1. **Operator Resources**: Backs up Kubernetes resources including: + - Projects/Namespaces (`cert-manager-operator` and `cert-manager`) + - Subscription (`openshift-cert-manager-operator`) + - OperatorGroup +2. **Auto-discovered Secrets**: Any secrets referenced by the backed-up resources are automatically discovered and included + +**Note**: The backup does NOT include individual certificates or secrets created by Certificate Manager for applications. Those are backed up as part of the specific service (e.g., MongoDB, SLS) that uses them. + +**Backup Directory Structure:** +``` +{mas_backup_dir}/ +└── backup-{version}-certmanager/ + └── resources/ + ├── projects/ + ├── subscriptions/ + ├── operatorgroups/ + └── secrets/ +``` + +### Restore Process + +The Certificate Manager restore operation performs a complete restoration of the Certificate Manager operator and operand: + +**Steps:** +1. Validates backup files and required variables +2. Restores Projects/Namespaces +3. Restores OperatorGroups +4. Restores Subscriptions (triggers operator installation) +5. Waits for cert-manager-operator-controller-manager deployment to be ready (up to 30 minutes) +6. Waits for CertManager cluster Custom Resource to be created (up to 5 minutes) +7. 
Waits for cert-manager-webhook deployment to be ready (up to 30 minutes) + +**When to use:** +- Disaster recovery scenarios +- Migrating Certificate Manager to a new cluster +- Recreating a deleted Certificate Manager instance +- Setting up a new environment from backup + +### Important Considerations + +**Version Compatibility:** +- Target Certificate Manager version should match the backup version +- Version upgrades should be performed separately, not during restore +- The restore process validates version compatibility before proceeding + +**Storage Requirements:** +- Ensure sufficient storage in the backup directory +- Backup directory structure: `{mas_backup_dir}/backup-{version}-certmanager/` +- Monitor disk space during backup operations + +**Security:** +- Backup files contain operator configuration and auto-discovered secrets +- Secure backup directory with appropriate permissions (chmod 700 recommended) +- Consider encrypting backups for long-term storage +- Restrict access to backup files to authorized personnel only + +### Backup and Restore Best Practices + +1. **Regular Backups**: Schedule automated backups at regular intervals, especially before upgrades +2. **Test Restores**: Periodically test restore procedures in non-production environments +3. **Monitor Operations**: Implement monitoring and alerting for backup failures +4. **Backup Validation**: Verify backup integrity after completion +5. **Retention Policy**: Implement and document backup retention policies +6. **Disaster Recovery**: Include Certificate Manager backup/restore in your DR plan +7. **Coordinate with Services**: Coordinate Certificate Manager backups with dependent service backups + + ## Example Playbook After installing the Ansible Collection you can include this role in your own custom playbooks. 
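For reference, a minimal sketch of a playbook exercising the new `backup` and `restore` actions is shown below; it only uses the variables documented above (`cert_manager_action`, `mas_backup_dir`, `certmanager_backup_version`), and the directory and version values are illustrative assumptions rather than role defaults.

```yaml
# Back up the cert-manager installation resources (illustrative values)
- hosts: localhost
  any_errors_fatal: true
  vars:
    cert_manager_action: backup
    mas_backup_dir: /backup/mas
  roles:
    - ibm.mas_devops.cert_manager

# Restore from a previously created backup version
- hosts: localhost
  any_errors_fatal: true
  vars:
    cert_manager_action: restore
    mas_backup_dir: /backup/mas
    certmanager_backup_version: 20260122-131500
  roles:
    - ibm.mas_devops.cert_manager
```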
diff --git a/ibm/mas_devops/roles/cert_manager/defaults/main.yml b/ibm/mas_devops/roles/cert_manager/defaults/main.yml index 826b47d211..af7c0fca42 100644 --- a/ibm/mas_devops/roles/cert_manager/defaults/main.yml +++ b/ibm/mas_devops/roles/cert_manager/defaults/main.yml @@ -5,3 +5,7 @@ cert_manager_action: "{{ lookup('env', 'CERT_MANAGER_ACTION') | default('install cert_manager_operator_namespace: "cert-manager-operator" cert_manager_namespace: "cert-manager" cert_manager_channel: "{{ lookup('env', 'REDHAT_CERT_MANAGER_CHANNEL') | default('stable-v1', true) }}" + +# Backup and restore variables +certmanager_backup_version: "{{ lookup('env', 'CERTMANAGER_BACKUP_VERSION') }}" +mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" diff --git a/ibm/mas_devops/roles/cert_manager/tasks/provider/redhat/backup.yml b/ibm/mas_devops/roles/cert_manager/tasks/provider/redhat/backup.yml new file mode 100644 index 0000000000..1ac51474fc --- /dev/null +++ b/ibm/mas_devops/roles/cert_manager/tasks/provider/redhat/backup.yml @@ -0,0 +1,75 @@ +--- +- name: "Fail if required variables for Redhat cert-manager backup are not provided" + ibm.mas_devops.verify_backup_restore_vars: + mas_backup_dir: "{{ mas_backup_dir }}" + action: "backup" + component: "certmanager" + +- name: "Check if CERTMANAGER_BACKUP_VERSION is provided; if not, set to default 'YYYYMMDD-HHMMSS' format" + set_fact: + certmanager_backup_version: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}" + when: certmanager_backup_version is not defined or certmanager_backup_version == "" or certmanager_backup_version == "None" + +- name: "Set fact: cert-manager backup base directory path" + set_fact: + certmgr_backup_path: "{{ mas_backup_dir }}/backup-{{ certmanager_backup_version }}-certmanager" + +- name: "Set fact: cert-manager backup resources" + set_fact: + certmanager_backup_resources: + - namespace: "{{ cert_manager_operator_namespace }}" + resources: + # Projects + - kind: Project + api_version: project.openshift.io/v1 + name: "{{ cert_manager_namespace }}" + - kind: Project + api_version: project.openshift.io/v1 + name: "{{ cert_manager_operator_namespace }}" + # cert-manager operator subscription + - kind: Subscription + api_version: operators.coreos.com/v1alpha1 + name: openshift-cert-manager-operator + # operator group + - kind: OperatorGroup + api_version: operators.coreos.com/v1 + name: operatorgroup + + +# Call the backup_resources plugin to execute the backup to the path provided +# ----------------------------------------------------------------------------- +- name: "Backup cert-manager resources (referenced secrets are auto-discovered)" + ibm.mas_devops.backup_resource: + backup_resources: "{{ certmanager_backup_resources }}" + backup_path: "{{ certmgr_backup_path }}" + register: backup_result + +# Show the results +# ----------------------------------------------------------------------------- +- name: "Display backup results" + debug: + msg: + - "Backup completed{{ ' with failures' if backup_result.failed_count > 0 else ' successfully' }}" + - "Total resources backed up: {{ backup_result.backed_up_count }}" + - "Total resources failed: {{ backup_result.failed_count }}" + - "Resources not found: {{ backup_result.not_found_count }}" + - "Secrets auto-discovered: {{ backup_result.discovered_secrets_count }}" + - "Backup location: {{ certmgr_backup_path }}" + +# Fail task if any errors occurred.
+# ----------------------------------------------------------------------------- +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ backup_result.failed_resources | to_nice_yaml }}" + when: backup_result.failed_count > 0 + +- name: "Fail if backup had errors" + fail: + msg: | + Backup failed for {{ backup_result.failed_count }} resource(s): + {% for resource in backup_result.failed_resources %} + - {{ resource.description }} in {{ resource.scope }} + {% endfor %} + when: backup_result.failed_count > 0 diff --git a/ibm/mas_devops/roles/cert_manager/tasks/provider/redhat/restore.yml b/ibm/mas_devops/roles/cert_manager/tasks/provider/redhat/restore.yml new file mode 100644 index 0000000000..e8266b2218 --- /dev/null +++ b/ibm/mas_devops/roles/cert_manager/tasks/provider/redhat/restore.yml @@ -0,0 +1,178 @@ +--- +- name: "Fail if required variables for Redhat cert-manager restore are not provided" + ibm.mas_devops.verify_backup_restore_vars: + mas_backup_dir: "{{ mas_backup_dir }}" + certmanager_backup_version: "{{ certmanager_backup_version }}" + action: "restore" + component: "certmanager" + +- name: "Set fact: cert-manager backup base directory path" + set_fact: + certmgr_backup_path: "{{ mas_backup_dir }}/backup-{{ certmanager_backup_version }}-certmanager" + certmgr_backup_resource_path: "{{ mas_backup_dir }}/backup-{{ certmanager_backup_version }}-certmanager/resources" + +- name: "Check cert-manager backup resource path exists" + stat: + path: "{{ certmgr_backup_resource_path }}" + register: resources_backup_path_stat + +- name: "Fail if backup archive does not exist" + fail: + msg: "Cert-manager resources archive not found at: {{ certmgr_backup_resource_path }}" + when: not resources_backup_path_stat.stat.exists or not resources_backup_path_stat.stat.isdir + +- name: "Cert-manager restore information" + debug: + msg: + - "Backup Version ................. {{ certmanager_backup_version }}" + - "Backup Path .................... 
{{ certmgr_backup_path }}" + +# Restore Projects +# ------------------------------------------------------------------------- +- name: "Restore Projects" + ibm.mas_devops.restore_resource: + backup_path: "{{ certmgr_backup_path }}" + resource_kinds: + - Project + register: project_result + +# Restore OperatorGroups and Subscriptions +# ------------------------------------------------------------------------- +- name: "Restore OperatorGroups" + ibm.mas_devops.restore_resource: + backup_path: "{{ certmgr_backup_path }}" + resource_kinds: + - OperatorGroup + register: operatorgroups_result + when: project_result.success + +- name: "Restore Subscriptions" + ibm.mas_devops.restore_resource: + backup_path: "{{ certmgr_backup_path }}" + resource_kinds: + - Subscription + register: subscriptions_result + when: operatorgroups_result.success + +# Wait for Subscription to be processed +# ----------------------------------------------------------------------------- +- name: "Wait for Red Hat cert-manager-operator-controller-manager to be ready (60s delay)" + kubernetes.core.k8s_info: + api_version: apps/v1 + name: cert-manager-operator-controller-manager + namespace: "{{ cert_manager_operator_namespace }}" + kind: Deployment + register: certmanager_deployment + until: + - certmanager_deployment.resources is defined + - certmanager_deployment.resources | length > 0 + - certmanager_deployment.resources[0].status is defined + - certmanager_deployment.resources[0].status.replicas is defined + - certmanager_deployment.resources[0].status.readyReplicas is defined + - certmanager_deployment.resources[0].status.readyReplicas == certmanager_deployment.resources[0].status.replicas + retries: 30 # Approximately 1/2 hour before we give up + delay: 60 # 1 minute + when: subscriptions_result.success + +# Wait for CertManager instance to be created +# ----------------------------------------------------------------------------- +- name: "Wait for CertManager Cluster Custom Resource to be created" + kubernetes.core.k8s_info: + api_version: operator.openshift.io/v1alpha1 + name: cluster + kind: CertManager + register: certmanager_cluster_cr + until: + - certmanager_cluster_cr.resources is defined + - certmanager_cluster_cr.resources | length > 0 + retries: 10 # Approximately 5 minutes before we give up + delay: 30 # 30 seconds + when: subscriptions_result.success + + +# Wait for Cert Manager's webhook to be ready +# ----------------------------------------------------------------------------- +- name: "Wait for cert-manager-webhook deployment to be ready (60s delay)" + kubernetes.core.k8s_info: + api_version: apps/v1 + name: cert-manager-webhook + namespace: "{{ cert_manager_namespace }}" + kind: Deployment + register: certmanager_webhook_deployment + until: + - certmanager_webhook_deployment.resources is defined + - certmanager_webhook_deployment.resources | length > 0 + - certmanager_webhook_deployment.resources[0].status is defined + - certmanager_webhook_deployment.resources[0].status.replicas is defined + - certmanager_webhook_deployment.resources[0].status.readyReplicas is defined + - certmanager_webhook_deployment.resources[0].status.readyReplicas == certmanager_webhook_deployment.resources[0].status.replicas + retries: 60 # Approximately 1/2 hour before we give up + delay: 60 # 1 minute + when: subscriptions_result.success + +# Calculate total results +# ----------------------------------------------------------------------------- +- name: "Calculate total restore results" + set_fact: + total_created: >- + {{ 
+ (project_result.created_count | default(0)) + + (operatorgroups_result.created_count | default(0)) + + (subscriptions_result.created_count | default(0)) + }} + total_updated: >- + {{ + (project_result.updated_count | default(0)) + + (operatorgroups_result.updated_count | default(0)) + + (subscriptions_result.updated_count | default(0)) + }} + total_skipped: >- + {{ + (project_result.skipped_count | default(0)) + + (operatorgroups_result.skipped_count | default(0)) + + (subscriptions_result.skipped_count | default(0)) + }} + total_failed: >- + {{ + (project_result.failed_count | default(0)) + + (operatorgroups_result.failed_count | default(0)) + + (subscriptions_result.failed_count | default(0)) + }} + +- name: "Display total restore results" + debug: + msg: + - >- + Restore completed{{ ' with failures' if total_failed | int > 0 + else ' successfully' }} + - "Total resources created: {{ total_created }}" + - "Total resources updated: {{ total_updated }}" + - "Total resources skipped: {{ total_skipped }}" + - "Total resources failed: {{ total_failed }}" + +# Fail task if any errors occurred +# ----------------------------------------------------------------------------- +- name: "Collect all failed resources" + set_fact: + all_failed_resources: >- + {{ + (project_result.failed_resources | default([])) + + (operatorgroups_result.failed_resources | default([])) + + (subscriptions_result.failed_resources | default([])) + }} + +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ all_failed_resources | to_nice_yaml }}" + when: total_failed | int > 0 + +- name: "Fail if restore had errors" + fail: + msg: | + Restore failed for {{ total_failed }} resource(s): + {% for resource in all_failed_resources %} + - {{ resource.description }}: {{ resource.error }} + {% endfor %} + when: total_failed | int > 0 diff --git a/ibm/mas_devops/roles/db2/README.md b/ibm/mas_devops/roles/db2/README.md index 333bb3009f..531b85ea5e 100644 --- a/ibm/mas_devops/roles/db2/README.md +++ b/ibm/mas_devops/roles/db2/README.md @@ -101,20 +101,24 @@ Specifies which operation to perform on the Db2 database. **When to use**: - Use `install` (default) for initial Db2 deployment - Use `upgrade` to upgrade all Db2 instances in the namespace to a new version -- Use `backup` to create a backup of Db2 data -- Use `restore` to restore Db2 from a backup +- Use `backup` to create a backup of Db2 instance and/or database +- Use `restore` to restore a backup of db2 instance and database +- Use `restore-database` to restore only the database to an existing Db2 instance -**Valid values**: `install`, `upgrade`, `backup`, `restore` +**Valid values**: `install`, `upgrade`, `backup`, `restore`, `restore-database` -**Impact**: -- `install`: Creates new Db2 operator and instance +**Impact**: +- `install`: Creates new Db2 operator and instance. 
When `db2_backup_version` is provided, installs from backup (instance + database) - `upgrade`: Upgrades ALL instances in `db2_namespace` to `db2_version` (affects all instances in namespace) -- `backup`: Creates backup of Db2 data -- `restore`: Restores Db2 from backup +- `backup`: Creates backup of Db2 instance resources and/or database data +- `restore`: Creates new Db2 operator and instance from the backup of Db2 instance resources and restores database data to the created instance +- `restore-database`: Restores database to an existing running Db2 instance (does not restore instance resources) **Related variables**: - `db2_version`: Required for upgrade action to specify target version - `db2_namespace`: All instances in this namespace are affected by upgrade +- `db2_backup_version`: Required for restore/restore-database action; optional for backup, defaults to YYYYMMDD-HHMMSS +- `override_storageclass`: In Restore, controls whether storage classes are overridden **Note**: **WARNING** - When using `upgrade`, ALL Db2 instances in the specified namespace will be upgraded. Plan accordingly and ensure `db2_version` matches the operator channel. @@ -991,126 +995,778 @@ This is only used when both `mas_config_dir` and `mas_instance_id` are set, and - Environment Variable: `'MAS_APP_ID` - Default: None +Role Variables - Backup and Restore +------------------------------------------------------------------------------- + +### mas_instance_id +MAS instance identifier for the backup/restore operation. + +- **Required** for backup and restore operations +- Environment Variable: `MAS_INSTANCE_ID` +- Default: None + +**Purpose**: Identifies the MAS instance associated with the Db2 backup. Used for organizing backups and ensuring restore operations target the correct instance. + +**When to use**: +- Always required when performing backup or restore operations +- Must match the MAS instance ID that uses this Db2 instance + +**Valid values**: Valid MAS instance ID (e.g., `inst1`, `masinst1`) + +**Example**: `masinst1` + +### mas_application_id +MAS application identifier for the backup/restore operation. + +- **Required** for backup and restore operations +- Environment Variable: `MAS_APP_ID` +- Default: None + +**Purpose**: Identifies the MAS application (e.g., manage, iot) that uses this Db2 database. Used for organizing backups and database-specific operations. + +**When to use**: +- Always required when performing backup or restore operations +- Must match the MAS application that uses this Db2 database + +**Valid values**: Valid MAS application ID (e.g., `manage`, `iot`, `monitor`) + +**Example**: `manage` + +### mas_backup_dir +Local directory path where backups will be stored or restored from. + +- **Required** for backup and restore operations +- Environment Variable: `MAS_BACKUP_DIR` +- Default: None + +**Purpose**: Specifies the local filesystem directory for storing Db2 backup files and metadata. This directory serves as the staging area for all backup and restore operations. 
+ +**When to use**: +- Always required when performing backup or restore operations +- Must be accessible from the system running the Ansible playbook +- Should have sufficient disk space for database backups + +**Valid values**: Any valid local filesystem path (e.g., `/tmp/mas_backups`, `/backup/db2`) + +**Impact**: +- Backup files and metadata are stored in subdirectories under this path +- Directory structure: `/backup--db2u-/` +- Insufficient space will cause backup failures + +**Related variables**: +- `db2_backup_version`: Used to create versioned backup subdirectories +- `backup_vendor`: When set to `s3`, database backups go to S3 but instance resources still use this directory + +**Example**: `/tmp/masbr` + +### db2_backup_version +The backup version timestamp identifier for backup and restore operations. + +- **Required** for `restore` and `restore-database` actions +- **Auto-generated** for backup operations +- Environment Variable: `DB2_BACKUP_VERSION` +- Default: Auto-generated in format `YYYYMMDD-HHMMSS` -## Role Variables - Backup and Restore -#### masbr_confirm_cluster -Set `true` or `false` to indicate the role whether to confirm the currently connected cluster before running the backup or restore job. +**Purpose**: Uniquely identifies a specific backup version using a timestamp. This allows multiple backups to coexist and enables point-in-time restore operations. + +**When to use**: +- Automatically generated during backup (no need to set manually) +- Must be specified when restoring to identify which backup to use +- Must be specified when installing Db2 from an existing backup + +**Valid values**: Timestamp string in format `YYYYMMDD-HHMMSS` (e.g., `20251212-021316` for December 12, 2025 at 02:13:16) + +**Impact**: +- Determines the backup directory name: `backup--db2u-` +- Used to locate backup files during restore operations +- Recorded in backup metadata file for verification + +**Related variables**: +- `mas_backup_dir`: Parent directory containing versioned backups +- `db2_action`: Required when action is `restore-database` or `restore`(instance & database) + +**Example**: `20251212-021316` + +### override_storageclass +Controls whether to override storage classes during Db2 installation from backup. +Only used in Db2 instance restore. - **Optional** -- Environment Variable: `MASBR_CONFIRM_CLUSTER` +- Environment Variable: `OVERRIDE_STORAGECLASS` - Default: `false` -#### masbr_copy_timeout_sec -Set the transfer files timeout in seconds. +**Purpose**: Allows changing storage classes when restoring Db2 to a different cluster or when the original storage classes are not available. When enabled, uses specified storage class variables or cluster defaults instead of backup metadata values. -- Optional -- Environment Variable: `MASBR_COPY_TIMEOUT_SEC` -- Default: `43200` (12 hours) +**When to use**: +- Set to `true` when restoring to a cluster with different storage classes +- Set to `true` when original storage classes are not available in target cluster +- Leave as `false` to use the same storage classes as the original instance -#### masbr_job_timezone -Set the [time zone](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) for creating scheduled backup job. If not set a value for this variable, this role will use UTC time zone when creating a CronJob for running scheduled backup job. 
+**Valid values**: `true`, `false` + +**Impact**: +- When `true`: Uses `CUSTOM_STORAGE_CLASS_RWO` and `CUSTOM_STORAGE_CLASS_RWX` if set, otherwise uses cluster default storage classes +- When `false`: Uses storage classes from backup metadata (original instance configuration) + +**Related variables**: +- `custom_storage_class_rwo`: Override for ReadWriteOnce storage (applies to data, logs, temp, archivelogs, audit_logs) +- `custom_storage_class_rwx`: Override for ReadWriteMany storage (applies to meta, backup) + +### custom_storage_class_rwo +Custom ReadWriteOnce storage class for Db2 restore operations. - **Optional** -- Environment Variable: `MASBR_JOB_TIMEZONE` +- Environment Variable: `CUSTOM_STORAGE_CLASS_RWO` - Default: None -#### masbr_storage_local_folder -Set local path to save the backup files. +**Purpose**: Provides a single storage class override for all ReadWriteOnce (RWO) PVCs during restore. This simplifies storage class configuration when all RWO volumes can use the same storage class. -- **Required** -- Environment Variable: `MASBR_STORAGE_LOCAL_FOLDER` +**When to use**: +- Set when `override_storageclass` is `true` and you want to use the same storage class for all RWO volumes +- Applies to: data, logs, temp, archivelogs, and audit_logs storage + +**Valid values**: Valid storage class name available in the target cluster + +**Impact**: When set, overrides the storage class for all RWO PVCs unless specific DB2 storage class variables are also set (which take precedence) + +**Example**: `ocs-storagecluster-ceph-rbd` + +### custom_storage_class_rwx +Custom ReadWriteMany storage class for Db2 restore operations. + +- **Optional** +- Environment Variable: `CUSTOM_STORAGE_CLASS_RWX` - Default: None -#### masbr_backup_type -Set `full` or `incr` to indicate the role to create a full backup or incremental backup. +**Purpose**: Provides a single storage class override for all ReadWriteMany (RWX) PVCs during restore. This simplifies storage class configuration when all RWX volumes can use the same storage class. -- Optional -- Environment Variable: `MASBR_BACKUP_TYPE` -- Default: `full` +**When to use**: +- Set when `override_storageclass` is `true` and you want to use the same storage class for all RWX volumes +- Applies to: meta and backup storage -#### masbr_backup_from_version -Set the full backup version to use in the incremental backup, this will be in the format of a `YYYMMDDHHMMSS` timestamp (e.g. `20240621021316`). This variable is only valid when `MASBR_BACKUP_TYPE=incr`. If not set a value for this variable, this role will try to find the latest full backup version from the specified storage location. +**Valid values**: Valid storage class name available in the target cluster + +**Impact**: When set, overrides the storage class for all RWX PVCs unless specific DB2 storage class variables are also set (which take precedence) + +**Example**: `ocs-storagecluster-cephfs` + +### backup_type +Type of backup operation to perform on the Db2 database. + +- **Optional** +- Environment Variable: `DB2_BACKUP_TYPE` +- Default: `online` + +**Purpose**: Determines whether the database remains available during backup. Online backups allow continued database access but may impact performance, while offline backups require downtime but complete faster. 
+ +**When to use**: +- Use `online` (default) for production systems requiring high availability +- Use `offline` when downtime is acceptable and faster backup is desired +- **Must use `offline`** if circular logging is enabled (`LOGARCHMETH1: OFF` and/or `LOGARCHMETH2: OFF`) + +**Valid values**: `online`, `offline` + +**Impact**: +- `online`: Database remains accessible during backup; requires archive logging enabled; may impact performance +- `offline`: Database is unavailable during backup; faster completion; works with circular logging + +**Related variables**: +- `db2_database_db_config`: Check `LOGARCHMETH1` and `LOGARCHMETH2` settings to determine if online backup is supported + +**Important**: If your Db2 instance has circular logging enabled (default configuration), you can only use `offline` backup type. If archive logging is enabled, you can use either type. + +### backup_vendor +Storage backend for database backup files only. - **Optional** -- Environment Variable: `MASBR_BACKUP_FROM_VERSION` +- Environment Variable: `BACKUP_VENDOR` +- Default: `disk` + +**Purpose**: Determines where database backup files are stored. Disk storage keeps backups locally, while S3 storage sends them directly to S3-compatible object storage. + +**When to use**: +- Use `disk` (default) for local backups or when S3 is not available +- Use `s3` for cloud-based backups, long-term retention, or disaster recovery scenarios + +**Valid values**: `disk`, `s3` + +**Impact**: +- `disk`: Database Backup files stored locally and copied to `mas_backup_dir`; requires sufficient local storage +- `s3`: Database backup sent directly to S3 bucket; instance resources still stored locally; requires S3 credentials + +**Related variables**: +- When `s3`: Requires `backup_s3_endpoint`, `backup_s3_bucket`, `backup_s3_access_key`, `backup_s3_secret_key` +- `mas_backup_dir`: Always required for metadata and instance resources + +**Note**: Instance resources (secrets, certificates, CRs) are always stored locally in `mas_backup_dir`, regardless of vendor setting. Only database backup files go to S3. + +**Purpose**: Determines if Kubernetes resources (secrets, certificates, Db2uCluster CR, etc.) are backed up along with the database. When `false`, enables full disaster recovery by backing up both instance configuration and data. + +**When to use**: +- Set to `false` when you need complete disaster recovery capability (instance + database) +- Set to `false` when migrating Db2 to a new cluster +- Leave as `true` (default) for database-only backups when instance already exists + +**Valid values**: `true`, `false` + +**Impact**: +- `true`: Only database data is backed up; faster backup; requires existing Db2 instance for restore +- `false`: Both instance resources and database are backed up; enables full recovery; allows install from backup + +**Note**: Instance resources include: Db2uCluster CR, secrets (passwords, certificates), ConfigMaps, and other Kubernetes resources needed to recreate the Db2 instance. + +### backup_s3_endpoint +S3-compatible object storage endpoint URL. + +- **Required** when `backup_vendor` is `s3` +- Environment Variable: `BACKUP_S3_ENDPOINT` - Default: None -#### masbr_backup_schedule -Set [Cron expression](ttps://en.wikipedia.org/wiki/Cron) to create a scheduled backup. If not set a value for this varialbe, this role will create an on-demand backup. +**Purpose**: Specifies the S3 API endpoint for storing database backups. 
Supports AWS S3, IBM Cloud Object Storage, MinIO, and other S3-compatible services. -- Optional -- Environment Variable: `MASBR_BACKUP_SCHEDULE` +**When to use**: +- Required when using S3 storage for backups (`backup_vendor: s3`) +- Must be accessible from the Db2 pod + +**Valid values**: HTTPS URL to S3-compatible endpoint (e.g., `https://s3.us-east.cloud-object-storage.appdomain.cloud`, `https://s3.amazonaws.com`) + +**Impact**: Db2 connects to this endpoint to upload/download backup files. Incorrect endpoint will cause backup/restore failures. + +**Related variables**: +- `backup_vendor`: Must be set to `s3` +- `backup_s3_bucket`: Bucket at this endpoint +- `backup_s3_access_key`, `backup_s3_secret_key`: Credentials for this endpoint + +**Example**: `https://s3.us-east.cloud-object-storage.appdomain.cloud` + +### backup_s3_bucket +S3 bucket name for storing database backups. + +- **Required** when `backup_vendor` is `s3` +- Environment Variable: `BACKUP_S3_BUCKET` - Default: None -#### masbr_restore_from_version -Set the backup version to use in the restore, this will be in the format of a `YYYMMDDHHMMSS` timestamp (e.g. `20240621021316`) +**Purpose**: Specifies the S3 bucket where database backup files will be stored. The bucket must exist and credentials must have read/write permissions. + +**When to use**: +- Required when using S3 storage for backups (`backup_vendor: s3`) +- Bucket must be created before running backup + +**Valid values**: Valid S3 bucket name following S3 naming conventions + +**Impact**: Backup files are stored in this bucket under path `/`. Incorrect bucket name or insufficient permissions will cause failures. + +**Related variables**: +- `backup_vendor`: Must be set to `s3` +- `backup_s3_endpoint`: S3 service hosting this bucket +- `backup_s3_access_key`, `backup_s3_secret_key`: Must have permissions for this bucket + +**Example**: `mas-db2-backups` + +### backup_s3_access_key +S3 access key ID for authentication. + +- **Required** when `backup_vendor` is `s3` +- Environment Variable: `BACKUP_S3_ACCESS_KEY` +- Default: None + +**Purpose**: Provides the access key ID for authenticating to S3-compatible object storage. Used together with secret key for S3 API authentication. + +**When to use**: +- Required when using S3 storage for backups (`backup_vendor: s3`) +- Must have read/write permissions to the specified bucket + +**Valid values**: Valid S3 access key ID from your S3 provider + +**Impact**: Used for S3 authentication. Invalid credentials will cause backup/restore to fail with authentication errors. + +**Related variables**: +- `backup_vendor`: Must be set to `s3` +- `backup_s3_secret_key`: Corresponding secret key +- `backup_s3_bucket`: Bucket these credentials can access + +**Security**: Store securely using Ansible Vault or environment variables. Never commit to version control. + +### backup_s3_secret_key +S3 secret access key for authentication. -- **Required** only when `DB2_ACTION=restore` -- Environment Variable: `MASBR_RESTORE_FROM_VERSION` +- **Required** when `backup_vendor` is `s3` +- Environment Variable: `BACKUP_S3_SECRET_KEY` - Default: None -## Example Playbook +**Purpose**: Provides the secret access key for authenticating to S3-compatible object storage. Used together with access key for S3 API authentication. 
+ +**When to use**: +- Required when using S3 storage for backups (`backup_vendor: s3`) +- Must correspond to the access key ID + +**Valid values**: Valid S3 secret access key from your S3 provider + +**Impact**: Used for S3 authentication. Invalid credentials will cause backup/restore to fail with authentication errors. + +**Related variables**: +- `backup_vendor`: Must be set to `s3` +- `backup_s3_access_key`: Corresponding access key ID +- `backup_s3_bucket`: Bucket these credentials can access + +**Security**: Store securely using Ansible Vault or environment variables. Never commit to version control. + +### backup_s3_alias +Db2 storage access alias name for S3 configuration. + +- **Optional** +- Environment Variable: `BACKUP_S3_ALIAS` +- Default: `S3DB2COS` + +**Purpose**: Defines the alias name used in Db2's storage access configuration for S3. This is an internal Db2 identifier for the S3 connection. + +**When to use**: +- Usually leave as default unless you have specific Db2 storage access naming requirements +- Change only if you need to match existing Db2 storage access configurations + +**Valid values**: Valid Db2 storage access alias name (alphanumeric, no spaces) + +**Impact**: Used internally by Db2 to reference the S3 storage configuration. Changing this is rarely necessary. + +**Related variables**: +- `backup_vendor`: Only used when set to `s3` + +**Default**: `S3DB2COS` + -### Install Db2 +Example Usage - Backup and Restore +------------------------------------------------------------------------------- + +### Backup Db2 Database to Disk ```yaml - hosts: localhost any_errors_fatal: true vars: - ibm_entitlement_key: xxxxx + mas_instance_id: masinst1 + mas_application_id: manage + mas_backup_dir: /tmp/masbr + db2_action: backup-database + db2_instance_name: db2u-manage + db2_namespace: db2u + backup_type: online + backup_vendor: disk + roles: + - ibm.mas_devops.db2 +``` - # Configuration for the Db2 cluster - db2_instance_name: db2u-db01 +### Backup Db2 Database to S3 +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + mas_instance_id: masinst1 + mas_application_id: manage + mas_backup_dir: /tmp/masbr + db2_action: backup-database + db2_instance_name: db2u-manage + backup_type: online + backup_vendor: s3 + backup_s3_endpoint: https://s3.us-east.cloud-object-storage.appdomain.cloud + backup_s3_bucket: mas-db2-backups # your bucket name + backup_s3_access_key: "{{ lookup('env', 'S3_ACCESS_KEY') }}" + backup_s3_secret_key: "{{ lookup('env', 'S3_SECRET_KEY') }}" + roles: + - ibm.mas_devops.db2 +``` - db2_meta_storage_class: "ibmc-file-gold" - db2_data_storage_class: "ibmc-block-gold" - db2_backup_storage_class: "ibmc-file-gold" - db2_logs_storage_class: "ibmc-block-gold" - db2_temp_storage_class: "ibmc-block-gold" +### Backup with Instance Resources and Database to disk +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + mas_instance_id: masinst1 + mas_application_id: manage + mas_backup_dir: /tmp/masbr + db2_action: backup + db2_instance_name: db2u-manage + db2_namespace: db2u + backup_vendor: disk + roles: + - ibm.mas_devops.db2 +``` - # Create the MAS JdbcCfg & Secret resource definitions - mas_instance_id: inst1 - mas_config_dir: /home/david/masconfig +### Restore Db2 Database from Disk +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + db2_action: restore-database + mas_instance_id: masinst1 + mas_application_id: manage + db2_backup_version: 20251212-021316 + mas_backup_dir: /tmp/masbr + db2_instance_name: db2u-manage + db2_namespace: 
db2u + backup_vendor: disk roles: - ibm.mas_devops.db2 ``` -### Backup Db2 +### Restore Db2 Database from S3 ```yaml - hosts: localhost any_errors_fatal: true vars: - db2_action: backup - db2_instance_name: db2u-db01 - masbr_storage_local_folder: /tmp/masbr + db2_action: restore-database + mas_instance_id: masinst1 + mas_application_id: manage + db2_backup_version: 20251212-021316 + mas_backup_dir: /tmp/masbr + db2_instance_name: db2u-manage + backup_vendor: s3 + backup_s3_endpoint: https://s3.us-east.cloud-object-storage.appdomain.cloud + backup_s3_bucket: mas-db2-backups # your bucket name + backup_s3_access_key: "{{ lookup('env', 'S3_ACCESS_KEY') }}" + backup_s3_secret_key: "{{ lookup('env', 'S3_SECRET_KEY') }}" roles: - ibm.mas_devops.db2 ``` -### Restore Db2 +### Restore Db2 from Backup (Instance + Database) ```yaml - hosts: localhost any_errors_fatal: true vars: db2_action: restore - db2_instance_name: db2u-db01 - masbr_restore_from_version: 20240621021316 - masbr_storage_local_folder: /tmp/masbr + mas_instance_id: masinst1 + mas_application_id: manage + db2_backup_version: 20251212-021316 + mas_backup_dir: /tmp/masbr + backup_vendor: disk roles: - ibm.mas_devops.db2 ``` -## Run Role Playbook +### Restore Db2 from Backup (Instance + Database) w/ storage class override using custom storage classes +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + db2_action: restore + mas_instance_id: masinst1 + mas_application_id: manage + db2_backup_version: 20251212-021316 + mas_backup_dir: /tmp/masbr + backup_vendor: disk + override_storageclass: true + custom_storage_class_rwo: ocs-storagecluster-ceph-rbd # For data, logs, temp, archivelogs, audit_logs + custom_storage_class_rwx: ocs-storagecluster-cephfs # For meta, backup + roles: + - ibm.mas_devops.db2 +``` +### Restore Db2 from Backup (Instance + Database(S3)) +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + db2_action: restore + mas_instance_id: masinst1 + mas_application_id: manage + db2_backup_version: 20251212-021316 + mas_backup_dir: /tmp/masbr + backup_vendor: s3 + backup_s3_endpoint: https://s3.us-east.cloud-object-storage.appdomain.cloud + backup_s3_bucket: mas-db2-backups # your bucket name + backup_s3_access_key: "{{ lookup('env', 'S3_ACCESS_KEY') }}" + backup_s3_secret_key: "{{ lookup('env', 'S3_SECRET_KEY') }}" + roles: + - ibm.mas_devops.db2 +``` + +### Backup Directory Structure (Disk) +``` +/tmp/masbr/ +└── backup--db2u-/ + ├── data/ + │ ├── db2--BLUDB-backup-.tar.gz + │ └── db2-backup-info.yaml + └── resources/ + ├── db2uclusters/ + ├── secrets/ + ├── certificates/ + └── issuers/ + └── {kind}s/ +``` + +### Database backup Metadata (db2-backup-info.yaml) +```yaml +source_db2_backup_version: "20251212-021316" +source_db2_backup_timestamp: "20251212021316" +source_db2_instance_name: "db2u-manage" +source_db2_instance_version: "11.5.8.0-cn7" +database: "BLUDB" +backup_vendor: "disk" +vendor_backup_path: "/mnt/backup/20251212-021316/data" +local_backup_path: "/tmp/masbr/backup-20251212-021316-db2u/data/db2-BLUDB-backup-20251212-021316.tar.gz" +status: "SUCCESS" +``` + +### Important Considerations + +**Version Compatibility:** +- The restore operation requires the target Db2 version to match the backup version +- Always verify version compatibility before attempting a restore + +**Backup Types:** +If your DB2 instance has got circular logging enabled i.e `LOGARCHMETH1: OFF or/and LOGARCHMETH2: OFF`, you can only use `offline` backup type. 
+If your DB2 instance has got circular logging disabled, you can use either `online` or `offline` backup type. +If you are unsure, you can use default `online` backup type. +- **Online Backup**: Database remains available during backup (recommended for production) +- **Offline Backup**: Database is taken offline during backup (faster but requires downtime) + +**Storage Options:** +- **Disk Storage**: Backups stored locally and copied to backup directory +- **S3 Storage**: Backups stored directly to S3-compatible object storage (no local storage required) + +**Security:** +- Backup files contain sensitive data and credentials +- Secure the backup directory with appropriate permissions +- Consider encrypting backup files for long-term storage + +**Performance:** +- Online backups may impact Db2 performance during execution +- Schedule backups during low-usage periods +- Monitor Db2 resource utilization during backup + + +Backup and Restore Troubleshooting +------------------------------------------------------------------------------- + +### Common Issues and Solutions + +#### Backup Failures + +- Check DB2 pod logs: `oc logs -n -c db2u` +- Review backup script logs in the pod: `/tmp/db2_backup.log` +- Verify S3 credentials and connectivity (for S3 backups) +- Ensure sufficient storage space in the backup PVC(/mnt/backup) + +**Issue: Backup fails with "insufficient storage space"** +``` +Error: SQL2062N An error occurred while accessing media "backup_path" +``` +**Solution:** +- Check available disk space on the Db2 pod: `oc exec -n -- df -h` +- Verify backup storage PVC has sufficient capacity +- For S3 backups, ensure bucket has adequate space and proper permissions +- Consider using compression or incremental backups to reduce storage requirements + +**Issue: Backup fails with "database is in use"** +``` +Error: SQL1035N The database is currently in use. SQLSTATE=57019 +``` +**Solution:** +- For offline backups, ensure all applications are disconnected +- Use online backup instead: `export DB2_BACKUP_TYPE=online` +- Check active connections: `oc exec -n -- su - db2inst1 -c "db2 list applications"` +- Force disconnect if necessary: `db2 force applications all` + +**Issue: S3 backup fails with authentication errors** +``` +Error: Unable to authenticate with S3 endpoint +``` +**Solution:** +- Verify S3 credentials are correct: `BACKUP_S3_ACCESS_KEY` and `BACKUP_S3_SECRET_KEY` +- Test S3 connectivity from the Db2 pod +- Ensure S3 endpoint URL is correct and accessible +- Check firewall rules and network policies allow S3 access +- Verify bucket exists and credentials have write permissions + +**Issue: Backup script execution timeout** +``` +Error: Backup operation timed out after 3600 seconds +``` +**Solution:** +- Large databases may require extended timeout periods +- Monitor backup progress: `oc logs -n -f` +- Check Db2 performance and resource utilization +- Consider scheduling backups during low-usage periods +- For very large databases, use incremental backups + +#### Restore Failures + +- Verify backup version exists and is complete +- Check DB2 version compatibility +- Review restore script logs: `/tmp/db2_restore_disk.log` or `/tmp/db2_restore_s3.log` +- Ensure DB2 instance is running and healthy before database restore +- For S3 restores, verify S3 connectivity and credentials + +**Issue: Restore fails with version mismatch** +``` +Error: DB2 version mismatch. 
Backup version: 11.5.8.0, Target version: 11.5.9.0 +``` +**Solution:** +- Ensure target Db2 version matches backup version +- Check backup metadata: `cat /data/db2-backup-info.yaml` +- Install matching Db2 version: `export DB2_VERSION=11.5.8.0-cn7` +- Alternatively, upgrade backup to target version (requires manual intervention) + +**Issue: Restore fails with "database already exists"** +``` +Error: SQL1005N The database alias "BLUDB" already exists +``` +**Solution:** +- Drop existing database before restore: + ```bash + oc exec -n -- su - db2inst1 -c "db2 drop database BLUDB" + ``` +- Or use a different database name during restore +- Verify database state: `db2 list database directory` + +**Issue: Restore fails with corrupted backup files** +``` +Error: SQL2025N The database cannot be restored from backup image +``` +**Solution:** +- Verify backup file integrity: + ```bash + tar -tzf .tar.gz > /dev/null + ``` +- Check backup metadata for status: `status: SUCCESS` +- Re-run backup if corruption detected +- For S3 restores, verify file was completely uploaded +- Check storage system for hardware errors + +**Issue: Restore fails with insufficient permissions** +``` +Error: SQL0551N User does not have required authorization +``` +**Solution:** +- Verify db2inst1 user has proper permissions +- Check pod security context and service account +- Ensure restore scripts have execute permissions +- Review OpenShift security policies (SCC) + +**Issue: S3 restore fails to download backup files** +``` +Error: Failed to download backup from S3 bucket +``` +**Solution:** +- Verify S3 credentials have read permissions +- Check S3 bucket name and path are correct +- Test S3 connectivity: `aws s3 ls s3:///` +- Ensure network policies allow outbound S3 access +- Verify backup files exist in S3 bucket + +#### Performance Issues + +**Issue: Backup taking too long** +**Solution:** +- Use online backups to avoid database downtime +- Schedule backups during low-usage periods +- Increase Db2 pod resources (CPU/memory) +- Use compression to reduce backup size +- Consider incremental backups for large databases +- Check network bandwidth for S3 backups + +**Issue: Restore taking too long** +**Solution:** +- Ensure adequate resources allocated to Db2 pod +- Monitor pod resource utilization during restore +- Check storage performance (IOPS, throughput) +- For S3 restores, verify network bandwidth +- Consider using faster storage classes + +#### Validation and Verification + +**Issue: How to verify backup completed successfully** +**Solution:** +1. Check backup metadata file: + ```bash + cat /data/db2-backup-info.yaml + ``` + Verify `status: SUCCESS` + +2. Verify backup file exists and has reasonable size: + ```bash + ls -lh /data/db2-*.tar.gz + ``` + +3. For S3 backups, verify files in bucket: + ```bash + aws s3 ls s3://// + ``` + +4. Check Db2 backup history: + ```bash + oc exec -n -- su - db2inst1 -c "db2 list history backup all for BLUDB" + ``` + +**Issue: How to verify restore completed successfully** +**Solution:** +1. Check database is online: + ```bash + oc exec -n -- su - db2inst1 -c "db2 connect to BLUDB" + ``` + +2. Verify table counts and data integrity: + ```bash + oc exec -n -- su - db2inst1 -c "db2 'select count(*) from '" + ``` + +3. Check database configuration: + ```bash + oc exec -n -- su - db2inst1 -c "db2 get db cfg for BLUDB" + ``` + +4. 
Review restore logs for errors: + ```bash + oc logs -n | grep -i error + ``` + +### Diagnostic Commands + +**Check Db2 pod status:** ```bash -export IBM_ENTITLEMENT_KEY=xxxxx -export DB2_INSTANCE_NAME=db2u-db01 -export DB2_META_STORAGE_CLASS=ibmc-file-gold -export DB2_DATA_STORAGE_CLASS=ibmc-block-gold -export MAS_INSTANCE_ID=inst1 -export MAS_CONFIG_DIR=/home/masconfig -ansible-playbook ibm.mas_devops.run_role +oc get pods -n | grep db2 +oc describe pod -n ``` -## License +**Check Db2 instance status:** +```bash +oc exec -n -- su - db2inst1 -c "db2pd -" +``` + +**Check database status:** +```bash +oc exec -n -- su - db2inst1 -c "db2 list active databases" +``` + +**Check backup storage:** +```bash +oc get pvc -n +oc exec -n -- df -h /mnt/backup +``` + +**View Db2 diagnostic logs:** +```bash +oc exec -n -- tail -f /var/log/db2u.log +oc exec -n -- cat /database/config/db2inst1/sqllib/db2dump/db2diag.log +``` + +**Check S3 configuration (if using S3):** +```bash +oc exec -n -- su - db2inst1 -c "db2 list storage access" +``` + +### Getting Help + +If you encounter issues not covered in this troubleshooting guide: + +1. **Check Db2 logs**: Review `/var/log/db2u.log` and `db2diag.log` for detailed error messages +2. **Review backup metadata**: Check `db2-backup-info.yaml` for backup details and status +3. **Verify prerequisites**: Ensure all required variables are set correctly +4. **Test connectivity**: Verify network access to storage (S3 or PVC) +5. **Check resources**: Ensure adequate CPU, memory, and storage are available +6. **Open an issue**: Report problems at the project repository with logs and configuration details + +License +------------------------------------------------------------------------------- EPL-2.0 diff --git a/ibm/mas_devops/roles/db2/defaults/main.yml b/ibm/mas_devops/roles/db2/defaults/main.yml index dc2bbccf3a..8fe702e383 100644 --- a/ibm/mas_devops/roles/db2/defaults/main.yml +++ b/ibm/mas_devops/roles/db2/defaults/main.yml @@ -1,6 +1,7 @@ --- # db2 action # --------------------------------------------------------------------------- +# Supported actions: install, uninstall, backup, backup-database, restore, restore-database db2_action: "{{ lookup('env', 'DB2_ACTION') | default('install', true) }}" # Configure Db2 instance @@ -123,7 +124,7 @@ mas_config_dir: "{{ lookup('env', 'MAS_CONFIG_DIR') }}" mas_config_scope: "{{ lookup('env', 'MAS_CONFIG_SCOPE') | default('system', true) }}" # Supported values are "system", "ws", "app", or "wsapp" mas_workspace_id: "{{ lookup('env', 'MAS_WORKSPACE_ID') }}" # Necessary for ws and wsapp scopes -mas_application_id: "{{ lookup('env', 'MAS_APP_ID') }}" # Necessary for app and wsapp scopes +mas_application_id: "{{ lookup('env', 'MAS_APP_ID') }}" # Necessary for app and wsapp scopes and backup and restore tasks # Entitlement # ----------------------------------------------------------------------------- @@ -136,6 +137,29 @@ ibm_entitlement_key: "{{ lookup('env', 'IBM_ENTITLEMENT_KEY') }}" # ----------------------------------------------------------------------------- custom_labels: "{{ lookup('env', 'CUSTOM_LABELS') | default(None, true) | string | ibm.mas_devops.string2dict() }}" +# Backup and Restore variables +# ----------------------------------------------------------------------------- +mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" +db2_backup_version: "{{ lookup('env', 'DB2_BACKUP_VERSION') }}" + +# Set flag to true, to use cluster's default storage classes +override_storageclass: "{{ lookup('env', 
'OVERRIDE_STORAGECLASS') | default(false, true) }}" +custom_storage_class_rwo: "{{ lookup('env', 'CUSTOM_STORAGE_CLASS_RWO') | default('', true) }}" +custom_storage_class_rwx: "{{ lookup('env', 'CUSTOM_STORAGE_CLASS_RWX') | default('', true) }}" + +backup_vendor: "{{ lookup('env', 'BACKUP_VENDOR') | default('disk', true) | lower }}" # Supported values are s3 and disk +backup_s3_alias: "{{ lookup('env', 'BACKUP_S3_ALIAS') | default('S3DB2COS', true) }}" +backup_s3_endpoint: "{{ lookup('env', 'BACKUP_S3_ENDPOINT') }}" +backup_s3_bucket: "{{ lookup('env', 'BACKUP_S3_BUCKET') }}" +backup_s3_access_key: "{{ lookup('env', 'BACKUP_S3_ACCESS_KEY') }}" +backup_s3_secret_key: "{{ lookup('env', 'BACKUP_S3_SECRET_KEY') }}" +backup_type: "{{ lookup('env', 'DB2_BACKUP_TYPE') | default('online', true) }}" # Supported values are online, offline + +_db2_storages: "NO_OVERRIDE" +_db2_instance_password: "NO_OVERRIDE" +_db2_ldapblueadmin_password: "NO_OVERRIDE" +_db2_ldappassword: "NO_OVERRIDE" + # Addons # ----------------------------------------------------------------------------- db2_addons_audit_config: "{{ lookup('env', 'DB2_ADDONS_AUDIT_CONFIG') }}" diff --git a/ibm/mas_devops/roles/db2/files/backup/prepare_backup_scripts.sh b/ibm/mas_devops/roles/db2/files/backup/prepare_backup_scripts.sh new file mode 100644 index 0000000000..3855288c47 --- /dev/null +++ b/ibm/mas_devops/roles/db2/files/backup/prepare_backup_scripts.sh @@ -0,0 +1,49 @@ +#!/bin/bash + +# Finding the Instance owner +INSTOWNER=`/usr/local/bin/db2greg -dump | grep -ae "I," | grep -v "/das," | awk -F ',' '{print $4}' ` + +# Finding Instnace owner Group +GRPID=`cat /etc/passwd | grep ${INSTOWNER} | cut -d: -f4` +INSTGROUP=`cat /etc/group | grep ${GRPID} | cut -d: -f1` + +# Find the home directory +INSTHOME=` cat /etc/passwd | grep ${INSTOWNER} | cut -d: -f6` + +# Resolving INSTOWNER's executables path (sqllib): +DBPATH=`/usr/local/bin/db2greg -dump | grep -ae "I," | grep -v "/das," | grep "${INSTOWNER}" | awk -F ',' '{print $5}' ` + +# Source the db2profile for the root user to be able to issue several db2 commands locally: +SOURCEPATH="$DBPATH/db2profile" +. $SOURCEPATH + +mkdir -p ${INSTHOME}/bin/ + +cd /tmp/db2-scripts/ + +echo -e "\nCopying the files to bin directory under Instance Home . . . " + +if [ -f db2_backup.sh ]; then + echo -e "\nCopying db2_backup.sh to bin directory under Instance Home . . . " + cp -rp db2_backup.sh ${INSTHOME}/bin/ +fi + +if [ -f setup_cos_storage_access.sh ]; then + echo -e "\nCopying setup_cos_storage_access.sh to bin directory under Instance Home . . . " + cp -rp setup_cos_storage_access.sh ${INSTHOME}/bin/ +fi + +if [ -f db2_restore_disk.sh ]; then + echo -e "\nCopying db2_restore_disk.sh to bin directory under Instance Home . . . " + cp -rp db2_restore_disk.sh ${INSTHOME}/bin/ +fi + +if [ -f db2_restore_s3.sh ]; then + echo -e "\nCopying db2_restore_s3.sh to bin directory under Instance Home . . . " + cp -rp db2_restore_s3.sh ${INSTHOME}/bin/ +fi + +chown -R ${INSTOWNER}:${INSTGROUP} ${INSTHOME}/bin + +echo -e "\nINSTHOME=${INSTHOME}\n" +echo -e "PrepareSuccess: Backup scripts have been copied to Instance Home bin directory." 
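For orientation, the sketch below shows how the backup variables added to `defaults/main.yml` might be wired together when driving the role directly. It is a minimal, hedged example: the `ROLE_NAME` dispatch convention and all concrete values (instance name, directories, application ID) are assumptions for illustration, not part of this change.

```bash
# Hypothetical invocation of the db2 role's new backup action.
# ROLE_NAME dispatch and every value below are illustrative assumptions.
export ROLE_NAME=db2
export DB2_ACTION=backup                 # backs up Db2u Kubernetes resources, then the database
export DB2_INSTANCE_NAME=db2u-db01       # example instance name
export MAS_INSTANCE_ID=inst1             # example MAS instance
export MAS_APP_ID=manage                 # required for backup and restore tasks
export MAS_BACKUP_DIR=/home/masbackups   # local directory that receives the backup artifacts
export DB2_BACKUP_TYPE=online            # or "offline"
export BACKUP_VENDOR=disk                # or "s3", in which case set the BACKUP_S3_* variables
ansible-playbook ibm.mas_devops.run_role
```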
diff --git a/ibm/mas_devops/roles/db2/tasks/after-backup-restore.yml b/ibm/mas_devops/roles/db2/tasks/after-backup-restore.yml deleted file mode 100644 index 60ea1112e3..0000000000 --- a/ibm/mas_devops/roles/db2/tasks/after-backup-restore.yml +++ /dev/null @@ -1,11 +0,0 @@ ---- -# Clean up -# ------------------------------------------------------------------------- -- name: "Delete temporary folders" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - rm -f {{ masbr_pod_lock_file }}; - rm -rf {{ db2_pod_temp_folder }}; - rm -rf {{ db2_pvc_temp_folder }} - {{ exec_in_pod_end }} diff --git a/ibm/mas_devops/roles/db2/tasks/backup-database/backup-database.yml b/ibm/mas_devops/roles/db2/tasks/backup-database/backup-database.yml new file mode 100644 index 0000000000..7f0c5517d0 --- /dev/null +++ b/ibm/mas_devops/roles/db2/tasks/backup-database/backup-database.yml @@ -0,0 +1,255 @@ +--- +- name: Debug – configuration summary + ansible.builtin.debug: + msg: + - "================== BACKUP DB2 DATABASE CONFIGURATION SUMMARY ==================" + - "MAS INSTANCE ID : {{ mas_instance_id | default('UNDEFINED') }}" + - "MAS APPLICATION ID : {{ mas_application_id | default('UNDEFINED') }}" + - "DB2 INSTANCE NAME : {{ db2_instance_name | default('UNDEFINED') }}" + - "NAMESPACE : {{ db2_namespace | default('UNDEFINED') }}" + - "DB NAME : {{ db2_dbname | default('UNDEFINED') }}" + - "BACKUP VERSION : {{ db2_backup_version | default('UNDEFINED') }}" + - "VENDOR : {{ backup_vendor | default('UNDEFINED') }}" + - "S3 ENDPOINT : {{ backup_s3_endpoint | default('UNDEFINED') }}" + - "S3 BUCKET : {{ backup_s3_bucket | default('UNDEFINED') }}" + - "BACKUP DIR : {{ db2_backup_path | default('UNDEFINED') }}" + - "================================================================================" + +# Check if backup vendor is s3, then check s3 related variables +# ----------------------------------------------------------------------------- +- name: "Verify DB2 S3 backup variables" + ibm.mas_devops.verify_backup_restore_vars: + component: "db2" + action: "s3_setup" + backup_vendor: "{{ backup_vendor }}" + backup_s3_alias: "{{ backup_s3_alias }}" + backup_s3_endpoint: "{{ backup_s3_endpoint }}" + backup_s3_bucket: "{{ backup_s3_bucket }}" + backup_s3_access_key: "{{ backup_s3_access_key }}" + backup_s3_secret_key: "{{ backup_s3_secret_key }}" + when: backup_vendor == "s3" + +- name: Set fact db2_backup_data_path + set_fact: + db2_backup_data_path: "{{ db2_backup_path }}/data" + +- name: "Create {{ db2_backup_path }}/data directory" + file: + path: "{{ db2_backup_data_path }}" + state: directory + mode: "0755" + +- name: "Set fact backup_path for the backup script when backup vendor is s3" + set_fact: + full_backup_path: "DB2REMOTE://{{ backup_s3_alias }}/{{ backup_s3_bucket }}/backups-db2-{{ mas_application_id }}/{{ db2_backup_version }}" + when: backup_vendor == "s3" + +- name: "Set fact backup_path for the backup script when backup vendor is disk" + set_fact: + full_backup_path: "/mnt/backup/{{ db2_backup_version }}/data" + base_backup_path: "/mnt/backup/{{ db2_backup_version }}" + when: backup_vendor == "disk" + +# Check if db2 instance is running and get the pod name +# ----------------------------------------------------------------------------- +- name: "Check DB2 is running and get DB2 pod name" + ibm.mas_devops.get_db2u_pod_name: + db2_instance_name: "{{ db2_instance_name | lower }}" + db2_namespace: "{{ db2_namespace }}" + register: db2_pod_name_result + +- name: Assert DB2 pod name is found + assert: + 
that: + - db2_pod_name_result is defined + - db2_pod_name_result.success + - db2_pod_name_result.pod_name != "" + fail_msg: "DB2 pod name could not be found. Ensure the DB2 instance is running." + +- name: "Set fact db2_pod_name" + set_fact: + db2_pod_name: "{{ db2_pod_name_result.pod_name }}" + +- name: "Remove any existing files in /tmp/db2-scripts directory on localhost" + ansible.builtin.file: + path: "/tmp/db2-scripts" + state: absent + +- name: "Recreate /tmp/db2-scripts directory on localhost" + ansible.builtin.file: + path: "/tmp/db2-scripts" + state: directory + mode: "0755" + +- name: "Copy prepare_backup_scripts.sh to /tmp" + ansible.builtin.copy: + src: "{{ role_path }}/files/backup/prepare_backup_scripts.sh" + dest: "/tmp/db2-scripts/prepare_backup_scripts.sh" + mode: "0755" + +- name: "Template the DB2 S3 storage access setup script" + ansible.builtin.template: + src: "backup/setup_cos_storage_access.sh.j2" + dest: "/tmp/db2-scripts/setup_cos_storage_access.sh" + mode: "0755" + when: backup_vendor == "s3" + +- name: "Template the DB2 backup script" + ansible.builtin.template: + src: "backup/db2_backup.sh.j2" + dest: "/tmp/db2-scripts/db2_backup.sh" + mode: "0755" + +- name: Zip the DB2 backup scripts + ansible.builtin.archive: + path: /tmp/db2-scripts + dest: /tmp/db2-scripts.zip + format: zip + mode: "0755" + +# Create /tmp/db2-scripts directory in DB2 pod and copy scripts into the pod +# ----------------------------------------------------------------------------- +- name: create /tmp/db2-scripts directory in DB2 pod + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc 'rm -rf /tmp/db2-scripts && mkdir -p /tmp/db2-scripts/' db2inst1" + register: create_dir_result + retries: 2 + delay: 15 # seconds + until: create_dir_result.rc == 0 + +- name: Copy /tmp/db2-scripts.zip to DB2 pod /tmp/db2-scripts.zip + shell: "oc cp /tmp/db2-scripts.zip {{ db2_namespace }}/{{ db2_pod_name }}:/tmp/db2-scripts.zip -c db2u" + register: copy_scripts_result + retries: 2 + delay: 15 # seconds + until: copy_scripts_result.rc == 0 + +- name: Unzip DB2 backup scripts in DB2 pod + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc 'cd /tmp; unzip -o db2-scripts.zip; chmod -R +x /tmp/db2-scripts' db2inst1" + register: unzip_result + retries: 2 + delay: 15 # seconds + until: unzip_result.rc == 0 + +# Execute prepare_backup_scripts.sh in DB2 pod +# this script will unzip the backup scripts, and sets the right permissions to the scripts. 
+# ----------------------------------------------------------------------------- +- name: "Execute prepare_backup_scripts.sh in DB2 pod" + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc '/tmp/db2-scripts/prepare_backup_scripts.sh | tee /tmp/prepare_backup_scripts.log' db2inst1" + register: prepare_scripts_result + retries: 2 + delay: 15 # seconds + until: + - prepare_scripts_result.rc == 0 + - prepare_scripts_result.stdout | regex_search('PrepareSuccess', multiline=True) + +- name: "Debug prepare_backup_scripts.sh output" + ansible.builtin.debug: + msg: "{{ prepare_scripts_result.stdout_lines }}" + +- name: Get INSTHOME from prepare_backup_scripts.log + ansible.builtin.set_fact: + db2_insthome: "{{ (prepare_scripts_result.stdout | regex_search('INSTHOME=(.*)', '\\1'))[0] }}" + when: + - prepare_scripts_result is defined + - prepare_scripts_result.rc == 0 + - prepare_scripts_result.stdout is defined + +# Execute setup_cos_storage_access.sh in DB2 pod if backup vendor is s3 +# ----------------------------------------------------------------------------- +- name: "Execute setup_cos_storage_access.sh in DB2 pod" + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc '{{ db2_insthome }}/bin/setup_cos_storage_access.sh | tee /tmp/setup_cos_storage_access.log' db2inst1" + register: setup_s3_result + when: backup_vendor == "s3" + +- name: "Assert S3 storage access setup completed successfully" + assert: + that: + - setup_s3_result is defined + - setup_s3_result.rc == 0 + when: backup_vendor == "s3" + +- name: "Debug setup_cos_storage_access.sh output" + ansible.builtin.debug: + msg: "{{ setup_s3_result.stdout_lines }}" + when: backup_vendor == "s3" + +# Execute db2_backup.sh in DB2 pod +# ----------------------------------------------------------------------------- +- name: "Executing DB2 backup.." + block: + - name: "Execute db2_backup.sh in DB2 pod, check logs in /tmp/db2_backup.log in pod" + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc '{{ db2_insthome }}/bin/db2_backup.sh | tee /tmp/db2_backup.log' db2inst1" + register: db2_backup_result + + - name: "Debug db2_backup.sh output" + ansible.builtin.debug: + msg: "{{ db2_backup_result.stdout_lines }}" + + - name: "Assert DB2 backup completed successfully" + assert: + that: + - db2_backup_result is defined + - db2_backup_result.rc == 0 + - db2_backup_result.stdout | regex_search('BACKUP COMPLETED SUCCESSFULLY', multiline=True) + fail_msg: "DB2 backup did not complete successfully. Check /tmp/db2_backup.log in {{ db2_pod_name }} pod for details." + + - name: Get backup file timestamp from db2_backup.log + ansible.builtin.set_fact: + db2_backup_timestamp: "{{ (db2_backup_result.stdout | regex_search('BACKUP_FILE_TIMESTAMP=(.*)', '\\1'))[0] }}" + when: + - db2_backup_result is defined + - db2_backup_result.rc == 0 + - db2_backup_result.stdout is defined + + # If vendor is disk, tar the backup files and cp to backup_data_path + # ----------------------------------------------------------------------------- + - name: "Move backup files to {{ db2_backup_data_path }} when backup vendor is disk.. This will take a while..." 
+ shell: "oc cp --retries=50 -c db2u {{ db2_namespace }}/{{ db2_pod_name }}:{{ base_backup_path }}/db2-{{ mas_application_id }}-{{ db2_dbname }}-backup-{{ db2_backup_version }}.tar.gz {{ db2_backup_data_path }}/db2-{{ mas_application_id }}-{{ db2_dbname }}-backup-{{ db2_backup_version }}.tar.gz" + register: rsync_result + when: backup_vendor == "disk" + + - name: "Debug rsync output" + ansible.builtin.debug: + msg: "{{ rsync_result.stdout_lines }}" + when: backup_vendor == "disk" + + always: + - name: "Clean up backup directory in the pod" + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc 'rm -rf {{ base_backup_path }}' db2inst1" + when: backup_vendor == "disk" + + +# Create yaml file with db2 backup details, add local_backup_path if vendor is disk +# ----------------------------------------------------------------------------- +- name: "Create yaml file with db2 backup details" + copy: + dest: "{{ db2_backup_data_path }}/db2-backup-info.yaml" + content: | + source_db2_backup_version: "{{ db2_backup_version }}" + source_db2_backup_timestamp: "{{ db2_backup_timestamp }}" + source_db2_instance_name: "{{ db2_instance_name | lower }}" + source_db2_instance_version: "{{ db2_pod_name_result.db2_version }}" + database: "{{ db2_dbname }}" + app_id: "{{ mas_application_id }}" + backup_vendor: "{{ backup_vendor }}" + vendor_backup_path: "{{ full_backup_path }}" + {% if backup_vendor == 'disk' %} + local_backup_path: "{{ db2_backup_data_path }}/db2-{{ mas_application_id }}-{{ db2_dbname }}-backup-{{ db2_backup_version }}.tar.gz" + {% endif %} + status: "SUCCESS" + +- name: Database backup summary + ansible.builtin.debug: + msg: + - "================== BACKUP DB2 DATABASE CONFIGURATION SUMMARY ==================" + - "MAS APPLICATION ID : {{ mas_application_id | default('UNDEFINED') }}" + - "DB2 INSTANCE NAME : {{ db2_instance_name | default('UNDEFINED') }}" + - "NAMESPACE : {{ db2_namespace | default('UNDEFINED') }}" + - "DB NAME : {{ db2_dbname | default('UNDEFINED') }}" + - "BACKUP VERSION : {{ db2_backup_version | default('UNDEFINED') }}" + - "VENDOR : {{ backup_vendor | default('UNDEFINED') }}" + - "S3 ENDPOINT : {{ backup_s3_endpoint | default('UNDEFINED') }}" + - "S3 BUCKET : {{ backup_s3_bucket | default('UNDEFINED') }}" + - "BACKUP DIR : {{ db2_backup_path | default('UNDEFINED') }}" + - "STATUS : SUCCESS" + - "================================================================================" diff --git a/ibm/mas_devops/roles/db2/tasks/backup-database/main.yml b/ibm/mas_devops/roles/db2/tasks/backup-database/main.yml new file mode 100644 index 0000000000..d425340330 --- /dev/null +++ b/ibm/mas_devops/roles/db2/tasks/backup-database/main.yml @@ -0,0 +1,34 @@ +--- +# Check db2 backup required variables +# ----------------------------------------------------------------------------- +- name: Verify DB2 backup variables + ibm.mas_devops.verify_backup_restore_vars: + component: "db2" + action: "backup" + db2_instance_name: "{{ db2_instance_name }}" + mas_backup_dir: "{{ mas_backup_dir }}" + mas_instance_id: "{{ mas_instance_id }}" + db2_namespace: "{{ db2_namespace }}" + mas_application_id: "{{ mas_application_id }}" + +# Set DB2 backup version if not provided +# ----------------------------------------------------------------------------- +- name: "Check if DB2_BACKUP_VERSION is provided, if not set to default 'YYYYMMDD-HHMMSS' format" + set_fact: + db2_backup_version: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}" + when: db2_backup_version is not defined or 
db2_backup_version == "" or db2_backup_version == "None" + +- name: "Set fact: DB2 backup base directory path" + set_fact: + db2_backup_path: "{{ mas_backup_dir }}/backup-{{ db2_backup_version }}-db2u-{{ mas_application_id }}" + +- name: "Create {{ db2_backup_path }} directory for Db2 backup" + file: + path: "{{ db2_backup_path }}" + state: directory + mode: "0755" + +# Backup Db2 database Data using Db2Backup CR +# ------------------------------------------------------------------------- +- name: "Start Database backup process." + include_tasks: "tasks/backup-database/backup-database.yml" diff --git a/ibm/mas_devops/roles/db2/tasks/backup/backup-database.yml b/ibm/mas_devops/roles/db2/tasks/backup/backup-database.yml deleted file mode 100644 index 72c6b14e64..0000000000 --- a/ibm/mas_devops/roles/db2/tasks/backup/backup-database.yml +++ /dev/null @@ -1,163 +0,0 @@ ---- -# Update db2 database backup status: InProgress -# ----------------------------------------------------------------------------- -- name: "Update db2 database backup status: InProgress" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "InProgress" - -- name: "Backup db2 database" - block: - # Prepare db2 database backup folder - # ------------------------------------------------------------------------- - - name: "Set fact: db2 database backup folder" - set_fact: - # We should use Db2 backup pvc to save the temporary backup files, the db2 pod - # ephemeral local storage has a limits up to 4Gi by default. - db2_backup_folder: "{{ db2_pvc_temp_folder }}/{{ masbr_job_data_type }}" - - - name: "Set fact: db2 database backup log" - set_fact: - db2_backup_log: "{{ db2_backup_folder }}/db2-backup.log" - - - name: "Create db2 database backup folder" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - mkdir -p {{ db2_backup_folder }} - {{ exec_in_pod_end }} - - - name: "Debug: db2 database backup folder" - debug: - msg: "Db2 database backup folder ........ 
{{ db2_backup_folder }}" - - # Take a backup of db2 database - # https://www.ibm.com/docs/en/db2/11.5?topic=commands-backup-database - # ------------------------------------------------------------------------- - - name: "Take {{ 'Full' if masbr_backup_type == 'full' else 'Incremental' }} backup of db2 database" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - mkdir -p {{ db2_backup_folder }}/db2backup && - db2 -v connect to {{ db2_dbname }} | tee -a {{ db2_backup_log }} && - db2 -v backup db {{ db2_dbname }} on all dbpartitionnums online - {{ "incremental" if masbr_backup_type == "incr" }} - to {{ db2_backup_folder }}/db2backup - compress UTIL_IMPACT_PRIORITY 50 include logs without prompting | tee -a {{ db2_backup_log }} - {{ exec_in_pod_end }} - register: _db2backup_output - - - name: "Debug: db2 backup output" - debug: - msg: "{{ _db2backup_output.stdout_lines }}" - - # Extract Db2 keystore master key - # https://www.ibm.com/docs/en/db2/11.5?topic=edr-restoring-encrypted-backup-image-different-system-local-keystore - # ------------------------------------------------------------------------- - - name: "Check master key label from keystore.p12" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - gsk8capicmd_64 -cert -list all -db {{ db2_keystore_folder }}/keystore.p12 -stashed - {{ exec_in_pod_end }} - register: _check_master_label_output - - - name: "Get master key label from keystore.p12" - vars: - regex: '\DB2(.*)' - when: item is regex('\DB2(.*)') - set_fact: - db2_master_key_label: "{{ item | regex_search(regex) }}" - with_items: "{{ _check_master_label_output.stdout_lines | list }}" - - - name: "Extract master key from keystore.p12" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - gsk8capicmd_64 -secretkey -extract -db {{ db2_keystore_folder }}/keystore.p12 -stashed - -label {{ db2_master_key_label }} -format ascii - -target {{ db2_backup_folder }}/db2backup/master_key_label.kdb - {{ exec_in_pod_end }} - register: _extract_master_key_output - - - name: "Copy keystore files to db2 backup folder" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - cp -rf {{ db2_keystore_folder }}/* {{ db2_backup_folder }}/db2backup && - ls -lA {{ db2_backup_folder }}/db2backup - {{ exec_in_pod_end }} - register: _copy_keystore_output - - - name: "Debug: files in db2 backup folder" - debug: - msg: "{{ _copy_keystore_output.stdout_lines }}" - - # Create tar.gz archives of database backup files - # ------------------------------------------------------------------------- - - name: "Create tar.gz archives of database backup files" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - tar -czf {{ db2_backup_folder }}/{{ masbr_job_name }}.tar.gz - -C {{ db2_backup_folder }}/db2backup . 
&& - du -h {{ db2_backup_folder }}/* - {{ exec_in_pod_end }} - register: _du_files_output - - - name: "Debug: size of backup files" - debug: - msg: "{{ _du_files_output.stdout_lines }}" - - # Copy backup files from pod to specified storage location - # ------------------------------------------------------------------------- - - name: "Copy backup files from pod to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_pod_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name }}" - masbr_cf_are_pvc_paths: true - masbr_cf_paths: - - src_file: "{{ db2_backup_folder }}/{{ masbr_job_name }}.tar.gz" - dest_folder: "{{ masbr_job_data_type }}" - - # Update database backup status: Completed - # ------------------------------------------------------------------------- - - name: "Update database backup status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update database backup status: Failed - # ------------------------------------------------------------------------- - - name: "Update database backup status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" - - always: - # Copy db2 backup log file from pod to specified storage location - # ------------------------------------------------------------------------- - - name: "Create a tar.gz archive of db2 backup log" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - tar -czf {{ db2_backup_folder }}/db2-backup-log.tar.gz - -C {{ db2_backup_folder }} db2-backup.log - {{ exec_in_pod_end }} - - - name: "Copy db2 backup log file from pod to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_pod_files_to_storage.yml" - vars: - masbr_cf_job_type: "{{ masbr_job_type }}" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "{{ db2_backup_folder }}/db2-backup-log.tar.gz" - dest_folder: "log" diff --git a/ibm/mas_devops/roles/db2/tasks/backup/backup-instance.yml b/ibm/mas_devops/roles/db2/tasks/backup/backup-instance.yml new file mode 100644 index 0000000000..d38b362a12 --- /dev/null +++ b/ibm/mas_devops/roles/db2/tasks/backup/backup-instance.yml @@ -0,0 +1,109 @@ +--- +- name: "Set fact: DB2 universal operator backup resources" + set_fact: + db2_backup_resources: + - namespace: "{{ db2_namespace }}" + resources: + # Namespace + - kind: Project + api_version: project.openshift.io/v1 + name: "{{ db2_namespace }}" + # db2u.databases.ibm.com + - kind: Db2uCluster + api_version: db2u.databases.ibm.com/v1 + name: "{{ db2_instance_name }}" + # support for db2uinstance migration + - kind: Db2uInstance + api_version: db2u.databases.ibm.com/v1 + name: "{{ db2_instance_name }}" + # subscription + - kind: Subscription + api_version: operators.coreos.com/v1alpha1 + name: db2u-operator + # operatorgroup + - kind: OperatorGroup + api_version: operators.coreos.com/v1 + # secrets + - kind: Secret + api_version: v1 + name: ibm-registry + - kind: Secret + api_version: v1 + name: "c-{{ db2_instance_name }}-instancepassword" + - kind: Secret + api_version: v1 + name: "c-{{ db2_instance_name }}-ldapblueadminpassword" + - kind: Secret + api_version: v1 + name: "c-{{ db2_instance_name }}-ldappassword" + - kind: Secret + 
api_version: v1 + name: "db2u-certificate-{{ db2_instance_name }}" + - kind: Secret + api_version: v1 + name: db2u-license-keys + - kind: ConfigMap + api_version: v1 + name: db2u-release + # Issuers + - kind: Issuer + api_version: cert-manager.io/v1 + name: db2u-issuer + - kind: Issuer + api_version: cert-manager.io/v1 + name: db2u-ca-issuer + # Certificates + - kind: Certificate + api_version: cert-manager.io/v1 + name: "db2u-certificate-{{ db2_instance_name }}" + - kind: Certificate + api_version: cert-manager.io/v1 + name: db2u-ca-certificate + # Route + - kind: Route + api_version: route.openshift.io/v1 + name: "db2u-{{ db2_instance_name }}-tls-route" + - namespace: "mas-{{ mas_instance_id }}-core" + resources: + # secrets + - kind: Secret + api_version: v1 + name: "jdbc-{{ db2_instance_name }}-credentials" + +# Call the backup_resources plugin to execute the backup to the path provided +# ----------------------------------------------------------------------------- +- name: "Backup DB2 resources (referenced secrets are auto-discovered)" + ibm.mas_devops.backup_resource: + backup_resources: "{{ db2_backup_resources }}" + backup_path: "{{ db2_backup_path }}" + register: backup_result + +# Show the results +# ----------------------------------------------------------------------------- +- name: "Display backup results" + debug: + msg: + - "Backup completed{{ ' with failures' if backup_result.failed_count > 0 else ' successfully' }}" + - "Total resources backed up: {{ backup_result.backed_up_count }}" + - "Total resources failed: {{ backup_result.failed_count }}" + - "Resources not found: {{ backup_result.not_found_count }}" + - "Secrets auto-discovered: {{ backup_result.discovered_secrets_count }}" + - "Backup location: {{ db2_backup_path }}" + +# Fail task if any errors occurred. 
+# ----------------------------------------------------------------------------- +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ backup_result.failed_resources | to_nice_yaml }}" + when: backup_result.failed_count > 0 + +- name: "Fail if backup had errors" + fail: + msg: | + Backup failed for {{ backup_result.failed_count }} resource(s): + {% for resource in backup_result.failed_resources %} + - {{ resource.description }} in {{ resource.scope }} + {% endfor %} + when: backup_result.failed_count > 0 diff --git a/ibm/mas_devops/roles/db2/tasks/backup/main.yml b/ibm/mas_devops/roles/db2/tasks/backup/main.yml index a915394c47..fe7c8897c8 100644 --- a/ibm/mas_devops/roles/db2/tasks/backup/main.yml +++ b/ibm/mas_devops/roles/db2/tasks/backup/main.yml @@ -1,67 +1,39 @@ --- # Check db2 backup required variables # ----------------------------------------------------------------------------- -- name: "Fail if db2_instance_name is not provided" - assert: - that: db2_instance_name is defined and db2_instance_name != "" - fail_msg: "db2_instance_name is required" +- name: Verify DB2 backup variables + ibm.mas_devops.verify_backup_restore_vars: + component: "db2" + action: "backup" + db2_instance_name: "{{ db2_instance_name }}" + mas_backup_dir: "{{ mas_backup_dir }}" + mas_instance_id: "{{ mas_instance_id }}" + db2_namespace: "{{ db2_namespace }}" + mas_application_id: "{{ mas_application_id }}" -# Set common backup job variables +# Set DB2 backup version if not provided # ----------------------------------------------------------------------------- -- name: "Set fact: common backup job variables" +- name: "Check if DB2_BACKUP_VERSION is provided, if not set to default 'YYYYMMDD-HHMMSS' format" set_fact: - masbr_job_component: - name: "db2" - instance: "{{ db2_instance_name }}" - namespace: "{{ db2_namespace }}" - masbr_job_data_list: - - seq: "1" - type: "database" + db2_backup_version: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}" + when: db2_backup_version is not defined or db2_backup_version == "" or db2_backup_version == "None" -# Before run tasks -# ------------------------------------------------------------------------- -- name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _ignore_masbr_backup_data: true - _job_type: "backup" - _component_before_task_path: "{{ role_path }}/tasks/before-backup-restore.yml" - -- name: "Run backup tasks" - block: - # Update backup job status: New - # ------------------------------------------------------------------------- - - name: "Update backup job status: New" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "1" - phase: "New" +- name: "Set fact: DB2 backup base directory path" + set_fact: + db2_backup_path: "{{ mas_backup_dir }}/backup-{{ db2_backup_version }}-db2u-{{ mas_application_id }}" - # Run backup tasks for each data type - # ------------------------------------------------------------------------- - - name: "Run backup tasks for each data type" - include_tasks: "{{ role_path }}/tasks/backup/backup-{{ job_data_item.type }}.yml" - vars: - masbr_job_data_seq: "{{ job_data_item.seq }}" - masbr_job_data_type: "{{ job_data_item.type }}" - loop: "{{ masbr_job_data_list }}" - loop_control: - loop_var: job_data_item +- name: "Create {{ db2_backup_path }} directory for Db2 backup" + file: + path: "{{ db2_backup_path }}" + state: directory + mode: "0755" - rescue: - # Update 
backup status: Failed - # ------------------------------------------------------------------------- - - name: "Update database backup status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_status: - phase: "Failed" +# Backup Db2 Universal operator Instance Kubernetes Resources +# ------------------------------------------------------------------------- +- name: "Start Db2 Universal operator Instance backup process." + include_tasks: "{{ role_path }}/tasks/backup/backup-instance.yml" - always: - # After run tasks - # ------------------------------------------------------------------------- - - name: "After run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/after_run_tasks.yml" - vars: - _component_after_task_path: "{{ role_path }}/tasks/after-backup-restore.yml" +# Backup Db2 database Data using Db2Backup CR +# ------------------------------------------------------------------------- +- name: "Start Database backup process." + include_tasks: "{{ role_path }}/tasks/backup-database/backup-database.yml" diff --git a/ibm/mas_devops/roles/db2/tasks/before-backup-restore.yml b/ibm/mas_devops/roles/db2/tasks/before-backup-restore.yml deleted file mode 100644 index 01061b37e0..0000000000 --- a/ibm/mas_devops/roles/db2/tasks/before-backup-restore.yml +++ /dev/null @@ -1,147 +0,0 @@ ---- -# Get db2 version and status -# ----------------------------------------------------------------------------- -- name: "Get Db2uCluster" - kubernetes.core.k8s_info: - api_version: db2u.databases.ibm.com/v1 - kind: Db2uCluster - name: "{{ masbr_job_component.instance }}" - namespace: "{{ db2_namespace }}" - register: _db2ucluster_output - -- name: "Set fact: db2 version" - set_fact: - db2_version: "{{ _db2ucluster_output.resources[0].spec.version }}" - when: - - _db2ucluster_output is defined - - _db2ucluster_output.resources[0] is defined - - _db2ucluster_output.resources[0].spec.version is defined - -- name: "Fail if db2 does not exists" - assert: - that: db2_version is defined - fail_msg: "Db2 does not exists!" - -- name: "Set fact: db2 running status" - set_fact: - db2_running: true - when: - - _db2ucluster_output is defined - - _db2ucluster_output.resources[0] is defined - - _db2ucluster_output.resources[0].status is defined - - _db2ucluster_output.resources[0].status.state is defined - - _db2ucluster_output.resources[0].status.state == "Ready" - -- name: "Fail if db2 is not running" - assert: - that: db2_running is defined and db2_running - fail_msg: "Db2 is not running!" 
- -# Get db2 pod name -# ----------------------------------------------------------------------------- -- name: "Get db2 pod name" - kubernetes.core.k8s_info: - kind: Pod - namespace: "{{ db2_namespace }}" - label_selectors: - - type=engine - - app={{ masbr_job_component.instance }} - register: _db2_pod_output - failed_when: - - _db2_pod_output.resources is not defined - - _db2_pod_output.resources | length == 0 - -- name: "Set fact: db2 pod name" - set_fact: - db2_pod_name: "{{ _db2_pod_output.resources[0].metadata.name }}" - db2_container_name: db2u - -- name: "Set fact: exec command in db2 pod" - set_fact: - exec_in_pod_begin: >- - oc exec {{ db2_pod_name }} -c {{ db2_container_name }} -n {{ db2_namespace }} -- su -lc ' - exec_in_pod_end: "' {{ db2_jdbc_username }}" - -# Set db2 backup/restore variables -# ------------------------------------------------------------------------- -- name: "Set fact: db2 pod copy file variables" - set_fact: - masbr_cf_namespace: "{{ db2_namespace }}" - masbr_cf_pod_name: "{{ db2_pod_name }}" - masbr_cf_container_name: "{{ db2_container_name }}" - masbr_cf_pvc_name: "c-{{ db2_instance_name if masbr_job_type == 'restore' else db2_instance_name }}-backup" - masbr_cf_pvc_mount_path: "/mnt/backup" - masbr_cf_pvc_sub_path: "" - masbr_cf_are_pvc_paths: true - masbr_cf_affinity: false - -- name: "Set fact: temporary folders" - set_fact: - db2_pod_temp_folder: "{{ masbr_pod_temp_folder }}/{{ masbr_job_name }}" - db2_pvc_temp_folder: "{{ masbr_cf_pvc_mount_path }}/{{ masbr_job_name }}" - -- name: "Set fact: db2 keystore folder" - set_fact: - db2_keystore_folder: "/mnt/blumeta0/db2/keystore" - -# Output db2 information -# ----------------------------------------------------------------------------- -- name: "Debug: db2 information" - debug: - msg: - - "Db2 version ............................ {{ db2_version }}" - - "Db2 is running ......................... {{ db2_running }}" - - "Db2 pod name ........................... {{ db2_pod_name }}" - -# Check if an exiting job is running -# ------------------------------------------------------------------------- -- name: "Try to find job lock file in pod" - when: not masbr_allow_multi_jobs - changed_when: false - shell: > - {{ exec_in_pod_begin }} - [ -f {{ masbr_pod_lock_file }} ] && echo exist; exit 0 - {{ exec_in_pod_end }} - register: _get_lock_file_output - -- name: "Fail if found job lock file in pod" - when: not masbr_allow_multi_jobs - assert: - that: _get_lock_file_output.stdout != "exist" - fail_msg: "A backup/restore job is running now, please try to run job later!" 
- -- name: "Create job lock file in pod" - changed_when: true - shell: >- - {{ exec_in_pod_begin }} - mkdir -p {{ masbr_pod_lock_file | dirname }}; - touch {{ masbr_pod_lock_file }} - {{ exec_in_pod_end }} - -# Check storage usage -# ------------------------------------------------------------------------- -- name: "Get storage usage of pod temporary folder" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - mkdir -p {{ db2_pod_temp_folder }}; - df -h {{ db2_pod_temp_folder }} - {{ exec_in_pod_end }} - register: _df_temp_output - -- name: "Debug: storage usage of pod temporary folder" - debug: - msg: "{{ _df_temp_output.stdout_lines }}" - -- name: "Get storage usage of pvc temporary folder" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - mkdir -p {{ db2_pvc_temp_folder }}; - df -h {{ db2_pvc_temp_folder }} - {{ exec_in_pod_end }} - register: _df_pvc_output - -- name: "Debug: storage usage of pvc temporary folder" - debug: - msg: "{{ _df_pvc_output.stdout_lines }}" diff --git a/ibm/mas_devops/roles/db2/tasks/main.yml b/ibm/mas_devops/roles/db2/tasks/main.yml index 113dcb8d30..20b54999ce 100644 --- a/ibm/mas_devops/roles/db2/tasks/main.yml +++ b/ibm/mas_devops/roles/db2/tasks/main.yml @@ -3,3 +3,10 @@ include_tasks: "tasks/{{ db2_action }}/main.yml" when: - db2_action != "none" + - db2_action in ["install", "upgrade", "backup", "restore-database", "backup-database", "restore"] + +- name: "Fail if db2_action is invalid" + fail: + msg: "db2_action must be one of ['install', 'upgrade', 'backup', 'restore-database', 'backup-database', 'restore']" + when: + - db2_action not in ["install", "upgrade", "backup", "restore-database", "backup-database", "restore"] diff --git a/ibm/mas_devops/roles/db2/tasks/restore-database/main.yml b/ibm/mas_devops/roles/db2/tasks/restore-database/main.yml new file mode 100644 index 0000000000..41bca3b031 --- /dev/null +++ b/ibm/mas_devops/roles/db2/tasks/restore-database/main.yml @@ -0,0 +1,4 @@ +--- +# Restore Db2 Universal operator database +- name: "Start Database restore process." 
+ include_tasks: "{{ role_path }}/tasks/restore-database/restore-database.yml" diff --git a/ibm/mas_devops/roles/db2/tasks/restore-database/restore-database.yml b/ibm/mas_devops/roles/db2/tasks/restore-database/restore-database.yml new file mode 100644 index 0000000000..7e650ee0f3 --- /dev/null +++ b/ibm/mas_devops/roles/db2/tasks/restore-database/restore-database.yml @@ -0,0 +1,42 @@ +--- +# Check db2 Restore required variables +# ----------------------------------------------------------------------------- +- name: Verify DB2 restore variables + ibm.mas_devops.verify_backup_restore_vars: + component: "db2" + action: "restore-database" + db2_backup_version: "{{ db2_backup_version }}" + mas_backup_dir: "{{ mas_backup_dir }}" + db2_instance_name: "{{ db2_instance_name }}" + backup_vendor: "{{ backup_vendor }}" + mas_application_id: "{{ mas_application_id }}" + +# Check s3 variables when backup vendor is s3 +# ----------------------------------------------------------------------------- +- name: "Verify DB2 S3 variables" + ibm.mas_devops.verify_backup_restore_vars: + component: "db2" + action: "s3_setup" + backup_vendor: "{{ backup_vendor }}" + backup_s3_alias: "{{ backup_s3_alias }}" + backup_s3_endpoint: "{{ backup_s3_endpoint }}" + backup_s3_bucket: "{{ backup_s3_bucket }}" + backup_s3_access_key: "{{ backup_s3_access_key }}" + backup_s3_secret_key: "{{ backup_s3_secret_key }}" + when: backup_vendor == "s3" + +- name: Debug – "Start database restore process" + ansible.builtin.debug: + msg: + - "================== STARTING RESTORE DB2 DATABASE ==============================" + - "DB2 INSTANCE NAME : {{ db2_instance_name | default('UNDEFINED') }}" + - "MAS APPLICATION ID : {{ mas_application_id | default('UNDEFINED') }}" + - "BACKUP VERSION : {{ db2_backup_version | default('UNDEFINED') }}" + - "VENDOR : {{ backup_vendor | default('UNDEFINED') }}" + - "S3 ENDPOINT : {{ backup_s3_endpoint | default('UNDEFINED') }}" + - "S3 BUCKET : {{ backup_s3_bucket | default('UNDEFINED') }}" + - "BACKUP DIR : {{ mas_backup_dir | default('UNDEFINED') }}" + - "================================================================================" + +- name: "Run restore-db-from-disk.yml when backup vendor is disk" + include_tasks: "{{ role_path }}/tasks/restore-database/restore-db-from-{{ backup_vendor }}.yml" diff --git a/ibm/mas_devops/roles/db2/tasks/restore-database/restore-db-from-disk.yml b/ibm/mas_devops/roles/db2/tasks/restore-database/restore-db-from-disk.yml new file mode 100644 index 0000000000..48f9846c9e --- /dev/null +++ b/ibm/mas_devops/roles/db2/tasks/restore-database/restore-db-from-disk.yml @@ -0,0 +1,198 @@ +- name: "Set fact backup_path for the backup script when backup vendor is disk" + set_fact: + pod_full_backup_path: "/mnt/backup/{{ db2_backup_version }}" + local_full_backup_path: "{{ mas_backup_dir }}/backup-{{ db2_backup_version }}-db2u-{{ mas_application_id }}/data" + +# Check backup files in local backup directory if vendor is disk +# ----------------------------------------------------------------------------- +# check db2-backup-info.yaml file exists in local backup directory +- name: "Check db2-backup-info.yaml file exists in local backup directory" + stat: + path: "{{ local_full_backup_path }}/db2-backup-info.yaml" + register: db2_backup_info_file_stat + +- name: "Assert db2-backup-info.yaml file exists" + assert: + that: + - db2_backup_info_file_stat.stat.exists == true + fail_msg: "db2-backup-info.yaml file does not exist in {{ local_full_backup_path }}. 
Cannot proceed with restore." + +- name: include vars from db2-backup-info.yaml + include_vars: + file: "{{ local_full_backup_path }}/db2-backup-info.yaml" + name: db2_backup_info + +# check tar.gz file exists in local backup directory +- name: Check for tar.gz file in backup directory + stat: + path: "{{ local_full_backup_path }}/db2-{{ mas_application_id }}-{{ db2_backup_info.database }}-backup-{{ db2_backup_info.source_db2_backup_version }}.tar.gz" + register: db2_backup_tar_stat + +- name: "Fail if db2-{{ mas_application_id }}-{{ db2_backup_info.database }}-backup-{{ db2_backup_info.source_db2_backup_version }}.tar.gz file not found" + assert: + that: + - db2_backup_tar_stat.stat.exists == true + fail_msg: "No db2-{{ mas_application_id }}-{{ db2_backup_info.database }}-backup-{{ db2_backup_info.source_db2_backup_version }}.tar.gz file found in {{ local_full_backup_path }}. Cannot proceed with restore." + +- name: "Set fact backup_archive" + set_fact: + backup_archive_filename: "db2-{{ mas_application_id }}-{{ db2_backup_info.database }}-backup-{{ db2_backup_info.source_db2_backup_version }}.tar.gz" + +# Check if Db2 instance is running and version match with backup version +# ----------------------------------------------------------------------------- +- name: "Check DB2 is running and get DB2 pod name" + ibm.mas_devops.get_db2u_pod_name: + db2_instance_name: "{{ db2_instance_name | lower}}" + db2_namespace: "{{ db2_namespace }}" + register: db2_pod_name_result + +- name: Assert DB2 is running and pod name is found + assert: + that: + - db2_pod_name_result is defined + - db2_pod_name_result.success + - db2_pod_name_result.pod_name != "" + - db2_pod_name_result.db2_version != "" + fail_msg: "DB2 Instance {{ db2_instance_name | lower }} is not running in namespace {{ db2_namespace }}. Ensure the DB2 instance is running." + +- name: "Assert DB2 version matches backup version" + assert: + that: + - db2_pod_name_result.db2_version == db2_backup_info.source_db2_instance_version + fail_msg: "DB2 version {{ db2_pod_name_result.db2_version }} does not match backup version {{ db2_backup_info.source_db2_instance_version }}. Cannot proceed with restore." 
+ +- name: "Set fact db2_pod_name" + set_fact: + db2_pod_name: "{{ db2_pod_name_result.pod_name }}" + +# Copy backup files to the pod +# ----------------------------------------------------------------------------- +- name: "Remove any existing files in /tmp/db2-scripts directory on localhost" + ansible.builtin.file: + path: "/tmp/db2-scripts" + state: absent + +- name: "Recreate /tmp/db2-scripts directory on localhost" + ansible.builtin.file: + path: "/tmp/db2-scripts" + state: directory + mode: "0755" + +- name: "Copy prepare_backup_scripts.sh to /tmp" + ansible.builtin.copy: + src: "{{ role_path }}/files/backup/prepare_backup_scripts.sh" + dest: "/tmp/db2-scripts/prepare_backup_scripts.sh" + mode: "0755" + +- name: create template restore script in /tmp/db2-scripts directory on localhost + ansible.builtin.template: + src: "backup/db2_restore_disk.sh.j2" + dest: "/tmp/db2-scripts/db2_restore_disk.sh" + mode: "0755" + vars: + db2_dbname: "{{ db2_backup_info.database }}" + db2_backup_version: "{{ db2_backup_info.source_db2_backup_version }}" + db2_backup_timestamp: "{{ db2_backup_info.source_db2_backup_timestamp | trim }}" + backup_type: "online" + +- name: Zip the DB2 backup scripts + ansible.builtin.archive: + path: /tmp/db2-scripts + dest: /tmp/db2-scripts.zip + format: zip + mode: "0755" + +# Create /tmp/db2-scripts directory in DB2 pod and copy scripts into the pod +# ----------------------------------------------------------------------------- +- name: create /tmp/db2-scripts directory in DB2 pod + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc 'mkdir -p /tmp/db2-scripts/ && rm -rf /tmp/db2-scripts/*' db2inst1" + register: create_dir_result + retries: 2 + delay: 15 # seconds + until: create_dir_result.rc == 0 + +- name: Copy /tmp/db2-scripts.zip to DB2 pod /tmp/db2-scripts.zip + shell: "oc cp /tmp/db2-scripts.zip {{ db2_namespace }}/{{ db2_pod_name }}:/tmp/db2-scripts.zip -c db2u" + register: copy_scripts_result + retries: 2 + delay: 15 # seconds + until: copy_scripts_result.rc == 0 + +- name: Unzip DB2 backup scripts in DB2 pod + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc 'cd /tmp; unzip -o db2-scripts.zip; chmod -R +x /tmp/db2-scripts' db2inst1" + register: unzip_result + retries: 2 + delay: 15 # seconds + until: unzip_result.rc == 0 + +# Execute prepare_backup_scripts.sh in DB2 pod +# this script will unzip the backup scripts, and sets the right permissions to the scripts. 
+# ----------------------------------------------------------------------------- +- name: "Execute prepare_backup_scripts.sh in DB2 pod" + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc '/tmp/db2-scripts/prepare_backup_scripts.sh | tee /tmp/prepare_backup_scripts.log' db2inst1" + register: prepare_scripts_result + retries: 2 + delay: 15 # seconds + until: + - prepare_scripts_result.rc == 0 + - prepare_scripts_result.stdout | regex_search('PrepareSuccess', multiline=True) is not none + +- name: "Debug prepare_backup_scripts.sh output" + ansible.builtin.debug: + msg: "{{ prepare_scripts_result.stdout_lines }}" + +- name: Get INSTHOME from prepare_backup_scripts.log + ansible.builtin.set_fact: + db2_insthome: "{{ (prepare_scripts_result.stdout | regex_search('INSTHOME=(.*)', '\\1'))[0] }}" + when: + - prepare_scripts_result is defined + - prepare_scripts_result.rc == 0 + - prepare_scripts_result.stdout is defined + +# Copy backup archive to DB2 pod +# ----------------------------------------------------------------------------- +- name: "Create {{ pod_full_backup_path }} directory in DB2 pod" + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc 'rm -rf {{ pod_full_backup_path }} && mkdir -p {{ pod_full_backup_path }} && chmod a+w {{ pod_full_backup_path}}' db2inst1" + register: create_dir_result + failed_when: create_dir_result.rc != 0 + +- name: "Copy backup archive to DB2 pod. This will take a while..." + shell: "oc cp --retries=50 {{ local_full_backup_path }}/{{ backup_archive_filename }} {{ db2_namespace }}/{{ db2_pod_name }}:{{ pod_full_backup_path }}/{{ backup_archive_filename }} -c db2u" + register: copy_backup_result + failed_when: copy_backup_result.rc != 0 + +- name: "Copying backup archive to DB2 pod result" + debug: + msg: + - "Backup archive {{ backup_archive_filename }} copied to DB2 pod {{ db2_pod_name }} in path {{ pod_full_backup_path }}/{{ backup_archive_filename }}" + +- name: "Extract backup files" + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc 'tar -xzf {{ pod_full_backup_path }}/{{ backup_archive_filename }} -C {{ pod_full_backup_path }} && ls {{ pod_full_backup_path }}' db2inst1" + register: extract_backup_result + failed_when: extract_backup_result.rc != 0 + +- name: "Extract backup files result" + debug: + msg: + - "------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------" + - "Backup archive {{ backup_archive_filename }} extracted in DB2 pod {{ db2_pod_name }} in path {{ pod_full_backup_path }}" + - "------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------" + - "{{ extract_backup_result.stdout_lines }}" + +# Execute db2_restore_disk.sh in DB2 pod if backup vendor is disk +# ----------------------------------------------------------------------------- +- name: "Execute db2_restore_disk.sh in DB2 pod, Check logs in /tmp/db2_restore_disk.log in DB2 pod {{ db2_pod_name }}" + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc '{{ db2_insthome }}/bin/db2_restore_disk.sh | tee /tmp/db2_restore_disk.log' db2inst1" + register: restore_db_result + failed_when: restore_db_result.rc != 0 + +- name: "Debug db2_restore_disk.sh output" + ansible.builtin.debug: + msg: "{{ restore_db_result.stdout_lines }}" + +- name: "Assert DB2 
restore was successful" + assert: + that: + - restore_db_result.stdout is defined + - restore_db_result.stdout | regex_search('RestoreSuccess', multiline=True) is not none + fail_msg: "DB2 restore failed. Check /tmp/db2_restore_disk.log in DB2 pod {{ db2_pod_name }} for more details." diff --git a/ibm/mas_devops/roles/db2/tasks/restore-database/restore-db-from-s3.yml b/ibm/mas_devops/roles/db2/tasks/restore-database/restore-db-from-s3.yml new file mode 100644 index 0000000000..d10a94f426 --- /dev/null +++ b/ibm/mas_devops/roles/db2/tasks/restore-database/restore-db-from-s3.yml @@ -0,0 +1,141 @@ +- name: "Set fact backup_path for the backup script when backup vendor is disk" + set_fact: + pod_full_backup_path: "/mnt/backup/{{ db2_backup_version }}" + s3_full_backup_path: "DB2REMOTE://{{ backup_s3_alias }}/{{ backup_s3_bucket }}/backups-db2-{{ mas_application_id }}/{{ db2_backup_version }}" + +# Check if Db2 instance is running and version match with backup version +# ----------------------------------------------------------------------------- +- name: "Check DB2 is running and get DB2 pod name" + ibm.mas_devops.get_db2u_pod_name: + db2_instance_name: "{{ db2_instance_name | lower}}" + db2_namespace: "{{ db2_namespace }}" + register: db2_pod_name_result + +- name: Assert DB2 is running and pod name is found + assert: + that: + - db2_pod_name_result is defined + - db2_pod_name_result.success + - db2_pod_name_result.pod_name != "" + - db2_pod_name_result.db2_version != "" + fail_msg: "DB2 Instance {{ db2_instance_name | lower}} is not running in namespace {{ db2_namespace }}. Ensure the DB2 instance is running." + +- name: "Set fact db2_pod_name" + set_fact: + db2_pod_name: "{{ db2_pod_name_result.pod_name }}" + +# Copy Restore files to the pod +# ----------------------------------------------------------------------------- +- name: "Remove any existing files in /tmp/db2-scripts directory on localhost" + ansible.builtin.file: + path: "/tmp/db2-scripts" + state: absent + +- name: "Recreate /tmp/db2-scripts directory on localhost" + ansible.builtin.file: + path: "/tmp/db2-scripts" + state: directory + mode: "0755" + +- name: "Copy prepare_backup_scripts.sh to /tmp" + ansible.builtin.copy: + src: "{{ role_path }}/files/backup/prepare_backup_scripts.sh" + dest: "/tmp/db2-scripts/prepare_backup_scripts.sh" + mode: "0755" + +- name: "Template the DB2 S3 storage access setup script" + ansible.builtin.template: + src: "backup/setup_cos_storage_access.sh.j2" + dest: "/tmp/db2-scripts/setup_cos_storage_access.sh" + mode: "0755" + +- name: create template restore script in /tmp/db2-scripts directory on localhost + ansible.builtin.template: + src: "backup/db2_restore_s3.sh.j2" + dest: "/tmp/db2-scripts/db2_restore_s3.sh" + mode: "0755" + +- name: Zip the DB2 backup scripts + ansible.builtin.archive: + path: /tmp/db2-scripts + dest: /tmp/db2-scripts.zip + format: zip + mode: "0755" + +# Create /tmp/db2-scripts directory in DB2 pod and copy scripts into the pod +# ----------------------------------------------------------------------------- +- name: create /tmp/db2-scripts directory in DB2 pod + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc 'mkdir -p /tmp/db2-scripts/ && rm -rf /tmp/db2-scripts/*' db2inst1" + register: create_dir_result + retries: 2 + delay: 15 # seconds + until: create_dir_result.rc == 0 + +- name: Copy /tmp/db2-scripts.zip to DB2 pod /tmp/db2-scripts.zip + shell: "oc cp /tmp/db2-scripts.zip {{ db2_namespace }}/{{ db2_pod_name }}:/tmp/db2-scripts.zip -c db2u" 
+ register: copy_scripts_result + retries: 2 + delay: 15 # seconds + until: copy_scripts_result.rc == 0 + +- name: Unzip DB2 backup scripts in DB2 pod + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc 'cd /tmp; unzip -o db2-scripts.zip; chmod -R +x /tmp/db2-scripts' db2inst1" + register: unzip_result + retries: 2 + delay: 15 # seconds + until: unzip_result.rc == 0 + +# Execute prepare_backup_scripts.sh in DB2 pod +# this script will unzip the backup scripts, and sets the right permissions to the scripts. +# ----------------------------------------------------------------------------- +- name: "Execute prepare_backup_scripts.sh in DB2 pod" + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc '/tmp/db2-scripts/prepare_backup_scripts.sh | tee /tmp/prepare_backup_scripts.log' db2inst1" + register: prepare_scripts_result + retries: 2 + delay: 15 # seconds + until: + - prepare_scripts_result.rc == 0 + - prepare_scripts_result.stdout | regex_search('PrepareSuccess', multiline=True) is not none + +- name: "Debug prepare_backup_scripts.sh output" + ansible.builtin.debug: + msg: "{{ prepare_scripts_result.stdout_lines }}" + +- name: Get INSTHOME from prepare_backup_scripts.log + ansible.builtin.set_fact: + db2_insthome: "{{ (prepare_scripts_result.stdout | regex_search('INSTHOME=(.*)', '\\1'))[0] }}" + when: + - prepare_scripts_result is defined + - prepare_scripts_result.rc == 0 + - prepare_scripts_result.stdout is defined + +# Execute setup_cos_storage_access.sh in DB2 pod if backup vendor is s3 +# ----------------------------------------------------------------------------- +- name: "Execute setup_cos_storage_access.sh in DB2 pod" + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc '{{ db2_insthome }}/bin/setup_cos_storage_access.sh | tee /tmp/setup_cos_storage_access.log' db2inst1" + register: setup_s3_result + retries: 2 + delay: 15 # seconds + until: setup_s3_result.rc == 0 + +- name: "Debug setup_cos_storage_access.sh output" + ansible.builtin.debug: + msg: "{{ setup_s3_result.stdout_lines }}" + +# Execute db2_restore_s3.sh in DB2 pod if backup vendor is s3 +# ----------------------------------------------------------------------------- +- name: "Execute db2_restore_s3.sh in DB2 pod, Check logs in /tmp/db2_restore_s3.log in DB2 pod {{ db2_pod_name }}" + shell: "oc exec -n {{ db2_namespace }} {{ db2_pod_name }} -c db2u -- su -lc '{{ db2_insthome }}/bin/db2_restore_s3.sh | tee /tmp/db2_restore_s3.log' db2inst1" + register: restore_db_result + failed_when: restore_db_result.rc != 0 + +- name: "Debug db2_restore_s3.sh output" + ansible.builtin.debug: + msg: "{{ restore_db_result.stdout_lines }}" + +- name: "Assert DB2 restore was successful" + assert: + that: + - restore_db_result.stdout is defined + - restore_db_result.stdout | regex_search('RestoreSuccess', multiline=True) is not none + fail_msg: "DB2 restore failed. Check /tmp/db2_restore_s3.log in DB2 pod {{ db2_pod_name }} for more details." 
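As a counterpart to the backup sketch earlier, the following is a hedged example of driving the restore-from-S3 path added above. The `ROLE_NAME` dispatch convention and the endpoint, bucket, and credential values are placeholders, not taken from this change; `DB2_BACKUP_VERSION` must match the `YYYYMMDD-HHMMSS` stamp recorded when the backup was taken.

```bash
# Hypothetical invocation of the db2 role's restore-database action from S3.
# ROLE_NAME dispatch and every value below are illustrative assumptions.
export ROLE_NAME=db2
export DB2_ACTION=restore-database
export DB2_INSTANCE_NAME=db2u-db01
export MAS_APP_ID=manage
export MAS_BACKUP_DIR=/home/masbackups            # used by the disk vendor; verification may still expect it
export DB2_BACKUP_VERSION=20240621-021316         # stamp produced at backup time (YYYYMMDD-HHMMSS)
export BACKUP_VENDOR=s3
export BACKUP_S3_ENDPOINT=https://s3.example.com  # placeholder endpoint
export BACKUP_S3_BUCKET=my-db2-backups            # placeholder bucket
export BACKUP_S3_ACCESS_KEY=xxxxx
export BACKUP_S3_SECRET_KEY=xxxxx
ansible-playbook ibm.mas_devops.run_role
```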
diff --git a/ibm/mas_devops/roles/db2/tasks/restore/copy-db2-backup-file.yml b/ibm/mas_devops/roles/db2/tasks/restore/copy-db2-backup-file.yml deleted file mode 100644 index 57fc684401..0000000000 --- a/ibm/mas_devops/roles/db2/tasks/restore/copy-db2-backup-file.yml +++ /dev/null @@ -1,107 +0,0 @@ ---- -# Copy backup file from specified storage location -# ------------------------------------------------------------------------- -- name: "Copy backup file from specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_storage_files_to_pod.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ _job_name }}" - masbr_cf_paths: - - src_file: "{{ masbr_job_data_type }}/{{ _job_name }}.tar.gz" - dest_folder: "{{ db2_restore_folder }}" - -# Extract the tar.gz file -# ------------------------------------------------------------------------- -- name: "Extract the tar.gz file" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - mkdir -p {{ db2_restore_folder }}/{{ _job_name }} && - tar -xzf {{ db2_restore_folder }}/{{ _job_name }}.tar.gz - -C {{ db2_restore_folder }}/{{ _job_name }} && - ls {{ db2_restore_folder }}/{{ _job_name }} - {{ exec_in_pod_end }} - register: _extract_output - -- name: "Debug: list extracted files" - debug: - msg: - - "Extract output folder .............. {{ db2_restore_folder }}/{{ _job_name }}" - - "{{ _extract_output.stdout_lines }}" - -# Validate extracted files -# ------------------------------------------------------------------------- -- name: "Set fact: extracted file names" - set_fact: - db2_restore_filenames: "{{ _extract_output.stdout_lines }}" - -- name: "Set fact: Db2 keystore .p12 file" - vars: - regex: ".+?(?=.p12)" - when: item is regex(regex) - set_fact: - db2_keystore_p12_file: "{{ item }}" - loop: "{{ db2_restore_filenames }}" - -- name: "Fail if Db2 keystore .p12 file was not found" - assert: - that: - - db2_keystore_p12_file is defined - - db2_keystore_p12_file | length > 0 - fail_msg: "Db2 keystore .p12 file from source Db2 instance must be provided for the restore process to work." - -- name: "Set fact: Db2 keystore .sth file" - vars: - regex: ".+?(?=.sth)" - when: item is regex(regex) - set_fact: - db2_keystore_sth_file: "{{ item }}" - loop: "{{ db2_restore_filenames }}" - -- name: "Fail if Db2 keystore .sth file was not found" - assert: - that: - - db2_keystore_sth_file is defined - - db2_keystore_sth_file | length > 0 - fail_msg: "Db2 keystore .sth file from source Db2 instance must be provided for the restore process to work." - -- name: "Set fact: Db2 keystore .kdb file" - vars: - regex: ".+?(?=.kdb)" - when: item is regex(regex) - set_fact: - db2_keystore_kdb_file: "{{ item }}" - loop: "{{ db2_restore_filenames }}" - -- name: "Fail if Db2 keystore .kdb file was not found" - assert: - that: - - db2_keystore_kdb_file is defined - - db2_keystore_kdb_file | length > 0 - fail_msg: "Db2 keystore .kdb file from source Db2 instance must be provided for the restore process to work." 
-
-- name: "Set fact: Db2 backup file timestamp"
-  vars:
-    regex: '\d+\d+\d+\d'
-    when_regex: "^{{ db2_dbname | upper }}.*"
-  when: item is regex(when_regex)
-  set_fact:
-    db2_backup_timestamp: "{{ item | regex_search(regex) }}"
-  loop: "{{ db2_restore_filenames }}"
-
-- name: "Fail if Db2 backup file timestamp was not found"
-  assert:
-    that:
-      - db2_backup_timestamp is defined
-      - db2_backup_timestamp != ""
-    fail_msg: >-
-      Db2 backup files were not found or it does not have the expected format
-      i.e '{{ db2_dbname | upper }}.0.db2inst1.DBPART000.202XXXXXXXXXXX.001'
-
-- name: "Debug: Db2 backup file timestamp"
-  debug:
-    msg:
-      - "Db2 keystore .p12 file ............. {{ db2_keystore_p12_file }}"
-      - "Db2 keystore .sth file ............. {{ db2_keystore_sth_file }}"
-      - "Db2 keystore .kdb file ............. {{ db2_keystore_kdb_file }}"
-      - "Db2 backup file timestamp .......... {{ db2_backup_timestamp }}"
diff --git a/ibm/mas_devops/roles/db2/tasks/restore/determine-storage-classes.yml b/ibm/mas_devops/roles/db2/tasks/restore/determine-storage-classes.yml
new file mode 100644
index 0000000000..2f09839b96
--- /dev/null
+++ b/ibm/mas_devops/roles/db2/tasks/restore/determine-storage-classes.yml
@@ -0,0 +1,43 @@
+---
+# 1. Lookup storage class availability
+# -----------------------------------------------------------------------------
+- name: "determine-storage-classes : Load default storage class information"
+  include_tasks: "{{ role_path }}/../../common_tasks/default_storage_classes.yml"
+
+# 2. RWO Storage (Required)
+# -----------------------------------------------------------------------------
+- name: Default RWO Storage for ROKS if not set by user (ReadWriteOnce)
+  when:
+    - custom_storage_class_rwo is not defined or custom_storage_class_rwo == ""
+    - defaultStorageClasses.success
+  set_fact:
+    custom_storage_class_rwo: "{{ defaultStorageClasses.rwo }}"
+
+- name: Assert that a RWO storage class has been defined
+  assert:
+    that: custom_storage_class_rwo is defined and custom_storage_class_rwo != ""
+    fail_msg: "custom_storage_class_rwo must be defined"
+
+# 3.
RWX Storage (Required) +# ----------------------------------------------------------------------------- +- name: Default RWX Storage for ROKS if not set by user (ReadWriteMany) + when: + - custom_storage_class_rwx is not defined or custom_storage_class_rwx == "" + - defaultStorageClasses.success + set_fact: + custom_storage_class_rwx: "{{ defaultStorageClasses.rwx }}" + +- name: Assert that a meta storage class has been defined + assert: + that: custom_storage_class_rwx is defined and custom_storage_class_rwx != "" + fail_msg: "custom_storage_class_rwx must be defined" + + +# Get Storage from DB2 CR +- name: Override storage class names in the db2 spec.storage + set_fact: + _db2_storages: "{{ db2ucluster_cr_cfg.spec.storage | ibm.mas_devops.set_storage_classes_names(custom_storage_class_rwo, custom_storage_class_rwx) }}" + when: + - custom_storage_class_rwo is defined + - custom_storage_class_rwx is defined diff --git a/ibm/mas_devops/roles/db2/tasks/restore/main.yml b/ibm/mas_devops/roles/db2/tasks/restore/main.yml index eb813e356d..07a68076fe 100644 --- a/ibm/mas_devops/roles/db2/tasks/restore/main.yml +++ b/ibm/mas_devops/roles/db2/tasks/restore/main.yml @@ -1,66 +1,7 @@ --- -# Check db2 restore required variables -# ----------------------------------------------------------------------------- -- name: "Fail if db2_instance_name is not provided" - assert: - that: db2_instance_name is defined and db2_instance_name != "" - fail_msg: "db2_instance_name is required" +# Restore Db2 Universal operator instance +- name: "Start DB2 Instance restore process." + include_tasks: "{{ role_path }}/tasks/restore/restore-instance.yml" -# Set common restore variables -# ----------------------------------------------------------------------------- -- name: "Set fact: common restore variables" - set_fact: - masbr_job_component: - name: "db2" - instance: "{{ db2_instance_name }}" - namespace: "{{ db2_namespace }}" - masbr_job_data_list: - - seq: "1" - type: "database" - -# Before run tasks -# ------------------------------------------------------------------------- -- name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "restore" - _component_before_task_path: "{{ role_path }}/tasks/before-backup-restore.yml" - -- name: "Run restore tasks" - block: - # Update restore job status: New - # ------------------------------------------------------------------------- - - name: "Update restore job status: New" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "1" - phase: "New" - - # Run restore tasks for each data type - # ------------------------------------------------------------------------- - - name: "Run restore tasks for each data type" - include_tasks: "{{ role_path }}/tasks/restore/restore-{{ job_data_item.type }}.yml" - vars: - masbr_job_data_seq: "{{ job_data_item.seq }}" - masbr_job_data_type: "{{ job_data_item.type }}" - loop: "{{ masbr_job_data_list }}" - loop_control: - loop_var: job_data_item - - rescue: - # Update restore status: Failed - # ------------------------------------------------------------------------- - - name: "Update database restore status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_status: - phase: "Failed" - - always: - # After run tasks - # ------------------------------------------------------------------------- - - name: "After run tasks" - 
include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/after_run_tasks.yml" - vars: - _component_after_task_path: "{{ role_path }}/tasks/after-backup-restore.yml" +# Restore Db2 database +- include_tasks: tasks/restore-database/main.yml diff --git a/ibm/mas_devops/roles/db2/tasks/restore/restore-database.yml b/ibm/mas_devops/roles/db2/tasks/restore/restore-database.yml deleted file mode 100644 index 23e1dd324f..0000000000 --- a/ibm/mas_devops/roles/db2/tasks/restore/restore-database.yml +++ /dev/null @@ -1,291 +0,0 @@ ---- -# Update db2 database restore status: InProgress -# ----------------------------------------------------------------------------- -- name: "Update db2 database restore status: InProgress" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "InProgress" - -- name: "Restore db2 database" - block: - # Prepare db2 database restore folder - # ------------------------------------------------------------------------- - - name: "Set fact: db2 database restore variables" - set_fact: - # We should use Db2 backup pvc to save the temporary backup files, the db2 pod - # ephemeral local storage has a limits up to 4Gi by default. - db2_restore_folder: "{{ db2_pvc_temp_folder }}/{{ masbr_job_data_type }}" - - - name: "Set fact: db2 database restore log" - set_fact: - db2_restore_log: "{{ db2_restore_folder }}/db2-restore.log" - - - name: "Create db2 database restore folder" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - mkdir -p {{ db2_restore_folder }} && - chmod a+w {{ db2_restore_folder }} - {{ exec_in_pod_end }} - - - name: "Debug: db2 database restore folder" - debug: - msg: "Db2 database restore folder ....... 
{{ db2_restore_folder }}" - - # Copy backup file from specified storage location to pod - # ------------------------------------------------------------------------- - - name: "Copy backup file from specified storage location to pod" - include_tasks: "{{ role_path }}/tasks/restore/copy-db2-backup-file.yml" - vars: - _job_name: "{{ masbr_restore_from }}" - - - name: "Set fact: Db2 backup file timestamp" - set_fact: - masbr_restore_from_timestamp: "{{ db2_backup_timestamp }}" - - # This is an incremental backup, we also need to copy the based on full backup file - # ------------------------------------------------------------------------- - - name: "This is an incremental backup, we also need to copy the based on full backup file" - when: masbr_restore_from_incr - block: - - name: "Copy based on full backup file from specified storage location to pod" - include_tasks: "{{ role_path }}/tasks/restore/copy-db2-backup-file.yml" - vars: - _job_name: "{{ masbr_restore_basedon }}" - - - name: "Set fact: Db2 backup file timestamp" - set_fact: - masbr_restore_basedon_timestamp: "{{ db2_backup_timestamp }}" - - # Add Db2 keystore master key - # https://www.ibm.com/docs/en/db2/11.5?topic=edr-restoring-encrypted-backup-image-different-system-local-keystore - # ------------------------------------------------------------------------- - - name: "Check master key label from source keystore.p12" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - gsk8capicmd_64 -cert -list all -db {{ db2_restore_folder }}/{{ masbr_restore_from }}/keystore.p12 -stashed - {{ exec_in_pod_end }} - register: _check_master_label_output - - - name: "Get master key label from source keystore.p12" - vars: - regex: '\DB2(.*)' - when: item is regex('\DB2(.*)') - set_fact: - db2_master_key_label: "{{ item | regex_search(regex) }}" - with_items: "{{ _check_master_label_output.stdout_lines | list }}" - - - name: "Add master key to target keystore.p12" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - gsk8capicmd_64 -secretkey -add -db {{ db2_keystore_folder }}/keystore.p12 -stashed - -label {{ db2_master_key_label }} -format ascii - -file {{ db2_restore_folder }}/{{ masbr_restore_from }}/master_key_label.kdb - {{ exec_in_pod_end }} - register: _add_master_key_output - failed_when: - - _add_master_key_output.rc != 0 - - ('CTGSK3005W' not in _add_master_key_output.stdout) - - # Deactivate Db2 in preparation for restore - # https://www.ibm.com/docs/en/db2/11.5?topic=r-restoring-db2-from-online-backup-using-commands - # ------------------------------------------------------------------------- - - name: "Deactivate Db2 in preparation for restore" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - echo "1. Temporarily disable the built-in HA" | tee -a {{ db2_restore_log }}; - sudo wvcli system disable -m "Disable HA before Db2 maintenance" | tee -a {{ db2_restore_log }}; - echo "2. Connect to the database" | tee -a {{ db2_restore_log }}; - db2 -v connect to {{ db2_dbname }} | tee -a {{ db2_restore_log }}; - echo "3. Disconnect all the applications that are connected to Db2" | tee -a {{ db2_restore_log }}; - db2 -v list applications | tee -a {{ db2_restore_log }}; - db2 -v force application all | tee -a {{ db2_restore_log }}; - echo "4. Terminate the database" | tee -a {{ db2_restore_log }}; - db2 -v terminate | tee -a {{ db2_restore_log }}; - echo "5. Stop the database" | tee -a {{ db2_restore_log }}; - db2stop force | tee -a {{ db2_restore_log }}; - echo "6. 
Ensure that all Db2 interprocess communications are cleaned for the instance" | tee -a {{ db2_restore_log }}; - ipclean -a | tee -a {{ db2_restore_log }}; - echo "7. Turn off all communications to the database by setting the value of the DB2COMM variable to null" | tee -a {{ db2_restore_log }}; - db2set -null DB2COMM | tee -a {{ db2_restore_log }}; - echo "8. Restart the database in restricted access mode" | tee -a {{ db2_restore_log }}; - db2start admin mode restricted access | tee -a {{ db2_restore_log }} - {{ exec_in_pod_end }} - register: _pre_restore_output - - - name: "Debug: deactivate Db2 in preparation for restore" - debug: - msg: "{{ _pre_restore_output.stdout_lines }}" - - - name: "Run Db2 restore commands" - block: - # Run Db2 full restore command - # https://www.ibm.com/docs/en/db2/11.5?topic=commands-restore-database - # ------------------------------------------------------------------------- - - name: "Restore Db2 from a full backup" - when: not masbr_restore_from_incr - changed_when: true - shell: > - {{ exec_in_pod_begin }} - echo "9. Restore Db2 from a full backup" | tee -a {{ db2_restore_log }}; - db2 -v restore db {{ db2_dbname }} - from {{ db2_restore_folder }}/{{ masbr_restore_from }} - taken at {{ masbr_restore_from_timestamp }} into {{ db2_dbname }} - logtarget {{ db2_restore_folder }}/{{ masbr_restore_from }} - replace existing without prompting | tee -a {{ db2_restore_log }} - {{ exec_in_pod_end }} - register: _run_full_restore_output - failed_when: - - _run_full_restore_output.rc != 0 - # SQL2581N: this Db2 error code means something went wrong in restore command - # - ('SQL2581N' in _run_full_restore_output.stdout) - - - name: "Debug: restore Db2 from a full backup" - when: not masbr_restore_from_incr - debug: - msg: "{{ _run_full_restore_output.stdout_lines }}" - - # Run Db2 incremental restore command - # https://www.ibm.com/docs/en/db2/11.5?topic=commands-restore-database - # ------------------------------------------------------------------------- - - name: "Restore Db2 from an incremental backup" - when: masbr_restore_from_incr - changed_when: true - shell: > - {{ exec_in_pod_begin }} - echo "9. 
Restore Db2 from an incremental backup" | tee -a {{ db2_restore_log }}; - db2ckrst -d BLUDB -t {{ masbr_restore_from_timestamp }} | tee -a {{ db2_restore_log }}; - db2 -v restore db {{ db2_dbname }} incremental - from {{ db2_restore_folder }}/{{ masbr_restore_from }} - taken at {{ masbr_restore_from_timestamp }} into {{ db2_dbname }} - logtarget {{ db2_restore_folder }}/{{ masbr_restore_from }} - replace existing without prompting | tee -a {{ db2_restore_log }}; - db2 -v restore db {{ db2_dbname }} incremental - from {{ db2_restore_folder }}/{{ masbr_restore_basedon }} - taken at {{ masbr_restore_basedon_timestamp }} into {{ db2_dbname }} - logtarget {{ db2_restore_folder }}/{{ masbr_restore_basedon }} - replace existing without prompting | tee -a {{ db2_restore_log }}; - db2 -v restore db {{ db2_dbname }} incremental - from {{ db2_restore_folder }}/{{ masbr_restore_from }} - taken at {{ masbr_restore_from_timestamp }} into {{ db2_dbname }} - logtarget {{ db2_restore_folder }}/{{ masbr_restore_from }} - replace existing without prompting | tee -a {{ db2_restore_log }} - {{ exec_in_pod_end }} - register: _run_incr_restore_output - failed_when: - - _run_incr_restore_output.rc != 0 - # SQL2581N: this Db2 error code means something went wrong in restore command - # - ('SQL2581N' in _run_incr_restore_output.stdout) - - - name: "Debug: run Db2 restore command" - when: masbr_restore_from_incr - debug: - msg: "{{ _run_incr_restore_output.stdout_lines }}" - - always: - # Run Db2 rollforward command regardless of whether Db2 restore success or not, - # otherwise the Db2 will be in pending status. - # https://www.ibm.com/docs/en/db2/11.5?topic=commands-rollforward-database - # ------------------------------------------------------------------------- - - name: "Check Db2 rollforward status" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - echo "10. Check Db2 rollforward status" | tee -a {{ db2_restore_log }}; - db2 -v rollforward db {{ db2_dbname }} query status | tee -a {{ db2_restore_log }} - {{ exec_in_pod_end }} - register: _query_rollforward_output - - - name: "Debug: check Db2 rollforward status" - debug: - msg: "{{ _query_rollforward_output.stdout_lines }}" - - - name: "Run Db2 rollforward command" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - echo "11. Run Db2 rollforward command" | tee -a {{ db2_restore_log }}; - db2 -v "rollforward db {{ db2_dbname }} to end of backup and complete - overflow log path ({{ db2_restore_folder }}/{{ masbr_restore_from }}) noretrieve" | tee -a {{ db2_restore_log }} - {{ exec_in_pod_end }} - register: _run_rollforward_output - failed_when: - - _run_rollforward_output.rc != 0 - # SQL1119N: this Db2 error code means something went wrong in rollforward command - # - ('SQL1119N' in _run_rollforward_output.stdout) - - - name: "Debug: run Db2 rollforward command" - debug: - msg: "{{ _run_rollforward_output.stdout_lines }}" - - # Active Db2 after successful rollforward - # https://www.ibm.com/docs/en/db2/11.5?topic=r-restoring-db2-from-online-backup-using-commands - # ------------------------------------------------------------------------- - - name: "Active Db2 after successful rollforward" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - echo "12. Stop the database" | tee -a {{ db2_restore_log }}; - db2stop force | tee -a {{ db2_restore_log }}; - echo "13. Ensure that all Db2 interprocess communications are cleaned for the instance" | tee -a {{ db2_restore_log }}; - ipclean -a | tee -a {{ db2_restore_log }}; - echo "14. 
Reinitialize the Db2 communication manager to accept database connections" | tee -a {{ db2_restore_log }}; - db2set DB2COMM=TCPIP,SSL | tee -a {{ db2_restore_log }}; - echo "15. Restart the database for normal operation" | tee -a {{ db2_restore_log }}; - db2start | tee -a {{ db2_restore_log }}; - echo "16. Activate the database" | tee -a {{ db2_restore_log }}; - db2 activate db {{ db2_dbname }} | tee -a {{ db2_restore_log }}; - echo "17. Re-enable the Wolverine high availability monitoring process" | tee -a {{ db2_restore_log }}; - wvcli system enable -m "Enable HA after Db2 maintenance" | tee -a {{ db2_restore_log }}; - echo "18. Connect to the database" | tee -a {{ db2_restore_log }}; - db2 connect to {{ db2_dbname }} | tee -a {{ db2_restore_log }} - {{ exec_in_pod_end }} - register: _post_restore_output - - - name: "Debug: active Db2 after successfull rollforward" - debug: - msg: "{{ _post_restore_output.stdout_lines }}" - - # Update database restore status: Completed - # ------------------------------------------------------------------------- - - name: "Update database restore status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update database restore status: Failed - # ------------------------------------------------------------------------- - - name: "Update database restore status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" - - always: - # Copy db2 restore log file from pod to specified storage location - # ------------------------------------------------------------------------- - - name: "Create a tar.gz archive of db2 restore log" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - tar -czf {{ db2_restore_folder }}/db2-restore-log.tar.gz - -C {{ db2_restore_folder }} db2-restore.log - {{ exec_in_pod_end }} - - - name: "Copy db2 restore log file from pod to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_pod_files_to_storage.yml" - vars: - masbr_cf_job_type: "restore" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "{{ db2_restore_folder }}/db2-restore-log.tar.gz" - dest_folder: "log" diff --git a/ibm/mas_devops/roles/db2/tasks/restore/restore-instance.yml b/ibm/mas_devops/roles/db2/tasks/restore/restore-instance.yml new file mode 100644 index 0000000000..edbafb8f61 --- /dev/null +++ b/ibm/mas_devops/roles/db2/tasks/restore/restore-instance.yml @@ -0,0 +1,355 @@ +--- +# Check db2 Restore required variables +# ----------------------------------------------------------------------------- +- name: Verify DB2 restore variables + ibm.mas_devops.verify_backup_restore_vars: + component: "db2" + action: "restore-instance" + db2_backup_version: "{{ db2_backup_version }}" + mas_backup_dir: "{{ mas_backup_dir }}" + mas_application_id: "{{ mas_application_id }}" + +# Set backup path facts +- name: "Set fact: backup dir paths" + set_fact: + db2_backup_path: "{{ mas_backup_dir }}/backup-{{ db2_backup_version }}-db2u-{{ mas_application_id }}" + db2_resources_path: "{{ mas_backup_dir }}/backup-{{ db2_backup_version }}-db2u-{{ mas_application_id }}/resources" + +- name: "Check Db2u resource path exist" + stat: + path: "{{ db2_resources_path }}" + register: resources_backup_path_stat + +- name: "Fail if backup 
archive does not exist" + fail: + msg: "Db2 backup resources archive not found at: {{ db2_resources_path }}" + when: not resources_backup_path_stat.stat.exists or not resources_backup_path_stat.stat.isdir + +# Verify cert-manager exists +# ----------------------------------------------------------------------------- +- name: Detect Certificate Manager installation + include_tasks: "{{ role_path }}/../../common_tasks/detect_cert_manager.yml" + +# Verify only one db2ucluster instance file is present in backup archive +# ----------------------------------------------------------------------------- +- name: Get files from {{ db2_resources_path }}/db2uclusters directory + set_fact: + instance_files: "{{ lookup('fileglob', '{{ db2_resources_path }}/db2uclusters/*', wantlist=True) }}" + +- name: Assert exactly one Db2uCluster CR exists + assert: + that: + - instance_files | length == 1 + fail_msg: "Db2uCluster Directory must contain exactly one file" + +- name: Set fact db2ucluster cr + set_fact: + db2ucluster_cr_cfg: "{{ lookup('file', '{{ instance_files[0] }}') | from_yaml }}" + +# Get Db2u details from backup CR +# ----------------------------------------------------------------------------- +- name: Set fact db2u namespace and instance name from backup CR + set_fact: + db2_namespace: "{{ db2ucluster_cr_cfg.metadata.namespace }}" + db2_instance_name: "{{ db2ucluster_cr_cfg.metadata.name }}" + db2_version: "{{ db2ucluster_cr_cfg.spec.version }}" + db2_dbname: "{{ db2ucluster_cr_cfg.spec.environment.database.name }}" + db2_type: "{{ db2ucluster_cr_cfg.spec.environment.dbType }}" + +- name: "Db2uCluster restore information" + debug: + msg: + - "Db2u Namespace ................. {{ db2_namespace }}" + - "Db2u Instance Name ............. {{ db2_instance_name }}" + - "Db2u version ................... {{ db2_version }}" + - "Db2u Database Name ............. {{ db2_dbname }}" + - "Db2u Database Type ............. {{ db2_type }}" + - "MAS Instance ID ................ {{ mas_instance_id | default('undefined') }}" + - "Backup Version ................. {{ db2_backup_version }}" + - "Backup Path .................... {{ db2_backup_path }}" + +# 1. Restore Namespace +# ----------------------------------------------------------------------------- +- name: Restore namespace + ibm.mas_devops.restore_resource: + backup_path: "{{ db2_backup_path }}" + resource_kinds: + - Project + replace_resource: false + register: namespace_result + +# 2. Restore Secrets & configmaps +# ----------------------------------------------------------------------------- +- name: Restore Secrets + ibm.mas_devops.restore_resource: + backup_path: "{{ db2_backup_path }}" + resource_kinds: + - Secret + replace_resource: false + skip_files: #skip applying these files as its from core namespace, will be taken care from suite restore + Secret: + - jdbc-{{ db2_instance_name }}-credentials.yaml + register: secrets_result + when: + - namespace_result.success + +- name: Restore ConfigMaps + ibm.mas_devops.restore_resource: + backup_path: "{{ db2_backup_path }}" + resource_kinds: + - ConfigMap + replace_resource: false + register: configmaps_result + when: + - namespace_result.success + +# 3. 
Restore Operatorgroups +# ----------------------------------------------------------------------------- +- name: Restore Operatorgroups + ibm.mas_devops.restore_resource: + backup_path: "{{ db2_backup_path }}" + resource_kinds: + - OperatorGroup + replace_resource: false + register: operatorgroups_result + when: + - namespace_result.success + - secrets_result is defined and secrets_result.success + - configmaps_result is defined and configmaps_result.success + +# 4. Restore Subscription +# ----------------------------------------------------------------------------- +- name: Restore Subscriptions + ibm.mas_devops.restore_resource: + backup_path: "{{ db2_backup_path }}" + resource_kinds: + - Subscription + replace_resource: false + register: subscriptions_result + when: + - namespace_result.success + - operatorgroups_result is defined and operatorgroups_result.success + +# 5. Restore Certs and Issuers +# ----------------------------------------------------------------------------- +- name: "Restore Certificate Manager resources" + ibm.mas_devops.restore_resource: + backup_path: "{{ db2_backup_path }}" + resource_kinds: + - Issuer + - Certificate + register: certmanager_result + when: + - namespace_result.success + - subscriptions_result is defined and subscriptions_result.success + +# 6. Wait until the Db2uCluster CRD is available +# ----------------------------------------------------------------------------- +- name: "Wait until the Db2uCluster CRD is available" + include_tasks: "{{ role_path }}/../../common_tasks/wait_for_crd.yml" + vars: + crd_name: "db2uclusters.db2u.databases.ibm.com" + +# 7. Restore Db2uCluster +# ----------------------------------------------------------------------------- + +- name: "Overriding storage class name with default storage class in CR" + include_tasks: "tasks/restore/determine-storage-classes.yml" + when: + - override_storageclass | bool + +- name: "Retrieve db2 passwords from secrets to restore via Db2uCluster CR" + include_tasks: "tasks/restore/retrieve_db2_passwords.yml" + no_log: true + +- name: "Restore Db2uCluster resources" + ibm.mas_devops.restore_resource: + backup_path: "{{ db2_backup_path }}" + resource_kinds: + - Db2uCluster + override_values: + Db2uCluster: + - spec.environment.database.ssl.secretName: "db2u-certificate-{{ db2_instance_name | lower }}" + - spec.environment.instance.password: "{{ _db2_instance_password }}" + - spec.environment.ldap.password: "{{ _db2_ldappassword }}" + - spec.environment.ldap.blueAdminPassword: "{{ _db2_ldapblueadmin_password }}" + - spec.storage: "{{ _db2_storages }}" + register: cr_result + when: + - namespace_result.success + +# 8. Set timezone +# ----------------------------------------------------------------------------- +- name: "Determine if timezone is set in the CR" + set_fact: + db2_timezone: "{{ db2ucluster_cr_cfg.spec.advOpts.timezone | default('') }}" + +- name: "Set db2 instance timezone" + include_tasks: "tasks/install/setup_timezone.yml" + when: + - db2_timezone is defined and db2_timezone != "" + +# 9. 
Wait for the cluster to be ready +# ----------------------------------------------------------------------------- +- name: "Wait for db2u instance to be ready (5m delay)" + kubernetes.core.k8s_info: + api_version: db2u.databases.ibm.com/v1 + name: "{{ db2_instance_name | lower }}" + namespace: "{{db2_namespace}}" + kind: Db2uCluster + register: db2_cluster_lookup + until: + - db2_cluster_lookup.resources is defined + - db2_cluster_lookup.resources | length == 1 + - db2_cluster_lookup.resources[0].status is defined + - db2_cluster_lookup.resources[0].status.state is defined + - db2_cluster_lookup.resources[0].status.state == "Ready" + retries: 24 # Approximately 2 hours before we give up + delay: 300 # 5 minutes + when: + - namespace_result.success + - cr_result.success + no_log: true # no_log because spits out big json object + +# 10. Restore route +- name: "Get cluster subdomain" + kubernetes.core.k8s_info: + api_version: config.openshift.io/v1 + kind: Ingress + name: cluster + register: _cluster_subdomain + no_log: true # no_log because spits out big json object + +# Apply cluster domain in tls route +- name: "Restore routes" + ibm.mas_devops.restore_resource: + backup_path: "{{ db2_backup_path }}" + resource_kinds: + - Route + override_values: + Route: + - spec.host: "{{db2_instance_name | lower }}-{{db2_namespace}}.{{_cluster_subdomain.resources[0].spec.domain}}" + register: routes_result + when: + - namespace_result.success + - subscriptions_result is defined and subscriptions_result.success + +# 11. Restore LDAP user +# ----------------------------------------------------------------------------- +- name: Restore LDAP user if username and password is provided + include_tasks: tasks/install/create_ldap_user.yml + when: + - db2_ldap_username is defined and db2_ldap_username != "" + - db2_ldap_password is defined and db2_ldap_password != "" + +# 12. 
Wait for the statefulset to be ready +# ----------------------------------------------------------------------------- +- name: "Wait for Db2 Stateful set to be ready" + kubernetes.core.k8s_info: + api_version: apps/v1 + kind: StatefulSet + name: "c-{{ db2_instance_name | lower }}-db2u" + namespace: "{{ db2_namespace }}" + register: db2_sts + until: + - db2_sts.resources is defined + - db2_sts.resources | length > 0 + - db2_sts.resources[0].status is defined + - db2_sts.resources[0].status.replicas is defined + - db2_sts.resources[0].status.readyReplicas is defined + - db2_sts.resources[0].status.readyReplicas == db2_sts.resources[0].status.replicas + retries: 20 # approx 10 minutes before we give up + delay: 30 # seconds + no_log: true # no_log because spits out big json object + +# Calculate total results +# ----------------------------------------------------------------------------- +- name: "Calculate total restore results" + set_fact: + total_created: >- + {{ + (namespace_result.created_count | default(0)) + + (secrets_result.created_count | default(0)) + + (configmaps_result.created_count | default(0)) + + (operatorgroups_result.created_count | default(0)) + + (subscriptions_result.created_count | default(0)) + + (certmanager_result.created_count | default(0)) + + (cr_result.created_count | default(0)) + + (routes_result.created_count | default(0)) + }} + total_updated: >- + {{ + (namespace_result.updated_count | default(0)) + + (secrets_result.updated_count | default(0)) + + (configmaps_result.updated_count | default(0)) + + (operatorgroups_result.updated_count | default(0)) + + (subscriptions_result.updated_count | default(0)) + + (certmanager_result.updated_count | default(0)) + + (cr_result.updated_count | default(0)) + + (routes_result.updated_count | default(0)) + }} + total_skipped: >- + {{ + (namespace_result.skipped_count | default(0)) + + (secrets_result.skipped_count | default(0)) + + (configmaps_result.skipped_count | default(0)) + + (operatorgroups_result.skipped_count | default(0)) + + (subscriptions_result.skipped_count | default(0)) + + (certmanager_result.skipped_count | default(0)) + + (cr_result.skipped_count | default(0)) + + (routes_result.skipped_count | default(0)) + }} + total_failed: >- + {{ + (namespace_result.failed_count | default(0)) + + (secrets_result.failed_count | default(0)) + + (configmaps_result.failed_count | default(0)) + + (operatorgroups_result.failed_count | default(0)) + + (subscriptions_result.failed_count | default(0)) + + (certmanager_result.failed_count | default(0)) + + (cr_result.failed_count | default(0)) + + (routes_result.failed_count | default(0)) + }} + +- name: "Display total restore results" + debug: + msg: + - >- + Restore completed{{ ' with failures' if total_failed | int > 0 + else ' successfully' }} + - "Total resources created: {{ total_created }}" + - "Total resources updated: {{ total_updated }}" + - "Total resources skipped: {{ total_skipped }}" + - "Total resources failed: {{ total_failed }}" + +# Fail task if any errors occurred +# ----------------------------------------------------------------------------- +- name: "Collect all failed resources" + set_fact: + all_failed_resources: >- + {{ + (namespace_result.failed_resources | default([])) + + (secrets_result.failed_resources | default([])) + + (configmaps_result.failed_resources | default([])) + + (operatorgroups_result.failed_resources | default([])) + + (subscriptions_result.failed_resources | default([])) + + (certmanager_result.failed_resources | default([])) + + 
(cr_result.failed_resources | default([])) + + (routes_result.failed_resources | default([])) + }} + +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ all_failed_resources | to_nice_yaml }}" + when: total_failed | int > 0 + +- name: "Fail if restore had errors" + fail: + msg: | + Restore failed for {{ total_failed }} resource(s): + {% for resource in all_failed_resources %} + - {{ resource.description }}: {{ resource.error }} + {% endfor %} + when: total_failed | int > 0 diff --git a/ibm/mas_devops/roles/db2/tasks/restore/retrieve_db2_passwords.yml b/ibm/mas_devops/roles/db2/tasks/restore/retrieve_db2_passwords.yml new file mode 100644 index 0000000000..dcb59d28c8 --- /dev/null +++ b/ibm/mas_devops/roles/db2/tasks/restore/retrieve_db2_passwords.yml @@ -0,0 +1,63 @@ +--- +# Retrieving instance password from secret to restore via Db2uCluster CR +- name: "Check instance password secret is present in /secrets/ dir" + no_log: true + stat: + path: "{{ db2_resources_path }}/secrets/c-{{ db2_instance_name }}-instancepassword.yaml" + register: instancepassword_stat + +- name: Read YAML file and get password value + no_log: true + set_fact: + _db2_instance_password: "{{ lookup('file', '{{ db2_resources_path }}/secrets/c-{{ db2_instance_name }}-instancepassword.yaml') | from_yaml | json_query('data.password') | b64decode }}" + when: instancepassword_stat.stat.exists + +# Retrieving ldapblueadminpassword from secrets to restore via Db2uCluster CR +- name: "Check ldapblueadminpassword password secret is present in /secrets/ dir" + no_log: true + stat: + path: "{{ db2_resources_path }}/secrets/c-{{ db2_instance_name }}-ldapblueadminpassword.yaml" + register: ldapblueadminpassword_stat + +- name: Read YAML file and get password value + no_log: true + set_fact: + _db2_ldapblueadmin_password: "{{ lookup('file', '{{ db2_resources_path }}/secrets/c-{{ db2_instance_name }}-ldapblueadminpassword.yaml') | from_yaml | json_query('data.password') | b64decode }}" + when: ldapblueadminpassword_stat.stat.exists + + +# Retrieving ldappassword from secrets to restore via Db2uCluster CR +- name: "Check ldappassword password secret is present in /secrets/ dir" + no_log: true + stat: + path: "{{ db2_resources_path }}/secrets/c-{{ db2_instance_name }}-ldappassword.yaml" + register: ldappassword_stat + +- name: Read YAML file and get password value + no_log: true + set_fact: + _db2_ldappassword: "{{ lookup('file', '{{ db2_resources_path }}/secrets/c-{{ db2_instance_name }}-ldappassword.yaml') | from_yaml | json_query('data.password') | b64decode }}" + when: ldappassword_stat.stat.exists + +# Retrieving jdbc username/password from secrets to restore via Db2uCluster CR +- name: "Check jdbc-{{ db2_instance_name }}-credentials password secret is present in /secrets/ dir" + no_log: true + stat: + path: "{{ db2_resources_path }}/secrets/jdbc-{{ db2_instance_name }}-credentials.yaml" + register: jdbcsecret_stat + +- name: Read YAML file and get username value + no_log: true + set_fact: + _jdbc_username: "{{ lookup('file', '{{ db2_resources_path }}/secrets/jdbc-{{ db2_instance_name }}-credentials.yaml') | from_yaml | json_query('data.username') | b64decode }}" + when: jdbcsecret_stat.stat.exists + +- name: Read YAML file and get password value + no_log: true + set_fact: + db2_ldap_username: "{{ _jdbc_username}}" + db2_ldap_password: "{{ lookup('file', '{{ db2_resources_path }}/secrets/jdbc-{{ db2_instance_name }}-credentials.yaml') | from_yaml | json_query('data.password') | b64decode }}" + when: 
+ - _jdbc_username is defined and _jdbc_username != "" + - _jdbc_username != "db2inst1" + - jdbcsecret_stat.stat.exists diff --git a/ibm/mas_devops/roles/db2/templates/backup/db2_backup.sh.j2 b/ibm/mas_devops/roles/db2/templates/backup/db2_backup.sh.j2 new file mode 100644 index 0000000000..9937ce8f77 --- /dev/null +++ b/ibm/mas_devops/roles/db2/templates/backup/db2_backup.sh.j2 @@ -0,0 +1,340 @@ +#!/bin/bash + + +function stopDBConnections() +{ + databaseName=${1} + + echo "" + echo "STEP 1: Temporarily disable the built-in HA" + if [ -f /etc/wolverine/config.json ]; then + wvcli system disable -m "Disable HA before Db2 maintenance" + echo "INFO: Wolverine HA disabled" + else + echo "INFO: Wolverine HA not enabled" + fi + + echo "" + echo "STEP 2: Connect to the database" + connect_status=$(db2 -v connect to ${databaseName}) + rc=$? + if [ $rc -ne 0 ]; then + if echo "$connect_status" | grep -q "SQL1032N"; then + echo "INFO: Database ${databaseName} is not currently active. Proceeding with restore" + elif echo "$connect_status" | grep -q "SQL1013N"; then + echo "INFO: Database ${databaseName} does not exist. Proceeding with restore" + elif echo "$connect_status" | grep -q "SQL30081N"; then + echo "ERROR: Network error while connecting to database ${databaseName}" + exit 1 + elif echo "$connect_status" | grep -q "SQL1119N"; then + echo "ERROR: $connect_status" + exit 1 + else + echo "INFO: $connect_status. Proceeding with restore" + fi + fi + + echo "" + echo "STEP 3: Disconnect all applications connected to Db2" + db2 -v force application all + echo "INFO: Waiting for connections to close (30 seconds)..." + sleep 30 + + echo "" + echo "STEP 4: Terminate the database" + db2 -v terminate + + echo "" + echo "STEP 5: Stop the database" + db2stop force + + echo "" + echo "STEP 6: Clean Db2 interprocess communications" + ipclean -a + + echo "" + echo "STEP 7: Disable database communications" + db2set -null DB2COMM + + echo "" + echo "STEP 8: Restart database in restricted access mode" + db2start admin mode restricted access + echo "INFO: Waiting for database to start in restricted access mode (60 seconds)..." + sleep 60 +} + +function startDB() +{ + databaseName=${1} + + echo "" + echo "STEP 10: Halt the restricted access mode" + db2stop force + + echo "" + echo "STEP 11: Clean Db2 interprocess communications" + ipclean -a + + echo "" + echo "STEP 12: Reinitialize the Db2 communication manager to accept database connections" + db2set DB2COMM=TCPIP,SSL + + echo "" + echo "STEP 13: Restart database for normal operation" + db2start + echo "INFO: Waiting for database to start (60 seconds)..." + sleep 60 + + echo "" + echo "STEP 14: Activate the database" + db2 activate db ${databaseName} + rc=$? + if [ $rc -ne 0 ]; then + echo "ERROR: Failed to activate database ${databaseName}. Return code: $rc" + else + echo "INFO: Database activated successfully" + fi + + echo "" + echo "STEP 15: Re-enable Wolverine high availability" + sudo wvcli system enable -m "Enable HA after Db2 maintenance" + rc=$? + if [ $rc -ne 0 ]; then + echo "ERROR: Failed to enable HA. Return code: $rc" + else + echo "INFO: Wolverine HA re-enabled successfully" + fi + + echo "" + echo "STEP 16: Show HA monitoring status" + sudo wvcli system status + sudo wvcli system devices + + echo "" + echo "STEP 17: Connect to the database" + db2 -v connect to ${databaseName} + rc=$? + if [ $rc -ne 0 ]; then + echo "ERROR: Failed to connect to database ${databaseName}. 
Return code: $rc" + else + echo "INFO: Successfully connected to database ${databaseName}" + fi +} + +DATABASE={{ db2_dbname }} +BACKUP_TYPE={{ backup_type }} +VENDOR={{ backup_vendor }} +APP_ID={{ mas_application_id }} +### for S3, BACKUP_PATH=DB2REMOTE://{{ backup_s3_alias }}/{{ backup_s3_bucket }}/backups-db2-{{ mas_application_id }}/{{ db2_backup_version }} +### for disk, BACKUP_PATH=/mnt/backup/{{ db2_backup_version }}/data +BACKUP_PATH={{ full_backup_path }} +DB2_BACKUP_VERSION={{ db2_backup_version }} + +### only for disk, DISK_BACKUP_BASE=/mnt/backup/{{ db2_backup_version }} where tar is stored +DISK_BACKUP_BASE={{ base_backup_path }} + +# Finding the db2 Instance owner +INSTOWNER=`/usr/local/bin/db2greg -dump | grep -ae "I," | grep -v "/das," | awk -F ',' '{print $4}' ` + +instance=`whoami` + +# Find the home directory +instance_home=`/usr/local/bin/db2greg -dump | grep -ae "I," | grep -v "/das," | grep "${instance}" | awk -F ',' '{print $5}'| cut -d/ -f 1,2,3,4,5` + +if [ ! -f "$instance_home/sqllib/db2profile" ] +then + echo "ERROR: $instance_home/sqllib/db2profile not found" + EXIT_STATUS=1 +else + . $instance_home/sqllib/db2profile +fi + +### Check for the existance of /home/ctginst1/sqllib/db2dump/libdb2compr.so...if it exists, delete it +COMPRESS_LOC=$instance_home/sqllib/db2dump/libdb2compr.so +if [[ -f ${COMPRESS_LOC} ]] +then + rm ${COMPRESS_LOC} +fi + +### Check to see if the database is Running +ps -ef | grep db2sys | grep -v grep > /dev/null 2>&1 +if [ $? -eq 1 ]; then + echo "ERROR: Database is not active" + echo "Database Not Active,BACKUP Not Run" > $instance_home/bin/LASTbkupRUN + exit +fi + +### Create Backup Directory if it does not exist when vendor is disk +if [ ${VENDOR} = 'disk' ] ; then + if [ ! -d "${BACKUP_PATH}" ] ; then + echo "INFO: Creating backup directory: ${BACKUP_PATH}" + mkdir -p ${BACKUP_PATH} + if [ $? -ne 0 ] ; then + echo "ERROR: Could not create backup directory ${BACKUP_PATH}" + exit 1 + fi + echo "INFO: Backup directory created successfully" + fi +fi + +### Check LOGARCHMETH1 is not OFF to proceed if ONLINE backup +if [ ${BACKUP_TYPE} = 'online' ] ; then + echo "INFO: Checking if LOGARCHMETH1 is ON to proceed with ONLINE backup..." + logarchmeth1_cmd=$(db2 get db cfg for ${DATABASE} | grep LOGARCHMETH1 | awk -F'= ' '{print $2}') + + if [ ${logarchmeth1_cmd} = 'OFF' ]; then + echo "ERROR: LOGARCHMETH1 is OFF. Cannot proceed with online backup, Please choose offline backup option. Exiting..." + exit 1 + fi + echo "INFO: LOGARCHMETH1 is ON: $logarchmeth1_cmd" +fi + +# ============================================================================ +# Start Backup Process +# ============================================================================ +DATETIME=`date +%Y-%m-%d_%H%M%S`; +echo "" +echo "========================================================================" +echo " DB2 BACKUP PROCESS STARTED" +echo "========================================================================" +echo " Database : ${DATABASE}" +echo " Backup Type : ${BACKUP_TYPE}" +echo " Backup Vendor : ${VENDOR}" +echo " Backup Path : ${BACKUP_PATH}" +echo " Start Time : ${DATETIME}" +echo "========================================================================" +echo "" + + +# ============================================================================ +# Execute Backup Command +# ============================================================================ +if [ ${BACKUP_TYPE} = 'online' ] ; then + echo "INFO: Performing full online backup of database ${DATABASE}..." 
+  db2 -v archive log for db $DATABASE
+  sleep 30
+  echo "INFO: Command: db2 -v backup db ${DATABASE} on all dbpartitionnums online to ${BACKUP_PATH} compress UTIL_IMPACT_PRIORITY 50 include logs without prompting"
+  BACKUP_CMD=$(db2 -v backup db $DATABASE on all dbpartitionnums online to $BACKUP_PATH compress UTIL_IMPACT_PRIORITY 50 include logs without prompting)
+elif [ ${BACKUP_TYPE} = 'offline' ] ; then
+  echo "INFO: Performing full offline backup of database ${DATABASE}..."
+
+  stopDBConnections $DATABASE
+
+  echo ""
+  echo "STEP 9: Backup database in offline mode"
+  BACKUP_CMD=$(db2 -v backup db $DATABASE on all dbpartitionnums to $BACKUP_PATH compress UTIL_IMPACT_PRIORITY 50 without prompting)
+
+  startDB $DATABASE
+
+else
+  echo "ERROR: Invalid backup type: ${BACKUP_TYPE}"
+  exit 1
+fi
+
+if echo "$BACKUP_CMD" | grep -q "Backup successful."; then
+  echo "$BACKUP_CMD"
+  backup_timestamp=`echo "$BACKUP_CMD" | grep timestamp | cut -d: -f2`
+  echo ""
+  echo "INFO: Backup operation completed successfully"
+else
+  echo ""
+  echo "========================================================================"
+  echo "                             BACKUP FAILED"
+  echo "========================================================================"
+  echo "ERROR: Backup operation did not complete successfully" >&2
+  echo "ERROR: Timestamp: $(date '+%Y-%m-%d %H:%M:%S')" >&2
+  echo "========================================================================"
+  exit 1
+fi
+
+# ============================================================================
+# Copy Keystore Files
+# ============================================================================
+
+# Get KEYSTORE_LOCATION from dbm cfg
+KEYSTORE_LOC=$(db2 get dbm cfg | grep KEYSTORE_LOCATION | awk -F'= ' '{print $2}')
+
+# Get .sth file by replacing the .p12 to .sth
+STH_LOC=$(echo $KEYSTORE_LOC | sed 's/\.p12$/.sth/')
+
+if [ ${VENDOR} = 's3' ] ; then
+  echo ""
+  echo "INFO: Copying keystore files to COS location..."
+  set -x
+  TARGET1=keystore.p12
+  TARGET2=keystore.sth
+
+  db2RemStgManager ALIAS PUT source=${KEYSTORE_LOC} target=${BACKUP_PATH}/${TARGET1}
+  db2RemStgManager ALIAS PUT source=${STH_LOC} target=${BACKUP_PATH}/${TARGET2}
+  set +x
+  echo "INFO: Keystore files copied to COS location successfully"
+else
+  echo ""
+  echo "INFO: Copying keystore files to local disk location..."
+  cp ${KEYSTORE_LOC} ${BACKUP_PATH}/keystore.p12
+  cp ${STH_LOC} ${BACKUP_PATH}/keystore.sth
+  echo "INFO: Keystore files copied to local disk location successfully"
+fi
+
+# ============================================================================
+# Create Backup Information File
+# ============================================================================
+echo ""
+echo "INFO: Creating db2-backup-info.yaml file..."
+YAML_FILE=/tmp/db2-backup-info.yaml
+cat <<EOT >> ${YAML_FILE}
+source_db2_backup_version: {{ db2_backup_version }}
+database: ${DATABASE}
+app_id: ${APP_ID}
+backup_type: ${BACKUP_TYPE}
+backup_vendor: ${VENDOR}
+vendor_backup_path: ${BACKUP_PATH}
+backup_timestamp: ${backup_timestamp}
+source_db2_instance_name: {{ db2_instance_name }}
+source_db2_instance_version: {{ db2_version }}
+source_db2_instance_channel: {{ db2_channel }}
+status: SUCCESS
+EOT
+echo "INFO: db2-backup-info.yaml file created at ${YAML_FILE}"
+
+if [ ${VENDOR} = 's3' ] ; then
+  echo "INFO: Uploading db2-backup-info.yaml file to COS location..."
+ db2RemStgManager ALIAS PUT source=${YAML_FILE} target=${BACKUP_PATH}/db2-backup-info.yaml + echo "INFO: db2-backup-info.yaml file uploaded successfully" +fi + +# ============================================================================ +# Archive Backup Files (Disk Vendor Only) +# ============================================================================ +if [ ${VENDOR} = 'disk' ] ; then + cp ${YAML_FILE} ${BACKUP_PATH}/db2-backup-info.yaml + echo "" + echo "INFO: Creating tar archive of backup files..." + tar -czf ${DISK_BACKUP_BASE}/db2-${APP_ID}-${DATABASE}-backup-${DB2_BACKUP_VERSION}.tar.gz -C ${BACKUP_PATH} . + echo "INFO: Tar archive created successfully" + echo "" + echo "Disk Usage:" + du -h ${BACKUP_PATH}/* +else + echo "" + echo "INFO: Backup vendor is not disk. Skipping tar archive creation." +fi + +echo "BACKUP_FILE_TIMESTAMP=${backup_timestamp}" + +# ============================================================================ +# Backup Completion Summary +# ============================================================================ +DATETIME=`date +%Y-%m-%d_%H%M%S`; +echo "" +echo "========================================================================" +echo " DB2 BACKUP COMPLETED SUCCESSFULLY" +echo "========================================================================" +echo " Database : ${DATABASE}" +echo " Backup Type : ${BACKUP_TYPE}" +echo " Backup Vendor : ${VENDOR}" +echo " Backup Path : ${BACKUP_PATH}" +echo " End Time : ${DATETIME}" +echo " Backup Timestamp: ${backup_timestamp}" +echo "========================================================================" +echo "" diff --git a/ibm/mas_devops/roles/db2/templates/backup/db2_restore_disk.sh.j2 b/ibm/mas_devops/roles/db2/templates/backup/db2_restore_disk.sh.j2 new file mode 100644 index 0000000000..4451c130aa --- /dev/null +++ b/ibm/mas_devops/roles/db2/templates/backup/db2_restore_disk.sh.j2 @@ -0,0 +1,332 @@ +#!/bin/bash + +DATABASE={{ db2_dbname }} +BACKUP_TYPE={{ backup_type }} +DB2_BACKUP_VERSION={{ db2_backup_version }} +DB2_BACKUP_TIMESTAMP={{ db2_backup_timestamp }} + +# Finding the db2 Instance owner +INSTOWNER=`/usr/local/bin/db2greg -dump | grep -ae "I," | grep -v "/das," | awk -F ',' '{print $4}' ` + +instance=`whoami` +BACKUP_BASE=/mnt/backup +# Find the home directory +instance_home=`/usr/local/bin/db2greg -dump | grep -ae "I," | grep -v "/das," | grep "${instance}" | awk -F ',' '{print $5}'| cut -d/ -f 1,2,3,4,5` + +if [ ! -f "$instance_home/sqllib/db2profile" ] +then + echo "ERROR: $instance_home/sqllib/db2profile not found" + EXIT_STATUS=1 +else + . $instance_home/sqllib/db2profile +fi + +### Verify the backup files + +### Check if keystore files are present + +POD_BACKUP_PATH="{{ pod_full_backup_path }}" + +if [[ ! -f "${POD_BACKUP_PATH}/keystore.p12" ]]; then + echo "ERROR: Db2 keystore.p12 file is missing in the backup location: ${POD_BACKUP_PATH}" + exit 1 +fi + +if [[ ! -f "${POD_BACKUP_PATH}/keystore.sth" ]]; then + echo "ERROR: Db2 keystore.sth file is missing in the backup location: ${POD_BACKUP_PATH}" + exit 1 +fi + +### Check if backup files are present +ls ${POD_BACKUP_PATH}/${DATABASE}*.DBPART000.* 2>/dev/null +rc=$? +if [ $rc -ne 0 ]; then + echo "ERROR: Db2 backup files are not found in the backup location: ${POD_BACKUP_PATH}" + exit 1 +fi + +# from the backup check command, check if the logs are included in the backup files. 
+BACKUP_FILES=${POD_BACKUP_PATH}/${DATABASE}*.DBPART000.* +backup_contain_logs=true +for FILE in $BACKUP_FILES; do + echo "INFO: Verifying downloaded backup $FILE" + bkpchk_cmd=$(db2ckbkp -h ${FILE}) + if [ $rc -ne 0 ]; then + echo "ERROR: Failed to verify backup file ${FILE}. Return code: $rc" + exit 1 + else + echo "INFO: Successfully verified: $(basename "$FILE")" + echo "INFO: Check if the backup contains logs" + logs_value=$(echo "$bkpchk_cmd" | grep 'Includes Logs' | awk -F '-- ' '{print $2}' | awk '{print $1}') + # 0 when the backup doesn't contain logs + if [ ${logs_value} = '0' ]; then + echo "INFO: Backup doesn't contain logs" + backup_contain_logs=false + fi + fi +done +echo "INFO: Db2 backup files verified successfully." + +echo "INFO: Checking if LOGARCHMETH1 is ON." +logarchmeth1_enabled=true +logarchmeth1_cmd=$(db2 get db cfg for ${DATABASE} | grep LOGARCHMETH1 | awk -F'= ' '{print $2}') +if [ ${logarchmeth1_cmd} = 'OFF' ]; then + echo "INFO: LOGARCHMETH1 is OFF." + logarchmeth1_enabled=false +else + echo "INFO: LOGARCHMETH1 is ON." +fi + +echo "INFO: Db2 keystore files verified successfully." + +# Extract Db2 keystore master key +# https://www.ibm.com/docs/en/db2/11.5?topic=edr-restoring-encrypted-backup-image-different-system-local-keystore + +MASTER_KEY_LABEL=`gsk8capicmd_64 -cert -list all -db ${POD_BACKUP_PATH}/keystore.p12 -stashed 2>&1 | grep -i $INSTOWNER_$DATABASE | awk '/^#/ { print $2 }'` +if [ -z "$MASTER_KEY_LABEL" ]; then + echo "ERROR: MASTER_KEY_LABEL is empty. Cannot proceed with secret key extraction." + exit 1 +else + ### Check if MASTER_KEY_LABEL exists in the source keystore + SOURCE_KEYSTORE_LOC=$(db2 get dbm cfg | grep KEYSTORE_LOCATION | awk -F'= ' '{print $2}') + SOURCE_LABELS=$(gsk8capicmd_64 -cert -list all -db ${SOURCE_KEYSTORE_LOC} -stashed 2>&1 | awk '/^#/ { print $2 }') + if ! echo "$SOURCE_LABELS" | grep -q "^${MASTER_KEY_LABEL}$"; then + echo "INFO: MASTER_KEY_LABEL '${MASTER_KEY_LABEL}' not found in source keystore." + gsk8capicmd_64 -secretkey -extract -db ${POD_BACKUP_PATH}/keystore.p12 -stashed -label ${MASTER_KEY_LABEL} -format ascii -target ${POD_BACKUP_PATH}/master_key_label.kdb + echo "INFO: MASTER_KEY_LABEL is not empty - Secret key extraction Completed." + ## Add Master key to target keystore + result=$(gsk8capicmd_64 -secretkey -add -db ${SOURCE_KEYSTORE_LOC} -stashed -label ${MASTER_KEY_LABEL} -file ${POD_BACKUP_PATH}/master_key_label.kdb 2>&1) + rc=$? + if [ $rc -ne 0 ]; then + echo "ERROR: Error adding master key to target keystore. Return code: $rc" + exit 1 + fi + + if echo "$result" | grep -q "CTGSK3005W"; then + echo "ERROR: CTGSK3005W warning detected. Exiting" + echo "$result" + exit 1 + fi + else + echo "INFO: MASTER_KEY_LABEL '${MASTER_KEY_LABEL}' already exists in target keystore. Skipping addition." + fi +fi + +echo "INFO: Db2 keystore master key restored successfully." 
+
+# Deactivate Db2 in preparation for restore
+# https://www.ibm.com/docs/en/db2/11.5?topic=r-restoring-db2-from-online-backup-using-commands
+
+dbRestoreSuccess=true
+dbRollbackSuccess=true
+
+echo ""
+echo "========================================================================"
+echo "                   PREPARING DATABASE FOR RESTORE"
+echo "========================================================================"
+echo ""
+
+echo "STEP 1: Temporarily disable the built-in HA"
+if [ -f /etc/wolverine/config.json ]; then
+  wvcli system disable -m "Disable HA before Db2 maintenance"
+else
+  echo "INFO: Wolverine HA not enabled"
+fi
+
+echo ""
+echo "STEP 2: Connect to the database"
+connect_status=$(db2 -v connect to ${DATABASE})
+rc=$?
+if [ $rc -ne 0 ]; then
+  if echo "$connect_status" | grep -q "SQL1032N"; then
+    echo "INFO: Database ${DATABASE} is not currently active. Proceeding with restore."
+  elif echo "$connect_status" | grep -q "SQL1013N"; then
+    echo "INFO: Database ${DATABASE} does not exist. Proceeding with restore."
+  elif echo "$connect_status" | grep -q "SQL30081N"; then
+    echo "ERROR: Network error while connecting to database ${DATABASE}"
+    exit 1
+  elif echo "$connect_status" | grep -q "SQL1119N"; then
+    echo "ERROR: $connect_status"
+    exit 1
+  else
+    echo "INFO: $connect_status. Proceeding with restore."
+  fi
+fi
+
+echo ""
+echo "STEP 3: Disconnect all the applications that are connected to Db2"
+db2 -v force application all
+
+echo ""
+echo "INFO: Waiting for connections to close, sleep for 30 seconds"
+sleep 30
+
+echo ""
+echo "STEP 4: Terminate the database"
+db2 -v terminate
+
+echo ""
+echo "STEP 5: Stop the database"
+db2stop force
+
+echo ""
+echo "STEP 6: Ensure that all Db2 interprocess communications are cleaned for the instance"
+ipclean -a
+
+echo ""
+echo "STEP 7: Turn off all communications to the database by setting the value of the DB2COMM variable to null"
+db2set -null DB2COMM
+
+echo ""
+echo "STEP 8: Restart the database in restricted access mode"
+db2start admin mode restricted access
+
+echo ""
+echo "INFO: Waiting for db to start"
+sleep 60
+
+echo ""
+echo "========================================================================"
+echo "                     EXECUTING DATABASE RESTORE"
+echo "========================================================================"
+echo ""
+
+echo "STEP 9: Restore Db2 from a full backup"
+if [[ "$backup_contain_logs" == "true" && "$logarchmeth1_enabled" == "true" ]]; then
+  restore_cmd=$(db2 -v restore db ${DATABASE} from ${POD_BACKUP_PATH} taken at ${DB2_BACKUP_TIMESTAMP} into ${DATABASE} logtarget ${POD_BACKUP_PATH} replace existing without prompting)
+else
+  echo "INFO: Restoring database ${DATABASE} from backup timestamp ${DB2_BACKUP_TIMESTAMP} WITHOUT logs..."
+  restore_cmd=$(db2 -v restore db ${DATABASE} from ${POD_BACKUP_PATH} taken at ${DB2_BACKUP_TIMESTAMP} into ${DATABASE} replace existing without prompting)
+fi
+rc=$?
+if [ $rc -ne 0 ]; then
+  echo "$restore_cmd"
+  if echo "$restore_cmd" | grep -q "Restore is successful"; then
+    echo "INFO: Database ${DATABASE} restored successfully from backup."
+  else
+    echo "ERROR: Error restoring database ${DATABASE} from backup. Return code: $rc"
+    dbRestoreSuccess=false
+  fi
+fi
+
+echo ""
+echo "STEP 10: Check Db2 rollforward status"
+if [[ "$backup_contain_logs" == "true" && "$logarchmeth1_enabled" == "true" ]]; then
+  db2 -v rollforward db ${DATABASE} query status
+  rc=$?
+  if [ $rc -ne 0 ]; then
+    echo "ERROR: Error querying rollforward status for database ${DATABASE}. Return code: $rc"
+  fi
+else
+  echo "INFO: Skipping rollforward status check as either backup does not contain logs or logarchmeth1 is OFF."
+fi
+
+echo ""
+echo "STEP 11: Run Db2 rollforward command"
+if [[ "$backup_contain_logs" == "true" && "$logarchmeth1_enabled" == "true" ]]; then
+  db2 -v rollforward db ${DATABASE} to end of backup and complete overflow log path "(${POD_BACKUP_PATH})" noretrieve
+  rc=$?
+  if [ $rc -ne 0 ]; then
+    echo "ERROR: Error during rollforward for database ${DATABASE}. Return code: $rc"
+    dbRollbackSuccess=false
+  fi
+else
+  echo "INFO: Skipping rollforward as either backup does not contain logs or logarchmeth1 is OFF."
+fi
+
+echo ""
+echo "========================================================================"
+echo "               RESTARTING DATABASE FOR NORMAL OPERATION"
+echo "========================================================================"
+echo ""
+
+echo "STEP 12: Stop the database"
+db2stop force
+rc=$?
+if [ $rc -ne 0 ]; then
+  echo "ERROR: Error stopping database ${DATABASE}. Return code: $rc"
+fi
+
+echo ""
+echo "STEP 13: Ensure that all Db2 interprocess communications are cleaned for the instance"
+ipclean -a
+rc=$?
+if [ $rc -ne 0 ]; then
+  echo "ERROR: Error cleaning Db2 interprocess communications. Return code: $rc"
+fi
+
+echo ""
+echo "STEP 14: Reinitialize the Db2 communication manager to accept database connections"
+db2set DB2COMM=TCPIP,SSL
+rc=$?
+if [ $rc -ne 0 ]; then
+  echo "ERROR: Error reinitializing Db2 communication manager. Return code: $rc"
+fi
+
+echo ""
+echo "STEP 15: Restart the database for normal operation"
+db2start
+rc=$?
+if [ $rc -ne 0 ]; then
+  echo "ERROR: Error starting database ${DATABASE}. Return code: $rc"
+fi
+
+echo ""
+echo "INFO: waiting for db to start, sleep 60 seconds"
+sleep 60
+
+echo ""
+echo "STEP 16: Activate the database"
+db2 activate db ${DATABASE}
+rc=$?
+if [ $rc -ne 0 ]; then
+  echo "ERROR: Error activating database ${DATABASE}. Return code: $rc"
+fi
+
+echo ""
+echo "STEP 17: Re-enable the Wolverine high availability monitoring process"
+sudo wvcli system enable -m "Enable HA after Db2 maintenance"
+rc=$?
+if [ $rc -ne 0 ]; then
+  echo "ERROR: Error enabling HA. Return code: $rc"
+fi
+
+echo ""
+echo "STEP 18: Connect to the database"
+db2 -v connect to ${DATABASE}
+rc=$?
+if [ $rc -ne 0 ]; then
+  echo "ERROR: Error connecting to database ${DATABASE}.
Return code: $rc" +fi + +# Cleanup the backup files from local path +echo "INFO: Cleaning up local backup files at: ${POD_BACKUP_PATH}" +rm -rf ${POD_BACKUP_PATH} + +echo "" +echo "========================================================================" +echo " RESTORE COMPLETION STATUS" +echo "========================================================================" + +if [ "$dbRestoreSuccess" = false ]; then + echo " Status: FAILED-RestoreFailed" + echo " Reason: Database restore encountered errors" + echo "========================================================================" + exit 1 +fi + +if [ "$dbRollbackSuccess" = false ]; then + echo " Status: FAILED-RestoreFailed" + echo " Reason: Database rollforward encountered errors" + echo "========================================================================" + exit 1 +fi + +echo " Database : ${DATABASE}" +echo " Backup Timestamp: ${DB2_BACKUP_TIMESTAMP}" +echo " POD Path : ${POD_BACKUP_PATH}" +echo " Status : SUCCESS-RestoreSuccess" +echo "========================================================================" +echo "" +exit 0 + diff --git a/ibm/mas_devops/roles/db2/templates/backup/db2_restore_s3.sh.j2 b/ibm/mas_devops/roles/db2/templates/backup/db2_restore_s3.sh.j2 new file mode 100644 index 0000000000..a96f0043ad --- /dev/null +++ b/ibm/mas_devops/roles/db2/templates/backup/db2_restore_s3.sh.j2 @@ -0,0 +1,453 @@ +#!/bin/bash +DB2_BACKUP_VERSION={{ db2_backup_version }} + +# Finding the db2 Instance owner +INSTOWNER=`/usr/local/bin/db2greg -dump | grep -ae "I," | grep -v "/das," | awk -F ',' '{print $4}' ` + +instance=`whoami` +BACKUP_BASE=/mnt/backup + +# Find the home directory +instance_home=`/usr/local/bin/db2greg -dump | grep -ae "I," | grep -v "/das," | grep "${instance}" | awk -F ',' '{print $5}'| cut -d/ -f 1,2,3,4,5` + +if [ ! -f "$instance_home/sqllib/db2profile" ] +then + echo "ERROR: $instance_home/sqllib/db2profile not found" + EXIT_STATUS=1 +else + . $instance_home/sqllib/db2profile +fi + +REMOTE_PATH="{{ s3_full_backup_path }}" + +# ============================================================================ +# Verify Backup Files in S3 +# ============================================================================ +echo "" +echo "========================================================================" +echo " DB2 RESTORE FROM S3 BACKUP" +echo "========================================================================" +echo " Remote Path: ${REMOTE_PATH}" +echo "========================================================================" +echo "" + +echo "INFO: Verifying backup files in S3 path: ${REMOTE_PATH}" + +LIST_OUTPUT=$(db2RemStgManager ALIAS LIST source=${REMOTE_PATH} 2>&1) +rc=$? +if [ $rc -ne 0 ]; then + echo "ERROR: Failed to list files in S3 path ${REMOTE_PATH}. Return code: $rc" + exit 1 +fi + +echo "$LIST_OUTPUT" + +TOTAL_FILES=$(echo "$LIST_OUTPUT" | awk -F= '/Total number of files found/ { gsub(/[^0-9]/,"",$2); print $2 }') +if [ "$TOTAL_FILES" -eq 0 ]; then + echo "ERROR: No files found in S3 path: ${REMOTE_PATH}" + exit 1 +fi + +FILES=$(echo "$LIST_OUTPUT" | awk '/^[0-9]/ { print $2 }') + +FOUND_P12=false +FOUND_STH=false +#FOUND_INFO=false +FOUND_BACKUP=false +BACKUP_FILE="" + +# ============================================================================ +# Validate Required Backup Files +# ============================================================================ +echo "" +echo "INFO: Validating required backup files..." 
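+
+# Note: a complete S3 backup location is expected to contain the Db2 keystore
+# files (keystore.p12 and keystore.sth) plus at least one backup image matching
+# *.DBPART000.*; the validation loop below enforces exactly those checks.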
+ +for FILE in $FILES; do + if [[ "$FILE" == *.p12 ]]; then + FOUND_P12=true + elif [[ "$FILE" == *.sth ]]; then + FOUND_STH=true +# elif [[ "$FILE" == *db2-backup-info.yaml ]]; then +# FOUND_INFO=true + elif [[ "$FILE" == *.DBPART000.* ]]; then + FOUND_BACKUP=true + BACKUP_FILE=$(basename "$FILE") + fi +done + +if [[ "$FOUND_P12" == false ]]; then + echo "ERROR: Db2 keystore.p12 file is missing in the backup location: ${REMOTE_PATH}" + exit 1 +fi + +if [[ "$FOUND_STH" == false ]]; then + echo "ERROR: Db2 keystore.sth file is missing in the backup location: ${REMOTE_PATH}" + exit 1 +fi + +#if [[ "$FOUND_INFO" == false ]]; then +# echo "ERROR: Db2 db2-backup-info.yaml file is missing in the backup location: ${REMOTE_PATH}" +# exit 1 +#fi + +if [[ "$FOUND_BACKUP" == false ]]; then + echo "ERROR: Db2 backup files are missing in the backup location: ${REMOTE_PATH}" + exit 1 +fi + +echo "INFO: All required backup files verified successfully" + +# ============================================================================ +# Download Backup Files from S3 +# ============================================================================ +POD_BACKUP_PATH="/mnt/backup/{{ db2_backup_version }}" +rm -rf ${POD_BACKUP_PATH} +mkdir -p ${POD_BACKUP_PATH} + +echo "" +echo "INFO: Downloading keystore files from S3 to local path: ${POD_BACKUP_PATH}" +db2RemStgManager ALIAS GET source=${REMOTE_PATH}/keystore.p12 target=${POD_BACKUP_PATH}/keystore.p12 +db2RemStgManager ALIAS GET source=${REMOTE_PATH}/keystore.sth target=${POD_BACKUP_PATH}/keystore.sth +#db2RemStgManager ALIAS GET source=${REMOTE_PATH}/db2-backup-info.yaml target=${POD_BACKUP_PATH}/db2-backup-info.yaml +echo "INFO: Keystore files downloaded successfully" + +echo "" +echo "INFO: Downloading backup image files from S3..." + +# from the backup check command, check if the logs are included in the backup files. +backup_contain_logs=true +for FILE in $FILES; do + if [[ "$FILE" == *.DBPART000.* ]]; then + echo "INFO: Downloading backup file: $(basename "$FILE")" + db2RemStgManager ALIAS GET source=${REMOTE_PATH}/$(basename "$FILE") target=${POD_BACKUP_PATH}/$(basename "$FILE") + rc=$? + if [ $rc -ne 0 ]; then + echo "ERROR: Failed to download backup file ${FILE}. Return code: $rc" + exit 1 + else + echo "INFO: Successfully downloaded: $(basename "$FILE")" + fi + + echo "INFO: Verifying downloaded backup file" + bkpchk_cmd=$(db2ckbkp -h ${POD_BACKUP_PATH}/$(basename "$FILE")) + rc=$? + if [ $rc -ne 0 ]; then + echo "ERROR: Failed to verify backup file ${FILE}. Return code: $rc" + exit 1 + else + echo "INFO: Successfully verified: $(basename "$FILE")" + # Check if the backup contains logs + echo "INFO: Check if the backup contains logs" + logs_value=$(echo "$bkpchk_cmd" | grep 'Includes Logs' | awk -F '-- ' '{print $2}' | awk '{print $1}') + # 0 when the backup doesn't contain logs + if [ ${logs_value} = '0' ]; then + echo "INFO: Backup doesn't contain logs" + backup_contain_logs=false + fi + fi + fi +done +echo "INFO: All backup files downloaded successfully" + +# ============================================================================ +# Extract Database Information and Restore Keystore Master Key +# ============================================================================ +# Reference: https://www.ibm.com/docs/en/db2/11.5?topic=edr-restoring-encrypted-backup-image-different-system-local-keystore + +echo "" +echo "INFO: Extracting database information from backup file..." 
+# Example backup file name: DBNAME.0.db2inst1.DBPART000.20231015120000.001
+DATABASE=$(echo "$BACKUP_FILE" | awk -F. '{print $1}')
+echo "INFO: Database name: ${DATABASE}"
+
+DB2_BACKUP_TIMESTAMP=$(echo "$BACKUP_FILE" | awk -F. '{print $5}')
+echo "INFO: Backup timestamp: ${DB2_BACKUP_TIMESTAMP}"
+
+echo "INFO: Checking if LOGARCHMETH1 is ON."
+logarchmeth1_enabled=true
+logarchmeth1_cmd=$(db2 get db cfg for ${DATABASE} | grep LOGARCHMETH1 | awk -F'= ' '{print $2}')
+if [ "${logarchmeth1_cmd}" = 'OFF' ]; then
+  echo "INFO: LOGARCHMETH1 is OFF."
+  logarchmeth1_enabled=false
+else
+  echo "INFO: LOGARCHMETH1 is ON."
+fi
+
+
+echo ""
+echo "INFO: Extracting keystore master key..."
+MASTER_KEY_LABEL=`gsk8capicmd_64 -cert -list all -db ${POD_BACKUP_PATH}/keystore.p12 -stashed 2>&1 | grep -i "${INSTOWNER}_${DATABASE}" | awk '/^#/ { print $2 }'`
+if [ -z "$MASTER_KEY_LABEL" ]; then
+  echo "ERROR: MASTER_KEY_LABEL is empty. Cannot proceed with secret key extraction"
+  exit 1
+else
+  echo "INFO: Master key label: ${MASTER_KEY_LABEL}"
+
+  # Check if MASTER_KEY_LABEL exists in the target keystore
+  SOURCE_KEYSTORE_LOC=$(db2 get dbm cfg | grep KEYSTORE_LOCATION | awk -F'= ' '{print $2}')
+  SOURCE_LABELS=$(gsk8capicmd_64 -cert -list all -db ${SOURCE_KEYSTORE_LOC} -stashed 2>&1 | awk '/^#/ { print $2 }')
+  if ! echo "$SOURCE_LABELS" | grep -q "^${MASTER_KEY_LABEL}$"; then
+    echo "INFO: Master key label '${MASTER_KEY_LABEL}' not found in target keystore"
+    echo "INFO: Extracting secret key from backup keystore..."
+    gsk8capicmd_64 -secretkey -extract -db ${POD_BACKUP_PATH}/keystore.p12 -stashed -label ${MASTER_KEY_LABEL} -format ascii -target ${POD_BACKUP_PATH}/master_key_label.kdb
+    echo "INFO: Secret key extraction completed"
+
+    echo "INFO: Adding master key to target keystore..."
+    result=$(gsk8capicmd_64 -secretkey -add -db ${SOURCE_KEYSTORE_LOC} -stashed -label ${MASTER_KEY_LABEL} -file ${POD_BACKUP_PATH}/master_key_label.kdb 2>&1)
+    rc=$?
+    if [ $rc -ne 0 ]; then
+      echo "ERROR: Failed to add master key to target keystore. Return code: $rc"
+      exit 1
+    fi
+
+    if echo "$result" | grep -q "CTGSK3005W"; then
+      echo "ERROR: CTGSK3005W warning detected"
+      echo "$result"
+      exit 1
+    fi
+    echo "INFO: Master key added to target keystore successfully"
+  else
+    echo "INFO: Master key label '${MASTER_KEY_LABEL}' already exists in target keystore. Skipping addition"
+  fi
+fi
+
+echo "INFO: Keystore master key restored successfully"
+
+# ============================================================================
+# Prepare Database for Restore
+# ============================================================================
+# Reference: https://www.ibm.com/docs/en/db2/11.5?topic=r-restoring-db2-from-online-backup-using-commands
+
+dbRestoreSuccess=true
+dbRollbackSuccess=true
+
+echo ""
+echo "========================================================================"
+echo " PREPARING DATABASE FOR RESTORE"
+echo "========================================================================"
+echo ""
+
+echo "STEP 1: Temporarily disable the built-in HA"
+if [ -f /etc/wolverine/config.json ]; then
+  wvcli system disable -m "Disable HA before Db2 maintenance"
+  echo "INFO: Wolverine HA disabled"
+else
+  echo "INFO: Wolverine HA not enabled"
+fi
+
+echo ""
+echo "STEP 2: Connect to the database"
+connect_status=$(db2 -v connect to ${DATABASE})
+rc=$?
+if [ $rc -ne 0 ]; then
+  if echo "$connect_status" | grep -q "SQL1032N"; then
+    echo "INFO: Database ${DATABASE} is not currently active. 
Proceeding with restore"
+  elif echo "$connect_status" | grep -q "SQL1013N"; then
+    echo "INFO: Database ${DATABASE} does not exist. Proceeding with restore"
+  elif echo "$connect_status" | grep -q "SQL30081N"; then
+    echo "ERROR: Network error while connecting to database ${DATABASE}"
+    exit 1
+  elif echo "$connect_status" | grep -q "SQL1119N"; then
+    echo "ERROR: $connect_status"
+    exit 1
+  else
+    echo "INFO: $connect_status. Proceeding with restore"
+  fi
+fi
+
+echo ""
+echo "STEP 3: Disconnect all applications connected to Db2"
+db2 -v force application all
+echo "INFO: Waiting for connections to close (30 seconds)..."
+sleep 30
+
+echo ""
+echo "STEP 4: Terminate the database"
+db2 -v terminate
+
+echo ""
+echo "STEP 5: Stop the database"
+db2stop force
+
+echo ""
+echo "STEP 6: Clean Db2 interprocess communications"
+ipclean -a
+
+echo ""
+echo "STEP 7: Disable database communications"
+db2set -null DB2COMM
+
+echo ""
+echo "STEP 8: Restart database in restricted access mode"
+db2start admin mode restricted access
+echo "INFO: Waiting for database to start (60 seconds)..."
+sleep 60
+
+# ============================================================================
+# Execute Database Restore
+# ============================================================================
+echo ""
+echo "========================================================================"
+echo " EXECUTING DATABASE RESTORE"
+echo "========================================================================"
+echo ""
+
+echo "STEP 9: Restore Db2 from full backup"
+
+if [[ "$backup_contain_logs" == "true" && "$logarchmeth1_enabled" == "true" ]]; then
+  echo "INFO: Restoring database ${DATABASE} from backup timestamp ${DB2_BACKUP_TIMESTAMP}..."
+  restore_cmd=$(db2 -v restore db ${DATABASE} from ${POD_BACKUP_PATH} taken at ${DB2_BACKUP_TIMESTAMP} into ${DATABASE} logtarget ${POD_BACKUP_PATH} replace existing without prompting)
+else
+  echo "INFO: Restoring database ${DATABASE} from backup timestamp ${DB2_BACKUP_TIMESTAMP} WITHOUT logs..."
+  restore_cmd=$(db2 -v restore db ${DATABASE} from ${POD_BACKUP_PATH} taken at ${DB2_BACKUP_TIMESTAMP} into ${DATABASE} replace existing without prompting)
+fi
+rc=$?
+if [ $rc -ne 0 ]; then
+  echo "$restore_cmd"
+  if echo "$restore_cmd" | grep -q "Restore is successful"; then
+    echo "INFO: Database ${DATABASE} restored successfully from backup"
+  else
+    echo "ERROR: Failed to restore database ${DATABASE} from backup. Return code: $rc"
+    dbRestoreSuccess=false
+  fi
+else
+  echo "INFO: Database restore completed successfully"
+fi
+
+echo ""
+if [[ "$backup_contain_logs" == "true" && "$logarchmeth1_enabled" == "true" ]]; then
+  echo "STEP 10: Check Db2 rollforward status"
+  db2 -v rollforward db ${DATABASE} query status
+  rc=$?
+  if [ $rc -ne 0 ]; then
+    echo "ERROR: Failed to query rollforward status for database ${DATABASE}. Return code: $rc"
+  fi
+else
+  echo "STEP 10: Check Db2 rollforward status"
+  echo "INFO: Skipping rollforward status as either backup does not contain logs or logarchmeth1 is OFF."
+fi
+
+echo ""
+echo "STEP 11: Execute Db2 rollforward"
+if [[ "$backup_contain_logs" == "true" && "$logarchmeth1_enabled" == "true" ]]; then
+  echo "INFO: Rolling forward database ${DATABASE}..."
+  db2 -v rollforward db ${DATABASE} to end of backup and complete overflow log path "(${POD_BACKUP_PATH})" noretrieve
+  rc=$?
+  if [ $rc -ne 0 ]; then
+    echo "ERROR: Rollforward failed for database ${DATABASE}. 
Return code: $rc" + dbRollbackSuccess=false + else + echo "INFO: Rollforward completed successfully" + fi +else + echo "INFO: Skipping rollforward as either backup does not contain logs or logarchmeth1 is OFF." +fi +echo "" + +# ============================================================================ +# Restart Database for Normal Operation +# ============================================================================ +echo "" +echo "========================================================================" +echo " RESTARTING DATABASE FOR NORMAL OPERATION" +echo "========================================================================" +echo "" + +echo "STEP 12: Stop the database" +db2stop force +rc=$? +if [ $rc -ne 0 ]; then + echo "ERROR: Failed to stop database ${DATABASE}. Return code: $rc" +fi + +echo "" +echo "STEP 13: Clean Db2 interprocess communications" +ipclean -a +rc=$? +if [ $rc -ne 0 ]; then + echo "ERROR: Failed to clean Db2 interprocess communications. Return code: $rc" +fi + +echo "" +echo "STEP 14: Reinitialize Db2 communication manager" +db2set DB2COMM=TCPIP,SSL +rc=$? +if [ $rc -ne 0 ]; then + echo "ERROR: Failed to reinitialize Db2 communication manager. Return code: $rc" +fi + +echo "" +echo "STEP 15: Restart database for normal operation" +db2start +rc=$? +if [ $rc -ne 0 ]; then + echo "ERROR: Failed to start database ${DATABASE}. Return code: $rc" +fi +echo "INFO: Waiting for database to start (60 seconds)..." +sleep 60 + +echo "" +echo "STEP 16: Activate the database" +db2 activate db ${DATABASE} +rc=$? +if [ $rc -ne 0 ]; then + echo "ERROR: Failed to activate database ${DATABASE}. Return code: $rc" +else + echo "INFO: Database activated successfully" +fi + +echo "" +echo "STEP 17: Re-enable Wolverine high availability" +sudo wvcli system enable -m "Enable HA after Db2 maintenance" +rc=$? +if [ $rc -ne 0 ]; then + echo "ERROR: Failed to enable HA. Return code: $rc" +else + echo "INFO: Wolverine HA re-enabled successfully" +fi + +echo "" +echo "STEP 18: Connect to the database" +db2 -v connect to ${DATABASE} +rc=$? +if [ $rc -ne 0 ]; then + echo "ERROR: Failed to connect to database ${DATABASE}. 
Return code: $rc" +else + echo "INFO: Successfully connected to database ${DATABASE}" +fi + +# ============================================================================ +# Cleanup and Final Status +# ============================================================================ +echo "" +echo "INFO: Cleaning up local backup files at: ${POD_BACKUP_PATH}" +rm -rf ${POD_BACKUP_PATH} +echo "INFO: Cleanup completed" + +echo "" +echo "========================================================================" +echo " RESTORE COMPLETION STATUS" +echo "========================================================================" + +if [ "$dbRestoreSuccess" = false ]; then + echo " Status: FAILED-RestoreFailed" + echo " Reason: Database restore encountered errors" + echo "========================================================================" + exit 1 +fi + +if [ "$dbRollbackSuccess" = false ]; then + echo " Status: FAILED-RestoreFailed" + echo " Reason: Database rollforward encountered errors" + echo "========================================================================" + exit 1 +fi + +echo " Database : ${DATABASE}" +echo " Backup Timestamp: ${DB2_BACKUP_TIMESTAMP}" +echo " Remote Path : ${REMOTE_PATH}" +echo " Status : SUCCESS-RestoreSuccess" +echo "========================================================================" +echo "" +exit 0 + diff --git a/ibm/mas_devops/roles/db2/templates/backup/download_from_cos.sh.j2 b/ibm/mas_devops/roles/db2/templates/backup/download_from_cos.sh.j2 new file mode 100644 index 0000000000..7e692ea9e6 --- /dev/null +++ b/ibm/mas_devops/roles/db2/templates/backup/download_from_cos.sh.j2 @@ -0,0 +1,21 @@ +#!/bin/bash + +### COS_SOURCE_PATH=DB2REMOTE://backup_s3_alias/backup_s3_bucket/backups-db2/db2_backup_version +COS_SOURCE_PATH={{ cos_source_path }} +DB2_BACKUP_VERSION={{ db2_backup_version }} +TARGET_PATH=/mnt/backup/backups-db2/$DB2_BACKUP_VERSION + +instance=`whoami` + +# Find the home directory +instance_home=`/usr/local/bin/db2greg -dump | grep -ae "I," | grep -v "/das," | grep "${instance}" | awk -F ',' '{print $5}'| cut -d/ -f 1,2,3,4,5` +DOWNLOAD_LOG=$instance_home/bin/download_LOG.out + +db2RemStgManager ALIAS GET source=${COS_SOURCE_PATH}/${ibm_cos_path} target=$download_path/$file_name_to_be_downloaded 2>&1 | tee -a ${DOWNLOAD_LOG} + +rc=$? +if [ $rc -ne 0 ]; then + echo "FAILED_DOWNLOAD_BACKUP db2RemStgManager ALIAS GET source=${COS_SOURCE_PATH}/${ibm_cos_path} target=$download_path/$file_name_to_be_downloaded rc=$rc" >> ${DOWNLOAD_BACK_LOG} + echo "status=fail" >> ${DOWNLOAD_BACK_LOG} + exit 1 +fi \ No newline at end of file diff --git a/ibm/mas_devops/roles/db2/templates/backup/setup_cos_storage_access.sh.j2 b/ibm/mas_devops/roles/db2/templates/backup/setup_cos_storage_access.sh.j2 new file mode 100644 index 0000000000..000ca14a93 --- /dev/null +++ b/ibm/mas_devops/roles/db2/templates/backup/setup_cos_storage_access.sh.j2 @@ -0,0 +1,8 @@ +#!/bin/bash + +if db2 list storage access | grep {{ backup_s3_alias }}; then + echo "{{ backup_s3_alias }} is available already." +else + echo "{{ backup_s3_alias }} is not available. 
Creating" + db2 catalog storage access alias {{ backup_s3_alias }} VENDOR S3 server {{ backup_s3_endpoint }} user {{ backup_s3_access_key }} password {{ backup_s3_secret_key }} container {{ backup_s3_bucket }} +fi \ No newline at end of file diff --git a/ibm/mas_devops/roles/db2/templates/db2ucluster.yml.j2 b/ibm/mas_devops/roles/db2/templates/db2ucluster.yml.j2 index 4794b674d1..aa0d858dda 100644 --- a/ibm/mas_devops/roles/db2/templates/db2ucluster.yml.j2 +++ b/ibm/mas_devops/roles/db2/templates/db2ucluster.yml.j2 @@ -65,6 +65,9 @@ spec: secretName: "db2u-certificate-{{db2_instance_name}}" certLabel: "CN=db2u" instance: +{% if db2_instance_password is defined and db2_instance_password != "" %} + password: "{{ db2_instance_password }}" +{% endif %} {% if db2_instance_registry is defined and db2_instance_registry != "" %} registry: {{ db2_instance_registry }} diff --git a/ibm/mas_devops/roles/download_backup_archive/README.md b/ibm/mas_devops/roles/download_backup_archive/README.md new file mode 100644 index 0000000000..cad68f967d --- /dev/null +++ b/ibm/mas_devops/roles/download_backup_archive/README.md @@ -0,0 +1,412 @@ +# download_backup_archive +Downloads and extracts MAS backup archives from AWS S3 or Artifactory. + +This role automates the process of downloading MAS backup archives from remote storage locations and extracting them to a local directory for restore operations. It supports downloading from both AWS S3 (or S3-compatible storage) and Artifactory repositories. The role handles archive verification, extraction, and cleanup operations. + +Key features: +- Downloads compressed tar.gz archives from AWS S3 or S3-compatible storage +- Downloads compressed tar.gz archives from Artifactory repositories +- **Support for downloading multiple archives in a single operation** +- **Auto-generation of archive names based on component versions** +- **Archive management option to keep downloaded archives organized** +- Automatic extraction of downloaded archives +- Configurable download timeouts for large archives +- Optional cleanup of temporary files after extraction +- Verification of downloaded archive integrity + +## Prerequisites + +### For S3 Download +- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) or boto3 Python library must be installed +- `amazon.aws` Ansible collection must be installed +- AWS credentials with S3 read permissions +- S3 bucket must exist and be accessible + +### For Artifactory Download +- `curl` command-line tool must be installed +- Artifactory API token with download permissions +- Artifactory repository must exist and be accessible + +## Role Variables + +### Required Variables + +#### mas_instance_id +Instance ID of the MAS instance. This is used to identify the backup directories. + +- **Required** +- Environment Variable: `MAS_INSTANCE_ID` +- Default Value: None + +#### mas_restore_dir +Directory where the backup archive will be downloaded and extracted. This is the parent directory where all component backup directories will be restored. + +- **Required** +- Environment Variable: `MAS_RESTORE_DIR` +- Default Value: None + +### Backup Archive Variables + +#### backup_version +Version identifier for the backup to download. This must match the version used when the backup was created. When downloading multiple archives, this is used as the default version for all components unless specific component versions are provided. 
+ +- **Required** +- Environment Variable: `BACKUP_VERSION` +- Default Value: None + +#### backup_archive_names +List of archive names to download. Can be provided as a comma-separated string or as a list. If not provided, the role will auto-generate archive names based on component versions. + +Examples: +- String format: `"archive1.tar.gz,archive2.tar.gz,archive3.tar.gz"` +- List format: `["archive1.tar.gz", "archive2.tar.gz", "archive3.tar.gz"]` + +- **Optional** +- Environment Variable: `BACKUP_ARCHIVE_NAMES` +- Default Value: Auto-generated from component versions + +#### Component-Specific Backup Versions + +These variables allow you to specify different backup versions for each component. If not provided, they default to the value of `backup_version`. These are used to auto-generate archive names when `backup_archive_names` is not provided. + +- `ibm_catalogs_backup_version` - Environment Variable: `IBM_CATALOGS_BACKUP_VERSION` +- `certmanager_backup_version` - Environment Variable: `CERTMANAGER_BACKUP_VERSION` +- `mongodb_backup_version` - Environment Variable: `MONGODB_BACKUP_VERSION` +- `sls_backup_version` - Environment Variable: `SLS_BACKUP_VERSION` +- `db2_backup_version` - Environment Variable: `DB2_BACKUP_VERSION` +- `suite_backup_version` - Environment Variable: `SUITE_BACKUP_VERSION` +- `manage_backup_version` - Environment Variable: `MANAGE_BACKUP_VERSION` + +All are **Optional** and default to `backup_version`. + +### S3 Download Variables + +Provide these variables to download the backup archive from AWS S3 or S3-compatible storage. If S3 credentials are provided, S3 download takes precedence over Artifactory. + +#### aws_access_key_id +AWS access key ID for authentication. + +- **Required for S3 download** +- Environment Variable: `S3_ACCESS_KEY_ID` +- Default Value: None + +#### aws_secret_access_key +AWS secret access key for authentication. + +- **Required for S3 download** +- Environment Variable: `S3_SECRET_ACCESS_KEY` +- Default Value: None + +#### s3_bucket_name +Name of the S3 bucket where the archive is stored. + +- **Required for S3 download** +- Environment Variable: `S3_BUCKET_NAME` +- Default Value: None + +#### s3_region +AWS region where the S3 bucket is located. + +- **Optional** +- Environment Variable: `S3_REGION` +- Default Value: `us-east-1` + +#### s3_endpoint_url +Custom S3 endpoint URL for S3-compatible storage services (e.g., MinIO, Wasabi, IBM Cloud Object Storage). + +- **Optional** +- Environment Variable: `S3_ENDPOINT_URL` +- Default Value: None (uses AWS S3 endpoints) + +### Artifactory Download Variables + +Provide these variables to download the backup archive from Artifactory. Artifactory download is used only if S3 credentials are not provided. + +#### artifactory_username +Artifactory username for authentication. + +- **Required for Artifactory download** +- Environment Variable: `ARTIFACTORY_USERNAME` +- Default Value: None + +#### artifactory_token +Artifactory API token for authentication. + +- **Required for Artifactory download** +- Environment Variable: `ARTIFACTORY_TOKEN` +- Default Value: None + +#### artifactory_repository +Name of the Artifactory repository where the archive is stored. + +- **Required for Artifactory download** +- Environment Variable: `ARTIFACTORY_REPOSITORY` +- Default Value: None + +#### artifactory_url +Base URL of the Artifactory server (e.g., `https://artifactory.example.com/artifactory`). 
+
+- **Optional**
+- Environment Variable: `ARTIFACTORY_URL`
+- Default Value: `https://na.artifactory.swg-devops.com/artifactory`
+
+### General Configuration
+
+#### backup_temp_dir
+Temporary directory where the archives will be downloaded before extraction. The directory is created if it doesn't exist and can be cleaned up after extraction.
+
+- **Optional**
+- Environment Variable: None
+- Default Value: `{{ mas_restore_dir }}/mas-{{ mas_instance_id }}-restore-{{ backup_version }}`
+
+#### download_timeout
+Maximum time in seconds to wait for the download to complete. Useful for large archives or slow network connections.
+
+- **Optional**
+- Environment Variable: `DOWNLOAD_TIMEOUT_SECS`
+- Default Value: `10800` (3 hours)
+
+#### extract_archive
+Whether to automatically extract the downloaded archive. Set to `false` if you only want to download the archive without extracting it.
+
+- **Optional**
+- Environment Variable: `EXTRACT_ARCHIVE`
+- Default Value: `true`
+
+#### cleanup_archive
+Whether to remove the archive file and temporary directory after successful extraction. Set to `false` to keep the downloaded archive.
+
+- **Optional**
+- Environment Variable: `CLEANUP_ARCHIVE`
+- Default Value: `true`
+
+#### include_sls_archive
+Whether to download the SLS archive (`-sls.tar.gz`) from S3 or Artifactory.
+
+**Important:** When set to `false`, the role will automatically skip downloading the SLS archive from S3 or Artifactory. This prevents unnecessary downloads when SLS component restore is not needed.
+
+- **Optional**
+- Environment Variable: `INCLUDE_SLS_ARCHIVE`
+- Default Value: `true`
+
+#### include_manage_app_archive
+Whether to download the Manage app archive (`-app-manage.tar.gz`) from S3 or Artifactory.
+
+**Important:** When set to `false`, the role will automatically skip downloading the Manage app archive from S3 or Artifactory. This prevents unnecessary downloads when Manage component restore is not needed.
+
+- **Optional**
+- Environment Variable: `INCLUDE_MANAGE_APP_ARCHIVE`
+- Default Value: `true`
+
+#### include_manage_db_archive
+Whether to download the Manage Db2 archive (`-db2u-manage.tar.gz`) from S3 or Artifactory.
+
+**Important:** When set to `false`, the role will automatically skip downloading the Manage Db2 archive (`-db2u-manage.tar.gz`) from S3 or Artifactory. This prevents unnecessary downloads when Manage component restore is not needed.
+
+- **Optional**
+- Environment Variable: `INCLUDE_MANAGE_DB_ARCHIVE`
+- Default Value: `true`
+
+## Example Playbook
+After installing the Ansible Collection you can include this role in your own custom playbooks. 
+
+### Download Multiple Archives from S3 (Auto-generated names)
+ +```yaml +- hosts: localhost + vars: + mas_instance_id: inst1 + mas_restore_dir: /restore/mas + backup_version: "20260117-191500" + aws_access_key_id: "{{ lookup('env', 'AWS_ACCESS_KEY_ID') }}" + aws_secret_access_key: "{{ lookup('env', 'AWS_SECRET_ACCESS_KEY') }}" + s3_bucket_name: my-mas-backups + s3_region: us-west-2 + roles: + - ibm.mas_devops.download_backup_archive +``` + +### Download Specific Archives from S3 +```yaml +- hosts: localhost + vars: + mas_instance_id: inst1 + mas_restore_dir: /restore/mas + backup_version: "20260117-191500" + backup_archive_names: + - mas-inst1-backup-20260117-191500-catalog.tar.gz + - mas-inst1-backup-20260117-191500-suite.tar.gz + - mas-inst1-backup-20260117-191500-manage.tar.gz + aws_access_key_id: "{{ lookup('env', 'AWS_ACCESS_KEY_ID') }}" + aws_secret_access_key: "{{ lookup('env', 'AWS_SECRET_ACCESS_KEY') }}" + s3_bucket_name: my-mas-backups + s3_region: us-west-2 + roles: + - ibm.mas_devops.download_backup_archive +``` + +### Download with Different Component Versions +```yaml +- hosts: localhost + vars: + mas_instance_id: inst1 + mas_restore_dir: /restore/mas + backup_version: "20260117-191500" + mongodb_backup_version: "20260116-120000" + db2_backup_version: "20260115-100000" + aws_access_key_id: "{{ lookup('env', 'AWS_ACCESS_KEY_ID') }}" + aws_secret_access_key: "{{ lookup('env', 'AWS_SECRET_ACCESS_KEY') }}" + s3_bucket_name: my-mas-backups + s3_region: us-west-2 + roles: + - ibm.mas_devops.download_backup_archive +``` + +### Artifactory Download + +```yaml +- hosts: localhost + vars: + mas_instance_id: inst1 + mas_restore_dir: /restore/mas + backup_version: "20260117-191500" + artifactory_username: "{{ lookup('env', 'ARTIFACTORY_USERNAME') }}" + artifactory_token: "{{ lookup('env', 'ARTIFACTORY_TOKEN') }}" + artifactory_url: https://artifactory.example.com/artifactory + artifactory_repository: mas-backups + roles: + - ibm.mas_devops.download_backup_archive +``` + +### S3-Compatible Storage (IBMcloud, MinIO, Wasabi, etc.) + +```yaml +- hosts: localhost + vars: + mas_instance_id: inst1 + mas_restore_dir: /restore/mas + backup_version: "20260117-191500" + aws_access_key_id: "{{ lookup('env', 'S3_ACCESS_KEY') }}" + aws_secret_access_key: "{{ lookup('env', 'S3_SECRET_KEY') }}" + s3_bucket_name: mas-backups + s3_region: us-east-1 + s3_endpoint_url: https://s3.example.com + roles: + - ibm.mas_devops.download_backup_archive +``` + +### Download Without Extraction + +```yaml +- hosts: localhost + vars: + mas_instance_id: inst1 + mas_restore_dir: /restore/mas + backup_version: "20260117-191500" + extract_archive: false + cleanup_archive: false + aws_access_key_id: "{{ lookup('env', 'AWS_ACCESS_KEY_ID') }}" + aws_secret_access_key: "{{ lookup('env', 'AWS_SECRET_ACCESS_KEY') }}" + s3_bucket_name: my-mas-backups + roles: + - ibm.mas_devops.download_backup_archive +``` + +### Download with Archive Management - without Manage DB2 archives + +```yaml +- hosts: localhost + vars: + mas_instance_id: inst1 + mas_restore_dir: /restore/mas + backup_version: "20260117-191500" + include_manage_db_archive: false + aws_access_key_id: "{{ lookup('env', 'AWS_ACCESS_KEY_ID') }}" + aws_secret_access_key: "{{ lookup('env', 'AWS_SECRET_ACCESS_KEY') }}" + s3_bucket_name: my-mas-backups + roles: + - ibm.mas_devops.download_backup_archive +``` + +## Run Role Playbook +After installing the Ansible Collection you can easily run the role standalone using the `run_role` playbook provided. 
+
+### Download Multiple Archives from S3 (Auto-generated)
+
+```bash
+export MAS_INSTANCE_ID=inst1
+export MAS_RESTORE_DIR=/restore/mas
+export BACKUP_VERSION=20260117-191500
+export S3_ACCESS_KEY_ID=your_access_key
+export S3_SECRET_ACCESS_KEY=your_secret_key
+export S3_BUCKET_NAME=my-mas-backups
+export S3_REGION=us-west-2
+ROLE_NAME=download_backup_archive ansible-playbook ibm.mas_devops.run_role
+```
+
+### Download Specific Archives from S3
+
+```bash
+export MAS_INSTANCE_ID=inst1
+export MAS_RESTORE_DIR=/restore/mas
+export BACKUP_VERSION=20260117-191500
+export BACKUP_ARCHIVE_NAMES="mas-inst1-backup-20260117-191500-catalog.tar.gz,mas-inst1-backup-20260117-191500-suite.tar.gz"
+export S3_ACCESS_KEY_ID=your_access_key
+export S3_SECRET_ACCESS_KEY=your_secret_key
+export S3_BUCKET_NAME=my-mas-backups
+export S3_REGION=us-west-2
+ROLE_NAME=download_backup_archive ansible-playbook ibm.mas_devops.run_role
+```
+
+### Download with Archive Management - without Manage DB2 archives
+
+```bash
+export MAS_INSTANCE_ID=inst1
+export MAS_RESTORE_DIR=/restore/mas
+export BACKUP_VERSION=20260117-191500
+export INCLUDE_MANAGE_DB_ARCHIVE=false
+export S3_ACCESS_KEY_ID=your_access_key
+export S3_SECRET_ACCESS_KEY=your_secret_key
+export S3_BUCKET_NAME=my-mas-backups
+export S3_REGION=us-west-2
+ROLE_NAME=download_backup_archive ansible-playbook ibm.mas_devops.run_role
+```
+
+### Artifactory Download
+
+```bash
+export MAS_INSTANCE_ID=inst1
+export MAS_RESTORE_DIR=/restore/mas
+export BACKUP_VERSION=20260117-191500
+export ARTIFACTORY_USERNAME=your_username
+export ARTIFACTORY_TOKEN=your_token
+export ARTIFACTORY_URL=https://artifactory.example.com/artifactory
+export ARTIFACTORY_REPOSITORY=mas-backups
+ROLE_NAME=download_backup_archive ansible-playbook ibm.mas_devops.run_role
+```
+
+## Extracted Directory Structure
+
+After successful execution, the role will extract the backup archives to the restore directory with the following structure:
+
+```
+/restore/mas/
+├── backup-20260117-191500-catalog/
+├── backup-20260117-191500-certmanager/
+├── backup-20260117-191500-sls/
+├── backup-20260117-191500-mongoce/
+├── backup-20260117-191500-app-/
+├── backup-20260117-191500-db2u-/
+└── backup-20260117-191500-suite/
+```
+
+The exact directories present will depend on which components were included in the original backup. 
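+
+As a quick sanity check before running any restore roles, you can list the extracted backup directories; a minimal sketch, assuming the same `MAS_RESTORE_DIR` used in the examples above:
+
+```bash
+# Each extracted directory follows the backup-<version>-<component> naming pattern
+ls -d ${MAS_RESTORE_DIR}/backup-*/
+```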
+ +## Related Roles + +- `upload_backup_archive` - Creates and uploads MAS backup archives to S3 or Artifactory +- `mongodb` - MongoDB backup and restore operations +- `db2` - Db2 backup and restore operations + +## License +EPL-2.0 \ No newline at end of file diff --git a/ibm/mas_devops/roles/download_backup_archive/defaults/main.yml b/ibm/mas_devops/roles/download_backup_archive/defaults/main.yml new file mode 100644 index 0000000000..326e54dfaf --- /dev/null +++ b/ibm/mas_devops/roles/download_backup_archive/defaults/main.yml @@ -0,0 +1,46 @@ +--- +# Required variables + +# Directory where the backup archive will be downloaded and extracted +mas_restore_dir: "{{ lookup('env', 'MAS_RESTORE_DIR') }}" +mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" + +# Backup version to download +backup_version: "{{ lookup('env', 'BACKUP_VERSION') }}" + +# Multiple archives download - list of archive names to download +# If not provided, will auto-generate based on backup_version and component versions +backup_archive_names: "{{ lookup('env', 'BACKUP_ARCHIVE_NAMES') }}" + +# Specific backup versions for each component (optional - override backup_version) +# These are used to auto-generate archive names when backup_archive_names is not provided +ibm_catalogs_backup_version: "{{ lookup('env', 'IBM_CATALOGS_BACKUP_VERSION') | default (backup_version, true) }}" +certmanager_backup_version: "{{ lookup('env', 'CERTMANAGER_BACKUP_VERSION') | default (backup_version, true) }}" +mongodb_backup_version: "{{ lookup('env', 'MONGODB_BACKUP_VERSION') | default (backup_version, true) }}" +sls_backup_version: "{{ lookup('env', 'SLS_BACKUP_VERSION') | default (backup_version, true) }}" +db2_backup_version: "{{ lookup('env', 'DB2_BACKUP_VERSION') | default (backup_version, true) }}" +suite_backup_version: "{{ lookup('env', 'SUITE_BACKUP_VERSION') | default (backup_version, true) }}" +manage_backup_version: "{{ lookup('env', 'MANAGE_BACKUP_VERSION') | default (backup_version, true) }}" + +# S3 Configuration (provide these to download from S3) +aws_access_key_id: "{{ lookup('env', 'S3_ACCESS_KEY_ID') }}" +aws_secret_access_key: "{{ lookup('env', 'S3_SECRET_ACCESS_KEY') }}" +s3_bucket_name: "{{ lookup('env', 'S3_BUCKET_NAME') }}" +s3_region: "{{ lookup('env', 'S3_REGION') | default('us-east-1', true) }}" +s3_endpoint_url: "{{ lookup('env', 'S3_ENDPOINT_URL') }}" + +# Artifactory Configuration (provide these to download from Artifactory) +artifactory_username: "{{ lookup('env', 'ARTIFACTORY_USERNAME') }}" +artifactory_token: "{{ lookup('env', 'ARTIFACTORY_TOKEN') }}" +artifactory_url: "{{ lookup('env', 'ARTIFACTORY_URL') | default('https://na.artifactory.swg-devops.com/artifactory', true) }}" +artifactory_repository: "{{ lookup('env', 'ARTIFACTORY_REPOSITORY') }}" + +# General settings +backup_temp_dir: "{{ mas_restore_dir }}/mas-{{ mas_instance_id }}-restore-{{ backup_version }}" +download_timeout: "{{ lookup('env', 'DOWNLOAD_TIMEOUT_SECS') | default(10800, true) }}" # Download timeout in seconds (3 hour) +extract_archive: "{{ lookup('env', 'EXTRACT_ARCHIVE') | default(true, true) }}" # Whether to extract the archive after download +cleanup_archive: "{{ lookup('env', 'CLEANUP_ARCHIVE') | default(true, true) }}" # Whether to remove the archive file after extraction + +include_sls_archive: "{{ lookup('env', 'INCLUDE_SLS_ARCHIVE') | default(true, true) }}" +include_manage_app_archive: "{{ lookup('env', 'INCLUDE_MANAGE_APP_ARCHIVE') | default(true, true) }}" +include_manage_db_archive: "{{ lookup('env', 
'INCLUDE_MANAGE_DB_ARCHIVE') | default(true, true) }}" diff --git a/ibm/mas_devops/roles/download_backup_archive/meta/main.yml b/ibm/mas_devops/roles/download_backup_archive/meta/main.yml new file mode 100644 index 0000000000..155b1d78a4 --- /dev/null +++ b/ibm/mas_devops/roles/download_backup_archive/meta/main.yml @@ -0,0 +1,20 @@ +--- +galaxy_info: + author: IBM + description: Download and extract MAS backup archive from S3 or Artifactory + company: IBM + license: EPL-2.0 + min_ansible_version: "2.9" + platforms: + - name: EL + versions: + - "8" + galaxy_tags: + - ibm + - mas + - backup + - restore + - s3 + - artifactory + +dependencies: [] diff --git a/ibm/mas_devops/roles/download_backup_archive/tasks/download_from_artifactory.yml b/ibm/mas_devops/roles/download_backup_archive/tasks/download_from_artifactory.yml new file mode 100644 index 0000000000..72c96e9dc1 --- /dev/null +++ b/ibm/mas_devops/roles/download_backup_archive/tasks/download_from_artifactory.yml @@ -0,0 +1,79 @@ +--- +# Download backup archives from Artifactory +- name: "Validate Artifactory repository is defined" + ansible.builtin.fail: + msg: "artifactory_repository is required when downloading from Artifactory" + when: artifactory_repository is not defined or artifactory_repository == '' + +- name: "Display Artifactory download information" + ansible.builtin.debug: + msg: + - "Downloading from Artifactory: {{ artifactory_url }}" + - "Repository: {{ artifactory_repository }}" + - "Number of archives: {{ backup_archives_to_download | length }}" + - "Archives: {{ backup_archives_to_download }}" + - "Download to: {{ backup_temp_dir }}" + +- name: "Download archives from Artifactory using curl" + ansible.builtin.command: + cmd: > + curl -X GET + -u {{ artifactory_username }}:{{ artifactory_token }} + -o {{ backup_temp_dir }}/mas-{{ mas_instance_id }}-backups/{{ item }} + {{ artifactory_url }}/{{ artifactory_repository }}/mas-{{ mas_instance_id }}-backups/{{ item }} + --max-time {{ download_timeout }} + --connect-timeout 60 + --fail + --silent + --show-error + --location + register: artifactory_download_results + changed_when: artifactory_download_results.rc == 0 + failed_when: false + no_log: true + loop: "{{ backup_archives_to_download }}" + +- name: "Process download results" + ansible.builtin.set_fact: + artifactory_download_summary: "{{ artifactory_download_summary | default([]) + [{'archive': item.item, 'success': item.rc == 0, 'http_code': item.stdout | regex_search('[0-9]{3}$') if item.rc == 0 else 'N/A'}] }}" + loop: "{{ artifactory_download_results.results }}" + +- name: "Verify downloaded archives exist" + ansible.builtin.stat: + path: "{{ backup_temp_dir }}/mas-{{ mas_instance_id }}-backups/{{ item }}" + register: downloaded_archives_stat + loop: "{{ backup_archives_to_download }}" + +- name: "Display Artifactory download results" + ansible.builtin.debug: + msg: "Successfully downloaded {{ item.archive }} from Artifactory - HTTP Status: {{ item.http_code }} - Size: {{ (downloaded_archives_stat.results[loop_index].stat.size / 1024 / 1024) | round(2) }} MB" + when: + - item.success + - downloaded_archives_stat.results[loop_index].stat.exists + loop: "{{ artifactory_download_summary }}" + loop_control: + index_var: loop_index + +- name: "Check for failed downloads" + ansible.builtin.set_fact: + failed_artifactory_downloads: "{{ artifactory_download_summary | selectattr('success', 'equalto', false) | map(attribute='archive') | list }}" + +- name: "Check for missing archives" + ansible.builtin.set_fact: + 
missing_archives: "{{ downloaded_archives_stat.results | selectattr('stat.exists', 'equalto', false) | map(attribute='item') | list }}" + +- name: "Fail if any Artifactory download failed" + ansible.builtin.fail: + msg: "Failed to download {{ failed_artifactory_downloads | length }} archive(s) from Artifactory: {{ failed_artifactory_downloads | join(', ') }}" + when: failed_artifactory_downloads | length > 0 + +- name: "Fail if any downloaded archive does not exist" + ansible.builtin.fail: + msg: "{{ missing_archives | length }} downloaded archive(s) do not exist: {{ missing_archives | join(', ') }}" + when: missing_archives | length > 0 + +- name: "Display download summary" + ansible.builtin.debug: + msg: + - "Successfully downloaded {{ backup_archives_to_download | length }} archive(s) from Artifactory" + - "Total size: {{ (downloaded_archives_stat.results | map(attribute='stat.size') | sum / 1024 / 1024) | round(2) }} MB" diff --git a/ibm/mas_devops/roles/download_backup_archive/tasks/download_from_s3.yml b/ibm/mas_devops/roles/download_backup_archive/tasks/download_from_s3.yml new file mode 100644 index 0000000000..abcbeaeac6 --- /dev/null +++ b/ibm/mas_devops/roles/download_backup_archive/tasks/download_from_s3.yml @@ -0,0 +1,60 @@ +--- +# Download backup archives from S3 +- name: "Display S3 download information" + ansible.builtin.debug: + msg: + - "Downloading from S3 bucket: {{ s3_bucket_name }}" + - "Region: {{ s3_region }}" + - "Number of archives: {{ backup_archives_to_download | length }}" + - "Archives: {{ backup_archives_to_download }}" + - "Download to: {{ backup_temp_dir }}" + +- name: "Download archives from S3" + ibm.mas_devops.download_from_s3: + aws_access_key_id: "{{ aws_access_key_id }}" + aws_secret_access_key: "{{ aws_secret_access_key }}" + bucket_name: "{{ s3_bucket_name }}" + object_name: "mas-{{ mas_instance_id }}-backups/{{ item }}" + local_dir: "{{ backup_temp_dir }}" + region_name: "{{ s3_region }}" + endpoint_url: "{{ s3_endpoint_url | default(omit) }}" + register: s3_download_results + poll: 10 + loop: "{{ backup_archives_to_download }}" + +- name: "Verify downloaded archives exist" + ansible.builtin.stat: + path: "{{ backup_temp_dir }}/mas-{{ mas_instance_id }}-backups/{{ item }}" + register: downloaded_archives_stat + loop: "{{ backup_archives_to_download }}" + +- name: "Display S3 download results" + ansible.builtin.debug: + msg: "Successfully downloaded {{ item.item }} from S3 bucket {{ s3_bucket_name }} - Size: {{ (item.stat.size / 1024 / 1024) | round(2) }} MB" + when: + - item.stat.exists + loop: "{{ downloaded_archives_stat.results }}" + +- name: "Check for failed downloads" + ansible.builtin.set_fact: + failed_s3_downloads: "{{ s3_download_results.results | selectattr('success', 'equalto', false) | map(attribute='item') | list }}" + +- name: "Check for missing archives" + ansible.builtin.set_fact: + missing_archives: "{{ downloaded_archives_stat.results | selectattr('stat.exists', 'equalto', false) | map(attribute='item') | list }}" + +- name: "Fail if any S3 download failed" + ansible.builtin.fail: + msg: "Failed to download {{ failed_s3_downloads | length }} archive(s) from S3: {{ failed_s3_downloads | join(', ') }}" + when: failed_s3_downloads | length > 0 + +- name: "Fail if any downloaded archive does not exist" + ansible.builtin.fail: + msg: "{{ missing_archives | length }} downloaded archive(s) do not exist: {{ missing_archives | join(', ') }}" + when: missing_archives | length > 0 + +- name: "Display download summary" + 
ansible.builtin.debug: + msg: + - "Successfully downloaded {{ backup_archives_to_download | length }} archive(s) from S3" + - "Total size: {{ (downloaded_archives_stat.results | map(attribute='stat.size') | sum / 1024 / 1024) | round(2) }} MB" diff --git a/ibm/mas_devops/roles/download_backup_archive/tasks/extract_archive.yml b/ibm/mas_devops/roles/download_backup_archive/tasks/extract_archive.yml new file mode 100644 index 0000000000..7214078a7a --- /dev/null +++ b/ibm/mas_devops/roles/download_backup_archive/tasks/extract_archive.yml @@ -0,0 +1,58 @@ +--- +# Extract downloaded backup archives +- name: "Verify archives exist before extraction" + ansible.builtin.stat: + path: "{{ backup_temp_dir }}/mas-{{ mas_instance_id }}-backups/{{ item }}" + register: archives_stat + loop: "{{ backup_archives_to_download }}" + +- name: "Check for missing archives" + ansible.builtin.set_fact: + missing_archives_for_extraction: "{{ archives_stat.results | selectattr('stat.exists', 'equalto', false) | map(attribute='item') | list }}" + +- name: "Fail if any archive does not exist" + ansible.builtin.fail: + msg: "{{ missing_archives_for_extraction | length }} archive(s) do not exist: {{ missing_archives_for_extraction | join(', ') }}" + when: missing_archives_for_extraction | length > 0 + +- name: "Display extraction information" + ansible.builtin.debug: + msg: + - "Extracting {{ backup_archives_to_download | length }} archive(s)" + - "Total size: {{ (archives_stat.results | map(attribute='stat.size') | sum / 1024 / 1024) | round(2) }} MB" + - "Extract to: {{ mas_restore_dir }}" + +- name: "Extract tar.gz archives to restore directory" + ansible.builtin.command: + cmd: "tar -xzf {{ backup_temp_dir }}/mas-{{ mas_instance_id }}-backups/{{ item }} -C {{ mas_restore_dir }}" + register: tar_extract_results + changed_when: tar_extract_results.rc == 0 + loop: "{{ backup_archives_to_download }}" + +- name: "Check for extraction failures" + ansible.builtin.set_fact: + failed_extractions: "{{ tar_extract_results.results | selectattr('rc', 'ne', 0) | map(attribute='item') | list }}" + +- name: "Fail if any extraction failed" + ansible.builtin.fail: + msg: "Failed to extract {{ failed_extractions | length }} archive(s): {{ failed_extractions | join(', ') }}" + when: failed_extractions | length > 0 + +- name: "List extracted directories" + ansible.builtin.find: + paths: "{{ mas_restore_dir }}" + file_type: directory + patterns: "backup-*" + register: extracted_dirs + +- name: "Display extraction result" + ansible.builtin.debug: + msg: + - "Successfully extracted {{ backup_archives_to_download | length }} archive(s)" + - "Extracted {{ extracted_dirs.files | length }} backup directories to {{ mas_restore_dir }}" + - "Directories: {{ extracted_dirs.files | map(attribute='path') | map('basename') | list }}" + +- name: "Fail if no backup directories were extracted" + ansible.builtin.fail: + msg: "No backup directories found after extraction in {{ mas_restore_dir }}" + when: extracted_dirs.files | length == 0 diff --git a/ibm/mas_devops/roles/download_backup_archive/tasks/main.yml b/ibm/mas_devops/roles/download_backup_archive/tasks/main.yml new file mode 100644 index 0000000000..d559fbf3ae --- /dev/null +++ b/ibm/mas_devops/roles/download_backup_archive/tasks/main.yml @@ -0,0 +1,158 @@ +--- +# Validate required variables +- name: "Fail if mas_restore_dir is not defined" + ansible.builtin.fail: + msg: "mas_restore_dir is required but not defined" + when: mas_restore_dir is not defined or mas_restore_dir == '' + +- name: 
"Fail if mas_instance_id is not defined" + ansible.builtin.fail: + msg: "mas_instance_id is required but not defined" + when: mas_instance_id is not defined or mas_instance_id == '' + +- name: "Fail if backup_version is not defined" + ansible.builtin.fail: + msg: "backup_version is required but not defined" + when: backup_version is not defined or backup_version == '' + +# Determine which archives to download +- name: "Set backup archive names list" + ansible.builtin.set_fact: + backup_archives_to_download: [] + +# Handle explicit list of archives +- name: "Parse backup_archive_names if provided as string" + ansible.builtin.set_fact: + backup_archives_to_download: "{{ backup_archive_names.split(',') | map('trim') | list }}" + when: + - backup_archive_names is defined + - backup_archive_names != '' + - backup_archive_names is string + +- name: "Use backup_archive_names if provided as list" + ansible.builtin.set_fact: + backup_archives_to_download: "{{ backup_archive_names }}" + when: + - backup_archive_names is defined + - backup_archive_names != '' + - backup_archive_names is not string + +# Auto-generate archive names based on component versions +- name: "Auto-generate archive names from component versions" + ansible.builtin.set_fact: + backup_archives_to_download: + - "mas-{{ mas_instance_id }}-backup-{{ ibm_catalogs_backup_version }}-catalog.tar.gz" + - "mas-{{ mas_instance_id }}-backup-{{ certmanager_backup_version }}-certmanager.tar.gz" + - "mas-{{ mas_instance_id }}-backup-{{ mongodb_backup_version }}-mongoce.tar.gz" + - "mas-{{ mas_instance_id }}-backup-{{ sls_backup_version }}-sls.tar.gz" + - "mas-{{ mas_instance_id }}-backup-{{ suite_backup_version }}-suite.tar.gz" + - "mas-{{ mas_instance_id }}-backup-{{ manage_backup_version }}-app-manage.tar.gz" + - "mas-{{ mas_instance_id }}-backup-{{ db2_backup_version }}-db2u-manage.tar.gz" + when: backup_archives_to_download | length == 0 + + +- name: "Filter out SLS archive when include_sls_archive is false" + ansible.builtin.set_fact: + backup_archives_to_download: "{{ backup_archives_to_download | reject('search', '-sls\\.tar\\.gz$') | list }}" + when: not (include_sls_archive | bool) + +- name: "Filter out manage app archive when include_manage_app_archive is false" + ansible.builtin.set_fact: + backup_archives_to_download: "{{ backup_archives_to_download | reject('search', '-app-manage\\.tar\\.gz$') | list }}" + when: not (include_manage_app_archive | bool) + +# Filter out manage archives if include_manage_db_archive is false +- name: "Filter out manage archives when include_manage_db_archive is false" + ansible.builtin.set_fact: + backup_archives_to_download: "{{ backup_archives_to_download | reject('search', '-db2u-manage\\.tar\\.gz$') | list }}" + when: not (include_manage_db_archive | bool) + +- name: "Display archives to download" + ansible.builtin.debug: + msg: + - "Will download {{ backup_archives_to_download | length }} archive(s):" + - "{{ backup_archives_to_download }}" + +# Determine download source +- name: "Check if S3 credentials are provided" + ansible.builtin.set_fact: + download_from_s3: "{{ (aws_access_key_id is defined and aws_access_key_id != '') and (aws_secret_access_key is defined and aws_secret_access_key != '') and (s3_bucket_name is defined and s3_bucket_name != '') }}" + +- name: "Check if Artifactory credentials are provided" + ansible.builtin.set_fact: + download_from_artifactory: "{{ (artifactory_username is defined and artifactory_username != '') and (artifactory_token is defined and artifactory_token != 
'') and (artifactory_url is defined and artifactory_url != '') }}" + +- name: "Fail if neither S3 nor Artifactory credentials are provided" + ansible.builtin.fail: + msg: "Either S3 credentials (aws_access_key_id, aws_secret_access_key, s3_bucket_name) or Artifactory credentials (artifactory_username, artifactory_token, artifactory_url) must be provided" + when: not download_from_s3 and not download_from_artifactory + +- name: "Display download source" + ansible.builtin.debug: + msg: "Will download from: {{ 'S3' if download_from_s3 else 'Artifactory' }}" + +# Create restore directory +- name: "Create restore directory" + ansible.builtin.file: + path: "{{ mas_restore_dir }}" + state: directory + mode: '0755' + +# Delete temporary directory to make sure its clean +- name: "Delete temporary directory before creating to make sure it is clean" + ansible.builtin.file: + path: "{{ backup_temp_dir }}" + state: absent + +# Create temporary directory for download +- name: "Create temporary directory for download" + ansible.builtin.file: + path: "{{ backup_temp_dir }}" + state: directory + mode: '0755' + +# Download from S3, boto3 needs this base directory for bucket base dir +- name: "Create temporary base directory" + ansible.builtin.file: + path: "{{ backup_temp_dir }}/mas-{{ mas_instance_id }}-backups" + state: directory + mode: '0755' + +# Download from S3 or Artifactory +- name: "Download from S3" + ansible.builtin.include_tasks: download_from_s3.yml + when: download_from_s3 + +- name: "Download from Artifactory" + ansible.builtin.include_tasks: download_from_artifactory.yml + when: download_from_artifactory and not download_from_s3 + +# Extract archives +- name: "Extract backup archives" + ansible.builtin.include_tasks: extract_archive.yml + when: extract_archive | bool + +# Cleanup +- name: "Remove temporary archive files" + ansible.builtin.file: + path: "{{ backup_temp_dir }}/{{ item }}" + state: absent + loop: "{{ backup_archives_to_download }}" + when: + - cleanup_archive | bool + - extract_archive | bool + +- name: "Remove temporary directory" + ansible.builtin.file: + path: "{{ backup_temp_dir }}" + state: absent + when: + - cleanup_archive | bool + - extract_archive | bool + +- name: "Display archive management information" + ansible.builtin.debug: + msg: + - "Archives are being managed in: {{ backup_temp_dir }}" + - "Downloaded archives: {{ backup_archives_to_download }}" + - "To cleanup manually, remove the directory: {{ backup_temp_dir }}" diff --git a/ibm/mas_devops/roles/grafana/README.md b/ibm/mas_devops/roles/grafana/README.md index da858ecebe..b3904651c7 100644 --- a/ibm/mas_devops/roles/grafana/README.md +++ b/ibm/mas_devops/roles/grafana/README.md @@ -159,7 +159,7 @@ Storage volume size for Grafana user data. - hosts: localhost vars: grafana_instance_storage_class: "ibmc-file-gold-gid" - grafana_instance_storage_class: "15Gi" + grafana_instance_storage_size: "15Gi" roles: - ibm.mas_devops.grafana ``` diff --git a/ibm/mas_devops/roles/ibm_catalogs/README.md b/ibm/mas_devops/roles/ibm_catalogs/README.md index b58774c95e..31100e7941 100644 --- a/ibm/mas_devops/roles/ibm_catalogs/README.md +++ b/ibm/mas_devops/roles/ibm_catalogs/README.md @@ -83,6 +83,137 @@ Artifactory API token for accessing pre-release development catalogs (IBM employ **Note**: **IBM EMPLOYEES ONLY** - This is for pre-release testing only. Never use development catalogs in production. Keep this token secure and do not commit to source control. Generate tokens from IBM Artifactory. 
+### Backup and Restore Variables + +#### mas_backup_dir +Directory path where IBM Operator Catalog backup files will be stored. + +- **Required** for backup and restore operations +- Environment Variable: `MAS_BACKUP_DIR` +- Default: None + +**Purpose**: Specifies the local filesystem directory where backup archives will be created (for backup) or read from (for restore). This directory serves as the central location for all IBM Operator Catalog backup data. + +**When to use**: +- Required when `ibm_catalogs_action` is set to `backup` or `restore` +- Should be a persistent location with sufficient storage space +- Ensure the directory is accessible and has appropriate permissions + +**Valid values**: Any valid local filesystem path (e.g., `/backup/mas`, `/home/user/catalog-backups`) + +**Impact**: All backup files and metadata will be stored in subdirectories under this path. The backup creates a timestamped directory structure: `{mas_backup_dir}/backup-{version}-catalog/` + +**Related variables**: Works with `ibm_catalogs_backup_version` to create unique backup directories. + +**Note**: Ensure this directory has sufficient space for backup data and is regularly backed up to external storage for disaster recovery. + +#### ibm_catalogs_backup_version +Version identifier for the backup, used to create unique backup directories. + +- **Optional** for backup (auto-generated if not provided) +- **Required** for restore +- Environment Variable: `IBM_CATALOGS_BACKUP_VERSION` +- Default: Auto-generated timestamp in format `YYYYMMDD-HHMMSS` + +**Purpose**: Provides a unique identifier for each backup, allowing multiple backups to coexist and enabling point-in-time restore operations. + +**When to use**: +- For backup: Leave unset to auto-generate a timestamp-based version, or provide a custom identifier +- For restore: Must specify the exact version identifier of the backup to restore + +**Valid values**: Any string suitable for directory names (alphanumeric, hyphens, underscores). Auto-generated format: `YYYYMMDD-HHMMSS` (e.g., `20260122-131500`) + +**Impact**: +- For backup: Creates directory `{mas_backup_dir}/backup-{version}-catalog/` +- For restore: Looks for backup in `{mas_backup_dir}/backup-{version}-catalog/` + +**Related variables**: Works with `mas_backup_dir` to determine backup location. + +**Note**: When restoring, you must know the exact backup version identifier. List the contents of `mas_backup_dir` to see available backups. + +Backup and Restore Operations +------------------------------------------------------------------------------- + +This section provides comprehensive information about IBM Operator Catalog backup and restore operations. + +### Action Comparison + +| Action | Purpose | Instance Resources | Prerequisites | Use Case | +|--------|---------|-------------------|---------------|----------| +| `backup` | Create backup | Yes (catalog and related resources) | Running IBM Operator Catalog | Regular backups, disaster recovery preparation | +| `restore` | Full restore | Yes (recreates catalog and related resources) | Backup archive | Disaster recovery, cluster migration, complete restoration | + +### Backup Process + +The IBM Operator Catalog backup operation creates a backup of your catalog installation resources: + +1. **Catalog Resources**: Backs up Kubernetes resources including: + - CatalogSource (`ibm-operator-catalog`) + - Secrets (for development catalogs: `wiotp-docker-local`) + - ServiceAccounts (`ibm-operator-catalog`, `default`) +2. 
**Auto-discovered Secrets**: Any secrets referenced by the backed-up resources are automatically discovered and included + +**Note**: The backup includes development catalog credentials if they were configured during installation. + +**Backup Directory Structure:** +``` +{mas_backup_dir}/ +└── backup-{version}-catalog/ + └── resources/ + ├── catalogsources/ + ├── secrets/ + └── serviceaccounts/ +``` + +### Restore Process + +The IBM Operator Catalog restore operation performs a complete restoration of the catalog: + +**Steps:** +1. Validates backup files and required variables +2. Restores Secrets (or creates new `wiotp-docker-local` secret if `artifactory_username` and `artifactory_token` are provided) +3. Restores ServiceAccounts +4. Restores CatalogSource +5. Waits for CatalogSource to be ready (up to 30 minutes) + +**When to use:** +- Disaster recovery scenarios +- Migrating IBM Operator Catalog to a new cluster +- Recreating a deleted catalog +- Setting up a new environment from backup + +### Important Considerations + +**Version Compatibility:** +- Target catalog version should match the backup version +- The restore process validates version compatibility before proceeding + +**Storage Requirements:** +- Ensure sufficient storage in the backup directory +- Backup directory structure: `{mas_backup_dir}/backup-{version}-catalog/` +- Monitor disk space during backup operations + +**Security:** +- Backup files contain catalog configuration and credentials (for development catalogs) +- Secure backup directory with appropriate permissions (chmod 700 recommended) +- Consider encrypting backups for long-term storage +- Restrict access to backup files to authorized personnel only + +**Development Catalog Credentials:** +- If restoring with new `artifactory_username` and `artifactory_token`, the restore will create a new secret instead of using the backed-up one +- This allows updating credentials during restore if needed + +### Backup and Restore Best Practices + +1. **Regular Backups**: Schedule automated backups at regular intervals, especially before upgrades +2. **Test Restores**: Periodically test restore procedures in non-production environments +3. **Monitor Operations**: Implement monitoring and alerting for backup failures +4. **Backup Validation**: Verify backup integrity after completion +5. **Retention Policy**: Implement and document backup retention policies +6. **Disaster Recovery**: Include IBM Operator Catalog backup/restore in your DR plan +7. **Coordinate with Operators**: Coordinate catalog backups with operator backups that depend on it + + ## Example Playbook After installing the Ansible Collection you can include this role in your own custom playbooks. diff --git a/ibm/mas_devops/roles/ibm_catalogs/defaults/main.yml b/ibm/mas_devops/roles/ibm_catalogs/defaults/main.yml index 1514354933..807bce437a 100644 --- a/ibm/mas_devops/roles/ibm_catalogs/defaults/main.yml +++ b/ibm/mas_devops/roles/ibm_catalogs/defaults/main.yml @@ -10,3 +10,7 @@ artifactory_token: "{{ lookup('env', 'ARTIFACTORY_TOKEN') }}" # mas_catalog_digest is needed for development airgap. 
This environment variable should be set before running the code mas_catalog_digest: "{{ lookup('env', 'MAS_CATALOG_DIGEST') }}" mas_catalog_version: "{{ lookup('env', 'MAS_CATALOG_VERSION') | default ('@@MAS_LATEST_CATALOG@@', True) }}" + +# Backup and restore variables +ibm_catalogs_backup_version: "{{ lookup('env', 'IBM_CATALOGS_BACKUP_VERSION') }}" +mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" diff --git a/ibm/mas_devops/roles/ibm_catalogs/tasks/backup/main.yml b/ibm/mas_devops/roles/ibm_catalogs/tasks/backup/main.yml new file mode 100644 index 0000000000..264aa066d5 --- /dev/null +++ b/ibm/mas_devops/roles/ibm_catalogs/tasks/backup/main.yml @@ -0,0 +1,74 @@ +--- +- name: "Fail if require variables for IBM operator catalog backup are not provided" + ibm.mas_devops.verify_backup_restore_vars: + mas_backup_dir: "{{ mas_backup_dir }}" + action: "backup" + component: "catalog" + +- name: "Check if IBM_CATALOGS_BACKUP_VERSION is provided, if not set to default 'YYYYMMDD-HHMMSS' format" + set_fact: + ibm_catalogs_backup_version: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}" + when: ibm_catalogs_backup_version is not defined or ibm_catalogs_backup_version == "" or ibm_catalogs_backup_version == "None" + +- name: "Set fact: Catalog backup base directory path" + set_fact: + catalog_backup_path: "{{ mas_backup_dir }}/backup-{{ ibm_catalogs_backup_version }}-catalog" + +- name: "Set fact: IBM operator catalog backup resources" + set_fact: + catalog_backup_resources: + - namespace: "openshift-marketplace" + resources: + # Catalog source + - kind: CatalogSource + api_version: operators.coreos.com/v1alpha1 + name: ibm-operator-catalog + # Secret (for dev catalogs) + - kind: Secret + api_version: v1 + name: wiotp-docker-local + # Service Accounts (this is reqd for dev catalogs) + - kind: ServiceAccount + api_version: v1 + name: ibm-operator-catalog + - kind: ServiceAccount + api_version: v1 + name: default + +# Call the backup_resources plugin to execute the backup to the path provided +# ----------------------------------------------------------------------------- +- name: "Backup catalogs resources (referenced secrets are auto-discovered)" + ibm.mas_devops.backup_resource: + backup_resources: "{{ catalog_backup_resources }}" + backup_path: "{{ catalog_backup_path }}" + register: backup_result + +# Show the results +# ----------------------------------------------------------------------------- +- name: "Display backup results" + debug: + msg: + - "Backup completed{{ ' with failures' if backup_result.failed_count > 0 else ' successfully' }}" + - "Total resources backed up: {{ backup_result.backed_up_count }}" + - "Total resources failed: {{ backup_result.failed_count }}" + - "Resources not found: {{ backup_result.not_found_count }}" + - "Secrets auto-discovered: {{ backup_result.discovered_secrets_count }}" + - "Backup location: {{ catalog_backup_path }}" + +# Fail task if any errors occurred. 
+# -----------------------------------------------------------------------------
+- name: "Display failed resources"
+  debug:
+    msg:
+      - "Failed resources:"
+      - "{{ backup_result.failed_resources | to_nice_yaml }}"
+  when: backup_result.failed_count > 0
+
+- name: "Fail if backup had errors"
+  fail:
+    msg: |
+      Backup failed for {{ backup_result.failed_count }} resource(s):
+      {% for resource in backup_result.failed_resources %}
+      - {{ resource.description }} in {{ resource.scope }}
+      {% endfor %}
+  when: backup_result.failed_count > 0
diff --git a/ibm/mas_devops/roles/ibm_catalogs/tasks/restore/create-wiotp-secret.yml b/ibm/mas_devops/roles/ibm_catalogs/tasks/restore/create-wiotp-secret.yml
new file mode 100644
index 0000000000..ebda6fba25
--- /dev/null
+++ b/ibm/mas_devops/roles/ibm_catalogs/tasks/restore/create-wiotp-secret.yml
@@ -0,0 +1,27 @@
+---
+
+# Create an image pull secret for local artifactory so that we can install the development catalog
+# ---------------------------------------------------------------------------------------------------------------------
+- name: "Create wiotp-docker-local secret"
+  no_log: true
+  vars:
+    artifactoryAuthStr: "{{artifactory_username}}:{{artifactory_token}}"
+    artifactoryAuth: "{{ artifactoryAuthStr | b64encode }}"
+    content:
+      - '{"auths":{"docker-na-public.artifactory.swg-devops.com/wiotp-docker-local": {"username":"{{artifactory_username}}","password":"{{artifactory_token}}","auth":"{{artifactoryAuth}}"}'
+      - ',"docker-na-proxy-svl.artifactory.swg-devops.com/wiotp-docker-local": {"username":"{{artifactory_username}}","password":"{{artifactory_token}}","auth":"{{artifactoryAuth}}"}'
+      - ',"docker-na-proxy-rtp.artifactory.swg-devops.com/wiotp-docker-local": {"username":"{{artifactory_username}}","password":"{{artifactory_token}}","auth":"{{artifactoryAuth}}"}'
+      - "}"
+      - "}"
+  kubernetes.core.k8s:
+    definition:
+      apiVersion: v1
+      kind: Secret
+      type: kubernetes.io/dockerconfigjson
+      metadata:
+        name: wiotp-docker-local
+        namespace: openshift-marketplace
+      stringData:
+        # Only way I could get three consecutive "}" into a string :)
+        .dockerconfigjson: "{{ content | join('') | string }}"
+  register: result
diff --git a/ibm/mas_devops/roles/ibm_catalogs/tasks/restore/main.yml b/ibm/mas_devops/roles/ibm_catalogs/tasks/restore/main.yml
new file mode 100644
index 0000000000..5ecb849aba
--- /dev/null
+++ b/ibm/mas_devops/roles/ibm_catalogs/tasks/restore/main.yml
@@ -0,0 +1,151 @@
+---
+- name: "Fail if required variables for IBM operator catalog restore are not provided"
+  ibm.mas_devops.verify_backup_restore_vars:
+    mas_backup_dir: "{{ mas_backup_dir }}"
+    ibm_catalogs_backup_version: "{{ ibm_catalogs_backup_version }}"
+    action: "restore"
+    component: "catalog"
+
+- name: "Set fact: Catalog backup base directory path"
+  set_fact:
+    catalog_backup_path: "{{ mas_backup_dir }}/backup-{{ ibm_catalogs_backup_version }}-catalog"
+    catalog_resources_path: "{{ mas_backup_dir }}/backup-{{ ibm_catalogs_backup_version }}-catalog/resources"
+
+- name: "Check that the catalog backup resources path exists"
+  stat:
+    path: "{{ catalog_resources_path }}"
+  register: resources_backup_path_stat
+
+- name: "Fail if backup archive does not exist"
+  fail:
+    msg: "Catalog resources archive not found at: {{ catalog_resources_path }}"
+  when: not resources_backup_path_stat.stat.exists or not resources_backup_path_stat.stat.isdir
+
+- name: "MAS Catalog restore information"
+  debug:
+    msg:
+      - "Backup Version ................. 
{{ ibm_catalogs_backup_version }}" + - "Backup Path .................... {{ catalog_backup_path }}" + +# 1. Restore Secrets (required for dev catalog) +# ----------------------------------------------------------------------------- +- name: Create wiotp-docker-local secret if artifactory details are provided. + when: + - artifactory_username is defined and artifactory_username != "" + - artifactory_token is defined and artifactory_token != "" + include_tasks: tasks/restore/create-wiotp-secret.yml + +- name: Restore secrets + ibm.mas_devops.restore_resource: + backup_path: "{{ catalog_backup_path }}" + resource_kinds: + - Secret + register: secret_result + when: + - artifactory_username is not defined or artifactory_username == "" + - artifactory_token is not defined or artifactory_token == "" + +# 2. Restore Service Accounts (required for dev catalog) +# ----------------------------------------------------------------------------- +- name: Restore ServiceAccounts + ibm.mas_devops.restore_resource: + backup_path: "{{ catalog_backup_path }}" + resource_kinds: + - ServiceAccount + register: sa_result + +# 3. Restore CatalogSource +# ----------------------------------------------------------------------------- +- name: Restore CatalogSource + ibm.mas_devops.restore_resource: + backup_path: "{{ catalog_backup_path }}" + resource_kinds: + - CatalogSource + register: catalog_result + +# Calculate total results +# ----------------------------------------------------------------------------- +- name: "Calculate total restore results" + set_fact: + total_created: >- + {{ + (secret_result.created_count | default(0)) + + (sa_result.created_count | default(0)) + + (catalog_result.created_count | default(0)) + }} + total_updated: >- + {{ + (secret_result.updated_count | default(0)) + + (sa_result.updated_count | default(0)) + + (catalog_result.updated_count | default(0)) + }} + total_skipped: >- + {{ + (secret_result.skipped_count | default(0)) + + (sa_result.skipped_count | default(0)) + + (catalog_result.skipped_count | default(0)) + }} + total_failed: >- + {{ + (secret_result.failed_count | default(0)) + + (sa_result.failed_count | default(0)) + + (catalog_result.failed_count | default(0)) + }} + +- name: "Display total restore results" + debug: + msg: + - >- + Restore completed{{ ' with failures' if total_failed | int > 0 + else ' successfully' }} + - "Total resources created: {{ total_created }}" + - "Total resources updated: {{ total_updated }}" + - "Total resources skipped: {{ total_skipped }}" + - "Total resources failed: {{ total_failed }}" + +# Fail task if any errors occurred +# ----------------------------------------------------------------------------- +- name: "Collect all failed resources" + set_fact: + all_failed_resources: >- + {{ + (secret_result.failed_resources | default([])) + + (sa_result.failed_resources | default([])) + + (catalog_result.failed_resources | default([])) + }} + +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ all_failed_resources | to_nice_yaml }}" + when: total_failed | int > 0 + +- name: "Fail if restore had errors" + fail: + msg: | + Restore failed for {{ total_failed }} resource(s): + {% for resource in all_failed_resources %} + - {{ resource.description }}: {{ resource.error }} + {% endfor %} + when: total_failed | int > 0 + +# 4. 
Wait until ready +# ----------------------------------------------------------------------------- +- name: "Wait until CatalogSource/ibm-operator-catalog is Ready" + when: catalog_result.success + kubernetes.core.k8s_info: + api_version: operators.coreos.com/v1alpha1 + kind: CatalogSource + name: ibm-operator-catalog + namespace: openshift-marketplace + register: ibm_catalog_lookup + retries: 30 # Up to 30 min + delay: 60 # Every minute + until: + - ibm_catalog_lookup.resources is defined + - ibm_catalog_lookup.resources | length == 1 + - ibm_catalog_lookup.resources[0].status is defined + - ibm_catalog_lookup.resources[0].status.connectionState is defined + - ibm_catalog_lookup.resources[0].status.connectionState.lastObservedState is defined + - ibm_catalog_lookup.resources[0].status.connectionState.lastObservedState == "READY" diff --git a/ibm/mas_devops/roles/mongodb/README.md b/ibm/mas_devops/roles/mongodb/README.md index 904c2f47ed..8c7f1fd1ad 100644 --- a/ibm/mas_devops/roles/mongodb/README.md +++ b/ibm/mas_devops/roles/mongodb/README.md @@ -128,7 +128,7 @@ Specifies which operation to perform on the MongoDB instance. - Use `create-mongo-service-credentials` to generate service credentials (ibm only) **Valid values** (provider-specific): -- **community**: `install`, `uninstall`, `backup`, `restore` +- **community**: `install`, `uninstall`, `backup`, `backup-database`, `restore`, `restore-database` - **aws**: `install`, `uninstall`, `docdb_secret_rotate`, `destroy-data` - **ibm**: `install`, `uninstall`, `backup`, `restore`, `create-mongo-service-credentials` - **atlas**: `install`, `uninstall`, `restore` @@ -558,194 +558,359 @@ Confirmation flag to upgrade MongoDB from version 7 to version 8. - Environment Variable: `MONGODB_V8_UPGRADE` - Default Value: `false` -**Purpose**: Acts as a safety confirmation to prevent accidental MongoDB major version upgrades from version 7 to 8. +Role Variables - Backup and Restore (CE Operator) +------------------------------------------------------------------------------- -**When to use**: -- Set to `true` only when intentionally upgrading from MongoDB 7 to version 8 -- Must be explicitly set to perform the upgrade -- Leave as `false` for all other operations - -**Valid values**: `true`, `false` - -**Impact**: When `true` and `mongodb_version` is set to an 8.x version, triggers MongoDB upgrade from 7.x to 8.x. This is a one-way operation that cannot be reversed without restoring from backup. - -**Related variables**: -- `mongodb_version`: Must be set to an 8.x version for upgrade to proceed -- Other upgrade flags: `mongodb_v5_upgrade`, `mongodb_v6_upgrade`, `mongodb_v7_upgrade` - -**Note**: **Always backup before upgrading**. Test upgrades in non-production environments first. Review MongoDB 8.0 release notes for breaking changes. - -#### masbr_confirm_cluster -Enables cluster confirmation prompt before executing backup or restore operations. - -- **Optional** -- Environment Variable: `MASBR_CONFIRM_CLUSTER` -- Default: `false` - -**Purpose**: Provides a safety check to confirm you're connected to the correct cluster before performing backup or restore operations, preventing accidental operations on wrong clusters. 
- -**When to use**: -- Set to `true` in environments with multiple clusters to prevent mistakes -- Set to `true` for production environments as an additional safety measure -- Leave as `false` for automated pipelines where confirmation isn't possible +### mongodb_action +For backup and restore operations, set `mongodb_action` to one of the following: +- `backup`: Create a backup of MongoDB databases and instance resources +- `backup-database`: Create a backup of MongoDB databases only +- `restore`: Restore both MongoDB instance resources and databases from a backup (full restore) +- `restore-database`: Restore only MongoDB databases from a backup to an existing instance (database-only restore) -**Valid values**: `true`, `false` - -**Impact**: When `true`, the role will prompt for confirmation of the cluster before proceeding with backup/restore. This adds a manual step but prevents costly mistakes. - -**Related variables**: Used with `mongodb_action` when set to `backup` or `restore`. - -#### masbr_copy_timeout_sec -Timeout in seconds for backup/restore file transfer operations. - -- **Optional** -- Environment Variable: `MASBR_COPY_TIMEOUT_SEC` -- Default: `43200` (12 hours) - -**Purpose**: Sets the maximum time allowed for copying backup files to/from storage. Prevents operations from hanging indefinitely on slow networks or large datasets. - -**When to use**: -- Increase for very large databases or slow network connections -- Decrease for smaller databases to fail faster on issues -- Use default (12 hours) for most deployments - -**Valid values**: Positive integer representing seconds (e.g., `3600` = 1 hour, `43200` = 12 hours, `86400` = 24 hours) - -**Impact**: Operations exceeding this timeout will fail. Setting too low causes failures on legitimate long-running transfers. Setting too high delays detection of stuck operations. - -**Related variables**: Used with `mongodb_action` when set to `backup` or `restore`. - -**Note**: Consider database size and network speed when setting timeout. Monitor actual transfer times to optimize this value. +- Environment Variable: `MONGODB_ACTION` +- Default Value: `install` -#### masbr_job_timezone -Time zone for scheduled backup job execution. +**Action Details:** +- **backup**: Creates a complete backup including database data and Kubernetes resources (secrets, certificates, CRs) +- **backup-database**: Creates a complete backup including database data only +- **restore**: Performs a full restore by recreating the MongoDB instance from backup resources and then restoring database data +- **restore-database**: Restores only the database data to an already running MongoDB instance without touching instance resources -- **Optional** -- Environment Variable: `MASBR_JOB_TIMEZONE` -- Default: None (uses UTC) +### mas_backup_dir +**Required for backup/restore operations**. Local directory path where backup files will be stored or read from. -**Purpose**: Specifies the time zone for scheduled backup CronJobs, ensuring backups run at the intended local time rather than UTC. +- Environment Variable: `MAS_BACKUP_DIR` +- Default Value: None +- Example: `/tmp/masbr` -**When to use**: -- Set when scheduling backups to run at specific local times -- Use for compliance with backup windows in specific time zones -- Leave unset to use UTC (recommended for global deployments) +### mongodb_backup_version +**Required for restore operations**. The backup version timestamp to restore from. This is automatically generated during backup in the format `YYYYMMDD-HHMMSS`. 
-**Valid values**: Any valid [tz database time zone](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) (e.g., `America/New_York`, `Europe/London`, `Asia/Tokyo`) +- Environment Variable: `MONGODB_BACKUP_VERSION` +- Default Value: YYYYMMDD-HHMMSS +- Example: `20251212-021316` -**Impact**: Affects when scheduled backups execute. Incorrect time zone can cause backups to run during business hours or miss backup windows. +### mongodb_instance_name +The name of the MongoDB instance to backup. -**Related variables**: -- `masbr_backup_schedule`: Defines the cron schedule -- Only applies when `masbr_backup_schedule` is set +- Environment Variable: `MONGODB_INSTANCE_NAME` +- Default Value: `mas-mongo-ce` -**Note**: When not set, CronJobs use UTC time zone. Consider daylight saving time changes when scheduling backups. +### override_storageclass +Controls whether to override storage class for the MongoDB instance during restore. +Set to `true` to override the storage class with `MONGODB_STORAGE_CLASS` value or cluster's default storageclass. -#### masbr_storage_local_folder -Local filesystem path where backup files will be stored or retrieved from. +- Environment Variable: `OVERRIDE_STORAGECLASS` +- Default Value: `false` -- **Required** when `mongodb_action` is `backup` or `restore` -- Environment Variable: `MASBR_STORAGE_LOCAL_FOLDER` -- Default: None +### mas_app_id +Optional. Specific MAS application ID for targeted backup/restore operations. -**Purpose**: Specifies the directory for storing MongoDB backup files. This location must have sufficient space and appropriate permissions for backup operations. +- Environment Variable: `MAS_APP_ID` +- Default Value: None -**When to use**: -- Always required for backup and restore operations -- Use a path with adequate storage space for full database backups -- Consider using network-attached storage for backup retention -- Ensure path is accessible and has proper permissions -**Valid values**: Any valid local filesystem path (e.g., `/backup/mongodb`, `/mnt/nfs/backups`, `/tmp/masbr`) +Backup and Restore Operations +------------------------------------------------------------------------------- -**Impact**: Backup files are written to this location. Insufficient space causes backup failures. Path must be accessible during restore operations. +This section provides comprehensive information about MongoDB backup and restore operations for the Community Edition (CE) operator. -**Related variables**: -- `masbr_backup_type`: Determines if full or incremental backups are stored here -- `masbr_restore_from_version`: Specifies which backup version to restore from this location +### Action Comparison -**Note**: Ensure adequate disk space (at least 2x database size for full backups). Implement backup retention policies to manage storage usage. 
+
+| Action | Purpose | Instance Resources | Database Data | Prerequisites | Use Case |
+|--------|---------|-------------------|---------------|---------------|----------|
+| `backup` | Create backup | Yes | Yes | Running MongoDB instance | Regular backups, disaster recovery preparation |
+| `backup-database` | Database-only backup | No | Yes | Running MongoDB instance | Regular backups, disaster recovery preparation |
+| `restore` | Full restore | Yes (recreates instance) | Yes | Backup with instance resources | Disaster recovery, cluster migration, complete restoration |
+| `restore-database` | Database-only restore | No (preserves existing) | Yes | Running MongoDB instance with matching version | Data recovery, selective restore, testing |
-#### masbr_backup_type
-Type of backup to create: full or incremental.
+### Backup Process
-- **Optional**
-- Environment Variable: `MASBR_BACKUP_TYPE`
-- Default: `full`
+The MongoDB backup operation creates a backup of your MongoDB instance and databases associated with your MAS instance:
-**Purpose**: Determines whether to create a complete database backup or an incremental backup containing only changes since the last full backup. Incremental backups save time and storage.
+1. **Database Backup**: Uses `mongodump` to export databases with filter `"^(mas|iot|sls|ibm-sls)(_|-)({{ mas_instance_id }}|sls)(_|-)(?!.*monitor$)"` to match databases like `mas-<instance-id>-*` or `iot-<instance-id>-*` or `ibm-sls_sls_licensing` or `sls-<instance-id>_sls_licensing`; databases whose names end in `monitor` are excluded.
+2. **Instance Resources**: Backs up Kubernetes resources including:
+   - CustomResourceDefinition (CRD)
+   - MongoDBCommunity Custom Resource (CR)
+   - Secrets (TLS certificates, credentials)
+   - Certificate resources
+   - Issuer resources
+   - MongoDB operator Deployment
+   - ServiceMonitor resources
+   - GrafanaDashboard resources
-**When to use**:
-- Use `full` for initial backups or periodic complete backups
-- Use `incr` for frequent backups between full backups to save time and space
-- Implement a strategy like weekly full + daily incremental backups
+**Backup Directory Structure:**
+```
+/tmp/masbr/
+└── backup-<version>-mongoce/
+    ├── data/
+    │   ├── mongodump-<version>.tar.gz
+    │   └── mongodb-info.yaml
+    └── resources/
+        ├── mongodbcommunitys/
+        ├── secrets/
+        ├── certificates/
+        ├── issuers/
+        └── {kind}s/
+```
-**Valid values**: `full`, `incr`
+### Restore Process
+
+The MongoDB role supports two types of restore operations:
+
+#### 1. Full Restore (`restore` action)
+Performs a complete restoration of both the MongoDB instance and its databases:
+
+**Steps:**
+1. Validates backup files and required variables
+2. Restores the namespace (Project)
+3. Restores Secrets and ConfigMaps
+4. Restores CustomResourceDefinitions (CRDs)
+5. Restores RBAC resources (ServiceAccount, Role, RoleBinding)
+6. Configures anyuid permissions for MongoDB service accounts
+7. Restores the MongoDB Operator Deployment
+8. Waits for MongoDB operator to be ready
+9. Restores Certificate Manager resources (Issuer, Certificate)
+10. Restores the MongoDBCommunity Custom Resource (overrides storage class if the flag is set)
+11. Waits for MongoDB StatefulSets to be ready
+12. Restores ServiceMonitor and GrafanaDashboard resources
+13. 
Restores database data using `mongorestore` + +**When to use:** +- Disaster recovery scenarios +- Migrating MongoDB to a new cluster +- Recreating a deleted MongoDB instance +- Setting up a new environment from backup + +**Requirements:** +- `mas_instance_id`: MAS instance identifier +- `mas_backup_dir`: Directory containing the backup +- `mongodb_backup_version`: Timestamp of the backup to restore +- `override_storageclass`: Set to `true` to override storage class during instance restore + +#### 2. Database-Only Restore (`restore-database` action) +Restores only the database data to an existing MongoDB instance: + +**Steps:** +1. Validates backup files and required variables +2. Verifies existing MongoDB installation is running +3. Checks MongoDB version compatibility +4. Restores database data using `mongorestore` + +**When to use:** +- Restoring data to an existing MongoDB instance +- Recovering from data corruption without recreating the instance +- Selective database restoration +- Testing data restoration without affecting instance configuration + +**Requirements:** +- `mas_instance_id`: MAS instance identifier +- `mas_backup_dir`: Directory containing the backup +- `mongodb_backup_version`: Timestamp of the backup to restore +- `mongodb_namespace`: Namespace where MongoDB is running +- `mongodb_instance_name`: Name of the existing MongoDB instance +- MongoDB instance must already be running and accessible + +### Important Considerations + +**Version Compatibility:** +- Target MongoDB version must match the backup version exactly +- Version upgrades should be performed separately, not during restore +- The restore process validates version compatibility before proceeding +- For `restore-database` action, the existing MongoDB instance must be running the same version as the backup + +**Storage Requirements:** +- Ensure sufficient storage in the backup directory +- Plan for at least 2x the database size for backup storage +- Backup directory structure: `/tmp/masbr/backup--mongoce/` +- Monitor disk space during backup operations + +**Security:** +- Backup files contain sensitive data and credentials +- Secure backup directory with appropriate permissions (chmod 700 recommended) +- Consider encrypting backups for long-term storage +- Backup includes TLS certificates and admin credentials +- Restrict access to backup files to authorized personnel only + +**Performance:** +- Backup operations may impact MongoDB performance +- Schedule backups during low-usage periods +- Monitor resource utilization during backup/restore +- Large databases may take significant time to backup/restore +- Network bandwidth may affect backup/restore speed + +**Restore Action Differences:** +- **`restore` action**: Recreates the entire MongoDB instance from scratch, including all Kubernetes resources and restores database +- **`restore-database` action**: Only restores database data to an existing instance, preserving current configuration + +### Backup and Restore Best Practices + +1. **Regular Backups**: Schedule automated backups at regular intervals +2. **Test Restores**: Periodically test restore procedures in non-production environments +3. **Monitor Operations**: Implement monitoring and alerting for backup failures +4. **Backup Validation**: Verify backup integrity after completion +5. **Retention Policy**: Implement and document backup retention policies +6. 
**Disaster Recovery**: Include MongoDB backup/restore in your DR plan
+
+### Backup and Restore Issues
+
+**Backup Failures:**
+
+- **Permission Errors**: Verify backup role and user creation succeeded. Check MongoDB pod logs with `oc logs -n mongoce <pod-name> -c mongod`
+- **Disk Space Issues**: Ensure sufficient space in the backup directory. Clean up old backups if needed (for example under `/tmp/masbr`). Check with `df -h`.
+- **Pod Access Issues**: Verify the pod is running and accessible. Check network connectivity to the cluster
+
+**Restore Failures:**
+
+- **Version Mismatch**: Ensure the target MongoDB version matches the backup version. Deploy the correct version before restoring
+- **Authentication Errors**: Verify admin credentials are correct. Check MongoDB health status
+- **Missing Backup Files**: Verify the backup directory path and version. Ensure the backup completed successfully
+- **Data Inconsistency**: Verify backup integrity. Check restore logs for errors. Consider re-running the restore
+
+**General Issues:**
+
+- **Pods Not Starting After Restore**: Check pod events with `oc describe pod <pod-name> -n mongoce`. Verify PVCs are bound. Check resource limits and node capacity
+- **Connection Issues**: Verify network policies and service configurations. Check certificate validity
+
+Example Playbooks
+-------------------------------------------------------------------------------
-**Impact**:
-- `full`: Creates complete backup, takes longer, uses more storage
-- `incr`: Creates incremental backup, faster, uses less storage, requires base full backup
+### Install (CE Operator)
+```yaml
+- hosts: localhost
+  any_errors_fatal: true
+  vars:
+    mongodb_storage_class: ibmc-block-gold
+    mas_instance_id: masinst1
+    mas_config_dir: ~/masconfig
+  roles:
+    - ibm.mas_devops.mongodb
+```
-**Related variables**:
-- `masbr_backup_from_version`: Required when `incr` is used to specify base full backup
-- `mongodb_action`: Must be set to `backup`
+### Backup (CE Operator)
+```yaml
+- hosts: localhost
+  any_errors_fatal: true
+  vars:
+    mas_instance_id: masinst1
+    mas_backup_dir: /tmp/masbr
+    mongodb_action: backup
+  roles:
+    - ibm.mas_devops.mongodb
+```
-**Note**: Incremental backups require a full backup as base. Restore operations may need to apply multiple incremental backups sequentially.
+### Backup with Instance Resources (CE Operator)
+Create a complete backup including both database data and instance resources (secrets, certificates, issuers).
+```yaml +- hosts: localhost + any_errors_fatal: true + vars: + mas_instance_id: masinst1 + mas_backup_dir: /tmp/masbr + mongodb_action: backup-database + roles: + - ibm.mas_devops.mongodb +``` -**When to use**: -- Set when creating incremental backups and you want to specify a particular full backup -- Leave unset to automatically use the most recent full backup -- Only valid when `masbr_backup_type=incr` +### Restore Database (CE Operator) +Restore MongoDB databases from a backup to an existing MongoDB instance without recreating the instance resources. -**Valid values**: Timestamp in format `YYYYMMDDHHMMSS` (e.g., `20240621021316`) +**Use Case**: This action is ideal when you need to restore database data to an already running MongoDB instance, such as recovering from data corruption or restoring specific databases. -**Impact**: Incremental backup will contain only changes since the specified full backup. Incorrect version can cause backup chain issues. +**Prerequisites**: +- MongoDB instance must already be installed and running +- MongoDB version must match the backup version +- Backup files must be available in the specified backup directory -**Related variables**: -- `masbr_backup_type`: Must be set to `incr` -- `masbr_storage_local_folder`: Location where full backup exists +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + mongodb_action: restore-database + mas_instance_id: masinst1 + mongodb_backup_version: 20251212-021316 + mas_backup_dir: /tmp/masbr + mongodb_namespace: mongoce + mongodb_instance_name: mas-mongo-ce + roles: + - ibm.mas_devops.mongodb +``` -**Note**: If not specified, role automatically finds the latest full backup. Ensure the specified full backup exists in the storage location. +**What this does**: +1. Validates that the backup exists at the specified location +2. Verifies the MongoDB instance is running in the specified namespace +3. Checks version compatibility between backup and running instance +4. Restores database data using `mongorestore` command +5. Preserves all existing instance configuration and resources -#### masbr_backup_schedule -Cron expression for scheduling automated backups. +**Note**: This action does NOT restore instance resources (secrets, certificates, CRs). Use the `restore` action for full instance restoration. -- **Optional** -- Environment Variable: `MASBR_BACKUP_SCHEDULE` -- Default: None (creates on-demand backup) +### Full Restore (CE Operator) +Perform a complete restoration of MongoDB instance including all Kubernetes resources and database data from a backup. -**Purpose**: Defines when automated backups should run using standard cron syntax. Enables regular, unattended backup operations. +**Use Case**: This action is ideal for disaster recovery scenarios where you need to recreate the entire MongoDB instance from scratch, including all configuration, secrets, certificates, and data. 
-**When to use**: -- Set to schedule regular automated backups (e.g., daily, weekly) -- Leave unset for manual, on-demand backups -- Consider backup windows and system load when scheduling +**Prerequisites**: +- Backup files must be available in the specified backup directory +- Backup must include instance resources (secrets, certificates, CRs) +- Target cluster must have cert-manager installed +- Sufficient storage and resources available -**Valid values**: Standard [cron expression](https://en.wikipedia.org/wiki/Cron) (e.g., `0 2 * * *` for daily at 2 AM, `0 2 * * 0` for weekly on Sunday at 2 AM) +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + mongodb_action: restore + mas_instance_id: masinst1 + mongodb_backup_version: 20251212-021316 + mas_backup_dir: /tmp/masbr + roles: + - ibm.mas_devops.mongodb +``` -**Impact**: When set, creates a Kubernetes CronJob that automatically runs backups on schedule. Without this, backups only run when role is manually executed. +**What this does**: +1. Validates backup files and required variables +2. Restores namespace, secrets, and ConfigMaps +3. Restores CRDs and RBAC resources +4. Restores MongoDB Operator deployment +5. Restores Certificate Manager resources +6. Restores MongoDBCommunity Custom Resource +7. Waits for MongoDB instance to be fully operational +8. Restores database data using `mongorestore` -**Related variables**: -- `masbr_job_timezone`: Specifies time zone for schedule -- `masbr_backup_type`: Determines if scheduled backups are full or incremental +**Note**: This is a complete restoration that recreates the MongoDB instance from scratch. Use `restore-database` action if you only need to restore data to an existing instance. -**Note**: Test cron expressions before deploying. Consider backup duration when scheduling to avoid overlapping backup jobs. +### Install from Backup (CE Operator) +Deploy a new MongoDB instance using configuration from a backup and restore data from a backup. This is useful for disaster recovery or migrating MongoDB to a new cluster. -#### masbr_restore_from_version -Timestamp of the backup version to restore. +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + mongodb_action: install + mas_instance_id: masinst1 + mongodb_backup_version: 20251212-021316 + mas_backup_dir: /tmp/masbr + roles: + - ibm.mas_devops.mongodb +``` -- **Required** when `mongodb_action=restore` -- Environment Variable: `MASBR_RESTORE_FROM_VERSION` -- Default: None **Purpose**: Specifies which backup version to restore from. This allows point-in-time recovery to a specific backup. diff --git a/ibm/mas_devops/roles/mongodb/defaults/main.yml b/ibm/mas_devops/roles/mongodb/defaults/main.yml index a8eba711b0..020746e3c8 100644 --- a/ibm/mas_devops/roles/mongodb/defaults/main.yml +++ b/ibm/mas_devops/roles/mongodb/defaults/main.yml @@ -5,6 +5,8 @@ mongodb_provider: "{{ lookup('env','MONGODB_PROVIDER') | default('community', Tr # When these are defined we will generate a MAS MongoCfg template mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" mas_config_dir: "{{ lookup('env', 'MAS_CONFIG_DIR') }}" + +# Supported actions: install, uninstall, backup, backup-database, restore, restore-database mongodb_action: "{{ lookup('env', 'MONGODB_ACTION') | default('install', True) }}" # Backup mongodb databases for specified MAS application. Backup all mongodb databases if not specify this value. 
@@ -117,6 +119,19 @@ docdb_master_password: "{{ lookup('env', 'DOCDB_MASTER_PASSWORD') }}" # Custom Labels custom_labels: "{{ lookup('env', 'CUSTOM_LABELS') | default(None, true) | string | ibm.mas_devops.string2dict() }}" +# MongoCE backup and restore vars +# ----------------------------------------------------------------------------- +mongodb_instance_name: "{{ lookup('env', 'MONGODB_INSTANCE_NAME') | default('mas-mongo-ce', true) }}" +mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" +mongodb_backup_version: "{{ lookup('env', 'MONGODB_BACKUP_VERSION') }}" + +# set flag to true, to override storage class with cluster's default +override_storageclass: "{{ lookup('env', 'OVERRIDE_STORAGECLASS') | default(false, true) }}" +_mongodb_storage: "NO_OVERRIDE" + +# ----------------------------------------------------------------------------- + + # Mongo upgrade flags # If identified that there's an existing Mongo that might lead to a v5 or v6 upgrade # the following flags must be set to confirm the upgrades otherwise the role will fail and not proceed with the upgrade. diff --git a/ibm/mas_devops/roles/mongodb/tasks/main.yml b/ibm/mas_devops/roles/mongodb/tasks/main.yml index b46859adae..8e2daa9236 100755 --- a/ibm/mas_devops/roles/mongodb/tasks/main.yml +++ b/ibm/mas_devops/roles/mongodb/tasks/main.yml @@ -8,6 +8,13 @@ that: mongodb_provider is defined and mongodb_provider != "" fail_msg: "mongodb_provider property is required" +- name: "Fail if mongodb_action is not provided or invalid" + assert: + that: + - mongodb_action is defined + - mongodb_action in ["install", "uninstall", "backup", "backup-database", "restore-database", "restore"] + fail_msg: "mongodb_action property is required and must be one of 'install', 'uninstall', 'backup', 'backup-database','restore-database', 'restore'" + # 2. 
Run the install / uninstall for specified provider
# -----------------------------------------------------------------------------
- include_tasks: "tasks/providers/{{ mongodb_provider }}/{{ mongodb_action }}.yml"
diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-database.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-database.yml
new file mode 100644
index 0000000000..ddc43b08fb
--- /dev/null
+++ b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-database.yml
@@ -0,0 +1,37 @@
+---
+# Check mongodb backup variables
+# -----------------------------------------------------------------------------
+
+- name: "Fail if required variables for MongoDB backup are not provided"
+  ibm.mas_devops.verify_backup_restore_vars:
+    mas_instance_id: "{{ mas_instance_id }}"
+    mas_backup_dir: "{{ mas_backup_dir }}"
+    mongodb_instance_name: "{{ mongodb_instance_name }}"
+    action: "backup"
+    component: "mongodb"
+
+- name: "Check if MONGODB_BACKUP_VERSION is provided, if not set to default 'YYYYMMDD-HHMMSS' format"
+  set_fact:
+    mongodb_backup_version: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}"
+  when: mongodb_backup_version is not defined or mongodb_backup_version == "" or mongodb_backup_version == "None"
+
+- name: "Set fact: Create MongoDB backup base directory path"
+  set_fact:
+    mongodb_backup_path: "{{ mas_backup_dir }}/backup-{{ mongodb_backup_version }}-mongoce"
+
+- name: "Create {{ mongodb_backup_path }} directory for MongoDB backup"
+  file:
+    path: "{{ mongodb_backup_path }}"
+    state: directory
+    mode: "0755"
+
+# Backup MongoDB database data using mongodump
+# -------------------------------------------------------------------------
+- name: "Debug information - MongoDB database data backup"
+  debug:
+    msg:
+      - "MongoCE namespace .......................... {{ mongodb_namespace }}"
+      - "MongoCE instance name ...................... {{ mongodb_instance_name }}"
+
+- name: "Start Database backup process."
+ include_tasks: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/backup-database.yml" diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/after-backup-restore.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/after-backup-restore.yml deleted file mode 100644 index 460fc3510c..0000000000 --- a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/after-backup-restore.yml +++ /dev/null @@ -1,11 +0,0 @@ ---- -# Clean up -# ------------------------------------------------------------------------- -- name: "Delete temporary folders" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - rm -f {{ masbr_pod_lock_file }}; - rm -rf {{ mongodb_pod_temp_folder }}; - rm -rf {{ mongodb_pvc_temp_folder }} - {{ exec_in_pod_end }} diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/backup-database.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/backup-database.yml index 33111d295f..14f1a1929c 100644 --- a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/backup-database.yml +++ b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/backup-database.yml @@ -1,340 +1,111 @@ --- -# Update database backup status: InProgress +# Backup Mongodb database Data using mondodump # ----------------------------------------------------------------------------- -- name: "Update database backup status: InProgress" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "InProgress" -- name: "Backup mongodb databases" +- name: "Retrieve mongodb info from cr and resources" + ibm.mas_devops.get_mongoce_info: + mongodb_instance_name: "{{ mongodb_instance_name }}" + mongodb_namespace: "{{ mongodb_namespace }}" + register: mongodb_info_result + +- name: "Set fact: mongodb info" + set_fact: + mongodb_backup_data_path: "{{ mongodb_backup_path }}/data" + mongodb_backup_data_filename: "mongodump-{{ mongodb_backup_version }}.tar.gz" + mongoce_pod_name: "{{ mongodb_info_result.mongoce_pod_name }}" + mongodb_version: "{{ mongodb_info_result.mongodb_version }}" + mongodb_service_name: "{{ mongodb_info_result.mongodb_service_name }}" + mongodb_host: "{{ mongodb_info_result.mongodb_host }}" + mongodb_admin_user: "{{ mongodb_info_result.mongodb_admin_user }}" + mongodb_admin_password: "{{ mongodb_info_result.mongodb_admin_password }}" + +- name: "Create {{ mongodb_backup_data_path }} directory for Mongodb database backup" + file: + path: "{{ mongodb_backup_data_path }}" + state: directory + mode: "0755" + +- name: "debug facts" + debug: + msg: + - "mongoce_pod_name .................................. {{ mongoce_pod_name }}" + - "mongodb_version ................................... {{ mongodb_version }}" + - "mongodb_service_name .............................. {{ mongodb_service_name }}" + - "mongodb_host ...................................... {{ mongodb_host }}" + - "mongodb_backup_version ............................ 
{{ mongodb_backup_version }}" + + +# Prepare shell scripts and transfer to mongo pod +# ----------------------------------------------------------------------------- +- name: Create create-role-user.sh script in local /tmp + ansible.builtin.template: + src: community/backup-restore/create-role-user.sh.j2 + dest: /tmp/create-role-user.sh + mode: '777' + +- name: Create database-backup.sh script in local /tmp + ansible.builtin.template: + src: community/backup-restore/database-backup.sh.j2 + dest: /tmp/database-backup.sh + mode: '777' + +- name: Copy the create-role-user.sh script into the mongodb pod + shell: "oc cp --retries=50 /tmp/create-role-user.sh {{ mongodb_namespace }}/{{ mongoce_pod_name }}:/tmp/create-role-user.sh -c mongod" + register: copy_result + +- name: Copy the database-backup.sh script into the mongodb pod + shell: "oc cp --retries=50 /tmp/database-backup.sh {{ mongodb_namespace }}/{{ mongoce_pod_name }}:/tmp/database-backup.sh -c mongod" + register: copy_result + +# The log file will also be available inside the pod /tmp/create-role-user.log +- name: Exec into mongo pod and run create-role-user.sh to setup backup role and user in Mongodb. + shell: | + oc exec -n {{mongodb_namespace}} {{mongoce_pod_name}} -c mongod -- bash /tmp/create-role-user.sh + register: create_roleuser_output + +- name: "Assert Create role user" + assert: + that: + - create_roleuser_output.rc == 0 + - create_roleuser_output.stdout | regex_search('ROLEUSERstatus-SUCCESS', multiline=True) is not none + fail_msg: "Failed to create role and user for backup" + +- name: "Debug create-role-user logs" + debug: + msg: "{{ create_roleuser_output.stdout_lines }}" + +- name: "Starting Mongo database backup" block: - # Create mongodb role and user for backing up databases - # ------------------------------------------------------------------------- - - name: "Create mongodb role and user for backing up databases" - include_tasks: "tasks/providers/{{ mongodb_provider }}/backup-restore/create-role-user.yml" - - # Prepare mongodb database backup folder - # ------------------------------------------------------------------------- - - name: "Set fact: mongodb database backup folder" - set_fact: - # We should use mongodb pod ephemeral local storage to save the temporary files, - # the mongodb data pvc size is not big enough. 
- mongodb_backup_folder: "{{ mongodb_pod_temp_folder }}/{{ masbr_job_data_type }}" - - - name: "Set fact: mongodb database backup log" - set_fact: - mongodb_backup_log: "{{ mongodb_backup_folder }}/{{ masbr_job_name }}-backup.log" - - - name: "Create mongodb database backup folder" - changed_when: true - shell: > - mkdir -p {{ masbr_local_job_folder }}/{{ masbr_job_data_type }}; - {{ exec_in_pod_begin }} - mkdir -p {{ mongodb_backup_folder }} - {{ exec_in_pod_end }} - - # Set database name filter - # ------------------------------------------------------------------------- - - name: "Set fact: default database name filter" - when: mas_app_id is not defined or mas_app_id | length == 0 - set_fact: - # Backup all databases if not specified mas_app_id - mongodb_db_filter: "^(mas|iot)(_|-){{ mas_instance_id }}(_|-)(?!.*monitor$)" - - - name: "Set fact: database name filter for {{ mas_app_id }}" - when: mas_app_id is defined and mas_app_id | length > 0 - block: - - name: "Set fact: database name filters for each mas app" - set_fact: - mongodb_db_all_filters: - iot: "iot_{{ mas_instance_id }}_" - visualinspection: "mas-{{ mas_instance_id }}-(visualinspection|edgeman)" - optimizer: "mas_{{ mas_instance_id }}_optimizer" - - - name: "Set fact: always backup databases for core" - set_fact: - mongodb_db_app_filters: - ["mas_{{ mas_instance_id }}_(core|catalog|adoptionusage)"] - - - name: "Set fact: append database name filters for ({{ mas_app_id }})" - set_fact: - mongodb_db_app_filters: > - {{ mongodb_db_app_filters + [mongodb_db_all_filters[mas_app_id] | default('')] }} - - - name: "Set fact: database name filters for ({{ mas_app_id }})" - when: mongodb_db_app_filters is defined and mongodb_db_app_filters | length > 0 - set_fact: - mongodb_db_filter: "{{ mongodb_db_app_filters | select() | join('|') }}" - - - name: "Debug: database name filter" + - name: "Running backup script in pod. 
Check logs in /tmp/database-backup.log in pod" + shell: | + oc exec -n {{mongodb_namespace}} {{mongoce_pod_name}} -c mongod -- bash -c '/tmp/database-backup.sh | tee /tmp/database-backup.log' + register: database_backup_output + + - name: "Assert Database backup" + assert: + that: + - database_backup_output.rc == 0 + - database_backup_output.stdout | regex_search('DATABASEBACKUPstatus-SUCCESS', multiline=True) is not none + fail_msg: "Failed to backup databases" + + - name: "Debug database-backup logs" debug: - msg: "{{ mongodb_db_filter }}" - - # Take a full backup of mongodb databases - # ------------------------------------------------------------------------- - - name: "Take a full backup of mongodb databases" - when: masbr_backup_type == "full" - block: - # Get all database name belong to specified mas instance - - name: "Get all database names belong to mas instance {{ mas_instance_id }}" - changed_when: false - shell: >- - {{ exec_in_pod_begin }} - {{ mongodb_shell }} --quiet --host={{ mongodb_primary_host }} - --username={{ mongodb_user }} --password={{ mongodb_password }} - --authenticationDatabase=admin --tls --tlsCAFile={{ mongodb_ca_file }} admin - --eval="JSON.stringify(db.adminCommand({ - listDatabases: 1, - nameOnly: true, - filter: { name: /{{ mongodb_db_filter }}/ } - }))" | tee -a {{ mongodb_backup_log }} - {{ exec_in_pod_end }} - register: _mongodb_db_names_output - no_log: true - - - name: "Set fact: database names" - set_fact: - mongodb_db_names: "{{ _mongodb_db_names_output.stdout | from_json | json_query('databases') }}" - - - name: "Debug: database names" - debug: - msg: "{{ mongodb_db_names }}" - - # Get the timestamp of current backup - - name: "Get the timestamp of current backup" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - date +%s - {{ exec_in_pod_end }} - register: _ts_output - - - name: "Set fact: timestamp of current backup" - set_fact: - mongodb_backup_ts: "{{ _ts_output.stdout }}" - - # Excluding sessions collection for Monitor database since it is not cleaned up automatically - # leading to very large backup sizes and long backup times. This is acceptable as - # sessions are transient and can be recreated. 
- # Refer DT425304 - MASCORE-5808 - - name: "Set fact for Monitor database" - set_fact: - monitor_db_name: "mas_{{ mas_instance_id }}_monitor" - exclude_monitor_session_collection: "--excludeCollection=sessions" - - # mongodump - - name: "Take full backup of mongodb databases" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - mongodump --host={{ mongodb_primary_host }} - --username={{ mongodb_user }} --password={{ mongodb_password }} - --authenticationDatabase=admin --ssl --sslCAFile={{ mongodb_ca_file }} - --out={{ mongodb_backup_folder }}/mongodump - --db={{ item.name }} - {{ exclude_monitor_session_collection if (item.name == monitor_db_name) else '' }} - |& tee -a {{ mongodb_backup_log }} - {{ exec_in_pod_end }} - loop: "{{ mongodb_db_names }}" - register: _mongodump_output - no_log: true - - - name: "Debug: mongodump output" - debug: - msg: "{{ _mongodump_output | json_query('results[*].stdout_lines') }}" - - # Take an incremental backup of mongodb databases - # ------------------------------------------------------------------------- - - name: "Take an incremental backup of mongodb databases" - when: masbr_backup_type == "incr" - block: - # Get query file from specified backup job - - name: "Get query file from specified backup job" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_storage_files_to_pod.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_backup_from }}" - masbr_cf_paths: - - src_file: "{{ masbr_job_data_type }}/query.json" - dest_folder: "{{ mongodb_backup_folder }}/from" - - - name: "Get query file content" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - cat {{ mongodb_backup_folder }}/from/query.json - {{ exec_in_pod_end }} - register: _query_file_content_output + msg: "{{ database_backup_output.stdout_lines }}" - - name: "Debug: query file content" - debug: - msg: "{{ _query_file_content_output.stdout }}" + - name: "copy backup files from mongo pod to local backup directory" + shell: "oc cp --retries=50 -c mongod {{ mongodb_namespace }}/{{ mongoce_pod_name }}:/tmp/masbr/{{ mongodb_backup_version }}/{{ mongodb_backup_data_filename }} {{ mongodb_backup_data_path }}/{{ mongodb_backup_data_filename }}" + register: copy_result - # Get the timestamp of current backup - - name: "Get the timestamp of current backup" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - date +%s - {{ exec_in_pod_end }} - register: _ts_output - - - name: "Set fact: timestamp of current backup" - set_fact: - mongodb_backup_ts: "{{ _ts_output.stdout }}" - - # mongodump - - name: "Take an incremental backup of mongodb databases" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - mongodump --host={{ mongodb_primary_host }} - --username={{ mongodb_user }} --password={{ mongodb_password }} - --authenticationDatabase=admin --ssl --sslCAFile={{ mongodb_ca_file }} - --db=local --collection=oplog.rs --queryFile={{ mongodb_backup_folder }}/from/query.json - --out={{ mongodb_backup_folder }}/mongodump - |& tee -a {{ mongodb_backup_log }} - {{ exec_in_pod_end }} - register: _mongodump_output - no_log: true - - - name: "Debug: mongodump output" - debug: - msg: "{{ _mongodump_output.stdout_lines }}" - - # Create tar.gz archives of database backup files - # ------------------------------------------------------------------------- - - name: "Create tar.gz archives of database backup files" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - tar -czf {{ mongodb_backup_folder }}/{{ masbr_job_name }}.tar.gz - -C {{ 
mongodb_backup_folder }}/mongodump . && - du -h {{ mongodb_backup_folder }}/* - {{ exec_in_pod_end }} - register: _du_files_output - - - name: "Debug: size of backup files" - debug: - msg: "{{ _du_files_output.stdout_lines }}" - - # Copy backup files to specified storage location - # ------------------------------------------------------------------------- - - name: "Copy backup files from pod to specified storage location" - when: _mongodb_cf_in_server - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_pod_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name }}" - masbr_cf_paths: - - src_file: "{{ mongodb_backup_folder }}/{{ masbr_job_name }}.tar.gz" - dest_folder: "{{ masbr_job_data_type }}" - - - name: "Download and copy backup files to specified storage location" - when: not _mongodb_cf_in_server - block: - - name: "Download backup files from pod to local" - changed_when: true - shell: > - oc cp --retries=50 -c {{ mongodb_container_name }} - {{ mongodb_namespace }}/{{ mongodb_pod_name }}:{{ mongodb_backup_folder }}/{{ masbr_job_name }}.tar.gz - {{ masbr_local_job_folder }}/{{ masbr_job_data_type }}/{{ masbr_job_name }}.tar.gz - - - name: "Copy backup files from local to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name }}" - masbr_cf_paths: - - src_file: "{{ masbr_job_data_type }}/{{ masbr_job_name }}.tar.gz" - dest_folder: "{{ masbr_job_data_type }}" - - # Create query file used by the subsequent incremental backups - # ------------------------------------------------------------------------- - - name: "Create query file used by the subsequent incremental backups" + - name: "Create yaml file with mongodb backup details" copy: - dest: "{{ masbr_local_job_folder }}/{{ masbr_job_data_type }}/query.json" - content: "{{ mongodb_query | from_yaml | to_json }}" - mode: preserve - vars: - mongodb_query: "{{ lookup('template', 'templates/community/mongo-query.yml.j2') }}" - - - name: "Get query file content" - changed_when: false - shell: > - cat {{ masbr_local_job_folder }}/{{ masbr_job_data_type }}/query.json - register: _cat_query_file_output - - - name: "Debug: query file content" - debug: - msg: "{{ _cat_query_file_output.stdout }}" - - - name: "Copy query file to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name }}" - masbr_cf_paths: - - src_file: "{{ masbr_job_data_type }}/query.json" - dest_folder: "{{ masbr_job_data_type }}" - - # Update database backup status: Completed - # ------------------------------------------------------------------------- - - name: "Update database backup status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update database backup status: Failed - # ------------------------------------------------------------------------- - - name: "Update database backup status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" + dest: "{{ mongodb_backup_data_path }}/mongodb-info.yaml" + content: | + 
source_mongodb_backup_version: "{{ mongodb_backup_version }}" + source_mongodb_backup_data_filename: "{{ mongodb_backup_data_filename }}" + source_mongodb_version: "{{ mongodb_version }}" always: - # Copy mongodb backup log file to specified storage location - # ------------------------------------------------------------------------- - - name: "Create a tar.gz archive of mongodb backup log" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - tar -czf {{ mongodb_backup_folder }}/{{ masbr_job_name }}-backup-log.tar.gz - -C {{ mongodb_backup_folder }} {{ masbr_job_name }}-backup.log - {{ exec_in_pod_end }} - - - name: "Copy mongodb backup log file from pod to specified storage location" - when: _mongodb_cf_in_server - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_pod_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "{{ mongodb_backup_folder }}/{{ masbr_job_name }}-backup-log.tar.gz" - dest_folder: "log" - - - name: "Download and copy backup log file to specified storage location" - when: not _mongodb_cf_in_server - block: - - name: "Download backup log file from pod to local" - changed_when: true - shell: > - oc cp --retries=50 -c {{ mongodb_container_name }} - {{ mongodb_namespace }}/{{ mongodb_pod_name }}:{{ mongodb_backup_folder }}/{{ masbr_job_name }}-backup-log.tar.gz - {{ masbr_local_job_folder }}/log/{{ masbr_job_name }}-backup-log.tar.gz - - - name: "Copy backup log file from local to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "log/{{ masbr_job_name }}-backup-log.tar.gz" - dest_folder: "log" + - name: Cleanup backup files from mongo pod + shell: | + oc exec -n {{mongodb_namespace}} {{mongoce_pod_name}} -c mongod -- rm -rf /tmp/masbr/{{ mongodb_backup_version }} + register: database_backup_cleanup_output diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/backup-instance.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/backup-instance.yml new file mode 100644 index 0000000000..818229c457 --- /dev/null +++ b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/backup-instance.yml @@ -0,0 +1,127 @@ +--- +# Backup Mongodb Community edition cluster Instance +# ----------------------------------------------------------------------------- +# Check if GrafanaDashboard CRD exists +# ----------------------------------------------------------------------------- +- name: "Check if GrafanaDashboard CRD exists" + ibm.mas_devops.crd_exists: + crdName: grafanadashboards.grafana.integreatly.org + register: grafana_crd_check + +- name: "Set fact: MongoDB CE backup resources" + set_fact: + mongoce_namespace_backup_resources: + - kind: Project + api_version: project.openshift.io/v1 + name: "{{ mongodb_namespace }}" + # CRD - mongodbcommunity.mongodbcommunity.mongodb.com + - kind: CustomResourceDefinition + api_version: apiextensions.k8s.io/v1 + name: mongodbcommunity.mongodbcommunity.mongodb.com + # mongodbcommunity.mongodb.com + - kind: MongoDBCommunity + api_version: mongodbcommunity.mongodb.com/v1 + name: "{{ mongodb_instance_name }}" + # Role + - kind: Role + api_version: rbac.authorization.k8s.io/v1 + name: mongodb-kubernetes-operator + - kind: Role + api_version: rbac.authorization.k8s.io/v1 + 
name: mongodb-database + # RoleBinding + - kind: RoleBinding + api_version: rbac.authorization.k8s.io/v1 + name: mongodb-kubernetes-operator + - kind: RoleBinding + api_version: rbac.authorization.k8s.io/v1 + name: mongodb-database + # ServiceAccount + - kind: ServiceAccount + api_version: v1 + name: mongodb-kubernetes-operator + - kind: ServiceAccount + api_version: v1 + name: mongodb-database + # Issuers + - kind: Issuer + api_version: cert-manager.io/v1 + # Certificates + - kind: Certificate + api_version: cert-manager.io/v1 + # Deployment + - kind: Deployment + api_version: apps/v1 + name: mongodb-kubernetes-operator + # secrets + - kind: Secret + api_version: v1 + name: "{{ mongodb_instance_name }}-scram-scram-credentials" + - kind: Secret + api_version: v1 + name: "{{ mongodb_instance_name }}-admin-password" + # Configmap + - kind: ConfigMap + api_version: v1 + name: "{{ mongodb_instance_name }}-cert-map" + # Service Monitor + - kind: ServiceMonitor + api_version: monitoring.coreos.com/v1 + name: "{{ mongodb_instance_name }}-service-monitor" + +# Add GrafanaDashboard resource only if CRD exists +# ----------------------------------------------------------------------------- +- name: "Set fact: MongoDB Grafana Dashboard backup resources" + set_fact: + grafana_mongodb_backup_resources: + - kind: GrafanaDashboard + api_version: grafana.integreatly.org/v1beta1 + +- name: Add GrafanaDashboard to backup resources if CRD exists + set_fact: + mongoce_namespace_backup_resources: "{{ mongoce_namespace_backup_resources + grafana_mongodb_backup_resources }}" + when: grafana_crd_check.exists | bool + +- name: "Set fact: MongoDB backup resources" + set_fact: + mongoce_backup_resources: + - namespace: "{{ mongodb_namespace }}" + resources: "{{ mongoce_namespace_backup_resources }}" + +# Call the backup_resources plugin to execute the backup to the path provided +# ----------------------------------------------------------------------------- +- name: "Backup MongoCE resources (referenced secrets are auto-discovered)" + ibm.mas_devops.backup_resource: + backup_resources: "{{ mongoce_backup_resources }}" + backup_path: "{{ mongodb_backup_path }}" + register: backup_result + +# Show the results +# ----------------------------------------------------------------------------- +- name: "Display backup results" + debug: + msg: + - "Backup completed{{ ' with failures' if backup_result.failed_count > 0 else ' successfully' }}" + - "Total resources backed up: {{ backup_result.backed_up_count }}" + - "Total resources failed: {{ backup_result.failed_count }}" + - "Resources not found: {{ backup_result.not_found_count }}" + - "Secrets auto-discovered: {{ backup_result.discovered_secrets_count }}" + - "Backup location: {{ mongodb_backup_path }}" + +# Fail task if any errors occurred. 
+# ----------------------------------------------------------------------------- +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ backup_result.failed_resources | to_nice_yaml }}" + when: backup_result.failed_count > 0 + +- name: "Fail if backup had errors" + fail: + msg: | + Backup failed for {{ backup_result.failed_count }} resource(s): + {% for resource in backup_result.failed_resources %} + - {{ resource.description }} in {{ resource.scope }} + {% endfor %} + when: backup_result.failed_count > 0 diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/before-backup-restore.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/before-backup-restore.yml deleted file mode 100644 index ed74e87464..0000000000 --- a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/before-backup-restore.yml +++ /dev/null @@ -1,159 +0,0 @@ -# Set mongodb backup/restore variables -# ----------------------------------------------------------------------------- -- name: "Set fact: mongodb pod variables" - set_fact: - mongodb_pod_name: mas-mongo-ce-0 - mongodb_container_name: mongod - -- name: "Set fact: exec command in mongodb pod" - set_fact: - exec_in_pod_begin: >- - oc exec {{ mongodb_pod_name }} -c {{ mongodb_container_name }} -n {{ mongodb_namespace }} -- bash -c ' - exec_in_pod_end: "'" - -- name: "Set fact: copy file variables" - set_fact: - masbr_cf_namespace: "{{ mongodb_namespace }}" - masbr_cf_pod_name: "{{ mongodb_pod_name }}" - masbr_cf_container_name: "{{ mongodb_container_name }}" - masbr_cf_pvc_name: "data-volume-{{ mongodb_pod_name }}" - masbr_cf_pvc_mount_path: "/data" - masbr_cf_pvc_sub_path: "" - masbr_cf_are_pvc_paths: false - # The mongodb pvc access mode is 'ReadWriteOnce', we need to set affinity to schedule our copying file pod - # to the same node where mongodb pod located, so that the mongodb pvc can be mounted by multiple pods. 
- masbr_cf_affinity: true - -- name: "Set fact: temporary folders" - set_fact: - mongodb_pod_temp_folder: "{{ masbr_pod_temp_folder }}/{{ masbr_job_name }}" - mongodb_pvc_temp_folder: "{{ masbr_cf_pvc_mount_path }}/{{ masbr_job_name }}" - -# Get mongodb admin password -# ----------------------------------------------------------------------------- -- name: "Get mongodb admin password" - kubernetes.core.k8s_info: - kind: Secret - name: mas-mongo-ce-admin-password - namespace: "{{ mongodb_namespace }}" - register: _mongodb_password_output - no_log: true - -- name: "Set fact: mongodb admin password" - set_fact: - mongodb_password: "{{ _mongodb_password_output.resources[0].data.password | b64decode }}" - when: - - _mongodb_password_output is defined - - _mongodb_password_output.resources[0] is defined - - _mongodb_password_output.resources[0].data.password is defined - no_log: true - -# Get mongodb ca file location -# ----------------------------------------------------------------------------- -- name: "Get mongodb ca file" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - ls /var/lib/tls/ca/*.pem | head -n 1 - {{ exec_in_pod_end }} - register: _mongodb_ca_file_output - -- name: "Set fact: mongodb ca file" - set_fact: - mongodb_ca_file: "{{ _mongodb_ca_file_output.stdout }}" - when: - - _mongodb_ca_file_output.rc == 0 - - '"No such file or directory" not in _mongodb_ca_file_output.stdout' - -# Get mongodb primary host -# ----------------------------------------------------------------------------- -- name: "Get mongodb server information" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - {{ mongodb_shell }} --quiet --host={{ mongodb_pod_name }}.mas-mongo-ce-svc.{{ mongodb_namespace }}.svc.cluster.local:27017 - --username=admin --password={{ mongodb_password }} --authenticationDatabase=admin - --tls --tlsCAFile={{ mongodb_ca_file }} admin - --eval="print(JSON.stringify(db.runCommand({hello:1})))" - {{ exec_in_pod_end }} - register: _mongodb_info_output - no_log: true - -- name: "Set fact: mongodb primary host" - set_fact: - mongodb_primary_host: "{{ _mongodb_info_output.stdout_lines[-1] | from_json | json_query('primary') }}" - -# Output mongodb information -# ----------------------------------------------------------------------------- -- name: "Debug: mongodb information" - debug: - msg: - - "MongoDB version ............................ {{ mongodb_version }}" - - "MongoDB is running ......................... {{ mongodb_running }}" - - "MongoDB pod name ........................... {{ mongodb_pod_name }}" - - "MongoDB primary host ....................... {{ mongodb_primary_host }}" - - "MongoDB ca file ............................ {{ mongodb_ca_file }}" - -# Check if an exiting job is running -# ------------------------------------------------------------------------- -- name: "Try to find job lock file in pod" - when: not masbr_allow_multi_jobs - changed_when: false - shell: > - {{ exec_in_pod_begin }} - [ -f {{ masbr_pod_lock_file }} ] && echo exist; exit 0 - {{ exec_in_pod_end }} - register: _get_lock_file_output - -- name: "Fail if found job lock file in pod" - when: not masbr_allow_multi_jobs - assert: - that: _get_lock_file_output.stdout != "exist" - fail_msg: "A backup/restore job is running now, please try to run job later!" 
- -- name: "Create job lock file in pod" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - mkdir -p {{ masbr_pod_lock_file | dirname }}; - touch {{ masbr_pod_lock_file }} - {{ exec_in_pod_end }} - register: _create_restore_lock_output - -# Check storage usage -# ------------------------------------------------------------------------- -- name: "Get storage usage of pod temporary folder" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - mkdir -p {{ mongodb_pod_temp_folder }}; - df -h {{ mongodb_pod_temp_folder }} - {{ exec_in_pod_end }} - register: _df_temp_output - -- name: "Debug: storage usage of pod temporary folder" - debug: - msg: "{{ _df_temp_output.stdout_lines }}" - -- name: "Get storage usage of pvc temporary folder" - changed_when: false - shell: > - {{ exec_in_pod_begin }} - mkdir -p {{ mongodb_pvc_temp_folder }}; - df -h {{ mongodb_pvc_temp_folder }} - {{ exec_in_pod_end }} - register: _df_pvc_output - -- name: "Debug: storage usage of pvc temporary folder" - debug: - msg: "{{ _df_pvc_output.stdout_lines }}" - -# Workarounds -# ------------------------------------------------------------------------- -- name: "Set fact: how to copy backup files to specified storage location" - # When testing in the env using 'ibmc-block-gold' PVC, our copying file pod cannot mount - # the mongo data PVC even schedule it to the same node where mongo data pod located. - # Do not create Pod to copy mongo PVC data at this point, we will download the data to local first, - # then copy it to specified storage location. - set_fact: - _mongodb_cf_in_server: false diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/create-role-user.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/create-role-user.yml deleted file mode 100644 index 4a9d97c804..0000000000 --- a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/create-role-user.yml +++ /dev/null @@ -1,67 +0,0 @@ ---- -- name: "Set fact: mongodb role and user" - set_fact: - mongodb_role: sysadmin - mongodb_user: sysadmin - -# Create mongodb role -# ----------------------------------------------------------------------------- -- name: "Get mongodb role '{{ mongodb_role }}'" - changed_when: false - shell: > - oc exec {{ mongodb_pod_name }} -c {{ mongodb_container_name }} -n {{ mongodb_namespace }} -- bash -c - '{{ mongodb_shell }} --quiet --host={{ mongodb_primary_host }} --username=admin --password={{ mongodb_password }} - --authenticationDatabase=admin --tls --tlsCAFile={{ mongodb_ca_file }} admin - --eval="db.getRole( \"{{ mongodb_role }}\" )"' - register: _mongodb_get_role_output - no_log: true - -- name: "Debug: get mongodb role result" - debug: - msg: "Get mongodb role result ............... 
{{ _mongodb_get_role_output.stdout_lines }}" - -- name: "Create mongodb role '{{ mongodb_role }}'" - when: _mongodb_get_role_output.stdout_lines[-1] == "null" - block: - - name: "Create mongodb role '{{ mongodb_role }}'" - changed_when: true - shell: > - oc exec {{ mongodb_pod_name }} -c {{ mongodb_container_name }} -n {{ mongodb_namespace }} -- bash -c - '{{ mongodb_shell }} --quiet --host={{ mongodb_primary_host }} --username=admin --password={{ mongodb_password }} - --authenticationDatabase=admin --tls --tlsCAFile={{ mongodb_ca_file }} admin - --eval="db.createRole({ role: \"{{ mongodb_role }}\", roles: [], privileges: [{ resource: {anyResource: true}, actions: [\"anyAction\"] }]})"' - register: _mongodb_create_role_output - no_log: true - - - name: "Debug: create mongodb role result" - debug: - msg: "Create mongodb role result ........ {{ _mongodb_create_role_output.stdout_lines }}" - -# Create mongodb user -# ----------------------------------------------------------------------------- -- name: "Get mongodb user '{{ mongodb_user }}'" - changed_when: false - shell: > - oc exec {{ mongodb_pod_name }} -c {{ mongodb_container_name }} -n {{ mongodb_namespace }} -- bash -c - '{{ mongodb_shell }} --quiet --host={{ mongodb_primary_host }} --username=admin --password={{ mongodb_password }} - --authenticationDatabase=admin --tls --tlsCAFile={{ mongodb_ca_file }} admin - --eval="db.getUser( \"{{ mongodb_user }}\" )"' - register: _mongodb_get_user_output - no_log: true - -- name: "Create mongodb user '{{ mongodb_user }}'" - when: _mongodb_get_user_output.stdout_lines[-1] == "null" - block: - - name: "Create mongodb user '{{ mongodb_user }}'" - changed_when: true - shell: > - oc exec {{ mongodb_pod_name }} -c {{ mongodb_container_name }} -n {{ mongodb_namespace }} -- bash -c - '{{ mongodb_shell }} --quiet --host={{ mongodb_primary_host }} --username=admin --password={{ mongodb_password }} - --authenticationDatabase=admin --tls --tlsCAFile={{ mongodb_ca_file }} admin - --eval="db.createUser({ user: \"{{ mongodb_user }}\", pwd: \"{{ mongodb_password }}\", roles: [ \"{{ mongodb_role }}\" ]})"' - register: _mongodb_create_user_output - no_log: true - - - name: "Debug: create mongodb user result" - debug: - msg: "Create mongodb user result ........ {{ _mongodb_create_user_output.stdout_lines }}" diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/get-mongo-info.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/get-mongo-info.yml deleted file mode 100644 index 34fb99117e..0000000000 --- a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/get-mongo-info.yml +++ /dev/null @@ -1,38 +0,0 @@ ---- -# Get mongodb version and status -# ----------------------------------------------------------------------------- -- name: "Get MongoDBCommunity" - kubernetes.core.k8s_info: - api_version: mongodbcommunity.mongodb.com/v1 - kind: MongoDBCommunity - name: mas-mongo-ce - namespace: "{{ mongodb_namespace }}" - register: _mongodbcommunity_output - -- name: "Set fact: mongodb version" - set_fact: - mongodb_version: "{{ _mongodbcommunity_output.resources[0].spec.version }}" - when: - - _mongodbcommunity_output is defined - - _mongodbcommunity_output.resources[0] is defined - - _mongodbcommunity_output.resources[0].spec.version is defined - -- name: "Fail if mongodb does not exists" - assert: - that: mongodb_version is defined - fail_msg: "Mongodb does not exists!" 
- -- name: "Set fact: mongodb running status" - set_fact: - mongodb_running: true - when: - - _mongodbcommunity_output is defined - - _mongodbcommunity_output.resources[0] is defined - - _mongodbcommunity_output.resources[0].status is defined - - _mongodbcommunity_output.resources[0].status.phase is defined - - _mongodbcommunity_output.resources[0].status.phase == "Running" - -- name: "Fail if mongodb is not running" - assert: - that: mongodb_running is defined and mongodb_running - fail_msg: "Mongodb is not running!" diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/restore-database-patch.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/restore-database-patch.yml deleted file mode 100644 index 19f45bf41d..0000000000 --- a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/restore-database-patch.yml +++ /dev/null @@ -1,76 +0,0 @@ ---- -- name: "Debug: cluster information" - debug: - msg: - - "Source cluster ................... {{ masbr_restore_from_yaml.source.domain }}" - - "Target cluster ................... {{ masbr_cluster_domain }}" - -# Because we will not restore the OauthClient collection, disable below patches for now. -- name: "Update domain in 'mas_{{ mas_instance_id }}_core.OauthClient' collection" - when: masbr_restore_to_diff_domain and false - block: - # Get Suite version - # ----------------------------------------------------------------------------- - - name: "Set fact: mas core namespace name" - set_fact: - mas_core_namespace: "mas-{{ mas_instance_id }}-core" - - - name: "Get Suite" - kubernetes.core.k8s_info: - api_version: core.mas.ibm.com/v1 - kind: Suite - name: "{{ mas_instance_id }}" - namespace: "{{ mas_core_namespace }}" - register: _suite_output - - - name: "Set fact: Suite version" - set_fact: - mas_core_version: "{{ _suite_output.resources[0].status.versions.reconciled }}" - when: - - _suite_output is defined - - (_suite_output.resources | length > 0) - - _suite_output.resources[0].status.versions.reconciled is defined - - - name: "Debug: Suite version" - debug: - msg: "Suite version ..................... 
{{ mas_core_version }}" - - - name: "This fix only for MAS 8.x" - when: mas_core_version is match("^8.") - block: - # Run oidcclientreg job - # --------------------------------------------------------------------- - - name: "Run oidcclientreg job" - changed_when: true - shell: > - oc get job {{ mas_instance_id }}-oidcclientreg -n {{ mas_core_namespace }} -o yaml - | yq 'del(.spec.selector)' | yq 'del(.spec.template.metadata.labels)' - | oc replace --force -f - - register: _run_job_output - - - name: "Debug: run oidcclientreg job" - debug: - msg: "{{ _run_job_output.stdout_lines }}" - - - name: "Wait for oidcclientreg job to be completed (10s delay)" - kubernetes.core.k8s_info: - api_version: batch/v1 - kind: Job - name: "{{ mas_instance_id }}-oidcclientreg" - namespace: "{{ mas_core_namespace }}" - register: _job_result - until: - - _job_result.resources is defined - - _job_result.resources | length > 0 - - _job_result.resources | json_query('[*].status.conditions[?type==`Complete`][].status') | select ('match','True') | list | length == 1 - retries: 30 - delay: 10 - - # Restart entitymgr-ws pod - # --------------------------------------------------------------------- - - name: "Restart entitymgr-ws pod" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/restart_and_reconsiled.yml" - vars: - _pod_namespace: "{{ mas_core_namespace }}" - _pod_keywords: "entitymgr-ws" - _container_name: "manager" diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/restore-database-perform.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/restore-database-perform.yml deleted file mode 100644 index 581bc827e5..0000000000 --- a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/restore-database-perform.yml +++ /dev/null @@ -1,121 +0,0 @@ ---- -# Input parameters: -# _job_type -# _job_name - -# Copy backup file from specified storage location -# ------------------------------------------------------------------------- -- name: "Copy backup file from specified storage location to pod" - when: _mongodb_cf_in_server - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_storage_files_to_pod.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ _job_name }}" - masbr_cf_paths: - - src_file: "{{ masbr_job_data_type }}/{{ _job_name }}.tar.gz" - dest_folder: "{{ mongodb_restore_folder }}" - -- name: "Download and copy backup files to pod" - when: not _mongodb_cf_in_server - block: - - name: "Download backup files from specified storage location to local" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_storage_files_to_local.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ _job_name }}" - masbr_cf_paths: - - src_file: "{{ masbr_job_data_type }}/{{ _job_name }}.tar.gz" - dest_folder: "{{ masbr_local_job_folder }}/{{ masbr_job_data_type }}" - - - name: "Copy backup files from local to pod" - changed_when: true - shell: > - oc cp --retries=50 -c {{ mongodb_container_name }} - {{ masbr_local_job_folder }}/{{ masbr_job_data_type }}/{{ _job_name }}.tar.gz - {{ mongodb_namespace }}/{{ mongodb_pod_name }}:{{ mongodb_restore_folder }} - -# Extract the tar.gz file -# ------------------------------------------------------------------------- -- name: "Extract the tar.gz file" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - mkdir -p {{ mongodb_restore_folder }}/{{ _job_name }} && - tar -xzf {{ mongodb_restore_folder }}/{{ _job_name }}.tar.gz - -C {{ 
mongodb_restore_folder }}/{{ _job_name }} && - ls -lA {{ mongodb_restore_folder }}/{{ _job_name }} - {{ exec_in_pod_end }} - register: _extract_output - -- name: "Debug: list extracted files" - debug: - msg: - - "Extract output folder .............. {{ mongodb_restore_folder }}/{{ _job_name }}" - - "{{ _extract_output.stdout_lines }}" - -# Restore mongodb databases -# ------------------------------------------------------------------------- -- name: "Set fact: mongodb ns instance" - set_fact: - mongodb_ns_instance: "{{ mas_instance_id }}" - -- name: "Set fact: rename mongodb ns" - when: masbr_restore_to_diff_instance - set_fact: - mongodb_ns_rename: >- - --nsFrom="*_{{ masbr_restore_from_yaml.source.instance }}_*.*" --nsTo="*_{{ mas_instance_id }}_*.*" - mongodb_ns_instance: "{{ masbr_restore_from_yaml.source.instance }}" - -- name: "Set fact: mongodb ns filters (Only for restoring from Full backup)" - when: _job_type == "full" - set_fact: - # Exclude these collections because they should be populated by MAS installation - mongodb_ns_filter: >- - --nsExclude="mas_{{ mongodb_ns_instance }}_adoptionusage.*" - --nsExclude="mas_{{ mongodb_ns_instance }}_catalog.*" - --nsExclude="mas_{{ mongodb_ns_instance }}_core.OauthClient" - --nsExclude="mas_{{ mongodb_ns_instance }}_core.OauthToken" - --nsExclude="mas_{{ mongodb_ns_instance }}_core.bindings" - --nsExclude="mas_{{ mongodb_ns_instance }}_core.graphiteconfigtool.*" - --nsExclude="mas_{{ mongodb_ns_instance }}_core.workspaces" - -- name: "Debug: mongodb ns filters (Only for restoring from Full backup)" - when: _job_type == "full" - debug: - msg: "{{ mongodb_ns_filter }}" - -- name: "Restore mongodb databases" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - mongorestore --host={{ mongodb_primary_host }} - --username={{ mongodb_user }} --password={{ mongodb_password }} - --authenticationDatabase=admin --ssl --sslCAFile={{ mongodb_ca_file }} - {{ '--oplogReplay' if _job_type == 'incr' }} - {{ mongodb_ns_filter if _job_type == 'full' }} - {{ mongodb_ns_rename if masbr_restore_to_diff_instance }} - --drop --dir={{ mongodb_restore_folder }}/{{ _job_name }} - |& tee -a {{ mongodb_restore_log }} - {{ exec_in_pod_end }} - register: _mongorestore_output - no_log: true - -- name: "Debug: mongorestore output" - debug: - msg: "{{ _mongorestore_output.stdout_lines }}" - -# Save restored database names -# ------------------------------------------------------------------------- -- name: "Get restored database names" - changed_when: false - when: _job_type == "full" - shell: > - {{ exec_in_pod_begin }} - ls {{ mongodb_restore_folder }}/{{ _job_name }} - {{ exec_in_pod_end }} - register: _ls_output - -- name: "Set fact: restored database names" - when: _job_type == "full" - set_fact: - mongodb_restored_db_names: "{{ _ls_output.stdout_lines }}" diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/restore-database.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/restore-database.yml index cfbdde90d9..d759578006 100644 --- a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/restore-database.yml +++ b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup-restore/restore-database.yml @@ -1,123 +1,133 @@ --- -# Update database restore status: InProgress + +- name: "Set fact: mongodb restore paths and filename" + set_fact: + mongodb_backup_data_filename: "mongodump-{{ mongodb_backup_version }}.tar.gz" + mongodb_tmp_restore_dir: "/tmp/masbr/{{ mongodb_backup_version }}" + 
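+
+# Illustrative example (editor's note, values are hypothetical and not set by this role):
+# if mongodb_backup_version were "20240621-021316" and mas_backup_dir were "/backups",
+# the facts above would resolve to:
+#   mongodb_backup_data_filename: mongodump-20240621-021316.tar.gz
+#   mongodb_tmp_restore_dir:      /tmp/masbr/20240621-021316
+# and the dump archive would be expected at
+#   /backups/backup-20240621-021316-mongoce/data/mongodump-20240621-021316.tar.gz
+# (mongodb_data_path as set by the provider-level restore-database.yml task file).
+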
+- name: "Check if backup data files exist"
+  stat:
+    path: "{{ item }}"
+  register: backup_data_files
+  loop:
+    - "{{ mongodb_data_path }}/{{ mongodb_backup_data_filename }}"
+    - "{{ mongodb_data_path }}/mongodb-info.yaml"
+
+- name: "Assert backup data files exist"
+  assert:
+    that:
+      - backup_data_files.results[0].stat.exists
+      - backup_data_files.results[1].stat.exists
+    fail_msg: "Required backup data files are missing in {{ mongodb_data_path }}. Ensure that the backup process completed successfully."
+
+- name: "Include vars from mongodb-info.yaml"
+  include_vars:
+    file: "{{ mongodb_data_path }}/mongodb-info.yaml"
+    name: mongodb_backup_info
+
+- name: "Set facts from mongodb CR and resources"
+  ibm.mas_devops.get_mongoce_info:
+    mongodb_instance_name: "{{ mongodb_instance_name }}"
+    mongodb_namespace: "{{ mongodb_namespace }}"
+  register: mongodb_info_result
+
+- name: "Assert mongodb_info_result"
+  assert:
+    that:
+      - mongodb_info_result.success
+
+- name: "Set fact: mongodb info"
+  set_fact:
+    mongoce_pod_name: "{{ mongodb_info_result.mongoce_pod_name }}"
+    mongodb_version: "{{ mongodb_info_result.mongodb_version }}"
+    mongodb_service_name: "{{ mongodb_info_result.mongodb_service_name }}"
+    mongodb_host: "{{ mongodb_info_result.mongodb_host }}"
+    mongodb_admin_user: "{{ mongodb_info_result.mongodb_admin_user }}"
+    mongodb_admin_password: "{{ mongodb_info_result.mongodb_admin_password }}"
+
+- name: "Assert that target mongoce version matches backup mongoce version"
+  assert:
+    that:
+      - mongodb_version.split('.')[:2] == mongodb_backup_info.source_mongodb_version.split('.')[:2]
+    fail_msg: "MongoDB major.minor version mismatch. Target MongoCE version {{ mongodb_version }} does NOT match backup MongoCE version {{ mongodb_backup_info.source_mongodb_version }}. Restore cannot proceed."
+
+- name: "Debug facts"
+  debug:
+    msg:
+      - "mongoce_pod_name .................................. {{ mongoce_pod_name }}"
+      - "mongodb_version ................................... {{ mongodb_version }}"
+      - "mongodb_service_name .............................. {{ mongodb_service_name }}"
+      - "mongodb_host ...................................... {{ mongodb_host }}"
+      - "mongodb_backup_version ............................ {{ mongodb_backup_version }}"
+
+# Prepare shell scripts for the restore and transfer them to the mongo pod
 # -----------------------------------------------------------------------------
-- name: "Update database restore status: InProgress"
-  include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml"
-  vars:
-    _job_data_list:
-      - seq: "{{ masbr_job_data_seq }}"
-        phase: "InProgress"
-
-- name: "Restore mongodb databases"
-  block:
-    # Create mongodb role and user for backing up databases
-    # -------------------------------------------------------------------------
-    - name: "Create mongodb role and user for backing up databases"
-      include_tasks: "tasks/providers/{{ mongodb_provider }}/backup-restore/create-role-user.yml"
-
-    # Prepare mongodb database restore folders
-    # -------------------------------------------------------------------------
-    - name: "Set fact: mongodb database restore variables"
-      set_fact:
-        # We should use mongodb pod ephemeral local storage to save the temporary files,
-        # the mongodb data pvc size is not big enough.
- mongodb_restore_folder: "{{ mongodb_pod_temp_folder }}/{{ masbr_job_data_type }}" - - - name: "Set fact: mongodb database restore log" - set_fact: - mongodb_restore_log: "{{ mongodb_restore_folder }}/{{ masbr_job_name }}-restore.log" - - - name: "Create mongodb database restore folder in pod" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - mkdir -p {{ mongodb_restore_folder }}; - chmod a+w {{ mongodb_restore_folder }} - {{ exec_in_pod_end }} - - - name: "Debug: mongodb database restore folder in pod" - debug: - msg: "Database restore folder ........... {{ mongodb_restore_folder }}" - - # This is an incremental backup, need to restore based on full backup first - # ------------------------------------------------------------------------- - - name: "Restore based on full backup" - when: masbr_restore_from_incr - block: - - name: "Copy based on full backup file from specified storage location to pod" - include_tasks: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/restore-database-perform.yml" - vars: - _job_type: "full" - _job_name: "{{ masbr_restore_basedon }}" - - # Restore databases from the specified Full or Incremental backup - # ------------------------------------------------------------------------- - - name: "Restore databases from the specified {{ 'Incremental' if masbr_restore_from_incr else 'Full' }} backup" - include_tasks: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/restore-database-perform.yml" - vars: - _job_type: "{{ 'incr' if masbr_restore_from_incr else 'full' }}" - _job_name: "{{ masbr_restore_from }}" - - # Do some post restoration tasks - # ------------------------------------------------------------------------- - - name: "Do some post restoration tasks " - include_tasks: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/restore-database-patch.yml" - - # Update database restore status: Completed - # ------------------------------------------------------------------------- - - name: "Update database restore status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update database restore status: Failed - # ------------------------------------------------------------------------- - - name: "Update database restore status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" - - always: - # Copy mongodb restore log file to specified storage location - # ------------------------------------------------------------------------- - - name: "Create a tar.gz archive of mongodb restore log" - changed_when: true - shell: > - {{ exec_in_pod_begin }} - tar -czf {{ mongodb_restore_folder }}/{{ masbr_job_name }}-restore-log.tar.gz - -C {{ mongodb_restore_folder }} {{ masbr_job_name }}-restore.log - {{ exec_in_pod_end }} - - - name: "Copy mongodb restore log file from pod to specified storage location" - when: _mongodb_cf_in_server - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_pod_files_to_storage.yml" - vars: - masbr_cf_job_type: "restore" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "{{ mongodb_restore_folder }}/{{ masbr_job_name }}-restore-log.tar.gz" - dest_folder: "log" - - - name: "Download and copy restore log file to specified storage location" - when: not 
_mongodb_cf_in_server - block: - - name: "Download restore log file from pod to local" - changed_when: true - shell: > - oc cp --retries=50 -c {{ mongodb_container_name }} - {{ mongodb_namespace }}/{{ mongodb_pod_name }}:{{ mongodb_restore_folder }}/{{ masbr_job_name }}-restore-log.tar.gz - {{ masbr_local_job_folder }}/log/{{ masbr_job_name }}-restore-log.tar.gz - - - name: "Copy restore log file from local to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "restore" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "log/{{ masbr_job_name }}-restore-log.tar.gz" - dest_folder: "log" +- block: + - name: Create create-role-user.sh script in local /tmp + ansible.builtin.template: + src: community/backup-restore/create-role-user.sh.j2 + dest: /tmp/create-role-user.sh + mode: '777' + + - name: Create database-restore.sh script in local /tmp + ansible.builtin.template: + src: community/backup-restore/database-restore.sh.j2 + dest: /tmp/database-restore.sh + mode: '777' + + - name: Copy the create-role-user.sh script into the mongodb pod + shell: "oc cp --retries=50 /tmp/create-role-user.sh {{ mongodb_namespace }}/{{ mongoce_pod_name }}:/tmp/create-role-user.sh -c mongod" + register: copy_result + retries: 2 + delay: 15 # seconds + until: copy_result.rc == 0 + + - name: Copy the database-restore.sh script into the mongodb pod + shell: "oc cp --retries=50 /tmp/database-restore.sh {{ mongodb_namespace }}/{{ mongoce_pod_name }}:/tmp/database-restore.sh -c mongod" + register: copy_result + retries: 2 + delay: 15 # seconds + until: copy_result.rc == 0 + +# The log file will also be available inside the pod /tmp/create-role-user.log +- name: Exec into mongo pod and run create-role-user.sh to setup restore role and user in Mongodb. 
+ shell: | + oc exec -n {{mongodb_namespace}} {{mongoce_pod_name}} -- bash /tmp/create-role-user.sh + register: create_roleuser_output + +- name: "Debug create-role-user logs" + debug: + msg: "{{ create_roleuser_output.stdout_lines }}" + +- name: "Assert Create role user" + assert: + that: + - create_roleuser_output.rc == 0 + - create_roleuser_output.stdout | regex_search('ROLEUSERstatus-SUCCESS', multiline=True) is not none + fail_msg: "Failed to create role and user for restore" + +# Create temporary restore directory +- name: "Create temporary restore directory in mongodb pod and clean up any existing files" + shell: oc exec -n {{mongodb_namespace}} {{mongoce_pod_name}} -c mongod -- bash -c 'rm -rf {{ mongodb_tmp_restore_dir }}; mkdir -p {{ mongodb_tmp_restore_dir }}' + register: create_tmpdir_output + +- name: "Copy backup data file into mongodb pod" + shell: | + oc cp --retries=50 {{ mongodb_data_path }}/{{ mongodb_backup_data_filename }} {{ mongodb_namespace }}/{{ mongoce_pod_name }}:{{ mongodb_tmp_restore_dir }}/{{ mongodb_backup_data_filename }} -c mongod + register: copy_backupdata_output + +- name: "Running restore script in pod, check logs in /tmp/database-restore.log in pod" + shell: | + oc exec -n {{mongodb_namespace}} {{mongoce_pod_name}} -c mongod -- bash -c '/tmp/database-restore.sh | tee /tmp/database-restore.log' + register: database_restore_output + +- name: Assert database restore + assert: + that: + - database_restore_output.rc == 0 + - database_restore_output.stdout | regex_search('DATABASERESTOREstatus-SUCCESS', multiline=True) is not none + fail_msg: "Failed to restore database from backup" + +- name: "Debug database-restore logs" + debug: + msg: "{{ database_restore_output.stdout_lines }}" diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup.yml index 605d26764e..37ce2dc4b9 100644 --- a/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup.yml +++ b/ibm/mas_devops/roles/mongodb/tasks/providers/community/backup.yml @@ -1,75 +1,42 @@ --- # Check mongodb backup variables # ----------------------------------------------------------------------------- -- name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "mas_instance_id is required" -# Get mongodb information -# ------------------------------------------------------------------------- -- name: "Get mongodb information" - include_tasks: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/get-mongo-info.yml" +- name: "Fail if require variables for Mongodb backup are not provided" + ibm.mas_devops.verify_backup_restore_vars: + mas_instance_id: "{{ mas_instance_id }}" + mas_backup_dir: "{{ mas_backup_dir }}" + mongodb_instance_name: "{{ mongodb_instance_name }}" + action: "backup" + component: "mongodb" -# Set common backup job variables -# ----------------------------------------------------------------------------- -- name: "Set fact: common backup job variables" +- name: "Check if MONGODB_BACKUP_VERSION is provided, if not set to default 'YYYYMMDD-HHMMSS' format" set_fact: - masbr_job_component: - name: "mongodb" - instance: "{{ mas_instance_id }}" - app: "{{ mas_app_id }}" - namespace: "{{ mongodb_namespace }}" - provider: "{{ mongodb_provider }}" - version: "{{ mongodb_version }}" - masbr_job_data_list: - - seq: "1" - type: "database" + mongodb_backup_version: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}" + when: 
mongodb_backup_version is not defined or mongodb_backup_version == "" or mongodb_backup_version == "None"

-# Before run tasks
-# -------------------------------------------------------------------------
-- name: "Before run tasks"
-  include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml"
-  vars:
-    _ignore_masbr_backup_data: true
-    _job_type: "backup"
-    _component_before_task_path: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/before-backup-restore.yml"
+- name: "Set fact: Create Mongodb backup base directory path"
+  set_fact:
+    mongodb_backup_path: "{{ mas_backup_dir }}/backup-{{ mongodb_backup_version }}-mongoce"

-- name: "Perform backup"
-  block:
-    # Update backup job status: New
-    # -------------------------------------------------------------------------
-    - name: "Update backup job status: New"
-      include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml"
-      vars:
-        _job_data_list:
-          - seq: "1"
-            phase: "New"
+- name: "Create {{ mongodb_backup_path }} directory for Mongodb backup"
+  file:
+    path: "{{ mongodb_backup_path }}"
+    state: directory
+    mode: "0755"

-    # Run backup tasks for each data type
-    # -------------------------------------------------------------------------
-    - name: "Run backup tasks for each data type"
-      include_tasks: "tasks/providers/{{ mongodb_provider }}/backup-restore/backup-{{ job_data_item.type }}.yml"
-      vars:
-        masbr_job_data_seq: "{{ job_data_item.seq }}"
-        masbr_job_data_type: "{{ job_data_item.type }}"
-      loop: "{{ masbr_job_data_list }}"
-      loop_control:
-        loop_var: job_data_item
+# Backup Mongodb Cluster Instance Kubernetes Resources
+# -------------------------------------------------------------------------
+- name: "Start MongoCE Instance backup process."
+  include_tasks: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/backup-instance.yml"

-  rescue:
-    # Update backup status: Failed
-    # -------------------------------------------------------------------------
-    - name: "Update database backup status: Failed"
-      include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml"
-      vars:
-        _job_status:
-          phase: "Failed"
+# Backup Mongodb database data using mongodump
+# -------------------------------------------------------------------------
+- name: "Debug information - Mongodb database data backup"
+  debug:
+    msg:
+      - "MongoCE namespace .......................... {{ mongodb_namespace }}"
+      - "MongoCE instance name ...................... {{ mongodb_instance_name }}"

-  always:
-    # After run tasks
-    # -------------------------------------------------------------------------
-    - name: "After run tasks"
-      include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/after_run_tasks.yml"
-      vars:
-        _component_after_task_path: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/after-backup-restore.yml"
+- name: "Start Database backup process."
+  include_tasks: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/backup-database.yml"
diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/restore-database.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/restore-database.yml
new file mode 100644
index 0000000000..19eba5e2e0
--- /dev/null
+++ b/ibm/mas_devops/roles/mongodb/tasks/providers/community/restore-database.yml
@@ -0,0 +1,28 @@
+---
+# Check mongodb restore required variables
+# -----------------------------------------------------------------------------
+- name: "Fail if required variables for Mongodb database restore are not provided"
+  ibm.mas_devops.verify_backup_restore_vars:
+    mas_instance_id: "{{ mas_instance_id }}"
+    mas_backup_dir: "{{ mas_backup_dir }}"
+    mongodb_backup_version: "{{ mongodb_backup_version }}"
+    mongodb_instance_name: "{{ mongodb_instance_name }}"
+    action: "restore"
+    component: "mongodb"
+
+- name: "Set fact: backup dir paths"
+  set_fact:
+    mongodb_backup_dir: "{{ mas_backup_dir }}/backup-{{ mongodb_backup_version }}-mongoce"
+    mongodb_data_path: "{{ mas_backup_dir }}/backup-{{ mongodb_backup_version }}-mongoce/data"
+
+# Check for existing MongoDB installation
+# -----------------------------------------------------------------------------
+- name: "Check for existing MongoDB installation in namespace {{ mongodb_namespace }}"
+  ibm.mas_devops.verify_mongoce_version:
+    mongodb_instance_name: "{{ mongodb_instance_name }}"
+    mongodb_namespace: "{{ mongodb_namespace }}"
+  register: existing_mongo_info
+
+- name: "Start Database restore process."
+  include_tasks: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/restore-database.yml"
+  when: existing_mongo_info is defined and existing_mongo_info.running
diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/restore-instance.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/restore-instance.yml
new file mode 100644
index 0000000000..fae1ffea5c
--- /dev/null
+++ b/ibm/mas_devops/roles/mongodb/tasks/providers/community/restore-instance.yml
@@ -0,0 +1,396 @@
+---
+# Check mongodb restore required variables
+# -----------------------------------------------------------------------------
+- name: "Fail if required variables for Mongodb database restore are not provided"
+  ibm.mas_devops.verify_backup_restore_vars:
+    mas_instance_id: "{{ mas_instance_id }}"
+    mas_backup_dir: "{{ mas_backup_dir }}"
+    mongodb_backup_version: "{{ mongodb_backup_version }}"
+    action: "restore"
+    component: "mongodb"
+
+# Set backup path facts
+- name: "Set fact: backup dir paths"
+  set_fact:
+    mongodb_backup_path: "{{ mas_backup_dir }}/backup-{{ mongodb_backup_version }}-mongoce"
+    mongodb_resources_path: "{{ mas_backup_dir }}/backup-{{ mongodb_backup_version }}-mongoce/resources"
+
+- name: "Check if mongodb resources path exists"
+  stat:
+    path: "{{ mongodb_resources_path }}"
+  register: resources_backup_path_stat
+
+- name: "Fail if backup resources directory does not exist"
+  fail:
+    msg: "Mongodb resources directory not found at: {{ mongodb_resources_path }}"
+  when: not resources_backup_path_stat.stat.exists or not resources_backup_path_stat.stat.isdir
+
+# Verify cert-manager exists
+# -----------------------------------------------------------------------------
+- name: Detect Certificate Manager installation
+  include_tasks: "{{ role_path }}/../../common_tasks/detect_cert_manager.yml"
+
+# Verify only one MongoCE instance file is present in the backup directory
+# 
----------------------------------------------------------------------------- +- name: Get files from {{ mongodb_resources_path }}/mongodbcommunitys directory + set_fact: + instance_files: "{{ lookup('fileglob', '{{ mongodb_resources_path }}/mongodbcommunitys/*', wantlist=True) }}" + +- name: Assert exactly one MongoDBCommunity CR exists + assert: + that: + - instance_files | length == 1 + fail_msg: "MongoDBCommunity Directory must contain exactly one file" + +- name: Set fact mongodb cr + set_fact: + mongodbcr_cfg: "{{ lookup('file', '{{ instance_files[0] }}') | from_yaml }}" + +# Get Mongodb details from backup CR +# ----------------------------------------------------------------------------- +- name: Set fact mongo namespace and instance name from backup CR + set_fact: + mongodb_namespace: "{{ mongodbcr_cfg.metadata.namespace }}" + mongodb_instance_name: "{{ mongodbcr_cfg.metadata.name }}" + source_mongodb_replicas_check: "{{ mongodbcr_cfg.spec.members }}" + source_mongodb_version: "{{ mongodbcr_cfg.spec.version }}" + +- name: "Mongo restore information" + debug: + msg: + - "MAS Instance ID ................ {{ mas_instance_id }}" + - "MongoDB Namespace .............. {{ mongodb_namespace }}" + - "MongoDB Instance Name .......... {{ mongodb_instance_name }}" + - "Backup Version ................. {{ mongodb_backup_version }}" + - "Backup Path .................... {{ mongodb_backup_path }}" + +# 1. Restore Namespace +# ----------------------------------------------------------------------------- +- name: Restore namespace + ibm.mas_devops.restore_resource: + backup_path: "{{ mongodb_backup_path }}" + resource_kinds: + - Project + replace_resource: false + register: namespace_result + +# 2. Restore Secrets and ConfigMaps +# ----------------------------------------------------------------------------- +- name: Restore Secrets and ConfigMaps + ibm.mas_devops.restore_resource: + backup_path: "{{ mongodb_backup_path }}" + resource_kinds: + - Secret + - ConfigMap + register: secrets_configmaps_result + when: namespace_result.success + +# 3. Restore CRD +# ----------------------------------------------------------------------------- +- name: Restore CRD + ibm.mas_devops.restore_resource: + backup_path: "{{ mongodb_backup_path }}" + resource_kinds: + - CustomResourceDefinition + register: crd_result + when: + - namespace_result.success + - secrets_configmaps_result is defined and secrets_configmaps_result.success + +# 4. Restore RBAC +# ----------------------------------------------------------------------------- +- name: Restore ServiceAccount, Role, RoleBinding + ibm.mas_devops.restore_resource: + backup_path: "{{ mongodb_backup_path }}" + resource_kinds: + - ServiceAccount + - Role + - RoleBinding + register: rbac_result + when: + - namespace_result.success + - crd_result is defined and crd_result.success + +# 5. Configure anyuid permissions in the MongoDb namespace +# ----------------------------------------------------------------------------- +- name: "Configure anyuid permissions in the MongoDb namespace(s)" + shell: | + oc adm policy add-scc-to-user anyuid system:serviceaccount:{{ mongodb_namespace }}:default + oc adm policy add-scc-to-user anyuid system:serviceaccount:{{ mongodb_namespace }}:mongodb-kubernetes-operator + oc adm policy add-scc-to-user anyuid system:serviceaccount:{{ mongodb_namespace }}:mongodb-database + when: + - mongodb_namespace is defined and namespace_result.success + +# 6. 
Restore the MongoDb Operator deployment +# ----------------------------------------------------------------------------- +- name: Restore the MongoDb Operator + ibm.mas_devops.restore_resource: + backup_path: "{{ mongodb_backup_path }}" + resource_kinds: + - Deployment + register: operator_result + when: + - namespace_result.success + - rbac_result is defined and rbac_result.success + +- name: "Wait for Mongodb operator to be ready (60s delay)" + kubernetes.core.k8s_info: + api_version: apps/v1 + kind: Deployment + name: mongodb-kubernetes-operator + namespace: "{{ mongodb_namespace }}" + register: mongodb_operator_status + retries: 60 + delay: 60 + until: + - mongodb_operator_status.resources is defined + - mongodb_operator_status.resources | length > 0 + - mongodb_operator_status.resources[0].status is defined + - mongodb_operator_status.resources[0].status.conditions is defined + - mongodb_operator_status.resources[0].status.conditions | selectattr('type', 'equalto', 'Available') | selectattr('status', 'equalto', 'True') | list | length > 0 + when: + - namespace_result.success + - operator_result is defined and operator_result.success + + +# 7. Restore Certficate Manager resources +# ----------------------------------------------------------------------------- +- name: "Restore Certificate Manager resources" + ibm.mas_devops.restore_resource: + backup_path: "{{ mongodb_backup_path }}" + resource_kinds: + - Issuer + - Certificate + register: certmanager_result + when: + - namespace_result.success + - operator_result is defined and operator_result.success + +# Load default storage classes (if not provided by the user and not an update) +# ----------------------------------------------------------------------------- +- name: Use chosen (or default) storage class when override_storageclass is true + include_tasks: tasks/providers/community/determine-storage-classes.yml + when: + - namespace_result.success + - certmanager_result is defined and certmanager_result.success + - override_storageclass | bool + +- name: Replace mongo storageclass names when override_storageclass is true + when: + - override_storageclass | bool + - mongodb_storage_class is defined + set_fact: + _mongodb_storage: "{{ mongodbcr_cfg.spec.statefulSet.spec.volumeClaimTemplates | ibm.mas_devops.set_storage_classes_names(mongodb_storage_class, mongodb_storage_class) }}" + +# 8. 
Restore MongodbCE CR resource +# ----------------------------------------------------------------------------- +- name: "Restore MongodbCE CR resource" + ibm.mas_devops.restore_resource: + backup_path: "{{ mongodb_backup_path }}" + resource_kinds: + - MongoDBCommunity + filter_values: + MongoDBCommunity: + - spec.replicaSetHorizons + override_values: + MongoDBCommunity: + - spec.statefulSet.spec.volumeClaimTemplates: "{{ _mongodb_storage }}" + register: cr_result + when: + - namespace_result.success + - certmanager_result is defined and certmanager_result.success + +- name: "Wait for {{ mongodb_instance_name }} stateful set to be ready" + kubernetes.core.k8s_info: + api_version: apps/v1 + kind: StatefulSet + name: "{{ mongodb_instance_name }}" + namespace: "{{ mongodb_namespace }}" + vars: + mongodb_replicas_check: "{{ source_mongodb_replicas_check }}" + register: mongodb_statefulset + retries: 45 # Approx 90 minutes + delay: 120 # 2 minutes + until: + - mongodb_statefulset.resources is defined + - mongodb_statefulset.resources | length > 0 + - mongodb_statefulset.resources[0].status.readyReplicas is defined + - mongodb_statefulset.resources[0].status.readyReplicas == (mongodb_replicas_check|int) + when: + - namespace_result.success + - cr_result is defined and cr_result.success + +- name: "Wait for {{ mongodb_instance_name }}-arb stateful set to be ready" + kubernetes.core.k8s_info: + api_version: apps/v1 + kind: StatefulSet + name: "{{ mongodb_instance_name }}-arb" + namespace: "{{ mongodb_namespace }}" + register: mongodb_arb_statefulset + retries: 45 # Approx 90 minutes + delay: 120 # 2 minutes + until: + - mongodb_arb_statefulset.resources is defined + - mongodb_arb_statefulset.resources | length > 0 + - mongodb_arb_statefulset.resources[0].status.availableReplicas is defined + - mongodb_arb_statefulset.resources[0].status.availableReplicas == 0 + when: + - namespace_result.success + - cr_result is defined and cr_result.success + - source_mongodb_version is version('4.4.0','>=') # this statefulset will only exist in Mongo v4.4+ + +- name: "Wait for Mongo CR to report expected version {{ source_mongodb_version }}" + kubernetes.core.k8s_info: + api_version: mongodbcommunity.mongodb.com/v1 + kind: MongoDBCommunity + name: "{{ mongodb_instance_name }}" + namespace: "{{ mongodb_namespace }}" + register: mongodb_cr + retries: 45 # Approx 45 minutes + delay: 60 # 1 minute + until: + - mongodb_cr.resources[0].status.version is defined + - mongodb_cr.resources[0].status.version == source_mongodb_version + when: + - namespace_result.success + - cr_result is defined and cr_result.success + +# Restore MongoDb service monitor +# ----------------------------------------------------------------------------- +- name: "Restore Service Monitor resources" + ibm.mas_devops.restore_resource: + backup_path: "{{ mongodb_backup_path }}" + resource_kinds: + - ServiceMonitor + register: servicemonitor_result + when: + - namespace_result.success + - cr_result is defined and cr_result.success + +# Restore Grafana Dashboard if cluster has grafana v5 capability +# ----------------------------------------------------------------------------- +- name: Get cluster info + kubernetes.core.k8s_cluster_info: + register: api_status + no_log: true + when: + - namespace_result.success + - servicemonitor_result is defined and servicemonitor_result.success + +- name: Determine cluster grafana capabilities + set_fact: + supports_grafanav5: "{{ + api_status is defined and + api_status.apis is defined and + 
api_status.apis['grafana.integreatly.org/v1beta1'] is defined }}" + when: + - namespace_result.success + - servicemonitor_result is defined and servicemonitor_result.success + +- name: "Restore Grafana Dashboards" + ibm.mas_devops.restore_resource: + backup_path: "{{ mongodb_backup_path }}" + resource_kinds: + - GrafanaDashboard + register: grafdash_result + when: + - namespace_result.success + - servicemonitor_result is defined and servicemonitor_result.success + - supports_grafanav5 + +# Calculate total results +# ----------------------------------------------------------------------------- +- name: "Calculate total restore results" + set_fact: + total_created: >- + {{ + (namespace_result.created_count | default(0)) + + (secrets_configmaps_result.created_count | default(0)) + + (crd_result.created_count | default(0)) + + (operator_result.created_count | default(0)) + + (rbac_result.created_count | default(0)) + + (certmanager_result.created_count | default(0)) + + (cr_result.created_count | default(0)) + + (servicemonitor_result.created_count | default(0)) + + (grafdash_result.created_count | default(0)) + }} + total_updated: >- + {{ + (namespace_result.updated_count | default(0)) + + (secrets_configmaps_result.updated_count | default(0)) + + (crd_result.updated_count | default(0)) + + (operator_result.updated_count | default(0)) + + (rbac_result.updated_count | default(0)) + + (certmanager_result.updated_count | default(0)) + + (cr_result.updated_count | default(0)) + + (servicemonitor_result.updated_count | default(0)) + + (grafdash_result.updated_count | default(0)) + }} + total_skipped: >- + {{ + (namespace_result.skipped_count | default(0)) + + (secrets_configmaps_result.skipped_count | default(0)) + + (crd_result.skipped_count | default(0)) + + (operator_result.skipped_count | default(0)) + + (rbac_result.skipped_count | default(0)) + + (certmanager_result.skipped_count | default(0)) + + (cr_result.skipped_count | default(0)) + + (servicemonitor_result.skipped_count | default(0)) + + (grafdash_result.skipped_count | default(0)) + }} + total_failed: >- + {{ + (namespace_result.failed_count | default(0)) + + (secrets_configmaps_result.failed_count | default(0)) + + (crd_result.failed_count | default(0)) + + (operator_result.failed_count | default(0)) + + (rbac_result.failed_count | default(0)) + + (certmanager_result.failed_count | default(0)) + + (cr_result.failed_count | default(0)) + + (servicemonitor_result.failed_count | default(0)) + + (grafdash_result.failed_count | default(0)) + }} + +- name: "Display total restore results" + debug: + msg: + - >- + Restore completed{{ ' with failures' if total_failed | int > 0 + else ' successfully' }} + - "Total resources created: {{ total_created }}" + - "Total resources updated: {{ total_updated }}" + - "Total resources skipped: {{ total_skipped }}" + - "Total resources failed: {{ total_failed }}" + +# Fail task if any errors occurred +# ----------------------------------------------------------------------------- +- name: "Collect all failed resources" + set_fact: + all_failed_resources: >- + {{ + (namespace_result.failed_resources | default([])) + + (secrets_configmaps_result.failed_resources | default([])) + + (crd_result.failed_resources | default([])) + + (operator_result.failed_resources | default([])) + + (rbac_result.failed_resources | default([])) + + (certmanager_result.failed_resources | default([])) + + (cr_result.failed_resources | default([])) + + (servicemonitor_result.failed_resources | default([])) + + 
(grafdash_result.failed_resources | default([])) + }} + +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ all_failed_resources | to_nice_yaml }}" + when: total_failed | int > 0 + +- name: "Fail if restore had errors" + fail: + msg: | + Restore failed for {{ total_failed }} resource(s): + {% for resource in all_failed_resources %} + - {{ resource.description }}: {{ resource.error }} + {% endfor %} + when: total_failed | int > 0 diff --git a/ibm/mas_devops/roles/mongodb/tasks/providers/community/restore.yml b/ibm/mas_devops/roles/mongodb/tasks/providers/community/restore.yml index 5cb5b4be8f..7f83ac8df1 100644 --- a/ibm/mas_devops/roles/mongodb/tasks/providers/community/restore.yml +++ b/ibm/mas_devops/roles/mongodb/tasks/providers/community/restore.yml @@ -1,77 +1,8 @@ --- -# Check mongodb restore required variables +# Task to restore instance # ----------------------------------------------------------------------------- -- name: "Set fact: " - set_fact: - mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" +- include_tasks: "tasks/providers/{{ mongodb_provider }}/restore-instance.yml" -- name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "mas_instance_id is required" - -# Get mongodb information -# ------------------------------------------------------------------------- -- name: "Get mongodb information" - include_tasks: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/get-mongo-info.yml" - -# Set common restore job variables +# Task to restore database # ----------------------------------------------------------------------------- -- name: "Set fact: common restore job variables" - set_fact: - masbr_job_component: - name: "mongodb" - instance: "{{ mas_instance_id }}" - namespace: "{{ mongodb_namespace }}" - provider: "{{ mongodb_provider }}" - version: "{{ mongodb_version }}" - masbr_job_data_list: - - seq: "1" - type: "database" - -# Before run tasks -# ------------------------------------------------------------------------- -- name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "restore" - _component_before_task_path: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/before-backup-restore.yml" - -- name: "Perform restore" - block: - # Update restore job status: New - # ------------------------------------------------------------------------- - - name: "Update restore job status: New" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "1" - phase: "New" - - # Run restore tasks for each data type - # ------------------------------------------------------------------------- - - name: "Run restore tasks for each data type" - include_tasks: "tasks/providers/{{ mongodb_provider }}/backup-restore/restore-{{ job_data_item.type }}.yml" - vars: - masbr_job_data_seq: "{{ job_data_item.seq }}" - masbr_job_data_type: "{{ job_data_item.type }}" - loop: "{{ masbr_job_data_list }}" - loop_control: - loop_var: job_data_item - - rescue: - # Update restore status: Failed - # ------------------------------------------------------------------------- - - name: "Update database restore status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_status: - phase: "Failed" - - always: - # After run tasks - # 
------------------------------------------------------------------------- - - name: "After run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/after_run_tasks.yml" - vars: - _component_after_task_path: "{{ role_path }}/tasks/providers/{{ mongodb_provider }}/backup-restore/after-backup-restore.yml" +- include_tasks: "tasks/providers/{{ mongodb_provider }}/restore-database.yml" diff --git a/ibm/mas_devops/roles/mongodb/templates/community/backup-restore/create-role-user.sh.j2 b/ibm/mas_devops/roles/mongodb/templates/community/backup-restore/create-role-user.sh.j2 new file mode 100644 index 0000000000..85f93d3882 --- /dev/null +++ b/ibm/mas_devops/roles/mongodb/templates/community/backup-restore/create-role-user.sh.j2 @@ -0,0 +1,69 @@ +#!/bin/bash + + +MONGODB_BACKUP_ROLE="bradmin" +MONGODB_BACKUP_USER="bradmin" + +# Check if mongo is primary +tlsCAFile=$(ls /var/lib/tls/ca/*.pem) +echo "Using TLS CA file: $tlsCAFile" +primary_host="" + +is_primary=$(mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --host {{ mongodb_host }} --tlsCAFile $tlsCAFile --authenticationDatabase=admin --tls --eval 'db.isMaster().ismaster') +cmdrc=$? +if [ $cmdrc -ne 0 ]; then + echo "Error checking if node is primary. Exiting." + exit 1 +fi + +if [ "$is_primary" != "true" ]; then + echo "This node is not primary. Get Primary from server status." + primary_host=$(mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --tlsCAFile $tlsCAFile --host {{ mongodb_host }} --authenticationDatabase=admin --tls --eval 'rs.isMaster().primary') + cmdrc=$? + if [ $cmdrc -ne 0 ]; then + echo "Error getting primary node. Exiting." + exit 1 + fi + echo "Primary host is: $primary_host" +fi + +if [ -n "$primary_host" ]; then + echo "Connecting to primary host: $primary_host to create user." + mongo_cmd="mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --tlsCAFile $tlsCAFile --authenticationDatabase=admin --tls --host $primary_host" +else + echo "Creating user on current primary node." + mongo_cmd="mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --tlsCAFile $tlsCAFile --authenticationDatabase=admin --tls --host {{ mongodb_host }}" +fi + +# Check if role already exists +role_exists=$($mongo_cmd --eval "db.getSiblingDB('admin').getRole('$MONGODB_BACKUP_ROLE');") +if [ -n "$role_exists" -a "$role_exists" != "null" ]; then + echo "Role $MONGODB_BACKUP_ROLE already exists. Skipping role creation." +else + echo "Creating role $MONGODB_BACKUP_ROLE." + # Create role user + $mongo_cmd --eval "db.getSiblingDB('admin').createRole({role: '$MONGODB_BACKUP_ROLE', privileges: [{ resource: { anyResource: true }, actions: [ 'anyAction' ] }],roles: []});" + cmdrc=$? + if [ $cmdrc -ne 0 ]; then + echo "Error creating backup role. Exiting." + exit 1 + fi +fi + +# Create backup user with the created role +user_exists=$($mongo_cmd --eval "db.getSiblingDB('admin').getUser('$MONGODB_BACKUP_USER');") +if [ -n "$user_exists" -a "$user_exists" != "null" ]; then + echo "User $MONGODB_BACKUP_USER already exists. Skipping user creation." +else + echo "Creating user $MONGODB_BACKUP_USER with role $MONGODB_BACKUP_ROLE." + $mongo_cmd --eval "db.getSiblingDB('admin').createUser({ user: '$MONGODB_BACKUP_USER', pwd: '{{ mongodb_admin_password }}', roles: [ { role: '$MONGODB_BACKUP_ROLE', db: 'admin' } ]});" + cmdrc=$? + if [ $cmdrc -ne 0 ]; then + echo "Error creating backup admin user. Exiting." 
+ exit 1 + fi +fi + +echo "Role and user creation process completed." +echo "ROLEUSERstatus-SUCCESS" + diff --git a/ibm/mas_devops/roles/mongodb/templates/community/backup-restore/database-backup.sh.j2 b/ibm/mas_devops/roles/mongodb/templates/community/backup-restore/database-backup.sh.j2 new file mode 100644 index 0000000000..ca479bf4a7 --- /dev/null +++ b/ibm/mas_devops/roles/mongodb/templates/community/backup-restore/database-backup.sh.j2 @@ -0,0 +1,94 @@ +#!/bin/bash + +MONGODB_BACKUP_USER="bradmin" + +TMP_BACKUP_DIR="/tmp/masbr/{{ mongodb_backup_version }}" +TMP_BACKUP_MONGODUMP_DIR="${TMP_BACKUP_DIR}/mongodump" + +ALL_DATABASES_FILTER="^(mas|iot|sls|ibm-sls)(_|-)({{ mas_instance_id }}|sls)(_|-)(?!.*monitor$)" + +tlsCAFile=$(ls /var/lib/tls/ca/*.pem) +primary_host="" + +is_primary=$(mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --host {{ mongodb_host }} --tlsCAFile $tlsCAFile --authenticationDatabase=admin --tls --eval 'db.isMaster().ismaster') +cmdrc=$? +if [ $cmdrc -ne 0 ]; then + echo "Error checking if node is primary. Exiting." + exit 1 +fi + +if [ "$is_primary" != "true" ]; then + echo "This node is not primary. Get Primary from server status." + primary_host=$(mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --tlsCAFile $tlsCAFile --host {{ mongodb_host }} --authenticationDatabase=admin --tls --eval 'rs.isMaster().primary') + cmdrc=$? + if [ $cmdrc -ne 0 ]; then + echo "Error getting primary node. Exiting." + exit 1 + fi + echo "Primary host is: $primary_host" +fi + +if [ -n "$primary_host" ]; then + echo "Connecting to primary host: $primary_host" + mongo_cmd="mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --tlsCAFile $tlsCAFile --authenticationDatabase=admin --tls --host $primary_host" + mongodump_cmd="mongodump -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --sslCAFile $tlsCAFile --authenticationDatabase=admin --ssl --host $primary_host" +else + echo "Continuing on current primary node." + mongo_cmd="mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --tlsCAFile $tlsCAFile --authenticationDatabase=admin --tls --host {{ mongodb_host }}" + mongodump_cmd="mongodump -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --sslCAFile $tlsCAFile --authenticationDatabase=admin --ssl --host {{ mongodb_host }}" +fi + +# cleanup base backup dir +rm -rf $TMP_BACKUP_DIR + +# Create temporary directory for backup +mkdir -p $TMP_BACKUP_MONGODUMP_DIR +if [ $? -ne 0 ]; then + echo "Error creating temporary backup directory $TMP_BACKUP_MONGODUMP_DIR. Exiting." + exit 1 +fi + +# Get database names matching the filter +databases=$($mongo_cmd --eval "db.adminCommand('listDatabases').databases.map(db => db.name).filter(name => name.match(/$ALL_DATABASES_FILTER/)).join(',')") +cmdrc=$? +if [ $cmdrc -ne 0 ]; then + echo "Error retrieving database names. Exiting." + exit 1 +fi +echo "Databases to back up: $databases" +IFS=',' read -r -a db_array <<< "$databases" + +# Perform backup for each database + +# Excluding sessions collection for Monitor database since it is not cleaned up automatically +# leading to very large backup sizes and long backup times. This is acceptable as +# sessions are transient and can be recreated. 
+# Refer DT425304 - MASCORE-5808 + +for db_name in "${db_array[@]}"; do + echo "Backing up database: $db_name" + # ignore sessions collection for monitor database + if [[ "$db_name" == *_monitor ]]; then + $mongodump_cmd --db=$db_name --excludeCollection=sessions --out=$TMP_BACKUP_MONGODUMP_DIR + else + $mongodump_cmd --db=$db_name --out=$TMP_BACKUP_MONGODUMP_DIR + fi + cmdrc=$? + if [ $cmdrc -ne 0 ]; then + echo "Error backing up database $db_name. Exiting." + exit 1 + fi +done +echo "Backup completed successfully. Backup files are located in $TMP_BACKUP_MONGODUMP_DIR." + +# Create tar.gz archives of database backup files +tar -czf $TMP_BACKUP_DIR/{{ mongodb_backup_data_filename }} -C $TMP_BACKUP_MONGODUMP_DIR . +if [ $? -ne 0 ]; then + echo "Error creating tar.gz archive of backup files. Exiting." + exit 1 +fi + +echo "Created archive: $TMP_BACKUP_DIR/{{ mongodb_backup_data_filename }}" + +echo "DATABASEBACKUPstatus-SUCCESS" + diff --git a/ibm/mas_devops/roles/mongodb/templates/community/backup-restore/database-restore.sh.j2 b/ibm/mas_devops/roles/mongodb/templates/community/backup-restore/database-restore.sh.j2 new file mode 100644 index 0000000000..1ee950d44d --- /dev/null +++ b/ibm/mas_devops/roles/mongodb/templates/community/backup-restore/database-restore.sh.j2 @@ -0,0 +1,79 @@ +#!/bin/bash + +MONGODB_BACKUP_USER="bradmin" + +BACKUP_DATA_FILENAME="{{ mongodb_backup_data_filename }}" +TMP_RESTORE_DIR="{{ mongodb_tmp_restore_dir }}" + + +# Extract backup data tar.gz file +echo "Extracting backup data file: $BACKUP_DATA_FILENAME" +tar -xzf $TMP_RESTORE_DIR/$BACKUP_DATA_FILENAME -C $TMP_RESTORE_DIR +cmdrc=$? +if [ $cmdrc -ne 0 ]; then + echo "Error extracting backup data file. Exiting." + exit 1 +fi + +# Remove the tar.gz file after extraction +rm -f $TMP_RESTORE_DIR/$BACKUP_DATA_FILENAME + +# Determine if current node is primary +tlsCAFile=$(ls /var/lib/tls/ca/*.pem) +echo "Using TLS CA file: $tlsCAFile" +primary_host="" + +is_primary=$(mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --host {{ mongodb_host }} --tlsCAFile $tlsCAFile --authenticationDatabase=admin --tls --eval 'db.isMaster().ismaster') +cmdrc=$? +if [ $cmdrc -ne 0 ]; then + echo "Error checking if node is primary. Exiting." + exit 1 +fi + +if [ "$is_primary" != "true" ]; then + echo "This node is not primary. Get Primary from server status." + primary_host=$(mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --tlsCAFile $tlsCAFile --host {{ mongodb_host }} --authenticationDatabase=admin --tls --eval 'rs.isMaster().primary') + cmdrc=$? + if [ $cmdrc -ne 0 ]; then + echo "Error getting primary node. Exiting." 
+ exit 1 + fi + echo "Primary host is: $primary_host" +fi + +# Exclude these collections because they should be populated by MAS installation +mongodb_ns_exclude_args=" --nsExclude=mas_{{ mas_instance_id }}_adoptionusage.* \ + --nsExclude=mas_{{ mas_instance_id }}_catalog.* \ + --nsExclude=mas_{{ mas_instance_id }}_core.OauthClient \ + --nsExclude=mas_{{ mas_instance_id }}_core.OauthToken \ + --nsExclude=mas_{{ mas_instance_id }}_core.bindings \ + --nsExclude=mas_{{ mas_instance_id }}_core.graphiteconfigtool.* \ + --nsExclude=mas_{{ mas_instance_id }}_core.workspaces" + +if [ -n "$primary_host" ]; then + echo "Connecting to primary host: $primary_host" + mongo_cmd="mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --tlsCAFile $tlsCAFile --authenticationDatabase=admin --tls --host $primary_host" + mongorestore_cmd="mongorestore -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --sslCAFile $tlsCAFile --authenticationDatabase=admin --ssl --drop --dir=$TMP_RESTORE_DIR $mongodb_ns_exclude_args --host $primary_host" +else + echo "Continuing on current primary node." + mongo_cmd="mongosh --quiet -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --tlsCAFile $tlsCAFile --authenticationDatabase=admin --tls --host {{ mongodb_host }}" + mongorestore_cmd="mongorestore -u {{ mongodb_admin_user }} -p {{ mongodb_admin_password }} --sslCAFile $tlsCAFile --authenticationDatabase=admin --ssl --drop --dir=$TMP_RESTORE_DIR $mongodb_ns_exclude_args --host {{ mongodb_host }}" +fi + +# Perform restore +echo "Restoring databases from $TMP_RESTORE_DIR" +$mongorestore_cmd +restore_rc=$? + +# Clean up temporary dir +echo "Cleaning up temporary dir" +rm -rf $TMP_RESTORE_DIR + + +if [ $restore_rc -ne 0 ]; then + echo "Error restoring databases. Exiting." + exit 1 +fi + +echo "DATABASERESTOREstatus-SUCCESS" + diff --git a/ibm/mas_devops/roles/sls/README.md b/ibm/mas_devops/roles/sls/README.md index bb9a1fd077..a4fa4195ad 100644 --- a/ibm/mas_devops/roles/sls/README.md +++ b/ibm/mas_devops/roles/sls/README.md @@ -15,21 +15,28 @@ Specifies which operation to perform on the Suite License Service (SLS) instance - Environment Variable: `SLS_ACTION` - Default: `install` -**Purpose**: Controls what action the role executes against the SLS instance. This allows the same role to handle installation, configuration generation, and removal of SLS. +**Purpose**: Controls what action the role executes against the SLS instance. This allows the same role to handle installation, configuration generation, backup, restore, and removal of SLS. **When to use**: - Use `install` for initial SLS deployment or updates - Use `gencfg` to generate SLS configuration for MAS without installing SLS (when using existing SLS) +- Use `backup` to create a backup of SLS instance and its configuration +- Use `restore` to restore SLS instance from a backup - Use `uninstall` to remove SLS instance (use with caution) -**Valid values**: `install`, `gencfg`, `uninstall` +**Valid values**: `install`, `gencfg`, `backup`, `restore`, `uninstall` -**Impact**: +**Impact**: - `install`: Deploys or updates SLS operator and instance - `gencfg`: Only generates SLSCfg resource for MAS integration +- `backup`: Creates a backup of all SLS resources, secrets, and registration data +- `restore`: Restores SLS from a previously created backup - `uninstall`: Removes SLS instance and operator (destructive operation) -**Related variables**: When using `gencfg`, requires `sls_url` to be set to point to existing SLS instance. 
+**Related variables**: +- When using `gencfg`, requires `sls_url` to be set to point to existing SLS instance +- When using `backup`, requires `mas_backup_dir` and optionally `sls_backup_version` +- When using `restore`, requires `mas_backup_dir` and `sls_backup_version` **Note**: Always backup license data before using `uninstall`. The `gencfg` action is useful when SLS is shared across multiple MAS instances. @@ -686,6 +693,240 @@ For examples refer to the [BestEfforts reference configuration in the MAS CLI](h - Environment Variable: `MAS_POD_TEMPLATES_DIR` - Default: None +Role Variables - Backup and Restore Variables +------------------------------------------------------------------------------- +#### mas_backup_dir +Directory path where SLS backup files will be stored. + +- **Required** for backup and restore operations +- Environment Variable: `MAS_BACKUP_DIR` +- Default: None + +**Purpose**: Specifies the local filesystem directory where backup archives will be created (for backup) or read from (for restore). This directory serves as the central location for all SLS backup data. + +**When to use**: +- Required when `sls_action` is set to `backup` or `restore` +- Should be a persistent location with sufficient storage space +- Ensure the directory is accessible and has appropriate permissions + +**Valid values**: Any valid local filesystem path (e.g., `/backup/mas`, `/home/user/sls-backups`) + +**Impact**: All backup files and metadata will be stored in subdirectories under this path. The backup creates a timestamped directory structure: `{mas_backup_dir}/backup-{version}-sls/` + +**Related variables**: Works with `sls_backup_version` to create unique backup directories. + +**Note**: Ensure this directory has sufficient space for backup data and is regularly backed up to external storage for disaster recovery. + +#### sls_backup_version +Version identifier for the backup, used to create unique backup directories. + +- **Optional** for backup (auto-generated if not provided) +- **Required** for restore +- Environment Variable: `SLS_BACKUP_VERSION` +- Default: Auto-generated timestamp in format `YYYYMMDD-HHMMSS` + +**Purpose**: Provides a unique identifier for each backup, allowing multiple backups to coexist and enabling point-in-time restore operations. + +**When to use**: +- For backup: Leave unset to auto-generate a timestamp-based version, or provide a custom identifier +- For restore: Must specify the exact version identifier of the backup to restore + +**Valid values**: Any string suitable for directory names (alphanumeric, hyphens, underscores). Auto-generated format: `YYYYMMDD-HHMMSS` (e.g., `20260122-131500`) + +**Impact**: +- For backup: Creates directory `{mas_backup_dir}/backup-{version}-sls/` +- For restore: Looks for backup in `{mas_backup_dir}/backup-{version}-sls/` + +**Related variables**: Works with `mas_backup_dir` to determine backup location. + +**Note**: When restoring, you must know the exact backup version identifier. List the contents of `mas_backup_dir` to see available backups. + +Backup and Restore Operations +------------------------------------------------------------------------------- + +The SLS role supports backup and restore operations to protect your license service configuration and data. This is essential for disaster recovery, migration, and upgrade scenarios. 
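+
+Because each backup is written to `{mas_backup_dir}/backup-{version}-sls/`, the available backup versions can be listed straight from the filesystem. A minimal sketch, assuming backups were written to `/backup/mas` (an example value for `MAS_BACKUP_DIR`):
+
+```bash
+# List SLS backup directories; the middle portion of each name is the version
+# identifier to pass as SLS_BACKUP_VERSION when restoring
+ls -d /backup/mas/backup-*-sls
+# e.g. /backup/mas/backup-20260122-131500-sls  ->  SLS_BACKUP_VERSION=20260122-131500
+```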
+ +### What Gets Backed Up + +The SLS backup operation captures all critical resources needed to restore a complete SLS instance: + +**Kubernetes Resources:** +- **Project/Namespace**: The SLS namespace configuration +- **Secrets**: + - IBM entitlement credentials (`ibm-entitlement`) + - MongoDB connection credentials (`ibm-sls-mongo-credentials`) + - Bootstrap configuration (`{instance-name}-bootstrap`) + - License entitlement data (`ibm-sls-{instance-name}-entitlement`) + - All referenced secrets (auto-discovered) +- **ConfigMaps**: Suite registration configuration +- **Operator Resources**: + - Subscription (`ibm-sls`) + - OperatorGroup +- **SLS Custom Resources**: + - LicenseService CR (the main SLS instance) +- **Certificate Manager Resources**: + - Issuers (self-signed and CA issuers) + - Certificates (CA certificate) + +**Registration Data:** +- License ID +- Registration key +- Suite registration information + +### Backup Process + +The backup operation performs the following steps: + +1. **Validation**: Verifies required variables (`mas_backup_dir`, `sls_namespace`, `sls_instance_name`) +2. **Version Generation**: Creates or uses provided backup version identifier +3. **Resource Discovery**: Identifies all SLS resources and auto-discovers referenced secrets +4. **Backup Execution**: Exports all resources to YAML files in the backup directory +5. **Registration Capture**: Saves SLS registration details for restore +6. **Verification**: Reports backup statistics and any failures + +**Backup Directory Structure:** +``` +{mas_backup_dir}/ +└── backup-{version}-sls/ + ├── resources/ + │ ├── projects/ + │ ├── secrets/ + │ ├── configmaps/ + │ ├── subscriptions/ + │ ├── operatorgroups/ + │ ├── licenseservices/ + │ ├── issuers/ + │ └── certificates/ + └── sls-registration.yaml +``` + +### Restore Process + +The restore operation performs the following steps: + +1. **Validation**: Verifies required variables and backup existence +2. **Backup Verification**: Checks for valid backup structure and required files +3. **Certificate Manager Check**: Ensures cert-manager is installed in the cluster +4. **Sequential Restoration**: + - Projects/Namespaces + - Secrets and ConfigMaps + - OperatorGroups + - Subscriptions (triggers operator installation) + - Certificate Manager resources + - Bootstrap secret (recreated from registration data) + - LicenseService CR (with optional domain override) +5. 
**Verification**: Waits for SLS to become ready and reports restore statistics + +**Important Restore Considerations:** +- The target cluster must have Certificate Manager installed +- MongoDB must be available and accessible (not restored by this role) +- The restore can optionally override the `sls_domain` if deploying to a different cluster +- All secrets and credentials are restored exactly as backed up + +### Backup Example + +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + sls_action: backup + mas_backup_dir: /backup/mas + sls_namespace: ibm-sls + sls_instance_name: sls + # Optional: specify custom backup version + # sls_backup_version: "pre-upgrade-backup" + + roles: + - ibm.mas_devops.sls +``` + +**Using environment variables:** +```bash +export SLS_ACTION=backup +export MAS_BACKUP_DIR=/backup/mas +export SLS_NAMESPACE=ibm-sls +export SLS_INSTANCE_NAME=sls +# Optional: export SLS_BACKUP_VERSION=pre-upgrade-backup + +ansible-playbook ibm.mas_devops.run_role +``` + +### Restore Example + +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + sls_action: restore + mas_backup_dir: /backup/mas + sls_backup_version: "20260122-131500" + # Optional: override domain for different cluster + # sls_domain: sls.newcluster.example.com + + roles: + - ibm.mas_devops.sls +``` + +**Using environment variables:** +```bash +export SLS_ACTION=restore +export MAS_BACKUP_DIR=/backup/mas +export SLS_BACKUP_VERSION=20260122-131500 +# Optional: export SLS_DOMAIN=sls.newcluster.example.com + +ansible-playbook ibm.mas_devops.run_role +``` + +### Best Practices + +1. **Regular Backups**: Schedule regular backups before: + - SLS upgrades + - License entitlement changes + - Cluster maintenance + - MAS upgrades + +2. **Backup Storage**: + - Store backups in a location separate from the cluster + - Implement backup retention policies + - Test restore procedures regularly + +3. **MongoDB Backup**: + - SLS backup does NOT include MongoDB data + - Ensure MongoDB is backed up separately using the `mongodb` role backup functionality + - Coordinate SLS and MongoDB backups for consistency + +4. **Pre-Restore Checklist**: + - Verify Certificate Manager is installed + - Ensure MongoDB is available and accessible + - Confirm backup version exists and is complete + - Review and adjust `sls_domain` if restoring to different cluster + +5. **Disaster Recovery**: + - Document backup locations and procedures + - Test restore process in non-production environment + - Keep backup version identifiers in a safe location + - Maintain MongoDB connection details separately + +6. 
**Migration Scenarios**: + - When migrating to a new cluster, restore MongoDB first + - Update `sls_domain` if cluster ingress domain changed + - Verify network connectivity between restored SLS and MAS instances + +### Troubleshooting + +**Backup Issues:** +- **"Required variables not provided"**: Ensure `mas_backup_dir`, `sls_namespace`, and `sls_instance_name` are set +- **"Resource not found"**: Some resources may not exist in your deployment (this is normal) +- **Permission errors**: Ensure write access to `mas_backup_dir` + +**Restore Issues:** +- **"Backup archive not found"**: Verify `sls_backup_version` matches an existing backup directory +- **"Certificate Manager not found"**: Install cert-manager before restoring SLS +- **"LicenseService not ready"**: Check operator logs and ensure MongoDB is accessible +- **Domain mismatch**: Use `sls_domain` variable to override domain for new cluster + +For more information on backup and restore, refer to the [IBM Documentation on Backing up and Restoring SLS](https://www.ibm.com/docs/en/masv-and-l/cd?topic=service-backing-up-restoring). + ## Example Playbook ### Install and generate a configuration [up to SLS 3.6.0] diff --git a/ibm/mas_devops/roles/sls/defaults/main.yml b/ibm/mas_devops/roles/sls/defaults/main.yml index 6ffdc3e97c..7b30689e50 100644 --- a/ibm/mas_devops/roles/sls/defaults/main.yml +++ b/ibm/mas_devops/roles/sls/defaults/main.yml @@ -62,3 +62,7 @@ custom_labels: "{{ lookup('env', 'CUSTOM_LABELS') | default(None, true) | string # mas_pod_templates_dir: path to directory containing podTemplates configuration # ----------------------------------------------------------------------------- mas_pod_templates_dir: "{{ lookup('env', 'MAS_POD_TEMPLATES_DIR') | default('', true) }}" + +# Backup and restore variables +sls_backup_version: "{{ lookup('env', 'SLS_BACKUP_VERSION') }}" +mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" diff --git a/ibm/mas_devops/roles/sls/tasks/backup/main.yml b/ibm/mas_devops/roles/sls/tasks/backup/main.yml new file mode 100644 index 0000000000..ee671046e7 --- /dev/null +++ b/ibm/mas_devops/roles/sls/tasks/backup/main.yml @@ -0,0 +1,117 @@ +--- +- name: "Fail if require variables for SLS backup are not provided" + ibm.mas_devops.verify_backup_restore_vars: + mas_backup_dir: "{{ mas_backup_dir }}" + sls_namespace: "{{ sls_namespace }}" + sls_instance_name: "{{ sls_instance_name }}" + action: "backup" + component: "sls" + +- name: "Check if SLS_BACKUP_VERSION is provided, if not set to default 'YYYYMMDD-HHMMSS' format" + set_fact: + sls_backup_version: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}" + when: sls_backup_version is not defined or sls_backup_version == "" or sls_backup_version == "None" + +- name: "Set fact: SLS backup base directory path" + set_fact: + sls_backup_path: "{{ mas_backup_dir }}/backup-{{ sls_backup_version }}-sls" + +- name: "Set fact: SLS backup resources" + set_fact: + sls_backup_resources: + - namespace: "{{ sls_namespace }}" + resources: + # project + - kind: Project + api_version: project.openshift.io/v1 + name: "{{ sls_namespace }}" + # secret + - kind: Secret + api_version: v1 + name: ibm-entitlement + - kind: Secret + api_version: v1 + name: ibm-sls-mongo-credentials + - kind: Secret + api_version: v1 + name: "{{ sls_instance_name }}-bootstrap" + - kind: Secret + api_version: v1 + name: "ibm-sls-{{ sls_instance_name }}-entitlement" + # configmap + - kind: ConfigMap + api_version: v1 + name: "{{ sls_instance_name }}-suite-registration" + # subscription + - kind: 
Subscription + api_version: operators.coreos.com/v1alpha1 + name: ibm-sls + # operator group + - kind: OperatorGroup + api_version: operators.coreos.com/v1 + name: operatorgroup + # Licenseservice CR + - kind: LicenseService + api_version: sls.ibm.com/v1 + name: "{{ sls_instance_name }}" + # Issuers + - kind: Issuer + api_version: cert-manager.io/v1 + name: "{{ sls_instance_name }}-issuer" + - kind: Issuer + api_version: cert-manager.io/v1 + name: "{{ sls_instance_name }}-ca-issuer" + # Certificates + - kind: Certificate + api_version: cert-manager.io/v1 + name: "{{ sls_instance_name }}-cert-ca" + +# Call the backup_resources plugin to execute the backup to the path provided +# ----------------------------------------------------------------------------- +- name: "Backup sls resources (referenced secrets are auto-discovered)" + ibm.mas_devops.backup_resource: + backup_resources: "{{ sls_backup_resources }}" + backup_path: "{{ sls_backup_path }}" + register: backup_result + +# Show the results +# ----------------------------------------------------------------------------- +- name: "Display backup results" + debug: + msg: + - "Backup completed{{ ' with failures' if backup_result.failed_count > 0 else ' successfully' }}" + - "Total resources backed up: {{ backup_result.backed_up_count }}" + - "Total resources failed: {{ backup_result.failed_count }}" + - "Resources not found: {{ backup_result.not_found_count }}" + - "Secrets auto-discovered: {{ backup_result.discovered_secrets_count }}" + - "Backup location: {{ sls_backup_path }}" + +# Fail task if any errors occurred. +# ----------------------------------------------------------------------------- +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ backup_result.failed_resources | to_nice_yaml }}" + when: backup_result.failed_count > 0 + +- name: "Fail if backup had errors" + fail: + msg: | + Backup failed for {{ backup_result.failed_count }} resource(s): + {% for resource in backup_result.failed_resources %} + - {{ resource.description }} in {{ resource.scope }} + {% endfor %} + when: backup_result.failed_count > 0 + +- name: "Store SLS registration details" + ibm.mas_devops.save_sls_registration_info: + namespace: "{{ sls_namespace }}" + name: "{{ sls_instance_name }}" + sls_backup_path: "{{ sls_backup_path }}" + register: sls_registration_result + +- name: "Fail if storing SLS registration details has failed." 
+  fail:
+    msg: "{{ sls_registration_result.msg }}"
+  when: sls_registration_result.failed
diff --git a/ibm/mas_devops/roles/sls/tasks/main.yml b/ibm/mas_devops/roles/sls/tasks/main.yml
index 7cf575437c..81ebce0653 100644
--- a/ibm/mas_devops/roles/sls/tasks/main.yml
+++ b/ibm/mas_devops/roles/sls/tasks/main.yml
@@ -7,7 +7,7 @@
 - name: Run the specified action
   ansible.builtin.include_tasks: "tasks/{{ sls_action }}/main.yml"
   when:
-    - sls_action in ["install", "uninstall"]
+    - sls_action in ["install", "uninstall", "backup", "restore"]
     - sls_url is not defined or sls_url == ""
 
 # TODO: We should take a bigger look at how the "only generate a cfg" mode works
diff --git a/ibm/mas_devops/roles/sls/tasks/restore/main.yml b/ibm/mas_devops/roles/sls/tasks/restore/main.yml
new file mode 100644
index 0000000000..6af345729f
--- /dev/null
+++ b/ibm/mas_devops/roles/sls/tasks/restore/main.yml
@@ -0,0 +1,252 @@
+---
+- name: "Fail if required variables for SLS restore are not provided"
+  ibm.mas_devops.verify_backup_restore_vars:
+    mas_backup_dir: "{{ mas_backup_dir }}"
+    sls_backup_version: "{{ sls_backup_version }}"
+    action: "restore"
+    component: "sls"
+
+- name: "Set fact: SLS backup base directory path"
+  set_fact:
+    sls_backup_path: "{{ mas_backup_dir }}/backup-{{ sls_backup_version }}-sls"
+    sls_backup_resource_path: "{{ mas_backup_dir }}/backup-{{ sls_backup_version }}-sls/resources"
+
+- name: "Check SLS backup resource path exists"
+  stat:
+    path: "{{ sls_backup_resource_path }}"
+  register: resources_backup_path_stat
+
+- name: "Fail if backup archive does not exist"
+  fail:
+    msg: "SLS resources archive not found at: {{ sls_backup_resource_path }}"
+  when: not resources_backup_path_stat.stat.exists or not resources_backup_path_stat.stat.isdir
+
+# Verify only one LicenseService instance file is present in backup archive
+# -----------------------------------------------------------------------------
+- name: Get files from {{ sls_backup_resource_path }}/licenseservices directory
+  set_fact:
+    instance_files: "{{ lookup('fileglob', '{{ sls_backup_resource_path }}/licenseservices/*', wantlist=True) }}"
+
+- name: Assert exactly one LicenseService CR exists
+  assert:
+    that:
+      - instance_files | length == 1
+    fail_msg: "LicenseService Directory must contain exactly one file"
+
+- name: Set fact LicenseService cr
+  set_fact:
+    sls_cr_cfg: "{{ lookup('file', '{{ instance_files[0] }}') | from_yaml }}"
+
+- name: "Set fact: SLS details"
+  set_fact:
+    sls_instance_name: "{{ sls_cr_cfg.metadata.name }}"
+    sls_namespace: "{{ sls_cr_cfg.metadata.namespace }}"
+
+# Verify sls-registration.yaml exists in backup archive
+# -----------------------------------------------------------------------------
+- name: "Check sls-registration.yaml exists in {{ sls_backup_path }}"
+  stat:
+    path: "{{ sls_backup_path }}/sls-registration.yaml"
+  register: registrationfile_stat
+
+- name: "Fail if sls-registration.yaml does not exist"
+  fail:
+    msg: "sls-registration.yaml not found at: {{ sls_backup_path }}"
+  when: not registrationfile_stat.stat.exists
+
+- name: "SLS restore information"
+  debug:
+    msg:
+      - "Backup Version ................. {{ sls_backup_version }}"
+      - "Backup Path .................... {{ sls_backup_path }}"
+      - "SLS Instance Name .............. {{ sls_instance_name }}"
+      - "SLS Namespace .................. 
{{ sls_namespace }}" + +# Verify cert-manager exists +# ----------------------------------------------------------------------------- +- name: Detect Certificate Manager installation + include_tasks: "{{ role_path }}/../../common_tasks/detect_cert_manager.yml" + +# Restore Projects +# ------------------------------------------------------------------------- +- name: "Restore Projects" + ibm.mas_devops.restore_resource: + backup_path: "{{ sls_backup_path }}" + resource_kinds: + - Project + register: projects_result + +# Restore Secrets and ConfigMaps +- name: "Restore Secrets and ConfigMaps" + ibm.mas_devops.restore_resource: + backup_path: "{{ sls_backup_path }}" + resource_kinds: + - Secret + - ConfigMap + register: secrets_configmaps_result + when: projects_result.success + +# Restore OperatorGroups and Subscriptions +- name: "Restore OperatorGroups" + ibm.mas_devops.restore_resource: + backup_path: "{{ sls_backup_path }}" + resource_kinds: + - OperatorGroup + register: operatorgroups_result + when: projects_result.success + +- name: "Restore Subscriptions" + ibm.mas_devops.restore_resource: + backup_path: "{{ sls_backup_path }}" + resource_kinds: + - Subscription + register: subscriptions_result + when: projects_result.success + +# Wait until the LicenseService CRD is available +# ----------------------------------------------------------------------------- +- name: "Wait until the LicenseService CRD is available" + include_tasks: "{{ role_path }}/../../common_tasks/wait_for_crd.yml" + vars: + crd_name: "licenseservices.sls.ibm.com" + +# Restore Certificate Manager resources (Issuers, Certificates) +- name: "Restore Certificate Manager resources" + ibm.mas_devops.restore_resource: + backup_path: "{{ sls_backup_path }}" + resource_kinds: + - Issuer + - Certificate + register: certmanager_result + when: projects_result.success + +# Create Bootstrap secret +# ----------------------------------------------------------------------------- +- name: "Load sls-registration.yaml" + include_vars: + file: "{{ sls_backup_path }}/sls-registration.yaml" + name: sls_registration + +# refer: https://www.ibm.com/docs/en/masv-and-l/cd?topic=service-backing-up-restoring +- name: Create SLS Bootstrap secret + kubernetes.core.k8s: + state: present + definition: + apiVersion: v1 + kind: Secret + metadata: + name: "{{ sls_instance_name }}-bootstrap" + namespace: "{{ sls_namespace }}" + stringData: + licensingId: "{{ (sls_registration.licenseId is defined and sls_registration.licenseId != '') | ternary(sls_registration.licenseId, omit) }}" + registrationKey: "{{ (sls_registration.registrationKey is defined and sls_registration.registrationKey != '') | ternary(sls_registration.registrationKey, omit) }}" + when: projects_result.success + +# Restore SLS resources +# ----------------------------------------------------------------------------- +# Setting NO_OVERRIDE to sls_domain here since its being used by other actions. 
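+# For example (illustrative value, not a role default): when restoring to a cluster whose
+# ingress domain differs from the source cluster, export SLS_DOMAIN before running the role
+# so that spec.domain is overridden on the restored LicenseService:
+#   export SLS_DOMAIN=sls.mas-newcluster.example.com
+# If sls_domain is left unset, the NO_OVERRIDE placeholder is passed through instead.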
+- name: "Set NO_OVERRIDE to sls_domain if its not defined" + set_fact: + _sls_domain: "{{ sls_domain | default('NO_OVERRIDE', true) }}" + +- name: "Restore LicenseService" + ibm.mas_devops.restore_resource: + backup_path: "{{ sls_backup_path }}" + resource_kinds: + - LicenseService + override_values: + LicenseService: + - spec.domain: "{{ _sls_domain }}" + - ca.secretName: "{{ sls_instance_name }}-cert-ca" + register: sls_result + when: projects_result.success + +# Verify SLS +# ----------------------------------------------------------------------------- +- name: Verify LicenseService CR + ansible.builtin.include_tasks: "tasks/install/sls-verify.yml" + when: + - projects_result.success + - sls_result is defined and sls_result.success + +# Calculate total results +# ----------------------------------------------------------------------------- +- name: "Calculate total restore results" + set_fact: + total_created: >- + {{ + (projects_result.created_count | default(0)) + + (secrets_configmaps_result.created_count | default(0)) + + (operatorgroups_result.created_count | default(0)) + + (subscriptions_result.created_count | default(0)) + + (certmanager_result.created_count | default(0)) + + (sls_result.created_count | default(0)) + }} + total_updated: >- + {{ + (projects_result.updated_count | default(0)) + + (secrets_configmaps_result.updated_count | default(0)) + + (operatorgroups_result.updated_count | default(0)) + + (subscriptions_result.updated_count | default(0)) + + (certmanager_result.updated_count | default(0)) + + (sls_result.updated_count | default(0)) + }} + total_skipped: >- + {{ + (projects_result.skipped_count | default(0)) + + (secrets_configmaps_result.skipped_count | default(0)) + + (operatorgroups_result.skipped_count | default(0)) + + (subscriptions_result.skipped_count | default(0)) + + (certmanager_result.skipped_count | default(0)) + + (sls_result.skipped_count | default(0)) + }} + total_failed: >- + {{ + (projects_result.failed_count | default(0)) + + (secrets_configmaps_result.failed_count | default(0)) + + (operatorgroups_result.failed_count | default(0)) + + (subscriptions_result.failed_count | default(0)) + + (certmanager_result.failed_count | default(0)) + + (sls_result.failed_count | default(0)) + }} + +- name: "Display total restore results" + debug: + msg: + - >- + Restore completed{{ ' with failures' if total_failed | int > 0 + else ' successfully' }} + - "Total resources created: {{ total_created }}" + - "Total resources updated: {{ total_updated }}" + - "Total resources skipped: {{ total_skipped }}" + - "Total resources failed: {{ total_failed }}" + +# Fail task if any errors occurred +# ----------------------------------------------------------------------------- +- name: "Collect all failed resources" + set_fact: + all_failed_resources: >- + {{ + (projects_result.failed_resources | default([])) + + (secrets_configmaps_result.failed_resources | default([])) + + (operatorgroups_result.failed_resources | default([])) + + (subscriptions_result.failed_resources | default([])) + + (certmanager_result.failed_resources | default([])) + + (sls_result.failed_resources | default([])) + }} + +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ all_failed_resources | to_nice_yaml }}" + when: total_failed | int > 0 + +- name: "Fail if restore had errors" + fail: + msg: | + Restore failed for {{ total_failed }} resource(s): + {% for resource in all_failed_resources %} + - {{ resource.description }}: {{ resource.error }} + {% endfor %} + 
when: total_failed | int > 0 diff --git a/ibm/mas_devops/roles/suite_app_backup/README.md b/ibm/mas_devops/roles/suite_app_backup/README.md new file mode 100644 index 0000000000..d8264d7a32 --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_backup/README.md @@ -0,0 +1,191 @@ +Backup MAS Applications +=============================================================================== + +Overview +------------------------------------------------------------------------------- +This role supports backing up MAS application resources and data. Currently supported applications: + +- **`manage`**: Backs up Manage namespace resources (CRs, secrets, subscriptions) and persistent volume data + +Future support planned for: `iot`, `monitor`, `health`, `optimizer`, `visualinspection` + +The backup process creates a timestamped backup directory containing: +1. **Namespace Resources**: Kubernetes resources including ManageApp, ManageWorkspace, secrets, and subscriptions +2. **Persistent Volume Data**: Application data stored in PVCs (automatically detected from ManageWorkspace CR) + +!!! important + - An application backup can only be restored to an instance with the same MAS instance ID + - This role backs up application resources and PV data only. Database backups must be performed separately using the appropriate database backup role + - For Manage, see the [db2](db2.md) role for database backup + + +Role Variables - General +------------------------------------------------------------------------------- +### mas_app_id +Defines the MAS application ID for the backup action. + +- **Required** +- Environment Variable: `MAS_APP_ID` +- Default: None +- Valid Values: `manage` (currently supported) + +### mas_instance_id +Defines the MAS instance ID for the backup action. + +- **Required** +- Environment Variable: `MAS_INSTANCE_ID` +- Default: None + +### mas_workspace_id +Defines the MAS workspace ID for the backup action. + +- **Required** +- Environment Variable: `MAS_WORKSPACE_ID` +- Default: None + +### mas_backup_dir +Defines the directory where backups will be stored. The role will create a timestamped subdirectory within this location. + +- **Required** +- Environment Variable: `MAS_BACKUP_DIR` +- Default: None +- Example: `/backup/mas` + +### mas_app_backup_version +Optional custom version identifier for the backup. If not specified, defaults to timestamp format `YYYYMMDD-HHMMSS`. 
+ +- Optional +- Environment Variable: `MAS_APP_BACKUP_VERSION` +- Default: Auto-generated timestamp +- Example: `20240315-143022` or `v1.0-prod` + + +What Gets Backed Up +------------------------------------------------------------------------------- +### Manage Application +When backing up the Manage application, the following resources are included: + +**Namespace Resources** (automatically backed up): +- `ManageApp` CR +- `ManageWorkspace` CR +- Encryption secrets (dynamically determined from ManageWorkspace CR) +- Certificates with `mas.ibm.com/instanceId` label +- Subscription and OperatorGroup +- IBM entitlement secret +- All referenced secrets (auto-discovered) + +**Persistent Volume Data** (automatically backed up if configured in ManageWorkspace CR): +- All persistent volumes defined in `spec.settings.deployment.persistentVolumes` +- Data is backed up as compressed tar.gz archives +- Each PVC's mount path is archived separately +- Archives are stored in the `data` subdirectory + +**NOT Included** (must be backed up separately): +- Manage database (Db2) - use the [db2](db2.md) role +- Suite-level resources - use the [suite_backup](suite_backup.md) role + + +How Persistent Volume Backup Works +------------------------------------------------------------------------------- +The role automatically detects and backs up persistent volumes configured in the ManageWorkspace CR: + +1. **Detection**: Reads `spec.settings.deployment.persistentVolumes` from ManageWorkspace CR +2. **Pod Selection**: Finds the UI or ALL server bundle pod for the workspace +3. **Archive Creation**: Creates tar.gz archives of each mount path inside the pod +4. **Transfer**: Copies archives from pod to local backup directory +5. **Cleanup**: Removes temporary archives from the pod + +Example ManageWorkspace CR configuration: +```yaml +spec: + settings: + deployment: + persistentVolumes: + - accessModes: + - ReadWriteMany + mountPath: /jmsstore + pvcName: mas-inst1-ws1-jmsserver-pvc + size: 25Gi + storageClassName: efs-csi + - accessModes: + - ReadWriteMany + mountPath: /usr/share/fonts/truetype/Free3of9Extended + pvcName: masms-inst1-ws1-fonts-pvc + size: 8Gi + storageClassName: efs-csi +``` + +This configuration will result in two tar.gz archives: +- `mas-inst1-ws1-jmsserver-pvc.tar.gz` +- `masms-inst1-ws1-fonts-pvc.tar.gz` + + +Backup Directory Structure +------------------------------------------------------------------------------- +The role creates a backup directory with the following structure: + +``` +/ +└── backup--app-manage/ + ├── resources/ + │ ├── projects + │ │ └── mas--manage.yaml + | ├── secrets + │ │ └── .yaml + │ │ └── .yaml + | ├── configmaps + │ │ └── .yaml + │ │ └── .yaml + | ├── subscriptions + │ │ └── .yaml + │ └── ... 
(other resources) + └── data/ + ├── .tar.gz + ├── .tar.gz +``` + + +Example Playbooks +------------------------------------------------------------------------------- + +### Basic Backup +Backup Manage namespace resources and any configured persistent volumes: + +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + mas_instance_id: inst1 + mas_workspace_id: ws1 + mas_app_id: manage + mas_backup_dir: /backup/mas + roles: + - ibm.mas_devops.suite_app_backup +``` + +### Backup with Custom Version +Backup with a custom version identifier: + +```yaml +- hosts: localhost + any_errors_fatal: true + vars: + mas_instance_id: inst1 + mas_workspace_id: ws1 + mas_app_id: manage + mas_backup_dir: /backup/mas + mas_app_backup_version: "prod-backup-20240315" + roles: + - ibm.mas_devops.suite_app_backup +``` + + +Notes +------------------------------------------------------------------------------- +- **Database Backup**: This role does NOT backup the Manage database. Use the [db2](db2.md) role to backup Db2 databases separately +- **Suite Resources**: This role backs up application-specific resources only. For suite-level resources (Suite CR, workspace CRs, etc.), use the [suite_backup](suite_backup.md) role +- **Storage Requirements**: Ensure sufficient storage space in `mas_backup_dir` for both namespace resources and PV data +- **Pod Access**: The role uses the UI or ALL server bundle pod to access PVC data. Ensure at least one of these pods is running and healthy +- **Backup Time**: PV backup duration depends on the amount of data in the persistent volumes +- **Automatic Detection**: Persistent volumes are automatically detected from the ManageWorkspace CR - no manual configuration needed +- **Compression**: All PV data is compressed using gzip to minimize storage requirements diff --git a/ibm/mas_devops/roles/suite_app_backup/defaults/main.yml b/ibm/mas_devops/roles/suite_app_backup/defaults/main.yml new file mode 100644 index 0000000000..0afe09899a --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_backup/defaults/main.yml @@ -0,0 +1,17 @@ +--- +# MAS Application Backup - Default Variables +# ============================================================================= + +# General Configuration +# ----------------------------------------------------------------------------- +mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" +mas_workspace_id: "{{ lookup('env', 'MAS_WORKSPACE_ID') }}" +mas_app_id: "{{ lookup('env', 'MAS_APP_ID') }}" + +# Backup Configuration +# ----------------------------------------------------------------------------- +# Directory where backups will be stored +mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" + +# Optional: Specify a custom backup version (defaults to timestamp YYYYMMDD-HHMMSS) +mas_app_backup_version: "{{ lookup('env', 'MAS_APP_BACKUP_VERSION') }}" diff --git a/ibm/mas_devops/roles/suite_app_backup/tasks/main.yml b/ibm/mas_devops/roles/suite_app_backup/tasks/main.yml new file mode 100644 index 0000000000..8aa6571f69 --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_backup/tasks/main.yml @@ -0,0 +1,75 @@ +--- +# MAS Application Backup Role +# ============================================================================= +# This role backs up MAS application resources and data. +# Currently supports: manage +# +# The backup includes: +# - Application namespace resources (CRs, secrets, subscriptions) +# - Persistent volume data (if configured) + +# 1. 
Validate required variables +# ----------------------------------------------------------------------------- +- name: "Fail if mas_instance_id is not provided" + assert: + that: mas_instance_id is defined and mas_instance_id != "" + fail_msg: "mas_instance_id is required" + +- name: "Fail if mas_workspace_id is not provided" + assert: + that: mas_workspace_id is defined and mas_workspace_id != "" + fail_msg: "mas_workspace_id is required" + +- name: "Fail if mas_app_id is not provided" + assert: + that: mas_app_id is defined and mas_app_id != "" + fail_msg: "mas_app_id is required" + +- name: "Fail if mas_backup_dir is not provided" + assert: + that: mas_backup_dir is defined and mas_backup_dir != "" + fail_msg: "mas_backup_dir is required" + +# 2. Display backup configuration +# ----------------------------------------------------------------------------- +- name: "Display backup configuration" + debug: + msg: + - "MAS Instance ID: {{ mas_instance_id }}" + - "MAS Workspace ID: {{ mas_workspace_id }}" + - "MAS App ID: {{ mas_app_id }}" + - "Backup Directory: {{ mas_backup_dir }}" + - "Backup Version: {{ mas_app_backup_version | default('auto-generated') }}" + +# 3. Route to app-specific backup tasks +# ----------------------------------------------------------------------------- +- name: "Execute backup for {{ mas_app_id }}" + block: + # Manage backup + # --------------------------------------------------------------------------- + - name: "Backup Manage application" + when: mas_app_id == "manage" + block: + - name: "Backup Manage namespace resources" + include_tasks: "{{ role_path }}/tasks/manage/backup-namespace.yml" + + - name: "Backup Manage persistent volumes" + include_tasks: "{{ role_path }}/tasks/manage/backup-pv.yml" + + - name: "Manage backup completed successfully" + debug: + msg: + - "==========================================" + - "Manage Backup Completed Successfully" + - "==========================================" + - "Instance ID: {{ mas_instance_id }}" + - "Workspace ID: {{ mas_workspace_id }}" + - "Backup Location: {{ manage_backup_path }}" + - "==========================================" + + # Unsupported app + # --------------------------------------------------------------------------- + - name: "Fail if app is not supported" + fail: + msg: "Application '{{ mas_app_id }}' is not yet supported for backup. Currently supported: manage" + when: mas_app_id not in ['manage'] diff --git a/ibm/mas_devops/roles/suite_app_backup/tasks/manage/backup-namespace.yml b/ibm/mas_devops/roles/suite_app_backup/tasks/manage/backup-namespace.yml new file mode 100644 index 0000000000..a81050b913 --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_backup/tasks/manage/backup-namespace.yml @@ -0,0 +1,153 @@ +--- +# Backup Manage Namespace Resources +# ============================================================================= +# This task backs up all Manage namespace resources using the backup_resource +# plugin, following the pattern used in suite_backup role. + +# 1. Verify required variables +# ----------------------------------------------------------------------------- +- name: "Verify required variables for Manage backup" + ibm.mas_devops.verify_backup_restore_vars: + component: manage + action: backup + mas_instance_id: "{{ mas_instance_id }}" + mas_workspace_id: "{{ mas_workspace_id }}" + mas_backup_dir: "{{ mas_backup_dir }}" + +# 2. 
Set backup version if not provided +# ----------------------------------------------------------------------------- +- name: "Check if MAS_APP_BACKUP_VERSION is provided, if not set to default 'YYYYMMDD-HHMMSS' format" + set_fact: + mas_app_backup_version: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}" + when: mas_app_backup_version is not defined or mas_app_backup_version == "" or mas_app_backup_version == "None" + +# 3. Set backup path +# ----------------------------------------------------------------------------- +- name: "Set fact: Manage backup base directory path" + set_fact: + manage_backup_path: "{{ mas_backup_dir }}/backup-{{ mas_app_backup_version }}-app-manage" + +# 4. Set Manage namespace and workspace CR name +# ----------------------------------------------------------------------------- +- name: "Set fact: Manage namespace" + set_fact: + mas_app_namespace: "mas-{{ mas_instance_id }}-manage" + +- name: "Set fact: Manage workspace CR name" + set_fact: + manage_workspace_cr_name: "{{ mas_instance_id }}-{{ mas_workspace_id }}" + +# 5. Get ManageWorkspace CR to determine encryption secret name +# ----------------------------------------------------------------------------- +- name: "Get ManageWorkspace CR" + kubernetes.core.k8s_info: + api_version: apps.mas.ibm.com/v1 + kind: ManageWorkspace + name: "{{ manage_workspace_cr_name }}" + namespace: "{{ mas_app_namespace }}" + register: manage_workspace_cr + +- name: "Fail if ManageWorkspace CR not found" + fail: + msg: "ManageWorkspace CR '{{ manage_workspace_cr_name }}' not found in namespace '{{ mas_app_namespace }}'" + when: + - manage_workspace_cr.resources is not defined or manage_workspace_cr.resources | length == 0 + +- name: "Set fact: Manage encryption secret name" + set_fact: + manage_encryptionsecret_name: "{{ manage_workspace_cr.resources[0].spec.settings.db.encryptionSecret | default(mas_workspace_id + '-manage-encryptionsecret', true) }}" + when: + - manage_workspace_cr.resources[0].spec is defined + - manage_workspace_cr.resources[0].spec.settings is defined + +- name: "Debug: Manage encryption secret name" + debug: + msg: "Manage encryption secret name: {{ manage_encryptionsecret_name }}" + +# 6. Build the Manage namespace resources list. +# Note: the ManageWorkspace CR contains entries with the key secretName and these +# will be automatically picked up. 
We only need to define secrets in the fact below
+# that are not referenced by a resource, or that do not use the key secretName
+# -----------------------------------------------------------------------------
+- name: "Set fact: Manage namespace resources"
+  set_fact:
+    manage_namespace_resources:
+      # Core Manage CRs
+      - kind: Project
+        api_version: project.openshift.io/v1
+        name: "{{ mas_app_namespace }}"
+      - kind: ManageApp
+        api_version: apps.mas.ibm.com/v1
+        name: "{{ mas_instance_id }}"
+      - kind: ManageWorkspace
+        api_version: apps.mas.ibm.com/v1
+        name: "{{ manage_workspace_cr_name }}"
+      # Encryption secrets
+      - kind: Secret
+        api_version: v1
+        name: "{{ manage_encryptionsecret_name }}"
+      - kind: Secret
+        api_version: v1
+        name: "{{ manage_encryptionsecret_name }}-operator"
+      # Certificates
+      - kind: Certificate
+        api_version: cert-manager.io/v1
+        labels:
+          - "mas.ibm.com/instanceId={{ mas_instance_id }}"
+      # Subscription and OperatorGroup
+      - kind: Subscription
+        api_version: operators.coreos.com/v1alpha1
+        name: ibm-mas-manage
+      - kind: OperatorGroup
+        api_version: operators.coreos.com/v1
+        name: operatorgroup
+      # IBM entitlement secret
+      - kind: Secret
+        api_version: v1
+        name: ibm-entitlement
+
+- name: "Set fact: Manage backup resources"
+  set_fact:
+    manage_backup_resources:
+      - namespace: "{{ mas_app_namespace }}"
+        resources: "{{ manage_namespace_resources }}"
+
+# 7. Backup Manage namespace resources
+# -----------------------------------------------------------------------------
+- name: "Backup Manage namespace resources (referenced secrets are auto-discovered)"
+  ibm.mas_devops.backup_resource:
+    backup_resources: "{{ manage_backup_resources }}"
+    backup_path: "{{ manage_backup_path }}"
+  register: manage_backup_result
+
+# 8. Display backup results
+# -----------------------------------------------------------------------------
+- name: "Display Manage backup results"
+  debug:
+    msg:
+      - "Backup completed{{ ' with failures' if manage_backup_result.failed_count > 0 else ' successfully' }}"
+      - "Total resources backed up: {{ manage_backup_result.backed_up_count }}"
+      - "Total resources failed: {{ manage_backup_result.failed_count }}"
+      - "Resources not found: {{ manage_backup_result.not_found_count }}"
+      - "Secrets auto-discovered: {{ manage_backup_result.discovered_secrets_count }}"
+      - "Backup location: {{ manage_backup_path }}"
+
+# 9. Display failed resources if any
+# -----------------------------------------------------------------------------
+- name: "Display failed resources"
+  debug:
+    msg:
+      - "Failed resources:"
+      - "{{ manage_backup_result.failed_resources | to_nice_yaml }}"
+  when: manage_backup_result.failed_count > 0
+
+# 10.
Fail if backup had errors +# ----------------------------------------------------------------------------- +- name: "Fail if backup had errors" + fail: + msg: | + Backup failed for {{ manage_backup_result.failed_count }} resource(s): + {% for resource in manage_backup_result.failed_resources %} + - {{ resource.description }} in {{ resource.scope }} + {% endfor %} + when: manage_backup_result.failed_count > 0 diff --git a/ibm/mas_devops/roles/suite_app_backup/tasks/manage/backup-pv.yml b/ibm/mas_devops/roles/suite_app_backup/tasks/manage/backup-pv.yml new file mode 100644 index 0000000000..de3ba19606 --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_backup/tasks/manage/backup-pv.yml @@ -0,0 +1,126 @@ +--- +# Backup Manage Persistent Volumes +# ============================================================================= +# This task backs up Manage persistent volume data by: +# 1. Reading persistentVolumes from ManageWorkspace CR +# 2. Finding the UI or ALL server bundle pod +# 3. Creating tar.gz archives of each mount path +# 4. Storing them in the backup data folder + +# 1. Check if ManageWorkspace has persistent volumes configured +# ----------------------------------------------------------------------------- +- name: "Check if ManageWorkspace has persistent volumes" + set_fact: + manage_persistent_volumes: "{{ manage_workspace_cr.resources[0].spec.settings.deployment.persistentVolumes | default([]) }}" + +- name: "Display persistent volumes configuration" + debug: + msg: + - "Persistent volumes found: {{ manage_persistent_volumes | length }}" + - "{{ manage_persistent_volumes | to_nice_yaml }}" + +# 2. Only proceed if persistent volumes are configured +# ----------------------------------------------------------------------------- +- name: "Backup Manage persistent volumes" + when: manage_persistent_volumes | length > 0 + block: + # Find the server bundle pod (UI, ALL, or maxinst as fallback) + # ------------------------------------------------------------------------- + - name: "Find UI server bundle pod" + kubernetes.core.k8s_info: + kind: Pod + namespace: "{{ mas_app_namespace }}" + label_selectors: + - mas.ibm.com/appType=serverBundle + - mas.ibm.com/appTypeName=ui + - mas.ibm.com/workspaceId={{ mas_workspace_id }} + register: ui_pod_output + + - name: "Find ALL server bundle pod" + kubernetes.core.k8s_info: + kind: Pod + namespace: "{{ mas_app_namespace }}" + label_selectors: + - mas.ibm.com/appType=serverBundle + - mas.ibm.com/appTypeName=all + - mas.ibm.com/workspaceId={{ mas_workspace_id }} + register: all_pod_output + when: ui_pod_output.resources | length == 0 + + - name: "Find maxinst pod as fallback" + kubernetes.core.k8s_info: + kind: Pod + namespace: "{{ mas_app_namespace }}" + label_selectors: + - mas.ibm.com/appType=maxinstudb + - mas.ibm.com/workspaceId={{ mas_workspace_id }} + register: maxinst_pod_output + when: + - ui_pod_output.resources | length == 0 + - (all_pod_output.resources is not defined or all_pod_output.resources | length == 0) + + # Determine which pod to use + # ------------------------------------------------------------------------- + - name: "Set fact: server bundle pod to use" + set_fact: + server_bundle_pod: >- + {{ + ui_pod_output.resources[0] if ui_pod_output.resources | length > 0 + else (all_pod_output.resources[0] if (all_pod_output.resources is defined and all_pod_output.resources | length > 0) + else (maxinst_pod_output.resources[0] if (maxinst_pod_output is defined and maxinst_pod_output.resources is defined and 
maxinst_pod_output.resources | length > 0)
+          else None))
+        }}
+      server_bundle_type: >-
+        {{
+          'ui' if ui_pod_output.resources | length > 0
+          else ('all' if (all_pod_output.resources is defined and all_pod_output.resources | length > 0)
+          else ('maxinst' if (maxinst_pod_output is defined and maxinst_pod_output.resources is defined and maxinst_pod_output.resources | length > 0)
+          else 'none'))
+        }}
+
+    - name: "Fail if no suitable pod found"
+      fail:
+        msg: "No UI, ALL server bundle, or maxinst pod found for workspace {{ mas_workspace_id }}"
+      when: server_bundle_type == 'none'
+
+    - name: "Display server bundle pod information"
+      debug:
+        msg:
+          - "Server bundle type: {{ server_bundle_type }}"
+          - "Pod name: {{ server_bundle_pod.metadata.name }}"
+          - "Container: {{ server_bundle_pod.spec.containers[0].name }}"
+
+    # Create data backup directory
+    # -------------------------------------------------------------------------
+    - name: "Set fact: PV backup data path"
+      set_fact:
+        manage_pv_data_path: "{{ manage_backup_path }}/data"
+
+    - name: "Create PV backup data directory"
+      file:
+        path: "{{ manage_pv_data_path }}"
+        state: directory
+        mode: '0755'
+
+    # Backup each persistent volume
+    # -------------------------------------------------------------------------
+    - name: "Backup each persistent volume"
+      include_tasks: "{{ role_path }}/tasks/manage/backup-single-pv.yml"
+      loop: "{{ manage_persistent_volumes }}"
+      loop_control:
+        loop_var: pv_item
+        index_var: pv_index
+
+    - name: "Display PV backup completion"
+      debug:
+        msg:
+          - "Manage PV backup completed"
+          - "Persistent volumes backed up: {{ manage_persistent_volumes | length }}"
+          - "Backup location: {{ manage_pv_data_path }}"
+
+# 3. Skip message if no persistent volumes configured
+# -----------------------------------------------------------------------------
+- name: "Skip PV backup message"
+  debug:
+    msg: "Skipping Manage PV backup - no persistent volumes configured in ManageWorkspace CR"
+  when: manage_persistent_volumes | length == 0
diff --git a/ibm/mas_devops/roles/suite_app_backup/tasks/manage/backup-single-pv.yml b/ibm/mas_devops/roles/suite_app_backup/tasks/manage/backup-single-pv.yml
new file mode 100644
index 0000000000..d4066bbc41
--- /dev/null
+++ b/ibm/mas_devops/roles/suite_app_backup/tasks/manage/backup-single-pv.yml
@@ -0,0 +1,93 @@
+---
+# Backup Single Persistent Volume
+# =============================================================================
+# This task creates a tar.gz archive of a single persistent volume's mount path
+# from the server bundle pod.
+
+- name: "Set fact: PV backup details"
+  set_fact:
+    pv_mount_path: "{{ pv_item.mountPath }}"
+    pv_pvc_name: "{{ pv_item.pvcName }}"
+    pv_archive_name: "{{ pv_item.pvcName }}.tar.gz"
+    pv_archive_path: "{{ manage_pv_data_path }}/{{ pv_item.pvcName }}.tar.gz"
+
+- name: "Display PV backup information"
+  debug:
+    msg:
+      - "Backing up PV {{ pv_index + 1 }}/{{ manage_persistent_volumes | length }}"
+      - "PVC Name: {{ pv_pvc_name }}"
+      - "Mount Path: {{ pv_mount_path }}"
+      - "Archive: {{ pv_archive_name }}"
+
+# Create tar.gz archive in the pod
+# -----------------------------------------------------------------------------
+- name: "Create tar.gz archive of {{ pv_mount_path }} in pod"
+  kubernetes.core.k8s_exec:
+    namespace: "{{ mas_app_namespace }}"
+    pod: "{{ server_bundle_pod.metadata.name }}"
+    container: "{{ server_bundle_pod.spec.containers[0].name }}"
+    command: >
+      tar -czf /tmp/{{ pv_archive_name }} -C {{ pv_mount_path }} .
+ register: tar_result + failed_when: false + +- name: "Check tar creation result" + debug: + msg: + - "Tar creation return code: {{ tar_result.rc }}" + - "{{ 'Success' if tar_result.rc == 0 else 'Failed' }}" + +- name: "Fail if tar creation failed" + fail: + msg: "Failed to create tar archive for {{ pv_pvc_name }}: {{ tar_result.stderr | default('Unknown error') }}" + when: tar_result.rc != 0 + +# Copy tar.gz archive from pod to local backup directory +# ----------------------------------------------------------------------------- +- name: "Copy tar.gz archive from pod to backup location" + shell: | + oc cp {{ mas_app_namespace }}/{{ server_bundle_pod.metadata.name }}:/tmp/{{ pv_archive_name }} {{ pv_archive_path }} -c {{ server_bundle_pod.spec.containers[0].name }} + register: copy_result + failed_when: false + +- name: "Check copy result" + debug: + msg: + - "Copy return code: {{ copy_result.rc }}" + - "{{ 'Success' if copy_result.rc == 0 else 'Failed' }}" + +- name: "Fail if copy failed" + fail: + msg: "Failed to copy tar archive for {{ pv_pvc_name }}: {{ copy_result.stderr | default('Unknown error') }}" + when: copy_result.rc != 0 + +# Clean up tar.gz archive from pod +# ----------------------------------------------------------------------------- +- name: "Remove tar.gz archive from pod" + kubernetes.core.k8s_exec: + namespace: "{{ mas_app_namespace }}" + pod: "{{ server_bundle_pod.metadata.name }}" + container: "{{ server_bundle_pod.spec.containers[0].name }}" + command: rm -f /tmp/{{ pv_archive_name }} + register: cleanup_result + failed_when: false + +# Verify backup was created +# ----------------------------------------------------------------------------- +- name: "Verify backup archive exists and get size" + stat: + path: "{{ pv_archive_path }}" + register: archive_stat + +- name: "Display backup statistics" + debug: + msg: + - "Archive created: {{ archive_stat.stat.exists }}" + - "Archive size: {{ (archive_stat.stat.size / 1024 / 1024) | round(2) }} MB" + - "Archive location: {{ pv_archive_path }}" + when: archive_stat.stat.exists + +- name: "Warning if archive not found" + debug: + msg: "WARNING: Archive file not found at {{ pv_archive_path }}" + when: not archive_stat.stat.exists diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/README.md b/ibm/mas_devops/roles/suite_app_backup_restore/README.md deleted file mode 100644 index 1dd7503b2f..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/README.md +++ /dev/null @@ -1,507 +0,0 @@ -# suite_app_backup_restore - -## Overview - -This role supports backing up and restoring the data for below MAS applications: - -- `manage`: Manage namespace resources, persistent volume data (e.g. attachments) -- `iot`: IoT namespace resources -- `monitor`: Monitor namespace resources -- `health`: Health namespace resources, Watson Studio project asset -- `optimizer`: Optimizer namespace resources -- `visualinspection`: Visual Inspection namespace resources, persistent volume data (e.g. image datasets, models) - -Supports creating on-demand or scheduled backup jobs for taking full or incremental backups, and optionally creating Kubernetes jobs for running the backup/restore process. - -!!! important - An application backup can only be restored to an instance with the same MAS instance ID. - -## Role Variables - -### General Variables - -#### masbr_action -Action to perform on MAS application data. 
- -- **Required** -- Environment Variable: `MAS_BR_ACTION` -- Default: None - -**Purpose**: Specifies whether to create a backup of MAS application data or restore from a previous backup. - -**When to use**: -- Set to `backup` to create a backup of application data -- Set to `restore` to restore application data from a backup -- Always required to indicate the operation type - -**Valid values**: `backup`, `restore` - -**Impact**: -- `backup`: Creates backup job (on-demand or scheduled) for application data -- `restore`: Restores application data from specified backup version - -**Related variables**: -- `masbr_restore_from_version`: Required when action is `restore` -- `masbr_backup_schedule`: Optional for scheduled backups -- `mas_app_id`: Application to backup/restore - -**Note**: **IMPORTANT** - This role handles application-specific data (namespace resources, PV data, Watson Studio assets). Database data (Db2, MongoDB) must be backed up/restored separately. An application backup can only be restored to an instance with the same MAS instance ID. - -#### mas_app_id -MAS application identifier for backup/restore operations. - -- **Required** -- Environment Variable: `MAS_APP_ID` -- Default: None - -**Purpose**: Identifies which MAS application to backup or restore. Different applications support different data types (namespace, PV, Watson Studio). - -**When to use**: -- Always required for application backup/restore operations -- Must match an installed application in the instance -- Determines which data types are available for backup - -**Valid values**: `manage`, `iot`, `monitor`, `health`, `optimizer`, `visualinspection` - -**Impact**: Determines which application's data will be backed up or restored. Each application supports different data types: -- `manage`: namespace, pv (attachments) -- `iot`: namespace -- `monitor`: namespace -- `health`: namespace, wsl (Watson Studio) -- `optimizer`: namespace -- `visualinspection`: namespace, pv (datasets, models) - -**Related variables**: -- `masbr_backup_data`/`masbr_restore_data`: Data types to backup/restore -- `mas_instance_id`: Instance containing this application -- `mas_workspace_id`: Workspace containing this application - -**Note**: Database data (Db2, MongoDB) is not included in application backups and must be backed up separately using dedicated roles. - -#### mas_instance_id -MAS instance identifier for application backup/restore. - -- **Required** -- Environment Variable: `MAS_INSTANCE_ID` -- Default: None - -**Purpose**: Identifies which MAS instance contains the application to backup or restore. Used to locate application resources and ensure restore compatibility. - -**When to use**: -- Always required for application backup and restore operations -- Must match the instance ID from MAS installation -- Critical for restore operations (must match original backup instance ID) - -**Valid values**: Lowercase alphanumeric string, 3-12 characters (e.g., `prod`, `dev`, `main`) - -**Impact**: Determines which MAS instance's application will be backed up or restored. **CRITICAL** - An application backup can only be restored to an instance with the same MAS instance ID. - -**Related variables**: -- `mas_app_id`: Application within this instance -- `mas_workspace_id`: Workspace within this instance -- `masbr_restore_from_version`: Backup version to restore (for restore action) - -**Note**: **IMPORTANT** - The instance ID must match between backup and restore operations. 
Attempting to restore a backup to an instance with a different ID will fail. - -#### mas_workspace_id -Workspace identifier for application backup/restore. - -- **Required** -- Environment Variable: `MAS_WORKSPACE_ID` -- Default: None - -**Purpose**: Identifies which workspace within the MAS instance contains the application to backup or restore. Used to locate application resources. - -**When to use**: -- Always required for application backup and restore operations -- Must match the workspace ID from application installation -- Used to construct resource names and locate application data - -**Valid values**: Lowercase alphanumeric string (e.g., `ws1`, `prod`, `test`) - -**Impact**: Determines which workspace's application data will be backed up or restored. Incorrect workspace ID will cause operations to fail. - -**Related variables**: -- `mas_instance_id`: Instance containing this workspace -- `mas_app_id`: Application within this workspace - -**Note**: The workspace must contain the specified application. Application data is workspace-specific and cannot be restored to a different workspace. - -#### masbr_storage_local_folder -Local filesystem path for backup storage. - -- **Required** -- Environment Variable: `MASBR_STORAGE_LOCAL_FOLDER` -- Default: None - -**Purpose**: Specifies the local filesystem path where application backup files are stored (for backups) or retrieved from (for restores). This is the persistent storage location for backup data. - -**When to use**: -- Always required for backup and restore operations -- Must be accessible from the system running the role -- Should have sufficient space for application data backups -- Must be persistent across operations for restore capability - -**Valid values**: Absolute filesystem path (e.g., `/tmp/masbr`, `/backup/mas-apps`, `/mnt/backup`) - -**Impact**: Backup files are written to or read from this location. Insufficient space will cause backup failures. Path must exist and be writable. - -**Related variables**: -- `masbr_copy_timeout_sec`: Timeout for transferring files to/from this location -- `masbr_restore_from_version`: Backup version stored in this location - -**Note**: Ensure the path has sufficient disk space for application backups (especially for Manage attachments and Visual Inspection datasets). For production, use a dedicated backup volume with appropriate retention policies. - -#### masbr_confirm_cluster -Confirm cluster connection before backup/restore. - -- **Optional** -- Environment Variable: `MASBR_CONFIRM_CLUSTER` -- Default: `false` - -**Purpose**: Controls whether the role prompts for confirmation of the currently connected cluster before executing backup or restore operations. Safety feature to prevent accidental operations on wrong cluster. - -**When to use**: -- Set to `true` for interactive confirmation (recommended for production) -- Leave as `false` (default) for automated/non-interactive operations -- Use `true` when manually running backup/restore to verify correct cluster - -**Valid values**: `true`, `false` - -**Impact**: -- `true`: Role prompts for cluster confirmation before proceeding -- `false`: Role proceeds without confirmation (suitable for automation) - -**Related variables**: -- `masbr_action`: Operation requiring cluster confirmation - -**Note**: Enabling cluster confirmation is recommended for manual operations, especially in production environments, to prevent accidental backup/restore on the wrong cluster. - -#### masbr_copy_timeout_sec -File transfer timeout in seconds. 
- -- **Optional** -- Environment Variable: `MASBR_COPY_TIMEOUT_SEC` -- Default: `43200` (12 hours) - -**Purpose**: Specifies the maximum time allowed for transferring application backup files between cluster and local storage. Prevents operations from hanging indefinitely. - -**When to use**: -- Use default (12 hours) for most deployments -- Increase for very large backups (e.g., Manage attachments, Visual Inspection datasets) -- Decrease for smaller backups to fail faster on issues - -**Valid values**: Positive integer (seconds), e.g., `3600` (1 hour), `43200` (12 hours), `86400` (24 hours) - -**Impact**: Operations exceeding this timeout will fail. Insufficient timeout for large backups will cause failures. Excessive timeout delays error detection. - -**Related variables**: -- `masbr_storage_local_folder`: Destination for file transfers - -**Note**: The default 12 hours is suitable for most deployments. Adjust based on backup size (especially for Manage attachments and Visual Inspection datasets) and network speed. - -#### masbr_job_timezone -Time zone for scheduled backup jobs. - -- **Optional** -- Environment Variable: `MASBR_JOB_TIMEZONE` -- Default: UTC - -**Purpose**: Specifies the time zone for scheduled backup CronJobs. Ensures backups run at the intended local time rather than UTC. - -**When to use**: -- Leave unset to use UTC (default) -- Set when you need backups to run at specific local times -- Only applies to scheduled backups (when `masbr_backup_schedule` is set) - -**Valid values**: Valid [tz database time zone](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) (e.g., `America/New_York`, `Europe/London`, `Asia/Tokyo`) - -**Impact**: Determines when scheduled backups execute. Incorrect time zone may cause backups to run at unexpected times. - -**Related variables**: -- `masbr_backup_schedule`: Cron expression interpreted in this time zone - -**Note**: Only relevant for scheduled backups. On-demand backups ignore this setting. Use standard tz database names (e.g., `America/New_York`, not `EST`). - -### Backup Variables - -#### masbr_backup_type -Backup type: full or incremental. - -- **Optional** -- Environment Variable: `MASBR_BACKUP_TYPE` -- Default: `full` - -**Purpose**: Specifies whether to create a full backup or incremental backup. Incremental backups only capture changes since the last full backup, reducing backup time and storage. - -**When to use**: -- Use `full` (default) for complete backups -- Use `incr` for incremental backups of persistent volume data -- Incremental backups require a previous full backup - -**Valid values**: `full`, `incr` - -**Impact**: -- `full`: Creates complete backup of all data -- `incr`: Creates incremental backup of PV data only (namespace data is always full) - -**Related variables**: -- `masbr_backup_from_version`: Full backup version for incremental backup -- `masbr_backup_data`: Data types to backup - -**Note**: **IMPORTANT** - Incremental backups only apply to persistent volume (PV) data. Namespace and Watson Studio data are always backed up in full regardless of this setting. Incremental backups require a previous full backup as a baseline. - -#### masbr_backup_data -Data types to include in backup. - -- **Optional** -- Environment Variable: `MASBR_BACKUP_DATA` -- Default: All supported data types for the application - -**Purpose**: Specifies which types of data to backup. Allows selective backup of namespace resources, persistent volumes, or Watson Studio assets. 
- -**When to use**: -- Leave unset to backup all supported data types (recommended) -- Set to backup specific data types only -- Use comma-separated list for multiple types (e.g., `namespace,pv`) - -**Valid values**: Comma-separated list of: `namespace`, `pv`, `wsl` -- `namespace`: Kubernetes namespace resources -- `pv`: Persistent volume data (attachments, datasets, models) -- `wsl`: Watson Studio project assets (Health only) - -**Impact**: Only specified data types are backed up. Unspecified types are excluded from backup. - -**Related variables**: -- `mas_app_id`: Determines which data types are supported -- `masbr_backup_type`: Full or incremental (applies to PV data only) - -**Note**: Supported data types vary by application: -- Manage: `namespace`, `pv` -- IoT/Monitor/Optimizer: `namespace` only -- Health: `namespace`, `wsl` -- Visual Inspection: `namespace`, `pv` - -The data types supported by each MAS applications: - -| MAS App Name | MAS App ID | Data types | -| ----------------- | ------------------- | ------------------- | -| Manage | `manage` | `namespace`, `pv` | -| IoT | `iot` | `namespace` | -| Monitor | `monitor` | `namespace` | -| Health | `health` | `namespace`, `wsl` | -| Optimizer | `optimizer` | `namespace` | -| Visual Inspection | `visualinspection` | `namespace`, `pv` | - -#### masbr_backup_from_version -Base full backup version for incremental backups. - -- **Optional** (when `masbr_backup_type=incr`) -- Environment Variable: `MASBR_BACKUP_FROM_VERSION` -- Default: Latest full backup (auto-detected) - -**Purpose**: Specifies which full backup to use as the baseline for an incremental backup. Incremental backups capture only changes since this version. - -**When to use**: -- Only applies when `masbr_backup_type=incr` -- Leave unset to automatically use the latest full backup (recommended) -- Set explicitly to use a specific full backup as baseline - -**Valid values**: Timestamp in `YYYYMMDDHHMMSS` format (e.g., `20240621021316` for June 21, 2024 at 02:13:16) - -**Impact**: Determines which full backup is used as the baseline. Incremental backup captures changes since this version. If not set, automatically uses the latest full backup. - -**Related variables**: -- `masbr_backup_type`: Must be `incr` for this variable to be used -- `masbr_storage_local_folder`: Location where full backup versions are stored - -**Note**: Only valid for incremental backups. The specified version must be a full backup (not incremental). Auto-detection finds the latest full backup in storage. - -#### masbr_backup_schedule -Cron expression for scheduled backups. - -- **Optional** -- Environment Variable: `MASBR_BACKUP_SCHEDULE` -- Default: None (on-demand backup) - -**Purpose**: Defines a schedule for automatic recurring backups using Cron syntax. When set, creates a Kubernetes CronJob for automated backups. 
- -**When to use**: -- Leave unset for on-demand backups (manual execution) -- Set to create scheduled/recurring backups -- Use for automated backup strategies - -**Valid values**: Valid [Cron expression](https://en.wikipedia.org/wiki/Cron) (e.g., `0 2 * * *` for daily at 2 AM, `0 2 * * 0` for weekly on Sunday at 2 AM) - -**Impact**: -- When set: Creates a Kubernetes CronJob that runs backups automatically on schedule -- When unset: Creates an on-demand backup job that runs immediately - -**Related variables**: -- `masbr_job_timezone`: Time zone for interpreting the cron schedule -- `masbr_action`: Must be `backup` for scheduled backups - -**Note**: Scheduled backups only apply when `masbr_action=backup`. The cron expression is interpreted in the time zone specified by `masbr_job_timezone` (defaults to UTC). Common patterns: `0 2 * * *` (daily 2 AM), `0 2 * * 0` (weekly Sunday 2 AM), `0 2 1 * *` (monthly 1st at 2 AM). - -### Restore Variables - -#### masbr_restore_from_version -Backup version timestamp for restore operations. - -- **Required** (when `masbr_action=restore`) -- Environment Variable: `MASBR_RESTORE_FROM_VERSION` -- Default: None - -**Purpose**: Specifies which backup version to restore from. The version is a timestamp identifying a specific backup. - -**When to use**: -- Required when `masbr_action=restore` -- Not used for backup operations -- Must match an existing backup version in storage - -**Valid values**: Timestamp in `YYYYMMDDHHMMSS` format (e.g., `20240621021316` for June 21, 2024 at 02:13:16) - -**Impact**: Determines which backup is restored. Incorrect or non-existent version will cause restore to fail. - -**Related variables**: -- `masbr_action`: Must be `restore` for this variable to be used -- `masbr_storage_local_folder`: Location where backup versions are stored -- `mas_instance_id`: Must match the instance ID from the backup - -**Note**: The backup version timestamp is generated automatically during backup creation. List available backups in `masbr_storage_local_folder` to find valid version timestamps. **IMPORTANT** - The backup can only be restored to an instance with the same MAS instance ID as the original backup. - -#### masbr_restore_data -Data types to include in restore. - -- **Optional** -- Environment Variable: `MASBR_RESTORE_DATA` -- Default: All supported data types for the application - -**Purpose**: Specifies which types of data to restore. Allows selective restore of namespace resources, persistent volumes, or Watson Studio assets. - -**When to use**: -- Leave unset to restore all supported data types (recommended) -- Set to restore specific data types only -- Use comma-separated list for multiple types (e.g., `namespace,pv`) - -**Valid values**: Comma-separated list of: `namespace`, `pv`, `wsl` -- `namespace`: Kubernetes namespace resources -- `pv`: Persistent volume data (attachments, datasets, models) -- `wsl`: Watson Studio project assets (Health only) - -**Impact**: Only specified data types are restored. Unspecified types remain unchanged. 
- -**Related variables**: -- `mas_app_id`: Determines which data types are supported -- `masbr_restore_from_version`: Backup version containing the data - -**Note**: Supported data types vary by application: -- Manage: `namespace`, `pv` -- IoT/Monitor/Optimizer: `namespace` only -- Health: `namespace`, `wsl` -- Visual Inspection: `namespace`, `pv` - -The data types supported by each MAS applications: - -| MAS App Name | MAS App ID | Data types | -| ----------------- | ------------------- | ------------------- | -| Manage | `manage` | `namespace`, `pv` | -| IoT | `iot` | `namespace` | -| Monitor | `monitor` | `namespace` | -| Health | `health` | `namespace`, `wsl` | -| Optimizer | `optimizer` | `namespace` | -| Visual Inspection | `visualinspection` | `namespace`, `pv` | - -### Manage Variables - -#### masbr_manage_pvc_paths -Manage PVC paths for backup/restore (Manage only). - -- **Optional** -- Environment Variable: `MASBR_MANAGE_PVC_PATHS` -- Default: None - -**Purpose**: Specifies which Manage persistent volumes to backup/restore. Defines PVC names, mount paths, and optional subpaths for Manage attachments and custom files. - -**When to use**: -- Only applies to Manage application (`mas_app_id=manage`) -- Required when backing up/restoring Manage PV data -- Leave unset to skip Manage PV backup/restore -- Set to backup specific Manage PVCs (e.g., attachments, custom files) - -**Valid values**: Comma-separated list in format `:/` -- Example: `manage-doclinks1-pvc:/mnt/doclinks1/attachments` -- Multiple: `manage-doclinks1-pvc:/mnt/doclinks1,manage-doclinks2-pvc:/mnt/doclinks2` - -**Impact**: Only specified PVCs are backed up/restored. Unspecified PVCs are excluded. - -**Related variables**: -- `mas_app_id`: Must be `manage` for this variable to apply -- `masbr_backup_data`/`masbr_restore_data`: Must include `pv` data type - -**Note**: PVC names and mount paths are defined in the ManageWorkspace CR `spec.settings.deployment.persistentVolumes`. Subpath is optional. If not set, no Manage PV data is backed up/restored. - -The `` and `` are defined in the `ManageWorkspace` CRD instance `spec.settings.deployment.persistentVolumes`: - -```yaml -persistentVolumes: - - accessModes: - - ReadWriteMany - mountPath: /mnt/doclinks1 - pvcName: manage-doclinks1-pvc - size: '20' - storageClassName: ocs-storagecluster-cephfs - volumeName: '' - - accessModes: - - ReadWriteMany - mountPath: /mnt/doclinks2 - pvcName: manage-doclinks2-pvc - size: '20' - storageClassName: ocs-storagecluster-cephfs - volumeName: '' -``` - -If not set a value for this variable, this role will not backup and restore persistent volume data for Manage. - -## Example Playbook - -### Backup -Backup Manage attachments, note that this does not include backup of any data in Db2, see the `backup` action in the [db2](db2.md) role. - -```yaml -- hosts: localhost - any_errors_fatal: true - vars: - masbr_action: backup - mas_instance_id: main - mas_workspace_id: ws1 - mas_app_id: manage - masbr_backup_data: pv - masbr_manage_pvc_paths: "manage-doclinks1-pvc:/mnt/doclinks1" - masbr_storage_local_folder: /tmp/masbr - roles: - - ibm.mas_devops.suite_app_backup_restore -``` - -### Restore -Restore Manage attachments, note that this does not include restore of any data in Db2, see the `restore` action in the [db2](db2.md) role. 
- -```yaml -- hosts: localhost - any_errors_fatal: true - vars: - masbr_action: restore - masbr_restore_from_version: 20240621021316 - mas_instance_id: main - mas_workspace_id: ws1 - mas_app_id: manage - masbr_backup_data: pv - masbr_manage_pvc_paths: "manage-doclinks1-pvc:/mnt/doclinks1" - masbr_storage_local_folder: /tmp/masbr - roles: - - ibm.mas_devops.suite_app_backup_restore -``` - -## License - -EPL-2.0 diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/defaults/main.yml b/ibm/mas_devops/roles/suite_app_backup_restore/defaults/main.yml deleted file mode 100644 index 5618f24bc9..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/defaults/main.yml +++ /dev/null @@ -1,20 +0,0 @@ -mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" -mas_workspace_id: "{{ lookup('env', 'MAS_WORKSPACE_ID') }}" -mas_app_id: "{{ lookup('env', 'MAS_APP_ID') }}" - -masbr_action: "{{ lookup('env', 'MASBR_ACTION') }}" - -# Manage PVC paths to backup/restore, format: ":/" separated by commas -# For example: "pvc-docs:/doclinks/attachments,pvc-maxlogs:/maxlogs" -masbr_manage_pvc_paths: "{{ lookup('env', 'MASBR_MANAGE_PVC_PATHS') | default('', true) }}" - -# Backup/Restore - Supported job types per app -# https://ibm-mas.github.io/ansible-devops/roles/suite_app_backup_restore/#masbr_backup_data -# https://ibm-mas.github.io/ansible-devops/roles/suite_app_backup_restore/#masbr_restore_data -supported_job_data_item_types: - health: ["namespace", "wsl"] - iot: ["namespace"] - manage: ["namespace", "pv"] - monitor: ["namespace"] - optimizer: ["namespace"] - visualinspection: ["namespace", pv] diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/backup-namespace.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/backup-namespace.yml deleted file mode 100644 index 8c867a9b89..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/backup-namespace.yml +++ /dev/null @@ -1,113 +0,0 @@ ---- -# Update namespace resource backup status: InProgress -# ----------------------------------------------------------------------------- -- name: "Update namespace resource backup status: InProgress" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "InProgress" - - -- name: "Backup namespace resources" - block: - # Prepare namespace resource backup folder - # ------------------------------------------------------------------------- - - name: "Set fact: namespace resource backup folder" - set_fact: - masbr_ns_backup_folder: "{{ masbr_local_job_folder }}/{{ masbr_job_data_type }}" - masbr_ns_backup_name: "{{ masbr_job_name }}-{{ masbr_job_data_type }}" - - - name: "Set fact: namespace resource backup log" - set_fact: - masbr_ns_backup_log: "{{ masbr_local_job_folder }}/{{ masbr_ns_backup_name }}.log" - - - name: "Create local backup folder for saving namespace resoruces" - changed_when: true - shell: > - mkdir -p {{ masbr_ns_backup_folder }} && - touch {{ masbr_ns_backup_log }} - - - # Run backup namespace resource script - # ------------------------------------------------------------------------- - - name: "Create backup namespace resource script" - template: - src: "{{ role_path }}/../../common_tasks/templates/backup_restore/backup-namespace-resources.sh.j2" - dest: "{{ masbr_local_job_folder }}/backup-namespace-resources.sh" - mode: "777" - - - name: "Run backup namespace resource script" - changed_when: true - shell: > - {{ masbr_local_job_folder 
}}/backup-namespace-resources.sh - register: _script_output - - - name: "Debug: run backup namespace resource script" - debug: - msg: "{{ _script_output.stdout_lines }}" - - - # Create tar.gz archives of namespace resource backup files - # ------------------------------------------------------------------------- - - name: "Create tar.gz archives of namespace resource backup files" - changed_when: true - shell: > - tar -czf {{ masbr_local_job_folder }}/{{ masbr_ns_backup_name }}.tar.gz - -C {{ masbr_ns_backup_folder }} . && - ls -lA {{ masbr_ns_backup_folder }} - register: _list_files_output - - - name: "Debug: list of namespace resource backup files" - debug: - msg: "{{ _list_files_output.stdout_lines }}" - - - # Copy backup files to specified storage location - # ------------------------------------------------------------------------- - - name: "Copy backup files to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name }}" - masbr_cf_paths: - - src_file: "{{ masbr_ns_backup_name }}.tar.gz" - dest_folder: "{{ masbr_job_data_type }}" - - - # Update namespace resource backup status: Completed - # ------------------------------------------------------------------------- - - name: "Update namespace resource backup status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update namespace resource backup status: Failed - # ------------------------------------------------------------------------- - - name: "Update namespace resource backup status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" - - always: - # Copy namespace resource backup log file to specified storage location - # ------------------------------------------------------------------------- - - name: "Create a tar.gz archive of namespace resource backup log" - changed_when: true - shell: > - tar -czf {{ masbr_local_job_folder }}/{{ masbr_ns_backup_name }}-log.tar.gz - -C {{ masbr_local_job_folder }} {{ masbr_ns_backup_name }}.log - - - name: "Copy namespace resource backup log file to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "{{ masbr_local_job_folder }}/{{ masbr_ns_backup_name }}-log.tar.gz" - dest_folder: "log" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/backup-pv.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/backup-pv.yml deleted file mode 100644 index a8984b8002..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/backup-pv.yml +++ /dev/null @@ -1,85 +0,0 @@ ---- -# Update pv data backup status: InProgress -# ----------------------------------------------------------------------------- -- name: "Update pv data backup status: InProgress" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "InProgress" - - -# Get app pv information -# ----------------------------------------------------------------------------- -- name: "Set fact: 
mas_app_pv_list" - set_fact: - mas_app_pv_list: [] - -- name: "Get {{ mas_app_id }} pv information" - when: mas_app_id in ['manage', 'visualinspection'] - include_tasks: "tasks/{{ mas_app_id }}/pv-info.yml" - -- name: "Debug: {{ mas_app_id }} pv information" - debug: - msg: "{{ mas_app_pv_list }}" - - -# Not found pv need to be backed up, skip this task. -# (TODO: should set the status to 'Skip') -# ------------------------------------------------------------------------- -- name: "Update pv data backup status: Completed" - when: mas_app_pv_list | length == 0 - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - -- name: "Backup pv data" - when: mas_app_pv_list | length > 0 - block: - # Copy pv data to specified storage location - # ------------------------------------------------------------------------- - - name: "Set fact: copy file variables" - set_fact: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name }}" - masbr_cf_namespace: "{{ mas_app_namespace }}" - masbr_cf_are_pvc_paths: true - - - name: "Set fact: incremental backup" - when: masbr_backup_type == "incr" - set_fact: - masbr_cf_from_job_name: "{{ masbr_backup_from }}" - - - name: "Copy pv data to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_pod_files_to_storage.yml" - vars: - masbr_cf_pvc_name: "{{ mas_app_pv_item.pvc_name }}" - masbr_cf_pvc_mount_path: "{{ mas_app_pv_item.mount_path }}" - masbr_cf_pvc_sub_path: "{{ mas_app_pv_item.sub_path | default('') }}" - masbr_cf_paths: "{{ mas_app_pv_item.backup_paths }}" - loop: "{{ mas_app_pv_list }}" - loop_control: - loop_var: mas_app_pv_item - - - # Update pv data backup status: Completed - # ------------------------------------------------------------------------- - - name: "Update pv data backup status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update pv data backup status: Failed - # ------------------------------------------------------------------------- - - name: "Update pv data backup status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/get-app-info.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/get-app-info.yml deleted file mode 100644 index ae61fac2af..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/get-app-info.yml +++ /dev/null @@ -1,114 +0,0 @@ ---- -- name: "Load mas app information" - include_vars: "{{ role_path }}/../../common_vars/application_info.yml" - -- name: "Set fact: mas app information" - when: mas_app_id != "health" - set_fact: - mas_app_kind: "{{ app_info[mas_app_id].kind }}" - mas_ws_kind: "{{ app_info[mas_app_id].ws_kind }}" - mas_api_version: "{{ app_info[mas_app_id].api_version }}" - -- name: "Get health app information" - when: mas_app_id == "health" - include_tasks: "tasks/health/get-app-info.yml" - - -# Get app version and status -# ----------------------------------------------------------------------------- -- name: "Set fact: application CRD instance name" - set_fact: - mas_app_cr_name: "{{ mas_instance_id }}" - -- name: "Get {{ mas_app_kind }}/{{ mas_app_cr_name }}" - 
kubernetes.core.k8s_info: - api_version: "{{ mas_api_version }}" - kind: "{{ mas_app_kind }}" - name: "{{ mas_app_cr_name }}" - namespace: "{{ mas_app_namespace }}" - register: _app_output - -- name: "Set fact: {{ mas_app_kind }}/{{ mas_app_cr_name }} version" - set_fact: - mas_app_version: "{{ _app_output.resources[0].status.versions.reconciled }}" - when: - - _app_output is defined - - (_app_output.resources | length > 0) - - _app_output.resources[0].status.versions.reconciled is defined - -- name: "Fail if {{ mas_app_kind }}/{{ mas_app_cr_name }} does not exists" - assert: - that: mas_app_version is defined - fail_msg: "{{ mas_app_kind }}/{{ mas_app_cr_name }} does not exists!" - when: masbr_action is defined and masbr_action == "backup" - -- name: "Set fact: {{ mas_app_kind }}/{{ mas_app_cr_name }} status" - set_fact: - mas_app_ready: true - when: - - _app_output.resources is defined - - (_app_output.resources | length > 0) - - _app_output.resources | json_query('[*].status.conditions[?type==`Ready`][].status') | select ('match','True') | list | length == 1 - -# When performing restore, we shouldn't care about the status of app. -- name: "Fail if {{ mas_app_kind }}/{{ mas_app_cr_name }} is not ready" - when: masbr_action is defined and masbr_action == "backup" - assert: - that: mas_app_ready is defined and mas_app_ready - fail_msg: "{{ mas_app_kind }}/{{ mas_app_cr_name }} is not ready!" - - -# Get workspace version and status -# ----------------------------------------------------------------------------- -- name: "Set fact: workspace CRD instance name" - set_fact: - mas_ws_cr_name: "{{ mas_instance_id }}-{{ mas_workspace_id }}" - -- name: "Get {{ mas_ws_kind }}/{{ mas_ws_cr_name }}" - kubernetes.core.k8s_info: - api_version: "{{ mas_api_version }}" - kind: "{{ mas_ws_kind }}" - name: "{{ mas_ws_cr_name }}" - namespace: "{{ mas_app_namespace }}" - register: _ws_output - -- name: "Set fact: {{ mas_ws_kind }}/{{ mas_ws_cr_name }} version" - set_fact: - mas_ws_version: "{{ _ws_output.resources[0].status.versions.reconciled }}" - when: - - _ws_output is defined - - (_ws_output.resources | length > 0) - - _ws_output.resources[0].status.versions.reconciled is defined - -- name: "Fail if {{ mas_ws_kind }}/{{ mas_ws_cr_name }} does not exists" - assert: - that: mas_ws_version is defined - fail_msg: "{{ mas_ws_kind }}/{{ mas_ws_cr_name }} does not exists!" - when: masbr_action is defined and masbr_action == "backup" - -- name: "Set fact: {{ mas_ws_kind }}/{{ mas_ws_cr_name }} status" - set_fact: - mas_ws_ready: true - when: - - _ws_output.resources is defined - - (_ws_output.resources | length > 0) - - _ws_output.resources | json_query('[*].status.conditions[?type==`Ready`][].status') | select ('match','True') | list | length == 1 - -# When performing restore, we shouldn't care about the status of app. -- name: "Fail if {{ mas_ws_kind }}/{{ mas_ws_cr_name }} is not ready" - when: masbr_action is defined and masbr_action == "backup" - assert: - that: mas_ws_ready is defined and mas_ws_ready - fail_msg: "{{ mas_ws_kind }}/{{ mas_ws_cr_name }} is not ready!" - - -# Output app information -# ----------------------------------------------------------------------------- -- name: "Debug: {{ mas_app_id | capitalize }} information" - when: masbr_action is defined and masbr_action == "backup" - debug: - msg: - - "{{ mas_app_kind }}/{{ mas_app_cr_name }} version ............ {{ mas_app_version }}" - - "{{ mas_app_kind }}/{{ mas_app_cr_name }} is ready ........... 
{{ mas_app_ready | default(false, true) }}" - - "{{ mas_ws_kind }}/{{ mas_ws_cr_name }} version .......... {{ mas_ws_version }}" - - "{{ mas_ws_kind }}/{{ mas_ws_cr_name }} is ready ......... {{ mas_ws_ready | default(false, true) }}" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/backup-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/backup-vars.yml deleted file mode 100644 index 8a52ef7427..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/backup-vars.yml +++ /dev/null @@ -1,34 +0,0 @@ ---- -- name: "Set fact: default backup job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" - - seq: "2" - type: "wsl" - -- name: "Set fact: health standalone namespace backup resources" - when: mas_health_standalone - set_fact: - masbr_ns_backup_resources: - - namespace: "{{ mas_app_namespace }}" - resources: - - kind: Subscription - name: ibm-mas-manage - - kind: OperatorGroup - name: ibm-health-operatorgroup - - kind: Secret - name: ibm-entitlement - # apps.mas.ibm.com - - kind: HealthApp - - kind: HealthWorkspace - -- name: "Set fact: health ext namespace backup resources" - when: not mas_health_standalone - set_fact: - masbr_ns_backup_resources: - - namespace: "{{ mas_app_namespace }}" - resources: - # apps.mas.ibm.com - - kind: HealthextWorkspace - - kind: HealthextAccelerator diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/backup-wsl.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/backup-wsl.yml deleted file mode 100644 index 3499cfe1b6..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/backup-wsl.yml +++ /dev/null @@ -1,97 +0,0 @@ ---- -# Update watson studio project backup status: InProgress -# ----------------------------------------------------------------------------- -- name: "Update watson studio project backup status: InProgress" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "InProgress" - - -- name: "Backup watson studio project" - block: - # Prepare watson studio project backup folder - # ------------------------------------------------------------------------- - - name: "Set fact: watson studio project backup name" - set_fact: - masbr_wsl_backup_name: "{{ masbr_job_name }}-{{ masbr_job_data_type }}" - - - name: "Set fact: watson studio project backup log" - set_fact: - masbr_wsl_backup_log: "{{ masbr_local_job_folder }}/{{ masbr_wsl_backup_name }}.log" - - - name: "Create watson studio project backup folder" - changed_when: true - shell: > - touch {{ masbr_wsl_backup_log }} - - - # Get watson studio information - # ----------------------------------------------------------------------------- - - name: "Get watson studio information" - include_tasks: "tasks/health/get-wsl-info.yml" - vars: - _wsl_log: "{{ masbr_wsl_backup_log }}" - - - # Export watson studio project asset - # ----------------------------------------------------------------------------- - - name: "Export watson studio project asset" - changed_when: true - shell: >- - {{ cpd_cli_cmd }} config users set cpd-user --username={{ cpd_username }} --apikey={{ cpd_apikey }}; - {{ cpd_cli_cmd }} config profiles set cpd-profile --user=cpd-user --url={{ cpd_endpoint }}; - {{ cpd_cli_cmd }} asset export start --profile=cpd-profile --project-id={{ cpd_project_id }} - --name={{ masbr_job_name }} --assets='{"all_assets": true}' - --output-file={{ 
masbr_local_job_folder }}/{{ masbr_wsl_backup_name }}.tgz >> {{ masbr_wsl_backup_log }} - - - # Copy backup files to specified storage location - # ------------------------------------------------------------------------- - - name: "Copy backup files to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name }}" - masbr_cf_paths: - - src_file: "{{ masbr_wsl_backup_name }}.tgz" - dest_folder: "{{ masbr_job_data_type }}" - - - # Update watson studio project backup status: Completed - # ------------------------------------------------------------------------- - - name: "Update watson studio project backup status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update watson studio project backup status: Failed - # ------------------------------------------------------------------------- - - name: "Update watson studio project backup status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" - - always: - # Copy watson studio project backup log file to specified storage location - # ------------------------------------------------------------------------- - - name: "Create a tar.gz archive of watson studio project backup log" - changed_when: true - shell: > - tar -czf {{ masbr_local_job_folder }}/{{ masbr_wsl_backup_name }}-log.tar.gz - -C {{ masbr_local_job_folder }} {{ masbr_wsl_backup_name }}.log - - - name: "Copy watson studio project backup log file to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "{{ masbr_local_job_folder }}/{{ masbr_wsl_backup_name }}-log.tar.gz" - dest_folder: "log" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/get-app-info.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/get-app-info.yml deleted file mode 100644 index 50daf8a436..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/get-app-info.yml +++ /dev/null @@ -1,21 +0,0 @@ ---- -- name: "Determine health deployment types" - set_fact: - # Only support healthext deployment by now - mas_health_standalone: false - -- name: "Set fact: health standalone app information" - when: mas_health_standalone - set_fact: - mas_app_namespace: "mas-{{ mas_instance_id }}-health" - mas_app_kind: "HealthApp" - mas_ws_kind: "HealthWorkspace" - mas_api_version: "apps.mas.ibm.com/v1" - -- name: "Set fact: health ext app information" - when: not mas_health_standalone - set_fact: - mas_app_namespace: "mas-{{ mas_instance_id }}-manage" - mas_app_kind: "ManageApp" - mas_ws_kind: "HealthextWorkspace" - mas_api_version: "apps.mas.ibm.com/v1" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/get-wsl-info.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/get-wsl-info.yml deleted file mode 100644 index 6c4943ec3e..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/get-wsl-info.yml +++ /dev/null @@ -1,80 +0,0 @@ ---- -# Input parameters: -# _wsl_log - - -# Get watson studio information -# 
----------------------------------------------------------------------------- -- name: "Set fact: watson studio secret" - set_fact: - wsl_secret_name: "{{ mas_instance_id }}-{{ mas_workspace_id }}-healthext-watsonstudio-secret" - -- name: "Get watson studio secret" - kubernetes.core.k8s_info: - kind: Secret - name: "{{ wsl_secret_name }}" - namespace: "{{ mas_app_namespace }}" - register: _wsl_secret_output - -- name: "Set fact: mongodb admin password" - set_fact: - cpd_endpoint: "{{ _wsl_secret_output.resources[0].data.endpoint | b64decode }}" - cpd_username: "{{ _wsl_secret_output.resources[0].data.username | b64decode }}" - cpd_password: "{{ _wsl_secret_output.resources[0].data.password | b64decode }}" - cpd_project_id: "{{ _wsl_secret_output.resources[0].data.projectid | b64decode }}" - when: - - _wsl_secret_output is defined - - _wsl_secret_output.resources is defined - - _wsl_secret_output.resources | length > 0 - -- name: "Debug: watson studio information" - debug: - msg: - - "CPD endpoint ........................... {{ cpd_endpoint }}" - - "CPD username ........................... {{ cpd_username }}" - - "CPD project ............................ {{ cpd_project_id }}" - - -# Generate a CPD API Key -# ----------------------------------------------------------------------------- -- name: "Generate CPD API Key" - changed_when: false - shell: >- - echo "Call {{ cpd_endpoint }}/icp4d-api/v1/authorize" >> {{ _wsl_log }}; - CPD_API_RESP=$(curl -k -X POST -H "Content-Type: application/json" - -d "{\"username\":\"{{ cpd_username }}\",\"password\":\"{{ cpd_password }}\"}" - "{{ cpd_endpoint }}/icp4d-api/v1/authorize"); - echo "${CPD_API_RESP}" >> {{ _wsl_log }}; - CPD_TOKEN=$(echo "${CPD_API_RESP}" | jq -r '.token'); - echo "Call {{ cpd_endpoint }}/usermgmt/v1/user/apiKey" >> {{ _wsl_log }}; - CPD_API_RESP=$(curl -k -X GET "{{ cpd_endpoint }}/usermgmt/v1/user/apiKey" -H "Accept: application/json" - -H "Authorization: Bearer ${CPD_TOKEN}"); - echo "${CPD_API_RESP}" >> {{ _wsl_log }}; - CPD_API_KEY=$(echo "${CPD_API_RESP}" | jq -r '.apiKey'); - echo "${CPD_API_KEY}" - register: _cpd_apikey_output - -- name: "Set fact: CPD API Key" - set_fact: - cpd_apikey: "{{ _cpd_apikey_output.stdout }}" - - -# Check if cpd-cli installed -# ----------------------------------------------------------------------------- -- name: "Get cpd-cli information" - changed_when: false - shell: > - cpd-cli version - register: _cpd_cli_output - -- name: "Download cpd-cli" - when: _cpd_cli_output.rc != 0 - changed_when: true - shell: >- - cd /tmp; - curl -L https://github.com/IBM/cpd-cli/releases/download/v13.1.5r1/cpd-cli-linux-EE-13.1.5.tgz -o cpd-cli-linux.tgz; - tar -xf cpd-cli-linux.tgz; - -- name: "Set fact: cpd-cli command" - set_fact: - cpd_cli_cmd: "{{ '/tmp/cpd-cli-linux-EE-13.1.5-242/cpd-cli' if _cpd_cli_output.rc != 0 else 'cpd-cli' }}" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/restore-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/restore-vars.yml deleted file mode 100644 index 9c7f51f5f3..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/restore-vars.yml +++ /dev/null @@ -1,8 +0,0 @@ ---- -- name: "Set fact: default restore job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" - - seq: "2" - type: "wsl" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/restore-wsl.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/restore-wsl.yml deleted file mode 100644 index 
bb0bc6ca58..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/health/restore-wsl.yml +++ /dev/null @@ -1,127 +0,0 @@ ---- -# Update watson studio project restore status: InProgress -# ----------------------------------------------------------------------------- -- name: "Update watson studio project restore status: InProgress" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "InProgress" - - -- name: "Restore watson studio project" - block: - # Prepare watson studio project restore folder - # ------------------------------------------------------------------------- - - name: "Set fact: watson studio project restore name" - set_fact: - masbr_wsl_restore_from_name: "{{ masbr_restore_from }}-{{ masbr_job_data_type }}" - masbr_wsl_restore_name: "{{ masbr_job_name }}-{{ masbr_job_data_type }}" - - - name: "Set fact: watson studio project restore folder" - set_fact: - masbr_wsl_restore_log: "{{ masbr_local_job_folder }}/{{ masbr_wsl_restore_name }}.log" - - - name: "Create watson studio project restore log" - changed_when: true - shell: > - touch {{ masbr_wsl_restore_log }} - - - # Copy backup file from specified storage location - # ------------------------------------------------------------------------- - - name: "Copy backup file from specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_storage_files_to_local.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_restore_from }}" - masbr_cf_paths: - - src_file: "{{ masbr_job_data_type }}/{{ masbr_wsl_restore_from_name }}.tgz" - dest_folder: "./" - - src_file: "{{ masbr_job_data_type }}/wsl-project-name.txt" - dest_folder: "./" - - - # Get watson studio information - # ----------------------------------------------------------------------------- - - name: "Get watson studio information" - include_tasks: "tasks/health/get-wsl-info.yml" - vars: - _wsl_log: "{{ masbr_wsl_restore_log }}" - - - # Import watson studio project asset - # ----------------------------------------------------------------------------- - - name: "Import watson studio project asset" - changed_when: true - shell: >- - {{ cpd_cli_cmd }} config users set cpd-user --username={{ cpd_username }} --apikey={{ cpd_apikey }}; - {{ cpd_cli_cmd }} config profiles set cpd-profile --user=cpd-user --url={{ cpd_endpoint }}; - echo "List projects" >> {{ masbr_wsl_restore_log }}; - {{ cpd_cli_cmd }} project list --profile=cpd-profile >> {{ masbr_wsl_restore_log }}; - WSL_PROJECT_NAME={{ mas_instance_id }}-{{ mas_workspace_id }}-healthext-{{ masbr_job_version }}; - echo "Create project ${WSL_PROJECT_NAME}" >> {{ masbr_wsl_restore_log }}; - CREATE_PROJECT_JSON=$({{ cpd_cli_cmd }} project create --profile=cpd-profile --name=${WSL_PROJECT_NAME} --output=json); - echo "${CREATE_PROJECT_JSON}" >> {{ masbr_wsl_restore_log }}; - WSL_PROJECT_LOC=$(echo "${CREATE_PROJECT_JSON}" | jq -r '.location'); - WSL_PROJECT_ID=$(echo "${WSL_PROJECT_LOC##*/}"); - echo "Import project asset" >> {{ masbr_wsl_restore_log }}; - {{ cpd_cli_cmd }} asset import start --profile=cpd-profile --project-id=${WSL_PROJECT_ID} - --import-file={{ masbr_local_job_folder }}/{{ masbr_wsl_restore_from_name }}.tgz >> {{ masbr_wsl_restore_log }}; - echo "${WSL_PROJECT_ID}" - register: _import_asset_output - - - name: "Set fact: new watson studio project id" - set_fact: - new_project_id: "{{ _import_asset_output.stdout }}" - - - name: 
"Debug: new watson studio project id" - debug: - msg: "New WS project id ................. {{ new_project_id }}" - - - # Update watson studio secret - # ----------------------------------------------------------------------------- - - name: "Update secret {{ wsl_secret_name }}" - changed_when: true - shell: >- - oc patch secret/{{ wsl_secret_name }} -n {{ mas_app_namespace }} - -p "{\"data\": {\"projectid\": \"{{ new_project_id | b64encode }}\"}}" - - - # Update watson studio project restore status: Completed - # ------------------------------------------------------------------------- - - name: "Update watson studio project restore status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update watson studio project restore status: Failed - # ------------------------------------------------------------------------- - - name: "Update watson studio project restore status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" - - always: - # Copy watson studio project restore log file to specified storage location - # ------------------------------------------------------------------------- - - name: "Create a tar.gz archive of watson studio project restore log" - changed_when: true - shell: > - tar -czf {{ masbr_local_job_folder }}/{{ masbr_wsl_restore_name }}-log.tar.gz - -C {{ masbr_local_job_folder }} {{ masbr_wsl_restore_name }}.log - - - name: "Copy watson studio project restore log file to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "restore" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "{{ masbr_local_job_folder }}/{{ masbr_wsl_restore_name }}-log.tar.gz" - dest_folder: "log" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/iot/backup-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/iot/backup-vars.yml deleted file mode 100644 index ebbe463c7b..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/iot/backup-vars.yml +++ /dev/null @@ -1,29 +0,0 @@ ---- -- name: "Set fact: default backup job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" - -- name: "Set fact: namespace backup resources" - set_fact: - masbr_ns_backup_resources: - - namespace: "{{ mas_app_namespace }}" - resources: - - kind: Subscription - name: ibm-mas-iot - - kind: OperatorGroup - name: ibm-iot-operatorgroup - - kind: Secret - name: ibm-entitlement - - kind: Secret - name: "actions-credsenckey" - - kind: Secret - name: "auth-encryption-secret" - - kind: Secret - name: "provision-creds-enckey" - - kind: Secret - name: "auth-edc-user-sync-secret" - # apps.mas.ibm.com - - kind: IoT - - kind: IoTWorkspace diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/iot/restore-namespace.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/iot/restore-namespace.yml deleted file mode 100644 index c65917ad5b..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/iot/restore-namespace.yml +++ /dev/null @@ -1,50 +0,0 @@ ---- -# Apply secret yaml files -# ------------------------------------------------------------------------- -- name: "Set fact: namespace resources to be restored" - set_fact: - 
masbr_ns_restore_resources: - - "Secret-actions-credsenckey.yaml" - - "Secret-auth-encryption-secret.yaml" - - "Secret-provision-creds-enckey.yaml" - - "Secret-auth-edc-user-sync-secret.yaml" - -- name: "Replace mas instance in secret yaml files" - when: masbr_restore_to_diff_instance - changed_when: true - shell: > - yq -i 'with(.metadata; - .namespace="{{ mas_app_namespace }} | - .labels."app.kubernetes.io/instance"="{{ mas_app_namespace }}" | - .labels."mas.ibm.com/instanceId"="{{ mas_app_namespace }}" - )' {{ masbr_ns_restore_folder }}/{{ _ns_resource_file_name }} - loop: "{{ masbr_ns_restore_resources }}" - loop_control: - loop_var: _ns_resource_file_name - -- name: "Apply secret yaml files" - kubernetes.core.k8s: - apply: true - src: "{{ masbr_ns_restore_folder }}/{{ _ns_resource_file_name }}" - loop: "{{ masbr_ns_restore_resources }}" - loop_control: - loop_var: _ns_resource_file_name - - -# Restart pods -# ------------------------------------------------------------------------- -- name: "Delete pods in {{ mas_app_namespace }}" - changed_when: true - shell: >- - oc get pod -n {{ mas_app_namespace }} | grep "{{ _del_pod_name }}" | awk '{print $1}' | - xargs oc delete pod -n {{ mas_app_namespace }} - loop: - - "datapower-datapower" - - "auth-masuseragent" - loop_control: - loop_var: _del_pod_name - register: _del_pods_output - -- name: "Debug: delete pods in {{ mas_app_namespace }}" - debug: - msg: "{{ _del_pods_output | json_query('results[*].stdout_lines') }}" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/iot/restore-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/iot/restore-vars.yml deleted file mode 100644 index cf23734548..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/iot/restore-vars.yml +++ /dev/null @@ -1,6 +0,0 @@ ---- -- name: "Set fact: default restore job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/main.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/main.yml deleted file mode 100644 index 6722ca867a..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/main.yml +++ /dev/null @@ -1,97 +0,0 @@ ---- -# Check mas app backup/restore required variables -# ----------------------------------------------------------------------------- -- name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "mas_instance_id is required" - -- name: "Fail if mas_workspace_id is not provided" - assert: - that: mas_workspace_id is defined and mas_workspace_id != "" - fail_msg: "mas_workspace_id is required" - -- name: "Fail if mas_app_id is not provided" - assert: - that: mas_app_id is defined and mas_app_id != "" - fail_msg: "mas_app_id is required" - -- name: "Fail if masbr_action is not provided" - assert: - that: masbr_action is defined and masbr_action != "" - fail_msg: "masbr_action is required" - -- name: "Set fact: namespace name for {{ mas_app_id }}" - set_fact: - mas_app_namespace: "mas-{{ mas_instance_id }}-{{ mas_app_id }}" - - -# Get mas app information -# ----------------------------------------------------------------------------- -- name: "Get {{ mas_app_id }} information" - include_tasks: "tasks/get-app-info.yml" - - -# Set common job variables -# ----------------------------------------------------------------------------- -- name: "Set fact: common job variables" - set_fact: - masbr_job_component: - name: "{{ mas_app_id }}" - 
instance: "{{ mas_instance_id }}" - workspace: "{{ mas_workspace_id }}" - namespace: "{{ mas_app_namespace }}" - -- name: "Load mas app variables" - include_tasks: "tasks/{{ mas_app_id }}/{{ masbr_action }}-vars.yml" - - -# Before run tasks -# ----------------------------------------------------------------------------- -- name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "{{ masbr_action }}" - - -- name: "Run {{ masbr_action }} tasks" - block: - # Update job status: New - # ------------------------------------------------------------------------- - - name: "Update job status: New" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "1" - phase: "New" - - - # Run backup/restore tasks for each data type - # TODO: check and ignore unsupported data type - # ------------------------------------------------------------------------- - - name: "Run {{ masbr_action }} tasks for each data type" - include_tasks: "{{ _include_tasks_folder }}/{{ masbr_action }}-{{ job_data_item.type }}.yml" - vars: - _include_tasks_folder: >- - {{ role_path }}/{{ 'tasks' if job_data_item.type in ['namespace', 'pv'] else 'tasks/' + mas_app_id }} - masbr_job_data_seq: "{{ job_data_item.seq }}" - masbr_job_data_type: "{{ job_data_item.type }}" - loop: "{{ masbr_job_data_list }}" - loop_control: - loop_var: job_data_item - when: job_data_item.type in supported_job_data_item_types[mas_app_id] - - rescue: - # Update job status: Failed - # ------------------------------------------------------------------------- - - name: "Update job status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_status: - phase: "Failed" - - always: - # After run tasks - # ------------------------------------------------------------------------- - - name: "After run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/after_run_tasks.yml" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/backup-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/backup-vars.yml deleted file mode 100644 index 700e185478..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/backup-vars.yml +++ /dev/null @@ -1,27 +0,0 @@ ---- -- name: "Set fact: default backup job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" - - seq: "2" - type: "pv" - -- name: "Set fact: namespace backup resources" - set_fact: - masbr_ns_backup_resources: - - namespace: "{{ mas_app_namespace }}" - resources: - - kind: Subscription - name: ibm-mas-manage - - kind: OperatorGroup - name: mas-{{ mas_instance_id }}-manage-operator-group - - kind: Secret - name: ibm-entitlement - - kind: Secret - name: "{{ mas_workspace_id }}-manage-encryptionsecret" - - kind: Secret - name: "{{ mas_workspace_id }}-manage-encryptionsecret-operator" - # apps.mas.ibm.com - - kind: ManageApp - - kind: ManageWorkspace diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/pv-info.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/pv-info.yml deleted file mode 100644 index cb5971a048..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/pv-info.yml +++ /dev/null @@ -1,98 +0,0 @@ ---- -# Get workspace pv spec -# ----------------------------------------------------------------------------- - -# ~~ Sample workspace pv spec ~~ -# 
persistentVolumes: -# - accessModes: -# - ReadWriteMany -# mountPath: /mnt/doclinks1 -# pvcName: manage-doclinks1-pvc -# size: '20' -# storageClassName: ocs-storagecluster-cephfs -# volumeName: '' -# - accessModes: -# - ReadWriteMany -# mountPath: /mnt/doclinks2 -# pvcName: manage-doclinks2-pvc -# size: '20' -# storageClassName: ocs-storagecluster-cephfs -# volumeName: '' - -- name: "Set fact: workspace pv spec" - set_fact: - manage_ws_pvs: "{{ _ws_output.resources[0].spec.settings.deployment.persistentVolumes }}" - when: masbr_manage_pvc_paths is defined and masbr_manage_pvc_paths | length > 0 - -- name: "Debug: workspace pv spec" - debug: - msg: "{{ manage_ws_pvs }}" - when: manage_ws_pvs is defined - - -# Only go on processing when manage has pv defined -# ----------------------------------------------------------------------------- -- name: "Only go on processing when manage has pv defined" - when: manage_ws_pvs is defined and manage_ws_pvs | length > 0 - block: - # Get maxinst pod information - - name: "Get maxinst pod information" - kubernetes.core.k8s_info: - kind: Pod - namespace: "{{ mas_app_namespace }}" - label_selectors: - - mas.ibm.com/appType=maxinstudb - - mas.ibm.com/workspaceId={{ mas_workspace_id }} - register: _maxinst_pod_output - failed_when: - - _maxinst_pod_output.resources is not defined - - _maxinst_pod_output.resources | length == 0 - - - name: "Set fact: copy pvc file variables" - set_fact: - masbr_cf_pod_name: "{{ _maxinst_pod_output.resources[0].metadata.name }}" - masbr_cf_container_name: "{{ _maxinst_pod_output.resources[0].spec.containers[0].name }}" - masbr_cf_affinity: false - - - name: "Debug: maxinst pod information" - debug: - msg: - - "maxinst pod name ................... {{ masbr_cf_pod_name }}" - - "maxinst container name ............. 
{{ masbr_cf_container_name }}" - - - name: "Set fact: reset mas_app_pv_list" - set_fact: - mas_app_pv_list: [] - _manage_pvc_paths: [] - - # Set '_manage_pvc_paths' based on 'masbr_manage_pvc_paths' - - name: "Get specified pvc backup/restore paths" - when: masbr_manage_pvc_paths is defined and masbr_manage_pvc_paths | length > 0 - block: - - name: "Set fact: _manage_pvc_paths" - set_fact: - _manage_pvc_paths: >- - {{ _manage_pvc_paths + [{ - 'pvcName': item | split(':') | first | trim, - 'pvcPath': item | split(':') | last | trim - }] }} - loop: "{{ masbr_manage_pvc_paths | split(',') }}" - - - name: "Set fact: mas_app_pv_list" - set_fact: - mas_app_pv_list: >- - {{ mas_app_pv_list + [{ - 'mount_path': manage_ws_pvs | json_query( - '[?pvcName==`' + item.pvcName + '`].mountPath') | first, - 'pvc_name': _maxinst_pod_output.resources[0].spec.volumes | json_query( - '[?name==`' + item.pvcName + '`].persistentVolumeClaim.claimName') | first, - 'backup_paths': [{ - 'src_folder': item.pvcPath, - 'dest_folder': 'pv/' + item.pvcName + item.pvcPath - }], - 'restore_paths': [{ - 'src_folder': 'pv/' + item.pvcName + item.pvcPath, - 'dest_folder': item.pvcPath - }], - }] }} - loop: "{{ _manage_pvc_paths }}" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/restore-namespace.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/restore-namespace.yml deleted file mode 100644 index baadcbc74c..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/restore-namespace.yml +++ /dev/null @@ -1,35 +0,0 @@ ---- -- name: "Set fact: workspace id in the resource file name" - set_fact: - masbr_restore_from_workspace: "{{ masbr_restore_from_yaml.component.workspace }}" - -- name: "Replace mas instance in the resource files" - when: masbr_restore_to_diff_instance - changed_when: true - shell: > - yq -i 'with(.metadata; - .namespace="{{ mas_app_namespace }}" | - .name="{{ mas_workspace_id }}-manage-encryptionsecret" - )' {{ masbr_ns_restore_folder }}/Secret-{{ masbr_restore_from_workspace }}-manage-encryptionsecret.yaml; - - yq -i 'with(.metadata; - .namespace="{{ mas_app_namespace }}" | - .name="{{ mas_workspace_id }}-manage-encryptionsecret-operator" - )' {{ masbr_ns_restore_folder }}/Secret-{{ masbr_restore_from_workspace }}-manage-encryptionsecret-operator.yaml; - -# Restore namespace resoruces -# ------------------------------------------------------------------------- -# Loop through the folder -- name: "Get the list of files from restore directory" - find: - paths: "{{ masbr_ns_restore_folder }}" - patterns: '*.yml,*.yaml' - recurse: no - register: find_result - -- name: "Apply configs" - kubernetes.core.k8s: - state: present - template: "{{ item.path }}" - with_items: "{{ find_result.files }}" - when: find_result is defined diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/restore-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/restore-vars.yml deleted file mode 100644 index d55d2c8d27..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/manage/restore-vars.yml +++ /dev/null @@ -1,8 +0,0 @@ ---- -- name: "Set fact: default restore job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" - - seq: "2" - type: "pv" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/monitor/backup-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/monitor/backup-vars.yml deleted file mode 100644 index 6e697ac281..0000000000 --- 
a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/monitor/backup-vars.yml +++ /dev/null @@ -1,42 +0,0 @@ ---- -- name: "Set fact: default backup job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" - -- name: "Set fact: namespace backup resources" - set_fact: - masbr_ns_backup_resources: - # monitor - - namespace: "{{ mas_app_namespace }}" - resources: - - kind: Subscription - name: ibm-mas-monitor - - kind: OperatorGroup - name: ibm-monitor-operatorgroup - - kind: Secret - name: ibm-entitlement - - kind: Secret - name: "{{ mas_instance_id }}-{{ mas_workspace_id }}-datadictionaryworkspace-workspace-binding" - - kind: Secret - name: "monitor-kitt" - # apps.mas.ibm.com - - kind: MonitorApp - - kind: MonitorWorkspace - # add - - namespace: "mas-{{ mas_instance_id }}-add" - resources: - - kind: Subscription - name: ibm-data-dictionary - - kind: OperatorGroup - name: ibm-dd-group - - kind: Secret - name: ibm-entitlement - - kind: Secret - name: "datadictionary-{{ mas_workspace_id }}" - - kind: Secret - name: "instance-admin" - # apps.mas.ibm.com - - kind: AssetDataDictionary - - kind: DataDictionaryWorkspace diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/monitor/restore-namespace.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/monitor/restore-namespace.yml deleted file mode 100644 index 5e245d6804..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/monitor/restore-namespace.yml +++ /dev/null @@ -1,88 +0,0 @@ ---- -- name: "Set fact: instance and workspace id in the resource file name" - set_fact: - masbr_restore_from_instance: "{{ masbr_restore_from_yaml.component.instance }}" - masbr_restore_from_workspace: "{{ masbr_restore_from_yaml.component.workspace }}" - - -# Restore namespace resources for monitor -# ------------------------------------------------------------------------- -- name: "Replace mas instance in the resource files" - when: masbr_restore_to_diff_instance - changed_when: true - shell: > - yq -i 'with(.metadata; - .namespace="{{ mas_app_namespace }}" | - .name="{{ mas_instance_id }}-{{ mas_workspace_id }}-datadictionaryworkspace-workspace-binding" - )' {{ masbr_ns_restore_folder }}/Secret-{{ masbr_restore_from_instance }}-{{ masbr_restore_from_workspace }}-datadictionaryworkspace-workspace-binding.yaml; - - yq -i 'with(.metadata; - .namespace="{{ mas_app_namespace }}" - )' {{ masbr_ns_restore_folder }}/Secret-monitor-kitt.yaml; - -- name: "Apply secret yaml files" - kubernetes.core.k8s: - apply: true - src: "{{ masbr_ns_restore_folder }}/{{ _ns_resource_file_name }}" - loop: - - "Secret-{{ masbr_restore_from_instance }}-{{ masbr_restore_from_workspace }}-datadictionaryworkspace-workspace-binding.yaml" - - "Secret-monitor-kitt.yaml" - loop_control: - loop_var: _ns_resource_file_name - - -# Restore namespace resources for add -# ------------------------------------------------------------------------- -- name: "Replace mas instance in the resource files" - when: masbr_restore_to_diff_instance - changed_when: true - shell: > - yq -i 'with(.metadata; - .namespace="mas-{{ mas_instance_id }}-add" | - .name="datadictionary-{{ mas_workspace_id }}" - )' {{ masbr_ns_restore_folder }}/Secret-datadictionary-{{ masbr_restore_from_workspace }}.yaml; - - yq -i 'with(.metadata; - .namespace="mas-{{ mas_instance_id }}-add" - )' {{ masbr_ns_restore_folder }}/Secret-instance-admin.yaml; - -- name: "Apply secret yaml files" - kubernetes.core.k8s: - apply: true - src: "{{ masbr_ns_restore_folder }}/{{ 
_ns_resource_file_name }}" - loop: - - "Secret-datadictionary-{{ masbr_restore_from_workspace }}.yaml" - - "Secret-instance-admin.yaml" - loop_control: - loop_var: _ns_resource_file_name - - -# Restart pods -# ------------------------------------------------------------------------- -- name: "Delete pods in mas-{{ mas_instance_id }}-add" - changed_when: true - shell: >- - oc get pod -n mas-{{ mas_instance_id }}-add | grep "{{ _del_pod_name }}" | awk '{print $1}' | - xargs oc delete pod -n mas-{{ mas_instance_id }}-add - loop: - - "user-store" - - "series-store" - - "graph-store" - loop_control: - loop_var: _del_pod_name - register: _del_pods_output - -- name: "Debug: delete pods in {{ mas_app_namespace }}" - debug: - msg: "{{ _del_pods_output | json_query('results[*].stdout_lines') }}" - -- name: "Delete pods in {{ mas_app_namespace }}" - changed_when: true - shell: >- - oc get pod -n mas-{{ mas_instance_id }}-monitor | grep "{{ mas_instance_id }}" | awk '{print $1}' | - xargs oc delete pod -n mas-{{ mas_instance_id }}-monitor - register: _del_pods_output - -- name: "Debug: delete pods in {{ mas_app_namespace }}" - debug: - msg: "{{ _del_pods_output.stdout_lines }}" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/monitor/restore-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/monitor/restore-vars.yml deleted file mode 100644 index cf23734548..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/monitor/restore-vars.yml +++ /dev/null @@ -1,6 +0,0 @@ ---- -- name: "Set fact: default restore job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/optimizer/backup-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/optimizer/backup-vars.yml deleted file mode 100644 index dfa1711ab2..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/optimizer/backup-vars.yml +++ /dev/null @@ -1,19 +0,0 @@ ---- -- name: "Set fact: default backup job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" - -- name: "Set fact: namespace backup resources" - set_fact: - masbr_ns_backup_resources: - - namespace: "{{ mas_app_namespace }}" - resources: - - kind: Subscription - name: ibm-mas-optimizer - - kind: OperatorGroup - name: mas-{{ mas_instance_id }}-optimizer-operator-group - # apps.mas.ibm.com - - kind: OptimizerApp - - kind: OptimizerWorkspace diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/optimizer/restore-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/optimizer/restore-vars.yml deleted file mode 100644 index cf23734548..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/optimizer/restore-vars.yml +++ /dev/null @@ -1,6 +0,0 @@ ---- -- name: "Set fact: default restore job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/restore-namespace.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/restore-namespace.yml deleted file mode 100644 index 92931ceddf..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/restore-namespace.yml +++ /dev/null @@ -1,109 +0,0 @@ ---- -# Update namespace resource restore status: InProgress -# ----------------------------------------------------------------------------- -- name: "Update namespace resource restore status: InProgress" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - 
vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "InProgress" - - -- name: "Restore namespace resources" - block: - # Prepare namespace resource restore folder - # ------------------------------------------------------------------------- - - name: "Set fact: namespace backup file name" - set_fact: - masbr_ns_restore_from_name: "{{ masbr_restore_from }}-{{ masbr_job_data_type }}" - - - name: "Set fact: namespace resource restore folder" - set_fact: - masbr_ns_restore_folder: "{{ masbr_local_job_folder }}/{{ masbr_job_data_type }}/{{ masbr_ns_restore_from_name }}" - masbr_ns_restore_name: "{{ masbr_job_name }}-{{ masbr_job_data_type }}" - - - name: "Set fact: namespace resource restore log" - set_fact: - masbr_ns_restore_log: "{{ masbr_local_job_folder }}/{{ masbr_ns_restore_name }}.log" - - - name: "Create local restore folder for saving namespace resoruces" - changed_when: true - shell: > - mkdir -p {{ masbr_ns_restore_folder }} && - touch {{ masbr_ns_restore_log }} - - - name: "Debug: namespace resource restore folder" - debug: - msg: "Namespace resource restore folder ........ {{ masbr_ns_restore_folder }}" - - - # Copy backup file from specified storage location - # ------------------------------------------------------------------------- - - name: "Copy backup file from specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_storage_files_to_local.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_restore_from }}" - masbr_cf_paths: - - src_file: "{{ masbr_job_data_type }}/{{ masbr_ns_restore_from_name }}.tar.gz" - dest_folder: "{{ masbr_job_data_type }}" - - - # Extract the tar.gz file - # ------------------------------------------------------------------------- - - name: "Extract the tar.gz file" - changed_when: true - shell: > - tar -xzf {{ masbr_local_job_folder }}/{{ masbr_job_data_type }}/{{ masbr_ns_restore_from_name }}.tar.gz - -C {{ masbr_ns_restore_folder }} && - ls -lA {{ masbr_ns_restore_folder }} - register: _extract_output - - - name: "Debug: list extracted files" - debug: - msg: "{{ _extract_output.stdout_lines }}" - - - # Restore namespace resoruces - # ------------------------------------------------------------------------- - - name: "Restore namespace resoruces" - when: mas_app_id in ['manage', 'iot', 'monitor'] - include_tasks: "tasks/{{ mas_app_id }}/restore-namespace.yml" - - - # Update database restore status: Completed - # ------------------------------------------------------------------------- - - name: "Update database restore status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update database restore status: Failed - # ------------------------------------------------------------------------- - - name: "Update database restore status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" - - always: - # Copy namespace resource restore log file to specified storage location - # ------------------------------------------------------------------------- - - name: "Create a tar.gz archive of namespace resource restore log" - changed_when: true - shell: > - tar -czf {{ masbr_local_job_folder }}/{{ masbr_ns_restore_name }}-log.tar.gz - -C {{ masbr_local_job_folder }} {{ masbr_ns_restore_name }}.log - - - 
name: "Copy namespace resource restore log file to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "restore" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "{{ masbr_local_job_folder }}/{{ masbr_ns_restore_name }}-log.tar.gz" - dest_folder: "log" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/restore-pv.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/restore-pv.yml deleted file mode 100644 index c73928580a..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/restore-pv.yml +++ /dev/null @@ -1,85 +0,0 @@ ---- -# Update pv data restore status: InProgress -# ----------------------------------------------------------------------------- -- name: "Update pv data restore status: InProgress" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "InProgress" - - -# Get app pv information -# ----------------------------------------------------------------------------- -- name: "Set fact: mas_app_pv_list" - set_fact: - mas_app_pv_list: [] - -- name: "Get {{ mas_app_id }} pv information" - when: mas_app_id in ['manage', 'visualinspection'] - include_tasks: "tasks/{{ mas_app_id }}/pv-info.yml" - -- name: "Debug: {{ mas_app_id }} pv information" - debug: - msg: "{{ mas_app_pv_list }}" - - -# Not found pv need to be restored, skip this task. -# (TODO: should set the status to 'Skip') -# ------------------------------------------------------------------------- -- name: "Update pv data restore status: Completed" - when: mas_app_pv_list | length == 0 - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - -- name: "Restore pv data" - when: mas_app_pv_list | length > 0 - block: - # Copy pv data from specified storage location - # ------------------------------------------------------------------------- - - name: "Set fact: copy file variables" - set_fact: - masbr_cf_job_type: "backup" - masbr_cf_namespace: "{{ mas_app_namespace }}" - masbr_cf_job_name: "{{ masbr_restore_from }}" - masbr_cf_are_pvc_paths: true - - - name: "Set fact: restore from an incremental backup" - when: masbr_restore_from_incr - set_fact: - masbr_cf_from_job_name: "{{ masbr_restore_basedon }}" - - - name: "Copy pv data from specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_storage_files_to_pod.yml" - vars: - masbr_cf_pvc_name: "{{ mas_app_pv_item.pvc_name }}" - masbr_cf_pvc_mount_path: "{{ mas_app_pv_item.mount_path }}" - masbr_cf_pvc_sub_path: "{{ mas_app_pv_item.sub_path | default('') }}" - masbr_cf_paths: "{{ mas_app_pv_item.restore_paths }}" - loop: "{{ mas_app_pv_list }}" - loop_control: - loop_var: mas_app_pv_item - - - # Update pv data restore status: Completed - # ------------------------------------------------------------------------- - - name: "Update pv data restore status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update pv data restore status: Failed - # ------------------------------------------------------------------------- - - name: "Update pv data restore status: Failed" - include_tasks: "{{ 
role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/visualinspection/backup-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/visualinspection/backup-vars.yml deleted file mode 100644 index 307e9f0d6c..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/visualinspection/backup-vars.yml +++ /dev/null @@ -1,27 +0,0 @@ ---- -- name: "Set fact: default backup job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" - - seq: "2" - type: "pv" - -- name: "Set fact: namespace backup resources" - set_fact: - masbr_ns_backup_resources: - - namespace: "{{ mas_app_namespace }}" - resources: - - kind: Subscription - name: ibm-mas-visualinspection - - kind: OperatorGroup - name: mas-{{ mas_instance_id }}-visualinspection-operator-group - - kind: Secret - name: ibm-entitlement - # https://www.ibm.com/docs/en/mas-cd/maximo-vi/continuous-delivery?topic=managing-workload-scale-customization - - kind: ConfigMap - keywords: ^custom-.*-config$ - - kind: HorizontalPodAutoscaler - # apps.mas.ibm.com - - kind: VisualInspectionApp - - kind: VisualInspectionAppWorkspace diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/visualinspection/pv-info.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/visualinspection/pv-info.yml deleted file mode 100644 index 32f266ff76..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/visualinspection/pv-info.yml +++ /dev/null @@ -1,41 +0,0 @@ ---- -# Check ui pod information -# ------------------------------------------------------------------------- -- name: "Get ui pod information" - kubernetes.core.k8s_info: - kind: Pod - namespace: "{{ mas_app_namespace }}" - label_selectors: - - app.kubernetes.io/name=ui - register: _ui_pod_output - failed_when: - - _ui_pod_output.resources is not defined - - _ui_pod_output.resources | length == 0 - -- name: "Set fact: copy pvc file variables" - set_fact: - masbr_cf_pod_name: "{{ _ui_pod_output.resources[0].metadata.name }}" - masbr_cf_container_name: "{{ _ui_pod_output.resources[0].spec.containers[0].name }}" - masbr_cf_affinity: false - -- name: "Debug: ui pod information" - debug: - msg: - - "ui pod name ................... {{ masbr_cf_pod_name }}" - - "ui container name ............. 
{{ masbr_cf_container_name }}" - - -# Set pv information variables -# ------------------------------------------------------------------------- -- name: "Set fact: mas_app_pv_list" - set_fact: - mas_app_pv_list: - - mount_path: "/opt/powerai-vision/data" - sub_path: "data" - pvc_name: "{{ mas_instance_id }}-data-pvc" - backup_paths: - - src_folder: "/opt/powerai-vision/data" - dest_folder: "pv" - restore_paths: - - src_folder: "pv" - dest_folder: "/opt/powerai-vision/data" diff --git a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/visualinspection/restore-vars.yml b/ibm/mas_devops/roles/suite_app_backup_restore/tasks/visualinspection/restore-vars.yml deleted file mode 100644 index d55d2c8d27..0000000000 --- a/ibm/mas_devops/roles/suite_app_backup_restore/tasks/visualinspection/restore-vars.yml +++ /dev/null @@ -1,8 +0,0 @@ ---- -- name: "Set fact: default restore job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" - - seq: "2" - type: "pv" diff --git a/ibm/mas_devops/roles/suite_app_restore/README.md b/ibm/mas_devops/roles/suite_app_restore/README.md new file mode 100644 index 0000000000..55af8f2fea --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_restore/README.md @@ -0,0 +1,381 @@ +Restore MAS Applications +=============================================================================== + +Overview +------------------------------------------------------------------------------- +This role supports restoring MAS application resources and data from backups created by the `suite_app_backup` role. Currently supported applications: + +- **`manage`**: Restores Manage namespace resources (CRs, secrets, subscriptions) and persistent volume data + +Future support planned for: `iot`, `monitor`, `health`, `optimizer`, `visualinspection` + +The restore process follows these phases: +1. **Phase 1**: Restore Kubernetes resources like Project, Secrets, Configmaps, Subscription, Certificates and ManageApp CR (not ManageWorkspace CR) +2. **Phase 2**: Check if ManageWorkspace CR is already available +3. **Phase 3**: Restore persistent volume data + - If ManageWorkspace exists: Scale it down, restore data, then continue + - If ManageWorkspace doesn't exist: Create PVCs, create dummy pod, restore data, delete dummy pod +4. **Phase 4**: Restore ManageWorkspace CR +5. **Phase 5**: Wait for Manage deployment to be activated + +!!! important + - An application backup can only be restored to an instance with the same MAS instance ID + - This role restores application resources and PV data only. Database restores must be performed separately using the appropriate database restore role + - For Manage, see the [db2](db2.md) role for database restore + - The restore process is designed to be idempotent and can handle both fresh installations and existing deployments + + +Role Variables - General +------------------------------------------------------------------------------- +### mas_app_id +Defines the MAS application ID for the restore action. + +- **Required** +- Environment Variable: `MAS_APP_ID` +- Default: None +- Valid Values: `manage` (currently supported) + +### mas_instance_id +Defines the MAS instance ID for the restore action. Must match the instance ID from the backup. + +- **Required** +- Environment Variable: `MAS_INSTANCE_ID` +- Default: None + +### mas_workspace_id +Defines the MAS workspace ID for the restore action. Must match the workspace ID from the backup. 
+
+- **Required**
+- Environment Variable: `MAS_WORKSPACE_ID`
+- Default: None
+
+### mas_backup_dir
+Defines the directory where backups are stored. The role will look for the backup version subdirectory within this location.
+
+- **Required**
+- Environment Variable: `MAS_BACKUP_DIR`
+- Default: None
+- Example: `/backup/mas`
+
+### mas_app_backup_version
+Specifies which backup version to restore. This should match the version identifier used during backup.
+
+- **Required**
+- Environment Variable: `MAS_APP_BACKUP_VERSION`
+- Default: None
+- Example: `20240315-143022` or `v1.0-prod`
+
+### mas_app_restore_wait_retries
+Maximum number of retries (status checks) when waiting for ManageWorkspace to become ready after restore.
+
+- Optional
+- Environment Variable: `MAS_APP_RESTORE_WAIT_RETRIES`
+- Default: `120`
+
+### mas_app_restore_wait_delay
+Delay in seconds between status checks when waiting for ManageWorkspace to become ready.
+
+- Optional
+- Environment Variable: `MAS_APP_RESTORE_WAIT_DELAY`
+- Default: `360`
+
+### helper_pod_image
+Name of the image used by the PVC-restore-helper pod. This pod is deployed in the app's namespace to mount the PVC volumes and extract the PVC backup to the mounted path.
+The image must have `tar` installed for `oc cp` to work.
+
+- Optional
+- Environment Variable: `HELPER_POD_IMAGE`
+- Default: `registry.redhat.io/ubi9/ubi:latest`
+
+### override_storageclass
+Enable or disable storage class override during restore. When enabled, the restore process will use custom storage classes instead of the storage classes from the backup.
+
+- Optional
+- Environment Variable: `OVERRIDE_STORAGECLASS`
+- Default: `false`
+- Valid Values: `true`, `false`
+
+### mas_app_custom_storage_class_rwx
+Custom storage class to use for PVCs with ReadWriteMany (RWX) access mode when `override_storageclass` is enabled. If not provided and override is enabled, the default storage class will be used.
+
+- Optional
+- Environment Variable: `MAS_APP_CUSTOM_STORAGE_CLASS_RWX`
+- Default: Empty (uses default storage class)
+- Example: `ocs-storagecluster-cephfs`
+
+### mas_app_custom_storage_class_rwo
+Custom storage class to use for PVCs with ReadWriteOnce (RWO) access mode when `override_storageclass` is enabled. If not provided and override is enabled, the default storage class will be used.
+ +- Optional +- Environment Variable: `MAS_APP_CUSTOM_STORAGE_CLASS_RWO` +- Default: Empty (uses default storage class) +- Example: `ocs-storagecluster-ceph-rbd` + + +What Gets Restored +------------------------------------------------------------------------------- +### Manage Application +When restoring the Manage application, the following resources are restored: + +**Namespace Resources** (Phase 1 & 4): +- `Project` (namespace) +- Encryption secrets +- Certificates with `mas.ibm.com/instanceId` label +- IBM entitlement secret +- All referenced secrets +- Subscription and OperatorGroup +- `ManageApp` CR (Phase 1) +- `ManageWorkspace` CR (Phase 4) + +**Persistent Volume Data** (Phase 3): +- All persistent volumes defined in `spec.settings.deployment.persistentVolumes` +- Data is restored from compressed tar.gz archives +- Each PVC's mount path is restored separately +- Archives are read from the `data` subdirectory + +**NOT Restored** (must be restored separately): +- Manage database (Db2) - use the [db2](db2.md) role +- Suite-level resources - use the [suite_restore](suite_restore.md) role + + +How Persistent Volume Restore Works +------------------------------------------------------------------------------- +The role intelligently handles PV restoration based on whether ManageWorkspace CR is already deployed: + +### Scenario 1: ManageWorkspace CR Does Not Exist (Fresh Restore) +1. **Read Configuration**: Extracts PVC configuration from ManageWorkspace CR backup +2. **Create PVCs**: Creates all PVCs defined in the backup +3. **Create Dummy Pod**: Creates a temporary pod that mounts all PVCs +4. **Restore Data**: Copies tar.gz archives to the pod and extracts them to mount paths +5. **Cleanup**: Deletes the dummy pod +6. **Deploy CR**: Restores the ManageWorkspace CR +7. **Wait**: Waits for Manage deployment to be activated + +### Scenario 2: ManageWorkspace CR Already Exists (Re-restore/Update) +1. **Scale Down**: Sets ManageWorkspace `spec.settings.deployment.mode` to `down` +2. **Wait**: Waits for workspace to scale down +3. **Find Pod**: Locates UI, ALL, or maxinst pod for data access +4. **Restore Data**: Copies tar.gz archives to the pod and extracts them to mount paths +5. **Update CR**: Restores the ManageWorkspace CR (which will scale back up) +6. 
**Wait**: Waits for Manage deployment to be activated
+
+### Dummy Pod Specification
+When a dummy pod is created for restoration, it uses:
+- **Image**: the `helper_pod_image` image (default: `registry.redhat.io/ubi9/ubi:latest`)
+- **Command**: `sleep infinity` (keeps pod running)
+- **Volumes**: All PVCs from ManageWorkspace CR configuration
+- **Labels**: Tagged with instance ID and workspace ID for easy identification
+
+
+Restore Process Phases
+-------------------------------------------------------------------------------
+
+### Phase 1: Restore Resources Until ManageApp CR
+- Restores all namespace resources except ManageWorkspace CR
+- Includes: ManageApp, secrets, certificates, subscriptions, operator groups
+- Waits for ManageApp CR to become ready
+- Auto-discovers and restores referenced secrets
+
+### Phase 2: Check ManageWorkspace Status
+- Checks if ManageWorkspace CR already exists
+- If it exists: Sets `spec.settings.deployment.mode` to `down` and waits for scale down
+- If it does not exist: Proceeds to create PVCs and dummy pod
+
+### Phase 3: Restore Persistent Volume Data
+- Reads PVC configuration from ManageWorkspace CR backup
+- Creates PVCs if needed (when ManageWorkspace doesn't exist)
+- Creates dummy pod or uses existing server bundle pod
+- Restores data from tar.gz archives to each PVC
+- Cleans up dummy pod if created
+
+### Phase 4: Restore ManageWorkspace CR
+- Restores the ManageWorkspace CR
+- This triggers the Manage deployment to start/restart
+
+### Phase 5: Wait for Deployment Activation
+- Monitors ManageWorkspace CR status
+- Waits for Ready condition to be True
+- Configurable timeout and delay between checks
+
+
+Expected Backup Directory Structure
+-------------------------------------------------------------------------------
+The role expects the backup directory to have the following structure:
+
+```
+<mas_backup_dir>/
+└── backup-<backup-version>-app-manage/
+    ├── resources/
+    │   ├── projects
+    │   │   └── mas-<instance-id>-manage.yaml
+    │   ├── secrets
+    │   │   ├── <secret-name>.yaml
+    │   │   └── <secret-name>.yaml
+    │   ├── configmaps
+    │   │   ├── <configmap-name>.yaml
+    │   │   └── <configmap-name>.yaml
+    │   ├── subscriptions
+    │   │   └── <subscription-name>.yaml
+    │   └── ... (other resources)
+    └── data/
+        ├── <pvc-name>.tar.gz
+        └── <pvc-name>.tar.gz
+```
+
+
+Example Playbooks
+-------------------------------------------------------------------------------
+
+### Basic Restore
+Restore Manage namespace resources and persistent volumes from a backup:
+
+```yaml
+- hosts: localhost
+  any_errors_fatal: true
+  vars:
+    mas_instance_id: inst1
+    mas_workspace_id: ws1
+    mas_app_id: manage
+    mas_backup_dir: /backup/mas
+    mas_app_backup_version: "20240315-143022"
+  roles:
+    - ibm.mas_devops.suite_app_restore
+```
+
+### Restore with Custom Timeout
+Restore with a custom wait timeout for large deployments:
+
+```yaml
+- hosts: localhost
+  any_errors_fatal: true
+  vars:
+    mas_instance_id: inst1
+    mas_workspace_id: ws1
+    mas_app_id: manage
+    mas_backup_dir: /backup/mas
+    mas_app_backup_version: "prod-backup-20240315"
+    mas_app_restore_wait_retries: 180
+    mas_app_restore_wait_delay: 360 # Check every 6 minutes
+  roles:
+    - ibm.mas_devops.suite_app_restore
+```
+
+### Complete Restore Workflow
+Complete workflow including database restore:
+
+```yaml
+- hosts: localhost
+  any_errors_fatal: true
+  vars:
+    mas_instance_id: inst1
+    mas_workspace_id: ws1
+    mas_backup_dir: /backup/mas
+    backup_version: "20240315-143022"
+
+  tasks:
+    # 1. Restore Db2 database first
+    - name: "Restore Manage database"
+      include_role:
+        name: ibm.mas_devops.db2
+      vars:
+        db2_action: restore
+        db2_backup_version: "{{ backup_version }}"
+
+    # 2. Restore Manage application
+    - name: "Restore Manage application"
+      include_role:
+        name: ibm.mas_devops.suite_app_restore
+      vars:
+        mas_app_id: manage
+        mas_app_backup_version: "{{ backup_version }}"
+```
+
+### Restore with Storage Class Override
+Restore to a different cluster with different storage classes:
+
+```yaml
+- hosts: localhost
+  any_errors_fatal: true
+  vars:
+    mas_instance_id: inst1
+    mas_workspace_id: ws1
+    mas_app_id: manage
+    mas_backup_dir: /backup/mas
+    mas_app_backup_version: "20240315-143022"
+    # Enable storage class override
+    override_storageclass: true
+    # Specify custom storage classes
+    mas_app_custom_storage_class_rwo: "ocs-storagecluster-ceph-rbd"
+    mas_app_custom_storage_class_rwx: "ocs-storagecluster-cephfs"
+  roles:
+    - ibm.mas_devops.suite_app_restore
+```
+
+### Restore with Storage Class Override Using Default Classes
+Restore with override enabled but using the cluster's default storage classes:
+
+```yaml
+- hosts: localhost
+  any_errors_fatal: true
+  vars:
+    mas_instance_id: inst1
+    mas_workspace_id: ws1
+    mas_app_id: manage
+    mas_backup_dir: /backup/mas
+    mas_app_backup_version: "20240315-143022"
+    # Enable storage class override without specifying custom classes
+    # Will automatically use the cluster's default storage class
+    override_storageclass: true
+  roles:
+    - ibm.mas_devops.suite_app_restore
+```
+
+
+Troubleshooting
+-------------------------------------------------------------------------------
+
+### Restore Fails at Phase 1
+**Issue**: Resources fail to restore in Phase 1
+**Solution**:
+- Check that the backup directory exists and contains the expected files
+- Verify namespace exists: `oc get namespace mas-<instance-id>-manage`
+- Check for conflicting resources: `oc get manageapp,subscription -n mas-<instance-id>-manage`
+
+### Dummy Pod Fails to Start
+**Issue**: Dummy pod remains in Pending state
+**Solution**:
+- Check PVC status: `oc get pvc -n mas-<instance-id>-manage`
+- Verify storage class exists and can provision volumes
+- Check pod events: `oc describe pod -n mas-<instance-id>-manage`
+
+### PV Data Restore Fails
+**Issue**: Data extraction fails in the pod
+**Solution**:
+- Verify tar.gz archives are not corrupted
+- Check pod has sufficient disk space
+- Verify mount paths are accessible in the pod
+
+### ManageWorkspace Never Becomes Ready
+**Issue**: Phase 5 times out waiting for ManageWorkspace
+**Solution**:
+- Check ManageWorkspace status: `oc describe manageworkspace -n mas-<instance-id>-manage`
+- Verify database is accessible and restored
+- Check pod logs: `oc logs -l mas.ibm.com/appType=serverBundle -n mas-<instance-id>-manage`
+- Increase `mas_app_restore_wait_retries` and `mas_app_restore_wait_delay` values if deployment is slow
+
+
+Notes
+-------------------------------------------------------------------------------
+- **Database Restore**: This role does NOT restore the Manage database. Use the [db2](db2.md) role to restore Db2 databases separately, and do this BEFORE running the application restore
+- **Suite Resources**: This role restores application-specific resources only.
For suite-level resources (Suite CR, workspace CRs, etc.), use the [suite_restore](suite_restore.md) role +- **Instance ID Match**: The restore must be performed on a cluster with the same MAS instance ID as the backup +- **Idempotent**: The restore process is idempotent and can be run multiple times +- **Dummy Pod**: When ManageWorkspace doesn't exist, a temporary pod is created for data restoration and automatically cleaned up +- **Scale Down**: When ManageWorkspace exists, it's automatically scaled down before data restoration +- **Automatic Detection**: Persistent volumes are automatically detected from the ManageWorkspace CR backup +- **Compression**: All PV data is stored as compressed tar.gz archives +- **Wait Time**: The default wait timeout is 1 hour, but this can be adjusted based on deployment size +- **Prerequisites**: Ensure the MAS operator and Manage operator are installed before running restore +- **Storage Class Override**: When restoring to a different cluster with different storage classes, enable `override_storageclass` to automatically map PVCs to appropriate storage classes based on access modes (RWX/RWO) +- **Default Storage Classes**: If `override_storageclass` is enabled but custom storage classes are not specified, the cluster's default storage class will be used automatically +- **Access Mode Mapping**: The role intelligently assigns storage classes based on PVC access modes - RWX (ReadWriteMany) uses `mas_app_custom_storage_class_rwx` and RWO (ReadWriteOnce) uses `mas_app_custom_storage_class_rwo` \ No newline at end of file diff --git a/ibm/mas_devops/roles/suite_app_restore/defaults/main.yml b/ibm/mas_devops/roles/suite_app_restore/defaults/main.yml new file mode 100644 index 0000000000..d831e186f2 --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_restore/defaults/main.yml @@ -0,0 +1,33 @@ +--- +# MAS Application Restore - Default Variables +# ============================================================================= + +# General Configuration +# ----------------------------------------------------------------------------- +mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" +mas_workspace_id: "{{ lookup('env', 'MAS_WORKSPACE_ID') }}" +mas_app_id: "{{ lookup('env', 'MAS_APP_ID') }}" + +# Restore Configuration +# ----------------------------------------------------------------------------- +# Directory where backups are stored +mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" + +# Backup version to restore (e.g., "20240315-143022" or "v1.0-prod") +mas_app_backup_version: "{{ lookup('env', 'MAS_APP_BACKUP_VERSION') }}" + +# PVC helper pod configuration +helper_pod_image: "{{ lookup('env', 'HELPER_POD_IMAGE') | default('registry.redhat.io/ubi9/ubi:latest', true) }}" + +# Storage Class Override Options +# ----------------------------------------------------------------------------- +# Override storage class from backup with custom storage classes +override_storageclass: "{{ lookup('env', 'OVERRIDE_STORAGECLASS') | default('false', true) | bool }}" + +# Custom storage class for ReadWriteMany (RWX) access mode +mas_app_custom_storage_class_rwx: "{{ lookup('env', 'MAS_APP_CUSTOM_STORAGE_CLASS_RWX') | default('', true) }}" + +# Custom storage class for ReadWriteOnce (RWO) access mode +mas_app_custom_storage_class_rwo: "{{ lookup('env', 'MAS_APP_CUSTOM_STORAGE_CLASS_RWO') | default('', true) }}" + +_manage_persistent_volumes: "NO_OVERRIDE" diff --git a/ibm/mas_devops/roles/suite_app_restore/meta/main.yml 
b/ibm/mas_devops/roles/suite_app_restore/meta/main.yml new file mode 100644 index 0000000000..ecd01a67cc --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_restore/meta/main.yml @@ -0,0 +1,19 @@ +--- +galaxy_info: + author: IBM + description: Restore MAS application resources and data + company: IBM + license: EPL-2.0 + min_ansible_version: "2.10" + platforms: + - name: EL + versions: + - "8" + galaxy_tags: + - ibm + - mas + - maximo + - restore + - backup + +dependencies: [] diff --git a/ibm/mas_devops/roles/suite_app_restore/tasks/main.yml b/ibm/mas_devops/roles/suite_app_restore/tasks/main.yml new file mode 100644 index 0000000000..72e7c15d94 --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_restore/tasks/main.yml @@ -0,0 +1,79 @@ +--- +# MAS Application Restore Role +# ============================================================================= +# This role restores MAS application resources and data. +# Currently supports: manage +# +# The restore includes: +# - Application namespace resources (CRs, secrets, subscriptions) +# - Persistent volume data (if configured) + +# 1. Validate required variables +# ----------------------------------------------------------------------------- +- name: "Fail if mas_instance_id is not provided" + assert: + that: mas_instance_id is defined and mas_instance_id != "" + fail_msg: "mas_instance_id is required" + +- name: "Fail if mas_app_id is not provided" + assert: + that: mas_app_id is defined and mas_app_id != "" + fail_msg: "mas_app_id is required" + +- name: "Fail if mas_backup_dir is not provided" + assert: + that: mas_backup_dir is defined and mas_backup_dir != "" + fail_msg: "mas_backup_dir is required" + +- name: "Fail if mas_app_backup_version is not provided" + assert: + that: mas_app_backup_version is defined and mas_app_backup_version != "" + fail_msg: "mas_app_backup_version is required" + +# 2. Load var files +# ----------------------------------------------------------------------------- +- name: Load mas_appws variables + include_vars: "vars/{{ mas_app_id }}.yml" + +# 3. Display restore configuration +# ----------------------------------------------------------------------------- +- name: "Display restore configuration" + debug: + msg: + - "MAS Instance ID: {{ mas_instance_id }}" + - "MAS App ID: {{ mas_app_id }}" + - "Backup Directory: {{ mas_backup_dir }}" + - "Backup Version: {{ mas_app_backup_version }}" + +# 4. 
Route to app-specific restore tasks +# ----------------------------------------------------------------------------- +- name: "Execute restore for {{ mas_app_id }}" + block: + # Manage restore + # --------------------------------------------------------------------------- + - name: "Restore Manage application" + when: mas_app_id == "manage" + block: + - name: "Restore Manage namespace resources" + include_tasks: "{{ role_path }}/tasks/manage/restore-namespace.yml" + + - name: "Restore Manage persistent volumes" + include_tasks: "{{ role_path }}/tasks/manage/restore-pv.yml" + + - name: "Manage restore completed successfully" + debug: + msg: + - "==========================================" + - "Manage Restore Completed Successfully" + - "==========================================" + - "Instance ID: {{ mas_instance_id }}" + - "Backup Version: {{ mas_app_backup_version }}" + - "Restored from: {{ manage_backup_path }}" + - "==========================================" + + # Unsupported app + # --------------------------------------------------------------------------- + - name: "Fail if app is not supported" + fail: + msg: "Application '{{ mas_app_id }}' is not yet supported for restore. Currently supported: manage" + when: mas_app_id not in ['manage'] diff --git a/ibm/mas_devops/roles/suite_app_restore/tasks/manage/restore-namespace.yml b/ibm/mas_devops/roles/suite_app_restore/tasks/manage/restore-namespace.yml new file mode 100644 index 0000000000..a5d15c44de --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_restore/tasks/manage/restore-namespace.yml @@ -0,0 +1,375 @@ +--- +# Restore Manage Namespace Resources +# ============================================================================= +# This task restores Manage namespace resources following these steps: +# 1. Restore kubernetes resources until ManageApp CR +# 2. Check if ManageWorkspace CR is deployed +# 3. If deployed, change spec.mode to down +# 4. Continue with ManageWorkspace CR restore +# 5. Wait for Manage deployment to be activated + +# 1. Verify required variables +# ----------------------------------------------------------------------------- +- name: "Verify required variables for Manage restore" + ibm.mas_devops.verify_backup_restore_vars: + component: manage + action: restore + mas_instance_id: "{{ mas_instance_id }}" + mas_backup_dir: "{{ mas_backup_dir }}" + mas_app_backup_version: "{{ mas_app_backup_version }}" + +# 2. Set backup path +# ----------------------------------------------------------------------------- +- name: "Set fact: Manage backup base directory path" + set_fact: + manage_backup_path: "{{ mas_backup_dir }}/backup-{{ mas_app_backup_version }}-app-manage" + manage_resources_backup_path: "{{ mas_backup_dir }}/backup-{{ mas_app_backup_version }}-app-manage/resources" + +- name: "Verify backup directory exists" + stat: + path: "{{ manage_resources_backup_path }}" + register: backup_dir_stat + +- name: "Fail if backup directory does not exist" + fail: + msg: "Backup directory not found: {{ manage_resources_backup_path }}" + when: not backup_dir_stat.stat.exists or not backup_dir_stat.stat.isdir + +# 2.1. Verify cert-manager exists +# ----------------------------------------------------------------------------- +- name: Detect Certificate Manager installation + include_tasks: "{{ role_path }}/../../common_tasks/detect_cert_manager.yml" + +# 3. 
Set Manage namespace and workspace CR name +# ----------------------------------------------------------------------------- +- name: "Set fact: Manage namespace" + set_fact: + mas_app_namespace: "mas-{{ mas_instance_id }}-manage" + +- name: Get files from {{ manage_resources_backup_path }}/manageworkspaces directory + set_fact: + instance_files: "{{ lookup('fileglob', '{{ manage_resources_backup_path }}/manageworkspaces/*', wantlist=True) }}" + +- name: Assert exactly one ManageWorkspace CR exists + assert: + that: + - instance_files | length == 1 + fail_msg: "ManageWorkspace Directory must contain exactly one file" + +- name: Set fact ManageWorkspace cr + set_fact: + workspace_backup_cr: "{{ lookup('file', '{{ instance_files[0] }}') | from_yaml }}" + +- name: "Set fact: Manage workspace CR name" + set_fact: + manage_workspace_cr_name: "{{ workspace_backup_cr.metadata.name}}" + +# 4. Restore resources until ManageApp CR (excluding ManageWorkspace) +# ----------------------------------------------------------------------------- +- name: "Display restore phase 1 information" + debug: + msg: + - "==========================================" + - "Phase 1: Restore resources Project, Secrets, Configmaps, Subscription, Certificates and ManageApp CR" + - "==========================================" + - "Namespace: {{ mas_app_namespace }}" + - "Backup path: {{ manage_backup_path }}" + +# Step 1: Restore Projects first +- name: "Restore Projects" + ibm.mas_devops.restore_resource: + backup_path: "{{ manage_backup_path }}" + resource_kinds: + - Project + replace_resource: false + register: projects_result + +- name: "Display Projects restore results" + debug: + msg: >- + Projects: {{ projects_result.created_count }} created, + {{ projects_result.updated_count }} updated, + {{ projects_result.skipped_count }} skipped, + {{ projects_result.failed_count }} failed + +# Step 2: Restore Secrets and ConfigMaps +- name: "Restore Secrets and ConfigMaps" + ibm.mas_devops.restore_resource: + backup_path: "{{ manage_backup_path }}" + resource_kinds: + - Secret + - ConfigMap + register: secrets_configmaps_result + when: projects_result.success + +- name: "Display Secrets and ConfigMaps restore results" + debug: + msg: >- + Secrets and ConfigMaps: {{ secrets_configmaps_result.created_count }} created, + {{ secrets_configmaps_result.updated_count }} updated, + {{ secrets_configmaps_result.skipped_count }} skipped, + {{ secrets_configmaps_result.failed_count }} failed + when: projects_result.success + +# Step 3: Restore OperatorGroups and Subscriptions +- name: "Restore OperatorGroups" + ibm.mas_devops.restore_resource: + backup_path: "{{ manage_backup_path }}" + resource_kinds: + - OperatorGroup + register: operatorgroups_result + when: projects_result.success + +- name: "Display OperatorGroups restore results" + debug: + msg: >- + OperatorGroups: {{ operatorgroups_result.created_count }} created, + {{ operatorgroups_result.updated_count }} updated, + {{ operatorgroups_result.skipped_count }} skipped, + {{ operatorgroups_result.failed_count }} failed + when: projects_result.success + +- name: "Restore Subscriptions" + ibm.mas_devops.restore_resource: + backup_path: "{{ manage_backup_path }}" + resource_kinds: + - Subscription + register: subscriptions_result + when: projects_result.success + +- name: "Display Subscriptions restore results" + debug: + msg: >- + Subscriptions: {{ subscriptions_result.created_count }} created, + {{ subscriptions_result.updated_count }} updated, + {{ subscriptions_result.skipped_count 
}} skipped, + {{ subscriptions_result.failed_count }} failed + when: projects_result.success + +# Wait until the LicenseService CRD is available +# ----------------------------------------------------------------------------- +- name: "Wait until the ManageApps CRD is available" + include_tasks: "{{ role_path }}/../../common_tasks/wait_for_crd.yml" + vars: + crd_name: "manageapps.apps.mas.ibm.com" + +# Step 4: Restore Certificate resources +- name: "Restore Certificate resources" + ibm.mas_devops.restore_resource: + backup_path: "{{ manage_backup_path }}" + resource_kinds: + - Certificate + register: certmanager_result + when: projects_result.success + +- name: "Display Certificates restore results" + debug: + msg: >- + Certificate Manager resources: {{ certmanager_result.created_count }} created, + {{ certmanager_result.updated_count }} updated, + {{ certmanager_result.skipped_count }} skipped, + {{ certmanager_result.failed_count }} failed + when: projects_result.success + +- name: "Restore ManageApp CR" + ibm.mas_devops.restore_resource: + backup_path: "{{ manage_backup_path }}" + resource_kinds: + - ManageApp + register: cr_result + +- name: "Display ManageApp CR restore results" + debug: + msg: >- + Manage App resource: {{ cr_result.created_count }} created, + {{ cr_result.updated_count }} updated, + {{ cr_result.skipped_count }} skipped, + {{ cr_result.failed_count }} failed + +# Calculate Phase 1 results +# ----------------------------------------------------------------------------- +- name: "Calculate Phase 1 restore results" + set_fact: + total_created: >- + {{ + (projects_result.created_count | default(0)) + + (secrets_configmaps_result.created_count | default(0)) + + (operatorgroups_result.created_count | default(0)) + + (subscriptions_result.created_count | default(0)) + + (certmanager_result.created_count | default(0)) + + (cr_result.created_count | default(0)) + }} + total_updated: >- + {{ + (projects_result.updated_count | default(0)) + + (secrets_configmaps_result.updated_count | default(0)) + + (operatorgroups_result.updated_count | default(0)) + + (subscriptions_result.updated_count | default(0)) + + (certmanager_result.updated_count | default(0)) + + (cr_result.updated_count | default(0)) + }} + total_skipped: >- + {{ + (projects_result.skipped_count | default(0)) + + (secrets_configmaps_result.skipped_count | default(0)) + + (operatorgroups_result.skipped_count | default(0)) + + (subscriptions_result.skipped_count | default(0)) + + (certmanager_result.skipped_count | default(0)) + + (cr_result.skipped_count | default(0)) + }} + total_failed: >- + {{ + (projects_result.failed_count | default(0)) + + (secrets_configmaps_result.failed_count | default(0)) + + (operatorgroups_result.failed_count | default(0)) + + (subscriptions_result.failed_count | default(0)) + + (certmanager_result.failed_count | default(0)) + + (cr_result.failed_count | default(0)) + }} + +- name: "Display Phase 1 restore results" + debug: + msg: + - >- + Phase 1 Restore completed{{ ' with failures' if total_failed | int > 0 + else ' successfully' }} + - "Total resources created: {{ total_created }}" + - "Total resources updated: {{ total_updated }}" + - "Total resources skipped: {{ total_skipped }}" + - "Total resources failed: {{ total_failed }}" + +# Fail task if any errors occurred +# ----------------------------------------------------------------------------- +- name: "Collect Phase 1 failed resources" + set_fact: + all_failed_resources: >- + {{ + (projects_result.failed_resources | default([])) + 
+ (secrets_configmaps_result.failed_resources | default([])) + + (operatorgroups_result.failed_resources | default([])) + + (subscriptions_result.failed_resources | default([])) + + (certmanager_result.failed_resources | default([])) + + (certmanager_result.failed_resources | default([])) + + (cr_result.failed_resources | default([])) + }} + +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ all_failed_resources | to_nice_yaml }}" + when: total_failed | int > 0 + +- name: "Fail if phase 1 restore had errors" + fail: + msg: | + Restore failed for {{ total_failed }} resource(s): + {% for resource in all_failed_resources %} + - {{ resource.description }}: {{ resource.error }} + {% endfor %} + when: total_failed | int > 0 + +# 5. Wait for ManageApp CR to be ready +# ----------------------------------------------------------------------------- +- name: "Wait for ManageApp CR to be ready" + kubernetes.core.k8s_info: + api_version: apps.mas.ibm.com/v1 + kind: ManageApp + name: "{{ mas_instance_id }}" + namespace: "{{ mas_app_namespace }}" + register: manage_app_cr + retries: 15 + delay: 60 + until: + - manage_app_cr.resources is defined + - manage_app_cr.resources | length > 0 + - manage_app_cr.resources[0].status is defined + - manage_app_cr.resources[0].status.conditions is defined + - manage_app_cr.resources[0].status.conditions | selectattr('type', 'equalto', 'Ready') | list | length > 0 + - (manage_app_cr.resources[0].status.conditions | selectattr('type', 'equalto', 'Ready') | first).status == 'True' + +- name: "ManageApp CR is ready" + debug: + msg: "ManageApp CR {{ mas_instance_id }} is ready" + +# 6. Check if ManageWorkspace CR is already deployed +# ----------------------------------------------------------------------------- +- name: "Display restore phase 2 information" + debug: + msg: + - "==========================================" + - "Phase 2: Check if ManageWorkspace CR is already deployed and scale down" + - "==========================================" + - "Namespace: {{ mas_app_namespace }}" + - "Backup path: {{ manage_backup_path }}" + +- name: "Check if ManageWorkspace CR exists" + kubernetes.core.k8s_info: + api_version: apps.mas.ibm.com/v1 + kind: ManageWorkspace + name: "{{ manage_workspace_cr_name }}" + namespace: "{{ mas_app_namespace }}" + register: existing_manage_workspace_cr + +- name: "Set fact: ManageWorkspace CR exists" + set_fact: + manage_workspace_exists: "{{ existing_manage_workspace_cr.resources is defined and existing_manage_workspace_cr.resources | length > 0 }}" + +- name: "Display ManageWorkspace CR status" + debug: + msg: "ManageWorkspace CR {{ 'exists' if manage_workspace_exists else 'does not exist' }}" + +# 7. 
If ManageWorkspace exists, set mode to down +# ----------------------------------------------------------------------------- +- name: "Set ManageWorkspace mode to down (if exists)" + when: manage_workspace_exists + block: + - name: "Get current ManageWorkspace spec.mode" + set_fact: + current_mode: "{{ existing_manage_workspace_cr.resources[0].spec.settings.deployment.mode | default('up') }}" + + - name: "Display current mode" + debug: + msg: "Current ManageWorkspace mode: {{ current_mode }}" + + - name: "Set ManageWorkspace mode to down" + kubernetes.core.k8s: + api_version: apps.mas.ibm.com/v1 + kind: ManageWorkspace + name: "{{ manage_workspace_cr_name }}" + namespace: "{{ mas_app_namespace }}" + definition: + spec: + settings: + deployment: + mode: down + state: patched + when: current_mode != 'down' + + - name: "Wait for ManageWorkspace to scale down" + kubernetes.core.k8s_info: + api_version: apps.mas.ibm.com/v1 + kind: ManageWorkspace + name: "{{ manage_workspace_cr_name }}" + namespace: "{{ mas_app_namespace }}" + register: workspace_status + retries: 15 + delay: 60 + until: + - workspace_status.resources is defined + - workspace_status.resources | length > 0 + - workspace_status.resources[0].status is defined + - workspace_status.resources[0].status.conditions is defined + - workspace_status.resources[0].status.conditions | selectattr('type', 'equalto', 'Ready') | list | length > 0 + - (workspace_status.resources[0].status.conditions | selectattr('type', 'equalto', 'Ready') | first).status == 'False' + when: current_mode != 'down' + + - name: "ManageWorkspace scaled down successfully" + debug: + msg: "ManageWorkspace {{ manage_workspace_cr_name }} is now in down mode" + when: current_mode != 'down' + + - name: "ManageWorkspace already in down mode" + debug: + msg: "ManageWorkspace {{ manage_workspace_cr_name }} is already in down mode" + when: current_mode == 'down' diff --git a/ibm/mas_devops/roles/suite_app_restore/tasks/manage/restore-pv.yml b/ibm/mas_devops/roles/suite_app_restore/tasks/manage/restore-pv.yml new file mode 100644 index 0000000000..0ad7134f75 --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_restore/tasks/manage/restore-pv.yml @@ -0,0 +1,315 @@ +--- +# Restore Manage Persistent Volumes +# ============================================================================= +# This task restores Manage persistent volume data by: +# 1. Reading the ManageWorkspace CR from backup to get PVC configuration +# 2. If ManageWorkspace CR is not deployed, create PVCs and a dummy pod +# 3. Restore PVC data using the pod +# 4. Shutdown the dummy pod +# 5. Continue with ManageWorkspace CR restore +# 6. Wait for Manage deployment to be activated + +# 1. Read ManageWorkspace CR from backup to get PVC configuration +# ----------------------------------------------------------------------------- +- name: "Display restore phase 3 information" + debug: + msg: + - "==========================================" + - "Phase 3: Restore Persistent Volume Data" + - "==========================================" + +- name: "Extract persistent volumes configuration from backup" + set_fact: + manage_persistent_volumes: "{{ workspace_backup_cr.spec.settings.deployment.persistentVolumes | default([]) }}" + +- name: "Display persistent volumes configuration" + debug: + msg: + - "Persistent volumes found in backup: {{ manage_persistent_volumes | length }}" + - "{{ manage_persistent_volumes | to_nice_yaml }}" + +# 2. 
Check if we need to restore PV data +# ----------------------------------------------------------------------------- +- name: "Check if PV data backup exists" + stat: + path: "{{ manage_backup_path }}/data" + register: pv_data_dir_stat + +- name: "Set fact: PV data needs restore" + set_fact: + pv_data_needs_restore: "{{ manage_persistent_volumes | length > 0 and pv_data_dir_stat.stat.exists and pv_data_dir_stat.stat.isdir }}" + +- name: "Display PV restore decision" + debug: + msg: "PV data restore {{ 'required' if pv_data_needs_restore else 'not required' }}" + +# 3. Restore PV data if needed +# ----------------------------------------------------------------------------- +- name: "Restore Manage persistent volume data" + when: pv_data_needs_restore + block: + # If ManageWorkspace doesn't exist, create PVCs and dummy pod + # ------------------------------------------------------------------------- + - name: "Display PVC creation information" + debug: + msg: + - "ManageWorkspace CR not deployed yet" + - "Creating PVCs and dummy pod for PVC data restoration" + + # Determine storage class override settings + # --------------------------------------------------------------------- + - name: "Display storage class override settings" + debug: + msg: + - "Storage class override enabled: {{ override_storageclass }}" + - "Custom RWX storage class: {{ mas_app_custom_storage_class_rwx | default('not set') }}" + - "Custom RWO storage class: {{ mas_app_custom_storage_class_rwo | default('not set') }}" + + # Get default storage classes if override is enabled but custom classes not provided + # --------------------------------------------------------------------- + - name: "Lookup default storage classes" + when: + - override_storageclass + block: + - name: "Include default storage classes lookup" + include_tasks: "{{ role_path }}/../../common_tasks/default_storage_classes.yml" + + - name: "Set default RWX storage class if not provided" + set_fact: + mas_app_custom_storage_class_rwx: "{{ defaultStorageClasses.rwx }}" + when: mas_app_custom_storage_class_rwx == '' + + - name: "Set default RWO storage class if not provided" + set_fact: + mas_app_custom_storage_class_rwo: "{{ defaultStorageClasses.rwo }}" + when: mas_app_custom_storage_class_rwo == '' + + - name: "Fail storage class override if default storage classes not found" + fail: + msg: "Storage class override enabled but no custom storage classes provided and default storage classes not found" + when: mas_app_custom_storage_class_rwx == '' or mas_app_custom_storage_class_rwo == '' + + - name: Replace ManageWorkspace Persistent Volumes when override_storageclass is true + set_fact: + _manage_persistent_volumes: "{{ manage_persistent_volumes | ibm.mas_devops.override_manage_persistent_volumes(mas_app_custom_storage_class_rwo, mas_app_custom_storage_class_rwx) }}" + when: manage_persistent_volumes is defined and manage_persistent_volumes | length > 0 + + # Create PVCs + # --------------------------------------------------------------------- + - name: "Create PVCs for data restoration" + kubernetes.core.k8s: + state: present + definition: + apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + name: "{{ pv_item.pvcName }}" + namespace: "{{ mas_app_namespace }}" + spec: + accessModes: "{{ pv_item.accessModes }}" + storageClassName: "{{ storage_class_to_use }}" + resources: + requests: + storage: "{{ pv_item.size }}" + vars: + storage_class_to_use: >- + {%- if override_storageclass -%} + {%- if 'ReadWriteMany' in pv_item.accessModes -%} + {{ 
mas_app_custom_storage_class_rwx }} + {%- else -%} + {{ mas_app_custom_storage_class_rwo }} + {%- endif -%} + {%- else -%} + {{ pv_item.storageClassName }} + {%- endif -%} + loop: "{{ manage_persistent_volumes }}" + loop_control: + loop_var: pv_item + + - name: "Wait for PVCs to be bound" + kubernetes.core.k8s_info: + api_version: v1 + kind: PersistentVolumeClaim + name: "{{ pv_item.pvcName }}" + namespace: "{{ mas_app_namespace }}" + register: pvc_status + retries: 60 + delay: 5 + until: + - pvc_status.resources is defined + - pvc_status.resources | length > 0 + - pvc_status.resources[0].status.phase == 'Bound' + loop: "{{ manage_persistent_volumes }}" + loop_control: + loop_var: pv_item + + # Create helper pod to mount PVCs + # --------------------------------------------------------------------- + - name: "Create PV restore helper pod for PVC data restoration" + kubernetes.core.k8s: + state: present + definition: + apiVersion: v1 + kind: Pod + metadata: + name: "{{ mas_instance_id }}-manage-restore-pvc-pod" + namespace: "{{ mas_app_namespace }}" + labels: + app: manage-restore + mas.ibm.com/instanceId: "{{ mas_instance_id }}" + spec: + restartPolicy: Never + containers: + - name: restore-container + image: "{{ helper_pod_image }}" + command: ["/bin/sh", "-c", "sleep infinity"] + volumeMounts: "{{ volume_mounts }}" + volumes: "{{ volumes }}" + vars: + volume_mounts: | + {% set mounts = [] %} + {% for pv in manage_persistent_volumes %} + {% set _ = mounts.append({'name': pv.pvcName, 'mountPath': pv.mountPath}) %} + {% endfor %} + {{ mounts }} + volumes: | + {% set vols = [] %} + {% for pv in manage_persistent_volumes %} + {% set _ = vols.append({'name': pv.pvcName, 'persistentVolumeClaim': {'claimName': pv.pvcName}}) %} + {% endfor %} + {{ vols }} + register: dummy_pod_created + + - name: "Wait for helper pod to be running" + kubernetes.core.k8s_info: + api_version: v1 + kind: Pod + name: "{{ mas_instance_id }}-manage-restore-pvc-pod" + namespace: "{{ mas_app_namespace }}" + register: dummy_pod_status + retries: 60 + delay: 5 + until: + - dummy_pod_status.resources is defined + - dummy_pod_status.resources | length > 0 + - dummy_pod_status.resources[0].status.phase == 'Running' + + - name: "Set fact: restore pod name" + set_fact: + restore_pod_name: "{{ mas_instance_id }}-manage-restore-pvc-pod" + restore_pod_container: "restore-container" + using_dummy_pod: true + + # Restore each persistent volume + # ------------------------------------------------------------------------- + - name: "Display restore pod information" + debug: + msg: + - "Restore pod: {{ restore_pod_name }}" + - "Container: {{ restore_pod_container }}" + - "Using helper pod: {{ using_dummy_pod }}" + + - name: "Restore each persistent volume" + include_tasks: "{{ role_path }}/tasks/manage/restore-single-pv.yml" + loop: "{{ manage_persistent_volumes }}" + loop_control: + loop_var: pv_item + index_var: pv_index + + # Shutdown helper pod if created + # ------------------------------------------------------------------------- + - name: "Shutdown dummy pod" + when: using_dummy_pod | default(false) + block: + - name: "Delete dummy restore pod" + kubernetes.core.k8s: + api_version: v1 + kind: Pod + name: "{{ restore_pod_name }}" + namespace: "{{ mas_app_namespace }}" + state: absent + wait: yes + wait_timeout: 300 + + - name: "Helper pod deleted successfully" + debug: + msg: "Helper restore pod {{ restore_pod_name }} has been deleted" + + - name: "Display PV restore completion" + debug: + msg: + - "Manage PV restore completed" + 
- "Persistent volumes restored: {{ manage_persistent_volumes | length }}" + +# 4. Skip message if no PV data to restore +# ----------------------------------------------------------------------------- +- name: "Skip PV restore message" + debug: + msg: "Skipping Manage PV restore - no persistent volume data found in backup" + when: not pv_data_needs_restore + +# 5. Restore ManageWorkspace CR +# ----------------------------------------------------------------------------- +- name: "Display restore phase 4 information" + debug: + msg: + - "==========================================" + - "Phase 4: Restore ManageWorkspace CR" + - "==========================================" + +- name: "Restore ManageWorkspace CR" + ibm.mas_devops.restore_resource: + backup_path: "{{ manage_backup_path }}" + resource_kinds: + - ManageWorkspace + override_values: + ManageWorkspace: + - spec.settings.deployment.persistentVolumes: "{{ _manage_persistent_volumes }}" + register: manage_restore_phase4_result + +- name: "Display ManageWorkspace CR restore results" + debug: + msg: >- + Manage Worspace resource: {{ manage_restore_phase4_result.created_count }} created, + {{ manage_restore_phase4_result.updated_count }} updated, + {{ manage_restore_phase4_result.skipped_count }} skipped, + {{ manage_restore_phase4_result.failed_count }} failed + +- name: "Fail if ManageWorkspace CR restore had errors" + fail: + msg: "ManageWorkspace CR restore failed. See logs for details." + when: manage_restore_phase4_result.failed_count | int > 0 + +# 6. Wait for Manage deployment to be activated +# ----------------------------------------------------------------------------- +- name: "Display restore phase 5 information" + debug: + msg: + - "==========================================" + - "Phase 5: Wait for Manage Deployment" + - "==========================================" + +- name: "Wait for ManageWorkspace to be ready" + kubernetes.core.k8s_info: + api_version: apps.mas.ibm.com/v1 + kind: ManageWorkspace + name: "{{ manage_workspace_cr_name }}" + namespace: "{{ mas_app_namespace }}" + register: workspace_ready_status + retries: "{{ mas_app_restore_wait_retries | int }}" + delay: "{{ mas_app_restore_wait_delay | int }}" + until: + - workspace_ready_status.resources is defined + - workspace_ready_status.resources | length > 0 + - workspace_ready_status.resources[0].status is defined + - workspace_ready_status.resources[0].status.conditions is defined + - workspace_ready_status.resources[0].status.conditions | selectattr('type', 'equalto', 'Ready') | list | length > 0 + - (workspace_ready_status.resources[0].status.conditions | selectattr('type', 'equalto', 'Ready') | first).status == 'True' + +- name: "ManageWorkspace is ready" + debug: + msg: + - "==========================================" + - "ManageWorkspace {{ manage_workspace_cr_name }} is ready" + - "Manage deployment has been activated" + - "==========================================" diff --git a/ibm/mas_devops/roles/suite_app_restore/tasks/manage/restore-single-pv.yml b/ibm/mas_devops/roles/suite_app_restore/tasks/manage/restore-single-pv.yml new file mode 100644 index 0000000000..e67373c0d1 --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_restore/tasks/manage/restore-single-pv.yml @@ -0,0 +1,132 @@ +--- +# Restore Single Persistent Volume +# ============================================================================= +# This task restores a single persistent volume's data from a tar.gz archive +# to the restore pod. 
+ +- name: "Set fact: PV restore details" + set_fact: + pv_mount_path: "{{ pv_item.mountPath }}" + pv_pvc_name: "{{ pv_item.pvcName }}" + pv_archive_name: "{{ pv_item.pvcName }}.tar.gz" + pv_archive_path: "{{ manage_backup_path }}/data/{{ pv_item.pvcName }}.tar.gz" + +- name: "Display PV restore information" + debug: + msg: + - "Restoring PV {{ pv_index + 1 }}/{{ manage_persistent_volumes | length }}" + - "PVC Name: {{ pv_pvc_name }}" + - "Mount Path: {{ pv_mount_path }}" + - "Archive: {{ pv_archive_name }}" + +# Verify backup archive exists +# ----------------------------------------------------------------------------- +- name: "Verify backup archive exists" + stat: + path: "{{ pv_archive_path }}" + register: archive_stat + +- name: "Fail if backup archive not found" + fail: + msg: "Backup archive not found: {{ pv_archive_path }}" + when: not archive_stat.stat.exists + +- name: "Display archive information" + debug: + msg: + - "Archive size: {{ (archive_stat.stat.size / 1024 / 1024) | round(2) }} MB" + - "Archive location: {{ pv_archive_path }}" + +# Copy tar.gz archive from local backup to pod +# ----------------------------------------------------------------------------- +- name: "Copy tar.gz archive to restore pod" + shell: | + oc cp {{ pv_archive_path }} {{ mas_app_namespace }}/{{ restore_pod_name }}:/tmp/{{ pv_archive_name }} -c {{ restore_pod_container }} + register: copy_result + failed_when: false + +- name: "Check copy result" + debug: + msg: + - "Copy return code: {{ copy_result.rc }}" + - "{{ 'Success' if copy_result.rc == 0 else 'Failed' }}" + +- name: "Fail if copy failed" + fail: + msg: "Failed to copy tar archive to pod for {{ pv_pvc_name }}: {{ copy_result.stderr | default('Unknown error') }}" + when: copy_result.rc != 0 + +# Clear existing data in mount path (optional, be careful!) 
+# ----------------------------------------------------------------------------- +- name: "Clear existing data in mount path" + kubernetes.core.k8s_exec: + namespace: "{{ mas_app_namespace }}" + pod: "{{ restore_pod_name }}" + container: "{{ restore_pod_container }}" + command: > + sh -c "rm -rf {{ pv_mount_path }}/* {{ pv_mount_path }}/.[!.]* 2>/dev/null || true" + register: clear_result + failed_when: false + +- name: "Display clear result" + debug: + msg: "Cleared existing data in {{ pv_mount_path }}" + +# Extract tar.gz archive in the pod +# ----------------------------------------------------------------------------- +- name: "Extract tar.gz archive in pod at {{ pv_mount_path }}" + kubernetes.core.k8s_exec: + namespace: "{{ mas_app_namespace }}" + pod: "{{ restore_pod_name }}" + container: "{{ restore_pod_container }}" + command: > + tar -xzf /tmp/{{ pv_archive_name }} -C {{ pv_mount_path }} + register: extract_result + failed_when: false + +- name: "Check extract result" + debug: + msg: + - "Extract return code: {{ extract_result.rc }}" + - "{{ 'Success' if extract_result.rc == 0 else 'Failed' }}" + +- name: "Fail if extract failed" + fail: + msg: "Failed to extract tar archive for {{ pv_pvc_name }}: {{ extract_result.stderr | default('Unknown error') }}" + when: extract_result.rc != 0 + +# Clean up tar.gz archive from pod +# ----------------------------------------------------------------------------- +- name: "Remove tar.gz archive from pod" + kubernetes.core.k8s_exec: + namespace: "{{ mas_app_namespace }}" + pod: "{{ restore_pod_name }}" + container: "{{ restore_pod_container }}" + command: rm -f /tmp/{{ pv_archive_name }} + register: cleanup_result + failed_when: false + +# Verify restore was successful +# ----------------------------------------------------------------------------- +- name: "Verify data was restored" + kubernetes.core.k8s_exec: + namespace: "{{ mas_app_namespace }}" + pod: "{{ restore_pod_name }}" + container: "{{ restore_pod_container }}" + command: > + sh -c "ls -la {{ pv_mount_path }} | head -20" + register: verify_result + failed_when: false + +- name: "Display restored data verification" + debug: + msg: + - "Restored data in {{ pv_mount_path }}:" + - "{{ verify_result.stdout_lines | default(['No output']) }}" + when: verify_result.rc == 0 + +- name: "PV restore completed" + debug: + msg: + - "Successfully restored PVC: {{ pv_pvc_name }}" + - "Mount path: {{ pv_mount_path }}" diff --git a/ibm/mas_devops/roles/suite_app_restore/vars/manage.yml b/ibm/mas_devops/roles/suite_app_restore/vars/manage.yml new file mode 100644 index 0000000000..7566be6076 --- /dev/null +++ b/ibm/mas_devops/roles/suite_app_restore/vars/manage.yml @@ -0,0 +1,8 @@ +--- +# Restore Options +# ----------------------------------------------------------------------------- +# Wait retries for ManageWorkspace to be ready +mas_app_restore_wait_retries: "{{ lookup('env', 'MAS_APP_RESTORE_WAIT_RETRIES') | default('180', true) }}" + +# Delay between status checks (in seconds) +mas_app_restore_wait_delay: "{{ lookup('env', 'MAS_APP_RESTORE_WAIT_DELAY') | default('360', true) }}" diff --git a/ibm/mas_devops/roles/suite_backup/README.md b/ibm/mas_devops/roles/suite_backup/README.md new file mode 100644 index 0000000000..d47636e513 --- /dev/null +++ b/ibm/mas_devops/roles/suite_backup/README.md @@ -0,0 +1,194 @@ +Backup MAS Core +=============================================================================== + +Overview +------------------------------------------------------------------------------- 
+This role backs up MAS Core namespace resources and supporting resources in
+other namespaces, creating on-demand full backups.
+
+!!! important
+    Backup can only be restored to an instance with the same MAS instance ID.
+
+
+Role Variables
+-------------------------------------------------------------------------------
+
+### mas_instance_id
+The instance ID of the Maximo Application Suite installation to back up.
+
+- **Required**
+- Environment Variable: `MAS_INSTANCE_ID`
+- Default Value: None
+
+### mas_backup_dir
+The local directory path where backup files will be stored.
+
+- **Required**
+- Environment Variable: `MAS_BACKUP_DIR`
+- Default Value: None
+- Example: `/tmp/mas_backups`
+
+### suite_backup_version
+Set this to override the default `YYYYMMDD-HHMMSS` timestamp used as the backup version in the backup directory name.
+
+- **Optional**
+- Environment Variable: `SUITE_BACKUP_VERSION`
+- Default Value: `YYYYMMDD-HHMMSS` timestamp
+
+### include_sls
+Controls whether to include the SLS (Suite License Service) configuration in the backup.
+If you plan to install a new SLS in any recovery action then you should set this to `false`.
+
+- **Optional**
+- Environment Variable: `INCLUDE_SLS`
+- Default Value: `true`
+
+### include_dro
+Controls whether to include the DRO configuration in the backup.
+If you plan to install a new DRO in any recovery action then you should set this to `false`.
+
+- **Optional**
+- Environment Variable: `INCLUDE_DRO`
+- Default Value: `true`
+
+
+Backup Operations
+-------------------------------------------------------------------------------
+
+This section provides comprehensive information about MAS Core backup operations.
+
+### Overview
+
+The MAS Core backup operation creates a comprehensive backup of your MAS Core installation, including all namespace resources and supporting resources in other namespaces. This backup can be restored using the [`suite_restore`](suite_restore.md) role.
+
+**Important**: Backup can only be restored to an instance with the same MAS instance ID.
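+
+A minimal example playbook is shown below. It is an illustrative sketch only: it assumes the collection is installed, that you are already logged in to the target cluster, and that the variable values (`main`, `/tmp/mas_backups`) are placeholders to be adjusted for your environment.
+
+```yaml
+- hosts: localhost
+  any_errors_fatal: true
+  vars:
+    mas_instance_id: main
+    mas_backup_dir: /tmp/mas_backups
+  roles:
+    - ibm.mas_devops.suite_backup
+```
+
+The same values can instead be provided through the `MAS_INSTANCE_ID` and `MAS_BACKUP_DIR` environment variables.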
+ +### What Gets Backed Up + +The MAS Core backup operation captures all critical resources needed to restore a complete MAS Core instance: + +**Core Namespace Resources (`mas-{instance-id}-core`):** +- **Projects/Namespaces**: The MAS Core namespace +- **Secrets**: + - Superuser credentials (`{instance-id}-credentials-superuser`) + - IBM entitlement key (`ibm-entitlement`) + - Public certificates (`{instance-id}-cert-public`) + - All auto-discovered secrets referenced by other resources +- **Operator Resources**: + - Subscription (`ibm-mas`) + - OperatorGroup +- **Certificate Manager Resources**: + - Certificates (with label `mas.ibm.com/instanceId={instance-id}`) +- **MAS Addon Resources** (addons.mas.ibm.com): + - MVIEdge + - ReplicaDB + - GenericAddon +- **MAS Core Resources** (core.mas.ibm.com): + - Suite CR + - Workspace CRs +- **MAS Internal Resources** (internal.mas.ibm.com): + - CoreIDP +- **MAS Configuration Resources** (config.mas.ibm.com): + - AppCfg, IDPCfg, JdbcCfg, KafkaCfg, MongoCfg + - ObjectStorageCfg, PushNotificationCfg, ScimCfg + - SmtpCfg, WatsonStudioCfg + - BasCfg (if `include_dro` is true) + - SlsCfg (if `include_sls` is true) + +**Certificate Manager Resources:** +- **ClusterIssuers**: + - Public cluster issuer (detected automatically) + - `mas-{instance-id}-core-internal-issuer` + - `mas-{instance-id}-ca` +- **Issuers** (in cert-manager namespace): + - `mas-{instance-id}-core-internal-ca-issuer` + - `mas-{instance-id}-core-public-ca-issuer` +- **Certificates** (in cert-manager namespace): + - `{instance-id}-cert-internal-ca` + - `{instance-id}-cert-public-ca` + +### Backup Process + +The MAS Core backup operation performs the following steps: + +1. **Validation**: Verifies required variables (`mas_instance_id`, `mas_backup_dir`) +2. **Version Generation**: Creates or uses provided backup version identifier +3. **Certificate Manager Detection**: Detects the Certificate Manager installation and namespace +4. **Cluster Issuer Detection**: Identifies the public cluster issuer in use +5. **Resource Discovery**: Identifies all MAS Core resources and auto-discovers referenced secrets +6. **Backup Execution**: Exports all resources to YAML files in the backup directory +7. 
**Verification**: Reports backup statistics and any failures + +**Backup Directory Structure:** +``` +{mas_backup_dir}/ +└── backup-{version}-suite/ + └── resources/ + ├── projects/ + ├── secrets/ + ├── configmaps/ + ├── subscriptions/ + ├── operatorgroups/ + ├── clusterissuers/ + ├── issuers/ + ├── certificates/ + ├── mviedges/ + ├── replicadbs/ + ├── genericaddons/ + ├── suites/ + ├── workspaces/ + ├── coreidps/ + ├── appcfgs/ + ├── idpcfgs/ + ├── jdbccfgs/ + ├── kafkacfgs/ + ├── mongocfgs/ + ├── objectstoragecfgs/ + ├── pushnotificationcfgs/ + ├── scimcfgs/ + ├── smtpcfgs/ + ├── watsonstudiocfgs/ + ├── bascfgs/ (if include_dro is true) + └── slscfgs/ (if include_sls is true) +``` + +### Important Considerations + +**Instance ID Requirement:** +- Backup can only be restored to an instance with the same MAS instance ID +- The instance ID is embedded in resource names and cannot be changed during restore + +**Storage Requirements:** +- Ensure sufficient storage in the backup directory +- Backup directory structure: `{mas_backup_dir}/backup-{version}-suite/` +- Monitor disk space during backup operations + +**Security:** +- Backup files contain sensitive data including credentials and certificates +- Secure backup directory with appropriate permissions (chmod 700 recommended) +- Consider encrypting backups for long-term storage +- Restrict access to backup files to authorized personnel only + +**SLS and DRO Configuration:** +- Use `include_sls=false` if you plan to install a new SLS during recovery +- Use `include_dro=false` if you plan to install a new DRO during recovery +- Default is `true` for both, which includes the configuration in the backup + +### Backup Best Practices + +1. **Regular Backups**: Schedule automated backups at regular intervals, especially before: + - MAS upgrades + - Configuration changes + - Application installations + - Cluster maintenance +2. **Test Restores**: Periodically test restore procedures in non-production environments +3. **Monitor Operations**: Implement monitoring and alerting for backup failures +4. **Backup Validation**: Verify backup integrity after completion +5. **Retention Policy**: Implement and document backup retention policies +6. **Disaster Recovery**: Include MAS Core backup/restore in your DR plan +7. **Coordinate Backups**: Coordinate MAS Core backups with: + - Database backups (MongoDB, Db2) + - SLS backups (if using separate SLS) + - DRO backups (if using separate DRO) + - Application-specific backups + diff --git a/ibm/mas_devops/roles/suite_backup/defaults/main.yml b/ibm/mas_devops/roles/suite_backup/defaults/main.yml new file mode 100644 index 0000000000..7a7408afd4 --- /dev/null +++ b/ibm/mas_devops/roles/suite_backup/defaults/main.yml @@ -0,0 +1,11 @@ +--- +mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" + +mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" +suite_backup_version: "{{ lookup('env', 'SUITE_BACKUP_VERSION') }}" + +# Include SLS configuration from backup in backup +include_sls: "{{ lookup('env', 'INCLUDE_SLS') | default('true', true) | bool }}" + +# Include DRO (BAS) configuration from backup in backup +include_dro: "{{ lookup('env', 'INCLUDE_DRO') | default('true', true) | bool }}" diff --git a/ibm/mas_devops/roles/suite_backup/tasks/main.yml b/ibm/mas_devops/roles/suite_backup/tasks/main.yml new file mode 100644 index 0000000000..1e68357f27 --- /dev/null +++ b/ibm/mas_devops/roles/suite_backup/tasks/main.yml @@ -0,0 +1,206 @@ +--- +# 1. 
Check mas core backup required variables +# ----------------------------------------------------------------------------- +- name: "Verify required variables for suite backup" + ibm.mas_devops.verify_backup_restore_vars: + component: suite + action: backup + mas_instance_id: "{{ mas_instance_id }}" + mas_backup_dir: "{{ mas_backup_dir }}" + +- name: "Check if SUITE_BACKUP_VERSION is provided, if not set to default 'YYYYMMDD-HHMMSS' format" + set_fact: + suite_backup_version: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}" + when: suite_backup_version is not defined or suite_backup_version == "" or suite_backup_version == "None" + +- name: "Set fact: mas core namespace name" + set_fact: + mas_core_namespace: "mas-{{ mas_instance_id }}-core" + +- name: "Set fact: mas suite backup base directory path" + set_fact: + suite_backup_path: "{{ mas_backup_dir }}/backup-{{ suite_backup_version }}-suite" + +# 2. Determine version of cert-manager in use on the cluster +# ----------------------------------------------------------------------------- +- name: Detect Certificate Manager installation + include_tasks: "{{ role_path }}/../../common_tasks/detect_cert_manager.yml" + +# 3. Determine if a public clusterissuer is being used +# ----------------------------------------------------------------------------- +- name: Detect MAS Public Cluster Issuer + ibm.mas_devops.get_mas_cluster_issuer: + instance_id: "{{ mas_instance_id }}" + register: mas_cluster_issuer + +- name: "Display detected MAS cluster issuer" + debug: + msg: "Detected MAS cluster issuer: {{ mas_cluster_issuer.issuer_name }}" + when: mas_cluster_issuer.success + +- name: "Fail if MAS cluster issuer detection failed" + fail: + msg: "Failed to detect MAS cluster issuer" + when: mas_cluster_issuer.failed + +# 4. Build the config.mas.ibm.com resource list based on include flags +# ----------------------------------------------------------------------------- +- name: "Set fact: base config resources list" + set_fact: + config_resources: + - kind: AppCfg + api_version: config.mas.ibm.com/v1 + - kind: IDPCfg + api_version: config.mas.ibm.com/v1 + - kind: JdbcCfg + api_version: config.mas.ibm.com/v1 + - kind: KafkaCfg + api_version: config.mas.ibm.com/v1 + - kind: MongoCfg + api_version: config.mas.ibm.com/v1 + - kind: ObjectStorageCfg + api_version: config.mas.ibm.com/v1 + - kind: PushNotificationCfg + api_version: config.mas.ibm.com/v1 + - kind: ScimCfg + api_version: config.mas.ibm.com/v1 + - kind: SmtpCfg + api_version: config.mas.ibm.com/v1 + - kind: WatsonStudioCfg + api_version: config.mas.ibm.com/v1 + +- name: "Add BasCfg to config resources if include_dro is true" + set_fact: + config_resources: "{{ config_resources + [{'kind': 'BasCfg', 'api_version': 'config.mas.ibm.com/v1'}] }}" + when: include_dro | bool + +- name: "Add SlsCfg to config resources if include_sls is true" + set_fact: + config_resources: "{{ config_resources + [{'kind': 'SlsCfg', 'api_version': 'config.mas.ibm.com/v1'}] }}" + when: include_sls | bool + +# 5. 
Build the core namespace resources list +# ----------------------------------------------------------------------------- +- name: "Set fact: mas core namespace resources" + set_fact: + mas_core_namespace_resources: + - kind: Project + api_version: project.openshift.io/v1 + name: "{{ mas_core_namespace }}" + - kind: Secret + api_version: v1 + name: "{{ mas_instance_id }}-credentials-superuser" + # subscription + - kind: Subscription + api_version: operators.coreos.com/v1alpha1 + name: ibm-mas + - kind: OperatorGroup + api_version: operators.coreos.com/v1 + name: operatorgroup + # secrets + - kind: Secret + api_version: v1 + name: ibm-entitlement + - kind: Secret + api_version: v1 + name: "{{ mas_instance_id }}-credentials-superuser" + # public cert in case of manual managed certs + - kind: Secret + api_version: v1 + name: "{{ mas_instance_id }}-cert-public" + # certificates + - kind: Certificate + api_version: cert-manager.io/v1 + labels: + - "mas.ibm.com/instanceId={{ mas_instance_id }}" + # addons.mas.ibm.com + - kind: MVIEdge + api_version: addons.mas.ibm.com/v1 + - kind: ReplicaDB + api_version: addons.mas.ibm.com/v1 + - kind: GenericAddon + api_version: addons.mas.ibm.com/v1 + # core.mas.ibm.com + - kind: Suite + api_version: core.mas.ibm.com/v1 + - kind: Workspace + api_version: core.mas.ibm.com/v1 + # internal.mas.ibm.com + - kind: CoreIDP + api_version: internal.mas.ibm.com/v1 + +- name: "Combine core namespace resources with config resources" + set_fact: + mas_core_namespace_resources: "{{ mas_core_namespace_resources + config_resources }}" + +# 6. Set the backup_resources we want to backup. Note that if the resource doesn't +# exist, the backup will still succeed. If a resource contains `secretName` key +# then the secret will be backed up as well. +# ----------------------------------------------------------------------------- +- name: "Set fact: mas suite backup resources" + set_fact: + suite_backup_resources: + - resources: + - kind: ClusterIssuer + api_version: cert-manager.io/v1 + name: "{{ mas_cluster_issuer.issuer_name }}" + - kind: ClusterIssuer + api_version: cert-manager.io/v1 + name: "mas-{{ mas_instance_id }}-core-internal-issuer" + - kind: ClusterIssuer + api_version: cert-manager.io/v1 + name: "mas-{{ mas_instance_id }}-ca" + - namespace: "{{ cert_manager_cluster_resource_namespace }}" + resources: + - kind: Issuer + api_version: cert-manager.io/v1 + name: "mas-{{ mas_instance_id }}-core-internal-ca-issuer" + - kind: Issuer + api_version: cert-manager.io/v1 + name: "mas-{{ mas_instance_id }}-core-public-ca-issuer" + - kind: Certificate + api_version: cert-manager.io/v1 + name: "{{ mas_instance_id }}-cert-internal-ca" + - kind: Certificate + api_version: cert-manager.io/v1 + name: "{{ mas_instance_id }}-cert-public-ca" + - namespace: "{{ mas_core_namespace }}" + resources: "{{ mas_core_namespace_resources }}" + +# 7. Call the backup_resources plugin to execute the backup to the path provided +# ----------------------------------------------------------------------------- +- name: "Backup MAS Suite resources (referenced secrets are auto-discovered)" + ibm.mas_devops.backup_resource: + backup_resources: "{{ suite_backup_resources }}" + backup_path: "{{ suite_backup_path }}" + register: backup_result + +# 8. 
Show the results +# ----------------------------------------------------------------------------- +- name: "Display backup results" + debug: + msg: + - "Backup completed{{ ' with failures' if backup_result.failed_count > 0 else ' successfully' }}" + - "Total resources backed up: {{ backup_result.backed_up_count }}" + - "Total resources failed: {{ backup_result.failed_count }}" + - "Resources not found: {{ backup_result.not_found_count }}" + - "Secrets auto-discovered: {{ backup_result.discovered_secrets_count }}" + - "Backup location: {{ suite_backup_path }}" + +# 9. Fail task if any errors occurred. +# ----------------------------------------------------------------------------- +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ backup_result.failed_resources | to_nice_yaml }}" + when: backup_result.failed_count > 0 + +- name: "Fail if backup had errors" + fail: + msg: | + Backup failed for {{ backup_result.failed_count }} resource(s): + {% for resource in backup_result.failed_resources %} + - {{ resource.description }} in {{ resource.scope }} + {% endfor %} + when: backup_result.failed_count > 0 diff --git a/ibm/mas_devops/roles/suite_backup_restore/README.md b/ibm/mas_devops/roles/suite_backup_restore/README.md deleted file mode 100644 index 5492170b40..0000000000 --- a/ibm/mas_devops/roles/suite_backup_restore/README.md +++ /dev/null @@ -1,247 +0,0 @@ -# suite_backup_restore - -This role supports backing up and restoring MAS Core namespace resources; supports creating on-demand or scheduled backup jobs for taking full or incremental backups, and optionally creating Kubernetes jobs for running the backup/restore process. - -!!! important - A backup can only be restored to an instance with the same MAS instance ID. - -## Role Variables - -### General - -#### masbr_action -Action to perform on MAS Core namespace. - -- **Required** -- Environment Variable: `MASBR_ACTION` -- Default: None - -**Purpose**: Specifies whether to create a backup of MAS Core namespace resources or restore from a previous backup. - -**When to use**: -- Set to `backup` to create a backup of MAS Core namespace resources -- Set to `restore` to restore MAS Core namespace from a backup -- Always required to indicate the operation type - -**Valid values**: `backup`, `restore` - -**Impact**: -- `backup`: Creates backup job (on-demand or scheduled) for MAS Core namespace resources -- `restore`: Restores MAS Core namespace from specified backup version - -**Related variables**: -- `masbr_restore_from_version`: Required when action is `restore` -- `masbr_backup_schedule`: Optional for scheduled backups -- `mas_instance_id`: Instance to backup/restore - -**Note**: **IMPORTANT** - This role handles MAS Core namespace resources only. MongoDB data must be backed up/restored separately using the `mongodb` role. A backup can only be restored to an instance with the same MAS instance ID. - -#### mas_instance_id -MAS instance identifier for backup/restore operations. - -- **Required** -- Environment Variable: `MAS_INSTANCE_ID` -- Default: None - -**Purpose**: Identifies which MAS instance to backup or restore. Used to locate MAS Core namespace resources and ensure restore compatibility. 
- -**When to use**: -- Always required for backup and restore operations -- Must match the instance ID from MAS installation -- Critical for restore operations (must match original backup instance ID) - -**Valid values**: Lowercase alphanumeric string, 3-12 characters (e.g., `prod`, `dev`, `main`) - -**Impact**: Determines which MAS instance's Core namespace will be backed up or restored. **CRITICAL** - A backup can only be restored to an instance with the same MAS instance ID. - -**Related variables**: -- `masbr_action`: Whether backing up or restoring this instance -- `masbr_restore_from_version`: Backup version to restore (for restore action) - -**Note**: **IMPORTANT** - The instance ID must match between backup and restore operations. Attempting to restore a backup to an instance with a different ID will fail. - -#### masbr_confirm_cluster -Confirm cluster connection before backup/restore. - -- **Optional** -- Environment Variable: `MASBR_CONFIRM_CLUSTER` -- Default: `false` - -**Purpose**: Controls whether the role prompts for confirmation of the currently connected cluster before executing backup or restore operations. Safety feature to prevent accidental operations on wrong cluster. - -**When to use**: -- Set to `true` for interactive confirmation (recommended for production) -- Leave as `false` (default) for automated/non-interactive operations -- Use `true` when manually running backup/restore to verify correct cluster - -**Valid values**: `true`, `false` - -**Impact**: -- `true`: Role prompts for cluster confirmation before proceeding -- `false`: Role proceeds without confirmation (suitable for automation) - -**Related variables**: -- `masbr_action`: Operation requiring cluster confirmation - -**Note**: Enabling cluster confirmation is recommended for manual operations, especially in production environments, to prevent accidental backup/restore on the wrong cluster. - -#### masbr_copy_timeout_sec -File transfer timeout in seconds. - -- **Optional** -- Environment Variable: `MASBR_COPY_TIMEOUT_SEC` -- Default: `43200` (12 hours) - -**Purpose**: Specifies the maximum time allowed for transferring backup files between cluster and local storage. Prevents operations from hanging indefinitely. - -**When to use**: -- Use default (12 hours) for most deployments -- Increase for very large backups or slow network connections -- Decrease for smaller backups to fail faster on issues - -**Valid values**: Positive integer (seconds), e.g., `3600` (1 hour), `43200` (12 hours), `86400` (24 hours) - -**Impact**: Operations exceeding this timeout will fail. Insufficient timeout for large backups will cause failures. Excessive timeout delays error detection. - -**Related variables**: -- `masbr_storage_local_folder`: Destination for file transfers - -**Note**: The default 12 hours is suitable for most deployments. Adjust based on backup size and network speed. Monitor actual transfer times to optimize this setting. - -#### masbr_job_timezone -Time zone for scheduled backup jobs. - -- **Optional** -- Environment Variable: `MASBR_JOB_TIMEZONE` -- Default: UTC - -**Purpose**: Specifies the time zone for scheduled backup CronJobs. Ensures backups run at the intended local time rather than UTC. 
- -**When to use**: -- Leave unset to use UTC (default) -- Set when you need backups to run at specific local times -- Only applies to scheduled backups (when `masbr_backup_schedule` is set) - -**Valid values**: Valid [tz database time zone](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) (e.g., `America/New_York`, `Europe/London`, `Asia/Tokyo`) - -**Impact**: Determines when scheduled backups execute. Incorrect time zone may cause backups to run at unexpected times. - -**Related variables**: -- `masbr_backup_schedule`: Cron expression interpreted in this time zone - -**Note**: Only relevant for scheduled backups. On-demand backups ignore this setting. Use standard tz database names (e.g., `America/New_York`, not `EST`). - -#### masbr_storage_local_folder -Local filesystem path for backup storage. - -- **Required** -- Environment Variable: `MASBR_STORAGE_LOCAL_FOLDER` -- Default: None - -**Purpose**: Specifies the local filesystem path where backup files are stored (for backups) or retrieved from (for restores). This is the persistent storage location for backup data. - -**When to use**: -- Always required for backup and restore operations -- Must be accessible from the system running the role -- Should have sufficient space for backup files -- Must be persistent across operations for restore capability - -**Valid values**: Absolute filesystem path (e.g., `/tmp/masbr`, `/backup/mas`, `/mnt/backup`) - -**Impact**: Backup files are written to or read from this location. Insufficient space will cause backup failures. Path must exist and be writable. - -**Related variables**: -- `masbr_copy_timeout_sec`: Timeout for transferring files to/from this location -- `masbr_restore_from_version`: Backup version stored in this location - -**Note**: Ensure the path has sufficient disk space for backups. For production, use a dedicated backup volume with appropriate retention policies. The path must be accessible during both backup and restore operations. - -### Backup - -#### masbr_backup_schedule -Cron expression for scheduled backups. - -- **Optional** -- Environment Variable: `MASBR_BACKUP_SCHEDULE` -- Default: None (on-demand backup) - -**Purpose**: Defines a schedule for automatic recurring backups using Cron syntax. When set, creates a Kubernetes CronJob for automated backups. - -**When to use**: -- Leave unset for on-demand backups (manual execution) -- Set to create scheduled/recurring backups -- Use for automated backup strategies - -**Valid values**: Valid [Cron expression](https://en.wikipedia.org/wiki/Cron) (e.g., `0 2 * * *` for daily at 2 AM, `0 2 * * 0` for weekly on Sunday at 2 AM) - -**Impact**: -- When set: Creates a Kubernetes CronJob that runs backups automatically on schedule -- When unset: Creates an on-demand backup job that runs immediately - -**Related variables**: -- `masbr_job_timezone`: Time zone for interpreting the cron schedule -- `masbr_action`: Must be `backup` for scheduled backups - -**Note**: Scheduled backups only apply when `masbr_action=backup`. The cron expression is interpreted in the time zone specified by `masbr_job_timezone` (defaults to UTC). Common patterns: `0 2 * * *` (daily 2 AM), `0 2 * * 0` (weekly Sunday 2 AM), `0 2 1 * *` (monthly 1st at 2 AM). - -### Restore - -#### masbr_restore_from_version -Backup version timestamp for restore operations. - -- **Required** (when `masbr_action=restore`) -- Environment Variable: `MASBR_RESTORE_FROM_VERSION` -- Default: None - -**Purpose**: Specifies which backup version to restore from. 
The version is a timestamp identifying a specific backup. - -**When to use**: -- Required when `masbr_action=restore` -- Not used for backup operations -- Must match an existing backup version in storage - -**Valid values**: Timestamp in `YYYYMMDDHHMMSS` format (e.g., `20240621021316` for June 21, 2024 at 02:13:16) - -**Impact**: Determines which backup is restored. Incorrect or non-existent version will cause restore to fail. - -**Related variables**: -- `masbr_action`: Must be `restore` for this variable to be used -- `masbr_storage_local_folder`: Location where backup versions are stored -- `mas_instance_id`: Must match the instance ID from the backup - -**Note**: The backup version timestamp is generated automatically during backup creation. List available backups in `masbr_storage_local_folder` to find valid version timestamps. **IMPORTANT** - The backup can only be restored to an instance with the same MAS instance ID as the original backup. - -## Example Playbook - -### Backup -Backup MAS Core namespace resources, note that this does not include backup of any data in MongoDb, see the `backup` action in the [mongodb](mongodb.md) role. - -```yaml -- hosts: localhost - any_errors_fatal: true - vars: - masbr_action: backup - mas_instance_id: main - masbr_storage_local_folder: /tmp/masbr - roles: - - ibm.mas_devops.suite_backup_restore -``` - -### Restore -Restore MAS Core namespace resources, note that this does not include backup of any data in MongoDb, see the `restore` action in the [mongodb](mongodb.md) role. - -```yaml -- hosts: localhost - any_errors_fatal: true - vars: - masbr_action: restore - masbr_restore_from_version: 20240621021316 - mas_instance_id: main - masbr_storage_local_folder: /tmp/masbr - roles: - - ibm.mas_devops.suite_backup_restore -``` - -## License - -EPL-2.0 diff --git a/ibm/mas_devops/roles/suite_backup_restore/defaults/main.yml b/ibm/mas_devops/roles/suite_backup_restore/defaults/main.yml deleted file mode 100644 index e778df58fa..0000000000 --- a/ibm/mas_devops/roles/suite_backup_restore/defaults/main.yml +++ /dev/null @@ -1,6 +0,0 @@ ---- -masbr_action: "{{ lookup('env', 'MASBR_ACTION') }}" -mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" - -# Backup/Restore - Supported job types -supported_job_data_item_types: ["namespace"] diff --git a/ibm/mas_devops/roles/suite_backup_restore/tasks/backup-namespace.yml b/ibm/mas_devops/roles/suite_backup_restore/tasks/backup-namespace.yml deleted file mode 100644 index 8c867a9b89..0000000000 --- a/ibm/mas_devops/roles/suite_backup_restore/tasks/backup-namespace.yml +++ /dev/null @@ -1,113 +0,0 @@ ---- -# Update namespace resource backup status: InProgress -# ----------------------------------------------------------------------------- -- name: "Update namespace resource backup status: InProgress" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "InProgress" - - -- name: "Backup namespace resources" - block: - # Prepare namespace resource backup folder - # ------------------------------------------------------------------------- - - name: "Set fact: namespace resource backup folder" - set_fact: - masbr_ns_backup_folder: "{{ masbr_local_job_folder }}/{{ masbr_job_data_type }}" - masbr_ns_backup_name: "{{ masbr_job_name }}-{{ masbr_job_data_type }}" - - - name: "Set fact: namespace resource backup log" - set_fact: - masbr_ns_backup_log: "{{ masbr_local_job_folder }}/{{ masbr_ns_backup_name }}.log" - - 
- name: "Create local backup folder for saving namespace resoruces" - changed_when: true - shell: > - mkdir -p {{ masbr_ns_backup_folder }} && - touch {{ masbr_ns_backup_log }} - - - # Run backup namespace resource script - # ------------------------------------------------------------------------- - - name: "Create backup namespace resource script" - template: - src: "{{ role_path }}/../../common_tasks/templates/backup_restore/backup-namespace-resources.sh.j2" - dest: "{{ masbr_local_job_folder }}/backup-namespace-resources.sh" - mode: "777" - - - name: "Run backup namespace resource script" - changed_when: true - shell: > - {{ masbr_local_job_folder }}/backup-namespace-resources.sh - register: _script_output - - - name: "Debug: run backup namespace resource script" - debug: - msg: "{{ _script_output.stdout_lines }}" - - - # Create tar.gz archives of namespace resource backup files - # ------------------------------------------------------------------------- - - name: "Create tar.gz archives of namespace resource backup files" - changed_when: true - shell: > - tar -czf {{ masbr_local_job_folder }}/{{ masbr_ns_backup_name }}.tar.gz - -C {{ masbr_ns_backup_folder }} . && - ls -lA {{ masbr_ns_backup_folder }} - register: _list_files_output - - - name: "Debug: list of namespace resource backup files" - debug: - msg: "{{ _list_files_output.stdout_lines }}" - - - # Copy backup files to specified storage location - # ------------------------------------------------------------------------- - - name: "Copy backup files to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name }}" - masbr_cf_paths: - - src_file: "{{ masbr_ns_backup_name }}.tar.gz" - dest_folder: "{{ masbr_job_data_type }}" - - - # Update namespace resource backup status: Completed - # ------------------------------------------------------------------------- - - name: "Update namespace resource backup status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update namespace resource backup status: Failed - # ------------------------------------------------------------------------- - - name: "Update namespace resource backup status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" - - always: - # Copy namespace resource backup log file to specified storage location - # ------------------------------------------------------------------------- - - name: "Create a tar.gz archive of namespace resource backup log" - changed_when: true - shell: > - tar -czf {{ masbr_local_job_folder }}/{{ masbr_ns_backup_name }}-log.tar.gz - -C {{ masbr_local_job_folder }} {{ masbr_ns_backup_name }}.log - - - name: "Copy namespace resource backup log file to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "{{ masbr_local_job_folder }}/{{ masbr_ns_backup_name }}-log.tar.gz" - dest_folder: "log" diff --git a/ibm/mas_devops/roles/suite_backup_restore/tasks/backup-vars.yml 
b/ibm/mas_devops/roles/suite_backup_restore/tasks/backup-vars.yml deleted file mode 100644 index 8c427d75b7..0000000000 --- a/ibm/mas_devops/roles/suite_backup_restore/tasks/backup-vars.yml +++ /dev/null @@ -1,40 +0,0 @@ ---- -- name: "Set fact: default backup job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" - -- name: "Set fact: namespace backup resources" - set_fact: - masbr_ns_backup_resources: - - namespace: "{{ mas_core_namespace }}" - resources: - - kind: Subscription - name: ibm-mas-operator - - kind: OperatorGroup - name: ibm-mas-operator-group - - kind: Secret - name: ibm-entitlement - - kind: Secret - name: "{{ mas_instance_id }}-credentials-superuser" - # addons.mas.ibm.com - - kind: MVIEdge - - kind: ReplicaDB - # config.mas.ibm.com - - kind: BasCfg - - kind: IDPCfg - - kind: JdbcCfg - - kind: KafkaCfg - - kind: MongoCfg - - kind: ObjectStorageCfg - - kind: PushNotificationCfg - - kind: ScimCfg - - kind: SlsCfg - - kind: SmtpCfg - - kind: WatsonStudioCfg - # core.mas.ibm.com - - kind: Suite - - kind: Workspace - # internal.mas.ibm.com - - kind: CoreIDP diff --git a/ibm/mas_devops/roles/suite_backup_restore/tasks/get-suite-info.yml b/ibm/mas_devops/roles/suite_backup_restore/tasks/get-suite-info.yml deleted file mode 100644 index c07d9b9369..0000000000 --- a/ibm/mas_devops/roles/suite_backup_restore/tasks/get-suite-info.yml +++ /dev/null @@ -1,45 +0,0 @@ ---- -# Get Suite version and status -# ----------------------------------------------------------------------------- -- name: "Get Suite" - kubernetes.core.k8s_info: - api_version: core.mas.ibm.com/v1 - kind: Suite - name: "{{ mas_instance_id }}" - namespace: "{{ mas_core_namespace }}" - register: _suite_output - -- name: "Set fact: Suite version" - set_fact: - mas_core_version: "{{ _suite_output.resources[0].status.versions.reconciled }}" - when: - - _suite_output is defined - - (_suite_output.resources | length > 0) - - _suite_output.resources[0].status.versions.reconciled is defined - -- name: "Fail if Suite does not exists" - assert: - that: mas_core_version is defined - fail_msg: "Suite does not exists!" - -- name: "Set fact: Suite status" - set_fact: - mas_core_ready: true - when: - - _suite_output.resources is defined - - (_suite_output.resources | length > 0) - - _suite_output.resources | json_query('[*].status.conditions[?type==`Ready`][].status') | select ('match','True') | list | length == 1 - -- name: "Fail if Suite is not ready" - assert: - that: mas_core_ready is defined and mas_core_ready - fail_msg: "Suite is not ready!" - - -# Output Suite information -# ----------------------------------------------------------------------------- -- name: "Debug: Suite information" - debug: - msg: - - "Suite version .......................... {{ mas_core_version }}" - - "Suite is ready ......................... 
{{ mas_core_ready }}" diff --git a/ibm/mas_devops/roles/suite_backup_restore/tasks/main.yml b/ibm/mas_devops/roles/suite_backup_restore/tasks/main.yml deleted file mode 100644 index d22d41f048..0000000000 --- a/ibm/mas_devops/roles/suite_backup_restore/tasks/main.yml +++ /dev/null @@ -1,78 +0,0 @@ ---- -# Check mas core backup/restore required variables -# ----------------------------------------------------------------------------- -- name: "Fail if mas_instance_id is not provided" - assert: - that: mas_instance_id is defined and mas_instance_id != "" - fail_msg: "mas_instance_id is required" - -- name: "Fail if masbr_action is not provided" - assert: - that: masbr_action is defined and masbr_action != "" - fail_msg: "masbr_action is required" - -- name: "Set fact: mas core namespace name" - set_fact: - mas_core_namespace: "mas-{{ mas_instance_id }}-core" - - -# Set common job variables -# ----------------------------------------------------------------------------- -- name: "Set fact: common job variables" - set_fact: - masbr_job_component: - name: "core" - instance: "{{ mas_instance_id }}" - namespace: "{{ mas_core_namespace }}" - -- name: "Load mas core variables" - include_tasks: "tasks/{{ masbr_action }}-vars.yml" - - -# Before run tasks -# ------------------------------------------------------------------------- -- name: "Before run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/before_run_tasks.yml" - vars: - _job_type: "{{ masbr_action }}" - _component_before_task_path: "{{ role_path }}/tasks/get-suite-info.yml" - - -- name: "Run {{ masbr_action }} tasks" - block: - # Update job status: New - # ------------------------------------------------------------------------- - - name: "Update job status: New" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "1" - phase: "New" - - - # Run backup/restore tasks for each data type - # ------------------------------------------------------------------------- - - name: "Run {{ masbr_action }} tasks for each data type" - include_tasks: "{{ role_path }}/tasks/{{ masbr_action }}-{{ job_data_item.type }}.yml" - vars: - masbr_job_data_seq: "{{ job_data_item.seq }}" - masbr_job_data_type: "{{ job_data_item.type }}" - loop: "{{ masbr_job_data_list }}" - loop_control: - loop_var: job_data_item - when: job_data_item.type in supported_job_data_item_types - - rescue: - # Update job status: Failed - # ------------------------------------------------------------------------- - - name: "Update job status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_status: - phase: "Failed" - - always: - # After run tasks - # ------------------------------------------------------------------------- - - name: "After run tasks" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/after_run_tasks.yml" diff --git a/ibm/mas_devops/roles/suite_backup_restore/tasks/restore-namespace.yml b/ibm/mas_devops/roles/suite_backup_restore/tasks/restore-namespace.yml deleted file mode 100644 index bffd88a435..0000000000 --- a/ibm/mas_devops/roles/suite_backup_restore/tasks/restore-namespace.yml +++ /dev/null @@ -1,120 +0,0 @@ ---- -# Update namespace resource restore status: InProgress -# ----------------------------------------------------------------------------- -- name: "Update namespace resource restore status: InProgress" - include_tasks: "{{ role_path 
}}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "InProgress" - - -- name: "Restore namespace resources" - block: - # Prepare namespace resource restore folder - # ------------------------------------------------------------------------- - - name: "Set fact: namespace resource restore folder" - set_fact: - masbr_ns_restore_folder: "{{ masbr_local_job_folder }}/{{ masbr_job_data_type }}" - masbr_ns_restore_name: "{{ masbr_job_name }}-{{ masbr_job_data_type }}" - masbr_ns_restore_from_name: "{{ masbr_restore_from }}-{{ masbr_job_data_type }}" - - - name: "Set fact: namespace resource restore log" - set_fact: - masbr_ns_restore_log: "{{ masbr_local_job_folder }}/{{ masbr_ns_restore_name }}.log" - - - name: "Create local restore folder for saving namespace resoruces" - changed_when: true - shell: > - mkdir -p {{ masbr_ns_restore_folder }} && - touch {{ masbr_ns_restore_log }} - - - name: "Debug: namespace resource restore folder" - debug: - msg: "Namespace resource restore folder ........ {{ masbr_ns_restore_folder }}" - - - # Copy backup file from specified storage location - # ------------------------------------------------------------------------- - - name: "Copy backup file from specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_storage_files_to_local.yml" - vars: - masbr_cf_job_type: "backup" - masbr_cf_job_name: "{{ masbr_restore_from }}" - masbr_cf_paths: - - src_file: "{{ masbr_job_data_type }}/{{ masbr_ns_restore_from_name }}.tar.gz" - dest_folder: "{{ masbr_job_data_type }}" - - - # Extract the tar.gz file - # ------------------------------------------------------------------------- - - name: "Extract the tar.gz file" - changed_when: true - shell: > - mkdir -p {{ masbr_ns_restore_folder }}/{{ masbr_ns_restore_from_name }} && - tar -xzf {{ masbr_ns_restore_folder }}/{{ masbr_ns_restore_from_name }}.tar.gz - -C {{ masbr_ns_restore_folder }}/{{ masbr_ns_restore_from_name }} && - ls -lA {{ masbr_ns_restore_folder }}/{{ masbr_ns_restore_from_name }} - register: _extract_output - - - name: "Debug: list extracted files" - debug: - msg: - - "Extract output folder .............. 
{{ masbr_ns_restore_folder }}/{{ masbr_ns_restore_from_name }}" - - "{{ _extract_output.stdout_lines }}" - - - # Restore namespace resoruces - # ------------------------------------------------------------------------- - # Loop through the folder - - name: "Get the list of files from restore directory" - find: - paths: "{{ masbr_ns_restore_folder }}/{{ masbr_ns_restore_from_name }}" - patterns: '*.yml,*.yaml' - recurse: no - register: find_result - - - name: "Apply configs" - kubernetes.core.k8s: - state: present - template: "{{ item.path }}" - with_items: "{{ find_result.files }}" - when: find_result is defined - - - # Update database restore status: Completed - # ------------------------------------------------------------------------- - - name: "Update database restore status: Completed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Completed" - - rescue: - # Update database restore status: Failed - # ------------------------------------------------------------------------- - - name: "Update database restore status: Failed" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/update_job_status.yml" - vars: - _job_data_list: - - seq: "{{ masbr_job_data_seq }}" - phase: "Failed" - - always: - # Copy namespace resource restore log file to specified storage location - # ------------------------------------------------------------------------- - - name: "Create a tar.gz archive of namespace resource restore log" - changed_when: true - shell: > - tar -czf {{ masbr_local_job_folder }}/{{ masbr_ns_restore_name }}-log.tar.gz - -C {{ masbr_local_job_folder }} {{ masbr_ns_restore_name }}.log - - - name: "Copy namespace resource restore log file to specified storage location" - include_tasks: "{{ role_path }}/../../common_tasks/backup_restore/copy_local_files_to_storage.yml" - vars: - masbr_cf_job_type: "restore" - masbr_cf_job_name: "{{ masbr_job_name_final }}" - masbr_cf_paths: - - src_file: "{{ masbr_local_job_folder }}/{{ masbr_ns_restore_name }}-log.tar.gz" - dest_folder: "log" diff --git a/ibm/mas_devops/roles/suite_backup_restore/tasks/restore-vars.yml b/ibm/mas_devops/roles/suite_backup_restore/tasks/restore-vars.yml deleted file mode 100644 index cf23734548..0000000000 --- a/ibm/mas_devops/roles/suite_backup_restore/tasks/restore-vars.yml +++ /dev/null @@ -1,6 +0,0 @@ ---- -- name: "Set fact: default restore job data list" - set_fact: - masbr_job_data_list: - - seq: "1" - type: "namespace" diff --git a/ibm/mas_devops/roles/suite_restore/README.md b/ibm/mas_devops/roles/suite_restore/README.md new file mode 100644 index 0000000000..3fd3b7177a --- /dev/null +++ b/ibm/mas_devops/roles/suite_restore/README.md @@ -0,0 +1,231 @@ +Restore MAS Core +=============================================================================== + +Overview +------------------------------------------------------------------------------- +This role supports restoring the MAS Core namespace resources and supporting +resources in other namespace when provided the backup archive generated from +`suite_backup` role. + +!!! important + Restore can only be made to the an instance with the same MAS instance ID as the backup. + + +Role Variables +------------------------------------------------------------------------------- + +### mas_instance_id +The instance ID of the Maximo Application Suite installation to restore. This +should match the instance ID of the backup. 
+
+- **Required**
+- Environment Variable: `MAS_INSTANCE_ID`
+- Default Value: None
+
+### mas_backup_dir
+The local directory path where the backup files to restore are stored.
+
+- **Required**
+- Environment Variable: `MAS_BACKUP_DIR`
+- Default: None
+- Example: `/tmp/mas_backups`
+
+### suite_backup_version
+The version of the backup file located in the `MAS_BACKUP_DIR` to be used
+in the restore.
+
+- **Required**
+- Default: None
+- Environment Variable: `SUITE_BACKUP_VERSION`
+- Example: `20260116-130937`
+
+### mas_domain
+The domain to use for the MAS Suite instance. If not provided, the domain from the backup will be used.
+
+- **Optional**
+- Environment Variable: `MAS_DOMAIN`
+- Default: `NO_OVERRIDE` (uses value from backup)
+- Example: `mydomain.example.com`
+
+### include_sls_from_backup
+Controls whether to restore the Suite SLS (Suite License Service) configuration from the backup archive.
+This should be used when the registration key stays the same, either because you are also restoring the same
+SLS service or because you are using a centralized SLS service that has not changed. If you plan to install
+and use a new SLS service, set this value to `false` and use the `sls_cfg_file` variable to point
+to the new SLS configuration file.
+
+- **Optional**
+- Default: `true`
+- Environment Variable: `INCLUDE_SLS_FROM_BACKUP`
+
+### sls_url
+Only used when `include_sls_from_backup` is true. The URL for the Suite License Service (SLS). If not provided, the URL from the backup will be used.
+This is used when the domain has changed for SLS but SLS was restored from a backup, so the registration key is the same.
+
+- **Optional**
+- Environment Variable: `SLS_URL`
+- Default: `NO_OVERRIDE` (uses value from backup)
+- Example: `https://sls.example.com`
+
+### sls_cfg_file
+Only used when `include_sls_from_backup` is false. Path to an external SLS configuration YAML file to apply.
+This is used when you want to use SLS configuration from outside the backup archive (e.g., from a separate SLS role execution).
+
+- **Optional**
+- Environment Variable: `SLS_CFG_FILE`
+- Default: None
+- Example: `/tmp/sls_config/sls.yml`
+
+### include_dro_from_backup
+Controls whether to restore the Suite DRO configuration from the backup archive.
+This should be used when the DRO details stay the same because you are using a centralized DRO service that has not changed.
+If you plan to install and use a new DRO service, set this value to `false` and use the `dro_cfg_file` variable to point
+to the new DRO configuration file.
+
+- **Optional**
+- Default: `true`
+- Environment Variable: `INCLUDE_DRO_FROM_BACKUP`
+
+### bas_url
+Only used when `include_dro_from_backup` is true. The URL for the Behavior Analytics Service (BAS). If not provided, the URL from the backup will be used.
+This is used when the domain has changed for DRO but DRO was restored from a backup, so the API key is the same.
+
+- **Optional**
+- Environment Variable: `BAS_URL`
+- Default: `NO_OVERRIDE` (uses value from backup)
+- Example: `https://bas.example.com`
+
+### dro_cfg_file
+Only used when `include_dro_from_backup` is false. Path to an external DRO (BAS) configuration YAML file to apply.
+This is used when you want to use DRO configuration from outside the backup archive (e.g., from a separate DRO role execution).
+ +- **Optional** +- Environment Variable: `DRO_CFG_FILE` +- Default: None +- Example: `/tmp/dro_config/dro.yml` + + +Restore Operations +------------------------------------------------------------------------------- + +This section provides comprehensive information about MAS Core restore operations. + +### Overview + +The MAS Core restore operation performs a complete restoration of a MAS Core installation from a backup created by the [`suite_backup`](suite_backup.md) role. The restore process recreates all namespace resources and supporting resources in the correct order. + +**Important**: Restore can only be made to an instance with the same MAS instance ID as the backup. + +### Restore Process + +The MAS Core restore operation performs the following steps in sequence: + +1. **Validation**: Verifies required variables and backup archive existence +2. **Certificate Manager Check**: Ensures cert-manager is installed in the cluster +3. **Projects Restoration**: Restores the MAS Core namespace +4. **Secrets and ConfigMaps**: Restores all secrets and configuration maps +5. **Operator Resources**: Restores OperatorGroups and Subscriptions +6. **Subscription Wait**: Waits for subscriptions to be ready (up to 30 minutes) +7. **Certificate Manager Resources**: Restores ClusterIssuers, Issuers, and Certificates +8. **MAS Addon Resources**: Restores MVIEdge, ReplicaDB, and GenericAddon resources +9. **MAS Configuration Resources**: Restores all config.mas.ibm.com resources with optional overrides: + - BasCfg (if `include_dro_from_backup` is true, with optional `bas_url` override) + - SlsCfg (if `include_sls_from_backup` is true, with optional `sls_url` override) + - AppCfg, IDPCfg, JdbcCfg, KafkaCfg, MongoCfg, ObjectStorageCfg, etc. +10. **MAS Internal Resources**: Restores CoreIDP resources +11. **Suite Restoration**: Restores the Suite CR with optional `mas_domain` override +12. **Suite Wait**: Waits for Suite to be ready (up to 60 minutes) +13. **Workspace Restoration**: Restores all Workspace CRs +14. 
**Workspace Wait**: Waits for all Workspaces to be ready (up to 60 minutes) + +### Configuration Override Options + +The restore process supports several override options to adapt the backup to a new environment: + +**Domain Override:** +- Use `mas_domain` to change the domain when restoring to a different cluster +- Default: Uses the domain from the backup + +**SLS Configuration:** +- If `include_sls_from_backup=true`: Restores SlsCfg from backup + - Use `sls_url` to override the SLS URL if the domain changed +- If `include_sls_from_backup=false`: Use `sls_cfg_file` to provide external SLS configuration + +**DRO Configuration:** +- If `include_dro_from_backup=true`: Restores BasCfg from backup + - Use `bas_url` to override the BAS URL if the domain changed +- If `include_dro_from_backup=false`: Use `dro_cfg_file` to provide external DRO configuration + +### When to Use + +**Full Restore Scenarios:** +- Disaster recovery after cluster failure +- Migrating MAS Core to a new cluster +- Recreating a deleted MAS Core instance +- Setting up a new environment from backup +- Testing backup integrity in non-production + +**Partial Configuration Scenarios:** +- Restoring with new SLS service (set `include_sls_from_backup=false`) +- Restoring with new DRO service (set `include_dro_from_backup=false`) +- Restoring to cluster with different domain (use `mas_domain` override) + +### Important Considerations + +**Prerequisites:** +- Target cluster must have Certificate Manager installed +- Target cluster must have the same MAS instance ID as the backup +- Required dependencies (MongoDB, Db2, etc.) must be available and accessible +- Sufficient cluster resources (CPU, memory, storage) must be available + +**Instance ID Requirement:** +- Restore can only be made to an instance with the same MAS instance ID +- The instance ID is embedded in resource names and cannot be changed + +**Storage Requirements:** +- Ensure backup directory is accessible from the restore environment +- Verify backup archive integrity before starting restore + +**Security:** +- Backup files contain sensitive data including credentials and certificates +- Ensure secure transfer of backup files to restore environment +- Verify backup file permissions and access controls + +**Configuration Dependencies:** +- If using external SLS/DRO configuration files, ensure they are valid and accessible +- Coordinate with database restore operations to ensure data consistency +- Verify network connectivity to external services (SLS, DRO, databases) + +### Restore Best Practices + +1. **Pre-Restore Validation**: + - Verify backup archive exists and is complete + - Confirm Certificate Manager is installed + - Ensure target cluster has sufficient resources + - Verify MAS instance ID matches the backup + +2. **Dependency Coordination**: + - Restore databases (MongoDB, Db2) before MAS Core + - Restore SLS before MAS Core if using separate SLS + - Restore DRO before MAS Core if using separate DRO + - Ensure all external services are accessible + +3. **Configuration Planning**: + - Determine if domain override is needed + - Decide whether to use backup SLS/DRO or new services + - Prepare external configuration files if needed + - Document any configuration changes + +4. **Post-Restore Verification**: + - Verify Suite status is Ready + - Verify all Workspaces are Ready + - Test application connectivity + - Verify database connections + - Test user authentication + +5. 
**Disaster Recovery**: + - Test restore procedures regularly in non-production + - Document restore procedures and configuration + - Maintain backup version identifiers + - Keep external configuration files secure and accessible + diff --git a/ibm/mas_devops/roles/suite_restore/defaults/main.yml b/ibm/mas_devops/roles/suite_restore/defaults/main.yml new file mode 100644 index 0000000000..a5ce7a0b4a --- /dev/null +++ b/ibm/mas_devops/roles/suite_restore/defaults/main.yml @@ -0,0 +1,22 @@ +--- +mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" + +mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" +suite_backup_version: "{{ lookup('env', 'SUITE_BACKUP_VERSION') }}" + +# Override values for restore - set to NO_OVERRIDE to use values from backup +mas_domain: "{{ lookup('env', 'MAS_DOMAIN') | default('NO_OVERRIDE', true) }}" +sls_url: "{{ lookup('env', 'SLS_URL') | default('NO_OVERRIDE', true) }}" +bas_url: "{{ lookup('env', 'BAS_URL') | default('NO_OVERRIDE', true) }}" + +# Include SLS configuration from backup in restore +include_sls_from_backup: "{{ lookup('env', 'INCLUDE_SLS_FROM_BACKUP') | default('true', true) | bool }}" + +# Path to external SLS config file (used when include_sls_from_backup is false) +sls_cfg_file: "{{ lookup('env', 'SLS_CFG_FILE') | default('', true) }}" + +# Include DRO (BAS) configuration from backup in restore +include_dro_from_backup: "{{ lookup('env', 'INCLUDE_DRO_FROM_BACKUP') | default('true', true) | bool }}" + +# Path to external DRO config file (used when include_dro_from_backup is false) +dro_cfg_file: "{{ lookup('env', 'DRO_CFG_FILE') | default('', true) }}" diff --git a/ibm/mas_devops/roles/suite_restore/tasks/main.yml b/ibm/mas_devops/roles/suite_restore/tasks/main.yml new file mode 100644 index 0000000000..8697c7999a --- /dev/null +++ b/ibm/mas_devops/roles/suite_restore/tasks/main.yml @@ -0,0 +1,495 @@ +--- +# 1. Check mas core restore required variables +# ----------------------------------------------------------------------------- +- name: "Verify required variables for suite restore" + ibm.mas_devops.verify_backup_restore_vars: + component: suite + action: restore + mas_instance_id: "{{ mas_instance_id }}" + mas_backup_dir: "{{ mas_backup_dir }}" + suite_backup_version: "{{ suite_backup_version }}" + +- name: "Set fact: mas core namespace name" + set_fact: + mas_core_namespace: "mas-{{ mas_instance_id }}-core" + +- name: "Set fact: mas suite backup path" + set_fact: + suite_backup_path: "{{ mas_backup_dir }}/backup-{{ suite_backup_version }}-suite" + +# 2. Verify backup archive exists +# ----------------------------------------------------------------------------- +- name: "Check if backup archive exists" + stat: + path: "{{ suite_backup_path }}" + register: backup_path_stat + +- name: "Fail if backup archive does not exist" + fail: + msg: "Backup archive not found at: {{ suite_backup_path }}" + when: not backup_path_stat.stat.exists or not backup_path_stat.stat.isdir + +- name: "Check if backup resources directory exists" + stat: + path: "{{ suite_backup_path }}/resources" + register: backup_resources_stat + +- name: "Fail if backup resources directory does not exist" + fail: + msg: "Backup resources directory not found at: {{ suite_backup_path }}/resources" + when: not backup_resources_stat.stat.exists or not backup_resources_stat.stat.isdir + +# 3. 
Verify cert-manager exists +# ----------------------------------------------------------------------------- +- name: Detect Certificate Manager installation + include_tasks: "{{ role_path }}/../../common_tasks/detect_cert_manager.yml" + +# 4. Display restore information +# ----------------------------------------------------------------------------- +- name: "Display restore information" + debug: + msg: + - "MAS Instance ID: {{ mas_instance_id }}" + - "MAS Core Namespace: {{ mas_core_namespace }}" + - "Backup Version: {{ suite_backup_version }}" + - "Backup Path: {{ suite_backup_path }}" + +# 5. Restore resources in correct order +# ----------------------------------------------------------------------------- +# Step 1: Restore Projects first +- name: "Restore Projects" + ibm.mas_devops.restore_resource: + backup_path: "{{ suite_backup_path }}" + resource_kinds: + - Project + replace_resource: false + register: projects_result + +- name: "Display Projects restore results" + debug: + msg: >- + Projects: {{ projects_result.created_count }} created, + {{ projects_result.updated_count }} updated, + {{ projects_result.skipped_count }} skipped, + {{ projects_result.failed_count }} failed + +# Step 2: Restore Secrets and ConfigMaps +- name: "Restore Secrets and ConfigMaps" + ibm.mas_devops.restore_resource: + backup_path: "{{ suite_backup_path }}" + resource_kinds: + - Secret + - ConfigMap + register: secrets_configmaps_result + when: projects_result.success + +- name: "Display Secrets and ConfigMaps restore results" + debug: + msg: >- + Secrets and ConfigMaps: {{ secrets_configmaps_result.created_count }} created, + {{ secrets_configmaps_result.updated_count }} updated, + {{ secrets_configmaps_result.skipped_count }} skipped, + {{ secrets_configmaps_result.failed_count }} failed + when: projects_result.success + +# Step 3: Restore OperatorGroups and Subscriptions +- name: "Restore OperatorGroups" + ibm.mas_devops.restore_resource: + backup_path: "{{ suite_backup_path }}" + resource_kinds: + - OperatorGroup + register: operatorgroups_result + when: projects_result.success + +- name: "Display OperatorGroups restore results" + debug: + msg: >- + OperatorGroups: {{ operatorgroups_result.created_count }} created, + {{ operatorgroups_result.updated_count }} updated, + {{ operatorgroups_result.skipped_count }} skipped, + {{ operatorgroups_result.failed_count }} failed + when: projects_result.success + +- name: "Restore Subscriptions" + ibm.mas_devops.restore_resource: + backup_path: "{{ suite_backup_path }}" + resource_kinds: + - Subscription + register: subscriptions_result + when: projects_result.success + +- name: "Display Subscriptions restore results" + debug: + msg: >- + Subscriptions: {{ subscriptions_result.created_count }} created, + {{ subscriptions_result.updated_count }} updated, + {{ subscriptions_result.skipped_count }} skipped, + {{ subscriptions_result.failed_count }} failed + when: projects_result.success + +# Wait for subscriptions to be ready +- name: "Wait for Subscriptions to be ready (60s delay)" + ibm.mas_devops.verify_subscriptions: + retries: 30 + delay: 60 + when: + - projects_result.success + - >- + subscriptions_result.created_count > 0 or + subscriptions_result.updated_count > 0 or + subscriptions_result.skipped_count > 0 + +# Step 4: Restore Certificate Manager resources (ClusterIssuers, Issuers, Certificates) +- name: "Restore Certificate Manager resources" + ibm.mas_devops.restore_resource: + backup_path: "{{ suite_backup_path }}" + resource_kinds: + - ClusterIssuer + - 
Issuer + - Certificate + register: certmanager_result + when: projects_result.success + +- name: "Display Certificate Manager restore results" + debug: + msg: >- + Certificate Manager resources: {{ certmanager_result.created_count }} created, + {{ certmanager_result.updated_count }} updated, + {{ certmanager_result.skipped_count }} skipped, + {{ certmanager_result.failed_count }} failed + when: projects_result.success + +# Step 5: Restore MAS Addon resources (addons.mas.ibm.com) +- name: "Restore MAS Addon resources" + ibm.mas_devops.restore_resource: + backup_path: "{{ suite_backup_path }}" + resource_kinds: + - MVIEdge + - ReplicaDB + - GenericAddon + register: addons_result + when: projects_result.success + +- name: "Display MAS Addons restore results" + debug: + msg: >- + MAS Addons: {{ addons_result.created_count }} created, + {{ addons_result.updated_count }} updated, + {{ addons_result.skipped_count }} skipped, + {{ addons_result.failed_count }} failed + when: projects_result.success + +# Step 6: Build MAS Config resource kinds list based on include flags +- name: "Set fact: base config resource kinds" + set_fact: + config_resource_kinds: + - AppCfg + - IDPCfg + - JdbcCfg + - KafkaCfg + - MongoCfg + - ObjectStorageCfg + - PushNotificationCfg + - ScimCfg + - SmtpCfg + - WatsonStudioCfg + config_override_values: {} + +- name: "Add BasCfg to restore if include_dro_from_backup is true" + block: + - name: "Add BasCfg to config resource kinds" + set_fact: + config_resource_kinds: "{{ config_resource_kinds + ['BasCfg'] }}" + + - name: "Add BasCfg override values" + set_fact: + config_override_values: "{{ config_override_values | combine({'BasCfg': [{'spec.config.url': bas_url}]}) }}" + when: include_dro_from_backup | bool + +- name: "Apply external DRO config if include_dro_from_backup is false and dro_cfg_file is provided" + block: + - name: "Verify dro_cfg_file exists" + stat: + path: "{{ dro_cfg_file }}" + register: dro_cfg_file_stat + + - name: "Fail if dro_cfg_file does not exist" + fail: + msg: "DRO config file not found at: {{ dro_cfg_file }}" + when: not dro_cfg_file_stat.stat.exists + + - name: "Apply DRO config file" + kubernetes.core.k8s: + state: present + src: "{{ dro_cfg_file }}" + when: + - not (include_dro_from_backup | bool) + - dro_cfg_file is defined + - dro_cfg_file != "" + +- name: "Add SlsCfg to restore if include_sls_from_backup is true" + block: + - name: "Add SlsCfg to config resource kinds" + set_fact: + config_resource_kinds: "{{ config_resource_kinds + ['SlsCfg'] }}" + + - name: "Add SlsCfg override values" + set_fact: + config_override_values: "{{ config_override_values | combine({'SlsCfg': [{'spec.config.url': sls_url}]}) }}" + when: include_sls_from_backup | bool + +- name: "Apply external SLS config if include_sls_from_backup is false and sls_cfg_file is provided" + block: + - name: "Verify sls_cfg_file exists" + stat: + path: "{{ sls_cfg_file }}" + register: sls_cfg_file_stat + + - name: "Fail if sls_cfg_file does not exist" + fail: + msg: "SLS config file not found at: {{ sls_cfg_file }}" + when: not sls_cfg_file_stat.stat.exists + + - name: "Apply SLS config file" + kubernetes.core.k8s: + state: present + src: "{{ sls_cfg_file }}" + when: + - not (include_sls_from_backup | bool) + - sls_cfg_file is defined + - sls_cfg_file != "" + +# Step 7: Restore MAS Config resources (config.mas.ibm.com) +- name: "Restore MAS Config resources" + ibm.mas_devops.restore_resource: + backup_path: "{{ suite_backup_path }}" + resource_kinds: "{{ config_resource_kinds }}" + 
override_values: "{{ config_override_values }}" + register: configs_result + when: projects_result.success + +- name: "Display MAS Configs restore results" + debug: + msg: >- + MAS Configs: {{ configs_result.created_count }} created, + {{ configs_result.updated_count }} updated, + {{ configs_result.skipped_count }} skipped, + {{ configs_result.failed_count }} failed + when: projects_result.success + +# Step 8: Restore MAS Internal resources (internal.mas.ibm.com) +- name: "Restore MAS Internal resources" + ibm.mas_devops.restore_resource: + backup_path: "{{ suite_backup_path }}" + resource_kinds: + - CoreIDP + register: internal_result + when: projects_result.success + +- name: "Display MAS Internal restore results" + debug: + msg: >- + MAS Internal: {{ internal_result.created_count }} created, + {{ internal_result.updated_count }} updated, + {{ internal_result.skipped_count }} skipped, + {{ internal_result.failed_count }} failed + when: projects_result.success + +# Step 9: Restore Suite +- name: "Restore Suite" + ibm.mas_devops.restore_resource: + backup_path: "{{ suite_backup_path }}" + resource_kinds: + - Suite + override_values: + Suite: + - spec.domain: "{{ mas_domain }}" + register: suite_result + when: projects_result.success + +- name: "Display Suite restore results" + debug: + msg: >- + Suite: {{ suite_result.created_count }} created, + {{ suite_result.updated_count }} updated, + {{ suite_result.skipped_count }} skipped, + {{ suite_result.failed_count }} failed + when: projects_result.success + +# Wait for Suite to be ready before restoring Workspaces +- name: "Wait for Suite to be ready (60s delay)" + kubernetes.core.k8s_info: + api_version: core.mas.ibm.com/v1 + kind: Suite + name: "{{ mas_instance_id }}" + namespace: "{{ mas_core_namespace }}" + register: suite_status + retries: 60 + delay: 60 + until: + - suite_status.resources is defined + - suite_status.resources | length > 0 + - suite_status.resources[0].status is defined + - suite_status.resources[0].status.conditions is defined + - >- + suite_status.resources[0].status.conditions | + selectattr('type', 'equalto', 'Ready') | + selectattr('status', 'equalto', 'True') | + list | length > 0 + when: + - projects_result.success + - >- + suite_result.created_count > 0 or + suite_result.updated_count > 0 or + suite_result.skipped_count > 0 + +# Step 10: Restore Workspaces +- name: "Restore Workspaces" + ibm.mas_devops.restore_resource: + backup_path: "{{ suite_backup_path }}" + resource_kinds: + - Workspace + register: workspace_result + when: projects_result.success + +- name: "Display Workspace restore results" + debug: + msg: >- + Workspaces: {{ workspace_result.created_count }} created, + {{ workspace_result.updated_count }} updated, + {{ workspace_result.skipped_count }} skipped, + {{ workspace_result.failed_count }} failed + when: projects_result.success + +# Wait for all Workspaces to be ready before finishing +- name: "Wait for all Workspaces to be ready (60s delay)" + kubernetes.core.k8s_info: + api_version: core.mas.ibm.com/v1 + kind: Workspace + namespace: "{{ mas_core_namespace }}" + label_selectors: + - "mas.ibm.com/instanceId={{ mas_instance_id }}" + register: workspace_status + retries: 60 + delay: 60 + until: + - workspace_status.resources is defined + - workspace_status.resources | length > 0 + - >- + workspace_status.resources | + selectattr('status.conditions', 'defined') | + list | length == workspace_status.resources | length + - >- + workspace_status.resources | + map(attribute='status.conditions') | + 
map('selectattr', 'type', 'equalto', 'Ready') | + map('selectattr', 'status', 'equalto', 'True') | + map('list') | select | list | length == workspace_status.resources | length + when: + - projects_result.success + - >- + workspace_result.created_count > 0 or + workspace_result.updated_count > 0 or + workspace_result.skipped_count > 0 + +# 6. Calculate total results +# ----------------------------------------------------------------------------- +- name: "Calculate total restore results" + set_fact: + total_created: >- + {{ + (projects_result.created_count | default(0)) + + (secrets_configmaps_result.created_count | default(0)) + + (operatorgroups_result.created_count | default(0)) + + (subscriptions_result.created_count | default(0)) + + (certmanager_result.created_count | default(0)) + + (addons_result.created_count | default(0)) + + (configs_result.created_count | default(0)) + + (internal_result.created_count | default(0)) + + (suite_result.created_count | default(0)) + + (workspace_result.created_count | default(0)) + }} + total_updated: >- + {{ + (projects_result.updated_count | default(0)) + + (secrets_configmaps_result.updated_count | default(0)) + + (operatorgroups_result.updated_count | default(0)) + + (subscriptions_result.updated_count | default(0)) + + (certmanager_result.updated_count | default(0)) + + (addons_result.updated_count | default(0)) + + (configs_result.updated_count | default(0)) + + (internal_result.updated_count | default(0)) + + (suite_result.updated_count | default(0)) + + (workspace_result.updated_count | default(0)) + }} + total_skipped: >- + {{ + (projects_result.skipped_count | default(0)) + + (secrets_configmaps_result.skipped_count | default(0)) + + (operatorgroups_result.skipped_count | default(0)) + + (subscriptions_result.skipped_count | default(0)) + + (certmanager_result.skipped_count | default(0)) + + (addons_result.skipped_count | default(0)) + + (configs_result.skipped_count | default(0)) + + (internal_result.skipped_count | default(0)) + + (suite_result.skipped_count | default(0)) + + (workspace_result.skipped_count | default(0)) + }} + total_failed: >- + {{ + (projects_result.failed_count | default(0)) + + (secrets_configmaps_result.failed_count | default(0)) + + (operatorgroups_result.failed_count | default(0)) + + (subscriptions_result.failed_count | default(0)) + + (certmanager_result.failed_count | default(0)) + + (addons_result.failed_count | default(0)) + + (configs_result.failed_count | default(0)) + + (internal_result.failed_count | default(0)) + + (suite_result.failed_count | default(0)) + + (workspace_result.failed_count | default(0)) + }} + +- name: "Display total restore results" + debug: + msg: + - >- + Restore completed{{ ' with failures' if total_failed | int > 0 + else ' successfully' }} + - "Total resources created: {{ total_created }}" + - "Total resources updated: {{ total_updated }}" + - "Total resources skipped: {{ total_skipped }}" + - "Total resources failed: {{ total_failed }}" + +# 7. 
Fail task if any errors occurred +# ----------------------------------------------------------------------------- +- name: "Collect all failed resources" + set_fact: + all_failed_resources: >- + {{ + (projects_result.failed_resources | default([])) + + (secrets_configmaps_result.failed_resources | default([])) + + (operatorgroups_result.failed_resources | default([])) + + (subscriptions_result.failed_resources | default([])) + + (certmanager_result.failed_resources | default([])) + + (addons_result.failed_resources | default([])) + + (configs_result.failed_resources | default([])) + + (internal_result.failed_resources | default([])) + + (suite_result.failed_resources | default([])) + + (workspace_result.failed_resources | default([])) + }} + +- name: "Display failed resources" + debug: + msg: + - "Failed resources:" + - "{{ all_failed_resources | to_nice_yaml }}" + when: total_failed | int > 0 + +- name: "Fail if restore had errors" + fail: + msg: | + Restore failed for {{ total_failed }} resource(s): + {% for resource in all_failed_resources %} + - {{ resource.description }}: {{ resource.error }} + {% endfor %} + when: total_failed | int > 0 diff --git a/ibm/mas_devops/roles/upload_backup_archive/README.md b/ibm/mas_devops/roles/upload_backup_archive/README.md new file mode 100644 index 0000000000..138c1b36d3 --- /dev/null +++ b/ibm/mas_devops/roles/upload_backup_archive/README.md @@ -0,0 +1,286 @@ +# upload_backup_archive +Creates a compressed archive of MAS backup directories and uploads it to AWS S3 or Artifactory. + +This role automates the process of packaging MAS backup directories into a single tar.gz archive and uploading it to a remote storage location. It supports multiple backup components (catalog, cert-manager, SLS, MongoDB, Db2, and MAS Suite) and allows for component-specific backup versions. The role intelligently detects which backup directories exist and only archives those that are present. + +Key features: +- Creates compressed tar.gz archives of MAS backup directories +- Supports uploading to AWS S3 or S3-compatible storage +- Supports uploading to Artifactory repositories +- Handles component-specific backup versions +- Automatic cleanup of temporary files +- Configurable upload timeouts for large archives + +## Prerequisites + +### For S3 Upload +- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) or boto3 Python library must be installed +- `amazon.aws` Ansible collection must be installed +- AWS credentials with S3 write permissions +- S3 bucket must exist and be accessible + +### For Artifactory Upload +- `curl` command-line tool must be installed +- Artifactory API token with upload permissions +- Artifactory repository must exist and be accessible + +## Role Variables + +### Required Variables + +#### mas_instance_id +Instance ID of the MAS instance. This is used to identify the backup directories. + +- **Required** +- Environment Variable: `MAS_INSTANCE_ID` +- Default Value: None + +#### mas_backup_dir +Directory containing the MAS backup folders. This is the parent directory where all component backup directories are located. + +- **Required** +- Environment Variable: `MAS_BACKUP_DIR` +- Default Value: None + +#### backup_version +Version identifier for the backup. This is used as the default version for all component backups unless component-specific versions are provided. 
+ +- **Required** +- Environment Variable: `BACKUP_VERSION` +- Default Value: None + +### Component-Specific Backup Versions + +These variables allow you to specify different backup versions for individual components. If not provided, they default to the value of `backup_version`. + +#### ibm_catalogs_backup_version +Backup version for the catalog component. + +- **Optional** +- Environment Variable: `IBM_CATALOGS_BACKUP_VERSION` +- Default Value: Value of `backup_version` + +#### certmanager_backup_version +Backup version for the cert-manager component. + +- **Optional** +- Environment Variable: `CERTMANAGER_BACKUP_VERSION` +- Default Value: Value of `backup_version` + +#### mongodb_backup_version +Backup version for the MongoDB component. + +- **Optional** +- Environment Variable: `MONGODB_BACKUP_VERSION` +- Default Value: Value of `backup_version` + +#### sls_backup_version +Backup version for the SLS (Suite License Service) component. + +- **Optional** +- Environment Variable: `SLS_BACKUP_VERSION` +- Default Value: Value of `backup_version` + +#### db2_backup_version +Backup version for the Db2 component. + +- **Optional** +- Environment Variable: `DB2_BACKUP_VERSION` +- Default Value: Value of `backup_version` + +#### suite_backup_version +Backup version for the MAS Suite component. + +- **Optional** +- Environment Variable: `SUITE_BACKUP_VERSION` +- Default Value: Value of `backup_version` + +#### manage_backup_version +Backup version for the MAS Manage app. + +- **Optional** +- Environment Variable: `MANAGE_BACKUP_VERSION` +- Default Value: Value of `backup_version` + +### S3 Upload Variables + +Provide these variables to upload the backup archive to AWS S3 or S3-compatible storage. If S3 credentials are provided, S3 upload takes precedence over Artifactory. + +#### aws_access_key_id +AWS access key ID for authentication. + +- **Required for S3 upload** +- Environment Variable: `S3_ACCESS_KEY_ID` +- Default Value: None + +#### aws_secret_access_key +AWS secret access key for authentication. + +- **Required for S3 upload** +- Environment Variable: `S3_SECRET_ACCESS_KEY` +- Default Value: None + +#### s3_bucket_name +Name of the S3 bucket where the archive will be uploaded. + +- **Required for S3 upload** +- Environment Variable: `S3_BUCKET_NAME` +- Default Value: None + +#### s3_region +AWS region where the S3 bucket is located. + +- **Optional** +- Environment Variable: `S3_REGION` +- Default Value: `us-east-1` + +#### s3_endpoint_url +Custom S3 endpoint URL for S3-compatible storage services (e.g., MinIO, Wasabi, IBM Cloud Object Storage). + +- **Optional** +- Environment Variable: `S3_ENDPOINT_URL` +- Default Value: None (uses AWS S3 endpoints) + +### Artifactory Upload Variables + +Provide these variables to upload the backup archive to Artifactory. Artifactory upload is used only if S3 credentials are not provided. + +#### artifactory_username +Artifactory username for authentication. + +- **Required for Artifactory upload** +- Environment Variable: `ARTIFACTORY_USERNAME` +- Default Value: None + +#### artifactory_token +Artifactory API token for authentication. + +- **Required for Artifactory upload** +- Environment Variable: `ARTIFACTORY_TOKEN` +- Default Value: None + +#### artifactory_repository +Name of the Artifactory repository where the archive will be uploaded. 
+
+- **Required for Artifactory upload**
+- Environment Variable: `ARTIFACTORY_REPOSITORY`
+- Default Value: None
+
+#### artifactory_url
+Base URL of the Artifactory server (e.g., `https://artifactory.example.com/artifactory`).
+
+- **Optional**
+- Environment Variable: `ARTIFACTORY_URL`
+- Default Value: `https://na.artifactory.swg-devops.com/artifactory`
+
+### General Configuration
+
+#### backup_temp_dir
+Temporary directory where the archive will be created before upload. The directory is created if it doesn't exist and cleaned up after upload.
+
+- **Optional**
+- Environment Variable: None
+- Default Value: `{{ mas_backup_dir }}/mas-{{ mas_instance_id }}-backup-{{ backup_version }}`
+
+#### upload_timeout
+Maximum time in seconds to wait for the upload to complete. Useful for large archives or slow network connections.
+
+- **Optional**
+- Environment Variable: None
+- Default Value: `10800` (3 hours)
+
+## Example Playbook
+
+### S3 Upload
+After installing the Ansible Collection you can include this role in your own custom playbooks.
+
+```yaml
+- hosts: localhost
+  vars:
+    mas_instance_id: inst1
+    mas_backup_dir: /backup/mas
+    backup_version: "20260117-191500"
+    aws_access_key_id: "{{ lookup('env', 'AWS_ACCESS_KEY_ID') }}"
+    aws_secret_access_key: "{{ lookup('env', 'AWS_SECRET_ACCESS_KEY') }}"
+    s3_bucket_name: my-mas-backups
+    s3_region: us-west-2
+  roles:
+    - ibm.mas_devops.upload_backup_archive
+```
+
+### Artifactory Upload
+
+```yaml
+- hosts: localhost
+  vars:
+    mas_instance_id: inst1
+    mas_backup_dir: /backup/mas
+    backup_version: "20260117-191500"
+    artifactory_username: "{{ lookup('env', 'ARTIFACTORY_USERNAME') }}"
+    artifactory_token: "{{ lookup('env', 'ARTIFACTORY_TOKEN') }}"
+    artifactory_url: https://artifactory.example.com/artifactory
+    artifactory_repository: mas-backups
+  roles:
+    - ibm.mas_devops.upload_backup_archive
+```
+
+### S3-Compatible Storage (IBM Cloud, MinIO, Wasabi, etc.)
+
+```yaml
+- hosts: localhost
+  vars:
+    mas_instance_id: inst1
+    mas_backup_dir: /backup/mas
+    backup_version: "20260117-191500"
+    aws_access_key_id: "{{ lookup('env', 'S3_ACCESS_KEY') }}"
+    aws_secret_access_key: "{{ lookup('env', 'S3_SECRET_KEY') }}"
+    s3_bucket_name: mas-backups
+    s3_region: us-east-1
+    s3_endpoint_url: https://s3.example.com
+  roles:
+    - ibm.mas_devops.upload_backup_archive
+```
+
+### Component-Specific Backup Versions
+
+```yaml
+- hosts: localhost
+  vars:
+    mas_instance_id: inst1
+    mas_backup_dir: /backup/mas
+    backup_version: "20260117-191500"
+    # Override specific component versions
+    mongodb_backup_version: "20260116-120000"
+    db2_backup_version: "20260115-180000"
+    aws_access_key_id: "{{ lookup('env', 'AWS_ACCESS_KEY_ID') }}"
+    aws_secret_access_key: "{{ lookup('env', 'AWS_SECRET_ACCESS_KEY') }}"
+    s3_bucket_name: my-mas-backups
+  roles:
+    - ibm.mas_devops.upload_backup_archive
+```
+
+## Run Role Playbook
+After installing the Ansible Collection you can easily run the role standalone using the `run_role` playbook provided.
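+
+Note that `MAS_INSTANCE_ID` is required in addition to the destination-specific variables shown in the examples below. A minimal sketch of the exports that are common to every upload destination (the values are placeholders to replace with your own):
+
+```bash
+# Common variables required by upload_backup_archive regardless of upload destination
+export MAS_INSTANCE_ID=inst1            # MAS instance ID, used in the generated archive names
+export MAS_BACKUP_DIR=/backup/mas       # parent directory containing the backup folders
+export BACKUP_VERSION=20260117-191500   # default version applied to all components
+```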
+ +### S3 Upload + +```bash +export MAS_BACKUP_DIR=/backup/mas +export BACKUP_VERSION=20260117-191500 +export S3_ACCESS_KEY_ID=your_access_key +export S3_SECRET_ACCESS_KEY=your_secret_key +export S3_BUCKET_NAME=my-mas-backups +export S3_REGION=us-west-2 +ROLE_NAME=upload_backup_archive ansible-playbook ibm.mas_devops.run_role +``` + +### Artifactory Upload + +```bash +export MAS_BACKUP_DIR=/backup/mas +export BACKUP_VERSION=20260117-191500 +export ARTIFACTORY_USERNAME=your_username +export ARTIFACTORY_TOKEN=your_token +export ARTIFACTORY_URL=https://artifactory.example.com/artifactory +export ARTIFACTORY_REPOSITORY=mas-backups +ROLE_NAME=upload_backup_archive ansible-playbook ibm.mas_devops.run_role +``` + +## License +EPL-2.0 \ No newline at end of file diff --git a/ibm/mas_devops/roles/upload_backup_archive/defaults/main.yml b/ibm/mas_devops/roles/upload_backup_archive/defaults/main.yml new file mode 100644 index 0000000000..a39a8dd0f1 --- /dev/null +++ b/ibm/mas_devops/roles/upload_backup_archive/defaults/main.yml @@ -0,0 +1,38 @@ +--- +# Required variables + +# Directory containing the backup folders +mas_backup_dir: "{{ lookup('env', 'MAS_BACKUP_DIR') }}" + +mas_instance_id: "{{ lookup('env', 'MAS_INSTANCE_ID') }}" + +# Backup version to use for all component's backup +backup_version: "{{ lookup('env', 'BACKUP_VERSION') }}" + +# Specific backup versions for each component (optional - override backup_version) +ibm_catalogs_backup_version: "{{ lookup('env', 'IBM_CATALOGS_BACKUP_VERSION') | default (backup_version, true) }}" +certmanager_backup_version: "{{ lookup('env', 'CERTMANAGER_BACKUP_VERSION') | default (backup_version, true) }}" +mongodb_backup_version: "{{ lookup('env', 'MONGODB_BACKUP_VERSION') | default (backup_version, true) }}" +sls_backup_version: "{{ lookup('env', 'SLS_BACKUP_VERSION') | default (backup_version, true) }}" +db2_backup_version: "{{ lookup('env', 'DB2_BACKUP_VERSION') | default (backup_version, true) }}" +suite_backup_version: "{{ lookup('env', 'SUITE_BACKUP_VERSION') | default (backup_version, true) }}" +manage_backup_version: "{{ lookup('env', 'MANAGE_BACKUP_VERSION') | default (backup_version, true) }}" + +# S3 Configuration (provide these to upload to S3) +aws_access_key_id: "{{ lookup('env', 'S3_ACCESS_KEY_ID') }}" +aws_secret_access_key: "{{ lookup('env', 'S3_SECRET_ACCESS_KEY') }}" +s3_bucket_name: "{{ lookup('env', 'S3_BUCKET_NAME') }}" +s3_region: "{{ lookup('env', 'S3_REGION') | default('us-east-1', true) }}" +s3_endpoint_url: "{{ lookup('env', 'S3_ENDPOINT_URL') }}" + +# Artifactory Configuration (provide these to upload to Artifactory) +artifactory_username: "{{ lookup('env', 'ARTIFACTORY_USERNAME') }}" +artifactory_token: "{{ lookup('env', 'ARTIFACTORY_TOKEN') }}" +artifactory_url: "{{ lookup('env', 'ARTIFACTORY_URL') | default('https://na.artifactory.swg-devops.com/artifactory', true) }}" +artifactory_repository: "{{ lookup('env', 'ARTIFACTORY_REPOSITORY') }}" + +# General settings +# Archive naming pattern: mas-{{ mas_instance_id }}-{{ backup_directory_name }}.tar.gz +# Multiple archives will be created, one for each backup directory +backup_temp_dir: "{{ mas_backup_dir }}/mas-{{ mas_instance_id }}-backup-{{ backup_version }}" +upload_timeout: 10800 # Upload timeout in seconds (3 hours) diff --git a/ibm/mas_devops/roles/upload_backup_archive/meta/main.yml b/ibm/mas_devops/roles/upload_backup_archive/meta/main.yml new file mode 100644 index 0000000000..ebe3aabfe8 --- /dev/null +++ b/ibm/mas_devops/roles/upload_backup_archive/meta/main.yml @@ 
-0,0 +1,19 @@ +--- +galaxy_info: + author: IBM + description: Upload MAS backup archive to S3 or Artifactory + company: IBM + license: EPL-2.0 + min_ansible_version: "2.9" + platforms: + - name: EL + versions: + - "8" + galaxy_tags: + - ibm + - mas + - backup + - s3 + - artifactory + +dependencies: [] diff --git a/ibm/mas_devops/roles/upload_backup_archive/tasks/create_archive.yml b/ibm/mas_devops/roles/upload_backup_archive/tasks/create_archive.yml new file mode 100644 index 0000000000..31a2af600e --- /dev/null +++ b/ibm/mas_devops/roles/upload_backup_archive/tasks/create_archive.yml @@ -0,0 +1,81 @@ +--- +# Create tar archive of backup directories +- name: "Set backup directories to archive" + ansible.builtin.set_fact: + backup_directories: + - "backup-{{ ibm_catalogs_backup_version }}-catalog" + - "backup-{{ certmanager_backup_version }}-certmanager" + - "backup-{{ mongodb_backup_version }}-mongoce" + - "backup-{{ sls_backup_version }}-sls" + - "backup-{{ suite_backup_version }}-suite" + - "backup-{{ manage_backup_version }}-app-manage" + - "backup-{{ db2_backup_version }}-db2u-manage" + +- name: "Verify backup directory exists" + ansible.builtin.stat: + path: "{{ mas_backup_dir }}" + register: backup_dir_stat + +- name: "Fail if backup directory does not exist" + ansible.builtin.fail: + msg: "Backup directory {{ mas_backup_dir }} does not exist" + when: not backup_dir_stat.stat.exists or not backup_dir_stat.stat.isdir + +- name: "Check which backup directories exist" + ansible.builtin.stat: + path: "{{ mas_backup_dir }}/{{ item }}" + register: backup_dirs_stat + loop: "{{ backup_directories }}" + +- name: "Set list of existing backup directories" + ansible.builtin.set_fact: + existing_backup_dirs: "{{ backup_dirs_stat.results | selectattr('stat.exists', 'equalto', true) | map(attribute='item') | list }}" + +- name: "Display existing backup directories" + ansible.builtin.debug: + msg: "Found {{ existing_backup_dirs | length }} backup directories: {{ existing_backup_dirs }}" + +- name: "Fail if no backup directories found" + ansible.builtin.fail: + msg: "No backup directories found in {{ mas_backup_dir }} for version {{ backup_version }}" + when: existing_backup_dirs | length == 0 + +- name: "Remove temporary dir if it exists to avoid conflicts" + ansible.builtin.file: + path: "{{ backup_temp_dir }}" + state: absent + +- name: "Create temporary directory for archives" + ansible.builtin.file: + path: "{{ backup_temp_dir }}" + state: directory + mode: '0755' + +- name: "Create tar.gz archive for each backup directory" + ansible.builtin.command: + cmd: "tar -czf {{ backup_temp_dir }}/mas-{{ mas_instance_id }}-{{ item }}.tar.gz -C {{ mas_backup_dir }} {{ item }}" + register: tar_results + changed_when: tar_results.rc == 0 + loop: "{{ existing_backup_dirs }}" + +- name: "Verify archives were created" + ansible.builtin.stat: + path: "{{ backup_temp_dir }}/mas-{{ mas_instance_id }}-{{ item }}.tar.gz" + register: archive_stats + loop: "{{ existing_backup_dirs }}" + +- name: "Fail if any archive was not created" + ansible.builtin.fail: + msg: "Failed to create archive {{ backup_temp_dir }}/mas-{{ mas_instance_id }}-{{ item.item }}.tar.gz" + when: not item.stat.exists + loop: "{{ archive_stats.results }}" + +- name: "Build list of created archives" + ansible.builtin.set_fact: + backup_archives: "{{ archive_stats.results | map(attribute='stat.path') | list }}" + +- name: "Display archive information" + ansible.builtin.debug: + msg: + - "Created {{ backup_archives | length }} archives successfully:" 
+      - "{% for item in archive_stats.results %} - {{ item.stat.path | basename }}: {{ (item.stat.size / 1024 / 1024) | round(2) }} MB{% endfor %}"
diff --git a/ibm/mas_devops/roles/upload_backup_archive/tasks/main.yml b/ibm/mas_devops/roles/upload_backup_archive/tasks/main.yml
new file mode 100644
index 0000000000..278cddd10a
--- /dev/null
+++ b/ibm/mas_devops/roles/upload_backup_archive/tasks/main.yml
@@ -0,0 +1,61 @@
+---
+# Validate required variables
+- name: "Fail if mas_backup_dir is not defined"
+  ansible.builtin.fail:
+    msg: "mas_backup_dir is required but not defined"
+  when: mas_backup_dir is not defined or mas_backup_dir == ''
+
+- name: "Fail if mas_instance_id is not defined"
+  ansible.builtin.fail:
+    msg: "mas_instance_id is required but not defined"
+  when: mas_instance_id is not defined or mas_instance_id == ''
+
+- name: "Fail if backup_version is not defined"
+  ansible.builtin.fail:
+    msg: "backup_version is required but not defined"
+  when: backup_version is not defined or backup_version == ''
+
+# Determine upload destination
+- name: "Check if S3 credentials are provided"
+  ansible.builtin.set_fact:
+    upload_to_s3: "{{ (aws_access_key_id is defined and aws_access_key_id != '') and (aws_secret_access_key is defined and aws_secret_access_key != '') and (s3_bucket_name is defined and s3_bucket_name != '') }}"
+
+- name: "Check if Artifactory credentials are provided"
+  ansible.builtin.set_fact:
+    upload_to_artifactory: "{{ (artifactory_username is defined and artifactory_username != '') and (artifactory_token is defined and artifactory_token != '') and (artifactory_url is defined and artifactory_url != '') }}"
+
+- name: "Fail if neither S3 nor Artifactory credentials are provided"
+  ansible.builtin.fail:
+    msg: "Either S3 credentials (aws_access_key_id, aws_secret_access_key, s3_bucket_name) or Artifactory credentials (artifactory_username, artifactory_token, artifactory_url) must be provided"
+  when: not upload_to_s3 and not upload_to_artifactory
+
+- name: "Display upload destination"
+  ansible.builtin.debug:
+    msg: "Will upload to: {{ 'S3' if upload_to_s3 else 'Artifactory' }}"
+
+# Create tar archive
+- name: "Create backup archive"
+  ansible.builtin.include_tasks: create_archive.yml
+
+# Upload to S3 or Artifactory
+- name: "Upload to S3"
+  ansible.builtin.include_tasks: upload_to_s3.yml
+  when: upload_to_s3
+
+- name: "Upload to Artifactory"
+  ansible.builtin.include_tasks: upload_to_artifactory.yml
+  when: upload_to_artifactory and not upload_to_s3
+
+# Cleanup
+- name: "Remove temporary archive files"
+  ansible.builtin.file:
+    path: "{{ item }}"
+    state: absent
+  loop: "{{ backup_archives }}"
+  when: backup_archives is defined
+
+- name: "Remove temporary directory"
+  ansible.builtin.file:
+    path: "{{ backup_temp_dir }}"
+    state: absent
+  when: backup_temp_dir is defined
diff --git a/ibm/mas_devops/roles/upload_backup_archive/tasks/upload_to_artifactory.yml b/ibm/mas_devops/roles/upload_backup_archive/tasks/upload_to_artifactory.yml
new file mode 100644
index 0000000000..56aedb0e31
--- /dev/null
+++ b/ibm/mas_devops/roles/upload_backup_archive/tasks/upload_to_artifactory.yml
@@ -0,0 +1,51 @@
+---
+# Upload backup archives to Artifactory
+- name: "Validate Artifactory repository is defined"
+  ansible.builtin.fail:
+    msg: "artifactory_repository is required when uploading to Artifactory"
+  when: artifactory_repository is not defined or artifactory_repository == ''
+
+- name: "Display Artifactory upload information"
+  ansible.builtin.debug:
+    msg:
+      - "Uploading to Artifactory: {{ artifactory_url }}"
+      - "Repository: {{ artifactory_repository }}"
+      - "Number of archives: {{ backup_archives | length }}"
+
+- name: "Upload archives to Artifactory using curl"
+  ansible.builtin.command:
+    cmd: >
+      curl -X PUT
+      -u {{ artifactory_username }}:{{ artifactory_token }}
+      -T {{ item }}
+      {{ artifactory_url }}/{{ artifactory_repository }}/mas-{{ mas_instance_id }}-backups/{{ item | basename }}
+      --max-time {{ upload_timeout }}
+      --connect-timeout 60
+      --fail
+      --silent
+      --show-error
+      --output /dev/null
+      --write-out "%{http_code}"
+  register: artifactory_upload_results
+  changed_when: artifactory_upload_results.rc == 0
+  failed_when: false
+  no_log: true
+  loop: "{{ backup_archives }}"
+
+- name: "Process upload results"
+  ansible.builtin.set_fact:
+    artifactory_upload_summary: "{{ artifactory_upload_summary | default([]) + [{'archive': item.item | basename, 'success': item.rc == 0, 'http_code': item.stdout | regex_search('[0-9]{3}$') if item.rc == 0 else 'N/A'}] }}"
+  loop: "{{ artifactory_upload_results.results }}"
+
+- name: "Display Artifactory upload results"
+  ansible.builtin.debug:
+    msg:
+      - "Upload summary:"
+      - "{% for result in artifactory_upload_summary %} - {{ result.archive }}: {{ 'SUCCESS' if result.success else 'FAILED' }} (HTTP {{ result.http_code }}){% endfor %}"
+
+- name: "Check for failed uploads"
+  ansible.builtin.set_fact:
+    failed_artifactory_uploads: "{{ artifactory_upload_summary | selectattr('success', 'equalto', false) | map(attribute='archive') | list }}"
+
+- name: "Fail if any Artifactory upload failed"
+  ansible.builtin.fail:
+    msg: "Failed to upload {{ failed_artifactory_uploads | length }} archive(s) to Artifactory: {{ failed_artifactory_uploads | join(', ') }}"
+  when: failed_artifactory_uploads | length > 0
diff --git a/ibm/mas_devops/roles/upload_backup_archive/tasks/upload_to_s3.yml b/ibm/mas_devops/roles/upload_backup_archive/tasks/upload_to_s3.yml
new file mode 100644
index 0000000000..d05eb54267
--- /dev/null
+++ b/ibm/mas_devops/roles/upload_backup_archive/tasks/upload_to_s3.yml
@@ -0,0 +1,36 @@
+---
+# Upload backup archives to S3
+- name: "Display S3 upload information"
+  ansible.builtin.debug:
+    msg:
+      - "Uploading to S3 bucket: {{ s3_bucket_name }}"
+      - "Region: {{ s3_region }}"
+      - "Number of archives: {{ backup_archives | length }}"
+
+- name: "Upload archives to S3"
+  ibm.mas_devops.upload_to_s3:
+    aws_access_key_id: "{{ aws_access_key_id }}"
+    aws_secret_access_key: "{{ aws_secret_access_key }}"
+    bucket_name: "{{ s3_bucket_name }}"
+    object_name: "mas-{{ mas_instance_id }}-backups/{{ item | basename }}"
+    file_path: "{{ item }}"
+    region_name: "{{ s3_region }}"
+    endpoint_url: "{{ s3_endpoint_url | default(omit) }}"
+  register: s3_upload_results
+  poll: 10
+  loop: "{{ backup_archives }}"
+
+- name: "Display S3 upload results"
+  ansible.builtin.debug:
+    msg: "Successfully uploaded {{ item.item | basename }} to S3 bucket {{ s3_bucket_name }}/mas-{{ mas_instance_id }}-backups"
+  when: item.success | bool
+  loop: "{{ s3_upload_results.results }}"
+
+- name: "Check for failed uploads"
+  ansible.builtin.set_fact:
+    failed_uploads: "{{ s3_upload_results.results | selectattr('success', 'equalto', false) | map(attribute='item') | list }}"
+
+- name: "Fail if any S3 upload failed"
+  ansible.builtin.fail:
+    msg: "Failed to upload {{ failed_uploads | length }} archive(s) to S3: {{ failed_uploads | map('basename') | join(', ') }}"
+  when: failed_uploads | length > 0
diff --git a/mkdocs.yml b/mkdocs.yml
index 267ae74b6d..44ad9c2c05 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -25,7 +25,8 @@ nav: - "Add Real Estate and Facilities": playbooks/mas-facilities.md - "Update": playbooks/mas-update.md - "Upgrade": playbooks/mas-upgrade.md - - "Backup & Restore": playbooks/backup-restore.md + - "Backup and Restore": playbooks/backup_restore.md + - "Legacy Backup and Restore": playbooks/legacy_backup_restore.md - "Roles: OCP Mgmt": - "ocp_cluster_monitoring": roles/ocp_cluster_monitoring.md - "ocp_config": roles/ocp_config.md @@ -79,7 +80,6 @@ nav: - "suite_app_uninstall": roles/suite_app_uninstall.md - "suite_app_upgrade": roles/suite_app_upgrade.md - "suite_app_rollback": roles/suite_app_rollback.md - - "suite_app_backup_restore": roles/suite_app_backup_restore.md - "suite_certs": roles/suite_certs.md - "suite_config": roles/suite_config.md - "suite_db2_setup_for_manage": roles/suite_db2_setup_for_manage.md @@ -99,7 +99,6 @@ nav: - "suite_upgrade": roles/suite_upgrade.md - "suite_rollback": roles/suite_rollback.md - "suite_verify": roles/suite_verify.md - - "suite_backup_restore": roles/suite_backup_restore.md - "Roles: Utilities": - "ansible_version_check": roles/ansible_version_check.md - "entitlement_key_rotation": roles/entitlement_key_rotation.md
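For reference, a minimal sketch of how the new upload_backup_archive role might be invoked once the component backups have been taken. It is based only on the variables validated in tasks/main.yml and used by create_archive.yml; the instance id, directories, bucket name, and the assumption that the per-component *_backup_version facts were set by the preceding backup roles are illustrative placeholders, not part of the change set above.

---
# Illustrative playbook only, not shipped with this change.
# Assumes ibm_catalogs_backup_version, mongodb_backup_version, etc. are already
# set as facts by the backup roles that ran earlier in the play.
- hosts: localhost
  vars:
    mas_instance_id: inst1
    mas_backup_dir: /tmp/mas-backups
    backup_version: "20240621021316"
    # Provide either these S3 variables or the artifactory_* set instead
    aws_access_key_id: "{{ lookup('env', 'AWS_ACCESS_KEY_ID') }}"
    aws_secret_access_key: "{{ lookup('env', 'AWS_SECRET_ACCESS_KEY') }}"
    s3_bucket_name: my-mas-backups
    s3_region: us-east-1
  roles:
    - ibm.mas_devops.upload_backup_archive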