The current process for ensuring the safety of the nexus-data
PVC is a one time, space intensive, manual process, and is only recommended to be done while Nexus is
in a known good state. An export is recommended to be done before an upgrade, in order to enable the ability to roll back. Taking an export can also be used to improve
Nexus resiliency by allowing easy fixes for data corruption.
CAUTION: This process may be risky and is not recommended for all use cases.
Note: Only one export should be taken. Each time the script is run, it will overwrite the old export.
Prior to making an export, check the size of the exported tar file on the cluster (for example, three times the size of just the export) and the amount of storage that the cluster has left.
Run the following command on a master node:
ncn-m# kubectl exec -n nexus deploy/nexus -c nexus -- df -P /nexus-data | grep '/nexus-data' |
awk '{print "Amount of space the Nexus export will take up on cluster: "(($3 * 3)/1048576)" GiB";}' &&
ceph df | grep 'zone1.rgw.buckets.data' | awk '{ print "Currently used: " $7 $8 ", Max Available " $10 $11;}'
The above commands will return the following information:
If the size of the Nexus export plus the size of the currently used space is larger than the maximum available space, then submit a help request in order to determine a solution.
Taking the export can take multiple hours and Nexus will be unavailable for the entire time. For a fresh install of Nexus, the export takes around
1 hour for every 60 GiB stored in the nexus-data
PVC. For example, if the nexus-data
PVC is 120 GiB (meaning the first step showed the export will
use 360 GiB on cluster), then Nexus would be unavailable for around 2 hours while the export was taking place.
To get an export, run the export script on any master node where the latest CSM documentation is installed. See Check for latest documentation.
If an export has been taken previously, then it should be deleted before a new export is taken. See Cleanup previous export.
ncn-m# /usr/share/doc/csm/scripts/nexus-export.sh
The restore will delete any changes made to Nexus after the backup was taken. The restore takes around half the time that the export took (for example, if the export took two hours then the restore would take around one hour). While the restore is underway, Nexus is unavailable.
To restore Nexus to the state of the backup, run the restore script on any master node where the latest CSM documentation is installed. See Check for latest documentation.
ncn-m# /usr/share/doc/csm/scripts/nexus-restore.sh
To cleanup all the jobs and data that the export or restore creates, there are a few different commands that can be used.
ncn-mw# kubectl delete job -n nexus nexus-backup
ncn-mw# kubectl delete job -n nexus nexus-restore
If a new export is being created, then it is recommended to first delete the old export, in order to ensure that everything exported correctly. If the old export is not deleted, then the new job will overwrite the old export. This is expected behavior, because only one export should be used at a time.
To delete an export:
ncn-mw# kubectl delete pvc -n nexus nexus-bak
If an export is stopped prematurely or fails to complete, there are a few steps that need to be taken to bring Nexus back into a working state.
Delete the failed or stopped job.
See Cleanup export job.
Delete the partially filled export PVC.
Restart Nexus if it is still stopped.
Depending on where the job failed the Nexus pods may still be down.
Check if the Nexus pods are down.
ncn-mw# kubectl get pods -n nexus | grep nexus
If the Nexus pod is not found, then scale it back up.
ncn-mw# kubectl -n nexus scale deployment nexus --replicas=1