The current process for ensuring the safety of the nexus-data PVC is a one-time, space-intensive, manual process, and should only be run while Nexus is in a known good state. Taking an export before an upgrade is recommended, because it makes it possible to roll back. An export can also improve Nexus resiliency by allowing easy fixes for data corruption.
CAUTION: This process may be risky and is not recommended for all use cases.
Note: Only one export should be taken. Each time the script is run, it will overwrite the old export.
Prior to making an export, check how much space the exported tar file will take up on the cluster (three times the size of the export itself) and how much storage the cluster has left.
Run the following command on a master node:
ncn-m# kubectl exec -n nexus deploy/nexus -c nexus -- df -P /nexus-data | grep '/nexus-data' |
awk '{print "Amount of space the Nexus export will take up on cluster: "(($3 * 3)/1048576)" GiB";}' &&
ceph df | grep 'zone1.rgw.buckets.data' | awk '{ print "Currently used: " $7 $8 ", Max Available " $10 $11;}'
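The above commands report the amount of space the Nexus export will take up on the cluster, the space currently used in the zone1.rgw.buckets.data pool, and the maximum space available in that pool.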
If the size of the Nexus export plus the size of the currently used space is larger than the maximum available space, then follow the steps on Nexus Space Cleanup.
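The comparison described above can be sketched as two small shell helpers. This is only an illustration, not part of the CSM scripts: the function names export_size_gib and export_fits are hypothetical, all sizes passed to export_fits are assumed to be in GiB already, and the df "Used" value is assumed to be in 1024-byte blocks, as the division by 1048576 in the command above implies.

```shell
# Convert the "Used" column of df -P (assumed to be 1024-byte blocks) into
# the GiB the export will occupy on the cluster, using the same factor of 3
# as the command above.
export_size_gib() {
    awk -v used_kib="$1" 'BEGIN { printf "%.1f\n", (used_kib * 3) / 1048576 }'
}

# Succeeds (exit 0) when export size + currently used space is no larger
# than the maximum available space; fails (exit 1) otherwise.
# Usage: export_fits <export_gib> <used_gib> <max_available_gib>
export_fits() {
    awk -v need="$1" -v used="$2" -v avail="$3" \
        'BEGIN { exit !((need + used) <= avail) }'
}
```

For example, export_size_gib 125829120 (120 GiB expressed in 1024-byte blocks) prints 360.0. If export_fits fails for the reported values, the Nexus Space Cleanup steps apply.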
Taking the export can take multiple hours, and Nexus is unavailable for the entire time. For a fresh install of Nexus, the export takes around 1 hour for every 60 GiB stored in the nexus-data PVC. For example, if the nexus-data PVC holds 120 GiB (meaning the first step showed that the export will use 360 GiB on the cluster), then Nexus will be unavailable for around 2 hours while the export takes place. If the export would take too long because of its size, then follow the steps on Nexus Space Cleanup.
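The rule of thumb above (around 1 hour per 60 GiB, which works out to 1 GiB per minute) can be turned into a quick estimate. The helper below is a rough sketch under that assumption only; the name estimate_export_time is hypothetical, and it expects a whole number of GiB.

```shell
# Estimate export downtime from the GiB stored in nexus-data, assuming
# roughly 60 GiB per hour (1 GiB per minute). Prints the estimate as H:MM.
estimate_export_time() {
    gib="$1"
    printf '%d:%02d\n' "$((gib / 60))" "$((gib % 60))"
}
```

For the 120 GiB example above, estimate_export_time 120 prints 2:00. Actual times depend on the system and may differ from this estimate.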
(ncn-m#) If an export has been taken previously, then it should be deleted before a new export is taken.
Check for an existing nexus-bak PVC. If found, it needs to be removed.
kubectl get pvc -n nexus nexus-bak
Example output:
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
nexus-bak   Bound    pvc-7551d342-f976-48e1-bb91-1957b75dbc53   1000Gi     RWO            k8s-block-replicated   42d
Check for an existing nexus-backup job. If found, it needs to be removed.
kubectl get jobs -n nexus nexus-backup
Example output:
NAME           COMPLETIONS   DURATION   AGE
nexus-backup   1/1           6h22m      42d
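The two checks above can be wrapped in a small pre-flight helper. The following is a minimal sketch, not part of the CSM scripts: the function names and the KUBECTL override variable are hypothetical, and it relies only on kubectl get exiting non-zero when the named resource does not exist.

```shell
# KUBECTL is a hypothetical override hook (not a kubectl feature); it lets
# the helper be pointed at a wrapper or stub when rehearsing the procedure.
KUBECTL="${KUBECTL:-kubectl}"

# Succeeds if the named resource exists in the nexus namespace.
# Note: any other kubectl failure (for example, an unreachable API server)
# also counts as "not found" here.
# Usage: leftover_exists <kind> <name>
leftover_exists() {
    "$KUBECTL" get "$1" -n nexus "$2" >/dev/null 2>&1
}

# Print whether a leftover from a previous export still needs removing.
# Usage: report_leftover <kind> <name>
report_leftover() {
    if leftover_exists "$1" "$2"; then
        echo "$1 $2 exists: remove it before taking a new export"
    else
        echo "$1 $2 not found"
    fi
}
```

Running report_leftover pvc nexus-bak followed by report_leftover job nexus-backup covers both checks. The helper only reports; it deliberately does not delete anything.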
To take an export of Nexus, run the export script on any master node where the latest CSM documentation is installed. See Check for latest documentation.
ncn-m# /usr/share/doc/csm/scripts/nexus-export.sh
Example output:
Gibibytes available in cluster: 52418
Gibibytes used in nexus-data: 434
Gibibytes available in nexus-data: 566
Space to be used from backup: 1302
Creating PVC for Nexus backup, if needed
Error from server (NotFound): persistentvolumeclaims "nexus-bak" not found
persistentvolumeclaim/nexus-bak created
Scaling Nexus deployment to 0
deployment.apps/nexus scaled
Starting backup, do not exit this script.
Should be done around Fri 22 Mar 2024 06:29:03 PM UTC (7:14 from now)
job.batch/nexus-backup created
Waiting for the backup to finish.
..............................
A single “.” will be output every 30 seconds until the export reports “Done”.
The restore will delete any changes made to Nexus after the backup was taken. The restore takes around half the time that the export took (for example, if the export took two hours, then the restore would take around one hour). While the restore is underway, Nexus is unavailable.
To restore Nexus to the state of the backup, run the restore script on any master node where the latest CSM documentation is installed. See Check for latest documentation.
ncn-m# /usr/share/doc/csm/scripts/nexus-restore.sh
To clean up all the jobs and data that the export or restore creates, there are a few different commands that can be used.
ncn-mw# kubectl delete job -n nexus nexus-backup
ncn-mw# kubectl delete job -n nexus nexus-restore
If a new export is being created, then it is recommended to first delete the old export, in order to ensure that everything exported correctly. If the old export is not deleted, then the new job will overwrite the old export. This is expected behavior, because only one export should be used at a time.
To delete an export:
ncn-mw# kubectl delete pvc -n nexus nexus-bak
If an export is stopped prematurely or fails to complete, there are a few steps that need to be taken to bring Nexus back into a working state.
Delete the failed or stopped job.
See Cleanup export job.
Delete the partially filled export PVC.
Restart Nexus if it is still stopped.
Depending on where the job failed, the Nexus pods may still be down.
Check if the Nexus pods are down.
ncn-mw# kubectl get pods -n nexus | grep nexus
If the Nexus pod is not found, then scale it back up.
ncn-mw# kubectl -n nexus scale deployment nexus --replicas=1
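The final check-and-restart step can also be scripted. The sketch below takes a slightly different approach from the commands above: instead of searching the pod list, it reads the replica count of the nexus deployment and scales up only when that count is zero. The function name ensure_nexus_running and the KUBECTL override variable are hypothetical, not part of the CSM scripts.

```shell
# KUBECTL is a hypothetical override hook (not a kubectl feature); it lets
# the helper be pointed at a wrapper or stub when rehearsing the procedure.
KUBECTL="${KUBECTL:-kubectl}"

# Scale the nexus deployment back to one replica only if it is currently
# scaled to zero; otherwise leave it alone.
ensure_nexus_running() {
    replicas="$("$KUBECTL" get deployment nexus -n nexus \
        -o jsonpath='{.spec.replicas}')" || return 1
    if [ "$replicas" = "0" ]; then
        "$KUBECTL" -n nexus scale deployment nexus --replicas=1
    else
        echo "Nexus deployment already has ${replicas} replica(s); nothing to do."
    fi
}
```

Checking the deployment avoids false matches from other pods in the nexus namespace whose names also contain "nexus", such as pods left behind by the backup job.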