Manually create a backup of a healthy bare-metal etcd cluster.
bare-metal etcd cluster backups are automatically created every ten minutes and deleted after 24 hours. When necessary, these procedures can be used to create an additional backup, and then save a separate copy of it, or one of the automated backups. When needed, a bare-metal etcd cluster can be restored from a saved backup.
The commands in these procedures can be run on any master (control plane) node (ncn-m#
) on the system.
These procedures assume a healthy bare-metal etcd cluster.
Verify bare-metal etcd cluster health on each master (control plane) node (ncn-m#
) on the system.
ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key endpoint health
Example output:
127.0.0.1:2379 is healthy: successfully committed proposal: took = 20.697187ms
(ncn-mw#
) Verify that the bare-metal etcd cluster backups are initiated every ten minutes.
kubectl get cronjob -n kube-system
Example output:
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
kube-etcdbackup */10 * * * * False 0 3m28s 7d10h
(ncn-mw#
) View backup job.
kubectl get jobs -n kube-system | grep -e NAME -e etcdbackup
Example output:
NAME COMPLETIONS DURATION AGE
kube-etcdbackup-27721380 1/1 8s 2m30s
(ncn-mw#
) List the available bare-metal etcd cluster backups. Note the ten minute interval.
cd /opt/cray/platform-utils/s3 && \
./list-objects.py --bucket-name etcd-backup | \
grep bare-metal | tail -5; echo "Current": $(date +"%Y-%m-%d-%H-%M-%S")
Example output:
bare-metal/etcd-backup-2022-09-15-19-50-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-00-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-10-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-20-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-30-03.tar.gz
Current: 2022-09-15-20-32-54
Validate bare-metal etcd cluster’s health.
(ncn-mw#
) Create backup job.
kubectl create job tmp-bare-metal-etcd-backup --from=cronjob/kube-etcdbackup -n kube-system
Example output:
job.batch/tmp-bare-metal-etcd-backup created
(ncn-mw#
) Validate job creation.
kubectl get jobs -n kube-system | grep -e NAME -e tmp-bare-metal-etcd-backup
Example output:
NAME COMPLETIONS DURATION AGE
tmp-bare-metal-etcd-backup 1/1 9s 16s
(ncn-mw#
) Verify that the new bare-metal etcd cluster backup was created.
In this example, bare-metal/etcd-backup-2022-09-15-20-34-13.tar.gz
is created.
cd /opt/cray/platform-utils/s3 && ./list-objects.py \
--bucket-name etcd-backup | \
grep bare-metal | tail -5; echo "Current": $(date +"%Y-%m-%d-%H-%M-%S")
Example output:
bare-metal/etcd-backup-2022-09-15-20-00-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-10-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-20-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-30-03.tar.gz
bare-metal/etcd-backup-2022-09-15-20-34-13.tar.gz
Current: 2022-09-15-20-36-14
(ncn-mw#
) Delete the tmp-bare-metal-etcd-backup
backup job.
kubectl -n kube-system delete job tmp-bare-metal-etcd-backup
Example output:
job.batch "tmp-bare-metal-etcd-backup" deleted
(ncn-mw#
) Create a temporary directory.
mkdir /tmp/etcd_backup
(ncn-mw#
) Retrieve the backup from S3.
In this example, the bare-metal/etcd-backup-2022-09-15-20-34-13.tar.gz
backup created earlier is saved.
Set the BACKUP_NAME
variable to the file name, omitting the bare-metal/
prefix and the .tar.gz suffix
.
BACKUP_NAME=etcd-backup-2022-09-15-20-34-13
cd /opt/cray/platform-utils/s3
./download-file.py --bucket-name etcd-backup \
--key-name "bare-metal/${BACKUP_NAME}.tar.gz" \
--file-name "/tmp/etcd_backup/${BACKUP_NAME}.tar.gz"
ls /tmp/etcd_backup/
Example output:
etcd-backup-2022-09-15-20-34-13.tar.gz
(ncn-mw#
) Create a /tmp/etcd_restore
directory.
mkdir /tmp/etcd_restore
(ncn-mw#
) Copy the saved bare-metal etcd cluster backup file into the directory.
Set the BACKUP_NAME
variable to the file name, omitting the bare-metal/
prefix and the .tar.gz suffix
.
In the following example, the backup file is copied from /tmp/etcd_backup
where a copy of the backup file etcd-backup-2022-09-15-20-34-13.tar.gz
was saved above.
cd /tmp/etcd_restore
BACKUP_NAME=etcd-backup-2022-09-15-20-34-13
cp /tmp/etcd_backup/${BACKUP_NAME}.tar.gz .
ls /tmp/etcd_restore/
Example output:
etcd-backup-2022-09-15-20-34-13.tar.gz
(ncn-mw#
) Uncompress the backup file and extract the etcd-dump.bin
file.
gunzip "${BACKUP_NAME}.tar.gz"
tar -xvf "${BACKUP_NAME}.tar"
mv -v "${BACKUP_NAME}/etcd-dump.bin" /tmp
Example output:
renamed 'etcd-backup-2022-09-15-20-34-13/etcd-dump.bin' -> '/tmp/etcd-dump.bin'
(ncn-mw#
) Push the etcd-dump.bin
file to the other NCN master nodes.
If not running these steps on ncn-m001
, adjust the NCN names in the following command accordingly.
scp /tmp/etcd-dump.bin ncn-m002:/tmp
scp /tmp/etcd-dump.bin ncn-m003:/tmp
Continue the bare-metal etcd cluster restore.
Continue with the Restore member directory step.
Verify the restored bare-metal etcd cluster’s health.