Cray System Management Documentation > Cray System Management (CSM) Administration Guide > kubernetes > Create a Manual Backup of Bare-Metal etcd Cluster

Create a Manual Backup of Bare-Metal etcd Cluster

Manually create a backup of a healthy bare-metal etcd cluster.

bare-metal etcd cluster backups are automatically created every ten minutes and deleted after 24 hours. When necessary, these procedures can be used to create an additional backup, and then save a separate copy of it, or one of the automated backups. When needed, a bare-metal etcd cluster can be restored from a saved backup.

The commands in these procedures can be run on any master (control plane) node (ncn-m#) on the system. These procedures assume a healthy bare-metal etcd cluster.

Check health of bare-metal etcd cluster
bare-metal etcd cluster backups overview
Create new bare-metal etcd cluster backup
Save a copy of a bare-metal etcd cluster backup
Restore bare-metal etcd cluster from a saved backup file

Check health of bare-metal etcd cluster

Verify bare-metal etcd cluster health on each master (control plane) node (ncn-m#) on the system.

ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/server.crt \
    --key /etc/kubernetes/pki/etcd/server.key endpoint health

Example output:

127.0.0.1:2379 is healthy: successfully committed proposal: took = 20.697187ms

Bare-metal etcd cluster backups overview

(ncn-mw#) Verify that the bare-metal etcd cluster backups are initiated every ten minutes.

kubectl get cronjob -n kube-system

Example output:

NAME              SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
kube-etcdbackup   */10 * * * *   False     0        3m28s           7d10h

(ncn-mw#) View backup job.

kubectl get jobs -n kube-system | grep -e NAME -e etcdbackup

Example output:

NAME                       COMPLETIONS   DURATION   AGE
kube-etcdbackup-27721380   1/1           8s         2m30s

(ncn-mw#) List the available bare-metal etcd cluster backups. Note the ten minute interval.

cd /opt/cray/platform-utils/s3 && \
    ./list-objects.py --bucket-name etcd-backup | \
    grep bare-metal | tail -5; echo "Current": $(date +"%Y-%m-%d-%H-%M-%S")

Example output:

bare-metal/etcd-backup-2022-09-15-19-50-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-00-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-10-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-20-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-30-03.tar.gz
Current: 2022-09-15-20-32-54

Create new bare-metal etcd cluster backup

Validate bare-metal etcd cluster’s health.

See Check health of bare-metal etcd cluster.

(ncn-mw#) Create backup job.

kubectl create job tmp-bare-metal-etcd-backup --from=cronjob/kube-etcdbackup -n kube-system

Example output:

job.batch/tmp-bare-metal-etcd-backup created

(ncn-mw#) Validate job creation.

kubectl get jobs -n kube-system | grep -e NAME -e tmp-bare-metal-etcd-backup

Example output:

NAME                       COMPLETIONS   DURATION   AGE
tmp-bare-metal-etcd-backup   1/1           9s         16s

(ncn-mw#) Verify that the new bare-metal etcd cluster backup was created.

In this example, bare-metal/etcd-backup-2022-09-15-20-34-13.tar.gz is created.

cd /opt/cray/platform-utils/s3 && ./list-objects.py \
    --bucket-name etcd-backup | \
    grep bare-metal | tail -5; echo "Current": $(date +"%Y-%m-%d-%H-%M-%S")

Example output:

bare-metal/etcd-backup-2022-09-15-20-00-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-10-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-20-02.tar.gz
bare-metal/etcd-backup-2022-09-15-20-30-03.tar.gz
bare-metal/etcd-backup-2022-09-15-20-34-13.tar.gz
Current: 2022-09-15-20-36-14

(ncn-mw#) Delete the tmp-bare-metal-etcd-backup backup job.

kubectl -n kube-system delete job tmp-bare-metal-etcd-backup

Example output:

job.batch "tmp-bare-metal-etcd-backup" deleted

Save a copy of a bare-metal etcd cluster backup

(ncn-mw#) Create a temporary directory.
```
mkdir /tmp/etcd_backup
```

(ncn-mw#) Retrieve the backup from S3.

In this example, the bare-metal/etcd-backup-2022-09-15-20-34-13.tar.gz backup created earlier is saved. Set the BACKUP_NAME variable to the file name, omitting the bare-metal/ prefix and the .tar.gz suffix.

BACKUP_NAME=etcd-backup-2022-09-15-20-34-13
cd /opt/cray/platform-utils/s3
./download-file.py --bucket-name etcd-backup \
    --key-name "bare-metal/${BACKUP_NAME}.tar.gz" \
    --file-name "/tmp/etcd_backup/${BACKUP_NAME}.tar.gz"
ls /tmp/etcd_backup/

Example output:

etcd-backup-2022-09-15-20-34-13.tar.gz

Restore bare-metal etcd cluster from a saved backup file

(ncn-mw#) Create a /tmp/etcd_restore directory.
```
mkdir /tmp/etcd_restore
```
(ncn-mw#) Copy the saved bare-metal etcd cluster backup file into the directory.

Set the BACKUP_NAME variable to the file name, omitting the bare-metal/ prefix and the .tar.gz suffix. In the following example, the backup file is copied from /tmp/etcd_backup where a copy of the backup file etcd-backup-2022-09-15-20-34-13.tar.gz was saved above.
```
cd /tmp/etcd_restore
BACKUP_NAME=etcd-backup-2022-09-15-20-34-13
cp /tmp/etcd_backup/${BACKUP_NAME}.tar.gz .
ls /tmp/etcd_restore/
```
Example output:
```
etcd-backup-2022-09-15-20-34-13.tar.gz
```

(ncn-mw#) Uncompress the backup file and extract the etcd-dump.bin file.

gunzip "${BACKUP_NAME}.tar.gz"
tar -xvf "${BACKUP_NAME}.tar"
mv -v "${BACKUP_NAME}/etcd-dump.bin" /tmp

Example output:

renamed 'etcd-backup-2022-09-15-20-34-13/etcd-dump.bin' -> '/tmp/etcd-dump.bin'

(ncn-mw#) Push the etcd-dump.bin file to the other NCN master nodes.

If not running these steps on ncn-m001, adjust the NCN names in the following command accordingly.
```
scp /tmp/etcd-dump.bin ncn-m002:/tmp
scp /tmp/etcd-dump.bin ncn-m003:/tmp
```
Continue the bare-metal etcd cluster restore.

Continue with the Restore member directory step.
Verify the restored bare-metal etcd cluster’s health.

See Check health of bare-metal etcd cluster.