Use an existing backup of a healthy etcd cluster to restore an unhealthy cluster to a healthy state.
The commands in this procedure can be run on any master or worker node on the system.
A backup of a healthy etcd cluster has been created.
List the backups for the desired etcd cluster.
The example below uses the Boot Orchestration Service (BOS).
ncn# kubectl exec -it -n operators \
$(kubectl get pod -n operators | grep etcd-backup-restore | head -1 | awk '{print $1}') \
-c boto3 -- list_backups cray-bos
Example output:
cray-bos/etcd.backup_v108497_2020-03-20-23:42:37
cray-bos/etcd.backup_v125815_2020-03-21-23:42:37
cray-bos/etcd.backup_v143095_2020-03-22-23:42:38
cray-bos/etcd.backup_v160489_2020-03-23-23:42:37
cray-bos/etcd.backup_v176621_2020-03-24-23:42:37
cray-bos/etcd.backup_v277935_2020-03-30-23:52:54
cray-bos/etcd.backup_v86767_2020-03-19-18:00:05
Restore the cluster using a backup.
Replace etcd.backup_v277935_2020-03-30-23:52:54
in the command below with the name of the backup being used.
ncn# kubectl exec -it -n operators \
$(kubectl get pod -n operators | grep etcd-backup-restore | head -1 | awk '{print $1}') \
-c util -- restore_from_backup cray-bos etcd.backup_v277935_2020-03-30-23:52:54
Example output:
etcdrestore.etcd.database.coreos.com/cray-bos-etcd created
Watch the pods come back online.
This may take a couple minutes.
ncn# kubectl -n services get pod | grep SERVICE_NAME
Example output:
cray-bos-etcd-498jn7th6p 1/1 Running 0 4h1m
cray-bos-etcd-dj7d894227 1/1 Running 0 3h59m
cray-bos-etcd-tk4pr4kgqk 1/1 Running 0 4
Delete the EtcdRestore
custom resource.
This step makes it possible for future restores to occur. Replace the etcdrestore.etcd.database.coreos.com/cray-bos-etcd
value with the name returned in
the earlier step when creating the backup.
ncn# kubectl -n services delete etcdrestore.etcd.database.coreos.com/cray-bos-etcd
Example output:
etcdrestore.etcd.database.coreos.com "cray-bos-etcd" deleted
Verify that the cray-bos-etcd-client
service was created.
ncn# kubectl get service -n services cray-bos-etcd-client
Example of output showing that the service was created:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cray-bos-etcd-client ClusterIP 10.28.248.232 <none> 2379/TCP 2m
If the etcd-client
service was not created, then repeat the procedure to restore the cluster again.