Use an existing backup of a healthy etcd cluster to restore an unhealthy cluster to a healthy state.
The commands in this procedure can be run on any Kubernetes master or worker node on the system.
A backup of a healthy etcd cluster has been created.
Etcd clusters can be restored using an automated script.
The automated script restores the cluster from the most recent backup, provided that backup was created within the last seven days. If it finds no backup from the last seven days, it asks the user whether to rebuild the cluster instead.
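To see by hand whether a given backup falls inside the seven-day window, its timestamp can be compared against the current time. The following is a minimal sketch, not the script's actual logic. It assumes GNU `date`, assumes backup names follow the `db-YYYY-MM-DD_HH-MM` pattern seen in the example output, and assumes the timestamp is in UTC. The helper names `backup_epoch` and `is_recent` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are illustrative and not part of the platform utilities.
# Assumes GNU date and backup names of the form db-YYYY-MM-DD_HH-MM (UTC assumed).

# backup_epoch NAME: print the backup timestamp as seconds since the epoch.
backup_epoch() {
    local name=${1##*/}       # drop an optional "cluster/" prefix
    local stamp=${name#db-}   # e.g. 2025-05-28_20-00
    local day=${stamp%%_*}    # e.g. 2025-05-28
    local clock=${stamp#*_}   # e.g. 20-00
    date -u -d "${day} ${clock/-/:}" +%s
}

# is_recent NAME NOW_EPOCH [MAX_DAYS]: succeed if the backup is at most
# MAX_DAYS (default 7) older than NOW_EPOCH.
is_recent() {
    local age=$(( $2 - $(backup_epoch "$1") ))
    (( age >= 0 && age <= ${3:-7} * 86400 ))
}

# A fixed "now" keeps the example repeatable; use $(date -u +%s) in practice.
now=$(date -u -d "2025-06-01 12:00" +%s)
if is_recent "cray-bss/db-2025-05-28_20-00" "$now"; then
    echo "recent enough to restore"
else
    echo "older than seven days"
fi
```

With the fixed time above, the backup is under four days old, so the first branch is taken.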
(ncn-mw#) Rebuild/restore a single cluster.
/opt/cray/platform-utils/etcd/etcd_restore_rebuild.sh -s cray-bss
(ncn-mw#) Rebuild/restore multiple clusters.
/opt/cray/platform-utils/etcd/etcd_restore_rebuild.sh -m cray-bss,cray-fas
(ncn-mw#) Rebuild/restore all clusters.
/opt/cray/platform-utils/etcd/etcd_restore_rebuild.sh -a
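When the cluster names come from a list, the comma-separated argument for `-m` can be assembled rather than typed. The following is a small sketch; `join_clusters` is a hypothetical helper, and the command is only printed for review, not executed.

```shell
#!/usr/bin/env bash
# Sketch only: join_clusters is an illustrative helper, not a platform utility.

# join_clusters NAME...: print the names joined by commas.
join_clusters() {
    local IFS=,            # "$*" joins arguments with the first character of IFS
    printf '%s\n' "$*"
}

clusters=(cray-bss cray-fas)
# Print the command instead of running it, so it can be checked first.
echo /opt/cray/platform-utils/etcd/etcd_restore_rebuild.sh -m "$(join_clusters "${clusters[@]}")"
```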
/opt/cray/platform-utils/etcd/etcd_restore_rebuild.sh -s cray-bss
Example output:
The following etcd clusters will be restored/rebuilt:
cray-bss
You will be accepting responsibility for any missing data if there is a
restore/rebuild over a running etcd k/v. HPE assumes no responsibility.
Proceed restoring/rebuilding? (yes/no)
yes
Proceeding: restoring/rebuilding etcd clusters.
----- Restoring from cray-bss/db-2025-05-28_20-00 -----
Scaling etcd statefulset down to zero...
statefulset.apps/cray-bss-bitnami-etcd scaled
statefulset rolling update complete 0 pods at revision cray-bss-bitnami-etcd-7ccc4dd5cc...
Setting cluster state for cray-bss to 'new' and to start from snapshot
statefulset.apps/cray-bss-bitnami-etcd env updated
Deleting existing PVC's...
persistentvolumeclaim "data-cray-bss-bitnami-etcd-0" deleted
persistentvolumeclaim "data-cray-bss-bitnami-etcd-1" deleted
persistentvolumeclaim "data-cray-bss-bitnami-etcd-2" deleted
Scaling etcd statefulset back up to three members...
statefulset.apps/cray-bss-bitnami-etcd scaled
waiting for statefulset rolling update to complete 0 pods at revision cray-bss-bitnami-etcd-c766d98bf...
Waiting for 1 pods to be ready...
Waiting for 2 pods to be ready...
Waiting for 3 pods to be ready...
Waiting for 3 pods to be ready...
Waiting for 3 pods to be ready...
Waiting for 3 pods to be ready...
Waiting for 2 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 3 pods at revision cray-bss-bitnami-etcd-c766d98bf...
Setting cluster state for cray-bss to back to 'existing'
statefulset.apps/cray-bss-bitnami-etcd env updated
Checking endpoint health.
cray-bss etcd cluster health verified from cray-bss-bitnami-etcd-1
The automated etcd_restore_rebuild.sh script always restores from the most recent backup created within the last seven days.
In situations where the most recent backup contains corrupted data, it may be necessary to restore from an older backup manually.
Use the etcd-util.sh script to list available backups and restore from a specific one.
(ncn-mw#) List all available backups for a cluster.
Replace cray-bss with the name of the etcd cluster to list backups for.
/opt/cray/platform-utils/etcd/etcd-util.sh list_backups cray-bss
Example output:
cray-bss/db-2026-02-24_23-00
cray-bss/db-2026-02-25_23-00
cray-bss/db-2026-02-26_23-00
cray-bss/db-2026-03-03_22-00
cray-bss/db-2026-03-04_22-00
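Backup names in the listing carry a cluster prefix, which must be dropped before a name is used for a restore. The following sketch picks the Nth-newest entry from such a listing and strips the prefix. It assumes names sort chronologically as plain text, which holds for the `db-YYYY-MM-DD_HH-MM` pattern shown above. `nth_newest_backup` is a hypothetical helper, and the sample text is copied from the example output.

```shell
#!/usr/bin/env bash
# Sketch only: nth_newest_backup is an illustrative helper, not a platform utility.

# nth_newest_backup N: read "cluster/db-..." lines on stdin and print the
# bare backup name. N=1 is the newest entry, N=2 the one before it.
nth_newest_backup() {
    local n=$1 line
    line=$(grep '/db-' | LC_ALL=C sort -r | sed -n "${n}p")
    [ -n "$line" ] || return 1       # fewer than N backups in the listing
    printf '%s\n' "${line#*/}"       # strip the "cluster/" prefix
}

sample='cray-bss/db-2026-02-24_23-00
cray-bss/db-2026-02-25_23-00
cray-bss/db-2026-02-26_23-00
cray-bss/db-2026-03-03_22-00
cray-bss/db-2026-03-04_22-00'

# Skip the newest backup (assumed corrupted here) and take the previous one.
printf '%s\n' "$sample" | nth_newest_backup 2
```

For the sample listing this prints `db-2026-03-03_22-00`. In practice the function would read the output of the `list_backups` command in place of the sample text.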
(ncn-mw#) Restore the cluster from a specific backup.
Replace cray-bss with the name of the etcd cluster and db-2026-03-03_22-00 with the backup name (without the cluster prefix) from the list of available backups.
/opt/cray/platform-utils/etcd/etcd-util.sh restore_from_backup cray-bss db-2026-03-03_22-00
Example output:
Scaling etcd statefulset down to zero...
statefulset.apps/cray-bss-bitnami-etcd scaled
statefulset rolling update complete 0 pods at revision cray-bss-bitnami-etcd-977d76d6b...
Setting cluster state for cray-bss to 'new' and to start from snapshot
statefulset.apps/cray-bss-bitnami-etcd env updated
Deleting existing PVC's...
persistentvolumeclaim "data-cray-bss-bitnami-etcd-0" deleted
persistentvolumeclaim "data-cray-bss-bitnami-etcd-1" deleted
persistentvolumeclaim "data-cray-bss-bitnami-etcd-2" deleted
Scaling etcd statefulset back up to three members...
statefulset.apps/cray-bss-bitnami-etcd scaled
...
statefulset rolling update complete 3 pods at revision cray-bss-bitnami-etcd-5d4978bfb...
Setting cluster state for cray-bss to back to 'existing'
statefulset.apps/cray-bss-bitnami-etcd env updated
Checking endpoint health.
cray-bss etcd cluster health verified from cray-bss-bitnami-etcd-1
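If the restore output is captured to a file or variable, the final health line can be checked automatically. The following is a minimal sketch that assumes a successful run ends with the "health verified" line shown in the example output. What the script prints on failure is not covered here, so a missing line is simply treated as "not verified". `health_verified` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: health_verified is an illustrative helper, not a platform utility.

# health_verified CLUSTER: read captured script output on stdin and succeed
# only if it contains the "health verified" line for that cluster.
health_verified() {
    grep -q -- "^$1 etcd cluster health verified"
}

# Sample text copied from the example output above.
log='Checking endpoint health.
cray-bss etcd cluster health verified from cray-bss-bitnami-etcd-1'

if printf '%s\n' "$log" | health_verified cray-bss; then
    echo "cray-bss: restore verified"
else
    echo "cray-bss: health line not found, inspect the cluster manually"
fi
```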