Cray System Management Documentation > Cray System Management (CSM) Administration Guide > kubernetes > Check the Health of etcd Clusters

Check the Health of etcd Clusters

Check to see if all of the etcd clusters have the correct number of healthy pods and a healthy cluster database. Any clusters that do not have healthy pods will need to be either restored from backup or rebuilt.

Prerequisites

This procedure requires root privileges.

Procedure

(ncn-mw#) Check the health of the clusters.

To check the health of the etcd clusters in the services namespace without TLS authentication:

/opt/cray/platform-utils/ncnHealthChecks.sh -s etcd_health_status

Example output:

**************************************************************************

=== Check the Health of the Etcd Clusters in all Namespaces. ===
=== Verify a "healthy" Report for Each Etcd Pod. ===
Fri 10 Mar 2023 07:52:09 PM UTC
### cray-bos-bitnami-etcd-0 ###
127.0.0.1:2379 is healthy: successfully committed proposal: took = 4.166761ms
### cray-bos-bitnami-etcd-1 ###
127.0.0.1:2379 is healthy: successfully committed proposal: took = 4.697124ms
### cray-bos-bitnami-etcd-2 ###
127.0.0.1:2379 is healthy: successfully committed proposal: took = 4.119712ms
[...]
 --- PASSED ---

If any of the etcd clusters are not healthy, refer to Restore an etcd Cluster from a Backup.

(ncn-mw#) Check the number of pods in each cluster.

Each cluster should contain at least three pods.

/opt/cray/platform-utils/ncnHealthChecks.sh -s etcd_cluster_balance

Example output:

**************************************************************************

=== Check the Number of Pods in Each Cluster. Verify they are Balanced. ===
=== Each cluster should contain at least three pods, but may contain more. ===
=== Ensure that no two pods in a given cluster exist on the same worker node. ===
Fri 10 Mar 2023 07:54:22 PM UTC
cray-bos-bitnami-etcd-0                                           2/2     Running     0          22h     10.32.0.76    ncn-w002   <none>           <none>
cray-bos-bitnami-etcd-1                                           2/2     Running     0          22h     10.40.0.8     ncn-w003   <none>           <none>
cray-bos-bitnami-etcd-2                                           2/2     Running     0          22h     10.44.0.58    ncn-w001   <none>           <none>
[...]
 --- PASSED ---

If the etcd clusters have fewer than three pods in a ‘Running’ state, see Restore an etcd Cluster from a Backup.

(ncn-mw#) Check the health of all etcd clusters’ databases:

/opt/cray/platform-utils/ncnHealthChecks.sh -s etcd_database_health

Example output:

**************************************************************************

=== Check the health of Etcd Cluster's database in the Services Namespace. ===
=== PASS or FAIL status returned. ===
### cray-bos-bitnami-etcd-0 Etcd Database Check: ###
PASS: OK foo fooCheck 1
### cray-bos-bitnami-etcd-1 Etcd Database Check: ###
PASS: OK foo fooCheck 1
### cray-bos-bitnami-etcd-2 Etcd Database Check: ###
PASS: OK foo fooCheck 1
[...]
 --- PASSED ---

If any of the etcd cluster databases are not healthy, then refer to the following procedures: