Clear Space in an etcd Cluster Database

Use this procedure to clear the etcd cluster NOSPACE alarm. Once the alarm is set, it remains set until it is explicitly cleared. If needed, defragment the database cluster before clearing the NOSPACE alarm.

Defragging the database cluster and clearing the etcd cluster NOSPACE alarm will free up database space.

Prerequisites

This procedure requires root privileges.
The etcd clusters are in a healthy state.

Procedure

(ncn-mw#) Clear the NOSPACE alarm when the etcd database exceeded its space limit and has since been defragmented, but the alarm remains set.

Determine if any clusters are failing to store a new key-value pair.

/opt/cray/platform-utils/ncnHealthChecks.sh -s etcd_database_health

Example output:

**************************************************************************

=== Check the health of Etcd Cluster's database in the Services Namespace. ===
=== PASS or FAIL status returned. ===
### cray-bos-bitnami-etcd-0 Etcd Database Check: ###
FAILED DATABASE CHECK - EXPECTED: OK foo fooCheck 1
{"level":"warn","ts":"2020-10-23T23:56:48.408Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-208534eb-2ab4-4c58-8853-58bff088c394/127.0.0.1:2379","attempt":0,"error":"rpc error: code = ResourceExhausted desc = etcdserver: mvcc: database space exceeded"}
Error: etcdserver: mvcc: database space exceeded
### cray-bos-bitnami-etcd-1 Etcd Database Check: ###
PASS: OK foo fooCheck 1
### cray-bos-bitnami-etcd-2 Etcd Database Check: ###
PASS: OK foo fooCheck 1
 --- PASSED ---
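
When many clusters are checked at once, the failing members can be hard to spot in the full report. The following is a minimal sketch, not part of the platform utilities, that filters the report down to the member names whose section contains a FAILED DATABASE CHECK line. The function name failed_etcd_members is hypothetical, and the sketch assumes the report is written to standard output in the format shown above.

```shell
#!/bin/sh
# failed_etcd_members (hypothetical helper): read the etcd_database_health
# report on stdin and print the name of each member whose section contains
# a "FAILED DATABASE CHECK" line.
failed_etcd_members() {
    awk '
        # Section header, e.g. "### cray-bos-bitnami-etcd-0 Etcd Database Check: ###"
        /^### .* Etcd Database Check: ###$/ { member = $2; next }
        # Failure line that belongs to the most recent section header
        /^FAILED DATABASE CHECK/ && member != "" { print member }
    '
}
```

Assuming the function has been defined in the current shell, it could be used as: /opt/cray/platform-utils/ncnHealthChecks.sh -s etcd_database_health | failed_etcd_members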

Check whether the default 3G disk usage limit (unless defined differently in the Helm chart) is currently exceeded. all_clusters can be substituted with a cluster name (for example, cray-bss) to check an individual cluster.

/opt/cray/platform-utils/etcd/etcd-util.sh pvc_usage all_clusters

Example output:

### cray-bos-bitnami-etcd-0 PVC Usage: ###
Filesystem   Size   Used   Avail   Use%   Mounted   on
/dev/rbd14   7.8G   123M   7.7G   2%   /bitnami/etcd

### cray-bos-bitnami-etcd-1 PVC Usage: ###
Filesystem   Size   Used   Avail   Use%   Mounted   on
/dev/rbd11   7.8G   123M   7.7G   2%   /bitnami/etcd

### cray-bos-bitnami-etcd-2 PVC Usage: ###
Filesystem   Size   Used   Avail   Use%   Mounted   on
/dev/rbd10   7.8G   123M   7.7G   2%   /bitnami/etcd
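
Reading the Use% column for every member by eye does not scale well. The sketch below, offered as an illustration only, prints the members whose PVC fill level is at or above a given percentage. The function name pvc_over_threshold and the threshold of 80 used in the usage note are assumptions, and the check looks only at how full the volume is, not at the etcd database size limit itself.

```shell
#!/bin/sh
# pvc_over_threshold LIMIT (hypothetical helper): read pvc_usage output on
# stdin and print "<member> <use%>" for each member whose Use% column is at
# or above LIMIT (an integer percentage).
pvc_over_threshold() {
    awk -v limit="$1" '
        # Section header, e.g. "### cray-bos-bitnami-etcd-0 PVC Usage: ###"
        /^### .* PVC Usage: ###$/ { member = $2; next }
        # Data rows start with a device path; column 5 is Use%, e.g. "2%"
        /^\/dev\// {
            pct = $5
            sub(/%$/, "", pct)
            if (pct + 0 >= limit + 0) print member, pct "%"
        }
    '
}
```

Assuming the function has been defined in the current shell, it could be used as: /opt/cray/platform-utils/etcd/etcd-util.sh pvc_usage all_clusters | pvc_over_threshold 80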

Clear the NOSPACE alarm. The example below will clear alarms for all clusters. all_clusters can be substituted with a cluster name (cray-bss) for an individual cluster:

/opt/cray/platform-utils/etcd/etcd-util.sh clear_alarms all_clusters

Example output:

### cray-bos-bitnami-etcd-0 Disarmed Alarms: ###
memberID:6004340417806974740 alarm:NOSPACE
memberID:10618826089438871005 alarm:NOSPACE
memberID:6927946043724325475 alarm:NOSPACE
### cray-bos-bitnami-etcd-1 Disarmed Alarms: ###
### cray-bos-bitnami-etcd-2 Disarmed Alarms: ###

Verify that a new key-value pair can now be successfully stored.

/opt/cray/platform-utils/ncnHealthChecks.sh -s etcd_database_health

Example output:

**************************************************************************

=== Check the health of Etcd Cluster's database in the Services Namespace. ===
=== PASS or FAIL status returned. ===
### cray-bos-bitnami-etcd-0 Etcd Database Check: ###
PASS: OK foo fooCheck 1
### cray-bos-bitnami-etcd-1 Etcd Database Check: ###
PASS: OK foo fooCheck 1
### cray-bos-bitnami-etcd-2 Etcd Database Check: ###
PASS: OK foo fooCheck 1
 --- PASSED ---

(ncn-mw#) Clear the NOSPACE alarm. If the database still needs to be defragmented, the alarm will be set again, so check the disk usage first.

/opt/cray/platform-utils/etcd/etcd-util.sh pvc_usage all_clusters

Example output:

### cray-bos-bitnami-etcd-0 PVC Usage: ###
Filesystem   Size   Used   Avail   Use%   Mounted   on
/dev/rbd14   7.8G   123M   7.7G   2%   /bitnami/etcd

### cray-bos-bitnami-etcd-1 PVC Usage: ###
Filesystem   Size   Used   Avail   Use%   Mounted   on
/dev/rbd11   7.8G   123M   7.7G   2%   /bitnami/etcd

### cray-bos-bitnami-etcd-2 PVC Usage: ###
Filesystem   Size   Used   Avail   Use%   Mounted   on
/dev/rbd10   7.8G   123M   7.7G   2%   /bitnami/etcd

Resolve the space issue either by increasing how often the etcd-defrag cron job runs or by triggering it manually.

Select one of the following options:

Increase the frequency of the kube-etcd-defrag cron job from every 24 hours to every 12 hours.

kubectl edit -n services cronjob.batch/kube-etcd-defrag

Example of the edited resource:

[...]

              name: etcd-defrag
            name: etcd-defrag
  schedule: 0 */12 * * *
  successfulJobsHistoryLimit: 1
  suspend: false
status:

[...]

Trigger the job manually.

kubectl -n services create job --from=cronjob/kube-etcd-defrag kube-etcd-defrag
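
As a non-interactive alternative to the kubectl edit option above, the schedule could also be changed with a JSON merge patch. The sketch below only builds the patch document; the function name defrag_schedule_patch is hypothetical, and the schedule value is the same five-field cron string shown in the edited resource.

```shell
#!/bin/sh
# defrag_schedule_patch SCHEDULE (hypothetical helper): print a JSON merge
# patch that sets spec.schedule on a CronJob. SCHEDULE is a five-field cron
# string, for example "0 */12 * * *" (minute 0 of every 12th hour).
defrag_schedule_patch() {
    printf '{"spec":{"schedule":"%s"}}\n' "$1"
}
```

Assuming the function has been defined in the current shell, the patch could be applied as: kubectl patch cronjob kube-etcd-defrag -n services --type merge -p "$(defrag_schedule_patch '0 */12 * * *')"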

Check the log messages after the defrag job is triggered.

kubectl logs -n services $(kubectl get po -n services -l 'service.istio.io/canonical-name=kube-etcd-defrag' --sort-by=.metadata.creationTimestamp | tail -1 | awk '{print $1}')

Example output:

Running etcd defrag for: all
Skip defrag for: cray-hbtd-etcd
Skipping defrag for: cray-hbtd-etcd
Defragging cray-bos-bitnami-etcd-0
Defragging cray-bos-bitnami-etcd-1
Defragging cray-bos-bitnami-etcd-2

[...]

Verify that the disk usage is below the size limit.

/opt/cray/platform-utils/etcd/etcd-util.sh pvc_usage all_clusters

Example output:

### cray-bos-bitnami-etcd-0 PVC Usage: ###
Filesystem   Size   Used   Avail   Use%   Mounted   on
/dev/rbd14   7.8G   123M   7.7G   2%   /bitnami/etcd

### cray-bos-bitnami-etcd-1 PVC Usage: ###
Filesystem   Size   Used   Avail   Use%   Mounted   on
/dev/rbd11   7.8G   123M   7.7G   2%   /bitnami/etcd

### cray-bos-bitnami-etcd-2 PVC Usage: ###
Filesystem   Size   Used   Avail   Use%   Mounted   on
/dev/rbd10   7.8G   123M   7.7G   2%   /bitnami/etcd

Turn off the NOSPACE alarm. The example below will clear alarms for all clusters. all_clusters can be substituted with a cluster name (cray-bss) for an individual cluster:

/opt/cray/platform-utils/etcd/etcd-util.sh clear_alarms all_clusters

Example output:

### cray-bos-bitnami-etcd-0 Disarmed Alarms: ###
memberID:6004340417806974740 alarm:NOSPACE
memberID:10618826089438871005 alarm:NOSPACE
memberID:6927946043724325475 alarm:NOSPACE
### cray-bos-bitnami-etcd-1 Disarmed Alarms: ###
### cray-bos-bitnami-etcd-2 Disarmed Alarms: ###