Increase the appropriate resource limits for pods after determining if a pod is being CPU throttled or OOMKilled.
Return Kubernetes pods to a healthy state with resources available.
(ncn-mw#) Determine the current limits of a pod.
kubectl get po -n services POD_ID -o yaml
Look for the following section returned in the output:
resources:
  limits:
    cpu: "2"
    memory: 2Gi
  requests:
    cpu: 10m
    memory: 64Mi
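To avoid scanning the full YAML, the same information can be pulled out with a JSONPath query. The following is a minimal sketch, not part of the original procedure: the function name show_pod_resources is hypothetical, and the services namespace default simply follows the example above. How the resources stanza is rendered depends on the kubectl version.

```shell
# Hypothetical helper: print "<container name><TAB><resources>" for each
# container in a pod. $1 = pod name, $2 = namespace (defaults to services).
show_pod_resources() {
    kubectl get pod -n "${2:-services}" "$1" \
        -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}

# Usage: show_pod_resources POD_ID
```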
(ncn-mw#) Determine which Kubernetes entity (etcdcluster, deployment, statefulset, etc.) is creating the pod.
The Kubernetes entity can be found with either of the following options:
List the candidate Kubernetes entities and grep for the pod in question.
In the following example, replace hbtd-etcd with the base name of the pod being investigated.
kubectl get deployment,statefulset,etcdcluster,postgresql,daemonsets -A | grep hbtd-etcd
Example output:
services etcdcluster.etcd.database.coreos.com/cray-hbtd-etcd 32d
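The second column of that output packs the entity type and name into one string. As an illustration only, a small awk filter can split it into the ENTITY_TYPE and ENTITY_NAME values used later in this procedure. The helper name split_entity is hypothetical, and it assumes the namespace and name columns appear first, as in the example output.

```shell
# Hypothetical helper: read "kubectl get ... -A" lines on stdin and print
# "<namespace> <entity type> <entity name>" for each line.
split_entity() {
    awk '{
        split($2, parts, "/")       # parts[1] = type.group, parts[2] = name
        split(parts[1], kind, ".")  # kind[1] = short type, e.g. etcdcluster
        print $1, kind[1], parts[2]
    }'
}

# Usage:
# kubectl get deployment,statefulset,etcdcluster,postgresql,daemonsets -A | grep hbtd-etcd | split_entity
```

For the example output above, this prints services etcdcluster cray-hbtd-etcd.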
Describe the pod and look in the Labels section.
This section is helpful for tracking down which entity is creating the pod.
kubectl describe pod -n services POD_ID
Excerpt from example output:
Labels:       app=etcd
              etcd_cluster=cray-hbtd-etcd
              etcd_node=cray-hbtd-etcd-8r2scmpb58
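The pod's ownerReferences metadata is another place to look. This is a sketch under assumptions rather than part of the documented procedure: pod_owner is a hypothetical helper name, the list may be empty for some pods, and for a pod created by a deployment it names the intermediate ReplicaSet rather than the deployment itself.

```shell
# Hypothetical helper: print "<Kind>/<name>" for each owner of a pod.
# $1 = pod name, $2 = namespace (defaults to services).
pod_owner() {
    kubectl get pod -n "${2:-services}" "$1" \
        -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}{"/"}{.name}{"\n"}{end}'
}

# Usage: pod_owner POD_ID
```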
(ncn-mw#) Edit the entity.
In the command below, replace ENTITY_TYPE and ENTITY_NAME with the values determined in the previous step (in that step's example output, these are etcdcluster and cray-hbtd-etcd, respectively).
kubectl edit ENTITY_TYPE -n services ENTITY_NAME
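Before editing, it can be useful to save a copy of the current definition so the change can be compared or reverted. A minimal sketch, assuming the same ENTITY_TYPE and ENTITY_NAME placeholders; the backup_entity name and the backup file naming are illustrative choices, not part of the procedure.

```shell
# Hypothetical helper: save the current definition of an entity to a YAML
# file in the current directory and print the file name.
# $1 = entity type, $2 = entity name, $3 = namespace (defaults to services).
backup_entity() {
    out="${1}-${2}-backup.yaml"
    kubectl get "$1" -n "${3:-services}" "$2" -o yaml > "$out" && echo "$out"
}

# Usage: backup_entity ENTITY_TYPE ENTITY_NAME
```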
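(ncn-mw#) Increase the resource limits for the pod.
In the editor, locate the resources section of the pod specification. If no limits are set, it looks like the following: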
resources: {}
Replace the text above with the following section, increasing the limits values:
resources:
  limits:
    cpu: "4"
    memory: 8Gi
  requests:
    cpu: 10m
    memory: 64Mi
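For built-in workload types such as deployment, statefulset, and daemonset, kubectl set resources can apply the same change without opening an editor. This is offered as an alternative sketch only: it is an assumption that it suits the entity in question, it does not apply to custom resources such as etcdcluster, and the set_limits wrapper name is hypothetical. The values in the usage line mirror the example above.

```shell
# Hypothetical helper: set CPU and memory limits on a built-in workload.
# $1 = entity type, $2 = entity name, $3 = CPU limit, $4 = memory limit.
set_limits() {
    kubectl set resources "$1" "$2" -n services --limits="cpu=$3,memory=$4"
}

# Usage: set_limits deployment ENTITY_NAME 4 8Gi
```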
(ncn-mw#) Run a rolling restart of the pods.
First, list the pods that belong to the entity.
kubectl get po -n services | grep ENTITY_NAME
Example output:
cray-hbtd-etcd-8r2scmpb58 1/1 Running 0 5d11h
cray-hbtd-etcd-qvz4zzjzw2 1/1 Running 0 5d11h
cray-hbtd-etcd-vzjzmbn6nr 1/1 Running 0 5d11h
(ncn-mw#) Delete the pods one at a time.
Wait for each replacement pod to come up and be in a Running state before proceeding to the next pod.
kubectl -n services delete pod POD_ID
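The wait between deletions can be checked mechanically by parsing the pod listing. The sketch below is illustrative: all_running is a hypothetical helper that reads kubectl get po lines on standard input and succeeds only if at least one pod is listed and every listed pod is Running with all containers ready. It assumes the default column layout (NAME, READY, STATUS, and so on).

```shell
# Hypothetical helper: exit 0 only if stdin lists at least one pod and all
# listed pods are Running with READY showing n/n.
all_running() {
    awk '
        {
            split($2, ready, "/")   # READY column, e.g. 1/1
            if ($3 != "Running" || ready[1] != ready[2]) bad = 1
            seen++
        }
        END { exit ((bad || seen == 0) ? 1 : 0) }
    '
}

# Usage: after deleting one pod, poll until the listing is healthy again.
# until kubectl get po -n services | grep ENTITY_NAME | all_running; do sleep 5; done
```

This check does not compare against the expected number of pods, so it is still worth confirming that the replacement pod appears in the listing before deleting the next one.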
(ncn-mw#) Verify that all pods are now Running with a more recent age.
kubectl get po -n services | grep ENTITY_NAME
Example output:
cray-hbtd-etcd-8r2scmpb58 1/1 Running 0 12s
cray-hbtd-etcd-qvz4zzjzw2 1/1 Running 0 32s
cray-hbtd-etcd-vzjzmbn6nr 1/1 Running 0 98s
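A recent age shows that the pods restarted, but not that they picked up the new limits. As a hedged sketch, the limits on the running pods can be listed with custom columns; show_limits is a hypothetical helper name and the column headers are arbitrary. Because the output is filtered with grep, the header row is dropped.

```shell
# Hypothetical helper: list pod name, CPU limit, and memory limit for pods
# in the services namespace whose name matches $1.
show_limits() {
    kubectl get pod -n services \
        -o custom-columns='NAME:.metadata.name,CPU_LIMIT:.spec.containers[*].resources.limits.cpu,MEM_LIMIT:.spec.containers[*].resources.limits.memory' \
        | grep "$1"
}

# Usage: show_limits ENTITY_NAME
```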