Increase Pod Resource Limits

Increase the appropriate resource limits for a pod after determining that it is being CPU throttled or OOMKilled.

The goal is to return the Kubernetes pods to a healthy state with sufficient resources available.

Prerequisites

  • A pod has been identified as being CPU throttled or OOMKilled.
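
    As a quick check (a sketch; POD_ID is a placeholder), a pod whose container was OOMKilled reports Reason: OOMKilled in its last state:

    kubectl describe pod -n services POD_ID | grep -A 4 'Last State'
    

    CPU throttling does not show up in kubectl describe output; it must be read from container CPU metrics instead.
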
Procedure

  1. (ncn-mw#) Determine the current limits of a pod.

    kubectl get po -n services POD_ID -o yaml
    

    Look for the following section returned in the output:

        resources:
          limits:
            cpu: "2"
            memory: 2Gi
          requests:
            cpu: 10m
            memory: 64Mi
    
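
    If only the resources block is of interest, a jsonpath query can extract it directly (a sketch; it assumes the first container in the pod is the relevant one):

    kubectl get po -n services POD_ID -o jsonpath='{.spec.containers[0].resources}'
    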
  2. (ncn-mw#) Determine which Kubernetes entity (etcdcluster, deployment, statefulset, etc.) is creating the pod.

    The Kubernetes entity can be found with either of the following options:

    • List the candidate entity types and grep for the pod in question.

      In the following example, replace hbtd-etcd with the name of the pod being investigated.

      kubectl get deployment,statefulset,etcdcluster,postgresql,daemonsets -A | grep hbtd-etcd
      

      Example output:

      services    etcdcluster.etcd.database.coreos.com/cray-hbtd-etcd               32d
      
    • Describe the pod and look in the Labels section.

      This section is helpful for tracking down which entity is creating the pod.

      kubectl describe pod -n services POD_ID
      

      Excerpt from example output:

      Labels:       app=etcd
                    etcd_cluster=cray-hbtd-etcd
                    etcd_node=cray-hbtd-etcd-8r2scmpb58
      
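
    Beyond these two options, the pod's ownerReferences metadata records the object that created it (a sketch; note that the immediate owner may not be the top-level entity, for example a deployment's pods are owned by a replicaset):

    kubectl get pod -n services POD_ID -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
    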
  3. (ncn-mw#) Edit the entity.

    In the example below, be sure to replace ENTITY_TYPE and ENTITY_NAME with the values determined in the previous step (in the example output from that step, these would be etcdcluster and cray-hbtd-etcd, respectively).

    kubectl edit ENTITY_TYPE -n services ENTITY_NAME
    
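
    Before editing, it can help to view the entity's current resources section to see what will change (a sketch using grep context lines):

    kubectl get ENTITY_TYPE -n services ENTITY_NAME -o yaml | grep -B 2 -A 6 'resources:'
    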
  4. (ncn-mw#) Increase the resource limits for the pod.

    Locate the resources section in the entity's specification. If no limits have been set, it may look like this:

        resources: {}
    

    Replace it with a fully populated section, increasing the limits values:

        resources:
          limits:
            cpu: "4"
            memory: 8Gi
          requests:
            cpu: 10m
            memory: 64Mi
    
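
    For a deployment specifically, the same change can be applied non-interactively with a strategic merge patch instead of kubectl edit (a sketch; CONTAINER_NAME is a placeholder, and other entity types such as etcdcluster use a different schema):

    kubectl -n services patch deployment ENTITY_NAME \
        -p '{"spec":{"template":{"spec":{"containers":[{"name":"CONTAINER_NAME","resources":{"limits":{"cpu":"4","memory":"8Gi"}}}]}}}}'
    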
  5. (ncn-mw#) Run a rolling restart of the pods.

    The increased limits only take effect when the pods are recreated. First, list the pods managed by the entity:

    kubectl get po -n services | grep ENTITY_NAME
    

    Example output:

    cray-hbtd-etcd-8r2scmpb58 1/1 Running 0 5d11h
    cray-hbtd-etcd-qvz4zzjzw2 1/1 Running 0 5d11h
    cray-hbtd-etcd-vzjzmbn6nr 1/1 Running 0 5d11h
    
  6. (ncn-mw#) Delete the pods one by one.

    Wait for each replacement pod to come up and be in a Running state before proceeding to the next pod.

    kubectl -n services delete pod POD_ID
    
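
    For deployments, statefulsets, and daemonsets, kubectl rollout restart automates this rolling replacement. For operator-managed pods such as the etcd example, a loop can delete each pod and wait for the cluster's pods to be Ready again before continuing (a sketch that assumes the etcd_cluster label shown in the earlier output; adjust the selector for other entities):

    for pod in $(kubectl get po -n services -l etcd_cluster=cray-hbtd-etcd -o name); do
        kubectl -n services delete "$pod"
        kubectl -n services wait --for=condition=Ready pod \
            -l etcd_cluster=cray-hbtd-etcd --timeout=300s
    done
    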
  7. (ncn-mw#) Verify that all pods are now Running with a more recent age.

    kubectl get po -n services | grep ENTITY_NAME
    

    Example output:

    cray-hbtd-etcd-8r2scmpb58 1/1 Running 0 12s
    cray-hbtd-etcd-qvz4zzjzw2 1/1 Running 0 32s
    cray-hbtd-etcd-vzjzmbn6nr 1/1 Running 0 98s
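
    As a final check (a sketch; POD_ID is one of the new pods), the recreated pods should report the increased limits:

    kubectl get po -n services POD_ID -o jsonpath='{.spec.containers[*].resources.limits}'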