# containerd

`containerd` is a container runtime (`systemd` service) that runs on the host. It is used to run containers on the Kubernetes platform.
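
For a quick look at the runtime on an NCN, the service state and version can be checked with standard commands. This is a minimal sketch; exact output varies by release.

```bash
# Confirm the containerd systemd service is active
systemctl status containerd --no-pager

# Report the installed containerd version
containerd --version

# crictl talks to containerd over the CRI socket used by Kubernetes
crictl version
```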
## `/var/lib/containerd` filling up

In older versions of `containerd`, there are cases where the `/var/lib/containerd` directory fills up. In the event that this occurs, the following steps can be used to remediate the issue.
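
Before working through the steps below, it can help to confirm how much space `/var/lib/containerd` is actually consuming. A minimal check, assuming the directory is visible to `df` on this node (filesystem layout can differ between NCN images):

```bash
# Free space on the filesystem backing /var/lib/containerd
df -h /var/lib/containerd

# Largest subdirectories, to see what is consuming the space
du -xsh /var/lib/containerd/* 2>/dev/null | sort -rh | head
```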
1. (`ncn-mw#`) Restart `containerd` on the NCN.

    Whether or not this resolves the space issue, if this is a worker NCN, then also see the notes in the Restarting `containerd` on a worker NCN section for subsequent steps that must be taken after `containerd` is restarted.

    ```bash
    systemctl restart containerd
    ```

    Many times this will free up space in `/var/lib/containerd`. If it does not, then proceed to the next step.
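
    To confirm the restart took effect and to see whether space was actually reclaimed, a quick follow-up check (a sketch; mount layout may vary by site) is:

    ```bash
    # containerd should report "active" after the restart
    systemctl is-active containerd

    # Re-check free space on the filesystem backing /var/lib/containerd
    df -h /var/lib/containerd
    ```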
2. (`ncn-mw#`) Restart `kubelet` on the NCN.

    ```bash
    systemctl restart kubelet
    ```

    If restarting `kubelet` fails to free up space in `/var/lib/containerd`, then proceed to the next step.
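
    After the restart, it may also be worth confirming that `kubelet` is running and the node reports `Ready` before moving on. The node name below is illustrative; substitute the NCN being worked on.

    ```bash
    # kubelet should report "active" after the restart
    systemctl is-active kubelet

    # The node should return to the Ready state
    kubectl get node ncn-w001
    ```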
3. (`ncn-mw#`) Prune unused container images on the NCN.

    ```bash
    crictl rmi --prune
    ```

    Any unused images will be pruned. If disk space issues in `/var/lib/containerd` persist, then proceed to the next step to reboot the NCN.
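
    To see which images are present and how much space they account for before or after pruning (purely informational, not a required part of the procedure):

    ```bash
    # Images currently held by containerd, as reported through the CRI
    crictl images

    # Overall image filesystem usage
    crictl imagefsinfo
    ```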
4. Reboot the NCN.

    Follow the Reboot NCNs process to properly cordon/drain the NCN and reboot. Generally this will free up space in `/var/lib/containerd`.
## Restarting `containerd` on a worker NCN

If the `containerd` service is restarted on a worker node, then this may cause the `sonar-jobs-watcher` pod running on that worker node to fail when attempting to clean up unneeded containers. The following procedure determines whether this is the case and remediates it, if necessary.
1. (`ncn-mw#`) Retrieve the name of the `sonar-jobs-watcher` pod that is running on this worker node.

    Modify the following command to specify the name of the specific worker NCN where `containerd` was restarted.

    ```bash
    kubectl get pods -l name=sonar-jobs-watcher -n services -o wide | grep ncn-w001
    ```

    Example output:

    ```text
    sonar-jobs-watcher-8z6th   1/1   Running   0   95d   10.42.0.6   ncn-w001   <none>   <none>
    ```
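
    As an alternative to filtering with `grep`, the pod can also be selected directly by node name with a field selector. This is shown as an option, not a required step:

    ```bash
    kubectl get pods -l name=sonar-jobs-watcher -n services -o wide \
        --field-selector spec.nodeName=ncn-w001
    ```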
2. (`ncn-mw#`) View the logs for the `sonar-jobs-watcher` pod.

    Modify the following command to specify the pod name identified in the previous step.

    ```bash
    kubectl logs sonar-jobs-watcher-8z6th -n services
    ```

    Example output:

    ```text
    Found pod cray-dns-unbound-manager-1631116980-h69h6 with restartPolicy 'Never' and container 'manager' with status 'Completed'
    All containers of job pod cray-dns-unbound-manager-1631116980-h69h6 has completed. Killing istio-proxy (1c65dacb960c2f8ff6b07dfc9780c4621beb8b258599453a08c246bbe680c511) to allow job to complete
    time="2021-09-08T16:44:18Z" level=fatal msg="failed to connect: failed to connect, make sure you are running as root and the runtime has been started: context deadline exceeded"
    ```

    When this occurs, pods that are running on the node where `containerd` was restarted may remain in a `NotReady` state and never complete.
3. (`ncn-mw#`) Check if pods are stuck in a `NotReady` state.

    ```bash
    kubectl get pods -o wide -A | grep NotReady
    ```

    Example output:

    ```text
    services   cray-dns-unbound-manager-1631116980-h69h6   1/2   NotReady   0   10m   10.42.0.100   ncn-w001   <none>   <none>
    ```
4. (`ncn-mw#`) If any pods are stuck in a `NotReady` state, then restart the `sonar-jobs-watcher` daemonset to resolve the issue.

    ```bash
    kubectl rollout restart -n services daemonset sonar-jobs-watcher
    ```

    Expected output:

    ```text
    daemonset.apps/sonar-jobs-watcher restarted
    ```
5. (`ncn-mw#`) Verify that the restart completed successfully.

    ```bash
    kubectl rollout status -n services daemonset sonar-jobs-watcher
    ```

    Expected output:

    ```text
    daemon set "sonar-jobs-watcher" successfully rolled out
    ```
Once the `sonar-jobs-watcher` pods restart, any pods that were in a `NotReady` state should complete within about a minute.
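
As a final check, re-running the earlier query should return nothing once the affected pods have completed:

```bash
# No output here means no pods remain in a NotReady state
kubectl get pods -o wide -A | grep NotReady
```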
To learn more about `containerd` in general, refer to the containerd documentation.