containerd

containerd is a container runtime (systemd service) that runs on the host. It is used to run containers on the Kubernetes platform.
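(ncn-mw#) Optionally, verify that the containerd service is running before troubleshooting. This is a general sanity check using standard systemctl and crictl commands; it is not a required step in the procedures below.

systemctl is-active containerd
crictl info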
/var/lib/containerd filling up

In older versions of containerd, the /var/lib/containerd directory can fill up. If this occurs, use the following steps to remediate the issue.
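(ncn-mw#) Before starting, it can be useful to confirm how full the filesystem is and what is consuming the space. These are standard disk usage commands, shown here as an optional sketch; the exact layout under /var/lib/containerd may vary by containerd version.

df -h /var/lib/containerd
du -sh /var/lib/containerd/* 2>/dev/null | sort -h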
(ncn-mw#) Restart containerd on the NCN.
Whether or not this resolves the space issue, if this is a worker NCN, then also see the notes in the Restarting containerd on a worker NCN section for subsequent steps that must be taken after containerd is restarted.
systemctl restart containerd
This will often free up space in /var/lib/containerd; if not, then proceed to the next step.
(ncn-mw#) Restart kubelet on the NCN.
systemctl restart kubelet
If restarting kubelet fails to free up space in /var/lib/containerd, then proceed to the next step.
(ncn-mw#) Prune unused container images on the NCN.
crictl rmi --prune
Any unused images will be pruned. If disk space issues in /var/lib/containerd persist, then proceed to the next step and reboot the NCN.
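(ncn-mw#) Optionally, list the images and image filesystem usage before and after pruning to see what was reclaimed. These are standard crictl subcommands, not part of the documented procedure.

crictl images
crictl imagefsinfo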
Reboot the NCN.
Follow the Reboot NCNs process to properly cordon/drain the NCN and reboot.
Generally this will free up space in /var/lib/containerd.
Restarting containerd on a worker NCN

If the containerd service is restarted on a worker node, then this may cause the sonar-jobs-watcher pod running on that worker node to fail when attempting to clean up unneeded containers. The following procedure determines whether this is the case and remediates it, if necessary.
(ncn-mw#) Retrieve the name of the sonar-jobs-watcher pod that is running on this worker node.
Modify the following command to specify the name of the specific worker NCN where containerd was restarted.
kubectl get pods -l name=sonar-jobs-watcher -n services -o wide | grep ncn-w001
Example output:
sonar-jobs-watcher-8z6th 1/1 Running 0 95d 10.42.0.6 ncn-w001 <none> <none>
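As an alternative to piping through grep, the node can be selected directly with kubectl's generic field selector; the node name ncn-w001 below is the same example node used above.

kubectl get pods -l name=sonar-jobs-watcher -n services -o wide --field-selector spec.nodeName=ncn-w001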
(ncn-mw#) View the logs for the sonar-jobs-watcher pod.
Modify the following command to specify the pod name identified in the previous step.
kubectl logs sonar-jobs-watcher-8z6th -n services
Example output:
Found pod cray-dns-unbound-manager-1631116980-h69h6 with restartPolicy 'Never' and container 'manager' with status 'Completed'
All containers of job pod cray-dns-unbound-manager-1631116980-h69h6 has completed. Killing istio-proxy (1c65dacb960c2f8ff6b07dfc9780c4621beb8b258599453a08c246bbe680c511) to allow job to complete
time="2021-09-08T16:44:18Z" level=fatal msg="failed to connect: failed to connect, make sure you are running as root and the runtime has been started: context deadline exceeded"
When this occurs, pods that are running on the node where containerd was restarted may remain in a NotReady state and never complete.
(ncn-mw#) Check if pods are stuck in a NotReady state.
kubectl get pods -o wide -A | grep NotReady
Example output:
services cray-dns-unbound-manager-1631116980-h69h6 1/2 NotReady 0 10m 10.42.0.100 ncn-w001 <none> <none>
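Optionally, describe one of the stuck pods to confirm that the only container still running is the istio-proxy sidecar mentioned in the sonar-jobs-watcher log. This is a generic kubectl check, using the example pod name from the output above.

kubectl describe pod cray-dns-unbound-manager-1631116980-h69h6 -n services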
(ncn-mw#) If any pods are stuck in a NotReady state, then restart the sonar-jobs-watcher daemonset to resolve the issue.
kubectl rollout restart -n services daemonset sonar-jobs-watcher
Expected output:
daemonset.apps/sonar-jobs-watcher restarted
(ncn-mw#) Verify that the restart completed successfully.
kubectl rollout status -n services daemonset sonar-jobs-watcher
Expected output:
daemon set "sonar-jobs-watcher" successfully rolled out
Once the sonar-jobs-watcher pods restart, any pods that were in a NotReady state should complete within about a minute.
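To confirm, the earlier NotReady check can be repeated after a minute or so; it should return no output once the affected job pods have completed.

kubectl get pods -o wide -A | grep NotReady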
To learn more about containerd in general, refer to the containerd documentation.