TFTP issues can result in node boot failures. Use this procedure to investigate and resolve such issues.
This procedure requires administrative privileges.
Encryption of compute node logs is not enabled, so the passwords may be passed in clear text.
(ncn-mw#) Check that the TFTP service is running.
kubectl get pods -n services -o wide | grep cray-tftp
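The grep above lists the pods but leaves the status check to the reader. As a minimal sketch, the helper below reads pod rows on standard input and prints any cray-tftp pod whose STATUS is something other than Running. The function name tftp_pods_not_running is hypothetical, and the sketch assumes STATUS is the third column, as in the default kubectl layout.

```shell
# Hypothetical helper, not part of the product tooling.
# Reads "kubectl get pods --no-headers" rows on stdin and prints
# "<pod name> <status>" for each cray-tftp pod that is not Running.
tftp_pods_not_running() {
    # Assumes the default column order: NAME READY STATUS RESTARTS AGE ...
    awk '$1 ~ /cray-tftp/ && $3 != "Running" { print $1, $3 }'
}

# Usage on a live system (empty output means every TFTP pod is Running):
#   kubectl get pods -n services --no-headers -o wide | tftp_pods_not_running
```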
Start a tcpdump session on the NCN.
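The procedure does not give the exact tcpdump invocation, so the following is only a sketch of one reasonable capture. TFTP requests are sent to UDP port 69, so the filter below shows the initial request; the data transfer that follows uses other ports and is not matched. The function name capture_tftp and the default interface of any are assumptions, as is the example interface name in the usage comment. The same filter can also be used for the capture inside the TFTP pod, provided tcpdump is available in the container.

```shell
# Hypothetical helper: capture initial TFTP requests on one interface.
# $1 is the interface name; "any" (all interfaces, Linux only) is an assumed default.
capture_tftp() {
    # -nn disables host and port name resolution so addresses print numerically.
    # "udp port 69" matches traffic to or from the well-known TFTP port.
    tcpdump -nn -i "${1:-any}" udp port 69
}

# Usage (run as root, stop with Ctrl-C):
#   capture_tftp              # all interfaces
#   capture_tftp bond0.nmn0   # assumed name of the NMN interface
```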
(ncn-mw#) Obtain the TFTP pod’s ID.
PODID=$(kubectl get pods -n services --no-headers -o wide | grep cray-tftp | awk '{print $1}')
echo $PODID
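If the command above returns more than one pod name, PODID must be narrowed to a single value before it is used. A minimal sketch, assuming any one of the listed pods is acceptable; the function name first_pod is hypothetical.

```shell
# Hypothetical helper: print the first non-empty pod name from a
# newline-separated (or space-separated) list passed as $1.
first_pod() {
    printf '%s\n' "$1" | awk 'NF { print $1; exit }'
}

# Usage: keep only the first pod ID.
#   PODID=$(first_pod "$PODID")
#   echo "$PODID"
```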
(ncn-mw#) Enter the TFTP pod using the pod ID.
Double check that PODID contains only one ID. If multiple TFTP pods are listed, choose any one of them as the ID.
kubectl exec -n services -it "$PODID" -- /bin/sh
Start a tcpdump session from within the TFTP pod.
Open another terminal to perform the following tasks:
Use a TFTP client to issue a TFTP request from either the NCN or a laptop.
Analyze the NCN tcpdump data to ensure that the TFTP request is visible.
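The procedure does not name a specific TFTP client for the request in the tasks above. One option, sketched here under stated assumptions, is curl, which can speak TFTP when it is built with TFTP support. The function name tftp_fetch is hypothetical, and the server address and file name in the usage comment are placeholders: substitute the TFTP service address and a file that the server is known to offer.

```shell
# Hypothetical helper: request one file over TFTP and discard the payload.
# $1 = TFTP server address, $2 = file name; both must be supplied by the caller.
tftp_fetch() {
    # --max-time bounds the wait so a dropped request fails instead of hanging.
    curl --silent --show-error --max-time 30 --output /dev/null "tftp://$1/$2"
}

# Usage (placeholder address and file name):
#   tftp_fetch 192.0.2.10 ipxe.efi && echo "TFTP request succeeded"
```

A non-zero exit status together with no matching packets in the NCN capture suggests that the request never reached the node.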
Go back to the original terminal to analyze the TFTP pod’s tcpdump data in order to ensure that the TFTP request is visible inside the pod.
If the TFTP request is not visible on the NCN, the cause may be a firewall issue. If the TFTP request is visible on the NCN but not inside the pod, double check that the request was issued over the correct interface for the Node Management Network (NMN). If it was, then the underlying issue could be related to the firewall.