PXE Boot Troubleshooting

This page covers issues that can arise when trying to PXE boot nodes in an HPE Cray EX system.

For PXE booting to succeed, the management network switches must be configured correctly.

Configuration required for PXE booting

To successfully PXE boot nodes, the following is required:

  • The ip helper-address must be configured on VLANs 1, 2, 4, and 7, on whichever switches host the layer 3 gateway (spine or leaf).
  • The virtual-IP/VSX/MAGP IP address must be configured on VLANs 1, 2, 4, and 7.
  • spine01 and spine02 need an active gateway on VLAN 1; this can be identified from MTL.yaml generated by CSI.
  • spine01 and spine02 need an ip helper-address on VLAN 1 pointing to 10.92.100.222.
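
These requirements can be spot-checked offline. The following is a minimal sketch, assuming the switch running configuration has been captured to a file (the sample here-doc below stands in for real `show run` output, and the file path is arbitrary):

```shell
# Write a sample captured running config (stand-in for real 'show run' output).
cat > /tmp/switch-running-config.txt <<'EOF'
interface vlan 1
    ip mtu 9198
    ip address 10.1.0.2/16
    active-gateway ip mac 12:00:00:00:6b:00
    active-gateway ip 10.1.0.1
    ip helper-address 10.92.100.222
interface vlan 2
    ip address 10.252.0.2/17
    active-gateway ip 10.252.0.1
    ip helper-address 10.92.100.222
EOF

for vlan in 1 2; do
    # Extract the block for this VLAN (everything up to the next 'interface' line).
    block=$(awk -v v="interface vlan ${vlan}" \
        '$0 == v {found=1; next} /^interface/ {found=0} found' \
        /tmp/switch-running-config.txt)
    # Each VLAN block needs a helper-address and an active-gateway IP.
    echo "${block}" | grep -q 'ip helper-address' \
        && echo "vlan ${vlan}: helper-address OK" \
        || echo "vlan ${vlan}: helper-address MISSING"
    echo "${block}" | grep -q 'active-gateway ip [0-9]' \
        && echo "vlan ${vlan}: active-gateway OK" \
        || echo "vlan ${vlan}: active-gateway MISSING"
done
```

On a real system, replace the sample file with the output of `show run` from each switch and extend the loop to cover VLANs 1, 2, 4, and 7.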

Switch configuration

Aruba configuration

  1. (sw-spine#) Check the configuration for VLAN interfaces 1, 2, 4, and 7.

    This configuration will be the same on BOTH switches (except the IP address). An active-gateway and an ip helper-address must be configured.

    int vlan 1,2,4,7
    show run current-context
    

    Example output:

    interface vlan 1
        ip mtu 9198
        ip address 10.1.0.2/16
        active-gateway ip mac 12:00:00:00:6b:00
        active-gateway ip 10.1.0.1
        ip helper-address 10.92.100.222
    interface vlan 2
        vsx-sync active-gateways
        ip mtu 9198
        ip address 10.252.0.2/17
        active-gateway ip mac 12:01:00:00:01:00
        active-gateway ip 10.252.0.1
        ip helper-address 10.92.100.222
        ip ospf 1 area 0.0.0.0
    interface vlan 4
        vsx-sync active-gateways
        ip mtu 9198
        ip address 10.254.0.2/17
        active-gateway ip mac 12:01:00:00:01:00
        active-gateway ip 10.254.0.1
        ip helper-address 10.94.100.222
        ip ospf 1 area 0.0.0.0
    interface vlan 7
        ip mtu 9198
        ip address 10.103.11.1/24
        active-gateway ip mac 12:01:00:00:01:00
        active-gateway ip 10.103.11.111
        ip helper-address 10.92.100.222
    
  2. (sw-spine#) If any of this configuration is missing, then update it on BOTH switches.

    conf t
    int vlan 1
    ip helper-address 10.92.100.222
    active-gateway ip mac 12:01:00:00:01:00
    active-gateway ip 10.1.0.1
    
    conf t
    int vlan 2
    ip helper-address 10.92.100.222
    active-gateway ip mac 12:01:00:00:01:00
    active-gateway ip 10.252.0.1
    
    conf t
    int vlan 4
    ip helper-address 10.94.100.222
    active-gateway ip mac 12:01:00:00:01:00
    active-gateway ip 10.254.0.1
    
    conf t
    int vlan 7
    ip helper-address 10.92.100.222
    active-gateway ip mac 12:01:00:00:01:00
    active-gateway ip <site-specific VLAN 7 gateway IP>
    write mem
    
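
Because the configuration must match on both switches apart from the interface `ip address` lines, a filtered diff of the two captured configs is a quick way to spot drift. This is an illustrative sketch only; the here-docs below stand in for the real `show run` output of each switch, and the file names are arbitrary:

```shell
# Sample captured configs for the two spine switches (stand-ins for 'show run').
cat > /tmp/sw-spine-001.cfg <<'EOF'
interface vlan 1
    ip address 10.1.0.2/16
    active-gateway ip 10.1.0.1
    ip helper-address 10.92.100.222
EOF
cat > /tmp/sw-spine-002.cfg <<'EOF'
interface vlan 1
    ip address 10.1.0.3/16
    active-gateway ip 10.1.0.1
    ip helper-address 10.92.100.222
EOF

# Drop the per-switch 'ip address' lines, then diff what remains.
diff <(grep -v '^ *ip address ' /tmp/sw-spine-001.cfg) \
     <(grep -v '^ *ip address ' /tmp/sw-spine-002.cfg) \
    && echo "configs match (ignoring ip address)"
```

Any lines reported by the diff (for example, a missing active-gateway or helper-address on one switch) should be reconciled using the commands in the previous step.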

Mellanox configuration

  1. (sw-spine#) Check the configuration for interface vlan 1.

    This configuration will be the same on BOTH switches (except the IP address). A MAGP virtual router and an ip dhcp relay instance will be configured.

    show run int vlan 1
    

    Example output:

    interface vlan 1
    interface vlan 1 ip address 10.1.0.2/16 primary
    interface vlan 1 ip dhcp relay instance 2 downstream
    interface vlan 1 magp 1
    interface vlan 1 magp 1 ip virtual-router address 10.1.0.1
    interface vlan 1 magp 1 ip virtual-router mac-address 00:00:5E:00:01:01
    
  2. (sw-spine#) If this configuration is missing, then add it to BOTH switches.

    conf t
    interface vlan 1 magp 1
    ip virtual-router address 10.1.0.1
    ip virtual-router mac-address 00:00:5E:00:01:01
    conf t
    ip dhcp relay instance 2 vrf default
    ip dhcp relay instance 2 address 10.92.100.222
    interface vlan 1 ip dhcp relay instance 2 downstream
    
  3. (sw-spine#) Verify the VLAN 1 MAGP configuration.

    show magp 1
    

    Example output:

    MAGP 1:
      Interface vlan: 1
      Admin state   : Enabled
      State         : Master
      Virtual IP    : 10.1.0.1
      Virtual MAC   : 00:00:5E:00:01:01
    
  4. (sw-spine#) Verify the DHCP relay configuration.

    show ip dhcp relay instance 2
    

    Example output:

    VRF Name: default
    
    DHCP Servers:
      10.92.100.222
    
    DHCP relay agent options:
      always-on         : Disabled
      Information Option: Disabled
      UDP port          : 67
      Auto-helper       : Disabled
    
    -------------------------------------------
    Interface   Label             Mode
    -------------------------------------------
    vlan1       N/A               downstream
    vlan2       N/A               downstream
    vlan7       N/A               downstream
    
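
The interface table at the bottom of that output can also be checked non-interactively. A minimal sketch, assuming the `show ip dhcp relay instance 2` output has been saved to a file (the here-doc below stands in for the real table):

```shell
# Sample of the interface table from 'show ip dhcp relay instance 2'.
cat > /tmp/relay.txt <<'EOF'
vlan1       N/A               downstream
vlan2       N/A               downstream
vlan7       N/A               downstream
EOF

# Each expected VLAN interface should appear with the downstream mode.
for intf in vlan1 vlan2 vlan7; do
    grep -q "^${intf}[[:space:]].*downstream" /tmp/relay.txt \
        && echo "${intf}: downstream relay OK" \
        || echo "${intf}: downstream relay MISSING"
done
```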
  5. (sw-spine#) Verify that the route to the TFTP server and the route for the ingress gateway are available.

    show ip route 10.92.100.60
    

    Example output:

    Flags:
      F: Failed to install in H/W
      B: BFD protected (static route)
      i: BFD session initializing (static route)
      x: protecting BFD session failed (static route)
      c: consistent hashing
      p: partial programming in H/W
    
    VRF Name default:
      ------------------------------------------------------------------------------------------------------
      Destination       Mask              Flag     Gateway           Interface        Source     AD/M
      ------------------------------------------------------------------------------------------------------
      default           0.0.0.0           c        10.101.15.161     eth1/12          static     1/1
      10.92.100.60      255.255.255.255   c        10.252.0.5        vlan2            bgp        200/0
                                          c        10.252.0.6        vlan2            bgp        200/0
                                          c        10.252.0.7        vlan2            bgp        200/0
    
    show ip route 10.92.100.71
    

    Example output:

    Flags:
      F: Failed to install in H/W
      B: BFD protected (static route)
      i: BFD session initializing (static route)
      x: protecting BFD session failed (static route)
      c: consistent hashing
      p: partial programming in H/W
    
    VRF Name default:
      ------------------------------------------------------------------------------------------------------
      Destination       Mask              Flag     Gateway           Interface        Source     AD/M
      ------------------------------------------------------------------------------------------------------
      default           0.0.0.0           c        10.101.15.161     eth1/12          static     1/1
      10.92.100.71      255.255.255.255   c        10.252.0.5        vlan2            bgp        200/0
                                          c        10.252.0.6        vlan2            bgp        200/0
                                          c        10.252.0.7        vlan2            bgp        200/0
    

Next steps

If the configuration looks good and PXE booting still fails, try the following procedures.

Kernel panic when unpacking initrd

In rare cases, when a node is PXE booting it may kernel panic when unpacking the initrd. The error message will be similar to the following, but may vary depending on a number of factors including the kernel and hardware in use.

...
http://rgw-vip.nmn/boot-images/9e2032aa-2e1d-4cd5-acb5-cb1ad7bac001/kernel... ok
http://rgw-vip.nmn/boot-images/9e2032aa-2e1d-4cd5-acb5-cb1ad7bac001/initrd... ok
DxeTpm2MeasureBootHandler: PeCoffLoaderGetImageInfo failed! Status = Unsupported
Image path:
"".
[    0.554122][  T222] Initramfs unpacking failed: invalid magic at start of compressed archive
[    1.202522][    T1] i8042: Can't read CTR while initializing i8042
[    1.588403][    T1] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    1.597815][    T1] CPU: 19 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.23.17-default #1 SLE15-SP6
[    1.611314][    T1] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant DL325 Gen10 Plus, BIOS A43 02/06/2023
...

This is typically a transient issue and can be resolved by rebooting the node with ipmitool.

read -s is used to prevent the password from being written to the screen or the shell history.

USERNAME=root
read -r -s -p "NCN BMC ${USERNAME} password: " IPMI_PASSWORD
export IPMI_PASSWORD
ipmitool -I lanplus -U "${USERNAME}" -E -H <bmc-hostname> power reset

Node iPXE retries and NIC order

In some environments, during the Deploy Final NCN: Reboot step, ncn-m001 may loop through all of its NICs and still fail to PXE boot, even after the third chain attempt. The default NIC boot ordering is designed to suit multiple types of hardware and cabling, but it may need to be edited for specific environments to reduce the boot time of ncn-m001.

If the boot issues described above are observed, then follow the steps in Edit the iPXE Embedded Boot Script, adjusting the NIC boot order such that net0, or others, come before net2. If that does not resolve the issue, then return to this page.

Restart BSS

Restart the Boot Script Service (BSS) if the following output is returned on the console during an NCN PXE boot attempt (specifically the 404 Not Found error at the bottom):

https://api-gw-service-nmn.local/apis/bss/boot/v1/bootscript...X509 chain 0x6d35c548 added X509 0x6d360d68 "eniac.dev.cray.com"
X509 chain 0x6d35c548 added X509 0x6d3d62e0 "Platform CA - L1 (a0b073c8-5c9c-4f89-b8a2-a44adce3cbdf)"
X509 chain 0x6d35c548 added X509 0x6d3d6420 "Platform CA (a0b073c8-5c9c-4f89-b8a2-a44adce3cbdf)"
EFITIME is 2021-02-26 21:55:04
HTTP 0x6d35da88 status 404 Not Found
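
If the console output has been captured to a log, the 404 can be confirmed with a quick grep before restarting anything. A minimal sketch (the here-doc and file path below are stand-ins for a real captured console log):

```shell
# Sample captured iPXE console output (stand-in for a real console log).
cat > /tmp/console.log <<'EOF'
https://api-gw-service-nmn.local/apis/bss/boot/v1/bootscript...
EFITIME is 2021-02-26 21:55:04
HTTP 0x6d35da88 status 404 Not Found
EOF

# A 404 on the boot script request suggests restarting BSS.
if grep -q 'status 404' /tmp/console.log; then
    echo "boot script request returned 404 - restart BSS"
fi
```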
  1. (ncn-mw#) Rollout a restart of the BSS deployment.

    kubectl -n services rollout restart deployment cray-bss
    

    Example output:

    deployment.apps/cray-bss restarted
    
  2. (ncn-mw#) Wait for the roll out to complete.

    Wait for this command to return (it will block showing status as the pods are refreshed).

    kubectl -n services rollout status deployment cray-bss
    

    Example output:

    Waiting for deployment "cray-bss" rollout to finish: 1 out of 3 new replicas have been updated...
    Waiting for deployment "cray-bss" rollout to finish: 1 out of 3 new replicas have been updated...
    Waiting for deployment "cray-bss" rollout to finish: 1 out of 3 new replicas have been updated...
    Waiting for deployment "cray-bss" rollout to finish: 2 out of 3 new replicas have been updated...
    Waiting for deployment "cray-bss" rollout to finish: 2 out of 3 new replicas have been updated...
    Waiting for deployment "cray-bss" rollout to finish: 2 out of 3 new replicas have been updated...
    Waiting for deployment "cray-bss" rollout to finish: 1 old replicas are pending termination...
    Waiting for deployment "cray-bss" rollout to finish: 1 old replicas are pending termination...
    deployment "cray-bss" successfully rolled out
    
  3. Reboot the NCN that failed to PXE boot.

Restart Kea

In some cases, restarting the Kea pod has resolved PXE issues.

  1. (ncn-mw#) Restart Kea.

    kubectl rollout restart deployment -n services cray-dhcp-kea
    
  2. (ncn-mw#) Wait for deployment to restart.

    kubectl rollout status deployment -n services cray-dhcp-kea
    

Missing BSS data

If PXE booting returns 404 errors, the necessary information may be missing from BSS. That information is uploaded into BSS with the csi handoff bss-metadata and csi handoff bss-update-cloud-init commands in the Deploy Final NCN Handoff Data procedure. If those commands failed or were accidentally skipped, then the ncn-m001 PXE boot will fail.

In that case, use the following recovery procedure.

  1. Reboot to the PIT.

    • If using a remote ISO PIT, skip to the next step.

    • If using a USB PIT, follow this procedure:

      1. Reboot the PIT node, watching the console as it boots.

      2. Manually stop it at the boot menu.

      3. Select the USB device for the boot.

      4. Once booted, log in and mount the data partition.

  2. (pit#) Set variables for the system name, the CAN IP address for ncn-m002, the Kubernetes version, and the Ceph version.

    If needed, the typescript file from that procedure should be on ncn-m002 and ncn-m003 in the /metal/bootstrap/prep/admin directory.

    Substitute the correct values for the system in use in the following commands:

    SYSTEM_NAME=<system name>
    CAN_IP_NCN_M002=$(ssh ncn-m002 ip -4 a show bond0.can0 | grep inet | awk '{print $2}' | cut -d / -f1)
    
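
The pipeline in that command simply pulls the address out of the `ip -4 a show` output. As an illustration, here is the same pipeline run against a sample `inet` line instead of the real ssh output (the address shown is hypothetical):

```shell
# Hypothetical line of 'ip -4 a show bond0.can0' output.
sample='    inet 10.102.4.5/24 brd 10.102.4.255 scope global bond0.can0'

# grep selects the inet line, awk takes the CIDR field, cut strips the prefix length.
can_ip=$(echo "$sample" | grep inet | awk '{print $2}' | cut -d / -f1)
echo "$can_ip"    # prints 10.102.4.5
```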
  3. (pit#) If using a remote ISO PIT, then run the following commands to finish configuring the network and copy files.

    Skip these steps if using a USB PIT.

    1. Run the following command to copy files from ncn-m002 to the PIT node.

      scp -p ${CAN_IP_NCN_M002}:/metal/bootstrap/prep/${SYSTEM_NAME}/pit-files/* /etc/sysconfig/network/
      
    2. Apply the network changes.

      wicked ifreload all
      systemctl restart wickedd-nanny && sleep 5
      
    3. Copy data.json from ncn-m002 to the PIT node.

      mkdir -p /var/www/ephemeral/configs
      scp ${CAN_IP_NCN_M002}:/metal/bootstrap/prep/${SYSTEM_NAME}/basecamp/data.json /var/www/ephemeral/configs
      
  4. (pit#) Copy Kubernetes configuration file from ncn-m002.

    mkdir -pv ~/.kube
    scp ${CAN_IP_NCN_M002}:/etc/kubernetes/admin.conf ~/.kube/config
    
  5. (pit#) Set DNS to use Unbound.

    echo "nameserver 10.92.100.225" > /etc/resolv.conf
    
  6. (pit#) Export the API token.

    export TOKEN=$(curl -k -s -S -d grant_type=client_credentials \
        -d client_id=admin-client \
        -d client_secret=`kubectl get secrets admin-client-auth -o jsonpath='{.data.client-secret}' | base64 -d` \
        https://api-gw-service-nmn.local/keycloak/realms/shasta/protocol/openid-connect/token | jq -r '.access_token')
    
  7. (pit#) Re-run the BSS handoff commands from the Deploy Final NCN procedure.

    WARNING: These commands must never be run from any node other than the PIT node, or ncn-m001 during handoff.

    csi handoff bss-metadata --data-file /var/www/ephemeral/configs/data.json || echo "ERROR: csi handoff bss-metadata failed"
    csi handoff bss-update-cloud-init --set meta-data.dns-server=10.92.100.225 --limit Global
    
  8. Perform the Restart BSS and the Restart Kea procedures.

  9. Reboot the PIT node.