This page covers issues that can arise when PXE booting nodes in an HPE Cray EX system.
In order for PXE booting to work successfully, the management network switches need to be configured correctly.
To successfully PXE boot nodes, the following is required:
ip helper-address must be configured on VLANs 1, 2, 4, and 7. This will be where the layer 3 gateway exists (spine or aggregation).
ncn-m001 needs an active gateway on VLAN 1; this can be identified from MTL.yaml generated by CSI.
ncn-m001 needs an ip helper-address on VLAN 1 pointing to 10.92.100.222.
Snippet of MTL.yaml:
name: network_hardware
net-name: MTL
vlan_id: 0
comment: ""
gateway: 10.1.0.1
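To read the gateway address from the file instead of locating it by eye, a minimal sketch follows. It assumes the file layout matches the snippet above, with the address on a line whose first field is gateway:. The function name mtl_gateway is hypothetical and is not part of CSI.

```shell
# Hypothetical helper (not part of CSI): print the first "gateway:" value
# found in YAML text read from stdin. Assumes the layout shown in the snippet.
mtl_gateway() {
    awk '$1 == "gateway:" { print $2; exit }'
}

# Example using the snippet from this page. On a real system, feed it the
# CSI-generated file instead, for example: mtl_gateway < MTL.yaml
mtl_gateway <<'EOF'
name: network_hardware
net-name: MTL
vlan_id: 0
comment: ""
gateway: 10.1.0.1
EOF
```

The printed address is the one that the VLAN 1 active gateway on the switches should match.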
Check the configuration for interface vlan x.
This configuration will be the same on BOTH switches (except the ip address).
There will be an active-gateway and ip helper-address configured.
sw-spine-001(config)# int vlan 1,2,4,7
sw-spine-001(config-if-vlan-<1,2,4,7>)# show run current-context
Example output:
interface vlan 1
ip mtu 9198
ip address 10.1.0.2/16
active-gateway ip mac 12:00:00:00:6b:00
active-gateway ip 10.1.0.1
ip helper-address 10.92.100.222
interface vlan 2
vsx-sync active-gateways
ip mtu 9198
ip address 10.252.0.2/17
active-gateway ip mac 12:01:00:00:01:00
active-gateway ip 10.252.0.1
ip helper-address 10.92.100.222
ip ospf 1 area 0.0.0.0
interface vlan 4
vsx-sync active-gateways
ip mtu 9198
ip address 10.254.0.2/17
active-gateway ip mac 12:01:00:00:01:00
active-gateway ip 10.254.0.1
ip helper-address 10.94.100.222
ip ospf 1 area 0.0.0.0
interface vlan 7
ip mtu 9198
ip address 10.103.11.1/24
active-gateway ip mac 12:01:00:00:01:00
active-gateway ip 10.103.11.111
ip helper-address 10.92.100.222
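When the output is long, the check for the two required settings can be scripted. The sketch below is an illustration only: it assumes the switch output has been saved to a text file on an admin node and that it follows the layout of the example output above. The function name check_vlan_config is hypothetical.

```shell
# Hypothetical helper: read saved "show run" text on stdin and report every
# VLAN interface that lacks an "ip helper-address" or an "active-gateway ip".
check_vlan_config() {
    awk '
        function report() {
            if (vlan == "") return
            if (!helper)  print "vlan " vlan " missing ip helper-address"
            if (!gateway) print "vlan " vlan " missing active-gateway ip"
        }
        # A new VLAN stanza starts: report on the previous one and reset.
        $1 == "interface" && $2 == "vlan" { report(); vlan = $3; helper = 0; gateway = 0; next }
        # Any other interface stanza ends the current VLAN stanza.
        $1 == "interface" { report(); vlan = ""; next }
        $1 == "ip" && $2 == "helper-address" { helper = 1 }
        # "active-gateway ip mac ..." sets only the MAC, so it does not count.
        $1 == "active-gateway" && $2 == "ip" && $3 != "mac" { gateway = 1 }
        END { report() }
    '
}

# Example (the file name is an assumption): check_vlan_config < sw-spine-001.txt
```

No output means every VLAN stanza in the file carries both settings. Run it against the saved output of each switch, because both switches need the configuration.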
If any of this configuration is missing, then update it on BOTH switches.
sw-spine-002# conf t
sw-spine-002(config)# int vlan 1
sw-spine-002(config-if-vlan)# ip helper-address 10.92.100.222
sw-spine-002(config-if-vlan)# active-gateway ip mac 12:01:00:00:01:00
sw-spine-002(config-if-vlan)# active-gateway ip 10.1.0.1
sw-spine-002# conf t
sw-spine-002(config)# int vlan 2
sw-spine-002(config-if-vlan)# ip helper-address 10.92.100.222
sw-spine-002(config-if-vlan)# active-gateway ip mac 12:01:00:00:01:00
sw-spine-002(config-if-vlan)# active-gateway ip 10.252.0.1
sw-spine-002# conf t
sw-spine-002(config)# int vlan 4
sw-spine-002(config-if-vlan)# ip helper-address 10.94.100.222
sw-spine-002(config-if-vlan)# active-gateway ip mac 12:01:00:00:01:00
sw-spine-002(config-if-vlan)# active-gateway ip 10.254.0.1
sw-spine-002# conf t
sw-spine-002(config)# int vlan 7
sw-spine-002(config-if-vlan)# ip helper-address 10.92.100.222
sw-spine-002(config-if-vlan)# active-gateway ip mac 12:01:00:00:01:00
sw-spine-002(config-if-vlan)# active-gateway ip xxxxxxx
sw-spine-002(config-if-vlan)# write mem
Verify the route to the TFTP server is in place.
This is a static route to get to the TFTP server via a worker node.
The worker node IP address is in NMN.yaml from CSI-generated data.
- ip_address: 10.252.1.9
name: ncn-w001
comment: x3000c0s4b0n0
aliases:
sw-spine-002(config)# show ip route static
Example output:
Displaying ipv4 routes selected for forwarding
'[x/y]' denotes [distance/metric]
0.0.0.0/0, vrf default
via 10.103.15.209, [1/0], static
10.92.100.60/32, vrf default
via 10.252.1.7, [1/0], static
In this example, the route is 10.92.100.60/32 via 10.252.1.7, with 10.252.1.7 being the worker node.
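The next hop can also be extracted from saved output and compared against the worker node address. This is a sketch under the assumption that the output has the two-line layout of the example above (prefix line, then a "via" line). The function name route_next_hop is hypothetical.

```shell
# Hypothetical helper: print the next hop of one prefix from saved
# "show ip route static" output read on stdin.
# Usage: route_next_hop 10.92.100.60/32 < routes.txt   (file name is an assumption)
route_next_hop() {
    awk -v prefix="$1" '
        # Prefix lines look like "10.92.100.60/32, vrf default".
        $1 == (prefix ",") { found = 1; next }
        # The line after the prefix looks like "via 10.252.1.7, [1/0], static".
        found && $1 == "via" { sub(/,$/, "", $2); print $2; exit }
    '
}
```

Empty output means the prefix was not found in the saved text, which corresponds to the missing-route case.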
If that static route is missing, then add it.
sw-spine-001(config)# ip route 10.92.100.60/32 10.252.1.7
Check the configuration for interface vlan 1.
This configuration will be the same on BOTH switches (except the ip address).
magp and ip dhcp relay will be configured.
sw-spine-001 [standalone: master] # show run int vlan 1
Example output:
interface vlan 1
interface vlan 1 ip address 10.1.0.2/16 primary
interface vlan 1 ip dhcp relay instance 2 downstream
interface vlan 1 magp 1
interface vlan 1 magp 1 ip virtual-router address 10.1.0.1
interface vlan 1 magp 1 ip virtual-router mac-address 00:00:5E:00:01:01
If this configuration is missing, then add it to BOTH switches.
sw-spine-001 [standalone: master] # conf t
sw-spine-001 [standalone: master] (config) # interface vlan 1 magp 1
sw-spine-001 [standalone: master] (config interface vlan 1 magp 1) # ip virtual-router address 10.1.0.1
sw-spine-001 [standalone: master] (config interface vlan 1 magp 1) # ip virtual-router mac-address 00:00:5E:00:01:01
sw-spine-001 [standalone: master] # conf t
sw-spine-001 [standalone: master] (config) # ip dhcp relay instance 2 vrf default
sw-spine-001 [standalone: master] (config) # ip dhcp relay instance 2 address 10.92.100.222
sw-spine-001 [standalone: master] (config) # interface vlan 1 ip dhcp relay instance 2 downstream
Verify the VLAN 1 MAGP configuration.
sw-spine-001 [standalone: master] # show magp 1
Example output:
MAGP 1:
Interface vlan: 1
Admin state : Enabled
State : Master
Virtual IP : 10.1.0.1
Virtual MAC : 00:00:5E:00:01:01
Verify the DHCP relay configuration.
sw-spine-001 [standalone: master] (config) # show ip dhcp relay instance 2
Example output:
VRF Name: default
DHCP Servers:
10.92.100.222
DHCP relay agent options:
always-on : Disabled
Information Option: Disabled
UDP port : 67
Auto-helper : Disabled
-------------------------------------------
Interface Label Mode
-------------------------------------------
vlan1 N/A downstream
vlan2 N/A downstream
vlan7 N/A downstream
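The interface table at the end of this output can be checked against a list of VLANs. The sketch below assumes the output has been saved as text and that interface rows look like the ones above. The function name relay_missing_vlans is hypothetical, and which VLAN IDs to pass is site-specific.

```shell
# Hypothetical helper: given VLAN IDs as arguments and saved
# "show ip dhcp relay instance 2" output on stdin, print each VLAN that is
# not listed in downstream mode.
relay_missing_vlans() (
    table=$(cat)
    for id in "$@"; do
        printf '%s\n' "$table" |
            awk -v name="vlan$id" \
                '$1 == name && $NF == "downstream" { ok = 1 } END { exit !ok }' ||
            echo "vlan$id"
    done
)

# Example (file name is an assumption): relay_missing_vlans 1 2 7 < relay.txt
```

No output means every requested VLAN has a downstream relay entry.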
Verify that the route to the TFTP server and the route for the ingress gateway are available.
sw-spine-001 [standalone: master] # show ip route 10.92.100.60
Example output:
Flags:
F: Failed to install in H/W
B: BFD protected (static route)
i: BFD session initializing (static route)
x: protecting BFD session failed (static route)
c: consistent hashing
p: partial programming in H/W
VRF Name default:
------------------------------------------------------------------------------------------------------
Destination Mask Flag Gateway Interface Source AD/M
------------------------------------------------------------------------------------------------------
default 0.0.0.0 c 10.101.15.161 eth1/12 static 1/1
10.92.100.60 255.255.255.255 c 10.252.0.5 vlan2 bgp 200/0
c 10.252.0.6 vlan2 bgp 200/0
c 10.252.0.7 vlan2 bgp 200/0
sw-spine-001 [standalone: master] # show ip route 10.92.100.71
Example output:
Flags:
F: Failed to install in H/W
B: BFD protected (static route)
i: BFD session initializing (static route)
x: protecting BFD session failed (static route)
c: consistent hashing
p: partial programming in H/W
VRF Name default:
------------------------------------------------------------------------------------------------------
Destination Mask Flag Gateway Interface Source AD/M
------------------------------------------------------------------------------------------------------
default 0.0.0.0 c 10.101.15.161 eth1/12 static 1/1
10.92.100.71 255.255.255.255 c 10.252.0.5 vlan2 bgp 200/0
c 10.252.0.6 vlan2 bgp 200/0
c 10.252.0.7 vlan2 bgp 200/0
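To confirm that a destination has BGP-learned next hops, the gateways can be listed from saved output. This sketch assumes the column layout of the example output above, where the destination row has seven fields and each continuation row has five. The function name bgp_next_hops is hypothetical.

```shell
# Hypothetical helper: print the BGP gateways for one destination from saved
# "show ip route <address>" output read on stdin.
bgp_next_hops() {
    awk -v dest="$1" '
        # Destination row: Destination Mask Flag Gateway Interface Source AD/M
        NF >= 4 && $1 == dest {
            in_dest = 1
            if ($(NF - 1) == "bgp") print $(NF - 3)
            next
        }
        # Continuation row: Flag Gateway Interface Source AD/M
        in_dest && NF == 5 && $4 == "bgp" { print $2; next }
        # Anything else ends the block for this destination.
        { in_dest = 0 }
    '
}

# Example (file name is an assumption): bgp_next_hops 10.92.100.60 < route.txt
```

Empty output corresponds to the case where the route is missing or was not learned through BGP.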
If these routes are missing, then refer to Update BGP Neighbors.
If the configuration looks good and PXE boot is still not working, then there are some other things to try.
Restart the Boot Script Service (BSS) if the following output is returned on the console during an NCN PXE boot attempt (specifically the 404 Not Found error at the bottom):
https://api-gw-service-nmn.local/apis/bss/boot/v1/bootscript...X509 chain 0x6d35c548 added X509 0x6d360d68 "eniac.dev.cray.com"
X509 chain 0x6d35c548 added X509 0x6d3d62e0 "Platform CA - L1 (a0b073c8-5c9c-4f89-b8a2-a44adce3cbdf)"
X509 chain 0x6d35c548 added X509 0x6d3d6420 "Platform CA (a0b073c8-5c9c-4f89-b8a2-a44adce3cbdf)"
EFITIME is 2021-02-26 21:55:04
HTTP 0x6d35da88 status 404 Not Found
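If the console output has been captured to a file, the 404 line can be searched for instead of scrolling through the log. This is a minimal sketch; it assumes the status line has the form shown above, and the function name bss_404_seen is hypothetical.

```shell
# Hypothetical helper: succeed if captured console text contains the
# "status 404 Not Found" line shown above. Reads the named files, or stdin
# when no file is given.
bss_404_seen() {
    grep -Eq 'HTTP 0x[0-9a-f]+ status 404 Not Found' "$@"
}

# Example (file name is an assumption):
# bss_404_seen console.log && echo "404 from BSS seen; restart BSS"
```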
Roll out a restart of the BSS deployment.
ncn-mw# kubectl -n services rollout restart deployment cray-bss
Example output:
deployment.apps/cray-bss restarted
Wait for this command to return (it will block showing status as the pods are refreshed):
ncn-mw# kubectl -n services rollout status deployment cray-bss
Example output:
Waiting for deployment "cray-bss" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "cray-bss" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "cray-bss" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "cray-bss" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "cray-bss" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "cray-bss" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "cray-bss" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "cray-bss" rollout to finish: 1 old replicas are pending termination...
deployment "cray-bss" successfully rolled out
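The restart and the wait can be combined so that the second command runs only if the first succeeds. The sketch below is one possible wrapper, not part of the documented procedure. The function name restart_bss, the KUBECTL override variable, and the 300 second timeout are all assumptions chosen for illustration.

```shell
# KUBECTL can be overridden, for example to point at a different binary.
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical wrapper: restart the cray-bss deployment, then block until the
# rollout finishes or the (arbitrarily chosen) timeout expires.
restart_bss() {
    "$KUBECTL" -n services rollout restart deployment cray-bss &&
        "$KUBECTL" -n services rollout status deployment cray-bss --timeout=300s
}

# Example: restart_bss || echo "ERROR: cray-bss rollout did not complete"
```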
Reboot the NCN that failed to PXE boot.
In some cases, restarting the Kea pod has resolved PXE issues.
Get the Kea pod.
ncn-mw# kubectl get pods -n services | grep kea
Example output:
cray-dhcp-kea-6bd8cfc9c5-m6bgw 3/3 Running 0 20h
Delete the Kea pod.
ncn-mw# kubectl delete pods -n services cray-dhcp-kea-6bd8cfc9c5-m6bgw
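Because the pod name suffix changes every time the pod is recreated, the name can be extracted rather than copied by hand. The sketch below assumes the Kea pod names begin with cray-dhcp-kea-, as in the example output above. The function name kea_pod_names is hypothetical.

```shell
# Hypothetical helper: print the names of pods whose name starts with
# "cray-dhcp-kea-" from "kubectl get pods" output read on stdin.
kea_pod_names() {
    awk '$1 ~ /^cray-dhcp-kea-/ { print $1 }'
}

# Example:
# for pod in $(kubectl get pods -n services | kea_pod_names); do
#     kubectl delete pods -n services "$pod"
# done
```

Review the list of names before deleting anything, because the prefix match selects every pod whose name starts with that string.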
If the PXE boot is giving 404 errors, this could be because the necessary information is not in BSS. The information is uploaded into BSS with the csi handoff bss-metadata and csi handoff bss-update-cloud-init commands in the Redeploy PIT Node procedure. If these commands failed or were skipped accidentally, then this will cause the ncn-m001 PXE boot to fail.
In that case, use the following recovery procedure.
Reboot to the PIT.
If using a USB PIT, follow this procedure:
Reboot the PIT node, watching the console as it boots.
Manually stop it at the boot menu.
Select the USB device for the boot.
Once booted, log in and mount the data partition.
pit# mount -vL PITDATA
If using a remote ISO PIT, follow the Bootstrap LiveCD Remote ISO procedure up through (and including) the Set Up The Site Link step.
Set variables for the system name, the CAN IP address for ncn-m002, the Kubernetes version, and the Ceph version.
The CAN IP address for ncn-m002 is obtained at this step of the Redeploy PIT Node procedure.
The Kubernetes and Ceph versions are from the output of the csi handoff ncn-images command in the Redeploy PIT Node procedure.
If needed, the typescript file from that procedure should be on ncn-m002 in the /metal/bootstrap/prep/admin directory.
Be sure to substitute the correct values for the system in the following commands:
pit# SYSTEM_NAME=eniac
pit# CAN_IP_NCN_M002=a.b.c.d
pit# export KUBERNETES_VERSION=m.n.o
pit# export CEPH_VERSION=x.y.z
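Since the values above are placeholders, a quick check before continuing can catch a variable that was left unset or unchanged. This is a sketch only; the function name check_pit_vars is hypothetical, and the IPv4 pattern is a loose shape check that does not validate octet ranges.

```shell
# Hypothetical helper: report any of the four variables that is empty, and
# reject a CAN IP that is not four dot-separated numbers (such as a.b.c.d).
# The body runs in a subshell, so it does not change the caller's variables.
check_pit_vars() (
    rc=0
    for name in SYSTEM_NAME CAN_IP_NCN_M002 KUBERNETES_VERSION CEPH_VERSION; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            rc=1
        fi
    done
    if [ -n "${CAN_IP_NCN_M002:-}" ] &&
        ! printf '%s\n' "$CAN_IP_NCN_M002" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
        echo "CAN_IP_NCN_M002 does not look like an IPv4 address"
        rc=1
    fi
    exit "$rc"
)

# Example: check_pit_vars || echo "Fix the variables before continuing"
```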
If using a remote ISO PIT, run the following commands to finish configuring the network and copy files.
Skip these steps if using a USB PIT.
Run the following command to copy files from ncn-m002 to the PIT node.
pit# scp -p ${CAN_IP_NCN_M002}:/metal/bootstrap/prep/${SYSTEM_NAME}/pit-files/* /etc/sysconfig/network/
Apply the network changes.
pit# wicked ifreload all
pit# systemctl restart wickedd-nanny && sleep 5
Copy data.json from ncn-m002 to the PIT node.
pit# mkdir -p /var/www/ephemeral/configs
pit# scp ${CAN_IP_NCN_M002}:/metal/bootstrap/prep/${SYSTEM_NAME}/basecamp/data.json /var/www/ephemeral/configs
Copy the Kubernetes configuration file from ncn-m002.
pit# mkdir -pv ~/.kube
pit# scp ${CAN_IP_NCN_M002}:/etc/kubernetes/admin.conf ~/.kube/config
Set DNS to use Unbound.
pit# echo "nameserver 10.92.100.225" > /etc/resolv.conf
Export an API token.
pit# export TOKEN=$(curl -k -s -S -d grant_type=client_credentials \
-d client_id=admin-client \
-d client_secret=`kubectl get secrets admin-client-auth -o jsonpath='{.data.client-secret}' | base64 -d` \
https://api-gw-service-nmn.local/keycloak/realms/shasta/protocol/openid-connect/token | jq -r '.access_token')
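The export succeeds even when no token was returned: if the request fails, TOKEN is empty, and if the response has no access_token field, jq -r prints the literal string null. A minimal check, with the hypothetical function name token_ok, could look like this:

```shell
# Hypothetical helper: succeed only if the argument is a non-empty string
# other than "null" (what jq -r prints when the field is missing).
token_ok() {
    [ -n "${1:-}" ] && [ "$1" != "null" ]
}

# Example: token_ok "$TOKEN" || echo "ERROR: no API token was retrieved"
```

This only confirms that something was returned; it does not verify that the token is valid.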
Re-run the BSS handoff commands from the Redeploy PIT Node procedure.
WARNING: These commands should never be run from a node other than the PIT node or ncn-m001.
pit# csi handoff bss-metadata --data-file /var/www/ephemeral/configs/data.json || echo "ERROR: csi handoff bss-metadata failed"
pit# csi handoff bss-update-cloud-init --set meta-data.dns-server=10.92.100.225 --limit Global
Perform the Restart BSS and the Restart Kea procedures.
Reboot the PIT node.