This document provides information on non-compute node (NCN) boot devices and boot ordering.
NCNs can boot from two sources: the network (PXE) or a local disk.
Under normal operations, the NCNs use the following boot order: PXE first, with disk as the fallback.
After the CSM install is complete, it is usually not necessary to change the boot order. Having PXE first and disk as a fallback works in the majority of situations.
It may be desirable to change the boot order under these circumstances:
There are two different methods for determining whether a management node is booted using disk or PXE. The method to use will vary depending on the system environment.
(ncn# or pit#) Check the kernel parameters.
cat /proc/cmdline
If the output starts with kernel, then the node network booted. If it starts with BOOT_IMAGE=(, then it disk booted.
(ncn# or pit#) Check the output from efibootmgr.
efibootmgr
Match the BootCurrent value against the list of entries beneath it. If it lines up with a networking option, the node network booted; if it lines up with a cray (sd*) option, the node disk booted.
Example output:
BootCurrent: 0016
Timeout: 2 seconds
BootOrder: 0000,0011,0013,0014,0015,0016,0017,0005,0007,0018,0019,001A,001B,001C,001D,001E,001F,0020,0021,0012
Boot0000* cray (sda1)
Boot0001* UEFI: Built-in EFI Shell
Boot0005* UEFI: PXE IP4 Mellanox Network Adapter - B8:59:9F:1D:D8:4E
Boot0007* UEFI: PXE IP4 Mellanox Network Adapter - B8:59:9F:1D:D8:4F
Boot0010* UEFI: AMI Virtual CDROM0 1.00
Boot0011* cray (sdb1)
Boot0012* UEFI: Built-in EFI Shell
Boot0013* UEFI OS
Boot0014* UEFI OS
Boot0015* UEFI: AMI Virtual CDROM0 1.00
Boot0016* UEFI: SanDisk <--- Matches here
Boot0017* UEFI: SanDisk, Partition 2
Boot0018* UEFI: HTTP IP4 Intel(R) I350 Gigabit Network Connection
Boot0019* UEFI: PXE IP4 Intel(R) I350 Gigabit Network Connection
Boot001A* UEFI: HTTP IP4 Mellanox Network Adapter - B8:59:9F:1D:D8:4E
Boot001B* UEFI: HTTP IP4 Mellanox Network Adapter - B8:59:9F:1D:D8:4F
Boot001C* UEFI: HTTP IP4 Intel(R) I350 Gigabit Network Connection
Boot001D* UEFI: PXE IP4 Intel(R) I350 Gigabit Network Connection
Boot001E* UEFI: PXE IP6 Intel(R) I350 Gigabit Network Connection
Boot001F* UEFI: PXE IP6 Intel(R) I350 Gigabit Network Connection
Boot0020* UEFI: PXE IP6 Mellanox Network Adapter - B8:59:9F:1D:D8:4E
Boot0021* UEFI: PXE IP6 Mellanox Network Adapter - B8:59:9F:1D:D8:4F
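The two checks above can be wrapped in small Bash helpers when they need to be run on many nodes. This is a minimal sketch, not part of CSM. The function names boot_source and current_boot_entry are hypothetical. The classification relies only on the kernel and BOOT_IMAGE=( prefixes and on the BootCurrent matching described above.

```shell
# Hypothetical helper (method 1): classify a kernel command line as a
# network or disk boot. With no argument, the running node's /proc/cmdline is used.
boot_source() {
    local cmdline
    cmdline="${1:-$(cat /proc/cmdline)}"
    case "${cmdline}" in
        kernel*)          echo network ;;
        'BOOT_IMAGE=('*)  echo disk ;;
        *)                echo unknown ;;
    esac
}

# Hypothetical helper (method 2): print the entry that BootCurrent refers to.
# Reads efibootmgr output on stdin, so it also works on saved logs.
current_boot_entry() {
    awk '
        /^BootCurrent:/ { cur = $2 }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            entries[substr($1, 5, 4)] = $0   # key "Boot0016*" as "0016"
        }
        END {
            if (cur in entries) { print entries[cur]; exit 0 }
            print "BootCurrent " cur " is not in the entry list" > "/dev/stderr"
            exit 1
        }
    '
}
```

For output like the example above, efibootmgr | current_boot_entry would print the Boot0016* UEFI: SanDisk entry. You then judge whether that entry is a network option or a cray disk option.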
When reinstalling a system, the BMCs for the NCNs may be set to static IP addressing. The /var/lib/misc/dnsmasq.leases file is checked when setting up the symlinks for the artifacts that each node needs to boot. A BMC with a static address does not request a DHCP lease, so it does not appear in that file, and the artifacts are not set up correctly. Set the BMCs back to DHCP by using a command such as:
read -s is used to prevent the password from being written to the screen or the shell history.
USERNAME=root
read -r -s -p "NCN BMC ${USERNAME} password: " IPMI_PASSWORD
export IPMI_PASSWORD
for h in $( grep mgmt /etc/hosts | grep -v m001 | awk -F ',' '{print $2}' ); do
    ipmitool -U "${USERNAME}" -I lanplus -H "${h}" -E lan set 1 ipsrc dhcp
done
Some BMCs need a cold reset in order to pick up this change fully:
for h in $( grep mgmt /etc/hosts | grep -v m001 | awk -F ',' '{print $2}' ); do
    ipmitool -U "${USERNAME}" -I lanplus -H "${h}" -E mc reset cold
done
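Before changing every BMC, it can help to review the host list that the /etc/hosts pipeline produces. The following Bash sketch wraps the loops above. The each_bmc function and the DRY_RUN variable are hypothetical names introduced here. The sketch assumes USERNAME and IPMI_PASSWORD are set and exported as shown above.

```shell
# Hypothetical helper: run one ipmitool subcommand against every BMC hostname
# read from stdin (one per line). With DRY_RUN=1 the commands are only printed.
each_bmc() {
    local h
    while IFS= read -r h; do
        [ -n "${h}" ] || continue    # skip blank lines
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo ipmitool -U "${USERNAME}" -I lanplus -H "${h}" -E "$@"
        else
            # </dev/null stops ipmitool from reading the host list on stdin
            ipmitool -U "${USERNAME}" -I lanplus -H "${h}" -E "$@" </dev/null
        fi
    done
}
```

For example, pipe the same grep mgmt /etc/hosts | grep -v m001 | awk -F ',' '{print $2}' output into DRY_RUN=1 each_bmc lan set 1 ipsrc dhcp to preview the commands. Then rerun without DRY_RUN=1 to apply them, and follow with each_bmc mc reset cold if needed.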
ipmitool can set and edit the boot order; it works better for some vendors, depending on their BMC implementation.
efibootmgr speaks directly to the node's UEFI; only new BIOS activity can override what it sets.
NOTE: cloud-init sets the boot order and trims boot devices during its runcmd module, but this does not always work with certain hardware vendors. An administrator may invoke the cloud-init script on any NCN or PIT by loading /srv/cray/scripts/metal/metal-lib.sh. Load it in a subshell, because the library sets the set -e flag.
This section gives the procedure for setting the boot order on NCNs and the PIT node.
Setting the boot order with efibootmgr ensures that the desired network interfaces and disks are in the proper order for booting.
The commands are the same for all hardware vendors, except where noted.
(ncn# or pit#) Create a list of the desired IPv4 boot devices.
Follow the section corresponding to the hardware manufacturer of the system:
Gigabyte Technology
efibootmgr | grep -iP '(pxe ipv?4.*adapter)' | tee /tmp/bbs1
Hewlett-Packard Enterprise
efibootmgr | grep -i 'port 1' | grep -i 'pxe ipv4' | tee /tmp/bbs1
Intel Corporation
efibootmgr | grep -i 'ipv4' | grep -iv 'baseboard' | tee /tmp/bbs1
(ncn# or pit#) Create a list of the Cray disk boot devices.
efibootmgr | grep -i cray | tee /tmp/bbs2
(ncn# or pit#) Set the boot order to PXE boot first, with disk boot as the fallback option.
efibootmgr -o $(cat /tmp/bbs* | awk '!x[$0]++' | sed 's/^Boot//g' | tr -d '*' | awk '{print $1}' | tr -t '\n' ',' | sed 's/,$//') | grep -i bootorder
(ncn# or pit#) Set all of the desired boot options to be active.
cat /tmp/bbs* | awk '!x[$0]++' | sed 's/^Boot//g' | tr -d '*' | awk '{print $1}' | xargs -r -t -i efibootmgr -b {} -a
(ncn# or pit#) Set the next boot entry.
efibootmgr -n <desired_next_boot_device>
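The ID-extraction pipeline shared by the efibootmgr -o and efibootmgr -a steps above can be kept in one function, so both steps work from the same list. A minimal Bash sketch follows. bbs_to_bootorder is a hypothetical name. The sketch uses paste -s -d, - in place of the original tr and sed joining, which should give the same comma-separated result.

```shell
# Hypothetical helper: turn efibootmgr entry lines (such as /tmp/bbs1 and
# /tmp/bbs2) into the comma-separated ID list that efibootmgr -o expects.
bbs_to_bootorder() {
    awk '!seen[$0]++' |    # drop repeated lines, keeping the first occurrence
        sed 's/^Boot//' |  # "Boot0014* ..." -> "0014* ..."
        tr -d '*' |        # remove the active-entry marker
        awk '{print $1}' | # keep only the hex ID
        paste -s -d, -     # join the IDs with commas
}
```

For example, efibootmgr -o "$(cat /tmp/bbs1 /tmp/bbs2 | bbs_to_bootorder)" should set the same order as the one-liner above. The PXE entries from /tmp/bbs1 come first, followed by the disk entries from /tmp/bbs2.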
After following the steps above on a given NCN, that NCN will use the desired Shasta boot order.
This is the end of the Setting boot order procedure.
This procedure prunes the list of boot devices, optimizing the boot order to align with CSM’s requirements.
(ncn# or pit#) Load the metal tools library and invoke the boot trim function.
TEMP=$(mktemp -d)
efibootmgr > "${TEMP}/original.log"
(
. /srv/cray/scripts/metal/metal-lib.sh
setup_uefi_bootorder >"${TEMP}/run.log"
)
NOTE: The snippet above is pdsh-friendly for bulk trims.
The ${TEMP}/run.log file shows the output from each efibootmgr call that trims the boot order.
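To see what the trim removed, compare the BootOrder line saved in ${TEMP}/original.log with the current one. This is a POSIX awk sketch. The bootorder_removed function and the trimmed.log file name used below are hypothetical.

```shell
# Hypothetical helper: print the IDs that appear in the BootOrder line of the
# first efibootmgr log but not in the BootOrder line of the second.
bootorder_removed() {
    awk -F': *' '
        # First file: remember its BootOrder IDs in order.
        FNR == NR { if (/^BootOrder:/) n = split($2, before, ","); next }
        # Second file: remember which IDs are still present.
        /^BootOrder:/ { m = split($2, after, ","); for (i = 1; i <= m; i++) kept[after[i]] = 1 }
        END { for (i = 1; i <= n; i++) if (!(before[i] in kept)) print before[i] }
    ' "$1" "$2"
}
```

For example, run efibootmgr > "${TEMP}/trimmed.log" after the trim. Then run bootorder_removed "${TEMP}/original.log" "${TEMP}/trimmed.log" to list the boot order entries that were dropped.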
The boot order has been trimmed.
Each of the following sections shows example output of the efibootmgr command.
Master node (with onboard NICs enabled)
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0000,0013,001A,0002,000F,0012,0014,0015
Boot0000* System Utilities
Boot0001 Non bootable Hotkey
Boot0002* CRAY UEFI OS 0
Boot0003 Intelligent Provisioning
Boot0004 Embedded UEFI Shell
Boot0005 Embedded iPXE
Boot0006 Diagnose Error
Boot0007 Boot Menu
Boot0008 Network Boot
Boot0009 View Integrated Management Log
Boot000A View GUI mode Integrated Management Log
Boot000B View BIOS Event Log
Boot000C HTTP Boot
Boot000D PXE Boot
Boot000E Embedded Diagnostics
Boot000F* CRAY UEFI OS 1
Boot0010* Generic USB Boot
Boot0012 SATA Drive Box 1 Bay 1 : VK000480GWTHA
Boot0013* OCP Slot 10 Port 1 : Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HQCU-HC OCP3 Adapter - NIC - Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HQCU-HC OCP3 Adapter - PXE (PXE IPv4)
Boot0014 SATA Drive Box 1 Bay 2 : VK000480GWTHA
Boot0015 SATA Drive Box 1 Bay 3 : VK000480GWTHA
Boot0016* Rear USB 1 : PNY USB 3.1 FD
Boot001A* Slot 1 Port 1 : Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HLCU-HC MD2 Adapter - NIC - Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HLCU-HC MD2 Adapter - PXE (PXE IPv4)
Storage node (with onboard NICs enabled)
BootCurrent: 0014
Timeout: 0 seconds
BootOrder: 0000,0014,0015,0016,0010,0011,0012
Boot0000* System Utilities
Boot0001 Non bootable Hotkey
Boot0002 Intelligent Provisioning
Boot0003 Embedded UEFI Shell
Boot0004 Embedded iPXE
Boot0005 Diagnose Error
Boot0006 Boot Menu
Boot0007 Network Boot
Boot0008 View Integrated Management Log
Boot0009 View GUI mode Integrated Management Log
Boot000A View BIOS Event Log
Boot000B HTTP Boot
Boot000C PXE Boot
Boot000D Embedded Diagnostics
Boot000E* Generic USB Boot
Boot000F* OCP Slot 10 Port 2 : Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HQCU-HC OCP3 Adapter - NIC - Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HQCU-HC OCP3 Adapter - PXE (PXE IPv4)
Boot0010 SATA Drive Box 1 Bay 1 : VK000480GWTHA
Boot0011 SATA Drive Box 1 Bay 2 : VK000480GWTHA
Boot0012 SATA Drive Box 1 Bay 3 : VK001920GWTHC
Boot0013 Temporary Legacy Boot Option
Boot0014* OCP Slot 10 Port 1 : Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HQCU-HC OCP3 Adapter - NIC - Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HQCU-HC OCP3 Adapter - PXE (PXE IPv4)
Boot0015* CRAY UEFI OS 0
Boot0016* CRAY UEFI OS 1
Worker node (with onboard NICs enabled)
BootCurrent: 0019
Timeout: 20 seconds
BootOrder: 0000,0019,001E,001C,001D,0010,0011,0012,0013,0014,0015,0016,0017,0018,001A
Boot0000* System Utilities
Boot0001 Non bootable Hotkey
Boot0002 Intelligent Provisioning
Boot0003 Embedded UEFI Shell
Boot0004 Embedded iPXE
Boot0005 Diagnose Error
Boot0006 Boot Menu
Boot0007 Network Boot
Boot0008 View Integrated Management Log
Boot0009 View GUI mode Integrated Management Log
Boot000A View BIOS Event Log
Boot000B HTTP Boot
Boot000C PXE Boot
Boot000D Embedded Diagnostics
Boot000E* Generic USB Boot
Boot000F Temporary Legacy Boot Option
Boot0010 SATA Drive Box 4 Bay 2 : VK001920GWTTC
Boot0011 SATA Drive Box 4 Bay 1 : VK001920GWTTC
Boot0012 SATA Drive Box 1 Bay 5 : VK001920GWTTC
Boot0013 SATA Drive Box 1 Bay 6 : VK001920GWTTC
Boot0014 SATA Drive Box 1 Bay 7 : VK001920GWTTC
Boot0015 SATA Drive Box 1 Bay 8 : VK001920GWTTC
Boot0016 SATA Drive Box 1 Bay 1 : VK000480GWTHA
Boot0017 SATA Drive Box 1 Bay 2 : VK000480GWTHA
Boot0018 SATA Drive Box 1 Bay 3 : VK001920GWTTC
Boot0019* OCP Slot 10 Port 1 : Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HQCU-HC OCP3 Adapter - NIC - Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HQCU-HC OCP3 Adapter - PXE (PXE IPv4)
Boot001A SATA Drive Box 1 Bay 4 : VK001920GWTTC
Boot001B* Generic USB Boot
Boot001C* CRAY UEFI OS 0
Boot001D* CRAY UEFI OS 1
Boot001E* Slot 1 Port 1 : Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HLCU-HC MD2 Adapter - NIC - Marvell FastLinQ 41000 Series - 2P 25GbE SFP28 QL41232HLCU-HC MD2 Adapter - PXE (PXE IPv4)
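Reading a BootOrder line against the entry list by hand is error-prone, as the examples above show. A small awk sketch can instead print the entries in boot-order sequence. resolved_bootorder is a hypothetical name, and the sketch expects plain efibootmgr output (not efibootmgr -v) on stdin.

```shell
# Hypothetical helper: print efibootmgr entries in BootOrder sequence.
resolved_bootorder() {
    awk '
        /^BootOrder:/ { n = split($2, order, ",") }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            label[substr($1, 5, 4)] = $0    # key "Boot0013*" as "0013"
        }
        END {
            for (i = 1; i <= n; i++) {
                id = order[i]
                if (id in label) print label[id]
                else print "Boot" id " (listed in BootOrder but no entry found)"
            }
        }
    '
}
```

Applied to the master node example, the first three lines printed would be the Boot0000 (System Utilities), Boot0013 (OCP Slot 10 Port 1), and Boot001A (Slot 1 Port 1) entries. This shows that the PXE ports come ahead of the CRAY UEFI OS disk entries.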
This procedure is only needed if wishing to revert boot order changes.
Reset the BIOS. Refer to vendor documentation for resetting the BIOS, or attempt to reset the BIOS with ipmitool.
NOTE: When using ipmitool against a machine remotely, it requires more arguments:
read -s is used to prevent the password from being written to the screen or the shell history.
USERNAME=root
read -r -s -p "NCN BMC ${USERNAME} password: " IPMI_PASSWORD
export IPMI_PASSWORD
ipmitool -I lanplus -U "${USERNAME}" -E -H <bmc-hostname>
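The remote arguments from the note above can be kept in one place with a small wrapper, so each command in the following steps only needs the BMC hostname prepended. This is a Bash sketch. bmc_ipmitool is a hypothetical name. The wrapper relies on ipmitool's -E option, which reads the password from the IPMI_PASSWORD environment variable.

```shell
# Hypothetical wrapper: run ipmitool against a remote BMC over lanplus.
# Usage: bmc_ipmitool <bmc-hostname> <ipmitool subcommand...>
bmc_ipmitool() {
    local host="$1"
    shift
    # Fail early if the password was not read into IPMI_PASSWORD.
    if [ -z "${IPMI_PASSWORD:-}" ]; then
        echo "bmc_ipmitool: set and export IPMI_PASSWORD first" >&2
        return 1
    fi
    ipmitool -I lanplus -U "${USERNAME:-root}" -E -H "${host}" "$@"
}
```

For example, bmc_ipmitool <bmc-hostname> chassis bootdev bios options=efiboot runs the last step below against a remote node.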
(ncn# or pit#) Reset the BIOS with ipmitool.
ipmitool chassis bootdev none options=clear-cmos
(ncn# or pit#) Set the next boot device with ipmitool.
ipmitool chassis bootdev pxe options=persistent
ipmitool chassis bootdev pxe options=efiboot
(ncn# or pit#) Boot to BIOS for checkout of boot devices.
ipmitool chassis bootdev bios options=efiboot
This is the end of the Reverting changes procedure.
This procedure explains how to identify USB devices on NCNs.
Some nodes clearly display which device is the USB device, whereas other nodes (such as Gigabyte) do not. Parsing the output of efibootmgr can help determine which device is a USB device. Tools such as lsblk, blkid, or the kernel (/proc) may also be of use. For example, the output of ls -l /dev/disk/by-partuuid can sometimes be matched up with the output of efibootmgr -v.
(ncn# or pit#) Display the current UEFI boot selections.
efibootmgr
Example output:
BootCurrent: 0015
Timeout: 1 seconds
BootOrder: 000E,000D,0011,0012,0007,0005,0006,0008,0009,0000,0001,0002,000A,000B,000C,0003,0004,000F,0010,0013,0014
Boot0000* Enter Setup
Boot0001 Boot Device List
Boot0002 Network Boot
Boot0003* Launch EFI Shell
Boot0004* UEFI HTTPv6: Network 00 at Riser 02 Slot 01
Boot0005* UEFI HTTPv6: Intel Network 00 at Baseboard
Boot0006* UEFI HTTPv4: Intel Network 00 at Baseboard
Boot0007* UEFI IPv4: Intel Network 00 at Baseboard
Boot0008* UEFI IPv6: Intel Network 00 at Baseboard
Boot0009* UEFI HTTPv6: Intel Network 01 at Baseboard
Boot000A* UEFI HTTPv4: Intel Network 01 at Baseboard
Boot000B* UEFI IPv4: Intel Network 01 at Baseboard
Boot000C* UEFI IPv6: Intel Network 01 at Baseboard
Boot000D* UEFI HTTPv4: Network 00 at Riser 02 Slot 01
Boot000E* UEFI IPv4: Network 00 at Riser 02 Slot 01
Boot000F* UEFI IPv6: Network 00 at Riser 02 Slot 01
Boot0010* UEFI HTTPv6: Network 01 at Riser 02 Slot 01
Boot0011* UEFI HTTPv4: Network 01 at Riser 02 Slot 01
Boot0012* UEFI IPv4: Network 01 at Riser 02 Slot 01
Boot0013* UEFI IPv6: Network 01 at Riser 02 Slot 01
Boot0014* UEFI Samsung Flash Drive 1100
Boot0015* UEFI Samsung Flash Drive 1100
Boot0018* UEFI SAMSUNG MZ7LH480HAHQ-00005 S45PNA0M838871
Boot1001* Enter Setup
(ncn# or pit#) Set the next boot entry.
In the example above, the USB device is either 0014 or 0015. One option is to guess the first one and, if the guess is wrong, correct the selection on the fly during POST.
Note that the Boot prefix is omitted from the ID. To choose Boot0014 in the output above, pass 0014 to efibootmgr:
efibootmgr -n 0014
(ncn# or pit#) Verify that the BootNext device is the one that was selected.
efibootmgr | grep -i bootnext
Example output:
BootNext: 0014
Now the UEFI Samsung Flash Drive will boot next.
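efibootmgr -n and -b take the bare hex ID, so a small helper can strip the Boot prefix and the asterisk and then confirm BootNext. This Bash sketch uses hypothetical function names (boot_id, bootnext_is). It upper-cases IDs only so that comparisons match the way efibootmgr prints them.

```shell
# Hypothetical helper: normalize "Boot0014*", "boot0014", or "0014" to the
# bare, upper-case four-digit hex ID.
boot_id() {
    local id
    id="$(printf '%s' "$1" | sed 's/^[Bb][Oo][Oo][Tt]//' | tr -d '*' | tr '[:lower:]' '[:upper:]')"
    case "${id}" in
        [0-9A-F][0-9A-F][0-9A-F][0-9A-F]) printf '%s\n' "${id}" ;;
        *) echo "boot_id: not a boot entry ID: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed only if the efibootmgr output on stdin
# reports BootNext equal to the given entry.
bootnext_is() {
    local want
    want="$(boot_id "$1")" || return 1
    awk -v want="${want}" '
        /^BootNext:/ && toupper($2) == want { found = 1 }
        END { exit !found }
    '
}
```

For example, efibootmgr -n "$(boot_id Boot0014)" followed by efibootmgr | bootnext_is 0014 && echo "BootNext is 0014" combines the two steps above.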
NOTE: There are duplicates in the list. During boot, the EFI boot manager selects the first one. If the first one is the wrong device, it can be deleted with efibootmgr -b 0014 -d.
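Duplicates like these can be listed mechanically by grouping entries on their description. This is an awk sketch that assumes plain efibootmgr output (without -v) on stdin. duplicate_boot_entries is a hypothetical name.

```shell
# Hypothetical helper: report boot entries that share a description,
# one group per line as "<description>: <id> <id> ...".
# Groups are printed in no fixed order.
duplicate_boot_entries() {
    awk '
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            id = substr($1, 5, 4)
            desc = $0
            sub(/^Boot[0-9A-Fa-f]+\*? */, "", desc)   # strip "Boot0014* "
            count[desc]++
            if (count[desc] == 1) ids[desc] = id
            else ids[desc] = ids[desc] " " id
        }
        END { for (d in count) if (count[d] > 1) print d ": " ids[d] }
    '
}
```

Run against the example output above, it would report both UEFI Samsung Flash Drive 1100: 0014 0015 and Enter Setup: 0000 1001. Review each group before deleting any entry with efibootmgr -b <id> -d.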
This is the end of the Locating USB device procedure.