The following information must be uploaded to the Boot Script Service (BSS) as a prerequisite to booting a node using iPXE:
initrd image in the artifact repository
BSS manages the iPXE boot scripts that coordinate the boot process for nodes, and it enables basic association of boot scripts with nodes.
The boot scripts supply a booting node with a pointer to the necessary images (kernel and initrd) and a set of boot-time parameters.
When using BOS to boot nodes, this is done automatically. The information on this page describes how an administrator can do this manually, if desired.
The initrd image and kernel image for one or more nodes have been uploaded to the artifact repository.
Throughout this procedure, be sure to replace the example values with the actual values for the hosts, boot artifacts, and parameters.
Set variables identifying the boot artifacts and parameters.
(ncn-mw#) Set KERNEL to the S3 download URL of the kernel artifact.
This should be in the s3://s3_BUCKET/S3_OBJECT_KEY/kernel format.
KERNEL=s3://boot-images/97b548b9-2ea9-45c9-95ba-dfc77e5522eb/kernel
(ncn-mw#) Set INITRD to the S3 download URL of the initrd artifact.
This should be in the s3://s3_BUCKET/S3_OBJECT_KEY/initrd format.
INITRD=s3://boot-images/97b548b9-2ea9-45c9-95ba-dfc77e5522eb/initrd
(ncn-mw#) Set ROOTFS to the S3 download URL of the rootfs artifact.
This should be in the s3://s3_BUCKET/S3_OBJECT_KEY/rootfs format.
ROOTFS=s3://boot-images/97b548b9-2ea9-45c9-95ba-dfc77e5522eb/rootfs
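A typo in any of these three URLs only surfaces later, when the node fails to fetch its artifacts. The following is a minimal sketch of a format check, assuming the `s3://s3_BUCKET/S3_OBJECT_KEY/NAME` layout described above. The function name `check_s3_url` is hypothetical and is not part of the `cray` CLI; the check looks only at the shape of the URL and does not confirm that the object exists in S3.

```bash
# Hypothetical helper: succeed only if URL looks like s3://BUCKET/KEY/NAME,
# where NAME is the expected artifact file name (kernel, initrd, or rootfs).
check_s3_url() {
    local url="$1" name="$2"
    case "$url" in
        s3://?*/?*/"$name") return 0 ;;
        *) echo "unexpected ${name} URL: ${url}" >&2; return 1 ;;
    esac
}
```

For example, `check_s3_url "${KERNEL}" kernel && check_s3_url "${INITRD}" initrd && check_s3_url "${ROOTFS}" rootfs` returns success only when all three variables have the expected shape.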
(ncn-mw#) Set ETAG to the etag of the rootfs in S3.
S3_BUCKET_KEY=$(echo "${ROOTFS}" | sed 's#^s3://\([^/]\+\)/\(.*rootfs\)$#\1 \2#')
ETAG=$(cray artifacts describe ${S3_BUCKET_KEY} --format json | jq -r '.artifact.ETag' | tr -d '"')
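The `sed` expression above splits the rootfs URL into a bucket name and an object key separated by a space, which is why `${S3_BUCKET_KEY}` is left unquoted when it is passed to `cray artifacts describe`. As an illustration of what that split produces, the sketch below does the same thing with shell parameter expansion. It is an alternative shown for clarity, not the method used in this procedure, and the function name `split_s3_url` is hypothetical. Unlike the `sed` expression, it does not require the URL to end in `rootfs`.

```bash
# Hypothetical helper: print "BUCKET KEY" for an s3://BUCKET/KEY URL.
split_s3_url() {
    local path="${1#s3://}"      # drop the s3:// scheme
    local bucket="${path%%/*}"   # everything before the first slash
    local key="${path#*/}"       # everything after the first slash
    printf '%s %s\n' "$bucket" "$key"
}
```

With the example value of `ROOTFS` used on this page, `split_s3_url "${ROOTFS}"` prints the bucket `boot-images` followed by the key `97b548b9-2ea9-45c9-95ba-dfc77e5522eb/rootfs`.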
(ncn-mw#) Set PARAMS to the boot kernel parameters.
IMPORTANT: The PARAMS line must always include the substring crashkernel=512M.
This enables node dumps, which are needed to troubleshoot node crashes.
For readability, this example shows the variable being set over multiple lines.
PARAMS="console=ttyS0,115200 bad_page=panic crashkernel=512M,high "
PARAMS+="crashkernel=256M,low intel_pstate=disable numa_balancing=disable oops=panic "
PARAMS+="pcie_ports=native rd.retry=10 rd.shell split_lock_detect=off systemd.unified_cgroup_hierarchy=1 "
PARAMS+="ip=dhcp quiet spire_join_token=\${SPIRE_JOIN_TOKEN} "
PARAMS+="root=craycps-s3:${ROOTFS}:${ETAG}:dvs:api-gw-service-nmn.local:300:eth0 "
PARAMS+="nmd_data=url=${ROOTFS},etag=${ETAG} bos_update_frequency=4h"
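Because `PARAMS` is assembled from several lines, it is easy to drop the required `crashkernel=512M` substring while editing. The following is a minimal sketch of a check for it; the function name `params_has_crashkernel` is hypothetical.

```bash
# Hypothetical helper: succeed only if the parameter string contains a
# token that begins with crashkernel=512M (for example crashkernel=512M,high).
params_has_crashkernel() {
    case " $1 " in
        *" crashkernel=512M"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

One possible use is `params_has_crashkernel "${PARAMS}" || echo "PARAMS is missing crashkernel=512M" >&2`.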
(ncn-mw#) Review the variables and verify that the values are correct.
echo "KERNEL=${KERNEL}"
echo "INITRD=${INITRD}"
echo "ROOTFS=${ROOTFS}"
echo "ETAG=${ETAG}"
echo "PARAMS=${PARAMS}"
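In addition to reading the echoed values, it can help to fail early if a variable came out empty. `ETAG` is the most likely candidate: if the artifact lookup does not return an `ETag` field, `jq -r` prints the literal string `null`. The sketch below checks for both cases; the function name `require_set` is hypothetical.

```bash
# Hypothetical helper: fail if a value is empty or the literal string "null".
# The first argument is only a label used in the error message.
require_set() {
    local name="$1" value="$2"
    if [ -z "$value" ] || [ "$value" = "null" ]; then
        echo "${name} is empty or null" >&2
        return 1
    fi
}
```

For example, `require_set ETAG "${ETAG}"` returns a non-zero status and prints a message if the etag lookup did not produce a usable value.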
This step requires the variables that were set in step 1 (Set variables).
There are three options for updating BSS: by host name, by NID, or by setting a default boot configuration.
(ncn-mw#) Set HOSTS to a comma-separated list of the node component names (xnames)
whose BSS entries should be updated.
HOSTS=x3000c0s21b1n0,x3000c0s21b2n0
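For more than a few nodes, typing the comma-separated list by hand is error prone. As a sketch, the list can be built from a file that holds one xname per line. The function name `join_with_commas` and the file name `xnames.txt` are hypothetical.

```bash
# Hypothetical helper: join the lines read from standard input with commas.
join_with_commas() {
    paste -sd, -
}
```

For example, `HOSTS=$(join_with_commas < xnames.txt)` sets `HOSTS` in the same format as the example above.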
(ncn-mw#) Create the boot parameters in BSS for the selected nodes.
cray bss bootparameters create --hosts "${HOSTS}" --kernel "${KERNEL}" --initrd "${INITRD}" --params "${PARAMS}"
(ncn-mw#) Confirm that the information has been uploaded to BSS.
cray bss bootparameters list --hosts "${HOSTS}"
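When many nodes are listed, a one-line summary per entry can be easier to review than the full output. The sketch below assumes that the `cray` CLI is run with `--format json` and that the result is a JSON array of objects with `hosts`, `kernel`, and `initrd` fields; that structure is an assumption and should be checked against the actual output. The function name `bss_summary` is hypothetical.

```bash
# Hypothetical helper: read boot parameter JSON on standard input and print
# "HOSTS KERNEL INITRD" for each entry. Assumes a JSON array of objects with
# hosts (array of strings), kernel, and initrd fields.
bss_summary() {
    jq -r '.[] | "\(.hosts // [] | join(",")) \(.kernel) \(.initrd)"'
}
```

For example, `cray bss bootparameters list --hosts "${HOSTS}" --format json | bss_summary` prints one line per entry, which can then be compared with the values of `KERNEL` and `INITRD`.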
(ncn-mw#) Set NIDS to a comma-separated list of the node IDs whose BSS entries should be updated.
NIDS=1001,1032
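A stray space, a trailing comma, or an xname pasted into the NID list is easy to miss. The following is a minimal sketch of a format check, assuming that NIDs are written as plain decimal integers as in the example above. The function name `valid_nid_list` is hypothetical.

```bash
# Hypothetical helper: succeed only if the argument is a comma-separated
# list of decimal integers with no empty entries and no other characters.
valid_nid_list() {
    case "$1" in
        ''|*[!0-9,]*|,*|*,|*,,*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For example, `valid_nid_list "${NIDS}" || echo "NIDS is not a valid list" >&2` reports a malformed list before anything is sent to BSS.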
(ncn-mw#) Create the boot parameters in BSS for the selected nodes.
cray bss bootparameters create --nids "${NIDS}" --kernel "${KERNEL}" --initrd "${INITRD}" --params "${PARAMS}"
(ncn-mw#) Confirm that the information has been uploaded to BSS.
cray bss bootparameters list --nids "${NIDS}"
BSS supports a mechanism that allows for a default boot setup, rather than needing to specify boot details for each specific node.
This feature is particularly useful with larger systems. To do this, follow the Update BSS by host name
procedure, setting the HOSTS variable to Default.
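Because a default entry applies to every node without its own entry, it may be worth reviewing the exact command before running it. The sketch below prints the create command instead of executing it. The function name `bss_create_cmd` is hypothetical, and the single quoting in the printed command assumes that none of the values contain a single quote.

```bash
# Hypothetical helper: print (do not run) the BSS create command for the
# given hosts value, kernel URL, initrd URL, and kernel parameters.
bss_create_cmd() {
    local hosts="$1" kernel="$2" initrd="$3" params="$4"
    printf "cray bss bootparameters create --hosts '%s' --kernel '%s' --initrd '%s' --params '%s'\n" \
        "$hosts" "$kernel" "$initrd" "$params"
}
```

For example, `bss_create_cmd Default "${KERNEL}" "${INITRD}" "${PARAMS}"` prints the command that the host name procedure would run with `HOSTS` set to `Default`.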
Boot information has been added to BSS in preparation for iPXE booting all nodes in the list of host names or NIDs.
The next step in powering up the nodes in the host name or NID list is to reboot them.
See also: Troubleshoot Compute Node Boot Issues Related to the Boot Script Service (BSS)
This section lists other BSS queries that may be useful when booting nodes or debugging boot issues.
The following commands show the specific boot script that BSS will return to a given node when that node requests one. This is useful for debugging boot problems and for verifying that BSS is configured correctly.
(ncn-mw#) View the boot script in BSS using a NID.
cray bss bootscript list --nid NODE_ID
(ncn-mw#) View the boot script in BSS using a host name.
cray bss bootscript list --name HOST_NAME
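The two commands above differ only in the option that identifies the node. As a sketch, a small wrapper can choose between `--nid` and `--name` based on whether the identifier is purely numeric. The function name `bootscript_selector` is hypothetical, and the rule that an all-digit identifier is a NID is an assumption of this example.

```bash
# Hypothetical helper: print "--nid ID" for an all-digit identifier and
# "--name ID" for anything else. Fails on an empty identifier.
bootscript_selector() {
    case "$1" in
        '') return 1 ;;
        *[!0-9]*) printf '%s\n' "--name $1" ;;
        *) printf '%s\n' "--nid $1" ;;
    esac
}
```

For example, `cray bss bootscript list $(bootscript_selector 1001)` expands to the NID form of the command. The command substitution is deliberately left unquoted so that the option and its value are passed as two separate arguments.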
(ncn-mw#) View the entire contents of BSS.
cray bss dumpstate list
(ncn-mw#) View the information that BSS retrieved from the
Hardware State Manager (HSM).
cray bss hosts list
(ncn-mw#) View all boot parameter information in BSS.
cray bss bootparameters list