The Pre-Install Toolkit (PIT) node needs to be bootstrapped from the LiveCD. There are two media available to bootstrap the PIT node: the RemoteISO or a bootable USB device. This procedure describes using the RemoteISO. If not using the RemoteISO, see Bootstrap PIT Node from LiveCD USB.
The installation process is similar to the USB-based installation, with adjustments to account for the lack of removable storage.
Important: Before starting this procedure be sure to complete the procedure to Prepare Configuration Payload for the relevant installation scenario.
The LiveCD Remote ISO has known compatibility issues for nodes from certain vendors.
Warning: If this is a reinstallation on a system that still has a USB device from a prior installation, then that USB device must be wiped before continuing. Failing to wipe the USB device, if present, may result in confusion during the install. If the USB device is still booted, it can wipe itself using the basic wipe from Wipe NCN Disks for Reinstallation. If it is not booted, either boot it and wipe it, or disable the USB ports in the BIOS (not available for all vendors).
Obtain and attach the LiveCD cray-pre-install-toolkit ISO file to the BMC. The instructions for attaching the ISO to the BMC differ depending on the vendor of the node.
Download the CSM software release and extract the LiveCD remote ISO image.
Important: Ensure that you have the CSM release plus any patches or hotfixes by following the instructions in Update CSM Product Stream.
The cray-pre-install-toolkit ISO and other files are now available in the directory from the extracted CSM tar file. The ISO will have a name similar to cray-pre-install-toolkit-sle15sp3.x86_64-1.5.8-20211203183315-geddda8a.iso. This ISO file can be extracted from the CSM release tar file using the following command:
linux# tar --wildcards --no-anchored -xzvf <csm-release>.tar.gz 'cray-pre-install-toolkit-*.iso'
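The `--wildcards --no-anchored` pair lets tar match the ISO name anywhere in the archive's directory tree, so the full path inside the tarball does not need to be known. A minimal sketch with a throwaway placeholder archive (the file and directory names below are illustrative, not real release artifacts):

```shell
# Build a throwaway archive containing a placeholder ISO, then extract it
# the same way the install command does (assumes GNU tar).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p release/images
: > release/images/cray-pre-install-toolkit-example.iso   # placeholder ISO
tar -czf release.tar.gz release
rm -r release                                             # simulate a fresh host
tar --wildcards --no-anchored -xzf release.tar.gz 'cray-pre-install-toolkit-*.iso'
ls release/images/
```

Only the matching ISO is extracted; the rest of the archive contents are skipped.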
Prepare a server on the network to host the cray-pre-install-toolkit ISO file. Place the ISO file on a server which the BMC of the PIT node will be able to reach using HTTP or HTTPS.
Note: A short URL is better than a long URL for the PIT file on the webserver.
See the respective procedure below to attach an ISO.
The chosen procedure should have rebooted the server. Observe the server boot into the LiveCD.
On first login (over SSH or at local console) the LiveCD will prompt the administrator to change the password.
The initial password is empty; enter the username root and press Enter twice.
pit login: root
Expected output looks similar to the following:
Password: <-------just press Enter here for a blank password
You are required to change your password immediately (administrator enforced)
Changing password for root.
Current password: <------- press Enter here, again, for a blank password
New password: <------- type new password
Retype new password:<------- retype new password
Welcome to the CRAY Pre-Install Toolkit (LiveOS)
Start a typescript to record this section of activities done on ncn-m001
while booted from the LiveCD.
pit# script -af ~/csm-install-remoteiso.$(date +%Y-%m-%d).txt
pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
Print information about the booted PIT image.
There is nothing in the output that needs to be verified. This is run in order to ensure the information is recorded in the typescript file, in case it is needed later. For example, this information is useful to include in any bug reports or service queries for issues encountered on the PIT node.
NOTE: The App. Version will report incorrectly in CSM 1.2. Please obtain the version information by running the step below and by invoking rpm -q cray-site-init.
pit# /root/bin/metalid.sh
Expected output looks similar to the following:
= PIT Identification = COPY/CUT START =======================================
VERSION=1.5.7
TIMESTAMP=20211028194247
HASH=ge4aceb1
CRAY-Site-Init build signature...
Build Commit : a6c8dddf9df1a9fc7f8c4f17cb26568a8b41d433-main
Build Time : 2021-12-01T16:16:41Z
Go Version : go1.16.10
Git Version : a6c8dddf9df1a9fc7f8c4f17cb26568a8b41d433
Platform : linux/amd64
App. Version : 1.12.2
metal-net-scripts-0.0.2-1.noarch
metal-basecamp-1.1.9-1.x86_64
metal-ipxe-2.0.10-1.noarch
pit-init-1.2.12-1.noarch
= PIT Identification = COPY/CUT END =========================================
Find a local disk for storing product installers.
pit# disk="$(lsblk -l -o SIZE,NAME,TYPE,TRAN | grep -E '(sata|nvme|sas)' | sort -h | awk '{print $2}' | head -n 1 | tr -d '\n')"
pit# echo $disk
pit# parted --wipesignatures -m --align=opt --ignore-busy -s /dev/$disk -- mklabel gpt mkpart primary ext4 2048s 100%
pit# mkfs.ext4 -L PITDATA "/dev/${disk}1"
pit# mount -vL PITDATA
The parted command may give an error similar to the following:
Error: Partition(s) 4 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
In that case, the following steps may resolve the problem without needing to reboot. These commands remove volume groups and RAID arrays that may be using the disk. They only need to be run if the earlier parted command failed.
pit# RAIDS=$(grep "${disk}[0-9]" /proc/mdstat | awk '{ print "/dev/"$1 }') ; echo ${RAIDS}
pit# VGS=$(echo ${RAIDS} | xargs -r pvs --noheadings -o vg_name 2>/dev/null) ; echo ${VGS}
pit# echo ${VGS} | xargs -r -t -n 1 vgremove -f -v
pit# echo ${RAIDS} | xargs -r -t -n 1 mdadm -S -f -v
After running the above commands, retry the parted command that failed. If it succeeds, resume the install from that point.
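To see why the lsblk pipeline earlier in this step picks the smallest qualifying disk, here is a sketch that runs the same filter over canned lsblk output (the device names and sizes are made up for illustration):

```shell
# Canned `lsblk -l -o SIZE,NAME,TYPE,TRAN` output; grep keeps only disks on
# sata/nvme/sas transports, sort -h orders human-readable sizes, and
# head -n 1 selects the smallest matching disk.
lsblk_output='446.6G sda disk sata
1.5T sdb disk sas
372.6G nvme0n1 disk nvme'
disk="$(printf '%s\n' "$lsblk_output" | grep -E '(sata|nvme|sas)' \
        | sort -h | awk '{print $2}' | head -n 1 | tr -d '\n')"
echo "$disk"
```

With this sample data, the smallest disk (372.6G) wins, so the pipeline prints nvme0n1.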
Set up the site-link, enabling SSH to work. You can reconnect with SSH after this step.
Note: If your site’s network authority or network administrator has already provisioned a DHCP IPv4 address for your master node’s external NIC(s), then skip this step.
Set networking variables.
If you have previously created the system_config.yaml file for this system, the values for these variables are in it. The following table lists the variables being set, their corresponding system_config.yaml fields, and a description of each.
Variable | system_config.yaml field | Description
---|---|---
site_ip | site-ip | The IPv4 address and CIDR netmask for the node’s external interface(s)
site_gw | site-gw | The IPv4 gateway address for the node’s external interface(s)
site_dns | site-dns | The IPv4 domain name server address for the site
site_nics | site-nic | The actual NIC name(s) for the external site interface(s)
If the system_config.yaml file has not yet been generated for this system, the values for site_ip, site_gw, and site_dns should be provided by the site’s network administrator or network authority. The site_nics interface(s) are typically the first onboard adapter or the first copper 1 GbE PCIe adapter on the PIT node. If multiple interfaces are specified, they must be separated by spaces (for example, site_nics='p2p1 p2p2 p2p3').
pit# site_ip=172.30.XXX.YYY/20
pit# site_gw=172.30.48.1
pit# site_dns=172.30.84.40
pit# site_nics=em1
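A quick sanity check can catch a missing CIDR suffix in site_ip before the setup script is run. This check is a suggestion, not part of the official procedure, and the address below is a placeholder:

```shell
# Hypothetical pre-flight check: the site IP is expected in address/prefix
# form (for example 172.30.52.72/20), so verify the /NN suffix is present.
site_ip=172.30.52.72/20   # placeholder value for illustration
case "$site_ip" in
  */[0-9]*) ok=yes ;;
  *)        ok=no ;;
esac
echo "CIDR suffix present: $ok"
```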
Run the csi-setup-lan0.sh script to set up the site link.
Note: All of the /root/bin/csi-* scripts are harmless to run without parameters; doing so will print usage statements.
pit# /root/bin/csi-setup-lan0.sh $site_ip $site_gw $site_dns $site_nics
Verify that lan0 has an IP address and attempt to auto-set the hostname based on DNS. The script appends -pit to the end of the hostname to reduce the chances of confusing the PIT node with an actual, deployed NCN.
pit# ip a show lan0
pit# /root/bin/csi-set-hostname.sh # this will attempt to set the hostname based on the site's own DNS records.
Add helper variables to PIT node environment.
Important: All CSM install procedures on the PIT node assume that these variables are set and exported.
Set helper variables.
pit# CSM_RELEASE=csm-x.y.z
pit# SYSTEM_NAME=eniac
pit# PITDATA=$(lsblk -o MOUNTPOINT -nr /dev/disk/by-label/PITDATA)
Add variables to the PIT environment.
By adding these to the /etc/environment file of the PIT node, these variables will be automatically set and exported in shell sessions on the PIT node. The echo prepends a newline to ensure that the variable assignment occurs on its own line, and not at the end of another line.
pit# echo "
CSM_RELEASE=${CSM_RELEASE}
PITDATA=${PITDATA}
CSM_PATH=${PITDATA}/${CSM_RELEASE}
SYSTEM_NAME=${SYSTEM_NAME}" | tee -a /etc/environment
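The leading-newline trick can be demonstrated in isolation: appending to a file whose last line lacks a trailing newline would otherwise glue the first assignment onto that line. A sketch against a throwaway file with illustrative values:

```shell
# Simulate an /etc/environment whose last line has no trailing newline,
# then append using the same leading-newline echo shown above.
envfile=$(mktemp)
printf 'EXISTING=1' > "$envfile"       # note: no trailing newline
echo "
CSM_RELEASE=csm-1.2.0" >> "$envfile"   # leading newline keeps the assignment on its own line
grep -c '^CSM_RELEASE=' "$envfile"
```

Without the leading newline, the file would contain `EXISTING=1CSM_RELEASE=...` and the grep would find nothing at the start of a line.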
Exit the typescript, exit the console session, and log in again using SSH.
pit# exit # exit the typescript started earlier
pit# exit # log out of the pit node
# Close the console session by entering &. or ~.
# Then ssh back into the PIT node
external# ssh root@${SYSTEM_NAME}-ncn-m001
After reconnecting, resume the typescript (the -a option appends to the existing file).
pit# script -af $(ls -tr ~/csm-install-remoteiso*.txt | head -n 1)
pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
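The `ls -tr … | head -n 1` idiom selects the oldest matching file: -t sorts newest first, and -r reverses that order. A sketch with two stand-in typescript files (names and dates are illustrative, and GNU `touch -d` is assumed):

```shell
# Create two dummy typescript files with different mtimes, then show that
# the pipeline returns the oldest one (the original typescript to resume).
tsdir=$(mktemp -d)
cd "$tsdir"
touch -d '2021-01-01' csm-install-remoteiso.2021-01-01.txt
touch -d '2021-02-01' csm-install-remoteiso.2021-02-01.txt
first=$(ls -tr csm-install-remoteiso*.txt | head -n 1)
echo "$first"
```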
Verify that expected environment variables are set in the new login shell.
pit# echo -e "CSM_PATH=${CSM_PATH}\nCSM_RELEASE=${CSM_RELEASE}\nPITDATA=${PITDATA}\nSYSTEM_NAME=${SYSTEM_NAME}"
Check hostname.
pit# hostnamectl
Note:
- The hostname should be similar to eniac-ncn-m001-pit when booted from the LiveCD, but it will be shown as pit# in the documentation command prompts from this point onward.
- If the hostname returned by the hostnamectl command is pit, then re-run the csi-set-hostname.sh script with the same parameters. Otherwise, an administrator should set the hostname manually with hostnamectl. In the latter case, do not confuse other administrators by using the hostname ncn-m001. Append the -pit suffix, indicating that the node is booted from the LiveCD.
Create necessary directories.
pit# mkdir -pv ${PITDATA}/{admin,configs} ${PITDATA}/prep/{admin,logs} ${PITDATA}/data/{k8s,ceph}
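The single mkdir relies on bash brace expansion to build the whole tree at once. A sketch against a throwaway base directory shows the eight directories it creates:

```shell
# Brace expansion builds every path before mkdir runs (requires bash/zsh,
# which the PIT environment provides).
base=$(mktemp -d)
mkdir -pv ${base}/{admin,configs} ${base}/prep/{admin,logs} ${base}/data/{k8s,ceph}
created=$(find "$base" -mindepth 1 -type d | wc -l)
echo "$created"
```

The count is 8: admin, configs, prep, prep/admin, prep/logs, data, data/k8s, and data/ceph.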
Relocate the typescript to the newly mounted PITDATA directory.
Quit the typescript session with the exit command.
Copy the typescript file to its new location.
pit# cp -v ~/csm-install-remoteiso.*.txt ${PITDATA}/prep/admin
Restart the typescript, appending to the previous file.
pit# script -af $(ls -tr ${PITDATA}/prep/admin/csm-install-remoteiso*.txt | head -n 1)
pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
Download the CSM software release to the PIT node.
Set variable to URL of CSM tarball.
pit# URL=https://arti.dev.cray.com/artifactory/shasta-distribution-stable-local/csm/${CSM_RELEASE}.tar.gz
Fetch the release tarball.
pit# wget ${URL} -O ${CSM_PATH}.tar.gz
Expand the tarball on the PIT node.
Note: Expansion of the tarball may take more than 45 minutes.
pit# tar -C ${PITDATA} -zxvf ${CSM_PATH}.tar.gz && ls -l ${CSM_PATH}
Copy the artifacts into place.
pit# rsync -a -P --delete ${CSM_PATH}/images/kubernetes/ ${PITDATA}/data/k8s/ &&
rsync -a -P --delete ${CSM_PATH}/images/storage-ceph/ ${PITDATA}/data/ceph/
Note: The PIT ISO, Helm charts/images, and bootstrap RPMs are now available in the extracted CSM tar file.
Install the latest version of the CSI tool.
pit# rpm -Uvh --force $(find ${CSM_PATH}/rpm/ -name "cray-site-init-*.x86_64.rpm" | sort -V | tail -1)
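The `sort -V | tail -1` idiom picks the highest version because version sort compares numeric components naturally, whereas plain lexical sort would rank 1.9.x above 1.10.x. A sketch with illustrative file names (not real release versions):

```shell
# Version sort: 1.9.2 < 1.9.10 < 1.10.1, so tail -1 yields the newest RPM.
latest=$(printf '%s\n' \
  cray-site-init-1.9.10-1.x86_64.rpm \
  cray-site-init-1.9.2-1.x86_64.rpm \
  cray-site-init-1.10.1-1.x86_64.rpm | sort -V | tail -1)
echo "$latest"
```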
Install the latest documentation RPM.
Show the version of CSI installed.
NOTE: The App. Version will report incorrectly in CSM 1.2.0 and CSM 1.2.1. Please obtain the version information by running the step below and by invoking rpm -q cray-site-init.
pit# /root/bin/metalid.sh
Expected output looks similar to the following:
= PIT Identification = COPY/CUT START =======================================
VERSION=1.5.7
TIMESTAMP=20211028194247
HASH=ge4aceb1
CRAY-Site-Init build signature...
Build Commit : a6c8dddf9df1a9fc7f8c4f17cb26568a8b41d433-main
Build Time : 2021-12-01T16:16:41Z
Go Version : go1.16.10
Git Version : a6c8dddf9df1a9fc7f8c4f17cb26568a8b41d433
Platform : linux/amd64
App. Version : 1.12.2
metal-net-scripts-0.0.2-1.noarch
metal-basecamp-1.1.9-1.x86_64
metal-ipxe-2.0.10-1.noarch
pit-init-1.2.12-1.noarch
= PIT Identification = COPY/CUT END =========================================
Some files are needed for generating the configuration payload. See the Command Line Configuration Payload and Configuration Payload Files topics if the information for this system has not already been prepared.
Create the hmn_connections.json
file by following the Create HMN Connections JSON procedure. Return to this section when completed.
Create the configuration input files if needed and copy them into the preparation directory.
The preparation directory is ${PITDATA}/prep.
Copy these files into the preparation directory, or create them if this is an initial install of the system:
- application_node_config.yaml (optional - see below)
- cabinets.yaml (optional - see below)
- hmn_connections.json
- ncn_metadata.csv
- switch_metadata.csv
- system_config.yaml (only available after first-install generation of system files)

The optional application_node_config.yaml file may be provided for further definition of settings relating to how application nodes will appear in HSM for roles and subroles. See Create Application Node YAML.
The optional cabinets.yaml file allows cabinet naming and numbering as well as some VLAN overrides. See Create Cabinets YAML.
The system_config.yaml file is generated by the csi tool during the first install of a system, and can later be used for reinstalls of the system. For the initial install, the information in it must be provided as command line arguments to csi config init.
After gathering the files into this working directory, move on to Subsequent Fresh-Installs (Re-Installs).
Proceed to the appropriate next step.
For subsequent fresh installs (reinstalls) where the system_config.yaml parameter file is available, generate the updated system configuration (see Cray Site Init Files).
Warning: If the system_config.yaml file is unavailable, then skip this step and proceed to Initial Installs (bare-metal).
Check for the configuration files. The needed files should be in the preparation directory.
pit# ls -1 ${PITDATA}/prep
Expected output looks similar to the following:
application_node_config.yaml
cabinets.yaml
hmn_connections.json
ncn_metadata.csv
switch_metadata.csv
system_config.yaml
Generate the system configuration.
Note: Ensure that you specify a reachable NTP pool or server using the ntp-pools or ntp-servers fields, respectively. Adding an unreachable server can cause clock skew as chrony tries to continually reach a server it can never reach.
pit# cd ${PITDATA}/prep && csi config init
A new directory matching the system-name field in system_config.yaml will now exist in the working directory.
Note: These warnings from csi config init for issues in hmn_connections.json can be ignored.
- The node with the external connection (ncn-m001) will have a warning similar to the following, because its BMC is connected to the site and not the HMN like the other management NCNs. It can be ignored.
"Couldn't find switch port for NCN: x3000c0s1b0"
- An unexpected component may have this message. If this component is an application node with an unusual prefix, it should be added to the application_node_config.yaml file, then rerun csi config init. See the procedure to Create Application Node Config YAML.
{"level":"warn","ts":1610405168.8705149,"msg":"Found unknown source prefix! If this is expected to be an Application node, please update application_node_config.yaml","row": {"Source":"gateway01","SourceRack":"x3000","SourceLocation":"u33","DestinationRack":"x3002","DestinationLocation":"u48","DestinationPort":"j29"}}
- If a cooling door is found in hmn_connections.json, there may be a message like the following. It can be safely ignored.
{"level":"warn","ts":1612552159.2962296,"msg":"Cooling door found, but xname does not yet exist for cooling doors!","row": {"Source":"x3000door-Motiv","SourceRack":"x3000","SourceLocation":" ","DestinationRack":"x3000","DestinationLocation":"u36","DestinationPort":"j27"}}
Skip the next step and continue to verify and backup system_config.yaml.
For first-time/initial installs (without a system_config.yaml file), generate the system configuration. See below for an explanation of the command line parameters and some common settings.
Check for the configuration files. The needed files should be in the preparation directory.
pit# ls -1 ${PITDATA}/prep
Expected output looks similar to the following:
application_node_config.yaml
cabinets.yaml
hmn_connections.json
ncn_metadata.csv
switch_metadata.csv
Generate the system configuration.
Notes:
- Run csi config init --help to print a full list of parameters that must be set. These will vary significantly depending on the system and site configuration.
- Ensure that you specify a reachable NTP pool or server using the --ntp-pools or --ntp-servers flags, respectively. Adding an unreachable server can cause clock skew as chrony tries to continually reach a server it can never reach.
pit# cd ${PITDATA}/prep && csi config init <options>
A new directory matching the --system-name argument will now exist in the working directory.
Important: After generating a configuration, a visual audit of the generated files for network data should be performed.
Special Notes: Certain parameters to csi config init may be hard to grasp on first-time configuration generations:
- The optional application_node_config.yaml file is used to map prefixes in hmn_connections.csv to HSM subroles. A command line option is required in order for csi to use the file. See Create Application Node YAML.
- The bootstrap-ncn-bmc-user and bootstrap-ncn-bmc-pass values must match the account and password used for the BMCs of the management NCNs.
- Set the site parameters (site-domain, site-ip, site-gw, site-nic, site-dns) for the network information which connects ncn-m001 (the PIT node) to the site. The site-nic is the interface on ncn-m001 that is connected to the site network.
- There are other interfaces possible, but the install-ncn-bond-members are typically:
  - p1p1,p10p1 for HPE nodes
  - p1p1,p1p2 for Gigabyte nodes
  - p801p1,p801p2 for Intel nodes
- If not using a cabinets.yaml file, then set the three cabinet parameters (mountain-cabinets, hill-cabinets, and river-cabinets) to the quantity of each cabinet type included in this system.
- The starting cabinet number for each type of cabinet (for example, starting-mountain-cabinet) has a default that can be overridden. See csi config init --help.
- For systems that use non-sequential cabinet ID numbers, use the cabinets-yaml argument to include the cabinets.yaml file. This file gives the ability to explicitly specify the ID of every cabinet in the system. When specifying a cabinets.yaml file with the cabinets-yaml argument, other command line arguments related to cabinets will be ignored by csi. See Create Cabinets YAML.
- An override to the default cabinet IPv4 subnets can be made with the hmn-mtn-cidr and nmn-mtn-cidr parameters.

Note: These warnings from csi config init for issues in hmn_connections.json can be ignored.
- The node with the external connection (ncn-m001) will have a warning similar to the following, because its BMC is connected to the site and not the HMN like the other management NCNs. It can be ignored.
"Couldn't find switch port for NCN: x3000c0s1b0"
- An unexpected component may have this message. If this component is an application node with an unusual prefix, it should be added to the application_node_config.yaml file, then rerun csi config init. See the procedure to Create Application Node Config YAML.
{"level":"warn","ts":1610405168.8705149,"msg":"Found unknown source prefix! If this is expected to be an Application node, please update application_node_config.yaml","row": {"Source":"gateway01","SourceRack":"x3000","SourceLocation":"u33","DestinationRack":"x3002","DestinationLocation":"u48","DestinationPort":"j29"}}
- If a cooling door is found in hmn_connections.json, there may be a message like the following. It can be safely ignored.
{"level":"warn","ts":1612552159.2962296,"msg":"Cooling door found, but xname does not yet exist for cooling doors!","row": {"Source":"x3000door-Motiv","SourceRack":"x3000","SourceLocation":" ","DestinationRack":"x3000","DestinationLocation":"u36","DestinationPort":"j27"}}
Link the generated system_config.yaml file into the prep/ directory. This is needed for pit-init to find and resolve the file.
NOTE: This step is needed only for fresh installs where system_config.yaml is missing from the prep/ directory.
pit# cd ${PITDATA}/prep && ln ${SYSTEM_NAME}/system_config.yaml
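The ln above creates a hard link, so the copy in prep/ and the generated file are the same inode; a change to either is visible through both names. A sketch in a throwaway directory (eniac is just the example system name used on this page):

```shell
# Hard-link a file into the parent directory and confirm both names
# refer to the same inode (-ef tests for same device and inode).
d=$(mktemp -d)
mkdir "${d}/eniac"
echo 'example: config' > "${d}/eniac/system_config.yaml"
cd "$d" && ln eniac/system_config.yaml
[ system_config.yaml -ef eniac/system_config.yaml ] && echo "same inode"
```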
Continue to the next step to verify and backup system_config.yaml.
Verify and backup system_config.yaml
Verify that the newly generated system_config.yaml
matches the current version of CSI.
View the new system_config.yaml
file and note the CSI version reported near the end of the file.
pit# cat ${PITDATA}/prep/${SYSTEM_NAME}/system_config.yaml
Note the version reported by the csi
tool.
NOTE: The App. Version will report incorrectly in CSM 1.2.0 and CSM 1.2.1. Please obtain the version information by running the step below and by invoking rpm -q cray-site-init.
pit# csi version
The two versions should match. If they do not, determine the cause and regenerate the file.
Copy the new system_config.yaml
file somewhere safe to facilitate re-installs.
Continue to the next step to Prepare Site Init.
Prepare Site Init
Important: Although the command prompts in this procedure are linux#, the procedure should be performed on the PIT node.
Prepare the site-init directory by performing the Prepare Site Init procedures.
Initialize the PIT.
The pit-init.sh
script will prepare the PIT server for deploying NCNs.
Set the USERNAME and IPMI_PASSWORD variables to the credentials for the BMC of the PIT node.
read -s is used in order to prevent the credentials from being displayed on the screen or recorded in the shell history.
pit# USERNAME=root
pit# read -s IPMI_PASSWORD
pit# export USERNAME IPMI_PASSWORD ; /root/bin/pit-init.sh
Install csm-testing and hpe-csm-goss-package.
The following assumes the CSM_PATH environment variable is set to the absolute path of the unpacked CSM release.
pit# rpm -Uvh --force $(find ${CSM_PATH}/rpm/ -name "csm-testing*.rpm" | sort -V | tail -1) $(find ${CSM_PATH}/rpm/ -name "hpe-csm-goss-package*.rpm" | sort -V | tail -1)
After completing this procedure, proceed to configure the management network switches.