The Pre-Install Toolkit (PIT) node needs to be bootstrapped from the LiveCD. There are two media available to bootstrap the PIT node: the RemoteISO or a bootable USB device. This procedure describes using the RemoteISO. If not using the RemoteISO, see Bootstrap PIT Node from LiveCD USB.
The installation process is similar to the USB-based installation, with adjustments to account for the lack of removable storage.
Important: Before starting this procedure, be sure to complete the procedure to Prepare Configuration Payload for the relevant installation scenario.
The LiveCD Remote ISO has known compatibility issues for nodes from certain vendors.
Warning: If this is a re-installation on a system that still has a USB device from a prior installation, then that USB device must be wiped before continuing. Failing to wipe the USB device, if present, may cause confusion during the install. If the USB device is still booted, then it can wipe itself using the basic wipe from Wipe NCN Disks for Reinstallation. If it is not booted, boot it and wipe it, or disable the USB ports in the BIOS (not available for all vendors).
Obtain and attach the LiveCD cray-pre-install-toolkit ISO file to the BMC. Depending on the vendor of the node,
the instructions for attaching to the BMC will differ.
The CSM software release should be downloaded and expanded for use.
Important: To ensure that the CSM release plus any patches, workarounds, or hot fixes are included, follow the instructions in Update CSM Product Stream.
The cray-pre-install-toolkit ISO and other files are now available in the directory from the extracted CSM tar.
The ISO will have a name similar to
cray-pre-install-toolkit-sle15sp2.x86_64-1.4.10-20210514183447-gc054094.iso
This ISO file can be extracted from the CSM release tar file using the following command:
linux# tar --wildcards --no-anchored -xzvf <csm-release>.tar.gz 'cray-pre-install-toolkit-*.iso'
For this release of CSM software, the cray-pre-install-toolkit ISO should be placed on a server which the PIT node
will be able to contact using HTTP or HTTPS.
Note: A shorter path name is better than a long path name on the webserver. The ISO extracted from the CSM release tar file has a long filename similar to cray-pre-install-toolkit-sle15sp2.x86_64-1.4.10-20210514183447-gc054094.iso, so pick a shorter name on the webserver. See the respective procedure below to attach an ISO.
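If a convenient webserver is not already available, one minimal approach is to serve the directory containing the ISO from any Linux host that the BMC can reach over HTTP. This sketch assumes Python 3 is installed on that host; the directory path and port are placeholders:
linux# cd /path/to/directory/containing/the/iso
linux# python3 -m http.server 8080
The ISO would then be reachable at http://<webserver-ip>:8080/<iso-filename>.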
The chosen procedure should have rebooted the server. Observe the server boot into the LiveCD.
On first login (over SSH or at local console) the LiveCD will prompt the administrator to change the password.
The initial password is empty; log in with the username root, pressing Enter for the blank password (and again at the Current password prompt), then set a new password.
pit login: root
Expected output looks similar to the following:
Password: <-------just press Enter here for a blank password
You are required to change your password immediately (administrator enforced)
Changing password for root.
Current password: <------- press Enter here, again, for a blank password
New password: <------- type new password
Retype new password:<------- retype new password
Welcome to the CRAY Pre-Install Toolkit (LiveOS)
Set up the initial typescript.
pit# cd ~
pit# script -af csm-install-remoteiso.$(date +%Y-%m-%d).txt
pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
Set up the site-link, enabling SSH to work. You can reconnect with SSH after this step.
NOTICE REGARDING DHCP: If your site's network authority or network administrator has already provisioned an IPv4 address for the master node's external NIC(s), then skip this step.
Set up variables.
# The IPv4 address for the node's external interface(s); this will be provided by the site's network administrator or network authority if not already known.
pit# site_ip=172.30.XXX.YYY/20
pit# site_gw=172.30.48.1
pit# site_dns=172.30.84.40
# The actual NIC names for the external site interface; typically the first onboard NIC or the first 1GbE PCIe (RJ-45) NIC.
pit# site_nics='p2p1 p2p2 p2p3'
# another example:
pit# site_nics=em1
Run the link setup script.
NOTE ON USAGE: All of the /root/bin/csi-* scripts are harmless to run without parameters; doing so prints their usage statements.
pit# /root/bin/csi-setup-lan0.sh $site_ip $site_gw $site_dns $site_nics
Print the lan0 configuration. If it has an IP address, then exit the console and log in again using SSH.
pit# ip a show lan0
pit# exit
external# ssh root@${SYSTEM_NAME}-ncn-m001
(Recommended) After reconnecting, resume the typescript (the -a appends to an existing script).
pit# cd ~
pit# script -af $(ls -tr csm-install-remoteiso* | head -n 1)
pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
Check hostname.
pit# hostnamectl
Note:
- The hostname should be similar to eniac-ncn-m001-pit when booted from the LiveCD, but it will be shown as pit# in the documentation command prompts from this point onward.
- If the hostname returned by the hostnamectl command is pit, then re-run the csi-set-hostname.sh script with the same parameters. Otherwise, an administrator should set the hostname manually with hostnamectl. In the latter case, do not confuse other administrators by using the hostname ncn-m001; append the -pit suffix, indicating that the node is booted from the LiveCD.
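For example, to set the hostname manually with the -pit suffix (using the example system name eniac from this document):
pit# hostnamectl set-hostname eniac-ncn-m001-pit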
Find a local disk for storing product installers.
pit# disk="$(lsblk -l -o SIZE,NAME,TYPE,TRAN | grep -E '(sata|nvme|sas)' | sort -h | awk '{print $2}' | head -n 1 | tr -d '\n')"
pit# echo $disk
pit# parted --wipesignatures -m --align=opt --ignore-busy -s /dev/$disk -- mklabel gpt mkpart primary ext4 2048s 100%
pit# mkfs.ext4 -L PITDATA "/dev/${disk}1"
In some cases the parted command may give an error similar to the following:
Error: Partition(s) 4 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably
because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making
further changes.
In that case, the following steps may resolve the problem without needing to reboot. These commands remove
volume groups and RAID arrays that may be using the disk. They only need to be run if the earlier
parted command failed.
pit# RAIDS=$(grep "${disk}[0-9]" /proc/mdstat | awk '{ print "/dev/"$1 }')
pit# echo $RAIDS
pit# VGS=$(echo $RAIDS | xargs -r pvs --noheadings -o vg_name 2>/dev/null)
pit# echo $VGS
pit# echo $VGS | xargs -r -t -n 1 vgremove -f -v
pit# echo $RAIDS | xargs -r -t -n 1 mdadm -S -f -v
After running the above procedure, retry the parted command which failed. If it succeeds, resume the install from that point.
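That is, rerun the same commands from the earlier step:
pit# parted --wipesignatures -m --align=opt --ignore-busy -s /dev/$disk -- mklabel gpt mkpart primary ext4 2048s 100%
pit# mkfs.ext4 -L PITDATA "/dev/${disk}1"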
Mount the local disk, checking the output of each command as you go.
pit# mount -v -L PITDATA
pit# pushd /var/www/ephemeral
pit# mkdir -v admin prep prep/admin configs data
Quit the typescript session with the exit command, copy the file (csm-install-remoteiso.<date>.txt) from its initial location to the newly created directory, and restart the typescript.
pit# exit # The typescript
pit# cp -v ~/csm-install-remoteiso.*.txt /var/www/ephemeral/prep/admin
pit# cd /var/www/ephemeral/prep/admin
pit# script -af $(ls -tr csm-install-remoteiso* | head -n 1)
pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
pit# pushd /var/www/ephemeral
Download the CSM software release to the PIT node.
Important: In an earlier step, the CSM release plus any patches, workarounds, or hot fixes
were downloaded to a system using the instructions in Update CSM Product Stream.
Either copy the release from that system to the PIT node, or set the ENDPOINT variable to the URL and use wget.
Set helper variables.
pit# ENDPOINT=https://arti.dev.cray.com/artifactory/shasta-distribution-stable-local/csm
pit# export CSM_RELEASE=csm-x.y.z
pit# export SYSTEM_NAME=eniac
Save the CSM_RELEASE and SYSTEM_NAME variables for later use; all subsequent shell sessions will have these variables set.
The echo prepends a newline to ensure that the variable assignment occurs on a unique line, and not at the end of another.
pit# echo -e "\nCSM_RELEASE=${CSM_RELEASE}\nSYSTEM_NAME=${SYSTEM_NAME}" >>/etc/environment
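To confirm that the variables were appended correctly, a simple check:
pit# grep -E 'CSM_RELEASE|SYSTEM_NAME' /etc/environment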
Fetch the release tar file.
pit# wget ${ENDPOINT}/${CSM_RELEASE}.tar.gz -O /var/www/ephemeral/${CSM_RELEASE}.tar.gz
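Alternatively, if the release tar file was already downloaded to another system in the earlier step, it can be copied to the PIT node instead of using wget. A sketch, where the source path on the other system is a placeholder:
external# scp /path/to/${CSM_RELEASE}.tar.gz root@${SYSTEM_NAME}-ncn-m001:/var/www/ephemeral/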
Expand the tar file on the PIT node.
Note: Expansion of the tar file may take more than 45 minutes.
pit# tar -zxvf ${CSM_RELEASE}.tar.gz
pit# ls -l ${CSM_RELEASE}
Copy the artifacts into place.
pit# mkdir -pv data/{k8s,ceph}
pit# rsync -a -P --delete ./${CSM_RELEASE}/images/kubernetes/ ./data/k8s/
pit# rsync -a -P --delete ./${CSM_RELEASE}/images/storage-ceph/ ./data/ceph/
The PIT ISO, Helm charts/images, and bootstrap RPMs are now available in the extracted CSM
tar.
Install/upgrade the CSI and testing RPMs.
pit# rpm -Uvh --force \
$(find ./${CSM_RELEASE}/rpm/ -name "cray-site-init-*.x86_64.rpm" | sort -V | tail -1) \
$(find ./${CSM_RELEASE}/rpm/ -name "hpe-csm-goss-package*.rpm" | sort -V | tail -1) \
$(find ./${CSM_RELEASE}/rpm/ -name "csm-testing*.rpm" | sort -V | tail -1) \
$(find ./${CSM_RELEASE}/rpm/ -name "goss-servers*.rpm" | sort -V | tail -1)
Show the version of CSI installed.
pit# csi version
Expected output looks similar to the following:
CRAY-Site-Init build signature...
Build Commit : b3ed3046a460d804eb545d21a362b3a5c7d517a3-release-shasta-1.4
Build Time : 2021-02-04T21:05:32Z
Go Version : go1.14.9
Git Version : b3ed3046a460d804eb545d21a362b3a5c7d517a3
Platform : linux/amd64
App. Version : 1.5.18
Download and install/upgrade the workaround and documentation RPMs.
If this machine does not have direct Internet access, these RPMs will need to be downloaded externally and then copied to the system.
Important: In an earlier step, the CSM release plus any patches, workarounds, or hot fixes were downloaded to a system using the instructions in Check for Latest Workarounds and Documentation Updates. Use that set of RPMs rather than downloading again.
linux# wget https://storage.googleapis.com/csm-release-public/shasta-1.5/docs-csm/docs-csm-latest.noarch.rpm
linux# wget https://storage.googleapis.com/csm-release-public/shasta-1.5/csm-install-workarounds/csm-install-workarounds-latest.noarch.rpm
linux# scp -p docs-csm-*rpm csm-install-workarounds-*rpm ncn-m001:/root
linux# ssh ncn-m001
pit# rpm -Uvh --force docs-csm-latest.noarch.rpm
pit# rpm -Uvh --force csm-install-workarounds-latest.noarch.rpm
Some files are needed for generating the configuration payload. See the Command Line Configuration Payload and Configuration Payload Files topics if one has not already prepared the information for this system.
Create the hmn_connections.json file by following the Create HMN Connections JSON procedure. Return to this section when completed.
Create the configuration input files if needed and copy them into the preparation directory.
The preparation directory is ${PITDATA}/prep.
Copy these files into the preparation directory, or create them if this is an initial install of the system:
- application_node_config.yaml (optional - see below)
- cabinets.yaml (optional - see below)
- hmn_connections.json
- ncn_metadata.csv
- switch_metadata.csv
- system_config.yaml (only available after first-install generation of system files)
The optional application_node_config.yaml file may be provided for further definition of settings relating to how application nodes will appear in HSM for roles and subroles. See Create Application Node YAML.
The optional cabinets.yaml file allows cabinet naming and numbering as well as some VLAN overrides. See Create Cabinets YAML.
The system_config.yaml file is generated by the csi tool during the first install of a system, and can later be used for reinstalls of the system. For the initial install, the information in it must be provided as command line arguments to csi config init.
Change into the preparation directory.
linux# mkdir -pv /var/www/ephemeral/prep
linux# cd /var/www/ephemeral/prep
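For example, if the payload files were staged elsewhere on the PIT node, they can be copied into the preparation directory; the source path here is hypothetical:
linux# cp -pv /root/payload/*.yaml /root/payload/*.csv /root/payload/*.json /var/www/ephemeral/prep/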
After gathering the files into this working directory, generate your configurations.
If doing a reinstall and the system_config.yaml parameter file is available, then generate the system configuration by reusing this parameter file (see avoiding parameters).
If not doing a reinstall of Shasta software, then the system_config.yaml file will not be available, so skip the rest of this step.
Check for the configuration files. The needed files should be in the current directory.
linux# ls -1
Expected output looks similar to the following:
application_node_config.yaml
cabinets.yaml
hmn_connections.json
ncn_metadata.csv
switch_metadata.csv
system_config.yaml
Generate the system configuration.
Note: Ensure that you specify a reachable NTP pool or server using the ntp-pools or ntp-servers fields, respectively. Adding an unreachable server can cause clock skew as chrony tries to continually reach out to a server it can never reach.
linux# csi config init
A new directory matching the system-name field in system_config.yaml will now exist in the working directory.
Note: These warnings from csi config init for issues in hmn_connections.json can be ignored.
- The node with the external connection (ncn-m001) will have a warning similar to this because its BMC is connected to the site and not the HMN like the other management NCNs. It can be ignored.
  "Couldn't find switch port for NCN: x3000c0s1b0"
- An unexpected component may have this message. If this component is an application node with an unusual prefix, it should be added to the application_node_config.yaml file. Then rerun csi config init. See the procedure to Create Application Node Config YAML.
  {"level":"warn","ts":1610405168.8705149,"msg":"Found unknown source prefix! If this is expected to be an Application node, please update application_node_config.yaml","row": {"Source":"gateway01","SourceRack":"x3000","SourceLocation":"u33","DestinationRack":"x3002","DestinationLocation":"u48","DestinationPort":"j29"}}
- If a cooling door is found in hmn_connections.json, there may be a message like the following. It can be safely ignored.
  {"level":"warn","ts":1612552159.2962296,"msg":"Cooling door found, but xname does not yet exist for cooling doors!","row": {"Source":"x3000door-Motiv","SourceRack":"x3000","SourceLocation":" ","DestinationRack":"x3000","DestinationLocation":"u36","DestinationPort":"j27"}}
Skip the next step and continue to the CSI Workarounds.
If doing a first-time install, or if the system_config.yaml parameter file for a reinstall is not available, generate the system configuration.
This step is required for a first-time install. If the previous step was done as part of a reinstall, skip this step.
Check for the configuration files. The needed files should be in the current directory.
linux# ls -1
Expected output looks similar to the following:
application_node_config.yaml
cabinets.yaml
hmn_connections.json
ncn_metadata.csv
switch_metadata.csv
Generate the system configuration.
Notes:
- Run csi config init --help to print a full list of parameters that must be set. These will vary significantly depending on the system and site configuration.
- Ensure that you specify a reachable NTP pool or server using the --ntp-pools or --ntp-servers flags, respectively. Adding an unreachable server can cause clock skew as chrony tries to continually reach out to a server it can never reach.
linux# csi config init <options>
A new directory matching the system-name field in system_config.yaml will now exist in the working directory.
Important: After generating a configuration, a visual audit of the generated files for network data should be performed.
Special Notes: Certain parameters to csi config init may be hard to grasp on first-time configuration generations.
Notes about parameters to csi config init (an example invocation follows these notes):
- The optional application_node_config.yaml file is used to map prefixes in hmn_connections.csv to HSM subroles. A command line option is required in order for csi to use the file. See Create Application Node YAML.
- The bootstrap-ncn-bmc-user and bootstrap-ncn-bmc-pass must match what is used for the BMC account and its password for the management NCNs.
- Set site parameters (site-domain, site-ip, site-gw, site-nic, site-dns) for the network information which connects ncn-m001 (the PIT node) to the site. The site-nic is the interface on ncn-m001 that is connected to the site network.
- There are other interfaces possible, but the install-ncn-bond-members are typically:
  - p1p1,p10p1 for HPE nodes
  - p1p1,p1p2 for Gigabyte nodes
  - p801p1,p801p2 for Intel nodes
- If not using a cabinets-yaml file, then set the three cabinet parameters (mountain-cabinets, hill-cabinets, and river-cabinets) to the quantity of each cabinet type included in this system.
- The starting cabinet number for each type of cabinet (for example, starting-mountain-cabinet) has a default that can be overridden. See csi config init --help.
- For systems that use non-sequential cabinet ID numbers, use the cabinets-yaml argument to include the cabinets.yaml file. This file gives the ability to explicitly specify the ID of every cabinet in the system. When specifying a cabinets.yaml file with the cabinets-yaml argument, other command line arguments related to cabinets will be ignored by csi. See Create Cabinets YAML.
- An override to default cabinet IPv4 subnets can be made with the hmn-mtn-cidr and nmn-mtn-cidr parameters.
- By default, spine switches are used as MetalLB peers. Use --bgp-peers aggregation to use aggregation switches instead.
- Several parameters (can-gateway, can-cidr, can-static-pool, can-dynamic-pool) describe the CAN (Customer Access Network). The can-gateway is the common gateway IP address used for both spine switches and commonly referred to as the Virtual IP address for the CAN. The can-cidr is the IP subnet for the CAN assigned to this system. The can-static-pool and can-dynamic-pool are the MetalLB static and dynamic address pools for the CAN. The can-external-dns is the static IP address assigned to the DNS instance running in the cluster, to which requests for the cluster subdomain will be forwarded. The can-external-dns IP address must be within the can-static-pool range.
- Set ntp-pools to reachable NTP pools.
Note: These warnings from csi config init for issues in hmn_connections.json can be ignored.
- The node with the external connection (ncn-m001) will have a warning similar to this because its BMC is connected to the site and not the HMN like the other management NCNs. It can be ignored.
  "Couldn't find switch port for NCN: x3000c0s1b0"
- An unexpected component may have this message. If this component is an application node with an unusual prefix, it should be added to the application_node_config.yaml file. Then rerun csi config init. See the procedure to Create Application Node Config YAML.
  {"level":"warn","ts":1610405168.8705149,"msg":"Found unknown source prefix! If this is expected to be an Application node, please update application_node_config.yaml","row": {"Source":"gateway01","SourceRack":"x3000","SourceLocation":"u33","DestinationRack":"x3002","DestinationLocation":"u48","DestinationPort":"j29"}}
- If a cooling door is found in hmn_connections.json, there may be a message like the following. It can be safely ignored.
  {"level":"warn","ts":1612552159.2962296,"msg":"Cooling door found, but xname does not yet exist for cooling doors!","row": {"Source":"x3000door-Motiv","SourceRack":"x3000","SourceLocation":" ","DestinationRack":"x3000","DestinationLocation":"u36","DestinationPort":"j27"}}
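For illustration only, a hypothetical csi config init invocation combining some of the flags described above. Every value is a placeholder drawn from the examples in this document and must be replaced with this system's actual site data; this is not the complete set of required parameters, so consult csi config init --help for the full list:
linux# csi config init \
    --system-name eniac \
    --site-domain example.com \
    --site-ip 172.30.XXX.YYY/20 \
    --site-gw 172.30.48.1 \
    --site-dns 172.30.84.40 \
    --site-nic p1p2 \
    --install-ncn-bond-members p1p1,p10p1 \
    --bootstrap-ncn-bmc-user root \
    --bootstrap-ncn-bmc-pass changeme \
    --river-cabinets 1 \
    --mountain-cabinets 0 \
    --hill-cabinets 0 \
    --can-cidr 10.102.9.0/24 \
    --can-gateway 10.102.9.1 \
    --can-static-pool 10.102.9.112/28 \
    --can-dynamic-pool 10.102.9.128/25 \
    --can-external-dns 10.102.9.113 \
    --ntp-pools time.example.com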
Link the generated system_config.yaml file into the prep/ directory. This is needed for pit-init to find and resolve the file.
NOTE: This step is needed only for fresh installs where system_config.yaml is missing from the prep/ directory.
pit# cd ${PITDATA}/prep && ln ${SYSTEM_NAME}/system_config.yaml
Continue with the next step to apply the csi-config workarounds.
Follow the workaround instructions for the csi-config breakpoint.
Copy the interface configuration files generated earlier by csi config init
into /etc/sysconfig/network/ using the first option below, or use the provided scripts in the second option.
Option 1: Copy PIT files.
pit# cp -pv /var/www/ephemeral/prep/${SYSTEM_NAME}/pit-files/* /etc/sysconfig/network/
pit# wicked ifreload all
pit# systemctl restart wickedd-nanny && sleep 5
Option 2: Set up the VLAN interfaces by hand using the provided scripts (example CIDR values are shown below).
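These scripts expect the CIDR values generated for this system. A sketch with example values matching the addresses shown in the wicked output later in this step; substitute the values from the generated network data:
pit# nmn_cidr=10.252.1.4/17
pit# hmn_cidr=10.254.1.4/17
pit# can_cidr=10.102.9.5/24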
pit# /root/bin/csi-setup-vlan002.sh $nmn_cidr
pit# /root/bin/csi-setup-vlan004.sh $hmn_cidr
pit# /root/bin/csi-setup-vlan007.sh $can_cidr
Check that IP addresses are set for each interface and investigate any failures.
Check the IP addresses. If any are missing, do not run the tests; triage the issue first.
pit# wicked show bond0 vlan002 vlan004 vlan007
bond0 up
link: #7, state up, mtu 1500
type: bond, mode ieee802-3ad, hwaddr b8:59:9f:fe:49:d4
config: compat:suse:/etc/sysconfig/network/ifcfg-bond0
leases: ipv4 static granted
addr: ipv4 10.1.1.2/16 [static]
vlan002 up
link: #8, state up, mtu 1500
type: vlan bond0[2], hwaddr b8:59:9f:fe:49:d4
config: compat:suse:/etc/sysconfig/network/ifcfg-vlan002
leases: ipv4 static granted
addr: ipv4 10.252.1.4/17 [static]
route: ipv4 10.92.100.0/24 via 10.252.0.1 proto boot
vlan007 up
link: #9, state up, mtu 1500
type: vlan bond0[7], hwaddr b8:59:9f:fe:49:d4
config: compat:suse:/etc/sysconfig/network/ifcfg-vlan007
leases: ipv4 static granted
addr: ipv4 10.102.9.5/24 [static]
vlan004 up
link: #10, state up, mtu 1500
type: vlan bond0[4], hwaddr b8:59:9f:fe:49:d4
config: compat:suse:/etc/sysconfig/network/ifcfg-vlan004
leases: ipv4 static granted
addr: ipv4 10.254.1.4/17 [static]
Run tests, inspect failures.
pit# csi pit validate --network
Copy the service configuration files generated earlier by csi config init for dnsmasq, Metal
Basecamp (cloud-init), and ConMan.
Copy files (files only, -r is expressly not used).
pit# cp -pv /var/www/ephemeral/prep/${SYSTEM_NAME}/dnsmasq.d/* /etc/dnsmasq.d/
pit# cp -pv /var/www/ephemeral/prep/${SYSTEM_NAME}/conman.conf /etc/conman.conf
pit# cp -pv /var/www/ephemeral/prep/${SYSTEM_NAME}/basecamp/* /var/www/ephemeral/configs/
Enable and fully restart all PIT services.
pit# systemctl enable basecamp nexus dnsmasq conman
pit# systemctl stop basecamp nexus dnsmasq conman
pit# systemctl start basecamp nexus dnsmasq conman
Start and configure NTP on the LiveCD as a fallback/recovery server.
pit# /root/bin/configure-ntp.sh
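To confirm that NTP is syncing, chrony can be queried directly (assuming chrony is in use on the PIT, as referenced elsewhere in this document):
pit# chronyc sources
pit# chronyc tracking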
Check that the services are ready and investigate any test failures.
pit# csi pit validate --services
Mount a shim to match the SHASTA-CFG steps’ directory structure.
pit# mkdir -vp /mnt/pitdata
pit# mount -v -L PITDATA /mnt/pitdata
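A quick check that the mount succeeded:
pit# df -h /mnt/pitdata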
The following procedure will set up customized CA certificates for deployment using SHASTA-CFG. Follow the site-init procedure to create and prepare the site-init directory for your system. After completing this procedure, the next step is to configure the management network switches.