This page will guide an administrator through creating a USB stick from either their Shasta v1.3 ncn-m001 node or their own laptop or desktop.
There are five overall steps that produce a bootable USB stick with SSH enabled, capable of installing Shasta v1.4 (or higher).
Fetch the base installation CSM tarball and extract it, installing the contained CSI tool.
linux# script -af csm-usb-livecd.$(date +%Y-%m-%d).txt
linux# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
INTERNAL USE
The ENDPOINT URL below is for internal use. Customers do not need to download any additional artifacts; the CSM tarball is included with the Shasta release.
If necessary, download the CSM software release to the Linux host which will be preparing the LiveCD.
linux# cd ~
linux# export ENDPOINT=https://arti.dev.cray.com/artifactory/shasta-distribution-stable-local/csm/
linux# export CSM_RELEASE=csm-x.y.z
linux# wget ${ENDPOINT}/${CSM_RELEASE}.tar.gz
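Before extracting, it can be worth confirming the download is a complete gzip archive (a quick sketch; `gzip -t` only tests compression integrity, not authenticity, and assumes `CSM_RELEASE` is still set from the step above):

```shell
# Sketch: verify the downloaded tarball decompresses cleanly before extraction.
tarball="${CSM_RELEASE}.tar.gz"
if gzip -t "$tarball" 2>/dev/null; then
    echo "OK: $tarball passes the gzip integrity test"
else
    echo "ERROR: $tarball is missing or corrupt" >&2
fi
```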
Expand the CSM software release:
IMPORTANT
Before proceeding, refer to the “CSM Patch Assembly” section of the Shasta Install Guide to apply any needed patch content for CSM. It is critical to perform these steps to ensure that the correct CSM release artifacts are deployed.
WARNING
Ensure that the CSM_RELEASE environment variable is set to the version of the patched CSM release tarball. Applying the “CSM Patch Assembly” procedure will result in a different CSM version than the pre-patched CSM release tarball.
linux# tar -zxvf ${CSM_RELEASE}.tar.gz
linux# export CSM_PATH=$(pwd)/${CSM_RELEASE}
linux# ls -l ${CSM_PATH}
The ISO and other files are now available in the extracted CSM tar.
Remove any previously-installed versions of CSI, the CSM install documentation, and the CSM install workarounds.
It is okay if this command reports that one or more of the packages are not installed.
linux# rpm -ev cray-site-init csm-install-workarounds docs-csm
Install the CSI RPM.
Make sure the CSM_PATH variable is set to the directory of your expanded CSM tarball.
linux# rpm -Uvh --force ${CSM_PATH}/rpm/cray/csm/sle-15sp2/x86_64/cray-site-init-*.x86_64.rpm
Download and install the workaround and documentation RPMs. If this machine does not have direct internet access, these RPMs will need to be downloaded externally and then copied to this machine to be installed.
linux# rpm -Uvh https://storage.googleapis.com/csm-release-public/shasta-1.4/docs-csm/docs-csm-latest.noarch.rpm
linux# rpm -Uvh https://storage.googleapis.com/csm-release-public/shasta-1.4/csm-install-workarounds/csm-install-workarounds-latest.noarch.rpm
Show the version of CSI installed.
linux# csi version
Expected output looks similar to the following:
CRAY-Site-Init build signature...
Build Commit : b3ed3046a460d804eb545d21a362b3a5c7d517a3-release-shasta-1.4
Build Time : 2021-02-04T21:05:32Z
Go Version : go1.14.9
Git Version : b3ed3046a460d804eb545d21a362b3a5c7d517a3
Platform : linux/amd64
App. Version : 1.5.18
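When scripting around csi, the application version can be pulled out of this output (a sketch; the awk pattern assumes the "App. Version" label shown above, and the sample text stands in for live `csi version` output):

```shell
# Sketch: extract "App. Version" from `csi version`-style output.
# Stand-in for: version_out="$(csi version)"
version_out='CRAY-Site-Init build signature...
Build Commit : b3ed3046a460d804eb545d21a362b3a5c7d517a3-release-shasta-1.4
App. Version : 1.5.18'

printf '%s\n' "$version_out" | awk -F': ' '/App\. Version/ {print $2}'
# prints: 1.5.18
```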
Install podman or docker to provide the container tools required to generate sealed secrets.
Podman RPMs are included in the embedded repository in the CSM release and may be installed in your pre-LiveCD environment using zypper as follows:
Add the embedded repository (if necessary):
linux# zypper ar -fG "${CSM_PATH}/rpm/embedded" "${CSM_RELEASE}-embedded"
Install the podman and podman-cni-config packages:
linux# zypper in --repo ${CSM_RELEASE}-embedded -y podman podman-cni-config
Or you may use rpm -Uvh to install the RPMs (and their dependencies) manually from the ${CSM_PATH}/rpm/embedded directory.
Install lsscsi to view attached storage devices.
lsscsi RPMs are included in the embedded repository in the CSM release and may be installed in your pre-LiveCD environment using zypper as follows:
Add the embedded repository (if necessary):
linux# zypper ar -fG "${CSM_PATH}/rpm/embedded" "${CSM_RELEASE}-embedded"
Install the lsscsi package:
linux# zypper in --repo ${CSM_RELEASE}-embedded -y lsscsi
Or you may use rpm -Uvh to install the RPMs (and their dependencies) manually from the ${CSM_PATH}/rpm/embedded directory.
Although not strictly required, the procedures for setting up the site-init directory recommend persisting the site-init files in a Git repository.
Git RPMs are included in the embedded repository in the CSM release and may be installed in your pre-LiveCD environment using zypper as follows:
Add the embedded repository (if necessary):
linux# zypper ar -fG "${CSM_PATH}/rpm/embedded" "${CSM_RELEASE}-embedded"
Install the git package:
linux# zypper in --repo ${CSM_RELEASE}-embedded -y git
Or you may use rpm -Uvh to install the RPMs (and their dependencies) manually from the ${CSM_PATH}/rpm/embedded directory.
Cray Site Init will create the bootable LiveCD. Before writing the media, identify which device the USB stick is.
Identify the USB device.
This example shows the USB device is /dev/sdd on the host.
linux# lsscsi
Expected output looks similar to the following:
[6:0:0:0] disk ATA SAMSUNG MZ7LH480 404Q /dev/sda
[7:0:0:0] disk ATA SAMSUNG MZ7LH480 404Q /dev/sdb
[8:0:0:0] disk ATA SAMSUNG MZ7LH480 404Q /dev/sdc
[14:0:0:0] disk SanDisk Extreme SSD 1012 /dev/sdd
[14:0:0:1] enclosu SanDisk SES Device 1012 -
In the above example, the internal disks appear as ATA devices, and the USB stick appears as the disk and enclosu entries. Since the SanDisk device fits the profile we are looking for, we will use /dev/sdd as our disk.
Set a variable with your disk to avoid mistakes:
linux# export USB=/dev/sd<disk_letter>
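Before formatting, it is worth a sanity check that $USB really points at removable media. A minimal sketch using the sysfs `removable` flag (the helper name and messages are our own; the sysfs root is parameterized only so the check can be exercised against a fake tree):

```shell
# Sketch: warn if the chosen device does not advertise itself as removable.
is_removable() {
    local sysroot="$1" dev="${2##*/}"   # /dev/sdd -> sdd
    [ "$(cat "$sysroot/$dev/removable" 2>/dev/null)" = "1" ]
}

if is_removable /sys/block "$USB"; then
    echo "OK: $USB looks like removable media"
else
    echo "WARNING: $USB does not look removable -- double-check before formatting" >&2
fi
```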
Format the USB device
On Linux using the CSI application:
linux# csi pit format $USB "${CSM_PATH}"/cray-pre-install-toolkit-*.iso 50000
On MacOS using the bash script:
macos# ./cray-site-init/write-livecd.sh $USB "${CSM_PATH}"/cray-pre-install-toolkit-*.iso 50000
NOTE: At this point the USB stick is usable in any server with an x86_64 architecture based CPU. The remaining steps help add the installation data and enable SSH on boot.
Mount the configuration and persistent data partition:
linux# mkdir -pv /mnt/{cow,pitdata}
linux# mount -vL cow /mnt/cow && mount -vL PITDATA /mnt/pitdata
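A quick way to confirm both partitions actually mounted before copying data (a sketch; the helper name and messages are our own, and the check reads /proc/mounts directly):

```shell
# Sketch: report whether the LiveCD partitions are mounted.
check_mounted() {
    if awk -v m="$1" '$2 == m {found=1} END {exit !found}' /proc/mounts; then
        echo "$1: mounted"
    else
        echo "$1: NOT mounted" >&2
    fi
}

check_mounted /mnt/cow
check_mounted /mnt/pitdata
```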
Copy the compressed tarball onto the USB stick and extract it:
linux# cp -rv "${CSM_PATH}.tar.gz" /mnt/pitdata/
linux# tar -zxvf "${CSM_PATH}.tar.gz" -C /mnt/pitdata/
The USB stick is now bootable and contains our artifacts. This may be useful for internal or quick usage. Administrators seeking a Shasta installation must continue onto the configuration payload.
The SHASTA-CFG structure and other configuration files will be prepared, and then csi will generate the system-unique configuration payload used for the rest of the CSM installation on the USB stick.
Check for workarounds in the /opt/cray/csm/workarounds/before-configuration-payload directory. If there are any workarounds in that directory, run them now. Each has its own instructions in its respective README.md file.
# Example
linux# ls /opt/cray/csm/workarounds/before-configuration-payload
If there is a workaround here, the output looks similar to the following:
CASMINST-999
Some files are needed for generating the configuration payload. New systems will need to create these files before continuing. Systems upgrading from Shasta v1.3 should prepare by gathering data from the existing system.
Note: The USB stick is usable at this time, but SSH and core services are not yet enabled. This means the stick could be used to boot the system now, and a user can return to this step at another time.
Pull these files into the current working directory:
application_node_config.yaml (optional - see below)
cabinets.yaml (optional - see below)
hmn_connections.json
ncn_metadata.csv
switch_metadata.csv
system_config.yaml (see below)
The optional application_node_config.yaml file may be provided to further define settings relating to how application nodes will appear in HSM for roles and subroles. See the CSI usage for more information.
The optional cabinets.yaml file allows cabinet naming and numbering, as well as some networking overrides (e.g. VLAN), which allow systems on Shasta v1.3 to minimize changes to the existing system while migrating to Shasta v1.4. More information on this file can be found here.
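Before generating configs, a simple presence check over the seed files can catch a missing input early (a sketch; it treats only the three always-required files as mandatory, since the others are optional or only present when migrating):

```shell
# Sketch: verify required csi seed files exist in the current directory.
missing=0
for f in hmn_connections.json ncn_metadata.csv switch_metadata.csv; do
    [ -f "$f" ] || { echo "missing required file: $f" >&2; missing=1; }
done
if [ "$missing" -eq 0 ]; then
    echo "all required seed files present"
else
    echo "gather the files listed above before continuing" >&2
fi
```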
After gathering the files into the working directory, generate your configs:
Change into the preparation directory:
linux# mkdir -pv /mnt/pitdata/prep
linux# cd /mnt/pitdata/prep
Generate the system configuration reusing a parameter file (see avoiding parameters) or skip this step.
If moving from a Shasta v1.3 system, the system_config.yaml file will not be available, so skip this step and continue with step 3.
The needed files should be in the current directory.
linux# ls -1
Expected output looks similar to the following:
application_node_config.yaml
cabinets.yaml
hmn_connections.json
ncn_metadata.csv
switch_metadata.csv
system_config.yaml
Generate the system configuration.
linux# csi config init
A new directory matching your --system-name argument will now exist in your working directory.
Set an environment variable so this system name can be used in later commands.
linux# export SYSTEM_NAME=eniac
Skip step 3 and continue with the CSI Workarounds
Generate the system configuration when a pre-existing parameter file is unavailable:
If moving from a Shasta v1.3 system, this step is required. If you did step 2 above, skip this step.
The needed files should be in the current directory. The application_node_config.yaml file is optional.
linux# ls -1
Expected output looks similar to the following:
application_node_config.yaml
hmn_connections.json
ncn_metadata.csv
shasta_system_configs
switch_metadata.csv
Set an environment variable so this system name can be used in later commands.
linux# export SYSTEM_NAME=eniac
Generate the system configuration. See below for an explanation of the command line parameters and some common settings.
linux# csi config init \
--bootstrap-ncn-bmc-user root \
--bootstrap-ncn-bmc-pass changeme \
--system-name ${SYSTEM_NAME} \
--mountain-cabinets 4 \
--starting-mountain-cabinet 1000 \
--hill-cabinets 0 \
--river-cabinets 1 \
--can-cidr 10.103.11.0/24 \
--can-external-dns 10.103.11.113 \
--can-gateway 10.103.11.1 \
--can-static-pool 10.103.11.112/28 \
--can-dynamic-pool 10.103.11.128/25 \
--nmn-cidr 10.252.0.0/17 \
--hmn-cidr 10.254.0.0/17 \
--ntp-pool time.nist.gov \
--site-domain dev.cray.com \
--site-ip 172.30.53.79/20 \
--site-gw 172.30.48.1 \
--site-nic p1p2 \
--site-dns 172.30.84.40 \
--install-ncn-bond-members p1p1,p10p1 \
--application-node-config-yaml application_node_config.yaml \
--cabinets-yaml cabinets.yaml \
--hmn-mtn-cidr 10.104.0.0/17 \
--nmn-mtn-cidr 10.100.0.0/17
A new directory matching your --system-name argument will now exist in your working directory.
After generating a configuration, particularly when upgrading from Shasta v1.3, perform a visual audit of the generated files for network data. Specifically, review the /networks/HMN_MTN.yaml and /networks/NMN_MTN.yaml files to ensure that cabinet names, subnets, and VLANs have been preserved for an upgrade to Shasta v1.4. If these parameters do not match, a re-installation or reprogramming of CDU switches and CMM VLANs will likely be required.
Run the command “csi config init --help” to get more information about the parameters mentioned in the example command above and others which are available.
Notes about parameters to “csi config init”:
These warnings from “csi config init” for issues in hmn_connections.json can be ignored.
"Couldn't find switch port for NCN: x3000c0s1b0"
{"level":"warn","ts":1610405168.8705149,"msg":"Found unknown source prefix! If this is expected to be an Application node, please update application_node_config.yaml","row":
{"Source":"gateway01","SourceRack":"x3000","SourceLocation":"u33","DestinationRack":"x3002","DestinationLocation":"u48","DestinationPort":"j29"}}
{"level":"warn","ts":1612552159.2962296,"msg":"Cooling door found, but xname does not yet exist for cooling doors!","row":
{"Source":"x3000door-Motiv","SourceRack":"x3000","SourceLocation":" ","DestinationRack":"x3000","DestinationLocation":"u36","DestinationPort":"j27"}}
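If the `csi config init` output is captured to a file, a filter like the following can strip these known-ignorable warnings so that real issues stand out (a sketch; the function name is our own, and the demo input reuses the messages shown above plus a made-up error line):

```shell
# Sketch: strip the known-ignorable warnings from saved `csi config init` output.
filter_known_warnings() {
    grep -v -e "Couldn't find switch port for NCN" \
            -e "Found unknown source prefix" \
            -e "Cooling door found"
}

# Demo on stand-in log lines (pipe a real saved log in the same way):
printf '%s\n' \
    "Couldn't find switch port for NCN: x3000c0s1b0" \
    "ERROR: something that needs attention" \
    | filter_known_warnings
# prints: ERROR: something that needs attention
```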
Continue with the CSI Workarounds
Check for workarounds in the /opt/cray/csm/workarounds/csi-config directory. If there are any workarounds in that directory, run them now. Each has its own instructions in its respective README.md file.
# Example
linux# ls /opt/cray/csm/workarounds/csi-config
If there is a workaround here, the output looks similar to the following:
CASMINST-999
Now execute the procedures in 067-SHASTA-CFG.md to prepare the site-init directory for your system. Do not skip this step.
Now that the configuration is generated, we can populate the LiveCD with the generated files.
This will enable SSH and other services when the LiveCD starts.
Set the system name and enter the prep directory:
linux# export SYSTEM_NAME=eniac
linux# cd /mnt/pitdata/prep
Use CSI to populate the LiveCD, providing both the mount point and the CSI-generated configuration directory.
linux# csi pit populate cow /mnt/cow/ ${SYSTEM_NAME}/
Expected output looks similar to the following:
config------------------------> /mnt/cow/rw/etc/sysconfig/network/config...OK
ifcfg-bond0-------------------> /mnt/cow/rw/etc/sysconfig/network/ifcfg-bond0...OK
ifcfg-lan0--------------------> /mnt/cow/rw/etc/sysconfig/network/ifcfg-lan0...OK
ifcfg-vlan002-----------------> /mnt/cow/rw/etc/sysconfig/network/ifcfg-vlan002...OK
ifcfg-vlan004-----------------> /mnt/cow/rw/etc/sysconfig/network/ifcfg-vlan004...OK
ifcfg-vlan007-----------------> /mnt/cow/rw/etc/sysconfig/network/ifcfg-vlan007...OK
ifroute-lan0------------------> /mnt/cow/rw/etc/sysconfig/network/ifroute-lan0...OK
ifroute-vlan002---------------> /mnt/cow/rw/etc/sysconfig/network/ifroute-vlan002...OK
CAN.conf----------------------> /mnt/cow/rw/etc/dnsmasq.d/CAN.conf...OK
HMN.conf----------------------> /mnt/cow/rw/etc/dnsmasq.d/HMN.conf...OK
NMN.conf----------------------> /mnt/cow/rw/etc/dnsmasq.d/NMN.conf...OK
mtl.conf----------------------> /mnt/cow/rw/etc/dnsmasq.d/mtl.conf...OK
statics.conf------------------> /mnt/cow/rw/etc/dnsmasq.d/statics.conf...OK
conman.conf-------------------> /mnt/cow/rw/etc/conman.conf...OK
Optionally set the hostname by writing it into the hostname file.
To avoid confusing other administrators, do not name the LiveCD “ncn-m001”; please append the “-pit” suffix, which indicates that the node is booted from the LiveCD.
linux# echo "${SYSTEM_NAME}-ncn-m001-pit" >/mnt/cow/rw/etc/hostname
Unmount the overlay; we are done with it:
linux# umount /mnt/cow
Make the directories needed for basecamp (cloud-init) and the SquashFS images:
linux# mkdir -pv /mnt/pitdata/configs/
linux# mkdir -pv /mnt/pitdata/data/{k8s,ceph}/
Copy the basecamp data:
linux# csi pit populate pitdata ${SYSTEM_NAME} /mnt/pitdata/configs -b
Expected output looks similar to the following:
data.json---------------------> /mnt/pitdata/configs/data.json...OK
Update the CA certificate on the copied data.json file. Provide the path to the data.json file, the path to our customizations.yaml, and finally the sealed_secrets.key:
linux# csi patch ca \
--cloud-init-seed-file /mnt/pitdata/configs/data.json \
--customizations-file /mnt/pitdata/prep/site-init/customizations.yaml \
--sealed-secret-key-file /mnt/pitdata/prep/site-init/certs/sealed_secrets.key
Copy k8s artifacts:
Make sure the CSM_PATH variable is set to the directory of your expanded CSM tarball.
linux# csi pit populate pitdata "${CSM_PATH}/images/kubernetes/" /mnt/pitdata/data/k8s/ -kiK
Expected output looks similar to the following:
5.3.18-24.37-default-0.0.6.kernel-----------------> /mnt/pitdata/data/k8s/...OK
initrd.img-0.0.6.xz-------------------------------> /mnt/pitdata/data/k8s/...OK
kubernetes-0.0.6.squashfs-------------------------> /mnt/pitdata/data/k8s/...OK
Copy ceph/storage artifacts:
Make sure the CSM_PATH variable is set to the directory of your expanded CSM tarball.
linux# csi pit populate pitdata "${CSM_PATH}/images/storage-ceph/" /mnt/pitdata/data/ceph/ -kiC
Expected output looks similar to the following:
5.3.18-24.37-default-0.0.5.kernel-----------------> /mnt/pitdata/data/ceph/...OK
initrd.img-0.0.5.xz-------------------------------> /mnt/pitdata/data/ceph/...OK
storage-ceph-0.0.5.squashfs-----------------------> /mnt/pitdata/data/ceph/...OK
Unmount the data partition:
linux# cd; umount /mnt/pitdata
Quit the typescript session with the exit command, and copy the file (csm-usb-livecd.<date>.txt) to a safe location for later reference.
Now the USB stick may be reattached to the CRAY, or if it was made on the CRAY then its server can now reboot into the LiveCD.
Some systems will boot the USB stick automatically if no other OS exists (bare-metal). Otherwise the administrator may need to use the BIOS Boot Selection menu to choose the USB stick.
If an administrator is rebooting a node into the LiveCD, as opposed to booting a bare-metal or wiped node, then efibootmgr will deterministically set the boot order. See the set boot order page for more information on this topic.
UEFI booting must be enabled to find the USB stick's EFI bootloader.
Start a typescript on an external system, such as a laptop or Linux system, to record this section of activities done on the console of ncn-m001 via IPMI.
external# script -a boot.livecd.$(date +%Y-%m-%d).txt
external# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
Confirm that the IPMI credentials work for the BMC by checking the power status.
external# export SYSTEM_NAME=eniac
external# export username=root
external# export password=changeme
external# ipmitool -I lanplus -U $username -P $password -H ${SYSTEM_NAME}-ncn-m001-mgmt chassis power status
Connect to the IPMI console.
external# ipmitool -I lanplus -U $username -P $password -H ${SYSTEM_NAME}-ncn-m001-mgmt sol activate
ncn-m001#
Reboot
ncn-m001# reboot
Watch the shutdown and boot from the ipmitool session to the console terminal.
By default, an integrity check runs before Linux starts; it can be skipped by selecting “OK” at its prompt.
On first login (over SSH or at local console) the LiveCD will prompt the administrator to change the password.
The initial password is empty; enter the username root and press Enter twice:
pit login: root
Expected output looks similar to the following:
Password: <-------just press Enter here for a blank password
You are required to change your password immediately (administrator enforced)
Changing password for root.
Current password: <------- press Enter here, again, for a blank password
New password: <------- type new password
Retype new password:<------- retype new password
Welcome to the CRAY Pre-Install Toolkit (LiveOS)
Offline CSM documentation can be found at /usr/share/doc/metal (version: rpm -q docs-csm)
NOTE
If this password is forgotten, it can be reset by mounting the USB stick on another computer. See LiveCD Troubleshooting for information on clearing the password.
Disconnect from IPMI console.
Once the network is up so that SSH to the node works, disconnect from the IPMI console.
You can disconnect from the IPMI console by using the “~.”, that is, the tilde character followed by a period character.
Log in to the node as root via ssh.
external# ssh root@${SYSTEM_NAME}-ncn-m001
pit#
Note: The hostname should be similar to eniac-ncn-m001-pit when booted from the LiveCD, but it will be shown as “pit#” in the command prompts from this point onward.
Start a typescript to record this section of activities done on ncn-m001 while booted from the LiveCD.
pit# script -af booted-csm-livecd.$(date +%Y-%m-%d).txt
pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
Download and install/upgrade the workaround and documentation RPMs. If this machine does not have direct internet access, these RPMs will need to be downloaded externally and then copied to this machine to be installed.
pit# rpm -Uvh --force https://storage.googleapis.com/csm-release-public/shasta-1.4/docs-csm/docs-csm-latest.noarch.rpm
pit# rpm -Uvh --force https://storage.googleapis.com/csm-release-public/shasta-1.4/csm-install-workarounds/csm-install-workarounds-latest.noarch.rpm
Check the pit-release version.
pit# cat /etc/pit-release
Expected output looks similar to the following:
VERSION=1.2.2
TIMESTAMP=20210121044136
HASH=75e6c4a
Mount the data partition
The data partition is set to fsopt=noauto to facilitate LiveCD use over a virtual-ISO mount. USB installations need to mount this partition manually.
pit# mount -L PITDATA
First login workarounds
Check for workarounds in the /opt/cray/csm/workarounds/first-livecd-login directory. If there are any workarounds in that directory, run them now. Each has its own instructions in its respective README.md file.
# Example
pit# ls /opt/cray/csm/workarounds/first-livecd-login
If there is a workaround here, the output looks similar to the following:
CASMINST-999
Start services
pit# systemctl start nexus
pit# systemctl start basecamp
pit# systemctl start conman
Verify the system:
pit# csi pit validate --network
pit# csi pit validate --services
- If dnsmasq is dead, restart it with systemctl restart dnsmasq.
In addition, the final output from validating the services should have information about the nexus and basecamp containers/images similar to this example:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff7c22c6c6cb dtr.dev.cray.com/sonatype/nexus3:3.25.0 sh -c ${SONATYPE_... 3 minutes ago Up 3 minutes ago nexus
c7638b573b93 dtr.dev.cray.com/cray/metal-basecamp:1.1.0-1de4aa6 5 minutes ago Up 5 minutes ago basecamp
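To check for the two containers programmatically rather than by eye, something like this works (a sketch; the helper name and messages are our own, and the sample text stands in for live `podman ps` output):

```shell
# Sketch: confirm both expected containers appear in `podman ps`-style output.
containers_ok() {
    printf '%s\n' "$1" | grep -q 'nexus' && printf '%s\n' "$1" | grep -q 'basecamp'
}

# Stand-in for: ps_out="$(podman ps)"
ps_out='ff7c22c6c6cb  dtr.dev.cray.com/sonatype/nexus3:3.25.0             nexus
c7638b573b93  dtr.dev.cray.com/cray/metal-basecamp:1.1.0-1de4aa6  basecamp'

if containers_ok "$ps_out"; then
    echo "nexus and basecamp are running"
else
    echo "a required container is missing" >&2
fi
```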
Follow the output’s directions for failed validations before moving on.
If this is a Shasta v1.3.x migration scenario, then the Dell and Mellanox switches can be reconfigured now with their new names, new IP addresses, and new configuration for v1.4.
See Management Network Dell And Mellanox Upgrades.
After successfully validating the LiveCD USB environment, the administrator may start the CSM Metal Install.