Bootstrap PIT Node from LiveCD Remote ISO

The Pre-Install Toolkit (PIT) node needs to be bootstrapped from the LiveCD. There are two media available to bootstrap the PIT node: the RemoteISO or a bootable USB device. This procedure describes using the RemoteISO. If not using the RemoteISO, see Bootstrap PIT Node from LiveCD USB.

The installation process is similar to the USB-based installation, with adjustments to account for the lack of removable storage.

Important: Before starting this procedure, be sure to complete the procedure to Prepare Configuration Payload for the relevant installation scenario.

Topics

  1. Known compatibility issues
  2. Attaching and booting the LiveCD with the BMC
  3. First login
  4. Configure the running LiveCD
    1. Generate installation files
    2. Verify and backup system_config.yaml
    3. Prepare Site Init
  5. Bring up the PIT services and validate PIT health
  6. Next topic

1. Known compatibility issues

The LiveCD Remote ISO has known compatibility issues for nodes from certain vendors.

2. Attaching and booting the LiveCD with the BMC

Warning: If this is a reinstallation on a system that still has a USB device from a prior installation, then that USB device must be wiped before continuing. Failing to wipe the USB device, if present, may cause confusion later in the installation. If the system is still booted from that USB device, it can wipe itself using the basic wipe from Wipe NCN Disks for Reinstallation. If it is not currently booted, either boot it and wipe it, or disable the USB ports in the BIOS (not available for all vendors).

Obtain and attach the LiveCD cray-pre-install-toolkit ISO file to the BMC. Depending on the vendor of the node, the instructions for attaching to the BMC will differ.

  1. Download the CSM software release and extract the LiveCD remote ISO image.

    Important: Ensure that you have the CSM release plus any patches or hotfixes by following the instructions in Update CSM Product Stream

    The cray-pre-install-toolkit ISO is included in the CSM release tar file along with the other release artifacts. The ISO will have a name similar to cray-pre-install-toolkit-sle15sp3.x86_64-1.5.8-20211203183315-geddda8a.iso

    The ISO file can be extracted from the CSM release tar file using the following command:

    linux# tar --wildcards --no-anchored -xzvf <csm-release>.tar.gz 'cray-pre-install-toolkit-*.iso'
    
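    Optionally, confirm that the extracted ISO is present and record its checksum for later reference (a quick check; the wildcard matches the example filename above):

    linux# ls -lh cray-pre-install-toolkit-*.iso
    linux# sha256sum cray-pre-install-toolkit-*.iso
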
  2. Prepare a server on the network to host the cray-pre-install-toolkit ISO file.

    Place the cray-pre-install-toolkit ISO file on a server which the BMC of the PIT node will be able to contact using HTTP or HTTPS.

    Note: Use a short URL for the ISO on the webserver; a short URL is easier to enter correctly when attaching the ISO through the BMC.
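
    For example, if a Linux host with Python 3 is reachable from the BMC, a throwaway HTTP server can serve the ISO (a sketch only; any webserver that the BMC can reach over HTTP or HTTPS will work, and the directory path below is a placeholder):

    linux# cd /srv/iso   # placeholder directory containing the cray-pre-install-toolkit ISO
    linux# python3 -m http.server 8080

    The ISO would then be reachable at http://<server-ip>:8080/<iso-filename>.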

  3. Attach the ISO to the BMC using the respective procedure for the node's vendor.

  4. The chosen procedure should have rebooted the server. Observe the server boot into the LiveCD.

3. First login

On first login (over SSH or at the local console), the LiveCD will prompt the administrator to change the password.

  1. The initial password is empty. Log in with the username root, press return at the blank Password: prompt, press return again at the blank Current password: prompt, and then set a new password when prompted.

    pit login: root
    

    Expected output looks similar to the following:

    Password:           <-------just press Enter here for a blank password
    You are required to change your password immediately (administrator enforced)
    Changing password for root.
    Current password:   <------- press Enter here, again, for a blank password
    New password:       <------- type new password
    Retype new password:<------- retype new password
    Welcome to the CRAY Pre-Install Toolkit (LiveOS)
    

4. Configure the running LiveCD

  1. Start a typescript to record this section of activities done on ncn-m001 while booted from the LiveCD.

    pit# script -af ~/csm-install-remoteiso.$(date +%Y-%m-%d).txt
    pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
    
  2. Print information about the booted PIT image.

    There is nothing in the output that needs to be verified. This is run in order to ensure the information is recorded in the typescript file, in case it is needed later. For example, this information is useful to include in any bug reports or service queries for issues encountered on the PIT node.

    NOTE The App. Version will report incorrectly in CSM 1.2.0 and CSM 1.2.1. Please obtain the version information by running the step below and by invoking rpm -q cray-site-init.

    pit# /root/bin/metalid.sh
    

    Expected output looks similar to the following:

    = PIT Identification = COPY/CUT START =======================================
    VERSION=1.5.7
    TIMESTAMP=20211028194247
    HASH=ge4aceb1
    CRAY-Site-Init build signature...
    Build Commit   : a6c8dddf9df1a9fc7f8c4f17cb26568a8b41d433-main
    Build Time     : 2021-12-01T16:16:41Z
    Go Version     : go1.16.10
    Git Version    : a6c8dddf9df1a9fc7f8c4f17cb26568a8b41d433
    Platform       : linux/amd64
    App. Version   : 1.12.2
    metal-net-scripts-0.0.2-1.noarch
    metal-basecamp-1.1.9-1.x86_64
    metal-ipxe-2.0.10-1.noarch
    pit-init-1.2.12-1.noarch
    = PIT Identification = COPY/CUT END =========================================
    
  3. Find a local disk for storing product installers.

    pit# disk="$(lsblk -l -o SIZE,NAME,TYPE,TRAN | grep -E '(sata|nvme|sas)' | sort -h | awk '{print $2}' | head -n 1 | tr -d '\n')"
    pit# echo $disk
    pit# parted --wipesignatures -m --align=opt --ignore-busy -s /dev/$disk -- mklabel gpt mkpart primary ext4 2048s 100%
    pit# mkfs.ext4 -L PITDATA "/dev/${disk}1"
    pit# mount -vL PITDATA
    

    The parted command may give an error similar to the following:

    Error: Partition(s) 4 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably
    because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making
    further changes.
    

    In that case, the following steps may resolve the problem without needing to reboot. These commands remove volume groups and raid arrays that may be using the disk. These commands only need to be run if the earlier parted command failed.

    pit# RAIDS=$(grep "${disk}[0-9]" /proc/mdstat | awk '{ print "/dev/"$1 }') ; echo ${RAIDS}
    pit# VGS=$(echo ${RAIDS} | xargs -r pvs --noheadings -o vg_name 2>/dev/null) ; echo ${VGS}
    pit# echo ${VGS} | xargs -r -t -n 1 vgremove -f -v
    pit# echo ${RAIDS} | xargs -r -t -n 1 mdadm -S -f -v
    

    After running the above procedure, retry the parted command which failed. If it succeeds, resume the install from that point.
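
    Once the mount succeeds, a quick check confirms that the new PITDATA filesystem is mounted (a sketch; the mount point and sizes vary by system):

    pit# lsblk -o NAME,SIZE,FSTYPE,LABEL,MOUNTPOINT "/dev/${disk}1"
    pit# findmnt LABEL=PITDATA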

  4. Set up the site-link, enabling SSH to work. You can reconnect with SSH after this step.

    Note: If your site’s network authority or network administrator has already provisioned a DHCP IPv4 address for your master node’s external NIC(s), then skip this step.

    1. Set networking variables.

      If you have previously created the system_config.yaml file for this system, the values for these variables are in it. The following table lists the variables being set, their corresponding system_config.yaml fields, and a description of what they are.

      Variable    system_config.yaml field   Description
      --------    ------------------------   -----------
      site_ip     site-ip                    The IPv4 address and CIDR netmask for the node's external interface(s)
      site_gw     site-gw                    The IPv4 gateway address for the node's external interface(s)
      site_dns    site-dns                   The IPv4 domain name server address for the site
      site_nics   site-nic                   The actual NIC name(s) for the external site interface(s)

      If the system_config.yaml file has not yet been generated for this system, the values for site_ip, site_gw, and site_dns should be provided by the site’s network administrator or network authority. The site_nics interface(s) are typically the first onboard adapter or the first copper 1 GbE PCIe adapter on the PIT node. If multiple interfaces are specified, they must be separated by spaces (for example, site_nics='p2p1 p2p2 p2p3').

      pit# site_ip=172.30.XXX.YYY/20
      pit# site_gw=172.30.48.1
      pit# site_dns=172.30.84.40
      pit# site_nics=em1
      
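      If the correct NIC name(s) for site_nics are not known, the interfaces present on the node can be listed first (a sketch; interface names vary by vendor and slot):

      pit# ip -o link show | awk -F': ' '{print $2}'
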
    2. Run the csi-setup-lan0.sh script to set up the site link.

      Note: All of the /root/bin/csi-* scripts are harmless to run without parameters; doing so will print usage statements.

      pit# /root/bin/csi-setup-lan0.sh $site_ip $site_gw $site_dns $site_nics
      
    3. Verify that lan0 has an IP address and attempt to auto-set the hostname based on DNS.

      The script appends -pit to the end of the hostname as a means to reduce the chances of confusing the PIT node with an actual, deployed NCN.

      pit# ip a show lan0
      pit# /root/bin/csi-set-hostname.sh # this will attempt to set the hostname based on the site's own DNS records.
      
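      Optionally, confirm basic connectivity to the site gateway and DNS server configured above (a sketch using the variables set earlier):

      pit# ping -c 3 ${site_gw}
      pit# ping -c 3 ${site_dns}
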
    4. Add helper variables to PIT node environment.

      Important: All CSM install procedures on the PIT node assume that these variables are set and exported.

      1. Set helper variables.

        pit# CSM_RELEASE=csm-x.y.z
        pit# SYSTEM_NAME=eniac
        pit# PITDATA=$(lsblk -o MOUNTPOINT -nr /dev/disk/by-label/PITDATA)
        
      2. Add variables to the PIT environment.

        By adding these to the /etc/environment file of the PIT node, these variables will be automatically set and exported in shell sessions on the PIT node.

        The echo prepends a newline to ensure that the variable assignment occurs on a unique line, and not at the end of another line.

        pit# echo "
        CSM_RELEASE=${CSM_RELEASE}
        PITDATA=${PITDATA}
        CSM_PATH=${PITDATA}/${CSM_RELEASE}
        SYSTEM_NAME=${SYSTEM_NAME}" | tee -a /etc/environment
        
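        To confirm that the variables were appended (optional check):

        pit# grep -E '^(CSM_RELEASE|PITDATA|CSM_PATH|SYSTEM_NAME)=' /etc/environment
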
    5. Exit the typescript, exit the console session, and log in again using SSH.

      pit# exit # exit the typescript started earlier
      pit# exit # log out of the pit node
      # Close the console session by entering &. or ~.
      # Then ssh back into the PIT node
      external# ssh root@${SYSTEM_NAME}-ncn-m001
      
    6. After reconnecting, resume the typescript (the -a option appends to the existing typescript file).

      pit# script -af $(ls -tr ~/csm-install-remoteiso*.txt | head -n 1)
      pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
      
    7. Verify that expected environment variables are set in the new login shell.

      pit# echo -e "CSM_PATH=${CSM_PATH}\nCSM_RELEASE=${CSM_RELEASE}\nPITDATA=${PITDATA}\nSYSTEM_NAME=${SYSTEM_NAME}"
      
    8. Check hostname.

      pit# hostnamectl
      

      Note:

      • The hostname should be similar to eniac-ncn-m001-pit when booted from the LiveCD, but it will be shown as pit# in the documentation command prompts from this point onward.
      • If the hostname returned by the hostnamectl command is pit, then re-run the csi-set-hostname.sh script with the same parameters. Otherwise, an administrator should set the hostname manually with hostnamectl. In the latter case, do not confuse other administrators by using the hostname ncn-m001. Append the -pit suffix, indicating that the node is booted from the LiveCD.
  5. Create necessary directories.

    pit# mkdir -pv ${PITDATA}/{admin,configs} ${PITDATA}/prep/{admin,logs} ${PITDATA}/data/{k8s,ceph}
    
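    Optionally, confirm that the directory tree was created (a quick check):

    pit# find ${PITDATA} -maxdepth 2 -type d
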
  6. Relocate the typescript to the newly mounted PITDATA directory.

    1. Quit the typescript session with the exit command.

    2. Copy the typescript file to its new location.

      pit# cp -v ~/csm-install-remoteiso.*.txt ${PITDATA}/prep/admin
      
    3. Restart the typescript, appending to the previous file.

      pit# script -af $(ls -tr ${PITDATA}/prep/admin/csm-install-remoteiso*.txt | head -n 1)
      pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
      
  7. Download the CSM software release to the PIT node.

    1. Set variable to URL of CSM tarball.

      pit# URL=https://arti.dev.cray.com/artifactory/shasta-distribution-stable-local/csm/${CSM_RELEASE}.tar.gz
      
    2. Fetch the release tarball.

      pit# wget ${URL} -O ${CSM_PATH}.tar.gz
      
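      Optionally, confirm that the download completed and the file size looks reasonable (a quick check):

      pit# ls -lh ${CSM_PATH}.tar.gz
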
    3. Expand the tarball on the PIT node.

      Note: Expansion of the tarball may take more than 45 minutes.

      pit# tar -C ${PITDATA} -zxvf ${CSM_PATH}.tar.gz && ls -l ${CSM_PATH}
      
    4. Copy the artifacts into place.

      pit# rsync -a -P --delete ${CSM_PATH}/images/kubernetes/   ${PITDATA}/data/k8s/ &&
           rsync -a -P --delete ${CSM_PATH}/images/storage-ceph/ ${PITDATA}/data/ceph/
      
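      Optionally, spot-check that the Kubernetes and Ceph image artifacts were copied (a sketch; exact file names vary by release):

      pit# ls ${PITDATA}/data/k8s/ | head
      pit# ls ${PITDATA}/data/ceph/ | head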

    Note: The PIT ISO, Helm charts/images, and bootstrap RPMs are now available in the extracted CSM tar file.

  8. Install the latest version of the CSI tool.

    pit# rpm -Uvh --force $(find ${CSM_PATH}/rpm/ -name "cray-site-init-*.x86_64.rpm" | sort -V | tail -1)
    
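    Confirm the installed version of the cray-site-init RPM, as recommended in the version notes above:

    pit# rpm -q cray-site-init
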
  9. Install the latest documentation RPM.

    See Check for Latest Documentation

  10. Show the version of CSI installed.

    NOTE The App. Version will report incorrectly in CSM 1.2.0 and CSM 1.2.1. Please obtain the version information by running the step below and by invoking rpm -q cray-site-init.

    pit# /root/bin/metalid.sh
    

    Expected output looks similar to the following:

    = PIT Identification = COPY/CUT START =======================================
    VERSION=1.5.7
    TIMESTAMP=20211028194247
    HASH=ge4aceb1
    CRAY-Site-Init build signature...
    Build Commit   : a6c8dddf9df1a9fc7f8c4f17cb26568a8b41d433-main
    Build Time     : 2021-12-01T16:16:41Z
    Go Version     : go1.16.10
    Git Version    : a6c8dddf9df1a9fc7f8c4f17cb26568a8b41d433
    Platform       : linux/amd64
    App. Version   : 1.12.2
    metal-net-scripts-0.0.2-1.noarch
    metal-basecamp-1.1.9-1.x86_64
    metal-ipxe-2.0.10-1.noarch
    pit-init-1.2.12-1.noarch
    = PIT Identification = COPY/CUT END =========================================
    

4.1 Generate installation files

Some files are needed for generating the configuration payload. See the Command Line Configuration Payload and Configuration Payload Files topics if the information for this system has not already been prepared.

  1. Create the hmn_connections.json file by following the Create HMN Connections JSON procedure. Return to this section when completed.

  2. Create the configuration input files if needed and copy them into the preparation directory.

    The preparation directory is ${PITDATA}/prep.

    Copy these files into the preparation directory, or create them if this is an initial install of the system:

    • application_node_config.yaml (optional - see below)
    • cabinets.yaml (optional - see below)
    • hmn_connections.json
    • ncn_metadata.csv
    • switch_metadata.csv
    • system_config.yaml (only available after first-install generation of system files)

    The optional application_node_config.yaml file may be provided to define how application nodes will appear in HSM, including their roles and subroles. See Create Application Node YAML.

    The optional cabinets.yaml file allows cabinet naming and numbering as well as some VLAN overrides. See Create Cabinets YAML.

    The system_config.yaml file is generated by the csi tool during the first install of a system, and can later be used for reinstalls of the system. For the initial install, the information in it must be provided as command line arguments to csi config init.

    After gathering the files into this working directory, proceed to the next step.

  3. Proceed to the appropriate next section: Subsequent installs (reinstalls) if a system_config.yaml file from a previous install is available, or Initial installs (bare-metal) otherwise.

4.1.a Subsequent installs (reinstalls)

  1. For subsequent fresh-installs (re-installs) where the system_config.yaml parameter file is available, generate the updated system configuration (see Cray Site Init Files).

    Warning: If the system_config.yaml file is unavailable, then skip this step and proceed to Initial Installs (bare-metal).

    1. Check for the configuration files. The needed files should be in the preparation directory.

      pit# ls -1 ${PITDATA}/prep
      

      Expected output looks similar to the following:

      application_node_config.yaml
      cabinets.yaml
      hmn_connections.json
      ncn_metadata.csv
      switch_metadata.csv
      system_config.yaml
      
    2. Generate the system configuration.

      Note: Ensure that you specify a reachable NTP pool or server using the ntp-pools or ntp-servers fields, respectively. Adding an unreachable server can cause clock skew as chrony tries to continually reach out to a server it can never reach.

      pit# cd ${PITDATA}/prep && csi config init
      

      A new directory matching the system-name field in system_config.yaml will now exist in the working directory.

      Note: These warnings from csi config init for issues in hmn_connections.json can be ignored.

      • The node with the external connection (ncn-m001) will have a warning similar to this because its BMC is connected to the site and not the HMN like the other management NCNs. It can be ignored.

        "Couldn't find switch port for NCN: x3000c0s1b0"
        
      • An unexpected component may have this message. If this component is an application node with an unusual prefix, it should be added to the application_node_config.yaml file. Then rerun csi config init. See the procedure to Create Application Node Config YAML

        {"level":"warn","ts":1610405168.8705149,"msg":"Found unknown source prefix! If this is expected to be an Application node, please update application_node_config.yaml","row":
        {"Source":"gateway01","SourceRack":"x3000","SourceLocation":"u33","DestinationRack":"x3002","DestinationLocation":"u48","DestinationPort":"j29"}}
        
      • If a cooling door is found in hmn_connections.json, there may be a message like the following. It can be safely ignored.

        {"level":"warn","ts":1612552159.2962296,"msg":"Cooling door found, but xname does not yet exist for cooling doors!","row":
        {"Source":"x3000door-Motiv","SourceRack":"x3000","SourceLocation":" ","DestinationRack":"x3000","DestinationLocation":"u36","DestinationPort":"j27"}}
        
    3. Skip the Initial installs (bare-metal) section and continue to Verify and backup system_config.yaml.

4.1.b Initial installs (bare-metal)

  1. For first-time/initial installs (without a system_config.yaml file), generate the system configuration. See below for an explanation of the command line parameters and some common settings.

    1. Check for the configuration files. The needed files should be in the preparation directory.

      pit# ls -1 ${PITDATA}/prep
      

      Expected output looks similar to the following:

      application_node_config.yaml
      cabinets.yaml
      hmn_connections.json
      ncn_metadata.csv
      switch_metadata.csv
      
    2. Generate the system configuration.

      Notes:

      • Run csi config init --help to print a full list of parameters that must be set. These will vary significantly depending on the system and site configuration.
      • Ensure that you specify a reachable NTP pool or server using the --ntp-pools or --ntp-servers flags, respectively. Adding an unreachable server can cause clock skew as chrony tries to continually reach out to a server it can never reach.

      pit# cd ${PITDATA}/prep && csi config init <options>
      

      A new directory matching the --system-name argument will now exist in the working directory.

      Important: After generating a configuration, a visual audit of the generated files for network data should be performed.

      Special notes: Certain parameters to csi config init may be hard to grasp when generating a configuration for the first time:

      • The optional application_node_config.yaml file is used to map prefixes in hmn_connections.json to HSM subroles. A command line option is required in order for csi to use the file. See Create Application Node YAML.
      • The bootstrap-ncn-bmc-user and bootstrap-ncn-bmc-pass must match what is used for the BMC account and its password for the management NCNs.
      • Set site parameters (site-domain, site-ip, site-gw, site-nic, site-dns) for the network information which connects ncn-m001 (the PIT node) to the site. The site-nic is the interface on ncn-m001 that is connected to the site network.
      • There are other interfaces possible, but the install-ncn-bond-members are typically:
        • p1p1,p10p1 for HPE nodes
        • p1p1,p1p2 for Gigabyte nodes
        • p801p1,p801p2 for Intel nodes
      • If not using a cabinets.yaml file, then set the three cabinet parameters (mountain-cabinets, hill-cabinets, and river-cabinets) to the quantity of each cabinet type included in this system.
      • The starting cabinet number for each type of cabinet (for example, starting-mountain-cabinet) has a default that can be overridden. See the output of csi config init --help.
      • For systems that use non-sequential cabinet ID numbers, use the cabinets-yaml argument to include the cabinets.yaml file. This file gives the ability to explicitly specify the ID of every cabinet in the system. When specifying a cabinets.yaml file with the cabinets-yaml argument, other command line arguments related to cabinets will be ignored by csi. See Create Cabinets YAML.
      • An override to default cabinet IPv4 subnets can be made with the hmn-mtn-cidr and nmn-mtn-cidr parameters.

      Note: These warnings from csi config init for issues in hmn_connections.json can be ignored.

      • The node with the external connection (ncn-m001) will have a warning similar to this because its BMC is connected to the site and not the HMN like the other management NCNs. It can be ignored.

        "Couldn't find switch port for NCN: x3000c0s1b0"
        
      • An unexpected component may have this message. If this component is an application node with an unusual prefix, it should be added to the application_node_config.yaml file. Then rerun csi config init. See the procedure to Create Application Node Config YAML

        {"level":"warn","ts":1610405168.8705149,"msg":"Found unknown source prefix! If this is expected to be an Application node, please update application_node_config.yaml","row":
        {"Source":"gateway01","SourceRack":"x3000","SourceLocation":"u33","DestinationRack":"x3002","DestinationLocation":"u48","DestinationPort":"j29"}}
        
      • If a cooling door is found in hmn_connections.json, there may be a message like the following. It can be safely ignored.

        {"level":"warn","ts":1612552159.2962296,"msg":"Cooling door found, but xname does not yet exist for cooling doors!","row":
        {"Source":"x3000door-Motiv","SourceRack":"x3000","SourceLocation":" ","DestinationRack":"x3000","DestinationLocation":"u36","DestinationPort":"j27"}}
        
    3. Link the generated system_config.yaml file into the prep/ directory. This is needed for pit-init to find and resolve the file.

      NOTE This step is needed only for fresh installs where system_config.yaml is missing from the prep/ directory.

      pit# cd ${PITDATA}/prep && ln ${SYSTEM_NAME}/system_config.yaml
      
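      Optionally, confirm that the link now exists in the prep/ directory (a quick check):

      pit# ls -l ${PITDATA}/prep/system_config.yaml
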
    4. Continue to the next step to verify and backup system_config.yaml.

4.2 Verify and backup system_config.yaml

  1. Verify that the newly generated system_config.yaml matches the current version of CSI.

    1. View the new system_config.yaml file and note the CSI version reported near the end of the file.

      pit# cat ${PITDATA}/prep/${SYSTEM_NAME}/system_config.yaml
      
    2. Note the version reported by the csi tool.

      NOTE The App. Version will report incorrectly in CSM 1.2.0 and CSM 1.2.1. Please obtain the version information by running the step below and by invoking rpm -q cray-site-init.

      pit# csi version
      
    3. The two versions should match. If they do not, determine the cause and regenerate the file.

  2. Copy the new system_config.yaml file somewhere safe to facilitate re-installs.
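
    One way to do this is to copy the file off the system (a sketch only; the destination host and path are placeholders):

    pit# scp ${PITDATA}/prep/${SYSTEM_NAME}/system_config.yaml user@backup-host:/path/to/backups/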

  3. Continue to the next step to Prepare Site Init.

4.3 Prepare Site Init

Important: Although the command prompts in the Prepare Site Init procedure are linux#, those steps should be performed on the PIT node.

Prepare the site-init directory by performing the Prepare Site Init procedures.

5. Bring up the PIT services and validate PIT health

  1. Initialize the PIT.

    The pit-init.sh script will prepare the PIT server for deploying NCNs.

    Set the USERNAME and IPMI_PASSWORD variables to the credentials for the BMC of the PIT node.

    read -s is used in order to prevent the credentials from being displayed on the screen or recorded in the shell history.

    pit# USERNAME=root
    pit# read -s IPMI_PASSWORD
    pit# export USERNAME IPMI_PASSWORD ; /root/bin/pit-init.sh
    
  2. Install csm-testing and hpe-csm-goss-package.

    The following assumes the CSM_PATH environment variable is set to the absolute path of the unpacked CSM release.

    pit# rpm -Uvh --force $(find ${CSM_PATH}/rpm/ -name "csm-testing*.rpm" | sort -V | tail -1) $(find ${CSM_PATH}/rpm/ -name "hpe-csm-goss-package*.rpm" | sort -V | tail -1)
    

6. Next topic

After completing this procedure, proceed to configure the management network switches.

See Configure Management Network Switches