SAT Installation

Install and Upgrade Framework

The Install and Upgrade Framework (IUF) provides commands that install, upgrade, and deploy products on systems managed by CSM. IUF capabilities are described in detail in the IUF section of the Cray System Management Documentation. The initial install and upgrade workflows described in the HPE Cray EX System Software Stack Installation and Upgrade Guide for CSM (S-8052) detail when and how to use IUF with a new release of SAT or any other HPE Cray EX product.

This document does not replicate install, upgrade, or deployment procedures detailed in the Cray System Management Documentation. This document provides details regarding software and configuration content specific to SAT which is needed when installing, upgrading, or deploying a SAT release. The Cray System Management Documentation will indicate when sections of this document should be referred to for detailed information.

IUF will perform the following tasks for a release of SAT.

  • IUF deliver-product stage:
    • Uploads SAT configuration content to VCS
    • Uploads SAT information to the CSM product catalog
    • Uploads SAT content to Nexus repositories
  • IUF update-vcs-config stage:
    • Updates the VCS integration branch with new SAT configuration content if a working branch is specified
  • IUF update-cfs-config stage:
    • Creates a new CFS configuration for management nodes with new SAT configuration content
  • IUF prepare-images stage:
    • Creates updated management NCN images with new SAT content
  • IUF management-nodes-rollout stage:
    • Boots management NCNs with an image containing new SAT content

IUF uses a variety of CSM and SAT tools when performing these tasks. The IUF section of the Cray System Management Documentation describes how to use these tools directly if it is desirable to use them instead of IUF.
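
The stage list above can also be kept as a small lookup for use in site scripts. The following is a minimal sketch only: the function name sat_iuf_stage_tasks is hypothetical, it is not part of IUF or SAT, and the task text is condensed from the list above.

```shell
# Print the SAT-related tasks that IUF performs in the named stage.
# Hypothetical helper; the descriptions are condensed from the list above.
sat_iuf_stage_tasks() {
    case "$1" in
        deliver-product)
            echo "Uploads SAT configuration content to VCS"
            echo "Uploads SAT information to the CSM product catalog"
            echo "Uploads SAT content to Nexus repositories"
            ;;
        update-vcs-config)
            echo "Updates the VCS integration branch if a working branch is specified"
            ;;
        update-cfs-config)
            echo "Creates a new CFS configuration for management nodes"
            ;;
        prepare-images)
            echo "Creates updated images with new SAT content"
            ;;
        management-nodes-rollout)
            echo "Boots management NCNs with an image containing new SAT content"
            ;;
        *)
            # Any other stage has no SAT-specific entry in the list above.
            echo "No SAT-specific tasks listed for stage: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `sat_iuf_stage_tasks deliver-product` prints the three upload tasks, one per line.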

IUF Stage Details for SAT

This section describes SAT details that an administrator must be aware of before running IUF stages. Entries are prefixed with Information if no administrative action is required or Action if an administrator needs to perform tasks outside of IUF.

update-vcs-config

Information: This stage is only run if a VCS working branch is specified for SAT. By default, SAT does not create or specify a VCS working branch.

update-cfs-config

Information: This stage only applies to the management configuration and not to the managed configuration.

prepare-images

Information: This stage only applies to management images and not to managed images.

Post-Installation Procedures

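After installing SAT with IUF, complete the following SAT configuration procedures before using SAT:

  • Authenticate SAT Commands
  • Generate SAT S3 Credentials
  • (Optional) Configure Multi-tenancy
  • Set System Revision Information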

Notes on the Procedures

  • Ellipses (...) in shell output indicate omitted lines.
  • In the examples below, replace x.y.z with the version of the SAT product stream being installed.
  • ‘manager’ and ‘master’ are used interchangeably in the steps below.

Authenticate SAT Commands

To run SAT commands on the manager NCNs, first set up authentication to the API gateway. For more information on authentication types and authentication credentials, see SAT Command Authentication.

The admin account used to authenticate with sat auth must be enabled in Keycloak and must have its assigned role set to admin. For more information on Keycloak accounts and changing Role Mappings, refer to both Configure Keycloak Account and Create Internal User Accounts in the Keycloak Shasta Realm in the Cray System Management Documentation.

Prerequisites

Procedure

The following procedure globally configures the username used by SAT and authenticates to the API gateway.

  1. (ncn-m001#) Generate a default SAT configuration file if one does not exist.

    sat init
    

    Example output:

    Configuration file "/root/.config/sat/sat.toml" generated.
    

    Note: If the configuration file already exists, sat init does not regenerate it and prints the following error.

    ERROR: Configuration file "/root/.config/sat/sat.toml" already exists.
    Not generating configuration file.
    
  2. Edit ~/.config/sat/sat.toml and set the username option in the api_gateway section of the configuration file.

    username = "crayadmin"
    
  3. (ncn-m001#) Run sat auth. Enter the password when prompted.

    sat auth
    

    Example output:

    Password for crayadmin:
    Succeeded!
    
  4. (ncn-m001#) Run another sat command to confirm that requests to the API gateway are now authenticated.

    sat status
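
Steps 1 and 2 of this procedure can be double-checked from the shell. The following is a minimal sketch, not part of SAT: the function names ensure_sat_config and sat_config_username are hypothetical, and the parsing assumes the simple key = "value" layout shown in step 2 rather than full TOML syntax.

```shell
# Run `sat init` only when the configuration file is missing, which
# avoids the "already exists" error described in step 1.
ensure_sat_config() {
    local config_file="$HOME/.config/sat/sat.toml"
    if [ -f "$config_file" ]; then
        echo "exists: $config_file"
    else
        sat init
    fi
}

# Print the uncommented username set in the api_gateway section.
# Prints nothing if the option is absent or commented out.
sat_config_username() {
    local config_file="${1:-$HOME/.config/sat/sat.toml}"
    awk '
        # Track whether the current section header is [api_gateway].
        /^[ \t]*\[/ { in_section = ($0 ~ /^[ \t]*\[api_gateway\]/) }
        in_section && /^[ \t]*username[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop everything up to the value
            gsub(/"/, "")              # strip the surrounding quotes
            sub(/[ \t]+$/, "")         # strip trailing whitespace
            print
            exit
        }
    ' "$config_file"
}
```

With the setting from step 2 in place, `sat_config_username` would print crayadmin.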
    

Generate SAT S3 Credentials

To use the SAT S3 bucket, the system administrator must generate the S3 access key and secret key and write them to local files so that the SAT user can access S3 storage. This must be done on every Kubernetes control plane node where SAT commands are run.

SAT uses S3 storage for several purposes, most importantly to store the site-specific information set with sat setrev (see Set System Revision Information).

Prerequisites

Procedure

  1. (ncn-m001#) Create the credential files and ensure they are readable only by root.

    touch /root/.config/sat/s3_access_key \
        /root/.config/sat/s3_secret_key
    
    chmod 600 /root/.config/sat/s3_access_key \
        /root/.config/sat/s3_secret_key
    
  2. (ncn-m001#) Write the credentials to local files using kubectl.

    kubectl get secret sat-s3-credentials -o \
        jsonpath='{.data.access_key}' | base64 -d > \
        /root/.config/sat/s3_access_key
    
    kubectl get secret sat-s3-credentials -o \
        jsonpath='{.data.secret_key}' | base64 -d > \
        /root/.config/sat/s3_secret_key
    
  3. Verify the S3 endpoint specified in the SAT configuration file is correct.

    1. (ncn-m001#) Get the SAT configuration file’s endpoint value.

      Note: If the endpoint line in the command’s output is commented out (it begins with a # character), SAT uses the default value, "https://rgw-vip.nmn".

      grep endpoint ~/.config/sat/sat.toml
      

      Example output:

      # endpoint = "https://rgw-vip.nmn"
      
    2. (ncn-m001#) Get the sat-s3-credentials secret’s endpoint value.

      kubectl get secret sat-s3-credentials -o \
          jsonpath='{.data.s3_endpoint}' | base64 -d | xargs
      

      Example output:

      https://rgw-vip.nmn
      
    3. Compare the two endpoint values.

      If the values differ, change the SAT configuration file’s endpoint value to match the secret’s.

  4. (ncn-m001#) Copy SAT configurations to each manager node on the system.

    for i in ncn-m002 ncn-m003; do echo $i; ssh ${i} \
        mkdir -p /root/.config/sat; \
        scp -pr /root/.config/sat ${i}:/root/.config; done
    

    Note: The list of manager nodes depends on the system. This example assumes three manager nodes, so the configuration files are copied from ncn-m001 to ncn-m002 and ncn-m003.
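
The checks in steps 1 through 3 can be scripted. The following is a minimal sketch under stated assumptions: the function names are hypothetical and not part of SAT, the default endpoint is the one given in the note in step 3, and the configuration file is assumed to contain at most one uncommented endpoint line.

```shell
# Default endpoint SAT uses when the setting is commented out (see step 3).
SAT_DEFAULT_S3_ENDPOINT="https://rgw-vip.nmn"

# Check that both credential files are non-empty and have mode 600.
check_sat_s3_files() {
    local dir="${1:-/root/.config/sat}" name file rc=0
    for name in s3_access_key s3_secret_key; do
        file="$dir/$name"
        if [ ! -s "$file" ]; then
            echo "MISSING OR EMPTY: $file"; rc=1
        elif [ -z "$(find "$file" -prune -perm 600)" ]; then
            echo "WRONG MODE: $file"; rc=1
        else
            echo "OK: $file"
        fi
    done
    return "$rc"
}

# Print the first uncommented endpoint value, or the default if none is set.
effective_sat_endpoint() {
    local config_file="$1" value
    value="$(sed -n 's/^[[:space:]]*endpoint[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' \
        "$config_file" | head -n 1)"
    echo "${value:-$SAT_DEFAULT_S3_ENDPOINT}"
}

# Compare the effective endpoint with the value stored in the secret.
compare_sat_endpoints() {
    local config_file="$1" secret_endpoint="$2" configured
    configured="$(effective_sat_endpoint "$config_file")"
    if [ "$configured" = "$secret_endpoint" ]; then
        echo "MATCH: $configured"
    else
        echo "MISMATCH: config=$configured secret=$secret_endpoint"
        return 1
    fi
}

# Example use on a manager node, reusing the command from step 3:
#   compare_sat_endpoints ~/.config/sat/sat.toml \
#       "$(kubectl get secret sat-s3-credentials -o \
#           jsonpath='{.data.s3_endpoint}' | base64 -d | xargs)"
```

A non-zero return status from either check indicates that the corresponding step should be revisited before copying the configuration to the other manager nodes.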

(Optional) Configure Multi-tenancy

If installing SAT on a multi-tenant system, the tenant name can be configured at this point. For more information, see Configure multi-tenancy.

Set System Revision Information

HPE service representatives use system revision information data to identify systems in support cases.

Prerequisites

Procedure

  1. (ncn-m001#) Set System Revision Information.

    Run sat setrev and follow the prompts to set the following site-specific values:

    • Serial number
    • System name
    • System type
    • System description
    • Product number
    • Company name
    • Site name
    • Country code
    • System install date

    Tip: For “System type”, a system with any liquid-cooled components should be considered a liquid-cooled system. In that case, set “System type” to EX-1C.

    sat setrev
    

    Example output:

    --------------------------------------------------------------------------------
    Setting:        Serial number
    Purpose:        System identification. This will affect how snapshots are
                    identified in the HPE backend services.
    Description:    This is the top-level serial number which uniquely identifies
                    the system. It can be requested from an HPE representative.
    Valid values:   Alpha-numeric string, 4 - 20 characters.
    Type:           <class 'str'>
    Default:        None
    Current value:  None
    --------------------------------------------------------------------------------
    Please do one of the following to set the value of the above setting:
        - Input a new value
        - Press CTRL-C to exit
    ...
    
  2. Verify System Revision Information.

    (ncn-m001#) Run sat showrev and verify the output shown in the “System Revision Information” table.

    sat showrev
    

    Example table output:

    ################################################################################
    System Revision Information
    ################################################################################
    +---------------------+---------------+
    | component           | data          |
    +---------------------+---------------+
    | Company name        | HPE           |
    | Country code        | US            |
    | Interconnect        | Sling         |
    | Product number      | R4K98A        |
    | Serial number       | 12345         |
    | Site name           | HPE           |
    | Slurm version       | slurm 20.02.5 |
    | System description  | Test System   |
    | System install date | 2021-01-29    |
    | System name         | eniac         |
    | System type         | EX-1C         |
    +---------------------+---------------+
    ################################################################################
    Product Revision Information
    ################################################################################
    +--------------+-----------------+------------------------------+------------------------------+
    | product_name | product_version | images                       | image_recipes                |
    +--------------+-----------------+------------------------------+------------------------------+
    | csm          | 0.8.14          | cray-shasta-csm-sles15sp1... | cray-shasta-csm-sles15sp1... |
    | sat          | 2.0.1           | -                            | -                            |
    | sdu          | 1.0.8           | -                            | -                            |
    | slingshot    | 0.8.0           | -                            | -                            |
    | sma          | 1.4.12          | -                            | -                            |
    +--------------+-----------------+------------------------------+------------------------------+
    ################################################################################
    Local Host Operating System
    ################################################################################
    +-----------+----------------------+
    | component | version              |
    +-----------+----------------------+
    | Kernel    | 5.3.18-24.15-default |
    | SLES      | SLES 15-SP2          |
    +-----------+----------------------+
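
To check a single value without reading the whole table, the output can be filtered. The following is a minimal sketch, assuming the table layout shown in the example output above; the function name showrev_field is hypothetical and not part of SAT.

```shell
# Read table text on stdin and print the second column of the first row
# whose first column equals the requested name.
showrev_field() {
    local field="$1"
    awk -F'|' -v want="$field" '
        NF >= 4 {                      # data rows: "| key | value |"
            key = $2; value = $3
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (key == want) { print value; exit }
        }
    '
}

# Example use on a manager node:
#   sat showrev | showrev_field "Serial number"
```

Border lines contain no "|" separators and are skipped, so only data rows are compared against the requested name.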