Configure Non-Compute Nodes with CFS

Non-compute node (NCN) personalization applies post-boot configuration to the HPE Cray EX management nodes. Several HPE Cray EX product environments outside of CSM require NCN personalization to function. Consult the manual for each product to configure them on NCNs by referring to the HPE Cray EX System Software Getting Started Guide (S-8000) 22.07 on the HPE Customer Support Center.

This procedure defines the NCN personalization process for the CSM product using the Configuration Framework Service (CFS).

During a fresh install, carry out these procedures in order. Later, individual procedures may be re-run as needed.

  1. Set up passwordless SSH
  2. Configure the root password and SSH keys in Vault
  3. Perform management NCN personalization

1. Set up passwordless SSH

This procedure should be run during CSM installation and any later time when the SSH keys need to be changed per site requirements.

The goal of passwordless SSH is to enable an easy way for interactive passwordless SSH from and between CSM product environments (management nodes) to downstream managed product environments (COS, UAN, etc), without requiring each downstream environment to create and apply individual changes to NCNs, and as a primary way to manage passwordless SSH configuration between management nodes. Passwordless SSH from downstream nodes into CSM management nodes is not intended or supported.

Passwordless SSH keypairs for the Cray System Management (CSM) are created automatically and maintained with a Kubernetes deployment and staged into Kubernetes secrets (csm-private-key) and ConfigMaps (csm-public-key) in the services namespace. Administrators can use these provided keys, provide their own keys, or use their own solution for authentication.

The management of keys on NCNs is achieved by the trust-csm-ssh-keys and passwordless-ssh Ansible roles in the CSM configuration management repository. The SSH keypair is applied to management nodes using NCN personalization.

NOTE: CFS itself does not use the CSM-provided (or user-supplied) SSH keys to make connections between nodes. CFS will continue to function if passwordless SSH is disabled between CSM and other product environments.

Option 1: Use the CSM-provided SSH keys

The default CSM Ansible plays are already configured to enable Passwordless SSH by default. No further action is necessary before proceeding to Configure the root password and SSH keys in Vault.

Option 2: Provide custom SSH keys

Administrators may elect to replace the CSM-provided keys with their own custom keys.

  1. Set variables to the locations of the public and private SSH key files.

    Replace the values in the examples below with the paths to the desired key files on the system.

    ncn-mw# PUBLIC_KEY_FILE=/path/to/id_rsa-csm.pub
    ncn-mw# PRIVATE_KEY_FILE=/path/to/id_rsa-csm
    
  2. Provide the custom keys by script or manually.

    There are two options for providing the keys.

    • Provide custom SSH keys by script.

      The replace_ssh_keys.sh script can be used to replace the keys from files.

      The docs-csm RPM must be installed in order to use this script. See Check for Latest Documentation

      ncn-mw# /usr/share/doc/csm/scripts/operations/configuration/replace_ssh_keys.sh \
                --public-key-file "${PUBLIC_KEY_FILE}" --private-key-file "${PRIVATE_KEY_FILE}"
      
    • Manually provide custom SSH keys

      The keys stored in Kubernetes can be updated directly.

      1. Replace the private key half:

        ncn-mw# KEY64=$(cat "${PRIVATE_KEY_FILE}" | base64) &&
                kubectl get secret -n services csm-private-key -o json | \
                   jq --arg value "$KEY64" '.data["value"]=$value' | kubectl apply -f - &&
                unset KEY64
        
      2. Replace the public key half:

        ncn-mw# kubectl delete configmap -n services csm-public-key &&
                   cat "${PUBLIC_KEY_FILE}" | base64 > ./value &&
                   kubectl create configmap --from-file value csm-public-key --namespace services &&
                   rm ./value
        

Passwordless SSH with the provided keys will be set up once NCN personalization runs on the NCNs.

NOTE: This keypair may be the same keypair used for the NCN root user, but it is not required to be the same. Either option is valid.

Proceed Configure the root password and SSH keys in Vault.

Option 3: Disable CSM-provided passwordless SSH

Local site security requirements may preclude use of passwordless SSH access between management nodes. A variable has been added to the associated Ansible roles that allows disabling of passwordless SSH setup to any or all nodes. From the cloned csm-config-management repository directory:

ncn-mw# grep csm_passwordless_ssh_enabled roles/trust-csm-ssh-keys/defaults/main.yaml

Example output:

csm_passwordless_ssh_enabled: 'false'

This variable can be overwritten using either a host-specific setting or global to affect all nodes where the playbook is run. See Customize Configuration Values for more detailed information. Do not modify the value in the roles/trust-csm-ssh-keys/defaults/main.yaml file.

Published roles within product configuration repositories can contain more comprehensive information regarding these role-specific flags. Reference any role-specific associated Readme.md documents for additional information, because role documentation is updated more frequently as changes are introduced.

Consult the manual for each product in order to change the default configuration by referring to the HPE Cray EX System Software Getting Started Guide (S-8000) 22.07 on the HPE Customer Support Center. Similar configuration values for disabling the role will be required in these product-specific configuration repositories.

Modifying Ansible plays in a configuration repository will require a new commit and subsequent update of the configuration layer associated with the product.

Proceed Configure the root password and SSH keys in Vault.

Restore CSM-provided SSH keys

Use this procedure if switching from custom keys to the default CSM SSH keys only; otherwise it should be skipped.

In order to restore the default CSM keys, there are two options:

  • Restore by script.

    The docs-csm RPM must be installed in order to use this script. See Check for Latest Documentation

    ncn-mw# /usr/share/doc/csm/scripts/operations/configuration/restore_ssh_keys.sh
    
  • Restore manually.

    The keys can be deleted from Kubernetes directly. The csm-ssh-keys Kubernetes deployment provided by CSM periodically checks the ConfigMap and secret containing the key information. If these entries do not exist, it will recreate them from the default CSM keys. Therefore, in order to manually restore the keys, delete the associated ConfigMap and secret. The default CSM-provided keys will be republished.

    1. Delete the csm-private-key Kubernetes secret.

      ncn-mw# kubectl delete secret -n services csm-private-key
      
    2. Delete the csm-public-key Kubernetes ConfigMap.

      ncn-mw# kubectl delete configmap -n services csm-public-key
      

2. Configure the root password and SSH keys in Vault

The root user password and SSH keys are managed on NCNs by using the csm.password and csm.ssh_keys Ansible roles, respectively, located in the CSM configuration management repository. root user passwords and SSH keys are set and managed in Vault.

There are two options for setting the root password and SSH keys in Vault: automated default or manual.

After these have been set in Vault, they will automatically be applied to NCNs during NCN personalization. For more information on how to configure and run NCN personalization, see 3. Perform management NCN personalization.

Option 1: Automated default

The automated default method uses the write_root_secrets_to_vault.py script to read in the current root user password and SSH keys from the NCN where it is run, and write those to Vault. All of the NCNs are booted from images which already had their root passwords and SSH keys customized during the Deploy Management Nodes procedure of the CSM install. In most cases, these are the same password and keys that should be written to Vault, and this script provides an easy way to do that.

Specifically, the write_root_secrets_to_vault.py script reads the following from the NCN where it is run:

  • The root user password hash from the /etc/shadow file.
  • The private SSH key from /root/.ssh/id_rsa.
  • The public SSH key from /root/.ssh/id_rsa.pub.

This script can be run on any Kubernetes management NCN (master or worker). It only needs to be run once for the cluster, because the same Vault credentials are used for all management NCNs.

The docs-csm RPM must be installed in order to use this script. See Check for Latest Documentation

Run the script with the following command:

ncn-mw# /usr/share/doc/csm/scripts/operations/configuration/write_root_secrets_to_vault.py

A successful execution will exit with return code 0 and will have output similar to the following:

Reading in file '/root/.ssh/id_rsa'
Reading in file '/root/.ssh/id_rsa.pub'
Reading in file '/etc/shadow'
Found root user line in /etc/shadow

Initializing Kubernetes client

Getting Vault token from vault/cray-vault-unseal-keys Kubernetes secret

Examining Kubernetes cray-vault service to determine URL for Vault API endpoint of secret/csm/users/root

Writing SSH keys and root password hash to secret/csm/users/root in Vault
Making POST request to http://10.18.232.40:8200/v1/secret/csm/users/root
Response status code = 204

Read back secrets from Vault to verify that the values were correctly saved
Making GET request to http://10.18.232.40:8200/v1/secret/csm/users/root
Response status code = 200

Validating that Vault contents match what was written to it
All secrets successfully written to Vault

SUCCESS

Proceed to 3. Perform management NCN personalization.

Option 2: Manual

NOTE: Information on writing the root user password and the SSH keys to Vault is documented in two separate procedures. However, if both the password and the SSH keys are to be stored in Vault (the standard case), then the two procedures must be combined. Specifically, only a single write command must be made to Vault, containing both the password and the SSH keys. If multiple write commands are performed, only the information from the final command will persist.

Set the root user password and SSH keys in Vault by combining the following two procedures:

Proceed to 3. Perform management NCN personalization.

3. Perform management NCN personalization

The previous procedures on this page did not make any changes to the NCNs on the system. They merely specified how the NCNs should be configured. In order for these configurations to be applied to the NCNs, a process called NCN personalization takes place using CFS.

There are two steps:

  1. Create or update the NCN personalization configuration in CFS, if needed.
  2. Update CFS to set the NCN personalization configuration as the desired configuration for the NCNs.

These can be accomplished by following Option 1: Automatically apply CSM configuration or Option 2: Manually apply CSM configuration.

Option 1: Automatically apply CSM configuration

The docs-csm RPM must be installed in order to use this script. See Check for Latest Documentation

By default the script will select the latest available CSM release. However, it is possible to specify the CSM release version using the --csm-release parameter.

ncn-mw# /usr/share/doc/csm/scripts/operations/configuration/apply_csm_configuration.sh --csm-release <version e.g. 1.0.11>

Automatic CSM configuration steps

By default, the script will perform the following steps:

  1. Finds the latest installed release version of the CSM product stream.
  2. Finds the CSM configuration version associated with the given release.
  3. Finds the latest commit on the release branch of the csm-config-management repository.
  4. Creates or updates the ncn-personalization.json configuration file.
  5. Finds all nodes in HSM with the Management role.
  6. Disables configuration for all NCNs.
  7. Updates the ncn-personalization configuration in CFS from the ncn-personalization.json file.
  8. Enables configuration for all NCN nodes, and sets their desired configuration to ncn-personalization.
  9. Monitors CFS until all NCN nodes have successfully completed or failed configuration.

Automatic CSM configuration overrides

The script also supports several flags to override these behaviors:

  • --csm-release: Overrides the version of the CSM release that is used. Defaults to the latest version.

    Available versions can be found in the cray-product-catalog ConfigMap.

    ncn-mw# kubectl -n services get cm cray-product-catalog
    
  • --csm-config-version: Overrides the version of the CSM configuration. This corresponds to the version of a branch starting with cray/csm/ in the csm repository in VCS.

  • --git-commit: Overrides the Git commit cloned for the configuration content. Defaults to the latest commit on the csm-release branch.

  • --git-clone-url: Overrides the source of the configuration content. Defaults to https://api-gw-service-nmn.local/vcs/cray/csm-config-management.git.

  • --ncn-config-file: Sets a file other than ncn-personalization.json to be used for the configuration.

  • --xnames: A comma-separated list of component names (xnames) to deploy to. Defaults to all Management nodes in HSM.

  • --clear-state: Clears existing state from components to ensure that CFS runs. This can be used if configuration needs to be re-run on successful nodes with no change to the Git content since the previous run; for example, if the SSH keys have changed.

Option 2: Manually apply CSM configuration

In order to manually run management NCN personalization, first gather the following information:

  • HTTP clone URL for the configuration repository in VCS.
  • Path to the Ansible play to run in the repository.
  • Git commit ID in the repository for CFS to pull and run on the nodes.
Field Value Description
cloneUrl https://api-gw-service-nmn.local/vcs/cray/csm-config-management.git CSM configuration repository
commit Example: 5081c1ecea56002df41218ee39f6030c3eebdf27 CSM configuration commit hash
name Example: csm-1.9.21 CSM configuration layer name
playbook site.yml Default site-wide Ansible playbook for CSM
  1. Retrieve the commit in the repository to use for configuration. If changes have been made to the default branch that was imported during a CSM installation or upgrade, use the commit containing the changes.

  2. If no changes have been made, then the latest commit on the default branch for this version of CSM should be used.

    Find the commit in the cray-product-catalog ConfigMap for the current version of CSM. For example:

    ncn-mw# kubectl -n services get cm cray-product-catalog -o jsonpath='{.data.csm}'
    

    Look for something similar to the following in the output:

    1.2.1:
       configuration:
          clone_url: https://vcs.cmn.SYSTEM_DOMAIN_NAME/vcs/cray/csm-config-management.git
          commit: 43ecfa8236bed625b54325ebb70916f55884b3a4
          import_branch: cray/csm/1.9.24
          import_date: 2021-07-28 03:26:01.869501
          ssh_url: git@vcs.cmn.SYSTEM_DOMAIN_NAME:cray/csm-config-management.git
    

    The commit will be different for each system and version of CSM. For this example, it is 43ecfa8236bed625b54325ebb70916f55884b3a4.

  3. Craft a new configuration layer entry for the new CSM.

          {
             "name": "csm-<version>",
             "cloneUrl": "https://api-gw-service-nmn.local/vcs/cray/csm-config-management.git",
             "playbook": "site.yml",
             "commit": "<retrieved git commit ID>"
          }
    
  4. Create or update the NCN personalization configuration in CFS.

    NOTE The CSM configuration layer MUST be the first layer in the NCN personalization CFS configuration.