Configure Administrative Access

Several operations configure administrative access to different parts of the system. Configuring the cray CLI with administrative credentials enables the use of many management services from the command line. The management nodes can be locked to protect them from accidental manipulation by the cray capmc and cray fas commands when the intent is to work on the entire system except the management nodes. The cray scsd command can change the SSH keys, NTP server, syslog server, and BMC/controller passwords.

Topics

  1. Configure Keycloak account
  2. Configure the Cray command line interface
  3. Lock management nodes
  4. Configure BMC and controller parameters with SCSD
  5. Configure non-compute nodes with CFS
  6. Upload Olympus BMC recovery firmware into TFTP server
  7. Proceed to next topic

NOTE: The procedures in this section of the installation documentation are intended to be performed in order, even though the topics are administrative or operational procedures. The topics themselves do not have navigational links to the next topic in the sequence.

1. Configure Keycloak account

Upcoming steps in the installation workflow require an account to be configured in Keycloak for authentication. This can be either a local Keycloak account or an external Identity Provider (IdP), such as LDAP. Having an account in Keycloak with administrative credentials enables the use of many management services via the cray command.

See Configure Keycloak Account.

2. Configure the Cray command line interface

The cray command line interface (CLI) is a framework created to integrate all of the system management REST APIs into easily usable commands.

Later procedures in the installation workflow use the cray command to interact with multiple services. The cray CLI configuration needs to be initialized for the Linux account. The Keycloak user who initializes the CLI configuration needs to be authorized for administrative actions.

See Configure the Cray command line interface.

3. Lock management nodes

The management nodes are unlocked at this point in the installation. Locking the management nodes and their BMCs prevents FAS from updating their firmware and prevents CAPMC from powering them off or resetting their power. Performing any of these actions by accident will take down a management node. If the management node is a Kubernetes master or worker node, this can have serious negative effects on system operation.

If a single node is taken down by mistake, the system may recover. However, if all management nodes or all Kubernetes worker nodes are taken down by mistake, the system becomes inoperable and must be completely restarted.
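
The effect of locking can be pictured with a small Python model. This is a conceptual sketch only, not the HSM, FAS, or CAPMC API: the Inventory class, its method names, the action names, and the xnames are all hypothetical. It is meant to show that a locked component refuses power and firmware actions while unlocked components still accept them.

```python
class LockedError(Exception):
    """Raised when an action targets a locked component."""


class Inventory:
    """Hypothetical stand-in for a component inventory with locks."""

    def __init__(self, xnames):
        # Every component starts unlocked, as the management nodes do
        # at this point in the installation.
        self.locked = {xname: False for xname in xnames}

    def lock(self, xnames):
        # Mark each listed component as locked.
        for xname in xnames:
            self.locked[xname] = True

    def request(self, action, xname):
        # Power and firmware actions are refused for locked components.
        if self.locked[xname]:
            raise LockedError(f"{action} refused: {xname} is locked")
        return f"{action} accepted: {xname}"


# Hypothetical xnames: one management node, its BMC, and one compute node.
inventory = Inventory(["x3000c0s1b0n0", "x3000c0s1b0", "x1000c0s0b0n0"])
inventory.lock(["x3000c0s1b0n0", "x3000c0s1b0"])

# The compute node is unlocked, so the action goes through.
print(inventory.request("power_off", "x1000c0s0b0n0"))

# The management node BMC is locked, so the action is refused.
try:
    inventory.request("firmware_update", "x3000c0s1b0")
except LockedError as err:
    print(err)
```

In this model the node and its BMC are locked together, mirroring the instruction to lock both; locking only the node would leave the BMC open to a firmware update.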

Lock the management nodes and their BMCs now!

See Lock and Unlock Nodes.

4. Configure BMC and controller parameters with SCSD

NOTE: If there are no liquid-cooled cabinets present in the HPE Cray EX system, then this step can be skipped.

The System Configuration Service (SCSD) allows administrators to set various BMC and controller parameters for components in liquid-cooled cabinets. At this point in the install, SCSD should be used to set the SSH key in the node controllers (BMCs) to enable troubleshooting. If any of the nodes fail to power down or power up as part of the compute node booting process, it may be necessary to examine the logs on the BMC for the node power down or power up.
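
As a rough illustration of what setting an SSH key on several controllers might involve, the Python sketch below assembles a request body. The field names (Force, Targets, Params, SSHKey), the build_scsd_payload helper, the xnames, and the key string are assumptions made for illustration and are not taken from this document; consult the linked procedure for the actual format.

```python
import json


def build_scsd_payload(targets, ssh_key, force=False):
    """Assemble a hypothetical SCSD-style request body.

    The field names used here are assumptions, not a confirmed schema.
    """
    if not targets:
        raise ValueError("at least one target controller is required")
    return {
        "Force": force,
        "Targets": list(targets),
        "Params": {"SSHKey": ssh_key},
    }


# Hypothetical controller xnames and a placeholder public key.
payload = build_scsd_payload(
    ["x1000c0s0b0", "x1000c0s0b1"],
    "ssh-ed25519 EXAMPLE-PUBLIC-KEY admin@example",
)
print(json.dumps(payload, indent=2))
```

Building the body in one place makes it easy to review the target list before anything is sent to the controllers.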

See Configure BMC and Controller Parameters with SCSD.

5. Configure non-compute nodes with CFS

Non-compute nodes (NCNs) must be configured after booting for administrative access, security, and other purposes. The Configuration Framework Service (CFS) is used to apply post-boot configuration in a decoupled, layered manner. Individual software products, including CSM, provide one or more layers of configuration in a process called “NCN personalization”.
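
The layered idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the CFS API: the personalize function, the layer dictionaries, and the product and setting names are hypothetical, and the real service applies configuration content supplied by each product rather than merging dictionaries. The sketch only shows independent layers being applied one after another to the same node.

```python
def personalize(layers):
    """Apply each layer in list order; return the settings and the order used."""
    settings = {}
    applied = []
    for layer in layers:
        # Each product contributes its own settings independently.
        settings.update(layer["settings"])
        applied.append(layer["name"])
    return settings, applied


# Hypothetical layers: one from CSM, one from another installed product.
layers = [
    {"name": "csm", "settings": {"ssh_keys": "configured"}},
    {"name": "example-product", "settings": {"example_service": "enabled"}},
]

settings, applied = personalize(layers)
print(applied)
print(settings)
```

Because each layer is self-contained, a product can be added or removed by editing the list without touching the other layers.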

See Configure Non-Compute Nodes with CFS.

6. Upload Olympus BMC recovery firmware into TFTP server

NOTE: This step requires the CSM software, Cray CLI, and HPC Firmware Pack (HFP) to be installed. If these are not currently installed, then skip this step and perform it later.

Recovery firmware for the Olympus hardware must be loaded onto the cray-tftp server in case a BMC loses its firmware. The BMCs are configured to load recovery firmware from a TFTP server. This procedure does not modify any BMC firmware; it only stages the firmware on the TFTP server for download in the event it is needed.

See Load Olympus BMC Recovery Firmware into TFTP server.

7. Proceed to next topic

After completing the operational procedures above, which configure administrative access, the next step is to validate the health of the management nodes and CSM services.

See Validate CSM Health.