Note: CRUS is deprecated in CSM 1.2.0 and it will be removed in CSM 1.5.0. It will be replaced with BOS V2, which will provide similar functionality. See Deprecated features.
The following workflow is intended to be a high-level overview of how to upgrade compute nodes. This workflow depicts how services interact with each other during the compute node upgrade process, and helps to provide a quicker and deeper understanding of how the system functions.
An administrator upgrades selected compute nodes to a newer compute image by using the Compute Rolling Upgrade Service (CRUS).
This workflow is based on the interaction of CRUS with the Boot Orchestration Service (BOS) and the Slurm workload manager.
The following terms are mentioned in this workflow:

- `slurmctld` is the central management daemon of Slurm. It runs on non-compute nodes in a container. It monitors all other Slurm daemons and resources, accepts jobs, and allocates resources to those jobs.
- `slurmd` monitors all tasks running on compute nodes, accepts tasks, launches tasks, and kills running tasks upon request. It runs on compute nodes.

The following sequence of steps occurs during this workflow.
Create three HSM groups with starting, failed, and upgrading labels.
Any names can be used for these groups.
For this example: `crus_starting`, `crus_failed`, and `crus_upgrading`, respectively.
Add all of the compute nodes to be updated to the `crus_starting` group.
Leave the `crus_failed` and `crus_upgrading` groups empty.
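As a rough illustration of the group setup described above, the following Python sketch builds one request body per HSM group. The body shape (`label` plus `members.ids`) is an assumption about the HSM groups API rather than something stated in this workflow, and the node names are hypothetical placeholders. Nothing is sent over the network.

```python
import json


def hsm_group_payload(label, member_xnames=()):
    """Build a JSON-style body for one HSM group.

    Assumption: a group is described by a label and a list of member IDs.
    """
    return {"label": label, "members": {"ids": list(member_xnames)}}


# Hypothetical component names of the compute nodes to be upgraded.
nodes_to_upgrade = ["x1000c0s0b0n0", "x1000c0s0b0n1", "x1000c0s0b1n0"]

groups = [
    # The starting group holds every node that should be upgraded.
    hsm_group_payload("crus_starting", nodes_to_upgrade),
    # The failed and upgrading groups start out empty; CRUS populates them.
    hsm_group_payload("crus_failed"),
    hsm_group_payload("crus_upgrading"),
]

for group in groups:
    print(json.dumps(group))
```

Keeping the failed and upgrading groups empty at creation time matches the requirement above: CRUS is the only component expected to move nodes into them.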
A session template is a collection of metadata for a group of nodes and their desired configuration.
Create a BOS session template that points to the new image and the desired CFS configuration, and that has a boot set which includes all of the compute nodes to be updated.
The boot set can include additional nodes, but it must contain all the nodes that need to be updated. The BOS session template should specify `crus_upgrading` in the `node_groups` field of one of its boot sets.
This example will use the BOS session template named `new_compute_template`.
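The requirement that one boot set names the upgrading group can be checked mechanically. The sketch below models a session template as a plain dictionary. Apart from `node_groups`, the field names (`cfs`, `boot_sets`, `path`) and all values are illustrative assumptions, not a verified BOS schema, and the image path is deliberately left as a placeholder.

```python
def build_session_template(name, image_path, cfs_configuration, node_groups):
    """Return a dictionary standing in for a BOS session template (assumed layout)."""
    return {
        "name": name,
        "cfs": {"configuration": cfs_configuration},
        "boot_sets": {
            "compute": {
                "path": image_path,
                "node_groups": list(node_groups),
            }
        },
    }


def template_targets_group(template, group_label):
    """True if at least one boot set lists group_label in its node_groups field."""
    return any(
        group_label in boot_set.get("node_groups", [])
        for boot_set in template["boot_sets"].values()
    )


template = build_session_template(
    name="new_compute_template",
    image_path="<path-to-new-image>",              # placeholder, not a real path
    cfs_configuration="<cfs-configuration-name>",  # placeholder
    node_groups=["crus_upgrading"],
)

print(template_targets_group(template, "crus_upgrading"))
```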
The administrator creates a CRUS session. A new upgrade session is launched as a result of this call.
Specify the following parameters:
Parameter | Example | Meaning |
---|---|---|
`failed_label` | `crus_failed` | An empty Hardware State Manager (HSM) group which CRUS will populate with any nodes that fail their upgrades. |
`starting_label` | `crus_starting` | An HSM group which contains the total set of nodes to be upgraded. |
`upgrading_label` | `crus_upgrading` | An empty HSM group which CRUS will use to boot and configure subsets of the compute nodes. |
`upgrade_step_size` | `50` | The number of nodes to include in each discrete upgrade step.* |
`upgrade_template_id` | `new_compute_template` | The name of the BOS session template to use for the upgrades. |
`workload_manager_type` | `slurm` | Only Slurm is supported. |
* Each group of concurrent upgrades will never exceed this number of compute nodes, although in some cases they may be smaller.
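The parameters in the table can be collected into a single request body. The following is a minimal sketch that assembles and sanity-checks such a body in Python. The helper name `build_crus_session` and the validation rules are illustrative assumptions, not part of CRUS itself. The values are the example values from the table.

```python
def build_crus_session(starting_label, upgrading_label, failed_label,
                       upgrade_step_size, upgrade_template_id,
                       workload_manager_type="slurm"):
    """Assemble the CRUS session parameters after a few basic checks."""
    if workload_manager_type != "slurm":
        raise ValueError("only Slurm is supported")
    if not isinstance(upgrade_step_size, int) or upgrade_step_size < 1:
        raise ValueError("upgrade_step_size must be a positive integer")
    # Assumed sanity check: the three HSM groups play different roles,
    # so their labels should differ.
    if len({starting_label, upgrading_label, failed_label}) != 3:
        raise ValueError("the three HSM group labels must be distinct")
    return {
        "failed_label": failed_label,
        "starting_label": starting_label,
        "upgrading_label": upgrading_label,
        "upgrade_step_size": upgrade_step_size,
        "upgrade_template_id": upgrade_template_id,
        "workload_manager_type": workload_manager_type,
    }


session = build_crus_session(
    starting_label="crus_starting",
    upgrading_label="crus_upgrading",
    failed_label="crus_failed",
    upgrade_step_size=50,
    upgrade_template_id="new_compute_template",
)
print(session)
```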
CRUS calls HSM to find the nodes in the `crus_starting` group.
CRUS selects a number of these nodes equal to `upgrade_step_size` and calls HSM to put them into the `crus_upgrading` group.
CRUS tells Slurm to quiesce these nodes. As each node is quiesced, Slurm takes the node offline.
Slurm reports back to CRUS that all of the nodes are offline.
CRUS calls BOS to create a session with the following arguments:
Parameter | Value |
---|---|
`operation` | `reboot` |
`templateUuid` | `new_compute_template` |
`limit` | `crus_upgrading` |
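The three arguments in the table map naturally onto a small request body. The sketch below only builds that body. The helper name is hypothetical, and how CRUS actually submits the request to BOS is not described in this workflow.

```python
def bos_reboot_request(template_name, upgrading_label):
    """Build the arguments CRUS passes to BOS for one upgrade step."""
    return {
        "operation": "reboot",          # reboot the nodes into the new image
        "templateUuid": template_name,  # BOS session template to apply
        "limit": upgrading_label,       # restrict the session to the upgrading group
    }


request_body = bos_reboot_request("new_compute_template", "crus_upgrading")
print(request_body)
```

Because `limit` names the upgrading group rather than individual nodes, the same request can be reused for every step: only the membership of the group changes between steps.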
CRUS retrieves the BOS session to get the BOA job name.
CRUS waits for the BOA job to finish.
CRUS looks at the exit code of the BOA job to determine whether or not there were errors.
If there were errors, CRUS adds the nodes from the `crus_upgrading` group into the `crus_failed` group.
CRUS calls HSM to empty the `crus_upgrading` group.
CRUS repeats steps 5-9, from selecting the next set of nodes through emptying the upgrading group, until all of the nodes from the `crus_starting` group have gone through these steps.
CRUS marks the session status as `complete`.
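The loop that CRUS runs can be summarized with a small in-memory simulation. This is only a sketch of the control flow described above, not CRUS code: Python lists stand in for the HSM groups, and `boot_batch` is a hypothetical callable that stands in for quiescing the nodes in Slurm, running the BOS session, and checking the BOA job exit code.

```python
def rolling_upgrade(starting_nodes, step_size, boot_batch):
    """Simulate the rolling upgrade loop over batches of at most step_size nodes.

    boot_batch(batch) returns True when the (simulated) BOA job finished
    without errors and False otherwise.
    """
    if step_size < 1:
        raise ValueError("step_size must be at least 1")

    remaining = list(starting_nodes)  # work list derived from the starting group
    failed = []                       # stands in for the failed group
    batches = []                      # record of each upgrade step, for inspection

    while remaining:
        # Move up to step_size nodes into the upgrading group.
        upgrading = remaining[:step_size]
        remaining = remaining[step_size:]
        batches.append(list(upgrading))

        # Quiesce, reboot, and check the result (all simulated by boot_batch).
        if not boot_batch(upgrading):
            # On errors, every node in the upgrading group is added to the failed group.
            failed.extend(upgrading)

        # Empty the upgrading group before the next step.
        upgrading = []

    return {"status": "complete", "batches": batches, "failed": failed}


nodes = [f"node{i}" for i in range(5)]
# Simulated failure: any batch containing node2 reports errors.
result = rolling_upgrade(nodes, 2, lambda batch: "node2" not in batch)
print(result["batches"])
print(result["failed"])
```

With five nodes and a step size of two, the last batch holds a single node, which illustrates why a group of concurrent upgrades can be smaller than the step size but never larger.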