The following workflows present a high-level overview of common Boot Orchestration Service (BOS) operations. These workflows depict how services interact with each other when booting, configuring, or shutting down nodes. They also help provide a quicker and deeper understanding of how the system functions.
The following are mentioned in the workflows:
initrd, image root) and boot parameters.
The following workflows are included in this section:
Use case: Administrator powers on and configures select compute nodes.
BOS v2 boot flow diagram: The labels on the diagram correspond to the workflow steps listed below. Some steps are omitted from the diagram for readability.
Workflow overview: The following sequence of steps occurs during this workflow.
Administrator creates a configuration
(ncn-mw#) Add a configuration to CFS. For more information on creating CFS configurations, see CFS Configurations.
cray cfs v3 configurations update sample-config --file configuration.json --format json
Example output:
{
    "last_updated": "2020-09-22T19:56:32Z",
    "layers": [
        {
            "clone_url": "https://api-gw-service-nmn.local/vcs/cray/configmanagement.git",
            "commit": "01b8083dd89c394675f3a6955914f344b90581e2",
            "playbook": "site.yaml"
        }
    ],
    "name": "sample-config"
}
Administrator creates a BOS session template
A session template is a collection of data specifying a group of nodes, as well as the boot artifacts and configuration that should be applied to them. A session template can be created from a JSON structure. A successful creation returns the session template ID.
See Manage a session template for more information.
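As a rough illustration of the JSON structure mentioned above, the following Python sketch builds a session template body and prints it. The field names are assumed from the BOS v2 session template schema and should be verified against Manage a session template. The helper name `build_session_template` is hypothetical, and every value (role, S3 path, etag, kernel parameters) is a placeholder.

```python
import json


def build_session_template(config_name, image_path, etag, node_roles):
    """Return a dict shaped like a BOS v2 session template (assumed schema)."""
    return {
        "enable_cfs": True,
        # Name of the CFS configuration applied after boot.
        "cfs": {"configuration": config_name},
        "boot_sets": {
            # "compute" is an arbitrary boot set name chosen for this sketch.
            "compute": {
                "node_roles_groups": list(node_roles),
                "type": "s3",
                "path": image_path,
                "etag": etag,
                "kernel_parameters": "console=ttyS0,115200",  # placeholder
                "rootfs_provider": "sbps",
            }
        },
    }


template = build_session_template(
    config_name="sample-config",
    image_path="s3://boot-images/IMAGE_ID/manifest.json",  # placeholder path
    etag="ETAG",  # placeholder
    node_roles=["Compute"],
)
print(json.dumps(template, indent=2))
```

The printed JSON could be saved to a file and supplied when creating the session template.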
Administrator creates a session
Create a session to perform the operation specified in the operation request parameter on the boot set defined in the session template. For this use case, the administrator creates a session with operation as boot and specifies the session template ID.
(ncn-mw#)
cray bos v2 sessions create --template-name SESSIONTEMPLATE_NAME --operation boot
Session setup operator
The creation of a session causes the session-setup operator to set a desired state on all components listed in the session template.
This includes pulling files from S3 to determine boot artifacts like kernel, initrd, and root file system. The session setup operator also enables the relevant components at this time.
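The effect of the session-setup step can be pictured with a small model. This is a simplified sketch, not the BOS implementation: the `Component` class, the `setup_session` function, and the artifact paths are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical stand-in for a BOS component record (one per node)."""
    xname: str
    enabled: bool = False
    desired_state: dict = field(default_factory=dict)


def setup_session(components, boot_artifacts, configuration):
    """Set the desired state on each component, then enable it."""
    for comp in components:
        comp.desired_state = {
            "boot_artifacts": dict(boot_artifacts),
            "configuration": configuration,
        }
        # Enabling a component is what lets the other operators act on it.
        comp.enabled = True
    return components


# Placeholder artifact locations; in practice these are resolved from S3.
artifacts = {"kernel": "KERNEL_PATH", "initrd": "INITRD_PATH", "rootfs": "ROOTFS_PATH"}
nodes = setup_session(
    [Component("x3000c0s1b0n0"), Component("x3000c0s2b0n0")],
    artifacts,
    "sample-config",
)
print([(n.xname, n.enabled) for n in nodes])
```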
Status operator (powering-on)
The status operator will detect the enabled components and assign them a phase. This involves checking the current state of the node, including communicating with HSM to determine the current power status of the node.
In this example of booting nodes, the first phase is powering-on. If queried at this point, the nodes will have a status of power-on-pending.
For more on component phase and status, see Component Status.
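The phase decision described above can be sketched as a comparison between the power state reported by HSM and the desired state. The rules below are a simplified assumption for illustration; the function name `first_phase` and the `PENDING_STATUS` table are hypothetical, and the real status operator considers more inputs.

```python
def first_phase(power_state, desired_artifacts, actual_artifacts):
    """Choose a phase for an enabled component (simplified, hypothetical rules)."""
    if power_state == "off":
        # A node that is off but has desired boot artifacts must be powered on.
        return "powering-on" if desired_artifacts else ""
    if desired_artifacts != actual_artifacts:
        # A running node whose artifacts differ from the desired ones
        # (or whose desired state was cleared) must be powered off first.
        return "powering-off"
    return ""


# Status reported while a phase is assigned but no power action was called yet.
PENDING_STATUS = {
    "powering-on": "power-on-pending",
    "powering-off": "power-off-pending",
}

phase = first_phase("off", {"kernel": "KERNEL_PATH"}, {})
print(phase, PENDING_STATUS[phase])
```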
Power-on operator
The power-on operator will detect nodes with a power-on-pending status. If the root file system provider is the Scalable Boot Provisioning Service (SBPS), then the power-on operator will notify SBPS that the root file system needs to be projected. It will do this by tagging the image with sbps-project: true using the Image Management Service (IMS).
Then, the power-on operator sets the desired boot artifacts in BSS.
If configuration is enabled for the node, the power-on operator will also call CFS to set the desired configuration and disable the node with CFS.
The node must be disabled within CFS so that CFS does not try to configure the node until it has booted.
The power-on operator then calls PCS to power on the node.
Lastly, the power-on operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-on-called.
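The ordering of the power-on operator's calls matters: BSS and CFS are prepared before PCS is asked to power on the node. The sketch below only records that order in a list and does not call any service. The function name `power_on`, the dictionary keys, and the `last_action` value are hypothetical.

```python
def power_on(node, calls):
    """Record, in order, the service calls made for one node (simplified)."""
    if node["rootfs_provider"] == "sbps":
        # Tagging the image in IMS tells SBPS to project the root file system.
        calls.append(("IMS", "tag image with sbps-project: true"))
    calls.append(("BSS", "set desired boot artifacts"))
    if node["cfs_enabled"]:
        # CFS is told the desired configuration, but the node is disabled
        # there so that configuration does not start before the node boots.
        calls.append(("CFS", "set desired configuration and disable node"))
    calls.append(("PCS", "power on"))
    node["last_action"] = "power-on"  # illustrative value only
    node["status"] = "power-on-called"
    return calls


node = {"xname": "x3000c0s1b0n0", "rootfs_provider": "sbps", "cfs_enabled": True}
for service, action in power_on(node, []):
    print(f"{service}: {action}")
```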
PCS boots nodes
PCS interfaces directly with the Redfish APIs and powers on the selected nodes.
BSS interacts with the nodes
BSS generates iPXE boot scripts based on the image content and boot parameters that have been assigned to a node. Nodes download the iPXE boot script from BSS.
Nodes request boot artifacts from S3
Nodes download the boot artifacts. The nodes boot using the boot artifacts pulled from S3.
Status operator (configuring)
The status operator monitors a node’s power state until HSM reports that the power state is on.
When the power state for a node is on, the status operator will either set the phase to configuring if CFS configuration is required or it will clear the current phase if the node is in its final state.
CFS applies configuration
If needed, CFS runs Ansible on the nodes and applies post-boot configuration (also called node personalization).
Status operator (complete)
The status operator will continue monitoring the states for each node until CFS reports that configuration is complete. The status operator will clear the current phase now that the node is in its final state. The status operator will also disable components at this point.
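The two status operator steps above (configuring and complete) amount to one decision that is re-evaluated as CFS makes progress. A minimal sketch, assuming a dictionary-based component record and a hypothetical function name `after_power_on`:

```python
def after_power_on(component, cfs_required, cfs_complete):
    """Update a component once HSM reports that the node is on (simplified)."""
    if cfs_required and not cfs_complete:
        component["phase"] = "configuring"
    else:
        # Final state reached: clear the phase and disable the component
        # so that no operator acts on it again.
        component["phase"] = ""
        component["enabled"] = False
    return component


comp = {"xname": "x3000c0s1b0n0", "phase": "powering-on", "enabled": True}
print(after_power_on(comp, cfs_required=True, cfs_complete=False))
print(after_power_on(comp, cfs_required=True, cfs_complete=True))
```

The first call leaves the component enabled in the configuring phase; the second clears the phase and disables it.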
Session completion operator
When all nodes belonging to a session have been disabled, the session is marked complete, and its final status is saved to the database.
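The completion check reduces to asking whether any component of the session is still enabled. The following sketch assumes dictionary-based component records; `session_complete` is a hypothetical helper, not a BOS function.

```python
def session_complete(components):
    """A session is complete once none of its components is still enabled."""
    return all(not comp["enabled"] for comp in components)


components = [
    {"xname": "x3000c0s1b0n0", "enabled": False},
    {"xname": "x3000c0s2b0n0", "enabled": True},
]
print(session_complete(components))  # one node is still being worked on

components[1]["enabled"] = False
print(session_complete(components))
```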
Use case: Administrator reboots and configures select compute nodes.
BOS v2 reboot flow diagram: The labels on the diagram correspond to the workflow steps listed below. Some steps are omitted from the diagram for readability.
Workflow overview: The following sequence of steps occurs during this workflow.
Administrator creates a configuration
Add a configuration to CFS. For more information on creating CFS configurations, see CFS Configurations.
(ncn-mw#)
cray cfs v3 configurations update sample-config --file configuration.json --format json
Example output:
{
    "last_updated": "2020-09-22T19:56:32Z",
    "layers": [
        {
            "clone_url": "https://api-gw-service-nmn.local/vcs/cray/configmanagement.git",
            "commit": "01b8083dd89c394675f3a6955914f344b90581e2",
            "playbook": "site.yaml"
        }
    ],
    "name": "sample-config"
}
Administrator creates a BOS session template
A session template is a collection of data specifying a group of nodes, as well as the boot artifacts and configuration that should be applied to them. A session template can be created from a JSON structure. A successful creation returns the session template ID.
See Manage a session template for more information.
Administrator creates a session
Create a session to perform the operation specified in the operation request parameter on the boot set defined in the session template. For this use case, the administrator creates a session with operation as reboot and specifies the session template ID.
(ncn-mw#)
cray bos v2 sessions create --template-name SESSIONTEMPLATE_NAME --operation reboot
Session setup operator
The creation of a session causes the session-setup operator to set a desired state on all components listed in the session template.
This includes pulling files from S3 to determine boot artifacts like kernel, initrd, and root file system. The session setup operator also enables the relevant components at this time.
Status operator (powering-off)
The status operator will detect the enabled components and assign them a phase. This involves checking the current state of the node, including communicating with HSM to determine the current power status of the node.
In this example of rebooting nodes, the first phase is powering-off. If queried at this point, the nodes will have a status of power-off-pending.
For more on component phase and status, see Component Status.
Graceful-power-off operator
The graceful-power-off operator will detect nodes with a power-off-pending status and call PCS to power off the nodes.
Then, the graceful-power-off operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-off-gracefully-called.
Forceful-power-off operator
If powering off is taking too long, the forceful-power-off operator will take over. It also calls PCS to power off the node, but with the addition of the forceful flag.
Then, the forceful-power-off operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-off-forcefully-called.
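The hand-off between the graceful and forceful power-off operators can be sketched as a timeout check. The function name `choose_power_off_action` and the 300-second default are assumptions made for illustration; the actual wait time is a BOS setting and should be looked up rather than taken from this sketch.

```python
def choose_power_off_action(seconds_since_graceful_call, max_wait_seconds=300):
    """Return which power-off operator acts next (simplified, hypothetical)."""
    if seconds_since_graceful_call is None:
        return "graceful"  # no power-off has been requested yet
    if seconds_since_graceful_call > max_wait_seconds:
        return "forceful"  # the graceful attempt is taking too long
    return "wait"  # give the graceful power-off more time


# Status a node reports after each operator has called PCS.
STATUS_AFTER_CALL = {
    "graceful": "power-off-gracefully-called",
    "forceful": "power-off-forcefully-called",
}

for elapsed in (None, 60, 301):
    action = choose_power_off_action(elapsed)
    print(elapsed, action, STATUS_AFTER_CALL.get(action, "(status unchanged)"))
```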
PCS powers off nodes
PCS interfaces directly with the Redfish APIs and powers off the selected nodes.
Status operator (powering-on)
The status operator monitors a node’s power state until HSM reports that the power state is off.
When the power state for a node is off, the status operator will set the phase to powering-on. If queried at this point, the nodes will have a status of power-on-pending.
Power-on operator
The power-on operator will detect nodes with a power-on-pending status. If the root file system provider is the Scalable Boot Provisioning Service (SBPS), then the power-on operator will notify SBPS that the root file system needs to be projected. It will do this by tagging the image with sbps-project: true using the Image Management Service (IMS).
Then, the power-on operator sets the desired boot artifacts in BSS.
If configuration is enabled for the node, the power-on operator will also call CFS to set the desired configuration and disable the node with CFS.
The node must be disabled within CFS so that CFS does not try to configure the node until it has booted.
The power-on operator then calls PCS to power on the node.
Lastly, the power-on operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-on-called.
PCS boots nodes
PCS interfaces directly with the Redfish APIs and powers on the selected nodes.
BSS interacts with the nodes
BSS generates iPXE boot scripts based on the image content and boot parameters that have been assigned to a node. Nodes download the iPXE boot script from BSS.
Nodes request boot artifacts from S3
Nodes download the boot artifacts. The nodes boot using the boot artifacts pulled from S3.
Status operator (configuring)
The status operator monitors a node’s power state until HSM reports that the power state is on.
When the power state for a node is on, the status operator will either set the phase to configuring if CFS configuration is required or it will clear the current phase if the node is in its final state.
CFS applies configuration
If needed, CFS runs Ansible on the nodes and applies post-boot configuration (also called node personalization).
Status operator (complete)
The status operator will continue monitoring the states for each node until CFS reports that configuration is complete. The status operator will clear the current phase now that the node is in its final state. The status operator will also disable components at this point.
Session completion operator
When all nodes belonging to a session have been disabled, the session is marked complete, and its final status is saved to the database.
Use case: Administrator powers off selected compute nodes.
BOS v2 shutdown flow diagram: The labels on the diagram correspond to the workflow steps listed below. Some steps are omitted from the diagram for readability.
Workflow overview: The following sequence of steps occurs during this workflow.
Administrator creates a BOS session template
A session template is a collection of data specifying a group of nodes, as well as the boot artifacts and configuration that should be applied to them. A session template can be created from a JSON structure. A successful creation returns the session template ID.
See Manage a session template for more information.
Administrator creates a session
Create a session to perform the operation specified in the operation request parameter on the boot set defined in the session template. For this use case, the administrator creates a session with operation as shutdown and specifies the session template ID.
(ncn-mw#)
cray bos v2 sessions create --template-name SESSIONTEMPLATE_NAME --operation shutdown
Session setup operator
The creation of a session causes the session-setup operator to set a desired state on all components listed in the session template. For a power-off, this means clearing the desired state for each component. The session setup operator also enables the relevant components at this time.
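For a shutdown, the session-setup step is the mirror image of a boot: the desired state is emptied rather than filled in. A minimal sketch with dictionary-based records follows; the function name `setup_shutdown_session` and the key names are hypothetical.

```python
def setup_shutdown_session(components):
    """Clear the desired state of each component, then enable it (simplified)."""
    for comp in components:
        # An empty desired state means the node should not be running anything.
        comp["desired_state"] = {"boot_artifacts": {}, "configuration": ""}
        comp["enabled"] = True
    return components


running = [{
    "xname": "x3000c0s1b0n0",
    "enabled": False,
    "desired_state": {
        "boot_artifacts": {"kernel": "KERNEL_PATH"},
        "configuration": "sample-config",
    },
}]
print(setup_shutdown_session(running))
```

Because the node is still powered on while its desired state is now empty, the mismatch is what leads to the powering-off phase.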
Status operator (powering-off)
The status operator will detect the enabled components and assign them a phase. This involves checking the current state of the node, including communicating with HSM to determine the current power status of the node.
In this example of shutting down nodes, the first phase is powering-off. If queried at this point, the nodes will have a status of power-off-pending.
For more on component phase and status, see Component Status.
Graceful-power-off operator
The graceful-power-off operator will detect nodes with a power-off-pending status and call PCS to power off the nodes.
Then, the graceful-power-off operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-off-gracefully-called.
Forceful-power-off operator
If powering off is taking too long, the forceful-power-off operator will take over. It also calls PCS to power off the node, but with the addition of the forceful flag.
Then, the forceful-power-off operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-off-forcefully-called.
PCS powers off nodes
PCS interfaces directly with the Redfish APIs and powers off the selected nodes.
Status operator (complete)
The status operator will continue monitoring the states for each node until HSM reports that the power state is off. The status operator will clear the current phase now that the node is in its final state. The status operator will also disable components at this point.
Session completion operator
When all nodes belonging to a session have been disabled, the session is marked complete, and its final status is saved to the database.