The following workflows present a high-level overview of common Boot Orchestration Service (BOS) operations. These workflows depict how services interact with each other when booting, configuring, or shutting down nodes. They also help provide a quicker and deeper understanding of how the system functions.
The following are mentioned in the workflows:
initrd, image root) and boot parameters.
The following workflows are included in this section:
Use case: Administrator powers on and configures select compute nodes.
BOS v2 boot flow diagram: The labels on the diagram correspond to the workflow steps listed below. Some steps are omitted from the diagram for readability.
Workflow overview: The following sequence of steps occurs during this workflow.
Administrator creates a configuration
Add a configuration to CFS. For more information on creating CFS configurations, see CFS Configurations.
(ncn-mw#)
cray cfs v3 configurations update sample-config --file configuration.json --format json
Example output:
{
  "last_updated": "2020-09-22T19:56:32Z",
  "layers": [
    {
      "clone_url": "https://api-gw-service-nmn.local/vcs/cray/configmanagement.git",
      "commit": "01b8083dd89c394675f3a6955914f344b90581e2",
      "playbook": "site.yaml"
    }
  ],
  "name": "sample-config"
}
Administrator creates a BOS session template
A session template is a collection of data specifying a group of nodes, as well as the boot artifacts and configuration that should be applied to them. A session template can be created from a JSON structure. It returns a session template ID if successful.
See Manage a session template for more information.
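As a rough illustration of such a JSON structure, the sketch below builds a session template body as a Python dictionary and serializes it. The field names (`boot_sets`, `node_roles_groups`, `path`, `type`, `cfs`, `enable_cfs`) are assumptions to verify against Manage a session template; the boot set name `compute` and the `IMS_IMAGE_ID` placeholder are hypothetical.

```python
import json


def build_session_template(configuration_name: str, image_manifest_path: str) -> dict:
    """Return a minimal session template body (field names are assumptions)."""
    return {
        "boot_sets": {
            # "compute" is a hypothetical boot set name chosen for this sketch.
            "compute": {
                "node_roles_groups": ["Compute"],  # which nodes the boot set targets
                "path": image_manifest_path,       # where the boot artifacts live
                "type": "s3",
            }
        },
        # Configuration to apply after boot; matches the CFS example above.
        "cfs": {"configuration": configuration_name},
        "enable_cfs": True,
    }


template = build_session_template(
    "sample-config", "s3://boot-images/IMS_IMAGE_ID/manifest.json"
)
print(json.dumps(template, indent=2))
```

The printed JSON could be saved to a file and used as the input when creating the template.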
Administrator creates a session
Create a session to perform the operation specified in the operation request parameter on the boot set defined in the session template. For this use case, the administrator creates a session with operation as boot and specifies the session template ID.
(ncn-mw#)
cray bos v2 sessions create --template-name SESSIONTEMPLATE_NAME --operation boot
Session setup operator
The creation of a session causes the session-setup operator to set a desired state on all components listed in the session template.
This includes pulling files from S3 to determine boot artifacts like kernel, initrd, and root file system. The session setup operator also enables the relevant components at this time.
Status operator (powering-on)
The status operator will detect the enabled components and assign them a phase. This involves checking the current state of the node, including communicating with HSM to determine the current power status of the node.
In this example of booting nodes, the first phase is powering-on. If queried at this point, the nodes will have a status of power-on-pending.
For more on component phase and status, see Component Status.
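The relationship between phase, last action, and status described in these workflows can be sketched as a small function. This is a simplified model, not the BOS implementation: the status strings are the ones named on this page, while the last-action labels (`power-on`, `power-off-gracefully`, `power-off-forcefully`) are hypothetical names chosen for the example.

```python
from typing import Optional


def derive_status(phase: str, last_action: Optional[str]) -> str:
    """Map a component's phase and last action to a status string."""
    if phase == "powering-on":
        # Pending until the power-on operator has acted during this phase.
        return "power-on-called" if last_action == "power-on" else "power-on-pending"
    if phase == "powering-off":
        if last_action == "power-off-gracefully":
            return "power-off-gracefully-called"
        if last_action == "power-off-forcefully":
            return "power-off-forcefully-called"
        return "power-off-pending"
    raise ValueError(f"phase not covered by this sketch: {phase!r}")


print(derive_status("powering-on", None))
print(derive_status("powering-on", "power-on"))
```

Under this model, a node that has just finished a forceful power-off during a reboot and enters the powering-on phase reports power-on-pending, because its last action does not belong to the new phase.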
Power-on operator
The power-on operator will detect nodes with a power-on-pending status. The power-on operator first sets the desired boot artifacts in BSS.
If configuration is enabled for the node, the power-on operator will also call CFS to set the desired configuration and disable the node with CFS.
The node must be disabled within CFS so that CFS does not try to configure the node until it has booted.
The power-on operator then calls CAPMC to power on the node.
Lastly, the power-on operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-on-called.
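The ordering of the power-on operator's calls matters, so the following sketch records them in sequence. It is an illustrative model only: `CallLog`, `power_on_node`, and the action labels are hypothetical and do not correspond to real BSS, CFS, CAPMC, or BOS client APIs.

```python
from typing import List, Tuple


class CallLog:
    """Records (service, action, node) tuples in the order they are made."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str]] = []

    def record(self, service: str, action: str, node: str) -> None:
        self.calls.append((service, action, node))


def power_on_node(node: str, configuration_enabled: bool, log: CallLog) -> str:
    """Walk through the power-on steps for one node and return its new status."""
    log.record("BSS", "set-boot-artifacts", node)
    if configuration_enabled:
        log.record("CFS", "set-desired-configuration", node)
        # Disabled in CFS so configuration does not start before the node boots.
        log.record("CFS", "disable-node", node)
    log.record("CAPMC", "power-on", node)
    log.record("BOS", "set-last-action", node)
    return "power-on-called"


log = CallLog()
status = power_on_node("x3000c0s1b0n0", configuration_enabled=True, log=log)
print(status, [service for service, _, _ in log.calls])
```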
CAPMC boots nodes
CAPMC interfaces directly with the Redfish APIs and powers on the selected nodes.
BSS interacts with the nodes
BSS generates iPXE boot scripts based on the image content and boot parameters that have been assigned to a node. Nodes download the iPXE boot script from BSS.
Nodes request boot artifacts from S3
Nodes download the boot artifacts. The nodes boot using the boot artifacts pulled from S3.
Status operator (configuring)
The status operator monitors a node’s power state until HSM reports that the power state is on.
When the power state for a node is on, the status operator will either set the phase to configuring if CFS configuration is required or it will clear the current phase if the node is in its final state.
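The decision the status operator makes at this step can be sketched as follows. The function name and the use of `None` to represent a cleared phase are assumptions made for illustration.

```python
from typing import Optional


def phase_after_power_check(power_state: str, configuration_required: bool) -> Optional[str]:
    """Return the next phase for a node being booted, or None once the phase is cleared."""
    if power_state != "on":
        # HSM has not yet reported the node as on; keep waiting in powering-on.
        return "powering-on"
    if configuration_required:
        return "configuring"
    # Powered on and nothing left to configure: the node is in its final state.
    return None


print(phase_after_power_check("off", True))
print(phase_after_power_check("on", True))
print(phase_after_power_check("on", False))
```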
CFS applies configuration
If needed, CFS runs Ansible on the nodes and applies post-boot configuration (also called node personalization).
Status operator (complete)
The status operator will continue monitoring the states for each node until CFS reports that configuration is complete. The status operator will clear the current phase now that the node is in its final state. The status operator will also disable components at this point.
Session completion operator
When all nodes belonging to a session have been disabled, the session is marked complete, and its final status is saved to the database.
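The completion rule reduces to a single check over the session's components. In this sketch the mapping from node name to an enabled flag is a hypothetical in-memory stand-in for the component records that BOS stores.

```python
from typing import Dict


def session_is_complete(enabled_by_node: Dict[str, bool]) -> bool:
    """A session is complete once none of its nodes is still enabled."""
    return not any(enabled_by_node.values())


in_progress = {"x3000c0s1b0n0": False, "x3000c0s1b0n1": True}
finished = {"x3000c0s1b0n0": False, "x3000c0s1b0n1": False}
print(session_is_complete(in_progress), session_is_complete(finished))
```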
Use case: Administrator reboots and configures select compute nodes.
BOS v2 reboot flow diagram: The labels on the diagram correspond to the workflow steps listed below. Some steps are omitted from the diagram for readability.
Workflow overview: The following sequence of steps occurs during this workflow.
Administrator creates a configuration
Add a configuration to CFS. For more information on creating CFS configurations, see CFS Configurations.
(ncn-mw#)
cray cfs v3 configurations update sample-config --file configuration.json --format json
Example output:
{
  "last_updated": "2020-09-22T19:56:32Z",
  "layers": [
    {
      "clone_url": "https://api-gw-service-nmn.local/vcs/cray/configmanagement.git",
      "commit": "01b8083dd89c394675f3a6955914f344b90581e2",
      "playbook": "site.yaml"
    }
  ],
  "name": "sample-config"
}
Administrator creates a BOS session template
A session template is a collection of data specifying a group of nodes, as well as the boot artifacts and configuration that should be applied to them. A session template can be created from a JSON structure. It returns a session template ID if successful.
See Manage a session template for more information.
Administrator creates a session
Create a session to perform the operation specified in the operation request parameter on the boot set defined in the session template. For this use case, the administrator creates a session with operation as reboot and specifies the session template ID.
(ncn-mw#)
cray bos v2 sessions create --template-name SESSIONTEMPLATE_NAME --operation reboot
Session setup operator
The creation of a session causes the session-setup operator to set a desired state on all components listed in the session template.
This includes pulling files from S3 to determine boot artifacts like kernel, initrd, and root file system. The session setup operator also enables the relevant components at this time.
Status operator (powering-off)
The status operator will detect the enabled components and assign them a phase. This involves checking the current state of the node, including communicating with HSM to determine the current power status of the node.
In this example of rebooting nodes, the first phase is powering-off. If queried at this point, the nodes will have a status of power-off-pending.
For more on component phase and status, see Component Status.
Graceful-power-off operator
The graceful-power-off operator will detect nodes with a power-off-pending status and call CAPMC to power off the node.
Then, the graceful-power-off operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-off-gracefully-called.
Forceful-power-off operator
If powering off is taking too long, the forceful-power-off operator will take over. It also calls CAPMC to power off the node, but with the addition of the forceful flag.
Then, the forceful-power-off operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-off-forcefully-called.
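The hand-off from the graceful to the forceful power-off operator can be modeled as a timeout check. This is a sketch under stated assumptions: the threshold parameter `max_graceful_wait_seconds` and the return labels are hypothetical, and the text above does not specify how long "too long" is.

```python
from typing import Optional


def next_power_off_action(
    status: str,
    seconds_since_graceful_call: float,
    max_graceful_wait_seconds: float,
) -> Optional[str]:
    """Return 'graceful', 'forceful', or None when no new power-off call is due."""
    if status == "power-off-pending":
        # Nothing has been attempted yet, so start with a graceful power-off.
        return "graceful"
    if (
        status == "power-off-gracefully-called"
        and seconds_since_graceful_call > max_graceful_wait_seconds
    ):
        # The graceful attempt has exceeded the allowed wait; escalate.
        return "forceful"
    return None


print(next_power_off_action("power-off-pending", 0, 300))
print(next_power_off_action("power-off-gracefully-called", 120, 300))
print(next_power_off_action("power-off-gracefully-called", 301, 300))
```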
CAPMC powers off nodes
CAPMC interfaces directly with the Redfish APIs and powers off the selected nodes.
Status operator (powering-on)
The status operator monitors a node’s power state until HSM reports that the power state is off.
When the power state for a node is off, the status operator will set the phase to powering-on. If queried at this point, the nodes will have a status of power-on-pending.
Power-on operator
The power-on operator will detect nodes with a power-on-pending status. The power-on operator first sets the desired boot artifacts in BSS.
If configuration is enabled for the node, the power-on operator will also call CFS to set the desired configuration and disable the node with CFS.
The node must be disabled within CFS so that CFS does not try to configure the node until it has booted.
The power-on operator then calls CAPMC to power on the node.
Lastly, the power-on operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-on-called.
CAPMC boots nodes
CAPMC interfaces directly with the Redfish APIs and powers on the selected nodes.
BSS interacts with the nodes
BSS generates iPXE boot scripts based on the image content and boot parameters that have been assigned to a node. Nodes download the iPXE boot script from BSS.
Nodes request boot artifacts from S3
Nodes download the boot artifacts. The nodes boot using the boot artifacts pulled from S3.
Status operator (configuring)
The status operator monitors a node’s power state until HSM reports that the power state is on.
When the power state for a node is on, the status operator will either set the phase to configuring if CFS configuration is required or it will clear the current phase if the node is in its final state.
CFS applies configuration
If needed, CFS runs Ansible on the nodes and applies post-boot configuration (also called node personalization).
Status operator (complete)
The status operator will continue monitoring the states for each node until CFS reports that configuration is complete. The status operator will clear the current phase now that the node is in its final state. The status operator will also disable components at this point.
Session completion operator
When all nodes belonging to a session have been disabled, the session is marked complete, and its final status is saved to the database.
Use case: Administrator powers off selected compute nodes.
BOS v2 shutdown flow diagram: The labels on the diagram correspond to the workflow steps listed below. Some steps are omitted from the diagram for readability.
Workflow overview: The following sequence of steps occurs during this workflow.
Administrator creates a BOS session template
A session template is a collection of data specifying a group of nodes, as well as the boot artifacts and configuration that should be applied to them. A session template can be created from a JSON structure. It returns a session template ID if successful.
See Manage a session template for more information.
Administrator creates a session
Create a session to perform the operation specified in the operation request parameter on the boot set defined in the session template. For this use case, the administrator creates a session with operation as shutdown and specifies the session template ID.
(ncn-mw#)
cray bos v2 sessions create --template-name SESSIONTEMPLATE_NAME --operation shutdown
Session setup operator
The creation of a session causes the session-setup operator to set a desired state on all components listed in the session template. For a power-off, this means clearing the desired state for each component. The session setup operator also enables the relevant components at this time.
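How session setup differs between a shutdown and a boot or reboot can be sketched as below. The record layout (`desired_state`, `enabled`) and the function name are hypothetical; the sketch only models the rule that a shutdown clears the desired state while other operations set it from the template.

```python
from typing import Dict, Iterable


def set_up_session(
    operation: str, nodes: Iterable[str], boot_artifacts: Dict[str, str]
) -> Dict[str, dict]:
    """Return per-node records with the desired state set and the node enabled."""
    # A shutdown clears the desired state; boot and reboot copy the artifacts in.
    desired = {} if operation == "shutdown" else dict(boot_artifacts)
    return {
        node: {"desired_state": dict(desired), "enabled": True} for node in nodes
    }


# Placeholder artifact values, for illustration only.
artifacts = {"kernel": "KERNEL_PATH", "initrd": "INITRD_PATH", "rootfs": "ROOTFS_PATH"}
print(set_up_session("shutdown", ["x3000c0s1b0n0"], artifacts))
print(set_up_session("boot", ["x3000c0s1b0n0"], artifacts))
```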
Status operator (powering-off)
The status operator will detect the enabled components and assign them a phase. This involves checking the current state of the node, including communicating with HSM to determine the current power status of the node.
In this example of shutting down nodes, the first phase is powering-off. If queried at this point, the nodes will have a status of power-off-pending.
For more on component phase and status, see Component Status.
Graceful-power-off operator
The graceful-power-off operator will detect nodes with a power-off-pending status and call CAPMC to power off the node.
Then, the graceful-power-off operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-off-gracefully-called.
Forceful-power-off operator
If powering off is taking too long, the forceful-power-off operator will take over. It also calls CAPMC to power off the node, but with the addition of the forceful flag.
Then, the forceful-power-off operator will update the state of the node in BOS, including setting the last action. If queried at this point, the nodes will have a status of power-off-forcefully-called.
CAPMC powers off nodes
CAPMC interfaces directly with the Redfish APIs and powers off the selected nodes.
Status operator (complete)
The status operator monitors a node’s power state until HSM reports that the power state is off. The status operator will clear the current phase now that the node is in its final state. The status operator will also disable components at this point.
Session completion operator
When all nodes belonging to a session have been disabled, the session is marked complete, and its final status is saved to the database.
The following workflows are included in this section:
Use case: Administrator powers on and configures select compute nodes.
Components: This workflow is based on the interaction of the BOS with other services during the boot process:
Workflow overview: The following sequence of steps occurs during this workflow.
Administrator creates a configuration
Add a configuration to CFS. See CFS Configurations for more information.
(ncn-mw#)
cray cfs v3 configurations update sample-config --file configuration.json --format json
Example output:
{
  "last_updated": "2020-09-22T19:56:32Z",
  "layers": [
    {
      "clone_url": "https://api-gw-service-nmn.local/vcs/cray/configmanagement.git",
      "commit": "01b8083dd89c394675f3a6955914f344b90581e2",
      "playbook": "site.yaml"
    }
  ],
  "name": "sample-config"
}
Administrator creates a BOS session template
A session template is a collection of data specifying a group of nodes, as well as the boot artifacts and configuration that should be applied to them. A session template can be created from a JSON structure. It returns a session template ID if successful.
See Manage a session template for more information.
Administrator creates a session
Create a session to perform the operation specified in the operation request parameter on the boot set defined in the session template.
For this use case, the administrator creates a session with operation as boot and specifies the session template ID. The set of allowed operations is:
boot – Boot nodes that are powered off
configure – Reconfigure the nodes using the Configuration Framework Service (CFS)
reboot – Gracefully power down nodes that are on and then power them back up
shutdown – Gracefully power down nodes that are on
(ncn-mw#)
cray bos v1 session create --template-name SESSIONTEMPLATE_NAME --operation boot
Launch BOA
The creation of a session results in the creation of a Kubernetes BOA job to complete the operation. BOA coordinates with other services to complete the requested operation.
BOA to HSM
BOA coordinates with HSM to validate node group and node status.
BOA to S3
BOA coordinates with S3 to verify boot artifacts like kernel, initrd, and root file system.
BOA to BSS
BOA updates BSS with boot artifacts and kernel parameters for each node.
BOA to CAPMC
BOA coordinates with CAPMC to power on the nodes.
CAPMC boots nodes
CAPMC interfaces directly with the Redfish APIs and powers on the selected nodes.
BSS interacts with the nodes
BSS generates iPXE boot scripts based on the image content and boot parameters that have been assigned to a node. Nodes download the iPXE boot script from BSS.
Nodes request boot artifacts from S3
Nodes download the boot artifacts from S3. The nodes boot using these artifacts.
BOA to HSM
BOA waits for the nodes to boot up and be accessible via SSH. This can take up to 30 minutes. BOA coordinates with HSM to ensure that nodes are booted and Ansible can SSH to them.
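Waiting of this kind is typically a poll-until-deadline loop. The following is a generic sketch, not BOA's code: `wait_for_nodes`, the `is_ready` callback, and the 10-second polling interval are hypothetical, and only the 30-minute upper bound comes from the text above.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_nodes(
    nodes: Iterable[str],
    is_ready: Callable[[str], bool],
    timeout_seconds: float = 30 * 60,
    poll_seconds: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Poll until every node is ready or the deadline passes.

    Returns the nodes that are still not ready (an empty set on success).
    """
    pending = set(nodes)
    deadline = clock() + timeout_seconds
    while pending:
        # Keep only the nodes that still fail the readiness check.
        pending = {node for node in pending if not is_ready(node)}
        if not pending or clock() >= deadline:
            break
        sleep(poll_seconds)
    return pending


# Trivial usage: every node reports ready on the first check.
print(wait_for_nodes(["x3000c0s1b0n0"], lambda node: True))
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real delays; in practice `is_ready` would wrap whatever check confirms that a node is booted and reachable over SSH.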
BOA to CFS
BOA directs CFS to apply post-boot configuration.
CFS applies configuration
CFS runs Ansible on the nodes and applies post-boot configuration (also called node personalization). CFS then communicates the results back to BOA.
Use case: Administrator reconfigures compute nodes that are already booted and configured.
Components: This workflow is based on the interaction of the BOS with other services during the reconfiguration process.
Workflow overview: The following sequence of steps occurs during this workflow.
Administrator creates a BOS session template
A session template is a collection of data specifying a group of nodes, as well as the boot artifacts and configuration that should be applied to them. A session template can be created from a JSON structure. It returns a session template ID if successful.
See Manage a session template for more information.
Administrator creates a session
Create a session to perform the operation specified in the operation request parameter on the boot set defined in the session template.
For this use case, the administrator creates a session with operation as configure and specifies the session template ID. The set of allowed operations is:
boot – Boot nodes that are powered off
configure – Reconfigure the nodes using the Configuration Framework Service (CFS)
reboot – Gracefully power down nodes that are on and then power them back up
shutdown – Gracefully power down nodes that are on
(ncn-mw#)
cray bos v1 session create --template-name SESSIONTEMPLATE_NAME --operation configure
Launch BOA
The creation of a session results in the creation of a Kubernetes BOA job to complete the operation. BOA coordinates with the underlying subsystem to complete the requested operation.
BOA to HSM
BOA coordinates with HSM to validate node group and node status.
BOA to CFS
BOA directs CFS to apply post-boot configuration.
CFS applies configuration
CFS runs Ansible on the nodes and applies post-boot configuration (also called node personalization).
CFS to BOA
CFS then communicates the results back to BOA.
Use case: Administrator powers off selected compute nodes.
Components: This workflow is based on the interaction of the Boot Orchestration Service (BOS) with other services during the node shutdown process:
Workflow overview: The following sequence of steps occurs during this workflow.
Administrator creates a BOS session template
A session template is a collection of data specifying a group of nodes, as well as the boot artifacts and configuration that should be applied to them. A session template can be created from a JSON structure. It returns a session template ID if successful.
See Manage a session template for more information.
Administrator creates a session
Create a session to perform the operation specified in the operation request parameter on the boot set defined in the session template.
For this use case, the administrator creates a session with operation as shutdown and specifies the session template ID. The set of allowed operations is:
boot – Boot nodes that are powered off
configure – Reconfigure the nodes using the Configuration Framework Service (CFS)
reboot – Gracefully power down nodes that are on and then power them back up
shutdown – Gracefully power down nodes that are on
(ncn-mw#)
cray bos v1 session create --template-name SESSIONTEMPLATE_NAME --operation shutdown
Launch BOA
The creation of a session results in the creation of a Kubernetes BOA job to complete the operation. BOA coordinates with the underlying subsystem to complete the requested operation.
BOA to HSM
BOA coordinates with HSM to validate node group and node status.
BOA to CAPMC
BOA directs CAPMC to power off the nodes.
CAPMC to the nodes
CAPMC interfaces directly with the Redfish APIs and powers off the selected nodes.
CAPMC to BOA
CAPMC communicates the results back to BOA.