BOS Sessions

The Boot Orchestration Service (BOS) creates a session when it is asked to perform an operation on a session template. Sessions provide a way to track the status of many nodes at once as they perform the same operation with the same session template information. When creating a session, both the operation and session template are required parameters.

The v1 version of BOS supports these operations:

  • Boot - Boot a designated collection of nodes.
  • Shutdown - Shut down a designated collection of nodes.
  • Reboot - Reboot a designated collection of nodes.
  • Configure - Configure a designated collection of booted nodes.

BOS sessions can be used to boot compute nodes with customized image roots.

A session requires two parameters: a session template ID and an operation to perform on that template. The session endpoint of the BOS API lists all sessions that have been created, both previous and currently running. When a specific session ID is provided as a parameter, the endpoint returns the details of that session. Sessions can also be deleted through the API.
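As a minimal sketch of the two required parameters, the following Python validates an operation and builds a request body for creating a session. The helper name `build_session_request` and the field names `operation` and `templateUuid` are assumptions made for illustration; check the BOS API specification for the actual schema before relying on them.

```python
import json

# The four operations this document lists for BOS v1.
V1_OPERATIONS = ("boot", "shutdown", "reboot", "configure")


def build_session_request(template_id, operation):
    """Return a JSON body for a hypothetical session-creation request."""
    op = operation.strip().lower()
    if op not in V1_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    if not template_id:
        raise ValueError("a session template ID is required")
    # Field names are assumptions, not the confirmed BOS schema.
    return json.dumps({"operation": op, "templateUuid": template_id}, sort_keys=True)
```

Rejecting a missing template ID or an unknown operation on the client side mirrors the rule that both parameters are required.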

BOS provides a RESTful API. The API can be used directly with tools like cURL, or through the Cray Command Line Interface (CLI). See Manage a BOS Session for more information.

BOA functionality

The Boot Orchestration Agent (BOA) implements each session and sees it through to completion. BOA runs as a Kubernetes job that executes once to completion. If there are transient failures, BOA will exit and Kubernetes will reschedule it so that it can re-execute its session.
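The run-to-completion pattern described above can be sketched as follows. This is not BOA's source code: `TransientError` and `run_session` are hypothetical names, and the sketch only assumes the general Kubernetes job behavior that a container exiting with a non-zero status may be run again, subject to the job's retry settings.

```python
class TransientError(Exception):
    """Hypothetical marker for a failure that is worth retrying."""


def run_session(execute):
    """Run a session once and return a process exit code.

    A non-zero code signals failure so the job can be rescheduled;
    zero means the session ran to completion.
    """
    try:
        execute()
    except TransientError as err:
        print(f"transient failure, exiting so the job can be rescheduled: {err}")
        return 1
    return 0
```

A job entry point would typically pass the result to `sys.exit`, so that the exit status reflects whether the session completed.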

BOA moves nodes towards the requested state. If a node fails during any of the intermediate steps, BOA records the failure and continues with the remaining nodes. BOA then provides a command in its log output that can be used to retry the action on the failed nodes. This behavior applies to all BOS operations.

For example, if 3 nodes on a 6,000-node system fail to power off during a BOS operation, BOA will continue and attempt to re-provision the remaining 5,997 nodes. After the operation is finished, BOA will provide information about what the administrator needs to do in order to retry the operation on the 3 nodes that failed.
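The example above can be modeled with a short sketch that applies one step to every node, keeps going past failures, and reports the nodes that need a retry. The function names, node names, and the format of the retry message are all hypothetical; BOA's real log output will differ.

```python
def run_step(nodes, action):
    """Apply one step to every node; return (succeeded, failed) lists."""
    succeeded, failed = [], []
    for node in nodes:
        # A failure is recorded, not raised, so the other nodes still proceed.
        (succeeded if action(node) else failed).append(node)
    return succeeded, failed


def retry_hint(operation, failed):
    """Format a hypothetical retry message (not BOA's actual log text)."""
    return f"retry '{operation}' on: {','.join(failed)}"


# Model a 6,000-node system in which 3 nodes fail to power off.
nodes = [f"node{n}" for n in range(6000)]
broken = {"node5", "node17", "node4000"}
ok, bad = run_step(nodes, lambda node: node not in broken)
print(len(ok), "nodes continue;", retry_hint("shutdown", bad))
```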

BOS v1 session limitations

The following limitations currently exist with BOS sessions:

  • No checking is done to prevent the launch of multiple sessions with overlapping lists of nodes. Concurrently running sessions may conflict with each other.
  • The boot ordinal and shutdown ordinal are not honored.
  • The partition parameter is not honored.
  • All nodes proceed at the same pace. BOA will not move on to the next step of the boot process until all components have succeeded or failed the current step.
  • The Configuration Framework Service (CFS) has its own limitations. Refer to the Configuration Management documentation for more information.
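Because BOS does not check for overlapping node lists, an administrator may want to compare the node sets of planned sessions before launching them. The following is a minimal sketch of such a check; `find_overlaps` and the session and node names are hypothetical, and the node sets are assumed to have been gathered by the administrator rather than returned by any BOS call.

```python
from itertools import combinations


def find_overlaps(sessions):
    """Map each pair of session names to the nodes they share.

    `sessions` maps a session name to the set of nodes it targets.
    Pairs with no nodes in common are omitted from the result.
    """
    overlaps = {}
    for first, second in combinations(sorted(sessions), 2):
        shared = sessions[first] & sessions[second]
        if shared:
            overlaps[(first, second)] = shared
    return overlaps


planned = {
    "session-a": {"x1", "x2"},
    "session-b": {"x2", "x3"},
    "session-c": {"x4"},
}
conflicts = find_overlaps(planned)
```

An empty result means no two planned sessions target the same node, so they should not conflict through shared nodes.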