BOS operators are a feature of BOS v2 only.
BOS v2 has many different operators. Each operator is responsible for doing a single basic task – for example, powering on nodes, discovering new nodes on the system and creating BOS components for them, or initializing a pending BOS v2 session.
The work for a BOS v2 session is done by several different operators, all acting independently of each other. These operators are always running, even when no v2 session is actively underway. This is a big difference from BOS v1, where each session created a new dedicated Kubernetes job that handled everything associated with that v1 session.
Each operator follows the same basic execution loop:
Some BOS Options apply only to specific operators – these are noted in the relevant operator descriptions in the Operator list. The following BOS options apply to operators generally:
logging_level
polling_frequency (this does not apply to the discovery operator, which instead uses the discovery_frequency option for its sleep interval)
max_component_batch_size
bss_read_timeout, cfs_read_timeout, hsm_read_timeout, and pcs_read_timeout
See Options for more information.
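As an illustration of how polling_frequency and max_component_batch_size might shape an operator's loop, the following Python sketch repeatedly fetches components, processes them in batches, and sleeps between passes. It is not the BOS implementation: `run_operator`, `batches`, `fetch_components`, `act_on`, and `max_iterations` are hypothetical names, and treating polling_frequency as a sleep interval in seconds and max_component_batch_size as an upper bound on components per call is an assumption.

```python
import time
from typing import Callable, Iterator, List


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items (`size` must be positive)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_operator(
    fetch_components: Callable[[], List[str]],  # hypothetical: returns relevant component IDs
    act_on: Callable[[List[str]], None],        # hypothetical: performs the operator's single task
    polling_frequency: float,                   # assumed: seconds to sleep between passes
    max_component_batch_size: int,              # assumed: maximum components handled per call
    max_iterations: int,                        # illustration only; a real operator never stops
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    for _ in range(max_iterations):
        components = fetch_components()
        for batch in batches(components, max_component_batch_size):
            act_on(batch)
        # Sleep even when there was nothing to do; operators keep running
        # whether or not a session is underway.
        sleep(polling_frequency)
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.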
(ncn-mw#) The BOS operators run in Kubernetes pods in the services namespace.
kubectl get pods -n services | grep '^cray-bos-operator-'
Example output:
cray-bos-operator-actual-state-cleanup-596dc4766c-xsdg2 2/2 Running 0 50d
cray-bos-operator-configuration-865f95f7d7-2j2tf 2/2 Running 0 50d
cray-bos-operator-discovery-698b44f9f9-fhs9c 2/2 Running 0 50d
cray-bos-operator-power-off-forceful-666f76c98f-mx5vb 2/2 Running 0 50d
cray-bos-operator-power-off-graceful-6489689c99-wch2p 2/2 Running 0 50d
cray-bos-operator-power-on-7d778c67cc-8vbrj 2/2 Running 0 50d
cray-bos-operator-session-cleanup-68c4cdbcc-qfvvr 2/2 Running 0 50d
cray-bos-operator-session-completion-756b4ddfb5-584bk 2/2 Running 0 50d
cray-bos-operator-session-setup-654544c589-9t9rr 2/2 Running 0 50d
cray-bos-operator-status-7665867877-gcj59 2/2 Running 0 50d
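Output in this format can also be checked programmatically. The following Python sketch maps each operator name to the pod status reported in the third column. The function name `operator_pod_status` is hypothetical, and the parsing assumes the column layout shown in the example output above and pod names that end in two generated suffixes.

```python
from typing import Dict


def operator_pod_status(kubectl_output: str) -> Dict[str, str]:
    """Map operator name -> pod STATUS from `kubectl get pods` text output."""
    prefix = "cray-bos-operator-"
    status: Dict[str, str] = {}
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines, headers, and pods that are not BOS operators.
        if len(fields) < 3 or not fields[0].startswith(prefix):
            continue
        # Drop the prefix and the two trailing generated suffixes,
        # e.g. "power-on-7d778c67cc-8vbrj" -> "power-on".
        operator = fields[0][len(prefix):].rsplit("-", 2)[0]
        status[operator] = fields[2]
    return status
```

Any operator whose value is not `Running` is a candidate for closer inspection.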
actual-state-cleanup
configuration
discovery
power-off-forceful
power-off-graceful
power-on
session-cleanup
session-completion
session-setup
status

actual-state-cleanup
This operator clears the actual_state field for components when the field has not been updated within a specified time.
This ensures that BOS keeps accurate information on the state of all components.
The time limit is controlled by the component_actual_state_ttl option.
WARNING: Unlike the cleanup_completed_session_ttl option, a zero value for the component_actual_state_ttl option will not disable the cleanup behavior. For details, see component_actual_state_ttl.
configuration
This operator is responsible for setting the desired configuration in the Configuration Framework Service (CFS) for components that are in the configuring phase of the boot process.
Because the power-on operator sets the desired configuration prior to booting components,
this is typically only needed when booting to the same boot artifacts, but with a different configuration.
discovery
This operator checks the Hardware State Manager (HSM) to discover new nodes. If any are found, it creates BOS component records for them.
For its execution loop, this operator sets its sleep interval to the
discovery_frequency option. See Options for more information.
power-off-forceful
This operator calls the Power Control Service (PCS) to forcefully power off components when a previous power-off action has failed to power them off.
power-off-graceful
This operator calls PCS to gracefully power off components that have a power-off-pending status.
power-on
This operator calls PCS to power on components that have a power-on-pending status.
session-cleanup
This operator deletes v2 sessions from BOS that are older than a specified age.
The age is controlled by the cleanup_completed_session_ttl option.
If that option has a zero value, then this cleanup behavior is disabled.
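The age check, including the zero-value case, can be sketched in Python as follows. This is an illustration under assumptions, not BOS code: `sessions_to_delete` is a hypothetical name, session age is assumed to be measured from a completion timestamp, and the TTL is represented as a `timedelta` rather than in the format BOS uses for the option value.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def sessions_to_delete(
    completed_at: Dict[str, datetime],  # hypothetical: session name -> completion time
    ttl: timedelta,                     # stands in for cleanup_completed_session_ttl
    now: datetime,
) -> List[str]:
    """Return the names of sessions older than the TTL, in sorted order."""
    if ttl == timedelta(0):
        # A zero value disables session cleanup entirely.
        return []
    return sorted(
        name for name, finished in completed_at.items() if now - finished > ttl
    )
```

Note that this zero-disables rule is specific to session cleanup; per the warning above, it does not carry over to component_actual_state_ttl.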
session-completion
This operator marks v2 sessions as complete and saves a final status for the session. This happens when all components that a v2 session is responsible for have been disabled.
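The completion condition amounts to a check across all of a session's components. A minimal Python sketch, assuming a hypothetical mapping from component name to its enabled flag (the real component records carry more fields):

```python
from typing import Dict


def session_is_complete(component_enabled: Dict[str, bool]) -> bool:
    """True when no component belonging to the session is still enabled."""
    # Note: in this sketch, a session with no components counts as complete.
    return not any(component_enabled.values())
```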
session-setup
This operator monitors for pending v2 sessions and moves them into the running state. It uses the session template and the session limit (if any) to determine the target components for the session. It uses the session template to determine the appropriate boot artifacts and (optionally) CFS configuration. It then updates the target components with the desired target state, boot artifacts, and configuration.
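The target selection described above can be sketched in Python. This is a simplified illustration, not the operator's actual logic: `plan_session` and its parameters are hypothetical, the limit is assumed to be a plain list of component names (the real limit syntax may be richer), and the desired state is reduced to boot artifacts plus an optional configuration name.

```python
from typing import Dict, Iterable, List, Optional


def plan_session(
    template_nodes: List[str],           # components named by the session template
    boot_artifacts: Dict[str, str],      # artifacts taken from the session template
    configuration: Optional[str],        # optional CFS configuration name
    limit: Optional[Iterable[str]] = None,
) -> Dict[str, dict]:
    """Return the desired state to record for each target component."""
    limit_set = None if limit is None else set(limit)
    # With no limit, every template node is a target; otherwise only
    # template nodes that also appear in the limit are targeted.
    targets = [n for n in template_nodes if limit_set is None or n in limit_set]
    return {
        node: {"boot_artifacts": dict(boot_artifacts), "configuration": configuration}
        for node in targets
    }
```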
Related: BOS v2 sessions and HSM locks.
status
This operator is the workhorse that updates the state of BOS components. For each component that is enabled in BOS, the status operator uses the following information to determine the correct state for the component:
This determination is also impacted by the following options:
default_retry_policy
max_boot_wait_time
max_power_off_wait_time
max_power_on_wait_time
disable_components_on_completion
The source for the BOS operators is located in the
Cray-HPE/bos open source GitHub repository.