BOS provides a global service options endpoint for modifying the base configuration of the service itself.
- bss_read_timeout
- cfs_read_timeout
- cleanup_completed_session_ttl
- clear_stage
- component_actual_state_ttl
- default_retry_policy
- discovery_frequency
- hsm_read_timeout
- ims_errors_fatal
- ims_images_must_exist
- ims_read_timeout
- logging_level
- max_boot_wait_time
- max_component_batch_size
- max_power_off_wait_time
- max_power_on_wait_time
- pcs_read_timeout
- polling_frequency
- reject_nids
- session_limit_required
(ncn-mw#) View the current option values with the following command:
cray bos v2 options list --format json
Example output:
{
  "bss_read_timeout": 20,
  "cfs_read_timeout": 20,
  "cleanup_completed_session_ttl": "7d",
  "clear_stage": false,
  "component_actual_state_ttl": "4h",
  "default_retry_policy": 3,
  "discovery_frequency": 300,
  "hsm_read_timeout": 20,
  "ims_errors_fatal": false,
  "ims_images_must_exist": false,
  "ims_read_timeout": 20,
  "logging_level": "DEBUG",
  "max_boot_wait_time": 1200,
  "max_component_batch_size": 1800,
  "max_power_off_wait_time": 300,
  "max_power_on_wait_time": 120,
  "pcs_read_timeout": 20,
  "polling_frequency": 15,
  "reject_nids": false,
  "session_limit_required": false
}
(ncn-mw#) The values for all BOS global options can be modified with the cray bos v2 options update command.
For example:
cray bos v2 options update --polling-frequency 12 --format json
Example output:
{
  "bss_read_timeout": 20,
  "cfs_read_timeout": 20,
  "cleanup_completed_session_ttl": "7d",
  "clear_stage": false,
  "component_actual_state_ttl": "4h",
  "default_retry_policy": 3,
  "discovery_frequency": 300,
  "hsm_read_timeout": 20,
  "ims_errors_fatal": false,
  "ims_images_must_exist": false,
  "ims_read_timeout": 20,
  "logging_level": "DEBUG",
  "max_boot_wait_time": 1200,
  "max_component_batch_size": 1800,
  "max_power_off_wait_time": 300,
  "max_power_on_wait_time": 120,
  "pcs_read_timeout": 20,
  "polling_frequency": 12,
  "reject_nids": false,
  "session_limit_required": false
}
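
When several options are updated at once, it can help to confirm exactly which values changed. The following is a minimal Python sketch, not part of BOS, that compares two saved copies of the JSON output. The function name `changed_options` is hypothetical, and the snapshots are abbreviated versions of the example outputs above.

```python
import json


def changed_options(before: dict, after: dict) -> dict:
    """Return {option: (old_value, new_value)} for every option that differs."""
    keys = before.keys() | after.keys()
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


# Abbreviated snapshots of the JSON printed before and after the update.
before = json.loads(
    '{"polling_frequency": 15, "logging_level": "DEBUG", "reject_nids": false}'
)
after = json.loads(
    '{"polling_frequency": 12, "logging_level": "DEBUG", "reject_nids": false}'
)

print(changed_options(before, after))  # {'polling_frequency': (15, 12)}
```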
bss_read_timeout
The amount of time in seconds that BOS will wait for API responses from the Boot Script Service (BSS). After this time, the request will time out. The default is 20 seconds.
cfs_read_timeout
The amount of time in seconds that BOS will wait for API responses from the Configuration Framework Service (CFS). After this time, the request will time out. The default is 20 seconds.
cleanup_completed_session_ttl
The amount of time that a completed BOS session can exist without being
cleaned up by the session-cleanup operator.
The value can either be 0 or else be a non-negative integer followed by a character indicating the
units: minutes (m or M), hours (h or H), days (d or D), or weeks (w or W).
For example, 3d means three days.
The cleanup behavior is disabled if the option is set to 0, 0m, 0h, 0d, or 0w.
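
The TTL format described above can be illustrated with a short parser. This is a sketch under the stated rules only, not the code BOS itself uses; the name `parse_ttl` and the choice to return `None` for a disabled cleanup are assumptions made for illustration.

```python
import re
from datetime import timedelta
from typing import Optional

# Unit suffixes accepted by the option, mapped to timedelta keyword names.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
_TTL_RE = re.compile(r"([0-9]+)([mhdw])", re.IGNORECASE)


def parse_ttl(value: str) -> Optional[timedelta]:
    """Convert a TTL string such as '3d' into a timedelta.

    Returns None when the value disables cleanup (0, 0m, 0h, 0d, or 0w).
    """
    if value == "0":
        return None
    match = _TTL_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid TTL value: {value!r}")
    amount = int(match.group(1))
    unit = match.group(2).lower()
    if amount == 0:
        return None  # a zero amount with any unit also disables cleanup
    return timedelta(**{_UNITS[unit]: amount})


print(parse_ttl("3d"))  # 3 days, 0:00:00
print(parse_ttl("0w"))  # None
```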
clear_stage
Allows staged information for BOS components to be cleared when the requested staging action has been started. Defaults to false.
For more information on staging, see Stage Changes with BOS.
component_actual_state_ttl
This option defines two things:
- How frequently the BOS reporter reports component status to BOS.
- The maximum amount of time for which a component's actual_state is considered valid; if the actual state was last
updated longer ago than this time, then the actual-state-cleanup operator
will clear the actual state of the component.
The value can either be 0 or else be a non-negative integer followed by a character indicating the
units: minutes (m or M), hours (h or H), days (d or D), or weeks (w or W).
For example, 3d means three days.
WARNING: Unlike cleanup_completed_session_ttl, a zero value for this option will not disable the cleanup behavior; instead it will result in undesirable behavior. Specifically, component actual states will be cleared every time the actual-state-cleanup operator runs, and the BOS reporter will have no pauses between reporting the component status to BOS. This will effectively render BOS unable to properly manage the nodes.
To avoid problems, never set this option to a value less than 1 hour.
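
A pre-flight check can catch unsafe values before they are applied. The sketch below is a hypothetical helper, not something BOS provides: it converts a candidate value to seconds and rejects anything below the 1 hour floor recommended above, including a bare 0.

```python
import re

_SECONDS_PER_UNIT = {"m": 60, "h": 3600, "d": 86400, "w": 604800}
MIN_SAFE_SECONDS = 3600  # the 1 hour minimum recommended in the warning


def is_safe_actual_state_ttl(value: str) -> bool:
    """Return True only for well-formed values of at least one hour."""
    match = re.fullmatch(r"([0-9]+)([mhdwMHDW])", value)
    if match is None:
        return False  # covers a bare "0" as well as malformed strings
    seconds = int(match.group(1)) * _SECONDS_PER_UNIT[match.group(2).lower()]
    return seconds >= MIN_SAFE_SECONDS


for candidate in ("4h", "60m", "30m", "0"):
    print(candidate, is_safe_actual_state_ttl(candidate))
```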
default_retry_policy
The default maximum number of attempts per node for failed actions.
discovery_frequency
The frequency with which BOS checks the Hardware State Manager (HSM) for new components and adds them to the BOS component database.
hsm_read_timeout
The amount of time in seconds that BOS will wait for API responses from HSM. After this time, the request will time out. The default is 20 seconds.
ims_errors_fatal
This option modifies how BOS behaves when validating the architecture of a boot image in a boot set. Specifically, this option comes into play when BOS needs data from the Image Management Service (IMS) in order to do this validation, but IMS is unreachable.
In the above situation, if this option is true, then the validation will fail. Otherwise, if the option is false, then a warning will be logged, but the validation will not fail for this reason.
This boot set validation happens when creating a session template, validating a session template, or creating a session.
ims_images_must_exist
This option modifies how BOS behaves when validating a boot set whose boot image appears to be from IMS. Specifically, this option comes into play when the image does not actually exist in IMS.
In the above situation, if this option is true, then the validation will fail. Otherwise, if the option is false, then a warning will be logged, but the validation will not fail for this reason.
Note: If ims_images_must_exist is true but ims_errors_fatal is false, then
a failure to determine whether or not an image is in IMS will NOT result in a fatal error.
This boot set validation happens when creating a session template, validating a session template, or creating a session.
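
The interaction between ims_errors_fatal and ims_images_must_exist can be summarized as a small decision function. This is a simplified model of the behavior described above, not the BOS implementation; the names `Outcome` and `image_check_outcome` are hypothetical, and `image_exists=None` is used here to represent the case where IMS could not be reached.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    OK = "ok"
    WARN = "warn"
    FAIL = "fail"


def image_check_outcome(
    image_exists: Optional[bool],
    ims_errors_fatal: bool,
    ims_images_must_exist: bool,
) -> Outcome:
    """Model the validation result for a boot image that appears to be from IMS."""
    if image_exists is None:
        # IMS was unreachable: only ims_errors_fatal decides the result.
        return Outcome.FAIL if ims_errors_fatal else Outcome.WARN
    if not image_exists:
        # IMS answered and the image is missing.
        return Outcome.FAIL if ims_images_must_exist else Outcome.WARN
    return Outcome.OK


# The case called out in the note: IMS unreachable, images must exist,
# but IMS errors are not fatal.
print(image_check_outcome(None, ims_errors_fatal=False, ims_images_must_exist=True))
```

In this model the last call yields a warning rather than a failure, matching the note above.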
ims_read_timeout
The amount of time in seconds that BOS will wait for API responses from IMS. After this time, the request will time out. The default is 20 seconds.
logging_level
The logging level for the BOS API server and the BOS Operators.
Valid values for this option are DEBUG, INFO, and WARN.
max_boot_wait_time
How long BOS will wait for a node to boot into a usable state before rebooting it again (in seconds).
max_component_batch_size
The maximum number of components that BOS will group together in a single API request it makes. This can be used to limit the load on other services by forcing BOS to break up its requests into smaller chunks.
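
The effect of a batch size limit can be pictured as simple chunking. The following sketch only illustrates the general idea and is not taken from the BOS source; the function name `batches` and the component names are hypothetical.

```python
from typing import Iterator, List, Sequence


def batches(components: Sequence[str], max_batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most max_batch_size components."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(components), max_batch_size):
        yield list(components[start:start + max_batch_size])


# Hypothetical component names; five components with a batch size of two
# result in three separate requests.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
for group in batches(nodes, 2):
    print(group)
```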
max_power_off_wait_time
How long BOS will wait for a node to power off before forcefully powering it off (in seconds).
max_power_on_wait_time
How long BOS will wait for a node to power on before calling power on again (in seconds).
pcs_read_timeout
The amount of time in seconds that BOS will wait for API responses from the Power Control Service (PCS). After this time, the request will time out. The default is 20 seconds.
polling_frequency
How frequently the BOS operators check component state for needed actions (in seconds).
reject_nids
BOS does not support the use of NIDs to identify nodes – only xnames.
If the reject_nids option is enabled, BOS will prevent creation of sessions and session templates that appear to reference NIDs.
Specifically, if this option is enabled, then:
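- If a session template is being created and it has a boot set with a node_list that appears to contain a NID, then the creation will fail.
- If a session template is being validated and it has a boot set with a node_list that appears to contain a NID, then the validation will fail.
- If a session is being created from a session template that has a boot set with a node_list that appears to contain a NID, then the session creation will fail.
This option does NOT have an effect on sessions that were created prior to it being enabled (even if they have not yet started).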
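
Before enabling reject_nids, it may be useful to scan existing node lists for entries that could be rejected. The text above does not say how BOS decides that an entry "appears to" be a NID, so the pattern below (the letters nid followed by digits) is purely an assumption for illustration, as are the function name and the sample entries.

```python
import re
from typing import List

# ASSUMPTION: an entry looks like a NID if it is "nid" followed only by digits.
# The actual check performed by BOS may differ.
_NID_LIKE = re.compile(r"nid[0-9]+", re.IGNORECASE)


def nid_like_entries(node_list: List[str]) -> List[str]:
    """Return the entries of node_list that match the assumed NID pattern."""
    return [node for node in node_list if _NID_LIKE.fullmatch(node)]


# Hypothetical node list mixing xname-style and NID-style entries.
print(nid_like_entries(["x3000c0s19b1n0", "nid000001", "x3000c0s19b2n0"]))
```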
session_limit_required
If enabled, BOS sessions cannot be created without specifying the limit parameter.
This can be helpful in avoiding accidental reboots of more components than intended.
If this option is enabled, it is still possible to effectively create a session with no limit
by specifying * as the limit parameter (if this is done on the command line, it must be
quoted in order to prevent it from being interpreted by the shell).
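
The quoting requirement can be demonstrated with Python's standard `shlex` module, which renders an argument list as a shell-safe command line. The session creation subcommand, its flags, and the template name `my-template` are assumptions used for illustration and do not appear in the text above; verify them against the CLI help before use.

```python
import shlex

# ASSUMPTION: subcommand, flags, and template name are illustrative only.
argv = [
    "cray", "bos", "v2", "sessions", "create",
    "--template-name", "my-template",
    "--operation", "reboot",
    "--limit", "*",
]

# shlex.join quotes only the arguments that the shell would otherwise interpret.
command = shlex.join(argv)
print(command)
# cray bos v2 sessions create --template-name my-template --operation reboot --limit '*'
```

When the argument list is passed directly to a process launcher without a shell (for example `subprocess.run(argv)`), no quoting is needed because no shell expands the `*`.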