The configuration of individual system components is managed with the cray cfs components
command. The Configuration Framework Service (CFS) contains a database of the configuration state of available hardware known to the Hardware State Manager (HSM). When new nodes are added to the HSM database, a CFS Hardware Sync Agent enters the component into the CFS database with a null state of configuration.
Administrators are able to set a desired CFS configuration for each component, and the CFS Batcher ensures the desired configuration state and the current configuration state match.
Whenever CFS detects that the desired configuration does not match the current configuration state, CFS Batcher will automatically start a CFS session to apply the necessary configuration. See Configuration Management with CFS Batcher for more information.
There are several situations that will cause automatic configuration:
cfs-state-reporter
package installed will register a null current configuration, resulting in a full configuration.Configuration status of a given component (using the component name (xname)) is available through the cray cfs components describe
command. The following fields are provided to determine the status and state of the component:
configurationStatus
The status of the component’s configuration. Valid status values are unconfigured, failed, pending, and configured.
desiredConfig
The CFS configurations entry assigned to this component.
enabled
Indicates whether the component will be configured by CFS or not.
errorCount
The number of times configuration sessions have failed to configure this component.
retryPolicy
The number of times the configuration will be attempted if it fails. If errorCount >= retryPolicy, CFS will not continue attempts to apply the desiredConfig.
state
The list of configuration layers that have been applied to the component from the desiredConfig.
To view the configuration state of a given component, use the describe
command for a given component name (xname):
ncn# cray cfs components describe XNAME --format json
Example output:
{
"configurationStatus": "configured",
"desiredConfig": "configurations-example",
"enabled": true,
"errorCount": 0,
"id": "x3000c0s13b0n0",
"retryPolicy": 3,
"state": [
{
"cloneUrl": "https://api-gw-service-nmn.local/vcs/cray/example.git",
"commit": "f6a2727a70fdd6d95df6ad9c883188e694d5b37f",
"lastUpdated": "2021-07-28T03:26:00Z",
"playbook": "site.yml",
"sessionName": "batcher-6c95df62-2fe3-451b-8cc5-21d3cf748f83"
},
{
"cloneUrl": "https://api-gw-service-nmn.local/vcs/cray/another-example.git",
"commit": "282a9bfbf802d7b5c4d9bb5549b6e77957ec37f0",
"lastUpdated": "2021-07-28T03:26:10Z",
"playbook": "ncn.yml",
"sessionName": "batcher-6c95df62-2fe3-451b-8cc5-21d3cf748f83"
}
]
}
When a layer fails to configure, CFS will append a _failed status to the commit field. CFS Batcher will continue to attempt to configure this component with this configuration layer unless the errorCount has reached the retryPolicy limit.
{
"cloneUrl": "https://api-gw-service-nmn.local/vcs/cray/another-example.git",
"commit": "282a9bfbf802d7b5c4d9bb5549b6e77957ec37f0_failed",
"lastUpdated": "2021-07-28T03:26:20",
"playbook": "ncn.yml",
"sessionName": "batcher-74f83dad-9f90-4f5e-bf45-0498ffde8795"
}
In the event that a playbook is specified in the configuration that does not apply to the specific component, CFS will append _skipped to the commit field.
{
"cloneUrl": "https://api-gw-service-nmn.local/vcs/cray/another-example.git",
"commit": "a8b132fa5ca04cbe1716501d7be38d9b34532a44_skipped",
"lastUpdated": "2021-07-28T03:26:30",
"playbook": "site.yml",
"sessionName": "batcher-e3152e08-77df-4719-9b15-4fd5ad696730"
}
If a playbook exits early because of the Ansible any_errors_fatal setting, CFS will append _incomplete to the commit field for all components that did not cause the failure. This situation would most likely occur only when using an Ansible linear playbook execution strategy.
{
"cloneUrl": "https://api-gw-service-nmn.local/vcs/cray/another-example.git",
"commit": "282a9bfbf802d7b5c4d9bb5549b6e77957ec37f0_incomplete",
"lastUpdated": "2021-07-28T03:26:40",
"playbook": "site.yml",
"sessionName": "batcher-6c95df62-2fe3-451b-8cc5-21d3cf748f83"
}
To force a component which has a specific desiredConfig to a different configuration, use the update
subcommand to change the configuration:
ncn# cray cfs components update XNAME --desired-config new-config
IMPORTANT: Ensure that the new configuration has been created with the
cray cfs configurations update new-config
command before assigning the configuration to any components.
To force a component to retry its configuration again after it failed, change the errorCount to less than the retryPolicy, or raise the retryPolicy. If the errorCount has not reached the retry limit, CFS will automatically keep attempting the configuration and no action is required.
ncn# cray cfs components update XNAME --error-count 0
To disable CFS configuration of a component, use the --enabled
option:
WARNING: When a node reboots and the state-reporter reports in to CFS, it will automatically enable configuration. The following command only disables configuration until a node reboots.
ncn# cray cfs components update XNAME --enabled false
Use --enabled true
to re-enable the component.