HPE Cray provides Ansible plays and roles for software products deemed necessary for the system to function. Customers are free to write their own Ansible plays and roles to augment what HPE Cray provides or implement new features. Basic knowledge of Ansible is needed to write plays and roles. The information below includes recommendations and best practices for writing and running Ansible code on the system successfully with the Configuration Framework Service (CFS).
Help with Ansible can be found in the external Ansible documentation:
The Version Control Service (VCS) is setup during the Cray System Management (CSM) product installation, which is the appropriate place to store configuration content. Individual product installations include the Ansible code to properly configure each product.
The structure of the individual repository directories matches the recommended directory layout from the external Ansible documentation.
Playbooks are found at the top level, and the Ansible roles and variables are in their appropriate directories.
Inventory directories like group_vars
and host_vars
may exist, but they are empty and left for variable overrides and customizations by the user.
Within an Ansible playbook, users can designate which node groups the various tasks and roles will run against.
For example, a playbook may contain a list of roles that are run on only the Compute
nodes, or a list of roles that will run on only Application
nodes. This is designated using the hosts
parameter.
For example, hosts: Compute
would be used to target the compute nodes. Users can create additional sections that target other node types, or adjust the hosts that the included roles will run against as necessary.
It is also possible to target multiple groups within a section of a playbook, or to specify complex targets, such as nodes that are in one group and not in another group.
The syntax for this is available in the external Ansible documentation.
Hosts can also be in more than one group at a time. In this case, Ansible will run all sections that match the node type against the node.
See the Ansible Inventory section for more information about groups that are made automatically available through CFS dynamic inventory.
CFS will handle scaling up Ansible to run on many hosts, but there are still places where performance can be improved by correctly writing Ansible plays.
gather_facts: false
. If only a few facts are required, it is also possible to limit fact gathering by setting gather_subset
.
For more information on gather_subset
, see the external Ansible module setup documentation.include_*
(dynamic re-use) to skip multiple tasks at once when using conditionals. Ansible evaluates conditionals for every node in every task.
This includes when the conditional is applied to a block, or a role imported with roles:
or the import_role
task.
This is because these are static imports that are compiled at the beginning of the playbook, and the conditional is inherited by every task in the role or block.
Evaluating these conditionals for each task may only take a second or two, but across the hundreds of tasks that might be part of a playbook, this can add up to significant wasted time.
Instead use dynamic imports with the include_*
tasks. Because these are evaluated at runtime, a conditional can skip the import of the role or tasks entirely, and is only evaluated once.
See the Ansible documentation on Conditionals with re-use
and Re-using files and roles for more information.hosts
keyword to determine what tasks will run against each node. Ansible will skip sections of the playbook that have a hosts
target that does
not match any nodes in the current inventory/limit, but when multiple types of nodes are configured at the same time with the same configuration, they may end up in the same batch and Ansible run. This would mean that Ansible
has to run through the sections for both types of nodes, taking more time than if the nodes were in separate batches and could skip past the unneeded code.any_errors_fatal
. In addition to not working with all Ansible strategies, this can cause an Ansible run to exit early, and the nodes that did not have the error will have to start from the beginning of the playbook in the next session.free
Ansible strategy. This means avoiding situations where all nodes in a batch need to complete a task before moving onto the next, and can save time by allowing nodes to proceed through a playbook at their own pace.Because CFS splits components into multiple batches, and components may also configure at different times when they are rebooted, some keywords meant for coordinating the runs of multiple nodes may not work as expected.
Keyword | Notes |
---|---|
any_errors_fatal |
This keyword is intended to stop execution as soon as any node reports a failure. However, this will only stop execution for the current batch. |
run_once |
This keyword is intended to limit a task to running on a single node. However this will only cause the task to be run once per batch. |
serial |
This keyword is intended to limit runs to a small number of nodes at a time, such as during a rolling upgrade. |
However, this will only function within the batch, so more nodes may be running the task than intended when multiple batches are running. |
CFS supports two Ansible strategies: cfs_linear
and cfs_free
. cfs_linear
runs all task in a playbook serially, with all nodes completing a task before Ansible moves on to the next task.
cfs_free
decouples the nodes allowing each node to proceed through the playbook at its own pace.
Switching to cfs_free
from the default strategy of cfs_linear
may result in better configuration time, however currently not all included playbooks support the cfs_free
strategy,
so this should only be done when playbooks that are confirmed to work correctly with the cfs_free
strategy.
In addition, the cfs_free
strategy is limited by the fact that configuration in CFS is applied over multiple layers and multiple playbooks.
This means that even when using the cfs_free
strategy, all nodes must complete a playbook together before moving onto the next playbook.
The CFS Ansible strategies extend the similarly named Ansible strategy, adding reporting callbacks that are used to track components’ state. cfs_linear
and cfs_free
should always be used in place of linear
and free
to ensure that CFS functions correctly.