The configuration of Rack Resiliency happens as part of Management Node Personalization.
Specifically, the setup is done by the `rack_resiliency_for_mgmt_nodes.yml` Ansible playbook in the `csm-config-management` Version Control Service (VCS) repository.
If Rack Resiliency is not enabled, then this playbook does nothing. See Enabling Rack Resiliency for details on how to enable it.
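For reference, the enablement flag lives in `customizations.yaml` (see Enabling Rack Resiliency for the authoritative key path). A minimal sketch of what an enabled configuration might look like, using an illustrative key path that is an assumption of this sketch rather than the documented one:

```yaml
# Illustrative customizations.yaml fragment only; the real key name and
# location are given in the Enabling Rack Resiliency documentation.
spec:
  kubernetes:
    services:
      csm-config:
        rack_resiliency:
          enabled: true
```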
There are two setup flows: one for setting up Kubernetes and one for setting up Ceph. They share some preparation steps, but the actual configuration steps differ.
Shared preparation stages:

Stage | Ansible role |
---|---|
Verify enablement | `csm.rr.check_enablement` |
Placement discovery | `csm.rr.mgmt_nodes_placement_discovery` |
Placement validation | `csm.rr.mgmt_nodes_placement_validation` |

Kubernetes setup stage:

Stage | Ansible role |
---|---|
Kubernetes zoning | `csm.rr.k8s_topology_zoning` |

Ceph setup stages:

Stage | Ansible role |
---|---|
Ceph zoning | `csm.rr.ceph_zoning` |
Ceph HAProxy configuration | `csm.rr.ceph_haproxy` |
The below stages are preparatory steps to set up Kubernetes and Ceph for Rack Resiliency.
The `csm.rr.check_enablement` role verifies that Rack Resiliency is enabled in `customizations.yaml`.
If it is not enabled, then the rest of the Rack Resiliency setup is skipped.
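A minimal sketch of what such a guard could look like as an Ansible task, assuming the flag has already been read into a variable named `rack_resiliency_enabled` (an illustrative name, not the role's actual variable):

```yaml
# Illustrative sketch: end the play cleanly when Rack Resiliency is disabled.
# The variable name rack_resiliency_enabled is an assumption of this sketch.
- name: Skip Rack Resiliency setup when the feature is not enabled
  ansible.builtin.meta: end_play
  when: not (rack_resiliency_enabled | default(false) | bool)
```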
The `csm.rr.mgmt_nodes_placement_discovery` role identifies the physical racks and locates the management nodes in them. The Hardware State Manager (HSM) is queried for information on all of the management NCNs. This information is used to create a mapping between the xnames of the management NCNs and the xnames of the physical racks that contain them.
The System Layout Service (SLS) is then used to map these management node xnames to the corresponding Kubernetes and storage node hostnames. This mapping of rack xnames to Kubernetes and storage node hostnames is stored as a JSON file in the format shown below, to be consumed later by the Kubernetes and Ceph zoning roles.
Example of JSON file containing rack to management NCN hostname mapping:
```json
{
  "x3000": [
    "ncn-m001",
    "ncn-w001",
    "ncn-w004",
    "ncn-w007",
    "ncn-s001"
  ],
  "x3001": [
    "ncn-m002",
    "ncn-w002",
    "ncn-w006",
    "ncn-w005",
    "ncn-w008",
    "ncn-s003"
  ],
  "x3002": [
    "ncn-m003",
    "ncn-w003",
    "ncn-w009",
    "ncn-s002",
    "ncn-s004"
  ]
}
```
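For context, the rack xname is the leading cabinet portion of a management node's xname; for example, the node `x3000c0s3b0n0` sits in rack `x3000`. A minimal sketch of deriving the rack in Ansible, assuming the node xname is available in a variable named `node_xname` (an illustrative name):

```yaml
# Illustrative sketch: strip everything from the chassis component ("c...")
# onward to obtain the rack xname, for example x3000c0s3b0n0 -> x3000.
- name: Derive the rack xname from a management node xname
  ansible.builtin.set_fact:
    rack_xname: "{{ node_xname | regex_replace('c.*$', '') }}"
```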
The `csm.rr.mgmt_nodes_placement_validation` role uses the discovery results from Placement discovery and validates whether the current placement meets the required criteria for enabling Rack Resiliency.
The placement validation algorithm, as shown in the flow chart above, decides whether the current placement of management nodes is suitable for enabling Rack Resiliency. If the current placement is found to be unsuitable, the validation fails.
This role also checks whether managed nodes are present in the management racks and generates informational messages about them.
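The exact placement criteria are implemented by the role itself. Purely as an illustration, a check of this kind might assert that the management nodes span a minimum number of racks; the threshold, variable name, and message below are assumptions of this sketch:

```yaml
# Illustrative sketch only; the real criteria live in
# csm.rr.mgmt_nodes_placement_validation. rack_to_ncns is assumed to hold the
# rack-to-hostname mapping shown in the JSON example above.
- name: Fail if the management nodes do not span enough racks
  ansible.builtin.assert:
    that:
      - rack_to_ncns | length >= 3
    fail_msg: >-
      Management nodes are placed in only {{ rack_to_ncns | length }} rack(s);
      Rack Resiliency requires them to be spread across at least 3 racks.
```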
Note: Slingshot switch placement discovery and validation are not included in this process.
The below stage is used to set up Kubernetes zones.
The `csm.rr.k8s_topology_zoning` role uses the discovery results from Placement discovery and applies Kubernetes zoning for the master and worker nodes. For more information, see Kubernetes zones.
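Kubernetes expresses zones through the standard `topology.kubernetes.io/zone` node label. A minimal sketch of applying such labels from the rack mapping, assuming the mapping is available in a variable named `rack_to_ncns`; the variable name and the exact labeling mechanism used by the role are assumptions of this sketch:

```yaml
# Illustrative sketch: label each Kubernetes management NCN with its rack as
# the zone. The real zoning is done by the csm.rr.k8s_topology_zoning role.
- name: Apply topology zone labels to Kubernetes master and worker nodes
  ansible.builtin.command: >-
    kubectl label node {{ item.1 }}
    topology.kubernetes.io/zone={{ item.0.key }} --overwrite
  loop: "{{ rack_to_ncns | dict2items | subelements('value') }}"
  when: item.1 is match('ncn-[mw]')
```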
The below stages are used to set up the Ceph zones and update the Ceph HAProxy configuration.
The `csm.rr.ceph_zoning` role uses the discovery results from Placement discovery and applies Ceph zoning for the storage nodes. Along with creating zones for the Ceph storage nodes, zones for the Ceph services are also created. For more information, see Ceph zones.
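Ceph typically expresses this kind of topology through CRUSH buckets of type `rack`. A minimal sketch of the sort of commands involved, assuming rack buckets named after the rack xnames and reusing the illustrative `rack_to_ncns` mapping; this is not the role's exact procedure:

```yaml
# Illustrative sketch only; csm.rr.ceph_zoning performs the actual zoning,
# including the zones for Ceph services. Commands assume a node with the
# Ceph CLI and an admin keyring, such as a storage NCN.
- name: Create a CRUSH rack bucket for each physical rack
  ansible.builtin.command: ceph osd crush add-bucket {{ item.key }} rack
  loop: "{{ rack_to_ncns | dict2items }}"

- name: Move each storage node's host bucket under its rack bucket
  ansible.builtin.command: ceph osd crush move {{ item.1 }} rack={{ item.0.key }}
  loop: "{{ rack_to_ncns | dict2items | subelements('value') }}"
  when: item.1 is match('ncn-s')
```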
The `csm.rr.ceph_haproxy` role updates the Ceph HAProxy configuration with the latest information after Ceph zoning has been performed. It also updates `ceph.conf` with the latest configuration on all storage nodes.
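As a rough sketch of the kind of refresh this implies, assuming `ceph config generate-minimal-conf` is used to regenerate `ceph.conf` and that HAProxy runs as a systemd service on the storage nodes (both are assumptions of this sketch; the role defines the real procedure):

```yaml
# Illustrative sketch only; csm.rr.ceph_haproxy performs the actual update.
- name: Regenerate ceph.conf from the current cluster configuration
  ansible.builtin.shell: ceph config generate-minimal-conf > /etc/ceph/ceph.conf

- name: Reload HAProxy so it picks up the updated configuration
  ansible.builtin.service:
    name: haproxy
    state: reloaded
```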