This page contains the procedures to list, add, delete, and modify critical services.
The ConfigMap rrs-mon-static in the rack-resiliency namespace is where RR keeps its list of critical services. The RR API/CLI commands to add services write the new services to this ConfigMap. Because the RR API/CLI does not support edits or deletes, those operations can only be accomplished by directly editing the static ConfigMap. For more details on the RR ConfigMaps, see ConfigMaps.
Any change made to the RR critical services must be made both in RR itself and in the Kyverno Policy.
Several procedures on this page require verifying that a particular service exists in the cluster.
(ncn-mw#) Verify that a critical service is present in the Kubernetes cluster.
In the following command, be sure to substitute the actual type, name, and namespace of the service.
kubectl get <deployment-or-statefulset> <name-of-the-critical-service> -n <namespace-of-the-service>
The command will give a “not found” error message if the service is not present in the cluster.
None of these operations require a restart of cray-rrs in order for the changes to take effect. However, there may be a delay before the changes are picked up by the Resiliency Monitoring Service (RMS). For more details, see Timing.
(ncn-mw#) List all critical services, grouped by namespace.
cray rrs criticalservices list --format toml
Example output:
[critical_services.namespace]
[[critical_services.namespace.kube-system]]
name = "cilium-operator"
type = "Deployment"
[[critical_services.namespace.kube-system]]
name = "coredns"
type = "Deployment"
[[critical_services.namespace.kube-system]]
name = "sealed-secrets"
type = "Deployment"
[[critical_services.namespace.dvs]]
name = "cray-activemq-artemis-operator-controller-manager"
type = "Deployment"
[[critical_services.namespace.dvs]]
name = "cray-dvs-mqtt-ss"
type = "StatefulSet"
[[critical_services.namespace.services]]
name = "cray-capmc"
type = "Deployment"
[[critical_services.namespace.services]]
name = "cray-console-data"
type = "Deployment"
(ncn-mw#) Get summarized information about a specific critical service.
cray rrs criticalservices describe <critical-service-name> --format toml
This command returns information such as configured instances, currently running instances, namespace, and type.
Example output:
[critical_service]
name = "cray-capmc"
namespace = "services"
type = "Deployment"
configured_instances = 3
(ncn-mw#) View the list of critical services directly from the static ConfigMap.
kubectl get cm rrs-mon-static -n rack-resiliency -o jsonpath='{.data.critical-service-config.json}' | jq
Truncated example output (the actual ConfigMap contents will be larger):
{
  "critical_services": {
    "cilium-operator": {
      "namespace": "kube-system",
      "type": "Deployment"
    },
    "coredns": {
      "namespace": "kube-system",
      "type": "Deployment"
    },
    "...<output truncated>...",
    "sshot-net-operator": {
      "namespace": "sshot-net-operator",
      "type": "Deployment"
    },
    "kube-proxy": {
      "namespace": "kube-system",
      "type": "StatefulSet"
    }
  }
}
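To pull out just the service names from that JSON, a jq filter such as the one below can be used. The sample document is inlined here for illustration; on a live system, pipe the output of the kubectl command above into the same filter instead.

```shell
# List only the names of the critical services from the static ConfigMap JSON.
# On a live system, replace the inline sample with:
#   kubectl get cm rrs-mon-static -n rack-resiliency \
#     -o jsonpath='{.data.critical-service-config.json}' | jq -r '.critical_services | keys[]'
cat <<'EOF' | jq -r '.critical_services | keys[]'
{
  "critical_services": {
    "coredns": {"namespace": "kube-system", "type": "Deployment"},
    "kube-proxy": {"namespace": "kube-system", "type": "StatefulSet"}
  }
}
EOF
```

Note that jq's `keys` returns the names in sorted order, which may differ from the order in the ConfigMap.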
Verify that the critical services are present in the Kubernetes cluster.
Create a JSON file with the critical services configuration. The file must conform to the CriticalServiceCmStaticType schema. Example file:
{
  "critical_services": {
    "coredns": {
      "namespace": "kube-system",
      "type": "Deployment"
    },
    "kube-proxy": {
      "namespace": "kube-system",
      "type": "StatefulSet"
    }
  }
}
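As a sketch, the example file can be written and syntax-checked before passing it to the CLI. The /tmp path here is arbitrary; substitute any convenient location.

```shell
# Write the critical services configuration to a file (hypothetical path).
cat > /tmp/critical-services.json <<'EOF'
{
  "critical_services": {
    "coredns": {
      "namespace": "kube-system",
      "type": "Deployment"
    },
    "kube-proxy": {
      "namespace": "kube-system",
      "type": "StatefulSet"
    }
  }
}
EOF

# Confirm the file parses as valid JSON before using it as the argument to
# 'cray rrs criticalservices update --from-file'.
jq empty /tmp/critical-services.json && echo "JSON is valid"
```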
(ncn-mw#) Add the service to RR.
cray rrs criticalservices update --from-file <file-path> --format toml
Example output:
Update = "Successful"
Successfully_Added_Services = [ "kube-proxy",]
Already_Existing_Services = [ "coredns",]
Add the critical services to the Kyverno cluster policy.
It is strongly recommended to add critical services using the API or CLI, rather than directly editing the ConfigMap.
Verify that the critical services are present in the Kubernetes cluster.
(ncn-mw#) Edit the static ConfigMap and add services.
Open the ConfigMap for editing.
kubectl edit ConfigMap rrs-mon-static -n rack-resiliency
Add additional service entries to the critical-service-config.json field under the data section.
Save and close the editor to apply the changes to the ConfigMap.
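For example, to add the kube-proxy service used in the earlier CLI example, a new entry is appended alongside the existing ones inside the critical-service-config.json field (service selection here is illustrative only):

```json
{
  "critical_services": {
    "coredns": {
      "namespace": "kube-system",
      "type": "Deployment"
    },
    "kube-proxy": {
      "namespace": "kube-system",
      "type": "StatefulSet"
    }
  }
}
```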
Add the critical services to the Kyverno cluster policy.
Do not delete or modify the critical services added by HPE. Nothing will prevent an administrator from doing this, but it is not supported.
Verify that the critical services are present in the Kubernetes cluster.
(ncn-mw#) Edit the static ConfigMap and remove services.
Open the ConfigMap for editing.
kubectl edit ConfigMap rrs-mon-static -n rack-resiliency
Remove service entries from the critical-service-config.json field under the data section.
Save and close the editor to apply the changes to the ConfigMap.
Remove the critical services from the Kyverno cluster policy.
The attributes of a critical service may be modified, subject to the following restrictions:
- Do not delete or modify the critical services added by HPE. Nothing will prevent an administrator from doing this, but it is not supported.
- Having two services with the same name in different namespaces is generally not considered a best practice in CSM; this use case is not supported in RRS.
- Similarly, services with the same name but different types (e.g., StatefulSet and Deployment) are not supported.
Modifying a critical service is essentially removing the existing critical service and then adding the modified version:
- Remove the services being modified.
- Add the modified services.
See either of the following:
After adding, removing, or modifying critical services using any of the above methods, the Kyverno cluster policy also must be updated to reflect those changes. The procedures to do this are included in this section.
For more information on the Kyverno policy, see Kyverno Policy.
This procedure is only necessary after adding critical services to RR.
(ncn-mw#) Add the critical services to the Kyverno cluster policy.
Open the policy for editing.
kubectl edit clusterpolicy insert-labels-topology-constraints
Under spec.rules.match.any.resources.name, add new entries with the names of the critical services that were added to RR.
Save and close the editor to apply the changes to the policy.
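As an illustrative sketch, the relevant portion of the policy might look like the following. The exact layout on a given system may differ (for example, Kyverno resource filters commonly express multiple entries as a names list), and the service names shown are examples only:

```yaml
spec:
  rules:
    - match:
        any:
          - resources:
              names:
                - cilium-operator   # existing entries
                - coredns
                - kube-proxy        # newly added critical service (example)
```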
(ncn-mw#) For each service added, verify that it now exists in the policy.
kubectl get clusterpolicy insert-labels-topology-constraints -o yaml | grep <name-of-the-critical-service>
(ncn-mw#) Restart the services that were added.
CSM provides a script to automate this process. The script checks every Rack Resiliency critical service to see whether the Kyverno policy has been applied to it. For any that have not, it performs a rollout restart on each service, one at a time; the Kyverno policy is applied to each service as it restarts.
The latest CSM documentation RPM must be installed on the node where this step is being performed. See Check for latest documentation.
python3 /usr/share/doc/csm/upgrade/scripts/upgrade/scripts/k8s/rr_critical_service_restart.py
This procedure is only necessary after removing critical services from RR.
(ncn-mw#) Remove the critical services from the Kyverno cluster policy.
Open the policy for editing.
kubectl edit clusterpolicy insert-labels-topology-constraints
Under spec.rules.match.any.resources.name, delete the entries with the names of the critical services that were removed from RR.
Save and close the editor to apply the changes to the policy.
(ncn-mw#) For each service removed, verify that it no longer exists in the policy.
If the service has been removed, this command should give no output.
kubectl get clusterpolicy insert-labels-topology-constraints -o yaml | grep <name-of-the-critical-service>
Unlike when adding services to the Kyverno policy, there is no need for a rollout restart.