Configure an Alerta alert notification for Prometheus Alertmanager alerts.
The SYSTEM_DOMAIN_NAME value found in some of the URLs on this page is expected to be the system's fully qualified domain name (FQDN).
(ncn-mw#) The FQDN can be found by running the following command on any Kubernetes NCN.
kubectl get secret site-init -n loftsman -o jsonpath='{.data.customizations\.yaml}' | base64 -d | yq r - spec.network.dns.external
Example output:
system.hpc.amslabs.hpecorp.net
Be sure to modify the example URLs on this page by replacing SYSTEM_DOMAIN_NAME with the actual value found using the above command.
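The placeholder replacement can be scripted rather than done by hand. The following is a minimal sketch, assuming a bash shell; the helper name fill_domain is hypothetical, and the domain value is the example output shown above, not a real system.

```bash
#!/usr/bin/env bash
# Hypothetical helper: substitute the SYSTEM_DOMAIN_NAME placeholder in a URL
# template with the given domain name and print the result.
fill_domain() {
  local url_template="$1" domain="$2"
  printf '%s\n' "${url_template//SYSTEM_DOMAIN_NAME/${domain}}"
}

# Example value taken from the example output; replace it with the real FQDN.
SYSTEM_DOMAIN_NAME="system.hpc.amslabs.hpecorp.net"
fill_domain 'https://prometheus.cmn.SYSTEM_DOMAIN_NAME/alerts' "${SYSTEM_DOMAIN_NAME}"
```

With the example value, this prints https://prometheus.cmn.system.hpc.amslabs.hpecorp.net/alerts.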
This procedure can be performed on any master or worker NCN.
(ncn-mw#) Save the current alert notification configuration, in case a rollback is needed.
kubectl get secret -n sysmgmt-health alertmanager-cray-sysmgmt-health-kube-p-alertmanager \
-ojsonpath='{.data.alertmanager\.yaml}' | base64 --decode > /tmp/alertmanager-default.yaml
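Before changing anything, it may be worth confirming that the saved file is usable. The following is a rough sketch, not part of the documented procedure: the helper name check_backup is hypothetical, and it assumes the decoded configuration has unindented top-level route: and receivers: keys.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the backup file exists, is non-empty,
# and contains the top-level "route:" and "receivers:" keys that an
# Alertmanager configuration is assumed to have.
check_backup() {
  local backup_file="$1"
  [ -s "${backup_file}" ] || { echo "missing or empty: ${backup_file}" >&2; return 1; }
  grep -q '^route:' "${backup_file}" || { echo "no route: key in ${backup_file}" >&2; return 1; }
  grep -q '^receivers:' "${backup_file}" || { echo "no receivers: key in ${backup_file}" >&2; return 1; }
  echo "backup looks usable: ${backup_file}"
}
```

For example, run check_backup /tmp/alertmanager-default.yaml after saving the configuration. A non-zero exit status suggests the backup should be taken again before continuing.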
(ncn-mw#) Create a secret and an alert configuration that will be used to add Alerta notifications for the alerts.
Create the secret file.
Create a file named /tmp/alertmanager-secret.yaml with the following contents:
apiVersion: v1
data:
  alertmanager.yaml: ALERTMANAGER_CONFIG
kind: Secret
metadata:
  labels:
    app: kube-prometheus-stack-alertmanager
    chart: kube-prometheus-stack-45.1.1
    heritage: Helm
    release: cray-sysmgmt-health
  name: alertmanager-cray-sysmgmt-health-kube-p-alertmanager
  namespace: sysmgmt-health
type: Opaque
Create the Alerta alert configuration file.
In the following example file, Alertmanager sends notifications to the Alerta server webhook at http://sma-alerta.sma.svc.cluster.local:8080/webhooks/prometheus.
Update the fields under webhook_configs: to reflect the desired configuration.
Create a file named /tmp/alertmanager-new.yaml with the following contents:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://sma-alerta.sma.svc.cluster.local:8080/webhooks/prometheus'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
(ncn-mw#) Replace the alert notification configuration based on the files created in the previous steps.
sed "s|ALERTMANAGER_CONFIG|$(base64 -w0 /tmp/alertmanager-new.yaml)|g" /tmp/alertmanager-secret.yaml \
    | kubectl replace --force -f -
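To confirm that the secret now holds the intended configuration, the stored value can be decoded and compared with the local file. This is a sketch under stated assumptions: the helper name matches_file is hypothetical, and it relies on GNU base64 and cmp. The kubectl query in the comment is the same one used earlier to save the configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a base64-encoded document on stdin, decode it,
# and compare it byte-for-byte with the file given as the first argument.
# Exits 0 when the decoded input and the file are identical.
matches_file() {
  local expected_file="$1"
  base64 -d | cmp -s - "${expected_file}"
}

# Intended use on an NCN (not executed here):
#   kubectl get secret -n sysmgmt-health alertmanager-cray-sysmgmt-health-kube-p-alertmanager \
#       -ojsonpath='{.data.alertmanager\.yaml}' \
#       | matches_file /tmp/alertmanager-new.yaml \
#       && echo "secret matches the new configuration" \
#       || echo "secret differs from the new configuration"
```

A mismatch does not say what went wrong, only that the stored configuration is not the file that was meant to be applied.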
(ncn-mw#) Validate the configuration changes.
Validate the alerts using cm health commands.
cm health alert -s
Example output:
Alert Status Count
------------ -----
Critical 13
Warnings 31
.
.
.
prometheus critical critical : 13, warning : 31, info : 0
cm health alert prometheus
Example output:
prometheus Severity Summary
---------- --------- -------
10.12.1.100:8080 warning KubeDeploymentReplicasMismatch:1,
KubeStatefulSetReplicasMismatch:1, KubeJobFailed:1
10.13.1.100:9187 warning PostgresqlHighRollbackRate:1,
PostgresqlInactiveReplicationSlot:1,
PostgresqlFollowerReplicationLagSMA:1
.
.
.
If the configuration does not look accurate, check the logs for errors.
kubectl logs -f -n sysmgmt-health pod/alertmanager-cray-sysmgmt-health-kube-p-alertmanager-0 alertmanager
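If the logs show that the new configuration cannot be loaded, the configuration saved at the start of this procedure can be put back the same way the new one was applied. The following is a minimal sketch, assuming /tmp/alertmanager-secret.yaml and /tmp/alertmanager-default.yaml still exist; the helper name render_secret is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the secret manifest with the ALERTMANAGER_CONFIG
# placeholder replaced by the base64-encoded contents of a configuration file.
render_secret() {
  local template_file="$1" config_file="$2"
  local encoded
  encoded="$(base64 -w0 "${config_file}")"
  # "|" is used as the sed delimiter because base64 output can contain "/".
  sed "s|ALERTMANAGER_CONFIG|${encoded}|g" "${template_file}"
}

# Intended rollback on an NCN (not executed here):
#   render_secret /tmp/alertmanager-secret.yaml /tmp/alertmanager-default.yaml \
#       | kubectl replace --force -f -
```

Rendering the manifest to standard output first also makes it possible to inspect it before piping it to kubectl.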
An Alerta notification will be sent once an alert covered by this configuration is FIRING in Prometheus.
See https://prometheus.cmn.SYSTEM_DOMAIN_NAME/alerts for more information.