The Prometheus UAN NODE Exporter service, service monitor and endpoints are deployed to scrape SMARTMON data by the cray-sysmgmt-health
chart in the sysmgmt-health
namespace as part of the Cray System Management (CSM) release.
In order to provide data to the Grafana SMART dashboards, the UAN NODE Exporter must be configured with a list of UAN NMN IP Address to scrape metrics from.
NOTE
All variables used within this page depend on the/etc/environment
setup done in Pre-installation.
Obtain the list of site specific UAN NMN IP Address.
(pit#
) Update customizations.yaml
with the list of UAN nodes IPs.
yq write -s - -i ${PITDATA}/prep/site-init/customizations.yaml <<EOF
- command: update
path: spec.kubernetes.services.cray-sysmgmt-health.uanNodeExporter
value:
enabled: true
endpoints:
- 10.252.1.18
- 10.252.1.13
EOF
(pit#
) Review the UAN NODE Exporter configuration.
yq r ${PITDATA}/prep/site-init/customizations.yaml spec.kubernetes.services.cray-sysmgmt-health.uanNodeExporter
The expected output looks similar to:
uanNodeExporter:
enabled: true
endpoints:
- 10.252.1.18
- 10.252.1.13
The most common configuration parameters are specified in the following table. They must be set in the customizations.yaml
file
under the spec.kubernetes.services.cray-sysmgmt-health.uanNodeExporter
service definition.
Customization | Default | Description |
---|---|---|
enabled |
false |
Enables service for UAN NODE Exporter (default chart value is false ) |
endpoints |
10.252.1.13 |
list of UAN NMN IP Address to monitor SMARTMON data |
For a complete set of available parameters, consult the values.yaml
file for the cray-sysmgmt-health
chart.
This procedure is to configure the UAN NODE Exporter once the PIT node no longer exists by editing manifest and deploying cray-sysmgmt-health chart
.
(uan#
) Obtain the list of UAN NMN IP Address.
Login to UAN node
(uan#
)
hostname -i
Expected output looks similar to the following:
::1 127.0.0.1 10.252.1.13
(ncn#
) Get the current cached customizations.
kubectl get secrets -n loftsman site-init -o jsonpath='{.data.customizations\.yaml}' | base64 -d > customizations.yaml
(ncn#
) Get the current cached platform manifest.
kubectl get cm -n loftsman loftsman-platform -o jsonpath='{.data.manifest\.yaml}' > platform.yaml
(ncn#
) Update customizations.yaml
with the list of UAN nodes IPs.
yq write -s - -i customizations.yaml <<EOF
- command: update
path: spec.kubernetes.services.cray-sysmgmt-health.uanNodeExporter
value:
enabled: true
endpoints:
- 10.252.1.18
- 10.252.1.13
EOF
(ncn#
) Review the UAN NODE Exporter configuration.
yq r customizations.yaml spec.kubernetes.services.cray-sysmgmt-health.uanNodeExporter
The expected output looks similar to:
enabled: true
endpoints:
- 10.252.1.18
- 10.252.1.13
Edit the platform.yaml
to only include the cray-sysmgmt-health
chart and all its current data.
The resources specified above will be updated in the next step. The version may differ, because this is an example.
apiVersion: manifests/v1beta1
metadata:
name: platform
spec:
charts:
- name: cray-sysmgmt-health
namespace: sysmgmt-health
values:
.
.
.
version: 0.28.9
(ncn#
) Generate the manifest that will be used to redeploy the chart with the modified resources.
manifestgen -c customizations.yaml -i platform.yaml -o manifest.yaml
(ncn#
) Check that the manifest file contains the desired resource settings.
yq read manifest.yaml 'spec.charts.(name==cray-sysmgmt-health).values.uanNodeExporter'
Example output:
enabled: true
endpoints:
- 10.252.1.18
- 10.252.1.13
(ncn#
) Redeploy the same chart version but with the desired UAN NODE Exporter configuration settings.
loftsman ship --charts-path /etc/cray/upgrade/csm/csm-${CSM_RELEASE}/tarball/csm-${CSM_RELEASE}/helm/ --manifest-path manifest.yaml
Here, /etc/cray/upgrade/csm/csm-${CSM_RELEASE}/tarball/csm-${CSM_RELEASE}/helm/
is the path of the cray-sysmgmt-health
chart.
(ncn#
) This step is critical. Store the modified customizations.yaml
file in the site-init
repository in the customer-managed location.
If this is not done, these changes will not persist in future installs or upgrades.
kubectl delete secret -n loftsman site-init
kubectl create secret -n loftsman generic site-init --from-file=customizations.yaml
(ncn#
) Verify that the changes are in place.
kubectl get endpoints cray-sysmgmt-health-uan-node-exporter -n sysmgmt-health -o json | jq -r '.subsets[0].addresses'
Example output:
[
{
"ip": "10.252.1.18"
},
{
"ip": "10.252.1.13"
}
]