The Prometheus UAN Node Exporter service, service monitor, and endpoints are deployed by the
cray-sysmgmt-health chart in the sysmgmt-health namespace to scrape SMARTMON data.
To provide data to the Grafana SMART dashboards, the UAN Node Exporter must be configured with a list of UAN NMN IP addresses to scrape metrics from.
The method to do this depends on whether it is being done as part of a CSM install, or after CSM has already been installed:
NOTE: All variables used on this page depend on the /etc/environment setup done in Pre-installation.
Obtain the list of site-specific UAN NMN IP addresses.
(pit#) Update customizations.yaml with the list of UAN node IP addresses.
yq write -s - -i ${PITDATA}/prep/site-init/customizations.yaml <<EOF
- command: update
  path: spec.kubernetes.services.cray-sysmgmt-health.uanNodeExporter
  value:
    enabled: true
    endpoints:
      - 10.252.1.18
      - 10.252.1.13
EOF
(pit#) Review the UAN Node Exporter configuration.
yq r ${PITDATA}/prep/site-init/customizations.yaml spec.kubernetes.services.cray-sysmgmt-health.uanNodeExporter
The expected output looks similar to:
uanNodeExporter:
  enabled: true
  endpoints:
    - 10.252.1.18
    - 10.252.1.13
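When the endpoint list is long, typing it into the heredoc by hand is error-prone. The following is a minimal sketch, not part of CSM, that builds the same update script from a list of addresses passed as arguments. The function name gen_uan_exporter_update is hypothetical, and the addresses are the example values used on this page.

```shell
#!/bin/sh
# Sketch: build the yq update script from a list of UAN NMN IP addresses.
# gen_uan_exporter_update is a hypothetical helper name, not a CSM tool.
gen_uan_exporter_update() {
    printf '%s\n' '- command: update'
    printf '%s\n' '  path: spec.kubernetes.services.cray-sysmgmt-health.uanNodeExporter'
    printf '%s\n' '  value:'
    printf '%s\n' '    enabled: true'
    printf '%s\n' '    endpoints:'
    # One list entry per address given on the command line.
    for ip in "$@"; do
        printf '      - %s\n' "$ip"
    done
}

# Print the script for review; these are the example addresses from this page.
gen_uan_exporter_update 10.252.1.18 10.252.1.13
```

After reviewing the printed script, the same output could be piped into the yq write -s - -i command shown above in place of the heredoc.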
The most common configuration parameters are specified in the following table. They must be set in the customizations.yaml file
under the spec.kubernetes.services.cray-sysmgmt-health.uanNodeExporter service definition.
| Customization | Default | Description |
|---|---|---|
| enabled | false | Enables service for UAN Node Exporter (default chart value is false) |
| endpoints | 10.252.1.13 | List of UAN NMN IP addresses to monitor SMARTMON data |
For a complete set of available parameters, consult the values.yaml file for the cray-sysmgmt-health chart.
This procedure configures the UAN Node Exporter after the PIT node no longer exists,
by editing the manifest and deploying the cray-sysmgmt-health Helm chart.
(uan#) Obtain the list of UAN NMN IP addresses.
hostname -i
Expected output looks similar to the following:
::1 127.0.0.1 10.252.1.13
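The output of hostname -i mixes loopback addresses with the address that is actually needed. As a rough sketch, the loopback and IPv6 entries can be filtered out as follows. The function name nonloopback_ipv4 is hypothetical. If the UAN has several interfaces, more than one address may remain, and the NMN address still has to be picked by the administrator.

```shell
#!/bin/sh
# Sketch: keep only non-loopback IPv4 addresses from `hostname -i` style output.
# nonloopback_ipv4 is a hypothetical helper name; it reads from stdin.
nonloopback_ipv4() {
    # Put each space-separated address on its own line, keep dotted-quad
    # IPv4 addresses, then drop the 127.0.0.0/8 loopback range.
    tr ' ' '\n' |
        grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' |
        grep -v '^127\.'
}

# Example input copied from the expected output above.
printf '%s\n' '::1 127.0.0.1 10.252.1.13' | nonloopback_ipv4
```

On a UAN, the equivalent call would be hostname -i | nonloopback_ipv4.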
(ncn-mw#) Get the current cached customizations.
kubectl get secrets -n loftsman site-init -o jsonpath='{.data.customizations\.yaml}' | base64 -d > customizations.yaml
(ncn-mw#) Get the current cached platform manifest.
kubectl get cm -n loftsman loftsman-platform -o jsonpath='{.data.manifest\.yaml}' > platform.yaml
(ncn-mw#) Update customizations.yaml with the list of UAN node IP addresses.
yq write -s - -i customizations.yaml <<EOF
- command: update
  path: spec.kubernetes.services.cray-sysmgmt-health.uanNodeExporter
  value:
    enabled: true
    endpoints:
      - 10.252.1.18
      - 10.252.1.13
EOF
(ncn-mw#) Review the UAN Node Exporter configuration.
yq r customizations.yaml spec.kubernetes.services.cray-sysmgmt-health.uanNodeExporter
The expected output looks similar to:
enabled: true
endpoints:
  - 10.252.1.18
  - 10.252.1.13
Edit platform.yaml so that it includes only the cray-sysmgmt-health chart and all of its current data.
The UAN Node Exporter settings specified above are applied in the next step. The version shown here is an example and may differ.
apiVersion: manifests/v1beta1
metadata:
  name: platform
spec:
  charts:
  - name: cray-sysmgmt-health
    namespace: sysmgmt-health
    values:
      #.
      #.
      #.
    version: 0.28.9
(ncn-mw#) Generate the manifest that will be used to redeploy the chart with the modified settings.
manifestgen -c customizations.yaml -i platform.yaml -o manifest.yaml
(ncn-mw#) Check that the manifest file contains the desired UAN Node Exporter settings.
yq read manifest.yaml 'spec.charts.(name==cray-sysmgmt-health).values.uanNodeExporter'
Example output:
enabled: true
endpoints:
  - 10.252.1.18
  - 10.252.1.13
(ncn-mw#) Redeploy the same chart version but with the desired UAN Node Exporter configuration settings.
loftsman ship --charts-path /etc/cray/upgrade/csm/csm-${CSM_RELEASE}/tarball/csm-${CSM_RELEASE}/helm/ --manifest-path manifest.yaml
Here, /etc/cray/upgrade/csm/csm-${CSM_RELEASE}/tarball/csm-${CSM_RELEASE}/helm/ is the path of the cray-sysmgmt-health chart.
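Because the charts path is built from CSM_RELEASE, an unset variable silently produces a wrong path. The following is a small sketch of a guard that could be run before loftsman ship. The function name csm_charts_path is hypothetical, and the release value 1.5.0 in the example call is only a placeholder.

```shell
#!/bin/sh
# Sketch: derive the charts path and refuse to continue if no release is known.
# csm_charts_path is a hypothetical helper name. It takes the release as an
# optional argument and falls back to the CSM_RELEASE environment variable.
csm_charts_path() {
    release="${1:-${CSM_RELEASE:-}}"
    if [ -z "$release" ]; then
        echo "CSM_RELEASE is not set" >&2
        return 1
    fi
    printf '%s\n' "/etc/cray/upgrade/csm/csm-${release}/tarball/csm-${release}/helm/"
}

# Example call with a placeholder release; substitute the installed release.
csm_charts_path 1.5.0
```

One possible use is charts="$(csm_charts_path)" && [ -d "$charts" ] && loftsman ship --charts-path "$charts" --manifest-path manifest.yaml, so that the deployment is skipped when the variable is unset or the directory is missing.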
(ncn-mw#) This step is critical. Store the modified customizations.yaml file in the site-init repository in the customer-managed location, and update the site-init secret in the loftsman namespace with the following commands.
If this is not done, then these changes will not persist in future installs or upgrades.
kubectl delete secret -n loftsman site-init
kubectl create secret -n loftsman generic site-init --from-file=customizations.yaml
(ncn-mw#) Verify that the changes are in place.
kubectl get endpoints cray-sysmgmt-health-uan-node-exporter -n sysmgmt-health -o json | jq -r '.subsets[0].addresses'
Example output:
[
  {
    "ip": "10.252.1.18"
  },
  {
    "ip": "10.252.1.13"
  }
]
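Comparing the returned addresses against the configured list by eye works for two entries but not for many. As a sketch only, the comparison can be scripted as shown here. The function name endpoints_match is hypothetical. It extracts every dotted-quad address from the JSON on stdin, so it assumes the input contains no IP addresses other than the endpoint entries.

```shell
#!/bin/sh
# Sketch: compare the IP addresses in the endpoints JSON with an expected list.
# endpoints_match is a hypothetical helper name. It reads the JSON on stdin and
# takes the expected addresses as arguments; the order does not matter.
endpoints_match() {
    expected="$(printf '%s\n' "$@" | sort)"
    actual="$(grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort)"
    [ "$expected" = "$actual" ]
}

# Example input condensed from the example output above.
sample='[ { "ip": "10.252.1.18" }, { "ip": "10.252.1.13" } ]'
if printf '%s\n' "$sample" | endpoints_match 10.252.1.13 10.252.1.18; then
    echo "endpoints match"
else
    echo "endpoints differ"
fi
```

In practice, the output of the kubectl get endpoints command above would be piped into endpoints_match instead of the sample string.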