Prometheus SNMP Exporter

The Prometheus SNMP Exporter is deployed by the cray-sysmgmt-health chart to the sysmgmt-health namespace as part of the Cray System Management (CSM) release.

Adding SNMP credentials to the system

Both the Prometheus SNMP Exporter and River Endpoint Discovery Service (REDS) hardware discovery use SNMP credentials stored in Vault. These credentials are read from Vault and pushed into customizations.yaml as a sealed secret. For the SNMP Exporter and REDS hardware discovery to work, an administrator must have completed the following:

  1. The SNMP credentials have been stored at the correct path in Vault.
  2. The SNMP credentials have been encrypted as sealed secrets and written to customizations.yaml (this is covered by the same documentation as the previous step).
  3. The SNMP credentials have been added to the running configuration on every management network switch.

Note: While the CSM Automatic Network Utility (CANU) typically does not overwrite SNMP settings that are manually applied to the management switches, there are certain cases where the SNMP configuration can be overwritten or lost (such as when a switch is reset to factory defaults and reconfigured). To persist the SNMP settings, see CANU Custom Configuration. CANU custom configuration files are used to persist site management network configurations that are intended to take precedence over configurations generated by CANU.

If the Prometheus SNMP Exporter or REDS hardware discovery emits errors related to SNMP authentication, then an administrator should:

Configuration

In order to provide data to the Grafana SNMP dashboards, the SNMP Exporter must be configured with a list of management network switches to scrape metrics from.

NOTE: All variables used on this page depend on the /etc/environment setup done in Pre-installation.

  1. (pit#) Update customizations.yaml with the list of switches to be monitored by the SNMP Exporter.

    /usr/share/doc/csm/scripts/configure_snmp_monitor.py -c "${PITDATA}/prep/site-init/customizations.yaml" -s "${PITDATA}/prep/${SYSTEM_NAME}/sls_input_file.json"

    Expected output looks similar to the following:

    Switches to monitor for subnet HMN
    [{'name': 'sw-spine-001', 'target': '10.254.0.2'},
     {'name': 'sw-spine-002', 'target': '10.254.0.3'},
     {'name': 'sw-leaf-bmc-001', 'target': '10.254.0.4'}]
    Enabling prometheus-snmp-exporter serviceMonitor
    Adding the targets to the SNMP serviceMonitor configuration

    The HMN is used by default because ACLs in the switch configuration block SNMP over the NMN. To monitor the switches over the NMN instead, pass -n NMN to the configure_snmp_monitor.py script.

  2. (pit#) Review the SNMP Exporter configuration.

    yq4 eval '.spec.kubernetes.services.cray-sysmgmt-health.prometheus-snmp-exporter' "${PITDATA}/prep/site-init/customizations.yaml"

    The expected output looks similar to:

    serviceMonitor:
      enabled: true
      params:
        - name: sw-spine-001
          target: 10.254.0.2
        - name: sw-spine-002
          target: 10.254.0.3
        - name: sw-leaf-bmc-001
          target: 10.254.0.4
    
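Step 1 derives the list of switches from the SLS input file. The following Python sketch approximates that lookup for illustration only; it is not the configure_snmp_monitor.py implementation. It assumes that SLS records switch addresses as IP reservations in a subnet named network_hardware under Networks.<network>.ExtraProperties.Subnets, and that switch names begin with sw-. The network argument mirrors the -n option of the script.

```python
import json
import sys


def switch_targets(sls, network="HMN"):
    """Return [{'name': ..., 'target': ...}] for management switches on `network`.

    ASSUMPTION: switch IPs are stored as IP reservations in a subnet named
    "network_hardware" under Networks.<network>.ExtraProperties.Subnets.
    """
    subnets = sls["Networks"][network]["ExtraProperties"]["Subnets"]
    targets = []
    for subnet in subnets:
        # Only the hardware subnet is assumed to hold switch reservations.
        if subnet.get("Name") != "network_hardware":
            continue
        for res in subnet.get("IPReservations", []):
            # ASSUMPTION: switch hostnames start with "sw-" (sw-spine-*, sw-leaf-bmc-*, ...).
            if res["Name"].startswith("sw-"):
                targets.append({"name": res["Name"], "target": res["IPAddress"]})
    return targets


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 <script> <sls_input_file.json> [HMN|NMN]
    with open(sys.argv[1]) as f:
        print(switch_targets(json.load(f), *sys.argv[2:3]))
```

Saved under a hypothetical name such as sls_switches.py, it could be run against "${PITDATA}/prep/${SYSTEM_NAME}/sls_input_file.json" and its output compared with the list that configure_snmp_monitor.py prints.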

The most common configuration parameters are specified in the following table. They must be set in the customizations.yaml file under the spec.kubernetes.services.cray-sysmgmt-health.prometheus-snmp-exporter service definition.

Customization            Default      Description
-----------------------  -----------  ------------------------------------------------------------------------------
serviceMonitor.enabled   true         Enables the serviceMonitor for the SNMP Exporter (default chart value is true)
params.enabled           true         Enables the SNMP Exporter params (default chart value is false)
params.conf.module       if_mib       Selects the SNMP module the SNMP Exporter uses (default chart value is if_mib)
params.conf.target       10.252.0.2   List of switch targets for the SNMP Exporter to monitor

For a complete set of available parameters, consult the values.yaml file for the cray-sysmgmt-health chart.
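The module and target parameters end up as query parameters on the exporter's scrape endpoint. As a rough illustration, assuming the upstream snmp_exporter behavior where Prometheus requests /snmp with module and target query parameters (upstream default port 9116), the following Python sketch builds the URL that would be scraped for each switch. The exporter address is a hypothetical placeholder, and the sketch does not show how the cray-sysmgmt-health chart actually renders these params into its serviceMonitor.

```python
from urllib.parse import urlencode


def scrape_urls(exporter, params, module="if_mib"):
    """Map each switch name to the snmp_exporter URL Prometheus would scrape.

    `exporter` is a HYPOTHETICAL host:port for the SNMP Exporter service;
    9116 is the upstream snmp_exporter default listen port.
    """
    urls = {}
    for p in params:
        # snmp_exporter reads the SNMP module and the device address from the query string.
        query = urlencode({"module": module, "target": p["target"]})
        urls[p["name"]] = f"http://{exporter}/snmp?{query}"
    return urls


print(scrape_urls("snmp-exporter.example:9116",
                  [{"name": "sw-spine-001", "target": "10.254.0.2"}]))
```

Requesting one of these URLs by hand (for example with curl from a node that can reach the exporter) is a quick way to tell an SNMP authentication failure on a switch apart from a Prometheus scrape configuration problem.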