Troubleshoot CMN issues

This topic describes the connection points to check when using the CMN and how to fix any issues that arise.

The most frequent issue with the Customer Management Network (CMN) is trouble accessing IP addresses outside of the HPE Cray EX system from a node or pod inside the system.

The best way to diagnose this issue is to ping an outside IP address from one of the NCNs other than ncn-m001. Do not use ncn-m001 for this test, because it has a direct connection that it can use instead of the CMN. Check the following to make sure the CMN is configured correctly:

Does the NCN have an IP Address Configured on the bond0.cmn0 Interface?

Check the status of the bond0.cmn0 interface. Make sure it has an address specified.

ip addr show bond0.cmn0

Example output:

534: bond0.cmn0@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 98:03:9b:b4:27:62 brd ff:ff:ff:ff:ff:ff
    inet 10.102.5.5/26 brd 10.101.8.255 scope global bond0.cmn0
       valid_lft forever preferred_lft forever
    inet6 fe80::9a03:9bff:feb4:2762/64 scope link
       valid_lft forever preferred_lft forever

If no address is specified, make sure the cmn- values have been defined in the csi config init input.
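The address check above can also be scripted. The following is a minimal sketch, not part of the product tooling: cmn_ipv4_addr is a hypothetical helper name, and it assumes the `ip addr show` output format shown in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of CSM): reads `ip addr show <interface>`
# output on stdin and prints the first IPv4 address in CIDR form.
# Returns non-zero if no "inet" line is present.
cmn_ipv4_addr() {
    awk '$1 == "inet" { print $2; found = 1; exit } END { exit !found }'
}
```

On an NCN, this could be used as `ip addr show bond0.cmn0 | cmn_ipv4_addr || echo "no IPv4 address on bond0.cmn0"`.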

Does the NCN have a Default Gateway Configured?

Check the default route on an NCN other than ncn-m001. There should be a default route with a gateway matching the cmn-gateway value.

ip route | grep default

Example output:

default via 10.102.5.1 dev bond0.cmn0

If there is no default route, or its gateway does not match the cmn-gateway value, make sure the cmn- values have been defined in the csi config init input.
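To compare the default route against an expected gateway without reading the output by eye, a small parser can help. This is a sketch under assumptions: default_route is a hypothetical helper name, and it expects `ip route` output in the "default via <gateway> dev <device>" form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of CSM): reads `ip route` output on stdin and
# prints "<gateway> <device>" for the first default route.
# Returns non-zero if there is no default route.
default_route() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) {
            if ($i == "via") gw = $(i + 1)    # gateway follows "via"
            if ($i == "dev") dev = $(i + 1)   # interface follows "dev"
        }
        print gw, dev; found = 1; exit
    } END { exit !found }'
}
```

On an NCN, `ip route | default_route` would be expected to print the cmn-gateway value followed by bond0.cmn0, for example `10.102.5.1 bond0.cmn0`.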

Can the Node Reach the Default CMN Gateway?

Check that the node can ping the default gateway shown in the default route.

ping 10.102.5.1

Example output:

PING 10.102.5.1 (10.102.5.1) 56(84) bytes of data.
64 bytes from 10.102.5.1: icmp_seq=1 ttl=64 time=0.148 ms
64 bytes from 10.102.5.1: icmp_seq=2 ttl=64 time=0.107 ms
64 bytes from 10.102.5.1: icmp_seq=3 ttl=64 time=0.133 ms
64 bytes from 10.102.5.1: icmp_seq=4 ttl=64 time=0.122 ms
^C
--- 10.102.5.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3053ms
rtt min/avg/max/mdev = 0.107/0.127/0.148/0.018 ms

If the default gateway cannot be accessed, check the spine switch configuration.
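For unattended checks, the ping can be run with a fixed packet count instead of being interrupted with Ctrl-C, and the loss figure can be read from the summary line. The sketch below assumes the Linux iputils ping options -c (count) and -W (reply timeout in seconds); ping_loss is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of CSM): reads ping output on stdin and prints
# the packet loss percentage from the statistics line, without the "%" sign.
# Prints nothing if no statistics line is found.
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

On an NCN, `ping -c 4 -W 2 10.102.5.1 | ping_loss` would print 0 when all four replies are received.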

Can the Spines Reach Outside of the System?

Check that each of the spines can ping an IP address outside of the HPE Cray EX system. This must be an IP address that is reachable from the network to which the CMN is connected. If the system uses only one spine, only sw-spine-001 needs to be checked.

sw-spine-001 [standalone: master] # ping 8.8.8.8

Example output:

PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=112 time=12.6 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=112 time=12.5 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=112 time=22.4 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=112 time=12.5 ms
^C
--- 8.8.8.8 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 12.501/15.022/22.440/4.285 ms

If the outside IP address cannot be reached, check the spine switch configuration and the connection to the customer network.

Can the Spines Reach the NCN?

Check that each of the spines can ping one or more of the NCNs at its bond0.cmn0 IP address. If the system uses only one spine, only sw-spine-001 needs to be checked.

sw-spine-001 [standalone: master] # ping 10.102.5.5

Example output:

PING 10.102.5.5 (10.102.5.5) 56(84) bytes of data.
64 bytes from 10.102.5.5: icmp_seq=1 ttl=64 time=0.140 ms
64 bytes from 10.102.5.5: icmp_seq=2 ttl=64 time=0.134 ms
64 bytes from 10.102.5.5: icmp_seq=3 ttl=64 time=0.126 ms
64 bytes from 10.102.5.5: icmp_seq=4 ttl=64 time=0.178 ms
^C
--- 10.102.5.5 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3058ms
rtt min/avg/max/mdev = 0.126/0.144/0.178/0.023 ms

If the NCN cannot be reached, check the spine switch configuration.

Can a Device Outside the System Reach the CMN Gateway?

Check that a device outside the HPE Cray EX system that is expected to have access to nodes and services on the CMN can ping the CMN gateway.

ping 10.102.5.1

Example output:

PING 10.102.5.1 (10.102.5.1): 56 data bytes
64 bytes from 10.102.5.1: icmp_seq=0 ttl=58 time=54.724 ms
64 bytes from 10.102.5.1: icmp_seq=1 ttl=58 time=65.902 ms
64 bytes from 10.102.5.1: icmp_seq=2 ttl=58 time=51.960 ms
64 bytes from 10.102.5.1: icmp_seq=3 ttl=58 time=55.032 ms
64 bytes from 10.102.5.1: icmp_seq=4 ttl=58 time=57.606 ms
^C
--- 10.102.5.1 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 51.960/57.045/65.902/4.776 ms

If the CMN gateway cannot be reached from outside, check the spine switch configuration and the connection to the customer network.