There is a known issue with Antero nodes where NIDs are not correctly allocated. When Cray Site Init (CSI) generates the System Layout Service (SLS) input file, it assumes that all blades in liquid-cooled cabinets are Windom compute blades. Even though both Antero and Windom blades have 4 nodes, they have different physical layouts.
Windom blade node locations: b0n0, b0n1, b1n0, b1n1
Antero blade node locations: b0n0, b0n1, b0n2, b0n3

SLS has NIDs allocated only for nodes b0n0, b0n1, b1n0, and b1n1 on a compute node blade.
On an Antero blade, the nodes b0n2 and b0n3 will have automatically assigned NIDs that are not contiguous
with the NIDs on nodes b0n0 and b0n1.
The nodes b0n2 and b0n3 on an Antero blade are functional;
only their NIDs fall outside the contiguous range used by their peers.
This section describes how to work around this issue until a system
maintenance window is available in which to correct the NID numbering.
To work around this issue, the appropriate NID values for nodes b0n2 and b0n3 on Antero blades
must be supplied to the workload manager (WLM) when it launches jobs.
The following sections provide examples of SAT commands that can help determine the NIDs that are in use for Antero blades.
(ncn-mw#) View the NIDs for Antero blades in the system:
ANTERO=$(sat hwinv --list-node-enclosures --fields=xname \
--filter='Model=ANTERO' --format json | \
jq '.node_enclosure_list[] | "xname=\(.xname)*"' -r | sed 's/e0//' | \
paste -sd " " | sed 's/ / or /g')
sat status --type Node --fields 'xname,role,nid' --filter "${ANTERO}"
Example output:
+---------------+---------+-----------+
| xname | Role | NID |
+---------------+---------+-----------+
| x9000c1s4b0n0 | Compute | 1016 |
| x9000c1s4b0n1 | Compute | 1017 |
| x9000c1s4b0n2 | Compute | 147474562 |
| x9000c1s4b0n3 | Compute | 147474563 |
| x9000c1s5b0n0 | Compute | 1020 |
| x9000c1s5b0n1 | Compute | 1021 |
| x9000c1s5b0n2 | Compute | 147474594 |
| x9000c1s5b0n3 | Compute | 147474595 |
+---------------+---------+-----------+
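Because the workaround relies on supplying these NID values to the WLM, it can help to collect them into a single list. The following is a minimal sketch, not part of SAT: `nids_from_table` is a hypothetical helper that reads the table output shown above on standard input and prints the NID column as a comma-separated line. It assumes the columns appear in the order xname, role, NID. The exact form in which the WLM expects node identifiers is site-specific and is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SAT): read `sat status` table output on
# stdin and print the NID column as one comma-separated line.
# Assumes the fields were requested in the order xname, role, nid.
nids_from_table() {
    awk -F'|' '
        # Data rows look like "| x9000c1s4b0n0 | Compute | 1016 |".
        # Border lines and the header row do not match the node xname pattern.
        NF >= 4 && $2 ~ /x[0-9]+c[0-9]+s[0-9]+b[0-9]+n[0-9]+/ {
            nid = $4
            gsub(/ /, "", nid)          # strip column padding
            printf "%s%s", sep, nid
            sep = ","
        }
        END { print "" }
    '
}

# Demonstration using rows copied from the example output above.
nids_from_table <<'EOF'
+---------------+---------+-----------+
| xname         | Role    | NID       |
+---------------+---------+-----------+
| x9000c1s4b0n0 | Compute | 1016      |
| x9000c1s4b0n1 | Compute | 1017      |
| x9000c1s4b0n2 | Compute | 147474562 |
| x9000c1s4b0n3 | Compute | 147474563 |
+---------------+---------+-----------+
EOF
# prints: 1016,1017,147474562,147474563
```

On a live system, the `sat status` command shown above could be piped into the helper in place of the here-document.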
(ncn-mw#) Identify the Antero nodes present in the system:
sat hwinv --list-nodes --fields 'xname,"Model"' --filter='Model="HPE EX4252"'
Example output:
################################################################################
Listing of all nodes
################################################################################
+---------------+------------+
| xname | Model |
+---------------+------------+
| x9000c1s7b0n0 | HPE EX4252 |
| x9000c1s7b0n1 | HPE EX4252 |
| x9000c1s7b0n2 | HPE EX4252 |
| x9000c1s7b0n3 | HPE EX4252 |
| x9000c3s0b0n0 | HPE EX4252 |
| x9000c3s0b0n1 | HPE EX4252 |
| x9000c3s0b0n2 | HPE EX4252 |
| x9000c3s0b0n3 | HPE EX4252 |
+---------------+------------+
(ncn-mw#) View NIDs for all compute nodes in the system:
sat status --type Node --fields 'xname,role,nid' --filter 'role=compute'
Example output:
+---------------+---------+-----------+
| xname | Role | NID |
+---------------+---------+-----------+
| x9000c1s0b0n0 | Compute | 1000 |
| x9000c1s0b0n1 | Compute | 1001 |
| x9000c1s1b0n0 | Compute | 1004 |
| x9000c1s1b0n1 | Compute | 1005 |
| x9000c1s7b0n0 | Compute | 1028 |
| x9000c1s7b0n1 | Compute | 1029 |
| x9000c1s7b0n2 | Compute | 147474562 |
| x9000c1s7b0n3 | Compute | 147474563 |
| x9000c3s0b0n0 | Compute | 1032 |
| x9000c3s0b0n1 | Compute | 1033 |
| x9000c3s0b0n2 | Compute | 147474594 |
| x9000c3s0b0n3 | Compute | 147474595 |
+---------------+---------+-----------+
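To spot affected blades in a long listing, the NIDs can be grouped by slot and checked for contiguity. The sketch below is illustrative only: `check_slot_nids` is a hypothetical helper, not a SAT command. It assumes the table layout shown above (xname, role, NID) and that NIDs are unique, and it judges only the nodes that appear in the input.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SAT): group `sat status` table rows by
# slot (the xname with its trailing bNnM removed) and report whether the
# NIDs seen for each slot form one contiguous range.
check_slot_nids() {
    awk -F'|' '
        NF >= 4 && $2 ~ /x[0-9]+c[0-9]+s[0-9]+b[0-9]+n[0-9]+/ {
            xname = $2
            gsub(/ /, "", xname)               # strip column padding
            nid = $4 + 0                       # force numeric comparison
            slot = xname
            sub(/b[0-9]+n[0-9]+$/, "", slot)   # x9000c1s7b0n2 -> x9000c1s7
            if (!(slot in count)) {
                order[++n] = slot              # remember first-seen order
                min[slot] = nid
                max[slot] = nid
            }
            if (nid < min[slot]) min[slot] = nid
            if (nid > max[slot]) max[slot] = nid
            count[slot]++
        }
        END {
            for (i = 1; i <= n; i++) {
                s = order[i]
                # Unique NIDs are contiguous when the span equals the count.
                state = (max[s] - min[s] + 1 == count[s]) ? "contiguous" : "non-contiguous"
                print s, state
            }
        }
    '
}

# Demonstration using rows copied from the example output above.
check_slot_nids <<'EOF'
| x9000c1s0b0n0 | Compute | 1000      |
| x9000c1s0b0n1 | Compute | 1001      |
| x9000c1s7b0n0 | Compute | 1028      |
| x9000c1s7b0n1 | Compute | 1029      |
| x9000c1s7b0n2 | Compute | 147474562 |
| x9000c1s7b0n3 | Compute | 147474563 |
EOF
# prints:
# x9000c1s0 contiguous
# x9000c1s7 non-contiguous
```

Slots reported as non-contiguous are the ones whose NID values need attention when launching jobs through the WLM.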
Optionally, during a system maintenance window, the Antero NID numbering can be corrected by following the Defragment NID Numbering procedure.