This procedure adds one or more liquid-cooled cabinets and associated CDU management switches to SLS.
NOTE: This procedure is intended to be used in conjunction with the top level Add additional Liquid-Cooled Cabinets to a System procedure.
The Cray command line interface (CLI) tool is initialized and configured on the system. See Configure the Cray CLI.
The latest CSM documentation is installed on the system. See Check for latest documentation.
Collect information for the liquid-cooled cabinets being added to the system. For each cabinet collect:
x1004
)3004
)2004
)2025
)Collect information for the CDU switches (if any) being added to the system. For each CDU management switch, collect:
d1w1
)sw-cdu-004
)Perform an SLS dump state operation:
ncn-mw# cray sls dumpstate list --format json > sls_dump.json
ncn-mw# cp -v sls_dump.json sls_dump.original.json
For each new liquid-cooled cabinet being added to the system, collect the following information about each cabinet:
x1004
)3004
)2004
)2025
)The
inspect_sls_cabinets.py
script can be used to help display information about existing cabinets present in the system:ncn-mw# /usr/share/doc/csm/scripts/operations/system_layout_service/inspect_sls_cabinets.py sls_dump.json
Example output on a system with 1 air-cooled cabinet and 4 liquid-cooled cabinets:
================================= Cabinet NID Allocations ================================= Cabinet | NID Ranges --------------------|--------------------- x1000 (Mountain) | 1000-1255 x1001 (Mountain) | 1256-1511 x1002 (Mountain) | 1512-1767 x1003 (Mountain) | 1768-2023 x3000 (River) | 100001-100011 ================================= Cabinet Subnet & VLAN Allocations ================================= Cabinet | HMN VLAN | HMN CIDR | NMN VLAN | NMN CIDR --------------------|-----------|---------------------|-----------|--------------------- x1000 (Mountain) | 3000 | 10.104.0.0/22 | 2000 | 10.100.0.0/22 x1001 (Mountain) | 3001 | 10.104.4.0/22 | 2001 | 10.100.4.0/22 x1002 (Mountain) | 3002 | 10.104.8.0/22 | 2002 | 10.100.8.0/22 x1003 (Mountain) | 3003 | 10.104.12.0/22 | 2003 | 10.100.12.0/22 x3000 (River) | 1513 | 10.107.0.0/22 | 1770 | 10.106.0.0/22
For each new liquid-cooled cabinet, add it to the previously taken SLS state dump in ascending order:
Command line flags for add_liquid_cooled_cabinet.py
:
Argument | Description | Example value |
---|---|---|
--cabinet |
Component name (xname) of the liquid-cooled cabinet to add | x1000 |
--cabinet-type |
Type of liquid-cooled cabinet to add | Mountain , Hill , or EX2500 |
--cabinet-vlan-hmn |
Hardware Management Network (HMN) VLAN ID configured on the CEC | 3004 |
--cabinet-vlan-nmn |
Node Management Network (NMN) VLAN ID configured on the CEC | 2004 |
--starting-nid |
Starting NID for new cabinet. Each cabinet is allocated 256 NIDs | 2024 |
ncn-mw# /usr/share/doc/csm/scripts/operations/system_layout_service/add_liquid_cooled_cabinet.py sls_dump.json \
--cabinet x1004 \
--cabinet-type Mountain \
--cabinet-vlan-hmn 3004 \
--cabinet-vlan-nmn 2004 \
--starting-nid 2024
Example output:
========================
Configuration
========================
SLS State File: sls_dump.json
Starting NID: 2024
Cabinet: x1004
Cabinet Type: Mountain
Cabinet VLAN HMN: 3004
Cabinet VLAN NMN: 2004
========================
Network Configuration
========================
Selecting subnet for x1004 cabinet in HMN_MTN network
Found existing subnet cabinet_1000 with CIDR 10.104.0.0/22
Found existing subnet cabinet_1001 with CIDR 10.104.4.0/22
Found existing subnet cabinet_1002 with CIDR 10.104.8.0/22
Found existing subnet cabinet_1003 with CIDR 10.104.12.0/22
10.104.16.0/22 Available for use.
Selecting subnet for x1004 cabinet in NMN_MTN network
Found existing subnet cabinet_1001 with CIDR 10.100.4.0/22
Found existing subnet cabinet_1002 with CIDR 10.100.8.0/22
Found existing subnet cabinet_1003 with CIDR 10.100.12.0/22
10.100.16.0/22 Available for use.
HMN_MTN Subnet
CIDR: 10.104.16.0/22
Gateway: 10.104.16.1
DHCP Start: 10.104.16.10
DHCP End: 10.104.19.254
NMN_MTN Subnet
CIDR: 10.100.16.0/22
Gateway: 10.100.16.1
DHCP Start: 10.100.16.10
DHCP End: 10.100.19.254
Next available NID 2280
Writing updated SLS state to sls_dump.json
Note: If adding more than one cabinet and contiguous NIDs are desired, then the value of the Next available NID 2280
can be
used as the value for the --start-nid
argument when adding the next cabinet.
Possible Errors:
Problem | Error Message | Resolution |
---|---|---|
Duplicate Cabinet Xname | Error x1000 already exists in sls_dump.json! |
The cabinet has already present in SLS. Ensure the new cabinet has a unique component name (xname), or the cabinet is already present in SLS. |
Duplicate NID values | Error found duplicate NID 3000 |
Need to choose a different starting NID value for the cabinet that does not overlap with existing nodes. |
Duplicate Cabinet HMN VLAN ID: | Error found duplicate VLAN 3022 with subnet cabinet_1001 in HMN_MTN |
Ensure that the this new cabinet has an unique HMN VLAN ID. |
Duplicate Cabinet NMN VLAN ID | Error found duplicate VLAN 3023 with subnet cabinet_1001 in NMN_MTN |
Ensure that the this new cabinet has an unique NMN VLAN ID. |
Inspect cabinet subnet and VLAN allocations in the system after adding the new cabinets.
ncn-mw# /usr/share/doc/csm/scripts/operations/system_layout_service/inspect_sls_cabinets.py sls_dump.json
Example output:
=================================
Cabinet NID Allocations
=================================
Cabinet | NID Ranges
--------------------|---------------------
x1000 (Mountain) | 1000-1255
x1001 (Mountain) | 1256-1511
x1002 (Mountain) | 1512-1767
x1003 (Mountain) | 1768-2023
x1004 (Mountain) | 2024-2279
x3000 (River) | 100001-100011
=================================
Cabinet Subnet & VLAN Allocations
=================================
Cabinet | HMN VLAN | HMN CIDR | NMN VLAN | NMN CIDR
--------------------|-----------|---------------------|-----------|---------------------
x1000 (Mountain) | 3000 | 10.104.0.0/22 | 2000 | 10.100.0.0/22
x1001 (Mountain) | 3001 | 10.104.4.0/22 | 2001 | 10.100.4.0/22
x1002 (Mountain) | 3002 | 10.104.8.0/22 | 2002 | 10.100.8.0/22
x1003 (Mountain) | 3003 | 10.104.12.0/22 | 2003 | 10.100.12.0/22
x1004 (Mountain) | 3004 | 10.104.16.0/22 | 2004 | 10.100.16.0/22
x3000 (River) | 1513 | 10.107.0.0/22 | 1770 | 10.106.0.0/22
For each new CDU switch being added to the system, collect the following information about it:
dDwW
- dD
: where D
is the Coolant Distribution Unit (CDU).
- wW
: where W
is the management switch in a CDU.xXcChHsS
xX
: where X
is the River cabinet identification number (the figure above is 3000
).cC
: where C
is the chassis identification number. This should be 0
.hH
: where H
is the slot number in the cabinet (height).sS
: where S
is the horizontal space number.sw-cdu-004
)For each new CDU switch, add it to the SLS state dump taken in step 1 in ascending order based on the switch alias:
ncn-mw# /usr/share/doc/csm/scripts/operations/system_layout_service/add_cdu_switch.py sls_dump.json \
--cdu-switch d1w1 \
--alias sw-cdu-003 \
--brand Dell
Example output:
========================
Configuration
========================
SLS State File: sls_dump.json
CDU Switch: d1w1
Brand: Dell
Alias: sw-cdu-003
================================
CDU Switch Network Configuration
================================
Selecting IP Reservation for d1w1 CDU Switch in HMN's network_hardware subnet
Found existing IP reservation sw-spine-001 with IP 10.254.0.2
Found existing IP reservation sw-spine-002 with IP 10.254.0.3
Found existing IP reservation sw-leaf-bmc-001 with IP 10.254.0.4
Found existing IP reservation sw-leaf-bmc-002 with IP 10.254.0.5
Found existing IP reservation sw-cdu-001 with IP 10.254.0.6
Found existing IP reservation sw-cdu-002 with IP 10.254.0.7
10.254.0.8 Available for use.
Selecting IP Reservation for d1w1 CDU Switch in NMN's network_hardware subnet
Found existing IP reservation sw-spine-001 with IP 10.252.0.2
Found existing IP reservation sw-spine-002 with IP 10.252.0.3
Found existing IP reservation sw-leaf-bmc-001 with IP 10.252.0.4
Found existing IP reservation sw-leaf-bmc-002 with IP 10.252.0.5
Found existing IP reservation sw-cdu-001 with IP 10.252.0.6
Found existing IP reservation sw-cdu-002 with IP 10.252.0.7
10.252.0.8 Available for use.
Selecting IP Reservation for d1w1 CDU Switch in MTL's network_hardware subnet
Found existing IP reservation sw-spine-001 with IP 10.1.0.2
Found existing IP reservation sw-spine-002 with IP 10.1.0.3
Found existing IP reservation sw-leaf-bmc-001 with IP 10.1.0.4
Found existing IP reservation sw-leaf-bmc-002 with IP 10.1.0.5
Found existing IP reservation sw-cdu-001 with IP 10.1.0.6
Found existing IP reservation sw-cdu-002 with IP 10.1.0.7
10.1.0.8 Available for use.
HMN IP: 10.254.0.8
NMN IP: 10.252.0.8
MTL IP: 10.1.0.8
Writing updated SLS state to sls_dump.json
Inspect the differences between the original SLS state file and the modified one.
ncn-mw# diff sls_dump.original.json sls_dump.json
Perform an SLS load state operation to replace the contents of SLS with the data from the sls_dump.json
file.
Get an API token.
ncn-nw# TOKEN=$(curl -s -S -d grant_type=client_credentials \
-d client_id=admin-client \
-d client_secret=`kubectl get secrets admin-client-auth -o jsonpath='{.data.client-secret}' | base64 -d` \
https://api-gw-service-nmn.local/keycloak/realms/shasta/protocol/openid-connect/token | jq -r '.access_token')
Perform the load state operation.
ncn-mw# curl -s -k -H "Authorization: Bearer ${TOKEN}" -X POST -F sls_dump=@sls_dump.json \
https://api-gw-service-nmn.local/apis/sls/v1/loadstate
MEDS will automatically start looking for potential hardware in the newly added liquid-cooled cabinets.
Note: No hardware in these new cabinets will be discovered until the management network has been reconfigured to support the new cabinets, and routes have been added to the management NCNs in the system.