The configuration payload consists of the information which must be known about the HPE Cray EX system so that it can be passed to the `csi` (Cray Site Init) program during the CSM installation process.
Information gathered from a site survey feeds into the CSM installation process: system name, system size, site network information for the CAN, site DNS configuration, site NTP configuration, and network information for the node used to bootstrap the installation. More detailed component-level information about the system hardware is encapsulated in the SHCD (Shasta Cabling Diagram), a spreadsheet prepared by HPE Cray Manufacturing to assemble the components of the system and connect appropriately labeled cables.
How the configuration payload is prepared depends on whether this is a first time install of CSM software on this system or a reinstall. The reinstall scenario has the advantage of being able to reuse the configuration payload from a previous first time install of CSM, plus an extra configuration file (`system_config.yaml`) which that install generated.
This information from a site survey can be given to the `csi` command as command line arguments. The information is shown here to explain what data is needed; it will not be used until moving to the Bootstrap PIT Node procedure.
An air-cooled cabinet is known to `csi` as a `river` cabinet. Liquid-cooled cabinets are either `mountain` or `hill` (if a TDS system). For more description of these settings and the default values, see Default IP Address Ranges and the other topics in CSM Overview.
| CSI option | Information |
| --- | --- |
| `--bootstrap-ncn-bmc-user root` | Administrative account for the management node BMCs |
| `--bootstrap-ncn-bmc-pass changeme` | Password for the `bootstrap-ncn-bmc-user` account |
| `--system-name eniac` | Name of the HPE Cray EX system |
| `--mountain-cabinets 4` | Number of Mountain cabinets, but this could also be in `cabinets.yaml` |
| `--starting-mountain-cabinet 1000` | Starting Mountain cabinet ID |
| `--hill-cabinets 0` | Number of Hill cabinets, but this could also be in `cabinets.yaml` |
| `--river-cabinets 1` | Number of River cabinets, but this could also be in `cabinets.yaml` |
| `--can-cidr 10.103.11.0/24` | IP subnet for the CAN assigned to this system |
| `--can-external-dns 10.103.11.113` | IP address on the CAN for this system's DNS server |
| `--can-gateway 10.103.11.1` | Virtual IP address for the CAN (on the spine switches) |
| `--can-static-pool 10.103.11.112/28` | MetalLB static pool on the CAN |
| `--can-dynamic-pool 10.103.11.128/25` | MetalLB dynamic pool on the CAN |
| `--hmn-cidr 10.254.0.0/17` | Override the default cabinet IPv4 subnet for the River HMN |
| `--nmn-cidr 10.252.0.0/17` | Override the default cabinet IPv4 subnet for the River NMN |
| `--hmn-mtn-cidr 10.104.0.0/17` | Override the default cabinet IPv4 subnet for the Mountain HMN |
| `--nmn-mtn-cidr 10.100.0.0/17` | Override the default cabinet IPv4 subnet for the Mountain NMN |
| `--ntp-pools time.nist.gov` | External NTP pool(s) for this system to use |
| `--site-domain dev.cray.com` | Domain name for this system |
| `--site-ip 172.30.53.79/20` | IP address and netmask for the PIT node `lan0` connection |
| `--site-gw 172.30.48.1` | Gateway for the PIT node to use |
| `--site-nic p1p2` | NIC on the PIT node that will become `lan0` |
| `--site-dns 172.30.84.40` | Site DNS servers to be used by the PIT node |
| `--install-ncn-bond-members p1p1,p10p1` | NICs on each management node that will become `bond0` |
| `--application-node-config-yaml application_node_config.yaml` | Name of the `application_node_config.yaml` file |
| `--cabinets-yaml cabinets.yaml` | Name of the `cabinets.yaml` file |
| `--bgp-peers aggregation` | Override the default BGP peers, using aggregation switches instead of spines |
| `--k8s-api-auditing-enabled` | Enable Kubernetes API audit logging |
| `--ncn-mgmt-node-auditing-enabled` | Enable host audit logging |
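Taken together, these options are typically supplied on a single `csi config init` command line. The following is an illustrative sketch only, assembled from the example values in the table above (a representative subset; substitute the values gathered from the site survey for the system being installed):

```shell
csi config init \
    --bootstrap-ncn-bmc-user root \
    --bootstrap-ncn-bmc-pass changeme \
    --system-name eniac \
    --mountain-cabinets 4 \
    --starting-mountain-cabinet 1000 \
    --hill-cabinets 0 \
    --river-cabinets 1 \
    --can-cidr 10.103.11.0/24 \
    --can-external-dns 10.103.11.113 \
    --can-gateway 10.103.11.1 \
    --can-static-pool 10.103.11.112/28 \
    --can-dynamic-pool 10.103.11.128/25 \
    --ntp-pools time.nist.gov \
    --site-domain dev.cray.com \
    --site-ip 172.30.53.79/20 \
    --site-gw 172.30.48.1 \
    --site-nic p1p2 \
    --site-dns 172.30.84.40 \
    --install-ncn-bond-members p1p1,p10p1 \
    --application-node-config-yaml application_node_config.yaml \
    --cabinets-yaml cabinets.yaml
```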
Run the `csi` command with all of these options, and then edit the generated file to adjust the values for the particular system being installed.

- `bootstrap-ncn-bmc-user` and `bootstrap-ncn-bmc-pass` must match what is used for the BMC account and its password for the management nodes.
- The site parameters (`site-domain`, `site-ip`, `site-gw`, `site-nic`, `site-dns`) hold the information which connects `ncn-m001` (the PIT node) to the site. The `site-nic` is the interface on this node connected to the site.
- The `install-ncn-bond-members` are typically:
  - `p1p1,p10p1` for HPE nodes
  - `p1p1,p1p2` for Gigabyte nodes
  - `p801p1,p801p2` for Intel nodes
- The starting cabinet ID for each cabinet type (for example, `starting-mountain-cabinet`) has a default that can be overridden. See the `csi config init --help` output for more information.
- The default cabinet IPv4 subnets for Mountain cabinets can be overridden with the `hmn-mtn-cidr` and `nmn-mtn-cidr` parameters.
- Several parameters (`can-gateway`, `can-cidr`, `can-static-pool`, `can-dynamic-pool`) describe the Customer Access Network (CAN):
  - `can-gateway` is the common gateway IP address used for both spine switches, commonly referred to as the Virtual IP address for the CAN.
  - `can-cidr` is the IP subnet for the CAN assigned to this system.
  - `can-static-pool` and `can-dynamic-pool` are the MetalLB static and dynamic address pools for the CAN.
  - `can-external-dns` is the static IP address assigned to the DNS instance running in the cluster, to which requests for the cluster subdomain will be forwarded. The `can-external-dns` IP address must be within the `can-static-pool` range.
- Set `ntp-pools` to reachable NTP pools.
- The `application_node_config.yaml` file is required. It is used to describe the mapping between prefixes in `hmn_connections.json` and HSM subroles, and it also defines aliases for application nodes. For details, see Create Application Node YAML.
- Use `cabinets-yaml` to include the `cabinets.yaml` file. This file can include information about the starting ID for each cabinet type and the number of cabinets (which have separate command line options), but it is also a way to specify explicitly the ID of every cabinet in the system. See Create Cabinets YAML.
- Use `--k8s-api-auditing-enabled=true` to enable Kubernetes API audit logging, and use `--ncn-mgmt-node-auditing-enabled=true` to enable host audit logging. See Audit Logs for more information.

A few configuration files are needed for the installation of CSM. These are all provided to the `csi` command during the installation process.
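Before running `csi config init`, the CAN-related values from the option table above can be sanity-checked with a short script. This is a minimal sketch using Python's standard `ipaddress` module and the example values from the table; it is not part of the CSM tooling.

```python
import ipaddress

# Example CAN values from the option table above.
can_cidr = ipaddress.ip_network("10.103.11.0/24")
can_static_pool = ipaddress.ip_network("10.103.11.112/28")
can_dynamic_pool = ipaddress.ip_network("10.103.11.128/25")
can_external_dns = ipaddress.ip_address("10.103.11.113")

# Both MetalLB pools must be carved out of the CAN subnet.
assert can_static_pool.subnet_of(can_cidr)
assert can_dynamic_pool.subnet_of(can_cidr)

# The static and dynamic pools must not overlap each other.
assert not can_static_pool.overlaps(can_dynamic_pool)

# can-external-dns must fall within the can-static-pool range.
assert can_external_dns in can_static_pool

print("CAN values are consistent")
```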
| Filename | Source | Information |
| --- | --- | --- |
| `cabinets.yaml` | SHCD | The number and type of air-cooled and liquid-cooled cabinets, cabinet IDs, and VLAN numbers |
| `application_node_config.yaml` | SHCD | The number and type of application nodes, with a mapping from the name in the SHCD to the desired hostname |
| `hmn_connections.json` | SHCD | The network topology for the HMN of the entire system |
| `ncn_metadata.csv` | SHCD, other | The number of master, worker, and storage nodes, and MAC address information for BMC and bootable NICs |
| `switch_metadata.csv` | SHCD | Inventory of all spine, aggregation, CDU, and leaf switches |
Although some information in these files can be populated from site survey information, the SHCD prepared by
HPE Cray Manufacturing is the best source of data for hmn_connections.json
. The ncn_metadata.csv
does
require collection of MAC addresses from the management nodes because that information is not present in the SHCD.
cabinets.yaml
The cabinets.yaml
file describes the type of cabinets in the system, the number of each type of cabinet,
and the starting cabinet ID for every cabinet in the system. This file can be used to indicate that a system
has non-contiguous cabinet ID numbers or non-standard VLAN numbers.
The component names (xnames) used in the other files should fit within the cabinet IDs defined by the starting cabinet ID for River
cabinets (modified by the number of cabinets). It is OK for management nodes not to be in `x3000`
(as the first River cabinet), but they must be in one of the River cabinets. For example, a starting cabinet of
`x3000` with two River cabinets means that all management nodes should be in `x3000` or `x3001`.
See Create Cabinets YAML for instructions about creating this file.
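The cabinet-range rule above can be checked mechanically. This is a minimal sketch (not CSM tooling) that assumes River cabinet IDs are sequential from the starting ID:

```python
import re

def river_cabinet_xnames(starting_id: int, count: int) -> list:
    """Cabinet xnames implied by a starting cabinet ID and a cabinet count."""
    return [f"x{starting_id + i}" for i in range(count)]

def cabinet_of(xname: str) -> str:
    """Extract the cabinet portion (for example 'x3000') from a component xname."""
    match = re.match(r"x\d+", xname)
    if match is None:
        raise ValueError(f"not a valid xname: {xname}")
    return match.group(0)

# Two River cabinets starting at x3000: management nodes must be in x3000 or x3001.
valid = river_cabinet_xnames(3000, 2)
assert cabinet_of("x3000c0s3b0n0") in valid
assert cabinet_of("x3001c0s9b0n0") in valid
assert cabinet_of("x3002c0s1b0n0") not in valid
```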
application_node_config.yaml
The application_node_config.yaml
file controls how the csi config init
command finds and treats
application nodes discovered in the hmn_connections.json
file when building the SLS Input file.
Different node prefixes in the SHCD can be identified as Application nodes. Each node prefix can be mapped to a specific HSM sub role. These sub roles can then be used as the targets of Ansible plays run by CFS to configure these nodes. The component name (xname) for each Application node can be assigned one or more hostname aliases.
See Create Application Node YAML for instructions about creating this file.
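For illustration only, an `application_node_config.yaml` might look like the sketch below. The prefix names, subrole values, and xname are assumptions made up for this example; Create Application Node YAML defines the authoritative format and field names.

```yaml
# Illustrative sketch only -- not the authoritative schema.
# SHCD prefixes to treat as application nodes (assumed names):
prefixes:
  - gateway
  - uan
# Map each prefix to an HSM subrole (assumed values):
prefix_hsm_subroles:
  gateway: Gateway
  uan: UAN
# Hostname aliases for an application node xname (assumed xname):
aliases:
  x3000c0s26b0n0:
    - uan-01
```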
hmn_connections.json
The hmn_connections.json
file is extracted from the HMN tab of the SHCD spreadsheet. The CSM release
includes the hms-shcd-parser
container; this container can do the extraction on the PIT node booted from the LiveCD (RemoteISO
or USB device) or on a Linux system.
No action is required to create this file at this point; it will be created when the PIT node is bootstrapped.
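For orientation, each entry in `hmn_connections.json` corresponds to one row of the SHCD's HMN tab. The field names and values below are assumptions for illustration, not the authoritative schema:

```json
[
  {
    "Source": "mn01",
    "SourceRack": "x3000",
    "SourceLocation": "u01",
    "DestinationRack": "x3000",
    "DestinationLocation": "u14",
    "DestinationPort": "j25"
  }
]
```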
ncn_metadata.csv
The information in the ncn_metadata.csv
file identifies each of the management nodes, assigns each one's function
as a master, worker, or storage node, and provides the MAC address information needed to identify the BMC and
the NIC which will be used to boot the node.
For each management node, the component name (xname), role, and subrole
can be extracted from the SHCD. However, the MAC address information needs to be collected another way. Collect as much information as possible
before the PIT node is booted from the LiveCD and then get the rest later when directed. See the scenarios
which enable partial data collection below in First Time Install.
See Create NCN Metadata CSV for instructions about creating this file.
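As a shape-only illustration (the exact column layout is defined in Create NCN Metadata CSV; the header and MAC values below are assumptions for this example), each row identifies one management node:

```
Xname,Role,Subrole,BMC MAC,Bootstrap MAC,Bond0 MAC0,Bond0 MAC1
x3000c0s3b0n0,Management,Master,94:40:c9:37:04:84,14:02:ec:d9:79:88,14:02:ec:d9:79:88,94:40:c9:5f:b5:df
```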
switch_metadata.csv
The switch_metadata.csv
file is manually created to include information about all spine, aggregation, CDU,
and leaf switches in the system. None of the Slingshot switches for the HSN should be included in this file.
See Create Switch Metadata CSV for instructions about creating this file.
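As a shape-only illustration (column names and xnames here are assumptions; Create Switch Metadata CSV defines the authoritative format), each row names one management network switch:

```
Switch Xname,Type,Brand
x3000c0h33s1,Spine,Aruba
x3000c0h34s1,Aggregation,Aruba
x3000c0w14,Leaf,Aruba
d0w1,CDU,Dell
```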
The process to install for the first time must collect the information needed to create these files.
1. Collect data for `cabinets.yaml`. See Create Cabinets YAML for instructions about creating this file.
2. Collect data for `application_node_config.yaml`. See Create Application Node YAML for instructions about creating this file.
3. Collect data for `ncn_metadata.csv`. See Create NCN Metadata CSV for instructions about creating this file.
4. Collect data for `switch_metadata.csv`. See Create Switch Metadata CSV for instructions about creating this file.
The process to reinstall must have the configuration payload files available.
Collect Payload for Reinstall
These files from a previous install are needed to do a reinstall.
- `application_node_config.yaml` (if used previously)
- `cabinets.yaml` (if used previously)
- `hmn_connections.json`
- `ncn_metadata.csv`
- `switch_metadata.csv`
- `system_config.yaml`
If the `system_config.yaml` file is not available, then a reinstall cannot be done. Switch to the first time install process and generate any of the Configuration Payload Files which are missing.

The command line options used to call `csi config init` are not needed for a reinstall.
When doing a reinstall, all of the command line options which had been given to csi config init
during the
previous installation will be found inside the system_config.yaml
file. This simplifies the reinstall process.
The procedure to bootstrap the LiveCD will indicate when to run this command without any extra command line options. The command will expect to find all of the above files in the current working directory.
```shell
linux# csi config init
```
After completing this procedure, the next step is to prepare the management nodes. See Prepare Management Nodes.