The procedures in this section detail the high-level tasks required to power on an HPE Cray EX system.
Important: If an emergency power off (EPO) event occurred, then see Recover from a Liquid-Cooled Cabinet EPO Event for recovery procedures.
If user IDs or passwords are needed, then see step 1 of the Prepare the System for Power Off procedure.
Always use the cabinet power-on sequence for the site.
The management cabinet is the first part of the system that must be powered on and booted. Management network and Slingshot fabric switches power on and boot when cabinet power is applied. After cabinets are powered on, wait at least 10 minutes for systems to initialize.
After all the system cabinets are powered on, verify that all management network and Slingshot network switches are powered on, and that there are no error LEDs or hardware failures.
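The wait-then-verify step above can be partly scripted. The following is a minimal sketch, not a documented HPE tool: it waits the minimum 10 minutes and then reports which switches do not accept a TCP connection. The hostnames and the use of port 22 are assumptions for illustration, and a successful connection says nothing about error LEDs or hardware faults, which still need a physical or vendor-tool check.

```python
import socket
import time

# Minimum initialization wait stated in the procedure (10 minutes).
INIT_WAIT_SECONDS = 10 * 60


def is_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    Port 22 (SSH) is an assumption; use whatever management port the
    site's switches actually expose.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections, and timeouts.
        return False


def unreachable_switches(hosts, probe=is_reachable):
    """Return the hosts for which the probe fails, preserving input order."""
    return [host for host in hosts if not probe(host)]


def wait_then_check(hosts, wait_seconds=INIT_WAIT_SECONDS,
                    sleep=time.sleep, probe=is_reachable):
    """Wait for initialization, then list switches that did not respond."""
    sleep(wait_seconds)
    return unreachable_switches(hosts, probe)


if __name__ == "__main__":
    # Hypothetical hostnames; replace with the site's switch inventory.
    switches = ["mgmt-switch-example-1", "fabric-switch-example-1"]
    missing = wait_then_check(switches)
    if missing:
        print("Not reachable:", ", ".join(missing))
    else:
        print("All listed switches accepted a connection.")
```

The `sleep` and `probe` parameters are injectable so that the logic can be exercised without waiting or touching the network.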
To power on an external Lustre file system (ClusterStor), refer to Power On the External Lustre File System.
To power on the external Spectrum Scale (GPFS) file system, refer to site procedures.
Note: If the external file systems are not mounted on worker nodes, then continue to power them on in parallel with the power on and boot of the Kubernetes management cluster and the power on of the compute cabinets. The external file systems must be fully powered on before beginning to power on and boot the compute nodes and User Access Nodes (UANs).
To power on the management cabinet and bring up the management Kubernetes cluster, refer to Power On and Start the Management Kubernetes Cluster.
To power on all liquid-cooled cabinet CDUs and cabinet PDUs, refer to Power On Compute Cabinets.
Note: Ensure that the external Lustre and Spectrum Scale (GPFS) file systems are available before starting to boot the compute nodes and UANs.
To power on and boot compute nodes and UANs, refer to Power On and Boot Compute and User Access Nodes.
After power on, refer to Validate CSM Health to check system health and status.
Make nodes available to users once system health and any other post-system maintenance checks have completed.
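The ordering constraints in this section can be summarized as a small dependency graph. The sketch below is illustrative only: the task names are informal labels rather than commands, and it assumes the case where the external file systems are not mounted on worker nodes, so they can come up in parallel with the management cluster and compute cabinets. It also assumes the compute cabinets follow the management Kubernetes cluster, matching the order in which the procedures are listed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks that must complete before it starts.
# Labels are informal summaries of the procedures named in this section.
DEPENDENCIES = {
    "power on management cabinet": set(),
    "start management Kubernetes cluster": {"power on management cabinet"},
    # Assumption: compute cabinets follow the management cluster,
    # as in the listed order of procedures.
    "power on compute cabinets": {"start management Kubernetes cluster"},
    # Assumption: file systems are not mounted on worker nodes, so they
    # have no prerequisite here and can power on in parallel.
    "power on external file systems": set(),
    "boot compute nodes and UANs": {
        "power on compute cabinets",
        "power on external file systems",
    },
    "validate CSM health": {"boot compute nodes and UANs"},
    "make nodes available to users": {"validate CSM health"},
}


def power_on_order():
    """Return one valid linear ordering of the tasks."""
    return list(TopologicalSorter(DEPENDENCIES).static_order())


if __name__ == "__main__":
    for step, task in enumerate(power_on_order(), start=1):
        print(f"{step}. {task}")
```

Tasks with no mutual dependency, such as the external file systems and the management cluster, may appear in either relative order in the output; the graph only guarantees that every prerequisite precedes the task that needs it.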