The Power Control Service (PCS) enables direct hardware control of nodes, compute blades, router modules, liquid cooled chassis, and management network switches. PCS talks to BMCs via Redfish to control power, query status, and manage power capping on target components. These controls enable an administrator and 3rd party software to more intelligently manage state and system-wide power consumption.
Administrators can use the cray CLI for power operations from any system that
has HTTPS access to the
System Management Services.
Third party software can access the API directly. Refer to the PCS API documentation for detailed information about API options and features.
The cray power command (see --help) can be used to control power to
specific components by specifying the component xname.
PCS power control assumes that all cabinets and PDUs have been plugged in, breakers are on, and PDU controllers, BMCs, and other embedded controllers are on, available, and have been discovered. Components have their power controlled in a pre-defined order to properly handle requests of dependent components.
Important: It is recommended to use the Boot Orchestration Service (BOS) to boot (power On), shutdown, and reboot compute nodes.
Air Cooled Cabinets:
Liquid Cooled Cabinets:
PCS uses xnames to specify entire cabinets or specific components throughout
the system. By default, PCS controls power to only one component at a time.
--include parents or --include children options can be passed to PCS using
the cray CLI. When the --include parents option is specified in a request,
all parent components of the target component are also included in the power
operation. When the --include children option is specified, all children
components of the target component are also included in the power operation.
By the cabinet naming convention, each cabinet in the system is assigned a unique number. Cabinet numbers can range from 0-9999 and contain from 1-4 digits only.
Manufacturing typically follows a sequential cabinet numbering scheme:
x1000–x2999x3000–x4999x5000–x5999Examples of valid xnames:
s0, allx1000, x3000, x5000x1000c7, x3500c0 (Air Cooled cabinets are always chassis 0)x1000c7s3, x3500c0s15 (U15)x1000c7s3b0n0, x3500c0s15b1n0x3200c0s9 (U9)x3200c0s9b0n0NOTE Power control is not supported for management network switches.
PCS is capable of setting node power limits on all supported compute node hardware in both liquid cooled cabinets and air cooled cabinets. This functionality enables external software to establish an upper bound, or estimate a minimum bound, on the amount of power a system may consume. Separate PCS calls are required to power cap different compute node types as each compute node type has its own power capping capabilities.
NOTE Power capping is not supported for liquid cooled chassis, switch
modules, compute blades, management network switches, or any non-compute
nodes (NCNs) in air cooled cabinets.
The /power-status API in PCS can be used to monitor the Availability/Reachability of all managed hardware.
PCS periodically reaches out to all managed hardware for status. This includes the following hardware types:
ChassisChassisBMCComputeModuleRouterModuleNodeBMCRouterBMCNodeHSNBoardMgmtSwitchMgmtHLSwitchCDUMgmtSwitchCabinetPDUPowerConnectorPCS will respond with the power status, the manager availability, what power controls are available, and when the component’s entry was last updated. For example:
{
"status": [
{
"xname": "x1000c0s0b0n0",
"powerState": "on",
"managementState": "available",
"error": "",
"supportedPowerTransitions": [
"Force-Off",
"On",
"Soft-Off",
"Off",
"Init",
"Hard-Restart",
"Soft-Restart"
],
"lastUpdated": "2023-05-09T20:52:53.489834846Z"
}
]
}
The managementState can be used to determine if the component’s management endpoint was reachable during
the last hardware scan and can be used to monitor system hardware readiness and availability.
See the /power-status section in the
PCS API documentation
for detailed information about the API options and features.