Cray System Management Documentation > Cray System Management (CSM) Administration Guide > hardware state manager > Hardware State Manager (HSM) State and Flag Fields

Hardware State Manager (HSM) State and Flag Fields

HSM manages important information for hardware components in the system. Administrators can use the data returned by HSM to learn about the state of the system. To do so, it is critical that the State and Flag fields are understood, and the next steps to take are known when viewing output returned by HSM commands. It is also beneficial to understand what services can cause State or Flag changes in HSM.

The following describes what causes State and Flag changes for all HSM components:

Initial State/Flag is set upon discovery. This is generally Off/OK or On/OK. BMCs go to Ready/OK instead of On/OK.
A component in the Populated state after discovery has an unknown power state. If one is expected for nodes, BMCs, or another component, this is likely due to a firmware issue.
Flags can be set to Warning or Alert if the component’s Status.Health reads as Warning or Critical via Redfish during discovery.
State for all components associated with a BMC is set to Empty if that BMC is removed from the network.
State change events from components are consumed by HSM via subscriptions to Redfish events. These are subscribed to and placed on the Kafka bus by hmcollector for HSM’s consumption. HSM will update component state based on the information in the Redfish events.

The following describes what causes State and Flag changes for nodes only:

Heartbeat Tracking Daemon (HBTD) updates the state of nodes based on heartbeats it receives from nodes.
HBTD sets the node to Ready/OK when it starts heartbeats.
HBTD sets the node to Ready/Warning after a few missed heartbeats.
HBTD sets the node to Standby after many missed heartbeats and the node is presumed dead.

State descriptions:

Empty

The location is not populated with a component.
Populated

Present (not empty), but no further track can or is being done.
Off

Present but powered off.
On

Powered on. If no heartbeat mechanism is available, its software state may be unknown.
Standby

No longer Ready and presumed dead. It typically means the heartbeat has been lost (w/ alert).
Ready

Both On and Ready to provide its expected services. For example, used for jobs.

Flag descriptions:

OK

Component is OK.
Warning

There is a non-critical error. Generally coupled with a Ready state.
Alert

There is a critical error. Generally coupled with a Standby state. Otherwise, reported via Redfish.

Hardware State Transitions

The following table describes how to interpret when the state of hardware changes:

Prior State	New State	Reason
Ready	Standby	HBTD if node has many missed heartbeats
Ready	Ready/Warning	HBTD if node has a few missed heartbeats
Standby	Ready	HBTD node re-starts heartbeating
On	Ready	HBTD node started heartbeating
Off	Ready	HBTD sees heartbeats before Redfish Event (On)
Standby	On	Redfish Event (On) or if re-discovered while in the standby state
Off	On	Redfish Event (On)
Standby	Off	Redfish Event (Off)
Ready	Off	Redfish Event (Off)
On	Off	Redfish Event (Off)
Any State	Empty	Redfish Endpoint is disabled meaning component removal

Generally, nodes transition from Off to On to Ready when going from Off to booted, and from Ready to Ready/Warning to Standby to Off when shut down.