HSM manages important information for hardware components in the system. Administrators can use the data returned by HSM to learn about the state of the system. To do so, it is critical that the State and Flag fields are understood, and the next steps to take are known when viewing output returned by HSM commands. It is also beneficial to understand what services can cause State or Flag changes in HSM.
The following describes what causes State and Flag changes for all HSM components:
Off/OK
or On/OK
. BMCs go to Ready/OK
instead of On/OK
.Populated
state after discovery has an unknown power state. If one is expected for nodes, BMCs, or another component, this is likely due to a firmware issue.Warning
or Alert
if the component’s Status.Health
reads as Warning
or Critical
via Redfish during discovery.Empty
if that BMC is removed from the network.hmcollector
for HSM’s consumption. HSM will update component state based on the information in the Redfish events.The following describes what causes State and Flag changes for nodes only:
Ready/OK
when it starts heartbeats.Ready/Warning
after a few missed heartbeats.Standby
after many missed heartbeats and the node is presumed dead.State descriptions:
Empty
The location is not populated with a component.
Populated
Present (not empty), but no further track can or is being done.
Off
Present but powered off.
On
Powered on. If no heartbeat mechanism is available, its software state may be unknown.
Standby
No longer Ready
and presumed dead. It typically means the heartbeat has been lost (w/ alert).
Ready
Both On
and Ready
to provide its expected services. For example, used for jobs.
Flag descriptions:
OK
Component is OK.
Warning
There is a non-critical error. Generally coupled with a Ready
state.
Alert
There is a critical error. Generally coupled with a Standby
state. Otherwise, reported via Redfish.
The following table describes how to interpret when the state of hardware changes:
Prior State | New State | Reason |
---|---|---|
Ready | Standby | HBTD if node has many missed heartbeats |
Ready | Ready/Warning | HBTD if node has a few missed heartbeats |
Standby | Ready | HBTD node re-starts heartbeating |
On | Ready | HBTD node started heartbeating |
Off | Ready | HBTD sees heartbeats before Redfish Event (On) |
Standby | On | Redfish Event (On) or if re-discovered while in the standby state |
Off | On | Redfish Event (On) |
Standby | Off | Redfish Event (Off) |
Ready | Off | Redfish Event (Off) |
On | Off | Redfish Event (Off) |
Any State | Empty | Redfish Endpoint is disabled meaning component removal |
Generally, nodes transition from Off
to On
to Ready
when going from Off
to booted, and from Ready
to Ready/Warning
to Standby
to Off
when shut down.