The Boot Orchestration Service (BOS) supports a status endpoint that reports detailed status information for individual BOS sessions.
BOS v2 session status offers an overall status, as well as information about the percentage of components in each state and any errors being experienced. The status stays current while the session is running, and is cached when the session ends for future reference.
(ncn-mw#) To view detailed session status, run:
cray bos v2 sessions status list 3d2e86d1-8909-46fc-8a22-f42f1a140264 --format json
Example output:
{
"error_summary": {
"Sample error message": {"count": 1, "list": "x3000c0s13b0n0"}
},
"managed_components_count": 1,
"percent_failed": 100.0,
"percent_staged": 0,
"percent_successful": 0,
"phases": {
"percent_complete": 100.0,
"percent_configuring": 0,
"percent_powering_off": 0,
"percent_powering_on": 0
},
"status": "complete",
"timing": {
"duration": "0:00:20",
"end_time": "2022-08-22T16:51:10",
"start_time": "2022-08-22T16:50:50"
}
}
error_summary
Contains any error messages currently reported by nodes, whether those are transient failures that will be retried or failures on nodes that have reached a retry limit. Nodes are grouped by error message, and each message includes a total count of nodes reporting that error as well as a comma-separated list of nodes. For errors on many nodes, the list of nodes will be truncated to the first few for readability.
managed_components_count
The number of components this session is responsible for. While the session is running, this is the current count and may decrease if other newer sessions take over responsibility for components. For completed sessions this is the number of components that were tracked by the session until the session was complete.
status
Status can be one of pending, running, or complete. Sessions are considered pending until the desired state of all associated components has been set.
percent_*
The percentage of the managed_components that are in the specified state.
start_time
This timestamp is set when the session is created.
end_time
This timestamp will initially be null and will be set when the session ends.
duration
This lists the duration of the session in h:mm:ss. While the session is running, this will be the current duration, and the value is locked in when the session completes.
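The fields described above can also be consumed by a script. The following is a minimal Python sketch, assuming the JSON output has the same shape as the example above and that duration is always in h:mm:ss form. The function names parse_duration and summarize_v2_status are hypothetical and are not part of BOS or the Cray CLI.

```python
import json
from datetime import timedelta


def parse_duration(text):
    """Convert an h:mm:ss string such as "0:00:20" into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def summarize_v2_status(status):
    """Build a one-line summary from a BOS v2 session status dictionary."""
    # Each error_summary entry carries a count of nodes reporting that message.
    error_reports = sum(
        entry["count"] for entry in status.get("error_summary", {}).values()
    )
    duration = parse_duration(status["timing"]["duration"])
    return (
        f"{status['status']}: {status['managed_components_count']} component(s), "
        f"{status['percent_successful']}% successful, "
        f"{status['percent_failed']}% failed, "
        f"{error_reports} error report(s), duration {duration}"
    )


# Trimmed copy of the example output shown above.
sample = json.loads("""
{
  "error_summary": {"Sample error message": {"count": 1, "list": "x3000c0s13b0n0"}},
  "managed_components_count": 1,
  "percent_failed": 100.0,
  "percent_successful": 0,
  "status": "complete",
  "timing": {"duration": "0:00:20"}
}
""")
print(summarize_v2_status(sample))
```

In practice, the dictionary would come from parsing the output of the status command run with --format json.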
In CSM 1.5.0 on large systems, BOS v2 sessions may not work because of a failed interaction with CFS. For more information, see CFS V2 Failures On Large Systems.
In BOS v1, the status can be retrieved for each boot set within the session, as well as the individual items within a boot set.
BOS sessions contain one or more boot sets. Each boot set contains one or more phases, depending upon the operation for that session.
For example, a reboot operation would have a shutdown, boot, and possibly configuration phase, but a shutdown operation would only have a shutdown phase.
Each phase contains the following categories: not_started, in_progress, succeeded, failed, and excluded.
Each session, boot set, and phase contains similar metadata. The following is a table of useful attributes to look for in the metadata:
Attribute | Meaning |
---|---|
start_time | The time when a session, boot set, or phase started work. |
in_progress | If true, it means that the session, boot set, or phase has started and still has work going on. |
complete | If true, it means the session, boot set, or phase has finished. |
error_count | The number of errors encountered in the boot sets or phases. |
stop_time | The time when a session, boot set, or phase ended work. |
The following table summarizes how to interpret the various combinations of values for the in_progress and complete flags:
in_progress | complete | Meaning |
---|---|---|
false | false | Item has not started. |
true | false | Item is in progress. |
false | true | Item has completed. |
true | true | Invalid state (should not occur). |
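The flag combinations in the table can be expressed as a small lookup. This is an illustrative Python sketch only; describe_state is a hypothetical helper name, not a BOS function.

```python
def describe_state(in_progress, complete):
    """Map the in_progress and complete flags to the meaning given in the table."""
    if in_progress and complete:
        # The table marks this combination as invalid.
        raise ValueError("invalid state: in_progress and complete are both true")
    if in_progress:
        return "in progress"
    if complete:
        return "completed"
    return "not started"


# Example with a metadata dictionary shaped like the ones BOS v1 returns.
metadata = {"in_progress": True, "complete": False, "error_count": 0}
print(describe_state(metadata["in_progress"], metadata["complete"]))
```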
The in_progress, complete, and error_count fields are cumulative, meaning that they summarize the state of the sub-items.
Item | in_progress meaning | complete meaning |
---|---|---|
Phase | If true, it means there is at least one node in the in_progress category. | If true, it means that there are no nodes in the in_progress or not_started categories. |
Boot set | If true, it means there is at least one phase that is in_progress. | If true, it means that all phases in the boot set are complete. |
Session | If true, it means that at least one boot set is in_progress. | If true, it means that all boot sets are complete. |
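The roll-up rules in the table can be sketched in Python as follows. This only restates the table for illustration and is not how BOS itself is implemented; the helper names phase_flags and rollup are hypothetical, and the phase dictionaries are assumed to have the categories layout shown in the boot set status output later in this document.

```python
def phase_flags(phase):
    """Return (in_progress, complete) for one phase from its node categories."""
    nodes = {cat["name"]: cat["node_list"] for cat in phase["categories"]}
    in_progress = bool(nodes.get("in_progress"))
    # Complete means no nodes remain in in_progress or not_started.
    complete = not nodes.get("in_progress") and not nodes.get("not_started")
    return in_progress, complete


def rollup(child_flags):
    """Combine child (in_progress, complete) pairs into the parent's flags.

    The same rule applies from phases to a boot set and from boot sets
    to a session: any child in progress, and all children complete.
    """
    child_flags = list(child_flags)
    in_progress = any(flag[0] for flag in child_flags)
    complete = all(flag[1] for flag in child_flags)
    return in_progress, complete


def boot_set_flags(phases):
    """Return (in_progress, complete) for a boot set from its phases."""
    return rollup(phase_flags(phase) for phase in phases)
```

For a session, the same rollup function can be applied to the flags of each of its boot sets.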
(ncn-mw#) The BOS session ID is required to view the status of a session. To list the available sessions, use the following command:
Note: If this command fails, there may be too many BOS v1 sessions. This limitation does not exist in BOS v2. For more information, see Hang Listing BOS V1 Sessions.
cray bos v1 session list --format json
Example output:
[
"99a192c2-050e-41bc-a576-548610851742",
"4374f3e6-e8ed-4e66-bf63-3ebe0e618db2",
"fb14932a-a9b7-41b2-ad21-b4bc632cf1ef",
"9321ab7a-bf7f-42fd-8103-94a296552856",
"50aaaa85-6807-45c7-b6de-f984a930e2eb",
"972cfd09-3403-4282-ab93-b41992f7c0d8",
"2c86c1b9-5281-4610-b044-479f1536727a",
"7719385a-e462-4bb6-8fd8-55caa0836528",
"0aac0252-4637-4198-919f-6bafda7fafef",
"13207c87-0b9f-410c-88c1-6e26ff63cb34",
"bd18e7e3-978f-4699-b8f2-8a4ce2d46f75",
"b741e4de-2064-4de4-9f23-20b6c1d0dc1a",
"f4eebe51-a217-46d0-8733-b9499a092042"
]
(ncn-mw#) It is recommended to describe the session using one of the session IDs listed above to verify that the intended session was selected:
cray bos v1 session describe SESSION_ID
Example output:
status_link = "/v1/session/f4eebe51-a217-46d0-8733-b9499a092042/status"
complete = false
start_time = "2020-07-22 13:39:07.706774"
templateName = "cle-1.3.0"
error_count = 4
boa_job_name = "boa-f4eebe51-a217-46d0-8733-b9499a092042"
in_progress = false
operation = "reboot"
The status for the session will show the session ID, the boot sets in the session, the metadata, and some links. In the following example, there is only one boot set named computes, and the session ID being used is f4eebe51-a217-46d0-8733-b9499a092042.
(ncn-mw#) To display the status for the session:
cray bos v1 session status list SESSION_ID --format json
Example output:
{
"boot_sets": [
"computes"
],
"id": "f4eebe51-a217-46d0-8733-b9499a092042",
"links": [
{
"href": "/v1/session/f4eebe51-a217-46d0-8733-b9499a092042/status",
"rel": "self"
},
{
"href": "/v1/session/f4eebe51-a217-46d0-8733-b9499a092042/status/computes",
"rel": "Boot Set"
}
],
"metadata": {
"in_progress": false,
"start_time": "2020-07-22 13:39:07.706774",
"complete": false,
"error_count": 4
}
}
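The links section of the session status can be used to find the status path for each boot set. The following Python sketch assumes the output shape shown above, where boot set links have rel set to "Boot Set" and the path ends with the boot set name; boot_set_status_paths is a hypothetical helper name.

```python
def boot_set_status_paths(session_status):
    """Map each boot set name to its status path from the links section."""
    paths = {}
    for link in session_status["links"]:
        if link["rel"] == "Boot Set":
            # The final path component is the boot set name.
            name = link["href"].rsplit("/", 1)[-1]
            paths[name] = link["href"]
    return paths


# Trimmed copy of the example output shown above.
session_status = {
    "boot_sets": ["computes"],
    "id": "f4eebe51-a217-46d0-8733-b9499a092042",
    "links": [
        {"href": "/v1/session/f4eebe51-a217-46d0-8733-b9499a092042/status",
         "rel": "self"},
        {"href": "/v1/session/f4eebe51-a217-46d0-8733-b9499a092042/status/computes",
         "rel": "Boot Set"},
    ],
}
print(boot_set_status_paths(session_status))
```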
Run the following command to view the status for a specific boot set in a session. For more information about retrieving the session ID and boot set name, refer to the “View the Status of a Session” section above. The different status sections are described below.
- The id parameter identifies which session this status belongs to.
- The name parameter is the name of the boot set.
- The links section displays links that enable administrators to drill down into each phase of the boot set.
- There is a metadata section for the boot set as a whole.
- Within each phase, the name parameter is the name of the phase.
- There is a metadata section for each phase.
- Each phase contains the following categories: not_started, in_progress, succeeded, failed, and excluded. The nodes are listed in the category they are currently occupying.
(ncn-mw#)
cray bos v1 session status describe BOOT_SET_NAME SESSION_ID --format json
Example output:
{
"phases": [
{
"name": "shutdown",
"categories": [
{
"name": "not_started",
"node_list": []
},
{
"name": "succeeded",
"node_list": []
},
{
"name": "failed",
"node_list": [
"x3000c0s19b4n0",
"x3000c0s19b1n0",
"x3000c0s19b3n0",
"x3000c0s19b2n0"
]
},
{
"name": "excluded",
"node_list": []
},
{
"name": "in_progress",
"node_list": []
}
],
"metadata": {
"stop_time": "2020-07-22 13:53:19.842705",
"in_progress": false,
"start_time": "2020-07-22 13:39:08.276530",
"complete": true,
"error_count": 4
}
},
{
"name": "boot",
"categories": [
{
"name": "not_started",
"node_list": [
"x3000c0s19b4n0",
"x3000c0s19b3n0",
"x3000c0s19b1n0",
"x3000c0s19b2n0"
]
},
{
"name": "succeeded",
"node_list": []
},
{
"name": "failed",
"node_list": []
},
{
"name": "excluded",
"node_list": []
},
{
"name": "in_progress",
"node_list": []
}
],
"metadata": {
"in_progress": false,
"start_time": "2020-07-22 13:39:08.276542",
"complete": false,
"error_count": 0
}
},
{
"name": "configure",
"categories": [
{
"name": "not_started",
"node_list": [
"x3000c0s19b4n0",
"x3000c0s19b3n0",
"x3000c0s19b1n0",
"x3000c0s19b2n0"
]
},
{
"name": "succeeded",
"node_list": []
},
{
"name": "failed",
"node_list": []
},
{
"name": "excluded",
"node_list": []
},
{
"name": "in_progress",
"node_list": []
}
],
"metadata": {
"in_progress": false,
"start_time": "2020-07-22 13:39:08.276552",
"complete": false,
"error_count": 0
}
}
],
"session": "f4eebe51-a217-46d0-8733-b9499a092042",
"name": "computes",
"links": [
{
"href": "/v1/session/f4eebe51-a217-46d0-8733-b9499a092042/status/computes",
"rel": "self"
},
{
"href": "/v1/session/f4eebe51-a217-46d0-8733-b9499a092042/status/computes/shutdown",
"rel": "Phase"
},
{
"href": "/v1/session/f4eebe51-a217-46d0-8733-b9499a092042/status/computes/boot",
"rel": "Phase"
},
{
"href": "/v1/session/f4eebe51-a217-46d0-8733-b9499a092042/status/computes/configure",
"rel": "Phase"
}
],
"metadata": {
"in_progress": false,
"start_time": "2020-07-22 13:39:08.276519",
"complete": false,
"error_count": 4
}
}
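Output of this size is easier to review when reduced to per-phase node counts. The following Python sketch assumes the boot set status has the phases and categories layout shown above; node_counts and nodes_in_category are hypothetical helper names, not part of BOS.

```python
def node_counts(boot_set_status):
    """Return {phase name: {category name: number of nodes}}."""
    return {
        phase["name"]: {
            cat["name"]: len(cat["node_list"]) for cat in phase["categories"]
        }
        for phase in boot_set_status["phases"]
    }


def nodes_in_category(boot_set_status, phase_name, category_name):
    """Return the sorted node list for one category of one phase."""
    for phase in boot_set_status["phases"]:
        if phase["name"] != phase_name:
            continue
        for cat in phase["categories"]:
            if cat["name"] == category_name:
                return sorted(cat["node_list"])
    return []
```

For the example output above, nodes_in_category(status, "shutdown", "failed") would return the four x3000c0s19 nodes that failed the shutdown phase.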
Direct calls to the API are needed to retrieve the status for an individual phase. Support for the Cray CLI is not currently available.
(ncn-mw#) The following command is used to view the status of a phase:
curl -H "Authorization: Bearer BEARER_TOKEN" -X GET https://api-gw-service-nmn.local/apis/bos/v1/session/SESSION_ID/status/BOOT_SET_NAME/PHASE
(ncn-mw#) In the following example, the session ID is f89eb554-c733-4197-b2f2-4e1e5ba0c0ec, the boot set name is computes, and the individual phase is shutdown.
curl -H "Authorization: Bearer BEARER_TOKEN" -X GET https://api-gw-service-nmn.local/apis/bos/v1/session/f89eb554-c733-4197-b2f2-4e1e5ba0c0ec/status/computes/shutdown
Example output:
{
"categories": [
{
"name": "not_started",
"node_list": []
},
{
"name": "succeeded",
"node_list": []
},
{
"name": "failed",
"node_list": []
},
{
"name": "excluded",
"node_list": []
},
{
"name": "in_progress",
"node_list": [
"x5000c1s2b0n1",
"x5000c1s0b0n0",
"x3000c0s19b4n0",
"x5000c1s0b1n0",
"x5000c1s0b1n1",
"x5000c1s1b1n1",
"x5000c1s2b0n0",
"x3000c0s19b3n0",
"x5000c1s0b0n1",
"x5000c1s2b1n1",
"x3000c0s19b1n0",
"x5000c1s1b1n0",
"x5000c1s2b1n0",
"x3000c0s19b2n0",
"x5000c1s1b0n1",
"x5000c1s1b0n0"
]
}
],
"metadata": {
"complete": false,
"error_count": 0,
"in_progress": true,
"start_time": "2020-06-30 21:42:39.355423"
},
"name": "shutdown"
}
Direct calls to the API are needed to retrieve the status for an individual category. Support for the Cray CLI is not currently available.
(ncn-mw#) The following command is used to view the status of a category:
curl -H "Authorization: Bearer BEARER_TOKEN" -X GET https://api-gw-service-nmn.local/apis/bos/v1/session/SESSION_ID/status/BOOT_SET_NAME/PHASE/CATEGORY
(ncn-mw#) In the following example, the session ID is f89eb554-c733-4197-b2f2-4e1e5ba0c0ec, the boot set name is computes, the phase is shutdown, and the category is in_progress.
curl -H "Authorization: Bearer BEARER_TOKEN" -X GET https://api-gw-service-nmn.local/apis/bos/v1/session/f89eb554-c733-4197-b2f2-4e1e5ba0c0ec/status/computes/shutdown/in_progress
Example output:
{
"name": "in_progress",
"node_list": [
"x5000c1s2b0n1",
"x5000c1s0b0n0",
"x3000c0s19b4n0",
"x5000c1s0b1n0",
"x5000c1s0b1n1",
"x5000c1s1b1n1",
"x5000c1s2b0n0",
"x3000c0s19b3n0",
"x5000c1s0b0n1",
"x5000c1s2b1n1",
"x3000c0s19b1n0",
"x5000c1s1b1n0",
"x5000c1s2b1n0",
"x3000c0s19b2n0",
"x5000c1s1b0n1",
"x5000c1s1b0n0"
]
}
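The phase and category URLs used in the curl commands above follow one pattern, which can be assembled in a script. The following Python sketch uses only the standard library and builds the request without sending it; status_url and status_request are hypothetical helper names, and the base URL is taken from the curl examples above.

```python
from urllib.parse import quote
from urllib.request import Request

# Base URL as used in the curl examples above.
BOS_V1_BASE = "https://api-gw-service-nmn.local/apis/bos/v1"


def status_url(session_id, boot_set, phase, category=None):
    """Build the status URL for a phase, or for a category when one is given."""
    parts = ["session", session_id, "status", boot_set, phase]
    if category is not None:
        parts.append(category)
    # Percent-encode each path component in case a name has special characters.
    return BOS_V1_BASE + "/" + "/".join(quote(part, safe="") for part in parts)


def status_request(token, session_id, boot_set, phase, category=None):
    """Prepare (but do not send) a GET request with the bearer token header."""
    return Request(
        status_url(session_id, boot_set, phase, category),
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


print(status_url("f89eb554-c733-4197-b2f2-4e1e5ba0c0ec", "computes", "shutdown", "in_progress"))
```

The prepared request could then be passed to urllib.request.urlopen from a host that can reach the API gateway, with BEARER_TOKEN replaced by a valid token.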