The Heartbeat Tracker Service transfers basic node health, service state, and configuration information between compute nodes and the Hardware State Manager (HSM). The API tracks the heartbeats emitted by various system components. Generally, compute nodes emit heartbeats to inform the HSM that they are alive and healthy, but other components can also emit heartbeats. An operating system developer may call this API to track a hardware component heartbeat. There is no command-line interface (CLI) for the Heartbeat Tracker Service. By default, compute nodes send a heartbeat to the Heartbeat Tracker Service every 3 seconds. The service resides on a Non-Compute Node (NCN). It tracks the heartbeats received for a given component and checks each one against the previous heartbeat.
Changes in heartbeat behavior are communicated to the Hardware State Manager through service-to-service communication.
Send a heartbeat message from a compute node to the heartbeat tracker service. Heartbeat status changes, such as a heartbeat starting or stopping, are communicated to the HSM.
Query the service for the current heartbeat status of requested components.
Query and modify service operating parameters.
Retrieve health information for the service and its dependencies.
Send a heartbeat message to the heartbeat tracker service with a JSON-formatted payload. If it is the first heartbeat from a component, the service sends a heartbeat-started message to the HSM to report that the component is alive. Keep sending heartbeats periodically (say, every 10 seconds) to maintain the “alive” state. If the heartbeats for a given component stop, the heartbeat tracker service sends a heartbeat-stopped message to the HSM with a warning (“node might be dead”), followed later by a heartbeat-stopped message with an alert (“node is dead”).
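The lifecycle described above can be sketched as a small classifier. This is an illustration only, not the service's implementation: the function name `heartbeat_state` and the state labels are hypothetical, the 5 and 10 second thresholds are borrowed from the example `Warntime` and `Errtime` values in this reference, and the behavior exactly at a threshold is an assumption.

```python
from typing import Optional


def heartbeat_state(seconds_since_last: Optional[float],
                    warntime: float = 5.0,
                    errtime: float = 10.0) -> str:
    """Classify a component by the time elapsed since its last heartbeat.

    The labels are hypothetical. The defaults mirror the example Warntime
    and Errtime values; using >= at the boundary is an assumption.
    """
    if seconds_since_last is None:
        return "never-started"      # no heartbeat has been received yet
    if seconds_since_last >= errtime:
        return "stopped-alert"      # corresponds to "node is dead"
    if seconds_since_last >= warntime:
        return "stopped-warning"    # corresponds to "node might be dead"
    return "alive"


if __name__ == "__main__":
    for elapsed in (None, 3, 6, 12):
        print(elapsed, heartbeat_state(elapsed))
```

A component that keeps heartbeating more often than the warning threshold stays in the "alive" state; only a gap in heartbeats moves it to the warning and then the alert state.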
Sends a list of components to the service in a JSON formatted payload. The service will respond with a JSON payload containing the same list of components, each with their XName and Heartbeating status.
Query the service for the heartbeat status of a single component. The service will respond with a JSON formatted payload containing the requested component XName and Heartbeating status.
Retrieve current operational parameters.
To change a parameter, perform a PATCH operation with a JSON-formatted payload containing the parameter(s) to be changed along with their new values. For example, you can set the debug level to 2. The Debug parameter increases the verbosity of logging.
Base URLs: https://api-gw-service-nmn.local/apis/hbtd/hmi/v1
Code samples
POST https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/heartbeat/{xname} HTTP/1.1
Host: api-gw-service-nmn.local
Content-Type: application/json
Accept: */*
# You can also use wget
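curl -X POST https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/heartbeat/{xname} \
-H 'Content-Type: application/json' \
-H 'Accept: */*' \
-H 'Authorization: Bearer {access-token}' \
-d '{"Status": "Kernel Oops", "TimeStamp": "2018-07-06T12:34:56.012345-5Z"}'
import requests
headers = {
    'Content-Type': 'application/json',
    'Accept': '*/*',
    'Authorization': 'Bearer {access-token}'
}
# Heartbeat payload, taken from the body parameter example
payload = {
    'Status': 'Kernel Oops',
    'TimeStamp': '2018-07-06T12:34:56.012345-5Z'
}
r = requests.post('https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/heartbeat/{xname}', headers = headers, json = payload)
print(r.json())
package main
import (
    "bytes"
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Content-Type": []string{"application/json"},
        "Accept": []string{"*/*"},
        "Authorization": []string{"Bearer {access-token}"},
    }
    // Request body, taken from the body parameter example.
    jsonReq := []byte(`{"Status": "Kernel Oops", "TimeStamp": "2018-07-06T12:34:56.012345-5Z"}`)
    req, err := http.NewRequest("POST", "https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/heartbeat/{xname}", bytes.NewBuffer(jsonReq))
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}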
POST /heartbeat/{xname}
Send a heartbeat message
Send a heartbeat message from a managed component, such as a compute node, to the heartbeat tracker service. To do so, send a JSON object that contains the heartbeat information to the heartbeat tracker service. Changes in heartbeat behavior are communicated to the Hardware State Manager.
Body parameter
{
"Status": "Kernel Oops",
"TimeStamp": "2018-07-06T12:34:56.012345-5Z"
}
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | heartbeat_xname | true | none |
xname | path | XName.1.0.0 | true | none |
Example responses
200 Response
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Success | Error |
400 | Bad Request | Bad Request. Malformed JSON. Verify all JSON formatting in payload. Verify that all entries are properly set. | None |
401 | Unauthorized | Unauthorized. RBAC and/or authenticated token does not allow calling this method. Check the authentication token expiration. Verify that the RBAC information is correct. | Error |
404 | Not Found | Not Found. Endpoint not available. Check IP routing between managed and management plane. Check that any SMS node services are running on management plane. Check that SMS node API gateway service is running on management plane. Check that SMS node HMI service is running on management plane. | Error |
405 | Method Not Allowed | Operation not permitted. For /heartbeat, only POST operations are allowed. | Error |
default | Default | Unexpected error | Error |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
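As a rough sketch of how the path parameter, bearer token, and JSON body fit together, the following builds (but does not send) a request for this endpoint using only the Python standard library. The helper name `build_heartbeat_request` and the token value are hypothetical placeholders; the base URL is the one used in the code samples.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api-gw-service-nmn.local/apis/hbtd/hmi/v1"


def build_heartbeat_request(xname: str, token: str,
                            payload: dict) -> urllib.request.Request:
    """Build, without sending, a POST /heartbeat/{xname} request."""
    # Percent-encode the xname so it is safe to place in the URL path.
    url = f"{BASE_URL}/heartbeat/{urllib.parse.quote(xname, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_heartbeat_request(
        "x0c1s2b0n3",
        "example-token",  # placeholder, not a real access token
        {"Status": "Kernel Oops",
         "TimeStamp": "2018-07-06T12:34:56.012345-5Z"},
    )
    print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access.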
Code samples
POST https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/heartbeat HTTP/1.1
Host: api-gw-service-nmn.local
Content-Type: application/json
Accept: */*
# You can also use wget
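curl -X POST https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/heartbeat \
-H 'Content-Type: application/json' \
-H 'Accept: */*' \
-H 'Authorization: Bearer {access-token}' \
-d '{"Component": "x0c1s2b0n3", "Hostname": "x0c1s2b0n3.us.cray.com", "NID": "83", "Status": "Kernel Oops", "TimeStamp": "2018-07-06T12:34:56.012345-5Z"}'
import requests
headers = {
    'Content-Type': 'application/json',
    'Accept': '*/*',
    'Authorization': 'Bearer {access-token}'
}
# Heartbeat payload, taken from the body parameter example
payload = {
    'Component': 'x0c1s2b0n3',
    'Hostname': 'x0c1s2b0n3.us.cray.com',
    'NID': '83',
    'Status': 'Kernel Oops',
    'TimeStamp': '2018-07-06T12:34:56.012345-5Z'
}
r = requests.post('https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/heartbeat', headers = headers, json = payload)
print(r.json())
package main
import (
    "bytes"
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Content-Type": []string{"application/json"},
        "Accept": []string{"*/*"},
        "Authorization": []string{"Bearer {access-token}"},
    }
    // Request body, taken from the body parameter example.
    jsonReq := []byte(`{"Component": "x0c1s2b0n3", "Hostname": "x0c1s2b0n3.us.cray.com", "NID": "83", "Status": "Kernel Oops", "TimeStamp": "2018-07-06T12:34:56.012345-5Z"}`)
    req, err := http.NewRequest("POST", "https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/heartbeat", bytes.NewBuffer(jsonReq))
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}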
POST /heartbeat
Send a heartbeat message
Send a heartbeat message from a managed component, such as a compute node, to the heartbeat tracker service. To do so, send a JSON object that contains the heartbeat information to the heartbeat tracker service. Changes in heartbeat behavior are communicated to the Hardware State Manager.
Body parameter
{
"Component": "x0c1s2b0n3",
"Hostname": "x0c1s2b0n3.us.cray.com",
"NID": "83",
"Status": "Kernel Oops",
"TimeStamp": "2018-07-06T12:34:56.012345-5Z"
}
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | heartbeat | true | none |
Example responses
200 Response
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Success | Error |
400 | Bad Request | Bad Request. Malformed JSON. Verify all JSON formatting in payload. Verify that all entries are properly set. | None |
401 | Unauthorized | Unauthorized. RBAC and/or authenticated token does not allow calling this method. Check the authentication token expiration. Verify that the RBAC information is correct. | Error |
404 | Not Found | Not Found. Endpoint not available. Check IP routing between managed and management plane. Check that any SMS node services are running on management plane. Check that SMS node API gateway service is running on management plane. Check that SMS node HMI service is running on management plane. | Error |
405 | Method Not Allowed | Operation not permitted. For /heartbeat, only POST operations are allowed. | Error |
default | Default | Unexpected error | Error |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
Code samples
POST https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/hbstates HTTP/1.1
Host: api-gw-service-nmn.local
Content-Type: application/json
Accept: application/json
# You can also use wget
curl -X POST https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/hbstates \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}' \
-d '{"XNames": ["x0c1s2b0n3"]}'
import requests
headers = {
    'Content-Type': 'application/json',
    'Accept': 'application/json',
    'Authorization': 'Bearer {access-token}'
}
# List of components to query, taken from the body parameter example
payload = {'XNames': ['x0c1s2b0n3']}
r = requests.post('https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/hbstates', headers = headers, json = payload)
print(r.json())
package main
import (
    "bytes"
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Content-Type": []string{"application/json"},
        "Accept": []string{"application/json"},
        "Authorization": []string{"Bearer {access-token}"},
    }
    // Request body, taken from the body parameter example.
    jsonReq := []byte(`{"XNames": ["x0c1s2b0n3"]}`)
    req, err := http.NewRequest("POST", "https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/hbstates", bytes.NewBuffer(jsonReq))
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
POST /hbstates
Query the service for heartbeat status of requested components
Sends a list of components to the service in a JSON formatted payload. The service will respond with a JSON payload containing the same list of components, each with their XName and Heartbeating status.
Body parameter
{
"XNames": [
"x0c1s2b0n3"
]
}
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | hbstates | true | none |
Example responses
200 Response
{
"HBStates": [
{
"XName": "x0c0s0b0n0",
"Heartbeating": true
}
]
}
401 Response
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | OK. The operation was successful and a payload was returned | hbstates_rsp |
400 | Bad Request | Bad Request. Malformed JSON. Verify all JSON formatting in payload. Verify that all entries are properly set. | None |
401 | Unauthorized | Unauthorized. RBAC and/or authenticated token does not allow calling this method. Check the authentication token expiration. Verify that the RBAC information is correct. | Error |
404 | Not Found | Not Found. Endpoint not available. Check IP routing between managed and management plane. Check that any SMS node services are running on management plane. Check that SMS node API gateway service is running on management plane. Check that SMS node HMI service is running on management plane. | Error |
405 | Method Not Allowed | Operation not permitted. For /hbstates, only POST operations are allowed. | Error |
default | Default | Unexpected error | Error |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
Code samples
GET https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/hbstate/{xname} HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/hbstate/{xname} \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/hbstate/{xname}', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": []string{"application/json"},
        "Authorization": []string{"Bearer {access-token}"},
    }
    // A GET request carries no body, so pass nil.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/hbstate/{xname}", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /hbstate/{xname}
Query the service for the heartbeat status of a single component.
Query the service for the heartbeat status of a single component. The service will respond with a JSON formatted payload containing the requested component XName and heartbeating status.
Name | In | Type | Required | Description |
---|---|---|---|---|
xname | path | XName.1.0.0 | true | none |
Example responses
200 Response
{
"XName": "x0c0s0b0n0",
"Heartbeating": true
}
404 Response
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | OK. The data was successfully retrieved | hbstates_single_rsp |
404 | Not Found | Not Found. Endpoint not available. Check IP routing between managed and management plane. Check that any SMS node services are running on management plane. Check that SMS node API gateway service is running on management plane. Check that SMS node HMI service is running on management plane. | Error |
405 | Method Not Allowed | Operation not permitted. For /hbstate/{xname}, only GET operations are allowed. | Error |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
Code samples
GET https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/params HTTP/1.1
Host: api-gw-service-nmn.local
Accept: */*
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/params \
-H 'Accept: */*' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': '*/*',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/params', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": []string{"*/*"},
        "Authorization": []string{"Bearer {access-token}"},
    }
    // A GET request carries no body, so pass nil.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/params", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /params
Retrieve heartbeat tracker parameters
Fetch current heartbeat tracker configurable parameters.
Example responses
200 Response
default Response
{
"type": "string",
"detail": "string",
"instance": "string",
"status": "string",
"title": "string"
}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Current heartbeat service operational parameter values | params |
400 | Bad Request | Bad Request. Malformed JSON. Verify all JSON formatting in payload. | Error |
401 | Unauthorized | Unauthorized. RBAC and/or authenticated token does not allow calling this method. Check the authentication token expiration. Verify that the RBAC information is correct. | Error |
404 | Not Found | Not Found. Endpoint not available. Check IP routing between managed and management plane. Check that any SMS node services are running on management plane. Check that SMS node API gateway service is running on management plane. Check that SMS node HMI service is running on management plane. | Error |
405 | Method Not Allowed | Operation not permitted. For /params, only PATCH and GET operations are allowed. | Error |
500 | Internal Server Error | Internal Server Error. Unexpected condition encountered when processing the request. | None |
default | Default | Unexpected error | Error |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
Code samples
PATCH https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/params HTTP/1.1
Host: api-gw-service-nmn.local
Content-Type: application/json
Accept: */*
# You can also use wget
curl -X PATCH https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/params \
-H 'Content-Type: application/json' \
-H 'Accept: */*' \
-H 'Authorization: Bearer {access-token}' \
-d '{"Debug": "2"}'
import requests
headers = {
    'Content-Type': 'application/json',
    'Accept': '*/*',
    'Authorization': 'Bearer {access-token}'
}
# Only the parameters being changed need to be sent; here, the debug level
payload = {'Debug': '2'}
r = requests.patch('https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/params', headers = headers, json = payload)
print(r.json())
package main
import (
    "bytes"
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Content-Type": []string{"application/json"},
        "Accept": []string{"*/*"},
        "Authorization": []string{"Bearer {access-token}"},
    }
    // Only the parameters being changed need to be sent; here, the debug level.
    jsonReq := []byte(`{"Debug": "2"}`)
    req, err := http.NewRequest("PATCH", "https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/params", bytes.NewBuffer(jsonReq))
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
PATCH /params
Update heartbeat tracker parameters
Set one or more configurable parameters for the heartbeat tracker service and have them take effect immediately, without restarting the service.
Body parameter
{
"Debug": "0",
"Errtime": "10",
"Warntime": "5",
"Kv_url": "http://cray-hbtd-etcd-client:2379",
"Interval": "5",
"Nosm": "0",
"Sm_retries": "3",
"Sm_timeout": "5",
"Sm_url": "http://cray-smd/v1/State/Components",
"Telemetry_host": "10.2.3.4:9092:heartbeat_notifications",
"Use_telemetry": "1"
}
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | params | true | none |
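Because a PATCH body only needs the parameters being changed, a small helper can assemble it and catch misspelled names before the request is sent. The following is a minimal sketch under stated assumptions: `build_params_patch` is a hypothetical helper, the accepted names are those in the example body above, and `Port` is rejected because the schema in this reference marks it read-only.

```python
import json

# Parameter names taken from the example PATCH body in this reference.
WRITABLE_PARAMS = {
    "Debug", "Errtime", "Warntime", "Kv_url", "Interval", "Nosm",
    "Sm_retries", "Sm_timeout", "Sm_url", "Telemetry_host", "Use_telemetry",
}


def build_params_patch(**changes) -> str:
    """Return a JSON body containing only the parameters being changed."""
    unknown = set(changes) - WRITABLE_PARAMS
    if unknown:
        raise ValueError(
            f"unknown or read-only parameter(s): {sorted(unknown)}")
    # The schema types every parameter as a string, so convert the values.
    return json.dumps({name: str(value) for name, value in changes.items()})


if __name__ == "__main__":
    print(build_params_patch(Debug=2))  # prints {"Debug": "2"}
```

Validating on the client side is a design choice for this sketch; how the service itself treats unknown or read-only names is not described here.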
Example responses
200 Response
default Response
{
"type": "string",
"detail": "string",
"instance": "string",
"status": "string",
"title": "string"
}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Current heartbeat service operational parameter values | params |
400 | Bad Request | Bad Request. Malformed JSON. Verify all JSON formatting in payload. | Error |
401 | Unauthorized | Unauthorized. RBAC and/or authenticated token does not allow calling this method. Check the authentication token expiration. Verify that the RBAC information is correct. | Error |
404 | Not Found | Not Found. Endpoint not available. Check IP routing between managed and management plane. Check that any SMS node services are running on management plane. Check that SMS node API gateway service is running on management plane. Check that SMS node HMI service is running on management plane. | Error |
405 | Method Not Allowed | Operation not permitted. For /params, only PATCH and GET operations are allowed. | Error |
500 | Internal Server Error | Internal Server Error. Unexpected condition encountered when processing the request. | None |
default | Default | Unexpected error | Error |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
Code samples
GET https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/health HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/health \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/health', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": []string{"application/json"},
        "Authorization": []string{"Bearer {access-token}"},
    }
    // A GET request carries no body, so pass nil.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/health", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /health
Query the health of the service
The health resource returns health information about the heartbeat tracker service and its dependencies. This actively checks the connection between the heartbeat tracker service and the following: the KV store, the message bus, and the Hardware State Manager (HSM).
This is primarily intended as a diagnostic tool to investigate the functioning of the heartbeat tracker service.
Example responses
200 Response
{
"KvStore": "KV Store not initialized",
"MsgBus": "Connected and OPEN",
"HsmStatus": "Ready"
}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | OK Network API call success | Inline |
405 | Method Not Allowed | Operation Not Permitted. For /health, only GET operations are allowed. | Problem7807 |
Status Code 200
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
» KvStore | string | true | none | Status of the KV Store. |
» MsgBus | string | true | none | Status of the connection with the message bus. |
» HsmStatus | string | true | none | Status of the connection to the Hardware State Manager (HSM). Any error reported by an attempt to access the HSM will be included here. |
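A client that consumes this response might first confirm that the three required fields are present before reporting them. The sketch below does only that; `summarize_health` is a hypothetical helper, and it deliberately does not judge which status strings count as healthy, since this reference does not enumerate them.

```python
REQUIRED_HEALTH_FIELDS = ("KvStore", "MsgBus", "HsmStatus")


def summarize_health(response: dict) -> list:
    """Return one 'field: status' line per required field, flagging gaps."""
    lines = []
    for field in REQUIRED_HEALTH_FIELDS:
        value = response.get(field)
        if isinstance(value, str):
            lines.append(f"{field}: {value}")
        else:
            lines.append(f"{field}: <missing or not a string>")
    return lines


if __name__ == "__main__":
    example = {
        "KvStore": "KV Store not initialized",
        "MsgBus": "Connected and OPEN",
        "HsmStatus": "Ready",
    }
    for line in summarize_health(example):
        print(line)
```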
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
Code samples
GET https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/liveness HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/liveness \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/liveness', headers = headers)
print(r.status_code)  # success is 204 No Content, so there is no JSON body to parse
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": []string{"application/json"},
        "Authorization": []string{"Bearer {access-token}"},
    }
    // A GET request carries no body, so pass nil.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/liveness", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /liveness
Kubernetes liveness endpoint to monitor service health
The liveness resource works in conjunction with the Kubernetes liveness probe to determine when the service is no longer responding to requests. Too many failures of the liveness probe will result in the service being shut down and restarted.
This is primarily an endpoint for the automated Kubernetes system.
Example responses
405 Response
{
"type": "about:blank",
"detail": "Detail about this specific problem occurrence. See RFC7807",
"instance": "",
"status": 400,
"title": "Description of HTTP Status code, e.g. 400"
}
Status | Meaning | Description | Schema |
---|---|---|---|
204 | No Content | No Content Network API call success | None |
405 | Method Not Allowed | Operation Not Permitted. For /liveness, only GET operations are allowed. | Problem7807 |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
Code samples
GET https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/readiness HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/readiness \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/readiness', headers = headers)
print(r.status_code)  # success is 204 No Content, so there is no JSON body to parse
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": []string{"application/json"},
        "Authorization": []string{"Bearer {access-token}"},
    }
    // A GET request carries no body, so pass nil.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/hbtd/hmi/v1/readiness", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /readiness
Kubernetes readiness endpoint to monitor service health
The readiness resource works in conjunction with the Kubernetes readiness probe to determine when the service is no longer healthy and able to respond correctly to requests. Too many failures of the readiness probe will result in traffic being routed away from this service, and the service will eventually be shut down and restarted if it remains in an unready state for too long.
This is primarily an endpoint for the automated Kubernetes system.
Example responses
405 Response
{
"type": "about:blank",
"detail": "Detail about this specific problem occurrence. See RFC7807",
"instance": "",
"status": 400,
"title": "Description of HTTP Status code, e.g. 400"
}
Status | Meaning | Description | Schema |
---|---|---|---|
204 | No Content | No Content Network API call success | None |
405 | Method Not Allowed | Operation Not Permitted. For /readiness, only GET operations are allowed. | Problem7807 |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
{
"Component": "x0c1s2b0n3",
"Hostname": "x0c1s2b0n3.us.cray.com",
"NID": "83",
"Status": "Kernel Oops",
"TimeStamp": "2018-07-06T12:34:56.012345-5Z"
}
Heartbeat Message
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Component | XName.1.0.0 | true | none | Identifies sender by xname. This is the physical, location-based name of a component. |
Hostname | Hostname.1.0.0 | false | none | Identifies sender by hostname. This is the host name of a component. |
NID | NID.1.0.0 | false | none | Identifies sender by Numeric ID (NID). This is the Numeric ID of a compute node. |
Status | HeartbeatStatus.1.0.0 | true | none | Special status field for specific failure modes. |
TimeStamp | TimeStamp.1.0.0 | true | none | When heartbeat was sent. This is an ISO8601 formatted time stamp. |
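To make the required and optional fields concrete, here is a minimal sketch of assembling a heartbeat message. `build_heartbeat` is a hypothetical helper; the timestamp is produced with Python's `isoformat()`, which yields an ISO 8601 string, though the exact layout the service expects is not specified beyond that, so treat the format as an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def build_heartbeat(component: str, status: str,
                    hostname: Optional[str] = None,
                    nid: Optional[str] = None,
                    now: Optional[datetime] = None) -> dict:
    """Assemble a heartbeat message with required and optional fields."""
    now = now or datetime.now(timezone.utc)
    message = {
        "Component": component,        # required: sender xname
        "Status": status,              # required: special status field
        "TimeStamp": now.isoformat(),  # required: ISO 8601 time stamp
    }
    # Hostname and NID are optional, so include them only when provided.
    if hostname is not None:
        message["Hostname"] = hostname
    if nid is not None:
        message["NID"] = str(nid)      # the example sends the NID as a string
    return message


if __name__ == "__main__":
    print(build_heartbeat("x0c1s2b0n3", "Kernel Oops", nid="83"))
```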
{
"Status": "Kernel Oops",
"TimeStamp": "2018-07-06T12:34:56.012345-5Z"
}
Heartbeat Message
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Status | HeartbeatStatus.1.0.0 | true | none | Special status field for specific failure modes. |
TimeStamp | TimeStamp.1.0.0 | true | none | When heartbeat was sent. This is an ISO8601 formatted time stamp. |
{
"XNames": [
"x0c1s2b0n3"
]
}
Heartbeat Status Query
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
XNames | [XName.1.0.0] | false | none | List of component XNames to query for heartbeat status. |
{
"HBStates": [
{
"XName": "x0c0s0b0n0",
"Heartbeating": true
}
]
}
Heartbeat Status Query Response
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
HBStates | [hbstates_single_rsp] | false | none | List of components’ heartbeat status. |
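A caller will usually want to separate the components that are heartbeating from those that are not. The following sketch does so for a response shaped like the example above; `split_by_heartbeat` is a hypothetical helper, and treating a missing `Heartbeating` field as "not heartbeating" is an assumption, since the schema marks the field as optional.

```python
def split_by_heartbeat(response: dict) -> tuple:
    """Split an hbstates response into (heartbeating, silent) xname lists."""
    alive, silent = [], []
    for entry in response.get("HBStates", []):
        # A missing Heartbeating field is treated as False (assumption).
        target = alive if entry.get("Heartbeating") else silent
        target.append(entry.get("XName"))
    return alive, silent


if __name__ == "__main__":
    example = {
        "HBStates": [
            {"XName": "x0c0s0b0n0", "Heartbeating": True},
            {"XName": "x0c1s2b0n3", "Heartbeating": False},
        ]
    }
    alive, silent = split_by_heartbeat(example)
    print("heartbeating:", alive)
    print("silent:", silent)
```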
{
"XName": "x0c0s0b0n0",
"Heartbeating": true
}
Heartbeat Status for a Component
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
XName | string | false | none | XName of a component |
Heartbeating | boolean | false | none | Signifies if a component is actively heartbeating. |
{
"Debug": "0",
"Errtime": "10",
"Warntime": "5",
"Kv_url": "http://cray-hbtd-etcd-client:2379",
"Interval": "5",
"Nosm": "0",
"Port": "8080",
"Sm_retries": "3",
"Sm_timeout": "5",
"Sm_url": "http://cray-smd/v1/State/Components",
"Telemetry_host": "10.2.3.4:9092:heartbeat_notifications",
"Use_telemetry": "1"
}
Operational Parameters Message
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Debug | string | false | none | This is the debug level of the heartbeat service. The Debug parameter increases the verbosity of the logging. |
Errtime | string | false | none | This is the timeout interval resulting in a missing heartbeat error. Allows you to change the maximum time elapsed since the last heartbeat received from a component before sending an ALERT to the HSM. |
Warntime | string | false | none | This is the timeout interval resulting in a missing heartbeat warning. Allows you to change the maximum time elapsed since the last heartbeat received from a component before sending a WARNING to the State Manager. |
Kv_url | string | false | none | This is the URL of a Key/Value store service. |
Interval | string | false | none | This is the time interval between heartbeat checks (in seconds). |
Nosm | string | false | none | This enables/disables actual State Manager interaction. |
Port | string | false | read-only | This is the port the heartbeat service listens on. |
Sm_retries | string | false | none | This is the number of times to retry failed State Manager interactions. |
Sm_timeout | string | false | none | This is the maximum time (in seconds) to wait for a response from the HSM in any given interaction. |
Sm_url | string | false | none | This is the State Manager URL. |
Telemetry_host | string | false | none | Telemetry bus host description (host:port:topic) |
Use_telemetry | string | false | none | Turns on or off the dumping of heartbeat state change notifications to the telemetry bus. If non-zero, heartbeat change notifications are dumped onto the telemetry bus. |
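Since every parameter is typed as a string, a client that wants to compare or compute with the values has to convert them first. This is a minimal sketch, assuming the numeric-looking fields are the ones whose example values above are digits; `parse_params` and `NUMERIC_PARAMS` are hypothetical names.

```python
# Fields whose example values in this reference are numeric strings.
NUMERIC_PARAMS = ("Debug", "Errtime", "Warntime", "Interval", "Nosm",
                  "Port", "Sm_retries", "Sm_timeout", "Use_telemetry")


def parse_params(response: dict) -> dict:
    """Return a copy with numeric string parameters converted to int."""
    parsed = dict(response)
    for name in NUMERIC_PARAMS:
        value = parsed.get(name)
        if isinstance(value, str):
            try:
                parsed[name] = int(value)
            except ValueError:
                pass  # keep the original string if it is not an integer
    return parsed


if __name__ == "__main__":
    params = parse_params({"Warntime": "5", "Errtime": "10",
                           "Sm_url": "http://cray-smd/v1/State/Components"})
    # With integers, the two timeouts can be compared directly.
    print(params["Warntime"] < params["Errtime"])
```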
"x0c1s2b0n3"
Identifies sender by xname. This is the physical, location-based name of a component.
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
anonymous | string | false | none | Identifies sender by xname. This is the physical, location-based name of a component. |
"x0c1s2b0n3.us.cray.com"
Identifies sender by hostname. This is the host name of a component.
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
anonymous | string | false | none | Identifies sender by hostname. This is the host name of a component. |
"83"
Identifies sender by Numeric ID (NID). This is the Numeric ID of a compute node.
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
anonymous | string | false | none | Identifies sender by Numeric ID (NID). This is the Numeric ID of a compute node. |
"2018-07-06T12:34:56.012345-5Z"
When heartbeat was sent. This is an ISO8601 formatted time stamp.
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
anonymous | string | false | none | When heartbeat was sent. This is an ISO8601 formatted time stamp. |
"Kernel Oops"
Special status field for specific failure modes.
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
anonymous | string | false | none | Special status field for specific failure modes. |
{
"type": "string",
"detail": "string",
"instance": "string",
"status": "string",
"title": "string"
}
RFC 7807 compliant error payload. All fields are optional except the ’type’ field.
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
type | string | true | none | none |
detail | string | false | none | none |
instance | string | false | none | none |
status | string | false | none | none |
title | string | false | none | none |
{
"type": "about:blank",
"detail": "Detail about this specific problem occurrence. See RFC7807",
"instance": "",
"status": 400,
"title": "Description of HTTP Status code, e.g. 400"
}
RFC 7807 compliant error payload. All fields are optional except the ’type’ field.
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
type | string | true | none | none |
detail | string | false | none | none |
instance | string | false | none | none |
status | number(int32) | false | none | none |
title | string | false | none | none |
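Both error schemas above follow the same RFC 7807 shape and differ only in whether `status` is a string or a number. A client can therefore format either one with a single routine. The sketch below is one possible approach; `describe_problem` is a hypothetical helper, and the output layout is purely illustrative.

```python
def describe_problem(payload: dict) -> str:
    """Render an RFC 7807 style error payload as a one-line message."""
    # RFC 7807 treats an absent type as "about:blank".
    problem_type = payload.get("type", "about:blank")
    status = payload.get("status")  # may be a string or a number
    title = payload.get("title") or "Unknown error"
    detail = payload.get("detail")

    prefix = f"[{status}] " if status not in (None, "") else ""
    text = f"{prefix}{title}"
    if detail:
        text += f": {detail}"
    if problem_type != "about:blank":
        text += f" ({problem_type})"
    return text


if __name__ == "__main__":
    print(describe_problem({
        "type": "about:blank",
        "status": 405,
        "title": "Method Not Allowed",
        "detail": "For /health, only GET operations are allowed.",
    }))
```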