The Rack Resiliency Service (RRS) provides a set of APIs to manage rack-level resiliency. It queries the Kubernetes cluster to provide aggregated zone information and detailed critical service status. It gathers node details across zones and presents both high-level summaries and in-depth information for zones and critical services.
Retrieve aggregated zone configuration showing each zone's Kubernetes_Topology_Zone and CEPH_Zone, including:
- Management_Master_Nodes
- Management_Worker_Nodes
- Management_Storage_Nodes
Alternatively, if zones are not configured, one of the following informational messages is returned:
- "No K8s Topology/Ceph Zones configured"
- "No Ceph zones configured"
- "No K8s topology zones configured"
Retrieve detailed information for a specific zone including:
- Zone_Name
- Management_Master_Nodes
- Management_Worker_Nodes
- Management_Storage_Nodes
- Node status and OSD information
Retrieve a list of critical services grouped by namespace.
Retrieve a summarized view of a specific critical service (without pod details). The response includes:
- configured_instances
- name
- namespace
- type
Update the critical services configuration based on provided input. This endpoint modifies which critical services are monitored.
Retrieve the status of all critical services including service status and distribution details.
Each service object may include:
- name
- status
- balanced: indicates whether the service is properly distributed
Retrieve detailed status for a specific critical service including pod information.
The response includes:
- configured_instances
- currently_running_instances
- name
- namespace
- pods with name, node, status, and zone
- type
- status
- balanced
Base URLs:
- https://api-gw-service-nmn.local/apis/rrs
License: Hewlett Packard Enterprise Development LP
Retrieve aggregated and detailed information about zones including node types and counts.
Code samples
GET https://api-gw-service-nmn.local/apis/rrs/zones HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/rrs/zones \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/rrs/zones', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": {"application/json"},
        "Authorization": {"Bearer {access-token}"},
    }
    // GET requests carry no body, so pass nil instead of a buffer.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/rrs/zones", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /zones
Get Zones Configuration
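Returns an object with a "Zones" property that is an array of zones. Each zone contains Zone_Name, a Kubernetes_Topology_Zone object (Management_Master_Nodes and Management_Worker_Nodes), and a CEPH_Zone object (Management_Storage_Nodes).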
Example responses
Zones configuration
{
"Zones": [
{
"Zone_Name": "cscs-rack-x3001",
"Kubernetes_Topology_Zone": {
"Management_Master_Nodes": [
"ncn-m002"
],
"Management_Worker_Nodes": [
"ncn-w002",
"ncn-w004"
]
},
"CEPH_Zone": {
"Management_Storage_Nodes": [
"ncn-s004",
"ncn-s003"
]
}
},
{
"Zone_Name": "cscs-rack-x3002",
"Kubernetes_Topology_Zone": {
"Management_Master_Nodes": [
"ncn-m003"
],
"Management_Worker_Nodes": [
"ncn-w003"
]
},
"CEPH_Zone": {
"Management_Storage_Nodes": [
"ncn-s005",
"ncn-s002"
]
}
},
{
"Zone_Name": "cscs-rack-x3000",
"Kubernetes_Topology_Zone": {
"Management_Master_Nodes": [
"ncn-m001"
],
"Management_Worker_Nodes": [
"ncn-w001",
"ncn-w005"
]
},
"CEPH_Zone": {
"Management_Storage_Nodes": [
"ncn-s001"
]
}
}
]
}
404 Response
{
"detail": "string",
"errors": {},
"instance": "http://example.com",
"status": 400,
"title": "string",
"type": "about:blank"
}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Zones configuration | ZonesResponse |
404 | Not Found | Not found | ProblemDetails |
500 | Internal Server Error | Internal server error | ProblemDetails |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
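The Zones response shown above can be reduced to per-zone node counts with a few lines of Python. This is a minimal sketch, not part of RRS: the function name `summarize_zones` and the output keys are illustrative assumptions, and it only handles the documented "Zones" array (the shape of the informational "no zones configured" messages is not specified here, so a response without "Zones" is treated as empty).

```python
from typing import Any


def summarize_zones(response: dict[str, Any]) -> dict[str, dict[str, int]]:
    """Count master, worker, and storage nodes per zone in a GET /zones response."""
    summary: dict[str, dict[str, int]] = {}
    for zone in response.get("Zones", []):
        # Both sub-objects are optional in the schema, so fall back to empty dicts.
        k8s = zone.get("Kubernetes_Topology_Zone") or {}
        ceph = zone.get("CEPH_Zone") or {}
        summary[zone["Zone_Name"]] = {
            "masters": len(k8s.get("Management_Master_Nodes", [])),
            "workers": len(k8s.get("Management_Worker_Nodes", [])),
            "storage": len(ceph.get("Management_Storage_Nodes", [])),
        }
    return summary


# Trimmed copy of the example response above.
example_zones = {
    "Zones": [
        {
            "Zone_Name": "cscs-rack-x3001",
            "Kubernetes_Topology_Zone": {
                "Management_Master_Nodes": ["ncn-m002"],
                "Management_Worker_Nodes": ["ncn-w002", "ncn-w004"],
            },
            "CEPH_Zone": {"Management_Storage_Nodes": ["ncn-s004", "ncn-s003"]},
        },
        {
            "Zone_Name": "cscs-rack-x3000",
            "Kubernetes_Topology_Zone": {
                "Management_Master_Nodes": ["ncn-m001"],
                "Management_Worker_Nodes": ["ncn-w001", "ncn-w005"],
            },
            "CEPH_Zone": {"Management_Storage_Nodes": ["ncn-s001"]},
        },
    ]
}

print(summarize_zones(example_zones))
```

In practice the dictionary passed in would be the parsed JSON body of the request, for example `r.json()` from the Python sample.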
Code samples
GET https://api-gw-service-nmn.local/apis/rrs/zones/{zone_name} HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/rrs/zones/{zone_name} \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/rrs/zones/{zone_name}', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": {"application/json"},
        "Authorization": {"Bearer {access-token}"},
    }
    // GET requests carry no body, so pass nil instead of a buffer.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/rrs/zones/{zone_name}", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /zones/{zone_name}
Get Detailed Zone Information
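Returns detailed information for a specific zone. The response includes Zone_Name plus Management_Master, Management_Worker, and Management_Storage objects, each with a Count, a Type, and a Nodes array of node names and statuses. Storage nodes also report their OSDs grouped as up or down.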
Name | In | Type | Required | Description |
---|---|---|---|---|
zone_name | path | ZoneName | true | The name of the zone |
Example responses
Detailed zone information
{
"Zone_Name": "cscs-rack-x3001",
"Management_Master": {
"Count": 1,
"Type": "Kubernetes_Topology_Zone",
"Nodes": [
{
"name": "ncn-m002",
"status": "Ready"
}
]
},
"Management_Worker": {
"Count": 2,
"Type": "Kubernetes_Topology_Zone",
"Nodes": [
{
"name": "ncn-w002",
"status": "Ready"
},
{
"name": "ncn-w004",
"status": "Ready"
}
]
},
"Management_Storage": {
"Count": 2,
"Type": "CEPH_Zone",
"Nodes": [
{
"name": "ncn-s004",
"status": "NotReady",
"osds": {
"down": [
"osd.0",
"osd.5"
]
}
},
{
"name": "ncn-s003",
"status": "Ready",
"osds": {
"up": [
"osd.4",
"osd.9",
"osd.12",
"osd.15",
"osd.18",
"osd.21",
"osd.24",
"osd.27"
]
}
}
]
}
}
400 Response
{
"detail": "string",
"errors": {},
"instance": "http://example.com",
"status": 400,
"title": "string",
"type": "about:blank"
}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Detailed zone information | ZoneDetailResponse |
400 | Bad Request | Bad request | ProblemDetails |
404 | Not Found | Not found | ProblemDetails |
500 | Internal Server Error | Internal server error | ProblemDetails |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
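One common use of the detailed zone response is spotting storage nodes with OSDs that are down. The sketch below assumes the response shape shown in the example above; `down_osds_by_node` is a hypothetical helper name, not an RRS function.

```python
from typing import Any


def down_osds_by_node(zone_detail: dict[str, Any]) -> dict[str, list[str]]:
    """Map each storage node name to its list of down OSDs, skipping healthy nodes."""
    storage = zone_detail.get("Management_Storage") or {}
    result: dict[str, list[str]] = {}
    for node in storage.get("Nodes", []):
        # "up" and "down" are both optional inside "osds".
        down = (node.get("osds") or {}).get("down", [])
        if down:
            result[node["name"]] = list(down)
    return result


# Trimmed copy of the example response above.
example_zone = {
    "Zone_Name": "cscs-rack-x3001",
    "Management_Storage": {
        "Count": 2,
        "Type": "CEPH_Zone",
        "Nodes": [
            {"name": "ncn-s004", "status": "NotReady", "osds": {"down": ["osd.0", "osd.5"]}},
            {"name": "ncn-s003", "status": "Ready", "osds": {"up": ["osd.4", "osd.9"]}},
        ],
    },
}

print(down_osds_by_node(example_zone))
```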
Interact with critical service configurations, summaries, and runtime statuses.
Code samples
GET https://api-gw-service-nmn.local/apis/rrs/criticalservices HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/rrs/criticalservices \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/rrs/criticalservices', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": {"application/json"},
        "Authorization": {"Bearer {access-token}"},
    }
    // GET requests carry no body, so pass nil instead of a buffer.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/rrs/criticalservices", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /criticalservices
Get Critical Services
Returns a list of critical services grouped by namespace. The response includes a critical_services property whose namespace object maps each namespace name to an array of service objects, each with a name and a type.
Example responses
List of critical services grouped by namespace
{
"critical_services": {
"namespace": {
"services": [
{
"name": "cray-dns-powerdns",
"type": "Deployment"
},
{
"name": "cray-hbtd",
"type": "Deployment"
},
{
"name": "cray-hmnfd",
"type": "Deployment"
},
{
"name": "cray-keycloak",
"type": "StatefulSet"
},
{
"name": "cray-sls-postgres",
"type": "StatefulSet"
}
],
"spire": [
{
"name": "cray-spire-server",
"type": "StatefulSet"
}
],
"rack-resiliency": [
{
"name": "k8s-zone-api",
"type": "Deployment"
}
]
}
}
}
404 Response
{
"detail": "string",
"errors": {},
"instance": "http://example.com",
"status": 400,
"title": "string",
"type": "about:blank"
}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | List of critical services grouped by namespace | CriticalServicesListSchema |
404 | Not Found | Not found | ProblemDetails |
500 | Internal Server Error | Internal server error | ProblemDetails |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
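Because the services are nested under `critical_services.namespace.<namespace name>`, it is often convenient to flatten the response into rows. A minimal sketch, assuming the response shape from the example above; `flatten_services` is an illustrative name.

```python
from typing import Any


def flatten_services(response: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (namespace, name, type) rows from a GET /criticalservices response."""
    namespaces = response.get("critical_services", {}).get("namespace", {})
    rows = []
    for namespace, services in namespaces.items():
        for service in services:
            rows.append((namespace, service["name"], service["type"]))
    return rows


# Trimmed copy of the example response above.
example_services = {
    "critical_services": {
        "namespace": {
            "services": [
                {"name": "cray-hbtd", "type": "Deployment"},
                {"name": "cray-keycloak", "type": "StatefulSet"},
            ],
            "spire": [{"name": "cray-spire-server", "type": "StatefulSet"}],
        }
    }
}

for row in flatten_services(example_services):
    print(row)
```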
Code samples
PATCH https://api-gw-service-nmn.local/apis/rrs/criticalservices HTTP/1.1
Host: api-gw-service-nmn.local
Content-Type: application/json
Accept: application/json
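# You can also use wget
curl -X PATCH https://api-gw-service-nmn.local/apis/rrs/criticalservices \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}' \
-d '{"critical_services": {"k8s-zone-api": {"namespace": "rack-resiliency", "type": "Deployment"}}}'
import requests
headers = {
'Content-Type': 'application/json',
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
# Request body: service names mapped to their namespace and type.
payload = {
'critical_services': {
'k8s-zone-api': {'namespace': 'rack-resiliency', 'type': 'Deployment'}
}
}
r = requests.patch('https://api-gw-service-nmn.local/apis/rrs/criticalservices', headers = headers, json = payload)
print(r.json())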
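package main
import (
    "fmt"
    "net/http"
    "strings"
)
func main() {
    headers := map[string][]string{
        "Content-Type": {"application/json"},
        "Accept": {"application/json"},
        "Authorization": {"Bearer {access-token}"},
    }
    // Request body: service names mapped to their namespace and type.
    data := strings.NewReader(`{"critical_services": {"k8s-zone-api": {"namespace": "rack-resiliency", "type": "Deployment"}}}`)
    req, err := http.NewRequest("PATCH", "https://api-gw-service-nmn.local/apis/rrs/criticalservices", data)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}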
PATCH /criticalservices
Update Critical Services ConfigMap
Updates the critical services configuration. The request body should contain critical services mapped by service name to their configuration details.
Body parameter
{
"critical_services": {
"property1": {
"namespace": "rack-resiliency",
"type": "Deployment"
},
"property2": {
"namespace": "rack-resiliency",
"type": "Deployment"
}
}
}
Name | In | Type | Required | Description |
---|---|---|---|---|
body | body | CriticalServiceCmStaticType | true | Critical services configuration update |
Example responses
Critical Services Updated Successfully
{
"Update": "Successful",
"Successfully_Added_Services": [
"k8s-zone-api",
"kube-multus-ds"
],
"Already_Existing_Services": [
"coredns",
"kube-proxy"
]
}
400 Response
{
"detail": "string",
"errors": {},
"instance": "http://example.com",
"status": 400,
"title": "string",
"type": "about:blank"
}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Critical Services Updated Successfully | CriticalServiceUpdateSchema |
400 | Bad Request | Bad request | ProblemDetails |
404 | Not Found | Not found | ProblemDetails |
500 | Internal Server Error | Internal server error | ProblemDetails |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
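The PATCH body maps each service name to its namespace and type. The following sketch builds that body from a list of tuples and rejects types outside the two values this document lists for the service type (Deployment and StatefulSet). The helper name `build_patch_body` and the tuple layout are assumptions made for illustration; the server performs its own validation.

```python
import json
from typing import Any, Iterable

# Service types listed in the schema section of this document.
ALLOWED_TYPES = {"Deployment", "StatefulSet"}


def build_patch_body(services: Iterable[tuple[str, str, str]]) -> dict[str, Any]:
    """Build a PATCH /criticalservices body from (name, namespace, type) tuples."""
    critical: dict[str, dict[str, str]] = {}
    for name, namespace, kind in services:
        if kind not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type {kind!r} for service {name!r}")
        critical[name] = {"namespace": namespace, "type": kind}
    return {"critical_services": critical}


body = build_patch_body(
    [
        ("k8s-zone-api", "rack-resiliency", "Deployment"),
        ("cray-spire-server", "spire", "StatefulSet"),
    ]
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed as the JSON body of the request, for example through the `json` argument of `requests.patch`.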
Code samples
GET https://api-gw-service-nmn.local/apis/rrs/criticalservices/{critical_service_name} HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/rrs/criticalservices/{critical_service_name} \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/rrs/criticalservices/{critical_service_name}', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": {"application/json"},
        "Authorization": {"Bearer {access-token}"},
    }
    // GET requests carry no body, so pass nil instead of a buffer.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/rrs/criticalservices/{critical_service_name}", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /criticalservices/{critical_service_name}
Get Critical Service Details (Summarized)
Returns a summarized view of a specific critical service. The response includes name, namespace, type, and configured_instances.
Name | In | Type | Required | Description |
---|---|---|---|---|
critical_service_name | path | ServiceName | true | The name of the critical service |
Example responses
Summarized critical service information
{
"critical_service": {
"name": "cray-hbtd",
"namespace": "services",
"type": "Deployment",
"configured_instances": 3
}
}
400 Response
{
"detail": "string",
"errors": {},
"instance": "http://example.com",
"status": 400,
"title": "string",
"type": "about:blank"
}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Summarized critical service information | CriticalServiceDescribeSchema |
400 | Bad Request | Bad request | ProblemDetails |
404 | Not Found | Not found | ProblemDetails |
500 | Internal Server Error | Internal server error | ProblemDetails |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
Code samples
GET https://api-gw-service-nmn.local/apis/rrs/criticalservices/status HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/rrs/criticalservices/status \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/rrs/criticalservices/status', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": {"application/json"},
        "Authorization": {"Bearer {access-token}"},
    }
    // GET requests carry no body, so pass nil instead of a buffer.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/rrs/criticalservices/status", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /criticalservices/status
Get Critical Services Status
Returns the status of all critical services with distribution details. The response maps each namespace to an array of services, each with name, type, status, and balanced.
Example responses
Critical services status retrieved successfully
{
"critical_services": {
"namespace": {
"services": [
{
"name": "cray-dns-powerdns",
"type": "Deployment",
"status": "Configured",
"balanced": "true"
},
{
"name": "cray-hbtd",
"type": "Deployment",
"status": "Configured",
"balanced": "true"
},
{
"name": "cray-hmnfd",
"type": "Deployment",
"status": "Configured",
"balanced": "true"
},
{
"name": "cray-keycloak",
"type": "StatefulSet",
"status": "Configured",
"balanced": "true"
},
{
"name": "cray-sls-postgres",
"type": "StatefulSet",
"status": "PartiallyConfigured",
"balanced": "true"
}
],
"spire": [
{
"name": "cray-spire-server",
"type": "StatefulSet",
"status": "Configured",
"balanced": "true"
}
],
"kube-system": [
{
"name": "coredns",
"type": "Deployment",
"status": "Configured",
"balanced": "true"
}
],
"rack-resiliency": [
{
"name": "k8s-zone-api",
"type": "Deployment",
"status": "Configured",
"balanced": "true"
}
]
}
}
}
404 Response
{
"detail": "string",
"errors": {},
"instance": "http://example.com",
"status": 400,
"title": "string",
"type": "about:blank"
}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Critical services status retrieved successfully | CriticalServicesStatusListSchema |
404 | Not Found | Not found | ProblemDetails |
500 | Internal Server Error | Internal server error | ProblemDetails |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
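The status response can be filtered down to the services that deserve a closer look. This sketch is a suggestion rather than RRS behavior: which status values count as healthy is a policy choice, and the default set used here ("Configured" and "Running") is an assumption. Note that `balanced` is a string ("true", "false", or "NA"), not a JSON boolean.

```python
from typing import Any


def services_needing_attention(
    response: dict[str, Any],
    healthy_statuses: frozenset = frozenset({"Configured", "Running"}),  # assumed policy
) -> list[str]:
    """Return "namespace/name" for services that are unbalanced or not in a healthy status."""
    namespaces = response.get("critical_services", {}).get("namespace", {})
    flagged = []
    for namespace, services in namespaces.items():
        for service in services:
            unbalanced = service.get("balanced") == "false"
            unhealthy = service.get("status") not in healthy_statuses
            if unbalanced or unhealthy:
                flagged.append(f"{namespace}/{service['name']}")
    return flagged


# Based on the example response above; the "false" value for coredns is
# made up here to exercise the unbalanced branch.
example_status = {
    "critical_services": {
        "namespace": {
            "services": [
                {"name": "cray-hbtd", "type": "Deployment", "status": "Configured", "balanced": "true"},
                {"name": "cray-sls-postgres", "type": "StatefulSet", "status": "PartiallyConfigured", "balanced": "true"},
            ],
            "kube-system": [
                {"name": "coredns", "type": "Deployment", "status": "Configured", "balanced": "false"},
            ],
        }
    }
}

print(services_needing_attention(example_status))
```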
Code samples
GET https://api-gw-service-nmn.local/apis/rrs/criticalservices/status/{critical_service_name} HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/rrs/criticalservices/status/{critical_service_name} \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/rrs/criticalservices/status/{critical_service_name}', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": {"application/json"},
        "Authorization": {"Bearer {access-token}"},
    }
    // GET requests carry no body, so pass nil instead of a buffer.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/rrs/criticalservices/status/{critical_service_name}", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /criticalservices/status/{critical_service_name}
Get Critical Service Status by Name (Detailed)
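Returns detailed status for a specific critical service, including pod information. The response includes name, namespace, type, status, balanced, configured_instances, currently_running_instances, and a pods array giving each pod's name, status, node, and zone.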
Name | In | Type | Required | Description |
---|---|---|---|---|
critical_service_name | path | ServiceName | true | The name of the critical service |
Example responses
Detailed critical service status retrieved successfully
{
"critical_service": {
"name": "cray-hbtd",
"namespace": "services",
"type": "Deployment",
"status": "Configured",
"balanced": "true",
"configured_instances": 3,
"currently_running_instances": 3,
"pods": [
{
"name": "cray-hbtd-6cbdbd6955-5xlfg",
"status": "Running",
"node": "ncn-w002",
"zone": "cscs-rack-x3001"
},
{
"name": "cray-hbtd-6cbdbd6955-jwzgq",
"status": "Running",
"node": "ncn-w003",
"zone": "cscs-rack-x3002"
},
{
"name": "cray-hbtd-6cbdbd6955-k6pkt",
"status": "Running",
"node": "ncn-w001",
"zone": "cscs-rack-x3000"
}
]
}
}
400 Response
{
"detail": "string",
"errors": {},
"instance": "http://example.com",
"status": 400,
"title": "string",
"type": "about:blank"
}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | Detailed critical service status retrieved successfully | CriticalServiceStatusDescribeSchema |
400 | Bad Request | Bad request | ProblemDetails |
404 | Not Found | Not found | ProblemDetails |
500 | Internal Server Error | Internal server error | ProblemDetails |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
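The pod list in the detailed status makes it possible to see how a service is spread across zones and whether any configured instances are missing. A minimal sketch, assuming the response shape from the example above; both helper names are illustrative.

```python
from collections import Counter
from typing import Any


def pods_per_zone(response: dict[str, Any]) -> Counter:
    """Count the pods of a critical service in each zone."""
    service = response["critical_service"]
    return Counter(pod["zone"] for pod in service.get("pods", []))


def missing_instances(response: dict[str, Any]) -> int:
    """Return configured_instances minus currently_running_instances."""
    service = response["critical_service"]
    return service["configured_instances"] - service["currently_running_instances"]


# Trimmed copy of the example response above.
example_detail = {
    "critical_service": {
        "name": "cray-hbtd",
        "configured_instances": 3,
        "currently_running_instances": 3,
        "pods": [
            {"name": "cray-hbtd-6cbdbd6955-5xlfg", "status": "Running", "node": "ncn-w002", "zone": "cscs-rack-x3001"},
            {"name": "cray-hbtd-6cbdbd6955-jwzgq", "status": "Running", "node": "ncn-w003", "zone": "cscs-rack-x3002"},
            {"name": "cray-hbtd-6cbdbd6955-k6pkt", "status": "Running", "node": "ncn-w001", "zone": "cscs-rack-x3000"},
        ],
    }
}

print(dict(pods_per_zone(example_detail)), missing_instances(example_detail))
```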
Kubernetes health check endpoints for service readiness and liveness probes.
Code samples
GET https://api-gw-service-nmn.local/apis/rrs/healthz/ready HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/rrs/healthz/ready \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/rrs/healthz/ready', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": {"application/json"},
        "Authorization": {"Bearer {access-token}"},
    }
    // GET requests carry no body, so pass nil instead of a buffer.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/rrs/healthz/ready", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /healthz/ready
Retrieve RRS Readiness Probe
Readiness probe for RRS. Kubernetes uses it to determine whether RRS is ready to accept requests.
Example responses
200 Response
{}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | RRS is ready to accept requests | EmptyDict |
500 | Internal Server Error | RRS is not able to accept requests | EmptyDict |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
Code samples
GET https://api-gw-service-nmn.local/apis/rrs/healthz/live HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/rrs/healthz/live \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/rrs/healthz/live', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": {"application/json"},
        "Authorization": {"Bearer {access-token}"},
    }
    // GET requests carry no body, so pass nil instead of a buffer.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/rrs/healthz/live", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /healthz/live
Retrieve RRS Liveness Probe
Liveness probe for RRS. Kubernetes uses it to determine whether RRS is responsive.
Example responses
200 Response
{}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | RRS is responsive | EmptyDict |
500 | Internal Server Error | RRS is not responsive | EmptyDict |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
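Both probes return an empty object, so the HTTP status code carries all the information: 200 when the probe passes and 500 when it fails. The sketch below uses only the Python standard library. The base URL is taken from the samples in this document; the function names and the "ok" / "failing" / "unexpected" labels are assumptions made for illustration.

```python
import urllib.error
import urllib.request

BASE_URL = "https://api-gw-service-nmn.local/apis/rrs"


def interpret_probe(status_code: int) -> str:
    """Translate a probe's HTTP status code into a short label."""
    if status_code == 200:
        return "ok"
    if status_code == 500:
        return "failing"
    return "unexpected"


def check_probe(path: str, token: str, timeout: float = 5.0) -> str:
    """Call a probe such as "/healthz/ready" and interpret the status code.

    Connection failures (urllib.error.URLError) are deliberately not caught,
    so the caller can tell "unreachable" apart from "failing".
    """
    request = urllib.request.Request(
        BASE_URL + path,
        headers={"Accept": "application/json", "Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return interpret_probe(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses; the code is on the exception.
        return interpret_probe(err.code)
```

A caller would run, for example, `check_probe("/healthz/live", token)` and `check_probe("/healthz/ready", token)` with a valid access token.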
API version information endpoint.
Code samples
GET https://api-gw-service-nmn.local/apis/rrs/version HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json
# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/rrs/version \
-H 'Accept: application/json' \
-H 'Authorization: Bearer {access-token}'
import requests
headers = {
'Accept': 'application/json',
'Authorization': 'Bearer {access-token}'
}
r = requests.get('https://api-gw-service-nmn.local/apis/rrs/version', headers = headers)
print(r.json())
package main
import (
    "fmt"
    "net/http"
)
func main() {
    headers := map[string][]string{
        "Accept": {"application/json"},
        "Authorization": {"Bearer {access-token}"},
    }
    // GET requests carry no body, so pass nil instead of a buffer.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/rrs/version", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
GET /version
Get RRS version
Retrieve the version of the RRS service.
Example responses
200 Response
{
"version": "1.0.0"
}
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | RRS Version | VersionSchema |
500 | Internal Server Error | Internal server error | ProblemDetails |
To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth
"string"
Unique identifier name for the zone
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
anonymous | string | false | none | Unique identifier name for the zone |
{
"Zones": [
{
"Zone_Name": "string",
"Kubernetes_Topology_Zone": {
"Management_Master_Nodes": [
"string"
],
"Management_Worker_Nodes": [
"string"
]
},
"CEPH_Zone": {
"Management_Storage_Nodes": [
"string"
]
}
}
]
}
Response containing all configured zones in the system with their associated node configurations
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Zones | [ZoneItemSchema] | true | none | Array of zone configurations showing Kubernetes topology and CEPH zone details |
{
"Zone_Name": "string",
"Kubernetes_Topology_Zone": {
"Management_Master_Nodes": [
"string"
],
"Management_Worker_Nodes": [
"string"
]
},
"CEPH_Zone": {
"Management_Storage_Nodes": [
"string"
]
}
}
Configuration details for a single zone including Kubernetes topology and CEPH zone information
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Zone_Name | ZoneName | true | none | Unique identifier name for the zone |
Kubernetes_Topology_Zone | KubernetesTopologyZoneSchema | false | none | Kubernetes topology zone configuration containing master and worker node assignments |
CEPH_Zone | CephZoneSchema | false | none | CEPH zone configuration containing storage node assignments |
{
"Management_Master_Nodes": [
"string"
],
"Management_Worker_Nodes": [
"string"
]
}
Kubernetes topology zone configuration containing master and worker node assignments
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Management_Master_Nodes | [string] | false | none | List of Kubernetes master node names assigned to this zone |
Management_Worker_Nodes | [string] | false | none | List of Kubernetes worker node names assigned to this zone |
{
"Management_Storage_Nodes": [
"string"
]
}
CEPH zone configuration containing storage node assignments
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Management_Storage_Nodes | [string] | true | none | List of CEPH storage node names assigned to this zone |
{
"Zone_Name": "string",
"Management_Master": {
"Count": 1,
"Type": "Kubernetes_Topology_Zone",
"Nodes": [
{
"name": "string",
"status": "Ready"
}
]
},
"Management_Worker": {
"Count": 1,
"Type": "Kubernetes_Topology_Zone",
"Nodes": [
{
"name": "string",
"status": "Ready"
}
]
},
"Management_Storage": {
"Count": 1,
"Type": "CEPH_Zone",
"Nodes": [
{
"name": "string",
"status": "Ready",
"osds": {
"up": [
"string"
],
"down": [
"string"
]
}
}
]
}
}
Detailed information about a specific zone including node counts, types, and individual node status
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Zone_Name | ZoneName | true | none | Unique identifier name for the zone |
Management_Master | ManagementKubernetesSchema | false | none | Management information for Kubernetes nodes (master or worker) including count and individual node details |
Management_Worker | ManagementKubernetesSchema | false | none | Management information for Kubernetes nodes (master or worker) including count and individual node details |
Management_Storage | ManagementStorageSchema | false | none | Management information for CEPH storage nodes including count and individual node details with OSD information |
{
"Count": 1,
"Type": "Kubernetes_Topology_Zone",
"Nodes": [
{
"name": "string",
"status": "Ready"
}
]
}
Management information for Kubernetes nodes (master or worker) including count and individual node details
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Count | integer | true | none | Total number of nodes of this type in the zone |
Type | string | true | none | Type classification indicating this is a Kubernetes topology zone |
Nodes | [NodeSchema] | true | none | Detailed information about each individual node |
Property | Value |
---|---|
Type | Kubernetes_Topology_Zone |
{
"Count": 1,
"Type": "CEPH_Zone",
"Nodes": [
{
"name": "string",
"status": "Ready",
"osds": {
"up": [
"string"
],
"down": [
"string"
]
}
}
]
}
Management information for CEPH storage nodes including count and individual node details with OSD information
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Count | integer | true | none | Total number of storage nodes in the zone |
Type | string | true | none | Type classification indicating this is a CEPH zone |
Nodes | [StorageNodeSchema] | true | none | Detailed information about each storage node including OSD status |
Property | Value |
---|---|
Type | CEPH_Zone |
{
"name": "string",
"status": "Ready"
}
Basic node information including name and operational status
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
name | string | true | none | Unique name identifier for the node |
status | string | true | none | Current operational status of the node |
Property | Value |
---|---|
status | Ready |
status | NotReady |
status | Unknown |
{
"name": "string",
"status": "Ready",
"osds": {
"up": [
"string"
],
"down": [
"string"
]
}
}
Storage node information including CEPH OSD (Object Storage Daemon) status details
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
name | string | true | none | Unique name identifier for the storage node |
status | string | true | none | Current operational status of the storage node |
osds | OSDStatesSchema | true | none | Object Storage Daemon status information showing which OSDs are operational and non-operational |
Property | Value |
---|---|
status | Ready |
status | NotReady |
{
"up": [
"string"
],
"down": [
"string"
]
}
Object Storage Daemon status information showing which OSDs are operational and non-operational
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
up | [string] | false | none | List of OSD identifiers that are currently operational |
down | [string] | false | none | List of OSD identifiers that are currently non-operational |
"rack-resiliency"
Kubernetes namespace name where a service is deployed
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
anonymous | string | false | none | Kubernetes namespace name where a service is deployed |
"cray-dns-powerdns"
Name of the critical service
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
anonymous | string | false | none | Name of the critical service |
"true"
Indicates whether a service is properly distributed across zones for high availability
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
anonymous | string | false | none | Indicates whether a service is properly distributed across zones for high availability |
Property | Value |
---|---|
anonymous | true |
anonymous | false |
anonymous | NA |
"error"
Current operational status of a critical service indicating its configuration and runtime state
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
anonymous | string | false | none | Current operational status of a critical service indicating its configuration and runtime state |
Property | Value |
---|---|
anonymous | error |
anonymous | Configured |
anonymous | PartiallyConfigured |
anonymous | NotConfigured |
anonymous | Running |
anonymous | Unconfigured |
"Deployment"
Kubernetes resource type of the service
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
anonymous | string | false | none | Kubernetes resource type of the service |
Property | Value |
---|---|
anonymous | Deployment |
anonymous | StatefulSet |
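The enumerated values listed above for the balanced flag, the service status, and the service type can be mirrored on the client side so that unexpected values fail loudly. This is one possible modeling, not something RRS ships; the class names and `parse_service` are assumptions.

```python
from enum import Enum
from typing import Any


class Balanced(str, Enum):
    TRUE = "true"
    FALSE = "false"
    NA = "NA"


class ServiceStatus(str, Enum):
    ERROR = "error"
    CONFIGURED = "Configured"
    PARTIALLY_CONFIGURED = "PartiallyConfigured"
    NOT_CONFIGURED = "NotConfigured"
    RUNNING = "Running"
    UNCONFIGURED = "Unconfigured"


class ServiceType(str, Enum):
    DEPLOYMENT = "Deployment"
    STATEFUL_SET = "StatefulSet"


def parse_service(item: dict[str, Any]) -> tuple[str, ServiceType, ServiceStatus, Balanced]:
    """Convert one service object from a status response into typed values.

    Raises ValueError if a field holds a value outside the documented enumerations.
    """
    return (
        item["name"],
        ServiceType(item["type"]),
        ServiceStatus(item["status"]),
        Balanced(item["balanced"]),
    )


parsed = parse_service(
    {"name": "cray-sls-postgres", "type": "StatefulSet", "status": "PartiallyConfigured", "balanced": "true"}
)
print(parsed)
```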
{
"critical_services": {
"namespace": {
"property1": [
{
"name": "cray-dns-powerdns",
"type": "Deployment"
}
],
"property2": [
{
"name": "cray-dns-powerdns",
"type": "Deployment"
}
]
}
}
}
Response containing all critical services organized by namespace
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
critical_services | object | true | none | Critical services grouped by their Kubernetes namespaces |
» namespace | object | true | none | Mapping of namespace names to their contained critical services |
»» additionalProperties | [CriticalServiceItemSchema] | false | none | [Basic information about a critical service including its name and Kubernetes resource type] |
{
"name": "cray-dns-powerdns",
"type": "Deployment"
}
Basic information about a critical service including its name and Kubernetes resource type
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
name | ServiceName | true | none | Name of the critical service |
type | ServiceType | true | none | Kubernetes resource type of the service |
{
"critical_services": {
"property1": {
"namespace": "rack-resiliency",
"type": "Deployment"
},
"property2": {
"namespace": "rack-resiliency",
"type": "Deployment"
}
}
}
Configuration payload for updating critical services in the system. Contains service definitions that need to be monitored for resiliency
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
critical_services | object | true | none | Mapping from service names to their configuration details for monitoring |
» additionalProperties | CriticalServiceCmStaticSchema | false | none | Static configuration details for a critical service that needs to be monitored |
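A body matching this schema can be assembled and checked before it is sent. The following is a rough sketch under stated assumptions: `build_update_payload` and `ALLOWED_TYPES` are hypothetical names, the allowed types are taken from the ServiceType enumeration in this section, and how the resulting JSON is transmitted to the endpoint is not covered here.

```python
import json

# Values from the ServiceType enumeration in this section.
ALLOWED_TYPES = {"Deployment", "StatefulSet"}


def build_update_payload(services: dict[str, dict[str, str]]) -> str:
    """Build a JSON body mapping service names to {namespace, type}."""
    critical_services = {}
    for name, config in services.items():
        # Both fields are marked as required in the schema table.
        missing = {"namespace", "type"} - config.keys()
        if missing:
            raise ValueError(f"{name}: missing required fields {sorted(missing)}")
        if config["type"] not in ALLOWED_TYPES:
            raise ValueError(f"{name}: unsupported type {config['type']!r}")
        critical_services[name] = {
            "namespace": config["namespace"],
            "type": config["type"],
        }
    return json.dumps({"critical_services": critical_services}, indent=2)


if __name__ == "__main__":
    print(build_update_payload(
        {"cray-dns-powerdns": {"namespace": "rack-resiliency", "type": "Deployment"}}
    ))
```

Validating locally keeps obviously malformed entries from reaching the service, although the service remains the authority on what it accepts.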
{
"namespace": "rack-resiliency",
"type": "Deployment"
}
Static configuration details for a critical service that needs to be monitored
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
namespace | NamespaceName | true | none | Kubernetes namespace name where a service is deployed |
type | ServiceType | true | none | Kubernetes resource type of the service |
{
"Update": "Successful",
"Successfully_Added_Services": [
"cray-dns-powerdns"
],
"Already_Existing_Services": [
"cray-dns-powerdns"
]
}
Response indicating the result of updating critical services configuration
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
Update | string | true | none | Overall status of the update operation |
Successfully_Added_Services | [ServiceName] | true | none | List of service names that were successfully added to the configuration |
Already_Existing_Services | [ServiceName] | true | none | List of service names that were already present in the configuration |
Property | Value |
---|---|
Update | Successful |
Update | Services Already Exist |
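The update response can be condensed into a one-line summary for logs or command-line output. This is an illustrative sketch only; `summarize_update` is a hypothetical helper, and it treats any `Update` value other than the two enumerated above as unexpected.

```python
# The two values listed for the Update property.
KNOWN_OUTCOMES = ("Successful", "Services Already Exist")


def summarize_update(response: dict) -> str:
    """Summarize an update response as '<outcome>: N added, M already present'."""
    outcome = response["Update"]
    if outcome not in KNOWN_OUTCOMES:
        raise ValueError(f"unexpected Update value: {outcome!r}")
    added = response["Successfully_Added_Services"]
    existing = response["Already_Existing_Services"]
    return f"{outcome}: {len(added)} added, {len(existing)} already present"


if __name__ == "__main__":
    print(summarize_update({
        "Update": "Successful",
        "Successfully_Added_Services": ["cray-dns-powerdns"],
        "Already_Existing_Services": [],
    }))
```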
{
"critical_service": {
"name": "cray-dns-powerdns",
"namespace": "rack-resiliency",
"type": "Deployment",
"configured_instances": 1
}
}
Summarized view of a critical service without runtime details
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
critical_service | object | true | none | Basic configuration information about the critical service |
» name | ServiceName | true | none | Name of the critical service |
» namespace | NamespaceName | true | none | Kubernetes namespace name where a service is deployed |
» type | ServiceType | true | none | Kubernetes resource type of the service |
» configured_instances | number | true | none | Number of instances configured for this service (replicas for Deployments/StatefulSets) |
{
"critical_service": {
"name": "cray-dns-powerdns",
"namespace": "rack-resiliency",
"type": "Deployment",
"status": "error",
"balanced": "true",
"configured_instances": 1,
"currently_running_instances": 0,
"pods": [
{
"name": "string",
"node": "string",
"status": "Running",
"zone": "string"
}
]
}
}
Detailed status information for a critical service including runtime details and pod information
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
critical_service | object | true | none | Complete status information including configuration and runtime details |
» name | ServiceName | true | none | Name of the critical service |
» namespace | NamespaceName | true | none | Kubernetes namespace name where a service is deployed |
» type | ServiceType | true | none | Kubernetes resource type of the service |
» status | ServiceStatus | true | none | Current operational status of a critical service indicating its configuration and runtime state |
» balanced | ServiceBalanced | true | none | Indicates whether a service is properly distributed across zones for high availability |
» configured_instances | number | true | none | Number of instances configured for this service |
» currently_running_instances | number | true | none | Number of instances currently running and healthy |
» pods | [PodSchema] | true | none | Detailed information about each pod instance of this service |
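One simple use of the detailed status is to compare `configured_instances` with `currently_running_instances`. The sketch below assumes a decoded response dictionary; `instance_shortfall` is a hypothetical helper and is not something the service provides.

```python
def instance_shortfall(response: dict) -> int:
    """Return how many configured instances are not currently running."""
    service = response["critical_service"]
    configured = int(service["configured_instances"])
    running = int(service["currently_running_instances"])
    # Never report a negative shortfall if more pods run than are configured.
    return max(0, configured - running)


if __name__ == "__main__":
    sample = {
        "critical_service": {
            "name": "cray-dns-powerdns",
            "namespace": "rack-resiliency",
            "type": "Deployment",
            "status": "error",
            "balanced": "true",
            "configured_instances": 1,
            "currently_running_instances": 0,
            "pods": [],
        }
    }
    print(instance_shortfall(sample))
```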
{
"name": "string",
"node": "string",
"status": "Running",
"zone": "string"
}
Information about an individual pod instance including its location and status
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
name | string | true | none | Unique name of the pod instance |
node | string | true | none | Kubernetes node where the pod is scheduled to run |
status | string | true | none | Current operational status of the pod |
zone | string | true | none | Zone where the pod is located based on its assigned node |
Property | Value |
---|---|
status | Running |
status | Pending |
status | Failed |
status | Terminating |
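Pod entries carry both a `status` and a `zone`, so a client can count running pods per zone to see how a service is spread. The following is a minimal sketch; `running_pods_per_zone` is a hypothetical helper, and the pod, node, and zone names in the sample are invented for illustration.

```python
from collections import Counter


def running_pods_per_zone(pods: list[dict]) -> Counter:
    """Count pods whose status is 'Running', keyed by zone."""
    return Counter(pod["zone"] for pod in pods if pod["status"] == "Running")


if __name__ == "__main__":
    # Hypothetical pod, node, and zone names.
    pods = [
        {"name": "pod-1", "node": "node-1", "status": "Running", "zone": "zone-a"},
        {"name": "pod-2", "node": "node-2", "status": "Running", "zone": "zone-a"},
        {"name": "pod-3", "node": "node-3", "status": "Pending", "zone": "zone-b"},
    ]
    for zone, count in sorted(running_pods_per_zone(pods).items()):
        print(zone, count)
```

Zones that hold only non-running pods do not appear in the result, which makes gaps in coverage easy to spot when compared against the full zone list.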
{
"critical_services": {
"namespace": {
"property1": [
{
"name": "cray-dns-powerdns",
"type": "Deployment",
"status": "error",
"balanced": "true"
}
],
"property2": [
{
"name": "cray-dns-powerdns",
"type": "Deployment",
"status": "error",
"balanced": "true"
}
]
}
}
}
Status overview of all critical services organized by namespace, showing their operational state and distribution
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
critical_services | object | true | none | Critical services grouped by namespace with their status information |
» namespace | object | true | none | Mapping of namespace names to their contained critical services with status |
»» additionalProperties | [CriticalServiceStatusItemSchema] | false | none | [Status summary for a critical service showing its operational state and distribution across zones] |
{
"name": "cray-dns-powerdns",
"type": "Deployment",
"status": "error",
"balanced": "true"
}
Status summary for a critical service showing its operational state and distribution across zones
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
name | ServiceName | true | none | Name of the critical service |
type | ServiceType | true | none | Kubernetes resource type of the service |
status | ServiceStatus | true | none | Current operational status of a critical service indicating its configuration and runtime state |
balanced | ServiceBalanced | true | none | Indicates whether a service is properly distributed across zones for high availability |
{
"version": "1.0.0"
}
Version information for the Rack Resiliency Service
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
version | string | true | none | The current version of the RRS Service |
{
"detail": "string",
"errors": {},
"instance": "http://example.com",
"status": 400,
"title": "string",
"type": "about:blank"
}
An error response for RFC 7807 problem details.
Name | Type | Required | Restrictions | Description |
---|---|---|---|---|
detail | string | false | none | A human-readable explanation specific to this occurrence of the problem. Focus on helping correct the problem, rather than giving debugging information. |
errors | object | false | none | An object denoting field-specific errors. Only present on error responses when field input is specified for the request. |
instance | string(uri) | false | none | A relative URI reference that identifies the specific occurrence of the problem |
status | integer | false | none | HTTP status code |
title | string | false | none | Short, human-readable summary of the problem. It should not change from one occurrence to the next. |
type | string(uri) | false | none | Relative URI reference to the type of problem which includes human-readable documentation. |
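Because every field of the problem details object is optional, client code should read it defensively. The following is a minimal sketch; `describe_problem` is a hypothetical helper, and treating `errors` as a mapping from field name to message is an assumption, since the schema only describes it as an object.

```python
def describe_problem(body: dict) -> str:
    """Render a problem details body as a short, human-readable message."""
    status = body.get("status")
    title = body.get("title") or "Unknown error"
    detail = body.get("detail")

    head = f"{status} {title}" if status is not None else title
    lines = [f"{head}: {detail}" if detail else head]

    # Assumption: 'errors' maps field names to messages when present.
    for field, message in (body.get("errors") or {}).items():
        lines.append(f"  {field}: {message}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_problem({
        "status": 400,
        "title": "Bad Request",
        "detail": "Unsupported service type",
        "errors": {"type": "must be Deployment or StatefulSet"},
    }))
```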
{}
Empty response object typically returned by health check endpoints to indicate successful operation
None