Compute Rolling Upgrade Service v1


The Compute Rolling Upgrade Service (CRUS) coordinates with workload managers and the Boot Orchestration Service (BOS) to change the boot image and/or configuration on a set of compute nodes while minimally disrupting the ability of those nodes to run jobs.

CRUS divides the set of nodes into groups and, for each group in turn, performs the following procedure:

  1. Quiesces the nodes using the workload manager.
  2. Takes the nodes out of service in the workload manager.
  3. Creates a BOS reboot operation on the nodes using the specified BOS session template.
  4. Puts the nodes back into service using the workload manager.

Each group of nodes must complete this procedure before the next group begins it. In this way most of the total set of nodes remains available to do work while each smaller group is being updated.
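
The grouping described above can be pictured with a minimal Python sketch. It is an illustration only, not the CRUS implementation: the function names and the four callbacks (quiesce, remove_from_service, reboot, return_to_service) are hypothetical stand-ins for the workload manager and BOS interactions.

```python
from typing import Callable, Iterator, List, Sequence

NodeAction = Callable[[List[str]], None]


def split_into_groups(nodes: Sequence[str], step_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most step_size nodes."""
    if step_size < 1:
        raise ValueError("step_size must be at least 1")
    for start in range(0, len(nodes), step_size):
        yield list(nodes[start:start + step_size])


def rolling_upgrade(
    nodes: Sequence[str],
    step_size: int,
    quiesce: NodeAction,
    remove_from_service: NodeAction,
    reboot: NodeAction,
    return_to_service: NodeAction,
) -> int:
    """Run the four-step procedure on one group at a time.

    The callbacks are hypothetical placeholders; a group finishes all
    four steps before the next group starts. Returns the group count.
    """
    groups_done = 0
    for group in split_into_groups(nodes, step_size):
        quiesce(group)
        remove_from_service(group)
        reboot(group)
        return_to_service(group)
        groups_done += 1
    return groups_done
```

With five nodes and a step size of 2, the sketch processes three groups, the last of which holds a single node. Only one group is out of service at any moment.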

Resources

/session

A CRUS session performs a rolling upgrade on a set of compute nodes.

Workflow

Create a New Session

POST /session

A new session is launched as a result of this call.

Specify the following parameters:

  • failed_label: An empty Hardware State Manager (HSM) group which CRUS will populate with any nodes that fail their upgrades.
  • starting_label: An HSM group which contains the total set of nodes to be upgraded.
  • upgrade_step_size: The number of nodes to include in each discrete upgrade step. The upgrade steps will never exceed this quantity, although in some cases they may be smaller.
  • upgrade_template_id: The name of the BOS session template to use for the upgrades.
  • workload_manager_type: Currently only slurm is supported.
  • upgrading_label: An empty HSM group which CRUS will use to boot and configure the discrete sets of nodes.
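
As a rough sketch, the parameters above could be assembled into a request body like this. The helper name build_session_request and its client-side checks are assumptions made for illustration; in particular, rejecting a step size below 1 is a guess at sensible input, not a documented CRUS rule.

```python
import json

# Per the parameter list above, only slurm is currently supported.
SUPPORTED_WORKLOAD_MANAGERS = {"slurm"}


def build_session_request(
    starting_label: str,
    upgrading_label: str,
    failed_label: str,
    upgrade_step_size: int,
    upgrade_template_id: str,
    workload_manager_type: str = "slurm",
) -> dict:
    """Return a dict suitable for the body of POST /session (hypothetical helper)."""
    if workload_manager_type not in SUPPORTED_WORKLOAD_MANAGERS:
        raise ValueError(f"unsupported workload manager: {workload_manager_type}")
    # Assumption: a step size below 1 is never meaningful.
    if isinstance(upgrade_step_size, bool) or not isinstance(upgrade_step_size, int):
        raise ValueError("upgrade_step_size must be an integer")
    if upgrade_step_size < 1:
        raise ValueError("upgrade_step_size must be at least 1")
    return {
        "failed_label": failed_label,
        "starting_label": starting_label,
        "upgrade_step_size": upgrade_step_size,
        "upgrade_template_id": upgrade_template_id,
        "upgrading_label": upgrading_label,
        "workload_manager_type": workload_manager_type,
    }


body = build_session_request(
    starting_label="nodes-to-upgrade",
    upgrading_label="nodes-currently-upgrading",
    failed_label="nodes-that-failed",
    upgrade_step_size=30,
    upgrade_template_id="my-bos-session-template",
)
print(json.dumps(body, indent=2))
```

Checking the workload manager type before sending avoids a round trip that the service would reject anyway.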

Examine a Session

GET /session/{upgrade_id}

Retrieve session details and status by upgrade ID.

List All Sessions

GET /session

List all in-progress and completed sessions.

Request a Session Be Deleted

DELETE /session/{upgrade_id}

Request a deletion of the specified CRUS session. Note that the delete may not happen immediately.
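
Because the deletion may not happen immediately, a client that needs to know when the session is gone can poll for it. The following is a minimal sketch under stated assumptions: get_status_code is a hypothetical callable that performs GET /session/{upgrade_id} and returns the HTTP status code, and a 404 (Not Found) is taken to mean the session no longer exists. The attempt count and delay are arbitrary.

```python
import time
from typing import Callable


def wait_for_deletion(
    get_status_code: Callable[[], int],
    attempts: int = 10,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the session lookup returns 404 or the attempts run out.

    Returns True if a 404 was seen, False otherwise.
    """
    for attempt in range(attempts):
        if get_status_code() == 404:
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Passing the lookup and the sleep function as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise without a live service.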

Interactions with Other APIs

CRUS works in concert with BOS to perform the node upgrades. The session template specified as the upgrade template must be available in BOS. CRUS uses HSM to view the starting node group and modify the upgrading and (if necessary) failed node groups.

Base URLs:

  • https://api-gw-service-nmn.local/apis/crus

Authentication

  • HTTP Authentication, scheme: bearer
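
As an illustration of the bearer scheme, the sketch below builds a request with the token in the Authorization header, using only the Python standard library. The helper name authorized_request is hypothetical, the base URL is the one that appears in the code samples in this document, and the token value is a placeholder. Nothing is sent over the network.

```python
import urllib.request

# Base URL as used by the code samples in this document.
BASE_URL = "https://api-gw-service-nmn.local/apis/crus"


def authorized_request(path: str, access_token: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request carrying a bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )


request = authorized_request("/session", "example-token")
print(request.get_method(), request.full_url)
```

The resulting object can be handed to urllib.request.urlopen once a real token is available.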

Default

post__session

Code samples

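HTTP

POST https://api-gw-service-nmn.local/apis/crus/session HTTP/1.1
Host: api-gw-service-nmn.local
Content-Type: application/json
Accept: application/json

Shell

# You can also use wget
curl -X POST https://api-gw-service-nmn.local/apis/crus/session \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer {access-token}' \
  -d '{"failed_label": "nodes-that-failed", "starting_label": "nodes-to-upgrade", "upgrade_step_size": 30, "upgrade_template_id": "my-bos-session-template", "upgrading_label": "nodes-currently-upgrading", "workload_manager_type": "slurm"}'

Python

import requests

headers = {
  'Content-Type': 'application/json',
  'Accept': 'application/json',
  'Authorization': 'Bearer {access-token}'
}

# Request body: the Session object shown under "Body parameter".
body = {
  'failed_label': 'nodes-that-failed',
  'starting_label': 'nodes-to-upgrade',
  'upgrade_step_size': 30,
  'upgrade_template_id': 'my-bos-session-template',
  'upgrading_label': 'nodes-currently-upgrading',
  'workload_manager_type': 'slurm'
}

r = requests.post('https://api-gw-service-nmn.local/apis/crus/session', headers=headers, json=body)

print(r.json())

Go

package main

import (
    "bytes"
    "fmt"
    "net/http"
)

func main() {

    headers := map[string][]string{
        "Content-Type": []string{"application/json"},
        "Accept": []string{"application/json"},
        "Authorization": []string{"Bearer {access-token}"},
    }

    // Request body: the Session object shown under "Body parameter".
    jsonReq := []byte(`{"failed_label": "nodes-that-failed", "starting_label": "nodes-to-upgrade", "upgrade_step_size": 30, "upgrade_template_id": "my-bos-session-template", "upgrading_label": "nodes-currently-upgrading", "workload_manager_type": "slurm"}`)

    req, err := http.NewRequest("POST", "https://api-gw-service-nmn.local/apis/crus/session", bytes.NewBuffer(jsonReq))
    if err != nil {
        panic(err)
    }
    req.Header = headers

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    fmt.Println(resp.Status)
}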

POST /session

Create a session

The creation of a session performs a rolling upgrade using the specified session template on the nodes specified in the starting group.

Body parameter

{
  "failed_label": "nodes-that-failed",
  "starting_label": "nodes-to-upgrade",
  "upgrade_step_size": 30,
  "upgrade_template_id": "my-bos-session-template",
  "upgrading_label": "nodes-currently-upgrading",
  "workload_manager_type": "slurm"
}

Parameters

Name  In    Type     Required  Description
body  body  Session  true      A JSON object for creating a Session

Example responses

201 Response

{
  "api_version": "2.71.828",
  "completed": true,
  "failed_label": "nodes-that-failed",
  "kind": "ComputeUpgradeSession",
  "messages": [
    "string"
  ],
  "starting_label": "nodes-to-upgrade",
  "state": "UPDATING",
  "upgrade_id": "c926acf6-b5c6-411e-ba6c-ea0448cab2ee",
  "upgrade_step_size": 30,
  "upgrade_template_id": "my-bos-session-template",
  "upgrading_label": "nodes-currently-upgrading",
  "workload_manager_type": "slurm"
}

Responses

Status  Meaning               Description                      Schema
201     Created               The status of the CRUS session.  SessionStatus
400     Bad Request           Bad Request                      None
422     Unprocessable Entity  Unprocessable Entity             None

To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth

get__session

Code samples

HTTP

GET https://api-gw-service-nmn.local/apis/crus/session HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json

Shell

# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/crus/session \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer {access-token}'

Python

import requests

headers = {
  'Accept': 'application/json',
  'Authorization': 'Bearer {access-token}'
}

r = requests.get('https://api-gw-service-nmn.local/apis/crus/session', headers=headers)

print(r.json())

Go

package main

import (
    "fmt"
    "net/http"
)

func main() {

    headers := map[string][]string{
        "Accept": []string{"application/json"},
        "Authorization": []string{"Bearer {access-token}"},
    }

    // A GET request carries no body, so pass nil.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/crus/session", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    fmt.Println(resp.Status)
}

GET /session

List sessions

List all sessions, including those in progress and those complete.

Example responses

200 Response

[
  {
    "api_version": "2.71.828",
    "completed": true,
    "failed_label": "nodes-that-failed",
    "kind": "ComputeUpgradeSession",
    "messages": [
      "string"
    ],
    "starting_label": "nodes-to-upgrade",
    "state": "UPDATING",
    "upgrade_id": "c926acf6-b5c6-411e-ba6c-ea0448cab2ee",
    "upgrade_step_size": 30,
    "upgrade_template_id": "my-bos-session-template",
    "upgrading_label": "nodes-currently-upgrading",
    "workload_manager_type": "slurm"
  }
]
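
A client will often want to separate running sessions from finished ones. The sketch below is one possible way to do that from the response body, relying only on the completed, state and upgrade_id fields shown in the example above. The function name summarize_sessions and the sample data are made up for illustration.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple


def summarize_sessions(response_text: str) -> Tuple[List[str], Dict[str, int]]:
    """Return the IDs of sessions still in progress and a count per state."""
    sessions = json.loads(response_text)
    in_progress = [s["upgrade_id"] for s in sessions if not s["completed"]]
    per_state = Counter(s["state"] for s in sessions)
    return in_progress, dict(per_state)


# Illustrative response text with two sessions (values are invented).
sample = json.dumps([
    {"upgrade_id": "id-1", "completed": False, "state": "UPDATING"},
    {"upgrade_id": "id-2", "completed": True, "state": "READY"},
])
running, counts = summarize_sessions(sample)
print(running, counts)
```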

Responses

Status  Meaning  Description               Schema
200     OK       A collection of Sessions  Inline

Response Schema

Status Code 200

Name                     Type             Required  Restrictions  Description
anonymous                [SessionStatus]  false     none          [The status for a CRUS Session.]
» api_version            string           true      none          Version of the API that created the session.
» completed              boolean          true      none          Whether or not the CRUS session has completed.
» failed_label           string           true      none          A Hardware State Manager (HSM) group which CRUS will populate with any nodes that fail their upgrades.
» kind                   string           true      none          The kind of CRUS session. Currently only ComputeUpgradeSession.
» messages               [string]         true      none          Status messages describing the progress of the session.
» starting_label         string           true      none          A Hardware State Manager (HSM) group which contains the total set of nodes to be upgraded.
» state                  string           true      none          Current state of the session.
» upgrade_id             string(uuid)     true      none          The ID of the CRUS session.
» upgrade_step_size      integer          true      none          The desired number of nodes for each discrete upgrade step. This quantity will not be exceeded but some steps may use fewer nodes.
» upgrade_template_id    string           true      none          The name of the Boot Orchestration Service (BOS) session template for the CRUS session upgrades.
» upgrading_label        string           true      none          A Hardware State Manager (HSM) group which the CRUS session will use to boot and configure the discrete sets of nodes.
» workload_manager_type  string           true      none          The name of the workload manager.

Enumerated Values

Property               Value
kind                   ComputeUpgradeSession
state                  CREATED
state                  READY
state                  DELETING
state                  UPDATING
workload_manager_type  slurm

To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth

get__session_{upgrade_id}

Code samples

HTTP

GET https://api-gw-service-nmn.local/apis/crus/session/{upgrade_id} HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json

Shell

# You can also use wget
curl -X GET https://api-gw-service-nmn.local/apis/crus/session/{upgrade_id} \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer {access-token}'

Python

import requests

headers = {
  'Accept': 'application/json',
  'Authorization': 'Bearer {access-token}'
}

r = requests.get('https://api-gw-service-nmn.local/apis/crus/session/{upgrade_id}', headers=headers)

print(r.json())

Go

package main

import (
    "fmt"
    "net/http"
)

func main() {

    headers := map[string][]string{
        "Accept": []string{"application/json"},
        "Authorization": []string{"Bearer {access-token}"},
    }

    // A GET request carries no body, so pass nil.
    req, err := http.NewRequest("GET", "https://api-gw-service-nmn.local/apis/crus/session/{upgrade_id}", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    fmt.Println(resp.Status)
}

GET /session/{upgrade_id}

Retrieve session details by id

Retrieve session details by upgrade_id.

Parameters

Name        In    Type          Required  Description
upgrade_id  path  string(uuid)  true      Upgrade ID

Example responses

200 Response

{
  "api_version": "2.71.828",
  "completed": true,
  "failed_label": "nodes-that-failed",
  "kind": "ComputeUpgradeSession",
  "messages": [
    "string"
  ],
  "starting_label": "nodes-to-upgrade",
  "state": "UPDATING",
  "upgrade_id": "c926acf6-b5c6-411e-ba6c-ea0448cab2ee",
  "upgrade_step_size": 30,
  "upgrade_template_id": "my-bos-session-template",
  "upgrading_label": "nodes-currently-upgrading",
  "workload_manager_type": "slurm"
}

Responses

Status  Meaning    Description                      Schema
200     OK         The status of the CRUS session.  SessionStatus
404     Not Found  Not Found                        None

To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth

delete__session_{upgrade_id}

Code samples

HTTP

DELETE https://api-gw-service-nmn.local/apis/crus/session/{upgrade_id} HTTP/1.1
Host: api-gw-service-nmn.local
Accept: application/json

Shell

# You can also use wget
curl -X DELETE https://api-gw-service-nmn.local/apis/crus/session/{upgrade_id} \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer {access-token}'

Python

import requests

headers = {
  'Accept': 'application/json',
  'Authorization': 'Bearer {access-token}'
}

r = requests.delete('https://api-gw-service-nmn.local/apis/crus/session/{upgrade_id}', headers=headers)

print(r.json())

Go

package main

import (
    "fmt"
    "net/http"
)

func main() {

    headers := map[string][]string{
        "Accept": []string{"application/json"},
        "Authorization": []string{"Bearer {access-token}"},
    }

    // This DELETE request carries no body, so pass nil.
    req, err := http.NewRequest("DELETE", "https://api-gw-service-nmn.local/apis/crus/session/{upgrade_id}", nil)
    if err != nil {
        panic(err)
    }
    req.Header = headers

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    fmt.Println(resp.Status)
}

DELETE /session/{upgrade_id}

Delete session by id

Delete session by upgrade_id.

Parameters

Name        In    Type          Required  Description
upgrade_id  path  string(uuid)  true      Upgrade ID

Example responses

200 Response

{
  "api_version": "2.71.828",
  "completed": true,
  "failed_label": "nodes-that-failed",
  "kind": "ComputeUpgradeSession",
  "messages": [
    "string"
  ],
  "starting_label": "nodes-to-upgrade",
  "state": "UPDATING",
  "upgrade_id": "c926acf6-b5c6-411e-ba6c-ea0448cab2ee",
  "upgrade_step_size": 30,
  "upgrade_template_id": "my-bos-session-template",
  "upgrading_label": "nodes-currently-upgrading",
  "workload_manager_type": "slurm"
}

Responses

Status  Meaning    Description                      Schema
200     OK         The status of the CRUS session.  SessionStatus
404     Not Found  Not Found                        None

To perform this operation, you must be authenticated by means of one of the following methods: bearerAuth

Schemas

Session

{
  "failed_label": "nodes-that-failed",
  "starting_label": "nodes-to-upgrade",
  "upgrade_step_size": 30,
  "upgrade_template_id": "my-bos-session-template",
  "upgrading_label": "nodes-currently-upgrading",
  "workload_manager_type": "slurm"
}

A CRUS Session object.

Properties

Name                   Type     Required  Restrictions  Description
failed_label           string   true      none          An empty Hardware State Manager (HSM) group which CRUS will populate with any nodes that fail their upgrades.
starting_label         string   true      none          A Hardware State Manager (HSM) group which contains the total set of nodes to be upgraded.
upgrade_step_size      integer  true      none          The desired number of nodes for each discrete upgrade step. This quantity will not be exceeded but some steps may use fewer nodes.
upgrade_template_id    string   true      none          The name of the Boot Orchestration Service (BOS) session template to use for the upgrades.
upgrading_label        string   true      none          An empty Hardware State Manager (HSM) group which CRUS will use to boot and configure the discrete sets of nodes.
workload_manager_type  string   true      none          The name of the workload manager. Currently only slurm is supported.

Enumerated Values

Property               Value
workload_manager_type  slurm

SessionStatus

{
  "api_version": "2.71.828",
  "completed": true,
  "failed_label": "nodes-that-failed",
  "kind": "ComputeUpgradeSession",
  "messages": [
    "string"
  ],
  "starting_label": "nodes-to-upgrade",
  "state": "UPDATING",
  "upgrade_id": "c926acf6-b5c6-411e-ba6c-ea0448cab2ee",
  "upgrade_step_size": 30,
  "upgrade_template_id": "my-bos-session-template",
  "upgrading_label": "nodes-currently-upgrading",
  "workload_manager_type": "slurm"
}

The status for a CRUS Session.

Properties

Name                   Type          Required  Restrictions  Description
api_version            string        true      none          Version of the API that created the session.
completed              boolean       true      none          Whether or not the CRUS session has completed.
failed_label           string        true      none          A Hardware State Manager (HSM) group which CRUS will populate with any nodes that fail their upgrades.
kind                   string        true      none          The kind of CRUS session. Currently only ComputeUpgradeSession.
messages               [string]      true      none          Status messages describing the progress of the session.
starting_label         string        true      none          A Hardware State Manager (HSM) group which contains the total set of nodes to be upgraded.
state                  string        true      none          Current state of the session.
upgrade_id             string(uuid)  true      none          The ID of the CRUS session.
upgrade_step_size      integer       true      none          The desired number of nodes for each discrete upgrade step. This quantity will not be exceeded but some steps may use fewer nodes.
upgrade_template_id    string        true      none          The name of the Boot Orchestration Service (BOS) session template for the CRUS session upgrades.
upgrading_label        string        true      none          A Hardware State Manager (HSM) group which the CRUS session will use to boot and configure the discrete sets of nodes.
workload_manager_type  string        true      none          The name of the workload manager.

Enumerated Values

Property               Value
kind                   ComputeUpgradeSession
state                  CREATED
state                  READY
state                  DELETING
state                  UPDATING
workload_manager_type  slurm