A tenant is a collection of nodes dedicated to one particular set of users on an HPE Cray EX system running CSM. This guide provides a comprehensive set of instructions for a system administrator to configure, deploy, and run applications on one or two tenants.
In this document, we provide examples for a hypothetical system called `Development`, which has two tenants, each with its own SlurmCluster.
Note that this document reflects the current state of the Multi-Tenancy feature. For example, VNI blocks must be manually configured today, but they will be automatically configured in a future release.
Here are the steps required:

- Create and configure the tenants
- Create and configure the SlurmClusters
- Edit each SlurmCluster's `slurm.conf` and `sssd.conf`
- Allocate Slingshot VNIs
- Make the corresponding changes to the primary (`user` namespace) `slurm.conf`
This section provides additional information on the configuration necessary to fully set up a tenant.

Tenants are created and configured by creating a tenant custom resource definition in the form of a YAML file. For more information, see Create A Tenant.

For the purposes of this guide, the tenant configuration settings are made in each tenant's configuration file, e.g. `devten01a.yaml`.
Choose your naming convention for each system and tenant. Example:

- System: `Development`
- Tenants: `devten01a` and `devten02a`
- Additional tenants would be `01b`, `01c`, etc.

Example of these settings in configuration file `devten01a.yaml`:
```yaml
apiVersion: tapms.hpe.com/v1alpha3
kind: Tenant
metadata:
  name: devten01a
spec:
  childnamespaces:
    - slurm
    - user
  tenantname: vcluster-devten01a
  tenantkms:
    enablekms: true
  tenanthooks: []
  tenantresources:
    - enforceexclusivehsmgroups: true
      hsmgrouplabel: devten01a
      type: compute
      xnames:
        - x1000c0s0b0n0
```
`apiVersion` should use the latest available in the CSM release, e.g. `v1alpha3` for CSM 1.6.
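Before applying a new or edited tenant file, the manifest can be checked against the installed Tenant CRD without creating anything. This is a minimal sketch, not part of the formal procedure; it assumes the `tenants` namespace already exists on the system:

```bash
# Server-side dry run: validates devten01a.yaml against the tapms Tenant CRD
kubectl apply --dry-run=server -n tenants -f devten01a.yaml
```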
### SlurmCluster Configuration

These configuration settings are made:

- In each SlurmCluster's configuration file, e.g. `devcls01a.yaml`
- In each SlurmCluster's `/etc/slurm/slurm.conf` file (in each `slurmctld` pod)
#### SlurmCluster Names

Choose your naming convention for each system and SlurmCluster. Example:

- System: `Development`
- SlurmClusters: `devcls01a` and `devcls02a`
- Additional SlurmClusters would be `01b`, `01c`, etc.
- Name length limitation: keep in mind that suffixes are appended to the SlurmCluster name, e.g. `devcls01a-slurmdb` (16 characters)

Example of settings in configuration file `devcls01a.yaml`:
```yaml
namespace: vcluster-devten01a-slurm
tapmsTenantName: vcluster-devten01a
hsmGroup: devten01a
```
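The `namespace` value here is the tenant's `slurm` child namespace. To confirm the exact namespace names before filling in `devcls01a.yaml`, the tenant's namespaces can be listed (a sketch; the grep pattern is simply the tenant name from these examples):

```bash
# List the namespaces created for the tenant
kubectl get namespaces | grep vcluster-devten01a
```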
#### SlurmCluster IP Addresses

IMPORTANT: Each High-Speed Network (HSN) IP address must be unique within all the SlurmClusters on any one system.

Example for the SlurmClusters on `Development`:

- `Development` base HSN IP address: 10.156.0.0
- SlurmCluster (`user` namespace) uses 10.156.12.100, .101, .102, .103
- SlurmCluster (`vcluster-devten01-slurmdb` namespace) will use 10.156.12.104, .105, .106, .107
- SlurmCluster (`vcluster-devten02-slurmdb` namespace) will use 10.156.12.108, .109, .110, .111
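A quick way to catch accidental duplicates before applying the files is to compare the `ip:` and `backupIP:` values across all of the SlurmCluster configuration files. This is a sketch only; it assumes the files follow the `devcls*.yaml` naming used in these examples:

```bash
# Print any HSN IP address that is assigned more than once
grep -hE '^[[:space:]]*(ip|backupIP):' devcls*.yaml | awk '{print $2}' | sort | uniq -d
```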
#### SlurmCluster API version

The SlurmCluster `apiVersion` must match the Slurm release (for example `v1alpha1`).
#### SlurmCluster Configurable Values

Values for `cpu`, `memory`, and `initialDelaySeconds` are shown in the example file `devcls01a.yaml`, below.
#### SlurmCluster Version Numbers

Version numbers are shown in the example file `devcls01a.yaml`, below. The version numbers must match the versions of these products on the system.

Example:

```
- cray/cray-slurmctld:1.6.1
- cray/cray-slurmdbd:1.6.1
- cray/munge-munge:1.5.0
- cray/cray-sssd:1.4.0
- cray/cray-slurm-config:1.3.0
```
#### SlurmCluster Slurm configuration

These configuration settings are made in each SlurmCluster's `/etc/slurm/slurm.conf` file (in each `slurmctld` pod).
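To review the settings currently in effect for a tenant, the file can be read directly from the running `slurmctld` pod. A minimal sketch, using the namespace and label selector that appear later in this guide:

```bash
# Look up the tenant's slurmctld pod and print its active slurm.conf
SLURMCTLD_POD=$(kubectl get pod -n vcluster-devten01a-slurm -l app.kubernetes.io/name=slurmctld -o name)
kubectl exec -n vcluster-devten01a-slurm ${SLURMCTLD_POD} -c slurmctld -- cat /etc/slurm/slurm.conf
```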
#### SlurmCluster Slingshot VNI Allocation

IMPORTANT: Each block of HPE Slingshot VNIs on the High-Speed Network (HSN) must not overlap with other blocks on the same system.

These settings are made in the `/etc/slurm/slurm.conf` file in each tenant's SlurmCluster and in the primary (`user` namespace) SlurmCluster.

Example with no tenants, `user` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-65535
```

Example with one tenant, `user` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-32767
```

Example with one tenant, `vcluster-devten01a-slurm` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=32768-65535
```

Example with two tenants, `user` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-32767
```

Example with two tenants, `vcluster-devten01a-slurm` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=32868-57353
```

Example with two tenants, `vcluster-devten02a-slurm` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=57354-65535
```
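Because the blocks must not overlap, it can be worth listing the configured ranges side by side after any change. This is a sketch only, assuming the namespaces used in these examples; it prints every `SwitchParameters=vnis=` setting found in each namespace's ConfigMaps so the ranges can be compared by eye:

```bash
# Print the VNI ranges configured in each Slurm namespace
for ns in user vcluster-devten01a-slurm vcluster-devten02a-slurm; do
    echo "== ${ns} =="
    kubectl get configmap -n "${ns}" -o yaml | grep -o 'SwitchParameters=vnis=[0-9-]*' | sort -u
done
```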
#### SlurmCluster Partitions and Nodes

The general advice for tailoring the compute node configuration for each tenant is to look at the `slurm.conf` for the primary (`user` namespace) Slurm instance. Borrow the `NodeSet`, `PartitionName`, and `NodeName` directives that apply to each tenant. In this example, we have 'moved' two Compute nodes to the `slurm.conf` in namespace `vcluster-devten01a-slurm`.

```
# PARTITIONS
NodeSet=Compute Feature=Compute
PartitionName=workq Nodes=Compute MaxTime=INFINITE State=UP OverSubscribe=EXCLUSIVE
# BEGIN COMPUTE NODES
NodeName=nid[001002-001003] Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=456704 Feature=Compute
```
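After the tenant's `slurmctld` picks up the new configuration (the reconfigure step appears later in this guide), the moved nodes should be visible in the tenant's partition. A quick check, sketched with the same pod lookup used elsewhere in this document:

```bash
# Confirm that the moved Compute nodes appear in the tenant's Slurm instance
SLURMCTLD_POD=$(kubectl get pod -n vcluster-devten01a-slurm -l app.kubernetes.io/name=slurmctld -o name)
kubectl exec -n vcluster-devten01a-slurm ${SLURMCTLD_POD} -c slurmctld -- sinfo -N -l
```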
#### SlurmCluster Secrets (for non-root users)

These configuration settings are made in each SlurmCluster's `/etc/sssd/sssd.conf` file (in each `slurmctld` pod). You should not need to create or edit the `sssd.conf` file. Simply clone that file from the primary SlurmCluster (`user` namespace) to each tenant namespace.
These changes are made to the `uss-config-management` git repo. Tenants are disambiguated by their HSM group name (for example, `hsmgrouplabel` `"devten01a"` in `devten01a.yaml`, and `hsmGroup` `"devten01a"` in `devcls01a.yaml`).
All tenants can be booted and configured with a single CFS configuration that contains the appropriate git commit ID in the USS layers.
Example: `group_vars/devten01a/slurm.yml`

```yaml
munge_vault_path: secret/slurm/vcluster-devten01a-slurm/devcls01a/munge
slurm_conf_url: https://rgw-vip.local/wlm/vcluster-devten01a-slurm/devcls01a/
slurmd_options: "--conf-server 10.156.124.104,10.156.124.105"
```
Example: `group_vars/devten02a/slurm.yml`

```yaml
munge_vault_path: secret/slurm/vcluster-devten02a-slurm/devcls02a/munge
slurm_conf_url: https://rgw-vip.local/wlm/vcluster-devten02a-slurm/devcls02a/
slurmd_options: "--conf-server 10.156.124.108,10.156.124.109"
```
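The `--conf-server` addresses in `slurmd_options` must point at the tenant's `slurmctld` and its backup. A simple cross-check, sketched against the file names used in these examples, is to compare them with the `ip:` and `backupIP:` values in the corresponding SlurmCluster file:

```bash
# ip/backupIP values in the SlurmCluster file (the slurmctld pair is the one that matters)
grep -E '^[[:space:]]*(ip|backupIP):' devcls01a.yaml
# The --conf-server list for the tenant should match the slurmctld ip/backupIP pair
grep slurmd_options group_vars/devten01a/slurm.yml
```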
After initial creation, the SlurmCluster resource may be updated with new settings. This is useful to correct errors with the initial deployment, or to update to new Slurm versions.
(`ncn-mw#`) Edit the SlurmCluster file (for example, `devcls01a.yaml`).

(`ncn-mw#`) Apply the changes:

```bash
kubectl apply -f devcls01a.yaml
```

The Slurm operator will update the relevant Kubernetes resources to reflect the new configuration.
For example, if a new version of Slurm is installed on the system, the tenant can update to the new Slurm version by setting new container versions in the SlurmCluster file and applying the changes.
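After applying an updated file, the rollout can be followed until the new pods are ready. A brief sketch, using the deployment names from the examples in this guide:

```bash
# Watch the slurmctld and slurmdbd deployments roll out the new configuration
kubectl rollout status -n vcluster-devten01a-slurm deployment/devcls01a-slurmctld
kubectl rollout status -n vcluster-devten01a-slurm deployment/devcls01a-slurmdbd
```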
Legend for these examples:

- `devten01a.yaml` - configuration file for tenant `devten01a`
- `devcls01a.yaml` - configuration file for SlurmCluster `devcls01a`
Filename: `devten01a.yaml`

(`ncn-mw#`) Make sure the tenant name is not already in use:

```bash
kubectl get tenant -n tenants -o yaml vcluster-devten01a
```

(`ncn-mw#`) Make sure the HSM group is not already in use:

```bash
cray hsm groups describe devten01a
```

(`ncn-mw#`) Create your tenant `.yaml` file, and apply it:

```bash
vi devten01a.yaml
kubectl apply -n tenants -f devten01a.yaml
```

(`ncn-mw#`) Wait for the 'Deploying' state to become 'Deployed':

```bash
kubectl get tenant -n tenants -o yaml vcluster-devten01a
```

(`ncn-mw#`) Confirm the HSM group:

```bash
cray hsm groups describe devten01a
```
Repeat this step as needed for additional tenants.
#### SlurmCluster configuration file

Filename: `devcls01a.yaml`

(`ncn-mw#`) Make sure the cluster name is not already in use:

```bash
kubectl get pods -A | grep devcls01a
```

(`ncn-mw#`) Create your cluster `.yaml` file, and apply it:

```bash
vi devcls01a.yaml
kubectl apply -f devcls01a.yaml
```

(`ncn-mw#`) Wait for the pods to initialize:

```bash
kubectl get pods -A | grep vcluster-devten01a-slurm
```

Repeat this step as needed for additional SlurmClusters.
SlurmCluster: `devcls01a`

Filename: `/etc/slurm/slurm.conf`

(`ncn-mw#`) Get the running configuration:

```bash
kubectl get configmap -n vcluster-devten01a-slurm devcls01a-slurm-conf -o yaml > devcls01a-slurm-conf.yaml
```

(`ncn-mw#`) Extract the `slurm.conf`:

```bash
yq r devcls01a-slurm-conf.yaml 'data."slurm.conf"' > slurm.conf
```

(`ncn-mw#`) Edit the `slurm.conf`:

```bash
vi slurm.conf
```

(`ncn-mw#`) Update the configuration:

```bash
yq w -i devcls01a-slurm-conf.yaml 'data."slurm.conf"' "$(cat slurm.conf)"
```

(`ncn-mw#`) Apply the configuration:

```bash
kubectl apply -f devcls01a-slurm-conf.yaml
```

(`ncn-mw#`) Look up the pod for the tenant SlurmCluster:

```bash
SLURMCTLD_POD=$(kubectl get pod -n vcluster-devten01a-slurm -lapp.kubernetes.io/name=slurmctld -o name)
```

(`ncn-mw#`) Reconfigure:

```bash
kubectl exec -n vcluster-devten01a-slurm ${SLURMCTLD_POD} -c slurmctld -- scontrol reconfigure
```

Repeat this step as needed for additional SlurmClusters.
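The `yq r`/`yq w` commands above assume yq version 3 syntax. If only `kubectl` is available, an equivalent round trip is possible; this is a sketch using the same ConfigMap and file names, and it is only appropriate when `slurm.conf` is the ConfigMap's only key, because the ConfigMap is rebuilt from that single file:

```bash
# Extract slurm.conf directly from the ConfigMap
kubectl get configmap -n vcluster-devten01a-slurm devcls01a-slurm-conf \
    -o jsonpath='{.data.slurm\.conf}' > slurm.conf
vi slurm.conf
# Rebuild the ConfigMap from the edited file and apply it
kubectl create configmap devcls01a-slurm-conf --from-file=slurm.conf \
    -n vcluster-devten01a-slurm --dry-run=client -o yaml | kubectl apply -f -
```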
`sssd.conf` for SlurmCluster: `devcls01a`

Filename: `/etc/sssd/sssd.conf`

(`ncn-mw#`) Get the `user` namespace `sssd.conf` so it can be cloned:

```bash
kubectl get configmap -n user sssd-conf -o jsonpath='{.data.sssd\.conf}' > sssd.conf
```

(`ncn-mw#`) Delete an existing stub file in the tenant, if present:

```bash
kubectl delete secret -n vcluster-devten01a-slurm devcls01a-sssd-conf
```

(`ncn-mw#`) Clone the `user` namespace file into the tenant:

```bash
kubectl create secret generic -n vcluster-devten01a-slurm devcls01a-sssd-conf --from-file sssd.conf
```

(`ncn-mw#`) Restart the tenant's `slurmctld` pods:

```bash
kubectl rollout restart deployment -n vcluster-devten01a-slurm devcls01a-slurmctld devcls01a-slurmctld-backup
```

(`ncn-mw#`) Restart the tenant's `slurmdbd` pods:

```bash
kubectl rollout restart deployment -n vcluster-devten01a-slurm devcls01a-slurmdbd devcls01a-slurmdbd-backup
```

(`ncn-mw#`) Check that all restarted pods are in the Running state:

```bash
kubectl get pods -A | egrep 'slurmctld|slurmdbd'
```

Repeat this step as needed for additional SlurmClusters.
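To confirm that the cloned secret matches the source file, the secret contents can be decoded and compared; a sketch (`sssd.conf` here is the file extracted from the `user` namespace in the first step):

```bash
# No output from diff means the tenant secret matches the user namespace sssd.conf
kubectl get secret -n vcluster-devten01a-slurm devcls01a-sssd-conf \
    -o jsonpath='{.data.sssd\.conf}' | base64 -d | diff - sssd.conf
```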
Filename: `group_vars/devten01a/slurm.yml`

(`ncn-mw#`) Clone the USS repository:

```bash
git clone https://api-gw-service-nmn.local/vcs/cray/uss-config-management.git
```

(`ncn-mw#`) Go to the repo:

```bash
cd uss-config-management
```

(`ncn-mw#`) Check out the integration branch (1.1.0 shown here):

```bash
git checkout integration-1.1.0
```

(`ncn-mw#`) Create a subdirectory for the tenant:

```bash
mkdir group_vars/devten01a
```

(`ncn-mw#`) Edit the file `group_vars/devten01a/slurm.yml`.

(`ncn-mw#`) Add the new file:

```bash
git add group_vars/devten01a/slurm.yml
```

(`ncn-mw#`) Commit the new file:

```bash
git commit -am "descriptive comment"
```

(`ncn-mw#`) Push to the integration branch (1.1.0 shown here):

```bash
git push origin integration-1.1.0
```

(`ncn-mw#`) Remember the first commit ID in the output:

```bash
git log -a | cat
```
Repeat this step as needed for additional tenants.
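The commit ID can also be captured directly into a shell variable for use in the CFS configuration step later in this guide (a sketch):

```bash
# HEAD is the commit that was just pushed; save it for the USS layer of the CFS configuration
COMMIT_ID=$(git rev-parse HEAD)
echo "${COMMIT_ID}"
```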
Note that you will need one template for each node type (UAN, Compute) and architecture (X86, ARM) in the tenants. You can use a single BOS session template for many tenants of the same node type and architecture.
(`ncn-mw#`) Look up the name of the default template(s) for tenants (for example, X86 Compute) and save as JSON file(s):

```bash
cray bos sessiontemplates describe --format json ssi-compute-cr_2024.x86_64-cr_2024_1 > ssi-compute-cr_2024.x86_64-cr_2024_1.json
```

(`ncn-mw#`) Make a copy of the default template:

```bash
cp ssi-compute-cr_2024.x86_64-cr_2024_1.json ssi-compute-cr_2024.x86_64-cr_2024_1-tenants.json
```

(`ncn-mw#`) Edit the copy:

- Remove the lines for `enable_cfs:`, `name:`, and `tenant:`, and remove the comma from the preceding line.
- The new template name uses a `-tenants` suffix, as seen in the next section.

(`ncn-mw#`) Upload the new template, specifying the filename and the name of the new template:

```bash
cray bos sessiontemplates create --format json --file ssi-compute-cr_2024.x86_64-cr_2024_1-tenants.json ssi-compute-cr_2024.x86_64-cr_2024_1-tenants
```
Repeat this step as needed for different node types and architectures.
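As an alternative to hand-editing the JSON, the same field removal can be scripted. This is a sketch only; it assumes `enable_cfs`, `name`, and `tenant` are top-level keys of the session template JSON, as in the manual edit described above:

```bash
# Strip the fields that must not be present in the copied template
jq 'del(.enable_cfs, .name, .tenant)' ssi-compute-cr_2024.x86_64-cr_2024_1.json \
    > ssi-compute-cr_2024.x86_64-cr_2024_1-tenants.json
```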
Note that you will need one configuration for each node type (UAN, Compute) in the tenant. You can use a single CFS configuration for many tenants of the same node type and architecture.
(`ncn-mw#`) Save the default configuration as a JSON file:

```bash
cray cfs configurations describe --format json ssi-compute-cr_2024-cr_2024_1 > ssi-compute-cr_2024-cr_2024_1.json
```

(`ncn-mw#`) Make a copy of the default JSON file:

```bash
cp ssi-compute-cr_2024-cr_2024_1.json ssi-compute-cr_2024-cr_2024_1-tenants.json
```

(`ncn-mw#`) Edit the copy:

- Remove the `lastUpdated:` line.
- Update the commit ID for the `uss-config-management.git` layer; you will use the commit ID from the `git log` command in the earlier step that created the USS `group_vars` file.

(`ncn-mw#`) Upload the new configuration, specifying the filename and the name of the new configuration:

```bash
cray cfs configurations update --file ssi-compute-cr_2024-cr_2024_1-tenants.json ssi-compute-cr_2024-cr_2024_1-tenants
```
Repeat this step as needed for different node types and architectures.
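To locate the USS layer whose commit ID needs to be replaced in the copied JSON, a simple search is usually enough (a sketch; adjust the amount of context to show the layer's commit field):

```bash
# Show the uss-config-management layer and the surrounding lines containing its commit ID
grep -n -B2 -A4 'uss-config-management' ssi-compute-cr_2024-cr_2024_1-tenants.json
```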
(`ncn-mw#`) Boot the Compute nodes for a node architecture in the tenant:

```bash
cray bos sessions create --template-name ssi-compute-cr_2024.x86_64-cr_2024_1-tenants --operation boot --limit x9000c1s0b1n0,x9000c1s0b1n1
```
After CFS completes, log in to either a tenant UAN (if available) or a tenant Compute node.
(`ncn-mw#`) See what nodes are available:

```bash
sinfo
```

(`ncn-mw#`) Launch a command or application:

```bash
srun -N2 uname -rin
srun -N2 ./all2all
```
(`ncn-mw#`) View a specific tenant, brief:

```bash
kubectl get tenant -n tenants -o yaml vcluster-devten01a
```

(`ncn-mw#`) View a specific tenant, verbose:

```bash
kubectl describe tenant -n tenants vcluster-devten01a
```

(`ncn-mw#`) View the logs for all tenants:

```bash
TAPMS_POD=$( kubectl get pods -n tapms-operator --no-headers | awk '{print $1}' );
kubectl logs --timestamps -n tapms-operator $TAPMS_POD
```
#### SlurmCluster command examples

(`ncn-mw#`) View the pods for all clusters:

```bash
kubectl get pods -A | grep vcluster
```

(`ncn-mw#`) View the pods for a specific cluster:

```bash
kubectl get pods -A | grep vcluster-devten01a-slurm
```

(`ncn-mw#`) View the logs for a specific cluster:

```bash
NAMESPACE=vcluster-devten01a-slurm;
SLURMCTLD_POD=$( kubectl get pods -n $NAMESPACE | grep slurmctld | grep -v backup | awk '{print $1}' );
kubectl logs --timestamps -n $NAMESPACE $SLURMCTLD_POD -c slurmctld
```
(`ncn-mw#`) All HSM groups, including all tenants:

```bash
cray hsm groups list --format yaml
```

(`ncn-mw#`) Specific tenant:

```bash
cray hsm groups describe --format yaml devten01a
```

(`ncn-mw#`) All tenants:

```bash
kubectl hns tree tenants
```
This procedure is required for each tenant, after Slurm has been upgraded on the system (for example, after using IUF to upgrade products).
You will need the configuration file that you used to create each tenant's SlurmCluster. For this Slurm upgrade, there is no need to change the SlurmCluster name; the only change is to the Slurm version inside each tenant.
SlurmCluster: `devcls01a`

Filename: `devcls01a.yaml`

(`ncn-mw#`) Edit the SlurmCluster configuration file:

```bash
cp -p devcls01a.yaml{,.bak}
vi devcls01a.yaml
```

(`ncn-mw#`) Double-check the differences:

```bash
diff devcls01a.yaml devcls01a.yaml.bak
```

Possible output:

```
10c10
< image: cray/cray-slurmctld:1.7.0-slurm
---
> image: cray/cray-slurmctld:1.6.1
21c21
< image: cray/cray-slurmdbd:1.7.0-slurm
---
> image: cray/cray-slurmdbd:1.6.1
```

(`ncn-mw#`) Re-apply the SlurmCluster configuration file:

```bash
kubectl apply -f devcls01a.yaml
```

(`ncn-mw#`) Wait for all pods to return to the Running state:

```bash
kubectl get pods -A | grep vcluster
```

Repeat this step as needed for additional SlurmClusters.
#### Development tenant

This is filename `devten01a.yaml`; the complete file is shown.
```yaml
apiVersion: tapms.hpe.com/v1alpha3
kind: Tenant
metadata:
  name: vcluster-devten01a
spec:
  childnamespaces:
    - slurm
    - user
  tenantname: vcluster-devten01a
  tenanthooks: []
  tenantresources:
    - enforceexclusivehsmgroups: true
      hsmgrouplabel: devten01a
      type: compute
      xnames:
        - x9000c1s0b1n0
        - x9000c1s0b1n1
    - enforceexclusivehsmgroups: true
      hsmgrouplabel: devten01a
      type: application
      xnames:
        - x3000c0s29b0n0
```
#### Development SlurmCluster

IMPORTANT: The values for `cpu`, `memory`, and `initialDelaySeconds` are recommended by the WLM team.

This is filename `devcls01a.yaml`; the complete file is shown.
```yaml
apiVersion: "wlm.hpe.com/v1alpha1"
kind: SlurmCluster
metadata:
  name: devcls01a
  namespace: vcluster-devten01a-slurm
spec:
  tapmsTenantName: vcluster-devten01a
  tapmsTenantVersion: v1alpha3
  slurmctld:
    image: cray/cray-slurmctld:1.6.1
    ip: 10.150.124.100
    host: devcls01a-slurmctld
    backupIP: 10.150.124.101
    backupHost: devcls01a-slurmctld-backup
    livenessProbe:
      enabled: true
      initialDelaySeconds: 120
      periodSeconds: 60
      timeoutSeconds: 60
  slurmdbd:
    image: cray/cray-slurmdbd:1.6.1
    ip: 10.150.124.102
    host: devcls01a-slurmdbd
    backupIP: 10.150.124.103
    backupHost: devcls01a-slurmdbd-backup
    livenessProbe:
      enabled: true
      initialDelaySeconds: 43200
      periodSeconds: 30
      timeoutSeconds: 5
  munge:
    image: cray/munge-munge:1.5.0
  sssd:
    image: cray/cray-sssd:1.4.0
  config:
    image: cray/cray-slurm-config:1.3.0
  hsmGroup: devten01a
  pxc:
    enabled: true
    image:
      repository: cray/cray-pxc
      tag: 1.3.0
    initImage:
      repository: cray/cray-pxc-operator
      tag: 1.3.0
    configuration: |
      [mysqld]
      innodb_log_file_size=4G
      innodb_lock_wait_timeout=900
      wsrep_trx_fragment_size=1G
      wsrep_trx_fragment_unit=bytes
      log_error_suppression_list=MY-013360
    data:
      storageClassName: k8s-block-replicated
      accessModes:
        - ReadWriteOnce
      storage: 1Ti
    livenessProbe:
      initialDelaySeconds: 300
      periodSeconds: 10
      timeoutSeconds: 5
    resources:
      requests:
        cpu: "1"
        memory: 4Gi
      limits:
        cpu: "8"
        memory: 32Gi
    backup:
      image:
        repository: cray/cray-pxc-backup
        tag: 1.3.0
      data:
        storageClassName: k8s-block-replicated
        accessModes:
          - ReadWriteOnce
        storage: 512Gi
      # Backup daily at 9:10 PM (does not conflict with other CSM DB backups)
      schedule: "10 21 * * *"
      keep: 3
      resources:
        requests:
          cpu: "1"
          memory: 4Gi
        limits:
          cpu: "8"
          memory: 16Gi
    haproxy:
      image:
        repository: cray/cray-pxc-haproxy
        tag: 1.3.0
      resources:
        requests:
          cpu: "1"
          memory: 128Mi
        limits:
          cpu: "16"
          memory: 512Mi
```
First, you are responsible for divvying up the HPE Slingshot VNI space among the primary SlurmCluster (`user` namespace) and any tenant SlurmClusters. Start with the primary SlurmCluster, and then configure each tenant. Here is an example for the primary and one tenant:
This is filename `/etc/slurm/slurm.conf` for the `user` namespace; a partial file is shown.

```
...
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-32767
...
```
This is filename `/etc/slurm/slurm.conf` for the `vcluster-devten01a-slurm` namespace; a partial file is shown.

```
...
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=32768-65535
...
```
Second, insert the `NodeSet`, `PartitionName`, and `NodeName` directives that apply to your tenant. In this example on `Development`, we have two X86 Compute nodes (1002 and 1003) and one X86 UAN (`uan02`).
This is filename `/etc/slurm/slurm.conf` for the `vcluster-devten01a-slurm` namespace; a partial file is shown.

```
...
# PARTITIONS
NodeSet=Compute Feature=Compute
PartitionName=workq Nodes=Compute MaxTime=INFINITE State=UP OverSubscribe=EXCLUSIVE
# BEGIN COMPUTE NODES
NodeName=nid[001002-001003] Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=456704 Feature=Compute
# END COMPUTE NODES
NodeName=uan02 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=227328 Feature=Application_UAN
...
```
This is file `group_vars/devten01a/slurm.yml`; the complete file is shown.

```yaml
munge_vault_path: secret/slurm/vcluster-devten01a-slurm/devcls01a/munge
slurm_conf_url: https://rgw-vip.local/wlm/vcluster-devten01a-slurm/devcls01a/
slurmd_options: "--conf-server 10.156.124.104,10.156.124.105"
```