A tenant is a collection of nodes dedicated to one particular set of users on an HPE Cray EX system running CSM. This guide provides a comprehensive set of instructions for a system administrator to configure, deploy, and run applications on one or two tenants.
In this document, we provide examples for a hypothetical system called Development, which has two tenants, each with its own SlurmCluster.
Note that this document reflects the current state of the Multi-Tenancy feature. For example, VNI blocks must be manually configured today, but they will be automatically configured in a future release.
Here are the steps required:
- Create and configure the tenants
- Create and configure the SlurmClusters
- Customize each SlurmCluster's slurm.conf and sssd.conf
- Allocate non-overlapping blocks of Slingshot VNIs to each tenant SlurmCluster and to the primary (user namespace) slurm.conf
- Add the tenant-specific settings to the USS configuration (group_vars)
- Create BOS session templates and CFS configurations for the tenants
- Boot the tenant nodes and run applications

This section provides additional information on the configuration necessary to fully set up a tenant.
Tenants are created and configured by creating a tenant custom resource in the form of a YAML file.
For more information see Creating a Tenant.
For the purposes of this guide, the tenant configuration settings are made in each tenant’s configuration file, e.g. devten01a.yaml.
Choose your naming convention for each system and tenant
Example:
- The system is named Development
- The tenants are named devten01a and devten02a
- Additional tenants would be named 01b, 01c, etc.

Example of these settings in configuration file devten01a.yaml:
apiVersion: tapms.hpe.com/v1alpha3
kind: Tenant
metadata:
  name: devten01a
  namespace: tenants
spec:
  childnamespaces:
    - slurm
    - user
  tenantname: vcluster-devten01a
  tenantkms:
    enablekms: true
  tenanthooks: []
  tenantresources:
    - enforceexclusivehsmgroups: true
      hsmgrouplabel: devten01a
      type: compute
      xnames:
        - x1000c0s0b0n0
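To check which Tenant API versions the installed tapms operator defines, something like the following can be used (a sketch; this assumes the CRD follows the standard plural.group naming, tenants.tapms.hpe.com):

# List the API versions defined by the Tenant CRD
kubectl get crd tenants.tapms.hpe.com -o jsonpath='{range .spec.versions[*]}{.name}{"\n"}{end}'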
The apiVersion should use the latest available in the CSM release, e.g. v1alpha3 for CSM 1.7.

SlurmCluster configuration

These configuration settings are made:
- in each SlurmCluster's configuration file, e.g. devcls01a.yaml
- in each SlurmCluster's /etc/slurm/slurm.conf file (in each slurmctld pod)

SlurmCluster names

Choose your naming convention for each system and SlurmCluster.
Example:
- The system is named Development
- The SlurmClusters are named devcls01a and devcls02a
- Additional SlurmClusters would be named 01b, 01c, etc.

Name length limitation:
- The SlurmCluster name is used to build derived names such as devcls01a-slurmdb (16 characters), so keep it short enough for those derived names to remain valid.

Example of settings in configuration file devcls01a.yaml:
namespace: vcluster-devten01a-slurm
tapmsTenantName: vcluster-devten01a
hsmGroup: devten01a
SlurmCluster IP addresses

IMPORTANT: Each High-Speed Network (HSN) IP address must be unique across all the SlurmClusters on any one system.
Choose a block of HSN IP addresses for each SlurmCluster. Example:
- The Development system has base HSN IP address 10.156.0.0
- The primary SlurmCluster (user namespace) uses 10.156.12.100, .101, .102, .103
- The first tenant SlurmCluster (vcluster-devten01a-slurm namespace) will use 10.156.12.104, .105, .106, .107
- The second tenant SlurmCluster (vcluster-devten02a-slurm namespace) will use 10.156.12.108, .109, .110, .111

SlurmCluster API version

The SlurmCluster apiVersion must match the Slurm release (for example v1alpha1).

SlurmCluster configurable values

Recommended values for cpu, memory, and initialDelaySeconds are shown in the example file devcls01a.yaml, below.

SlurmCluster version numbers

Version numbers are shown in the example file devcls01a.yaml, below.
The version numbers must match the versions of these products on the system.
Example:
- cray/cray-slurmctld:1.6.1
- cray/cray-slurmdbd:1.6.1
- cray/munge-munge:1.5.0
- cray/cray-sssd:1.4.0
- cray/cray-slurm-config:1.3.0
SlurmCluster Slurm configuration

These configuration settings are made in each SlurmCluster's /etc/slurm/slurm.conf file (in each slurmctld pod).
SlurmCluster Slingshot VNI allocation

IMPORTANT: Each block of HPE Slingshot VNIs on the High-Speed Network (HSN) must not overlap with other blocks on the same system.
These settings are made in the /etc/slurm/slurm.conf file in each tenant's SlurmCluster and in the primary (user namespace) SlurmCluster.

Default, with no tenants, user namespace:
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-65535

With one tenant, user namespace:
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-32767

With one tenant, vcluster-devten01a-slurm namespace:
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=32768-65535

With two tenants, user namespace:
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-32767

With two tenants, vcluster-devten01a-slurm namespace:
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=32768-57353

With two tenants, vcluster-devten02a-slurm namespace:
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=57354-65535

SlurmCluster partitions and nodes

The general advice for tailoring the compute node configuration for each tenant is to start from the slurm.conf of the primary (user namespace) Slurm instance.
Borrow the NodeSet, PartitionName, and NodeName directives that apply to each tenant.
In this example, we have ‘moved’ two Compute nodes to the slurm.conf in namespace vcluster-devten01a-slurm.
# PARTITIONS
NodeSet=Compute Feature=Compute
PartitionName=workq Nodes=Compute MaxTime=INFINITE State=UP OverSubscribe=EXCLUSIVE
# BEGIN COMPUTE NODES
NodeName=nid[001002-001003] Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=456704 Feature=Compute
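Because the nodes are 'moved' rather than shared, the matching NodeName entry would normally be removed from (or commented out in) the primary slurm.conf in the user namespace. A hypothetical sketch of that side of the change:

# Primary (user namespace) slurm.conf: these nodes are now served by vcluster-devten01a-slurm
#NodeName=nid[001002-001003] Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=456704 Feature=Compute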
SlurmCluster secrets (for non-root users)

These configuration settings are made in each SlurmCluster's /etc/sssd/sssd.conf file (in each slurmctld pod).
You should not need to create or edit the sssd.conf file.
Simply clone that file from the primary SlurmCluster (user namespace) to each tenant namespace.
The tenant-specific USS settings (group_vars) are made in the uss-config-management git repository.
Tenants are disambiguated by their HSM group name (for example hsmgrouplabel "devten01a" in devten01a.yaml, and hsmGroup "devten01a" in devcls01a.yaml).
All tenants can be booted and configured with a single CFS configuration that contains the appropriate git commit ID in the USS layers.
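Because the HSM group label is what ties nodes to a tenant's group_vars, it can be worth confirming the label before creating the files below. A sketch (assumes jq is installed on the management node):

# The label must match the group_vars/devten01a/ directory name
cray hsm groups describe devten01a --format json | jq -r '.label'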
Example: group_vars/devten01a/slurm.yml
munge_vault_path: secret/slurm/vcluster-devten01a-slurm/devcls01a/munge
slurm_conf_url: https://rgw-vip.local/wlm/vcluster-devten01a-slurm/devcls01a/
slurmd_options: "--conf-server 10.156.124.104,10.156.124.105"
Example: group_vars/devten02a/slurm.yml
munge_vault_path: secret/slurm/vcluster-devten02a-slurm/devcls02a/munge
slurm_conf_url: https://rgw-vip.local/wlm/vcluster-devten02a-slurm/devcls02a/
slurmd_options: "--conf-server 10.156.124.108,10.156.124.109"
After initial creation, the SlurmCluster resource may be updated with new
settings. This is useful to correct errors with the initial deployment, or
to update to new Slurm versions.
(ncn-mw#) Edit the SlurmCluster file (For example, devcls01a.yaml).
(ncn-mw#) Apply the changes:
kubectl apply -f devcls01a.yaml
The Slurm operator will update the relevant Kubernetes resources to reflect the new configuration.
For example, if a new version of Slurm is installed on the system, the tenant
can update to the new Slurm version by setting new container versions in the
SlurmCluster file and applying the changes.
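To see which container images a tenant's Slurm pods are currently running before and after such an update, something like the following can be used (a sketch; the namespace follows the vcluster-<tenant>-slurm pattern used throughout this guide):

# Print each pod name and its container image(s)
kubectl get pods -n vcluster-devten01a-slurm \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'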
Legend for these examples:
- devten01a.yaml - configuration file for tenant devten01a
- devcls01a.yaml - configuration file for SlurmCluster devcls01a

Tenant configuration file

Filename: devten01a.yaml
(ncn-mw#) Make sure tenant name is not already in use:
kubectl get tenant -n tenants -o yaml vcluster-devten01a
(ncn-mw#) Make sure HSM group is not already in use:
cray hsm groups describe devten01a
(ncn-mw#) Create your tenant.yaml file, and apply it:
vi devten01a.yaml
kubectl apply -n tenants -f devten01a.yaml
(ncn-mw#) Wait for ‘Deploying’ state to become ‘Deployed’:
kubectl get tenant -n tenants -o yaml vcluster-devten01a
(ncn-mw#) Confirm HSM group:
cray hsm groups describe devten01a
Repeat this step as needed for additional tenants.
SlurmCluster configuration file

Filename: devcls01a.yaml
(ncn-mw#) Make sure cluster name is not already in use:
kubectl get pods -A | grep devcls01a
(ncn-mw#) Create your cluster.yaml file, and apply it:
vi devcls01a.yaml
kubectl apply -f devcls01a.yaml
(ncn-mw#) Wait for pods to initialize:
kubectl get pods -A | grep vcluster-devten01a-slurm
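Alternatively, kubectl can block until the pods report Ready (a sketch; adjust the timeout as needed):

kubectl wait --for=condition=Ready pod --all -n vcluster-devten01a-slurm --timeout=10m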
Repeat this step as needed for additional SlurmClusters.
SlurmCluster: devcls01a
Filename: /etc/slurm/slurm.conf
(ncn-mw#) Get the running configuration:
kubectl get configmap -n vcluster-devten01a-slurm devcls01a-slurm-conf -o yaml > devcls01a-slurm-conf.yaml
(ncn-mw#) Extract the slurm.conf:
yq r devcls01a-slurm-conf.yaml 'data."slurm.conf"' > slurm.conf
(ncn-mw#) Edit the slurm.conf:
vi slurm.conf
(ncn-mw#) Update the configuration:
yq w -i devcls01a-slurm-conf.yaml 'data."slurm.conf"' "$(cat slurm.conf)"
(ncn-mw#) Apply the configuration:
kubectl apply -f devcls01a-slurm-conf.yaml
(ncn-mw#) Look up the pod for the tenant slurmcluster:
SLURMCTLD_POD=$(kubectl get pod -n vcluster-devten01a-slurm -lapp.kubernetes.io/name=slurmctld -o name)
(ncn-mw#) Reconfigure:
kubectl exec -n vcluster-devten01a-slurm ${SLURMCTLD_POD} -c slurmctld -- scontrol reconfigure
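To confirm that the running slurmctld picked up the change, the live configuration can be inspected (a sketch that greps for the VNI settings edited in this guide):

kubectl exec -n vcluster-devten01a-slurm ${SLURMCTLD_POD} -c slurmctld -- scontrol show config | egrep -i 'SwitchType|SwitchParameters'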
Repeat this step as needed for additional SlurmClusters.
sssd.conf

SlurmCluster: devcls01a
Filename: /etc/sssd/sssd.conf
(ncn-mw#) Get the user namespace sssd.conf so it can be cloned:
kubectl get configmap -n user sssd-conf -o jsonpath='{.data.sssd\.conf}' > sssd.conf
(ncn-mw#) Delete an existing stub file in the tenant, if present:
kubectl delete secret -n vcluster-devten01a-slurm devcls01a-sssd-conf
(ncn-mw#) Clone the user namespace file into the tenant:
kubectl create secret generic -n vcluster-devten01a-slurm devcls01a-sssd-conf --from-file sssd.conf
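Optionally verify the clone by decoding the secret that was just created (a sketch):

kubectl get secret -n vcluster-devten01a-slurm devcls01a-sssd-conf -o jsonpath='{.data.sssd\.conf}' | base64 -d | head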
(ncn-mw#) Restart the tenant’s slurmctld pods:
kubectl rollout restart deployment -n vcluster-devten01a-slurm devcls01a-slurmctld devcls01a-slurmctld-backup
(ncn-mw#) Restart the tenant’s slurmdbd pods:
kubectl rollout restart deployment -n vcluster-devten01a-slurm devcls01a-slurmdbd devcls01a-slurmdbd-backup
(ncn-mw#) Check for all restarted pods to be in Running state:
kubectl get pods -A | egrep 'slurmctld|slurmdbd'
Repeat this step as needed for additional SlurmClusters.
Filename: group_vars/devten01a/slurm.yml
(ncn-mw#) Clone the USS repository:
git clone https://api-gw-service-nmn.local/vcs/cray/uss-config-management.git
(ncn-mw#) Go to repo:
cd uss-config-management
(ncn-mw#) Check out integration branch (1.1.0 shown here):
git checkout integration-1.1.0
(ncn-mw#) Create subdirectory for tenant:
mkdir group_vars/devten01a
(ncn-mw#) Edit the file group_vars/devten01a/slurm.yml
(ncn-mw#) Add the new file:
git add group_vars/devten01a/slurm.yml
(ncn-mw#) Commit the new file:
git commit -am "descriptive comment"
(ncn-mw#) Push to integration branch (1.1.0 shown here):
git push origin integration-1.1.0
(ncn-mw#) Remember the first commit ID in the output:
git log | cat
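Equivalently, the most recent commit ID can be captured directly into a shell variable for later use in the CFS configuration (a sketch; USS_COMMIT is a hypothetical variable name):

USS_COMMIT=$(git rev-parse HEAD)
echo "${USS_COMMIT}"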
Repeat this step as needed for additional tenants.
Note that one template is needed for each node type (UAN, Compute) and architecture (X86, ARM) in the tenants. A single BOS session template may be used for many tenants of the same node type and architecture.
(ncn-mw#) Look up the name of the default template(s) for tenants (for example X86 Compute) and save as JSON files:
cray bos sessiontemplates describe --format json ssi-compute-cr_2024.x86_64-cr_2024_1 > ssi-compute-cr_2024.x86_64-cr_2024_1.json
(ncn-mw#) Make a copy of the default template:
cp ssi-compute-cr_2024.x86_64-cr_2024_1.json ssi-compute-cr_2024.x86_64-cr_2024_1-tenants.json
(ncn-mw#) Edit the copied template:
- Remove the lines for enable_cfs:, name:, and tenant:, and remove the comma from the preceding line
- Change the CFS configuration name to use the -tenants suffix, as seen in the next section
(A jq sketch of these edits appears after this step.)

(ncn-mw#) Upload the new template, specifying the filename and the name of the new template:
cray bos sessiontemplates create --format json --file ssi-compute-cr_2024.x86_64-cr_2024_1-tenants.json ssi-compute-cr_2024.x86_64-cr_2024_1-tenants
Repeat this step as needed for different node types and architectures.
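For reference, the edits described above can also be made with jq instead of a text editor. This is only a sketch: it assumes enable_cfs, name, and tenant are top-level fields in the template JSON and that the template references its CFS configuration under cfs.configuration (as in BOS v2):

# Remove tenant-specific fields and point the template at the -tenants CFS configuration
jq 'del(.enable_cfs, .name, .tenant)
  | .cfs.configuration = "ssi-compute-cr_2024-cr_2024_1-tenants"' \
  ssi-compute-cr_2024.x86_64-cr_2024_1-tenants.json > tmp.json \
  && mv tmp.json ssi-compute-cr_2024.x86_64-cr_2024_1-tenants.json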
Note that one configuration is needed for each node type (UAN, Compute) in the tenant. A single CFS configuration may be used for many tenants of the same node type.
(ncn-mw#) Save the default configuration as a JSON file:
cray cfs configurations describe --format json ssi-compute-cr_2024-cr_2024_1 > ssi-compute-cr_2024-cr_2024_1.json
(ncn-mw#) Make a copy of the default JSON file:
cp ssi-compute-cr_2024-cr_2024_1.json ssi-compute-cr_2024-cr_2024_1-tenants.json
(ncn-mw#) Edit the copied configuration:
- Remove the lastUpdated: line
- In the layer for uss-config-management.git, use the commit ID from the git log command in the earlier step that created the USS group_vars file
(A sketch of the edited USS layer appears after this step.)

(ncn-mw#) Upload the new configuration, specifying the filename and the name of the new configuration:
cray cfs configurations update --file ssi-compute-cr_2024-cr_2024_1-tenants.json ssi-compute-cr_2024-cr_2024_1-tenants
Repeat this step as needed for different node types.
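For reference, after editing, the USS layer inside the copied configuration might look roughly like the following. This is a hypothetical sketch: exact field names vary with the CFS API version (for example cloneUrl versus clone_url), and only the commit value is meant to change:

{
  "layers": [
    {
      "cloneUrl": "https://api-gw-service-nmn.local/vcs/cray/uss-config-management.git",
      "commit": "<commit ID from the git log step in the USS section>",
      "name": "<existing layer name, unchanged>",
      "playbook": "<existing playbook, unchanged>"
    }
  ]
}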
(ncn-mw#) Boot the Compute nodes of one architecture in the tenant:
cray bos sessions create --template-name ssi-compute-cr_2024.x86_64-cr_2024_1-tenants --operation boot --limit x9000c1s0b1n0,x9000c1s0b1n1
After CFS completes, log in to either a tenant UAN (if available) or a tenant compute node.
(uan#) See what nodes are available:
sinfo
(uan#) Launch a command or application:
srun -N2 uname -rin
srun -N2 ./all2all
Tenant command examples

(ncn-mw#) View a specific tenant, brief:
kubectl get tenant -n tenants -o yaml vcluster-devten01a
(ncn-mw#) View a specific tenant, verbose:
kubectl describe tenant -n tenants vcluster-devten01a
(ncn-mw#) View the logs for all tenants:
TAPMS_POD=$( kubectl get pods -n tapms-operator --no-headers | awk '{print $1}' );
kubectl logs --timestamps -n tapms-operator $TAPMS_POD
SlurmCluster command examples

(ncn-mw#) View the pods for all clusters:
kubectl get pods -A | grep vcluster
(ncn-mw#) View the pods for a specific cluster:
kubectl get pods -A | grep vcluster-devten01a-slurm
(ncn-mw#) View logs for a specific cluster:
NAMESPACE=vcluster-devten01a-slurm;
SLURMCTLD_POD=$( kubectl get pods -n $NAMESPACE |grep slurmctld |grep -v backup | awk '{print $1}' );
kubectl logs --timestamps -n $NAMESPACE $SLURMCTLD_POD -c slurmctld
(ncn-mw#) All HSM groups, including all tenants:
cray hsm groups list --format yaml
(ncn-mw#) Specific tenant:
cray hsm groups describe --format yaml devten01a
(ncn-mw#) All tenants:
kubectl hns tree tenants
This procedure is required for each tenant, after Slurm has been upgraded on the system (for example, after using IUF to upgrade products).
The configuration used to create each tenant’s SlurmCluster is required.
For this Slurm upgrade, there is no need to change the SlurmCluster name; the only change is to the Slurm version inside each tenant.
SlurmCluster: devcls01a
Filename: devcls01a.yaml
(ncn-mw#) Edit the SlurmCluster configuration file:
cp -p devcls01a.yaml{,.bak}
vi devcls01a.yaml
(ncn-mw#) Double-check the differences:
diff devcls01a.yaml devcls01a.yaml.bak
Possible output:
10c10
< image: cray/cray-slurmctld:1.7.0-slurm
---
> image: cray/cray-slurmctld:1.6.1
21c21
< image: cray/cray-slurmdbd:1.7.0-slurm
---
> image: cray/cray-slurmdbd:1.6.1
(ncn-mw#) Re-apply the SlurmCluster configuration file:
kubectl apply -f devcls01a.yaml
(ncn-mw#) Wait for all pods to return to Running state:
kubectl get pods -A |grep vcluster
Repeat this step as needed for additional SlurmClusters.
Development tenant

This is filename devten01a.yaml; complete file is shown.
apiVersion: tapms.hpe.com/v1alpha3
kind: Tenant
metadata:
  name: vcluster-devten01a
spec:
  childnamespaces:
    - slurm
    - user
  tenantname: vcluster-devten01a
  tenanthooks: []
  tenantresources:
    - enforceexclusivehsmgroups: true
      hsmgrouplabel: devten01a
      type: compute
      xnames:
        - x9000c1s0b1n0
        - x9000c1s0b1n1
    - enforceexclusivehsmgroups: true
      hsmgrouplabel: devten01a
      type: application
      xnames:
        - x3000c0s29b0n0
Development SlurmCluster

IMPORTANT: The values for cpu, memory, and initialDelaySeconds are recommended by the WLM team.
This is filename devcls01a.yaml; complete file is shown.
apiVersion: "wlm.hpe.com/v1alpha1"
kind: SlurmCluster
metadata:
name: devcls01a
namespace: vcluster-devcls01a-slurm
spec:
tapmsTenantName: vcluster-devcls01a
tapmsTenantVersion: v1alpha3
slurmctld:
image: cray/cray-slurmctld:1.6.1
ip: 10.150.124.100
host: devcls01a-slurmctld
backupIP: 10.150.124.101
backupHost: devcls01a-slurmctld-backup
livenessProbe:
enabled: true
initialDelaySeconds: 120
periodSeconds: 60
timeoutSeconds: 60
slurmdbd:
image: cray/cray-slurmdbd:1.6.1
ip: 10.150.124.102
host: devcls01a-slurmdbd
backupIP: 10.150.124.103
backupHost: devcls01a-slurmdbd-backup
livenessProbe:
enabled: true
initialDelaySeconds: 43200
periodSeconds: 30
timeoutSeconds: 5
munge:
image: cray/munge-munge:1.5.0
sssd:
image: cray/cray-sssd:1.4.0
config:
image: cray/cray-slurm-config:1.3.0
hsmGroup: devcls01a
pxc:
enabled: true
image:
repository: cray/cray-pxc
tag: 1.3.0
initImage:
repository: cray/cray-pxc-operator
tag: 1.3.0
configuration: |
[mysqld]
innodb_log_file_size=4G
innodb_lock_wait_timeout=900
wsrep_trx_fragment_size=1G
wsrep_trx_fragment_unit=bytes
log_error_suppression_list=MY-013360
data:
storageClassName: k8s-block-replicated
accessModes:
- ReadWriteOnce
storage: 1Ti
livenessProbe:
initialDelaySeconds: 300
periodSeconds: 10
timeoutSeconds: 5
resources:
requests:
cpu: "500m"
memory: 4Gi
limits:
cpu: "8"
memory: 32Gi
backup:
image:
repository: cray/cray-pxc-backup
tag: 1.3.0
data:
storageClassName: k8s-block-replicated
accessModes:
- ReadWriteOnce
storage: 512Gi
# Backup daily at 9:10PM (does not conflict with other CSM DB backups)
schedule: "10 21 * * *"
keep: 3
resources:
requests:
cpu: "500m"
memory: 4Gi
limits:
cpu: "8"
memory: 16Gi
haproxy:
image:
repository: cray/cray-pxc-haproxy
tag: 1.3.0
resources:
requests:
cpu: "500m"
memory: 128Mi
limits:
cpu: "16"
memory: 512Mi
Administrators are responsible for divvying up the HPE Slingshot VNI space among the primary SlurmCluster (user namespace) and any tenant SlurmClusters.
Start with the primary SlurmCluster, and then configure each tenant.
Here is an example for primary and one tenant:
This is filename /etc/slurm/slurm.conf for user namespace; partial file is shown.
...
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-32767
...
This is filename /etc/slurm/slurm.conf for vcluster-devten01a-slurm namespace; partial file is shown.
...
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=32768-65535
...
Second, insert the NodeSet, PartitionName, and NodeName directives that apply to your tenant.
In this example on Development, we have two X86 Compute nodes (1002 and 1003), and one X86 UAN (uan02).
This is filename /etc/slurm/slurm.conf for vcluster-devten01a-slurm namespace; partial file is shown.
...
# PARTITIONS
NodeSet=Compute Feature=Compute
PartitionName=workq Nodes=Compute MaxTime=INFINITE State=UP OverSubscribe=EXCLUSIVE
# BEGIN COMPUTE NODES
NodeName=nid[001002-001003] Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=456704 Feature=Compute
# END COMPUTE NODES
NodeName=uan02 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=227328 Feature=Application_UAN
...
This is file group_vars/devten01a/slurm.yml; complete file is shown.
munge_vault_path: secret/slurm/vcluster-devten01a-slurm/devcls01a/munge
slurm_conf_url: https://rgw-vip.local/wlm/vcluster-devten01a-slurm/devcls01a/
slurmd_options: "--conf-server 10.156.124.104,10.156.124.105"