A tenant is a collection of nodes dedicated to one particular set of users on an HPE Cray EX system running CSM. This guide provides a comprehensive set of instructions for a system administrator to configure, deploy, and run applications on one or two tenants.
In this document, we provide examples for a hypothetical system called `Development`, which has two tenants, each with its own SlurmCluster.
Note that this document reflects the current state of the Multi-Tenancy feature. For example, VNI blocks must be manually configured today, but they will be automatically configured in a future release.
Here are the steps required:

- Create and configure the tenants
- Create and configure the SlurmClusters
- Edit each SlurmCluster's `slurm.conf` and `sssd.conf`
- Allocate Slingshot VNIs
- Make the corresponding changes to the primary (`user` namespace) `slurm.conf`
This section provides additional information on the configuration necessary to fully set up a tenant.

Tenants are created and configured by creating a tenant custom resource definition in the form of a YAML file. For more information, see Create A Tenant.

For the purposes of this guide, the tenant configuration settings are made in each tenant's configuration file, e.g. `devten01a.yaml`.
Choose your naming convention for each system and tenant. Example:

- System: `Development`
- Tenants: `devten01a` and `devten02a`
- Additional tenants would be `01b`, `01c`, etc.

Example of these settings in configuration file `devten01a.yaml`:
```yaml
apiVersion: tapms.hpe.com/v1alpha3
kind: Tenant
metadata:
  name: devten01a
spec:
  childnamespaces:
    - slurm
    - user
  tenantname: vcluster-devten01a
  tenantkms:
    enablekms: true
  tenanthooks: []
  tenantresources:
    - enforceexclusivehsmgroups: true
      hsmgrouplabel: devten01a
      type: compute
      xnames:
        - x1000c0s0b0n0
```
`apiVersion` should use the latest available in the CSM release, e.g. `v1alpha3` for CSM 1.6.
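Before applying a new or edited tenant file, the manifest can be checked against the installed Tenant CRD without creating anything. This is a minimal sketch, not part of the formal procedure; it assumes the `tenants` namespace already exists on the system:

```bash
# Server-side dry run: validates devten01a.yaml against the tapms Tenant CRD
kubectl apply --dry-run=server -n tenants -f devten01a.yaml
```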
### SlurmCluster Configuration

These configuration settings are made:

- In each SlurmCluster's configuration file, e.g. `devcls01a.yaml`
- In each SlurmCluster's `/etc/slurm/slurm.conf` file (in each `slurmctld` pod)
#### SlurmCluster Names

Choose your naming convention for each system and SlurmCluster. Example:

- System: `Development`
- SlurmClusters: `devcls01a` and `devcls02a`
- Additional SlurmClusters would be `01b`, `01c`, etc.
- Name length limitation: keep in mind that suffixes are appended to the SlurmCluster name, e.g. `devcls01a-slurmdb` (16 characters)

Example of settings in configuration file `devcls01a.yaml`:
```yaml
namespace: vcluster-devten01a-slurm
tapmsTenantName: vcluster-devten01a
hsmGroup: devten01a
```
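The `namespace` value here is the tenant's `slurm` child namespace. To confirm the exact namespace names before filling in `devcls01a.yaml`, the tenant's namespaces can be listed (a sketch; the grep pattern is simply the tenant name from these examples):

```bash
# List the namespaces created for the tenant
kubectl get namespaces | grep vcluster-devten01a
```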
#### SlurmCluster IP Addresses

IMPORTANT: Each High-Speed Network (HSN) IP address must be unique within all the SlurmClusters on any one system.

Example for the SlurmClusters on `Development`:

- `Development` base HSN IP address: 10.156.0.0
- SlurmCluster (`user` namespace) uses 10.156.12.100, .101, .102, .103
- SlurmCluster (`vcluster-devten01-slurmdb` namespace) will use 10.156.12.104, .105, .106, .107
- SlurmCluster (`vcluster-devten02-slurmdb` namespace) will use 10.156.12.108, .109, .110, .111
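A quick way to catch accidental duplicates before applying the files is to compare the `ip:` and `backupIP:` values across all of the SlurmCluster configuration files. This is a sketch only; it assumes the files follow the `devcls*.yaml` naming used in these examples:

```bash
# Print any HSN IP address that is assigned more than once
grep -hE '^[[:space:]]*(ip|backupIP):' devcls*.yaml | awk '{print $2}' | sort | uniq -d
```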
#### SlurmCluster API version

The SlurmCluster `apiVersion` must match the Slurm release (for example `v1alpha1`).
#### SlurmCluster Configurable Values

Values for `cpu`, `memory`, and `initialDelaySeconds` are shown in the example file `devcls01a.yaml`, below.
#### SlurmCluster Version Numbers

Version numbers are shown in the example file `devcls01a.yaml`, below. The version numbers must match the versions of these products on the system.

Example:

```
- cray/cray-slurmctld:1.6.1
- cray/cray-slurmdbd:1.6.1
- cray/munge-munge:1.5.0
- cray/cray-sssd:1.4.0
- cray/cray-slurm-config:1.3.0
```
#### SlurmCluster Slurm configuration

These configuration settings are made in each SlurmCluster's `/etc/slurm/slurm.conf` file (in each `slurmctld` pod).
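To review the settings currently in effect for a tenant, the file can be read directly from the running `slurmctld` pod. A minimal sketch, using the namespace and label selector that appear later in this guide:

```bash
# Look up the tenant's slurmctld pod and print its active slurm.conf
SLURMCTLD_POD=$(kubectl get pod -n vcluster-devten01a-slurm -l app.kubernetes.io/name=slurmctld -o name)
kubectl exec -n vcluster-devten01a-slurm ${SLURMCTLD_POD} -c slurmctld -- cat /etc/slurm/slurm.conf
```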
#### SlurmCluster Slingshot VNI Allocation

IMPORTANT: Each block of HPE Slingshot VNIs on the High-Speed Network (HSN) must not overlap with other blocks on the same system.

These settings are made in the `/etc/slurm/slurm.conf` file in each tenant's SlurmCluster and in the primary (`user` namespace) SlurmCluster.

Example with no tenants, `user` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-65535
```

Example with one tenant, `user` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-32767
```

Example with one tenant, `vcluster-devten01a-slurm` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=32768-65535
```

Example with two tenants, `user` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-32767
```

Example with two tenants, `vcluster-devten01a-slurm` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=32868-57353
```

Example with two tenants, `vcluster-devten02a-slurm` namespace:

```
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=57354-65535
```
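Because the blocks must not overlap, it can be worth listing the configured ranges side by side after any change. This is a sketch only, assuming the namespaces used in these examples; it prints every `SwitchParameters=vnis=` setting found in each namespace's ConfigMaps so the ranges can be compared by eye:

```bash
# Print the VNI ranges configured in each Slurm namespace
for ns in user vcluster-devten01a-slurm vcluster-devten02a-slurm; do
    echo "== ${ns} =="
    kubectl get configmap -n "${ns}" -o yaml | grep -o 'SwitchParameters=vnis=[0-9-]*' | sort -u
done
```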
#### SlurmCluster Partitions and Nodes

The general advice for tailoring the compute node configuration for each tenant is to look at the `slurm.conf` for the primary (`user` namespace) Slurm instance. Borrow the `NodeSet`, `PartitionName`, and `NodeName` directives that apply to each tenant. In this example, we have 'moved' two Compute nodes to the `slurm.conf` in namespace `vcluster-devten01a-slurm`.

```
# PARTITIONS
NodeSet=Compute Feature=Compute
PartitionName=workq Nodes=Compute MaxTime=INFINITE State=UP OverSubscribe=EXCLUSIVE
# BEGIN COMPUTE NODES
NodeName=nid[001002-001003] Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=456704 Feature=Compute
```
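After the tenant's `slurmctld` picks up the new configuration (the reconfigure step appears later in this guide), the moved nodes should be visible in the tenant's partition. A quick check, sketched with the same pod lookup used elsewhere in this document:

```bash
# Confirm that the moved Compute nodes appear in the tenant's Slurm instance
SLURMCTLD_POD=$(kubectl get pod -n vcluster-devten01a-slurm -l app.kubernetes.io/name=slurmctld -o name)
kubectl exec -n vcluster-devten01a-slurm ${SLURMCTLD_POD} -c slurmctld -- sinfo -N -l
```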
#### SlurmCluster Secrets (for non-root users)

These configuration settings are made in each SlurmCluster's `/etc/sssd/sssd.conf` file (in each `slurmctld` pod). You should not need to create or edit the `sssd.conf` file. Simply clone that file from the primary SlurmCluster (`user` namespace) to each tenant namespace.
These changes are made to the `uss-config-management` git repo. Tenants are disambiguated by their HSM group name (for example, `hsmgrouplabel` `"devten01a"` in `devten01a.yaml`, and `hsmGroup` `"devten01a"` in `devcls01a.yaml`).
All tenants can be booted and configured with a single CFS configuration that contains the appropriate git commit ID in the USS layers.
Example: `group_vars/devten01a/slurm.yml`

```yaml
munge_vault_path: secret/slurm/vcluster-devten01a-slurm/devcls01a/munge
slurm_conf_url: https://rgw-vip.local/wlm/vcluster-devten01a-slurm/devcls01a/
slurmd_options: "--conf-server 10.156.124.104,10.156.124.105"
```
Example: `group_vars/devten02a/slurm.yml`

```yaml
munge_vault_path: secret/slurm/vcluster-devten02a-slurm/devcls02a/munge
slurm_conf_url: https://rgw-vip.local/wlm/vcluster-devten02a-slurm/devcls02a/
slurmd_options: "--conf-server 10.156.124.108,10.156.124.109"
```
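The `--conf-server` addresses in `slurmd_options` must point at the tenant's `slurmctld` and its backup. A simple cross-check, sketched against the file names used in these examples, is to compare them with the `ip:` and `backupIP:` values in the corresponding SlurmCluster file:

```bash
# ip/backupIP values in the SlurmCluster file (the slurmctld pair is the one that matters)
grep -E '^[[:space:]]*(ip|backupIP):' devcls01a.yaml
# The --conf-server list for the tenant should match the slurmctld ip/backupIP pair
grep slurmd_options group_vars/devten01a/slurm.yml
```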
After initial creation, the SlurmCluster resource may be updated with new settings. This is useful to correct errors with the initial deployment, or to update to new Slurm versions.
(`ncn-mw#`) Edit the SlurmCluster file (for example, `devcls01a.yaml`).

(`ncn-mw#`) Apply the changes:

```bash
kubectl apply -f devcls01a.yaml
```

The Slurm operator will update the relevant Kubernetes resources to reflect the new configuration.
For example, if a new version of Slurm is installed on the system, the tenant can update to the new Slurm version by setting new container versions in the SlurmCluster file and applying the changes.
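After applying an updated file, the rollout can be followed until the new pods are ready. A brief sketch, using the deployment names from the examples in this guide:

```bash
# Watch the slurmctld and slurmdbd deployments roll out the new configuration
kubectl rollout status -n vcluster-devten01a-slurm deployment/devcls01a-slurmctld
kubectl rollout status -n vcluster-devten01a-slurm deployment/devcls01a-slurmdbd
```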
Legend for these examples:

- `devten01a.yaml` - configuration file for tenant `devten01a`
- `devcls01a.yaml` - configuration file for SlurmCluster `devcls01a`
Filename: `devten01a.yaml`

(`ncn-mw#`) Make sure the tenant name is not already in use:

```bash
kubectl get tenant -n tenants -o yaml vcluster-devten01a
```

(`ncn-mw#`) Make sure the HSM group is not already in use:

```bash
cray hsm groups describe devten01a
```

(`ncn-mw#`) Create your tenant `.yaml` file, and apply it:

```bash
vi devten01a.yaml
kubectl apply -n tenants -f devten01a.yaml
```

(`ncn-mw#`) Wait for the 'Deploying' state to become 'Deployed':

```bash
kubectl get tenant -n tenants -o yaml vcluster-devten01a
```

(`ncn-mw#`) Confirm the HSM group:

```bash
cray hsm groups describe devten01a
```
Repeat this step as needed for additional tenants.
#### SlurmCluster configuration file

Filename: `devcls01a.yaml`

(`ncn-mw#`) Make sure the cluster name is not already in use:

```bash
kubectl get pods -A | grep devcls01a
```

(`ncn-mw#`) Create your cluster `.yaml` file, and apply it:

```bash
vi devcls01a.yaml
kubectl apply -f devcls01a.yaml
```

(`ncn-mw#`) Wait for the pods to initialize:

```bash
kubectl get pods -A | grep vcluster-devten01a-slurm
```

Repeat this step as needed for additional SlurmClusters.
SlurmCluster: `devcls01a`

Filename: `/etc/slurm/slurm.conf`

(`ncn-mw#`) Get the running configuration:

```bash
kubectl get configmap -n vcluster-devten01a-slurm devcls01a-slurm-conf -o yaml > devcls01a-slurm-conf.yaml
```

(`ncn-mw#`) Extract the `slurm.conf`:

```bash
yq r devcls01a-slurm-conf.yaml 'data."slurm.conf"' > slurm.conf
```

(`ncn-mw#`) Edit the `slurm.conf`:

```bash
vi slurm.conf
```

(`ncn-mw#`) Update the configuration:

```bash
yq w -i devcls01a-slurm-conf.yaml 'data."slurm.conf"' "$(cat slurm.conf)"
```

(`ncn-mw#`) Apply the configuration:

```bash
kubectl apply -f devcls01a-slurm-conf.yaml
```

(`ncn-mw#`) Look up the pod for the tenant SlurmCluster:

```bash
SLURMCTLD_POD=$(kubectl get pod -n vcluster-devten01a-slurm -lapp.kubernetes.io/name=slurmctld -o name)
```

(`ncn-mw#`) Reconfigure:

```bash
kubectl exec -n vcluster-devten01a-slurm ${SLURMCTLD_POD} -c slurmctld -- scontrol reconfigure
```

Repeat this step as needed for additional SlurmClusters.
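The `yq r`/`yq w` commands above assume yq version 3 syntax. If only `kubectl` is available, an equivalent round trip is possible; this is a sketch using the same ConfigMap and file names, and it is only appropriate when `slurm.conf` is the ConfigMap's only key, because the ConfigMap is rebuilt from that single file:

```bash
# Extract slurm.conf directly from the ConfigMap
kubectl get configmap -n vcluster-devten01a-slurm devcls01a-slurm-conf \
    -o jsonpath='{.data.slurm\.conf}' > slurm.conf
vi slurm.conf
# Rebuild the ConfigMap from the edited file and apply it
kubectl create configmap devcls01a-slurm-conf --from-file=slurm.conf \
    -n vcluster-devten01a-slurm --dry-run=client -o yaml | kubectl apply -f -
```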
`sssd.conf` for SlurmCluster: `devcls01a`

Filename: `/etc/sssd/sssd.conf`

(`ncn-mw#`) Get the `user` namespace `sssd.conf` so it can be cloned:

```bash
kubectl get configmap -n user sssd-conf -o jsonpath='{.data.sssd\.conf}' > sssd.conf
```

(`ncn-mw#`) Delete an existing stub file in the tenant, if present:

```bash
kubectl delete secret -n vcluster-devten01a-slurm devcls01a-sssd-conf
```

(`ncn-mw#`) Clone the `user` namespace file into the tenant:

```bash
kubectl create secret generic -n vcluster-devten01a-slurm devcls01a-sssd-conf --from-file sssd.conf
```

(`ncn-mw#`) Restart the tenant's `slurmctld` pods:

```bash
kubectl rollout restart deployment -n vcluster-devten01a-slurm devcls01a-slurmctld devcls01a-slurmctld-backup
```

(`ncn-mw#`) Restart the tenant's `slurmdbd` pods:

```bash
kubectl rollout restart deployment -n vcluster-devten01a-slurm devcls01a-slurmdbd devcls01a-slurmdbd-backup
```

(`ncn-mw#`) Check that all restarted pods are in the Running state:

```bash
kubectl get pods -A | egrep 'slurmctld|slurmdbd'
```

Repeat this step as needed for additional SlurmClusters.
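To confirm that the cloned secret matches the source file, the secret contents can be decoded and compared; a sketch (`sssd.conf` here is the file extracted from the `user` namespace in the first step):

```bash
# No output from diff means the tenant secret matches the user namespace sssd.conf
kubectl get secret -n vcluster-devten01a-slurm devcls01a-sssd-conf \
    -o jsonpath='{.data.sssd\.conf}' | base64 -d | diff - sssd.conf
```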
Filename: `group_vars/devten01a/slurm.yml`

(`ncn-mw#`) Clone the USS repository:

```bash
git clone https://api-gw-service-nmn.local/vcs/cray/uss-config-management.git
```

(`ncn-mw#`) Go to the repo:

```bash
cd uss-config-management
```

(`ncn-mw#`) Check out the integration branch (1.1.0 shown here):

```bash
git checkout integration-1.1.0
```

(`ncn-mw#`) Create a subdirectory for the tenant:

```bash
mkdir group_vars/devten01a
```

(`ncn-mw#`) Edit the file `group_vars/devten01a/slurm.yml`.

(`ncn-mw#`) Add the new file:

```bash
git add group_vars/devten01a/slurm.yml
```

(`ncn-mw#`) Commit the new file:

```bash
git commit -am "descriptive comment"
```

(`ncn-mw#`) Push to the integration branch (1.1.0 shown here):

```bash
git push origin integration-1.1.0
```

(`ncn-mw#`) Remember the first commit ID in the output:

```bash
git log -a | cat
```
Repeat this step as needed for additional tenants.
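The commit ID can also be captured directly into a shell variable for use in the CFS configuration step later in this guide (a sketch):

```bash
# HEAD is the commit that was just pushed; save it for the USS layer of the CFS configuration
COMMIT_ID=$(git rev-parse HEAD)
echo "${COMMIT_ID}"
```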
Note that you will need one template for each node type (UAN, Compute) and architecture (X86, ARM) in the tenants. You can use a single BOS session template for many tenants of the same node type and architecture.
(`ncn-mw#`) Look up the name of the default template(s) for tenants (for example, X86 Compute) and save as JSON file(s):

```bash
cray bos sessiontemplates describe --format json ssi-compute-cr_2024.x86_64-cr_2024_1 > ssi-compute-cr_2024.x86_64-cr_2024_1.json
```

(`ncn-mw#`) Make a copy of the default template:

```bash
cp ssi-compute-cr_2024.x86_64-cr_2024_1.json ssi-compute-cr_2024.x86_64-cr_2024_1-tenants.json
```

(`ncn-mw#`) Edit the copy:

- Remove the lines for `enable_cfs:`, `name:`, and `tenant:`, and remove the comma from the preceding line.
- The new template name uses a `-tenants` suffix, as seen in the next section.

(`ncn-mw#`) Upload the new template, specifying the filename and the name of the new template:

```bash
cray bos sessiontemplates create --format json --file ssi-compute-cr_2024.x86_64-cr_2024_1-tenants.json ssi-compute-cr_2024.x86_64-cr_2024_1-tenants
```
Repeat this step as needed for different node types and architectures.
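As an alternative to hand-editing the JSON, the same field removal can be scripted. This is a sketch only; it assumes `enable_cfs`, `name`, and `tenant` are top-level keys of the session template JSON, as in the manual edit described above:

```bash
# Strip the fields that must not be present in the copied template
jq 'del(.enable_cfs, .name, .tenant)' ssi-compute-cr_2024.x86_64-cr_2024_1.json \
    > ssi-compute-cr_2024.x86_64-cr_2024_1-tenants.json
```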
Note that you will need one configuration for each node type (UAN, Compute) in the tenant. You can use a single CFS configuration for many tenants of the same node type and architecture.
(`ncn-mw#`) Save the default configuration as a JSON file:

```bash
cray cfs configurations describe --format json ssi-compute-cr_2024-cr_2024_1 > ssi-compute-cr_2024-cr_2024_1.json
```

(`ncn-mw#`) Make a copy of the default JSON file:

```bash
cp ssi-compute-cr_2024-cr_2024_1.json ssi-compute-cr_2024-cr_2024_1-tenants.json
```

(`ncn-mw#`) Edit the copy:

- Remove the `lastUpdated:` line.
- Update the commit ID for the `uss-config-management.git` layer; you will use the commit ID from the `git log` command in the earlier step that created the USS `group_vars` file.

(`ncn-mw#`) Upload the new configuration, specifying the filename and the name of the new configuration:

```bash
cray cfs configurations update --file ssi-compute-cr_2024-cr_2024_1-tenants.json ssi-compute-cr_2024-cr_2024_1-tenants
```
Repeat this step as needed for different node types and architectures.
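To locate the USS layer whose commit ID needs to be replaced in the copied JSON, a simple search is usually enough (a sketch; adjust the amount of context to show the layer's commit field):

```bash
# Show the uss-config-management layer and the surrounding lines containing its commit ID
grep -n -B2 -A4 'uss-config-management' ssi-compute-cr_2024-cr_2024_1-tenants.json
```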
(`ncn-mw#`) Boot the Compute nodes for a node architecture in the tenant:

```bash
cray bos sessions create --template-name ssi-compute-cr_2024.x86_64-cr_2024_1-tenants --operation boot --limit x9000c1s0b1n0,x9000c1s0b1n1
```
After CFS completes, log in to either a tenant UAN (if available) or a tenant Compute node.
(`ncn-mw#`) See what nodes are available:

```bash
sinfo
```

(`ncn-mw#`) Launch a command or application:

```bash
srun -N2 uname -rin
srun -N2 ./all2all
```
(`ncn-mw#`) View a specific tenant, brief:

```bash
kubectl get tenant -n tenants -o yaml vcluster-devten01a
```

(`ncn-mw#`) View a specific tenant, verbose:

```bash
kubectl describe tenant -n tenants vcluster-devten01a
```

(`ncn-mw#`) View the logs for all tenants:

```bash
TAPMS_POD=$( kubectl get pods -n tapms-operator --no-headers | awk '{print $1}' );
kubectl logs --timestamps -n tapms-operator $TAPMS_POD
```
#### SlurmCluster command examples

(`ncn-mw#`) View the pods for all clusters:

```bash
kubectl get pods -A | grep vcluster
```

(`ncn-mw#`) View the pods for a specific cluster:

```bash
kubectl get pods -A | grep vcluster-devten01a-slurm
```

(`ncn-mw#`) View the logs for a specific cluster:

```bash
NAMESPACE=vcluster-devten01a-slurm;
SLURMCTLD_POD=$( kubectl get pods -n $NAMESPACE | grep slurmctld | grep -v backup | awk '{print $1}' );
kubectl logs --timestamps -n $NAMESPACE $SLURMCTLD_POD -c slurmctld
```
(`ncn-mw#`) All HSM groups, including all tenants:

```bash
cray hsm groups list --format yaml
```

(`ncn-mw#`) Specific tenant:

```bash
cray hsm groups describe --format yaml devten01a
```

(`ncn-mw#`) All tenants:

```bash
kubectl hns tree tenants
```
This procedure is required for each tenant, after Slurm has been upgraded on the system (for example, after using IUF to upgrade products).
You will need the configuration file that you used to create each tenant's SlurmCluster. For this Slurm upgrade, there is no need to change the SlurmCluster name; the only change is to the Slurm version inside each tenant.
SlurmCluster: `devcls01a`

Filename: `devcls01a.yaml`

(`ncn-mw#`) Edit the SlurmCluster configuration file:

```bash
cp -p devcls01a.yaml{,.bak}
vi devcls01a.yaml
```

(`ncn-mw#`) Double-check the differences:

```bash
diff devcls01a.yaml devcls01a.yaml.bak
```

Possible output:

```
10c10
< image: cray/cray-slurmctld:1.7.0-slurm
---
> image: cray/cray-slurmctld:1.6.1
21c21
< image: cray/cray-slurmdbd:1.7.0-slurm
---
> image: cray/cray-slurmdbd:1.6.1
```

(`ncn-mw#`) Re-apply the SlurmCluster configuration file:

```bash
kubectl apply -f devcls01a.yaml
```

(`ncn-mw#`) Wait for all pods to return to the Running state:

```bash
kubectl get pods -A | grep vcluster
```

Repeat this step as needed for additional SlurmClusters.
#### Development tenant

This is filename `devten01a.yaml`; the complete file is shown.
```yaml
apiVersion: tapms.hpe.com/v1alpha3
kind: Tenant
metadata:
  name: vcluster-devten01a
spec:
  childnamespaces:
    - slurm
    - user
  tenantname: vcluster-devten01a
  tenanthooks: []
  tenantresources:
    - enforceexclusivehsmgroups: true
      hsmgrouplabel: devten01a
      type: compute
      xnames:
        - x9000c1s0b1n0
        - x9000c1s0b1n1
    - enforceexclusivehsmgroups: true
      hsmgrouplabel: devten01a
      type: application
      xnames:
        - x3000c0s29b0n0
```
#### Development SlurmCluster

IMPORTANT: The values for `cpu`, `memory`, and `initialDelaySeconds` are recommended by the WLM team.

This is filename `devcls01a.yaml`; the complete file is shown.
```yaml
apiVersion: "wlm.hpe.com/v1alpha1"
kind: SlurmCluster
metadata:
  name: devcls01a
  namespace: vcluster-devten01a-slurm
spec:
  tapmsTenantName: vcluster-devten01a
  tapmsTenantVersion: v1alpha3
  slurmctld:
    image: cray/cray-slurmctld:1.6.1
    ip: 10.150.124.100
    host: devcls01a-slurmctld
    backupIP: 10.150.124.101
    backupHost: devcls01a-slurmctld-backup
    livenessProbe:
      enabled: true
      initialDelaySeconds: 120
      periodSeconds: 60
      timeoutSeconds: 60
  slurmdbd:
    image: cray/cray-slurmdbd:1.6.1
    ip: 10.150.124.102
    host: devcls01a-slurmdbd
    backupIP: 10.150.124.103
    backupHost: devcls01a-slurmdbd-backup
    livenessProbe:
      enabled: true
      initialDelaySeconds: 43200
      periodSeconds: 30
      timeoutSeconds: 5
  munge:
    image: cray/munge-munge:1.5.0
  sssd:
    image: cray/cray-sssd:1.4.0
  config:
    image: cray/cray-slurm-config:1.3.0
  hsmGroup: devten01a
  pxc:
    enabled: true
    image:
      repository: cray/cray-pxc
      tag: 1.3.0
    initImage:
      repository: cray/cray-pxc-operator
      tag: 1.3.0
    configuration: |
      [mysqld]
      innodb_log_file_size=4G
      innodb_lock_wait_timeout=900
      wsrep_trx_fragment_size=1G
      wsrep_trx_fragment_unit=bytes
      log_error_suppression_list=MY-013360
    data:
      storageClassName: k8s-block-replicated
      accessModes:
        - ReadWriteOnce
      storage: 1Ti
    livenessProbe:
      initialDelaySeconds: 300
      periodSeconds: 10
      timeoutSeconds: 5
    resources:
      requests:
        cpu: "1"
        memory: 4Gi
      limits:
        cpu: "8"
        memory: 32Gi
    backup:
      image:
        repository: cray/cray-pxc-backup
        tag: 1.3.0
      data:
        storageClassName: k8s-block-replicated
        accessModes:
          - ReadWriteOnce
        storage: 512Gi
      # Backup daily at 9:10 PM (does not conflict with other CSM DB backups)
      schedule: "10 21 * * *"
      keep: 3
      resources:
        requests:
          cpu: "1"
          memory: 4Gi
        limits:
          cpu: "8"
          memory: 16Gi
    haproxy:
      image:
        repository: cray/cray-pxc-haproxy
        tag: 1.3.0
      resources:
        requests:
          cpu: "1"
          memory: 128Mi
        limits:
          cpu: "16"
          memory: 512Mi
```
First, you are responsible for divvying up the HPE Slingshot VNI space among the primary SlurmCluster (`user` namespace) and any tenant SlurmClusters. Start with the primary SlurmCluster, and then configure each tenant. Here is an example for the primary and one tenant:
This is filename `/etc/slurm/slurm.conf` for the `user` namespace; a partial file is shown.

```
...
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=1025-32767
...
```
This is filename `/etc/slurm/slurm.conf` for the `vcluster-devten01a-slurm` namespace; a partial file is shown.

```
...
SwitchType=switch/hpe_slingshot
SwitchParameters=vnis=32768-65535
...
```
Second, insert the `NodeSet`, `PartitionName`, and `NodeName` directives that apply to your tenant. In this example on `Development`, we have two X86 Compute nodes (1002 and 1003) and one X86 UAN (`uan02`).
This is filename `/etc/slurm/slurm.conf` for the `vcluster-devten01a-slurm` namespace; a partial file is shown.

```
...
# PARTITIONS
NodeSet=Compute Feature=Compute
PartitionName=workq Nodes=Compute MaxTime=INFINITE State=UP OverSubscribe=EXCLUSIVE
# BEGIN COMPUTE NODES
NodeName=nid[001002-001003] Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=456704 Feature=Compute
# END COMPUTE NODES
NodeName=uan02 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=227328 Feature=Application_UAN
...
```
This is file `group_vars/devten01a/slurm.yml`; the complete file is shown.

```yaml
munge_vault_path: secret/slurm/vcluster-devten01a-slurm/devcls01a/munge
slurm_conf_url: https://rgw-vip.local/wlm/vcluster-devten01a-slurm/devcls01a/
slurmd_options: "--conf-server 10.156.124.104,10.156.124.105"
```