Creating a Tenant

Overview

This page describes how to create a tenant. The procedure involves crafting a Custom Resource (CR) that conforms to the published Custom Resource Definition (CRD) and then applying it, for both TAPMS and the Slurm operator.

TAPMS CRD

Tenant provisioning is handled in a declarative fashion, by creating a CR with the specification for the tenant.

  • (ncn-mw#) The full schema is available by executing the following command:

    kubectl get customresourcedefinitions.apiextensions.k8s.io tenants.tapms.hpe.com  -o yaml
    
  • An example of a tenant custom resource (CR):

    apiVersion: tapms.hpe.com/v1alpha3
    kind: Tenant
    metadata:
      name: vcluster-blue
    spec:
      childnamespaces:
      - slurm
      - user
      tenantname: vcluster-blue
      tenantkms:
        enablekms: true
      tenanthooks: []
      tenantresources:
      - enforceexclusivehsmgroups: true
        hsmgrouplabel: blue
        type: compute
        xnames:
        - x1000c0s0b0n0
    

IMPORTANT: To keep nodes for different tenants separate, enforceexclusivehsmgroups must be set to true and hsmgrouplabel must be set to a label that is unique to the tenant. Without these settings, tenants can share nodes.
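
(ncn-mw#) Optionally, the crafted CR can be validated against the CRD schema before it is applied by using a server-side dry run. The following is a sketch; the file name tenant.yaml is a placeholder for the crafted CR:

kubectl apply -n tenants --dry-run=server -f tenant.yaml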

Apply the TAPMS CR

  • (ncn-mw#) Once the CR has been crafted for the tenant, apply it to begin provisioning. All tenants should be applied in the tenants namespace:

    kubectl apply -n tenants -f <tenant.yaml>
    

    Example output:

    tenant.tapms.hpe.com/vcluster-blue created
    
  • (ncn-mw#) It can take up to a minute for tapms to fully create the tenant. The following command can be used to monitor the status of the tenant (a scripted wait for the Deployed state is shown at the end of this section):

    kubectl get tenant -n tenants vcluster-blue -o yaml
    

    Example output:

    apiVersion: tapms.hpe.com/v1alpha3
    kind: Tenant
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"tapms.hpe.com/v1alpha3","kind":"Tenant","metadata":{"annotations":{},"name":"vcluster-blue","namespace":"tenants"},"spec":{"childnamespaces":["slurm","user"],"tenanthooks":[],"tenantkms":{"enablekms":true},"tenantname":"vcluster-blue","tenantresources":[{"enforceexclusivehsmgroups":true,"hsmgrouplabel":"blue","type":"compute","xnames":["x1000c0s0b0n0"]}]}}      
      creationTimestamp: "2023-09-27T17:14:28Z"
      finalizers:
      - tapms.hpe.com/finalizer
      generation: 8
      name: vcluster-blue
      namespace: tenants
      resourceVersion: "18509045"
      uid: 04f26622-dccb-44a1-a928-7d4750c573e7
    spec:
      childnamespaces:
      - slurm
      - user
      state: Deployed
      tenanthooks: []
      tenantkms:
        enablekms: true
        keyname: key1
        keytype: rsa-3072
      tenantname: vcluster-blue
      tenantresources:
      - enforceexclusivehsmgroups: true
        hsmgrouplabel: blue
        type: compute
        xnames:
        - x1000c0s0b0n0
    status:
      childnamespaces:
      - vcluster-blue-slurm
      - vcluster-blue-user
      tenanthooks: []
      tenantkms:
        keyname: key1
        keytype: rsa-3072
        publickey: '{"1":{"creation_time":"2023-09-27T17:14:50.475282593Z","name":"rsa-3072","public_key":"-----BEGIN
          PUBLIC KEY-----\nMIIBojANBgkqhkiG9w0BAQEFAAOCAY8AMIIBigKCAYEAyYGFfQWlJPKNiz25SAJ+\nHdsW2iENcXd1Rst0hYk5JI7h9y1enLCaSr1TkCh0sRvYu03OZSr7+crNhb4SL7mK\nXFDnkX55qKu4KcIwyz0ZAtPJJ959HlnPuL0ELglV7PIQtMqejLpQqOTU7zM5/Jh+\n++nex5SEo5BmiQGB9UQfgAORhuRI5um0DlnE/W1hHdTvprj1HfPvI+XcBOffzbPe\nK3Os/dnxeSlJ2V45fEDmgR4pCIOdPmoTaXnE/ARlsfp5riA8w0butXT+5MddGNXb\nlMfBLtlTGYPBGApuWoeMqfdgsQv6gm5m7nBT7iaJHrnFkdZVpjJKoCN/4ZEtAjUS\nVF9KL9I/KEiwSh4k4OT7MGlxPIhu7XxBMVxXNMOAo4DTOk9kdUpbgcy+W1fkv5HW\nxYElVbToSokQLiMhURZ6eaqXUcOEDpSVxsvX0oqMkZBwzJcNC3KxEDVnTJQ8VMmp\n6nmDinp4noosUJC5QbiQ8oUyg+gLXbUQUYS0DZawZ1Y3AgMBAAE=\n-----END
          PUBLIC KEY-----\n"}}'
        transitname: cray-tenant-912e5990-8fdc-46ff-b86e-11550345e737
      tenantresources:
      - enforceexclusivehsmgroups: true
        hsmgrouplabel: blue
        type: compute
        xnames:
        - x1000c0s0b0n0
    
  • (ncn-mw#) The cray command can now be used to display the HSM group:

    cray hsm groups describe blue --format toml
    

    Example output:

    label = "blue"
    description = ""
    exclusiveGroup = "tapms-exclusive-group-label"
    tags = [ "vcluster-blue",]
    
    [members]
    ids = [ "x1000c0s0b0n0", "x1000c0s1b0n0",]
    
  • (ncn-mw#) The following command can now be used to display the namespace tree structure for the tenant:

    kubectl hns tree tenants
    

    Example output:

    tenants
    └── [s] vcluster-blue
        ├── [s] vcluster-blue-slurm
        └── [s] vcluster-blue-user
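
  • (ncn-mw#) Optionally, the wait for the tenant to reach the Deployed state can be scripted rather than polled by hand. The following is a sketch; it assumes a kubectl release that supports --for=jsonpath in kubectl wait (v1.23 or newer):

    kubectl wait -n tenants tenant/vcluster-blue --for=jsonpath='{.spec.state}'=Deployed --timeout=120s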
    

Slurm operator CRD

Slurm provisioning is similar to tenant creation: a CR is crafted and then applied.

(ncn-mw#) To see all possible configuration settings for the custom resource, run this command:

kubectl get crd slurmclusters.wlm.hpe.com -o yaml
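
(ncn-mw#) Optionally, a more compact view of the available spec fields can be obtained with kubectl explain. This is a sketch; it assumes the cluster publishes the structural schema for this CRD:

kubectl explain slurmclusters.spec --recursive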

Create a custom resource describing the Slurm tenant. For example, the following mycluster.yaml file describes a Slurm tenant named mycluster within the vcluster-blue TAPMS tenant:

apiVersion: "wlm.hpe.com/v1alpha1"
kind: SlurmCluster
metadata:
  name: mycluster
  namespace: vcluster-blue-slurm
spec:
  tapmsTenantName: vcluster-blue
  tapmsTenantVersion: v1alpha3
  slurmctld:
    image: cray/cray-slurmctld:1.6.1
    ip: 10.253.124.100
    host: mycluster-slurmctld
    backupIP: 10.253.124.101
    backupHost: mycluster-slurmctld-backup
    livenessProbe:
      enabled: true
      initialDelaySeconds: 120
      periodSeconds: 60
      timeoutSeconds: 60
  slurmdbd:
    image: cray/cray-slurmdbd:1.6.1
    ip: 10.253.124.102
    host: mycluster-slurmdbd
    backupIP: 10.253.124.103
    backupHost: mycluster-slurmdbd-backup
    livenessProbe:
      enabled: true
      initialDelaySeconds: 43200
      periodSeconds: 30
      timeoutSeconds: 5
  munge:
    image: cray/munge-munge:1.5.0
  sssd:
    image: cray/cray-sssd:1.4.0
    sssdConf: |
      [sssd]
      config_file_version = 2
      services = nss
      domains = files

      [nss]

      [domain/files]
      id_provider = files      
  macvlan:
    subnet: 10.253.0.0/16
  config:
    image: cray/cray-slurm-config:1.3.0
    hsmGroup: blue
  pxc:
    enabled: true
    initImage:
      repository: cray/cray-pxc-operator
      tag: 1.3.0
    image:
      repository: cray/cray-pxc
      tag: 1.3.0
    configuration: |
      [mysqld]
      innodb_log_file_size=4G
      innodb_lock_wait_timeout=900
      wsrep_trx_fragment_size=1G
      wsrep_trx_fragment_unit=bytes
      log_error_suppression_list=MY-013360      
    data:
      storageClassName: k8s-block-replicated
      accessModes:
        - ReadWriteOnce
      storage: 1Ti
    livenessProbe:
      initialDelaySeconds: 300
      periodSeconds: 10
      timeoutSeconds: 5
    resources:
      requests:
        cpu: "500m"
        memory: 4Gi
      limits:
        cpu: "8"
        memory: 32Gi
    backup:
      image:
        repository: cray/cray-pxc-backup
        tag: 1.3.0
      data:
        storageClassName: k8s-block-replicated
        accessModes:
          - ReadWriteOnce
        storage: 512Gi
      # Backup daily at 9:10PM (does not conflict with other CSM DB backups)
      schedule: "10 21 * * *"
      keep: 3
      resources:
        requests:
          cpu: "500m"
          memory: 4Gi
        limits:
          cpu: "8"
          memory: 16Gi
    haproxy:
      image:
        repository: cray/cray-pxc-haproxy
        tag: 1.3.0
      resources:
        requests:
          cpu: "500m"
          memory: 128Mi
        limits:
          cpu: "16"
          memory: 512Mi

(ncn-mw#) Container image versions must be customized to match the versions installed on the system. List the available versions with the following commands:

curl -s https://registry.local/v2/cray/cray-slurmctld/tags/list | jq -r .tags[]
curl -s https://registry.local/v2/cray/cray-slurmdbd/tags/list | jq -r .tags[]
curl -s https://registry.local/v2/cray/munge-munge/tags/list | jq -r .tags[]
curl -s https://registry.local/v2/cray/cray-sssd/tags/list | jq -r .tags[]
curl -s https://registry.local/v2/cray/cray-slurm-config/tags/list | jq -r .tags[]
curl -s https://registry.local/v2/cray/cray-pxc/tags/list | jq -r .tags[]

Typically, the highest available version should be used. For the slurmctld and slurmdbd containers, use versions with a -slurm suffix if available. For example, use cray/cray-slurmctld:1.8.0-slurm rather than cray/cray-slurmctld:1.8.0.
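
(ncn-mw#) The tag listings can also be gathered in a single pass. The following is a sketch; it assumes jq is available and that registry.local is reachable from the node where it is run:

for image in cray-slurmctld cray-slurmdbd munge-munge cray-sssd cray-slurm-config cray-pxc; do
    echo "== cray/${image} =="
    # Sort so that the newest tags are listed last; prefer -slurm tags for
    # cray-slurmctld and cray-slurmdbd when they are available.
    curl -s "https://registry.local/v2/cray/${image}/tags/list" | jq -r '.tags[]' | sort -V
done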

If using munge container version cray/munge-munge:1.6.0, set spec.munge.uid to 498 and spec.munge.gid to 484.
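
For example, with that MUNGE image the munge section of the CR would look like the following sketch:

munge:
  image: cray/munge-munge:1.6.0
  uid: 498
  gid: 484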

Configuring the Virtual Network Identifier (VNI) range

If USS 1.3.0 or newer is installed, the HPE Slingshot VNI range used for the tenant may be configured using these settings (an example follows the list):

  • spec.config.vniPartition - HPE Slingshot Fabric Manager VNI partition name for this tenant. The HPE Slingshot network operator creates a partition when the SlingshotTenant custom resource is applied.
  • spec.config.vniRange - VNI range to use for this tenant, in format start-end. Must not overlap with any other tenant’s VNI range. If vniPartition is set, the partition’s VNI range overrides this value.
  • spec.config.vniPartitionCreate - If true, create a new HPE Slingshot Fabric Manager VNI partition with name vniPartition and range vniRange if it does not exist.
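
For example, the config section of the CR might then look like the following sketch; the partition name and VNI range shown are placeholders that must be chosen for the system:

config:
  image: cray/cray-slurm-config:1.3.0
  hsmGroup: blue
  vniPartition: vcluster-blue
  vniRange: 1024-1279
  vniPartitionCreate: true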

Apply the Slurm operator CR

(ncn-mw#) To create the tenant and deploy Slurm resources, apply the tenant file with kubectl:

kubectl apply -f <cluster>.yaml
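
(ncn-mw#) The new resource and the pods it creates can then be observed in the tenant's Slurm namespace. The following is a sketch using the example names from mycluster.yaml:

kubectl get slurmclusters.wlm.hpe.com -n vcluster-blue-slurm
kubectl get pods -n vcluster-blue-slurm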

Once the tenant has been created, the Ansible configuration for compute and application nodes must be updated to use the tenant-specific configuration. To do this, create a group_vars/<spec.config.hsmGroup>/slurm.yaml file in the uss-config-management VCS repository with the following content:

munge_vault_path: secret/slurm/<metadata.namespace>/<metadata.name>/munge
slurmd_options: "--conf-server <spec.slurmctld.ip>,<spec.slurmctld.backupIP>"

The values in angle brackets correspond to fields in the SlurmCluster CR. For example, with the mycluster.yaml file from the previous section, create a group_vars/blue/slurm.yaml file in the uss-config-management VCS repository with the following content:

munge_vault_path: secret/slurm/vcluster-blue-slurm/mycluster/munge
slurmd_options: "--conf-server 10.253.124.100,10.253.124.101"

This will configure nodes in that tenant with the MUNGE key and Slurm configuration files created for that tenant.