Prepare site init

These procedures guide administrators through setting up the site-init directory, which contains important customizations for various products.

  1. Background
  2. Create and Initialize site-init Directory
  3. Create Baseline System Customizations
    1. Set up LDAP configuration
    2. Customize DNS configuration
    3. Set storage limit for Thanos S3 bucket
    4. Configure Prometheus SNMP Exporter
  4. Encrypt secrets
  5. Customer-Specific Customizations

1. Background

The shasta-cfg directory included in the CSM release tarball contains relatively static, installation-centric artifacts, such as:

  • Cluster-wide network configuration settings required by Helm charts deployed by product stream Loftsman manifests
  • Sealed Secrets
  • Sealed Secret Generate Blocks – a form of plain-text input that renders to a Sealed Secret (sketched below)
  • Helm chart value overrides that are merged into Loftsman manifests by product stream installers
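
For orientation, a Sealed Secret Generate Block in customizations.yaml has a shape similar to the following sketch. The structure matches the yq paths used later on this page (for example, spec.kubernetes.sealed_secrets.cray_reds_credentials.generate.data[*].args.value), but the exact fields, generator types, and secret names are defined by the shasta-cfg release, so treat this as illustrative only:

    spec:
      kubernetes:
        sealed_secrets:
          cray_reds_credentials:
            generate:
              name: cray-reds-credentials
              data:
                - type: static   # generator type; 'static' (an illustrative assumption) passes args.value through as plain text
                  args:
                    name: vault_redfish_defaults
                    value: '{"Cray": {"Username": "root", "Password": "XXXX"}}'

The Encrypt secrets step at the end of this page renders each generate block into a Sealed Secret in place.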

2. Create and initialize site-init directory

NOTE If the pre-installation is resuming here, ensure that the environment variables have been properly set by following Set reusable environment variables and then returning to this page.

  1. (pit#) Set the SITE_INIT variable.

    Important: All procedures on this page assume that the SITE_INIT variable has been set.

    SITE_INIT="${PITDATA}/prep/site-init"
    
  2. (pit#) Create the site-init directory.

    mkdir -pv "${SITE_INIT}"
    
  3. (pit#) Initialize site-init from CSM.

    "${CSM_PATH}/shasta-cfg/meta/init.sh" "${SITE_INIT}"
    
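    After initialization, the directory should be populated. A quick check is to list its contents, which are expected to include customizations.yaml along with the certs and utils content referenced later on this page:

    ls -l "${SITE_INIT}"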

3. Create Baseline System Customizations

The following steps update ${SITE_INIT}/customizations.yaml with system-specific customizations.

  1. (pit#) Change into the site-init directory.

    cd "${SITE_INIT}"
    
  2. (pit#) Merge the system-specific settings generated by CSI into customizations.yaml.

    yq merge -xP -i "${SITE_INIT}/customizations.yaml" <(yq prefix -P "${PITDATA}/prep/${SYSTEM_NAME}/customizations.yaml" spec)
    
  3. (pit#) Set the cluster name.

    yq write -i "${SITE_INIT}/customizations.yaml" spec.wlm.cluster_name "${SYSTEM_NAME}"
    
  4. (pit#) Make a backup copy of ${SITE_INIT}/customizations.yaml.

    cp -pv "${SITE_INIT}/customizations.yaml" "${SITE_INIT}/customizations.yaml.prepassword"
    
  5. (pit#) Review the configuration used to generate the following sealed secrets in customizations.yaml in the site-init directory:

    • spec.kubernetes.sealed_secrets.cray_reds_credentials
    • spec.kubernetes.sealed_secrets.cray_meds_credentials
    • spec.kubernetes.sealed_secrets.cray_hms_rts_credentials

    Replace the Username and Password references to match the existing settings of the system hardware components.

    NOTE

    • The cray_reds_credentials are used by the River Endpoint Discovery Service (REDS) for River components.
    • The cray_meds_credentials are used by the Mountain Endpoint Discovery Service (MEDS) for the liquid-cooled components in an Olympus (Mountain) cabinet.
    • The cray_hms_rts_credentials are used by the Redfish Translation Service (RTS) for any hardware components which are not managed by Redfish, such as a ServerTech PDU in a River Cabinet.

    See the Decrypt Sealed Secrets for Review section of Manage Sealed Secrets if credentials from prior installations need to be examined.

    vim "${SITE_INIT}/customizations.yaml"
    
  6. (pit#) Review the changes that you made.

    diff "${SITE_INIT}/customizations.yaml" "${SITE_INIT}/customizations.yaml.prepassword"
    
  7. (pit#) Validate that REDS/MEDS/RTS credentials are correct.

    For all credentials, make sure that the Username and Password values are correct; sample output from these checks is shown after the list below.

    • Validate REDS credentials:

      NOTE These credentials are used by the REDS and HMS discovery services, targeting River Redfish BMC endpoints and management switches.

      • For vault_redfish_defaults, the only entry used is:

        {"Cray": {"Username": "root", "Password": "XXXX"}}
        
      • Ensure the Cray key exists. This key is not used in any of the other credential specifications.

      yq read "${SITE_INIT}/customizations.yaml" 'spec.kubernetes.sealed_secrets.cray_reds_credentials.generate.data[*].args.value' | jq
      
    • Validate MEDS credentials:

      These credentials are used by the MEDS service, targeting Redfish BMC endpoints.

      yq read "${SITE_INIT}/customizations.yaml" 'spec.kubernetes.sealed_secrets.cray_meds_credentials.generate.data[0].args.value' | jq
      
    • Validate RTS credentials:

      These credentials are used by the Redfish Translation Service, targeting River Redfish BMC endpoints and PDU controllers.

      yq read "${SITE_INIT}/customizations.yaml" 'spec.kubernetes.sealed_secrets.cray_hms_rts_credentials.generate.data[*].args.value' | jq
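
    Based on the vault_redfish_defaults shape shown above, jq output for that entry should resemble the following pretty-printed object; the MEDS and RTS checks print analogous credential objects:

      {
        "Cray": {
          "Username": "root",
          "Password": "XXXX"
        }
      }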
      
  8. To customize the PKI Certificate Authority (CA) used by the platform, see Certificate Authority.

    IMPORTANT The CA cannot be modified after installation.

Set up LDAP configuration

NOTE If there is no LDAP configuration at this time, skip ahead to Customize DNS configuration. If LDAP should be enabled later, follow Add LDAP User Federation after installation.

  1. (pit#) Set environment variables for the LDAP server and its port.

    In the following example, the LDAP server has the hostname dcldap2.hpc.amslabs.hpecorp.net and uses port 636.

    LDAP=dcldap2.hpc.amslabs.hpecorp.net
    PORT=636
    
  2. (pit#) Load the openjdk container image.

    NOTE Requires a properly configured Docker or Podman environment.

    "${CSM_PATH}/hack/load-container-image.sh" artifactory.algol60.net/csm-docker/stable/docker.io/library/openjdk:11-jre-slim
    
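    If desired, confirm that the image is now available locally before proceeding (shown here with Podman; use docker images in a Docker environment):

    podman images | grep openjdk
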
  3. (pit#) Get the issuer certificate.

    Retrieve the issuer certificate for the LDAP server at the port specified above. Use openssl s_client to connect and show the certificate chain returned by the LDAP host:

    openssl s_client -showcerts -connect "${LDAP}:${PORT}" </dev/null
    
  4. Enter the issuer’s certificate into cacert.pem.

    Either manually extract (i.e., cut/paste) the issuer’s certificate into cacert.pem, or try the following commands to create it automatically.

    NOTE The following commands were verified using OpenSSL version 1.1.1d and use the -nameopt RFC2253 option to ensure consistent formatting of distinguished names. Unfortunately, older versions of OpenSSL may not support -nameopt on the s_client command or may use a different default format. However, the issuer certificate can be manually extracted from the output of the above openssl s_client example, if the following commands are unsuccessful.

    1. (pit#) Observe the issuer’s DN.

      openssl s_client -showcerts -nameopt RFC2253 -connect "${LDAP}:${PORT}" </dev/null 2>/dev/null | grep issuer= | sed -e 's/^issuer=//'
      

      Expected output includes a line similar to one of the following examples:

      Self-signed Certificate:

      emailAddress=dcops@hpe.com,CN=Data Center,OU=HPC/MCS,O=HPE,ST=WI,C=US
      

      Signed Certificate:

       CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1,O=DigiCert Inc,C=US
      
    2. (pit#) Extract the issuer’s certificate.

      NOTE The issuer DN is properly escaped as part of the awk pattern below. Change it to match the emailAddress, CN, OU, and other values for the LDAP server in use, and be sure to escape any special characters properly.

      openssl s_client -showcerts -nameopt RFC2253 -connect "${LDAP}:${PORT}" </dev/null 2>/dev/null |
                awk '/s:emailAddress=dcops@hpe.com,CN=Data Center,OU=HPC\/MCS,O=HPE,ST=WI,C=US/,/END CERTIFICATE/' |
                awk '/BEGIN CERTIFICATE/,/END CERTIFICATE/' > cacert.pem
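
      Optionally, confirm that cacert.pem now contains the expected issuer certificate by printing its subject and issuer with standard OpenSSL options:

      openssl x509 -in cacert.pem -noout -subject -issuer -nameopt RFC2253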
      
  5. (pit#) Create certs.jks.

    NOTE The cray-data-center-ca alias used in this command should be changed to match the LDAP server in use.

    podman run --rm -v "$(pwd):/data" \
            artifactory.algol60.net/csm-docker/stable/docker.io/library/openjdk:11-jre-slim keytool \
            -importcert -trustcacerts -file /data/cacert.pem -alias cray-data-center-ca \
            -keystore /data/certs.jks -storepass password -noprompt
    

    NOTE If the command is inadvertently executed more than once, the console will display the following error; it can be safely ignored.

    keytool error: java.lang.Exception: Certificate not imported, alias <cray-data-center-ca> already exists
    
  6. (pit#) Create certs.jks.b64 by base-64 encoding certs.jks.

    base64 certs.jks > certs.jks.b64
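
    As a sanity check, decode the file and compare it against the original; no output and a zero exit status mean the encoding round-trips cleanly:

    base64 -d certs.jks.b64 | cmp - certs.jks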
    
  7. (pit#) Inject and encrypt certs.jks.b64 into customizations.yaml.

    cat <<EOF | yq w - 'data."certs.jks"' "$(<certs.jks.b64)" | \
        yq r -j - | ${SITE_INIT}/utils/secrets-encrypt.sh | \
        yq w -f - -i ${SITE_INIT}/customizations.yaml 'spec.kubernetes.sealed_secrets.cray-keycloak'
    {
      "kind": "Secret",
      "apiVersion": "v1",
      "metadata": {
        "name": "keycloak-certs",
        "namespace": "services",
        "creationTimestamp": null
      },
      "data": {}
    }
    EOF
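
    To confirm the injection succeeded, read back the new entry. Assuming secrets-encrypt.sh rendered it as a Sealed Secret, the following should print SealedSecret:

    yq read "${SITE_INIT}/customizations.yaml" 'spec.kubernetes.sealed_secrets.cray-keycloak.kind'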
    
  8. (pit#) Update the keycloak_users_localize sealed secret with the appropriate value for ldap_connection_url.

    1. (pit#) Set ldap_connection_url in customizations.yaml.

      yq write -i "${SITE_INIT}/customizations.yaml" \
               'spec.kubernetes.sealed_secrets.keycloak_users_localize.generate.data.(args.name==ldap_connection_url).args.value' \
               "ldaps://${LDAP}"
      
    2. (pit#) Review the keycloak_users_localize sealed secret.

      yq read "${SITE_INIT}/customizations.yaml" spec.kubernetes.sealed_secrets.keycloak_users_localize
      
  9. Configure the ldapSearchBase and localRoleAssignments settings for the cray-keycloak-users-localize chart in customizations.yaml.

    NOTE There may be one or more groups in LDAP for admins and one or more for users. Each admin group needs to be assigned the admin role, and each user group the user role, for both the shasta and cray clients in Keycloak.

    1. (pit#) Set ldapSearchBase in customizations.yaml.

      NOTE This example sets ldapSearchBase to dc=dcldap,dc=dit

      yq write -i "${SITE_INIT}/customizations.yaml" spec.kubernetes.services.cray-keycloak-users-localize.ldapSearchBase 'dc=dcldap,dc=dit'
      
    2. (pit#) Set localRoleAssignments in customizations.yaml.

      NOTE This example sets localRoleAssignments for the LDAP groups employee, craydev, and shasta_admins to be the admin role, and the LDAP group shasta_users to be the user role.

      yq write -s - -i "${SITE_INIT}/customizations.yaml" <<EOF
      - command: update
        path: spec.kubernetes.services.cray-keycloak-users-localize.localRoleAssignments
        value:
        - {"group": "employee", "role": "admin", "client": "shasta"}
        - {"group": "employee", "role": "admin", "client": "cray"}
        - {"group": "craydev", "role": "admin", "client": "shasta"}
        - {"group": "craydev", "role": "admin", "client": "cray"}
        - {"group": "shasta_admins", "role": "admin", "client": "shasta"}
        - {"group": "shasta_admins", "role": "admin", "client": "cray"}
        - {"group": "shasta_users", "role": "user", "client": "shasta"}
        - {"group": "shasta_users", "role": "user", "client": "cray"}
      EOF
      
    3. (pit#) Review the cray-keycloak-users-localize values.

      yq read "${SITE_INIT}/customizations.yaml" spec.kubernetes.services.cray-keycloak-users-localize
      

Customize DNS configuration

  1. (pit#) Configure the Unbound DNS resolver (if needed).

    Important If access to a site DNS server is required and this DNS server was specified to csi using the site-dns option (either on the command line or in the system_config.yaml file), then no further action is required and this step should be skipped.

    The default configuration is as follows:

    cray-dns-unbound:
        domain_name: '{{ network.dns.external }}'
        forwardZones:
          - name: "."
            forwardIps:
              - "{{ network.netstaticips.system_to_site_lookups }}"
    

    The configured site DNS server can be verified by inspecting the value set for system_to_site_lookups.

    yq read "${SITE_INIT}/customizations.yaml" spec.network.netstaticips.system_to_site_lookups
    

    Possible output:

    172.30.84.40
    

    If there is no requirement to resolve external hostnames (including other services on the site network), or there is no upstream DNS server, then the cray-dns-unbound service should be configured to forward to the cray-dns-powerdns service.

    1. (pit#) Update the forwardZones configuration for the cray-dns-unbound service to point to the cray-dns-powerdns service.

      yq write -s - -i ${SITE_INIT}/customizations.yaml <<EOF
      - command: update
        path: spec.kubernetes.services.cray-dns-unbound.forwardZones
        value:
        - name: "."
          forwardIps:
          - "10.92.100.85"
      EOF
      
    2. (pit#) Review the cray-dns-unbound values.

      IMPORTANT Do not remove the domain_name entry; it is required for Unbound to forward requests to PowerDNS correctly.

      yq read "${SITE_INIT}/customizations.yaml" spec.kubernetes.services.cray-dns-unbound
      

      Expected output:

      domain_name: '{{ network.dns.external }}'
      forwardZones:
        - name: "."
          forwardIps:
            - "10.92.100.85"
      

    See the related documentation regarding known issues when operating with no upstream DNS server.

  2. (Optional) Configure PowerDNS zone transfer and DNSSEC. See the PowerDNS Configuration Guide for more information.

    • If zone transfer is to be configured, then review customizations.yaml and ensure that the primary_server, secondary_servers, and notify_zones values are set correctly (see the review sketch after this list).

    • If DNSSEC is to be used, then add the desired keys into the dnssec SealedSecret.
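
    If zone transfer is being configured, those values can be reviewed in place. The path below assumes they live under the cray-dns-powerdns chart values, mirroring the other services on this page; consult the PowerDNS Configuration Guide if the layout differs:

      yq read "${SITE_INIT}/customizations.yaml" spec.kubernetes.services.cray-dns-powerdns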

Set storage limit for Thanos S3 bucket

By default, there is no retention policy set for Thanos object storage data, which means that data is retained forever. Retention can be configured by using the --retention.resolution-raw, --retention.resolution-5m, and --retention.resolution-1h flags. Leaving these flags unset, or setting them to 0s, means no retention policy is configured.

Thanos object storage is deployed by the cray-sysmgmt-health chart to the sysmgmt-health namespace. To set storage limits for the Thanos S3 bucket, configure the thanosCompactor settings for the cray-sysmgmt-health chart in customizations.yaml.

  1. (pit#) Set thanosCompactor in customizations.yaml.

    yq write -s - -i "${SITE_INIT}/customizations.yaml" <<EOF
    - command: update
      path: spec.kubernetes.services.cray-sysmgmt-health.thanosCompactor
      value:
         resolutionraw: 15d
         resolution5m: 15d
         resolution1h: 15d
    EOF
    
  2. (pit#) Review the thanosCompactor values.

    yq read "${SITE_INIT}/customizations.yaml" spec.kubernetes.services.cray-sysmgmt-health.thanosCompactor
    

    Example output is:

    resolutionraw: 15d
    resolution5m: 15d
    resolution1h: 15d
    

NOTE The recommended storage limit to configure for Thanos is 15d-30d for all resolutions. As a rule of thumb, retention for each downsampling level should be the same and should be greater than the maximum date range (10 days for 5m to 1h downsampling).

Configure Prometheus SNMP Exporter

The Prometheus SNMP exporter must be configured with a list of management network switches from which to scrape metrics in order to populate the System Health Service Grafana dashboards.

NOTE For the Prometheus SNMP exporter to work, SNMP must be configured on the management switches, and the username and password must match in both Vault and the corresponding sealed secret in customizations.yaml. See the Prometheus SNMP Exporter page for more information, and review the Adding SNMP Credentials to the System section for links to the relevant procedures.
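
As an illustrative sketch only: the values path (spec.kubernetes.services.cray-sysmgmt-health.prometheus-snmp-exporter.serviceMonitor.params), the entry name snmp1, the switch hostname sw-leaf-bmc-001.hmn, and the module if_mib_ifalias below are placeholders; confirm them against the Prometheus SNMP Exporter page for the release in use. The switch list would be set with the same yq pattern used elsewhere on this page:

    yq write -s - -i "${SITE_INIT}/customizations.yaml" <<EOF
    - command: update
      path: spec.kubernetes.services.cray-sysmgmt-health.prometheus-snmp-exporter.serviceMonitor.params
      value:
      - name: snmp1                   # placeholder entry name
        target: sw-leaf-bmc-001.hmn   # placeholder management switch hostname
        module:
        - if_mib_ifalias              # placeholder SNMP exporter module
    EOF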

4. Encrypt secrets

  1. (pit#) Load the zeromq container image required by Sealed Secret Generators.

    NOTE Requires a properly configured Docker or Podman environment.

    "${CSM_PATH}/hack/load-container-image.sh" artifactory.algol60.net/csm-docker/stable/docker.io/zeromq/zeromq:v4.0.5
    
  2. (pit#) Re-encrypt existing secrets.

    "${SITE_INIT}/utils/secrets-reencrypt.sh" \
        "${SITE_INIT}/customizations.yaml" \
        "${SITE_INIT}/certs/sealed_secrets.key" \
        "${SITE_INIT}/certs/sealed_secrets.crt"
    

    It is not an error if this script gives no output.

  3. (pit#) Generate secrets.

    "${SITE_INIT}/utils/secrets-seed-customizations.sh" "${SITE_INIT}/customizations.yaml"
    
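    As a quick, hedged sanity check, the generate blocks should now have been rendered into Sealed Secrets; a nonzero count from the following suggests that generation succeeded:

    grep -c 'kind: SealedSecret' "${SITE_INIT}/customizations.yaml"
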
  4. Leave the site-init directory.

    cd "${PITDATA}"
    
  5. site-init is now prepared. Resume Initialize the LiveCD.

5. Customer-specific customizations

Customer-specific customizations are any changes on top of the baseline configuration to satisfy customer-specific requirements. It is recommended that customer-specific customizations be tracked on branches separate from the mainline in order to make them easier to manage.

Apply any customer-specific customizations by merging the corresponding branches into the master branch of site-init.

When considering merges, and especially when resolving conflicts, carefully examine differences to ensure all changes are relevant. For example, when applying a customer-specific customization used in a prior version, be sure the change still makes sense. It is common for options to change as new features are introduced and bugs are fixed.
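
A minimal sketch of that workflow, assuming site-init is tracked in Git and a customer-specific change lives on a hypothetical branch named site-customizations:

    cd "${SITE_INIT}"

    # Inspect the incoming changes before merging (hypothetical branch name)
    git diff master..site-customizations -- customizations.yaml

    # Merge onto the mainline, resolving any conflicts with the above considerations in mind
    git checkout master
    git merge site-customizations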