This procedure will install CSM applications and services into the CSM Kubernetes cluster.
NOTE: Check the information in Known issues before starting this procedure to be warned about possible problems.
NOTE: During this step, on systems with only three worker nodes (typically Testing and Development Systems (TDS)), the
customizations.yaml file is automatically edited to lower pod CPU requests for some services, which makes scheduling easier on smaller systems. See the file /var/www/ephemeral/${CSM_RELEASE}/tds_cpu_requests.yaml for these settings. To use different values in customizations.yaml for this system, modify that file before executing the yapl command below. For more information about modifying customizations.yaml and tuning for specific systems, see Post-Install Customizations.
Install YAPL.
pit# rpm -Uvh /var/www/ephemeral/${CSM_RELEASE}/rpm/cray/csm/sle-15sp2/x86_64/yapl-*.x86_64.rpm
Install CSM services using YAPL.
pit# pushd /usr/share/doc/csm/install/scripts/csm_services && \
yapl -f install.yaml execute
pit# popd
NOTES:
- This command may take up to 90 minutes to complete.
- If any errors are encountered, then potential fixes should be displayed where the error occurred.
- If the installation fails with a missing secret error message, then see CSM Services Install Fails Because of Missing Secret.
- Output is redirected to /usr/share/doc/csm/install/scripts/csm_services/yapl.log. To show the output in the terminal, add the --console-output argument before execute in the yapl command.
- The yapl command can safely be rerun. By default, it skips any steps that previously completed successfully. To force it to rerun all steps regardless of what was previously completed, append the --no-cache argument to the yapl command.
- The order of the yapl command arguments is important. The syntax is yapl -f install.yaml [--console-output] execute [--no-cache].
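Because the argument order matters, it can help to assemble the command line in one place rather than typing it by hand. The following is a minimal sketch, not part of the CSM tooling: the helper name build_yapl_cmd and its option keywords (console, no-cache) are hypothetical, and the function only prints the command in the documented order without running it.

```bash
# Hypothetical helper: print the yapl command line with the optional flags
# placed in the documented order:
#   yapl -f install.yaml [--console-output] execute [--no-cache]
build_yapl_cmd() {
    local console="" no_cache="" opt
    for opt in "$@"; do
        case "$opt" in
            console)  console=" --console-output" ;;
            no-cache) no_cache=" --no-cache" ;;
            *) echo "unknown option: $opt" >&2; return 1 ;;
        esac
    done
    # The flags land in fixed positions regardless of the order they were requested in.
    printf 'yapl -f install.yaml%s execute%s\n' "$console" "$no_cache"
}
```

For example, build_yapl_cmd no-cache console prints yapl -f install.yaml --console-output execute --no-cache, which can then be run from the csm_services directory.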
Wait for BSS to be ready.
pit# kubectl -n services rollout status deployment cray-bss
Retrieve an API token.
pit# export TOKEN=$(curl -k -s -S -d grant_type=client_credentials \
-d client_id=admin-client \
-d client_secret=`kubectl get secrets admin-client-auth -o jsonpath='{.data.client-secret}' | base64 -d` \
https://api-gw-service-nmn.local/keycloak/realms/shasta/protocol/openid-connect/token | jq -r '.access_token')
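The export above succeeds even when the token request fails, so it may be worth checking the value before using it. The sketch below assumes nothing beyond standard shell behavior: jq -r prints the literal string null when the access_token field is missing, and prints nothing when its input is empty. The helper name token_is_set is hypothetical.

```bash
# Hypothetical helper: succeed only if the given string looks like a usable token.
# jq -r prints the literal string "null" when .access_token is absent,
# so both an empty value and "null" are treated as failures.
token_is_set() {
    case "${1:-}" in
        ""|null) return 1 ;;
        *)       return 0 ;;
    esac
}
```

One possible use is token_is_set "${TOKEN}" || echo "Token retrieval failed" >&2 before continuing to the next step.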
Create empty boot parameters.
pit# curl -i -k -H "Authorization: Bearer ${TOKEN}" -X PUT \
https://api-gw-service-nmn.local/apis/bss/boot/v1/bootparameters \
--data '{"hosts":["Global"]}'
Example of successful output:
HTTP/2 200
content-type: application/json; charset=UTF-8
date: Mon, 27 Jun 2022 17:08:55 GMT
content-length: 0
x-envoy-upstream-service-time: 7
server: istio-envoy
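When scripting this step, the status code can be pulled out of the curl -i output instead of being read by eye. This is a minimal sketch with a hypothetical helper name, http_status; it assumes the status line is the first line of the output, as in the example above.

```bash
# Hypothetical helper: read `curl -i` output on stdin and print the status code
# from the first line (for example, "HTTP/2 200" yields "200").
# HTTP header lines end in CRLF, so the trailing carriage return is stripped.
http_status() {
    awk 'NR == 1 { print $2; exit }' | tr -d '\r'
}
```

Piping the output of the curl command from this step into http_status should print 200 on success.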
Restart the spire-update-bss job.
pit# SPIRE_JOB=$(kubectl -n spire get jobs -l app.kubernetes.io/name=spire-update-bss -o name)
pit# kubectl -n spire get "${SPIRE_JOB}" -o json | jq 'del(.spec.selector)' \
| jq 'del(.spec.template.metadata.labels."controller-uid")' \
| kubectl replace --force -f -
Wait for the spire-update-bss job to complete.
pit# kubectl -n spire wait "${SPIRE_JOB}" --for=condition=complete --timeout=5m
Wait at least 15 minutes to let the various Kubernetes resources initialize and start before proceeding with the rest of the install. Because there are a number of dependencies between them, some services are not expected to work immediately after the install script completes.
The next step is to validate CSM health before redeploying the final NCN.
See Validate CSM health before final NCN deployment.
Deploy CSM Applications and Services known issues
The following error may occur during the Deploy CSM Applications and Services step:
+ csi upload-sls-file --sls-file /var/www/ephemeral/prep/eniac/sls_input_file.json
2021/10/05 18:42:58 Retrieving S3 credentials ( sls-s3-credentials ) for SLS
2021/10/05 18:42:58 Unable to SLS S3 secret from k8s:secrets "sls-s3-credentials" not found
Verify that the sls-s3-credentials secret exists in the default namespace:
pit# kubectl get secret sls-s3-credentials
Example output:
NAME TYPE DATA AGE
sls-s3-credentials Opaque 7 28d
Check for sonar-sync jobs. If none are listed, then wait for the CronJob to start one and for that job to complete. The sonar-sync CronJob is responsible
for copying the sls-s3-credentials secret from the default namespace to the services namespace.
pit# kubectl -n services get pods -l cronjob-name=sonar-sync
Example output:
NAME READY STATUS RESTARTS AGE
sonar-sync-1634322840-4fckz 0/1 Completed 0 73s
sonar-sync-1634322900-pnvl6 1/1 Running 0 13s
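To decide whether a sonar-sync job has already finished, the table above can be filtered on its STATUS column. The following sketch uses a hypothetical helper, count_completed, and assumes the default kubectl get pods column layout shown in the example output (STATUS is the third column).

```bash
# Hypothetical helper: read `kubectl get pods` table output on stdin and print
# the number of pods whose STATUS column is "Completed".
count_completed() {
    # NR > 1 skips the header row; "n + 0" prints 0 when nothing matched.
    awk 'NR > 1 && $3 == "Completed" { n++ } END { print n + 0 }'
}
```

Piping the output of the kubectl command above into count_completed prints 1 for the example shown, meaning one sonar-sync run has finished.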
Verify that the sls-s3-credentials secret now exists in the services namespace.
pit# kubectl -n services get secret sls-s3-credentials
Example output:
NAME TYPE DATA AGE
sls-s3-credentials Opaque 7 20s
Running the yapl command again is expected to succeed.
Error releasing chart known issues
Some chart installation errors may occur during the Deploy CSM Applications and Services step:
Example output (Constraint kind not recognized):
ERR Error releasing chart gatekeeper-constraints v0.5.0: Shell error: Release "gatekeeper-constraints" does not exist. Installing it now.
Error: admission webhook "validation.gatekeeper.sh" denied the request: Constraint kind K8sPSPFSGroup is not recognized chart=gatekeeper-constraints command=ship namespace=gatekeeper-system version=0.5.0
Another example output (connection refused):
ERR Error releasing chart cray-metallb v1.1.1: Shell error: Release "cray-metallb" does not exist. Installing it now.
Error: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://cray-sysmgmt-health-promet-operator.sysmgmt-health.svc:443/admission-prometheusrules/mutate?timeout=30s": dial tcp 10.17.87.159:443: connect: connection refused chart=cray-metallb command=ship namespace=metallb-system version=1.1.1
Because most chart release errors are timing-related or transient, running the yapl command again (possibly a few times) is expected to succeed.
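Rerunning a command a few times can be wrapped in a small loop. The sketch below is a generic retry helper, not something provided by yapl or CSM; the function name retry and the attempt count and delay in the usage note are illustrative assumptions.

```bash
# Hypothetical helper: run a command up to max_attempts times, sleeping
# delay seconds between attempts, and stop at the first success.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    while :; do
        "$@" && return 0
        # Give up once the last permitted attempt has failed.
        [ "$attempt" -ge "$max_attempts" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, from the csm_services directory, retry 3 60 yapl -f install.yaml execute would make up to three attempts one minute apart. Because yapl skips steps that already completed, each attempt resumes where the previous one stopped.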
Setup Nexus known issues
Known potential issues along with suggested fixes are listed in Troubleshoot Nexus.