This document guides an administrator through the patch update to Cray Systems Management v1.4.1
from CSM v1.4.0. If upgrading from CSM v1.3.x, then follow the procedures
described in CSM major/minor version upgrade instead.
In the unusual situation of upgrading from a pre-release version of CSM v1.4.0, also follow the procedures
described in CSM major/minor version upgrade rather than this document.
Also note that there is no need to perform intermediate CSM v1.4 patch upgrades. Instead,
consider upgrading to the latest CSM v1.4 patch release. See
CSM patch version upgrade for the full list of patch versions.
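The choice of procedure described above can be sketched as a small shell function. This is only an illustration: `upgrade_path` is a hypothetical helper, not part of CSM, and the assumption that pre-release builds carry a suffix such as `1.4.0-rc.1` is not confirmed by this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with CSM): map the installed CSM version
# to the upgrade procedure that applies.
upgrade_path() {
    local installed="$1"
    case "${installed}" in
        1.4.0)   echo "patch" ;;        # released v1.4.0: this document applies
        1.4.0-*) echo "major-minor" ;;  # ASSUMED pre-release naming, e.g. 1.4.0-rc.1
        1.3.*)   echo "major-minor" ;;  # any v1.3.x release
        *)       echo "unknown" ;;      # anything else is out of scope here
    esac
}
```

For example, `upgrade_path 1.3.2` selects the major/minor procedure, while `upgrade_path 1.4.0` selects this patch procedure.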
bos API specification
hms-rts chart for deployment of multiple back ends
pcs to be able to report power status of RTS management switches
MgmtHLSwitch and CDUMgmtSwitch to vault to be able to set their SNMP credentials
RTS switches and updating their data in hardware state manager
Cray-HPE codebase
metacontroller:v4.4.0 container image
bos v2 to filter out any nodes disabled in hardware state manager at bos session creation
crus from the cray-cli
cray fas loader due to a python library change
goss-servers.service that caused extraneous messages to print to the console
upload_ceph_images_to_nexus
cray-dns-unbound helm chart leading to deletion of DNS records
SNMP set up for all switches to the install and upgrade instructions
grok-exporter not running on the ncn-m001 node
cray-sat:3.21.4 container image
cilium:v1.12.4 container image
hsm_discovery_status_test error
bos v2 setting the wrong status at scale
goss-platform-ca-in-bundle test time out
bos log with bos shutdown failure
bos v1 session create API specification to fix missing required parameters
bos v1 list sessions
bos v1 session
bos v2 sessiontemplatetemplate endpoint
python 3.11 in bos server
argoexec container image
prerequisites.sh for upgrading nls
DNS records from the configmap when restarting kea
cray-drydock for communications between mqtt and spire
bos API specification is accurate for get or list sessiontemplates endpoints
cfs-ara:1.0.2 container image
cray-dns-unbound during the install of csm services
bos sessiontemplate names
Validate CSM health.
See Validate CSM Health.
Run the CSM health checks to ensure that everything is working properly before the upgrade starts. After the upgrade is completed, another health check is performed. It is important to know if any problems observed at that time existed prior to the upgrade.
IMPORTANT: See the CSM Install Validation and Health Checks procedures in the documentation for the CURRENT CSM version on the system. The validation procedures in the CSM documentation are only intended to work with that specific version of CSM.
(ncn-m001#) Start a typescript on ncn-m001 to capture the commands and output from this procedure.
script -af csm-update.$(date +%Y-%m-%d).txt
export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
Download and extract the CSM v1.4.1 release to ncn-m001.
(ncn-m001#) Set CSM_DISTDIR to the directory of the extracted files.
IMPORTANT: If necessary, change this command to match the actual location of the extracted files.
export CSM_DISTDIR="$(pwd)/csm-1.4.1"
echo "${CSM_DISTDIR}"
(ncn-m001#) Set CSM_RELEASE_VERSION to the CSM release version.
export CSM_RELEASE_VERSION="$(${CSM_DISTDIR}/lib/version.sh --version)"
echo "${CSM_RELEASE_VERSION}"
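Before continuing, it can help to confirm that the extracted release really is the intended patch version. The following is a minimal sketch; `check_release_version` is a hypothetical helper, and it assumes that `lib/version.sh --version` prints the bare version string (for example `1.4.1`).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with CSM): compare the detected release
# version against the version this procedure expects.
# $1: expected version, $2: detected version
check_release_version() {
    local expected="$1" actual="$2"
    if [ "${actual}" = "${expected}" ]; then
        echo "OK: release version is ${actual}"
    else
        echo "MISMATCH: expected ${expected}, got '${actual}'" >&2
        return 1
    fi
}
```

On ncn-m001 this could be called as `check_release_version 1.4.1 "${CSM_RELEASE_VERSION}"`. A mismatch usually means that `CSM_DISTDIR` points at the wrong directory.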
Download and install/upgrade the latest documentation on ncn-m001.
(ncn-m001#) Run lib/setup-nexus.sh to configure Nexus and upload new CSM RPM
repositories, container images, and Helm charts:
cd "$CSM_DISTDIR"
./lib/setup-nexus.sh ; echo "RC=$?"
On success, the output should end with the following:
+ Nexus setup complete
setup-nexus.sh: OK
RC=0
In the event of an error, consult Troubleshoot Nexus
to resolve potential problems and then run setup-nexus.sh again. Subsequent runs of setup-nexus.sh may
report FAIL when uploading duplicate assets. This is acceptable as long as setup-nexus.sh outputs setup-nexus.sh: OK and exits
with status code 0.
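The success criteria above (exit status 0 and a `setup-nexus.sh: OK` line, regardless of any `FAIL` lines for duplicate assets) can be expressed as a small check. This is a sketch under assumptions: `nexus_setup_ok` and the log file name `nexus.log` are hypothetical, and the output is assumed to have been captured to a file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with CSM): decide whether a setup-nexus.sh
# run succeeded.
# $1: file holding the captured output, $2: exit status of setup-nexus.sh
nexus_setup_ok() {
    local log="$1" rc="$2"
    # FAIL lines for duplicate assets are deliberately ignored; only the exit
    # status and the final OK marker are checked.
    [ "${rc}" -eq 0 ] && grep -Fq 'setup-nexus.sh: OK' "${log}"
}
```

One possible way to use it: `./lib/setup-nexus.sh 2>&1 | tee nexus.log; nexus_setup_ok nexus.log "${PIPESTATUS[0]}" && echo PASSED || echo FAILED`.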
(ncn-m001#) Run the following script in preparation for the CSM v1.4.1 patch upgrade:
# Mark every Argo CRD as managed by the cray-nls Helm release in the argo namespace.
for c in $(kubectl get crd | grep argo | cut -d' ' -f1)
do
    kubectl label --overwrite crd "$c" app.kubernetes.io/managed-by="Helm"
    kubectl annotate --overwrite crd "$c" meta.helm.sh/release-name="cray-nls"
    kubectl annotate --overwrite crd "$c" meta.helm.sh/release-namespace="argo"
done
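To preview which CRDs the loop above will modify, the same selection pipeline can be run on its own without changing anything. The sketch below wraps that pipeline in a hypothetical function, `argo_crd_names`, which reads `kubectl get crd` output on standard input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the CRD names that the labeling loop selects.
# Reads `kubectl get crd` output on stdin. The header line is dropped
# because it does not contain "argo".
argo_crd_names() {
    grep argo | cut -d' ' -f1
}
```

On ncn-m001, `kubectl get crd | argo_crd_names` lists the affected CRDs. Because the filter is a plain substring match, any CRD whose name contains `argo` is selected, so reviewing the list first is a cheap safeguard.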
(ncn-m001#) Run upgrade.sh to deploy upgraded CSM applications and services:
cd "$CSM_DISTDIR"
./upgrade.sh
It is important to upload NCN images to IMS and to edit the cray-product-catalog. This is necessary when updating products
with IUF. If this step is skipped, IUF will fail when updating or upgrading products in the future.
(ncn-m001#) Run the following script to upload the CSM NCN images and update the cray-product-catalog.
/usr/share/doc/csm/upgrade/scripts/upgrade/upload-ncn-images.sh
(ncn-m001#) Update select RPMs on the NCNs.
pdsh -b -S -w $(grep -oP 'ncn-\w\d+' /etc/hosts | sort -u | tr -t '\n' ',') \
'zypper install -y hpe-csm-goss-package csm-testing goss-servers craycli && systemctl enable goss-servers && systemctl restart goss-servers' \
&& echo PASSED || echo FAILED
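The node list passed to `pdsh -w` above is built from `/etc/hosts`. The sketch below isolates that step so the list can be inspected before any RPMs are installed. `ncn_list` is a hypothetical helper. It uses a POSIX character class in place of the Perl-style `\w\d+` and joins names with `paste`, so unlike the original pipeline it emits no trailing comma.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a comma-separated, de-duplicated list of NCN
# hostnames found in a hosts file (default: /etc/hosts).
ncn_list() {
    local hosts_file="${1:-/etc/hosts}"
    grep -oE 'ncn-[[:alnum:]_][0-9]+' "${hosts_file}" | sort -u | paste -sd, -
}
```

Running `ncn_list` on ncn-m001 and checking that every expected master, worker, and storage node appears helps catch a stale `/etc/hosts` before the `zypper` commands run.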
Verify that the new CSM version is in the product catalog.
(ncn-m001#) Verify that the new CSM version is listed in the output of the following command:
kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' | yq r -j - | jq -r 'to_entries[] | .key' | sort -V
Example output that includes the new CSM version (1.4.1):
0.9.2
0.9.3
0.9.4
0.9.5
0.9.6
1.0.1
1.0.10
1.2.0
1.2.1
1.2.2
1.3.0
1.3.1
1.4.0
1.4.1
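Instead of reading the list by eye, the check can be scripted. The following is a minimal sketch; `catalog_has_version` is a hypothetical helper that reads one version per line on standard input, as produced by the command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the exact version string appears as a
# whole line on stdin. -F treats the dots literally and -x requires a
# full-line match, so 1.4.10 does not count as 1.4.1.
catalog_has_version() {
    local want="$1"
    grep -Fxq -- "${want}"
}
```

Piping the output of the catalog query into `catalog_has_version 1.4.1 && echo FOUND || echo MISSING` gives a single-line result that is easy to spot in the typescript.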
Confirm that the product catalog has an accurate timestamp for the CSM upgrade.
(ncn-m001#) Confirm that the import_date reflects the timestamp of the upgrade.
kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' | yq r - '"1.4.1".configuration.import_date'
(ncn-m001#) Remember to exit the typescript that was started at the beginning of the upgrade.
exit
It is recommended to save the typescript file for later reference.