This document guides an administrator through the patch update to Cray Systems Management v1.4.1 from CSM v1.4.0. If upgrading from CSM v1.3.x, follow the procedures described in CSM major/minor version upgrade instead.
In the unusual situation of upgrading from a pre-release version of CSM v1.4.0, also follow the procedures described in CSM major/minor version upgrade.
Also note that there is no need to perform intermediate CSM v1.4 patch upgrades. Instead, consider upgrading to the latest CSM v1.4 patch release. See CSM patch version upgrade for the full list of patch versions.
bos API specification
hms-rts chart for deployment of multiple back ends
pcs to be able to report power status of RTS management switches
MgmtHLSwitch and CDUMgmtSwitch to vault to be able to set their SNMP credentials
RTS switches and updating their data in hardware state manager
Cray-HPE codebase
metacontroller:v4.4.0 container image
bos v2 to filter out any nodes disabled in hardware state manager at bos session creation
crus from the cray-cli
cray fas loader due to a python library change
goss-servers.service that caused extraneous messages to print to the console
upload_ceph_images_to_nexus
cray-dns-unbound helm chart leading to deletion of DNS records
SNMP set up for all switches to the install and upgrade instructions
grok-exporter not running on the ncn-m001 node
cray-sat:3.21.4 container image
cilium:v1.12.4 container image
hsm_discovery_status_test error
bos v2 setting the wrong status at scale
goss-platform-ca-in-bundle test time out
bos log with bos shutdown failure
bos v1 session create API specification to fix missing required parameters
bos v1 list sessions
bos v1 session
bos v2 sessiontemplatetemplate endpoint
python 3.11 in bos server
argoexec container image
prerequisites.sh for upgrading nls
DNS records from the configmap when restarting kea
cray-drydock for communications between mqtt and spire
bos API specification is accurate for get or list sessiontemplates endpoints
cfs-ara:1.0.2 container image
cray-dns-unbound during the install of csm services
bos sessiontemplate names
Validate CSM health.
See Validate CSM Health.
Run the CSM health checks to ensure that everything is working properly before the upgrade starts. After the upgrade is completed, another health check is performed. It is important to know if any problems observed at that time existed prior to the upgrade.
IMPORTANT: See the CSM Install Validation and Health Checks procedures in the documentation for the CURRENT CSM version on the system. The validation procedures in the CSM documentation are only intended to work with that specific version of CSM.
(ncn-m001#) Start a typescript on ncn-m001 to capture the commands and output from this procedure.
script -af csm-update.$(date +%Y-%m-%d).txt
export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
Download and extract the CSM v1.4.1 release to ncn-m001.
(ncn-m001#) Set CSM_DISTDIR to the directory of the extracted files.
IMPORTANT: If necessary, change this command to match the actual location of the extracted files.
export CSM_DISTDIR="$(pwd)/csm-1.4.1"
echo "${CSM_DISTDIR}"
(ncn-m001#) Set CSM_RELEASE_VERSION to the CSM release version.
export CSM_RELEASE_VERSION="$(${CSM_DISTDIR}/lib/version.sh --version)"
echo "${CSM_RELEASE_VERSION}"
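Before continuing, it may help to confirm that both variables describe a usable release directory. The following is a minimal sketch rather than part of the CSM tooling: the function name check_csm_env is hypothetical, and the only assumption about the release layout is the one the step above already relies on, namely that lib/version.sh exists under CSM_DISTDIR.

```bash
# Hypothetical helper (not shipped with CSM): sanity-check the variables
# exported in the previous steps.
check_csm_env() {
    # Both variables must be set and non-empty.
    if [ -z "${CSM_DISTDIR:-}" ] || [ -z "${CSM_RELEASE_VERSION:-}" ]; then
        echo "ERROR: CSM_DISTDIR and CSM_RELEASE_VERSION must both be set" >&2
        return 1
    fi
    # The extracted release must contain the version script used above.
    if [ ! -f "${CSM_DISTDIR}/lib/version.sh" ]; then
        echo "ERROR: ${CSM_DISTDIR}/lib/version.sh not found" >&2
        return 1
    fi
    echo "OK: CSM ${CSM_RELEASE_VERSION} in ${CSM_DISTDIR}"
}
```

Calling check_csm_env after the two export commands prints a single OK line on success and returns a non-zero status otherwise.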
Download and install/upgrade the latest documentation on ncn-m001.
(ncn-m001#) Run lib/setup-nexus.sh to configure Nexus and upload new CSM RPM repositories, container images, and Helm charts:
cd "$CSM_DISTDIR"
./lib/setup-nexus.sh ; echo "RC=$?"
On success, the output should end with the following:
+ Nexus setup complete
setup-nexus.sh: OK
RC=0
In the event of an error, consult Troubleshoot Nexus to resolve potential problems and then try running setup-nexus.sh again. Note that subsequent runs of setup-nexus.sh may report FAIL when uploading duplicate assets. This is okay as long as setup-nexus.sh outputs setup-nexus.sh: OK and exits with status code 0.
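Because reruns can print FAIL lines that are harmless, scanning the output by eye is error-prone. The sketch below checks a saved log for the final OK marker only. The function name nexus_setup_ok and the log file name are hypothetical, and the helper assumes the marker appears within the last few lines of output, as in the example above.

```bash
# Hypothetical helper: succeed only if a saved setup-nexus.sh log contains the
# "setup-nexus.sh: OK" marker near its end.
nexus_setup_ok() {
    local log_file="$1"
    # Intermediate FAIL lines for duplicate assets are deliberately ignored;
    # only the closing marker is inspected (-F: match the text literally).
    tail -n 5 "$log_file" | grep -qF 'setup-nexus.sh: OK'
}
```

One way to use it, assuming bash: run ./lib/setup-nexus.sh 2>&1 | tee setup-nexus.log, record the script's own exit status from ${PIPESTATUS[0]}, and then call nexus_setup_ok setup-nexus.log. Both the exit status and the marker check need to pass.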
(ncn-m001#) Run the following script in preparation for the CSM v1.4.1 patch upgrade:
# Mark every Argo CRD as owned by the cray-nls Helm release in the argo namespace.
for c in $(kubectl get crd | grep argo | cut -d' ' -f1)
do
    kubectl label --overwrite crd "$c" app.kubernetes.io/managed-by="Helm"
    kubectl annotate --overwrite crd "$c" meta.helm.sh/release-name="cray-nls"
    kubectl annotate --overwrite crd "$c" meta.helm.sh/release-namespace="argo"
done
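The loop above prints one line per change but does not summarize the result. As a rough follow-up check, the sketch below compares the set of Argo CRDs with the set that carries the Helm label. The function name check_argo_crd_labels is hypothetical, and it verifies only the app.kubernetes.io/managed-by label, not the two annotations.

```bash
# Hypothetical helper: confirm that every CRD whose name contains "argo"
# carries the app.kubernetes.io/managed-by=Helm label.
check_argo_crd_labels() {
    local all labeled
    # All CRDs with "argo" in the name.
    all="$(kubectl get crd -o name | grep argo | sort)"
    # The subset selected by the Helm label.
    labeled="$(kubectl get crd -l app.kubernetes.io/managed-by=Helm -o name | grep argo | sort)"
    if [ -n "$all" ] && [ "$all" = "$labeled" ]; then
        echo "OK: all Argo CRDs are labeled as managed by Helm"
    else
        echo "MISMATCH: some Argo CRDs are missing the label" >&2
        return 1
    fi
}
```

A MISMATCH result suggests rerunning the labeling loop before starting upgrade.sh.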
(ncn-m001#) Run upgrade.sh to deploy upgraded CSM applications and services:
cd "$CSM_DISTDIR"
./upgrade.sh
It is important to upload NCN images to IMS and to edit the cray-product-catalog. This is necessary when updating products with IUF. If this step is skipped, IUF will fail when updating or upgrading products in the future.
(ncn-m001#) Execute the script to upload CSM NCN images and update the cray-product-catalog.
/usr/share/doc/csm/upgrade/scripts/upgrade/upload-ncn-images.sh
(ncn-m001#) Update select RPMs on the NCNs.
pdsh -b -S -w $(grep -oP 'ncn-\w\d+' /etc/hosts | sort -u | tr -t '\n' ',') \
'zypper install -y hpe-csm-goss-package csm-testing goss-servers craycli && systemctl enable goss-servers && systemctl restart goss-servers' \
&& echo PASSED || echo FAILED
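The pdsh target list in this command is derived from /etc/hosts, so it can be worth previewing which nodes will be touched before running it. The sketch below builds the same kind of list without the trailing comma. The function name ncn_list is hypothetical, and the pattern assumes NCN names of the form ncn- followed by one lowercase letter and digits (for example ncn-m001), which is narrower than the \w used in the command above.

```bash
# Hypothetical helper: print a comma-separated, de-duplicated list of NCN
# hostnames found in a hosts file (default: /etc/hosts).
ncn_list() {
    local hosts_file="${1:-/etc/hosts}"
    # -o prints each match on its own line; sort -u removes duplicates;
    # paste -s joins the lines with commas and adds no trailing comma.
    grep -oE 'ncn-[a-z][0-9]+' "$hosts_file" | sort -u | paste -sd, -
}
```

Running ncn_list with no arguments prints the node list for review; the output could also be passed to pdsh -w in place of the inline pipeline.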
Verify that the new CSM version is in the product catalog.
(ncn-m001#) Verify that the new CSM version is listed in the output of the following command:
kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' | yq r -j - | jq -r 'to_entries[] | .key' | sort -V
Example output that includes the new CSM version (1.4.1):
0.9.2
0.9.3
0.9.4
0.9.5
0.9.6
1.0.1
1.0.10
1.2.0
1.2.1
1.2.2
1.3.0
1.3.1
1.4.0
1.4.1
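On systems with a long upgrade history the version list can be lengthy, and a nearby entry such as 1.4.10 is easy to misread as 1.4.1. The sketch below performs an exact, whole-line match on a version list read from standard input. The function name has_version is hypothetical.

```bash
# Hypothetical helper: read one version per line on stdin and succeed only if
# the requested version appears as an exact, whole-line match.
has_version() {
    local wanted="$1"
    # -x: the whole line must match; -F: treat dots literally, not as wildcards.
    grep -qxF "$wanted"
}
```

To use it, pipe the output of the kubectl command above into has_version 1.4.1 and check the exit status, for example by appending && echo PASSED || echo FAILED.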
Confirm that the product catalog has an accurate timestamp for the CSM upgrade.
(ncn-m001#) Confirm that the import_date reflects the timestamp of the upgrade.
kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' | yq r - '"1.4.1".configuration.import_date'
(ncn-m001#) Remember to exit the typescript that was started at the beginning of the upgrade.
exit
It is recommended to save the typescript file for later reference.