This document guides an administrator through the patch update to Cray Systems Management v1.2.2 from v1.2.0 or v1.2.1.
If upgrading from CSM v1.0.x directly to v1.2.2, follow the procedures described in Upgrade CSM instead.
This patch includes fixes and updates related to the following:
- AppVersion that was being reported from csi version report
- precache chart upgrade (for better performance)
- etcd_database_health check in the ncn-healthcheck
- postgres database backups that caused them to fail to restore, and cleanup of existing (bad) postgres backups on the system
- cray-dns-unbound
- powerDNS in an air-gapped environment
- dhcp-helper logic updating IP addresses
- snmp credentials being set on leaf switches were being lost
- cray-hmcollector-poll pod was not collecting river telemetry due to a check the collector does against the SMA kafka instance
- Shasta realm and only allow Keycloak administration through CMN
- CVE-2020-10770 for keycloak
- App.version field in csi version command
- capmc to use the PATCH URI when trying to set multiple controls for Olympus hardware
- vcs data when there are extra spaces in the pod name
- request-ncn-join-token (to avoid issues with spire tokens)
- canu commands
- --no-cache flag when resuming CSM services install
- external SSH test
- Site Init documentation on external hosts
- powerDNS in an air-gapped environment
- goss test
- kdump (kernel dump) may hang and fail on NCNs in CSM 1.2 (HPE Cray EX System Software 22.07 release). During the upgrade, a workaround is applied to fix this.

If you are using the CHN network tech preview, upgrade the CSM management network configuration before proceeding with the patch installation.
Detailed information on the fixes and configuration updates after CANU release 1.6.5 can be found in the CANU release notes.
Start a typescript on ncn-m001 to capture the commands and output from this procedure.
ncn-m001# script -af csm-update.$(date +%Y-%m-%d).txt
ncn-m001# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
Download and extract the CSM v1.2.2 release to ncn-m001.
Set CSM_DISTDIR to the directory of the extracted files.
IMPORTANT: If necessary, change this command to match the actual location of the extracted files.
ncn-m001# CSM_DISTDIR="$(pwd)/csm-1.2.2"
ncn-m001# echo "${CSM_DISTDIR}"
Set CSM_RELEASE_VERSION to the CSM release version.
ncn-m001# export CSM_RELEASE_VERSION="$(${CSM_DISTDIR}/lib/version.sh --version)"
ncn-m001# echo "${CSM_RELEASE_VERSION}"
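Before continuing, it can help to confirm that CSM_DISTDIR really points at the extracted v1.2.2 release. The following is a minimal sketch, assuming that lib/version.sh --version prints the bare version string (for example 1.2.2); check_csm_distdir is a hypothetical helper name and is not part of the CSM release.

```shell
# Hypothetical helper (not shipped with CSM): verify that DIR contains an
# executable lib/version.sh and that it reports EXPECTED_VERSION.
# Assumes "lib/version.sh --version" prints only the version string.
check_csm_distdir() {
    dir=$1
    expected=$2
    if [ ! -x "${dir}/lib/version.sh" ]; then
        echo "ERROR: ${dir}/lib/version.sh not found or not executable" >&2
        return 1
    fi
    actual=$("${dir}/lib/version.sh" --version) || return 1
    if [ "$actual" != "$expected" ]; then
        echo "ERROR: expected CSM ${expected}, found ${actual} in ${dir}" >&2
        return 1
    fi
    echo "CSM_DISTDIR OK: ${dir} (version ${actual})"
}

# Usage on ncn-m001:
#   check_csm_distdir "${CSM_DISTDIR}" 1.2.2
```

Comparing against an explicit expected version, rather than only echoing the value, makes a mismatched or partially extracted tarball fail loudly before setup-nexus.sh is run.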
Download and install/upgrade the latest documentation on ncn-m001.
Run lib/setup-nexus.sh to configure Nexus and upload new CSM RPM repositories, container images, and Helm charts:
ncn-m001# cd "$CSM_DISTDIR"
ncn-m001# ./lib/setup-nexus.sh
On success, setup-nexus.sh will output OK on stderr and exit with status code 0. For example:
ncn-m001# ./lib/setup-nexus.sh
[... output omitted ...]
+ Nexus setup complete
setup-nexus.sh: OK
ncn-m001# echo $?
0
In the event of an error, consult Troubleshoot Nexus to resolve potential problems and then try running setup-nexus.sh again. Note that subsequent runs of setup-nexus.sh may report FAIL when uploading duplicate assets. This is okay as long as setup-nexus.sh outputs setup-nexus.sh: OK and exits with status code 0.
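The success criteria above (exit status 0 plus an OK line on stderr, with FAIL lines for duplicate assets tolerated) can be checked in a single step. This is a minimal sketch, assuming the OK line appears on stderr at the start of a line exactly as setup-nexus.sh: OK; run_and_check_nexus is a hypothetical helper, not part of the CSM release. Because stderr is captured to a temporary file and replayed afterward, stderr output is shown only after the script finishes.

```shell
# Hypothetical helper (not shipped with CSM): run a command, capture its
# stderr to a temporary log, and succeed only if the command exited 0 AND
# its stderr contains a line starting with "setup-nexus.sh: OK".
# FAIL lines for duplicate assets are deliberately ignored.
run_and_check_nexus() {
    nexus_log=$(mktemp) || return 1
    "$@" 2>"$nexus_log"
    nexus_status=$?
    # Replay the captured stderr so it is still visible to the operator.
    cat "$nexus_log" >&2
    if [ "$nexus_status" -eq 0 ] && grep -q '^setup-nexus\.sh: OK' "$nexus_log"; then
        rm -f "$nexus_log"
        return 0
    fi
    echo "setup-nexus.sh did not report OK (exit status ${nexus_status}); log kept at ${nexus_log}" >&2
    return 1
}

# Usage on ncn-m001:
#   cd "$CSM_DISTDIR" && run_and_check_nexus ./lib/setup-nexus.sh
```

Requiring both conditions matches the rule stated above: a nonzero exit status, or a missing OK line, is treated as a failure even if most uploads succeeded.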
Run upgrade.sh to deploy upgraded CSM applications and services:
ncn-m001# cd "$CSM_DISTDIR"
ncn-m001# ./upgrade.sh
Verify that the new CSM version is in the product catalog.
Run the following command and verify that the new CSM version is listed in its output:
ncn-m001# kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' | yq r -j - | jq -r 'to_entries[] | .key' | sort -V
Example output that includes the new CSM version (1.2.2):
0.9.2
0.9.3
0.9.4
0.9.5
0.9.6
1.0.1
1.0.10
1.2.0
1.2.1
1.2.2
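Instead of scanning the list by eye, the output of the product catalog command can be piped into a check for an exact match. This is a minimal sketch; csm_version_listed is a hypothetical helper, and it reuses the kubectl, yq, and jq pipeline shown above unchanged.

```shell
# Hypothetical helper (not shipped with CSM): read catalog version keys
# (one per line) on stdin and succeed only if the version given as $1
# appears as an exact, whole-line, fixed-string match.
csm_version_listed() {
    # -x: whole-line match, so 1.2.2 does not match 1.2.20
    # -F: fixed string, so the dots are not regex wildcards
    # Output goes to /dev/null instead of using -q so that grep reads all
    # of its input and does not close the pipe early on upstream commands.
    if grep -xF -- "$1" >/dev/null; then
        echo "CSM $1 found in cray-product-catalog"
    else
        echo "CSM $1 NOT found in cray-product-catalog" >&2
        return 1
    fi
}

# Usage on ncn-m001:
#   kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' \
#       | yq r -j - | jq -r 'to_entries[] | .key' \
#       | csm_version_listed "${CSM_RELEASE_VERSION}"
```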
Confirm that the product catalog has an accurate timestamp for the CSM upgrade.
Run the following command and confirm that the import_date value reflects the timestamp of the upgrade:
ncn-m001# kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' | yq r - '"1.2.2".configuration.import_date'
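The import_date check can also be scripted. This is a minimal sketch under two assumptions that are not confirmed by this procedure: that yq prints import_date as a single unquoted line, and that the value begins with a YYYY-MM-DD date. import_date_on is a hypothetical helper name.

```shell
# Hypothetical helper (not shipped with CSM): read one import_date value on
# stdin and succeed only if it is non-empty and starts with the date prefix
# given as $1 (ASSUMPTION: import_date begins with YYYY-MM-DD).
import_date_on() {
    import_date=
    IFS= read -r import_date
    if [ -z "$import_date" ] || [ "$import_date" = "null" ]; then
        echo "import_date is empty or missing" >&2
        return 1
    fi
    case "$import_date" in
        "$1"*) echo "import_date ${import_date} starts with $1" ;;
        *) echo "import_date '${import_date}' does not start with $1" >&2; return 1 ;;
    esac
}

# Usage on ncn-m001, on the day the upgrade was run:
#   kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' \
#       | yq r - '"1.2.2".configuration.import_date' \
#       | import_date_on "$(date +%Y-%m-%d)"
```

If the upgrade ran on an earlier day, pass that date as the argument instead of the output of date.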
Remember to exit the typescript that was started at the beginning of the upgrade.
ncn-m001# exit
It is recommended to save the typescript file for later reference.