This document guides an administrator through the patch update to Cray Systems Management v1.3.1 from v1.3.0.
If upgrading from CSM v1.2.2 directly to v1.3.1, follow the procedures described in Upgrade CSM instead.
cfs-operator for fixed session memory limits.
Proliant iLO (Redfish).
max_concurrent tuning to 10000.
cray-dns-unbound.
non-rootfs_providers to specify root=<values> in parameters sessiontemplates.
CVE-2020-10770 via OPA Policy (API AuthZ).
hms-collector) to Kafka messages to ensure events are sent to the same Kafka partition. The message key is the BMC Xname concatenated with the Redfish Event Message ID. For example: x3000c0s11b4.EventLog.1.0.PowerStatusChange.
jwks URL in cray-opa to ingress gateway.
cfs-operator to remove the high priority on pods.
cray-console-* timeout to allow more time for post-upgrade hooks to complete.
cray-dhcp-kea timeout on readiness check to from default value.
loadstate and dumpstate.
fs.inotify.max_user_watches on Kubernetes worker nodes in response to kubectl logs -f returning no space errors.
(ncn-m001#) Start a typescript on ncn-m001 to capture the commands and output from this procedure.
script -af csm-update.$(date +%Y-%m-%d).txt
export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
Download and extract the CSM v1.3.1 release to ncn-m001.
(ncn-m001#) Set CSM_DISTDIR to the directory of the extracted files.
IMPORTANT: If necessary, change this command to match the actual location of the extracted files.
CSM_DISTDIR="$(pwd)/csm-1.3.1"
echo "${CSM_DISTDIR}"
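Before continuing, it can be worth confirming that CSM_DISTDIR really points at an extracted release. The following is a minimal sketch and not part of the official procedure: the function name check_distdir is hypothetical, and the three paths it looks for are simply the scripts that this procedure invokes in later steps.

```shell
# Hypothetical helper: check that a directory looks like an extracted CSM release.
# It only tests for the scripts used later in this procedure; it does not
# validate their contents or the release version.
check_distdir() {
    local dir="$1" f missing=0
    for f in lib/version.sh lib/setup-nexus.sh upgrade.sh; do
        if [ ! -f "${dir}/${f}" ]; then
            echo "missing: ${dir}/${f}" >&2
            missing=1
        fi
    done
    # Returns 0 when every file was found, 1 otherwise.
    return "${missing}"
}

# Usage (run after setting CSM_DISTDIR):
#   check_distdir "${CSM_DISTDIR}" && echo "CSM_DISTDIR looks OK"
```

If any file is reported missing, correct CSM_DISTDIR before running the remaining steps.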
(ncn-m001#) Set CSM_RELEASE_VERSION to the CSM release version.
export CSM_RELEASE_VERSION="$("${CSM_DISTDIR}/lib/version.sh" --version)"
echo "${CSM_RELEASE_VERSION}"
Download and install/upgrade the latest documentation on ncn-m001.
(ncn-m001#) Run lib/setup-nexus.sh to configure Nexus and upload new CSM RPM repositories, container images, and Helm charts:
cd "$CSM_DISTDIR"
./lib/setup-nexus.sh ; echo "RC=$?"
On success, setup-nexus.sh will output OK on stderr and exit with status code 0. For example:
+ Nexus setup complete
setup-nexus.sh: OK
RC=0
In the event of an error, consult Troubleshoot Nexus to resolve potential problems and then try running setup-nexus.sh again. Note that subsequent runs of setup-nexus.sh may report FAIL when uploading duplicate assets. This is okay as long as setup-nexus.sh outputs setup-nexus.sh: OK and exits with status code 0.
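The two success criteria above (the OK line and a zero exit status) can be checked mechanically when the script output is saved to a file. The sketch below assumes that both stdout and stderr were redirected to a log file and that the exit status was recorded separately; the function name nexus_setup_ok and the file name nexus.log are hypothetical.

```shell
# Hypothetical helper: succeed only if the recorded exit status is 0 AND the
# captured output contains the string "setup-nexus.sh: OK".
# FAIL lines from duplicate asset uploads are deliberately not treated as errors.
nexus_setup_ok() {
    local log="$1" rc="$2"
    [ "${rc}" -eq 0 ] && grep -qF 'setup-nexus.sh: OK' "${log}"
}

# Example capture (assumed log file name). stderr is included because the
# OK line is written there:
#   ./lib/setup-nexus.sh > nexus.log 2>&1; rc=$?
#   nexus_setup_ok nexus.log "${rc}" && echo "Nexus setup succeeded"
```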
(ncn-m001#) Run upgrade.sh to deploy upgraded CSM applications and services:
cd "$CSM_DISTDIR"
./upgrade.sh
(ncn-m001#) Update select RPMs on the NCNs.
NOTE: The zypper command below may emit the following message, which can be safely ignored.
You may wish to restart these processes.
See 'man zypper' for information about the meaning of values in the above table.
No core libraries or services have been updated since the last system boot.
Reboot is probably not necessary.
pdsh -b -S -w $(grep -oP 'ncn-\w\d+' /etc/hosts | sort -u | tr -t '\n' ',') \
'zypper install -y hpe-csm-goss-package csm-testing goss-servers && systemctl enable goss-servers && systemctl restart goss-servers' \
&& echo PASSED || echo FAILED
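To preview which nodes the pdsh command above will target, the host list can be built on its own first. This sketch mirrors the grep and sort stages of that command with two substitutions that are assumptions of this example: a POSIX character-class pattern stands in for the Perl-style \w\d+, and paste replaces tr so that no trailing comma is produced. The function name ncn_list is hypothetical.

```shell
# Hypothetical helper: print a comma-separated, de-duplicated list of NCN
# hostnames found in a hosts file (defaults to /etc/hosts).
ncn_list() {
    local hosts_file="${1:-/etc/hosts}"
    # ncn-, one word character, then one or more digits (for example ncn-m001).
    grep -oE 'ncn-[[:alnum:]_][0-9]+' "${hosts_file}" | sort -u | paste -sd, -
}

# Usage:
#   ncn_list              # inspect the targets from /etc/hosts
#   ncn_list ./hosts.bak  # or from another file
```

Reviewing the list before running pdsh helps catch stale or unexpected entries in /etc/hosts.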
Verify that the new CSM version is in the product catalog.
(ncn-m001#) Verify that the new CSM version is listed in the output of the following command:
kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' | yq r -j - | jq -r 'to_entries[] | .key' | sort -V
Example output that includes the new CSM version (1.3.1):
0.9.2
0.9.3
0.9.4
0.9.5
0.9.6
1.0.1
1.0.10
1.2.0
1.2.1
1.2.2
1.3.0
1.3.1
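Rather than scanning the list by eye, the check can be scripted. The following sketch reads a version list like the one above on standard input and succeeds only if the requested version appears as an exact line; the function name version_listed is hypothetical.

```shell
# Hypothetical helper: succeed if the given version appears as a whole line
# on standard input. -x requires a full-line match, so 1.0.1 does not match
# 1.0.10; -F treats the dots literally rather than as regex wildcards.
version_listed() {
    local want="$1"
    grep -qxF -- "${want}"
}

# Usage, reusing the command from this step:
#   kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' \
#     | yq r -j - | jq -r 'to_entries[] | .key' \
#     | version_listed 1.3.1 && echo "1.3.1 is in the product catalog"
```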
Confirm that the product catalog has an accurate timestamp for the CSM upgrade.
(ncn-m001#) Confirm that the import_date reflects the timestamp of the upgrade.
kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' | yq r - '"1.3.1".configuration.import_date'
(ncn-m001#) Remember to exit the typescript that was started at the beginning of the upgrade.
exit
Save the typescript file for later reference.