**This page should NOT be used for a normal CSM upgrade.**

This page provides instructions for performing a manual upgrade of NCN nodes. There is a section for upgrading storage nodes, worker nodes, and master nodes. In CSM 1.6 and later, all NCN upgrades are done through IUF; the instructions here are only for unusual situations where a manual NCN upgrade is needed. Because the manual CSM upgrade documentation has otherwise been removed, this page preserves the manual process in case of an emergency.

If performing a regular CSM upgrade, it should be done through IUF. See *Upgrade Management Nodes and CSM Services* to perform a normal CSM upgrade.
## Upgrade storage nodes

In CSM 1.6 and later, storage node upgrades should be executed by IUF. See *Upgrade Management Nodes and CSM Services* to perform a normal CSM upgrade.

Storage node upgrades are done using an IUF Argo workflow. See *Using the Argo UI* to access the UI and *Using Argo Workflows* for more information about Argo workflows.
1. (`ncn-m001#`) Set the storage node name for the node that is being upgraded.

    ```bash
    storage_node=ncn-s00x
    ```
1. (`ncn-m001#`) Execute the storage node upgrade.

    **NOTE:** If `--image-id` and/or `--desired-cfs-conf` is not supplied, then the storage node will be upgraded to the image that is already set in BSS and the CFS configuration that is already set in CFS. Alternatively, the image ID and CFS configuration can be set manually in BSS and in CFS respectively; see *Set the image ID and CFS configuration manually* below for the manual process. If the manual process is used, then omit `--image-id` and `--desired-cfs-conf` from the command below.

    ```bash
    /usr/share/doc/csm/upgrade/scripts/upgrade/ncn-upgrade-worker-storage-nodes.sh $storage_node --upgrade --image-id $image --desired-cfs-conf $configuration
    ```
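If the image ID and CFS configuration name are not already known, candidate values can be listed with the Cray CLI. This is a hedged sketch, not part of the upgrade script; the exact subcommands and JSON field layout vary by CSM release (for example, CFS v2 versus v3), so inspect the full output if these filters do not match.

```shell
# Sketch: list candidate IMS images and CFS configurations.
# The output shapes below are assumptions; adjust the jq filters
# to match the IMS/CFS API versions in your CSM release.
cray ims images list --format json | jq -r '.[] | "\(.id)  \(.name)"'
cray cfs configurations list --format json | jq -r '.[].name'
```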
## Upgrade worker nodes

In CSM 1.6 and later, worker node upgrades should be executed by IUF. See *Upgrade Management Nodes and CSM Services* to perform a normal CSM upgrade.

Worker node upgrades are done using an IUF Argo workflow. See *Using the Argo UI* to access the UI and *Using Argo Workflows* for more information about Argo workflows.
1. (`ncn-m001#`) Set the worker node name for the node that is being upgraded.

    ```bash
    worker_node=ncn-w00x
    ```
1. (`ncn-m001#`) Execute the worker node upgrade.

    **NOTE:** If `--image-id` and/or `--desired-cfs-conf` is not supplied, then the worker node will be upgraded to the image that is already set in BSS and the CFS configuration that is already set in CFS. Alternatively, the image ID and CFS configuration can be set manually in BSS and in CFS respectively; see *Set the image ID and CFS configuration manually* below for the manual process. If the manual process is used, then omit `--image-id` and `--desired-cfs-conf` from the command below.

    ```bash
    /usr/share/doc/csm/upgrade/scripts/upgrade/ncn-upgrade-worker-storage-nodes.sh $worker_node --image-id $image --desired-cfs-conf $configuration
    ```
## Upgrade master nodes

In CSM 1.6 and later, master node upgrades should be executed by IUF. See *Upgrade Management Nodes and CSM Services* to perform a normal CSM upgrade.

A master node upgrade is not executed by Argo workflows; instead, it is performed by a Bash script. IUF can use Argo workflows to execute a master node upgrade, but it does so by running this same script.

The script keeps track of the steps that have been completed and records them in a state file in the `/etc/cray/upgrade/csm/csm-${CSM_RELEASE}/<node-name>` directory on the node where the script is executed. If the master node upgrade fails partway through, it is safe to re-execute the upgrade script: because the state is tracked, steps that have already run successfully will not be re-executed.
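To see what the script has already recorded before re-executing it, the state directory can be inspected directly. A minimal sketch, assuming a release number and node name for illustration; the file names inside the directory are not specified here and may differ by release.

```shell
# Sketch: inspect the master node upgrade state directory.
# CSM_RELEASE and NODE are placeholder assumptions; substitute your values.
CSM_RELEASE="1.6.0"
NODE="ncn-m002"
STATE_DIR="/etc/cray/upgrade/csm/csm-${CSM_RELEASE}/${NODE}"
ls -l "${STATE_DIR}"        # list the state file(s) the script has written
cat "${STATE_DIR}"/*        # show which steps are marked complete
```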
There are two different processes for upgrading master nodes, depending on whether `ncn-m001` or one of `ncn-m002` and `ncn-m003` is being upgraded. Follow the appropriate procedure below.
### Upgrade `ncn-m002` or `ncn-m003`

**NOTE:** If Kubernetes encryption has been enabled via the Kubernetes encryption documentation, then back up the `/etc/cray/kubernetes/encryption` directory on the master node before upgrading, and restore the directory after the node has been upgraded.
1. (`ncn-m001#`) Start a typescript.

    ```bash
    script -af /root/csm_upgrade.$(date +%Y%m%d_%H%M%S).upgrade-m0023.txt
    export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
    ```
1. (`ncn-m001#`) Set upgrade variables.

    ```bash
    source /etc/cray/upgrade/csm/myenv
    ```
1. (`ncn-m001#`) Set the master node name for the node that is being upgraded (`ncn-m002` or `ncn-m003`).

    ```bash
    master_node=ncn-m00x
    ```
1. (`ncn-m001#`) Run `ncn-upgrade-master-nodes.sh` for `ncn-m002` or `ncn-m003`.

    ```bash
    /usr/share/doc/csm/upgrade/scripts/upgrade/ncn-upgrade-master-nodes.sh $master_node
    ```
**NOTE:** The `root` user password for the node may need to be reset after it is rebooted. Additionally, the `/etc/cray/kubernetes/encryption` directory should be restored if it was backed up. Once it is restored, the `kube-apiserver` on the rebuilt node should be restarted. See *Kubernetes `kube-apiserver` Failing* for details on how to restart the `kube-apiserver`.
### Upgrade `ncn-m001`

To manually upgrade `ncn-m001`: the CFS configuration and node image need to be set for `ncn-m001`, the artifacts on `ncn-m001` need to be backed up, `ncn-m002` needs to be prepared to execute the upgrade, and then the `ncn-m001` upgrade needs to be executed. Follow the steps below to upgrade `ncn-m001`.
1. (`ncn-m001#`) Create an archive of the artifacts.

    ```bash
    BACKUP_TARFILE="csm_upgrade.pre_m001_reboot_artifacts.$(date +%Y%m%d_%H%M%S).tgz"
    ls -d \
        /root/apply_csm_configuration.* \
        /root/csm_upgrade.* \
        /root/output.log 2>/dev/null |
    sed 's_^/__' |
    xargs tar -C / -czvf "/root/${BACKUP_TARFILE}"
    ```
1. (`ncn-m001#`) Upload the archive to S3 in the cluster.

    ```bash
    cray artifacts create config-data "${BACKUP_TARFILE}" "/root/${BACKUP_TARFILE}"
    ```
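Before logging out, it is worth confirming that the archive actually landed in the bucket. A hedged sketch; the `artifacts`/`Key` field names are assumptions about the CLI's JSON output, so inspect the full output if the filter returns nothing.

```shell
# Sketch: confirm the backup archive is present in the config-data bucket.
# Field names ("artifacts", "Key") are assumptions about the JSON layout.
cray artifacts list config-data --format json |
    jq -r '.artifacts[].Key' | grep "${BACKUP_TARFILE}"
```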
1. Log out of `ncn-m001`.
1. Log in to `ncn-m002` from outside the cluster.

    **NOTE:** Very rarely, a password hash for the `root` user that works properly on a SLES SP2 NCN is not recognized on a SLES SP3 NCN. If password login fails, then log in to `ncn-m002` from `ncn-m001` and use the `passwd` command to reset the password. Then log in using the CMN IP address as directed below. Once `ncn-m001` has been upgraded, log in from `ncn-m002` and use the `passwd` command to reset the password. The other NCNs will have their passwords updated when NCN personalization is run in a subsequent step.

    `ssh` to the `bond0.cmn0`/CMN IP address of `ncn-m002`.
1. (`ncn-m002#`) Start a typescript.

    ```bash
    script -af /root/csm_upgrade.$(date +%Y%m%d_%H%M%S).upgrade-m001.txt
    export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
    ```
1. Authenticate with the Cray CLI on `ncn-m002`.

    See *Configure the Cray Command Line Interface* for details on how to do this.
1. (`ncn-m002#`) Set upgrade variables.

    ```bash
    source /etc/cray/upgrade/csm/myenv
    ```
1. (`ncn-m002#`) Copy artifacts from `ncn-m001`.

    A later stage of the upgrade expects the `docs-csm` and `libcsm` RPMs to be located at `/root/` on `ncn-m002`; that is why these commands copy them there.

    ```bash
    scp ncn-m001:/root/csm_upgrade.pre_m001_reboot_artifacts.*.tgz /root
    scp ncn-m001:/root/*.noarch.rpm /root/
    ```

1. (`ncn-m002#`) Install `csi` and `docs-csm`.

    ```bash
    zypper --plus-repo="/etc/cray/upgrade/csm/csm-${CSM_RELEASE}/tarball/csm-${CSM_RELEASE}/rpm/cray/csm/sle-$(awk -F= '/VERSION=/{gsub(/["-]/, "") ; print tolower($NF)}' /etc/os-release)" --no-gpg-checks install -y cray-site-init
    rpm -Uvh --force /root/docs-csm-latest.noarch.rpm
    ```
1. (`ncn-m002#`) Install `libcsm`.

    **NOTE:** Because `libcsm` depends on the Python versions shipped with the SLES service packs, a new `libcsm` must be downloaded if `ncn-m002` is running a newer SLES distribution than `ncn-m001`. This is often the case when jumping to a new CSM minor version (for example, CSM 1.3 to CSM 1.4): if `ncn-m001` is running SLES15 SP3 and `ncn-m002` is running SLES15 SP4, then the SLES15 SP4 `libcsm` is needed. Follow the *Check for latest documentation* guide again, but from `ncn-m002`.

    ```bash
    rpm -Uvh --force /root/libcsm-latest.noarch.rpm
    ```
1. (`ncn-m002#`) Upgrade `ncn-m001`.

    **NOTE:** If Kubernetes encryption has been enabled via the Kubernetes encryption documentation, then back up the `/etc/cray/kubernetes/encryption` directory on the master node before upgrading, and restore the directory after the node has been upgraded.

    ```bash
    /usr/share/doc/csm/upgrade/scripts/upgrade/ncn-upgrade-master-nodes.sh ncn-m001
    ```
**NOTE:** The `root` user password for the node may need to be reset after it is rebooted. Additionally, the `/etc/cray/kubernetes/encryption` directory should be restored if it was backed up. Once it is restored, the `kube-apiserver` on the rebuilt node should be restarted. See *Kubernetes `kube-apiserver` Failing* for details on how to restart the `kube-apiserver`.
## Set the image ID and CFS configuration manually

1. (`ncn-m001#`) Set `XNAME` to the xname of the node that is being upgraded.

    ```bash
    XNAME=<node_xname>
    ```
1. (`ncn-m001#`) Set `IMS_IMAGE_ID` to the ID of the image to upgrade to.

    ```bash
    IMS_IMAGE_ID=<image_id>
    ```
1. (`ncn-m001#`) Set the image ID in BSS.

    ```bash
    /usr/share/doc/csm/scripts/operations/node_management/assign-ncn-images.sh -p $IMS_IMAGE_ID $XNAME
    ```
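The result can be spot-checked by reading the node's boot parameters back from BSS. A hedged sketch: `kernel` and `initrd` follow the BSS bootparameters schema and typically embed the IMS image ID in their S3 paths, but verify against the full output on your system.

```shell
# Sketch: check that the BSS boot parameters now reference the new image.
# Assumes kernel/initrd paths embed the IMS image ID (e.g. s3://boot-images/<id>/...).
cray bss bootparameters list --hosts "${XNAME}" --format json |
    jq -r '.[0].kernel, .[0].initrd' | grep "${IMS_IMAGE_ID}"
```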
The following steps will update the node's desired configuration but will leave it disabled. The configuration will automatically be enabled and applied after the node is upgraded.
1. (`ncn-m001#`) Set `CFS_CONFIG_NAME` to the configuration that should be used once the node has been upgraded.

    ```bash
    CFS_CONFIG_NAME=<cfs_configuration>
    ```
1. (`ncn-m001#`) Set the CFS configuration.

    ```bash
    /usr/share/doc/csm/scripts/operations/configuration/apply_csm_configuration.sh \
        --no-config-change --no-enable --config-name $CFS_CONFIG_NAME --xnames $XNAME
    ```
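To confirm the desired configuration was set and is still disabled, the node's CFS component can be queried. A hedged sketch: the `desiredConfig` and `enabled` field names follow the CFS v2 components schema and may differ in other CFS API versions.

```shell
# Sketch: confirm the desired configuration is set and still disabled.
# Field names (desiredConfig, enabled) are assumed from the CFS v2 schema.
cray cfs components describe "${XNAME}" --format json |
    jq '{desiredConfig, enabled}'
```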