Stage 1 - CSM Service Upgrades

Reminder: If any problems are encountered and the procedure or command output does not provide relevant guidance, see Relevant troubleshooting links for upgrade-related issues.

Start typescript

  1. (ncn-m001#) If a typescript session is already running in the shell, then first stop it with the exit command.

  2. (ncn-m001#) Start a typescript.

    script -af /root/csm_upgrade.$(date +%Y%m%d_%H%M%S).stage_1.txt
    export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
    

If additional shells are opened during this procedure, then record those with typescripts as well. When resuming a procedure after a break, always be sure that a typescript is running before proceeding.
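The shell can check whether it is already inside a typescript before a new one is started. The following is a minimal sketch. It assumes the util-linux script command, which starts the recorded shell as its own child, so that shell's parent process is named "script". The helper name in_typescript is hypothetical and is not part of CSM. This is a heuristic only: shells started later with su or sudo inside the typescript have a different parent.

```shell
# Hypothetical helper (not part of CSM): succeeds if the parent of the given
# process (default: the parent of the current shell) is named "script".
# Assumption: util-linux script runs the recorded shell as its direct child.
in_typescript() {
    # Default to the current shell's parent; an explicit PID can be passed for testing.
    local pid="${1:-$PPID}"
    local parent_name
    parent_name="$(ps -o comm= -p "$pid" 2>/dev/null)"
    # Strip any padding that ps adds around the command name.
    parent_name="${parent_name//[[:space:]]/}"
    [[ "$parent_name" == "script" ]]
}

# Example (paste the function into the interactive shell first, because running
# it as a separate script would inspect the wrong parent process):
#   in_typescript && echo "typescript already running; stop it with exit first"
```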

Perform upgrade

During this stage there is a brief window (approximately five minutes) during which pods with Persistent Volumes (PVs) cannot migrate between nodes. This happens because the Ceph CSI provisioners are redeployed into namespaces to accommodate the newer charts and a better upgrade strategy.

  1. (ncn-m001#) Set the SW_ADMIN_PASSWORD environment variable.

    Set it to the admin user password for the switches. This is required for post-upgrade tests.

    read -s is used to prevent the password from being written to the screen or the shell history.

    read -s SW_ADMIN_PASSWORD
    
    export SW_ADMIN_PASSWORD
    
  2. (ncn-m001#) Ensure that the CSM_RELEASE variable is set to the target CSM version of this upgrade.

    export CSM_RELEASE=1.5.0
    
  3. (ncn-m001#) Perform the upgrade.

    Run csm-upgrade.sh to deploy upgraded CSM applications and services.

    /usr/share/doc/csm/upgrade/scripts/upgrade/csm-upgrade.sh
    

    NOTE: If some charts fail to upgrade due to a timeout, follow the instructions in Helm chart deployment timeout to increase the timeout value, then re-run the csm-upgrade.sh script.
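The environment from the first two steps can be sanity-checked before csm-upgrade.sh is run. The following is a minimal sketch. The helper name check_upgrade_env is hypothetical and is not part of CSM. It checks that SW_ADMIN_PASSWORD is exported, as step 1 does, and not empty. It also checks that CSM_RELEASE matches the expected target version and that the upgrade script is present and executable. The password value is never printed.

```shell
# Hypothetical pre-flight helper (not part of CSM).
# Usage: check_upgrade_env <expected CSM release> [path to csm-upgrade.sh]
check_upgrade_env() {
    local expected_release="$1"
    local upgrade_script="${2:-/usr/share/doc/csm/upgrade/scripts/upgrade/csm-upgrade.sh}"
    local status=0

    # printenv succeeds only for exported variables; its output (the password)
    # is discarded so it never reaches the screen or the typescript.
    if ! printenv SW_ADMIN_PASSWORD >/dev/null; then
        echo "ERROR: SW_ADMIN_PASSWORD is not exported" >&2
        status=1
    elif [[ -z "$SW_ADMIN_PASSWORD" ]]; then
        echo "ERROR: SW_ADMIN_PASSWORD is empty" >&2
        status=1
    fi

    # CSM_RELEASE must name the target version of this upgrade.
    if [[ "${CSM_RELEASE:-}" != "$expected_release" ]]; then
        echo "ERROR: CSM_RELEASE is '${CSM_RELEASE:-}', expected '$expected_release'" >&2
        status=1
    fi

    # The upgrade script must exist and be executable.
    if [[ ! -x "$upgrade_script" ]]; then
        echo "ERROR: $upgrade_script not found or not executable" >&2
        status=1
    fi

    return "$status"
}

# Example:
#   check_upgrade_env 1.5.0 && /usr/share/doc/csm/upgrade/scripts/upgrade/csm-upgrade.sh
```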

Verify Keycloak users

  1. (ncn-m001#) Verify that the Keycloak users localize job has completed as expected.

    This step can be skipped if user localization is not required.

    After an upgrade, it is possible that not all expected Keycloak users were localized. See Verification procedure to confirm that Keycloak localization has completed as expected.
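A quick command-line check of the localize job's status can be sketched with kubectl. This sketch makes two assumptions that should be confirmed on the system: the job runs in the services namespace, and its name contains keycloak-users-localize. The helper name check_localize_job is hypothetical. A completed job does not by itself prove that every expected user was localized, so the Verification procedure remains the authoritative check.

```shell
# Hypothetical helper (not part of CSM). Assumes the job is in the "services"
# namespace and that its name contains "keycloak-users-localize".
# Returns 0 if completed, 1 if not yet completed, 2 if no such job exists.
check_localize_job() {
    local ns="${1:-services}"
    local job succeeded

    # "kubectl get -o name" prints entries such as job.batch/<name>;
    # take the first matching job.
    job="$(kubectl get jobs -n "$ns" -o name | grep keycloak-users-localize | head -n 1)"
    if [[ -z "$job" ]]; then
        echo "No keycloak-users-localize job found in namespace $ns" >&2
        return 2
    fi

    # .status.succeeded stays empty until a pod of the job completes successfully.
    succeeded="$(kubectl get "$job" -n "$ns" -o jsonpath='{.status.succeeded}')"
    if [[ "${succeeded:-0}" -ge 1 ]]; then
        echo "$job has completed"
        return 0
    fi

    echo "$job has not completed yet; see the Verification procedure" >&2
    return 1
}

# Example:
#   check_localize_job
```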

Take Etcd Manual Backup

  1. (ncn-m001#) Execute the following script to take a manual backup of the Etcd clusters. These clusters are automatically backed up every 24 hours, but taking a manual backup at this stage in the upgrade enables restoring from backup later in this process if needed.

    /usr/share/doc/csm/scripts/operations/etcd/take-etcd-manual-backups.sh post_upgrade
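A later restore depends on this backup, so the result of the command should be clearly visible in the typescript. The following sketch wraps the documented command and prints a timestamped success or failure line. It assumes that take-etcd-manual-backups.sh exits with a non-zero status when a backup fails, which is not confirmed here. The helper name run_etcd_backup is hypothetical.

```shell
# Hypothetical wrapper (not part of CSM) around the documented backup command.
# Assumption: the backup script exits non-zero when a backup fails.
run_etcd_backup() {
    local script="${1:-/usr/share/doc/csm/scripts/operations/etcd/take-etcd-manual-backups.sh}"
    local label="${2:-post_upgrade}"
    local rc=0

    # Capture the exit status without aborting a shell that uses set -e.
    "$script" "$label" || rc=$?

    # Timestamped result line, easy to find later in the typescript.
    if (( rc == 0 )); then
        echo "$(date '+%Y-%m-%d %H:%M:%S') etcd manual backup '$label' succeeded"
    else
        echo "$(date '+%Y-%m-%d %H:%M:%S') etcd manual backup '$label' FAILED (exit $rc)" >&2
    fi
    return "$rc"
}

# Example (equivalent to the command above):
#   run_etcd_backup
```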
    

Configure E1000 node and Redfish Exporter for SMART data

NOTE: Follow this step only if SMART disk data is needed from the E1000 node.

This step collects SMART data from the disks on the E1000 node through the Redfish exporter and stores it in the Prometheus time-series database. To configure the LDAP instance on the E1000 primary management node and reconfigure the redfish-exporter instance running on the NCN, see Configure E1000 node and Redfish Exporter.

Stop typescript

For any typescripts that were started during this stage, stop them with the exit command.

Stage completed

This stage is complete. There are two different upgrade paths moving forward: