cleanup-live-images
During an upgrade from CSM 1.6 using IUF, the management nodes rollout may get stuck in a loop when processing storage nodes. The workflow repeatedly fails at the cleanup-live-images step with warnings about the workflow being in a failed state.
When running the IUF management-nodes-rollout command:
iuf -a "${ACTIVITY_NAME}" -m "${MEDIA_DIR}" run -r management-nodes-rollout --limit-management-rollout ${STORAGE_CANARY}
The output shows repeated failure and retry messages:
WARN [storage-node-upgrade ] - Workflow in Failed state, Retry ...
INFO [storage-node-upgrade ] - Succeeded:
INFO [storage-node-upgrade ] before-each-ncn-s001 - set-bss-image-and-cfs-config
INFO [storage-node-upgrade ] - Running:
INFO [storage-node-upgrade ] reboot-ncn-s001 - cleanup-live-images
The issue occurs when the /run/initramfs/live/ directory on a storage node is empty or missing the expected CSM version subdirectory.
The cleanup-live-images.sh script (called by IUF via /usr/share/doc/csm/workflows/templates/storage.reboot.yaml) returns an exit
status of 1 when there are no directories to remove, causing the workflow to fail.
On affected nodes, the directory structure appears as:
# ls -al /run/initramfs/live/
total 0
drwxr-xr-x 2 root root 6 Mar 12 12:46 .
drwxr-xr-x 7 root root 180 Oct 24 14:22 ..
On healthy nodes, the directory contains version subdirectories:
# ls -al /run/initramfs/live/
total 0
drwxr-xr-x 6 root root 58 Jun 11 10:33 .
drwxr-xr-x 7 root root 180 Oct 24 14:22 ..
drwxr-xr-x 2 root root 66 Jan 15 2024 1.5.1
drwxr-xr-x 2 root root 66 Mar 20 2024 1.6.0
drwxr-xr-x 2 root root 66 May 10 2024 1.6.1
drwxr-xr-x 2 root root 66 Jun 11 2024 1.6.2
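The difference between the two listings above can also be tested without reading `ls` output by eye. The following is a minimal sketch, not part of CSM: the function name `live_dir_has_versions` is hypothetical, and it only checks that at least one subdirectory exists, not that the name matches a particular CSM version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with CSM): succeed if the live-image
# directory contains at least one subdirectory, fail otherwise.
live_dir_has_versions() {
    local live_dir="${1:-/run/initramfs/live}"
    local entry
    # A missing directory is treated the same as an empty one.
    [ -d "$live_dir" ] || return 1
    for entry in "$live_dir"/*/; do
        # If nothing matches, the pattern stays literal, so confirm
        # that the entry really is a directory before reporting success.
        [ -d "$entry" ] && return 0
    done
    return 1
}
```

On a storage node, `live_dir_has_versions || echo "no version subdirectories"` would print the message for the affected layout and stay silent for the healthy one.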
On the affected storage node, create the missing subdirectory named after the current CSM version:
(ncn-m#) SSH to the affected storage node:
ssh ncn-s001
(ncn-s#) Verify the directory is empty:
ls -al /run/initramfs/live/
(ncn-s#) Create the directory for the current CSM version (adjust the version number as needed):
mkdir /run/initramfs/live/1.6.2
Return to the management node and retry the IUF step.
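If the workaround has to be applied on several nodes, or re-run safely, the create step can be wrapped so that it does nothing when the directory already exists. This is a sketch under assumptions: `ensure_live_version_dir` is a hypothetical name, and it uses `mkdir -p` (unlike the plain `mkdir` above) so that a missing parent directory is also created.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with CSM): create LIVE_DIR/CSM_VERSION
# only if it does not exist yet, and report what was done.
ensure_live_version_dir() {
    local live_dir="$1" version="$2"
    if [ -z "$live_dir" ] || [ -z "$version" ]; then
        echo "usage: ensure_live_version_dir LIVE_DIR CSM_VERSION" >&2
        return 2
    fi
    if [ -d "${live_dir}/${version}" ]; then
        echo "exists: ${live_dir}/${version}"
        return 0
    fi
    # -p also creates LIVE_DIR itself if it is missing.
    mkdir -p "${live_dir}/${version}" && echo "created: ${live_dir}/${version}"
}
```

For example, `ensure_live_version_dir /run/initramfs/live 1.6.2` on the affected storage node performs the same step as the `mkdir` command above; adjust the version number as needed.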
Before running the IUF management-nodes-rollout, verify that all storage nodes have the expected directory structure:
pdsh -w 'ncn-s00[1-3]' ls -al /run/initramfs/live/
If any nodes show an empty directory, apply the workaround.
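On systems with many storage nodes, scanning `ls` output for empty directories is error-prone. One possible alternative, sketched here as an assumption rather than a documented procedure, is to have each node count its subdirectories and filter the `host: count` lines that `pdsh` prints. The function name `report_empty_nodes` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with CSM): read "host: count" lines on
# stdin and print the hosts whose subdirectory count is zero.
report_empty_nodes() {
    awk -F': *' '$2 ~ /^[0-9]+$/ && $2 + 0 == 0 { print $1 }'
}

# Example invocation from a management node (commented out so that sourcing
# this file has no side effects). A node whose directory is missing also
# reports 0, because find prints nothing on stdout in that case.
# pdsh -w 'ncn-s00[1-3]' \
#   'find /run/initramfs/live -mindepth 1 -maxdepth 1 -type d | wc -l' \
#   | report_empty_nodes
```

Any host name printed by the filter is a candidate for the workaround.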