dmesg may be flooded with iSCSI errors

On a CSM 1.6.x cluster, after upgrading to CSM 1.7.0, the Kubernetes worker nodes may hang and become unresponsive due to LUN remapping, followed by a flood of the following message in the worker nodes' dmesg logs:
[Fri Feb 6 19:31:01 2026] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000004b from iqn.2023-06.csm.iscsi:x1005c6s4b1n0
Note: LUN remapping occurs when the storage system changes the Logical Unit Number (LUN) identifiers associated with previously mapped storage devices presented to the host.
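The initiator producing the flood can be read from the last field of each message. A minimal sketch, using the sample line from the log excerpt above; on a live worker node the same extraction would run over `dmesg | grep NON_EXISTENT_LUN` instead:

```shell
# Sample line copied from the dmesg excerpt above.
line='[Fri Feb 6 19:31:01 2026] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000004b from iqn.2023-06.csm.iscsi:x1005c6s4b1n0'

# The initiator IQN is the last whitespace-separated field.
echo "$line" | awk '{print $NF}'
```

On a node with many initiators, piping through `sort | uniq -c | sort -rn` shows which initiators dominate the flood.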
In CSM 1.7.0, the target port group (TPG) was temporarily disabled and later re-enabled to address an issue where worker nodes were entering a hung state
during the management node rollout stage. However, disabling the TPG during the management node rollout resulted in LUN remapping and dmesg being flooded with
Detected NON_EXISTENT_LUN Access errors, which eventually caused the worker nodes to become unresponsive.
This issue has been resolved in CSM 1.7.1, and the fix now needs to be backported to the 1.6.x release.
The resolution is to apply the following fix, which removes the TPG disablement, and then restart the target service on the Kubernetes worker nodes.
After the update-vcs-config stage of the CSM upgrade through IUF, follow the procedure below:
ncn-mw#) Retrieve the Latest CSM Version

kubectl get cm cray-product-catalog -n services -o jsonpath='{.data.csm}' | yq r -j - | jq -r 'to_entries[] | .key' | sort -V | tail -n 1
Example output:
1.7.0
ncn-mw#) Find the CFS configuration branch associated with the CSM version found in Step 1

kubectl get cm -n services cray-product-catalog -o yaml | yq r - 'data.csm' | grep ^1.7.0: -A 10 | grep import_branch
Example output:
import_branch: cray/csm/1.48.2
Note: The import_branch from this output will be used below.
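Steps 1 and 2 can be combined into a small sketch that captures both values in variables. The ConfigMap content here is a trimmed, hypothetical sample (only the version keys and import_branch lines matter); on the system, the same pipeline would read from the kubectl commands shown above:

```shell
# Trimmed sample of the data.csm ConfigMap content (versions are hypothetical).
csm_data='1.6.2:
  configuration:
    import_branch: cray/csm/1.30.5
1.7.0:
  configuration:
    import_branch: cray/csm/1.48.2'

# Latest version: top-level keys, version-sorted, last one wins (Step 1).
CSM_VERSION=$(echo "$csm_data" | grep -E '^[0-9]' | tr -d ':' | sort -V | tail -n 1)

# import_branch for that version, mirroring the grep used in Step 2.
IMPORT_BRANCH=$(echo "$csm_data" | grep -A 10 "^${CSM_VERSION}:" | grep import_branch | awk '{print $2}')

echo "$CSM_VERSION $IMPORT_BRANCH"
```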
ncn-mw#) Apply Changes to VCS

Get VCS credentials:
GITUSER=$( kubectl get secrets -n services vcs-user-credentials -o json | jq -r .data.vcs_username | base64 -d)
GITPASS=$( kubectl get secrets -n services vcs-user-credentials -o json | jq -r .data.vcs_password | base64 -d)
Clone the csm-config-management.git repository (using the VCS credentials above):
git clone https://$GITUSER:$GITPASS@api-gw-service-nmn.local/vcs/cray/csm-config-management.git
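If the VCS password contains characters that are special in URLs (such as `@`, `/`, or `:`), it must be percent-encoded before being embedded in the clone URL. A sketch using jq's `@uri` filter, shown against a hypothetical sample password; on the system, encode `$GITPASS` the same way:

```shell
# Hypothetical password containing URL-reserved characters.
GITPASS='p@ss/word'

# Percent-encode it so it can be embedded safely in the clone URL.
ENCODED=$(printf '%s' "$GITPASS" | jq -sRr '@uri')
echo "$ENCODED"
```

The clone command would then use `https://$GITUSER:$ENCODED@api-gw-service-nmn.local/...`.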
Check out the import_branch identified in Step 2. In the following command, substitute the actual branch name found in Step 2:
cd csm-config-management
git checkout cray/csm/1.48.2
Create a new branch from the import_branch:
For example:
git branch CAST-39550
git checkout CAST-39550
Create the patch file by copying the content below into a new iscsi_fix.patch file:
cat > iscsi_fix.patch << 'EOF'
diff --git a/roles/csm.sbps.lio_config/files/provision_iscsi_server.sh b/roles/csm.sbps.lio_config/files/provision_iscsi_server.sh
index 5921e68..452f793 100644
--- a/roles/csm.sbps.lio_config/files/provision_iscsi_server.sh
+++ b/roles/csm.sbps.lio_config/files/provision_iscsi_server.sh
@@ -70,11 +70,6 @@ function auto_generate_node_acls()
targetcli "/iscsi/${TARGET_SERVER_IQN}/tpg1 set attribute generate_node_acls=1"
}
-function disable_target_port()
-{
- targetcli "/iscsi/${TARGET_SERVER_IQN}/tpg1" disable
-}
-
#--------------------------------------------------------------------
# Base Target Configuration
#--------------------------------------------------------------------
@@ -98,5 +93,4 @@ SERVER_IQN="$(add_server_target)"
#--------------------------------------------------------------------
auto_generate_node_acls "$SERVER_IQN"
-disable_target_port
save_server_config
EOF
Now apply iscsi_fix.patch:
git apply iscsi_fix.patch
Commit and push the patch changes to VCS:
Commit changes:
git add roles/csm.sbps.lio_config/files/provision_iscsi_server.sh
git commit -m "fix for CAST-39550"
Push the changes to VCS:
git push --set-upstream origin CAST-39550
COMMIT=$(git log | head -n 1 | awk '{print $2}')
echo $COMMIT
Example output:
3aac2910252cc55a8b952a36243826a10e70b705
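`git rev-parse HEAD` is a more direct way to capture the new commit id than parsing `git log` output. A throwaway repository is used here only so the sketch runs end to end; in the real procedure this would be run inside the csm-config-management clone on the CAST-39550 branch:

```shell
# Demonstrate in a throwaway repository.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.email=doc@example.com -c user.name=doc commit -q --allow-empty -m "fix for CAST-39550"

# rev-parse prints the full commit id of HEAD directly.
COMMIT=$(git rev-parse HEAD)
echo "${#COMMIT}"   # a full SHA-1 commit id is 40 hex characters
```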
ncn-mw#) Update the cray product catalog with the new commit id captured in the step above

Update the cray product catalog:
kubectl edit cm -n services cray-product-catalog
Example of current cray-product-catalog snippet:
1.7.0:
  configuration:
    clone_url: https://vcs.cmn.vidar.hpc.amslabs.hpecorp.net/vcs/cray/csm-config-management.git
    commit: 26d739d6d2642e72d55cfc7b141902ae4fdceb95
    import_branch: cray/csm/1.48.2
Optionally, get the current commit id using:
kubectl get cm -n services cray-product-catalog -o yaml | yq r - 'data.csm' | grep ^1.7.0: -A 10 | grep commit
Example output:
commit: 26d739d6d2642e72d55cfc7b141902ae4fdceb95
Replace the above commit id with the new commit id (obtained using echo $COMMIT).
Example of cray-product-catalog snippet after update:
1.7.0:
  configuration:
    clone_url: https://vcs.cmn.vidar.hpc.amslabs.hpecorp.net/vcs/cray/csm-config-management.git
    commit: 3aac2910252cc55a8b952a36243826a10e70b705
    import_branch: cray/csm/1.48.2
Optionally, validate the updated commit id using:
kubectl get cm -n services cray-product-catalog -o yaml | yq r - 'data.csm' | grep ^1.7.0: -A 10 | grep commit
Example output:
commit: 3aac2910252cc55a8b952a36243826a10e70b705
ncn-mw#) Get and update the CFS configuration with the new commit id

Example case for node ncn-w001:
XNAME=$(ssh ncn-w001 cat /etc/cray/xname)
CONFIG=$(cray cfs components describe $XNAME --format json | jq -r '.desiredConfig')
cray cfs configurations describe $CONFIG --format json | jq -r '. | del(.name) | del(.lastUpdated)' > ${CONFIG}.json
vim ${CONFIG}.json
Example of the relevant layer entry in ${CONFIG}.json after updating the commit field:

{
  "cloneUrl": "https://api-gw-service-nmn.local/vcs/cray/csm-config-management.git",
  "commit": "3aac2910252cc55a8b952a36243826a10e70b705",
  "name": "csm-sbps_iscsi_targets-1.7.0",
  "playbook": "config_sbps_iscsi_targets.yml"
},
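As an alternative to editing the file in vim, the commit of the matching layer can be updated non-interactively with jq. This is a sketch assuming the file has the usual CFS shape, a top-level layers array; the file name, layer name, and commit value below are samples from this procedure:

```shell
# Sample ${CONFIG}.json with the usual CFS "layers" shape (trimmed to one layer).
cat > config.json << 'EOF'
{
  "layers": [
    {
      "cloneUrl": "https://api-gw-service-nmn.local/vcs/cray/csm-config-management.git",
      "commit": "26d739d6d2642e72d55cfc7b141902ae4fdceb95",
      "name": "csm-sbps_iscsi_targets-1.7.0",
      "playbook": "config_sbps_iscsi_targets.yml"
    }
  ]
}
EOF

COMMIT=3aac2910252cc55a8b952a36243826a10e70b705

# Set the commit on the layer whose name matches, leaving other layers intact.
jq --arg c "$COMMIT" \
   '(.layers[] | select(.name == "csm-sbps_iscsi_targets-1.7.0") | .commit) = $c' \
   config.json > config.updated.json

jq -r '.layers[0].commit' config.updated.json
```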
ncn-mw#) Update CFS configuration

cray cfs configurations update --file ${CONFIG}.json ${CONFIG}
Note: The status is stored in the configurationStatus field of the output of the command below. Wait until it changes to configured.
For Example:
cray cfs components describe $XNAME
Example output:
configurationStatus = "pending"
desiredConfig = "management-release-cr_2025-2319974"
enabled = true
errorCount = 0
…
configurationStatus = "configured"
desiredConfig = "management-release-cr_2025-2319974"
enabled = true
errorCount = 0
…
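The wait for configurationStatus to reach configured can be scripted as a small polling loop. In this sketch get_status is a stub so the loop runs end to end; on the system its body would be the `cray cfs components describe "$XNAME" --format json | jq -r .configurationStatus` pipeline, polled at a much longer interval:

```shell
# Stub status source: flips to "configured" after a few polls. On a live
# system, get_status would instead run:
#   cray cfs components describe "$XNAME" --format json | jq -r .configurationStatus
echo pending > status.txt
get_status() { cat status.txt; }

polls=0
while [ "$(get_status)" != "configured" ]; do
  polls=$((polls + 1))
  if [ "$polls" -ge 3 ]; then
    echo configured > status.txt
  fi
  sleep 1   # on a real node, poll every 30-60 seconds instead
done
echo "component reached configured after $polls polls"
```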
Restart the target service on Kubernetes worker nodes after upgrade

ncn-w#) Perform pre-checks

On each Kubernetes worker node, verify:

- The CFS configuration has completed.
- iSCSI SBPS is up and running:
  - targetcli ls shows the LUNs and portals created.
  - systemctl status sbps-marshal.service shows the sbps marshal agent in the active (running) state.

ncn-w#) Restart the target service

systemctl restart target.service
ncn-w#) Check the target service status

systemctl status target.service
Example output:
● target.service - Restore LIO kernel target configuration
Loaded: loaded (/usr/lib/systemd/system/target.service; enabled; preset: disabled)
Active: active (exited) since Mon 2026-03-09 07:42:44 UTC; 10h ago
Process: 1625411 ExecStart=/usr/bin/targetctl restore $CONFIG_FILE (code=exited, status=0/SUCCESS)
Main PID: 1625411 (code=exited, status=0/SUCCESS)
CPU: 899ms
Mar 09 07:42:38 ncn-w004 systemd[1]: Starting Restore LIO kernel target configuration...
Mar 09 07:42:44 ncn-w004 systemd[1]: Finished Restore LIO kernel target configuration.