iscsid and multipathd services are not enabled on compute nodes and UANs

iSCSI-based boot content projection, also known as "Scalable Boot Content Projection" (SBPS), for rootfs and PE images is supported in CSM 1.6.0 and later. On a customer system using CSM-1.6.0 with USS-1.1.x on compute nodes and UANs (in order to support AARCH64 images), the iscsid and multipathd services are not enabled by default. SBPS is not resilient across worker node reboots unless these services are enabled on compute nodes and UANs.
This issue can be identified by the following symptoms:
On a compute node or UAN (iSCSI Initiator) we can observe the following SQUASHFS error messages in the console log:
dmesg -T | grep "SQUASHFS error" | head -n 1
Example output:
[Sat Nov 2 22:32:41 2024] SQUASHFS error: xz decompression failed, data probably corrupt
On a compute node or UAN (iSCSI Initiator) we can observe that the iscsid service is not enabled (the Loaded line reports disabled), even if it is currently running:
ncn-s004# systemctl status iscsid
● iscsid.service - Open-iSCSI
Loaded: loaded (/usr/lib/systemd/system/iscsid.service; disabled; preset: disabled)
Active: active (running) since Wed 2024-11-06 08:16:23 CST; 1 day 4h ago
TriggeredBy: ● iscsid.socket
From the journalctl logs:
Nov 07 10:24:23 nid000004 iscsid[22286]: iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Nov 07 10:25:14 nid000004 iscsid[22286]: iscsid: connect to 10.253.0.3:3260 failed (No route to host)
...
Nov 07 10:30:43 nid000004 iscsid[22286]: iscsid: connect to 10.253.0.3:3260 failed (Connection refused)
...
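The enablement check above can be scripted for both services at once. This is a minimal sketch to run on a compute node or UAN, not part of the documented procedure: the function names `unit_is_enabled` and `report_sbps_units` are hypothetical, and the sketch assumes that only the `enabled` state reported by `systemctl is-enabled` counts as surviving a reboot.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the unit state means "starts at boot".
unit_is_enabled() {
    [ "$1" = "enabled" ]
}

# Report the enablement state of the two services SBPS depends on.
report_sbps_units() {
    for unit in iscsid multipathd; do
        # `systemctl is-enabled` prints the state (enabled, disabled, ...) on stdout.
        state="$(systemctl is-enabled "$unit" 2>/dev/null)" || true
        if unit_is_enabled "$state"; then
            echo "$unit: enabled"
        else
            echo "$unit: NOT enabled (state: ${state:-unknown})"
        fi
    done
}

report_sbps_units
```

A node affected by this issue would be expected to report at least one of the two units as not enabled.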
1. Find the csm-packages version from the compute node BOS session template.
Example:
Find the session template name (ncn#):
cray bos sessiontemplates list | grep 'compute-'
Example output:
name = "compute-25.1.0-alpha2.x86_64-csm-160-rc4"
Using the sessiontemplate name from the previous command, find the configuration name (ncn#):
cray bos sessiontemplates describe compute-25.1.0-alpha2.x86_64-csm-160-rc4 --format json
Example output:
{
"cfs": {
"configuration": "compute-25.1.0-alpha2-csm-160-rc4"
}
}
Use the configuration value to describe the configuration (ncn#):
cray cfs configurations describe compute-25.1.0-alpha2-csm-160-rc4 --format json
Example output:
{
"lastUpdated": "2024-11-02T12:15:21Z",
"layers": [
{
"cloneUrl": "https://api-gw-service-nmn.local/vcs/cray/csm-config-management.git",
"commit": "d530f0a277c9d5dc9e3cb487d32d6b316757f00e",
"name": "csm-packages-1.6.0-rc.4",
"playbook": "csm_packages.yml"
}
]
}
The name from the describe output above identifies the product catalog version. Use the version after csm-packages- (1.6.0-rc.4 in this example) in the next step.
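The version can be separated from the layer name with shell parameter expansion. A minimal sketch, assuming the layer name always starts with `csm-packages-`; the variable names `LAYER_NAME` and `CSM_VERSION` are hypothetical and the value is copied from the example output:

```shell
# Layer name as reported by `cray cfs configurations describe` (value from the example output).
LAYER_NAME="csm-packages-1.6.0-rc.4"

# Strip the fixed prefix; what remains is the product catalog version.
CSM_VERSION="${LAYER_NAME#csm-packages-}"
echo "$CSM_VERSION"
```

For the example above this prints `1.6.0-rc.4`.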
2. Find the csm-config branch (in VCS) from the product catalog, given the csm-packages-* version found in step 1.
Example (ncn#):
kubectl get cm -n services cray-product-catalog -o yaml | yq r - 'data.csm' | grep '^1\.6\.0-rc\.4:' -A 10
Example output:
1.6.0-rc.4:
configuration:
clone_url: https://vcs.cmn.fanta.hpc.amslabs.hpecorp.net/vcs/cray/csm-config-management.git
commit: d530f0a277c9d5dc9e3cb487d32d6b316757f00e
import_branch: cray/csm/1.27.2
The import_branch from the above output is used in the steps below.
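Instead of reading the branch from the `grep` output by eye, it can be extracted with `awk`. This is a sketch under assumptions: the helper name `get_import_branch` is hypothetical, and the input is assumed to be the `data.csm` YAML with version keys at column 0 and nested keys indented (the sample below reuses the example output with that indentation restored).

```shell
# Hypothetical helper: print the import_branch of one product catalog version.
# Reads the `data.csm` YAML on stdin; $1 is the version, for example 1.6.0-rc.4.
get_import_branch() {
    awk -v ver="$1:" '
        $1 == ver                        { in_ver = 1; next }  # start of the requested version
        in_ver && /^[^ ]/                { exit }              # next top-level key ends the block
        in_ver && $1 == "import_branch:" { print $2; exit }
    '
}

IMPORT_BRANCH="$(get_import_branch 1.6.0-rc.4 <<'EOF'
1.6.0-rc.4:
  configuration:
    clone_url: https://vcs.cmn.fanta.hpc.amslabs.hpecorp.net/vcs/cray/csm-config-management.git
    commit: d530f0a277c9d5dc9e3cb487d32d6b316757f00e
    import_branch: cray/csm/1.27.2
EOF
)"
echo "$IMPORT_BRANCH"
```

On a live system the same function could be fed from the `kubectl ... | yq ...` pipeline shown above in place of the here-document.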
3. Clone csm-config-management.git from VCS.
Example (ncn#):
USERNAME=$( kubectl get secrets -n services vcs-user-credentials -o json | jq -r .data.vcs_username | base64 -d )
PSWD=$( kubectl get secrets -n services vcs-user-credentials -o json | jq -r .data.vcs_password | base64 -d )
git clone https://api-gw-service-nmn.local/vcs/cray/csm-config-management.git
Note: Use the $USERNAME and $PSWD values above for the VCS login.
4. Check out the import_branch found in step 2 and create a new branch from it.
Example (ncn#):
cd csm-config-management
git checkout cray/csm/1.27.2
git checkout -b CASMTRIAGE-7509
Note: cray/csm/1.27.2 is the target branch and CASMTRIAGE-7509 is the new branch.
Add a new role to enable the iscsid and multipathd services:
mkdir -p roles/csm.enable_iscsid_multipathd/tasks
cat > roles/csm.enable_iscsid_multipathd/tasks/main.yml << EOF
---
- name: Ensure iscsid service is started
  ansible.builtin.systemd:
    name: iscsid
    state: started
    enabled: true
- name: Ensure multipathd service is started
  ansible.builtin.systemd:
    name: multipathd
    state: started
    enabled: true
EOF
Apply the following changes to csm-config-management/csm_packages.yml, in the Application-nodes only play and the Compute-nodes only play under csm_services, in order to enable the iscsid and multipathd services.
diff --git a/csm_packages.yml b/csm_packages.yml
index e3366f8..b223aec 100755
--- a/csm_packages.yml
+++ b/csm_packages.yml
@@ -137,6 +137,9 @@
       vars:
         packages: "{{application_csm_sles_packages }}"
       when: ansible_distribution_file_variety == "SUSE"
+    # Enable iscsid and multipathd service
+    - role: csm.enable_iscsid_multipathd
+
   tasks:
     - name: Enable smart service
       systemd:
@@ -148,3 +151,12 @@
         name: cray-node-exporter
         state: started
         enabled: true
+
+# Compute-nodes only play
+- hosts: Compute:!cfs_image
+  gather_facts: no
+  any_errors_fatal: true
+  remote_user: root
+  roles:
+    # Enable iscsid and multipathd service
+    - role: csm.enable_iscsid_multipathd
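Before committing, it may be worth confirming that the role is referenced in both plays. The following is a sketch only: the helper name `count_role_refs` is hypothetical, and the expected count of 2 assumes the role was not referenced anywhere in csm_packages.yml before the change.

```shell
# Hypothetical helper: count the "- role: csm.enable_iscsid_multipathd" entries in a playbook.
count_role_refs() {
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    grep -c -E '^[[:space:]]*-[[:space:]]*role:[[:space:]]*csm\.enable_iscsid_multipathd[[:space:]]*$' "$1" || true
}

PLAYBOOK="csm_packages.yml"
if [ -f "$PLAYBOOK" ]; then
    refs="$(count_role_refs "$PLAYBOOK")"
    if [ "$refs" -eq 2 ]; then
        echo "$PLAYBOOK: role referenced in both plays"
    else
        echo "$PLAYBOOK: expected 2 role references, found $refs" >&2
    fi
fi
```

Run it from the top of the csm-config-management clone; it prints nothing if the playbook is not in the current directory.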
5. Commit the changes and push them to VCS.
Example (ncn#):
git add csm_packages.yml roles/csm.enable_iscsid_multipathd
git commit -m "fix for CASMTRIAGE-7509"
git push --set-upstream origin CASMTRIAGE-7509
COMMIT="$(git log -1 --pretty='format:%H')"
echo "$COMMIT"
Example output:
bf214b8a9867531a38f8ca28b6ffae1fe56724ce
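Because the commit hash is later written into the CFS configuration, a quick sanity check on `$COMMIT` can catch an empty or truncated value. A minimal sketch; the helper name `is_full_sha` is hypothetical, and it assumes a full 40-character SHA-1 hash as in the example output:

```shell
# Hypothetical helper: succeed only if the argument is a full 40-character hex hash.
is_full_sha() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

if is_full_sha "${COMMIT:-}"; then
    echo "COMMIT looks like a full commit hash"
else
    echo "COMMIT is empty or not a full commit hash" >&2
fi
```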
Save the CFS configuration referenced by the session template to a file (ncn#):
SESSIONTEMPLATE=compute-25.1.0-alpha2.x86_64-csm-160-rc4
CFS_CONFIG="$(cray bos sessiontemplates describe "$SESSIONTEMPLATE" --format json | jq -r .cfs.configuration)"
cray cfs configurations describe "$CFS_CONFIG" --format json | jq '. | del(.lastUpdated) | del(.name)' > "$CFS_CONFIG"
Edit the file and replace the commit value with the $COMMIT from step 5 (Commit the changes and push them to VCS) (ncn#):
vim "$CFS_CONFIG"
cat "$CFS_CONFIG"
Example output:
{
"layers": [
{
"cloneUrl": "https://api-gw-service-nmn.local/vcs/cray/csm-config-management.git",
"commit": "bf214b8a9867531a38f8ca28b6ffae1fe56724ce",
"name": "csm-packages-1.6.0-rc.4",
"playbook": "csm_packages.yml"
}
]
}
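As an alternative to editing the file by hand, the commit can be replaced with `jq`, which this procedure already uses. This is a sketch, not the documented method: the helper name `set_layer_commit` is hypothetical, and it assumes the layer to change is the one whose playbook is csm_packages.yml, leaving any other layers untouched.

```shell
# Hypothetical helper: set the commit of every layer that runs the given playbook.
# Reads the configuration JSON on stdin and writes the updated JSON to stdout.
set_layer_commit() {
    jq --arg playbook "$1" --arg commit "$2" \
        '(.layers[] | select(.playbook == $playbook) | .commit) = $commit'
}

# Apply it to the file saved earlier, if CFS_CONFIG points at an existing file.
if [ -n "${CFS_CONFIG:-}" ] && [ -f "$CFS_CONFIG" ]; then
    set_layer_commit csm_packages.yml "$COMMIT" < "$CFS_CONFIG" > "$CFS_CONFIG.new" \
        && mv "$CFS_CONFIG.new" "$CFS_CONFIG"
fi
```

Writing to a temporary file and renaming it avoids truncating the configuration file if `jq` fails.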
Update the cfs config.
Update cfs (ncn#):
cray cfs configurations update --file "$CFS_CONFIG" "$CFS_CONFIG"
Please refer to: Create BOS session template for iSCSI SBPS