The Scalable Boot Projection Service (SBPS) supports iSCSI-based boot content projection for rootfs and PE images in CSM 1.6.0 and later.
In CSM 1.6, a bug may prevent DNS “A” (address) records for the NMN from being created. When those records are missing, iSCSI SBPS projection over the NMN fails. This bug is fixed in CSM 1.7.0.
This issue can be identified by the following symptoms:
On any worker node, use one of the following commands to observe that the DNS “A” records for the NMN are missing, even where the corresponding “SRV” records exist.
powerdns command:
kubectl -n services exec -it deployment/cray-dns-powerdns -- sh -c 'pdnsutil list-all-zones | xargs -n1 pdnsutil list-zone | grep iscsi'
Example output:
_sbps-hsn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-004.hsn.drax.hpc.amslabs.hpecorp.net.
_sbps-hsn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-002.hsn.drax.hpc.amslabs.hpecorp.net.
_sbps-hsn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-001.hsn.drax.hpc.amslabs.hpecorp.net.
_sbps-nmn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-003.nmn.drax.hpc.amslabs.hpecorp.net.
_sbps-nmn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-004.nmn.drax.hpc.amslabs.hpecorp.net.
_sbps-nmn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-001.nmn.drax.hpc.amslabs.hpecorp.net.
_sbps-nmn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-002.nmn.drax.hpc.amslabs.hpecorp.net.
iscsi-server-id-001.hsn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.253.0.4
iscsi-server-id-002.hsn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.253.0.2
iscsi-server-id-003.hsn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.253.0.8
iscsi-server-id-004.hsn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.253.0.14
dig command:
dig -t SRV +short _sbps-hsn._tcp.drax.hpc.amslabs.hpecorp.net _sbps-nmn._tcp.drax.hpc.amslabs.hpecorp.net
Example output:
1 0 3260 iscsi-server-id-004.hsn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-003.hsn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-002.hsn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-001.hsn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-002.nmn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-003.nmn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-004.nmn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-001.nmn.drax.hpc.amslabs.hpecorp.net.
dig -t A +short iscsi-server-id-004.hsn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-003.hsn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-002.hsn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-001.hsn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-002.nmn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-003.nmn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-004.nmn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-001.nmn.drax.hpc.amslabs.hpecorp.net.
Example output:
10.253.0.14
10.253.0.8
10.253.0.2
10.253.0.4
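The comparison between SRV targets and A records can also be done mechanically. The following is a minimal sketch, assuming the record layout shown in the powerdns example output above; `missing_a_records` is a hypothetical helper name and is not part of CSM.

```shell
# Hypothetical helper (not part of CSM): reads zone records on stdin, in the
# format printed by the pdnsutil command above, and prints every SRV target
# that has no matching A record.
missing_a_records() {
    awk '
        { sub(/\r$/, "") }    # drop a trailing CR that a TTY session may add
        $4 == "SRV" { t = $8; sub(/\.$/, "", t); srv[t] = 1 }    # SRV targets
        $4 == "A"   { n = $1; sub(/\.$/, "", n); has_a[n] = 1 }  # names with A records
        END { for (t in srv) if (!(t in has_a)) print t }
    ' | sort
}

# Usage: pipe the output of the pdnsutil command above into the helper, e.g.
#   <pdnsutil command> | missing_a_records
```

An empty result means every SRV target has an A record. For the example output above, the helper would list the four `iscsi-server-id-00N.nmn` names.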
On a compute node or UAN (the iSCSI initiator), the following error messages may appear on the console when booting with iSCSI SBPS over the NMN:
[ 139.758355] dracut-pre-mount[2358]: Warning: sbps-init.sh failed.
[ 139.772137] dracut-pre-mount[2353]: Warning: Unable to prepare squashfs file /tmp/cps/rootfs, dropping to debug.
//lib/dracut/hooks/emergency/10-cray-dump-dracut-log.sh: line 12: echo: write error: Invalid argument
Generating "/run/initramfs/rdsosreport.
Press Enter for maintenance
(or press Control-D to continue): &.
csm-packages version
(ncn-mw#) Get the version of the csm-packages Ansible layer.
Find the compute node BOS session template name.
cray bos sessiontemplates list | grep 'compute-'
Example output:
name = "compute-25.3.0-alpha2.x86_64-161rc7"
Find the CFS configuration associated with that BOS session template.
In the following command, substitute the actual template name found in the previous step.
cray bos sessiontemplates describe compute-25.3.0-alpha2.x86_64-161rc7 --format json
Example output:
{
"cfs": {
"configuration": "compute-25.3.0-alpha2-161rc7"
}
}
Describe the CFS configuration identified in the previous step.
In the following command, substitute the actual configuration name found in the previous step.
cray cfs configurations describe compute-25.3.0-alpha2-161rc7 --format json | grep csm-packages-
Example output:
"name": "csm-packages-1.6.1-rc.7",
Use the version string that follows csm-packages- (1.6.1-rc.7 in this example) in the next step.
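To avoid copying the version by hand, the layer name can be trimmed with a small filter. This is a sketch only, assuming the `"name": "csm-packages-<version>",` line format shown in the example output; `csm_packages_version` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of CSM): reads the grep output from the
# previous command on stdin and prints only the version that follows
# "csm-packages-". If several layers match, the first one is used.
csm_packages_version() {
    sed -n 's/.*"name": *"csm-packages-\([^"]*\)".*/\1/p' | head -n 1
}

# Usage: pipe the "cray cfs configurations describe ... | grep csm-packages-"
# command above into the helper and keep the result for the next step, e.g.
#   CSM_PKG_VERSION=$(<command above> | csm_packages_version)
```

`CSM_PKG_VERSION` is likewise an illustrative variable name.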
(ncn-mw#) In the Cray Product Catalog, get the import_branch name associated with the CSM version string from the last step.
In the following command, substitute the actual version found in the previous step.
kubectl get cm -n services cray-product-catalog -o yaml | yq r - 'data.csm' | grep -A 10 '^1.6.1-rc.7:'
Example output:
1.6.1-rc.7:
configuration:
clone_url: https://vcs.cmn.drax.hpc.amslabs.hpecorp.net/vcs/cray/csm-config-management.git
commit: eaa4ee6948961592b0fa279ae775326ad63eb875
import_branch: cray/csm/1.28.0
The import_branch value from this output (cray/csm/1.28.0 in this example) is used in the steps below.
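The branch name can also be pulled out of the catalog text directly. The sketch below assumes the catalog data is YAML with unindented version keys and an indented `import_branch:` entry, as in the example output; `import_branch_for` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of CSM): reads the product catalog YAML for
# CSM on stdin and prints the import_branch recorded under the given version.
import_branch_for() {
    awk -v ver="$1" '
        $1 == (ver ":")                   { found = 1; next }  # start of the wanted version
        found && /^[^[:space:]]/          { exit }             # next version key: stop looking
        found && $1 == "import_branch:"   { print $2; exit }
    '
}

# Usage: feed the CSM catalog data (the yq output above, before the grep)
# into the helper, e.g.
#   <catalog command> | import_branch_for 1.6.1-rc.7
```

Stopping at the next unindented key keeps the helper from returning a branch that belongs to a different version.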
(ncn-mw#) Clone the CSM VCS git repository and apply the workaround.
Clone csm-config-management.git.
GITUSER=$( kubectl get secrets -n services vcs-user-credentials -o json | jq -r .data.vcs_username | base64 -d )
GITPASS=$( kubectl get secrets -n services vcs-user-credentials -o json | jq -r .data.vcs_password | base64 -d )
git clone https://$GITUSER:$GITPASS@api-gw-service-nmn.local/vcs/cray/csm-config-management.git
Check out the import_branch identified in the previous section.
In the following command, substitute the actual branch name found in the previous section.
cd csm-config-management
git checkout cray/csm/1.28.0
Copy scripts to any worker node and execute them.
This example uses ncn-w001, but any worker node may be used.
scp roles/csm.sbps.dns_srv_records/files/sbps_get_host_hsn_nmn.sh \
roles/csm.sbps.dns_srv_records/files/sbps_dns_srv_records.sh ncn-w001:/tmp \
&& ssh ncn-w001 'chmod +x /tmp/sbps_get_host_hsn_nmn.sh /tmp/sbps_dns_srv_records.sh && /tmp/sbps_get_host_hsn_nmn.sh | /tmp/sbps_dns_srv_records.sh'
(ncn-w#) On any worker node, issue one of the following commands to verify that the DNS “SRV” and “A” records for the NMN are present.
powerdns command after workaround:
kubectl -n services exec -it deployment/cray-dns-powerdns -- sh -c 'pdnsutil list-all-zones | xargs -n1 pdnsutil list-zone | grep iscsi'
Example output:
_sbps-hsn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-003.hsn.drax.hpc.amslabs.hpecorp.net.
_sbps-hsn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-004.hsn.drax.hpc.amslabs.hpecorp.net.
_sbps-hsn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-002.hsn.drax.hpc.amslabs.hpecorp.net.
_sbps-hsn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-001.hsn.drax.hpc.amslabs.hpecorp.net.
_sbps-nmn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-003.nmn.drax.hpc.amslabs.hpecorp.net.
_sbps-nmn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-004.nmn.drax.hpc.amslabs.hpecorp.net.
_sbps-nmn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-001.nmn.drax.hpc.amslabs.hpecorp.net.
_sbps-nmn._tcp.drax.hpc.amslabs.hpecorp.net 3600 IN SRV 1 0 3260 iscsi-server-id-002.nmn.drax.hpc.amslabs.hpecorp.net.
iscsi-server-id-001.hsn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.253.0.4
iscsi-server-id-002.hsn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.253.0.2
iscsi-server-id-003.hsn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.253.0.8
iscsi-server-id-004.hsn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.253.0.14
iscsi-server-id-001.nmn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.252.1.7
iscsi-server-id-002.nmn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.252.1.8
iscsi-server-id-003.nmn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.252.1.9
iscsi-server-id-004.nmn.drax.hpc.amslabs.hpecorp.net 3600 IN A 10.252.1.10
dig command after workaround:
dig -t SRV +short _sbps-hsn._tcp.drax.hpc.amslabs.hpecorp.net _sbps-nmn._tcp.drax.hpc.amslabs.hpecorp.net
Example output:
1 0 3260 iscsi-server-id-004.hsn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-003.hsn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-002.hsn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-001.hsn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-002.nmn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-003.nmn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-004.nmn.drax.hpc.amslabs.hpecorp.net.
1 0 3260 iscsi-server-id-001.nmn.drax.hpc.amslabs.hpecorp.net.
dig -t A +short iscsi-server-id-004.hsn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-003.hsn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-002.hsn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-001.hsn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-002.nmn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-003.nmn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-004.nmn.drax.hpc.amslabs.hpecorp.net. iscsi-server-id-001.nmn.drax.hpc.amslabs.hpecorp.net.
Example output:
10.253.0.14
10.253.0.8
10.253.0.2
10.253.0.4
10.252.1.8
10.252.1.9
10.252.1.10
10.252.1.7
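The two dig lookups above can be chained so that each SRV target is resolved in turn. This is a minimal sketch, assuming `dig` is available on the node and that `+short` SRV output has the target in the fourth field, as in the example output; `check_srv_targets` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of CSM): for each target of the given SRV
# name, check that dig returns at least one A record. Returns non-zero if
# any target has no A record.
check_srv_targets() {
    srv_name=$1
    rc=0
    # Field 4 of "dig -t SRV +short" output is the target host name.
    for target in $(dig -t SRV +short "$srv_name" | awk '{ print $4 }'); do
        if [ -z "$(dig -t A +short "$target")" ]; then
            echo "MISSING A record: $target"
            rc=1
        else
            echo "OK: $target"
        fi
    done
    return "$rc"
}

# Usage (substitute the system's own domain name):
#   check_srv_targets _sbps-nmn._tcp.drax.hpc.amslabs.hpecorp.net
```

A zero exit status with only `OK:` lines indicates that every NMN SRV target resolves.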
Booting iSCSI SBPS over the NMN should now succeed.