This page helps an administrator change the NCN udev
rules to accommodate varying PCIe hardware.
NOTE: If a system’s hardware is Plan of Record (PoR), then this page is not needed.
Identify the hardware configuration by PXE booting a node.
Prevent the network boots from completing by removing the links generated by set-sqfs-links.sh.
pit# rm /var/www/ncn-*/{initrd.img.xz,kernel,filesystem.squashfs}
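Before power-cycling the nodes, it may be worth confirming that no boot artifacts remain. The following is a minimal sketch rather than part of the official procedure: the helper name `remaining_boot_artifacts` is hypothetical, and the web root is assumed to be `/var/www` as in the command above.

```shell
# Hypothetical helper (bash): list any boot artifacts still present in the
# per-NCN boot directories under the web root. An empty result means every
# NCN should pause after fetching the iPXE binary.
remaining_boot_artifacts() {
    local web_root="${1:-/var/www}"
    web_root="${web_root%/}"   # tolerate a trailing slash
    find "${web_root}" -mindepth 2 -maxdepth 2 \
        -path "${web_root}/ncn-*" \
        \( -name initrd.img.xz -o -name kernel -o -name filesystem.squashfs \) \
        -print | sort
}
```

Running `remaining_boot_artifacts /var/www` should print nothing once the links have been removed; any path it prints is a link that would still let that node boot.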
The NCNs will fetch the iPXE binary and then pause; this pause prevents the NCN from continuing to boot, providing an opportunity to collect information from it.
Go through each NCN and PXE boot it.
Set username and IPMI_PASSWORD to the current credentials for the system’s BMCs.
read -s prevents the password from being echoed to the screen or preserved in the shell history.
pit# username=root
pit# read -r -s -p "NCN BMC ${username} password: " IPMI_PASSWORD
pit# export IPMI_PASSWORD
pit# mtoken='ncn-m(?!001)\w+-mgmt' ; stoken='ncn-s\w+-mgmt' ; wtoken='ncn-w\w+-mgmt'
pit# grep -oP "($mtoken|$stoken|$wtoken)" /etc/dnsmasq.d/statics.conf | sort -u |
xargs -t -i ipmitool -I lanplus -U "${username}" -E -H {} power off
pit# grep -oP "($mtoken|$stoken|$wtoken)" /etc/dnsmasq.d/statics.conf | sort -u |
xargs -t -i ipmitool -I lanplus -U "${username}" -E -H {} power on
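The mtoken pattern uses a PCRE negative lookahead, `(?!001)`, so that `ncn-m001-mgmt` is never matched, presumably because the PIT is itself running on that node. As a sketch, the same `grep -oP` expression can be wrapped in a function to preview the target list before any power command is sent. The function name `list_ncn_bmcs` is hypothetical, and GNU grep with PCRE support is assumed.

```shell
# Hypothetical helper (bash): print the BMC hostnames that the power
# commands above would target. Requires GNU grep with PCRE support (-P).
list_ncn_bmcs() {
    local statics="${1:-/etc/dnsmasq.d/statics.conf}"
    local mtoken='ncn-m(?!001)\w+-mgmt'   # masters, excluding ncn-m001
    local stoken='ncn-s\w+-mgmt'          # storage nodes
    local wtoken='ncn-w\w+-mgmt'          # worker nodes
    grep -oP "(${mtoken}|${stoken}|${wtoken})" "${statics}" | sort -u
}
```

Calling `list_ncn_bmcs` with no argument reads `/etc/dnsmasq.d/statics.conf`; if the printed list contains an unexpected host, it is safer to correct the pattern before running the `power off` and `power on` commands.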
Each node will attempt to PXE boot; nodes that network boot successfully print the PCI Vendor and Device IDs of their NICs to the console. This data can be cross-referenced with the NCN networking page, as described in the following steps.
Collect information from the nodes.
Collect the PIT’s device IDs:
pit# lid
Example output:
em1 8086:37D2
em2 8086:37D2
p801p1 15B3:1013
p801p2 15B3:1013
Collect the other NCNs’ PCI Device and PCI Vendor IDs:
pit# for file in /var/log/conman/console*ncn*; do
    echo "${file}"
    grep -Eoh '(net[0-9] MAC .*)' "${file}" | sort -u | grep PCI && echo -----
done
Example output:
/var/log/conman/console.ncn-m001-mgmt
/var/log/conman/console.ncn-m002-mgmt
net0 MAC b8:59:9f:f9:1c:8e PCI.DeviceID 1013 PCI.VendorID 15b3
net1 MAC b8:59:9f:f9:1c:8f PCI.DeviceID 1013 PCI.VendorID 15b3
-----
/var/log/conman/console.ncn-m003-mgmt
net0 MAC a4:bf:01:6f:6a:fe PCI.DeviceID 37d2 PCI.VendorID 8086
net1 MAC a4:bf:01:6f:6a:ff PCI.DeviceID 37d2 PCI.VendorID 8086
net2 MAC b8:59:9f:fe:49:9c PCI.DeviceID 1013 PCI.VendorID 15b3
net3 MAC b8:59:9f:fe:49:9d PCI.DeviceID 1013 PCI.VendorID 15b3
-----
/var/log/conman/console.ncn-s001-mgmt
net0 MAC b8:59:9f:4a:f6:58 PCI.DeviceID 1013 PCI.VendorID 15b3
net1 MAC b8:59:9f:4a:f6:59 PCI.DeviceID 1013 PCI.VendorID 15b3
-----
/var/log/conman/console.ncn-s002-mgmt
net0 MAC b8:59:9f:fe:49:ec PCI.DeviceID 1013 PCI.VendorID 15b3
net1 MAC b8:59:9f:fe:49:ed PCI.DeviceID 1013 PCI.VendorID 15b3
-----
/var/log/conman/console.ncn-s003-mgmt
net0 MAC a4:bf:01:48:1f:6c PCI.DeviceID 37d2 PCI.VendorID 8086
net1 MAC a4:bf:01:48:1f:6d PCI.DeviceID 37d2 PCI.VendorID 8086
net2 MAC b8:59:9f:f9:1c:ba PCI.DeviceID 1013 PCI.VendorID 15b3
net3 MAC b8:59:9f:f9:1c:bb PCI.DeviceID 1013 PCI.VendorID 15b3
-----
/var/log/conman/console.ncn-s004-mgmt
net0 MAC b8:59:9f:2b:31:1a PCI.DeviceID 1013 PCI.VendorID 15b3
net1 MAC b8:59:9f:2b:31:1b PCI.DeviceID 1013 PCI.VendorID 15b3
-----
/var/log/conman/console.ncn-w001-mgmt
net0 MAC 50:6b:4b:23:a7:90 PCI.DeviceID 1017 PCI.VendorID 15b3
net1 MAC b8:59:9f:fe:49:d8 PCI.DeviceID 1013 PCI.VendorID 15b3
net2 MAC b8:59:9f:fe:49:d9 PCI.DeviceID 1013 PCI.VendorID 15b3
-----
/var/log/conman/console.ncn-w002-mgmt
net0 MAC 50:6b:4b:23:a7:98 PCI.DeviceID 1017 PCI.VendorID 15b3
net1 MAC b8:59:9f:fe:49:f0 PCI.DeviceID 1013 PCI.VendorID 15b3
net2 MAC b8:59:9f:fe:49:f1 PCI.DeviceID 1013 PCI.VendorID 15b3
-----
/var/log/conman/console.ncn-w003-mgmt
net0 MAC b8:59:9f:d9:9e:2c PCI.DeviceID 1013 PCI.VendorID 15b3
net1 MAC b8:59:9f:d9:9e:2d PCI.DeviceID 1013 PCI.VendorID 15b3
-----
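The lid output is laid out as VENDOR:DEVICE in uppercase, while the console logs print the Device ID first and in lowercase. To make the two easier to compare, the console lines can be normalized with a small filter. This is only a sketch: `normalize_pci_ids` is a hypothetical name, and the field layout is assumed to match the example output above.

```shell
# Hypothetical filter (bash + awk): convert console lines such as
#   net0 MAC b8:59:9f:f9:1c:8e PCI.DeviceID 1013 PCI.VendorID 15b3
# into "net0 15B3:1013", the VENDOR:DEVICE layout printed by lid.
# Lines that do not match the expected layout are dropped.
normalize_pci_ids() {
    awk '
        { sub(/\r$/, "") }   # strip a trailing carriage return, if any
        $2 == "MAC" && $4 == "PCI.DeviceID" && $6 == "PCI.VendorID" {
            print $1, toupper($7) ":" toupper($5)
        }
    '
}
```

For example, `grep -Eoh '(net[0-9] MAC .*)' /var/log/conman/console.ncn-w001-mgmt | sort -u | normalize_pci_ids` would print one `netN VENDOR:DEVICE` line per interface for that node.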
Compare the PCI.DeviceID and PCI.VendorID values returned in the previous step with those listed in NCN Networking Vendor and Bus ID Identification.
Because PoR systems are handled by the defaults, a non-PoR system will show PCI.DeviceID values that differ from the bold entries in the Vendor and Bus ID table.
After identifying which cards should be used for the management NICs, follow one of the options below, depending on the scope of the PCIe card change:
If all NCNs have the same change (for example, all NCNs use ConnectX-5s for their management NICs), then update the main /var/www/boot/script.ipxe file and re-run set-sqfs-links.sh.
Replace the default Vendor ID with the desired Vendor ID. This example uses the Intel Vendor ID, 8086.
pit# sed -i 's/mgmt_vid0 .*/mgmt_vid0 8086/g' /var/www/boot/script.ipxe
Restore the initrd.img.xz, kernel, and filesystem.squashfs links to the boot directories.
pit# set-sqfs-links.sh
If only a subset of the NCNs have differing cards, re-run set-sqfs-links.sh and then update just that subset of boot scripts:
The example below selects Intel as the management NIC vendor, meaning that onboard NICs or Intel PCIe cards are used for the management interfaces. The example applies this change to a subset of nodes: the worker nodes only.
Restore the initrd.img.xz, kernel, and filesystem.squashfs links to the boot directories.
pit# set-sqfs-links.sh
Replace the default Vendor ID with the desired Intel Vendor ID.
pit# sed -i 's/mgmt_vid0 .*/mgmt_vid0 8086/g' /var/www/ncn-w*/script.ipxe
Now the boot scripts are set up for booting differing PCIe cards or onboard NICs.
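Because sed -i edits files in place, it can be useful to preview a substitution before running either of the sed commands above. The sketch below prints a unified diff of what the same expression would change, without modifying the file. The helper name `preview_ipxe_override` is hypothetical, and the exact form of the mgmt_vid0 line in script.ipxe is inferred from the sed pattern rather than confirmed.

```shell
# Hypothetical helper (bash): show what "sed -i 's/<key> .*/<key> <value>/g'"
# would change in each given boot script, without writing to the file.
# Usage: preview_ipxe_override <key> <value> <script.ipxe>...
preview_ipxe_override() {
    local key="$1" value="$2" script
    shift 2
    for script in "$@"; do
        # diff exits non-zero when the files differ; that is expected here.
        sed "s/${key} .*/${key} ${value}/g" "${script}" |
            diff -u "${script}" - || true
    done
}
```

For the worker-only case, `preview_ipxe_override mgmt_vid0 8086 /var/www/ncn-w*/script.ipxe` would show the pending change per file; no output for a file means the expression would leave it untouched.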
Ensure that the management NICs do not get labeled as HSN NICs.
In some cases the cards used for HSN NICs are used for management interfaces (for example, the system’s storage and master nodes use ConnectX-5s). In this case, this procedure will ensure that they are properly labeled.
Note: The HSN NICs key off of the Device ID, not the Vendor ID.
Restore the initrd.img.xz, kernel, and filesystem.squashfs links to the boot directories.
pit# set-sqfs-links.sh
Replace the default HSN Device ID with 0000 in the master and storage node boot scripts.
pit# sed -i 's/hsn_did0 .*/hsn_did0 0000/g' /var/www/ncn-m*/script.ipxe
pit# sed -i 's/hsn_did0 .*/hsn_did0 0000/g' /var/www/ncn-s*/script.ipxe
At this point the system is primed to boot custom PCIe or onboard NICs, and the boot files removed in step 1 are now restored.
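Before booting the nodes, the resulting settings can be reviewed across all boot scripts. The following is a minimal sketch, assuming GNU or BSD grep (for -H); `show_nic_overrides` is a hypothetical helper name, and the setting names mgmt_vid0 and hsn_did0 are taken from the sed commands above.

```shell
# Hypothetical helper (bash): print "<file>:<line>" for the management
# Vendor ID and HSN Device ID settings found in each given boot script.
# Returns non-zero if none of the files contain either setting.
show_nic_overrides() {
    grep -HE '(mgmt_vid0|hsn_did0) ' "$@"
}
```

Running `show_nic_overrides /var/www/ncn-*/script.ipxe` gives a per-node view, which makes it easier to spot a node that was missed by a glob such as ncn-w* or ncn-m*.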
Archive these scripts and copy the archive off the system so that they can be reused for a later re-install.
pit# SYSTEM_NAME=eniac
pit# tar -czvf "${SYSTEM_NAME}-boot-scripts.tar.gz" /var/www/ncn-*/script.ipxe
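It can help to list the archive contents before copying it elsewhere, to confirm that every expected script was captured. The sketch below combines the two steps; `save_boot_scripts` is a hypothetical helper name, not part of the product tooling.

```shell
# Hypothetical helper (bash): archive the given boot scripts, then list the
# archive contents so the result can be checked before it is copied away.
# Usage: save_boot_scripts <archive.tar.gz> <script.ipxe>...
save_boot_scripts() {
    local archive="$1"
    shift
    tar -czf "${archive}" "$@" && tar -tzf "${archive}"
}
```

For example, `save_boot_scripts "${SYSTEM_NAME}-boot-scripts.tar.gz" /var/www/ncn-*/script.ipxe` should list one entry per NCN. Note that tar strips the leading `/` from absolute paths, so the entries appear as `var/www/ncn-.../script.ipxe`; extracting with `-C /` would be one way to place them back in their original locations.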