Customize PCIe Hardware

This page describes how an administrator can change the NCN udev rules to account for varying PCIe hardware.

NOTE: If a system’s hardware is Plan of Record (PoR), then this page is not needed.

Procedure

Identify the hardware configuration by PXE booting a node.

  1. Prevent the network boots from completing by removing the links generated by set-sqfs-links.sh.

    pit# rm /var/www/ncn-*/{initrd.img.xz,kernel,filesystem.squashfs}
    

    The NCNs will fetch the iPXE binary and then pause; this pause prevents the NCN from continuing to boot, providing an opportunity to collect information from it.
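
One way to confirm that the removal took effect is to list any boot artifacts that remain under the NCN boot directories. The following is a minimal sketch, not part of the system tooling: the helper name `list_boot_artifacts` is hypothetical, and the web root is passed as an argument so the check can be tried against a scratch directory first.

```shell
# Hypothetical helper: print any boot artifacts still present under <webroot>/ncn-*/.
# Empty output means nothing is left for the NCNs to fetch after the iPXE binary.
list_boot_artifacts() {
    find "$1" -path '*/ncn-*/*' \
        \( -name initrd.img.xz -o -name kernel -o -name filesystem.squashfs \)
}

# Demonstration against a scratch directory instead of /var/www.
webroot="$(mktemp -d)"
mkdir -p "${webroot}/ncn-w001"
touch "${webroot}/ncn-w001/kernel" "${webroot}/ncn-w001/script.ipxe"
list_boot_artifacts "${webroot}"      # lists the leftover kernel entry
rm "${webroot}/ncn-w001/kernel"
list_boot_artifacts "${webroot}"      # prints nothing
rm -rf "${webroot}"
```

On the PIT node the equivalent call would be `list_boot_artifacts /var/www`.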

  2. Go through each NCN and PXE boot it.

    Set username and IPMI_PASSWORD to the current credentials for the system’s BMCs.

    read -s is used to prevent the password from being echoed to the screen or preserved in the shell history.

    pit# username=root
    pit# read -r -s -p "NCN BMC ${username} password: " IPMI_PASSWORD
    pit# export IPMI_PASSWORD
    pit# mtoken='ncn-m(?!001)\w+-mgmt' ; stoken='ncn-s\w+-mgmt' ; wtoken='ncn-w\w+-mgmt'
    pit# grep -oP "($mtoken|$stoken|$wtoken)" /etc/dnsmasq.d/statics.conf | sort -u |
            xargs -t -i ipmitool -I lanplus -U "${username}" -E -H {} power off
    pit# grep -oP "($mtoken|$stoken|$wtoken)" /etc/dnsmasq.d/statics.conf | sort -u |
            xargs -t -i ipmitool -I lanplus -U "${username}" -E -H {} power on
    

    Each node will attempt to PXE boot; nodes that network boot successfully will print the PCI Vendor ID and Device ID of each NIC to the console. Cross-reference this data with the NCN networking page, as described in the following steps.
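
The three patterns used above select every master BMC except ncn-m001 (the negative lookahead `(?!001)` skips it), plus all storage and worker BMCs. The sketch below shows that selection in isolation so the host list can be reviewed before any power command is sent. The helper name `list_ncn_bmcs` and the sample file contents are assumptions for illustration; the real statics.conf layout may differ. GNU grep is required for `-P`.

```shell
# Same patterns as the procedure.
mtoken='ncn-m(?!001)\w+-mgmt'
stoken='ncn-s\w+-mgmt'
wtoken='ncn-w\w+-mgmt'

# Hypothetical helper: print the unique BMC hostnames found in the given file.
list_ncn_bmcs() {
    grep -oP "($mtoken|$stoken|$wtoken)" "$1" | sort -u
}

# Hypothetical sample content, used only to demonstrate the matching.
sample="$(mktemp)"
cat > "${sample}" <<'EOF'
host-record=ncn-m001-mgmt,10.254.1.2
host-record=ncn-m002-mgmt,10.254.1.4
host-record=ncn-s001-mgmt,10.254.1.6
host-record=ncn-w001-mgmt,10.254.1.8
EOF

list_ncn_bmcs "${sample}"
# ncn-m002-mgmt
# ncn-s001-mgmt
# ncn-w001-mgmt
rm -f "${sample}"
```

On the PIT node, `list_ncn_bmcs /etc/dnsmasq.d/statics.conf` would print the hosts that the power off and power on commands will target.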

  3. Collect information from the nodes.

    • Collect the PIT’s device IDs:

      pit# lid
      

      Example output:

      em1    8086:37D2
      em2    8086:37D2
      p801p1 15B3:1013
      p801p2 15B3:1013
      
    • Collect the other NCNs’ PCI Device and PCI Vendor IDs:

      pit# for file in /var/log/conman/console*ncn*; do
              echo "${file}"
              grep -Eoh '(net[0-9] MAC .*)' "${file}" | sort -u | grep PCI && echo -----
           done
      

      Example output:

      /var/log/conman/console.ncn-m001-mgmt
      /var/log/conman/console.ncn-m002-mgmt
      net0 MAC b8:59:9f:f9:1c:8e PCI.DeviceID 1013 PCI.VendorID 15b3
      net1 MAC b8:59:9f:f9:1c:8f PCI.DeviceID 1013 PCI.VendorID 15b3
      -----
      /var/log/conman/console.ncn-m003-mgmt
      net0 MAC a4:bf:01:6f:6a:fe PCI.DeviceID 37d2 PCI.VendorID 8086
      net1 MAC a4:bf:01:6f:6a:ff PCI.DeviceID 37d2 PCI.VendorID 8086
      net2 MAC b8:59:9f:fe:49:9c PCI.DeviceID 1013 PCI.VendorID 15b3
      net3 MAC b8:59:9f:fe:49:9d PCI.DeviceID 1013 PCI.VendorID 15b3
      -----
      /var/log/conman/console.ncn-s001-mgmt
      net0 MAC b8:59:9f:4a:f6:58 PCI.DeviceID 1013 PCI.VendorID 15b3
      net1 MAC b8:59:9f:4a:f6:59 PCI.DeviceID 1013 PCI.VendorID 15b3
      -----
      /var/log/conman/console.ncn-s002-mgmt
      net0 MAC b8:59:9f:fe:49:ec PCI.DeviceID 1013 PCI.VendorID 15b3
      net1 MAC b8:59:9f:fe:49:ed PCI.DeviceID 1013 PCI.VendorID 15b3
      -----
      /var/log/conman/console.ncn-s003-mgmt
      net0 MAC a4:bf:01:48:1f:6c PCI.DeviceID 37d2 PCI.VendorID 8086
      net1 MAC a4:bf:01:48:1f:6d PCI.DeviceID 37d2 PCI.VendorID 8086
      net2 MAC b8:59:9f:f9:1c:ba PCI.DeviceID 1013 PCI.VendorID 15b3
      net3 MAC b8:59:9f:f9:1c:bb PCI.DeviceID 1013 PCI.VendorID 15b3
      -----
      /var/log/conman/console.ncn-s004-mgmt
      net0 MAC b8:59:9f:2b:31:1a PCI.DeviceID 1013 PCI.VendorID 15b3
      net1 MAC b8:59:9f:2b:31:1b PCI.DeviceID 1013 PCI.VendorID 15b3
      -----
      /var/log/conman/console.ncn-w001-mgmt
      net0 MAC 50:6b:4b:23:a7:90 PCI.DeviceID 1017 PCI.VendorID 15b3
      net1 MAC b8:59:9f:fe:49:d8 PCI.DeviceID 1013 PCI.VendorID 15b3
      net2 MAC b8:59:9f:fe:49:d9 PCI.DeviceID 1013 PCI.VendorID 15b3
      -----
      /var/log/conman/console.ncn-w002-mgmt
      net0 MAC 50:6b:4b:23:a7:98 PCI.DeviceID 1017 PCI.VendorID 15b3
      net1 MAC b8:59:9f:fe:49:f0 PCI.DeviceID 1013 PCI.VendorID 15b3
      net2 MAC b8:59:9f:fe:49:f1 PCI.DeviceID 1013 PCI.VendorID 15b3
      -----
      /var/log/conman/console.ncn-w003-mgmt
      net0 MAC b8:59:9f:d9:9e:2c PCI.DeviceID 1013 PCI.VendorID 15b3
      net1 MAC b8:59:9f:d9:9e:2d PCI.DeviceID 1013 PCI.VendorID 15b3
      -----
      
  4. Use the information returned in the previous step to compare the PCI.DeviceID and PCI.VendorID values to what is in NCN Networking Vendor and Bus ID Identification.
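
To make this comparison easier, the console output can be reduced to one VENDOR:DEVICE pair per interface, the same form that lid prints. This is a sketch only: the helper name `summarize_nics` is hypothetical, and it assumes the console lines look like the example output in step 3.

```shell
# Hypothetical helper: print "<interface> <VENDOR>:<DEVICE>" for each NIC in a console log.
summarize_nics() {
    # Console logs may carry carriage returns, so strip them before parsing.
    tr -d '\r' < "$1" |
        grep -Eo 'net[0-9] MAC .*' | sort -u | grep PCI |
        awk '{
            vid = ""; did = ""
            # Look the IDs up by key name rather than by column position.
            for (i = 1; i < NF; i++) {
                if ($i == "PCI.VendorID") vid = $(i + 1)
                if ($i == "PCI.DeviceID") did = $(i + 1)
            }
            print $1, toupper(vid ":" did)
        }'
}

# Demonstration using lines taken from the example output (one is duplicated on purpose).
log="$(mktemp)"
printf '%s\r\n' \
    'net0 MAC 50:6b:4b:23:a7:90 PCI.DeviceID 1017 PCI.VendorID 15b3' \
    'net1 MAC b8:59:9f:fe:49:d8 PCI.DeviceID 1013 PCI.VendorID 15b3' \
    'net0 MAC 50:6b:4b:23:a7:90 PCI.DeviceID 1017 PCI.VendorID 15b3' > "${log}"
summarize_nics "${log}"
# net0 15B3:1017
# net1 15B3:1013
rm -f "${log}"
```

On the PIT node the helper could be run against each file matching /var/log/conman/console*ncn*.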

  5. PoR systems are handled by the defaults, so on a non-PoR system expect the PCI.DeviceID values to differ from the bold entries in the Vendor and Bus ID table.

  6. After identifying which cards should be used for the management NICs, follow one of the options below, depending on the scope of the PCIe card change:

    • If all NCNs have the same change (e.g. all NCNs use ConnectX-5s for their management NICs), then update the main /var/www/boot/script.ipxe file and re-run set-sqfs-links.sh.

      1. Replace the default Vendor ID with the desired Vendor ID. This example uses the Intel Vendor ID (8086).

        pit# sed -i 's/mgmt_vid0 .*/mgmt_vid0 8086/g' /var/www/boot/script.ipxe
        
      2. Restore the initrd.img.xz, kernel, and filesystem.squashfs links to the boot directories.

        pit# set-sqfs-links.sh
        
    • If only a subset of the NCNs have differing cards, re-run set-sqfs-links.sh and then update just that subset of boot scripts:

      The example below selects Intel for the management NICs, meaning that onboard NICs or Intel PCIe cards are used for the management interfaces. The example applies the change to a subset of nodes only, specifically the worker nodes.

      1. Restore the initrd.img.xz, kernel, and filesystem.squashfs links to the boot directories.

        pit# set-sqfs-links.sh
        
      2. Replace the default Vendor ID with the desired Intel Vendor ID.

        pit# sed -i 's/mgmt_vid0 .*/mgmt_vid0 8086/g' /var/www/ncn-w*/script.ipxe
        

    Now the boot scripts are set up for booting differing PCIe cards or onboard NICs.
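
Because the sed expression replaces everything after `mgmt_vid0` on the line, a mistyped ID is written to the boot script without complaint. The sketch below wraps the same sed command with a basic format check and prints the resulting lines for review. The wrapper name `set_mgmt_vid` is hypothetical, the sample line `set mgmt_vid0 15b3` is only an assumed shape for a script.ipxe entry, and GNU sed is required for `-i`.

```shell
# Hypothetical wrapper around the sed command from step 6.
# Usage: set_mgmt_vid <4-hex-digit vendor id> <script.ipxe>...
set_mgmt_vid() {
    vid="$1"
    shift
    # Require at least one file so that sed never waits on standard input.
    [ "$#" -ge 1 ] || { echo "no boot scripts given" >&2; return 1; }
    # Refuse anything that is not a 4-digit hexadecimal ID.
    printf '%s\n' "${vid}" | grep -Eq '^[0-9A-Fa-f]{4}$' || {
        echo "invalid vendor ID: ${vid}" >&2
        return 1
    }
    sed -i "s/mgmt_vid0 .*/mgmt_vid0 ${vid}/g" "$@"
    # Show the resulting lines so the change can be reviewed.
    grep -H 'mgmt_vid0 ' "$@"
}

# Demonstration against a scratch file instead of a real boot script.
script="$(mktemp)"
echo 'set mgmt_vid0 15b3' > "${script}"
set_mgmt_vid 8086 "${script}"
rm -f "${script}"
```

For the worker-only case above, the call would be `set_mgmt_vid 8086 /var/www/ncn-w*/script.ipxe`.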

  7. Ensure that the management NICs do not get labeled as HSN NICs.

    In some cases the cards used for HSN NICs are used for management interfaces (for example, the system’s storage and master nodes use ConnectX-5s). In this case, this procedure will ensure that they are properly labeled.

    Note: The HSN NICs key off of the Device ID, not the Vendor ID.

    1. Restore the initrd.img.xz, kernel, and filesystem.squashfs links to the boot directories.

      pit# set-sqfs-links.sh
      
    2. Replace the default HSN Device ID with 0000 in the master and storage node boot scripts.

      pit# sed -i 's/hsn_did0 .*/hsn_did0 0000/g' /var/www/ncn-m*/script.ipxe
      pit# sed -i 's/hsn_did0 .*/hsn_did0 0000/g' /var/www/ncn-s*/script.ipxe
      
  8. At this point the system is primed to boot custom PCIe or onboard NICs, and the boot files removed in step 1 are now restored.
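
Before booting the nodes, it can help to review the values that each boot script now carries. The following is a minimal sketch under stated assumptions: the helper name `report_nic_ids` is hypothetical, and it assumes each script.ipxe holds the IDs on lines of the form `mgmt_vid0 <id>` and `hsn_did0 <id>`, which is what the sed expressions in steps 6 and 7 imply.

```shell
# Hypothetical helper: print the management Vendor ID and HSN Device ID per boot script.
report_nic_ids() {
    for script in "$@"; do
        # Take the first match of each setting; report "unset" when none is found.
        vid="$(sed -n 's/.*mgmt_vid0 \([0-9A-Fa-f]*\).*/\1/p' "${script}" | head -n 1)"
        did="$(sed -n 's/.*hsn_did0 \([0-9A-Fa-f]*\).*/\1/p' "${script}" | head -n 1)"
        printf '%s mgmt_vid0=%s hsn_did0=%s\n' "${script}" "${vid:-unset}" "${did:-unset}"
    done
}

# Demonstration against a scratch file instead of a real boot script.
script="$(mktemp)"
printf 'set mgmt_vid0 8086\nset hsn_did0 0000\n' > "${script}"
report_nic_ids "${script}"
rm -f "${script}"
```

On the PIT node the helper could be run as `report_nic_ids /var/www/ncn-*/script.ipxe`.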

  9. Archive the boot scripts so they can be copied off the system and reused for a later re-install.

    pit# SYSTEM_NAME=eniac
    pit# tar -czvf "${SYSTEM_NAME}-boot-scripts.tar.gz" /var/www/ncn-*/script.ipxe
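
Before copying the archive elsewhere, it is worth confirming that it holds one script.ipxe per NCN. A minimal sketch follows; the helper name `count_archived_scripts` is hypothetical, and the demonstration builds a small scratch archive rather than touching /var/www. Note that GNU tar normally strips the leading `/` from member names, so the archived paths are relative.

```shell
# Hypothetical helper: count the script.ipxe entries stored in a gzip-compressed tar archive.
count_archived_scripts() {
    tar -tzf "$1" | grep -c 'script\.ipxe$'
}

# Demonstration with a scratch archive holding two boot scripts.
work="$(mktemp -d)"
mkdir -p "${work}/ncn-w001" "${work}/ncn-w002"
echo 'set mgmt_vid0 8086' > "${work}/ncn-w001/script.ipxe"
echo 'set mgmt_vid0 8086' > "${work}/ncn-w002/script.ipxe"
tar -czf "${work}/demo-boot-scripts.tar.gz" -C "${work}" \
    ncn-w001/script.ipxe ncn-w002/script.ipxe
count_archived_scripts "${work}/demo-boot-scripts.tar.gz"   # prints 2
rm -rf "${work}"
```

For the archive created above, the call would be `count_archived_scripts "${SYSTEM_NAME}-boot-scripts.tar.gz"`, and the result should equal the number of NCN boot directories.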