Troubleshooting Installation Problems

Installing the Cray System Management (CSM) product requires knowledge of the nodes and switches in the HPE Cray EX system. Refer to the procedures in this section during the CSM install for additional information on system hardware, troubleshooting, and administrative tasks related to CSM.

Topics:

  1. Reset root Password on LiveCD
  2. Reinstall LiveCD
  3. PXE Boot Troubleshooting
  4. Wipe NCN Disks for Reinstallation
  5. Restart Network Services and Interfaces on NCNs
  6. Utility Storage Node Installation Troubleshooting
  7. Ceph CSI Troubleshooting
  8. Safeguards for CSM NCN Upgrades

Details

  1. Reset root Password on LiveCD

    Use this procedure to reset the root password on the LiveCD if it needs to be changed.

    See Reset root Password on LiveCD

  2. Reinstall LiveCD

    If a reinstall of the PIT node is needed, the data from the PIT node can be saved to the LiveCD USB and the LiveCD USB can be rebuilt.

    See Reinstall LiveCD

  3. PXE Boot Troubleshooting

    If a node fails to PXE boot during the install, this procedure can help identify the cause.

    See PXE Boot Troubleshooting

  4. Wipe NCN Disks for Reinstallation

    If an NCN did not properly configure its storage during the Deploy Management Nodes step of the install, wipe its storage so the node can be redeployed.

    See Wipe NCN Disks for Reinstallation

  5. Restart Network Services and Interfaces on NCNs

    If an NCN shows any of these problems, the network services and interfaces on that node might need to be restarted.

    • Interfaces not showing up
    • IP addresses not being applied
    • Member/children interfaces not being included

    See Restart Network Services and Interfaces on NCNs
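
The three symptoms listed above can be checked before deciding whether to restart anything. The following is a minimal sketch, not part of the referenced procedure: it assumes the output of `ip -j addr show` (the iproute2 JSON format) has been captured as a string, and the function name `find_interface_problems`, the `expected` mapping, and the interface names in the sample are all hypothetical, chosen for illustration only.

```python
import json


def find_interface_problems(ip_addr_json, expected):
    """Return a list of problems found in `ip -j addr show` JSON output.

    `expected` maps a parent interface name (for example a bond) to the list
    of member interfaces it should contain. This mapping is supplied by the
    caller; it is an assumption of this sketch, not something CSM provides.
    """
    links = {link["ifname"]: link for link in json.loads(ip_addr_json)}
    problems = []

    for parent, members in expected.items():
        # Symptom 1: the interface is not showing up at all.
        if parent not in links:
            problems.append(f"{parent}: interface not present")
            continue
        # Symptom 2: the interface exists but has no IP address.
        if not links[parent].get("addr_info"):
            problems.append(f"{parent}: no IP address assigned")
        # Symptom 3: member interfaces are missing from the parent.
        for member in members:
            if member not in links:
                problems.append(f"{member}: interface not present")
            elif links[member].get("master") != parent:
                problems.append(f"{member}: not a member of {parent}")
    return problems


# Hypothetical sample data shaped like `ip -j addr show` output.
sample = json.dumps([
    {"ifname": "mgmt0", "master": "bond0", "addr_info": []},
    {"ifname": "mgmt1", "addr_info": []},
    {"ifname": "bond0", "addr_info": []},
])

for problem in find_interface_problems(sample, {"bond0": ["mgmt0", "mgmt1"]}):
    print(problem)
```

An empty result does not prove the network is healthy; it only means none of the three listed symptoms was detected for the interfaces named in `expected`.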

  6. Utility Storage Node Installation Troubleshooting

    If the creation of Ceph storage on the utility storage nodes fails in the following scenario, the Ceph storage might need to be reinitialized.

    • A single large OSD is created as a concatenation of multiple devices, instead of one OSD per device

    See Utility Storage Node Installation Troubleshooting

  7. Ceph CSI Troubleshooting

    If not all Ceph CSI components were initialized on ncn-s001, the storage node cloud-init may need to be rerun.

    • Verify Ceph CSI
    • Rerun Storage Node cloud-init

    See Ceph CSI Troubleshooting

  8. Safeguards for CSM NCN Upgrades

    During a reinstall or upgrade, one or more of these safeguards might be needed.

    • Preserve Ceph on Utility Storage Nodes
    • Protect RAID Configuration on Management Nodes

    See Safeguards for CSM NCN Upgrades