This document provides links to troubleshooting information for services and functionality provided by CSM.
On the main repository landing page, change the branch to the CSM version being used on the system (for example, release/1.0, release/1.2, or release/1.3).
Use the pre-populated GitHub “Search or jump to…” function in the upper left-hand side of the page and append keywords related to the problem being seen to the existing search. (The example searches for “ping” and “PXE” related troubleshooting resources on the “main” branch.)
Follow any run-books, guides, or procedures which are directly related to the problem.
Change the branch to main and search a second time to retrieve very recent or beta run-books and guides.
Users can also expand the search beyond the “troubleshooting” section (instead of doing “path troubleshooting”) and/or use more advanced GitHub searches such as “path configure” to find the right context.
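The search described above can also be assembled as a URL. The following is a minimal sketch, not part of the CSM tooling: the helper name `build_search_url` and the `OWNER/REPO` placeholder are hypothetical, and it assumes GitHub's `repo:` and `path:` code-search qualifiers and the `https://github.com/search` endpoint. Substitute the actual documentation repository before using it.

```python
from urllib.parse import urlencode


def build_search_url(repo, keywords, path="troubleshooting"):
    """Return a GitHub code-search URL for `keywords` within `repo`.

    `repo` is an "owner/name" string (placeholder here, not a real repository).
    `path` narrows results to files whose path contains that string;
    pass None to search the whole repository.
    """
    terms = [f"repo:{repo}"]
    if path:
        terms.append(f"path:{path}")
    if len(keywords) > 1:
        # Group alternatives so the OR applies only to the keywords.
        terms.append("(" + " OR ".join(keywords) + ")")
    elif keywords:
        terms.append(keywords[0])
    query = " ".join(terms)
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


if __name__ == "__main__":
    # Narrow search inside the troubleshooting section.
    print(build_search_url("OWNER/REPO", ["ping", "PXE"]))
    # Broader search across a different part of the repository.
    print(build_search_url("OWNER/REPO", ["PXE"], path="configure"))
```

Passing `path=None` drops the path restriction entirely, which corresponds to expanding the search beyond the troubleshooting section.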
RedfishEndpoints in Hardware State Manager (HSM)
admin-client-auth Not Found
kubectl logs -f returns no space left on device
PodInitializing
management-nodes-rollout failed
cray-console-node pods in CrashLoopBackOff
cloud-init fails with ‘Timed out waiting for device’ error
OOMKilled
boot_sets field always required when modifying a BOS session template
waiting_on_user
running
require_dkms=true
arch=x86_64
kubectl Commands
NotReady
Discovery
Check for Redfish Events from Nodes