Cray System Management Documentation > Cray System Management (CSM) Administration Guide > User Access Service (UAS) > Troubleshoot UAI Stuck in ContainerCreating

Troubleshoot UAI Stuck in `ContainerCreating`

Resolve an issue causing UAIs to show a uai_status field of Waiting, and a uai_msg field of ContainerCreating. It is possible that this is just a matter of starting the UAI taking longer than normal, perhaps as it pulls in a new UAI image from a registry. If the issue persists for a long time, it is worth investigating.

Prerequisites

The administrator must be logged into an NCN or a host that has administrative access to the HPE Cray EX System API Gateway
The administrator must have the HPE Cray EX System CLI (cray command) installed on the above host
The HPE Cray EX System CLI must be configured (initialized - cray init command) to reach the HPE Cray EX System API Gateway
The administrator must be logged in as an administrator to the HPE Cray EX System CLI (cray auth login command)
The administrator must be on an NCN or host that has Kubernetes (kubectl command) access to the HPE Cray EX System

Symptoms

The UAI has been in the ContainerCreating status for several minutes.

Procedure

Find the UAI.

ncn-m001-cray uas admin uais list --owner ctuser

Example output:

[[results]]
uai_age = "1m"
uai_connect_string = "ssh ctuser@10.103.13.159"
uai_host = "ncn-w001"
uai_img = "dtr.dev.cray.com/cray/cray-uai-sles15sp1:latest"
uai_ip = "10.103.13.159"
uai_msg = "ContainerCreating"
uai_name = "uai-ctuser-bcd1ff74"
uai_status = "Waiting"
username = "ctuser"

Look up the UAI’s pod in Kubernetes.

ncn-m001-kubectl get po -n user | grep uai-ctuser-bcd1ff74

Example output:

uai-ctuser-bcd1ff74-7d94967bdc-4vm66   0/1     ContainerCreating   0          2m58s

Describe the pod in Kubernetes.

ncn-m001-kubectl describe pod -n user uai-ctuser-bcd1ff74-7d94967bdc-4vm66

Example output:

Name:                 uai-ctuser-bcd1ff74-7d94967bdc-4vm66
Namespace:            user
Priority:             -100
Priority Class Name:  uai-priority
Node:                 ncn-w001/10.252.1.12
Start Time:           Wed, 03 Feb 2021 18:33:00 -0600

[...]

Events:
Type     Reason       Age                    From               Message
----     ------       ----                   ----               -------
Normal   Scheduled    <unknown>              default-scheduler  Successfully assigned user/uai-ctuser-bcd1ff74-7d94967bdc-4vm66 to ncn-w001
Warning  FailedMount  2m53s (x8 over 3m57s)  kubelet, ncn-w001  MountVolume.SetUp failed for volume "broker-sssd-config" : secret "broker-sssd-conf" not found
Warning  FailedMount  2m53s (x8 over 3m57s)  kubelet, ncn-w001  MountVolume.SetUp failed for volume "broker-sshd-config" : configmap "broker-sshd-conf" not found
Warning  FailedMount  2m53s (x8 over 3m57s)  kubelet, ncn-w001  MountVolume.SetUp failed for volume "broker-entrypoint" : configmap "broker-entrypoint" not found
Warning  FailedMount  114s                   kubelet, ncn-w001  Unable to attach or mount volumes: unmounted volumes=[broker-sssd-config broker-entrypoint broker-sshd-config], unattached volumes=[optcraype optlmod etcprofiled optr
optforgelicense broker-sssd-config lustre timezone optintel optmodulefiles usrsharelmod default-token-58t5p
optarmlicenceserver optcraycrayucx slurm-config opttoolworks optnvidiahpcsdk munge-key optamd opttotalview optgcc
opttotalviewlicense broker-entrypoint broker-sshd-config etccrayped opttotalviewsupport optcraymodulefilescrayucx optforge
usrlocalmodules varoptcraypepeimages]: timed out waiting for the condition

This produces a lot of output, all of which can be useful for diagnosis. A good place to start is in the Events section at the bottom. Notice the warnings here about volumes whose secrets and ConfigMaps are not found. In this case, that means the UAI cannot start because it was started in legacy mode without a default UAI class, and some of the volumes configured in the UAS are in the uas namespace to support localization of Broker UAIs and cannot be found in the user namespace. To solve this particular problem, configure a default UAI class with the correct volume list in it, delete the UAI, and allow the user to try creating it again using the default class.

Other problems can usually be quickly identified using this and other information found in the output from the kubectl describe pod command.

Top: User Access Service (UAS)

Next Topic: Troubleshoot Duplicate Mount Paths in a UAI