Troubleshoot UAIs Stuck in ContainerCreating
Resolve an issue that causes UAIs to show a uai_status field of Waiting and a uai_msg field of ContainerCreating.
This state can simply mean that the UAI is taking longer than usual to start, for example because it is pulling a new UAI image from a registry. If the state persists for a long time, it is worth investigating.
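This procedure requires the HPE Cray EX System CLI (cray command) installed on the host being used, configured with the cray init command to reach the HPE Cray EX System API Gateway, and authenticated with the cray auth login command, as well as Kubernetes (kubectl command) access to the HPE Cray EX System.
Use this procedure when a UAI has been in the ContainerCreating status for several minutes.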
Find the UAI.
ncn-m001# cray uas admin uais list --owner ctuser
Example output:
[[results]]
uai_age = "1m"
uai_connect_string = "ssh ctuser@10.103.13.159"
uai_host = "ncn-w001"
uai_img = "dtr.dev.cray.com/cray/cray-uai-sles15sp1:latest"
uai_ip = "10.103.13.159"
uai_msg = "ContainerCreating"
uai_name = "uai-ctuser-bcd1ff74"
uai_status = "Waiting"
username = "ctuser"
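Checking for this symptom across many UAIs can be scripted. The following Python sketch parses the TOML-style text shown above and flags UAIs that are still Waiting with ContainerCreating after a threshold age. The helper names (parse_uai_list, age_seconds, stuck_uais) are hypothetical, the parser only handles the flat layout in this example rather than general TOML, and the 300-second default is an arbitrary stand-in for "several minutes".

```python
import re

# Hypothetical helper: parse the flat layout printed by
# "cray uas admin uais list" ([[results]] tables of key = "value" lines).
# This is not a general TOML parser; it only handles the shape shown above.
def parse_uai_list(text):
    results = []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[[results]]":
            results.append({})  # start a new UAI record
        elif "=" in line and results:
            key, _, value = line.partition("=")
            results[-1][key.strip()] = value.strip().strip('"')
    return results

_AGE_UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def age_seconds(age):
    """Convert an age string such as "1m" or "2m58s" into seconds."""
    parts = re.findall(r"(\d+)([dhms])", age)
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognized age: {age!r}")
    return sum(int(n) * _AGE_UNITS[u] for n, u in parts)

def stuck_uais(uais, min_age_seconds=300):
    """Names of UAIs still Waiting/ContainerCreating after min_age_seconds.

    The 300-second default is an arbitrary stand-in for "several minutes".
    """
    return [
        u["uai_name"]
        for u in uais
        if u.get("uai_status") == "Waiting"
        and u.get("uai_msg") == "ContainerCreating"
        and age_seconds(u.get("uai_age", "0s")) >= min_age_seconds
    ]

SAMPLE = """\
[[results]]
uai_age = "1m"
uai_connect_string = "ssh ctuser@10.103.13.159"
uai_host = "ncn-w001"
uai_img = "dtr.dev.cray.com/cray/cray-uai-sles15sp1:latest"
uai_ip = "10.103.13.159"
uai_msg = "ContainerCreating"
uai_name = "uai-ctuser-bcd1ff74"
uai_status = "Waiting"
username = "ctuser"
"""

uais = parse_uai_list(SAMPLE)
print(stuck_uais(uais))                      # [] (only 1m old, below the default)
print(stuck_uais(uais, min_age_seconds=60))  # ['uai-ctuser-bcd1ff74']
```

A one-minute-old UAI is not flagged by default, which matches the advice that a short delay may just be an image pull.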
Look up the UAI’s pod in Kubernetes.
ncn-m001# kubectl get po -n user | grep uai-ctuser-bcd1ff74
Example output:
uai-ctuser-bcd1ff74-7d94967bdc-4vm66 0/1 ContainerCreating 0 2m58s
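The grep above matches the UAI name anywhere in a line. As in the example, a UAI's pod name is the UAI name followed by generated suffixes, so a stricter match anchors on "<uai_name>-" as a prefix. The Python sketch below applies that rule to kubectl get po output; find_uai_pods is a hypothetical helper, and the header line in the sample is the standard kubectl column header.

```python
# Hypothetical helper: pick the pod(s) for one UAI out of
# "kubectl get po -n user" output. Matches "<uai_name>-" as a name prefix
# rather than as a substring anywhere in the line (which is what grep does).
def find_uai_pods(get_po_output, uai_name):
    """Return (name, ready, status) tuples for pods whose name starts with uai_name + "-"."""
    pods = []
    for line in get_po_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith(uai_name + "-"):
            pods.append((fields[0], fields[1], fields[2]))
    return pods

output = """\
NAME                                   READY   STATUS              RESTARTS   AGE
uai-ctuser-bcd1ff74-7d94967bdc-4vm66   0/1     ContainerCreating   0          2m58s
"""
print(find_uai_pods(output, "uai-ctuser-bcd1ff74"))
# [('uai-ctuser-bcd1ff74-7d94967bdc-4vm66', '0/1', 'ContainerCreating')]
```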
Describe the pod in Kubernetes.
ncn-m001# kubectl describe pod -n user uai-ctuser-bcd1ff74-7d94967bdc-4vm66
Example output:
Name: uai-ctuser-bcd1ff74-7d94967bdc-4vm66
Namespace: user
Priority: -100
Priority Class Name: uai-priority
Node: ncn-w001/10.252.1.12
Start Time: Wed, 03 Feb 2021 18:33:00 -0600
[...]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned user/uai-ctuser-bcd1ff74-7d94967bdc-4vm66 to ncn-w001
Warning FailedMount 2m53s (x8 over 3m57s) kubelet, ncn-w001 MountVolume.SetUp failed for volume "broker-sssd-config" : secret "broker-sssd-conf" not found
Warning FailedMount 2m53s (x8 over 3m57s) kubelet, ncn-w001 MountVolume.SetUp failed for volume "broker-sshd-config" : configmap "broker-sshd-conf" not found
Warning FailedMount 2m53s (x8 over 3m57s) kubelet, ncn-w001 MountVolume.SetUp failed for volume "broker-entrypoint" : configmap "broker-entrypoint" not found
Warning FailedMount 114s kubelet, ncn-w001 Unable to attach or mount volumes: unmounted volumes=[broker-sssd-config broker-entrypoint broker-sshd-config], unattached volumes=[optcraype optlmod etcprofiled optr
optforgelicense broker-sssd-config lustre timezone optintel optmodulefiles usrsharelmod default-token-58t5p
optarmlicenceserver optcraycrayucx slurm-config opttoolworks optnvidiahpcsdk munge-key optamd opttotalview optgcc
opttotalviewlicense broker-entrypoint broker-sshd-config etccrayped opttotalviewsupport optcraymodulefilescrayucx optforge
usrlocalmodules varoptcraypepeimages]: timed out waiting for the condition
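When the Events section is long, the missing secrets and ConfigMaps can be pulled out of the FailedMount warnings mechanically. The following Python sketch uses a regular expression written against the message format shown above; missing_mount_sources is a hypothetical helper, and kubelet messages for other kinds of mount failures use different wording and are not matched.

```python
import re

# Matches kubelet FailedMount messages of the form shown above, e.g.
#   MountVolume.SetUp failed for volume "broker-entrypoint" : configmap "broker-entrypoint" not found
# Other mount failures use different wording and are not matched.
_MISSING_SOURCE = re.compile(
    r'MountVolume\.SetUp failed for volume "(?P<volume>[^"]+)" : '
    r'(?P<kind>secret|configmap) "(?P<name>[^"]+)" not found'
)

def missing_mount_sources(describe_output):
    """Return sorted, de-duplicated (volume, kind, name) tuples from describe output."""
    found = {
        (m["volume"], m["kind"], m["name"])
        for m in _MISSING_SOURCE.finditer(describe_output)
    }
    return sorted(found)

events = """\
Warning FailedMount 2m53s (x8 over 3m57s) kubelet, ncn-w001 MountVolume.SetUp failed for volume "broker-sssd-config" : secret "broker-sssd-conf" not found
Warning FailedMount 2m53s (x8 over 3m57s) kubelet, ncn-w001 MountVolume.SetUp failed for volume "broker-sshd-config" : configmap "broker-sshd-conf" not found
Warning FailedMount 2m53s (x8 over 3m57s) kubelet, ncn-w001 MountVolume.SetUp failed for volume "broker-entrypoint" : configmap "broker-entrypoint" not found
"""
for volume, kind, name in missing_mount_sources(events):
    print(f"{volume}: {kind} {name} not found")
# broker-entrypoint: configmap broker-entrypoint not found
# broker-sshd-config: configmap broker-sshd-conf not found
# broker-sssd-config: secret broker-sssd-conf not found
```

Note that the volume name and the name of the object behind it can differ (broker-sssd-config is backed by the secret broker-sssd-conf), so both are reported.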
The output is long, and all of it can be useful for diagnosis. A good place to start is the Events section at the bottom.
Notice the warnings about volumes whose secrets and ConfigMaps are not found.
In this case, the UAI cannot start because it was created in legacy mode without a default UAI class, and some of the volumes configured in the UAS refer to secrets and ConfigMaps that exist in the uas namespace to support localization of Broker UAIs. Those secrets and ConfigMaps cannot be found in the user namespace, where this UAI runs.
To solve this particular problem, configure a default UAI class with the correct volume list (one that does not include the volumes backed by Broker UAI secrets and ConfigMaps in the uas namespace), delete the UAI, and allow the user to try creating it again using the default class.
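The namespace problem described above comes down to a membership check: a volume backed by a secret or ConfigMap can only be mounted if that object exists in the namespace where the UAI runs. The following Python sketch models that check with hypothetical data. split_volumes is not part of UAS, "example-host-volume" is a made-up volume with no secret or ConfigMap behind it, the uas set assumes the three broker objects exist there as the text above states, and the user set is empty because the events show none of them were found. The usable list for the user namespace is only a starting point for a default class's volume list and still needs review.

```python
from typing import Dict, Optional, Set, Tuple

# (kind, name) of the secret or ConfigMap behind a volume; None for volumes
# that do not depend on a namespaced object.
Source = Optional[Tuple[str, str]]

def split_volumes(volume_sources: Dict[str, Source], available: Set[Tuple[str, str]]):
    """Split volumes into (usable, unusable) for a namespace whose existing
    secrets/ConfigMaps are given by `available`. Order follows volume_sources."""
    usable, unusable = [], []
    for volume, source in volume_sources.items():
        if source is None or source in available:
            usable.append(volume)
        else:
            unusable.append(volume)
    return usable, unusable

# Illustrative data: the three broker volumes come from the events above;
# "example-host-volume" is hypothetical.
volumes = {
    "broker-sssd-config": ("secret", "broker-sssd-conf"),
    "broker-sshd-config": ("configmap", "broker-sshd-conf"),
    "broker-entrypoint": ("configmap", "broker-entrypoint"),
    "example-host-volume": None,
}
# Assumed namespace contents (see the lead-in above).
uas_namespace = {
    ("secret", "broker-sssd-conf"),
    ("configmap", "broker-sshd-conf"),
    ("configmap", "broker-entrypoint"),
}
user_namespace = set()

print(split_volumes(volumes, uas_namespace))
# (['broker-sssd-config', 'broker-sshd-config', 'broker-entrypoint', 'example-host-volume'], [])
print(split_volumes(volumes, user_namespace))
# (['example-host-volume'], ['broker-sssd-config', 'broker-sshd-config', 'broker-entrypoint'])
```

The same volume list is fully usable for a Broker UAI in the uas namespace and partly unusable for an end-user UAI in the user namespace, which is why a separate default class volume list is needed.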
Other problems can usually be identified quickly using this and other information in the output from the kubectl describe pod command.