Troubleshoot UAIs with Administrative Access

Sometimes the most effective way to diagnose a problem with a UAI is to get inside it and look around as an administrator. To do this, use kubectl exec to start a shell inside the running container; the shell runs as root within the container. From there, an administrator can diagnose problems, make changes to the running UAI, and find solutions. Remember that any change made inside a UAI is transitory: it lasts only as long as the UAI is running. To make a permanent change, either change the UAI image or apply external customizations.


The procedure consists of the following high-level steps:

  1. Find the name of the UAI in question.
  2. Use that name with kubectl to find the pod containing that UAI.
  3. Use that pod name, the UAI name (as the container name), and the user namespace to open an interactive shell in the container with kubectl exec.
  4. From this shell, look around the UAI as needed.
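
The steps above can be sketched as a pair of small shell helpers. This is only an illustration, not part of the UAS tooling: the function names uai_pod_name and uai_exec_cmd are hypothetical, and the sketch assumes that the pod name begins with the UAI name followed by a hyphen and that UAIs run in the user namespace, as in the example session in this section.

```shell
#!/bin/sh
# Hypothetical helpers sketching steps 2 and 3; names are illustrative only.

# Read `kubectl get po -n user` style output on stdin and print the first
# pod whose name starts with "<uai-name>-". Assumes the pod name is the
# first whitespace-separated field on each line.
uai_pod_name() {
    awk -v uai="$1" 'index($1, uai "-") == 1 { print $1; exit }'
}

# Print (do not run) the kubectl exec command for a pod and UAI name.
# The UAI name doubles as the container name; the namespace is assumed
# to be "user".
uai_exec_cmd() {
    printf 'kubectl exec -it -n user %s -c %s -- /bin/sh\n' "$1" "$2"
}

# Possible usage on a node with kubectl access:
#   pod=$(kubectl get po -n user | uai_pod_name uai-vers-4ebe1966)
#   uai_exec_cmd "$pod" uai-vers-4ebe1966
```

Matching on the name prefix is slightly stricter than a plain grep, which would also match a pod that merely contains the UAI name elsewhere in its line. Printing the command rather than running it leaves the decision to open a root shell with the administrator.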


Here is an example session in which an administrator runs the ps command inside a UAI container.

  1. List the UAIs.

    ncn-mw# cray uas admin uais list --format toml

    Example output:

    uai_age = "1d4h"
    uai_connect_string = "ssh broker@"
    uai_host = "ncn-w001"
    uai_img = ""
    uai_ip = ""
    uai_msg = ""
    uai_name = "uai-broker-2e6ce6b7"
    uai_status = "Running: Ready"
    username = "broker"

    uai_age = "0m"
    uai_connect_string = "ssh vers@"
    uai_host = "ncn-w001"
    uai_img = ""
    uai_ip = ""
    uai_msg = ""
    uai_name = "uai-vers-4ebe1966"
    uai_status = "Running: Ready"
    username = "vers"

  2. Find the pod name.

    ncn-mw# kubectl get po -n user | grep uai-vers-4ebe1966

    Example output:

    uai-vers-4ebe1966-77b7c9c84f-xgqm4     1/1     Running   0          77s

  3. Open an interactive shell in the pod.

    ncn-mw# kubectl exec -it -n user uai-vers-4ebe1966-77b7c9c84f-xgqm4 -c uai-vers-4ebe1966 -- /bin/sh

  4. Run the ps command inside the UAI container.

    uai# ps -afe

    Example output:

    UID          PID    PPID  C STIME TTY          TIME CMD
    root           1       0  0 22:56 ?        00:00:00 /bin/bash /usr/bin/
    munge         36       1  0 22:56 ?        00:00:00 /usr/sbin/munged
    root          54       1  0 22:56 ?        00:00:00 su vers -c /usr/sbin/sshd -e -f /etc/uas/ssh/sshd_config -D
    vers          55      54  0 22:56 ?        00:00:00 /usr/sbin/sshd -e -f /etc/uas/ssh/sshd_config -D
    root          90       0  0 22:58 pts/0    00:00:00 /bin/sh
    root          97      90  0 22:58 pts/0    00:00:00 ps -afe

