During an NCN master node rebuild, the node may fail to join the Kubernetes cluster due to a missing or expired bootstrap token.
This occurs when the token referenced in the join scripts located in /etc/cray/kubernetes is no longer present in the cluster’s
token list or the cluster-info ConfigMap. The join scripts are created during initial cluster setup and may reference tokens
that have since expired or been removed from the cluster.
Bootstrap tokens should be automatically refreshed by the cray-k8s-token-certs-refresh cronjob, which runs hourly
on the first-master node. If this cronjob is not running or has failed, tokens may become stale, causing
node join failures.
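The node rebuild fails with a kubeadm join error.
The error message reports that no JWS signature could be found in the cluster-info ConfigMap for the token ID.
Example error output: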
Attempting to join node to the Kubernetes cluster (will continue to retry if it fails)
kubeadm join 10.252.1.2:6442 --token 770vrq.adapc6m1k68f14r3 --discovery-token-ca-cert-hash sha256:86d0422bb32949fc49fbac28eaf01abe37401f2e93203815c6bd1287403c9af5 --control-plane --certificate-key fa0d08d9b5257c63ae9c3c19c8c766059d46e91454f1325f4f60aac1b7cc6408 --apiserver-advertise-address=10.252.1.17 --apiserver-advertise-address=10.252.1.17...
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "770vrq"
To see the stack trace of this error execute with --v=5 or higher
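The token ID reported in the error is the part of the --token value before the dot. As a minimal sketch, a small helper can pull it out of a logged join command so it can be compared against the cluster. The function name token_id_from_join is hypothetical and not part of CSM; the pattern assumes the standard kubeadm token shape of six characters, a dot, then sixteen characters.

```shell
# Hypothetical helper (not part of CSM): print the token ID, i.e. the text
# before the "." in the --token value, from a kubeadm join command line
# passed as a single argument. Prints nothing if no token is found.
token_id_from_join() {
    printf '%s\n' "$1" |
        sed -n 's/.*--token[ =]\([a-z0-9]\{6\}\)\.[a-z0-9]\{16\}.*/\1/p'
}

# Example call:
#   token_id_from_join "kubeadm join 10.252.1.2:6442 --token 770vrq.adapc6m1k68f14r3 ..."
```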
(ncn-m#) Identify the first-master node from BSS.
Command:
cray bss bootparameters list --hosts Global --format toml | grep first-master-hostname
Example output:
first-master-hostname = "ncn-m001"
(ncn-m#) Check if the token exists in the cluster’s token list.
Command:
kubeadm token list
If the token referenced in the error (e.g., 770vrq) is not listed, the token has expired or been removed.
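Scanning the list by eye is error-prone when several tokens exist. The check can be sketched as a filter over the command output. The helper name token_listed is hypothetical, and the sketch assumes the default table output, where a header line is followed by rows whose first column is the full token.

```shell
# Hypothetical helper (not part of CSM): succeed if the given token ID
# appears as the ID part of any token in `kubeadm token list` output
# read from stdin. Assumes the default table layout with a header line.
token_listed() {
    awk -v id="$1" '
        NR > 1 { split($1, parts, "."); if (parts[1] == id) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage on a master node:
#   kubeadm token list | token_listed 770vrq && echo "present" || echo "missing"
```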
(ncn-m#) Verify the token is missing from the cluster-info ConfigMap.
Command:
kubectl -n kube-public get cm cluster-info -o yaml
If the token is not present in the ConfigMap, it needs to be regenerated.
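In the ConfigMap, kubeadm records one signature per valid bootstrap token under a data key named jws-kubeconfig-<token-id>, so the check reduces to searching for that key. The following is a rough sketch; the helper name jws_present is hypothetical.

```shell
# Hypothetical helper (not part of CSM): succeed if the cluster-info
# ConfigMap YAML read from stdin contains a JWS signature entry for the
# given token ID.
jws_present() {
    grep -q "jws-kubeconfig-$1:"
}

# Usage:
#   kubectl -n kube-public get cm cluster-info -o yaml | jws_present 770vrq \
#       && echo "signature present" || echo "signature missing"
```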
(first-master#) Check if the cray-k8s-token-certs-refresh cronjob is configured and running on the first-master node.
Command:
cat /etc/cron.d/cray-k8s-token-certs-refresh
Expected output:
0 */1 * * * root /srv/cray/scripts/kubernetes/token-certs-refresh.sh >> /var/log/cray/cron.log 2>&1
Check recent cron log entries:
grep token-certs-refresh /var/log/cray/cron.log | tail -10
If the cronjob is missing or not running, this may be why tokens are stale.
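The cron file check can also be scripted, which is convenient when inspecting several systems. This sketch only confirms that an active (non-comment) entry references the refresh script; it does not prove that cron actually ran it. The helper name refresh_cron_configured is hypothetical.

```shell
# Hypothetical helper (not part of CSM): succeed if the given cron file is
# readable and has at least one non-comment line that runs
# token-certs-refresh.sh.
refresh_cron_configured() {
    [ -r "$1" ] &&
        grep -v '^[[:space:]]*#' "$1" | grep -q 'token-certs-refresh\.sh'
}

# Usage on the first-master node:
#   refresh_cron_configured /etc/cron.d/cray-k8s-token-certs-refresh \
#       && echo "cronjob configured" || echo "cronjob missing"
```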
Rerun the promote-initial-master.sh script on the first-master node to regenerate the bootstrap tokens, update the join scripts,
and restore the cray-k8s-token-certs-refresh cronjob.
(ncn-m#) Identify the first-master node from BSS if not already known.
Command:
cray bss bootparameters list --hosts Global --format toml | grep first-master-hostname
Example output:
first-master-hostname = "ncn-m001"
(first-master#) Run the promote initial master script on the first-master node.
Command:
/usr/share/doc/csm/upgrade/scripts/k8s/promote-initial-master.sh
This script will update the cluster-info ConfigMap with the new tokens, update the join scripts in /etc/cray/kubernetes with valid tokens, and configure the cray-k8s-token-certs-refresh cronjob to run hourly (0 */1 * * *).
(first-master#) Verify the cronjob is now configured.
Command:
cat /etc/cron.d/cray-k8s-token-certs-refresh
Expected output:
0 */1 * * * root /srv/cray/scripts/kubernetes/token-certs-refresh.sh >> /var/log/cray/cron.log 2>&1
(ncn-m#) Retry the node rebuild process.
The node should now be able to join the Kubernetes cluster using the updated tokens.
(ncn-m#) Verify the node has successfully joined the cluster.
Command:
kubectl get nodes
The rebuilt node should appear in the node list with a Ready status.
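The final check can be sketched as a filter over the node list. The helper name node_ready and the node name ncn-m002 in the usage line are hypothetical; the sketch assumes the default kubectl table output with NAME and STATUS as the first two columns, and treats a status such as Ready,SchedulingDisabled as Ready.

```shell
# Hypothetical helper (not part of CSM): succeed if the named node is
# listed with a Ready status in `kubectl get nodes` output read from stdin.
# "NotReady" does not match; "Ready,SchedulingDisabled" does.
node_ready() {
    awk -v n="$1" '
        $1 == n && ($2 == "Ready" || index($2, "Ready,") == 1) { ok = 1 }
        END { exit ok ? 0 : 1 }
    '
}

# Usage (ncn-m002 stands in for the rebuilt node):
#   kubectl get nodes | node_ready ncn-m002 && echo "joined" || echo "not ready"
```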