As part of the installation, Kubernetes generates certificates for the required subcomponents. This document walks through the process of renewing those certificates.
IMPORTANT:
Depending on the version of Kubernetes, the certs command may or may not reside under the alpha category. Use kubeadm certs --help and kubeadm alpha certs --help to determine which form applies. The overall command syntax is the same either way; the only difference is whether the command structure requires alpha in it.
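If it helps to script around this difference, here is a minimal sketch (the CERTS_CMD variable name is illustrative, not part of any product tooling):
ncn-m# if kubeadm certs --help >/dev/null 2>&1; then CERTS_CMD="kubeadm certs"; else CERTS_CMD="kubeadm alpha certs"; fi  # prefer the non-alpha form when this kubeadm supports it
ncn-m# ${CERTS_CMD} check-expiration --config /etc/kubernetes/kubeadmcfg.yaml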
IMPORTANT:
The master node you pick to renew the certificates on is the node that will be referenced in this document as ncn-m.
IMPORTANT:
This document assumes a base hardware configuration of 3 masters and 3 workers (utility storage nodes are left out because they do not run Kubernetes). Make sure to update any commands that run on multiple nodes accordingly.
Procedures for Certificate Renewal:
IMPORTANT:
Master nodes will have certificates for both Kubernetes services and the Kubernetes client. Workers will only have the certificates for the Kubernetes client.
Services (master nodes):
/etc/kubernetes/pki/apiserver.crt
/etc/kubernetes/pki/apiserver-etcd-client.crt
/etc/kubernetes/pki/apiserver-etcd-client.key
/etc/kubernetes/pki/apiserver.key
/etc/kubernetes/pki/apiserver-kubelet-client.crt
/etc/kubernetes/pki/apiserver-kubelet-client.key
/etc/kubernetes/pki/ca.crt
/etc/kubernetes/pki/ca.key
/etc/kubernetes/pki/front-proxy-ca.crt
/etc/kubernetes/pki/front-proxy-ca.key
/etc/kubernetes/pki/front-proxy-client.crt
/etc/kubernetes/pki/front-proxy-client.key
/etc/kubernetes/pki/sa.key
/etc/kubernetes/pki/sa.pub
/etc/kubernetes/pki/etcd/ca.crt
/etc/kubernetes/pki/etcd/ca.key
/etc/kubernetes/pki/etcd/healthcheck-client.crt
/etc/kubernetes/pki/etcd/healthcheck-client.key
/etc/kubernetes/pki/etcd/peer.crt
/etc/kubernetes/pki/etcd/peer.key
/etc/kubernetes/pki/etcd/server.crt
/etc/kubernetes/pki/etcd/server.key
Client (master and worker nodes):
/var/lib/kubelet/pki/kubelet-client-2021-09-07-17-06-36.pem
/var/lib/kubelet/pki/kubelet-client-current.pem
/var/lib/kubelet/pki/kubelet.crt
/var/lib/kubelet/pki/kubelet.key
Check the expiration of the certificates.
Log into a master node and run the following:
ncn-m# kubeadm alpha certs check-expiration --config /etc/kubernetes/kubeadmcfg.yaml
WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 24, 2021 15:21 UTC   14d                                     no
apiserver                  Sep 24, 2021 15:21 UTC   14d             ca                      no
apiserver-etcd-client      Sep 24, 2021 15:20 UTC   14d             ca                      no
apiserver-kubelet-client   Sep 24, 2021 15:21 UTC   14d             ca                      no
controller-manager.conf    Sep 24, 2021 15:21 UTC   14d                                     no
etcd-healthcheck-client    Sep 24, 2021 15:19 UTC   14d             etcd-ca                 no
etcd-peer                  Sep 24, 2021 15:19 UTC   14d             etcd-ca                 no
etcd-server                Sep 24, 2021 15:19 UTC   14d             etcd-ca                 no
front-proxy-client         Sep 24, 2021 15:21 UTC   14d             front-proxy-ca          no
scheduler.conf             Sep 24, 2021 15:21 UTC   14d                                     no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Sep 02, 2030 15:21 UTC   8y              no
etcd-ca                 Sep 02, 2030 15:19 UTC   8y              no
front-proxy-ca          Sep 02, 2030 15:21 UTC   8y              no
Back up existing certificates.
Master Nodes:
ncn-m# pdsh -w ncn-m00[1-3] tar cvf /root/cert_backup.tar /etc/kubernetes/pki/ /var/lib/kubelet/pki/
ncn-m001: tar: Removing leading / from member names
ncn-m001: /etc/kubernetes/pki/
ncn-m001: /etc/kubernetes/pki/front-proxy-client.key
ncn-m001: tar: Removing leading / from hard link targets
ncn-m001: /etc/kubernetes/pki/apiserver-etcd-client.key
ncn-m001: /etc/kubernetes/pki/sa.key
.
.
.. shortened output
Worker Nodes:
IMPORTANT:
The range of nodes below should reflect the size of the environment. This should run on every worker node.
ncn-m# pdsh -w ncn-w00[1-3] tar cvf /root/cert_backup.tar /var/lib/kubelet/pki/
ncn-w003: tar: Removing leading / from member names
ncn-w003: /var/lib/kubelet/pki/
ncn-w003: /var/lib/kubelet/pki/kubelet.key
ncn-w003: /var/lib/kubelet/pki/kubelet-client-2021-09-07-17-06-36.pem
ncn-w003: /var/lib/kubelet/pki/kubelet.crt
.
.
.. shortened output
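Optionally, verify that each archive was written and is readable before proceeding. A quick sanity check using tar's list mode (adjust the node ranges to your environment):
ncn-m# pdsh -w ncn-m00[1-3] 'tar tf /root/cert_backup.tar | wc -l'
ncn-m# pdsh -w ncn-w00[1-3] 'tar tf /root/cert_backup.tar | wc -l'
Each node should report a non-zero count of archived files.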
Renew the certificates.
ncn-m# kubeadm alpha certs renew all --config /etc/kubernetes/kubeadmcfg.yaml
WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
Check the new expiration.
ncn-m# kubeadm alpha certs check-expiration --config /etc/kubernetes/kubeadmcfg.yaml
WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 22, 2022 17:13 UTC   364d                                    no
apiserver                  Sep 22, 2022 17:13 UTC   364d            ca                      no
apiserver-etcd-client      Sep 22, 2022 17:13 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Sep 22, 2022 17:13 UTC   364d            ca                      no
controller-manager.conf    Sep 22, 2022 17:13 UTC   364d                                    no
etcd-healthcheck-client    Sep 22, 2022 17:13 UTC   364d            etcd-ca                 no
etcd-peer                  Sep 22, 2022 17:13 UTC   364d            etcd-ca                 no
etcd-server                Sep 22, 2022 17:13 UTC   364d            etcd-ca                 no
front-proxy-client         Sep 22, 2022 17:13 UTC   364d            front-proxy-ca          no
scheduler.conf             Sep 22, 2022 17:13 UTC   364d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Sep 02, 2030 15:21 UTC   8y              no
etcd-ca                 Sep 02, 2030 15:19 UTC   8y              no
front-proxy-ca          Sep 02, 2030 15:21 UTC   8y              no
This command may have only updated some certificates.
ncn-m# ls -l /etc/kubernetes/pki
-rw-r--r-- 1 root root 1249 Sep 22 17:13 apiserver.crt
-rw-r--r-- 1 root root 1090 Sep 22 17:13 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Sep 22 17:13 apiserver-etcd-client.key
-rw------- 1 root root 1679 Sep 22 17:13 apiserver.key
-rw-r--r-- 1 root root 1099 Sep 22 17:13 apiserver-kubelet-client.crt
-rw------- 1 root root 1679 Sep 22 17:13 apiserver-kubelet-client.key
-rw------- 1 root root 1025 Sep 21 20:50 ca.crt
-rw------- 1 root root 1679 Sep 21 20:50 ca.key
drwxr-xr-x 2 root root 162 Sep 21 20:50 etcd
-rw------- 1 root root 1038 Sep 21 20:50 front-proxy-ca.crt
-rw------- 1 root root 1679 Sep 21 20:50 front-proxy-ca.key
-rw-r--r-- 1 root root 1058 Sep 22 17:13 front-proxy-client.crt
-rw------- 1 root root 1675 Sep 22 17:13 front-proxy-client.key
-rw------- 1 root root 1675 Sep 21 20:50 sa.key
-rw------- 1 root root 451 Sep 21 20:50 sa.pub
ncn-m# ls -l /etc/kubernetes/pki/etcd
-rw-r--r-- 1 root root 1017 Sep 21 20:50 ca.crt
-rw-r--r-- 1 root root 1675 Sep 21 20:50 ca.key
-rw-r--r-- 1 root root 1094 Sep 22 17:13 healthcheck-client.crt
-rw------- 1 root root 1679 Sep 22 17:13 healthcheck-client.key
-rw-r--r-- 1 root root 1139 Sep 22 17:13 peer.crt
-rw------- 1 root root 1679 Sep 22 17:13 peer.key
-rw-r--r-- 1 root root 1139 Sep 22 17:13 server.crt
-rw------- 1 root root 1675 Sep 22 17:13 server.key
As the listing shows, not all of the certificate files were updated.
IMPORTANT:
Some certificates were not updated because their expiration dates are far in the future, so they did not need to be renewed. This is expected.
Certificates most likely not to be updated due to a distant expiration:
CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Sep 02, 2030 15:21 UTC   8y              no
etcd-ca                 Sep 02, 2030 15:19 UTC   8y              no
front-proxy-ca          Sep 02, 2030 15:21 UTC   8y              no
This means it is safe to ignore that ca.crt/key, front-proxy-ca.crt/key, and the etcd ca.crt/key were not updated.
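To confirm directly that a CA certificate still has a distant expiration, openssl can print its notAfter date, for example:
ncn-m# openssl x509 -enddate -noout -in /etc/kubernetes/pki/ca.crt
The date printed should line up with the CA expiration reported by check-expiration.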
Check the expiration of the certificate files that do not have a current date and are in .crt or .pem format. See File Locations for the list of files.
Perform this task on each master node; the example below checks each certificate listed in File Locations.
ncn-m# for i in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt /var/lib/kubelet/pki/*.crt /var/lib/kubelet/pki/*.pem; do echo ${i}; openssl x509 -enddate -noout -in ${i}; done
/etc/kubernetes/pki/apiserver.crt
notAfter=Sep 22 17:13:28 2022 GMT
/etc/kubernetes/pki/apiserver-etcd-client.crt
notAfter=Sep 22 17:13:28 2022 GMT
/etc/kubernetes/pki/apiserver-kubelet-client.crt
notAfter=Sep 22 17:13:28 2022 GMT
/etc/kubernetes/pki/ca.crt
notAfter=Sep 4 09:31:10 2031 GMT
/etc/kubernetes/pki/front-proxy-ca.crt
notAfter=Sep 4 09:31:11 2031 GMT
/etc/kubernetes/pki/front-proxy-client.crt
notAfter=Sep 22 17:13:29 2022 GMT
/etc/kubernetes/pki/etcd/ca.crt
notAfter=Sep 4 09:30:28 2031 GMT
/etc/kubernetes/pki/etcd/healthcheck-client.crt
notAfter=Sep 22 17:13:29 2022 GMT
/etc/kubernetes/pki/etcd/peer.crt
notAfter=Sep 22 17:13:29 2022 GMT
/etc/kubernetes/pki/etcd/server.crt
notAfter=Sep 22 17:13:29 2022 GMT
/var/lib/kubelet/pki/kubelet.crt
notAfter=Sep 21 19:50:16 2022 GMT
/var/lib/kubelet/pki/kubelet-client-2021-09-07-17-06-36.pem
notAfter=Sep 4 17:01:38 2022 GMT
/var/lib/kubelet/pki/kubelet-client-current.pem
notAfter=Sep 4 17:01:38 2022 GMT
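Rather than reading the dates by eye, openssl's -checkend option can flag certificates that expire soon. A minimal sketch that warns about anything expiring within 14 days (1209600 seconds; adjust the window as needed):
ncn-m# for i in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt /var/lib/kubelet/pki/*.crt /var/lib/kubelet/pki/*.pem; do openssl x509 -checkend 1209600 -in ${i} >/dev/null || echo "WARNING: ${i} expires within 14 days"; done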
IMPORTANT:
DO NOT forget to verify the certificates in /etc/kubernetes/pki/etcd. apiserver-etcd-client.crt is critical because it is the certificate that allows the Kubernetes API server to talk to the bare-metal etcd cluster. The /var/lib/kubelet/pki/ certificates will be updated in the Kubernetes client section that follows.
Restart etcd.
Once the needed certificates have been renewed on all the master nodes, log into each master node one at a time and run:
ncn-m# systemctl restart etcd.service
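Before moving on to the next master, it is worth confirming that etcd came back up; assuming etcd is managed by systemd as shown above:
ncn-m# systemctl is-active etcd.service
active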
Restart kubelet.
On each Kubernetes node do:
IMPORTANT:
The below example will need to be adjusted to reflect the correct number of master and worker nodes in your environment.
ncn-m# pdsh -w ncn-m00[1-3] -w ncn-w00[1-3] systemctl restart kubelet.service
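A quick confirmation that kubelet restarted cleanly everywhere (same node-range caveat as above):
ncn-m# pdsh -w ncn-m00[1-3] -w ncn-w00[1-3] systemctl is-active kubelet.service
Every node should report active.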
Fix kubectl command access.
NOTE:
The following command will respond with Unauthorized only if the certificates have already expired. In either case, the new client certificates will need to be distributed in the following steps.
ncn-m# kubectl get nodes
error: You must be logged in to the server (Unauthorized)
ncn-m# cp /etc/kubernetes/admin.conf /root/.kube/config
ncn-m# kubectl get nodes
NAME       STATUS   ROLES    AGE    VERSION
ncn-m001   Ready    master   370d   v1.18.6
ncn-m002   Ready    master   370d   v1.18.6
ncn-m003   Ready    master   370d   v1.18.6
ncn-w001   Ready    <none>   370d   v1.18.6
ncn-w002   Ready    <none>   370d   v1.18.6
ncn-w003   Ready    <none>   370d   v1.18.6
Distribute the client certificate to the rest of the cluster.
NOTE:
You may see errors when copying files; the target may or may not exist depending on the version of Shasta. You DO NOT need to copy this to the master node where you are performing this work. Copy /etc/kubernetes/admin.conf to all other master and worker nodes. If you attempt to copy to worker nodes other than ncn-w001 on a Shasta v1.3 or earlier system, you will see the error pdcp@ncn-m001: ncn-w003: fatal: /root/.kube/: Is a directory; this is expected and can be ignored.
Client access:
NOTE:
Please update the below command with the appropriate number of worker nodes.
For Shasta v1.4 and later:
ncn-m# pdcp -w ncn-m00[2-3] -w ncn-w00[1-3] /etc/kubernetes/admin.conf /etc/kubernetes/
For Shasta v1.3 and earlier:
ncn-m# pdcp -w ncn-m00[2-3] -w ncn-w001 /root/.kube/config /root/.kube/
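To verify the file landed identically on every node (Shasta v1.4 and later layout shown; adjust the node ranges), comparing checksums is a quick check:
ncn-m# pdsh -w ncn-m00[1-3] -w ncn-w00[1-3] md5sum /etc/kubernetes/admin.conf
All nodes should report the same checksum.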
Back up the kubelet certificates on each master and worker node:
IMPORTANT:
The below example will need to be adjusted to reflect the correct number of master and worker nodes in your environment.
ncn-m# pdsh -w ncn-m00[1-3] -w ncn-w00[1-3] tar cvf /root/kubelet_certs.tar /etc/kubernetes/kubelet.conf /var/lib/kubelet/pki/
On the master node where you updated the other certificates do:
Get your current apiserver-advertise-address.
ncn# kubectl config view|grep server
server: https://10.252.120.2:6442
Using the IP address from the above output, run the following.
NOTE:
The apiserver-advertise-address may vary, so make sure you verify it rather than copying and pasting blindly.
ncn-m# for node in $(kubectl get nodes -o json|jq -r '.items[].metadata.name'); do kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:$node --apiserver-advertise-address 10.252.120.2 --apiserver-bind-port 6442 > /root/$node.kubelet.conf; done
This will generate a new kubelet.conf file in the /root/ directory, one file per node running Kubernetes.
Copy each file to the corresponding node shown in the filename.
NOTE:
Please update the below command with the appropriate number of master and worker nodes.
ncn-m# for node in ncn-m00{1..3} ncn-w00{1..3}; do scp /root/$node.kubelet.conf $node:/etc/kubernetes/; done
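A quick check that each node received its file (assuming the destination path used above):
ncn-m# pdsh -w ncn-m00[1-3] -w ncn-w00[1-3] 'ls -l /etc/kubernetes/*.kubelet.conf'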
Log into each node one at a time and copy that node's generated file over the existing kubelet configuration:
ncn# cp /etc/kubernetes/<node>.kubelet.conf /etc/kubernetes/kubelet.conf
Check the expiration of the kubelet certificate files. See File Locations for the list of files.
This task is for each master and worker node. The example checks each kubelet certificate in File Locations.
ncn# for i in /var/lib/kubelet/pki/*.crt /var/lib/kubelet/pki/*.pem; do echo ${i}; openssl x509 -enddate -noout -in ${i}; done
/var/lib/kubelet/pki/kubelet.crt
notAfter=Sep 22 17:37:30 2022 GMT
/var/lib/kubelet/pki/kubelet-client-2021-09-22-18-37-30.pem
notAfter=Sep 22 18:32:30 2022 GMT
/var/lib/kubelet/pki/kubelet-client-current.pem
notAfter=Sep 22 18:32:30 2022 GMT
Perform a rolling reboot of master nodes.
For Shasta v1.4 and later:
For Shasta v1.3 and earlier:
Follow the Reboot_NCNs process.
NOTES:
The xname for a node can be found in /etc/cray/xname on the specific node.
IMPORTANT: Verify that pods are running on the master node that was rebooted before proceeding to the next node.
Perform a rolling reboot of worker nodes.
For Shasta v1.4 and later:
For Shasta v1.3 and earlier:
Before rebooting any worker node, scale nexus replicas to 0.
ncn-m# kubectl scale deployment nexus -n nexus --replicas=0
Follow the Reboot_NCNs process.
NOTES:
On Shasta v1.3 and earlier, ncn-w001 is the externally connected node; on Shasta v1.4 and later, ncn-m001 is the externally connected node.
The failover-leader.sh, ncnGetXnames.sh and add_pod_priority.sh scripts are not available or required when rebooting worker nodes.
After draining a worker, force delete any pod that fails to terminate with Cannot evict pod as it would violate the pod's disruption budget.
ncn-m# kubectl delete pod <pod-name> -n <namespace> --force
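To spot any pods stuck after the drain, a simple status filter works (a sketch, not an exhaustive health check):
ncn-m# kubectl get pods --all-namespaces | grep Terminating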
Reference the Shasta v1.3 Admin Guide for any steps related to checking system health.
After rebooting all the worker nodes, scale nexus replicas back to 1 and verify nexus is running.
ncn-m# kubectl scale deployment nexus -n nexus --replicas=1
ncn-m# kubectl get pods -n nexus | grep nexus
nexus-868d7b8466-gjnps 2/2 Running 0 5m
For Shasta v1.3 and earlier, restart the sonar cronjobs and verify vault etcd is healthy.
Restart the sonar cronjobs.
ncn-m# kubectl -n services get cronjob sonar-jobs-watcher -o json | jq 'del(.spec.selector)' | jq 'del(.spec.template.metadata.labels."controller-uid")' | jq 'del(.status)' | kubectl replace --force -f -
ncn-m# kubectl -n services get cronjob sonar-sync -o json | jq 'del(.spec.selector)' | jq 'del(.spec.template.metadata.labels."controller-uid")' | jq 'del(.status)' | kubectl replace --force -f -
After at least a minute, verify that the cronjobs have been scheduled.
ncn-m# kubectl get cronjobs -n services sonar-jobs-watcher
NAME                 SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
sonar-jobs-watcher   */1 * * * *   False     1        23s             5m10s
ncn-m# kubectl get cronjobs -n services sonar-sync
NAME         SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
sonar-sync   */1 * * * *   False     1        32s             5m15s
Check the health of vault etcd.
ncn-m# for pod in $(kubectl get pods -l app=etcd -n vault -o jsonpath='{.items[*].metadata.name}'); do echo "### ${pod} ###"; kubectl -n vault exec $pod -- /bin/sh -c "ETCDCTL_API=3 etcdctl --cacert /etc/etcdtls/operator/etcd-tls/etcd-client-ca.crt --cert /etc/etcdtls/operator/etcd-tls/etcd-client.crt --key /etc/etcdtls/operator/etcd-tls/etcd-client.key --endpoints https://localhost:2379 endpoint health"; done
If the above vault etcd health check reports any pods as unhealthy, back up the secret and then delete it. The operator will create a new secret.
ncn-m# kubectl get secret -n vault cray-vault-etcd-tls -o yaml > /root/vault_sec.yaml
ncn-m# kubectl delete secret -n vault cray-vault-etcd-tls
Once the new secret has been created and the cray-vault-etcd pods are running, verify the health of vault etcd.
ncn-m# kubectl get secret -n vault cray-vault-etcd-tls
NAME                  TYPE     DATA   AGE
cray-vault-etcd-tls   Opaque   9      5m
ncn-m# kubectl get pods -l app=etcd -n vault
NAME                         READY   STATUS    RESTARTS   AGE
cray-vault-etcd-stzjf6dqd5   1/1     Running   0          10m
cray-vault-etcd-ws59fgssxt   1/1     Running   0          10m
cray-vault-etcd-xmvfxz48vs   1/1     Running   0          10m
ncn-m# for pod in $(kubectl get pods -l app=etcd -n vault -o jsonpath='{.items[*].metadata.name}'); do echo "### ${pod} ###"; kubectl -n vault exec $pod -- /bin/sh -c "ETCDCTL_API=3 etcdctl --cacert /etc/etcdtls/operator/etcd-tls/etcd-client-ca.crt --cert /etc/etcdtls/operator/etcd-tls/etcd-client.crt --key /etc/etcdtls/operator/etcd-tls/etcd-client.key --endpoints https://localhost:2379 endpoint health"; done
### cray-vault-etcd-stzjf6dqd5 ###
https://localhost:2379 is healthy: successfully committed proposal: took = 19.999618ms
### cray-vault-etcd-ws59fgssxt ###
https://localhost:2379 is healthy: successfully committed proposal: took = 19.597736ms
### cray-vault-etcd-xmvfxz48vs ###
https://localhost:2379 is healthy: successfully committed proposal: took = 19.81056ms
NOTE:
Vault etcd errors such as tls: bad certificate". Reconnecting can be ignored.
ncn-m# kubectl logs -l app=etcd -n vault | grep "bad certificate\". Reconnecting"
WARNING: 2021/09/24 17:35:11 grpc: addrConn.createTransport failed to connect to {0.0.0.0:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate". Reconnecting...
If Check Certificates indicates that only the apiserver-etcd-client certificate needs to be renewed, then the following can be used to renew just that one certificate.
The full Renew All Certificates procedure will also renew this certificate.
Run the following steps on each master node.
Renew the Etcd certificate.
kubeadm alpha certs renew apiserver-etcd-client --config /etc/kubernetes/kubeadmcfg.yaml
systemctl restart etcd.service
systemctl restart kubelet.service
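The renewed certificate's new expiration date can then be confirmed with openssl:
openssl x509 -enddate -noout -in /etc/kubernetes/pki/apiserver-etcd-client.crt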
The client secrets can be updated independently from the Kubernetes certs.
Run the following steps from a master node.
Update the client certificate for kube-etcdbackup
.
Update the kube-etcdbackup-etcd
secret.
kubectl --namespace=kube-system create secret generic kube-etcdbackup-etcd \
--from-file=/etc/kubernetes/pki/etcd/ca.crt \
--from-file=tls.crt=/etc/kubernetes/pki/etcd/server.crt \
--from-file=tls.key=/etc/kubernetes/pki/etcd/server.key \
--save-config --dry-run=client -o yaml | kubectl apply -f -
Check the certificate’s expiration date to verify that the certificate is not expired.
kubectl get secret -n kube-system kube-etcdbackup-etcd -o json | jq -r '.data."tls.crt" | @base64d' | openssl x509 -noout -enddate
Example output:
notAfter=May 4 22:37:16 2023 GMT
Check that the next kube-etcdbackup cronjob run reaches Completed status. This cronjob runs every 10 minutes.
kubectl get pod -l app.kubernetes.io/instance=cray-baremetal-etcd-backup -n kube-system
Example output:
NAME                               READY   STATUS      RESTARTS   AGE
kube-etcdbackup-1652201400-czh5p   0/1     Completed   0          107s
Update the client certificate for etcd-client
.
Update the etcd-client-cert
secret.
kubectl --namespace=sysmgmt-health create secret generic etcd-client-cert \
--from-file=etcd-client=/etc/kubernetes/pki/apiserver-etcd-client.crt \
--from-file=etcd-client-key=/etc/kubernetes/pki/apiserver-etcd-client.key \
--from-file=etcd-ca=/etc/kubernetes/pki/etcd/ca.crt \
--save-config --dry-run=client -o yaml | kubectl apply -f -
Check the certificates' expiration dates to verify that none of the certificates are expired.
Check the etcd-ca
expiration date.
kubectl get secret -n sysmgmt-health etcd-client-cert -o json | jq -r '.data."etcd-ca" | @base64d' | openssl x509 -noout -enddate
Example output:
notAfter=May 1 18:20:23 2032 GMT
Check the etcd-client
expiration date.
kubectl get secret -n sysmgmt-health etcd-client-cert -o json | jq -r '.data."etcd-client" | @base64d' | openssl x509 -noout -enddate
Example output:
notAfter=May 4 18:20:24 2023 GMT
Restart Prometheus.
kubectl rollout restart -n sysmgmt-health statefulSet/prometheus-cray-sysmgmt-health-promet-prometheus
kubectl rollout status -n sysmgmt-health statefulSet/prometheus-cray-sysmgmt-health-promet-prometheus
Example output:
Waiting for 1 pods to be ready...
statefulset rolling update complete ...
Check for any tls errors from the active Prometheus targets. No errors are expected.
PROM_IP=$(kubectl get services -n sysmgmt-health cray-sysmgmt-health-promet-prometheus -o json | jq -r '.spec.clusterIP')
curl -s http://${PROM_IP}:9090/api/v1/targets | jq -r '.data.activeTargets[] | select(."scrapePool" == "sysmgmt-health/cray-sysmgmt-health-promet-kube-etcd/0")' | grep lastError | sort -u
Example output:
"lastError": "",