When rolling back a service that uses Bitnami etcd, helm rollback can fail when going from an etcd chart 9.x version to an etcd chart 8.x version.
The Bitnami 9.x etcd cluster StatefulSet and Pods have the app.kubernetes.io/component=etcd label and the Bitnami 8.x etcd cluster StatefulSet and Pods do not, so the StatefulSet patch is rejected on rollback.
The symptom of this problem is that during etcd chart rollback, Helm will give an error resembling the following:
Error: cannot patch "cray-hmnfd-bitnami-etcd" with kind StatefulSet: StatefulSet.apps "cray-hmnfd-bitnami-etcd" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
(ncn-mw#) This error may also be seen in the helm history output.
helm history -n services cray-hms-hmnfd
Example output:
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Mon Dec 9 21:00:24 2024 superseded cray-hms-hmnfd-3.0.2 1.18.1 Install complete
2 Thu Dec 19 13:53:27 2024 deployed cray-hms-hmnfd-4.0.4 1.21.0 Upgrade complete
3 Fri Dec 20 19:50:42 2024 failed cray-hms-hmnfd-3.0.2 1.18.1 Rollback "cray-hms-hmnfd" failed: cannot patch "cray-hmnfd-bitnami-etcd" with kind StatefulSet: StatefulSet.apps "cray-hmnfd-bitnami-etcd" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
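The symptom check above can be scripted. The following is a minimal sketch, assuming the helm history table output resembles the example; the function name has_statefulset_patch_failure is hypothetical and is not part of Helm or CSM.

```shell
#!/bin/sh
# Hypothetical helper (not part of Helm or CSM).
# Reads `helm history` table output on stdin and exits 0 only if some
# revision failed with the StatefulSet "Forbidden" patch error.
has_statefulset_patch_failure() {
    grep -E 'failed.*cannot patch ".*" with kind StatefulSet.*Forbidden' >/dev/null
}

# Usage (assumes helm access to the cluster):
#   helm history -n services cray-hms-hmnfd | has_statefulset_patch_failure \
#       && echo "Rollback hit the StatefulSet label problem"
```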
If helm.sh/chart is etcd-9.x.x or greater and the StatefulSet and Pods have the app.kubernetes.io/component=etcd label,
the error can be worked around by removing the offending label from the etcd cluster Pods and StatefulSet.
(ncn-mw#) Check which etcd Helm chart version is currently running.
kubectl describe pod -n services cray-hmnfd-bitnami-etcd-0 | grep "helm.sh/chart"
Example output:
helm.sh/chart=etcd-9.5.6
(ncn-mw#) Check that the StatefulSet and the Pods have the app.kubernetes.io/component label.
Check if the StatefulSet has the label.
kubectl get statefulset -n services -l app.kubernetes.io/component=etcd | grep hmnfd
Example output:
cray-hmnfd-bitnami-etcd 3/3 26h
Check if the Pods have the label.
kubectl get pods -n services -l app.kubernetes.io/component=etcd | grep hmnfd
Example output:
cray-hmnfd-bitnami-etcd-0 2/2 Running 0 33m
cray-hmnfd-bitnami-etcd-1 2/2 Running 0 34m
cray-hmnfd-bitnami-etcd-2 2/2 Running 0 35m
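The checks above can be combined into a single decision. The following is a rough sketch in POSIX shell; the function names etcd_chart_major and workaround_applies are hypothetical, and the inputs are assumed to be the helm.sh/chart label text plus counts of labeled StatefulSets and Pods gathered with the commands shown earlier.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CSM).

# Print the major version from a label such as "helm.sh/chart=etcd-9.5.6".
etcd_chart_major() {
    ecm_version="${1##*etcd-}"      # strip everything through "etcd-" -> "9.5.6"
    echo "${ecm_version%%.*}"       # keep text before the first dot  -> "9"
}

# Exit 0 when the workaround applies: chart major version is 9 or greater
# and both the StatefulSet and at least one Pod carry the label.
#   $1 = helm.sh/chart label text
#   $2 = number of labeled StatefulSets found
#   $3 = number of labeled Pods found
workaround_applies() {
    wa_major="$(etcd_chart_major "$1")"
    case "$wa_major" in
        ''|*[!0-9]*) return 1 ;;    # reject empty or non-numeric input
    esac
    [ "$wa_major" -ge 9 ] && [ "$2" -gt 0 ] && [ "$3" -gt 0 ]
}

# Usage (assumes kubectl access to the cluster):
#   chart="$(kubectl describe pod -n services cray-hmnfd-bitnami-etcd-0 | grep 'helm.sh/chart')"
#   sts="$(kubectl get statefulset -n services -l app.kubernetes.io/component=etcd --no-headers | grep -c hmnfd)"
#   pods="$(kubectl get pods -n services -l app.kubernetes.io/component=etcd --no-headers | grep -c hmnfd)"
#   workaround_applies "$chart" "$sts" "$pods" && echo "Label removal workaround applies"
```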
If a manual backup of the etcd cluster was not done prior to running helm rollback, then do so now.
See Create a Manual Backup of a Healthy etcd Cluster.
Run the script remove_label_from_etcd_cluster.sh in /usr/share/doc/csm/troubleshooting/scripts/
to remove the app.kubernetes.io/component=etcd label from the StatefulSet and Pods in the etcd cluster.
remove_label_from_etcd_cluster.sh usage:
Usage:
./remove_label_from_etcd_cluster.sh <namespace> <etcd-cluster>
<namespace> - The Kubernetes namespace the etcd cluster pods are running in.
Example, 'services'.
<etcd-cluster> - The base name of the etcd cluster pods. Example, 'cray-hmnfd'.
(ncn-mw#) Run remove_label_from_etcd_cluster.sh.
/usr/share/doc/csm/troubleshooting/scripts/remove_label_from_etcd_cluster.sh services cray-hmnfd
Example output:
Ensuring cray-hmnfd-bitnami-etcd members do not have 'app.kubernetes.io/component=etcd' label for rollback to bitnami 8.x chart...
Removing 'app.kubernetes.io/component=etcd' label from cray-hmnfd-bitnami-etcd-0...
pod/cray-hmnfd-bitnami-etcd-0 unlabeled
Removing 'app.kubernetes.io/component=etcd' label from cray-hmnfd-bitnami-etcd-1...
pod/cray-hmnfd-bitnami-etcd-1 unlabeled
Removing 'app.kubernetes.io/component=etcd' label from cray-hmnfd-bitnami-etcd-2...
pod/cray-hmnfd-bitnami-etcd-2 unlabeled
Removing label 'app.kubernetes.io/component=etcd' from statefulset for cray-hmnfd-bitnami-etcd
statefulset.apps/cray-hmnfd-bitnami-etcd unlabeled
Label 'app.kubernetes.io/component=etcd' was removed from pods and 'cray-hmnfd-bitnami-etcd' statefulset. Continue with rollback.
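For readers who want to see the kind of operation involved, the following is an illustrative sketch only and is not the contents of remove_label_from_etcd_cluster.sh. It assumes the StatefulSet is named <etcd-cluster>-bitnami-etcd with Pods numbered from 0, as in the example output, and uses the kubectl label syntax in which a trailing dash on the key removes the label. The function name remove_component_label, the replica-count argument, and the KUBECTL variable are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch; NOT the shipped CSM script.
# KUBECTL can be overridden (for example KUBECTL=echo) to preview commands.
KUBECTL="${KUBECTL:-kubectl}"
LABEL_KEY="app.kubernetes.io/component"

# Remove the label from each Pod and then from the StatefulSet.
#   $1 = namespace, $2 = etcd cluster base name, $3 = number of Pods
remove_component_label() {
    rcl_ns="$1"
    rcl_sts="$2-bitnami-etcd"       # assumed naming, taken from the example output
    rcl_replicas="$3"
    rcl_i=0
    while [ "$rcl_i" -lt "$rcl_replicas" ]; do
        # A trailing "-" after the key tells kubectl to remove that label.
        $KUBECTL label pod -n "$rcl_ns" "${rcl_sts}-${rcl_i}" "${LABEL_KEY}-"
        rcl_i=$((rcl_i + 1))
    done
    $KUBECTL label statefulset -n "$rcl_ns" "$rcl_sts" "${LABEL_KEY}-"
}

# Preview without touching the cluster:
#   KUBECTL=echo; remove_component_label services cray-hmnfd 3
```

Prefer the provided script on a real system; the sketch is only meant to show what "unlabeled" in the output corresponds to.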
The label app.kubernetes.io/component=etcd that was causing the StatefulSet error has been removed from the StatefulSet and etcd cluster Pods.
Re-running the helm rollback should now succeed.
helm rollback
With the label app.kubernetes.io/component=etcd removed from the etcd cluster Pods and the StatefulSet,
re-run helm rollback. The rollback should succeed and the expected revision should be running.
(ncn-mw#) Re-run the rollback.
helm rollback -n services cray-hms-hmnfd 1
Example output:
Rollback was a success! Happy Helming!
(ncn-mw#) Check the running revision.
helm history -n services cray-hms-hmnfd
Example output:
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Mon Dec 9 21:00:24 2024 superseded cray-hms-hmnfd-3.0.2 1.18.1 Install complete
2 Thu Dec 19 13:53:27 2024 superseded cray-hms-hmnfd-4.0.4 1.21.0 Upgrade complete
3 Fri Dec 20 19:50:42 2024 failed cray-hms-hmnfd-3.0.2 1.18.1 Rollback "cray-hms-hmnfd" failed: cannot patch "cray-hmnfd-bitnami-etcd" with kind StatefulSet: StatefulSet.apps "cray-hmnfd-bitnami-etcd" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
4 Fri Dec 20 20:06:38 2024 deployed cray-hms-hmnfd-3.0.2 1.18.1 Rollback to 1
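The final check can also be done programmatically. The following is a small sketch, assuming the UPDATED column always has five whitespace-separated tokens as in the example output, so that STATUS is the seventh field and CHART the eighth; the function name deployed_chart is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Helm or CSM).
# Reads `helm history` table output on stdin and prints the chart of the
# revision whose STATUS is "deployed".
# Assumes UPDATED looks like "Fri Dec 20 20:06:38 2024" (five tokens).
deployed_chart() {
    awk '$7 == "deployed" { print $8 }'
}

# Usage (assumes helm access to the cluster):
#   helm history -n services cray-hms-hmnfd | deployed_chart
```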