This page describes how to migrate the Kubernetes CNI from Weave to Cilium during a CSM upgrade.
(`ncn-m#`) Run the migration script:

```bash
/usr/share/doc/csm/scripts/cilium_migration.sh
```
This script starts an Argo workflow in the `argo` namespace.

(`ncn-mw#`) Monitor the migration workflow:
The workflow status can be tracked using the Argo CLI.

Use the Argo CLI `watch` function to view the overall progress of the workflow:

```bash
argo watch <workflow-name> -n argo
```

Use the Argo CLI `logs` function to monitor the workflow in more detail:

```bash
argo logs <workflow-name> -n argo -f
```

Replace `<workflow-name>` with the actual name of the workflow created by the `cilium_migration.sh` script.
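If the workflow name is not known, it can be captured with the Argo CLI `list` command. A minimal sketch, assuming the Argo CLI is installed on the node and that the migration workflow is the first (most recent) workflow in the `argo` namespace:

```shell
# Capture the workflow name (assumes the migration workflow is first in the list).
WORKFLOW_NAME="$(argo list -n argo -o name | head -n 1)"

# Follow overall progress, then stream detailed logs.
argo watch "${WORKFLOW_NAME}" -n argo
argo logs "${WORKFLOW_NAME}" -n argo -f
```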
When the workflow restarts the pods on the NCN worker nodes, it is possible for it to get stuck trying to evict `cray-shared-kafka-kafka` or SMA `cluster-kafka` pods.
Example output:

```text
evicting pod services/cray-shared-kafka-kafka-1
error when evicting pods/"cray-shared-kafka-kafka-1" -n "services" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
```
The issue is that one of the restarted Kafka pods cannot communicate with Zookeeper. This is the problem described in `cfs-api` pods in CLBO state during CSM install, and it has the same workaround.
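To confirm the diagnosis before applying the workaround, the disruption budget and the stuck pod's logs can be inspected. A sketch, using the pod name from the example output above; the grep pattern is illustrative:

```shell
# Inspect the PodDisruptionBudgets blocking the eviction.
kubectl get pdb -n services

# Look for Zookeeper connection errors in the stuck Kafka pod's logs
# (pod name taken from the example output; adjust as needed).
kubectl logs -n services cray-shared-kafka-kafka-1 --all-containers | grep -i zookeeper | tail -n 20
```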
(`ncn-mw#`) If the stuck pod is part of `cray-shared-kafka`, then restart that Zookeeper instance:

```bash
kubectl delete pods -n services -l strimzi.io/controller-name=cray-shared-kafka-zookeeper
```
(`ncn-mw#`) If the stuck pod is a member of SMA `cluster-kafka`, then restart the SMA Zookeeper instance:

```bash
kubectl delete pod -n sma -l strimzi.io/controller-name=cluster-zookeeper
```
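After restarting Zookeeper, the previously stuck Kafka pods should return to a ready state and the eviction should proceed. A hedged verification sketch, assuming the Strimzi-standard `strimzi.io/cluster` label on the Kafka pods:

```shell
# Verify the cray-shared-kafka pods return to Running/Ready.
kubectl get pods -n services -l strimzi.io/cluster=cray-shared-kafka

# Verify the SMA cluster-kafka pods return to Running/Ready.
kubectl get pods -n sma -l strimzi.io/cluster=cluster
```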
#### `etcd` cluster

When restarting the pods on the NCN worker nodes, it is possible for the workflow to get stuck trying to evict pods from an `etcd` cluster.
Example output:

```text
evicting pod services/cray-hbtd-bitnami-etcd-2
error when evicting pods/"cray-hbtd-bitnami-etcd-2" -n "services" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
```
See `etcd` Pods in CLBO State for more information and a workaround.