NOTE
Rebooting master nodes is not supported with IUF and must be performed manually as mentioned here.
NOTE
Additional arguments are available to control the behavior of the management-nodes-rollout stage, for example --limit-management-rollout and -cmrp. See the management-nodes-rollout stage documentation for details and adjust the examples below if necessary.
Follow the steps below to reboot storage nodes:
Perform the NCN storage node reboot. This reboots a single storage node first and then reboots the remaining storage nodes.
(ncn-m001#) Execute the management-nodes-rollout stage with a single NCN storage node.
STORAGE_CANARY=ncn-s001
iuf -a "${ACTIVITY_NAME}" -m "${MEDIA_DIR}" run -r management-nodes-rollout --limit-management-rollout ${STORAGE_CANARY} --management-rollout-strategy reboot
(ncn-m001#) Reboot the remaining NCN storage nodes once the first has rebooted successfully. This reboots NCN storage nodes serially.
Get the list of remaining storage nodes from the cluster and verify that it is correct. The storage canary node should not be in the list since it has already been rebooted.
If it is not desired to reboot all of the remaining storage nodes, the list can instead be entered manually as a list of storage node names separated by spaces.
STORAGE_NODES="$(ceph orch host ls | grep ncn-s | grep -v "$STORAGE_CANARY" | awk '{print $1}' | xargs echo)"
echo "$STORAGE_NODES"
iuf -a "${ACTIVITY_NAME}" -m "${MEDIA_DIR}" run -r management-nodes-rollout --limit-management-rollout ${STORAGE_NODES} --management-rollout-strategy reboot
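The pipeline above excludes the canary with a substring match (grep -v). As a minimal sketch, assuming the host name is the first column of the listing, the same list can be built with an exact name match. The helper name remaining_nodes and the sample listing are hypothetical and only illustrate the idea; on a real system the input would come from ceph orch host ls.

```shell
# Hypothetical helper: read a host listing on stdin and print the names that
# look like <prefix><digits>, excluding the canary by exact name match.
# $1 = node name prefix (for example ncn-s), $2 = canary node name
remaining_nodes() {
    awk -v prefix="$1" -v canary="$2" '
        $1 ~ ("^" prefix "[0-9]+$") && $1 != canary { print $1 }
    ' | xargs
}

# Illustrative listing only (addresses are made up).
sample_hosts='HOST      ADDR        LABELS  STATUS
ncn-s001  10.0.0.1
ncn-s002  10.0.0.2
ncn-s003  10.0.0.3'

STORAGE_CANARY=ncn-s001
STORAGE_NODES="$(printf '%s\n' "$sample_hosts" | remaining_nodes ncn-s "$STORAGE_CANARY")"
echo "$STORAGE_NODES"
```

The header line is skipped because HOST does not match the name pattern, so no separate header handling is needed.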
NOTE
When using the option --limit-management-rollout to pass the list of nodes for management-nodes-rollout, ensure that the label iuf-prevent-reboot=true is not set on any of the nodes passed in the list.
(ncn-m001#) Verify whether any nodes are labeled with iuf-prevent-reboot=true.
kubectl get nodes --show-labels | grep iuf-prevent-reboot
(ncn-m001#) Use kubectl to remove the iuf-prevent-reboot=true label from each labeled node. NODE must be set to the name of the node.
kubectl label nodes "${NODE}" --overwrite iuf-prevent-reboot-
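The grep in the verification step prints whole lines, labels included. As a rough sketch, assuming the labels are the last, comma-separated column of the kubectl get nodes --show-labels output, the names of the labeled nodes can be extracted on their own. The helper name blocked_nodes and the sample listing are hypothetical.

```shell
# Hypothetical helper: read `kubectl get nodes --show-labels` style output on
# stdin and print the name of every node carrying iuf-prevent-reboot=true.
blocked_nodes() {
    awk '
        {
            # The last column holds the comma-separated label list.
            n = split($NF, labels, ",")
            for (i = 1; i <= n; i++)
                if (labels[i] == "iuf-prevent-reboot=true") { print $1; break }
        }
    '
}

# Illustrative listing only (ages, versions, and labels are made up).
sample_nodes='NAME       STATUS   ROLES    AGE   VERSION   LABELS
ncn-w001   Ready    <none>   10d   v1.0.0    iuf-prevent-reboot=true,kubernetes.io/hostname=ncn-w001
ncn-w002   Ready    <none>   10d   v1.0.0    kubernetes.io/hostname=ncn-w002'

printf '%s\n' "$sample_nodes" | blocked_nodes

# On a real system, the names could feed the label removal shown above:
#   for NODE in $(kubectl get nodes --show-labels | blocked_nodes); do
#       kubectl label nodes "${NODE}" --overwrite iuf-prevent-reboot-
#   done
```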
Follow the steps below to reboot worker nodes:
Perform the NCN worker node reboot. This reboots a single worker node first and then reboots the remaining worker nodes.
(ncn-m001#) Execute the management-nodes-rollout stage with a single NCN worker node.
This will reboot the canary node.
The worker canary node can be any worker node and does not have to be ncn-w001.
WORKER_CANARY=ncn-w001
iuf -a "${ACTIVITY_NAME}" -m "${MEDIA_DIR}" run -r management-nodes-rollout --limit-management-rollout ${WORKER_CANARY} --management-rollout-strategy reboot
(ncn-m001#) Use kubectl to apply the iuf-prevent-reboot=true label to the canary node to prevent it from unnecessarily rebooting again.
kubectl label nodes "${WORKER_CANARY}" --overwrite iuf-prevent-reboot=true
(ncn-m001#) Verify the IUF node labels are present on the desired node.
kubectl get nodes --show-labels | grep iuf-prevent-reboot
(ncn-m001#) Execute the management-nodes-rollout stage on all remaining worker nodes.
NOTE
For this step, the argument to --limit-management-rollout can be Management_Worker or a list of worker node names separated by spaces. If Management_Worker is supplied, all worker nodes that are not labeled with iuf-prevent-reboot=true will be rebooted. If a list of worker node names is supplied, then those worker nodes will be rebooted.
Choose one of the following two options. The options differ only in the argument to --limit-management-rollout; both do the same thing.
(ncn-m001#) Execute management-nodes-rollout on all Management_Worker nodes.
iuf -a "${ACTIVITY_NAME}" -m "${MEDIA_DIR}" run -r management-nodes-rollout --limit-management-rollout Management_Worker --management-rollout-strategy reboot
(ncn-m001#) Execute management-nodes-rollout on a group of worker nodes. The list of worker nodes can be manually edited if it is undesirable to reboot all of the workers with one execution.
WORKER_NODES="$(kubectl get node | grep -P 'ncn-w\d+' | grep -v "$WORKER_CANARY" | awk '{print $1}' | xargs)"
echo "$WORKER_NODES"
iuf -a "${ACTIVITY_NAME}" -m "${MEDIA_DIR}" run -r management-nodes-rollout --limit-management-rollout $WORKER_NODES --management-rollout-strategy reboot
(ncn-m001#) Use kubectl to remove the iuf-prevent-reboot=true label from the canary node.
kubectl label nodes "${WORKER_CANARY}" --overwrite iuf-prevent-reboot-
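Because the two worker rollout options differ only in the value given to --limit-management-rollout, it can help to print the command for review before running it. The following is a minimal sketch; the helper name rollout_command and the placeholder values for ACTIVITY_NAME and MEDIA_DIR are hypothetical, and the helper only prints text without invoking iuf.

```shell
# Hypothetical helper: print (not run) the rollout command for the targets
# given as arguments. Uses only the options shown in this procedure.
rollout_command() {
    printf 'iuf -a %s -m %s run -r management-nodes-rollout --limit-management-rollout %s --management-rollout-strategy reboot\n' \
        "$ACTIVITY_NAME" "$MEDIA_DIR" "$*"
}

ACTIVITY_NAME=example-activity   # placeholder value
MEDIA_DIR=/tmp/example-media     # placeholder value

# Option 1: all worker nodes not labeled with iuf-prevent-reboot=true.
rollout_command Management_Worker

# Option 2: an explicit list of worker node names.
rollout_command ncn-w002 ncn-w003
```

The printed line omits the quoting used in the real commands, so it is meant for inspection rather than for copying back into the shell.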
NOTE
To complete the reboot of the master nodes manually, refer to the procedure mentioned here.