Only follow the steps in the section for the node type that was added:
Validate that the master node was added successfully.
Verify the new node is in the cluster.
Run the following command from any master or worker node that is already in the cluster. It is helpful to run this command several times to watch for the newly added node to join the cluster. This should occur within 10 to 20 minutes.
ncn-mw# kubectl get nodes
Example output:
NAME       STATUS   ROLES    AGE    VERSION
ncn-m001   Ready    master   113m   v1.19.9
ncn-m002   Ready    master   113m   v1.19.9
ncn-m003   Ready    master   112m   v1.19.9
ncn-w001   Ready    <none>   112m   v1.19.9
ncn-w002   Ready    <none>   112m   v1.19.9
ncn-w003   Ready    <none>   112m   v1.19.9
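To repeat this check automatically instead of rerunning it by hand, the watch utility can be used if it is available on the node (an optional sketch, not part of the original procedure):
ncn-mw# watch -n 30 "kubectl get nodes"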
Confirm the sdc disk has the correct LVM configuration on the added node. Run the following command on that node.
ncn-m# lsblk `blkid -L ETCDLVM`
Example output:
sdc                   8:32   0 447.1G  0 disk
└─ETCDLVM           254:0    0 447.1G  0 crypt
  └─etcdvg0-ETCDK8S 254:1    0    32G  0 lvm   /run/lib-etcd
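As an optional extra check (a sketch, not part of the original procedure), confirm that the etcd logical volume is actually mounted at /run/lib-etcd using findmnt; if the volume is mounted, findmnt prints the target, source device, and filesystem type, while no output means the mount is missing.
ncn-m# findmnt /run/lib-etcd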
Confirm etcd is running and shows the node as a member once again. The newly added master node should be in the returned list.
ncn-m# etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/ca.crt \
--key=/etc/kubernetes/pki/etcd/ca.key --endpoints=localhost:2379 member list
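In addition to the member list, the health of the local etcd endpoint can be checked with the same certificates, using the standard etcdctl endpoint health subcommand (an optional sketch, not part of the original procedure):
ncn-m# etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/ca.crt \
--key=/etc/kubernetes/pki/etcd/ca.key --endpoints=localhost:2379 endpoint health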
Validate that the worker node was added successfully.
Verify the new node is in the cluster.
Run the following command from any master or worker node that is already in the cluster. It is helpful to run this command several times to watch for the newly added node to join the cluster. This should occur within 10 to 20 minutes.
ncn-mw# kubectl get nodes
Example output:
NAME       STATUS   ROLES    AGE    VERSION
ncn-m001   Ready    master   113m   v1.19.9
ncn-m002   Ready    master   113m   v1.19.9
ncn-m003   Ready    master   112m   v1.19.9
ncn-w001   Ready    <none>   112m   v1.19.9
ncn-w002   Ready    <none>   112m   v1.19.9
ncn-w003   Ready    <none>   112m   v1.19.9
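To look at only the newly added node rather than the whole cluster, the node name can be passed directly to kubectl (a sketch that assumes the NODE variable from the prerequisites section is set):
ncn-mw# kubectl get node "$NODE"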
Confirm /var/lib/containerd is on overlay on the node that was added. Run the following command on the added node.
ncn-w# df -h /var/lib/containerd
Example output:
Filesystem            Size  Used Avail Use% Mounted on
containerd_overlayfs  378G  245G  133G  65% /var/lib/containerd
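The filesystem type can also be confirmed explicitly with findmnt, which should report an overlay filesystem for this mount (an optional sketch, not part of the original procedure):
ncn-w# findmnt -T /var/lib/containerd -o TARGET,SOURCE,FSTYPE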
Several minutes after the node joins the cluster, pods should be in a Running state on the worker node. Confirm that pods are being scheduled and are reaching a Running state on the worker node.
Run the following command on any master or worker node. It assumes the variables from the prerequisites section have been set.
ncn# kubectl get po -A -o wide | grep $NODE
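To quickly spot pods on the node that have not yet reached a Running or Completed state, the pod list can also be filtered by node name (a sketch; it likewise assumes the NODE variable is set, and any pods listed besides the header line are still starting):
ncn# kubectl get po -A -o wide --field-selector spec.nodeName="$NODE" | grep -vE 'Running|Completed'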
Confirm BGP is healthy.
Follow the steps in the Check BGP Status and Reset Sessions procedure to verify BGP and, if needed, reset sessions.
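BGP peering on these systems is typically established by MetalLB speakers in the metallb-system namespace. As an additional Kubernetes-side check (a sketch that assumes MetalLB is in use), confirm the speaker pods are Running, including one on the added worker node:
ncn# kubectl get pods -n metallb-system -o wide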
Validate that the storage node was added successfully. The following examples are based on a storage cluster that was expanded from three nodes to four.
Verify the Ceph status looks correct.
Get the current Ceph status:
ncn-m# ceph -s
Example output:
  cluster:
    id:     b13f1282-9b7d-11ec-98d9-b8599f2b2ed2
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ncn-s001,ncn-s002,ncn-s003 (age 4h)
    mgr: ncn-s001.pdeosn(active, since 4h), standbys: ncn-s002.wjnqvu, ncn-s003.avkrzl
    mds: cephfs:1 {0=cephfs.ncn-s001.ldlvfj=up:active} 1 up:standby-replay 1 up:standby
    osd: 18 osds: 18 up (since 4h), 18 in (since 4h)
    rgw: 4 daemons active (site1.zone1.ncn-s001.ktslgl, site1.zone1.ncn-s002.inynsh, site1.zone1.ncn-s003.dvyhak, site1.zone1.ncn-s004.jnhqvt)

  task status:

  data:
    pools:   12 pools, 713 pgs
    objects: 37.20k objects, 72 GiB
    usage:   212 GiB used, 31 TiB / 31 TiB avail
    pgs:     713 active+clean

  io:
    client:   7.0 KiB/s rd, 300 KiB/s wr, 2 op/s rd, 49 op/s wr
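If the health line shows anything other than HEALTH_OK, more information about the problem can be obtained with the standard ceph health detail command (an optional sketch, not part of the original procedure):
ncn-m# ceph health detail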
Verify that the status shows mon, mds, and mgr processes, and an rgw daemon for each storage node (4 in this example).
Verify that the added host contains OSDs and that the OSDs are up.
ncn-m# ceph osd tree
Example output:
ID  CLASS  WEIGHT    TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         31.43875  root default
-7          6.98639      host ncn-s001
 0    ssd   1.74660          osd.0          up   1.00000  1.00000
10    ssd   1.74660          osd.10         up   1.00000  1.00000
11    ssd   1.74660          osd.11         up   1.00000  1.00000
15    ssd   1.74660          osd.15         up   1.00000  1.00000
-3          6.98639      host ncn-s002
 3    ssd   1.74660          osd.3          up   1.00000  1.00000
 5    ssd   1.74660          osd.5          up   1.00000  1.00000
 7    ssd   1.74660          osd.7          up   1.00000  1.00000
12    ssd   1.74660          osd.12         up   1.00000  1.00000
-5          6.98639      host ncn-s003
 1    ssd   1.74660          osd.1          up   1.00000  1.00000
 4    ssd   1.74660          osd.4          up   1.00000  1.00000
 8    ssd   1.74660          osd.8          up   1.00000  1.00000
13    ssd   1.74660          osd.13         up   1.00000  1.00000
-9         10.47958      host ncn-s004
 2    ssd   1.74660          osd.2          up   1.00000  1.00000
 6    ssd   1.74660          osd.6          up   1.00000  1.00000
 9    ssd   1.74660          osd.9          up   1.00000  1.00000
14    ssd   1.74660          osd.14         up   1.00000  1.00000
16    ssd   1.74660          osd.16         up   1.00000  1.00000
17    ssd   1.74660          osd.17         up   1.00000  1.00000
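The daemon names in the ceph -s output above suggest the cluster is managed by cephadm. Assuming that is the case (a sketch, with ncn-s004 as the added node in this example), the daemons placed on the new host can also be listed directly:
ncn-m# ceph orch ps ncn-s004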
Verify that radosgw and haproxy are working correctly. Run the following command on the added storage node. If radosgw and haproxy are working correctly, output is returned without an error.
ncn-s# curl -k https://rgw-vip.nmn
Example output:
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
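If only a pass/fail check is wanted, the HTTP status code can be printed instead of the XML body (an optional sketch, not part of the original procedure; a 200 response indicates haproxy is routing to a healthy radosgw):
ncn-s# curl -sk -o /dev/null -w '%{http_code}\n' https://rgw-vip.nmn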
Proceed to the next step to Validate Health or return to the main Add, Remove, Replace, or Move NCNs page.