NOTE
This procedure can be used to add more than one node at the same time.
Copy the join script from ncn-m001 to the storage node that was rebuilt or added.
Run this command on the storage node that was rebuilt or added.
mkdir -pv /usr/share/doc/csm/scripts &&
scp -p ncn-m001:/usr/share/doc/csm/scripts/join_ceph_cluster.sh /usr/share/doc/csm/scripts
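Optionally, confirm the script is present and executable on the storage node:
ls -l /usr/share/doc/csm/scripts/join_ceph_cluster.sh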
Start monitoring the Ceph health alongside the main procedure.
In a separate window, run the following command on ncn-s001, ncn-s002, or ncn-s003 (but not the same node that was rebuilt or added):
watch ceph -s
Execute the script from the first step.
Run this command on the storage node that was rebuilt or added.
/usr/share/doc/csm/scripts/join_ceph_cluster.sh
IMPORTANT: In the output from watch ceph -s, the health should go to a HEALTH_WARN state. This is expected. Most commonly, you will see an alert about failed to probe daemons or devices, but this should clear on its own.
In addition, it may take up to 5 minutes for the added OSDs to report as up. This is dependent on the Ceph Orchestrator performing an inventory and completing batch processing to add the OSDs.
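Optionally, the OSD tree can also be listed from the monitoring node to watch the OSDs under the rebuilt or added host change to up:
ceph osd tree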
IMPORTANT: Only do this if you are 100% certain you need to erase data from a previous install.
NOTE
The commands in this section will need to be run from a node running ceph-mon, typically ncn-s001, ncn-s002, or ncn-s003.
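The commands below reference a $NODE variable. If it is not already set in the shell, export it first; the hostname here is only an example matching the sample output below:
export NODE=ncn-s003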
Find the devices on the node being rebuilt.
ceph orch device ls $NODE
Example Output:
Hostname  Path      Type  Serial          Size   Health   Ident  Fault  Available
ncn-s003  /dev/sdc  ssd   S455NY0MB42493  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sdd  ssd   S455NY0MB42482  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sde  ssd   S455NY0MB42486  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sdf  ssd   S455NY0MB51808  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sdg  ssd   S455NY0MB42473  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sdh  ssd   S455NY0MB42468  1920G  Unknown  N/A    N/A    No
IMPORTANT: In the above example, the drives on the rebuilt node are showing Available = No. This is expected because the check is based on the presence of an LVM volume on the drive.
NOTE
The ceph orch device ls $NODE command excludes the drives being used for the OS. Double-check that no OS drives are listed; these will have a size of 480G.
Zap the drives.
for drive in $(ceph orch device ls $NODE --format json-pretty |jq -r '.[].devices[].path') ; do
ceph orch device zap $NODE $drive --force
done
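If only a single drive needs to be zapped, the same subcommand can be run against one device path; /dev/sdc is an example path taken from the sample output above:
ceph orch device zap $NODE /dev/sdc --force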
Validate that the drives are being added to the cluster.
watch ceph -s
The OSD up and in counts should increase. If the in count increases but does not reflect the number of drives being added back in, then fail over the ceph-mgr daemon. This is a known bug and is addressed in newer releases.
If you need to fail over the ceph-mgr daemon, run:
ceph mgr fail
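Optionally, confirm that a standby manager took over after the failover:
ceph mgr stat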
IMPORTANT: radosgw is deployed by default to the first three storage nodes. This includes haproxy and keepalived.
This is automated as part of the install, but the configuration may need to be regenerated if radosgw is not running on the first three storage nodes or on all nodes.
Deploy Rados Gateway containers to the new nodes. Run the following command on ncn-s001, ncn-s002, or ncn-s003. The placement should include all nodes that Rados Gateway should be running on, not only the new node.
ceph orch apply rgw site1 zone1 --placement="<num-daemons> <node1 node2 node3 node4 ... >" --port=8080
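For example, on a system with four storage nodes the command might look like the following; the daemon count and node list are examples and must be adjusted to match the actual cluster:
ceph orch apply rgw site1 zone1 --placement="4 ncn-s001 ncn-s002 ncn-s003 ncn-s004" --port=8080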
Verify that Rados Gateway is running on the desired nodes. Run the following command on ncn-s001, ncn-s002, or ncn-s003.
ceph orch ps --daemon_type rgw
Example output:
NAME                       HOST      STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                        IMAGE ID     CONTAINER ID
rgw.site1.ncn-s001.kvskqt  ncn-s001  running (41m)  6m ago     41m  15.2.8   registry.local/ceph/ceph:v15.2.8  553b0cb212c  6e323878db46
rgw.site1.ncn-s002.tisuez  ncn-s002  running (41m)  6m ago     41m  15.2.8   registry.local/ceph/ceph:v15.2.8  553b0cb212c  278830a273d3
rgw.site1.ncn-s003.nnwuqy  ncn-s003  running (41m)  6m ago     41m  15.2.8   registry.local/ceph/ceph:v15.2.8  553b0cb212c  a9706e6d7a69
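Optionally, check that the gateway on the new node answers on its port; ncn-s004 is an example hostname for the added node:
curl -s -o /dev/null -w '%{http_code}\n' http://ncn-s004:8080
A response code of 200 indicates the gateway is serving requests.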
Add nodes into HAproxy and KeepAlived. Adjust the command based on the number of storage nodes.
If the node was rebuilt:
source /srv/cray/scripts/metal/update_apparmor.sh; reconfigure-apparmor
pdsh -w ncn-s00[1-(end node number)] -f 2 \
'/srv/cray/scripts/metal/generate_haproxy_cfg.sh > /etc/haproxy/haproxy.cfg
systemctl restart haproxy.service
/srv/cray/scripts/metal/generate_keepalived_conf.sh > /etc/keepalived/keepalived.conf
systemctl restart keepalived.service'
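For example, on a system with four storage nodes the commands would be run with the node range filled in; ncn-s00[1-4] is an example range only:
pdsh -w ncn-s00[1-4] -f 2 \
'/srv/cray/scripts/metal/generate_haproxy_cfg.sh > /etc/haproxy/haproxy.cfg
systemctl restart haproxy.service
/srv/cray/scripts/metal/generate_keepalived_conf.sh > /etc/keepalived/keepalived.conf
systemctl restart keepalived.service'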
If the node was added:
Determine the IP address of the added node. Run this command on the added node.
cloud-init query ds | jq -r ".meta_data[].host_records[] | select(.aliases[]? == \"$(hostname)\") | .ip" 2>/dev/null
Example Output:
10.252.1.13
Update the HAproxy configuration to include the added node. Select one of the first three storage nodes (ncn-s001, ncn-s002, or ncn-s003) to make the change on; this cannot be done from the added node.
vi /etc/haproxy/haproxy.cfg
This example adds or updates ncn-s004 with the IP address 10.252.1.13 in backend rgw-backend.
...
backend rgw-backend
option forwardfor
balance static-rr
option httpchk GET /
server server-ncn-s001-rgw0 10.252.1.6:8080 check weight 100
server server-ncn-s002-rgw0 10.252.1.5:8080 check weight 100
server server-ncn-s003-rgw0 10.252.1.4:8080 check weight 100
server server-ncn-s004-rgw0 10.252.1.13:8080 check weight 100 <--- Added or updated line
...
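Optionally, validate the HAproxy configuration syntax on the node where it was edited before distributing the file:
haproxy -c -f /etc/haproxy/haproxy.cfg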
Copy the updated HAproxy configuration to all the storage nodes. Adjust the command based on the number of storage nodes.
pdcp -w ncn-s00[1-(end node number)] /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg
Configure apparmor and KeepAlived on the added node and restart the services across all the storage nodes.
source /srv/cray/scripts/metal/update_apparmor.sh; reconfigure-apparmor
/srv/cray/scripts/metal/generate_keepalived_conf.sh > /etc/keepalived/keepalived.conf
export PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no"
pdsh -w ncn-s00[1-(end node number)] -f 2 'systemctl restart haproxy.service; systemctl restart keepalived.service'
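Optionally, verify the service state across the storage nodes after the restarts; adjust the node range to match the actual number of storage nodes:
pdsh -w ncn-s00[1-4] 'systemctl is-active haproxy.service keepalived.service'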