Adding a Ceph Node to the Ceph Cluster

NOTE: This procedure can be used to add more than one node at the same time.

Add Join Script

  1. Copy and paste the script below into /srv/cray/scripts/common/join_ceph_cluster.sh.

    NOTE: This script may also be available in the /usr/share/doc/csm/scripts directory on the node where the latest docs-csm RPM is installed. If so, copy it from that node to the storage node being rebuilt and skip to step 2.
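
    For example, a hypothetical copy from a master node that has the latest docs-csm RPM installed (the source hostname ncn-m001 and the script file name under /usr/share/doc/csm/scripts are assumptions; adjust both to match the actual system):

    ncn-s# scp ncn-m001:/usr/share/doc/csm/scripts/join_ceph_cluster.sh /srv/cray/scripts/common/join_ceph_cluster.sh   # hypothetical source host and file name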

    #!/bin/bash
    
    (( counter=0 ))
    
    host=$(hostname)
    
    > ~/.ssh/known_hosts
    
    # Contact one of the first three storage nodes (the Ceph monitors) to join this host to the cluster
    for node in ncn-s001 ncn-s002 ncn-s003; do
      ssh-keyscan -H "$node" >> ~/.ssh/known_hosts
      if [[ "$host" == "$node" ]]; then
        continue
      fi
    
      # Proceed if the node answers on port 22, or while fewer than three attempts have failed
      if nc -z -w 10 "$node" 22 || [[ $counter -lt 3 ]]
      then
        if [[ "$host" =~ ^("ncn-s001"|"ncn-s002"|"ncn-s003")$ ]]
        then
          scp $node:/etc/ceph/* /etc/ceph
        else
          scp $node:/etc/ceph/rgw.pem /etc/ceph/rgw.pem
        fi
    
        # On the monitor node: pre-load container images, drop any stale orchestrator entry for this host,
        # distribute the cephadm SSH key to this host, and re-add it to the orchestrator
        if [[ ! $(pdsh -w $node "/srv/cray/scripts/common/pre-load-images.sh; ceph orch host rm $host; ceph cephadm generate-key; ceph cephadm get-pub-key > ~/ceph.pub; ssh-keyscan -H $host >> ~/.ssh/known_hosts ;ssh-copy-id -f -i ~/ceph.pub root@$host; ceph orch host add $host") ]]
        then
          (( counter+=1 ))
          if [[ $counter -ge 3 ]]
          then
            echo "Unable to access ceph monitor nodes"
            exit 1
          fi
        else
          break
        fi
      fi
    done
    
    sleep 30
    (( ceph_mgr_failed_restarts=0 ))
    (( ceph_mgr_successful_restarts=0 ))
    # Fail the mgr over until every available drive on this node has been consumed as an OSD
    until [[ $(cephadm shell -- ceph-volume inventory --format json-pretty | jq '.[] | select(.available == true) | .path' | wc -l) == 0 ]]
    do
      for node in ncn-s001 ncn-s002 ncn-s003; do
        if [[ $ceph_mgr_successful_restarts -gt 10 ]]
        then
          echo "Failed to bring in OSDs, manual troubleshooting required."
          exit 1
        fi
        if pdsh -w $node ceph mgr fail
        then
          (( ceph_mgr_successful_restarts+=1 ))
          sleep 120
          break
        else
          (( ceph_mgr_failed_restarts+=1 ))
          if [[ $ceph_mgr_failed_restarts -ge 3 ]]
          then
            echo "Unable to access ceph monitor nodes."
            exit 1
          fi
        fi
      done
    done
    
    # Make sure the cephadm-managed systemd units on this node start on boot
    for service in $(cephadm ls | jq -r '.[].systemd_unit')
    do
      systemctl enable $service
    done
    
  2. Change the mode of the script.

    ncn-s# chmod u+x /srv/cray/scripts/common/join_ceph_cluster.sh
    
  3. In a separate window, log into one of the first three storage nodes (ncn-s001, ncn-s002, or ncn-s003) and execute the following:

    ncn-s# watch ceph -s
    
  4. Execute the script on the node being added to the cluster.

    /srv/cray/scripts/common/join_ceph_cluster.sh
    

    IMPORTANT: In the window running watch ceph -s, the cluster health will go to a HEALTH_WARN state. This is expected. The most common alert is "failed to probe daemons or devices", and it will clear on its own.
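
    Once the script completes, the new node should appear in the orchestrator host inventory. As an optional check, run the following from any node with an admin keyring (for example ncn-s001):

    ncn-s# ceph orch host ls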

Zapping OSDs

IMPORTANT: Only do this if the node could not be wiped prior to the rebuild.

NOTE: The commands in the Zapping OSDs section must be run on a node running ceph-mon. Typically these are ncn-s001, ncn-s002, and ncn-s003.
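
The commands below reference a NODE variable. Before running them, set it to the hostname of the node being rebuilt; ncn-s003 is used here only because it matches the example output in step 1:

    NODE=ncn-s003   # example hostname; substitute the node being rebuilt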

  1. Find the devices on the node being rebuilt.

    ceph orch device ls $NODE
    

    Example Output:

    Hostname  Path      Type  Serial          Size   Health   Ident  Fault  Available
    ncn-s003  /dev/sdc  ssd   S455NY0MB42493  1920G  Unknown  N/A    N/A    No
    ncn-s003  /dev/sdd  ssd   S455NY0MB42482  1920G  Unknown  N/A    N/A    No
    ncn-s003  /dev/sde  ssd   S455NY0MB42486  1920G  Unknown  N/A    N/A    No
    ncn-s003  /dev/sdf  ssd   S455NY0MB51808  1920G  Unknown  N/A    N/A    No
    ncn-s003  /dev/sdg  ssd   S455NY0MB42473  1920G  Unknown  N/A    N/A    No
    ncn-s003  /dev/sdh  ssd   S455NY0MB42468  1920G  Unknown  N/A    N/A    No
    

    IMPORTANT: In the example above, the drives on the rebuilt node show Available = No. This is expected because availability is determined by the presence of an LVM volume on the device.

    NOTE: The ceph orch device ls $NODE command excludes the drives used for the OS. Double-check that no OS drives are listed; these have a size of 480G.
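
    To cross-check which disks belong to the OS, the disk sizes can be listed directly on the node being rebuilt (an optional check; run it on that node, not on the monitor node, and expect the output to vary by hardware):

    lsblk -d -o NAME,SIZE,MODEL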

  2. Zap the drives.

    for drive in $(ceph orch device ls $NODE --format json-pretty |jq -r '.[].devices[].path')
    do
      ceph orch device zap $NODE $drive --force
    done
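
    After zapping, the devices should become available again and will then be picked up as OSDs. As an optional check, the inventory can be re-scanned (the --refresh flag asks the orchestrator to re-probe the host):

    ceph orch device ls $NODE --refresh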
    
  3. Validate the drives are being added to the cluster.

    ncn-s# watch ceph -s
    

    The output will show the OSD UP and IN counts increasing. If the IN count increases but does not match the number of drives being added back in, an administrator must fail over the ceph-mgr daemon. This is a known bug and is addressed in newer releases.
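
    The UP and IN counts can also be checked directly from a node running ceph-mon (an optional check):

    ncn-s# ceph osd stat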

    If the ceph-mgr daemon needs to be failed over, run:

    ncn-s# ceph mgr fail
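
    To confirm that a standby manager took over after the failover, the currently active manager can be checked (optional):

    ncn-s# ceph mgr stat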
    

Regenerate Rados-GW Load Balancer Configuration for the Rebuilt Nodes

IMPORTANT: By default, Rados-GW is deployed to the first three storage nodes, along with HAProxy and keepalived. This is automated as part of the install, but administrators may have to regenerate the configuration if the services are not running on the first three storage nodes or on all nodes.

  1. Deploy Rados Gateway containers to the new nodes.

    • Configure the Rados Gateway containers with the complete list of nodes they should be running on:

      ncn-s# ceph orch apply rgw site1 zone1 --placement="<node1 node2 node3 node4 ... >"
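
      For example, if Rados Gateway should run on the first three storage nodes plus a newly added node (the node list here is illustrative; use the complete list for the system):

      ncn-s# ceph orch apply rgw site1 zone1 --placement="ncn-s001 ncn-s002 ncn-s003 ncn-s004"   # illustrative node list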
      
  2. Verify Rados Gateway is running on the desired nodes.

    ncn-s00(1/2/3)# ceph orch ps --daemon_type rgw
    

    Example output:

    NAME                             HOST      STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                        IMAGE ID     CONTAINER ID
    rgw.site1.zone1.ncn-s001.kvskqt  ncn-s001  running (41m)  6m ago     41m  15.2.8   registry.local/ceph/ceph:v15.2.8  553b0cb212c  6e323878db46
    rgw.site1.zone1.ncn-s002.tisuez  ncn-s002  running (41m)  6m ago     41m  15.2.8   registry.local/ceph/ceph:v15.2.8  553b0cb212c  278830a273d3
    rgw.site1.zone1.ncn-s003.nnwuqy  ncn-s003  running (41m)  6m ago     41m  15.2.8   registry.local/ceph/ceph:v15.2.8  553b0cb212c  a9706e6d7a69
    
  3. Add the nodes to HAProxy and keepalived.

    ncn-s# pdsh -w ncn-s00[1-(end node number)] -f 2 \
                    'source /srv/cray/scripts/metal/update_apparmor.sh
                     reconfigure-apparmor; /srv/cray/scripts/metal/generate_haproxy_cfg.sh > /etc/haproxy/haproxy.cfg
                     systemctl enable haproxy.service
                     systemctl restart haproxy.service
                     /srv/cray/scripts/metal/generate_keepalived_conf.sh > /etc/keepalived/keepalived.conf
                     systemctl enable keepalived.service
                     systemctl restart keepalived.service'
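
    To confirm both services restarted cleanly on every node, their state can be checked with pdsh (the node range is a placeholder, as above):

    ncn-s# pdsh -w ncn-s00[1-(end node number)] 'systemctl is-active haproxy keepalived'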
    

Next Step - Storage Node Validation