Adding a Ceph Node to the Ceph Cluster

NOTE: This procedure can be used to add more than one node at the same time.

Add Join Script

  1. Copy and paste the script below into /srv/cray/scripts/common/

    NOTE: This script may also be available in the /usr/share/doc/csm/scripts directory on a node where the latest docs-csm RPM is installed. If so, copy it from that node to the new storage node being rebuilt and skip to step 2.

    #!/bin/bash

    (( counter=0 ))
    host=$(hostname)  # name of the node being added to the cluster
    > ~/.ssh/known_hosts
    for node in ncn-s001 ncn-s002 ncn-s003; do
      ssh-keyscan -H "$node" >> ~/.ssh/known_hosts
      # Note: pdsh has no remote command here, and the redirect overwrites the local known_hosts file.
      pdsh -w $node > ~/.ssh/known_hosts
      if [[ "$host" == "$node" ]]; then
        continue  # do not try to join through the node being added
      fi
      if [[ $(nc -z -w 10 $node 22) ]] || [[ $counter -lt 3 ]]; then
        if [[ "$host" =~ ^("ncn-s001"|"ncn-s002"|"ncn-s003")$ ]]; then
          scp $node:/etc/ceph/* /etc/ceph
        else
          scp $node:/etc/ceph/rgw.pem /etc/ceph/rgw.pem
        fi
        # Remove and re-add this host through the monitor node; empty output counts as a failure.
        if [[ ! $(pdsh -w $node "/srv/cray/scripts/common/; ceph orch host rm $host; ceph cephadm generate-key; ceph cephadm get-pub-key > ~/; ssh-keyscan -H $host >> ~/.ssh/known_hosts ;ssh-copy-id -f -i ~/ root@$host; ceph orch host add $host") ]]; then
          (( counter+=1 ))
          if [[ $counter -ge 3 ]]; then
            echo "Unable to access ceph monitor nodes"
            exit 1
          fi
        else
          break
        fi
      fi
    done
    sleep 30
    (( ceph_mgr_failed_restarts=0 ))
    (( ceph_mgr_successful_restarts=0 ))
    # Fail over ceph-mgr until ceph-volume reports no available (unused) devices on this node.
    until [[ $(cephadm shell -- ceph-volume inventory --format json-pretty | jq '.[] | select(.available == true) | .path' | wc -l) == 0 ]]; do
      for node in ncn-s001 ncn-s002 ncn-s003; do
        if [[ $ceph_mgr_successful_restarts -gt 10 ]]; then
          echo "Failed to bring in OSDs, manual troubleshooting required."
          exit 1
        fi
        if pdsh -w $node ceph mgr fail; then
          (( ceph_mgr_successful_restarts+=1 ))
          sleep 120
          break
        else
          (( ceph_mgr_failed_restarts+=1 ))
          if [[ $ceph_mgr_failed_restarts -ge 3 ]]; then
            echo "Unable to access ceph monitor nodes."
            exit 1
          fi
        fi
      done
    done
    # Enable every cephadm-managed systemd unit so the daemons start at boot.
    for service in $(cephadm ls | jq -r '.[].systemd_unit'); do
      systemctl enable $service
    done
  2. Change the mode of the script.

    ncn-s# chmod u+x /srv/cray/scripts/common/
  3. In a separate window, log in to one of the first three storage nodes (ncn-s001, ncn-s002, or ncn-s003) and run the following:

    ncn-ms# watch ceph -s
  4. Execute the script.


    IMPORTANT: While the script runs, the window running watch ceph -s will show the cluster health change to HEALTH_WARN. This is expected. The most common alert is “failed to probe daemons or devices”, and it will clear.
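
The HEALTH_WARN state described in step 4 can also be waited out from a script instead of being watched by hand. The following is a minimal sketch, assuming the ceph health command can be run on the node where it executes. The wait_for_health_ok helper, its attempt count, and its delay are hypothetical and are not part of the CSM scripts.

```bash
#!/bin/bash
# Sketch only: wait_for_health_ok is a hypothetical helper.
# Usage: wait_for_health_ok [max_attempts] [delay_seconds]
# Returns 0 once `ceph health` reports HEALTH_OK, 1 if the attempts run out.
wait_for_health_ok() {
  local attempts="${1:-30}" delay="${2:-20}" status i=1
  while [ "$i" -le "$attempts" ]; do
    # `ceph health` prints a line that starts with HEALTH_OK, HEALTH_WARN, or HEALTH_ERR.
    status=$(ceph health 2>/dev/null)
    case "$status" in
      HEALTH_OK*)
        echo "attempt $i: $status"
        return 0
        ;;
      *)
        echo "attempt $i: ${status:-no response}, retrying in ${delay}s"
        ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_for_health_ok 30 20 checks the cluster every 20 seconds for up to 10 minutes. A nonzero return only means the cluster did not reach HEALTH_OK in that time, so the warning text still needs to be read by an administrator.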

Zapping OSDs

IMPORTANT: Only perform this procedure if the node could not be wiped before the rebuild.

NOTE: The commands in the Zapping OSDs section must be run on a node running ceph-mon. Typically these are ncn-s001, ncn-s002, and ncn-s003.

  1. Find the devices on the node being rebuilt.

    ceph orch device ls $NODE

    Example Output:

    Hostname  Path      Type  Serial          Size   Health   Ident  Fault  Available
    ncn-s003  /dev/sdc  ssd   S455NY0MB42493  1920G  Unknown  N/A    N/A    No
    ncn-s003  /dev/sdd  ssd   S455NY0MB42482  1920G  Unknown  N/A    N/A    No
    ncn-s003  /dev/sde  ssd   S455NY0MB42486  1920G  Unknown  N/A    N/A    No
    ncn-s003  /dev/sdf  ssd   S455NY0MB51808  1920G  Unknown  N/A    N/A    No
    ncn-s003  /dev/sdg  ssd   S455NY0MB42473  1920G  Unknown  N/A    N/A    No
    ncn-s003  /dev/sdh  ssd   S455NY0MB42468  1920G  Unknown  N/A    N/A    No

    IMPORTANT: In the above example, the drives on the rebuilt node show “Available = No”. This is expected because availability is determined by whether LVM is present on the drive.

    NOTE: The ceph orch device ls $NODE command excludes the drives being used for the OS. Double-check that no OS drives appear in the output. OS drives have a size of 480G.

  2. Zap the drives.

    for drive in $(ceph orch device ls $NODE --format json-pretty | jq -r '.[].devices[].path'); do
      ceph orch device zap $NODE $drive --force
    done
  3. Validate the drives are being added to the cluster.

    ncn-ms# watch ceph -s

    The output will show the OSD UP and IN counts increase. If the IN count increases but does not match the number of drives being added back in, an administrator must fail over the ceph-mgr daemon. This is a known bug and is addressed in newer releases.

    To fail over the ceph-mgr daemon, run:

    ncn-s# ceph mgr fail
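
Because zapping is destructive, it can help to review the commands from steps 1 and 2 before running them. The following is a minimal sketch that reads the table output of ceph orch device ls $NODE and prints the zap commands instead of executing them. It assumes the column order shown in the example output (Hostname, Path, Type, Serial, Size) and treats any 480G drive as a possible OS drive, per the note in step 1. The plan_zap name is hypothetical.

```bash
#!/bin/bash
# Sketch only: plan_zap is a hypothetical helper.
# Reads `ceph orch device ls <node>` table output on stdin and prints one zap
# command per drive. Nothing is executed.
plan_zap() {
  awk '
    # Skip the header row, blank lines, and anything that is not a device row.
    $2 !~ /^\/dev\// { next }
    # Drives of 480G are reported as comments so they can be checked by hand.
    $5 == "480G" {
      printf "# SKIPPED %s %s: size %s matches an OS drive\n", $1, $2, $5
      next
    }
    { printf "ceph orch device zap %s %s --force\n", $1, $2 }
  '
}
```

For example, ceph orch device ls $NODE | plan_zap prints the planned commands for review. If the table layout differs on another Ceph release, the column numbers in the awk program need adjusting.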

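The IN count comparison from step 3 can be sketched as a small check. This example assumes that ceph osd stat prints a summary line of the form "N osds: U up ..., I in ...", which may differ between Ceph releases. The osd_counts and osds_missing names are hypothetical, and the expected OSD total must be supplied by the administrator.

```bash
#!/bin/bash
# Sketch only: both helpers are hypothetical.

# osd_counts: read a `ceph osd stat` summary line on stdin, print "total up in".
# Assumed input shape: "18 osds: 18 up (since 5d), 18 in (since 5d); epoch: e1234"
osd_counts() {
  sed -n -E 's/^[[:space:]]*([0-9]+) osds: ([0-9]+) up[^,]*, ([0-9]+) in.*/\1 \2 \3/p'
}

# osds_missing EXPECTED_IN: return 0 when fewer OSDs are "in" than expected,
# 1 when the expected number is reached, 2 when the output cannot be parsed.
osds_missing() {
  local expected="$1" counts in_count
  counts=$(ceph osd stat | osd_counts)
  in_count=${counts##* }   # last field of "total up in"
  if [ -z "$in_count" ]; then
    echo "could not parse ceph osd stat output" >&2
    return 2
  fi
  if [ "$in_count" -lt "$expected" ]; then
    echo "$in_count of $expected OSDs are in"
    return 0
  fi
  echo "all $expected expected OSDs are in"
  return 1
}
```

For example, osds_missing 18 reports whether fewer than 18 OSDs are in. Whether to run ceph mgr fail afterward remains the administrator's decision.
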
Regenerate Rados-GW Load Balancer Configuration for the Rebuilt Nodes

IMPORTANT: By default, Rados-GW is deployed to the first three storage nodes, along with HAproxy and Keepalived. The install automates this, but administrators may have to regenerate the configuration if these services are not running on the first three storage nodes or on all nodes.

  1. Deploy Rados Gateway containers to the new nodes.

    • Configure the Rados Gateway containers with the complete list of nodes they should run on:

      ncn-s# ceph orch apply rgw site1 zone1 --placement="<node1 node2 node3 node4 ... >"
  2. Verify Rados Gateway is running on the desired nodes.

    ncn-s00(1/2/3)# ceph orch ps --daemon_type rgw

    Example output:

    NAME                             HOST      STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                        IMAGE ID     CONTAINER ID
    rgw.site1.zone1.ncn-s001.kvskqt  ncn-s001  running (41m)  6m ago     41m  15.2.8   registry.local/ceph/ceph:v15.2.8  553b0cb212c  6e323878db46
    rgw.site1.zone1.ncn-s002.tisuez  ncn-s002  running (41m)  6m ago     41m  15.2.8   registry.local/ceph/ceph:v15.2.8  553b0cb212c  278830a273d3
    rgw.site1.zone1.ncn-s003.nnwuqy  ncn-s003  running (41m)  6m ago     41m  15.2.8   registry.local/ceph/ceph:v15.2.8  553b0cb212c  a9706e6d7a69
  3. Add the nodes to HAproxy and Keepalived.

    ncn-s# pdsh -w ncn-s00[1-(end node number)] -f 2 \
                    'source /srv/cray/scripts/metal/
                     reconfigure-apparmor; /srv/cray/scripts/metal/ > /etc/haproxy/haproxy.cfg
                     systemctl enable haproxy.service
                     systemctl restart haproxy.service
                     /srv/cray/scripts/metal/ > /etc/keepalived/keepalived.conf
                     systemctl enable keepalived.service
                     systemctl restart keepalived.service'
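
The placement list in step 1 and the verification in step 2 can be sketched with two small helpers. This example assumes that storage nodes are numbered consecutively starting at ncn-s001 and that ceph orch ps --daemon_type rgw prints the HOST and STATUS columns in the positions shown in the example output. The rgw_placement and missing_rgw_hosts names are hypothetical.

```bash
#!/bin/bash
# Sketch only: both helpers are hypothetical.

# rgw_placement COUNT: print "ncn-s001 ncn-s002 ..." for COUNT storage nodes.
rgw_placement() {
  local count="$1" i=1 list=""
  while [ "$i" -le "$count" ]; do
    # Append the next zero-padded node name, space-separated.
    list="${list:+$list }$(printf 'ncn-s%03d' "$i")"
    i=$((i + 1))
  done
  echo "$list"
}

# missing_rgw_hosts HOST...: read `ceph orch ps --daemon_type rgw` output on
# stdin and print every expected host that has no rgw daemon in "running" state.
missing_rgw_hosts() {
  local ps_output host
  ps_output=$(cat)
  for host in "$@"; do
    echo "$ps_output" \
      | awk -v h="$host" '$2 == h && $3 == "running" { found = 1 } END { exit !found }' \
      || echo "$host"
  done
}
```

For example, the placement argument in step 1 could be written as --placement="$(rgw_placement 4)" for four storage nodes, and ceph orch ps --daemon_type rgw | missing_rgw_hosts $(rgw_placement 4) would list any of those nodes that do not yet show a running gateway.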

Next Step - Storage Node Validation