Prepare a storage node before rebuilding it.
IMPORTANT: The output examples may not all reflect the cluster status at the time this operation is performed. For example, during a rebuild in place, Ceph components will not be reporting as down, in contrast to a failed node rebuild.
Ensure that the latest CSM documentation RPM is installed on `ncn-m001`.
(`ncn-m001#`) When rebuilding a node, make sure that `/srv/cray/scripts/common/storage-ceph-cloudinit.sh` and `/srv/cray/scripts/common/pre-load-images.sh` have been removed from the `runcmd` in BSS.
Set the node name and xname if not already set.

```bash
NODE=ncn-s00n
XNAME=$(ssh $NODE cat /etc/cray/xname)
```
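Before continuing, it can be worth confirming that the two variables look plausible. The following is an illustrative sketch, not part of the official procedure; the helper names and the xname pattern are assumptions based on typical NCN naming.

```bash
# Illustrative sanity checks: verify that NODE and XNAME look plausible.
# The helper names and the xname pattern are assumptions, not part of the
# documented procedure.
valid_node() {
  # Storage NCN hostnames follow the ncn-sNNN pattern
  echo "$1" | grep -qE '^ncn-s[0-9]{3}$'
}
valid_xname() {
  # Node xnames typically look like x3000c0s13b0n0
  echo "$1" | grep -qE '^x[0-9]+c[0-9]+s[0-9]+b[0-9]+n[0-9]+$'
}
```

For example, `valid_node ncn-s003` succeeds, while a management node name such as `ncn-m001` fails the check.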
Get the `runcmd` in BSS.

```bash
cray bss bootparameters list --name ${XNAME} --format=json | jq -r '.[]|.["cloud-init"]|.["user-data"].runcmd'
```
Expected output:

```json
[
  "/srv/cray/scripts/metal/net-init.sh",
  "/srv/cray/scripts/common/update_ca_certs.py",
  "/srv/cray/scripts/metal/install.sh",
  "/srv/cray/scripts/common/ceph-enable-services.sh",
  "touch /etc/cloud/cloud-init.disabled"
]
```
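As a quick scripted check, the saved `runcmd` output can be scanned for the two scripts that must not be present. This is a sketch under stated assumptions: `check_runcmd` is a hypothetical helper name, and it assumes the output of the `cray bss` command has first been redirected to a file.

```bash
# Hypothetical helper: scan a saved runcmd dump for scripts that must be
# removed before a rebuild. Assumes the cray bss output was first saved to
# a file (e.g. by appending "> runcmd.json" to the command).
check_runcmd() {
  # $1: file containing the runcmd output
  if grep -qE 'storage-ceph-cloudinit\.sh|pre-load-images\.sh' "$1"; then
    echo "runcmd needs patching"
    return 1
  fi
  echo "runcmd is clean"
}
```

A nonzero return status indicates that the patching procedure below is required.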
If `/srv/cray/scripts/common/storage-ceph-cloudinit.sh` or `/srv/cray/scripts/common/pre-load-images.sh` is in the `runcmd`, then it will need to be fixed using the following procedure:
Obtain an API authentication token.
A token will need to be generated and made available as an environment variable. Refer to the Retrieve an Authentication Token procedure for more information.
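It can help to fail early if the token was never exported. The following guard is an illustrative sketch: `require_token` is a hypothetical helper, and `TOKEN` is an assumed variable name that should be confirmed against the Retrieve an Authentication Token procedure.

```bash
# Illustrative guard: fail early if the TOKEN environment variable (an
# assumed name; confirm against the token retrieval procedure) is unset
# or empty before invoking the patch script.
require_token() {
  if [ -z "${TOKEN:-}" ]; then
    echo "TOKEN is not set; generate one first" >&2
    return 1
  fi
}
```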
Run the following command to patch BSS.

```bash
python3 /usr/share/doc/csm/scripts/patch-ceph-runcmd.py
```
Repeat the original Cray CLI command and verify that the expected output is obtained.
Upload Ceph container images into Nexus.
Log in to one of the first three storage NCNs.
This procedure must be performed on a `ceph-mon` node. By default, these will be any of the first three storage NCNs: `ncn-s001`, `ncn-s002`, or `ncn-s003`.
(`ncn-s#`) Check the status of Ceph.
Check the OSD status, weight, and location:

```bash
ceph osd tree
```
Example output:

```text
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 62.87558 root default
-5 20.95853 host ncn-s001
2 ssd 3.49309 osd.2 up 1.00000 1.00000
5 ssd 3.49309 osd.5 up 1.00000 1.00000
6 ssd 3.49309 osd.6 up 1.00000 1.00000
9 ssd 3.49309 osd.9 up 1.00000 1.00000
12 ssd 3.49309 osd.12 up 1.00000 1.00000
16 ssd 3.49309 osd.16 up 1.00000 1.00000
-3 20.95853 host ncn-s002
0 ssd 3.49309 osd.0 up 1.00000 1.00000
3 ssd 3.49309 osd.3 up 1.00000 1.00000
7 ssd 3.49309 osd.7 up 1.00000 1.00000
10 ssd 3.49309 osd.10 up 1.00000 1.00000
13 ssd 3.49309 osd.13 up 1.00000 1.00000
15 ssd 3.49309 osd.15 up 1.00000 1.00000
-7 20.95853 host ncn-s003
1 ssd 3.49309 osd.1 up 1.00000 1.00000
4 ssd 3.49309 osd.4 up 1.00000 1.00000
8 ssd 3.49309 osd.8 up 1.00000 1.00000
11 ssd 3.49309 osd.11 up 1.00000 1.00000
14 ssd 3.49309 osd.14 up 1.00000 1.00000
17 ssd 3.49309 osd.17 up 1.00000 1.00000
```
(`ncn-s#`) If the node is up, then stop and disable all of the Ceph services on the node being rebuilt.

```bash
ceph orch host maintenance enter <storage node hostname being rebuilt>
```
Example output:

```text
Daemons for Ceph cluster 5f79a490-c281-11ed-b6ec-fa163e741e89 stopped on host ncn-s003. Host ncn-s003 moved to maintenance mode
```
IMPORTANT: The `--force` flag is used to bypass warnings. These pertain to Ceph services which can handle failures, such as `rgw`.
Example:

```text
WARNING: Stopping 1 out of 1 daemons in Alertmanager service. Service will not be operational with no daemons left. At least 1 daemon must be running to guarantee service.
ALERT: Cannot stop active Mgr daemon, Please switch active Mgrs with 'ceph mgr fail ncn-s003.ydycwn'
WARNING: Removing RGW daemons can cause clients to lose connectivity.
```
In this example, the warnings for RGW and Alertmanager would be ignored by passing the `--force` flag. The alert for the active `Mgr` will need to be addressed with the provided command (`ceph mgr fail ncn-s003.ydycwn`).
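When the active-Mgr alert does appear, the exact remediation command is embedded in the alert text itself. The following sketch extracts it; `mgr_fail_hint` is a hypothetical helper name, and the alert wording is assumed to match the example above.

```bash
# Hypothetical helper: read "ceph orch host maintenance enter" output on
# stdin and print the "ceph mgr fail ..." command suggested by the
# active-Mgr alert, if one is present. Assumes the alert wording shown in
# the example above.
mgr_fail_hint() {
  sed -n "s/.*Please switch active Mgrs with '\(.*\)'.*/\1/p"
}
```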
(`ncn-s#`) Re-check the OSD status, weight, and location:

```bash
ceph osd tree
```
Example output:

```text
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 62.87558 root default
-5 20.95853 host ncn-s001
2 ssd 3.49309 osd.2 up 1.00000 1.00000
5 ssd 3.49309 osd.5 up 1.00000 1.00000
6 ssd 3.49309 osd.6 up 1.00000 1.00000
9 ssd 3.49309 osd.9 up 1.00000 1.00000
12 ssd 3.49309 osd.12 up 1.00000 1.00000
16 ssd 3.49309 osd.16 up 1.00000 1.00000
-3 20.95853 host ncn-s002
0 ssd 3.49309 osd.0 up 1.00000 1.00000
3 ssd 3.49309 osd.3 up 1.00000 1.00000
7 ssd 3.49309 osd.7 up 1.00000 1.00000
10 ssd 3.49309 osd.10 up 1.00000 1.00000
13 ssd 3.49309 osd.13 up 1.00000 1.00000
15 ssd 3.49309 osd.15 up 1.00000 1.00000
-7 20.95853 host ncn-s003
1 ssd 3.49309 osd.1 down 1.00000 1.00000
4 ssd 3.49309 osd.4 down 1.00000 1.00000
8 ssd 3.49309 osd.8 down 1.00000 1.00000
11 ssd 3.49309 osd.11 down 1.00000 1.00000
14 ssd 3.49309 osd.14 down 1.00000 1.00000
17 ssd 3.49309 osd.17 down 1.00000 1.00000
```
(`ncn-s#`) Check the status of the Ceph cluster:

```bash
ceph -s
```
Example output:

```text
  cluster:
    id:     4c9e9d74-a208-11ed-b008-98039bb427f6
    health: HEALTH_WARN
            1 host is in maintenance mode <-------- Expect this line.
            1/3 mons down, quorum ncn-s001,ncn-s002
            6 osds down
            1 host (6 osds) down
            Degraded data redundancy: 34257/102773 objects degraded (33.333%), 370 pgs degraded, 352 pgs undersized

  services:
    mon: 3 daemons, quorum ncn-s001,ncn-s002 (age 56s), out of quorum: ncn-s003
    mgr: ncn-s002.amfitm(active, since 43m), standbys: ncn-s001.rytusj
    mds: 1/1 daemons up, 1 hot standby
    osd: 18 osds: 12 up (since 55s), 18 in (since 13h)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   13 pools, 553 pgs
    objects: 34.26k objects, 58 GiB
    usage:   173 GiB used, 63 TiB / 63 TiB avail
    pgs:     34257/102773 objects degraded (33.333%)
             370 active+undersized+degraded
             159 active+undersized
             24  active+clean

  io:
    client: 8.7 KiB/s rd, 353 KiB/s wr, 3 op/s rd, 53 op/s wr
```
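To confirm in a script that the expected maintenance warning is the one being reported, the `ceph -s` output can be checked for that line. A minimal sketch, assuming the warning text shown above; `in_maintenance` is an illustrative name.

```bash
# Illustrative check: succeed only if "ceph -s" output (read on stdin)
# contains the expected "host is in maintenance mode" health warning.
in_maintenance() {
  grep -q 'host is in maintenance mode'
}
```

For example: `ceph -s | in_maintenance && echo "node is in maintenance mode"`.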
(`ncn-s#`) List the down Ceph OSDs.
IMPORTANT: Before proceeding, ensure that this rebuild requires OSD wipes. Storage node rebuilds performed on an active node do not require OSD removal; one example is a rebuild done to pick up a custom patched image.
The `ceph osd tree` capture indicated that there are down OSDs on `ncn-s003`.

```bash
ceph osd tree down
```
Example output:

```text
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 62.87758 root default
-7 20.95853 host ncn-s003
1 ssd 3.49309 osd.1 down 1.00000 1.00000
4 ssd 3.49309 osd.4 down 1.00000 1.00000
8 ssd 3.49309 osd.8 down 1.00000 1.00000
11 ssd 3.49309 osd.11 down 1.00000 1.00000
14 ssd 3.49309 osd.14 down 1.00000 1.00000
17 ssd 3.49309 osd.17 down 1.00000 1.00000
```
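The down OSD IDs can also be pulled out of the tree output programmatically, for example to feed a later loop. A sketch assuming the whitespace-separated column layout shown in the example output; `down_osds` is an illustrative name.

```bash
# Illustrative parser: read "ceph osd tree" output on stdin and print the
# numeric IDs of OSDs whose STATUS column reads "down". Assumes the
# column layout shown in the example output (NAME in field 4, STATUS in
# field 5).
down_osds() {
  awk '$4 ~ /^osd\./ && $5 == "down" { sub(/^osd\./, "", $4); print $4 }'
}
```

For example: `ceph osd tree | down_osds` would print one OSD ID per line.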
(`ncn-s#`) Remove the OSD references to allow the rebuild to re-use the original OSD references on the drives.
If the OSD references are not removed, there will still be references to them in the CRUSH map. This will result in OSDs that no longer exist appearing to be down.
The following command assumes the variables from the prerequisites section are set.

```bash
for osd in $(ceph osd ls-tree $NODE); do
  ceph osd destroy osd.$osd --force
  ceph osd purge osd.$osd --force
done
```
Example output:

```text
destroyed osd.1
purged osd.1
destroyed osd.4
purged osd.4
destroyed osd.8
purged osd.8
destroyed osd.11
purged osd.11
destroyed osd.14
purged osd.14
destroyed osd.17
purged osd.17
```
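Because destroy/purge is irreversible, it can be worth previewing the exact commands before running them. The following is a hypothetical dry-run wrapper around the loop above; `purge_osds` is an illustrative name, not part of the documented procedure.

```bash
# Hypothetical wrapper around the destroy/purge loop. With any second
# argument other than "run" it only prints the commands it would execute,
# so the OSD ID list can be reviewed before anything destructive happens.
purge_osds() {
  # $1: whitespace-separated OSD IDs (e.g. from "ceph osd ls-tree $NODE")
  # $2: "run" to execute, anything else for a dry run
  for osd in $1; do
    if [ "$2" = "run" ]; then
      ceph osd destroy "osd.$osd" --force
      ceph osd purge "osd.$osd" --force
    else
      echo "ceph osd destroy osd.$osd --force"
      echo "ceph osd purge osd.$osd --force"
    fi
  done
}
```

For example, `purge_osds "$(ceph osd ls-tree $NODE)" dry` prints the commands for review; re-run with `run` once the list looks correct.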
Proceed to the next step of the storage node rebuild procedure to identify nodes and update metadata. Otherwise, return to the main Rebuild NCNs page.