Troubleshoot Insufficient Standby MDS Daemons Available

Procedure

  1. Log in to a node running ceph-mon. Typically this is ncn-s001, ncn-s002, or ncn-s003.

  2. Check the Ceph cluster health.

    ceph health detail
    

    Example Output:

    HEALTH_WARN insufficient standby MDS daemons available
    [WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
    have 0; want 1 more
    

    This output states that at least one more standby MDS daemon is needed to clear the alert.

  3. Determine which MDS daemons are down.

    ceph orch ps --daemon_type mds
    

    Example Output:

    NAME                        HOST      STATUS         REFRESHED  AGE  VERSION    IMAGE NAME                        IMAGE ID      CONTAINER ID
    mds.cephfs.ncn-s001.lhoocr  ncn-s001  stopped        4m ago     18h  <unknown>  registry.local/ceph/ceph:v15.2.8  <unknown>     <unknown>
    mds.cephfs.ncn-s002.nywheq  ncn-s002  stopped        4m ago     18h  <unknown>  registry.local/ceph/ceph:v15.2.8  <unknown>     <unknown>
    mds.cephfs.ncn-s003.jdufcg  ncn-s003  running (18h)  4m ago     18h  15.2.8     registry.local/ceph/ceph:v15.2.8  5553b0cb212c  4df61111d738
    

    IMPORTANT: The number of MDS daemons in a stopped or error state varies with the cluster configuration and the total number of MDS daemons.

  4. Start the stopped MDS daemon(s).

    ceph orch daemon start <MDS daemon name>
    

    Repeat for each stopped MDS daemon.

  5. Check the status of the cluster.

    ceph health detail
    

    Expected Output:

    HEALTH_OK
    

IMPORTANT: If a daemon does not start using the method above, refer to Manage Ceph Services.
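
Steps 3 and 4 can be combined into a small helper when several MDS daemons are down. The following is a minimal sketch, not part of the official procedure. The function names list_down_mds and start_down_mds and the DRY_RUN variable are hypothetical, and the sketch assumes the column layout shown in the example output above (NAME in the first column, STATUS in the third) and that failed daemons report a status of "error".

```shell
#!/usr/bin/env bash
# Sketch: start MDS daemons that `ceph orch ps` reports as stopped or in error.
# Assumes the table layout shown in step 3 (NAME is column 1, STATUS is column 3).

# Read `ceph orch ps` output on stdin and print the name of every daemon
# whose STATUS column is "stopped" or "error". The header row is skipped.
list_down_mds() {
    awk 'NR > 1 && ($3 == "stopped" || $3 == "error") { print $1 }'
}

# Read daemon names on stdin and start each one.
# Set DRY_RUN=1 to print the commands instead of running them.
start_down_mds() {
    local name
    while read -r name; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ceph orch daemon start ${name}"
        else
            ceph orch daemon start "${name}"
        fi
    done
}

# Example usage (commented out so that sourcing this file has no side effects):
# ceph orch ps --daemon_type mds | list_down_mds | DRY_RUN=1 start_down_mds
```

Running with DRY_RUN=1 first shows which daemons would be started. After starting them, rerun ceph health detail as in step 5 to confirm that the warning has cleared.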