Power Off the External Lustre File System

General procedure for powering off an external ClusterStor system.

Use this procedure as a general guide to power off an external ClusterStor system. Refer to the detailed procedures in the appropriate ClusterStor administration guide:

| Title | Model |
|-------|-------|
| ClusterStor E1000 Administration Guide 4.2 - S-2758 | ClusterStor E1000 |
| ClusterStor Administration Guide 3.4 - S-2756 | ClusterStor L300/L300N |
| ClusterStor Administration Guide - S-2755 | Legacy ClusterStor |

Procedure

  1. (remote#) SSH to the primary MGMT node as admin.

    ssh -l admin cls01234n000.us.cray.com
    
  2. (n000$) Change to root user.

    sudo su -
    
  3. (n000#) Collect status information for the system before shutdown.

    cscli csinfo
    cscli show_nodes
    cscli fs_info
    crm_mon -1r
    
  4. (n000#) Check resources before unmounting the file system.

    ssh cls01234n002 crm_mon -r1 | grep fsys
    ssh cls01234n004 crm_mon -r1 | grep fsys
    ssh cls01234n006 crm_mon -r1 | grep fsys
    ssh cls01234n008 crm_mon -r1 | grep fsys
    ssh cls01234n010 crm_mon -r1 | grep fsys
    ssh cls01234n012 crm_mon -r1 | grep fsys
    . . .
    
  5. (n000#) Stop the Lustre file system. FILESYSTEM_NAME is the name reported by the cscli fs_info command in step 3.

    cscli unmount -f FILESYSTEM_NAME
    
  6. (n000#) Verify that resources have been stopped by running the following on all even-numbered nodes.

    ssh NODENAME crm_mon -r1 | grep fsys
    

    Example output:

    cls01234n006_md0-fsys (ocf::heartbeat:XYMNTR): Stopped
    cls01234n006_md1-fsys (ocf::heartbeat:XYMNTR): Stopped
    cls01234n006_md2-fsys (ocf::heartbeat:XYMNTR): Stopped
    cls01234n006_md3-fsys (ocf::heartbeat:XYMNTR): Stopped
    cls01234n006_md4-fsys (ocf::heartbeat:XYMNTR): Stopped
    cls01234n006_md5-fsys (ocf::heartbeat:XYMNTR): Stopped
    cls01234n006_md6-fsys (ocf::heartbeat:XYMNTR): Stopped
    cls01234n006_md7-fsys (ocf::heartbeat:XYMNTR): Stopped
    
  7. (n000#) SSH to the MGS node. The MGS_NODE name is reported by the cscli fs_info command in step 3.

    ssh MGS_NODE
    
  8. (mgs#) Determine whether resource group md65-group is stopped by using the crm_mon utility to check the status of the MGS and MDS nodes.

    1. Check the resource status. The example output below shows the MGS and MDS nodes in a partially stopped state.

      [MGS]# crm_mon -1r | grep fsys
      

      Example output:

      cls01234n003_md66-fsys (ocf::heartbeat:XYMNTR): Stopped
      cls01234n003_md65-fsys (ocf::heartbeat:XYMNTR): Started
      
    2. If the output of the previous command shows a partially stopped state (a mix of Stopped and Started), run the stop_xyraid command and then verify that the node is stopped.

      [MGS]# stop_xyraid nodename_md65-group
      [MGS]# crm_mon -1r | grep fsys
      

      Example output:

      cls01234n003_md66-fsys (ocf::heartbeat:XYMNTR): Stopped
      cls01234n003_md65-fsys (ocf::heartbeat:XYMNTR): Stopped
      
  9. (mgs#) Exit the MGS node.

    exit
    
  10. (n000#) Power off the non-MGMT diskless nodes.

    1. Check the power state of all non-MGMT nodes and list the node hostnames (in this example, cls01234n[002-015]) before powering them off.

      pm -q
      

      Example output:

      on: cls01234n[000-001]
      on: cls01234n[002-015]
      unknown:
      
    2. Power off all non-MGMT nodes.

      cscli power_manage -n cls01234n[002-015] --power-off
      
    3. Check the power status of the nodes.

      pm -q
      

      Example output:

      on: cls01234n[000-001]
      off: cls01234n[002-015]
      unknown:
      
  11. (n000#) Repeat the previous step until all non-MGMT nodes are powered off.

  12. (n000#) From the primary MGMT node, power off the MGMT nodes.

    cscli power_manage -n cls01234n[000-001] --power-off
    
  13. (n000#) Shut down the primary management node.

    shutdown -h now
    
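
Steps 4 and 6 check the fsys resources on each even-numbered node by reading the crm_mon output by eye. As a minimal sketch, the same check could be scripted by piping that output through a small filter. The function name `all_fsys_stopped` is hypothetical, and the sketch assumes the output has the same shape as the example output in step 6, with the state as the last field on each line.

```shell
#!/bin/sh
# Hypothetical helper, not part of the ClusterStor tooling.
# Reads `crm_mon -r1` output on stdin and succeeds (exit 0) only if at
# least one "fsys" resource is listed and every one of them is Stopped.
# Assumption: each resource line ends with its state, as in the example
# output of step 6 ("... (ocf::heartbeat:XYMNTR): Stopped").
all_fsys_stopped() {
    awk '
        /fsys/ { seen++; if ($NF != "Stopped") bad++ }
        END    { exit !(seen > 0 && bad == 0) }
    '
}
```

One possible use is `ssh NODENAME crm_mon -r1 | all_fsys_stopped || echo "NODENAME: fsys resources not all stopped"`, repeated for each even-numbered node. Treating an empty listing as a failure is a deliberate choice in this sketch, so that an unreachable node is not mistaken for a stopped one.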

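Steps 10 and 11 repeat `pm -q` until the non-MGMT nodes report as off. The following is a rough sketch of how that polling could be automated, assuming `pm -q` prints `on:`, `off:`, and `unknown:` lines exactly as in the example output above. The function names, the retry count, and the polling interval are illustrative assumptions, and the comparison is a plain string match against the expected node expression.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the ClusterStor tooling.

# Print the node expression that `pm -q` output (on stdin) lists for a
# given state, for example "off" -> "cls01234n[002-015]".
pm_state_nodes() {
    awk -v s="$1:" '$1 == s { $1 = ""; sub(/^ +/, ""); print }'
}

# Poll `pm -q` until the "off:" line equals the expected node expression.
# $1 = expected expression, $2 = attempts (default 30),
# $3 = seconds between attempts (default 10). Both defaults are arbitrary.
wait_until_off() {
    expected=$1
    tries=${2:-30}
    interval=${3:-10}
    while [ "$tries" -gt 0 ]; do
        if [ "$(pm -q | pm_state_nodes off)" = "$expected" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

For example, `wait_until_off 'cls01234n[002-015]'` returns 0 once the off line matches and 1 if the attempts run out. If `pm -q` groups the nodes differently from the example output, the string match fails and the check falls back to reading the output manually.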
Next step

Return to System Power Off Procedures and continue with the next step.