Cray System Management
  • Cray System Management (CSM) - Release Notes
  • Cray System Management (CSM) Administration Guide
    • Accessing LiveCD USB Device After Reboot
    • Component Names (xnames)
    • Validate CSM Health
    • Configure the Cray Command Line Interface (cray CLI)
    • User Access Service (UAS)
      • Add a Volume to UAS
      • Broker Mode UAI Management
      • Choosing UAI Resource Settings
      • Common UAI Configuration
      • Configure End-User UAI Classes for Broker Mode
      • Configure UAIs in UAS
      • Configure a Broker UAI Class
      • Configure a Default UAI Class for Legacy Mode
      • Create UAIs From Specific UAI Images in Legacy Mode
      • Create a UAI
      • Create a UAI Class
      • Create a UAI Resource Specification
      • Create a UAI with Additional Ports
      • Create and Use Default UAIs in Legacy Mode
      • Customize End-User UAI Images
      • Customize the Broker UAI Image
      • Delete a UAI
      • Delete a UAI Class
      • Delete a UAI Image Registration
      • Delete a UAI Resource Specification
      • Delete a Volume Configuration
      • Elements of a UAI
      • End-User UAIs
      • Examine a UAI Using a Direct Administrative Command
      • Legacy Mode User-Driven UAI Management
      • List Available UAI Classes
      • List Available UAI Images in Legacy Mode
      • List Registered UAI Images
      • List UAI Resource Specifications
      • List UAIs
      • List UAS Version Information
      • List Volumes Registered in UAS
      • Log in to a Broker UAI
      • This Page Has Moved
      • Modify a UAI Class
      • Obtain the Configuration of a UAS Volume
      • Register a UAI Image
      • Clear UAS Configuration
      • Resource Specifications
      • Retrieve Resource Specification Details
      • Retrieve UAI Image Registration Information
      • Setting UAI Timeouts
      • Broker UAI Resiliency and Load Balancing
      • Special Purpose UAIs
      • Start a Broker UAI
      • Troubleshoot Broker UAI SSSD Cannot Use /etc/sssd/sssd.conf
      • Troubleshoot Common Mistakes when Creating a Custom End-User UAI Image
      • Troubleshoot Duplicate Mount Paths in a UAI
      • Troubleshoot Missing or Incorrect UAI Images
      • Troubleshoot Stale Brokered UAIs
      • Troubleshoot UAS / CLI Authentication Issues
      • Troubleshoot UAI Stuck in ContainerCreating
      • Troubleshoot UAIs by Viewing Log Output
      • Troubleshoot UAIs with Administrative Access
      • Troubleshoot UAS Issues
      • Troubleshoot UAS by Viewing Log Output
      • UAI Classes
      • UAI Host Node Selection
      • UAI Host Nodes
      • UAI Image Customization
      • UAI Images
      • UAI Management
      • UAI Network Attachment Customization
      • UAI macvlans Network Attachments
      • UAS Limitations
      • UAS and UAI Legacy Mode Health Checks
      • Update a Resource Specification
      • Update a UAI Image Registration
      • Update a UAS Volume
      • View a UAI Class
      • Volumes
    • resiliency
      • NTP Resiliency
      • Recreate StatefulSet Pods on Another Node
      • Resilience of System Management Services
      • Resiliency
      • Resiliency Testing Procedure
      • Restore System Functionality if a Kubernetes Worker Node is Down
    • system management health
      • Access System Management Health Services
      • Configure Prometheus Email Alert Notifications
      • Grafana Dashboards by Component
      • Grafterm
      • Remove Kiali
      • System Management Health
      • System Management Health Checks and Alerts
      • Troubleshoot Grafana Dashboard
      • Troubleshoot Prometheus Alerts
    • image management
      • Build a New UAN Image Using the Default Recipe
      • Build an Image Using IMS REST Service
      • Configure IMS to Validate RPMs
      • Convert TGZ Archives to SquashFS Images
      • Create UAN Boot Images
      • Customize an Image Root Using IMS
      • Delete or Recover Deleted IMS Content
      • Image Management
      • Image Management Workflows
      • Import an External Image to IMS
      • Update IMS Job Access Network
      • Upload and Register an Image Recipe
    • node management
      • Access and Update Settings for Replacement NCNs
      • Removing a Liquid-cooled blade from a System
      • Replace a Compute Blade
      • Reset Credentials on Redfish Devices
      • S3FS Usage and Guidelines for Shasta
      • Swap a Compute Blade with a Different System
      • Add TLS Certificates to BMCs
      • TLS Certificates for Redfish BMCs
      • Add a Standard Rack Node
      • Troubleshoot Interfaces with IP Address Issues
      • Add Additional Liquid-Cooled Cabinets to a System
      • Troubleshoot Issues with Redfish Endpoint Discovery
      • Adding a Liquid-cooled Blade to a System
      • Troubleshoot Loss of Console Connections and Logs on Gigabyte Nodes
      • Build NCN Images Locally
      • Update Compute Node Mellanox HSN NIC Firmware
      • Change Java Security Settings
      • Update the Gigabyte Node BIOS Time
      • Change Settings for HMS Collector Polling of Air-Cooled Nodes
      • Updating Cabinet Routes on Management NCNs
      • Change Settings in the Bond
      • Use the Physical KVM
      • Check and Set the metal.no-wipe Setting on NCNs
      • Verify Node Removal
      • Check the BMC Failover Mode
      • View BIOS Logs for Liquid-Cooled Nodes
      • Clear Space in Root File System on Worker Nodes
      • Customize PCIe Hardware
      • Configuration of NCN Bonding
      • Configure NTP on NCNs
      • Disable Nodes
      • Dump a Non-Compute Node
      • Enable Nodes
      • Enable Passwordless Connections to Liquid Cooled Node BMCs
      • Find Node Type and Manufacturer
      • Launch a Virtual KVM on Gigabyte Servers
      • Launch a Virtual KVM on Intel Servers
      • Move a Standard Rack Node
      • Move a Standard Rack Node (Same Rack/Same HSN Ports)
      • Move a liquid-cooled blade within a System
      • NCN Drive Identification
      • Node Management
      • Node Management Workflows
      • Reboot NCNs
      • Rebuild NCNs
        • Final Validation Steps
        • Identify Nodes and Update Metadata
        • Post Rebuild Storage Node Validation
        • Power Cycle and Rebuild Nodes
        • Prepare Storage Nodes
        • Re-Add a Storage Node to Ceph
        • Rebuild NCNs
        • Validate Boot Loader
        • Wipe Drives
      • Add, Remove, or Replace NCNs
        • Add NCN Data
        • Alpha Framework to Add, Remove, Replace, or Move NCNs
        • Add Switch Configuration for NCN
        • Allocate NCN IP Addresses
        • Boot NCN
        • Collect NCN MAC Addresses
        • Redeploy Services Impacted by Adding or Permanently Removing Storage Nodes
        • Remove NCN Data
        • Remove NCN from Role
        • Remove Switch Configuration for NCN
        • Update Firmware
        • Validate Health
        • Validate Added NCN
    • conman
      • Access Compute Node Logs
      • Access Console Log Data Via the System Monitoring Framework (SMF)
      • ConMan
      • Disable ConMan After the System Software Installation
      • Establish a Serial Connection to NCNs
      • Log in to a Node Using ConMan
      • Manage Node Consoles
      • Troubleshoot ConMan Asking for Password on SSH Connection
      • Troubleshoot ConMan Blocking Access to a Node BMC
      • Troubleshoot ConMan Failing to Connect to a Console
    • preinstall
      • Fresh Install Setting NodeBMC and RouterBMC Redfish Credentials
      • Change Credentials on ServerTech PDUs
      • Pre-Install Steps
    • system layout service
      • Add Liquid-Cooled Cabinets to SLS
      • Add UAN CAN IP Addresses to SLS
      • Add an alias to a service
      • Create a Backup of the SLS Postgres Database
      • Dump SLS Information
      • Load SLS Database with Dump File
      • Restore SLS Postgres Database from Backup
      • Restore SLS Postgres without an Existing Backup
      • System Layout Service (SLS)
      • Update SLS with UAN Aliases
    • utility storage
      • Adding a Ceph Node to the Ceph Cluster
      • Add Ceph OSDs
      • Adjust Ceph Pool Quotas
      • Alternate Storage Pools
      • Ceph Daemon Memory Profiling
      • Ceph Deep Scrubs
      • Ceph Health States
      • Ceph Orchestrator Usage
      • Ceph Service Check Script Usage
      • Ceph Storage Types
      • Cephadm Reference Material
      • Collect Information about the Ceph Cluster
      • Dump Ceph Crash Data
      • Identify Ceph Latency Issues
      • Manage Ceph Services
      • Shrink the Ceph Cluster
      • Restore Nexus Data After Data Corruption
      • Shrink Ceph OSDs
      • Troubleshoot Ceph-Mon Processes Stopping and Exceeding Max Restarts
      • Troubleshoot Ceph MDS Client Connectivity Issues
      • Troubleshooting Ceph MDS Reporting Slow Requests and Failure on Client
      • Troubleshoot Ceph OSDs Reporting Full
      • Troubleshoot Ceph Services Not Starting After a Server Crash
      • Troubleshoot Failure to Get Ceph Health
      • Troubleshoot Insufficient Standby MDS Daemons Available
      • Troubleshoot Large Object Map Objects in Ceph Health
      • Troubleshoot Pods Failing to Restart on Other Worker Nodes
      • Troubleshoot if RGW Health Check Fails
      • Troubleshoot S3FS Mount Issues
      • Troubleshoot System Clock Skew
      • Troubleshoot a Down OSD
      • Troubleshoot an Unresponsive Rados-Gateway (radosgw) S3 Endpoint
      • Utility Storage
    • hardware state manager
      • Add a Switch to the HSM Database
      • Add an NCN to the HSM Database
      • Component Group Members
      • Component Groups and Partitions
      • Component Memberships
      • Component Partition Members
      • Create a Backup of the HSM Postgres Database
      • HSM Roles and Subroles
      • Hardware Management Services (HMS) Locking API
      • Hardware State Manager (HSM)
      • Hardware State Manager (HSM) State and Flag Fields
      • Lock and Unlock Management Nodes
      • Manage Component Groups
      • Manage Component Partitions
      • Manage HMS Locks
      • Restore Hardware State Manager (HSM) Postgres Database from Backup
      • Restore Hardware State Manager (HSM) Postgres without an Existing Backup
      • Set BMC Management Roles
    • security and authentication
      • API Authorization
      • Access the Keycloak User Management UI
      • Add LDAP User Federation
      • Add Root Service Account for Gigabyte Controllers
      • Audit Logs
      • Authenticate an Account with the Command Line
      • Backup and Restore Vault Clusters
      • Certificate Types
      • Change Air-Cooled Node BMC Credentials
      • Change Credentials on ServerTech PDUs
      • Change Cray EX Liquid-Cooled Cabinet Global Default Password
      • Change the Keycloak Token Lifetime
      • Set NCN Image Root Password, SSH Keys, and Timezone
      • Set NCN Image Root Password, SSH Keys, and Timezone on PIT Node
      • Change Root Passwords for Compute Nodes
      • Change SNMP Credentials on Leaf-BMC Switches
      • Change the Keycloak Admin Password
      • Change the LDAP Server IP Address for Existing LDAP Server Content
      • Change the LDAP Server IP Address for New LDAP Server Content
      • Configure Keycloak for LDAP/AD authentication
      • Configure the RSA Plugin in Keycloak
      • Create Internal Groups in the Keycloak Shasta Realm
      • Create Internal User Accounts in the Keycloak Shasta Realm
      • Create a Backup of the Keycloak Postgres Database
      • Create a Service Account in Keycloak
      • Default Keycloak Realms, Accounts, and Clients
      • Delete Internal User Accounts in the Keycloak Shasta Realm
      • Get a Long-Lived Token for a Service Account
      • HashiCorp Vault
      • Keycloak Operations
      • Keycloak User Localization
      • Keycloak User Management with kcadm.sh
      • Make HTTPS Requests from Sources Outside the Management Kubernetes Cluster
      • Manage Sealed Secrets
      • Manage System Passwords
      • PKI Certificate Authority (CA)
      • PKI Services
      • Preserve Username Capitalization for Users Exported from Keycloak
      • Provisioning a Liquid-Cooled EX Cabinet CEC with Default Credentials
      • Public Key Infrastructure (PKI)
      • Recovering from Mismatched BMC Credentials
      • Remove Internal Groups from the Keycloak Shasta Realm
      • Remove the Email Mapper from the LDAP User Federation
      • Remove the LDAP User Federation from Keycloak
      • Restrict Network Access to the ncn-images S3 Bucket
      • Re-Sync Keycloak Users to Compute Nodes
      • Retrieve an Authentication Token
      • Retrieve the Client Secret for Service Accounts
      • Update NCN User SSH Keys
      • System Security and Authentication
      • Transport Layer Security (TLS) for Ingress Services
      • Troubleshoot Common Vault Cluster Issues
      • Update Default Air-Cooled BMC and Leaf-BMC Switch SNMP Credentials
      • Update Default ServerTech PDU Credentials used by the Redfish Translation Service (RTS)
      • Set NCN User Passwords
      • Updating the Liquid-Cooled EX Cabinet CEC with Default Credentials after a CEC Password Change
    • spire
      • Create a Backup of the Spire Postgres Database
      • Restore missing Spire metadata
      • Restore Spire Postgres without an Existing Backup
      • Troubleshoot Spire Failing to Start on NCNs
      • Update Spire Intermediate CA Certificate
    • boot orchestration
      • BOS Workflows
      • Compute Node Boot Issue Symptom Node Console or Logs Indicate that the Server Response has Timed Out
      • Boot Issue Symptom Node HSN Interface Does Not Appear or Show Detected Links
      • Boot Orchestration
      • Boot UANs
      • Check the Progress of BOS Session Operations
      • Clean Up After a BOS/BOA Job is Completed or Cancelled
      • Clean Up Logs After a BOA Kubernetes Job
      • Compute Node Boot Issue Symptom Duplicate Address Warnings and Declined DHCP Offers in Logs
      • Compute Node Boot Issue Symptom Message About Invalid EEPROM Checksum in Node Console or Log
      • Compute Node Boot Issue Symptom Node is Not Able to Download the Required Artifacts
      • Compute Node Boot Sequence
      • Configure the BOS Timeout When Booting Compute Nodes
      • Create a Session Template to Boot Compute Nodes with CPS
      • Edit the iPXE Embedded Boot Script
      • Healthy Compute Node Boot Process
      • Kernel Boot Parameters
      • Limit the Scope of a BOS Session
      • BOS Limitations for Gigabyte BMC Hardware
      • Log File Locations and Ports Used in Compute Node Boot Troubleshooting
      • Manage a BOS Session
      • Manage a Session Template
      • Node Boot Root Cause Analysis
      • Redeploy the iPXE and TFTP Services
      • BOS Session Templates
      • BOS Sessions
      • Stage Changes Without BOS
      • Tools for Resolving Compute Node Boot Issues
      • Troubleshoot Booting Nodes with Hardware Issues
      • Troubleshoot Compute Node Boot Issues Related to Dynamic Host Configuration Protocol (DHCP)
      • Troubleshoot Compute Node Boot Issues Related to Slow Boot Times
      • Troubleshoot Compute Node Boot Issues Related to Trivial File Transfer Protocol (TFTP)
      • Troubleshoot Compute Node Boot Issues Related to Unified Extensible Firmware Interface (UEFI)
      • Troubleshoot Compute Node Boot Issues Related to the Boot Script Service (BSS)
      • Troubleshoot Compute Node Boot Issues Using Kubernetes
      • Troubleshoot UAN Boot Issues
      • Upload Node Boot Information to Boot Script Service (BSS)
      • View the Status of a BOS Session
    • CSM product management
      • Security Hardening
      • Change Passwords and Credentials
      • Configure CSM packages with CFS
      • Configure Keycloak Account
      • Configure Non-Compute Nodes with CFS
      • Perform NCN Personalization
      • Post-Install Customizations
      • Redeploying a Chart
      • Remove Artifacts from Product Installations
      • Validate Signed RPMs
    • artifact management
      • Artifact Management
      • Generate Temporary S3 Credentials
      • Manage Artifacts with the Cray CLI
      • Use S3 Libraries and Clients
    • firmware
      • FAS Admin Procedures
      • FAS CLI
      • FAS Filters
      • FAS Recipes
      • FAS Use Cases
      • Update Firmware with FAS
      • Updating BMC Firmware and BIOS for ncn-m001
      • Updating BMC Firmware and BIOS for NCNs without FAS
      • Upload BMC Recovery Firmware into TFTP Server
    • hpe pdu
      • HPE PDU Admin Procedures
    • power management
      • Cray Advanced Platform Monitoring and Control (CAPMC)
      • Ignore Nodes with CAPMC
      • Liquid Cooled Node Power Management
      • Power Off Compute and IO Cabinets
      • Power Off the External Lustre File System
      • Power On Compute and IO Cabinets
      • Power On and Boot Compute and User Access Nodes
      • Power On and Start the Management Kubernetes Cluster
      • Power On the External Lustre File System
      • Prepare the System for Power Off
      • Recover from a Liquid Cooled Cabinet EPO Event
      • Save Management Network Switch Configuration Settings
      • Set the Turbo Boost Limit
      • Shut Down and Power Off Compute and User Access Nodes
      • Shut Down and Power Off the Management Kubernetes Cluster
      • Standard Rack Node Power Management
      • System Power Off Procedures
      • System Power On Procedures
      • User Access to Compute Node Power Data
      • Worker Node COS Power Up Configuration
      • Power Management
    • kubernetes
      • About Kubernetes Taints and Labels
      • About Postgres
      • About etcd
      • About kubectl
      • Backups for etcd-operator Clusters
      • Kubernetes and Bare Metal etcd Certificate Renewal
      • Check for and Clear etcd Cluster Alarms
      • Check the Health and Balance of etcd Clusters
      • Clear Space in an etcd Cluster Database
      • Configure kubectl Credentials to Access the Kubernetes APIs
      • containerd
      • Create a Manual Backup of a Healthy etcd Cluster
      • Determine if Pods are Hitting Resource Limits
      • Disaster Recovery for Postgres
      • Increase Kafka Pod Resource Limits
      • Increase Pod Resource Limits
      • Kubernetes
      • Kubernetes Networking
      • Kubernetes Storage
      • Configure Kubernetes API Audit Log Maximum Backups
      • Pod Resource Limits
      • Rebalance Healthy etcd Clusters
      • Rebuild Unhealthy etcd Clusters
      • Recover from Postgres WAL Event
      • Repopulate Data in etcd Clusters When Rebuilding Them
      • Report the Endpoint Status for etcd Clusters
      • Restore Bare-Metal etcd Clusters from an S3 Snapshot
      • Restore Postgres
      • Restore an etcd Cluster from a Backup
      • Retrieve Cluster Health Information Using Kubernetes
      • TDS Lower CPU Requests
      • Troubleshoot Intermittent HTTP 503 Code Failures
      • Troubleshoot Postgres Database
      • View Postgres Information for System Databases
    • package repository management
      • Manage Repositories with Nexus
      • Nexus Configuration
      • Nexus Deployment
      • Nexus Export and Restore
      • Nexus Space Cleanup
      • Package Repository Management
      • Package Repository Management with Nexus
      • Repair Yum Repository Metadata
      • Restrict Admin Privileges in Nexus
      • Troubleshoot Nexus
    • system configuration service
      • Configure BMC and Controller Parameters with SCSD
      • Manage Parameters with the scsd Service
      • Set BMC Credentials
      • System Configuration Service
    • network
      • Management Network User Guide
        • Management Network 1.0 (1.2 Preconfig) to 1.2
        • Load Saved Switch Configuration
        • Fresh Install
        • Added Hardware
        • Manual Switch Configuration
        • Generate Switch Configurations
        • Apply Custom Switch Configurations for CSM 1.0
        • CSM Automatic Network Utility
          • CANU Installation
          • Troubleshoot CANU Validation Errors
          • Use CANU to Verify, Generate, or Compare Switch Configurations
          • Initializing CANU
          • Introduction to CANU
          • Quick start guide to CANU
          • Uninstall CANU
          • Update CANU From CSM Release Tarball
          • Use CANU to Generate Full Network Configuration
        • Apply Custom Switch Configuration CSM 1.2
        • Apply Switch Configurations
        • Dell Installation and Configuration Guide
          • Configure Access Control Lists (ACLs)
          • Configure Address Resolution Protocol (ARP)
          • Back Up a Switch Configuration
          • Configure Domain Name System (DNS) Client
          • Configure Domain Name
          • Configure Hostnames
          • Configure Internet Group Management Protocol (IGMP)
          • Configure Link Aggregation Group (LAG)
          • Link layer discovery protocol (LLDP)
          • Configure Locator LED
          • Configure Loopback Interface
          • Configure Management Interface
          • Configure Multiple Spanning Tree Protocol (MSTP)
          • Network Time Protocol (NTP) Client
          • Configure Physical Interfaces
          • Configure QoS
          • Configure Remote Logging
          • Reset Dell Switch Configuration
          • Configure SNMPv2c Community
          • Dell SNMPv3 Users
          • Configure Secure Shell (SSH)
          • Configure System Images
          • Perform an Upgrade on Dell Switches
          • Configure Virtual Local Area Networks (VLANs)
          • Configure VLAN Interface
          • VLAN Trunking 802.1Q
        • Upgrade CANU
        • Collect Data
        • Configuration Management
        • Configure SNMP
        • Mellanox Installation and Configuration Guide
          • Access control lists (ACLs)
          • Address resolution protocol (ARP)
          • Backing up switch configuration
          • BGP basics
          • Cable diagnostics
          • Check BGP and MetalLB
          • Check current DHCP leases
          • Check DHCP lease is getting allocated
          • Check HSM
          • Check KEA DHCP logs
          • Computes/UANs/Application Nodes
          • Large Number of DHCP Declines During a Node Boot
          • Domain name system (DNS) client
          • Domain name
          • You are getting an IP address, but not the correct one. Duplicate IP address check
          • Exec banners
          • Hostname
          • IGMP
          • IP filter
          • Key features used in the management network configuration
          • Link aggregation group (LAG)
          • Large
          • Link layer discovery protocol (LLDP)
          • Loopback interface
          • Management interface
          • Example of how to configure Scenario A or B
          • Management network functions in detail
          • Medium
          • Multi-chassis interface
          • MLAG (Multi-Chassis LAG)
          • MLAG
          • Multiple spanning tree protocol (MSTP)
          • Native VLAN
          • TCPDUMP
          • NCNs on Install
          • Network types – Naming and segment Function
          • Network traffic pattern inside of the system
          • Network Time Protocol (NTP) Client
          • Open shortest path first (OSPF) v2
          • Physical interfaces
          • PIM-SM bootstrap router (BSR) and rendezvous-point (RP)
          • Rebooting NCN and PXE fails
          • Remote logging
          • How to connect management network to your campus network
          • Routed interfaces
          • Scenario A network connection via management network
          • Scenario B network connection via high speed network
          • Small
          • SNMPv2c community
          • SNMPv3 users
          • Spine-leaf architecture
          • Spine-leaf architecture
          • Why are spine-leaf architectures becoming more popular?
          • Secure shell (SSH)
          • MAC address table
          • Static routing
          • Confirm the status of the cray-dhcp-kea pods/services
          • System images
          • Test TFTP traffic (Aruba Only)
          • Typical configuration of MLAG link connecting to NCN
          • Typical configuration of MLAG between switches
          • Performing Upgrade On Mellanox Switches
          • Verify the switches are forwarding DHCP traffic
          • Verify BGP
          • Verify the DHCP traffic on the workers
          • Verify route to TFTP
          • Very Large (Exascale)
          • Virtual local area networks (VLANs)
          • VLAN interface
          • VLAN trunking 802.1Q
          • Web user interface (WebUI)
        • Aruba Installation and Configuration Guide
          • 802.1X
          • Access Control Lists (ACLs)
          • Address Resolution Protocol (ARP)
          • Backup a Switch Configuration
          • Border Gateway Protocol (BGP) Basics
          • Bluetooth Capabilities
          • Cable Diagnostics
          • Check BGP and MetalLB
          • Check Current DHCP Leases
          • Check DHCP Lease is Getting Allocated
          • Check HSM
          • Check KEA DHCP Logs
          • Classifier Policies
          • Verify Computes/UANs/Application Nodes
          • Large Number of DHCP Declines During a Node Boot
          • Configure Domain Name System (DNS) Clients
          • Configure Domain Names
          • Check for Duplicate IP Addresses
          • Configure Exec Banners
          • Configure Hostnames
          • Configure Internet Group Management Protocol (IGMP)
          • Initial Prioritization
          • Introduction
          • Key Features Used in the Management Network Configuration
          • Link Aggregation Group (LAG)
          • Link Layer Discovery Protocol (LLDP)
          • Locator LED
          • Loopback Interface
          • MAC Authentication
          • Management Interface
          • Example of How to Configure Scenario A or B
          • System Management Network Functions
          • VSX ISL HA
          • VSX MCLAG Link HA
          • VSX Member Power Failure
          • VSX Split
          • Multi-Chassis Link Aggregation Group (MCLAG)
          • Message-Of-The-Day (MOTD)
          • Multicast Source Discovery Protocol (MSDP)
          • Multiple Spanning Tree Protocol (MSTP)
          • Native VLAN
          • NCN tcpdump
          • NCNs on Install
          • Network Types – Naming and Segment Function
          • Network Topologies
          • Network Traffic Pattern
          • Notices
          • Network Time Protocol (NTP) Client
          • Open Shortest Path First (OSPF) v2
          • Physical Interfaces
          • PIM-SM Bootstrap Router (BSR) and Rendezvous Point (RP)
          • Port Mirroring
          • Port Security
          • Queuing and Scheduling
          • RADIUS
          • Rebooting NCNs and PXE Fails
          • Redundant Power Supplies
          • Remote Logging
          • Connect the Management Network to a Campus Network
          • Routed interfaces
          • Scenario A Network Connection via Management Network
          • Scenario B Network Connection via High-Speed Network
          • Simple Network Management Protocol (SNMP) Agent
          • SNMPv2c Community
          • SNMP traps
          • Aruba SNMPv3 Users
          • Spine-Leaf Architecture
          • Spine-leaf Architecture
          • Secure Shell (SSH)
          • Static Routing
          • Confirm the Status of the cray-dhcp-kea Pods
          • TACACS
          • Test TFTP Traffic (Aruba Only)
          • Typical Configuration of VSX
          • Typical Edge Port Configuration
          • Typical Configuration of MCLAG Link
          • Unidirectional Link Detection (UDLD)
          • Perform a VSX Upgrade on Aruba Switches
          • Verify the Switches are Forwarding DHCP Traffic
          • Verify BGP
          • Verify the DHCP Traffic on the Worker Nodes
          • Verify Route to TFTP
          • Virtual Local Area Networks (VLANs)
          • VLAN Interface
          • VLAN Trunking 802.1Q
          • Virtual Switching Framework (VSF) - 6300 Only
          • Virtual Switching Extension (VSX)
          • What is VSX?
          • Switch Replacement in the VSX Cluster
          • VSX Sync
          • Web User Interface (WebUI)
          • Erase All zeroize
        • External User Guides
        • Network Tests
        • Reinstall
        • Replace Switch
        • Save a Configuration
        • Prometheus SNMP Exporter
        • Upgrade Switches From 1.0 to 1.2 Preconfig
        • Validate Cabling
        • Validate the SHCD
        • Validate Switch Configurations
        • Wipe Management Switch Configuration
        • Backup a Custom Configuration
        • Remove UAN Access to the CMN
        • Enabling Customer High Speed Network Routing
        • BICAN Support Matrix - Shasta Customer Access Networks
        • Bifurcating the CAN - CSM 1.2 Feature Details
        • BICAN Summary
        • firmware
          • Update Management Network Firmware
      • Access to System Management Services
      • Connect to the HPE Cray EX Environment
      • Create a CSM Configuration Upgrade Plan
      • Default IP Address Ranges
      • Network
      • Gateway Testing
      • dhcp
        • DHCP
        • Troubleshoot DHCP Issues
      • external dns
        • External DNS
        • External DNS Failing to Discover Services Workaround
        • External DNS CSI Input Values
        • Ingress Routing
        • Troubleshoot DNS Configuration Issues
        • Troubleshoot Connectivity to Services with External IP addresses
        • Update the cmn-external-dns value post-installation
      • metallb bgp
        • Check BGP Status and Reset Sessions
        • MetalLB Configuration
        • MetalLB in BGP-Mode
        • Troubleshoot BGP not Accepting Routes from MetalLB
        • Troubleshoot Services without an Allocated IP Address
      • customer accessible networks
        • Connect to the CMN and CAN
        • Customer Accessible Networks
        • CAN/CMN with Dual-Spine Configuration
        • Externally Exposed Services
        • Troubleshoot CMN issues
        • BI-CAN Aruba/Arista Configuration
        • MetalLB Peering with Arista Edge Router
      • dns
        • Domain Name System (DNS) Overview
        • Enable nscd on UANs
        • Manage the DNS Unbound Resolver
        • PowerDNS Configuration
        • PowerDNS Migration Guide
        • Troubleshoot Common DNS Issues
        • Troubleshoot PowerDNS
    • compute rolling upgrades
      • CRUS Workflow
      • Compute Rolling Upgrades
      • Troubleshoot Nodes Failing to Upgrade in a CRUS Session
      • Troubleshoot a Failed CRUS Session Because of Bad Parameters
      • Troubleshoot a Failed CRUS Session Because of Unmet Conditions
      • Upgrade Compute Nodes with CRUS
    • configuration management
      • Ansible Execution Environments
      • Ansible Inventory
      • Automatic Session Deletion with sessionTTL
      • Backup and Restore VCS Data
      • CFS Flow
      • CFS Global Options
      • CFS Key Management and Permission Denied Errors
      • Change the Ansible Verbosity Logs
      • Configuration Layers
      • Configuration Management
      • Configuration Management of System Components
      • Configuration Management with the CFS Batcher
      • Configuration Sessions
      • Create a CFS Configuration
      • Create a CFS Session with Dynamic Inventory
      • Create an Image Customization CFS Session
      • Create and Populate a VCS Configuration Repository
      • Customize Configuration Values
      • Delete CFS Sessions
      • Enable Ansible Profiling
      • Git Operations
      • Manage Multiple Inventories in a Single Location
      • NCN Worker Image Customization
      • Set Limits for a Configuration Session
      • Set the ansible.cfg for a Session
      • Specifying Hosts and Groups
      • Target Ansible Tasks for Image Customization
      • Track the Status of a Session
      • Troubleshoot Ansible Play Failures in CFS Sessions
      • Troubleshoot CFS Session Failing to Complete
      • Troubleshoot CFS Sessions Failing to Start
      • Update a CFS Configuration
      • Update the Privacy Settings for Gitea Configuration Content Repositories
      • Use a Custom ansible.cfg File
      • Use a Specific Inventory in a Configuration Session
      • VCS Administrative User
      • VCS Branching Strategy
      • Version Control Service (VCS)
      • View Configuration Session Logs
      • Write Ansible Code for CFS
    • hmcollector
      • Adjust HM Collector resource limits and requests
  • CSM Background Information
    • Certificate Authority
    • NCN BIOS
    • NCN Boot Workflow
    • NCN Images
    • NCN Mounts and File Systems
    • NCN Networking
    • NCN Operating System Releases
    • NCN Packages
    • NCN Plan of Record
  • CSM Troubleshooting Information
    • Manual SSH Key Setting Process
    • Troubleshoot the CMS Barebones Image Boot Test
    • Running CT Tests Manually
    • Interpreting HMS Health Check Results
    • PXE Booting Runbook
    • known issues
      • CFS Component With Zero-Length ID
      • Gigabyte BMC Missing Redfish Data
      • Hang Listing BOS Sessions
      • Multiple Console Node Pods on the Same Worker
      • Nexus Fails Authentication with Keycloak Users
      • SLS Not Working During Node Rebuild
      • VCS Password With Illegal Characters
      • Known Issue admin-client-auth Not Found
      • SAT/HSM/CAPMC Component Power State Mismatch
      • Cray CLI 403 Forbidden Errors
      • HMS Discovery Job Not Creating RedfishEndpoints In Hardware State Manager
      • Etcd Cluster Backup Fails Due to Timeout
      • Known Issue Logging into the Gitea web UI requires logging in twice
      • HPE iLO dropping event subscriptions and not properly transitioning power state in CSM software
      • Kafka Failure after CSM 1.2 Upgrade
      • Mellanox lacp-individual Limitations
      • Common Platform CA Issues
      • Spire database connection pool configuration in an air-gapped environment
      • Spire Database Cluster DNS Lookup Failure
    • kubernetes
      • Kubernetes Log File Locations
      • Kubernetes Troubleshooting Information
      • Troubleshoot Kubernetes Master or Worker node in NotReady state
      • Troubleshoot Kubernetes Pods Not Starting
      • Troubleshoot Liveliness or Readiness Probe Failures
      • Troubleshoot Unresponsive kubectl Commands
  • Glossary
  • Install CSM
    • Set Gigabyte Node BMC to Factory Defaults
    • Boot LiveCD Virtual ISO
    • SHCD HMN Tab/HMN Connections Rules
    • Bootstrap PIT Node from LiveCD Remote ISO
    • Switch PXE Boot from Onboard NIC to PCIe
    • Bootstrap PIT Node from LiveCD USB
    • Troubleshooting Installation Problems
    • Cable Management Network Servers
    • Utility Storage Installation Troubleshooting
    • Ceph CSI Troubleshooting
    • Wipe NCN Disks for Reinstallation
    • Clear Gigabyte CMOS
    • Collect MAC Addresses for NCNs
    • Collecting the BMC MAC Addresses
    • Collecting NCN MAC Addresses
    • Configure Administrative Access
    • Configure Management Network
    • Connect to Switch over USB-Serial Cable
    • Create Application Node Config YAML
    • Create Cabinets YAML
    • Create HMN Connections JSON File
    • Create NCN Metadata CSV
    • Create Switch Metadata CSV
    • CSM Services Install Fails Because of Missing Secret
    • Deploy Final NCN
    • Deploy Management Nodes
    • Install CSM Services
    • Prepare Compute Nodes
    • Prepare Configuration Payload
    • Prepare Management Nodes
    • Prepare site-init
    • PXE Boot Troubleshooting
    • Reinstall LiveCD
    • Reset root Password on LiveCD
    • Restart Network Services and Interfaces on NCNs
    • Safeguards for CSM
  • Introduction to CSM Installation
    • CAPMC Deprecation Notice many CAPMC v1 features are being partially deprecated
    • CSM Overview
    • Differences from Previous Release
    • Documentation Conventions
    • Scenarios for Shasta v1.5
  • scripts
    • operations
      • node management
        • Add Remove Replace NCNs
          • Python Library for SLS Network Data
    • workarounds
      • Boot Order Workaround
      • Kernel Dump Workaround
  • Update CSM Product Stream
  • Upgrade CSM
    • CSM 1.2.1 Patch Installation Instructions
      • Upgrade and validate Switch Configurations
    • CSM 1.2.2 Patch Installation Instructions
      • Upgrade and validate Switch Configurations
    • Usage
      • k8s
        • Worker-Specific Manual Steps
      • storage
        • CEPHADM
    • CSM 1.0.x to 1.2.x Upgrade Process
      • Usage
        • k8s
          • Worker-Specific Manual Steps
        • storage
          • CEPHADM
      • Stage 0 - Prerequisites and Preflight Checks
      • Stage 1 - Ceph image upgrade
      • Stage 2 - Kubernetes Upgrade from 1.19.9 to 1.20.13
      • Stage 3 - CSM Service Upgrades
      • Stage 4 - Ceph Upgrade
      • Stage 5 - Perform NCN Personalization
      • Plan and coordinate network upgrade
      • scripts
        • sls
          • SLS Updates Expert mode
          • sls utils Library
          • Upgrade SLS Offline from CSM 1.0.x to CSM 1.2
          • sls updater.py Technical Details
        • upgrade
          • Upgrade Automation
    • Stage 0 - Prerequisites and Preflight Checks
    • Stage 1 - Ceph image upgrade
    • Prepare For Upgrade
    • Stage 2 - Kubernetes Upgrade from 1.19.9 to 1.20.13
    • Stage 3 - CSM Service Upgrades
    • Stage 4 - Ceph Upgrade
    • Stage 5 - Perform NCN Personalization
    • Plan and coordinate network upgrade
    • scripts
      • sls
        • SLS Updates Expert mode
        • sls utils Library
        • Upgrade SLS Offline from CSM 1.0.x to CSM 1.2
        • sls updater.py Technical Details
      • upgrade
        • Upgrade Automation

CSM Automatic Network Utility

CSM Automatic Network Utility (CANU) is a tool for generating, validating, and testing Shasta management network configurations.

  • Introduction to CANU
  • Official Documentation
  • Quick Start Guide
  • Install CANU
  • Update CANU From CSM Tarball
  • Initialize CANU
  • Verify, generate, or compare switch configurations
  • Generate full network configuration
  • Uninstall CANU
  • CANU Validation Error