Cray System Management (CSM) 1.6.2 Release Notes
This page documents the changes introduced by this patch, compared to the previous patch
version of CSM.
For the main CSM 1.6 release notes page, including links to other patch release notes,
see CSM 1.6 release notes.
Additions and improvements
General
Security
- Fixed CVEs in the cmsdev test tool, cray-console-node, and cray-console-operator
- Fixed CVEs in oauth2 proxies by disabling TLS1.2 support
Test
- Added CFS node personalization to the Barebones Image Boot Test
- Improved testing resilience in the spire_check_key_id_in_jwks goss test
- Modified k8s_nodes_ready_check.sh to not fail when a node is in Ready,SchedulingDisabled state
- Modified velero_backups_check.sh to not fail if a newer, successful backup exists
- Modified run_hms_ct_tests.sh to handle concurrency better
- Fixed intermittent failures sometimes seen when running check_key_id_in_jwks.sh
- Added retry logic to goss-postgresql-syncfailed.yaml to prevent intermittent false positives
- Added retry logic to postgres_clusters_running.sh to prevent intermittent false positives
- Added fix to prevent false positives in the Hardware State Manager (SMD) CT tests when components are in the DiscoveryStarted state when the tests are launched
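
Two of the patterns named in the list above, tolerating a Ready,SchedulingDisabled node and retrying a check to avoid intermittent false positives, can be illustrated with a minimal Python sketch. This is not the code of the CSM test scripts. The function names, the default attempt count, and the delay are hypothetical, and the status strings are assumed to follow the comma-separated form shown in the STATUS column of `kubectl get nodes`.

```python
import time
from typing import Callable


def node_is_ready(status: str) -> bool:
    """Return True if a node status string contains the Ready condition.

    Assumes the comma-separated form of the kubectl STATUS column, for
    example "Ready" or "Ready,SchedulingDisabled". A cordoned node is
    still counted as ready; "NotReady" is not.
    """
    conditions = [c.strip() for c in status.split(",")]
    return "Ready" in conditions


def retry_check(
    check: Callable[[], bool],
    attempts: int = 3,            # hypothetical default
    delay_seconds: float = 5.0,   # hypothetical default
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run check() until it passes or the attempts are exhausted.

    Returns True on the first passing attempt and False if every
    attempt fails, so a single transient failure is not reported.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            # Wait before the next attempt; no wait after the last one.
            sleep(delay_seconds)
    return False
```

With this sketch, `node_is_ready("Ready,SchedulingDisabled")` evaluates to true, and a check wrapped in `retry_check` fails only if every attempt fails.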
Bug fixes
Known issues
- After updating Paradise BMC firmware, the hmcollector-poll service will lose event subscriptions and must be restarted
- cfs-api pods in CLBO state during CSM install.
- When installing CSM 1.6, cray-shared-kafka-kafka- pods in the services namespace fail to come up, which leaves cfs-api pods in CLBO state.
- A workaround is presented in CFS API pods in CLBO.
- istio-proxy containers fail with too many open files.
- Install and Upgrade Framework (IUF) does not run the next stage for an activity
- iSCSI based boot content projection may fail if the image to be projected does not have an etag
- CSM Automatic Network Utility (CANU) 1.8.0 and later is known to cause a brief
Node Management Network (NMN) network outage.
- CANU 1.8.0 and later introduce a separation of administrative traffic and user traffic on the management network
via addition of a new VRF and OSPF area. Until all switches are updated and new routes are propagated, there is a
brief NMN network outage. IP addressing does not change, but NMN traffic will flow over a new isolated VRF
channel. The length of the outage depends on the time needed to apply new switch configurations to all management
network switches; OSPF will propagate routes within seconds. As this affects liquid-cooled Mountain cabinets,
running jobs may be affected. A dedicated outage window is highly recommended for applying these changes.
- System Monitoring Application (SMA) 1.10.15 and later includes an upgraded LDMS that introduces an incompatibility with
configuration files used in prior versions.
- When upgrading from an older SMA version to a version with this new LDMS, the administrator must change the configuration files.
- A workaround is presented as an Action in the deliver-product stage in the IUF Stage Details for SMA section of the HPE Cray Supercomputing EX System Monitoring Application Installation Guide.
- Services that use PostgreSQL may fail when a Kubernetes master node is rebooted or rebuilt.
- cray-uas-mgr may still be running on a system upgraded from CSM 1.5.
- UAI was removed in CSM 1.6.0, but systems upgraded from CSM 1.5 may still have the cray-uas-mgr service and associated etcd cluster present.
- A workaround is presented in Remove User Access Service.
For a full list of known issues, see Known issues.