Cray System Management (CSM) - Release Notes

CSM 1.7 contains many changes spanning bug fixes, new feature development, and documentation improvements. This page lists some of the highlights.

New

Monitoring

Networking

Miscellaneous functionality

  • Console logs and console interaction are now available, and tenant aware, through the cray CLI. See console for more information.
  • Recipe builds using kiwi-ng now include the signing keys contained in the hpe-signing-key secret, which allows for the verification of the recipe build artifacts.

New hardware support

New software support

Automation improvements

Base platform component upgrades

Security improvements

  • Spire node attestation can now be set up to use TPM chips on supported platforms. See Enable TPM node attestation with Spire for more information.
  • The old version of the Spire server was removed to fully transition to the newer version of Spire.

Customer-requested enhancements

Documentation enhancements

Noteworthy changes

Test

  • Modified k8s_nodes_ready_check.sh to not fail when a node is in the Ready,SchedulingDisabled state
  • Modified velero_backups_check.sh to not fail if a newer, successful backup exists
  • Modified run_hms_ct_tests.sh to handle concurrency better
  • Fixed intermittent failures sometimes seen when running check_key_id_in_jwks.sh
  • Added retry logic to goss-postgresql-syncfailed.yaml to prevent intermittent false positives
  • Added retry logic to postgres_clusters_running.sh to prevent intermittent false positives
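The node-readiness and retry changes above can be illustrated with a minimal sketch. This is not the code of the CSM test scripts; the function names `node_is_ready` and `check_with_retry`, their parameters, and the default values are hypothetical and chosen only for illustration. The sketch assumes a status string in the comma-separated form shown by `kubectl get nodes`, for example `Ready,SchedulingDisabled` for a cordoned node.

```python
import time
from typing import Callable


def node_is_ready(status: str) -> bool:
    """Return True if a kubectl-style STATUS string reports the node as Ready.

    A cordoned node is shown as "Ready,SchedulingDisabled". It is treated as
    ready here, while "NotReady" or "NotReady,SchedulingDisabled" is not.
    """
    conditions = [part.strip() for part in status.split(",")]
    return "Ready" in conditions


def check_with_retry(check: Callable[[], bool], attempts: int = 3,
                     delay_seconds: float = 0.0) -> bool:
    """Run `check` up to `attempts` times and pass as soon as one run succeeds.

    Retrying keeps a single transient failure from being reported as a
    false positive. Both defaults are illustrative assumptions.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # wait before the next attempt
    return False
```

A check wrapped this way fails only if every attempt fails, which is the general idea behind adding retries to a health check that is known to fail intermittently.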

Bug fixes

  • The Boot Orchestration Service (BOS) session-setup operator now ignores invalid xnames referenced by session templates, fixing a bug that caused BOS sessions to be stuck in the pending state.
  • BOS logging is significantly more memory efficient, fixing a problem where logging on large-scale systems could cause BOS operator Kubernetes pods to be OOMKilled.
  • When using the API or CLI to modify a BOS session template, it is no longer required to specify boot_sets in the update data. This fixes a regression present in CSM 1.6.
  • Previously, the CSM 1.5.3 and CSM 1.6.1 releases included changes to resolve resource leaks found in the PCS, SMD, hmcollector, and FAS services. These changes reduced instances of pods being restarted because they were OOMKilled or failed liveness or readiness probes. They also improved the responsiveness and scalability of these services.
    • In the CSM 1.7.0 release, additional resource leaks in these same services were found and resolved.
    • Additionally, similar resource leaks were found and resolved in the following HMS services: BSS, CAPMC, River Discovery, HBTD, MEDS, RTS, HMNFD, SCSD, and SLS.
  • A bug was fixed in the hmcollector-poll service so that event subscriptions are no longer lost after updating Paradise BMC firmware. The service no longer needs to be restarted after performing firmware updates.
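The BOS session-setup fix above, where invalid xnames are ignored rather than blocking the session, can be sketched as a simple filter. This is only an illustration and not the BOS implementation: the function name `split_xnames` is hypothetical, and the pattern `NODE_XNAME` is a simplified assumption covering only node xnames of the form xXcCsSbBnN. How BOS actually decides that an xname is invalid is not described in these notes.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: a simplified pattern for node xnames such as "x3000c0s1b0n0".
# This is not the validation that BOS itself performs.
NODE_XNAME = re.compile(r"x\d+c\d+s\d+b\d+n\d+")


def split_xnames(xnames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition xnames into (valid, invalid) lists, preserving input order.

    Callers continue with the valid list and can log the invalid entries,
    instead of letting one bad entry stall the whole operation.
    """
    valid: List[str] = []
    invalid: List[str] = []
    for name in xnames:
        if NODE_XNAME.fullmatch(name):
            valid.append(name)
        else:
            invalid.append(name)
    return valid, invalid
```

For example, given `["x3000c0s1b0n0", "not-an-xname"]`, the first entry lands in the valid list and the second in the invalid list, so processing can proceed with the remaining components.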

Deprecations

For more details and a list of all deprecated CSM features, see Deprecations.

Removals

For more details and a list of all features with an announced removal target, see Removals.

Known issues

For a full list of known issues, see Known issues.

Security vulnerability exceptions in CSM 1.7