Cray System Management (CSM) - Release Notes

CSM 1.7 contains many changes spanning bug fixes, new feature development, and documentation improvements. This page lists some of the highlights.

New

Monitoring

Networking

Miscellaneous functionality

  • Console logs and console interaction are now available, with tenant awareness, through the cray CLI. See console for more information.
  • Configuration Framework Service (CFS) components can now be updated in bulk through the Cray CLI (cray). See Managing many components for more information. Support is added for v2 and v3 API versions.
  • Recipe builds using kiwi-ng now include the signing keys contained in the hpe-signing-key secret, which allows for the verification of the recipe build artifacts.

New hardware support

New software support

Automation improvements

Base platform component upgrades

  • Kata upgraded to version 3.17.0.

Security improvements

  • Spire node attestation can now be set up to use TPM chips on supported platforms. See Enable TPM node attestation with Spire for more information.
  • The old version of the Spire server was removed to fully transition to the newer version of Spire.

Customer-requested enhancements

Documentation enhancements

Noteworthy changes

Test

  • Modified k8s_nodes_ready_check.sh to not fail when a node is in Ready,SchedulingDisabled state.
  • Modified velero_backups_check.sh to not fail if a newer, successful backup exists.
  • Modified run_hms_ct_tests.sh to handle concurrency better.
  • Fixed intermittent failures sometimes seen when running check_key_id_in_jwks.sh.
  • Added retry logic to goss-postgresql-syncfailed.yaml to prevent intermittent false positives.
  • Added retry logic to postgres_clusters_running.sh to prevent intermittent false positives.
  • Added tests to the Software Management Services (SMS) health checks:
    • Added BOS create/update/delete (CRUD) tests for session templates and sessions.
    • Added CFS CRUD tests for configurations and sources.
    • Added IMS CRUD tests for images, recipes, and public keys.
    • These tests are part of the procedure to Validate CSM Health.
    • For more information on the SMS health checks, see Software Management Services health checks.
  • Added CFS node personalization to the barebones image boot test.
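
The k8s_nodes_ready_check.sh change listed above can be illustrated with a minimal sketch. This is not the script's actual implementation; it only assumes that the check inspects the STATUS column printed by kubectl get nodes, where a cordoned but healthy node appears as Ready,SchedulingDisabled. The function names and node names are hypothetical.

```python
def node_is_ready(status: str) -> bool:
    """Return True if a STATUS value such as 'Ready,SchedulingDisabled'
    includes the Ready condition (assumed comma-separated format)."""
    conditions = [part.strip() for part in status.split(",")]
    return "Ready" in conditions


def failing_nodes(nodes: dict[str, str]) -> list[str]:
    """Return the sorted names of nodes whose status does not include Ready."""
    return sorted(name for name, status in nodes.items() if not node_is_ready(status))


# Hypothetical node names and statuses, for illustration only.
example = {
    "ncn-w001": "Ready",
    "ncn-w002": "Ready,SchedulingDisabled",
    "ncn-w003": "NotReady",
}
print(failing_nodes(example))
```

In this sketch a cordoned node no longer counts as a failure, while a node reporting NotReady still does.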

Bug fixes

  • The Boot Orchestration Service (BOS) session-setup operator now ignores invalid xnames referenced by session templates, fixing a bug that caused BOS sessions to be stuck in pending state.
  • BOS logging is significantly more memory efficient, fixing a problem where logging on large scale systems could cause BOS operator Kubernetes pods to be OOMKilled.
  • When using the API or CLI to Modify a BOS session template, it is no longer required to specify boot_sets in the update data (this fixes a regression bug present in CSM 1.6).
  • Previously, the CSM 1.5.3 and CSM 1.6.1 releases included changes to resolve resource leaks found in the PCS, SMD, hmcollector, and FAS services. This reduced instances of pods being restarted because they were OOMKilled or failed liveness or readiness probes. These changes also improved the responsiveness and scalability of these services.
    • In the CSM 1.7.0 release, additional resource leaks in these same services were found and resolved.
    • Additionally, similar resource leaks were found and resolved in the following HMS services: BSS, CAPMC, River Discovery, HBTD, MEDS, RTS, HMNFD, SCSD, and SLS.
  • A bug was fixed in the hmcollector-poll service so that event subscriptions are no longer lost after updating Paradise BMC firmware. The service no longer needs to be restarted after performing firmware updates.
  • Fixed an issue where a soft-deleted IMS recipe was always assigned the architecture x86_64, regardless of the architecture of the recipe that was deleted.
  • Fixed an issue where a soft-deleted IMS recipe was always assigned require_dkms=true, regardless of the value of the recipe that was deleted.
  • Fixed an issue where incorrect metadata was stored for newly created IMS images.
  • Fixed an issue where IMS image tags were removed by a soft delete.
  • Fixed an issue where updating a CFS session could fail and cause the session to be stuck in pending state.
  • Fixed an issue where cfs-debugger crashed when cfs-state-reporter service status did not include a since timestamp.
  • Fixed an issue where the post-upgrade job of cms-ipxe would fail if a previously failed cms-ipxe upgrade job entry existed.
  • Fixed an issue where, when building an IMS image from a recipe, the job status would not update to error when the zypper repositories were not available.

Deprecations

For more details and a list of all deprecated CSM features, see Deprecations.

Removals

For more details and a list of all features with an announced removal target, see Removals.

Known issues

For a full list of known issues, see Known issues.

Security vulnerability exceptions in CSM 1.7