This procedure changes the credential that CSM services use for liquid-cooled EX cabinet chassis controllers and node controllers (BMCs) after the CECs have been set to a new global default credential.
NOTE: This procedure does not provision Slingshot switch BMCs (RouterBMCs). Slingshot switch BMC default credentials must be changed using the procedures in the Slingshot product documentation. To update Slingshot switch BMCs, refer to “Change Rosetta Login and Redfish API Credentials” in the Slingshot Operations Guide (>1.6.0).
This procedure provisions only the default Redfish root account passwords. It does not modify Redfish accounts that have been added after an initial system installation.
The MEDS sealed secret contains the default global credential used by MEDS when it discovers new liquid-cooled EX cabinet hardware.
Before redeploying MEDS, update the customizations.yaml file in the site-init secret in the loftsman namespace.
If the site-init repository is available as a remote repository as described here, then clone it to ncn-m001. Otherwise, ensure that the site-init repository is available on ncn-m001.
ncn-m001# git clone "$SITE_INIT_REPO_URL" site-init
Acquire customizations.yaml from the currently running system:
ncn-m001# kubectl get secrets -n loftsman site-init -o jsonpath='{.data.customizations\.yaml}' | base64 -d > site-init/customizations.yaml
Review, add, and commit customizations.yaml to the local site-init repository as appropriate.
NOTE: If site-init was cloned from a remote repository in the first step, there may not be any differences and hence nothing to commit. This is okay. If there are differences between what is in the repository and what was stored in the site-init secret, this suggests that settings were changed at some point.
ncn-m001# cd site-init
ncn-m001# git diff
ncn-m001# git add customizations.yaml
ncn-m001# git commit -m 'Add customizations.yaml from site-init secret'
Acquire sealed secret keys:
ncn-m001# mkdir -p certs
ncn-m001# kubectl -n kube-system get secret sealed-secrets-key -o jsonpath='{.data.tls\.crt}' | base64 -d > certs/sealed_secrets.crt
ncn-m001# kubectl -n kube-system get secret sealed-secrets-key -o jsonpath='{.data.tls\.key}' | base64 -d > certs/sealed_secrets.key
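Before decrypting anything, it can help to confirm that the two extracted files are non-empty and actually belong together. The following is a minimal sketch, assuming the sealed-secrets key is an RSA key and that openssl is installed on ncn-m001; the function name cert_matches_key is hypothetical.

```shell
# Hypothetical helper: succeed only if the certificate and the RSA private key
# share the same modulus, i.e. they form a matching key pair.
cert_matches_key() {
    local crt="$1" key="$2"
    local crt_mod key_mod
    # Either openssl command fails (non-zero) on an empty or malformed file.
    crt_mod=$(openssl x509 -noout -modulus -in "$crt" 2>/dev/null) || return 1
    key_mod=$(openssl rsa -noout -modulus -in "$key" 2>/dev/null) || return 1
    [[ -n "$crt_mod" && "$crt_mod" == "$key_mod" ]]
}

# Example:
#   cert_matches_key certs/sealed_secrets.crt certs/sealed_secrets.key && echo "key pair OK"
```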
Inspect the original default credentials for MEDS:
ncn-m001# ./utils/secrets-decrypt.sh cray_meds_credentials ./certs/sealed_secrets.key ./customizations.yaml | jq .data.vault_redfish_defaults -r | base64 -d | jq
{
"Username": "root",
"Password": "bar"
}
Specify the desired default credentials for MEDS to use with new hardware. Replace foobar with the root password configured on the CEC(s).
ncn-m001# echo '{ "Username": "root", "Password": "foobar" }' | base64 > creds.json.b64
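If the CEC root password contains a double quote or a backslash, the hand-written JSON above becomes invalid. A minimal sketch of a hypothetical helper, meds_creds_b64, that escapes those two characters before encoding; it assumes GNU coreutils base64 (for -w0, which keeps the output on one line) and a password without newlines.

```shell
# Hypothetical helper: print the base64-encoded MEDS default-credential JSON
# for the root user with the given password.
meds_creds_b64() {
    local password="$1" escaped
    # Escape backslashes first, then double quotes, so the JSON string stays valid.
    escaped=$(printf '%s' "$password" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    # -w0 disables line wrapping so the value stays on a single line.
    printf '{ "Username": "root", "Password": "%s" }' "$escaped" | base64 -w0
}

# Example (same JSON as the echo command above, without the trailing newline):
#   meds_creds_b64 'foobar' > creds.json.b64
```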
Update and regenerate the cray_meds_credentials sealed secret:
ncn-m001# cat << EOF | yq w - 'data.vault_redfish_defaults' "$(<creds.json.b64)" | yq r -j - | ./utils/secrets-encrypt.sh | yq w -f - -i ./customizations.yaml 'spec.kubernetes.sealed_secrets.cray_meds_credentials'
{
"kind": "Secret",
"apiVersion": "v1",
"metadata": {
"name": "cray-meds-credentials",
"namespace": "services",
"creationTimestamp": null
},
"data": {}
}
EOF
Decrypt the updated sealed secret for review. The decrypted credentials should match the credentials set on the CEC.
ncn-m001# ./utils/secrets-decrypt.sh cray_meds_credentials ./certs/sealed_secrets.key ./customizations.yaml | jq .data.vault_redfish_defaults -r | base64 -d | jq
{
"Username": "root",
"Password": "foobar"
}
Update the site-init secret containing customizations.yaml for the system:
ncn-m001# kubectl delete secret -n loftsman site-init
ncn-m001# kubectl create secret -n loftsman generic site-init --from-file=customizations.yaml
Check in the changes made to customizations.yaml:
ncn-m001# git diff
ncn-m001# git add customizations.yaml
ncn-m001# git commit -m 'Update customizations.yaml with global default credential for MEDS'
Push to the remote repository as appropriate:
ncn-m001# git push
Determine the version of MEDS:
ncn-m001# MEDS_VERSION=$(kubectl -n loftsman get cm loftsman-core-services -o jsonpath='{.data.manifest\.yaml}' | yq r - 'spec.charts.(name==cray-hms-meds).version')
ncn-m001# echo $MEDS_VERSION
Create meds-manifest.yaml:
ncn-m001# cat > meds-manifest.yaml << EOF
apiVersion: manifests/v1beta1
metadata:
  name: meds
spec:
  charts:
  - name: cray-hms-meds
    version: $MEDS_VERSION
    namespace: services
EOF
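If the yq query above does not match, MEDS_VERSION is empty and the heredoc still writes a manifest with a blank version. A minimal sketch of a hypothetical render_meds_manifest function that produces the same manifest but refuses an empty version:

```shell
# Hypothetical helper: print the MEDS Loftsman manifest for the given chart version.
# Returns 1 (and prints nothing to stdout) if the version is empty.
render_meds_manifest() {
    local version="$1"
    if [[ -z "$version" ]]; then
        echo "render_meds_manifest: chart version is empty" >&2
        return 1
    fi
    cat <<EOF
apiVersion: manifests/v1beta1
metadata:
  name: meds
spec:
  charts:
  - name: cray-hms-meds
    version: ${version}
    namespace: services
EOF
}

# Example:
#   render_meds_manifest "$MEDS_VERSION" > meds-manifest.yaml
```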
Merge customizations.yaml with meds-manifest.yaml:
ncn-m001# manifestgen -c customizations.yaml -i ./meds-manifest.yaml > ./meds-manifest.out.yaml
Redeploy the MEDS helm chart:
ncn-m001# loftsman ship \
--charts-repo https://packages.local/repository/charts \
--manifest-path meds-manifest.out.yaml
Wait for the MEDS Vault loader job to run to completion:
ncn-m001# kubectl wait -n services job cray-meds-vault-loader --for=condition=complete --timeout=5m
Verify the default credentials have changed in Vault:
ncn-m001# VAULT_PASSWD=$(kubectl -n vault get secrets cray-vault-unseal-keys -o json | jq -r '.data["vault-root"]' | base64 -d)
ncn-m001# kubectl -n vault exec -it cray-vault-0 -c vault -- env VAULT_TOKEN=$VAULT_PASSWD VAULT_ADDR=http://127.0.0.1:8200 vault kv get secret/meds-cred/global/ipmi
====== Data ======
Key Value
--- -----
Password foobar
Username root
Set CRED_PASSWORD to the new updated password:
ncn-m001# CRED_PASSWORD=foobar
Update the credentials used by CSM services for all previously discovered EX cabinet BMCs to the new global default:
ncn-m001# \
REDFISH_ENDPOINTS=$(cray hsm inventory redfishEndpoints list --type '!RouterBMC' --format json | jq -r '.RedfishEndpoints[].ID' | sort -V)
cray hsm state components list --format json > /tmp/components.json
for RF in $REDFISH_ENDPOINTS; do
    echo "$RF: Checking..."
    CLASS=$(jq -r --arg XNAME "$RF" '.Components[] | select(.ID == $XNAME).Class' /tmp/components.json)
    if [[ "$CLASS" != "Mountain" ]]; then
        echo "$RF is not Mountain, skipping..."
        continue
    fi
    echo "$RF: Updating credentials"
    cray hsm inventory redfishEndpoints update "${RF}" --user root --password "${CRED_PASSWORD}"
done
This script takes some time to run: approximately 5 minutes to update all of the credentials for a single fully populated cabinet.
Alternatively, use the following command on each BMC. Replace BMC_XNAME with the xname of the BMC whose credentials are being updated:
ncn-m001# cray hsm inventory redfishEndpoints update BMC_XNAME --user root --password "${CRED_PASSWORD}"
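To update only a handful of BMCs, the per-BMC command can be wrapped in a loop that stops at the first failure. A minimal sketch of a hypothetical update_bmc_creds function; the CRAY_CMD variable is an assumption added here so that the commands can be previewed with CRAY_CMD=echo before they are run for real (the preview prints the password).

```shell
# Hypothetical helper: update the HSM credentials of each BMC xname passed as an argument.
# CRAY_CMD defaults to the cray CLI; it is left unquoted on purpose so that a
# preview command such as "echo" can be substituted.
update_bmc_creds() {
    local cray_cmd="${CRAY_CMD:-cray}" xname
    if [[ -z "${CRED_PASSWORD:-}" ]]; then
        echo "update_bmc_creds: CRED_PASSWORD is not set" >&2
        return 1
    fi
    for xname in "$@"; do
        echo "${xname}: Updating credentials" >&2
        # Stop at the first failure so that problems are not hidden.
        $cray_cmd hsm inventory redfishEndpoints update "$xname" \
            --user root --password "$CRED_PASSWORD" || return 1
    done
}

# Example:
#   update_bmc_creds x1000c0b0 x1001c1s0b0
```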
Wait for HSM to re-discover the updated RedfishEndpoints:
ncn-m001# sleep 180
Wait for all updated Redfish endpoints to become DiscoverOK:
The following bash script will find all Redfish endpoints for the liquid-cooled BMCs that are not in DiscoverOK, and display their last Discovery Status.
ncn-m001# \
cray hsm inventory redfishEndpoints list --laststatus '!DiscoverOK' --type '!RouterBMC' --format json > /tmp/redfishEndpoints.json
cray hsm state components list --format json > /tmp/components.json
REDFISH_ENDPOINTS=$(jq -r '.RedfishEndpoints[].ID' /tmp/redfishEndpoints.json | sort -V)
for RF in $REDFISH_ENDPOINTS; do
    CLASS=$(jq -r --arg XNAME "$RF" '.Components[] | select(.ID == $XNAME).Class' /tmp/components.json)
    if [[ "$CLASS" != "Mountain" ]]; then
        continue
    fi
    DISCOVERY_STATUS=$(jq -r --arg XNAME "$RF" '.RedfishEndpoints[] | select(.ID == $XNAME).DiscoveryInfo.LastDiscoveryStatus' /tmp/redfishEndpoints.json)
    echo "$RF: $DISCOVERY_STATUS"
done
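Instead of re-running the check by hand, it can be retried on a schedule. A minimal sketch, assuming the script above has been wrapped in a shell function (called pending_mountain_endpoints here, a hypothetical name) that prints one line per endpoint that is not yet DiscoverOK; the generic wait_until helper is also hypothetical.

```shell
# Hypothetical helper: run "$@" until it succeeds, at most max_attempts times,
# sleeping interval seconds between attempts. Returns 1 if it never succeeds.
wait_until() {
    local max_attempts="$1" interval="$2" attempt
    shift 2
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final attempt.
        if ((attempt < max_attempts)); then
            sleep "$interval"
        fi
    done
    return 1
}

# Example: poll every 3 minutes for up to 30 minutes.
#   all_discovered() { [[ -z "$(pending_mountain_endpoints)" ]]; }
#   wait_until 10 180 all_discovered || echo "Some endpoints are still not DiscoverOK"
```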
Example output:
x1001c0r5b0: HTTPsGetFailed
x1001c1s0b0: HTTPsGetFailed
x1001c1s0b1: HTTPsGetFailed
x1001c2s0b1: DiscoveryStarted
For each Redfish endpoint that is reported, use the following to troubleshoot why it is not DiscoverOK or DiscoveryStarted:
If the Redfish endpoint is DiscoveryStarted, then that BMC is currently in the process of being inventoried by HSM. Wait a few minutes and re-run the bash script above to re-check the current discovery status of the RedfishEndpoints. Discovery is retried every 3 minutes for BMCs that are not in DiscoverOK or DiscoveryStarted.
If the Redfish endpoint is HTTPsGetFailed, then HSM had issues contacting the BMC. Perform the following steps:
Verify that the BMC xname is resolvable and pingable.
If the BMC is a ChassisBMC, remove the b0 from the end of its xname to get its hostname. For a NodeBMC, the xname is the BMC hostname. For example, the ChassisBMC with the xname x1000c0b0 has the hostname x1000c0.
ncn-m001# ping x1001c1s0b0
If a NodeBMC is not pingable, then verify that the slot powering the BMC is powered on. If this is a ChassisBMC, skip this step. For example, the NodeBMC x1001c1s0b0 is in slot x1001c1s0.
ncn-m001# cray capmc get_xname_status create --xnames x1001c1s0
e = 0
err_msg = ""
on = [ "x1001c1s0b0",]
If the slot is off, power it on.
ncn-m001# cray capmc xname_on create --xnames x1001c1s0
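The two naming rules used in these steps (a ChassisBMC hostname is its xname without the trailing b0; the slot powering a NodeBMC is its xname without the trailing bN) can be captured in small helpers. A minimal sketch; the function names are hypothetical, and the patterns assume xnames shaped like the examples in this procedure (x1000c0b0, x1001c1s0b0).

```shell
# Hypothetical helper: print the hostname for a liquid-cooled BMC xname.
bmc_hostname() {
    local xname="$1"
    if [[ "$xname" =~ ^x[0-9]+c[0-9]+b0$ ]]; then
        # ChassisBMC, for example x1000c0b0 -> x1000c0
        echo "${xname%b0}"
    else
        # NodeBMC, for example x1001c1s0b0: the xname is the hostname
        echo "$xname"
    fi
}

# Hypothetical helper: print the slot that powers a NodeBMC, for example x1001c1s0b0 -> x1001c1s0.
bmc_slot() {
    local xname="$1"
    if [[ "$xname" =~ ^(x[0-9]+c[0-9]+s[0-9]+)b[0-9]+$ ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        echo "bmc_slot: $xname is not a NodeBMC xname" >&2
        return 1
    fi
}

# Examples:
#   ping -c 3 "$(bmc_hostname x1000c0b0)"
#   cray capmc get_xname_status create --xnames "$(bmc_slot x1001c1s0b0)"
```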
If the BMC is reachable and in HTTPsGetFailed, then verify that the BMC is accessible with the new default global credential. Replace BMC_HOSTNAME with the hostname of the Redfish endpoint. For a NodeBMC, its hostname is its xname. For a ChassisBMC, remove the b0 part of the xname to get its hostname.
For example, the ChassisBMC with the xname x1000c0b0 has the hostname x1000c0.
ncn-m001# curl -k -u "root:$CRED_PASSWORD" https://BMC_HOSTNAME/redfish/v1/Managers | jq
If the error message below is returned, then the BMC requires a StatefulReset action. The StatefulReset action will clear user-defined credentials that are taking precedence over the CEC supplied credential. It will also clear NTP, Syslog, and SSH Key configurations on the BMC.
{
"error": {
"@Message.ExtendedInfo": [
{
"@odata.type": "#Message.v1_0_5.Message",
"Message": "While attempting to establish a connection to /redfish/v1/Managers, the service was denied access.",
"MessageArgs": [
"/redfish/v1/Managers"
],
"MessageId": "Security.1.0.AccessDenied",
"Resolution": "Attempt to ensure that the URI is correct and that the service has the appropriate credentials.",
"Severity": "Critical"
}
],
"code": "Security.1.0.AccessDenied",
"message": "While attempting to establish a connection to /redfish/v1/Managers, the service was denied access."
}
}
Perform a StatefulReset on the liquid-cooled BMC. Replace BMC_HOSTNAME with the hostname of the BMC, and replace OLD_DEFAULT_PASSWORD with the credential that was previously set on the BMC. This is most likely the previous global default credential for liquid-cooled BMCs.
ncn-m001# curl -k -u root:OLD_DEFAULT_PASSWORD -X POST -H 'Content-Type: application/json' -d \
'{"ResetType": "StatefulReset"}' \
https://BMC_HOSTNAME/redfish/v1/Managers/BMC/Actions/Manager.Reset
After the StatefulReset action has been issued, the BMC will be unreachable for a few minutes as it performs the StatefulReset.
IMPORTANT: If the BMC is still using the old password after the StatefulReset, power cycle the compute chassis slot(s).
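Rather than guessing how long a BMC stays unreachable after a StatefulReset, its Redfish service root can be polled. A minimal sketch of a hypothetical wait_for_bmc function; it assumes the BMC serves /redfish/v1/ without authentication (the Redfish specification requires the service root to be readable anonymously) and treats an HTTP 200 response as "back online". A 200 response does not show which password is active, so re-run the credential check with curl afterwards.

```shell
# Hypothetical helper: poll https://HOST/redfish/v1/ until it returns HTTP 200.
# Arguments: host, max attempts (default 20), seconds between attempts (default 30).
wait_for_bmc() {
    local host="$1" max_attempts="${2:-20}" interval="${3:-30}" attempt code
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        # -w prints the HTTP status code; curl reports 000 when no connection is made.
        code=$(curl -k -s -o /dev/null --connect-timeout 5 -w '%{http_code}' \
            "https://${host}/redfish/v1/" || true)
        if [[ "$code" == "200" ]]; then
            return 0
        fi
        if ((attempt < max_attempts)); then
            sleep "$interval"
        fi
    done
    return 1
}

# Example:
#   wait_for_bmc x1000c0 && curl -k -u "root:$CRED_PASSWORD" https://x1000c0/redfish/v1/Managers | jq
```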
Perform this section only if a StatefulReset action was applied to any liquid-cooled Node or Chassis BMCs.
For each liquid-cooled BMC to which the StatefulReset action was applied, delete the BMC from HSM. Replace BMC_XNAME with the BMC xname to delete.
ncn-m001# cray hsm inventory redfishEndpoints delete BMC_XNAME
Restart MEDS to reapply the NTP and Syslog configuration to the RedfishEndpoints:
View the running MEDS pods:
ncn-m001# kubectl -n services get pods -l app.kubernetes.io/instance=cray-hms-meds
NAME READY STATUS RESTARTS AGE
cray-meds-6d8b5875bc-4jngc 2/2 Running 0 17d
Restart MEDS:
ncn-m001# kubectl -n services rollout restart deployment cray-meds
ncn-m001# kubectl -n services rollout status deployment cray-meds
Wait for MEDS to re-discover the deleted RedfishEndpoints:
ncn-m001# sleep 300
Verify all expected hardware has been discovered:
The following bash script finds all Redfish endpoints for the liquid-cooled BMCs that are not in DiscoverOK, and displays their last Discovery Status.
ncn-m001# \
cray hsm inventory redfishEndpoints list --laststatus '!DiscoverOK' --type '!RouterBMC' --format json > /tmp/redfishEndpoints.json
cray hsm state components list --format json > /tmp/components.json
REDFISH_ENDPOINTS=$(jq -r '.RedfishEndpoints[].ID' /tmp/redfishEndpoints.json | sort -V)
for RF in $REDFISH_ENDPOINTS; do
    CLASS=$(jq -r --arg XNAME "$RF" '.Components[] | select(.ID == $XNAME).Class' /tmp/components.json)
    if [[ "$CLASS" != "Mountain" ]]; then
        continue
    fi
    DISCOVERY_STATUS=$(jq -r --arg XNAME "$RF" '.RedfishEndpoints[] | select(.ID == $XNAME).DiscoveryInfo.LastDiscoveryStatus' /tmp/redfishEndpoints.json)
    echo "$RF: $DISCOVERY_STATUS"
done
Restore SSH Keys configured by cray-conman on liquid-cooled Node BMCs.
View the current status of the cray-conman pods:
ncn-m001# kubectl -n services get pods -l app.kubernetes.io/instance=cray-conman
NAME READY STATUS RESTARTS AGE
cray-conman-7f956fc9bc-97rx4 3/3 Running 0 47d
Restart the cray-conman deployment:
ncn-m001# kubectl -n services rollout restart deployment cray-conman
ncn-m001# kubectl -n services rollout status deployment cray-conman
To restore passwordless SSH connections to liquid-cooled Node BMCs that have had the StatefulReset action, follow the procedure in section 30.23 “Enable Passwordless Connections to Liquid Cooled Node BMCs” in the HPE Cray EX System Administration Guide 1.4 S-8001.
WARNING: If an admin uses SCSD to update the SSHConsoleKey value outside of ConMan, it will disrupt the ConMan connection to the console and collection of console logs. Refer to “About the ConMan Containerized Service” in the HPE Cray EX System Administration Guide 1.4 S-8001.
To restore passwordless SSH connections to the liquid-cooled Chassis BMCs that have had a StatefulReset action, follow the steps below for each Chassis BMC that was reset:
Save the public SSH key for the root user.
ncn-m001# export SSH_PUBLIC_KEY=$(cat /root/.ssh/id_rsa.pub | sed 's/[[:space:]]*$//')
Enable passwordless SSH to the root user of the BMCs. Skip this step if passwordless SSH to the root user is not desired. Replace BMC_HOSTNAME with the hostname of the Chassis BMC. The hostname of a ChassisBMC is its xname with the trailing b0 removed.
ncn-m001# curl -k -u "root:$CRED_PASSWORD" -X PATCH https://BMC_HOSTNAME/redfish/v1/Managers/BMC/NetworkProtocol \
    -H 'Content-Type: application/json' \
    -d "{\"Oem\":{\"SSHAdmin\":{\"AuthorizedKeys\":\"$SSH_PUBLIC_KEY\"}}}"
Enable passwordless SSH to the consoles on the BMCs. Skip this step if passwordless SSH to the consoles is not desired. Replace BMC_HOSTNAME with the hostname of the Chassis BMC. The hostname of a ChassisBMC is its xname with the trailing b0 removed.
ncn-m001# curl -k -u "root:$CRED_PASSWORD" -X PATCH https://BMC_HOSTNAME/redfish/v1/Managers/BMC/NetworkProtocol \
    -H 'Content-Type: application/json' \
    -d "{\"Oem\":{\"SSHConsole\":{\"AuthorizedKeys\":\"$SSH_PUBLIC_KEY\"}}}"
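Hand-escaping the JSON body for these PATCH requests is error-prone. A minimal sketch of a hypothetical ssh_keys_payload function that builds the NetworkProtocol PATCH body for either the SSHAdmin or the SSHConsole section from a complete public key line (for example, the contents of /root/.ssh/id_rsa.pub). It assumes the key line contains no double quotes or backslashes, so it performs no JSON escaping.

```shell
# Hypothetical helper: print {"Oem":{"<section>":{"AuthorizedKeys":"<key>"}}}.
# $1 must be SSHAdmin or SSHConsole; $2 is the full public key line.
ssh_keys_payload() {
    local section="$1" key="$2"
    case "$section" in
        SSHAdmin|SSHConsole) ;;
        *)
            echo "ssh_keys_payload: unexpected section '$section'" >&2
            return 1
            ;;
    esac
    printf '{"Oem":{"%s":{"AuthorizedKeys":"%s"}}}' "$section" "$key"
}

# Example (replace BMC_HOSTNAME as described above):
#   curl -k -u "root:$CRED_PASSWORD" -X PATCH https://BMC_HOSTNAME/redfish/v1/Managers/BMC/NetworkProtocol \
#       -H 'Content-Type: application/json' -d "$(ssh_keys_payload SSHConsole "$SSH_PUBLIC_KEY")"
```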