This procedure changes the credential for liquid-cooled EX cabinet chassis controllers and node controllers (BMCs) used by CSM services after the CECs have been set to a new global default credential.
NOTE
This procedure does not provision Slingshot switch BMCs (RouterBMCs). Slingshot switch BMC default credentials must be changed using the procedures in the Slingshot product documentation. To update Slingshot switch BMCs, refer to “Change Rosetta Login and Redfish API Credentials” in the Slingshot Operations Guide (> 1.6.0).
This procedure provisions only the default Redfish root account passwords. It does not modify Redfish accounts that have been added after an initial system installation.
Prerequisite: The hms-discovery Kubernetes CronJob has been disabled.
The Mountain Endpoint Discovery Service (MEDS) sealed secret contains the default global credential used by MEDS when it discovers new liquid-cooled EX cabinet hardware.
Follow the Redeploying a Chart procedure with the following specifications:
Chart name: cray-hms-meds
Base manifest name: core-services
(ncn-mw#) When reaching the step to update the customizations, perform the following steps:
Only follow these steps as part of the previously linked chart redeploy procedure.
Run git clone https://github.com/Cray-HPE/csm.git.
Acquire sealed secret keys.
mkdir -pv certs
kubectl -n kube-system get secret sealed-secrets-key -o jsonpath='{.data.tls\.crt}' | base64 -d > certs/sealed_secrets.crt
kubectl -n kube-system get secret sealed-secrets-key -o jsonpath='{.data.tls\.key}' | base64 -d > certs/sealed_secrets.key
Modify MEDS sealed secret to use new global default credential.
Inspect the original default credential for MEDS.
./utils/secrets-decrypt.sh cray_meds_credentials ./certs/sealed_secrets.key ./customizations.yaml | jq .data.vault_redfish_defaults -r | base64 -d | jq
Example output:
{
"Username": "root",
"Password": "bar"
}
Specify the desired default credentials for MEDS to use with new hardware.
Replace foobar with the root user password configured on the CECs.
echo '{ "Username": "root", "Password": "foobar" }' | base64 > creds.json.b64
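A password that contains a double quote or a backslash would produce invalid JSON with the echo command above, and a long password may cause base64 to wrap its output across several lines. The following is a minimal sketch of a helper that handles both cases. The name encode_meds_creds is hypothetical and not part of CSM; the JSON layout simply mirrors the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CSM): build the MEDS credential JSON for
# the root user and base64-encode it on a single line.
encode_meds_creds() {
    local password="$1"
    # Escape backslashes first, then double quotes, so the JSON stays valid.
    password=${password//\\/\\\\}
    password=${password//\"/\\\"}
    # tr removes any line wrapping that base64 may add for long input.
    printf '{ "Username": "root", "Password": "%s" }\n' "$password" \
        | base64 | tr -d '\n'
}
```

Under these assumptions, `encode_meds_creds 'foobar' > creds.json.b64` would take the place of the echo command above.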
Update and regenerate the cray_meds_credentials sealed secret.
cat << EOF | yq w - 'data.vault_redfish_defaults' "$(<creds.json.b64)" | yq r -j - | ./utils/secrets-encrypt.sh | yq w -f - -i ./customizations.yaml 'spec.kubernetes.sealed_secrets.cray_meds_credentials'
{
"kind": "Secret",
"apiVersion": "v1",
"metadata": {
"name": "cray-meds-credentials",
"namespace": "services",
"creationTimestamp": null
},
"data": {}
}
EOF
Decrypt updated sealed secret for review.
The sealed secret should match the credentials set on the CEC.
./utils/secrets-decrypt.sh cray_meds_credentials ./certs/sealed_secrets.key ./customizations.yaml | jq .data.vault_redfish_defaults -r | base64 -d | jq
Example output:
{
"Username": "root",
"Password": "foobar"
}
(ncn-mw#) When reaching the step to validate the redeployed chart, perform the following steps:
Only follow these steps as part of the previously linked chart redeploy procedure.
Wait for the MEDS Vault loader job to run to completion.
kubectl wait -n services job cray-meds-vault-loader --for=condition=complete --timeout=5m
Verify that the default credentials have changed in Vault.
VAULT_PASSWD=$(kubectl -n vault get secrets cray-vault-unseal-keys -o json | jq -r '.data["vault-root"]' | base64 -d)
kubectl -n vault exec -it cray-vault-0 -c vault -- env VAULT_TOKEN=$VAULT_PASSWD VAULT_ADDR=http://127.0.0.1:8200 vault kv get secret/meds-cred/global/ipmi
Example output:
====== Data ======
Key Value
--- -----
Password foobar
Username root
Make sure to perform the entire linked procedure, including the step to save the updated customizations.
Set CRED_PASSWORD to the new password:
read -r -s CRED_PASSWORD
echo "$CRED_PASSWORD"
Expected output:
foobar
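Echoing the password leaves it visible in the terminal scrollback. As an alternative, the following minimal sketch prompts twice and accepts the value only when both entries match. The name read_confirmed_password is a hypothetical helper, not a CSM utility.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a password twice without echoing it and print it
# only when both entries are identical and non-empty.
read_confirmed_password() {
    local first second
    printf 'New password: ' >&2
    IFS= read -r -s first || return 1
    printf '\nRepeat password: ' >&2
    IFS= read -r -s second || return 1
    printf '\n' >&2
    if [[ -z "$first" || "$first" != "$second" ]]; then
        return 1
    fi
    printf '%s\n' "$first"
}
```

A possible invocation is `CRED_PASSWORD=$(read_confirmed_password) || echo "Passwords did not match" >&2`.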
Update the credentials used by CSM services for all previously discovered EX cabinet BMCs to the new global default.
cray hsm inventory redfishEndpoints list --format json > /tmp/redfishEndpoints.json
cray hsm state components list --format json > /tmp/components.json
REDFISH_ENDPOINTS=$(jq .RedfishEndpoints[].ID -r /tmp/redfishEndpoints.json | sort -V)
for RF in $REDFISH_ENDPOINTS; do
echo "$RF: Checking..."
TYPE=$(jq -r --arg XNAME "$RF" '.RedfishEndpoints[] | select(.ID == $XNAME).Type' /tmp/redfishEndpoints.json)
if [[ -z "$TYPE" ]]; then
echo "$RF missing Type, skipping..."
continue
elif [[ "$TYPE" == "RouterBMC" ]]; then
echo "$RF is a RouterBMC, skipping..."
continue
fi
CLASS=$(jq -r --arg XNAME "$RF" '.Components[] | select(.ID == $XNAME).Class' /tmp/components.json)
if [[ "$CLASS" != "Mountain" ]]; then
echo "$RF is not Mountain, skipping..."
continue
fi
echo "$RF: Updating credentials"
cray hsm inventory redfishEndpoints update "${RF}" --user root --password "${CRED_PASSWORD}" --id "${RF}" --hostname "${RF}"
done
The script above takes some time to run: updating the credentials for a single fully populated cabinet takes approximately 5 minutes.
Alternatively, use the following command on each BMC. Replace BMC_XNAME with the BMC component name (xname) to update the credentials:
cray hsm inventory redfishEndpoints update BMC_XNAME --user root --password ${CRED_PASSWORD} --id BMC_XNAME --hostname BMC_XNAME
Restart the hms-discovery Kubernetes CronJob.
kubectl -n services patch cronjobs hms-discovery -p '{"spec" : {"suspend" : false }}'
After 2-3 minutes, the hms-discovery CronJob will start to power on all of the currently powered off compute slots.
Wait for compute slots to be powered on and for HSM to re-discover the updated Redfish endpoints.
sleep 300
Wait for all updated Redfish endpoints to become DiscoverOK.
The following Bash script will find all Redfish endpoints for the liquid-cooled BMCs that are not in DiscoverOK, and display their lastDiscoveryStatus.
cray hsm inventory redfishEndpoints list --laststatus '!DiscoverOK' --format json > /tmp/redfishEndpoints.json
cray hsm state components list --format json > /tmp/components.json
REDFISH_ENDPOINTS=$(jq .RedfishEndpoints[].ID -r /tmp/redfishEndpoints.json | sort -V)
for RF in $REDFISH_ENDPOINTS; do
TYPE=$(jq -r --arg XNAME "$RF" '.RedfishEndpoints[] | select(.ID == $XNAME).Type' /tmp/redfishEndpoints.json)
if [[ -z "$TYPE" ]]; then
continue
elif [[ "$TYPE" == "RouterBMC" ]]; then
continue
fi
CLASS=$(jq -r --arg XNAME "$RF" '.Components[] | select(.ID == $XNAME).Class' /tmp/components.json)
if [[ "$CLASS" != "Mountain" ]]; then
continue
fi
DISCOVERY_STATUS=$(jq -r --arg XNAME "$RF" '.RedfishEndpoints[] | select(.ID == $XNAME).DiscoveryInfo.LastDiscoveryStatus' /tmp/redfishEndpoints.json)
echo "$RF: $DISCOVERY_STATUS"
done
Example output:
x1001c0r5b0: HTTPsGetFailed
x1001c1s0b0: HTTPsGetFailed
x1001c1s0b1: HTTPsGetFailed
x1001c2s0b1: DiscoveryStarted
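Instead of a fixed sleep followed by manual re-runs, the status check can be retried until nothing is reported. The sketch below is a generic retry loop. The name retry_until is hypothetical, and the attempt count and interval are assumptions to be tuned for the system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run a check command until it succeeds or the
# attempt budget is exhausted.
# Usage: retry_until ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
retry_until() {
    local attempts="$1" interval="$2" i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if ((i < attempts)); then
            sleep "$interval"
        fi
    done
    return 1
}
```

For example, if the script above were saved as check_discovery.sh (a hypothetical file name), a function such as `all_discovered() { [[ -z "$(./check_discovery.sh)" ]]; }` could be polled with `retry_until 10 60 all_discovered`.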
For each Redfish endpoint that is reported, use the following to troubleshoot why it is not DiscoverOK or DiscoveryStarted:
If the Redfish endpoint is DiscoveryStarted, then that BMC is currently being inventoried by HSM. Wait a few minutes and re-run the Bash script above to re-check the current discovery status of the RedfishEndpoints.
The hms-discovery CronJob (if enabled) will trigger a discovery every three minutes on BMCs that are not currently in DiscoverOK or DiscoveryStarted.
If the Redfish endpoint is HTTPsGetFailed, then HSM had issues contacting the BMC.
Verify that the BMC component name (xname) is resolvable and pingable.
ping x1001c1s0b0
If a NodeBMC is not pingable, then verify that the slot powering the BMC is powered on. If this is a ChassisBMC, then skip this step.
For example, the NodeBMC x1001c1s0b0 is in slot x1001c1s0:
cray power status describe x1001c1s0 --format toml
Example output:
[[status]]
xname = "x1001c1s0"
powerState = "on"
managementState = "available"
error = ""
supportedPowerTransitions = [ "On", "Force-Off", "Soft-Off", "Off", "Init", "Hard-Restart", "Soft-Restart",]
lastUpdated = "2024-02-04T01:48:48.3156272Z"
If the slot is off, power it on:
cray power transition on --xnames x1001c1s0
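The slot xname in the example is the NodeBMC xname with its trailing BMC suffix removed. A minimal sketch of that derivation follows. The name slot_for_node_bmc is hypothetical, and the pattern assumes NodeBMC xnames of the form shown in this procedure (cabinet, chassis, slot, BMC).

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the slot xname that powers a NodeBMC by
# stripping the trailing "b<N>" suffix, e.g. x1001c1s0b0 -> x1001c1s0.
# Returns non-zero for xnames that are not NodeBMC-shaped (for example a
# ChassisBMC such as x1001c0b0 or a RouterBMC such as x1001c0r5b0).
slot_for_node_bmc() {
    local bmc="$1"
    if [[ "$bmc" =~ ^(x[0-9]+c[0-9]+s[0-9]+)b[0-9]+$ ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}
```

It could be combined with the commands above, for instance `cray power status describe "$(slot_for_node_bmc x1001c1s0b0)" --format toml`.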
If the BMC is reachable and in HTTPsGetFailed, then verify that the BMC is accessible with the new default global credential.
Replace BMC_XNAME with the hostname of the Redfish endpoint.
curl -k -u root:$CRED_PASSWORD https://BMC_XNAME/redfish/v1/Managers | jq
If the error message below is returned, then the BMC must have a StatefulReset action performed on it.
The StatefulReset action clears previously user-defined credentials that are taking precedence over the CEC-supplied credential. It also clears NTP, syslog, and SSH key configurations on the BMC.
{
"error": {
"@Message.ExtendedInfo": [
{
"@odata.type": "#Message.v1_0_5.Message",
"Message": "While attempting to establish a connection to /redfish/v1/Managers, the service was denied access.",
"MessageArgs": [
"/redfish/v1/Managers"
],
"MessageId": "Security.1.0.AccessDenied",
"Resolution": "Attempt to ensure that the URI is correct and that the service has the appropriate credentials.",
"Severity": "Critical"
}
],
"code": "Security.1.0.AccessDenied",
"message": "While attempting to establish a connection to /redfish/v1/Managers, the service was denied access."
}
}
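When many BMCs need to be checked, the response can be tested for this error code rather than read by eye. The following sketch matches on the "code" field shown above. The name is_access_denied is hypothetical, and the match assumes the exact code string from this example; other firmware versions may report a different message registry version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a Redfish response body on stdin and succeed
# when it carries the AccessDenied error code shown in the example above.
is_access_denied() {
    grep -q '"code"[[:space:]]*:[[:space:]]*"Security\.1\.0\.AccessDenied"'
}
```

A possible use is `curl -sk -u "root:$CRED_PASSWORD" https://BMC_XNAME/redfish/v1/Managers | is_access_denied && echo "BMC_XNAME needs a StatefulReset"`.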
Perform a StatefulReset on the liquid-cooled BMC. Replace BMC_XNAME with the hostname of the BMC.
The OLD_DEFAULT_PASSWORD must match the credential that was previously set on the BMC. This is most likely the previous global default credential for liquid-cooled BMCs.
curl -k -u root:OLD_DEFAULT_PASSWORD -X POST -H 'Content-Type: application/json' -d \
'{"ResetType": "StatefulReset"}' \
https://BMC_XNAME/redfish/v1/Managers/BMC/Actions/Manager.Reset
After the StatefulReset action has been issued, the BMC will be unreachable for a few minutes as it performs the StatefulReset.
If StatefulReset was performed on any BMC
NOTE
This section only needs to be performed if any liquid-cooled node or chassis BMCs had to be StatefulReset.
For each liquid-cooled BMC to which the StatefulReset action was applied, delete the BMC from HSM.
Replace BMC_XNAME with the BMC component name (xname) to delete.
cray hsm inventory redfishEndpoints delete BMC_XNAME
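If several BMCs were reset, keeping their xnames in a file makes it easier to delete them in one pass. The sketch below loops over such a file. The name delete_reset_bmcs, the file format, and the DRY_RUN switch are all assumptions made for illustration; only the cray command itself comes from the step above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the HSM delete for every xname listed in a file
# (one xname per line; blank lines and lines starting with "#" are ignored).
# Set DRY_RUN=1 to print the commands instead of running them.
delete_reset_bmcs() {
    local list_file="$1" xname
    # Read the list on file descriptor 3 so the cray CLI cannot consume it.
    while IFS= read -r -u 3 xname || [[ -n "$xname" ]]; do
        if [[ -z "$xname" || "$xname" == \#* ]]; then
            continue
        fi
        if [[ "${DRY_RUN:-0}" == "1" ]]; then
            echo "cray hsm inventory redfishEndpoints delete $xname"
        else
            cray hsm inventory redfishEndpoints delete "$xname"
        fi
    done 3< "$list_file"
}
```

Running `DRY_RUN=1 delete_reset_bmcs reset_bmcs.txt` first (reset_bmcs.txt being a hypothetical file name) shows which endpoints would be removed.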
Restart MEDS to set up the NTP and syslog configuration again for the Redfish endpoints.
View running MEDS pods.
kubectl -n services get pods -l app.kubernetes.io/instance=cray-hms-meds
Example output:
NAME READY STATUS RESTARTS AGE
cray-meds-6d8b5875bc-4jngc 2/2 Running 0 17d
Restart MEDS.
kubectl -n services rollout restart deployment cray-meds
kubectl -n services rollout status deployment cray-meds
Wait five minutes for MEDS to re-discover the deleted Redfish endpoints.
sleep 300
Verify that all expected hardware has been discovered.
The following Bash script will find all Redfish endpoints for the liquid-cooled BMCs that are not in DiscoverOK, and display their last discovery status.
cray hsm inventory redfishEndpoints list --laststatus '!DiscoverOK' --format json > /tmp/redfishEndpoints.json
cray hsm state components list --format json > /tmp/components.json
REDFISH_ENDPOINTS=$(jq .RedfishEndpoints[].ID -r /tmp/redfishEndpoints.json | sort -V)
for RF in $REDFISH_ENDPOINTS; do
TYPE=$(jq -r --arg XNAME "$RF" '.RedfishEndpoints[] | select(.ID == $XNAME).Type' /tmp/redfishEndpoints.json)
if [[ -z "$TYPE" ]]; then
continue
elif [[ "$TYPE" == "RouterBMC" ]]; then
continue
fi
CLASS=$(jq -r --arg XNAME "$RF" '.Components[] | select(.ID == $XNAME).Class' /tmp/components.json)
if [[ "$CLASS" != "Mountain" ]]; then
continue
fi
DISCOVERY_STATUS=$(jq -r --arg XNAME "$RF" '.RedfishEndpoints[] | select(.ID == $XNAME).DiscoveryInfo.LastDiscoveryStatus' /tmp/redfishEndpoints.json)
echo "$RF: $DISCOVERY_STATUS"
done
Restore SSH keys configured by Cray console services on liquid-cooled Node BMCs.
Get the SSH console private key from Vault:
VAULT_PASSWD=$(kubectl -n vault get secrets cray-vault-unseal-keys \
-o json | jq -r '.data["vault-root"]' | base64 -d)
kubectl -n vault exec -t cray-vault-0 -c vault \
-- env VAULT_TOKEN=$VAULT_PASSWD VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_FORMAT=json vault read transit/export/signing-key/mountain-bmc-console \
| jq -r .data.keys[] > ssh-console.key
Generate the SSH public key.
chmod 0600 ssh-console.key
export SCSD_SSH_CONSOLE_KEY=$(ssh-keygen -yf ssh-console.key)
echo $SCSD_SSH_CONSOLE_KEY
Delete the SSH console private key from disk.
rm ssh-console.key
Generate a payload for the SCSD service.
The administrator must be authenticated to the Cray CLI before proceeding. See Configure the Cray Command Line Interface.
cat > scsd_cfg.json <<DATA
{
"Force":false,
"Targets":
$(cray hsm state components list --class Mountain --type NodeBMC --format json | jq -r '[.Components[] | .ID]'),
"Params":{
"SSHConsoleKey":"$(echo $SCSD_SSH_CONSOLE_KEY)"
}
}
DATA
Alternatively, create a scsd_cfg.json file with only the SSH console key:
cat > scsd_cfg.json <<DATA
{
"Force":false,
"Targets":[
"x1000c0s0b0",
"x1000c0s0b1"
],
"Params":{
"SSHConsoleKey":"$(echo $SCSD_SSH_CONSOLE_KEY)"
}
}
DATA
Edit the Targets array to contain the NodeBMCs that have had the StatefulReset action.
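Typing the Targets array by hand is error-prone when several NodeBMCs were reset. The following minimal sketch prints a JSON array from a list of xnames and drops repeated entries. The name targets_json is hypothetical, and the helper assumes xnames contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON array of the xnames passed as arguments,
# dropping duplicates while preserving first-seen order.
targets_json() {
    local xname seen=" " out=""
    for xname in "$@"; do
        # Skip xnames that were already added.
        if [[ "$seen" == *" $xname "* ]]; then
            continue
        fi
        seen+="$xname "
        out+="${out:+,}\"$xname\""
    done
    printf '[%s]\n' "$out"
}
```

Inside the here-document, the array could then be written as `"Targets": $(targets_json x1000c0s0b0 x1000c0s0b1),`.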
Inspect the generated scsd_cfg.json file.
Ensure that the following are true before running the cray scsd command in the following step:
The Targets list in the scsd_cfg.json file is limited to NodeBMCs that have had the StatefulReset action applied to them.
The SSHConsoleKey settings match the desired public key.
Apply the SSH console key to the NodeBMCs:
cray scsd bmc loadcfg create scsd_cfg.json
Check the output to verify all hardware has been set with the correct keys.
Passwordless SSH to the consoles should now function as expected.