Updating Foxconn Paradise Nodes with FAS

Use the Firmware Action Service (FAS) to update the firmware on Foxconn Paradise devices. Each procedure includes the prerequisites and example recipes required to update the firmware.

NOTE: Any locked node remains in the inProgress state with the stateHelper message "failed to lock" until the action times out or the lock is released. If the action times out, these nodes are reported as failed with the stateHelper message "time expired; could not complete update". This includes NCNs that are manually locked to prevent accidental reboots and firmware updates.

Refer to FAS Filters for more information on the content used in the example JSON files.

The FASUpdate.py script can be used to perform default updates to firmware and BIOS.

Prerequisites

The following targets can be updated with FAS on Paradise Nodes:

  • bmc_active
  • bios_active
  • erot_active
  • fpga_active
  • pld_active

Update Paradise bmc_active procedure

NOTE: Some BMC firmware updates will require that factory defaults, or a factory reset, be applied. You can check for this requirement when you download a new HFP firmware release. It is very important to check for and perform this action if it is required. If a factory reset of the BMC is required, follow the BMC factory reset procedure at the bottom of this page before updating the BMC firmware.

The FASUpdate.py script can be used to update bmc_active - use recipe foxconn_nodeBMC_bmc.json

The BMC will reboot after the update is complete.

To update using a JSON file and the Cray CLI, use this example JSON file and follow the Update Paradise firmware using JSON file and Cray CLI procedure.

{
  "stateComponentFilter": {
    "deviceTypes": [ "nodeBMC" ]
  },
  "inventoryHardwareFilter": {
    "manufacturer": "foxconn"
  },
  "targetFilter": {
    "targets": [ "bmc_active" ]
  },
  "command": {
    "version": "latest",
    "tag": "default",
    "overrideDryrun": false,
    "restoreNotPossibleOverride": true,
    "timeLimit": 1000,
    "description": "Dryrun upgrade of Foxconn bmc_active"
  }
}
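The example recipes on this page differ mainly in the target name (and, for pld_active, an added xnames entry). As a minimal sketch, assuming you would rather generate the file than edit it by hand, the following Python builds the same structure. The helper name build_recipe and the wording of the non-dry-run description are illustrative assumptions; they are not part of FAS or FASUpdate.py.

```python
import json


def build_recipe(target, dry_run=True, xnames=None, time_limit=1000):
    """Return a FAS action payload shaped like the example recipes on this page.

    build_recipe is a hypothetical helper, not part of FAS or FASUpdate.py.
    """
    state_filter = {"deviceTypes": ["nodeBMC"]}
    if xnames:
        # Restrict the action to specific BMCs, as the pld_active recipe does.
        state_filter["xnames"] = list(xnames)
    # The "Upgrade" wording for real updates is an assumption for illustration.
    prefix = "Dryrun upgrade" if dry_run else "Upgrade"
    return {
        "stateComponentFilter": state_filter,
        "inventoryHardwareFilter": {"manufacturer": "foxconn"},
        "targetFilter": {"targets": [target]},
        "command": {
            "version": "latest",
            "tag": "default",
            # false requests a dry run; true requests an actual update.
            "overrideDryrun": not dry_run,
            "restoreNotPossibleOverride": True,
            "timeLimit": time_limit,
            "description": f"{prefix} of Foxconn {target}",
        },
    }


if __name__ == "__main__":
    # Print a dry-run recipe for bmc_active; redirect it to a file of your choice.
    print(json.dumps(build_recipe("bmc_active"), indent=2))
```

The printed JSON can be saved to a file and passed to cray fas actions create in the same way as a hand-written recipe.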

IMPORTANT: There is a known bug that causes the hmcollector-poll service to lose event subscriptions after BMC firmware is updated. After updating BMC firmware, the hmcollector-poll service must be restarted to work around this issue. After the update is complete, and you confirm the BMC has been rebooted, restart the hmcollector-poll service with this command:

kubectl -n services rollout restart deployment cray-hms-hmcollector-poll

Update Paradise bios_active procedure

The nodes must be powered OFF before the BIOS is updated.

The FASUpdate.py script can be used to update bios_active - use recipe foxconn_nodeBMC_bios.json

To update using a JSON file and the Cray CLI, use this example JSON file and follow the Update Paradise firmware using JSON file and Cray CLI procedure.

{
  "stateComponentFilter": {
    "deviceTypes": [ "nodeBMC" ]
  },
  "inventoryHardwareFilter": {
    "manufacturer": "foxconn"
  },
  "targetFilter": {
    "targets": [ "bios_active" ]
  },
  "command": {
    "version": "latest",
    "tag": "default",
    "overrideDryrun": false,
    "restoreNotPossibleOverride": true,
    "timeLimit": 1000,
    "description": "Dryrun upgrade of Foxconn bios_active"
  }
}

Some BIOS versions will require that BIOS factory defaults are applied to clear all prior settings AFTER the BIOS is updated. You can check for this requirement when you download a new HFP firmware release. It is very important to check for and perform this action if it is required. If resetting BIOS factory defaults is required, follow the BIOS factory defaults procedure at the bottom of this page.

If resetting BIOS factory defaults is not required, simply power the node on.

IMPORTANT: After the update has completed, the nodes must be powered on and REMAIN ON FOR AT LEAST 6 MINUTES.

NOTE: The version number reported by Redfish will NOT be updated until the node has fully booted.

Update Paradise erot_active procedure

NOTE: After erot_active is updated, an AC power cycle is required for the update to take effect. To perform an AC power cycle, run the following command, replacing $(xname) with the xname of the node BMC (ncn#):

ssh admin@$(xname) "ipmitool raw 0x38 0x02"

The FASUpdate.py script can be used to update erot_active - use recipe foxconn_nodeBMC_erot.json

To update using a JSON file and the Cray CLI, use this example JSON file and follow the Update Paradise firmware using JSON file and Cray CLI procedure.

{
  "stateComponentFilter": {
    "deviceTypes": [ "nodeBMC" ]
  },
  "inventoryHardwareFilter": {
    "manufacturer": "foxconn"
  },
  "targetFilter": {
    "targets": [ "erot_active" ]
  },
  "command": {
    "version": "latest",
    "tag": "default",
    "overrideDryrun": false,
    "restoreNotPossibleOverride": true,
    "timeLimit": 1000,
    "description": "Dryrun upgrade of Foxconn erot_active"
  }
}

Update Paradise fpga_active procedure

NOTE: After fpga_active is updated, an AC power cycle is required for the update to take effect. To perform an AC power cycle, run the following command, replacing $(xname) with the xname of the node BMC (ncn#):

ssh admin@$(xname) "ipmitool raw 0x38 0x02"

The FASUpdate.py script can be used to update fpga_active - use recipe foxconn_nodeBMC_fpga.json

To update using a JSON file and the Cray CLI, use this example JSON file and follow the Update Paradise firmware using JSON file and Cray CLI procedure.

{
  "stateComponentFilter": {
    "deviceTypes": [ "nodeBMC" ]
  },
  "inventoryHardwareFilter": {
    "manufacturer": "foxconn"
  },
  "targetFilter": {
    "targets": [ "fpga_active" ]
  },
  "command": {
    "version": "latest",
    "tag": "default",
    "overrideDryrun": false,
    "restoreNotPossibleOverride": true,
    "timeLimit": 1000,
    "description": "Dryrun upgrade of Foxconn fpga_active"
  }
}

Update Paradise pld_active procedure

IMPORTANT: Apply the pld_active update only to blade 1 (for example, x3000c0s3b1). Applying it to other blades at the same time may cause issues. When using the FASUpdate.py script, pass the --xnames flag to specify b1.

The FASUpdate.py script can be used to update pld_active - use recipe foxconn_nodeBMC_pld.json

To update using a JSON file and the Cray CLI, use this example JSON file and follow the Update Paradise firmware using JSON file and Cray CLI procedure.

{
  "stateComponentFilter": {
    "xnames": [ "x3000c0s3b1" ],
    "deviceTypes": [ "nodeBMC" ]
  },
  "inventoryHardwareFilter": {
    "manufacturer": "foxconn"
  },
  "targetFilter": {
    "targets": [ "pld_active" ]
  },
  "command": {
    "version": "latest",
    "tag": "default",
    "overrideDryrun": false,
    "restoreNotPossibleOverride": true,
    "timeLimit": 1000,
    "description": "Dryrun upgrade of Foxconn pld_active"
  }
}
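Because pld_active should only be sent to b1, it can help to check the xnames list before creating the action. The following is a minimal sketch, assuming node BMC xnames follow the xXcCsSbB pattern seen in the example above; the helper name rejected_for_pld is hypothetical and is not part of FAS or FASUpdate.py.

```python
import re

# Node BMC xname pattern assumed from the example x3000c0s3b1.
NODE_BMC_XNAME = re.compile(r"^x\d+c\d+s\d+b(\d+)$")


def rejected_for_pld(xnames):
    """Return the xnames that are not a b1 node BMC, preserving input order."""
    rejected = []
    for xname in xnames:
        match = NODE_BMC_XNAME.match(xname)
        # Compare the whole BMC index so that b10 or b11 is not mistaken for b1.
        if match is None or match.group(1) != "1":
            rejected.append(xname)
    return rejected


if __name__ == "__main__":
    candidates = ["x3000c0s3b1", "x3000c0s3b2", "x3000c0s5b11"]
    print(rejected_for_pld(candidates))
```

An empty result means every entry targets b1; anything returned should be removed from the recipe before the update is run.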

Update Paradise firmware using JSON file and Cray CLI

NOTE: The FASUpdate.py script can be used to perform default updates to firmware and BIOS.

  1. Create a JSON file using the example recipe.

  2. Initiate a dry-run to verify the firmware that will be updated and the version it will update to.

    1. (ncn#) Create the dry-run session.

      The overrideDryrun = false value indicates that the command will do a dry run.

      cray fas actions create nodeBMC.json --format toml
      

      Example output:

      overrideDryrun = false
      actionID = "fddd0025-f5ff-4f59-9e73-1ca2ef2a432d"
      
    2. (ncn#) Describe the actionID for firmware update dry-run job.

      Replace the actionID value with the string returned in the previous step. In this example, "fddd0025-f5ff-4f59-9e73-1ca2ef2a432d" is used.

      cray fas actions describe {actionID} --format toml
      

      Example output:

      blockedBy = []
      state = "completed"
      actionID = "fddd0025-f5ff-4f59-9e73-1ca2ef2a432d"
      startTime = "2020-08-31 15:49:44.568271843 +0000 UTC"
      snapshotID = "00000000-0000-0000-0000-000000000000"
      endTime = "2020-08-31 15:51:35.426714612 +0000 UTC"
      
      [command]
      description = "Update Foxconn Node BMCs Dryrun"
      tag = "default"
      restoreNotPossibleOverride = true
      timeLimit = 10000
      version = "latest"
      overrideDryrun = false
      

      If state = "completed", the dry-run has found and checked all the nodes. Check the following sections for more information:

      • Lists the nodes that have a valid image for updating:

        [operationSummary.succeeded]
        
      • Lists the nodes that will not be updated because they are already at the correct version:

        [operationSummary.noOperation]
        
      • Lists the nodes that had an error when attempting to update:

        [operationSummary.failed]
        
      • Lists the nodes that do not have a valid image for updating:

        [operationSummary.noSolution]
        
  3. Update the firmware after verifying that the dry-run worked as expected.

    1. Edit the JSON file and update the values so an actual firmware update can be run.

      The following example is for the nodeBMC.json file. Update the following values:

      "overrideDryrun":true,
      "description":"Update Foxconn Node BMCs"
      
    2. (ncn#) Run the firmware update.

      The output overrideDryrun = true indicates that an actual firmware update job was created. A new actionID will also be displayed.

      cray fas actions create nodeBMC.json --format toml
      

      Example output:

      overrideDryrun = true
      actionID = "bc40f10a-e50c-4178-9288-8234b336077b"
      

      The time it takes for a firmware action to finish varies. It can be a few minutes or over 20 minutes.

      The BMC automatically reboots after the BMC firmware has been loaded.

  4. Retrieve the operationID and verify that the update is complete.

    cray fas actions describe {actionID} --format toml
    

    Example output:

    [operationSummary.failed]
    [[operationSummary.failed.operationKeys]]
    stateHelper = "unexpected change detected in firmware version. Expected nc.1.3.10-shasta-release.arm.2020-07-21T23:58:22+00:00.d479f59 got: nc.cronomatic-dev.arm.2019-09-24T13:20:24+00:00.9d0f8280"
    fromFirmwareVersion = "nc.cronomatic-dev.arm.2019-09-24T13:20:24+00:00.9d0f8280"
    xname = "x1005c6s4b0"
    target = "BMC"
    operationID = "e910c6ad-db98-44fc-bdc5-90477b23386f"
    
  5. (ncn#) View more details for an operation using the operationID from the previous step.

    Check the list of nodes for the failed or completed state.

    cray fas operations describe {operationID}
    

    For example:

    cray fas operations describe "e910c6ad-db98-44fc-bdc5-90477b23386f" --format toml
    

    Example output:

    fromFirmwareVersion = "nc.cronomatic-dev.arm.2019-09-24T13:20:24+00:00.9d0f8280"
    fromTag = ""
    fromImageURL = ""
    endTime = "2020-08-31 16:40:13.464321212 +0000 UTC"
    actionID = "bc40f10a-e50c-4178-9288-8234b336077b"
    startTime = "2020-08-31 16:28:01.228524446 +0000 UTC"
    fromSemanticFirmwareVersion = ""
    toImageURL = ""
    model = "WNC_REV_B"
    operationID = "e910c6ad-db98-44fc-bdc5-90477b23386f"
    fromImageID = "00000000-0000-0000-0000-000000000000"
    target = "BMC"
    toImageID = "39c0e553-281d-4776-b68e-c46a2993485e"
    toSemanticFirmwareVersion = "1.3.10"
    refreshTime = "2020-08-31 16:40:13.464325422 +0000 UTC"
    blockedBy = []
    toTag = ""
    state = "failed"
    stateHelper = "unexpected change detected in firmware version. Expected nc.1.3.10-shasta-release.arm.2020-07-21T23:58:22+00:00.d479f59 got: nc.cronomatic-dev.arm.2019-09-24T13:20:24+00:00.9d0f8280"
    deviceType = "NodeBMC"
    

    Once the firmware and BIOS are updated, the compute nodes can be powered back on.

    If the nodes have never been powered on in the system before (they are being added during a hardware add procedure), then use the Boot Orchestration Service (BOS) to power them on. Using BOS will prepare the initial boot artifacts required to boot them. If this is not the first time they have been powered on in this system, then you can use the Power Control Service (PCS) to power them on.
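The four operationSummary sections checked after the dry-run can also be inspected programmatically. The sketch below assumes the action is described with --format json instead of --format toml, and that the JSON mirrors the TOML layout shown in this procedure (operationSummary, then a category, then operationKeys). The helper names and the rule used to call a dry-run "clean" are assumptions for illustration, not FAS behavior.

```python
import json

CATEGORIES = ("succeeded", "noOperation", "failed", "noSolution")


def summarize(action):
    """Map each operationSummary category to a list of (xname, target) pairs."""
    summary = action.get("operationSummary") or {}
    result = {}
    for category in CATEGORIES:
        keys = (summary.get(category) or {}).get("operationKeys") or []
        result[category] = [(k.get("xname"), k.get("target")) for k in keys]
    return result


def dry_run_is_clean(action):
    """True when the action completed with nothing failed and nothing unsolved.

    This acceptance rule is an assumption for illustration, not FAS policy.
    """
    groups = summarize(action)
    return (
        action.get("state") == "completed"
        and not groups["failed"]
        and not groups["noSolution"]
    )


if __name__ == "__main__":
    # Trimmed sample shaped like the describe output shown in this procedure.
    sample = json.loads("""
    {
      "state": "completed",
      "operationSummary": {
        "failed": {
          "operationKeys": [
            {"xname": "x1005c6s4b0", "target": "BMC",
             "operationID": "e910c6ad-db98-44fc-bdc5-90477b23386f"}
          ]
        }
      }
    }
    """)
    print(summarize(sample))
    print(dry_run_is_clean(sample))
```

A result that is not clean is a prompt to review the failed and noSolution entries with cray fas operations describe before setting overrideDryrun to true.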

Upload Paradise images to TFTP server

(ncn#) Check whether a firmware image has been uploaded to the TFTP server:

kubectl -n services exec -it `kubectl get pods -n services -l app.kubernetes.io/instance=cms-ipxe -o custom-columns=NS:.metadata.name --no-headers | head -1` -- ls /shared_tftp

(ncn#) If the required firmware file is not listed, run the following command to copy it from S3 to the TFTP server:

/usr/share/doc/csm/scripts/operations/firmware/upload_foxconn_images_tftp.py
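When several images are expected, comparing the directory listing against a list of names is less error-prone than reading it by eye. This is a minimal sketch that works on the text printed by the ls command above. The helper name missing_images and the file names in the usage example are hypothetical; substitute the names from your HFP release. It assumes file names contain no whitespace.

```python
def missing_images(listing, required):
    """Return the required file names that do not appear in an ls listing.

    The listing may be one name per line or several names per line,
    because any run of whitespace is treated as a separator.
    """
    present = set(listing.split())
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Hypothetical listing and file names, for illustration only.
    listing = "example_bmc.tar  example_bios.tar\nundionly.kpxe\n"
    print(missing_images(listing, ["example_bmc.tar", "example_fpga.bin"]))
```

If the returned list is not empty, run the upload script and check the listing again.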

Reset BMC Factory Defaults

IMPORTANT: Only perform this action if required! Check the HFP release notes!

Run the following command prior to using FAS to update the BMC firmware. This will reset the BMC to factory defaults (ncn#):

ssh admin@$(xname) 'fw_setenv openbmconce "factory-reset"'

Continue to update the BMC firmware using one of the methods above.

NOTE: The credentials for the admin account may have been reset along with the factory defaults. Should this occur, FAS will no longer be able to verify the update after the BMC reboots and will fail after the time limit. The BMC firmware update should still have succeeded despite this. After the update is complete, return here to reset the admin password if necessary.

If the admin password changed to something other than what is stored in Vault, you may see something like the following when attempting to log in to the BMC:

> ssh admin@x3000c0s33b3
admin@x3000c0s33b3's password:

The account is locked due to 10 failed logins.

(5 minutes left to unlock)
Permission denied, please try again.

Wait until the lockout period expires, then make your next login attempt before other system services try to log in with the wrong password and lock the account again. The factory default password needed to log in and reset the password is not published here; request it from your HPE service representative.

Time the following command to run after the lockout period expires. The example uses "password" as a placeholder for the new admin password; replace it with the correct password stored in Vault for your system (ncn#):

ssh admin@$(xname) 'ipmitool user set password 1 "password"'
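To time the retry, the remaining lockout can be read from the message the BMC prints. The sketch below is a minimal example that assumes the message keeps the format shown in the sample session above, "(5 minutes left to unlock)"; accepting a singular "minute" is an added assumption. The helper name minutes_left is hypothetical.

```python
import re

# Matches the lockout hint shown in the sample session, e.g. "(5 minutes left to unlock)".
LOCKOUT = re.compile(r"\((\d+) minutes? left to unlock\)")


def minutes_left(ssh_output):
    """Return the remaining lockout in minutes, or None if no lockout hint is present."""
    match = LOCKOUT.search(ssh_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "The account is locked due to 10 failed logins.\n"
        "\n"
        "(5 minutes left to unlock)\n"
        "Permission denied, please try again.\n"
    )
    print(minutes_left(sample))
```

A return value of None only means the hint was not found in the text; it does not prove that the account is unlocked.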

Reset BIOS Factory Defaults

IMPORTANT: Only perform this action if required! Check the HFP release notes!

Before resetting the BIOS factory defaults, the BIOS on the target node must already have been updated with FAS. The node should remain powered OFF after the update.

  1. (ncn#) Reset BIOS factory defaults using ipmitool:

    ssh admin@$(bmc_xname) 'ipmitool raw 0x0 0x8 0x05 0x80 0x80 0x00 0x00 0x00 ; ipmitool raw 0x0 0x9 0x05 0x00 0x00'
    

    The expected results should look like this:

    01 05 80 80 00 00 00
    

    If the results do not look like this, please consult with your HPE service representative before proceeding.

  2. Next, power the node on.

    IMPORTANT: After the node has powered on, it must REMAIN ON FOR AT LEAST 6 MINUTES before proceeding to the next step.

  3. (ncn#) Clear CMOS using ipmitool:

    ssh admin@$(bmc_xname) 'ipmitool chassis bootdev none clear-cmos=yes'
    
  4. Power the node off.

  5. After node power is reported as OFF, power it on again.

  6. Once node power is reported as ON, the BIOS update is complete.
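Step 1 of this procedure compares the ipmitool response with the expected bytes 01 05 80 80 00 00 00. As a minimal sketch, the comparison can be automated as follows. It assumes the response is the last non-empty line of the captured output and ignores leading spaces and letter case; the helper name response_matches is hypothetical.

```python
EXPECTED = "01 05 80 80 00 00 00"


def response_matches(output, expected=EXPECTED):
    """Return True if the last non-empty line of output equals the expected bytes."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return False
    # Compare token by token so extra spacing does not cause a false mismatch.
    return lines[-1].lower().split() == expected.lower().split()


if __name__ == "__main__":
    captured = "\n 01 05 80 80 00 00 00\n"
    print(response_matches(captured))
```

If the check fails, stop and consult your HPE service representative, as step 1 instructs.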