
Restore Hardware State Manager (HSM) Postgres without an Existing Backup

This procedure repopulates HSM when no Postgres backup exists.

Prerequisites

Healthy System Layout Service (SLS). If SLS was also affected, recover it first.

Healthy HSM service.

Verify that all three HSM Postgres replicas are up and running:

ncn# kubectl -n services get pods -l cluster-name=cray-smd-postgres
NAME                  READY   STATUS    RESTARTS   AGE
cray-smd-postgres-0   3/3     Running   0          18d
cray-smd-postgres-1   3/3     Running   0          18d
cray-smd-postgres-2   3/3     Running   0          18d
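
The replica check above can also be scripted. The following is a minimal sketch, not part of the official procedure: `count_unhealthy_pods` is a hypothetical helper name, and it assumes the default `kubectl get pods` table layout shown above (NAME, READY, STATUS as the first three columns).

```shell
# Sketch: count pods that are not fully ready.
# Reads the table printed by `kubectl get pods` on stdin and prints the
# number of rows whose STATUS is not "Running" or whose READY column
# (for example 2/3) shows fewer ready containers than total containers.
count_unhealthy_pods() {
    awk 'NR > 1 && NF >= 3 {
        split($2, ready, "/")
        if ($3 != "Running" || ready[1] != ready[2]) bad++
    }
    END { print bad + 0 }'
}

# Hypothetical usage on an NCN:
#   kubectl -n services get pods -l cluster-name=cray-smd-postgres | count_unhealthy_pods
```

A result of 0 indicates that every listed replica reports Running with all containers ready.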

Procedure

Re-run the HSM loader job.

ncn# kubectl -n services get job cray-smd-init -o json | jq 'del(.spec.selector)' | jq 'del(.spec.template.metadata.labels."controller-uid")' | kubectl replace --force -f -

Wait for the job to complete:

ncn# kubectl wait -n services job cray-smd-init --for=condition=complete --timeout=5m

Verify that the service is functional.

ncn# cray hsm service ready
code = 0
message = "HSM is healthy"

Get the number of node objects stored in HSM.

ncn# cray hsm state components list --type node --format json | jq .[].ID | wc -l
0

Restart MEDS and REDS.

To repopulate HSM with components, restart MEDS and REDS so that they add their known RedfishEndpoints back into HSM. This also triggers HSM rediscovery, which repopulates components and hardware inventory.

ncn# kubectl scale deployment cray-meds -n services --replicas=0
ncn# kubectl scale deployment cray-meds -n services --replicas=1
ncn# kubectl scale deployment cray-reds -n services --replicas=0
ncn# kubectl scale deployment cray-reds -n services --replicas=1
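
The scale-down and scale-up pairs above follow the same pattern for each deployment, so they can be wrapped in a small function. This is a sketch under assumptions: `restart_by_scaling` and the `KUBECTL` override variable are hypothetical names, and the trailing `kubectl rollout status` call is an optional addition that waits for the new pod to become ready; it is not part of the original procedure.

```shell
# Sketch: restart a deployment by scaling it to zero and back up.
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
KUBECTL=${KUBECTL:-kubectl}

restart_by_scaling() {
    name=$1                 # deployment name, e.g. cray-meds
    namespace=${2:-services}
    replicas=${3:-1}        # replica count to restore after the restart

    $KUBECTL scale deployment "$name" -n "$namespace" --replicas=0 &&
    $KUBECTL scale deployment "$name" -n "$namespace" --replicas="$replicas" &&
    # Assumption: wait up to two minutes for the rollout to finish.
    $KUBECTL rollout status deployment "$name" -n "$namespace" --timeout=120s
}

# Hypothetical usage on an NCN:
#   restart_by_scaling cray-meds
#   restart_by_scaling cray-reds
```

Setting `KUBECTL=echo` before calling the function prints the commands instead of running them, which is a simple way to review what would be executed.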

Wait for the RedfishEndpoints table to get repopulated and discovery to complete.

ncn# cray hsm inventory redfishEndpoints list --format json | jq .[].ID | wc -l
100
ncn# cray hsm inventory redfishEndpoints list --format json | grep -c "DiscoveryStarted"
0
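
Discovery can take some time, so the `DiscoveryStarted` check above usually has to be repeated until it reports 0. The following polling loop is a minimal sketch: `count_in_progress` and `wait_for_discovery` are hypothetical helper names, and the attempt count and delay are arbitrary defaults rather than recommended values.

```shell
# Sketch: count endpoints still being discovered.
# Reads the redfishEndpoints JSON listing on stdin.
count_in_progress() {
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -c "DiscoveryStarted" || true
}

# Poll until no endpoint reports DiscoveryStarted.
# $1: command that prints the endpoint listing
# $2: maximum number of checks (default 30)
# $3: seconds to sleep between checks (default 10)
wait_for_discovery() {
    list_cmd=$1
    max_attempts=${2:-30}
    delay=${3:-10}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Stop early if the listing itself fails, rather than reading
        # an empty result as "discovery complete".
        output=$($list_cmd) || { echo "list command failed" >&2; return 2; }
        remaining=$(printf '%s\n' "$output" | count_in_progress)
        if [ "$remaining" -eq 0 ]; then
            echo "discovery complete after $attempt check(s)"
            return 0
        fi
        echo "$remaining endpoint(s) still discovering; retrying in ${delay}s"
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "timed out waiting for discovery" >&2
    return 1
}

# Hypothetical usage on an NCN:
#   wait_for_discovery "cray hsm inventory redfishEndpoints list --format json" 60 30
```

The function returns 0 once discovery has finished, 1 on timeout, and 2 if the listing command fails, so it can gate later steps in a script.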

Check for Discovery Errors.
```
ncn# cray hsm inventory redfishEndpoints list --format json | grep LastDiscoveryStatus | grep -v -c "DiscoverOK"
```
If any of the RedfishEndpoint entries have a LastDiscoveryStatus other than DiscoverOK after discovery has completed, refer to the Troubleshoot Issues with Redfish Endpoint Discovery procedure for guidance.
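
When the count of non-`DiscoverOK` entries is not zero, a breakdown by status value helps decide where to start troubleshooting. The sketch below assumes that each status appears in the JSON output as `"LastDiscoveryStatus": "<value>"`; `summarize_discovery_status` is a hypothetical helper name.

```shell
# Sketch: tally LastDiscoveryStatus values, most frequent first.
# Reads the redfishEndpoints JSON listing on stdin.
summarize_discovery_status() {
    grep -o '"LastDiscoveryStatus": *"[^"]*"' \
        | sed 's/.*: *"\([^"]*\)"/\1/' \
        | sort | uniq -c | sort -rn
}

# Hypothetical usage on an NCN:
#   cray hsm inventory redfishEndpoints list --format json | summarize_discovery_status
```

Each output line shows a count followed by a status value, so any status other than `DiscoverOK` is visible along with how many endpoints report it.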

Re-apply any component group or partition customizations.

Any component groups or partitions created before HSM’s Postgres information was lost must be manually re-entered. See:
- Manage Component Groups
- Manage Component Partitions