The following covers redeploying the Spire service and restoring the data.
(ncn-mw#) Verify that a backup of the Spire Postgres data exists.
Verify that a completed backup exists.
cray artifacts list postgres-backup --format json | jq -r '.artifacts[].Key | select(contains("spire"))'
Example output:
spire-postgres-2022-09-14T03:10:04.manifest
spire-postgres-2022-09-14T03:10:04.psql
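A backup is only usable if both the .manifest and the .psql object exist for the same timestamp. The following is a minimal sketch of that check; latest_complete_backup is a hypothetical helper (not part of the cray CLI), and it assumes the key naming shown in the example output above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads artifact keys on stdin (one per line) and prints
# the newest "spire-postgres-<timestamp>" prefix that has BOTH a .manifest
# and a .psql object. Prints nothing if no complete pair is found.
latest_complete_backup() {
    grep -E '^spire-postgres-.*\.(manifest|psql)$' \
        | sort -u \
        | sed -E 's/\.(manifest|psql)$//' \
        | sort \
        | uniq -d \
        | tail -n 1
}
```

One possible way to use it is to feed it every key in the bucket and treat empty output as "no complete backup":

cray artifacts list postgres-backup --format json | jq -r '.artifacts[].Key' | latest_complete_backup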
(ncn-mw#) Uninstall the chart and wait for the resources to terminate.
Note the version of the chart that is currently deployed.
helm history -n spire spire
Example output:
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Tue Aug 2 22:14:31 2022 deployed spire-2.6.0 0.12.2 Install complete
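Because the release history is no longer available after the chart is uninstalled, it can help to capture the deployed chart string before continuing. The sketch below assumes the table layout shown in the example output; deployed_chart is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `helm history` table output on stdin and prints
# the chart (name-version) from the last row whose status is "deployed".
deployed_chart() {
    awk '
        {
            is_deployed = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "deployed") is_deployed = 1
                # The chart column follows the status column in each row.
                if (is_deployed && $i ~ /^spire-[0-9]/) chart = $i
            }
        }
        END { if (chart != "") print chart }
    '
}
```

For example, the value could be saved in a shell variable for later reference:

SPIRE_CHART="$(helm history -n spire spire | deployed_chart)"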
Uninstall the chart.
helm uninstall -n spire spire
Example output:
release "spire" uninstalled
Wait for the resources to terminate, delete the PVCs, and clean up spire-agent before reinstalling the chart.
Verify that no Spire pods are running.
watch "kubectl get pods -n spire"
Example output:
No resources found in spire namespace.
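As an alternative to watching the output by hand, the wait can be scripted with a timeout. This is a sketch only: wait_for_no_pods is a hypothetical helper, the default timeout and polling interval are assumptions, and it relies on kubectl printing nothing on standard output when the namespace has no pods.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll until no pods remain in a namespace.
# Usage: wait_for_no_pods <namespace> [timeout_seconds] [interval_seconds]
# Returns 0 when the namespace is empty, 1 on timeout, 2 if kubectl fails.
wait_for_no_pods() {
    local namespace="$1" timeout="${2:-300}" interval="${3:-5}" elapsed=0 pods
    while true; do
        # With --no-headers, stdout is empty once there are no pods left.
        pods="$(kubectl get pods -n "${namespace}" --no-headers 2>/dev/null)" || return 2
        [ -z "${pods}" ] && break
        if [ "${elapsed}" -ge "${timeout}" ]; then
            echo "Pods still present in ${namespace} after ${timeout}s" >&2
            return 1
        fi
        sleep "${interval}"
        elapsed=$((elapsed + interval))
    done
    echo "No pods remain in ${namespace}"
}
```

Calling wait_for_no_pods spire would then block until the namespace is empty or the timeout expires.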
Delete the Spire PVCs.
kubectl get pvc -n spire | grep spire-data-spire-server | awk '{print $1}' | xargs kubectl delete -n spire pvc
Example output:
persistentvolumeclaim "spire-data-spire-server-0" deleted
persistentvolumeclaim "spire-data-spire-server-1" deleted
persistentvolumeclaim "spire-data-spire-server-2" deleted
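Before deleting anything, it may be worth previewing exactly which claims the filter selects. The sketch below is one way to do that; spire_server_pvcs is a hypothetical helper that expects the output of kubectl get pvc -n spire -o name on standard input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only the spire-server data claims and strip the
# "persistentvolumeclaim/" prefix that `kubectl get pvc -o name` adds.
spire_server_pvcs() {
    sed -n 's|^persistentvolumeclaim/\(spire-data-spire-server-[0-9][0-9]*\)$|\1|p'
}
```

Running kubectl get pvc -n spire -o name | spire_server_pvcs lists the candidate claims without deleting them.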
Clean up spire-agent.
for ncn in $(kubectl get nodes -o name | cut -d'/' -f2); do
    echo "Cleaning up NCN ${ncn}"
    ssh "${ncn}" systemctl stop spire-agent
    ssh "${ncn}" rm -v /var/lib/spire/data/svid.key /var/lib/spire/agent_svid.der /var/lib/spire/bundle.der
done
(ncn-mw#) Redeploy the chart and wait for the resources to start.
Follow the Redeploying a Chart procedure with the following specifications:
Name of chart to be redeployed: spire
Base name of manifest: sysmgmt
When reaching the step to update customizations, no edits need to be made to the customizations file.
When reaching the step to validate that the redeploy was successful, perform the following step:
Only follow this step as part of the previously linked chart redeploy procedure.
Wait for the resources to start.
watch "kubectl get pods -n spire"
Example output:
NAME READY STATUS RESTARTS AGE
request-ncn-join-token-89hp7 2/2 Running 0 31m
request-ncn-join-token-fvqdj 2/2 Running 0 31m
request-ncn-join-token-h7qc2 2/2 Running 0 31m
request-ncn-join-token-wv56n 2/2 Running 0 31m
request-ncn-join-token-dnfhk 2/2 Running 0 31m
request-ncn-join-token-hbvwc 2/2 Running 0 31m
spire-agent-cmn9q 1/1 Running 0 31m
spire-agent-gzn2d 1/1 Running 0 31m
spire-agent-pl595 1/1 Running 0 31m
spire-create-pooler-schema-1-g6gr6 0/3 Completed 0 31m
spire-jwks-6c97b5694f-d94rg 3/3 Running 0 31m
spire-jwks-6c97b5694f-h89lb 3/3 Running 0 31m
spire-jwks-6c97b5694f-kz9k4 3/3 Running 0 31m
spire-postgres-0 3/3 Running 0 31m
spire-postgres-1 3/3 Running 0 31m
spire-postgres-2 3/3 Running 0 30m
spire-postgres-pooler-695d4cd48f-57p5s 2/2 Running 0 30m
spire-postgres-pooler-695d4cd48f-bzm6n 2/2 Running 0 30m
spire-postgres-pooler-695d4cd48f-mv57z 2/2 Running 0 30m
spire-server-0 2/2 Running 4 31m
spire-server-1 2/2 Running 0 28m
spire-server-2 2/2 Running 0 28m
spire-update-bss-1-cfbxc 0/2 Completed 0 31m
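To spot pods that have not come up without reading the whole table, the output can be filtered. This is a sketch under the assumption that the columns are NAME, READY, STATUS, RESTARTS, AGE as in the example output; not_ready_pods is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints the name of every pod that is neither Completed nor Running
# with all of its containers ready (READY n/n).
not_ready_pods() {
    awk '
        NF == 0 { next }             # skip blank lines
        $3 == "Completed" { next }   # finished jobs are expected to be 0/n
        {
            split($2, ready, "/")
            if ($3 != "Running" || ready[1] != ready[2]) print $1
        }
    '
}
```

With this helper, kubectl get pods -n spire --no-headers | not_ready_pods prints nothing once every pod is healthy.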
Rejoin the storage nodes to Spire and restart the spire-agent on all NCNs.
/opt/cray/platform-utils/spire/fix-spire-on-storage.sh
for i in $(kubectl get nodes -o name | cut -d"/" -f2) $(ceph node ls | jq -r '.[] | keys[]' | sort -u); do ssh "$i" systemctl start spire-agent; done
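After the agents have been started, their state can be confirmed per node. The following sketch assumes passwordless SSH to each NCN, as the loop above already does; check_spire_agents is a hypothetical helper and the node names passed to it are supplied by the caller.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the spire-agent unit state for each node given
# as an argument. Returns non-zero if any node is not "active".
check_spire_agents() {
    local node state failed=0
    for node in "$@"; do
        # `systemctl is-active` prints the unit state and exits non-zero
        # unless the unit is active.
        state="$(ssh "${node}" systemctl is-active spire-agent 2>/dev/null)" || failed=1
        echo "${node}: ${state:-unknown}"
    done
    return "${failed}"
}
```

It can be called with the same node list used to start the agents, for example check_spire_agents $(kubectl get nodes -o name | cut -d"/" -f2).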