A WAL (write-ahead log) event can occur because of replication lag, network communication problems, or bandwidth issues. This can cause the PVC, which is hosted by Ceph and mounted inside the container at /home/postgres/pgdata, to fill and the database to stop running.
If no database dump exists, then the disk space issue must be fixed so that a dump can be taken; the dump can then be restored to a newly created postgresql cluster. If a dump already exists, then skip to Rebuild the cluster and restore the data.
If no dump exists and neither recovery option results in a successful dump, then service-specific Disaster Recovery for Postgres is required.
The recovery workflow begins by restoring a running database, because a running database is needed in order to dump the current data.
The following example is based on cray-smd-postgres.
(ncn-mw#) Confirm that the database is down (no endpoints exist) and that the disk is full on one or more postgresql cluster members.
POSTGRESQL=cray-smd-postgres
NAMESPACE=services
kubectl get endpoints ${POSTGRESQL} -n ${NAMESPACE}
Expected output looks similar to:
NAME ENDPOINTS AGE
cray-smd-postgres 3d22h
for i in {0..2}; do
echo "${POSTGRESQL}-${i}:" ; kubectl exec ${POSTGRESQL}-${i} -n ${NAMESPACE} -c postgres -- df -h pgdata
done
Expected output looks similar to:
cray-smd-postgres-0:
Filesystem Size Used Avail Use% Mounted on
/dev/rbd8 30G 28G 1.6G 95% /home/postgres/pgdata
cray-smd-postgres-1:
Filesystem Size Used Avail Use% Mounted on
/dev/rbd15 30G 30G 0 100% /home/postgres/pgdata
cray-smd-postgres-2:
Filesystem Size Used Avail Use% Mounted on
/dev/rbd6 30G 383M 30G 2% /home/postgres/pgdata
If the database is down and the disk is full because of replication issues, there are two ways to attempt to get back to a running database: either delete files or resize the Postgres PVCs until the database is able to start running again.
The following example is based on cray-smd-postgres.
Clear files from /home/postgres/pgdata/pgroot/pg_log/ until the database is running again and allowing connections.
For example, if the disk space is at 100%, then copy the logs off (optional), exec into the pod, and then clear the logs to recover some disk space.
(ncn-mw#) Copy off the logs. (Optional)
kubectl cp "${POSTGRESQL}-1":/home/postgres/pgdata/pgroot/pg_log /tmp -c postgres -n ${NAMESPACE}
(ncn-mw#) Open an interactive shell to the container.
kubectl exec "${POSTGRESQL}-1" -n ${NAMESPACE} -c postgres -it -- bash
(postgres-container#) Clear the logs.
for i in {0..7}; do > /home/postgres/pgdata/pgroot/pg_log/postgresql-$i.csv; done
(ncn-mw#) Restart the Postgres cluster and postgres-operator.
kubectl delete pod -n ${NAMESPACE} "${POSTGRESQL}-0" "${POSTGRESQL}-1" "${POSTGRESQL}-2"
kubectl delete pod -n services -l app.kubernetes.io/name=postgres-operator
(ncn-mw#) Check if the database is running. If it is running, then proceed to Dump the data.
kubectl exec "${POSTGRESQL}-1" -n ${NAMESPACE} -c postgres -it -- psql -U postgres
Example of output indicating that the database is running (type \q to exit the shell):
psql (12.2 (Ubuntu 12.2-1.pgdg18.04+1), server 11.7 (Ubuntu 11.7-1.pgdg18.04+1))
Type "help" for help.
postgres=#
If the database is still not running, then delete files from /home/postgres/pgdata/pgroot/data/pg_wal/.
CAUTION: This method could result in unintended consequences for the Postgres database and long service downtime; do not use unless there is a known Disaster Recovery for Postgres procedure for repopulating the Postgres cluster.
(ncn-mw#) Open an interactive shell to the container.
kubectl exec "${POSTGRESQL}-1" -n ${NAMESPACE} -c postgres -it -- bash
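Optionally, before deleting anything, check how much space the WAL files are consuming. This is a minimal check run from inside the same container (the path matches the pgdata mount shown earlier):
du -sh /home/postgres/pgdata/pgroot/data/pg_wal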
(postgres-container#) Delete the files.
rm pgdata/pgroot/data/pg_wal/0*
(ncn-mw#) Restart the Postgres cluster and postgres-operator.
kubectl delete pod -n ${NAMESPACE} "${POSTGRESQL}-0" "${POSTGRESQL}-1" "${POSTGRESQL}-2"
kubectl delete pod -n services -l app.kubernetes.io/name=postgres-operator
Check if the database is running using the same method as in the earlier step.
If the database is still not running, then try the other recovery option described in this document (resizing the Postgres PVCs).
The following example is based on cray-smd-postgres, where the postgresql cray-smd-postgres resource and the pgdata-cray-smd-postgres PVCs will be resized from 100Gi to 120Gi.
(ncn-mw#) Determine the current size of the Postgres PVCs and set PGRESIZE to the desired new size (it must be larger than the current size).
Get the name of the postgresql resource.
kubectl get postgresql -A | grep "smd\|NAME"
Expected output:
NAMESPACE NAME TEAM VERSION PODS VOLUME CPU-REQUEST MEMORY-REQUEST AGE STATUS
services cray-smd-postgres cray-smd 11 3 100Gi 4 8Gi 18h Running
List the PVCs associated with it.
kubectl get pvc -A | grep "cray-smd-postgres\|NAME"
Expected output:
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
services pgdata-cray-smd-postgres-0 Bound pvc-c86859f4-a57f-4694-a66a-8120e96a1ab4 100Gi RWO k8s-block-replicated 18h
services pgdata-cray-smd-postgres-1 Bound pvc-300f52e4-f88d-47ef-9a1e-e598fd919047 100Gi RWO k8s-block-replicated 18h
services pgdata-cray-smd-postgres-2 Bound pvc-f33879f3-0e99-4299-b796-210fbb693a2f 100Gi RWO k8s-block-replicated 18h
Set variables based on the output of these commands.
PGRESIZE=120Gi
POSTGRESQL=cray-smd-postgres
PGDATA=pgdata-cray-smd-postgres
NAMESPACE=services
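As an optional sanity check, compare the volume size currently recorded in the postgresql resource against the requested new size. This is a minimal sketch; it simply prints both values so they can be compared by eye:
CURRENT_SIZE=$(kubectl get postgresql ${POSTGRESQL} -n ${NAMESPACE} -o jsonpath='{.spec.volume.size}')
echo "Current volume size: ${CURRENT_SIZE}; requested size: ${PGRESIZE}"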
(ncn-mw#) Edit numberOfInstances in the postgresql resource from 3 to 1.
kubectl patch postgresql ${POSTGRESQL} -n ${NAMESPACE} --type=json -p='[{"op" : "replace", "path":"/spec/numberOfInstances", "value" : 1}]'
Expected output:
postgresql.acid.zalan.do/cray-smd-postgres patched
(ncn-mw#) Wait for 2 of the 3 postgresql pods to terminate.
while [ $(kubectl get pods -l "application=spilo,cluster-name=${POSTGRESQL}" -n ${NAMESPACE} | grep -v NAME | wc -l) != 1 ] ; do
echo " waiting for pods to terminate"
sleep 2
done
(ncn-mw#) Delete the PVCs from the non-running Postgres pods.
kubectl delete pvc "${PGDATA}-1" "${PGDATA}-2" -n ${NAMESPACE}
Expected output:
persistentvolumeclaim "pgdata-cray-smd-postgres-1" deleted
persistentvolumeclaim "pgdata-cray-smd-postgres-2" deleted
(ncn-mw#) Resize the remaining Postgres PVC resources.requests.storage to $PGRESIZE.
kubectl patch -p '{"spec": {"resources": {"requests": {"storage": "'${PGRESIZE}'"}}}}' "pvc/${PGDATA}-0" -n ${NAMESPACE}
Expected output:
persistentvolumeclaim/pgdata-cray-smd-postgres-0 patched
(ncn-mw#) Wait for the PVC to resize.
while [ -z "$(kubectl describe pvc "${PGDATA}-0" -n ${NAMESPACE} | grep FileSystemResizeSuccessful)" ] ; do
echo " waiting for PVC to resize" ; sleep 2
done
(ncn-mw#) Update the postgresql resource spec.volume.size to $PGRESIZE.
kubectl get "postgresql/${POSTGRESQL}" -n ${NAMESPACE} -o json | jq '.spec.volume = {"size": "'${PGRESIZE}'"}' | kubectl apply -f -
Expected output:
postgresql.acid.zalan.do/cray-smd-postgres configured
(ncn-mw#) Restart the existing postgresql pod.
kubectl delete pod "${POSTGRESQL}-0" -n services
Expected output:
pod "cray-smd-postgres-0" deleted
(ncn-mw#) Perform verifications.
Verify that the single instance pod is Running with 3/3 Ready.
kubectl get pods -l "application=spilo,cluster-name=${POSTGRESQL}" -n ${NAMESPACE}
Expected output:
NAME READY STATUS RESTARTS AGE
cray-smd-postgres-0 3/3 Running 0 14s
Verify that patronictl reports the member is running.
kubectl exec "${POSTGRESQL}-0" -n ${NAMESPACE} -c postgres -it -- patronictl list
Expected output:
+-------------------+---------------------+------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+-------------------+---------------------+------------+--------+---------+----+-----------+
| cray-smd-postgres | cray-smd-postgres-0 | 10.44.0.38 | Leader | running | 2 | |
+-------------------+---------------------+------------+--------+---------+----+-----------+
Verify that the postgresql resource is Running with the new volume size ($PGRESIZE).
kubectl get postgresql ${POSTGRESQL} -n ${NAMESPACE}
Expected output:
NAME TEAM VERSION PODS VOLUME CPU-REQUEST MEMORY-REQUEST AGE STATUS
cray-smd-postgres cray-smd 11 1 120Gi 500m 100Mi 11m Running
Verify that the database is running.
kubectl exec "${POSTGRESQL}-0" -n services -c postgres -it -- psql -U postgres
Example of output indicating that the database is running (type \q to exit the shell):
psql (12.2 (Ubuntu 12.2-1.pgdg18.04+1), server 11.7 (Ubuntu 11.7-1.pgdg18.04+1))
Type "help" for help.
postgres=#
(ncn-mw#) Scale numberOfInstances in the postgresql resource from 1 back to 3.
kubectl patch postgresql "${POSTGRESQL}" -n "${NAMESPACE}" --type='json' -p='[{"op" : "replace", "path":"/spec/numberOfInstances", "value" : 3}]'
Expected output:
postgresql.acid.zalan.do/cray-smd-postgres patched
(ncn-mw#) Check the logs. They may indicate a WAL error such as the following, but a dump can be taken at this point.
kubectl logs "${POSTGRESQL}-0" -n ${NAMESPACE} -c postgres | grep -i error
kubectl logs "${POSTGRESQL}-1" -n ${NAMESPACE} -c postgres | grep -i error
kubectl logs "${POSTGRESQL}-2" -n ${NAMESPACE} -c postgres | grep -i error
Output if there is an error:
error: could not get write-ahead log end position from server: ERROR: invalid segment number
Update the customizations.yaml file with the same changes.
In order to persist any Postgres PVC storage volume size changes, the same change must also be made to the customer-managed customizations.yaml file.
For more information, see Postgres PVC resize.
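As a rough illustration only, the customizations.yaml change has the following shape. The exact key path under spec.kubernetes.services depends on the chart values for the affected service; the path shown here is hypothetical, so confirm it against Postgres PVC resize before editing:
spec:
  kubernetes:
    services:
      cray-hms-smd:           # hypothetical service key; use the key for the affected service
        cray-service:
          sqlCluster:
            volumeSize: 120Gi # match the new size set in PGRESIZE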
If the recovery was successful such that the database is now running, then continue with the following steps to dump the data.
(ncn-mw#) Set helper variables for the client service and postgresql resource in question.
CLIENT=cray-smd
POSTGRESQL=cray-smd-postgres
NAMESPACE=services
(ncn-mw#) Record the number of replicas the client service is using, and then scale it to 0.
The following example is based on cray-smd. The cray-smd client service is deployed as a deployment with 3 replicas. Other services may differ in type (for example, statefulset) or number of replicas.
Record the number of replicas.
CLIENT_REPLICAS=$(kubectl get deployment -n ${NAMESPACE} ${CLIENT} -o jsonpath='{.spec.replicas}'); echo ${CLIENT_REPLICAS}
Expected output:
3
Scale it to 0.
kubectl scale deployment ${CLIENT} -n ${NAMESPACE} --replicas=0
Expected output:
deployment.apps/cray-smd scaled
(ncn-mw#) Wait for the running pods to terminate.
while [ $(kubectl get pods -n ${NAMESPACE} -l app.kubernetes.io/name="${CLIENT}" | grep -v NAME | wc -l) != 0 ] ; do
echo " waiting for pods to terminate"
sleep 2
done
(ncn-mw#) Dump all the data.
Determine which Postgres member is the leader.
kubectl exec "${POSTGRESQL}-0" -n ${NAMESPACE} -c postgres -it -- patronictl list
Expected output:
+-------------------+---------------------+------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+-------------------+---------------------+------------+--------+---------+----+-----------+
| cray-smd-postgres | cray-smd-postgres-0 | 10.42.0.25 | Leader | running | 1 | |
| cray-smd-postgres | cray-smd-postgres-1 | 10.44.0.34 | | running | | 0 |
| cray-smd-postgres | cray-smd-postgres-2 | 10.36.0.44 | | running | | 0 |
+-------------------+---------------------+------------+--------+---------+----+-----------+
Dump the data to a local file.
POSTGRES_LEADER=cray-smd-postgres-0
kubectl exec -it ${POSTGRES_LEADER} -n ${NAMESPACE} -c postgres -- pg_dumpall -c -U postgres > "${POSTGRESQL}-dumpall.sql"
ls "${POSTGRESQL}-dumpall.sql"
Expected output:
cray-smd-postgres-dumpall.sql
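Optionally, confirm that the dump file is non-empty and begins with SQL statements before proceeding. For example:
head -n 5 "${POSTGRESQL}-dumpall.sql"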
If recovery was successful such that a dump could be taken, or a dump already exists, then continue with the following steps to rebuild the postgresql cluster and restore the data.
The following example restores the dump to the cray-smd-postgres cluster.
If the client service is not yet scaled to 0, follow the step above to “scale the client service to 0”.
(ncn-mw#) Delete and re-create the postgresql resource (which includes the PVCs).
Set helper variables.
CLIENT=cray-smd
POSTGRESQL=cray-smd-postgres
NAMESPACE=services
Save the postgresql resource definition to a file.
kubectl get postgresql ${POSTGRESQL} -n ${NAMESPACE} -o json |
jq 'del(.spec.selector)' |
jq 'del(.spec.template.metadata.labels."controller-uid")' |
jq 'del(.status)' > postgres-cr.yaml
Delete the current postgresql resource.
kubectl delete -f postgres-cr.yaml
Expected output:
postgresql.acid.zalan.do "cray-smd-postgres" deleted
Wait for the pods to terminate.
while [ $(kubectl get pods -l "application=spilo,cluster-name=${POSTGRESQL}" -n ${NAMESPACE} | grep -v NAME | wc -l) != 0 ] ; do
echo " waiting for pods to terminate" ; sleep 2
done
Create a new postgresql resource.
kubectl create -f postgres-cr.yaml
Expected output:
postgresql.acid.zalan.do/cray-smd-postgres created
Wait for the pods to start running.
while [ $(kubectl get pods -l "application=spilo,cluster-name=${POSTGRESQL}" -n ${NAMESPACE} | grep -v NAME | wc -l) != 3 ] ; do
echo " waiting for pods to start running" ; sleep 2
done
(ncn-mw#) Determine which Postgres member is the leader.
kubectl exec "${POSTGRESQL}-0" -n ${NAMESPACE} -c postgres -it -- patronictl list
Expected output:
+-------------------+---------------------+------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+-------------------+---------------------+------------+--------+---------+----+-----------+
| cray-smd-postgres | cray-smd-postgres-0 | 10.42.0.25 | Leader | running | 1 | |
| cray-smd-postgres | cray-smd-postgres-1 | 10.44.0.34 | | running | | 0 |
| cray-smd-postgres | cray-smd-postgres-2 | 10.36.0.44 | | running | | 0 |
+-------------------+---------------------+------------+--------+---------+----+-----------+
(ncn-mw#) Record the name of the leader.
POSTGRES_LEADER=cray-smd-postgres-0
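Alternatively, the leader name can be captured programmatically. This is a sketch only; it assumes the patronictl in this image supports JSON output and that jq is available on the node:
# Select the member whose Role is Leader from the patronictl JSON output
POSTGRES_LEADER=$(kubectl exec "${POSTGRESQL}-0" -n ${NAMESPACE} -c postgres -- \
    patronictl list --format json | jq -r '.[] | select(.Role == "Leader") | .Member')
echo "${POSTGRES_LEADER}"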
(ncn-mw#) Copy the dump taken earlier to the Postgres leader pod and restore the data.
If the dump exists in a different location, adjust this example as needed.
kubectl cp "./${POSTGRESQL}-dumpall.sql" "${POSTGRES_LEADER}:/home/postgres/${POSTGRESQL}-dumpall.sql" -c postgres -n ${NAMESPACE}
kubectl exec ${POSTGRES_LEADER} -c postgres -n ${NAMESPACE} -i -- psql -U postgres < "${POSTGRESQL}-dumpall.sql"
Restore the secrets.
Once the dump has been restored onto the newly built postgresql cluster, the current Kubernetes secrets need to be updated in the postgresql cluster; otherwise, the service will experience readiness and liveness probe failures because it will be unable to authenticate to the database.
(ncn-mw#) Determine what secrets are associated with the postgresql credentials.
kubectl get secrets -n ${NAMESPACE} | grep "${POSTGRESQL}.credentials"
Expected output:
services hmsdsuser.cray-smd-postgres.credentials Opaque 2 31m
services postgres.cray-smd-postgres.credentials Opaque 2 31m
services service-account.cray-smd-postgres.credentials Opaque 2 31m
services standby.cray-smd-postgres.credentials Opaque 2 31m
For each secret above, get the username and password from Kubernetes and update the Postgres database with this information. (A scripted sketch that loops over all of the credentials secrets follows these steps.)
The following example uses the hmsdsuser.cray-smd-postgres.credentials secret.
(ncn-mw#) Get the username.
kubectl get secret hmsdsuser.cray-smd-postgres.credentials -n ${NAMESPACE} -ojsonpath='{.data.username}' | base64 -d
Possible output:
hmsdsuser
(ncn-mw#) Get the password.
kubectl get secret hmsdsuser.cray-smd-postgres.credentials -n ${NAMESPACE} -ojsonpath='{.data.password}'| base64 -d
Possible output:
ABCXYZ
(ncn-mw#) Open an interactive Postgres console in the leader container.
kubectl exec ${POSTGRES_LEADER} -n ${NAMESPACE} -c postgres -it -- /usr/bin/psql postgres postgres
(postgres#) Update the password for the user.
ALTER USER hmsdsuser WITH PASSWORD 'ABCXYZ';
On success, output resembles the following:
ALTER ROLE
(postgres#) If this is the last user being updated, enter \q to exit the console.
Continue the above process until all ${POSTGRESQL}.credentials secrets have been updated in the database.
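These updates can also be scripted from ncn-mw. The following is a minimal sketch rather than part of the documented procedure; it assumes each credentials secret contains username and password keys and that no password contains a single quote:
# Loop over the credentials secrets and update the matching Postgres role passwords
for SECRET in $(kubectl get secrets -n ${NAMESPACE} -o name | grep "${POSTGRESQL}.credentials" | cut -d/ -f2); do
    USERNAME=$(kubectl get secret "${SECRET}" -n ${NAMESPACE} -o jsonpath='{.data.username}' | base64 -d)
    PASSWORD=$(kubectl get secret "${SECRET}" -n ${NAMESPACE} -o jsonpath='{.data.password}' | base64 -d)
    echo "Updating password for role ${USERNAME}"
    kubectl exec "${POSTGRES_LEADER}" -n ${NAMESPACE} -c postgres -- \
        psql -U postgres -c "ALTER USER \"${USERNAME}\" WITH PASSWORD '${PASSWORD}';"
done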
(ncn-mw#) Restart the postgresql cluster and wait for it to start running.
Restart the cluster.
kubectl delete pod "${POSTGRESQL}-0" "${POSTGRESQL}-1" "${POSTGRESQL}-2" -n ${NAMESPACE}
Wait for it to start running.
while [ $(kubectl get postgresql ${POSTGRESQL} -n ${NAMESPACE} -o json |jq -r '.status.PostgresClusterStatus') != "Running" ]; do
echo "waiting for ${POSTGRESQL} to start running"; sleep 2
done
(ncn-mw#) Scale the client service back to the original number of replicas and wait for it to start running.
The number of replicas was saved in the CLIENT_REPLICAS variable.
Scale it back up.
The following example is based on cray-smd. The cray-smd client service is deployed as a deployment. Other services may differ in type (for example, statefulset).
kubectl scale deployment ${CLIENT} -n ${NAMESPACE} --replicas=${CLIENT_REPLICAS}
Wait for all of the pods to start running.
while [ $(kubectl get pods -n ${NAMESPACE} -l app.kubernetes.io/name="${CLIENT}" | grep -v NAME | wc -l) != ${CLIENT_REPLICAS} ] ; do
echo " waiting for pods to start running" ; sleep 2
done