Recover from Postgres WAL Event

A WAL (write-ahead log) event can occur because of replication lag, network communication issues, or bandwidth issues. This can cause the PVC hosted by Ceph and mounted inside the container at /home/postgres/pgdata to fill and the database to stop running. If no database dump exists, then the disk space issue needs to be fixed so that a dump can be taken; the dump can then be restored to a newly created postgresql cluster. If a dump already exists, then skip to Rebuild the cluster and restore the data.

If no database dump exists and neither option results in a successful dump, then service-specific Disaster Recovery for Postgres is required.

The recovery workflow:

  1. Attempt to recover to a running database

  2. Dump the data

  3. Rebuild the cluster and restore the data

1. Attempt to recover to a running database

A running database is needed to be able to dump the current data.

The following example is based on cray-smd-postgres.

(ncn-mw#) Confirm that the database is down (no endpoint exists) and that the disk is full on one or more postgresql cluster members.

POSTGRESQL=cray-smd-postgres
NAMESPACE=services
kubectl get endpoints ${POSTGRESQL} -n ${NAMESPACE}

Expected output looks similar to:

NAME                ENDPOINTS         AGE
cray-smd-postgres                     3d22h

Then check the disk usage on each cluster member:

for i in {0..2}; do
    echo "${POSTGRESQL}-${i}:" ; kubectl exec "${POSTGRESQL}-${i}" -n ${NAMESPACE} -c postgres -- df -h pgdata
done

Expected output looks similar to:

cray-smd-postgres-0:
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd8        30G   28G  1.6G  95% /home/postgres/pgdata
cray-smd-postgres-1:
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd15       30G   30G     0 100% /home/postgres/pgdata
cray-smd-postgres-2:
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd6        30G  383M   30G   2% /home/postgres/pgdata

If the database is down and the disk is full because of replication issues, there are two ways to attempt to get back to a running database: either delete files or resize the Postgres PVCs until the database is able to start running again.

Option 1: Clear logs and/or WAL files

The following example is based on cray-smd-postgres.

  1. Clear files from /home/postgres/pgdata/pgroot/pg_log/ until the database is running again and allowing connections.

    For example, if the disk space is at 100%, then copy the logs off (optional), exec into the pod, and then clear the logs to recover some disk space.

    1. (ncn-mw#) Copy off the logs. (Optional)

      kubectl cp "${POSTGRESQL}-1":/home/postgres/pgdata/pgroot/pg_log /tmp -c postgres -n ${NAMESPACE}
      
    2. (ncn-mw#) Open an interactive shell to the container.

      kubectl exec "${POSTGRESQL}-1" -n ${NAMESPACE} -c postgres -it -- bash
      
    3. (postgres-container#) Clear the logs.

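      # Truncate (rather than delete) each rotated CSV log file so disk space is freed even while PostgreSQL keeps the files open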
      for i in {0..7}; do > /home/postgres/pgdata/pgroot/pg_log/postgresql-$i.csv; done
      
  2. (ncn-mw#) Restart the Postgres cluster and postgres-operator.

    kubectl delete pod -n ${NAMESPACE} "${POSTGRESQL}-0" "${POSTGRESQL}-1" "${POSTGRESQL}-2"
    kubectl delete pod -n services -l app.kubernetes.io/name=postgres-operator
    
  3. (ncn-mw#) Check if the database is running. If it is running, then proceed to Dump the data.

    kubectl exec "${POSTGRESQL}-1" -n ${NAMESPACE} -c postgres -it -- psql -U postgres
    

    Example of output indicating that the database is running (type \q to exit the shell):

    psql (14.7 (Ubuntu 14.7-1.pgdg18.04+1), server 11.19 (Ubuntu 11.19-1.pgdg18.04+1))
    Type "help" for help.
    
    postgres=#
    
  4. If the database is still not running, then delete files from /home/postgres/pgdata/pgroot/data/pg_wal/.

    CAUTION: This method could result in unintended consequences for the Postgres database and long service downtime; do not use unless there is a known Disaster Recovery for Postgres procedure for repopulating the Postgres cluster.

    1. (ncn-mw#) Open an interactive shell to the container.

      kubectl exec "${POSTGRESQL}-1" -n ${NAMESPACE} -c postgres -it -- bash
      
    2. (postgres-container#) Delete the files.

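      # Paths are relative to the postgres user's home directory (/home/postgres); this removes the accumulated WAL segment files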
      rm pgdata/pgroot/data/pg_wal/0*
      
  5. (ncn-mw#) Restart the Postgres cluster and postgres-operator.

    kubectl delete pod -n ${NAMESPACE} "${POSTGRESQL}-0" "${POSTGRESQL}-1" "${POSTGRESQL}-2"
    kubectl delete pod -n services -l app.kubernetes.io/name=postgres-operator
    
  6. Check if the database is running using the same method as in the earlier step.
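
    For convenience, the same check as before (the database is running if psql connects and shows a postgres=# prompt):

    kubectl exec "${POSTGRESQL}-1" -n ${NAMESPACE} -c postgres -it -- psql -U postgres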

    If the database is still not running, then try recovering using the other option listed in this document.

Option 2: Resize the Postgres PVCs

The following example is based on cray-smd-postgres, where the postgresql cray-smd-postgres resource and the pgdata-cray-smd-postgres PVCs will be resized from 100Gi to 120Gi.

  1. (ncn-mw#) Determine the current size of the Postgres PVCs and set PGRESIZE to the desired new size (it must be larger than the current size).

    1. Get the name of the postgresql resource.

      kubectl get postgresql -A | grep "smd\|NAME"
      

      Expected output:

      NAMESPACE   NAME                         TEAM                VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE   STATUS
      services    cray-smd-postgres            cray-smd            11        3      100Gi    4             8Gi              18h   Running
      
    2. List the PVCs associated with it.

      kubectl get pvc -A | grep "cray-smd-postgres\|NAME"
      

      Expected output:

      NAMESPACE      NAME                         STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
      services       pgdata-cray-smd-postgres-0   Bound     pvc-c86859f4-a57f-4694-a66a-8120e96a1ab4   100Gi      RWO            k8s-block-replicated   18h
      services       pgdata-cray-smd-postgres-1   Bound     pvc-300f52e4-f88d-47ef-9a1e-e598fd919047   100Gi      RWO            k8s-block-replicated   18h
      services       pgdata-cray-smd-postgres-2   Bound     pvc-f33879f3-0e99-4299-b796-210fbb693a2f   100Gi      RWO            k8s-block-replicated   18h
      
    3. Set variables based on the output of these commands.

      PGRESIZE=120Gi
      POSTGRESQL=cray-smd-postgres
      PGDATA=pgdata-cray-smd-postgres
      NAMESPACE=services
      
  2. (ncn-mw#) Edit numberOfInstances in the postgresql resource from 3 to 1.

    kubectl patch postgresql ${POSTGRESQL} -n ${NAMESPACE} --type=json -p='[{"op" : "replace", "path":"/spec/numberOfInstances", "value" : 1}]'
    

    Expected output:

     postgresql.acid.zalan.do/cray-smd-postgres patched
    
  3. (ncn-mw#) Wait for 2 of the 3 postgresql pods to terminate.

    while [ $(kubectl get pods -l "application=spilo,cluster-name=${POSTGRESQL}" -n ${NAMESPACE} | grep -v NAME | wc -l) != 1 ] ; do
        echo "  waiting for pods to terminate"
        sleep 2
    done
    
  4. (ncn-mw#) Delete the PVCs from the non-running Postgres pods.

    kubectl delete pvc "${PGDATA}-1" "${PGDATA}-2" -n ${NAMESPACE}
    

    Expected output:

    persistentvolumeclaim "pgdata-cray-smd-postgres-1" deleted
    persistentvolumeclaim "pgdata-cray-smd-postgres-2" deleted
    
  5. (ncn-mw#) Resize the remaining Postgres PVC resources.requests.storage to $PGRESIZE.

    kubectl patch -p '{"spec": {"resources": {"requests": {"storage": "'${PGRESIZE}'"}}}}' "pvc/${PGDATA}-0" -n ${NAMESPACE}
    

    Expected output:

    persistentvolumeclaim/pgdata-cray-smd-postgres-0 patched
    
  6. (ncn-mw#) Wait for the PVC to resize.

    while [ -z "$(kubectl describe pvc "${PGDATA}-0" -n ${NAMESPACE} | grep FileSystemResizeSuccessful)" ] ; do
        echo "  waiting for PVC to resize" ; sleep 2
    done
    
  7. (ncn-mw#) Update the postgresql resource spec.volume.size to $PGRESIZE.

    kubectl get "postgresql/${POSTGRESQL}" -n ${NAMESPACE} -o json | jq '.spec.volume = {"size": "'${PGRESIZE}'"}' | kubectl apply -f -
    

    Expected output:

    postgresql.acid.zalan.do/cray-smd-postgres configured
    
  8. (ncn-mw#) Restart the existing postgresql pod.

    kubectl delete pod "${POSTGRESQL}-0" -n services
    

    Expected output:

    pod "cray-smd-postgres-0" deleted
    
  9. (ncn-mw#) Perform verifications.

    1. Verify that the single instance pod is Running with 3/3 Ready.

      kubectl get pods -l "application=spilo,cluster-name=${POSTGRESQL}" -n ${NAMESPACE}
      

      Expected output:

      NAME                  READY   STATUS    RESTARTS   AGE
      cray-smd-postgres-0   3/3     Running   0          14s
      
    2. Verify that patronictl reports the member is running.

      kubectl exec "${POSTGRESQL}-0" -n ${NAMESPACE} -c postgres -it -- patronictl list
      

      Expected output:

      +-------------------+---------------------+------------+--------+---------+----+-----------+
      |      Cluster      |        Member       |    Host    |  Role  |  State  | TL | Lag in MB |
      +-------------------+---------------------+------------+--------+---------+----+-----------+
      | cray-smd-postgres | cray-smd-postgres-0 | 10.44.0.38 | Leader | running |  2 |           |
      +-------------------+---------------------+------------+--------+---------+----+-----------+
      
    3. Verify that the postgresql resource is Running with the new volume size ($PGRESIZE).

      kubectl get postgresql ${POSTGRESQL} -n ${NAMESPACE}
      

      Expected output:

      NAME                TEAM       VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE   STATUS
      cray-smd-postgres   cray-smd   11        1      120Gi    500m          100Mi            11m   Running
      
    4. Verify that the database is running.

      kubectl exec "${POSTGRESQL}-0" -n services -c postgres -it -- psql -U postgres
      

      Example of output indicating that the database is running (type \q to exit the shell):

      psql (14.7 (Ubuntu 14.7-1.pgdg18.04+1), server 11.19 (Ubuntu 11.19-1.pgdg18.04+1))
      Type "help" for help.
      
      postgres=#
      
  10. (ncn-mw#) Scale numberOfInstances in the postgresql resource from 1 back to 3.

    kubectl patch postgresql "${POSTGRESQL}" -n "${NAMESPACE}" --type='json' -p='[{"op" : "replace", "path":"/spec/numberOfInstances", "value" : 3}]'
    

    Expected output:

    postgresql.acid.zalan.do/cray-smd-postgres patched
    
  11. (ncn-mw#) Check the Postgres logs. They may indicate a WAL error such as the following, but a dump can still be taken at this point.

    kubectl logs "${POSTGRESQL}-0" -n ${NAMESPACE} -c postgres | grep -i error
    kubectl logs "${POSTGRESQL}-1" -n ${NAMESPACE} -c postgres | grep -i error
    kubectl logs "${POSTGRESQL}-2" -n ${NAMESPACE} -c postgres | grep -i error
    

    Output if there is an error:

    error: could not get write-ahead log end position from server: ERROR:  invalid segment number
    
  12. Update the customizations.yaml file with the same changes.

    To persist any Postgres PVC storage volume size changes, the same change must also be made to the customer-managed customizations.yaml file.

    For more information, see Postgres PVC resize.
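
    The exact key path depends on the service's Helm chart, so the following fragment is a hypothetical illustration only; the Postgres PVC resize procedure documents the authoritative location for each service.

    # Hypothetical customizations.yaml fragment -- key names vary by service chart
    spec:
      kubernetes:
        services:
          cray-smd:
            cray-service:
              sqlCluster:
                volumeSize: 120Gi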

2. Dump the data

If the recovery was successful such that the database is now running, then continue with the following steps to dump the data.

  1. (ncn-mw#) Set helper variables for the client service and postgresql resource in question.

    CLIENT=cray-smd
    POSTGRESQL=cray-smd-postgres
    NAMESPACE=services
    
  2. (ncn-mw#) Record the number of replicas the client service is using, and then scale it to 0.

    The following example is based on cray-smd. The cray-smd client service is deployed as a deployment with 3 replicas. Other services may differ in type (for example, statefulset) or number of replicas; a statefulset variant is sketched after these substeps.

    1. Record the number of replicas.

      CLIENT_REPLICAS=$(kubectl get deployment -n ${NAMESPACE} ${CLIENT} -o jsonpath='{.spec.replicas}'); echo ${CLIENT_REPLICAS}
      

      Expected output:

      3
      
    2. Scale it to 0.

      kubectl scale deployment ${CLIENT} -n ${NAMESPACE} --replicas=0
      

      Expected output:

      deployment.apps/cray-smd scaled
      
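    If the client service is deployed as a statefulset rather than a deployment, the equivalent commands would be (a sketch, assuming the same CLIENT and NAMESPACE variables):

    CLIENT_REPLICAS=$(kubectl get statefulset -n ${NAMESPACE} ${CLIENT} -o jsonpath='{.spec.replicas}'); echo ${CLIENT_REPLICAS}
    kubectl scale statefulset ${CLIENT} -n ${NAMESPACE} --replicas=0
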
  3. (ncn-mw#) Wait for the running pods to terminate.

    while [ $(kubectl get pods -n ${NAMESPACE} -l app.kubernetes.io/name="${CLIENT}" | grep -v NAME | wc -l) != 0 ] ; do
        echo "  waiting for pods to terminate"
        sleep 2
    done
    
  4. (ncn-mw#) Dump all the data.

    1. Determine which Postgres member is the leader.

      kubectl exec "${POSTGRESQL}-0" -n ${NAMESPACE} -c postgres -it -- patronictl list
      

      Expected output:

      +-------------------+---------------------+------------+--------+---------+----+-----------+
      |      Cluster      |        Member       |    Host    |  Role  |  State  | TL | Lag in MB |
      +-------------------+---------------------+------------+--------+---------+----+-----------+
      | cray-smd-postgres | cray-smd-postgres-0 | 10.42.0.25 | Leader | running |  1 |           |
      | cray-smd-postgres | cray-smd-postgres-1 | 10.44.0.34 |        | running |    |         0 |
      | cray-smd-postgres | cray-smd-postgres-2 | 10.36.0.44 |        | running |    |         0 |
      +-------------------+---------------------+------------+--------+---------+----+-----------+
      
    2. Dump the data to a local file.

      POSTGRES_LEADER=cray-smd-postgres-0
      kubectl exec -it ${POSTGRES_LEADER} -n ${NAMESPACE} -c postgres -- pg_dumpall -c -U postgres > "${POSTGRESQL}-dumpall.sql"
      ls "${POSTGRESQL}-dumpall.sql"
      

      Expected output:

      cray-smd-postgres-dumpall.sql
      
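      As an optional sanity check before rebuilding the cluster, confirm that the dump file is non-empty and contains SQL statements:

      grep -c "CREATE TABLE" "${POSTGRESQL}-dumpall.sql"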

3. Rebuild the cluster and restore the data

If recovery was successful such that a dump could be taken or a dump already exists, then continue with the following steps to rebuild the postgresql cluster and restore the data.

The following example restores the dump to the cray-smd-postgres cluster.

  1. If the client service is not yet scaled to 0, then follow the earlier steps to record its replica count and scale the client service to 0.

  2. (ncn-mw#) Delete and re-create the postgresql resource (which includes the PVCs).

    1. Set helper variables.

      CLIENT=cray-smd
      POSTGRESQL=cray-smd-postgres
      NAMESPACE=services
      
    2. Save the postgresql resource definition to a file.

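      # Remove server-managed fields (selector, controller-uid label, status) so the saved definition can be re-created cleanly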
      kubectl get postgresql ${POSTGRESQL} -n ${NAMESPACE} -o json |
          jq 'del(.spec.selector)' |
          jq 'del(.spec.template.metadata.labels."controller-uid")' |
          jq 'del(.status)' > postgres-cr.yaml
      
    3. Delete the current postgresql resource.

      kubectl delete -f postgres-cr.yaml
      

      Expected output:

      postgresql.acid.zalan.do "cray-smd-postgres" deleted
      
    4. Wait for the pods to terminate.

      while [ $(kubectl get pods -l "application=spilo,cluster-name=${POSTGRESQL}" -n ${NAMESPACE} | grep -v NAME | wc -l) != 0 ] ; do
          echo "  waiting for pods to terminate" ; sleep 2
      done
      
    5. Create a new postgresql resource.

      kubectl create -f postgres-cr.yaml
      

      Expected output:

      postgresql.acid.zalan.do/cray-smd-postgres created
      
    6. Wait for the pods to start running.

      while [ $(kubectl get pods -l "application=spilo,cluster-name=${POSTGRESQL}" -n ${NAMESPACE} | grep -v NAME | wc -l) != 3 ] ; do
          echo "  waiting for pods to start running" ; sleep 2
      done
      
  3. (ncn-mw#) Determine which Postgres member is the leader.

    kubectl exec "${POSTGRESQL}-0" -n ${NAMESPACE} -c postgres -it -- patronictl list
    

    Expected output:

    +-------------------+---------------------+------------+--------+---------+----+-----------+
    |      Cluster      |        Member       |    Host    |  Role  |  State  | TL | Lag in MB |
    +-------------------+---------------------+------------+--------+---------+----+-----------+
    | cray-smd-postgres | cray-smd-postgres-0 | 10.42.0.25 | Leader | running |  1 |           |
    | cray-smd-postgres | cray-smd-postgres-1 | 10.44.0.34 |        | running |    |         0 |
    | cray-smd-postgres | cray-smd-postgres-2 | 10.36.0.44 |        | running |    |         0 |
    +-------------------+---------------------+------------+--------+---------+----+-----------+
    
  4. (ncn-mw#) Record the name of the leader.

    POSTGRES_LEADER=cray-smd-postgres-0
    
  5. (ncn-mw#) Copy the dump taken earlier to the Postgres leader pod and restore the data.

    If the dump exists in a different location, adjust this example as needed.

    kubectl cp "./${POSTGRESQL}-dumpall.sql" "${POSTGRES_LEADER}:/home/postgres/${POSTGRESQL}-dumpall.sql" -c postgres -n ${NAMESPACE}
    kubectl exec -i ${POSTGRES_LEADER} -c postgres -n ${NAMESPACE} -- psql -U postgres < ${POSTGRESQL}-dumpall.sql
    
  6. Restore the secrets.

    Once the dump has been restored onto the newly built postgresql cluster, the passwords from the current Kubernetes secrets must be re-applied to the database; otherwise the service will experience readiness and liveness probe failures because it will be unable to authenticate to the database.

    1. (ncn-mw#) Determine what secrets are associated with the postgresql credentials.

      kubectl get secrets -A | grep "${POSTGRESQL}.credentials"
      

      Expected output:

      services            hmsdsuser.cray-smd-postgres.credentials                       Opaque                                2      31m
      services            postgres.cray-smd-postgres.credentials                        Opaque                                2      31m
      services            service-account.cray-smd-postgres.credentials                 Opaque                                2      31m
      services            standby.cray-smd-postgres.credentials                         Opaque                                2      31m
      
    2. For each secret above, get the username and password from Kubernetes and update the Postgres database with this information.

      The following example uses the hmsdsuser.cray-smd-postgres.credentials secret.

      1. (ncn-mw#) Get the username.

        kubectl get secret hmsdsuser.cray-smd-postgres.credentials -n ${NAMESPACE} -ojsonpath='{.data.username}' | base64 -d
        

        Possible output:

        hmsdsuser
        
      2. (ncn-mw#) Get the password.

        kubectl get secret hmsdsuser.cray-smd-postgres.credentials -n ${NAMESPACE} -ojsonpath='{.data.password}'| base64 -d
        

        Possible output:

        ABCXYZ
        
      3. (ncn-mw#) Open an interactive Postgres console in the leader container.

        kubectl exec ${POSTGRES_LEADER} -n ${NAMESPACE} -c postgres -it -- /usr/bin/psql postgres postgres
        
      4. (postgres#) Update the password for the user.

        ALTER USER hmsdsuser WITH PASSWORD 'ABCXYZ';
        

        On success, output resembles the following:

        ALTER ROLE
        
      5. (postgres#) If this is the last user being updated, enter \q to exit the console.

    3. Continue the above process until all ${POSTGRESQL}.credentials secrets have been updated in the database.
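
      Alternatively, the per-user updates can be scripted. The following is a minimal sketch, assuming the helper variables set earlier (NAMESPACE, POSTGRESQL, POSTGRES_LEADER) and that the generated passwords contain no single quotes:

      for secret in $(kubectl get secrets -n ${NAMESPACE} -o name | grep "${POSTGRESQL}.credentials"); do
          user=$(kubectl get "${secret}" -n ${NAMESPACE} -o jsonpath='{.data.username}' | base64 -d)
          pass=$(kubectl get "${secret}" -n ${NAMESPACE} -o jsonpath='{.data.password}' | base64 -d)
          echo "ALTER USER \"${user}\" WITH PASSWORD '${pass}';" |
              kubectl exec -i ${POSTGRES_LEADER} -n ${NAMESPACE} -c postgres -- psql -U postgres
      done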

  7. (ncn-mw#) Restart the postgresql cluster and wait for it to start running.

    1. Restart the cluster.

      kubectl delete pod "${POSTGRESQL}-0" "${POSTGRESQL}-1" "${POSTGRESQL}-2" -n ${NAMESPACE}
      
    2. Wait for it to start running.

      while [ "$(kubectl get postgresql ${POSTGRESQL} -n ${NAMESPACE} -o json | jq -r '.status.PostgresClusterStatus')" != "Running" ]; do
          echo "waiting for ${POSTGRESQL} to start running"; sleep 2
      done
      
  8. (ncn-mw#) Scale the client service back to the original number of replicas and wait for it to start running.

    The number of replicas was saved in the CLIENT_REPLICAS variable.

    1. Scale it back up.

      The following example is based on cray-smd. The cray-smd client service is deployed as a deployment. Other services may differ in type (for example, statefulset).

      kubectl scale deployment ${CLIENT} -n ${NAMESPACE} --replicas=${CLIENT_REPLICAS}
      
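      If the client is deployed as a statefulset, the equivalent command would be:

      kubectl scale statefulset ${CLIENT} -n ${NAMESPACE} --replicas=${CLIENT_REPLICAS}
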
    2. Wait for all of the pods to start running.

      while [ $(kubectl get pods -n ${NAMESPACE} -l app.kubernetes.io/name="${CLIENT}" | grep -v NAME | wc -l) != ${CLIENT_REPLICAS} ] ; do
          echo "  waiting for pods to start running" ; sleep 2
      done
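
    As an optional final verification, confirm that the postgresql cluster reports a Running status and that the database endpoint exists again:

    kubectl get postgresql ${POSTGRESQL} -n ${NAMESPACE}
    kubectl get endpoints ${POSTGRESQL} -n ${NAMESPACE}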