Secrets recreated after cluster deletion when delete-pvc / delete-ssl / delete-backups finalizers are used #1564

@recharte

Description

When a PerconaPGCluster is deleted with the percona.com/delete-pvc, percona.com/delete-ssl, and percona.com/delete-backups finalizers set, the secrets that those finalizers delete can be recreated by the crunchy reconciler and left behind after the cluster is fully gone.

Root Cause

The deletion flow in (*PGClusterReconciler) Reconcile is:

1. runFinalizers()           ← deletes secrets HERE
2. Delete(postgresCluster)   ← crunchy DeletionTimestamp set HERE

The delete-pvc finalizer deletes user secrets (labeled role=pguser) and the delete-ssl finalizer deletes all TLS secrets. These deletions happen before Delete(postgresCluster) is called, meaning the crunchy PostgresCluster is still fully alive and its reconciler is operational.

The crunchy reconciler registers Owns(&corev1.Secret{}) in its watch setup. When the secrets are deleted, Kubernetes immediately enqueues a reconcile event for the PostgresCluster owner. If that reconcile runs before Delete(postgresCluster) sets a DeletionTimestamp, the crunchy reconciler sees no deletion in progress and recreates all the missing secrets via its normal reconciliation path.
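The interleaving can be replayed with a toy in-memory model. This is a minimal sketch, not operator code: the `cluster` type, the secret names, and the `reconcile` and `runFinalizers` helpers are hypothetical stand-ins for the crunchy reconciler and the Percona finalizers, and the event order is fixed by hand to the losing side of the race.

```go
package main

import "fmt"

// cluster is a hypothetical stand-in for the crunchy PostgresCluster and
// the secrets it owns. It is not a real operator type.
type cluster struct {
	deleting bool            // models a non-nil DeletionTimestamp
	secrets  map[string]bool // secrets currently present
}

// desired lists the secrets the toy reconciler maintains (made-up names).
var desired = []string{"cluster1-pguser-cluster1", "cluster1-cluster-cert"}

func newCluster() *cluster {
	c := &cluster{secrets: map[string]bool{}}
	reconcile(c) // initial reconcile creates every desired secret
	return c
}

// reconcile models the owner's reconciler: it restores missing secrets
// unless a deletion is already in progress.
func reconcile(c *cluster) {
	if c.deleting {
		return
	}
	for _, name := range desired {
		c.secrets[name] = true
	}
}

// runFinalizers models delete-pvc / delete-ssl removing the secrets.
func runFinalizers(c *cluster) {
	for name := range c.secrets {
		delete(c.secrets, name)
	}
}

// racyDelete replays the losing interleaving and returns how many secrets
// are left behind.
func racyDelete() int {
	c := newCluster()
	runFinalizers(c)  // step 1: secrets deleted, owner still alive
	reconcile(c)      // the owned-object delete event is processed first
	c.deleting = true // step 2: Delete(postgresCluster)
	reconcile(c)      // too late: nothing removes the secrets now
	return len(c.secrets)
}

func main() {
	fmt.Println("secrets left behind:", racyDelete())
}
```

In this model both secrets survive the deletion, because the only reconcile that runs between step 1 and step 2 sees a live owner with missing children.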

Why delete-backups makes it consistently reproducible

The delete-backups finalizer triggers deleteBackups, which deletes PerconaPGBackup objects. Each backup object carries an internal.percona.com/delete-backup finalizer, so deleting it makes the backup controller's reconciler run finishBackup. That function repeatedly:

  • calls c.Status().Update(crunchyCluster) (clearing ManualBackup status) — directly enqueues the crunchy reconciler
  • updates the backup Job object (removing FinalizerKeepJob) — another owned-object event that re-enqueues the crunchy reconciler
  • retries every 5 seconds while waiting for AnnotationBackupInProgress to clear

Each of these writes repeatedly wakes the crunchy reconciler over several seconds, making the race window large enough to hit reliably.

Expected Behavior

Secrets deleted by delete-pvc / delete-ssl finalizers should not be recreated. After the cluster is fully gone, no secrets belonging to it should remain.

Actual Behavior

Secrets are deleted by the finalizers, then immediately recreated by the crunchy reconciler (triggered by owned-object deletion events and backup controller writes to the crunchy cluster), and are left behind permanently after the PerconaPGCluster is gone.

Affected Components

  • finalizer.go — deletePVCAndSecrets, deleteTLSSecrets, runFinalizers
  • controller.go — deletion flow ordering
  • controller.go — finishBackup concurrent writes to crunchy cluster

Fix Direction

The delete-pvc and delete-ssl finalizers must only run after the crunchy PostgresCluster is fully gone (i.e., after the wait for PostgresCluster deletion in the reconcile loop), not before. Deleting secrets while the crunchy reconciler is still operational will always be racy.
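The proposed ordering can be sketched with the same kind of toy model. Everything here is an assumption made for illustration: the `state` type, the secret names, and the way `reconcile` finishes teardown in a single pass do not come from the operator source. The sketch only shows why a late reconcile event becomes harmless once the owner no longer exists.

```go
package main

import "fmt"

// state is a hypothetical model of the crunchy PostgresCluster plus the
// secrets it owns. None of these names come from the operator source.
type state struct {
	exists   bool            // the PostgresCluster object is still in the API
	deleting bool            // models a non-nil DeletionTimestamp
	secrets  map[string]bool // secrets currently present
}

// desired lists the secrets the toy reconciler maintains (made-up names).
var desired = []string{"cluster1-pguser-cluster1", "cluster1-cluster-cert"}

// reconcile models the owner's reconciler. Once the owner is gone there is
// nothing left to reconcile, so late events cannot recreate anything.
func reconcile(s *state) {
	switch {
	case !s.exists:
		return
	case s.deleting:
		s.exists = false // teardown finished, object removed
	default:
		for _, name := range desired {
			s.secrets[name] = true
		}
	}
}

// runFinalizers models delete-pvc / delete-ssl removing the secrets.
func runFinalizers(s *state) {
	for name := range s.secrets {
		delete(s.secrets, name)
	}
}

// orderedDelete deletes the owner first, waits for it to disappear, and
// only then runs the secret-deleting finalizers.
func orderedDelete() int {
	s := &state{exists: true, secrets: map[string]bool{}}
	reconcile(s) // cluster is up, secrets exist

	s.deleting = true // Delete(postgresCluster) comes first
	for s.exists {    // wait until the owner is fully gone
		reconcile(s)
	}
	runFinalizers(s) // only now delete the secrets
	reconcile(s)     // a late event is a no-op
	return len(s.secrets)
}

func main() {
	fmt.Println("secrets left behind:", orderedDelete())
}
```

With the owner removed before the finalizers run, no secrets remain in the model regardless of how many extra reconcile events arrive afterwards.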

Steps to reproduce

# Deploy operator
kubectl apply --server-side -f https://raw.githubusercontent.com/percona/percona-postgresql-operator/v2.9.0/deploy/bundle.yaml

# Deploy PG cluster
kubectl apply -f https://raw.githubusercontent.com/percona/percona-postgresql-operator/v2.9.0/deploy/cr.yaml

# Add the delete-pvc, delete-ssl, and delete-backups finalizers
kubectl patch perconapgcluster cluster1 --type=merge -p '{
  "metadata": {
    "finalizers": [
      "percona.com/delete-pvc",
      "percona.com/delete-ssl",
      "percona.com/delete-backups"
    ]
  }
}'

# Wait for cluster to be ready
kubectl wait --for=jsonpath='{.status.state}'=ready pg cluster1

# Delete cluster
kubectl delete pg cluster1

# Check that the secrets that were supposed to be deleted have been recreated
# Note that they are only a few seconds old
kubectl get secret
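The age check in the last step can also be done programmatically. The following is a rough sketch that assumes the input is the JSON printed by `kubectl get secret -o json` and that you recorded the time at which `kubectl delete pg cluster1` was issued. The `createdAfter` and `sampleCheck` helpers, the sample secret names, and the timestamps are all made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// secretList mirrors only the fields of the kubectl JSON list used here.
type secretList struct {
	Items []struct {
		Metadata struct {
			Name              string    `json:"name"`
			CreationTimestamp time.Time `json:"creationTimestamp"`
		} `json:"metadata"`
	} `json:"items"`
}

// createdAfter returns the names of secrets whose creationTimestamp is
// later than cutoff, i.e. secrets created after the delete was issued.
func createdAfter(raw []byte, cutoff time.Time) ([]string, error) {
	var list secretList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, err
	}
	var names []string
	for _, item := range list.Items {
		if item.Metadata.CreationTimestamp.After(cutoff) {
			names = append(names, item.Metadata.Name)
		}
	}
	return names, nil
}

// sampleCheck runs createdAfter on made-up data: one secret that predates
// the delete and one that was created five seconds after it.
func sampleCheck() []string {
	raw := []byte(`{"items":[
		{"metadata":{"name":"unrelated-secret","creationTimestamp":"2025-01-01T11:00:00Z"}},
		{"metadata":{"name":"cluster1-cluster-cert","creationTimestamp":"2025-01-01T12:00:05Z"}}
	]}`)
	cutoff := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	names, err := createdAfter(raw, cutoff)
	if err != nil {
		panic(err)
	}
	return names
}

func main() {
	fmt.Println("recreated after delete:", sampleCheck())
}
```

Any name this reports for the deleted cluster is a secret that was created after the delete began, which is the symptom described in this issue.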

Versions

  1. Kubernetes - 1.33.2
  2. Operator - 2.9.0
  3. Database - 18.3-1
