Backup fails: peer-list cannot access POD_NAMESPACE in sub-shell #2345

@rdlh

Report

Backup fails with empty CLUSTER_SIZE - peer-list unable to access POD_NAMESPACE in sub-shell

More about the problem

Scheduled backups consistently fail because the peer-list command cannot access the POD_NAMESPACE environment variable when called from within the backup script's sub-shell.

The backup pod logs show:

++ get_backup_source
+++ /opt/percona/peer-list -on-start=/opt/percona/backup/lib/pxc/get-pxc-state.sh -service=pxc-db-pxc
+++ grep wsrep_cluster_size
+++ sort
+++ tail -1
+++ cut -d : -f 12
++ CLUSTER_SIZE=
++ '[' -z '' ']'
++ exit 1

The CLUSTER_SIZE variable remains empty, causing the backup to fail immediately.

Steps to reproduce

  1. Deploy Percona XtraDB Cluster with the operator
  2. Configure backup storage with containerOptions.env including POD_NAMESPACE:
backup:
  storages:
    s3-ovh:
      containerOptions:
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      s3:
        bucket: my-bucket
        credentialsSecret: my-secret
        region: eu-west-par
  3. Create a backup (scheduled or manual)
  4. Backup fails with empty CLUSTER_SIZE

However, manual testing works:

# Exec into backup pod
kubectl exec -it <backup-pod> -n mysql -c xtrabackup -- sh

# Variable is set ✅
$ echo $POD_NAMESPACE
mysql

# Manual call works ✅
$ /opt/percona/peer-list -on-start=/opt/percona/backup/lib/pxc/get-pxc-state.sh -service=pxc-db-pxc
2026/01/17 15:20:32 Peer finder enter
2026/01/17 15:20:32 Peer list updated
now [pxc-db-pxc-0... pxc-db-pxc-1... pxc-db-pxc-2...]
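One caveat on the check above: `echo $POD_NAMESPACE` prints the value whether or not the variable is exported, so it cannot tell the two cases apart. A minimal sketch of a helper that does distinguish them, assuming `env` and `grep` are available in the container (the function name `var_visibility` is made up for illustration):

```shell
# Hypothetical helper: report how a variable is visible from the current shell.
#   exported   - present in the environment handed to child processes
#   shell-only - set in the shell but not exported
#   unset      - not set at all
var_visibility() {
    name=$1
    if env | grep -q "^${name}="; then
        echo exported
    elif eval "[ -n \"\${$name+x}\" ]"; then
        echo shell-only
    else
        echo unset
    fi
}
```

Running `var_visibility POD_NAMESPACE` from the same shell that runs the backup script would show whether peer-list can actually receive the variable.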

Root cause:
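The /opt/percona/backup/backup.sh script calls peer-list inside a command substitution $(...) without exporting POD_NAMESPACE first. The sub-shell itself still sees shell variables, but an external program started from it only receives exported ones, so the Go binary cannot see an unexported POD_NAMESPACE.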

function get_backup_source() {
    CLUSTER_SIZE=$(/opt/percona/peer-list ...)  # POD_NAMESPACE not visible to peer-list here
    ...
}
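For anyone who wants to check the shell behavior involved, here is a small sketch that can be run in any POSIX shell. It does not touch the real backup script: `DEMO_NS` is a made-up variable, and a fresh `sh -c` stands in for an external binary such as peer-list.

```shell
unset DEMO_NS
DEMO_NS=mysql   # set in the shell, but not exported

# Command substitution runs in a sub-shell, which still sees shell variables.
in_subshell=$(echo "$DEMO_NS")

# A separately executed process only receives exported variables.
in_child_before=$(sh -c 'echo "${DEMO_NS:-<unset>}"')

export DEMO_NS
in_child_after=$(sh -c 'echo "${DEMO_NS:-<unset>}"')

echo "sub-shell:           $in_subshell"       # mysql
echo "child before export: $in_child_before"   # <unset>
echo "child after export:  $in_child_after"    # mysql
```

So what matters is whether the variable is exported by the time the external program starts, not the `$(...)` itself.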

Versions

  • Operator version: 1.18.0
  • PXC image: percona/percona-xtradb-cluster:8.4
  • Backup image: percona/percona-xtrabackup:8.4.0-3.1
  • Deployment method: Helm chart (pxc-db-1.18.0)

Anything else?

Proposed fix:
Add export POD_NAMESPACE at the beginning of the get_backup_source() function in /opt/percona/backup/backup.sh:

function get_backup_source() {
    export POD_NAMESPACE  # <-- Add this line
    CLUSTER_SIZE=$(/opt/percona/peer-list -on-start=/opt/percona/backup/lib/pxc/get-pxc-state.sh -service="$PXC_SERVICE" 2>&1 \
        | grep wsrep_cluster_size \
        | sort \
        | tail -1 \
        | cut -d : -f 12)
    ...
}

Alternative fix:
Pass -ns explicitly to all peer-list calls:

CLUSTER_SIZE=$(/opt/percona/peer-list -ns="${POD_NAMESPACE}" -on-start=... -service="$PXC_SERVICE" ...)
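If the -ns route is taken, it may also help to fail early with a readable message rather than ending up with an empty CLUSTER_SIZE. A rough sketch, assuming only the -ns flag mentioned above (the helper name `peer_list_ns_arg` is hypothetical and not part of the operator's scripts):

```shell
# Hypothetical helper: print the -ns argument for peer-list, or fail with a
# clear error if POD_NAMESPACE is empty or unset.
peer_list_ns_arg() {
    if [ -z "${POD_NAMESPACE:-}" ]; then
        echo "POD_NAMESPACE is empty; cannot build -ns argument" >&2
        return 1
    fi
    printf '%s\n' "-ns=${POD_NAMESPACE}"
}
```

A caller could then do something like `ns_arg=$(peer_list_ns_arg) || exit 1` before invoking peer-list, so that a missing namespace shows up in the logs directly.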

I'm not sure if this is a bug or if I'm missing something in my configuration. Any guidance would be appreciated!
