Running backups causes the PITR container to crash #2312

@DimitriosLisenko

Report

When a backup runs, the MySQL pod seems to run the following process:

/usr/bin/bash -ue /usr/bin/wsrep_sst_xtrabackup-v2 --role donor --address X.X.X.X:4444/xtrabackup_sst//1 --socket /tmp/mysql.sock --datadir /var/lib/mysql/ --basedir /usr/ --plugindir /usr/lib64/mysql/plugin/ --defaults-file /etc/my.cnf --defaults-group-suffix  --mysqld-version 8.4.6-6.1 --binlog binlog --gtid GTID:44506

This causes the wsrep_ variables to change from

wsrep_local_state_comment	Synced
wsrep_cluster_size	1

to

wsrep_local_state_comment	Donor/Desynced
wsrep_cluster_size	2

which eventually becomes

wsrep_local_state_comment	Synced
wsrep_cluster_size	2

and then

wsrep_local_state_comment	Synced
wsrep_cluster_size	1

Meanwhile, the PITR container appears to run a loop that calls the operator's GetPXCFirstHost function. That function reads the wsrep_ state and looks for nodes matching wsrep_ready:ON:wsrep_connected:ON:wsrep_local_state_comment:Synced:wsrep_cluster_status:Primary.
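To make the matching rule concrete, here is a minimal Python sketch of the check as described in this report. It is not the operator's code: the names REQUIRED_STATE, FIELDS, state_string and is_eligible are hypothetical, and the values for wsrep_ready, wsrep_connected and wsrep_cluster_status are assumed, since only wsrep_local_state_comment was observed changing.

```python
# Hypothetical sketch of the filter described above; not the operator's API.
REQUIRED_STATE = (
    "wsrep_ready:ON:wsrep_connected:ON:"
    "wsrep_local_state_comment:Synced:wsrep_cluster_status:Primary"
)

# Order matters: it must match the order used in REQUIRED_STATE.
FIELDS = (
    "wsrep_ready",
    "wsrep_connected",
    "wsrep_local_state_comment",
    "wsrep_cluster_status",
)


def state_string(status: dict[str, str]) -> str:
    """Join the four wsrep_ values into the colon-separated form quoted above."""
    return ":".join(f"{name}:{status.get(name, '')}" for name in FIELDS)


def is_eligible(status: dict[str, str]) -> bool:
    """A node is eligible only if its state string matches exactly."""
    return state_string(status) == REQUIRED_STATE


# Assumed values: only wsrep_local_state_comment is taken from the report.
synced = {
    "wsrep_ready": "ON",
    "wsrep_connected": "ON",
    "wsrep_local_state_comment": "Synced",
    "wsrep_cluster_status": "Primary",
}
donor = dict(synced, wsrep_local_state_comment="Donor/Desynced")

print(is_eligible(synced))  # True
print(is_eligible(donor))   # False
```

Because the comparison is an exact match, a node that is otherwise healthy is rejected for as long as it reports Donor/Desynced.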

If this loop runs while wsrep_local_state_comment is Donor/Desynced, it finds no matching node, and the PITR container crashes with

ERROR: new db connection: get host: can't find host
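The timing dependency can be illustrated by replaying the four observed states against a simplified host lookup. This is a sketch under assumptions, not the PITR container's actual loop: find_first_host and replay are hypothetical helpers, the host name pxc-0 is made up, the lookup checks only wsrep_local_state_comment, and the error text is shortened.

```python
# (wsrep_local_state_comment, wsrep_cluster_size) in the order reported above.
SNAPSHOTS = [
    ("Synced", 1),
    ("Donor/Desynced", 2),
    ("Synced", 2),
    ("Synced", 1),
]


def find_first_host(nodes: dict[str, str]) -> str:
    """Return the first host reporting Synced (hypothetical helper)."""
    for host, comment in nodes.items():
        if comment == "Synced":
            return host
    raise LookupError("can't find host")


def replay(snapshots: list[tuple[str, int]]) -> list[str]:
    """Run one lookup per snapshot against a single database node."""
    results = []
    for comment, _cluster_size in snapshots:
        try:
            # "pxc-0" is an illustrative name for the only database node.
            results.append(find_first_host({"pxc-0": comment}))
        except LookupError as err:
            results.append(f"ERROR: {err}")
    return results


for line in replay(SNAPSHOTS):
    print(line)
```

In this model only the lookup that coincides with the Donor/Desynced snapshot fails. The other three succeed, which is consistent with the crash depending on when the loop happens to run relative to the backup.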

Versions

  1. Kubernetes 1.34.2
  2. Operator 1.18.0
  3. Database 8.4
