-
Notifications
You must be signed in to change notification settings - Fork 36
Intermittent network error causes permanent server outage #1224
Copy link
Copy link
Open
Labels
Description
Report
I've started the cluster in alibaba cloud and start getting "Connection refused" when try to connect to it:
More about the problem
What happens:
- cluster runs in mysql group replication mode:
k exec -n mysql pods/ps-db-mysql-2 -it mysql -- cat /var/lib/mysql/mysql.state
ready- something happens, probably network error, nothing in logs, no restart
- server switches to startup mode, but stays fully functional if connected directly
❯❯❯ k exec -n mysql pods/ps-db-mysql-2 -it mysql -- cat /var/lib/mysql/mysql.state
startup
- readiness probe does not allow traffic to it, so kubernetes service stops routing traffic to it
- if it happens on master - switchover never happens, master just becomes inaccessible.
echo -n ready > /var/lib/mysql/mysql.statefixes the problem
Steps to reproduce
I'll add more details when I have, but for now I don't know how to reproduce it
Versions
- Kubernetes 1.35
- Operator 1.0
- Database: percona-server:8.4.6-6.1
Anything else?
No response
Reactions are currently unavailable