
K8SPS-69 | Readiness probe should check for async replication status#1270

Open
mayankshah1607 wants to merge 6 commits into main from K8SPS-69

Conversation

Member

@mayankshah1607 mayankshah1607 commented Apr 2, 2026

CHANGE DESCRIPTION

Problem:
An async replica whose replication has been stopped is not marked as unready.

Cause:
The readiness probe does not check async replication status, so it has no way to detect this condition.

Solution:
Update the readiness check to detect that replication was stopped and fail the probe accordingly.
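The intended behavior can be sketched in Go roughly as below. The `ReplicationStatusStopped` constant is introduced by this PR; the other identifiers, the function signature, and the error handling are illustrative assumptions, not the actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// ReplicationStatus mirrors the status values this PR works with;
// the concrete type and ordering here are assumptions.
type ReplicationStatus int

const (
	ReplicationStatusNotInitiated ReplicationStatus = iota
	ReplicationStatusActive
	ReplicationStatusStopped
)

// checkReadinessAsync is a hypothetical stand-in for the readiness
// probe: a replica whose async replication is configured but stopped
// must now fail readiness instead of being reported healthy.
func checkReadinessAsync(status ReplicationStatus) error {
	if status == ReplicationStatusStopped {
		return errors.New("async replication is configured but stopped")
	}
	return nil
}

func main() {
	fmt.Println(checkReadinessAsync(ReplicationStatusActive))  // <nil>
	fmt.Println(checkReadinessAsync(ReplicationStatusStopped)) // error: readiness fails
}
```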

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PS version?
  • Does the change support oldest and newest supported Kubernetes version?

Copilot AI review requested due to automatic review settings April 2, 2026 10:24
@pull-request-size bot added the size/S (10-29 lines) label Apr 2, 2026
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Contributor

Copilot AI left a comment


Pull request overview

Updates the async-cluster readiness logic so the healthcheck can detect when asynchronous replication exists but is stopped, rather than only inferring replica state from read_only.

Changes:

  • Introduce ReplicationStatusStopped to explicitly represent configured-but-not-running async replication.
  • Update async readiness probe to use ReplicationStatus() and fail when replication is stopped.
  • Update async bootstrap to (re)configure replication when status is NotInitiated or Stopped.
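The bootstrap change in the last bullet amounts to one broadened condition; a minimal sketch, assuming a hypothetical helper name (only the status constants come from the PR):

```go
package main

import "fmt"

type ReplicationStatus int

const (
	ReplicationStatusNotInitiated ReplicationStatus = iota
	ReplicationStatusActive
	ReplicationStatusStopped
)

// needsConfiguration is a hypothetical distillation of the bootstrap
// change: replication is (re)configured both when it was never
// initiated and when it exists but its threads are stopped.
func needsConfiguration(s ReplicationStatus) bool {
	return s == ReplicationStatusNotInitiated || s == ReplicationStatusStopped
}

func main() {
	fmt.Println(needsConfiguration(ReplicationStatusNotInitiated)) // true
	fmt.Println(needsConfiguration(ReplicationStatusStopped))      // true
	fmt.Println(needsConfiguration(ReplicationStatusActive))       // false
}
```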

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
pkg/db/replication.go Adds ReplicationStatusStopped and returns it when replication is present but not ON.
cmd/internal/db/db.go Returns ReplicationStatusStopped when IO/SQL threads aren’t ON (no error).
cmd/healthcheck/main.go Readiness for async clusters now checks replication status and fails if stopped.
cmd/bootstrap/async/async_replication.go Treats Stopped the same as NotInitiated to trigger replication configuration.
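The `cmd/internal/db/db.go` change described above boils down to classifying the replica by its thread states. A sketch under stated assumptions: the function name and string inputs are hypothetical, and the "ON"/"OFF" values follow the PR's wording about IO/SQL threads not being ON.

```go
package main

import "fmt"

type ReplicationStatus int

const (
	ReplicationStatusNotInitiated ReplicationStatus = iota
	ReplicationStatusActive
	ReplicationStatusStopped
)

// classifyReplication sketches the db-layer logic: when replication
// rows exist, the replica is Active only if both the IO and SQL
// thread states are "ON"; otherwise it is Stopped, returned without
// an error (per the PR description).
func classifyReplication(ioState, sqlState string) ReplicationStatus {
	if ioState == "ON" && sqlState == "ON" {
		return ReplicationStatusActive
	}
	return ReplicationStatusStopped
}

func main() {
	fmt.Println(classifyReplication("ON", "ON") == ReplicationStatusActive)   // true
	fmt.Println(classifyReplication("ON", "OFF") == ReplicationStatusStopped) // true
}
```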


Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Copilot AI review requested due to automatic review settings April 2, 2026 11:36
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.



@mayankshah1607 mayankshah1607 marked this pull request as ready for review April 2, 2026 13:41
@egegunes egegunes added this to the v1.1.0 milestone Apr 3, 2026
@JNKPercona
Collaborator

Test Name Result Time
async-ignore-annotations-8-4 passed 00:06:24
async-global-metadata-8-4 passed 00:15:45
async-upgrade-8-0 passed 00:12:59
async-upgrade-8-4 passed 00:12:46
auto-config-8-4 passed 00:24:36
config-8-4 passed 00:20:59
config-router-8-0 passed 00:07:35
config-router-8-4 passed 00:07:35
demand-backup-minio-8-0 passed 00:19:41
demand-backup-minio-8-4 passed 00:19:27
demand-backup-cloud-8-4 passed 00:22:31
demand-backup-retry-8-4 passed 00:14:25
demand-backup-incremental-8-0 passed 00:34:02
demand-backup-incremental-8-4 failure 00:14:04
async-data-at-rest-encryption-8-0 passed 00:14:04
async-data-at-rest-encryption-8-4 passed 00:12:53
gr-global-metadata-8-4 passed 00:15:33
gr-data-at-rest-encryption-8-0 passed 00:14:53
gr-data-at-rest-encryption-8-4 passed 00:14:24
gr-demand-backup-minio-8-4 passed 00:12:36
gr-demand-backup-cloud-8-4 passed 00:21:37
gr-demand-backup-haproxy-8-4 passed 00:09:48
gr-demand-backup-incremental-8-0 passed 00:36:20
gr-demand-backup-incremental-8-4 passed 00:35:18
gr-finalizer-8-4 passed 00:05:30
gr-haproxy-8-0 passed 00:04:10
gr-haproxy-8-4 passed 00:04:08
gr-ignore-annotations-8-4 passed 00:04:48
gr-init-deploy-8-0 passed 00:09:58
gr-init-deploy-8-4 passed 00:09:08
gr-one-pod-8-4 passed 00:05:57
gr-recreate-8-4 passed 00:17:11
gr-scaling-8-4 passed 00:07:54
gr-scheduled-backup-8-4 passed 00:16:18
gr-scheduled-backup-incremental-8-4 failure 00:20:40
gr-security-context-8-4 passed 00:10:03
gr-self-healing-8-4 passed 00:26:04
gr-tls-cert-manager-8-4 passed 00:08:55
gr-users-8-4 passed 00:05:03
gr-upgrade-8-0 passed 00:10:00
gr-upgrade-8-4 passed 00:10:18
haproxy-8-0 passed 00:09:13
haproxy-8-4 passed 00:09:39
init-deploy-8-0 passed 00:06:33
init-deploy-8-4 passed 00:07:05
limits-8-4 passed 00:06:26
monitoring-8-4 passed 00:19:18
one-pod-8-0 passed 00:06:05
one-pod-8-4 passed 00:06:06
operator-self-healing-8-4 passed 00:12:58
pvc-resize-8-4 failure 00:06:06
recreate-8-4 passed 00:12:31
scaling-8-4 passed 00:10:25
scheduled-backup-8-0 passed 00:18:28
scheduled-backup-8-4 passed 00:18:17
scheduled-backup-incremental-8-0 passed 00:25:06
scheduled-backup-incremental-8-4 passed 00:24:38
service-per-pod-8-4 passed 00:06:31
sidecars-8-4 passed 00:04:32
smart-update-8-4 passed 00:09:14
storage-8-4 passed 00:04:17
telemetry-8-4 passed 00:06:11
tls-cert-manager-8-4 passed 00:09:59
users-8-0 passed 00:08:14
users-8-4 passed 00:07:45
version-service-8-4 passed 00:20:25
Summary Value
Tests Run 66/66
Job Duration 02:56:47
Total Test Time 14:32:55

commit: 71d3cd6
image: perconalab/percona-server-mysql-operator:PR-1270-71d3cd6c


Labels

size/S 10-29 lines

5 participants