[bitnami/redis-cluster]: add preStop hook that gracefully fails over master nodes on pod termination by sobotklp · Pull Request #36221 · bitnami/charts

sobotklp · 2025-08-28T16:27:35Z

Description of the change

This is a fix for #23036

It implements graceful failover for terminations of Redis Cluster master pods. This will limit the impact of terminations and failover events, since Redis cluster clients will have the chance to update their topology before the previous pod terminates. This will improve uptime during planned maintenance events.

Benefits

Improved uptime by failing master pods over to other replicas.

Possible drawbacks

If a PodDisruptionBudget isn't being used, the hook may initiate a failover to a node that is itself about to be terminated. The script attempts to filter out replicas that aren't candidates for promotion.

Applicable issues

fixes [bitnami/redis-cluster] Request failover when a master node gracefully shuts down #23036

Additional information

This is an attempt to mirror the graceful failover behaviour of the redis and valkey charts when using Sentinel.

Checklist

Chart version bumped in Chart.yaml according to semver. This is not necessary when the changes only affect README.md files.
Variables are documented in the values.yaml and added to the README.md using readme-generator-for-helm
Title of the pull request follows this pattern [bitnami/<name_of_the_chart>] Descriptive title
All commits signed off and in agreement of Developer Certificate of Origin (DCO)

carrodher · 2025-08-28T16:44:00Z

Thank you for initiating this pull request. We appreciate your effort. This is just a friendly reminder that signing your commits is important. Your signature certifies that you either authored the patch or have the necessary rights to contribute to the changes. You can find detailed information on how to do this in the “Sign your work” section of our contributing guidelines.

Feel free to reach out if you have any questions or need assistance with the signing process.

sobotklp · 2025-09-05T02:01:56Z

Hi @carrodher . Thanks for the tips on signing my commits. I've rebased my commit and amended my change with my signature. Let me know if I can do anything else. :)

sobotklp · 2025-09-05T03:28:45Z

The script uses this basic logic on pod termination:

If the node is a master node, then
- Get the node ID
- List the available replicas for the node ID
- Select a replica from the list and attempt to call CLUSTER FAILOVER.
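The steps above could be sketched roughly as follows. This is not the script from this PR: the function names `filter_replica_ips` and `failover_if_master` are hypothetical, and the sketch assumes `redis-cli` is on the PATH, the node needs no authentication or TLS, and replicas are addressed by IPv4.

```shell
#!/usr/bin/env bash
# Sketch of the preStop logic described above; not the script from this PR.
# Assumes redis-cli is on PATH and the local node needs no auth or TLS.

# Read CLUSTER REPLICAS output on stdin and print the IP of every replica
# that is not flagged as failing and whose cluster link is connected.
# Assumes IPv4 addresses (field 2 is split on the first colon).
filter_replica_ips() {
  awk '$3 !~ /fail|noaddr|handshake/ && $8 == "connected" {
         split($2, addr, ":")   # field 2 looks like ip:port@cport
         print addr[1]
       }'
}

# Hypothetical entry point: fail over to the first usable replica.
failover_if_master() {
  local role node_id target
  role="$(redis-cli role | head -n 1)"
  [[ "$role" == "master" ]] || return 0   # replicas need no failover
  node_id="$(redis-cli cluster myid)"
  target="$(redis-cli cluster replicas "$node_id" | filter_replica_ips | head -n 1)"
  [[ -n "$target" ]] || return 0          # nothing to fail over to
  # CLUSTER FAILOVER has to be sent to the replica that should be promoted.
  redis-cli -h "$target" cluster failover
}
```

Keeping the filtering in its own function means it can be exercised with canned `CLUSTER REPLICAS` output, without a running cluster.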

I tested this by deploying a 9-node, 3-shard cluster locally without authentication or TLS, then manually terminating master nodes.
I could see that the commands ROLE and CLUSTER MYID were called on the terminating master node, as expected.
On the target replicas, I saw the expected output in the logs:

 * Manual failover user request accepted.
 * Received replication offset for paused master manual failover: 1148
 * All master replication stream processed, manual failover can start.
 * Start of election delayed for 0 milliseconds (rank #0, offset 1148).
 * Starting a failover election for epoch 11.

sobotklp · 2025-09-05T03:32:30Z

It's also possible to kubectl exec -it -- bash into a master node and run the script manually, initiating a manual failover without terminating the pod.

xqianwang · 2025-09-05T20:14:36Z

bitnami/redis-cluster/templates/scripts-configmap.yaml

+        if [[ "$result" == "OK" ]]; then
+          {{- if .Values.cluster.redisShutdownWaitFailover }}
+          # Wait for clients to update their topology
+          sleep 10


Should we make this configurable instead of a fixed 10 seconds?

sobotklp · 2025-09-05

It could be configurable up to a maximum of {{- $.Values.redis.terminationGracePeriodSeconds }}, I would think.

sobotklp · 2025-09-22

I updated this to wait $.Values.redis.terminationGracePeriodSeconds - 10 seconds.
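For illustration, the wait computation might look like the sketch below. The helper name `failover_wait_seconds` and the clamp to zero are assumptions added here. The chart change presumably renders the value at template time instead of computing it at runtime.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds to sleep after a successful failover.
# $1 = terminationGracePeriodSeconds, $2 = safety margin (defaults to 10).
failover_wait_seconds() {
  local grace="${1:?grace period required}"
  local margin="${2:-10}"
  local delay=$(( grace - margin ))
  # Never return a negative sleep when the grace period is shorter than the margin.
  if (( delay < 0 )); then
    delay=0
  fi
  echo "$delay"
}

# Usage inside a preStop script (sketch):
#   sleep "$(failover_wait_seconds 30)"
```

Leaving a margin before the grace period expires gives Redis time to shut down cleanly before the kubelet sends SIGKILL.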

xqianwang · 2025-09-05T20:16:19Z

bitnami/redis-cluster/templates/scripts-configmap.yaml

+      mapfile -t REPLICA_IPS < <( get_replica_ips )
+
+      NUM_REPLICAS=${#REPLICA_IPS[@]}
+      echo "Found $NUM_REPLICAS available replicas"


This should cover the case where 0 replicas are found, but maybe it's better to add a warning message, such as:
"No replica found for failover; proceeding with shutdown"
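A minimal sketch of what such a warning could look like. The function name `report_replicas` is hypothetical and the message text is the one suggested above. `get_replica_ips` in the usage comment refers to the helper from the diff, whose body is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many replica IPs were passed as arguments,
# and warn on stderr (returning non-zero) when there are none.
report_replicas() {
  local num_replicas=$#
  if (( num_replicas == 0 )); then
    echo "WARNING: No replica found for failover; proceeding with shutdown" >&2
    return 1
  fi
  echo "Found $num_replicas available replicas"
}

# Usage in the preStop script (sketch):
#   mapfile -t REPLICA_IPS < <( get_replica_ips )
#   report_replicas "${REPLICA_IPS[@]}" || exit 0
```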

github-actions · 2025-09-21T01:28:20Z

This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.

sobotklp · 2025-09-22T19:45:22Z

@migruiz4 would you like me to do anything more here to validate this PR?

github-actions · 2025-10-09T01:28:11Z

This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.

github-actions · 2025-10-14T01:33:41Z

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Pull Request. Do not hesitate to reopen it later if necessary.

jmtekin · 2025-10-15T08:44:41Z

Hi. Will this change be abandoned?

sobotklp · 2025-10-15T20:45:55Z

Though it would be nice to have this functionality directly in the chart, I was able to add it to the existing chart by adding the following values overrides:

redis:
  # ...
  lifecycleHooks:
    preStop:
      exec:
        command:
          - /bin/bash
          - -ec
          - /custom-scripts/prestop-redis-cluster.sh
  extraVolumeMounts:
    - name: custom-scripts
      mountPath: /custom-scripts
  extraVolumes:
    - name: custom-scripts
      configMap:
        name: custom-redis-scripts
        defaultMode: 0755

then creating my own ConfigMap named custom-redis-scripts with the content of prestop-redis-cluster.sh.
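One way to produce that ConfigMap is sketched below. The helper name `render_configmap` is hypothetical, and the script file is assumed to exist locally under the name used in the mount path above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a ConfigMap manifest that embeds a local script
# file under its own file name, so it appears at /custom-scripts/<name>.
render_configmap() {
  local script_path="$1"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-redis-scripts
data:
  $(basename "$script_path"): |
EOF
  # Indent the script body so it sits inside the YAML block scalar.
  sed 's/^/    /' "$script_path"
}

# Usage (sketch):
#   render_configmap prestop-redis-cluster.sh | kubectl apply -f -
```

The one-liner `kubectl create configmap custom-redis-scripts --from-file=prestop-redis-cluster.sh` should give an equivalent result. Rendering the manifest is mainly useful when it needs to be committed alongside the values file.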

github-actions · 2025-11-01T01:28:33Z

This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.

github-actions · 2025-11-07T01:28:22Z

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Pull Request. Do not hesitate to reopen it later if necessary.

github-actions bot added redis-cluster triage Triage is needed labels Aug 28, 2025

github-actions bot assigned carrodher Aug 28, 2025

github-actions bot requested a review from carrodher August 28, 2025 16:28

sobotklp force-pushed the redis-cluster-prestop branch 2 times, most recently from 75ff4fa to a83c273 Compare August 28, 2025 16:32

carrodher changed the title ~~[bitnami/redis-cluster]: add preStop hook that gracefully fails over …~~ [bitnami/redis-cluster]: add preStop hook that gracefully fails over master nodes on pod termination Aug 28, 2025

sobotklp force-pushed the redis-cluster-prestop branch from ffefc85 to 6d09087 Compare August 28, 2025 16:58

sobotklp and others added 2 commits September 4, 2025 19:59

[bitnami/redis-cluster]: add preStop hook that gracefully fails over …

770594d

…master nodes on pod termination Signed-off-by: Lewis Sobotkiewicz <lewis.sobotkiewicz@wealthsimple.com>

Update CHANGELOG.md

59f9262

Signed-off-by: Bitnami Bot <bitnami.bot@broadcom.com>

sobotklp force-pushed the redis-cluster-prestop branch from c0cbf9b to 59f9262 Compare September 5, 2025 02:00

bitnami-bot and others added 2 commits September 5, 2025 02:05

Update CHANGELOG.md

d22249a

Signed-off-by: Bitnami Bot <bitnami.bot@broadcom.com>

Use correct path for restart script

942c0f6

Signed-off-by: Lewis Sobotkiewicz <lewis.sobotkiewicz@wealthsimple.com>

javsalgar added verify Execute verification workflow for these changes in-progress labels Sep 5, 2025

github-actions bot removed the triage Triage is needed label Sep 5, 2025

github-actions bot unassigned carrodher Sep 5, 2025

github-actions bot removed the request for review from carrodher September 5, 2025 06:53

github-actions bot assigned migruiz4 Sep 5, 2025

github-actions bot requested a review from migruiz4 September 5, 2025 06:53

xqianwang approved these changes Sep 5, 2025

github-actions bot added the stale 15 days without activity label Sep 21, 2025

Sleep for terminationGracePeriodSeconds-10 after failing over

1f3a3bb

Signed-off-by: Lewis Sobotkiewicz <lewis.sobotkiewicz@wealthsimple.com>

Update CHANGELOG.md

1f7ee6b

Signed-off-by: Bitnami Bot <bitnami.bot@broadcom.com>

github-actions bot removed the stale 15 days without activity label Sep 23, 2025

github-actions bot added the stale 15 days without activity label Oct 9, 2025

github-actions bot added the solved label Oct 14, 2025

bitnami-bot added stale 15 days without activity and removed stale 15 days without activity labels Oct 14, 2025

bitnami-bot closed this Oct 14, 2025

github-actions bot removed the in-progress label Oct 14, 2025

carrodher reopened this Oct 15, 2025

github-actions bot added triage Triage is needed and removed solved labels Oct 15, 2025

github-actions bot assigned javsalgar Oct 15, 2025

github-actions bot requested a review from javsalgar October 15, 2025 08:54

Update CHANGELOG.md

6d5b591

Signed-off-by: Bitnami Bot <bitnami.bot@broadcom.com>

carrodher unassigned javsalgar Oct 15, 2025

carrodher removed the request for review from javsalgar October 15, 2025 11:09

github-actions bot removed the stale 15 days without activity label Oct 16, 2025

github-actions bot added the stale 15 days without activity label Nov 1, 2025

github-actions bot added the solved label Nov 7, 2025

bitnami-bot added stale 15 days without activity and removed stale 15 days without activity labels Nov 7, 2025

bitnami-bot closed this Nov 7, 2025
