[bitnami/redis-cluster]: add preStop hook that gracefully fails over master nodes on pod termination#36221
[bitnami/redis-cluster]: add preStop hook that gracefully fails over master nodes on pod termination#36221sobotklp wants to merge 7 commits intobitnami:mainfrom
Conversation
75ff4fa to
a83c273
Compare
|
Thank you for initiating this pull request. We appreciate your effort. This is just a friendly reminder that signing your commits is important. Your signature certifies that you either authored the patch or have the necessary rights to contribute to the changes. You can find detailed information on how to do this in the “Sign your work” section of our contributing guidelines. Feel free to reach out if you have any questions or need assistance with the signing process. |
ffefc85 to
6d09087
Compare
…master nodes on pod termination Signed-off-by: Lewis Sobotkiewicz <lewis.sobotkiewicz@wealthsimple.com>
Signed-off-by: Bitnami Bot <bitnami.bot@broadcom.com>
c0cbf9b to
59f9262
Compare
Hi @carrodher . Thanks for the tips on signing my commits. I've rebased my commit and amended my change with my signature. Let me know if I can do anything else. :) |
Signed-off-by: Bitnami Bot <bitnami.bot@broadcom.com>
Signed-off-by: Lewis Sobotkiewicz <lewis.sobotkiewicz@wealthsimple.com>
|
The script uses this basic logic on pod termination:
I tested this by deploying a 9-node, 3-shard cluster locally without authentication or TLS, then manually terminating master nodes. |
It's also possible to |
| if [[ "$result" == "OK" ]]; then | ||
| {{- if .Values.cluster.redisShutdownWaitFailover }} | ||
| # Wait for clients to update their topology | ||
| sleep 10 |
There was a problem hiding this comment.
Should we make this configurable instead of fixed 10 seconds?
There was a problem hiding this comment.
it could be configurable up to a maximum of {{- $.Values.redis.terminationGracePeriodSeconds }} I would think.
There was a problem hiding this comment.
I updated this to wait $.Values.redis.terminationGracePeriodSeconds - 10 seconds
| mapfile -t REPLICA_IPS < <( get_replica_ips ) | ||
|
|
||
| NUM_REPLICAS=${#REPLICA_IPS[@]} | ||
| echo "Found $NUM_REPLICAS available replicas" |
There was a problem hiding this comment.
This should cover 0 replicas found. But maybe its better to add a warning message?
No replica found for failover; proceeding with shutdown
|
This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution. |
Signed-off-by: Lewis Sobotkiewicz <lewis.sobotkiewicz@wealthsimple.com>
|
@migruiz4 would you like me to do anything more here to validate this PR? |
Signed-off-by: Bitnami Bot <bitnami.bot@broadcom.com>
|
This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution. |
|
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Pull Request. Do not hesitate to reopen it later if necessary. |
|
Hi. Will this change be abandoned? |
Signed-off-by: Bitnami Bot <bitnami.bot@broadcom.com>
|
Though it would be nice to have this functionality directly in the chart, I was able to add it to the existing chart by adding the following values overrides: redis:
...
lifecycleHooks:
preStop:
exec:
command:
- /bin/bash
- -ec
- /custom-scripts/prestop-redis-cluster.sh
extraVolumeMounts:
- name: custom-scripts
mountPath: /custom-scripts
extraVolumes:
- name: custom-scripts
configMap:
name: custom-redis-scripts
defaultMode: 0755then creating my own |
|
This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution. |
|
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Pull Request. Do not hesitate to reopen it later if necessary. |
Description of the change
This is a fix for #23036
It implements graceful failover for terminations of Redis Cluster master pods. This will limit the impact of terminations and failover events, since Redis cluster clients will have the chance to update their topology before the previous pod terminates. This will improve uptime during planned maintenance events.
Benefits
Improved uptime by failing master pods over to other replicas.
Possible drawbacks
If
PodDisruptionBudgetisn't being used, it may initiate a failover to a node that is also about to also be terminated. The script will attempt to filter out replicas that aren't candidates for promotion.Applicable issues
Additional information
This is an attempt to mirror the graceful failover behaviour of the
redisandvalkeycharts when using Sentinel.Checklist
Chart.yamlaccording to semver. This is not necessary when the changes only affect README.md files.README.mdusing readme-generator-for-helm