fix: bluegreen analysis prematurely succeeds if new ReplicaSet becomes unsaturated #4604
…s unsaturated Signed-off-by: Jesse Suen <jesse@akuity.io>
Signed-off-by: Jesse Suen <jesse@akuity.io>
705f2cf to 6237800
Signed-off-by: Jesse Suen <jesse@akuity.io>
|
…s unsaturated (#4604) * fix: bluegreen analysis prematurely succeeds if new ReplicaSet becomes unsaturated Signed-off-by: Jesse Suen <jesse@akuity.io> * fix: add unit tests Signed-off-by: Jesse Suen <jesse@akuity.io> * fix: function comment was wrong Signed-off-by: Jesse Suen <jesse@akuity.io> --------- Signed-off-by: Jesse Suen <jesse@akuity.io>
|
Fixed in 1.8.4 and the upcoming 1.9.0.



Resolves #3724.
During blue-green rollout reconciliation, two methods determine whether to skip and cancel the pre- and post-promotion analysis (skipPrePromotionAnalysisRun, skipPostPromotionAnalysisRun).
One of the checks in those methods is whether the new ReplicaSet is fully saturated. If it is not, the method returns true, meaning analysis should be skipped/cancelled. The intended purpose of this check is to prevent analysis from starting until the new ReplicaSet is fully up and saturated. In the happy path, where Pods never go down after the ReplicaSet becomes saturated, this is not a problem.
However, if the new ReplicaSet ever becomes unsaturated after pre- or post-promotion analysis has already started, and while that analysis is still running, these methods return true, so the in-progress analysis is treated as one that should be skipped/cancelled.
The checks are performed during every Rollout reconciliation. If the new ReplicaSet happens to become unsaturated (e.g., due to normal node/pod churn), the rollout is promoted prematurely.
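The failure mode above can be illustrated with a minimal sketch. This is not the controller's code: the newReplicaSet type, its fields, and skipAnalysisBeforeFix are hypothetical stand-ins that model only the saturation check, on the assumption that "saturated" means every desired replica is available.

```go
package main

// newReplicaSet is a hypothetical, simplified stand-in for the state the
// controller reads from the new ReplicaSet. It is not an Argo Rollouts type.
type newReplicaSet struct {
	DesiredReplicas   int32
	AvailableReplicas int32
}

// saturated reports whether every desired replica is currently available.
// (Assumption: this is what "fully saturated" means for this sketch.)
func (rs newReplicaSet) saturated() bool {
	return rs.AvailableReplicas >= rs.DesiredReplicas
}

// skipAnalysisBeforeFix models only the saturation check described above,
// as it behaved before this change: an unsaturated ReplicaSet always means
// "skip/cancel analysis", whether or not analysis is already running.
func skipAnalysisBeforeFix(rs newReplicaSet) bool {
	return !rs.saturated()
}
```

Because the check runs on every reconciliation, a single reconcile that observes, say, 2 of 3 replicas available is enough for this function to return true, even if an earlier reconcile saw 3 of 3 and analysis had already begun.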
To reproduce this:
This change fixes the issue by no longer considering pod saturation of the new ReplicaSet once analysis has already started.
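A rough sketch of the idea behind the fix, under stated assumptions: rolloutState, its fields (including AnalysisStarted), and shouldSkipAnalysis are hypothetical names for illustration and do not mirror the actual skipPrePromotionAnalysisRun or skipPostPromotionAnalysisRun signatures. Only the saturation check is modeled; the real methods perform other checks as well.

```go
package main

// rolloutState is a hypothetical, simplified view of what the saturation
// check needs. It is not an Argo Rollouts type.
type rolloutState struct {
	DesiredReplicas   int32
	AvailableReplicas int32
	// AnalysisStarted is an assumed flag: true once the pre- or
	// post-promotion analysis for this rollout has been started.
	AnalysisStarted bool
}

// shouldSkipAnalysis models only the saturation portion of the decision.
// Saturation gates the *start* of analysis; once analysis has started,
// saturation is ignored so that pod churn cannot cancel it.
func shouldSkipAnalysis(s rolloutState) bool {
	if s.AnalysisStarted {
		// Analysis is already in flight: do not consult saturation.
		return false
	}
	// Analysis has not started yet: wait until the new ReplicaSet is
	// fully saturated before allowing it to begin.
	return s.AvailableReplicas < s.DesiredReplicas
}
```

With this guard, a reconcile that sees 2 of 3 replicas available still defers analysis when none has started, but leaves an in-progress analysis alone.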