K8SPSMDB-1457: add certManagementPolicy option#2266
myJamong wants to merge 12 commits into percona:main
One consideration is whether to add a guardrail that blocks the userProvidedOnly → auto switch when SSL secrets are missing. However, this might not be the right approach because:
What do you think? Would it be better to add a validation/warning at the operator level, or is documenting this behavior sufficient?
I think this is auto doing its job. I don't think we need to add logic to the operator for additional guardrails.
Force-pushed: 306d678 to 65bc7a8
just changed the commit author on force push
…s userProviededOnly
… userProvieded Only policy
… TLS secret is missing
cr.Status.AddCondition(api.ClusterCondition{
	Status:  api.ConditionTrue,
	Type:    api.ConditionTypeTLSSecretMissing,
	Reason:  "TLSSecretNotFound",
	Message: fmt.Sprintf("TLS secret %s is missing, certManagementPolicy is userProvidedOnly", api.SSLSecretName(cr)),
})
@myJamong I think that rather than a negative condition type with a positive status, we should have a positive type with a negative status. For example:
cr.Status.AddCondition(api.ClusterCondition{
	Status:  api.ConditionFalse,
	Type:    api.ConditionTypeTLSSecretsReady,
	Reason:  "TLSSecretNotFound",
	Message: fmt.Sprintf("TLS secret %s is missing, certManagementPolicy is userProvidedOnly", api.SSLSecretName(cr)),
})
Changed - 4968ab6
Renamed TLSSecretMissing to TLSSecretsReady, so the condition now has a positive type with a negative status, and updated all references, including unit tests and e2e tests.
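For readers following the rename, here is a minimal sketch of how a consumer might read a positive-type condition. It does not use the operator's api package: the Condition struct, the string value "TLSSecretsReady", and the tlsSecretsReady helper are all hypothetical stand-ins chosen for illustration.

```go
package main

import "fmt"

// ConditionStatus and Condition are simplified stand-ins for the operator's
// condition types; they are assumptions made for this sketch.
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

// condTLSSecretsReady is an assumed string value for the condition type.
const condTLSSecretsReady = "TLSSecretsReady"

type Condition struct {
	Type    string
	Status  ConditionStatus
	Reason  string
	Message string
}

// tlsSecretsReady (hypothetical helper) returns whether the TLS secrets are
// reported ready, and whether the condition was present at all.
func tlsSecretsReady(conds []Condition) (ready bool, known bool) {
	for _, c := range conds {
		if c.Type == condTLSSecretsReady {
			return c.Status == ConditionTrue, true
		}
	}
	return false, false
}

func main() {
	conds := []Condition{{
		Type:    condTLSSecretsReady,
		Status:  ConditionFalse,
		Reason:  "TLSSecretNotFound",
		Message: "TLS secret is missing, certManagementPolicy is userProvidedOnly",
	}}
	ready, known := tlsSecretsReady(conds)
	fmt.Printf("ready=%v known=%v\n", ready, known)
}
```

With a positive type, "Status is True" always means healthy, which matches how conditions such as Ready are usually read and avoids double negatives in alerting rules.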
commit: f9fc556
CHANGE DESCRIPTION
Problem:
The Percona Server for MongoDB Operator automatically generates a new SSL certificate when it cannot find a user-provided secret. In scenarios where the user-provided secret is temporarily lost (e.g., EKS upgrade, External Secrets controller failure, operational mistake), the operator creates a new self-signed certificate with a different CA. This triggers a rolling restart of all MongoDB pods, causing client applications that rely on the original CA certificate to lose connectivity — leading to a severe and unexpected service outage.
Cause:
In reconcileSSL(), the operator cannot distinguish between "the secret was never created" and "the secret existed but was lost." When a user-provided secret is missing, the operator falls through to the automatic certificate creation logic (createSSLManually or createSSLByCertManager), regardless of whether the user intended to manage certificates externally.
Solution:
Added a new configurable field spec.tls.certManagementPolicy to the CRD with two possible values: auto and userProvidedOnly.
This puts control back into the hands of the user, preventing unintended certificate regeneration and pod restarts in production environments.
Relates to: #1758