K8SPSMDB-1457: add certManagementPolicy option by myJamong · Pull Request #2266 · percona/percona-server-mongodb-operator

myJamong · 2026-03-05T09:00:12Z

CHANGE DESCRIPTION

Problem:
The Percona Server for MongoDB Operator automatically generates a new SSL certificate when it cannot find a user-provided secret. In scenarios where the user-provided secret is temporarily lost (e.g., EKS upgrade, External Secrets controller failure, operational mistake), the operator creates a new self-signed certificate with a different CA. This triggers a rolling restart of all MongoDB pods, causing client applications that rely on the original CA certificate to lose connectivity — leading to a severe and unexpected service outage.

Cause:
In reconcileSSL(), the operator cannot distinguish between "the secret was never created" and "the secret existed but was lost." When a user-provided secret is missing, the operator falls through to the automatic certificate creation logic (createSSLManually or createSSLByCertManager), regardless of whether the user intended to manage certificates externally.

Solution:
Added a new configurable field spec.tls.certManagementPolicy to the CRD with two possible values:

auto (default): Existing behavior — operator creates certificates automatically if none are found.
userProvidedOnly: Operator skips automatic certificate creation entirely and returns nil, leaving the certificate lifecycle fully to the user (e.g., External Secrets, manual management). A log message is emitted when this policy is active.

This puts control back into the hands of the user, preventing unintended certificate regeneration and pod restarts in production environments.
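A minimal sketch of the decision the two policies imply, assuming the operator already knows whether the TLS secret exists before it reaches certificate creation. The Go names (CertManagementPolicy, decideSSLAction, the action constants) are hypothetical and are not the operator's actual code; only the policy values "auto" and "userProvidedOnly" come from this PR. An empty value is treated as "auto" because that is the stated default.

```go
package main

import "fmt"

// CertManagementPolicy mirrors spec.tls.certManagementPolicy.
// Hypothetical type; only the string values come from the PR.
type CertManagementPolicy string

const (
	PolicyAuto             CertManagementPolicy = "auto"
	PolicyUserProvidedOnly CertManagementPolicy = "userProvidedOnly"
)

// sslAction is a hypothetical label for what one reconcile pass would do.
type sslAction string

const (
	actionUseExisting sslAction = "use-existing"
	actionGenerate    sslAction = "generate"
	actionSkip        sslAction = "skip"
)

// decideSSLAction picks an action from whether the TLS secret exists and
// the configured policy. An existing secret is always reused.
func decideSSLAction(secretExists bool, policy CertManagementPolicy) (sslAction, error) {
	if secretExists {
		return actionUseExisting, nil
	}
	switch policy {
	case "", PolicyAuto:
		// Existing behavior: create certificates when none are found.
		return actionGenerate, nil
	case PolicyUserProvidedOnly:
		// Never generate; the certificate lifecycle belongs to the user.
		return actionSkip, nil
	default:
		return "", fmt.Errorf("unknown certManagementPolicy %q", policy)
	}
}

func main() {
	for _, p := range []CertManagementPolicy{"", PolicyAuto, PolicyUserProvidedOnly} {
		a, err := decideSSLAction(false, p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("secret missing, policy=%q -> %s\n", p, a)
	}
}
```

Returning a "skip" action instead of an error matches the described behavior (reconcileSSL returns nil), so a temporarily missing secret does not fail the whole reconcile loop or trigger a new CA.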

Relates to: #1758

CHECKLIST

Jira

Is the Jira ticket created and referenced properly?
Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

Is an E2E test/test case added for the new feature/change?
Are unit tests added where appropriate?
Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

Are all needed new/changed options added to default YAML files?
Are all needed new/changed options added to the Helm Chart?
Did we add proper logging messages for operator actions?
Did we ensure compatibility with the previous version or cluster upgrade process?
Does the change support oldest and newest supported MongoDB version?
Does the change support oldest and newest supported Kubernetes version?

CLAassistant · 2026-03-05T09:00:18Z

All committers have signed the CLA.

myJamong · 2026-03-05T09:07:46Z

One consideration is whether to add a guardrail that blocks the userProvidedOnly → auto switch when SSL secrets are missing. However, this might not be the right approach because:

The auto policy by definition means "create certificates if not found" — blocking that would contradict its purpose
There are legitimate cases where a user intentionally wants to discard old certificates and let the operator generate new ones

What do you think? Would it be better to add a validation/warning at the operator level, or is documenting this behavior sufficient?

egegunes

@myJamong please add cert-management-policy into run-pr.csv and run-release.csv in e2e-tests/

egegunes · 2026-03-06T05:33:13Z

> One consideration is whether to add a guardrail that blocks the userProvidedOnly → auto switch when SSL secrets are missing. However, this might not be the right approach because:
>
> The auto policy by definition means "create certificates if not found" — blocking that would contradict its purpose
>
> There are legitimate cases where a user intentionally wants to discard old certificates and let the operator generate new ones
>
> What do you think? Would it be better to add a validation/warning at the operator level, or is documenting this behavior sufficient?

I think this is auto doing its job. I don't think we need to add logic into operator for additional guardrails.

myJamong · 2026-03-06T06:12:10Z

> @myJamong please add cert-management-policy into run-pr.csv and run-release.csv in e2e-tests/

@egegunes thanks for the guide. I added and committed it.
27e70a1

myJamong · 2026-03-06T09:18:09Z

just changed the commit author on force push


egegunes · 2026-03-09T05:38:33Z

pkg/controller/perconaservermongodb/psmdb_controller.go

+				cr.Status.AddCondition(api.ClusterCondition{
+					Status:  api.ConditionTrue,
+					Type:    api.ConditionTypeTLSSecretMissing,
+					Reason:  "TLSSecretNotFound",
+					Message: fmt.Sprintf("TLS secret %s is missing, certManagementPolicy is userProvidedOnly", api.SSLSecretName(cr)),
+				})


@myJamong I think rather than having a negative condition type and a positive status, we should have a positive type and a negative status. For example:

	cr.Status.AddCondition(api.ClusterCondition{
		Status:  api.ConditionFalse,
		Type:    api.ConditionTypeTLSSecretsReady,
		Reason:  "TLSSecretNotFound",
		Message: fmt.Sprintf("TLS secret %s is missing, certManagementPolicy is userProvidedOnly", api.SSLSecretName(cr)),
	})

Changed - 4968ab6

Renamed TLSSecretMissing to TLSSecretsReady with positive type and negative status and updated all references including unit tests and e2e tests.
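To illustrate why the positive-type, negative-status shape is easier to consume, here is a minimal sketch of how a client might read the renamed condition. ClusterCondition, ConditionStatus and tlsSecretsReady below are simplified, hypothetical stand-ins for the operator's api types quoted in the review (the real structs have more fields), and the secret name is made up. With a "Ready"-style type, a consumer only needs to check for Status == True.

```go
package main

import "fmt"

// Simplified, hypothetical mirrors of the api types from the review thread.
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

type ClusterCondition struct {
	Type    string
	Status  ConditionStatus
	Reason  string
	Message string
}

const ConditionTypeTLSSecretsReady = "TLSSecretsReady"

// tlsSecretsReady reports whether TLSSecretsReady is True.
// ok is false when the condition has not been set at all.
func tlsSecretsReady(conds []ClusterCondition) (ready bool, ok bool) {
	for _, c := range conds {
		if c.Type == ConditionTypeTLSSecretsReady {
			return c.Status == ConditionTrue, true
		}
	}
	return false, false
}

func main() {
	secretName := "my-cluster-ssl" // hypothetical secret name
	conds := []ClusterCondition{{
		Type:    ConditionTypeTLSSecretsReady,
		Status:  ConditionFalse,
		Reason:  "TLSSecretNotFound",
		Message: fmt.Sprintf("TLS secret %s is missing, certManagementPolicy is userProvidedOnly", secretName),
	}}
	ready, ok := tlsSecretsReady(conds)
	fmt.Println(ready, ok) // false true
}
```

Distinguishing "condition absent" (ok == false) from "condition False" lets a caller treat a cluster that has never reported TLS status differently from one whose user-provided secret is known to be missing.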


JNKPercona · 2026-03-10T09:52:42Z

Test Name	Result	Time
arbiter	passed	00:11:55
balancer	passed	00:18:41
cert-management-policy	passed	00:08:44
cross-site-sharded	passed	00:18:25
custom-replset-name	failure	00:14:22
custom-tls	passed	00:14:19
custom-users-roles	passed	00:11:18
custom-users-roles-sharded	passed	00:14:56
data-at-rest-encryption	passed	00:12:53
data-sharded	passed	00:23:42
demand-backup	passed	00:15:49
demand-backup-eks-credentials-irsa	passed	00:00:07
demand-backup-fs	passed	00:28:38
demand-backup-if-unhealthy	passed	00:10:35
demand-backup-incremental-aws	passed	00:16:09
demand-backup-incremental-azure	passed	00:15:56
demand-backup-incremental-gcp-native	passed	00:11:10
demand-backup-incremental-gcp-s3	passed	00:11:21
demand-backup-incremental-minio	passed	00:25:36
demand-backup-incremental-sharded-aws	passed	00:26:19
demand-backup-incremental-sharded-azure	passed	00:18:24
demand-backup-incremental-sharded-gcp-native	passed	00:17:57
demand-backup-incremental-sharded-gcp-s3	passed	00:17:36
demand-backup-incremental-sharded-minio	passed	00:27:24
demand-backup-physical-parallel	passed	00:08:34
demand-backup-physical-aws	passed	00:11:55
demand-backup-physical-azure	passed	00:12:23
demand-backup-physical-gcp-s3	passed	00:12:01
demand-backup-physical-gcp-native	passed	00:13:33
demand-backup-physical-minio	passed	00:20:20
demand-backup-physical-minio-native	passed	00:25:56
demand-backup-physical-minio-native-tls	passed	00:20:06
demand-backup-physical-sharded-parallel	passed	00:11:12
demand-backup-physical-sharded-aws	passed	00:18:29
demand-backup-physical-sharded-azure	passed	00:17:52
demand-backup-physical-sharded-gcp-native	failure	00:09:00
demand-backup-physical-sharded-minio	passed	00:18:18
demand-backup-physical-sharded-minio-native	passed	00:17:44
demand-backup-sharded	passed	00:26:37
disabled-auth	passed	00:16:34
expose-sharded	passed	00:34:04
finalizer	passed	00:10:26
ignore-labels-annotations	passed	00:07:54
init-deploy	passed	00:13:08
ldap	passed	00:09:09
ldap-tls	passed	00:12:54
limits	passed	00:06:21
liveness	passed	00:10:27
mongod-major-upgrade	passed	00:13:22
mongod-major-upgrade-sharded	passed	00:20:56
monitoring-2-0	passed	00:25:07
monitoring-pmm3	passed	00:26:25
multi-cluster-service	passed	00:14:01
multi-storage	passed	00:19:12
non-voting-and-hidden	passed	00:17:01
one-pod	failure	00:04:58
operator-self-healing-chaos	passed	00:13:50
pitr	failure	00:05:14
pitr-physical	passed	01:01:55
pitr-sharded	passed	00:21:59
pitr-to-new-cluster	passed	00:26:23
pitr-physical-backup-source	passed	00:56:40
preinit-updates	passed	00:05:06
pvc-auto-resize	passed	00:14:00
pvc-resize	passed	00:17:38
recover-no-primary	passed	00:27:01
replset-overrides	passed	00:17:57
replset-remapping	passed	00:17:08
replset-remapping-sharded	passed	00:17:13
rs-shard-migration	passed	00:14:30
scaling	passed	00:11:09
scheduled-backup	passed	00:17:15
security-context	passed	00:07:07
self-healing-chaos	passed	00:14:59
service-per-pod	passed	00:19:41
serviceless-external-nodes	passed	00:07:27
smart-update	passed	00:08:31
split-horizon	passed	00:14:59
stable-resource-version	passed	00:04:39
storage	passed	00:07:33
tls-issue-cert-manager	passed	00:30:09
unsafe-psa	passed	00:07:52
upgrade	passed	00:09:23
upgrade-consistency	passed	00:06:23
upgrade-consistency-sharded-tls	passed	00:54:04
upgrade-sharded	passed	00:19:54
upgrade-partial-backup	passed	00:16:04
users	passed	00:17:26
users-vault	passed	00:13:31
version-service	passed	00:24:30

Summary	Value
Tests Run	90/90
Job Duration	04:12:13
Total Test Time	25:28:10

commit: f9fc556
image: perconalab/percona-server-mongodb-operator:PR-2266-f9fc55604

myJamong requested review from egegunes, eleo007, gkech, hors, jvpasinatto, mayankshah1607, nmarukovich, oksana-grishchenko, pooknull and valmiranogueira as code owners March 5, 2026 09:00

pull-request-size bot added the size/L 100-499 lines label Mar 5, 2026

egegunes requested changes Mar 6, 2026


egegunes added this to the v1.23.0 milestone Mar 6, 2026

egegunes changed the title ~~K8sPSMDB-1457 add certManagementPolicy option~~ K8SPSMDB-1457: add certManagementPolicy option Mar 6, 2026

myJamong added 3 commits March 6, 2026 18:16

add certManagerPolicy option

7b280ee

add cert-management-policy to e2e-test

99e280f

update generated manifests for certManagementPolicy

65bc7a8

myJamong force-pushed the K8SPSMDB-1457-add-certManagementPolicy-option branch from 306d678 to 65bc7a8 Compare March 6, 2026 09:16

myJamong added 7 commits March 7, 2026 09:50

fix: gracefully handle missing TLS secret when certManagementPolicy is userProviededOnly

311ebda

add TLSSecretMissing Status condition when TLS secret is missing with userProvieded Only policy

13c07d9

read existing StatefulSet SSL annotations to prevent pod restart when TLS secret is missing

3f68918

add TLSSecretMissing condition verification to e2e test

283d3d9

add unit tests for certManagementPolicy userProvidedOnly SSL handling

06c12bc

hange TLS secret missing log level from Info to Error for better visibility

3ba9f74

skip cluster readiness check when verifying pods after SSL secret deletion

0b573f5

egegunes requested changes Mar 9, 2026


egegunes added the community label Mar 9, 2026

myJamong and others added 2 commits March 9, 2026 17:49

Rename TLSSecretMissing condition to TLSSecretsReady with positive type and negative status

4968ab6

Merge branch 'main' into K8SPSMDB-1457-add-certManagementPolicy-option

f9fc556

egegunes approved these changes Mar 11, 2026


myJamong wants to merge 12 commits into percona:main from myJamong:K8SPSMDB-1457-add-certManagementPolicy-option

4 participants

Conversation

myJamong commented Mar 5, 2026

CHANGE DESCRIPTION

CHECKLIST

Uh oh!

CLAassistant commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

myJamong commented Mar 5, 2026

Uh oh!

egegunes left a comment

Choose a reason for hiding this comment

Uh oh!

egegunes commented Mar 6, 2026

Uh oh!

myJamong commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

myJamong commented Mar 6, 2026

Uh oh!

egegunes Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

myJamong Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

JNKPercona commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

CLAassistant commented Mar 5, 2026 •

edited

Loading

myJamong commented Mar 6, 2026 •

edited

Loading