Skip to content

K8SPXC-1848: update existing cert-manager Certificate CRs on operator upgrade#2414

Open
larainema wants to merge 1 commit intopercona:mainfrom
larainema:fix/update-cert-manager-certificate-specs
Open

K8SPXC-1848: update existing cert-manager Certificate CRs on operator upgrade#2414
larainema wants to merge 1 commit intopercona:mainfrom
larainema:fix/update-cert-manager-certificate-specs

Conversation

@larainema
Copy link
Copy Markdown

Problem

createSSLByCertManager() uses r.client.Create() with an IsAlreadyExists guard for all three cert-manager Certificate resources (CA, ssl, ssl-internal). When the Certificate CR already exists, the function silently skips it — any spec changes introduced in newer operator versions are never applied.

The most significant impact is the CA certificate duration: clusters originally deployed with operator < 1.15.0 still carry duration: 8760h (1 year) instead of the current default 26280h (3 years), because the Certificate CR was never updated after the operator upgrade.

Fixes #2413

Root Cause

err := r.client.Create(ctx, caCert)
if err != nil && !k8serr.IsAlreadyExists(err) {
    return fmt.Errorf("create CA certificate: %v", err)
}

The same pattern is used for all three Certificate CRs. On AlreadyExists, the new desired spec is discarded.

Fix

Add a createOrUpdateCertificate() helper that:

  1. Attempts Create first (fast path for new clusters)
  2. On AlreadyExists: fetches the existing Certificate CR
  3. Compares Spec and Labels using reflect.DeepEqual
  4. If they differ, copies the desired spec/labels onto the existing object and calls Update

All three call sites in createSSLByCertManager now use this helper.

Affected Fields (now reconciled on upgrade)

  • spec.duration — CA validity (most impactful: 1yr → 3yr)
  • spec.renewBefore
  • spec.dnsNames (SANs)
  • spec.commonName
  • labels

Related

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 31, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@it-percona-cla
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@larainema larainema changed the title fix: update existing cert-manager Certificate CRs on operator upgrade K8SPXC: update existing cert-manager Certificate CRs on operator upgrade Mar 31, 2026
return fmt.Errorf("create CA certificate: %v", err)
err := r.createOrUpdateCertificate(ctx, caCert)
if err != nil {
return fmt.Errorf("create or update CA certificate: %v", err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please wrap errors here other changed lines

return nil
}

func (r *ReconcilePerconaXtraDBCluster) createOrUpdateCertificate(ctx context.Context, desired *cm.Certificate) error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be better to use controllerutil.CreateOrUpdate?

@egegunes egegunes added this to the v1.20.0 milestone Apr 1, 2026
@egegunes egegunes changed the title K8SPXC: update existing cert-manager Certificate CRs on operator upgrade K8SPXC-1848: update existing cert-manager Certificate CRs on operator upgrade Apr 1, 2026
@larainema larainema force-pushed the fix/update-cert-manager-certificate-specs branch from 65f04b9 to 6feb7f0 Compare April 1, 2026 06:55
Copy link
Copy Markdown
Contributor

@egegunes egegunes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes LGTM.

We need to add a case into e2e tests though, probably to tls-issue-cert-manager:

  • Patch PerconaXtraDBCluster to change something under spec.tls, potentially the duration.
  • Check if certificates are updated.
  • Ensure cluster is still available.

@eleo007 wdyt?

Copy link
Copy Markdown
Contributor

@egegunes egegunes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larainema please add a case into tls-issue-cert-manager e2e tests to check certificate updates

@pull-request-size pull-request-size bot added size/L 100-499 lines and removed size/M 30-99 lines labels Apr 6, 2026
@larainema larainema force-pushed the fix/update-cert-manager-certificate-specs branch 3 times, most recently from d345033 to 41d97bd Compare April 7, 2026 23:13
createSSLByCertManager() uses r.client.Create() with an IsAlreadyExists
guard, so any spec changes (duration, renewBefore, SANs, labels)
introduced in newer operator versions are never applied to Certificate
CRs that were created by an older version.

The most significant impact is the CA certificate duration: clusters
originally deployed with operator < 1.15.0 still carry duration=8760h
(1 year) instead of the current default of 26280h (3 years).

Replace the create-and-ignore pattern with a createOrUpdateCertificate
helper that attempts Create first, and on AlreadyExists fetches the
existing Certificate, compares Spec and Labels, and issues an Update
when they differ.

Fixes percona#2413
@larainema larainema force-pushed the fix/update-cert-manager-certificate-specs branch from 41d97bd to d346d57 Compare April 8, 2026 06:33
@JNKPercona
Copy link
Copy Markdown
Collaborator

Test Name Result Time
auto-tuning-8-0 passed 00:22:19
allocator-8-0 passed 00:16:08
allocator-8-4 passed 00:14:05
backup-storage-tls-8-0 failure 00:10:29
cross-site-8-0 passed 00:37:01
custom-users-8-0 passed 00:13:54
demand-backup-cloud-8-0 passed 01:00:48
demand-backup-cloud-8-4 passed 01:01:07
demand-backup-cloud-pxb-8-0 passed 00:57:17
demand-backup-encrypted-with-tls-5-7 passed 00:47:34
demand-backup-encrypted-with-tls-8-0 passed 00:47:03
demand-backup-encrypted-with-tls-8-4 passed 00:47:24
demand-backup-encrypted-with-tls-pxb-5-7 passed 00:18:40
demand-backup-encrypted-with-tls-pxb-8-0 passed 00:17:12
demand-backup-encrypted-with-tls-pxb-8-4 passed 00:18:42
demand-backup-8-0 passed 00:44:15
demand-backup-flow-control-8-0 passed 00:10:49
demand-backup-flow-control-8-4 passed 00:10:59
demand-backup-parallel-8-0 passed 00:09:37
demand-backup-parallel-8-4 passed 00:09:31
demand-backup-without-passwords-8-0 passed 00:15:30
demand-backup-without-passwords-8-4 passed 00:15:33
extra-pvc-8-0 passed 00:24:30
haproxy-5-7 passed 00:14:17
haproxy-8-0 passed 00:14:20
haproxy-8-4 passed 00:13:51
init-deploy-5-7 passed 00:16:00
init-deploy-8-0 passed 00:16:57
limits-8-0 passed 00:12:04
monitoring-2-0-8-0 passed 00:23:36
monitoring-pmm3-8-0 passed 00:18:47
monitoring-pmm3-8-4 passed 00:19:32
one-pod-5-7 passed 00:13:43
one-pod-8-0 passed 00:13:48
pitr-8-0 passed 00:47:14
pitr-8-4 passed 00:48:51
pitr-pxb-8-0 passed 00:46:30
pitr-pxb-8-4 passed 00:46:13
pitr-gap-errors-8-0 passed 00:49:56
pitr-gap-errors-8-4 passed 00:49:13
proxy-protocol-8-0 passed 00:09:40
proxy-switch-8-0 passed 00:13:42
proxysql-sidecar-res-limits-8-0 passed 00:08:54
proxysql-scheduler-8-0 passed 00:27:28
pvc-resize-5-7 passed 00:17:56
pvc-resize-8-0 passed 00:16:50
recreate-8-0 passed 00:18:19
restore-to-encrypted-cluster-8-0 passed 00:28:02
restore-to-encrypted-cluster-8-4 passed 00:27:43
restore-to-encrypted-cluster-pxb-8-0 passed 00:17:18
restore-to-encrypted-cluster-pxb-8-4 passed 00:16:52
scaling-proxysql-8-0 passed 00:08:59
scaling-8-0 passed 00:11:17
scheduled-backup-5-7 failure 00:00:52
scheduled-backup-8-0 failure 00:01:04
scheduled-backup-8-4 failure 00:01:04
security-context-8-0 passed 00:27:08
smart-update1-8-0 passed 00:33:48
smart-update1-8-4 passed 00:33:02
smart-update2-8-0 passed 00:39:14
smart-update2-8-4 failure 00:01:02
smart-update3-8-0 passed 00:18:31
storage-8-0 passed 00:10:41
tls-issue-cert-manager-ref-8-0 passed 00:09:39
tls-issue-cert-manager-8-0 failure 00:01:01
tls-issue-self-8-0 failure 00:01:05
upgrade-consistency-8-0 failure 00:00:58
upgrade-consistency-8-4 failure 00:01:05
upgrade-haproxy-5-7 passed 00:27:15
upgrade-haproxy-8-0 passed 00:27:20
upgrade-proxysql-5-7 failure 01:30:13
upgrade-proxysql-8-0 passed 00:21:00
users-5-7 passed 00:32:54
users-8-0 passed 00:29:32
users-scheduler-8-4 passed 00:31:24
validation-hook-8-0 passed 00:02:32
Summary Value
Tests Run 76/76
Job Duration 05:58:44
Total Test Time 29:21:18

commit: d346d57
image: perconalab/percona-xtradb-cluster-operator:PR-2414-d346d577

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L 100-499 lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cert-manager Certificate CRs are never updated after initial creation (create-only semantics)

5 participants