
cert-manager Certificate CRs are never updated after initial creation (create-only semantics) #2413

@larainema


Bug Report

Summary

createSSLByCertManager() uses r.client.Create() with an IsAlreadyExists guard for all cert-manager Certificate resources (CA, ssl, ssl-internal). This means any changes to the Certificate spec (duration, renewBefore, SANs, etc.) introduced in newer operator versions are never applied to clusters that were originally created with an older version.

Impact

The most significant impact is the CA certificate duration. Before K8SPXC-1418 (included in v1.15.0), the CA Certificate was created with duration: 8760h (1 year). The fix changed it to DefaultCAValidity (3 years / 26280h).

However, clusters originally deployed with operator < 1.15.0 still have the old 1-year duration in their <cluster>-ca-cert Certificate CR. Upgrading the operator to 1.15.0+ does not update the existing Certificate CR, so cert-manager continues renewing the CA with a 1-year validity — defeating the purpose of the fix.

Similarly, the leaf certificate Duration field added in K8SPXC-1494 (guarded by CompareVersionWith("1.19.0")) will never be applied to clusters created before that version.

Steps to Reproduce

  1. Deploy a PXC cluster with operator version < 1.15.0
  2. Upgrade the operator to 1.18.0+
  3. Check the CA Certificate CR:
    kubectl get certificate <cluster>-ca-cert -n <namespace> -o jsonpath="{.spec.duration}"
    
  4. Observe: duration is still 8760h0m0s (1 year) instead of the expected 26280h0m0s (3 years)
  5. After cert-manager renews the CA, the new cert is still only valid for 1 year:
    kubectl get secret <cluster>-ca-cert -n <namespace> -o jsonpath="{.data.tls\.crt}" | base64 -d | openssl x509 -noout -dates
    

Root Cause

In pkg/controller/pxc/tls.go, createSSLByCertManager():

err := r.client.Create(ctx, caCert)
if err != nil && !k8serr.IsAlreadyExists(err) {
    return fmt.Errorf("create CA certificate: %v", err)
}

The same pattern is used for the ssl and ssl-internal Certificate CRs. When the resource already exists, the error is silently ignored and the function continues — the new spec values are never applied.

Expected Behavior

When the operator is upgraded, it should reconcile the Certificate CR specs to match the current desired state. This could be done via:

  1. Get-then-Update/Patch: Fetch the existing Certificate, compare relevant fields, and patch if they differ
  2. Server-Side Apply: Use r.client.Patch() with client.Apply (plus a client.FieldOwner, since apply requests require a field manager) to declaratively ensure the desired state

Affected Fields

  • spec.duration (CA certificate validity — most impactful)
  • spec.renewBefore
  • spec.dnsNames (SANs)
  • spec.commonName
  • spec.labels

Workaround

Manually patch the Certificate CR:

kubectl patch certificate <cluster>-ca-cert -n <namespace> \
  --type=merge -p '{"spec":{"duration":"26280h0m0s"}}'

Then delete the CA secret to trigger renewal with the new duration:

kubectl delete secret <cluster>-ca-cert -n <namespace>
