K8SPS-296: improve errors logging by pooknull · Pull Request #1238 · percona/percona-server-mysql-operator

pooknull · 2026-03-16T15:11:01Z

https://perconadev.atlassian.net/browse/K8SPS-296

DESCRIPTION

This PR improves error logging in several ways.

The task was focused on handling errors that can happen during user secret changes, for example unable to upgrade connection: container not found or reconcile: replication: failed to discover cluster. This PR handles those cases, but it also improves the logging behaviour for unhandled errors.

Currently, after each logged error, the operator prints an additional unformatted stack trace originating from the controller-runtime log.Error call, for example:

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226

This stack trace is not useful, since it only points to the logging inside controller-runtime rather than to the actual source of the error. To remove it, this PR sets StacktraceLevel: zapcore.PanicLevel in the zap options. We already rely on stack traces provided by github.com/pkg/errors, so useful stack trace information is still preserved and logged.

Another issue is that our existing stack traces are too huge. This happens because zap’s errorVerbose output includes stack traces from every wrapped error, while we only need the stack trace from the root cause. To address this, WrapWithDeepestStack was added. It preserves the full error message but keeps only the deepest wrapped error that contains a stack trace as the cause. This makes stack traces significantly shorter and easier to read

CHECKLIST

Jira

Is the Jira ticket created and referenced properly?
Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

Is an E2E test/test case added for the new feature/change?
Are unit tests added where appropriate?

Config/Logging/Testability

Are all needed new/changed options added to default YAML files?
Are all needed new/changed options added to the Helm Chart?
Did we add proper logging messages for operator actions?
Did we ensure compatibility with the previous version or cluster upgrade process?
Does the change support oldest and newest supported PS version?
Does the change support oldest and newest supported Kubernetes version?

https://perconadev.atlassian.net/browse/K8SPS-296

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

egegunes

@pooknull can you paste an example of WrapWithDeepestStack error?

pooknull · 2026-03-27T12:52:26Z

@egegunes Here is the example:

2026-03-27T12:47:27.288Z ERROR failed to update status {"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ps-cluster1","namespace":"andrii-24749"}, "namespace": "andrii-24749", "name": "ps-cluster1", "reconcileID": "91d7f5bb-8df4-4c71-8501-032420552ff6", "error": "update status func: check if GR is ready: test error", "errorVerbose": "update status func: check if GR is ready: test error\ntest error\ngithub.com/percona/percona-server-mysql-operator/pkg/db.(*ReplicationDBManager).CheckIfDatabaseExists\n\t/go/src/github.com/percona/percona-server-mysql-operator/pkg/db/replication.go:192\ngithub.com/percona/percona-server-mysql-operator/pkg/controller/ps.(*PerconaServerMySQLReconciler).isGRReady\n\t/go/src/github.com/percona/percona-server-mysql-operator/pkg/controller/ps/status.go:258\ngithub.com/percona/percona-server-mysql-operator/pkg/controller/ps.(*PerconaServerMySQLReconciler).reconcileCRStatus.func2\n\t/go/src/github.com/percona/percona-server-mysql-operator/pkg/controller/ps/status.go:90\ngithub.com/percona/percona-server-mysql-operator/pkg/controller/ps.writeStatus.func1\n\t/go/src/github.com/percona/percona-server-mysql-operator/pkg/controller/ps/status.go:478\nk8s.io/client-go/util/retry.OnError.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.35.2/util/retry/util.go:51\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection\n\t/go/pkg/mod/k8s.io/apimachinery@v0.35.2/pkg/util/wait/wait.go:150\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/pkg/mod/k8s.io/apimachinery@v0.35.2/pkg/util/wait/backoff.go:477\nk8s.io/client-go/util/retry.OnError\n\t/go/pkg/mod/k8s.io/client-go@v0.35.2/util/retry/util.go:50\nk8s.io/client-go/util/retry.RetryOnConflict\n\t/go/pkg/mod/k8s.io/client-go@v0.35.2/util/retry/util.go:104\ngithub.com/percona/percona-server-mysql-operator/pkg/controller/ps.writeStatus\n\t/go/src/github.com/percona/percona-server-mysql-operator/pkg/controller/ps/status.go:472\ngithub.com/percona/percona-server-mysql-operator/pkg/controller/ps.(*PerconaServerMySQLReconciler).reconcileCRStatus\n\t/go/src/github.com/percona/percona-server-mysql-operator/pkg/controller/ps/status.go:236\ngithub.com/percona/percona-server-mysql-operator/pkg/controller/ps.(*PerconaServerMySQLReconciler).Reconcile.func1\n\t/go/src/github.com/percona/percona-server-mysql-operator/pkg/controller/ps/controller.go:116\ngithub.com/percona/percona-server-mysql-operator/pkg/controller/ps.(*PerconaServerMySQLReconciler).Reconcile\n\t/go/src/github.com/percona/percona-server-mysql-operator/pkg/controller/ps/controller.go:142\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.23.3/pkg/internal/controller/controller.go:222\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.23.3/pkg/internal/controller/controller.go:479\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.23.3/pkg/internal/controller/controller.go:438\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.23.3/pkg/internal/controller/controller.go:313\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1693"}

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-27T12:58:24Z

pkg/util/errors.go

+	stackErr := err
+	for next := errors.Unwrap(err); next != nil; next = errors.Unwrap(next) {
+		if _, ok := next.(stackTracer); ok {
+			stackErr = next
+		}
+	}
+
+	text := err.Error()
+	if msg != "" {
+		text = msg + ": " + text
+	}
+
+	return &deepStackErr{
+		msg:   text,
+		cause: stackErr,
+	}


WrapWithDeepestStack changes the error chain by setting Unwrap()/Cause() to the deepest stack-bearing wrapped error (stackErr) instead of the original err. This can break errors.Is/errors.As matches for wrapper errors above stackErr and changes error semantics for callers. Consider keeping the original err as the Unwrap() target (preserving the full chain for Is/As) and only using the deepest stack-bearing error when formatting %+v for logging.

Copilot · 2026-03-27T12:58:25Z

pkg/orchestrator/client.go

+		return false
+	}
+
+	return errors.Is(err, io.ErrUnexpectedEOF) || err.Error() == "unexpected end of JSON input"


isTruncatedJSONErr relies on a strict string equality check against "unexpected end of JSON input". This is brittle if the JSON package changes wording or if the error gets wrapped with additional context. Prefer detecting this via errors.As to *json.SyntaxError (and/or using strings.Contains on the message) so the check is resilient to wrapping/prefixes.

Suggested change

return errors.Is(err, io.ErrUnexpectedEOF) || err.Error() == "unexpected end of JSON input"

if errors.Is(err, io.ErrUnexpectedEOF) {

return true

}

var syntaxErr *json.SyntaxError

if errors.As(err, &syntaxErr) && strings.Contains(syntaxErr.Error(), "unexpected end of JSON input") {

return true

}

return false

JNKPercona · 2026-03-27T15:23:44Z

Test Name	Result	Time
async-ignore-annotations-8-4	passed	00:06:24
async-global-metadata-8-4	passed	00:14:31
async-upgrade-8-0	passed	00:12:52
async-upgrade-8-4	passed	00:13:06
auto-config-8-4	passed	00:24:52
config-8-4	passed	00:17:13
config-router-8-0	passed	00:07:39
config-router-8-4	passed	00:07:30
demand-backup-minio-8-0	passed	00:19:40
demand-backup-minio-8-4	passed	00:20:00
demand-backup-cloud-8-4	passed	00:22:47
demand-backup-retry-8-4	passed	00:14:50
async-data-at-rest-encryption-8-0	passed	00:14:24
async-data-at-rest-encryption-8-4	passed	00:13:52
gr-global-metadata-8-4	passed	00:14:08
gr-data-at-rest-encryption-8-0	passed	00:14:41
gr-data-at-rest-encryption-8-4	passed	00:14:46
gr-demand-backup-minio-8-4	passed	00:12:15
gr-demand-backup-cloud-8-4	passed	00:22:30
gr-demand-backup-haproxy-8-4	passed	00:09:52
gr-finalizer-8-4	passed	00:05:34
gr-haproxy-8-0	passed	00:04:13
gr-haproxy-8-4	passed	00:04:11
gr-ignore-annotations-8-4	passed	00:05:05
gr-init-deploy-8-0	passed	00:10:16
gr-init-deploy-8-4	passed	00:09:47
gr-one-pod-8-4	passed	00:05:16
gr-recreate-8-4	passed	00:16:32
gr-scaling-8-4	passed	00:08:24
gr-scheduled-backup-8-4	passed	00:19:41
gr-security-context-8-4	passed	00:09:35
gr-self-healing-8-4	passed	00:21:51
gr-tls-cert-manager-8-4	passed	00:09:29
gr-users-8-4	passed	00:05:12
gr-upgrade-8-0	passed	00:10:48
gr-upgrade-8-4	passed	00:10:11
haproxy-8-0	passed	00:08:10
haproxy-8-4	passed	00:08:20
init-deploy-8-0	passed	00:05:37
init-deploy-8-4	passed	00:05:36
limits-8-4	passed	00:08:30
monitoring-8-4	passed	00:14:09
one-pod-8-0	passed	00:05:28
one-pod-8-4	passed	00:05:32
operator-self-healing-8-4	passed	00:11:24
pvc-resize-8-4	passed	00:05:16
recreate-8-4	passed	00:12:58
scaling-8-4	passed	00:10:10
scheduled-backup-8-0	passed	00:17:28
scheduled-backup-8-4	failure	00:16:47
service-per-pod-8-4	passed	00:06:22
sidecars-8-4	passed	00:04:28
smart-update-8-4	passed	00:09:20
storage-8-4	passed	00:04:05
telemetry-8-4	passed	00:05:54
tls-cert-manager-8-4	passed	00:10:51
users-8-0	passed	00:07:44
users-8-4	passed	00:07:38
version-service-8-4	passed	00:19:55

Summary	Value
Tests Run	59/59
Job Duration	02:28:11
Total Test Time	11:06:04

commit: f5dda51
image: perconalab/percona-server-mysql-operator:PR-1238-f5dda51c

K8SPS-296: improve errors logging

7ce2c81

https://perconadev.atlassian.net/browse/K8SPS-296

Copilot AI review requested due to automatic review settings March 16, 2026 15:11

pull-request-size bot added the size/L 100-499 lines label Mar 16, 2026

Copilot started reviewing on behalf of pooknull March 16, 2026 15:32 View session

Copilot AI reviewed Mar 16, 2026

View reviewed changes

pooknull marked this pull request as ready for review March 17, 2026 09:57

pooknull requested review from egegunes, gkech, hors and nmarukovich as code owners March 17, 2026 09:57

hors added this to the v1.2.0 milestone Mar 17, 2026

hors marked this pull request as draft March 17, 2026 16:11

egegunes reviewed Mar 18, 2026

View reviewed changes

pooknull added 2 commits March 27, 2026 14:51

add WrapWithDeepestStack to reconcileCRStatus

d03e163

Merge branch 'main' into K8SPS-296

f5dda51

Copilot AI review requested due to automatic review settings March 27, 2026 12:52

Copilot started reviewing on behalf of pooknull March 27, 2026 12:52 View session

Copilot AI reviewed Mar 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

K8SPS-296: improve errors logging#1238

K8SPS-296: improve errors logging#1238
pooknull wants to merge 3 commits intomainfrom
K8SPS-296

pooknull commented Mar 16, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

egegunes left a comment

Uh oh!

pooknull commented Mar 27, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 27, 2026

Uh oh!

Copilot AI Mar 27, 2026

Uh oh!

JNKPercona commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

-	return errors.Is(err, io.ErrUnexpectedEOF) || err.Error() == "unexpected end of JSON input"
+	if errors.Is(err, io.ErrUnexpectedEOF) {
+		return true
+	}
+	var syntaxErr *json.SyntaxError
+	if errors.As(err, &syntaxErr) && strings.Contains(syntaxErr.Error(), "unexpected end of JSON input") {
+		return true
+	}
+	return false

Conversation

pooknull commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

DESCRIPTION

CHECKLIST

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

egegunes left a comment

Choose a reason for hiding this comment

Uh oh!

pooknull commented Mar 27, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

JNKPercona commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

pooknull commented Mar 16, 2026 •

edited

Loading