**Relay-log-replay technique for PITR:** Traditional PITR pipes binary logs through `mysqlbinlog`, which is single-threaded and slow. Instead, this implementation downloads binlogs from S3, renames them as relay logs, and replays them using MySQL's replication SQL thread via `CHANGE REPLICATION SOURCE TO RELAY_LOG_FILE=..., SOURCE_HOST='dummy'` followed by `START REPLICA SQL_THREAD UNTIL ...`. This technique is described in detail by [lefred's blog post on faster MySQL PITR](https://lefred.be/content/howto-make-mysql-point-in-time-recovery-faster/). The key advantage is that MySQL's SQL thread can leverage parallel applier workers for concurrent transaction replay.

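The download-and-rename step above can be sketched as follows. The `<host>-relay-bin.NNNNNN` naming and the index file follow MySQL's relay-log conventions; the helper name and its arguments are illustrative, not the actual implementation.

```python
import os

def stage_relay_logs(binlog_files, datadir, host="restore-host"):
    """Rename downloaded binlogs to relay-log names and write the
    relay log index file, so MySQL's SQL thread can replay them."""
    relay_logs = []
    for i, src in enumerate(sorted(binlog_files), start=1):
        dst = os.path.join(datadir, f"{host}-relay-bin.{i:06d}")
        os.rename(src, dst)
        relay_logs.append(dst)
    # MySQL discovers the relay logs through the index file.
    index_path = os.path.join(datadir, f"{host}-relay-bin.index")
    with open(index_path, "w") as f:
        f.write("\n".join(relay_logs) + "\n")
    return relay_logs
```

After staging, `CHANGE REPLICATION SOURCE TO RELAY_LOG_FILE='<host>-relay-bin.000001', RELAY_LOG_POS=4, SOURCE_HOST='dummy'` points the SQL thread at the first staged file.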
**GTID-based vs date-based recovery targets:**
- *GTID-based:* User specifies a GTID set (e.g., `uuid:N`). The SQL thread replays until just before that GTID using `START REPLICA SQL_THREAD UNTIL SQL_BEFORE_GTIDS`.
- *Date-based:* User specifies a datetime. The last relay log is parsed with `mysqlbinlog --stop-datetime` to find the exact binary log position, then `START REPLICA SQL_THREAD UNTIL RELAY_LOG_FILE, RELAY_LOG_POS` is used.

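The two `UNTIL` variants above can be made concrete with a small helper that builds the statement from the chosen recovery target. The helper and its argument shape are illustrative; the SQL syntax follows the MySQL statements quoted above.

```python
def build_until_statement(target_type, gtid_set=None,
                          relay_log_file=None, relay_log_pos=None):
    """Build the START REPLICA statement for the chosen recovery target."""
    if target_type == "gtid":
        # Replay stops just before the given GTID set (exclusive).
        return f"START REPLICA SQL_THREAD UNTIL SQL_BEFORE_GTIDS = '{gtid_set}'"
    if target_type == "date":
        # Position is resolved beforehand with `mysqlbinlog --stop-datetime`.
        return (f"START REPLICA SQL_THREAD UNTIL RELAY_LOG_FILE = '{relay_log_file}', "
                f"RELAY_LOG_POS = {relay_log_pos}")
    raise ValueError(f"unknown recovery target type: {target_type}")
```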
**Binlog search:** The `binlog_server` binary provides `search_by_gtid_set` and `search_by_timestamp` subcommands that query S3 metadata to return the list of binlog files covering the range from the backup's GTID position to the target recovery point. The operator runs these by exec'ing into the binlog server pod via the Kubernetes API (equivalent to `kubectl exec`).
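Assuming the search subcommands print one binlog file name per line (the actual output format is not specified here), the operator-side parsing could be as small as:

```python
def parse_binlog_list(search_output: str) -> list[str]:
    """Parse the binlog file list printed by a search subcommand.
    Assumes one name per line; the real output format may differ."""
    return [line.strip() for line in search_output.splitlines() if line.strip()]
```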

Specifies the recovery target type. Determines which search function is called on the binlog server and which `START REPLICA UNTIL` variant is used.

- **`spec.pitr.gtid`** *(required when type is `"gtid"`)*:
  The GTID set to recover to. Replay stops just *before* this GTID (using `SQL_BEFORE_GTIDS`), so the transaction identified by this GTID is NOT applied. This is the "exclude" semantic — the user specifies the bad transaction they want to skip.

### 4.4 User-Facing Behavior Changes

- When `spec.pause` is set to `true` on a cluster, the main reconciler now also disables PITR (`spec.backup.pitr.enabled = false`) to prevent the binlog server from running while the cluster is paused.
- The binlog server StatefulSet is now cleaned up (deleted) when PITR is disabled, via a new `cleanupBinlogServer` function in the main reconciler.

**Chosen approach:** Set `spec.backup.pitr.enabled = false` when `spec.pause = true` in `CheckNSetDefaults`.

**Why:** A paused cluster has no MySQL pods running. The binlog server would fail to connect and enter a crash loop. Disabling PITR cleanly prevents this.

### 8.1 No Binlogs Found for Recovery Target

**Scenario:** The user specifies a PITR target (GTID or date) for which no matching binlogs exist in S3. This can happen if the binlog server was not running during the time period, or if the target is before the backup.

- `getStopPosition` returns error: `"no end_log_pos found in mysqlbinlog output"`.
- The PITR job fails with a clear error message.
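A sketch of how `getStopPosition` might detect this case, assuming it scans `mysqlbinlog` text output for the last `end_log_pos` marker (the real implementation presumably lives in the operator's Go code; this Python version is illustrative only):

```python
import re

def get_stop_position(mysqlbinlog_output: str) -> int:
    """Return the last end_log_pos seen in mysqlbinlog output,
    or raise the error described above when none is found."""
    positions = re.findall(r"end_log_pos (\d+)", mysqlbinlog_output)
    if not positions:
        raise ValueError("no end_log_pos found in mysqlbinlog output")
    return int(positions[-1])
```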

### 8.5 S3 Download Failure During Setup

### 8.6 mysqld Fails to Start in PITR Job

**Scenario:** The temporary mysqld started by the entrypoint script fails (e.g., corrupted data directory from a failed restore).

## 11. Open Questions [REQUIRED]

1. **Binlog server connects to pod-0 only:** The binlog server always connects to `mysql-0` via FQDN. In async replication, if the primary fails over to another pod, the binlog server may connect to a replica and miss transactions (or fail if `log_replica_updates` is off). Should the binlog server connect to the primary service instead?
   - *Option A:* Connect to pod-0 (current). Simple, but may miss transactions after failover.

### 9.3 Operator Version Skew

- **Operator upgraded, CR unchanged:** The binlog server gets the expanded configuration on the next reconciliation. If the binlog server binary doesn't support the new config fields, it may fail.
- **Restore CR with `pitr` field on old operator:** If CRDs are updated, the old operator ignores the unknown `pitr` field and performs a regular restore without PITR. No error, but the user doesn't get PITR. If CRDs are not updated, `kubectl apply` fails with `unknown field: spec.pitr`.

| `storage.fs_buffer_directory` | Local buffer directory for in-flight binlog data (`/var/lib/binlogsrv`) |
| `storage.checkpoint_size` / `checkpoint_interval` | Controls how frequently data is flushed to S3 |

**S3 URI format change:** The S3 URI was changed from `s3://accessKey:secretKey@bucket.region` to `protocol://accessKey:secretKey@host/bucket` to correctly support custom endpoints (e.g., MinIO). The protocol is extracted from the `endpointURL` field.
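Constructing the new-format URI from an `endpointURL` can be sketched as follows; the helper name is illustrative, and only the URI shape described above is assumed.

```python
from urllib.parse import urlparse

def build_binlog_server_uri(endpoint_url: str, access_key: str,
                            secret_key: str, bucket: str) -> str:
    """Build the binlog server S3 URI in the
    protocol://accessKey:secretKey@host/bucket format described above."""
    parsed = urlparse(endpoint_url)
    return f"{parsed.scheme}://{access_key}:{secret_key}@{parsed.netloc}/{bucket}"
```

Since these URIs embed credentials, they should be kept out of logs, events, and error messages.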
)
```

> [!NOTE]

- *Recommendation:* Document the exclusion semantic clearly. This is consistent with the behavior in K8SPXC.
- *Resolution:* Pending.

5. **Parallel replay workers:** Should the operator let users configure `replica_parallel_workers`?

6. **Downloading binary logs:** The operator downloads all binary logs collected by PBS without considering whether they are needed. Should we implement smarter logic to decide which slice of binary logs needs to be downloaded?
   - *Option A:* Relay-log-based recovery already handles transactions that are already applied to the database. No need to overcomplicate.
   - *Option B:* If PBS has been collecting binary logs for a long time, the `pitr` binary might need to download a lot of files, which could delay the recovery. The operator needs to be smarter: understand the last write and/or last GTID of the backup and decide from where to start downloading files.
   - *Resolution:* Deferred.
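Option B above could be sketched as follows, assuming per-file metadata that records each binlog's first GTID sequence number (the metadata shape is hypothetical; the real binlog server metadata may differ):

```python
def select_binlogs_to_download(binlogs, backup_last_seqno):
    """Pick the slice of binlogs needed for replay: skip files whose
    events all precede the backup's last applied GTID sequence number.
    `binlogs` is a list of (name, first_seqno) tuples sorted by name."""
    needed = []
    for i, (name, first_seqno) in enumerate(binlogs):
        # A file can be skipped only if the next file starts at or before
        # the first transaction after the backup position.
        next_first = binlogs[i + 1][1] if i + 1 < len(binlogs) else None
        ends_before_backup = next_first is not None and next_first <= backup_last_seqno + 1
        if not ends_before_backup:
            needed.append(name)
    return needed
```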

5. **`SQL_BEFORE_GTIDS` semantics clarity:** The GTID target uses `SQL_BEFORE_GTIDS`, meaning the specified transaction is *excluded*. Should the CRD documentation make this explicit, or should we offer both "before" and "at" semantics?
   - *Recommendation:* Document the exclusion semantic clearly. This is consistent with the behavior in K8SPXC.

- **`spec.pitr.date`** *(required when type is `"date"`)*:
  Datetime string in `YYYY-MM-DD HH:MM:SS` format (e.g., `"2026-03-20 10:30:00"`). Replay stops at the last event before this timestamp. The last relay log is parsed with `mysqlbinlog --stop-datetime` to find the exact position.
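Validating the user-supplied datetime before handing it to `mysqlbinlog --stop-datetime` might look like the sketch below. The format string matches the example above; whether the value is interpreted as UTC or server-local time is a detail this design still needs to pin down.

```python
from datetime import datetime

def to_stop_datetime(value: str) -> str:
    """Parse the CR's date string and render it in the
    YYYY-MM-DD HH:MM:SS form accepted by mysqlbinlog --stop-datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")
```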

### 2.2 Key Constraints

1. **S3 only for binlog storage:** The `binlog_server` binary currently supports S3 as its storage backend. The binlog server configuration uses an S3 URI with embedded credentials.
   - *Resolution:* Pending.

2. **Replicating from primary:** Even if we resolve the question above, we also need to decide whether PBS will replicate from a secondary or the primary.
   - *Option A:* Use a headless service that matches all MySQL pods. It's okay if the primary is used as the replication source.
commit: a7b88ac |

---

## 7. User Experience [REQUIRED]

> Will we also support restoring from backupSource?
>
> I think we should, yes.
CHANGE DESCRIPTION
Add enhancement proposal for point-in-time recovery.