📖 document ClusterClass rotation behavior for MachinePool objects #13280

Open
bnallapeta wants to merge 1 commit into kubernetes-sigs:main from bnallapeta:mp/contract

@bnallapeta
Contributor

@bnallapeta commented Jan 29, 2026

What this PR does / why we need it:
Addresses the concern raised in #13110

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part of #10496

Additional info:
Following discussions with the MachinePool Working Group, the proposals in this doc were written to unblock #13110.

Area example:
/area machinepool
/area clusterclass

@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-area PR is missing an area label size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 29, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign neolit123 for approval. For more information see the Code Review Process.


@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 29, 2026
@bnallapeta
Contributor Author

/cc @AndiDog @richardcase

Signed-off-by: Bharath Nallapeta <nr.bharath97@gmail.com>
@bnallapeta
Contributor Author

@AndiDog PTAL. Thanks.

@sbueringer sbueringer added the area/documentation Issues or PRs related to documentation label Feb 11, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label Feb 11, 2026
As of today the Node initialization consists of syncing labels from Machines to Nodes. Once the labels have been
initially synced the taint is removed from the Node.

### Infrastructure provider watching for bootstrap changes
Member

This page here is a description of the contract for bootstrap providers. I think we should not specify the infrastructure provider contract on this page

Contributor

@bnallapeta I agree here. Maybe we should move all diffs from this file to infra-machinepool.md instead, since that is where it's relevant. And we can link to that file if we see it as an important hint for this bootstrap provider contract.

When the topology controller detects that the InfraMachinePoolTemplate has changed (e.g., from template updates in ClusterClass), it performs
a **rotation**:

1. Creates a new InfraMachinePool object with a new name
Member

@AndiDog @mboersma @richardcase Do we have consensus across InfraMachinePool providers that rotation is the right thing to do here?

I would have expected that just in-place updating the InfraMachinePool is much easier to implement for infra providers

Contributor

For InfraMachinePool, I agree that in-place updates should work. @bnallapeta is there a technical reason why it could be problematic?

Contributor

Changes in BootstrapConfig are important to be rotated, as the issue describes. Could the topology controller ensure a suffix of their name? (I'm not well-versed in how cluster class treats such problems.)

The problem with BootstrapConfig, see here: KubeadmConfig contains the latest join token, which machine pools need in order to start new instances. Since the token isn't separate from the actual bootstrapping configuration, CAPA and other infra providers wouldn't easily be able to check if something relevant has changed that should lead to starting instances with the newer config, or if only the token has changed (which shouldn't roll existing nodes). My solution back then for CAPA was to check for a changed BootstrapConfig name (PR).

I'd like input from ClusterClass experts so that we define the right expectations here. What we definitely want is for infra providers to observe a changed reference, but do we technically require that a ClusterClass-based InfraMachinePoolTemplate change leads to complete rotation? That might be hard for infra providers to implement, since it would be an InfraMachinePool object with a different name, thereby probably destroying the old pool?!
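
To make the name-based check mentioned in this comment concrete, here is a minimal Go sketch. It is not the CAPA implementation: `PoolState`, `LaunchedBootstrapConfigName`, and `needsRollout` are hypothetical names invented for illustration, and the object names in `main` are made up. The only idea taken from the comment is that a changed BootstrapConfig name signals a relevant change, while a token refresh inside the same object does not.

```go
package main

import "fmt"

// PoolState is a hypothetical, simplified stand-in for what an infra
// provider might remember per machine pool. It is not a Cluster API type.
type PoolState struct {
	// LaunchedBootstrapConfigName is the name of the BootstrapConfig that
	// the currently running instances were launched with.
	LaunchedBootstrapConfigName string
}

// needsRollout reports whether the bootstrap config reference now points at
// an object with a different name than the one instances were launched with.
// A join token refresh inside the same object keeps the name unchanged, so it
// does not trigger a rollout. An empty recorded name (nothing launched yet)
// also does not count as a rollout.
func needsRollout(state PoolState, currentRefName string) bool {
	return state.LaunchedBootstrapConfigName != "" &&
		state.LaunchedBootstrapConfigName != currentRefName
}

func main() {
	state := PoolState{LaunchedBootstrapConfigName: "pool-a-bootstrap-x7k2p"}

	// Same object name: only the token inside may have changed.
	fmt.Println(needsRollout(state, "pool-a-bootstrap-x7k2p")) // false

	// Reference rotated to a new object: instances should be rolled.
	fmt.Println(needsRollout(state, "pool-a-bootstrap-m4q9z")) // true
}
```

The sketch only works if whoever owns the BootstrapConfig creates a new object, with a new name, whenever the meaningful configuration changes, which is exactly the rotation behavior under discussion.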

Member

@sbueringer Mar 4, 2026

I'm entirely fine with rotating the BootstrapConfig (we already rotate the BootstrapConfigTemplate for MachineDeployments, there's no reason why we can't rotate a BootstrapConfig for MachinePools).

My comment was only that rotating InfraMachinePools as well seems very tough for providers to implement.

Member

I agree with @AndiDog and @sbueringer. In-place updating the InfraMachinePool is preferable.

> I'm entirely fine with rotating the BootstrapConfig (we already rotate the BootstrapConfigTemplate for MachineDeployments, there's no reason why we can't rotate a BootstrapConfig for MachinePools).

I like this as it's consistent then with MD.

> Since the token isn't separate from the actual bootstrapping configuration, CAPA and other infra providers wouldn't easily be able to check if something relevant has changed that should lead to start instances with the newer config

At some point we could explore some mechanism to separate these.
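
The thread does not propose a concrete mechanism. Purely as an illustration of what "separating" could mean, the Go sketch below hashes a bootstrap configuration with the token blanked out, so that a token refresh leaves the hash unchanged while a real configuration change alters it. `BootstrapSpec`, its fields, and `stableHash` are all hypothetical and do not correspond to any Cluster API type or function.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// BootstrapSpec is a hypothetical, heavily simplified bootstrap
// configuration. It is not the KubeadmConfig type.
type BootstrapSpec struct {
	Files     []string `json:"files"`
	Commands  []string `json:"commands"`
	JoinToken string   `json:"joinToken"`
}

// stableHash returns a SHA-256 hash of the spec with the join token blanked
// out. spec is passed by value, so the caller's token is left untouched.
func stableHash(spec BootstrapSpec) (string, error) {
	spec.JoinToken = ""
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := BootstrapSpec{Commands: []string{"kubeadm join"}, JoinToken: "token-1"}

	b := a
	b.JoinToken = "token-2" // token refresh only

	c := a
	c.Commands = []string{"kubeadm join", "echo extra"} // real config change

	ha, _ := stableHash(a)
	hb, _ := stableHash(b)
	hc, _ := stableHash(c)

	fmt.Println(ha == hb) // true: only the token differs
	fmt.Println(ha == hc) // false: the commands differ
}
```

Whether such a hash would live in the bootstrap provider, in the infra provider, or in a contract field is an open design question that this sketch does not address.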

Labels

area/documentation Issues or PRs related to documentation cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

5 participants