✨ Write kubeadm control plane version file for workers to use to fetch the matching kubeadm binary #13433
AcidLeroy wants to merge 5 commits into kubernetes-sigs:main
zarcen left a comment
Thanks for putting all this together @AcidLeroy. Have some suggestions
Resolved (outdated) review threads on:
bootstrap/kubeadm/internal/controllers/kubeadmconfig_controller.go
bootstrap/kubeadm/internal/controllers/kubeadmconfig_controller_test.go (2 threads)
test/e2e/data/infrastructure-docker/main/clusterclass-quick-start-kubeadm-version.yaml (2 threads)
neolit123 left a comment
i posted some comments on slack:
https://kubernetes.slack.com/archives/C8TSNPY4T/p1773251442281699
i think CAPI can just write the ClusterConfiguration on disk too.
@neolit123 I will provide an alternative solution by writing the |
@neolit123 Is this sort of what you are thinking: https://github.com/AcidLeroy/cluster-api/pull/3/changes
yes, sgtm, but up to maintainers to decide. EDIT: in the slack thread we figured out that the CAPI v1beta2 ClusterConfiguration doesn't have the kubernetesVersion field, so my proposal is not useful.
Rather than hard-coding a file, we should look into providing a Go template file (kubeadmconfig) and then, in the kube bootstrap config controller, rendering that file with the version written directly into it. Look into resolveFiles in kubeadm and into templating the version into the "fetch kubeadm version" script.
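To make the templating suggestion concrete, here is a minimal sketch of rendering a file's content with Go `text/template`. The `renderFileContent` helper and the shape of `TemplateData` are assumptions for illustration, not the controller's actual code; only the `KubernetesVersion` field name comes from this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// TemplateData is a hypothetical stand-in for the data handed to file templates.
// Only the KubernetesVersion field name is taken from the PR description.
type TemplateData struct {
	KubernetesVersion string
}

// renderFileContent (hypothetical helper) parses content as a Go text/template
// and executes it against data, returning the rendered string.
func renderFileContent(content string, data TemplateData) (string, error) {
	tmpl, err := template.New("file").Parse(content)
	if err != nil {
		return "", fmt.Errorf("parsing file template: %w", err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("rendering file template: %w", err)
	}
	return buf.String(), nil
}

func main() {
	// An illustrative "fetch kubeadm version" script; the download step is left
	// to the operator, so the script only echoes the version it would fetch.
	script := "#!/bin/bash\n" +
		"VERSION=\"{{ .KubernetesVersion }}\"\n" +
		"echo \"would fetch kubeadm ${VERSION}\"\n"

	out, err := renderFileContent(script, TemplateData{KubernetesVersion: "v1.35.0"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Because the data is a struct, a template that references a field other than `KubernetesVersion` fails at render time rather than silently producing an empty value.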
This file might be in the gitignore file. Should double check.
test/e2e/kubeadm_version_on_join.go
co-author: Wei-Chen Chen <zarcen@gmail.com>
    // getControlPlaneVersionForJoin returns the control plane (cluster) version from the cluster's ControlPlaneRef,
    // e.g. KubeadmControlPlane.spec.version. Returns empty string if the cluster has no ControlPlaneRef or the version
    // cannot be read (e.g. control plane not found or does not support version). Used for worker join so that
    // a 1.34 node uses kubeadm 1.35 when the control plane is at 1.35, for example.
    func (r *KubeadmConfigReconciler) getControlPlaneVersionForJoin(ctx context.Context, scope *Scope) string {
    	if !scope.Cluster.Spec.ControlPlaneRef.IsDefined() {
    		return ""
    	}
    	controlPlane, err := external.GetObjectFromContractVersionedRef(ctx, r.Client, scope.Cluster.Spec.ControlPlaneRef, scope.Cluster.Namespace)
    	if err != nil {
    		scope.V(4).Info("Could not get control plane for version, falling back to machine version", "error", err)
    		return ""
    	}
    	cpVersion, err := contract.ControlPlane().Version().Get(controlPlane)
    	if err != nil {
    		if !errors.Is(err, contract.ErrFieldNotFound) {
    			scope.V(4).Info("Could not get control plane version, falling back to machine version", "error", err)
    		}
    		return ""
    	}
    	if cpVersion == nil {
    		return ""
    	}
    	return *cpVersion
    }
question: I notice this falls back to the machine version for any error (not found, permission denied, network issue, etc.). Is that intentional for all error types, or would it be worth distinguishing "control plane not found / field not present" (expected) from unexpected failures?
Just wondering whether masking unexpected errors here could make debugging harder down the road.
Yeah, I think there are some error states here that we can requeue for and only fall back to machine version as an absolute last resort. I'll push some changes up shortly with an alternative to what I have here.
@zjs, I introduced some changes to include another condition so that we can surface any issues with getting the CP version. In this case, we only fall back to the machine version if we absolutely have to, and we'll be able to see what the issue is via the status conditions. LMK what you think! Thanks!
Curious to see how others feel, but personally, I like it!
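For readers following this thread, the distinction being discussed (expected "no version available" cases versus unexpected failures) could look roughly like the sketch below. It is self-contained and does not use the Cluster API packages: `errFieldNotFound` stands in for `contract.ErrFieldNotFound`, and `resolveJoinVersion`, `versionGetter`, and `errTransient` are hypothetical names, not what the PR implements.

```go
package main

import (
	"errors"
	"fmt"
)

// errFieldNotFound is a hypothetical sentinel standing in for contract.ErrFieldNotFound.
var errFieldNotFound = errors.New("field not found")

// errTransient is a hypothetical example of an unexpected failure (e.g. an API error).
var errTransient = errors.New("transient API error")

// versionGetter is a hypothetical lookup of the control plane version.
type versionGetter func() (string, error)

// resolveJoinVersion picks the version to use for join:
//   - no control plane ref, no version field, or an empty version: fall back to
//     the machine version (expected cases);
//   - any other error: return it so the caller can surface it and requeue.
func resolveJoinVersion(hasControlPlaneRef bool, getCPVersion versionGetter, machineVersion string) (string, error) {
	if !hasControlPlaneRef {
		return machineVersion, nil
	}
	v, err := getCPVersion()
	switch {
	case err == nil && v != "":
		return v, nil
	case err == nil, errors.Is(err, errFieldNotFound):
		return machineVersion, nil
	default:
		return "", fmt.Errorf("reading control plane version: %w", err)
	}
}

func main() {
	fromCP := func() (string, error) { return "v1.35.0", nil }
	noField := func() (string, error) { return "", errFieldNotFound }
	failing := func() (string, error) { return "", errTransient }

	for name, get := range map[string]versionGetter{"fromCP": fromCP, "noField": noField, "failing": failing} {
		v, err := resolveJoinVersion(true, get, "v1.34.0")
		fmt.Printf("%s: version=%q err=%v\n", name, v, err)
	}
}
```

Returning the error instead of an empty string leaves the decision to requeue, and the place to report it, with the reconciler.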
What this PR does / why we need it
Kubernetes allows some skew between the control plane and kubelets, but kubeadm's own skew policy requires the kubeadm binary used for `kubeadm join` to match the kubeadm used when the cluster was created or last upgraded on that path, so you cannot rely on an older kubeadm on the worker when the control plane is newer.

That conflicts with real Cluster API flows (e.g. scaling or remediating workers still on an older Kubernetes while the control plane has moved ahead), as discussed in #13315.

This PR:

- […] `KubeadmControlPlane` when available) and uses it when generating join bootstrap data so join config matches the cluster the node is joining. If the control plane object cannot be read while a `controlPlaneRef` is set, reconciliation fails and status conditions surface the error (no silent fallback to the Machine version in that case). When there is no control plane ref or the referenced object does not expose a version, the controller falls back to the Machine's Kubernetes version as before.
- […] `KubeadmConfig`: the `ControlPlaneKubernetesVersionAvailable` condition stays True for both success paths, but Reason (and Message) distinguish a version read from the control plane reference from a version taken from the Machine because the reference is unset or has no version, so operators can see at a glance whether the skew contract is being driven by the cluster control plane or the worker.
- […] `spec.files` with `contentFormat: go-template`: files are rendered as Go `text/template` with data including `KubernetesVersion`, so operators can wire their own steps (scripts, package installs, downloads) to install a matching kubeadm binary before `kubeadm join` runs, without CAPI prescribing a single install mechanism.
- […] `KubeadmControlPlane` where needed for version resolution.
- […] `spec.files`, controller tests for the new condition reasons, and E2E coverage (`KubeadmVersionOnJoin` + `clusterclass-quick-start-kubeadm-version`) demonstrating the pattern end-to-end.

    sequenceDiagram
        participant CP as Control plane (newer K8s)
        participant BC as Kubeadm bootstrap controller
        participant W as Worker (older image / kubelet)
        BC->>CP: Read KubeadmControlPlane.spec.version
        CP-->>BC: e.g. 1.35.0
        BC->>BC: Build join data + TemplateData.KubernetesVersion
        BC->>W: Render spec.files (go-template) e.g. fetch script with {{ .KubernetesVersion }}
        Note over W: preKubeadmCommands (operator-defined)
        W->>W: Install/fetch kubeadm matching CP version
        W->>CP: kubeadm join (binary matches policy)

Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Related to #13315
/area bootstrap
/area test
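As an illustration of the condition behaviour described in the PR description, the sketch below keeps the condition status True on both success paths and varies only Reason and Message. The `Condition` struct is a simplified stand-in, and the reason strings `VersionFromControlPlane` and `VersionFromMachine` are hypothetical; the PR's actual reason names are not shown on this page.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

const (
	// conditionType is the condition name given in the PR description.
	conditionType = "ControlPlaneKubernetesVersionAvailable"

	// The reason names below are assumptions made for this sketch.
	reasonFromControlPlane = "VersionFromControlPlane"
	reasonFromMachine      = "VersionFromMachine"
)

// versionCondition returns the version used for join plus a condition whose
// Reason records where that version came from. An empty cpVersion means the
// control plane reference is unset or exposes no version.
func versionCondition(cpVersion, machineVersion string) (string, Condition) {
	if cpVersion != "" {
		return cpVersion, Condition{
			Type:    conditionType,
			Status:  "True",
			Reason:  reasonFromControlPlane,
			Message: "using version " + cpVersion + " read from the control plane reference",
		}
	}
	return machineVersion, Condition{
		Type:    conditionType,
		Status:  "True",
		Reason:  reasonFromMachine,
		Message: "using Machine version " + machineVersion + "; control plane reference is unset or has no version",
	}
}

func main() {
	for _, cp := range []string{"v1.35.0", ""} {
		v, c := versionCondition(cp, "v1.34.0")
		fmt.Printf("version=%s status=%s reason=%s message=%q\n", v, c.Status, c.Reason, c.Message)
	}
}
```

Keeping the status True and encoding the source in Reason means tooling that only checks the status keeps working, while operators can still tell which object drove the version choice.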