
Add hci-ironic VA variant for Ironic-provisioned HCI deployments #746

Open
rebtoor wants to merge 1 commit into openstack-k8s-operators:main from rebtoor:baremetal

@rebtoor rebtoor commented Apr 20, 2026

Add a new validated architecture variant (hci-ironic) that deploys
the VA-HCI scenario with all compute nodes provisioned via Ironic
using a configurable baremetalSetTemplate.osImage, enabling validation
of edpm-hardened-uefi qcow2 images through a complete deployment cycle.

New reusable component:

  • lib/dataplane/nodeset-baremetal -- kustomize Component that maps
    preProvisioned and baremetalSetTemplate from the values ConfigMap
    into the OpenStackDataPlaneNodeSet spec.
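
A component of this kind could be sketched with kustomize `replacements`, as below. This is an illustration only, not the file from this PR: the ConfigMap name `edpm-nodeset-values` is assumed to be the values source, and the `data.*` key paths are hypothetical.

```yaml
# Hypothetical sketch of a kustomize Component such as
# lib/dataplane/nodeset-baremetal. The source key paths under "data"
# are assumptions and may differ from the real values layout.
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

replacements:
  # Copy the preProvisioned flag from the values ConfigMap into the NodeSet.
  - source:
      kind: ConfigMap
      name: edpm-nodeset-values
      fieldPath: data.preProvisioned
    targets:
      - select:
          kind: OpenStackDataPlaneNodeSet
        fieldPaths:
          - spec.preProvisioned
        options:
          create: true
  # Copy the baremetalSetTemplate mapping (osImage, bmhNamespace, ...).
  - source:
      kind: ConfigMap
      name: edpm-nodeset-values
      fieldPath: data.baremetalSetTemplate
    targets:
      - select:
          kind: OpenStackDataPlaneNodeSet
        fieldPaths:
          - spec.baremetalSetTemplate
        options:
          create: true
```

`options.create: true` lets the target field be created when the base NodeSet does not define it, which would be the case for `spec.baremetalSetTemplate` in a pre-provisioned base.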

New VA variant (va/hci-ironic):

  • Identical to va/hci except the edpm-pre-ceph nodeset stage includes
    nodeset-baremetal alongside the standard nodeset component.
  • All other stages (NNCP, networking, control-plane, deployments,
    post-ceph, Ceph bootstrap hook) reuse the existing va/hci paths.
  • SetupReady timeout increased to 30m to account for Ironic
    provisioning time.

Documentation:

  • Added README and dataplane-pre-ceph guide for the hci-ironic variant.
  • Updated the standard va/hci README with a Variants section pointing
    to hci-ironic for Ironic-provisioned deployments.

The standard va/hci is untouched -- existing pre-provisioned
deployments are not affected.

Closes: ANVIL-108

Co-authored-by: Claude <noreply@anthropic.com>

@openshift-ci openshift-ci Bot requested review from fultonj and karelyatin April 20, 2026 07:42

openshift-ci Bot commented Apr 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rebtoor
Once this PR has been reviewed and has the lgtm label, please assign leifmadsen for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rebtoor rebtoor marked this pull request as draft April 20, 2026 07:50
@rebtoor rebtoor changed the title from "Add nodeset-baremetal component for Ironic-provisioned data planes" to "Add hci-baremetal VA variant for Ironic-provisioned HCI deployments" Apr 20, 2026
@rebtoor rebtoor marked this pull request as ready for review April 20, 2026 08:09
@openshift-ci openshift-ci Bot requested a review from cjeanner April 20, 2026 08:09
@rebtoor rebtoor requested review from abays and removed request for cjeanner April 20, 2026 08:13
@softwarefactory-project-zuul

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.

rebtoor commented Apr 20, 2026

/recheck

@softwarefactory-project-zuul

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.

rebtoor commented Apr 20, 2026

recheck

@fultonj fultonj left a comment

Hey Roberto,

This looks really good. Would you please consider two changes?

  1. Add Markdown files

Please add markdown files which describe how to build this VA.
For example someone without ci-framework can browse to this URL:

https://github.com/openstack-k8s-operators/architecture/blob/main/examples/va/hci/README.md

and then build the CRs by following the directions and running kustomize.

The new markdown files should make it clear what this alternative to the hci VA is.

I suggest also updating the README for va/hci to state something along the lines of:

The steps in [Configuring and deploying the pre-Ceph dataplane](examples/va/hci/dataplane-pre-ceph.md)
assume that the compute nodes have been pre-provisioned. If you wish to provision these nodes with
Ironic, see ...

Consider one set of directions with two possible methods of deploying HCI.
One with pre-provisioned EDPM nodes and one without.

  2. Naming: Baremetal vs Ironic

I have a small concern that calling this "baremetal" implies that the original HCI is only for running in VMs, which is not the case.
Would it be difficult to replace "baremetal" with "ironic" in the naming, since the difference here is really about how the EDPM nodes are deployed?

Other than that this looks really good to me.

I built the CRs for both hci and hci-baremetal, and all CRs are the same except nodeset-pre-ceph.yaml, which is exactly what I would expect.

--- hci/nodeset-pre-ceph.yaml   2026-04-22 13:58:44.208210525 -0400
+++ hci-baremetal/nodeset-pre-ceph.yaml 2026-04-22 13:57:26.240705158 -0400
@@ -34,6 +34,13 @@
   name: openstack-edpm
   namespace: openstack
 spec:
+  baremetalSetTemplate:
+    automatedCleaningMode: disabled
+    bmhLabelSelector:
+      app: openstack
+    bmhNamespace: openshift-machine-api
+    cloudUserName: cloud-admin
+    osImage: edpm-hardened-uefi.qcow2
   env:
   - name: ANSIBLE_FORCE_COLOR
     value: "True"
@@ -175,7 +182,7 @@
         subnetName: subnet1
       - name: tenant
         subnetName: subnet1
-  preProvisioned: true
+  preProvisioned: false
   services:
   - bootstrap
   - configure-network

@rebtoor rebtoor requested a review from a team as a code owner April 23, 2026 08:04

rebtoor commented Apr 23, 2026

Hey John! Thanks for your careful review!

I've addressed both of your requests. I agree that "ironic" makes much more sense than "baremetal" in this context, because the latter would have been misleading, so I renamed the scenario (and the downstream job as well).

@rebtoor rebtoor changed the title from "Add hci-baremetal VA variant for Ironic-provisioned HCI deployments" to "Add hci-ironic VA variant for Ironic-provisioned HCI deployments" Apr 23, 2026
@rebtoor rebtoor requested a review from fultonj April 23, 2026 08:25
Add a new validated architecture variant (hci-ironic) that deploys
the VA-HCI scenario with all compute nodes provisioned via Ironic
using a configurable baremetalSetTemplate.osImage, enabling validation
of edpm-hardened-uefi qcow2 images through a complete deployment cycle.

**New reusable component:**

- `lib/dataplane/nodeset-baremetal` -- kustomize Component that maps
  `preProvisioned` and `baremetalSetTemplate` from the values ConfigMap
  into the `OpenStackDataPlaneNodeSet` spec.

**New VA variant (`va/hci-ironic`):**

- Identical to `va/hci` except the edpm-pre-ceph nodeset stage includes
  `nodeset-baremetal` alongside the standard `nodeset` component.
- Pre-ceph nodeset values include `preProvisioned: false` and
  `baremetalSetTemplate` with a configurable `osImage`.
- `SetupReady` timeout increased to 30m to account for Ironic
  provisioning time.
- Post-ceph stage generates `edpm-nodeset-values` at the `hci` path
  so that SSH keys are available during `kustomize build` (the
  pre-ceph stage writes to the `hci-ironic` path, but the post-ceph
  kustomization references the `hci` path for shared resources).
- All other stages (NNCP, networking, control-plane, deployments,
  Ceph bootstrap hook) reuse the existing `va/hci` paths.
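
As a rough illustration of the first bullet, the pre-ceph nodeset stage could layer the two components as sketched here. The relative paths and the `values.yaml` resource name are hypothetical and depend on where the kustomization sits in the tree; only the component names `nodeset` and `nodeset-baremetal` come from this change.

```yaml
# Hypothetical sketch of the hci-ironic pre-ceph nodeset kustomization.
# Paths are placeholders; adjust them to the actual directory depth.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

components:
  # Standard nodeset component, shared with va/hci.
  - ../../../../../lib/dataplane/nodeset
  # Adds preProvisioned and baremetalSetTemplate on top of the NodeSet.
  - ../../../../../lib/dataplane/nodeset-baremetal

resources:
  # Values ConfigMap holding preProvisioned: false and baremetalSetTemplate.
  - values.yaml
```

Listing `nodeset-baremetal` after `nodeset` reflects the idea that the Ironic-specific fields are applied on top of the standard NodeSet.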

**Documentation:**

- Added README and `dataplane-pre-ceph.md` for the hci-ironic variant.
- Updated the `va/hci` README with a Variants section pointing to
  hci-ironic for Ironic-provisioned deployments.

The standard `va/hci` is untouched -- existing pre-provisioned
deployments are not affected.

Closes: ANVIL-108

Co-authored-by: Claude <noreply@anthropic.com>

Signed-off-by: Roberto Alfieri <ralfieri@redhat.com>

@fultonj fultonj left a comment

Thanks @rebtoor !

Just two small changes and I'm good to go.

@@ -0,0 +1 @@
../hci/control-plane
\ No newline at end of file

I assume this was an accidental symlink. Want to remove it in the next patchset?

[johfulto@laptop control-plane{baremetal}]$ ll
total 12K
-rw-r--r--. 1 johfulto johfulto 419 Jan 13 16:31 kustomization.yaml
drwxr-xr-x. 1 johfulto johfulto  44 Jan 13 16:31 networking/
-rw-r--r--. 1 johfulto johfulto 345 Apr 13 11:42 service-values.yaml
lrwxrwxrwx. 1 johfulto johfulto  20 Apr 23 14:47 control-plane -> ../hci/control-plane
[johfulto@laptop control-plane{baremetal}]$ 

@@ -0,0 +1,14 @@
# This is the kustomization for the FINAL step, edpm-post-ceph
# (hci-ironic variant: references hci-ironic pre-ceph values)
---

examples/va/hci-ironic/kustomization.yaml as it is right now is broken [1].

The current README doesn't direct users to use it and automation/vars/hci-ironic.yaml does not use it either. Thus, I think it could just be removed.

[1]

[johfulto@laptop hci-ironic{baremetal}]$ kustomize build .
Error: accumulating resources: accumulation err='accumulating resources from 'control-plane/networking/nncp/values.yaml': evalsymlink failure on '/home/johfulto/claude/review/architecture/examples/va/hci-ironic/control-plane/networking/nncp/values.yaml' : lstat /home/johfulto/claude/review/architecture/examples/va/hci-ironic/control-plane: no such file or directory': must build at directory: not a valid directory: evalsymlink failure on '/home/johfulto/claude/review/architecture/examples/va/hci-ironic/control-plane/networking/nncp/values.yaml' : lstat /home/johfulto/claude/review/architecture/examples/va/hci-ironic/control-plane: no such file or directory
[johfulto@laptop hci-ironic{baremetal}]$

Comment thread examples/va/hci/README.md

5. Between stages 3 and 4, _it is assumed that the user installs Ceph on the 3 OSP compute nodes._ OpenStack K8S CRDs do not provide a way to install Ceph via any sort of combination of CRs.

## Variants

nice!

baremetalSetTemplate:
  # CHANGEME - qcow2 image name from the edpm-hardened-uefi container
  osImage: edpm-hardened-uefi.qcow2
  automatedCleaningMode: disabled

Would you please add the following comment above automatedCleaningMode?

# set to 'metadata' if redeploying Ceph to ensure clean disks for OSDs 

Or perhaps change disabled to metadata?

That would be consistent with what we used to recommend with TripleO:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/wallaby/features/cephadm.html#prerequisite-ensure-disks-are-clean

When deploying Ceph, OSDs will not be created unless the disk is factory clean. This subtlety presents itself when people redeploy (since it's hard to get everything right the first time) and the Ceph install fails. Support used to get lots of calls about OSDs not getting created because the reader didn't know this.

I confirmed metadata is the correct setting as per:

https://book.metal3.io/capm3/automated_cleaning.html
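
For reference, the values fragment with the second option applied (changing `disabled` to `metadata` and keeping the explanatory comment) might look like the sketch below. The indentation relative to the enclosing values file is assumed.

```yaml
baremetalSetTemplate:
  # CHANGEME - qcow2 image name from the edpm-hardened-uefi container
  osImage: edpm-hardened-uefi.qcow2
  # set to 'metadata' if redeploying Ceph to ensure clean disks for OSDs
  automatedCleaningMode: metadata
```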
