[multiple]: Support agent-based BM SNO deployment #3739
bogdando wants to merge 3 commits into openstack-k8s-operators:main
Skipping CI for Draft Pull Request.
ba7f9a3 to 647266a
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9bd4674b8d4c4b52b1cd39592c1a29c8 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 15m 24s
9858edf to cd5e154
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c078ef19bb6347c28eeaf5336ae0bbe7 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 18m 59s
b359870 to b15906d
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/24c52f4231c343c5b77791ceeb4a0b06 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 14m 54s
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/302bb0bb5235426b982f4db10356cc88 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 05m 34s
1a09008 to 5a68d1a
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0aefcac6cf914cf1838e68de6b58b488 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 36m 50s
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b103a318d7a84c699c19ccb2c9b8f7d8 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 08m 24s
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/7afeaad5ed58496e8faf2a68640b4234 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 13m 51s
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c986a0ffab284cdd8599b29b05ad1b34 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 02m 57s
870a219 to cc7d7f5
# TARGET="192.168.111.2"
# MAX_FAILURES=10
# FAILURES=0
# while true; do
Can we not use wait-for bootstrap-complete and wait-for install-complete instead?
See: https://github.com/openstack-k8s-operators/ci-framework/pull/3407/changes#diff-f7e45313428dbdfa5a6f448364ee77146c3037e601be9127736501340be3f08e or https://github.com/openstack-k8s-operators/hotstack/tree/main/roles/ocp_agent_installer

Sorry, could you elaborate? I believe this wait is needed before we can proceed with the architecture deployment on the EDPM host.
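
For readers following the exchange above, here is a minimal sketch of what the reviewer's suggestion could look like when driven from Python. It assumes the two phases are run through the `openshift-install agent wait-for` subcommand with a `--dir` pointing at the directory that holds the generated agent assets; the function names and the injectable `run` callable are hypothetical and are not taken from this PR.

```python
import subprocess
from typing import Callable, List, Sequence

# The two phases named in the review comment, in the order they must complete.
WAIT_PHASES = ("bootstrap-complete", "install-complete")


def build_wait_commands(install_dir: str,
                        binary: str = "openshift-install") -> List[List[str]]:
    """Return one `agent wait-for` command line per installation phase."""
    return [
        [binary, "agent", "wait-for", phase, "--dir", install_dir]
        for phase in WAIT_PHASES
    ]


def wait_for_cluster(install_dir: str,
                     run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """Run the phases in order; stop at the first non-zero exit code."""
    for cmd in build_wait_commands(install_dir):
        if run(cmd) != 0:
            return False
    return True
```

Whether this fully replaces the hand-written probe loop depends on what that loop checks, which the truncated snippet does not show.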
Found an issue while testing this again. UPDATE: resolved with the top commit.
8f245fe to 22c86ac
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/91484a74fcb2454e92181cad2382c363 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 07m 33s
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b4d077a140bf4218a1095f2fc8dc00ea ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 21m 30s
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5cac6ab3cfcd4e15b37e3b68d4cc6604 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 07m 31s
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/950bcf1a311d48dab6e32af6c1348b68 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 22m 31s
This works for my testing now, including
[APPROVALNOTIFIER] This PR is NOT APPROVED
Allow deploying SNO OCP for the RHOSO control plane instead of
the classic hybrid jobs approach.

Change controller-0, which runs dev-scripts and the deploy
architecture script, to become the Zuul controller node.
Skip libvirt/vmnet configuration and use no VMs at all.

Ditch dev-scripts and use agent-based openshift-installer
to also cover scenarios with isolated L2 domains between
the Zuul controller, SNO BM, and EDPM BM (will be added in
the future).

Allow auto-configuring USB boot on the target SNO host,
and allow auto-discovery (or validation) of the UEFI target
to boot from as Virtual Media Live CD. It is important
to make sure we boot from the image that we build, as
we do not wipe the target host disks, and without
those guard rails it may result in confusing behavior
(booting from unexpected sources).

Allow live debug mode for the agent appliance.

Password handling for the agent appliance and OCP:
* Pre-ISO generation (for post-bootstrap):
- MachineConfig 99-core-password.yaml -- sets password via MCO after
cluster is up
* Post-ISO generation (for discovery phase):
- coreos-installer iso ignition show -- extracts the embedded
ignition from the agent ISO
- patch_ignition.py -- patches the ignition JSON to add
passwordHash on the core user and a getty@tty1.service autologin
drop-in
- coreos-installer iso ignition embed -f -- re-embeds the patched
ignition back into the ISO
Generated-by: Cursor (claude-4.6-opus-high)
Signed-off-by: Bohdan Dobrelia <bdobreli@redhat.com>
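
The post-ISO patching step described in this commit message can be illustrated with a short sketch. This is not the PR's patch_ignition.py; it is a hypothetical stand-in that assumes an Ignition v3-style layout (`passwd.users` and `systemd.units[].dropins`), and the drop-in name `autologin.conf` and the agetty command line are assumptions chosen for illustration.

```python
import copy

# Assumed drop-in body: clear the stock ExecStart, then log in as core on tty1.
AUTOLOGIN_DROPIN = (
    "[Service]\n"
    "ExecStart=\n"
    "ExecStart=-/usr/sbin/agetty --autologin core --noclear %I $TERM\n"
)


def patch_ignition(ignition: dict, password_hash: str) -> dict:
    """Return a copy of `ignition` with a core password hash and tty1 autologin."""
    patched = copy.deepcopy(ignition)

    # Find or create the core user, then set its password hash.
    users = patched.setdefault("passwd", {}).setdefault("users", [])
    core = next((u for u in users if u.get("name") == "core"), None)
    if core is None:
        core = {"name": "core"}
        users.append(core)
    core["passwordHash"] = password_hash

    # Find or create the getty@tty1.service unit and replace its autologin drop-in.
    units = patched.setdefault("systemd", {}).setdefault("units", [])
    getty = next((u for u in units if u.get("name") == "getty@tty1.service"), None)
    if getty is None:
        getty = {"name": "getty@tty1.service"}
        units.append(getty)
    dropins = [d for d in getty.get("dropins", []) if d.get("name") != "autologin.conf"]
    dropins.append({"name": "autologin.conf", "contents": AUTOLOGIN_DROPIN})
    getty["dropins"] = dropins
    return patched
```

The function is idempotent, so running it twice on the same ignition JSON leaves a single drop-in; the JSON would be read from the output of the show step and written back before the embed step.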
Rename:
cifmw_reproducer_bm_ocp -> cifmw_bm_sno
cifmw_devscripts_bm_nodes -> cifmw_bm_nodes

Change defaults: OpenShift version, and auto-enable USB boot in the
target server BIOS.
Also extract injection into a separate task, and cover it with tests.
Make sure no credentials leak.

Fix ejecting an already inserted image.
On iDRAC 9 (fw 4.x), EjectMedia sets Inserted=false but the Image URL
and internal Remote File Share connection linger indefinitely.
Redfish PATCH on VirtualMedia/CD returns 405 (only GET,HEAD allowed),
and no amount of waiting releases the stale RFS -- InsertMedia keeps
failing with "already connected" (RH BZ#1910739).
Work around this iDRAC limitation by SSH-ing into the BMC and running
racadm directly when the Image persists after the Redfish eject.

Generated-by: claude-4.6-opus-high
Signed-off-by: Bohdan Dobrelia <bdobreli@redhat.com>
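
The eject workaround in this commit message boils down to a small decision: after the Redfish EjectMedia call, look at the VirtualMedia resource again, and only fall back to racadm over SSH if an Image URL is still reported. A minimal sketch of that decision follows. The helper names are hypothetical, and the exact racadm invocation (`racadm remoteimage -d`) is an assumption, since the commit message only says that racadm is run directly.

```python
from typing import List


def image_still_attached(virtual_media: dict) -> bool:
    """True when the Redfish VirtualMedia resource still reports an Image URL."""
    return bool(virtual_media.get("Image"))


def racadm_detach_command(bmc_host: str, user: str) -> List[str]:
    """Build the SSH command that asks racadm to drop the Remote File Share.

    Assumption: `racadm remoteimage -d` is the subcommand used for this.
    """
    return ["ssh", f"{user}@{bmc_host}", "racadm", "remoteimage", "-d"]


def plan_eject(virtual_media_after_eject: dict, bmc_host: str, user: str) -> List[List[str]]:
    """Return the extra commands to run after a Redfish EjectMedia, if any."""
    if image_still_attached(virtual_media_after_eject):
        return [racadm_detach_command(bmc_host, user)]
    return []
```

Keeping the check separate from the SSH call makes the fallback easy to cover with tests without a real BMC.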
- Skip controller reboot and wait_for_connection in deploy-edpm-reuse
when cifmw_bm_sno is true (no virtual controller assumption).
- Skip syncing local repos to the Ansible controller in push_code when
cifmw_bm_sno is true.
- In reuse_main, skip CRC/OCP layout detection for BM SNO; set
_use_crc/_use_ocp false and _has_openshift true.
- In deploy_architecture, fall back to play host facts when inventory
has no controller-* host; derive controller address from default IPv4
or inventory_hostname when ansible_host is unset.
- Run OCP cluster-size reduction in architecture only when the ocps
group exists.
- Add cifmw_bm_agent_disabled_ifaces and agent-config networkConfig so
extra NICs can stay link-up without IPv4/IPv6 (overlap validation);
install nmstate when that list is non-empty.
- Document bm_sno Zuul autohold workflow, reproducer scenarios vs
baremetal, and spellcheck terms (NICs, autoheld, tty).
- Update BM SNO logic to match the existing reuse_ocp
flow, where OCP (and SNO) deployment is skipped
Generated-by: claude-4.6-opus-high
Signed-off-by: Bohdan Dobrelia <bdobreli@redhat.com>
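
The deploy_architecture fallback described in the list above can be read as two small lookups: pick a controller-* host if the inventory has one, otherwise use the play host, then resolve its address from ansible_host, the default IPv4 fact, or the inventory hostname. The sketch below models that chain in plain Python under those assumptions; the function names are hypothetical, and the real change is implemented in Ansible, not Python.

```python
from typing import Iterable, Mapping


def pick_controller(inventory_hosts: Iterable[str], play_host: str) -> str:
    """Prefer the first controller-* host; otherwise fall back to the play host."""
    for host in inventory_hosts:
        if host.startswith("controller-"):
            return host
    return play_host


def controller_address(hostvars: Mapping[str, object], inventory_hostname: str) -> str:
    """Resolve an address: ansible_host, then default IPv4, then the hostname."""
    explicit = hostvars.get("ansible_host")
    if explicit:
        return str(explicit)
    default_ipv4 = hostvars.get("ansible_default_ipv4") or {}
    address = default_ipv4.get("address") if isinstance(default_ipv4, Mapping) else None
    if address:
        return str(address)
    return inventory_hostname
```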
This no longer depends on other PRs, and I tested it downstream
danpawlik left a comment
The code and commit messages are fine. What's missing here is: SNO on a baremetal host without using devscripts. I like that approach, especially that you found a reason to use it.
I was hoping to use this solution in our CI, but I realize that we cannot do any magic with provisioning a VM and deploying SNO on it without nested virtualization. We will use another solution, or even a different one.
LGTM
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a1da4dc7a7314abca6a9698bf10e4c77 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 06m 11s
controller instead of the classic hybrid (dev-scripts + libvirt) job
layout: treat the node that runs deploy architecture as the Ansible
controller; skip libvirt/vmnet and do not provision VMs.
path can grow into isolated L2 between Zuul, SNO bare metal, and EDPM
bare metal (EDPM BM to follow).
the UEFI Virtual Media boot target so installs boot the ISO we build;
avoid silent boots from the wrong device when disks are not wiped.
shell) for discovery-phase console access.
so MCO applies the core user password post-bootstrap.
patch_ignition.py to add core passwordHash and getty@tty1 autologin,
then coreos-installer iso ignition embed -f to put ignition back in
the agent ISO.
reboot/wait in deploy-edpm-reuse, skip repo sync to the controller in
push_code, and set reuse facts so OpenShift is assumed without CRC/OCP
VM detection.
host if no controller-* exists; only shrink OCP topology in
architecture when the ocps group is present.
(cifmw_bm_agent_disabled_ifaces + networkConfig); install nmstate when
that list is non-empty to satisfy agent-config validation.
vs baremetal extra-vars, and add spellcheck terms (NICs, autoheld,
tty).
Jira: OSPRH-26767
Generated-by: Cursor (claude-4.6-opus-high)
Signed-off-by: Bohdan Dobrelia <bdobreli@redhat.com>
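
As an illustration of the cifmw_bm_agent_disabled_ifaces idea mentioned in the description, the sketch below builds nmstate-style interface entries that keep a NIC link-up while disabling IPv4 and IPv6 on it. It assumes the agent-config networkConfig accepts nmstate interface dictionaries with `state`, `ipv4.enabled` and `ipv6.enabled` keys and that the NICs are plain Ethernet; the function names are hypothetical and not taken from the PR.

```python
from typing import Dict, List


def disabled_iface_entries(names: List[str]) -> List[Dict[str, object]]:
    """Build one nmstate-style entry per NIC: link up, no IPv4, no IPv6."""
    return [
        {
            "name": name,
            "type": "ethernet",          # assumption: extra NICs are Ethernet
            "state": "up",               # keep the link up
            "ipv4": {"enabled": False},  # no IPv4 addressing on this NIC
            "ipv6": {"enabled": False},  # no IPv6 addressing on this NIC
        }
        for name in names
    ]


def needs_nmstate(names: List[str]) -> bool:
    """nmstate is only required when at least one interface is listed."""
    return len(names) > 0
```

An empty list yields no entries and no nmstate requirement, which matches the "install nmstate when that list is non-empty" behavior described above.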