refactor: promote SpectrumXRailPoolConfig API from v1alpha1 to v1alpha2#202
refactor: promote SpectrumXRailPoolConfig API from v1alpha1 to v1alpha2#202e0ne wants to merge 1 commit intoMellanox:mainfrom
Conversation
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
📝 WalkthroughWalkthroughThe PR migrates the Spectrum-X operator's API definitions, controllers, and configurations from Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/spectrumxrailpoolconfig_host_flows_controller.go (1)
483-495:⚠️ Potential issue | 🟠 MajorDo not unconditionally delete shared
br-xplaneduring topology cleanup.Line 492 deletes
br-xplaneon everycleanupXPlaneBridgescall. Combined with Line 543, removing one topology can tear down the shared bridge while other multi-plane topologies still exist.Proposed fix
-func (r *SpectrumXRailPoolConfigHostFlowsReconciler) cleanupXPlaneBridges(ctx context.Context, rt *v1alpha2.RailTopology) { +func (r *SpectrumXRailPoolConfigHostFlowsReconciler) cleanupXPlaneBridges(ctx context.Context, rt *v1alpha2.RailTopology, deleteXPlane bool) { log := log.FromContext(ctx) railBridge := fmt.Sprintf(railBridgeTemplate, rt.Name) if _, err := r.exec.Execute(fmt.Sprintf( "ovs-vsctl --if-exists del-br %s", railBridge, )); err != nil { log.Error(err, "failed to delete bridge", "bridge name", railBridge) } - if _, err := r.exec.Execute(fmt.Sprintf( - "ovs-vsctl --if-exists del-br %s", xplaneBridge, - )); err != nil { - log.Error(err, "failed to delete bridge", "bridge name", xplaneBridge) + if deleteXPlane { + if _, err := r.exec.Execute(fmt.Sprintf( + "ovs-vsctl --if-exists del-br %s", xplaneBridge, + )); err != nil { + log.Error(err, "failed to delete bridge", "bridge name", xplaneBridge) + } } }+hasActiveMultiPlane := false +for _, current := range rpc.Spec.RailTopology { + if len(current.NicSelector.PfNames) > 1 { + hasActiveMultiPlane = true + break + } +} ... - if len(rt.NicSelector.PfNames) > 1 { - r.cleanupXPlaneBridges(ctx, &rt) + if len(rt.NicSelector.PfNames) > 1 { + r.cleanupXPlaneBridges(ctx, &rt, !hasActiveMultiPlane) }Also applies to: 517-544
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/spectrumxrailpoolconfig_host_flows_controller.go` around lines 483 - 495, The cleanupXPlaneBridges function currently unconditionally deletes the shared xplaneBridge (symbol xplaneBridge) which can remove a bridge still used by other RailTopology instances; change cleanupXPlaneBridges in SpectrumXRailPoolConfigHostFlowsReconciler to only delete xplaneBridge after verifying it is safe: either (a) query the API for other v1alpha2.RailTopology objects that reference the same xplane (e.g., list and check rt.Name/labels) and skip deletion if any remain, or (b) inspect the bridge state via r.exec.Execute (e.g., ovs-vsctl show or list-ports) and delete only if no other ports/peer topologies exist; keep deleting the per-rail bridge (railBridge) as before and use the same log/error handling around r.exec.Execute. Ensure the new safety check is applied where xplaneBridge was being removed (within cleanupXPlaneBridges and the analogous block at lines ~517-544).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/crd/bases/spectrumx.nvidia.com_spectrumxrailpoolconfigs.yaml`:
- Line 17: The CRD currently serves only v1alpha2 (removes v1alpha1) which will
break clusters with stored or requested v1alpha1 objects; restore compatibility
by adding either a served v1alpha1 version with conversion logic or documenting
a breaking change and migration path: add a v1alpha1 served entry alongside
v1alpha2 in the CRD and implement conversion webhooks or conversion functions
between v1alpha1 and v1alpha2 (or add the v1alpha1 API package and conversion
interfaces) for the SpectrumxRailPoolConfig type so Kubernetes can convert
between versions, or explicitly mark this release as breaking and include
migration scripts/docs.
---
Outside diff comments:
In `@internal/controller/spectrumxrailpoolconfig_host_flows_controller.go`:
- Around line 483-495: The cleanupXPlaneBridges function currently
unconditionally deletes the shared xplaneBridge (symbol xplaneBridge) which can
remove a bridge still used by other RailTopology instances; change
cleanupXPlaneBridges in SpectrumXRailPoolConfigHostFlowsReconciler to only
delete xplaneBridge after verifying it is safe: either (a) query the API for
other v1alpha2.RailTopology objects that reference the same xplane (e.g., list
and check rt.Name/labels) and skip deletion if any remain, or (b) inspect the
bridge state via r.exec.Execute (e.g., ovs-vsctl show or list-ports) and delete
only if no other ports/peer topologies exist; keep deleting the per-rail bridge
(railBridge) as before and use the same log/error handling around
r.exec.Execute. Ensure the new safety check is applied where xplaneBridge was
being removed (within cleanupXPlaneBridges and the analogous block at lines
~517-544).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b77d1b77-a574-43af-bdda-4dcfc6862bb1
📒 Files selected for processing (10)
api/v1alpha2/groupversion_info.goapi/v1alpha2/spectrumxrailpoolconfig_types.goapi/v1alpha2/zz_generated.deepcopy.gocmd/flowscontroller/main.goconfig/crd/bases/spectrumx.nvidia.com_spectrumxrailpoolconfigs.yamlconfig/samples/kustomization.yamlconfig/samples/spectrumx_v1alpha2_spectrumxrailpoolconfig.yamlinternal/controller/spectrumxrailpoolconfig_host_flows_controller.gointernal/controller/spectrumxrailpoolconfig_host_flows_controller_test.gointernal/controller/suite_test.go
| scope: Namespaced | ||
| versions: | ||
| - name: v1alpha1 | ||
| - name: v1alpha2 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== CRD versions =="
rg -n '^\s*-\s*name:\s*v1alpha' config/crd/bases/spectrumx.nvidia.com_spectrumxrailpoolconfigs.yaml -C2
echo
echo "== CRD conversion strategy =="
rg -n '^\s*conversion:|^\s*strategy:' config/crd/bases/spectrumx.nvidia.com_spectrumxrailpoolconfigs.yaml -C2 || true
echo
echo "== API version packages present =="
fd -t d '^v1alpha[0-9]+$' api
echo
echo "== Conversion interfaces in Go code =="
rg -n 'ConvertTo\(|ConvertFrom\(|Hub\(\)' api internal --type go || trueRepository: Mellanox/spectrum-x-operator
Length of output: 313
CRD upgrade removes v1alpha1 with no conversion or migration path.
The CRD only serves v1alpha2 (line 17) with no conversion strategy defined. No v1alpha1 API package or conversion interfaces exist in the codebase. Clusters storing or requesting v1alpha1 objects will break. Either maintain a compatibility window with both versions and conversion logic, or explicitly document this as a breaking release with migration instructions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/crd/bases/spectrumx.nvidia.com_spectrumxrailpoolconfigs.yaml` at line
17, The CRD currently serves only v1alpha2 (removes v1alpha1) which will break
clusters with stored or requested v1alpha1 objects; restore compatibility by
adding either a served v1alpha1 version with conversion logic or documenting a
breaking change and migration path: add a v1alpha1 served entry alongside
v1alpha2 in the CRD and implement conversion webhooks or conversion functions
between v1alpha1 and v1alpha2 (or add the v1alpha1 API package and conversion
interfaces) for the SpectrumxRailPoolConfig type so Kubernetes can convert
between versions, or explicitly mark this release as breaking and include
migration scripts/docs.
Summary by CodeRabbit
spectrumx.nvidia.com/v1alpha2in their configurations.