
refactor: promote SpectrumXRailPoolConfig API from v1alpha1 to v1alpha2#202

Open
e0ne wants to merge 1 commit into Mellanox:main from e0ne:api-ver

Conversation


@e0ne e0ne commented Apr 27, 2026

Summary by CodeRabbit

  • Chores
    • Migrated the Spectrum-X operator API version from v1alpha1 to v1alpha2. All CustomResourceDefinitions, controller schemas, and supporting registrations have been updated to serve SpectrumXRailPoolConfig resources under the new version. Breaking change: users must update all existing custom resource manifests to use the new apiVersion spectrumx.nvidia.com/v1alpha2.
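The breaking change above amounts to a one-line edit in each existing manifest. A minimal sketch follows; the metadata name is a placeholder and the spec is left empty, since the spec schema itself is unchanged by this migration (the real sample lives in config/samples/spectrumx_v1alpha2_spectrumxrailpoolconfig.yaml):

```yaml
# Before this PR, manifests used:
#   apiVersion: spectrumx.nvidia.com/v1alpha1
# After this PR, only v1alpha2 is served:
apiVersion: spectrumx.nvidia.com/v1alpha2
kind: SpectrumXRailPoolConfig
metadata:
  name: example-railpoolconfig  # placeholder name
spec: {}  # spec fields are unchanged by this migration
```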

Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

The PR migrates the Spectrum-X operator's API definitions, controllers, and configurations from v1alpha1 to v1alpha2, updating package declarations, imports, type references, and manifests consistently across the codebase.

Changes

Cohort / File(s): Summary

  • API Package Definitions (api/v1alpha2/groupversion_info.go, api/v1alpha2/spectrumxrailpoolconfig_types.go, api/v1alpha2/zz_generated.deepcopy.go): Updated package declarations and the GroupVersion constant from v1alpha1 to v1alpha2; migrated the generated deep-copy code to the new package.
  • Controller Implementation and Setup (cmd/flowscontroller/main.go, internal/controller/spectrumxrailpoolconfig_host_flows_controller.go, internal/controller/suite_test.go): Updated imports from spectrumxv1alpha1 to spectrumxv1alpha2 and rewired Reconcile(), SetupWithManager(), and scheme registration to use the v1alpha2 SpectrumXRailPoolConfig types and status constants.
  • Controller Tests (internal/controller/spectrumxrailpoolconfig_host_flows_controller_test.go): Updated test fixtures and type references from v1alpha1 to v1alpha2 for SpectrumXRailPoolConfig, Spec, RailTopology, and NicSelector.
  • CRD and Manifest Configurations (config/crd/bases/spectrumx.nvidia.com_spectrumxrailpoolconfigs.yaml, config/samples/spectrumx_v1alpha2_spectrumxrailpoolconfig.yaml, config/samples/kustomization.yaml): Updated the CRD version from v1alpha1 to v1alpha2, changed the sample resource apiVersion, and updated the kustomization to reference the v1alpha2 sample manifest.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • rollandf
  • almaslennikov

Poem

🐰 From alpha-one to alpha-two we hop,
Our API now wears a newer coat,
Each import, each type, each reference we swap,
A version migration—watch us gloat!
Schema-Group and manifests align,
Version two shines bright and fine. ✨

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically describes the main change: promoting the SpectrumXRailPoolConfig API from v1alpha1 to v1alpha2 across all files in the changeset.
  • Docstring Coverage: ✅ Passed. No functions were found in the changed files, so the docstring coverage check was skipped.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/controller/spectrumxrailpoolconfig_host_flows_controller.go (1)

483-495: ⚠️ Potential issue | 🟠 Major

Do not unconditionally delete shared br-xplane during topology cleanup.

Line 492 deletes br-xplane on every cleanupXPlaneBridges call. Combined with Line 543, removing one topology can tear down the shared bridge while other multi-plane topologies still exist.

Proposed fix
-func (r *SpectrumXRailPoolConfigHostFlowsReconciler) cleanupXPlaneBridges(ctx context.Context, rt *v1alpha2.RailTopology) {
+func (r *SpectrumXRailPoolConfigHostFlowsReconciler) cleanupXPlaneBridges(ctx context.Context, rt *v1alpha2.RailTopology, deleteXPlane bool) {
 	log := log.FromContext(ctx)
 	railBridge := fmt.Sprintf(railBridgeTemplate, rt.Name)
 	if _, err := r.exec.Execute(fmt.Sprintf(
 		"ovs-vsctl --if-exists del-br %s", railBridge,
 	)); err != nil {
 		log.Error(err, "failed to delete bridge", "bridge name", railBridge)
 	}
-	if _, err := r.exec.Execute(fmt.Sprintf(
-		"ovs-vsctl --if-exists del-br %s", xplaneBridge,
-	)); err != nil {
-		log.Error(err, "failed to delete bridge", "bridge name", xplaneBridge)
+	if deleteXPlane {
+		if _, err := r.exec.Execute(fmt.Sprintf(
+			"ovs-vsctl --if-exists del-br %s", xplaneBridge,
+		)); err != nil {
+			log.Error(err, "failed to delete bridge", "bridge name", xplaneBridge)
+		}
 	}
 }
+hasActiveMultiPlane := false
+for _, current := range rpc.Spec.RailTopology {
+	if len(current.NicSelector.PfNames) > 1 {
+		hasActiveMultiPlane = true
+		break
+	}
+}
...
-			if len(rt.NicSelector.PfNames) > 1 {
-				r.cleanupXPlaneBridges(ctx, &rt)
+			if len(rt.NicSelector.PfNames) > 1 {
+				r.cleanupXPlaneBridges(ctx, &rt, !hasActiveMultiPlane)
 			}

Also applies to: 517-544

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/spectrumxrailpoolconfig_host_flows_controller.go` around
lines 483 - 495, The cleanupXPlaneBridges function currently unconditionally
deletes the shared xplaneBridge (symbol xplaneBridge) which can remove a bridge
still used by other RailTopology instances; change cleanupXPlaneBridges in
SpectrumXRailPoolConfigHostFlowsReconciler to only delete xplaneBridge after
verifying it is safe: either (a) query the API for other v1alpha2.RailTopology
objects that reference the same xplane (e.g., list and check rt.Name/labels) and
skip deletion if any remain, or (b) inspect the bridge state via r.exec.Execute
(e.g., ovs-vsctl show or list-ports) and delete only if no other ports/peer
topologies exist; keep deleting the per-rail bridge (railBridge) as before and
use the same log/error handling around r.exec.Execute. Ensure the new safety
check is applied where xplaneBridge was being removed (within
cleanupXPlaneBridges and the analogous block at lines ~517-544).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b77d1b77-a574-43af-bdda-4dcfc6862bb1

📥 Commits

Reviewing files that changed from the base of the PR and between cec9c7f and 46ff2b8.

📒 Files selected for processing (10)
  • api/v1alpha2/groupversion_info.go
  • api/v1alpha2/spectrumxrailpoolconfig_types.go
  • api/v1alpha2/zz_generated.deepcopy.go
  • cmd/flowscontroller/main.go
  • config/crd/bases/spectrumx.nvidia.com_spectrumxrailpoolconfigs.yaml
  • config/samples/kustomization.yaml
  • config/samples/spectrumx_v1alpha2_spectrumxrailpoolconfig.yaml
  • internal/controller/spectrumxrailpoolconfig_host_flows_controller.go
  • internal/controller/spectrumxrailpoolconfig_host_flows_controller_test.go
  • internal/controller/suite_test.go

In config/crd/bases/spectrumx.nvidia.com_spectrumxrailpoolconfigs.yaml:

scope: Namespaced
versions:
- name: v1alpha1
- name: v1alpha2

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== CRD versions =="
rg -n '^\s*-\s*name:\s*v1alpha' config/crd/bases/spectrumx.nvidia.com_spectrumxrailpoolconfigs.yaml -C2

echo
echo "== CRD conversion strategy =="
rg -n '^\s*conversion:|^\s*strategy:' config/crd/bases/spectrumx.nvidia.com_spectrumxrailpoolconfigs.yaml -C2 || true

echo
echo "== API version packages present =="
fd -t d '^v1alpha[0-9]+$' api

echo
echo "== Conversion interfaces in Go code =="
rg -n 'ConvertTo\(|ConvertFrom\(|Hub\(\)' api internal --type go || true

Repository: Mellanox/spectrum-x-operator

Length of output: 313


CRD upgrade removes v1alpha1 with no conversion or migration path.

The CRD only serves v1alpha2 (line 17) with no conversion strategy defined. No v1alpha1 API package or conversion interfaces exist in the codebase. Clusters storing or requesting v1alpha1 objects will break. Either maintain a compatibility window with both versions and conversion logic, or explicitly document this as a breaking release with migration instructions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/crd/bases/spectrumx.nvidia.com_spectrumxrailpoolconfigs.yaml` at line
17, The CRD currently serves only v1alpha2 (removes v1alpha1) which will break
clusters with stored or requested v1alpha1 objects; restore compatibility by
adding either a served v1alpha1 version with conversion logic or documenting a
breaking change and migration path: add a v1alpha1 served entry alongside
v1alpha2 in the CRD and implement conversion webhooks or conversion functions
between v1alpha1 and v1alpha2 (or add the v1alpha1 API package and conversion
interfaces) for the SpectrumxRailPoolConfig type so Kubernetes can convert
between versions, or explicitly mark this release as breaking and include
migration scripts/docs.
