Skip to content

[ACM-32844] Add right-sizing delegation from MCO to MCOA via MCO CR annotation#2404

Open
dvandra wants to merge 19 commits intostolostron:mainfrom
dvandra:right-sizing-delegation
Open

[ACM-32844] Add right-sizing delegation from MCO to MCOA via MCO CR annotation#2404
dvandra wants to merge 19 commits intostolostron:mainfrom
dvandra:right-sizing-delegation

Conversation

@dvandra
Copy link
Copy Markdown
Contributor

@dvandra dvandra commented Apr 9, 2026

Summary

Adds a migration path for right-sizing (RS) recommendations from MCO's
Policy-based approach to MCOA's ManifestWork-based approach. This enables
RS to be managed by MCOA when the MCO CR is annotated with
observability.open-cluster-management.io/right-sizing-capable: "v1".

Default behavior is unchanged — RS continues to use MCO's Policy-based
deployment. MCOA mode is opt-in and requires an admin to manually annotate
the MCO CR.

Key changes

  • Analytics controller: New AnalyticsReconciler that watches the MCO CR
    and handles RS lifecycle based on delegation mode:
    • MCO mode (default): Creates Policy/Placement/PlacementBinding resources
      and syncs "disabled" to AddOnDeploymentConfig (ADC) so MCOA does not deploy RS
    • MCOA mode (annotated): Cleans up Policy resources, syncs actual RS
      enabled/disabled state to ADC so MCOA deploys via ManifestWork
  • Migration gate: IsRightSizingDelegated() reads the MCO CR annotation
    as the authoritative signal for which component manages RS
  • ADC sync: Ensures platformNamespaceRightSizing and
    platformVirtualizationRightSizing variables in ADC reflect the correct
    state, preventing dual deployment by both MCO and MCOA
  • Renderer integration: MCOA templates (ADC, Deployment, CMA) are rendered
    when MCOAEnabled || rightSizingDelegated, ensuring MCOA infrastructure
    exists even when only RS is delegated

How it works

MCO CR annotation absent → MCO mode (Policy-based RS, ADC="disabled")
MCO CR annotation present → MCOA mode (cleanup Policies, ADC="enabled")

Both modes are mutually exclusive — the annotation acts as a feature gate
that ensures only one component deploys RS resources at any time.

Related MCOA PR:

stolostron/multicluster-observability-addon#449

Summary by CodeRabbit

  • New Features

    • Optional delegation mode for right‑sizing with state tracking and automatic sync of right‑sizing status into deployment configs so add‑on deployment reflects delegated vs managed modes.
    • Add‑on rendering now considers delegated right‑sizing when preparing MCOA artifacts; Grafana launch-link annotation removed.
  • Chores

    • Finer‑grained RBAC for placement resources.
    • Stronger cleanup, error propagation, and explicit add‑on teardown behavior during delegation transitions; ADC spec hash handling improved.
  • Tests

    • Expanded unit tests for delegation sync, ADC updates, reconciler state transitions, and cleanup paths.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 9, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Reconciler adds a delegation gate (IsRightSizingDelegated), tracks first-transition to delegation to cleanup Policy resources, and syncs right-sizing state into AddOnDeploymentConfig (ADC). Multiple cleanup helpers now return errors; renderer, RBAC, managed-cluster-addon logic, and tests updated to respect delegation.

Changes

Cohort / File(s) Summary
Analytics reconciler
operators/.../controllers/analytics/analytics_controller.go, operators/.../controllers/analytics/analytics_controller_test.go
Add wasDelegated field; gate Reconcile on IsRightSizingDelegated; on first delegation detect existing Policy resources and call rightsizingctrl.CleanupPolicyResourcesForDelegation; add syncRightSizingStateToADC and hasPolicyResourcesToCleanup; tests for ADC sync and wasDelegated lifecycle.
Rightsizing coordination
operators/.../controllers/analytics/rightsizing/rightsizing_controller.go
New exported CleanupPolicyResourcesForDelegation(ctx, c, mco) to resolve target namespaces, invoke namespace/virtualization cleanup (preserve configmaps), and set component Enabled=false after success.
Rightsizing utility (component)
operators/.../rs-utility/component.go, operators/.../rs-utility/component_test.go
CleanupComponentResources now returns error; propagate cleanup errors; detect freshEnable to trigger apply on first enable; tests updated for fresh-enable behavior and error propagation.
Namespace & Virtualization helpers + tests
operators/.../rightsizing/rs-namespace/helper.go, .../rs-namespace/helper_test.go, .../rs-virtualization/helper.go, .../rs-virtualization/helper_test.go
Change cleanup helpers to return error and propagate rsutility.CleanupComponentResources results; tests now assert no error returned.
Rendering / MCOA
operators/.../pkg/rendering/renderer.go, .../renderer_mcoa.go, .../renderer_mcoa_test.go
Allow MCOA rendering when right-sizing is delegated via renderer options; render ADC right-sizing customized variables based on delegation and capability flags; remove Grafana launch-link annotation; tests updated to expect two new ADC vars.
MulticlusterObservability reconciler
operators/.../multiclusterobservability/multiclusterobservability_controller.go
Forward right-sizing delegation into MCOA renderer options; gate MCOA deletion by both addon-disabled and not-delegated; explicitly Get/Delete ClusterManagementAddOn to force spoke teardown.
Utilities & unit tests (rightsizing)
operators/.../pkg/util/rightsizing.go, .../rightsizing_test.go
Add constants for right-sizing annotation and ADC keys plus IsRightSizingDelegated helper; unit tests for annotation presence and empty-value semantics.
ManagedClusterAddOn utilities & tests
operators/.../pkg/util/managedclusteraddon.go, .../managedclusteraddon_test.go
Compute ADC spec hash and populate DesiredConfig.SpecHash during initialization; update flows and tests to validate spec-hash computation and updates.
RBAC manifest
operators/.../manifests/.../cluster_role.yaml
Re-granulate cluster role: separate placements into its own rule granting read/write verbs while keeping placementdecisions read-only.

Sequence Diagram(s)

sequenceDiagram
  participant Reconciler as AnalyticsReconciler
  participant MCO as MultiClusterObservability
  participant Rightsizing as RightsizingController
  participant ADC as AddOnDeploymentConfig
  participant K8sAPI as Kubernetes API

  Reconciler->>MCO: GET instance
  Note right of MCO: check IsRightSizingDelegated(annotation)
  alt Delegation mode
    Reconciler->>Reconciler: if !wasDelegated -> set wasDelegated=true
    Reconciler->>K8sAPI: GET Policy resources (hasPolicyResourcesToCleanup)
    alt Policy exists
      Reconciler->>Rightsizing: CleanupPolicyResourcesForDelegation(ctx, client, mco)
    end
    Reconciler->>ADC: GET/AddOnDeploymentConfig
    ADC-->>Reconciler: ADC (or NotFound)
    Reconciler->>ADC: Update CustomizedVariables per MCO capabilities (enabled/disabled)
  else MCO-managed mode
    Reconciler->>ADC: GET/AddOnDeploymentConfig
    ADC-->>Reconciler: ADC (or NotFound)
    Reconciler->>ADC: Force right-sizing vars -> "disabled"
    Reconciler->>Rightsizing: Proceed with Policy-based resource creation
  end
  ADC->>K8sAPI: UPDATE (only if changed)
  Rightsizing->>K8sAPI: create/update/delete Policy/Placements as needed
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

"A gate flipped, configs aligned,
cleanup whispered through namespaces fine.
ADCs toggled, policies cleared with care,
errors now bubble up — no surprises there.
We're debugging together — good catch, onward. 🔍🚨✅"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.17% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main feature: adding right-sizing delegation from MCO to MCOA via a CR annotation, which aligns with the PR's core objective and changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dvandra dvandra force-pushed the right-sizing-delegation branch 2 times, most recently from 1482501 to 1992c81 Compare April 9, 2026 06:57
@dvandra dvandra marked this pull request as ready for review April 9, 2026 09:27
@dvandra dvandra marked this pull request as draft April 9, 2026 10:25
@dvandra dvandra marked this pull request as ready for review April 9, 2026 10:25
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go (1)

51-73: Function logic is correct, but the shared state mutation pattern warrants attention.

The cleanup logic correctly:

  1. Uses bindingUpdated=true to preserve ConfigMaps (MCOA owns them)
  2. Cleans up Policy/Placement/PlacementBinding resources
  3. Resets Enabled flags to trigger fresh creation on next MCO-mode reconcile

Architectural observation: Lines 69-70 directly mutate package-level state in external packages:

rsnamespace.ComponentState.Enabled = false
rsvirtualization.ComponentState.Enabled = false

This works because controller-runtime serializes reconciliation within a single controller, but it creates tight coupling between packages. It's like reaching into another team's desk drawer to flip a switch — functional, but makes the dependency graph harder to trace. 🗄️

For future iterations, consider having the namespace/virtualization packages expose a ResetState() method to encapsulate this mutation. Not blocking for this PR since the current approach is safe under single-controller reconciliation semantics.

🔧 Optional: Encapsulate state reset

In rs-namespace and rs-virtualization packages, add:

func ResetComponentState() {
    ComponentState.Enabled = false
}

Then in this function:

-rsnamespace.ComponentState.Enabled = false
-rsvirtualization.ComponentState.Enabled = false
+rsnamespace.ResetComponentState()
+rsvirtualization.ResetComponentState()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go`
around lines 51 - 73, The function directly mutates package-level state
(rsnamespace.ComponentState.Enabled and
rsvirtualization.ComponentState.Enabled); instead add a ResetComponentState()
(or ResetState()) function in both rsnamespace and rsvirtualization packages
that sets ComponentState.Enabled = false, and replace the direct assignments in
CleanupPolicyResourcesForDelegation with calls to
rsnamespace.ResetComponentState() and rsvirtualization.ResetComponentState() so
the state mutation is encapsulated behind a clear API.
operators/multiclusterobservability/controllers/analytics/analytics_controller.go (1)

273-283: Full Update() could conflict with concurrent ADC modifications.

Using r.Client.Update(ctx, adc) replaces the entire ADC object. If another controller (or MCOA itself) modifies the ADC between your Get() and Update(), you'll get a conflict error or silently overwrite their changes.

For surgical updates to CustomizedVariables, consider using a strategic merge patch or optimistic concurrency retry. Given that ADC is a relatively low-churn object, this might be acceptable risk — but worth noting if you see conflict errors in production.

♻️ Consider using Patch for surgical update
// Alternative: Use patch to update only the customizedVariables field
patchData := map[string]any{
    "spec": map[string]any{
        "customizedVariables": adc.Spec.CustomizedVariables,
    },
}
patchBytes, _ := json.Marshal(patchData)
if err := r.Client.Patch(ctx, adc, client.RawPatch(types.MergePatchType, patchBytes)); err != nil {
    return fmt.Errorf("failed to patch AddOnDeploymentConfig: %w", err)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller.go`
around lines 273 - 283, The current use of r.Client.Update(ctx, adc) when
needsUpdate can overwrite concurrent ADC changes; change the surgical update to
patch only the customizedVariables field on the AddOnDeploymentConfig (adc)
instead of replacing the whole object: build a MergePatch (or
StrategicMergePatch) that sets spec.customizedVariables to
adc.Spec.CustomizedVariables and call r.Client.Patch(ctx, adc,
client.RawPatch(...)) and return the error if patch fails; alternatively
implement an optimistic retry around r.Client.Update by re-getting adc on
conflict and re-applying the change, but prefer the Patch approach for minimal
churn.
operators/multiclusterobservability/controllers/analytics/analytics_controller_test.go (1)

343-383: Test verifies state but not the "no update" behavior.

The test name suggests it validates "no update when values match," but the fake client doesn't actually track whether Update() was called. The test confirms final state is correct, but can't prove an unnecessary API call was avoided.

For a more robust assertion, you could wrap the client or use a mock that tracks call counts. Low priority since the optimization is more about efficiency than correctness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller_test.go`
around lines 343 - 383, The test
TestSyncRightSizingStateToADC_NoUpdateWhenValuesMatch currently only asserts
final ADC values and can't verify that
AnalyticsReconciler.syncRightSizingStateToADC did not call Update(); modify the
test to use a client wrapper or mock around the fake client that intercepts and
counts Update() invocations (or implement a reactor on the fake client) and
assert that Update() was not invoked when ADC already has matching
CustomizedVariables; locate the client used to construct r :=
&AnalyticsReconciler{Client: c, ...} and replace c with the wrapped/mock client
that records Update calls, then add an assertion that the Update call count is
zero.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller.go`:
- Around line 120-126: The syncRightSizingStateToADC error is currently logged
then ignored (r.syncRightSizingStateToADC + reqLogger.Error), which can leave
MCOA stale; change the reconcile flow to (1) log the full error at Error level
including context (include the err value in the reqLogger.Error call) and (2)
requeue the reconcile with a backoff when syncRightSizingStateToADC returns an
error (return a ctrl.Result that requests a delayed requeue or use exponential
backoff logic) instead of silently continuing so repeated failures are retried.
- Around line 36-41: The in-memory wasDelegated flag on AnalyticsReconciler does
not persist across controller restarts, causing
CleanupPolicyResourcesForDelegation to re-run unnecessarily; change the
reconcile flow to either persist delegation state (e.g., write/read a ConfigMap
or update the MCO status subresource used by this controller) or add a pre-check
that skips cleanup when nothing needs removal. Concretely: modify the reconcile
entry point that currently reads wasDelegated and calls
CleanupPolicyResourcesForDelegation to first call a new helper (e.g.,
hasPolicyResourcesToCleanup) that queries for Policy/Placement/PlacementBinding
resources and returns false if none exist, and/or replace wasDelegated with a
persisted boolean stored/read from a ConfigMap or the MCO status before deciding
to run CleanupPolicyResourcesForDelegation; update any references to
AnalyticsReconciler.wasDelegated and ensure CleanupPolicyResourcesForDelegation
becomes a no-op when there are no resources to delete.

---

Nitpick comments:
In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller_test.go`:
- Around line 343-383: The test
TestSyncRightSizingStateToADC_NoUpdateWhenValuesMatch currently only asserts
final ADC values and can't verify that
AnalyticsReconciler.syncRightSizingStateToADC did not call Update(); modify the
test to use a client wrapper or mock around the fake client that intercepts and
counts Update() invocations (or implement a reactor on the fake client) and
assert that Update() was not invoked when ADC already has matching
CustomizedVariables; locate the client used to construct r :=
&AnalyticsReconciler{Client: c, ...} and replace c with the wrapped/mock client
that records Update calls, then add an assertion that the Update call count is
zero.

In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller.go`:
- Around line 273-283: The current use of r.Client.Update(ctx, adc) when
needsUpdate can overwrite concurrent ADC changes; change the surgical update to
patch only the customizedVariables field on the AddOnDeploymentConfig (adc)
instead of replacing the whole object: build a MergePatch (or
StrategicMergePatch) that sets spec.customizedVariables to
adc.Spec.CustomizedVariables and call r.Client.Patch(ctx, adc,
client.RawPatch(...)) and return the error if patch fails; alternatively
implement an optimistic retry around r.Client.Update by re-getting adc on
conflict and re-applying the change, but prefer the Patch approach for minimal
churn.

In
`@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go`:
- Around line 51-73: The function directly mutates package-level state
(rsnamespace.ComponentState.Enabled and
rsvirtualization.ComponentState.Enabled); instead add a ResetComponentState()
(or ResetState()) function in both rsnamespace and rsvirtualization packages
that sets ComponentState.Enabled = false, and replace the direct assignments in
CleanupPolicyResourcesForDelegation with calls to
rsnamespace.ResetComponentState() and rsvirtualization.ResetComponentState() so
the state mutation is encapsulated behind a clear API.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7e2227f0-304c-414c-9ecc-2c1bd279f5dd

📥 Commits

Reviewing files that changed from the base of the PR and between 62cb738 and 1992c81.

📒 Files selected for processing (12)
  • operators/multiclusterobservability/controllers/analytics/analytics_controller.go
  • operators/multiclusterobservability/controllers/analytics/analytics_controller_test.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component_test.go
  • operators/multiclusterobservability/controllers/multiclusterobservability/multiclusterobservability_controller.go
  • operators/multiclusterobservability/manifests/base/multicluster-observability-addon/cluster_role.yaml
  • operators/multiclusterobservability/pkg/rendering/renderer.go
  • operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go
  • operators/multiclusterobservability/pkg/rendering/renderer_mcoa_test.go
  • operators/multiclusterobservability/pkg/util/rightsizing.go
  • operators/multiclusterobservability/pkg/util/rightsizing_test.go

Comment thread operators/multiclusterobservability/controllers/analytics/analytics_controller.go Outdated
@dvandra
Copy link
Copy Markdown
Contributor Author

dvandra commented Apr 9, 2026

/retest

@dvandra dvandra changed the title [DRAFT] Right sizing delegation Add right-sizing delegation from MCO to MCOA via MCO CR annotation Apr 9, 2026
@dvandra dvandra force-pushed the right-sizing-delegation branch from 0e12993 to 1a53221 Compare April 9, 2026 13:25
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go (1)

224-252: ⚠️ Potential issue | 🟠 Major

Write explicit disabled RS variables even when Platform is omitted.

Both right-sizing ADC keys are only appended inside if cs.Platform != nil. That means a render triggered by some other MCOA capability (for example user-workload metrics) produces no RS variables at all when Capabilities.Platform is absent, instead of the explicit "disabled" state this migration path depends on. The result is an implicit state machine, which is how configs start acting like haunted machinery.

Initialize both RS variables to "disabled" before the platform block, then override to "enabled" only when delegation is on and the specific recommendation is enabled. 🔍🚨

Suggested shape of the fix
 		appendCustomVar := func(aodc *addonapiv1alpha1.AddOnDeploymentConfig, name, value string) {
 			aodc.Spec.CustomizedVariables = append(
 				aodc.Spec.CustomizedVariables,
 				addonapiv1alpha1.CustomizedVariable{Name: name, Value: value},
 			)
 		}
+
+		delegated := r.rendererOptions != nil && r.rendererOptions.MCOAOptions.RightSizingDelegated
+		namespaceRS := "disabled"
+		virtualizationRS := "disabled"
 
 		if cs.Platform != nil {
 			if cs.Platform.Logs.Collection.Enabled {
 				fqdn := mcoconfig.GetMCOASupportedCRDFQDN(mcoconfig.ClusterLogForwarderCRDName)
 				appendCustomVar(aodc, namePlatformLogsCollection, fqdn)
@@
 			if cs.Platform.Analytics.IncidentDetection.Enabled {
 				appendCustomVar(aodc, namePlatformIncidentDetection, uipluginsCRDFQDN)
 			}
-
-			// Right-sizing ADC variables: when MCOA manages RS (delegated),
-			// sync actual CR state. Otherwise set "disabled" so MCOA doesn't deploy RS.
-			delegated := r.rendererOptions != nil && r.rendererOptions.MCOAOptions.RightSizingDelegated
 			if delegated && cs.Platform.Analytics.NamespaceRightSizingRecommendation.Enabled {
-				appendCustomVar(aodc, mcoutil.ADCKeyPlatformNamespaceRightSizing, "enabled")
-			} else {
-				appendCustomVar(aodc, mcoutil.ADCKeyPlatformNamespaceRightSizing, "disabled")
+				namespaceRS = "enabled"
 			}
 			if delegated && cs.Platform.Analytics.VirtualizationRightSizingRecommendation.Enabled {
-				appendCustomVar(aodc, mcoutil.ADCKeyPlatformVirtualizationRightSizing, "enabled")
-			} else {
-				appendCustomVar(aodc, mcoutil.ADCKeyPlatformVirtualizationRightSizing, "disabled")
+				virtualizationRS = "enabled"
 			}
 		}
+
+		appendCustomVar(aodc, mcoutil.ADCKeyPlatformNamespaceRightSizing, namespaceRS)
+		appendCustomVar(aodc, mcoutil.ADCKeyPlatformVirtualizationRightSizing, virtualizationRS)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go` around
lines 224 - 252, The RS ADC variables are only set inside the cs.Platform != nil
block, causing them to be omitted when Platform is absent; pre-initialize both
mcoutil.ADCKeyPlatformNamespaceRightSizing and
mcoutil.ADCKeyPlatformVirtualizationRightSizing to "disabled" (call
appendCustomVar(aodc, <key>, "disabled")) before the platform block, then keep
the existing logic that overrides them to "enabled" when delegated (check
r.rendererOptions.MCOAOptions.RightSizingDelegated) and the specific
cs.Platform.Analytics.*.Enabled flags, using the same appendCustomVar calls to
update the values.
🧹 Nitpick comments (1)
operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go (1)

42-47: Avoid a second source of truth for RS delegation.

RightSizingDelegated now duplicates the MCO CR annotation that this PR defines as authoritative. If a caller ever passes a stale/default value, Render() and renderAddonDeploymentConfig() can flip the manager state incorrectly and break the mutual-exclusion guarantee. I’d derive delegation from r.cr inside the renderer instead of threading a second flag through options—two clocks in distributed systems rarely make life better. 🔍🚨

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go` around
lines 42 - 47, The MCOARendererOptions.RightSizingDelegated flag duplicates the
authoritative RS delegation annotation on the MCO CR and can cause inconsistent
state; remove this field from MCOARendererOptions and change logic in
MCOARenderer.Render() and renderAddonDeploymentConfig() to derive delegation
directly from the renderer's CR (r.cr) annotation instead of reading the option.
Update all call sites that constructed MCOARendererOptions to stop passing
RightSizingDelegated and ensure any conditional branches that previously used
RightSizingDelegated now evaluate the annotation on r.cr (use the same
annotation key used elsewhere in the repo) so the mutual-exclusion decision
comes from a single source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go`:
- Around line 56-67: CleanupPolicyResourcesForDelegation currently swallows
errors from lower-level cleanup helpers; modify the chain to return and
propagate errors: change rs-utility.CleanupComponentResources (function
CleanupComponentResources) to return error and stop logging+swallowing, update
the helper wrappers CleanupRSNamespaceResources and
CleanupRSVirtualizationResources to return error and forward any error from
CleanupComponentResources, then update CleanupPolicyResourcesForDelegation to
handle and return those errors and update its caller in analytics_controller.go
(the reconcile path that calls CleanupPolicyResourcesForDelegation, same pattern
as syncRightSizingStateToADC) to propagate the error up the reconcile loop so
failures are retried and surfaced.

---

Outside diff comments:
In `@operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go`:
- Around line 224-252: The RS ADC variables are only set inside the cs.Platform
!= nil block, causing them to be omitted when Platform is absent; pre-initialize
both mcoutil.ADCKeyPlatformNamespaceRightSizing and
mcoutil.ADCKeyPlatformVirtualizationRightSizing to "disabled" (call
appendCustomVar(aodc, <key>, "disabled")) before the platform block, then keep
the existing logic that overrides them to "enabled" when delegated (check
r.rendererOptions.MCOAOptions.RightSizingDelegated) and the specific
cs.Platform.Analytics.*.Enabled flags, using the same appendCustomVar calls to
update the values.

---

Nitpick comments:
In `@operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go`:
- Around line 42-47: The MCOARendererOptions.RightSizingDelegated flag
duplicates the authoritative RS delegation annotation on the MCO CR and can
cause inconsistent state; remove this field from MCOARendererOptions and change
logic in MCOARenderer.Render() and renderAddonDeploymentConfig() to derive
delegation directly from the renderer's CR (r.cr) annotation instead of reading
the option. Update all call sites that constructed MCOARendererOptions to stop
passing RightSizingDelegated and ensure any conditional branches that previously
used RightSizingDelegated now evaluate the annotation on r.cr (use the same
annotation key used elsewhere in the repo) so the mutual-exclusion decision
comes from a single source of truth.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 34b4ffc2-4296-4408-a7f4-59f60147318b

📥 Commits

Reviewing files that changed from the base of the PR and between 1992c81 and 1a53221.

📒 Files selected for processing (5)
  • operators/multiclusterobservability/controllers/analytics/analytics_controller.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go
  • operators/multiclusterobservability/controllers/multiclusterobservability/multiclusterobservability_controller.go
  • operators/multiclusterobservability/pkg/rendering/renderer.go
  • operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • operators/multiclusterobservability/controllers/multiclusterobservability/multiclusterobservability_controller.go
  • operators/multiclusterobservability/controllers/analytics/analytics_controller.go

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller.go`:
- Around line 98-104: The cleanup pre-check can return false incorrectly and
permanently skip cleanup; change hasPolicyResourcesToCleanup to return (bool,
error) and update callers (e.g., the reconciliation block around r.wasDelegated
and the logic at lines ~298-311) to handle errors instead of treating any GET
error as “no resources”. Inside hasPolicyResourcesToCleanup check both namespace
policy (rs-prom-rules-policy) and virtualization policy presence, treat only
apierrors.IsNotFound as “not present”, and return other errors upward; in the
caller, call hasPolicyResourcesToCleanup(ctx) and if it returns (true, nil) run
rightsizingctrl.CleanupPolicyResourcesForDelegation, if it returns (false, nil)
don’t clean but still set r.wasDelegated appropriately only when no error, and
if it returns (bool, err) with err != nil propagate the error (do not set
r.wasDelegated) so transient API errors don’t permanently skip cleanup.

In
`@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go`:
- Around line 56-67: CleanupPolicyResourcesForDelegation currently uses
process-local ComponentState.Namespace (rsnamespace.ComponentState.Namespace and
rsvirtualization.ComponentState.Namespace) which can be stale; instead derive
the target namespaces from the passed-in mco CR before calling
rsnamespace.CleanupRSNamespaceResources and
rsvirtualization.CleanupRSVirtualizationResources. Modify
CleanupPolicyResourcesForDelegation to resolve the configured/observed namespace
bindings from the mco object (e.g., read the relevant fields in mco.Spec/status
that indicate the namespace bindings for the namespace and virtualization
components) and pass those resolved namespaces into
rsnamespace.CleanupRSNamespaceResources and
rsvirtualization.CleanupRSVirtualizationResources so cleanup targets the
CR-derived namespaces rather than process-local ComponentState.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1962d334-f89e-4c48-a740-12e0f5607033

📥 Commits

Reviewing files that changed from the base of the PR and between 1a53221 and cfef7b5.

📒 Files selected for processing (8)
  • operators/multiclusterobservability/controllers/analytics/analytics_controller.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper_test.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component_test.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-virtualization/helper.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-virtualization/helper_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component_test.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component.go

@dvandra
Copy link
Copy Markdown
Contributor Author

dvandra commented Apr 13, 2026

/retest

@dvandra dvandra force-pushed the right-sizing-delegation branch from 29d1f98 to 65dde4a Compare April 13, 2026 05:05
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go (1)

87-95: Consider handling the error from GetComponentConfig.

Line 90 discards the error return from rsutility.GetComponentConfig. While this is safe given you only pass known component types (ComponentTypeNamespace/ComponentTypeVirtualization), it's a small deviation from Go's "handle every error" idiom. If someone later passes an invalid type, the blank identifier will silently swallow the failure.

That said, this is a private helper with controlled callers, so the risk is minimal. Just flagging for awareness. 🔍

🛡️ Optional: Log or return error for defensive coding
 func resolveNamespaceBinding(mco *mcov1beta2.MultiClusterObservability, ct rsutility.ComponentType) string {
-	_, binding, _ := rsutility.GetComponentConfig(mco, ct)
+	_, binding, err := rsutility.GetComponentConfig(mco, ct)
+	if err != nil {
+		log.V(1).Info("rs - failed to get component config, using default namespace", "error", err)
+		return rsutility.DefaultNamespace
+	}
 	if binding == "" {
 		return rsutility.DefaultNamespace
 	}
 	return binding
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go`
around lines 87 - 95, The resolveNamespaceBinding helper currently discards the
error from rsutility.GetComponentConfig; update resolveNamespaceBinding to check
the returned error from GetComponentConfig and handle it defensively—if error !=
nil, log the error with context (including the ct value) using the controller
logger or fallback to returning rsutility.DefaultNamespace, otherwise proceed to
use the binding; ensure you reference the existing function
resolveNamespaceBinding and rsutility.GetComponentConfig so the error check and
logging are added in that function.
operators/multiclusterobservability/controllers/analytics/analytics_controller.go (1)

303-332: Namespace resolution inconsistency with CleanupPolicyResourcesForDelegation.

hasPolicyResourcesToCleanup uses rsnamespace.ComponentState.Namespace and rsvirtualization.ComponentState.Namespace (lines 312-313), which are process-local and reset on controller restart. Meanwhile, CleanupPolicyResourcesForDelegation in rightsizing_controller.go correctly derives namespaces from the MCO CR via resolveNamespaceBinding.

After a restart with a custom namespace binding, ComponentState.Namespace will be the default, potentially missing policies in custom namespaces. This means hasPolicyResourcesToCleanup could return false when policies actually exist, causing cleanup to be skipped.

Since cleanup is idempotent and the actual cleanup function uses CR-derived namespaces, the worst case is a missed cleanup on the first post-restart reconcile — which will be caught on subsequent reconciles. Low severity, but worth aligning for consistency.

♻️ Align namespace resolution with CleanupPolicyResourcesForDelegation
-func (r *AnalyticsReconciler) hasPolicyResourcesToCleanup(ctx context.Context) (bool, error) {
+func (r *AnalyticsReconciler) hasPolicyResourcesToCleanup(ctx context.Context, mco *mcov1beta2.MultiClusterObservability) (bool, error) {
+	resolveNS := func(binding string) string {
+		if binding == "" {
+			return config.GetDefaultNamespace()
+		}
+		return binding
+	}
+	_, nsBinding := rsnamespace.GetRightSizingNamespaceConfig(mco)
+	_, virtBinding := rsvirtualization.GetRightSizingVirtualizationConfig(mco)
+
 	checks := []struct {
 		name      string
 		namespace string
 	}{
-		{rsnamespace.PrometheusRulePolicyName, rsnamespace.ComponentState.Namespace},
-		{rsvirtualization.PrometheusRulePolicyName, rsvirtualization.ComponentState.Namespace},
+		{rsnamespace.PrometheusRulePolicyName, resolveNS(nsBinding)},
+		{rsvirtualization.PrometheusRulePolicyName, resolveNS(virtBinding)},
 	}

And update the call site:

-			hasResources, err := r.hasPolicyResourcesToCleanup(ctx)
+			hasResources, err := r.hasPolicyResourcesToCleanup(ctx, instance)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller.go`
around lines 303 - 332, The hasPolicyResourcesToCleanup function currently reads
namespaces from rsnamespace.ComponentState.Namespace and
rsvirtualization.ComponentState.Namespace which can be stale after a controller
restart; change it to derive the namespaces the same way
CleanupPolicyResourcesForDelegation does by calling resolveNamespaceBinding (or
the same helper used by rightsizing_controller.go) to obtain the effective
namespace for each policy before building the NamespacedName; update the loop in
hasPolicyResourcesToCleanup to call resolveNamespaceBinding (or reuse the
resolver function) for rsnamespace and rsvritualization entries and use the
resolved namespace (falling back to config.GetDefaultNamespace() only if the
resolver returns empty) so namespace resolution is consistent with
CleanupPolicyResourcesForDelegation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@operators/multiclusterobservability/pkg/util/managedclusteraddon_test.go`:
- Around line 265-267: The test accesses firstAddon.Status.ConfigReferences[0]
directly in three assertions (around the checks that compare
DesiredConfig.SpecHash to originalHash and similar checks at the other two
spots), which can panic if the slice is empty; add defensive bounds checks like
verifying len(firstAddon.Status.ConfigReferences) > 0 (and fail the test with
t.Fatalf including a clear message) before each access so the test reports a
readable failure instead of an index-out-of-range panic.

---

Nitpick comments:
In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller.go`:
- Around line 303-332: The hasPolicyResourcesToCleanup function currently reads
namespaces from rsnamespace.ComponentState.Namespace and
rsvirtualization.ComponentState.Namespace which can be stale after a controller
restart; change it to derive the namespaces the same way
CleanupPolicyResourcesForDelegation does by calling resolveNamespaceBinding (or
the same helper used by rightsizing_controller.go) to obtain the effective
namespace for each policy before building the NamespacedName; update the loop in
hasPolicyResourcesToCleanup to call resolveNamespaceBinding (or reuse the
resolver function) for rsnamespace and rsvritualization entries and use the
resolved namespace (falling back to config.GetDefaultNamespace() only if the
resolver returns empty) so namespace resolution is consistent with
CleanupPolicyResourcesForDelegation.

In
`@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go`:
- Around line 87-95: The resolveNamespaceBinding helper currently discards the
error from rsutility.GetComponentConfig; update resolveNamespaceBinding to check
the returned error from GetComponentConfig and handle it defensively—if error !=
nil, log the error with context (including the ct value) using the controller
logger or fallback to returning rsutility.DefaultNamespace, otherwise proceed to
use the binding; ensure you reference the existing function
resolveNamespaceBinding and rsutility.GetComponentConfig so the error check and
logging are added in that function.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 56ccdb19-fecb-4c1c-b869-565456cebfcb

📥 Commits

Reviewing files that changed from the base of the PR and between cfef7b5 and 65dde4a.

📒 Files selected for processing (18)
  • operators/multiclusterobservability/controllers/analytics/analytics_controller.go
  • operators/multiclusterobservability/controllers/analytics/analytics_controller_test.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper_test.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component_test.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-virtualization/helper.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-virtualization/helper_test.go
  • operators/multiclusterobservability/controllers/multiclusterobservability/multiclusterobservability_controller.go
  • operators/multiclusterobservability/manifests/base/multicluster-observability-addon/cluster_role.yaml
  • operators/multiclusterobservability/pkg/rendering/renderer.go
  • operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go
  • operators/multiclusterobservability/pkg/rendering/renderer_mcoa_test.go
  • operators/multiclusterobservability/pkg/util/managedclusteraddon.go
  • operators/multiclusterobservability/pkg/util/managedclusteraddon_test.go
  • operators/multiclusterobservability/pkg/util/rightsizing.go
  • operators/multiclusterobservability/pkg/util/rightsizing_test.go
✅ Files skipped from review due to trivial changes (5)
  • operators/multiclusterobservability/pkg/rendering/renderer_mcoa_test.go
  • operators/multiclusterobservability/pkg/rendering/renderer.go
  • operators/multiclusterobservability/pkg/util/rightsizing_test.go
  • operators/multiclusterobservability/pkg/util/rightsizing.go
  • operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper_test.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-virtualization/helper.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper.go
  • operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component.go
  • operators/multiclusterobservability/controllers/multiclusterobservability/multiclusterobservability_controller.go

@rajusem rajusem force-pushed the right-sizing-delegation branch from 911ef11 to 35eb137 Compare April 13, 2026 10:29
@dvandra dvandra force-pushed the right-sizing-delegation branch 2 times, most recently from b16d4d6 to 176a35b Compare April 13, 2026 13:48
@rajusem rajusem force-pushed the right-sizing-delegation branch from 176a35b to 35eb137 Compare April 13, 2026 14:03
@dvandra dvandra force-pushed the right-sizing-delegation branch from 35eb137 to 176a35b Compare April 13, 2026 14:12
@rajusem rajusem force-pushed the right-sizing-delegation branch from 176a35b to 35eb137 Compare April 13, 2026 14:13
rajusem and others added 10 commits April 17, 2026 17:28
Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
…ection

Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
…nc and renderer

Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
Signed-off-by: Darshan Vandra <dvandra@redhat.com>
…ad of stale process-local state

Signed-off-by: Darshan Vandra <dvandra@redhat.com>
Signed-off-by: Darshan Vandra <dvandra@redhat.com>
Signed-off-by: Darshan Vandra <dvandra@redhat.com>

Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
…elegation check

Signed-off-by: Darshan Vandra <dvandra@redhat.com>
Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
The Grafana link was removed from the MCOA ClusterManagementAddOn
assuming it caused duplicate non-clickable entries in the ACM console.
The actual root cause is in stolostron/console: AcmDropdown does not
pass href to MenuItem, and AcmLaunchLink has a no-op onSelect handler
for the multi-link dropdown path.

Restore the launch-link annotation so the MCOA CMA links to its
own Grafana dashboard (UID 89eaec849a6e4837a619fb0540c22b13).

Signed-off-by: Darshan Vandra <dvandra@redhat.com>
Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
@dvandra dvandra force-pushed the right-sizing-delegation branch from b9b0dea to 45314ec Compare April 17, 2026 11:58
@dvandra dvandra changed the title Add right-sizing delegation from MCO to MCOA via MCO CR annotation [ACM-32844] Add right-sizing delegation from MCO to MCOA via MCO CR annotation Apr 20, 2026
dvandra and others added 3 commits April 22, 2026 18:05
Replace cleanupStarted bool with cleanupAt timestamp to block all reconciles
during stabilization window. Phase 2 re-syncs disabled state to ADC before
cleanup to handle main controller overwrite race.

Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
// load and render multicluster-observability-addon templates if capabilities is enabled
if MCOAEnabled(r.cr) {
// load and render multicluster-observability-addon templates
rightSizingDelegated := r.rendererOptions != nil && r.rendererOptions.MCOAOptions.RightSizingDelegated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't check if right sizing is enabled?

@dvandra
Copy link
Copy Markdown
Contributor Author

dvandra commented Apr 25, 2026

/retest

2 similar comments
@dvandra
Copy link
Copy Markdown
Contributor Author

dvandra commented Apr 25, 2026

/retest

@dvandra
Copy link
Copy Markdown
Contributor Author

dvandra commented Apr 25, 2026

/retest

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 25, 2026

@dvandra: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/test-e2e 1dc2171 link true /test test-e2e

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants