[ACM-32844] Add right-sizing delegation from MCO to MCOA via MCO CR annotation#2404
[ACM-32844] Add right-sizing delegation from MCO to MCOA via MCO CR annotation#2404dvandra wants to merge 19 commits intostolostron:mainfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReconciler adds a delegation gate (IsRightSizingDelegated), tracks first-transition to delegation to cleanup Policy resources, and syncs right-sizing state into AddOnDeploymentConfig (ADC). Multiple cleanup helpers now return errors; renderer, RBAC, managed-cluster-addon logic, and tests updated to respect delegation. Changes
Sequence Diagram(s)sequenceDiagram
participant Reconciler as AnalyticsReconciler
participant MCO as MultiClusterObservability
participant Rightsizing as RightsizingController
participant ADC as AddOnDeploymentConfig
participant K8sAPI as Kubernetes API
Reconciler->>MCO: GET instance
Note right of MCO: check IsRightSizingDelegated(annotation)
alt Delegation mode
Reconciler->>Reconciler: if !wasDelegated -> set wasDelegated=true
Reconciler->>K8sAPI: GET Policy resources (hasPolicyResourcesToCleanup)
alt Policy exists
Reconciler->>Rightsizing: CleanupPolicyResourcesForDelegation(ctx, client, mco)
end
Reconciler->>ADC: GET/AddOnDeploymentConfig
ADC-->>Reconciler: ADC (or NotFound)
Reconciler->>ADC: Update CustomizedVariables per MCO capabilities (enabled/disabled)
else MCO-managed mode
Reconciler->>ADC: GET/AddOnDeploymentConfig
ADC-->>Reconciler: ADC (or NotFound)
Reconciler->>ADC: Force right-sizing vars -> "disabled"
Reconciler->>Rightsizing: Proceed with Policy-based resource creation
end
ADC->>K8sAPI: UPDATE (only if changed)
Rightsizing->>K8sAPI: create/update/delete Policy/Placements as needed
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
1482501 to
1992c81
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go (1)
51-73: Function logic is correct, but the shared state mutation pattern warrants attention.The cleanup logic correctly:
- Uses
bindingUpdated=trueto preserve ConfigMaps (MCOA owns them)- Cleans up Policy/Placement/PlacementBinding resources
- Resets
Enabledflags to trigger fresh creation on next MCO-mode reconcileArchitectural observation: Lines 69-70 directly mutate package-level state in external packages:
rsnamespace.ComponentState.Enabled = false rsvirtualization.ComponentState.Enabled = falseThis works because controller-runtime serializes reconciliation within a single controller, but it creates tight coupling between packages. It's like reaching into another team's desk drawer to flip a switch — functional, but makes the dependency graph harder to trace. 🗄️
For future iterations, consider having the namespace/virtualization packages expose a
ResetState()method to encapsulate this mutation. Not blocking for this PR since the current approach is safe under single-controller reconciliation semantics.🔧 Optional: Encapsulate state reset
In
rs-namespaceandrs-virtualizationpackages, add:func ResetComponentState() { ComponentState.Enabled = false }Then in this function:
-rsnamespace.ComponentState.Enabled = false -rsvirtualization.ComponentState.Enabled = false +rsnamespace.ResetComponentState() +rsvirtualization.ResetComponentState()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go` around lines 51 - 73, The function directly mutates package-level state (rsnamespace.ComponentState.Enabled and rsvirtualization.ComponentState.Enabled); instead add a ResetComponentState() (or ResetState()) function in both rsnamespace and rsvirtualization packages that sets ComponentState.Enabled = false, and replace the direct assignments in CleanupPolicyResourcesForDelegation with calls to rsnamespace.ResetComponentState() and rsvirtualization.ResetComponentState() so the state mutation is encapsulated behind a clear API.operators/multiclusterobservability/controllers/analytics/analytics_controller.go (1)
273-283: FullUpdate()could conflict with concurrent ADC modifications.Using
r.Client.Update(ctx, adc)replaces the entire ADC object. If another controller (or MCOA itself) modifies the ADC between yourGet()andUpdate(), you'll get a conflict error or silently overwrite their changes.For surgical updates to
CustomizedVariables, consider using a strategic merge patch or optimistic concurrency retry. Given that ADC is a relatively low-churn object, this might be acceptable risk — but worth noting if you see conflict errors in production.♻️ Consider using Patch for surgical update
// Alternative: Use patch to update only the customizedVariables field patchData := map[string]any{ "spec": map[string]any{ "customizedVariables": adc.Spec.CustomizedVariables, }, } patchBytes, _ := json.Marshal(patchData) if err := r.Client.Patch(ctx, adc, client.RawPatch(types.MergePatchType, patchBytes)); err != nil { return fmt.Errorf("failed to patch AddOnDeploymentConfig: %w", err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@operators/multiclusterobservability/controllers/analytics/analytics_controller.go` around lines 273 - 283, The current use of r.Client.Update(ctx, adc) when needsUpdate can overwrite concurrent ADC changes; change the surgical update to patch only the customizedVariables field on the AddOnDeploymentConfig (adc) instead of replacing the whole object: build a MergePatch (or StrategicMergePatch) that sets spec.customizedVariables to adc.Spec.CustomizedVariables and call r.Client.Patch(ctx, adc, client.RawPatch(...)) and return the error if patch fails; alternatively implement an optimistic retry around r.Client.Update by re-getting adc on conflict and re-applying the change, but prefer the Patch approach for minimal churn.operators/multiclusterobservability/controllers/analytics/analytics_controller_test.go (1)
343-383: Test verifies state but not the "no update" behavior.The test name suggests it validates "no update when values match," but the fake client doesn't actually track whether
Update()was called. The test confirms final state is correct, but can't prove an unnecessary API call was avoided.For a more robust assertion, you could wrap the client or use a mock that tracks call counts. Low priority since the optimization is more about efficiency than correctness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@operators/multiclusterobservability/controllers/analytics/analytics_controller_test.go` around lines 343 - 383, The test TestSyncRightSizingStateToADC_NoUpdateWhenValuesMatch currently only asserts final ADC values and can't verify that AnalyticsReconciler.syncRightSizingStateToADC did not call Update(); modify the test to use a client wrapper or mock around the fake client that intercepts and counts Update() invocations (or implement a reactor on the fake client) and assert that Update() was not invoked when ADC already has matching CustomizedVariables; locate the client used to construct r := &AnalyticsReconciler{Client: c, ...} and replace c with the wrapped/mock client that records Update calls, then add an assertion that the Update call count is zero.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller.go`:
- Around line 120-126: The syncRightSizingStateToADC error is currently logged
then ignored (r.syncRightSizingStateToADC + reqLogger.Error), which can leave
MCOA stale; change the reconcile flow to (1) log the full error at Error level
including context (include the err value in the reqLogger.Error call) and (2)
requeue the reconcile with a backoff when syncRightSizingStateToADC returns an
error (return a ctrl.Result that requests a delayed requeue or use exponential
backoff logic) instead of silently continuing so repeated failures are retried.
- Around line 36-41: The in-memory wasDelegated flag on AnalyticsReconciler does
not persist across controller restarts, causing
CleanupPolicyResourcesForDelegation to re-run unnecessarily; change the
reconcile flow to either persist delegation state (e.g., write/read a ConfigMap
or update the MCO status subresource used by this controller) or add a pre-check
that skips cleanup when nothing needs removal. Concretely: modify the reconcile
entry point that currently reads wasDelegated and calls
CleanupPolicyResourcesForDelegation to first call a new helper (e.g.,
hasPolicyResourcesToCleanup) that queries for Policy/Placement/PlacementBinding
resources and returns false if none exist, and/or replace wasDelegated with a
persisted boolean stored/read from a ConfigMap or the MCO status before deciding
to run CleanupPolicyResourcesForDelegation; update any references to
AnalyticsReconciler.wasDelegated and ensure CleanupPolicyResourcesForDelegation
becomes a no-op when there are no resources to delete.
---
Nitpick comments:
In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller_test.go`:
- Around line 343-383: The test
TestSyncRightSizingStateToADC_NoUpdateWhenValuesMatch currently only asserts
final ADC values and can't verify that
AnalyticsReconciler.syncRightSizingStateToADC did not call Update(); modify the
test to use a client wrapper or mock around the fake client that intercepts and
counts Update() invocations (or implement a reactor on the fake client) and
assert that Update() was not invoked when ADC already has matching
CustomizedVariables; locate the client used to construct r :=
&AnalyticsReconciler{Client: c, ...} and replace c with the wrapped/mock client
that records Update calls, then add an assertion that the Update call count is
zero.
In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller.go`:
- Around line 273-283: The current use of r.Client.Update(ctx, adc) when
needsUpdate can overwrite concurrent ADC changes; change the surgical update to
patch only the customizedVariables field on the AddOnDeploymentConfig (adc)
instead of replacing the whole object: build a MergePatch (or
StrategicMergePatch) that sets spec.customizedVariables to
adc.Spec.CustomizedVariables and call r.Client.Patch(ctx, adc,
client.RawPatch(...)) and return the error if patch fails; alternatively
implement an optimistic retry around r.Client.Update by re-getting adc on
conflict and re-applying the change, but prefer the Patch approach for minimal
churn.
In
`@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go`:
- Around line 51-73: The function directly mutates package-level state
(rsnamespace.ComponentState.Enabled and
rsvirtualization.ComponentState.Enabled); instead add a ResetComponentState()
(or ResetState()) function in both rsnamespace and rsvirtualization packages
that sets ComponentState.Enabled = false, and replace the direct assignments in
CleanupPolicyResourcesForDelegation with calls to
rsnamespace.ResetComponentState() and rsvirtualization.ResetComponentState() so
the state mutation is encapsulated behind a clear API.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7e2227f0-304c-414c-9ecc-2c1bd279f5dd
📒 Files selected for processing (12)
operators/multiclusterobservability/controllers/analytics/analytics_controller.gooperators/multiclusterobservability/controllers/analytics/analytics_controller_test.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component_test.gooperators/multiclusterobservability/controllers/multiclusterobservability/multiclusterobservability_controller.gooperators/multiclusterobservability/manifests/base/multicluster-observability-addon/cluster_role.yamloperators/multiclusterobservability/pkg/rendering/renderer.gooperators/multiclusterobservability/pkg/rendering/renderer_mcoa.gooperators/multiclusterobservability/pkg/rendering/renderer_mcoa_test.gooperators/multiclusterobservability/pkg/util/rightsizing.gooperators/multiclusterobservability/pkg/util/rightsizing_test.go
|
/retest |
0e12993 to
1a53221
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go (1)
224-252:⚠️ Potential issue | 🟠 MajorWrite explicit disabled RS variables even when
Platformis omitted.Both right-sizing ADC keys are only appended inside
if cs.Platform != nil. That means a render triggered by some other MCOA capability (for example user-workload metrics) produces no RS variables at all whenCapabilities.Platformis absent, instead of the explicit"disabled"state this migration path depends on. The result is an implicit state machine, which is how configs start acting like haunted machinery.Initialize both RS variables to
"disabled"before the platform block, then override to"enabled"only when delegation is on and the specific recommendation is enabled. 🔍🚨Suggested shape of the fix
appendCustomVar := func(aodc *addonapiv1alpha1.AddOnDeploymentConfig, name, value string) { aodc.Spec.CustomizedVariables = append( aodc.Spec.CustomizedVariables, addonapiv1alpha1.CustomizedVariable{Name: name, Value: value}, ) } + + delegated := r.rendererOptions != nil && r.rendererOptions.MCOAOptions.RightSizingDelegated + namespaceRS := "disabled" + virtualizationRS := "disabled" if cs.Platform != nil { if cs.Platform.Logs.Collection.Enabled { fqdn := mcoconfig.GetMCOASupportedCRDFQDN(mcoconfig.ClusterLogForwarderCRDName) appendCustomVar(aodc, namePlatformLogsCollection, fqdn) @@ if cs.Platform.Analytics.IncidentDetection.Enabled { appendCustomVar(aodc, namePlatformIncidentDetection, uipluginsCRDFQDN) } - - // Right-sizing ADC variables: when MCOA manages RS (delegated), - // sync actual CR state. Otherwise set "disabled" so MCOA doesn't deploy RS. - delegated := r.rendererOptions != nil && r.rendererOptions.MCOAOptions.RightSizingDelegated if delegated && cs.Platform.Analytics.NamespaceRightSizingRecommendation.Enabled { - appendCustomVar(aodc, mcoutil.ADCKeyPlatformNamespaceRightSizing, "enabled") - } else { - appendCustomVar(aodc, mcoutil.ADCKeyPlatformNamespaceRightSizing, "disabled") + namespaceRS = "enabled" } if delegated && cs.Platform.Analytics.VirtualizationRightSizingRecommendation.Enabled { - appendCustomVar(aodc, mcoutil.ADCKeyPlatformVirtualizationRightSizing, "enabled") - } else { - appendCustomVar(aodc, mcoutil.ADCKeyPlatformVirtualizationRightSizing, "disabled") + virtualizationRS = "enabled" } } + + appendCustomVar(aodc, mcoutil.ADCKeyPlatformNamespaceRightSizing, namespaceRS) + appendCustomVar(aodc, mcoutil.ADCKeyPlatformVirtualizationRightSizing, virtualizationRS)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go` around lines 224 - 252, The RS ADC variables are only set inside the cs.Platform != nil block, causing them to be omitted when Platform is absent; pre-initialize both mcoutil.ADCKeyPlatformNamespaceRightSizing and mcoutil.ADCKeyPlatformVirtualizationRightSizing to "disabled" (call appendCustomVar(aodc, <key>, "disabled")) before the platform block, then keep the existing logic that overrides them to "enabled" when delegated (check r.rendererOptions.MCOAOptions.RightSizingDelegated) and the specific cs.Platform.Analytics.*.Enabled flags, using the same appendCustomVar calls to update the values.
🧹 Nitpick comments (1)
operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go (1)
42-47: Avoid a second source of truth for RS delegation.
RightSizingDelegatednow duplicates the MCO CR annotation that this PR defines as authoritative. If a caller ever passes a stale/default value,Render()andrenderAddonDeploymentConfig()can flip the manager state incorrectly and break the mutual-exclusion guarantee. I’d derive delegation fromr.crinside the renderer instead of threading a second flag through options—two clocks in distributed systems rarely make life better. 🔍🚨🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go` around lines 42 - 47, The MCOARendererOptions.RightSizingDelegated flag duplicates the authoritative RS delegation annotation on the MCO CR and can cause inconsistent state; remove this field from MCOARendererOptions and change logic in MCOARenderer.Render() and renderAddonDeploymentConfig() to derive delegation directly from the renderer's CR (r.cr) annotation instead of reading the option. Update all call sites that constructed MCOARendererOptions to stop passing RightSizingDelegated and ensure any conditional branches that previously used RightSizingDelegated now evaluate the annotation on r.cr (use the same annotation key used elsewhere in the repo) so the mutual-exclusion decision comes from a single source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go`:
- Around line 56-67: CleanupPolicyResourcesForDelegation currently swallows
errors from lower-level cleanup helpers; modify the chain to return and
propagate errors: change rs-utility.CleanupComponentResources (function
CleanupComponentResources) to return error and stop logging+swallowing, update
the helper wrappers CleanupRSNamespaceResources and
CleanupRSVirtualizationResources to return error and forward any error from
CleanupComponentResources, then update CleanupPolicyResourcesForDelegation to
handle and return those errors and update its caller in analytics_controller.go
(the reconcile path that calls CleanupPolicyResourcesForDelegation, same pattern
as syncRightSizingStateToADC) to propagate the error up the reconcile loop so
failures are retried and surfaced.
---
Outside diff comments:
In `@operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go`:
- Around line 224-252: The RS ADC variables are only set inside the cs.Platform
!= nil block, causing them to be omitted when Platform is absent; pre-initialize
both mcoutil.ADCKeyPlatformNamespaceRightSizing and
mcoutil.ADCKeyPlatformVirtualizationRightSizing to "disabled" (call
appendCustomVar(aodc, <key>, "disabled")) before the platform block, then keep
the existing logic that overrides them to "enabled" when delegated (check
r.rendererOptions.MCOAOptions.RightSizingDelegated) and the specific
cs.Platform.Analytics.*.Enabled flags, using the same appendCustomVar calls to
update the values.
---
Nitpick comments:
In `@operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go`:
- Around line 42-47: The MCOARendererOptions.RightSizingDelegated flag
duplicates the authoritative RS delegation annotation on the MCO CR and can
cause inconsistent state; remove this field from MCOARendererOptions and change
logic in MCOARenderer.Render() and renderAddonDeploymentConfig() to derive
delegation directly from the renderer's CR (r.cr) annotation instead of reading
the option. Update all call sites that constructed MCOARendererOptions to stop
passing RightSizingDelegated and ensure any conditional branches that previously
used RightSizingDelegated now evaluate the annotation on r.cr (use the same
annotation key used elsewhere in the repo) so the mutual-exclusion decision
comes from a single source of truth.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 34b4ffc2-4296-4408-a7f4-59f60147318b
📒 Files selected for processing (5)
operators/multiclusterobservability/controllers/analytics/analytics_controller.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.gooperators/multiclusterobservability/controllers/multiclusterobservability/multiclusterobservability_controller.gooperators/multiclusterobservability/pkg/rendering/renderer.gooperators/multiclusterobservability/pkg/rendering/renderer_mcoa.go
🚧 Files skipped from review as they are similar to previous changes (2)
- operators/multiclusterobservability/controllers/multiclusterobservability/multiclusterobservability_controller.go
- operators/multiclusterobservability/controllers/analytics/analytics_controller.go
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller.go`:
- Around line 98-104: The cleanup pre-check can return false incorrectly and
permanently skip cleanup; change hasPolicyResourcesToCleanup to return (bool,
error) and update callers (e.g., the reconciliation block around r.wasDelegated
and the logic at lines ~298-311) to handle errors instead of treating any GET
error as “no resources”. Inside hasPolicyResourcesToCleanup check both namespace
policy (rs-prom-rules-policy) and virtualization policy presence, treat only
apierrors.IsNotFound as “not present”, and return other errors upward; in the
caller, call hasPolicyResourcesToCleanup(ctx) and if it returns (true, nil) run
rightsizingctrl.CleanupPolicyResourcesForDelegation, if it returns (false, nil)
don’t clean but still set r.wasDelegated appropriately only when no error, and
if it returns (bool, err) with err != nil propagate the error (do not set
r.wasDelegated) so transient API errors don’t permanently skip cleanup.
In
`@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go`:
- Around line 56-67: CleanupPolicyResourcesForDelegation currently uses
process-local ComponentState.Namespace (rsnamespace.ComponentState.Namespace and
rsvirtualization.ComponentState.Namespace) which can be stale; instead derive
the target namespaces from the passed-in mco CR before calling
rsnamespace.CleanupRSNamespaceResources and
rsvirtualization.CleanupRSVirtualizationResources. Modify
CleanupPolicyResourcesForDelegation to resolve the configured/observed namespace
bindings from the mco object (e.g., read the relevant fields in mco.Spec/status
that indicate the namespace bindings for the namespace and virtualization
components) and pass those resolved namespaces into
rsnamespace.CleanupRSNamespaceResources and
rsvirtualization.CleanupRSVirtualizationResources so cleanup targets the
CR-derived namespaces rather than process-local ComponentState.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1962d334-f89e-4c48-a740-12e0f5607033
📒 Files selected for processing (8)
operators/multiclusterobservability/controllers/analytics/analytics_controller.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper_test.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component_test.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-virtualization/helper.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-virtualization/helper_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component_test.go
- operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component.go
|
/retest |
29d1f98 to
65dde4a
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go (1)
87-95: Consider handling the error fromGetComponentConfig.Line 90 discards the error return from
rsutility.GetComponentConfig. While this is safe given you only pass known component types (ComponentTypeNamespace/ComponentTypeVirtualization), it's a small deviation from Go's "handle every error" idiom. If someone later passes an invalid type, the blank identifier will silently swallow the failure.That said, this is a private helper with controlled callers, so the risk is minimal. Just flagging for awareness. 🔍
🛡️ Optional: Log or return error for defensive coding
func resolveNamespaceBinding(mco *mcov1beta2.MultiClusterObservability, ct rsutility.ComponentType) string { - _, binding, _ := rsutility.GetComponentConfig(mco, ct) + _, binding, err := rsutility.GetComponentConfig(mco, ct) + if err != nil { + log.V(1).Info("rs - failed to get component config, using default namespace", "error", err) + return rsutility.DefaultNamespace + } if binding == "" { return rsutility.DefaultNamespace } return binding }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go` around lines 87 - 95, The resolveNamespaceBinding helper currently discards the error from rsutility.GetComponentConfig; update resolveNamespaceBinding to check the returned error from GetComponentConfig and handle it defensively—if error != nil, log the error with context (including the ct value) using the controller logger or fallback to returning rsutility.DefaultNamespace, otherwise proceed to use the binding; ensure you reference the existing function resolveNamespaceBinding and rsutility.GetComponentConfig so the error check and logging are added in that function.operators/multiclusterobservability/controllers/analytics/analytics_controller.go (1)
303-332: Namespace resolution inconsistency withCleanupPolicyResourcesForDelegation.
hasPolicyResourcesToCleanupusesrsnamespace.ComponentState.Namespaceandrsvirtualization.ComponentState.Namespace(lines 312-313), which are process-local and reset on controller restart. Meanwhile,CleanupPolicyResourcesForDelegationinrightsizing_controller.gocorrectly derives namespaces from the MCO CR viaresolveNamespaceBinding.After a restart with a custom namespace binding,
ComponentState.Namespacewill be the default, potentially missing policies in custom namespaces. This meanshasPolicyResourcesToCleanupcould returnfalsewhen policies actually exist, causing cleanup to be skipped.Since cleanup is idempotent and the actual cleanup function uses CR-derived namespaces, the worst case is a missed cleanup on the first post-restart reconcile — which will be caught on subsequent reconciles. Low severity, but worth aligning for consistency.
♻️ Align namespace resolution with CleanupPolicyResourcesForDelegation
-func (r *AnalyticsReconciler) hasPolicyResourcesToCleanup(ctx context.Context) (bool, error) { +func (r *AnalyticsReconciler) hasPolicyResourcesToCleanup(ctx context.Context, mco *mcov1beta2.MultiClusterObservability) (bool, error) { + resolveNS := func(binding string) string { + if binding == "" { + return config.GetDefaultNamespace() + } + return binding + } + _, nsBinding := rsnamespace.GetRightSizingNamespaceConfig(mco) + _, virtBinding := rsvirtualization.GetRightSizingVirtualizationConfig(mco) + checks := []struct { name string namespace string }{ - {rsnamespace.PrometheusRulePolicyName, rsnamespace.ComponentState.Namespace}, - {rsvirtualization.PrometheusRulePolicyName, rsvirtualization.ComponentState.Namespace}, + {rsnamespace.PrometheusRulePolicyName, resolveNS(nsBinding)}, + {rsvirtualization.PrometheusRulePolicyName, resolveNS(virtBinding)}, }And update the call site:
- hasResources, err := r.hasPolicyResourcesToCleanup(ctx) + hasResources, err := r.hasPolicyResourcesToCleanup(ctx, instance)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@operators/multiclusterobservability/controllers/analytics/analytics_controller.go` around lines 303 - 332, The hasPolicyResourcesToCleanup function currently reads namespaces from rsnamespace.ComponentState.Namespace and rsvirtualization.ComponentState.Namespace which can be stale after a controller restart; change it to derive the namespaces the same way CleanupPolicyResourcesForDelegation does by calling resolveNamespaceBinding (or the same helper used by rightsizing_controller.go) to obtain the effective namespace for each policy before building the NamespacedName; update the loop in hasPolicyResourcesToCleanup to call resolveNamespaceBinding (or reuse the resolver function) for rsnamespace and rsvritualization entries and use the resolved namespace (falling back to config.GetDefaultNamespace() only if the resolver returns empty) so namespace resolution is consistent with CleanupPolicyResourcesForDelegation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@operators/multiclusterobservability/pkg/util/managedclusteraddon_test.go`:
- Around line 265-267: The test accesses firstAddon.Status.ConfigReferences[0]
directly in three assertions (around the checks that compare
DesiredConfig.SpecHash to originalHash and similar checks at the other two
spots), which can panic if the slice is empty; add defensive bounds checks like
verifying len(firstAddon.Status.ConfigReferences) > 0 (and fail the test with
t.Fatalf including a clear message) before each access so the test reports a
readable failure instead of an index-out-of-range panic.
---
Nitpick comments:
In
`@operators/multiclusterobservability/controllers/analytics/analytics_controller.go`:
- Around line 303-332: The hasPolicyResourcesToCleanup function currently reads
namespaces from rsnamespace.ComponentState.Namespace and
rsvirtualization.ComponentState.Namespace which can be stale after a controller
restart; change it to derive the namespaces the same way
CleanupPolicyResourcesForDelegation does by calling resolveNamespaceBinding (or
the same helper used by rightsizing_controller.go) to obtain the effective
namespace for each policy before building the NamespacedName; update the loop in
hasPolicyResourcesToCleanup to call resolveNamespaceBinding (or reuse the
resolver function) for rsnamespace and rsvritualization entries and use the
resolved namespace (falling back to config.GetDefaultNamespace() only if the
resolver returns empty) so namespace resolution is consistent with
CleanupPolicyResourcesForDelegation.
In
`@operators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.go`:
- Around line 87-95: The resolveNamespaceBinding helper currently discards the
error from rsutility.GetComponentConfig; update resolveNamespaceBinding to check
the returned error from GetComponentConfig and handle it defensively—if error !=
nil, log the error with context (including the ct value) using the controller
logger or fallback to returning rsutility.DefaultNamespace, otherwise proceed to
use the binding; ensure you reference the existing function
resolveNamespaceBinding and rsutility.GetComponentConfig so the error check and
logging are added in that function.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 56ccdb19-fecb-4c1c-b869-565456cebfcb
📒 Files selected for processing (18)
operators/multiclusterobservability/controllers/analytics/analytics_controller.gooperators/multiclusterobservability/controllers/analytics/analytics_controller_test.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rightsizing_controller.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper_test.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component_test.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-virtualization/helper.gooperators/multiclusterobservability/controllers/analytics/rightsizing/rs-virtualization/helper_test.gooperators/multiclusterobservability/controllers/multiclusterobservability/multiclusterobservability_controller.gooperators/multiclusterobservability/manifests/base/multicluster-observability-addon/cluster_role.yamloperators/multiclusterobservability/pkg/rendering/renderer.gooperators/multiclusterobservability/pkg/rendering/renderer_mcoa.gooperators/multiclusterobservability/pkg/rendering/renderer_mcoa_test.gooperators/multiclusterobservability/pkg/util/managedclusteraddon.gooperators/multiclusterobservability/pkg/util/managedclusteraddon_test.gooperators/multiclusterobservability/pkg/util/rightsizing.gooperators/multiclusterobservability/pkg/util/rightsizing_test.go
✅ Files skipped from review due to trivial changes (5)
- operators/multiclusterobservability/pkg/rendering/renderer_mcoa_test.go
- operators/multiclusterobservability/pkg/rendering/renderer.go
- operators/multiclusterobservability/pkg/util/rightsizing_test.go
- operators/multiclusterobservability/pkg/util/rightsizing.go
- operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go
🚧 Files skipped from review as they are similar to previous changes (5)
- operators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper_test.go
- operators/multiclusterobservability/controllers/analytics/rightsizing/rs-virtualization/helper.go
- operators/multiclusterobservability/controllers/analytics/rightsizing/rs-namespace/helper.go
- operators/multiclusterobservability/controllers/analytics/rightsizing/rs-utility/component.go
- operators/multiclusterobservability/controllers/multiclusterobservability/multiclusterobservability_controller.go
911ef11 to
35eb137
Compare
b16d4d6 to
176a35b
Compare
176a35b to
35eb137
Compare
35eb137 to
176a35b
Compare
176a35b to
35eb137
Compare
a69d65d to
f8e60b9
Compare
fa3c220 to
b9b0dea
Compare
Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
…ection Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
…nc and renderer Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
Signed-off-by: Darshan Vandra <dvandra@redhat.com>
…ad of stale process-local state Signed-off-by: Darshan Vandra <dvandra@redhat.com>
Signed-off-by: Darshan Vandra <dvandra@redhat.com>
Signed-off-by: Darshan Vandra <dvandra@redhat.com> Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
…elegation check Signed-off-by: Darshan Vandra <dvandra@redhat.com> Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
The Grafana link was removed from the MCOA ClusterManagementAddOn assuming it caused duplicate non-clickable entries in the ACM console. The actual root cause is in stolostron/console: AcmDropdown does not pass href to MenuItem, and AcmLaunchLink has a no-op onSelect handler for the multi-link dropdown path. Restore the launch-link annotation so the MCOA CMA links to its own Grafana dashboard (UID 89eaec849a6e4837a619fb0540c22b13). Signed-off-by: Darshan Vandra <dvandra@redhat.com> Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
b9b0dea to
45314ec
Compare
Replace cleanupStarted bool with cleanupAt timestamp to block all reconciles during stabilization window. Phase 2 re-syncs disabled state to ADC before cleanup to handle main controller overwrite race. Signed-off-by: Raj Zalavadia <rzalavad@redhat.com>
| // load and render multicluster-observability-addon templates if capabilities is enabled | ||
| if MCOAEnabled(r.cr) { | ||
| // load and render multicluster-observability-addon templates | ||
| rightSizingDelegated := r.rendererOptions != nil && r.rendererOptions.MCOAOptions.RightSizingDelegated |
There was a problem hiding this comment.
We don't check if right sizing is enabled?
Signed-off-by: Darshan Vandra <dvandra@redhat.com>
Signed-off-by: Darshan Vandra <dvandra@redhat.com>
|
/retest |
2 similar comments
|
/retest |
|
/retest |
|
@dvandra: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Adds a migration path for right-sizing (RS) recommendations from MCO's
Policy-based approach to MCOA's ManifestWork-based approach. This enables
RS to be managed by MCOA when the MCO CR is annotated with
observability.open-cluster-management.io/right-sizing-capable: "v1".Default behavior is unchanged — RS continues to use MCO's Policy-based
deployment. MCOA mode is opt-in and requires an admin to manually annotate
the MCO CR.
Key changes
AnalyticsReconcilerthat watches the MCO CRand handles RS lifecycle based on delegation mode:
and syncs "disabled" to AddOnDeploymentConfig (ADC) so MCOA does not deploy RS
enabled/disabled state to ADC so MCOA deploys via ManifestWork
IsRightSizingDelegated()reads the MCO CR annotationas the authoritative signal for which component manages RS
platformNamespaceRightSizingandplatformVirtualizationRightSizingvariables in ADC reflect the correctstate, preventing dual deployment by both MCO and MCOA
when
MCOAEnabled || rightSizingDelegated, ensuring MCOA infrastructureexists even when only RS is delegated
How it works
MCO CR annotation absent → MCO mode (Policy-based RS, ADC="disabled")
MCO CR annotation present → MCOA mode (cleanup Policies, ADC="enabled")
Both modes are mutually exclusive — the annotation acts as a feature gate
that ensures only one component deploys RS resources at any time.
Related MCOA PR:
stolostron/multicluster-observability-addon#449
Summary by CodeRabbit
New Features
Chores
Tests