Conversation
/wip
📝 Walkthrough: Consolidated Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Report bugs in Issues
Welcome! 🎉 This pull request will be automatically processed with the following features:
🔄 Automatic Actions
📋 Available Commands
PR Status Management
Review & Approval
Testing & Validation
Container Operations
Cherry-pick Operations
Label Management
✅ Merge Requirements
This PR will be automatically approved when the following conditions are met:
📊 Review Process
Approvers and Reviewers
Approvers:
Reviewers:
Available Labels
AI Features
💡 Tips
For more information, please refer to the project documentation or contact the maintainers.
D/S test
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/utils.py`:
- Around line 694-704: Add a Google-style docstring to the public helper
function get_resource_by_name that documents its behavior, parameters, return
value, and exceptions: describe that it performs a cluster lookup for a resource
of type Resource using admin_client (DynamicClient), list args (resource_kind:
Resource class, name: str, admin_client: DynamicClient, namespace: str | None),
state the return type (Resource instance) and side effect (may perform API call
to the cluster), and explicitly document Raises ResourceNotFoundError when the
named resource does not exist; also mention that kwargs['client'] is set to
admin_client and that namespace is optional.
- Around line 695-701: The parameter annotation for resource_kind is incorrect:
change its type from Resource to type[Resource] so the helper can call it as a
constructor (resource = resource_kind(**kwargs)); then add a Google-style
docstring to the helper function that describes its purpose (creating/returning
a Resource via cluster API), lists parameters, return type, and documents
exceptions the function may raise (propagate client/API exceptions when cluster
interactions fail) per coding guidelines. Ensure the docstring is public, placed
immediately under the function signature, and mentions that callers should
handle or expect client/API exceptions.
In `@tests/virt/node/gpu/vgpu/conftest.py`:
- Around line 185-200: The fixture logic currently assumes any non-"A2" GPU is
MIG-capable and can misidentify devices; update nodes_with_supported_mig_gpus
and supported_mig_gpu_device to use a single explicit MIG allow-list (e.g., a
list of supported MIG-capable model keys from GPU_CARDS_MAP) and validate each
node by checking actual MIG capability (via get_nodes_gpu_info) against that
allow-list, derive VGPU_CONFIG_LABEL from the selected GPU_CARDS_MAP entry, and
fail fast with a clear error if no nodes match; also mirror this same allow-list
in tests/virt/conftest.py::nodes_with_supported_gpus and the autouse
virt_special_infra_sanity path so both discovery and session sanity use the same
explicit supported set and avoid mixed-cluster misconfiguration.
In `@tests/virt/node/gpu/vgpu/test_rhel_vm_with_mig_vgpu.py`:
- Around line 23-28: The pytest module-level marker list `pytestmark` needs the
`pytest.mark.tier3` marker added so these hardware/platform-specific MIG vGPU
cases are only collected in the correct lane; update the `pytestmark` array
(which currently contains pytest.mark.post_upgrade, pytest.mark.special_infra,
pytest.mark.gpu, pytest.mark.usefixtures("non_existent_mdev_bus_mig_nodes")) to
include pytest.mark.tier3 alongside the others.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 57422f30-11f7-4b05-8494-050a8425c80b
📒 Files selected for processing (10)
- tests/install_upgrade_operators/conftest.py
- tests/install_upgrade_operators/crypto_policy/utils.py
- tests/install_upgrade_operators/feature_gates/test_default_featuregates.py
- tests/install_upgrade_operators/utils.py
- tests/utils.py
- tests/virt/conftest.py
- tests/virt/node/gpu/constants.py
- tests/virt/node/gpu/utils.py
- tests/virt/node/gpu/vgpu/conftest.py
- tests/virt/node/gpu/vgpu/test_rhel_vm_with_mig_vgpu.py
💤 Files with no reviewable changes (1)
- tests/install_upgrade_operators/utils.py
```python
def get_resource_by_name(
    resource_kind: Resource, name: str, admin_client: DynamicClient, namespace: str | None = None
) -> Resource:
    kwargs = {"name": name}
    if namespace:
        kwargs["namespace"] = namespace
    kwargs["client"] = admin_client
    resource = resource_kind(**kwargs)
    if resource.exists:
        return resource
    raise ResourceNotFoundError(f"{resource_kind} {name} not found.")
```
MEDIUM: Add a Google-style docstring for this new shared helper.
This is a public helper with non-obvious side effects (cluster lookup + ResourceNotFoundError path); document args/returns/raises explicitly.
Proposed fix
```diff
 def get_resource_by_name(
     resource_kind: type[Resource], name: str, admin_client: DynamicClient, namespace: str | None = None
 ) -> Resource:
+    """Return an existing cluster resource by kind and name.
+
+    Args:
+        resource_kind (type[Resource]): Resource class to instantiate.
+        name (str): Resource name.
+        admin_client (DynamicClient): Cluster client.
+        namespace (str | None): Namespace for namespaced resources.
+
+    Returns:
+        Resource: The existing resource object.
+
+    Raises:
+        ResourceNotFoundError: If the resource does not exist.
+    """
     kwargs = {"name": name}
```

As per coding guidelines "Google-format docstrings REQUIRED for all public functions with non-obvious return values OR side effects".
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 704-704: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/utils.py` around lines 694 - 704, Add a Google-style docstring to the
public helper function get_resource_by_name that documents its behavior,
parameters, return value, and exceptions: describe that it performs a cluster
lookup for a resource of type Resource using admin_client (DynamicClient), list
args (resource_kind: Resource class, name: str, admin_client: DynamicClient,
namespace: str | None), state the return type (Resource instance) and side
effect (may perform API call to the cluster), and explicitly document Raises
ResourceNotFoundError when the named resource does not exist; also mention that
kwargs['client'] is set to admin_client and that namespace is optional.
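The `type[Resource]` distinction the prompt describes can be sketched with stand-in classes; `Resource`, `ResourceNotFoundError`, and the always-true `exists` property below are illustrative stubs, not the repository's real implementations:

```python
# Minimal sketch of the constructor-typing fix discussed above, using
# hypothetical stand-ins for the repo's classes.
from typing import Optional


class ResourceNotFoundError(Exception):
    """Stand-in for the repo's ResourceNotFoundError."""


class Resource:
    """Stand-in base class; `exists` is a stub for the real cluster lookup."""

    def __init__(self, name: str, client: object, namespace: Optional[str] = None):
        self.name = name
        self.client = client
        self.namespace = namespace

    @property
    def exists(self) -> bool:
        return True  # a real implementation would query the cluster


def get_resource_by_name(
    resource_kind: type[Resource], name: str, admin_client: object, namespace: Optional[str] = None
) -> Resource:
    # `type[Resource]` tells the checker resource_kind is a class, so calling
    # it as a constructor below is valid under strict type checking.
    kwargs = {"name": name, "client": admin_client}
    if namespace:
        kwargs["namespace"] = namespace
    resource = resource_kind(**kwargs)
    if resource.exists:
        return resource
    raise ResourceNotFoundError(f"{resource_kind.__name__} {name} not found.")


print(get_resource_by_name(Resource, "my-cm", admin_client=None, namespace="demo").name)  # my-cm
```

Annotating the parameter as `Resource` instead would make `resource_kind(**kwargs)` look like calling an instance, which is exactly the call-site error the review says strict checking would otherwise catch.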
```python
    resource_kind: Resource, name: str, admin_client: DynamicClient, namespace: str | None = None
) -> Resource:
    kwargs = {"name": name}
    if namespace:
        kwargs["namespace"] = namespace
    kwargs["client"] = admin_client
    resource = resource_kind(**kwargs)
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Verify the helper signature and constructor-style usage across changed files.
rg -n "def get_resource_by_name|resource_kind\s*:" tests/utils.py tests/install_upgrade_operators/crypto_policy/utils.py
rg -n "get_resource_by_name\(" tests/utils.py tests/install_upgrade_operators
```

Repository: RedHatQE/openshift-virtualization-tests
Length of output: 675

🏁 Script executed:

```bash
#!/bin/bash
# Read the full function implementation including docstring
sed -n '694,710p' tests/utils.py
```

Repository: RedHatQE/openshift-virtualization-tests
Length of output: 500

🏁 Script executed:

```bash
#!/bin/bash
# Check one actual call site to see what's being passed as resource_kind
sed -n '55,60p' tests/install_upgrade_operators/crypto_policy/utils.py
```

Repository: RedHatQE/openshift-virtualization-tests
Length of output: 294

🏁 Script executed:

```bash
#!/bin/bash
# Check what type Resource is and if it's callable
rg -A5 "^class Resource" tests/utils.py | head -15
```

Repository: RedHatQE/openshift-virtualization-tests
Length of output: 65
Fix type annotation and add required docstring for new helper.
The resource_kind parameter is called as a constructor at line 701 (resource_kind(**kwargs)), so it must be typed as type[Resource], not Resource. Current annotation breaks strict type checking and masks call-site errors.
Additionally, this public helper performs cluster API interactions and raises exceptions—it requires a Google-format docstring explaining the behavior and exception handling per coding guidelines.
Proposed fixes
```diff
 def get_resource_by_name(
-    resource_kind: Resource, name: str, admin_client: DynamicClient, namespace: str | None = None
+    resource_kind: type[Resource], name: str, admin_client: DynamicClient, namespace: str | None = None
 ) -> Resource:
+    """Get a resource by name from the cluster.
+
+    Args:
+        resource_kind: The resource class to instantiate.
+        name: Resource name.
+        admin_client: Authenticated cluster client.
+        namespace: Optional namespace; if omitted, cluster-scoped resource is used.
+
+    Returns:
+        Instantiated and verified resource.
+
+    Raises:
+        ResourceNotFoundError: If resource does not exist on cluster.
+    """
     kwargs = {"name": name}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/utils.py` around lines 695 - 701, The parameter annotation for
resource_kind is incorrect: change its type from Resource to type[Resource] so
the helper can call it as a constructor (resource = resource_kind(**kwargs));
then add a Google-style docstring to the helper function that describes its
purpose (creating/returning a Resource via cluster API), lists parameters,
return type, and documents exceptions the function may raise (propagate
client/API exceptions when cluster interactions fail) per coding guidelines.
Ensure the docstring is public, placed immediately under the function signature,
and mentions that callers should handle or expect client/API exceptions.
```python
def nodes_with_supported_mig_gpus(gpu_nodes, workers_utility_pods):
    gpu_nodes_copy = gpu_nodes.copy()
    for node in gpu_nodes:
        if "A2" in get_nodes_gpu_info(util_pods=workers_utility_pods, node=node):
            gpu_nodes_copy.remove(node)
    return gpu_nodes_copy


@pytest.fixture(scope="class")
def supported_mig_gpu_device(workers_utility_pods, nodes_with_supported_mig_gpus):
    gpu_info = get_nodes_gpu_info(util_pods=workers_utility_pods, node=nodes_with_supported_mig_gpus[0])
    for gpu_id in GPU_CARDS_MAP:
        if gpu_id in gpu_info:
            return GPU_CARDS_MAP[gpu_id]

    raise UnsupportedGPUDeviceError("GPU device ID not in current GPU_CARDS_MAP!")
```
CRITICAL: Positively identify MIG-capable GPUs before labeling nodes.
Lines 185-200 currently treat every non-A2 card as MIG-capable, so a T4-first cluster can resolve supported_mig_gpu_device to the T4 entry from GPU_CARDS_MAP. Lines 215-220 then hardcode A30-1-6C anyway, so this fixture tree can misconfigure non-A30 clusters instead of failing fast; A2-only clusters also fall through to nodes_with_supported_mig_gpus[0]. Separately, the autouse tests/virt/conftest.py::virt_special_infra_sanity path still builds support from nodes_with_supported_gpus, which currently drops A30 nodes, so an A30-only MIG environment can be rejected before these fixtures run. Please drive both paths from the same explicit MIG allow-list, and derive VGPU_CONFIG_LABEL from the selected mapping.
Possible fix

```diff
+SUPPORTED_MIG_GPU_IDS = ("10de:20b7",)
+
 @pytest.fixture(scope="class")
 def nodes_with_supported_mig_gpus(gpu_nodes, workers_utility_pods):
-    gpu_nodes_copy = gpu_nodes.copy()
-    for node in gpu_nodes:
-        if "A2" in get_nodes_gpu_info(util_pods=workers_utility_pods, node=node):
-            gpu_nodes_copy.remove(node)
-    return gpu_nodes_copy
+    mig_nodes = []
+    for node in gpu_nodes:
+        gpu_info = get_nodes_gpu_info(util_pods=workers_utility_pods, node=node)
+        if any(gpu_id in gpu_info for gpu_id in SUPPORTED_MIG_GPU_IDS):
+            mig_nodes.append(node)
+    return mig_nodes

 @pytest.fixture(scope="class")
 def supported_mig_gpu_device(workers_utility_pods, nodes_with_supported_mig_gpus):
+    if not nodes_with_supported_mig_gpus:
+        raise UnsupportedGPUDeviceError("No MIG-capable GPU nodes found")
     gpu_info = get_nodes_gpu_info(util_pods=workers_utility_pods, node=nodes_with_supported_mig_gpus[0])
-    for gpu_id in GPU_CARDS_MAP:
+    for gpu_id in SUPPORTED_MIG_GPU_IDS:
         if gpu_id in gpu_info:
             return GPU_CARDS_MAP[gpu_id]
@@
 @pytest.fixture(scope="class")
 def mig_gpu_nodes_labeled_with_vgpu_config(
     nodes_with_supported_mig_gpus,
+    supported_mig_gpu_device,
 ):
     label_gen = label_nodes(
         nodes=nodes_with_supported_mig_gpus,
-        labels={"nvidia.com/vgpu.config": "A30-1-6C"},
+        labels={VGPU_CONFIG_LABEL: supported_mig_gpu_device[MDEV_NAME_STR].split()[-1]},
     )
```

Mirror the same allow-list in tests/virt/conftest.py::nodes_with_supported_gpus / virt_special_infra_sanity, or the session sanity will still reject the target A30 environment. Based on learnings, mixed GPU clusters are explicitly not supported in this suite; all GPU nodes are assumed to share the same model.
Also applies to: 215-220
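The allow-list approach in the fix above reduces to a simple membership check over per-node GPU info strings. The PCI IDs and lspci-style strings below are illustrative assumptions, not values verified against the repo's GPU_CARDS_MAP:

```python
# Sketch: keep only nodes whose GPU info matches an explicit MIG allow-list,
# instead of excluding one known-bad model. All data here is invented.
SUPPORTED_MIG_GPU_IDS = ("10de:20b7",)  # assumed A30 PCI ID from the review diff

nodes_gpu_info = {
    "worker-0": "3D controller: NVIDIA Corporation GA100GL [A30 PCIe] [10de:20b7]",
    "worker-1": "3D controller: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8]",
}

mig_nodes = [
    node
    for node, gpu_info in nodes_gpu_info.items()
    if any(gpu_id in gpu_info for gpu_id in SUPPORTED_MIG_GPU_IDS)
]
print(mig_nodes)  # ['worker-0']
```

An exclusion filter ("drop A2") would have kept worker-1 here even though a T4 is not in the allow-list, which is exactly the misidentification the review flags.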
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 200-200: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/virt/node/gpu/vgpu/conftest.py` around lines 185 - 200, The fixture
logic currently assumes any non-"A2" GPU is MIG-capable and can misidentify
devices; update nodes_with_supported_mig_gpus and supported_mig_gpu_device to
use a single explicit MIG allow-list (e.g., a list of supported MIG-capable
model keys from GPU_CARDS_MAP) and validate each node by checking actual MIG
capability (via get_nodes_gpu_info) against that allow-list, derive
VGPU_CONFIG_LABEL from the selected GPU_CARDS_MAP entry, and fail fast with a
clear error if no nodes match; also mirror this same allow-list in
tests/virt/conftest.py::nodes_with_supported_gpus and the autouse
virt_special_infra_sanity path so both discovery and session sanity use the same
explicit supported set and avoid mixed-cluster misconfiguration.
```python
pytestmark = [
    pytest.mark.post_upgrade,
    pytest.mark.special_infra,
    pytest.mark.gpu,
    pytest.mark.usefixtures("non_existent_mdev_bus_mig_nodes"),
]
```
HIGH: Add the tier3 marker to this MIG vGPU module.
These cases are hardware- and platform-specific, and they rely on specialized MIG/vGPU infrastructure. Without pytest.mark.tier3, they can be collected in lanes that do not guarantee the required environment.
Suggested change
```diff
 pytestmark = [
     pytest.mark.post_upgrade,
     pytest.mark.special_infra,
     pytest.mark.gpu,
+    pytest.mark.tier3,
     pytest.mark.usefixtures("non_existent_mdev_bus_mig_nodes"),
 ]
```

As per coding guidelines "All new tests MUST have markers - check pytest.ini for available markers, NEVER commit unmarked tests; tier3 marker when warranted for complex/hardware/platform-specific/time-consuming tests".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/virt/node/gpu/vgpu/test_rhel_vm_with_mig_vgpu.py` around lines 23 - 28,
The pytest module-level marker list `pytestmark` needs the `pytest.mark.tier3`
marker added so these hardware/platform-specific MIG vGPU cases are only
collected in the correct lane; update the `pytestmark` array (which currently
contains pytest.mark.post_upgrade, pytest.mark.special_infra, pytest.mark.gpu,
pytest.mark.usefixtures("non_existent_mdev_bus_mig_nodes")) to include
pytest.mark.tier3 alongside the others.
D/S test

/wip cancel

Clean rebase detected — no code changes compared to previous head (

D/S test
Actionable comments posted: 2
♻️ Duplicate comments (4)
tests/utils.py (1)
694-704: ⚠️ Potential issue | 🟠 Major
HIGH: Fix constructor typing and add docstring for this shared lookup helper.

Line 695 should use `type[Resource]` (not `Resource`) because Line 701 instantiates it as a class. Current typing weakens static checks and can hide bad call sites. Also, this public helper performs cluster lookup and raises `ResourceNotFoundError`, so it should include a Google-style docstring.

Proposed fix

```diff
 def get_resource_by_name(
-    resource_kind: Resource, name: str, admin_client: DynamicClient, namespace: str | None = None
+    resource_kind: type[Resource], name: str, admin_client: DynamicClient, namespace: str | None = None
 ) -> Resource:
+    """Get an existing resource by kind and name.
+
+    Args:
+        resource_kind: Resource class to instantiate.
+        name: Resource name.
+        admin_client: Cluster dynamic client.
+        namespace: Optional namespace for namespaced resources.
+
+    Returns:
+        The existing resource object.
+
+    Raises:
+        ResourceNotFoundError: If the resource does not exist.
+    """
     kwargs = {"name": name}
```

```bash
#!/bin/bash
# Verify helper signature and constructor usage at definition + call sites.
sed -n '694,706p' tests/utils.py
rg -n -C2 'get_resource_by_name\(' tests/utils.py tests/install_upgrade_operators
```

Expected result: definition shows `resource_kind: type[Resource]`, and call sites pass resource classes (e.g., `HyperConverged`, `KubeVirt`) to `resource_kind`.

As per coding guidelines "Google-format docstrings REQUIRED for all public functions with non-obvious return values OR side effects".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/utils.py` around lines 694 - 704, Update the signature of the helper get_resource_by_name to accept a constructor type by changing the parameter typing from Resource to type[Resource] (so callers pass classes like HyperConverged or KubeVirt and the function can instantiate them), and add a Google-style docstring that documents the function purpose (cluster lookup by name, optional namespace), parameters (resource_kind, name, admin_client, namespace), return type (Resource), and raised exceptions (ResourceNotFoundError when the resource does not exist); reference the function name get_resource_by_name, the Resource type, admin_client/DynamicClient, and ResourceNotFoundError when editing.

tests/virt/node/gpu/vgpu/conftest.py (1)
171-187: ⚠️ Potential issue | 🔴 Critical
CRITICAL: MIG selection still accepts non-MIG cards and then hardcodes A30 anyway.

`nodes_with_supported_mig_gpus()` only removes "A2", so a T4-first cluster still passes this filter. `supported_mig_gpu_device()` then walks the full `GPU_CARDS_MAP`, which can return the T4 entry, while `mig_gpu_nodes_labeled_with_vgpu_config()` unconditionally stamps `A30-1-6C`. That misconfigures the cluster instead of failing fast. Please use one explicit MIG allow-list, guard the empty-node case, and derive `VGPU_CONFIG_LABEL` from the selected mapping.

Possible fix

```diff
+SUPPORTED_MIG_GPU_IDS = ("10de:20b7",)
+
 @pytest.fixture(scope="class")
 def nodes_with_supported_mig_gpus(gpu_nodes, workers_utility_pods):
-    gpu_nodes_copy = gpu_nodes.copy()
-    for node in gpu_nodes:
-        if "A2" in get_nodes_gpu_info(util_pods=workers_utility_pods, node=node):
-            gpu_nodes_copy.remove(node)
-    return gpu_nodes_copy
+    mig_nodes = []
+    for node in gpu_nodes:
+        gpu_info = get_nodes_gpu_info(util_pods=workers_utility_pods, node=node)
+        if any(gpu_id in gpu_info for gpu_id in SUPPORTED_MIG_GPU_IDS):
+            mig_nodes.append(node)
+    return mig_nodes

 @pytest.fixture(scope="class")
 def supported_mig_gpu_device(workers_utility_pods, nodes_with_supported_mig_gpus):
+    if not nodes_with_supported_mig_gpus:
+        raise UnsupportedGPUDeviceError("No MIG-capable GPU nodes found")
     gpu_info = get_nodes_gpu_info(util_pods=workers_utility_pods, node=nodes_with_supported_mig_gpus[0])
-    for gpu_id in GPU_CARDS_MAP:
+    for gpu_id in SUPPORTED_MIG_GPU_IDS:
         if gpu_id in gpu_info:
             return GPU_CARDS_MAP[gpu_id]
     raise UnsupportedGPUDeviceError("GPU device ID not in current GPU_CARDS_MAP!")
@@
 def mig_gpu_nodes_labeled_with_vgpu_config(
     nodes_with_supported_mig_gpus,
     mig_gpu_nodes_labeled_with_vm_vgpu,
     nvidia_sandbox_validator_ds,
     gpu_nodes,
+    supported_mig_gpu_device,
 ):
     label_gen = label_nodes(
         nodes=nodes_with_supported_mig_gpus,
-        labels={"nvidia.com/vgpu.config": "A30-1-6C"},
+        labels={VGPU_CONFIG_LABEL: supported_mig_gpu_device[MDEV_NAME_STR].split()[-1]},
     )
```

Based on learnings: "heterogeneous GPU models are explicitly not supported in this test suite; all GPU nodes are assumed to have the same device model."

Also applies to: 202-210

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/virt/node/gpu/vgpu/conftest.py` around lines 171 - 187, nodes_with_supported_mig_gpus and supported_mig_gpu_device currently allow non-MIG cards and hardcode an A30 vgpu label; update them to use an explicit MIG allow-list (e.g., ALLOWED_MIG_MODELS) and filter gpu_nodes by membership in that allow-list inside nodes_with_supported_mig_gpus, then in supported_mig_gpu_device only search GPU_CARDS_MAP entries for models in the same allow-list and raise a clear error if nodes_with_supported_mig_gpus is empty; finally remove the hardcoded "A30-1-6C" and instead derive VGPU_CONFIG_LABEL (used by mig_gpu_nodes_labeled_with_vgpu_config) from the selected GPU_CARDS_MAP mapping so the label matches the chosen device.

tests/virt/node/gpu/vgpu/test_rhel_vm_with_mig_vgpu.py (1)
23-28: ⚠️ Potential issue | 🟠 Major
HIGH: add the `tier3` marker to this MIG vGPU module.

These cases are hardware-specific and depend on specialized MIG/vGPU infrastructure. Without `pytest.mark.tier3`, they can be collected in lanes that do not guarantee the required environment.

Possible fix

```diff
 pytestmark = [
     pytest.mark.post_upgrade,
     pytest.mark.special_infra,
     pytest.mark.gpu,
+    pytest.mark.tier3,
     pytest.mark.usefixtures("non_existent_mdev_bus_mig_nodes"),
 ]
```

As per coding guidelines: "All new tests MUST have markers - check pytest.ini for available markers, NEVER commit unmarked tests; tier3 marker when warranted for complex/hardware/platform-specific/time-consuming tests".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/virt/node/gpu/vgpu/test_rhel_vm_with_mig_vgpu.py` around lines 23 - 28, The module-level pytestmark list is missing the required tier3 marker; update the pytestmark variable in this file (the pytestmark list near the top of test_rhel_vm_with_mig_vgpu.py) to include pytest.mark.tier3 alongside the existing markers (post_upgrade, special_infra, gpu, usefixtures("non_existent_mdev_bus_mig_nodes")) so the entire MIG vGPU module is marked tier3 for collection; ensure you import or reference pytest.mark.tier3 consistently with the existing marker style.

tests/virt/conftest.py (1)
184-191: ⚠️ Potential issue | 🔴 Critical
CRITICAL: shared GPU discovery still filters out the only MIG-capable card.

This fixture removes every node whose GPU info contains `A30`, but the new MIG flow depends on A30-backed nodes. Because `virt_special_infra_sanity()` uses `nodes_with_supported_gpus` for `pytest.mark.gpu`, an A30-only MIG cluster will fail session sanity before the MIG fixtures in `tests/virt/node/gpu/vgpu/conftest.py` ever run. Please stop hard-excluding A30 here, or split MIG sanity onto a separate supported-node path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/virt/conftest.py` around lines 184 - 191, The nodes_with_supported_gpus fixture currently removes any node whose get_nodes_gpu_info contains "A30", which wrongly excludes A30-only MIG-capable hosts and causes virt_special_infra_sanity (pytest.mark.gpu) to fail before MIG fixtures run; update nodes_with_supported_gpus so it no longer hard-excludes "A30"—either remove the "A30" check entirely or replace it with a capability-aware check that calls the MIG-specific detector (e.g., is_mig_capable or reuse logic from tests/virt/node/gpu/vgpu/conftest.py) and only exclude nodes that truly lack required drivers/features for the non-MIG flow, or alternatively split the sanity path so virt_special_infra_sanity uses a different fixture that filters only non-MIG-incompatible nodes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/virt/node/gpu/vgpu/conftest.py`:
- Around line 242-251: The fixture non_existent_mdev_bus_mig_nodes currently
calls assert_mdev_bus_exists_on_nodes before the vGPU config label is
guaranteed; update the fixture to depend on the vGPU-config readiness fixture
(e.g., add mig_gpu_nodes_labeled_with_vgpu_config or the shared
vgpu_ready_nodes) so that the nvidia.com/vgpu.config label is applied before
calling assert_mdev_bus_exists_on_nodes; alternatively create a single MIG-ready
fixture that combines mig_gpu_nodes_labeled_with_vm_vgpu and
mig_gpu_nodes_labeled_with_vgpu_config and call assert_mdev_bus_exists_on_nodes
with that ready fixture to ensure ordering.
- Around line 113-123: The fixture update_daemon_set_to_enable_mig_vgpu
currently yields inside a loop over get_daemonsets(), which can produce multiple
yields and breaks pytest; instead collect all patch entries first (e.g. build a
patches dict mapping each matching ds to its patch_data by iterating through
all_daemonsets and creating container_patch/patch_data for
ds.name.startswith("nvidia-vgpu-manager-daemonset")), then open a single
ResourceEditor(patches=patches) context and yield exactly once; reference the
functions/classes get_daemonsets, ResourceEditor and the variable names
container_patch and patch_data to locate and apply the change.
---
Duplicate comments:
In `@tests/utils.py`:
- Around line 694-704: Update the signature of the helper get_resource_by_name
to accept a constructor type by changing the parameter typing from Resource to
type[Resource] (so callers pass classes like HyperConverged or KubeVirt and the
function can instantiate them), and add a Google-style docstring that documents
the function purpose (cluster lookup by name, optional namespace), parameters
(resource_kind, name, admin_client, namespace), return type (Resource), and
raised exceptions (ResourceNotFoundError when the resource does not exist);
reference the function name get_resource_by_name, the Resource type,
admin_client/DynamicClient, and ResourceNotFoundError when editing.
In `@tests/virt/conftest.py`:
- Around line 184-191: The nodes_with_supported_gpus fixture currently removes
any node whose get_nodes_gpu_info contains "A30", which wrongly excludes
A30-only MIG-capable hosts and causes virt_special_infra_sanity
(pytest.mark.gpu) to fail before MIG fixtures run; update
nodes_with_supported_gpus so it no longer hard-excludes "A30"—either remove the
"A30" check entirely or replace it with a capability-aware check that calls the
MIG-specific detector (e.g., is_mig_capable or reuse logic from
tests/virt/node/gpu/vgpu/conftest.py) and only exclude nodes that truly lack
required drivers/features for the non-MIG flow, or alternatively split the
sanity path so virt_special_infra_sanity uses a different fixture that filters
only non-MIG-incompatible nodes.
In `@tests/virt/node/gpu/vgpu/conftest.py`:
- Around line 171-187: nodes_with_supported_mig_gpus and
supported_mig_gpu_device currently allow non-MIG cards and hardcode an A30 vgpu
label; update them to use an explicit MIG allow-list (e.g., ALLOWED_MIG_MODELS)
and filter gpu_nodes by membership in that allow-list inside
nodes_with_supported_mig_gpus, then in supported_mig_gpu_device only search
GPU_CARDS_MAP entries for models in the same allow-list and raise a clear error
if nodes_with_supported_mig_gpus is empty; finally remove the hardcoded
"A30-1-6C" and instead derive VGPU_CONFIG_LABEL (used by
mig_gpu_nodes_labeled_with_vgpu_config) from the selected GPU_CARDS_MAP mapping
so the label matches the chosen device.
In `@tests/virt/node/gpu/vgpu/test_rhel_vm_with_mig_vgpu.py`:
- Around line 23-28: The module-level pytestmark list is missing the required
tier3 marker; update the pytestmark variable in this file (the pytestmark list
near the top of test_rhel_vm_with_mig_vgpu.py) to include pytest.mark.tier3
alongside the existing markers (post_upgrade, special_infra, gpu,
usefixtures("non_existent_mdev_bus_mig_nodes")) so the entire MIG vGPU module is
marked tier3 for collection; ensure you import or reference pytest.mark.tier3
consistently with the existing marker style.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9d464110-083a-4162-bdc9-6c389f3a1d5b
📒 Files selected for processing (10)
- tests/install_upgrade_operators/conftest.py
- tests/install_upgrade_operators/crypto_policy/utils.py
- tests/install_upgrade_operators/feature_gates/test_default_featuregates.py
- tests/install_upgrade_operators/utils.py
- tests/utils.py
- tests/virt/conftest.py
- tests/virt/node/gpu/constants.py
- tests/virt/node/gpu/utils.py
- tests/virt/node/gpu/vgpu/conftest.py
- tests/virt/node/gpu/vgpu/test_rhel_vm_with_mig_vgpu.py
💤 Files with no reviewable changes (1)
- tests/install_upgrade_operators/utils.py
```python
def update_daemon_set_to_enable_mig_vgpu(update_cluster_policy_to_enable_mig_vgpu, admin_client):
    all_daemonsets = get_daemonsets(admin_client=admin_client, namespace="nvidia-gpu-operator")
    for ds in all_daemonsets:
        if ds.name.startswith("nvidia-vgpu-manager-daemonset"):
            container = ds.instance.spec.template.spec.containers[0]

            container_patch = dict(container.items())
            container_patch["imagePullPolicy"] = "Always"
            patch_data = {"spec": {"template": {"spec": {"containers": [container_patch]}}}}
            with ResourceEditor(patches={ds: patch_data}):
                yield
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
python - <<'PY'
import ast
from pathlib import Path

path = Path("tests/virt/node/gpu/vgpu/conftest.py")
tree = ast.parse(path.read_text())
parents = {}
for parent in ast.walk(tree):
    for child in ast.iter_child_nodes(parent):
        parents[child] = parent
for node in tree.body:
    if isinstance(node, ast.FunctionDef) and node.name == "update_daemon_set_to_enable_mig_vgpu":
        for yield_node in [child for child in ast.walk(node) if isinstance(child, ast.Yield)]:
            current = parents[yield_node]
            yield_inside_loop = False
            while current is not node:
                if isinstance(current, (ast.For, ast.AsyncFor, ast.While)):
                    yield_inside_loop = True
                    break
                current = parents[current]
            print({"function": node.name, "yield_inside_loop": yield_inside_loop})
PY
```

Repository: RedHatQE/openshift-virtualization-tests
Length of output: 162
HIGH: This fixture can yield multiple times, breaking pytest.
`yield` is nested inside the daemonset loop. When `get_daemonsets()` returns more than one `nvidia-vgpu-manager-daemonset*`, pytest will resume the generator at a second `yield` during teardown, causing: `fixture function has more than one 'yield'`. Fixtures must have exactly one `yield` at the top level. Accumulate all patches in a dict before entering the `ResourceEditor` context, then yield once.
Suggested fix

```diff
 @pytest.fixture(scope="class")
 def update_daemon_set_to_enable_mig_vgpu(update_cluster_policy_to_enable_mig_vgpu, admin_client):
-    all_daemonsets = get_daemonsets(admin_client=admin_client, namespace="nvidia-gpu-operator")
-    for ds in all_daemonsets:
-        if ds.name.startswith("nvidia-vgpu-manager-daemonset"):
-            container = ds.instance.spec.template.spec.containers[0]
-
-            container_patch = dict(container.items())
-            container_patch["imagePullPolicy"] = "Always"
-            patch_data = {"spec": {"template": {"spec": {"containers": [container_patch]}}}}
-            with ResourceEditor(patches={ds: patch_data}):
-                yield
+    patches = {}
+    for daemonset in get_daemonsets(admin_client=admin_client, namespace="nvidia-gpu-operator"):
+        if daemonset.name.startswith("nvidia-vgpu-manager-daemonset"):
+            container_patch = dict(daemonset.instance.spec.template.spec.containers[0].items())
+            container_patch["imagePullPolicy"] = "Always"
+            patches[daemonset] = {"spec": {"template": {"spec": {"containers": [container_patch]}}}}
+
+    with ResourceEditor(patches=patches):
+        yield
```

🧰 Tools
🪛 Ruff (0.15.10)
[warning] 113-113: Unused function argument: update_cluster_policy_to_enable_mig_vgpu
(ARG001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/virt/node/gpu/vgpu/conftest.py` around lines 113 - 123, The fixture
update_daemon_set_to_enable_mig_vgpu currently yields inside a loop over
get_daemonsets(), which can produce multiple yields and breaks pytest; instead
collect all patch entries first (e.g. build a patches dict mapping each matching
ds to its patch_data by iterating through all_daemonsets and creating
container_patch/patch_data for
ds.name.startswith("nvidia-vgpu-manager-daemonset")), then open a single
ResourceEditor(patches=patches) context and yield exactly once; reference the
functions/classes get_daemonsets, ResourceEditor and the variable names
container_patch and patch_data to locate and apply the change.
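The failure mode described above can be reproduced with a plain generator, no pytest required: a `yield` inside the loop creates one suspension point per matching daemonset, so the teardown `next()` resumes into a second `yield` instead of finishing the generator. (Names below are stand-ins, not the real fixtures.)

```python
def fixture_like(daemonset_names):
    # Mirrors the buggy fixture: yield sits inside the loop
    for name in daemonset_names:
        if name.startswith("nvidia-vgpu-manager-daemonset"):
            yield name

gen = fixture_like(["nvidia-vgpu-manager-daemonset-a", "nvidia-vgpu-manager-daemonset-b"])
first = next(gen)   # fixture setup: suspends at the first yield
try:
    next(gen)       # fixture teardown: resumes and hits a SECOND yield
    resumed_cleanly = False
except StopIteration:
    resumed_cleanly = True

print("teardown finished the generator:", resumed_cleanly)  # False
```

With the single top-level `yield` of the suggested fix, the second `next()` raises `StopIteration` and pytest tears the fixture down normally.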
```python
@pytest.fixture(scope="class")
def non_existent_mdev_bus_mig_nodes(
    workers_utility_pods,
    mig_gpu_nodes_labeled_with_vm_vgpu,
    nodes_with_supported_mig_gpus,
):
    """
    Check if the mdev_bus needed for vGPU is available.
    """
    assert_mdev_bus_exists_on_nodes(workers_utility_pods=workers_utility_pods, nodes=nodes_with_supported_mig_gpus)
```
HIGH: mdev_bus is asserted before the MIG profile label is guaranteed to exist.
This fixture depends on `mig_gpu_nodes_labeled_with_vm_vgpu`, but not on `mig_gpu_nodes_labeled_with_vgpu_config`. That means `assert_mdev_bus_exists_on_nodes()` can run before `nvidia.com/vgpu.config` is applied, unlike the shared path in `tests/virt/conftest.py`, which waits on `vgpu_ready_nodes` first. Please depend on the vGPU-config fixture here, or introduce a single MIG "ready" fixture and assert against that.
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 245-245: Unused function argument: mig_gpu_nodes_labeled_with_vm_vgpu
(ARG001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/virt/node/gpu/vgpu/conftest.py` around lines 242 - 251, The fixture
non_existent_mdev_bus_mig_nodes currently calls assert_mdev_bus_exists_on_nodes
before the vGPU config label is guaranteed; update the fixture to depend on the
vGPU-config readiness fixture (e.g., add mig_gpu_nodes_labeled_with_vgpu_config
or the shared vgpu_ready_nodes) so that the nvidia.com/vgpu.config label is
applied before calling assert_mdev_bus_exists_on_nodes; alternatively create a
single MIG-ready fixture that combines mig_gpu_nodes_labeled_with_vm_vgpu and
mig_gpu_nodes_labeled_with_vgpu_config and call assert_mdev_bus_exists_on_nodes
with that ready fixture to ensure ordering.
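The fix amounts to declaring the label fixture as a dependency. The toy resolver below illustrates the guarantee being relied on (pytest sets up a fixture's declared dependencies before running its body); the fixture names are taken from the comment, and the resolver is a simplified stand-in for pytest's, not its real implementation:

```python
SETUP_ORDER = []

# name -> (declared dependencies, fixture body); stand-ins for the real fixtures
FIXTURES = {
    "mig_gpu_nodes_labeled_with_vgpu_config": (
        [],
        lambda: SETUP_ORDER.append("vgpu.config label applied"),
    ),
    "non_existent_mdev_bus_mig_nodes": (
        ["mig_gpu_nodes_labeled_with_vgpu_config"],
        lambda: SETUP_ORDER.append("mdev_bus asserted"),
    ),
}

def setup_fixture(name, done=None):
    done = set() if done is None else done
    deps, body = FIXTURES[name]
    for dep in deps:  # dependencies are set up first, as pytest guarantees
        if dep not in done:
            setup_fixture(dep, done)
    body()
    done.add(name)

setup_fixture("non_existent_mdev_bus_mig_nodes")
print(SETUP_ORDER)  # label applied before the mdev_bus assertion
```

Once `mig_gpu_nodes_labeled_with_vgpu_config` (or a combined MIG-ready fixture) appears in the signature, the mdev_bus check can no longer race the label.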
Clean rebase detected — no code changes compared to previous head (
jerry7z can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"} |
SiboWang1997 can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"} |
D/S test |
Short description:
Add tests for VM with MIG VGPU
More details:
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for reviewer:
RedHatQE/openshift-python-wrapper#2692 wrapper PR for cluster policy resource
jira-ticket:
Summary by CodeRabbit
New Features
Tests
Chores