
fix: multi-GPU support for Newton physics with camera rendering#5236

Open
ClawLabby wants to merge 1 commit into isaac-sim:develop from ClawLabby:pr/multigpu-newton-camera-fix

Conversation

@ClawLabby

Description

Fixes three interrelated bugs that prevented multi-GPU distributed training with Newton physics and camera observations:

  1. Device assignment: When torchrun sets CUDA_VISIBLE_DEVICES per process (each rank sees 1 GPU), cuda:{local_rank} points to a non-existent device. Fix: detect visible GPU count and fall back to cuda:0 when each process is restricted to a single GPU.

  2. Camera renderer preset: MultiBackendRendererCfg was missing a newton field, so the preset system could not auto-select NewtonWarpRendererCfg when using presets=newton. This caused cameras to default to Kit/RTX rendering, which conflicts with Newton physics in headless multi-GPU mode. Also adds physx alias pointing to the default RTX renderer.

  3. Fabric mode detection: _is_kit_camera() in sim_launcher.py did not recognize PresetCfg renderer configs, incorrectly flagging Newton cameras as Kit cameras and forcing --enable_cameras to enable fabric — which causes race conditions on non-cuda:0 GPUs with Newton. Fix: treat PresetCfg renderers as non-Kit (they resolve to match the physics backend).

Additionally, torch.cuda.set_device() is now called early in both AppLauncher and the training script so that physics backends (Newton/Warp) that allocate on the "current" CUDA device during initialization get the correct GPU.
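The device fallback described in point 1 can be sketched as follows. This is a minimal sketch, not the actual Isaac Lab code: `select_device` is an illustrative name, and in practice `num_visible` would come from `torch.cuda.device_count()` with the ranks read from the `LOCAL_RANK`/`WORLD_SIZE` environment variables set by torchrun.

```python
def select_device(local_rank: int, num_visible: int, world_size: int) -> str:
    """Pick the CUDA device for one rank (sketch of the described fallback)."""
    if num_visible >= world_size:
        # all GPUs are visible to every process: one GPU per rank
        return f"cuda:{local_rank}"
    # torchrun set CUDA_VISIBLE_DEVICES per process: the only visible GPU is cuda:0
    return "cuda:0"

# full visibility: 4 GPUs, 4 ranks
print(select_device(2, num_visible=4, world_size=4))  # cuda:2
# restricted visibility: each rank sees a single GPU
print(select_device(2, num_visible=1, world_size=4))  # cuda:0
```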

Testing

  • Includes unit tests in source/isaaclab/test/test_multigpu_newton.py:
    • Device assignment logic for both full-visibility and restricted-GPU scenarios
    • Preset renderer matching (newton field exists and is correct type)
    • Kit camera detection with PresetCfg renderers
    • Integration test: multi-GPU cartpole training with Newton physics (requires 2+ GPUs)
  • Validated on 4×L40 system with DexSuite Kuka-Allegro lift task (4096 envs/GPU, camera observations)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New tests added

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have added tests that prove my fix is effective
  • I have updated the changelog and added documentation in relevant places

@github-actions github-actions bot added bug Something isn't working documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team labels Apr 10, 2026
@gavrielstate gavrielstate added the BOT Generated This item has been generated by a bot label Apr 10, 2026
@ClawLabby ClawLabby force-pushed the pr/multigpu-newton-camera-fix branch from 391bfc2 to 5202601 Compare April 10, 2026 20:54
@greptile-apps

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR fixes three bugs in multi-GPU distributed training with Newton physics and camera rendering: device assignment under CUDA_VISIBLE_DEVICES restriction, a missing newton field on MultiBackendRendererCfg, and _is_kit_camera() incorrectly treating PresetCfg renderers as Kit cameras.

  • The CHANGELOG.rst files for source/isaaclab/ and source/isaaclab_tasks/ are not updated, violating the project rule that every change targeting the source directory must include a changelog entry with a new version heading and a matching bump of config/extension.toml.

Confidence Score: 5/5

Safe to merge after addressing the missing CHANGELOG entries and the minor test/preset quality issues.

All findings are P2: shared mutable reference on unannotated preset aliases, a vacuous unit test, a redundant skipif decorator, and an incomplete copyright header. No production code has a definite runtime bug; the core device-assignment and camera-detection fixes are logically correct.

Files with remaining findings: source/isaaclab_tasks/isaaclab_tasks/utils/presets.py (unannotated alias fields) and source/isaaclab/test/test_multigpu_newton.py (test quality issues).

Important Files Changed

Filename Overview
source/isaaclab_tasks/isaaclab_tasks/utils/presets.py Adds newton field and physx alias to MultiBackendRendererCfg; isaacsim_rtx_renderer and physx are unannotated class attributes that share a single mutable IsaacRtxRendererCfg instance across all instances.
source/isaaclab_tasks/isaaclab_tasks/utils/sim_launcher.py Adds PresetCfg recognition to _is_kit_camera() so Newton-preset cameras are not incorrectly flagged as Kit cameras; logic and fallback are correct.
source/isaaclab/isaaclab/app/app_launcher.py Adds torch.cuda.set_device() call early in _resolve_device_settings and fixes device assignment logic when CUDA_VISIBLE_DEVICES restricts each rank to a single GPU.
scripts/reinforcement_learning/rsl_rl/train.py Adds per-rank device selection and torch.cuda.set_device() call inside the distributed training block; logic matches AppLauncher's _resolve_device_settings.
source/isaaclab/test/test_multigpu_newton.py New test file; test_device_assignment_logic tests only its own inline conditional rather than the real implementation, copyright header is incomplete, and a method-level skipif duplicates the class-level decorator.
docs/source/how-to/multigpu_newton_training.md New how-to guide documenting multi-GPU Newton training usage, environment variables, and known issues; informational only.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[torchrun spawns N processes] --> B[train.py: parse args]
    B --> C[launch_simulation context manager]
    C --> D{needs_kit?}
    D -->|Newton + RTX cameras| E[AppLauncher.__init__]
    E --> F[_resolve_device_settings]
    F --> G{num_visible_gpus >= world_size?}
    G -->|Yes| H[device_id = local_rank]
    G -->|No - CUDA_VISIBLE_DEVICES restricted| I[device_id = 0]
    H & I --> J[torch.cuda.set_device early]
    D -->|Newton-only no Kit cameras| K[No AppLauncher]
    K --> L[yield into with block]
    J --> L
    L --> M{args_cli.distributed?}
    M -->|Yes| N[Re-compute device same logic]
    N --> O[torch.cuda.set_device again]
    O --> P[gym.make - Newton/Warp init on correct GPU]
    M -->|No| P

Reviews (1): Last reviewed commit: "test: add multi-GPU Newton physics tests" | Re-trigger Greptile

Comment on lines +21 to +22
isaacsim_rtx_renderer = default
physx = default
P2 Unannotated aliases share a single mutable object

isaacsim_rtx_renderer = default and physx = default are not annotated, so @configclass does not copy them per instance — every MultiBackendRendererCfg() object shares the same class-level IsaacRtxRendererCfg() instance. If any call site or preset resolver ever mutates a field on one of these aliases, the mutation is visible through all three names (default, isaacsim_rtx_renderer, physx). Add type annotations so configclass gives each instance its own copy:

Suggested change
- isaacsim_rtx_renderer = default
- physx = default
+ isaacsim_rtx_renderer: IsaacRtxRendererCfg = IsaacRtxRendererCfg()
+ physx: IsaacRtxRendererCfg = IsaacRtxRendererCfg()
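The shared-instance behavior can be reproduced with a plain dataclass standing in for @configclass (the real decorator's per-instance copying for annotated fields is assumed to behave analogously; class names here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class RendererCfg:
    mode: str = "rtx"

@dataclass
class PresetCfg:
    # annotated field: each instance gets its own copy via default_factory
    default: RendererCfg = field(default_factory=RendererCfg)
    # unannotated assignment: a plain class attribute shared by ALL instances
    physx = RendererCfg()

a, b = PresetCfg(), PresetCfg()
a.physx.mode = "mutated"
print(b.physx.mode)            # mutated -> the alias is one shared object
print(a.default is b.default)  # False -> annotated fields are per-instance
```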

Comment on lines +24 to +45
def test_device_assignment_logic(self):
    """Test that device assignment logic works correctly for different scenarios."""
    # Scenario 1: 4 visible GPUs, world_size=4 → each rank gets its own GPU
    num_visible = 4
    world_size = 4
    for local_rank in range(4):
        if num_visible >= world_size:
            expected_device = f"cuda:{local_rank}"
        else:
            expected_device = "cuda:0"
        assert expected_device == f"cuda:{local_rank}"

    # Scenario 2: 1 visible GPU (CUDA_VISIBLE_DEVICES restricted), world_size=4
    # → all ranks use cuda:0 (they each see only their assigned GPU)
    num_visible = 1
    world_size = 4
    for local_rank in range(4):
        if num_visible >= world_size:
            expected_device = f"cuda:{local_rank}"
        else:
            expected_device = "cuda:0"
        assert expected_device == "cuda:0"

P2 Test only verifies its own inline logic, not the implementation

test_device_assignment_logic hard-codes num_visible and world_size locally and then asserts a condition computed with those same local values — it never calls AppLauncher._resolve_device_settings or the equivalent logic in train.py. A bug introduced in either of those functions (e.g. flipping >= to >) would leave this test green. The test should import and invoke the actual helper, or at minimum mock the environment variables and assert the resulting device_id.
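One way to do that, sketched with a local stand-in for the helper and a patchable seam in place of torch.cuda.device_count() (names are assumptions for illustration, not the actual Isaac Lab API):

```python
from unittest import mock

class Cuda:
    """Seam standing in for torch.cuda so the test needs no GPU."""
    @staticmethod
    def device_count() -> int:
        return 4

def resolve_cuda_device(local_rank: int) -> str:
    # hypothetical shared helper; its comparison is the code under test
    return f"cuda:{local_rank}" if local_rank < Cuda.device_count() else "cuda:0"

# Simulate CUDA_VISIBLE_DEVICES pinning each rank to one GPU. Because the
# assertion calls resolve_cuda_device itself, flipping its comparison
# (e.g. < to <=) would now fail the test instead of passing silently.
with mock.patch.object(Cuda, "device_count", return_value=1):
    assert all(resolve_cuda_device(rank) == "cuda:0" for rank in range(4))
assert resolve_cuda_device(2) == "cuda:2"  # full visibility after the patch
```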

Comment on lines +47 to +48
@pytest.mark.skipif(_get_num_gpus() < 2, reason="Requires at least 2 GPUs")
def test_cartpole_newton_multigpu(self):

P2 Redundant skipif on method already guarded by class decorator

TestMultiGPUDeviceAssignment is already decorated with @pytest.mark.skipif(_get_num_gpus() < 2, ...), so test_cartpole_newton_multigpu inherits that skip condition. The identical @pytest.mark.skipif on the method itself is noise and can be removed.

Suggested change
- @pytest.mark.skipif(_get_num_gpus() < 2, reason="Requires at least 2 GPUs")
- def test_cartpole_newton_multigpu(self):
+ def test_cartpole_newton_multigpu(self):

Comment on lines +1 to +3
# Copyright (c) 2022-2026, The Isaac Lab Project Developers.
# SPDX-License-Identifier: BSD-3-Clause
"""Test multi-GPU device assignment for Newton physics."""

P2 Incomplete copyright header

New files must use the full project copyright header (AGENTS.md §"File headers and copyright"). The current header is missing the contributors URL and the "All rights reserved." line:

Suggested change
- # Copyright (c) 2022-2026, The Isaac Lab Project Developers.
- # SPDX-License-Identifier: BSD-3-Clause
- """Test multi-GPU device assignment for Newton physics."""
+ # Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+ # All rights reserved.
+ #
+ # SPDX-License-Identifier: BSD-3-Clause


@isaaclab-review-bot isaaclab-review-bot bot left a comment


Review: multi-GPU Newton physics with camera rendering

Good catch on a real production pain point — multi-GPU training with CUDA_VISIBLE_DEVICES restriction is a common deployment pattern (SLURM, Kubernetes) and this was genuinely broken. The three-bug diagnosis is solid and the fixes are coherent.

What works well

  • The num_visible >= world_size heuristic is the right approach for detecting per-process GPU restriction
  • Early torch.cuda.set_device() is essential for Newton/Warp — physics backends that allocate on the "current" device during init would otherwise all pile onto cuda:0
  • The PresetCfg escape hatch in _is_kit_camera() is logically correct — preset renderers resolve to match the physics backend, so assuming non-Kit is the right default

Issues to address

See inline comments. Summary:

  1. Duplicated device logic between train.py and app_launcher.py — should be factored out
  2. physx = default alias shares a mutable instance (Greptile flagged this too)
  3. torch.cuda.set_device() called twice in the distributed path — once in app_launcher.py, once again in train.py. The second call is redundant if AppLauncher already ran, or both are needed if Newton bypasses AppLauncher. The intent should be documented.
  4. Test test_device_assignment_logic tests its own local variables, not the actual implementation
  5. Missing CHANGELOG entries (as Greptile noted)

Overall: the production code changes are correct and well-motivated. The test and preset alias issues should be cleaned up before merge.

    env_cfg.sim.device = f"cuda:{local_rank}"
else:
    env_cfg.sim.device = "cuda:0"
agent_cfg.device = env_cfg.sim.device

The device-selection logic here (num_visible >= world_size → cuda:local_rank, else cuda:0) is duplicated verbatim from app_launcher.py. This creates a maintenance risk — if someone updates one path and forgets the other, they'll silently diverge.

Consider extracting this into a shared utility, e.g.:

# isaaclab/utils/distributed.py
def resolve_cuda_device(local_rank: int) -> str:
    num_visible = torch.cuda.device_count()
    world_size = int(os.getenv("WORLD_SIZE", "1"))
    if num_visible >= world_size:
        return f"cuda:{local_rank}"
    return "cuda:0"

Both train.py and app_launcher.py can then call the same function. This also makes the logic unit-testable without mocking the entire AppLauncher.

# (e.g. Newton/Warp) that use the current device get the correct GPU.
if "cuda" in env_cfg.sim.device:
    torch.cuda.set_device(env_cfg.sim.device)


This torch.cuda.set_device() call may be redundant if AppLauncher._resolve_device_settings() already ran earlier (which also calls set_device). In the Newton-without-Kit path where AppLauncher is skipped, this is the only place it happens — which is critical.

Worth adding a comment clarifying the intent:

# Set device here because in the kitless Newton path, AppLauncher
# may not have run (launch_simulation skips it), so this is the
# only place the current CUDA device gets set.

# When CUDA_VISIBLE_DEVICES restricts each process to a single GPU,
# local_rank may exceed the visible device count. Fall back to cuda:0
# so the process uses the one GPU it can see.
import torch

Importing torch inside this method is fine for the distributed branch (it's only hit when distributed=True), but the second import torch at line 941 is in the unconditional path — meaning every single-GPU AppLauncher init now imports torch at this point too. That's probably harmless in practice (torch is already imported by then), but the double import is a bit messy. Consider moving both to a single conditional import at the top of the method, or using the module-level import that likely already exists.

default: IsaacRtxRendererCfg = IsaacRtxRendererCfg()
newton: NewtonWarpRendererCfg = NewtonWarpRendererCfg()
newton_renderer: NewtonWarpRendererCfg = NewtonWarpRendererCfg()
ovrtx_renderer: OVRTXRendererCfg = OVRTXRendererCfg()

+1 to Greptile's finding here. physx = default creates a class-level alias that points to the same IsaacRtxRendererCfg() instance as default. If @configclass doesn't deep-copy unannotated attributes, mutating one mutates all.

Also, is physx the right name for an alias to IsaacRtxRendererCfg? The PhysX backend doesn't inherently require RTX rendering — the name suggests "use whatever renderer is appropriate for PhysX" but it hard-wires RTX. If this is intentional (PhysX always uses RTX in Isaac Lab), a comment explaining the mapping would help.

# PresetCfg renderers (e.g. MultiBackendRendererCfg) are resolved later;
# assume they will match the physics backend, so not necessarily Kit.
from isaaclab_tasks.utils import PresetCfg
if isinstance(renderer_cfg, PresetCfg):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import is inside a function that could be called frequently during config scanning (_scan_config visits every node). The import is cached by Python after the first call, so there's no real performance issue, but it's worth considering whether this should be a module-level import instead.

Also: the fallback return True on line 132 (unchanged) means any unknown renderer type is treated as Kit. That's the conservative choice (forces Kit launch), but it means new renderer types added in the future will silently require Kit unless they're explicitly handled here. Worth a logger.debug() or comment noting this design choice.

assert expected_device == f"cuda:{local_rank}"

# Scenario 2: 1 visible GPU (CUDA_VISIBLE_DEVICES restricted), world_size=4
# → all ranks use cuda:0 (they each see only their assigned GPU)

This test doesn't actually test the implementation — it re-implements the same conditional logic inline and asserts against its own local variables. A bug in app_launcher.py (e.g., flipping >= to >) wouldn't be caught.

To make this a real unit test, either:

  1. Extract the device logic into a shared function (as suggested on train.py) and test that directly
  2. Mock torch.cuda.device_count() and os.environ and call AppLauncher._resolve_device_settings() (harder due to side effects)
  3. At minimum, import and call the same code path used in production

As-is, this test is giving false confidence.

test_dir = os.path.dirname(os.path.abspath(__file__)) # .../source/isaaclab/test
isaaclab_root = os.path.dirname(os.path.dirname(os.path.dirname(test_dir))) # .../IsaacLab

result = subprocess.run(

The subprocess.run integration test has a 300s timeout which is generous. A few concerns:

  1. Hard-coded script path scripts/reinforcement_learning/rsl_rl/train.py — this assumes the test is run from the IsaacLab root. The cwd computation at lines 83-84 tries to handle this, but it's fragile (counts parent directories).

  2. NCCL_P2P_DISABLE=1 and NCCL_IB_DISABLE=1 — these are reasonable for CI but will mask P2P/IB-related failures. Worth a comment explaining why.

  3. Assertion on line 94 checks for "Learning iteration 1" in stdout OR returncode == 0. The or is too permissive — a process that exits 0 without actually training (e.g., early skip) would pass. Consider requiring both conditions, or at least checking that stderr doesn't contain CUDA errors.
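The stricter assertion suggested in point 3 could look like this, demonstrated with a stand-in subprocess that just prints the expected marker (the real test would invoke train.py under torchrun; the marker string follows the review discussion):

```python
import subprocess
import sys

# stand-in for the torchrun training invocation
result = subprocess.run(
    [sys.executable, "-c", "print('Learning iteration 1')"],
    capture_output=True, text=True, timeout=300,
)
# Require BOTH a clean exit AND evidence of actual training (not `or`),
# and surface stderr on failure so CUDA errors show up in the test report.
assert result.returncode == 0 and "Learning iteration 1" in result.stdout, result.stderr[-2000:]
```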

Fixes three interrelated bugs that prevented multi-GPU distributed training
with Newton physics and camera observations:

1. Device assignment: When CUDA_VISIBLE_DEVICES restricts each process to a
   single GPU, local_rank may exceed the visible device count. Added
   resolve_cuda_device() utility that falls back to cuda:0 in this case.

2. Camera renderer preset: MultiBackendRendererCfg was missing a 'newton'
   field, so the preset system could not auto-select NewtonWarpRendererCfg.
   Also fixed unannotated alias fields (physx, isaacsim_rtx_renderer) that
   shared a single mutable instance.

3. Fabric mode detection: _is_kit_camera() did not recognize PresetCfg
   renderer configs, incorrectly flagging Newton cameras as Kit cameras.

Additionally, torch.cuda.set_device() is now called early in AppLauncher
and train.py so physics backends that allocate on the 'current' CUDA device
get the correct GPU.

Includes unit tests for resolve_cuda_device() and the preset/camera fixes.
@ClawLabby ClawLabby force-pushed the pr/multigpu-newton-camera-fix branch from 5202601 to f3005fd Compare April 10, 2026 21:36
@configclass
class MultiBackendRendererCfg(PresetCfg):
    default: IsaacRtxRendererCfg = IsaacRtxRendererCfg()
    newton: NewtonWarpRendererCfg = NewtonWarpRendererCfg()
Collaborator

the additions in this file are all unnecessary

if isinstance(renderer_cfg, RendererCfg):
    return renderer_cfg.renderer_type in ("default", "isaac_rtx")
# PresetCfg renderers (e.g. MultiBackendRendererCfg) are resolved later;
# assume they will match the physics backend, so not necessarily Kit.
Collaborator

this fix doesn't seem right; it should be resolved correctly, not patched here


@isaaclab-review-bot isaaclab-review-bot bot left a comment


🤖 Isaac Lab Review Bot

Summary

This PR fixes three interrelated bugs preventing multi-GPU distributed training with Newton physics and camera rendering. The core approach — centralizing device resolution, adding early torch.cuda.set_device(), adding PresetCfg recognition, and adding the missing newton preset field — is architecturally sound. However, the new resolve_cuda_device() utility has a correctness bug that would break multi-node training, and the fix was only applied to one of four training scripts that share the same pattern.

Design Assessment

Approach is sound with one flaw. Creating a centralized resolve_cuda_device() utility is the right abstraction — the device resolution logic was duplicated between app_launcher.py and train.py, and this deduplicates it. The PresetCfg escape hatch in _is_kit_camera() correctly handles deferred renderer resolution. The isaacsim_rtx_renderer field annotation fix (from bare assignment to typed field) is a good catch that prevents shared-mutable-instance bugs with @configclass.

Findings

🔴 Critical: resolve_cuda_device() breaks multi-node training (source/isaaclab/isaaclab/utils/distributed.py:31)

The function compares num_visible against WORLD_SIZE (global process count), but WORLD_SIZE spans all nodes. In a multi-node setup (e.g. 2 nodes × 4 GPUs), WORLD_SIZE=8 while each node sees 4 GPUs. The check num_visible(4) >= world_size(8) evaluates to False, causing every rank on a node to fall back to cuda:0 instead of using its local_rank.

The correct comparison is against local_rank directly:

    num_visible = torch.cuda.device_count()
    if local_rank < num_visible:
        device_id = local_rank
    else:
        device_id = 0
    return f"cuda:{device_id}", device_id

This correctly handles all three scenarios:

  • Full visibility (4 GPUs, local_rank 0–3): each rank gets its own GPU ✓
  • Restricted (1 GPU per process via CUDA_VISIBLE_DEVICES): local_rank ≥ 1 falls back to cuda:0
  • Multi-node (4 visible, 8 global): local_rank 0–3 < 4, each gets its own GPU ✓

The WORLD_SIZE variable and its os.getenv call can be removed entirely.
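The difference can be checked directly. This is a sketch contrasting the shipped comparison with the suggested one; the helper names `buggy` and `fixed` are illustrative, not from the PR.

```python
def buggy(local_rank: int, num_visible: int, world_size: int) -> str:
    # as shipped: compares visible GPUs against the GLOBAL process count
    return f"cuda:{local_rank}" if num_visible >= world_size else "cuda:0"

def fixed(local_rank: int, num_visible: int) -> str:
    # suggested: compare the rank's index against what this node can see
    return f"cuda:{local_rank}" if local_rank < num_visible else "cuda:0"

# 2 nodes × 4 GPUs: WORLD_SIZE=8, each node sees 4 devices
print([buggy(r, 4, 8) for r in range(4)])  # all 'cuda:0' -> every rank piles onto one GPU
print([fixed(r, 4) for r in range(4)])     # 'cuda:0'..'cuda:3' -> one GPU per local rank
# restricted visibility (1 GPU per process) still falls back correctly
print(fixed(3, 1))                         # cuda:0
```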

🟡 Warning: Same bug exists in rl_games/train.py and skrl/train.py (scripts/reinforcement_learning/)

Only rsl_rl/train.py was updated to use resolve_cuda_device(). The other two RL training scripts still use the old f"cuda:{local_rank}" pattern:

  • scripts/reinforcement_learning/rl_games/train.py:115-118 — uses f"cuda:{local_rank}" for both agent_cfg and env_cfg.sim.device
  • scripts/reinforcement_learning/skrl/train.py:132 — uses f"cuda:{local_rank}" for env_cfg.sim.device

Neither calls torch.cuda.set_device() either. These should be updated to use resolve_cuda_device() for consistency and to get the same CUDA-visible-devices fix. If intentionally left for a follow-up, a tracking issue or TODO comment would help.

🟡 Warning: resolve_cuda_device() does not validate local_rank against visible devices (source/isaaclab/isaaclab/utils/distributed.py)

When num_visible >= world_size (or, with the suggested fix, local_rank < num_visible), the function trusts local_rank without bounds-checking against num_visible. If local_rank >= num_visible in a misconfigured environment, this silently produces an invalid device string (e.g. cuda:4 when only 2 GPUs are visible), which would only fail later with an opaque CUDA error.

Consider adding a guard:

if device_id >= num_visible:
    raise RuntimeError(
        f"local_rank={local_rank} exceeds visible CUDA devices ({num_visible}). "
        f"Check CUDA_VISIBLE_DEVICES and torchrun --nproc_per_node."
    )

🔵 Suggestion: Redundant torch.cuda.set_device() calls could use a comment (source/isaaclab/isaaclab/app/app_launcher.py:928 and scripts/reinforcement_learning/rsl_rl/train.py:143)

The training script's comment explains the Newton kitless path clearly. However, when AppLauncher does run, set_device() is called twice (once in app_launcher.py, once in train.py). This is harmless but a brief note in the AppLauncher call like "Also called in training scripts for the kitless Newton path" would make the redundancy obviously intentional and prevent a future contributor from removing one call thinking it's dead code.

Test Coverage

  • Bug fix PR: Has regression test? Yes: test_multigpu_newton.py covers all three fix areas with proper mocking.
  • Test quality: Good. test_resolve_cuda_device_full_visibility and test_resolve_cuda_device_restricted_visibility directly exercise the fixed function with mocked GPU counts. test_preset_renderer_matching verifies the newton field exists. test_camera_not_kit_with_preset_renderer verifies the _is_kit_camera fix.
  • Integration test (test_cartpole_newton_multigpu) appropriately requires 2+ GPUs with pytest.mark.skipif and has a 300s timeout. The NCCL_P2P_DISABLE/NCCL_IB_DISABLE environment vars are good practice for container compatibility.
  • Gap: No test covers the multi-node scenario (the critical bug above). Add a test case where device_count=4 and WORLD_SIZE=8 to verify local_rank=2 maps to cuda:2, not cuda:0.

CI Status

Only the labeler check is visible (pass). No lint, build, or test CI results are shown — this may be expected for the repo's CI configuration, but the absence of automated test runs on this PR is worth noting.

Verdict

COMMENT

The device resolution centralization and the three bug fixes are well-motivated and mostly correct. The critical multi-node bug in resolve_cuda_device() should be fixed before merge — it's a one-line change (compare against local_rank instead of WORLD_SIZE). The unfixed RL training scripts are a secondary concern that could be addressed in a follow-up.

"""
num_visible = torch.cuda.device_count()
world_size = int(os.getenv("WORLD_SIZE", "1"))
if num_visible >= world_size:

🔴 Critical: Breaks multi-node training. This compares against WORLD_SIZE (global process count across all nodes), not local process count. In a 2-node × 4-GPU setup (WORLD_SIZE=8, num_visible=4), this evaluates to False and every rank falls back to cuda:0.

The simplest correct check is against local_rank directly:

Suggested change
- if num_visible >= world_size:
+ if local_rank < num_visible:

This makes the WORLD_SIZE env var unnecessary — line 30 can be removed entirely.

