
Commit f17db88

Updates RL libraries training performance comparison (isaac-sim#4109)
# Description

> Reopens a pending PR that was closed when the cleanup and removal of the internal repository was performed.

This PR updates the agent configurations (making them as similar as possible) for the `Isaac-Humanoid-v0` task to ensure a more accurate comparison of the RL libraries when generating the [Training Performance](https://isaac-sim.github.io/IsaacLab/main/source/overview/reinforcement-learning/rl_frameworks.html#training-performance) table. To this end:

1. A common training-time summary (e.g. `Training time: XXX.YY seconds`) is printed when running the existing `train.py` scripts. Currently the RL libraries output training information in different formats and to different extents.
2. A note is added to the involved agent configurations to ensure that any modification is propagated to the other agent configuration files.
3. The commands used to benchmark the RL libraries are added to the docs, for clarity and reproducibility.

## Screenshots

Difference between the current agent configuration (red) and the new agent configuration (green), showing that the new configuration does not represent a radical change in learning:

<img width="1230" height="880" alt="Screenshot from 2025-11-28 13-19-14" src="https://github.com/user-attachments/assets/12a098c1-c169-4e09-b60f-b5f105341fbd" />

## Checklist

- [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there
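The common timing summary described in item 1 boils down to two lines added around each library's training call. A minimal sketch of the pattern (the `run_with_timing` wrapper is illustrative only; the PR adds the equivalent lines inline in each `train.py`):

```python
import time

def run_with_timing(train_fn):
    """Run a training entry point and print the common timing line.

    Illustrative only: the PR adds the same two lines inline to each
    library's train.py rather than using a wrapper like this one.
    """
    start_time = time.time()
    train_fn()
    # Same format across all train.py scripts, e.g. "Training time: 123.45 seconds"
    print(f"Training time: {round(time.time() - start_time, 2)} seconds")
```

Because every script emits the same line, the benchmark table can be filled in by grepping the terminal output for `Training time:`.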
1 parent aec36d9 commit f17db88

File tree

9 files changed: +80 additions, −19 deletions

docs/source/overview/reinforcement-learning/rl_frameworks.rst

Lines changed: 15 additions & 7 deletions
```diff
@@ -71,18 +71,26 @@ Training Performance
 --------------------
 
 We performed training with each RL library on the same ``Isaac-Humanoid-v0`` environment
-with ``--headless`` on a single RTX PRO 6000 GPU using 4096 environments
-and logged the total training time for 65.5M steps for each RL library.
-
+with ``--headless`` on a single NVIDIA GeForce RTX 4090 and logged the total training time
+for 65.5M steps (4096 environments x 32 rollout steps x 500 iterations).
 
 +--------------------+-----------------+
 | RL Library         | Time in seconds |
 +====================+=================+
-| RL-Games           | 207             |
+| RL-Games           | 201             |
 +--------------------+-----------------+
-| SKRL               | 208             |
+| SKRL               | 201             |
 +--------------------+-----------------+
-| RSL RL             | 199             |
+| RSL RL             | 198             |
 +--------------------+-----------------+
-| Stable-Baselines3  | 322             |
+| Stable-Baselines3  | 287             |
 +--------------------+-----------------+
+
+Training commands (check for the *'Training time: XXX seconds'* line in the terminal output):
+
+.. code:: bash
+
+   python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Humanoid-v0 --max_iterations 500 --headless
+   python scripts/reinforcement_learning/skrl/train.py --task Isaac-Humanoid-v0 --max_iterations 500 --headless
+   python scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Humanoid-v0 --max_iterations 500 --headless
+   python scripts/reinforcement_learning/sb3/train.py --task Isaac-Humanoid-v0 --max_iterations 500 --headless
```
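As a quick sanity check of the step count quoted in the updated docs, the three factors in the parenthetical multiply out to the "65.5M steps" in the table:

```python
num_envs = 4096        # parallel environments
rollout_steps = 32     # per-iteration rollout length shared by all four configs
iterations = 500       # --max_iterations in the benchmark commands

total_steps = num_envs * rollout_steps * iterations
print(total_steps)  # 65536000, reported as "65.5M steps"
```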

scripts/reinforcement_learning/rl_games/train.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -67,6 +67,7 @@
 import math
 import os
 import random
+import time
 from datetime import datetime
 
 from rl_games.common import env_configurations, vecenv
@@ -201,6 +202,8 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
         print_dict(video_kwargs, nesting=4)
         env = gym.wrappers.RecordVideo(env, **video_kwargs)
 
+    start_time = time.time()
+
     # wrap around environment for rl-games
     env = RlGamesVecEnvWrapper(env, rl_device, clip_obs, clip_actions, obs_groups, concate_obs_groups)
 
@@ -250,6 +253,8 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
     else:
         runner.run({"train": True, "play": False, "sigma": train_sigma})
 
+    print(f"Training time: {round(time.time() - start_time, 2)} seconds")
+
     # close the simulator
     env.close()
```
scripts/reinforcement_learning/rsl_rl/train.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -78,6 +78,7 @@
 import gymnasium as gym
 import logging
 import os
+import time
 import torch
 from datetime import datetime
 
@@ -187,6 +188,8 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
         print_dict(video_kwargs, nesting=4)
         env = gym.wrappers.RecordVideo(env, **video_kwargs)
 
+    start_time = time.time()
+
     # wrap around environment for rsl-rl
     env = RslRlVecEnvWrapper(env, clip_actions=agent_cfg.clip_actions)
 
@@ -212,6 +215,8 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
     # run training
     runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)
 
+    print(f"Training time: {round(time.time() - start_time, 2)} seconds")
+
     # close the simulator
     env.close()
```

scripts/reinforcement_learning/sb3/train.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -80,6 +80,7 @@ def cleanup_pbar(*args):
 import numpy as np
 import os
 import random
+import time
 from datetime import datetime
 
 from stable_baselines3 import PPO
@@ -176,6 +177,8 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
         print_dict(video_kwargs, nesting=4)
         env = gym.wrappers.RecordVideo(env, **video_kwargs)
 
+    start_time = time.time()
+
     # wrap around environment for stable baselines
     env = Sb3VecEnvWrapper(env, fast_variant=not args_cli.keep_all_info)
 
@@ -223,6 +226,8 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
         print("Saving normalization")
         env.save(os.path.join(log_dir, "model_vecnormalize.pkl"))
 
+    print(f"Training time: {round(time.time() - start_time, 2)} seconds")
+
     # close the simulator
     env.close()
```

scripts/reinforcement_learning/skrl/train.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -78,6 +78,7 @@
 import logging
 import os
 import random
+import time
 from datetime import datetime
 
 import skrl
@@ -214,6 +215,8 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
         print_dict(video_kwargs, nesting=4)
         env = gym.wrappers.RecordVideo(env, **video_kwargs)
 
+    start_time = time.time()
+
     # wrap around environment for skrl
     env = SkrlVecEnvWrapper(env, ml_framework=args_cli.ml_framework)  # same as: `wrap_env(env, wrapper="auto")`
 
@@ -229,6 +232,8 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
     # run training
     runner.run()
 
+    print(f"Training time: {round(time.time() - start_time, 2)} seconds")
+
     # close the simulator
     env.close()
```

source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/rl_games_ppo_cfg.yaml

Lines changed: 11 additions & 3 deletions
```diff
@@ -3,6 +3,14 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause
 
+# ========================================= IMPORTANT NOTICE =========================================
+#
+# This file defines the agent configuration used to generate the "Training Performance" table in
+# https://isaac-sim.github.io/IsaacLab/main/source/overview/reinforcement-learning/rl_frameworks.html.
+# Ensure that the configurations for the other RL libraries are updated if this one is modified.
+#
+# ====================================================================================================
+
 params:
   seed: 42
 
@@ -50,13 +58,13 @@ params:
     device_name: 'cuda:0'
     multi_gpu: False
     ppo: True
-    mixed_precision: True
+    mixed_precision: False
     normalize_input: True
     normalize_value: True
     value_bootstrap: True
     num_actors: -1
     reward_shaper:
-      scale_value: 0.6
+      scale_value: 1.0
     normalize_advantage: True
     gamma: 0.99
     tau: 0.95
@@ -72,7 +80,7 @@ params:
     truncate_grads: True
     e_clip: 0.2
     horizon_length: 32
-    minibatch_size: 32768
+    minibatch_size: 32768 # num_envs * horizon_length / num_mini_batches
     mini_epochs: 5
     critic_coef: 4
     clip_value: True
```
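The new inline comment on `minibatch_size` can be checked with one line of arithmetic. Note that `num_mini_batches` does not appear in this diff, so the value of 4 below is an assumption chosen to make the formula come out to the configured size:

```python
num_envs = 4096
horizon_length = 32
num_mini_batches = 4  # assumption: not shown in this diff

# num_envs * horizon_length / num_mini_batches, per the new comment
minibatch_size = num_envs * horizon_length // num_mini_batches
print(minibatch_size)  # 32768, matching the config value
```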

source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/rsl_rl_ppo_cfg.py

Lines changed: 15 additions & 4 deletions
```diff
@@ -3,6 +3,17 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause
 
+"""
+========================================= IMPORTANT NOTICE =========================================
+
+This file defines the agent configuration used to generate the "Training Performance" table in
+https://isaac-sim.github.io/IsaacLab/main/source/overview/reinforcement-learning/rl_frameworks.html.
+Ensure that the configurations for the other RL libraries are updated if this one is modified.
+
+====================================================================================================
+"""
+
+
 from isaaclab.utils import configclass
 
 from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlPpoActorCriticCfg, RslRlPpoAlgorithmCfg
@@ -12,18 +23,18 @@
 class HumanoidPPORunnerCfg(RslRlOnPolicyRunnerCfg):
     num_steps_per_env = 32
     max_iterations = 1000
-    save_interval = 50
+    save_interval = 100
     experiment_name = "humanoid"
     policy = RslRlPpoActorCriticCfg(
         init_noise_std=1.0,
-        actor_obs_normalization=False,
-        critic_obs_normalization=False,
+        actor_obs_normalization=True,
+        critic_obs_normalization=True,
         actor_hidden_dims=[400, 200, 100],
         critic_hidden_dims=[400, 200, 100],
         activation="elu",
     )
     algorithm = RslRlPpoAlgorithmCfg(
-        value_loss_coef=1.0,
+        value_loss_coef=2.0,
         use_clipped_value_loss=True,
         clip_param=0.2,
         entropy_coef=0.0,
```

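One way to read the coefficient changes above is that they align the effective value-loss weight across libraries. The rl_games entry below assumes that library's usual internal 0.5 scaling of the critic loss; treat that factor as an assumption, not something stated in this diff:

```python
# Effective value-loss weight after this PR, per each library's config
effective_value_weight = {
    "rsl_rl": 2.0,        # value_loss_coef=2.0
    "sb3": 2.0,           # vf_coef=2.0
    "skrl": 2.0,          # value_loss_scale=2.0
    "rl_games": 4 * 0.5,  # critic_coef=4, assuming rl_games' internal 0.5 factor
}
# all four libraries end up with the same effective weight
assert len(set(effective_value_weight.values())) == 1
```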
source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/sb3_ppo_cfg.yaml

Lines changed: 9 additions & 2 deletions
```diff
@@ -3,7 +3,14 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause
 
-# Adapted from rsl_rl config
+# ========================================= IMPORTANT NOTICE =========================================
+#
+# This file defines the agent configuration used to generate the "Training Performance" table in
+# https://isaac-sim.github.io/IsaacLab/main/source/overview/reinforcement-learning/rl_frameworks.html.
+# Ensure that the configurations for the other RL libraries are updated if this one is modified.
+#
+# ====================================================================================================
+
 seed: 42
 policy: "MlpPolicy"
 n_timesteps: !!float 5e7
@@ -18,7 +25,7 @@ clip_range: 0.2
 n_epochs: 5
 gae_lambda: 0.95
 max_grad_norm: 1.0
-vf_coef: 0.5
+vf_coef: 2.0
 policy_kwargs:
   activation_fn: 'nn.ELU'
   net_arch: [400, 200, 100]
```

source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/skrl_ppo_cfg.yaml

Lines changed: 10 additions & 3 deletions
```diff
@@ -3,6 +3,14 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause
 
+# ========================================= IMPORTANT NOTICE =========================================
+#
+# This file defines the agent configuration used to generate the "Training Performance" table in
+# https://isaac-sim.github.io/IsaacLab/main/source/overview/reinforcement-learning/rl_frameworks.html.
+# Ensure that the configurations for the other RL libraries are updated if this one is modified.
+#
+# ====================================================================================================
+
 seed: 42
 
 
@@ -67,14 +75,13 @@ agent:
   entropy_loss_scale: 0.0
   value_loss_scale: 2.0
   kl_threshold: 0.0
-  rewards_shaper_scale: 0.6
   time_limit_bootstrap: False
   # logging and checkpoint
   experiment:
     directory: "humanoid"
    experiment_name: ""
-    write_interval: auto
-    checkpoint_interval: auto
+    write_interval: 32
+    checkpoint_interval: 3200
 
 
 # Sequential trainer
```

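The fixed skrl logging values replace the previous `auto` settings and, if read in environment timesteps (an assumption about skrl's interval units, not stated in this diff), line up with the 32-step rollout shared by all the configs:

```python
rollout_steps = 32                         # rollout length shared by the configs
write_interval = rollout_steps * 1         # log once per rollout -> 32
checkpoint_interval = rollout_steps * 100  # checkpoint every 100 rollouts -> 3200
```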