update doc

yuki-97 · yuki-97 · commit d1d8e052393f · 2026-02-01T21:20:38.000-08:00
Signed-off-by: Yuki Huang <yukih@nvidia.com>
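
The documented layout puts a list of dataset entries under `train` and `validation`, plus a `default` block whose values apply to every dataset unless an entry overrides them. The sketch below illustrates that merge rule only. It is not NeMo RL's loader: `resolve_datasets` is a hypothetical helper, and the keys placed under `default` are illustrative values borrowed from the example configs.

```python
from copy import deepcopy


def resolve_datasets(data_cfg, split):
    """Return the entries for `split`, each layered on top of `default`.

    Hypothetical helper for illustration; assumes per-dataset keys take
    precedence over keys in the shared `default` block.
    """
    defaults = data_cfg.get("default") or {}
    entries = data_cfg.get(split) or []
    # Accept a single mapping as well as a list of mappings.
    if isinstance(entries, dict):
        entries = [entries]
    resolved = []
    for entry in entries:
        merged = deepcopy(defaults)  # start from the shared defaults
        merged.update(entry)         # per-dataset keys win
        resolved.append(merged)
    return resolved


# Mirrors the YAML example in this commit, written as a Python dict.
data_cfg = {
    "train": [
        {"dataset_name": "OpenMathInstruct-2", "split_validation_size": 0.05, "seed": 42},
        {"dataset_name": "DeepScaler"},
    ],
    "validation": [
        {"dataset_name": "AIME2024", "repeat": 16},
        {"dataset_name": "DAPOMathAIME2024"},
    ],
    # Illustrative defaults (assumed, not taken from the full config).
    "default": {"processor": "math_hf_data_processor", "env_name": "math"},
}

train = resolve_datasets(data_cfg, "train")
validation = resolve_datasets(data_cfg, "validation")
```

Each resolved entry carries the shared defaults plus its own keys, so `repeat: 16` stays local to the `AIME2024` entry and the `default` block itself is left unmodified.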
diff --git a/docs/guides/grpo.md b/docs/guides/grpo.md
@@ -68,6 +68,31 @@ data:
     env_name: "math"
 ```
 
+Multiple datasets can be used for both training and validation by listing them under `train` and `validation`. See `examples/configs/grpo_multiple_datasets.yaml` for a full configuration example. A shortened example:
+```yaml
+data:
+  _override_: true # override the data config instead of merging with it
+  # other data settings, see `examples/configs/sft.yaml` for more details
+  ...
+  # dataset settings
+  train:
+    # train dataset 1
+    - dataset_name: OpenMathInstruct-2
+      split_validation_size: 0.05 # use 5% of the training data as validation data
+      seed: 42  # seed for train/validation split when split_validation_size > 0
+    # train dataset 2
+    - dataset_name: DeepScaler
+  validation:
+    # validation dataset 1
+    - dataset_name: AIME2024
+      repeat: 16
+    # validation dataset 2
+    - dataset_name: DAPOMathAIME2024
+  # default settings for all datasets
+  default:
+    ...
+```
+
 We support using a single dataset for both train and validation by using `split_validation_size` to set the validation ratio.
 [OpenAssistant](../../nemo_rl/data/datasets/response_datasets/oasst.py), [OpenMathInstruct-2](../../nemo_rl/data/datasets/response_datasets/openmathinstruct2.py), [ResponseDataset](../../nemo_rl/data/datasets/response_datasets/response_dataset.py), [Tulu3SftMixtureDataset](../../nemo_rl/data/datasets/response_datasets/tulu3.py) are supported for this feature.
 If you want to support this feature for your custom datasets or other built-in datasets, you can simply add the code to the dataset like [ResponseDataset](../../nemo_rl/data/datasets/response_datasets/response_dataset.py).
diff --git a/docs/guides/sft.md b/docs/guides/sft.md
@@ -100,6 +100,31 @@ data:
     processor: "sft_processor"
 ```
 
+Multiple datasets can be used for both training and validation by listing them under `train` and `validation`. See `examples/configs/grpo_multiple_datasets.yaml` for a full configuration example. A shortened example:
+```yaml
+data:
+  _override_: true # override the data config instead of merging with it
+  # other data settings, see `examples/configs/sft.yaml` for more details
+  ...
+  # dataset settings
+  train:
+    # train dataset 1
+    - dataset_name: OpenMathInstruct-2
+      split_validation_size: 0.05 # use 5% of the training data as validation data
+      seed: 42  # seed for train/validation split when split_validation_size > 0
+    # train dataset 2
+    - dataset_name: DeepScaler
+  validation:
+    # validation dataset 1
+    - dataset_name: AIME2024
+      repeat: 16
+    # validation dataset 2
+    - dataset_name: DAPOMathAIME2024
+  # default settings for all datasets
+  default:
+    ...
+```
+
 We support using a single dataset for both train and validation by using `split_validation_size` to set the ratio of validation.
 [OpenAssistant](../../nemo_rl/data/datasets/response_datasets/oasst.py), [OpenMathInstruct-2](../../nemo_rl/data/datasets/response_datasets/openmathinstruct2.py), [ResponseDataset](../../nemo_rl/data/datasets/response_datasets/response_dataset.py), [Tulu3SftMixtureDataset](../../nemo_rl/data/datasets/response_datasets/tulu3.py) are supported for this feature.
 If you want to support this feature for your custom datasets or other built-in datasets, you can simply add the code to the dataset like [ResponseDataset](../../nemo_rl/data/datasets/response_datasets/response_dataset.py).
diff --git a/examples/configs/grpo_math_1B.yaml b/examples/configs/grpo_math_1B.yaml
@@ -273,6 +273,10 @@ data:
     system_prompt_file: null
     processor: "math_hf_data_processor"
     env_name: "math"
+
+  # You can also use multiple datasets by providing a list of datasets.
+  # See `examples/configs/grpo_multiple_datasets.yaml` for a full configuration example.
+
   # You can use custom response datasets for training and validation. For example:
   # train:
   #   # this dataset will override input_key and use the default values for other vars
diff --git a/examples/configs/grpo_multiple_datasets.yaml b/examples/configs/grpo_multiple_datasets.yaml
@@ -9,6 +9,7 @@ data:
   num_workers: 1
 
   # dataset
+  # See https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/sft.md#datasets for more details.
   train:
     - dataset_name: OpenMathInstruct-2
       split_validation_size: 0.05 # use 5% of the training data as validation data
diff --git a/examples/configs/sft.yaml b/examples/configs/sft.yaml
@@ -194,6 +194,10 @@ data:
     prompt_file: null
     system_prompt_file: null
     processor: "sft_processor"
+
+  # You can also use multiple datasets by providing a list of datasets.
+  # See `examples/configs/grpo_multiple_datasets.yaml` for a full configuration example.
+
   # You can use custom response datasets for training and validation. For example:
   # train:
   #   # this dataset will override input_key and use the default values for other vars
@@ -212,8 +216,7 @@ data:
   #   processor: "sft_processor"
   # See https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/sft.md#datasets for more details.
 
-
-  ## OpenAI format specific configs
+  # OpenAI format specific configs
   # train_data_path: "/path/to/train.jsonl"  # Path to training data
   # val_data_path: "/path/to/val.jsonl"      # Path to validation data
   # chat_key: "messages"                     # Key for messages in the data