Commit d1d8e05

update doc
Signed-off-by: Yuki Huang <[email protected]>
1 parent 7bce47d commit d1d8e05

5 files changed: +60 -2 lines changed
docs/guides/grpo.md

Lines changed: 25 additions & 0 deletions
@@ -68,6 +68,31 @@ data:
     env_name: "math"
 ```

+We support using multiple datasets for train and validation. You can refer to `examples/configs/grpo_multiple_datasets.yaml` for a full configuration example. Here's an example configuration:
+```yaml
+data:
+  _override_: true # override the data config instead of merging with it
+  # other data settings, see `examples/configs/sft.yaml` for more details
+  ...
+  # dataset settings
+  train:
+    # train dataset 1
+    - dataset_name: OpenMathInstruct-2
+      split_validation_size: 0.05 # use 5% of the training data as validation data
+      seed: 42 # seed for train/validation split when split_validation_size > 0
+    # train dataset 2
+    - dataset_name: DeepScaler
+  validation:
+    # validation dataset 1
+    - dataset_name: AIME2024
+      repeat: 16
+    # validation dataset 2
+    - dataset_name: DAPOMathAIME2024
+  # default settings for all datasets
+  default:
+    ...
+```
+
 We support using a single dataset for both train and validation by using `split_validation_size` to set the validation ratio.
 [OpenAssistant](../../nemo_rl/data/datasets/response_datasets/oasst.py), [OpenMathInstruct-2](../../nemo_rl/data/datasets/response_datasets/openmathinstruct2.py), [ResponseDataset](../../nemo_rl/data/datasets/response_datasets/response_dataset.py), [Tulu3SftMixtureDataset](../../nemo_rl/data/datasets/response_datasets/tulu3.py) are supported for this feature.
 If you want to support this feature for your custom datasets or other built-in datasets, you can simply add the code to the dataset like [ResponseDataset](../../nemo_rl/data/datasets/response_datasets/response_dataset.py).
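
Reviewer note: the `split_validation_size` and `seed` options documented in this hunk can be pictured with a minimal Python sketch. It is an illustration under assumptions, not NeMo RL's implementation: `split_train_validation` is a hypothetical helper, and shuffling indices with a seeded `random.Random` is only one plausible way to get a reproducible hold-out split.

```python
import random


def split_train_validation(examples, split_validation_size, seed):
    """Hypothetical helper: hold out a fraction of `examples` for validation."""
    if not 0.0 <= split_validation_size < 1.0:
        raise ValueError("split_validation_size must be in [0, 1)")
    indices = list(range(len(examples)))
    # A dedicated Random instance makes the shuffle reproducible for a given seed
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    num_val = round(len(examples) * split_validation_size)
    val = [examples[i] for i in indices[:num_val]]
    train = [examples[i] for i in indices[num_val:]]
    return train, val


# Mirrors `split_validation_size: 0.05` and `seed: 42` from the example config.
examples = [f"sample-{i}" for i in range(100)]
train, val = split_train_validation(examples, split_validation_size=0.05, seed=42)
```

With 100 examples and a ratio of 0.05, five examples go to validation and the remaining 95 stay in the training set; rerunning with the same seed yields the same partition.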

docs/guides/sft.md

Lines changed: 25 additions & 0 deletions
@@ -100,6 +100,31 @@ data:
     processor: "sft_processor"
 ```

+We support using multiple datasets for train and validation. You can refer to `examples/configs/grpo_multiple_datasets.yaml` for a full configuration example. Here's an example configuration:
+```yaml
+data:
+  _override_: true # override the data config instead of merging with it
+  # other data settings, see `examples/configs/sft.yaml` for more details
+  ...
+  # dataset settings
+  train:
+    # train dataset 1
+    - dataset_name: OpenMathInstruct-2
+      split_validation_size: 0.05 # use 5% of the training data as validation data
+      seed: 42 # seed for train/validation split when split_validation_size > 0
+    # train dataset 2
+    - dataset_name: DeepScaler
+  validation:
+    # validation dataset 1
+    - dataset_name: AIME2024
+      repeat: 16
+    # validation dataset 2
+    - dataset_name: DAPOMathAIME2024
+  # default settings for all datasets
+  default:
+    ...
+```
+
 We support using a single dataset for both train and validation by using `split_validation_size` to set the ratio of validation.
 [OpenAssistant](../../nemo_rl/data/datasets/response_datasets/oasst.py), [OpenMathInstruct-2](../../nemo_rl/data/datasets/response_datasets/openmathinstruct2.py), [ResponseDataset](../../nemo_rl/data/datasets/response_datasets/response_dataset.py), [Tulu3SftMixtureDataset](../../nemo_rl/data/datasets/response_datasets/tulu3.py) are supported for this feature.
 If you want to support this feature for your custom datasets or other built-in datasets, you can simply add the code to the dataset like [ResponseDataset](../../nemo_rl/data/datasets/response_datasets/response_dataset.py).
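
Reviewer note: the `default` block is described as holding settings for all datasets, and the config comments in this commit say a dataset entry overrides the keys it sets and uses the default values for the rest. A minimal sketch of that overlay follows, assuming a plain per-key dictionary merge. `resolve_dataset_configs` is a hypothetical helper and the default keys are borrowed from the `sft.yaml` context lines, so the real loader may behave differently.

```python
def resolve_dataset_configs(entries, default):
    """Hypothetical helper: overlay each dataset entry on the shared defaults."""
    if isinstance(entries, dict):
        # Assumption: a single dataset mapping is treated as a one-item list.
        entries = [entries]
    # Keys set on an entry win; anything missing falls back to `default`.
    return [{**default, **entry} for entry in entries]


# Python mirror of the example YAML; the `default` values are illustrative.
data = {
    "train": [
        {"dataset_name": "OpenMathInstruct-2", "split_validation_size": 0.05, "seed": 42},
        {"dataset_name": "DeepScaler"},
    ],
    "validation": [
        {"dataset_name": "AIME2024", "repeat": 16},
        {"dataset_name": "DAPOMathAIME2024"},
    ],
    "default": {
        "prompt_file": None,
        "system_prompt_file": None,
        "processor": "sft_processor",
    },
}

train_cfgs = resolve_dataset_configs(data["train"], data["default"])
val_cfgs = resolve_dataset_configs(data["validation"], data["default"])
```

Here `DeepScaler` sets only `dataset_name`, so it picks up `processor` and the prompt settings from `default`, while `OpenMathInstruct-2` additionally keeps its own `split_validation_size` and `seed`.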

examples/configs/grpo_math_1B.yaml

Lines changed: 4 additions & 0 deletions
@@ -273,6 +273,10 @@ data:
     system_prompt_file: null
     processor: "math_hf_data_processor"
     env_name: "math"
+
+  # You can also use multiple datasets by using a list of datasets.
+  # See `examples/configs/grpo_multiple_datasets.yaml` for a full configuration example.
+
   # You can use custom response datasets for training and validation. For example:
   # train:
   # # this dataset will override input_key and use the default values for other vars

examples/configs/grpo_multiple_datasets.yaml

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ data:
   num_workers: 1

   # dataset
+  # See https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/sft.md#datasets for more details.
   train:
     - dataset_name: OpenMathInstruct-2
       split_validation_size: 0.05 # use 5% of the training data as validation data

examples/configs/sft.yaml

Lines changed: 5 additions & 2 deletions
@@ -194,6 +194,10 @@ data:
     prompt_file: null
     system_prompt_file: null
     processor: "sft_processor"
+
+  # You can also use multiple datasets by using a list of datasets.
+  # See `examples/configs/grpo_multiple_datasets.yaml` for a full configuration example.
+
   # You can use custom response datasets for training and validation. For example:
   # train:
   # # this dataset will override input_key and use the default values for other vars
@@ -212,8 +216,7 @@ data:
   # processor: "sft_processor"
   # See https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/sft.md#datasets for more details.

-
-  ## OpenAI format specific configs
+  # OpenAI format specific configs
   # train_data_path: "/path/to/train.jsonl" # Path to training data
   # val_data_path: "/path/to/val.jsonl" # Path to validation data
   # chat_key: "messages" # Key for messages in the data