Steps:
- Dataset Refactor: refactor: refactor dataset module #977
- Decouple Train and Validation Dataset:
- Multiple Datasets Support:
  - SFT/RL: feat: support multiple datasets for response dataset #1691
  - RM/DPO:
- Clean up
  - Clean up GRPO: environments: refactor: unify entrypoint for different envs #1841
  - Refactor data processor.
  - Refactor prompt management.
  - Unify dataset_name and dataset_cls.
Step1: Dataset Refactor
- Add a general dataset class for each mode: sft_dataset, preference_dataset (for RM and DPO), and rl_dataset. Keys such as prompt_key, chosen_key, and rejected_key specify how to read a local or HuggingFace dataset, so we no longer need to write a new dataset class for each one.
- Keep the built-in datasets (e.g. open_assistant, HelpSteer3, etc.) so that others can accurately reproduce our results.
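The key-driven reading described above could look roughly like the following. This is a minimal sketch, not the project's actual class: the name KeyedPreferenceDataset, its constructor signature, and the output field names are assumptions made for illustration, and only local JSONL files are handled.

```python
import json
from pathlib import Path


class KeyedPreferenceDataset:
    """Hypothetical key-driven preference dataset (illustration only)."""

    def __init__(self, data_path, prompt_key="prompt",
                 chosen_key="chosen", rejected_key="rejected"):
        # The keys tell the class which columns of the raw file to read.
        self.prompt_key = prompt_key
        self.chosen_key = chosen_key
        self.rejected_key = rejected_key
        with Path(data_path).open(encoding="utf-8") as f:
            # One JSON object per non-empty line (JSONL).
            self.rows = [json.loads(line) for line in f if line.strip()]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        # Map dataset-specific column names onto a fixed schema.
        return {
            "prompt": row[self.prompt_key],
            "chosen": row[self.chosen_key],
            "rejected": row[self.rejected_key],
        }
```

With a class of this shape, a dataset whose prompt column is called context only needs prompt_key="context" rather than a new subclass.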
After the refactor, the usage will become:
- For the built-in datasets, the usage is the same as before.
- For general datasets (local/hf), an example for DPO is below.
data:
  train_data_path: /path/to/local/train_dataset.jsonl
  val_data_path: /path/to/local/val_dataset.jsonl
  dataset_name: BinaryPreferenceDataset
  prompt_key: prompt
  chosen_key: chosen
  rejected_key: rejected
Step2: Decouple Train and Validation Dataset
The train and validation datasets are currently coupled, which means we need to write the same logic twice (once for train and once for eval) whenever we add support for a new dataset, so it makes sense to decouple them.
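One way to picture the decoupling is a single builder that is called once per split, so neither split needs its own loading logic. The sketch below is an assumption about the shape of such code, not the project's implementation: build_dataset, build_splits, the registry dict, and the RecordingDataset stand-in are all hypothetical names.

```python
class RecordingDataset:
    """Stand-in dataset that only stores its arguments (illustration only)."""

    def __init__(self, data_path, **keys):
        self.data_path = data_path
        self.keys = keys


def build_dataset(split_cfg, registry):
    """Build one dataset from a single split's config (hypothetical helper)."""
    cfg = dict(split_cfg)  # copy so the caller's config is not mutated
    dataset_cls = registry[cfg.pop("dataset_name")]
    return dataset_cls(**cfg)


def build_splits(data_cfg, registry):
    """Run every split (train, validation, ...) through the same code path."""
    return {split: build_dataset(cfg, registry) for split, cfg in data_cfg.items()}


registry = {"BinaryPreferenceDataset": RecordingDataset}
data_cfg = {
    "train": {
        "data_path": "/path/to/local/train_dataset.jsonl",
        "dataset_name": "BinaryPreferenceDataset",
        "prompt_key": "prompt",
    },
    "validation": {
        "data_path": "/path/to/local/val_dataset.jsonl",
        "dataset_name": "BinaryPreferenceDataset",
        "prompt_key": "prompt",
    },
}
splits = build_splits(data_cfg, registry)
```

Because each split carries its own config, the validation set can use a different dataset class or different keys than the train set without any extra branching.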
After this, the usage will become:
data:
  train:
    data_path: /path/to/local/train_dataset.jsonl
    dataset_name: BinaryPreferenceDataset
    prompt_key: prompt
    chosen_key: chosen
    rejected_key: rejected
  validation:
    data_path: /path/to/local/val_dataset.jsonl
    dataset_name: BinaryPreferenceDataset
    prompt_key: prompt
    chosen_key: chosen
    rejected_key: rejected
Step3: Multiple Datasets Support
After this, the usage will become:
data:
  train:
    # this dataset will override prompt_key and use the default values for other vars
    - data_path: /path/to/local/train_dataset_1.jsonl
      prompt_key: context
    # this dataset will use all the default values
    - data_path: /path/to/local/train_dataset_2.jsonl
  validation:
    - data_path: /path/to/local/val_dataset.jsonl
  default:
    # will use below vars as default values if dataset doesn't specify it
    dataset_name: BinaryPreferenceDataset
    prompt_key: prompt
    chosen_key: chosen
    rejected_key: rejected
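The default-override behaviour in the config above can be sketched as a plain dictionary merge. This is only an illustration under the assumption that per-dataset values take precedence over the default block; resolve_datasets is a hypothetical helper, and the issue does not say how the merge is actually implemented.

```python
def resolve_datasets(data_cfg):
    """Fill each dataset entry with values from `default` (hypothetical helper)."""
    defaults = data_cfg.get("default", {})
    resolved = {}
    for split, entries in data_cfg.items():
        if split == "default":
            continue
        # Later keys win in a dict merge, so per-dataset values override defaults.
        resolved[split] = [{**defaults, **entry} for entry in entries]
    return resolved


# Same structure as the YAML example, written as a Python dict.
data_cfg = {
    "train": [
        {"data_path": "/path/to/local/train_dataset_1.jsonl", "prompt_key": "context"},
        {"data_path": "/path/to/local/train_dataset_2.jsonl"},
    ],
    "validation": [
        {"data_path": "/path/to/local/val_dataset.jsonl"},
    ],
    "default": {
        "dataset_name": "BinaryPreferenceDataset",
        "prompt_key": "prompt",
        "chosen_key": "chosen",
        "rejected_key": "rejected",
    },
}
resolved = resolve_datasets(data_cfg)
```

Here the first train entry keeps prompt_key set to context, while every other entry inherits all four values from the default block.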