Right now a potentially large dataset has to be fully downloaded from HF before we can train on it, which is a hassle. The `datasets` library can instead return an `IterableDataset` or `IterableDatasetDict` that streams the data in as it is consumed, rather than blocking until everything is loaded.
These classes are handled differently from the regular `Dataset` and `DatasetDict`, though, so we'll need to decide whether to support them.
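
To make the difference in handling concrete, here is a minimal sketch that uses pure-Python stand-ins rather than the real `datasets` classes. `MapStyleStandIn`, `StreamingStandIn`, and `first_examples` are hypothetical names made up for illustration. In the library itself, streaming is requested with `load_dataset(..., streaming=True)`. The point is only that code relying on `len()` or indexing works on a loaded dataset but not on a streamed one, while iteration works on both.

```python
from itertools import islice


class MapStyleStandIn:
    """Hypothetical stand-in for a fully loaded dataset: length and random access."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        return self._rows[index]

    def __iter__(self):
        return iter(self._rows)


class StreamingStandIn:
    """Hypothetical stand-in for a streamed dataset: iteration only, no len() or indexing."""

    def __init__(self, n_rows):
        self._n_rows = n_rows

    def __iter__(self):
        # Rows are produced lazily, one at a time, instead of being held in memory.
        for i in range(self._n_rows):
            yield {"text": f"example {i}"}


def first_examples(dataset, n):
    """Return the first n examples using iteration only, so both styles are accepted."""
    return list(islice(iter(dataset), n))


if __name__ == "__main__":
    loaded = MapStyleStandIn({"text": f"example {i}"} for i in range(5))
    streamed = StreamingStandIn(5)

    print(len(loaded), loaded[0])       # length and indexing are available
    print(first_examples(loaded, 2))    # iteration works
    print(first_examples(streamed, 2))  # iteration works here too

    try:
        len(streamed)                   # a streamed dataset has no known length
    except TypeError as err:
        print("len() not supported:", err)
```

If we do support streaming, any training code that currently depends on the dataset length or on random access would need an iteration-only path along these lines.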