Allow for huggingface dataset streaming #16

@harrykeightley

Description

Having to download a potentially large dataset from the Hugging Face Hub before training can start is a hassle. The datasets library can stream data instead: calling load_dataset with streaming=True returns an IterableDataset or IterableDatasetDict, which yields examples as they arrive rather than blocking until the whole dataset has been downloaded.

These classes behave differently from the regular Dataset and DatasetDict, though (for example, they do not support len() or random access by index), so we'll need to decide whether to support them.
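To make the difference concrete, here is a rough sketch of what handling both kinds could look like. The helper names (`load_split`, `known_length`, `first_examples`) are hypothetical and not part of this project or of the datasets library; only `load_dataset(..., streaming=True)` is the real library call, and it is imported lazily so the other helpers do not depend on it.

```python
from itertools import islice
from typing import Any, Iterable, List, Optional


def load_split(path: str, split: str = "train", streaming: bool = True):
    """Hypothetical wrapper: load one split, optionally as a stream.

    With streaming=True, datasets.load_dataset returns an IterableDataset
    rather than a Dataset, so the data is not downloaded up front.
    """
    # Imported lazily so the helpers below work without the package installed.
    from datasets import load_dataset

    return load_dataset(path, split=split, streaming=streaming)


def known_length(ds: Any) -> Optional[int]:
    """Return len(ds) for map-style datasets, or None when there is no length."""
    try:
        return len(ds)
    except TypeError:
        # Streamed datasets (like plain generators) do not define __len__.
        return None


def first_examples(ds: Iterable[Any], n: int) -> List[Any]:
    """Take the first n examples by iterating, without indexing into ds."""
    return list(islice(ds, n))
```

Anything in the training code that relies on `len(ds)` or `ds[i]` (progress bars, step counts, index-based sampling) would need a fallback like `known_length` returning `None`, which is probably the main cost of supporting streaming.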

Metadata

Labels: enhancement (New feature or request)