Description
The goal of this ticket is to create a small but representative collection of datasets that can be used to test the dataset import feature, both manually and through automated regression tests.
The datasets should cover a variety of formats, media types, annotations, and edge cases. At the same time, they should be kept relatively small so that tests run quickly.
Naming convention: <annotation_type>-<dataset_format>.zip. Examples:
- single_label-coco.zip
- bounding_box-datumaro.zip
- polygon-geti.zip
All of the following combinations must be covered:
- single_label -> voc, datumaro, geti
- multi_label -> datumaro, geti
- bounding_box -> coco, yolo, datumaro, geti
- polygon -> coco, datumaro, geti
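The coverage list above can be turned into a quick completeness check. The following is a minimal sketch, assuming the archives are all collected in one directory; the function names and the `dataset_dir` argument are hypothetical and not part of any existing test suite.

```python
from pathlib import Path

# Required dataset formats per annotation type, copied from the list above.
REQUIRED_COMBINATIONS = {
    "single_label": ["voc", "datumaro", "geti"],
    "multi_label": ["datumaro", "geti"],
    "bounding_box": ["coco", "yolo", "datumaro", "geti"],
    "polygon": ["coco", "datumaro", "geti"],
}


def expected_archive_names(combinations=REQUIRED_COMBINATIONS):
    """Build '<annotation_type>-<dataset_format>.zip' for every required pair."""
    return [
        f"{annotation_type}-{dataset_format}.zip"
        for annotation_type, formats in combinations.items()
        for dataset_format in formats
    ]


def missing_archives(dataset_dir, combinations=REQUIRED_COMBINATIONS):
    """Return the required archive names that are not present in dataset_dir."""
    present = {path.name for path in Path(dataset_dir).glob("*.zip")}
    return [name for name in expected_archive_names(combinations) if name not in present]
```

Calling `missing_archives("path/to/datasets")` on a (hypothetical) directory returns the names still to be produced, which may be handy as a checklist while the collection is being assembled.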
Dataset preparation steps
Gather a set of free stock images and videos from public archives, such as https://www.pexels.com/. Alternatively, download some datasets from Kaggle, or generate them synthetically.
I recommend choosing low resolutions (e.g. 640 × 480) to minimize the archive size and the import/export time. For the same reason, choose short videos.
Preferably, gather images and videos with a mix of all supported extensions (jpg, tif, webp, mp4, avi, ...).
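To keep an eye on the extension mix and the total size while gathering media, something like the sketch below may help. It uses only the standard library and is an illustration, not an existing tool: `summarize_media`, `missing_extensions`, and `NAMED_EXTENSIONS` are hypothetical names, and the extension set contains only the extensions named above, not the full list of supported ones.

```python
from collections import Counter
from pathlib import Path

# Only the extensions named in the text; the full supported list is not given here.
NAMED_EXTENSIONS = {"jpg", "tif", "webp", "mp4", "avi"}


def summarize_media(media_dir):
    """Count files per lowercase extension and sum their sizes in bytes."""
    counts = Counter()
    total_bytes = 0
    for path in Path(media_dir).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower().lstrip(".")] += 1
            total_bytes += path.stat().st_size
    return counts, total_bytes


def missing_extensions(counts, wanted=NAMED_EXTENSIONS):
    """Return the wanted extensions that have no file in the summary."""
    return sorted(wanted - set(counts))
```

Checking image resolution or video duration would need a media library, so it is left out of this sketch.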
For 'geti' format
- Create a project in Geti v3.0 of type <project type>
- Upload images and videos
- Annotate some images with different objects and labels
  - Leave some images unannotated
  - Annotate some images with the empty label
  - Assign some images to a specific subset
- Annotate some video frames with different objects and labels
  - Annotate some video frames with the empty label
- Export the dataset in Geti format
- Rename the archive according to the above convention
Project type:
- multiclass classification
- multilabel classification
- detection
- instance segmentation
For all other formats
- Create a project in Geti v2.13 of type <project type>
- Upload images and videos
- Annotate some images with different objects and labels
  - Leave some images unannotated
  - Annotate some images with the empty label
- Annotate some video frames with different objects and labels
  - Annotate some video frames with the empty label
- Export the dataset in all supported formats
- Rename the archive according to the above convention
Project type:
- multiclass classification
- multilabel classification
- detection
- instance segmentation
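The renaming step could be scripted along the following lines. This is a sketch under stated assumptions: the mapping from project type to annotation-type prefix is inferred from the matching order of the two lists in this ticket and is not confirmed anywhere, and `target_name` and `rename_export` are hypothetical helpers.

```python
from pathlib import Path

# ASSUMPTION: mapping inferred from the order of the lists in this ticket.
PROJECT_TYPE_TO_ANNOTATION = {
    "multiclass classification": "single_label",
    "multilabel classification": "multi_label",
    "detection": "bounding_box",
    "instance segmentation": "polygon",
}


def target_name(project_type, dataset_format):
    """Build '<annotation_type>-<dataset_format>.zip' for an exported archive."""
    annotation_type = PROJECT_TYPE_TO_ANNOTATION[project_type]
    return f"{annotation_type}-{dataset_format}.zip"


def rename_export(archive_path, project_type, dataset_format):
    """Rename an exported archive in place; refuse to overwrite an existing file."""
    source = Path(archive_path)
    destination = source.with_name(target_name(project_type, dataset_format))
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")
    source.rename(destination)
    return destination
```

For example, `rename_export("export.zip", "detection", "yolo")` would rename a (hypothetical) `export.zip` to `bounding_box-yolo.zip` in the same directory.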