Prepare sample datasets to test dataset import #5938

@leoll2

Description

The goal of this ticket is to create a small but representative collection of datasets that can be used to test the dataset import feature, both manually and through automated regression tests.

The datasets should cover a variety of formats, media types, annotations, and edge cases. At the same time, they should be relatively small to keep the tests fast.

Naming convention: <annotation_type>-<dataset_format>.zip. Examples:

  • single_label-voc.zip
  • bounding_box-datumaro.zip
  • polygon-geti.zip

All of the following combinations must be covered:

  • single_label -> voc, datumaro, geti
  • multi_label -> datumaro, geti
  • bounding_box -> coco, yolo, datumaro, geti
  • polygon -> coco, datumaro, geti
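As a rough aid for checking coverage, the combinations above can be expanded into the full list of expected archive names. This is only a sketch: the function names and the idea of keeping all archives in one directory are assumptions, not part of the ticket.

```python
from pathlib import Path

# Required combinations, copied from the list above.
REQUIRED = {
    "single_label": ["voc", "datumaro", "geti"],
    "multi_label": ["datumaro", "geti"],
    "bounding_box": ["coco", "yolo", "datumaro", "geti"],
    "polygon": ["coco", "datumaro", "geti"],
}


def expected_archive_names() -> list[str]:
    """Return every <annotation_type>-<dataset_format>.zip name that must exist."""
    return [
        f"{annotation_type}-{dataset_format}.zip"
        for annotation_type, formats in REQUIRED.items()
        for dataset_format in formats
    ]


def missing_archives(datasets_dir: Path) -> list[str]:
    """List the required archives that are not present in datasets_dir (hypothetical layout)."""
    present = {p.name for p in datasets_dir.glob("*.zip")}
    return [name for name in expected_archive_names() if name not in present]
```

Calling `missing_archives(Path("datasets"))` on a hypothetical `datasets` folder would return the names still to be prepared.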

Dataset preparation steps

Gather a set of free stock images and videos from public archives, such as https://www.pexels.com/. Alternatively, download some datasets from Kaggle, or generate them synthetically.
I recommend choosing low resolutions (e.g. 640 × 480) to minimize the archive size and the import/export time. For the same reason, choose short videos.
Preferably, gather images and videos that together cover all supported extensions (jpg, tif, webp, mp4, avi, ...)
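To see whether the gathered media already covers the desired mix of extensions, something like the following could help. It is a minimal sketch: the folder layout and function names are hypothetical, and the extension set only contains the ones named above, so it is likely incomplete.

```python
from collections import Counter
from pathlib import Path

# Extensions named in the ticket; the "..." there means this set is probably incomplete.
WANTED_EXTENSIONS = {".jpg", ".tif", ".webp", ".mp4", ".avi"}


def extension_counts(media_dir: Path) -> Counter:
    """Count files per lowercase extension, searching media_dir recursively."""
    return Counter(p.suffix.lower() for p in media_dir.rglob("*") if p.is_file())


def uncovered_extensions(media_dir: Path) -> set[str]:
    """Return the wanted extensions that no gathered file uses yet."""
    return WANTED_EXTENSIONS - set(extension_counts(media_dir))
```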

For 'geti' format

  1. Create a project in Geti v3.0 of type <project type>
  2. Upload images and videos
  3. Annotate some images with different objects and labels
    Leave some images unannotated
    Annotate some images with the empty label
    Assign some images to a specific subset
  4. Annotate some video frames with different objects and labels
    Annotate some video frames with the empty label
  5. Export the dataset in Geti format
  6. Rename the archive according to the above convention

Project type:

  • multiclass classification
  • multilabel classification
  • detection
  • instance segmentation

For all other formats

  1. Create a project in Geti v2.13 of type <project type>
  2. Upload images and videos
  3. Annotate some images with different objects and labels
    Leave some images unannotated
    Annotate some images with the empty label
  4. Annotate some video frames with different objects and labels
    Annotate some video frames with the empty label
  5. Export the dataset in all supported formats
  6. Rename each exported archive according to the above convention

Project type:

  • multiclass classification
  • multilabel classification
  • detection
  • instance segmentation
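The renaming step in both procedures could be scripted along these lines. Note the assumption: the ticket does not state which project type corresponds to which annotation type, so the mapping below is inferred from the order of the two lists and should be double-checked. The helper names are hypothetical.

```python
from pathlib import Path

# ASSUMPTION: this pairing is inferred from list order; the ticket does not state it.
ANNOTATION_TYPE = {
    "multiclass classification": "single_label",
    "multilabel classification": "multi_label",
    "detection": "bounding_box",
    "instance segmentation": "polygon",
}


def target_name(project_type: str, dataset_format: str) -> str:
    """Build <annotation_type>-<dataset_format>.zip for an exported archive."""
    return f"{ANNOTATION_TYPE[project_type]}-{dataset_format.lower()}.zip"


def rename_export(archive: Path, project_type: str, dataset_format: str) -> Path:
    """Rename an exported archive in place and return its new path."""
    target = archive.with_name(target_name(project_type, dataset_format))
    if target.exists():
        # Refuse to overwrite an archive that was already prepared.
        raise FileExistsError(target)
    return archive.rename(target)
```

For example, `rename_export(Path("export.zip"), "detection", "coco")` would rename a hypothetical `export.zip` to `bounding_box-coco.zip` in the same folder.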

Metadata

Labels

Geti Backend: Issues related to the Geti application server
