Thanks for the impressive work.
I have a question about the pretraining process of DETR, which you mentioned here: https://github.com/amazon-science/tubelet-transformer#training
From here (#4 (comment)),
I gathered that you took DETR weights trained on the COCO dataset and re-trained them on AVA to detect human instances.
- Could you describe this process in more detail? (e.g., how did you modify the DETR structure to detect only humans, and what exactly were the input, position embedding, etc.?)
- Was the intention of this pretraining to let the queries focus more on classification, once TubeR's DETR architecture has learned to localize actors well enough?
- Have you tried training the whole architecture without the pretrained DETR weights? I have tried several times but could not find a configuration that made training actually converge.
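For context on the first bullet, here is a minimal sketch of how I imagine the COCO-pretrained DETR checkpoint could be adapted to a single "person" class: drop the classification head from the checkpoint (its shape no longer matches), load the remaining weights non-strictly, and re-initialise a 2-way head (person + DETR's extra "no object" slot). The names `class_embed` and the hidden width of 256 follow the public DETR codebase; everything else is my assumption, not your actual script.

```python
import torch
from torch import nn

hidden_dim = 256   # DETR's transformer width (detr_resnet50)
num_classes = 1    # person only; DETR appends one "no object" class

# Stand-in for a COCO-pretrained checkpoint: the real one has a
# 92-way classification head (91 COCO classes + no-object).
checkpoint = {
    "class_embed.weight": torch.randn(92, hidden_dim),
    "class_embed.bias": torch.randn(92),
    # ... backbone / transformer weights would also be here ...
}

# Drop the old classification head so its shape cannot conflict;
# the rest would be loaded with model.load_state_dict(..., strict=False).
filtered = {k: v for k, v in checkpoint.items()
            if not k.startswith("class_embed")}

# Fresh 2-way head: index 0 = person, index 1 = no-object.
class_embed = nn.Linear(hidden_dim, num_classes + 1)
```

Is this roughly what you did, or did you also change the query count or the matcher/loss weights for the single-class setting?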
Thanks in advance.