Hi, thanks a lot for your great work on this benchmark!
I’m a bit confused about the ground truth of two samples and would like to confirm whether these are annotation issues or whether my understanding is incorrect.
Case 1
{
"category": "VD",
"subcategory": "video",
"visual_input": "2",
"set_id": "17",
"figure_id": "1",
"sample_note": "animation",
"question_id": "1",
"question": "According to the positive sequence of the images, is this cartoon character getting far away? Answer in one sentence.",
"gt_answer_details": "The cartoon character is getting far away.",
"gt_answer": "1",
"filename": "./VD/video/17_1.png"
}
From the image sequence, it doesn’t look to me like the cartoon character is getting farther away, but the gt_answer is 1 (True).
Is this intended, or could this be a labeling error?
Case 2
{
"category": "VD",
"subcategory": "video",
"visual_input": "2",
"set_id": "14",
"figure_id": "1",
"sample_note": "skating meme",
"question_id": "3",
"question": "They are skating to right. According to the positive sequence of the images, are they in the correct order? Answer in one sentence.",
"gt_answer_details": "They are skating to the right",
"gt_answer": "1",
"filename": "./VD/video/14_1.png"
}
In this case, I interpret the characters as skating to the left, but the ground truth says they are skating to the right (gt_answer = 1).
So my question is:
Are these ground truths actually correct, and am I misunderstanding the intended perspective/direction?
Or are these potential annotation errors in the dataset?
Any clarification would be greatly appreciated.