Add task 1160 from MRS by Palipoor · Pull Request #375 · allenai/natural-instructions

Palipoor · 2021-10-05T17:57:42Z

This task is created from the MRS dataset from issue #283.
However, I doubt whether this is a good addition. The data is derived from Reddit replies, which are not good-quality examples to learn from. I've cleaned them as much as I could, but there is still a lot of nonsense. I'm submitting the English task to get other people's opinions. If it's good enough, or there's a good way to filter out the nonsense, I will go on and add the other languages too.
@swarooprm @danyaljj

tasks/README.md

danyaljj · 2021-10-07T00:30:27Z

Yeah, the data is quite noisy. I am leaning towards a "no", unless we can somehow clean it up around a particular subject.
@swarooprm feel free to share your thoughts.

swarooprm · 2021-10-20T07:09:15Z

I like this task, but I agree that noise is a concern.
From what I see, longer instances (both longer inputs and longer outputs) are less prone to noise, though I only checked a few instances.
Could you check whether this holds? If it does, we can filter instances by length.

Palipoor · 2021-10-20T15:30:38Z

I checked again and I think there's noisy data in short instances too.

swarooprm · 2021-10-20T21:34:49Z

> I checked again and I think there's noisy data in short instances too.

What I meant above was to retain only the longer instances. Longer instances seem to contain less noise.
It's OK if you still see noise in the longer instances; in that case I am fine if we drop this task.
Subject-wise filtering may be another option (some subjects may contain less noise, e.g. scientific topic discussions).
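
The length-based filtering suggested above could be sketched roughly as follows. This is only an illustration, not code from the PR: it assumes each instance is a dict with an `input` string and an `output` that is either a string or a list of strings, and the function name and both word thresholds are hypothetical placeholders, since the thread does not settle on a threshold.

```python
from typing import Any, Dict, List


def word_count(text: str) -> int:
    """Count whitespace-separated tokens in a string."""
    return len(text.split())


def keep_long_instances(
    instances: List[Dict[str, Any]],
    min_input_words: int = 10,   # hypothetical threshold, not from the PR
    min_output_words: int = 5,   # hypothetical threshold, not from the PR
) -> List[Dict[str, Any]]:
    """Keep instances whose input and every output meet the minimum lengths."""
    kept = []
    for inst in instances:
        outputs = inst["output"]
        # Assumed layout: "output" may be a single string or a list of strings.
        if isinstance(outputs, str):
            outputs = [outputs]
        if not outputs:
            continue  # drop instances that have no output at all
        long_input = word_count(inst["input"]) >= min_input_words
        long_outputs = all(word_count(o) >= min_output_words for o in outputs)
        if long_input and long_outputs:
            kept.append(inst)
    return kept


sample = [
    {"input": "lol", "output": ["ok"]},
    {
        "input": "one two three four five six seven eight nine ten",
        "output": ["a b c d e"],
    },
]
filtered = keep_long_instances(sample)
```

Length is only a proxy for quality here, so a manual spot check of the retained instances would still be needed before deciding whether the task is worth keeping.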

Palipoor · 2021-10-21T02:15:30Z

Oh sorry, I didn't read it carefully!
I will check it tomorrow and update the task if it's ok.

Palipoor · 2021-10-24T15:17:46Z

Sorry for being late. I think it makes sense to keep the longer instances (not sure about the threshold though). Should I add other languages too?

swarooprm · 2021-10-27T03:42:38Z

> Sorry for being late. I think it makes sense to keep the longer instances (not sure about the threshold though). Should I add other languages too?

Yes, if you have time, feel free to add. It's also fine if you skip this and decide to focus on other ToDos we have in this project.

danyaljj · 2021-10-27T19:30:50Z

I agree with Swaroop. If cleaning up this PR will take more than 1 hr, I would say it's not worth it.

Add task 1160 from MRS

b0ee663

danyaljj reviewed Oct 7, 2021

danyaljj added the onhold label Nov 4, 2021

danyaljj marked this pull request as draft November 4, 2021 21:44

Palipoor wants to merge 1 commit into allenai:master from Palipoor:mrs