Added the stopword removal transformation #268

Open
jnyiloke wants to merge 14 commits into GEM-benchmark:main from jnyiloke:stopword_removal

@jnyiloke jnyiloke commented Sep 1, 2021

No description provided.

@@ -0,0 +1,14 @@
# Stopword Removal
Removes stopwords from a piece of text.
Collaborator

Hi @juanyiloke, please add your name, email, and affiliation.

Author

@jnyiloke jnyiloke Sep 3, 2021

Collaborator

thanks!

        "sentence": "OMG!!! jUSTin is AmAZEballs!!!"
    },
    "outputs": [{
        "sentence": "OMG!!! jUSTin is AmAZEballs!!!"
Contributor

Just curious, why isn't the stopword "is" removed in this particular example?

Author

Good catch, fixed!
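
For reference, here is a minimal sketch of the behaviour the corrected expected output should reflect, with "is" dropped from the example sentence. It is not the PR's implementation: it assumes whitespace tokenization and a tiny hard-coded stopword set (the PR relies on nltk), and the helper name `remove_stopwords` is hypothetical. Tokens are compared case-insensitively after trimming surrounding punctuation.

```python
import string

# Illustrative subset only (an assumption); the PR relies on nltk's English stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "in", "and"}


def remove_stopwords(text: str) -> str:
    """Drop whitespace-separated tokens whose lowercased,
    punctuation-trimmed form is a stopword."""
    kept = []
    for token in text.split():
        # Compare on a normalized copy, but keep the original token text.
        core = token.strip(string.punctuation).lower()
        if core in STOPWORDS:
            continue
        kept.append(token)
    return " ".join(kept)


print(remove_stopwords("OMG!!! jUSTin is AmAZEballs!!!"))
# OMG!!! jUSTin AmAZEballs!!!
```

Matching on whole tokens rather than substrings is what keeps "jUSTin" intact even though it contains "in".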



def stopword_remove(text, max_outputs=1):
    """
Contributor

The max_outputs argument isn't used anywhere in the stopword_remove function?

Author

@jnyiloke jnyiloke Sep 18, 2021

Fixed!

        super().__init__(seed, max_outputs=max_outputs)

    def generate(self, raw_text: str):
        pertubed_text = stopword_remove(
Collaborator

Minor cosmetic change: pertubed_text -> perturbed_text

Author

Good catch, fixed!

        TaskType.TEXT_TO_TEXT_GENERATION,
    ]
    languages = ["en"]
    heavy = True
Collaborator

I think we can mark this transformation as light, i.e. heavy = False, since we are only using the nltk package.

Author

Done!

@jnyiloke jnyiloke requested a review from aadesh11 September 18, 2021 20:00
        TaskType.TEXT_TO_TEXT_GENERATION,
    ]
    languages = ["en"]
    heavy = False
Contributor

Need to add the keywords

Contributor

Seconded

Author: Juan Yi Loke
Email: juanyi.loke@mail.utoronto.ca
Affiliation: University of Toronto

Contributor

I believe you would need to add the Robustness Evaluation as per the instructions in the email :)

Contributor

@msobrevillac msobrevillac left a comment

I think the transformation is well implemented; however, it needs a well-motivated description to show how useful it is for the project.

@kaustubhdhole
Collaborator

Okay, this transformation looks great. I do have a suggestion, though. Removing all stopwords can be dangerous; the Shakespeare sentence that you added in the README is a clear example. It might be better to provide a parameter that controls how much change is permitted in a sentence, e.g. how many stopwords can be removed at a time. That way you might also be able to generate multiple sentences with little loss in meaning. Besides, please add appropriate keywords and a robustness evaluation to make this PR stronger. You might also want to mention any work on the influence of stopword removal, which might give better insights.
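
One way the suggestion above could look in code is sketched below. It is an illustration under stated assumptions, not the PR's implementation: the function name `remove_some_stopwords`, the `max_removed` parameter, and the small hard-coded stopword set are all hypothetical (the PR uses nltk's list). Each variant drops at most `max_removed` randomly chosen stopwords, and up to `max_outputs` distinct variants are returned.

```python
import random

# Illustrative subset only (an assumption); a real implementation would
# likely load nltk's English stopword list instead.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "or", "not", "be"}


def remove_some_stopwords(text, max_removed=1, max_outputs=1, seed=0):
    """Return up to max_outputs distinct variants of text, each with at most
    max_removed stopwords dropped (positions chosen at random)."""
    rng = random.Random(seed)
    tokens = text.split()
    # Indices of tokens that are stopwords (case-insensitive, punctuation-trimmed).
    candidates = [
        i for i, tok in enumerate(tokens)
        if tok.strip(".,!?;:\"'").lower() in STOPWORDS
    ]
    if not candidates or max_removed < 1:
        return [text]
    k = min(max_removed, len(candidates))
    outputs = []
    # Bounded number of attempts, so the loop ends even when few
    # distinct variants exist.
    for _ in range(max_outputs * 10):
        drop = set(rng.sample(candidates, k))
        variant = " ".join(tok for i, tok in enumerate(tokens) if i not in drop)
        if variant not in outputs:
            outputs.append(variant)
        if len(outputs) >= max_outputs:
            break
    return outputs


# "To be, or not to be" is an example chosen here, not necessarily the
# sentence in the README; every token in it is a stopword.
for variant in remove_some_stopwords("To be, or not to be", max_removed=2, max_outputs=3):
    print(variant)
```

Capping the number of removals keeps most of the sentence intact, and sampling different positions is what makes `max_outputs` greater than 1 meaningful.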

Collaborator

@aadesh11 aadesh11 left a comment

Please also add your transformation name in test/mapper.py so that the test job can run your test cases.

@jnyiloke
Author

Thanks so much for the reviews everyone, I'll get to them this weekend. It's midterms season for me but that should be over soon.

5 participants