Added the stopword removal transformation#268
Added the stopword removal transformation#268jnyiloke wants to merge 14 commits intoGEM-benchmark:mainfrom
Conversation
| @@ -0,0 +1,14 @@ | |||
| # Stopword Removal | |||
| Removes stopwords from a piece of text. | |||
There was a problem hiding this comment.
Hi @juanyiloke please add your name, email and affiliation.
| "sentence": "OMG!!! jUSTin is AmAZEballs!!!" | ||
| }, | ||
| "outputs": [{ | ||
| "sentence": "OMG!!! jUSTin is AmAZEballs!!!" |
There was a problem hiding this comment.
Just curious, why isn't the stopword "is" removed in this particular example?
|
|
||
|
|
||
| def stopword_remove(text, max_outputs=1): | ||
| """ |
There was a problem hiding this comment.
the max_outputs argument isn't used anywhere in the stopword_remove function?
| super().__init__(seed, max_outputs=max_outputs) | ||
|
|
||
| def generate(self, raw_text: str): | ||
| pertubed_text = stopword_remove( |
There was a problem hiding this comment.
Minor cosmetic change: pertubed_text -> perturbed_text
| TaskType.TEXT_TO_TEXT_GENERATION, | ||
| ] | ||
| languages = ["en"] | ||
| heavy = True |
There was a problem hiding this comment.
I think we can mark this transformation as light i.e heavy = False, as we are only using nltk package.
| TaskType.TEXT_TO_TEXT_GENERATION, | ||
| ] | ||
| languages = ["en"] | ||
| heavy = False |
There was a problem hiding this comment.
Need to add the keywords
| Author: Juan Yi Loke | ||
| Email: juanyi.loke@mail.utoronto.ca | ||
| Affliation: University of Toronto | ||
|
|
There was a problem hiding this comment.
I believe you would need to add the Robustness Evaluation as per the instructions in the email :)
msobrevillac
left a comment
There was a problem hiding this comment.
I think that the transformation is well-implemented, however, I think it needs for being well-motivated description in order to show the usefulness of this transformation for the project.
|
Okay, this transformation looks great. I do have a suggestion though: first, if you remove all stopwords, that can be dangerous: example the shakespeare sentence that you added in the README gives a clear example. It might be better you provide a parameter to control the amount of change that should be permitted in a sentence, eg. how many stopwords can be removed at a single time. This way you might be able to generate multiple sentences too with little loss in meaning. Besides, please add appropriate keywords and a robustness evaluation to make this PR stronger. Also, you might want to mention any work relating to the influence of stopword removal. (Might also give better insights). |
aadesh11
left a comment
There was a problem hiding this comment.
Please also add your transformation name in test/mapper.py so that the test job can run your test cases.
|
Thanks so much for the reviews everyone, I'll get to them this weekend. It's midterms season for me but that should be over soon. |
No description provided.