| Author name: Tabitha Sugumar | ||
| Author email: __ | ||
| Author Affiliation: __ | ||
|
|
Thanks for your changes, @tk-sugumar. Please add your email and affiliation.
|
|
||
| ## Examples of this transformation | ||
|
|
||
| Because this is a randomized transformation, in both the selection of gender and selection of name, test examples are impossible -- the output for a single sentence is expected to be different in each successive run. Instead I've provided some example sentences and outputs for reference. |
I believe you can pass a default seed as an argument to the initializer of your GenderRandomizer transformation. That makes the output consistent across runs, so you can include test cases in your test.json.
Quite a few of the PRs use this approach for test cases.
See for example:
https://github.com/GEM-benchmark/NL-Augmenter/pull/164/files
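To illustrate the suggestion, here is a minimal sketch of seeding a private random generator once in the initializer so that repeated runs give the same sequence of choices. The class `SeededNamePicker`, the helper `sample`, and the name lists are hypothetical stand-ins for illustration; they are not the PR's GenderRandomizer or its data.

```python
import random
from typing import List

# Hypothetical name lists, for illustration only; not the PR's actual data.
FEMALE_NAMES = ["Alice", "Maria", "Priya"]
MALE_NAMES = ["Bob", "Omar", "Chen"]


class SeededNamePicker:
    """Toy stand-in for a randomized transformation (not the PR's class)."""

    def __init__(self, seed: int = 0) -> None:
        # Seed a private generator once at construction. Two instances built
        # with the same seed then produce the same sequence of choices, and
        # the global `random` state is left untouched.
        self.rng = random.Random(seed)

    def pick(self) -> str:
        # Choose a gender pool with equal probability, then a name from it.
        pool = self.rng.choice([FEMALE_NAMES, MALE_NAMES])
        return self.rng.choice(pool)


def sample(seed: int, n: int = 5) -> List[str]:
    """Draw n names from a freshly seeded picker."""
    picker = SeededNamePicker(seed)
    return [picker.pick() for _ in range(n)]


if __name__ == "__main__":
    # Same seed, same sequence: this is what makes fixed test cases possible.
    print(sample(0) == sample(0))
```

With a fixed default seed, the expected outputs recorded in test.json stay valid from run to run; a caller who wants varied output can pass a different seed.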
Thanks Timothy! When I tried this, the same name was predicted for each sentence, so for use as intended I think the user would have to modify the code after downloading. Should I still go ahead and do this?
Hi Timothy, I added the seed in the initializer. The same names do get predicted each time, though; I hope that's OK! Test cases are also added in test.json.
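One possible explanation for seeing the same name on every sentence, which this thread does not confirm, is that the generator is re-seeded on each call rather than once at construction. The sketch below contrasts the two patterns; `NAMES`, `pick_reseeding`, and `PickOnce` are hypothetical and are not taken from the PR.

```python
import random

# Hypothetical name list, for illustration only.
NAMES = ["Alice", "Bob", "Maria", "Omar"]


def pick_reseeding(seed: int = 0) -> str:
    # Re-seeding on every call resets the generator to the same state,
    # so every call returns the same name.
    random.seed(seed)
    return random.choice(NAMES)


class PickOnce:
    def __init__(self, seed: int = 0) -> None:
        # Seeding once lets the generator advance between calls, while the
        # overall sequence is still reproducible across runs.
        self.rng = random.Random(seed)

    def pick(self) -> str:
        return self.rng.choice(NAMES)


if __name__ == "__main__":
    print({pick_reseeding() for _ in range(10)})   # a set with a single name
    picker = PickOnce()
    print([picker.pick() for _ in range(10)])      # a reproducible sequence
```

If the code already seeds only once, the repetition would have a different cause and this sketch would not apply.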
| Author Affiliation: Elsevier | ||
|
|
||
| ## What type of a transformation is this? | ||
| This transformation changes names in English texts, randomizing selection so there's an even chance of male and female names. It modifies pronouns to match the selected name. |
Please add an acknowledgement that names are not deterministic identifiers of someone's pronouns/gender :)
| Randomizes names in text for a 50/50 gender breakdown. Handles pronouns. | ||
| """ | ||
| nlp = spacy.load("en_core_web_sm", disable=["lemmatizer"]) | ||
| nlp.add_pipe("coreferee") |
Modified as given in example
| class GenderRandomizer(SentenceOperation): | ||
| tasks = [TaskType.TEXT_TO_TEXT_GENERATION] | ||
| languages = ["en"] | ||
|
|
…itialization, added tests to text.json
| ## What tasks does it intend to benefit? | ||
| This is intended to avoid gender bias in natural language processing models. Run this transformation on text data prior to using it to train a model. | ||
|
|
||
| ## Previous Work |
Importantly, please add a Data and Code Provenance section to your transformation. Also, it seems you've added about 109 files, which are hard to evaluate. I would suggest moving them into a separate pip project and then adding it to the requirements.txt.
Thanks! I've expanded on the data and code provenance, and put the description in a Data and Code Provenance section in the Readme.
On the 109 files: most of them come from the coreferee directory. This already exists as a library installable by pip, but when I was working on this it was only installable in Python 3.8, and the current version requires Python 3.9. Since these transformations are required to be compatible with Python 3.7, I downloaded it here to make it installable in Python 3.7.
|
Hi @tk-sugumar, it won't be a good idea to merge all of these into the repository. It would be better to make a pip library out of it in a separate repository and call only the relevant parts here. @AbinayaM02, thoughts? |
Agreed. As @kaustubhdhole mentioned, you should install the library (specify it in requirements.txt) and use it for your transformation, @tk-sugumar. You can check whether the library works with Python 3.7. |
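As a rough way to run that check, the sketch below reports the running interpreter's version and whether a top-level module can be located. It is only a first-pass check under the assumption that the library has already been installed with pip in a Python 3.7 environment; the helpers `can_import` and `report` are hypothetical names, and finding the module does not prove the transformation itself works.

```python
import importlib.util
import sys


def can_import(module_name: str) -> bool:
    """Return True if a top-level module can be located by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


def report(module_name: str) -> str:
    """Summarize the interpreter version and whether the module was found."""
    version = "{}.{}".format(*sys.version_info[:2])
    status = "found" if can_import(module_name) else "not found"
    return "Python {}: {} {}".format(version, module_name, status)


if __name__ == "__main__":
    # Run this inside the Python 3.7 environment after installing the library.
    print(report("coreferee"))
```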