Description
in #75 (comment), @PatMyron said:
For simpler regexes, I occasionally intentionally include corner cases like Billiam and Boberto? since I'm matching real datasets, so it's no issue that they aren't typical. Inversely, false positive matches are detrimental, so I exclude looser matches like:
bill,william,billy,robert,willie,fred
Curious if you'd consider excluding or separating looser matches!
Now I (@NickCrews) reply:
Yes, I would be willing to remove some looser matches. I would lean towards removing "fred" from that list but keeping the rest, as I know Bill/Robert is a moderately common solution.
Regardless, my goal is for it to be trivial for users to customize this list to their needs, both adding and removing name relationships. Then it isn't nearly as important for us all to agree on the level of strictness. Have you found it easy to customize this list? Are you using the Python API, the raw CSV, or something else?
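For anyone working from the raw CSV, here is a rough sketch of what removing a looser match could look like. It uses only the standard library, not this project's Python API. It assumes each row lists a name followed by its related names, as in the example row quoted above; `drop_loose_matches` and `loose_pairs` are hypothetical names made up for illustration.

```python
import csv
import io


def drop_loose_matches(rows, loose_pairs):
    """Return copies of the rows with the given (name, related_name) pairs removed.

    Assumption: the first cell of each row is a name and the remaining
    cells are names related to it.
    """
    cleaned = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        name, related = row[0], row[1:]
        kept = [r for r in related if (name, r) not in loose_pairs]
        cleaned.append([name, *kept])
    return cleaned


# The example row from this thread, read the way a CSV file would be.
raw = "bill,william,billy,robert,willie,fred\n"
rows = list(csv.reader(io.StringIO(raw)))

# Hypothetical choice: treat bill -> fred as too loose for this dataset.
result = drop_loose_matches(rows, {("bill", "fred")})
print(result)
```

Adding a relationship would be the mirror image: append the extra name to the relevant row before writing the rows back out with `csv.writer`.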