ensure upper() doesn't increase string length #9
Conversation
Is this good to review?
Looks good, nice catch! Can you add a link to this PR as a comment in the function? And can similar stuff happen for
Hm, this isn't really enough to match stdlib's re behavior: I am currently trying to figure out if Python exposes case folding information somewhere that I could use.
I can make that change. And yes there is a single character for which this is true for
Actually, don't worry about it, I will implement this in some other way. Thank you for making me aware of this issue, and thank you for providing a fix! I will properly implement the official Unicode
Thanks so much for ensuring there is a proper fix @MegaIng!!
Fixes dottxt-ai/outlines#773
Problem
In master, interegular uses char.upper(), which can convert one character into two, so the sets of strings described by accepts() and strings() become inconsistent.
accepts() and strings() inconsistency:
This change ensures that if the uppercased form of a character isn't of length 1, the original character is used.
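A minimal sketch of the idea, not the actual interegular patch: the helper name safe_upper is hypothetical and only illustrates the "fall back to the original character" rule described above. It uses "ß", whose uppercase form in Python is the two-character string "SS".

```python
def safe_upper(char: str) -> str:
    """Hypothetical helper: uppercase a single character, but keep the
    original if uppercasing would change the string length."""
    upper = char.upper()
    # str.upper() applies full Unicode case mapping, so one character
    # can expand to several (for example "ß" becomes "SS").
    return upper if len(upper) == 1 else char


print(safe_upper("a"))   # ordinary case: a single uppercase character
print(safe_upper("ß"))   # expanding case: the original character is kept
```

With a helper like this, every character still maps to exactly one alternative character, so a case-insensitive character set never silently gains a two-character string.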
This is consistent with the behavior of re:
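The original example is not preserved here. As a stand-in, the following sketch (an assumed illustration, not the snippet from the PR) checks how the standard library's re module treats "ß" under re.IGNORECASE: it compares character by character and does not expand one character into two.

```python
import re

# Case-insensitive matching in re works one character at a time,
# so the pattern "ß" does not match the two-character string "SS".
no_expansion = re.fullmatch("ß", "SS", re.IGNORECASE)

# The character still matches itself, as expected.
self_match = re.fullmatch("ß", "ß", re.IGNORECASE)

print(no_expansion is None)    # whether the expanded form was rejected
print(self_match is not None)  # whether the character matched itself
```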