Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strings: EqualFold is not equivalent to comparing ToLower #52204

Closed
bjkail opened this issue Apr 7, 2022 · 9 comments
Closed

strings: EqualFold is not equivalent to comparing ToLower #52204

bjkail opened this issue Apr 7, 2022 · 9 comments
Labels
Documentation FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@bjkail
Copy link

bjkail commented Apr 7, 2022

#52022 added this statement to the doc for strings.EqualFold:

EqualFold(s, t) is equivalent to Tolower(s) == Tolower(t).

This program seems to be a counterexample.

(Additionally, it seems like the doc meant to say ToLower rather than Tolower?)

@hopehook
Copy link
Member

hopehook commented Apr 7, 2022

Thanks. I'll follow up on this issue.

@hopehook
Copy link
Member

hopehook commented Apr 7, 2022

It seems like the doc meant to say ToLower rather than Tolower?

ToLower

@gopherbot
Copy link

Change https://go.dev/cl/398736 mentions this issue: strings, bytes: improve the description of simple case-folding in EqualFold

@typesanitizer
Copy link

typesanitizer commented Apr 7, 2022

Interesting! It seems like a bug in go staticcheck, which recommends replacing ToLower comparison with EqualFold, even though it is not semantics-preserving.

Also, I did write a fuzz test locally and it passed without finding a counter-example. Maybe the default fuzzing limit is too low.

Given that this transformation has been incorrectly described in multiple places:

  • DigitalOcean blog describes it as an "optimization" even though it is not semantics-preserving.
  • A popular StackOverflow answer has a comment recommending using < after ToLower, which would be inconsistent with EqualFold.

Would it make sense to add a note of caution to the ToLower docs? (And maybe to the ToUpper docs too?) Something like:

Avoid comparing two strings after case transformation; use EqualFold instead. For example, "ς" (Greek Small Letter Final Sigma) and "σ" (Greek Small Letter Sigma) are equivalent under case-folding but not under comparison via lowercase.

Doesn't need to be pure text, maybe it can be a runnable example.

@ainar-g
Copy link
Contributor

ainar-g commented Apr 7, 2022

Interestingly enough, this particular example seems to be okay with strings.ToUpper. See https://go.dev/play/p/JFkZxxPOhWe.

@typesanitizer
Copy link

typesanitizer commented Apr 7, 2022

Example where comparison via ToUpper is different from comparison via EqualFold. https://go.dev/play/p/9g3vAh5wCKN

@cherrymui cherrymui added Documentation NeedsFix The path to resolution is known, but the work has not been done. labels Apr 7, 2022
@cherrymui cherrymui added this to the Backlog milestone Apr 7, 2022
@ianlancetaylor
Copy link
Contributor

Thanks for the examples.

@hopehook Sorry for the incorrect suggestion.

@hopehook
Copy link
Member

hopehook commented Apr 8, 2022

@ianlancetaylor Thanks Ian for the help. I really underestimated the complexity here.

@hopehook
Copy link
Member

hopehook commented Apr 8, 2022

@typesanitizer @ainar-g Thanks for the examples.

I think you are more professional than me. If there is a good solution that can be implemented, I suggest that you can open another issue, solve it together, and help it become better.

@golang golang locked and limited conversation to collaborators Apr 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Documentation FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

7 participants