Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strings: Document use of simple case-folding in EqualFold #52022

Closed
typesanitizer opened this issue Mar 29, 2022 · 7 comments
Closed

strings: Document use of simple case-folding in EqualFold #52022

typesanitizer opened this issue Mar 29, 2022 · 7 comments
Labels
Documentation FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@typesanitizer
Copy link

At the moment, the documentation for EqualFold says:

EqualFold reports whether s and t, interpreted as UTF-8 strings, are equal under Unicode case-folding, which is a more general form of case-insensitivity.

However, this is under-specified. EqualFold uses simple case-folding, not full case-folding, which can be surprising. For example, Python 3 and Swift default to full case folding:

$ python3
> "ß".casefold()
'ss'
$ swift
> import Foundation
> "ß".folding(options: [.caseInsensitive], locale: nil)
$R0: String = "ss"

I think it would be better to explicitly state that simple case-folding is being used. Perhaps something like:

EqualFold reports whether s and t, interpreted as UTF-8 strings,
are equal under simple Unicode case-folding, which is a more
general form of case-insensitivity. For more information,
see https://www.unicode.org/Public/UNIDATA/CaseFolding.txt.

In general, EqualFold(s1, s2) == (ToLower(s1) == ToLower(s2)).

Perhaps it would be also helpful to have an example like:

func ExampleEqualFold() {
  fmt.Println(strings.EqualFold("AB", "ab")) // true because comparison uses simple case-folding
  fmt.Println(strings.EqualFold("ß", "ss"))  // false because comparison does not use full case-folding
  // Output: true
  // Output: false
}

One can also add a fuzz test if so desired:

func FuzzEqualFold(f *testing.F) {
    f.Fuzz(func(t *testing.T, s1 string, s2 string) {
        lower := strings.ToLower(s1) == strings.ToLower(s2)
        equalFold := strings.EqualFold(s1, s2)
        if lower != equalFold {
            t.Fatalf("mismatch between ToLower and EqualFold: s1 = %s, s2 = %s, via Lower = %v, via EqualFold = %v", s1, s2, lower, equalFold)
        }
    )
}
@ianlancetaylor ianlancetaylor added help wanted NeedsFix The path to resolution is known, but the work has not been done. labels Mar 29, 2022
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Mar 29, 2022
@hopehook
Copy link
Member

I'd be happy to submit a CL to fix this.

@gopherbot
Copy link

Change https://go.dev/cl/396616 mentions this issue: strings: document the use of simple case-folding in EqualFold

@hopehook
Copy link
Member

I think the same modification can be done to bytes.EqualFold

@typesanitizer
Copy link
Author

typesanitizer commented Mar 31, 2022

(I haven't set up a Gerrit account, so commenting here on the patch:) I just realized that in terms of wording, it may make sense to consistently use the same variables:

- In general, EqualFold(s1, s2) == (ToLower(s1) == ToLower(s2)).
+ In general, EqualFold(s, t) == (ToLower(s) == ToLower(t)).

@hopehook
Copy link
Member

(I haven't set up a Gerrit account, so commenting here on the patch:) I just realized that in terms of wording, it may make sense to consistently use the same variables:

- In general, EqualFold(s1, s2) == (ToLower(s1) == ToLower(s2)).
+ In general, EqualFold(s, t) == (ToLower(s) == ToLower(t)).

Resolved

@hopehook
Copy link
Member

@typesanitizer
There is a guide on how to contribute to golang: https://go.dev/doc/contribute.
It takes time to fix this issue, if you are interested, you can take a look at this document first.

@gopherbot
Copy link

Change https://go.dev/cl/398736 mentions this issue: strings, bytes: improve the description of simple case-folding in EqualFold

gopherbot pushed a commit that referenced this issue Apr 7, 2022
…alFold

This CL removes the problem description pointed out by @bjkail.
Second, synchronously modify the comments of the bytes package.

Updates #52022
Fixes #52204

Change-Id: I0aa52c774f40bb91f32bebdd2a62a11067a77be0
Reviewed-on: https://go-review.googlesource.com/c/go/+/398736
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Trust: Cherry Mui <cherryyz@google.com>
@golang golang locked and limited conversation to collaborators Apr 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Documentation FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants