Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strings: Split is inconsistent with bytes.Split #53511

Closed
dsnet opened this issue Jun 23, 2022 · 4 comments
Closed

strings: Split is inconsistent with bytes.Split #53511

dsnet opened this issue Jun 23, 2022 · 4 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Jun 23, 2022

Consider the following:

in := "\xff-\xff"
sep := ""
fmt.Printf("%q\n", bytes.Split([]byte(in), []byte(sep)))   // ["\xff" "-" "\xff"]
fmt.Printf("%q\n", strings.Split(string(in), string(sep))) // ["�" "-" "\xff"]

The results of these two are inconsistent where the strings implementation replaced
\xff with utf8.RuneError, even though the documentation of strings.Split
mentions no such behavior. It only says that it slices up the input.
Furthermore, it is odd, that it only mangles the first element of the result, but not the last.

@gopherbot
Copy link

Change https://go.dev/cl/413715 mentions this issue: strings: avoid utf.RuneError mangling in Split

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 24, 2022
@cagedmantis cagedmantis added this to the Backlog milestone Jun 24, 2022
@cagedmantis
Copy link
Contributor

@griesemer

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 25, 2022
@dmitshur dmitshur modified the milestones: Backlog, Go1.19 Jun 25, 2022
@nightlyone
Copy link
Contributor

https://go.dev/play/p/7PPxD1AyxTq shows it together with 2 other situations where utf8 rune splitting is being done that are handled differently again.

To me it looks like the strings and bytes package are both being inconsistent to each of the other documented rune splitting methods and probably both need to be adjusted.

Or maybe I am missing something here and those functions are supposed to support rune splitting in a different way in order to allow for edge cases I am not aware of.

@dsnet
Copy link
Member Author

dsnet commented Jul 14, 2022

@nightlyone. My CL makes it such that your example prints:

["\xff" "-" "\xff"]
["\xff" "-" "\xff"]
['�' '-' '�']
'�''-''�'

This brings the bytes and strings package in conformance to each other.

Regarding the last two lines in your example, there is no documented guarantee that strings.Split(s, "") needs to be identical to []rune(s). In fact, strings.Split is technically bound by the documentation to always produce a list of subslices. Converting 0xff to the rune error violates that. On the other hand, []rune(s) is documented by the language specification that it does do the rune error mangling.

@gopherbot gopherbot modified the milestones: Go1.19, Go1.20 Aug 2, 2022
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
Split should only split strings and not perform mangling
of invalid UTF-8 into ut8.RuneError.
The prior behavior is clearly a bug since mangling is not
performed in all other situations (e.g., separator is non-empty).

Fixes golang#53511

Change-Id: I112a2ef15ee46ddecda015ee14bca04cd76adfbf
Reviewed-on: https://go-review.googlesource.com/c/go/+/413715
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Aug 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants