
String iteration with range should work with bytes, not code points #1185

Closed
niemeyer opened this issue Oct 9, 2010 · 5 comments

Comments

@niemeyer
Contributor

niemeyer commented Oct 9, 2010

As discussed on the mailing list, it doesn't really make any sense that the two
iterations below produce different results:

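package main

func main() {
    s := "á"

    // Indexed loop: visits each byte of s.
    for i := 0; i != len(s); i++ {
        println(i, s[i])
    }

    // range loop: visits each UTF-8-decoded code point of s.
    for i, v := range s {
        println(i, v)
    }
}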

This is the kind of behavior that will very easily introduce bugs in real-world code,
because these versions look like interchangeable alternatives to each other, and
tests will pass or fail depending only on the data set used (with ASCII-only input,
the two loops print exactly the same thing).

As a proposal to fix this behavior, "range" iteration on strings should work
with the raw bytes; the current behavior could then easily be reproduced with the following
version:

    for i, v := range []int(s) {
        println(i, v)
    }
@gopherbot
Contributor

Comment 1 by themue:

When working with strings, the contained characters are more interesting than the underlying
bytes. So range should keep its current behavior, while "for i, b := range []byte(s) {" yields the bytes.
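The alternative suggested here can be checked directly: converting the string to `[]byte` first makes range visit raw bytes, so it produces the same (index, byte) pairs as the indexed loop from the original report. A sketch in current Go; `matchesIndexing` is an illustrative helper, not part of any library.

```go
package main

import "fmt"

// matchesIndexing reports whether ranging over []byte(s) yields exactly the
// same (index, byte) pairs as the indexed loop over s[i].
func matchesIndexing(s string) bool {
	n := 0
	for i, b := range []byte(s) {
		if i != n || b != s[i] {
			return false
		}
		n++
	}
	return n == len(s)
}

func main() {
	s := "\u00e1" // "á", encoded as the two bytes 0xC3 0xA1
	for i, b := range []byte(s) {
		fmt.Println(i, b) // prints "0 195", then "1 161"
	}
	fmt.Println(matchesIndexing(s)) // true
}
```
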

@niemeyer
Contributor Author

Comment 2:

Whether something is more interesting is a very subjective argument; it really depends on what
one is trying to achieve.
What is not subjective is that the two versions above look like very reasonable alternatives
to each other, and that, because of this, the behavior is inconsistent and error-prone.

@niemeyer
Contributor Author

Comment 3:

Fango pointed out on the mailing list (ML) that []int(s) consumes extra space, since the
conversion allocates a new slice holding every code point.
To solve this, we could easily introduce a function in the utf8 package
to support space-efficient iteration when walking through UTF-8 code points
is what's desired:
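for i := 0; i != len(s); {
    var r int
    // Proposed helper: returns the code point at i and the index of the next one.
    // Assign with = (not :=) so the loop's i actually advances.
    r, i = utf8.NextRune(s, i)
    // ... use r ...
}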
Also, I spotted an additional issue in the specification:
"A "for" statement with a "range" clause iterates through all entries
of an array, slice, string or map, or values received on a channel."
Today, range doesn't really iterate through all entries of a string, unless we
decide that a string is made of code points rather than bytes.

@rsc
Contributor

rsc commented Oct 11, 2010

Comment 4:

Feel free to discuss this further on the mailing list.
As you might imagine, we spent a long time
on the design of this, so the claim that it
"doesn't really make any sense" doesn't ring
true to us.
Either way, the issue tracker is the wrong place
for long discussions.

Status changed to WorkingAsIntended.

@niemeyer
Contributor Author

Comment 5:

It probably doesn't ring true precisely because you've spent a long time on the design
and implementation of this behavior. For someone looking at the two iterations above,
without any further insight into the choice that was made, it really doesn't make sense,
if you'll forgive my frankness.
I can certainly see the pragmatic reason it works this way, but it feels like a
language design wart that could be avoided (along with the surprise and future bugs it
invites) by either putting the feature in the library, or by having the compiler
optimize "range []int(...)" into what plain string range does right now.
Either way, I've already presented the argument here and on the mailing list (hopefully
clearly). Unless the designers have further interest in seeing this fixed or changed,
continuing the discussion probably won't help much.

@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.

3 participants