String iteration with range should work with bytes, not code points #1185

niemeyer · 2010-10-09T23:28:52Z

As discussed on the mailing list, it doesn't really make any sense that the two
iterations below produce different results:

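package main

func main() {
    s := "á" // U+00E1: one code point, two bytes in UTF-8 (0xC3 0xA1)

    // Indexing yields bytes: prints "0 195" and "1 161".
    for i := 0; i != len(s); i++ {
        println(i, s[i])
    }

    // range decodes UTF-8 and yields code points: prints "0 225".
    for i, v := range s {
        println(i, v)
    }
}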
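For readers on current Go, here is a minimal sketch that collects what each loop sees instead of printing it, so the difference can be checked programmatically. It uses the `rune` type, which only arrived with Go 1 (in 2010, range yielded code points as plain `int`s), and the helper names `byteValues` and `rangeValues` are hypothetical.

```go
package main

import "fmt"

// byteValues mirrors the first loop: s[i] indexes individual bytes.
func byteValues(s string) []byte {
	var out []byte
	for i := 0; i != len(s); i++ {
		out = append(out, s[i])
	}
	return out
}

// rangeValues mirrors the second loop: range over a string decodes UTF-8,
// yielding the byte offset where each code point starts and the code point.
func rangeValues(s string) (offsets []int, runes []rune) {
	for i, r := range s {
		offsets = append(offsets, i)
		runes = append(runes, r)
	}
	return offsets, runes
}

func main() {
	s := "\u00e1" // "á", encoded in UTF-8 as 0xC3 0xA1
	fmt.Println(len(s), byteValues(s)) // 2 [195 161]
	offsets, runes := rangeValues(s)
	fmt.Println(offsets, runes) // [0] [225]
}
```

The escape `\u00e1` is used instead of a literal "á" so the example does not depend on the editor storing the character in precomposed form.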

This is the kind of behavior that will very easily introduce bugs in real-world code,
because these versions look very much like feasible alternatives to each other, and
tests will pass or fail depending only on the data set used (for instance, on whether it
contains any non-ASCII text).

As a proposal to fix this behavior, "range" iteration on strings should work
with the raw bytes, and the current behavior may easily be reproduced with the following
version:

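    // Spelled []int(s) when this issue was filed; since Go 1 it is []rune(s).
    // i is the index into the converted slice, not a byte offset into s.
    for i, v := range []rune(s) {
        println(i, v)
    }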
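One caveat with the conversion-based workaround: it reproduces the values that range over a string yields, but not the indices. Ranging over the string gives byte offsets, while ranging over the converted slice gives positions within that slice. A sketch in current Go (so the conversion is written `[]rune`; the helper names are hypothetical):

```go
package main

import "fmt"

// stringIndices returns the indices produced by ranging over s directly:
// the byte offset at which each code point starts.
func stringIndices(s string) []int {
	var idx []int
	for i := range s {
		idx = append(idx, i)
	}
	return idx
}

// runeSliceIndices returns the indices produced by ranging over []rune(s):
// positions in the decoded slice, one per code point.
func runeSliceIndices(s string) []int {
	var idx []int
	for i := range []rune(s) {
		idx = append(idx, i)
	}
	return idx
}

func main() {
	s := "\u00e1a" // "áa": 'á' occupies bytes 0-1, 'a' is byte 2
	fmt.Println(stringIndices(s))    // [0 2]
	fmt.Println(runeSliceIndices(s)) // [0 1]
}
```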

gopherbot · 2010-10-10T10:18:27Z

Comment 1 by themue:

When working with strings, the characters they contain are usually more interesting than
the underlying bytes. So range should keep its current behavior, while
"for i, b := range []byte(s) {" can be used to iterate over the bytes.
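The `[]byte` conversion suggested here does visit the raw bytes, matching the indexed loop from the original report. A minimal sketch (the helper name `rangeBytes` is hypothetical); note that, as written, the conversion may copy the string into a new slice.

```go
package main

import "fmt"

// rangeBytes collects what `for i, b := range []byte(s)` yields:
// each byte offset together with the raw byte stored there.
func rangeBytes(s string) (offsets []int, values []byte) {
	for i, b := range []byte(s) {
		offsets = append(offsets, i)
		values = append(values, b)
	}
	return offsets, values
}

func main() {
	offsets, values := rangeBytes("\u00e1") // "á"
	fmt.Println(offsets, values)            // [0 1] [195 161]
}
```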

niemeyer · 2010-10-10T11:31:45Z

Comment 2:

Whether something is more interesting or not is a very subjective argument. It really
depends on what one is trying to achieve.
The fact that the two versions above look like very reasonable alternatives to each
other, and that because of this the behavior is inconsistent and error-prone, is not
subjective.

niemeyer · 2010-10-10T12:32:55Z

Comment 3:

Fango pointed out on the mailing list that []int(s) consumes extra space.
To address this, we could easily introduce a function in the utf8 package
for space-efficient iteration when going through UTF-8 code points
is what's desired:
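for i := 0; i != len(s); {
    // proposed helper: returns the code point at byte offset i
    // and the byte offset of the next one
    r, next := utf8.NextRune(s, i)
    i = next // plain assignment; "i :=" here would shadow the loop variable
    ...
}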
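Today's unicode/utf8 package provides `utf8.DecodeRuneInString`, which returns the first code point of a string and its width in bytes, so a helper along the lines proposed can be sketched on top of it. `nextRune` below is a hypothetical name, not a utf8 API; it decodes in place without converting the string to a slice.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// nextRune is a hypothetical helper in the spirit of the proposed
// utf8.NextRune: it decodes the code point starting at byte offset i and
// returns it with the offset of the following code point. Invalid UTF-8
// yields utf8.RuneError and advances one byte, as range over a string does.
func nextRune(s string, i int) (rune, int) {
	r, size := utf8.DecodeRuneInString(s[i:])
	return r, i + size
}

// codePoints walks s with nextRune. It collects the code points into a
// slice only so the result can be inspected; the loop itself needs no
// conversion of s.
func codePoints(s string) []rune {
	var out []rune
	for i := 0; i != len(s); {
		var r rune
		r, i = nextRune(s, i) // plain assignment keeps advancing the loop variable
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(codePoints("\u00e1a")) // [225 97]
}
```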
Also, I spotted an additional issue in the specification:
"A 'for' statement with a 'range' clause iterates through all entries
of an array, slice, string or map, or values received on a channel."
It doesn't really iterate through all entries of the string today, unless we
decide that a string is made of code points rather than bytes.
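To make the specification point concrete: over a string, range runs once per code point rather than once per byte, so its iteration count matches `utf8.RuneCountInString`, which, like range, counts each invalid byte as one code point. A sketch (the helper name `rangeCount` is hypothetical):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// rangeCount counts how many iterations `for range s` performs.
func rangeCount(s string) int {
	n := 0
	for range s {
		n++
	}
	return n
}

func main() {
	for _, s := range []string{"abc", "\u00e1", "\u00e1a", "\xff"} {
		// len counts bytes; range and RuneCountInString count code points.
		fmt.Println(len(s), rangeCount(s), utf8.RuneCountInString(s))
	}
}
```

This prints 3 3 3, then 2 1 1, then 3 2 2, then 1 1 1: only for pure ASCII do the byte count and the number of range iterations agree.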

rsc · 2010-10-11T20:00:10Z

Comment 4:

Feel free to discuss more on the mailing list.
As you might imagine, we spent a long time on the design of this,
so the claim that it "doesn't really make any sense" doesn't ring true to us.
Either way, the issue tracker is the wrong place for long discussions.

Status changed to WorkingAsIntended.

niemeyer · 2010-10-11T20:30:00Z

Comment 5:

It probably doesn't ring true precisely because you've spent a long time on the design
and implementation of this behavior. For someone looking at the two iterations above,
deprived of any further insight into the choice made, it really doesn't make sense, if
you'll forgive my frankness.
I can certainly see the pragmatic reason why it works this way, but it feels like a
language design wart that could be avoided (and with it the surprise and future bugs),
either by putting the feature in the library or by optimizing the compilation of
"range []int(...)" to what it does right now.
Either way, I've already presented the argument here and on the mailing list (hopefully
clearly). Unless the designers have further interest in seeing this fixed or changed,
continuing the discussion probably won't help much.

niemeyer added workingasintended labels Oct 11, 2010

golang locked and limited conversation to collaborators Jun 24, 2016

gopherbot added the FrozenDueToAge label Jun 24, 2016

This issue was closed.
