New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: performance issue with ranging over string #13162
Comments
CL https://golang.org/cl/16695 mentions this issue. |
Ranging over string is much slower than using DecodeRuneInString. See golang.org/issue/13162. Replacing ranging over a string with the implementation of the Bytes counterpart results in the following performance improvements: RuneCountInStringTenASCIIChars-8 43.0ns ± 1% 16.4ns ± 2% -61.80% (p=0.000 n=7+8) RuneCountInStringTenJapaneseChars-8 161ns ± 2% 154ns ± 2% -4.58% (p=0.000 n=8+8) ValidStringTenASCIIChars-8 52.2ns ± 1% 13.2ns ± 1% -74.62% (p=0.001 n=7+7) ValidStringTenJapaneseChars-8 173ns ± 2% 153ns ± 2% -11.78% (p=0.000 n=7+8) Update #13162 Change-Id: Ifc40a6a94bb3317f1f2d929d310bd2694645e9f6 Reviewed-on: https://go-review.googlesource.com/16695 Reviewed-by: Russ Cox <rsc@golang.org>
To avoid duplicate work: I am working to improve this by a compiler patch that removes stringiter and handles quick test results:
|
CL https://golang.org/cl/27853 mentions this issue. |
Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
at tip on my macbookpro with intel ivy bridge where:
is the speed of ranging over a string good enough on other machines too now? |
Here are numbers changing func RuneCountInString(s string) (n int) {
for range s {
n++
}
return n
} Looks like it's now only 13-19% slower.
It's at least good enough for Go 1.8. I'll kick this to Unplanned at this point since it's a performance thing, which we don't generally track for releases unless it's really bad or important. |
I actually get better performance (~10%) using string range. Brad, I'm not sure why you see something different. I'll mail a CL to use string range for RuneCountInString. We can argue on that CL. |
CL https://golang.org/cl/33637 mentions this issue. |
@randall77, I think some compiler changes landed after my comment there. I don't see d295174 mentioned here, but it also apparently landed 3 days before my comment. So not sure which one. |
Marking release-blocker to at least figure out whether we accept https://go-review.googlesource.com/c/go/+/33637 . Ideally we would accept it, because it makes the code so much nicer, but it seems slower now with Go tip than it did in past releases. |
Remeasuring, |
Even with the #22234 fix in, this is still slower. I measured two variants of the CL above: first with an unnamed result parameter, and second with a named result parameter. (ideally they shouldn't generate different code, but alas.)
I'll kick this to Go 1.12 for @randall77 et al to investigate more. |
Punting more to Go1.13 but @randall77 sent a CL https://go-review.googlesource.com/c/go/+/33637/ in late 2016. I'll also cc @josharian |
Looping over a string with range is significantly slower than using utf8.DecodeRuneInString.
So using
for i, r := range s { ... }
is significantly slower than
An (approximate) example of is one gets by comparing the benchmarks of the implementations for bytes and strings in unicode/utf8:
The benchmarks for the *String versions are considerably slower than their bytes counterparts, when processing ASCII.
The text was updated successfully, but these errors were encountered: