-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile: range loop over []byte, []int can avoid base pointer increments #15809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I don't recall any strength reduction code in the current compiler. This is I think how the code is generated by the frontend and is one of the test cases that a SR will hopefully address. Alternatively, should the frontend generate better code? |
We'd have to be careful that end pointers don't confuse the GC. See related comments in gc/range.go (which is indeed probably the place for improvements here). See also my pending bool2int CL for an interaction between SSA and range / end-of-slice front end code. |
A potential sort-of-optimization for the future is to keep the locally-root (might not be really-root, depending on what our callers fed us) pointers safely on the stack, and then let the derived and sometimes-off-the-end pointers be marked as not live as long as we knew a 1:1 correspondence to the local roots. Means that the GC scans less, means that off-the-end is okay, means that more often the GC's offset calculation isn't needed. |
Except for pointers to stack-allocated things, oops, never mind. |
This is not a strength reduction. It's just the code generated for range loops over arrays/slices: there is a pointer being advanced and an integer index. The pointer advancement avoids a multiply and a bounds check, at least in the old system. You could change the range expansion to avoid the pointer++ when the element size is something that has cheap index calculation, and then mark the indexing as bounds-checked already. |
Started working on this and created a CL that computes the element position directly each time if it is cheap to do. Currently looking at the effects, adjusting tests, benchmarking and evaluating e.g. if we could just compare against a pointer to the end of the array. I am planning to send a CL for 1.9 within the next 2 weeks. with new CL:
without:
|
Thanks. Note that the "pointer to the end of the array" is not well defined in Go. You can't point past the last element (that's a pointer to a different object and will confuse the GC), you can't use an integer instead of a pointer (that won't be updated if the object is on the stack and moves, confusing the program itself), and you can't use a pointer to the last element (an array with 0 elements has no last element). You could use the pointer to the last element as long as there's a check for zero-size arrays that bypasses everything entirely. Anyway, just keep all that in mind. Thanks again. |
@rsc thanks. What i had in mind was having a look at what you described: "You could use the pointer to the last element as long as there's a check for zero-size arrays". On top off all what you mentioned one also needs to keep e.g. the order in which assignments are made in mind. I seem to have found a bug in the string range case (in new and old code) while working on the array range code. #18376 |
CL https://golang.org/cl/38061 mentions this issue. |
In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
Was fixed by https://golang.org/cl/38061 for most used architectures. |
compiles to:
It looks like we strength-reduced the index into
b
into a pointer increment. But the terminating condition still uses the integer, so we end up having 2 induction variables instead of one. We should either compute the terminating condition using an end pointer (and throw away the induction variable CX), or throw away the pointer induction variable (AX) and use an indexed load.@brtzsnr
The text was updated successfully, but these errors were encountered: