cmd/compile: range loop over []byte, []int can avoid base pointer increments #15809

randall77 · 2016-05-24T02:17:21Z

var t [256]byte

func f(b *[16]byte) {
    for i, v := range b {
        b[i] = t[v]
    }
}

compiles to:

    0x000f 00015 (tmp1.go:7)    MOVBLZX (AX), DX
    0x0012 00018 (tmp1.go:7)    LEAQ    "".t(SB), BX
    0x0019 00025 (tmp1.go:7)    MOVBLZX (BX)(DX*1), DX
    0x001d 00029 (tmp1.go:7)    MOVQ    "".b+8(FP), BX
    0x0022 00034 (tmp1.go:7)    MOVB    DL, (BX)(CX*1)
    0x0025 00037 (tmp1.go:6)    INCQ    AX
    0x0028 00040 (tmp1.go:6)    INCQ    CX
    0x002b 00043 (tmp1.go:6)    CMPQ    CX, $16
    0x002f 00047 (tmp1.go:6)    JLT $0, 15

It looks like we strength-reduced the index into b into a pointer increment. But the terminating condition still uses the integer, so we end up having 2 induction variables instead of one. We should either compute the terminating condition using an end pointer (and throw away the induction variable CX), or throw away the pointer induction variable (AX) and use an indexed load.

@brtzsnr

The text was updated successfully, but these errors were encountered:

brtzsnr · 2016-05-24T14:15:50Z

I don't recall any strength reduction code in the current compiler. This is I think how the code is generated by the frontend and is one of the test cases that a SR will hopefully address.

Alternatively, should the frontend generate better code?

josharian · 2016-05-24T14:23:58Z

We'd have to be careful that end pointers don't confuse the GC. See related comments in gc/range.go (which is indeed probably the place for improvements here). See also my pending bool2int CL for an interaction between SSA and range / end-of-slice front end code.

dr2chase · 2016-05-24T16:11:59Z

A potential sort-of-optimization for the future is to keep the locally-root (might not be really-root, depending on what our callers fed us) pointers safely on the stack, and then let the derived and sometimes-off-the-end pointers be marked as not live as long as we knew a 1:1 correspondence to the local roots. Means that the GC scans less, means that off-the-end is okay, means that more often the GC's offset calculation isn't needed.

dr2chase · 2016-05-24T16:14:05Z

Except for pointers to stack-allocated things, oops, never mind.

rsc · 2016-10-21T04:31:22Z

This is not a strength reduction. It's just the code generated for range loops over arrays/slices: there is a pointer being advanced and an integer index. The pointer advancement avoids a multiply and a bounds check, at least in the old system. You could change the range expansion to avoid the pointer++ when the element size is something that has cheap index calculation, and then mark the indexing as bounds-checked already.

martisch · 2016-12-19T20:08:34Z

Started working on this and created a CL that computes the element position directly each time if it is cheap to do. Currently looking at the effects, adjusting tests, benchmarking and evaluating e.g. if we could just compare against a pointer to the end of the array. I am planning to send a CL for 1.9 within the next 2 weeks.

with new CL:

        0x0000 00000 (/Users/martisch/test/main.go:7)   MOVQ    "".b+8(FP), AX
        0x0005 00005 (/Users/martisch/test/main.go:7)   MOVQ    $0, CX
        0x0007 00007 (/Users/martisch/test/main.go:7)   CMPQ    CX, $16
        0x000b 00011 (/Users/martisch/test/main.go:7)   JGE     $0, 42
        0x000d 00013 (/Users/martisch/test/main.go:7)   TESTB   AL, (AX) // can and should be outside the loop?
        0x000f 00015 (/Users/martisch/test/main.go:8)   MOVBLZX (AX)(CX*1), DX
        0x0013 00019 (/Users/martisch/test/main.go:8)   LEAQ    "".t(SB), BX
        0x001a 00026 (/Users/martisch/test/main.go:8)   MOVBLZX (BX)(DX*1), DX
        0x001e 00030 (/Users/martisch/test/main.go:8)   MOVB    DL, (AX)(CX*1)
        0x0021 00033 (/Users/martisch/test/main.go:7)   INCQ    CX
        0x0024 00036 (/Users/martisch/test/main.go:7)   CMPQ    CX, $16
        0x0028 00040 (/Users/martisch/test/main.go:7)   JLT     $0, 13
        0x002a 00042 (/Users/martisch/test/main.go:10)  RET

without:

        0x0000 00000 (/Users/martisch/test/main.go:7)   MOVQ    "".b+8(FP), AX
        0x0005 00005 (/Users/martisch/test/main.go:7)   TESTB   AL, (AX)
        0x0007 00007 (/Users/martisch/test/main.go:6)   MOVQ    AX, CX
        0x000a 00010 (/Users/martisch/test/main.go:7)   MOVQ    $0, DX
        0x000c 00012 (/Users/martisch/test/main.go:7)   CMPQ    DX, $16
        0x0010 00016 (/Users/martisch/test/main.go:7)   JGE     $0, 47
        0x0012 00018 (/Users/martisch/test/main.go:8)   MOVBLZX (AX), BX
        0x0015 00021 (/Users/martisch/test/main.go:8)   LEAQ    "".t(SB), SI
        0x001c 00028 (/Users/martisch/test/main.go:8)   MOVBLZX (SI)(BX*1), BX
        0x0020 00032 (/Users/martisch/test/main.go:8)   MOVB    BL, (CX)(DX*1)
        0x0023 00035 (/Users/martisch/test/main.go:7)   INCQ    AX
        0x0026 00038 (/Users/martisch/test/main.go:7)   INCQ    DX
        0x0029 00041 (/Users/martisch/test/main.go:7)   CMPQ    DX, $16
        0x002d 00045 (/Users/martisch/test/main.go:7)   JLT     $0, 18
        0x002f 00047 (/Users/martisch/test/main.go:10)  RET

rsc · 2016-12-19T20:58:08Z

Thanks. Note that the "pointer to the end of the array" is not well defined in Go. You can't point past the last element (that's a pointer to a different object and will confuse the GC), you can't use an integer instead of a pointer (that won't be updated if the object is on the stack and moves, confusing the program itself), and you can't use a pointer to the last element (an array with 0 elements has no last element). You could use the pointer to the last element as long as there's a check for zero-size arrays that bypasses everything entirely. Anyway, just keep all that in mind. Thanks again.

martisch · 2016-12-19T21:59:29Z

@rsc thanks. What i had in mind was having a look at what you described: "You could use the pointer to the last element as long as there's a check for zero-size arrays".

On top off all what you mentioned one also needs to keep e.g. the order in which assignments are made in mind. I seem to have found a bug in the string range case (in new and old code) while working on the array range code. #18376

gopherbot · 2017-03-12T18:05:17Z

CL https://golang.org/cl/38061 mentions this issue.

In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>

martisch · 2017-11-10T09:25:30Z

Was fixed by https://golang.org/cl/38061 for most used architectures.

randall77 added the Performance label May 24, 2016

randall77 added this to the Go1.8 milestone May 24, 2016

quentinmit added the NeedsFix label Oct 11, 2016

rsc modified the milestones: Go1.9, Go1.8 Oct 21, 2016

rsc changed the title ~~cmd/compile: pointless strength reduction~~ cmd/compile: range loop over []byte, []int can avoid base pointer increments Oct 21, 2016

martisch self-assigned this Dec 19, 2016

randall77 modified the milestones: Go1.10, Go1.9 Jun 6, 2017

martisch closed this as completed Nov 10, 2017

golang locked and limited conversation to collaborators Nov 10, 2018

gopherbot added the FrozenDueToAge label Nov 10, 2018

rsc unassigned martisch Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: range loop over []byte, []int can avoid base pointer increments #15809

cmd/compile: range loop over []byte, []int can avoid base pointer increments #15809

randall77 commented May 24, 2016

brtzsnr commented May 24, 2016

josharian commented May 24, 2016

dr2chase commented May 24, 2016

dr2chase commented May 24, 2016

rsc commented Oct 21, 2016

martisch commented Dec 19, 2016

rsc commented Dec 19, 2016

martisch commented Dec 19, 2016

gopherbot commented Mar 12, 2017

martisch commented Nov 10, 2017 •

edited

Loading

cmd/compile: range loop over []byte, []int can avoid base pointer increments #15809

cmd/compile: range loop over []byte, []int can avoid base pointer increments #15809

Comments

randall77 commented May 24, 2016

brtzsnr commented May 24, 2016

josharian commented May 24, 2016

dr2chase commented May 24, 2016

dr2chase commented May 24, 2016

rsc commented Oct 21, 2016

martisch commented Dec 19, 2016

rsc commented Dec 19, 2016

martisch commented Dec 19, 2016

gopherbot commented Mar 12, 2017

martisch commented Nov 10, 2017 • edited Loading

martisch commented Nov 10, 2017 •

edited

Loading