Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: Use wide integer load/store instructions if possible #11819

Closed
dsnet opened this issue Jul 22, 2015 · 6 comments
Closed

cmd/compile: Use wide integer load/store instructions if possible #11819

dsnet opened this issue Jul 22, 2015 · 6 comments

Comments

@dsnet
Copy link
Member

dsnet commented Jul 22, 2015

Using go1.5beta2

Compile the following code from encoding/binary:

type littleEndian struct{}

func (littleEndian) PutUint64(b []byte, v int64) {
    b[0] = byte(v)
    b[1] = byte(v >> 8)
    b[2] = byte(v >> 16)
    b[3] = byte(v >> 24)
    b[4] = byte(v >> 32)
    b[5] = byte(v >> 40)
    b[6] = byte(v >> 48)
    b[7] = byte(v >> 56)
}

This produces the following:

"".littleEndian.PutUint64 t=1 size=288 value=0 args=0x20 locals=0x0
    0x0000 00000 (/home/rawr/main9.go:5)    TEXT    "".littleEndian.PutUint64(SB), $0-32
    0x0000 00000 (/home/rawr/main9.go:5)    MOVQ    (TLS), CX
    0x0009 00009 (/home/rawr/main9.go:5)    CMPQ    SP, 16(CX)
    0x000d 00013 (/home/rawr/main9.go:5)    JLS 275
    0x0013 00019 (/home/rawr/main9.go:5)    NOP
    0x0013 00019 (/home/rawr/main9.go:5)    NOP
    0x0013 00019 (/home/rawr/main9.go:5)    MOVQ    "".b+8(FP), DX
    0x0018 00024 (/home/rawr/main9.go:5)    MOVQ    "".b+16(FP), CX
    0x001d 00029 (/home/rawr/main9.go:5)    MOVQ    "".v+32(FP), AX
    0x0022 00034 (/home/rawr/main9.go:5)    FUNCDATA    $0, gclocals·2fccd208efe70893f9ac8d682812ae72(SB)
    0x0022 00034 (/home/rawr/main9.go:5)    FUNCDATA    $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0022 00034 (/home/rawr/main9.go:6)    CMPQ    CX, $0
    0x0026 00038 (/home/rawr/main9.go:6)    JLS $1, 268
    0x002c 00044 (/home/rawr/main9.go:6)    MOVB    AL, (DX)
    0x002e 00046 (/home/rawr/main9.go:7)    MOVQ    DX, BX
    0x0031 00049 (/home/rawr/main9.go:7)    CMPQ    CX, $1
    0x0035 00053 (/home/rawr/main9.go:7)    JLS $1, 261
    0x003b 00059 (/home/rawr/main9.go:7)    INCQ    BX
    0x003e 00062 (/home/rawr/main9.go:7)    MOVQ    AX, BP
    0x0041 00065 (/home/rawr/main9.go:7)    SARQ    $8, BP
    0x0045 00069 (/home/rawr/main9.go:7)    MOVB    BPB, (BX)
    0x0048 00072 (/home/rawr/main9.go:8)    MOVQ    DX, BX
    0x004b 00075 (/home/rawr/main9.go:8)    CMPQ    CX, $2
    0x004f 00079 (/home/rawr/main9.go:8)    JLS $1, 254
    0x0055 00085 (/home/rawr/main9.go:8)    ADDQ    $2, BX
    0x0059 00089 (/home/rawr/main9.go:8)    MOVQ    AX, BP
    0x005c 00092 (/home/rawr/main9.go:8)    SARQ    $16, BP
    0x0060 00096 (/home/rawr/main9.go:8)    MOVB    BPB, (BX)
    0x0063 00099 (/home/rawr/main9.go:9)    MOVQ    DX, BX
    0x0066 00102 (/home/rawr/main9.go:9)    CMPQ    CX, $3
    0x006a 00106 (/home/rawr/main9.go:9)    JLS $1, 247
    0x0070 00112 (/home/rawr/main9.go:9)    ADDQ    $3, BX
    0x0074 00116 (/home/rawr/main9.go:9)    MOVQ    AX, BP
    0x0077 00119 (/home/rawr/main9.go:9)    SARQ    $24, BP
    0x007b 00123 (/home/rawr/main9.go:9)    MOVB    BPB, (BX)
    0x007e 00126 (/home/rawr/main9.go:10)   MOVQ    DX, BX
    0x0081 00129 (/home/rawr/main9.go:10)   CMPQ    CX, $4
    0x0085 00133 (/home/rawr/main9.go:10)   JLS $1, 240
    0x0087 00135 (/home/rawr/main9.go:10)   ADDQ    $4, BX
    0x008b 00139 (/home/rawr/main9.go:10)   MOVQ    AX, BP
    0x008e 00142 (/home/rawr/main9.go:10)   SARQ    $32, BP
    0x0092 00146 (/home/rawr/main9.go:10)   MOVB    BPB, (BX)
    0x0095 00149 (/home/rawr/main9.go:11)   MOVQ    DX, BX
    0x0098 00152 (/home/rawr/main9.go:11)   CMPQ    CX, $5
    0x009c 00156 (/home/rawr/main9.go:11)   JLS $1, 233
    0x009e 00158 (/home/rawr/main9.go:11)   ADDQ    $5, BX
    0x00a2 00162 (/home/rawr/main9.go:11)   MOVQ    AX, BP
    0x00a5 00165 (/home/rawr/main9.go:11)   SARQ    $40, BP
    0x00a9 00169 (/home/rawr/main9.go:11)   MOVB    BPB, (BX)
    0x00ac 00172 (/home/rawr/main9.go:12)   MOVQ    DX, BX
    0x00af 00175 (/home/rawr/main9.go:12)   CMPQ    CX, $6
    0x00b3 00179 (/home/rawr/main9.go:12)   JLS $1, 226
    0x00b5 00181 (/home/rawr/main9.go:12)   ADDQ    $6, BX
    0x00b9 00185 (/home/rawr/main9.go:12)   MOVQ    AX, BP
    0x00bc 00188 (/home/rawr/main9.go:12)   SARQ    $48, BP
    0x00c0 00192 (/home/rawr/main9.go:12)   MOVB    BPB, (BX)
    0x00c3 00195 (/home/rawr/main9.go:13)   MOVQ    DX, BX
    0x00c6 00198 (/home/rawr/main9.go:13)   CMPQ    CX, $7
    0x00ca 00202 (/home/rawr/main9.go:13)   JLS $1, 219
    0x00cc 00204 (/home/rawr/main9.go:13)   ADDQ    $7, BX
    0x00d0 00208 (/home/rawr/main9.go:13)   MOVQ    AX, BP
    0x00d3 00211 (/home/rawr/main9.go:13)   SARQ    $56, BP
    0x00d7 00215 (/home/rawr/main9.go:13)   MOVB    BPB, (BX)
    0x00da 00218 (/home/rawr/main9.go:14)   RET
    0x00db 00219 (/home/rawr/main9.go:13)   PCDATA  $0, $0
    0x00db 00219 (/home/rawr/main9.go:13)   CALL    runtime.panicindex(SB)
    0x00e0 00224 (/home/rawr/main9.go:13)   UNDEF
    0x00e2 00226 (/home/rawr/main9.go:12)   PCDATA  $0, $0
    0x00e2 00226 (/home/rawr/main9.go:12)   CALL    runtime.panicindex(SB)
    0x00e7 00231 (/home/rawr/main9.go:12)   UNDEF
    0x00e9 00233 (/home/rawr/main9.go:11)   PCDATA  $0, $0
    0x00e9 00233 (/home/rawr/main9.go:11)   CALL    runtime.panicindex(SB)
    0x00ee 00238 (/home/rawr/main9.go:11)   UNDEF
    0x00f0 00240 (/home/rawr/main9.go:10)   PCDATA  $0, $0
    0x00f0 00240 (/home/rawr/main9.go:10)   CALL    runtime.panicindex(SB)
    0x00f5 00245 (/home/rawr/main9.go:10)   UNDEF
    0x00f7 00247 (/home/rawr/main9.go:9)    PCDATA  $0, $0
    0x00f7 00247 (/home/rawr/main9.go:9)    CALL    runtime.panicindex(SB)
    0x00fc 00252 (/home/rawr/main9.go:9)    UNDEF
    0x00fe 00254 (/home/rawr/main9.go:8)    PCDATA  $0, $0
    0x00fe 00254 (/home/rawr/main9.go:8)    CALL    runtime.panicindex(SB)
    0x0103 00259 (/home/rawr/main9.go:8)    UNDEF
    0x0105 00261 (/home/rawr/main9.go:7)    PCDATA  $0, $0
    0x0105 00261 (/home/rawr/main9.go:7)    CALL    runtime.panicindex(SB)
    0x010a 00266 (/home/rawr/main9.go:7)    UNDEF
    0x010c 00268 (/home/rawr/main9.go:6)    PCDATA  $0, $0
    0x010c 00268 (/home/rawr/main9.go:6)    CALL    runtime.panicindex(SB)
    0x0111 00273 (/home/rawr/main9.go:6)    UNDEF
    0x0113 00275 (/home/rawr/main9.go:5)    CALL    runtime.morestack_noctxt(SB)
    0x0118 00280 (/home/rawr/main9.go:5)    JMP 0
    0x0000 64 48 8b 0c 25 00 00 00 00 48 3b 61 10 0f 86 00  dH..%....H;a....
    0x0010 01 00 00 48 8b 54 24 08 48 8b 4c 24 10 48 8b 44  ...H.T$.H.L$.H.D
    0x0020 24 20 48 83 f9 00 0f 86 e0 00 00 00 88 02 48 89  $ H...........H.
    0x0030 d3 48 83 f9 01 0f 86 ca 00 00 00 48 ff c3 48 89  .H.........H..H.
    0x0040 c5 48 c1 fd 08 40 88 2b 48 89 d3 48 83 f9 02 0f  .H...@.+H..H....
    0x0050 86 a9 00 00 00 48 83 c3 02 48 89 c5 48 c1 fd 10  .....H...H..H...
    0x0060 40 88 2b 48 89 d3 48 83 f9 03 0f 86 87 00 00 00  @.+H..H.........
    0x0070 48 83 c3 03 48 89 c5 48 c1 fd 18 40 88 2b 48 89  H...H..H...@.+H.
    0x0080 d3 48 83 f9 04 76 69 48 83 c3 04 48 89 c5 48 c1  .H...viH...H..H.
    0x0090 fd 20 40 88 2b 48 89 d3 48 83 f9 05 76 4b 48 83  . @.+H..H...vKH.
    0x00a0 c3 05 48 89 c5 48 c1 fd 28 40 88 2b 48 89 d3 48  ..H..H..(@.+H..H
    0x00b0 83 f9 06 76 2d 48 83 c3 06 48 89 c5 48 c1 fd 30  ...v-H...H..H..0
    0x00c0 40 88 2b 48 89 d3 48 83 f9 07 76 0f 48 83 c3 07  @.+H..H...v.H...
    0x00d0 48 89 c5 48 c1 fd 38 40 88 2b c3 e8 00 00 00 00  H..H..8@.+......
    0x00e0 0f 0b e8 00 00 00 00 0f 0b e8 00 00 00 00 0f 0b  ................
    0x00f0 e8 00 00 00 00 0f 0b e8 00 00 00 00 0f 0b e8 00  ................
    0x0100 00 00 00 0f 0b e8 00 00 00 00 0f 0b e8 00 00 00  ................
    0x0110 00 0f 0b e8 00 00 00 00 e9 e3 fe ff ff           .............
    rel 5+4 t=13 +0
    rel 220+4 t=5 runtime.panicindex+0
    rel 227+4 t=5 runtime.panicindex+0
    rel 234+4 t=5 runtime.panicindex+0
    rel 241+4 t=5 runtime.panicindex+0
    rel 248+4 t=5 runtime.panicindex+0
    rel 255+4 t=5 runtime.panicindex+0
    rel 262+4 t=5 runtime.panicindex+0
    rel 269+4 t=5 runtime.panicindex+0
    rel 276+4 t=5 runtime.morestack_noctxt+0

Putting aside the superfluous amount of bounds checking (captured in #5364), it seems like this entire function could be accomplished using some type of store64 instruction instead of 8 repetitive copies of something like:

    0x00b9 00185 (/home/rawr/main9.go:12)   MOVQ    AX, BP
    0x00bc 00188 (/home/rawr/main9.go:12)   SARQ    $48, BP
    0x00c0 00192 (/home/rawr/main9.go:12)   MOVB    BPB, (BX)
    0x00c3 00195 (/home/rawr/main9.go:13)   MOVQ    DX, BX
    0x00c6 00198 (/home/rawr/main9.go:13)   CMPQ    CX, $7
    0x00ca 00202 (/home/rawr/main9.go:13)   JLS $1, 219
@randall77 randall77 self-assigned this Jul 22, 2015
@randall77 randall77 added this to the Go1.6 milestone Jul 22, 2015
@mwhudson
Copy link
Contributor

This would be great, you do need to worry about alignment on some architectures though and I don't know how that should be handled.

@dsnet
Copy link
Member Author

dsnet commented Jul 23, 2015

If I remember correctly, this should be okay for x86, even though there may be a performance penalty for unaligned reads, it is (I'm guessing) probably faster than 8 calls to MOVB. I can't say the same for other architectures like ARM and PowerPC.

Also, when I think about it, I think fixing this issue implies that #5364 is also fixed too. The only way that a wide integer load/store can be used is if the compiler can reason that the load/store will not go out of bounds. Furthermore, it also means that some changes would need to be made in regular Go code to take advantage of this as well. The code presented above would need to be rewritten as:

func (littleEndian) PutUint64(b []byte, v int64) {
    b[7] = byte(v >> 56)
    b[6] = byte(v >> 48)
    b[5] = byte(v >> 40)
    b[4] = byte(v >> 32)
    b[3] = byte(v >> 24)
    b[2] = byte(v >> 16)
    b[1] = byte(v >> 8)
    b[0] = byte(v)
}

Since the highest index is accessed first, a single bounds checked can be used. If it passes, it can proceed to use the wide integer store. Do note that this code is not functionally equivalent to the original. If a panic occurs, the input slice will not hold a partial result.

@rsc
Copy link
Contributor

rsc commented Nov 4, 2015

Note that you cannot in general apply the suggested optimization to PutUint64. You can tell the difference between the standard execution of that code and the "optimization" when len(b) < 8, because the panic in the code as written must happen with the earlier writes having been done.

@funny-falcon
Copy link
Contributor

@rsc despite there is difference, it can be counted as "undefined behaviour".
btw, current master is already does _ = b[7] , so it will already panic before writting anything.
47c9e13
478b594

@funny-falcon
Copy link
Contributor

And, looks like #14267 already implements desired behavior, no?

@randall77
Copy link
Contributor

Yes, this is now fixed (for amd64).

@golang golang locked and limited conversation to collaborators Apr 27, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants