Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: elide write barriers for pointer swaps #16765

Closed
josharian opened this issue Aug 17, 2016 · 8 comments
Closed

cmd/compile: elide write barriers for pointer swaps #16765

josharian opened this issue Aug 17, 2016 · 8 comments

Comments

@josharian
Copy link
Contributor

josharian commented Aug 17, 2016

This might be hard to accomplish, but consider:

package p

func f(s []*int, i, j int) {
    s[i], s[j] = s[j], s[i]
}

This generates two writebarrierptr calls. But since no actual pointers were harmed in the making of this swap, it seems to me that ought to be doable with no write barriers at all.

That would transform this:

"".f t=1 size=202 args=0x28 locals=0x28
    0x0000 00000 (x.go:3)   TEXT    "".f(SB), $40-40
    0x0000 00000 (x.go:3)   MOVQ    (TLS), CX
    0x0009 00009 (x.go:3)   CMPQ    SP, 16(CX)
    0x000d 00013 (x.go:3)   JLS 192
    0x0013 00019 (x.go:3)   SUBQ    $40, SP
    0x0017 00023 (x.go:3)   MOVQ    BP, 32(SP)
    0x001c 00028 (x.go:3)   LEAQ    32(SP), BP
    0x0021 00033 (x.go:3)   FUNCDATA    $0, gclocals·943e4296f97f1071542b1de0f5e26987(SB)
    0x0021 00033 (x.go:3)   FUNCDATA    $1, gclocals·c2934d28c868ce52e67cf0667b9c3035(SB)
    0x0021 00033 (x.go:4)   MOVQ    "".i+72(FP), AX
    0x0026 00038 (x.go:4)   MOVQ    "".s+56(FP), CX
    0x002b 00043 (x.go:4)   CMPQ    AX, CX
    0x002e 00046 (x.go:4)   JCC $0, 185
    0x0034 00052 (x.go:4)   MOVQ    "".s+48(FP), DX
    0x0039 00057 (x.go:4)   MOVQ    (DX)(AX*8), BX
    0x003d 00061 (x.go:4)   MOVQ    BX, "".autotmp_1+24(SP)
    0x0042 00066 (x.go:4)   LEAQ    (DX)(AX*8), SI
    0x0046 00070 (x.go:4)   MOVQ    "".j+80(FP), DI
    0x004b 00075 (x.go:4)   CMPQ    DI, CX
    0x004e 00078 (x.go:4)   JCC $0, 185
    0x0050 00080 (x.go:4)   MOVQ    (DX)(DI*8), CX
    0x0054 00084 (x.go:4)   LEAQ    (DX)(DI*8), R8
    0x0058 00088 (x.go:4)   MOVQ    R8, "".autotmp_2+16(SP)
    0x005d 00093 (x.go:4)   MOVL    runtime.writeBarrier(SB), R9
    0x0064 00100 (x.go:4)   TESTL   R9, R9
    0x0067 00103 (x.go:4)   JNE $0, 149
    0x0069 00105 (x.go:4)   MOVQ    CX, (DX)(AX*8)
    0x006d 00109 (x.go:4)   MOVL    runtime.writeBarrier(SB), AX
    0x0073 00115 (x.go:4)   TESTL   AX, AX
    0x0075 00117 (x.go:4)   JNE $0, 133
    0x0077 00119 (x.go:4)   MOVQ    BX, (DX)(DI*8)
    0x007b 00123 (x.go:5)   MOVQ    32(SP), BP
    0x0080 00128 (x.go:5)   ADDQ    $40, SP
    0x0084 00132 (x.go:5)   RET
    0x0085 00133 (x.go:4)   MOVQ    R8, (SP)
    0x0089 00137 (x.go:4)   MOVQ    BX, 8(SP)
    0x008e 00142 (x.go:4)   PCDATA  $0, $0
    0x008e 00142 (x.go:4)   CALL    runtime.writebarrierptr(SB)
    0x0093 00147 (x.go:5)   JMP 123
    0x0095 00149 (x.go:4)   MOVQ    SI, (SP)
    0x0099 00153 (x.go:4)   MOVQ    CX, 8(SP)
    0x009e 00158 (x.go:4)   PCDATA  $0, $1
    0x009e 00158 (x.go:4)   CALL    runtime.writebarrierptr(SB)
    0x00a3 00163 (x.go:4)   MOVQ    "".s+48(FP), DX
    0x00a8 00168 (x.go:4)   MOVQ    "".autotmp_1+24(SP), BX
    0x00ad 00173 (x.go:4)   MOVQ    "".j+80(FP), DI
    0x00b2 00178 (x.go:4)   MOVQ    "".autotmp_2+16(SP), R8
    0x00b7 00183 (x.go:4)   JMP 109
    0x00b9 00185 (x.go:4)   PCDATA  $0, $2
    0x00b9 00185 (x.go:4)   CALL    runtime.panicindex(SB)
    0x00be 00190 (x.go:4)   UNDEF
    0x00c0 00192 (x.go:4)   NOP
    0x00c0 00192 (x.go:3)   CALL    runtime.morestack_noctxt(SB)
    0x00c5 00197 (x.go:3)   JMP 0

into this:

"".f t=1 size=76 args=0x28 locals=0x0
    0x0000 00000 (x.go:3)   TEXT    "".f(SB), $0-40
    0x0000 00000 (x.go:3)   MOVQ    (TLS), CX
    0x0009 00009 (x.go:3)   CMPQ    SP, 16(CX)
    0x000d 00013 (x.go:3)   JLS 69
    0x000f 00015 (x.go:3)   FUNCDATA    $0, gclocals·3cadd97b66f25a3a642be35e9362338f(SB)
    0x000f 00015 (x.go:3)   FUNCDATA    $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
    0x000f 00015 (x.go:4)   MOVQ    "".i+32(FP), AX
    0x0014 00020 (x.go:4)   MOVQ    "".s+16(FP), CX
    0x0019 00025 (x.go:4)   CMPQ    AX, CX
    0x001c 00028 (x.go:4)   JCC $0, 62
    0x001e 00030 (x.go:4)   MOVQ    "".s+8(FP), DX
    0x0023 00035 (x.go:4)   MOVQ    (DX)(AX*8), BX
    0x0027 00039 (x.go:4)   MOVQ    "".j+40(FP), SI
    0x002c 00044 (x.go:4)   CMPQ    SI, CX
    0x002f 00047 (x.go:4)   JCC $0, 62
    0x0031 00049 (x.go:4)   MOVQ    (DX)(SI*8), CX
    0x0035 00053 (x.go:4)   MOVQ    CX, (DX)(AX*8)
    0x0039 00057 (x.go:4)   MOVQ    BX, (DX)(SI*8)
    0x003d 00061 (x.go:5)   RET
    0x003e 00062 (x.go:4)   PCDATA  $0, $1
    0x003e 00062 (x.go:4)   CALL    runtime.panicindex(SB)
    0x0043 00067 (x.go:4)   UNDEF
    0x0045 00069 (x.go:3)   CALL    runtime.morestack_noctxt(SB)
    0x004a 00074 (x.go:3)   JMP 0

cc @mdempsky @randall77 @aclements @dr2chase

@josharian josharian added this to the Unplanned milestone Aug 17, 2016
@aclements
Copy link
Member

This is definitely unsafe if there's the possibility of preemption in the swap sequence. It might currently be safe if the sequence is unpreemptible (which is the case here), including no possibility of panics or synchronous signals. I'm not sure this is ever safe with ROC.

/cc @RLH

@josharian
Copy link
Contributor Author

Another idea, if pointer swaps are common enough (and I suspect anecdotally they are, but have no evidence): Add a writebarrierswap runtime call that swaps two pointers. It'd eliminate one runtime call, at least. Presumably also a typedmemswap. (cc @bradfitz)

@aclements
Copy link
Member

Yet another idea: we could make the write barrier itself unpreemptible, which would let the compiler check runtime.writeBarrier just once per atomic sequence and coalesce some of the control flow. Would that help?

We haven't looked too closely at this because, as far as we can tell, applications spend basically no time in the write barrier. But it's hard to quantify the secondary effects.

@josharian
Copy link
Contributor Author

Yet another idea: we could make the write barrier itself unpreemptible, which would let the compiler check runtime.writeBarrier just once per atomic sequence and coalesce some of the control flow. Would that help?

Yes, I think it would. And there's a TODO in the compiler for it already.

I think the secondary effects (code size, eliminated jumps, resulting new SSA backend optimization opportunities, etc.) are non-trivial. One recent data point is from CL 26666. A minor compiler change removed a single write barrier from the maps implementation and got a 5-8% perf boost on a few benchmarks that hit that code path hard.

@mdempsky
Copy link
Member

Does

func f(s []*int, i, j int) {
    s[i] = s[j]
}

need a write barrier? What's special about swapping pointers? Or is the observation related to s[i] and s[j] being slots in the same heap object?

@josharian
Copy link
Contributor Author

The special thing about swapping pointers is that it doesn't change the reachable set.

@aclements
Copy link
Member

It might currently be safe if the sequence is unpreemptible

Actually, it's not safe even if the sequence is unpreemptible. GC may be in the process of scanning this slice, and if the scan is between indexes i and j and there's no write barrier, you may fail to mark whatever is now at min(i, j).

@josharian
Copy link
Contributor Author

Ah. Oh well. Thanks, Austin.

@golang golang locked and limited conversation to collaborators Aug 17, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants