cmd/compile: group write barrier calls more aggresively #19838

cherrymui · 2017-04-04T15:42:44Z

Currently, for things like

v0 = some memory
v1 = Store ...  v0 // write barrier
v2 = Store ...  v1 // no write barrier
v3 = Store ...  v2 // write barrier
v4 = Store ...  v3 // no write barrier
v5 = Store ...  v4 // write barrier

The compiler generates separate branches for v1, v3, v5. Something like:

v0 = some memory
if write barrier is on {
    ... // prepare args
    v1a = Call runtime.writebarrierPtr ...
} else {
    v1b = Store ...  v0 // regular store
}
v1 = Phi v1a v1b
v2 = Store ...  v1 // no write barrier
if write barrier is on {
    ... // prepare args
    v3a = Call runtime.writebarrierPtr ...
} else {
    v3b = Store ...  v2 // regular store
}
v3 = Phi v3a v3b
v4 = Store ...  v3 // no write barrier
if write barrier is on {
    ... // prepare args
    v5a = Call runtime.writebarrierPtr ...
} else {
    v5b = Store ...  v4 // regular store
}
v5 = Phi v5a v5b

Sometimes it creates too many blocks. (When building SSA we try to put pointer writes together exactly for this reason.)

It would be better to group them in a single branch, when it is safe. But it needs to know whether it is ok to reorder stores.

Alternatively, maybe we could duplicate v2, v4 to both side of the branch (some conditions needed, at least v2.Uses == 1). But the question is how many of them we want to duplicate. What if there are 5 no-WB stores between two WB stores?

The text was updated successfully, but these errors were encountered:

randall77 · 2017-04-04T19:41:09Z

A simple rule of thumb would be to do whatever generates less total code. So if the source is

storeptr
store * N
storeptr

Compare (some approximation of) the code size between

if wbon {
  storeptrwb
} else {
  storeptr
}
store * N
if wbon {
  storeptrwb
} else {
  storeptr
}

and

if wbon {
   storeptrwb
   store *N
   storeptrwb
} else {
   storeptr
   store *N
   storeptr
}

So basically, if the size of the N stores is smaller than an additional condition (load, compare, conditional branch, unconditional branch).

josharian · 2017-04-27T01:47:12Z

I've started experimenting with this. I hope to some data tomorrow.

gopherbot · 2017-04-27T20:13:36Z

CL https://golang.org/cl/42010 mentions this issue.

gopherbot · 2017-04-27T20:13:36Z

CL https://golang.org/cl/42011 mentions this issue.

gopherbot · 2017-04-27T21:00:52Z

CL https://golang.org/cl/42013 mentions this issue.

josharian · 2017-04-27T21:02:03Z

The CL series just mailed takes a look at this. The final CL needs some numbers. I'd love suggestions about what other than binary size to look at--e.g. regular benchmarks to run.

Unfortunately, my original personal motivation for digging into this--improving compilation time of large static literals--isn't helped, since that involves duplicating calls, not stores, and that's probably a bad idea. So back to the drawing board on those.

This CL mainly moves some work to the switch on w.Op, to make a follow-up change simpler and clearer. Updates #19838 Change-Id: I86f3181c380dd60960afcc24224f655276b8956c Reviewed-on: https://go-review.googlesource.com/42010 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>

…ocks OpVarXXX Values don't generate instructions, so there's no reason not to duplicate them, and duplicating them generates better code (fewer branches). This requires changing the start/end accounting to correctly handle the case in which we have run of Values beginning with an OpVarXXX, e.g. OpVarDef, OpZeroWB, OpMoveWB. In that case, the sequence of values should begin at the OpZeroWB, not the OpVarDef. This also lays the groundwork for experimenting with allowing duplication of some scalar stores. Shrinks function text sizes a tiny amount: name old object-bytes new object-bytes delta Template 381k ± 0% 381k ± 0% -0.01% (p=0.008 n=5+5) Unicode 203k ± 0% 203k ± 0% -0.04% (p=0.008 n=5+5) GoTypes 1.17M ± 0% 1.17M ± 0% -0.01% (p=0.008 n=5+5) SSA 8.24M ± 0% 8.24M ± 0% -0.00% (p=0.008 n=5+5) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 286k ± 0% 286k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 189k ± 0% 189k ± 0% ~ (all equal) XML 415k ± 0% 415k ± 0% -0.01% (p=0.008 n=5+5) Updates golang#19838 Change-Id: Ic5ef30855919f1468066eba08ae5c4bd9a01db27

DO NOT SUBMIT [mailing for initial feedback, needs benchmark numbers] Duplicate some scalar stores on both halves of a writeBarrier.enabled block, to reduce branching. There are other memory ops besides OpVarXXX and OpStore that we could consider here: OpMove, OpZero, Op*Call, atomic stores, OpSelect1, OpPhi. However, OpVarXXX and OpStore are by far the most common, and they correspond to a fixed number of instructions. The others are all rare and are either tricky (OpSelect1, OpPhi, atomic stores) or correspond to potentially lots of code. Fixes golang#19838 Change-Id: I0d5a68c2dfb3b6a6916b92151b3e64df533d4581

This CL mainly moves some work to the switch on w.Op, to make a follow-up change simpler and clearer. Updates golang#19838 Change-Id: I86f3181c380dd60960afcc24224f655276b8956c

…ocks OpVarXXX Values don't generate instructions, so there's no reason not to duplicate them, and duplicating them generates better code (fewer branches). This requires changing the start/end accounting to correctly handle the case in which we have run of Values beginning with an OpVarXXX, e.g. OpVarDef, OpZeroWB, OpMoveWB. In that case, the sequence of values should begin at the OpZeroWB, not the OpVarDef. This also lays the groundwork for experimenting with allowing duplication of some scalar stores. Shrinks function text sizes a tiny amount: name old object-bytes new object-bytes delta Template 381k ± 0% 381k ± 0% -0.01% (p=0.008 n=5+5) Unicode 203k ± 0% 203k ± 0% -0.04% (p=0.008 n=5+5) GoTypes 1.17M ± 0% 1.17M ± 0% -0.01% (p=0.008 n=5+5) SSA 8.24M ± 0% 8.24M ± 0% -0.00% (p=0.008 n=5+5) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 286k ± 0% 286k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 189k ± 0% 189k ± 0% ~ (all equal) XML 415k ± 0% 415k ± 0% -0.01% (p=0.008 n=5+5) Updates #19838 Change-Id: Ic5ef30855919f1468066eba08ae5c4bd9a01db27 Reviewed-on: https://go-review.googlesource.com/42011 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>

…ocks OpVarXXX Values don't generate instructions, so there's no reason not to duplicate them, and duplicating them generates better code (fewer branches). This requires changing the start/end accounting to correctly handle the case in which we have run of Values beginning with an OpVarXXX, e.g. OpVarDef, OpZeroWB, OpMoveWB. In that case, the sequence of values should begin at the OpZeroWB, not the OpVarDef. This also lays the groundwork for experimenting with allowing duplication of some scalar stores. Shrinks function text sizes a tiny amount: name old object-bytes new object-bytes delta Template 381k ± 0% 381k ± 0% -0.01% (p=0.008 n=5+5) Unicode 203k ± 0% 203k ± 0% -0.04% (p=0.008 n=5+5) GoTypes 1.17M ± 0% 1.17M ± 0% -0.01% (p=0.008 n=5+5) SSA 8.24M ± 0% 8.24M ± 0% -0.00% (p=0.008 n=5+5) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 286k ± 0% 286k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 189k ± 0% 189k ± 0% ~ (all equal) XML 415k ± 0% 415k ± 0% -0.01% (p=0.008 n=5+5) Updates golang#19838 Change-Id: Ic5ef30855919f1468066eba08ae5c4bd9a01db27

DO NOT SUBMIT [mailing for initial feedback, needs benchmark numbers] Duplicate some scalar stores on both halves of a writeBarrier.enabled block, to reduce branching. There are other memory ops besides OpVarXXX and OpStore that we could consider here: OpMove, OpZero, Op*Call, atomic stores, OpSelect1, OpPhi. However, OpVarXXX and OpStore are by far the most common, and they correspond to a fixed number of instructions. The others are all rare and are either tricky (OpSelect1, OpPhi, atomic stores) or correspond to potentially lots of code. Fixes golang#19838 Change-Id: I0d5a68c2dfb3b6a6916b92151b3e64df533d4581

josharian · 2017-05-15T17:34:24Z

Anything more on this front should wait for 1.10.

aclements · 2018-05-18T18:42:58Z

This optimization would interact poorly with safe-points everywhere, since we have to disallow safe-points between the write barrier flag check and the last write barrier that depends on it (or, if we can construct the write barrier as a restartable sequence, back up to the flag load, but that doesn't help this optimization). Though I think this optimization may be less important now, too, since we generate significantly less code for a write barrier call than we used to. We could generate even less if we pulled the write out of runtime.gcWriteBarrier.

josharian · 2018-05-18T19:24:23Z

I never got conclusive improvements when I implemented this. Let’s focus any effort on this front on improving write barriers generally. (E.g. #24686)

cherrymui added this to the Unplanned milestone Apr 4, 2017

cherrymui mentioned this issue Apr 4, 2017

cmd/compile: slow to compile large map literals containing dynamic elements #19751

Open

josharian added the Performance label Apr 4, 2017

josharian mentioned this issue Apr 17, 2017

runtime: add per-G shadows of writeBarrier.enabled #20005

Open

josharian added the ToolSpeed label Apr 24, 2017

josharian modified the milestones: Go1.9, Unplanned Apr 24, 2017

josharian mentioned this issue Apr 24, 2017

cmd/compile: big binary and slow compilation times with maps & []interface{} in static code #20095

Open

josharian modified the milestones: Go1.10, Go1.9 May 15, 2017

bradfitz modified the milestones: Go1.10, Go1.11 Nov 28, 2017

bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018

josharian closed this as completed May 18, 2018

golang locked and limited conversation to collaborators May 18, 2019

gopherbot added the FrozenDueToAge label May 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: group write barrier calls more aggresively #19838

cmd/compile: group write barrier calls more aggresively #19838

cherrymui commented Apr 4, 2017

randall77 commented Apr 4, 2017

josharian commented Apr 27, 2017

gopherbot commented Apr 27, 2017

gopherbot commented Apr 27, 2017

gopherbot commented Apr 27, 2017

josharian commented Apr 27, 2017

josharian commented May 15, 2017

aclements commented May 18, 2018

josharian commented May 18, 2018

cmd/compile: group write barrier calls more aggresively #19838

cmd/compile: group write barrier calls more aggresively #19838

Comments

cherrymui commented Apr 4, 2017

randall77 commented Apr 4, 2017

josharian commented Apr 27, 2017

gopherbot commented Apr 27, 2017

gopherbot commented Apr 27, 2017

gopherbot commented Apr 27, 2017

josharian commented Apr 27, 2017

josharian commented May 15, 2017

aclements commented May 18, 2018

josharian commented May 18, 2018