-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: group write barrier calls more aggresively #19838
Comments
A simple rule of thumb would be to do whatever generates less total code. So if the source is
Compare (some approximation of) the code size between
and
So basically, if the size of the N stores is smaller than an additional condition (load, compare, conditional branch, unconditional branch). |
I've started experimenting with this. I hope to some data tomorrow. |
CL https://golang.org/cl/42010 mentions this issue. |
CL https://golang.org/cl/42011 mentions this issue. |
CL https://golang.org/cl/42013 mentions this issue. |
The CL series just mailed takes a look at this. The final CL needs some numbers. I'd love suggestions about what other than binary size to look at--e.g. regular benchmarks to run. Unfortunately, my original personal motivation for digging into this--improving compilation time of large static literals--isn't helped, since that involves duplicating calls, not stores, and that's probably a bad idea. So back to the drawing board on those. |
This CL mainly moves some work to the switch on w.Op, to make a follow-up change simpler and clearer. Updates #19838 Change-Id: I86f3181c380dd60960afcc24224f655276b8956c Reviewed-on: https://go-review.googlesource.com/42010 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
…ocks OpVarXXX Values don't generate instructions, so there's no reason not to duplicate them, and duplicating them generates better code (fewer branches). This requires changing the start/end accounting to correctly handle the case in which we have run of Values beginning with an OpVarXXX, e.g. OpVarDef, OpZeroWB, OpMoveWB. In that case, the sequence of values should begin at the OpZeroWB, not the OpVarDef. This also lays the groundwork for experimenting with allowing duplication of some scalar stores. Shrinks function text sizes a tiny amount: name old object-bytes new object-bytes delta Template 381k ± 0% 381k ± 0% -0.01% (p=0.008 n=5+5) Unicode 203k ± 0% 203k ± 0% -0.04% (p=0.008 n=5+5) GoTypes 1.17M ± 0% 1.17M ± 0% -0.01% (p=0.008 n=5+5) SSA 8.24M ± 0% 8.24M ± 0% -0.00% (p=0.008 n=5+5) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 286k ± 0% 286k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 189k ± 0% 189k ± 0% ~ (all equal) XML 415k ± 0% 415k ± 0% -0.01% (p=0.008 n=5+5) Updates golang#19838 Change-Id: Ic5ef30855919f1468066eba08ae5c4bd9a01db27
DO NOT SUBMIT [mailing for initial feedback, needs benchmark numbers] Duplicate some scalar stores on both halves of a writeBarrier.enabled block, to reduce branching. There are other memory ops besides OpVarXXX and OpStore that we could consider here: OpMove, OpZero, Op*Call, atomic stores, OpSelect1, OpPhi. However, OpVarXXX and OpStore are by far the most common, and they correspond to a fixed number of instructions. The others are all rare and are either tricky (OpSelect1, OpPhi, atomic stores) or correspond to potentially lots of code. Fixes golang#19838 Change-Id: I0d5a68c2dfb3b6a6916b92151b3e64df533d4581
This CL mainly moves some work to the switch on w.Op, to make a follow-up change simpler and clearer. Updates golang#19838 Change-Id: I86f3181c380dd60960afcc24224f655276b8956c
…ocks OpVarXXX Values don't generate instructions, so there's no reason not to duplicate them, and duplicating them generates better code (fewer branches). This requires changing the start/end accounting to correctly handle the case in which we have run of Values beginning with an OpVarXXX, e.g. OpVarDef, OpZeroWB, OpMoveWB. In that case, the sequence of values should begin at the OpZeroWB, not the OpVarDef. This also lays the groundwork for experimenting with allowing duplication of some scalar stores. Shrinks function text sizes a tiny amount: name old object-bytes new object-bytes delta Template 381k ± 0% 381k ± 0% -0.01% (p=0.008 n=5+5) Unicode 203k ± 0% 203k ± 0% -0.04% (p=0.008 n=5+5) GoTypes 1.17M ± 0% 1.17M ± 0% -0.01% (p=0.008 n=5+5) SSA 8.24M ± 0% 8.24M ± 0% -0.00% (p=0.008 n=5+5) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 286k ± 0% 286k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 189k ± 0% 189k ± 0% ~ (all equal) XML 415k ± 0% 415k ± 0% -0.01% (p=0.008 n=5+5) Updates #19838 Change-Id: Ic5ef30855919f1468066eba08ae5c4bd9a01db27 Reviewed-on: https://go-review.googlesource.com/42011 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
…ocks OpVarXXX Values don't generate instructions, so there's no reason not to duplicate them, and duplicating them generates better code (fewer branches). This requires changing the start/end accounting to correctly handle the case in which we have run of Values beginning with an OpVarXXX, e.g. OpVarDef, OpZeroWB, OpMoveWB. In that case, the sequence of values should begin at the OpZeroWB, not the OpVarDef. This also lays the groundwork for experimenting with allowing duplication of some scalar stores. Shrinks function text sizes a tiny amount: name old object-bytes new object-bytes delta Template 381k ± 0% 381k ± 0% -0.01% (p=0.008 n=5+5) Unicode 203k ± 0% 203k ± 0% -0.04% (p=0.008 n=5+5) GoTypes 1.17M ± 0% 1.17M ± 0% -0.01% (p=0.008 n=5+5) SSA 8.24M ± 0% 8.24M ± 0% -0.00% (p=0.008 n=5+5) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 286k ± 0% 286k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 189k ± 0% 189k ± 0% ~ (all equal) XML 415k ± 0% 415k ± 0% -0.01% (p=0.008 n=5+5) Updates golang#19838 Change-Id: Ic5ef30855919f1468066eba08ae5c4bd9a01db27
DO NOT SUBMIT [mailing for initial feedback, needs benchmark numbers] Duplicate some scalar stores on both halves of a writeBarrier.enabled block, to reduce branching. There are other memory ops besides OpVarXXX and OpStore that we could consider here: OpMove, OpZero, Op*Call, atomic stores, OpSelect1, OpPhi. However, OpVarXXX and OpStore are by far the most common, and they correspond to a fixed number of instructions. The others are all rare and are either tricky (OpSelect1, OpPhi, atomic stores) or correspond to potentially lots of code. Fixes golang#19838 Change-Id: I0d5a68c2dfb3b6a6916b92151b3e64df533d4581
Anything more on this front should wait for 1.10. |
This optimization would interact poorly with safe-points everywhere, since we have to disallow safe-points between the write barrier flag check and the last write barrier that depends on it (or, if we can construct the write barrier as a restartable sequence, back up to the flag load, but that doesn't help this optimization). Though I think this optimization may be less important now, too, since we generate significantly less code for a write barrier call than we used to. We could generate even less if we pulled the write out of runtime.gcWriteBarrier. |
I never got conclusive improvements when I implemented this. Let’s focus any effort on this front on improving write barriers generally. (E.g. #24686) |
Currently, for things like
The compiler generates separate branches for v1, v3, v5. Something like:
Sometimes it creates too many blocks. (When building SSA we try to put pointer writes together exactly for this reason.)
It would be better to group them in a single branch, when it is safe. But it needs to know whether it is ok to reorder stores.
Alternatively, maybe we could duplicate v2, v4 to both side of the branch (some conditions needed, at least v2.Uses == 1). But the question is how many of them we want to duplicate. What if there are 5 no-WB stores between two WB stores?
The text was updated successfully, but these errors were encountered: