-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: amd64 addressing modes are unwieldy #36468
Comments
Change https://golang.org/cl/214042 mentions this issue: |
Here's another example of the combinatorial rule problem with amd64. LEAx values with non-zero offsets get split at the end of compilation into multiple LEAs to avoid slow LEAs. Some other values can absorb the LEA offsets, which would lead to a simpler encoding. Here is a sample rule:
But if you look at the number of ops that have ValAndOff-style auxints that can absorb an index, there are a bunch...the combinatorial number mentioned above. As such, even though these rules do shorten generated code I'm not inclined to add them. Another possible approach here is to construct another way to write rewrite rules. We have
This would generate combinations Xac, Xad, Xbc, Xbd, correlated to Yac, Yad, Ybc, Ybd. If you needed eight, you could have Problems with this idea. (1) The syntax is lousy and I don't see a better solution. (2) It generates a lot of rewrite rules. I guess we could do this with loops instead of generation, like we did with commutativity, but that's gonna start hurting my head pretty soon. |
Change https://golang.org/cl/217097 mentions this issue: |
I hacked together another possible strategy to fix this. Take a look and let me know what you think. |
Use a separate compiler pass to introduce complicated x86 addressing modes. Loads in the normal architecture rules (for x86 and all other platforms) can have constant offsets (AuxInt values) and symbols (Aux values), but no more. The complex addressing modes (x+y, x+2*y, etc.) are introduced in a separate pass that combines loads with LEAQx ops. Organizing rewrites this way simplifies the number of rewrites required, as there are lots of different rule orderings that have to be specified to ensure these complex addressing modes are always found if they are possible. Update #36468 Change-Id: I5b4bf7b03a1e731d6dfeb9ef19b376175f3b4b44 Reviewed-on: https://go-review.googlesource.com/c/go/+/217097 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Change https://golang.org/cl/222782 mentions this issue: |
Update #36468 Change-Id: Idfdb845d097994689be450d6e8a57fa9adb57166 Reviewed-on: https://go-review.googlesource.com/c/go/+/222782 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
I'm going to close this, I think we've done enough, at least for x86. I'll open a separate issue for s390x. |
Change https://golang.org/cl/227960 mentions this issue: |
name old time/op new time/op delta LoadAdd-16 545ns ± 0% 456ns ± 0% -16.31% (p=0.000 n=10+10) Update #36468 Change-Id: I84f390d55490648fa1f58cdbc24fd74c4f1bc8c1 Reviewed-on: https://go-review.googlesource.com/c/go/+/227960 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
name old time/op new time/op delta LoadAdd-16 545ns ± 0% 456ns ± 0% -16.31% (p=0.000 n=10+10) Update golang#36468 Change-Id: I84f390d55490648fa1f58cdbc24fd74c4f1bc8c1 Reviewed-on: https://go-review.googlesource.com/c/go/+/227960 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
There are many amd64 instructions that accept a full range of memory addressing modes. It would be nice to make use of them all.
The current approach is to give each its own op, such as
BTRQmodifyidx8
orANDQloadidx1
. The problem is that this generates a large number of ops and rules. To addANDQloadidx1
, we need to find all rules starting withANDQ
and add variants forANDQloadidx1
as appropriate. And we need to add rules (say) absorbingSHLQ
of the index into the scale, andADDQ
into the offset. For an example of how many rules are involved just for adding addressing modes forCMP
, see https://golang.org/cl/167089. And that's a fairly simple case, because there aren't many existing rules involvingCMPQ
that need indexed load analogues.Such additions have steady incremental improvements in generated code, at the cost of developer time, reviewer time, compiler binary size, and time and memory required to build the compiler (which has proved to be an ongoing problem).
I wonder whether we can find a way to factor out some of this commonality, so we can hoover up all the incremental improvements at much less cost.
One idea is to encode the scale into AuxInt, so we'd have just
idx
instead ofidx1
,idx2
,idx4
,idx8
. We'd need helper routines to tell us whether it's ok to adjust the scale in a particular way, since not all scales are available for all bit widths, but that is doable. The great shortcoming of this idea is thatidx1
ops are commutative, but the others are not. Maybe just encode the scale for 2, 4, and 8?Another (vague) idea is to have a separate handwritten pass after lower that does nothing but addressing modes. We'd mark ops with what kind of address modes they support in which argument slots and then do all combining then. We'd still have a combinatorial number of ops, but many fewer rules.
This is actually an instance of a broader amd64 problem around having a combinatorial number of ops (base X const? X load/store/modify X addressingmode). Maybe if we structured this all appropriately, we could generate (rather than hand code) all the ops and their encodings in amd64/ssa.go and their generated rules (like we did for commutativity).
For discussion and ideas.
cc @randall77 @mvdan @martisch @cherrymui
The text was updated successfully, but these errors were encountered: