cmd/compile: amd64 addressing modes are unwieldy #36468

josharian · 2020-01-09T00:00:12Z

There are many amd64 instructions that accept a full range of memory addressing modes. It would be nice to make use of them all.

The current approach is to give each its own op, such as BTRQmodifyidx8 or ANDQloadidx1. The problem is that this generates a large number of ops and rules. To add ANDQloadidx1, we need to find all rules starting with ANDQ and add variants for ANDQloadidx1 as appropriate. And we need to add rules (say) absorbing SHLQ of the index into the scale, and ADDQ into the offset. For an example of how many rules are involved just for adding addressing modes for CMP, see https://golang.org/cl/167089. And that's a fairly simple case, because there aren't many existing rules involving CMPQ that need indexed load analogues.

Such additions have steady incremental improvements in generated code, at the cost of developer time, reviewer time, compiler binary size, and time and memory required to build the compiler (which has proved to be an ongoing problem).

I wonder whether we can find a way to factor out some of this commonality, so we can hoover up all the incremental improvements at much less cost.

One idea is to encode the scale into AuxInt, so we'd have just idx instead of idx1, idx2, idx4, idx8. We'd need helper routines to tell us whether it's ok to adjust the scale in a particular way, since not all scales are available for all bit widths, but that is doable. The great shortcoming of this idea is that idx1 ops are commutative, but the others are not. Maybe just encode the scale for 2, 4, and 8?

Another (vague) idea is to have a separate handwritten pass after lower that does nothing but addressing modes. We'd mark ops with what kind of address modes they support in which argument slots and then do all combining then. We'd still have a combinatorial number of ops, but many fewer rules.

This is actually an instance of a broader amd64 problem around having a combinatorial number of ops (base X const? X load/store/modify X addressingmode). Maybe if we structured this all appropriately, we could generate (rather than hand code) all the ops and their encodings in amd64/ssa.go and their generated rules (like we did for commutativity).

For discussion and ideas.

cc @randall77 @mvdan @martisch @cherrymui

The text was updated successfully, but these errors were encountered:

gopherbot · 2020-01-09T20:05:51Z

Change https://golang.org/cl/214042 mentions this issue: cmd/compile: move amd64 addressing modes to a new pass

josharian · 2020-01-30T06:47:03Z

Here's another example of the combinatorial rule problem with amd64.

LEAx values with non-zero offsets get split at the end of compilation into multiple LEAs to avoid slow LEAs.

Some other values can absorb the LEA offsets, which would lead to a simpler encoding. Here is a sample rule:

(MOVQstoreconst [x] {sym1} (LEA(Q1|Q2|Q4|Q8|L1|L2|L4|L8|W1|W2|W4|W8) <t> [c] {sym2} ptr idx) mem) && c != 0 && ValAndOff(x).canAdd(c) ->
  (MOVQstoreconst [ValAndOff(x).add(c)] {sym1} (LEA(Q1|Q2|Q4|Q8|L1|L2|L4|L8|W1|W2|W4|W8) <t> [0] {sym2} ptr idx) mem)

But if you look at the number of ops that have ValAndOff-style auxints that can absorb an index, there are a bunch...the combinatorial number mentioned above. As such, even though these rules do shorten generated code I'm not inclined to add them.

Another possible approach here is to construct another way to write rewrite rules. We have | to do parallel op selection. We could have named |s to let you do combinatorial op selection. (Maybe the number of consecutive |s?) Something like this:

(X(a|b)(c||d) [z]) -> (Y(a|b)(c||d) [c])

This would generate combinations Xac, Xad, Xbc, Xbd, correlated to Yac, Yad, Ybc, Ybd. If you needed eight, you could have (X(a|b)(c||d)(e|||f) [z]).

Problems with this idea. (1) The syntax is lousy and I don't see a better solution. (2) It generates a lot of rewrite rules. I guess we could do this with loops instead of generation, like we did with commutativity, but that's gonna start hurting my head pretty soon.

gopherbot · 2020-01-30T18:24:07Z

Change https://golang.org/cl/217097 mentions this issue: cmd/compile: insert complicated x86 addressing modes as a separate pass

randall77 · 2020-01-30T18:25:37Z

I hacked together another possible strategy to fix this. Take a look and let me know what you think.

Use a separate compiler pass to introduce complicated x86 addressing modes. Loads in the normal architecture rules (for x86 and all other platforms) can have constant offsets (AuxInt values) and symbols (Aux values), but no more. The complex addressing modes (x+y, x+2*y, etc.) are introduced in a separate pass that combines loads with LEAQx ops. Organizing rewrites this way simplifies the number of rewrites required, as there are lots of different rule orderings that have to be specified to ensure these complex addressing modes are always found if they are possible. Update #36468 Change-Id: I5b4bf7b03a1e731d6dfeb9ef19b376175f3b4b44 Reviewed-on: https://go-review.googlesource.com/c/go/+/217097 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>

gopherbot · 2020-03-10T23:45:04Z

Change https://golang.org/cl/222782 mentions this issue: cmd/compile: convert 386 port to use addressing modes pass

Update #36468 Change-Id: Idfdb845d097994689be450d6e8a57fa9adb57166 Reviewed-on: https://go-review.googlesource.com/c/go/+/222782 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>

randall77 · 2020-03-16T22:10:07Z

I'm going to close this, I think we've done enough, at least for x86. I'll open a separate issue for s390x.

gopherbot · 2020-04-12T19:53:17Z

Change https://golang.org/cl/227960 mentions this issue: cmd/compile: add indexed load+op operations to amd64

name old time/op new time/op delta LoadAdd-16 545ns ± 0% 456ns ± 0% -16.31% (p=0.000 n=10+10) Update #36468 Change-Id: I84f390d55490648fa1f58cdbc24fd74c4f1bc8c1 Reviewed-on: https://go-review.googlesource.com/c/go/+/227960 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>

name old time/op new time/op delta LoadAdd-16 545ns ± 0% 456ns ± 0% -16.31% (p=0.000 n=10+10) Update golang#36468 Change-Id: I84f390d55490648fa1f58cdbc24fd74c4f1bc8c1 Reviewed-on: https://go-review.googlesource.com/c/go/+/227960 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>

josharian added the Performance label Jan 9, 2020

toothrot added this to the Backlog milestone Jan 9, 2020

toothrot added the NeedsInvestigation label Jan 9, 2020

josharian mentioned this issue Jan 30, 2020

cmd/compile: avoid slow versions of LEA instructions on x86 #21735

Open

randall77 mentioned this issue Mar 16, 2020

cmd/compile: use separate addressing modes pass for s390x #37891

Closed

bcmills mentioned this issue Mar 17, 2020

cmd/compile: sporadic memory corruption on 386 (32-bit) builders #37881

Closed

randall77 closed this as completed Jun 22, 2020

rokkerruslan mentioned this issue Jul 27, 2020

cmd/compile: inefficient code generation on amd64 #40426

Closed

golang locked and limited conversation to collaborators Jun 22, 2021

gopherbot added the FrozenDueToAge label Jun 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: amd64 addressing modes are unwieldy #36468

cmd/compile: amd64 addressing modes are unwieldy #36468

josharian commented Jan 9, 2020

gopherbot commented Jan 9, 2020

josharian commented Jan 30, 2020

gopherbot commented Jan 30, 2020

randall77 commented Jan 30, 2020

gopherbot commented Mar 10, 2020

randall77 commented Mar 16, 2020

gopherbot commented Apr 12, 2020

cmd/compile: amd64 addressing modes are unwieldy #36468

cmd/compile: amd64 addressing modes are unwieldy #36468

Comments

josharian commented Jan 9, 2020

gopherbot commented Jan 9, 2020

josharian commented Jan 30, 2020

gopherbot commented Jan 30, 2020

randall77 commented Jan 30, 2020

gopherbot commented Mar 10, 2020

randall77 commented Mar 16, 2020

gopherbot commented Apr 12, 2020