Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: regalloc spilling unecessarily before ops with constrained inputs #17214

Closed
mundaym opened this issue Sep 23, 2016 · 6 comments
Closed
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mundaym
Copy link
Member

mundaym commented Sep 23, 2016

In the s390x port we have quite a few ops that heavily constrain input registers. For example, STMG4 forces its inputs into R1, R2, R3 and R4. It does this because it requires consecutive registers and there is no other way (that I can think of) to get consecutive input registers using SSA rules. We have a similar problem with instructions that require an even-odd register pair. Other architectures also do something similar for LoweredMoves because it clobbers the input registers.

Unfortunately constraining the input registers can cause the register allocator problems. For example, in BenchmarkAppend:

at entry: R1=b R3=d R4=c

a = MOVDaddr <uintptr> {type.int} _ : R1 // forces b out of R1
b = LoadReg <*uint8> _ : R2
c = Copy <int> R4 : R3                   // forces d out of R3
d = LoadReg <int> _ : R4
_ = STMG4 <mem> _ a b c d _

b and d are spilled unnecessarily because we have unallocated registers available and could have re-arranged the inputs as:

b = Copy <*uint8> R1 : R2
a = MOVDaddr <uintptr> {type.int} : R1
_ = Copy <int> R3 : R5
c = Copy <int> R4 : R3
d = Copy <int> R5 : R4
_ = STMG4 <mem> _ a b c d _

I'm planning to take a look at the register allocator to see if I can fix this. I think both these cases are a little fiddly though. I'm not sure if there is a general solution to this problem... Perhaps copying evicted values to a new register when there are further uses close by?

In this particular case it would also help if the StoreReg ops were moved into this basic block (I noticed there are some TODOs to do this in the regalloc code). Currently the generated code ends up something like this:

for {
     // spill here
     if unlikely {
          // load spilled values
     }
}
@randall77
Copy link
Contributor

I believe this is a dup of #16061.

It would be good to either:

  1. Make the desired register portion of regalloc stronger so it handles these cases.
    Maybe just make sure that s390x is triggering desired registers at all.
  2. Implement "spill to register" instead of spill to stack. That is required for cmd/compile: SSA runtime performance regression on function with multiple idiv instructions #16061, where no amount of desired register information can help, because we desire 2 different values in the same register (two DIVs that both need their args in AX).

@quentinmit
Copy link
Contributor

@mundaym @randall77 #16061 is now marked as fixed; can this issue be closed as well?

@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 3, 2016
@quentinmit quentinmit added this to the Go1.8 milestone Oct 3, 2016
@mundaym
Copy link
Member Author

mundaym commented Oct 3, 2016

Ah yes, sorry, it can be closed. The fix for #16061 fixed this too. It would be nice to firm up the 'desired register' code for s390x but it isn't as important.

I'm not completely certain but it looks to me like the desired registers aren't triggered in this case because the Block containing the STMG instruction has a successor that is also a predecessor and has already been processed.

In other words B2's desired registers are ignored if B1 is processed first and B1 is a successor of B2:

──> B1 ───> B2 ──>
    ^       │
    └───────┘

@mundaym mundaym closed this as completed Oct 3, 2016
@randall77
Copy link
Contributor

I'm not sure I understand your example and what exactly isn't working. Which block is the STMG in? B1 should get desired register information from B2.
Maybe we should discuss in a new bug.

@mundaym
Copy link
Member Author

mundaym commented Oct 3, 2016

Hmm, after studying the code again I think the comment above is wrong, my apologies. I'll see if I can put together a better bug report, but I think (I haven't fully verified this) the following is happening:

  1. The desired register information is being dropped when it is propagated backwards through phis here and so it doesn't make it to the initial register allocation of the value.
  2. The desired register information at the start of the block with the STMG in it (B2) is being overridden by the dynamically calculated values present in the previous block (B1) here.

(2) is what I was trying to talk about in my comment above, although (1) is probably the more important bit. 'Fixing' (2) would probably result in code similar to that generated by https://golang.org/cl/29732, albeit with values moved into their desired registers at the start of the block rather than just before they are clobbered.

@dr2chase
Copy link
Contributor

dr2chase commented Oct 3, 2016

This might have some bearing on the parameters-in-registers experiment, too. I'll keep an eye out for unfortunate-looking register movement.

@golang golang locked and limited conversation to collaborators Oct 3, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants