cmd/compile: regalloc spilling unnecessarily before ops with constrained inputs #17214
Comments
I believe this is a dup of #16061. It would be good to either:
@mundaym @randall77 #16061 is now marked as fixed; can this issue be closed as well?
Ah yes, sorry, it can be closed. The fix for #16061 fixed this too. It would be nice to firm up the 'desired register' code for s390x, but it isn't as important. I'm not completely certain, but it looks to me like the desired registers aren't triggered in this case because the In other words B2's desired registers are ignored if B1 is processed first and B1 is a successor of B2:
I'm not sure I understand your example and what exactly isn't working. Which block is the STMG in? B1 should get desired register information from B2.
Hmm, after studying the code again I think the comment above is wrong, my apologies. I'll see if I can put together a better bug report, but I think (I haven't fully verified this) the following is happening:
(2) is what I was trying to talk about in my comment above, although (1) is probably the more important bit. 'Fixing' (2) would probably result in code similar to that generated by https://golang.org/cl/29732, albeit with values moved into their desired registers at the start of the block rather than just before they are clobbered.
This might have some bearing on the parameters-in-registers experiment, too. I'll keep an eye out for unfortunate-looking register movement.
In the s390x port we have quite a few ops that heavily constrain input registers. For example, STMG4 forces its inputs into R1, R2, R3 and R4. It does this because it requires consecutive registers and there is no other way (that I can think of) to get consecutive input registers using SSA rules. We have a similar problem with instructions that require an even-odd register pair. Other architectures also do something similar for LoweredMove, because it clobbers the input registers.
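To make the kind of constraint described above concrete, here is a minimal sketch of how "each input must be in exactly one register" can be written as a per-input bit mask. The `regMask`, `opConstraint`, `only` and `consecutive` names are hypothetical and chosen for illustration; this is not the compiler's actual register description code, only the general idea.

```go
package main

import "fmt"

// regMask is a bit set of machine registers: bit i set means Ri is allowed.
// Hypothetical type for this sketch, not the compiler's own definition.
type regMask uint64

// only returns a mask that permits exactly one register.
func only(r uint) regMask { return regMask(1) << r }

// opConstraint is a hypothetical description of an op's input constraints:
// inputs[i] is the set of registers that input i may occupy.
type opConstraint struct {
	name   string
	inputs []regMask
}

// consecutive builds the constraint for an op whose n inputs must sit in
// registers first, first+1, ..., first+n-1, one fixed register per input.
func consecutive(name string, first, n uint) opConstraint {
	c := opConstraint{name: name}
	for i := uint(0); i < n; i++ {
		c.inputs = append(c.inputs, only(first+i))
	}
	return c
}

// allowed reports whether input i may live in register r.
func (c opConstraint) allowed(i int, r uint) bool {
	return c.inputs[i]&only(r) != 0
}

func main() {
	// Model the STMG4 example: four inputs pinned to R1, R2, R3 and R4.
	stmg4 := consecutive("STMG4", 1, 4)
	for i, m := range stmg4.inputs {
		fmt.Printf("%s input %d: mask %#x\n", stmg4.name, i, uint64(m))
	}
}
```

Because every input mask contains a single bit, the allocator has no freedom at all for these inputs: whatever currently lives in R1 to R4 has to be moved or spilled first.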
Unfortunately constraining the input registers can cause the register allocator problems. For example, in BenchmarkAppend:
b and d are spilled unnecessarily because we have unallocated registers available and could have re-arranged the inputs as:
I'm planning to take a look at the register allocator to see if I can fix this. I think both these cases are a little fiddly though. I'm not sure if there is a general solution to this problem... Perhaps copying evicted values to a new register when there are further uses close by?
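The "copy instead of spill" idea can be sketched as a toy eviction policy. Everything here is an assumption made for illustration: the `state` type, the 8-register machine, the `nearby` threshold and the `reserved` mask are hypothetical and do not correspond to the real regalloc code.

```go
package main

import "fmt"

const numRegs = 8 // toy machine with registers R0..R7

// state is a hypothetical allocator snapshot for this sketch.
type state struct {
	regs    [numRegs]string // value held by each register; "" means free
	nextUse map[string]int  // distance to each value's next use; absent means dead
}

// freeReg returns a free register that is not in the reserved mask, or -1.
func (s *state) freeReg(reserved uint) int {
	for r, v := range s.regs {
		if v == "" && reserved&(1<<uint(r)) == 0 {
			return r
		}
	}
	return -1
}

// evict empties register r so a constrained op can use it and reports what
// a code generator would have to emit. A live value is copied to a free,
// unreserved register when its next use is within nearby instructions;
// otherwise it is spilled.
func (s *state) evict(r int, reserved uint, nearby int) string {
	v := s.regs[r]
	if v == "" {
		return "nothing to do"
	}
	s.regs[r] = ""
	d, live := s.nextUse[v]
	if !live {
		return fmt.Sprintf("drop %s", v)
	}
	if f := s.freeReg(reserved); f >= 0 && d <= nearby {
		s.regs[f] = v
		return fmt.Sprintf("copy %s: R%d -> R%d", v, r, f)
	}
	return fmt.Sprintf("spill %s from R%d", v, r)
}

func main() {
	s := &state{nextUse: map[string]int{"b": 2, "d": 3}}
	s.regs[1], s.regs[3] = "b", "d"
	const reserved = 0x1e // R1..R4 are needed by the constrained op
	fmt.Println(s.evict(1, reserved, 4))
	fmt.Println(s.evict(3, reserved, 4))
}
```

With free registers outside R1 to R4, both values are kept in registers rather than spilled. A real allocator would also have to weigh the cost of the extra copy against a reload, which this sketch ignores.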
In this particular case it would also help if the StoreReg ops were moved into this basic block (I noticed there are some TODOs to do this in the regalloc code). Currently the generated code ends up something like this: