-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: suboptimal regalloc #59297
Comments
I've got a fix for this. Maybe not as general as I'd like, but I'll clean it up tomorrow and see what it looks like. |
|
I think there are two mostly orthogonal ways to look at this and to fix this, though we may want both for the most robustness. The first is what I think Keith is referring to as the liveness issue. The CALL wants to put arg1 ( If we did this, I think it would fix this issue because when moving The second way to look at this is an ordering problem. The CALL has an input value (arg2: This is much more robust than the first approach, as it doesn’t fix the issue by coincidence. However it isn’t perfect. The extreme example of this is a call with arg1 in BX and arg2 in AX. If we process arg1 first, arg2 will be dumped from AX (probably to CX). If we process arg2 first, arg1 will be dumped from BX (probably to CX). Then arg1 will need an extra move from CX to AX. I believe this generalizes to additional arguments rotating through registers. That is an edge case, but it also shows where the liveness change could help. If we do need to dump an input, the liveness change would free up many registers, letting us do a register-register move rather than spilling to the stack. |
Part of the liveness computation is "distance to next use". If that distance included a bit which was "definitely on the other side of a call", we could use that to decide that kicking out that value from a register is ~free. We could even preemptively kick it out in |
I agree that sounds like a good improvement, but I think it falls in the first category I listed above and wouldn't fundamentally fix the specific problem in this issue. That is, allocation of the CALL arg1 needs to dump the value from AX. We will eventually want that value in BX for arg2, but nothing in liveness analysis happening around dumping AX is going to guarantee that we move the value to BX specifically, it can only make it a more likely coincidence. |
I think if BX was free, the desired register mechanism would push the value into BX if it needed to be moved out of a different register. |
Ah, I see what you are saying. I don't think that would work as-is because the logic to select a register to move a value to doesn't consider desired registers, but it seems like considering them there would be a universal improvement. |
For reference, https://go.dev/cl/480256 is the fix I made using an ordering adjustment, but I'm going to try out the liveness idea as well. |
Change https://go.dev/cl/480256 mentions this issue: |
That CL affects 1388 functions in cmd/compile, so it is pretty widespread. Though some of these changes just change register load order without affecting the total number of moves. Most of the actually changed cases I saw were cases like the example above, where an argument to func A was passed as an argument in a different slot to func B. |
Change https://go.dev/cl/509255 mentions this issue: |
Change https://go.dev/cl/508925 mentions this issue: |
Generates:
Note the move from
AX
toCX
and then toBX
. The trip viaCX
is unnecessary. It does that becausey
, which starts out inBX
, is occupyingBX
when we start evaluating arguments for the call. But we don't needy
inBX
at that point, it is necessarily going to get spilled - future uses ofy
couldn't possibly useBX
as a valid copy ofy
as there is a call in the way.I think this is similar to the problem kinda fixed by CL 471158, in that it involves something in a register that isn't really live.
@prattmic
The text was updated successfully, but these errors were encountered: