Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: suboptimal regalloc #59297

Open
randall77 opened this issue Mar 28, 2023 · 12 comments
Open

cmd/compile: suboptimal regalloc #59297

randall77 opened this issue Mar 28, 2023 · 12 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Milestone

Comments

@randall77
Copy link
Contributor

package main

var g int

func f(x, y int) {
	h(8, x)
	g = y
}
func h(a, b int)

Generates:

MOVQ	BX, main.y+40(SP)
MOVQ	AX, CX
MOVL	$8, AX
MOVQ	CX, BX
CALL	main.h(SB)
MOVQ	main.y+40(SP), CX
MOVQ	CX, main.g(SB)

Note the move from AX to CX and then to BX. The trip via CX is unnecessary. It does that because y, which starts out in BX, is occupying BX when we start evaluating arguments for the call. But we don't need y in BX at that point, it is necessarily going to get spilled - future uses of y couldn't possibly use BX as a valid copy of y as there is a call in the way.

I think this is similar to the problem kinda fixed by CL 471158, in that it involves something in a register that isn't really live.

@prattmic

@randall77 randall77 added this to the Unplanned milestone Mar 28, 2023
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Mar 28, 2023
@prattmic
Copy link
Member

I've got a fix for this. Maybe not as general as I'd like, but I'll clean it up tomorrow and see what it looks like.

@prattmic prattmic self-assigned this Mar 28, 2023
@prattmic
Copy link
Member

MOVQ BX, main.y+8(SP)
MOVQ AX, BX
MOVL $8, AX
CALL main.h(SB)
MOVQ main.y+8(SP), CX
MOVQ CX, main.g(SB)
RET

@prattmic
Copy link
Member

I think there are two mostly orthogonal ways to look at this and to fix this, though we may want both for the most robustness.

The first is what I think Keith is referring to as the liveness issue. The CALL wants to put arg1 (8) in AX, so it needs to move the existing value (x) out. It decides to use CX instead of BX because BX already contains a value (y). But the CALL doesn’t use y as an argument and it clobbers BX, so BX is already effectively dead. The move of x could select BX with no consequences. That is, we’d treat any registers that are (a) in the clobber set and (b) their values are not used as args to the op, as free to assign.

If we did this, I think it would fix this issue because when moving x, we’d choose BX, which is exactly where we want it to end up. But that is mostly just a coincidence, there is no knowledge of where x eventually needs to be in use. If x were the third argument to h, it would still end up in BX and need an additional move to get into CX.

The second way to look at this is an ordering problem. The CALL has an input value (arg2: x) in an input register (arg1: AX). But it is in the wrong input register! We know that we’ll need to dump the value from AX to place arg1, so we’d be best off to handle inputs with values in the wrong register first so they can “cooperatively” move out of the register we need. That is, we should process arg2 first so that we move out of AX to free it for arg1 and move into exactly the register we need it to be in, BX.

This is much more robust than the first approach, as it doesn’t fix the issue by coincidence. However it isn’t perfect. The extreme example of this is a call with arg1 in BX and arg2 in AX. If we process arg1 first, arg2 will be dumped from AX (probably to CX). If we process arg2 first, arg1 will be dumped from BX (probably to CX). Then arg1 will need an extra move from CX to AX. I believe this generalizes to additional arguments rotating through registers.

That is an edge case, but it also shows where the liveness change could help. If we do need to dump an input, the liveness change would free up many registers, letting us do a register-register move rather than spilling to the stack.

@randall77
Copy link
Contributor Author

Part of the liveness computation is "distance to next use". If that distance included a bit which was "definitely on the other side of a call", we could use that to decide that kicking out that value from a register is ~free. We could even preemptively kick it out in advanceUses?
We kind of have that bit already, it is called unlikelyDistance. I think we'd want something more exact than that, though, as that also encodes things on unlikely branches as well as past calls.

@prattmic
Copy link
Member

I agree that sounds like a good improvement, but I think it falls in the first category I listed above and wouldn't fundamentally fix the specific problem in this issue.

That is, allocation of the CALL arg1 needs to dump the value from AX. We will eventually want that value in BX for arg2, but nothing in liveness analysis happening around dumping AX is going to guarantee that we move the value to BX specifically, it can only make it a more likely coincidence.

@randall77
Copy link
Contributor Author

I think if BX was free, the desired register mechanism would push the value into BX if it needed to be moved out of a different register.

@prattmic
Copy link
Member

Ah, I see what you are saying. I don't think that would work as-is because the logic to select a register to move a value to doesn't consider desired registers, but it seems like considering them there would be a universal improvement.

@prattmic
Copy link
Member

For reference, https://go.dev/cl/480256 is the fix I made using an ordering adjustment, but I'm going to try out the liveness idea as well.

@gopherbot
Copy link

Change https://go.dev/cl/480256 mentions this issue: WIP: cmd/compile: adjust regalloc order for call args

@prattmic
Copy link
Member

That CL affects 1388 functions in cmd/compile, so it is pretty widespread. Though some of these changes just change register load order without affecting the total number of moves. Most of the actually changed cases I saw were cases like the example above, where an argument to func A was passed as an argument in a different slot to func B.

@gopherbot
Copy link

Change https://go.dev/cl/509255 mentions this issue: cmd/compile: regalloc: drop values that aren't used until after a call

@gopherbot
Copy link

Change https://go.dev/cl/508925 mentions this issue: WIP: deallocate registers not used until after next call

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Projects
None yet
Development

No branches or pull requests

3 participants