Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: same global variable loaded twice for function call argument #28484

Closed
martisch opened this issue Oct 30, 2018 · 12 comments
Closed
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@martisch
Copy link
Contributor

What version of Go are you using (go version)?

go version devel +2e9f0817f0 Tue Oct 30 04:39:53 2018 +0000 darwin/amd64

found this when benchmarking runtime.makeslice with make([]T, LenVar, LenVar) where LenVar was a global variable. However the issue can be shown with any function call where two arguments are loaded from the same global variable. The issue of unnecessary loading variables to often might come up in other contexts too. Filling for further investigation and possible general optimization.

Example:

var L = 0
var B bool

//go:noinline
func takesTwo(i, j int) bool {
	return i < j
}

func main() {
	B = takesTwo(L, L)
}

main.L is loaded twice to write the arguments for takesTwo to the stack:

  MOVQ main.L(SB), AX			
  MOVQ AX, 0(SP)				
  MOVQ main.L(SB), AX // Can be avoided	
  MOVQ AX, 0x8(SP)			
  CALL main.takesTwo(SB)			

expected:

  MOVQ main.L(SB), AX			
  MOVQ AX, 0(SP)				
  MOVQ AX, 0x8(SP)			
  CALL main.takesTwo(SB)			
@martisch martisch added Performance help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Oct 30, 2018
@martisch martisch added this to the Unplanned milestone Oct 30, 2018
@martisch
Copy link
Contributor Author

cc @randall77 @josharian

@agnivade
Copy link
Contributor

agnivade commented Nov 9, 2018

This seems like a great issue to get one's hands dirty with the compiler during the freeze.

@josharian @martisch - Could you throw me a bone on where to start looking to investigate something like this ?

godoc is pretty much frozen, so I am ramping up on the dark arts of SSA. And I have passing familiarity with rewrite rules. But nothing more I'm afraid.

Apologies for a couple of beginner questions -

Should this be in some SSA pass ? (if so, which one ?) Should this be in a rewrite rule ? Should this be a new pass ?

@josharian
Copy link
Contributor

Hard to say without digging into it. I'd hope it could be a generic rewrite rule. I'd start by poking around ssa.html and understanding in detail the example code and seeing if you see a clear, generalizable way to fix it. However, as I was recently reminded by @mundaym, rewriting rule (and any SSA work) around memory operations are really tricky to get right. So this might prove not to be an ideal first place to experiment.

@agnivade
Copy link
Contributor

agnivade commented Nov 9, 2018

Thanks ! Yes I have been looking into the ssa.html.

Essentially, we have something like this -

v4 (?) = Addr <*int> {"".L} v3
v5 (13) = Load <int> v4 v1
v6 (?) = OffPtr <*int> [0] v2
v7 (13) = Store <mem> {int} v6 v5 v1
v8 (?) = Addr <*int> {"".L} v3
v9 (13) = Load <int> v8 v7
v10 (?) = OffPtr <*int> [8] v2
v11 (13) = Store <mem> {int} v10 v9 v7
v12 (13) = StaticCall <mem> {"".takesTwo} [24] v11

What we would want is something like this -

v4 (?) = Addr <*int> {"".L} v3
v5 (13) = Load <int> v4 v1
v6 (?) = OffPtr <*int> [0] v2
v7 (13) = Store <mem> {int} v6 v5 v1 // v8 and v9 are redundant. So remove them.
v10 (?) = OffPtr <*int> [8] v2
v11 (13) = Store <mem> {int} v10 v5 v7 // replace v9 by v5
v12 (13) = StaticCall <mem> {"".takesTwo} [24] v11

I am not sure if this can be a rewrite rule. IIUC, we would need to check all params of a StaticCall and then replace a Load-Store with a Store using the already Loaded value.

As you say, will leave it for somebody else.

@josharian
Copy link
Contributor

At a glance, it seems to me that one route is to use alias analysis (currently called func disjoint) to recognize that v7 and v9 don’t overlap, then float the load before the store, and then eliminate the duplicate loads. Or some such memory re-ordering.

Related: It might also be worth looking at Philip Hofer’s old CL that brings alias analysis to the tighten pass. I wanted to get that revived this cycle but didn’t get to it.

@anthonycanino1
Copy link
Contributor

Since it doesn't look like this is planned for Go 1.14, I wouldn't mind digging into the SSA to see what I can find. I plan to hack on this, unless someone feels this issue is more pressing and deserves someone with more experience.

@anthonycanino1
Copy link
Contributor

@josharian can you point me to "Philip Hofer’s old CL"? Perhaps I can take a look at it.

@josharian
Copy link
Contributor

@ivzhh
Copy link

ivzhh commented Dec 29, 2019

I think there is an extra constraint on this. Suppose we have a function sideEffect(), which has side effect and will change L.

var L = 0
var B bool

func sideEffect() int // {
//    L = L+1
//    return L
//}

//go:noinline
func takesTwo(i, j, k int) bool {
	return i < j+k
}

func main() {
	B = takesTwo(L, sideEffect(), L)
}

According to go spec:

In a function call, the function value and arguments are evaluated in the usual order. After they are evaluated, the parameters of the call are passed by value to the function and the called function begins execution

So originally, I thought 1st L and 2nd L has different values. Actually, the order of execution is:

  • Call sideEffect()
  • Load 1st L
  • Load 2nd L

Therefore, we can hoist Load step by step, taking mem() from mem() if mem doesn't affect Load's source address (by alias analysis).

  • Previous Static|Inter|ClosureCall
  • Beginning of the block (after phi)
v4 (14) = StaticCall <mem> {"".sideEffect} [8] v1
v5 (?) = OffPtr <*int> [0] v2
v6 (14) = Load <int> v5 v4
v7 (?) = Addr <*int> {"".L} v3
v8 (14) = Load <int> v7 v4
v9 (14) = Store <mem> {int} v5 v8 v4
v10 (?) = OffPtr <*int> [8] v2
v11 (14) = Store <mem> {int} v10 v6 v9
v12 (?) = OffPtr <*int> [16] v2
v13 (?) = Addr <*int> {"".L} v3
v14 (14) = Load <int> v13 v11
v15 (14) = Store <mem> {int} v12 v14 v11
v16 (14) = StaticCall <mem> {"".takesTwo} [32] v15

@randall77
Copy link
Contributor

@danscales is looking into redoing how we marashal/unmarshal arguments and return values. He was thinking about pushing the actual generation of the stores/loads later into SSA. It would solve this issue as a side effect because the stores of arguments would not be there during CSE, so the two loads would be CSEd.

@josharian
Copy link
Contributor

@danscales I’m thrilled you are looking at this. If you are just getting started, the commit message from 2578ac5 might be helpful.

@cherrymui
Copy link
Member

With the late-call expansion by @dr2chase, this now does only one load:

	0x001d 00029 (b.go:12)	MOVQ	"".L(SB), AX
	0x0024 00036 (b.go:12)	MOVQ	AX, (SP)
	0x0028 00040 (b.go:12)	MOVQ	AX, 8(SP)
	0x002d 00045 (b.go:12)	PCDATA	$1, $0
	0x002d 00045 (b.go:12)	CALL	"".takesTwo(SB)

@golang golang locked and limited conversation to collaborators Oct 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

8 participants