Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: lay out loop-free, likeliness-free control flow more compactly #20356

Open
josharian opened this issue May 13, 2017 · 16 comments
Open
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Milestone

Comments

@josharian
Copy link
Contributor

josharian commented May 13, 2017

package p

func f() int {
	x := 0
	for i := 0; i < 10; i++ {
		odd := 0
		if i%2 == 0 {
			odd = 2 // not 1, otherwise this branch gets optimized away!
		}
		x += odd
		// Distract the layout pass with a bunch of loops.
		for j := 0; j < 10; j++ {
			for j := 0; j < 10; j++ {
				for j := 0; j < 10; j++ {
					x++
				}
			}
		}
	}
	return x
}

In this code, we have even odds of the if i%2 == 0 branch being taken. But the code layout is pretty uneven. In this CFG, b3 is that branch, and b6 and b19 are the taken/not-taken blocks; both feed into b7. It seems like a good layout for this would be b3 b6 b19 b7 or b3 b19 b6 b7. But we put b19 at the very end of the function.

This is a simplified version of something that also happens in the fannkuch benchmark. See also #20355 and #18977.

For details on how to read this image, see #20355.

noncompactlayout

cc @randall77 @dr2chase @cherrymui

Marking 1.9Maybe because we removed the old backend's instruction re-ordering pass during 1.9; this may help prevent regressions from that.

@josharian
Copy link
Contributor Author

Seems like the layout pass would benefit from being made generally loop-aware.

@cherrymui
Copy link
Member

Does it improve performance by laying b6 and b19 together?

Does the old follow pass put them together? Maybe I should investigate this.

What tool did you use to generate the graphs? I have done this a few times with pen and paper. A tool seems very helpful.

@gopherbot
Copy link

CL https://golang.org/cl/43464 mentions this issue.

@josharian
Copy link
Contributor Author

Does it improve performance by laying b6 and b19 together?

I think it will, in larger functions. There are trade-offs--number of jumps, fwd vs backward, code compactness, jump encoding, etc. I need to experiment, although there are enough other layout-related things in flight (CL 43293, issue #20355), that I'd like to wait a little bit. Just filing this so I don't forget and to gather input from experienced hands.

Does the old follow pass put them together? Maybe I should investigate this.

I don't know. I'd be curious to find out.

What tool did you use to generate the graphs? I have done this a few times with pen and paper. A tool seems very helpful.

I used CL 43464, which gopherbot has helpfully linked to. It has major problems, though. Maybe I'll email golang-dev to get opinions and solicit help on it...

@dr2chase
Copy link
Contributor

@laboger had mentioned that our likeliness/layout was not all that they had hoped.

One possibility I had intended to try was to introduce at least one more level of (un)likeliness, for things like branches to panics where we are "really sure" about likeliness, versus all the cases where we're making less-educated guesses. As a general rule we expect p(loopbackedge) > p(return) > p(panic).

@josharian
Copy link
Contributor Author

Note to self: Interesting test cases for this are moderate-complexity autogenerated SSA rule functions, like rewriteValuedec_OpStore_0.

@josharian
Copy link
Contributor Author

One possibility I had intended to try was to introduce at least one more level of (un)likeliness, for things like branches to panics where we are "really sure" about likeliness, versus all the cases where we're making less-educated guesses. As a general rule we expect p(loopbackedge) > p(return) > p(panic).

See CL 43293 for branches to panics; could be improved, but it is a simple and fairly effective first cut. And see the first (somewhat awful) patchset of that CL for what adding new likeliness levels looks like.

And yes, I think spending some time in 1.10 on likeliness and code layout is worthwhile.

@josharian
Copy link
Contributor Author

Another observation for 1.10 use. It appears that a lot of the non-compactness I observed here may simply be due to the fact that posdegree (and zerodegree) are effectively stacks instead of queues. Though that may become irrelevant if lay out code based primarily on loop nesting information.

@gopherbot
Copy link

CL https://golang.org/cl/43501 mentions this issue.

gopherbot pushed a commit that referenced this issue May 16, 2017
Noticed while looking at #20356.

Cuts 160k (1%) off of the cmd/compile binary.

Change-Id: If2397bc6971d6be9be6975048adecb0b5efa6d66
Reviewed-on: https://go-review.googlesource.com/43501
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@josharian
Copy link
Contributor Author

I think CL 43491 (and possibly @dr2chase's follow-up) is enough to address the 1.9 regression. Punting this to 1.10.

@josharian josharian modified the milestones: Go1.10, Go1.9Maybe May 18, 2017
josharian added a commit to josharian/go that referenced this issue May 23, 2017
DO NOT SUBMIT

This CL adds CFG graphs to ssa.html.
It execs dot to generate SVG,
which then gets inlined into the html.
Some standard naming and javascript hacks
enable integration with the rest of ssa.html:
Clicking on blocks highlights the relevant
part of the CFG graph, and vice versa.

Sample output and screenshots can be seen
in issues golang#20355 and golang#20356.

There are serious problems with this CL, though.

Performance:

* Calling dot after every pass is noticeably slow.
* The generated output is giant.
* The browser is very slow to render the resulting page.
* Clicking on blocks is even slower than before.
* Some things I want to do, like allow the user to change
  the table column widths, lock up the browser.

Appearance:

* The CFGs can easily be large and overwhelming.
  Toggling them on/off might be workable, if the
  performance concerns above were addressed.
* I can't figure out the right size to render the CFGs;
  simple ones are currently oversized and cartoonish,
  while larger ones are unreadable.
* They requires an unsatisfying amount of explanation (see golang#20356).
  Block layout information is particularly inferior/confusing.
* Dead blocks float awkwardly in the sky, with no indication
  that they are dead.
* It'd be nice to somehow add visual information about loops,
  which we can calculate, and which is non-obvious in large graphs,
  but I don't know how.
* It'd be nice to add more information per block,
  like the number of values it contains,
  or even the values themselves,
  but adding info to a node makes the graph even less readable.
  Just adding the f.Blocks index in parens was not good.

Bugs, incompleteness:

* I seem to have broken highlighting around the entire
  block in the text.
* Need to hook up some way to communicate dot-related errors
  without bringing down the process.
* Might need some way to enable/disable dot entirely.

Change-Id: I19abc3007f396bdb710ba7563668d343c0924feb
@bradfitz bradfitz modified the milestones: Go1.10, Go1.11 Nov 28, 2017
@bradfitz bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018
@ysmolski
Copy link
Member

ysmolski commented Oct 12, 2018

I am looking at this. I was able to generate CFG using tip and the @josharian tool for the program in the topic. It's somewhat different from what was posted 1+ year ago (not a surprise). I got this:

screen shot 2018-10-12 at 15 35 47

For the full picture see the attached ssa.html:
ssa.html.zip

Now I wonder how can we optimize the layout? What is the desired order of blocks in this case?

EDIT: I have uploaded improved versions of CFGs pictures.

@ysmolski
Copy link
Member

ysmolski commented Oct 17, 2018

For anyone who will work on this. Current version of gc does not have branching in layout pass for this code:

if i%2 == 0 {
     odd = 2
}

Layout pre/post b3 has the following code:

b3: ← b2
v21 (+10) = ADDQconst <int> [-1069] v52 (odd[int])
v57 (+8) = SHRQconst <int> [63] v7
v60 (?) = MOVQconst <int> [0]
v36 (+8) = ADDQ <int> v57 v7
v31 (+8) = SARQconst <int> [1] v36
v50 (+8) = SHLQconst <int> [1] v31
v15 (8) = CMPQ <flags> v7 v50
v23 (13) = CMOVQEQ <int> v5 v21 v15 (odd[int])
v24 (+13) = ADDQ <int> v23 v52 (x[int])
Plain → b8 (8)

It uses conditional move and thus you cannot replicate the bug with the code in the topic!

@ysmolski
Copy link
Member

Code that reproduces the problem:

package p

//go:noinline
func g() {
}

func f() int {
	x := 0
	for i := 0; i < 10; i++ {
		odd := 0
		if i%2 == 0 {
			odd = 2 // not 1, otherwise this branch gets optimized away!
			g()
		}
		x += odd
		// Distract the layout pass with a bunch of loops.
		for j := 0; j < 10; j++ {
			for j := 0; j < 10; j++ {
				for j := 0; j < 10; j++ {
					x++
				}
			}
		}
	}
	return x
}

@ysmolski
Copy link
Member

ysmolski commented Oct 17, 2018

@dr2chase:

One possibility I had intended to try was to introduce at least one more level of (un)likeliness, for things like branches to panics where we are "really sure" about likeliness, versus all the cases where we're making less-educated guesses. As a general rule we expect p(loopbackedge) > p(return) > p(panic).

I agree that it would solve this problem generally, in the case of i%2 == 0 we are pretty safe to assume that it's 50%/50%, while the compiler estimates that as unlikely. I am not sure why it is not the BranchUknown for this condition? Looks like the likelyadjsut pass estimates this to unlikely.

I've read in some very popular book that compiler could profile the code first to estimate likeness of branches and then make the optimization. Laughs and idealistic approaches aside, we can try to introduce what David has suggested.

@josharian what do you think?

@CAFxX
Copy link
Contributor

CAFxX commented Oct 17, 2018

I've read in some very popular book that compiler could profile the code first to estimate likeness of branches and then make the optimization. Laughs and idealistic approaches aside, we can try to introduce what David has suggested.

While we're on the topic, it seems there is no github issue for PGO, that may well be the only good way to address issues such as this or the midstack inlining one. Is it deemed completely out of scope for gc?

@josharian
Copy link
Contributor Author

I’ll file a PGO issue.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Projects
None yet
Development

No branches or pull requests

7 participants