cmd/compile: lay out loop-free, likeliness-free control flow more compactly #20356

josharian · 2017-05-13T16:43:43Z

package p

func f() int {
	x := 0
	for i := 0; i < 10; i++ {
		odd := 0
		if i%2 == 0 {
			odd = 2 // not 1, otherwise this branch gets optimized away!
		}
		x += odd
		// Distract the layout pass with a bunch of loops.
		for j := 0; j < 10; j++ {
			for j := 0; j < 10; j++ {
				for j := 0; j < 10; j++ {
					x++
				}
			}
		}
	}
	return x
}

In this code, we have even odds of the if i%2 == 0 branch being taken. But the code layout is pretty uneven. In this CFG, b3 is that branch, and b6 and b19 are the taken/not-taken blocks; both feed into b7. It seems like a good layout for this would be b3 b6 b19 b7 or b3 b19 b6 b7. But we put b19 at the very end of the function.
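One rough way to put a number on "compact" here is to measure how far apart the four blocks of the diamond end up in the final block order. The sketch below is only an illustration: the `span` helper and the block names other than b3, b6, b19 and b7 are hypothetical stand-ins, not taken from the compiler or from the actual CFG.

```go
package main

import "fmt"

// span reports the distance, in layout positions, between the earliest and
// latest of the named blocks. It returns -1 if a block is not in the layout.
// This is a hypothetical metric, not something the compiler computes.
func span(layout []string, blocks ...string) int {
	if len(blocks) == 0 {
		return 0
	}
	pos := make(map[string]int, len(layout))
	for i, b := range layout {
		pos[b] = i
	}
	lo, hi := len(layout), -1
	for _, b := range blocks {
		p, ok := pos[b]
		if !ok {
			return -1
		}
		if p < lo {
			lo = p
		}
		if p > hi {
			hi = p
		}
	}
	return hi - lo
}

func main() {
	diamond := []string{"b3", "b6", "b19", "b7"}

	// Hypothetical layouts; loopA..loopC and exit stand in for the other blocks.
	compact := []string{"b1", "b2", "b3", "b6", "b19", "b7", "loopA", "loopB", "loopC", "exit"}
	spread := []string{"b1", "b2", "b3", "b6", "b7", "loopA", "loopB", "loopC", "exit", "b19"}

	fmt.Println("compact:", span(compact, diamond...)) // compact: 3
	fmt.Println("spread: ", span(spread, diamond...))  // spread:  7
}
```

A smaller span keeps both arms of the branch close to the block that selects between them, which is the property the suggested layouts b3 b6 b19 b7 and b3 b19 b6 b7 share.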

This is a simplified version of something that also happens in the fannkuch benchmark. See also #20355 and #18977.

For details on how to read this image, see #20355.

cc @randall77 @dr2chase @cherrymui

Marking 1.9Maybe because we removed the old backend's instruction re-ordering pass during 1.9; this may help prevent regressions from that.


josharian · 2017-05-13T17:51:31Z

Seems like the layout pass would benefit from being made generally loop-aware.

cherrymui · 2017-05-13T18:53:38Z

Does it improve performance by laying b6 and b19 together?

Does the old follow pass put them together? Maybe I should investigate this.

What tool did you use to generate the graphs? I have done this a few times with pen and paper. A tool seems very helpful.

gopherbot · 2017-05-14T00:07:21Z

CL https://golang.org/cl/43464 mentions this issue.

josharian · 2017-05-14T00:10:46Z

> Does it improve performance by laying b6 and b19 together?

I think it will, in larger functions. There are trade-offs: number of jumps, forward vs backward jumps, code compactness, jump encoding, etc. I need to experiment, although there are enough other layout-related things in flight (CL 43293, issue #20355) that I'd like to wait a little bit. Just filing this so I don't forget and to gather input from experienced hands.

> Does the old follow pass put them together? Maybe I should investigate this.

I don't know. I'd be curious to find out.

> What tool did you use to generate the graphs? I have done this a few times with pen and paper. A tool seems very helpful.

I used CL 43464, which gopherbot has helpfully linked to. It has major problems, though. Maybe I'll email golang-dev to get opinions and solicit help on it...

dr2chase · 2017-05-15T16:38:16Z

@laboger had mentioned that our likeliness/layout was not all that they had hoped.

One possibility I had intended to try was to introduce at least one more level of (un)likeliness, for things like branches to panics where we are "really sure" about likeliness, versus all the cases where we're making less-educated guesses. As a general rule we expect p(loopbackedge) > p(return) > p(panic).
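To make the idea concrete, here is a minimal sketch of what a finer-grained likeliness scale could look like. Everything in it is hypothetical: the `EdgeKind` and `Likeliness` types, their names, and the placement of "nothing known" between returns and loop backedges are assumptions for illustration, not the compiler's actual representation.

```go
package main

import "fmt"

// EdgeKind is a hypothetical classification of where a branch edge leads,
// ordered by assumed probability: p(panic) < p(return) < p(loop backedge).
// Placing ToOther between return and backedge is an assumption.
type EdgeKind int

const (
	ToPanic        EdgeKind = iota // edge leads to a panicking block
	ToReturn                       // edge leads to a function return
	ToOther                        // nothing known about the edge
	ToLoopBackedge                 // edge jumps back to a loop header
)

// Likeliness is a hypothetical five-level prediction. The "very" levels are
// reserved for cases where we are "really sure", such as branches to panics.
type Likeliness int

const (
	VeryUnlikely Likeliness = iota - 2
	Unlikely
	Unknown
	Likely
	VeryLikely
)

func (l Likeliness) String() string {
	names := [...]string{"very unlikely", "unlikely", "unknown", "likely", "very likely"}
	return names[l+2]
}

// predict estimates how likely the taken edge is relative to the other edge.
func predict(taken, notTaken EdgeKind) Likeliness {
	switch {
	case taken == notTaken:
		return Unknown
	case taken == ToPanic:
		return VeryUnlikely
	case notTaken == ToPanic:
		return VeryLikely
	case taken > notTaken:
		return Likely // a less-educated guess
	default:
		return Unlikely // a less-educated guess
	}
}

func main() {
	fmt.Println(predict(ToPanic, ToOther))         // very unlikely
	fmt.Println(predict(ToLoopBackedge, ToReturn)) // likely
	fmt.Println(predict(ToOther, ToOther))         // unknown
}
```

Separating the "really sure" levels from the guesses would let a layout pass treat a guess as a weak hint while still moving panic paths out of line.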

josharian · 2017-05-15T16:38:23Z

Note to self: Interesting test cases for this are moderate-complexity autogenerated SSA rule functions, like rewriteValuedec_OpStore_0.

josharian · 2017-05-15T16:47:24Z

> One possibility I had intended to try was to introduce at least one more level of (un)likeliness, for things like branches to panics where we are "really sure" about likeliness, versus all the cases where we're making less-educated guesses. As a general rule we expect p(loopbackedge) > p(return) > p(panic).

See CL 43293 for branches to panics; could be improved, but it is a simple and fairly effective first cut. And see the first (somewhat awful) patchset of that CL for what adding new likeliness levels looks like.

And yes, I think spending some time in 1.10 on likeliness and code layout is worthwhile.

josharian · 2017-05-15T16:52:02Z

Another observation for 1.10 use. It appears that a lot of the non-compactness I observed here may simply be due to the fact that posdegree (and zerodegree) are effectively stacks instead of queues. Though that may become irrelevant if we lay out code based primarily on loop nesting information.
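The effect of stack versus queue ordering can be seen in a toy scheduler. The sketch below is not the compiler's layout pass: it is a plain worklist-based topological ordering over a made-up CFG, and the `cfg` type, the `order` function and all block names are hypothetical. It only illustrates how popping the most recently added block first can push one arm of a branch far away from the other.

```go
package main

import (
	"fmt"
	"strings"
)

// cfg maps each block to its successors, in order. Hypothetical toy type.
type cfg map[string][]string

// order places blocks starting from entry, using a worklist of blocks whose
// predecessors have all been placed. With lifo set, the worklist is a stack;
// otherwise it is a queue.
func order(g cfg, entry string, lifo bool) []string {
	indeg := map[string]int{}
	for _, succs := range g {
		for _, s := range succs {
			indeg[s]++
		}
	}
	ready := []string{entry}
	var out []string
	for len(ready) > 0 {
		var b string
		if lifo {
			b, ready = ready[len(ready)-1], ready[:len(ready)-1]
		} else {
			b, ready = ready[0], ready[1:]
		}
		out = append(out, b)
		for _, s := range g[b] {
			indeg[s]--
			if indeg[s] == 0 {
				ready = append(ready, s)
			}
		}
	}
	return out
}

func main() {
	// entry branches to left and right; right continues through a short chain.
	g := cfg{
		"entry": {"left", "right"},
		"left":  {"join"},
		"right": {"r1"},
		"r1":    {"r2"},
		"r2":    {"join"},
	}
	fmt.Println("stack:", strings.Join(order(g, "entry", true), " "))
	// stack: entry right r1 r2 left join
	fmt.Println("queue:", strings.Join(order(g, "entry", false), " "))
	// queue: entry left right r1 r2 join
}
```

With the stack, `left` is deferred until everything reachable through `right` has been placed; with the queue, the two arms stay next to the branch.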

gopherbot · 2017-05-15T19:22:07Z

CL https://golang.org/cl/43501 mentions this issue.

Noticed while looking at #20356.

Cuts 160k (1%) off of the cmd/compile binary.

Change-Id: If2397bc6971d6be9be6975048adecb0b5efa6d66
Reviewed-on: https://go-review.googlesource.com/43501
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

josharian · 2017-05-18T18:04:51Z

I think CL 43491 (and possibly @dr2chase's follow-up) is enough to address the 1.9 regression. Punting this to 1.10.

DO NOT SUBMIT

This CL adds CFG graphs to ssa.html. It execs dot to generate SVG, which then gets inlined into the html. Some standard naming and javascript hacks enable integration with the rest of ssa.html: Clicking on blocks highlights the relevant part of the CFG graph, and vice versa.

Sample output and screenshots can be seen in issues golang#20355 and golang#20356.

There are serious problems with this CL, though.

Performance:

* Calling dot after every pass is noticeably slow.
* The generated output is giant.
* The browser is very slow to render the resulting page.
* Clicking on blocks is even slower than before.
* Some things I want to do, like allow the user to change the table column widths, lock up the browser.

Appearance:

* The CFGs can easily be large and overwhelming. Toggling them on/off might be workable, if the performance concerns above were addressed.
* I can't figure out the right size to render the CFGs; simple ones are currently oversized and cartoonish, while larger ones are unreadable.
* They require an unsatisfying amount of explanation (see golang#20356). Block layout information is particularly inferior/confusing.
* Dead blocks float awkwardly in the sky, with no indication that they are dead.
* It'd be nice to somehow add visual information about loops, which we can calculate, and which is non-obvious in large graphs, but I don't know how.
* It'd be nice to add more information per block, like the number of values it contains, or even the values themselves, but adding info to a node makes the graph even less readable. Just adding the f.Blocks index in parens was not good.

Bugs, incompleteness:

* I seem to have broken highlighting around the entire block in the text.
* Need to hook up some way to communicate dot-related errors without bringing down the process.
* Might need some way to enable/disable dot entirely.

Change-Id: I19abc3007f396bdb710ba7563668d343c0924feb

ysmolski · 2018-10-12T10:18:22Z

I am looking at this. I was able to generate a CFG using tip and @josharian's tool for the program in the original post. It's somewhat different from what was posted over a year ago (not a surprise). I got this:

For the full picture see the attached ssa.html:
ssa.html.zip

Now I wonder how we can optimize the layout. What is the desired order of blocks in this case?

EDIT: I have uploaded improved versions of the CFG pictures.

ysmolski · 2018-10-17T09:50:16Z

For anyone who will work on this: the current version of gc does not emit a branch in the layout pass for this code:

if i%2 == 0 {
     odd = 2
}

Layout pre/post b3 has the following code:

b3: ← b2
v21 (+10) = ADDQconst <int> [-1069] v52 (odd[int])
v57 (+8) = SHRQconst <int> [63] v7
v60 (?) = MOVQconst <int> [0]
v36 (+8) = ADDQ <int> v57 v7
v31 (+8) = SARQconst <int> [1] v36
v50 (+8) = SHLQconst <int> [1] v31
v15 (8) = CMPQ <flags> v7 v50
v23 (13) = CMOVQEQ <int> v5 v21 v15 (odd[int])
v24 (+13) = ADDQ <int> v23 v52 (x[int])
Plain → b8 (8)

It uses a conditional move, so the code in the original post no longer reproduces the problem!

ysmolski · 2018-10-17T10:01:01Z

Code that reproduces the problem:

package p

//go:noinline
func g() {
}

func f() int {
	x := 0
	for i := 0; i < 10; i++ {
		odd := 0
		if i%2 == 0 {
			odd = 2 // not 1, otherwise this branch gets optimized away!
			g()
		}
		x += odd
		// Distract the layout pass with a bunch of loops.
		for j := 0; j < 10; j++ {
			for j := 0; j < 10; j++ {
				for j := 0; j < 10; j++ {
					x++
				}
			}
		}
	}
	return x
}

ysmolski · 2018-10-17T10:12:31Z

@dr2chase:

> One possibility I had intended to try was to introduce at least one more level of (un)likeliness, for things like branches to panics where we are "really sure" about likeliness, versus all the cases where we're making less-educated guesses. As a general rule we expect p(loopbackedge) > p(return) > p(panic).

I agree that it would solve this problem generally. In the case of i%2 == 0 we are pretty safe to assume that it's 50%/50%, while the compiler estimates it as unlikely. ~~I am not sure why it is not BranchUnknown for this condition?~~ It looks like the likelyadjust pass estimates this as unlikely.

I've read in some very popular book that a compiler could profile the code first to estimate the likeliness of branches and then make the optimization. Laughs and idealistic approaches aside, we can try to introduce what David has suggested.

@josharian what do you think?
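As a rough illustration of the profile-based idea, the sketch below counts how often the i%2 == 0 branch from the example is taken and turns the counts into a prediction. The `Prediction` type, the `classify` function and the 20%/80% thresholds are assumptions made up for this example; they are not part of gc.

```go
package main

import "fmt"

// Prediction is a hypothetical three-way branch prediction.
type Prediction int

const (
	Unlikely Prediction = iota - 1
	Unknown
	Likely
)

func (p Prediction) String() string {
	return [...]string{"unlikely", "unknown", "likely"}[p+1]
}

// classify turns observed branch counts into a prediction. The 20% and 80%
// cutoffs are arbitrary assumptions chosen for illustration.
func classify(taken, total int) Prediction {
	switch {
	case total == 0:
		return Unknown
	case taken*100 >= total*80:
		return Likely
	case taken*100 <= total*20:
		return Unlikely
	default:
		return Unknown
	}
}

func main() {
	// "Profile" the branch from the example by simply running the loop.
	taken, total := 0, 0
	for i := 0; i < 10; i++ {
		total++
		if i%2 == 0 {
			taken++
		}
	}
	fmt.Println(taken, total, classify(taken, total)) // 5 10 unknown
}
```

With real counts in hand, the even split would show up as "unknown" instead of being guessed as unlikely.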

CAFxX · 2018-10-17T21:21:33Z

> I've read in some very popular book that a compiler could profile the code first to estimate the likeliness of branches and then make the optimization. Laughs and idealistic approaches aside, we can try to introduce what David has suggested.

While we're on the topic, it seems there is no GitHub issue for PGO, which may well be the only good way to address issues such as this one or the mid-stack inlining one. Is it deemed completely out of scope for gc?

josharian · 2018-10-17T21:30:12Z

I’ll file a PGO issue.

josharian added the Performance label May 13, 2017

josharian added this to the Go1.9Maybe milestone May 13, 2017

josharian mentioned this issue May 13, 2017

cmd/compile: Fannkuch11 on AMD64 slow down 6% after removing assembler backend instruction reordering #18977

Open

josharian mentioned this issue May 18, 2017

cmd/compile: improve loop rotation #20411

Open

josharian modified the milestones: Go1.10, Go1.9Maybe May 18, 2017

bradfitz modified the milestones: Go1.10, Go1.11 Nov 28, 2017

bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018

josharian mentioned this issue Jun 16, 2018

cmd/compile: possible missed optimization in append benchmark #25916

Open

josharian mentioned this issue Oct 17, 2018

cmd/compile: feedback-guided optimization #28262

Closed

gopherbot added the compiler/runtime label Jul 13, 2022

mknyszek added this to Go Compiler / Runtime Jul 13, 2022

mknyszek removed this from Go Compiler / Runtime Jul 13, 2022
