cmd/compile: auto-generated code pathologically slow to compile #16407

rogpeppe · 2016-07-18T14:44:14Z

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.7rc1 linux/amd64

What operating system and processor architecture are you using (go env)?

amd64, linux

What did you do?

go get github.com/zhenjl/xparse/etld

What did you expect to see?

A response in reasonable time.

What did you see instead?

No response - the code takes a very long time to compile.

Under Go 1.4, the package takes ~2s to install. Under Go 1.6, it takes ~4s.

Using Go 1.7rc1, I killed the compiler after 9 minutes because my machine was overheating and becoming unusable.

It's a large state machine, but this doesn't seem like reasonable compiler behaviour.
FWIW this package is used as part of the sequencer tool (see http://zhen.org/blog/sequence-high-performance-sequential-semantic-log--parser/)

ianlancetaylor · 2016-07-18T15:07:37Z

CC @randall77 @josharian @dr2chase

bradfitz · 2016-07-18T15:09:36Z

Quick code link for those curious:
https://raw.githubusercontent.com/zhenjl/xparse/master/etld/fsm.go

josharian · 2016-07-18T16:07:15Z

Haven't dug much, but I see at least two issues here.

(1) The sparse phi locator is slow on this. Determined by sending SIGQUIT a few times after letting the compilation run for a while. Running with GO_SSA_PHI_LOC_CUTOFF=-1 gets past the SSA construction phase quickly, and lets you reach problem 2.

(2) Memory blowup in regalloc during liveness computation. There are lots of values live across lots of blocks, leading to O(n^2) behavior. That's what causes the machine instability.

There might be some cheap additional heuristics we can use for selecting whether to use the sparse phi locator. That'd probably be safe for 1.7.

The quadratic memory usage in liveness is a general, important, known problem, and way out of scope for 1.7. (David, might an SCC-based representation of liveness help here?)

dgryski · 2016-07-18T16:23:30Z

This will also affect programs using https://github.com/opennota/re2dfa or www.complang.org/ragel/ to build custom DFAs for matching regexps.

josharian · 2016-07-18T16:24:22Z

@rogpeppe I assume you are perfectly capable of finding workarounds as needed in the meantime (including using 1.6 or turning off ssa). But I will say that I would consider converting some of the innermost switch statements to lookup tables. This should compile faster, run faster, and probably produce a smaller binary, particularly if you shift all array indices so that the smallest interesting value is 0. This might require a non-trivial transform, though, to indicate e.g. (lines 495-594) a new value for s and a new value for m and whether to check i and pb before modifying m.
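To make the lookup-table suggestion concrete, here is a minimal sketch in Go. The three-state machine, its transitions, and every name in it (`stepSwitch`, `stepTable`, `table`, `lo`, `dead`) are hypothetical and are not taken from etld/fsm.go. It only shows the shape of the transform: the same transitions expressed once as nested switches and once as an array indexed by state and by the input byte shifted so the smallest interesting value is 0.

```go
package main

import "fmt"

// Hypothetical machine over the bytes 'a'..'c'. Nothing here is taken
// from etld/fsm.go; it only illustrates the switch-to-table transform.
const (
	numStates = 3
	lo        = 'a' // smallest interesting input byte, mapped to index 0
	numInputs = 3   // 'a', 'b', 'c'
	dead      = -1  // marks "no transition"
)

// stepSwitch has the shape a generator typically emits: nested switches.
func stepSwitch(s int, c byte) int {
	switch s {
	case 0:
		switch c {
		case 'a':
			return 1
		case 'b':
			return 0
		}
	case 1:
		switch c {
		case 'b':
			return 2
		}
	case 2:
		switch c {
		case 'c':
			return 0
		}
	}
	return dead
}

// table holds the same transitions as data, indexed by [state][c-lo].
var table = [numStates][numInputs]int8{
	0: {1, 0, dead},
	1: {dead, 2, dead},
	2: {dead, dead, 0},
}

// stepTable bounds-checks its arguments and then does one array load.
func stepTable(s int, c byte) int {
	if s < 0 || s >= numStates || c < lo || c >= lo+numInputs {
		return dead
	}
	return int(table[s][c-lo])
}

func main() {
	// Print both results side by side for every state and a few inputs.
	for s := 0; s < numStates; s++ {
		for c := byte('a'); c <= 'd'; c++ {
			fmt.Println(s, string(c), stepSwitch(s, c), stepTable(s, c))
		}
	}
}
```

The real generated code also updates other variables on each transition (the comment above mentions s, m, i and pb), so an actual table would need extra fields per entry. That part is left out of this sketch.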

rogpeppe · 2016-07-18T16:28:57Z

@josharian It's not my code - I have nothing to do with it or the sequencer command that imports it. I just encountered the issue when trying out that command.

rogpeppe · 2016-07-18T16:38:21Z

@josharian But to add to your thoughts, I wonder if just using goto instead of a huge switch statement might help matters. That's the main reason why goto is there, after all, AIUI.
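For readers unfamiliar with the goto style, a small sketch follows. The pattern (`a b* c`) and the function name `matchGoto` are made up for illustration. Each state becomes a label and each transition a direct jump, so there is no state variable and no switch on the state inside a loop. Nobody in this thread measured whether this shape actually compiles faster, so treat it as an illustration of the idea only.

```go
package main

import "fmt"

// matchGoto reports whether in matches the hypothetical pattern a b* c.
// States are labels; transitions are gotos.
func matchGoto(in string) bool {
	i := 0

	// Initial state: require a leading 'a'.
	if i < len(in) && in[i] == 'a' {
		i++
		goto afterA
	}
	return false

afterA:
	// Seen "a" followed by zero or more 'b's.
	if i == len(in) {
		return false
	}
	switch in[i] {
	case 'b':
		i++
		goto afterA
	case 'c':
		i++
		goto done
	}
	return false

done:
	// Accept only if the 'c' was the last byte.
	return i == len(in)
}

func main() {
	for _, s := range []string{"ac", "abbbc", "a", "abca"} {
		fmt.Println(s, matchGoto(s))
	}
}
```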

randall77 · 2016-07-19T06:05:54Z

Regalloc is dying because there are ~30000 integer constants, and they are all live during all blocks in the program. We put them all in the first block, and they never get moved down by the tighten pass because their only use is in a phi.
I'll modify tighten to fix this case. That will fix the regalloc part of this problem.
David, any idea why the sparse phi locator is failing?

gopherbot · 2016-07-19T07:00:16Z

CL https://golang.org/cl/25046 mentions this issue.

mewmew · 2016-07-19T15:22:55Z

I'd reckon this is a duplicate of or related to #14934.

dr2chase · 2016-07-19T15:33:14Z

It's kinda pathological for the sparse phi locator, too. One of the "this isn't usually large" assumptions isn't true. I'm working on that.

randall77 · 2016-07-20T02:20:09Z

Looks like the phi locator takes ~20 minutes to complete on this example.
Not good, but at least it completes.

Changes genfsm.go to divide the state space into chunks of size 256, then call a helper function for each chunk (as opposed to having a single giant function that does everything). This reduces the compile time for go1.7 down to something reasonable (15-30 seconds depending on machine speed). See related issues zhenjl#1 golang/go#16407

thanm · 2016-07-20T14:09:45Z

I created a pull request [as a "learn to program in go" exercise] for the fsm generator that divides the state space into chunks, then puts the chunks in separate functions. Each of the new functions is a couple thousand lines long. This brings the total compilation time down to 10-15 seconds.
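The chunking approach described above can be sketched roughly as follows. This is not the code from the pull request: the four-state machine, the chunk size of 2 (the pull request used 256), and the names `step`, `stepChunk0`, `stepChunk1` and `run` are all assumptions chosen to keep the example short. The point is that a top-level dispatcher picks a helper by state range, so no single function contains every state.

```go
package main

import "fmt"

// chunkSize is 2 only to keep this sketch short; the pull request
// described above used chunks of 256 states.
const chunkSize = 2

// step dispatches to the helper that owns state s.
func step(s int, c byte) int {
	switch s / chunkSize {
	case 0:
		return stepChunk0(s, c)
	case 1:
		return stepChunk1(s, c)
	}
	return -1
}

// stepChunk0 handles states 0 and 1 of the hypothetical machine.
func stepChunk0(s int, c byte) int {
	switch s {
	case 0:
		if c == 'a' {
			return 1
		}
	case 1:
		if c == 'b' {
			return 2
		}
	}
	return -1
}

// stepChunk1 handles states 2 and 3 of the hypothetical machine.
func stepChunk1(s int, c byte) int {
	switch s {
	case 2:
		if c == 'c' {
			return 3
		}
	case 3:
		// Final state: no outgoing transitions.
	}
	return -1
}

// run feeds in through the machine and returns the final state,
// or -1 if some byte has no transition.
func run(in string) int {
	s := 0
	for i := 0; i < len(in); i++ {
		s = step(s, in[i])
		if s < 0 {
			return -1
		}
	}
	return s
}

func main() {
	fmt.Println(run("abc"), run("ab"), run("ax"))
}
```

Each helper is compiled as its own function, so the per-function costs discussed in this issue (phi placement, liveness) apply to a few hundred states at a time instead of all of them at once.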

dr2chase · 2016-07-20T16:17:24Z

This is not showing up here, but this CL contains two fixes (one that just missed the first 1.7 deadline, the other created earlier this week) that cut the phi location time from (on my laptop) 12+ minutes down to 2, and also cut the memory footprint of that phase down to about 500MB.

https://go-review.googlesource.com/c/23136/

josharian · 2016-07-20T16:24:45Z

With the sparse phi locator off, though, the SSA construction completes in a few seconds, not 2 minutes. It seems like a better fix here might be updated heuristics about when to use the sparse phi locator, if such heuristics are available.

CL 23136 should probably also go in, but maybe for 1.8 instead? I'm not sure.

dr2chase · 2016-07-20T18:12:46Z

If I knew of such heuristics, I would use them. I haven't yet figured out what makes this particular flow-graph so special. It's big.

    entry:
      x = MOVQconst [7]
      ...
    b1:
      goto b2
    b2:
      v = Phi(x, y, z)

Transform that program to:

    entry:
      ...
    b1:
      x = MOVQconst [7]
      goto b2
    b2:
      v = Phi(x, y, z)

This CL moves constant-generating instructions used by a phi to the appropriate immediate predecessor of the phi's block.

We used to put all constants in the entry block. Unfortunately, in large functions we have lots of constants at the start of the function, all of which are used by lots of phis throughout the function. This leads to the constants being live through most of the function (especially if there is an outer loop). That's an O(n^2) problem.

Note that most of the non-phi uses of constants have already been folded into instructions (ADDQconst, MOVQstoreconst, etc.).

This CL may be generally useful for other instances of compiler slowness, I'll have to check. It may cause some programs to run slower, but probably not by much, as rematerializeable values like these constants are allocated late (not at their originally scheduled location) anyway.

This CL is definitely a minimal change that can be considered for 1.7. We probably want to do a better job in the tighten pass generally, not just for phi args. Leaving that for 1.8.

Update #16407

Change-Id: If112a8883b4ef172b2f37dea13e44bda9346c342
Reviewed-on: https://go-review.googlesource.com/25046
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>

This is:

(1) a simple trick that cuts the number of phi-nodes (temporarily) inserted into the ssa representation by a factor of 10, and can cut the user time to compile tricky inputs like gogo/protobuf tests from 13 user minutes to 9.5, and memory allocation from 3.4GB to 2.4GB.

(2) a fix to sparse lookup, that does not rely on an assumption proven false by at least one pathological input "etldlen".

These two changes fix unrelated compiler performance bugs, both necessary to obtain good performance compiling etldlen. Without them it takes 20 minutes or longer, with them it completes in 2 minutes, without a gigantic memory footprint.

Updates #16407

Change-Id: Iaa8aaa8c706858b3d49de1c4865a7fd79e6f4ff7
Reviewed-on: https://go-review.googlesource.com/23136
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>

josharian · 2016-07-22T19:31:06Z

Looks like 2m is as fast as this is going to get for 1.7. Moving to 1.8.

mschoch · 2016-09-15T20:12:55Z

Initially reported under #17127, the blevesearch/segment package appears to also illustrate the problem, perhaps even more severely.

I've just tested with tip:

$ go version
go version devel +22d3bf1 Thu Sep 15 19:24:04 2016 +0000 darwin/amd64

And the following compilation runs for over 20 minutes (still going):

$ go get -tags 'prod' github.com/blevesearch/segment

randall77 · 2016-10-03T19:45:24Z

@mschoch : your example compiles for me in ~8 minutes. Most of the time (~7 min) is lowered CSE.
Other slow phases:
phi building: 37 sec
generic CSE: 14 sec
regalloc: 4 sec
I thought regalloc would be worse.

mschoch · 2016-10-03T19:55:00Z

@randall77 are there new commits on tip related to this? I let it go for 45 minutes and it never finished. I also tried with GO_SSA_PHI_LOC_CUTOFF=-1, but it too never finished.

randall77 · 2016-10-03T20:10:48Z

@mschoch : I've been experimenting on CL 30163 which changes the way phis are inserted. It takes 8 minutes there. I wasn't expecting big differences with tip as both tip and that CL take about the same amount of time for phi building on your example. However, it looks like tip is going to take much longer (still running). I don't understand why, as the phi building takes about the same amount of time. For some reason, the differences in phi building make later phases take longer.
Looks like generic CSE is at least the first such phase.

randall77 · 2016-10-03T20:20:12Z

@rogpeppe 's example seems to be fixed with https://go-review.googlesource.com/c/30163/ . Compile time down to ~30sec (most of that still in phi building).

randall77 · 2016-10-03T20:23:08Z

I'm going to reopen #17127. That slow compile appears to be mostly CSE time, whereas this issue is mostly phi building time.

gopherbot · 2016-10-03T20:30:10Z

CL https://golang.org/cl/30163 mentions this issue.

ianlancetaylor added this to the Go1.7Maybe milestone Jul 18, 2016

rogpeppe mentioned this issue Jul 18, 2016

etld/fsm.go triggers pathological compiler case zhenjl/xparse#1

Open

thanm mentioned this issue Jul 20, 2016

Emit fsm code in chunks as opposed single func. zhenjl/xparse#2

Open

mewmew mentioned this issue Jul 22, 2016

generate compressed tables goccmack/gocc#28

Closed

josharian modified the milestones: Go1.8, Go1.7Maybe Jul 22, 2016

mschoch mentioned this issue Sep 15, 2016

cmd/compile: slow compilation on generated code #17127

Closed

bradfitz assigned randall77 Sep 15, 2016

gopherbot closed this as completed in 5a6e511 Oct 3, 2016

josharian mentioned this issue Feb 17, 2017

cmd/compile: compile time takes far longer with Go 1.7.5 than three prior Go releases #19096

Closed

golang locked and limited conversation to collaborators Oct 3, 2017

gopherbot added the FrozenDueToAge label Oct 3, 2017

rsc unassigned randall77 Jun 23, 2022
