-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: "go test" taking ~38x longer on Go tip than on Go 1.4 #14934
Comments
Nuke your $GOPATH/pkg cache before both tests. I suspect in one case you're building dependencies and in the other case you're not. |
I tried to remove the For Go tip (21 sec)
For Go 1.4 (0.6 sec)
|
Try running |
The SSA compiler is very slow with the large, auto-generated CPU and memory profiles follow:
|
The old compiler has a cutoff threshold: when a function uses more than
some certain number of instructions, it will skip the optimization step.
Perhaps we could preserve the old backend for such cases.
(That is, if the input function is too complicated, just fallback to the
old backend.)
|
I would certainly hope that we can remove the old backend eventually. It
On Wed, Mar 23, 2016 at 4:02 PM, Minux Ma notifications@github.com wrote:
|
On Go tip
On Go 1.4
|
@mdempsky The step which takes most of the time for Go tip is the following:
|
Can you please provide a link to that source. I cannot figure out which On Thu, Mar 24, 2016 at 10:28 AM, Robin Eklind notifications@github.com
|
@davecheney https://github.com/mewmew/uc/blob/9557c0010cde8953d0719ab5df170b8cfbdc3a3f/uc/gocc/lexer/lexer_test.go (The sha is given in the report.) |
I am sorry, this revision does not compile cleanly for me github.com/mewmew/uc/uc/gocc/lexerlexer_test.go:11:2: cannot find package "github.com/mewmew/uc/uc/gocc/token" % ls $GOPATH/src/github.com/mewmew/uc/uc/gocc/ On Thu, Mar 24, 2016 at 10:52 AM, Caleb Spare notifications@github.com
|
@davecheney, I was able to follow the instructions to repro to get my numbers above. |
@bradfitz I couldn't get it to work, could you please get a cpu and mem On Thu, Mar 24, 2016 at 10:58 AM, Brad Fitzpatrick <notifications@github.com
|
@davecheney Perhaps you didn't run |
|
@davecheney All steps in one go, simply copy-and-paste into a terminal to reproduce. For Go tip
For Go 1.4
|
Thanks Robin! On Thu, Mar 24, 2016 at 11:50 AM, Robin Eklind notifications@github.com
|
You are very welcome. I'll look into generating CPU and memory profiles. |
Don't spend time on the 1.4 data, it's not going to change. The SVG for go test -i $PKG # compile all the test depenencies to ensure only one On Thu, Mar 24, 2016 at 11:51 AM, Robin Eklind notifications@github.com
|
Thanks for the how to instructions. I ran the commands you suggested and got the following profiles for Go tip. From what I can tell, it doesn't look like the profile managed to capture the entire compilation time? The CPU profile only captured 10 ms of the 20 seconds. And the memory profile only captured ~5 MB of the ~3 GB. Any suggestions? |
I hacked together a standalone file slow.go from lexer_test that gives pretty awful performance (but not 30x slower). The 3k lines of go generates over 4 million SSA values.
|
Run the command with -X, I think there may be more than one compile happening, overwriting the file. Worse case you can copy and paste the output from -x and run each command by hand.
|
@tzneal Thanks! Profiling
To reproduce, run the following:
|
Lots of inlining, lots of gotos, big slice constructor, not surprised it is bad. |
Possibly a dup of #14774. |
CL https://golang.org/cl/21082 mentions this issue. |
David, does your sparse phi insertion CL help here? |
Sparse phi insertion helps.
New profiles from compiling lexer_test.
|
CL https://golang.org/cl/22790 mentions this issue. |
If b has exactly one predecessor, as happens frequently with static calls, we can make lookupVarOutgoing generate less garbage. Instead of generating a value that is just going to be an OpCopy and then get eliminated, loop. This can lead to lots of looping. However, this loop is way cheaper than generating lots of ssa.Values and then eliminating them. For a subset of the code in #15537: Before: 28.31 real 36.17 user 1.68 sys 2282450944 maximum resident set size After: 9.63 real 11.66 user 0.51 sys 638144512 maximum resident set size Updates #15537. Excitingly, it appears that this also helps regular code: name old time/op new time/op delta Template 288ms ± 6% 276ms ± 7% -4.13% (p=0.000 n=21+24) Unicode 143ms ± 8% 141ms ±10% ~ (p=0.287 n=24+25) GoTypes 932ms ± 4% 874ms ± 4% -6.20% (p=0.000 n=23+22) Compiler 4.89s ± 4% 4.58s ± 4% -6.46% (p=0.000 n=22+23) MakeBash 40.2s ±13% 39.8s ± 9% ~ (p=0.648 n=23+23) name old user-ns/op new user-ns/op delta Template 388user-ms ±10% 373user-ms ± 5% -3.80% (p=0.000 n=24+25) Unicode 203user-ms ± 6% 202user-ms ± 7% ~ (p=0.492 n=22+24) GoTypes 1.29user-s ± 4% 1.17user-s ± 4% -9.67% (p=0.000 n=25+23) Compiler 6.86user-s ± 5% 6.28user-s ± 4% -8.49% (p=0.000 n=25+25) name old alloc/op new alloc/op delta Template 51.5MB ± 0% 47.6MB ± 0% -7.47% (p=0.000 n=22+25) Unicode 37.2MB ± 0% 37.1MB ± 0% -0.21% (p=0.000 n=25+25) GoTypes 166MB ± 0% 138MB ± 0% -16.83% (p=0.000 n=25+25) Compiler 756MB ± 0% 628MB ± 0% -16.96% (p=0.000 n=25+23) name old allocs/op new allocs/op delta Template 450k ± 0% 445k ± 0% -1.02% (p=0.000 n=25+25) Unicode 356k ± 0% 356k ± 0% ~ (p=0.374 n=24+25) GoTypes 1.31M ± 0% 1.25M ± 0% -4.18% (p=0.000 n=25+25) Compiler 5.29M ± 0% 5.02M ± 0% -5.15% (p=0.000 n=25+23) It also seems to help in other cases in which phi insertion is a pain point (#14774, #14934). Change-Id: Ibd05ed7b99d262117ece7bb250dfa8c3d1cc5dd2 Reviewed-on: https://go-review.googlesource.com/22790 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
What's the status of this issue? Still 38X? |
I now get
So ~10x |
Also, FWIW
So ~2x from 1.4 to 1.6, ~5x from 1.6 to tip. |
FYI, I just checked whether https://go-review.googlesource.com/c/23136/ is any help here, and it is:
After:
That is queued up for 1.8. |
It sounds like current tip is 10x slower. @rsc @randall77 @dr2chase Is there anything we still want to do here for Go1.7, or should I move this to Go1.8? |
Nothing else planned for 1.7. Let's move this to Go1.8early. |
CL https://golang.org/cl/30163 mentions this issue. |
Should be more asymptotically happy. We process each variable in turn to find all the locations where it needs a phi (the dominance frontier of all of its definitions). Then we add all those phis. This takes O(n * #variables), although hopefully much less. Then we do a single tree walk to match all the FwdRefs with the nearest definition or phi. This takes O(n) time. The one remaining inefficiency is that we might end up introducing a bunch of dead phis in the first step. A TODO is to introduce phis only where they might be used by a read. The old algorithm is still faster on small functions, so there's a cutover size (currently 500 blocks). This algorithm supercedes the David's sparse phi placement algorithm for large functions. Lowers compile time of example from #14934 from ~10 sec to ~4 sec. Lowers compile time of example from #16361 from ~4.5 sec to ~3 sec. Lowers #16407 from ~20 min to ~30 sec. Update #14934 Update #16361 Fixes #16407 Change-Id: I1cff6364e1623c143190b6a924d7599e309db58f Reviewed-on: https://go-review.googlesource.com/30163 Reviewed-by: David Chase <drchase@google.com>
This is now fixed. Tip compiles Todd's slow.go in under half a second. |
I can confirm that tip is ~2x slower than 1.4. (Edit: my initial tests were run on the wrong revision of Go; namely Go 1.6 rather than tip. On my laptop there is a 2x slowdown between Go 1.4 and Go 1.6, and an additional 3x slowdown between Go 1.6 and Go tip b9fd510). Average from 10 consecutive runs follow the instructions of #14934 (comment): For Go 1.4: For Go 1.6 (rev da6b9ec): For Go tip (rev b9fd510): Slowdown between Go 1.4 to Go 1.6: ~2.37 Edit: Thanks a lot for working to resolve this issue. It will help out tremendously when working with auto-generated code (such as parsers). |
My initial tests in #14934 (comment) were run on the wrong revision of Go; namely Go 1.6 rather than tip. The comparison has been updated above. @randall77, my results do not seem to mirror yours. Any idea as to why this may happen? Can I assist with some additional information? At least on my laptop, Go 1.4 outperforms Go 1.6 by ~2x, while Go 1.6 outperforms Go tip (rev b9fd510) by another ~3x. |
The numbers I quoted above were for Todd's example, not yours. I'm inclined to leave this issue closed. It's only a 4 second build. (Most of that time is regalloc.) |
go version
)?For Go tip
and for Go 1.4
go env
)?For Go tip
and for Go 1.4
If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.
For Go tip
and for Go 1.4
The time take by
go test github.com/mewmew/uc/uc/gocc/lexer
taking roughly the same amount of time to run when using Go tip and Go 1.4."go test" took roughly 20 seconds on Go tip and 0.5 seconds on Go 1.4.
Also note that the
compile
command used between 25% to 60% of the CPU, during the 20 second duration.The text was updated successfully, but these errors were encountered: