Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: excessively slow compile for 60K lines function #15112

Closed
jordancurve opened this issue Apr 4, 2016 · 20 comments
Closed

cmd/compile: excessively slow compile for 60K lines function #15112

jordancurve opened this issue Apr 4, 2016 · 20 comments

Comments

@jordancurve
Copy link

The go 1.6 compiler panics on this 2.25 Mbyte source file containing a slice with 13240 elements: https://drive.google.com/file/d/0B2AF520HhpuzdzZhSVpPZGZkTUU/view

go1.6 linux/amd64
GOARCH="amd64"
GOOS="linux"

$ go build buildme.go
# command-line-arguments
panic: runtime error: makeslice: len out of range

goroutine 1 [running]:
panic(0x804f60, 0xc86085dd70)
        /usr/lib/google-golang/src/runtime/panic.go:483 +0x3ef
cmd/compile/internal/gc.newliveness(0xc82018b9e0, 0xc8200a0000, 0xc865c44000, 0x2d416, 0x2e400, 0xc866950000, 0xcee1, 0xf000, 0xc9cda4b11aac74db)
        /usr/lib/google-golang/src/cmd/compile/internal/gc/plive.go:687 +0x161
cmd/compile/internal/gc.liveness(0xc82018b9e0, 0xc8200a0000, 0xc848dee900, 0xc848dee980)
        /usr/lib/google-golang/src/cmd/compile/internal/gc/plive.go:1782 +0x2cf
cmd/compile/internal/gc.compile(0xc82018b9e0)
        /usr/lib/google-golang/src/cmd/compile/internal/gc/pgen.go:541 +0xdf2
cmd/compile/internal/gc.funccompile(0xc82018b9e0)
        /usr/lib/google-golang/src/cmd/compile/internal/gc/dcl.go:1450 +0x1c0
cmd/compile/internal/gc.Main()
        /usr/lib/google-golang/src/cmd/compile/internal/gc/lex.go:472 +0x2116
cmd/compile/internal/amd64.Main()
        /usr/lib/google-golang/src/cmd/compile/internal/amd64/galign.go:127 +0x58d
main.main()
        /usr/lib/google-golang/src/cmd/compile/main.go:32 +0x395
@griesemer
Copy link
Contributor

Seems to work at tip - albeit incredibly slow:

$ time go tool compile buildme.go 

real    1m36.521s
user    1m40.571s
sys 0m1.907s

It's a huge file, but for the reference, gotype type-checks this in 0.9s.

There's some algorithmic issue here.

@griesemer
Copy link
Contributor

@Khaaan can you confirm that this code (or whatever original you have derived this from) compiles (if extremely slow) when compiling with the latest compiler at tip?

@rjammala
Copy link

rjammala commented Apr 4, 2016

@griesemer It compiles for me at tip: +f229e46

$ time go tool compile buildme.go

real 1m55.632s
user 1m56.219s
sys 0m3.270s

@griesemer
Copy link
Contributor

Thanks. There's probably some non-linearity in the backend somewhere. Adjusting title.

@griesemer griesemer changed the title go compiler panic: "makeslice: len out of range" cmd/compile: excessively slow compile for 60K lines function Apr 4, 2016
@josharian
Copy link
Contributor

Looks like it's spending most of the time in CSE. (Surprise!)

cc @tzneal

@josharian
Copy link
Contributor

Started.

@dr2chase
Copy link
Contributor

FYI, I may be mucking around with Types, meaning, canonicalizing them down to a 32-bit integer, as part of improvements to register allocation. Not sure if this is going to matter to you for this, but just a heads-up.

@gopherbot
Copy link

CL https://golang.org/cl/21937 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 13, 2016
runtime.newobject never returns the same thing twice,
so the resulting value will never be a common subexpression.

This helps when compiling large static data structures
that include pointers, such as maps and slices.
No clear performance impact on other code. (See below.)

For the code in issue #15112:

Before:
  real	1m14.238s
  user	1m18.985s
  sys	0m0.787s

After:
  real	0m47.172s
  user	0m52.248s
  sys	0m0.767s

For the code in issue #15235, size 10k:

Before:
  real	0m44.916s
  user	0m46.577s
  sys	0m0.304s

After:
  real	0m7.703s
  user	0m9.041s
  sys	0m0.316s

Still more work to be done, particularly for #15112.

Updates #15112
Updates #15235


name       old time/op      new time/op      delta
Template        330ms ±11%       333ms ±13%    ~           (p=0.749 n=20+19)
Unicode         148ms ± 6%       152ms ± 8%    ~           (p=0.072 n=18+20)
GoTypes         1.01s ± 7%       1.01s ± 3%    ~           (p=0.583 n=20+20)
Compiler        5.04s ± 2%       5.06s ± 2%    ~           (p=0.314 n=20+20)

name       old user-ns/op   new user-ns/op   delta
Template   444user-ms ±11%  445user-ms ±10%    ~           (p=0.738 n=20+20)
Unicode    215user-ms ± 5%  218user-ms ± 5%    ~           (p=0.239 n=18+18)
GoTypes    1.45user-s ± 3%  1.45user-s ± 4%    ~           (p=0.620 n=20+20)
Compiler   7.23user-s ± 2%  7.22user-s ± 2%    ~           (p=0.901 n=20+19)

name       old alloc/op     new alloc/op     delta
Template       55.0MB ± 0%      55.0MB ± 0%    ~           (p=0.547 n=20+20)
Unicode        37.6MB ± 0%      37.6MB ± 0%    ~           (p=0.301 n=20+20)
GoTypes         177MB ± 0%       177MB ± 0%    ~           (p=0.065 n=20+19)
Compiler        798MB ± 0%       797MB ± 0%  -0.05%        (p=0.000 n=19+20)

name       old allocs/op    new allocs/op    delta
Template         492k ± 0%        493k ± 0%  +0.03%        (p=0.030 n=20+20)
Unicode          377k ± 0%        377k ± 0%    ~           (p=0.423 n=20+19)
GoTypes         1.40M ± 0%       1.40M ± 0%    ~           (p=0.102 n=20+20)
Compiler        5.53M ± 0%       5.53M ± 0%    ~           (p=0.094 n=17+18)

name       old text-bytes   new text-bytes   delta
HelloSize        561k ± 0%        561k ± 0%    ~     (all samples are equal)
CmdGoSize       6.13M ± 0%       6.13M ± 0%    ~     (all samples are equal)

name       old data-bytes   new data-bytes   delta
HelloSize        128k ± 0%        128k ± 0%    ~     (all samples are equal)
CmdGoSize        306k ± 0%        306k ± 0%    ~     (all samples are equal)

name       old exe-bytes    new exe-bytes    delta
HelloSize        905k ± 0%        905k ± 0%    ~     (all samples are equal)
CmdGoSize       9.64M ± 0%       9.64M ± 0%    ~     (all samples are equal)

Change-Id: Id774e2901d7701a3ec7a1c1d1cf1d9327a4107fc
Reviewed-on: https://go-review.googlesource.com/21937
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
@josharian
Copy link
Contributor

CL 21961 will help a bit more (if it goes in). The bottleneck now is the O(n^2) invocation of isAncestorEq. I don't see any clear algorithmic improvements there, but perhaps @dr2chase or @tzneal does.

The only further think I can think of to try is to make isAncestorEq itself cheaper. That's not going to be easy. Perhaps we could cache the entry and exit numbers per value in the eqclass to avoid the indirect lookups in the sparse tree? But that'll cost allocations, and it's not fixing the root cause.

I don't plan to work on this further. It is considerably faster; I hope that it's improved enough that we can downgrade from Go1.7 to Unplanned.

@gopherbot
Copy link

CL https://golang.org/cl/21961 mentions this issue.

@tzneal
Copy link
Member

tzneal commented Apr 13, 2016

I'm working on another CL that gives a 50% speedup for this issue. It just sorts each partition by isAncestorEq to remove the need to find the maximal dominator each pass through.

The other thing I was trying was to reduce the number of values created in the first place. Every reference to a PEXTERN ONAME ("moves" in this case) generates another OpAddr value which all then get removed by cse. The other large sets of duplicate values for this case are stack addresses for calling runtime.newobject.

@josharian
Copy link
Contributor

Did you see CL 21961?

I also tried an initial sort. The primary problem is that isAncestor(Eq) provides only a partial ordering, so you fundamentally need to look at all pairs to get a complete sort. And then you need to maintain the sort order as you remove elements, which is itself expensive. That's how I ended up at CL 21961. But I'd be delighted if you see something I didn't. :)

Reducing the number of values would be a big win. If you can reduce the number of Nodes, that'd help everyone, since compile time in general is at this point roughly proportional to the number of Nodes, for non-pathological files.

I wonder whether stack addresses (or maybe a subset of them?) should be put in the constant cache.

@tzneal
Copy link
Member

tzneal commented Apr 13, 2016

I saw it, but thought that the sort by dom might be cleaner. I was going to flail around a bit with it anyway :)

I saw the partial ordering issue after I started coding and then began using the entry value from sdom to provide an absolute ordering. I just finished up maintaining the sort order to see if it's worth it.

I've got a DNR CL 21981 if you wanted to take a look. I'm on a slowish Macbook so my benchmark times are a little worse in absolute terms, but it's still a ~30% improvement.

@gopherbot
Copy link

CL https://golang.org/cl/21981 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 13, 2016
We do two O(n) scans of all values in an eqclass when computing
substitutions for CSE.

In unfortunate cases, like those found in #15112, we can have a large
eqclass composed of values found in blocks none of whom dominate the
other.  This leads to O(n^2) behavior. The elements are removed one at a
time, with O(n) scans each time.

This CL removes the linear scan by sorting the eqclass so that dominant
values will be sorted first.  As long as we also ensure we don't disturb
the sort order, then we no longer need to scan for the maximally
dominant value.

For the code in issue #15112:

Before:
real    1m26.094s
user    1m30.776s
sys     0m1.125s

Aefter:
real    0m52.099s
user    0m56.829s
sys     0m1.092s

Updates #15112

Change-Id: Ic4f8680ed172e716232436d31963209c146ef850
Reviewed-on: https://go-review.googlesource.com/21981
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue Apr 15, 2016
Process a slice of equivalent values by setting replaced values to nil
instead of removing them from the slice to eliminate copying.  Also take
advantage of the entry number sort to break early once we reach a value
in a block that is not dominated.

For the code in issue #15112:

Before:
real    0m52.603s
user    0m56.957s
sys     0m1.213s

After:
real    0m22.048s
user    0m26.445s
sys     0m0.939s

Updates #15112

Change-Id: I06d9e1e1f1ad85d7fa196c5d51f0dc163907376d
Reviewed-on: https://go-review.googlesource.com/22068
Reviewed-by: David Chase <drchase@google.com>
@tzneal
Copy link
Member

tzneal commented Apr 15, 2016

@josharian - Caching stack address is harder than I thought due to the value's type in most cases being a unique instance of a Type. I started looking at caching types in Ptrto() to solve this, but the Node's type is unique there as well. The easy cases are args to runtime arguments which are all given type Types[TUINTPTR]. Past that I don't see a great way to reduce the OpOffPtr values.

@dr2chase
Copy link
Contributor

I've got a type-interner working (based on Compare) that might help here. So far I am only using it for LocalSlot because those are sometimes too fat. Will see about making it more pervasive.

@josharian
Copy link
Contributor

A quick profile suggests that the only unusual performance bit here is in fact type comparison, and even that's only ~13% of compile time. So: vast strides.

@randall77
Copy link
Contributor

I'm going to move this to 1.8, I don't think we have anything else to help here for 1.7.

@randall77 randall77 modified the milestones: Go1.8, Go1.7 Apr 29, 2016
@josharian
Copy link
Contributor

@tzneal #15520 is another case in which we do a bunch of work to collapse OpSP offsets. In that case, they all have the same type, though they don't compare the same using ==. A cache of these keyed by (off, type.String) would help here, at least. (Similar to the const cache, callers would need to make sure that their type is actually CMPeq, not just string-equal.) I think we should look again at this idea for Go 1.8.

@gopherbot
Copy link

CL https://golang.org/cl/30354 mentions this issue.

@golang golang locked and limited conversation to collaborators Oct 5, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants