Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: compiling large functions is slow #8225

Closed
josharian opened this issue Jun 17, 2014 · 6 comments
Closed

cmd/compile: compiling large functions is slow #8225

josharian opened this issue Jun 17, 2014 · 6 comments

Comments

@josharian
Copy link
Contributor

Chopping up large functions into smaller ones can have a significant impact on compile
time. See e.g. chan/select5.go (CL 72590045) and the bottom of rotate.go.
@gopherbot
Copy link

Comment 1 by zondolfin:

I created a gigantic type switch with around 3000 cases and 40k LOC, and when compiling
it got
$ time go build
real 2m12.840s
user 2m9.947s
sys 0m1.776s
To find this code, check out
https://github.com/zond/godec/tree/97525c53ef42889c5f9f96033030dd4675047b21, run "go
generator/generator.go"
Then I broke up the switch by moving the body of the cases to other functions, so that
each case only called another function instead of allocate, for loop etc, and got
$ time go build
real 0m8.115s
user 0m7.812s
sys 0m0.282s
To find THAT code, check out
https://github.com/zond/godec/tree/a0188c898f1197709a1238fba5b5864f63eac552, run "go run
generator/generator.go"

@bradfitz bradfitz removed the new label Dec 18, 2014
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@rsc rsc changed the title cmd/gc: compiling large functions is slow cmd/compile: compiling large functions is slow Jun 8, 2015
@josharian
Copy link
Contributor Author

My original issue ("large functions compile slowly") is too vague and generic to be helpful.

However, the code in Comment 1 is a useful test case. As of 1.8rc1, we spend ~20% of CPU time in resolveFwdRefs and ~10% of CPU time in liveness bit twiddling.

I haven't looked at resolveFwdRefs, but the liveness bit twiddling should be readily reduced. The problem is that we are doing O(number of variables) work for every instruction in the program in livenessprologue. But in almost all cases, progeffects only flips a bit or two. Switching to a sparse return value from progeffects (with appropriate caching and allocation avoidance) should help considerably. I may mail a CL that does this for 1.9.

@josharian josharian modified the milestones: Go1.9Maybe, Unplanned Jan 15, 2017
@josharian
Copy link
Contributor Author

And looks like improving the liveness bit twiddling helps regular code a little bit as well. My crude prototype shows:

name       old time/op      new time/op      delta
Template        212ms ± 4%       208ms ± 2%  -1.67%        (p=0.000 n=23+25)
Unicode        96.9ms ± 4%      96.5ms ± 4%    ~           (p=0.358 n=23+24)
GoTypes         665ms ± 4%       659ms ± 4%    ~           (p=0.068 n=25+25)
Compiler        2.83s ± 2%       2.79s ± 2%  -1.44%        (p=0.000 n=23+25)

name       old user-ns/op   new user-ns/op   delta
Template   265user-ms ± 3%  262user-ms ± 2%  -1.08%        (p=0.001 n=22+25)
Unicode    135user-ms ± 3%  135user-ms ± 5%    ~           (p=0.512 n=21+25)
GoTypes    889user-ms ± 3%  880user-ms ± 2%  -1.04%        (p=0.006 n=25+24)
Compiler   3.85user-s ± 2%  3.81user-s ± 1%  -1.23%        (p=0.000 n=23+25)

@gopherbot
Copy link

CL https://golang.org/cl/36112 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/36211 mentions this issue.

gopherbot pushed a commit that referenced this issue Feb 3, 2017
This speeds up compilation of the code in #8225 by 25%-30%.
The complexity of the algorithm is unchanged,
but this shrinks the constant factor so much that it doesn't matter,
even the size of the giant type switch gets scaled up dramatically.

name       old time/op      new time/op      delta
Template        218ms ± 5%       217ms ±10%    ~           (p=0.163 n=27+30)
Unicode        98.2ms ± 6%      97.7ms ±10%    ~           (p=0.150 n=27+29)
GoTypes         654ms ± 5%       650ms ± 5%    ~           (p=0.350 n=30+30)
Compiler        2.70s ± 4%       2.68s ± 3%    ~           (p=0.128 n=30+29)

name       old user-ns/op   new user-ns/op   delta
Template   276user-ms ± 6%  271user-ms ± 7%  -1.83%        (p=0.003 n=29+28)
Unicode    138user-ms ± 5%  137user-ms ± 4%    ~           (p=0.071 n=27+27)
GoTypes    881user-ms ± 4%  877user-ms ± 4%    ~           (p=0.423 n=30+30)
Compiler   3.76user-s ± 4%  3.72user-s ± 2%  -0.84%        (p=0.028 n=30+29)

name       old alloc/op     new alloc/op     delta
Template       40.7MB ± 0%      40.7MB ± 0%    ~           (p=0.936 n=30+30)
Unicode        30.8MB ± 0%      30.8MB ± 0%    ~           (p=0.859 n=28+30)
GoTypes         123MB ± 0%       123MB ± 0%    ~           (p=0.273 n=30+30)
Compiler        472MB ± 0%       472MB ± 0%    ~           (p=0.432 n=30+30)

name       old allocs/op    new allocs/op    delta
Template         401k ± 1%        401k ± 1%    ~           (p=0.859 n=30+30)
Unicode          331k ± 0%        331k ± 1%    ~           (p=0.823 n=28+30)
GoTypes         1.24M ± 0%       1.24M ± 0%    ~           (p=0.286 n=30+30)
Compiler        4.26M ± 0%       4.26M ± 0%    ~           (p=0.359 n=30+30)

Change-Id: Ia850065a9a84c07a5b0b4e23c1758b5679498da7
Reviewed-on: https://go-review.googlesource.com/36112
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Feb 3, 2017
When the number of variables in a function is very large,
liveness analysis gets less efficient, since every bit vector
is O(number of variables).

Improve the situation by returning a sparse representation
from progeffects. In all scenarios, progeffects either
returns a slice that is shared function-wide, 
and which is usually small, or a slice that is guaranteed
to have at most three values.

Reduces compilation time for the code in #8225 Comment 1 by ~10%.
Minor effects on regular packages (below).

Passes toolstash -cmp.

Updates #8225

name       old time/op      new time/op      delta
Template        215ms ± 2%       212ms ± 4%  -1.31%  (p=0.001 n=30+30)
Unicode        98.3ms ± 3%      98.4ms ± 5%    ~     (p=0.971 n=30+30)
GoTypes         657ms ± 3%       651ms ± 2%  -0.98%  (p=0.001 n=30+27)
Compiler        2.78s ± 2%       2.77s ± 2%  -0.60%  (p=0.006 n=30+30)
Flate           130ms ± 4%       130ms ± 4%    ~     (p=0.712 n=29+30)
GoParser        159ms ± 5%       158ms ± 3%    ~     (p=0.331 n=29+30)
Reflect         406ms ± 3%       404ms ± 3%  -0.69%  (p=0.041 n=29+30)
Tar             117ms ± 4%       117ms ± 3%    ~     (p=0.886 n=30+29)
XML             219ms ± 2%       217ms ± 2%    ~     (p=0.091 n=29+24)

name       old user-ns/op   new user-ns/op   delta
Template   272user-ms ± 3%  270user-ms ± 3%  -1.03%  (p=0.004 n=30+30)
Unicode    138user-ms ± 2%  138user-ms ± 3%    ~     (p=0.902 n=29+29)
GoTypes    891user-ms ± 2%  883user-ms ± 2%  -0.95%  (p=0.000 n=29+29)
Compiler   3.85user-s ± 2%  3.84user-s ± 2%    ~     (p=0.236 n=30+30)
Flate      167user-ms ± 2%  166user-ms ± 4%    ~     (p=0.511 n=28+30)
GoParser   211user-ms ± 4%  210user-ms ± 3%    ~     (p=0.287 n=29+30)
Reflect    539user-ms ± 3%  536user-ms ± 2%  -0.59%  (p=0.034 n=29+30)
Tar        154user-ms ± 3%  155user-ms ± 4%    ~     (p=0.786 n=30+30)
XML        289user-ms ± 3%  288user-ms ± 4%    ~     (p=0.249 n=30+26)

name       old alloc/op     new alloc/op     delta
Template       40.7MB ± 0%      40.8MB ± 0%  +0.09%  (p=0.001 n=30+30)
Unicode        30.8MB ± 0%      30.8MB ± 0%    ~     (p=0.112 n=30+30)
GoTypes         123MB ± 0%       124MB ± 0%  +0.09%  (p=0.000 n=30+30)
Compiler        473MB ± 0%       473MB ± 0%  +0.05%  (p=0.000 n=30+30)
Flate          26.5MB ± 0%      26.5MB ± 0%    ~     (p=0.186 n=29+30)
GoParser       32.3MB ± 0%      32.4MB ± 0%  +0.07%  (p=0.021 n=28+30)
Reflect        84.4MB ± 0%      84.6MB ± 0%  +0.21%  (p=0.000 n=30+30)
Tar            27.3MB ± 0%      27.3MB ± 0%  +0.09%  (p=0.010 n=30+28)
XML            44.7MB ± 0%      44.7MB ± 0%  +0.07%  (p=0.002 n=30+30)

name       old allocs/op    new allocs/op    delta
Template         401k ± 1%        400k ± 1%    ~     (p=0.321 n=30+30)
Unicode          331k ± 1%        331k ± 1%    ~     (p=0.357 n=30+28)
GoTypes         1.24M ± 0%       1.24M ± 1%  -0.19%  (p=0.001 n=30+30)
Compiler        4.27M ± 0%       4.27M ± 0%  -0.13%  (p=0.000 n=30+30)
Flate            252k ± 1%        251k ± 1%  -0.30%  (p=0.005 n=30+30)
GoParser         325k ± 1%        325k ± 1%    ~     (p=0.224 n=28+30)
Reflect         1.06M ± 0%       1.05M ± 0%  -0.34%  (p=0.000 n=30+30)
Tar              266k ± 1%        266k ± 1%    ~     (p=0.333 n=30+30)
XML              416k ± 1%        415k ± 1%    ~     (p=0.144 n=30+29)


Change-Id: I6ba67a9203516373062a2618122306da73333d98
Reviewed-on: https://go-review.googlesource.com/36211
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@josharian
Copy link
Contributor Author

The last two commits helped compile time for this code considerably. I don't see anything further to be done peculiar to this code--the rest is now general and known compiler slowness.

josharian added a commit to josharian/go that referenced this issue Feb 14, 2017
When the number of variables in a function is very large,
liveness analysis gets less efficient, since every operation
is O(number of variables).

Improve the situation by returning a sparse representation
from progeffects, caching common return values,
and avoiding allocation in all other cases.

TODO:
fix up mutilated comments
restore debugging code
measure exact impact on golang#8225
measure with compilebench

Reduces compilation time for the code
in golang#8225 Comment 1 by ~10%.

Updates golang#8225

name       old time/op      new time/op      delta
Template        209ms ± 3%       206ms ± 4%  -1.73%        (p=0.000 n=29+28)
Unicode        97.0ms ± 5%      96.8ms ± 5%    ~           (p=0.456 n=29+30)
GoTypes         628ms ± 3%       623ms ± 4%    ~           (p=0.186 n=29+30)
Compiler        2.67s ± 4%       2.63s ± 3%  -1.45%        (p=0.007 n=30+30)

name       old user-ns/op   new user-ns/op   delta
Template   268user-ms ± 3%  261user-ms ± 3%  -2.48%        (p=0.000 n=29+28)
Unicode    137user-ms ± 5%  136user-ms ± 3%    ~           (p=0.568 n=29+29)
GoTypes    858user-ms ± 3%  853user-ms ± 3%    ~           (p=0.202 n=29+30)
Compiler   3.72user-s ± 4%  3.68user-s ± 3%  -1.02%        (p=0.039 n=30+30)

name       old alloc/op     new alloc/op     delta
Template       40.7MB ± 0%      40.7MB ± 0%  +0.05%        (p=0.038 n=29+30)
Unicode        30.8MB ± 0%      30.8MB ± 0%    ~           (p=0.625 n=30+29)
GoTypes         123MB ± 0%       124MB ± 0%  +0.10%        (p=0.000 n=30+30)
Compiler        473MB ± 0%       473MB ± 0%  +0.05%        (p=0.000 n=30+30)

name       old allocs/op    new allocs/op    delta
Template         401k ± 1%        400k ± 1%  -0.26%        (p=0.001 n=29+30)
Unicode          331k ± 1%        331k ± 0%    ~           (p=0.679 n=30+29)
GoTypes         1.24M ± 0%       1.24M ± 0%  -0.14%        (p=0.007 n=30+30)
Compiler        4.27M ± 0%       4.26M ± 0%  -0.10%        (p=0.000 n=30+30)

name       old maxrss/op    new maxrss/op    delta
Template        30.9M ± 2%       30.9M ± 2%    ~           (p=0.958 n=30+29)
Unicode         33.1M ± 1%       33.0M ± 1%    ~           (p=0.462 n=27+30)
GoTypes         61.6M ± 3%       61.6M ± 2%    ~           (p=0.917 n=29+28)
Compiler         202M ± 2%        203M ± 2%    ~           (p=0.631 n=30+30)

Change-Id: I6ba67a9203516373062a2618122306da73333d98
@golang golang locked and limited conversation to collaborators Feb 4, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants