Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: "go test" taking ~38x longer on Go tip than on Go 1.4 #14934

Closed
mewmew opened this issue Mar 23, 2016 · 41 comments
Closed

cmd/compile: "go test" taking ~38x longer on Go tip than on Go 1.4 #14934

mewmew opened this issue Mar 23, 2016 · 41 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. ToolSpeed
Milestone

Comments

@mewmew
Copy link
Contributor

mewmew commented Mar 23, 2016

  1. What version of Go are you using (go version)?

For Go tip

[~]$ go version
go version devel +0659cf6 Wed Mar 23 19:22:53 2016 +0000 linux/amd64

and for Go 1.4

[~]$ go version
go version go1.4.3 linux/amd64
  1. What operating system and processor architecture are you using (go env)?

For Go tip

[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/tmp/gotip"
GORACE=""
GOROOT="/home/u/go"
GOTOOLDIR="/home/u/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build981577852=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"

and for Go 1.4

[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ go env
GOARCH="amd64"
GOBIN=""
GOCHAR="6"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/tmp/go14"
GORACE=""
GOROOT="/home/u/go1.4"
GOTOOLDIR="/home/u/go1.4/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"
  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    A complete runnable program is good.
    A link on play.golang.org is best.

For Go tip

[~]$ export GOPATH=/tmp/gotip
[~]$ go get github.com/mewmew/uc
[~]$ go get github.com/goccmack/gocc
[~]$ export PATH=${GOPATH}/bin:${PATH}
[~]$ cd ${GOPATH}/src/github.com/mewmew/uc
[/tmp/gotip/src/github.com/mewmew/uc]$ git checkout 9557c0010cde8953d0719ab5df170b8cfbdc3a3f
[/tmp/gotip/src/github.com/mewmew/uc]$ cd uc/gocc
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc]$ make gen
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc]$ cd lexer
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test
2016/03/23 21:05:38 path: ../../testdata/incorrect/lexer/bad.c
2016/03/23 21:05:38 path: ../../testdata/incorrect/lexer/good.c
2016/03/23 21:05:38 path: ../../testdata/incorrect/lexer/long-char.c
2016/03/23 21:05:38 path: ../../testdata/incorrect/lexer/ugly.c
2016/03/23 21:05:38 path: ../../testdata/quiet/lexer/l01.c
2016/03/23 21:05:38 path: ../../testdata/quiet/lexer/l02.c
2016/03/23 21:05:38 path: ../../testdata/quiet/lexer/l03.c
2016/03/23 21:05:38 path: ../../testdata/quiet/lexer/l04.c
2016/03/23 21:05:38 path: ../../testdata/quiet/lexer/l05.c
2016/03/23 21:05:38 path: ../../testdata/quiet/lexer/l06.c
PASS
ok      github.com/mewmew/uc/uc/gocc/lexer  0.002s

real    0m20.810s
user    0m33.817s
sys 0m0.977s
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go install

real    0m0.120s
user    0m0.113s
sys 0m0.023s
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test
2016/03/23 21:06:18 path: ../../testdata/incorrect/lexer/bad.c
2016/03/23 21:06:18 path: ../../testdata/incorrect/lexer/good.c
2016/03/23 21:06:18 path: ../../testdata/incorrect/lexer/long-char.c
2016/03/23 21:06:18 path: ../../testdata/incorrect/lexer/ugly.c
2016/03/23 21:06:18 path: ../../testdata/quiet/lexer/l01.c
2016/03/23 21:06:18 path: ../../testdata/quiet/lexer/l02.c
2016/03/23 21:06:18 path: ../../testdata/quiet/lexer/l03.c
2016/03/23 21:06:18 path: ../../testdata/quiet/lexer/l04.c
2016/03/23 21:06:18 path: ../../testdata/quiet/lexer/l05.c
2016/03/23 21:06:18 path: ../../testdata/quiet/lexer/l06.c
PASS
ok      github.com/mewmew/uc/uc/gocc/lexer  0.001s

real    0m20.752s
user    0m34.110s
sys 0m0.953s
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test -i

real    0m0.057s
user    0m0.030s
sys 0m0.020s
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test
2016/03/23 21:07:15 path: ../../testdata/incorrect/lexer/bad.c
2016/03/23 21:07:15 path: ../../testdata/incorrect/lexer/good.c
2016/03/23 21:07:15 path: ../../testdata/incorrect/lexer/long-char.c
2016/03/23 21:07:15 path: ../../testdata/incorrect/lexer/ugly.c
2016/03/23 21:07:15 path: ../../testdata/quiet/lexer/l01.c
2016/03/23 21:07:15 path: ../../testdata/quiet/lexer/l02.c
2016/03/23 21:07:15 path: ../../testdata/quiet/lexer/l03.c
2016/03/23 21:07:15 path: ../../testdata/quiet/lexer/l04.c
2016/03/23 21:07:15 path: ../../testdata/quiet/lexer/l05.c
2016/03/23 21:07:15 path: ../../testdata/quiet/lexer/l06.c
PASS
ok      github.com/mewmew/uc/uc/gocc/lexer  0.002s

real    0m20.751s
user    0m33.750s
sys 0m1.007s

and for Go 1.4

[~]$ export GOPATH=/tmp/go14
[~]$ go get github.com/mewmew/uc
[~]$ go get github.com/goccmack/gocc
[~]$ export PATH=${GOPATH}/bin:${PATH}
[~]$ cd ${GOPATH}/src/github.com/mewmew/uc
[/tmp/go14/src/github.com/mewmew/uc]$ git checkout 9557c0010cde8953d0719ab5df170b8cfbdc3a3f
[/tmp/go14/src/github.com/mewmew/uc]$ cd uc/gocc
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc]$ make gen
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc]$ cd lexer
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test
2016/03/23 21:05:42 path: ../../testdata/incorrect/lexer/bad.c
2016/03/23 21:05:42 path: ../../testdata/incorrect/lexer/good.c
2016/03/23 21:05:42 path: ../../testdata/incorrect/lexer/long-char.c
2016/03/23 21:05:42 path: ../../testdata/incorrect/lexer/ugly.c
2016/03/23 21:05:42 path: ../../testdata/quiet/lexer/l01.c
2016/03/23 21:05:42 path: ../../testdata/quiet/lexer/l02.c
2016/03/23 21:05:42 path: ../../testdata/quiet/lexer/l03.c
2016/03/23 21:05:42 path: ../../testdata/quiet/lexer/l04.c
2016/03/23 21:05:42 path: ../../testdata/quiet/lexer/l05.c
2016/03/23 21:05:42 path: ../../testdata/quiet/lexer/l06.c
PASS
ok      github.com/mewmew/uc/uc/gocc/lexer  0.002s

real    0m0.588s
user    0m0.503s
sys 0m0.053s
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go install

real    0m0.053s
user    0m0.040s
sys 0m0.007s
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test
2016/03/23 21:06:43 path: ../../testdata/incorrect/lexer/bad.c
2016/03/23 21:06:43 path: ../../testdata/incorrect/lexer/good.c
2016/03/23 21:06:43 path: ../../testdata/incorrect/lexer/long-char.c
2016/03/23 21:06:43 path: ../../testdata/incorrect/lexer/ugly.c
2016/03/23 21:06:43 path: ../../testdata/quiet/lexer/l01.c
2016/03/23 21:06:43 path: ../../testdata/quiet/lexer/l02.c
2016/03/23 21:06:43 path: ../../testdata/quiet/lexer/l03.c
2016/03/23 21:06:43 path: ../../testdata/quiet/lexer/l04.c
2016/03/23 21:06:43 path: ../../testdata/quiet/lexer/l05.c
2016/03/23 21:06:43 path: ../../testdata/quiet/lexer/l06.c
PASS
ok      github.com/mewmew/uc/uc/gocc/lexer  0.002s

real    0m0.547s
user    0m0.483s
sys 0m0.057s
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test -i

real    0m0.034s
user    0m0.033s
sys 0m0.000s
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test
2016/03/23 21:06:52 path: ../../testdata/incorrect/lexer/bad.c
2016/03/23 21:06:52 path: ../../testdata/incorrect/lexer/good.c
2016/03/23 21:06:52 path: ../../testdata/incorrect/lexer/long-char.c
2016/03/23 21:06:52 path: ../../testdata/incorrect/lexer/ugly.c
2016/03/23 21:06:52 path: ../../testdata/quiet/lexer/l01.c
2016/03/23 21:06:52 path: ../../testdata/quiet/lexer/l02.c
2016/03/23 21:06:52 path: ../../testdata/quiet/lexer/l03.c
2016/03/23 21:06:52 path: ../../testdata/quiet/lexer/l04.c
2016/03/23 21:06:52 path: ../../testdata/quiet/lexer/l05.c
2016/03/23 21:06:52 path: ../../testdata/quiet/lexer/l06.c
PASS
ok      github.com/mewmew/uc/uc/gocc/lexer  0.002s

real    0m0.545s
user    0m0.503s
sys 0m0.037s
  1. What did you expect to see?

The time take by go test github.com/mewmew/uc/uc/gocc/lexer taking roughly the same amount of time to run when using Go tip and Go 1.4.

  1. What did you see instead?

"go test" took roughly 20 seconds on Go tip and 0.5 seconds on Go 1.4.

Also note that the compile command used between 25% to 60% of the CPU, during the 20 second duration.

CPU usage

@mewmew mewmew changed the title "go test" taking ~38x longer on Go tip than on Go 1.4 cmd/compile: "go test" taking ~38x longer on Go tip than on Go 1.4 Mar 23, 2016
@bradfitz
Copy link
Contributor

Nuke your $GOPATH/pkg cache before both tests.

I suspect in one case you're building dependencies and in the other case you're not.

@ianlancetaylor ianlancetaylor added this to the Go1.7 milestone Mar 23, 2016
@mewmew
Copy link
Contributor Author

mewmew commented Mar 23, 2016

I tried to remove the $GOPATH/pkg directory but did not make a difference.

For Go tip (21 sec)

[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ go version
go version devel +0659cf6 Wed Mar 23 19:22:53 2016 +0000 linux/amd64
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ ls ${GOPATH}
bin/  src/  ./  ../
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test
2016/03/23 22:45:30 path: ../../testdata/incorrect/lexer/bad.c
2016/03/23 22:45:30 path: ../../testdata/incorrect/lexer/good.c
2016/03/23 22:45:30 path: ../../testdata/incorrect/lexer/long-char.c
2016/03/23 22:45:30 path: ../../testdata/incorrect/lexer/ugly.c
2016/03/23 22:45:30 path: ../../testdata/quiet/lexer/l01.c
2016/03/23 22:45:30 path: ../../testdata/quiet/lexer/l02.c
2016/03/23 22:45:30 path: ../../testdata/quiet/lexer/l03.c
2016/03/23 22:45:30 path: ../../testdata/quiet/lexer/l04.c
2016/03/23 22:45:30 path: ../../testdata/quiet/lexer/l05.c
2016/03/23 22:45:30 path: ../../testdata/quiet/lexer/l06.c
PASS
ok      github.com/mewmew/uc/uc/gocc/lexer  0.002s

real    0m21.939s
user    0m32.177s
sys 0m1.307s
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ ls ${GOPATH}
bin/  src/  ./  ../
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test -i

real    0m0.130s
user    0m0.133s
sys 0m0.007s
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ ls ${GOPATH}
bin/  pkg/  src/  ./  ../
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test
2016/03/23 22:46:22 path: ../../testdata/incorrect/lexer/bad.c
2016/03/23 22:46:22 path: ../../testdata/incorrect/lexer/good.c
2016/03/23 22:46:22 path: ../../testdata/incorrect/lexer/long-char.c
2016/03/23 22:46:22 path: ../../testdata/incorrect/lexer/ugly.c
2016/03/23 22:46:22 path: ../../testdata/quiet/lexer/l01.c
2016/03/23 22:46:22 path: ../../testdata/quiet/lexer/l02.c
2016/03/23 22:46:22 path: ../../testdata/quiet/lexer/l03.c
2016/03/23 22:46:22 path: ../../testdata/quiet/lexer/l04.c
2016/03/23 22:46:22 path: ../../testdata/quiet/lexer/l05.c
2016/03/23 22:46:22 path: ../../testdata/quiet/lexer/l06.c
PASS
ok      github.com/mewmew/uc/uc/gocc/lexer  0.002s

real    0m21.440s
user    0m33.103s
sys 0m0.783s

For Go 1.4 (0.6 sec)

[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ go version
go version go1.4.3 linux/amd64
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ ls ${GOPATH}
bin/  src/  ./  ../
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test
2016/03/23 22:45:06 path: ../../testdata/incorrect/lexer/bad.c
2016/03/23 22:45:06 path: ../../testdata/incorrect/lexer/good.c
2016/03/23 22:45:06 path: ../../testdata/incorrect/lexer/long-char.c
2016/03/23 22:45:06 path: ../../testdata/incorrect/lexer/ugly.c
2016/03/23 22:45:06 path: ../../testdata/quiet/lexer/l01.c
2016/03/23 22:45:06 path: ../../testdata/quiet/lexer/l02.c
2016/03/23 22:45:06 path: ../../testdata/quiet/lexer/l03.c
2016/03/23 22:45:06 path: ../../testdata/quiet/lexer/l04.c
2016/03/23 22:45:06 path: ../../testdata/quiet/lexer/l05.c
2016/03/23 22:45:06 path: ../../testdata/quiet/lexer/l06.c
PASS
ok      github.com/mewmew/uc/uc/gocc/lexer  0.002s

real    0m0.595s
user    0m0.517s
sys 0m0.067s
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ ls ${GOPATH}
bin/  src/  ./  ../
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test -i

real    0m0.065s
user    0m0.047s
sys 0m0.013s
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ ls ${GOPATH}
bin/  pkg/  src/  ./  ../
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test
2016/03/23 22:46:24 path: ../../testdata/incorrect/lexer/bad.c
2016/03/23 22:46:24 path: ../../testdata/incorrect/lexer/good.c
2016/03/23 22:46:24 path: ../../testdata/incorrect/lexer/long-char.c
2016/03/23 22:46:24 path: ../../testdata/incorrect/lexer/ugly.c
2016/03/23 22:46:24 path: ../../testdata/quiet/lexer/l01.c
2016/03/23 22:46:24 path: ../../testdata/quiet/lexer/l02.c
2016/03/23 22:46:24 path: ../../testdata/quiet/lexer/l03.c
2016/03/23 22:46:24 path: ../../testdata/quiet/lexer/l04.c
2016/03/23 22:46:24 path: ../../testdata/quiet/lexer/l05.c
2016/03/23 22:46:24 path: ../../testdata/quiet/lexer/l06.c
PASS
ok      github.com/mewmew/uc/uc/gocc/lexer  0.002s

real    0m0.567s
user    0m0.520s
sys 0m0.040s

@mdempsky
Copy link
Member

Try running go test -x -c with both 1.4 and tip. Are there any differences in build actions?

@bradfitz
Copy link
Contributor

The SSA compiler is very slow with the large, auto-generated lexer_test.go.

CPU and memory profiles follow:

(pprof) top --cum 20
6.61s of 38.69s total (17.08%)
Dropped 294 nodes (cum <= 0.19s)
Showing top 20 nodes out of 85 (cum >= 7.55s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%     37.39s 96.64%  runtime.goexit
         0     0%     0%     25.62s 66.22%  cmd/compile/internal/amd64.Main
         0     0%     0%     25.62s 66.22%  cmd/compile/internal/gc.Main
         0     0%     0%     25.62s 66.22%  main.main
         0     0%     0%     25.62s 66.22%  runtime.main
         0     0%     0%     25.51s 65.93%  cmd/compile/internal/gc.funccompile
         0     0%     0%     25.46s 65.81%  cmd/compile/internal/gc.compile
         0     0%     0%     25.23s 65.21%  cmd/compile/internal/gc.buildssa
         0     0%     0%     16.41s 42.41%  cmd/compile/internal/ssa.Compile
     6.06s 15.66% 15.66%     12.98s 33.55%  runtime.scanobject
         0     0% 15.66%     11.69s 30.21%  runtime.gcBgMarkWorker
     0.19s  0.49% 16.15%     11.66s 30.14%  runtime.gcDrain
     0.17s  0.44% 16.59%     11.16s 28.84%  runtime.mallocgc
     0.04s   0.1% 16.70%     10.99s 28.41%  runtime.systemstack
     0.02s 0.052% 16.75%      8.76s 22.64%  cmd/compile/internal/gc.(*state).linkForwardReferences
     0.04s   0.1% 16.85%      8.74s 22.59%  cmd/compile/internal/gc.(*state).resolveFwdRef
     0.09s  0.23% 17.08%      8.60s 22.23%  cmd/compile/internal/gc.(*state).lookupVarOutgoing
         0     0% 17.08%      8.29s 21.43%  cmd/compile/internal/ssa.regalloc
         0     0% 17.08%      7.55s 19.51%  runtime.(*mcache).refill
         0     0% 17.08%      7.55s 19.51%  runtime.(*mcentral).cacheSpan

(pprof) top 20
33550ms of 38690ms total (86.71%)
Dropped 294 nodes (cum <= 193.45ms)
Showing top 20 nodes out of 85 (cum >= 240ms)
      flat  flat%   sum%        cum   cum%
    6260ms 16.18% 16.18%     7490ms 19.36%  runtime.(*mcentral).grow
    6060ms 15.66% 31.84%    12980ms 33.55%  runtime.scanobject
    4180ms 10.80% 42.65%     4180ms 10.80%  cmd/compile/internal/ssa.copyelimValue
    3740ms  9.67% 52.31%     3740ms  9.67%  runtime.heapBitsForObject
    2740ms  7.08% 59.40%     3210ms  8.30%  runtime.greyobject
    2690ms  6.95% 66.35%     2690ms  6.95%  runtime.memclr
    1270ms  3.28% 69.63%     3410ms  8.81%  cmd/compile/internal/ssa.deadcode
    1070ms  2.77% 72.40%     1070ms  2.77%  runtime.duffcopy
     880ms  2.27% 74.67%      880ms  2.27%  runtime.memmove
     680ms  1.76% 76.43%      750ms  1.94%  runtime.evacuate
     660ms  1.71% 78.13%      890ms  2.30%  cmd/compile/internal/ssa.(*regAllocState).computeLive
     550ms  1.42% 79.56%      550ms  1.42%  cmd/compile/internal/ssa.(*Value).resetArgs
     510ms  1.32% 80.87%      700ms  1.81%  cmd/compile/internal/ssa.(*sparseSet).addAll
     470ms  1.21% 82.09%      470ms  1.21%  runtime.usleep
     390ms  1.01% 83.10%      490ms  1.27%  runtime.(*mheap).allocSpanLocked
     380ms  0.98% 84.08%     2340ms  6.05%  runtime.mapassign1
     290ms  0.75% 84.83%     2800ms  7.24%  cmd/compile/internal/ssa.(*edgeState).setup
     290ms  0.75% 85.58%      290ms  0.75%  runtime/internal/atomic.Or8
     230ms  0.59% 86.17%      250ms  0.65%  cmd/compile/internal/ssa.phielimValue
     210ms  0.54% 86.71%      240ms  0.62%  runtime.mapaccess2_fast64


(pprof) top --cum 30
4602.42MB of 5030.21MB total (91.50%)
Dropped 233 nodes (cum <= 25.15MB)
Showing top 30 nodes out of 37 (cum >= 96.28MB)
      flat  flat%   sum%        cum   cum%
         0     0%     0%  5026.71MB 99.93%  cmd/compile/internal/gc.Main
         0     0%     0%  5025.71MB 99.91%  cmd/compile/internal/amd64.Main
         0     0%     0%  5025.21MB 99.90%  main.main
         0     0%     0%  5024.21MB 99.88%  runtime.main
         0     0%     0%  5023.21MB 99.86%  runtime.goexit
         0     0%     0%  4999.15MB 99.38%  cmd/compile/internal/gc.funccompile
         0     0%     0%  4997.98MB 99.36%  cmd/compile/internal/gc.compile
         0     0%     0%  4955.84MB 98.52%  cmd/compile/internal/gc.buildssa
         0     0%     0%  3249.92MB 64.61%  cmd/compile/internal/ssa.Compile
         0     0%     0%  2293.67MB 45.60%  cmd/compile/internal/ssa.regalloc
  173.93MB  3.46%  3.46%  1697.45MB 33.75%  cmd/compile/internal/ssa.(*regAllocState).regalloc
         0     0%  3.46%  1682.08MB 33.44%  cmd/compile/internal/gc.(*state).linkForwardReferences
         0     0%  3.46%  1682.08MB 33.44%  cmd/compile/internal/gc.(*state).resolveFwdRef
  579.61MB 11.52% 14.98%  1663.08MB 33.06%  cmd/compile/internal/gc.(*state).lookupVarOutgoing
  967.46MB 19.23% 34.21%   967.46MB 19.23%  cmd/compile/internal/ssa.(*Func).newValue
         0     0% 34.21%   961.44MB 19.11%  cmd/compile/internal/ssa.(*Block).NewValue0A
         0     0% 34.21%   853.75MB 16.97%  cmd/compile/internal/ssa.stackalloc
  630.77MB 12.54% 46.75%   630.77MB 12.54%  cmd/compile/internal/ssa.(*Func).setHome
         0     0% 46.75%   630.77MB 12.54%  cmd/compile/internal/ssa.(*regAllocState).assignReg
  350.85MB  6.97% 53.73%   596.22MB 11.85%  cmd/compile/internal/ssa.(*regAllocState).init
  233.10MB  4.63% 58.36%   495.48MB  9.85%  cmd/compile/internal/ssa.(*stackAllocState).init
  356.95MB  7.10% 65.46%   358.26MB  7.12%  cmd/compile/internal/ssa.(*stackAllocState).stackalloc
  292.95MB  5.82% 71.28%   295.07MB  5.87%  cmd/compile/internal/ssa.cse
  272.54MB  5.42% 76.70%   272.54MB  5.42%  cmd/compile/internal/ssa.schedule
  244.27MB  4.86% 81.56%   245.37MB  4.88%  cmd/compile/internal/ssa.(*regAllocState).computeLive
  201.41MB  4.00% 85.56%   201.41MB  4.00%  cmd/compile/internal/ssa.(*stackAllocState).buildInterferenceGraph
  174.84MB  3.48% 89.04%   174.84MB  3.48%  cmd/compile/internal/ssa.(*Func).newSparseSet
  123.73MB  2.46% 91.50%   123.73MB  2.46%  cmd/compile/internal/gc.(*state).addNamedValue
         0     0% 91.50%   116.56MB  2.32%  cmd/compile/internal/ssa.dse
         0     0% 91.50%    96.28MB  1.91%  cmd/compile/internal/ssa.deadcode

(pprof) top 30
4.78GB of 4.91GB total (97.25%)
Dropped 233 nodes (cum <= 0.02GB)
Showing top 30 nodes out of 37 (cum >= 3.17GB)
      flat  flat%   sum%        cum   cum%
    0.94GB 19.23% 19.23%     0.94GB 19.23%  cmd/compile/internal/ssa.(*Func).newValue
    0.62GB 12.54% 31.77%     0.62GB 12.54%  cmd/compile/internal/ssa.(*Func).setHome
    0.57GB 11.52% 43.30%     1.62GB 33.06%  cmd/compile/internal/gc.(*state).lookupVarOutgoing
    0.35GB  7.10% 50.39%     0.35GB  7.12%  cmd/compile/internal/ssa.(*stackAllocState).stackalloc
    0.34GB  6.97% 57.37%     0.58GB 11.85%  cmd/compile/internal/ssa.(*regAllocState).init
    0.29GB  5.82% 63.19%     0.29GB  5.87%  cmd/compile/internal/ssa.cse
    0.27GB  5.42% 68.61%     0.27GB  5.42%  cmd/compile/internal/ssa.schedule
    0.24GB  4.86% 73.46%     0.24GB  4.88%  cmd/compile/internal/ssa.(*regAllocState).computeLive
    0.23GB  4.63% 78.10%     0.48GB  9.85%  cmd/compile/internal/ssa.(*stackAllocState).init
    0.20GB  4.00% 82.10%     0.20GB  4.00%  cmd/compile/internal/ssa.(*stackAllocState).buildInterferenceGraph
    0.17GB  3.48% 85.58%     0.17GB  3.48%  cmd/compile/internal/ssa.(*Func).newSparseSet
    0.17GB  3.46% 89.04%     1.66GB 33.75%  cmd/compile/internal/ssa.(*regAllocState).regalloc
    0.12GB  2.46% 91.50%     0.12GB  2.46%  cmd/compile/internal/gc.(*state).addNamedValue
    0.09GB  1.88% 93.38%     0.09GB  1.88%  cmd/compile/internal/ssa.tighten
    0.06GB  1.18% 94.56%     0.06GB  1.21%  cmd/compile/internal/ssa.(*stackAllocState).computeLive
    0.06GB  1.17% 95.73%     0.06GB  1.17%  cmd/compile/internal/ssa.critical
    0.04GB  0.78% 96.50%     0.04GB  0.78%  cmd/compile/internal/ssa.(*edgeState).set
    0.04GB  0.75% 97.25%     0.04GB  0.75%  cmd/compile/internal/ssa.liveValues
         0     0% 97.25%     4.91GB 99.91%  cmd/compile/internal/amd64.Main
         0     0% 97.25%     1.64GB 33.44%  cmd/compile/internal/gc.(*state).linkForwardReferences
         0     0% 97.25%     1.64GB 33.44%  cmd/compile/internal/gc.(*state).resolveFwdRef
         0     0% 97.25%     4.91GB 99.93%  cmd/compile/internal/gc.Main
         0     0% 97.25%     4.84GB 98.52%  cmd/compile/internal/gc.buildssa
         0     0% 97.25%     4.88GB 99.36%  cmd/compile/internal/gc.compile
         0     0% 97.25%     4.88GB 99.38%  cmd/compile/internal/gc.funccompile
         0     0% 97.25%     0.94GB 19.11%  cmd/compile/internal/ssa.(*Block).NewValue0A
         0     0% 97.25%     0.04GB  0.78%  cmd/compile/internal/ssa.(*edgeState).setup
         0     0% 97.25%     0.62GB 12.54%  cmd/compile/internal/ssa.(*regAllocState).assignReg
         0     0% 97.25%     0.04GB  0.78%  cmd/compile/internal/ssa.(*regAllocState).shuffle
         0     0% 97.25%     3.17GB 64.61%  cmd/compile/internal/ssa.Compile

@minux
Copy link
Member

minux commented Mar 23, 2016 via email

@griesemer
Copy link
Contributor

I would certainly hope that we can remove the old backend eventually. It
seems like a bad idea to have to maintain two different backends in one
compiler forever.

  • gri

On Wed, Mar 23, 2016 at 4:02 PM, Minux Ma notifications@github.com wrote:

The old compiler has a cutoff threshold: when a function uses more than
some certain number of instructions, it will skip the optimization step.
Perhaps we could preserve the old backend for such cases.

(That is, if the input function is too complicated, just fallback to the
old backend.)


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#14934 (comment)

@mewmew
Copy link
Contributor Author

mewmew commented Mar 23, 2016

Try running go test -x -c with both 1.4 and tip. Are there any differences in build actions?

On Go tip

[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ go version
go version devel +0659cf6 Wed Mar 23 19:22:53 2016 +0000 linux/amd64
[/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test -x -c
WORK=/tmp/go-build249655493
mkdir -p $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/_test/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/
mkdir -p $WORK/github.com/mewmew/uc/uc/gocc/token/_obj/
mkdir -p $WORK/github.com/mewmew/uc/uc/gocc/
cd /home/u/Desktop/go/src/github.com/mewmew/uc/uc/gocc/token
/home/u/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/mewmew/uc/uc/gocc/token.a -trimpath $WORK -p github.com/mewmew/uc/uc/gocc/token -complete -buildid 8b7f374d1fc5ffa423d5d240641f029151c72b20 -D _/home/u/Desktop/go/src/github.com/mewmew/uc/uc/gocc/token -I $WORK -pack ./token.go
mkdir -p $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/_obj/
mkdir -p $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/
cd /tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer
/home/u/go/pkg/tool/linux_amd64/compile -o $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer.a -trimpath $WORK -p _/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer -complete -buildid 217e99cb067119187679a189a2297c23179ff229 -D _/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer -I $WORK -I /home/u/Desktop/go/pkg/linux_amd64 -pack ./acttab.go ./lexer.go ./transitiontable.go
mkdir -p $WORK/github.com/mewmew/uc/uc/gocc/lexer/_obj/
cd /home/u/Desktop/go/src/github.com/mewmew/uc/uc/gocc/lexer
/home/u/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/mewmew/uc/uc/gocc/lexer.a -trimpath $WORK -p github.com/mewmew/uc/uc/gocc/lexer -complete -buildid 217e99cb067119187679a189a2297c23179ff229 -D _/home/u/Desktop/go/src/github.com/mewmew/uc/uc/gocc/lexer -I $WORK -I /home/u/Desktop/go/pkg/linux_amd64 -pack ./acttab.go ./lexer.go ./transitiontable.go
mkdir -p $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/_test/_obj_xtest/
cd /tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer
/home/u/go/pkg/tool/linux_amd64/compile -o $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/_test/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer_test.a -trimpath $WORK -p _/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer_test -complete -D _/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer -I $WORK -I /home/u/Desktop/go/pkg/linux_amd64 -pack ./lexer_test.go
mkdir -p $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/_test/
cd $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/_test
/home/u/go/pkg/tool/linux_amd64/compile -o ./main.a -trimpath $WORK -p main -complete -D "" -I . -I $WORK -pack ./_testmain.go
cd .
/home/u/go/pkg/tool/linux_amd64/link -o $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/_test/lexer.test -L $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/_test -L $WORK -L /home/u/Desktop/go/pkg/linux_amd64 -extld=gcc -buildmode=exe $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/_test/main.a
mkdir -p /tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/
mv $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/_test/lexer.test /tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/lexer.test

real    0m22.553s
user    0m35.083s
sys 0m1.483s

On Go 1.4

[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ go version
go version go1.4.3 linux/amd64
[/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer]$ time go test -x -c
WORK=/tmp/go-build431641544
mkdir -p $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/_test/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/
mkdir -p $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/_obj/
mkdir -p $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/
cd /tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer
/home/u/go1.4/pkg/tool/linux_amd64/6g -o $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer.a -trimpath $WORK -p _/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer -complete -D _/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer -I $WORK -I /home/u/Desktop/go/pkg/linux_amd64 -pack ./acttab.go ./lexer.go ./transitiontable.go
mkdir -p $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/_test/_obj_xtest/
/home/u/go1.4/pkg/tool/linux_amd64/6g -o $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/_test/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer_test.a -trimpath $WORK -p _/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer_test -complete -D _/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer -I $WORK -I /home/u/Desktop/go/pkg/linux_amd64 -pack ./lexer_test.go
mkdir -p $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/_test/
cd $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/_test
/home/u/go1.4/pkg/tool/linux_amd64/6g -o ./main.a -trimpath $WORK -p testmain -complete -D "" -I . -I $WORK -pack ./_testmain.go
cd .
/home/u/go1.4/pkg/tool/linux_amd64/6l -o $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/_test/lexer.test -L $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/_test -L $WORK -L /home/u/Desktop/go/pkg/linux_amd64 -extld=gcc $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/_test/main.a
mkdir -p /tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/
mv $WORK/_/tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/_test/lexer.test /tmp/go14/src/github.com/mewmew/uc/uc/gocc/lexer/lexer.test

real    0m0.599s
user    0m0.557s
sys 0m0.050s

@mewmew
Copy link
Contributor Author

mewmew commented Mar 23, 2016

@mdempsky The step which takes most of the time for Go tip is the following:

/home/u/go/pkg/tool/linux_amd64/compile -o $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/_test/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer_test.a -trimpath $WORK -p _/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer_test -complete -D _/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer -I $WORK -I /home/u/Desktop/go/pkg/linux_amd64 -pack ./lexer_test.go

@davecheney
Copy link
Contributor

Can you please provide a link to that source. I cannot figure out which
branch of your repo I need to compile.

On Thu, Mar 24, 2016 at 10:28 AM, Robin Eklind notifications@github.com
wrote:

@mdempsky https://github.com/mdempsky The step which takes most of the
time for Go tip is the following:

/home/u/go/pkg/tool/linux_amd64/compile -o $WORK/_/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer/test//tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer_test.a -trimpath $WORK -p _/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer_test -complete -D _/tmp/gotip/src/github.com/mewmew/uc/uc/gocc/lexer -I $WORK -I /home/u/Desktop/go/pkg/linux_amd64 -pack ./lexer_test.go


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#14934 (comment)

@cespare
Copy link
Contributor

cespare commented Mar 23, 2016

@davecheney
Copy link
Contributor

I am sorry, this revision does not compile cleanly for me

github.com/mewmew/uc/uc/gocc/lexer

lexer_test.go:11:2: cannot find package "github.com/mewmew/uc/uc/gocc/token"
in any of:
/home/dfc/go/src/github.com/mewmew/uc/uc/gocc/token (from $GOROOT)
/home/dfc/src/github.com/mewmew/uc/uc/gocc/token (from $GOPATH)

% ls $GOPATH/src/github.com/mewmew/uc/uc/gocc/
lexer Makefile uc.bnf

On Thu, Mar 24, 2016 at 10:52 AM, Caleb Spare notifications@github.com
wrote:

@davecheney https://github.com/davecheney
https://github.com/mewmew/uc/blob/9557c0010cde8953d0719ab5df170b8cfbdc3a3f/uc/gocc/lexer/lexer_test.go

(The sha is given in the report.)


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#14934 (comment)

@bradfitz
Copy link
Contributor

@davecheney, I was able to follow the instructions to repro to get my numbers above.

@davecheney
Copy link
Contributor

@bradfitz I couldn't get it to work, could you please get a cpu and mem
profile and add their SVGs to the issue. Thanks!

On Thu, Mar 24, 2016 at 10:58 AM, Brad Fitzpatrick <notifications@github.com

wrote:

@davecheney https://github.com/davecheney, I was able to follow the
instructions to repro to get my numbers above.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#14934 (comment)

@cespare
Copy link
Contributor

cespare commented Mar 24, 2016

@davecheney Perhaps you didn't run make gen as in the repro steps.

@mewmew
Copy link
Contributor Author

mewmew commented Mar 24, 2016

make gen has the gocc tool as a dependency, thus the following commands were included in the instructions above. Also note that the git commit hash 9557c0010cde8953d0719ab5df170b8cfbdc3a3f is on the dev branch.

[~]$ go get github.com/goccmack/gocc
[~]$ export PATH=${GOPATH}/bin:${PATH}

@mewmew
Copy link
Contributor Author

mewmew commented Mar 24, 2016

@davecheney All steps in one go, simply copy-and-paste into a terminal to reproduce.

For Go tip

~/go/bin/go version
export GOPATH=/tmp/gotip
~/go/bin/go get github.com/mewmew/uc
~/go/bin/go get github.com/goccmack/gocc
export PATH=${GOPATH}/bin:${PATH}
cd ${GOPATH}/src/github.com/mewmew/uc
git checkout 9557c0010cde8953d0719ab5df170b8cfbdc3a3f
cd uc/gocc
make gen
cd lexer
time ~/go/bin/go test

For Go 1.4

~/go1.4/bin/go version
export GOPATH=/tmp/go14
~/go1.4/bin/go get github.com/mewmew/uc
~/go1.4/bin/go get github.com/goccmack/gocc
export PATH=${GOPATH}/bin:${PATH}
cd ${GOPATH}/src/github.com/mewmew/uc
git checkout 9557c0010cde8953d0719ab5df170b8cfbdc3a3f
cd uc/gocc
make gen
cd lexer
time ~/go1.4/bin/go test

@davecheney
Copy link
Contributor

Thanks Robin!

On Thu, Mar 24, 2016 at 11:50 AM, Robin Eklind notifications@github.com
wrote:

@davecheney https://github.com/davecheney All steps in one go, simply
copy-and-paste into a terminal to reproduce.

For Go tip

~/go/bin/go version
export GOPATH=/tmp/gotip
~/go/bin/go get github.com/mewmew/uc
~/go/bin/go http://github.com/mewmew/uc~/go/bin/go get github.com/goccmack/gocc
export PATH=${GOPATH}/bin:${PATH}
cd ${GOPATH}/src/github.com/mewmew/uc
git checkout 9557c0010cde8953d0719ab5df170b8cfbdc3a3f
cd uc/gocc
make gen
cd lexer
time ~/go/bin/go test

For Go 1.4

~/go1.4/bin/go version
export GOPATH=/tmp/go14
~/go1.4/bin/go get github.com/mewmew/uc
~/go1.4/bin/go get github.com/goccmack/gocc
export PATH=${GOPATH}/bin:${PATH}
cd ${GOPATH}/src/github.com/mewmew/uc
git checkout 9557c0010cde8953d0719ab5df170b8cfbdc3a3f
cd uc/gocc
make gen
cd lexer
time ~/go1.4/bin/go test


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#14934 (comment)

@mewmew
Copy link
Contributor Author

mewmew commented Mar 24, 2016

Thanks Robin!

You are very welcome. I'll look into generating CPU and memory profiles.

@davecheney
Copy link
Contributor

Don't spend time on the 1.4 data, it's not going to change. The SVG for
memory usage and cpu usage for tip are the important ones. I suggest

go test -i $PKG # compile all the test depenencies to ensure only one
command is run
go test -gcflags='-cpuprofile=/tmp/c.p' $PKG
go tool pprof -svg $(go tool -n compile) /tmp/c.p

On Thu, Mar 24, 2016 at 11:51 AM, Robin Eklind notifications@github.com
wrote:

Thanks Robin!

You are very welcome. I'll look into generating CPU and memory profiles.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#14934 (comment)

@mewmew
Copy link
Contributor Author

mewmew commented Mar 24, 2016

Thanks for the how to instructions. I ran the commands you suggested and got the following profiles for Go tip.

CPU profile

CPU profile

Memory profile

Memory profile

From what I can tell, it doesn't look like the profile managed to capture the entire compilation time? The CPU profile only captured 10 ms of the 20 seconds. And the memory profile only captured ~5 MB of the ~3 GB. Any suggestions?

@tzneal
Copy link
Member

tzneal commented Mar 24, 2016

I hacked together a standalone file slow.go from lexer_test that gives pretty awful performance (but not 30x slower). The 3k lines of go generates over 4 million SSA values.

pass early phielim nVals = 4207720
pass early copyelim nVals = 4207720
pass early deadcode nVals = 4207720
pass short circuit nVals = 24076
pass decompose user nVals = 24078
pass opt nVals = 24078
pass zero arg cse nVals = 38169
pass opt deadcode nVals = 38169
pass generic cse nVals = 31927
pass phiopt nVals = 31927
pass nilcheckelim nVals = 31927
pass prove nVals = 31927
pass decompose builtin nVals = 31927
pass dec nVals = 31927
pass late opt nVals = 31927
pass generic deadcode nVals = 31927
pass fuse nVals = 20835
pass dse nVals = 20835
pass tighten nVals = 20835
pass lower nVals = 20835
pass lowered cse nVals = 24555
pass lowered deadcode nVals = 24555
pass checkLower nVals = 17297
pass late phielim nVals = 17297
pass late copyelim nVals = 17297
pass late deadcode nVals = 17297
pass critical nVals = 17297
pass likelyadjust nVals = 17297
pass layout nVals = 17297
pass schedule nVals = 17297
pass flagalloc nVals = 17297
pass regalloc nVals = 17297
pass trim nVals = 23588

@davecheney
Copy link
Contributor

Run the command with -X, I think there may be more than one compile happening, overwriting the file.

Worse case you can copy and paste the output from -x and run each command by hand.

On 24 Mar 2016, at 12:14, Robin Eklind notifications@github.com wrote:

Thanks for the how to instructions. I ran the commands you suggested and got the following profiles.

CPU profile

Memory profile

From what I can tell, it doesn't look like the profile managed to capture the entire compilation time? The CPU profile only captured 10 ms of the 20 seconds. And the memory profile only captured ~5 MB of the ~3 GB. Any suggestions?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub

@mewmew
Copy link
Contributor Author

mewmew commented Mar 24, 2016

@tzneal Thanks! Profiling slow.go produces the following for Go tip.

[/tmp/gotip/src/slow]$ go version
go version devel +0659cf6 Wed Mar 23 19:22:53 2016 +0000 linux/amd64

CPU profile

CPU profile

Memory profile

Memory profile

To reproduce, run the following:

go version
export GOPATH=/tmp/gotip
mkdir -p ${GOPATH}/src/slow
wget -O ${GOPATH}/src/slow/slow.go https://gist.githubusercontent.com/tzneal/a77e8ac4ef763555fc09/raw/1e01b92c7d3e00331a642258721136d54e5bf9a8/slow.go
sed -i 's/package lexer_test/package slow/' ${GOPATH}/src/slow/slow.go
cd ${GOPATH}/src/slow
go build -gcflags='-cpuprofile=/tmp/c.p'
go tool pprof -svg $(go tool -n compile) /tmp/c.p > c.svg
go build -gcflags='-memprofile=/tmp/m.p'
go tool pprof -svg $(go tool -n compile) /tmp/m.p > m.svg

@randall77
Copy link
Contributor

Lots of inlining, lots of gotos, big slice constructor, not surprised it is bad.
LookupVarOutgoing & friends point to the phi generation algorithm which is not perfect at the moment. We're using something that is usually good in practice but can but O(n^2) in some cases, which this CL may tickle.

@josharian
Copy link
Contributor

Possibly a dup of #14774.

@gopherbot
Copy link

CL https://golang.org/cl/21082 mentions this issue.

@randall77
Copy link
Contributor

David, does your sparse phi insertion CL help here?

@randall77 randall77 assigned dr2chase and unassigned randall77 Apr 29, 2016
@dr2chase
Copy link
Contributor

Sparse phi insertion helps.
Doesn't get back all the performance, but lots better than it was.
(Timings on a Macbook Air laptop running on battery)

export GO_SSA_PHI_LOC_CUTOFF=0 # use sparse everywhere
time go test
...
real    0m13.318s
user    0m14.309s
sys 0m0.863s
export GO_SSA_PHI_LOC_CUTOFF=-1 # use sparse nowhere
time go test
...
real    0m33.241s
user    0m39.499s
sys 0m2.954s
export GO_SSA_PHI_LOC_CUTOFF=2500000 # default threshold 2.5million (#vars*#blocks)
time go test
...
real    0m13.448s
user    0m14.335s
sys 0m0.955s

New profiles from compiling lexer_test. locatePotentialPhiFunctions is still doing a lot of work even when sparse and there might be some upside hiding there, but the big pig is now register allocation and all the memory that it allocates. I've annotated regalloc (and stack alloc) with R and phi location with P

(pprof) top 30
10450ms of 12120ms total (86.22%)
Dropped 249 nodes (cum <= 60.60ms)
Showing top 30 nodes out of 109 (cum >= 180ms)
      flat  flat%   sum%        cum   cum%
    4050ms 33.42% 33.42%     4300ms 35.48%  runtime.(*mcentral).grow
    1130ms  9.32% 42.74%     2020ms 16.67%  runtime.scanobject
     510ms  4.21% 46.95%      510ms  4.21%  runtime.memclr
     470ms  3.88% 50.83%      680ms  5.61%  cmd/compile/internal/ssa.(*regAllocState).computeLive R
     470ms  3.88% 54.70%      470ms  3.88%  cmd/compile/internal/ssa.copyelimValue
     430ms  3.55% 58.25%     4420ms 36.47%  cmd/compile/internal/ssa.(*regAllocState).regalloc R
     400ms  3.30% 61.55%      400ms  3.30%  runtime.heapBitsForObject
     300ms  2.48% 64.03%      490ms  4.04%  runtime.greyobject
     280ms  2.31% 66.34%      280ms  2.31%  runtime.usleep
     240ms  1.98% 68.32%      240ms  1.98%  runtime.memmove
     210ms  1.73% 70.05%      210ms  1.73%  cmd/compile/internal/ssa.(*sparseSet).addAll R?
     200ms  1.65% 71.70%      230ms  1.90%  runtime.evacuate
     190ms  1.57% 73.27%      190ms  1.57%  runtime.duffcopy
     180ms  1.49% 74.75%      620ms  5.12%  runtime.mapassign1
     140ms  1.16% 75.91%      190ms  1.57%  cmd/compile/internal/ssa.(*Block).removePred
     120ms  0.99% 76.90%      200ms  1.65%  runtime.(*mheap).allocSpanLocked
     100ms  0.83% 77.72%      100ms  0.83%  cmd/compile/internal/ssa.(*Value).resetArgs
     100ms  0.83% 78.55%      780ms  6.44%  cmd/compile/internal/ssa.(*node32).insert P
      90ms  0.74% 79.29%      140ms  1.16%  cmd/compile/internal/ssa.(*node32).glb P
      90ms  0.74% 80.03%       90ms  0.74%  cmd/compile/internal/ssa.(*regAllocState).addUse R
      90ms  0.74% 80.78%       90ms  0.74%  runtime/internal/atomic.Or8
      80ms  0.66% 81.44%     1470ms 12.13%  cmd/compile/internal/gc.(*state).locatePotentialPhiFunctions P
      80ms  0.66% 82.10%       80ms  0.66%  cmd/compile/internal/ssa.(*Func).freeValue
      80ms  0.66% 82.76%     2720ms 22.44%  cmd/compile/internal/ssa.(*edgeState).setup R
      80ms  0.66% 83.42%      550ms  4.54%  cmd/compile/internal/ssa.deadcode
      80ms  0.66% 84.08%      100ms  0.83%  cmd/compile/internal/ssa.phielimValue
      80ms  0.66% 84.74%      260ms  2.15%  runtime.interhash
      70ms  0.58% 85.31%       70ms  0.58%  runtime.aeshash64
      60ms   0.5% 85.81%      110ms  0.91%  runtime.ifaceeq
      50ms  0.41% 86.22%      180ms  1.49%  cmd/compile/internal/ssa.(*Func).setHome
(pprof) top 10
7644243 of 7968348 total (95.93%)
Dropped 184 nodes (cum <= 39841)
Showing top 10 nodes out of 37 (cum >= 103353)
      flat  flat%   sum%        cum   cum%
   4655276 58.42% 58.42%    4655276 58.42%  cmd/compile/internal/ssa.(*edgeState).set R
   1261053 15.83% 74.25%    1261053 15.83%  cmd/compile/internal/ssa.(*Func).newValue
    573464  7.20% 81.44%    1083992 13.60%  cmd/compile/internal/gc.(*state).locatePotentialPhiFunctions P
    458766  5.76% 87.20%     494125  6.20%  cmd/compile/internal/ssa.(*SparseTreeMapper).Insert P
    327695  4.11% 91.31%     327695  4.11%  cmd/compile/internal/ssa.(*Value).AddArgs
    131077  1.64% 92.96%     131077  1.64%  cmd/compile/internal/gc.Nod
    103957  1.30% 94.26%    4799453 60.23%  cmd/compile/internal/ssa.(*regAllocState).regalloc R
    100187  1.26% 95.52%    1327832 16.66%  cmd/compile/internal/gc.(*state).lookupVarOutgoing P
     16384  0.21% 95.73%     510509  6.41%  cmd/compile/internal/gc.(*phimap).Insert P
     16384  0.21% 95.93%     103353  1.30%  cmd/compile/internal/gc.walkexpr
1151.23MB of 1239.99MB total (92.84%)
Dropped 165 nodes (cum <= 6.20MB)
Showing top 20 nodes out of 56 (cum >= 48.53MB)
      flat  flat%   sum%        cum   cum%
  156.30MB 12.60% 12.60%   498.03MB 40.16%  cmd/compile/internal/ssa.(*regAllocState).regalloc R
  154.71MB 12.48% 25.08%   154.71MB 12.48%  cmd/compile/internal/ssa.(*Func).newValue
  133.28MB 10.75% 35.83%   135.59MB 10.94%  cmd/compile/internal/ssa.(*regAllocState).computeLive R
  106.17MB  8.56% 44.39%   106.17MB  8.56%  cmd/compile/internal/ssa.(*Func).setHome
   93.66MB  7.55% 51.95%   257.41MB 20.76%  cmd/compile/internal/gc.(*state).lookupVarOutgoing P
   56.57MB  4.56% 56.51%    57.12MB  4.61%  cmd/compile/internal/ssa.(*stackAllocState).stackalloc R
   53.74MB  4.33% 60.84%    53.74MB  4.33%  cmd/compile/internal/ssa.(*stackAllocState).buildInterferenceGraph R
   51.98MB  4.19% 65.03%    52.50MB  4.23%  cmd/compile/internal/ssa.(*stackAllocState).computeLive R
   46.57MB  3.76% 68.79%   182.17MB 14.69%  cmd/compile/internal/ssa.(*regAllocState).init R
   45.83MB  3.70% 72.49%    45.83MB  3.70%  cmd/compile/internal/ssa.cse
   42.01MB  3.39% 75.87%    42.01MB  3.39%  cmd/compile/internal/ssa.schedule
   36.20MB  2.92% 78.79%   142.45MB 11.49%  cmd/compile/internal/ssa.(*stackAllocState).init R
      36MB  2.90% 81.70%       36MB  2.90%  cmd/compile/internal/ssa.(*edgeState).set R
   34.03MB  2.74% 84.44%    34.03MB  2.74%  cmd/compile/internal/ssa.(*node32).insert P
   27.16MB  2.19% 86.63%    27.16MB  2.19%  cmd/compile/internal/ssa.(*Func).newSparseSet R?
   18.04MB  1.45% 88.09%    70.21MB  5.66%  cmd/compile/internal/gc.(*state).locatePotentialPhiFunctions P
   15.13MB  1.22% 89.31%    15.13MB  1.22%  cmd/compile/internal/gc.(*state).addNamedValue
      15MB  1.21% 90.52%       15MB  1.21%  cmd/compile/internal/ssa.(*Value).AddArgs
   14.85MB  1.20% 91.71%    14.85MB  1.20%  cmd/compile/internal/ssa.tighten
      14MB  1.13% 92.84%    48.53MB  3.91%  cmd/compile/internal/ssa.(*SparseTreeMapper).Insert P

@gopherbot
Copy link

CL https://golang.org/cl/22790 mentions this issue.

gopherbot pushed a commit that referenced this issue May 5, 2016
If b has exactly one predecessor, as happens
frequently with static calls, we can make
lookupVarOutgoing generate less garbage.

Instead of generating a value that is just
going to be an OpCopy and then get eliminated,
loop. This can lead to lots of looping.
However, this loop is way cheaper than generating
lots of ssa.Values and then eliminating them.

For a subset of the code in #15537:

Before:

       28.31 real        36.17 user         1.68 sys
2282450944  maximum resident set size

After:

        9.63 real        11.66 user         0.51 sys
 638144512  maximum resident set size

Updates #15537.

Excitingly, it appears that this also helps
regular code:

name       old time/op      new time/op      delta
Template        288ms ± 6%       276ms ± 7%   -4.13%        (p=0.000 n=21+24)
Unicode         143ms ± 8%       141ms ±10%     ~           (p=0.287 n=24+25)
GoTypes         932ms ± 4%       874ms ± 4%   -6.20%        (p=0.000 n=23+22)
Compiler        4.89s ± 4%       4.58s ± 4%   -6.46%        (p=0.000 n=22+23)
MakeBash        40.2s ±13%       39.8s ± 9%     ~           (p=0.648 n=23+23)

name       old user-ns/op   new user-ns/op   delta
Template   388user-ms ±10%  373user-ms ± 5%   -3.80%        (p=0.000 n=24+25)
Unicode    203user-ms ± 6%  202user-ms ± 7%     ~           (p=0.492 n=22+24)
GoTypes    1.29user-s ± 4%  1.17user-s ± 4%   -9.67%        (p=0.000 n=25+23)
Compiler   6.86user-s ± 5%  6.28user-s ± 4%   -8.49%        (p=0.000 n=25+25)

name       old alloc/op     new alloc/op     delta
Template       51.5MB ± 0%      47.6MB ± 0%   -7.47%        (p=0.000 n=22+25)
Unicode        37.2MB ± 0%      37.1MB ± 0%   -0.21%        (p=0.000 n=25+25)
GoTypes         166MB ± 0%       138MB ± 0%  -16.83%        (p=0.000 n=25+25)
Compiler        756MB ± 0%       628MB ± 0%  -16.96%        (p=0.000 n=25+23)

name       old allocs/op    new allocs/op    delta
Template         450k ± 0%        445k ± 0%   -1.02%        (p=0.000 n=25+25)
Unicode          356k ± 0%        356k ± 0%     ~           (p=0.374 n=24+25)
GoTypes         1.31M ± 0%       1.25M ± 0%   -4.18%        (p=0.000 n=25+25)
Compiler        5.29M ± 0%       5.02M ± 0%   -5.15%        (p=0.000 n=25+23)

It also seems to help in other cases in which
phi insertion is a pain point (#14774, #14934).

Change-Id: Ibd05ed7b99d262117ece7bb250dfa8c3d1cc5dd2
Reviewed-on: https://go-review.googlesource.com/22790
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
@rsc
Copy link
Contributor

rsc commented May 17, 2016

What's the status of this issue? Still 38X?

@randall77
Copy link
Contributor

I now get

go1.4:
real    0m0.843s
user    0m0.626s
sys 0m0.104s

tip:
real    0m8.631s
user    0m11.185s
sys 0m0.586s

So ~10x

@randall77
Copy link
Contributor

Also, FWIW

go1.6:
real    0m1.785s
user    0m1.956s
sys 0m0.262s

So ~2x from 1.4 to 1.6, ~5x from 1.6 to tip.

@rsc
Copy link
Contributor

rsc commented May 18, 2016

OK, well it sounds like that's just where we're going to be for Go 1.7?
It's not like all compiles are 10X slower, just large init tables.
(See also #1860, #6289.)

@dr2chase
Copy link
Contributor

FYI, I just checked whether https://go-review.googlesource.com/c/23136/ is any help here, and it is:
Before (tip as of this morning, includes the sparse CL):

time go test
...
real    0m7.208s
user    0m9.288s
sys 0m0.277s

After:

time go test
...
real    0m4.488s
user    0m4.613s
sys 0m0.266s

That is queued up for 1.8.

@quentinmit quentinmit added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label May 26, 2016
@quentinmit
Copy link
Contributor

It sounds like current tip is 10x slower. @rsc @randall77 @dr2chase Is there anything we still want to do here for Go1.7, or should I move this to Go1.8?

@randall77
Copy link
Contributor

Nothing else planned for 1.7. Let's move this to Go1.8early.

@gopherbot
Copy link

CL https://golang.org/cl/30163 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 3, 2016
Should be more asymptotically happy.

We process each variable in turn to find all the
locations where it needs a phi (the dominance frontier
of all of its definitions).  Then we add all those phis.
This takes O(n * #variables), although hopefully much less.

Then we do a single tree walk to match all the
FwdRefs with the nearest definition or phi.
This takes O(n) time.

The one remaining inefficiency is that we might end up
introducing a bunch of dead phis in the first step.
A TODO is to introduce phis only where they might be
used by a read.

The old algorithm is still faster on small functions,
so there's a cutover size (currently 500 blocks).

This algorithm supercedes the David's sparse phi
placement algorithm for large functions.

Lowers compile time of example from #14934 from
~10 sec to ~4 sec.
Lowers compile time of example from #16361 from
~4.5 sec to ~3 sec.
Lowers #16407 from ~20 min to ~30 sec.

Update #14934
Update #16361
Fixes #16407

Change-Id: I1cff6364e1623c143190b6a924d7599e309db58f
Reviewed-on: https://go-review.googlesource.com/30163
Reviewed-by: David Chase <drchase@google.com>
@randall77
Copy link
Contributor

This is now fixed. Tip compiles Todd's slow.go in under half a second.
Tip is ~2x slower than 1.4 and ~3x faster than 1.6. I'm happy with that.

@mewmew
Copy link
Contributor Author

mewmew commented Oct 5, 2016

I can confirm that tip is ~2x slower than 1.4. (Edit: my initial tests were run on the wrong revision of Go; namely Go 1.6 rather than tip. On my laptop there is a 2x slowdown between Go 1.4 and Go 1.6, and an additional 3x slowdown between Go 1.6 and Go tip b9fd510).

Average from 10 consecutive runs follow the instructions of #14934 (comment):

For Go 1.4:
Average time 0.5824 sec.

For Go 1.6 (rev da6b9ec):
Average time 1.3794 sec

For Go tip (rev b9fd510):
Average time 4.4844 sec

Slowdown between Go 1.4 to Go 1.6: ~2.37
Slowdown between Go 1.6 and Go tip: ~3.25
Slowdown between Go 1.4 and Go tip: ~7.69

Edit: Thanks a lot for working to resolve this issue. It will help out tremendously when working with auto-generated code (such as parsers).

@mewmew
Copy link
Contributor Author

mewmew commented Oct 5, 2016

My initial tests in #14934 (comment) were run on the wrong revision of Go; namely Go 1.6 rather than tip.

The comparison has been updated above. @randall77, my results do not seem to mirror yours. Any idea as to why this may happen? Can I assist with some additional information?

At least on my laptop, Go 1.4 outperforms Go 1.6 by ~2x, while Go 1.6 outperforms Go tip (rev b9fd510) by another ~3x.

@randall77
Copy link
Contributor

The numbers I quoted above were for Todd's example, not yours.
I am seeing similar numbers for your example (I get tip 5.5x slower than 1.4).

I'm inclined to leave this issue closed. It's only a 4 second build. (Most of that time is regalloc.)

@golang golang locked and limited conversation to collaborators Oct 7, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. ToolSpeed
Projects
None yet
Development

No branches or pull requests