cmd/compile: SSA performance regression for slice averaging #14828
Comments
Leaving milestone decision to @randall77.
In particular, this sequence:
as opposed to simply
causes the slowdown.
SSA generates the same sequence for an int64-summing loop:
but the CPU seems to handle it much better, producing only a 2% slowdown for me.
This is the general problem of regalloc not looking ahead to the next register-constrained use of a value. We allocate that middle ADDSD to X1 even though we know it is needed later in X0 (for the loop phi). We have the same problem with shifts, which require their input in CX.
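To make the shift case concrete, here is a minimal sketch of Go code that hits the same kind of register constraint. The function name `shiftLeft` is hypothetical and does not come from the compiler or this issue. On amd64, the classic variable-count shift instructions take their count in CL, so the allocator ideally places `s` in CX ahead of time rather than moving it there just before the shift.

```go
package main

import "fmt"

// shiftLeft shifts x left by a count known only at run time.
// Hypothetical example: on amd64 the variable-count shift
// instructions read the count from CL, so the value of s is
// register-constrained at its use, much like the loop phi
// that needs X0 in the float64 case.
func shiftLeft(x uint64, s uint) uint64 {
	return x << s
}

func main() {
	fmt.Println(shiftLeft(1, 4)) // prints 16
}
```

Compiling with `go build -gcflags=-S` prints the generated assembly, which should show whether an extra move into CX is emitted; the exact output depends on the compiler version.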
CL https://golang.org/cl/22160 mentions this issue.
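What version of Go are you using (go version)?
go version devel +ac47f66 Tue Mar 15 07:13:04 2016 +0000 linux/amd64
What operating system and processor architecture are you using (go env)?
Ubuntu 14.04 on an Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz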
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/i/go"
GORACE=""
GOROOT="/home/i/gotip"
GOTOOLDIR="/home/i/gotip/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build380342597=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
Code is at http://play.golang.org/p/L8SlHM5_IJ
It contains benchmarks for two functions that average a []float64: one uses the index in the range loop, the other uses the value.
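The playground code is not reproduced here, so the following is only a sketch of what the two variants presumably look like. The function names `avgSliceValue` and `avgSliceIndex` are assumptions derived from the benchmark names in the results, not the original source.

```go
package main

import "fmt"

// avgSliceValue averages xs using the value yielded by range.
// Assumed shape of the "value" variant; the name is hypothetical.
// For an empty slice the result is NaN (0/0 in float64).
func avgSliceValue(xs []float64) float64 {
	sum := 0.0
	for _, v := range xs {
		sum += v
	}
	return sum / float64(len(xs))
}

// avgSliceIndex averages xs by indexing with the range index.
// Assumed shape of the "index" variant; the name is hypothetical.
func avgSliceIndex(xs []float64) float64 {
	sum := 0.0
	for i := range xs {
		sum += xs[i]
	}
	return sum / float64(len(xs))
}

func main() {
	xs := []float64{1, 2, 3, 4}
	fmt.Println(avgSliceValue(xs), avgSliceIndex(xs)) // prints 2.5 2.5
}
```

Both functions compute the same result; they differ only in how the loop body reads each element, which is the difference the benchmarks are measuring.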
I run it with
go test -bench=.
On tip with SSA enabled, I'm getting a performance regression in the value case.
go1.4.3:
BenchmarkAvgSliceValue 500 3815534 ns/op
BenchmarkAvgSliceIndex 300 5028982 ns/op
go1.5.3:
BenchmarkAvgSliceValue-4 500 3768664 ns/op
BenchmarkAvgSliceIndex-4 300 5030110 ns/op
go1.6:
BenchmarkAvgSliceValue-4 500 3769454 ns/op
BenchmarkAvgSliceIndex-4 300 5026162 ns/op
go-tip with ssa off and checks off:
~/gotip/bin/go test -bench=. -gcflags='-d=ssa/check/off -ssa=0'
BenchmarkAvgSliceValue-4 500 3771890 ns/op
BenchmarkAvgSliceIndex-4 300 5030560 ns/op
go-tip with ssa on and checks off:
~/gotip/bin/go test -bench=. -gcflags='-d=ssa/check/off -ssa=1'
BenchmarkAvgSliceValue-4 300 5028224 ns/op
BenchmarkAvgSliceIndex-4 300 5025677 ns/op
With SSA on, both functions perform about the same.
Compiling with SSA produces a 25% slowdown in throughput when the value is used (about 33% more time per op: roughly 3.77 ms without SSA versus 5.03 ms with it). Disabling consistency checks doesn't help, but I think that setting only affects compilation. It seems that some optimizations for the value range loop are not being applied in SSA.
The slowdown is so large for this function that I suspect I'm doing something wrong, or that there are other checks in the SSA-compiled code that I'm not disabling.