Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: SSA recalculating loop termination condition every time #16457

Closed
s-l-teichmann opened this issue Jul 21, 2016 · 7 comments
Closed

Comments

@s-l-teichmann
Copy link

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
go version devel +ff227b8 Thu Jul 21 01:04:22 2016 +0000 linux/amd64
  1. What operating system and processor architecture are you using (go env)?

Arch Linux / Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/sascha/devel/go/own"
GORACE=""
GOROOT="/home/sascha/devel/go.git"
GOTOOLDIR="/home/sascha/devel/go.git/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build849488284=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    A complete runnable program is good.
    A link on play.golang.org is best.

I run a prime number generator
https://play.golang.org/p/7asXrSGepM
(which dumps out as much primes as possible)
for 10 seconds.

Current git:

$ go version
go version devel +ff227b8 Thu Jul 21 01:04:22 2016 +0000 linux/amd64
$ go build concurrentprime.go
$ timeout 10 sh -c './concurrentprime > 1.7'
$ tail -1 1.7
21769961

Go 1.6.3:

$ go version
go version go1.6.3 linux/amd64
$ go build concurrentprime.go
$ timeout 10 sh -c './concurrentprime > 1.6.3'
$ tail -1 1.6.3
52899997
  1. What did you expect to see?

The numbers should be in the same order/range.

  1. What did you see instead?

The 1.6 number is twice as high as the 1.7 one.

If I remove the math.Sqrt function before the inner loop in
favour to the out commented line above the numbers are nearly
equal.

@thanm
Copy link
Contributor

thanm commented Jul 21, 2016

Looks as though the 1.7 compiler has decided to leave the sqrt() call inside the inner loop for some reason (in spite of it being loop-invariant). Assembly for 1.7 for the inner loop looks like

    0x0115 00277 (testcase.go:52)   MOVLQZX SI, R9
    0x0118 00280 (testcase.go:52)   CVTSQ2SD    R9, X0
    0x011d 00285 (testcase.go:52)   SQRTSD  X0, X0
    0x0121 00289 (testcase.go:52)   CVTTSD2SQ   X0, R9
    0x0126 00294 (testcase.go:53)   CMPL    R8, R9
    0x0129 00297 (testcase.go:53)   JHI $0, 341
    0x012b 00299 (testcase.go:54)   TESTL   R8, R8
    0x012e 00302 (testcase.go:54)   JEQ $0, 334
    0x0130 00304 (testcase.go:54)   MOVL    SI, AX
    0x0132 00306 (testcase.go:54)   XORQ    DX, DX
    0x0135 00309 (testcase.go:54)   DIVL    R8
    0x0138 00312 (testcase.go:54)   TESTL   DX, DX
    0x013a 00314 (testcase.go:54)   JEQ $0, 327
    0x013c 00316 (testcase.go:53)   ADDL    $2, R8
    0x0140 00320 (testcase.go:60)   MOVQ    "".autotmp_48+88(SP), DX
    0x0145 00325 (testcase.go:52)   JMP 277

For 1.6 the inner loop looks like:

    0x01e2 00482 (testcase.go:54)   MOVL    SI, AX
    0x01e4 00484 (testcase.go:54)   MOVL    $0, DX
    0x01e6 00486 (testcase.go:54)   DIVL    CX
    0x01e8 00488 (testcase.go:54)   CMPL    DX, $0
    0x01eb 00491 (testcase.go:54)   JEQ 270
    0x01f1 00497 (testcase.go:53)   ADDL    $2, CX
    0x01f4 00500 (testcase.go:53)   NOP
    0x01f4 00500 (testcase.go:53)   CMPL    CX, R8
    0x01f7 00503 (testcase.go:53)   JLS $0, 482

Also appears that the 1.7 compiler is not able to eliminate divide-by-zero check if I am reading that correctly, but I don't think that's the real performance issue.

@josharian josharian changed the title Performance regression Go 1.7 using math.Sqrt cmd/compile: SSA recalculating loop termination condition every time Jul 21, 2016
@bradfitz bradfitz added this to the Go1.8 milestone Jul 21, 2016
@josharian
Copy link
Contributor

This is tighten's fault; it is moving the sqrt calculation closer to use.

Simpler code:

package main

import "math"

var x uint32

func main() {
    end := uint32(math.Sqrt(float64(x)))
    for j := uint32(3); j <= end; j += 2 {
    }
}

With -ssa=0:

"".main t=1 size=41 args=0x0 locals=0x0
    0x0000 00000 (sqrt.go:11)   TEXT    "".main(SB), $0-0
    0x0000 00000 (sqrt.go:12)   MOVLQZX "".x(SB), BX
    0x0006 00006 (sqrt.go:12)   CVTSQ2SD    BX, X0
    0x000b 00011 (sqrt.go:12)   SQRTSD  X0, X0
    0x000f 00015 (sqrt.go:12)   CVTTSD2SQ   X0, BX
    0x0014 00020 (sqrt.go:12)   MOVQL   BX, BX
    0x0016 00022 (sqrt.go:12)   MOVL    BX, CX
    0x0018 00024 (sqrt.go:13)   MOVL    $3, AX
    0x001d 00029 (sqrt.go:13)   CMPL    AX, CX
    0x001f 00031 (sqrt.go:13)   JHI $0, 40
    0x0021 00033 (sqrt.go:13)   ADDL    $2, AX
    0x0024 00036 (sqrt.go:13)   CMPL    AX, CX
    0x0026 00038 (sqrt.go:13)   JLS $0, 33
    0x0028 00040 (sqrt.go:18)   RET

With regular SSA:

"".main t=1 size=35 args=0x0 locals=0x0
    0x0000 00000 (sqrt.go:11)   TEXT    "".main(SB), $0-0
    0x0000 00000 (sqrt.go:12)   MOVL    "".x(SB), AX
    0x0006 00006 (sqrt.go:13)   MOVL    $3, CX
    0x000b 00011 (sqrt.go:12)   CVTSQ2SD    AX, X0
    0x0010 00016 (sqrt.go:12)   SQRTSD  X0, X0
    0x0014 00020 (sqrt.go:12)   CVTTSD2SQ   X0, DX
    0x0019 00025 (sqrt.go:13)   CMPL    CX, DX
    0x001b 00027 (sqrt.go:13)   JHI $0, 34
    0x001d 00029 (sqrt.go:13)   ADDL    $2, CX
    0x0020 00032 (sqrt.go:12)   JMP 11
    0x0022 00034 (sqrt.go:18)   RET

With SSA with the tighten pass disabled:

"".main t=1 size=37 args=0x0 locals=0x0
    0x0000 00000 (sqrt.go:11)   TEXT    "".main(SB), $0-0
    0x0000 00000 (sqrt.go:12)   MOVL    "".x(SB), AX
    0x0006 00006 (sqrt.go:12)   CVTSQ2SD    AX, X0
    0x000b 00011 (sqrt.go:12)   SQRTSD  X0, X0
    0x000f 00015 (sqrt.go:12)   CVTTSD2SQ   X0, AX
    0x0014 00020 (sqrt.go:13)   MOVL    $3, CX
    0x0019 00025 (sqrt.go:13)   CMPL    CX, AX
    0x001b 00027 (sqrt.go:13)   JHI $0, 36
    0x001d 00029 (sqrt.go:13)   ADDL    $2, CX
    0x0020 00032 (sqrt.go:13)   CMPL    CX, AX
    0x0022 00034 (sqrt.go:13)   JLS $0, 29
    0x0024 00036 (sqrt.go:18)   RET

Observe that with tighten disabled, we do have to generate an extra CMPL, so in some sense, tighten is doing its job...it just has no idea that the value calculation it moved is so expensive that it's worth keeping out of the loop. Not sure what the right approach is here.

@dr2chase
Copy link
Contributor

We have a (reducible) "loop detector" hanging around that we might be able to use for this; tag some operations as "expensive" and don't tighten them any closer than just before the entry to the next loop. We also have a strongly-connected-components detector in a pending 1.8 CL that will find all (outermost) loops, including the irreducible ones.

@josharian
Copy link
Contributor

There's also a simple, minimal fix for this particular problem: CL 25111.

@gopherbot
Copy link

CL https://golang.org/cl/25111 mentions this issue.

@randall77
Copy link
Contributor

I believe this is now fixed since https://go-review.googlesource.com/c/28712/

@s-l-teichmann Can you confirm?

@s-l-teichmann
Copy link
Author

Looks fine now. I'v re-run the orignal test with todays go version devel +db82cf4 Wed Sep 28 05:21:53 2016 +0000 linux/amd64 and get numbers slightly better (52924999) than with 1.6. so the 1.7 regression seems to be gone. Thank you. :-)

@golang golang locked and limited conversation to collaborators Sep 28, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants