Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: Performance degradation from 1.7.5 to 1.8 #19819

Closed
reconditematter opened this issue Apr 2, 2017 · 6 comments
Closed

cmd/compile: Performance degradation from 1.7.5 to 1.8 #19819

reconditematter opened this issue Apr 2, 2017 · 6 comments

Comments

@reconditematter
Copy link

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8 darwin/amd64

What operating system and processor architecture are you using (go env)?

OS X 10.11.6

What did you do?

A small but stubborn degradation in performance of my program. Tested on Mac and 64-bit Windows.
Single and multiple core runs of the program.
See the table here: https://docs.google.com/spreadsheets/d/1KTDkpzPWSLbDDSzPkL4elZbYrh3F0lJ9LUdIxYT_Vsw/edit?usp=sharing

What did you expect to see?

Same or (hopefully) better performance.

What did you see instead?

A drop in performance. See attached CPU profile files. It seems that the calls to math.Sincos() went up from 410 sec to 473 sec. However was unable to see any performance difference in a simple program that just calls math.Sincos() in a loop. The generated assembly code around the calls to math.Sincos() seems to be somewhat longer in Go 1.8 compare to 1.7.5 (see attached files). The repository of the program at: https://github.com/reconditematter/p5 To reproduce, run ./p5 -ncpu=1 < zztracts.txt

asm175.txt
asm180.txt

prof175.txt
prof180.txt

Thank you,
-Lenny-

@ALTree
Copy link
Member

ALTree commented Apr 2, 2017

Thanks for the report.

go1.8 at this point is done. It would be important to know if the performance regression persist in the current master branch (soon-to-be go1.9).

Can you re-run your benchmarks on tip? Comparing go1.7.5 vs go1.8 vs tip would help understand if there's something that needs to be fixed for go1.9.

@bradfitz bradfitz added this to the Go1.9Maybe milestone Apr 2, 2017
@bradfitz bradfitz added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 2, 2017
@bradfitz bradfitz changed the title Performance degradation from 1.7.5 to 1.8 cmd/compile: Performance degradation from 1.7.5 to 1.8 Apr 2, 2017
@reconditematter
Copy link
Author

As per your request, I tested with the last build:

go version devel +a1cedf0 Sun Apr 2 03:25:02 2017 +0000 darwin/amd64

Unfortunately, the performance regression is still there (and even got slightly worse). I'm attaching the CPU profile and assembly files.

Thanks,
-Lenny-

profdev.txt
asmdev.txt

@bradfitz bradfitz removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 2, 2017
@randall77
Copy link
Contributor

The code around the Sincos calls is definitely shifted around, but it isn't doing anything weird. For instance, it is now doing (lat1+lat2)/2 after the Remainder call instead of before it.

There are a few extra loads. The 1.7 code did (lat1+lat2)/2 and (lat1-lat2)/2 together, so the loads of lat1 and lat2 were shared. The 1.8 code computes those two things on different sides of a function call, so it must reload lat1 and lat2. Not optimal, but I'm not sure why that would cause the performance difference you're seeing.

Sincos is implemented in assembly on amd64 which is why you didn't see anything different just calling Sincos in a loop. (The assembly hasn't changed since at least 1.7.)

The fact that Sincos in the profiles goes from 410 sec to 473 sec is strange, given that the Sincos code itself didn't change. 1.8 is doing the (lat1+lat2)/2 divisions later, maybe causing the code that uses the result in Sincos to stall?

One thing I definitely noticed, apart from the bug in this CL, is that we're doing x/2 using an actual division instead of doing x * .5. I'll open a separate bug for that.

p.s. just saw we have a math.satan. Awesome.

@gopherbot
Copy link

CL https://golang.org/cl/41200 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 24, 2017
We have dedicated asm implementation of sincos only on 386 and amd64,
on everything else we are just jumping to generic version.
However amd64 version is actually slower than generic one:

Sincos-6               34.4ns ± 0%   24.8ns ± 0%  -27.79%  (p=0.000 n=8+10)

So remove all sincos*.s and keep only generic and 386.

Updates #19819

Change-Id: I7eefab35743729578264f52f6d23ee2c227c92a5
Reviewed-on: https://go-review.googlesource.com/41200
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@TocarIP
Copy link
Contributor

TocarIP commented Apr 24, 2017

After 41200 tip is now faster than 1.7 or 1.8:
1.7: T=27m26.442710939s
1.8: T=28m2.964990394s
tip: T=24m19.262662636s

I'm still not sure why sincos was slower.

@reconditematter
Copy link
Author

Can confirm on OS X.

1.7.5 16m48s (1 core) 8m30s (2 cores)
1.8 17m34s (1 core) 8m52s (2 cores)

go version devel +9459c03 Tue Apr 25 13:15:39 2017 +0000 darwin/amd64
14m49s (1 core) 7m33s (2 cores)

Well done, Go Team!

@golang golang locked and limited conversation to collaborators Apr 25, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants