-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: Performance degradation from 1.7.5 to 1.8 #19819
Comments
Thanks for the report. go1.8 at this point is done. It would be important to know if the performance regression persist in the current master branch (soon-to-be go1.9). Can you re-run your benchmarks on tip? Comparing go1.7.5 vs go1.8 vs tip would help understand if there's something that needs to be fixed for go1.9. |
As per your request, I tested with the last build: go version devel +a1cedf0 Sun Apr 2 03:25:02 2017 +0000 darwin/amd64 Unfortunately, the performance regression is still there (and even got slightly worse). I'm attaching the CPU profile and assembly files. Thanks, |
The code around the Sincos calls is definitely shifted around, but it isn't doing anything weird. For instance, it is now doing (lat1+lat2)/2 after the Remainder call instead of before it. There are a few extra loads. The 1.7 code did (lat1+lat2)/2 and (lat1-lat2)/2 together, so the loads of lat1 and lat2 were shared. The 1.8 code computes those two things on different sides of a function call, so it must reload lat1 and lat2. Not optimal, but I'm not sure why that would cause the performance difference you're seeing. Sincos is implemented in assembly on amd64 which is why you didn't see anything different just calling Sincos in a loop. (The assembly hasn't changed since at least 1.7.) The fact that Sincos in the profiles goes from 410 sec to 473 sec is strange, given that the Sincos code itself didn't change. 1.8 is doing the (lat1+lat2)/2 divisions later, maybe causing the code that uses the result in Sincos to stall? One thing I definitely noticed, apart from the bug in this CL, is that we're doing x/2 using an actual division instead of doing x * .5. I'll open a separate bug for that. p.s. just saw we have a math.satan. Awesome. |
CL https://golang.org/cl/41200 mentions this issue. |
We have dedicated asm implementation of sincos only on 386 and amd64, on everything else we are just jumping to generic version. However amd64 version is actually slower than generic one: Sincos-6 34.4ns ± 0% 24.8ns ± 0% -27.79% (p=0.000 n=8+10) So remove all sincos*.s and keep only generic and 386. Updates #19819 Change-Id: I7eefab35743729578264f52f6d23ee2c227c92a5 Reviewed-on: https://go-review.googlesource.com/41200 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
After 41200 tip is now faster than 1.7 or 1.8: I'm still not sure why sincos was slower. |
Can confirm on OS X. 1.7.5 16m48s (1 core) 8m30s (2 cores) go version devel +9459c03 Tue Apr 25 13:15:39 2017 +0000 darwin/amd64 Well done, Go Team! |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?go version go1.8 darwin/amd64
What operating system and processor architecture are you using (
go env
)?OS X 10.11.6
What did you do?
A small but stubborn degradation in performance of my program. Tested on Mac and 64-bit Windows.
Single and multiple core runs of the program.
See the table here: https://docs.google.com/spreadsheets/d/1KTDkpzPWSLbDDSzPkL4elZbYrh3F0lJ9LUdIxYT_Vsw/edit?usp=sharing
What did you expect to see?
Same or (hopefully) better performance.
What did you see instead?
A drop in performance. See attached CPU profile files. It seems that the calls to math.Sincos() went up from 410 sec to 473 sec. However was unable to see any performance difference in a simple program that just calls math.Sincos() in a loop. The generated assembly code around the calls to math.Sincos() seems to be somewhat longer in Go 1.8 compare to 1.7.5 (see attached files). The repository of the program at: https://github.com/reconditematter/p5 To reproduce, run ./p5 -ncpu=1 < zztracts.txt
asm175.txt
asm180.txt
prof175.txt
prof180.txt
Thank you,
-Lenny-
The text was updated successfully, but these errors were encountered: