Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go build produces binaries that run with varying performance #66866

Closed
PerfectlyFluffy opened this issue Apr 17, 2024 · 4 comments
Closed

go build produces binaries that run with varying performance #66866

PerfectlyFluffy opened this issue Apr 17, 2024 · 4 comments

Comments

@PerfectlyFluffy
Copy link

PerfectlyFluffy commented Apr 17, 2024

Go version

go version go1.22.2 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/user/.cache/go-build'
GOENV='/home/user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/user/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/user/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.2'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/user/_dev/perfect28/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build215425291=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Excuse me for my not very good English. I'm programming a benchmark as my first open source contribution. Sources are available at https://github.com/PerfectlyFluffy/perfect28 in case you would want to get a closer look or even debug/test it. It have less than 400 lines of code, so it's quick to review.

What did you see happen?

Inconsistent performance. Since my program is a benchmark, it not only benchmark CPU's, it also benchmark go itself. This is why I noticed this sneaky problem. I have reproduced the problem on Arch Linux/go1.22.2, Debian/go1.19.8 (the problem was present in 1.19.8, but it was 50-60% less severe) and Fedora/go1.21.9. My CPU is a 5950X from AMD.

It's hard to explain, but I will try my best. Here is a screenshot to make it more visual:
image

On the preceding screenshot, we can get to the conclusion that my goal was to get rid of a loopclosure warning. Performance was good before I changed the code. I run this command each time I change my code to verify if performance is good or not : go build && time ./perfect28 --loop=1000000000000, here is the result when performance is good (before code changed):
image

I then modified lines 35 and 40 and ran the command again and I got this result:
image

Noticing a difference of 20 seconds in execution time, I uncommented line 15 to fix this degradation problem and ran the command again which gave me back good performance:
image

I conclude that any code change can have a big impact on performance, even change that are not expected to impact performance like calls to fmt.Print() or sync.WaitGroup.Add(x) for example. When you run go build, you can end with either a good binary or a bad one. For this reason I always do a performance test before I commit my code to make sure I have good performance.

This screenshot show that any change can have an impact on performance:
image

What I did was add line 18 => test => mark the time. Add line 19 => test => mark the time, and so on. I don't let the benchmark finish because I know that a binary is good when 5% is done under 3s.

What did you expect to see?

I expect performance of binaries not to degrade randomly based on how many line of code a program has.

I hope the information I provided will prove to be useful and that it will help find the root cause of the problem. Let me know if I can help in any ways!

I really enjoy coding Go apps! Thank you for your hard work! I appreciate it very much!

Cheers!
Dave

Edit: Typo

PerfectlyFluffy added a commit to PerfectlyFluffy/perfect28 that referenced this issue Apr 17, 2024
@Jorropo
Copy link
Member

Jorropo commented Apr 17, 2024

Please don't post screenshots, can't copy paste from them, have links to commits, codeblocks and other textual form, ...


We really need more repeated runs of each binary to be statistically analyzed, for all that I know adding or removing empty print lines has negligeable effect and you are seeing random variance which would also happen if you built a binary once and ran it 5 times.
We have the benchstat tool which compute that information from the go standard testing.B output.

@randall77
Copy link
Contributor

We've seen this before, performance varying a lot based on how code is laid out in memory. We do try some things to mitigate this, like aligning various pieces of code.
@drchase has been doing some more investigations about good/bad alignments. See CL 576835 and CL 577935.

TL;DR If this effect is what I think it is, there's not much the Go project can do. You'll have to get your CPU vendor to stop making chips that have random(ish) performance variation.

@seankhliao seankhliao closed this as not planned Won't fix, can't repro, duplicate, stale Apr 17, 2024
@PerfectlyFluffy
Copy link
Author

PerfectlyFluffy commented Apr 18, 2024

@Jorropo Thank you for pointing out the benchstat tool! I'll give it a try!

@randall77 Thank you for pointing out that it could be CPU related. I've run some tests on 4 different PCs after I've read your post.

  1. My office PC, an AMD 5950x
  2. My current gaming PC, an AMD 5700x
  3. My previous gaming PC, an Intel i5-4690K
  4. My Raspberry Pi 4B
  • 5950x and 5700x both have the exact same behavior. When performance good, it's good on both, but when it's bad, it's also bad on both (no matter if it was compiled by AMD, Intel or RPI)
  • i5-4690K is very consistent! I can't say if performance is optimal, but it's always the same!
  • Raspberry Pi 4B seem to have a small variance but it's not even close to be as bad as my 2 AMD CPU. I would say that the variance seems to be within margin of error but it's not as good as my intel CPU.

I'm having a hard time convincing myself that it's AMD's fault since I was making the benchmark in C before switching to Go and I have never had this problem with the C implementation. Also, simply adding a line fmt.Print() in the code can worsen or improve performance by as much as 32% which is a very big performance degradation.

I'll continue investigate this problem for a bit and I will open a new issue or even a PR but only if I come to find something tangible.

By the way, goroutines are awesome! ❤️ Thank you!

Cheers!
Dave

PerfectlyFluffy added a commit to PerfectlyFluffy/perfect28 that referenced this issue Apr 19, 2024
PerfectlyFluffy added a commit to PerfectlyFluffy/perfect28 that referenced this issue Apr 19, 2024
@PerfectlyFluffy
Copy link
Author

PerfectlyFluffy commented Apr 19, 2024

I don't mean to be annoying. I make this last post for the sake of completeness.

Output of go test -bench=BenchmarkMain -count=8 > old.txt:

goos: linux
goarch: amd64
pkg: github.com/PerfectlyFluffy/perfect28
cpu: AMD Ryzen 9 5950X 16-Core Processor            
BenchmarkMain-32    	       1	62698387947 ns/op
BenchmarkMain-32    	       1	63401121520 ns/op
BenchmarkMain-32    	       1	63459067340 ns/op
BenchmarkMain-32    	       1	63851727405 ns/op
BenchmarkMain-32    	       1	63563447685 ns/op
BenchmarkMain-32    	       1	63841900151 ns/op
BenchmarkMain-32    	       1	64007010843 ns/op
BenchmarkMain-32    	       1	64087389162 ns/op
PASS
ok  	github.com/PerfectlyFluffy/perfect28	508.915s

Output of go test -bench=BenchmarkMain -count=8 > new.txt (only difference in the code versus old is a fmt.Print() at the root of the main function):

goos: linux
goarch: amd64
pkg: github.com/PerfectlyFluffy/perfect28
cpu: AMD Ryzen 9 5950X 16-Core Processor            
BenchmarkMain-32    	       1	81852111106 ns/op
BenchmarkMain-32    	       1	81963111956 ns/op
BenchmarkMain-32    	       1	82237281288 ns/op
BenchmarkMain-32    	       1	82279110930 ns/op
BenchmarkMain-32    	       1	82491095339 ns/op
BenchmarkMain-32    	       1	82303076903 ns/op
BenchmarkMain-32    	       1	82481060072 ns/op
BenchmarkMain-32    	       1	82489785819 ns/op
PASS
ok  	github.com/PerfectlyFluffy/perfect28	658.101s

Output of benchstat old.txt new.txt:

goos: linux
goarch: amd64
pkg: github.com/PerfectlyFluffy/perfect28
cpu: AMD Ryzen 9 5950X 16-Core Processor            
        │  old.txt   │              new.txt              │
        │   sec/op   │   sec/op    vs base               │
Main-32   63.70 ± 2%   82.29 ± 1%  +29.18% (p=0.000 n=8)

Here a screenshot of the command I used to run the benchmark and the output it yield:
image

Explanation of the command:

  1. Preheated the CPU to mitigate cold start advantage.
  2. Used sed to comment the fmt.Print() in the main to set the benchmark in "good performance mode".
  3. Used git diff to show that sed had done the job properly and also to show that the folder is not dirty.
  4. Ran 8 loop of BenchmarkMain and output the result into old.txt
  5. Used sed again to uncomment the line fmt.Print() to set the benchmark in "bad performance mode".
  6. Used git diff again for the same reason that point 3.
  7. Ran 8 loop of BenchmarkMain and output the result into new.txt

Sadly, I could not include the execution of benchstat old.txt new.txt into the command because I first had to remove manually the output of my program.

I noticed that any code I put in my BenchmarkMain function never affects performance. I've try in many ways, but performance always stayed the same. Only code use in the Main function seem to have an effect on performance.

I provided the benchmark function and instructions to reproduce in this commit: PerfectlyFluffy/perfect28@7e84474

I think this is the best I can do to help with this issue. At this point it appears to be obvious that I did all of this for nothing, but I had a lot of fun.

Cheers,
Dave

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants