Benchmark results vastly different when Tests are skipped with -run=XXX #42696

Closed · antoineco opened this issue Nov 18, 2020 · 8 comments
Labels: FrozenDueToAge, NeedsInvestigation

antoineco commented Nov 18, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.5 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/acotten/.cache/go-build"
GOENV="/home/acotten/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/acotten/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/acotten/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/tmp/19fda1e474bcc545cffe48ac07f9889e/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build782682920=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran a benchmark that is part of the following program's tests: https://gist.github.com/antoineco/19fda1e474bcc545cffe48ac07f9889e, once without the -run flag and once with -run=XXX to skip the tests.

What did you expect to see?

Similar results in both cases.

What did you see instead?

Vastly different results:

$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         391664              3062 ns/op
BenchmarkGenerate         380072              3045 ns/op
BenchmarkGenerate         388015              3081 ns/op
BenchmarkGenerate         392066              3041 ns/op
BenchmarkGenerate         381025              3093 ns/op
BenchmarkGenerate         397995              3041 ns/op
BenchmarkGenerate         385315              3089 ns/op
BenchmarkGenerate         398268              3041 ns/op
BenchmarkGenerate         389184              3092 ns/op
BenchmarkGenerate         395845              3056 ns/op
BenchmarkGenerate         382033              3090 ns/op
BenchmarkGenerate         394696              3028 ns/op
BenchmarkGenerate         391120              3089 ns/op
BenchmarkGenerate         379633              3033 ns/op
BenchmarkGenerate         337450              3103 ns/op
BenchmarkGenerate         392859              3038 ns/op
BenchmarkGenerate         383972              3086 ns/op
BenchmarkGenerate         392360              3041 ns/op
BenchmarkGenerate         382046              3098 ns/op
BenchmarkGenerate         381825              3042 ns/op
PASS
ok      cegen   25.296s
$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         166654              6381 ns/op
BenchmarkGenerate         180555              6240 ns/op
BenchmarkGenerate         181734              6258 ns/op
BenchmarkGenerate         183918              6260 ns/op
BenchmarkGenerate         181689              6275 ns/op
BenchmarkGenerate         181914              6310 ns/op
BenchmarkGenerate         181581              6284 ns/op
BenchmarkGenerate         182289              6284 ns/op
BenchmarkGenerate         181348              6267 ns/op
BenchmarkGenerate         180328              6275 ns/op
BenchmarkGenerate         179772              6259 ns/op
BenchmarkGenerate         183290              6221 ns/op
BenchmarkGenerate         179972              6282 ns/op
BenchmarkGenerate         182958              6257 ns/op
BenchmarkGenerate         182750              6302 ns/op
BenchmarkGenerate         181096              6261 ns/op
BenchmarkGenerate         180844              6294 ns/op
BenchmarkGenerate         181142              6287 ns/op
BenchmarkGenerate         180400              6275 ns/op
BenchmarkGenerate         180426              6266 ns/op
PASS
ok      cegen   24.072s

martisch commented Nov 18, 2020

Can you run both with -count=20 -cpu=1 and see if there is a high inter-run variance? Also make sure to run on a quiet computer, ideally with other applications and browsers closed.

If this benchmark does any disk or network I/O, that might also introduce latencies, and one run might have received a cached result while the other didn't.

martisch added the WaitingForInfo label Nov 18, 2020

antoineco commented Nov 18, 2020

@martisch here's a run with those flags:

$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         166654              6381 ns/op
BenchmarkGenerate         180555              6240 ns/op
BenchmarkGenerate         181734              6258 ns/op
BenchmarkGenerate         183918              6260 ns/op
BenchmarkGenerate         181689              6275 ns/op
BenchmarkGenerate         181914              6310 ns/op
BenchmarkGenerate         181581              6284 ns/op
BenchmarkGenerate         182289              6284 ns/op
BenchmarkGenerate         181348              6267 ns/op
BenchmarkGenerate         180328              6275 ns/op
BenchmarkGenerate         179772              6259 ns/op
BenchmarkGenerate         183290              6221 ns/op
BenchmarkGenerate         179972              6282 ns/op
BenchmarkGenerate         182958              6257 ns/op
BenchmarkGenerate         182750              6302 ns/op
BenchmarkGenerate         181096              6261 ns/op
BenchmarkGenerate         180844              6294 ns/op
BenchmarkGenerate         181142              6287 ns/op
BenchmarkGenerate         180400              6275 ns/op
BenchmarkGenerate         180426              6266 ns/op
PASS
ok      cegen   24.072s
$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         391664              3062 ns/op
BenchmarkGenerate         380072              3045 ns/op
BenchmarkGenerate         388015              3081 ns/op
BenchmarkGenerate         392066              3041 ns/op
BenchmarkGenerate         381025              3093 ns/op
BenchmarkGenerate         397995              3041 ns/op
BenchmarkGenerate         385315              3089 ns/op
BenchmarkGenerate         398268              3041 ns/op
BenchmarkGenerate         389184              3092 ns/op
BenchmarkGenerate         395845              3056 ns/op
BenchmarkGenerate         382033              3090 ns/op
BenchmarkGenerate         394696              3028 ns/op
BenchmarkGenerate         391120              3089 ns/op
BenchmarkGenerate         379633              3033 ns/op
BenchmarkGenerate         337450              3103 ns/op
BenchmarkGenerate         392859              3038 ns/op
BenchmarkGenerate         383972              3086 ns/op
BenchmarkGenerate         392360              3041 ns/op
BenchmarkGenerate         382046              3098 ns/op
BenchmarkGenerate         381825              3042 ns/op
PASS
ok      cegen   25.296s

My laptop isn't busy right now, I closed other applications prior to running the benchmarks, and the application isn't interacting with the disk or network (code posted in the description).

I'm curious: what do you get when you execute those commands?

(edit: updating issue description with those results)

martisch commented Nov 18, 2020

Thanks. My only hunch now is that the test might be setting or caching something that helps the benchmarks. That is quite a bit of code and dependencies, which I have not had time to understand, and I am currently reluctant to just run it. If somebody can reduce it to a more minimal reproducer, that could help find out why there is an effect.

martisch added the NeedsInvestigation label and removed the WaitingForInfo label Nov 18, 2020

antoineco commented Nov 18, 2020

Sure, let me isolate the function in the Benchmark.

Hm, actually no, I shouldn't do that. If I remove all the Test* funcs inside main_test.go without changing anything else, the benchmark results become consistent.

$ go test ./ -bench=BenchmarkGenerate -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         390487              3062 ns/op
PASS
ok      cegen   1.232s
$ go test ./ -bench=BenchmarkGenerate -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         393402              3028 ns/op
PASS
ok      cegen   2.215s


martisch commented Nov 18, 2020

Maybe we can remove the tests one by one to see if any test in particular turns out to improve the benchmark results?

Maybe then we can see if that test has side effects like registering a callback or changing some global state without resetting it back to defaults. For example, I can see there is a sync.Pool with global state involved. It might just be that the tests cause the sync.Pool to be filled, resulting in fewer allocations in the later benchmark. You can add b.ReportAllocs() before the for loop in the benchmark to see the allocations, as in the sketch below.
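
A minimal sketch of where that call would go in a benchmark of this shape (this is not the gist's exact code; newGenerator is a placeholder for whatever setup the real benchmark does, and Generate is assumed to return ([]byte, error) as in the gist):

func BenchmarkGenerate(b *testing.B) {
	g := newGenerator() // placeholder setup

	b.ReportAllocs() // adds B/op and allocs/op columns next to ns/op
	b.ResetTimer()   // excludes the setup above from the measurement

	for i := 0; i < b.N; i++ {
		if _, err := g.Generate(); err != nil {
			b.Fatal(err)
		}
	}
}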


antoineco commented Nov 18, 2020

Oh, right, global vars are shared between all tests. I didn't even consider that angle. Thanks for bringing it up 🤦

TestRun indeed causes the sync.Pool to be initialized with much smaller buffers (2 bytes vs. 2048), making the pool useless when the tests are not skipped before the benchmark:

https://gist.github.com/antoineco/19fda1e474bcc545cffe48ac07f9889e#file-main_test-go-L63-L65

$ go test ./ -bench=BenchmarkGenerate -count=5 -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         164232              6758 ns/op           13945 B/op         22 allocs/op
BenchmarkGenerate         177374              6586 ns/op           13945 B/op         22 allocs/op
BenchmarkGenerate         180433              6594 ns/op           13945 B/op         22 allocs/op
BenchmarkGenerate         175017              6696 ns/op           13945 B/op         22 allocs/op
BenchmarkGenerate         174841              6692 ns/op           13945 B/op         22 allocs/op
PASS
ok      cegen   7.175s
$ go test ./ -bench=BenchmarkGenerate -count=5 -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         375070              3244 ns/op             192 B/op          8 allocs/op
BenchmarkGenerate         387146              3217 ns/op             192 B/op          8 allocs/op
BenchmarkGenerate         381927              3178 ns/op             192 B/op          8 allocs/op
BenchmarkGenerate         389636              3191 ns/op             192 B/op          8 allocs/op
BenchmarkGenerate         336218              3163 ns/op             192 B/op          8 allocs/op
PASS
ok      cegen   6.157s

Thanks a lot for investigating. 🙌 Mystery solved!


(edit)

In case someone reads this and the cause of the issue isn't obvious 👇

diff --git a/perf/tools/cegen/main.go b/perf/tools/cegen/main.go
index cf585da..1662866 100644
--- a/perf/tools/cegen/main.go
+++ b/perf/tools/cegen/main.go
@@ -139,6 +139,9 @@ type CloudEventTargetsGenerator struct {
        typeAttr   string
        sourceAttr string
        data       []byte
+
+       // Once used to initialize the buffer pool on the first call to Generate.
+       bufOnce sync.Once
 }

 // NewCloudEventTargetsGenerator returns a generator that yields vegeta JSON
@@ -157,9 +160,6 @@ func NewCloudEventTargetsGenerator(url, typeAttr, sourceAttr string, data []byte
 // Buffer pool for jwriter.Writer's underlying Buffer and output.
 var writerBufPool *sync.Pool

-// Once used to initialize the buffer pool on the first call to Generate.
-var bufOnce sync.Once
-
 // Generate returns a target serialized as JSON.
 func (g *CloudEventTargetsGenerator) Generate() ([]byte, error) {
        var t jsonTarget
@@ -181,7 +181,7 @@ func (g *CloudEventTargetsGenerator) Generate() ([]byte, error) {
        t.Body = g.data

        // encode inside the Once fn to determine the size of buffers in sync pools
-       bufOnce.Do(func() {
+       g.bufOnce.Do(func() {
                var jw jwriter.Writer
                t.encode(&jw)
                dataBytes, _ := jw.BuildBytes()
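
For readers who want a self-contained illustration of the effect, here is a stripped-down sketch (not the gist's actual code; all names are made up). The pattern boils down to a package-level sync.Once sizing a package-level sync.Pool based on whichever caller happens to run first:

package cegen // a hypothetical, stripped-down main_test.go

import (
	"sync"
	"testing"
)

var (
	bufPool *sync.Pool
	bufOnce sync.Once
)

// generate copies data through a pooled buffer. The pooled buffers' capacity
// is fixed by whichever call reaches the sync.Once first, because bufOnce and
// bufPool are package-level state shared by tests and benchmarks alike.
func generate(data []byte) []byte {
	bufOnce.Do(func() {
		size := len(data) // the first caller decides the capacity for everyone
		bufPool = &sync.Pool{New: func() interface{} { return make([]byte, 0, size) }}
	})
	buf := bufPool.Get().([]byte)
	work := append(buf, data...) // grows into a fresh allocation on every call if the pooled capacity is too small
	out := append([]byte(nil), work...)
	bufPool.Put(buf[:0]) // the original (possibly tiny) buffer goes back, not the grown copy
	return out
}

// TestGenerate seeds the pool with 2-byte buffers when it runs before the benchmark.
func TestGenerate(t *testing.T) {
	if got := generate([]byte("ok")); string(got) != "ok" {
		t.Fatalf("generate() = %q, want %q", got, "ok")
	}
}

// BenchmarkGenerate works on 2048-byte payloads. Without -run=XXX, TestGenerate
// has already sized the pool at 2 bytes, so every iteration outgrows it and
// allocates; with -run=XXX, the benchmark itself sizes the pool at 2048 bytes.
func BenchmarkGenerate(b *testing.B) {
	data := make([]byte, 2048)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		generate(data)
	}
}

Moving bufOnce into the generator struct, as in the diff above, ties the pool sizing to each generator instance instead of to whichever test happens to call Generate first.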


bcmills commented Nov 18, 2020

@antoineco, FWIW, putting differently-sized buffers in a sync.Pool is almost never appropriate, and our sync.Pool examples have historically not been very clear about that. (That's #23199.)

antoineco commented Nov 18, 2020

@bcmills that's another interesting point 👍 Although in my case the issue was purely a misuse: a side effect of my expecting the global var to be "fresh" for every test.
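
For anyone who lands here from #23199 and wonders what a safer pattern looks like: one common mitigation is to pool a pointer type such as *bytes.Buffer, reset it before reuse, and skip returning buffers that have grown far beyond the typical size, so the pool does not end up dominated by unusually small or large entries. A hedged sketch (the names and the 64 KiB cutoff are illustrative, not from the gist):

import (
	"bytes"
	"sync"
)

var encodeBufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func encodeWithPool(data []byte) []byte {
	buf := encodeBufPool.Get().(*bytes.Buffer)
	buf.Reset() // start from a clean, already-allocated buffer

	buf.Write(data)
	out := append([]byte(nil), buf.Bytes()...) // copy before the buffer goes back to the pool

	// Only return reasonably sized buffers; oversized ones are left for the GC.
	if buf.Cap() <= 64<<10 {
		encodeBufPool.Put(buf)
	}
	return out
}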
