testing: Inconsistent benchmark data when GOMAXPROCS=1 #31599

xuanjiazhen · 2019-04-21T16:30:59Z

What version of Go are you using (`go version`)?

$ go version
go version go1.12.4 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/xuanjiazhen/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/xuanjiazhen/work"
GOPROXY=""
GORACE=""
GOROOT="/Users/xuanjiazhen/Desktop/go"
GOTMPDIR=""
GOTOOLDIR="/Users/xuanjiazhen/Desktop/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/w3/64tyfz3n4cs2jw5lrl7mc5c80000gn/T/go-build723171853=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

$GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarge$ -count 10
goos: darwin
goarch: amd64
pkg: runtime
BenchmarkConvT2ILarge   50000000                46.8 ns/op
BenchmarkConvT2ILarge   30000000                46.9 ns/op
BenchmarkConvT2ILarge   50000000                39.5 ns/op
BenchmarkConvT2ILarge   30000000                39.1 ns/op
BenchmarkConvT2ILarge   30000000                47.6 ns/op
BenchmarkConvT2ILarge   30000000                46.8 ns/op
BenchmarkConvT2ILarge   30000000                49.6 ns/op
BenchmarkConvT2ILarge   30000000                43.6 ns/op
BenchmarkConvT2ILarge   30000000                52.2 ns/op
BenchmarkConvT2ILarge   50000000                46.9 ns/op
PASS
ok      runtime 16.874s
$GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarge -count 10
goos: darwin
goarch: amd64
pkg: runtime
BenchmarkConvT2ILarge   100000000               19.6 ns/op
BenchmarkConvT2ILarge   100000000               19.8 ns/op
BenchmarkConvT2ILarge   100000000               20.3 ns/op
BenchmarkConvT2ILarge   100000000               21.2 ns/op
BenchmarkConvT2ILarge   100000000               21.0 ns/op
BenchmarkConvT2ILarge   100000000               20.6 ns/op
BenchmarkConvT2ILarge   100000000               21.1 ns/op
BenchmarkConvT2ILarge   100000000               20.6 ns/op
BenchmarkConvT2ILarge   100000000               20.6 ns/op
BenchmarkConvT2ILarge   100000000               20.8 ns/op
PASS
ok      runtime 21.803s

What did you expect to see?

The test results of the two types of writing should be consistent.

What did you see instead?

Test results with $ are much worse than test results without it.

$GODEBUG=gctrace=1 GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarge -count 10
...
gc 3984 @22.195s 2%: 0.001+0.12+0.001 ms clock, 0.001+0.12/0/0+0.001 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3985 @22.202s 2%: 0.003+0.13+0.001 ms clock, 0.003+0.12/0/0+0.001 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3986 @22.207s 2%: 0.001+0.11+0.001 ms clock, 0.001+0.11/0/0+0.001 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3987 @22.213s 2%: 0.001+0.11+0 ms clock, 0.001+0.11/0/0+0 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3988 @22.218s 2%: 0.001+0.12+0.001 ms clock, 0.001+0.12/0/0+0.001 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3989 @22.224s 2%: 0.001+0.12+0 ms clock, 0.001+0.12/0/0+0 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3990 @22.230s 2%: 0+0.12+0 ms clock, 0+0.12/0/0+0 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 3991 @22.236s 2%: 0.001+0.14+0.001 ms clock, 0.001+0.14/0/0+0.001 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
100000000               22.8 ns/op
PASS

$GODEBUG=gctrace=1 GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarge$ -count 10
...
gc 1052 @18.124s 0%: 0.001+20+0.002 ms clock, 0.001+0.13/0.021/0+0.002 ms cpu, 6->11->5 MB, 10 MB goal, 1 P
gc 1053 @18.146s 0%: 0.001+21+0.001 ms clock, 0.001+0.11/0.013/0+0.001 ms cpu, 6->12->6 MB, 11 MB goal, 1 P
gc 1054 @18.169s 0%: 0.001+19+0.001 ms clock, 0.001+0.11/0.014/0+0.001 ms cpu, 7->12->5 MB, 12 MB goal, 1 P
gc 1055 @18.190s 0%: 0.001+23+0.001 ms clock, 0.001+0.11/0.015/0+0.001 ms cpu, 6->12->6 MB, 10 MB goal, 1 P
gc 1056 @18.216s 0%: 0.002+19+0.001 ms clock, 0.002+0.13/0.013/0+0.001 ms cpu, 7->12->5 MB, 12 MB goal, 1 P
gc 1057 @18.237s 0%: 0.002+19+0.001 ms clock, 0.002+0.13/0.017/0+0.001 ms cpu, 6->11->5 MB, 10 MB goal, 1 P
gc 1058 @18.258s 0%: 0.001+20+0.002 ms clock, 0.001+0.11/0.020/0+0.002 ms cpu, 6->12->6 MB, 11 MB goal, 1 P
gc 1059 @18.281s 0%: 0.001+19+0.001 ms clock, 0.001+0.11/0.016/0+0.001 ms cpu, 7->12->5 MB, 12 MB goal, 1 P
gc 1060 @18.302s 0%: 0.001+19+0.002 ms clock, 0.001+0.11/0.017/0+0.002 ms cpu, 6->11->5 MB, 11 MB goal, 1 P
gc 1061 @18.323s 0%: 0.002+11+0.002 ms clock, 0.002+0.13/0.023/0+0.002 ms cpu, 6->8->2 MB, 10 MB goal, 1 P
50000000                51.7 ns/op
PASS
ok      runtime 18.342s
gc 21 @19.662s 0%: 0.015+1.6+0.002 ms clock, 0.015+0.099/0.52/0.90+0.002 ms cpu, 4->4->1 MB, 5 MB goal, 1 P

The text was updated successfully, but these errors were encountered:

ALTree · 2019-04-21T16:57:35Z

Thanks for the report.

I can't reproduce this using 1.12.4 on linux/amd64, the two results are statistically the same (with vs without $):

$ benchstat with.txt without.txt 

name          old time/op  new time/op  delta
ConvT2ILarge  28.0ns ± 1%  28.0ns ± 1%   ~     (p=0.914 n=9+9)

Can you reproduce this consistently? Are you running the benchmarks on a very quiet machine with no other processes using the cpu?

xuanjiazhen · 2019-04-21T19:26:25Z

Thanks for the report.

I can't reproduce this using 1.12.4 on linux/amd64, the two results are statistically the same (with vs without $):
$ benchstat with.txt without.txt 

name          old time/op  new time/op  delta
ConvT2ILarge  28.0ns ± 1%  28.0ns ± 1%   ~     (p=0.914 n=9+9)
Can you reproduce this consistently? Are you running the benchmarks on a very quiet machine with no other processes using the cpu?

Yes,I can reproduce that consistently. Not only on my mac (go1.12.4), but also on x86 and arm64 linux (go1.11.5).They are all quiet enough.

ALTree · 2019-04-22T10:17:27Z

Ah, I was able to reproduce this by doing more runs.

It doesn't happen every time, but when it happens the benchmark is indeed either 1) consistently slower 2) faster in the first iteration, slower from the 2nd.

Here 10 runs with ^BenchmarkConvT2ILarge$ and count=5

BenchmarkConvT2ILarge 	50000000	        27.9 ns/op
BenchmarkConvT2ILarge 	30000000	        52.5 ns/op
BenchmarkConvT2ILarge 	30000000	        59.9 ns/op
BenchmarkConvT2ILarge 	50000000	        40.5 ns/op
BenchmarkConvT2ILarge 	30000000	        59.6 ns/op


BenchmarkConvT2ILarge 	50000000	        41.6 ns/op
BenchmarkConvT2ILarge 	30000000	        59.8 ns/op
BenchmarkConvT2ILarge 	50000000	        53.0 ns/op
BenchmarkConvT2ILarge 	30000000	        61.6 ns/op
BenchmarkConvT2ILarge 	30000000	        61.5 ns/op


BenchmarkConvT2ILarge 	50000000	        40.9 ns/op
BenchmarkConvT2ILarge 	30000000	        61.6 ns/op
BenchmarkConvT2ILarge 	30000000	        61.7 ns/op
BenchmarkConvT2ILarge 	30000000	        61.6 ns/op
BenchmarkConvT2ILarge 	30000000	        61.6 ns/op


BenchmarkConvT2ILarge 	50000000	        27.7 ns/op
BenchmarkConvT2ILarge 	50000000	        28.2 ns/op
BenchmarkConvT2ILarge 	50000000	        28.0 ns/op
BenchmarkConvT2ILarge 	50000000	        27.9 ns/op
BenchmarkConvT2ILarge 	50000000	        27.9 ns/op


BenchmarkConvT2ILarge 	50000000	        55.8 ns/op
BenchmarkConvT2ILarge 	30000000	        61.8 ns/op
BenchmarkConvT2ILarge 	30000000	        61.9 ns/op
BenchmarkConvT2ILarge 	30000000	        61.7 ns/op
BenchmarkConvT2ILarge 	30000000	        59.9 ns/op


BenchmarkConvT2ILarge 	50000000	        28.0 ns/op
BenchmarkConvT2ILarge 	30000000	        57.9 ns/op
BenchmarkConvT2ILarge 	30000000	        61.6 ns/op
BenchmarkConvT2ILarge 	30000000	        61.9 ns/op
BenchmarkConvT2ILarge 	30000000	        61.6 ns/op


BenchmarkConvT2ILarge 	50000000	        27.8 ns/op
BenchmarkConvT2ILarge 	50000000	        57.2 ns/op
BenchmarkConvT2ILarge 	50000000	        51.5 ns/op
BenchmarkConvT2ILarge 	30000000	        61.9 ns/op
BenchmarkConvT2ILarge 	30000000	        59.7 ns/op


BenchmarkConvT2ILarge 	50000000	        27.9 ns/op
BenchmarkConvT2ILarge 	30000000	        52.2 ns/op
BenchmarkConvT2ILarge 	30000000	        59.5 ns/op
BenchmarkConvT2ILarge 	30000000	        61.4 ns/op
BenchmarkConvT2ILarge 	30000000	        61.4 ns/op


BenchmarkConvT2ILarge 	50000000	        47.5 ns/op
BenchmarkConvT2ILarge 	30000000	        59.6 ns/op
BenchmarkConvT2ILarge 	30000000	        59.7 ns/op
BenchmarkConvT2ILarge 	50000000	        51.5 ns/op
BenchmarkConvT2ILarge 	30000000	        61.8 ns/op


BenchmarkConvT2ILarge 	50000000	        27.9 ns/op
BenchmarkConvT2ILarge 	50000000	        27.7 ns/op
BenchmarkConvT2ILarge 	50000000	        27.9 ns/op
BenchmarkConvT2ILarge 	50000000	        27.9 ns/op
BenchmarkConvT2ILarge 	50000000	        27.7 ns/op

While with ^BenchmarkConvT2ILarge is always around 28 ns. This is on 1.12.4.

OTOH, tip is consistent, even with the $, but always at 40ns, which is slower than the 28ns in 1.12.

@xuanjiazhen Any chance you could test with tip (the current master) and confirm that the issue with the benchmark results being bimodal is indeed no longer present?

xuanjiazhen · 2019-04-22T15:26:02Z

I tried it on the latest master branch.The result looks like the two become consistent.But they reached consistent in the poor result!!! I think it is still not normal. @ALTree
(The number of times is a bit strange, is it a new improvement?)

$go version
go version devel +d17d41e58d Mon Apr 22 14:45:16 2019 +0000 darwin/amd64
$GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarg
e -count 10
goos: darwin
goarch: amd64
pkg: runtime
BenchmarkConvT2ILarge 	36258176	        32.3 ns/op
BenchmarkConvT2ILarge 	38381629	        32.3 ns/op
BenchmarkConvT2ILarge 	41117751	        32.2 ns/op
BenchmarkConvT2ILarge 	38234461	        32.7 ns/op
BenchmarkConvT2ILarge 	42858790	        32.2 ns/op
BenchmarkConvT2ILarge 	39587468	        32.6 ns/op
BenchmarkConvT2ILarge 	42929428	        32.7 ns/op
BenchmarkConvT2ILarge 	37120336	        32.4 ns/op
BenchmarkConvT2ILarge 	38552698	        32.9 ns/op
BenchmarkConvT2ILarge 	42029185	        32.2 ns/op
PASS
ok  	runtime	13.210s
$GOMAXPROCS=1 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarg
e$ -count 10
goos: darwin
goarch: amd64
pkg: runtime
BenchmarkConvT2ILarge 	35498226	        32.5 ns/op
BenchmarkConvT2ILarge 	41142729	        32.2 ns/op
BenchmarkConvT2ILarge 	43043185	        32.1 ns/op
BenchmarkConvT2ILarge 	39090960	        32.3 ns/op
BenchmarkConvT2ILarge 	43341727	        32.4 ns/op
BenchmarkConvT2ILarge 	42690354	        32.1 ns/op
BenchmarkConvT2ILarge 	40104470	        32.1 ns/op
BenchmarkConvT2ILarge 	41782971	        32.2 ns/op
BenchmarkConvT2ILarge 	42257203	        32.1 ns/op
BenchmarkConvT2ILarge 	42138686	        32.4 ns/op
PASS
ok  	runtime	13.561s
$GOMAXPROCS=2 go test runtime -v -run none -bench  ^BenchmarkConvT2ILarg
e -count 10
goos: darwin
goarch: amd64
pkg: runtime
BenchmarkConvT2ILarge-2   	53497600	        20.2 ns/op
BenchmarkConvT2ILarge-2   	60137784	        20.0 ns/op
BenchmarkConvT2ILarge-2   	60011299	        20.1 ns/op
BenchmarkConvT2ILarge-2   	60673358	        20.2 ns/op
BenchmarkConvT2ILarge-2   	59793987	        20.1 ns/op
BenchmarkConvT2ILarge-2   	60232131	        20.0 ns/op
BenchmarkConvT2ILarge-2   	60023706	        20.1 ns/op
BenchmarkConvT2ILarge-2   	60491013	        20.1 ns/op
BenchmarkConvT2ILarge-2   	60667680	        19.9 ns/op
BenchmarkConvT2ILarge-2   	60528424	        20.0 ns/op
PASS
ok  	runtime	12.187s

josharian · 2019-04-22T20:27:10Z

But they reached consistent in the poor result!!!

The bimodal distribution depending on the number of iterations is because the benchmark is not fully linear in the number of iterations, often due to GC interactions. This is unfortunate but hard to fix in the general case. Whether you hit the "poor result" or not depends on how many iterations you run. If you run with -benchtime=100ms or -benchtime=10ms or the like, you may get different results.

(The number of times is a bit strange, is it a new improvement?)

Yes. And in fact, one of the goals of that change was to avoid bimodal distributions in non-linear benchmarks. :)

ALTree added the NeedsInvestigation label Apr 21, 2019

ALTree added this to the Go1.13 milestone Apr 21, 2019

ALTree added the WaitingForInfo label Apr 21, 2019

ALTree removed the WaitingForInfo label Apr 22, 2019

andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019

rsc modified the milestones: Go1.14, Backlog Oct 9, 2019

michel-laterman mentioned this issue Feb 21, 2024

[Performance] Improve fleet server benchmarks elastic/fleet-server#3152

Closed

1 task

gabyhelp mentioned this issue Jun 22, 2024

cmd/compile: large type switches that always match the first value are slower in Go 1.22 #68125

Open

gabyhelp mentioned this issue Jul 9, 2024

runtime: GOGC and GOMEMLIMIT performance comparison #68346

Closed

gabyhelp mentioned this issue Aug 9, 2024

cmd/compile: pgo causes worse performance on some benchmarks #68815

Open

gabyhelp mentioned this issue Aug 20, 2024

cmd/compile: pgo after CL 449477 cause worse performance on benchmark #68974

Closed

gabyhelp mentioned this issue Dec 12, 2024

perf record output is not correct in arm64. #70804

Closed

gabyhelp mentioned this issue Mar 3, 2025

testing: strange benchmark results with RunParallel and -benchtime #72079

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testing: Inconsistent benchmark data when GOMAXPROCS=1 #31599

testing: Inconsistent benchmark data when GOMAXPROCS=1 #31599

xuanjiazhen commented Apr 21, 2019

ALTree commented Apr 21, 2019

xuanjiazhen commented Apr 21, 2019

ALTree commented Apr 22, 2019

xuanjiazhen commented Apr 22, 2019

josharian commented Apr 22, 2019

testing: Inconsistent benchmark data when GOMAXPROCS=1 #31599

testing: Inconsistent benchmark data when GOMAXPROCS=1 #31599

Comments

xuanjiazhen commented Apr 21, 2019

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

ALTree commented Apr 21, 2019

xuanjiazhen commented Apr 21, 2019

ALTree commented Apr 22, 2019

xuanjiazhen commented Apr 22, 2019

josharian commented Apr 22, 2019

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?