Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: Pool.Get is half the speed when uncontended #31820

Closed
rogpeppe opened this issue May 3, 2019 · 6 comments
Closed

sync: Pool.Get is half the speed when uncontended #31820

rogpeppe opened this issue May 3, 2019 · 6 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@rogpeppe
Copy link
Contributor

rogpeppe commented May 3, 2019

What version of Go are you using (go version)?

$ go version
go version devel +2c1b5130aa Fri May 3 10:41:15 2019 +0000 linux/amd64

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/rog/.cache/go-build"
GOENV="/home/rog/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/rog/src/go"
GOPROXY="direct"
GOROOT="/home/rog/go"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/home/rog/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/rog/go/src/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build266531649=/tmp/go-build -gno-record-gcc-switches"

What did you do?

With this program, run go test -count 5 -bench . -cpu 1,4 -benchmem.

package test

import (
	"sync"
	"testing"
)

func BenchmarkPool(b *testing.B) {
	obj := new(int)
	p := &sync.Pool{
		New: func() interface{} {
			return obj
		},
	}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			x := p.Get()
			p.Put(x)
		}
	})
}

Running -

What did you expect to see?

I'd expect to see that Pool.Get has roughly the same overhead when called in the single-threaded case as when called in parallel.

What did you see instead?

The results seem to indicate a significant overhead when called single threaded, which is unexpected to me.

BenchmarkPool     	57428542	        20.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkPool     	59477628	        20.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkPool     	56434190	        20.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkPool     	58290403	        20.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkPool     	58344486	        20.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkPool-4   	137164621	         8.85 ns/op	       0 B/op	       0 allocs/op
BenchmarkPool-4   	136122486	         8.84 ns/op	       0 B/op	       0 allocs/op
BenchmarkPool-4   	133608108	         9.33 ns/op	       0 B/op	       0 allocs/op
BenchmarkPool-4   	132039085	        10.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkPool-4   	122718084	         9.21 ns/op	       0 B/op	       0 allocs/op
@rogpeppe
Copy link
Contributor Author

rogpeppe commented May 3, 2019

For the record, this is the main issue that's stopping me using a slightly more sophisticated algorithm in github.com/rogpeppe/fastuuid that avoids significant overhead when UUID generation is contended.

@ianlancetaylor
Copy link
Contributor

CC @aclements

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 3, 2019
@ianlancetaylor ianlancetaylor added this to the Go1.13 milestone May 3, 2019
@aclements
Copy link
Member

These results actually indicate that sync.Pool is scaling basically perfectly and saturating the CPU pipeline (assuming you have two 2-way hyperthreaded cores).

The way parallel benchmark results are reported is frustratingly confusing. It's not reporting CPU-ns/op, it's reporting wall-ns/op. For example, suppose each op takes exactly 100 ns, regardless of parallelism. If the single-threaded benchmark runs for 1 sec, it will execute 10,000,000 ops, so ns/op = 1s/10,000,000 ops = 100 ns/op. But if the same benchmark runs 4-way parallel for 1 sec, it will execute 40,000,000 ops, so ns/op = 1s/40,000,000 = 25 ns/op. (I really wish it didn't work this way...)

Your results only differ by a factor of 2 rather than 4 because of hyperthreading. Since the benchmark is highly predictable and obviously stays within the cache, there are probably basically no pipeline stalls (though I didn't measure this to confirm), so you really only get 2x parallelism. For example, when I run with -cpu 1,4 on a six core 2-way hyperthreaded machine I get:

perflock /tmp/test.test -test.bench . -test.cpu 1,4
goos: linux
goarch: amd64
BenchmarkPool       	63292490	        18.7 ns/op
BenchmarkPool-4     	250416594	         4.70 ns/op

4.70 * 4 = 18.8, indicating perfect scalability.

I'm going to go ahead and close this, but feel free to follow up.

@rogpeppe
Copy link
Contributor Author

rogpeppe commented May 3, 2019

The way parallel benchmark results are reported is frustratingly confusing. It's not reporting CPU-ns/op, it's reporting wall-ns/op. For example, suppose each op takes exactly 100 ns, regardless of parallelism. If the single-threaded benchmark runs for 1 sec, it will execute 10,000,000 ops, so ns/op = 1s/10,000,000 ops = 100 ns/op. But if the same benchmark runs 4-way parallel for 1 sec, it will execute 40,000,000 ops, so ns/op = 1s/40,000,000 = 25 ns/op. (I really wish it didn't work this way...)

OK, I wondered if it was something like that. So "op" in this case does not mean "1 operation that I'm testing takes time $T", but "1 operation that I'm testing takes time ($T * $NCPU)", so the numbers in different -cpu results aren't comparable without scaling them first.

That is definitely somewhat confusing. Is it documented somewhere? (I scanned through go help testflag and https://golang.org/pkg/testing/). Perhaps it should be called out explicitly in https://golang.org/pkg/testing/#hdr-Benchmarks ?

@rogpeppe
Copy link
Contributor Author

rogpeppe commented May 3, 2019

Thanks for the quick response BTW.

@aclements
Copy link
Member

That's right.

I don't think this is documented anywhere. I've opened #31884 about the documentation issue.

@golang golang locked and limited conversation to collaborators May 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants