Description: testing: ease writing parallel benchmarks
Add b.RunParallel function that captures parallel benchmark boilerplate:
creates worker goroutines, joins worker goroutines, distributes work
among them in an efficient way, auto-tunes grain size.
Fixes issue 7090.
Total messages: 61
https://codereview.appspot.com/57270043/diff/40001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:355: func (b *B) RunParallel2(P int, ctor func() func()) {
I think just one function is fine. I'd rename this one "RunParallel" and rename "P" to "perProc" and document that it's normally just 1. Then I'd name both funcs: (perProc, int, setup func() (loop func()))
Hello bradfitz@golang.org (cc: golang-codereviews@googlegroups.com, r@golang.org), I'd like you to review this change to https://dvyukov%40google.com@code.google.com/p/go/
On 2014/01/30 09:29:58, bradfitz wrote:
> src/pkg/testing/benchmark.go:355: func (b *B) RunParallel2(P int, ctor func() func()) {
> I think just one function is fine. I'd rename this one "RunParallel" and rename
> "P" to "perProc" and document that it's normally just 1. Then I'd name both
> funcs: (perProc, int, setup func() (loop func()))

Done.
LGTM but wait for an API review from r or iant

https://codereview.appspot.com/57270043/diff/100001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:354: // A typical value for perProc for CPU bound benchmarks is 1.
CPU-bound (adjective hyphen)

src/pkg/testing/benchmark.go:355: // Setup function is executed once in each worker goroutine
// The setup function

src/pkg/testing/benchmark.go:357: // Setup function may be used to create goroutine-local objects,
// The setup function

src/pkg/testing/benchmark.go:384: n := atomic.AddUint64(&N, grain)
will this suck on 32-bit, notably ARM? I think B.N always fits in uint32, no?
https://codereview.appspot.com/57270043/diff/100001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:354: // A typical value for perProc for CPU bound benchmarks is 1.
On 2014/01/30 10:45:41, bradfitz wrote:
> CPU-bound (adjective hyphen)

Done.

src/pkg/testing/benchmark.go:355: // Setup function is executed once in each worker goroutine
On 2014/01/30 10:45:41, bradfitz wrote:
> // The setup function

Done.

src/pkg/testing/benchmark.go:357: // Setup function may be used to create goroutine-local objects,
On 2014/01/30 10:45:41, bradfitz wrote:
> // The setup function

Done.

src/pkg/testing/benchmark.go:384: n := atomic.AddUint64(&N, grain)
On 2014/01/30 10:45:41, bradfitz wrote:
> will this suck on 32-bit, notably ARM?
> I think B.N always fits in uint32, no?

If grain is 100us, and AddUint32 takes 300ns, that's still ~0.3% of total time. I would use atomic.AddInt, but Russ thinks it's useless.
And Go team must arrange for a persistent guest reviewer in Europe :)
I think we need to somehow mention that it's intended to be used with "go test -cpu=1,2,4". I see that people write:

func BenchmarkFoo(b *testing.B) {
	runtime.GOMAXPROCS(4)
	b.RunParallel(...)
}

Any suggestions for wording?
Is AddUint32 faster than AddUint64?
It's probably a bit faster on 386: XADD vs CMPXCHG8B loop. It would matter more in a highly contended case. ARMv7+ code looks mostly the same: LDREX/STREX vs LDREXD/STREXD loop.
https://codereview.appspot.com/57270043/diff/140001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:352: // It creates perProc*GOMAXPROCS worker goroutines
I guess the general idea is to run some number of instances of the function in parallel, to see how long they take. I'm not sure why the API should express that in terms of GOMAXPROCS. Why not have the caller pass in the number of goroutines that should be started simultaneously, and let them worry about correlating that with GOMAXPROCS? In particular I can imagine getting rid of GOMAXPROCS some day, at which point it would be strange to have it be part of the API here.

src/pkg/testing/benchmark.go:356: // and must return the loop function which will be executed b.N times.
This isn't how non-parallel benchmarks behave. Those benchmarks handle b.N themselves. Why shouldn't parallel benchmarks behave the same? Also, the function should take a *testing.B anyhow, so that it can call Fail, Log, etc.

src/pkg/testing/benchmark.go:361: var wg sync.WaitGroup
May as well move wg down to just before the loop.

File src/pkg/testing/benchmark_test.go (right):

src/pkg/testing/benchmark_test.go:84: // Parallel benchmark for fmt.Fprintf into a bytes.Buffer.
This seems like a strange example, because I can't see why anybody would care about how fmt.Fprintf behaves when run in parallel.
https://codereview.appspot.com/57270043/diff/140001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:352: // It creates perProc*GOMAXPROCS worker goroutines
On 2014/01/31 00:40:50, iant wrote:
> I guess the general idea is to run some number of instances of the function in
> parallel, to see how long they take. I'm not sure why the API should express
> that in terms of GOMAXPROCS. Why not have the caller pass in the number of
> goroutines that should be started simultaneously, and let them worry about
> correlating that with GOMAXPROCS? In particular I can imagine getting rid of
> GOMAXPROCS some day, at which point it would be strange to have it be part of
> the API here.

We have a nice, handy and official way to control GOMAXPROCS in benchmarks and tests, which is:

$ go test -cpu=1,2,4,8

In all instances of parallel benchmarks we start k*GOMAXPROCS goroutines. This very naturally leads to this flexible API (as compared to hardcoding GOMAXPROCS in benchmarks). In particular, one thing that I want to prevent is:

func BenchmarkFoo(b *testing.B) {
	runtime.GOMAXPROCS(4)
	b.RunParallel(4, ...)
}

src/pkg/testing/benchmark.go:356: // and must return the loop function which will be executed b.N times.
On 2014/01/31 00:40:50, iant wrote:
> This isn't how non-parallel benchmarks behave. Those benchmarks handle b.N
> themselves. Why shouldn't parallel benchmarks behave the same? Also, the
> function should take a *testing.B anyhow, so that it can call Fail, Log, etc.

I do not understand the comment about b.N. The automatic b.N tuning logic has already happened before BenchmarkFoo is called. RunParallel takes full advantage of it; the user does not need to do anything special with b.N. In fact, with RunParallel you care even less about b.N: you just provide the loop function.

b is easily captured by both setup and loop functions. Take a look at the example usages of RunParallel in chan_test.go.

src/pkg/testing/benchmark.go:361: var wg sync.WaitGroup
On 2014/01/31 00:40:50, iant wrote:
> May as well move wg down to just before the loop.

Done.

File src/pkg/testing/benchmark_test.go (right):

src/pkg/testing/benchmark_test.go:84: // Parallel benchmark for fmt.Fprintf into a bytes.Buffer.
On 2014/01/31 00:40:50, iant wrote:
> This seems like a strange example, because I can't see why anybody would care
> about how fmt.Fprintf behaves when run in parallel.

We do care: https://codereview.appspot.com/50140045/

I believe that all of our benchmarks must be parallel:
1. If you run without -cpu=1,2,4, then you get the same as for a non-parallel benchmark.
2. It can highlight any implicit or explicit synchronization in the component.
3. If there is none, it can still highlight bottlenecks in the runtime (e.g. malloc and GC always have some synchronization).
4. If it just scales linearly, that is still a good result.

Having said that, I am open to suggestions about a better example. I think we don't want to write a new component just for the example, so it must be something from the std lib.
https://codereview.appspot.com/57270043/diff/140001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:352: // It creates perProc*GOMAXPROCS worker goroutines
On 2014/01/31 09:26:21, dvyukov wrote:
> We have a nice, handy and official way to control GOMAXPROCS in benchmarks and
> tests, which is:
> $ go test -cpu=1,2,4,8
> In all instances of parallel benchmarks we start k*GOMAXPROCS goroutines.
> This very naturally leads to this flexible API (as compared to hardcoding
> GOMAXPROCS in benchmarks).
>
> In particular one thing that I want to prevent is:
> func BenchmarkFoo(b *testing.B) {
>     runtime.GOMAXPROCS(4)
>     b.RunParallel(4, ...)
> }

OK, makes sense.

RunParallel helps write parallel benchmarks. It creates perProc worker goroutines on each operating system thread. The number of threads to use comes from GOMAXPROCS; it's most useful to set GOMAXPROCS with the go test -cpu option. For a CPU-bound benchmark, perProc would typically be set to 1.

The setup function is called once in each worker goroutine. It must return the loop function, which will be executed b.N times. The loop function should not use b.N itself. The setup function may be used to create goroutine-local variables that the loop function may refer to.

File src/pkg/testing/benchmark_test.go (right):

src/pkg/testing/benchmark_test.go:84: // Parallel benchmark for fmt.Fprintf into a bytes.Buffer.
On 2014/01/31 09:26:21, dvyukov wrote:
> We do care:
> https://codereview.appspot.com/50140045/

That is not a great example, as it is really a test of sync.Pool. I think this example here would make more sense to readers if you test sync.Pool here.

> I believe that all of our benchmarks must be parallel:

If you really believe that then I think we should consider a different approach here. This API is simple but looks somewhat cumbersome to use.

For example, perhaps we could say that go test will look for any function

ParallelBenchmark(*testing.PB, int)

and execute -test.cpu instances in parallel, passing the number of loop iterations as the int parameter. If you like this suggestion I recommend taking it to golang-dev to see what others think.
On Fri, Jan 31, 2014 at 6:45 PM, <iant@golang.org> wrote:
> That is not a great example, as it is really a test of sync.Pool. I
> think this example here would make more sense to readers if you test
> sync.Pool here.

Good idea!

> If you really believe that then I think we should consider a different
> approach here. This API is simple but looks somewhat cumbersome to use.
>
> For example, perhaps we could say that go test will look for any
> function
>
> ParallelBenchmark(*testing.PB, int)
>
> and execute -test.cpu instances in parallel, passing the number of loop
> iterations as the int parameter. If you like this suggestion I
> recommend taking it to golang-dev to see what others think.

The problem is that half of the parallel benchmarks in std lib need goroutine-local state, so they do not fit into your simple pattern.

The general structure of parallel benchmarks is:

func BenchmarkFoo(...) {
	// create some global state
	// create some goroutine-local state
	// do one iteration using global state and goroutine-local state
}

However, I like the idea of providing more fundamental (and easier to use) support for parallel benchmarks. I will try to figure out what it could look like.
PTAL

https://codereview.appspot.com/57270043/diff/140001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:352: // It creates perProc*GOMAXPROCS worker goroutines
On 2014/01/31 14:45:01, iant wrote:
> OK, makes sense.
>
> RunParallel helps write parallel benchmarks. It creates perProc worker
> goroutines on each operating system thread. The number of threads to use comes
> from GOMAXPROCS; it's most useful to set GOMAXPROCS with the go test -cpu
> option. For a CPU-bound benchmark, perProc would typically be set to 1.
>
> The setup function is called once in each worker goroutine. It must return the
> loop function, which will be executed b.N times. The loop function should not
> use b.N itself. The setup function may be used to create goroutine-local
> variables that the loop function may refer to.

Done.

File src/pkg/testing/benchmark_test.go (right):

src/pkg/testing/benchmark_test.go:84: // Parallel benchmark for fmt.Fprintf into a bytes.Buffer.
On 2014/01/31 14:45:01, iant wrote:
> That is not a great example, as it is really a test of sync.Pool. I think this
> example here would make more sense to readers if you test sync.Pool here.

I really want to show goroutine-local state, but I don't know how to reasonably squeeze it into a sync.Pool benchmark. So I've changed it to a parallel benchmark of text/template.Template.Execute on a single object. It makes the global/shared state clearly visible.
On 2014/01/31 15:00:30, dvyukov wrote: > >> I believe that all of our benchmarks must be parallel: > > > > > > If you really believe that then I think we should consider a different > > approach here. This API is simple but looks somewhat cumbersome to use. > > > > For example, perhaps we could say that go test will look for any > > function > > > > ParallelBenchmark(*testing.PB, int) > > > > and execute -test.cpu instances in parallel, passing the number of loop > > iterations as the int parameter. If you like this suggestion I > > recommend taking it to golang-dev to see what others think. > > The problem is that half of the parallel benchmarks in std lib need > goroutine-local state, so they do not fit into your simple pattern. > > The general structure of parallel benchmarks is: > > func BenchmarkFoo(...) { > // create some global state > // create some goroutine-local state > // do one iteration using global state and goroutine-local state > } > > However, I like the idea of providing more fundamental (and easier to > use) support for parallel benchmark. I will try to figure out how it > can look like. I've failed to figure out how ParallelBenchmark can support global/local state. Except for: func ParallelBenchmark(b *testing.B) func() func() { templ := template.Must(template.New("test").Parse("Hello, {{.}}!")) return func() func() { var buf bytes.Buffer return func() { buf.Reset() templ.Execute(&buf, "World") } } } but it looks even more awkward and does not avoid this "func() func()".
On 2014/02/02 17:34:50, dvyukov wrote:
> I've failed to figure out how ParallelBenchmark can support global/local state.
> [...]
> but it looks even more awkward and does not avoid this "func() func()".

At the cost of verbosity, we could avoid the func() func() funkiness by introducing an interface, something like:

// BWorker is a worker in a parallel benchmark.
type BWorker interface {
	// NewWorker creates a new BWorker from the receiver.
	// NewWorker can be used to propagate global state to new
	// workers and to set up local state per worker.
	NewWorker() BWorker
	// RunOnce is the unit of work that is being benchmarked.
	// RunOnce will be called multiple times as part of a benchmark loop.
	RunOnce()
}

Using this in a parallel benchmark would look something like:

type parallelExecute struct {
	tpl *template.Template
	buf *bytes.Buffer
}

func (e *parallelExecute) NewWorker() BWorker {
	// Pass the template on to the new worker.
	// There is no local state to set up in this example.
	return &parallelExecute{tpl: e.tpl}
}

func (e *parallelExecute) RunOnce() {
	e.buf.Reset()
	e.tpl.Execute(e.buf, "World")
}

func BenchmarkExecute(b *testing.B) {
	tpl := template.Must(template.New("test").Parse("Hello, {{.}}!"))
	e := &parallelExecute{tpl: tpl}
	b.RunParallel(1, e)
}

Having written all that out, I'm not sure that it is actually any better than func() func(). Still, I figure it is good to have an alternative on the table.

Unrelatedly, RunParallel seems useful, non-trivial, and general purpose. (I plan to steal it for my own non-benchmarking purposes.) If you come up with a really clean way to expose it, maybe it should live somewhere other than testing.

-josh
Similarly, it would be great to have an auto-load-balanced parallel for loop (similar to that in https://groups.google.com/forum/#!searchin/golang-nuts/parallel$20for$20atomi...). I can open up a thread/issue if this is something that may have merit.
On 2014/02/04 22:17:02, josharian wrote:
> Having written all that out, I'm not sure that it is actually any better than
> func() func(). Still, I figure it is good to have an alternative on the table.

I agree that it does not look better. And I agree that we need to try to find something better. How about this?

func BenchmarkFoo(b *testing.B) {
	b.RunParallel(1, func() {
		var buf bytes.Buffer
		b.Iter = func() {
			buf.Reset()
			fmt.Fprintf(&buf, "foo")
		}
	})
}

Still, just returning the function looks like the most natural way to... return the function.

> Unrelatedly, RunParallel seems useful, non-trivial, and general purpose. (I plan
> to steal it for my own non-benchmarking purposes.) If you come up with a really
> clean way to expose it, maybe it should live somewhere other than testing.

Currently the std lib does not provide any parallel algorithms, so just putting this single function somewhere does not sound like a good plan. A more coherent plan is to design a parallel package that provides parallel.For and parallel.Sort. However, as always, it can live on github.
On 2014/02/05 06:01:54, dvyukov wrote:
> How about this?
>
>     func BenchmarkFoo(b *testing.B) {
>         b.RunParallel(1, func() {
>             var buf bytes.Buffer
>             b.Iter = func() {
>                 buf.Reset()
>                 fmt.Fprintf(&buf, "foo")
>             }
>         })
>     }
>
> Still, just returning the function looks like the most natural way to...
> return the function.

Yep. And to any new reader of such code, at first glance it will look like
there might be a race on b.Iter. I like the func func better.

Here's yet another try. Instead of trying to make this a one-liner, we could
add a helper Parallel field to testing.B. Then, just as you write an explicit
for loop up to b.N in regular benchmarks, you would follow something like this
template to write a parallel benchmark:

    func BenchmarkParallelTemplate(b *testing.B) {
        b.Parallel.Init()
        // do global state initialization
        for p := 0; p < b.Parallel.NProcs(1); p++ {
            go func() {
                defer b.Parallel.Done()
                // do local state initialization
                for {
                    n := b.Parallel.N()
                    if n == 0 {
                        break
                    }
                    for i := 0; i < n; i++ {
                        // do work
                    }
                }
            }()
        }
        b.Parallel.Wait()
    }

Yes, it's a lot of boilerplate code, but the concurrency structure is exposed
nicely, and it avoids returning functions that return functions.

> Currently the std lib does not provide any parallel algorithms, so just
> putting this single function somewhere does not sound like a good plan. A
> more coherent plan is to design a parallel package that provides parallel.For
> and parallel.Sort. However, as always, it can live on github.

All true. Nudge nudge wink wink. :)
On 2014/02/05 17:35:48, josharian wrote:
> Here's yet another try. Instead of trying to make this a one-liner, we could
> add a helper Parallel field to testing.B. [...]
>
> Yes, it's a lot of boilerplate code, but the concurrency structure is exposed
> nicely, and it avoids returning functions that return functions.

You will need to always look up what functions you need to call, in what
order, where, and what to do with the return values.
On 2014/02/05 18:26:31, dvyukov wrote:
> You will need to always lookup what functions you need to call in what order,
> where and what to do with return values.

Yeah. Well, I'm out of ideas for the moment...
On Fri, Jan 31, 2014 at 7:00 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Jan 31, 2014 at 6:45 PM, <iant@golang.org> wrote:
>> If you really believe that then I think we should consider a different
>> approach here. This API is simple but looks somewhat cumbersome to use.
>>
>> For example, perhaps we could say that go test will look for any
>> function
>>
>>     ParallelBenchmark(*testing.PB, int)
>>
>> and execute -test.cpu instances in parallel, passing the number of loop
>> iterations as the int parameter. If you like this suggestion I
>> recommend taking it to golang-dev to see what others think.
>
> The problem is that half of the parallel benchmarks in std lib need
> goroutine-local state, so they do not fit into your simple pattern.
>
> The general structure of parallel benchmarks is:
>
>     func BenchmarkFoo(...) {
>         // create some global state
>         // create some goroutine-local state
>         // do one iteration using global state and goroutine-local state
>     }

I guess the part I haven't managed to grasp is why it can only do one
iteration. I was suggesting that the number of iterations be passed into the
function. Then the function can set up state and run that many iterations. If
setting up state is expensive, it can use StopTimer and StartTimer. I
understand that the state needs to live across one set of loops, but why does
it need to live across multiple sets of loops?

Ian
On 2014/02/05 21:30:47, iant wrote:
> I guess the part I haven't managed to grasp is why it can only do one
> iteration. I was suggesting that the number of iterations be passed
> into the function. Then the function can set up state, and run that
> many iterations. If setting up state is expensive, it can use
> StopTimer and StartTimer. I understand that the state needs to live
> across one set of loops, but why does it need to live across multiple
> sets of loops?

Do you mean:

    b.RunParallel(func(n int) {
        var buf bytes.Buffer
        for i := 0; i < n; i++ {
            buf.Reset()
            fmt.Fprintf(&buf, "abc")
        }
    })

? Currently I limit one loop to 100us; this is required for dynamic load
balancing. So one loop can be just a dozen iterations, and there will be
100'000 such loops. So, yes, setting up state can be expensive.

StopTimer/StartTimer won't work in a parallel setting, and I think we cannot
make it work. But I agree that it looks cleaner. Probably if I bump the grain
size to 1ms, it will be acceptable. Btw, in either case we need to warn the
user not to call StopTimer in RunParallel.
Just for completeness, a few more alternatives:

    b.RunParallel(func(x *interface{}) {
        buf := (*x).(*bytes.Buffer)
        if buf == nil {
            buf = new(bytes.Buffer)
            *x = buf
        }
        buf.Reset()
        fmt.Fprintf(buf, "abc")
    })

    b.RunParallel(func(x interface{}) interface{} {
        buf := x.(*bytes.Buffer)
        if buf == nil {
            buf = new(bytes.Buffer)
        }
        buf.Reset()
        fmt.Fprintf(buf, "abc")
        return buf
    })

    // The tricky aspect here is that it's too easy to cause
    // false sharing on the array.
    bufs := make([]*bytes.Buffer, runtime.GOMAXPROCS(0))
    b.RunParallel(func(i int) {
        if bufs[i] == nil {
            bufs[i] = new(bytes.Buffer)
        }
        bufs[i].Reset()
        fmt.Fprintf(bufs[i], "abc")
    })

And the existing ones:

    b.RunParallel(func() func() {
        var buf bytes.Buffer
        return func() {
            buf.Reset()
            fmt.Fprintf(&buf, "abc")
        }
    })

    b.RunParallel(func(n int) {
        var buf bytes.Buffer
        for i := 0; i < n; i++ {
            buf.Reset()
            fmt.Fprintf(&buf, "abc")
        }
    })

    type parallelExecute struct {
        tpl *template.Template
        buf *bytes.Buffer
    }

    func (e *parallelExecute) NewWorker() BWorker {
        // Pass the template on to the new worker.
        // There is no local state to set up in this example.
        return &parallelExecute{tpl: e.tpl}
    }

    func (e *parallelExecute) RunOnce() {
        e.buf.Reset()
        e.tpl.Execute(e.buf, "World")
    }

    func BenchmarkExecute(b *testing.B) {
        tpl := template.Must(template.New("test").Parse("Hello, {{.}}!"))
        e := &parallelExecute{tpl: tpl}
        b.RunParallel(1, e)
    }

    func BenchmarkParallelTemplate(b *testing.B) {
        b.Parallel.Init()
        // do global state initialization
        for p := 0; p < b.Parallel.NProcs(1); p++ {
            go func() {
                defer b.Parallel.Done()
                // do local state initialization
                for {
                    n := b.Parallel.N()
                    if n == 0 {
                        break
                    }
                    for i := 0; i < n; i++ {
                        // do work
                    }
                }
            }()
        }
        b.Parallel.Wait()
    }
Here's yet another alternative, tweaking Ian's suggestion. (Let me know if you
tire of this, and I'll pipe down.)

    b.RunParallel(1, func(nn chan int) {
        var buf bytes.Buffer
        for n := range nn {
            for i := 0; i < n; i++ {
                buf.Reset()
                fmt.Fprintf(&buf, "abc")
            }
        }
    })

Channel operations are more expensive than atomics, but this should skew all
benchmarks equally. And I ran a quick-and-dirty test
(https://codereview.appspot.com/60640044) and after switching to buffered
channels, I couldn't see the channel overhead in the benchmarks.
On 2014/02/06 17:29:00, josharian wrote:
> Channel operations are more expensive than atomics, but this should skew all
> benchmarks equally. And I ran a quick-and-dirty test
> (https://codereview.appspot.com/60640044) and after switching to buffered
> channels, I couldn't see the channel overhead in the benchmarks.

Humm... this is an interesting development... Probably we can do something
along the lines of:

    b.RunParallel(1, func(pb *testing.PB) {
        var buf bytes.Buffer
        for pb.Next() {
            buf.Reset()
            fmt.Fprintf(&buf, "abc")
        }
    })

where pb.Next will grab a batch of iterations with an atomic increment and
cache it, so most of the time it will return iterations from the cache. The pb
is local to the goroutine. This avoids having a double nested loop.

Probably we can bolt some other functionality onto this pb. E.g.
pb.StopTimer/StartTimer has at least some chance of working in a reasonable
way.

Btw, RunParallel will also allow us to fix b.Fatalf. Instead of crashing the
program, it can force all goroutines to exit early and then call panic in the
main goroutine.
I like it.
> I like it.

+1
Full example for completeness:

    // Parallel benchmark for text/template.Template.Execute on a single object.
    func BenchmarkTempl(b *testing.B) {
        templ := template.Must(template.New("test").Parse("Hello, {{.}}!"))
        // RunParallel will create GOMAXPROCS goroutines
        // and distribute work among them.
        b.RunParallel(1, func(pb *testing.PB) {
            // Each goroutine has its own private bytes.Buffer.
            var buf bytes.Buffer
            for pb.Next() {
                // The loop body is executed b.N times in total.
                buf.Reset()
                templ.Execute(&buf, "World")
            }
        })
    }

Exactly the same number of lines as the current "func func" version, just:

    s/func() func()/func(pb *testing.PB)/
    s/return func() {/for pb.Next() {/

-----

And returning to Ian's idea of having "func ParallelBenchmark", we can do:

    var templ = template.Must(template.New("test").Parse("Hello, {{.}}!"))

    func ParallelBenchmarkTempl(pb *testing.PB) {
        var buf bytes.Buffer
        for pb.Next() {
            buf.Reset()
            templ.Execute(&buf, "World")
        }
    }

The only downside is that all global state must be, well, global.
But -1 line of code :)

-----

And extending it further, we don't actually need testing.PB. We can extend
testing.B with a Next function, which will also work for non-parallel
benchmarks, so that you can write:

    func BenchmarkTempl(b *testing.B) {
        templ := template.Must(template.New("test").Parse("Hello, {{.}}!"))
        var buf bytes.Buffer
        for b.Next() {
            buf.Reset()
            templ.Execute(&buf, "World")
        }
    }

or:

    var templ = template.Must(template.New("test").Parse("Hello, {{.}}!"))

    func ParallelBenchmarkTempl(b *testing.B) {
        var buf bytes.Buffer
        for b.Next() {
            buf.Reset()
            templ.Execute(&buf, "World")
        }
    }
If PB had a Run method, we'd have a way to set options in the future: setters
on PB.
What do you mean by Run and Setters?
Sorry, was typing on a chairlift.

    b.NewParallel().Run(1, func(pb *testing.PB) {
        ....
    })

Then NewParallel can return a *PB, and later we can add options on PB to
change parameters before we call the Run method.

But I don't like having two pb in scope (the one from NewParallel and the same
one from the func). Will propose something more coherent later.
On Thu, Feb 6, 2014 at 11:47 PM, Brad Fitzpatrick <bradfitz@golang.org> wrote:
> Sorry, was typing on a chairlift.
>
> b.NewParallel().Run(1, func(pb *testing.PB) {
>     ....
> })
>
> Then NewParallel can return a *PB and later we can add options on PB to
> change parameters before we call the Run method.
>
> But I don't like having two pb in scope (the one from NewParallel and the
> same one from the func). Will propose something more coherent later.

A limited form of setters can be:

b.RunParallel(func(pb *testing.PB) {
    // Usually you don't need it, because you get the default of
    // GOMAXPROCS goroutines.
    pb.SetParallelism(16) // request 16*GOMAXPROCS goroutines
    ...
})

This is of course somewhat subtle, because SetParallelism will actually be
called GOMAXPROCS times by different goroutines. But for this idempotent
method, we can resolve it under the hood.
> > b.NewParallel().Run(1, func(pb *testing.PB) {
> >     ....
> > })
> >
> > Then NewParallel can return a *PB and later we can add options on PB to
> > change parameters before we call the Run method.
> >
> > But I don't like having two pb in scope (the one from NewParallel and the
> > same one from the func). Will propose something more coherent later.
>
> A limited form of setters can be:
>
> b.RunParallel(func(pb *testing.PB) {
>     // Usually you don't need it, because you get the default of
>     // GOMAXPROCS goroutines.
>     pb.SetParallelism(16) // request 16*GOMAXPROCS goroutines
>     ...
> })
>
> This is of course somewhat subtle, because SetParallelism will
> actually be called GOMAXPROCS times by different goroutines. But for
> this idempotent method, we can resolve it under the hood.

Pending Brad's more coherent proposal, what's wrong with the naive
approach of simply adding parallelism-related options directly to B?
It'd look like:

func BenchmarkTempl(b *testing.B) {
    templ := template.Must(template.New("test").Parse("Hello, {{.}}!"))
    b.SetParallelism(2) // request 2*GOMAXPROCS goroutines
    // other parallel-ish options?
    b.RunParallel(func(pb *testing.PB) {
        var buf bytes.Buffer
        for pb.Next() {
            buf.Reset()
            templ.Execute(&buf, "World")
        }
    })
}

It'd preclude multiple concurrent calls to b.RunParallel, but that's not
much of a loss.
On Sat, Feb 8, 2014 at 6:03 AM, <josharian@gmail.com> wrote:
> Pending Brad's more coherent proposal, what's wrong with the naive
> approach of simply adding parallelism-related options directly to B?
> It'd look like:
>
> func BenchmarkTempl(b *testing.B) {
>     templ := template.Must(template.New("test").Parse("Hello, {{.}}!"))
>     b.SetParallelism(2) // request 2*GOMAXPROCS goroutines

looks good to me

>     // other parallel-ish options?
>     b.RunParallel(func(pb *testing.PB) {
>         var buf bytes.Buffer
>         for pb.Next() {
>             buf.Reset()
>             templ.Execute(&buf, "World")
>         }
>     })
> }
>
> It'd preclude multiple concurrent calls to b.RunParallel, but that's not
> much of a loss.
>
> https://codereview.appspot.com/57270043/
On Sat, Feb 8, 2014 at 8:12 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Sat, Feb 8, 2014 at 6:03 AM, <josharian@gmail.com> wrote:
> > Pending Brad's more coherent proposal, what's wrong with the naive
> > approach of simply adding parallelism-related options directly to B?
> > It'd look like:
> >
> > func BenchmarkTempl(b *testing.B) {
> >     templ := template.Must(template.New("test").Parse("Hello, {{.}}!"))
> >     b.SetParallelism(2) // request 2*GOMAXPROCS goroutines
>
> looks good to me
>
> > // other parallel-ish options?
> > b.RunParallel(func(pb *testing.PB) {

Likewise LGTM.
On Sat, Feb 8, 2014 at 6:03 AM, <josharian@gmail.com> wrote:
> Pending Brad's more coherent proposal, what's wrong with the naive
> approach of simply adding parallelism-related options directly to B?
> It'd look like:

Ian, what do you think about the following variant (note that the
SetParallelism call is optional)?

> func BenchmarkTempl(b *testing.B) {
>     templ := template.Must(template.New("test").Parse("Hello, {{.}}!"))
>     b.SetParallelism(2) // request 2*GOMAXPROCS goroutines
>     b.RunParallel(func(pb *testing.PB) {
>         var buf bytes.Buffer
>         for pb.Next() {
>             buf.Reset()
>             templ.Execute(&buf, "World")
>         }
>     })
> }
>
> It'd preclude multiple concurrent calls to b.RunParallel, but that's not
> much of a loss.
>
> https://codereview.appspot.com/57270043/
Hi! I've updated the code to use the new interface. PTAL. The comments are more of a stub of what needs to be described there; I am open to suggestions (or complete replacements).
https://codereview.appspot.com/57270043/diff/270001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:353: // PB is a helper type for RunParallel.
Maybe:

// A PB is used by RunParallel for running parallel benchmarks.

Or s/used/provided/

src/pkg/testing/benchmark.go:361: // Next returns whether there is more iterations to execute.
s/returns/reports/ and s/is/are/

src/pkg/testing/benchmark.go:379: // Number of worker goroutines can be overriden with SetParallelism function.
Articles (the) before both Number and SetParallelism

src/pkg/testing/benchmark.go:380: // It's most useful to use RunParallel with go test -cpu option.
with the

or "using go test -cpu=..."

src/pkg/testing/benchmark.go:382: // The body function should not use b.N itself; instead it must execute benchmarking
s/must/should/

Must is much stronger than should and often feels too strong.

src/pkg/testing/benchmark.go:385: // The body function should not use StopTimer, StartTimer and ResetTimer functions,
another "the". And s/and/or/:

should not use the StopTimer, StartTimer or ResetTimer functions

src/pkg/testing/benchmark.go:418: // SetParallelism sets number of worker goroutines for RunParallel to p*GOMAXPROCS.
the number of

Also, mention that the default is 1? Even though it's mentioned indirectly above.
The suggested comment changes below are all just that -- suggested. Take of them what you like.

https://codereview.appspot.com/57270043/diff/270001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:40: lastDuration time.Duration
Is it worth saving both of these, as opposed to just the last ns/op? As written, lastDuration is a bit ambiguous -- was it the duration of the entire run or the duration of a single operation?

Comments for these would be good.

src/pkg/testing/benchmark.go:353: // PB is a helper type for RunParallel.
// PB coordinates the work done by a single goroutine in RunParallel.

Is PB the right name now? In its current incarnation, a PB isn't a ParallelBenchmark, it's more of a worker.

src/pkg/testing/benchmark.go:361: // Next returns whether there is more iterations to execute.
s/returns/reports/
s/is/are/

src/pkg/testing/benchmark.go:386: // because they have global effect.
// RunParallel runs a benchmark in parallel.
// It creates multiple goroutines and distributes N iterations among them.
// The number of goroutines defaults to GOMAXPROCS. To increase parallelism
// for non-CPU-bound benchmarks, use SetParallelism.
// RunParallel is usually used with the go test -cpu flag.
//
// The body function will be run in each goroutine. It should set up any
// goroutine-local state and then iterate until pb.Next returns false.
// It should not use StartTimer, StopTimer, or ResetTimer.

src/pkg/testing/benchmark.go:404: N := uint64(0)
s/N/n/?

src/pkg/testing/benchmark.go:419: // There is usually no need to call this function for CPU-bound benchmarks.
// SetParallelism sets the number of goroutines used by RunParallel to p * GOMAXPROCS.
// There is usually no need to call SetParallelism for CPU-bound benchmarks.
// If p is less than 1, this call will have no effect.

File src/pkg/testing/benchmark_test.go (right):

src/pkg/testing/benchmark_test.go:76: t.Errorf("expected %v procs, got %v", want, procs)
As I understand it, the preferred style is now "got %v, want %v", here and below.

src/pkg/testing/benchmark_test.go:103: // Each goroutine has own private bytes.Buffer.
s/has own private/has its own/

src/pkg/testing/benchmark_test.go:106: // The loop body is executed b.N times total.
s/total./total across all goroutines./
PTAL

https://codereview.appspot.com/57270043/diff/270001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:40: lastDuration time.Duration
On 2014/02/10 20:19:32, josharian wrote:
> Is it worth saving both of these, as opposed to just the last ns/op?

I don't know. I do not see any problem with saving an additional var here. For the current run we save N and duration, so it looks reasonable to save the same for the last run. Also, ns/op will have to be float64, and I fear what I don't understand :)

> As written, lastDuration is a bit ambiguous -- was it the duration of the
> entire run or the duration of a single operation?
>
> Comments for these would be good.

comments are added

src/pkg/testing/benchmark.go:353: // PB is a helper type for RunParallel.
On 2014/02/10 19:37:05, bradfitz wrote:
> Maybe:
>
> // A PB is used by RunParallel for running parallel benchmarks.
>
> Or s/used/provided/

Done.

src/pkg/testing/benchmark.go:353: // PB is a helper type for RunParallel.
On 2014/02/10 20:19:32, josharian wrote:
> // PB coordinates the work done by a single goroutine in RunParallel.
>
> Is PB the right name now? In its current incarnation, a PB isn't a
> ParallelBenchmark, it's more of a worker.

Any better suggestion?

src/pkg/testing/benchmark.go:361: // Next returns whether there is more iterations to execute.
On 2014/02/10 19:37:05, bradfitz wrote:
> s/returns/reports/ and s/is/are/

Done.

src/pkg/testing/benchmark.go:361: // Next returns whether there is more iterations to execute.
On 2014/02/10 20:19:32, josharian wrote:
> s/returns/reports/
> s/is/are/

Done.

src/pkg/testing/benchmark.go:379: // Number of worker goroutines can be overriden with SetParallelism function.
On 2014/02/10 19:37:05, bradfitz wrote:
> Articles (the) before both Number and SetParallelism

Done.

src/pkg/testing/benchmark.go:380: // It's most useful to use RunParallel with go test -cpu option.
On 2014/02/10 19:37:05, bradfitz wrote:
> with the
>
> or "using go test -cpu=..."

Done.

src/pkg/testing/benchmark.go:382: // The body function should not use b.N itself; instead it must execute benchmarking
On 2014/02/10 19:37:05, bradfitz wrote:
> s/must/should/
>
> Must is much stronger than should and often feels too strong.

Done.

src/pkg/testing/benchmark.go:385: // The body function should not use StopTimer, StartTimer and ResetTimer functions,
On 2014/02/10 19:37:05, bradfitz wrote:
> another "the". And s/and/or/:
>
> should not use the StopTimer, StartTimer or ResetTimer functions

Done.

src/pkg/testing/benchmark.go:386: // because they have global effect.
On 2014/02/10 20:19:32, josharian wrote:
> // RunParallel runs a benchmark in parallel.
> // It creates multiple goroutines and distributes N iterations among them.
> // The number of goroutines defaults to GOMAXPROCS. To increase parallelism
> // for non-CPU-bound benchmarks, use SetParallelism.
> // RunParallel is usually used with the go test -cpu flag.
> //
> // The body function will be run in each goroutine. It should set up any
> // goroutine-local state and then iterate until pb.Next returns false.
> // It should not use StartTimer, StopTimer, or ResetTimer.

Now that I've fixed all of Brad's comments, I get to this... OK, let it be this version. I permit myself to do the following changes (mostly per Brad's suggestions):
s/N/b.N/
s/SetParallelism/the SetParallelism function/
s/StartTimer, StopTimer, or ResetTimer/the StartTimer, StopTimer, or ResetTimer functions/
and I've restored "because they have global effect", because this limitation instantly raises a "why?" question.

src/pkg/testing/benchmark.go:404: N := uint64(0)
On 2014/02/10 20:19:32, josharian wrote:
> s/N/n/?

Done.

src/pkg/testing/benchmark.go:418: // SetParallelism sets number of worker goroutines for RunParallel to p*GOMAXPROCS.
On 2014/02/10 19:37:05, bradfitz wrote:
> the number of
>
> Also, mention that the default is 1? Even though it's mentioned indirectly
> above.

Since you will need to fix grammar and wording in my version anyway, can you just give me the complete sentence? :)

src/pkg/testing/benchmark.go:419: // There is usually no need to call this function for CPU-bound benchmarks.
On 2014/02/10 20:19:32, josharian wrote:
> // SetParallelism sets the number of goroutines used by RunParallel to p *
> GOMAXPROCS.
> // There is usually no need to call SetParallelism for CPU-bound benchmarks.
> // If p is less than 1, this call will have no effect.

Done.

File src/pkg/testing/benchmark_test.go (right):

src/pkg/testing/benchmark_test.go:76: t.Errorf("expected %v procs, got %v", want, procs)
On 2014/02/10 20:19:32, josharian wrote:
> As I understand it, the preferred style is now "got %v, want %v", here and
> below.

Done.

src/pkg/testing/benchmark_test.go:103: // Each goroutine has own private bytes.Buffer.
On 2014/02/10 20:19:32, josharian wrote:
> s/has own private/has its own/

Done.

src/pkg/testing/benchmark_test.go:106: // The loop body is executed b.N times total.
On 2014/02/10 20:19:32, josharian wrote:
> s/total./total across all goroutines./

Done.
LGTM

Thanks, Dmitry, this was really fun.

https://codereview.appspot.com/57270043/diff/270001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:353: // PB is a helper type for RunParallel.
On 2014/02/11 11:31:14, dvyukov wrote:
> On 2014/02/10 20:19:32, josharian wrote:
> > // PB coordinates the work done by a single goroutine in RunParallel.
> >
> > Is PB the right name now? In its current incarnation, a PB isn't a
> > ParallelBenchmark, it's more of a worker.
>
> Any better suggestion?

BG, for benchmark goroutine? I don't feel strongly about this.

https://codereview.appspot.com/57270043/diff/280001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:380: // non-CPU-bound benchmarks, use the SetParallelism function.
s/use the SetParallelism function/call SetParallelism before RunParallel/.
I think this just needs an example now.
there is an example

On Tue, Feb 11, 2014 at 7:53 PM, <bradfitz@golang.org> wrote:
> I think this just needs an example now.
>
> https://codereview.appspot.com/57270043/
LGTM

Whoops, I was only reading benchmark.go this latest round.

On Tue, Feb 11, 2014 at 7:54 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> there is an example
>
> On Tue, Feb 11, 2014 at 7:53 PM, <bradfitz@golang.org> wrote:
> > I think this just needs an example now.
> >
> > https://codereview.appspot.com/57270043/
I'll take a look at the API later today.

On Tue, Feb 11, 2014 at 7:57 AM, Brad Fitzpatrick <bradfitz@golang.org> wrote:
> LGTM
>
> Whoops, I was only reading benchmark.go this latest round.
>
> On Tue, Feb 11, 2014 at 7:54 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> there is an example
>>
>> On Tue, Feb 11, 2014 at 7:53 PM, <bradfitz@golang.org> wrote:
>> > I think this just needs an example now.
>> >
>> > https://codereview.appspot.com/57270043/
Thanks! Waiting for you.

On Tue, Feb 11, 2014 at 8:33 PM, Rob Pike <r@golang.org> wrote:
> I'll take a look at the API later today.
>
> On Tue, Feb 11, 2014 at 7:57 AM, Brad Fitzpatrick <bradfitz@golang.org> wrote:
>> LGTM
>>
>> Whoops, I was only reading benchmark.go this latest round.
most of my comments are about documentation.

the API looks really nice.

https://codereview.appspot.com/57270043/diff/280001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:39: lastN int // number of iterations in the last run
.,+1s/last/previous/

src/pkg/testing/benchmark.go:46: parallelism int
// what units?

src/pkg/testing/benchmark.go:355: globalN *uint64
// comments please.

src/pkg/testing/benchmark.go:380: // non-CPU-bound benchmarks, use the SetParallelism function.
On 2014/02/11 15:02:20, josharian wrote:
> s/use the SetParallelism function/call SetParallelism before RunParallel/.

+1

src/pkg/testing/benchmark.go:386: // because they have global effect.
the package comment in testing.go should have a simple parallel example to augment (but not duplicate) this comment.

src/pkg/testing/benchmark.go:388: // Calculate grain size as number of iterations that take ~100us.
s/u/µ/

U+00B5 'µ' is the micro sign. no need to use the parochial u for µ

src/pkg/testing/benchmark.go:389: // 100us is enough to amortize the atomic operation and provide sufficient
not clear what is 'the' atomic operation. maybe just

// 100µs is enough to amortize overhead and provide....

src/pkg/testing/benchmark.go:393: grain = 1e5 * uint64(b.lastN) / uint64(b.lastDuration)
instead of uint64(b.lastDuration) use b.lastDuration.Nanoseconds() and fix up the types (it's int64).

src/pkg/testing/benchmark.go:398: // The inner loop and function call must take ~10ns, so do not do more
"must take ~10ns". i think you mean that on current hardware, we expect that much time is consumed. rephrase.

// We expect the inner loop and function call to take at least 10ns, so do not
// do more than 100µs/10ns=1e4 iterations.

src/pkg/testing/benchmark.go:411: pb := &PB{globalN: &n, grain: grain, bN: uint64(b.N)}
please roll this out onto multiple lines. it's easier to read that way.
Sorry about the burst. The submit button wasn't showing evidence it had succeeded.

On Tue, Feb 11, 2014 at 2:57 PM, <r@golang.org> wrote:
> most of my comments are about documentation.
>
> the API looks really nice.
>
> https://codereview.appspot.com/57270043/
PTAL

https://codereview.appspot.com/57270043/diff/280001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:39: lastN int // number of iterations in the last run
On 2014/02/11 22:57:18, r wrote:
> .,+1s/last/previous/

Done.

src/pkg/testing/benchmark.go:46: parallelism int
On 2014/02/11 22:57:18, r wrote:
> // what units?

Done.

src/pkg/testing/benchmark.go:355: globalN *uint64
On 2014/02/11 22:57:18, r wrote:
> // comments please.

Done.

src/pkg/testing/benchmark.go:380: // non-CPU-bound benchmarks, use the SetParallelism function.
On 2014/02/11 15:02:20, josharian wrote:
> s/use the SetParallelism function/call SetParallelism before RunParallel/.

Done.

src/pkg/testing/benchmark.go:380: // non-CPU-bound benchmarks, use the SetParallelism function.
On 2014/02/11 22:57:18, r wrote:
> On 2014/02/11 15:02:20, josharian wrote:
> > s/use the SetParallelism function/call SetParallelism before RunParallel/.
>
> +1

Done.

src/pkg/testing/benchmark.go:386: // because they have global effect.
On 2014/02/11 22:57:18, r wrote:
> the package comment in testing.go should have a simple parallel example to
> augment (but not duplicate) this comment.

Done.

src/pkg/testing/benchmark.go:388: // Calculate grain size as number of iterations that take ~100us.
On 2014/02/11 22:57:18, r wrote:
> s/u/µ/
>
> U+00B5 'µ' is the micro sign. no need to use the parochial u for µ

Done.

src/pkg/testing/benchmark.go:389: // 100us is enough to amortize the atomic operation and provide sufficient
On 2014/02/11 22:57:18, r wrote:
> not clear what is 'the' atomic operation. maybe just
>
> // 100µs is enough to amortize overhead and provide....

Done.

src/pkg/testing/benchmark.go:393: grain = 1e5 * uint64(b.lastN) / uint64(b.lastDuration)
On 2014/02/11 22:57:18, r wrote:
> instead of uint64(b.lastDuration) use b.lastDuration.Nanoseconds() and fix up
> the types (it's int64).

I check that it's >0, so it must fit into uint64. Do you mean to change the types of all the counters, N, grain, and durations to int64? Why?

src/pkg/testing/benchmark.go:398: // The inner loop and function call must take ~10ns, so do not do more
On 2014/02/11 22:57:18, r wrote:
> "must take ~10ns". i think you mean that on current hardware, we expect that
> much time is consumed. rephrase.
>
> // We expect the inner loop and function call to take at least 10ns, so do not
> // do more than 100µs/10ns=1e4 iterations.

Done.

src/pkg/testing/benchmark.go:411: pb := &PB{globalN: &n, grain: grain, bN: uint64(b.N)}
On 2014/02/11 22:57:18, r wrote:
> please roll this out onto multiple lines. it's easier to read that way.

Done.
Ian, Rob, final blessing?
LGTM

iant is on vacation. i think rsc should take a look too.

https://codereview.appspot.com/57270043/diff/320001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:422: // SetParallelism sets the number of goroutines used by RunParallel to p * GOMAXPROCS.
for loss of ambiguity, please
s/p \* GOMAXPROCS/p*GOMAXPROCS/.
R=rsc@golang.org (assigned by r@golang.org)
waiting for Russ

https://codereview.appspot.com/57270043/diff/320001/src/pkg/testing/benchmark.go
File src/pkg/testing/benchmark.go (right):

src/pkg/testing/benchmark.go:422: // SetParallelism sets the number of goroutines used by RunParallel to p * GOMAXPROCS.
On 2014/02/16 16:22:10, r wrote:
> for loss of ambiguity, please
> s/p \* GOMAXPROCS/p*GOMAXPROCS/.

Done.
LGTM

nice
I agree: nice. Thanks to everyone who pitched in. This is how to do API design: think hard, ask opinions, don't settle for the first thing that comes along. -rob
*** Submitted as https://code.google.com/p/go/source/detail?r=355c9ac57116 ***

testing: ease writing parallel benchmarks

Add b.RunParallel function that captures parallel benchmark boilerplate:
creates worker goroutines, joins worker goroutines, distributes work
among them in an efficient way, auto-tunes grain size.

Fixes issue 7090.

R=bradfitz, iant, josharian, tracey.brendan, r, rsc, gobot
CC=golang-codereviews
https://codereview.appspot.com/57270043