testing: ease writing parallel benchmarks #7090

Closed
bradfitz opened this issue Jan 9, 2014 · 6 comments

@bradfitz (Contributor) commented Jan 9, 2014

Writing contention benchmarks involves some boilerplate:

https://golang.org/cl/46010043/diff/60001/src/pkg/sync/pool_test.go
https://golang.org/cl/49910043/

etc

The general form is:

        const CallsPerSched = 1000
        procs := runtime.GOMAXPROCS(-1)
        N := int32(b.N / CallsPerSched)
        c := make(chan bool, procs)
        for p := 0; p < procs; p++ {
                go func() {
                        var buf bytes.Buffer
                        for atomic.AddInt32(&N, -1) >= 0 {
                                for g := 0; g < CallsPerSched; g++ {
                                        f(&buf)
                                }
                        }
                        c <- true
                }()
        }
        for p := 0; p < procs; p++ {
                <-c
        }


But sometimes:

        n0 := uintptr(b.N)
        for atomic.AddUintptr(&n, ^uintptr(0)) < n0 {

The testing package seems to cap b.N at 2*1e9, but that's not publicly documented as a
guarantee.

Can/should we say that b.N will always fit in an int32, even if it's of type int?

I once even defensively wrote,

func BenchmarkPool(b *testing.B) {
        procs := runtime.GOMAXPROCS(-1)
        var dec func() bool
        if unsafe.Sizeof(b.N) == 8 {
                n := int64(b.N)
                dec = func() bool {
                        return atomic.AddInt64(&n, -1) >= 0
                }
        } else {
                n := int32(b.N)
                dec = func() bool {
                        return atomic.AddInt32(&n, -1) >= 0
                }
        }
        var p Pool
        var wg WaitGroup
        for i := 0; i < procs; i++ {
                wg.Add(1)
                go func() {
                        defer wg.Done()
                        for dec() {
                                p.Put(1)
                                p.Get()
                        }
                }()
        }
        wg.Wait()
}

... but felt gross about it.

We should either document this, or provide a means in the testing package to ease
writing benchmarks for contention.
@dvyukov (Member) commented Jan 10, 2014

Comment 1:

Yes, it would be handy. Lots of benchmarks do this. And even more do not, but should.
In the dashboard benchmarks I use the following helper function:
// Parallel is a public helper function that runs f N times in P*GOMAXPROCS goroutines.
func Parallel(N uint64, P int, f func()) {
        numProcs := P * runtime.GOMAXPROCS(0)
        var wg sync.WaitGroup
        wg.Add(numProcs)
        for p := 0; p < numProcs; p++ {
                go func() {
                        defer wg.Done()
                        for int64(atomic.AddUint64(&N, ^uint64(0))) >= 0 {
                                f()
                        }
                }()
        }
        wg.Wait()
}
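A minimal usage sketch for this helper inside a benchmark (op here is only a stand-in for the operation under test):
func BenchmarkOp(b *testing.B) {
        // Run the operation roughly b.N times total, spread across
        // 1*GOMAXPROCS worker goroutines.
        Parallel(uint64(b.N), 1, func() {
                op() // hypothetical operation under test
        })
}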
One aspect to consider is that it generally also needs to know the "grain size", because
synchronizing on each iteration can outweigh the thing under test. If it's incorporated
into the testing package, then we could probably remember ns/op from previous runs and
easily calculate the grain size from that.
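If grain size were added, a rough sketch of how the helper above could batch iterations might look like this (the ParallelGrained name and the grain parameter are purely illustrative; same runtime, sync, and sync/atomic imports as above):
// ParallelGrained runs f approximately N times in P*GOMAXPROCS goroutines,
// claiming `grain` iterations per atomic operation so the synchronization cost
// is amortized over the batch. Any final partial batch is dropped, the same
// approximation the CallsPerSched pattern above makes.
func ParallelGrained(N uint64, P int, grain uint64, f func()) {
        numProcs := P * runtime.GOMAXPROCS(0)
        var wg sync.WaitGroup
        wg.Add(numProcs)
        for p := 0; p < numProcs; p++ {
                go func() {
                        defer wg.Done()
                        for {
                                // Subtract `grain` in one atomic op (adding ^(grain-1)
                                // is the documented way to subtract with AddUint64).
                                if int64(atomic.AddUint64(&N, ^(grain-1))) < 0 {
                                        return
                                }
                                for i := uint64(0); i < grain; i++ {
                                        f()
                                }
                        }
                }()
        }
        wg.Wait()
}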

Labels changed: added repo-main, release-go1.3maybe.

@dvyukov (Member) commented Jan 26, 2014

Comment 2:

Out of the 27 parallel benchmarks in the std lib, 16 fit well into the simple form:
b.RunParallel(func() {
  ...
})
but 11 use local per-goroutine state, so they do not fit as-is into this simple pattern.
I see 2 options for per-goroutine state:
1.
b.RunParallel(func(x *interface{}) {
  ...
})
then the function can cache anything it wants in x. The overhead is merely an interface
cast.
2. Benchmarks can use sync.Pool to cache local state.
Pool.Get/Put overhead is 20-50 ns depending on the processor,
and this variant will most likely create more resources than there are goroutines (see the sketch below).
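For concreteness, a rough sketch of option 2 under the simple b.RunParallel(func()) form above (that signature is only the proposal being discussed here, not a shipped API; bufPool and the bytes.Buffer body are illustrative):
var bufPool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

func BenchmarkWithPooledState(b *testing.B) {
        b.RunParallel(func() {
                // Borrow state from the pool instead of holding true
                // per-goroutine state; each Get/Put pair pays the 20-50 ns
                // mentioned above.
                buf := bufPool.Get().(*bytes.Buffer)
                buf.Reset()
                buf.WriteString("hello") // placeholder for the code under test
                bufPool.Put(buf)
        })
}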
---
A separate question is whether we want to support goroutine excess, i.e. create
K*GOMAXPROCS goroutines.
The interface can be:
b.RunParallel(4, func() {
  ...
})
this will create 4*GOMAXPROCS goroutines.
This may be useful to benchmark something that includes IO operations, or has contention
(so that some goroutines are temporarily non-runnable).
But I am concerned that users may misinterpret this parameter.
Brad?

@bradfitz (Contributor, Author)

Comment 4:

Or even:
    b.RunParallel(f func() (loopFn func()))
f is called once per goroutine and returns a func to be called in a loop.
Then the per-goroutine state is simply created by f and closed over in loopFn.
That might be too complicated for the majority of cases, though.  We could provide a
simple method and a more complex method that gives you the K parameter too.
I don't have strong opinions here, other than wanting to make this easy to write and
cleaning up the boilerplate in these 27+ (and growing) places.
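A rough sketch of that shape, reusing the bytes.Buffer state and f from the boilerplate at the top of the issue (again a proposed signature, not a shipped one):
b.RunParallel(func() func() {
        // Called once per goroutine: per-goroutine state is set up here.
        var buf bytes.Buffer
        // The returned closure is the loop body and closes over buf.
        return func() {
                f(&buf)
        }
})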

@dvyukov (Member) commented Jan 27, 2014

Comment 5:

Here is what I have now:
https://golang.org/cl/57270043
Your "func() func()" idea works nicely, and it seems to be enough to express all common
cases.
Although this "b.RunParallel2(1, func() func() {" form looks somewhat clumsy for the std lib.
And, yes, we need a better name for RunParallel2.
Any suggestions?

Owner changed to @dvyukov.

Status changed to Started.

@bradfitz (Contributor, Author)

Comment 6:

Replied on codereview.

@dvyukov (Member) commented Feb 17, 2014

Comment 7:

This issue was closed by revision c3922f0.

Status changed to Fixed.
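For reference, the API that landed with that revision (testing.B.RunParallel and testing.PB, in Go 1.3) has roughly this shape; the loop body below is only a placeholder:
func BenchmarkFoo(b *testing.B) {
        // Optional: run K*GOMAXPROCS goroutines, which answers the
        // "goroutine excess" question from Comment 2.
        // b.SetParallelism(4)
        b.RunParallel(func(pb *testing.PB) {
                var buf bytes.Buffer // per-goroutine state, as in the sketches above
                for pb.Next() {
                        buf.Reset()
                        buf.WriteString("hello") // placeholder for the code under test
                }
        })
}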

@rsc rsc added this to the Go1.3 milestone Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016