
proposal: testing: allow B to use real execution duration to decide N #40227

Closed
joesonw opened this issue Jul 15, 2020 · 8 comments

Comments

@joesonw

joesonw commented Jul 15, 2020

When trying something like the code below, the benchmark can time out, because the framework only takes the recorded (timed) duration into account, rather than the actual execution duration, when predicting the next b.N.

func BenchmarkXXX(b *testing.B) {
    buf := NewSomeBuffer()

    b.Run("Write", func(b *testing.B) { // produced 1,000,000 samples (for example)
        for i := 0; i < b.N; i++ {
            buf.ExpensiveWrite()
        }
    })

    b.Run("Read", func(b *testing.B) {
        /*
           produced 10,000,000 samples (again, for example).
           It should be somewhere slightly below 1,000,000 samples: the read
           operation itself is really cheap here, but every iteration still
           pays for an untimed expensive write.
        */
        b.StopTimer()
        for i := 0; i < b.N; i++ {
            buf.ExpensiveWrite()
        }
        b.StartTimer()

        // only trying to measure read here.
        for i := 0; i < b.N; i++ {
            buf.CheapRead()
        }
    })
}

Only the time between StartTimer (b.start = time.Now()) and StopTimer (b.duration += time.Since(b.start)) is accumulated into b.duration, and that is all the prediction loop looks at:

go/src/testing/benchmark.go, lines 296 to 324 at c5d7f2f:

		d := b.benchTime.d
		for n := int64(1); !b.failed && b.duration < d && n < 1e9; {
			last := n
			// Predict required iterations.
			goalns := d.Nanoseconds()
			prevIters := int64(b.N)
			prevns := b.duration.Nanoseconds()
			if prevns <= 0 {
				// Round up, to avoid div by zero.
				prevns = 1
			}
			// Order of operations matters.
			// For very fast benchmarks, prevIters ~= prevns.
			// If you divide first, you get 0 or 1,
			// which can hide an order of magnitude in execution time.
			// So multiply first, then divide.
			n = goalns * prevIters / prevns
			// Run more iterations than we think we'll need (1.2x).
			n += n / 5
			// Don't grow too fast in case we had timing errors previously.
			n = min(n, 100*last)
			// Be sure to run at least one more than last time.
			n = max(n, last+1)
			// Don't run more than 1e9 times. (This also keeps n in int range on 32 bit platforms.)
			n = min(n, 1e9)
			b.runN(int(n))
		}
	}
	b.result = BenchmarkResult{b.N, b.duration, b.bytes, b.netAllocs, b.netBytes, b.extra}
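To make the failure mode concrete (the numbers here are purely illustrative, not from the original report): suppose CheapRead takes about 10 ns and ExpensiveWrite about 10 µs. The predictor above only sees the timed reads, so with the default 1 s benchtime it grows n toward roughly 1e8 iterations within a few rounds. But each of those iterations also performs an untimed 10 µs write, so the final run costs on the order of 1,000 s of wall-clock time and blows past the default 10-minute test timeout.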

Maybe there could be a b.ResetDuration() that only resets the reported duration, without polluting the prediction algorithm.
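For illustration, a minimal sketch of how such a hypothetical b.ResetDuration() (it does not exist in testing today) might be used in the Read sub-benchmark above:

b.Run("Read", func(b *testing.B) {
    // The timer is never stopped, so under this proposal the wall-clock
    // cost of the writes would still feed the b.N prediction.
    for i := 0; i < b.N; i++ {
        buf.ExpensiveWrite()
    }
    b.ResetDuration() // hypothetical: reported duration covers only the reads below
    for i := 0; i < b.N; i++ {
        buf.CheapRead()
    }
})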

@gopherbot gopherbot added this to the Proposal milestone Jul 15, 2020
@martisch
Contributor

I usually use https://golang.org/pkg/testing/#B.ResetTimer after setup code and before starting the benchmark loop. Does that work better for your use case?
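For reference, that pattern looks roughly like this (a minimal sketch reusing the hypothetical buffer from the example above, with a fixed amount of setup that does not depend on b.N):

func BenchmarkRead(b *testing.B) {
    buf := NewSomeBuffer()
    for i := 0; i < 1000; i++ { // fixed-size setup, independent of b.N
        buf.ExpensiveWrite()
    }
    b.ResetTimer() // discard the setup time before the measured loop
    for i := 0; i < b.N; i++ {
        buf.CheapRead()
    }
}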

@joesonw
Author

joesonw commented Jul 15, 2020 via email

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jul 15, 2020
@rsc
Contributor

rsc commented Jul 22, 2020

I can't think of a scenario where this kind of per-size setup would lead to valid benchmark results. You're kind of lucky to be getting a timeout - the alternative would be completely bogus results.

If the setup for a loop of length b.N takes longer than the thing you're timing, benchmarks are going to have a hard time getting a precise read. The implication is that the actual operation you are doing is changing based on b.N, which invalidates the entire benchmark timing computation: a buffer of 1 GB is going to have very different cache performance than a buffer of 1 MB, but the benchmark routine depends fundamentally on b.N=1<<30 doing exactly the same operation as b.N=1<<20, just 1<<10 more times.

@joesonw
Author

joesonw commented Jul 23, 2020

@rsc It's NOT per-size setup; it's just what ended up happening (and should not have). I was merely trying to measure read and write performance separately. The current benchmark prediction algorithm is not able to handle that correctly when the expensive operations are excluded from the timer.

I revised the code comment in the description; it should be more intuitive now.

@rsc
Contributor

rsc commented Jul 14, 2021

There are two problems here.

The first problem is that if read is cheap, then testing needs to do more of them in the loop to get an accurate per-operation time. What you are suggesting would choose a very small N in the Read benchmark because of the (untimed) expensive writes, but then the calculation (timed span) / b.N would be much less accurate than usual. To make this benchmark produce accurate results, you need to find a way to move the expensive setup out of the benchmark entirely.

The second problem is that, despite the claims to the contrary, this is absolutely per-size setup. If you do N ExpensiveWrite followed by N Reads, then between those two there is some buffer somewhere containing an amount of memory that scales with N. That is exactly what you can't do in benchmarks, because the work involved in storing the memory will have different characteristics for different N. To make this benchmark produce accurate results, the actual work has to be the same, just repeated N times.

A more accurate benchmark would look like:

var initBuf = ...set up buffer contents with 1000 writes...

func BenchmarkRead(b *testing.B) {
	var buf Buffer
	for i := 0; i < b.N; i++ {
		if i%1000 == 0 {
			buf.Reset(initBuf)
		}
		buf.CheapRead()
	}
}
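For concreteness, here is a runnable variant of that sketch; bytes.Buffer stands in for the original Buffer type, and the refill condition and payload size are assumptions, not part of the original example:

package buf_test

import (
	"bytes"
	"testing"
)

// Fixed setup data, produced once and independent of b.N.
var initBuf = bytes.Repeat([]byte("somepayload."), 1000)

func BenchmarkRead(b *testing.B) {
	var buf bytes.Buffer
	tmp := make([]byte, 16)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if buf.Len() < len(tmp) {
			// Refill from the fixed setup data; the refill cost is the
			// same every time and does not scale with b.N.
			buf.Reset()
			buf.Write(initBuf)
		}
		buf.Read(tmp) // the cheap read being measured
	}
}

The key property is the one described above: the work done per timed iteration is the same regardless of b.N, and the occasional refill adds only a bounded, N-independent cost.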

@rsc rsc changed the title from "proposal: allow testing.B to use real execution duration to predicate b.N" to "proposal: testing: allow B to use real execution duration to decide N" Jul 14, 2021
@rsc rsc moved this from Incoming to Active in Proposals (old) Jul 14, 2021
@rsc
Contributor

rsc commented Jul 14, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Jul 21, 2021
@rsc
Contributor

rsc commented Jul 21, 2021

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Likely Decline to Declined in Proposals (old) Jul 28, 2021
@rsc
Contributor

rsc commented Jul 28, 2021

No change in consensus, so declined.
— rsc for the proposal review group
