
proposal: testing: allow B to use real execution duration to decide N #40227

Closed
joesonw opened this issue Jul 15, 2020 · 8 comments

Comments

@joesonw

joesonw commented Jul 15, 2020

When trying something like the code below, the benchmark can time out, because the framework only takes the recorded (timed) duration into account, rather than the actual execution duration, when predicting the next b.N.

func BenchmarkXXX(b *testing.B) {
    buf := NewSomeBuffer()

    b.Run("Write", func(b *testing.B) { // produced 1,000,000 samples (for example)
        for i := 0; i < b.N; i++ {
            buf.ExpensiveWrite()
        }
    })

    b.Run("Read", func(b *testing.B) {
        /*
           produced 10,000,000 samples (again, for example).
           It should be somewhere slightly below 1,000,000 samples: the read
           operation itself is really cheap here, but every iteration still
           pays for an untimed expensive write.
        */
        b.StopTimer()
        for i := 0; i < b.N; i++ {
            buf.ExpensiveWrite()
        }
        b.StartTimer()

        // only trying to measure read here.
        for i := 0; i < b.N; i++ {
            buf.CheapRead()
        }
    })
}

Only the time between StartTimer (b.start = time.Now()) and StopTimer (b.duration += time.Since(b.start)) is accumulated into b.duration, and that is all the prediction loop looks at:

go/src/testing/benchmark.go, lines 296 to 324 at c5d7f2f:

		d := b.benchTime.d
		for n := int64(1); !b.failed && b.duration < d && n < 1e9; {
			last := n
			// Predict required iterations.
			goalns := d.Nanoseconds()
			prevIters := int64(b.N)
			prevns := b.duration.Nanoseconds()
			if prevns <= 0 {
				// Round up, to avoid div by zero.
				prevns = 1
			}
			// Order of operations matters.
			// For very fast benchmarks, prevIters ~= prevns.
			// If you divide first, you get 0 or 1,
			// which can hide an order of magnitude in execution time.
			// So multiply first, then divide.
			n = goalns * prevIters / prevns
			// Run more iterations than we think we'll need (1.2x).
			n += n / 5
			// Don't grow too fast in case we had timing errors previously.
			n = min(n, 100*last)
			// Be sure to run at least one more than last time.
			n = max(n, last+1)
			// Don't run more than 1e9 times. (This also keeps n in int range on 32 bit platforms.)
			n = min(n, 1e9)
			b.runN(int(n))
		}
	}
	b.result = BenchmarkResult{b.N, b.duration, b.bytes, b.netAllocs, b.netBytes, b.extra}
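To make the failure mode concrete (the numbers here are purely illustrative, not from the original report): suppose CheapRead takes about 10 ns and ExpensiveWrite about 10 µs. The predictor above only sees the timed reads, so with the default 1 s benchtime it grows n toward roughly 1e8 iterations within a few rounds. But each of those iterations also performs an untimed 10 µs write, so the final run costs on the order of 1,000 s of wall-clock time and blows past the default 10-minute test timeout.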

Maybe there could be a b.ResetDuration() that only resets the reported duration, without polluting the prediction algorithm.
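For illustration, a minimal sketch of how such a hypothetical b.ResetDuration() (it does not exist in testing today) might be used in the Read sub-benchmark above:

b.Run("Read", func(b *testing.B) {
    // The timer is never stopped, so under this proposal the wall-clock
    // cost of the writes would still feed the b.N prediction.
    for i := 0; i < b.N; i++ {
        buf.ExpensiveWrite()
    }
    b.ResetDuration() // hypothetical: reported duration covers only the reads below
    for i := 0; i < b.N; i++ {
        buf.CheapRead()
    }
})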

@gopherbot gopherbot added this to the Proposal milestone Jul 15, 2020
@martisch
Contributor

I usually use https://golang.org/pkg/testing/#B.ResetTimer after setup code and before starting the benchmark loop. Does that work better for your use case?
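For reference, that pattern looks roughly like this (a minimal sketch reusing the hypothetical buffer from the example above, with a fixed amount of setup that does not depend on b.N):

func BenchmarkRead(b *testing.B) {
    buf := NewSomeBuffer()
    for i := 0; i < 1000; i++ { // fixed-size setup, independent of b.N
        buf.ExpensiveWrite()
    }
    b.ResetTimer() // discard the setup time before the measured loop
    for i := 0; i < b.N; i++ {
        buf.CheapRead()
    }
}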

@joesonw
Author

joesonw commented Jul 15, 2020 via email

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jul 15, 2020
@rsc
Contributor

rsc commented Jul 22, 2020

I can't think of a scenario where this kind of per-size setup would lead to valid benchmark results. You're kind of lucky to be getting a timeout - the alternative would be completely bogus results.

If the setup for a loop of length b.N takes longer than the thing you're timing, benchmarks are going to have a hard time getting a precise read. The implication is that the actual operation you are doing is changing based on b.N, which invalidates the entire benchmark timing computation: a buffer of 1 GB is going to have very different cache performance than a buffer of 1 MB, but the benchmark routine depends fundamentally on b.N=1<<30 doing exactly the same operation as b.N=1<<20, just 1<<10 more times.

@joesonw
Author

joesonw commented Jul 23, 2020

@rsc It's NOT per-size setup; it's just what ended up happening (and should not have). I was merely trying to measure read and write performance separately. The current benchmark prediction algorithm is not able to handle that correctly when the expensive operations are excluded from the timer.

I revised the code comment in the description; it should be more intuitive now.

@rsc
Contributor

rsc commented Jul 14, 2021

There are two problems here.

The first problem is that if read is cheap, then testing needs to do more of them in the loop to get an accurate per-operation time. What you are suggesting would choose a very small N in the Read benchmark because of the (untimed) expensive writes, but then the calculation (timed span) / b.N would be much less accurate than usual. To make this benchmark produce accurate results, you need to find a way to move the expensive setup out of the benchmark entirely.

The second problem is that, despite the claims to the contrary, this is absolutely per-size setup. If you do N ExpensiveWrite followed by N Reads, then between those two there is some buffer somewhere containing an amount of memory that scales with N. That is exactly what you can't do in benchmarks, because the work involved in storing the memory will have different characteristics for different N. To make this benchmark produce accurate results, the actual work has to be the same, just repeated N times.

A more accurate benchmark would look like:

var initBuf = ...set up buffer contents with 1000 writes...

func BenchmarkRead(b *testing.B) {
	var buf Buffer
	for i := 0; i < b.N; i++ {
		if i%1000 == 0 {
			buf.Reset(initBuf)
		}
		buf.CheapRead()
	}
}
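For concreteness, here is a runnable variant of that sketch; bytes.Buffer stands in for the original Buffer type, and the refill condition and payload size are assumptions, not part of the original example:

package buf_test

import (
	"bytes"
	"testing"
)

// Fixed setup data, produced once and independent of b.N.
var initBuf = bytes.Repeat([]byte("somepayload."), 1000)

func BenchmarkRead(b *testing.B) {
	var buf bytes.Buffer
	tmp := make([]byte, 16)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if buf.Len() < len(tmp) {
			// Refill from the fixed setup data; the refill cost is the
			// same every time and does not scale with b.N.
			buf.Reset()
			buf.Write(initBuf)
		}
		buf.Read(tmp) // the cheap read being measured
	}
}

The key property is the one described above: the work done per timed iteration is the same regardless of b.N, and the occasional refill adds only a bounded, N-independent cost.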

@rsc rsc changed the title from "proposal: allow testing.B to use real execution duration to predicate b.N" to "proposal: testing: allow B to use real execution duration to decide N" Jul 14, 2021
@rsc rsc moved this from Incoming to Active in Proposals (old) Jul 14, 2021
@rsc
Contributor

rsc commented Jul 14, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Jul 21, 2021
@rsc
Contributor

rsc commented Jul 21, 2021

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Likely Decline to Declined in Proposals (old) Jul 28, 2021
@rsc
Contributor

rsc commented Jul 28, 2021

No change in consensus, so declined.
— rsc for the proposal review group
