
math: make benchmark iterations more data-dependent #33441

Closed
smasher164 opened this issue Aug 2, 2019 · 3 comments

Comments

@smasher164
Member

As discussed in CL 127458, benchmarks should be updated to make each iteration depend on the result of the previous one. This reduces the chance of the work done in the loop being optimized away, and also feeds the benchmarks a larger input space. For instance, the FMA benchmark was changed from

func BenchmarkFma(b *testing.B) {
	x := 0.0
	for i := 0; i < b.N; i++ {
		x = Fma(E, Pi, Phi)
	}
	GlobalF = x
}

to

func BenchmarkFma(b *testing.B) {
	x := 0.0
	for i := 0; i < b.N; i++ {
		x = Fma(E, Pi, x)
	}
	GlobalF = x
}
@mundaym
Member

mundaym commented Aug 3, 2019

I think we should be careful doing this, at least when measuring the speed of operations that are only a couple of instructions:

  1. This change makes the benchmark measure the latency of the operation rather than its throughput, which may not be the right choice for every benchmark. Floating point operations often have high latency but also high throughput thanks to pipelining. Which of the two we care more about depends on the application.

  2. Making the input value dependent on the number of iterations the benchmark runs for (i.e. b.N) isn't ideal if the speed of the operation depends on said input value since it may lead to the benchmark speeding up or slowing down the longer it is run for. For example, the benchmark may end up in a steady state that happens to be an edge case where the operation is a no-op or unusually expensive.

One possible solution is to add a small array of input values (with a length of a power of 2 to avoid division instructions) and iterate over those for throughput benchmarks. The cost of the indexing and load should be low. Then add a separate latency benchmark for operations where we are particularly interested in latency (probably only the single instruction operations such as FMA, Sqrt, Abs, Copysign etc.).
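A minimal sketch of that split (the benchmark names, the Sqrt operation, and the input array are illustrative choices here, not taken from the issue):

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// GlobalF is a sink that keeps the compiler from eliminating the loop body.
var GlobalF float64

// Hypothetical input set; the length is a power of two so the wrap-around
// index is a cheap bitwise AND rather than a division.
var inputs = [4]float64{0.5, 1.0, math.Pi, 1e10}

// Throughput: iterations are independent of each other, so pipelined
// sqrt operations can overlap in the CPU.
func BenchmarkSqrtThroughput(b *testing.B) {
	x := 0.0
	for i := 0; i < b.N; i++ {
		x = math.Sqrt(inputs[i&3]) // i % 4, via bitmask
	}
	GlobalF = x
}

// Latency: each iteration consumes the previous result, which serializes
// the operations and measures the dependency-chain latency instead.
func BenchmarkSqrtLatency(b *testing.B) {
	x := 2.0
	for i := 0; i < b.N; i++ {
		x = math.Sqrt(x)
	}
	GlobalF = x
}

func main() {
	fmt.Println("throughput:", testing.Benchmark(BenchmarkSqrtThroughput))
	fmt.Println("latency:   ", testing.Benchmark(BenchmarkSqrtLatency))
}
```

Note that the latency variant still settles into a steady state (repeated Sqrt converges toward 1), which is exactly the caveat raised in point 2 above.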

@nsajko
Contributor

nsajko commented Aug 3, 2019

This CL is probably the better, more robust solution, since runtime.KeepAlive is essentially guaranteed not to be optimized out: https://go-review.googlesource.com/c/go/+/188437
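A sketch of the runtime.KeepAlive pattern (the exact benchmarks in CL 188437 may differ; the Sqrt example here is illustrative):

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"testing"
)

// runtime.KeepAlive marks its argument as live, so the compiler cannot
// treat the computation that produced it as dead code. Its parameter
// type is interface{}, so passing a float64 converts it to an interface
// value at the call site; whether that conversion boxes the float is
// the concern raised in the next comment.
func BenchmarkSqrtKeepAlive(b *testing.B) {
	for i := 0; i < b.N; i++ {
		runtime.KeepAlive(math.Sqrt(math.Pi))
	}
}

func main() {
	fmt.Println(testing.Benchmark(BenchmarkSqrtKeepAlive))
}
```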

@smasher164
Member Author

If there's no boxing that happens with runtime.KeepAlive, then @nsajko's CL is probably a reasonable option. Especially because as @mundaym points out, we would have to first differentiate and duplicate the benchmarks based on the metric we care about (latency vs throughput), and then define additional input values for each of those benchmarks to avoid reaching a steady state. That seems like a lot of work for little return.

@golang locked and limited conversation to collaborators Aug 4, 2020