
cmd/vet: flag benchmarks that don’t use b #38677

Open · josharian opened this issue Apr 26, 2020 · 15 comments

Labels: Analysis (issues related to static analysis: vet, x/tools/go/analysis), Proposal, Proposal-Accepted
Milestone: Backlog

Comments

@josharian (Contributor) commented Apr 26, 2020

I propose that cmd/vet flag any benchmark that doesn’t have any reference to b in its body.

  • Frequency: Low among experienced gophers, but not an uncommon mistake among new gophers.
  • Correctness: Benchmarks written without using b do not function as benchmarks. Often this gets caught by a user who is puzzled by the output, but not always, particularly if the benchmark is slow.
  • Precision: 100%, or very near it. You cannot write a correct benchmark without using b.N, b.Run, b.RunParallel, or somehow passing b or one of its fields to another function to do that for you.
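
For a concrete illustration of the mistake being described (a hypothetical example, not one from the ecosystem), the first benchmark below compiles and runs, but its body executes once regardless of b.N, so the reported ns/op is meaningless; the second is the corrected form:

package demo

import (
    "sort"
    "testing"
)

// BenchmarkSortBroken never references b: the body runs exactly once no
// matter what b.N is, so the timings that `go test -bench` prints measure
// essentially nothing.
func BenchmarkSortBroken(b *testing.B) {
    data := []int{5, 3, 1, 4, 2}
    sort.Ints(data)
}

// BenchmarkSortFixed repeats the measured work b.N times, as the testing
// package expects, so it can compute a stable ns/op.
func BenchmarkSortFixed(b *testing.B) {
    for i := 0; i < b.N; i++ {
        data := []int{5, 3, 1, 4, 2}
        sort.Ints(data)
    }
}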

The implementation should be fairly straightforward. Pick out benchmarks. Pick out the identifier (usually b). Look for any uses of that identifier. If none, complain.
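
A minimal sketch of that walk, using go/ast directly rather than the x/tools analysis framework the real check would live in (file handling and names here are illustrative only; a production check would use type information and handle shadowed identifiers):

package main

import (
    "fmt"
    "go/ast"
    "go/parser"
    "go/token"
    "os"
    "strings"
)

func main() {
    for _, path := range os.Args[1:] {
        if err := reportUnusedB(path); err != nil {
            fmt.Fprintln(os.Stderr, err)
        }
    }
}

// reportUnusedB parses one _test.go file and reports Benchmark functions
// whose single parameter (the *testing.B, usually named b) is never
// mentioned anywhere in the body.
func reportUnusedB(path string) error {
    fset := token.NewFileSet()
    f, err := parser.ParseFile(fset, path, nil, 0)
    if err != nil {
        return err
    }
    for _, decl := range f.Decls {
        fn, ok := decl.(*ast.FuncDecl)
        if !ok || fn.Body == nil || !strings.HasPrefix(fn.Name.Name, "Benchmark") {
            continue
        }
        params := fn.Type.Params.List
        if len(params) != 1 || len(params[0].Names) != 1 {
            continue // not of the form func BenchmarkX(b *testing.B)
        }
        name := params[0].Names[0].Name
        used := false
        ast.Inspect(fn.Body, func(n ast.Node) bool {
            if id, ok := n.(*ast.Ident); ok && id.Name == name {
                used = true
            }
            return !used // stop descending once a use is found
        })
        if !used {
            fmt.Printf("%s: %s never uses its *testing.B parameter %q\n",
                fset.Position(fn.Pos()), fn.Name.Name, name)
        }
    }
    return nil
}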

cc @robpike @mvdan

gopherbot added this to the Proposal milestone Apr 26, 2020
@mvdan (Member) commented Apr 26, 2020

I've actually seen this twice in the past few months. Something else I tend to see is slow benchmarks that do a lot of heavy initialization work, but don't take care of resetting the timer.

I'm not sure how far vet can get here, though. Writing a decent benchmark is more than just using b, and I worry that the vet check could give authors a false sense of security once their benchmark is "fixed".

Wouldn't there be a way for the testing package to catch this? For example, for any benchmark that runs quickly with N=1 (which should be the vast majority of them), a run with N>100 should take at least 10x as long. If it does not, we are either benchmarking something else (like the initialization code, or nothing at all), or the benchmark doesn't use b properly.

It would be hard to cover very expensive benchmarks by default there, such as those that take one second per iteration, since the default is -benchtime=1s. But I think that's an okay tradeoff.
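
As a standalone illustration of that heuristic (the testing package has no such hook today; the 10x threshold, trial sizes, and names below are arbitrary), the idea is roughly:

package main

import (
    "fmt"
    "time"
)

// timeIt returns the wall-clock duration of f. A real check would average
// several runs to reduce noise; one run is enough to show the idea.
func timeIt(f func()) time.Duration {
    start := time.Now()
    f()
    return time.Since(start)
}

// looksLinear applies the heuristic described above to a benchmark body:
// if running with n=100 is not much slower than running with n=1, the body
// is probably not looping over n at all (or is dominated by setup work).
func looksLinear(body func(n int)) bool {
    t1 := timeIt(func() { body(1) })
    t100 := timeIt(func() { body(100) })
    return t100 >= 10*t1
}

// work is a stand-in for one iteration of the code under test.
func work() int {
    s := 0
    for i := 0; i < 1_000_000; i++ {
        s += i
    }
    return s
}

func main() {
    broken := func(n int) { work() } // ignores n, like a benchmark that ignores b.N
    looping := func(n int) {
        for i := 0; i < n; i++ {
            work()
        }
    }
    fmt.Println("broken scales with n:", looksLinear(broken))  // false: flag it
    fmt.Println("looping scales with n:", looksLinear(looping)) // true
}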

@josharian (Contributor, Author) commented Apr 26, 2020

Writing a decent benchmark is more than just using b, and I worry that the vet check could give a false sense of security when their benchmark is "fixed".

True. I'd hope that being made aware of the problem and referred (by the vet error message) to some docs would set most people on a better path.

Wouldn't there be a way for the testing package to catch this?

There's been scattered discussion among @rsc, @bcmills, @aclements, me, and maybe others about having the benchmark framework detect non-linearity. But that's a heavy lift, and you'll get false positives, particularly since you have to base the decision on a single run.

There's also been discussion of having the benchmark framework print all intermediate b.N runs and have benchstat detect non-linearity, bi-modal distributions, and other distribution problems. But then you have to know about benchstat, at which point, you probably aren't the target audience for this vet check.

I definitely wish we could somehow build benchstat into package testing. But the "before"/"after" problem is a thorny one. (See my failed machinations in #10930.)

@robpike (Contributor) commented Apr 27, 2020

I'm not in favor of this. It seems a minor problem and opening vet up to analyzing people's tests leads us down a long and slippery path.

@mvdan (Member) commented Apr 27, 2020

@josharian your thoughts are very interesting - you've clearly been thinking about this for a while :)

I agree that, in general, writing good benchmarks in Go requires reading quite a lot of material and using multiple tools together carefully. Even though it's a difficult task, I still think that the only way to truly solve that problem is to make the tooling better, whether by making the testing package smarter or by bundling benchstat into testing.

To me, fuzzing and benchmarking are kind of similar, in the sense that they are very helpful and necessary for many developers, but they are difficult to use correctly simply because not enough hours have gone into properly including them as part of go test.

Adding small checks to vet could possibly help in the near term, but we can only catch the most trivial mistakes there. So while this could meet the bar for inclusion in vet, I don't think it's the direction we want to be heading in.

@bcmills (Contributor) commented Apr 27, 2020

It seems a minor problem and opening vet up to analyzing people's tests leads us down a long and slippery path.

A Benchmark function that doesn't depend on b.N at all is guaranteed not to produce valid results.

In contrast, a Test function that ignores its *testing.T could potentially be valid: it might indicate a test whose failure mode is (say) an uncaught panic that terminates the test program with a non-zero status.
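
A contrived example of that shape (Decode here is a hypothetical stand-in for any function whose only interesting failure mode is a panic):

package demo

import "testing"

// Decode is a hypothetical helper; imagine it panics on malformed input.
func Decode(data []byte) map[string]interface{} {
    return map[string]interface{}{string(data): nil}
}

// TestDecodeDoesNotPanic never references t, yet it is a legitimate test:
// if Decode panics, the test binary exits with a non-zero status and the
// failure is still reported.
func TestDecodeDoesNotPanic(t *testing.T) {
    Decode([]byte(`{"ok": true}`))
}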

So I don't think there is a slippery slope here: the argument for Benchmark functions already does not apply to Test functions.

@gopherbot commented

Change https://golang.org/cl/230978 mentions this issue: testing: fail benchmarks that don't loop over b.N

@rsc (Contributor) commented Apr 29, 2020

We do vet API usage, checking things like printf format strings.
And we vet test code, checking things like t.Fatalf's format strings.
In general it seems OK to vet misuse of test APIs, same as other APIs,
provided the checks meet the same bar.

That said, I am not convinced this specific check needs to be in vet.
Checking "did it use b at all" would catch some but not all problems
and would be easily defeated. It doesn't seem specific enough to me.
And I think the mistake can be detected more precisely at runtime instead.

As Josh noted, we've talked before about having a linearity check.
That can be a bit sensitive to timings but might be worth doing anyway.
Whether to do that is maybe borderline pending experiments
(that none of us have ever made time for).

But a highly accurate linearity check is not needed to detect
"this line (plotting t against b.N) has slope 0 instead of slope 1".
I sent CL 230978.
With that change, a vet check doesn't seem worthwhile.
(The only benefit would be that the vet check would catch
the bug in a benchmark that is never actually run, but if you
never run your benchmark, do you really care if it has a bug?)

rsc added this to Incoming in Proposals (old) Jun 10, 2020
@rsc (Contributor) commented Jun 10, 2020

Assuming we get CL 230978 cleaned up and submitted, anyone think we should still discuss changing cmd/vet?

@josharian (Contributor, Author) commented

If we had this to do over again, it would be nice to do something more like b.RunParallel. Instead of exposing b.N, have the benchmark call b.Next() until it returns false. You could expose a Count method that returns an int for benchmarks that need to pre-size auxiliary structures. Then detecting this situation would be trivial and immediate (b.Next never got called). And we could probably do away with b.ResetTimer, since we could wait to start the timer until the first b.Next call.
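
Under that hypothetical API, a benchmark might read something like this (to be clear, testing.B has no Next or Count method today; encode and sample are placeholder names):

// Hypothetical shape only: b.Next and b.Count do not exist in package testing.
func BenchmarkEncode(b *testing.B) {
    results := make([][]byte, 0, b.Count()) // pre-size using the total iteration count
    for b.Next() {                          // timer would start at the first Next call
        results = append(results, encode(sample))
    }
    _ = results
}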

But I guess that's neither here nor there.

Assuming we get CL 230978 cleaned up and submitted, anyone think we should still discuss changing cmd/vet?

It seems to me that this is an empirical question, and the relevant data point (do we still need it?) is best answered by getting CL 230978 in and then waiting to find out.

@rsc (Contributor) commented Jun 28, 2023

Austin pointed out various problems with CL 230978 so I ended up writing a vet check after all. Results of running that vet check on the public Go ecosystem are at https://gist.github.com/rsc/4ba2eacb2d1a00339f4539e7b1bee576. There is one false positive I need to correct involving assigning b to a global that is then read by another function called via yet another function, but other than that, the check seems quite accurate and useful.

@rsc (Contributor) commented Jun 28, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc (Contributor) commented Jul 12, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc (Contributor) commented Jul 26, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

rsc changed the title from "proposal: cmd/vet: flag benchmarks that don’t use b" to "cmd/vet: flag benchmarks that don’t use b" Jul 26, 2023
rsc changed the milestone from Proposal to Backlog Jul 26, 2023
@ChrisHines (Contributor) commented

Austin pointed out various problems with CL 230978 so I ended up writing a vet check after all. Results of running that vet check on the public Go ecosystem are at https://gist.github.com/rsc/4ba2eacb2d1a00339f4539e7b1bee576. There is one false positive I need to correct involving assigning b to a global that is then read by another function called via yet another function, but other than that, the check seems quite accurate and useful.

@rsc

I was just looking through the results you posted from your prototype vet check and I see what looks like more false positives than the one you described. Take entry (9) in the list: https://go-mod-viewer.appspot.com/gonum.org/v1/gonum@v0.13.0/internal/asm/f64/bench_other_test.go#L347

The relevant bits of the flagged code are:

func BenchmarkDivTo100000(t *testing.B) { benchDivTo(DivTo, 100000, t) }

func benchDivTo(f func(dst, a, b []float64) []float64, sz int, t *testing.B) {
	dst, a, b := z[:sz], x[:sz], y[:sz]
	for i := 0; i < t.N; i++ {
		f(dst, a, b)
	}
}

I see this pattern in other entries on your list: the *testing.B is passed to a helper function that contains the b.N loop (here the parameter is named t, so the loop reads t.N). I found more examples of the same pattern just by clicking through a few entries; they are below. So far all of them use a name other than b to hold the *testing.B in the helper function.

@timothy-king (Contributor) commented

@ChrisHines I agree that those look like false positives and should not be flagged. The final implementation should probably be conservative when a callee may use b.
