testing: benchmark regexp doesn't filter top level benchmarks #20589

Closed
josharian opened this issue Jun 6, 2017 · 12 comments

josharian (Contributor) commented Jun 6, 2017

package p

import "testing"

func BenchmarkMapA(b *testing.B) {}
func BenchmarkMapB(b *testing.B) {
	b.Run("C", func(b *testing.B) {})
}
$ go test -bench="Map/C" x_test.go 
goos: darwin
goarch: amd64
BenchmarkMapA-8   	2000000000	         0.00 ns/op
BenchmarkMapB/C-8 	2000000000	         0.00 ns/op
PASS
ok  	command-line-arguments	0.020s

The docs for -bench say:

	-bench regexp
	    Run (sub)benchmarks matching a regular expression.
	    The given regular expression is split into smaller ones by
	    top-level '/', where each must match the corresponding part of a
	    benchmark's identifier.

So I expected only benchmarks matching Map in the first part and C in the second part to run, namely BenchmarkMapB/C. But BenchmarkMapA ran as well.

This arose because I wanted to run only the runtime map sub-benchmarks named "Int64", and the natural command go test -bench="Map/Int64" -run=NONE runtime did not work.

cc @mpvl

mvdan (Member) commented Jun 6, 2017

Note that this happens with tests too, using the same example but written with tests instead of benchmarks:

$ go test -v -run Map/C
=== RUN   TestMapA
--- PASS: TestMapA (0.00s)
=== RUN   TestMapB
=== RUN   TestMapB/C
--- PASS: TestMapB (0.00s)
    --- PASS: TestMapB/C (0.00s)
PASS
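
For reference, the test variant of the example would be a file along these lines (a sketch; only the benchmark version is shown in this thread):

package p

import "testing"

func TestMapA(t *testing.T) {}
func TestMapB(t *testing.T) {
	t.Run("C", func(t *testing.T) {})
}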

I always thought this was by design, since one can do stuff like this to avoid all sub-tests:

$ go test -v -run /XXX
=== RUN   TestMapA
--- PASS: TestMapA (0.00s)
=== RUN   TestMapB
--- PASS: TestMapB (0.00s)

In other words, I understood /subtest-match to only filter sub-tests, not regular tests. But looking at the wording, I would think you're right.

meirf (Contributor) commented Jun 12, 2017

I'd like to more precisely define what we want. (Everything below applies to both tests and benchmarks.)

Normalization of the '/'-separated pattern

  • If the given pattern starts with '/', there is an implicit empty string before this first '/'.
  • Every pattern can be thought of as having an implicit infinite "//////..." suffix, with each '/' separated by the empty string.
  • We can then split this normalized (theoretical) string on '/' and have a subregex for each level of any arbitrarily sized tree.

Normalization of the test name

  • If a subtest has no explicit children, it implicitly has a child with an empty-string name at each depth, ad infinitum.

Intra-depth match

If the test's root is at depth zero and the first subregex of the normalized pattern is at index zero, we can define an intra-depth match as the subregex at index i matching a substring of the name of a given subtest at depth i. Note that we consider the empty string to be a valid substring.

Runnability of a test

A subtest is runnable if there is a path from the root to infinity containing that subtest and every subtest in that path has an intra-depth match.
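
A rough Go sketch of how this rule could be expressed (a hypothetical helper, not the testing package's actual matcher; the path representation is an assumption):

package match

import (
	"regexp"
	"strings"
)

// runnable reports whether the (sub)test whose deepest explicit name is
// given as path elements (e.g. ["MapB", "C"] for BenchmarkMapB/C) matches
// the '/'-separated pattern under the rule above: levels beyond the test's
// depth count as implicit empty-string children, and the subregex at each
// level must match. A parent would then run whenever some path through it
// matches.
func runnable(pattern string, path []string) (bool, error) {
	for i, sub := range strings.Split(pattern, "/") {
		re, err := regexp.Compile(sub)
		if err != nil {
			return false, err
		}
		elem := "" // implicit empty-string child below the test's depth
		if i < len(path) {
			elem = path[i]
		}
		if !re.MatchString(elem) {
			return false, nil
		}
	}
	return true, nil
}

Under this reading, runnable("Map/C", []string{"MapA"}) is false while runnable("Map/C", []string{"MapB", "C"}) is true.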

Examples mentioned above, and whether they follow this definition

  • The output of go test -bench="Map/C" x_test.go is a violation, because BenchmarkMapA has an implicit empty-string test at depth 1 and the subregex 'C' does not match any substring of the empty string.
  • go test -v -run Map/C is a violation for a similar reason.
  • go test -v -run /XXX is not a violation. The 0th subregex is the empty string, which matches a substring of every test name, and TestMapB/C is excluded because 'XXX' does not match any substring of 'C'.

Do you agree with this definition? @josharian @ianlancetaylor

(I'd like to work on this.)

josharian (Contributor, Author):

Seems fairly reasonable, but I'd like @mpvl to weigh in, since he designed the current behavior. Maybe also @rsc (?).

rsc (Contributor) commented Jun 13, 2017

The comment by @meirf appears to be a full redesign of the matching semantics. I'm sorry, but that's simply not on the table. If there are specific changes you want to see, maybe we can talk about those, but I'd rather not start with something that requires me to mentally compare the current behavior and the proposed behavior to see how they differ. Make a list of differences, please.

Also, regrettably, the focus above on "what we want" may not be terribly relevant. The design of the pattern matching is constrained by the fact that the hierarchy of tests and subtests (and also benchmarks and sub-benchmarks) is only discovered by executing the parent levels of the hierarchy. If you have -run=/Foo, then since the only way to find out which tests have a sub-test "Foo" is to run them, it follows that all the top-level tests match this. We could hide that fact by not reporting the test results, but the fact is that the tests would still run, consuming CPU and possibly generating non-t.Logged output. Better to be honest.
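
A minimal sketch of the kind of test that makes this unavoidable, using a hypothetical loadShardNames helper whose results are only known at run time:

package p

import "testing"

// loadShardNames stands in for any runtime source of sub-test names
// (a file, a database, generated cases); the names cannot be known
// without executing the parent test.
func loadShardNames() []string { return []string{"Foo", "Bar"} }

func TestShards(t *testing.T) {
	for _, name := range loadShardNames() {
		t.Run(name, func(t *testing.T) {})
	}
}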

In the case of benchmarks, we could potentially be more constraining, by forcing b.N to 0 during those top-level benchmarks when we're just exploring to find a sub-benchmark named Foo. Maybe we should do that, but note that the result would be necessarily inconsistent with the test situation. I would see defending the inconsistency as better than today, but it's important to realize the inconsistency.

And again it is critically important to identify what you propose to change, not to construct a whole new semantics and require diffing it against the current one.

meirf (Contributor) commented Jun 16, 2017

Sorry for using new semantics and putting the diffing/parsing burden on the reader.

The only difference I am aware of between the status quo and the definition from my first comment is the one @rsc mentioned: "The design of the pattern matching is constrained... realize the inconsistency."

Status quo
A subtest is run if there is a path from the root to the subtest, inclusive, and every subtest in that path has a match.
(Initial) Proposal
A subtest is run if there is a path from the root to infinity containing that subtest and every subtest in that path has a match.

The difference is more general than just the highest-level test/benchmark (@josharian's complaint): for example, a matching subtest can be 10 levels deep and still be run even though the pattern's 11th level does not match any of its subtests (or it has no subtests at all).

Resolution of Diff/New Proposal
As noted by @rsc, this change should not be made for tests, and for benchmarks it would necessarily introduce an inconsistency.
The documentation for tests has:

-run regexp
    Run only those tests and examples matching the regular expression.
    For tests the regular expression is split into smaller ones by
    top-level '/', where each must match the corresponding part of a
    test's identifier.

This has the same issue @josharian brought up regarding benchmarks. Specifically, one would think that a test needs a match at every level of the pattern in order to be run. So for the short term, the resolution could be to change the documentation for both tests and benchmarks. If the benchmark code gets a change, that doc change for benchmarks can be reverted. I'd be happy to upload a minor change to the language used in the docs if there are no objections.

rsc (Contributor) commented Jun 16, 2017

(I understand that the problem is general but let's keep using top-level vs sub-test in the discussion, with everyone understanding that we really mean the general problem and that any fixes must apply generally.)

I now remember that we tried b.N=0 for probing like this, but it broke too many ordinary benchmarks that mishandled N=0 (they expected their loops to run at least once). So now we use N=1 for the probe but don't print benchmark output. So for benchmarks, I think things are working as best they can.
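
For illustration, a hypothetical benchmark of the kind that broke under an N=0 probe, since it assumes the loop body runs at least once:

package p

import (
	"strconv"
	"testing"
)

func BenchmarkAtoi(b *testing.B) {
	v := -1
	for i := 0; i < b.N; i++ {
		v, _ = strconv.Atoi("42")
	}
	// With b.N == 0 the loop never executes, v stays -1, and this check
	// fails even though nothing is actually wrong.
	if v != 42 {
		b.Fatalf("unexpected result %d", v)
	}
}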

For testing, as we've established, there's no way to avoid running the top-level test. So for testing, I also think things are working as best they can.

Here's an example transcript. If you think anything should change, please explain how this would behave differently:

$ cat x_test.go
package x

import "testing"

func TestX(t *testing.T) {
	t.Logf("X running")
	t.Run("Y", func(t *testing.T) {
		t.Logf("Y running")
	})
}

func BenchmarkX(b *testing.B) {
	b.Logf("X running N=%d", b.N)
	b.Run("Y", func(b *testing.B) {
		b.Logf("Y running N=%d", b.N)
	})
}
$ go test -run=/Z -bench=/Z x_test.go
PASS
ok  	command-line-arguments	0.012s
$ go test -v -run=/Z -bench=/Z x_test.go
=== RUN   TestX
--- PASS: TestX (0.00s)
	x_test.go:6: X running
--- BENCH: BenchmarkX
	x_test.go:13: X running N=1
PASS
ok  	command-line-arguments	0.014s
$ go test -v -run=/Y -bench=/Y x_test.go
=== RUN   TestX
=== RUN   TestX/Y
--- PASS: TestX (0.00s)
	x_test.go:6: X running
    --- PASS: TestX/Y (0.00s)
    	x_test.go:8: Y running
goos: darwin
goarch: amd64
BenchmarkX/Y-8 	2000000000	         0.00 ns/op
--- BENCH: BenchmarkX/Y-8
	x_test.go:15: Y running N=1
	x_test.go:15: Y running N=100
	x_test.go:15: Y running N=10000
	x_test.go:15: Y running N=1000000
	x_test.go:15: Y running N=100000000
	x_test.go:15: Y running N=2000000000
--- BENCH: BenchmarkX
	x_test.go:13: X running N=1
PASS
ok  	command-line-arguments	0.015s
$ 

I sent a CL clarifying the docs and will let that CL mark this issue closed.

gopherbot:

CL https://golang.org/cl/46030 mentions this issue.

josharian (Contributor, Author):

> So now we use N=1 for the probe but don't print benchmark output. So for benchmarks, I think things are working as best they can.

One minor improvement would be to probe only once when -count > 1. But probably not worth the bother. Thanks for updating the docs.

josharian (Contributor, Author):

Sorry, Russ, but even with your new wording, the original result looks wrong to me.

Note that BenchmarkMapA gets run with b.N up to 2000000000 and its output gets displayed. I would expect BenchmarkMapA to be run only with b.N=1 to discover sub-benchmarks, and for its output to be suppressed.

josharian reopened this Jun 16, 2017
rsc (Contributor) commented Jun 16, 2017

Thanks Josh. I see what happened and will update the code to match the description. Re your next-to-last comment, the probe already only happens once regardless of -count.

rsc (Contributor) commented Jun 16, 2017

/cc @mpvl, CL on the way for you shortly. This is the context.

gopherbot:

CL https://golang.org/cl/46031 mentions this issue.

golang locked and limited conversation to collaborators Jun 20, 2018