Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: add -shuffle and -shuffleseed to shuffle tests #28592

Closed
cristaloleg opened this issue Nov 4, 2018 · 36 comments
Closed

testing: add -shuffle and -shuffleseed to shuffle tests #28592

cristaloleg opened this issue Nov 4, 2018 · 36 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Proposal Proposal-Accepted
Milestone

Comments

@cristaloleg
Copy link

This is a revive of #10655

Motivation

Consider the following code & corresponding tests:

package pkg

// build and cache regexp, reuse between clients
var re *regexp.Regexp

type Client struct {
	// ...
}

func NewClient(pattern string) *Client {
	if re == nil {
		re = regexp.MustCompile(pattern)
	}
	return &Client{
		// ...
	}
}

func (c *Client) HasPattern(s string) error {
	if !re.MatchString(s) {
		return errors.New("meh, incorrect param")
	}
	return nil
}
package pkg

func TestFoo(t *testing.T) {
	s := NewClient("^foo.*")
	err := s.HasPattern("foo123")
	if err != nil {
		t.Errorf("expected to pass")
	}
}

func TestFooBar(t *testing.T) {
	s := NewClient("^foobar.*")
	err := s.HasPattern("foobar1123")
	if err != nil {
		t.Errorf("expected to pass")
	}
}

Those tests pass, everything looks fine, but they're order dependent. Running them in another order will fail.

To prevent such hidden and hard to debug mistakes we need to make the order of test random for each test build.

Current workarounds

  1. Manual ordering of tests.
  2. Boilerplate code to specify the order of tests.
  3. For table-driven tests we can use a map instead of slice, ex:
func TestSomething(t *testing.T) {
	testCases := map[name]struct{
		a,b int64
		res int64
	}{
		“simple case”: {1, 2, 3},
		“less simple”: {3, 3, 23},
	}

	// due to behaviour of map test cases will be 
	// in a different order for each run
	// but this solution is limited and not suitable for test funcs
	// aka `func TestXXX(t *testing.T)`
	for name, tc := range testCases {
		t.Logf(“test: %s”, name)

		res := foo(tc.a, tc.b)
		if res != tc.res {
			t.Errorf(“want %v, got %v, res, tc.res)
		}
	}
}

Possible solution

We need to specify a test run to execute tests with a random/desired order:

  1. -shuffle to run tests with a random order (used random seed may or should be printed to output depending or not on a -v flag).
  2. -seed <int> to specify a user-defined seed to have an ability repeat a failed build.

Open questions

  1. Should we randomize order of benchmarks and examples? (looks like not)
  2. This makes test runs 'flaky' but this helps to reveal possible implementation caveats.
  3. This might not happen for 1.12 'cause proposal is submitted too late.

PS. if/when it will be accepted - will be happy to work on it.

@gopherbot gopherbot added this to the Proposal milestone Nov 4, 2018
@mvdan
Copy link
Member

mvdan commented Nov 5, 2018

Interesting problem - reminds me of go test -race a bit. I'm a bit worried that every repository will now be expected to run three test commands on its CI; go test; go test -race; go test -shuffle.

On the other hand, making go test randomize the order could make its output non-deterministic.

I wonder if this is something we could enable when one runs go test ./.... That is, since a package's test output and results are printed all at once, we could run the tests in random order and then sort the results before printing them. That would be ideal in my opinion, since every existing user would benefit and we wouldn't need an extra flag.

That suggestion wouldn't apply to go test or go test -v, since those do need to run the tests in order - the results are printed as they happen.

@cristaloleg
Copy link
Author

That is, since a package's test output and results are printed all at once

Might be a problem with #24929 😉

and then sort the results before printing them

But how we can know an order that triggered a test run failure? Sort only success run?

@mvdan
Copy link
Member

mvdan commented Nov 5, 2018

Might be a problem with #24929

Oh, I thought that proposal would affect go test -v, not go test -v ./.... If it affects both, then you're right.

But how we can know an order that triggered a test run failure? Sort only success run?

Potentially. Or one could take the usual approach like with any other test flake - run the tests over and over again with -count or stress until we find the failure again or we're convinced that it's fixed.

@rsc
Copy link
Contributor

rsc commented Dec 12, 2018

More command-line flags seems problematic.
What if instead we added a method on testing.M so that TestMain could call m.Shuffle(time.Now().UnixNano()) in tests that wanted to opt in to randomized execution?
Those tests could also define their own seed flags instead of having to add one to all tests.

@rsc
Copy link
Contributor

rsc commented Jan 23, 2019

@aclements and @dr2chase and I were talking about the problem of benchmarks being sensitive to earlier benchmarks, which this would partly address too.

What if testing.M exposed a slice of test identifiers (strings or some object with a Name method) and a slice of benchmark identifiers, and you could reorder or filter or duplicate those, by rewriting the slices, before calling m.Run? We could still give an m.Shuffle helper for that common operation.

@cristaloleg
Copy link
Author

Quick draft of what you're saying. Did I get it right?

package testing

type M struct { ... }

type Tester interface {
	Name() string
}

func (m *M) Tests() []Tester { ... }
func (m *M) Benchmarks() []Tester { ... }

func (m *M) Shuffle(tests []Tester) []Tester { ... }

@rsc
Copy link
Contributor

rsc commented Jan 30, 2019

@cristaloleg No, it would have to be fields in M (or have setters too), so that they could be rewritten before calling m.Run.

@aclements
Copy link
Member

@aclements and @dr2chase and I were talking about the problem of benchmarks being sensitive to earlier benchmarks, which this would partly address too.

For benchmarks, this is specifically in the context of iterated runs. Ideally, I'd want a way to take, say, N runs of a set of M benchmarks and shuffle the whole N*M sequence of runs. This would get much more robust benchmark results when benchmarks aren't totally isolated (which they never are, at least because of GC state).

For reproducibility of order in the benchmark case, you could imagine printing the sequence randomization seed as one of the key/value headers in the output.

@ianlancetaylor ianlancetaylor changed the title proposal: testing: add shuffle flag proposal: testing: let TestMain access and change the list of tests Mar 20, 2019
@ianlancetaylor
Copy link
Contributor

Retitled per suggestion above.

@andybons
Copy link
Member

Seems like we'd like to do this, but we're blocked on an API definition. Is that correct?

@cristaloleg
Copy link
Author

@andybons looks so

@mark-rushakoff
Copy link
Contributor

The use case that interests me from this proposed API: the ease of partitioning a single test package, that contains many individual tests, that each take a long time to run. For example, if my codebase has one particular test package that takes several minutes when run with -race -count=1, I could have 4 different CI jobs in parallel that each run 1/4 of the tests in the bottlenecked package.

Without an API like this proposal, I would have to run go test -list=. to discover the top-level tests, and then find an appropriate set of regexes to pass to each invocation of go test -run=... such that the top-level tests are split roughly evenly.

@rsc
Copy link
Contributor

rsc commented Jul 16, 2019

The original problem identified was reordering tests to shake out unwanted dependencies between test functions. We expanded to being able to set up the order of repetitions as well. Those still seem worth doing.

Sharding is possible for top-level tests but more difficult once subtests are involved (subtests basically can't get sharded because you don't know until you enter the outer test that they exist, and the outer test might have done expensive setup that you don't want to repeat on each shard).

All those still seem like enough reason to do this. So what would the API look like?

Currently testing.M says

type M struct {
	// Has unexported fields.
}
    M is a type passed to a TestMain function to run the actual tests.

func (m *M) Run() int

The idea I think would be to add exported slice fields Tests and Benchmarks and maybe also a convenience method Shuffle. It is unclear what the types of those slice elements are. There are also examples. Right now we have:

$ go doc testing.InternalTest
package testing // import "testing"

type InternalTest struct {
	Name string
	F    func(*T)
}
    An internal type but exported because it is cross-package; part of the
    implementation of the "go test" command.

$ go doc testing.InternalExample
package testing // import "testing"

type InternalExample struct {
	Name      string
	F         func()
	Output    string
	Unordered bool
}

$ go doc testing.InternalBenchmark
package testing // import "testing"

type InternalBenchmark struct {
	Name string
	F    func(b *B)
}
    An internal type but exported because it is cross-package; part of the
    implementation of the "go test" command.

$

Should we de-internalize these (we can type alias InternalBenchmark = Benchmark for compatibility) and then use them?

The idea is that you'd reorder however you like and then call Run.

Or maybe we could make the lists be []Test where you can only find out the name?

type Test interface {
    Name() string
}

That would hide F, which may be preferable. (Calling one of these F's is non-trivial and perhaps impossible depending on what it needs.)

Or should we just shuffle by default and not add any new API? That approach would require a new flag for -seed still, and it would not solve the benchmark iteration issue.

@rsc
Copy link
Contributor

rsc commented Sep 25, 2019

My previous comment asked three questions and no one has suggested any answers.

Do people care enough about this suggestion that we should continue with it?
Or is it just not that important / would not have enough widespread use?

@misha-ridge
Copy link

As a user in need of randomized order of tests, I do care about having the feature, though I have no preferences about the interface — any of the options above would allow me to achieve the result: shuffle the tests, and rerun them in the same order later if needed.

@rsc rsc changed the title proposal: testing: let TestMain access and change the list of tests proposal: testing: add -shuffle and -shuffleseed to shuffle tests Oct 2, 2019
@rsc
Copy link
Contributor

rsc commented Oct 2, 2019

It sounds like the generality we got to is overkill - no one seems to want it.

The original proposal was to add -shuffle and -seed. It would probably be good to make the latter -shuffleseed (default a random seed) so that they appear next to each other in the help output.

Changing the title back to focus on those flags.

Any thoughts, @mpvl, @robpike, @aclements?

@robpike
Copy link
Contributor

robpike commented Oct 3, 2019

How about just one flag that handles a default value? -shuffle=seed. If seed is 0 (say), it means be random.

@aclements
Copy link
Member

Or, rather than a special value to indicate a random seed, -shuffle=random or -shuffle=N? I think people are likely to use a seed of 0 intending to get a deterministic (but shuffled) run.

@josharian
Copy link
Contributor

Or just -shuffle and -shuffle=N?

@cespare
Copy link
Contributor

cespare commented Oct 3, 2019

@josharian I think that means you couldn't write -shuffle N, which would be inconsistent with the other flags. (And generally, it seems better to stick to the flag package's convention where a flag is either boolean or it's not.)

@rsc
Copy link
Contributor

rsc commented Oct 9, 2019

The -shuffle N problem is especially bad here, since nearly all tests don't check for extra command-line arguments, so the N would become a non-flag argument and the test would ignore it entirely, silently.

There need to be three different settings here:

  • By default, no shuffling.
  • With an easy flag, shuffle with a new random seed each time.
  • With more work, shuffle with a specific seed.

Using one flag is tough to get all three of those. I suppose you could have -shuffle=N where =0 (default) means no shuffle, =1 means a random seed, and =othernumber means that's the seed.

Another options is =-1 (default) means no shuffle, =0 means random seed, and =othernumber means that's the seed. But -shuffle=0 looks a lot like -shuffle=false.

Or maybe a string: -shuffle=off (default) means no shuffle, -shuffle=on means random seed, and =N means that's the seed.

That last one, the string, seems like the nicest one of the three. Thoughts?

@aclements
Copy link
Member

I definitely prefer the string-based option, since it avoids bestowing special meaning to a few integer values.

@mark-rushakoff
Copy link
Contributor

Agreed with the string-based option.

When using -shuffle=on, will the randomly selected seed be printed in the event of a test failure?

The original proposal said:

(used random seed may or should be printed to output depending or not on a -v flag).

But it seems important that a user is able to reproduce a failure with a known -shuffle=N, after triggering a failure using -shuffle=on.

Assuming the shuffle seed is printed in some circumstances, I'm not sure how that affects test2json output. Is it a new Action?

@rsc rsc modified the milestones: Proposal, Go1.15 Nov 6, 2019
@rsc rsc changed the title proposal: testing: add -shuffle and -shuffleseed to shuffle tests testing: add -shuffle and -shuffleseed to shuffle tests Nov 6, 2019
@rsc rsc added the NeedsFix The path to resolution is known, but the work has not been done. label Nov 6, 2019
@cristaloleg
Copy link
Author

I would like to work on it.

@odeke-em
Copy link
Member

odeke-em commented Apr 7, 2020

@cristaloleg all yours, please go ahead and sorry if I might sound like am putting pressure on you, but the Go development tree will close down at the end of April, so please let us know if you don't have time to work on it. Thank you!

cristaloleg pushed a commit to cristaloleg/go that referenced this issue Apr 28, 2020
Adds flag to go test which will randomize tests before they are run.
Flag can have next values: off(default), on and N, where N is a random seed.

Fixes golang#28592
@ianlancetaylor ianlancetaylor modified the milestones: Go1.15, Go1.16 May 19, 2020
@odeke-em
Copy link
Member

Punting this to Go1.17 as we didn’t get any action during Go1.16 development. @cuonglm here is something we could tack onto our plate for Go1.17 if you are interested.

@odeke-em odeke-em modified the milestones: Go1.16, Go1.17 Dec 25, 2020
@strideynet
Copy link

@cristaloleg are you still picking this up?

@cristaloleg
Copy link
Author

@strideynet for the 1.17 looks possible. Will try to make early March.

@strideynet
Copy link

Perfect, if you are unsure about it I am more than willing to take this ticket up @cristaloleg

@cristaloleg
Copy link
Author

@strideynet feel free to pick it up, I'm out of time and motivation. Sorry for a late ping.

@tpaschalis
Copy link
Contributor

@strideynet I could also take a stab if you're busy; let me know what you think! I'll wait for a few days and hunt for some more information in the meantime.

@strideynet
Copy link

@tpaschalis I'm just undergoing a change in employment so it'll be a while until I'm freed up, so it's all yours

@tpaschalis
Copy link
Contributor

@strideynet I wish you the best of luck mate, hope you land a great offer! I've already got a POC which seems to work; I'll iron it out and send it for review this week.

@gopherbot
Copy link

Change https://golang.org/cl/310033 mentions this issue: testing: add -shuffle flag to alter execution order of tests and benchmarks

@ianlancetaylor ianlancetaylor modified the milestones: Go1.17, Backlog Apr 23, 2021
@golang golang locked and limited conversation to collaborators Apr 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Proposal Proposal-Accepted
Projects
None yet
Development

No branches or pull requests