-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
testing: shuffle seed should be different when -shuffle=on and -count flag is set #61256
Comments
@rsc forgive me for a ping but can this be included into review meetings? |
This proposal has been added to the active column of the proposals project |
There is also a strange detail that -count=3 with tests A,B,C runs A,B,C,A,B,C,A,B,C, but with benchmarks X,Y,Z you get X,X,X,Y,Y,Y,Z,Z,Z. I wonder why we do that, and whether we should run the tests A,A,A,B,B,B,C,C,C too. I agree that if we keep the tests A,B,C,A,B,C,A,B,C then we should at least shuffle the rounds independently. But maybe for -shuffle=on -count=N we should make the full list of N*M things that will run and shuffle that entire list, so there is no longer a "round" boundary anywhere. |
For me this sounds like a separate proposal (however, running benchmark N times and switching to another sounds quite natural).
Great point, this might also catch cases when running test twice in a row is buggy (pre/post-test initialisation contains a bug) |
It sounds like maybe the proposal should be that -shuffle=on -count=N handles tests and benchmarks the same way, which is make a list of all the tests (alternately, benchmarks) with N copies each, and shuffle that entire list, and then run them in that order. If you're running tests and benchmarks, they still run as two separate phases: first tests, then benchmarks. If -cpu is involved, each cpu set still runs as separate phases around the shuffled sets. So -cpu=1,2,4 runs a shuffled set with cpu=1, a shuffled set (probably differently shuffled) with cpu=2, and then another with cpu=4. We should probably check whether the current tests -count=3 doing A,B,C,A,B,C,A,B,C runs the parallel tests after each A,B,C or at the end of all of them. If the former then the shuffle will be a bit different, maybe too different. |
I haven't thought about -cpu flag. Don't have strong cons/pros why it should/not be involved here. Same for parallel tests. However, I really like that tests gonna be ran in a more arbitrary order than before. (I assume -p flag should not change the behaviour in any way) |
Just putting in my 2 cents: whenever I used shuffle I expect the order to change between runs, even (or especially) when using |
The answer appears to be that it runs A,B,C,Parallel,A,B,C,Parallel,etc. That is, parallel tests run after each A,B,C, not at the end of all of them. That's a bit unfortunate for shuffling the whole list. I think that means for tests, we either need to shuffle within each "A,B,C,Parallel" (but independently for each count), or we need to put all of the parallel tests at the end (whether or not we're shuffling). We could still do a whole-list shuffle for benchmarks. The behavior would be different between tests and benchmarks, but that's already true today, and beyond some pedagogical purity, I'm not sure it matters if they're different. |
One more option for shuffling would be to periodically toggle between (When we hit a `“release parallel” event we would unblock all of the parallel tests buffered so far, wait for them to finish, and then resume running the next possibly-non-parallel test in shuffled order.) |
Sounds like the simplest thing to do is say for tests we only shuffle between the individual sections and for benchmarks we shuffle the whole thing. |
Based on the discussion above, this proposal seems like a likely accept. |
No change in consensus, so accepted. 🎉 |
Change https://go.dev/cl/631016 mentions this issue: |
I'm curious about the motivation for the distinction. Why not shuffle sections for both? Shuffling the whole thing requires building a complete N*M list (I think, perhaps there's a nice algorithm for streaming permutations I don't know about). That's probably not a problem, but I wince a bit at -count=N allocating an slice of arbitrarily large size. |
This proposal is a clarification of the original test shuffle flag proposal #28592
Today I was debugging tests that were rarely failing due to cross-dependency between test functions. To verify the fix I noticed that
go test -shuffle=on -count=100
runs tests in the same order as the first iteration. In other words-test.shuffle
value is generated once and is reused for the other 99 test runs.This makes sense when the shuffle seed is set explicitly (
-shuffle=X
) but for random value (-shuffle=on
) we should generate a new one to increase test order randomness to cover more combinations.Current documentation for
count
andshuffle
flags doesn't mention their relation, so the proposal (if accepted) will not breakgo test
behaviour (docs from currentmaster
branch):Thank you.
The text was updated successfully, but these errors were encountered: