testing: Feature Request: Better flaky test support #27181

hklai · 2018-08-23T20:22:39Z

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (`go version`)?

go version go1.10 darwin/amd64.

Does this issue reproduce with the latest release?

N/A

What operating system and processor architecture are you using (`go env`)?

```
go version go1.10 darwin/amd64
hklai-macbookpro2:e2e hklai$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/hklai/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/hklai/go"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/3q/dk8r61l944q42b8g7d2bq57r005d_5/T/go-build136728580=/tmp/go-build -gno-record-gcc-switches -fno-common"
```

What did you do?

N/A

What did you expect to see?

Better flaky test support, as outlined in the following areas:

- Provide a way to annotate flaky tests. For example, the testing package could provide a flaky() method, and Go developers could add a line like "testing.flaky()" to a test method to indicate that the test is flaky. This could be done with other mechanisms as well, but it needs to be expressible at the test method level (i.e. a build tag will not work).
- "go test" retries flaky tests natively if they are marked as flaky.
- Test options to control the above retry behavior (e.g. how many times to retry until the test passes, the delay between retries, etc.).
- Test retry should work with the -json option, so that downstream processing can determine whether a test was retried and how many times it was retried.

I understand Bazel provides some form of flaky test support, but it does not work well for us because:

- Bazel's flaky support is per test rule, which is not granular enough.
- We consciously decided against using Bazel in our project.

In the absence of the above, our only option is to implement a wrapper program that calls "go test" and then runs "go test" again on the failed tests that are known to be flaky.

What did you see instead?

N/A


hklai · 2018-08-23T20:52:46Z

And to be clear, flaky tests are bad and they should ideally be fixed. However, flaky tests do happen from time to time, and a feature like this is important for productivity and velocity.

And here is the use case I have in mind:

When a test starts to flake, developers can quickly annotate it and "go test" can take care of retrying it, reducing the chance of the test job failing, which could otherwise block other developers from merging code. This reduces disruption to the development cycle, since developers do not need to manually re-run these flaky test jobs, and their presubmit jobs are more likely to pass.
The developers who are responsible for the flaky tests will fix them and eventually remove the flaky annotation.

Another approach is to completely disable flaky tests in CI, but that clearly reduces test coverage, and the tests can break further before they are restored.

CI can be more resilient to flaky tests, and developers won't get blocked just because a test starts to flake.

This is also why test retries should be reported, so that people can take action to address them as well.

ianlancetaylor · 2018-08-23T20:57:49Z

I disagree pretty strongly. The point of a test is to detect an error. A flaky test is worthless, because you don't learn anything when it fails. It just wastes time. If a test is potentially flaky, then it should be rewritten. A typical simple rewrite is to loop enough times to make it extremely unlikely that all iterations will fail for flaky reasons.

With that attitude, I think the only support the testing package needs for flaky tests is t.Skip, and we already have that.

hklai · 2018-08-23T22:11:09Z

As mentioned above, t.Skip() (i.e. disabling the test) is an alternative, but it reduces coverage. The related feature can be broken by other changes, and it becomes more difficult to re-enable the test.

A flaky test is bad, but it is not always worthless. As the project and the number of developers grow, a test that flakes 10% of the time affects more and more developers, but the fact that it passes 90% of the time is still a valuable signal. I totally agree that all flaky tests need to be fixed or addressed, but until then, we also want to minimize the impact on the developers who hit that 10% of failing runs.

Correct me if I am wrong, but I think the simple rewrite suggestion won't work when TestMain() is involved. A test may require setup/cleanup steps in TestMain() that are not available in the test method itself.

ianlancetaylor · 2018-08-23T23:21:48Z

Yes, if TestMain is involved, using a loop is harder, but it's still possible.

If your goal for "things are OK" is that your test passes 90% of the time, then you should write your test that way. Don't write it so that go test fails 10% of the time. At least, that's how I see it.

hklai · 2018-08-23T23:54:20Z

No, I am not saying that tests passing 90% of the time is OK.

Having flaky tests is not an end state. Over the course of development, some tests are going to become flaky for various reasons. The end goal is to fix the cause of flakiness and make the tests not flaky, but until then, we are hoping to minimize the impact on developers (e.g. failing presubmits) without having to remove the test.

ianlancetaylor · 2018-08-24T05:08:25Z

Just as the language does not permit unused imports or unused variables, I don't see a compelling reason that the testing package should provide additional mechanisms for permitting flaky tests.

mark-rushakoff · 2018-08-24T15:29:46Z

I don't see why this couldn't be solved in a third party package.

```go
func TestThingThatMayFlake(t *testing.T) {
	otherpkg.RerunOnFlake(t, func(t testing.TB) {
		if err := ThingThatMayFlake(); err != nil {
			t.Fatal(err)
		}
	})
}
```
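A third-party helper like that could be written roughly as follows. This is a hypothetical sketch: `RerunOnFlake` is only named as an example in the comment, and `maxAttempts` is an assumed default. It relies on two documented properties of the testing package: testing.TB has an unexported method, so outside types can only satisfy it by embedding a testing.TB, and Fatal/FailNow stop execution via runtime.Goexit, which is why each attempt runs on its own goroutine. Only the failure-reporting methods are intercepted; calling Skip, Cleanup, and similar methods from inside the body still reaches the real test and is not handled here.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
	"testing"
)

// maxAttempts is a hypothetical default; a real package would make it configurable.
const maxAttempts = 3

// attemptT embeds the real testing.TB but overrides the failure-reporting
// methods, so failures are recorded per attempt instead of failing the test.
type attemptT struct {
	testing.TB
	mu     sync.Mutex
	failed bool
	msgs   []string
}

func (a *attemptT) record(msg string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.failed = true
	if msg != "" {
		a.msgs = append(a.msgs, msg)
	}
}

func (a *attemptT) Fail()                         { a.record("") }
func (a *attemptT) Error(args ...any)             { a.record(fmt.Sprint(args...)) }
func (a *attemptT) Errorf(format string, args ...any) { a.record(fmt.Sprintf(format, args...)) }
func (a *attemptT) FailNow()                      { a.record(""); runtime.Goexit() }
func (a *attemptT) Fatal(args ...any)             { a.record(fmt.Sprint(args...)); runtime.Goexit() }
func (a *attemptT) Fatalf(format string, args ...any) { a.record(fmt.Sprintf(format, args...)); runtime.Goexit() }

func (a *attemptT) Failed() bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.failed
}

// runAttempt runs body on its own goroutine so that Fatal/FailNow end only
// this attempt (via runtime.Goexit), not the real test goroutine.
func runAttempt(t testing.TB, body func(testing.TB)) *attemptT {
	a := &attemptT{TB: t}
	done := make(chan struct{})
	go func() {
		defer close(done)
		body(a)
	}()
	<-done
	return a
}

// RerunOnFlake reruns body until it passes or maxAttempts is reached, logging
// each failed attempt, and fails t only if every attempt failed.
func RerunOnFlake(t testing.TB, body func(testing.TB)) {
	t.Helper()
	for i := 1; i <= maxAttempts; i++ {
		a := runAttempt(t, body)
		if !a.Failed() {
			if i > 1 {
				t.Logf("passed on attempt %d of %d", i, maxAttempts)
			}
			return
		}
		t.Logf("attempt %d of %d failed: %s", i, maxAttempts, strings.Join(a.msgs, "; "))
	}
	t.Fatalf("all %d attempts failed", maxAttempts)
}
```

Logging every failed attempt with t.Logf keeps retries visible in `go test` output, which is close in spirit to the earlier request that retries be reported.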

andybons · 2018-09-04T20:36:39Z

I am in agreement with @ianlancetaylor. I don't believe we should be adding additional functionality to the go tool that placates (and inherently encourages) flaky tests.

ianlancetaylor changed the title ~~Feature Request: Better flaky test support~~ testing: Feature Request: Better flaky test support Aug 23, 2018

andybons closed this as completed Sep 4, 2018

golang locked and limited conversation to collaborators Sep 4, 2019

gopherbot added the FrozenDueToAge label Sep 4, 2019
