proposal: cmd/go: add go test -long #34707

danderson · 2019-10-05T05:12:51Z

Abstract

Add go test -long as a gate for (very) long tests.

Problem statement

I write MetalLB, a set of Go programs that deploy into Kubernetes and do marginally useful things within. Currently, it's tested entirely by short and fast unit tests, but I would very much like to add integration tests that run the code against Kubernetes.

This involves spinning up a local Kubernetes cluster, which takes several minutes, and building and pushing several Docker images, taking another minute or so. This means these tests take 2-5 minutes to even start running, and probably another minute or two to finish.

With the current Go tool UX, the default behavior is the opposite of desirable for rapid iteration, and I often find myself accidentally spinning up this giant coal plant of a test during iterative development. This is an irritating series of little paper cuts.

Empirically, people are solving this on their own. In my code I check for a "RUN_SLOW_TESTS" env var, and use that to t.Skip the long tests. It works well enough, but everyone implementing their own hacks for this means the developer experience is fragmented, and it's easier to miss the fact that long tests exist at all, because the "run absolutely everything" switch is different for everyone.

Proposal

In a perfect world, remove go test -short and replace it with go test -long, flipping the default to be "you must choose to run the very slow things. Given that this breaking change is likely not palatable, I propose adding go test -long in addition to the existing go test -short.

In effect, this creates 3 bands of tests: short, medium, and long, with "short and medium" being the default go test runs. -short and -long act as lowering or raising a bar of "how much I'm willing to wait": -short only allows the short band to run, whereas -long allows short, medium and long to run.

Again I'm not in love with the 3 bands and would prefer 2 with a flipped default. Given that there would be three, I'd propose the following rules of thumb for what tests go where:

go test -short should be as fast as the compiler, fast enough that I can run compile+test interactively as part of an IDE.
go test should be fast enough that the developer's attention doesn't wander off to Twitter. Call it less than 10s total.
go test -long is for persnickety CI pipelines and coffee breaks. Minutes, tens of minute, hours if you really must (though I sincerely hope nobody does that last one).

cc @bradfitz

The text was updated successfully, but these errors were encountered:

bitfield · 2019-10-05T15:18:25Z

If this were implemented, running simply go test would miss some tests, and you'd have to know that long tests were present (and about the -long flag!) in order to run them. The safer option is to always run all tests unless you knowingly opt out of the long ones, per the status quo.

mvdan · 2019-10-06T16:41:33Z

I have mixed feelings about this proposal. I too wish that long/integration tests were more standard across Go projects. On the other hand, I'm not sure that go test -long would bring much value. At least from my personal experience, long tests tend to require extra secrets or resources to run, and users don't tend to spare minutes to run a project's tests.

It seems to me like go test -long would mainly be a minor convenience for a project's own active contributors, but not really for users running go test in general.

danderson · 2019-10-06T18:39:13Z

@bitfield The problem is that, empirically, tests are already getting skipped. I have an if os.Getenv("SKIP_SLOW_TESTS_" != "" { t.Skip(...) }, and I'm told that others have implemented similar empirical methods for excluding long tests from go test.

So, the question is whether it's better to keep a dozen different empirical methods, or have a single way to find and run tests that won't run by default.

@mvdan It's true that large tests tend to require more infrastructure. In my case, it requires a bunch of extra installed programs (docker, kind, ...) to build the test harness. So it comes back to: is there value in standardizing the discovery of such tests, even though running them may take more than just go test -long ?

This proposal was filed out of frustration at setting up yet another janky harness for E2E tests, but I'm not deeply married to solving the problem, so this discussion is very useful!

iand · 2019-10-07T07:56:55Z

The usual way of segregating tests is via build tags. Add a tag to mark long tests and then run them with go test -tags long

mvdan · 2019-10-07T08:05:06Z

is there value in standardizing the discovery of such tests

I think so. Perhaps we can simply standardize an existing way to do it, such as @iand's build tag idea, or your environment variable idea. This would still make most projects alike, without having to overcomplicate the go test setup and flags for everyone.

If we go with a build tag, my only suggestion would be to make it more specific than long, such as longtest.

bcmills · 2019-10-07T13:32:57Z

CC @julieqiu @dmitshur

bcmills · 2019-10-07T13:33:27Z

It seems clearly useful to be able to skip non-hermetic tests (such as those that require unavailable tools), although for those you can generally use exec.LookPath or similar probes to skip the test if a dependency is missing.

With a bit of work (and assuming that we fix #33976 soon), you can even make the skip contingent on your own module not being the main module (so that developers on your project always run all of your tests, and the tests fail a needed tool is missing). See golang.org/x/tools/testenv for an example.

So I think we should be careful not to conflate “long” with “non-hermetic”.

bcmills · 2019-10-07T13:43:03Z

go test -long is for persnickety CI pipelines and coffee breaks. Minutes, tens of minute, hours if you really must (though I sincerely hope nobody does that last one).

If you have three granularities, why not more? In the Go standard library, we have at least four:

go test -short is about as fast as the compiler for most packages, but for more involved binaries (such as cmd/go) it degrades to “not so long as to be a distraction”. If the short tests pass, the toolchain is likely good enough to bootstrap itself but may still contain subtle bugs and regressions.
go test is fast-ish for most packages, but slow for large ones. It runs enough tests to catch most regressions in ordinary configurations.
all.bash runs go test -short for most packages, plus several additional tests for integration with other tools (such as cgo) and configurations (such as varying values of GOMAXPROCS).
GO_BUILDER_NAME guards platform-specific or very expensive tests. Some of those only run on specific builders in the Go project's CI.

That's not to say that every project should follow this structure. However, it does suggest that -long is not particularly universal or general.

bcmills · 2019-10-07T13:46:20Z

That said: the user has an explicit control to tell you how long they expect tests to run: the -timeout flag, which should always be accurate as of Go 1.13 (see #28147).

So one option you could pursue would be to tailor the set of Skip calls based on flag.Lookup("test.timeout").Value: if it indicates a very high timeout, run the very long tests, and if it indicates a very short timeout, then skip them.

mvdan · 2019-10-07T14:29:58Z

I really like @bcmills's last idea. After all, -short and -long are all relative, and different projects will have different points of reference.

I'd love it if this was high-level, like testing.Short. For example, imagine if one could query the order of magnitude that the test is able to take to run for. Imagine an API like func (*testing.T) Magnitude() time.Duration, which could return powers of ten (100 * time.Millisecond, time.Second, 10 * time.Second, etc). This could be the order of magnitude right below -test.timeout, for example.

The reason I bring up orders of magnitude is because I can often tell if tests will take milliseconds, seconds, or minutes to run. But it's hard to say if they will take 1s or 2s, because the CPU/disk/network/etc of different machines could easily vary by 5x. Using a cutoff that directly used -test.timeout would probably result in failures on slow machines.

ianlancetaylor · 2019-10-07T14:31:35Z

See also #30595 (a rejected proposal) and #31310.

bcmills · 2019-10-07T14:42:45Z

But it's hard to say if they will take 1s or 2s, because the CPU/disk/network/etc of different machines could easily vary by 5x. Using a cutoff that directly used -test.timeout would probably result in failures on slow machines.

It shouldn't depend much on the machine: you should only be running the test at all if the timeout is 1–2 orders of magnitude longer than how long you expect the test to run, for precisely that reason.

My touchstone for this is user behavior: if a user ran your program, how long would they wait before hitting ⌃C or canceling a page-load? That's how long the test should be willing to wait, too.

There is a bit of a question as to how you divvy up -test.timeout among all of the tests in the file, but arguably the easy heuristic is to always base the decision on how much time is left rather than how much time you started with.

bcmills · 2019-10-07T14:44:07Z

arguably the easy heuristic is to always base the decision on how much time is left rather than how much time you started with.

On that note, see #28135, which I guess is on me to implement. 😅

mvdan · 2019-10-07T14:49:58Z

base the decision on how much time is left rather than how much time you started with.

If that's true, then go test could run different tests depending on how lucky or unlucky you get, no? That could be confusing.

I fear that we might be getting off-track; would like to hear if @danderson's problems would be fixed by your timeout/deadline ideas, or if they go beyond test times. It would solve most of the issue for me at least, since I've many times seen non-hermetic tests that run decently fast.

cespare · 2019-10-07T16:42:00Z

You can define the -long flag inside your test package and use it as go test -long today.

bcmills · 2019-10-07T16:46:53Z

@cespare, note that if the pattern passed to go test included a mix of your own tests and tests of your dependencies, go test -long […] would cause any dependency's test that does not also define that flag to fail.

rsc · 2019-11-13T19:10:09Z

I don't believe we should add complexity here.
We already have -short=false which you'd think meant long already.
Two standard levels should be enough.
The next number after two is infinity. Let's not go there.

And as @cespare says, you can already define -verylong yourself when you need it.

rsc · 2019-11-20T18:56:43Z

This seems like a likely decline.
The road this leads down is many many levels, not just one more.
It's easy for individual tests to add their own flags, or even for testing helper libraries to do that.

Leaving open for a week for final comments.

mmcloughlin · 2019-11-20T19:33:57Z

It's easy for individual tests to add their own flags, or even for testing helper libraries to do that.

This caught my eye since I've implemented something much like this in an internal/test package of one of my projects. Since it's currently private (hopefully I'll release it one day), I'll paste it here in case somebody finds it useful. I like @bcmills's suggestion of querying the test.timeout flag, I may incorporate that somehow.

My use case is for testing/quick or "limited fuzzing" style tests. If you have a trial function trial func(t *testing.T) bool you can call test.Repeat(t, trial) and it will call the trial function repeatedly until the allotted time runs out or the trial function returns false. The allotted time is set based on -short, -long or -stress flags.

As for adding -long in the standard library, it might be nice to standardize, but I don't feel strongly. I will just echo the point made above that I believe it is not possible for a library to replicate the behavior of the -short flag. In particular I can do go test -short ./..., whereas go test -long ./... will not do what you'd expect.

package test

import (
	"flag"
	"testing"
	"time"
)

// Custom test command line flags.
var (
	long   = flag.Bool("long", false, "enable long running tests")
	stress = flag.Bool("stress", false, "enable stress tests (implies -long)")
)

// timeallowed returns how long a single test is allowed to take.
func timeallowed() time.Duration {
	switch {
	case testing.Short():
		return time.Second / 10
	case *long:
		return 30 * time.Second
	case *stress:
		return 2 * time.Minute
	default:
		return time.Second
	}
}

// Long reports whether long tests are enabled.
func Long() bool {
	return *long || *stress
}

// Stress reports whether stress tests are enabled.
func Stress() bool {
	return *stress
}

// RequireLong marks this test as a long test. Test will be skipped if long
// tests are not enabled.
func RequireLong(t *testing.T) {
	if !Long() {
		t.Skipf("long test: use -long or -stress to enable")
	}
}

// RequireStress marks this test as a stress test. Test will be skipped if stress
// tests are not enabled.
func RequireStress(t *testing.T) {
	if !Stress() {
		t.Skipf("stress test: use -stress to enable")
	}
}

// Repeat the given trial function. The duration is controlled by custom
// command-line flags. The trial function returns whether it wants to continue
// testing.
//
//	-short		run for less time than usual
//	-long		allow more time
//	-stress		run for an extremely long time
func Repeat(t *testing.T, trial func(t *testing.T) bool) {
	start := time.Now()
	d := timeallowed()
	n := 1
	for time.Since(start) < d && trial(t) {
		n++
	}
	t.Logf("%d trials in %s", n, time.Since(start))
}

// Trials returns a function that repeats f.
func Trials(f func(t *testing.T) bool) func(t *testing.T) {
	return func(t *testing.T) {
		Repeat(t, f)
	}
}

danderson · 2019-11-20T19:51:47Z

It seems the consensus is that there's no value in standardizing a way to run specific classes of tests, and that should just be left to each project to figure out. I'm not particularly happy with that, but I can certainly live with it. No objection to closing from me.

rsc · 2019-11-27T18:09:04Z

No change in consensus, so declining.

gopherbot added this to the Proposal milestone Oct 5, 2019

gopherbot added the Proposal label Oct 5, 2019

rsc changed the title ~~Proposal: add go test -long~~ proposal: cmd/go: add go test -long Nov 20, 2019

rsc added the Proposal-FinalCommentPeriod label Nov 20, 2019

andybons mentioned this issue Nov 20, 2019

proposal: review meeting minutes #33502

Open

rsc closed this as completed Nov 27, 2019

rsc changed the title ~~proposal: cmd/go: add go test -long~~ proposal: cmd/go: add go test -long Nov 27, 2019

bcmills mentioned this issue Dec 13, 2019

cmd/go: invalidate cache entries for test runs using different timeouts #36134

Closed

dmitshur mentioned this issue May 13, 2020

x/build: reduce ambiguity in long test of main Go repo #39054

Open

golang locked and limited conversation to collaborators Nov 26, 2020

gopherbot added the FrozenDueToAge label Nov 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: cmd/go: add go test -long #34707

proposal: cmd/go: add go test -long #34707

danderson commented Oct 5, 2019

bitfield commented Oct 5, 2019

mvdan commented Oct 6, 2019

danderson commented Oct 6, 2019

iand commented Oct 7, 2019

mvdan commented Oct 7, 2019

bcmills commented Oct 7, 2019

bcmills commented Oct 7, 2019 •

edited

Loading

bcmills commented Oct 7, 2019 •

edited

Loading

bcmills commented Oct 7, 2019

mvdan commented Oct 7, 2019

ianlancetaylor commented Oct 7, 2019

bcmills commented Oct 7, 2019

bcmills commented Oct 7, 2019

mvdan commented Oct 7, 2019

cespare commented Oct 7, 2019

bcmills commented Oct 7, 2019 •

edited

Loading

rsc commented Nov 13, 2019

rsc commented Nov 20, 2019 •

edited

Loading

mmcloughlin commented Nov 20, 2019 •

edited

Loading

danderson commented Nov 20, 2019

rsc commented Nov 27, 2019

proposal: cmd/go: add go test -long #34707

proposal: cmd/go: add go test -long #34707

Comments

danderson commented Oct 5, 2019

Abstract

Problem statement

Proposal

bitfield commented Oct 5, 2019

mvdan commented Oct 6, 2019

danderson commented Oct 6, 2019

iand commented Oct 7, 2019

mvdan commented Oct 7, 2019

bcmills commented Oct 7, 2019

bcmills commented Oct 7, 2019 • edited Loading

bcmills commented Oct 7, 2019 • edited Loading

bcmills commented Oct 7, 2019

mvdan commented Oct 7, 2019

ianlancetaylor commented Oct 7, 2019

bcmills commented Oct 7, 2019

bcmills commented Oct 7, 2019

mvdan commented Oct 7, 2019

cespare commented Oct 7, 2019

bcmills commented Oct 7, 2019 • edited Loading

rsc commented Nov 13, 2019

rsc commented Nov 20, 2019 • edited Loading

mmcloughlin commented Nov 20, 2019 • edited Loading

danderson commented Nov 20, 2019

rsc commented Nov 27, 2019

bcmills commented Oct 7, 2019 •

edited

Loading

bcmills commented Oct 7, 2019 •

edited

Loading

bcmills commented Oct 7, 2019 •

edited

Loading

rsc commented Nov 20, 2019 •

edited

Loading

mmcloughlin commented Nov 20, 2019 •

edited

Loading