Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: add T.Context() accessor #16221

Closed
bradfitz opened this issue Jun 30, 2016 · 23 comments
Closed

testing: add T.Context() accessor #16221

bradfitz opened this issue Jun 30, 2016 · 23 comments

Comments

@bradfitz
Copy link
Contributor

Can we add a func (*T) Context() context.Context accessor in the testing package?

And then make each test run with a context which is canceled when it finishes.

Then goroutines created by tests can select on the test's completion (pass or fail).

Thoughts?

@bradfitz bradfitz added this to the Go1.8Maybe milestone Jun 30, 2016
@robpike
Copy link
Contributor

robpike commented Aug 22, 2016

Requirement: This CL should not add any dependencies to the testing package (other than context of course),

@bradfitz bradfitz self-assigned this Aug 22, 2016
@bradfitz
Copy link
Contributor Author

I'll prototype an implementation for review.

@broady
Copy link
Member

broady commented Aug 22, 2016

Transitive imports from context are a subset of those from testing.

@dsnet
Copy link
Member

dsnet commented Oct 6, 2016

Something to think about when implementing this. Currently, it is a bug when people call T.FatalX or T.SkipX inside a goroutine other than the main test function. The current behavior is subtly wrong, but we could consider having them cancel the context as a possible alternative to #15758

@rsc
Copy link
Contributor

rsc commented Oct 20, 2016

SGTM. Seems orthogonal to #15758. @dsnet, remember that cancelling a context is a polite request only. It doesn't force things to be torn down.

@bradfitz bradfitz changed the title proposal: testing: add T.Context() accessor? testing: add T.Context() accessor Oct 22, 2016
@gopherbot
Copy link

CL https://golang.org/cl/31724 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/32648 mentions this issue.

gopherbot pushed a commit that referenced this issue Nov 3, 2016
So that testing can use context in its public API.

For #16221.

Change-Id: I6263fa7266c336c9490f20164ce79336df44a57e
Reviewed-on: https://go-review.googlesource.com/32648
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@dsnet
Copy link
Member

dsnet commented Dec 2, 2016

I started some discussion about this feature in: https://groups.google.com/forum/#!topic/golang-dev/rS6psxIf17Q

@bcmills
Copy link
Contributor

bcmills commented Dec 2, 2016

A Context that is canceled when the test fails would be fine, but trying to use it in isolation to signal the return of the Test function seems like a mistake. Context cancellation is asynchronous, so nothing guarantees that the goroutines started as part of the test have actually returned when the test ends, which means they are likely to interfere with subsequent tests.

For the "clean up because the test is done" use-case, it seems like the testing.T needs some mechanism to wait for the cleanup to finish. You could provide a Go method for that, along the lines of errgroup but with an implicit Wait when the test returns:

func TestFoo(t *testing.T) {
  ctx := t.Context()  // Canceled when the test fails or the test function returns.
  t.Go(func() {
    doSomeBackgroundThing(ctx, t)  // The test runner will not begin the next test until this returns.
  })
  … // Run the test.
}

@dsnet
Copy link
Member

dsnet commented Dec 2, 2016

@bcmills, a Go method is very similar to my proposal in #18022.

@neild
Copy link
Contributor

neild commented Dec 2, 2016

I strongly feel that any interface which encourages allowing goroutines to outlive the test that starts them is a mistake.

@bcmills's proposal is pretty much exactly what I first thought of, but I wonder if saving a couple lines of code in tests that use concurrency is worth adding more methods to testing.T.

@bcmills
Copy link
Contributor

bcmills commented Dec 2, 2016

One alternative would be to make t.Context() crash loudly (per #15758) if any of its methods are called after the Test function returns. That would make it much easier to detect interfering goroutines, while still keeping the synchronization explicitly in the test code rather than adding more methods to testing.T.

@cstockton
Copy link

I just wanted to chime in and say I don't fully agree with @dsnet that having Context in the testing package simply "saves two lines of code".. if implemented to cancel on Fatal() the moment before Goexit() was called. Having a context that was canceled when Fatal() was called would simplify tight integration between test coverage, passing context through your systems and the synchronization primitives used to instrument it all would be easier to reason with. I personally (which I may just suck at writing Go) find it a pretty big mental drain orchestrate synchronization between things during testing. Soon as Fatal() is called and Goexit() begins Done() selects that are working in harmony with defer'd channel sends or unlocks are tricky since your defers are always behind the context Done(), which is a bit backwards from how I normally think. Start ctx, add sync prims, channels to unify context pipelines, exit context, exit channels, exit sync prims.

I think adding Context to testing makes for simpler usage for. If there is debate on when a context should be canceled maybe some form of WithErrors(ctx) or WithFatal(ctx) could be supplied to allow it to cancel on errors/fatal, and the default be on test exit. Or maybe a go test **-ctx={exit|fatal|error}** flag.

@quentinmit
Copy link
Contributor

@cstockton What is the point of cancelling the context on Fatal calls? Even if you do cancel the context, because it's asynchronous your code is not guaranteed to notice before the Goexit call runs.

@cstockton
Copy link

@quentinmit I'm not sure what you mean, calling cancel() causes the context to begin ending immediately. If you mean that anything waiting for Done() channels in a select are not guaranteed to happen immediately I understand that, the aim here is to have them happen at all. The order does not matter, just that done is called when runtime.Goexit() is, so blocked goroutines eventually wake up and exit. There are a lot of ways to illustrate this, i.e.:

func TestDeadlock(t *testing.T) {
  go t.Fatal(`oops`)
  <-t.Context().Done()
}

func TestDeadlock(t *testing.T) {
  defer func() {
    <-t.Context().Done()
  }()
  t.Fatal(`oops`)
}

Any of them are fixed immediately by calling Cancel when the test is over.

func TestNoDeadlock(t *testing.T) {
  ctx, cancel := context.WithCancel(context.Background())
  go func() {
    go cancel()
    t.Fatal(`oops`)
  }()
  <-ctx.Done()
}

@dsnet
Copy link
Member

dsnet commented Dec 19, 2016

Please note that both of your examples are racy. It is not valid to call t.Fatal in any other routine other than the original Test function. If your proposal in #18368 is based on this pattern, then it is going to be rejected since it an abuse of the testing API.

@cstockton
Copy link

The examples I posted where to show how things can deadlock, I know they are racy.. that is the entire point.

The first example is to illustrate that if you call fatal while waiting on Done(), you deadlock.

The second example shows how this can happen in defers as well, since you haven't returned to the caller yet.

Apparently I wrote more text than worth reading for you, so I'll dumb it down to this: runtime.Goexit is not friendly to deal with when you orchestrating highly concurrent testing. Having a context that will be canceled when my test fails an assertion would relieve the mental drain of writing unit tests because I have to worry about one less factor.

Share by communicating, runtime.Goexit doesn't communicate anything.

@neild
Copy link
Contributor

neild commented Dec 19, 2016

I believe t.Fatal is a bit of a distraction here, since it can only be called from the main test goroutine. If I understand @cstockton correctly, the idea is that t.Context would become done if t.Error is called from any goroutine. (This is not, AFAIK, what the original proposal did.)

I'm not certain this would be useful in the general case. A test which starts a goroutine which may produce an error will need to wait for that goroutine to exit or otherwise report completion.

To use a slightly modified version one of the above examples,

func TestDeadlock(t *testing.T) {
  go t.Error(`oops`)
  <-t.Context().Done()
}

This test reliably fails, but it cannot succeed. If we modify it so that the child goroutine does not produce an error, we need to introduce an additional synchronization mechanism; waiting on t.Context.Done does not work, because t.Context never becomes done. Once we introduce that synchronization mechanism, t.Context no longer adds any value:

func TestDeadlock(t *testing.T) {
  ctx, cancel := context.WithCancel(context.Background())
  go func() {
    defer cancel()
    if someErrorCondition {
      t.Error("oops")
    }
  }()
  <-ctx.Done()

If I am misunderstanding the argument, perhaps a more complete example demonstrating both success and failure paths would be useful.

@cstockton
Copy link

Thanks @neild for thoughtful reply, the aim of my examples was to illustrate that waiting on a context provided by T.Context() is not intuitive as implemented since it is not canceled when fatal is called. I think it the fact that dsnet focused on the example being racy, and you tried to further decompose them proves that writing tests is more difficult then it should be. Which.. does circle back to my originating point.

Having a context that was canceled when Fatal() was called would simplify tight integration between test coverage, passing context through your systems and the synchronization primitives used to instrument it all would be easier to reason with. I personally (which I may just suck at writing Go) find it a pretty big mental drain orchestrate synchronization between things during testing.

t.Fatal() races, my app races, my dependencies race, everything may race. It's a huge mental drain to orchestrate synchronization when testing it with the current implementation. You have no signal for when your test is going to exit, other than being the only one to call t.Fatal(). This makes reusing test fixtures very difficult. T.Context() helps these problems by giving you a signal to select on when things go wrong.

As for Error canceling the context, I don't think that's intuitive, I use t.Error to spit out a set of failures to dive as deep into a failure as I can before I know I shouldn't continue without serious issues. Once I call t.Fatal I know my goroutine exits, and the test is now over. That is why I expect t.Context to be canceled when Fatal is called, I canceled my test, why wouldn't the context be canceled? I believe the original implementation was otherwise fine, just add an additional cancellation condition For (Skip|Fail)Now

Finally I'm going to note this here now so I don't forget, but you may ignore this, it might not be reasonable anyways.

All this focus about T.Fatal() only being callable from goroutines outside the main goroutine has made me recognize how unfriendly it is, it's something you learn and deal with once, but thinking about having this constraint lifted would make writing tests easier if it could be sanely implemented. Maybe having it exit the goroutine t.Fatal was called from and mark the executing test belonging to T as Failed() and canceling the Context(). This would allow the originating goroutine to launch multiple test fixtures with his instance of T and select on the context in the event of failure or wait for a finishing condition.

@bcmills
Copy link
Contributor

bcmills commented Dec 19, 2016

@cstockton

You have no signal for when your test is going to exit, other than being the only one to call t.Fatal().

That's simply not true. goexit runs deferred functions: to a first approximation, all you have to do to synchronize your test correctly is to defer the cleanup.

That does mean that you have to structure your API in such a way that the deferred function call can determine when the cleanup has completed, but that's a good idea anyway: that's also what allows you to write programs with predictable memory footprints.

@cstockton
Copy link

@bcmills I'm well aware that defers are ran, or I wouldn't have posted this example:

func TestDeadlock(t *testing.T) {
  defer func() {
    <-t.Context().Done()
  }()
  t.Fatal(`oops`)
}

Which I made to illustrate that the context not being done when t.Fatal is called is not intuitive to me in the prior implementation of T.Context().

Which I brought up because of my Request:

  • Add back t.Context()
  • t.Fatal() should cancel the context, because the test is done.

Because:

I personally (which I may just suck at writing Go) find it a pretty big mental drain orchestrate synchronization between things during testing.

But instead of extrapolating it just a little along with what I said to imagine the scenarios a user could run into it's taken in literal form where it can be proved "incorrect" so the originating argument can be discarded.

Anyways, I really do have an issue with your statement:

all you have to do to synchronize your test correctly is to defer the cleanup.
That does mean that you have to structure your API in such a way that the deferred function call can determine when the cleanup has completed, but that's a good idea anyway: that's also what allows you to write programs with predictable memory footprints.

As for API design and memory footprints, we are talking about testing here. If you mean the application API's, then yes, good API design is important. There is already a mechanism in place to allow mutual communication of cancellation when an API that crosses your API's boundaries (the executing test) is done and ready to cleanup, it's called context.Context. Something the testing API is lacking. Which is why I am advocating to add it back. To make writing concurrent tests easier to reason about. I am completely perplexed how you are comfortable telling me to use a LIFO queue to manage synchronization of my tests with a hint of "design good API's". I mean, seriously?

Anyways, I think I'm done here. I made my points best I could. Thanks for taking the time to respond.

@bcmills
Copy link
Contributor

bcmills commented Dec 20, 2016

@cstockton

There is already a mechanism in place to allow mutual communication of cancellation

That's kind of my point: Context cancellation is unidirectional, not "mutual". It provides a way to notify calls that their results are no longer relevant, but does not wait for them to finish using resources. That latter step is important, not only for testing (to reduce interference between tests) but also for production code (to reduce the potential for runaway resource consumption under unusual conditions).

The usual way that we signal that a call is done using resources is by returning from it. But that's not the case for background goroutines.

I actually agree with you that a Context that is cancelled when SkipNow or FailNow is called would be useful. But it needs to be paired with synchronization in the other direction: simplifying one but not the other makes for fragile tests (and fragile APIs).

To give a concrete example: consider the race that @bradfitz fixed in change 34021. Simply canceling the acceptor goroutine would still leave the test flaky. The fix Brad implemented was to switch from an asynchronous API (calling t.Error in a background goroutine) to a somewhat more synchronous one (returning the error on a channel). That fixes the immediate flakiness, but it's still a bit leaky: on failure, it may leave a net.Conn open until the goroutine is scheduled and closes it, interfering with port assignments for subsequent tests. (In a very heavily-loaded environment with lots of similar test-cases in the same binary, that pattern can lead to port exhaustion and spurious failures.) A more robust fix for test cancellation should make it easier to eliminate that leak too.

@cstockton
Copy link

Ohh @bcmills

That's kind of my point: Context cancellation is unidirectional, not "mutual". It provides a way to notify calls that their results are no longer relevant, but does not wait for them to finish using resources. That latter step is important, not only for testing (to reduce interference between tests) but also for production code (to reduce the potential for runaway resource consumption under unusual conditions).

I didn't realize that the context interface didn't have a way for me to pass messages upstream, ignore my request. Thanks.

@golang golang locked and limited conversation to collaborators Dec 20, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants