testing: when using a custom TestMain, m.Run does not return if one of the tests it runs panics #37206

orlangure · 2020-02-13T08:12:28Z

What version of Go are you using (`go version`)?

$ go version
go version go1.13.7 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/yury/Library/Caches/go-build"
GOENV="/Users/yury/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/yury/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/yury/go/src/github.com/orlangure/myproject/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/p1/rjgq2gp55pj58ckbsn94yfz80000gn/T/go-build564127360=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I wrote a function, a test for it, and TestMain to perform setup and teardown for testing this function:

main.go

package main

import "fmt"

func main() {
	fmt.Println("vim-go")
}

func p() {
	panic("foo")
}

p_test.go

package main

import "testing"

func TestP(t *testing.T) {
	p()
}

main_test.go

package main

import (
	"fmt"
	"os"
	"testing"
)

func TestMain(m *testing.M) {
	os.Exit(testMain(m))
}

func testMain(m *testing.M) int {
	setup()
	defer teardown()

	return m.Run()
}

func setup() {
	fmt.Println("setting up")
}

func teardown() {
	fmt.Println("tearing down")
}

What did you expect to see?

I expected to see "setting up" and "tearing down" at some point, and a panic "foo" message with a stack trace.

What did you see instead?

Only setup and panic output appeared:

setting up
--- FAIL: TestP (0.00s)
panic: foo [recovered]
        panic: foo

goroutine 19 [running]:
testing.tRunner.func1(0xc0000b6100)
        /usr/local/go/src/testing/testing.go:874 +0x3a3
panic(0x1111220, 0x116b4d0)
        /usr/local/go/src/runtime/panic.go:679 +0x1b2
akeyless.io/akeyless-main-repo/go/src/t.p(...)
...
FAIL

The program probably called os.Exit at some point while running the tests, or panicked in a separate goroutine with no way for me to recover. I would expect both setup and teardown functions to be called at some point.


dmitshur · 2020-02-15T00:26:27Z

I've verified m.Run indeed doesn't return if one of the tests it runs causes a panic by doing:

func testMain(m *testing.M) int {
        defer func() {
                if e := recover(); e != nil {
                        fmt.Println("there was a panic")
                }
        }()
        setup()
        defer teardown()
        return m.Run()
}

The string "there was a panic" doesn't get printed.

It does get printed if there is a panic before m.Run():

func testMain(m *testing.M) int {
	defer func() {
		if e := recover(); e != nil {
			fmt.Println("there was a panic")
		}
	}()
	setup()
	defer teardown()

	panic("about to m.Run")
	return m.Run()
}

The code in testing that orchestrates everything and handles all the edge cases is quite complex and subtle. I'm not sure if it's feasible to make m.Run return an exit code in this case. If it is possible, then I'm not sure if it's desirable to do so. It may be a good idea to document that m.Run doesn't return if a test panics, but maybe we don't want to commit to that being a part of the testing API that we can't change.

This needs further investigation.

/cc @mpvl @josharian per owners.

bcmills · 2020-02-18T17:57:38Z

#34129 is somewhat relevant, although the implementation for that issue probably will not address this one.

(CC @changkun @abuchanan-nr)

gopherbot · 2020-02-19T00:01:59Z

Change https://golang.org/cl/219977 mentions this issue: testing: allow m.Run return if a test panics

changkun · 2020-02-19T00:04:29Z

@bcmills Thanks for CC'ing me. I basically agree with @dmitshur. The Go team must decide either to document the behavior or to allow testing.M.Run to return properly if there is a test panic.

For the following code snippet:

package main

import (
	"fmt"
	"testing"
)

func TestMain(m *testing.M) {
	setup()
	defer teardown()
	m.Run()
}
func TestP(t *testing.T) {
	panic("foo")
}
func setup() {
	fmt.Println("setup()")
}
func teardown() {
	fmt.Println("teardown()")
}

outputs:

setup()
--- FAIL: TestP (0.00s)
panic: foo [recovered]
        panic: foo

goroutine 19 [running]:
testing.tRunner.func1(0xc0000b8100)
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:874 +0x3a3
panic(0x11111a0, 0x116b390)
        /usr/local/Cellar/go/1.13.8/libexec/src/runtime/panic.go:679 +0x1b2
_/Users/changkun/Desktop/testing.TestP(0xc0000b8100)
        /Users/changkun/Desktop/testing/main_test.go:14 +0x39
testing.tRunner(0xc0000b8100, 0x114f9b8)
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:909 +0xc9
created by testing.(*T).Run
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:960 +0x350
exit status 2
FAIL    _/Users/changkun/Desktop/testing        0.295s

If we decide to allow testing.M.Run to return, then it will execute teardown():

setup()
--- FAIL: TestP (0.00s)
panic: foo [recovered]
        panic: foo

goroutine 18 [running]:
testing.tRunner.func1.1(0x1110c20, 0x11690c0)
        /Users/changkun/dev/go-gerrit/src/testing/testing.go:941 +0x355
testing.tRunner.func1(0xc0000ce120)
        /Users/changkun/dev/go-gerrit/src/testing/testing.go:945 +0x427
panic(0x1110c20, 0x11690c0)
        /Users/changkun/dev/go-gerrit/src/runtime/panic.go:967 +0x15d
_/Users/changkun/Desktop/testing.TestP(0xc0000ce120)
        /Users/changkun/Desktop/testing/main_test.go:14 +0x39
testing.tRunner(0xc0000ce120, 0x114aaa8)
        /Users/changkun/dev/go-gerrit/src/testing/testing.go:997 +0xdc
created by testing.(*T).Run
        /Users/changkun/dev/go-gerrit/src/testing/testing.go:1048 +0x357
teardown()
exit status 2
FAIL    _/Users/changkun/Desktop/testing        0.163s

ianlancetaylor · 2020-02-24T17:09:05Z

We can never make this perfect. If some code in a test calls

    go func() { panic("die die die") }()

then m.Run is not going to return no matter what we do.

The only question here is how we should handle a test that panics in the goroutine used to run the test. Should we try to handle that specific case? Or should we just treat it like a panic in a different goroutine, which is in effect what we do now?

I don't have a strong opinion about that. But the testing package is already fairly baroque. Is it really worth complicating it further to handle one specific case when we can't handle other similar cases?

changkun · 2020-02-24T23:16:06Z

@ianlancetaylor

No, we can't do anything about it yet.

I didn't write any tests for user goroutine panics and left this case to the CL's reviewers, and I think this particular case can simply be ignored because this type of panic happens in a user goroutine. The testing package has done everything it could achieve (handling panics that happen directly in tests and subtests).

If we are interested in handling this particular case, more investigation could be done in subsequent CLs.

ianlancetaylor · 2020-02-25T02:02:19Z

I suppose I'm asking for opinions.

Is it worth complicating the testing package to return from m.Run if a panic occurs directly in the test, given that we can't do that if a test starts a goroutine that panics?

dmitshur · 2020-02-25T02:13:04Z

Thank you for explaining the current situation and trade-offs in #37206 (comment), @ianlancetaylor.

My opinion is that we should hold off on complicating the testing package until a time when all panics can be handled (since doing it for only some goroutines isn't very worthwhile), and document the rationale for m.Run not returning in the case of a panic so we can look it up in the future. If it's viable to document it publicly without locking in the current behavior, that would be better, but otherwise it can be documented internally in the testing package.

changkun · 2020-02-25T06:26:29Z

I was offering an opinion.

This particular case can simply be ignored because this type of panic happens in a user goroutine.
If we are interested in handling this particular panic case, more investigation could be done in subsequent CLs.

bcmills · 2020-02-25T14:05:42Z

If I get CL 134395 cleaned up and merged, it will be fairly straightforward for users to use an errgroup to structure their code such that all goroutines that may panic propagate that panic back to the main goroutine.

Moreover, I suspect that the vast majority of tests do not explicitly spawn goroutines, let alone goroutines that may panic. It seems reasonable for users to expect that their best-effort cleanup using defer will actually be invoked most of the time.

bcmills · 2020-02-25T14:26:29Z

At least part of the complexity of the implementation seems to arise because the runtime does not provide a mechanism for a program to re-raise an existing panic without altering its stack trace. (I've discussed that deficiency with at least @aclements before, but I don't see an issue filed for it.)

If we provided a general mechanism to re-panic without altering the stack trace, then I think the testing package would not need nearly so much complexity on top of that.
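
Until the runtime offers such a mechanism, one possible workaround is to record the stack at the moment of recovery and carry it along with the panic value. The sketch below is only an illustration under that assumption; it is not what the testing package or CL 219977 does, and `capturedPanic`, `catch`, `StackContains`, and `explode` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/debug"
)

// capturedPanic pairs a recovered panic value with the stack that was
// recorded while the panicking frames were still on the goroutine's stack.
// (Hypothetical type for illustration.)
type capturedPanic struct {
	Value any
	Stack []byte
}

// StackContains reports whether the recorded stack mentions name.
func (p *capturedPanic) StackContains(name string) bool {
	return bytes.Contains(p.Stack, []byte(name))
}

// catch runs fn. It returns nil if fn completes normally; if fn panics,
// it returns the panic value together with the stack captured inside the
// deferred function, before the stack is unwound.
func catch(fn func()) (cp *capturedPanic) {
	defer func() {
		if r := recover(); r != nil {
			cp = &capturedPanic{Value: r, Stack: debug.Stack()}
		}
	}()
	fn()
	return nil
}

// explode stands in for test code that panics unexpectedly.
func explode() { panic("foo") }

func main() {
	cp := catch(explode)
	fmt.Println(cp.Value)                    // prints "foo"
	fmt.Println(cp.StackContains("explode")) // prints "true"
}
```

The captured value can later be reported or re-raised elsewhere without losing the original frames, at the cost of carrying the trace as data rather than as a real runtime traceback.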

(CC @danscales)

changkun · 2020-02-25T18:27:15Z

If we provided a general mechanism to re-panic without altering the stack trace, then I think the testing package would not need nearly so much complexity on top of that.

This might already be off-topic: has there been any existing discussion on registering a global panic handler? It seems (server-side) Go users nowadays avoid using the panic/recover pair, just because they do not want their service to go down by accident, and there is no way to capture a panic outside a goroutine. The runtime also does not have this privilege.

bcmills · 2020-02-25T18:35:19Z

I don't think a global panic handler would be an appropriate solution, here or in servers. An uncaught panic may indicate that an important program invariant was violated, and trying to resume execution may just replace a panic with a clear backtrace with a hard-to-diagnose deadlock.

changkun · 2020-02-25T18:37:22Z

My opinion is ... until a time when all panics can be handled ...

I don't see any way of doing it; handling all panics seems to be equivalent to the status in GC and mutators. That is to say, a user goroutine may panic intentionally, and the runtime should not do anything about it by default. This basically matches what I said in the beginning: "The testing package has done everything it could achieve with the CL. If we are interested in handling this particular panic case, more investigation could be done in subsequent CLs."

cc @dmitshur

changkun · 2020-02-25T18:46:29Z

... may indicate that an important program invariant was violated, and trying to resume execution may just replace a panic with a clear backtrace with a hard-to-diagnose deadlock.

Good point. Agreed. I was just throwing out some rough ideas, since I haven't encountered many use cases for capturing a panic outside a goroutine other than this one; I'm not suggesting anything :)

ianlancetaylor · 2020-02-26T05:05:29Z

I personally am not convinced by "let's do this now and figure out how to do more later." Sometimes that is the right approach, but sometimes it's important to at least understand how we could do more later.

Right now people are confused because when they panic in a test deferred functions in TestMain are not run. If we fix that, people will be confused because when they panic in a goroutine started by a test deferred functions in TestMain are not run. Although the current state is not good, at least it's consistent.

bcmills · 2020-02-26T14:03:14Z

I don't think we do need to “figure out how to do more later”. We should structure our own code to propagate panics from non-main goroutines back to the main goroutine, and encourage users to do the same.

Especially for non-Parallel tests, the mapping of test functions to goroutines should be an implementation detail internal to the testing package. It should not matter whether the Test function runs on the same goroutine as TestMain or an entirely different goroutine, but today it does. We should fix that.
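
The propagation described above can be sketched with the standard library alone. This is a minimal illustration, not the errgroup change in CL 134395 and not the testing package's implementation; `runAndPropagate` and `withTeardown` are hypothetical helpers.

```go
package main

import "fmt"

// runAndPropagate runs fn on a new goroutine and waits for it to finish.
// If fn panics, the value is recovered on that goroutine, handed back over
// a channel, and re-raised on the caller's goroutine, where the caller's
// deferred functions run as usual. (Hypothetical helper for illustration.)
func runAndPropagate(fn func()) {
	panicked := make(chan any, 1)
	go func() {
		// Deferred calls run last-in, first-out: the recover below runs
		// first, then the channel is closed.
		defer close(panicked)
		defer func() {
			if r := recover(); r != nil {
				panicked <- r
			}
		}()
		fn()
	}()
	if r, ok := <-panicked; ok {
		panic(r) // re-raise on the calling goroutine
	}
}

// withTeardown mirrors the setup / defer teardown shape from the report
// and records which steps ran and which panic value, if any, arrived.
func withTeardown(body func()) (events []string, recovered any) {
	defer func() { recovered = recover() }()
	events = append(events, "setup")
	defer func() { events = append(events, "teardown") }()
	runAndPropagate(body)
	return events, nil
}

func main() {
	events, r := withTeardown(func() { panic("foo") })
	fmt.Println(events, r) // prints "[setup teardown] foo"
}
```

The re-raised panic carries the original value but a new stack trace from the caller's goroutine, and a panic in a goroutine that was not started through such a helper still terminates the process.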

changkun · 2020-02-26T14:05:11Z

I also agree with you that "let's do this now and figure out how to do more later" is not a good approach in this case.

However, the current situation is that I don't see any way of doing better than the proposed CL, because we cannot differentiate between an accidental panic and an intentional panic in a spawned goroutine from the runtime unless people give a clear indication to TestMain or an equivalent. Such a knob would make the usage and behavior even more complex than the proposed CL.

If we think carefully about panics that propagate out of a test goroutine, things become more subtle. Go users are taught to fail a test with the t.Error / t.Fatal family. An intentional panic simply does not happen in a test. So a panic detected in a test only happens accidentally, which is exactly the case we would like to fix.

I personally argue this is the best action we can offer and there are no further thoughts from my side.

cc @ianlancetaylor

gopherbot · 2020-02-28T13:48:23Z

Change https://golang.org/cl/221321 mentions this issue: regexp: convert test into *_test package

changkun · 2020-03-02T15:31:20Z

Maybe we could turn this into a proposal that can be reviewed in the proposal review meeting?

cc @rsc @andybons

greg-dennis · 2020-03-27T00:31:19Z

It appears that the proposed solution would propagate the panic to TestMain. An alternative would be that a panic within a test instead triggers an immediate failure of that test but allows the other tests to continue, such that the panic is never propagated to TestMain but arguably never needs to be. On the downside, this may encourage test writers to deliberately call panic to fail -- but I'm not sure we're at risk of that becoming common practice. On the plus side, I think this alternative might be:

A bit easier to implement, less complexity in the test runner
More in line with the "keep going" philosophy of test failures
More user-friendly to not have the entire test suite roll over due to a panic in one test case
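
At the level of an individual test, the alternative described above can be approximated in user code by recovering inside each test. The sketch below is a hypothetical helper, not behavior of the testing package; `failer`, `protect`, and `recorder` are names assumed for illustration, and `recorder` merely stands in for `*testing.T` so the example runs outside `go test`.

```go
package main

import "fmt"

// failer is the small subset of testing.TB used here; *testing.T
// satisfies it. (Hypothetical interface for illustration.)
type failer interface {
	Errorf(format string, args ...any)
}

// protect runs body and converts a panic raised on the same goroutine
// into a reported failure, so the caller keeps running.
func protect(t failer, body func()) {
	defer func() {
		if r := recover(); r != nil {
			t.Errorf("recovered panic: %v", r)
		}
	}()
	body()
}

// recorder collects failure messages in place of *testing.T.
type recorder struct{ failures []string }

func (r *recorder) Errorf(format string, args ...any) {
	r.failures = append(r.failures, fmt.Sprintf(format, args...))
}

func main() {
	rec := &recorder{}
	ran := 0
	for _, body := range []func(){
		func() { ran++; panic("foo") }, // fails, but does not stop the loop
		func() { ran++ },
	} {
		protect(rec, body)
	}
	fmt.Println(ran, rec.failures) // prints "2 [recovered panic: foo]"
}
```

In a real test this would be called as `protect(t, func() { p() })` from within `TestP`. It only covers panics on the test's own goroutine, and it continues running after the program may have been left in an inconsistent state.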

bcmills · 2020-03-27T01:39:13Z

@progressnerd, an unexpected panic potentially leaves the program in an arbitrarily corrupted state. Swamping the panic in noise from subsequent test failures — or worse, deadlocks — would make the panic harder to diagnose, in addition to delaying the output of the panic, for at best marginal additional information from the test.

Automatically drop any old test databases.

Initially I was under the impression that users could make use of `TestMain` to ensure that test databases are correctly cleaned up after the test suite completes, but it turns out that if a test panics then `TestMain` may not return [1].

As a result there doesn't seem to be a straightforward way to ensure that test databases created by this library are eventually cleaned up, which means that the number of test databases can continually grow across test runs. Mainly this is an annoyance, but eventually it can also start to cause problems. Since this library is explicitly about setting up and tearing down test databases, we really should make an effort to ensure that the number of test databases doesn't continuously grow in certain situations.

One potential option would be to replace the supervisor `GetTestDB(testing.TB) TestDB` with something like:

```go
func (s *testSupervisor) WithTestDB(t testing.TB, fn func(TestDB)) {
	defer func() {
		if r := recover(); r != nil {
			t.Errorf("recovered panic: %v", r)
		}
	}()

	dbResource, err := s.inner.getTestDB(ctx)
	if err != nil {
		t.Fatalf("get test db: %s", err)
	}

	t.Cleanup(func() {
		// release dbResource to the pool.
		...
	})

	fn(dbResource.Data())
}
```

While this would help, there will still be problems if other tests in the suite panic. Also it feels less ergonomic (in my opinion) and would make it harder for users to wrap this library (which is expected).

There is also a similar problem, albeit to a lesser extent, with test databases that are explicitly persisted through the `KeepDatabasesForFailed` option.

These changes update the supervisor to automatically drop any old test databases, meaning those with the `pg_test` prefix, immediately. This means that by default, after a test suite finishes there should not be any test databases remaining.

I am not sure if this is the correct approach. Mainly, I don't think this will actually work if tests from multiple packages are run in parallel and multiple packages make use of `pgtest`. Also, it isn't the end of the world to just have the user manually drop these test databases themselves with something like:

```
psql postgres -c "\l" | grep pg_test | awk '{print $1}' | xargs -I{} psql postgres -c "DROP DATABASE {};"
```

I was just trying to verify the `TestMain` approach worked as expected even if the tests didn't exit normally, and once I found out this wasn't the case, tried to come up with a solution. If we find that automatically dropping old test databases is problematic, it might be best to just add some extra documentation explaining the behaviour.

[1] golang/go#37206

johndunlap · 2024-09-22T03:00:07Z

I'm new to Go and I just hit this issue while trying to make my tests clean up after themselves.

An alternative would be that a panic within a test instead triggers an immediate failure of that test but allows the other tests to continue, such that the panic is never propagated to TestMain but arguably never needs to be.

This is exactly the behavior I was expecting!

greg-dennis · 2024-09-26T00:40:57Z

@johndunlap, I suspect that's what most people would prefer. I actually built a library that achieved this by reflectively modifying test cases, and it was greatly appreciated by me and others who used it, but it was a bit too hard to maintain that kind-of-hacky, reflection-based implementation. Continuing with test cases after one fails always risks swamping that test failure with noise from subsequent test cases, but the balance of usefulness, in my experience, has always been heavily on the side of continuing. Nearly every test panic -- if not literally every -- could have been recovered by the test runner and produced useful info by running subsequent tests. At the end of the day, the test as a whole is either going to pass or fail either way -- so the only question here is what is usually most useful to the user.

dmitshur added the NeedsInvestigation label Feb 15, 2020

dmitshur added this to the Backlog milestone Feb 15, 2020

dmitshur changed the title ~~os.Exit is called at some point while running TestMain when a test panics~~ testing: when using a custom TestMain, m.Run does not return if one of the tests it runs panics Feb 15, 2020

hengfengli mentioned this issue Mar 29, 2020

spanner: integration_test.go does not always clean up instances googleapis/google-cloud-go#1887

Closed

fergusstrange mentioned this issue May 20, 2021

Panic leads to stuck process/process not being closed. fergusstrange/embedded-postgres#29

Closed

piotrowski mentioned this issue Oct 19, 2021

Binary not exiting after panic ingridhq/comptest#2

Open

oleg-jukovec mentioned this issue Jul 1, 2022

Problems with setup and teardown Tarantool in tests tarantool/go-tarantool#147

Open

Angith mentioned this issue Dec 3, 2024

fix: the test database is not cleaned up from the Cosmos account when integration tests fail instana/go-sensor#964

Merged
