Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: document that directories returned by TB.TempDir should not be accessed after cleanup #43547

Open
howardjohn opened this issue Jan 6, 2021 · 3 comments
Labels
Documentation help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@howardjohn
Copy link
Contributor

howardjohn commented Jan 6, 2021

What version of Go are you using (go version)?

$ go version
go version go1.15.5 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/howardjohn/.cache/go-build"
GOENV="/home/howardjohn/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/howardjohn/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/howardjohn/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.15"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.15/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/howardjohn/go/src/istio.io/istio/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build890005715=/tmp/go-build -gno-record-gcc-switches"

What did you do?

func TestTest(t *testing.T) {
	d := t.TempDir()
	go func() {
		c := 0
		for {
			ioutil.WriteFile(filepath.Join(d, fmt.Sprint(c)), nil, 0600)
			c++
		}
	}()
	t.TempDir()
}

What did you expect to see?

Test passes. Alternatively, this caveat is sufficiently documented in t.TempDir and/or RemoveAll.

What did you see instead?

Test fails with: testing.go:915: TempDir RemoveAll cleanup: unlinkat /tmp/TestTest064788875/001: directory not empty

This happens about 50% of the time in this case. In the real world scenario I am actually trying to test, it happens about 0.001% of the time, which makes it quite frustrating.

A manual version of t.TempDir which ignores this error passes, for what its worth.

func TestManual(t *testing.T) {
	d, err := ioutil.TempDir("", "test")
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() {
		if err := os.RemoveAll(d); err != nil {
			if err.(*os.PathError).Op != "unlinkat" {
				t.Errorf("TempDir RemoveAll cleanup: %v", err)
			}
		}
	})
	go func() {
		c := 0
		for {
			ioutil.WriteFile(filepath.Join(d, fmt.Sprint(c)), nil, 0600)
			c++
		}
	}()
}

I think I understand why it works this way, and unfortunately I don't see any great options here, so perhaps just some documentation around this is the best path forward.

@howardjohn
Copy link
Contributor Author

Also, I think the obvious answer is "your test is bad", but I am not sure it is. We have a bunch of components that all take stop channels/context cancellation. We even check goroutines for leaks after tests (using something like https://github.com/uber-go/goleak), so we don't just have spurious goroutines flying around. The problem is cancelling the context doesn't just immediately kill the goroutines, so they do a few last things until they finally realize the context has been cancelled (in this case, writing a file).

It seems in these cases they only way you can actually write a correct test is something that ensures that a write is not triggered after the context has been cancelled, which I believe would require a lock around the context cancel and the file writing, which is extremely heavy and bizarre to support a testing edge case. The other alternative is to use something like goleak and wait until there are no goroutines sticking around, and make sure that is done before TempDir() so it occurs first - and also remember to do it on each subtest.

So at first glance its obvious that writing a file after the test is complete is wrong, but I also don't think its clear how to not be wrong.

@howardjohn howardjohn changed the title TB.TempDir fails if TB.TempDir fails if test writes a file during cleanup Jan 6, 2021
@bcmills
Copy link
Contributor

bcmills commented Jan 6, 2021

Also, I think the obvious answer is "your test is bad", but I am not sure it is.

Your test is bad, I think, or else the API under test is. 😅
(See http://golang.org/wiki/CodeReviewComments#synchronous-functions.)

It seems in these cases they only way you can actually write a correct test is something that ensures that a write is not triggered after the context has been cancelled, which I believe would require a lock around the context cancel and the file writing, which is extremely heavy and bizarre to support a testing edge case.

No, it's fine to trigger writes after the context has been canceled. The important thing is to ensure that all writes have completed before the function that triggered the writes returns, regardless of whether the context is cancelled.

The other alternative is to use something like goleak and wait until there are no goroutines sticking around, and make sure that is done before TempDir() so it occurs first - and also remember to do it on each subtest.

I don't know what goleak is (maybe go.uber.org/goleak?), but your test should be waiting for all of the background goroutines to stop before its last Cleanup function returns. Otherwise, the behaviors of those goroutines can interfere with other tests — and, perhaps more importantly, benchmarks!

@howardjohn
Copy link
Contributor Author

Otherwise, the behaviors of those goroutines can interfere with other tests

That's fair, I think I would agree my test is bad after thinking about it more..

So likely this is just "a comment above TempDir would be helpful"

@bcmills bcmills changed the title TB.TempDir fails if test writes a file during cleanup testing: document that directories returned by TB.TempDir should not be accessed after cleanup Jan 6, 2021
@bcmills bcmills added Documentation NeedsFix The path to resolution is known, but the work has not been done. labels Jan 6, 2021
@bcmills bcmills added this to the Backlog milestone Jan 6, 2021
mdisibio added a commit to mdisibio/tempo that referenced this issue Oct 13, 2021
mdisibio added a commit to grafana/tempo that referenced this issue Oct 18, 2021
* Fix mutex deadlock between search between writeTraceToHeadBlock, other improvments for mutex contention and Results channel panics

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Fix flaky unlink error by avoiding t.TempDir golang/go#43547

* Prevent file already closed errors when searching wal

* Comment

* Fix data race from flushing in wal search

* changelog
wallyqs added a commit to nats-io/nats-server that referenced this issue Aug 9, 2023
This is to try to prevent test failing due to trying to access
the tempdir while it is being tore down.
(go issue: golang/go#43547)

```
=== RUN   TestFileStoreMsgBlkFailOnKernelFaultLostDataReporting/AES-GCM-S2
    filestore_test.go:5195: ------------> 128
    testing.go:1225: TempDir RemoveAll cleanup: unlinkat ./TestFileStoreMsgBlkFailOnKernelFaultLostDataReportingAES-GCM-S23605508670/001/msgs: directory not empty
--- FAIL: TestFileStoreMsgBlkFailOnKernelFaultLostDataReporting (0.02s)
```

Signed-off-by: Waldemar Quevedo <wally@nats.io>
wallyqs added a commit to nats-io/nats-server that referenced this issue Aug 9, 2023
This is to try to prevent test failing due to trying to access the
tempdir while it is being torn down.
(go issue: golang/go#43547)

```
=== RUN   TestFileStoreMsgBlkFailOnKernelFaultLostDataReporting/AES-GCM-S2
    testing.go:1225: TempDir RemoveAll cleanup: unlinkat ./TestFileStoreMsgBlkFailOnKernelFaultLostDataReportingAES-GCM-S23605508670/001/msgs: directory not empty
--- FAIL: TestFileStoreMsgBlkFailOnKernelFaultLostDataReporting (0.02s)
```

Also increases timeout slightly of `TestFileStoreNewWriteIndexInfo`
which runs close to 1ms deadline sometimes:

```
=== FAIL: server TestFileStoreNewWriteIndexInfo/None-None (4.85s)
  | filestore_test.go:5489: Unexpected elapsed time: 1.054065ms
  | --- FAIL: TestFileStoreNewWriteIndexInfo/None-None (4.85s)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

3 participants