Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: 'go test -o' may fail with ETXTBSY when running the compiled test #62221

Closed
bcmills opened this issue Aug 22, 2023 · 8 comments
Closed
Assignees
Labels
GoCommand cmd/go NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Aug 22, 2023

#!watchflakes
post <- pkg ~ `^cmd/go` && `text file busy` && `go test .* -o\b`

(Pulled out from #58019 (comment).)

When go test is run with the -o flag but not the -c flag, it writes out the requested binary and then execs that binary.
Occasionally that races with calling exec on another test in the same run, which can cause the file descriptor to be (briefly) held open by a child process, leading to the failure mode described in #22315.

This can occur any time go test -o is used to build and run multiple tests.

I can think of three possible fixes:

  1. Acquire a read lock on syscall.ForkLock when copying test binaries, blocking an exec anywhere else in the command from proceeding concurrently.
  2. Start a subprocess (probably another instance of cmd/go with some secret handshake?) to copy the file, preventing other subprocesses from ever seeing the file descriptor.
  3. Retry ETXTBSY errors when running the test binary.
@bcmills bcmills added NeedsFix The path to resolution is known, but the work has not been done. GoCommand cmd/go labels Aug 22, 2023
@bcmills bcmills added this to the Go1.22 milestone Aug 22, 2023
@bcmills bcmills self-assigned this Aug 22, 2023
@gopherbot
Copy link

Found new dashboard test flakes for:

#!watchflakes
post <- pkg ~ `^cmd/go` && `text file busy` && `go test .* -o\b`
2023-08-04 16:02 linux-arm64-longtest go@55d08e50 cmd/go.TestScript (log)
vcs-test.golang.org rerouted to http://127.0.0.1:42463
https://vcs-test.golang.org rerouted to https://127.0.0.1:35577
go test proxy running at GOPROXY=http://127.0.0.1:36833/mod
2023/08/04 16:38:22 http: TLS handshake error from 127.0.0.1:42728: read tcp 127.0.0.1:35577->127.0.0.1:42728: read: connection reset by peer
2023/08/04 16:38:22 http: TLS handshake error from 127.0.0.1:42892: read tcp 127.0.0.1:35577->127.0.0.1:42892: read: connection reset by peer
2023/08/04 16:38:22 http: TLS handshake error from 127.0.0.1:42860: read tcp 127.0.0.1:35577->127.0.0.1:42860: read: connection reset by peer
--- FAIL: TestScript (0.07s)
    --- FAIL: TestScript/test_compile_multi_pkg (4.01s)
        script_test.go:132: 2023-08-04T16:36:45Z
        script_test.go:134: $WORK=/tmp/workdir/tmp/cmd-go-test-4203200371/tmpdir4223219662/test_compile_multi_pkg1358018183
...
            > go test -c -o $devnull ./...
            > rm pkg1.test$GOEXE
            > rm pkg2.test$GOEXE
            > go test -o . ./pkg/...
            [stdout]
            ok  	example/pkg/pkg1	0.020s [no tests to run]
            fork/exec /tmp/workdir/tmp/cmd-go-test-4203200371/tmpdir4223219662/test_compile_multi_pkg1358018183/tmp/go-build1211068849/b091/pkg2.test: text file busy
            FAIL	example/pkg/pkg2	0.000s
            FAIL
        script_test.go:156: FAIL: testdata/script/test_compile_multi_pkg.txt:32: go test -o . ./pkg/...: exit status 1
2023-08-18 20:44 linux-amd64-longtest-race go@ce160863 cmd/go.TestScript (log)
vcs-test.golang.org rerouted to http://127.0.0.1:41723
https://vcs-test.golang.org rerouted to https://127.0.0.1:40761
go test proxy running at GOPROXY=http://127.0.0.1:41481/mod
2023/08/18 21:46:42 http: TLS handshake error from 127.0.0.1:35368: read tcp 127.0.0.1:40761->127.0.0.1:35368: read: connection reset by peer
2023/08/18 21:48:40 http: TLS handshake error from 127.0.0.1:44324: read tcp 127.0.0.1:40761->127.0.0.1:44324: read: connection reset by peer
2023/08/18 21:50:21 http: TLS handshake error from 127.0.0.1:44512: EOF
--- FAIL: TestScript (1.14s)
    --- FAIL: TestScript/test_compile_multi_pkg (11.83s)
        script_test.go:132: 2023-08-18T21:45:40Z
        script_test.go:134: $WORK=/workdir/tmp/cmd-go-test-1401452276/tmpdir4252868724/test_compile_multi_pkg4199451243
...
            > go test -c -o $devnull ./...
            > rm pkg1.test$GOEXE
            > rm pkg2.test$GOEXE
            > go test -o . ./pkg/...
            [stdout]
            ok  	example/pkg/pkg1	0.066s [no tests to run]
            fork/exec /workdir/tmp/cmd-go-test-1401452276/tmpdir4252868724/test_compile_multi_pkg4199451243/tmp/go-build3164080947/b091/pkg2.test: text file busy
            FAIL	example/pkg/pkg2	0.001s
            FAIL
        script_test.go:156: FAIL: testdata/script/test_compile_multi_pkg.txt:32: go test -o . ./pkg/...: exit status 1
2023-08-18 22:41 linux-amd64-longtest-race go@7af28fa9 cmd/go.TestScript (log)
vcs-test.golang.org rerouted to http://127.0.0.1:42589
https://vcs-test.golang.org rerouted to https://127.0.0.1:44175
go test proxy running at GOPROXY=http://127.0.0.1:46413/mod
2023/08/18 23:31:05 http: TLS handshake error from 127.0.0.1:37204: read tcp 127.0.0.1:44175->127.0.0.1:37204: read: connection reset by peer
2023/08/18 23:31:05 http: TLS handshake error from 127.0.0.1:37194: EOF
--- FAIL: TestScript (1.10s)
    --- FAIL: TestScript/test_compile_multi_pkg (6.01s)
        script_test.go:132: 2023-08-18T23:28:55Z
        script_test.go:134: $WORK=/workdir/tmp/cmd-go-test-2179483536/tmpdir224393766/test_compile_multi_pkg694859549
        script_test.go:156: 
...
            > go test -c -o $devnull ./...
            > rm pkg1.test$GOEXE
            > rm pkg2.test$GOEXE
            > go test -o . ./pkg/...
            [stdout]
            ok  	example/pkg/pkg1	0.122s [no tests to run]
            fork/exec /workdir/tmp/cmd-go-test-2179483536/tmpdir224393766/test_compile_multi_pkg694859549/tmp/go-build3723609144/b091/pkg2.test: text file busy
            FAIL	example/pkg/pkg2	0.016s
            FAIL
        script_test.go:156: FAIL: testdata/script/test_compile_multi_pkg.txt:32: go test -o . ./pkg/...: exit status 1

watchflakes

@bcmills
Copy link
Contributor Author

bcmills commented Aug 22, 2023

Reproduced using:

$ OUT=$(mktemp -d)
$ while go test -o $OUT -run='^$' -count=1 archive/... compress/... crypto/... encoding/... hash/... image/... testing/... text/... unicode/...; do rm $OUT/*.test ; done

It failed after a couple dozen iterations.

@ianlancetaylor
Copy link
Contributor

My vote is to retry ETXTBSY errors. Copying a file is not an instantaneous operation, so holding syscall.ForkLock seems too heavyweight. And retrying ETXTBSY seems simpler than starting a new process just to copy a file.

@bcmills
Copy link
Contributor Author

bcmills commented Aug 22, 2023

I agree with that assessment.

@gopherbot
Copy link

Change https://go.dev/cl/522015 mentions this issue: cmd/go: retry ETXTBSY errors when running test binaries

@bcmills
Copy link
Contributor Author

bcmills commented Aug 22, 2023

@gopherbot, please backport to Go 1.21. This Go command bug causes occasional test failures in TestScript/test_compile_multi_pkg, which was added in Go 1.21, and the fix is small and straightforward.

@gopherbot
Copy link

Backport issue(s) opened: #62222 (for 1.21).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link

Change https://go.dev/cl/522176 mentions this issue: [release-branch.go1.21] cmd/go: retry ETXTBSY errors when running test binaries

cellularmitosis pushed a commit to cellularmitosis/go that referenced this issue Aug 24, 2023
An ETXTBSY error when starting a test binary is almost certainly
caused by the race reported in golang#22315. That race will resolve quickly
on its own, so we should just retry the command instead of reporting a
spurious failure.

Fixes golang#62221.

Change-Id: I408f3eaa7ab5d7efbc7a2b1c8bea3dbc459fc794
Reviewed-on: https://go-review.googlesource.com/c/go/+/522015
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
gopherbot pushed a commit that referenced this issue Aug 30, 2023
…t binaries

An ETXTBSY error when starting a test binary is almost certainly
caused by the race reported in #22315. That race will resolve quickly
on its own, so we should just retry the command instead of reporting a
spurious failure.

Fixes #62222.
Updates #62221.

Change-Id: I408f3eaa7ab5d7efbc7a2b1c8bea3dbc459fc794
Reviewed-on: https://go-review.googlesource.com/c/go/+/522015
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
(cherry picked from commit 4dc2564)
Reviewed-on: https://go-review.googlesource.com/c/go/+/522176
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GoCommand cmd/go NeedsFix The path to resolution is known, but the work has not been done.
Projects
Status: Done
Development

No branches or pull requests

3 participants