-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os/exec: TestContextCancel is flaky on netbsd-arm64-bsiegert builder #42061
Comments
Do you suggest lengthening the grace time? Or something more clever? |
I would be inclined to eliminate the dependency on the grace time entirely. That could be done fairly easily by swapping the goroutines: instead of calling Then you just keep reading until EOF, and if EOF never comes the test crashes itself and you figure out what happened from the goroutine dump (which you'll probably need anyway, because if the test legitimately fails there's probably a deadlock somewhere in |
Change https://golang.org/cl/303351 mentions this issue: |
Change https://golang.org/cl/303352 mentions this issue: |
TestContextCancel is a test that ensures a process is killed soon after canceling the context, even if Wait is not called (#16222). The test checks whether the process exited without calling Wait by writing some data to its stdin. Currently the test involves two goroutines writing to stdin and reading from stdout. However the reading goroutine is not very necessary to detect the process exit. This patch simplifies the test by connecting the process stdout to /dev/null. For #42061 Change-Id: I0447a1c024ee5abb050c627ec3766b731b02181a Reviewed-on: https://go-review.googlesource.com/c/go/+/303352 Trust: Emmanuel Odeke <emmanuel@orijtech.com> Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Change https://golang.org/cl/368317 mentions this issue: |
If this test fails, we want to know exactly what the os/exec goroutines are doing. Panicking gives us a goroutine dump, whereas t.Fatal does not. While we're here, use exponential backoff instead of a hard-coded 1ms sleep. We want to give the OS enough time to actually terminate the subprocess. For #42061 Change-Id: I3d50a71ac314853c68a935218e7f97ce18b08b5b Reviewed-on: https://go-review.googlesource.com/c/go/+/368317 Trust: Bryan C. Mills <bcmills@google.com> Run-TryBot: Bryan C. Mills <bcmills@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
We have a goroutine dump! Curiously, at the point where the failure occurs the command is not running at all. This looks a lot like #50187, which suggests that perhaps NetBSD is sometimes failing to close input/output pipes when the underlying process exits or is terminated. @bsiegert, @coypoop: could you please investigate?
|
It's a consistent pattern. The
2021-12-15T23:51:57-6e7c691/netbsd-arm64-bsiegert |
Change https://golang.org/cl/372795 mentions this issue: |
For #42061 Change-Id: I3b4c774ad9e375d4bfef1cfb4336c35ed30a6430 Reviewed-on: https://go-review.googlesource.com/c/go/+/372795 Trust: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Duplicate of #58699 |
2020-10-18T08:22:01-515e6a9/netbsd-arm64-bsiegert
2020-10-16T03:02:36-912262b/netbsd-arm64-bsiegert
2020-10-14T15:19:28-52669c4/netbsd-arm64-bsiegert
2020-10-12T21:13:04-15a11ce/netbsd-arm64-bsiegert
2020-10-07T20:18:35-ade5161/netbsd-arm64-bsiegert
2020-10-06T22:56:00-f8e5540/netbsd-arm64-bsiegert
2020-10-06T21:55:49-2e4ceaf/netbsd-arm64-bsiegert
It looks like the test hard-codes a 1-second timeout on the cancellation terminating the child process. That might not be a reasonable assumption on slow, heavily-loaded builders.
go/src/os/exec/exec_test.go
Lines 1081 to 1083 in f8df205
CC @bsiegert @bradfitz @ianlancetaylor
The text was updated successfully, but these errors were encountered: