Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: test timeouts without useful debugging output on darwin builders #38768

Closed
bcmills opened this issue Apr 30, 2020 · 13 comments
Closed

cmd/go: test timeouts without useful debugging output on darwin builders #38768

bcmills opened this issue Apr 30, 2020 · 13 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Apr 30, 2020

There are numerous occasional test timeouts on the darwin-amd64-race builder in the logs.

I note that the windows-amd64-race builder sets GO_TEST_TIMEOUT_SCALE=2:
https://github.com/golang/build/blob/916311cec4e1738aae572fe49767cebdc0351144/dashboard/builders.go#L2067

I suspect that we may need to do the same for the darwin-amd64-race builder to reduce flakiness.

CC @andybons

2020-04-27T23:05:16-42c4899/darwin-amd64-race
2020-04-17T04:26:54-29d925d/darwin-amd64-race
2020-04-16T15:31:20-4eaf855/darwin-amd64-race
2020-04-10T18:45:24-37470c0/darwin-amd64-race
2019-10-11T16:34:01-ba108c9/darwin-amd64-race
2019-10-04T21:22:55-5f4aa5d/darwin-amd64-race
2019-09-24T04:19:31-e29d276/darwin-amd64-race
2019-09-16T22:14:49-115e4c9/darwin-amd64-race
2019-09-06T21:22:26-8a8cf5b/darwin-amd64-race
2019-06-29T13:17:15-623d653/darwin-amd64-race
2019-06-28T20:10:05-2e0cd2a/darwin-amd64-race
2019-06-28T20:09:48-91c385b/darwin-amd64-race
2019-06-28T20:00:59-3b040b7/darwin-amd64-race
2019-06-27T20:43:26-f205ae3/darwin-amd64-race
2019-06-26T20:17:35-9acd2d6/darwin-amd64-race
2019-06-26T18:38:59-06f709a/darwin-amd64-race
2019-06-26T16:35:14-60f14fd/darwin-amd64-race
2019-06-26T03:59:57-adf8883/darwin-amd64-race
2019-06-25T21:30:39-343b7fa/darwin-amd64-race
2019-06-25T17:51:25-a6ad626/darwin-amd64-race
2019-06-25T16:43:17-3f83c83/darwin-amd64-race
2019-06-24T21:26:46-5f7e9ce/darwin-amd64-race
2019-06-14T16:46:12-80f8913/darwin-amd64-race
2019-06-13T17:12:09-f83c44e/darwin-amd64-race
2019-06-13T14:14:02-55d31e1/darwin-amd64-race
2019-06-13T03:58:11-418c7d8/darwin-amd64-race
2019-06-12T22:17:38-c0c6cee/darwin-amd64-race
2019-06-12T18:03:46-f44404e/darwin-amd64-race
2019-06-11T17:54:27-9f76566/darwin-amd64-race
2019-06-10T23:23:39-ef84fa0/darwin-amd64-race
2019-06-10T20:32:36-5ce1819/darwin-amd64-race
2019-06-10T16:19:39-8f296f5/darwin-amd64-race
2019-06-05T17:53:53-e9a136d/darwin-amd64-race
2019-06-05T17:25:56-195e753/darwin-amd64-race
2019-06-05T00:53:14-ac921da/darwin-amd64-race
2019-05-24T21:34:53-385b2e0/darwin-amd64-race
2019-05-17T22:32:30-1ab063c/darwin-amd64-race
2019-05-13T17:01:25-db2bf15/darwin-amd64-race
2019-05-09T21:13:46-56c1cf3/darwin-amd64-race
2019-05-09T21:13:23-60f6763/darwin-amd64-race
2019-05-09T21:13:19-fe67ce2/darwin-amd64-race
2019-05-09T15:32:36-b6f59cb/darwin-amd64-race
2019-04-30T18:29:07-888bac1/darwin-amd64-race
2019-04-29T22:35:35-98c5a56/darwin-amd64-race
2019-04-29T04:05:29-d016330/darwin-amd64-race

@bcmills bcmills added OS-Darwin Builders x/build issues (builders, bots, dashboards) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Apr 30, 2020
@gopherbot gopherbot added this to the Unreleased milestone Apr 30, 2020
@andybons andybons self-assigned this Apr 30, 2020
@andybons andybons removed their assignment Aug 4, 2020
@bcmills
Copy link
Contributor Author

bcmills commented Feb 19, 2021

@bcmills bcmills changed the title x/build: darwin-amd64-race may need a GO_TEST_TIMEOUT_SCALE cmd/go: test timeouts on darwin builders Feb 19, 2021
@bcmills bcmills removed the Builders x/build issues (builders, bots, dashboards) label Feb 19, 2021
@bcmills bcmills modified the milestones: Unreleased, Backlog Feb 19, 2021
@bcmills bcmills added the Testing An issue that has been verified to require only test changes, not just a test failure. label Feb 19, 2021
@bcmills bcmills changed the title cmd/go: test timeouts on darwin builders cmd/go: test timeouts without useful debugging output on darwin builders Feb 19, 2021
@bcmills bcmills modified the milestones: Backlog, Go1.17 Feb 19, 2021
@gopherbot
Copy link

Change https://golang.org/cl/302052 mentions this issue: cmd/go: bail out from script tests earlier when a timeout occurs

@bcmills
Copy link
Contributor Author

bcmills commented Mar 16, 2021

CL 302052 “fixes” this issue in the sense that the timeouts will probably continue to occur, but should produce more useful debugging output when that happens. If it is successful in that regard, we can file a separate issue for whatever failure mode that uncovers.

@bcmills bcmills self-assigned this Mar 16, 2021
@bcmills
Copy link
Contributor Author

bcmills commented Mar 17, 2021

Still not useful, but we're narrowing in on it, I think.
Looks like we also need to add useful timeouts for (*testgoData).run:

goroutine 117 [syscall, 5 minutes]:
syscall.syscall6(0x10d7940, 0x3e89, 0xc0006a7ad8, 0x0, 0xc000c881b0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/sys_darwin.go:41 +0x2e
syscall.wait4(0x3e89, 0xc0006a7ad8, 0x0, 0xc000c881b0, 0xc000c881b0, 0xc0006a7af8, 0x1050558)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/syscall/zsyscall_darwin_amd64.go:43 +0xa5
syscall.Wait4(0x3e89, 0xc0006a7b40, 0x0, 0xc000c881b0, 0x105026a, 0xc000666170, 0xc0005ae240)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/syscall/syscall_bsd.go:145 +0x66
os.(*Process).wait(0xc0001900f0, 0xc0007e41e0, 0xc0007e4210, 0xc0007e4160)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/os/exec_unix.go:44 +0x159
os.(*Process).Wait(...)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/os/exec.go:129
os/exec.(*Cmd).Wait(0xc0007e4160, 0x0, 0x0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/os/exec/exec.go:507 +0x105
os/exec.(*Cmd).Run(0xc0007e4160, 0x81, 0xc0006680c0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/os/exec/exec.go:341 +0x8a
cmd/go_test.(*testgoData).doRun(0xc00050e0b0, 0xc0006680c0, 0x4, 0x4, 0x40, 0xc0006680c0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/go_test.go:411 +0x518
cmd/go_test.(*testgoData).run(0xc00050e0b0, 0xc0006680c0, 0x4, 0x4)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/go_test.go:427 +0xa6
cmd/go_test.TestCacheCoverage(0xc0004fe480)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/go_test.go:2318 +0x3c5
testing.tRunner(0xc0004fe480, 0x1b01cd0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1236 +0x22e
created by testing.(*T).Run
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1281 +0x78c

https://build.golang.org/log/ad208dadaa44cdaade5cf4e5d39ab96f78ab183d

@bcmills bcmills reopened this Mar 17, 2021
@gopherbot
Copy link

Change https://golang.org/cl/303330 mentions this issue: testing: allow parallel-subtest goroutines to exit when the subtest is complete

gopherbot pushed a commit that referenced this issue Mar 19, 2021
…s complete

Fixes #45127
Updates #38768

Change-Id: I7f41901d5bcc07741ac9f5f2a24d2b07ef633cb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/303330
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@bcmills
Copy link
Contributor Author

bcmills commented Mar 25, 2021

https://build.golang.org/log/ebdf7a853ae4dba70c0d76c53f5885501cd54668 has what seems to be a fairly useful trace; it seems to indicate that os.Interrupt is not succeeding in terminating a go subprocess.

@gopherbot
Copy link

Change https://golang.org/cl/308809 mentions this issue: cmd/go: in TestScript, set GOTRACEBACK and use SIGQUIT to terminate hung subprocesses

gopherbot pushed a commit that referenced this issue Apr 9, 2021
…ung subprocesses

This should produce more useful outputs if a cmd/go invocation hangs
during a test.

(In some cases the outputs will be very verbose, but verbose is still
better than uninformative.)

For #36460
For #38768

Change-Id: Ibaf2d0fbf7387dfab1aad8981140ecb11901fc4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/308809
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
@ianlancetaylor
Copy link
Contributor

Are we going to do anything else here for 1.17? Thanks.

@bcmills
Copy link
Contributor Author

bcmills commented Apr 27, 2021

I'm planning to investigate this further during the freeze. I've managed to pare down the failure output enough that most of what remains is goroutines stuck in os/exec and syscall, and I suspect that it boils down to a platform bug of some sort (and/or a bug in the os package on macOS).

@bcmills bcmills added the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Apr 27, 2021
@Thor-Bjorgvinsson
Copy link

Could this halting command issue on mac be related? #38824

@Integralist
Copy link

I've been seeing similar with .Run() when using GitHub's Actions CI. I've only been able to replicate once on my local macOS but it's very regular to see it timeout on macOS in the CI environment.

@bcmills bcmills removed the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Jul 15, 2021
@bcmills bcmills modified the milestones: Go1.17, Backlog Jul 15, 2021
@bcmills
Copy link
Contributor Author

bcmills commented Jun 1, 2022

It's been a year since the last of these timeouts. I'm calling this “probably fixed”, either by an intervening runtime change or by an intervening update to the race detector.

greplogs -l -e '(?ms)\Adarwin-(?:amd64|arm64)-race.*panic: test timed out' --since=2021-04-27
2021-05-26T22:43:54-1d5298d/darwin-amd64-race

@bcmills bcmills closed this as completed Jun 1, 2022
@rsc rsc unassigned bcmills Jun 23, 2022
@golang golang locked and limited conversation to collaborators Jun 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

6 participants