Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/signal: test failures with "signal: interrupt" and no other output #41561

Closed
bcmills opened this issue Sep 22, 2020 · 12 comments
Closed

os/signal: test failures with "signal: interrupt" and no other output #41561

bcmills opened this issue Sep 22, 2020 · 12 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Sep 22, 2020

2020-09-22T15:14:09-d42b32e/linux-386-longtest
2020-09-22T15:13:57-53c9b95/darwin-arm64-corellium

signal: interrupt
FAIL	os/signal	13.095s

It's a strange failure mode, and seems even stranger given that we've only seen two occurrences, on very different builders but at very nearly the same time. (CC @golang/osp-team in case this is due to a cmd/coordinator issue.)

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 22, 2020
@bcmills bcmills added this to the Backlog milestone Sep 22, 2020
@bcmills
Copy link
Contributor Author

bcmills commented Nov 9, 2020

2020-11-09T14:17:30-f858c22/solaris-amd64-oraclerel
2020-11-03T01:27:45-cc0930c/darwin-amd64-nocgo
2020-11-03T01:27:45-cc0930c/solaris-amd64-oraclerel
2020-11-02T00:46:44-e463c28/linux-amd64-clang
2020-10-30T15:25:49-e1faebe/illumos-amd64
2020-10-29T15:13:09-50af50d/netbsd-arm-bsiegert
2020-10-29T04:12:30-53efbdb/linux-amd64-jessie
2020-10-27T12:36:54-320cc79/netbsd-amd64-9_0
2020-10-27T12:00:35-333e904/netbsd-arm64-bsiegert
2020-10-26T18:29:24-a8b28eb/netbsd-arm64-bsiegert
2020-10-26T18:20:05-ae585ee/darwin-amd64-nocgo
2020-10-23T00:22:00-400581b/netbsd-amd64-9_0
2020-10-22T19:43:26-c92bfac/netbsd-amd64-9_0
2020-10-22T17:11:03-f8aecbb/solaris-amd64-oraclerel
2020-10-22T15:10:01-de74ea5/linux-arm64-aws
2020-10-22T01:20:16-4c7a18d/illumos-amd64
2020-10-21T14:34:44-15ead85/netbsd-amd64-9_0
2020-10-17T07:18:20-c8f6135/netbsd-amd64-9_0
2020-10-16T12:48:42-e981936/linux-amd64-nocgo
2020-10-16T04:13:45-af87480/netbsd-arm-bsiegert
2020-10-15T21:40:46-21e441c/netbsd-arm-bsiegert
2020-10-15T18:35:44-1bcf6be/illumos-amd64
2020-10-14T20:17:49-2ec71e5/solaris-amd64-oraclerel
2020-10-12T22:34:47-027367a/darwin-amd64-race
2020-10-12T18:31:36-8994607/illumos-amd64
2020-10-12T18:31:22-2f4368c/darwin-amd64-10_15
2020-10-09T15:13:57-eb67eab/darwin-amd64-race
2020-10-08T18:56:11-46ab0c0/netbsd-amd64-9_0
2020-10-07T15:57:48-ccf89be/darwin-amd64-10_14
2020-10-06T22:55:40-234de9e/netbsd-arm64-bsiegert
2020-10-06T07:37:55-8e20388/darwin-amd64-10_15
2020-10-05T17:31:26-b064eb7/linux-386-clang
2020-10-02T20:23:33-21eb3dc/linux-s390x-ibm
2020-10-02T18:57:36-d888f1d/darwin-arm64-corellium
2020-09-29T19:01:28-567ef8b/linux-amd64
2020-09-29T17:25:24-66770f4/solaris-amd64-oraclerel
2020-09-29T06:10:34-6fc094c/darwin-amd64-race
2020-09-24T20:41:14-23cc16c/aix-ppc64
2020-09-24T18:05:54-25a33da/netbsd-amd64-9_0
2020-09-24T16:21:59-83e8bf2/darwin-arm64-corellium
2020-09-23T17:10:35-bc320fc/aix-ppc64
2020-09-22T16:52:11-4e1d812/netbsd-amd64-9_0

@bcmills
Copy link
Contributor Author

bcmills commented Nov 9, 2020

This does not appear to be a cmd/coordinator issue, but does appear to be a regression in Go 1.16, so marking as release-blocker until we better understand the cause.

@bcmills bcmills modified the milestones: Backlog, Go1.16 Nov 9, 2020
@bcmills
Copy link
Contributor Author

bcmills commented Nov 9, 2020

(CC @ianlancetaylor @cherrymui for signaly symptoms)

@cherrymui
Copy link
Member

cherrymui commented Nov 9, 2020

I think this is due to TestNotifyContextSimultaneousNotifications, which sends to the running process a lot of SIGINT signals. The signal could land after the test is done and we have unregistered the signal handler.

Maybe we should wait some time until all pending signals arrive, or run that particular test in a separate process. Or use a non-fatal signal.

@cherrymui
Copy link
Member

TestNotifyContextSimultaneousNotifications is new in Go 1.16, so that explains the regression.

@bcmills bcmills added the Testing An issue that has been verified to require only test changes, not just a test failure. label Nov 10, 2020
@bcmills
Copy link
Contributor Author

bcmills commented Nov 10, 2020

CC @henvic

@bcmills
Copy link
Contributor Author

bcmills commented Nov 10, 2020

Maybe we should wait some time until all pending signals arrive

That is exactly what the existing quiesce helper function attempts to do. It should probably be used in that test too.

or run that particular test in a separate process.

I think that's probably the right long-term direction for the os/signal tests. (See also the discussion on CL 232378.)

Or use a non-fatal signal.

Maybe, although I would worry that would just mask the timing issue and lead to crosstalk with the other tests.

@henvic
Copy link
Contributor

henvic commented Nov 10, 2020

Thanks @bcmills, I'm going to take a look.

@gopherbot
Copy link

Change https://golang.org/cl/270198 mentions this issue: os/signal: fix flaky tests for NotifyContext.

@henvic
Copy link
Contributor

henvic commented Nov 16, 2020

Hey @bcmills,

Is something like this the direction you think tests should go?
https://go-review.googlesource.com/c/go/+/270198

Just wondering: is it on purpose that no -v (verbose) is used when running the tests on the CI? I ask because I don't know how to access further information about the failed tests (if they're stored somewhere).

@bcmills
Copy link
Contributor Author

bcmills commented Nov 16, 2020

That CL does look like the right general direction, although I would be inclined to use os.Args[0] or os.Executable to re-exec the test process itself, rather than invoking go run on a separate program.

(See TestDetectNohup and TestNoup for existing examples of that style in the os/signal tests.)

@dmitshur dmitshur added the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Nov 19, 2020
@toothrot toothrot removed the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Dec 17, 2020
@dmitshur
Copy link
Contributor

Friendly ping as this is a release blocking issue. CC @ianlancetaylor.

@golang golang locked and limited conversation to collaborators Dec 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

6 participants