Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/sys/unix: TestPoll failures with "wrong number of events" beginning 2021-11-04 #49380

Closed
bcmills opened this issue Nov 5, 2021 · 10 comments
Closed
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 release-blocker
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Nov 5, 2021

greplogs --dashboard -md -l -e '(?m)FAIL: TestPoll .*\n\s+.*wrong number of events'

2021-11-05T07:00:05-7861aae-6fefb7f/linux-amd64
2021-11-05T00:52:09-7861aae-bd580a0/linux-amd64-race
2021-11-04T21:41:49-7861aae-156abe5/linux-amd64-race

These all occurred after CL 361255 in x/sys, but I don't see how it could possibly be related. I suspect a regression on the Go side — @mknyszek, could this be another test that is overly sensitive to being interrupted by a GC cycle?

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 5, 2021
@gopherbot gopherbot added this to the Unreleased milestone Nov 5, 2021
@bcmills
Copy link
Contributor Author

bcmills commented Nov 5, 2021

(Marking as release-blocker for Go 1.18 because this appears to be a regression on linux-amd64, which is a first-class port.)

@bcmills bcmills modified the milestones: Unreleased, Go1.18 Nov 5, 2021
@mknyszek mknyszek self-assigned this Nov 12, 2021
@mknyszek
Copy link
Contributor

I'll look into it. I think you're probably right.

@mknyszek
Copy link
Contributor

My attempts to reproduce this have been unsuccessful thus far. Is this particularly rare?

@mknyszek
Copy link
Contributor

This failure mode is also so strange... It creates a FIFO that it then blocks on, and waits to time out. How could something be sent on that FIFO? What else could possibly get that FD?

@mknyszek
Copy link
Contributor

Would it make sense to augment the test to print which events it thinks fired? Even if I can't reproduce locally, it should in theory pop up on the builders eventually.

@bcmills
Copy link
Contributor Author

bcmills commented Nov 12, 2021

My attempts to reproduce this have been unsuccessful thus far. Is this particularly rare?

It does seem to only occur in a small fraction of runs: we have a ton of linux-amd64 builders and only a small number of failures per day (none at all since Nov. 6).

(But we've also had ∞% more failures than we did between whenever my fetchlogs data begins — I think at least sometime in 2019 at the moment? — and Nov. 4 of this year. 😅)

If we think it may have been fixed since Nov. 6, maybe we can park it in WaitingForInfo and see if it occurs again by the time the Gopherbot expires it?


greplogs --dashboard -md -l -e '(?m)FAIL: TestPoll .*\n\s+.*wrong number of events' --since=2021-11-05T07:01:00

2021-11-06T13:10:06-c75c477-4f083c7/linux-amd64-race
2021-11-05T22:03:24-c75c477-035963c/linux-amd64-race

@bcmills
Copy link
Contributor Author

bcmills commented Nov 12, 2021

Would it make sense to augment the test to print which events it thinks fired? Even if I can't reproduce locally, it should in theory pop up on the builders eventually.

I'm always in favor of more-descriptive test logs. 👍

@mknyszek
Copy link
Contributor

Oops. I forgot to mark it as golang/go#49380 in the CL. Anyway, golang.org/cl/363714 improves the logs. I'm going to put this in WaitingForInfo as you suggested Bryan, since we haven't seen it since Nov. 6th.

@mknyszek mknyszek added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Nov 12, 2021
@bcmills bcmills added the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Nov 12, 2021
@gopherbot
Copy link

Change https://golang.org/cl/363660 mentions this issue: unix: avoid depending on consistent Revents type in TestPoll

gopherbot pushed a commit to golang/sys that referenced this issue Nov 13, 2021
For golang/go#49380

Change-Id: Ie1d370681962d9f69ef54b33ddf38e4c74a2e298
Reviewed-on: https://go-review.googlesource.com/c/sys/+/363660
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@bcmills
Copy link
Contributor Author

bcmills commented Dec 9, 2021

Looks fixed to me.

greplogs --dashboard -md -l -e '(?m)FAIL: TestPoll .*\n\s+.*wrong number of events' --since=2021-11-13

@bcmills bcmills closed this as completed Dec 9, 2021
@bcmills bcmills removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Dec 9, 2021
@golang golang locked and limited conversation to collaborators Jun 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 release-blocker
Projects
None yet
Development

No branches or pull requests

3 participants