runtime: TestNetpollBreak failures with "did not interrupt netpoll" on plan9 builders #39437

Closed
bcmills opened this issue Jun 6, 2020 · 4 comments
Labels: FrozenDueToAge, NeedsInvestigation, OS-Plan9
Milestone: Unplanned

Comments

@bcmills
Contributor

bcmills commented Jun 6, 2020

@bcmills bcmills added OS-Plan9 NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 6, 2020
@bcmills bcmills added this to the Unplanned milestone Jun 6, 2020
@9pi

9pi commented Jun 6, 2020

CL 235820 is just a coincidence. I can provoke the same failure on the release branch:

term% go version
go version go1.14.2 plan9/arm
term% go test -count 1000 -run TestNetpollBreak runtime
--- FAIL: TestNetpollBreak (5.80s)
    proc_test.go:1031: netpollBreak did not interrupt netpoll: slept for: 5.769816333s
FAIL

There's no network poller for Plan 9, just runtime/netpoll_stub.go which presumably exists to pretend to pass the tests. In essence this test is just one goroutine doing a notetsleep for 10 seconds, and another repeatedly doing a notewakeup with a 100 microsecond Usleep each time around the loop. How this can result in a >5 second delay is mysterious to me. I can see it's not swapping.

@millerresearch
Contributor

> I can see it's not swapping.

Also, it's not a garbage collection delay: I tried with GOGC=off and still see failures.

@gopherbot

Change https://golang.org/cl/237698 mentions this issue: runtime: avoid lock starvation in TestNetpollBreak on Plan 9

@millerresearch
Contributor

> There's no network poller for Plan 9, just runtime/netpoll_stub.go which presumably exists to pretend to pass the tests.

My presumption was wrong: the stub "implementation" of netpoll is also used (currently on Plan 9 only) to support runtime timers. So the 10 second netpoll calls done by the test are contending with 10 minute netpoll calls which come from the overall go test timeout. The problem is that the runtime.lock which mediates this contention is unfair. When the 10 minute netpoll call is interrupted by netpollBreak it can be restarted and seize the lock before the 10 second netpoll call gets a chance. Repeated enough times, this starves the 10 second call sufficiently to time out the test.

CL 237698 inserts an osyield call to give the two callers a more even chance of acquiring the lock. It won't guarantee fairness, but a few hours of running the test suggest that it does well enough.
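The starvation pattern and the effect of yielding can be illustrated with a toy model in user-level Go. This is not the runtime code or the contents of the CL: `unfairLock` and `contend` are hypothetical names, a simple test-and-set lock stands in for runtime.lock, and runtime.Gosched stands in for osyield. The "long" goroutine plays the 10 minute caller that re-takes the lock as soon as it has released it; the "short" goroutine plays the 10 second caller.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// unfairLock is a toy test-and-set lock with no wait queue, so
// whoever retries first after a release wins. Hypothetical stand-in
// for an unfair lock; not the runtime's implementation.
type unfairLock struct{ state int32 }

func (l *unfairLock) lock() {
	for !atomic.CompareAndSwapInt32(&l.state, 0, 1) {
		runtime.Gosched() // contended: let someone else run, then retry
	}
}

func (l *unfairLock) unlock() { atomic.StoreInt32(&l.state, 0) }

// contend runs a "long" caller that re-acquires the lock in a loop
// against a "short" caller that needs the lock shortRounds times.
// If yieldAfterUnlock is set, the long caller yields after each
// release, loosely mirroring the idea of the fix. It returns how
// many times each caller acquired the lock.
func contend(shortRounds int, yieldAfterUnlock bool) (longAcquires, shortAcquires int64) {
	var l unfairLock
	var stop int32
	var wg sync.WaitGroup
	wg.Add(2)

	go func() { // long caller: restarts immediately after each release
		defer wg.Done()
		for atomic.LoadInt32(&stop) == 0 {
			l.lock()
			longAcquires++
			l.unlock()
			if yieldAfterUnlock {
				runtime.Gosched()
			}
		}
	}()

	go func() { // short caller: stops the run once it has had its turns
		defer wg.Done()
		for i := 0; i < shortRounds; i++ {
			l.lock()
			shortAcquires++
			l.unlock()
		}
		atomic.StoreInt32(&stop, 1)
	}()

	wg.Wait()
	return longAcquires, shortAcquires
}

func main() {
	for _, yield := range []bool{false, true} {
		long, short := contend(1000, yield)
		fmt.Printf("yield=%v long=%d short=%d\n", yield, long, short)
	}
}
```

The counts vary from run to run and between machines. The number to watch is how many times the long caller gets the lock for each acquisition by the short caller; as with the real change, yielding evens the odds without guaranteeing fairness.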

@golang golang locked and limited conversation to collaborators Jun 14, 2021