sync: TestWaitGroupMisuse2 takes 45-90 seconds on netbsd, AIX #22944

Open
bradfitz opened this issue Nov 30, 2017 · 11 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-AIX OS-NetBSD Testing An issue that has been verified to require only test changes, not just a test failure.

@bradfitz
Contributor

TestWaitGroupMisuse2 on Linux with 8 cores takes 0.22s.

On NetBSD it does pass but takes 45-90 seconds. Why?

/cc @bsiegert

@bradfitz bradfitz added help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-NetBSD labels Nov 30, 2017
@bradfitz bradfitz added this to the Unplanned milestone Nov 30, 2017
@bradfitz
Contributor Author

/cc @dvyukov for any theories.

@dvyukov
Member

dvyukov commented Dec 1, 2017

This test requires physical parallelism, so my first bet would be on a problem in the NetBSD scheduler.

@dvyukov
Member

dvyukov commented Dec 1, 2017

Perhaps an execution trace will shed some light.

@coypoop
Contributor

coypoop commented Dec 6, 2017

How can I run this specific test?

@bsiegert
Contributor

bsiegert commented Dec 6, 2017

go test sync should work.

@coypoop
Contributor

coypoop commented Dec 6, 2017

NetBSD's nanosleep always schedules another process, even for really short sleeps. Making it spin on really short sleeps fixes this.

@jdolecek-zz

NetBSD nanosleep() always sleeps for at least one schedule slice when the specified time is under the schedule resolution. With the default HZ value of 100 for i386 and amd64, that is always at least 10ms, even when the specified time is smaller. I think other systems might work in a similar way.

@bradfitz
Contributor Author

bradfitz commented Dec 9, 2017

@coypoop, or:

$ go test -v -run=TestWaitGroupMisuse2 sync

Or:

$ go test -v -run=TestWaitGroupMisuse2 -count=20 sync

@bradfitz bradfitz changed the title from "sync: TestWaitGroupMisuse2 takes 45-90 seconds on netbsd w/ 8 cores" to "sync: TestWaitGroupMisuse2 takes 45-90 seconds on netbsd, AIX" May 29, 2019
@bradfitz bradfitz modified the milestones: Unplanned, Go1.13 May 29, 2019
@bradfitz bradfitz added OS-AIX Testing An issue that has been verified to require only test changes, not just a test failure. labels May 29, 2019
@bradfitz
Contributor Author

It takes 45 seconds on AIX too, and never panics as it expects to:

https://build.golang.org/log/4521fa0230a42f6f3b70e8f108c3ab63d962d567

/cc @Helflym

@Helflym
Contributor

Helflym commented Jun 5, 2019

I was already aware of this failure, but I didn't find anything relevant.
As the test says, "The detection is opportunistic", and in some cases I don't get a panic until iteration 500000. So a few other runs might not trigger it at all, but it's random.
Note that this also explains the slowness of the test. On a local Linux machine, almost all the panics occur within the first 1000 iterations. On the AIX builder, it's far more random: it can happen at the 10th iteration or at the 100000th.

Edit: as with NetBSD, it might be related to the AIX scheduler:

The suspension time may be longer than requested due to the scheduling of other activity by the system.

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@bcmills
Contributor

bcmills commented Oct 4, 2019

TestWaitGroupMisuse2 has mostly been passing on AIX as far as I can tell, but the failure mode from #22944 (comment) cropped up again in one aix-ppc64 builder run today (https://build.golang.org/log/a38f4c89dcb8e8b62b37ccefeae0de03dfcfecb5):

--- FAIL: TestWaitGroupMisuse2 (44.26s)
    waitgroup_test.go:133: Should panic
    waitgroup_test.go:96: Unexpected panic: <nil>
FAIL
FAIL	sync	44.880s
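
For context on what "Should panic" refers to: the test waits for sync.WaitGroup to detect misuse and panic. The concurrent misuse it exercises is only detected opportunistically, so it cannot be reproduced reliably in a few lines. The sketch below instead shows a deterministic relative, a Done call without a matching Add, just to illustrate the kind of panic involved. The helper name misusePanic is an assumption for illustration and is not taken from waitgroup_test.go.

```go
package main

import (
	"fmt"
	"sync"
)

// misusePanic is a hypothetical helper: it misuses a WaitGroup in a
// deterministic way and returns the recovered panic value.
func misusePanic() (recovered interface{}) {
	defer func() { recovered = recover() }()

	var wg sync.WaitGroup
	wg.Done() // no matching Add: the counter would drop below zero
	return nil
}

func main() {
	fmt.Printf("recovered: %v\n", misusePanic())
}
```

Unlike this case, the race in TestWaitGroupMisuse2 depends on goroutines actually running at the same time, which is why its outcome varies with the platform's scheduler.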

@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022