
runtime: freebsd/386 flaky TestCgoSignalDeadlock #18598

Closed
broady opened this issue Jan 10, 2017 · 12 comments
Labels
FrozenDueToAge · NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.) · OS-FreeBSD · Testing (An issue that has been verified to require only test changes, not just a test failure.)
Milestone

Comments

broady commented Jan 10, 2017

While preparing go1.8rc1:

--- FAIL: TestCgoSignalDeadlock (1.91s)
        crash_cgo_test.go:34: expected "OK\n", but got:
                HANG
FAIL
FAIL    runtime 46.016s

/cc @ianlancetaylor

broady commented Jan 10, 2017

Another one:

--- FAIL: TestStackGrowth (43.16s)
        stack_test.go:114: finalizer did not run
FAIL
FAIL    runtime 57.182s

bradfitz commented Mar 3, 2017

Let's keep this bug about TestCgoSignalDeadlock.

I filed #19381 for the TestStackGrowth bug.

bradfitz added the Testing label Mar 3, 2017
bradfitz commented Mar 3, 2017

/cc @ianlancetaylor

aclements added the NeedsInvestigation label Jun 8, 2017
@ianlancetaylor
Contributor

I haven't yet been able to recreate this using gomote with freebsd-386-110. When I run the test using gomote, it takes consistently less than 0.2s. Looking at the times from the failures listed above, I see these times:

279.16
382.73
1.92 (another complicated runtime package failure)
182.60
151.64
147.00
251.81
229.31
84.96
2.06 (only 386 failure, the others are ARM)

So it looks like, at least on ARM, something is making the test take much longer than expected. There is no FreeBSD ARM gomote. Since the test has a timeout, anything that slows it down significantly is expected to make it fail.

Since the test calls t.Parallel, it is possible that other tests running concurrently are what is slowing it down.

@gopherbot

CL https://golang.org/cl/46723 mentions this issue.

gopherbot pushed a commit that referenced this issue Jun 27, 2017
Updates #18598

Change-Id: I13c60124714cf9d1537efa0a7dd1e6a0fed9ae5b
Reviewed-on: https://go-review.googlesource.com/46723
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@aclements
Member

@ianlancetaylor
Contributor

Thanks. It looks like the system gets steadily more overloaded while the test is running, and the test blows through the timeout. I note that the test calls t.Parallel. I would guess that some other t.Parallel test is using too many resources.

Something I didn't notice before is that every failure on ARM is in the GOMAXPROCS=2 runtime -cpu=1,2,4 section, likely meaning that the failure occurs when GOMAXPROCS is set to a value larger than the number of hardware threads.

It would be nice if we could figure out a way to print what other tests are running when this one fails.

@aclements
Member

I recently dropped t.Parallel() from TestStackGrowth because it also had a built-in timeout that made it sensitive to load. Perhaps in general tests with timeouts should not be parallel.

> Something I didn't notice before is that every failure on ARM is in the GOMAXPROCS=2 runtime -cpu=1,2,4 section

Note that CgoSignalDeadlock itself sets GOMAXPROCS to 100. But maybe the rest of the system is just more loaded in this test section.

@gopherbot

CL https://golang.org/cl/48233 mentions this issue.

gopherbot pushed a commit that referenced this issue Jul 13, 2017
It seems that when too much other code is running on the system,
the testprogcgo code can overrun its timeouts.

Updates #18598.

Not marking the issue as fixed until it doesn't recur for some time.

Change-Id: Ieaf106b41986fdda76b1d027bb9d5e3fb805cc3b
Reviewed-on: https://go-review.googlesource.com/48233
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@paulzhol
Member

I'm running the freebsd-arm-paulzhol builder with the following parameters:
GOARM=7 CGO_ENABLED=1 GO_TEST_TIMEOUT_SCALE=16 $HOME/bin/builder -subrepos=false -v -buildTimeout 2h freebsd-arm-paulzhol
It is also doing automatic reboots after each build with some scripts.

It's an ARM Cortex-A7 (Allwinner A20) with only 1 GB of RAM, so it uses a swap partition on a magnetic disk. Could that be what is making the tests extra slow?

I can provide ssh access if it would help (or I can help debug).

broady modified the milestones: Go1.9Maybe, Go1.9 Jul 17, 2017
bradfitz modified the milestones: Go1.9Maybe, Go1.10 Jul 20, 2017
rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
ianlancetaylor modified the milestones: Go1.11, Go1.10 Nov 22, 2017
@ianlancetaylor
Copy link
Contributor

This has not failed since June 29, whereas before that it was failing every few days. I think removing t.Parallel worked around the problem. Closing.

golang locked and limited conversation to collaborators Jan 3, 2019