Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: TestNumGoroutine flaky #14107

Closed
mdempsky opened this issue Jan 27, 2016 · 11 comments
Closed

runtime: TestNumGoroutine flaky #14107

mdempsky opened this issue Jan 27, 2016 · 11 comments

Comments

@mdempsky
Copy link
Member

runtime.TestNumGoroutine has flaked at least 3 times recently on openbsd/amd64 with the same failure message:

--- FAIL: TestNumGoroutine (0.02s)
    proc_test.go:353: NumGoroutine=3, but found 4 goroutines in stack dump

http://build.golang.org/log/3d5e0edfe768d25a87fe5b3806ec2f846cdfa34f
http://build.golang.org/log/3199fe0c622947092e3fd724dc95176bc6513795
http://build.golang.org/log/b6527fd60a2a459e2c6d1ed2d34fcf7121806f92

@bradfitz
Copy link
Contributor

You filed this just as I sent a CL: https://go-review.googlesource.com/18938

@gopherbot
Copy link

CL https://golang.org/cl/18938 mentions this issue.

gopherbot pushed a commit that referenced this issue Jan 27, 2016
This has been flaking on the new OpenBSD 5.8 builders lately:
https://storage.googleapis.com/go-build-log/808270e7/openbsd-amd64-gce58_61ce2663.log
(as one example)

Add more debug info when it fails.

Updates #14107

Change-Id: Ie30bc0c703d2e9ee993d1e232ffc5f2d17e65c97
Reviewed-on: https://go-review.googlesource.com/18938
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@bradfitz
Copy link
Contributor

And we have output:

http://build.golang.org/log/27397aad98df8b5ba9c800480156ed29b95f5dd9

--- FAIL: TestNumGoroutine (0.02s)
    proc_test.go:353: NumGoroutine=3, but found 4 goroutines in stack dump: goroutine 207553 [running]:
        runtime_test.TestNumGoroutine(0xc82042a000)
            /tmp/workdir/go/src/runtime/proc_test.go:348 +0x14c
        testing.tRunner(0xc82042a000, 0x7d0f58)
            /tmp/workdir/go/src/testing/testing.go:473 +0x98
        created by testing.RunTests
            /tmp/workdir/go/src/testing/testing.go:582 +0x892

        goroutine 1 [chan receive]:
        testing.RunTests(0x6f0160, 0x7d0340, 0xa7, 0xa7, 0x4cc401)
            /tmp/workdir/go/src/testing/testing.go:583 +0x8d2
        testing.(*M).Run(0xc820159ef8, 0x75f098)
            /tmp/workdir/go/src/testing/testing.go:515 +0x81
        runtime_test.TestMain(0xc820159ef8)
            /tmp/workdir/go/src/runtime/crash_test.go:24 +0x28
        main.main()
            runtime/_test/_testmain.go:932 +0x114

        goroutine 17 [syscall, locked to thread]:
        runtime.goexit()
            /tmp/workdir/go/src/runtime/asm_amd64.s:1998 +0x1

        goroutine 207554 [runnable]:
        runtime.goexit1()
            /tmp/workdir/go/src/runtime/proc.go:2159
        runtime.goexit()
            /tmp/workdir/go/src/runtime/asm_amd64.s:1999 +0x6
        created by testing.RunTests
            /tmp/workdir/go/src/testing/testing.go:582 +0x892

@bradfitz
Copy link
Contributor

@bradfitz
Copy link
Contributor

Hey, now one on Linux-arm!

https://storage.googleapis.com/go-build-log/fae00030/linux-arm_acd2e3b8.log

--- FAIL: TestNumGoroutine (0.05s)
    proc_test.go:353: NumGoroutine=3, but found 4 goroutines in stack dump: goroutine 312375 [running]:
        runtime_test.TestNumGoroutine(0x1050e2a0)
            /workdir/go/src/runtime/proc_test.go:348 +0x10c
        testing.tRunner(0x1050e2a0, 0x32eb6c)
            /workdir/go/src/testing/testing.go:473 +0xa8
        created by testing.RunTests
            /workdir/go/src/testing/testing.go:582 +0x600

        goroutine 1 [chan receive, locked to thread]:
        testing.RunTests(0x27f1c0, 0x32e548, 0xac, 0xac, 0x10572001)
            /workdir/go/src/testing/testing.go:583 +0x62c
        testing.(*M).Run(0x10533f7c, 0xc7ba0)
            /workdir/go/src/testing/testing.go:515 +0x8c
        runtime_test.TestMain(0x10533f7c)
            /workdir/go/src/runtime/crash_test.go:24 +0x1c
        main.main()
            runtime/_test/_testmain.go:982 +0x118

        goroutine 17 [syscall, 1 minutes, locked to thread]:
        runtime.goexit()
            /workdir/go/src/runtime/asm_arm.s:991 +0x4

        goroutine 4 [runnable]:
        runtime.SetFinalizer(0x21dea8, 0x11520370, 0x0, 0x0)
            /workdir/go/src/runtime/mfinal.go:285 +0x118
        os.(*Process).release(0x11520370, 0x0, 0x0)
            /workdir/go/src/os/exec_unix.go:66 +0x4c
        os.(*Process).Release(0x11520370, 0x0, 0x0)
            /workdir/go/src/os/doc.go:31 +0x28
FAIL
FAIL    runtime 120.318s

@bradfitz bradfitz added this to the Go1.6Maybe milestone Jan 27, 2016
@bradfitz bradfitz changed the title runtime: TestNumGoroutine flaky on openbsd/amd64 runtime: TestNumGoroutine flaky Jan 27, 2016
@bradfitz
Copy link
Contributor

/cc @aclements for any hints

@rsc
Copy link
Contributor

rsc commented Jan 27, 2016

The goal of the check there (which I only added relatively recently) was to make sure that NumGoroutine was not consistently different from than the number of stacks shown (because one includes system goroutines and one does not). The check is fundamentally racy in that they might see different pictures of the world. I sent CL 19002 to try to reduce the race window. That may be enough. We'll see. In any event I'm not worried about the failures, except as far as they are annoying. Hopefully that CL will cut the frequency enough. If not we can do a retry loop up to 10 times or something.

@mdempsky
Copy link
Member Author

Is it suspicious at all that goroutine 4's trackback in the linux-arm repro doesn't have a "created by" line?

@aclements
Copy link
Member

os.(*Process).Release in the last goroutine is a finalizer, so probably it happened to be running during the Stack call and then returned before the NumGoroutine call.

Plus, since Stack stops and restarts the world, it's throwing the scheduler state in to the air, which is probably making this sort of thing more likely.

I think Russ is right that the best we can do is retry a few times.

@aclements
Copy link
Member

Is it suspicious at all that goroutine 4's trackback in the linux-arm repro doesn't have a "created by" line?

I was wondering about that, too. That goroutine is the runfinq goroutine, so it was created by a runtime function. printcreatedby elides the "created by" message using the same rules that apply to eliding frames, so it doesn't get printed.

@mdempsky
Copy link
Member Author

@aclements Duh, thanks! Somehow SetFinalizer being at the top of the call stack made me rule out that os.(*Process).Release could itself have been set as a finalizer.

@golang golang locked and limited conversation to collaborators Feb 3, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants