Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: hang in all.bash runtime test #14809

Closed
rsc opened this issue Mar 14, 2016 · 26 comments
Closed

runtime: hang in all.bash runtime test #14809

rsc opened this issue Mar 14, 2016 · 26 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Mar 14, 2016

At 43ed65f (Mar 13 2016), running all.bash. The only change in my client is in os/user/user_test.go which is clearly not involved.

/cc @dvyukov @aclements

panic: test timed out after 3m0s

goroutine 21722 [running]:
panic(0x1c27e0, 0xc820484040)
    /Users/rsc/go/src/runtime/panic.go:500 +0x18e fp=0xc82003ff50 sp=0xc82003fec8
testing.startAlarm.func1()
    /Users/rsc/go/src/testing/testing.go:724 +0x10f fp=0xc82003ffb0 sp=0xc82003ff50
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc82003ffb8 sp=0xc82003ffb0
created by time.goFunc
    /Users/rsc/go/src/time/sleep.go:129 +0x3a

goroutine 1 [chan receive, 2 minutes]:
runtime.gopark(0x26bed0, 0xc82025e778, 0x22ff46, 0xc, 0xc820256a17, 0x3)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc82003cc08 sp=0xc82003cbe0
runtime.goparkunlock(0xc82025e778, 0x22ff46, 0xc, 0x17, 0x3)
    /Users/rsc/go/src/runtime/proc.go:268 +0x54 fp=0xc82003cc40 sp=0xc82003cc08
runtime.chanrecv(0x190d40, 0xc82025e720, 0xc82003ce08, 0xc820237c01, 0x6783c)
    /Users/rsc/go/src/runtime/chan.go:474 +0x2ac fp=0xc82003ccc0 sp=0xc82003cc40
runtime.chanrecv1(0x190d40, 0xc82025e720, 0xc82003ce08)
    /Users/rsc/go/src/runtime/chan.go:357 +0x2b fp=0xc82003ccf0 sp=0xc82003ccc0
testing.RunTests(0x26b8c8, 0x34d460, 0xb0, 0xb0, 0x5b)
    /Users/rsc/go/src/testing/testing.go:582 +0x362 fp=0xc82003ce30 sp=0xc82003ccf0
testing.(*M).Run(0xc82003cf08, 0xc8200180b8)
    /Users/rsc/go/src/testing/testing.go:515 +0x79 fp=0xc82003ceb0 sp=0xc82003ce30
runtime_test.TestMain(0xc82003cf08)
    /Users/rsc/go/src/runtime/crash_test.go:24 +0x25 fp=0xc82003cef8 sp=0xc82003ceb0
main.main()
    runtime/_test/_testmain.go:956 +0xb4 fp=0xc82003cf60 sp=0xc82003cef8
runtime.main()
    /Users/rsc/go/src/runtime/proc.go:188 +0x20a fp=0xc82003cfb0 sp=0xc82003cf60
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc82003cfb8 sp=0xc82003cfb0

goroutine 17 [syscall, 2 minutes, locked to thread]:
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc820040fb8 sp=0xc820040fb0

goroutine 2 [force gc (idle), 2 minutes]:
runtime.gopark(0x26bed0, 0x350870, 0x2328b1, 0xf, 0x100000000000014, 0x1)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc82002c758 sp=0xc82002c730
runtime.goparkunlock(0x350870, 0x2328b1, 0xf, 0xc820000114, 0x1)
    /Users/rsc/go/src/runtime/proc.go:268 +0x54 fp=0xc82002c790 sp=0xc82002c758
runtime.forcegchelper()
    /Users/rsc/go/src/runtime/proc.go:229 +0x9f fp=0xc82002c7c0 sp=0xc82002c790
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc82002c7c8 sp=0xc82002c7c0
created by runtime.init.4
    /Users/rsc/go/src/runtime/proc.go:218 +0x2b

goroutine 3 [GC sweep wait]:
runtime.gopark(0x26bed0, 0x350aa0, 0x230a06, 0xd, 0x4d914, 0x1)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc82002cf48 sp=0xc82002cf20
runtime.goparkunlock(0x350aa0, 0x230a06, 0xd, 0x14, 0x1)
    /Users/rsc/go/src/runtime/proc.go:268 +0x54 fp=0xc82002cf80 sp=0xc82002cf48
runtime.bgsweep(0xc82005c000)
    /Users/rsc/go/src/runtime/mgcsweep.go:79 +0x125 fp=0xc82002cfb8 sp=0xc82002cf80
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc82002cfc0 sp=0xc82002cfb8
created by runtime.gcenable
    /Users/rsc/go/src/runtime/mgc.go:191 +0x53

goroutine 4 [finalizer wait]:
runtime.gopark(0x26bed0, 0x36be30, 0x231a6a, 0xe, 0xc820001314, 0x1)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc82002d710 sp=0xc82002d6e8
runtime.goparkunlock(0x36be30, 0x231a6a, 0xe, 0x2000000014, 0x1)
    /Users/rsc/go/src/runtime/proc.go:268 +0x54 fp=0xc82002d748 sp=0xc82002d710
runtime.runfinq()
    /Users/rsc/go/src/runtime/mfinal.go:158 +0xa5 fp=0xc82002d7c0 sp=0xc82002d748
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc82002d7c8 sp=0xc82002d7c0
created by runtime.createfing
    /Users/rsc/go/src/runtime/mfinal.go:139 +0x66

goroutine 5 [timer goroutine (idle)]:
runtime.gopark(0x26bed0, 0x350ba0, 0x23a0db, 0x16, 0x26d314, 0x1)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc82002df10 sp=0xc82002dee8
runtime.goparkunlock(0x350ba0, 0x23a0db, 0x16, 0x14, 0x1)
    /Users/rsc/go/src/runtime/proc.go:268 +0x54 fp=0xc82002df48 sp=0xc82002df10
runtime.timerproc()
    /Users/rsc/go/src/runtime/time.go:202 +0x302 fp=0xc82002dfc0 sp=0xc82002df48
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc82002dfc8 sp=0xc82002dfc0
created by runtime.addtimerLocked
    /Users/rsc/go/src/runtime/time.go:116 +0x115

goroutine 22288 [syscall, 2 minutes]:
syscall.Syscall6(0x7, 0xc218, 0xc82003dce4, 0x0, 0xc820256bd0, 0x0, 0x0, 0x1d0120, 0xc800000000, 0xc8204a5100)
    /Users/rsc/go/src/syscall/asm_darwin_amd64.s:41 +0x5 fp=0xc82003dc38 sp=0xc82003dc30
syscall.wait4(0xc218, 0xc82003dce4, 0x0, 0xc820256bd0, 0x90, 0x2162c0, 0x1)
    /Users/rsc/go/src/syscall/zsyscall_darwin_amd64.go:34 +0x78 fp=0xc82003dca8 sp=0xc82003dc38
syscall.Wait4(0xc218, 0xc82003dd2c, 0x0, 0xc820256bd0, 0x26ba78, 0xc8202468c0, 0xc820264520)
    /Users/rsc/go/src/syscall/syscall_bsd.go:129 +0x47 fp=0xc82003dcf0 sp=0xc82003dca8
os.(*Process).wait(0xc8202647a0, 0xc82024e380, 0xc820250150, 0xc820264540)
    /Users/rsc/go/src/os/exec_unix.go:22 +0xf4 fp=0xc82003dd70 sp=0xc82003dcf0
os.(*Process).Wait(0xc8202647a0, 0xc82003ddd0, 0x0, 0x0)
    /Users/rsc/go/src/os/doc.go:49 +0x21 fp=0xc82003dd98 sp=0xc82003dd70
os/exec.(*Cmd).Wait(0xc8202468c0, 0x0, 0x0)
    /Users/rsc/go/src/os/exec/exec.go:396 +0x58 fp=0xc82003de20 sp=0xc82003dd98
os/exec.(*Cmd).Run(0xc8202468c0, 0xc82024e380, 0xc82025d400)
    /Users/rsc/go/src/os/exec/exec.go:262 +0x3e fp=0xc82003de40 sp=0xc82003de20
os/exec.(*Cmd).CombinedOutput(0xc8202468c0, 0xc8202468c0, 0xc8204feed8, 0x1, 0x1, 0xc8202468c0)
    /Users/rsc/go/src/os/exec/exec.go:453 +0x174 fp=0xc82003de80 sp=0xc82003de40
runtime_test.runTestProg(0xc820256ab0, 0x22f573, 0xb, 0x23ac6d, 0x17, 0xc8204fef90, 0x28929)
    /Users/rsc/go/src/runtime/crash_test.go:68 +0xf9 fp=0xc82003def0 sp=0xc82003de80
runtime_test.TestCgoExternalThreadSignal(0xc820256ab0)
    /Users/rsc/go/src/runtime/crash_cgo_test.go:109 +0x5a fp=0xc82003df80 sp=0xc82003def0
testing.tRunner(0xc820256ab0, 0x34d688)
    /Users/rsc/go/src/testing/testing.go:473 +0x7e fp=0xc82003dfa0 sp=0xc82003df80
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc82003dfa8 sp=0xc82003dfa0
created by testing.RunTests
    /Users/rsc/go/src/testing/testing.go:581 +0x31c

goroutine 5319 [GC worker (idle)]:
runtime.gopark(0xc8204a5118, 0xc820214fa0, 0x233443, 0x10, 0xc820452214, 0x0)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc820214f48 sp=0xc820214f20
runtime.gcBgMarkWorker(0xc820025500)
    /Users/rsc/go/src/runtime/mgc.go:1397 +0xc1 fp=0xc820214fb8 sp=0xc820214f48
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc820214fc0 sp=0xc820214fb8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1324 +0x8a

goroutine 195 [GC worker (idle)]:
runtime.gopark(0xc8204a4000, 0xc8200ce7a0, 0x233443, 0x10, 0xc8200ce714, 0x0)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc8200ce748 sp=0xc8200ce720
runtime.gcBgMarkWorker(0xc82001c000)
    /Users/rsc/go/src/runtime/mgc.go:1397 +0xc1 fp=0xc8200ce7b8 sp=0xc8200ce748
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc8200ce7c0 sp=0xc8200ce7b8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1324 +0x8a

goroutine 5579 [GC worker (idle)]:
runtime.gopark(0xc820484038, 0xc8204fcfa0, 0x233443, 0x10, 0xc820473314, 0x0)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc8204fcf48 sp=0xc8204fcf20
runtime.gcBgMarkWorker(0xc820024000)
    /Users/rsc/go/src/runtime/mgc.go:1397 +0xc1 fp=0xc8204fcfb8 sp=0xc8204fcf48
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc8204fcfc0 sp=0xc8204fcfb8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1324 +0x8a

goroutine 5545 [GC worker (idle)]:
runtime.gopark(0xc820010360, 0xc8200cefa0, 0x233443, 0x10, 0xc82041c614, 0x0)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc8200cef48 sp=0xc8200cef20
runtime.gcBgMarkWorker(0xc820020000)
    /Users/rsc/go/src/runtime/mgc.go:1397 +0xc1 fp=0xc8200cefb8 sp=0xc8200cef48
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc8200cefc0 sp=0xc8200cefb8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1324 +0x8a

goroutine 5586 [GC worker (idle)]:
runtime.gopark(0xc8204b24a8, 0xc8200c87a0, 0x233443, 0x10, 0xc8201d7a14, 0x0)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc8200c8748 sp=0xc8200c8720
runtime.gcBgMarkWorker(0xc820021500)
    /Users/rsc/go/src/runtime/mgc.go:1397 +0xc1 fp=0xc8200c87b8 sp=0xc8200c8748
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc8200c87c0 sp=0xc8200c87b8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1324 +0x8a

goroutine 168 [GC worker (idle)]:
runtime.gopark(0xc820482018, 0xc8200cb7a0, 0x233443, 0x10, 0xc8200cb714, 0x0)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc8200cb748 sp=0xc8200cb720
runtime.gcBgMarkWorker(0xc82001d500)
    /Users/rsc/go/src/runtime/mgc.go:1397 +0xc1 fp=0xc8200cb7b8 sp=0xc8200cb748
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc8200cb7c0 sp=0xc8200cb7b8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1324 +0x8a

goroutine 5587 [GC worker (idle)]:
runtime.gopark(0xc820276008, 0xc8200cdfa0, 0x233443, 0x10, 0xc8201d6914, 0x0)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc8200cdf48 sp=0xc8200cdf20
runtime.gcBgMarkWorker(0xc820022a00)
    /Users/rsc/go/src/runtime/mgc.go:1397 +0xc1 fp=0xc8200cdfb8 sp=0xc8200cdf48
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc8200cdfc0 sp=0xc8200cdfb8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1324 +0x8a

goroutine 5544 [GC worker (idle)]:
runtime.gopark(0xc8204b0010, 0xc8200c8fa0, 0x233443, 0x10, 0xc82046f114, 0x0)
    /Users/rsc/go/src/runtime/proc.go:262 +0x131 fp=0xc8200c8f48 sp=0xc8200c8f20
runtime.gcBgMarkWorker(0xc82001ea00)
    /Users/rsc/go/src/runtime/mgc.go:1397 +0xc1 fp=0xc8200c8fb8 sp=0xc8200c8f48
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc8200c8fc0 sp=0xc8200c8fb8
created by runtime.gcBgMarkStartWorkers
    /Users/rsc/go/src/runtime/mgc.go:1324 +0x8a

goroutine 22289 [syscall, 2 minutes]:
syscall.Syscall(0x3, 0x4, 0xc82023d000, 0x200, 0x0, 0x0, 0x0)
    /Users/rsc/go/src/syscall/asm_darwin_amd64.s:16 +0x5 fp=0xc8204fecc0 sp=0xc8204fecb8
syscall.read(0x4, 0xc82023d000, 0x200, 0x200, 0xc82023d000, 0xc82025d400, 0x11ffed8e3)
    /Users/rsc/go/src/syscall/zsyscall_darwin_amd64.go:972 +0x48 fp=0xc8204fed08 sp=0xc8204fecc0
syscall.Read(0x4, 0xc82023d000, 0x200, 0x200, 0x0, 0xc820033000, 0xc8202a0f00)
    /Users/rsc/go/src/syscall/syscall_unix.go:161 +0x3f fp=0xc8204fed48 sp=0xc8204fed08
os.(*File).read(0xc820250138, 0xc82023d000, 0x200, 0x200, 0xc8f21, 0x190800, 0x200)
    /Users/rsc/go/src/os/file_unix.go:228 +0x68 fp=0xc8204fed88 sp=0xc8204fed48
os.(*File).Read(0xc820250138, 0xc82023d000, 0x200, 0x200, 0x22fc46, 0xc, 0x0)
    /Users/rsc/go/src/os/file.go:95 +0x4c fp=0xc8204fede8 sp=0xc8204fed88
bytes.(*Buffer).ReadFrom(0xc82024e380, 0x1684000, 0xc820250138, 0xc8204feea8, 0xc82024e301, 0xc820264460)
    /Users/rsc/go/src/bytes/buffer.go:176 +0x150 fp=0xc8204fee50 sp=0xc8204fede8
io.copyBuffer(0x7446b8, 0xc82024e380, 0x1684000, 0xc820250138, 0x0, 0x0, 0x0, 0x22f573, 0xb, 0x23ac6d)
    /Users/rsc/go/src/io/io.go:374 +0x317 fp=0xc8204feed0 sp=0xc8204fee50
io.Copy(0x7446b8, 0xc82024e380, 0x1684000, 0xc820250138, 0xc8204fef90, 0xc8204fefa0, 0x4e120)
    /Users/rsc/go/src/io/io.go:350 +0x5a fp=0xc8204fef28 sp=0xc8204feed0
os/exec.(*Cmd).writerDescriptor.func1(0xc81fd86010, 0x66b3e)
    /Users/rsc/go/src/os/exec/exec.go:236 +0x56 fp=0xc8204fef80 sp=0xc8204fef28
os/exec.(*Cmd).Start.func1(0xc8202468c0, 0xc820264520)
    /Users/rsc/go/src/os/exec/exec.go:344 +0x20 fp=0xc8204fefb0 sp=0xc8204fef80
runtime.goexit()
    /Users/rsc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc8204fefb8 sp=0xc8204fefb0
created by os/exec.(*Cmd).Start
    /Users/rsc/go/src/os/exec/exec.go:345 +0x50f
FAIL    runtime 180.051s
@rsc rsc added this to the Go1.7Early milestone Mar 14, 2016
@bradfitz bradfitz modified the milestones: Go1.7, Go1.7Early May 4, 2016
@bradfitz
Copy link
Contributor

I just got another one of these on a trybot run:
https://storage.googleapis.com/go-build-log/5121af01/darwin-amd64-10_10_1dc794a3.log

@quentinmit quentinmit added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 26, 2016
@quentinmit
Copy link
Contributor

@dvyukov @aclements Have either of you had a chance to look at this yet?

@aclements
Copy link
Member

I spent all day trying to reproduce this on the darwin-amd64-10_10 gomote with no success.

@rsc rsc modified the milestones: Go1.7Maybe, Go1.7 May 27, 2016
@ianlancetaylor ianlancetaylor added the Testing An issue that has been verified to require only test changes, not just a test failure. label Jun 14, 2016
@gopherbot
Copy link

CL https://golang.org/cl/24110 mentions this issue.

gopherbot pushed a commit that referenced this issue Jun 15, 2016
This is an attempt to get more information for #14809, which seems to
occur rarely.

Updates #14809.

Change-Id: Idbeb136ceb57993644e03266622eb699d2685d02
Reviewed-on: https://go-review.googlesource.com/24110
Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>
Reviewed-by: Austin Clements <austin@google.com>
@aclements
Copy link
Member

Unfortunately, we haven't had any TestCgoExternalThreadSignal failures since 2016-06-01T00:09:32-04acd62/darwin-amd64-10_10, which was before @ianlancetaylor's CL to get stack traces of child processes. (In fact, the only runtime test timeouts since June 2nd have been on Plan 9.)

There was one interesting failure around that time that looks related to signals, though it wasn't this specific test: 2016-06-01T00:09:32-04acd62/linux-ppc64le-buildlet. @ianlancetaylor, it looks like goroutine 207880 got preempted at the beginning of sigpanic and never came back. Do you think that could be related?

@ianlancetaylor
Copy link
Contributor

I think it's impossible to tell whether the problem is related or not. They are timeouts, so that part is similar, but they are running different tests.

Why would there be a runnable goroutine that nothing is running? That seems to imply that all the P's are in use, which does not seem to be the case, or that all the M's are doing something, but what?, or that all the M's are blocked waiting for something, but what?

Since we don't know what's going on, It's probably worth doing something so that if a runtime test time outs we get a schedtrace(true).

@ianlancetaylor
Copy link
Contributor

No idea what is happening. Postponing to 1.8.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.8, Go1.7Maybe Jul 7, 2016
@jaredborner
Copy link

i've been able to consistently reproduce this.

compiled/testing from current master (e5ff529)

go env

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/go/tip/src"
GORACE=""
GOROOT="/go/tip"
GOTOOLDIR="/go/tip/pkg/tool/darwin_amd64"
GO15VENDOREXPERIMENT="1"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fno-common"
CXX="clang++"
CGO_ENABLED="1"

steps taken

cd $GOROOT/src/runtime
go test -c runtime
./runtime.test -test.v -test.run TestCgoExternalThreadSignal -test.count 10000

the hang happens every few hundred to few thousand iterations, leading to the following output:

--- FAIL: TestCgoExternalThreadSignal (60.02s)
    crash_test.go:105: testprogcgo CgoExternalThreadSignal exit status: exit status 2
    crash_cgo_test.go:116: expected "OK\n", but got:
        SIGQUIT: quit
        PC=0x4074f7e m=0

        goroutine 1 [syscall]:
        syscall.Syscall6(0xad, 0x1, 0x103da, 0xc42009a000, 0x24, 0x0, 0x0, 0x40c4ea0, 0xc420051bb0, 0x4037a5e)
            /go/tip/src/syscall/asm_darwin_amd64.s:41 +0x5 fp=0xc420051b28 sp=0xc420051b20
        os.(*Process).blockUntilWaitable(0xc420074300, 0x40ea4b8, 0xc420092000, 0xc42007e120)
            /go/tip/src/os/wait_waitid.go:28 +0xbc fp=0xc420051bc0 sp=0xc420051b28
        os.(*Process).wait(0xc420074300, 0x0, 0x40164ee, 0xc420080040)
            /go/tip/src/os/exec_unix.go:22 +0xab fp=0xc420051c50 sp=0xc420051bc0
        os.(*Process).Wait(0xc420074300, 0xc420051cb8, 0x0, 0x0)
            /go/tip/src/os/doc.go:49 +0x2b fp=0xc420051c80 sp=0xc420051c50
        os/exec.(*Cmd).Wait(0xc420092000, 0x0, 0x0)
            /go/tip/src/os/exec/exec.go:434 +0x6d fp=0xc420051d10 sp=0xc420051c80
        os/exec.(*Cmd).Run(0xc420092000, 0xc420094000, 0x3)
            /go/tip/src/os/exec/exec.go:279 +0x48 fp=0xc420051d38 sp=0xc420051d10
        os/exec.(*Cmd).CombinedOutput(0xc420092000, 0x52, 0xc420051e50, 0x2, 0x2, 0xc420092000)
            /go/tip/src/os/exec/exec.go:494 +0x13b fp=0xc420051d78 sp=0xc420051d38
        main.CgoExternalThreadSignal()
            /go/tip/src/runtime/testdata/testprogcgo/threadprof.go:85 +0xe3 fp=0xc420051ea0 sp=0xc420051d78
        main.main()
            /go/tip/src/runtime/testdata/testprogcgo/main.go:34 +0x1f1 fp=0xc420051f48 sp=0xc420051ea0
        runtime.main()
            /go/tip/src/runtime/proc.go:183 +0x1f4 fp=0xc420051fa0 sp=0xc420051f48
        runtime.goexit()
            /go/tip/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc420051fa8 sp=0xc420051fa0

        goroutine 17 [syscall, locked to thread]:
        runtime.goexit()
            /go/tip/src/runtime/asm_amd64.s:2086 +0x1

        goroutine 19 [syscall]:
        os/signal.signal_recv(0x0)
            /go/tip/src/runtime/sigqueue.go:116 +0x157
        os/signal.loop()
            /go/tip/src/os/signal/signal_unix.go:22 +0x22
        created by os/signal.init.1
            /go/tip/src/os/signal/signal_unix.go:28 +0x41

        goroutine 20 [syscall]:
        syscall.Syscall(0x3, 0x4, 0xc4200b0aa2, 0x135e, 0x38, 0x0, 0x0)
            /go/tip/src/syscall/asm_darwin_amd64.s:16 +0x5
        syscall.read(0x4, 0xc4200b0aa2, 0x135e, 0x135e, 0x38, 0x0, 0x0)
            /go/tip/src/syscall/zsyscall_darwin_amd64.go:973 +0x55
        syscall.Read(0x4, 0xc4200b0aa2, 0x135e, 0x135e, 0x38, 0x0, 0x0)
            /go/tip/src/syscall/syscall_unix.go:161 +0x49
        os.(*File).read(0xc420080028, 0xc4200b0aa2, 0x135e, 0x135e, 0x38, 0x0, 0x0)
            /go/tip/src/os/file_unix.go:228 +0x6a
        os.(*File).Read(0xc420080028, 0xc4200b0aa2, 0x135e, 0x135e, 0x38, 0x0, 0x0)
            /go/tip/src/os/file.go:101 +0x59
        bytes.(*Buffer).ReadFrom(0xc420094000, 0x4145300, 0xc420080028, 0xc420021688, 0x1, 0x0)
            /go/tip/src/bytes/buffer.go:176 +0x155
        io.copyBuffer(0x4145240, 0xc420094000, 0x4145300, 0xc420080028, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
            /go/tip/src/io/io.go:384 +0x324
        io.Copy(0x4145240, 0xc420094000, 0x4145300, 0xc420080028, 0x0, 0x0, 0x0)
            /go/tip/src/io/io.go:360 +0x68
        os/exec.(*Cmd).writerDescriptor.func1(0x0, 0x0)
            /go/tip/src/os/exec/exec.go:253 +0x4d
        os/exec.(*Cmd).Start.func1(0xc420092000, 0xc42007e120)
            /go/tip/src/os/exec/exec.go:370 +0x27
        created by os/exec.(*Cmd).Start
            /go/tip/src/os/exec/exec.go:371 +0x4db

        rax    0x20000ad
        rbx    0xc42001950c
        rcx    0xc420051b20
        rdx    0xc42009a000
        rdi    0x1
        rsi    0x103da
        rbp    0xc420051bb0
        rsp    0xc420051b20
        r8     0x0
        r9     0x0
        r10    0x24
        r11    0x202
        r12    0x40
        r13    0xf3
        r14    0x30
        r15    0x3
        rip    0x4074f7e
        rflags 0x202
        cs     0x7
        fs     0x0
        gs     0x0

i also captured several hangs with GODEBUG=gctrace=1,schedtrace=100,scheddetail=1 set. they're pretty noisy but i can upload one of them somewhere if that would be useful.

lastly, there is an instance of a testprogcgo.exe CgoExternalThreadSignal crash process for every hang. if i ctrl-c the main test process, these processes are cleaned up. if i let it finish on its own, they stay alive.

@ianlancetaylor
Copy link
Contributor

Interesting. When the process hangs like this, it is waiting for testprogcgo.exe CgoExternalThreadSignal crash to exit. That process should always exit within 5 seconds. I don't know why it isn't. If you can recreate this, then the next step is to run

GOTRACEBACK=system testprogcgo.exe CgoExternalThreadSIgnal crash

in a loop. Wait for it to hang--presumably it will eventually hang, otherwise I don't know what is going on. Then press ^\ to get a stack trace where it is hanging. Thanks.

@ianlancetaylor
Copy link
Contributor

By the way, you can build testprogcgo.exe by just going to the directory src/runtime/testdata/testprogcgo and running go build -o testprogcgo.exe.

@ianlancetaylor
Copy link
Contributor

For what it's worth, I ran the test 10,000 times, and ran testprogcgo by itself 10,000 times, and they all passed for me. This was on a system whose uname -v output is

Darwin Kernel Version 16.0.0: Tue Jun 28 15:57:40 PDT 2016; root:xnu-3757.0.0.0.1~20/RELEASE_X86_64

@jaredborner
Copy link

uname -v output for me is

Darwin Kernel Version 15.0.0: Wed Aug 26 19:41:34 PDT 2015; root:xnu-3247.1.106~5/RELEASE_X86_64

if i run GOTRACEBACK=system testprogcgo.exe CgoExternalThreadSignal crash directly, it will hang but it ignores SIGQUIT.

running GOTRACEBACK=system testprogcgo.exe CgoExternalThreadSignal seems to run "crash" indirectly, and when it hangs SIGQUIT give this:

^\SIGQUIT: quit
PC=0x4074f7e m=0

goroutine 1 [syscall]:
syscall.Syscall6(0xad, 0x1, 0xfc7c, 0xc42008c000, 0x24, 0x0, 0x0, 0x40c4ea0, 0xc420051bb0, 0x4037a5e)
    /go/tip/src/syscall/asm_darwin_amd64.s:41 +0x5 fp=0xc420051b28 sp=0xc420051b20
os.(*Process).blockUntilWaitable(0xc42000c4e0, 0x40ea4b8, 0xc420088000, 0xc4200102e0)
    /go/tip/src/os/wait_waitid.go:28 +0xbc fp=0xc420051bc0 sp=0xc420051b28
os.(*Process).wait(0xc42000c4e0, 0x0, 0x40164ee, 0xc420028048)
    /go/tip/src/os/exec_unix.go:22 +0xab fp=0xc420051c50 sp=0xc420051bc0
os.(*Process).Wait(0xc42000c4e0, 0xc420051cb8, 0x0, 0x0)
    /go/tip/src/os/doc.go:49 +0x2b fp=0xc420051c80 sp=0xc420051c50
os/exec.(*Cmd).Wait(0xc420088000, 0x0, 0x0)
    /go/tip/src/os/exec/exec.go:434 +0x6d fp=0xc420051d10 sp=0xc420051c80
os/exec.(*Cmd).Run(0xc420088000, 0xc420054070, 0x3)
    /go/tip/src/os/exec/exec.go:279 +0x48 fp=0xc420051d38 sp=0xc420051d10
os/exec.(*Cmd).CombinedOutput(0xc420088000, 0x11, 0xc420051e50, 0x2, 0x2, 0xc420088000)
    /go/tip/src/os/exec/exec.go:494 +0x13b fp=0xc420051d78 sp=0xc420051d38
main.CgoExternalThreadSignal()
    /go/tip/src/runtime/testdata/testprogcgo/threadprof.go:85 +0xe3 fp=0xc420051ea0 sp=0xc420051d78
main.main()
    /go/tip/src/runtime/testdata/testprogcgo/main.go:34 +0x1f1 fp=0xc420051f48 sp=0xc420051ea0
runtime.main()
    /go/tip/src/runtime/proc.go:183 +0x1f4 fp=0xc420051fa0 sp=0xc420051f48
runtime.goexit()
    /go/tip/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc420051fa8 sp=0xc420051fa0

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /go/tip/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc420038fb8 sp=0xc420038fb0

goroutine 2 [force gc (idle)]:
runtime.gopark(0x40ea840, 0x4156560, 0x40df07b, 0xf, 0x40ea714, 0x1)
    /go/tip/src/runtime/proc.go:259 +0x13a fp=0xc420024748 sp=0xc420024718
runtime.goparkunlock(0x4156560, 0x40df07b, 0xf, 0xc420000114, 0x1)
    /go/tip/src/runtime/proc.go:265 +0x5e fp=0xc420024788 sp=0xc420024748
runtime.forcegchelper()
    /go/tip/src/runtime/proc.go:224 +0xa8 fp=0xc4200247c0 sp=0xc420024788
runtime.goexit()
    /go/tip/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4200247c8 sp=0xc4200247c0
created by runtime.init.4
    /go/tip/src/runtime/proc.go:213 +0x35

goroutine 3 [GC sweep wait]:
runtime.gopark(0x40ea840, 0x4156700, 0x40dec18, 0xd, 0x4023914, 0x1)
    /go/tip/src/runtime/proc.go:259 +0x13a fp=0xc420024f38 sp=0xc420024f08
runtime.goparkunlock(0x4156700, 0x40dec18, 0xd, 0x14, 0x1)
    /go/tip/src/runtime/proc.go:265 +0x5e fp=0xc420024f78 sp=0xc420024f38
runtime.bgsweep(0xc420054000)
    /go/tip/src/runtime/mgcsweep.go:63 +0xb6 fp=0xc420024fb8 sp=0xc420024f78
runtime.goexit()
    /go/tip/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc420024fc0 sp=0xc420024fb8
created by runtime.gcenable
    /go/tip/src/runtime/mgc.go:195 +0x61

goroutine 4 [finalizer wait]:
runtime.gopark(0x40ea840, 0x4170f10, 0x40dee94, 0xe, 0x14, 0x1)
    /go/tip/src/runtime/proc.go:259 +0x13a fp=0xc420025708 sp=0xc4200256d8
runtime.goparkunlock(0x4170f10, 0x40dee94, 0xe, 0x14, 0x1)
    /go/tip/src/runtime/proc.go:265 +0x5e fp=0xc420025748 sp=0xc420025708
runtime.runfinq()
    /go/tip/src/runtime/mfinal.go:158 +0xaf fp=0xc4200257c0 sp=0xc420025748
runtime.goexit()
    /go/tip/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4200257c8 sp=0xc4200257c0
created by runtime.createfing
    /go/tip/src/runtime/mfinal.go:139 +0x73

goroutine 5 [syscall]:
runtime.notetsleepg(0x41710a0, 0xffffffffffffffff, 0x1)
    /go/tip/src/runtime/lock_sema.go:257 +0x4b fp=0xc420025f50 sp=0xc420025f10
os/signal.signal_recv(0x0)
    /go/tip/src/runtime/sigqueue.go:116 +0x157 fp=0xc420025f80 sp=0xc420025f50
os/signal.loop()
    /go/tip/src/os/signal/signal_unix.go:22 +0x22 fp=0xc420025fc0 sp=0xc420025f80
runtime.goexit()
    /go/tip/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc420025fc8 sp=0xc420025fc0
created by os/signal.init.1
    /go/tip/src/os/signal/signal_unix.go:28 +0x41

goroutine 6 [syscall]:
syscall.Syscall(0x3, 0x4, 0xc4200abe00, 0x8000, 0x38, 0x0, 0x0)
    /go/tip/src/syscall/asm_darwin_amd64.s:16 +0x5 fp=0xc420026470 sp=0xc420026468
syscall.read(0x4, 0xc4200abe00, 0x8000, 0x8000, 0x38, 0x0, 0x0)
    /go/tip/src/syscall/zsyscall_darwin_amd64.go:973 +0x55 fp=0xc4200264c8 sp=0xc420026470
syscall.Read(0x4, 0xc4200abe00, 0x8000, 0x8000, 0x38, 0x0, 0x0)
    /go/tip/src/syscall/syscall_unix.go:161 +0x49 fp=0xc420026510 sp=0xc4200264c8
os.(*File).read(0xc420028030, 0xc4200abe00, 0x8000, 0x8000, 0x38, 0x0, 0x0)
    /go/tip/src/os/file_unix.go:228 +0x6a fp=0xc420026558 sp=0xc420026510
os.(*File).Read(0xc420028030, 0xc4200abe00, 0x8000, 0x8000, 0x38, 0x0, 0x0)
    /go/tip/src/os/file.go:101 +0x59 fp=0xc4200265c0 sp=0xc420026558
bytes.(*Buffer).ReadFrom(0xc420054070, 0x4145300, 0xc420028030, 0xc420026688, 0x1, 0x0)
    /go/tip/src/bytes/buffer.go:176 +0x155 fp=0xc420026630 sp=0xc4200265c0
io.copyBuffer(0x4145240, 0xc420054070, 0x4145300, 0xc420028030, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /go/tip/src/io/io.go:384 +0x324 fp=0xc4200266b8 sp=0xc420026630
io.Copy(0x4145240, 0xc420054070, 0x4145300, 0xc420028030, 0x0, 0x0, 0x0)
    /go/tip/src/io/io.go:360 +0x68 fp=0xc420026718 sp=0xc4200266b8
os/exec.(*Cmd).writerDescriptor.func1(0x0, 0x0)
    /go/tip/src/os/exec/exec.go:253 +0x4d fp=0xc420026778 sp=0xc420026718
os/exec.(*Cmd).Start.func1(0xc420088000, 0xc4200102e0)
    /go/tip/src/os/exec/exec.go:370 +0x27 fp=0xc4200267b0 sp=0xc420026778
runtime.goexit()
    /go/tip/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4200267b8 sp=0xc4200267b0
created by os/exec.(*Cmd).Start
    /go/tip/src/os/exec/exec.go:371 +0x4db

rax    0x20000ad
rbx    0xc42001800c
rcx    0xc420051b20
rdx    0xc42008c000
rdi    0x1
rsi    0xfc7c
rbp    0xc420051bb0
rsp    0xc420051b20
r8     0x0
r9     0x0
r10    0x24
r11    0x202
r12    0x40
r13    0xf3
r14    0x30
r15    0x3
rip    0x4074f7e
rflags 0x202
cs     0x7
fs     0x0
gs     0x0

@ianlancetaylor
Copy link
Contributor

Thanks. Unfortunately the traceback you showed just tells me that the program is waiting for another program to exit. The other program is testprogcgo.exe CgoExternalThreadSignal crash. We need to figure out why that program is not exiting. If it is ignoring ^\, can you try something else? Try killing the process from a different terminal, such as kill -ILL <pid>. Or try attaching to it from a debugger and getting a stack trace. Or try running the program under dtruss -f. Any of those might help figure out what it is doing. Thanks.

@jaredborner
Copy link

jaredborner commented Jul 8, 2016

i was able to use gdb to get the thread backtraces, and used delve to try and get some more info on the goroutines. if there's some other debugger info that'd be useful please let me know.

this was from running testprogcgo.exe CgoExternalThreadSignal crash directly

gdb-thread-backtraces.txt
delve-goroutine-info.txt

*edit: also, wanted to mention that the hung process takes up ~100% of the cpu

@ianlancetaylor
Copy link
Contributor

Thanks!

Your program seems to be in a state where at least some uses of the SYSCALL instruction are raising a signal. The effect seems to be an endless loop as the program tries to do something, makes a system call, and another signal is raised. Unfortunately I don't know what that signal is or why it is occurring. Based on the appearance of sigpanic in the call stack, I think the signal must be one of SIGFPE, SIGBUS or SIGSEGV, probably the latter. But I can't think of any reason that a SYSCALL instruction would cause a SIGSEGV.

@ianlancetaylor
Copy link
Contributor

If you can get a copy of dtruss -f on the program it might suggest something. Right now without the ability to reproduce the problem myself I am stumped.

@jaredborner
Copy link

yes, the lack of reproducibility is unfortunate. i haven't been able to capture the hang under dtruss yet.

i was able to get the same behavior with this reduced/standalone program:

package main

/*
#include <pthread.h>

void **nullptr;

void *crash(void *p) {
    *nullptr = p;
    return 0;
}

int start_crashing_thread(void) {
    pthread_t tid;
    return pthread_create(&tid, 0, crash, 0);
}
*/
import "C"

func main() {
    if C.start_crashing_thread() == 0 {
        for i := 0; i < 2e9; i++ {
        }
    }
}

backtraces: backtraces.txt
terminal output: terminal.txt

i ran the program in a bash while true loop and it usual hangs within ~1min.

it seems to be the same issue, but i may be reading things incorrectly.

i'll keep trying to see if i can get it to hang under dtruss -f

@jaredborner
Copy link

finally was able to get the hang under dtruss

dtruss.txt
backtraces.txt
testprogcgo.exe.txt

the total dtruss log is ~170 megs but that's the tail of it

@josharian
Copy link
Contributor

Anecdotally, this appears easier to reproduce on a heavily loaded system.

@gopherbot
Copy link

CL https://golang.org/cl/33300 mentions this issue.

@ianlancetaylor
Copy link
Contributor

I think I see the problem. The code in raisebadsignal does this:

handler := originalHandler // in this case, _SIG_DFL
unblocksig(sig)
setsig(sig, handler)
raise(sig)
setsig(sig, funcPC(sighandler))

On Darwin the raise system call is actually implemented as kill(getpid(), sig), so it sends the signal to the entire process. If the kernel chooses to deliver the signal to a different thread, then the current thread will keep running. It will immediately reset the signal handler back to the Go signal handler, so when the kernel does deliver the signal it will send it there, rather than to the default handler as we intended.

CL 33300 adds a sleep to give the kernel a chance to deliver the signal.

I would be grateful if anybody able to recreate this problem could try to recreate it with that CL applied. Thanks.

@jaredborner
Copy link

I've been able to get it to hang with a recent commit (d338f2e)

With this CL I haven't seen it hang. I kept it running for a while in the background (probably like 4 or 5 hrs)

@ianlancetaylor
Copy link
Contributor

Thanks for testing it. Sounds like it's the right fix.

@aclements
Copy link
Member

It looks like Darwin has a pthread_kill system call that's like Linux's tkill. Could we use that instead to explicitly target the signal at the current thread?

@ianlancetaylor
Copy link
Contributor

I don't know. The comment in runtime/sys_darwin_amd64.s on raise is

    // Ideally we'd send the signal to the current thread,
    // not the whole process, but that's too hard on OS X.

@ianlancetaylor
Copy link
Contributor

As far as I can tell there is no pthread_self system call on Darwin. Instead, pthread_self seems to be implemented as, in effect, a TLS load. I don't know how we can reliably determine the correct offset.

That might be easier to implement if we first fix #17200.

@golang golang locked and limited conversation to collaborators Nov 22, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

8 participants