runtime: segfault in tests #5422
Some notes.

When I look at the type _select in zruntime_defs_$(GOOS)_$(GOARCH).go, I'm slightly troubled by this:

    type _select struct {
        tcase uint16
        ncase uint16
        pollorder *uint16
        lockorder **hchan
        scase [1]scase
    }

The C code will allocate more than one entry for scase (see newselect in chan.c). Will the precise GC handle this unwarranted chumminess correctly? It should be fine as long as all the pointers to this memory are (from the point of view of the GC) untyped. And that may well be the case.

The crash is happening on a parked call to selunlock, which must have come from chan.c:989. When I look at the goroutine backtraces, I'm puzzled that I don't see any goroutine sitting at that line number. Somebody must have called runtime·ready on the goroutine to cause it to start going again, but where is it? This may be expected, I'm not sure.

At the point of failure the program is running TestMultiConsumer from runtime/chan_test.go. It's interesting to note that that test does not use select at all, so it would seem that whatever goroutine is executing a select is left over from some earlier test. Although I can't make out which test that would be.
I've found a way to reproduce this more reliably:

GOGC=0 GOGCTRACE=1 GOMAXPROCS=2 ./runtime.test -test.short -test.cpu=1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4 -test.v

It failed within 3 or 4 runs. Here's another trace:

=== RUN TestPseudoRandomSend-2
gc313476(2): 0+0+0 ms, 10 -> 10 MB 3021 -> 3021 (1262540-1259519) objects, 2(30) handoff, 6(385) steal, 31/4/0 yields
gc313477(2): 0+0+0 ms, 10 -> 10 MB 3022 -> 3022 (1262542-1259520) objects, 2(23) handoff, 4(413) steal, 40/6/1 yields
gc313478(2): 0+0+0 ms, 10 -> 10 MB 3023 -> 3023 (1262544-1259521) objects, 2(19) handoff, 6(421) steal, 37/3/0 yields
gc313479(2): 0+0+0 ms, 10 -> 10 MB 3024 -> 3024 (1262546-1259522) objects, 2(12) handoff, 10(1357) steal, 36/15/2 yields
gc313480(2): 0+0+0 ms, 10 -> 10 MB 3025 -> 3025 (1262548-1259523) objects, 1(15) handoff, 5(170) steal, 31/4/0 yields
gc313481(2): 0+0+0 ms, 10 -> 10 MB 3026 -> 3026 (1262550-1259524) objects, 2(10) handoff, 10(1357) steal, 26/15/2 yields
gc313482(2): 0+0+0 ms, 10 -> 10 MB 3027 -> 3027 (1262552-1259525) objects, 2(43) handoff, 6(421) steal, 36/7/0 yields
gc313483(2): 0+0+0 ms, 10 -> 10 MB 3028 -> 3023 (1262554-1259531) objects, 0(0) handoff, 5(408) steal, 22/11/0 yields
gc313484(2): 0+0+0 ms, 10 -> 10 MB 3023 -> 3023 (1262556-1259533) objects, 1(9) handoff, 4(418) steal, 24/3/0 yields
gc313485(2): 0+0+0 ms, 10 -> 10 MB 3023 -> 3023 (1262558-1259535) objects, 3(56) handoff, 5(1124) steal, 22/14/1 yields
gc313486(2): 0+0+0 ms, 10 -> 10 MB 3023 -> 3023 (1262560-1259537) objects, 3(61) handoff, 4(1123) steal, 22/15/1 yields
gc313487(2): 0+0+0 ms, 10 -> 10 MB 3023 -> 3023 (1262562-1259539) objects, 4(115) handoff, 4(834) steal, 22/15/1 yields
gc313488(2): 0+0+0 ms, 10 -> 10 MB 3023 -> 3023 (1262564-1259541) objects, 1(5) handoff, 5(419) steal, 22/2/0 yields
SIGSEGV: segmentation violation
PC=0x4065cf

selunlock(0xc2008cab40)
    /home/alberts/go/src/pkg/runtime/chan.c:817 +0x3f
park0(0xc200817200)
    /home/alberts/go/src/pkg/runtime/proc.c:1186 +0x7a
runtime.mcall()
    /home/alberts/go/src/pkg/runtime/asm_amd64.s:195 +0x49

goroutine 1 [chan receive]:
runtime.park(0x40d1b0, 0xc2003103b0, 0x75f90a)
    /home/alberts/go/src/pkg/runtime/proc.c:1175 +0x64
runtime.chanrecv(0x573a00, 0xc200310360, 0x7f4c9831cce0, 0x0, 0x0, ...)
    /home/alberts/go/src/pkg/runtime/chan.c:366 +0x566
runtime.chanrecv1()
    /home/alberts/go/src/pkg/runtime/chan.c:458 +0x38
testing.RunTests(0x650528, 0x760260, 0x31, 0x31, 0x1, ...)
    /build/go/go/src/pkg/testing/testing.go:434 +0x88e
testing.Main(0x650528, 0x760260, 0x31, 0x31, 0x762040, ...)
    /build/go/go/src/pkg/testing/testing.go:365 +0x8a
main.main()
    runtime/_test/_testmain.go:319 +0x9a
runtime.main()
    /home/alberts/go/src/pkg/runtime/proc.c:182 +0x92
runtime.goexit()
    /home/alberts/go/src/pkg/runtime/proc.c:1223

goroutine 2 [syscall]:
runtime.entersyscallblock()
    /home/alberts/go/src/pkg/runtime/proc.c:1333 +0x16e
runtime.MHeap_Scavenger()
    /home/alberts/go/src/pkg/runtime/mheap.c:454 +0xee
runtime.goexit()
    /home/alberts/go/src/pkg/runtime/proc.c:1223
created by runtime.main
    /home/alberts/go/src/pkg/runtime/proc.c:165

goroutine 68 [timer goroutine (idle)]:
runtime.park(0x40d1b0, 0x763420, 0x75a686)
    /home/alberts/go/src/pkg/runtime/proc.c:1175 +0x64
timerproc()
    /home/alberts/go/src/pkg/runtime/ztime_linux_amd64.c:187 +0x79
runtime.goexit()
    /home/alberts/go/src/pkg/runtime/proc.c:1223
created by addtimer
    /home/alberts/go/src/pkg/runtime/ztime_linux_amd64.c:82

goroutine 36 [finalizer wait]:
runtime.park(0x0, 0x0, 0x760ad1)
    /home/alberts/go/src/pkg/runtime/proc.c:1175 +0x64
runfinq()
    /home/alberts/go/src/pkg/runtime/mgc0.c:2182 +0x6d
runtime.goexit()
    /home/alberts/go/src/pkg/runtime/proc.c:1223
created by runtime.gc
    /home/alberts/go/src/pkg/runtime/mgc0.c:1886

goroutine 560834 [runnable]:
runtime.gosched()
    /home/alberts/go/src/pkg/runtime/proc.c:1201 +0x25
runtime.Gosched()
    /home/alberts/go/src/pkg/runtime/proc.c:1621 +0x18
runtime_test.func·001()
    /home/alberts/go/src/pkg/runtime/chan_test.go:38 +0x55
runtime.goexit()
    /home/alberts/go/src/pkg/runtime/proc.c:1223
created by runtime_test.TestPseudoRandomSend
    /home/alberts/go/src/pkg/runtime/chan_test.go:42 +0x12d

goroutine 560833 [running]:
runtime.park(0xc2008cab40, 0x7f4c9835c7e8, 0x75fa36)
    /home/alberts/go/src/pkg/runtime/proc.c:1175 +0x64
selectgo(0x7f4c97e14ec8)
    /home/alberts/go/src/pkg/runtime/chan.c:1119 +0x28d
created by testing.RunTests
    /build/go/go/src/pkg/testing/testing.go:433 +0x86b

rax 0x0
rbx 0x0
rcx 0x0
rdx 0xc200310ee0
rdi 0x419320
rsi 0xc200791a00
rbp 0xc200310ea0
rsp 0xc200797f58
r8 0xffffffff
r9 0xc200310ea0
r10 0x0
r11 0x202
r12 0x0
r13 0x4f
r14 0x3f59824c084218
r15 0x6261d0
rip 0x4065cf
rflags 0x10202
cs 0x33
fs 0x0
gs 0x0
The good news is that running TestPseudoRandomSend on its own crashes it in a few seconds.

GOGC=0 GOGCTRACE=1 GOMAXPROCS=2 ./runtime.test -test.short -test.cpu=1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4,1,2,4 -test.v -test.run=TestPseudoRandomSend
Thanks. I was able to replicate the crash on my Ubuntu system.

SIGSEGV: segmentation violation
PC=0x4065cf

goroutine 1 [chan receive]:
testing.RunTests(0x650528, 0x760260, 0x31, 0x31, 0x1, ...)
    /home/iant/go2/src/pkg/testing/testing.go:434 +0x88e
testing.Main(0x650528, 0x760260, 0x31, 0x31, 0x762040, ...)
    /home/iant/go2/src/pkg/testing/testing.go:365 +0x8a
main.main()
    runtime/_test/_testmain.go:319 +0x9a

goroutine 62 [chan receive]:
runtime_test.func·001()
    /home/iant/go2/src/pkg/runtime/chan_test.go:39 +0x6f
created by runtime_test.TestPseudoRandomSend
    /home/iant/go2/src/pkg/runtime/chan_test.go:42 +0x12d

goroutine 61 [running]:
created by testing.RunTests
    /home/iant/go2/src/pkg/testing/testing.go:433 +0x86b

rax 0x0
rbx 0x0
rcx 0x0
rdx 0xc2000d5580
rdi 0x419320
rsi 0xc200079b00
rbp 0xc2000d5540
rsp 0xc2000e7f58
r8 0xffffffff
r9 0xc2000d5540
r10 0x0
r11 0x202
r12 0x0
r13 0x62
r14 0x10
r15 0x6261d0
rip 0x4065cf
rflags 0x10202
cs 0x33
fs 0x0
gs 0x0

Labels changed: added priority-asap, go1.1, removed priority-triage.
With GOTRACEBACK=2:

SIGSEGV: segmentation violation
PC=0x4065cf

selunlock(0xc2001130c0)
    /home/iant/go2/src/pkg/runtime/chan.c:817 +0x3f
park0(0xc200079800)
    /home/iant/go2/src/pkg/runtime/proc.c:1186 +0x7a
runtime.mcall()
    /home/iant/go2/src/pkg/runtime/asm_amd64.s:195 +0x49

goroutine 1 [running]:
syscall.Syscall()
    /home/iant/go2/src/pkg/syscall/asm_linux_amd64.s:16 +0x5
syscall.write(0x1, 0xc2000b8800, 0x30, 0x4a70fd, 0xc2000b7120, ...)
    /home/iant/go2/src/pkg/syscall/zerrors_linux_amd64.go:2717 +0x70
syscall.Write(0x1, 0xb8800, 0x4b1b8a, 0xc2000b7120, 0x4c4498, ...)
    /home/iant/go2/src/pkg/syscall/syscall_unix.go:143 +0x5a

goroutine 2 [syscall]:
runtime.entersyscallblock()
    /home/iant/go2/src/pkg/runtime/proc.c:1333 +0x16e
runtime.MHeap_Scavenger()
    /home/iant/go2/src/pkg/runtime/mheap.c:454 +0xee
runtime.goexit()
    /home/iant/go2/src/pkg/runtime/proc.c:1223
created by runtime.main
    /home/iant/go2/src/pkg/runtime/proc.c:165

rax 0x0
rbx 0x0
rcx 0x0
rdx 0xc200112be0
rdi 0x419320
rsi 0xc200079b00
rbp 0xc200112ba0
rsp 0xc2000e4f58
r8 0xffffffff
r9 0xc200112ba0
r10 0x0
r11 0x202
r12 0x80
r13 0x4b
r14 0x10
r15 0x6261d0
rip 0x4065cf
rflags 0x10202
cs 0x33
fs 0x0
gs 0x0
I see the problem. Will send a fix shortly.

Owner changed to @dvyukov.
Status changed to Started.
This issue was closed by revision 26d95d8.

Status changed to Fixed.
adg added a commit that referenced this issue May 11, 2015
««« CL 9311043 / 53bc96b4c0c7
runtime: fix crash in select

runtime.park() can access freed select descriptor
due to a racing free in another thread.
See the comment for details.

Slightly modified version of dvyukov's CL 9259045.

No test yet. Before this CL, the test described in issue 5422
would fail about every 40 times for me. With this CL, I ran
the test 5900 times with no failures.

Fixes #5422.

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/9311043
»»»

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/9304044
This issue was closed.