Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: deadlock/livelock in GC code #28695

Closed
RLH opened this issue Nov 9, 2018 · 5 comments
Closed

runtime: deadlock/livelock in GC code #28695

RLH opened this issue Nov 9, 2018 · 5 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@RLH
Copy link
Contributor

RLH commented Nov 9, 2018

The GC appears to be deadlock. The test in issue27695.go triggers it though the test seem unrelated to the deadlock.

go version devel +ac277d9234 Wed Nov 7 13:04:50 2018 +0000 linux/amd64

GOARCH="amd64"
GOOS="linux""

Noticed this running go/test/fixedbugs/issue27695.go from the standard distribution.

Once every 1000 or so runs it hangs. The easiest way to make this happen is to run
GOTRACEBACK=crash stress ../test/fixedbugs/issue27695

Run top and you will see 1 out of a thousand or so invocations of issue27695 hanging.
Grab the PID of one of the hung executions and run gdb and dump all the threads.

rlh@rlh:~/work/go$ gdb -p 49509
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 49509
[New LWP 49510]
[New LWP 49511]
[New LWP 49512]
[New LWP 49513]
[New LWP 49514]
[New LWP 49519]
runtime.futex () at /home/rlh/work/go/src/runtime/sys_linux_amd64.s:536
536		MOVL	AX, ret+40(FP)
warning: File "/home/rlh/work/go/src/runtime/runtime-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
	add-auto-load-safe-path /home/rlh/work/go/src/runtime/runtime-gdb.py
line to your configuration file "/home/rlh/.gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/home/rlh/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
(gdb) thread apply all bt

Thread 7 (LWP 49519):
#0  runtime.futex () at /home/rlh/work/go/src/runtime/sys_linux_amd64.s:536
#1  0x00000000004261f0 in runtime.futexsleep (addr=0x587a48 <runtime.sched+296>, val=0, ns=100000) at /home/rlh/work/go/src/runtime/os_linux.go:63
#2  0x000000000040971e in runtime.notetsleep_internal (n=0x587a48 <runtime.sched+296>, ns=100000, ~r2=<optimized out>) at /home/rlh/work/go/src/runtime/lock_futex.go:193
#3  0x00000000004097f1 in runtime.notetsleep (n=0x587a48 <runtime.sched+296>, ns=100000, ~r2=<optimized out>) at /home/rlh/work/go/src/runtime/lock_futex.go:216
#4  0x000000000042d221 in runtime.forEachP (fn={void (runtime.p *)} 0xc00006ffb8) at /home/rlh/work/go/src/runtime/proc.go:1396
#5  0x000000000044d8fd in runtime.gcMarkDone.func1 () at /home/rlh/work/go/src/runtime/mgc.go:1407
#6  0x000000000044fce6 in runtime.systemstack () at /home/rlh/work/go/src/runtime/asm_amd64.s:351
#7  0x000000000042caf0 in ?? () at /home/rlh/work/go/src/runtime/proc.go:1082
#8  0x0000000000000000 in ?? ()

Thread 6 (LWP 49514):
#0  runtime.futex () at /home/rlh/work/go/src/runtime/sys_linux_amd64.s:536
#1  0x000000000042617b in runtime.futexsleep (addr=0xc0000604c0, val=0, ns=-1) at /home/rlh/work/go/src/runtime/os_linux.go:46
#2  0x000000000040958f in runtime.notesleep (n=0xc0000604c0) at /home/rlh/work/go/src/runtime/lock_futex.go:151
#3  0x000000000042dfe8 in runtime.stopm () at /home/rlh/work/go/src/runtime/proc.go:1936
#4  0x000000000042ea54 in runtime.gcstopm () at /home/rlh/work/go/src/runtime/proc.go:2133
#5  0x000000000042f2d4 in runtime.findrunnable (gp=0xc000026500, inheritTime=false) at /home/rlh/work/go/src/runtime/proc.go:2188
#6  0x000000000042fcb8 in runtime.schedule () at /home/rlh/work/go/src/runtime/proc.go:2525
#7  0x00000000004301d4 in runtime.goschedImpl (gp=0xc000063c80) at /home/rlh/work/go/src/runtime/proc.go:2620
#8  0x00000000004303e4 in runtime.gopreempt_m (gp=0xc000063c80) at /home/rlh/work/go/src/runtime/proc.go:2648
#9  0x000000000043dab9 in runtime.newstack () at /home/rlh/work/go/src/runtime/stack.go:1033
#10 0x000000000044fdbf in runtime.morestack () at /home/rlh/work/go/src/runtime/asm_amd64.s:429
#11 0x0000000000000000 in ?? ()

Thread 5 (LWP 49513):
#0  runtime.futex () at /home/rlh/work/go/src/runtime/sys_linux_amd64.s:536
#1  0x000000000042617b in runtime.futexsleep (addr=0xc000060140, val=0, ns=-1) at /home/rlh/work/go/src/runtime/os_linux.go:46
#2  0x000000000040958f in runtime.notesleep (n=0xc000060140) at /home/rlh/work/go/src/runtime/lock_futex.go:151
#3  0x000000000042dfe8 in runtime.stopm () at /home/rlh/work/go/src/runtime/proc.go:1936
#4  0x000000000042f13a in runtime.findrunnable (gp=0xc000028a00, inheritTime=false) at /home/rlh/work/go/src/runtime/proc.go:2399
#5  0x000000000042fcb8 in runtime.schedule () at /home/rlh/work/go/src/runtime/proc.go:2525
#6  0x00000000004301d4 in runtime.goschedImpl (gp=0xc000062d80) at /home/rlh/work/go/src/runtime/proc.go:2620
#7  0x00000000004303e4 in runtime.gopreempt_m (gp=0xc000062d80) at /home/rlh/work/go/src/runtime/proc.go:2648
#8  0x000000000043dab9 in runtime.newstack () at /home/rlh/work/go/src/runtime/stack.go:1033
#9  0x000000000044fdbf in runtime.morestack () at /home/rlh/work/go/src/runtime/asm_amd64.s:429
#10 0x0000000000000000 in ?? ()

Thread 4 (LWP 49512):
#0  runtime.futex () at /home/rlh/work/go/src/runtime/sys_linux_amd64.s:536
#1  0x000000000042617b in runtime.futexsleep (addr=0xc00003a840, val=0, ns=-1) at /home/rlh/work/go/src/runtime/os_linux.go:46
---Type <return> to continue, or q <return> to quit---
#2  0x000000000040958f in runtime.notesleep (n=0xc00003a840) at /home/rlh/work/go/src/runtime/lock_futex.go:151
#3  0x000000000042dfe8 in runtime.stopm () at /home/rlh/work/go/src/runtime/proc.go:1936
#4  0x000000000042f13a in runtime.findrunnable (gp=0xc000024000, inheritTime=false) at /home/rlh/work/go/src/runtime/proc.go:2399
#5  0x000000000042fcb8 in runtime.schedule () at /home/rlh/work/go/src/runtime/proc.go:2525
#6  0x000000000042ffcc in runtime.park_m (gp=0xc000062a80) at /home/rlh/work/go/src/runtime/proc.go:2605
#7  0x000000000044fc5b in runtime.mcall () at /home/rlh/work/go/src/runtime/asm_amd64.s:299
#8  0x0000000000000000 in ?? ()

Thread 3 (LWP 49511):
#0  runtime.procyield () at /home/rlh/work/go/src/runtime/asm_amd64.s:553
#1  0x000000000042c041 in runtime.scang (gp=0xc000063e00, gcw=0xc000027770) at /home/rlh/work/go/src/runtime/proc.go:923
#2  0x000000000044ddbd in runtime.markroot.func1 () at /home/rlh/work/go/src/runtime/mgcmark.go:221
#3  0x000000000041a796 in runtime.markroot (gcw=0xc000027770, i=26) at /home/rlh/work/go/src/runtime/mgcmark.go:202
#4  0x000000000041c1ff in runtime.gcDrainN (gcw=0xc000027770, scanWork=65536, ~r2=<optimized out>) at /home/rlh/work/go/src/runtime/mgcmark.go:1006
#5  0x000000000041af84 in runtime.gcAssistAlloc1 (gp=0xc000062c00, scanWork=65536) at /home/rlh/work/go/src/runtime/mgcmark.go:512
#6  0x000000000044de73 in runtime.gcAssistAlloc.func1 () at /home/rlh/work/go/src/runtime/mgcmark.go:423
#7  0x000000000044fce6 in runtime.systemstack () at /home/rlh/work/go/src/runtime/asm_amd64.s:351
#8  0x000000000042caf0 in ?? () at /home/rlh/work/go/src/runtime/proc.go:1082
#9  0x0000000000000000 in ?? ()

Thread 2 (LWP 49510):
#0  runtime.usleep () at /home/rlh/work/go/src/runtime/sys_linux_amd64.s:131
#1  0x0000000000433deb in runtime.sysmon () at /home/rlh/work/go/src/runtime/proc.go:4280
#2  0x000000000042cc43 in runtime.mstart1 () at /home/rlh/work/go/src/runtime/proc.go:1206
#3  0x000000000042cb5e in runtime.mstart () at /home/rlh/work/go/src/runtime/proc.go:1172
#4  0x0000000000453ab3 in runtime.clone () at /home/rlh/work/go/src/runtime/sys_linux_amd64.s:587
#5  0x0000000000000000 in ?? ()

Thread 1 (LWP 49509):
#0  runtime.futex () at /home/rlh/work/go/src/runtime/sys_linux_amd64.s:536
#1  0x000000000042617b in runtime.futexsleep (addr=0x588120 <runtime.m0+320>, val=0, ns=-1) at /home/rlh/work/go/src/runtime/os_linux.go:46
#2  0x000000000040958f in runtime.notesleep (n=0x588120 <runtime.m0+320>) at /home/rlh/work/go/src/runtime/lock_futex.go:151
#3  0x000000000042dfe8 in runtime.stopm () at /home/rlh/work/go/src/runtime/proc.go:1936
#4  0x000000000042f13a in runtime.findrunnable (gp=0xc00002af00, inheritTime=false) at /home/rlh/work/go/src/runtime/proc.go:2399
#5  0x000000000042fcb8 in runtime.schedule () at /home/rlh/work/go/src/runtime/proc.go:2525
#6  0x000000000042ffcc in runtime.park_m (gp=0xc000062780) at /home/rlh/work/go/src/runtime/proc.go:2605
#7  0x000000000044fc5b in runtime.mcall () at /home/rlh/work/go/src/runtime/asm_amd64.s:299
#8  0x000000000044fb79 in runtime.rt0_go () at /home/rlh/work/go/src/runtime/asm_amd64.s:201
#9  0x0000000000000000 in ?? ()
(gdb) 

This should be a good starting place.

@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Nov 9, 2018
@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Nov 9, 2018
@aclements
Copy link
Member

This is a deadlock between thread 7, which is attempting to force all Ps to a safe point, and thread 3, which must be trying to scan thread 7's user stack. Since neither can preempt the other, they're stuck.

Thread 7 (LWP 49519):
#0  runtime.futex () at /home/rlh/work/go/src/runtime/sys_linux_amd64.s:536
#1  0x00000000004261f0 in runtime.futexsleep (addr=0x587a48 <runtime.sched+296>, val=0, ns=100000) at /home/rlh/work/go/src/runtime/os_linux.go:63
#2  0x000000000040971e in runtime.notetsleep_internal (n=0x587a48 <runtime.sched+296>, ns=100000, ~r2=<optimized out>) at /home/rlh/work/go/src/runtime/lock_futex.go:193
#3  0x00000000004097f1 in runtime.notetsleep (n=0x587a48 <runtime.sched+296>, ns=100000, ~r2=<optimized out>) at /home/rlh/work/go/src/runtime/lock_futex.go:216
#4  0x000000000042d221 in runtime.forEachP (fn={void (runtime.p *)} 0xc00006ffb8) at /home/rlh/work/go/src/runtime/proc.go:1396
#5  0x000000000044d8fd in runtime.gcMarkDone.func1 () at /home/rlh/work/go/src/runtime/mgc.go:1407
#6  0x000000000044fce6 in runtime.systemstack () at /home/rlh/work/go/src/runtime/asm_amd64.s:351
#7  0x000000000042caf0 in ?? () at /home/rlh/work/go/src/runtime/proc.go:1082
#8  0x0000000000000000 in ?? ()
Thread 3 (LWP 49511):
#0  runtime.procyield () at /home/rlh/work/go/src/runtime/asm_amd64.s:553
#1  0x000000000042c041 in runtime.scang (gp=0xc000063e00, gcw=0xc000027770) at /home/rlh/work/go/src/runtime/proc.go:923
#2  0x000000000044ddbd in runtime.markroot.func1 () at /home/rlh/work/go/src/runtime/mgcmark.go:221
#3  0x000000000041a796 in runtime.markroot (gcw=0xc000027770, i=26) at /home/rlh/work/go/src/runtime/mgcmark.go:202
#4  0x000000000041c1ff in runtime.gcDrainN (gcw=0xc000027770, scanWork=65536, ~r2=<optimized out>) at /home/rlh/work/go/src/runtime/mgcmark.go:1006
#5  0x000000000041af84 in runtime.gcAssistAlloc1 (gp=0xc000062c00, scanWork=65536) at /home/rlh/work/go/src/runtime/mgcmark.go:512
#6  0x000000000044de73 in runtime.gcAssistAlloc.func1 () at /home/rlh/work/go/src/runtime/mgcmark.go:423
#7  0x000000000044fce6 in runtime.systemstack () at /home/rlh/work/go/src/runtime/asm_amd64.s:351
#8  0x000000000042caf0 in ?? () at /home/rlh/work/go/src/runtime/proc.go:1082
#9  0x0000000000000000 in ?? ()

So, the cause is clear. The fix is less so. It would be much better if all of these preemption operations were asynchronous; we've had plenty of other similar deadlocks in the past. To fix this bug, it may be enough to allow preemption of forEachP's user goroutine.

@andybons
Copy link
Member

@aclements feel free to re-assign as you see fit.

@mknyszek
Copy link
Contributor

mknyszek commented Dec 6, 2018

I was able to reproduce and I think I have a fix. As soon as I can confirm it fixes the issue I'll put it up for review.

@gopherbot
Copy link

Change https://golang.org/cl/153077 mentions this issue: runtime: enable preemption of mark termination goroutine

@mknyszek mknyszek self-assigned this Dec 6, 2018
@mknyszek
Copy link
Contributor

mknyszek commented Dec 6, 2018

Running the same program @RLH mentioned above, I'm at 35k iterations with no stale (deadlocked) processes, so hopefully this means its fixed. :)

@golang golang locked and limited conversation to collaborators Dec 7, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

6 participants