Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: hang waiting for dead process on macOS 10.12 #18540

Closed
rsc opened this issue Jan 6, 2017 · 6 comments
Closed

os: hang waiting for dead process on macOS 10.12 #18540

rsc opened this issue Jan 6, 2017 · 6 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Jan 6, 2017

Running the net/http tests with -v -short in a loop. Got one that hung for 400 minutes. The actual test binary failed after 10 minutes, but somehow the parent go command was wedged. It ate the ^C (because it knew it was waiting for the subprocess). Eventually kill -ABRT (63 minutes later) dumped the following:

SIGABRT: abort
PC=0x1056a0b m=0 sigcode=0

goroutine 0 [idle]:
runtime.mach_semaphore_wait(0xf03, 0x1693100, 0x0, 0x102048d, 0x1693100, 0x1693280, 0x7fff5fbff440, 0x1050ee3, 0xffffffffffffffff, 0x1a4088008e3, ...)
	/Users/rsc/go/src/runtime/sys_darwin_amd64.s:415 +0xb
runtime.semasleep1(0xffffffffffffffff, 0x1a4088008e3)
	/Users/rsc/go/src/runtime/os_darwin.go:413 +0x4b
runtime.semasleep.func1()
	/Users/rsc/go/src/runtime/os_darwin.go:432 +0x33
runtime.systemstack(0x7fff5fbff468)
	/Users/rsc/go/src/runtime/asm_amd64.s:330 +0xab
runtime.semasleep(0xffffffffffffffff, 0x2)
	/Users/rsc/go/src/runtime/os_darwin.go:433 +0x44
runtime.notesleep(0x1693910)
	/Users/rsc/go/src/runtime/lock_sema.go:166 +0xb2
runtime.stopm()
	/Users/rsc/go/src/runtime/proc.go:1640 +0xad
runtime.findrunnable(0xc420022c00, 0x0)
	/Users/rsc/go/src/runtime/proc.go:2092 +0x2e4
runtime.schedule()
	/Users/rsc/go/src/runtime/proc.go:2212 +0x14c
runtime.park_m(0xc4204604e0)
	/Users/rsc/go/src/runtime/proc.go:2275 +0xab
runtime.mcall(0x7fff5fbff640)
	/Users/rsc/go/src/runtime/asm_amd64.s:256 +0x5b

goroutine 1 [semacquire, 409 minutes]:
sync.runtime_Semacquire(0xc42018eb7c)
	/Users/rsc/go/src/runtime/sema.go:47 +0x34
sync.(*WaitGroup).Wait(0xc42018eb70)
	/Users/rsc/go/src/sync/waitgroup.go:131 +0x7a
main.(*builder).do(0xc42030aa00, 0xc42030e0d0)
	/Users/rsc/go/src/cmd/go/build.go:1370 +0x4be
main.runTest(0x168c120, 0xc4200102a0, 0x2, 0x2)
	/Users/rsc/go/src/cmd/go/test.go:636 +0x1276
main.main()
	/Users/rsc/go/src/cmd/go/main.go:184 +0x655

goroutine 17 [syscall, 409 minutes, locked to thread]:
runtime.goexit()
	/Users/rsc/go/src/runtime/asm_amd64.s:2184 +0x1

goroutine 5 [syscall, 63 minutes]:
os/signal.signal_recv(0x165e340)
	/Users/rsc/go/src/runtime/sigqueue.go:116 +0xff
os/signal.loop()
	/Users/rsc/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
	/Users/rsc/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 38 [chan receive, 399 minutes]:
main.(*builder).runTest(0xc42030aa00, 0xc4203afe10, 0xc4798bdeb0, 0x110f393)
	/Users/rsc/go/src/cmd/go/test.go:1182 +0xdd4
main.(*builder).do.func1(0xc4203afe10)
	/Users/rsc/go/src/cmd/go/build.go:1302 +0x7d
main.(*builder).do.func2(0xc42018eb70, 0xc42030aa00, 0xc42000c4e0)
	/Users/rsc/go/src/cmd/go/build.go:1361 +0x145
created by main.(*builder).do
	/Users/rsc/go/src/cmd/go/build.go:1367 +0x495

goroutine 22 [select, 409 minutes, locked to thread]:
runtime.gopark(0x14b3590, 0x0, 0x148cfe3, 0x6, 0x18, 0x2)
	/Users/rsc/go/src/runtime/proc.go:261 +0x13a
runtime.selectgoImpl(0xc420719f50, 0x0, 0x18)
	/Users/rsc/go/src/runtime/select.go:423 +0x1308
runtime.selectgo(0xc420719f50)
	/Users/rsc/go/src/runtime/select.go:238 +0x1c
runtime.ensureSigM.func1()
	/Users/rsc/go/src/runtime/signal_unix.go:434 +0x265
runtime.goexit()
	/Users/rsc/go/src/runtime/asm_amd64.s:2184 +0x1

goroutine 24 [syscall, 409 minutes]:
syscall.Syscall6(0xad, 0x1, 0xb8fb, 0xc4207bb5a0, 0x24, 0x0, 0x0, 0x0, 0x0, 0x0)
	/Users/rsc/go/src/syscall/asm_darwin_amd64.s:41 +0x5
os.(*Process).blockUntilWaitable(0xc420131500, 0x0, 0x0, 0x0)
	/Users/rsc/go/src/os/wait_waitid.go:28 +0xa5
os.(*Process).wait(0xc420131500, 0x0, 0x0, 0x0)
	/Users/rsc/go/src/os/exec_unix.go:22 +0x4d
os.(*Process).Wait(0xc420131500, 0x0, 0x0, 0x0)
	/Users/rsc/go/src/os/exec.go:115 +0x2b
os/exec.(*Cmd).Wait(0xc420372000, 0x0, 0x0)
	/Users/rsc/go/src/os/exec/exec.go:435 +0x62
main.(*builder).runTest.func1(0xc42039a3c0, 0xc420372000)
	/Users/rsc/go/src/cmd/go/test.go:1162 +0x2b
created by main.(*builder).runTest
	/Users/rsc/go/src/cmd/go/test.go:1163 +0xc51

rax    0xe
rbx    0x1693800
rcx    0x7fff5fbff3e0
rdx    0x7fff5fbff468
rdi    0xf03
rsi    0x1a140800901
rbp    0x7fff5fbff418
rsp    0x7fff5fbff3e0
r8     0x50
r9     0xa
r10    0x1
r11    0x286
r12    0x37c07b0b94a6
r13    0x37c8e5cde1b0
r14    0x1497348141b9f800
r15    0x152c828
rip    0x1056a0b
rflags 0x286
cs     0x7
fs     0x0
gs     0x0
@rsc rsc added this to the Go1.8Maybe milestone Jan 6, 2017
@rsc rsc modified the milestones: Go1.9Maybe, Go1.8Maybe Feb 7, 2017
@rfjakob
Copy link

rfjakob commented Feb 16, 2017

I get it to hang when I hit Ctrl+C during go test ./.... This is go version go1.7.5 linux/amd64 but the backtraces look similar (build.go). kill -ABRT gives me:

SIGABRT: abort
PC=0x4d1f63 m=2

goroutine 0 [idle]:
runtime.futex(0xa2fbf8, 0x0, 0x7f7de3ae1dc8, 0x0, 0x7f7d00000000, 0x4d1c86, 0x3c, 0x0, 0x7f7de3ae1e10, 0x485c1b, ...)
	/usr/local/go/src/runtime/sys_linux_amd64.s:388 +0x23
runtime.futexsleep(0xa2fbf8, 0x0, 0xdf8475800)
	/usr/local/go/src/runtime/os_linux.go:62 +0xd7
runtime.notetsleep_internal(0xa2fbf8, 0xdf8475800, 0x1)
	/usr/local/go/src/runtime/lock_futex.go:174 +0xeb
runtime.notetsleep(0xa2fbf8, 0xdf8475800, 0x2d19042425c0)
	/usr/local/go/src/runtime/lock_futex.go:194 +0x56
runtime.sysmon()
	/usr/local/go/src/runtime/proc.go:3618 +0x14c
runtime.mstart1()
	/usr/local/go/src/runtime/proc.go:1126 +0x11e
runtime.mstart()
	/usr/local/go/src/runtime/proc.go:1096 +0x64

goroutine 1 [semacquire, 3 minutes]:
sync.runtime_Semacquire(0xc4205161fc)
	/usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*WaitGroup).Wait(0xc4205161f0)
	/usr/local/go/src/sync/waitgroup.go:131 +0x97
main.(*builder).do(0xc42001d340, 0xc420507520)
	/usr/local/go/src/cmd/go/build.go:1330 +0x4e1
main.runTest(0xa28f40, 0xc42000c380, 0x1, 0x1)
	/usr/local/go/src/cmd/go/test.go:618 +0x1328
main.main()
	/usr/local/go/src/cmd/go/main.go:181 +0x624

goroutine 17 [syscall, 3 minutes, locked to thread]:
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

goroutine 5 [syscall, 3 minutes]:
os/signal.signal_recv(0xa0aec0)
	/usr/local/go/src/runtime/sigqueue.go:116 +0x157
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
	/usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 8 [select, 3 minutes]:
main.(*builder).runTest(0xc42001d340, 0xc4204f64e0, 0x3, 0xc42051de70)
	/usr/local/go/src/cmd/go/test.go:1145 +0x113a
main.(*builder).do.func1(0xc4204f64e0)
	/usr/local/go/src/cmd/go/build.go:1264 +0x85
main.(*builder).do.func2(0xc4205161f0, 0xc42001d340, 0xc4202d3da0)
	/usr/local/go/src/cmd/go/build.go:1321 +0x147
created by main.(*builder).do
	/usr/local/go/src/cmd/go/build.go:1327 +0x4b8

goroutine 48 [select, 3 minutes, locked to thread]:
runtime.gopark(0x894c08, 0x0, 0x84b1a3, 0x6, 0x18, 0x2)
	/usr/local/go/src/runtime/proc.go:259 +0x13a
runtime.selectgoImpl(0xc42002af30, 0x0, 0x18)
	/usr/local/go/src/runtime/select.go:423 +0x1235
runtime.selectgo(0xc42002af30)
	/usr/local/go/src/runtime/select.go:238 +0x1c
runtime.ensureSigM.func1()
	/usr/local/go/src/runtime/signal1_unix.go:304 +0x2f3
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

goroutine 139 [chan receive, 3 minutes]:
os/exec.(*Cmd).Wait(0xc4203ea000, 0xc420430240, 0xc420387f90)
	/usr/local/go/src/os/exec/exec.go:442 +0x124
main.(*builder).runTest.func1(0xc42032a540, 0xc4203ea000)
	/usr/local/go/src/cmd/go/test.go:1142 +0x2b
created by main.(*builder).runTest
	/usr/local/go/src/cmd/go/test.go:1143 +0xc76

goroutine 138 [syscall, 3 minutes]:
syscall.Syscall(0x0, 0x5, 0xc4204606e4, 0x51c, 0x47, 0x563, 0x0)
	/usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.read(0x5, 0xc4204606e4, 0x51c, 0x51c, 0x47, 0x0, 0x0)
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:783 +0x55
syscall.Read(0x5, 0xc4204606e4, 0x51c, 0x51c, 0x47, 0x0, 0x0)
	/usr/local/go/src/syscall/syscall_unix.go:161 +0x49
os.(*File).read(0xc42013a1c0, 0xc4204606e4, 0x51c, 0x51c, 0x47, 0x0, 0x0)
	/usr/local/go/src/os/file_unix.go:228 +0x4f
os.(*File).Read(0xc42013a1c0, 0xc4204606e4, 0x51c, 0x51c, 0x47, 0x0, 0x0)
	/usr/local/go/src/os/file.go:101 +0x59
bytes.(*Buffer).ReadFrom(0xc420150540, 0xa07c40, 0xc42013a1c0, 0xc420388688, 0x485601, 0xc420388690)
	/usr/local/go/src/bytes/buffer.go:176 +0x155
io.copyBuffer(0xa06bc0, 0xc420150540, 0xa07c40, 0xc42013a1c0, 0x0, 0x0, 0x0, 0xc42032a238, 0x0, 0x0)
	/usr/local/go/src/io/io.go:384 +0x323
io.Copy(0xa06bc0, 0xc420150540, 0xa07c40, 0xc42013a1c0, 0x7bf760, 0xc42032a1e0, 0xc420388790)
	/usr/local/go/src/io/io.go:360 +0x68
os/exec.(*Cmd).writerDescriptor.func1(0x7bf760, 0xc42032a1e0)
	/usr/local/go/src/os/exec/exec.go:253 +0x4d
os/exec.(*Cmd).Start.func1(0xc4203ea000, 0xc42043aa00)
	/usr/local/go/src/os/exec/exec.go:370 +0x27
created by os/exec.(*Cmd).Start
	/usr/local/go/src/os/exec/exec.go:371 +0x4db

rax    0xfffffffffffffffc
rbx    0x0
rcx    0x4d1f63
rdx    0x0
rdi    0xa2fbf8
rsi    0x0
rbp    0x7f7de3ae1dd8
rsp    0x7f7de3ae1d90
r8     0x0
r9     0x0
r10    0x7f7de3ae1dc8
r11    0x246
r12    0x0
r13    0xf3
r14    0x30
r15    0x3
rip    0x4d1f63
rflags 0x246
cs     0x33
fs     0x0
gs     0x0

@rsc
Copy link
Contributor Author

rsc commented Feb 16, 2017

@rfjakob, your trace is stuck reading from the pipe holding the subprocess output, not waiting for the subprocess itself. On the Mac, the subprocess had exited and yet wait did not return. On your system it looks like maybe the process was still running, since the pipe had not been closed yet. Next time it happens, before the ^C, please run 'ps axwwuf' and look for the hung go command and see if it has any children still. Thanks.

@eliasnaur
Copy link
Contributor

eliasnaur commented May 1, 2017

I think a similar thing is happening on the macOS hosted android/arm builder. I can reproduce it manually by enough runs of go test -short -timeout=6m runtime/pprof. I've attached gdb to the hanging go process. Here's a dump of my gdb session so far, including all goroutine stack dumps

Eliass-Mac-mini:Xcode elias$ gdb -p 54046
GNU gdb (GDB) 7.12.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin16.4.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 54046
[New Thread 0x1303 of process 54046]
[New Thread 0x1403 of process 54046]
[New Thread 0x1503 of process 54046]
[New Thread 0x1603 of process 54046]
[New Thread 0x1703 of process 54046]
[New Thread 0x1803 of process 54046]
[New Thread 0x1903 of process 54046]
[New Thread 0x1a03 of process 54046]
[New Thread 0x1b03 of process 54046]
[New Thread 0x1c03 of process 54046]
Reading symbols from /Users/elias/go-tip/bin/go...done.

warning: unhandled dyld version (15)
runtime.mach_semaphore_wait () at /Users/elias/go-tip/src/runtime/sys_darwin_amd64.s:445
445		MOVL	AX, ret+8(FP)
Loading Go Runtime support.
(gdb) info goroutines
  1 waiting  runtime.gopark
* 17 syscall  runtime.goexit
  2 waiting  runtime.gopark
  3 waiting  runtime.gopark
  4 waiting  runtime.gopark
* 5 syscall  runtime.systemstack_switch
  7 waiting  runtime.gopark
  19 waiting  runtime.gopark
* 20 running  runtime.systemstack_switch
  21 waiting  runtime.gopark
  22 waiting  runtime.gopark
  8 waiting  runtime.gopark
  9 waiting  runtime.gopark
  10 waiting  runtime.gopark
  52 waiting  runtime.gopark
  53 waiting  runtime.gopark
  54 waiting  runtime.gopark
  55 waiting  runtime.gopark
  56 waiting  runtime.gopark
(gdb) goroutine 1 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc4204af5a8, lock=0x171cee0 <runtime.semtable+4544>, reason="semacquire", traceEv=25 '\031', traceskip=4) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000102d7ae in runtime.goparkunlock (lock=0x171cee0 <runtime.semtable+4544>, reason="semacquire", traceEv=25 '\031', traceskip=4) at /Users/elias/go-tip/src/runtime/proc.go:274
#2  0x000000000103dfd8 in runtime.semacquire1 (addr=0xc4201700fc, lifo=false, profile=1) at /Users/elias/go-tip/src/runtime/sema.go:144
#3  0x000000000103dc59 in sync.runtime_Semacquire (addr=0xc4201700fc) at /Users/elias/go-tip/src/runtime/sema.go:56
#4  0x000000000106ee84 in sync.(*WaitGroup).Wait (wg=0xc4201700f0) at /Users/elias/go-tip/src/sync/waitgroup.go:131
#5  0x00000000011f6405 in cmd/go/internal/work.(*Builder).Do (b=0xc420184900, root=0xc420117e10) at /Users/elias/go-tip/src/cmd/go/internal/work/build.go:1215
#6  0x00000000013b89bd in cmd/go/internal/test.runTest (cmd=0x170af00 <cmd/go/internal/test.statictmp_4>, args= []string = {...}) at /Users/elias/go-tip/src/cmd/go/internal/test/test.go:661
#7  0x00000000013c9e59 in main.main () at /Users/elias/go-tip/src/cmd/go/main.go:133
(gdb) goroutine 2 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc42002a768, lock=0x1711520 <runtime.forcegc>, reason="force gc (idle)", traceEv=20 '\024', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000102d7ae in runtime.goparkunlock (lock=0x1711520 <runtime.forcegc>, reason="force gc (idle)", traceEv=20 '\024', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:274
#2  0x000000000102d4a2 in runtime.forcegchelper () at /Users/elias/go-tip/src/runtime/proc.go:226
#3  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#4  0x0000000000000000 in ?? ()
(gdb) goroutine 3 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc42002af60, lock=0x1711620 <runtime.sweep>, reason="GC sweep wait", traceEv=20 '\024', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000102d7ae in runtime.goparkunlock (lock=0x1711620 <runtime.sweep>, reason="GC sweep wait", traceEv=20 '\024', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:274
#2  0x0000000001020544 in runtime.bgsweep (c=0xc42005c000) at /Users/elias/go-tip/src/runtime/mgcsweep.go:71
#3  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#4  0x000000c42005c000 in ?? ()
#5  0x0000000000000000 in ?? ()
(gdb) goroutine 4 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc42002b700, lock=0x172fe98 <runtime.finlock>, reason="finalizer wait", traceEv=20 '\024', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000102d7ae in runtime.goparkunlock (lock=0x172fe98 <runtime.finlock>, reason="finalizer wait", traceEv=20 '\024', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:274
#2  0x000000000101721d in runtime.runfinq () at /Users/elias/go-tip/src/runtime/mfinal.go:175
#3  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#4  0x0000000000000000 in ?? ()
(gdb) goroutine 5 bt
#0  runtime.systemstack_switch () at /Users/elias/go-tip/src/runtime/asm_amd64.s:288
#1  0x0000000001029c84 in runtime.semasleep (ns=-1, ~r1=0) at /Users/elias/go-tip/src/runtime/os_darwin.go:431
#2  0x000000000101122f in runtime.notetsleep_internal (n=0x17301a0 <runtime.sig>, ns=-1, gp=0xc420001500, deadline=0, ~r4=false) at /Users/elias/go-tip/src/runtime/lock_sema.go:199
#3  0x0000000001011515 in runtime.notetsleepg (n=0x17301a0 <runtime.sig>, ns=-1, ~r2=false) at /Users/elias/go-tip/src/runtime/lock_sema.go:281
#4  0x000000000104288e in os/signal.signal_recv (~r0=0) at /Users/elias/go-tip/src/runtime/sigqueue.go:116
#5  0x00000000011842f2 in os/signal.loop () at /Users/elias/go-tip/src/os/signal/signal_unix.go:22
#6  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#7  0x0000000000000000 in ?? ()
(gdb) goroutine 17 bt
#0  runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2350
#1  0x0000000000000000 in ?? ()
(gdb) goroutine 7 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc420329c70, lock=0x0, reason="select", traceEv=24 '\030', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000103c8f6 in runtime.selectgo (sel=0xc420329f38, ~r1=842350944448) at /Users/elias/go-tip/src/runtime/select.go:395
#2  0x000000000121bee8 in cmd/go/internal/work.(*Builder).Do.func2 (&wg=0xc4201700f0, b=0xc420184900, handle={void (struct cmd/go/internal/work.Action *)} 0xc420329fd8) at /Users/elias/go-tip/src/cmd/go/internal/work/build.go:1196
#3  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#4  0x000000c4201700f0 in ?? ()
#5  0x000000c420184900 in ?? ()
#6  0x000000c420202a60 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) goroutine 19 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc420026748, lock=0xc420432760, reason="GC worker (idle)", traceEv=20 '\024', traceskip=0) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000101af37 in runtime.gcBgMarkWorker (_p_=0xc420020000) at /Users/elias/go-tip/src/runtime/mgc.go:1710
#2  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#3  0x000000c420020000 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb) goroutine 20 bt
#0  runtime.systemstack_switch () at /Users/elias/go-tip/src/runtime/asm_amd64.s:288
#1  0x0000000001019e0d in runtime.gcMarkDone () at /Users/elias/go-tip/src/runtime/mgc.go:1381
#2  0x000000000101afb4 in runtime.gcBgMarkWorker (_p_=0xc420021300) at /Users/elias/go-tip/src/runtime/mgc.go:1827
#3  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#4  0x000000c420021300 in ?? ()
#5  0x0000000000000000 in ?? ()
(gdb) goroutine 21 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc420027748, lock=0xc420432780, reason="GC worker (idle)", traceEv=20 '\024', traceskip=0) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000101af37 in runtime.gcBgMarkWorker (_p_=0xc420022600) at /Users/elias/go-tip/src/runtime/mgc.go:1710
#2  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#3  0x000000c420022600 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb) goroutine 22 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc420027f48, lock=0xc420432790, reason="GC worker (idle)", traceEv=20 '\024', traceskip=0) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000101af37 in runtime.gcBgMarkWorker (_p_=0xc420023900) at /Users/elias/go-tip/src/runtime/mgc.go:1710
#2  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#3  0x000000c420023900 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb) goroutine 8 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc4201cdc70, lock=0x0, reason="select", traceEv=24 '\030', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000103c8f6 in runtime.selectgo (sel=0xc4201cdf38, ~r1=842350944448) at /Users/elias/go-tip/src/runtime/select.go:395
#2  0x000000000121bee8 in cmd/go/internal/work.(*Builder).Do.func2 (&wg=0xc4201700f0, b=0xc420184900, handle={void (struct cmd/go/internal/work.Action *)} 0xc4201cdfd8) at /Users/elias/go-tip/src/cmd/go/internal/work/build.go:1196
#3  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#4  0x000000c4201700f0 in ?? ()
#5  0x000000c420184900 in ?? ()
#6  0x000000c420202a60 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) goroutine 9 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc420038c70, lock=0x0, reason="select", traceEv=24 '\030', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000103c8f6 in runtime.selectgo (sel=0xc420038f38, ~r1=842350944448) at /Users/elias/go-tip/src/runtime/select.go:395
#2  0x000000000121bee8 in cmd/go/internal/work.(*Builder).Do.func2 (&wg=0xc4201700f0, b=0xc420184900, handle={void (struct cmd/go/internal/work.Action *)} 0xc420038fd8) at /Users/elias/go-tip/src/cmd/go/internal/work/build.go:1196
#3  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#4  0x000000c4201700f0 in ?? ()
#5  0x000000c420184900 in ?? ()
#6  0x000000c420202a60 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) goroutine 10 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc42003b930, lock=0xc4201dc2f8, reason="chan receive", traceEv=23 '\027', traceskip=3) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000102d7ae in runtime.goparkunlock (lock=0xc4201dc2f8, reason="chan receive", traceEv=23 '\027', traceskip=3) at /Users/elias/go-tip/src/runtime/proc.go:274
#2  0x0000000001006574 in runtime.chanrecv (c=0xc4201dc2a0, ep=0xc42003bc38, block=true, selected=128, received=true) at /Users/elias/go-tip/src/runtime/chan.go:506
#3  0x000000000100621b in runtime.chanrecv1 (c=0xc4201dc2a0, elem=0xc42003bc38) at /Users/elias/go-tip/src/runtime/chan.go:388
#4  0x00000000013c1b77 in cmd/go/internal/test.builderRunTest (b=0xc420184900, a=0xc420117ba0, ~r2=...) at /Users/elias/go-tip/src/cmd/go/internal/test/test.go:1223
#5  0x000000000121bab2 in cmd/go/internal/work.(*Builder).Do.func1 (a=0xc420117ba0) at /Users/elias/go-tip/src/cmd/go/internal/work/build.go:1147
#6  0x000000000121be38 in cmd/go/internal/work.(*Builder).Do.func2 (&wg=0xc4201700f0, b=0xc420184900, handle={void (struct cmd/go/internal/work.Action *)} 0xc42003bfd8) at /Users/elias/go-tip/src/cmd/go/internal/work/build.go:1206
#7  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#8  0x000000c4201700f0 in ?? ()
#9  0x000000c420184900 in ?? ()
#10 0x000000c420202a60 in ?? ()
#11 0x0000000000000000 in ?? ()
(gdb) goroutine 52 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc42002cc08, lock=0x1ca0f28, reason="IO wait", traceEv=27 '\033', traceskip=5) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x00000000010286d0 in runtime.netpollblock (pd=0x1ca0f00, mode=114, waitio=false, ~r3=148) at /Users/elias/go-tip/src/runtime/netpoll.go:366
#2  0x0000000001027cd4 in internal/poll.runtime_pollWait (pd=0x1ca0f00, mode=114, ~r2=1) at /Users/elias/go-tip/src/runtime/netpoll.go:173
#3  0x000000000108f10e in internal/poll.(*pollDesc).wait (pd=0xc4204c0478, mode=114, isFile=true, ~r2=...) at /Users/elias/go-tip/src/internal/poll/fd_poll_runtime.go:85
#4  0x000000000108f18d in internal/poll.(*pollDesc).waitRead (pd=0xc4204c0478, isFile=true, ~r1=...) at /Users/elias/go-tip/src/internal/poll/fd_poll_runtime.go:90
#5  0x000000000108fed9 in internal/poll.(*FD).Read (fd=0xc4204c0460, p= []uint8 = {...}, ~r1=0, ~r2=...) at /Users/elias/go-tip/src/internal/poll/fd_unix.go:125
#6  0x000000000109892e in os.(*File).read (f=0xc42000e090, b= []uint8 = {...}, n=101, err=...) at /Users/elias/go-tip/src/os/file_unix.go:203
#7  0x000000000109580d in os.(*File).Read (f=0xc42000e090, b= []uint8 = {...}, n=101, err=...) at /Users/elias/go-tip/src/os/file.go:103
#8  0x00000000010e8121 in bytes.(*Buffer).ReadFrom (b=0xc4200925b0, r=..., n=30277672, err=...) at /Users/elias/go-tip/src/bytes/buffer.go:188
#9  0x000000000106f9a4 in io.copyBuffer (dst=..., src=..., buf= []uint8, written=842355442464, err=...) at /Users/elias/go-tip/src/io/io.go:384
#10 0x000000000106f698 in io.Copy (dst=..., src=..., written=16798931, err=...) at /Users/elias/go-tip/src/io/io.go:360
#11 0x00000000010f962d in os/exec.(*Cmd).writerDescriptor.func1 (~r0=...) at /Users/elias/go-tip/src/os/exec/exec.go:258
#12 0x00000000010f96a7 in os/exec.(*Cmd).Start.func1 (c=0xc4200c2000, fn={void (error *)} 0xc42002cfd8) at /Users/elias/go-tip/src/os/exec/exec.go:374
#13 0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#14 0x000000c4200c2000 in ?? ()
#15 0x000000c4203ae360 in ?? ()
#16 0x0000000000000000 in ?? ()
(gdb) goroutine 53 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc4201e5720, lock=0x1711a40 <runtime.timers>, reason="timer goroutine (idle)", traceEv=20 '\024', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000102d7ae in runtime.goparkunlock (lock=0x1711a40 <runtime.timers>, reason="timer goroutine (idle)", traceEv=20 '\024', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:274
#2  0x000000000104aef5 in runtime.timerproc () at /Users/elias/go-tip/src/runtime/time.go:207
#3  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#4  0x0000000000000000 in ?? ()
(gdb) goroutine 54 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc42003dc90, lock=0x0, reason="select", traceEv=24 '\030', traceskip=1) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000103c8f6 in runtime.selectgo (sel=0xc42003df50, ~r1=842352411104) at /Users/elias/go-tip/src/runtime/select.go:395
#2  0x0000000001055faf in runtime.ensureSigM.func1 () at /Users/elias/go-tip/src/runtime/signal_unix.go:492
#3  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#4  0x0000000000000000 in ?? ()
(gdb) goroutine 55 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc4201e6e98, lock=0xc4201dc178, reason="chan receive", traceEv=23 '\027', traceskip=3) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000102d7ae in runtime.goparkunlock (lock=0xc4201dc178, reason="chan receive", traceEv=23 '\027', traceskip=3) at /Users/elias/go-tip/src/runtime/proc.go:274
#2  0x0000000001006574 in runtime.chanrecv (c=0xc4201dc120, ep=0x0, block=true, selected=false, received=false) at /Users/elias/go-tip/src/runtime/chan.go:506
#3  0x000000000100621b in runtime.chanrecv1 (c=0xc4201dc120, elem=0x0) at /Users/elias/go-tip/src/runtime/chan.go:388
#4  0x00000000011860d4 in cmd/go/internal/base.processSignals.func1 (sig=0xc4201dc120) at /Users/elias/go-tip/src/cmd/go/internal/base/signal.go:21
#5  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#6  0x000000c4201dc120 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) goroutine 56 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0xc4201e75f0, lock=0xc4202c2238, reason="chan receive", traceEv=23 '\027', traceskip=3) at /Users/elias/go-tip/src/runtime/proc.go:269
#1  0x000000000102d7ae in runtime.goparkunlock (lock=0xc4202c2238, reason="chan receive", traceEv=23 '\027', traceskip=3) at /Users/elias/go-tip/src/runtime/proc.go:274
#2  0x0000000001006574 in runtime.chanrecv (c=0xc4202c21e0, ep=0xc4201e7778, block=true, selected=false, received=false) at /Users/elias/go-tip/src/runtime/chan.go:506
#3  0x000000000100621b in runtime.chanrecv1 (c=0xc4202c21e0, elem=0xc4201e7778) at /Users/elias/go-tip/src/runtime/chan.go:388
#4  0x00000000010f670b in os/exec.(*Cmd).Wait (c=0xc4200c2000, ~r0=...) at /Users/elias/go-tip/src/os/exec/exec.go:446
#5  0x00000000013c644b in cmd/go/internal/test.builderRunTest.func1 (done=0xc4201dc2a0, cmd=0xc4200c2000) at /Users/elias/go-tip/src/cmd/go/internal/test/test.go:1203
#6  0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#7  0x000000c4201dc2a0 in ?? ()
#8  0x000000c4200c2000 in ?? ()
#9  0x0000000000000000 in ?? ()

The hang is in builderRunTest, in the section that makes sure a wedged test binary is killed:

   if err == nil {
        tick := time.NewTimer(testKillTimeout)
        base.StartSigHandlers()
        done := make(chan error)
        go func() {
            done <- cmd.Wait()
        }()
    Outer:
        select {
        case err = <-done:
            // ok
        case <-tick.C:
            if base.SignalTrace != nil {
                // Send a quit signal in the hope that the program will print
                // a stack trace and exit. Give it five seconds before resorting
                // to Kill.
                cmd.Process.Signal(base.SignalTrace)
                select {
                case err = <-done:
                    fmt.Fprintf(&buf, "*** Test killed with %v: ran too long (%v).\n", base.SignalTrace, testKillTimeout)
                    break Outer
                case <-time.After(5 * time.Second):
                }
            }
            cmd.Process.Kill()
            err = <-done
            fmt.Fprintf(&buf, "*** Test killed: ran too long (%v).\n", testKillTimeout)
        }
        tick.Stop()
    }

The interesting goroutine is 10, which is hanging on

err = <- done

in builderRunTest. The done channel is waiting to be filled by goroutine 56:

        go func() {
            done <- cmd.Wait()
        }()

Even though process was killed with cmd.Process.Kill(). psconfirms this.

What's interesting (to me) is that cmd.Wait() has completed cmd.Process.Wait() and is blocked on

   for range c.goroutine {
        if err := <-c.errch; err != nil && copyError == nil {
            copyError = err
        }
    }

Which leads us to goroutine 52:

#1  0x00000000010286d0 in runtime.netpollblock (pd=0x1ca0f00, mode=114, waitio=false, ~r3=148) at /Users/elias/go-tip/src/runtime/netpoll.go:366
#2  0x0000000001027cd4 in internal/poll.runtime_pollWait (pd=0x1ca0f00, mode=114, ~r2=1) at /Users/elias/go-tip/src/runtime/netpoll.go:173
#3  0x000000000108f10e in internal/poll.(*pollDesc).wait (pd=0xc4204c0478, mode=114, isFile=true, ~r2=...) at /Users/elias/go-tip/src/internal/poll/fd_poll_runtime.go:85
#4  0x000000000108f18d in internal/poll.(*pollDesc).waitRead (pd=0xc4204c0478, isFile=true, ~r1=...) at /Users/elias/go-tip/src/internal/poll/fd_poll_runtime.go:90
#5  0x000000000108fed9 in internal/poll.(*FD).Read (fd=0xc4204c0460, p= []uint8 = {...}, ~r1=0, ~r2=...) at /Users/elias/go-tip/src/internal/poll/fd_unix.go:125
#6  0x000000000109892e in os.(*File).read (f=0xc42000e090, b= []uint8 = {...}, n=101, err=...) at /Users/elias/go-tip/src/os/file_unix.go:203
#7  0x000000000109580d in os.(*File).Read (f=0xc42000e090, b= []uint8 = {...}, n=101, err=...) at /Users/elias/go-tip/src/os/file.go:103
#8  0x00000000010e8121 in bytes.(*Buffer).ReadFrom (b=0xc4200925b0, r=..., n=30277672, err=...) at /Users/elias/go-tip/src/bytes/buffer.go:188
#9  0x000000000106f9a4 in io.copyBuffer (dst=..., src=..., buf= []uint8, written=842355442464, err=...) at /Users/elias/go-tip/src/io/io.go:384
#10 0x000000000106f698 in io.Copy (dst=..., src=..., written=16798931, err=...) at /Users/elias/go-tip/src/io/io.go:360
#11 0x00000000010f962d in os/exec.(*Cmd).writerDescriptor.func1 (~r0=...) at /Users/elias/go-tip/src/os/exec/exec.go:258
#12 0x00000000010f96a7 in os/exec.(*Cmd).Start.func1 (c=0xc4200c2000, fn={void (error *)} 0xc42002cfd8) at /Users/elias/go-tip/src/os/exec/exec.go:374
#13 0x00000000010592e1 in runtime.goexit () at /Users/elias/go-tip/src/runtime/asm_amd64.s:2349
#14 0x000000c4200c2000 in ?? ()
#15 0x000000c4203ae360 in ?? ()
#16 0x0000000000000000 in ?? ()

So why didn't cmd.Process.Kill() also close the process' file descriptors, unblocking goroutine 52?

I'm stuck here, but I can reproduce the hang fairly consistently, so any debugging hints would be much appreciated.

@eliasnaur
Copy link
Contributor

I think this is an instance of cmd/go hitting a version of #18874, which means that cmd/go should probably change to not depend on cmd.Wait().

@gopherbot
Copy link

CL https://golang.org/cl/42270 mentions this issue.

@bradfitz bradfitz added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin labels Jul 20, 2017
@bradfitz bradfitz modified the milestones: Go1.10, Go1.9Maybe Jul 20, 2017
@ianlancetaylor
Copy link
Contributor

The original report on this issue is not an example of #18874. In that stack trace we can see that the program is sleeping in os.(*Process).blockUntilWaitable. If it were #18874, it would be sleeping on a channel receive in os/exec.(*Cmd).Wait.

The original report was filed on January 6. On February 28 we stopped using waitid on Darwin (https://golang.org/cl/37610). We made that change for a different reason (#19314) but we now use a completely different code path.

I'm going to close this issue as unreproduceable, possibly fixed, with the other cases being reported as dups of #18874.

@golang golang locked and limited conversation to collaborators Dec 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin
Projects
None yet
Development

No branches or pull requests

6 participants