-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime/pprof: hangs when using pprof on a CPU-intensive program on OS X #5519
Labels
Milestone
Comments
Sorry, forgot to specify OS X version. I'm running 10.8.3. Also forgot to put the output after the SIGQUIT. Here's an example one: SIGQUIT: quit PC=0x229fb goroutine 1 [chan receive]: main.main() /Users/akalin/src/aks/aks_debug.go:46 +0x141 goroutine 3 [syscall]: runtime/pprof.profileWriter(0x2102391b0, 0x2101d8020) /Users/akalin/homebrew/Cellar/go/1.1rc3/src/pkg/runtime/pprof/pprof.go:600 +0x1c created by runtime/pprof.StartCPUProfile /Users/akalin/homebrew/Cellar/go/1.1rc3/src/pkg/runtime/pprof/pprof.go:594 +0x16b goroutine 4 [running]: syscall.Syscall() /Users/akalin/homebrew/Cellar/go/1.1rc3/src/pkg/syscall/asm_darwin_amd64.s:15 +0x5 syscall.write(0x2, 0x21023a240, 0x27, 0x40, 0x27, ...) /Users/akalin/homebrew/Cellar/go/1.1rc3/src/pkg/syscall/zsyscall_darwin_amd64.go:1274 +0x70 syscall.Write(0x14001, 0x21023a240, 0x27, 0x40, 0x37a28, ...) /Users/akalin/homebrew/Cellar/go/1.1rc3/src/pkg/syscall/syscall_unix.go:143 +0x5a os.(*File).write(0x13b06, 0x210248838, 0x8, 0x4ff1d, 0x2102360f0, ...) /Users/akalin/homebrew/Cellar/go/1.1rc3/src/pkg/os/file_unix.go:188 +0x79 os.(*File).Write(0x2102360f0, 0x21023a240, 0x27, 0x21049b000, 0x27, ...) /Users/akalin/homebrew/Cellar/go/1.1rc3/src/pkg/os/file.go:144 +0xdb log.(*Logger).Output(0x3c004, 0x21049b000, 0x14001, 0x3c004, 0x0, ...) /Users/akalin/homebrew/Cellar/go/1.1rc3/src/pkg/log/log.go:155 +0x34f runtime/pprof.writeHeap(0xa0b00000030, 0x0, 0x4000000, 0xd5a000166a6, 0x21025a000, ...) /Users/akalin/homebrew/Cellar/go/1.1rc3/src/pkg/runtime/pprof/pprof.go:456 +0xd24 created by main.main /Users/akalin/src/aks/aks_debug.go:44 +0x128 rax 0xe rbx 0x1831d8 rcx 0x7fff5fbff770 rdx 0x0 rdi 0xc07 rsi 0x175e90 rbp 0xffffffffffffffff rsp 0x7fff5fbff770 r8 0x33359d r9 0x1fe r10 0x3488dcb46 r11 0x286 r12 0x2c227d9be987d r13 0x2c2f0a6a92d90 r14 0x12fdf96808336600 r15 0x0 rip 0x229fb rflags 0x286 cs 0x7 fs 0x0 gs 0x0 |
Could you please try this (slightly simplified) example http://play.golang.org/p/zrrv9wEuRj on your machine using tip. Labels changed: added os-macosx. Status changed to WaitingForReply. |
Okay, I downloaded the go source using the instructions on http://golang.org/doc/install/source, moved to the default branch, and built go. I then compiled and followed the repro steps above with my program as well as your program with go built from tip. The problem reproduces with my program, but not with your program. This is consistent with my earlier experience of trying to find a minimal test case -- I found that removing the channel made the program go away. Let me know if I can do anything else. |
Could you please try again with my version so we get the logging when the profile dump starts and finishes, but add back your channel bits. As an experiment (because I don't know how else to capture this information), also export GOGCTRACE=1. If the program hangs with the channel added, just for arguments sake, please hit it with a SIGQUIT and post the results. My working theory at the moment is gc is blocking progress of the profile dump. It isn't a great theory. |
Hunh, I managed to get it to repro with your version! Here's the output with GOGCTRACE=1 and me killing it with SIGQUIT: gc1(1): 0+0+0 ms, 0 -> 0 MB 1170 -> 1171 (1173-2) objects, 0(0) handoff, 0(0) steal, 0/0/0 yields 2013/05/29 03:53:24 0: multiplying... 2013/05/29 03:53:24 0: multiplying done; shifting... 2013/05/29 03:53:24 0: not shifting 2013/05/29 03:53:24 1: multiplying... 2013/05/29 03:53:24 1: multiplying done; shifting... 2013/05/29 03:53:24 1: not shifting 2013/05/29 03:53:24 2: multiplying... 2013/05/29 03:53:24 2: multiplying done; shifting... 2013/05/29 03:53:24 2: not shifting 2013/05/29 03:53:24 3: multiplying... 2013/05/29 03:53:24 3: multiplying done; shifting... 2013/05/29 03:53:24 3: not shifting 2013/05/29 03:53:24 4: multiplying... 2013/05/29 03:53:24 4: multiplying done; shifting... 2013/05/29 03:53:24 4: not shifting 2013/05/29 03:53:24 5: multiplying... 2013/05/29 03:53:24 5: multiplying done; shifting... 2013/05/29 03:53:24 5: not shifting 2013/05/29 03:53:24 6: multiplying... 2013/05/29 03:53:24 6: multiplying done; shifting... 2013/05/29 03:53:24 6: not shifting 2013/05/29 03:53:24 7: multiplying... 2013/05/29 03:53:24 7: multiplying done; shifting... 2013/05/29 03:53:24 7: not shifting 2013/05/29 03:53:24 8: multiplying... 2013/05/29 03:53:24 8: multiplying done; shifting... 2013/05/29 03:53:24 8: not shifting 2013/05/29 03:53:24 9: multiplying... gc2(1): 0+0+0 ms, 0 -> 0 MB 1354 -> 1292 (1378-86) objects, 0(0) handoff, 0(0) steal, 0/0/0 yields 2013/05/29 03:53:24 9: multiplying done; shifting... 2013/05/29 03:53:24 9: not shifting 2013/05/29 03:53:24 10: multiplying... 2013/05/29 03:53:24 10: multiplying done; shifting... 2013/05/29 03:53:24 10: not shifting 2013/05/29 03:53:24 11: multiplying... gc3(1): 0+0+0 ms, 1 -> 1 MB 1302 -> 1292 (1389-97) objects, 0(0) handoff, 0(0) steal, 0/0/0 yields 2013/05/29 03:53:26 11: multiplying done; shifting... 2013/05/29 03:53:26 11: not shifting 2013/05/29 03:53:26 12: multiplying... 2013/05/29 03:53:30 12: multiplying done; shifting... 2013/05/29 03:53:30 12: not shifting 2013/05/29 03:53:30 13: multiplying... gc4(1): 0+0+0 ms, 4 -> 3 MB 1302 -> 1293 (1400-107) objects, 0(0) handoff, 0(0) steal, 0/0/0 yields 2013/05/29 03:53:42 13: multiplying done; shifting... 2013/05/29 03:53:42 13: not shifting 2013/05/29 03:53:42 14: multiplying... gc5(1): 0+0+0 ms, 7 -> 7 MB 1298 -> 1293 (1406-113) objects, 0(0) handoff, 0(0) steal, 0/0/0 yields SIGQUIT: quit PC=0x22b0b runtime/pprof.profileWriter(0x21025c1e0, 0x21023a408) /Users/akalin/src/go.repo/src/pkg/runtime/pprof/pprof.go:600 +0x1c created by runtime/pprof.StartCPUProfile /Users/akalin/src/go.repo/src/pkg/runtime/pprof/pprof.go:594 +0x16b goroutine 1 [running]: syscall.Syscall() /Users/akalin/src/go.repo/src/pkg/syscall/asm_darwin_amd64.s:15 +0x5 syscall.write(0x2, 0x21025d200, 0x27, 0x40, 0x27, ...) /Users/akalin/src/go.repo/src/pkg/syscall/zsyscall_darwin_amd64.go:1274 +0x70 syscall.Write(0x14001, 0x21025d200, 0x27, 0x40, 0x37ae8, ...) /Users/akalin/src/go.repo/src/pkg/syscall/syscall_unix.go:143 +0x5a os.(*File).write(0x13b66, 0x21026b038, 0x8, 0x5024d, 0x2102590f0, ...) /Users/akalin/src/go.repo/src/pkg/os/file_unix.go:188 +0x79 os.(*File).Write(0x2102590f0, 0x21025d200, 0x27, 0x2104b9000, 0x27, ...) /Users/akalin/src/go.repo/src/pkg/os/file.go:144 +0xdb log.(*Logger).Output(0x3c004, 0x2104b9000, 0x14001, 0x3c004, 0x0, ...) /Users/akalin/src/go.repo/src/pkg/log/log.go:155 +0x33a rax 0xe rbx 0x221034f008 rcx 0x221024ce88 rdx 0x0 rdi 0xb07 rsi 0x18cef0 rbp 0xffffffffffffffff rsp 0x221024ce88 r8 0x1000000 r9 0x0 r10 0x2101ef900 r11 0x286 r12 0x62681980af85 r13 0x626bdafe1f70 r14 0x1302986693559e00 r15 0x109b70 rip 0x22b0b rflags 0x286 cs 0x7 fs 0x0 gs 0x0 |
It's exactly the link you posted above: http://play.golang.org/p/zrrv9wEuRj |
Unfortunately, your version seems to repro the hang quite rarely compared to mine. I combined the versions in #8 to get http://play.golang.org/p/GOq8WP1cxN Here's an example log for that one with a hang and a SIGQUIT: gc1(1): 0+0+0 ms, 0 -> 0 MB 1170 -> 1171 (1173-2) objects, 0(0) handoff, 0(0) steal, 0/0/0 yields 2013/05/29 04:43:11 0: multiplying... 2013/05/29 04:43:11 0: multiplying done; shifting... 2013/05/29 04:43:11 0: not shifting 2013/05/29 04:43:11 1: multiplying... 2013/05/29 04:43:11 1: multiplying done; shifting... 2013/05/29 04:43:11 1: not shifting 2013/05/29 04:43:11 2: multiplying... 2013/05/29 04:43:11 2: multiplying done; shifting... 2013/05/29 04:43:11 2: not shifting 2013/05/29 04:43:11 3: multiplying... 2013/05/29 04:43:11 3: multiplying done; shifting... 2013/05/29 04:43:11 3: not shifting 2013/05/29 04:43:11 4: multiplying... 2013/05/29 04:43:11 4: multiplying done; shifting... 2013/05/29 04:43:11 4: not shifting 2013/05/29 04:43:11 5: multiplying... 2013/05/29 04:43:11 5: multiplying done; shifting... 2013/05/29 04:43:11 5: not shifting 2013/05/29 04:43:11 6: multiplying... 2013/05/29 04:43:11 6: multiplying done; shifting... 2013/05/29 04:43:11 6: not shifting 2013/05/29 04:43:11 7: multiplying... 2013/05/29 04:43:11 7: multiplying done; shifting... 2013/05/29 04:43:11 7: not shifting 2013/05/29 04:43:11 8: multiplying... 2013/05/29 04:43:11 8: multiplying done; shifting... 2013/05/29 04:43:11 8: not shifting 2013/05/29 04:43:11 9: multiplying... gc2(1): 0+0+0 ms, 0 -> 0 MB 1358 -> 1297 (1379-82) objects, 0(0) handoff, 0(0) steal, 0/0/0 yields 2013/05/29 04:43:11 9: multiplying done; shifting... 2013/05/29 04:43:11 9: not shifting 2013/05/29 04:43:11 10: multiplying... 2013/05/29 04:43:11 10: multiplying done; shifting... 2013/05/29 04:43:11 10: not shifting 2013/05/29 04:43:11 11: multiplying... gc3(1): 0+0+0 ms, 1 -> 1 MB 1307 -> 1296 (1390-94) objects, 0(0) handoff, 0(0) steal, 0/0/0 yields 2013/05/29 04:43:12 11: multiplying done; shifting... 2013/05/29 04:43:12 11: not shifting 2013/05/29 04:43:12 12: multiplying... SIGQUIT: quit PC=0x22b6b runtime.mach_semaphore_wait() /Users/akalin/src/go.repo/src/pkg/runtime/sys_darwin_amd64.s:391 +0xb runtime.mach_semacquire(0x2200000a0b, 0xffffffffffffffff) /Users/akalin/src/go.repo/src/pkg/runtime/os_darwin.c:426 +0xbe runtime.semasleep(0xffffffffffffffff) /Users/akalin/src/go.repo/src/pkg/runtime/os_darwin.c:32 +0x58 runtime.notesleep(0x198398) /Users/akalin/src/go.repo/src/pkg/runtime/lock_sema.c:159 +0xba stopm() /Users/akalin/src/go.repo/src/pkg/runtime/proc.c:799 +0xdf exitsyscall0(0x2101ef400) /Users/akalin/src/go.repo/src/pkg/runtime/proc.c:1423 +0xc8 runtime.mcall() /Users/akalin/src/go.repo/src/pkg/runtime/asm_amd64.s:195 +0x49 goroutine 1 [chan receive]: runtime.park(0xb6b0, 0x21026e0b0, 0x191c35) /Users/akalin/src/go.repo/src/pkg/runtime/proc.c:1179 +0x64 runtime.chanrecv(0xbd5a0, 0x21026e060, 0x221023eed0, 0x0, 0x0, ...) /Users/akalin/src/go.repo/src/pkg/runtime/chan.c:366 +0x544 runtime.chanrecv1() /Users/akalin/src/go.repo/src/pkg/runtime/chan.c:458 +0x38 main.main() /tmp/aks_debug-3.go:45 +0x150 runtime.main() /Users/akalin/src/go.repo/src/pkg/runtime/proc.c:181 +0x92 runtime.goexit() /Users/akalin/src/go.repo/src/pkg/runtime/proc.c:1227 goroutine 2 [runnable]: runtime.exitsyscall() /Users/akalin/src/go.repo/src/pkg/runtime/proc.c:1395 +0x13f runtime.MHeap_Scavenger() /Users/akalin/src/go.repo/src/pkg/runtime/mheap.c:475 +0xf0 runtime.goexit() /Users/akalin/src/go.repo/src/pkg/runtime/proc.c:1227 created by runtime.main /Users/akalin/src/go.repo/src/pkg/runtime/proc.c:164 goroutine 3 [syscall]: runtime.entersyscallblock() /Users/akalin/src/go.repo/src/pkg/runtime/proc.c:1338 +0x16e getprofile(0x221024cf68, 0x221034f000) /Users/akalin/src/go.repo/src/pkg/runtime/cpuprof.c:362 +0x367 runtime.CPUProfile(0x0, 0x0, 0x0) /Users/akalin/src/go.repo/src/pkg/runtime/cpuprof.c:436 +0x32 runtime/pprof.profileWriter(0x21025c1e0, 0x21023a408) /Users/akalin/src/go.repo/src/pkg/runtime/pprof/pprof.go:600 +0x1c runtime.goexit() /Users/akalin/src/go.repo/src/pkg/runtime/proc.c:1227 created by runtime/pprof.StartCPUProfile /Users/akalin/src/go.repo/src/pkg/runtime/pprof/pprof.go:594 +0x16b goroutine 4 [running]: syscall.Syscall() /Users/akalin/src/go.repo/src/pkg/syscall/asm_darwin_amd64.s:15 +0x5 syscall.write(0x2, 0x21025d200, 0x27, 0x40, 0x27, ...) /Users/akalin/src/go.repo/src/pkg/syscall/zsyscall_darwin_amd64.go:1274 +0x70 syscall.Write(0x5001, 0x21025d200, 0x27, 0x40, 0x37b48, ...) /Users/akalin/src/go.repo/src/pkg/syscall/syscall_unix.go:143 +0x5a os.(*File).write(0x13bc6, 0x21026b838, 0x8, 0x502ad, 0x2102590f0, ...) /Users/akalin/src/go.repo/src/pkg/os/file_unix.go:188 +0x79 os.(*File).Write(0x2102590f0, 0x21025d200, 0x27, 0x210332000, 0x27, ...) /Users/akalin/src/go.repo/src/pkg/os/file.go:144 +0xdb log.(*Logger).Output(0xf004, 0x210332000, 0x5001, 0xf004, 0x0, ...) /Users/akalin/src/go.repo/src/pkg/log/log.go:155 +0x33a runtime.nilintercopy() /Users/akalin/src/go.repo/src/pkg/runtime/alg.c:409 +0x11 handoff(0x2170) /Users/akalin/src/go.repo/src/pkg/runtime/mgc0.c:1361 +0x74 created by main.main /tmp/aks_debug-3.go:43 +0x137 rax 0xe rbx 0x198398 rcx 0x7fff5fbff840 rdx 0x0 rdi 0xa0b rsi 0x18cef0 rbp 0xffffffffffffffff rsp 0x7fff5fbff840 r8 0x2fd2a r9 0x37 r10 0x39ea8cec r11 0x286 r12 0x62681980af85 r13 0x65316ef11058 r14 0x13029b2c3e4aae00 r15 0x0 rip 0x22b6b rflags 0x286 cs 0x7 fs 0x0 gs 0x0 |
It's unclear what to do here. Eventually I think we're going to have to stop supporting profiling on OS X. Apple has shown no interest in fixing the kernel bug, and working around it gets harder and harder. Labels changed: added priority-later, go1.2, removed priority-triage. Status changed to Accepted. |
This issue was closed by revision d3066e4. Status changed to Fixed. |
This issue was closed.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
by akalin:
The text was updated successfully, but these errors were encountered: