Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: unexpected hangs by cmd/go and cmd/compile #29385

Closed
ianlancetaylor opened this issue Dec 21, 2018 · 13 comments
Closed

runtime: unexpected hangs by cmd/go and cmd/compile #29385

ianlancetaylor opened this issue Dec 21, 2018 · 13 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@ianlancetaylor
Copy link
Contributor

Several times recently while running all.bash I've seen the cmd/compile or cmd/go programs hang without making progress. These happen at different times during all.bash, usually in the cmd/go test or the final test directory (presumably because those run a lot of commands).

Here is a case that just happened with cmd/go hanging in the final test directory. This is the output after kill -QUIT.

##### ../test
# go run run.go -- fixedbugs/issue7867.go
exit status 2
SIGQUIT: quit
PC=0x45b230 m=0 sigcode=0

goroutine 0 [idle]:
runtime.epollwait(0x7ffe00000004, 0x7ffe63f53620, 0xffffffff00000080, 0x7f4b00000000, 0xffffffff000000e9, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/iant/go/src/runtime/sys_linux_amd64.s:675 +0x20
runtime.netpoll(0xc000042501, 0xc000038f01)
	/home/iant/go/src/runtime/netpoll_epoll.go:71 +0x125
runtime.findrunnable(0xc000038f00, 0x0)
	/home/iant/go/src/runtime/proc.go:2380 +0x4f1
runtime.schedule()
	/home/iant/go/src/runtime/proc.go:2525 +0x208
runtime.goexit0(0xc0003b0300)
	/home/iant/go/src/runtime/proc.go:2722 +0x1d7
runtime.mcall(0x0)
	/home/iant/go/src/runtime/asm_amd64.s:299 +0x5b

goroutine 1 [semacquire, 1 minutes]:
sync.runtime_Semacquire(0xc00044da78)
	/home/iant/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00044da70)
	/home/iant/go/src/sync/waitgroup.go:130 +0x64
cmd/go/internal/work.(*Builder).Do(0xc000217040, 0xc000588dc0)
	/home/iant/go/src/cmd/go/internal/work/exec.go:174 +0x3bf
cmd/go/internal/run.runRun(0xe18dc0, 0xc000020130, 0x1, 0x1)
	/home/iant/go/src/cmd/go/internal/run/run.go:137 +0x53a
main.main()
	/home/iant/go/src/cmd/go/main.go:219 +0x82d

goroutine 5 [syscall, 1 minutes]:
os/signal.signal_recv(0x0)
	/home/iant/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
	/home/iant/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
	/home/iant/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 22 [select, 1 minutes]:
cmd/go/internal/work.(*Builder).Do.func2(0xc00044da70, 0xc000217040, 0xc0002a9fa0)
	/home/iant/go/src/cmd/go/internal/work/exec.go:155 +0xfb
created by cmd/go/internal/work.(*Builder).Do
	/home/iant/go/src/cmd/go/internal/work/exec.go:152 +0x39b

goroutine 23 [select, 1 minutes]:
cmd/go/internal/work.(*Builder).Do.func2(0xc00044da70, 0xc000217040, 0xc0002a9fa0)
	/home/iant/go/src/cmd/go/internal/work/exec.go:155 +0xfb
created by cmd/go/internal/work.(*Builder).Do
	/home/iant/go/src/cmd/go/internal/work/exec.go:152 +0x39b

goroutine 24 [select, 1 minutes]:
cmd/go/internal/work.(*Builder).Do.func2(0xc00044da70, 0xc000217040, 0xc0002a9fa0)
	/home/iant/go/src/cmd/go/internal/work/exec.go:155 +0xfb
created by cmd/go/internal/work.(*Builder).Do
	/home/iant/go/src/cmd/go/internal/work/exec.go:152 +0x39b

goroutine 25 [select, 1 minutes]:
cmd/go/internal/work.(*Builder).Do.func2(0xc00044da70, 0xc000217040, 0xc0002a9fa0)
	/home/iant/go/src/cmd/go/internal/work/exec.go:155 +0xfb
created by cmd/go/internal/work.(*Builder).Do
	/home/iant/go/src/cmd/go/internal/work/exec.go:152 +0x39b

goroutine 26 [select, 1 minutes]:
cmd/go/internal/work.(*Builder).Do.func2(0xc00044da70, 0xc000217040, 0xc0002a9fa0)
	/home/iant/go/src/cmd/go/internal/work/exec.go:155 +0xfb
created by cmd/go/internal/work.(*Builder).Do
	/home/iant/go/src/cmd/go/internal/work/exec.go:152 +0x39b

goroutine 27 [running]:
	goroutine running on other thread; stack unavailable
created by cmd/go/internal/work.(*Builder).Do
	/home/iant/go/src/cmd/go/internal/work/exec.go:152 +0x39b

goroutine 28 [select, 1 minutes]:
cmd/go/internal/work.(*Builder).Do.func2(0xc00044da70, 0xc000217040, 0xc0002a9fa0)
	/home/iant/go/src/cmd/go/internal/work/exec.go:155 +0xfb
created by cmd/go/internal/work.(*Builder).Do
	/home/iant/go/src/cmd/go/internal/work/exec.go:152 +0x39b

goroutine 29 [running]:
	goroutine running on other thread; stack unavailable
created by cmd/go/internal/work.(*Builder).Do
	/home/iant/go/src/cmd/go/internal/work/exec.go:152 +0x39b

rax    0xfffffffffffffffc
rbx    0xffffffff
rcx    0x45b230
rdx    0x80
rdi    0x4
rsi    0x7ffe63f53620
rbp    0x7ffe63f53c20
rsp    0x7ffe63f535e0
r8     0x0
r9     0x3
r10    0xffffffff
r11    0x202
r12    0x4
r13    0xde8fc0
r14    0x0
r15    0x0
rip    0x45b230
rflags 0x202
cs     0x33
fs     0x0
gs     0x0

FAIL	fixedbugs/issue7867.go	106.601s
2018/12/21 13:08:52 Failed: exit status 1

I'm filing this in case anybody else sees this. On the builders this would show up as a timeout; we normally ignore timeouts in the builders and trybots.

CC @aclements @randall77

@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Dec 21, 2018
@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Dec 21, 2018
@ianlancetaylor
Copy link
Contributor Author

Sample timeout, on linux/amd64 trybot: https://storage.googleapis.com/go-build-log/20c62074/linux-amd64_360cb6cf.log

@cuonglm
Copy link
Member

cuonglm commented Dec 22, 2018

@cuonglm
Copy link
Member

cuonglm commented Dec 24, 2018

I'm filing this in case anybody else sees this. On the builders this would show up as a timeout; we normally ignore timeouts in the builders and trybots.

I saw trybots still report as failed build instead of ignoring.

@ianlancetaylor
Copy link
Contributor Author

Another instance reported in #29418.

@aclements
Copy link
Member

It's hard to say exactly what's going on here without a GOTRACEBACK=crash traceback, but I did just track down a deadlock involving checkPut, which I recently added to debug #27993. Specifically, when checkPut happens with the sched.lock held, it's possible to deadlock because the forEachP in gcMarkDone doesn't finish and hence can't release the checkPut loop. One place where this happens is when mcommoninit is holding sched.lock and eventually does mp.cgoCallers = new(cgoCallers).

@siebenmann
Copy link

siebenmann commented Dec 28, 2018

I've seen periodic hangs during all.bash for a while on git tip, on 64-bit x86 Fedora Linux. With the most recent one, which happened yesterday at +c043fc4f65, I got a stack trace:

ok      crypto/cipher   0.149s
# crypto/des [crypto/des.test]
SIGABRT: abort
PC=0x421787 m=0 sigcode=0

goroutine 0 [idle]:
runtime.(*gcWork).checkPut(0xc00003eb70, 0x0, 0xc00003ec28, 0x1, 0x200)
        /data/code/go-lang/go/src/runtime/mgcwork.go:124 +0x37
runtime.(*gcWork).putBatch(0xc00003eb70, 0xc00003ec28, 0x1, 0x200)
        /data/code/go-lang/go/src/runtime/mgcwork.go:220 +0x5f
runtime.wbBufFlush1(0xc00003d900)
        /data/code/go-lang/go/src/runtime/mwbbuf.go:308 +0x1be
runtime.wbBufFlush1Debug(0x0, 0x0, 0xc0004d6160, 0xc00003ec28, 0xc00003ec38)
        /data/code/go-lang/go/src/runtime/mwbbuf.go:230 +0x3a
runtime.wbBufFlush.func1()
        /data/code/go-lang/go/src/runtime/mwbbuf.go:216 +0x7a
runtime.systemstack(0x458479)
        /data/code/go-lang/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
        /data/code/go-lang/go/src/runtime/proc.go:1153

goroutine 1 [running]:
runtime.systemstack_switch()
        /data/code/go-lang/go/src/runtime/asm_amd64.s:311 fp=0xc000433668 sp=0xc000433660 pc=0x458570
runtime.wbBufFlush(0x0, 0x0)
        /data/code/go-lang/go/src/runtime/mwbbuf.go:206 +0x66 fp=0xc000433698 sp=0xc000433668 pc=0x428b56
runtime.bulkBarrierPreWrite(0xc0003dabb8, 0xc000433860, 0x8)
        /data/code/go-lang/go/src/runtime/mbitmap.go:640 +0x2ed fp=0xc000433720 sp=0xc000433698 pc=0x4139cd
runtime.typedmemmove(0xdbc640, 0xc0003dabb8, 0xc000433860)
        /data/code/go-lang/go/src/runtime/mbarrier.go:161 +0xa3 fp=0xc000433758 sp=0xc000433720 pc=0x412993
runtime.chansend(0xc000b07ec0, 0xc000433860, 0x1, 0xbe832c, 0xbe8201)
        /data/code/go-lang/go/src/runtime/chan.go:204 +0x45a fp=0xc0004337d8 sp=0xc000433758 pc=0x4050da
runtime.chansend1(0xc000b07ec0, 0xc000433860)
        /data/code/go-lang/go/src/runtime/chan.go:127 +0x35 fp=0xc000433810 sp=0xc0004337d8 pc=0x404c75
cmd/compile/internal/gc.compileFunctions()
        /data/code/go-lang/go/src/cmd/compile/internal/gc/pgen.go:362 +0x17c fp=0xc000433880 sp=0xc000433810 pc=0xbe832c
cmd/compile/internal/gc.Main(0xde9200)
        /data/code/go-lang/go/src/cmd/compile/internal/gc/main.go:680 +0x31f0 fp=0xc000433f20 sp=0xc000433880 pc=0xbbf840
main.main()
        /data/code/go-lang/go/src/cmd/compile/main.go:51 +0xab fp=0xc000433f98 sp=0xc000433f20 pc=0xcf6fab
runtime.main()
        /data/code/go-lang/go/src/runtime/proc.go:200 +0x207 fp=0xc000433fe0 sp=0xc000433f98 pc=0x42e137
runtime.goexit()
        /data/code/go-lang/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc000433fe8 sp=0xc000433fe0 pc=0x45a4c1

goroutine 10 [running]:
        goroutine running on other thread; stack unavailable
created by cmd/compile/internal/gc.compileFunctions
        /data/code/go-lang/go/src/cmd/compile/internal/gc/pgen.go:354 +0x128

goroutine 11 [running]:
        goroutine running on other thread; stack unavailable
created by cmd/compile/internal/gc.compileFunctions
        /data/code/go-lang/go/src/cmd/compile/internal/gc/pgen.go:354 +0x128

goroutine 12 [running]:
        goroutine running on other thread; stack unavailable
created by cmd/compile/internal/gc.compileFunctions
        /data/code/go-lang/go/src/cmd/compile/internal/gc/pgen.go:354 +0x128

goroutine 13 [running]:
        goroutine running on other thread; stack unavailable
created by cmd/compile/internal/gc.compileFunctions
        /data/code/go-lang/go/src/cmd/compile/internal/gc/pgen.go:354 +0x128

rax    0xc00003eb70
rbx    0x200
rcx    0x8
rdx    0x8
rdi    0x2
rsi    0xc00003d900
rbp    0x7fff424c5940
rsp    0x7fff424c5890
r8     0xc
r9     0x1
r10    0x7fd2b1cddd40
r11    0xc00003ec38
r12    0xc0004d6160
r13    0xc00003ec28
r14    0xf48b60
r15    0x0
rip    0x421787
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
FAIL    crypto/des [build failed]
ok      crypto/dsa      0.042s

I have in the past just ^C'd these and redone them, but I will now try to get stack dumps or actual cores (with GOTRACEBACK=crash).

@mvdan
Copy link
Member

mvdan commented Dec 30, 2018

I've seen hangs on make.bash three times so far in the past week, and I only run make.bash about once a day. This is on a new and otherwise idle 8-core ultrabook, so I don't know what I did to get so lucky.

This time I waited for a full three minutes, and ./make.bash -v was making no progress whatsoever. Note that an entire build usually takes just under a minute. Below is the last half of the output, including me doing a ^\ to get a trace. It seems to get stuck during the "building packages and commands" stage.

[...]

Building packages and commands for linux/amd64.
container/list
container/ring
crypto/internal/subtle
crypto/subtle
internal/x/crypto/cryptobyte/asn1
runtime/cgo
internal/nettrace
cmd/internal/goobj
debug/gosym
debug/plan9obj
cmd/vendor/golang.org/x/arch/arm/armasm
compress/bzip2
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm
archive/zip
cmd/vendor/golang.org/x/arch/arm64/arm64asm
cmd/vendor/golang.org/x/arch/x86/x86asm
compress/lzw
crypto/cipher
crypto/internal/randutil
crypto/dsa
crypto/elliptic
crypto/sha512
encoding/asn1
crypto/aes
crypto/des
crypto/hmac
crypto/rc4
encoding/pem
internal/singleflight
internal/x/net/dns/dnsmessage
crypto/rand
internal/x/crypto/internal/chacha20
crypto/x509/pkix
internal/x/crypto/cryptobyte
crypto/ecdsa
internal/x/crypto/poly1305
internal/x/crypto/curve25519
cmd/internal/objfile
crypto/rsa
internal/x/crypto/hkdf
database/sql/driver
encoding/ascii85
internal/x/crypto/chacha20poly1305
encoding/base32
encoding/csv
encoding/gob
database/sql
cmd/addr2line
encoding/xml
internal/x/text/transform
internal/x/text/unicode/bidi
internal/x/net/http2/hpack
internal/x/text/unicode/norm
mime
internal/x/text/secure/bidirule
mime/quotedprintable
net/http/internal
go/constant
go/format
hash/crc64
os/user
net
hash/fnv
html/template
image/color
index/suffixarray
internal/x/net/idna
go/types
image
image/color/palette
internal/syscall/windows
internal/syscall/windows/registry
internal/syscall/windows/sysdll
testing
internal/trace
internal/x/text/secure
internal/x/text/unicode
math/cmplx
image/internal/imageutil
image/png
image/draw
image/jpeg
image/gif
internal/testenv
net/internal/socktest
os/signal
os/signal/internal/pty
plugin
runtime/pprof/internal/profile
runtime/race
testing/internal/testdeps
testing/iotest
testing/quick
cmd/internal/buildid
cmd/buildid
cmd/compile/internal/test
cmd/internal/browser
cmd/dist
cmd/cover
go/internal/gccgoimporter
go/internal/gcimporter
go/internal/srcimporter
cmd/api
cmd/doc
go/importer
cmd/fix
archive/tar
cmd/go/internal/cfg
cmd/go/internal/str
cmd/go/internal/base
cmd/go/internal/renameio
cmd/go/internal/modinfo
cmd/go/internal/search
cmd/go/internal/cache
cmd/go/internal/imports
cmd/go/internal/load
cmd/go/internal/dirhash
cmd/go/internal/lockedfile/internal/filelock
cmd/go/internal/par
cmd/go/internal/semver
cmd/go/internal/doc
cmd/go/internal/lockedfile
cmd/go/internal/help
cmd/go/internal/cmdflag
cmd/go/internal/module
cmd/internal/test2json
crypto/x509
net/textproto
internal/x/net/http/httpproxy
internal/x/net/internal/nettest
internal/x/net/nettest
log/syslog
cmd/go/internal/work
cmd/go/internal/modfetch/codehost
internal/x/net/http/httpguts
mime/multipart
net/mail
cmd/go/internal/modfile
cmd/go/internal/mvs
cmd/go/internal/tool
cmd/go/internal/version
cmd/go/internal/txtar
cmd/gofmt
cmd/nm
cmd/objdump
cmd/pack
crypto/tls
cmd/vendor/github.com/google/pprof/internal/elfexec
cmd/vendor/github.com/google/pprof/profile
cmd/vendor/github.com/ianlancetaylor/demangle
cmd/vendor/github.com/google/pprof/third_party/d3
cmd/go/internal/run
cmd/vendor/github.com/google/pprof/third_party/d3flamegraph
cmd/vendor/github.com/google/pprof/third_party/svgpan
cmd/vendor/golang.org/x/sys/unix
cmd/test2json
cmd/vendor/github.com/google/pprof/internal/proftest
cmd/vendor/golang.org/x/sys/windows
cmd/vendor/github.com/google/pprof/internal/measurement
cmd/vendor/golang.org/x/sys/windows/registry
cmd/vendor/golang.org/x/tools/go/analysis
cmd/vendor/golang.org/x/tools/go/types/objectpath
cmd/vendor/github.com/google/pprof/internal/graph
net/http/httptrace
net/smtp
net/http
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags
cmd/vendor/golang.org/x/tools/go/analysis/internal/facts
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil
cmd/vendor/golang.org/x/tools/go/ast/inspector
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl
cmd/vendor/golang.org/x/tools/go/analysis/passes/buildtag
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall
cmd/vendor/golang.org/x/tools/go/cfg
cmd/vendor/golang.org/x/tools/go/analysis/passes/inspect
cmd/vendor/golang.org/x/tools/go/ast/astutil
cmd/vendor/golang.org/x/tools/go/analysis/passes/pkgfact
cmd/vendor/golang.org/x/tools/go/analysis/passes/assign
cmd/vendor/golang.org/x/crypto/ssh/terminal
cmd/vendor/golang.org/x/tools/go/analysis/passes/atomic
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools
cmd/vendor/golang.org/x/tools/go/analysis/passes/composite
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock
cmd/vendor/golang.org/x/tools/go/analysis/passes/httpresponse
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure
cmd/vendor/golang.org/x/tools/go/analysis/passes/nilfunc
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable
cmd/vendor/golang.org/x/tools/go/analysis/passes/unsafeptr
cmd/vendor/golang.org/x/tools/go/types/typeutil
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf
cmd/vendor/golang.org/x/tools/go/analysis/passes/unmarshal
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel
cmd/vet
expvar
net/http/cgi
net/http/pprof
net/http/httputil
net/http/httptest
net/http/cookiejar
net/rpc
cmd/go/internal/web
cmd/go/internal/web2
cmd/vendor/github.com/google/pprof/internal/plugin
net/http/fcgi
cmd/go/internal/get
cmd/vendor/github.com/google/pprof/internal/binutils
cmd/vendor/github.com/google/pprof/internal/report
cmd/go/internal/webtest
net/rpc/jsonrpc
cmd/vendor/github.com/google/pprof/internal/symbolz
cmd/vendor/github.com/google/pprof/internal/transport
cmd/go/internal/modfetch
cmd/trace
cmd/vendor/github.com/google/pprof/internal/symbolizer
cmd/vendor/github.com/google/pprof/internal/driver
cmd/go/internal/modconv
cmd/go/internal/modload
cmd/vendor/github.com/google/pprof/driver
cmd/pprof
cmd/go/internal/clean
cmd/go/internal/fix
cmd/go/internal/fmtcmd
cmd/go/internal/generate
cmd/go/internal/envcmd
cmd/go/internal/list
cmd/go/internal/modcmd
cmd/go/internal/modget
cmd/go/internal/test
cmd/go/internal/bug
cmd/go/internal/vet
cmd/go
^\SIGQUIT: quit
PC=0x459a31 m=0 sigcode=128

goroutine 0 [idle]:
runtime.futex(0xab76e8, 0x80, 0x0, 0x0, 0xc000000000, SIGQUIT: quit0x7ffe207f8d00
, PC=0x0, 0x455c300x0 m=, 0x7ffe207f8d180,  sigcode=0x40a151128, ...
)

        /home/mvdan/tip/src/runtime/sys_linux_amd64.sgoroutine :0535 [ +idle0x21]:

runtime.futexsleep(0xab76e8, 0x0, 0xffffffffffffffff)
runtime.epollwait       (/home/mvdan/tip/src/runtime/os_linux.go0x7ffd00000004:, 460x7ffdf77364d0 +, 0x4b0xffffffff00000080
, 0xffffffff00000000runtime.notesleep, (0xffffffff000000010xab76e8, )
0x0     , /home/mvdan/tip/src/runtime/lock_futex.go0x0:, 1510x0 +, 0xa10x0
, 0x0runtime.stopm, ...()
)
                /usr/lib/go/src/runtime/sys_linux_amd64.s/home/mvdan/tip/src/runtime/proc.go::6711936 + +0x200xc1

runtime.netpoll(0xc00002c501runtime.findrunnable, (0xc000022f010xc00002e000)
,       0x0/usr/lib/go/src/runtime/netpoll_epoll.go)
:       71/home/mvdan/tip/src/runtime/proc.go +:23990x132 +
0x530
runtime.findrunnable(runtime.schedule0xc00001e500(, )
0x0     )
/home/mvdan/tip/src/runtime/proc.go     :/usr/lib/go/src/runtime/proc.go2525: +24690x20e +
0x51c
runtime.park_m(runtime.schedule0xc000409c80()
)
                /home/mvdan/tip/src/runtime/proc.go/usr/lib/go/src/runtime/proc.go::26052613 + +0xa10x13a

runtime.mcallruntime.goexit0((0x00xc00046d080)
)
                /home/mvdan/tip/src/runtime/asm_amd64.s/usr/lib/go/src/runtime/proc.go::2992790 + +0x5b0x1f2


runtime.mcallgoroutine (10x0 [)
semacquire      , /usr/lib/go/src/runtime/asm_amd64.s1: minutes299]:
 +0x5b
sync.runtime_Semacquire(
0xc00072e028goroutine )
1        [/home/mvdan/tip/src/runtime/sema.gosyscall:]:
56 +syscall.Syscall60x39(
0xf7, sync.(*WaitGroup).Wait0x1(, 0xc00072e0200x6643)
,       0xc00007b4b8/home/mvdan/tip/src/sync/waitgroup.go, :0x1000004130,  +0x00x65,
0x0, cmd/go/internal/work.(*Builder).Do0x60(, 0xc0007119000xc00035a6c0, , 0xc000813b800x0)
)
                /home/mvdan/tip/src/cmd/go/internal/work/exec.go/usr/lib/go/src/syscall/asm_linux_amd64.s::17444 + +0x3c10x5

cmd/go/internal/work.InstallPackagesos.(*Process).blockUntilWaitable((0xc0000a20500xc0000905d0, , 0x20x7, , 0x20xc00007b620, , 0xc00060c0000xc0000905d0, )
0x172   , /usr/lib/go/src/os/wait_waitid.go0x200:)
31       +/home/mvdan/tip/src/cmd/go/internal/work/build.go0x98:
506os.(*Process).wait +(0xb1b0xc0000905d0
, 0xc0003b2010cmd/go/internal/work.runInstall, (0x600xaaec60, , 0x00xc0000a2050)
,       0x2/usr/lib/go/src/os/exec_unix.go, :0x222)
 +      0x39/home/mvdan/tip/src/cmd/go/internal/work/build.go
:417os.(*Process).Wait +(0x660xc0000905d0
, 0x56ece8, main.main0x56ecf0(, )
0x56ece0        )
/home/mvdan/tip/src/cmd/go/main.go      :/usr/lib/go/src/os/exec.go219: +1250x837 +
0x2b
os/exec.(*Cmd).Wait
(goroutine 0xc00039e16018,  [0x0syscall, , 0x01)
 minutes        ]:
/usr/lib/go/src/os/exec/exec.go:os/signal.signal_recv465( +0x00x5b)

        /home/mvdan/tip/src/runtime/sigqueue.goos/exec.(*Cmd).Run:(1390xc00039e160 +, 0x9c0x31
, 0xc0003b2010)
os/signal.loop  (/usr/lib/go/src/os/exec/exec.go)
:       309/home/mvdan/tip/src/os/signal/signal_unix.go +:0x5c23
 +0x22main.run
(created by 0xc0000120a7, os/signal.init.00xf
,       0x3, /home/mvdan/tip/src/os/signal/signal_unix.go0xc0003b2000:, 0x729,  +0x80x41,
0x2,
0x2goroutine )
20       [/home/mvdan/tip/src/cmd/dist/util.goselect:]:
89 +0x12acmd/go/internal/work.(*Builder).Do.func2
(0xc00072e020main.goInstall, (0xc0007119000xc0000181c0, , 0xc000309f200x31)
,       0xc0002020a0/home/mvdan/tip/src/cmd/go/internal/work/exec.go, :0x2155,  +0x20xfb)

        /home/mvdan/tip/src/cmd/dist/build.gocreated by :cmd/go/internal/work.(*Builder).Do1386
 +0x23f
/home/mvdan/tip/src/cmd/go/internal/work/exec.go:main.cmdbootstrap152( +)
0x39d
/home/mvdan/tip/src/cmd/dist/build.go
:goroutine 134221 + [0x1110select
]:
main.xmain(cmd/go/internal/work.(*Builder).Do.func2)
(       0xc00072e020/home/mvdan/tip/src/cmd/dist/main.go, :0xc000711900166,  +0xc000309f200x10f)
        /home/mvdan/tip/src/cmd/go/internal/work/exec.go
:155main.main +(0xfb)

        created by /home/mvdan/tip/src/cmd/dist/main.gocmd/go/internal/work.(*Builder).Do:
149 +   0x174/home/mvdan/tip/src/cmd/go/internal/work/exec.go
:152 +0x39d

goroutine
19goroutine  [22select []:
select]:
main.bghelper(cmd/go/internal/work.(*Builder).Do.func2)
(       0xc00072e020/home/mvdan/tip/src/cmd/dist/util.go, :0xc000711900138,  +0xc000309f200xe9)

        /home/mvdan/tip/src/cmd/go/internal/work/exec.gocreated by :main.bginit155
 +      0xfb/home/mvdan/tip/src/cmd/dist/util.go
:131created by  +cmd/go/internal/work.(*Builder).Do0x5a


        goroutine /home/mvdan/tip/src/cmd/go/internal/work/exec.go20: [152select +, 0x39d1
 minutes
]:
goroutine 23main.bghelper [(select)
]:
        /home/mvdan/tip/src/cmd/dist/util.gocmd/go/internal/work.(*Builder).Do.func2:(1380xc00072e020 +, 0xe90xc000711900
, 0xc000309f20created by )
main.bginit
/home/mvdan/tip/src/cmd/go/internal/work/exec.go        :/home/mvdan/tip/src/cmd/dist/util.go155: +1310xfb +
0x5acreated by
cmd/go/internal/work.(*Builder).Do

goroutine       21/home/mvdan/tip/src/cmd/go/internal/work/exec.go [:select152]:
 +0x39dmain.bghelper
(
)
goroutine       24/home/mvdan/tip/src/cmd/dist/util.go [:select138]:
 +0xe9cmd/go/internal/work.(*Builder).Do.func2
(created by 0xc00072e020main.bginit,
0xc000711900    , /home/mvdan/tip/src/cmd/dist/util.go0xc000309f20:)
131      +/home/mvdan/tip/src/cmd/go/internal/work/exec.go0x5a:
155
 +goroutine 0xfb22
 [created by selectcmd/go/internal/work.(*Builder).Do,
1        minutes/home/mvdan/tip/src/cmd/go/internal/work/exec.go]:
:152main.bghelper +(0x39d)


/home/mvdan/tip/src/cmd/dist/util.gogoroutine :25138 [ +select0xe9]:

cmd/go/internal/work.(*Builder).Do.func2created by (main.bginit0xc00072e020
,       0xc000711900/home/mvdan/tip/src/cmd/dist/util.go, :0xc000309f20131)
 +      0x5a/home/mvdan/tip/src/cmd/go/internal/work/exec.go
:155 +
0xfbrax
0xfffffffffffffffccreated by
cmd/go/internal/work.(*Builder).Dorbx
0xffffffff
        rcx    /home/mvdan/tip/src/cmd/go/internal/work/exec.go0x455c30:
152rdx     +0x800x39d

rdi
0x4goroutine
26rsi     [0x7ffdf77364d0select
]:
rbp    0x7ffdf7736ad0cmd/go/internal/work.(*Builder).Do.func2
(rsp    0xc00072e0200x7ffdf7736490,
0xc000711900r8     , 0x00xc000309f20
)
r9      0x3/home/mvdan/tip/src/cmd/go/internal/work/exec.go
:r10    1550xffffffff +
0xfbr11
0x202
created by r12    cmd/go/internal/work.(*Builder).Do0x4

r13     0x54/home/mvdan/tip/src/cmd/go/internal/work/exec.go
:r14    1520x58773b +
0x39dr15
0x0

goroutine rip    270x455c30 [
syscallrflags ]:
0x202
syscall.Syscall6cs     (0x330xf7
, fs     0x10x0,
0x6b65gs     , 0x00xc000608fd0
, 0x1000004, 0x0, 0x0, 0x7bdb00, 0x4bc801, 0xc0000df140)
        /home/mvdan/tip/src/syscall/asm_linux_amd64.s:44 +0x5
os.(*Process).blockUntilWaitable(0xc0002d5140, 0x1c, 0x4133ba, 0xc0004aa800)
        /home/mvdan/tip/src/os/wait_waitid.go:31 +0x98
os.(*Process).wait(0xc0002d5140, 0x806180, 0x806188, 0x806178)
        /home/mvdan/tip/src/os/exec_unix.go:22 +0x39
os.(*Process).Wait(...)
        /home/mvdan/tip/src/os/exec.go:125
os/exec.(*Cmd).Wait(0xc00036a000, 0x0, 0x0)
        /home/mvdan/tip/src/os/exec/exec.go:474 +0x61
os/exec.(*Cmd).Run(0xc00036a000, 0x0, 0x0)
        /home/mvdan/tip/src/os/exec/exec.go:318 +0x5c
cmd/go/internal/work.(*Builder).runOut(0xc000711900, 0xc0000142e0, 0x1b, 0x0, 0x0, 0x0, 0xc000540000, 0x19, 0x28, 0x0, ...)
        /home/mvdan/tip/src/cmd/go/internal/work/exec.go:1920 +0x517
cmd/go/internal/work.gcToolchain.gc(0xc000711900, 0xc0007d1040, 0xc00028ef90, 0x23, 0xc00040e000, 0x462, 0x786, 0x0, 0x0, 0xc000609600, ...)
        /home/mvdan/tip/src/cmd/go/internal/work/gc.go:142 +0xcef
cmd/go/internal/work.(*Builder).build(0xc000711900, 0xc0007d1040, 0x0, 0x0)
        /home/mvdan/tip/src/cmd/go/internal/work/exec.go:654 +0x17df
cmd/go/internal/work.(*Builder).Do.func1(0xc0007d1040)
        /home/mvdan/tip/src/cmd/go/internal/work/exec.go:107 +0x6c
cmd/go/internal/work.(*Builder).Do.func2(0xc00072e020, 0xc000711900, 0xc000309f20)
        /home/mvdan/tip/src/cmd/go/internal/work/exec.go:165 +0x87
created by cmd/go/internal/work.(*Builder).Do
        /home/mvdan/tip/src/cmd/go/internal/work/exec.go:152 +0x39d

goroutine 250 [IO wait]:
internal/poll.runtime_pollWait(0x7f30b6a368f0, 0x72, 0xffffffffffffffff)
        /home/mvdan/tip/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc0000df098, 0x72, 0x201, 0x200, 0xffffffffffffffff)
        /home/mvdan/tip/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitRead(...)
        /home/mvdan/tip/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0000df080, 0xc0002b6800, 0x200, 0x200, 0x0, 0x0, 0x0)
        /home/mvdan/tip/src/internal/poll/fd_unix.go:169 +0x19b
os.(*File).read(...)
        /home/mvdan/tip/src/os/file_unix.go:246
os.(*File).Read(0xc0000ee158, 0xc0002b6800, 0x200, 0x200, 0x0, 0xc00029f690, 0xc00029f6a0)
        /home/mvdan/tip/src/os/file.go:108 +0x70
bytes.(*Buffer).ReadFrom(0xc000303ad0, 0x86ed40, 0xc0000ee158, 0x7f30b69f6130, 0xc000303ad0, 0xc00029f701)
        /home/mvdan/tip/src/bytes/buffer.go:207 +0xbd
io.copyBuffer(0x86e780, 0xc000303ad0, 0x86ed40, 0xc0000ee158, 0x0, 0x0, 0x0, 0x404cb5, 0xc000349320, 0xc00029f7b0)
        /home/mvdan/tip/src/io/io.go:388 +0x303
io.Copy(...)
        /home/mvdan/tip/src/io/io.go:364
os/exec.(*Cmd).writerDescriptor.func1(0xc000349320, 0xc00029f7b0)
        /home/mvdan/tip/src/os/exec/exec.go:288 +0x63
os/exec.(*Cmd).Start.func1(0xc00036a000, 0xc000372200)
        /home/mvdan/tip/src/os/exec/exec.go:409 +0x27
created by os/exec.(*Cmd).Start
        /home/mvdan/tip/src/os/exec/exec.go:408 +0x594

rax    0xca
rbx    0xab75a0
rcx    0x459a33
rdx    0x0
rdi    0xab76e8
rsi    0x80
rbp    0x7ffe207f8ce0
rsp    0x7ffe207f8c98
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0x0
r13    0x33
r14    0x864628
r15    0x0
rip    0x459a31
rflags 0x286
cs     0x33
fs     0x0
gs     0x0

I seem to be able to reproduce this quite easily, so let me know if there's more information I could extract or fixes I could try. Below are my system details. Also, I'm bootstrapping with go1.11.4, but I don't imagine that matters.

$ go version
go version devel +3e89272f9c Sun Dec 30 04:55:26 2018 +0000 linux/amd64
$ go env
GOARCH="amd64"
GOBIN="/home/mvdan/go/bin"
GOCACHE="/home/mvdan/go/cache"
GOEXE=""
GOFLAGS="-ldflags=-w"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/mvdan/go/land:/home/mvdan/go"
GOPROXY=""
GORACE=""
GOROOT="/home/mvdan/tip"
GOTMPDIR=""
GOTOOLDIR="/home/mvdan/tip/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build021802742=/tmp/go-build -gno-record-gcc-switches"

@mvdan
Copy link
Member

mvdan commented Dec 30, 2018

I've run GOTRACEBACK=crash GOROOT_BOOTSTRAP=/usr/lib/go ./make.bash -v ten times in a row, but no luck so far. I'll update this thread if I manage to catch a failure with GOTRACEBACK=crash.

@siebenmann
Copy link

I've had ./all.bash hang on me repeatedly now, although it is not something that happens all the time or on demand. Even with GOTRACEBACK=crash set, I've been unable to get a core dump, but I have created several with gcore and I have a whole collection of stack traces from SIGQUIT (probably too many to just dump here).

I'm running into this on 12-core and 16-core desktop machines. Every time I've caught the hang, strace -f -p says that Go is hammering away on a futex call and Go stack traces do have runtime.futex in them. The stack traces also always show GC stuff on multiple other goroutines, although it's idle and waiting for things, and often an idle semacquire as well.

Given all of this, my wild speculation and wondering is if there is some path in GC where GC will force a running goroutine to switch to GC activity while it has a lock held. If this GC activity then tries for a lock itself, you perhaps could get deadlock. I often see waiting semaquires for both (*Builder).Do from internal/work/exec.go and runtime.gcMarkDone(). I've also seen various StartProcess() related semacquire call paths. However, this doesn't seem to be universal across all of my stack traces.

I'd be happy to point dlv/gdlv at one of my gcore-acquired core dumps if people tell me what to look for. I poked around briefly on one of them and didn't see much, although here is the dlv threads dump:

* Thread 21339 at 0x45b6f3 /data/code/go-lang/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
  Thread 21367 at 0x45b15d /data/code/go-lang/go/src/runtime/sys_linux_amd64.s:131 runtime.usleep
  Thread 21368 at 0x45b8a0 /data/code/go-lang/go/src/runtime/sys_linux_amd64.s:675 runtime.epollwait
  Thread 21369 at 0x45b6f3 /data/code/go-lang/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
  Thread 21374 at 0x45b6f3 /data/code/go-lang/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
  Thread 21375 at 0x45b6f3 /data/code/go-lang/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
  Thread 21376 at 0x423d17 /data/code/go-lang/go/src/runtime/mgcwork.go:124 runtime.(*gcWork).checkPut
  Thread 21767 at 0x45b6f3 /data/code/go-lang/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
  Thread 21802 at 0x45b6f3 /data/code/go-lang/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
  Thread 21804 at 0x45b6f3 /data/code/go-lang/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
  Thread 21810 at 0x45b6f3 /data/code/go-lang/go/src/runtime/sys_linux_amd64.s:536 runtime.futex
  Thread 21916 at 0x45b6f3 /data/code/go-lang/go/src/runtime/sys_linux_amd64.s:536 runtime.futex

@gopherbot
Copy link

Change https://golang.org/cl/156017 mentions this issue: runtime: don't spin in checkPut if non-preemptible

@siebenmann
Copy link

For what it's worth, I manually patched my copy of git tip to include the changes from https://golang.org/cl/156017 and I now can't reproduce a hang under repeated runs of all.bash.

@FiloSottile
Copy link
Contributor

@aclements Thanks for mailing a fix. We are planning on cutting the RC this week, please submit it as soon as possible.

gopherbot pushed a commit that referenced this issue Jan 2, 2019
Currently it's possible for the runtime to deadlock if checkPut is
called in a non-preemptible context. In this case, checkPut may spin,
so it won't leave the non-preemptible context, but the thread running
gcMarkDone needs to preempt all of the goroutines before it can
release the checkPut spin loops.

Fix this by returning from checkPut if it's called under any of the
conditions that would prevent gcMarkDone from preempting it. In this
case, it leaves a note behind that this happened; if the runtime does
later detect left-over work it can at least indicate that it was
unable to catch it in the act.

For #27993.
Updates #29385 (may fix it).

Change-Id: Ic71c10701229febb4ddf8c104fb10e06d84b122e
Reviewed-on: https://go-review.googlesource.com/c/156017
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
@aclements
Copy link
Member

CL 156017 has been submitted. I believe this fixes the hangs observed here, so I'm closing this issue. If we continue to see hangs, we can re-open.

@golang golang locked and limited conversation to collaborators Jan 2, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

7 participants