Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: SIGPROF during copystack causing process deadlock on single core arm? #44791

Closed
jdizzle opened this issue Mar 4, 2021 · 2 comments
Closed
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@jdizzle
Copy link

jdizzle commented Mar 4, 2021

What version of Go are you using (go version)?

$ go version
go version go1.13.10 linux/amd64

Does this issue reproduce with the latest release?

Unknown

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jdel/.cache/go-build"
GOENV="/home/jdel/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jdel/co/firmware/samsara/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/jdel/co/firmware/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/jdel/co/firmware/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build032826324=/tmp/go-build -gno-record-gcc-switches"

Though we're cross compiling for ARM with a gcc ARM cross compiler:
rm-oe-linux-gnueabi-gcc --version
ls (GNU coreutils) 8.28
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Richard M. Stallman and David MacKenzie.

What did you do?

pprof.StartCPUProfile() while other go-routines are running. Some cgo calls, some http connections.

What did you expect to see?

No unusual behavior while profiling is enabled.

What did you see instead?

System deadlocks. All go routines make no process. All but one thread is idle. The non-idle thread consumes 100% cpu.

If I pull the stack pointer from the linux proc pid task stat file:

/proc/1154/task/1162 # kill -SIGSTOP 1154
/proc/1154/task/1162 # awk '{printf("sp: 0x%x, pc: 0x%x\n", $29, $30)}' stat
sp: 0x280b864, pc: 0x87228

And look up the PC, it appears to be working through a stack trace of some kind.

$ echo 0x87228 | go tool addr2line bundle
runtime.pcvalue
/code/go/src/runtime/symtab.go:658

Examining the proc pid task status files I see that all signals are blocked on this thread as well:

/proc/1154/task # grep SigBlk */status
1154/status:SigBlk:	0000000000000000
1156/status:SigBlk:	0000000000000000
1157/status:SigBlk:	0000000000000000
1158/status:SigBlk:	fffffffe3bfa3a00
1159/status:SigBlk:	0000000000000000
1161/status:SigBlk:	0000000000000000
1162/status:SigBlk:	fffffffffffbfeff
1163/status:SigBlk:	0000000000000000
1164/status:SigBlk:	0000000000000000
1168/status:SigBlk:	0000000000000000
1268/status:SigBlk:	0000000000000000
27224/status:SigBlk:	0000000000000000
3003/status:SigBlk:	0000000000000000

If I examine this same process in another instance that is not having this problem, the only difference is theres no thread with the fffffffffffbfeff mask.

I have dumped the stack at the sp reported by the stat file by reading the mem file. Apparently unwinding the stack requires per-function stack-usage info encoded into the binary that I'm not sure how to get. If I just look for related PC values I can piece together a call stack in the SIGPROF signal handler call tree.

/proc/1154/task/1162 # /tmp/openseek -file mem -seek 0x280b864 > /tmp/foo.out 2021/03/04 16:43:47 seeking to 41986148 2021/03/04 16:43:47 0000 000879fc 2021/03/04 16:43:47 0001 0179d99a 2021/03/04 16:43:47 0002 00ae87ee 2021/03/04 16:43:47 0003 00ae87ee 2021/03/04 16:43:47 0004 0280b894 2021/03/04 16:43:47 0005 0280b890 2021/03/04 16:43:47 0006 01ff4101 2021/03/04 16:43:47 0007 0179d99c 2021/03/04 16:43:47 0008 00ae87ec 2021/03/04 16:43:47 0009 00ae87ec 2021/03/04 16:43:47 000a 0280b801 2021/03/04 16:43:47 000b 00000000 2021/03/04 16:43:47 000c 0034678c 2021/03/04 16:43:47 000d 00291fef 2021/03/04 16:43:47 000e 00291fef 2021/03/04 16:43:47 000f 00000001 2021/03/04 16:43:47 0010 ffffffff 2021/03/04 16:43:47 0011 00ae4d50 2021/03/04 16:43:47 0012 00000000 2021/03/04 16:43:47 0013 0008dd84 2021/03/04 16:43:47 0014 0196d078 2021/03/04 16:43:47 0015 022ae560 2021/03/04 16:43:47 0016 0006da8a 2021/03/04 16:43:47 0017 0034677c 2021/03/04 16:43:47 0018 0280b978 2021/03/04 16:43:47 0019 01ff4001 2021/03/04 16:43:47 001a 00000000 2021/03/04 16:43:47 001b 008c4287 2021/03/04 16:43:47 001c 00ae4d46 2021/03/04 16:43:47 001d 0008db10 2021/03/04 16:43:47 001e 0034677c 2021/03/04 16:43:47 001f 022ae560 2021/03/04 16:43:47 0020 000742b8 2021/03/04 16:43:47 0021 0196d078 2021/03/04 16:43:47 0022 022ae560 2021/03/04 16:43:47 0023 0034677c 2021/03/04 16:43:47 0024 0280b978 2021/03/04 16:43:47 0025 00000000 2021/03/04 16:43:47 0026 ffffffff 2021/03/04 16:43:47 0027 00000000 2021/03/04 16:43:47 0028 00000000 2021/03/04 16:43:47 0029 00000014 2021/03/04 16:43:47 002a 00000000 2021/03/04 16:43:47 002b 00000000 2021/03/04 16:43:47 002c 00000000 2021/03/04 16:43:47 002d 00000000 2021/03/04 16:43:47 002e 00ae4d47 2021/03/04 16:43:47 002f 00000000 2021/03/04 16:43:47 0030 00000000 2021/03/04 16:43:47 0031 00000000 2021/03/04 16:43:47 0032 0000000c 2021/03/04 16:43:47 0033 02651be8 2021/03/04 16:43:47 0034 00000000 2021/03/04 16:43:47 0035 00000000 2021/03/04 16:43:47 0036 00000000 2021/03/04 16:43:47 0037 00000000 2021/03/04 16:43:47 0038 00000000 2021/03/04 16:43:47 0039 00000000 2021/03/04 16:43:47 003a 00000000 2021/03/04 16:43:47 003b 00000000 2021/03/04 16:43:47 003c 00000000 2021/03/04 16:43:47 003d 00000000 2021/03/04 16:43:47 003e 00000000 2021/03/04 16:43:47 003f 00000000 2021/03/04 16:43:47 0040 00000000 2021/03/04 16:43:47 0041 00000000 2021/03/04 16:43:47 0042 00000000 2021/03/04 16:43:47 0043 00000000 2021/03/04 16:43:47 0044 00000001 2021/03/04 16:43:47 0045 002a5e58 2021/03/04 16:43:47 0046 001d64c6 2021/03/04 16:43:47 0047 00000040 2021/03/04 16:43:47 0048 00000000 2021/03/04 16:43:47 0049 00000000 2021/03/04 16:43:47 004a 00000000 2021/03/04 16:43:47 004b 00000000 2021/03/04 16:43:47 004c 00000000 2021/03/04 16:43:47 004d 00000000 2021/03/04 16:43:47 004e 00315870 2021/03/04 16:43:47 004f 0006d7a5 2021/03/04 16:43:47 0050 00000018 2021/03/04 16:43:47 0051 00000000 2021/03/04 16:43:47 0052 00000000 2021/03/04 16:43:47 0053 00000000 2021/03/04 16:43:47 0054 00083043 2021/03/04 16:43:47 0055 00096360 2021/03/04 16:43:47 0056 ffffffff 2021/03/04 16:43:47 0057 00083468 2021/03/04 16:43:47 0058 00096411 2021/03/04 16:43:47 0059 000000c8 2021/03/04 16:43:47 005a 00000000 2021/03/04 16:43:47 005b 00000000 2021/03/04 16:43:47 005c 00000000 2021/03/04 16:43:47 005d 0034677c 2021/03/04 16:43:47 005e 0006da8a 2021/03/04 16:43:47 005f 00000000 2021/03/04 16:43:47 0060 00044b7c 2021/03/04 16:43:47 0061 0006ac6d 2021/03/04 16:43:47 0062 00000000 2021/03/04 16:43:47 0063 00000000 2021/03/04 16:43:47 0064 00000000 2021/03/04 16:43:47 0065 00000000 2021/03/04 16:43:47 0066 0008439c 2021/03/04 16:43:47 0067 00096e94 2021/03/04 16:43:47 0068 ffffffff 2021/03/04 16:43:47 0069 00000000 2021/03/04 16:43:47 006a 00000000 2021/03/04 16:43:47 006b 00000000 2021/03/04 16:43:47 006c 000a4634 2021/03/04 16:43:47 006d 000adaec 2021/03/04 16:43:47 006e 00000000 2021/03/04 16:43:47 006f 00000000 2021/03/04 16:43:47 0070 00000000 2021/03/04 16:43:47 0071 00000000 2021/03/04 16:43:47 0072 00ae4d47 2021/03/04 16:43:47 0073 008c41b7 2021/03/04 16:43:47 0074 00000030 2021/03/04 16:43:47 0075 00000e34 2021/03/04 16:43:47 0076 0000042c 2021/03/04 16:43:47 0077 022ae560 2021/03/04 16:43:47 0078 0196d078 2021/03/04 16:43:47 0079 00000002 2021/03/04 16:43:47 007a 00000000 2021/03/04 16:43:47 007b 022ae560 2021/03/04 16:43:47 007c 022ae560 2021/03/04 16:43:47 007d 022ae560 2021/03/04 16:43:47 007e 0196d078 2021/03/04 16:43:47 007f 0196d078 2021/03/04 16:43:47 0080 0196d078 2021/03/04 16:43:47 0081 00000000 2021/03/04 16:43:47 0082 00000000 2021/03/04 16:43:47 0083 000876d8 2021/03/04 16:43:47 0084 017c6a98 2021/03/04 16:43:47 0085 00000568 2021/03/04 16:43:47 0086 01522f9c 2021/03/04 16:43:47 0087 00000000 2021/03/04 16:43:47 0088 0196d078 2021/03/04 16:43:47 0089 022ae560 2021/03/04 16:43:47 008a 0034677c 2021/03/04 16:43:47 008b 0034677c 2021/03/04 16:43:47 008c 00000000 2021/03/04 16:43:47 008d 02651be8 2021/03/04 16:43:47 008e 00000000 2021/03/04 16:43:47 008f 02651be8 2021/03/04 16:43:47 0090 00000000 2021/03/04 16:43:47 0091 00000000 2021/03/04 16:43:47 0092 00000000 2021/03/04 16:43:47 0093 0007f160 2021/03/04 16:43:47 0094 0008439c 2021/03/04 16:43:47 0095 a39fea90 2021/03/04 16:43:47 0096 00082960 2021/03/04 16:43:47 0097 026fbb20 2021/03/04 16:43:47 0098 00000000 2021/03/04 16:43:47 0099 0280bae8 2021/03/04 16:43:47 009a 00000040 2021/03/04 16:43:47 009b 00000000 2021/03/04 16:43:47 009c 00000000 2021/03/04 16:43:47 009d 00000006 2021/03/04 16:43:47 009e 00000003 2021/03/04 16:43:47 009f 00000000 2021/03/04 16:43:47 00a0 00000000 2021/03/04 16:43:47 00a1 0008439c 2021/03/04 16:43:47 00a2 00082960 2021/03/04 16:43:47 00a3 0008ca84 2021/03/04 16:43:47 00a4 00082d44 2021/03/04 16:43:47 00a5 00083044 2021/03/04 16:43:47 00a6 00083468 2021/03/04 16:43:47 00a7 000a4634 2021/03/04 16:43:47 00a8 00315870 2021/03/04 16:43:47 00a9 00315870 2021/03/04 16:43:47 00aa 002a5e58 2021/03/04 16:43:47 00ab 00ae4d47 2021/03/04 16:43:47 00ac 00044b7c 2021/03/04 16:43:47 00ad 00000000 2021/03/04 16:43:47 00ae 00000000 2021/03/04 16:43:47 00af 00000000 2021/03/04 16:43:47 00b0 00000000 2021/03/04 16:43:47 00b1 00000000 2021/03/04 16:43:47 00b2 00000000 2021/03/04 16:43:47 00b3 00000000 2021/03/04 16:43:47 00b4 00000000 2021/03/04 16:43:47 00b5 00000000 2021/03/04 16:43:47 00b6 00000000 2021/03/04 16:43:47 00b7 00000000 2021/03/04 16:43:47 00b8 00000000 2021/03/04 16:43:47 00b9 00000000 2021/03/04 16:43:47 00ba 00000000 2021/03/04 16:43:47 00bb 00000000 2021/03/04 16:43:47 00bc 00000000 2021/03/04 16:43:47 00bd 00000000 2021/03/04 16:43:47 00be 00000000 2021/03/04 16:43:47 00bf 00000000 2021/03/04 16:43:47 00c0 00000000 2021/03/04 16:43:47 00c1 00000000 2021/03/04 16:43:47 00c2 00000000 2021/03/04 16:43:47 00c3 00000000 2021/03/04 16:43:47 00c4 00000000 2021/03/04 16:43:47 00c5 00000000 2021/03/04 16:43:47 00c6 00000000 2021/03/04 16:43:47 00c7 00000000 2021/03/04 16:43:47 00c8 00000000 2021/03/04 16:43:47 00c9 00000000 2021/03/04 16:43:47 00ca 00000000 2021/03/04 16:43:47 00cb 00000000 2021/03/04 16:43:47 00cc 00000000 2021/03/04 16:43:47 00cd 00000000 2021/03/04 16:43:47 00ce 00000000 2021/03/04 16:43:47 00cf 00000000 2021/03/04 16:43:47 00d0 00000000 2021/03/04 16:43:47 00d1 00000000 2021/03/04 16:43:47 00d2 00000000 2021/03/04 16:43:47 00d3 00000000 2021/03/04 16:43:47 00d4 00000000 2021/03/04 16:43:47 00d5 00000000 2021/03/04 16:43:47 00d6 00000000 2021/03/04 16:43:47 00d7 00000000 2021/03/04 16:43:47 00d8 00000000 2021/03/04 16:43:47 00d9 00000000 2021/03/04 16:43:47 00da 00000000 2021/03/04 16:43:47 00db 00000000 2021/03/04 16:43:47 00dc 00000000 2021/03/04 16:43:47 00dd 00000000 2021/03/04 16:43:47 00de 00000000 2021/03/04 16:43:47 00df 00000000 2021/03/04 16:43:47 00e0 00000000 2021/03/04 16:43:47 00e1 0007fb18 2021/03/04 16:43:47 00e2 0008439c 2021/03/04 16:43:47 00e3 a39fea90 2021/03/04 16:43:47 00e4 00082960 2021/03/04 16:43:47 00e5 026fbb20 2021/03/04 16:43:47 00e6 02459680 2021/03/04 16:43:47 00e7 00000000 2021/03/04 16:43:47 00e8 00000000 2021/03/04 16:43:47 00e9 00000000 2021/03/04 16:43:47 00ea 00080650 2021/03/04 16:43:47 00eb 0007f990 2021/03/04 16:43:47 00ec 02341d0c 2021/03/04 16:43:47 00ed 00000001 2021/03/04 16:43:47 00ee 00000000 2021/03/04 16:43:47 00ef 00000000 2021/03/04 16:43:47 00f0 0280bc90 2021/03/04 16:43:47 00f1 0280bd10 2021/03/04 16:43:47 00f2 0009a238 2021/03/04 16:43:47 00f3 0000001b 2021/03/04 16:43:47 00f4 0280bc90 2021/03/04 16:43:47 00f5 0280bd10 2021/03/04 16:43:47 00f6 026fbb20 2021/03/04 16:43:47 00f7 00000000 2021/03/04 16:43:47 00f8 00000000 2021/03/04 16:43:47 00f9 00000000 2021/03/04 16:43:47 00fa 00000000 2021/03/04 16:43:47 00fb 00000000 2021/03/04 16:43:47 00fc 00000000 2021/03/04 16:43:47 00fd 00000000 2021/03/04 16:43:47 00fe 026fbb20 2021/03/04 16:43:47 00ff 00000000 2021/03/04 16:43:47 0100 00000000 2021/03/04 16:43:47 0101 00000000 2021/03/04 16:43:47 0102 00000000 2021/03/04 16:43:47 0103 00000000 2021/03/04 16:43:47 0104 00000000 2021/03/04 16:43:47 0105 00000000 2021/03/04 16:43:47 0106 00000000 2021/03/04 16:43:47 0107 bef51350 2021/03/04 16:43:47 0108 0000001b 2021/03/04 16:43:47 0109 0280bc90 2021/03/04 16:43:47 010a 0280bd10 2021/03/04 16:43:47 010b 0000001b
PC: 0x87228 /code/go/src/runtime/symtab.go:658 runtime.pcvalue
0000 000879fc /code/go/src/runtime/symtab.go:775 runtime.funcspdelta
0013 0008dd84 /code/go/src/runtime/traceback.go:220 runtime.gentraceback
0020 000742b8 /code/go/src/runtime/proc.go:3760 runtime.sigprof
0093 0007f160 /code/go/src/runtime/signal_sighandler.go:39 runtime.sighandler
00e1 0007fb18 /code/go/src/runtime/signal_unix.go:353 runtime.sigtrampgo
00f2 0009a238 /code/go/src/runtime/sys_linux_arm.s:450 runtime.sigtramp

and you can see that 00f3 0000001b is the arg of SIGPROF signal value to runtime.sigtramp.

If I SIGABRT the process it will exit and produce a call trace on stderr. A consistent artifact we see in the aborted call trace of a stuck process is a go routine stuck in a "copystack" phase:

goroutine 3722564 [copystack]:
sync.(*Once).Do(0x229051c, 0x26b6be8)
    /code/go/src/sync/once.go:40 +0x50
net/http.(*Transport).roundTrip(0x2290478, 0x26bf800, 0x0, 0x0, 0x0)
    /code/go/src/net/http/transport.go:455 +0x4c
net/http.(*Transport).RoundTrip(0x2290478, 0x26bf800, 0x2290478, 0x0, 0x0)
    /code/go/src/net/http/roundtrip.go:17 +0x24
net/http.send(0x26bf800, 0x168f470, 0x2290478, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2639b00, 0x100, ...)
    /code/go/src/net/http/client.go:250 +0x37c
net/http.(*Client).send(0x232d230, 0x26bf800, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x25d4420, 0x20, ...)
    /code/go/src/net/http/client.go:174 +0xb8
net/http.(*Client).do(0x232d230, 0x26bf800, 0x0, 0x0, 0x0)
    /code/go/src/net/http/client.go:641 +0x2c8
net/http.(*Client).Do(...)
    /code/go/src/net/http/client.go:509
net/http.(*Client).Post(0x232d230, 0x13fa024, 0x20, 0x13d346d, 0x14, 0x168e750, 0x25d43a0, 0x2693b80, 0x0, 0x6bc78)
    /code/go/src/net/http/client.go:769 +0x134
net/http.Post(...)
    /code/go/src/net/http/client.go:749
firmware.samsaradev.io/logevent.doLogPushHttp(0x13fa024, 0x20, 0x24bc280, 0x96, 0x96, 0x2693bc0)
    /code/samsara/go/src/firmware.samsaradev.io/logevent/manager.go:767 +0xd4
created by firmware.samsaradev.io/logevent.(*loggerManager).maybeSendLogs
    /code/samsara/go/src/firmware.samsaradev.io/logevent/manager.go:761 +0x66c

I can see some evidence of this being the callstack being decoded in the pprof signal stack as well. Here's a contiguous set of values on the stack that happens to resemble the call stack dumped by SIGABRT very closely:

2021/03/04 16:43:47 00a1 0008439c
2021/03/04 16:43:47 00a2 00082960
2021/03/04 16:43:47 00a3 0008ca84
2021/03/04 16:43:47 00a4 00082d44
2021/03/04 16:43:47 00a5 00083044
2021/03/04 16:43:47 00a6 00083468
2021/03/04 16:43:47 00a7 000a4634
2021/03/04 16:43:47 00a8 00315870
2021/03/04 16:43:47 00a9 00315870
2021/03/04 16:43:47 00aa 002a5e58
2021/03/04 16:43:47 00ab 00ae4d47
2021/03/04 16:43:47 00ac 00044b7c

runtime.getStackMap
/code/go/src/runtime/stack.go:1266
runtime.adjustframe
/code/go/src/runtime/stack.go:628
runtime.tracebackdefers
/code/go/src/runtime/traceback.go:77
runtime.adjustdefers
/code/go/src/runtime/stack.go:737
runtime.copystack
/code/go/src/runtime/stack.go:874
runtime.newstack
/code/go/src/runtime/stack.go:1059
sync.(*Once).Do
/code/go/src/sync/once.go:40
net/http.(*Transport).RoundTrip
/code/go/src/net/http/roundtrip.go:17
net/http.(*Transport).RoundTrip
/code/go/src/net/http/roundtrip.go:17
@jdizzle jdizzle changed the title SIGPROF during copystack causing process deadlock or single core arm? SIGPROF during copystack causing process deadlock on single core arm? Mar 4, 2021
@ianlancetaylor ianlancetaylor changed the title SIGPROF during copystack causing process deadlock on single core arm? runtime: SIGPROF during copystack causing process deadlock on single core arm? Mar 4, 2021
@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 4, 2021
@ianlancetaylor ianlancetaylor added this to the Go1.17 milestone Mar 4, 2021
@cherrymui
Copy link
Member

go1.13.10

Could you try a newer version of Go, maybe Go 1.16? There have been a few changes with profiling, signal handling, etc. And Go 1.13 is no longer supported. Thanks.

@cherrymui cherrymui added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 28, 2021
@dmitshur dmitshur modified the milestones: Go1.17, Backlog May 21, 2021
@gopherbot
Copy link

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@golang golang locked and limited conversation to collaborators May 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

5 participants