runtime: SIGPROF during copystack causing process deadlock on single core arm? #44791
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Unknown
What operating system and processor architecture are you using (go env)?
What did you do?
Called pprof.StartCPUProfile() while other goroutines were running. The program makes some cgo calls and has some HTTP connections open.
What did you expect to see?
No unusual behavior while profiling is enabled.
What did you see instead?
The system deadlocks. No goroutine makes any progress. All threads but one are idle. The non-idle thread consumes 100% CPU.
If I pull the stack pointer from the Linux /proc/<pid>/task/<tid>/stat file:
And look up the PC, it appears to be working through a stack trace of some kind.
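A rough sketch of how the stack pointer and PC can be pulled out of that stat file, for anyone who wants to repeat the check. Per proc(5), kstkesp is field 29 and kstkeip is field 30; the function name parseStatSPPC and the command-line handling are assumptions of this sketch, and some newer kernels report 0 in these fields for a running task.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseStatSPPC extracts kstkesp (field 29) and kstkeip (field 30)
// from the contents of a /proc/<pid>/task/<tid>/stat file.
func parseStatSPPC(stat string) (sp, pc uint64, err error) {
	// The comm field is wrapped in parentheses and may contain spaces,
	// so split only what follows the last ')'.
	end := strings.LastIndexByte(stat, ')')
	if end < 0 {
		return 0, 0, fmt.Errorf("no comm field found in stat line")
	}
	fields := strings.Fields(stat[end+1:])
	// fields[0] is field 3 (state), so field N lives at fields[N-3].
	if len(fields) < 28 {
		return 0, 0, fmt.Errorf("stat line too short: %d fields after comm", len(fields))
	}
	if sp, err = strconv.ParseUint(fields[26], 10, 64); err != nil {
		return 0, 0, err
	}
	if pc, err = strconv.ParseUint(fields[27], 10, 64); err != nil {
		return 0, 0, err
	}
	return sp, pc, nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: statsp <pid> <tid>")
		return
	}
	path := fmt.Sprintf("/proc/%s/task/%s/stat", os.Args[1], os.Args[2])
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	sp, pc, err := parseStatSPPC(string(data))
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Printf("sp=%#x pc=%#x\n", sp, pc)
}
```

The PC printed here can then be looked up against the binary, for example with go tool addr2line.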
Examining the /proc/<pid>/task/<tid>/status files, I see that all signals are blocked on this thread as well:
If I examine this same process in another instance that is not having this problem, the only difference is that there's no thread with the fffffffffffbfeff mask.
I have dumped the stack at the sp reported by the stat file by reading the mem file. Apparently unwinding the stack requires per-function stack-usage info encoded into the binary that I'm not sure how to get. If I just look for related PC values I can piece together a call stack in the SIGPROF signal handler call tree.
You can see that 00f3 0000001b is the SIGPROF signal value (0x1b = 27) passed as the argument to runtime.sigtramp.
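As a sanity check on the fffffffffffbfeff mask mentioned above, here is a small sketch that lists which signal numbers are not blocked by a given hex mask from the status file. In that mask format, bit N-1 corresponds to signal N. The function name unblockedSignals is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// unblockedSignals returns the signal numbers (1..64) whose bit is
// clear in a hex signal mask as printed in /proc/<pid>/task/<tid>/status.
func unblockedSignals(hexMask string) ([]int, error) {
	mask, err := strconv.ParseUint(hexMask, 16, 64)
	if err != nil {
		return nil, err
	}
	var clear []int
	for sig := 1; sig <= 64; sig++ {
		// Signal N is represented by bit N-1.
		if mask&(uint64(1)<<uint(sig-1)) == 0 {
			clear = append(clear, sig)
		}
	}
	return clear, nil
}

func main() {
	sigs, err := unblockedSignals("fffffffffffbfeff")
	if err != nil {
		fmt.Println("bad mask:", err)
		return
	}
	fmt.Println(sigs) // prints [9 19]
}
```

On Linux, 9 is SIGKILL and 19 is SIGSTOP, the two signals that cannot be blocked, so the mask does mean every blockable signal (including SIGPROF, 27) is blocked on that thread.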
If I send SIGABRT to the process, it exits and produces a call trace on stderr. A consistent artifact we see in the aborted call trace of a stuck process is a goroutine stuck in a "copystack" phase:
I can see some evidence of this being the callstack being decoded in the pprof signal stack as well. Here's a contiguous set of values on the stack that happens to resemble the call stack dumped by SIGABRT very closely: