runtime/pprof: StopCPUProfile occasionally stuck with 100% CPU and process hang #52912
Comments
Try killing it with SIGABRT and see what it's doing?
@seankhliao It has not reproduced in recent days; I will try SIGABRT when it happens again. Meanwhile, do you have other ideas or suggested measurements I can take to capture the necessary information the next time it reproduces?
Could this be a memory-ordering / weak-memory problem on M1 / arm64? The defense against this deadlock is setting prof.hz to 0 first, but the store and load of prof.hz are not explicitly synchronized.
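For readers unfamiliar with the weak-memory-ordering concern, here is a minimal, self-contained sketch (not the runtime's code; the flag name is made up) of an unsynchronized store/load handshake of the kind being described:

```go
package main

import (
	"fmt"
	"time"
)

// Illustration only: a plain (non-atomic) flag shared between a writer and a
// reader, loosely analogous to the handshake discussed above. Without atomic
// operations this is a data race, and on a weakly ordered architecture such
// as arm64 there is no guarantee about when the reader observes the store.
var stopping bool

func main() {
	done := make(chan struct{})
	go func() {
		for !stopping { // unsynchronized load; may keep seeing a stale false
			time.Sleep(time.Millisecond)
		}
		close(done)
	}()

	stopping = true // unsynchronized store

	select {
	case <-done:
		fmt.Println("reader observed the store")
	case <-time.After(time.Second):
		fmt.Println("reader did not observe the store in time")
	}
}
```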
If you reproduce this again (or still have the profile), could you expand the […]?
@rhysh I see where you are coming from, but if we hit that race, shouldn't we be stuck spinning in the signal handler when the deadlock occurs, not in StopCPUProfile?
Thus I believe it would be possible to get a deadlock in the signal handler if we go down the non-Go code path. That said, we are in fact running Go code, so we should avoid that path. There would also need to be a second call to StopCPUProfile on another thread to have the stuckness in both StopCPUProfile and sigtramp that is shown in the profile.
Change https://go.dev/cl/420196 mentions this issue.
https://go.dev/cl/420196 makes prof.hz atomic. It is part of a general atomics cleanup and I don't think it will make a difference here, but if this is easily reproducible then it may be worth testing.
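For context, "making prof.hz atomic" amounts to replacing plain loads and stores of the field with atomic ones. A rough sketch using the public sync/atomic API (the runtime uses its internal atomics, so this is only an analogy; the helpers are illustrative, not runtime functions):

```go
package profdemo

import "sync/atomic"

// Before (sketch): a plain int32 field whose reads outside the lock are
// unsynchronized.
//
//	var prof struct {
//		signalLock uint32
//		hz         int32
//	}

// After (sketch): an atomic type, so every load and store is a synchronized
// atomic access with well-defined ordering.
var prof struct {
	signalLock atomic.Uint32
	hz         atomic.Int32
}

// setHz and profiling are illustrative helpers, not runtime functions.
func setHz(hz int32) { prof.hz.Store(hz) }

func profiling() bool { return prof.hz.Load() != 0 }
```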
FWIW, I have been unable to trivially reproduce this on darwin-arm64 by calling StartCPUProfile/StopCPUProfile in a loop (with some extra work to ensure we get SIGPROFs).
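For anyone else who wants to try, a sketch of that kind of stress loop (the amount of busy work and the logging are my guesses, not the exact test that was run):

```go
package main

import (
	"io"
	"log"
	"runtime/pprof"
)

func main() {
	sink := 0
	for i := 0; ; i++ {
		// Start a profile, burn enough CPU that SIGPROF actually fires,
		// then stop, as fast as possible.
		if err := pprof.StartCPUProfile(io.Discard); err != nil {
			log.Fatal(err)
		}
		for j := 0; j < 10_000_000; j++ {
			sink += j
		}
		pprof.StopCPUProfile()
		if i%100 == 0 {
			log.Printf("iteration %d (sink=%d)", i, sink)
		}
	}
}
```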
This converts several unsynchronized reads (reads without holding prof.signalLock) into atomic reads.

For #53821.
For #52912.

Change-Id: I421b96a22fbe26d699bcc21010c8a9e0f4efc276
Reviewed-on: https://go-review.googlesource.com/c/go/+/420196
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@prattmic does the merged fix solve the issue, or is more info needed?
I don't expect it to make a difference, but it is possible. @breezewish can you reproduce this at tip?
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)
It seems to be hard to reproduce on my M1 these days. We have some production environments using this Go version now, and I will ping back when I encounter more deadlocks like this. Thanks a lot for digging into this!
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

Not sure.

What operating system and processor architecture are you using (`go env`)?

What did you do?
In tidb-server there is a feature that repeatedly runs the pprof CPU profiler for 1 second at a time (StartCPUProfile -> wait 1 sec -> StopCPUProfile -> StartCPUProfile -> wait 1 sec -> ...).
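A minimal sketch of that pattern (the real tidb-server code is not shown here; writing the profile to an in-memory buffer is an assumption):

```go
package main

import (
	"bytes"
	"log"
	"runtime/pprof"
	"time"
)

// Collect back-to-back 1-second CPU profiles, mirroring the pattern described
// above. In the failure being reported, StopCPUProfile occasionally never
// returns and the process spins at 100% of one core.
func main() {
	for {
		var buf bytes.Buffer
		if err := pprof.StartCPUProfile(&buf); err != nil {
			log.Fatal(err)
		}
		time.Sleep(1 * time.Second)
		pprof.StopCPUProfile()
		// ... hand buf off to whatever consumes the profile ...
	}
}
```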
Recently, on my macOS M1 machine with this feature enabled, I observed that the tidb-server process hung with 100% CPU (one core) and could not process any requests.
According to the CPU profiling data provided by Instruments, it looks like StopCPUProfile was looping infinitely at go/src/runtime/proc.go line 4641 (commit 016d755).
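For readers without the source handy, the loop there appears to be the spin on the profiler's signal lock discussed above. A rough, runnable analogue using public APIs (the runtime uses its internal atomics and osyield; this is a paraphrase, not the actual source):

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// Rough analogue of a CAS spin lock like prof.signalLock, using sync/atomic
// and runtime.Gosched in place of the runtime's internal atomics and osyield.
// If the holder never releases the lock, the loop spins forever, which
// matches the 100%-of-one-core symptom.
var signalLock atomic.Uint32

func acquire() {
	for !signalLock.CompareAndSwap(0, 1) {
		runtime.Gosched()
	}
}

func release() { signalLock.Store(0) }

func main() {
	acquire()
	fmt.Println("lock acquired")
	release()
}
```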
I have no idea how this issue can be reliably reproduced. I hope the stack provided by Instruments helps.
What did you expect to see?
StopCPUProfile should not cause the process to hang.
What did you see instead?
The process was hanging.