cmd/pprof: unable to profile CPU intensive code (even with available threads) #32652
Comments
Here is a condensed version of the (You know, in case you don't want to
/cc @hyangah
Is it reproducible other than on VM?
@hyangah I can reproduce this with
EDIT: Added the output of
I think this is an inherent limitation of profiling or monitoring CPU or memory from within the runtime, especially when the runtime doesn't treat profiling and monitoring work differently from any other work it is doing. How can we rely on a process to reliably capture correct profiles or performance signals when the process itself is overloaded or about to OOM? The runtime team may have a better idea of how to mitigate this from the runtime side (@aclements). Otherwise, for processes with extremely high CPU usage, I think we may have better luck collecting profiles from outside the runtime (e.g. Linux perf, ...).
Keep in mind that we're talking about running 4 CPU-heavy goroutines on a 32-core host. There are 28 other threads available to the runtime for execution, and memory usage is minimal. I attached to the
then they abruptly stop, and shortly thereafter the pprof request times out. If you try to hit the pprof endpoint again, you see NO new system calls through strace.
@jaffee Yeah, it's undeniably strange that pprof couldn't find a CPU to run on when 28 other cores are idle in theory. But I am not familiar with Azure's VM environment, so I was curious whether such dramatic differences (32 cores >> 4 working cores) were observed in a non-VM environment.
Thanks @hyangah. I have a couple of boxes I can test on when I get home (nothing with 32 cores though!). I can also pretty easily test on AWS/GCP/OCI if that helps.
After more investigation (and opening another ticket, #33015), this is likely also a dup of #10958 and will likely be fixed by #24543. I think the missing piece of my understanding was that the runtime waits for all running goroutines to reach a preemption point so that it can do a STW phase of GC (or something), which is why nothing else runs even though there are available threads.
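To illustrate the preemption-point explanation above, here is a minimal sketch of a call-free loop with and without an explicit yield. The spin function and the yieldEvery constant are hypothetical names made up for this example, and adding runtime.Gosched is only one possible workaround on Go releases without asynchronous preemption (before 1.14), not something proposed in this thread.

```go
package main

import (
	"fmt"
	"runtime"
)

// yieldEvery is a hypothetical tuning knob: how many iterations to run
// between explicit scheduler yields.
const yieldEvery = 1 << 20

// spin sums i*i for i in [0, n) using call-free arithmetic. On Go releases
// without asynchronous preemption (before 1.14), a loop like this contains
// no preemption point, so a stop-the-world request has to wait for it.
// With cooperative set, the loop calls runtime.Gosched periodically, which
// gives the scheduler an explicit point at which to stop the goroutine.
func spin(n uint64, cooperative bool) uint64 {
	var x uint64
	for i := uint64(0); i < n; i++ {
		x += i * i
		if cooperative && i%yieldEvery == 0 {
			runtime.Gosched()
		}
	}
	return x
}

func main() {
	// Both variants compute the same value; only their scheduling
	// behaviour differs.
	fmt.Println(spin(1<<22, true) == spin(1<<22, false))
}
```

The yield interval trades throughput for responsiveness: a smaller yieldEvery lets a pending stop-the-world proceed sooner, at the cost of more scheduler calls inside the hot loop.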
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
Ran a simple program which launches a configurable number of goroutines, all of which do CPU-intensive work in a tight loop. It also imports net/http/pprof and starts a server to expose the debugging endpoint.
Try setting the concurrency argument lower than what runtime.NumCPU() reports on your system. By default it is equal to runtime.NumCPU().
What did you expect to see?
A profile after 3ish seconds.
What did you see instead?
A few different things:
On OSX, I see the connection refused message if -concurrency is equal to the number of hardware threads available, and otherwise it works.
On a 32-core Linux VM in Azure, I've seen both the timeout and connection refused messages at concurrencies ranging from 2 to 32. At concurrency 1, things work as expected. At concurrency 2, I've seen it work, but it usually doesn't. It has failed consistently at all higher concurrencies in my testing.
I can understand why, if I have 8 cores and I'm running 8 goroutines all in tight loops, nothing else might get to run, and so I wouldn't be able to connect to the debug endpoint. But why does this happen if I'm only running 3 or 4 goroutines on a 32-core machine?