cmd/pprof: account CPU and allocations of goroutines to the frame where they are created #32223
“The frame that created the goroutine” is not always the right one for accounting. Every goroutine traces back to either `init` or `main`, so attributing everything to those roots would be useless. That said, we could probably do some sort of attribution using profiler labels.
Can you elaborate on why you think that would be useless? That's exactly what I would like to have: if a path through the call graph rooted in `main` created a goroutine, I would like the CPU time (or memory) consumed by that goroutine to be accounted to `main`, as if it were a normal child "routine" instead of a goroutine. Consider that right now all CPU/memory (including that used by goroutines) is already accounted as a child of an "implicit" root node; this, in my mind, doesn't make the current visualization useless.
Most user-created goroutines will be rooted from `main`. How about other tools, such as tagging the CPU profile with profiler labels? Keeping track of every goroutine-creation call stack may not be cheap, and we need to balance the profiling cost against the usefulness of the profile.
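A minimal sketch of the label tagging suggested above, using `pprof.Do` from `runtime/pprof`; the `handler` label key and the `handleRequest`/`serve` names are illustrative, not from this thread:

```go
package main

import (
	"context"
	"log"
	"net/http"
	"runtime/pprof"
)

// handleRequest wraps the real work in pprof.Do so that every CPU sample
// taken while the request is being served carries a "handler" label.
func handleRequest(w http.ResponseWriter, r *http.Request) {
	pprof.Do(r.Context(), pprof.Labels("handler", r.URL.Path), func(ctx context.Context) {
		serve(ctx, w) // samples here (and in goroutines spawned here) are labeled
	})
}

// serve stands in for the actual request work.
func serve(ctx context.Context, w http.ResponseWriter) {
	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/", handleRequest)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Goroutines spawned inside the `Do` callback inherit the label set, which is what makes this a partial answer to the attribution problem discussed here.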
Agreed that most goroutines will be rooted in either init or main, but I already addressed that:
The reason for the current choice is obviously technical (it's cheaper to root everything in an implicit root node than to keep track of the full stacks of the transitively spawning Gs), but if the argument is that rooting everything in a small subset of roots makes no sense, that argument does not seem very compelling to me: rooting everything in a single, arbitrary, implicit root is exactly what we already do.
Sure. Consider some sort of server that idiomatically spawns one goroutine for each request, and that independently does some background processing. Without the proposed visualization, there is no intuitive way to account the resources consumed by the goroutines to the server part (as opposed to the background-processing part). You may argue that in such a simple case you could easily see that the resources consumed by the goroutines can only belong to the server. The obvious counterpoint is that it's not always this easy in the real world:
There are many more potential scenarios: the two above are things I actually struggle with daily.
That is exactly what profiler labels are designed for. Currently only CPU profiles support labels, and #23458 is the tracking issue for expanding label support to memory allocation profiles.
Change https://golang.org/cl/189317 mentions this issue: |
Document goroutine label inheritance.

Goroutine labels are copied upon goroutine creation and there is a test enforcing this, but it was not mentioned in the docstrings for `Do` or `SetGoroutineLabels`. Add notes to both of those functions' docstrings so it's clear that one does not need to set labels as soon as a new goroutine is spawned if they want to propagate tags.

Updates #32223
Updates #23458

Change-Id: Idfa33031af0104b884b03ca855ac82b98500c8b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/189317
Reviewed-by: Ian Lance Taylor <iant@golang.org>
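A runnable sketch of the inheritance behavior the CL documents; the `origin` label key and the synthetic `spin` workload are assumptions made for illustration:

```go
package main

import (
	"context"
	"log"
	"os"
	"runtime/pprof"
	"sync"
)

// spin burns CPU so the profile has samples to attribute.
func spin() {
	x := 0
	for i := 0; i < 1e8; i++ {
		x += i
	}
	_ = x
}

func main() {
	f, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	pprof.Do(context.Background(), pprof.Labels("origin", "worker-pool"), func(ctx context.Context) {
		var wg sync.WaitGroup
		for i := 0; i < 4; i++ {
			wg.Add(1)
			// The goroutines below never set labels themselves: they inherit
			// origin=worker-pool from their creator, so their CPU samples
			// show up under that label in the profile.
			go func() {
				defer wg.Done()
				spin()
			}()
		}
		wg.Wait()
	})
}
```

Opening the result with `go tool pprof cpu.pprof` and running the `tags` command shows the workers' samples grouped under `origin=worker-pool`, even though the label was set before the goroutines existed.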
When troubleshooting CPU usage of production services it would be useful to have an option, at least in the flamegraph visualization, to account the CPU time and memory allocations of a goroutine to the frame that created the goroutine.
Currently, the way I do this is to take a CPU or memory profile and then go through the code to reconstruct where goroutines were created, so that I can identify the full stack trace that led to the excessive CPU or memory usage.
The way I imagine this could work in the flamegraph would be to consider stack traces to include not just the stack of the goroutine, but also the transitive stacks of the goroutines that created the current goroutine (up to a maximum limit that, if reached, would cause the option to be disabled).
Currently, AFAIK, this would be hard to do as described, as we only record the PC of where the goroutine is created. I am not knowledgeable enough to know whether there are other ways to do (now, or in the future) what I described above; if such a way existed, it would make profiling much more effective and easier to use when dealing with large codebases that are `go`-happy.