cmd/pprof: account CPU and allocations of goroutines to the frame where they are created #32223

CAFxX · 2019-05-24T03:44:22Z

When troubleshooting CPU usage of production services it would be useful to have an option, at least in the flamegraph visualization, to account the CPU time and memory allocations of a goroutine to the frame that created the goroutine.

Currently, the way I do this is to take a CPU or memory profile, and then go through the code to reconstruct where goroutines were created, so that I can then proceed to identify the full stack trace that led to excessive CPU or memory usage.

The way I imagine this could work in the flamegraph would be by considering stack traces to include not just the stack of the goroutine, but also the transitive stacks of the goroutines that created the current goroutine (up to a maximum limit that, if reached, would cause the option to be disabled).

Currently AFAIK this would be hard to do as described as we only record the PC of where the goroutine is created. I am not knowledgeable enough to know if there are some other ways to do (now, or in the future) what I described above; if such a way existed it would make profiling much more effective and easier to use when dealing with large codebases that are go-happy.


bcmills · 2019-05-24T12:49:21Z

“The frame that created the goroutine” is not always the right one for accounting. Every goroutine traces back to either main or init, so if you apply that transitively then it becomes useless, and if it isn't transitive then it becomes very confusing.

That said, we could probably do some sort of attribution using runtime/trace regions, if we don't already (CC @hyangah @dvyukov).

CAFxX · 2019-05-24T23:14:32Z

> Every goroutine traces back to either main or init, so if you apply that transitively then it becomes useless

Can you elaborate on why you think that would be useless? That's exactly what I would like to have: if a path through the call graph rooted in main created a goroutine, I would like the CPU time (or memory) consumed by that goroutine to be accounted to main, as if it were a normal child "routine" instead of a goroutine.

Consider that right now all CPU/mem (including the ones used by goroutines) is already accounted as a child of an "implicit" root node... this, in my mind, doesn't make the current visualization useless.

hyangah · 2019-05-28T15:56:48Z

> That's exactly what I would like to have: if a path through the call graph rooted in main created a goroutine, I would like the CPU time (or memory) consumed by that goroutine to be accounted to main, as if it were a normal child "routine" instead of a goroutine.

Most user-created goroutines will be rooted in init or main, and it's hard for me to imagine the usefulness of such an analysis. @CAFxX do you have a specific example that demonstrates such profiling and analysis was useful?

How about other tools such as tagging the cpu profile with the runtime/pprof.Label, or some static code analysis tools?

Keeping track of all the goroutine creation call stacks may not be cheap, and we need to balance the profiling cost against the usefulness of the profile.

CAFxX · 2019-05-29T03:41:47Z

> Most user-created goroutines will be rooted in init or main, and it's hard for me to imagine the usefulness of such an analysis.

Agreed that most goroutines will be rooted in either init or main, but I already addressed that:

> Consider that right now all CPU/mem (including the ones used by goroutines) is already accounted as a child of an "implicit" root node... this, in my mind, doesn't make the current visualization useless.

The reason for the current choice is obviously technical (as it's cheaper to root things in an implicit root node, rather than keeping track of the full stacks of the transitively-spawning Gs), but if the argument is that rooting everything to a small subset of roots makes no sense, then the argument does not seem to me to be very compelling, as that's what we already do (by rooting everything in a single, arbitrary, implicit root).

> @CAFxX do you have a specific example that demonstrates such profiling and analysis was useful?

Sure. Consider some sort of server that idiomatically spawns one goroutine for each request, and that independently does some background processing.

Without the proposed visualization, there is no intuitive way to account the resources consumed by the goroutines to the server part (vs. the background processing part).

You may argue that in such a simple case you would easily see that the resources consumed by the goroutine can only belong to the server. The obvious counterpoint is that it's not always easy, in the real world:

- you can have multiple listeners (e.g. for gRPC and HTTP, or gRPC and pubsub), that (after API adaptation) spawn goroutines running the same code: in this case it's impossible to know how to split the resource consumption between the listeners
- the request goroutine can itself spawn one or more goroutines, e.g. to send parallel subrequests, or to handle timeouts, or to do async on-demand processing: in this case you would have multiple goroutine stacks all rooted on the implicit root node, with no easy way to reconstruct the call graph that triggered the resource usage

There are many more potential scenarios: the two above are things I actually struggle daily with.

hyangah · 2019-05-29T15:04:24Z

> Sure. Consider some sort of server that idiomatically spawns one goroutine for each request, and that independently does some background processing.
>
> Without the proposed visualization, there is no intuitive way to account the resources consumed by the goroutines to the server part (vs. the background processing part).
>
> You may argue that in such a simple case you would easily see that the resources consumed by the goroutine can only belong to the server. The obvious counterpoint is that it's not always easy, in the real world:
>
> - you can have multiple listeners (e.g. for gRPC and HTTP, or gRPC and pubsub), that (after API adaptation) spawn goroutines running the same code: in this case it's impossible to know how to split the resource consumption between the listeners
> - the request goroutine can itself spawn one or more goroutines, e.g. to send parallel subrequests, or to handle timeouts, or to do async on-demand processing: in this case you would have multiple goroutine stacks all rooted on the implicit root node, with no easy way to reconstruct the call graph that triggered the resource usage
>
> There are many more potential scenarios: the two above are things I actually struggle daily with.

That is exactly what runtime/pprof.Labels and the related APIs were designed for. It requires explicit labeling, but it provides more flexibility than classifying the profiles based on which frame created the goroutine. Also, labels are propagated to child goroutines. There are blog posts and tutorials on the web (https://rakyll.org/profiler-labels/, etc). Tracing and profiling libraries such as OpenCensus support the labels, which opens up the possibility of profiling across distributed processes. The pprof tool offers options to filter and focus based on the labels (in pprof terminology they are called tags; see options such as -tagfocus and -taghide).

Currently only CPU profiles support labels and #23458 is a tracking issue to expand the label support to memory allocation profiles.

gopherbot · 2019-08-07T12:58:46Z

Change https://golang.org/cl/189317 mentions this issue: runtime/pprof: Mention goroutine label heritability

Document goroutine label inheritance. Goroutine labels are copied upon goroutine creation and there is a test enforcing this, but it was not mentioned in the docstrings for `Do` or `SetGoroutineLabels`. Add notes to both of those functions' docstrings so it's clear that one does not need to set labels as soon as a new goroutine is spawned if they want to propagate tags.

Updates #32223
Updates #23458

Change-Id: Idfa33031af0104b884b03ca855ac82b98500c8b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/189317
Reviewed-by: Ian Lance Taylor <iant@golang.org>
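The inheritance described in that change can be observed with a small experiment, sketched here under two assumptions: the Go release is recent enough (1.15 or later) that the goroutine profile includes labels, and the helper name `childLabelsInProfile` and the label used are invented for the example. The child goroutine never sets labels itself, and the profile is taken after `pprof.Do` has restored the parent's labels, so any occurrence of the value has to come from the child.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"runtime/pprof"
	"strings"
)

// childLabelsInProfile starts a goroutine from inside pprof.Do and
// reports whether the given label value shows up in a text goroutine
// profile taken while that child is still blocked.
func childLabelsInProfile(key, value string) bool {
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})

	pprof.Do(context.Background(), pprof.Labels(key, value), func(context.Context) {
		// The child does not call pprof.Do or SetGoroutineLabels;
		// it only has whatever labels it received at creation.
		go func() {
			defer close(done)
			close(started)
			<-release
		}()
	})
	// pprof.Do has returned, so the parent no longer carries the label.
	<-started

	var buf bytes.Buffer
	err := pprof.Lookup("goroutine").WriteTo(&buf, 1) // debug=1: text form
	close(release)
	<-done
	return err == nil && strings.Contains(buf.String(), value)
}

func main() {
	fmt.Println("child inherited label:", childLabelsInProfile("origin", "lbl_7f3a9c"))
}
```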

CAFxX changed the title ~~cmd/pprof: proportionally account CPU and allocations of goroutines to frame where they are created~~ cmd/pprof: account CPU and allocations of goroutines to the frame where they are created May 24, 2019

bcmills added FeatureRequest NeedsInvestigation labels May 24, 2019

bcmills added this to the Unplanned milestone May 24, 2019

gopherbot added the compiler/runtime label Jul 7, 2022

mknyszek added this to Go Compiler / Runtime Jul 7, 2022

mknyszek removed this from Go Compiler / Runtime Jul 13, 2022
