proposal: runtime/pprof: add SetMaxDumpGoroutine to limit the number of goroutines dumped in a profile #50794

Closed
doujiang24 opened this issue Jan 25, 2022 · 29 comments

Comments

@doujiang24
Contributor

We are using pprof for online profiling, and we found that dumping goroutines can take too much time when there are many goroutines.

In this test it takes ~300ms when there are 100k goroutines on my side:
https://gist.github.com/doujiang24/9d066bce3a2bdd0f1b9fe1ef49699e4e

That is too long for us, since our system commonly runs 10k-100k goroutines.
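
For reference, a minimal sketch of the kind of measurement behind that number (the goroutine count and the blocked-goroutine setup below are illustrative, not taken from the gist):

```go
package main

import (
	"fmt"
	"io"
	"runtime/pprof"
	"sync"
	"time"
)

func main() {
	// Start a large number of blocked goroutines to make the dump expensive.
	const n = 100000
	var wg sync.WaitGroup
	block := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-block
		}()
	}

	// Time a full goroutine dump; debug=1 produces the aggregated text format.
	start := time.Now()
	if err := pprof.Lookup("goroutine").WriteTo(io.Discard, 1); err != nil {
		fmt.Println("dump failed:", err)
	}
	fmt.Printf("dumping %d goroutines took %v\n", n, time.Since(start))

	close(block)
	wg.Wait()
}
```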

I think the simplest approach is to introduce a new API, SetMaxDumpGoroutineNum.
I have implemented it in this PR: #50771

However, the SetMaxDumpGoroutineNum API introduces a new global variable,
which may not be a good idea.

Any feedback would be greatly appreciated. Thank you!

@aclements
Member

Could you explain your use case more? Usually "online profiling" is done using a CPU profile, not a goroutine profile.

@ianlancetaylor ianlancetaylor changed the title pprof: limit the max goroutine number for dumping. proposal: runtime/pprof: add SetMaxDumpGoroutine to limit the number of goroutines dumped in a profile Jan 25, 2022
@gopherbot gopherbot added this to the Proposal milestone Jan 25, 2022
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jan 25, 2022
@rsc
Contributor

rsc commented Feb 2, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Feb 2, 2022
@rhysh
Contributor

rhysh commented Feb 2, 2022

Goroutine profiles stop the world, iterate over the entire list of live goroutines, and then restart the world. That whole-app pause can be hundreds of milliseconds when there are hundreds of thousands of goroutines.

@doujiang24 , is the problem for your apps the length of time that the app cannot do any other useful work? If so, it sounds similar to what's described in #33250.

Or is it the length of time that the specific goroutine calling runtime/pprof.Lookup("goroutine").WriteTo needs to wait before it gets results?


@aclements , I've found regular collection of goroutine profiles to be one of the best tools (combination of effectiveness and ease-of-use) for tracking down problems of "why did my app get slow", including ones like the net/http deadlock in issue 32388. When I encountered that one, it affected a single service instance and self-resolved after about 15 minutes, so we were only able to understand it thanks to regular and automated collection of goroutine profiles. See also fgprof. Though @doujiang24 likely has other uses.

@doujiang24
Contributor Author

Could you explain your use case more? Usually "online profiling" is done using a CPU profile, not a goroutine profile.

@aclements Sorry for the delay.
We want to know what the goroutines are doing or waiting for when there are too many of them,
so that we have a chance to reduce the number of goroutines.
Sampling is good enough for us since we don't need to see every goroutine.

@doujiang24
Contributor Author

@rhysh Yeah, it's similar to #33250.
We use Go to build a network proxy platform (MOSN), mostly used as a service mesh data plane.
So latency is very important to us.

@doujiang24
Contributor Author

I came up with a new idea for the API.
We could introduce WithMaxGoroutine instead of SetMaxDumpGoroutine to avoid introducing a global variable.

usage:

pprof.Lookup("goroutine").WithMaxGoroutine(1024).WriteTo(...)
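
For illustration only, a user-side sketch of the call shape this proposal has in mind. WithMaxGoroutine does not exist in runtime/pprof, and a wrapper outside the runtime cannot actually limit how many goroutines are collected; the names below are hypothetical:

```go
package main

import (
	"io"
	"os"
	"runtime/pprof"
)

// limitedProfile mimics the proposed chained API; the limit is only
// carried along here, since enforcing it would require runtime support.
type limitedProfile struct {
	p   *pprof.Profile
	max int
}

func withMaxGoroutine(p *pprof.Profile, max int) *limitedProfile {
	return &limitedProfile{p: p, max: max}
}

func (lp *limitedProfile) WriteTo(w io.Writer, debug int) error {
	// The real proposal would pass lp.max into the runtime so that only a
	// sample of goroutines is recorded; here we can only write everything.
	return lp.p.WriteTo(w, debug)
}

func main() {
	_ = withMaxGoroutine(pprof.Lookup("goroutine"), 1024).WriteTo(os.Stdout, 1)
}
```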

@rsc
Contributor

rsc commented Feb 9, 2022

The big question is whether the stop-the-world is the problem versus the size of the profile. If the former, then writing less data won't take appreciably less time.

@doujiang24
Contributor Author

The big question is whether the stop-the-world is the problem versus the size of the profile. If the former, then writing less data won't take appreciably less time.

@rsc Thanks for pointing that out. I did more testing, and here are the results.

Writing less data does help in the previous case that takes ~300ms total (https://gist.github.com/doujiang24/9d066bce3a2bdd0f1b9fe1ef49699e4e).

I also gathered more detailed timings by adding extra logging in pprof.go.
They show roughly a 10x speedup from writing less data.

With 100k goroutines and no goroutine limit (the standard behavior):

  1. fetching the profile (runtime_goroutineProfileWithLabels, including the STW) takes ~130ms;
  2. the whole pprof goroutine dump takes ~300ms (including printCountProfile).

With 100k goroutines and a limit of 1k goroutines, using PR #50771:

  1. fetching the profile (runtime_goroutineProfileWithLabelsAndLimit) takes ~15ms;
  2. the whole pprof goroutine dump takes ~18ms.

STW may take longer in some other cases, but I think this shows that writing less data really does take appreciably less time.

@rsc
Contributor

rsc commented Feb 16, 2022

PR 50771 seems to be doing a random sample of the goroutines. Is that helpful in your use case, even though you don't see all of them?
Then there's the question of the API.

/cc @aclements @mknyszek @prattmic for thoughts

@aclements
Member

I suspect we could eliminate the STW here at the cost of losing "snapshot consistency" across the goroutine stacks.

Just looking over the current implementation, we could certainly improve the performance of this, though I doubt we can squeeze 10x out of it short of substantially redesigning how traceback works (e.g., using frame pointers).

@mknyszek
Contributor

mknyszek commented Feb 16, 2022

I think it's still not clear to me where exactly the problem lies for your use case, @doujiang24. Is the problem the 300 ms delay between when you request the profile and when you get it, or the 130 ms (or however long the world is actually stopped), since that is a global latency hit?

If it's the latter, one thing we could do instead of limiting the number of goroutines is relax the consistency of the goroutine dump. Depending on how strict we want to be on backwards compatibility with this, this might not require a new API. If the goal is statistical goroutine sampling anyway, then I think that's fine. If a major use-case is getting a consistent snapshot that always 100% makes sense, then either this isn't really an option or it needs a knob.

I imagine that the global latency impact of this approach would be the same latency impact as stack scanning by the GC, which is definitely much, much less than 130 ms even for 100,000 goroutines. However, this would not help the end-to-end latency of the operation; it would still take 300 ms for that many goroutines, limiting how often you can sample.

@prattmic
Member

Here are some of my initial thoughts; I haven't thought too deeply about this.

Currently, the goroutine profile is a complete instantaneous (one-off) view of what every goroutine is doing. Since it involves a stop-the-world (STW), we even get a consistent view of where every goroutine is at a single point in time [1] [2].

On the other hand, this proposal effectively turns this profile into an instantaneous (one-off) sampled view of what goroutines are doing, since we don't capture every goroutine. It would still maintain the consistent view due to STW.

My question for brainstorming is whether this is the best interface for use-cases of this profile? Is sampling OK, or do we typically need all goroutines? Is a consistent snapshot time needed? If so, maybe this API is best.

But I could also imagine that we could provide a continuous sampling goroutine profile. Rather than a single instantaneous profile point, we would sample goroutine state over time, eventually capturing all of them but never all at the same time. Would such a profile be useful? I'm not sure.

[1] As opposed to collecting traces while goroutines are running, which would result in each goroutine getting traced at a slightly different time of execution. In this case it would be possible to observe what appear to be "impossible" states between 2 goroutines.

[2] Modulo that some small bits of code are not preemptible and thus cannot be observed in the profile, but I don't think this matters much.

@ianlancetaylor
Contributor

It's worth asking what the purpose of the goroutine profile is. It's not particularly helpful for analyzing performance or memory usage. Presumably it is helpful for finding leaked goroutines: goroutines that are waiting for something to happen that is never going to happen. Are there other uses for this profile?

If that is the main use, can we have a more targeted goroutine profile? Such as one that only lists goroutines that have been blocked for some period of time?
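
As a rough illustration of that idea (not a proposed API), a debug=2 goroutine dump can already be post-processed to keep only goroutines the runtime reports as blocked for a while; the threshold and the parsing below are assumptions:

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

// Headers in a debug=2 dump look like "goroutine 42 [chan receive, 6 minutes]:".
// The runtime only annotates waits of roughly a minute or more.
var waitRE = regexp.MustCompile(`,\s*(\d+)\s*minutes?\]`)

func main() {
	const minMinutes = 5 // arbitrary threshold for "blocked for a while"

	data, err := os.ReadFile(os.Args[1]) // a dump from Lookup("goroutine").WriteTo(f, 2)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Each goroutine is one block, separated from the next by a blank line.
	for _, block := range strings.Split(strings.TrimSpace(string(data)), "\n\n") {
		m := waitRE.FindStringSubmatch(block)
		if m == nil {
			continue
		}
		if n, _ := strconv.Atoi(m[1]); n >= minMinutes {
			fmt.Println(block)
			fmt.Println()
		}
	}
}
```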

@prattmic
Member

I've found regular collection of goroutine profiles to be one of the best tools (combination of effectiveness and ease-of-use) for tracking down problems of "why did my app get slow"

@rhysh Could you expand more on this comment? How was this helpful? I'm imagining noticing some goroutines that, while not deadlocked, tend to be blocked or temporarily stopped somewhere we don't expect to be hot, but I'd like to hear if that is what you did.

@rhysh
Contributor

rhysh commented Feb 16, 2022

A common workflow is to collect hundreds or thousands of goroutine profiles (from every instance of an app, every few minutes, for some time range), to get a count in each profile of the number of goroutines with a function matching ServeHTTP on their stack, and to look at what's different in the profiles that score the highest.
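
As an illustration, a rough sketch of how that per-profile count could be scripted over debug=1 goroutine profiles (the parsing and file handling below are assumptions, not part of the workflow described above):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// countServeHTTP parses the text format written by
// pprof.Lookup("goroutine").WriteTo(w, 1), where identical stacks are
// grouped into buckets headed by "<count> @ <pcs...>", and sums the
// counts of buckets that have ServeHTTP somewhere on the stack.
func countServeHTTP(data string) int {
	total, cur, matched := 0, 0, false
	flush := func() {
		if matched {
			total += cur
		}
		cur, matched = 0, false
	}
	for _, line := range strings.Split(data, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[1] == "@" {
			flush() // new bucket header
			if n, err := strconv.Atoi(fields[0]); err == nil {
				cur = n
			}
			continue
		}
		if strings.HasPrefix(line, "#") && strings.Contains(line, "ServeHTTP") {
			matched = true
		}
	}
	flush()
	return total
}

func main() {
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: %d goroutines with ServeHTTP on the stack\n", path, countServeHTTP(string(data)))
	}
}
```

Running something like this over profiles from many instances and time windows makes the highest-scoring profiles easy to spot.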

Goroutine profiles were important in tracking down #33747, where a large number of waiters on a single lock would lead to performance collapse.

A team I work with encountered the effects of #32388 last October. We used goroutine profiles to figure out what was going on there, and then discovered that someone had already reported the issue (and they had used goroutine profiles too).

Sometimes the reason an app slows down is that its log collection daemon is running behind, and (with a blocking log library) its calls to log end up waiting on the logger's sync.Mutex for a little while.

None of those are exactly leaks, because the thing those goroutines are waiting for does eventually happen. In the first and third cases, if load on the application decreases then the goroutines will each get a turn with the sync.Mutex. In the second case the OS closes the TCP connection after 15 minutes or so (sometimes faster).

We also used them to debug the problem that led to aws/aws-sdk-go#3127, where a sync.Mutex protected a cache that was only populated on success, so consistent failures led to 100% cache misses which were then serialized.

Around Go 1.13, an app I support with a typical response time of 1ms had occasional delays of 100ms. I used goroutine profiles to learn that those requests were delayed in particular functions by calls to runtime.morestack{,_noctxt}. A very close look with the CPU profiler (with request id added as a profile label, filtering to request IDs that showed up more than once) showed contention in runtime.(*mcentral).cacheSpan, which was fixed by then in CL 221182 (thanks @mknyszek !).

It's a very general tool, very easy to deploy (vs the opt-in of block/mutex profiles), very easy to use (vs execution trace).

@doujiang24
Contributor Author

Thanks to @rhysh for the use cases.
In our system, we monitor the goroutine count online regularly and dump a goroutine profile when:

  1. the count increases too much suddenly (it may decrease again after some time), or
  2. the count grows to a very large number.
    (We haven't enabled regular dumps since we want to reduce the overhead of dumping.)

The first case is usually caused by some unexpected issue, e.g. network blocking or an unexpected lock in some corner case.
The second case usually means a leak; we can usually find those before going online, but there may still be corner cases that only happen online.
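
A minimal sketch of that kind of trigger, assuming made-up thresholds, check interval, and file naming (not MOSN's actual code):

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// monitorGoroutines dumps a goroutine profile when the goroutine count
// jumps by spikeDelta between checks or grows past absLimit.
func monitorGoroutines(absLimit, spikeDelta int, interval time.Duration) {
	prev := runtime.NumGoroutine()
	for range time.Tick(interval) {
		cur := runtime.NumGoroutine()
		if cur-prev >= spikeDelta || cur >= absLimit {
			f, err := os.Create(fmt.Sprintf("goroutine-%d.pb.gz", time.Now().Unix()))
			if err == nil {
				_ = pprof.Lookup("goroutine").WriteTo(f, 0) // debug=0: gzipped proto format
				f.Close()
			}
		}
		prev = cur
	}
}

func main() {
	go monitorGoroutines(100000, 10000, 10*time.Second)
	select {} // stand-in for the real workload
}
```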

Actually, we care about two parts of the overhead:

  1. fetching the profile (mostly the STW time), ~130ms in the test case;
  2. the remaining pure formatting and dumping work (printCountProfile), the other ~170ms.

The STW time is the more important part for us,
but we also want to reduce the second part, since ~170ms of pure CPU time is also a bit heavy for an online network proxy system and not very meaningful for us. We hope for less profiling overhead.


@rsc Yeah, we mostly want to know what the goroutines are doing or waiting for. I think random sampling is a good choice, just like the sampling in the CPU or memory profiles.


I suspect we could eliminate the STW here at the cost of losing "snapshot consistency" across the goroutine stacks.

@aclements @mknyszek I think this is a good option for our use case. I'm glad to see it can be implemented, but there is another question: do we just break backward compatibility, or introduce another new API to control it?
I also think it doesn't conflict with sampling: sampling can still reduce the total overhead even when there is no STW.


If the goal is statistical goroutine sampling anyway, then I think that's fine.

@mknyszek yes, exactly.
In our previous cases there were a large number of unexpected goroutines waiting on the same backtrace, and we don't need to know the exact count; the ratio is good enough for us.


Is sampling OK, or do we typically need all goroutines? Is a consistent snapshot time needed?

@prattmic sampling is usually OK for me, but I'm not sure whether it's a good fit for others.


If that is the main use can we have a more targeted goroutine profile? Such as, only list goroutines that have been blocked for some period of time?

@ianlancetaylor yeah, in general we are interested in the blocked goroutines when taking a goroutine profile.
But I think the goroutine profile also gives a brief, high-level view of what the goroutines of a system are doing or waiting for, which helps us understand the system better and find ways to optimize it from a global view (a bit like the CPU profile, but not the same thing).

Thank you, everyone!

@gopherbot

Change https://go.dev/cl/387415 mentions this issue: runtime: decrease STW pause for goroutine profile

@prattmic prattmic modified the milestones: Proposal, Backlog Feb 22, 2022
@rsc
Contributor

rsc commented Mar 2, 2022

Assuming CL 387415 lands, do we need SetMaxDumpGoroutine at all? Or should this proposal be closed? Thanks.

@doujiang24
Contributor Author

@rsc I think it's still better to have SetMaxDumpGoroutine on my side, since it would reduce the total CPU cost.
One case on my side:
there are 40k goroutines in total, and most of them (about 30k) are unexpected and share the same backtrace.
I think SetMaxDumpGoroutine (sampling with a limit) is the better choice for this case.

@rhysh Glad to see CL 387415 started. Do you think sampling with a limit conflicts with CL 387415?

@felixge
Contributor

felixge commented Mar 3, 2022

❤️ the discussion and proposed improvements, especially CL 387415 from @rhysh.

Before deciding on a SetMaxDumpGoroutine() API, I'd like to mention a related idea for extending the goroutine profiling API: If we could manage low STW overhead AND targeted goroutine profiling, it would be possible to build a goroutine tracer similar to the JavaScript wall-clock profiler in Chrome, showing stack traces on a per-goroutine timeline (flame chart) view rather than a FlameGraph.


I'm planning to submit a separate proposal and a working proof of concept (see this video for a teaser) for this soon, but the general idea would be to use pprof labels to allow users to mark a small subset of their goroutines (e.g. every Nth goroutine handling incoming user requests) as "tracer goroutines" and then sample their stack traces at 100 Hz and record timestamps + goroutine ids + stacks. The output format could be the trace event format or the protobuf format used by perfetto.
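
For context, a subset of goroutines can already be marked with pprof labels today; a minimal sketch of tagging every Nth request handler (the label key, the sampling choice, and the handler are illustrative assumptions):

```go
package main

import (
	"context"
	"log"
	"net/http"
	"runtime/pprof"
	"sync/atomic"
)

var reqCount atomic.Int64

// markEveryNth runs every nth request under a pprof label so that its
// goroutine can be picked out of goroutine/CPU profiles later.
func markEveryNth(n int64, h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if reqCount.Add(1)%n != 0 {
			h.ServeHTTP(w, r)
			return
		}
		pprof.Do(r.Context(), pprof.Labels("traced", "true"), func(ctx context.Context) {
			h.ServeHTTP(w, r.WithContext(ctx))
		})
	})
}

func main() {
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte("ok"))
	})
	http.Handle("/", markEveryNth(100, hello))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```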

Perhaps this is entirely orthogonal to SetMaxDumpGoroutine, my only direct comment for this would be that a NewGoroutineProfile + GoroutineProfile.SetMaxGoroutines + GoroutineProfile.Write API similar to what's proposed for the CPU profiler in #42502 might provide a better path for future API additions.

@rhysh
Contributor

rhysh commented Mar 3, 2022

Collecting the profile concurrently would complicate the sampling algorithm, but otherwise I think the two changes would be able to work together, @doujiang24 . (I don't have an algorithm off the top of my head for how to do that without O(samples) light work to prepare, but at least it's less than O(goroutines) of heavy work.)

I want to be clear that as far as I know, CL 387415 (at PS 3) does not compromise on the consistency of the goroutine profile (based on the test infrastructure in CL 387416). The key information that goroutine profiles provide is sometimes the presence/absence/state of a single goroutine. In #32388 it's a single goroutine holding up the http connection pool because of a slow network write (and 10k goroutines waiting). In #33747 it's the mysterious absence of any goroutine actually holding the lock (and 10k goroutines waiting). So I'm not convinced that sampling the goroutines is going to give good results for its users. An additional field on an extensible API like @felixge mentioned seems like a decent middleground.

As for the CPU overhead, I'm not sure how to tell how much is too much. Here's a possible framework: the machines I use often provide CPU and memory allocations in a particular ratio, such as 2 GB of memory per 1 CPU thread. Filling the memory with 2 kB goroutine stacks allows 1e6 goroutines. Collecting a goroutine profile for all of those takes about 2 µs each, for a total of 2 seconds of CPU time. Given that a goroutine profile every five minutes then costs less than 1% of the CPU (2 s per 300 s is roughly 0.7%), how often does an app need to collect a goroutine profile before the collection takes an unacceptable fraction of the compute resources? Faster is nice, but maybe it's already fast enough. @doujiang24 , what framework and target do you have in mind for "We hope less overhead for profiling."?

@rsc
Contributor

rsc commented Mar 9, 2022

It sounds like there are enough available performance improvements (some of which are already being implemented) that we should hold off on adding any new API right now, which we would be stuck with forever. Do I have that right?

@rhysh
Contributor

rhysh commented Mar 9, 2022

we should hold off on adding any new API right now

That's my view, yes.

@doujiang24
Contributor Author

@rhysh Sorry for the delay.
Yeah, using a global variable is not a good idea; WithMaxGoroutine should be better:
#50794 (comment)

Also, sampling is not a good idea for all cases, but it is really useful in some cases.
We just want an option, not to force sampling always.

  1. We implemented a continuous profiling library based on pprof to capture CPU/memory/goroutine spikes (at the seconds level), and we don't want the profiling itself to introduce spikes.
  2. 1% CPU time over five minutes should be okay in our case, and I think it is okay for most cases.
  3. But we care about CPU spikes: since we (MOSN) are a network proxy, a CPU spike always leads to a latency spike, which matters more to us.

Yeah, CL 387415 reducing the STW really helps, since the STW can introduce larger latency.
But a 100+ms CPU block on one P is still not friendly for us.

@rsc
Contributor

rsc commented Mar 16, 2022

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Mar 16, 2022
@doujiang24
Contributor Author

Okay, just to confirm clearly: is WithMaxGoroutine also not acceptable?

@ianlancetaylor
Contributor

@doujiang24 As @rsc says above, our current opinion is "we should hold off on adding any new API right now, which we would be stuck with forever." Let's try the new changes out for a while and see to what extent there is still a real problem here to solve. Thanks.

@doujiang24
Contributor Author

Okay, got it. Thank you all.

@rsc rsc moved this from Likely Decline to Declined in Proposals (old) Mar 23, 2022
@rsc
Contributor

rsc commented Mar 23, 2022

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed Mar 23, 2022
gopherbot pushed a commit that referenced this issue May 3, 2022
The goroutine profile needs to stop the world to get a consistent
snapshot of all goroutines in the app. Leaving the world stopped while
iterating over allgs leads to a pause proportional to the number of
goroutines in the app (or its high-water mark).

Instead, do only a fixed amount of bookkeeping while the world is
stopped. Install a barrier so the scheduler confirms that a goroutine
appears in the profile, with its stack recorded exactly as it was during
the stop-the-world pause, before it allows that goroutine to execute.
Iterate over allgs while the app resumes normal operations, adding each
to the profile unless they've been scheduled in the meantime (and so
have profiled themselves). Stop the world a second time to remove the
barrier and do a fixed amount of cleanup work.

This increases both the fixed overhead and per-goroutine CPU-time cost
of GoroutineProfile. It also increases the wall-clock latency of the
call to GoroutineProfile, since the scheduler may interrupt it to
execute other goroutines.

    name                                  old time/op    new time/op    delta
    GoroutineProfile/small/loaded-8         1.05ms ± 5%    4.99ms ±31%   +376.85%  (p=0.000 n=10+9)
    GoroutineProfile/sparse/loaded-8        1.04ms ± 4%    3.61ms ±27%   +246.61%  (p=0.000 n=10+10)
    GoroutineProfile/large/loaded-8         7.69ms ±17%   20.35ms ± 4%   +164.50%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle              958µs ± 0%    1820µs ±23%    +89.91%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8          1.00ms ± 3%    1.52ms ±17%    +51.18%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8           1.01ms ± 4%    1.47ms ± 7%    +45.28%  (p=0.000 n=9+9)
    GoroutineProfile/sparse/idle             980µs ± 1%    1403µs ± 2%    +43.22%  (p=0.000 n=9+10)
    GoroutineProfile/large/idle-8           7.19ms ± 8%    8.43ms ±21%    +17.22%  (p=0.011 n=10+10)
    PingPongHog                              511ns ± 8%     585ns ± 9%    +14.39%  (p=0.000 n=10+10)
    GoroutineProfile/large/idle             6.71ms ± 0%    7.58ms ± 3%    +13.08%  (p=0.000 n=8+10)
    PingPongHog-8                            469ns ± 8%     509ns ±12%     +8.62%  (p=0.010 n=9+10)
    WakeupParallelSyscall/5µs                216µs ± 4%     229µs ± 3%     +6.06%  (p=0.000 n=10+9)
    WakeupParallelSyscall/5µs-8              147µs ± 1%     149µs ± 2%     +1.12%  (p=0.009 n=10+10)
    WakeupParallelSyscall/2µs-8              140µs ± 0%     142µs ± 1%     +1.11%  (p=0.001 n=10+9)
    WakeupParallelSyscall/50µs-8             236µs ± 0%     238µs ± 1%     +1.08%  (p=0.000 n=9+10)
    WakeupParallelSyscall/1µs-8              138µs ± 0%     140µs ± 1%     +1.05%  (p=0.013 n=10+9)
    Matmult                                 8.52ns ± 1%    8.61ns ± 0%     +0.98%  (p=0.002 n=10+8)
    WakeupParallelSyscall/10µs-8             157µs ± 1%     158µs ± 1%     +0.58%  (p=0.003 n=10+8)
    CreateGoroutinesSingle-8                 328ns ± 0%     330ns ± 1%     +0.57%  (p=0.000 n=9+9)
    WakeupParallelSpinning/100µs-8           343µs ± 0%     344µs ± 1%     +0.30%  (p=0.015 n=8+8)
    WakeupParallelSyscall/20µs-8             178µs ± 0%     178µs ± 0%     +0.18%  (p=0.043 n=10+9)
    StackGrowthDeep-8                       22.8µs ± 0%    22.9µs ± 0%     +0.12%  (p=0.006 n=10+10)
    StackGrowth                             1.06µs ± 0%    1.06µs ± 0%     +0.09%  (p=0.000 n=8+9)
    WakeupParallelSpinning/0s               10.7µs ± 0%    10.7µs ± 0%     +0.08%  (p=0.000 n=9+9)
    WakeupParallelSpinning/5µs              30.7µs ± 0%    30.7µs ± 0%     +0.04%  (p=0.000 n=10+10)
    WakeupParallelSpinning/100µs             411µs ± 0%     411µs ± 0%     +0.03%  (p=0.000 n=10+9)
    WakeupParallelSpinning/2µs              18.7µs ± 0%    18.7µs ± 0%     +0.02%  (p=0.026 n=10+10)
    WakeupParallelSpinning/20µs-8           93.0µs ± 0%    93.0µs ± 0%     +0.01%  (p=0.021 n=9+10)
    StackGrowth-8                            216ns ± 0%     216ns ± 0%       ~     (p=0.209 n=10+10)
    CreateGoroutinesParallel-8              49.5ns ± 2%    49.3ns ± 1%       ~     (p=0.591 n=10+10)
    CreateGoroutinesSingle                   699ns ±20%     748ns ±19%       ~     (p=0.353 n=10+10)
    WakeupParallelSpinning/0s-8             15.9µs ± 2%    16.0µs ± 3%       ~     (p=0.315 n=10+10)
    WakeupParallelSpinning/1µs              14.6µs ± 0%    14.6µs ± 0%       ~     (p=0.513 n=10+10)
    WakeupParallelSpinning/2µs-8            24.2µs ± 3%    24.1µs ± 2%       ~     (p=0.971 n=10+10)
    WakeupParallelSpinning/10µs             50.7µs ± 0%    50.7µs ± 0%       ~     (p=0.101 n=10+10)
    WakeupParallelSpinning/20µs             90.7µs ± 0%    90.7µs ± 0%       ~     (p=0.898 n=10+10)
    WakeupParallelSpinning/50µs              211µs ± 0%     211µs ± 0%       ~     (p=0.382 n=10+10)
    WakeupParallelSyscall/0s-8               137µs ± 1%     138µs ± 0%       ~     (p=0.075 n=10+10)
    WakeupParallelSyscall/1µs                216µs ± 1%     219µs ± 3%       ~     (p=0.065 n=10+9)
    WakeupParallelSyscall/2µs                214µs ± 7%     219µs ± 1%       ~     (p=0.101 n=10+8)
    WakeupParallelSyscall/50µs               317µs ± 5%     326µs ± 4%       ~     (p=0.123 n=10+10)
    WakeupParallelSyscall/100µs              450µs ± 9%     459µs ± 8%       ~     (p=0.247 n=10+10)
    WakeupParallelSyscall/100µs-8            337µs ± 0%     338µs ± 1%       ~     (p=0.089 n=10+10)
    WakeupParallelSpinning/5µs-8            32.2µs ± 0%    32.2µs ± 0%     -0.05%  (p=0.026 n=9+10)
    WakeupParallelSpinning/50µs-8            216µs ± 0%     216µs ± 0%     -0.12%  (p=0.004 n=10+10)
    WakeupParallelSpinning/1µs-8            20.6µs ± 0%    20.5µs ± 0%     -0.22%  (p=0.014 n=10+10)
    WakeupParallelSpinning/10µs-8           54.5µs ± 0%    54.2µs ± 1%     -0.57%  (p=0.000 n=10+10)
    CreateGoroutines-8                       213ns ± 1%     211ns ± 1%     -0.86%  (p=0.002 n=10+10)
    CreateGoroutinesCapture                 1.03µs ± 0%    1.02µs ± 0%     -0.91%  (p=0.000 n=10+10)
    CreateGoroutinesCapture-8               1.32µs ± 1%    1.31µs ± 1%     -1.06%  (p=0.001 n=10+9)
    CreateGoroutines                         188ns ± 0%     186ns ± 0%     -1.06%  (p=0.000 n=9+10)
    CreateGoroutinesParallel                 188ns ± 0%     186ns ± 0%     -1.27%  (p=0.000 n=8+10)
    WakeupParallelSyscall/0s                 210µs ± 3%     207µs ± 3%     -1.60%  (p=0.043 n=10+10)
    StackGrowthDeep                          121µs ± 1%     119µs ± 1%     -1.70%  (p=0.000 n=9+10)
    Matmult-8                               1.82ns ± 3%    1.78ns ± 3%     -2.16%  (p=0.020 n=10+10)
    WakeupParallelSyscall/20µs               281µs ± 3%     269µs ± 4%     -4.44%  (p=0.000 n=10+10)
    WakeupParallelSyscall/10µs               239µs ± 3%     228µs ± 9%     -4.70%  (p=0.001 n=10+10)
    GoroutineProfile/sparse-nil/idle-8       485µs ± 2%      12µs ± 4%    -97.56%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle-8        484µs ± 2%      12µs ± 1%    -97.60%  (p=0.000 n=10+7)
    GoroutineProfile/small-nil/loaded-8      487µs ± 2%      11µs ± 3%    -97.68%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/loaded-8     507µs ± 4%      11µs ± 6%    -97.78%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/idle-8        709µs ± 2%      11µs ± 4%    -98.38%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8      717µs ± 2%      11µs ± 3%    -98.43%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle         465µs ± 3%       1µs ± 1%    -99.84%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle          493µs ± 3%       1µs ± 0%    -99.85%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle          716µs ± 1%       1µs ± 2%    -99.89%  (p=0.000 n=7+10)

    name                                  old alloc/op   new alloc/op   delta
    CreateGoroutinesCapture                   144B ± 0%      144B ± 0%       ~     (all equal)
    CreateGoroutinesCapture-8                 144B ± 0%      144B ± 0%       ~     (all equal)

    name                                  old allocs/op  new allocs/op  delta
    CreateGoroutinesCapture                   5.00 ± 0%      5.00 ± 0%       ~     (all equal)
    CreateGoroutinesCapture-8                 5.00 ± 0%      5.00 ± 0%       ~     (all equal)

    name                                  old p50-ns     new p50-ns     delta
    GoroutineProfile/small/loaded-8          1.01M ± 3%     3.87M ±45%   +282.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/loaded-8         1.02M ± 3%     2.43M ±41%   +138.42%  (p=0.000 n=10+10)
    GoroutineProfile/large/loaded-8          7.43M ±16%    17.28M ± 2%   +132.43%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle               956k ± 0%     1559k ±16%    +63.03%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8            1.01M ± 3%     1.45M ± 7%    +44.31%  (p=0.000 n=10+9)
    GoroutineProfile/sparse/idle              977k ± 1%     1399k ± 2%    +43.20%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8           1.00M ± 3%     1.41M ± 3%    +40.47%  (p=0.000 n=10+10)
    GoroutineProfile/large/idle-8            6.97M ± 1%     8.41M ±25%    +20.54%  (p=0.003 n=8+10)
    GoroutineProfile/large/idle              6.71M ± 1%     7.46M ± 4%    +11.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle-8        483k ± 3%       13k ± 3%    -97.41%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/idle-8         483k ± 2%       12k ± 1%    -97.43%  (p=0.000 n=10+8)
    GoroutineProfile/small-nil/loaded-8       484k ± 3%       10k ± 2%    -97.93%  (p=0.000 n=10+8)
    GoroutineProfile/sparse-nil/loaded-8      492k ± 2%       10k ± 4%    -97.97%  (p=0.000 n=10+8)
    GoroutineProfile/large-nil/idle-8         708k ± 2%       12k ±15%    -98.36%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8       714k ± 2%       10k ± 2%    -98.60%  (p=0.000 n=10+8)
    GoroutineProfile/sparse-nil/idle          459k ± 1%        1k ± 1%    -99.85%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle           477k ± 1%        1k ± 0%    -99.85%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle           712k ± 1%        1k ± 1%    -99.90%  (p=0.000 n=7+10)

    name                                  old p90-ns     new p90-ns     delta
    GoroutineProfile/small/loaded-8          1.13M ±10%     7.49M ±35%   +562.07%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/loaded-8         1.10M ±12%     4.58M ±31%   +318.02%  (p=0.000 n=10+9)
    GoroutineProfile/large/loaded-8          8.78M ±24%    27.83M ± 2%   +217.00%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle               967k ± 0%     2909k ±50%   +200.91%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8           1.02M ± 3%     1.96M ±76%    +92.99%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8            1.07M ±17%     1.55M ±12%    +45.23%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle              992k ± 1%     1417k ± 3%    +42.79%  (p=0.000 n=9+10)
    GoroutineProfile/large/idle              6.73M ± 0%     7.99M ± 8%    +18.80%  (p=0.000 n=8+10)
    GoroutineProfile/large/idle-8            8.20M ±25%     9.18M ±25%       ~     (p=0.315 n=10+10)
    GoroutineProfile/sparse-nil/idle-8        495k ± 3%       13k ± 1%    -97.36%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/idle-8         494k ± 2%       13k ± 3%    -97.36%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/loaded-8       496k ± 2%       13k ± 1%    -97.41%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/loaded-8      544k ±11%       13k ± 1%    -97.62%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle-8         724k ± 1%       13k ± 3%    -98.20%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8       729k ± 3%       13k ± 2%    -98.23%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle          476k ± 4%        1k ± 1%    -99.85%  (p=0.000 n=9+10)
    GoroutineProfile/small-nil/idle           537k ±10%        1k ± 0%    -99.87%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle           729k ± 0%        1k ± 1%    -99.90%  (p=0.000 n=7+10)

    name                                  old p99-ns     new p99-ns     delta
    GoroutineProfile/sparse/loaded-8         1.27M ±33%    20.49M ±17%  +1514.61%  (p=0.000 n=10+10)
    GoroutineProfile/small/loaded-8          1.37M ±29%    20.48M ±23%  +1399.35%  (p=0.000 n=10+10)
    GoroutineProfile/large/loaded-8          9.76M ±23%    39.98M ±22%   +309.52%  (p=0.000 n=10+8)
    GoroutineProfile/small/idle               976k ± 1%     3367k ±55%   +244.94%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8           1.03M ± 3%     2.50M ±65%   +142.30%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8            1.17M ±34%     1.70M ±14%    +45.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle             1.02M ± 3%     1.45M ± 4%    +42.64%  (p=0.000 n=9+10)
    GoroutineProfile/large/idle              6.92M ± 2%     9.00M ± 7%    +29.98%  (p=0.000 n=8+9)
    GoroutineProfile/large/idle-8            8.74M ±23%     9.90M ±24%       ~     (p=0.190 n=10+10)
    GoroutineProfile/sparse-nil/idle-8        508k ± 4%       16k ± 2%    -96.90%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/idle-8         508k ± 4%       16k ± 3%    -96.91%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/loaded-8       542k ± 5%       15k ±15%    -97.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/loaded-8      649k ±16%       15k ±18%    -97.67%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/idle-8         738k ± 2%       16k ± 2%    -97.86%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8       765k ± 4%       15k ±17%    -98.03%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle          539k ±26%        1k ±17%    -99.84%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle           659k ±25%        1k ± 0%    -99.84%  (p=0.000 n=10+8)
    GoroutineProfile/large-nil/idle           760k ± 2%        1k ±22%    -99.88%  (p=0.000 n=9+10)

Fixes #33250
For #50794

Change-Id: I862a2bc4e991cec485f21a6fce4fca84f2c6435b
Reviewed-on: https://go-review.googlesource.com/c/go/+/387415
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
TryBot-Result: Gopher Robot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Mar 23, 2023