runtime: expose number of running/runnable goroutines #17089

fd · 2016-09-13T13:44:26Z

Summary

I'd like to propose a way to expose the number of active (running + runnable) goroutines.

Background

My primary use case for this metric is to estimate application load (num-active-goroutines / num-cpu) in order to implement load shedding. Other metrics, like the times() syscall, don't expose application overload and don't work well in the presence of noisy neighbours.

Plan

Currently the runtime package includes runtime.NumGoroutine() int which returns the number of live, non-system goroutines.

The runtime package could be extended to include runtime.NumActiveGoroutine() int. NumActiveGoroutine() should count all goroutines where isSystemGoroutine() is false and where status is _Grunnable|_Grunning|_Gsyscall.

It seems that such a function would need to acquire sched.lock and allglock. This could have some performance implications.

quentinmit · 2016-09-13T14:42:47Z

/cc @aclements

quentinmit · 2016-09-13T14:43:22Z

I'm going to tentatively mark this as a feature request for runtime, instead of a proposal, since it seems pretty uncontroversial to me.

davecheney · 2016-09-13T21:18:39Z

It doesn't seem like a very interesting number; it'll always be less than or equal to GOMAXPROCS.

ianlancetaylor · 2016-09-13T22:36:44Z

@davecheney The suggestion counts runnable goroutines, so it can be larger than GOMAXPROCS.

My concern is that I don't see that this adds anything very useful over NumGoroutine. If you are worried about shedding load then I don't see why you want to ignore the system goroutines. And there aren't very many system goroutines anyhow, so if you are in a condition where load shedding is relevant they are just going to be a rounding error.

randall77 · 2016-09-13T22:52:39Z

I think more importantly he doesn't want to count goroutines which are blocked (which NumGoroutine does count).
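To illustrate the point about blocked goroutines, here is a minimal sketch showing that runtime.NumGoroutine grows with goroutines that are parked on a channel and consume no CPU. The helper name spawnBlocked is hypothetical and only serves this example.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawnBlocked (hypothetical helper) starts n goroutines that block on a
// channel receive. It returns how much runtime.NumGoroutine grew and a
// function that releases the goroutines and waits for them to exit.
func spawnBlocked(n int) (grew int, release func()) {
	before := runtime.NumGoroutine()
	stop := make(chan struct{})
	var started, done sync.WaitGroup
	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			started.Done()
			<-stop // parked by the runtime; uses no CPU while waiting
		}()
	}
	started.Wait()
	grew = runtime.NumGoroutine() - before
	return grew, func() { close(stop); done.Wait() }
}

func main() {
	grew, release := spawnBlocked(100)
	defer release()
	// All of the blocked goroutines are included in the count.
	fmt.Println("NumGoroutine grew by", grew)
}
```

A count of only running and runnable goroutines would be expected to ignore these parked goroutines, which is the gap the proposal is about.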

ianlancetaylor · 2016-09-13T23:02:38Z

Oh, I see, but then the goroutines in state _Gsyscall are ambiguous, as they could be blocked.

randall77 · 2016-09-13T23:07:00Z

Indeed. Only things blocked in Go (select{}, ...) would not be counted if we used the raw goroutine states.

fd · 2016-09-14T06:26:52Z

@ianlancetaylor: The suggestion counts runnable goroutines, so it can be larger than GOMAXPROCS.

That is correct.

@ianlancetaylor: My concern is that I don't see that this adds anything very useful over NumGoroutine.

Using NumGoroutine breaks down when you have long running goroutines that do background work (like periodically refreshing cache entries). This approach also breaks down for proxy servers as they spend most of their time waiting on the network.

@ianlancetaylor: If you are worried about shedding load then I don't see why you want to ignore the system goroutines.

Based on the POC that I made, it seems that at least some system goroutines always appear active. I could be wrong here, as they might be blocking on a syscall (in the netpoller, say).

The other reason I think system goroutines should be excluded is because NumGoroutine also excludes them.

@ianlancetaylor: Oh, I see, but then the goroutines in state _Gsyscall are ambiguous, as they could be blocked.

That is correct. Maybe g.waitreason should be taken into account too?
Otherwise, the _Gsyscall state could be excluded.

@randall77: Only things blocked in Go (select{}, ...) would not be counted if we used the raw goroutine states.

How can this be detected?

Here is the POC code I wrote: https://gist.github.com/fd/7136de67a56e174d8c06cb505f7278aa

randall77 · 2016-09-14T07:12:03Z

Goroutines blocked in the Go runtime are in the _Gwaiting state. You probably don't want to count those; they contribute nothing to CPU load (but do consume some memory).

It is not clear whether you should count goroutines in the _Gsyscall state. Whether you want to count them depends on whether they are doing real work in the syscall (reading a large file, say) or waiting (a read on an idle network socket). I don't think the runtime has the information needed to make that call, although we might be able to make some approximation. That's what makes this problem hard.

fd · 2016-09-14T10:06:53Z

So, how about this:

Timers and network IO are managed by the runtime and don't result in _Gsyscall goroutines (except for the system goroutines, which should be excluded).
Cgo calls can include syscalls which cannot be tracked. As a result, these syscalls are always counted.
Applications accessing the network through cgo (which hides the syscalls) should be rare.
Other syscalls generally result in CPU load and thus should be included.

So unless you are heavily using something like gopkg.in/fsnotify.v1, NumActiveGoroutine should be a decent approximation of the actual workload.

Including _Gsyscall should be a good starting point for NumActiveGoroutine.
The runtime could be extended to record the called syscall in G.
Then the syscall package could be extended with a list of syscalls that result in some form of idling.
Given these changes, NumActiveGoroutine can decide whether to consider the goroutine active or not. Syscalls called from cgo are still hidden in this scenario.

Remember, it is not my goal to find an accurate estimate of CPU utilisation. Instead, my goal is to find a good-enough estimate of application utilisation. I have included an excerpt from Site Reliability Engineering, How Google Runs Production Systems, which seems to suggest that Google uses a similar metric/approach.

Site Reliability Engineering, How Google Runs Production Systems - p. 366

The utilization signals we use are based on the state local to the task (since the goal of the signals is to protect the task) and we have implementations for various signals. The most generally useful signal is based on the “load” in the process, which is determined using a system we call executor load average.

To find the executor load average, we count the number of active threads in the process. In this case, “active” refers to threads that are currently running or ready to run and waiting for a free processor. We smooth this value with exponential decay and begin rejecting requests as the number of active threads grows beyond the number of processors available to the task. That means that an incoming request that has a very large fan-out (i.e., one that schedules a burst of a very large number of short-lived operations) will cause the load to spike very briefly, but the smoothing will mostly swallow that spike. However, if the operations are not short-lived (i.e., the load increases and remains high for a significant amount of time), the task will start rejecting requests.
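The smoothing described in the excerpt can be sketched as an exponentially weighted moving average. This is only an illustration under stated assumptions: the type loadAverage, the smoothing factor alpha, and the sample values are all hypothetical, and the active count would have to come from something like the proposed NumActiveGoroutine, which does not exist today.

```go
package main

import "fmt"

// loadAverage keeps an exponentially smoothed count of active threads
// (or goroutines). alpha is a hypothetical tuning knob in (0, 1]:
// smaller values smooth more aggressively.
type loadAverage struct {
	alpha float64
	value float64
}

// observe folds one sample of the active count into the average.
func (l *loadAverage) observe(active int) {
	l.value = l.alpha*float64(active) + (1-l.alpha)*l.value
}

// overloaded reports whether the smoothed load exceeds the number of
// processors available to the task.
func (l *loadAverage) overloaded(procs int) bool {
	return l.value > float64(procs)
}

func main() {
	const procs = 4
	la := &loadAverage{alpha: 0.05, value: 2}

	// A single large spike is mostly absorbed by the smoothing.
	la.observe(40)
	fmt.Printf("after brief spike: load=%.2f overloaded=%v\n", la.value, la.overloaded(procs))

	// A sustained high active count pushes the average past procs.
	for i := 0; i < 10; i++ {
		la.observe(40)
	}
	fmt.Printf("after sustained load: load=%.2f overloaded=%v\n", la.value, la.overloaded(procs))
}
```

The choice of alpha trades reaction time against tolerance for bursts; the excerpt does not say which value Google uses.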

ianlancetaylor · 2016-09-14T13:14:25Z

@fd: Using NumGoroutine breaks down when you have long running goroutines that do background work (like periodically refreshing cache entries). This approach also breaks down for proxy servers as they spend most of their time waiting on the network.

As you say, you are looking for an approximation, and you care about load shedding. Unless you start a long running goroutine for each incoming request, the number of long running goroutines should be a tiny fraction of the total number of goroutines, and are therefore ignorable for approximation purposes.

I agree that proxy servers are a problem.

Since you have proof of concept code, do you have a way to see the difference between NumGoroutine and NumActiveGoroutine for a large server?

I would be less concerned about adding NumActiveGoroutine if it weren't for the ambiguity about _Gsyscall. I'm worried about how to document what the result really means for programs that call C code. It's probably unusual to call C code that makes direct network calls, but it's not in the least unusual to call C code that uses the file system, which may be networked, or that uses a library that in turn makes DNS lookups or in some other way uses the network. So while NumActiveGoroutine is easy to understand for pure Go code, I don't see how it's easily generalizable for Go programs that call C code.

One possibility would be to return two numbers: the number of running/runnable goroutines and the number of goroutines waiting for a system call or C code. But that seems to me to be too tied to the current details of how system calls and cgo are implemented.

I assume you are looking for some sort of general framework here, because for any specific program that wants to do load shedding I would say just count the number of active requests.
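For a specific program, counting active requests could look roughly like the sketch below. The limiter type, its max field, and the 503 response are assumptions made for illustration; only net/http and sync/atomic calls from the standard library are used.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// limiter (hypothetical) sheds load once more than max requests are in flight.
type limiter struct {
	inFlight int64 // accessed only through sync/atomic
	max      int64
}

// acquire reserves a slot and reports whether the request may proceed.
func (l *limiter) acquire() bool {
	if atomic.AddInt64(&l.inFlight, 1) > l.max {
		atomic.AddInt64(&l.inFlight, -1)
		return false
	}
	return true
}

// release frees a slot taken by a successful acquire.
func (l *limiter) release() { atomic.AddInt64(&l.inFlight, -1) }

// wrap rejects requests with 503 while the limit is exceeded.
func (l *limiter) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.acquire() {
			http.Error(w, "overloaded", http.StatusServiceUnavailable)
			return
		}
		defer l.release()
		next.ServeHTTP(w, r)
	})
}

func main() {
	l := &limiter{max: 2}
	// The third concurrent request is rejected.
	fmt.Println(l.acquire(), l.acquire(), l.acquire()) // true true false
}
```

This approach needs a limit chosen per application, which is the part a general framework based on scheduler state would try to avoid.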

RLH · 2016-09-14T13:46:43Z

The problem NumActiveGoroutine is trying to solve is when to shed load. Wouldn't monitoring the latency of an application request be a more direct and ultimately more correct way to do this? If latency increases, shed load. If latency improves, increase load.

Is there a use case where this doesn't work but NumActiveGoroutine does?

Discussing the nuances of what _Gidle, _Grunnable, _Grunning, _Gsyscall, _Gwaiting, plus _Gscanrunning, _Gscanrunnable, _Gscansyscall, and _Gscanidle, mean in this context is a very implementation-dependent discussion.

quentinmit · 2016-10-11T18:03:57Z

Even NumGoroutine does not capture all the work C is doing; the C code may have spawned threads that are independently doing work as well.

I think it's reasonable to say that goroutines in C are not active from the perspective of Go, regardless of what they're calling.

rsc · 2016-10-21T00:52:14Z

This is not uncontroversial.

gopherbot · 2017-03-14T22:15:44Z

CL https://golang.org/cl/38180 mentions this issue.

DO NOT SUBMIT. This is an experimental API.

This introduces a runtime.SchedStats API to mirror the existing runtime.MemStats API. Currently, SchedStats reports the number of goroutines in four major states: running, runnable, non-go (syscall/cgo), and blocked. The intent is that these can be used to determine the CPU load of a Go process and use this to perform load shedding.

This is *not* a complete solution since the Go scheduler cannot account for threads in syscalls or cgo; however, a complete solution can be built by combining these statistics with kernel-provided statistics. The comments on SchedStats attempt to make this clear.

ReadSchedStats collects these counts efficiently by scanning the P states and using a running count of the number of goroutines in syscalls that don't own a P (which avoids doing any additional accounting in the syscall fast path). This way, it can avoid scanning all of the goroutines, which could potentially be expensive. With this approach, at GOMAXPROCS=4, ReadSchedStats takes only 33 ns.

Updates golang#15490, golang#17089.

Change-Id: I202f33eea5d10c83dbf41cb45c8c619ff17fa4c4

prattmic · 2021-04-28T19:29:12Z

Side-stepping whether or not we want this: now that we have a runtime metrics API (#37112), if we do add these kinds of metrics, the metrics API would be the obvious place for them rather than a new method.
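For reference, reading a scheduler metric through runtime/metrics looks roughly like this. The sketch reads "/sched/goroutines:goroutines", which reports the total number of live goroutines and not the running/runnable split discussed in this issue. The helper name goroutineCount is hypothetical.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// goroutineCount (hypothetical helper) reads the live goroutine count
// through runtime/metrics. It returns false if the running Go version
// does not export the metric as a uint64.
func goroutineCount() (uint64, bool) {
	sample := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
	metrics.Read(sample)
	if sample[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return sample[0].Value.Uint64(), true
}

func main() {
	if n, ok := goroutineCount(); ok {
		fmt.Println("live goroutines:", n)
	}
}
```

Which per-state metrics are available depends on the Go version; metrics.All() lists the names supported by the running toolchain.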
