runtime: expose number of running/runnable goroutines #17089

Open
fd opened this issue Sep 13, 2016 · 16 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. NeedsFix The path to resolution is known, but the work has not been done.

@fd

fd commented Sep 13, 2016

Summary

I'd like to propose a way to expose the number of active (running + runnable) goroutines.

Background

My primary use case for this metric is to estimate application load (num-active-goroutines / num-cpu) in order to implement load shedding. Other metrics, like the times() syscall, don't expose application overload and don't work well in the presence of noisy neighbours.

Plan

Currently the runtime package includes runtime.NumGoroutine() int which returns the number of live, non-system goroutines.

The runtime package could be extended to include runtime.NumActiveGoroutine() int. NumActiveGoroutine() should count all goroutines where isSystemGoroutine() is false and where status is _Grunnable|_Grunning|_Gsyscall.

It seems that such a function would need to acquire sched.lock and allglock. This could have some performance implications.
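For illustration only, a similar count can be approximated today, at much higher cost, by parsing a goroutine dump from runtime.Stack. This is a sketch and not the proposed implementation: countActiveFromDump and allStacks are hypothetical helper names, the state strings "running", "runnable" and "syscall" are assumed to match what goroutine dumps print, and runtime.Stack with all=true stops the world, so this is unsuitable for a hot path.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"runtime"
)

// goroutineHeader matches dump headers such as "goroutine 7 [runnable]:"
// and captures the state name up to the first comma or closing bracket.
var goroutineHeader = regexp.MustCompile(`^goroutine \d+ [^\[]*\[([^\],]+)`)

// countActiveFromDump (hypothetical helper) counts goroutines whose dump
// header reports a running, runnable or syscall state.
func countActiveFromDump(dump []byte) int {
	active := 0
	for _, line := range bytes.Split(dump, []byte("\n")) {
		m := goroutineHeader.FindSubmatch(line)
		if m == nil {
			continue
		}
		switch string(m[1]) {
		case "running", "runnable", "syscall":
			active++
		}
	}
	return active
}

// allStacks (hypothetical helper) returns a dump of all goroutines,
// growing the buffer until the whole dump fits.
func allStacks() []byte {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	fmt.Println("approximate active goroutines:", countActiveFromDump(allStacks()))
}
```

Because the dump is text, this approach also inherits the _Gsyscall ambiguity: a goroutine shown as "syscall" may be doing work or simply waiting.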

@quentinmit quentinmit added this to the Go1.8Maybe milestone Sep 13, 2016
@quentinmit
Contributor

/cc @aclements

@quentinmit
Contributor

I'm going to tentatively mark this as a feature request for runtime, instead of a proposal, since it seems pretty uncontroversial to me.

@quentinmit quentinmit changed the title Proposal: Expose number of running/runnable goroutines runtime: expose number of running/runnable goroutines Sep 13, 2016
@davecheney
Contributor

It doesn't seem like a very interesting number; it'll always be less than or equal to GOMAXPROCS.


@ianlancetaylor
Contributor

@davecheney The suggestion counts runnable goroutines, so it can be larger than GOMAXPROCS.

My concern is that I don't see that this adds anything very useful over NumGoroutine. If you are worried about shedding load then I don't see why you want to ignore the system goroutines. And there aren't very many system goroutines anyhow, so if you are in a condition where load shedding is relevant they are just going to be a rounding error.

@randall77
Contributor

I think more importantly he doesn't want to count goroutines which are blocked (which NumGoroutine does count).

@ianlancetaylor
Contributor

Oh, I see, but then the goroutines in state _Gsyscall are ambiguous, as they could be blocked.

@randall77
Contributor

Indeed. Only things blocked in Go (select{}, ...) would not be counted if we used the raw goroutine states.

@fd
Author

fd commented Sep 14, 2016

@ianlancetaylor: The suggestion counts runnable goroutines, so it can be larger than GOMAXPROCS.

That is correct.

@ianlancetaylor: My concern is that I don't see that this adds anything very useful over NumGoroutine.

Using NumGoroutine breaks down when you have long running goroutines that do background work (like periodically refreshing cache entries). This approach also breaks down for proxy servers as they spend most of their time waiting on the network.

@ianlancetaylor: If you are worried about shedding load then I don't see why you want to ignore the system goroutines.

Based on the POC that I made, it seems that at least some system goroutines always appear active. I could be wrong here, as they might be blocking on a syscall (say, in the netpoller).

The other reason I think system goroutines should be excluded is because NumGoroutine also excludes them.

@ianlancetaylor: Oh, I see, but then the goroutines in state _Gsyscall are ambiguous, as they could be blocked.

That is correct. Maybe g.waitreason should be taken into account too?
Otherwise the _Gsyscall state could be excluded.

@randall77: Only things blocked in Go (select{}, ...) would not be counted if we used the raw goroutine states.

How can this be detected?


Here is the POC code I wrote: https://gist.github.com/fd/7136de67a56e174d8c06cb505f7278aa

@randall77
Contributor

Goroutines blocked in the Go runtime have Gwaiting state. You probably don't want to count those, they contribute nothing to CPU load (but do consume some memory).

It is not clear whether you should count goroutines in the Gsyscall state. Whether you want to count them depends on whether they are doing real work in the syscall (reading a large file, say) or waiting (read on an idle network socket). I don't think the runtime has the information needed to make that call, although we might be able to make some approximation. That's what makes this problem hard.

@fd
Author

fd commented Sep 14, 2016

So, how about this:

  • timers and network IO are managed by the runtime and don't result in _Gsyscall goroutines (except for the system goroutines which should be excluded).
  • cgo calls can include syscalls which cannot be tracked. As a result, these syscalls are always counted.
  • Applications accessing the network through cgo (which hides the syscalls) should be rare.
  • other syscalls generally result in CPU load and thus should be included.

So unless you are heavily using something like gopkg.in/fsnotify.v1 NumActiveGoroutine should be a decent approximation of the actual work load.

Including _Gsyscall should be a good starting point for NumActiveGoroutine.
The runtime could be extended to record the called syscall in G.
Then syscall package could be extended with a list of syscalls that result in some form of idling.
Given these changes, NumActiveGoroutine can decide whether to consider the goroutine active or not. Syscalls called from cgo are still hidden in this scenario.

Remember, it is not my goal to find an accurate estimation of the CPU utilisation. Instead, it is my goal to find a good-enough estimation of the application utilisation. I included an excerpt from Site Reliability Engineering, How Google Runs Production Systems which seems to suggest that Google uses a similar metric/approach.


Site Reliability Engineering, How Google Runs Production Systems - p. 366

The utilization signals we use are based on the state local to the task (since the goal of the signals is to protect the task) and we have implementations for various signals. The most generally useful signal is based on the “load” in the process, which is determined using a system we call executor load average.

To find the executor load average, we count the number of active threads in the process. In this case, “active” refers to threads that are currently running or ready to run and waiting for a free processor. We smooth this value with exponential decay and begin rejecting requests as the number of active threads grows beyond the number of processors available to the task. That means that an incoming request that has a very large fan-out (i.e., one that schedules a burst of a very large number of short-lived operations) will cause the load to spike very briefly, but the smoothing will mostly swallow that spike. However, if the operations are not short-lived (i.e., the load increases and remains high for a significant amount of time), the task will start rejecting requests.
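The smoothing described in the excerpt could be sketched in Go roughly as follows. This is a minimal sketch under assumptions: the loadAverage type, its method names, and the alpha value are hypothetical, and the active count is passed in by the caller because no runtime API for it exists in this discussion.

```go
package main

import "fmt"

// loadAverage (hypothetical type) smooths samples of the active count
// with exponential decay.
type loadAverage struct {
	alpha  float64 // weight of the newest sample, 0 < alpha <= 1 (assumed)
	value  float64 // current smoothed value
	primed bool    // false until the first sample has been observed
}

// observe folds one sample into the smoothed value and returns it.
func (l *loadAverage) observe(active int) float64 {
	if !l.primed {
		l.value = float64(active)
		l.primed = true
		return l.value
	}
	l.value = l.alpha*float64(active) + (1-l.alpha)*l.value
	return l.value
}

// overloaded reports whether the smoothed value exceeds the number of
// processors available to the task.
func (l *loadAverage) overloaded(procs int) bool {
	return l.value > float64(procs)
}

func main() {
	la := &loadAverage{alpha: 0.05}
	const procs = 4
	// A short burst followed by sustained high load.
	for _, active := range []int{2, 40, 40, 40} {
		v := la.observe(active)
		fmt.Printf("active=%d smoothed=%.2f overloaded=%v\n", active, v, la.overloaded(procs))
	}
}
```

With a small alpha a single spike moves the average only slightly, while load that stays high pushes it past the processor count after a few samples, which matches the behaviour the excerpt describes.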

@ianlancetaylor
Contributor

Using NumGoroutine breaks down when you have long running goroutines that do background work (like periodically refreshing cache entries). This approach also breaks down for proxy servers as they spend most of their time waiting on the network.

As you say, you are looking for an approximation, and you care about load shedding. Unless you start a long running goroutine for each incoming request, the number of long running goroutines should be a tiny fraction of the total number of goroutines, and are therefore ignorable for approximation purposes.

I agree that proxy servers are a problem.

Since you have proof of concept code, do you have a way to see the difference between NumGoroutine and NumActiveGoroutine for a large server?

I would be less concerned about adding NumActiveGoroutines if it weren't for the ambiguity about _Gsyscall. I'm worried about how to document what the result really means for programs that call C code. It's probably unusual to call C code that makes direct network calls, but it's not in the least unusual to call C code that uses the file system, which may be networked, or that uses a library that in turn makes DNS lookups or in some other way uses the network. So while NumActiveGoroutines is easy to understand for pure Go code, I don't see how it's easily generalizable for Go programs that call C code.

One possibility would be to return two numbers: the number of running/runnable goroutines and the number of goroutines waiting for a system call or C code. But that seems to me to be too tied to the current details of how system calls and cgo are implemented.

I assume you are looking for some sort of general framework here, because for any specific program that wants to do load shedding I would say just count the number of active requests.
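Counting active requests inside a specific program might look like the sketch below. The limiter type, its method names, and the limit value are hypothetical; it only shows the general idea of rejecting work once a fixed number of requests is in flight.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// limiter (hypothetical type) tracks in-flight requests and refuses new
// ones once max are already running.
type limiter struct {
	inFlight atomic.Int64
	max      int64
}

// tryAcquire reserves a slot for a request. It returns false, leaving the
// counter unchanged, when the limit has been reached.
func (l *limiter) tryAcquire() bool {
	if l.inFlight.Add(1) > l.max {
		l.inFlight.Add(-1)
		return false
	}
	return true
}

// release frees a slot taken by a successful tryAcquire.
func (l *limiter) release() {
	l.inFlight.Add(-1)
}

func main() {
	l := &limiter{max: 2}
	for i := 1; i <= 3; i++ {
		fmt.Printf("request %d admitted: %v\n", i, l.tryAcquire())
	}
	l.release()
	fmt.Println("after one release, admitted:", l.tryAcquire())
}
```

A server would call tryAcquire at the start of each request, reply with an overload error when it returns false, and defer release otherwise.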

@RLH
Contributor

RLH commented Sep 14, 2016

The problem NumActiveGoroutines is trying to solve is when to shed load. Wouldn't monitoring the latency of an application request be a more direct and ultimately more correct way to do this? If latency increases, shed load. If latency improves, increase load.

Is there a use case where this doesn't work but NumActiveGoroutines does?

Discussing the nuances of what _Gidle, _Grunnable, _Grunning, _Gsyscall, _Gwaiting, plus what _Gscanrunning, _Gscanrunnable, _Gscansyscall, and _Gscanidle mean in this context is a very implementation-dependent discussion.

@quentinmit
Contributor

Even NumGoroutines does not capture all the work C is doing; the C code may have spawned threads that are independently doing work as well.

I think it's reasonable to say that goroutines in C are not active from the perspective of Go, regardless of what they're calling.

@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 11, 2016
@rsc
Contributor

rsc commented Oct 21, 2016

This is not uncontroversial.

@rsc rsc modified the milestones: Go1.9, Go1.8Maybe Oct 21, 2016
@rsc rsc added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsFix The path to resolution is known, but the work has not been done. labels Oct 21, 2016
@gopherbot

CL https://golang.org/cl/38180 mentions this issue.

@bradfitz bradfitz added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels May 15, 2017
@aclements aclements modified the milestones: Go1.10Early, Go1.9 Jun 7, 2017
@bradfitz bradfitz added early-in-cycle A change that should be done early in the 3 month dev cycle. and removed early-in-cycle A change that should be done early in the 3 month dev cycle. labels Jun 14, 2017
@bradfitz bradfitz removed this from the Go1.10Early milestone Jun 14, 2017
@bradfitz bradfitz modified the milestones: Go1.10, Go1.10Early Jun 14, 2017
@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Unplanned Jul 9, 2018
bobotu pushed a commit to bobotu/go that referenced this issue Nov 2, 2020
DO NOT SUBMIT. This is an experimental API.

This introduces a runtime.SchedStats API to mirror the existing
runtime.MemStats API. Currently, SchedStats reports the number of
goroutines in four major states: running, runnable, non-go
(syscall/cgo), and blocked.

The intent is that these can be used to determine the CPU load of a Go
process and use this to perform load shedding. This is *not* a
complete solution since the Go scheduler cannot account for threads in
syscalls or cgo; however, a complete solution can be built by
combining these statistics with kernel-provided statistics. The
comments on SchedStats attempt to make this clear.

ReadSchedStats collects these counts efficiently by scanning the P
states and using a running count of the number of goroutines in
syscalls that don't own a P (which avoids doing any additional
accounting in the syscall fast path). This way, it can avoid scanning
all of the goroutines, which could potentially be expensive. With this
approach, at GOMAXPROCS=4, ReadSchedStats takes only 33 ns.

Updates golang#15490, golang#17089.

Change-Id: I202f33eea5d10c83dbf41cb45c8c619ff17fa4c4
@prattmic
Member

Side-stepping whether or not we want this: now that we have a runtime metrics API (#37112), if we do add these kinds of metrics, the metrics API will be the obvious place for them rather than a new function.

See also #15490.
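For reference, reading a scheduler metric through the runtime/metrics package looks roughly like this. The sketch reads "/sched/goroutines:goroutines", which reports the count of live goroutines and not the running/runnable breakdown requested in this issue; readUint64Metric is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readUint64Metric (hypothetical helper) samples one metric by name.
// It returns false when the running Go version does not provide the
// metric as a uint64 value, which includes unknown metric names.
func readUint64Metric(name string) (uint64, bool) {
	sample := []metrics.Sample{{Name: name}}
	metrics.Read(sample)
	if sample[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return sample[0].Value.Uint64(), true
}

func main() {
	n, ok := readUint64Metric("/sched/goroutines:goroutines")
	fmt.Println("live goroutines:", n, "supported:", ok)
}
```

Checking the value kind before reading it lets the same code run on Go versions that do not support a given metric name.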

cc @mknyszek
