runtime: optionally (reliably) avoid netpoller #32009
Comments
This seems like the beginnings of a proposal, but there are no concrete next steps. What changes would you like to be made specifically? Thanks |
Do you want gVisor to never use the netpoller, full stop, or does this need to apply only to certain operations within gVisor? The whole point of the netpoller is to be more efficient (particularly in the number of OS threads needed) than just blocking an OS thread on each read/write. I'm curious in what circumstances netpoll needs to be avoided. Maybe we could solve that problem instead of this one. |
We never want gVisor to use netpoll, full stop. One way of doing this that we have discussed is adding some way to detect if it was ever used to the runtime and then running all of our tests and failing any which use netpoll. golang.org/cl/78915 has more context on why netpoll is a problem. |
Could you just check
That CL has been merged for a long time now. I don't see any comments there about other issues besides the one that was fixed in the CL. |
Correct.
The benefits documented in the CL are only if netpoll is not in use. |
Bummer.
So you want the 12% improvement to CPU usage that this CL provides? But you only get that 12% if you never use netpoll? Or are you interested in the 0.5% latency improvement? How close is your app to those benchmarks? They are really corner case benchmarks, with lots of very quick trips into and out of the poller/channel ops/scheduler, with no work on top of that. |
It is mostly the CPU usage. gVisor is used in high-density environments. gVisor's CPU usage is much higher than Linux's, and we are currently looking into other options for reducing CPU usage as well. Latency is important too, though. We measure latency in nanoseconds, and shaving even a few nanoseconds in a hot path can be a win for us. |
At the time golang.org/cl/78915 was written, it reduced total runtime of a TensorFlow model training benchmark running inside gVisor by 5% (and total CPU usage by 10%). TensorFlow can be extremely futex heavy, as it coordinates very small units of work (size depends on the model) on a threadpool, where workers contend on resources. When an application calls futex inside gVisor and it actually blocks, that ultimately becomes a wait on a channel. Since new work is likely to be available very soon, this application becomes very sensitive to the overall latency and CPU usage of the Go scheduler when it wakes the goroutine back up. google/gvisor#205 is a similar situation. In general, I think overall scheduler improvements (such as making netpoll cheaper) for these cases would be a possible alternative to an explicit API. |
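To make the futex-to-channel point concrete, here is a minimal sketch of two goroutines handing control back and forth through channel waits. It is not gVisor's implementation: the `waiter` type, its `wait`/`wake` methods, and `pingPong` are hypothetical names invented for illustration. Each hand-off blocks one goroutine and wakes another, so the loop spends nearly all of its time in the Go scheduler, which is the kind of workload the comment above describes.

```go
package main

import (
	"fmt"
	"sync"
)

// waiter is a hypothetical stand-in for a blocked futex waiter: the
// goroutine parks on a channel receive until someone wakes it.
type waiter struct {
	ch chan struct{}
}

func newWaiter() *waiter {
	// Buffer of one so a wake that arrives before the wait is not lost.
	return &waiter{ch: make(chan struct{}, 1)}
}

// wait blocks the calling goroutine until wake has been called.
func (w *waiter) wait() { <-w.ch }

// wake lets one pending or future wait return; extra wakes are coalesced.
func (w *waiter) wake() {
	select {
	case w.ch <- struct{}{}:
	default:
	}
}

// pingPong bounces control between two goroutines n times and returns the
// number of completed round trips. Every iteration is one block plus one
// wake-up on each side, with no other work in between.
func pingPong(n int) int {
	a, b := newWaiter(), newWaiter()
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			a.wait()
			b.wake()
		}
	}()
	done := 0
	for i := 0; i < n; i++ {
		a.wake()
		b.wait()
		done++
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println(pingPong(1000))
}
```

Timing `pingPong` with a large `n` gives a rough feel for the per-wake-up cost, though it says nothing about how that cost changes when the netpoller is active.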
Ideally we'd like to prevent the netpoller from ever running, but I'd be happy just with a way to check whether the netpoller has run. Here are two proposals:
Are there preferences/objections, or better alternatives to this? |
From my perspective this is so special purpose that it's hard to get excited about having to maintain some publicly visible API for it. I'm pretty skeptical that it would ever have more than one user. We've talked about having some sort of runtime package stats access (#15490). Perhaps we could make sure that those stats include some data on use of the netpoller. Then your tests could use that. |
Change https://golang.org/cl/187137 mentions this issue: |
The gVisor project implements a user-space kernel, and its implementation is performance-sensitive, which forces a manual avoidance of the netpoller by avoiding certain APIs.
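As a rough illustration of what "avoiding certain APIs" can look like, the sketch below does pipe I/O through raw `syscall` calls on a Unix-like system instead of wrapping the descriptors in `*os.File` or `net.Conn`. This is an assumption about the general technique, not a description of gVisor's code, and `rawPipeRoundTrip` is a hypothetical helper. Descriptors used this way are not registered with the runtime poller, so a blocking read ties up the OS thread rather than parking the goroutine. It does not by itself show that the poller is idle, since other parts of the program or the runtime may still use it.

```go
package main

import (
	"fmt"
	"syscall"
)

// rawPipeRoundTrip writes msg into a pipe and reads it back using raw
// system calls only. The descriptors are never handed to the os or net
// packages, so they are never registered with the runtime poller.
func rawPipeRoundTrip(msg []byte) ([]byte, error) {
	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return nil, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	// A small write to an empty pipe completes without blocking.
	if _, err := syscall.Write(fds[1], msg); err != nil {
		return nil, err
	}

	// This read would block the OS thread if no data were available.
	buf := make([]byte, len(msg))
	n, err := syscall.Read(fds[0], buf)
	if err != nil {
		return nil, err
	}
	return buf[:n], nil
}

func main() {
	out, err := rawPipeRoundTrip([]byte("ping"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", out)
}
```

The trade-off is the one raised earlier in the thread: each blocked call occupies an OS thread, which is the cost the netpoller was designed to avoid.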
It would be nice to automate and enforce this avoidance, either by exposing some API that could be used to assert in a test that the netpoller has never been used, or by exposing a build tag that would guarantee that the netpoller is inactive. As of this writing, the concrete goal seems to be never incrementing netpollWaiters.
cc @iangudger @nlacasse @prattmic @amscanne