runtime: allow preemption-disable around critical section #65874
Comments
cc @golang/runtime
Interesting issue. The purpose of asynchronous preemption is to reduce tail latency, so seeing it increase tail latency instead isn't great. One way I could imagine this happening would be if you have many goroutines, but they fall into two buckets: high priority and latency sensitive, or low priority and latency tolerant. If the high priority goroutines happen to have tight loops that don't contain synchronous preemption points, then asynchronous preemption would preempt them more often. This would improve the latency of the low priority goroutines at the expense of the high priority goroutines' latency. Does this describe your application? One piece of data I'd like to see is
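In case it is useful, scheduling-latency data can be read from the standard runtime/metrics package. A minimal sketch, assuming the /sched/latencies:seconds histogram is the kind of data being asked about here:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	// /sched/latencies:seconds is a histogram of how long goroutines
	// spend in a runnable state before the scheduler runs them.
	const name = "/sched/latencies:seconds"
	samples := []metrics.Sample{{Name: name}}
	metrics.Read(samples)

	if samples[0].Value.Kind() != metrics.KindFloat64Histogram {
		fmt.Println("metric unavailable in this Go version")
		return
	}
	h := samples[0].Value.Float64Histogram()
	for i, count := range h.Counts {
		if count == 0 {
			continue
		}
		fmt.Printf("[%g, %g): %d\n", h.Buckets[i], h.Buckets[i+1], count)
	}
}
```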
cc @golang/runtime @mknyszek @aclements
Hi again. I think I may have used some bad PromQL (irate vs rate with a push gateway and inconsistent intervals) in Grafana to generate that seemingly conclusive graph above. It seems like

What I am still seeing with

https://docs.rs/tokio/latest/tokio/#cpu-bound-tasks-and-blocking-code

It's not necessarily a priority problem, just a "please do not interrupt me until I reach a channel send or receive op, which I will probably do very soon". If such a flag could be set at the start of the goroutine, that would help my use case. If that flag were set for all goroutines, then I guess CPU starvation is likely...

Anyway, thanks for your time! Any tips gladly received! Please feel free to close the issue if desired.

Results with:
Results without:
Thanks, I'm glad the problem is better understood now. What you are asking for is a mechanism to disable preemption when executing some critical section. I don't think we have an existing issue for that, so this can serve that purpose, though I don't think it is something we clearly want to do. I figured we would also have an issue for goroutine priorities, though I can't seem to find one.
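To make the requested mechanism concrete, here is a sketch of how such a critical-section toggle might be used. The two functions are no-op stubs standing in for a hypothetical API; nothing like them exists in the Go runtime today:

```go
package main

import "fmt"

// disablePreemption and enablePreemption are stand-ins for a hypothetical
// runtime API; they are no-ops and do NOT exist in the Go runtime. They
// only mark where such calls would sit around a critical section.
func disablePreemption() {}
func enablePreemption()  {}

func consume(in <-chan int, out chan<- int) {
	for v := range in {
		disablePreemption() // hypothetical: hold off async preemption
		r := v * v          // short, latency-critical computation
		enablePreemption()  // hypothetical: re-enable before blocking
		out <- r            // channel ops remain natural scheduling points
	}
	close(out)
}

func main() {
	in := make(chan int, 4)
	out := make(chan int, 4)
	go consume(in, out)
	for i := 0; i < 4; i++ {
		in <- i
	}
	close(in)
	for r := range out {
		fmt.Println(r)
	}
}
```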
Go version
go version go1.22.0 darwin/arm64
Output of go env in your module/workspace:

What did you do?
While running my application, I have been investigating some long-tail latency that is causing backlogs in the processing of incoming messages. The application processes between 4,000 and 50,000 incoming JSON messages a second, delivered over websocket connections. These messages are unmarshalled from between 8 and 16 separate sockets, with a goroutine doing the reading on each. All unmarshalled messages are sent over a single channel to another goroutine that does time-sensitive calculations on the resultant state of those messages. There are virtually no memory allocations, and GC pauses typically happen once every 30 seconds. Having this goroutine complete its calculation quickly is critical to the success of the application.
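For reference, a rough sketch of the shape of that pipeline; the type, field names, channel sizes, and reader count are invented for illustration and are not the reporter's actual code:

```go
package main

import (
	"encoding/json"
	"time"
)

// Msg stands in for the application's decoded message type; the field is
// invented for illustration.
type Msg struct {
	Price float64 `json:"price"`
}

// reader unmarshals raw frames and fans them into the shared channel. In
// the real application each websocket has its own reader goroutine.
func reader(frames <-chan []byte, out chan<- Msg) {
	for frame := range frames {
		var m Msg
		if err := json.Unmarshal(frame, &m); err != nil {
			continue // drop malformed frames in this sketch
		}
		out <- m
	}
}

// calculate is the single, time-sensitive consumer of all messages.
func calculate(in <-chan Msg) {
	var state float64
	for m := range in {
		state += m.Price // placeholder for the real calculation
	}
}

func main() {
	frames := make(chan []byte)
	msgs := make(chan Msg, 1024)

	for i := 0; i < 8; i++ { // 8-16 readers in the report
		go reader(frames, msgs)
	}
	go calculate(msgs)

	frames <- []byte(`{"price": 1.5}`) // stand-in for websocket traffic
	close(frames)
	time.Sleep(100 * time.Millisecond) // let the sketch drain; the real app runs indefinitely
}
```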
What did you see happen?
Under normal compilation using Go 1.22, there is significant long-tail (0.9999 percentile) latency of 5-15 ms on the time taken to complete the calculation, which typically takes 30-90 µs (0.99 percentile). With the asyncpreemptoff flag set in GODEBUG, this latency (0.9999 percentile) falls below 1 ms. The following image graphs the change before and after setting the flag:

What did you expect to see?
I understand why preemption is useful for long-running goroutines that starve other goroutines of access to the CPU, but I would not expect it to introduce this much latency. Being able to disable preemption for specific goroutines might be one solution. If you would like any more specific traces or profiles, I'm happy to provide them. It would be difficult for me to provide a reproducer, as the timings of all involved goroutines depend heavily on third-party external state.
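If traces would help, a minimal sketch of capturing an execution trace with the standard runtime/trace package (the output file name is arbitrary); the resulting file can then be inspected with go tool trace:

```go
package main

import (
	"log"
	"os"
	"runtime/trace"
	"time"
)

func main() {
	f, err := os.Create("trace.out") // arbitrary output file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// Place the latency-critical workload here; the resulting trace can
	// be inspected with `go tool trace trace.out`.
	time.Sleep(50 * time.Millisecond)
}
```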