runtime: allow preemption-disable around critical section #65874

Open
donovanhide opened this issue Feb 22, 2024 · 5 comments
Labels: compiler/runtime (Issues related to the Go compiler and/or runtime.), NeedsDecision (Feedback is required from experts, contributors, and/or the community before a change can be made.), Performance

Comments

@donovanhide
Contributor

donovanhide commented Feb 22, 2024

Go version

go version go1.22.0 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN='/Users/***/gopath/bin'
GOCACHE='/Users/***/Library/Caches/go-build'
GOENV='/Users/***/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/***/gopath/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/***/gopath'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.0/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.0/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.0'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/0q/57ygjhqj3k59y4b8txgw839r0000gn/T/go-build2558853606=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

While running my application I have been investigating some long-tail latency that is causing backlogs in the processing of incoming messages. The application processes between 4,000 and 50,000 incoming JSON messages a second, delivered over websocket connections. These messages are unmarshalled from between 8 and 16 separate sockets, with a goroutine doing the reading on each. All unmarshalled messages are sent over a single channel to another goroutine that does time-sensitive calculations on the resultant state of those messages. There are virtually no memory allocations, and GC pauses typically happen once every 30 seconds. This goroutine completing its calculation quickly is critical to the success of the application.
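For reference, here is a minimal sketch of the shape of that pipeline; the type, field names, and calculation are invented for illustration, and the real application reads the frames from websocket connections:

```go
package pipeline

import "encoding/json"

// Msg stands in for the application's real unmarshalled message type.
type Msg struct {
	Key   string  `json:"key"`
	Value float64 `json:"value"`
}

// reader unmarshals raw JSON frames from one socket and forwards them
// to the single calculation goroutine. There are 8-16 of these.
func reader(frames <-chan []byte, out chan<- Msg) {
	for f := range frames {
		var m Msg
		if err := json.Unmarshal(f, &m); err != nil {
			continue // the real code would handle or log the error
		}
		out <- m
	}
}

// calculate is the latency-sensitive goroutine: it owns all of the state
// and each iteration normally completes in tens of microseconds.
func calculate(in <-chan Msg) {
	state := make(map[string]float64)
	for m := range in {
		state[m.Key] = m.Value
		// ... time-sensitive calculation on the resultant state ...
	}
}
```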

What did you see happen?

Under normal compilation using Go 1.22, there is significant long-tail (0.9999 percentile) latency of 5-15 ms on the time taken to complete the calculation, which typically takes 30-90µs (0.99 percentile). With the asyncpreemptoff flag set in GODEBUG, this latency (0.9999 percentile) falls below 1 ms. The following image graphs the change before and after setting the flag:

[Screenshot 2024-02-22: graph of the latency change before and after setting asyncpreemptoff]

What did you expect to see?

I understand why preemption is useful for long-running goroutines that starve other goroutines of access to the CPU, but I would not expect it to introduce this much latency. Being able to disable preemption for specific goroutines might be one solution. If you would like any more specific traces or profiles, I am happy to provide them. It would be difficult for me to provide a reproducer, as the timings of all involved goroutines are very dependent on third-party external state.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 22, 2024
@donovanhide donovanhide changed the title runtime: long tail latency affects of asyncpreemptoff flag runtime: long tail latency effects of asyncpreemptoff flag Feb 22, 2024
@mauri870
Member

cc @golang/runtime

@mauri870 mauri870 added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Feb 22, 2024
@prattmic
Member

Interesting issue. The purpose of asynchronous preemption is to reduce tail latency, so seeing it increase tail latency instead isn't great.

One way I could imagine this happening would be if you have many goroutines, but they fall into two buckets: high-priority, latency-sensitive and low-priority, latency-tolerant. If the high-priority goroutines happen to have tight loops that don't have synchronous preemption points, then asynchronous preemption would preempt them more often. This would improve the latency of the low-priority goroutines at the expense of the high-priority goroutines' latency. Does this describe your application?
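For concreteness, a tight loop shaped like the sketch below has no synchronous preemption points (no function calls or channel operations inside the loop), so the scheduler can only interrupt it mid-iteration via asynchronous, signal-based preemption:

```go
// dot has no synchronous preemption points inside the loop: there are no
// function calls where the runtime's preemption check could run, so only
// asynchronous preemption can interrupt it before it returns.
func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}
```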

One piece of data I'd like to see is /sched/latencies:seconds from https://pkg.go.dev/runtime/metrics with and without asyncpreemptoff. Note that this histogram is cumulative since process start, so you only need to collect it once at the end. https://go.dev/cl/513115 has some code you can copy to format the data into an ASCII graph.
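For anyone following along, a minimal sketch of collecting that histogram with the runtime/metrics package (the linked CL has the nicer ASCII formatting); bucket boundaries are in seconds:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func dumpSchedLatencies() {
	const name = "/sched/latencies:seconds"
	samples := []metrics.Sample{{Name: name}}
	metrics.Read(samples)

	if samples[0].Value.Kind() != metrics.KindFloat64Histogram {
		fmt.Println(name, "not supported by this runtime")
		return
	}
	h := samples[0].Value.Float64Histogram()
	// Counts[i] is the number of observations in [Buckets[i], Buckets[i+1]).
	for i, c := range h.Counts {
		fmt.Printf("%8d [%g, %g)\n", c, h.Buckets[i], h.Buckets[i+1])
	}
}

func main() {
	// Typically called once at the end of a run, since the histogram is
	// cumulative since process start.
	dumpSchedLatencies()
}
```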

@prattmic
Member

cc @golang/runtime @mknyszek @aclements

@prattmic prattmic added this to the Backlog milestone Feb 22, 2024
@donovanhide
Contributor Author

Hi again.

I think I may have used some bad PromQL (irate vs rate with a push gateway and inconsistent intervals) in Grafana to generate that seemingly conclusive graph above. It seems asyncpreemptoff is not related to my issue, as the /sched/latencies:seconds data below demonstrates. I could not reproduce the effect of asyncpreemptoff on latency again.

What I am still seeing, with asyncpreemptoff enabled or disabled, is that the CPU-bound function in the crucial goroutine, once every 10,000 messages or so, takes many milliseconds to complete, despite the actual processing parts taking microseconds. This then creates a backlog in a queue, which has knock-on effects for the subsequent iterations of the crucial goroutine. As you describe, there are two sets of goroutines: a latency-sensitive one, which does not want to be interrupted, and many asynchronous ones waiting on bytes to come in over the network. Another scheduler, in another language, discusses the issues and possible solutions better than I can:

https://docs.rs/tokio/latest/tokio/#cpu-bound-tasks-and-blocking-code

It's not necessarily a priority problem, just a "please do not interrupt me until I reach a channel send or receive op, which I will probably do very soon" (a sketch of what I mean follows below). If such a flag could be set at the start of the goroutine, that would help my use case. If that flag were set for all goroutines, then I guess CPU starvation is likely....
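Purely to illustrate the kind of flag I mean, here is a sketch; runtime.DisablePreemption and runtime.EnablePreemption do not exist in the Go runtime and are invented names for this example, and Msg, Result, and process are placeholders:

```go
// Hypothetical only: neither runtime function below exists today. The idea
// is that the critical goroutine marks its hot section as non-preemptible,
// and the scheduler regains control at the next channel operation anyway.
func critical(in <-chan Msg, out chan<- Result) {
	for m := range in {
		runtime.DisablePreemption() // hypothetical: hold off preemption
		r := process(m)             // CPU-bound work, microseconds normally
		runtime.EnablePreemption()  // hypothetical: preemption allowed again
		out <- r                    // channel send is a scheduling point regardless
	}
}
```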

Anyway, thanks for your time! Any tips gladly received! Please feel free to close the issue if desired.

Results with:
GODEBUG=asyncpreemptoff=1

                    |      0 [  -Inf,    0.0)
********************| 611508 [   0.0,   64.0)
                    |  23961 [  64.0,  128.0)
                 ***| 102487 [ 128.0,  192.0)
                  **|  74167 [ 192.0,  256.0)
                    |  19521 [ 256.0,  320.0)
                   *|  38501 [ 320.0,  384.0)
           *********| 295493 [ 384.0,  448.0)
            ********| 266913 [ 448.0,  512.0)
               *****| 174287 [ 512.0,  640.0)
                  **|  84142 [ 640.0,  768.0)
                 ***| 119178 [ 768.0,  896.0)
                  **|  89576 [ 896.0, 1024.0)
                 ***|  92763 [1024.0, 1280.0)
                   *|  37674 [1280.0, 1536.0)
                    |  18239 [1536.0, 1792.0)
                    |  16608 [1792.0, 2048.0)
                    |  26630 [2048.0, 2560.0)
                    |  20516 [2560.0, 3072.0)
                    |  17352 [3072.0, 3584.0)
                    |  19656 [3584.0, 4096.0)
                    |  29331 [4096.0, 5120.0)
                    |  20887 [5120.0, 6144.0)
                   *|  33623 [6144.0, 7168.0)
                   *|  41952 [7168.0, 8192.0)
                  **|  70560 [8192.0, 10240.0)
                   *|  43181 [10240.0, 12288.0)
                   *|  44811 [12288.0, 14336.0)
                   *|  43833 [14336.0, 16384.0)
                   *|  44537 [16384.0, 20480.0)
                    |  30200 [20480.0, 24576.0)
                    |  28594 [24576.0, 28672.0)
                    |  24048 [28672.0, 32768.0)
                 ***| 107010 [32768.0, 40960.0)
                   *|  59413 [40960.0, 49152.0)
                    |  29004 [49152.0, 57344.0)
                    |  13850 [57344.0, 65536.0)
                    |  11805 [65536.0, 81920.0)
                    |   4973 [81920.0, 98304.0)
                    |   2553 [98304.0, 114688.0)
                    |   1749 [114688.0, 131072.0)
                    |   2346 [131072.0, 163840.0)
                    |   1385 [163840.0, 196608.0)
                    |    949 [196608.0, 229376.0)
                    |    694 [229376.0, 262144.0)
                    |    958 [262144.0, 327680.0)
                    |    562 [327680.0, 393216.0)
                    |    415 [393216.0, 458752.0)
                    |    229 [458752.0, 524288.0)
                    |    289 [524288.0, 655360.0)
                    |    167 [655360.0, 786432.0)
                    |     85 [786432.0, 917504.0)
                    |     60 [917504.0, 1048576.0)
                    |     69 [1048576.0, 1310720.0)
                    |     59 [1310720.0, 1572864.0)
                    |     41 [1572864.0, 1835008.0)
                    |     33 [1835008.0, 2097152.0)
                    |     51 [2097152.0, 2621440.0)
                    |     12 [2621440.0, 3145728.0)
                    |     14 [3145728.0, 3670016.0)
                    |     10 [3670016.0, 4194304.0)
                    |     22 [4194304.0, 5242880.0)
                    |     15 [5242880.0, 6291456.0)
                    |     11 [6291456.0, 7340032.0)
                    |      5 [7340032.0, 8388608.0)
                    |     15 [8388608.0, 10485760.0)
                    |      4 [10485760.0, 12582912.0)
                    |      7 [12582912.0, 14680064.0)
                    |      2 [14680064.0, 16777216.0)
                    |      6 [16777216.0, 20971520.0)
                    |      2 [20971520.0, 25165824.0)
                    |      1 [25165824.0, 29360128.0)
                    |      0 [29360128.0, 33554432.0)
                    |      1 [33554432.0, 41943040.0)
                    |      0 [41943040.0, 50331648.0)
                    |      0 [50331648.0, 58720256.0)
                    | ...
                    |      0 [105553116266496.0, 123145302310912.0)
                    |      0 [123145302310912.0, 140737488355328.0)
                    |      0 [140737488355328.0,   +Inf)

Results without:

                     |      0 [  -Inf,    0.0)
********************| 636929 [   0.0,   64.0)
                    |  24445 [  64.0,  128.0)
                 ***|  99320 [ 128.0,  192.0)
                  **|  82062 [ 192.0,  256.0)
                    |  22599 [ 256.0,  320.0)
                   *|  37629 [ 320.0,  384.0)
            ********| 273426 [ 384.0,  448.0)
            ********| 275386 [ 448.0,  512.0)
               *****| 185945 [ 512.0,  640.0)
                  **|  80218 [ 640.0,  768.0)
                ****| 128796 [ 768.0,  896.0)
                  **|  85224 [ 896.0, 1024.0)
                  **|  83437 [1024.0, 1280.0)
                   *|  34046 [1280.0, 1536.0)
                    |  17497 [1536.0, 1792.0)
                    |  17253 [1792.0, 2048.0)
                    |  27411 [2048.0, 2560.0)
                    |  20776 [2560.0, 3072.0)
                    |  17502 [3072.0, 3584.0)
                    |  21533 [3584.0, 4096.0)
                    |  28812 [4096.0, 5120.0)
                    |  19218 [5120.0, 6144.0)
                    |  30163 [6144.0, 7168.0)
                   *|  39716 [7168.0, 8192.0)
                  **|  70413 [8192.0, 10240.0)
                   *|  43146 [10240.0, 12288.0)
                   *|  46100 [12288.0, 14336.0)
                   *|  46865 [14336.0, 16384.0)
                   *|  45291 [16384.0, 20480.0)
                    |  30861 [20480.0, 24576.0)
                    |  28138 [24576.0, 28672.0)
                    |  24143 [28672.0, 32768.0)
                 ***| 114946 [32768.0, 40960.0)
                   *|  58894 [40960.0, 49152.0)
                    |  30010 [49152.0, 57344.0)
                    |  13650 [57344.0, 65536.0)
                    |  11804 [65536.0, 81920.0)
                    |   5049 [81920.0, 98304.0)
                    |   2803 [98304.0, 114688.0)
                    |   1888 [114688.0, 131072.0)
                    |   2275 [131072.0, 163840.0)
                    |   1445 [163840.0, 196608.0)
                    |    988 [196608.0, 229376.0)
                    |    728 [229376.0, 262144.0)
                    |    879 [262144.0, 327680.0)
                    |    558 [327680.0, 393216.0)
                    |    355 [393216.0, 458752.0)
                    |    228 [458752.0, 524288.0)
                    |    244 [524288.0, 655360.0)
                    |    143 [655360.0, 786432.0)
                    |     66 [786432.0, 917504.0)
                    |     55 [917504.0, 1048576.0)
                    |     74 [1048576.0, 1310720.0)
                    |     35 [1310720.0, 1572864.0)
                    |     36 [1572864.0, 1835008.0)
                    |     29 [1835008.0, 2097152.0)
                    |     45 [2097152.0, 2621440.0)
                    |     39 [2621440.0, 3145728.0)
                    |      9 [3145728.0, 3670016.0)
                    |      7 [3670016.0, 4194304.0)
                    |     14 [4194304.0, 5242880.0)
                    |     15 [5242880.0, 6291456.0)
                    |      6 [6291456.0, 7340032.0)
                    |      4 [7340032.0, 8388608.0)
                    |     14 [8388608.0, 10485760.0)
                    |      6 [10485760.0, 12582912.0)
                    |      4 [12582912.0, 14680064.0)
                    |      6 [14680064.0, 16777216.0)
                    |      3 [16777216.0, 20971520.0)
                    |      1 [20971520.0, 25165824.0)
                    |      2 [25165824.0, 29360128.0)
                    |      3 [29360128.0, 33554432.0)
                    |      0 [33554432.0, 41943040.0)
                    |      1 [41943040.0, 50331648.0)
                    |      0 [50331648.0, 58720256.0)
                    |      0 [58720256.0, 67108864.0)
                    | ...
                    |      0 [105553116266496.0, 123145302310912.0)
                    |      0 [123145302310912.0, 140737488355328.0)
                    |      0 [140737488355328.0,   +Inf)

@prattmic
Member

Thanks, I'm glad the problem is better understood now.

What you are asking for is a mechanism to disable preemption when executing some critical section. I don't think we have an existing issue for that, so this can serve that purpose, though I don't think it is something we clearly want to do.

I figured we would also have an issue for goroutine priorities, though I can't seem to find one.

@prattmic prattmic changed the title runtime: long tail latency effects of asyncpreemptoff flag runtime: allow preemption-disable around critical section Feb 23, 2024
@prattmic prattmic modified the milestones: Backlog, Unplanned Feb 23, 2024
@prattmic prattmic added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Feb 23, 2024