x/net/http2: Server is slow under load (flow control?) #18404
Comments
This is almost certainly #16512. I think vegeta is starved for flow control tokens. /cc @tombergan
Also see #17985
@bradfitz, I think this is more than #16512. Why are there 2 conn.serve goroutines but 165212 conn.writeFrameAsync goroutines? There should be at most one writeFrameAsync goroutine per serve goroutine. There is either a bug that triggers multiple concurrent frame writes, or a goroutine leak that doesn't clean up writeFrameAsync goroutines. The latter seems likely. Probably this should wait for "wroteFrameCh<-" or doneServing?
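To illustrate the leak hypothesis in the comment above, here is a minimal sketch of a writer goroutine that waits on either a result channel or a shutdown channel. This is not the x/net/http2 code: `writeAsync`, `resultCh`, `done`, and `exitedAfterShutdown` are hypothetical names chosen for the example, and the frame write itself is stubbed out.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// writeAsync is a hypothetical stand-in for a per-frame writer goroutine.
// It reports its result on resultCh, but gives up as soon as done is
// closed, so it cannot block forever once the serve loop has exited.
func writeAsync(resultCh chan<- error, done <-chan struct{}) {
	var err error // a real writer would perform the frame write here
	select {
	case resultCh <- err: // the serve loop received the result
	case <-done: // the serve loop is gone; stop waiting
	}
}

// exitedAfterShutdown starts n writers whose results are never received,
// signals shutdown, and returns how many writers returned.
func exitedAfterShutdown(n int) int {
	resultCh := make(chan error) // unbuffered, and nothing ever receives
	done := make(chan struct{})
	var exited int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			writeAsync(resultCh, done)
			atomic.AddInt64(&exited, 1)
		}()
	}
	close(done) // plays the role of a "done serving" signal
	wg.Wait()   // returns only once every writer has exited
	return int(atomic.LoadInt64(&exited))
}

func main() {
	fmt.Println("writers exited:", exitedAfterShutdown(1000))
}
```

If the `case <-done` branch were removed, every writer would stay blocked on the send and `wg.Wait()` would never return, which is the kind of pile-up the goroutine counts above suggest.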
Hey guys, thanks for the quick response. This discussion leads to some follow-up questions:
@autoric, yup, those are the questions this bug is about investigating. We'll probably need better (or automatic) defaults, and more knobs.
I'm always a fan of the no knob solution, but if there could be knobs only in the http2 pkg offered earlier than the auto solution it would be great for us.
CL https://golang.org/cl/35118 mentions this issue.
Maybe it's somehow related to #18309
CL https://golang.org/cl/37226 mentions this issue.
Upload performance is poor when BDP is higher than the flow-control window. Previously, the server's receive window was fixed at 64KB, which resulted in very poor performance for high-BDP links. The receive window now defaults to 1MB and is configurable. The per-connection and per-stream windows are configurable separately (both default to 1MB as suggested in golang/go#16512).

Previously, the server created a "fixedBuffer" for each request body. This is no longer a good idea because a fixedBuffer has fixed size, which means individual streams cannot use varying amounts of the available connection window. To overcome this limitation, I replaced fixedBuffer with "dataBuffer", which grows and shrinks based on current usage. The worst-case fragmentation of dataBuffer is 32KB wasted memory per stream, but I expect that worst-case will be rare.

A slightly modified version of adg@'s grpcbench program shows a dramatic improvement when increasing from a 64KB window to a 1MB window, especially at higher latencies (i.e., higher BDPs). Network latency was simulated with netem, e.g., `tc qdisc add dev lo root netem delay 16ms`.

| Duration | Latency | Proto | H2 Window |
|---|---|---|---|
| 11ms±4.05ms | 0s | HTTP/1.1 | - |
| 17ms±1.95ms | 0s | HTTP/2.0 | 65535 |
| 8ms±1.75ms | 0s | HTTP/2.0 | 1048576 |
| 10ms±1.49ms | 1ms | HTTP/1.1 | - |
| 47ms±2.91ms | 1ms | HTTP/2.0 | 65535 |
| 10ms±1.77ms | 1ms | HTTP/2.0 | 1048576 |
| 15ms±1.69ms | 2ms | HTTP/1.1 | - |
| 88ms±11.29ms | 2ms | HTTP/2.0 | 65535 |
| 15ms±1.18ms | 2ms | HTTP/2.0 | 1048576 |
| 23ms±1.42ms | 4ms | HTTP/1.1 | - |
| 152ms±0.77ms | 4ms | HTTP/2.0 | 65535 |
| 23ms±0.94ms | 4ms | HTTP/2.0 | 1048576 |
| 40ms±1.54ms | 8ms | HTTP/1.1 | - |
| 288ms±1.67ms | 8ms | HTTP/2.0 | 65535 |
| 39ms±1.29ms | 8ms | HTTP/2.0 | 1048576 |
| 72ms±1.13ms | 16ms | HTTP/1.1 | - |
| 559ms±0.68ms | 16ms | HTTP/2.0 | 65535 |
| 71ms±1.12ms | 16ms | HTTP/2.0 | 1048576 |
| 136ms±1.15ms | 32ms | HTTP/1.1 | - |
| 1104ms±1.62ms | 32ms | HTTP/2.0 | 65535 |
| 135ms±0.96ms | 32ms | HTTP/2.0 | 1048576 |
| 264ms±0.95ms | 64ms | HTTP/1.1 | - |
| 2191ms±2.08ms | 64ms | HTTP/2.0 | 65535 |
| 263ms±1.57ms | 64ms | HTTP/2.0 | 1048576 |

Fixes golang/go#16512
Updates golang/go#17985
Updates golang/go#18404

Change-Id: Ied385aa94588337e98dad9475cf2ece2f39ba346
Reviewed-on: https://go-review.googlesource.com/37226
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
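The reasoning behind the commit message above (a window smaller than the BDP caps upload speed) can be made concrete with some rough arithmetic. The sketch below assumes a deliberately simplified model in which the sender can have at most one window of data in flight per round trip; it ignores TCP behavior and the exact timing of WINDOW_UPDATE frames, and it does not attempt to reproduce the benchmark numbers. The function names, the 5 MiB body size, and the 32ms round-trip time are illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// maxThroughput returns an upper bound, in bytes per second, on what a
// sender can push through a flow-control window of windowBytes when it
// must wait one round trip (rtt) before receiving more credit.
func maxThroughput(windowBytes int, rtt time.Duration) float64 {
	return float64(windowBytes) / rtt.Seconds()
}

// windowsNeeded returns how many full or partial windows it takes to
// send bodyBytes, i.e. a lower bound on the round trips spent waiting.
func windowsNeeded(bodyBytes, windowBytes int) int {
	return (bodyBytes + windowBytes - 1) / windowBytes
}

func main() {
	const body = 5 << 20         // 5 MiB upload (assumed for illustration)
	rtt := 32 * time.Millisecond // assumed round-trip time

	for _, window := range []int{65535, 1 << 20} {
		n := windowsNeeded(body, window)
		fmt.Printf("window=%d: at most %.0f bytes/s, at least %d round trips (~%v)\n",
			window, maxThroughput(window, rtt), n, time.Duration(n)*rtt)
	}
}
```

Under this model the number of round trips shrinks in proportion to the window size, which is consistent in direction with the gap between the 65535 and 1048576 rows in the table, though the real measurements depend on factors the model leaves out.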
CL https://golang.org/cl/37500 mentions this issue.
@autoric, do the above two changes fix the problem? I'm a bit concerned about the number of goroutines (see prior comment) but the flow control problem should be fixed.
Updates http2 to x/net/http2 git rev 906cda9 for:

http2: add configurable knobs for the server's receive window
https://golang.org/cl/37226

http2/hpack: speedup Encoder.searchTable
https://golang.org/cl/37406

http2: Add opt-in option to Framer to allow DataFrame struct reuse
https://golang.org/cl/34812

http2: replace fixedBuffer with dataBuffer
https://golang.org/cl/37400

http2/hpack: remove hpack's constant time string comparison
https://golang.org/cl/37394

Updates #16512
Updates #18404

Change-Id: I1ad7c95c404ead4ced7f85af061cf811b299a288
Reviewed-on: https://go-review.googlesource.com/37500
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@tombergan Thanks for the update! I will run tests in the next couple days and get back to you.
@autoric, any updates? Please try Go master before it becomes Go 1.9beta1 soonish here.
Timeout. I assume this was fixed. Comment if not.
What version of Go are you using (`go version`)?
go version go1.7.4 darwin/amd64
What operating system and processor architecture are you using (`go env`)?
What did you do?
The server code:
Create a data file for testing:
$ mkfile -n 5m 5mb.txt
Using the vegeta load test client:
$ echo PUT https://localhost:3000/ | vegeta attack -insecure -duration=10s -rate=25 -body=5mb.txt | vegeta report
Run the same test against the server running in http/1.1 and http/2 mode.
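For readers who want to set up a comparable A/B test, the following is a minimal sketch of one way to build the same TLS server with and without HTTP/2. It is not the reporter's server code, which is not preserved here: `drainHandler` and `newServer` are hypothetical names, and the sketch uses `io.Discard`, which requires a newer Go release than the go1.7.4 used in this report. The one documented mechanism it relies on is that setting `Server.TLSNextProto` to a non-nil, empty map disables net/http's automatic HTTP/2 support.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
)

// drainHandler reads the whole request body and reports its size.
// It is a hypothetical stand-in for a handler that consumes uploads.
func drainHandler(w http.ResponseWriter, r *http.Request) {
	n, err := io.Copy(io.Discard, r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "read %d bytes\n", n)
}

// newServer builds an *http.Server for addr. When enableHTTP2 is false,
// TLSNextProto is set to a non-nil, empty map, which turns off the
// automatic HTTP/2 upgrade so the server speaks HTTP/1.1 over TLS.
func newServer(addr string, enableHTTP2 bool) *http.Server {
	srv := &http.Server{Addr: addr, Handler: http.HandlerFunc(drainHandler)}
	if !enableHTTP2 {
		srv.TLSNextProto = map[string]func(*http.Server, *tls.Conn, http.Handler){}
	}
	return srv
}

func main() {
	h1 := newServer(":3000", false)
	h2 := newServer(":3000", true)
	fmt.Println("HTTP/1.1-only server overrides TLSNextProto:", h1.TLSNextProto != nil)
	fmt.Println("default server overrides TLSNextProto:", h2.TLSNextProto != nil)
	// To actually serve, call ListenAndServeTLS on one of the servers
	// with your own certificate and key files.
}
```

Running the load test once against each variant keeps the handler, TLS setup, and port identical, so any difference in latency can be attributed to the protocol.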
What did you expect to see?
I would expect the performance (average latency / throughput) of an http/2 server to be similar to or better than that of http/1.1.
What did you see instead?
HTTP/2 results:
HTTP/1.1 results:
Further info:
`ioutil.ReadAll(req.Body)` - then requests are processed quickly (fairly obvious)
Conclusion
Basically, HTTP/2 is merged into core and enabled by default on TLS servers. This leads me to expect that an HTTP/2 server should perform similarly to or better than an HTTP/1.1 server under most conditions. I wish to build performant REST APIs that can sustain reasonable throughput, and HTTP/2 offers a number of features that I had assumed would improve performance.
I'm not clear whether this is a bad assumption, whether I am misusing the APIs somehow, or whether this reflects an issue with the implementation. Any support, information, or recommendations would be greatly appreciated. Thanks!