x/net/http2: Server is slow under load (flow control?) #18404

Closed
erin-noe-payne opened this issue Dec 21, 2016 · 14 comments
Labels: FrozenDueToAge, WaitingForInfo

@erin-noe-payne

erin-noe-payne commented Dec 21, 2016

What version of Go are you using (go version)?

go version go1.7.4 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/erin.noe-payne/local/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"

What did you do?

  • Start an HTTPS server.
  • Put it under some load: 25 req/s for 10 s, each request sending a 5 MB text file.
  • Compare the results for HTTP/1.1 and HTTP/2.

The server code:

package main

import (
	"io/ioutil"
	"log"
	"net/http"
)

type handler struct{}

func (*handler) ServeHTTP(res http.ResponseWriter, req *http.Request) {
	ioutil.ReadAll(req.Body)
	req.Body.Close()
}

func main() {
	server := &http.Server{
		Addr:    ":3000",
		Handler: &handler{},
		// Uncomment to disable HTTP/2 (also import "crypto/tls"):
		// TLSNextProto: make(map[string]func(*http.Server, *tls.Conn, http.Handler)),
	}

	log.Fatal(server.ListenAndServeTLS("tls/server.crt", "tls/server.key"))
}

Create a data file for testing:
$ mkfile -n 5m 5mb.txt

Using the vegeta load test client:
$ echo PUT https://localhost:3000/ | vegeta attack -insecure -duration=10s -rate=25 -body=5mb.txt | vegeta report

Run the same test against the server running in http/1.1 and http/2 mode.

What did you expect to see?

I would expect the performance (average latency / throughput) of an HTTP/2 server to be similar to or better than that of an HTTP/1.1 server.

What did you see instead?

HTTP/2 results:

Making 25 req/s over 10s
Requests      [total, rate]            250, 25.10
Duration      [total, attack, wait]    42.305466622s, 9.959999799s, 32.345466823s
Latencies     [mean, 50, 95, 99, max]  31.880917546s, 33.618733832s, 37.184070581s, 37.822213047s, 38.701059773s
Bytes In      [total, mean]            0, 0.00
Bytes Out     [total, mean]            1310720000, 5242880.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:250

HTTP/1.1 results:

Making 25 req/s over 10s
Requests      [total, rate]            250, 25.10
Duration      [total, attack, wait]    10.044000447s, 9.959999916s, 84.000531ms
Latencies     [mean, 50, 95, 99, max]  105.904846ms, 81.203795ms, 233.098998ms, 315.526192ms, 504.860265ms
Bytes In      [total, mean]            0, 0.00
Bytes Out     [total, mean]            1310720000, 5242880.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:250


Further info:

  • If I reduce the load to very low levels (say 1 req/s), HTTP/2 and HTTP/1.1 are comparable.
  • If I do not read the bytes off the request body - ioutil.ReadAll(req.Body) - then requests are processed quickly (fairly obvious).
  • The memory allocated for the process goes through the roof (~1.5GB) when http/2 is enabled. I am guessing this is because the latency is so high that many requests are being kept open in parallel, and all of their bodies are being held in memory.
  • I ran a trace on the server while running these tests. I don't have a ton of experience using the tracing / pprof tools, but one thing that stands out is the huge number of goroutines spun up:
    [screenshot: goroutine counts from the execution trace]
  • The other is the huge amount of time blocking on a sync.Wait:
    [screenshot: profile showing time blocked on sync.Wait]

Conclusion

Basically, HTTP/2 has been merged into the standard library and is enabled by default on TLS servers. This leads me to expect that an HTTP/2 server should perform similarly to, or better than, an HTTP/1.1 server under most conditions. I wish to build performant REST APIs that can sustain reasonable throughput, and HTTP/2 offers a number of features that I had assumed would improve performance.

I'm not clear whether this is a bad assumption, whether I am misusing the APIs somehow, or whether this reflects an issue with the implementation. Any support, information, or recommendations would be greatly appreciated. Thanks!

@bradfitz
Contributor

This is almost certainly #16512

I think vegeta is starved for flow control tokens.

/cc @tombergan

@bradfitz bradfitz added this to the Go1.9 milestone Dec 21, 2016
@bradfitz bradfitz changed the title from "HTTP/2 is slow under load" to "x/net/http2: Server is slow under load (flow control?)" Dec 21, 2016
@tombergan
Contributor

Also see #17985

@tombergan
Contributor

@bradfitz, I think this is more than #16512. Why are there 2 conn.serve goroutines but 165212 conn.writeFrameAsync goroutines? There should be at most one writeFrameAsync goroutine per serve goroutine. There is either a bug that triggers multiple concurrent frame writes, or a goroutine leak that doesn't clean up writeFrameAsync goroutines. The latter seems likely. Probably this should wait for "wroteFrameCh<-" or doneServing?
https://github.com/golang/net/blob/45e771701b814666a7eb299e6c7a57d0b1799e91/http2/server.go#L640
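
For illustration only, here is a minimal Go sketch of the invariant described above: a serve loop should have at most one writeFrameAsync goroutine in flight, and should only start another after hearing back on wroteFrameCh (or shutting down via doneServing). The names are assumptions modeled on this comment, not the actual x/net/http2 internals.

package main

import "fmt"

type frame struct{ streamID uint32 }

type serverConn struct {
	wantWriteFrameCh chan frame    // frames queued for writing
	wroteFrameCh     chan error    // result of the in-flight async write
	doneServing      chan struct{} // closed when the connection shuts down
	writingFrame     bool          // true while a writeFrameAsync is in flight
}

// writeFrameAsync performs one frame write and reports the result.
// At most one of these should exist per serve goroutine at any time.
func (sc *serverConn) writeFrameAsync(f frame) {
	var err error // a real implementation would write f to the network here
	select {
	case sc.wroteFrameCh <- err:
	case <-sc.doneServing:
		// Without this case, an in-flight writer could block forever on
		// wroteFrameCh after the serve loop exits - the kind of leak
		// suspected from the goroutine counts above.
	}
}

func (sc *serverConn) serve() {
	for {
		select {
		case f := <-sc.wantWriteFrameCh:
			if sc.writingFrame {
				continue // never start a second concurrent write; a real server would queue
			}
			sc.writingFrame = true
			go sc.writeFrameAsync(f)
		case <-sc.wroteFrameCh:
			sc.writingFrame = false // only now may another async write start
		case <-sc.doneServing:
			return
		}
	}
}

func main() {
	fmt.Println("sketch only; see x/net/http2/server.go for the real code")
}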

@erin-noe-payne
Author

Hey guys, thanks for the quick response. This discussion leads to some follow-up questions:

  • Is this the flow-control window size at the connection level, i.e., the connection is effectively throttling throughput?
  • Is this the flow-control window size at the stream level, i.e., the small buffer size is causing thrashing on reads and contention between the various goroutines? Does that even make sense?
  • In either case, what options do I have to fix it? The only configuration I see exposed is the http/2 server's MaxReadFrameSize. Will that affect both connection and stream window sizes?

@bradfitz
Contributor

@autoric, yup, those are the questions this bug is about investigating. We'll probably need better (or automatic) defaults, and more knobs.

@yonderblue

I'm always a fan of the no-knob solution, but if knobs could be offered in the http2 package sooner than the automatic solution, that would be great for us.

@gopherbot

CL https://golang.org/cl/35118 mentions this issue.

@Scratch-net

Maybe it's somehow related to #18309

@gopherbot

CL https://golang.org/cl/37226 mentions this issue.

gopherbot pushed a commit to golang/net that referenced this issue Feb 27, 2017
Upload performance is poor when BDP is higher than the flow-control window.
Previously, the server's receive window was fixed at 64KB, which resulted in
very poor performance for high-BDP links. The receive window now defaults to
1MB and is configurable. The per-connection and per-stream windows are
configurable separately (both default to 1MB as suggested in golang/go#16512).

Previously, the server created a "fixedBuffer" for each request body. This is no
longer a good idea because a fixedBuffer has fixed size, which means individual
streams cannot use varying amounts of the available connection window. To
overcome this limitation, I replaced fixedBuffer with "dataBuffer", which grows
and shrinks based on current usage. The worst-case fragmentation of dataBuffer
is 32KB wasted memory per stream, but I expect that worst-case will be rare.

A slightly modified version of adg@'s grpcbench program shows a dramatic
improvement when increasing from a 64KB window to a 1MB window, especially at
higher latencies (i.e., higher BDPs). Network latency was simulated with netem,
e.g., `tc qdisc add dev lo root netem delay 16ms`.

Duration        Latency Proto           H2 Window

11ms±4.05ms     0s      HTTP/1.1        -
17ms±1.95ms     0s      HTTP/2.0        65535
8ms±1.75ms      0s      HTTP/2.0        1048576

10ms±1.49ms     1ms     HTTP/1.1        -
47ms±2.91ms     1ms     HTTP/2.0        65535
10ms±1.77ms     1ms     HTTP/2.0        1048576

15ms±1.69ms     2ms     HTTP/1.1        -
88ms±11.29ms    2ms     HTTP/2.0        65535
15ms±1.18ms     2ms     HTTP/2.0        1048576

23ms±1.42ms     4ms     HTTP/1.1        -
152ms±0.77ms    4ms     HTTP/2.0        65535
23ms±0.94ms     4ms     HTTP/2.0        1048576

40ms±1.54ms     8ms     HTTP/1.1        -
288ms±1.67ms    8ms     HTTP/2.0        65535
39ms±1.29ms     8ms     HTTP/2.0        1048576

72ms±1.13ms     16ms    HTTP/1.1        -
559ms±0.68ms    16ms    HTTP/2.0        65535
71ms±1.12ms     16ms    HTTP/2.0        1048576

136ms±1.15ms    32ms    HTTP/1.1        -
1104ms±1.62ms   32ms    HTTP/2.0        65535
135ms±0.96ms    32ms    HTTP/2.0        1048576

264ms±0.95ms    64ms    HTTP/1.1        -
2191ms±2.08ms   64ms    HTTP/2.0        65535
263ms±1.57ms    64ms    HTTP/2.0        1048576

Fixes golang/go#16512
Updates golang/go#17985
Updates golang/go#18404

Change-Id: Ied385aa94588337e98dad9475cf2ece2f39ba346
Reviewed-on: https://go-review.googlesource.com/37226
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
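
For anyone else hitting this: the change above exposes the new receive-window sizes as fields on x/net/http2's Server type. Below is a minimal sketch of wiring them up, assuming a golang.org/x/net/http2 revision that includes CL 37226; the 1 MB values simply mirror the new defaults described in the commit message.

package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	srv := &http.Server{
		Addr: ":3000",
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			io.Copy(ioutil.Discard, r.Body) // drain the upload, as in the repro
		}),
	}

	// Raise the server's flow-control windows from the old 64KB default.
	h2 := &http2.Server{
		MaxUploadBufferPerConnection: 1 << 20, // connection-level receive window
		MaxUploadBufferPerStream:     1 << 20, // per-stream receive window
	}
	if err := http2.ConfigureServer(srv, h2); err != nil {
		log.Fatal(err)
	}

	log.Fatal(srv.ListenAndServeTLS("tls/server.crt", "tls/server.key"))
}

Plain net/http servers pick up the larger default window automatically once the bundled http2 is updated (see the vendoring commit below); the explicit configuration is only needed to tune the values.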
@gopherbot

CL https://golang.org/cl/37500 mentions this issue.

@tombergan
Contributor

@autoric, do the above two changes fix the problem? I'm a bit concerned about the number of goroutines (see prior comment) but the flow control problem should be fixed.

gopherbot pushed a commit that referenced this issue Feb 27, 2017
Updates http2 to x/net/http2 git rev 906cda9 for:

http2: add configurable knobs for the server's receive window
https://golang.org/cl/37226

http2/hpack: speedup Encoder.searchTable
https://golang.org/cl/37406

http2: Add opt-in option to Framer to allow DataFrame struct reuse
https://golang.org/cl/34812

http2: replace fixedBuffer with dataBuffer
https://golang.org/cl/37400

http2/hpack: remove hpack's constant time string comparison
https://golang.org/cl/37394

Updates #16512
Updates #18404

Change-Id: I1ad7c95c404ead4ced7f85af061cf811b299a288
Reviewed-on: https://go-review.googlesource.com/37500
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@erin-noe-payne
Author

@tombergan Thanks for the update! I will run tests in the next couple days and get back to you.

@bradfitz
Contributor

@autoric, any updates? Please try Go master before it becomes Go 1.9beta1, which is coming soon.

@bradfitz bradfitz added the WaitingForInfo label May 23, 2017
@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.9 May 23, 2017
@bradfitz
Contributor

bradfitz commented Jun 7, 2017

Timeout. I assume this was fixed. Comment if not.

@bradfitz bradfitz closed this as completed Jun 7, 2017
c3mb0 pushed a commit to c3mb0/net that referenced this issue Apr 2, 2018
@golang golang locked and limited conversation to collaborators Jun 7, 2018