x/net/http2: slow performance on Internet-latency links #17985
Comments
Have you tried to measure raw TLS bandwidth between the two Internet machines? Perhaps the slowness is not caused by http2 at all.
(I meant using the Go crypto/tls package, of course.)
Yes, it is fast: it's fast when I write from the http2 server to the response and read it in the http2 client's resp. Keep in mind we have done it both ways, TCP server running the http2 client on one machine and TCP client running the http2 server on the other, and completely vice versa too. If I print some timings, it appears all the time is spent in writeRequestBody() doing awaitFlowControl() in transport.go. I'm looking at a slowdown where one direction takes 0.7 seconds or less (testing with 10MB) and the other 8 seconds. In that loop in transport.go the read is 0.003s, the awaitFlowControl is 8s, the write to the stream is 0.3s, and the lock taken there is only 0.0007s. But why doesn't it get more flow from the server? The handler on the server is throwing away the data it reads from the request as fast as it can.
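(For readers unfamiliar with that code path: awaitFlowControl blocks the request-body writer until the peer grants more send window via WINDOW_UPDATE frames. The sketch below is a toy model of that gating, not the actual transport.go code; the 8KB-per-round-trip grant mirrors the numbers in this thread, with the RTT scaled down so the demo finishes quickly.)

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// flowGate is a toy model of the transport's per-stream send window:
// take blocks until the peer has granted window, add models an incoming
// WINDOW_UPDATE frame. It is NOT the real x/net/http2 code.
type flowGate struct {
	mu    sync.Mutex
	cond  *sync.Cond
	avail int64
}

func newFlowGate(initial int64) *flowGate {
	g := &flowGate{avail: initial}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// take reserves up to want bytes of send window, blocking while none is
// available; this is the moral equivalent of awaitFlowControl.
func (g *flowGate) take(want int64) int64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	for g.avail == 0 {
		g.cond.Wait()
	}
	if want > g.avail {
		want = g.avail
	}
	g.avail -= want
	return want
}

// add credits the window, as a received WINDOW_UPDATE would.
func (g *flowGate) add(n int64) {
	g.mu.Lock()
	g.avail += n
	g.cond.Broadcast()
	g.mu.Unlock()
}

func main() {
	g := newFlowGate(65535) // HTTP/2 default initial window

	// Simulate a peer that only grants 8 KB of window per round trip.
	// (The real link had a 55ms RTT; 5ms is used here to keep the demo short.)
	const rtt = 5 * time.Millisecond
	go func() {
		for {
			time.Sleep(rtt)
			g.add(8192)
		}
	}()

	start := time.Now()
	remaining := int64(1 << 20) // pretend 1MB request body
	for remaining > 0 {
		remaining -= g.take(remaining)
	}
	fmt.Printf("window for the whole body granted after %v (dominated by waiting, not copying)\n",
		time.Since(start))
}
```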
I updated the code example above to one that reproduces the issue for me. If you run the two programs on two machines over a WAN (on a good pipe), it performs very badly. This is a serious issue for us; if someone has some free time, we would greatly appreciate it.
Yet this program does not show the issue, and I believe I am simulating latency correctly:
combined.go
Here is a bit from the middle of the GODEBUG=http2debug=1 output:
server.go
client.go
Seems like a lot of window updates of 1 or 2 and a lot of transmissions of 1 or 2 bytes??
Can I poke for some attention? :)
Maybe @bradfitz can find some time to look into this. I took a short look at this but couldn't reproduce it. As far as I understand the http2 server code, the server (client.go) will send a window update every time it reads from the request, with the same window size as the number of bytes read (in my case always 8192, which is probably just the length of the slice passed to the io.Reader). I haven't looked at the client side (server.go) yet, as I don't currently have that much time.
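(The thread's client.go is not reproduced here. A hypothetical handler along these lines would produce that pattern: each successful Read of the request body returns that many bytes to the http2 server's flow-control accounting, so an 8 KB read buffer shows up as 8192-byte WINDOW_UPDATE frames in the http2debug output.)

```go
package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/upload", func(w http.ResponseWriter, r *http.Request) {
		buf := make([]byte, 8192) // the slice handed to r.Body.Read
		var total int64
		for {
			n, err := r.Body.Read(buf)
			total += int64(n) // discard the data, just count it
			if err == io.EOF {
				break
			}
			if err != nil {
				log.Printf("read body: %v", err)
				return
			}
		}
		log.Printf("discarded %d bytes", total)
	})

	// cert.pem / key.pem are placeholder certificate files; HTTP/2 is
	// negotiated automatically over TLS by net/http.
	log.Fatal(http.ListenAndServeTLS(":8443", "cert.pem", "key.pem", nil))
}
```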
We have tested this on a number of different networks, all with very high speeds (> 12MB/s both ways), and the problem persists with the above client.go and server.go, where we get a max of 2 MB/s.
Doesn't the fact that the server returns a window update of exactly the size it just read mean that it ping-pongs and is at the whim of latency? I have tried making some edits to send an arbitrarily large window update to diagnose this, but I kept getting errors since the code doesn't expect that. I also have that combined.go above that simulates latency and doesn't show an issue.
This seems like a fun mystery, but I don't have time to investigate this before family turkey time. And Go 1.8 (or betas of it) are coming soon, so it seems unlikely we can fix it by then. I'll retarget this to Go 1.9, but it could be fixed outside of a release sooner (in x/net). Maybe related: #16512

What would be motivating would be a self-contained unit test (or program) that exhibited the problem, listening on "127.0.0.1:0" and such, picking free ports, so the program (or test) always worked. A test is probably best, since we'd need one in the fix later anyway. /cc @tombergan
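(A self-contained test along those lines might look like the sketch below. It uses httptest.Server.EnableHTTP2 and ts.Client(), which postdate the Go version in this thread, and on loopback there is no latency, so something like netem would still be needed before it could actually reproduce the slowdown.)

```go
package upload

import (
	"bytes"
	"io"
	"io/ioutil"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"
)

// TestUploadThroughput listens on a loopback port chosen by the OS, so it
// always works, and times a large upload over HTTP/2.
func TestUploadThroughput(t *testing.T) {
	ts := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(ioutil.Discard, r.Body) // discard the upload as fast as possible
	}))
	ts.EnableHTTP2 = true
	ts.StartTLS()
	defer ts.Close()

	body := bytes.Repeat([]byte("x"), 10<<20) // 10MB payload
	start := time.Now()
	resp, err := ts.Client().Post(ts.URL, "application/octet-stream", bytes.NewReader(body))
	if err != nil {
		t.Fatal(err)
	}
	resp.Body.Close()
	elapsed := time.Since(start)
	t.Logf("uploaded %d bytes in %v (%.1f MB/s)",
		len(body), elapsed, float64(len(body))/elapsed.Seconds()/(1<<20))
}
```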
As a sanity check, can you run the same program with http1 instead of http2 (note GODEBUG=http2client=0,http2server=0) and report the execution times for both http1 and http2? (It's helpful if you can report these times in bytes/sec.)
What is the expected bandwidth of this link?
Breaking down the tower of buffers: The http2 server copies the request body into ioutil.Discard. Each successful Read during this copy generates a window update. The copy uses an 8KB byte buffer, which explains why the server (generally) sends 8192-byte window updates. However, when the server's http handler reads from resp.Body, it actually reads from a fixed buffer of size 65535 bytes. This is the server's receive window size. Note that 8KB doesn't divide evenly into 65535 -- this explains why we sometimes see small WINDOW_UPDATEs. Small WINDOW_UPDATEs are not necessarily a bad thing, as long as we can send a whole bunch of them in sequence.

The problem is almost certainly #16512, at least in part. You didn't state the bandwidth of your connection. If we assume 10Mbit/sec at 55ms latency, the BDP is ~68KB (slightly higher than the server's 64KB window). If we assume 100Mbit/sec at 55ms latency, the BDP is ~680KB (much higher than the server's receive window).

Can you also try running with the following patch? (warning: not tested)
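(The patch referred to above isn't reproduced here. As a sanity check on those figures, the bandwidth-delay-product arithmetic and the throughput ceiling imposed by a fixed 64KB window can be computed directly; the bandwidths below are the assumed values from the comment.)

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		rtt           = 55 * time.Millisecond // measured latency on this link
		initialWindow = 65535.0               // HTTP/2 default receive window, in bytes
	)

	// Bandwidth-delay product: bytes that must be in flight to fill the pipe.
	for _, mbit := range []float64{10, 100} {
		bytesPerSec := mbit * 1e6 / 8
		bdp := bytesPerSec * rtt.Seconds()
		fmt.Printf("%3.0f Mbit/s x %v  ->  BDP ~%.0f KB\n", mbit, rtt, bdp/1024)
	}

	// With a fixed 64KB window, at most 64KB can be unacknowledged at any
	// time, so upload throughput is capped at roughly window/RTT.
	ceiling := initialWindow / rtt.Seconds()
	fmt.Printf("65535-byte window / %v  ->  at most ~%.1f MB/s\n", rtt, ceiling/(1<<20))
}
```

The last line of output, roughly 1.1 MB/s, matches the consistent 1.1MB/s reported later in this thread.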
@tombergan I tried the changes, no effect. The time I am capturing for my client.go is timing
I do not know what's happening. Can you run the same experiment with both http1 and http2 and report the upload speed you see in both cases? We should verify the problem is actually in http2 and not a lower layer.
If I do just a straight ListenAndServe in my server.go, a POST with the 10MB body in my client.go, and discard the body in server.go, it completes in 1.8s, so > 5MB/s.
So if I take your changes and put a 1 second sleep and log in front of the  It seems correct; however, it is very different when I remove the sleep. I know you said the bunches of small window updates are not harmful, but that is clearly what it is waiting on, as they multiply tons compared to the version with the sleep and correspond with the framer writes. Is there a way to line these buffers up to eliminate the 1's and 2's?
Ah! First time I have seen it fast: 569ms. So with your changes and just hacking in 16000 or so at  Sometimes it fails with
You see 4.7x faster performance with http1 than http2. Your expected bandwidth is about 5.55MB/s, and latency is 55ms, so the BDP of this link is about 320KB, which is ~4.9x bigger than the default flow window of 64KB. This means the performance difference can be entirely explained by a low flow-control window. For some reason, my above patch did not successfully increase the flow window from 64KB to 1GB. I think I figured out the problem: in the patch above, I forgot that SettingInitialWindowSize applies to stream-level flow control only; it leaves the connection flow window at 64KB. What if you use my patch above (and nothing else) and also add the following to serverConn.serve()?

    // At line 694, just before the call to sc.readPreface()
    sc.sendWindowUpdate(nil, serverInitialWindowSize)

If this works, I will be convinced that this bug is a dup of #16512. Brad and I will chat at some point about the best way to fix this more permanently.
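(sc.sendWindowUpdate is internal to x/net/http2 and can't be called from outside the package. For readers following along, the stream-level versus connection-level distinction is just the stream ID carried in the WINDOW_UPDATE frame, which the exported Framer API makes easy to see; the program below is purely illustrative and not part of any suggested fix.)

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"golang.org/x/net/http2"
)

func main() {
	// Stream ID 0 credits the whole connection; a nonzero ID credits
	// only that stream. Write two updates and read them back.
	var buf bytes.Buffer
	fr := http2.NewFramer(&buf, &buf)

	if err := fr.WriteWindowUpdate(0, 1<<20); err != nil { // connection-level
		log.Fatal(err)
	}
	if err := fr.WriteWindowUpdate(1, 1<<20); err != nil { // stream 1 only
		log.Fatal(err)
	}

	for i := 0; i < 2; i++ {
		f, err := fr.ReadFrame()
		if err != nil {
			log.Fatal(err)
		}
		wu := f.(*http2.WindowUpdateFrame)
		fmt.Printf("WINDOW_UPDATE stream=%d increment=%d\n", wu.StreamID, wu.Increment)
	}
}
```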
@tombergan Yes, if I make that change it is fast; however, it required one extra change to comment out  This represents a 16X (!) difference, from a pretty consistent 1.1MB/s to 17MB/s on this specific server; on other servers it will be even bigger.
With your last change I have to comment out  Still have the STREAM_CLOSED issue from above, though. It comes out at the bottom of the if statement
D'oh. Sorry for all the bugs. This is what I get for trying to program in GitHub comments. I think that should have been:

    sc.sendWindowUpdate(nil, serverInitialWindowSize - initialWindowSize)

There may still be more bugs ... I haven't had time to patch in the change myself.
I get the same error with that :( #16481 seems to be very similar to my error?
Any idea when this would get worked on in the dev cycle, since a commit in the net repo could be used as soon as it's ready?
I will try to get to this in the first couple weeks of January. Can you verify that the programs in the original comment compile against Go 1.8 and exhibit the problem? Thanks.
Using this patch https://go-review.googlesource.com/#/c/35118/ and the rest of the http2 pkg at that commit, and setting InitialWindowSize to 10 megs or higher, resulted in a 6.7s time versus 10.6s without. This is the same test as above.
Any loving care here? :)
CL https://golang.org/cl/37226 mentions this issue.
Upload performance is poor when the BDP is higher than the flow-control window. Previously, the server's receive window was fixed at 64KB, which resulted in very poor performance for high-BDP links. The receive window now defaults to 1MB and is configurable. The per-connection and per-stream windows are configurable separately (both default to 1MB as suggested in golang/go#16512).

Previously, the server created a "fixedBuffer" for each request body. This is no longer a good idea because a fixedBuffer has fixed size, which means individual streams cannot use varying amounts of the available connection window. To overcome this limitation, I replaced fixedBuffer with "dataBuffer", which grows and shrinks based on current usage. The worst-case fragmentation of dataBuffer is 32KB wasted memory per stream, but I expect that worst case will be rare.

A slightly modified version of adg@'s grpcbench program shows a dramatic improvement when increasing from a 64KB window to a 1MB window, especially at higher latencies (i.e., higher BDPs). Network latency was simulated with netem, e.g., `tc qdisc add dev lo root netem delay 16ms`.

    Duration       Latency  Proto     H2 Window
    11ms±4.05ms    0s       HTTP/1.1  -
    17ms±1.95ms    0s       HTTP/2.0  65535
    8ms±1.75ms     0s       HTTP/2.0  1048576
    10ms±1.49ms    1ms      HTTP/1.1  -
    47ms±2.91ms    1ms      HTTP/2.0  65535
    10ms±1.77ms    1ms      HTTP/2.0  1048576
    15ms±1.69ms    2ms      HTTP/1.1  -
    88ms±11.29ms   2ms      HTTP/2.0  65535
    15ms±1.18ms    2ms      HTTP/2.0  1048576
    23ms±1.42ms    4ms      HTTP/1.1  -
    152ms±0.77ms   4ms      HTTP/2.0  65535
    23ms±0.94ms    4ms      HTTP/2.0  1048576
    40ms±1.54ms    8ms      HTTP/1.1  -
    288ms±1.67ms   8ms      HTTP/2.0  65535
    39ms±1.29ms    8ms      HTTP/2.0  1048576
    72ms±1.13ms    16ms     HTTP/1.1  -
    559ms±0.68ms   16ms     HTTP/2.0  65535
    71ms±1.12ms    16ms     HTTP/2.0  1048576
    136ms±1.15ms   32ms     HTTP/1.1  -
    1104ms±1.62ms  32ms     HTTP/2.0  65535
    135ms±0.96ms   32ms     HTTP/2.0  1048576
    264ms±0.95ms   64ms     HTTP/1.1  -
    2191ms±2.08ms  64ms     HTTP/2.0  65535
    263ms±1.57ms   64ms     HTTP/2.0  1048576

Fixes golang/go#16512
Updates golang/go#17985
Updates golang/go#18404

Change-Id: Ied385aa94588337e98dad9475cf2ece2f39ba346
Reviewed-on: https://go-review.googlesource.com/37226
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
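(For reference, the knobs that change exposes are exported fields on http2.Server; in the current x/net/http2 they are MaxUploadBufferPerConnection and MaxUploadBufferPerStream. A minimal sketch of opting into a larger receive window, with arbitrary sizes and placeholder certificate paths:)

```go
package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/upload", func(w http.ResponseWriter, r *http.Request) {
		n, _ := io.Copy(ioutil.Discard, r.Body)
		log.Printf("received %d bytes", n)
	})

	srv := &http.Server{Addr: ":8443", Handler: mux}

	// Raise the server's receive window above the 1MB default so a single
	// large upload on a high-BDP link isn't throttled by flow control.
	// The 8MB figures are arbitrary examples, not recommendations.
	if err := http2.ConfigureServer(srv, &http2.Server{
		MaxUploadBufferPerConnection: 8 << 20,
		MaxUploadBufferPerStream:     8 << 20,
	}); err != nil {
		log.Fatal(err)
	}

	// cert.pem / key.pem are placeholder file names.
	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}
```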
@Gaillard I believe this is fixed. Please report back if not.
Yup, issue fixed :D Thanks guys!
Original issue description:
amd64 go1.7.3 and a vendored http2 at master.
I'm doing something a little strange and running into a performance problem.
I have a client and a server. Once the client connects to the server, it then starts serving itself on that connection, and the server makes calls to the client, all through http2.
Example programs are below, but normally I am testing this between two machines over the internet, and that is where things are slower than expected.
On a LAN it is very fast; if you run the programs below it takes 371ms, or about 25MB/s, which isn't bad. But over the internet, even when scp between the two boxes is fast in both directions, this is still slow.
server.go
client.go
It turns out that the path where the http2 client writes into a request body and the http2 server reads from the request is slow, regardless of whether the http2 server is on the server end of the TCP connection or the client end. This is between two machines with 55ms latency. scp between the two machines is fast in both directions, as is the test over two machines on a LAN.
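(The server.go and client.go above aren't reproduced here. A rough sketch of the reversed arrangement being described, using the exported x/net/http2 API, is below; all names, addresses, and the /upload path are made up, and a real setup would also need certificates and NextProtos: []string{"h2"} on both TLS configs.)

```go
package main

import (
	"crypto/tls"
	"io"
	"io/ioutil"
	"log"
	"net"
	"net/http"

	"golang.org/x/net/http2"
)

// runDialingSide connects out, then serves HTTP/2 on that one connection.
func runDialingSide(addr string, tlsConf *tls.Config) error {
	conn, err := tls.Dial("tcp", addr, tlsConf)
	if err != nil {
		return err
	}
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n, _ := io.Copy(ioutil.Discard, r.Body) // drain the uploaded body
		log.Printf("handled %d byte upload", n)
	})
	(&http2.Server{}).ServeConn(conn, &http2.ServeConnOpts{Handler: handler})
	return nil
}

// runAcceptingSide accepts one connection, then issues requests over it,
// so the TCP/TLS server acts as the HTTP/2 client.
func runAcceptingSide(ln net.Listener, body io.Reader) error {
	conn, err := ln.Accept()
	if err != nil {
		return err
	}
	cc, err := (&http2.Transport{}).NewClientConn(conn)
	if err != nil {
		return err
	}
	req, err := http.NewRequest("POST", "https://reversed/upload", body)
	if err != nil {
		return err
	}
	resp, err := cc.RoundTrip(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(ioutil.Discard, resp.Body)
	return err
}

func main() {
	log.Println("sketch only; wire up real TLS certificates and addresses to run")
}
```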
I've tried changing the following constants in the http2 code, to no effect: handlerChunkWriteSize, transportDefaultStreamMinRefresh, defaultMaxReadFrameSize, initialWindowSize, initialHeaderTableSize, and initialMaxFrameSize.
https://groups.google.com/forum/#!topic/golang-nuts/M0gPiyAGVG0