
x/net/http2: one slow stream may block all other streams in the same connection #16510

Closed
wujieliulan opened this issue Jul 27, 2016 · 5 comments


@wujieliulan commented Jul 27, 2016

Ref: #16481

h2test.tar.gz
Here is a test that reproduces the bug; you can also change the bad handler (h2bad) into a slow consumer instead of one that stalls:

func h2good(w http.ResponseWriter, r *http.Request) {
    // Slow consumer: drain the request body 10 bytes at a time,
    // sleeping one second between reads.
    buf := make([]byte, 10)
    for {
        n, err := r.Body.Read(buf)
        if err != nil {
            return
        }
        time.Sleep(1 * time.Second)
        log.Println("h2good read", n)
    }
}
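
For reference, a minimal sketch of what the stalled handler could look like (h2bad in the attached archive is not reproduced here, so this is an assumption based on the description above):

func h2bad(w http.ResponseWriter, r *http.Request) {
    // Hypothetical stalled consumer: never read r.Body, so the stream's
    // flow-control window is never replenished. Once the shared
    // connection-level window is exhausted as well, every other stream
    // on the same connection stalls with it.
    select {} // block forever
}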

From RFC 7540, Section 5.2 (Flow Control):
"A flow-control scheme ensures that streams on the same connection do not destructively interfere with each other."

This is a real-world case: a stream with a slow consumer (for example, a handler that cannot read data fast enough) can block all other streams in the same connection, even when the connection is wide open and has plenty of available bandwidth.

Here is a suggested fix:

diff --git a/http2/server.go b/http2/server.go
index f368738..b924c3d 100644
--- a/http2/server.go
+++ b/http2/server.go
@@ -1277,6 +1277,9 @@ func (sc *serverConn) processSettingInitialWindowSize(val uint32) error {

 func (sc *serverConn) processData(f *DataFrame) error {
        sc.serveG.check()
+       if len(f.Data()) > 0 {
+               sc.sendWindowUpdate(nil, len(f.Data())) // conn-level
+       }
        // "If a DATA frame is received whose stream is not in "open"
        // or "half closed (local)" state, the recipient MUST respond
        // with a stream error (Section 5.4.2) of type STREAM_CLOSED."
@@ -1764,7 +1767,7 @@ func (sc *serverConn) noteBodyReadFromHandler(st *stream, n int) {

 func (sc *serverConn) noteBodyRead(st *stream, n int) {
        sc.serveG.check()
-       sc.sendWindowUpdate(nil, n) // conn-level
+       //sc.sendWindowUpdate(nil, n) // conn-level
        if st.state != stateHalfClosedRemote && st.state != stateClosed {
                // Don't send this WINDOW_UPDATE if the stream is closed
                // remotely.

The problem is that the current implementation sends the window updates for both the connection level and the stream level only when data is read by the handler. The correct approach is to send the stream-level window update when data is read by the handler, but to send the connection-level window update as soon as a data frame is received, regardless of whether that data is delivered or discarded.

@bradfitz (Contributor)

Go is following the spec. There is both connection-level and stream-level flow control, and Go is respecting both. Your proposed patch just ignores connection-level flow control entirely. That isn't correct.

When I asked you to file the bug, the question was about whether our default values for stream- and connection-level flow should be different. I forget what they are, and I don't know offhand what other servers are using by default. It's possible we need to make our connection-level ones be some factor larger than our stream-level ones, so it would take N blocked streams to block the whole connection.

But we're not going to ignore connection-level flow control and infinitely buffer all blocked streams' data in memory.
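
The x/net/http2 server later grew knobs for exactly this ratio. A minimal sketch, assuming a version of golang.org/x/net/http2 that exposes the MaxUploadBufferPerConnection and MaxUploadBufferPerStream server fields (neither existed when this issue was filed; the cert/key paths are placeholders):

package main

import (
    "log"
    "net/http"

    "golang.org/x/net/http2"
)

func main() {
    srv := &http.Server{Addr: ":8443"}
    h2 := &http2.Server{
        // Stream-level window: 1 MiB per stream.
        MaxUploadBufferPerStream: 1 << 20,
        // Connection-level window: 8x larger, so roughly eight fully
        // blocked streams are needed to stall the whole connection.
        MaxUploadBufferPerConnection: 8 << 20,
    }
    if err := http2.ConfigureServer(srv, h2); err != nil {
        log.Fatal(err)
    }
    log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}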

@bradfitz added this to the Unreleased milestone Jul 27, 2016
@bradfitz changed the title from "x/net/http2: one slow stream can block all other streams in the same connection" to "x/net/http2: investigate Server default stream & conn flow control values" Jul 27, 2016
@bradfitz added the NeedsInvestigation label Jul 27, 2016
@wujieliulan (Author) commented Jul 27, 2016

I think that is the correct way to respect both connection-level and stream-level flow control. The values for streams and for the connection can be the same or different, but I think there is an implementation logic bug. When data is received on the connection, the connection-level window should be reopened for other streams; it should not wait until the data is read by the handler. Whether the handler reads the data is not a connection-level concern; it is a stream-level issue. Please comment on what potential issues you can think of with that patch.

Regarding the default values, the connection-level window size should be based on the bandwidth-delay product. If it is too small, it will underutilize the bandwidth (idling while waiting for the window to reopen, once per RTT). If it is too big, data will be buffered by TCP, which may delay or block control frames or other streams. Ideally we would start with a reasonable default (64K?) and adjust it based on the measured RTT and estimated bandwidth, though I am not sure we want to be that optimized; otherwise we can just pick a reasonable default. It may also help to expose a knob for users to control it, with the RTT and estimated bandwidth made available to them.
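
As a back-of-the-envelope illustration of the bandwidth-delay product sizing (the link numbers below are hypothetical, not measurements from this issue):

// Hypothetical link: 100 Mbit/s with a 50 ms RTT.
const (
    linkBytesPerSec = 100e6 / 8                // 12.5 MB/s
    rttSec          = 0.050                    // 50 ms
    bdp             = linkBytesPerSec * rttSec // 625,000 bytes ≈ 610 KiB
)
// A 64 KiB window on this link caps throughput at about
// 64 KiB / 50 ms ≈ 1.3 MB/s, roughly a tenth of the link capacity.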

The stream-level default should be the same as the connection-level one (equal to the bandwidth-delay product) if memory is not an issue. However, if we do want to limit server memory, we could limit the stream-level window size and the maximum number of streams, at the cost of each stream not fully utilizing the available bandwidth.

I think this is not an issue of the server's default stream & conn flow control values; it is a flow control bug, so I changed the title back.

@wujieliulan changed the title back to "x/net/http2: one slow stream may block all other streams in the same connection" Jul 27, 2016
@bradfitz (Contributor)

"Whether the handler reads the data is not a connection-level concern; it is a stream-level issue. Please comment on what potential issues you can think of with that patch."

The problem with immediately returning all conn-level flow control is that it then means you can buffer up to SETTINGS_MAX_CONCURRENT_STREAMS * SETTINGS_INITIAL_WINDOW_SIZE bytes, rather than the value of the connection-level flow control.
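
To put rough numbers on that worst case (both defaults below are assumptions, not values confirmed in this thread):

// Assumed defaults:
//   SETTINGS_MAX_CONCURRENT_STREAMS = 250    (x/net/http2 server default)
//   SETTINGS_INITIAL_WINDOW_SIZE    = 65535  (RFC 7540 default)
//
// With the proposed patch, a peer could make the server buffer up to
//   250 * 65535 bytes ≈ 15.6 MiB per connection,
// versus the ~64 KiB bound that the connection-level window enforces today.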

In your view, what is the purpose of the connection-level flow control if you don't think there should be any pushback on it? As implemented, if handlers don't read data, it is buffered up to the flow-control limits and the WINDOW_UPDATE isn't refreshed, which creates pushback telling the peer to stop sending. With your patch, connection-level flow control is ignored and there is no connection-wide pushback; only stream-level flow control is enforced.

The reason I said it's a question of defaults is because we likely need a larger connection-level flow window.

Since you want to retitle this bug, I opened #16512 for investigating the flow control values.

@bradfitz added the WaitingForInfo label and removed the NeedsInvestigation label Jul 27, 2016
@wujieliulan (Author)

Thanks for your comments. Now I understand that conn-level flow control is used to limit the total memory per connection, which can be smaller than max_streams * stream_window_size. In that case it makes sense to have a larger conn-level flow window. It might be helpful to add an option, since different users may have different constraints.
I found a QUIC flow control doc we could reference:
https://docs.google.com/document/d/1F2YfdDXKpy20WVKJueEf4abn_LVZHhMUMS5gX6Pgjl4/edit?usp=sharing
Please feel free to change the title if you like.

@bradfitz removed the WaitingForInfo label Aug 16, 2016
@bradfitz (Contributor)

Closing in favor of #16512
