
x/net/http2: slow streams can potentially block other faster streams #54330

Open
jronak opened this issue Aug 7, 2022 · 5 comments · May be fixed by golang/net#144
Labels
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone
Backlog

Comments


jronak commented Aug 7, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18.3 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/ronakj/Library/Caches/go-build"
GOENV="/Users/ronakj/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/ronakj/gocode/pkg/mod"
GONOPROXY="none"
GONOSUMDB="*"
GOOS="darwin"
GOPATH="/Users/ronakj/gocode"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.18.3"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/ronakj/project/http2-issue-repro/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/4d/2jw_2tc15x339gr53x6k64hm0000gn/T/go-build1453600270=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I'm using a Go HTTP/2 (h2c) server as an echo server: the handler writes the request body back in the response immediately for requests carrying the priority header request-type=priority, and otherwise delays the response by 5s. I use the HTTP/2 client to dispatch two concurrent requests with a 2MB payload each, one of which has the priority header set.

What did you expect to see?

The priority response should arrive immediately, and the non-priority response should arrive after a 5s delay.
According to the HTTP/2 RFC, one stream must not block another stream on the same connection.

What did you see instead?

Both the non-priority and priority responses arrived after 5s. This should not have happened, as HTTP/2 streams on the same connection must not interfere with each other.

Repro-
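The original repro isn't included here; below is a minimal sketch of the setup described above, assuming the details given in this issue (h2c server on :8081, request-type=priority header, 2MB payloads). The 100ms stagger between the two requests and the exact logging are assumptions made for the sketch, not taken from the original repro.

```go
// Minimal sketch of the described setup; dispatch order and timings are assumptions.
package main

import (
	"bytes"
	"crypto/tls"
	"io"
	"log"
	"net"
	"net/http"
	"time"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	// Echo handler: respond immediately for priority requests, otherwise wait 5s
	// before reading (and echoing back) the request body.
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("request-type") != "priority" {
			time.Sleep(5 * time.Second)
		}
		io.Copy(w, r.Body)
	})
	srv := &http.Server{Addr: ":8081", Handler: h2c.NewHandler(handler, &http2.Server{})}
	go func() { log.Fatal(srv.ListenAndServe()) }()
	log.Println("H2c Server starting on :8081")

	// h2c client: HTTP/2 over cleartext TCP.
	client := &http.Client{Transport: &http2.Transport{
		AllowHTTP: true,
		DialTLS: func(network, addr string, _ *tls.Config) (net.Conn, error) {
			return net.Dial(network, addr)
		},
	}}

	payload := bytes.Repeat([]byte("x"), 2<<20) // 2MB body per request

	do := func(label string, priority bool) {
		req, err := http.NewRequest("POST", "http://localhost:8081/", bytes.NewReader(payload))
		if err != nil {
			log.Fatal(err)
		}
		if priority {
			req.Header.Set("request-type", "priority")
		}
		start := time.Now()
		resp, err := client.Do(req)
		if err != nil {
			log.Fatal(err)
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		log.Printf("%s req response time: %v", label, time.Since(start).Round(time.Second))
	}

	time.Sleep(100 * time.Millisecond) // give the listener a moment to start

	// Dispatch the slow request first so it soaks up the connection-level
	// flow-control window, then send the priority request on the same connection.
	done := make(chan struct{})
	go func() { do("low priority", false); close(done) }()
	time.Sleep(100 * time.Millisecond)
	do("high priority", true) // expected ~0s; observed ~5s with default settings
	<-done
}
```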

Root Cause

The Go HTTP/2 server returns connection-level flow-control bytes to the client only after the application handler has read the body bytes. When a few slow or inactive streams consume the entire connection window, the client is blocked from writing DATA frames for other streams due to the lack of flow-control window.

@gopherbot

Change https://go.dev/cl/421975 mentions this issue: http2: send client conn flow control bytes back immediately

thanm added this to the Backlog milestone Aug 9, 2022
thanm added the NeedsInvestigation label Aug 9, 2022
thanm (Contributor) commented Aug 9, 2022

@neild

neild (Contributor) commented Aug 9, 2022

HTTP/2 has both per-connection and per-stream flow control. The behavior you're seeing is because the slow stream has consumed not just its own flow-control tokens, but the connection tokens as well. To ensure that a single stream can't consume the connection tokens, you can configure the per-stream limit to something lower than the per-connection limit.

The relevant knobs are MaxUploadBufferPerConnection and MaxUploadBufferPerStream on http2.Server.

Here's your example modified to set a higher per-connection limit:
https://go.dev/play/p/6H7_E1f7yrH

H2c Server starting on :8081
high priority req response time: 0s
low priority req response time: 5s
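
For reference, those knobs slot into the server setup like this; the values below are illustrative assumptions, not the exact ones from the playground link:

```go
// Illustrative values only; see the playground link above for the modified example.
// Keeping the per-connection window larger than the per-stream window prevents a
// single slow stream from exhausting the connection-level flow-control tokens.
h2s := &http2.Server{
	MaxUploadBufferPerConnection: 4 << 20, // connection-level window (4MB, illustrative)
	MaxUploadBufferPerStream:     1 << 20, // per-stream window (1MB, illustrative)
}
srv := &http.Server{Addr: ":8081", Handler: h2c.NewHandler(handler, h2s)}
```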

jronak (Author) commented Aug 12, 2022

Thanks for the reply!

That's correct: the connection tokens got exhausted, which stalled other streams on the same connection. While the approach you suggested definitely works, that limit is almost always easy to exhaust when the server has a high number of concurrent requests. I have an L7 proxy which forwards incoming requests to different destinations. When one of the destinations, A, is unhealthy or slow, requests bound for A start backing up in the client connection pipeline and consume the connection flow-control tokens. Since the connection flow-control tokens are limited, other requests bound for different destinations start starving for them.

The example you shared can be updated with a larger number of concurrent streams to reproduce the issue again.

Surprisingly, we didn't face this issue while using the gRPC-go server, and after diving in I found that the gRPC-go HTTP/2 server returns connection flow-control tokens immediately on receiving a DATA frame. Source.

neild (Contributor) commented Aug 12, 2022

You're describing having only per-stream flow control, with no per-connection flow control.

If that's what you want, then you can set MaxUploadBufferPerConnection to MaxUploadBufferPerStream*MaxConcurrentStreams, which will effectively disable connection-level flow control.
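
A rough sketch of that configuration (the concrete numbers are illustrative assumptions, not recommendations):

```go
// Sizing the connection window to cover every stream's window means
// connection-level flow control is effectively never the limiting factor.
const (
	perStreamWindow   = 1 << 20 // MaxUploadBufferPerStream (1MB, illustrative)
	concurrentStreams = 250     // MaxConcurrentStreams (illustrative)
)
h2s := &http2.Server{
	MaxConcurrentStreams:         concurrentStreams,
	MaxUploadBufferPerStream:     perStreamWindow,
	MaxUploadBufferPerConnection: perStreamWindow * concurrentStreams,
}
```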

I don't understand what purpose connection-level flow control is supposed to serve if tokens are returned immediately without any pushback mechanism. Perhaps I'm missing something.
