
x/net/http2: transport race in closing stream while processing DATA frame, kills connection. #47076

Open
amandachow opened this issue Jul 7, 2021 · 4 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@amandachow

What version of Go are you using (go version)?

$ go version
go1.16.5 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/amandachow/Library/Caches/go-build"
GOENV="/Users/amandachow/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/amandachow/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/amandachow/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16.5"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_f/1tl8g2w95wx34b7zp53y0kqc0000gp/T/go-build310981905=/tmp/go-build -gno-record-gcc-switches -fno-common"

What's the issue?

I use httputil.ReverseProxy with http2.Transport. Downstream clients were hitting what looked like connection flakes: either their streams were closed with RST_STREAM, or they received an error response from our ErrorHandler. After some investigation, I found a race condition: when a stream is closed on the http2.Server side, the corresponding stream on the http2.Transport side can be closed while a DATA frame for it is still being processed. This causes the entire TCP client connection to error and close. It is the same issue described here: https://go.googlesource.com/net/+/6c4ac8bdbf06a105c4baf3dcda28bd9b0fb15588

The stream is getting closed between the ClientConn unlock and the bufPipe write. The write to the now-closed bufPipe fails, and as a result the entire ClientConn is closed, erroring out every other open stream on it.

I tried holding the lock across the bufPipe write. I made the following change, ran it in production, and the errors stopped:

```diff
                // Check connection-level flow control.
                cc.mu.Lock()
+               defer cc.mu.Unlock()
                if cs.inflow.available() >= int32(f.Length) {
                        cs.inflow.take(int32(f.Length))
                } else {
-                       cc.mu.Unlock()
                        return ConnectionError(ErrCodeFlowControl)
                }
                // Return any padded flow control now, since we won't
@@ -2246,11 +2246,13 @@ func (rl *clientConnReadLoop) processData(f *DataFrame) error {
                        cc.bw.Flush()
                        cc.wmu.Unlock()
                }
-               cc.mu.Unlock()
                if len(data) > 0 && !didReset {
                        if _, err := cs.bufPipe.Write(data); err != nil {
                                rl.endStreamError(cs, err)
                                return err
                        }
```
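The essence of the change above is that the flow-control check and the buffered write become one critical section. A minimal hypothetical sketch of that pattern (invented `stream` type, not the transport's actual code): close and write share one mutex, so a close can no longer slip in between the check and the write; a late write is cleanly rejected instead of failing mid-flight:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// stream is a hypothetical stand-in for the transport-side stream.
type stream struct {
	mu     sync.Mutex
	closed bool
	buf    []byte
}

var errStreamClosed = errors.New("stream closed")

func (s *stream) writeData(data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock() // lock held across the whole check-and-write
	if s.closed {
		return errStreamClosed
	}
	s.buf = append(s.buf, data...)
	return nil
}

func (s *stream) close() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.closed = true
}

func main() {
	s := &stream{}
	if err := s.writeData([]byte("DATA")); err == nil {
		fmt.Println("buffered", len(s.buf), "bytes")
	}
	s.close()
	// A close is now observed atomically by the write path: the late
	// write is rejected with a stream-level error rather than an error
	// that tears down the whole connection.
	if err := s.writeData([]byte("DATA")); err != nil {
		fmt.Println("late write rejected:", err)
	}
}
```

The trade-off of `defer cc.mu.Unlock()` is that the mutex is held for longer (including across the write), which narrows the window for concurrency but serializes the close and data paths correctly.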
@mknyszek mknyszek changed the title x/net/http2: Transport race in closing stream while processing DATA frame, kills connection. x/net/http2: transport race in closing stream while processing DATA frame, kills connection. Jul 7, 2021
@mknyszek
Contributor

mknyszek commented Jul 7, 2021

CC @neild via https://dev.golang.org/owners

@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 7, 2021
@mknyszek mknyszek added this to the Unreleased milestone Jul 7, 2021
@jared2501

@neild quick ping on this issue! We're still seeing this in our PROD environments.

@amandachow
Author

Hey, friendly ping on this: any other details you need, @neild?

@amandachow
Author

I saw this refactor PR where the cc lock is held for the bufpipe write: golang/net@cedda3a#diff-e9bd9b4a514c2960ad85405e4a827d83c2decaaac70e9e0158a59db2d05807a7R2410-R2425

It looks like this will solve our issue; can someone confirm that, please?
