Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: a canceled request context can deadlock other requests on the same connection #52853

Closed
milesbxf opened this issue May 11, 2022 · 7 comments
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@milesbxf
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.17.6 darwin/amd64

Does this issue reproduce with the latest release?

Yes, this issue reproduces with golang.org/x/net v0.0.0-20220425223048-2871e0cb64e4 (see reproduction below)

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="auto"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/milesbryant/Library/Caches/go-build"
GOENV="/Users/milesbryant/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/milesbryant/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/milesbryant"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go@1.16/1.16.13/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go@1.16/1.16.13/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.6"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

We enabled HTTP/2 for an internal proxy written in Go, and noticed that after 10 minutes, every connection into the proxy seemed to be hanging. A goroutine dump revealed that many goroutines were blocked on acquiring the ClientConn mutex (e.g. see https://github.com/golang/net/blob/2871e0cb64e47e47ba7466fad6e11562caf99563/http2/transport.go#L790).

One goroutine, however, had acquired the mutex and was blocked in abortStreamLocked trying to close the request body: https://github.com/golang/net/blob/2871e0cb64e47e47ba7466fad6e11562caf99563/http2/transport.go#L379. In our case this seemed to be indefinitely blocked trying to read from a network socket. We haven't figured out why yet, but that's a separate issue.

I've knocked up a very crude and simple reproduction here - I'll try and come back and tidy this up a bit later: https://github.com/milesbxf/go-http2-request-close-deadlock/blob/main/deadlock_test.go

For this to occur:

  • One request/stream has its context cancelled, causing the abortStream
  • The same request/stream also blocks on requestBody.Close
  • Other requests on the same connection try to acquire the mutex

Is this actually expected/intended behaviour? I.e. do we require that closing the request body should always be non-blocking? If so, I guess we could mitigate by increasing the concurrent connection count, but it feels like we could still hit the deadlock across multiple connections

What did you expect to see?

Other requests be able to make progress whilst the first request is blocked

What did you see instead?

All requests on the same client connection were blocked.

Other things

I'm very happy to attempt a fix, but I'm a bit of a novice in this part of the codebase!

@heschi heschi added the NeedsFix The path to resolution is known, but the work has not been done. label May 11, 2022
@heschi heschi added this to the Go1.19 milestone May 11, 2022
@heschi
Copy link
Contributor

heschi commented May 11, 2022

cc @neild

@ianlancetaylor
Copy link
Contributor

@neild What is the current status here? This issue is currently in the 1.19 milestone. Should it move to 1.20? To Backlog? Thanks.

@neild
Copy link
Contributor

neild commented Jun 24, 2022

We currently assume that Request.Body.Close will complete reasonably quickly. As demonstrated here, there are circumstances where we call Close with connection or stream mutexes held, which can cause general badness if Close hangs.

We should probably avoid holding locks while calling Request.Body.Close. I think it's okay if a slow Close blocks some operations (for example, ClientConn.Close will block until all in-flight request bodies are closed, and that's probably okay), but one slow request body shouldn't interfere with other requests on a connection.

@neild neild modified the milestones: Go1.19, Backlog Jun 24, 2022
@gopherbot
Copy link

Change https://go.dev/cl/424755 mentions this issue: x/net/http2: improved Request.Body.Close not to hold lock on connection

@gopherbot

This comment was marked as duplicate.

@mraerino
Copy link
Contributor

do you know if the fix will be backported to Go 1.19 in a point release?

@neild
Copy link
Contributor

neild commented Sep 27, 2022

You can use http2.ConfigureTransports to configure a Transport to use the golang.org/x/net/http2 HTTP/2 implementation rather than the bundled one in net/http, which provides a mechanism to pick up this fix in older Go versions.

WeiminShang added a commit to WeiminShang/net that referenced this issue Nov 16, 2022
The existing implementation holds a lock on a connection which causes
issues on a slow Request.Body.Close call.

Unlock before Request.Body.Close call. The abortStream closes the request
body after unlocking a mutex. The abortStreamLocked returns reqBody as
io.ReadCloser if the caller needs to close the body after unlocking the
mutex.

Fixes golang/go#52853

Change-Id: I0b74ba5263f65393c0e69e1c645d10c4db048903
Reviewed-on: https://go-review.googlesource.com/c/net/+/424755
Reviewed-by: Damien Neil <dneil@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Sep 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
Development

No branches or pull requests

7 participants