Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: http.Transport leaks request objects in certain situations #51354

Closed
mwolfe-arkose opened this issue Feb 24, 2022 · 4 comments
Closed
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@mwolfe-arkose
Copy link

mwolfe-arkose commented Feb 24, 2022

What version of Go are you using (go version)?

  $ go version
  go version go1.17.7 linux/amd64
  

Does this issue reproduce with the latest release?

Go 1.17.7 is the latest stable release at the time of writing. I was also able to reproduce the issue on the current "Go dev branch" version on the playground.

What operating system and processor architecture are you using (go env)?

go env Output
  $ go env
  GO111MODULE=""
  GOARCH="amd64"
  GOBIN=""
  GOCACHE="/home/morgan/.cache/go-build"
  GOENV="/home/morgan/.config/go/env"
  GOEXE=""
  GOEXPERIMENT=""
  GOFLAGS=""
  GOHOSTARCH="amd64"
  GOHOSTOS="linux"
  GOINSECURE=""
  GOMODCACHE="/home/morgan/go/pkg/mod"
  GONOPROXY=""
  GONOSUMDB=""
  GOOS="linux"
  GOPATH="/home/morgan/go"
  GOPRIVATE=""
  GOPROXY="https://proxy.golang.org,direct"
  GOROOT="/usr/lib/go"
  GOSUMDB="sum.golang.org"
  GOTMPDIR=""
  GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
  GOVCS=""
  GOVERSION="go1.17.7"
  GCCGO="gccgo"
  AR="ar"
  CC="x86_64-pc-linux-gnu-gcc"
  CXX="x86_64-pc-linux-gnu-g++"
  CGO_ENABLED="1"
  GOMOD="/dev/null"
  CGO_CFLAGS="-g -O2"
  CGO_CPPFLAGS=""
  CGO_CXXFLAGS="-g -O2"
  CGO_FFLAGS="-g -O2"
  CGO_LDFLAGS="-g -O2"
  PKG_CONFIG="pkg-config"
  GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2625616170=/tmp/go-build -gno-record-gcc-switches"
  GOROOT/bin/go version: go version go1.17.7 linux/amd64
  GOROOT/bin/go tool compile -V: compile version go1.17.7
  uname -sr: Linux 5.10.60.1-microsoft-standard-WSL2
  /lib64/libc.so.6: GNU C Library (Gentoo 2.33-r7 p6) release release version 2.33.
  

We also observed this issue in various environments using the golang:1.17 Docker image on commodity cloud hardware, and our reproduction case demonstrates erroneous behavior in the Go playground.

What did you do?

https://go.dev/play/p/iUpPf4G8vZy is a small example demonstrating this behavior. We start a local HTTP server, and then create two http.Clients with timeouts set. One client uses http.DefaultTransport directly, and the other uses a local Transport type that trivially wraps http.DefaultTransport. Then, we make a large number of POST requests to that local HTTP server using each client, allow them to time out by neglecting to read the response body, and read the size of http.Transport.reqCanceler using reflection.

(The issue can be reproduced with const count = 1 set as well, though since there's a race condition involved it may take several runs of the test program to see an incorrect result.)

What did you expect to see?

Our understanding is that http.Transport.reqCanceler should always be empty when there are no HTTP requests in flight. Since it maps "requests" to "functions that cancel requests", once a request has completed, errored, timed out, or otherwise entered a state where "cancellation" is not a meaningful operation, we'd expect that request to be removed from reqCanceler. For this example, we would expect to see three lines of count = 0 printed.

What did you see instead?

Under certain conditions (described below), a request will not be correctly removed from reqCanceler. This results in the request and associated objects being unable to be garbage-collected until the associated HTTP transport is dropped, which could be "for the lifetime of the program" if the associated transport is http.DefaultTransport. In the example, the last line of the output has a non-zero count despite there being no outstanding HTTP requests. In production, we observed excessive heap use involving persistConn objects, leading in some cases to application servers running out of memory.

A detailed investigation

We believe the problem to be line 2220 of net/http/transport.go:

  	  case <-rc.req.Cancel:
  		  alive = false
  		  pc.t.CancelRequest(rc.req)     // [this line]
  	  case <-rc.req.Context().Done():
  		  alive = false
  		  pc.t.cancelRequest(rc.cancelKey, rc.req.Context().Err())
  	  case <-pc.closech:
  		  alive = false
  	  }

The original HTTP request has been stored in pc.t.reqCanceler under rc.cancelKey. If the original HTTP request had a body (like with a POST), rc.req will not be the same request that rc.cancelKey refers to, and if the <-rc.req.Cancel case is executed, this code will remove the wrong key from reqCanceler (often one that doesn't exist to begin with). If no other CancelRequest or cancelRequest call is made with the correct cancel key, then the request stored in reqCanceler under rc.cancelKey will not be removed and will remain attached to the transport.

This will take place if:

  • the server begins to send a response, but the request context times out before the response is read by the application (in our production case, this was caused by scheduler contention on both the client and server machines, so it's not just a case of forgetting to completely read the response body)
  • net.http.knownRoundTripperImpl returned false for the given request, causing setRequestCancel to set req.Cancel to a new channel and start a timer to close it after the request's timeout period expires, and
  • that timer triggers prior to the request context's expiry timer triggering and closing req.Context().Done(). Since both timers will have the same timeout, this does not always happen. In our testing, we found it occurs slightly more than half the time.

Should all these things happen, only the <-rc.req.Cancel case will be executed.

We believe the correct fix for this problem would be to replace that line with a call to cancelRequest with the correct rc.cancelKey, similar to what's done later on in http.Transport (in fact, this case will trigger if you do read the response body and cancel the request correctly, which is why that's a precondition). We have been experimenting with an appropriately-patched net/http and found it to resolve the issue, both in that the test case given will correctly print three lines of count = 0 and that our production memory issues ceased.

@mwolfe-arkose
Copy link
Author

Additional notes that I didn't think belong in the issue body:

  • If nobody else gets to it first, I'll probably submit a patch that applies the fix I mentioned in a week or so (I need to do the first-time Go contributor setup and I have some other things on my plate to deal with first).
  • The issue doesn't occur if the wrapper Transport is moved into a subpackage called http2 due to how knownRoundTripperImpl is implemented.
  • Regardless of this issue, transport.go line 2220 should probably be updated anyways as t.CancelRequest is deprecated and has been for a while.

@ianlancetaylor
Copy link
Contributor

CC @neild

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 25, 2022
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Feb 25, 2022
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 2, 2023
Follows up on CL 245357 and adds missing returns in waitCondition (CL 477196)

Fixes golang#51354
@AlexanderYastrebov
Copy link
Contributor

If the original HTTP request had a body (like with a POST), rc.req will not be the same request that rc.cancelKey refers to

This is because of

req = setupRewindBody(req)
that creates a new request
// setupRewindBody returns a new request with a custom body wrapper
// that can report whether the body needs rewinding.
// This lets rewindBody avoid an error result when the request
// does not have GetBody but the body hasn't been read at all yet.
func setupRewindBody(req *Request) *Request {

I think this was overlooked in CL https://go-review.googlesource.com/c/go/+/245357 fixing #40453

In the client, when deadline (Timeout)

go/src/net/http/client.go

Lines 193 to 198 in ed8cbaf

func (c *Client) deadline() time.Time {
if c.Timeout > 0 {
return time.Now().Add(c.Timeout)
}
return time.Time{}
}

is not zero

go/src/net/http/client.go

Lines 352 to 355 in ed8cbaf

func setRequestCancel(req *Request, rt RoundTripper, deadline time.Time) (stopTimer func(), didTimeout func() bool) {
if deadline.IsZero() {
return nop, alwaysFalse
}

there is also a piece that calls Transport.CancelRequest(req) on the original request

go/src/net/http/client.go

Lines 380 to 389 in ed8cbaf

doCancel := func() {
// The second way in the func comment above:
close(cancel)
// The first way, used only for RoundTripper
// implementations written before Go 1.5 or Go 1.6.
type canceler interface{ CancelRequest(*Request) }
if v, ok := rt.(canceler); ok {
v.CancelRequest(req)
}
}

which breaks if custom transport does not implement CancelRequest like in the subject reproducer.

The reproducer https://go.dev/play/p/iUpPf4G8vZy could be fixed by adding

func (*Transport) CancelRequest(r *http.Request) {
	http.DefaultTransport.(*http.Transport).CancelRequest(r)
}

see https://go.dev/play/p/-n5SFtJXx43

@gopherbot
Copy link

Change https://go.dev/cl/515435 mentions this issue: net/http: use cancelKey to cancel request

cellularmitosis pushed a commit to cellularmitosis/go that referenced this issue Aug 24, 2023
Follows up on CL 245357 and adds missing returns in waitCondition (CL 477196)

Fixes golang#51354

Change-Id: I7950ff889ad72c4927a969c35fedc0186e863bd6
GitHub-Last-Rev: 52ce05b
GitHub-Pull-Request: golang#61724
Reviewed-on: https://go-review.googlesource.com/c/go/+/515435
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Damien Neil <dneil@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants