
net/http: memory leak due to a large number of items in transport.reqCanceler #61708

Closed
huozby opened this issue Aug 2, 2023 · 12 comments · May be fixed by #62296
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@huozby

huozby commented Aug 2, 2023

net/http: memory leak due to a large number of items in transport.reqCanceler

What version of Go are you using (go version)?

$ go version
go version go1.19.10 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN="/home/xxx/codes/go/bin"
GOCACHE="/home/xxx/.cache/go-build"
GOENV="/home/xxx/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/xxx/codes/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/xxx/codes/go"
GOPRIVATE=""
GOPROXY="https://goproxy.cn,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19.10"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/xxx/codes/transport-memory-leak/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3608446700=/tmp/go-build -gno-record-gcc-switches"

What did you do?

When using httputil.ReverseProxy to build an HTTP reverse proxy, we found a memory leak in http.Transport in a certain case.

There is a client, a proxy, and a server, as below:

client ----->  proxy -----> server

Reproduce conditions:

  1. The client concurrently sends requests to the proxy (200+ concurrency).
  2. The server reads the request headers and responds immediately, i.e. before receiving the entire request body (see the sketch after the playground links below).

proxy code:
https://go.dev/play/p/dRzdohisHy5?v=goprev

server code:
https://go.dev/play/p/iyBuNbJWKVi?v=goprev

client code:
https://go.dev/play/p/7gPDzt9vrkw?v=goprev
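
For illustration only, a minimal sketch of the server behaviour in condition 2 (the actual reproducer is in the playground links above; the handler body here is an assumption, the port is taken from the error log below):

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Respond as soon as the request headers arrive, without draining the
	// request body: the proxy may still be writing the body upstream when
	// this handler returns.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok") // r.Body is intentionally never read
	})
	log.Fatal(http.ListenAndServe(":9090", nil)) // 9090 matches the error log below
}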

What did you expect to see?

httputil.ReverseProxy performs well and has no memory leak.

What did you see instead?

proxy error logs:

2023/08/02 10:37:52 httputil: ReverseProxy read error during body copy: read tcp 127.0.0.1:53946->127.0.0.1:9090: use of closed network connection

A pprof profile shows that the memory leak is in the transport.

The dlv debug info shows that the memory leak is caused by reqCanceler: the 1900+ items in the reqCanceler map should be garbage collected after we stop the client, but after a long time they are still there.
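
For readers unfamiliar with the transport internals, the following standalone sketch only mirrors the shape of that bookkeeping; it is not the real net/http code, and the type names here are invented, only the reqCanceler field name matches what dlv shows:

package main

import (
	"fmt"
	"net/http"
)

// cancelKeyLike stands in for net/http's unexported per-request key type:
// the transport keeps one reqCanceler entry per in-flight request.
type cancelKeyLike struct {
	req *http.Request
}

// transportLike mirrors only the field discussed in this issue. Any entry
// that is never deleted keeps its *http.Request (and everything the request
// references) reachable, which is what shows up as growth in pprof and dlv.
type transportLike struct {
	reqCanceler map[cancelKeyLike]func(error)
}

func main() {
	t := transportLike{reqCanceler: make(map[cancelKeyLike]func(error))}
	req, _ := http.NewRequest("GET", "http://example.com", nil)
	t.reqCanceler[cancelKeyLike{req: req}] = func(error) {}
	fmt.Println("reqCanceler entries:", len(t.reqCanceler)) // prints: reqCanceler entries: 1
}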

We also captured a tcpdump diagram of a sample request (proxy -> server).

Source code and some files for further debugging:
transport-memory-leak.zip

@huozby huozby changed the title affected/package: net/http affected/package: net/http memory leak due to a large number of items in transport.reqCanceler Aug 2, 2023
@mengzhuo mengzhuo changed the title affected/package: net/http memory leak due to a large number of items in transport.reqCanceler net/http: memory leak due to a large number of items in transport.reqCanceler Aug 2, 2023
@mauri870
Member

mauri870 commented Aug 2, 2023

Issue #51354 also claims leaks in reqCanceler under certain conditions.

If you get the chance, try to reproduce the issue with this patch; I believe this is the fix the author of the other issue suggested:

diff --git a/src/net/http/transport.go b/src/net/http/transport.go
index c07352b018..d30eb79508 100644
--- a/src/net/http/transport.go
+++ b/src/net/http/transport.go
@@ -2248,7 +2248,7 @@ func (pc *persistConn) readLoop() {
 			}
 		case <-rc.req.Cancel:
 			alive = false
-			pc.t.CancelRequest(rc.req)
+			pc.t.cancelRequest(rc.cancelKey, errRequestCanceled)
 		case <-rc.req.Context().Done():
 			alive = false
 			pc.t.cancelRequest(rc.cancelKey, rc.req.Context().Err())

@panjf2000 panjf2000 added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 2, 2023
@panjf2000 panjf2000 added this to the Backlog milestone Aug 2, 2023
@AlexanderYastrebov
Contributor

I've created #61724 to fix #51354.

The test TestCancelRequestWithBodyWithChannel that I added, with the following change,

diff --git a/src/net/http/transport.go b/src/net/http/transport.go
index d30eb79508..fb3f9e168d 100644
--- a/src/net/http/transport.go
+++ b/src/net/http/transport.go
@@ -26,6 +26,7 @@ import (
        "net/textproto"
        "net/url"
        "reflect"
+       "runtime"
        "strings"
        "sync"
        "sync/atomic"
@@ -2248,6 +2249,9 @@ func (pc *persistConn) readLoop() {
                        }
                case <-rc.req.Cancel:
                        alive = false
+                       for _, v := range pc.t.reqCanceler {
+                               fmt.Printf(">> %v\n", runtime.FuncForPC(reflect.ValueOf(v).Pointer()).Name())
+                       }
                        pc.t.cancelRequest(rc.cancelKey, errRequestCanceled)
                case <-rc.req.Context().Done():
                        alive = false

outputs >> net/http.(*persistConn).cancelRequest-fm, which matches the description, so there is a chance it will fix this one as well.

@munding

munding commented Aug 3, 2023

I encountered the same issue while writing a forward HTTP proxy. Using transport.DisableKeepAlives = true temporarily resolved the memory leak.
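
For reference, a minimal sketch of that workaround on a reverse proxy (the target and listen addresses here are assumptions, and disabling keep-alives means a new connection per request):

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	target, err := url.Parse("http://127.0.0.1:9090") // assumed backend address
	if err != nil {
		log.Fatal(err)
	}

	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = &http.Transport{
		DisableKeepAlives: true, // the workaround: connections are closed after each request instead of being pooled
	}

	log.Fatal(http.ListenAndServe(":8080", proxy)) // assumed proxy listen address
}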

@huozby
Author

huozby commented Aug 3, 2023

@mauri870 @AlexanderYastrebov Thanks a lot for your great work. I tested the solution you mentioned above, but it doesn't work; the results still show the leak.

Maybe my case is more complicated. My team members and I read the Go net/http source code; the memory leak could happen at src/net/http/transport.go:2230.

		// Before looping back to the top of this function and peeking on
		// the bufio.Reader, wait for the caller goroutine to finish
		// reading the response body. (or for cancellation or death)
		select {
		case bodyEOF := <-waitForBodyRead:
			replaced := pc.t.replaceReqCanceler(rc.cancelKey, nil) // before pc might return to idle pool
			alive = alive &&
				bodyEOF &&
				!pc.sawEOF &&
				pc.wroteRequest() &&
				replaced && tryPutIdleConn(trace)
			if bodyEOF {
				eofc <- struct{}{}
			}
		case <-rc.req.Cancel:
			alive = false
			pc.t.CancelRequest(rc.req)
		case <-rc.req.Context().Done():
			alive = false
			pc.t.cancelRequest(rc.cancelKey, rc.req.Context().Err())
		case <-pc.closech: // memory may leak here: pc is closed, so rc.cancelKey would never be removed.
			alive = false
		}

There could be an unexpected case: case bodyEOF := <-waitForBodyRead is still blocked (the caller goroutine is reading the response body) while pconn is closed somewhere else. Thus, rc.cancelKey would never be cleaned up.

pc.closech is closed at src/net/http/transport.go:2716, which is called from src/net/http/transport.go:2696. Several callers call pc.close; which one could cause the problem?

We guess that src/net/http/transport.go:2418 calls pc.close(err). In a rare case (or with special timing), there is a write error (http: invalid Read on closed), the request body is not completely sent to the server, and, while the caller goroutine is reading the response body, pc.close(err) is called (could that be possible?).

			pc.writeErrCh <- err // to the body reader, which might recycle us
			wr.ch <- err         // to the roundTrip function
			if err != nil {
				pc.close(err)
				return
			}

In addition, we found a possible solution to this memory leak problem, but we are not sure of its impact (would it affect normal requests?). Maybe it is safe to remove rc.cancelKey here, since pconn has already been closed.

		// Before looping back to the top of this function and peeking on
		// the bufio.Reader, wait for the caller goroutine to finish
		// reading the response body. (or for cancellation or death)
		select {
		case bodyEOF := <-waitForBodyRead:
			replaced := pc.t.replaceReqCanceler(rc.cancelKey, nil) // before pc might return to idle pool
			alive = alive &&
				bodyEOF &&
				!pc.sawEOF &&
				pc.wroteRequest() &&
				replaced && tryPutIdleConn(trace)
			if bodyEOF {
				eofc <- struct{}{}
			}
		case <-rc.req.Cancel:
			alive = false
			pc.t.CancelRequest(rc.req)
		case <-rc.req.Context().Done():
			alive = false
			pc.t.cancelRequest(rc.cancelKey, rc.req.Context().Err())
		case <-pc.closech: // memory may leak here: pc is closed, so rc.cancelKey would never be removed.
			alive = false
			pc.t.replaceReqCanceler(rc.cancelKey, nil) // remove rc.cancelKey when pconn has been closed
		}

In our test, this change indeed helps to solve the problem, but is it really OK, or are there better solutions?


@huozby
Author

huozby commented Aug 3, 2023

I encountered the same issue while writing a forward HTTP proxy. Using transport.DisableKeepAlives = true temporarily resolved the memory leak.

A memory leak in the transport may have several causes. You could use gcore to get a core dump of the process and then use dlv core <app_binary> core to analyze the memory problem and find which objects exist in unusually large numbers.

In this issue (in my case), I have tested the DisableKeepAlives: true option of http.Transport; the memory still keeps growing.

@AlexanderYastrebov
Contributor

@hzzzzzb

In a rare case (or with special timing), there is a write error (http: invalid Read on closed), the request body is not completely sent to the server, and, while the caller goroutine is reading the response body, pc.close(err) is called (could that be possible?).

I think you are right.
I've modified your proxy to print reqCanceler

func getUnexportedFieldLen(field reflect.Value) int {
	return reflect.NewAt(field.Type(), unsafe.Pointer(field.UnsafeAddr())).Elem().Len()
}

func countReqCanceler(t *http.Transport) int {
	return getUnexportedFieldLen(reflect.ValueOf(t).Elem().FieldByName("reqCanceler"))
}

...
	reverseProxy.ErrorLog = log.New(io.Discard, "", 0)
	var requests atomic.Int64

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		reverseProxy.ServeHTTP(w, r)

		n := requests.Add(1)
		if n%1000 == 0 {
			fmt.Println("proxy: reqCanceler:", countReqCanceler(t))
		}
	})
...

and the number grows even with the #51354 fix, but stops growing with

		case <-pc.closech:
			alive = false
			pc.t.cancelRequest(rc.cancelKey, errRequestCanceled)
			// pc.t.replaceReqCanceler(rc.cancelKey, nil) // or this

I've added a second channel to send the close error; it is indeed ErrBodyReadAfterClose. I correlated it to

go/src/net/http/transport.go, lines 2422 to 2424 at ed8cbaf:

	case wr := <-pc.writech:
		startBytesWritten := pc.nwrite
		err := wr.req.Request.write(pc.bw, pc.isProxy, wr.req.extra, pc.waitForContinue(wr.continueCh))

where it calls pc.close

go/src/net/http/transport.go, lines 2444 to 2449 at ed8cbaf:

	pc.writeErrCh <- err // to the body reader, which might recycle us
	wr.ch <- err         // to the roundTrip function
	if err != nil {
		pc.close(err)
		return
	}

I think the problem could be ReverseProxy-specific due to the way it works, similar to #46866.

I could not yet come up with a reproducer test, which we'd need in order to make a proper fix.

A memory leak in the transport may have several causes

Indeed, see e.g. #50798 and #43966

AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 4, 2023
Due to a race condition, persistConn could be closed without removing the request canceler.

Note that without the fix the test occasionally passes, so to demonstrate the issue it has to be run multiple times, e.g. using -count=10.

Fixes golang#61708
@gopherbot

Change https://go.dev/cl/515796 mentions this issue: net/http: fix request canceler leak on connection close

@ianlancetaylor
Contributor

CC @neild @bradfitz

@WenlongWang

Any update on this?

AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 22, 2023
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 23, 2023
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 23, 2023
@gopherbot

Change https://go.dev/cl/522097 mentions this issue: Revert "net/http: fix request canceler leak on connection close"

@bcmills bcmills reopened this Aug 23, 2023
gopherbot pushed a commit that referenced this issue Aug 23, 2023
This reverts CL 515796 due to a flaking test.

Updates #61708
Fixes #62224

Change-Id: I53911a07677d08c3196daaaa2708269593baf472
GitHub-Last-Rev: 3544648
GitHub-Pull-Request: #62233
Reviewed-on: https://go-review.googlesource.com/c/go/+/522097
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
cellularmitosis pushed a commit to cellularmitosis/go that referenced this issue Aug 24, 2023
cellularmitosis pushed a commit to cellularmitosis/go that referenced this issue Aug 24, 2023
huozby pushed a commit to huozby/go that referenced this issue Aug 26, 2023
@gopherbot

Change https://go.dev/cl/522916 mentions this issue: net/http: fix request canceler leak on http.Transport

huozby pushed a commit to huozby/go that referenced this issue Aug 26, 2023
A write error in pconn.writeLoop while the waitForBodyRead channel in pconn.readLoop is still blocking will cause the request canceler to never be removed. This causes a memory leak in http.Transport.

Note that the write error could be either "http: invalid Read on closed" (the issue 61708 case: the request body was closed by (*chunkWriter).writeHeader at src/net/http/server.go:1405 while (*persistConn).writeLoop was still reading it, a ReverseProxy-specific problem) or "http: ContentLength=1048576 with Body length 1" (the PR's unit test case). Without the fix: pending requests = 5; want 0. With the fix: all tests pass.

Thanks for the previous work done by @AlexanderYastrebov.

Fixes golang#61708
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 26, 2023
The writeLoop goroutine closes the persistConn's closech in case of a request body write error, which in turn finishes readLoop without removing the request canceler.

Fixes golang#61708
@gopherbot

Change https://go.dev/cl/523296 mentions this issue: net/http: fix request canceler leak on connection close

AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 28, 2023
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 28, 2023
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Nov 10, 2023
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Mar 12, 2024