Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: dial should not be canceled if an idle connection is reused #66442

Closed
bancek opened this issue Mar 21, 2024 · 6 comments
Closed

net/http: dial should not be canceled if an idle connection is reused #66442

bancek opened this issue Mar 21, 2024 · 6 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@bancek
Copy link

bancek commented Mar 21, 2024

Go version

go1.22.1 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.1'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3581231785=/tmp/go-build -gno-record-gcc-switches'

What did you do?

When the HTTP transport processes a new request it first tries to get an idle connection (if keep alive is not disabled). If there are no idle connection it will dial a new connection. If an idle connection becomes ready while dialing the new connection, the idle connection will be reused and the new connection will be put to idle queue (if there is space). If the request's context is canceled, both the request's and the new connection's context will be canceled.

This causes problems when canceling a context after the response has already been read and closed but the new connection dial has not yet completed (e.g. TLS handshake not yet done). In this case the new connection will be closed. This may cause a lot of new connections to be created and closed (MaxConnsPerHost is ignored). If this repeats a lot, too many TCP connections are stuck in CLOSE_WAIT state and new connections cannot be created.

This only affects HTTP/1 (HTTP and HTTPS).

Example:

  • Requests are created with context cancellation and canceled when the response is read and closed
  • R1 dials C1 and uses C1
  • R2 dials C2 but while waiting for the C2's TLS handshake, C1 becomes idle and is reused - R2 uses C1
  • R2 finishes, its response body is closed and R2's context is canceled
  • C2 is still handling TLS handshake and R2's context cancellation causes C2 to be closed

Example code:

package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

func main() {
	server := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}))
	client := server.Client()
	transport := client.Transport.(*http.Transport)
	transport.MaxIdleConnsPerHost = 2

	requests := 100
	// concurrency must be > transport.MaxIdleConnsPerHost
	concurrency := 10

	makeReq := func() {
		ctx := context.Background()
		ctx, cancel := context.WithCancel(ctx)
		defer cancel()

		req, _ := http.NewRequestWithContext(ctx, "GET", server.URL, nil)
		res, err := transport.RoundTrip(req)
		if err != nil {
			fmt.Println(err)
			return
		}
		io.Copy(io.Discard, res.Body)
		res.Body.Close()
	}

	reqChan := make(chan struct{}, requests)
	for i := 0; i < requests; i++ {
		reqChan <- struct{}{}
	}
	close(reqChan)

	var wg sync.WaitGroup

	wg.Add(concurrency)

	for i := 0; i < concurrency; i++ {
		go func() {
			for range reqChan {
				makeReq()
			}
			wg.Done()
		}()
	}

	wg.Wait()
}

What did you see happen?

$ go run .
read tcp 127.0.0.1:55248->127.0.0.1:54313: read: connection reset by peer
read tcp 127.0.0.1:55246->127.0.0.1:54313: read: connection reset by peer
2024/03/21 11:23:07 http: TLS handshake error from 127.0.0.1:55857: EOF
2024/03/21 11:23:07 http: TLS handshake error from 127.0.0.1:55848: EOF
2024/03/21 11:23:07 http: TLS handshake error from 127.0.0.1:55852: EOF
read tcp 127.0.0.1:55250->127.0.0.1:54313: read: connection reset by peer
2024/03/21 11:23:07 http: TLS handshake error from 127.0.0.1:55850: EOF

With higher concurrency even setting transport.MaxConnsPerHost does not help. The setting fixes request errors but still creates too many TCP connections.

transport.MaxIdleConnsPerHost = 100
transport.MaxConnsPerHost = 100
concurrency := 100

What did you expect to see?

$ go run .
@AlexanderYastrebov
Copy link
Contributor

#59017 looks related and #59017 (comment) in particular.

@gopherbot
Copy link

Change https://go.dev/cl/572696 mentions this issue: net/http: dial should not be canceled if an idle connection is reused

@bancek
Copy link
Author

bancek commented Mar 21, 2024

@neild this is the same issue as #59017 but with a proposed fix. The context cancelation is still kept as is but if the idle connection is reused, the context cancelation is disabled and if the new connection's dial succeeds the connection is put to idle queue. This limits the number of created and closed connections and in combination with MaxConnsPerHost does not go over MaxConnsPerHost.

@dr2chase
Copy link
Contributor

@neild (though you are already pinged above)

@dr2chase dr2chase added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 21, 2024
@neild
Copy link
Contributor

neild commented Apr 1, 2024

I'm going to mark this as a duplicate, so we have discussion in only one place.

@neild
Copy link
Contributor

neild commented Apr 1, 2024

Duplicate of #59017

@neild neild marked this as a duplicate of #59017 Apr 1, 2024
@neild neild closed this as completed Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants