net/http: transport closes healthy connections when pool limit is reached #42650

Open

bignacio opened this issue Nov 17, 2020 · 3 comments

Labels: NeedsInvestigation
Milestone: Backlog

bignacio commented Nov 17, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.5 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build196270523=/tmp/go-build -gno-record-gcc-switches"

What did you do?

When the http transport connection pool is full (no idle connections) and new client-side requests are attempted, those requests fail (as expected), but long-established connections also start being closed.

The following code can be used to reproduce the problem

Server side (not really important; any http server will do, so long as responses are slow enough for the pool to eventually be full)

Sample server code
package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"time"
)

type testHandler struct{}

func (h *testHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	fmt.Println("request received from", r.Host)
	// Simulates some processing. Any value high enough to keep the client
	// connection pool from having idle connections will do.
	time.Sleep(time.Millisecond * 100)

	io.Copy(ioutil.Discard, r.Body)
	r.Body.Close()

	w.Write([]byte("pong"))
}

func main() {
	h := &testHandler{}

	s := &http.Server{
		Addr:    "localhost:8053",
		Handler: h,
	}

	s.ListenAndServe()
}

Client-side code - this is where the problem can be seen

Example client side code
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"io/ioutil"
	"net"
	"net/http"
	"net/http/httptrace"
	"time"
)

const (
	dialTimeout   = 1000 * time.Millisecond
	clientTimeout = 200 * time.Millisecond
)

func main() {
	c := getClient()

	const numGoRoutines = 5

	for {
		for i := 0; i < numGoRoutines; i++ {
			go makeRequest(c)
		}
		time.Sleep(time.Second * 2)
	}
}

func makeRequest(c *http.Client) {
	rawRequest := []byte("ping")
	ctx, cncl := context.WithTimeout(context.Background(), clientTimeout)
	defer cncl()
	endpoint := "http://localhost:8053/test"

	trace := &httptrace.ClientTrace{
		DNSDone: func(dnsInfo httptrace.DNSDoneInfo) {
			fmt.Printf("DNS info %+v\n", dnsInfo)
		},
		GotConn: func(connInfo httptrace.GotConnInfo) {
			fmt.Printf("Got conn %+v\n", connInfo)
		},
		GetConn: func(hostPort string) {
			fmt.Printf("Get conn %s\n", hostPort)
		},
		ConnectStart: func(network, addr string) {
			fmt.Printf("Connect start %s %s\n", network, addr)
		},
		ConnectDone: func(network, addr string, err error) {
			fmt.Printf("Connect done %s %s - %v\n", network, addr, err)
		},
	}

	req, err := http.NewRequestWithContext(ctx, "POST", endpoint, bytes.NewBuffer(rawRequest))
	if err != nil {
		panic(err) // should never happen
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	response, err := c.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}

	io.Copy(ioutil.Discard, response.Body)
	response.Body.Close()
}

func getClient() *http.Client {
	t := http.DefaultTransport.(*http.Transport).Clone()

	t.MaxIdleConns = 0
	t.MaxIdleConnsPerHost = 5
	t.MaxConnsPerHost = 5

	t.DialContext = (&net.Dialer{
		Timeout:   dialTimeout,
		KeepAlive: 30 * time.Second,
	}).DialContext

	httpClient := &http.Client{
		Timeout:   clientTimeout,
		Transport: t,
	}

	return httpClient
}

What did you expect to see?

Note that I'm setting numGoRoutines = 5, matching the number of allowed idle and max connections per host

	t.MaxIdleConns = 0
	t.MaxIdleConnsPerHost = 5
	t.MaxConnsPerHost = 5

This is the healthy, stable case: the trace shows an initial connection being established for each one allowed

Connect start tcp 127.0.0.1:8053
DNS info {Addrs:[{IP:127.0.0.1 Zone:} {IP:127.0.0.1 Zone:}] Err:<nil> Coalesced:true}

and subsequently, as the program runs,

Got conn {Conn:0xc000386000 Reused:true WasIdle:true IdleTime:1.897671367s}
Got conn {Conn:0xc000010078 Reused:true WasIdle:true IdleTime:1.897726907s}
Got conn {Conn:0xc000216018 Reused:true WasIdle:true IdleTime:1.897619693s}
Got conn {Conn:0xc000010070 Reused:true WasIdle:true IdleTime:1.897554908s}
Get conn localhost:8053
Got conn {Conn:0xc0002bc008 Reused:true WasIdle:true IdleTime:1.898069612s}
Get conn localhost:8053
Got conn {Conn:0xc000010078 Reused:true WasIdle:true IdleTime:1.899032093s}
Get conn localhost:8053
Got conn {Conn:0xc0002bc008 Reused:true WasIdle:true IdleTime:1.899184074s}
Get conn localhost:8053
Got conn {Conn:0xc000010070 Reused:true WasIdle:true IdleTime:1.899394391s}
Get conn localhost:8053
Got conn {Conn:0xc000386000 Reused:true WasIdle:true IdleTime:1.899517661s}
Get conn localhost:8053
Got conn {Conn:0xc000216018 Reused:true WasIdle:true IdleTime:1.899666507s}
...

Reused:true is always the case for all "Got conn" messages.

The output of netstat also shows that the ports in use on both server and client side are always the same.

What did you see instead?

Now, say that numGoRoutines is set to 8 (you can also reduce the pool size instead; anything that forces all connections to be in use at once works). The trace output shows frequent reconnects

Longer trace logs
...
DNS info {Addrs:[{IP:127.0.0.1 Zone:} {IP:127.0.0.1 Zone:}] Err:<nil> Coalesced:true}
DNS info {Addrs:[{IP:127.0.0.1 Zone:} {IP:127.0.0.1 Zone:}] Err:<nil> Coalesced:true}
Connect start tcp 127.0.0.1:8053
Connect start tcp 127.0.0.1:8053
DNS info {Addrs:[{IP:127.0.0.1 Zone:} {IP:127.0.0.1 Zone:}] Err:<nil> Coalesced:true}
Connect start tcp 127.0.0.1:8053
Connect done tcp 127.0.0.1:8053 - <nil>
Got conn {Conn:0xc000010098 Reused:false WasIdle:false IdleTime:0s}
Connect done tcp 127.0.0.1:8053 - <nil>
Connect done tcp 127.0.0.1:8053 - <nil>
Got conn {Conn:0xc000394080 Reused:false WasIdle:false IdleTime:0s}
Got conn {Conn:0xc000394088 Reused:false WasIdle:false IdleTime:0s}
Got conn {Conn:0xc000190060 Reused:true WasIdle:false IdleTime:0s}
Got conn {Conn:0xc000394048 Reused:true WasIdle:false IdleTime:0s}
Got conn {Conn:0xc000010098 Reused:true WasIdle:false IdleTime:0s}
Post "http://localhost:8053/4/bid": context deadline exceeded
Post "http://localhost:8053/4/bid": context deadline exceeded
Post "http://localhost:8053/4/bid": context deadline exceeded
Get conn localhost:8053
Get conn localhost:8053
Get conn localhost:8053
Get conn localhost:8053
Get conn localhost:8053
Got conn {Conn:0xc000394088 Reused:true WasIdle:true IdleTime:1.898164171s}
Got conn {Conn:0xc000394080 Reused:true WasIdle:true IdleTime:1.898254362s}
Get conn localhost:8053
Get conn localhost:8053
Get conn localhost:8053
DNS info {Addrs:[{IP:127.0.0.1 Zone:} {IP:127.0.0.1 Zone:}] Err:<nil> Coalesced:true}
DNS info {Addrs:[{IP:127.0.0.1 Zone:} {IP:127.0.0.1 Zone:}] Err:<nil> Coalesced:true}
Connect start tcp 127.0.0.1:8053
Connect start tcp 127.0.0.1:8053
DNS info {Addrs:[{IP:127.0.0.1 Zone:} {IP:127.0.0.1 Zone:}] Err:<nil> Coalesced:true}
Connect start tcp 127.0.0.1:8053
Connect done tcp 127.0.0.1:8053 - <nil>
Connect done tcp 127.0.0.1:8053 - <nil>
Got conn {Conn:0xc00029e050 Reused:false WasIdle:false IdleTime:0s}
Got conn {Conn:0xc000190088 Reused:false WasIdle:false IdleTime:0s}
Connect done tcp 127.0.0.1:8053 - <nil>
Got conn {Conn:0xc00041a018 Reused:false WasIdle:false IdleTime:0s}
Got conn {Conn:0xc000394080 Reused:true WasIdle:false IdleTime:0s}
Got conn {Conn:0xc000394088 Reused:true WasIdle:false IdleTime:0s}
Got conn {Conn:0xc00029e050 Reused:true WasIdle:false IdleTime:0s}
Post "http://localhost:8053/4/bid": context deadline exceeded
Post "http://localhost:8053/4/bid": context deadline exceeded
Post "http://localhost:8053/4/bid": context deadline exceeded
...

This goes on indefinitely.

The context deadline exceeded errors suggest one cause of the problem.
I noticed that goroutines trying to make a request block on the select in Transport.getConn, which indicates that, by the time a connection becomes available, there isn't enough time left for the entire operation to complete, and the context eventually expires in the write or read loops.

This causes a connection that is already established to be closed and replaced, creating a cycle that rotates connections all the time.

The output of netstat confirms this behaviour.
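
To make that wait visible, the trace in makeRequest can be extended to time the gap between GetConn and GotConn against the context budget. A rough sketch (newTimingTrace is just an illustrative helper name, reusing the clientTimeout constant from the client program above):

// newTimingTrace is an illustrative helper: it reports how much of the
// clientTimeout budget a request spends just waiting for a connection
// from the pool. Create one trace per request.
func newTimingTrace() *httptrace.ClientTrace {
	var waitStart time.Time
	return &httptrace.ClientTrace{
		GetConn: func(hostPort string) {
			waitStart = time.Now()
		},
		GotConn: func(connInfo httptrace.GotConnInfo) {
			waited := time.Since(waitStart)
			fmt.Printf("waited %v for a conn, %v left of %v (reused=%v)\n",
				waited, clientTimeout-waited, clientTimeout, connInfo.Reused)
		},
	}
}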

Is there a way around this? Can we control the timeout waiting for a connection from the pool separately from request/response timeouts?
Or even avoid blocking when the pool is full? Thanks
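
For what it's worth, one client-side mitigation I can imagine (a sketch only, not a confirmed fix; limitedClient and its names are made up for illustration, not part of net/http) is to gate requests on a buffered channel sized to MaxConnsPerHost, so Transport.getConn never has to queue and the per-request deadline only starts once a slot is free:

// limitedClient is an illustrative wrapper that admits at most
// cap(slots) in-flight requests. Sized to match MaxConnsPerHost, it
// keeps Transport.getConn from ever queueing, so the deadline set here
// covers only the request/response exchange, not the wait for a slot.
type limitedClient struct {
	c     *http.Client
	slots chan struct{} // e.g. make(chan struct{}, 5) to match the pool
}

func (l *limitedClient) do(req *http.Request, timeout time.Duration) (*http.Response, func(), error) {
	l.slots <- struct{}{} // wait for a free connection slot
	ctx, cancel := context.WithTimeout(req.Context(), timeout)
	resp, err := l.c.Do(req.WithContext(ctx))
	if err != nil {
		cancel()
		<-l.slots
		return nil, nil, err
	}
	// The caller reads and closes resp.Body, then calls release, which
	// also frees the slot; the connection stays busy until then.
	release := func() {
		cancel()
		<-l.slots
	}
	return resp, release, nil
}

The slot is held until the body is consumed because the underlying connection isn't returned to the pool before then.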

@dmitshur dmitshur changed the title net/http transport closes healthy connections when pool limit is reached net/http: transport closes healthy connections when pool limit is reached Nov 30, 2020
@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 30, 2020
@dmitshur dmitshur added this to the Backlog milestone Nov 30, 2020
dmitshur (Contributor) commented

CC @bradfitz, @rsc, @empijei per owners.

c5433137 commented Feb 22, 2021

I get the same problem. I find the connection pool is not useful when the request times out via its context, because the timeout closes the connection. Is there any way to solve it? @dmitshur @bradfitz
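
One partial idea (again a sketch only, not verified): drop the whole-request deadline and bound individual phases on the Transport instead. Time spent queueing for a pooled connection is then not counted against the exchange, so a request that waited too long no longer expires mid-flight and tears down its connection; the trade-off is that there is no overall bound. getPhaseTimeoutClient is an illustrative replacement for getClient in the reproducer above:

// Sketch: bound phases on the Transport instead of the whole request.
// The wait for a free pooled connection becomes unbounded, which is
// the trade-off.
func getPhaseTimeoutClient() *http.Client {
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.MaxIdleConns = 0
	t.MaxIdleConnsPerHost = 5
	t.MaxConnsPerHost = 5

	// The dial and the wait for response headers are bounded individually.
	t.DialContext = (&net.Dialer{Timeout: time.Second}).DialContext
	t.ResponseHeaderTimeout = 200 * time.Millisecond

	// No Client.Timeout and no per-request context deadline, so time
	// spent queueing in the pool cannot expire a request that is
	// already on a connection.
	return &http.Client{Transport: t}
}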

DaemonSnake commented

It's been two years, any update? @dmitshur @bradfitz @rsc @empijei
