net/http: transport MaxIdleConnsPerHost does not work as expected #13801

Open
sschepens opened this issue Jan 2, 2016 · 7 comments
Labels
help wanted · NeedsFix (The path to resolution is known, but the work has not been done.)

Comments

@sschepens

I recently hit an issue where Go consumed all ephemeral ports and started failing to make requests.
The problem was triggered by issuing 10 concurrent requests with DefaultMaxIdleConnsPerHost (which is 2): Transport reused only 2 connections and closed the other 8 as soon as each request finished, only to reestablish them again.
This seems like really bad behavior, and it has never happened to me in any other language or HTTP client.
A connection is usually considered idle after it has gone unused for some period of inactivity, perhaps a whole keepalive period, not immediately after a request completes successfully.
The behavior one would expect is to keep 10 established connections and, after say 30 seconds without issuing any request, have only the 2 connections allowed by DefaultMaxIdleConnsPerHost.

Here is a gist that triggers this behavior.

I'm running Go 1.5.2 on Ubuntu Precise 12.04, kernel: Linux e-0000e768 3.8.0-44-generic #66~precise1-Ubuntu SMP Tue Jul 15 04:01:04 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
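
For reference, a minimal sketch of the scenario (reconstructed here, since the gist itself is not quoted; the target URL is a placeholder): 10 concurrent requests per round against a single host with the default transport, which keeps only 2 idle connections and closes the rest.

```go
package main

import (
	"io"
	"net/http"
	"sync"
)

func main() {
	// Default transport: DefaultMaxIdleConnsPerHost is 2, so after each
	// round only two connections are kept; the other eight are closed
	// and redialed, burning an ephemeral port per request.
	client := &http.Client{}

	for {
		var wg sync.WaitGroup
		for i := 0; i < 10; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				resp, err := client.Get("http://localhost:8080/") // placeholder URL
				if err != nil {
					return
				}
				// Drain and close the body so the connection is even
				// eligible for reuse.
				io.Copy(io.Discard, resp.Body)
				resp.Body.Close()
			}()
		}
		wg.Wait() // watch TIME_WAIT sockets accumulate between rounds
	}
}
```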

@ianlancetaylor ianlancetaylor changed the title HTTP Transport MaxIdleConnsPerHost does not work as expected net/http: transport MaxIdleConnsPerHost does not work as expected Jan 3, 2016
@ianlancetaylor ianlancetaylor added this to the Go1.6Maybe milestone Jan 3, 2016
@ianlancetaylor
Contributor

CC @bradfitz

@bradfitz
Contributor

bradfitz commented Jan 4, 2016

This has been like this forever, so it's not Go 1.6 material. It's also somewhat involved.

@bradfitz bradfitz modified the milestones: Go1.7, Go1.6Maybe Jan 4, 2016
@gopherbot

CL https://golang.org/cl/18282 mentions this issue.

bradfitz added a commit that referenced this issue Jan 5, 2016
…n.readLoop

In debugging the flaky test in #13825, I discovered that my previous
change to tighten and simplify the communication protocol between
Transport.roundTrip and persistConn.readLoop in
https://golang.org/cl/17890 wasn't complete.

This change simplifies it further: the buffered-vs-unbuffered
complexity goes away, and we no longer need to re-try channel reads in
the select case. That retry was trying to prioritize channels when two
were readable in the select. (It was only failing on the race builder
because race builds randomize select scheduling.)

The problem was that in the bodyless response case we had to return
the idle connection before replying to roundTrip. But putIdleConn
previously both added it to the free list (which we wanted) and
closed the connection, which made the caller goroutine
(Transport.roundTrip) have two readable cases: pc.closech, and the
response. We guarded against similar conditions in the caller's select
for two readable channels, but such a fix wasn't possible here and
would have been overly complicated.

Instead, switch to unbuffered channels. The buffered channels were only
there to prevent goroutine leaks, so address that differently: add a
"callerGone" channel closed by the caller on exit, and select on that
during any unbuffered sends.

As part of the fix, split putIdleConn into two halves: a part that
just returns the connection to the freelist, and a part that also
closes it. Update the four callers to use the variant each wanted.

Incidentally, the connections were closing on return to the pool due
to MaxIdleConnsPerHost (somewhat related: #13801), but this bug
could've manifested for plenty of other reasons.

Fixes #13825

Change-Id: I6fa7136e2c52909d57a22ea4b74d0155fdf0e6fa
Reviewed-on: https://go-review.googlesource.com/18282
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
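
For context, a minimal sketch (not the actual net/http code; the names here are illustrative) of the "callerGone" pattern the commit message describes: a goroutine doing an unbuffered send also selects on a channel the caller closes on exit, so an abandoned send can never leak the goroutine.

```go
package main

import "fmt"

// result stands in for the value the read loop hands back to its caller.
type result struct{ msg string }

// worker performs an unbuffered send, but also selects on callerGone so
// it cannot block forever if the caller has already given up.
func worker(out chan<- result, callerGone <-chan struct{}) {
	select {
	case out <- result{msg: "done"}:
		// The caller received the value.
	case <-callerGone:
		// The caller exited; abandon the send instead of leaking.
	}
}

func main() {
	out := make(chan result) // unbuffered
	callerGone := make(chan struct{})
	go worker(out, callerGone)

	// The caller closes callerGone on exit, whether or not it read a result.
	defer close(callerGone)

	fmt.Println((<-out).msg)
}
```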
@bradfitz
Contributor

Related, probably to be addressed at the same time: #13957 (limiting number of dials in-flight at once per host)

@bradfitz
Contributor

bradfitz commented Feb 2, 2016

Related also: we should close idle conns in some defined order, like the oldest-used one gets closed first.

@bradfitz
Contributor

> Related also: we should close idle conns in some defined order, like the oldest-used one gets closed first.

That happened in Go 1.7 at least.

The rest of this bug didn't happen, though.

@bradfitz bradfitz modified the milestones: Go1.8, Go1.7 May 19, 2016
@quentinmit quentinmit added the NeedsFix (The path to resolution is known, but the work has not been done.) label Oct 7, 2016
@bradfitz
Contributor

bradfitz commented Nov 1, 2016

Also not happening for Go 1.8.

brian-brazil pushed a commit to prometheus/prometheus that referenced this issue Dec 17, 2017
Otherwise it defaults to 2, and Go will close extra connections as
soon as a request is finished.
See golang/go#13801
kahing added a commit to kahing/goofys that referenced this issue Feb 7, 2018
@gopherbot gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018
iferunsewe pushed a commit to EngineerBetter/credhub-cli that referenced this issue Jul 6, 2018
We found that when using Credhub with Concourse, there was a massive spike in
CPU usage. We profiled ATC (Concourse component that interacts with Credhub) and
found that >90% of samples were in TLS handshake methods. We did not profile
credhub, but presumably it was much the same.

This commit ensures HTTP response bodies are read to completion, which is
required for net/http to reuse the connection and avoid a costly TLS handshake.
It also increases the maximum number of idle HTTP connections to 100, from the
default of 2, which is required to work around limitations in net/http:

golang/go#13801

Signed-off-by: Ife Runsewe <ife.runsewe@engineerbetter.com>
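
A minimal sketch of the two fixes this commit describes (illustrative, not the credhub-cli code): drain and close each response body so the connection can be reused, and raise MaxIdleConnsPerHost well above the default of 2.

```go
package main

import (
	"io"
	"net/http"
	"time"
)

func newClient() *http.Client {
	// Allow up to 100 idle connections to a single host instead of the
	// default http.DefaultMaxIdleConnsPerHost (2).
	t := &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 100,
		IdleConnTimeout:     90 * time.Second,
	}
	return &http.Client{Transport: t}
}

func fetch(c *http.Client, url string) error {
	resp, err := c.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Read the body to completion; otherwise the connection is not
	// returned to the idle pool and a new TLS handshake is needed.
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	c := newClient()
	_ = fetch(c, "https://example.com/")
}
```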
martyspiewak pushed a commit to cloudfoundry/credhub-cli that referenced this issue Aug 1, 2018
bboreham added a commit to cortexproject/cortex that referenced this issue Mar 16, 2020
We will connect many times in parallel to the same DynamoDB server,
and with default settings Go will close and re-open connections; see
golang/go#13801

Raise MaxIdleConnsPerHost to avoid this.

Signed-off-by: Bryan Boreham <bryan@weave.works>
JensErat added a commit to JensErat/common that referenced this issue Apr 14, 2020
This commit limits the number of concurrent connections to encourage
connection reuse. `MaxIdleConnsPerHost` was already set to work around
golang/go#13801 (the "issue which made Go consume
all ephemeral ports").

We observed a similar issue when starting up Loki (using weaveworks
`common/aws` through Cortex), which resulted in thousands of very small
S3 requests until memcached was seeded. This failed because of ephemeral
port exhaustion: although the application would in fact have reused
connections, it opened all of them concurrently, and most of the
requests failed.

`MaxConnsPerHost` can be used to limit the overall number of parallel
connections.
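
A minimal sketch of that setting (values are illustrative): `MaxConnsPerHost` caps dialing, in-use, and idle connections to a single host, so a burst of requests queues instead of exhausting ephemeral ports.

```go
package main

import (
	"net/http"
	"time"
)

// newLimitedClient returns a client whose transport caps total
// connections per host, so request bursts queue instead of exhausting
// ephemeral ports. The numbers are illustrative.
func newLimitedClient() *http.Client {
	t := &http.Transport{
		MaxConnsPerHost:     50, // dialing + in-use + idle, per host (added in Go 1.11)
		MaxIdleConnsPerHost: 50, // let all of them be kept for reuse
		IdleConnTimeout:     90 * time.Second,
	}
	return &http.Client{Transport: t, Timeout: 30 * time.Second}
}

func main() {
	client := newLimitedClient()
	resp, err := client.Get("https://example.com/")
	if err == nil {
		resp.Body.Close()
	}
}
```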
cyriltovena pushed a commit to cyriltovena/loki that referenced this issue Jun 11, 2021