
net/http: Client does not retry idempotent requests on transport failure #4677

Closed

gopherbot opened this issue Jan 18, 2013 · 27 comments

@gopherbot

by patrick.allen.higgins:

RFC 2616 section 8.1.4 says: "Client software SHOULD reopen the transport connection
and retransmit the aborted sequence of requests without user interaction so long as the
request sequence is idempotent"

http.Client.Get() does not do this and does not document that callers are responsible
for doing so.
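
For illustration, a caller-side workaround looks something like the sketch below. getWithRetry is a hypothetical name, not part of net/http, and it blindly treats every error as retryable, which is only safe for genuinely idempotent requests:

package main

import (
	"fmt"
	"net/http"
)

// getWithRetry retries an idempotent GET a bounded number of times,
// since http.Client.Get does not retransmit after a transport failure.
func getWithRetry(c *http.Client, url string, attempts int) (*http.Response, error) {
	var err error
	for i := 0; i < attempts; i++ {
		resp, e := c.Get(url)
		if e == nil {
			return resp, nil
		}
		// A stale keep-alive connection typically surfaces here as an
		// EOF or "connection reset" error; looping dials afresh.
		err = e
	}
	return nil, fmt.Errorf("GET %s failed after %d attempts: %v", url, attempts, err)
}

func main() {
	resp, err := getWithRetry(http.DefaultClient, "http://example.com/", 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}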
@bradfitz
Contributor

Comment 1:

If only server authors consistently made their GET handlers idempotent.
I don't think this is safe to do by default. It could be per-Client opt-in, or we
could just document it.

Status changed to Accepted.

@rsc
Contributor

rsc commented Jan 22, 2013

Comment 2:

What does Chrome do?

@bradfitz
Contributor

Comment 3:

See also issue #3514 (a race between the client reusing a connection and the server
shutting that connection down). Fixing this would kinda fix that, if it were safe to do.
I don't know what Chrome does.

@bradfitz
Contributor

Comment 4:

[+willchan for Chrome question]

@evmar

evmar commented Jan 22, 2013

Comment 5:

Relevant Chrome code, I think:
"1264 // This method determines whether it is safe to resend the request after an
1265 // IO error.  It can only be called in response to request header or body
1266 // write errors or response header read errors.  It should not be used in
1267 // other cases, such as a Connect error."
http://git.chromium.org/gitweb/?p=chromium.git;a=blob;f=net/http/http_network_transaction.cc;h=caa8f11909ff40e5cee9dd6be3814233220d36f1;hb=HEAD#l1264

@bradfitz
Contributor

Comment 6:

And:
bool HttpNetworkTransaction::ShouldResendRequest(int error) const {
  bool connection_is_proven = stream_->IsConnectionReused();
  bool has_received_headers = GetResponseHeaders() != NULL;

  // NOTE: we resend a request only if we reused a keep-alive connection.
  // This automatically prevents an infinite resend loop because we'll run
  // out of the cached keep-alive connections eventually.
  if (connection_is_proven && !has_received_headers)
    return true;
  return false;
}
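
Transliterated into Go, the same policy would look roughly like the sketch below; shouldResendRequest and its boolean inputs are illustrative names, not existing net/http identifiers:

// The two booleans would come from the transport's per-connection
// state; they are stand-ins here, not real net/http fields.
func shouldResendRequest(connWasReused, receivedResponseHeaders bool) bool {
	// Resend only if the request went out on a reused keep-alive
	// connection and no response headers have arrived. Requiring a
	// reused connection bounds the retries, because the pool of
	// cached keep-alive connections eventually runs out.
	return connWasReused && !receivedResponseHeaders
}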

@gopherbot
Author

Comment 7 by willchan@chromium.org:

* If we get a RST at the TCP level when we try to reuse a persistent connection, we
retry (because middleboxes often time out connections and send RSTs if you still try
to use them). On desktop, you can afford to be wasteful and use TCP keepalives to
mitigate this too. On mobile, that wakes up the radio and may cost money, so you
shouldn't do it; just deal with the errors. We close persistent connections after a
fixed period to minimize how often we hit these errors. On mobile, we do deferred
socket closes (wait for the radio to wake up due to other HTTP requests, and then
close all timed-out sockets).
* If we pipeline requests and get a transport error, we pray that HEADs and GETs are
actually idempotent, and retry.
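
On the desktop side, enabling TCP keepalives for a Go client might look like the sketch below; the timeout and interval values are arbitrary choices, not recommendations:

package httpkeepalive

import (
	"net"
	"net/http"
	"time"
)

// newKeepAliveClient returns a client whose dialer enables TCP
// keepalives, one way to detect dead middlebox connections early on
// desktop-class machines. Per the trade-offs above, this is exactly
// what you would not want on mobile.
func newKeepAliveClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			Dial: (&net.Dialer{
				Timeout:   10 * time.Second,
				KeepAlive: 30 * time.Second, // probe idle connections
			}).Dial,
		},
	}
}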

@rsc
Contributor

rsc commented Jan 30, 2013

Comment 8:

Labels changed: added priority-later, removed priority-triage.

@rsc
Contributor

rsc commented Mar 12, 2013

Comment 9:

Labels changed: added go1.1maybe, removed go1.1.

@robpike
Contributor

robpike commented May 18, 2013

Comment 10:

Labels changed: added go1.2maybe, removed go1.1maybe.

@rsc
Contributor

rsc commented Jul 30, 2013

Comment 12:

Labels changed: added feature.

@robpike
Contributor

robpike commented Aug 30, 2013

Comment 13:

Not for 1.2.

Labels changed: removed go1.2maybe.

@rogpeppe
Contributor

Comment 14:

FWIW, we just came across an issue that triggers this problem reliably.
It happens when GOMAXPROCS=1 and something does a lot of CPU-bound work
(in our case, reading a bunch of data from disk and gzipping it) and then
issues an HTTP request. The connection times out while the code is spinning,
but the http transport code doesn't get scheduled to see the EOF until the
code stops spinning. We therefore see the EOF at almost exactly the same time
as the request is issued, and the request fails because the transport doesn't
realise that the EOF is out of band.
The problem is exacerbated because the window for the race is larger than
it needs to be: the request is marked as in flight before it has been handed
to the writing goroutine, instead of the count being incremented just before
conn.Write is called.
Here's a program that demonstrates the issue reliably for me:
   http://play.golang.org/p/bVf9wsCJSx
The second GET request fails with "can't write HTTP request on broken connection" or
"EOF".
It succeeds when run with GOMAXPROCS>0.
For the time being we can work around this by avoiding connection reuse
on all our http client connections, but this is less than ideal.
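
That workaround can be expressed directly on the Transport; a minimal sketch:

package workaround

import "net/http"

// noReuseClient returns a client that never reuses connections: each
// request gets a fresh connection, so a timed-out keep-alive
// connection can never be picked up by a later request. The cost is a
// new TCP (and possibly TLS) handshake per request.
func noReuseClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{DisableKeepAlives: true},
	}
}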

@rogpeppe
Contributor

Comment 15:

> It succeeds when run with GOMAXPROCS>0.
GOMAXPROCS>1, of course!

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 16:

Labels changed: added go1.3maybe.

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 17:

Labels changed: removed feature.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 18:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 19:

Labels changed: added repo-main.

@bradfitz
Contributor

Comment 20:

Issue #8122 has been merged into this issue.

@bradfitz
Contributor

Comment 21:

Issue #9122 has been merged into this issue.

@gopherbot
Author

Comment 22 by james.defelice:

It would be great to see this fixed in the upcoming 1.4 release

@bradfitz
Contributor

Comment 23:

That won't happen. Go 1.4 was frozen 3 months ago. We have a release cycle that's 3
months of work, then 3 months of stabilization. This would need to be done between
December 1st-ish and March 1st-ish.

bgentry added a commit to bgentry/go that referenced this issue Jan 22, 2015
If we try to reuse a connection that the server is in the process of
closing, we may end up successfully writing out our request (or a
portion of our request) only to find a connection error when we try to
read from (or finish writing to) the socket. This manifests as an EOF
returned from the Transport's RoundTrip.

This is a test for one of the issues described in issue golang#4677.
bgentry added a commit to bgentry/go that referenced this issue Jan 23, 2015
If we try to reuse a connection that the server is in the process of
closing, we may end up successfully writing out our request (or a
portion of our request) only to find a connection error when we try to
read from (or finish writing to) the socket. This manifests as an EOF
returned from the Transport's RoundTrip.

The issue, among others, is described in golang#4677.

This change follows some of the Chromium guidelines for retrying
idempotent requests only when the connection has already been used
successfully and no header data has yet been received for the response.

Change-Id: I1ca630b944f0ed7ec1d3d46056a50fb959481a16
@bgentry
Contributor

bgentry commented Jan 23, 2015

I've made an attempt to resolve one specific, reproducible example of this issue: https://go-review.googlesource.com/3210

Following the comments in this thread about Chromium's network stack, I'm only retrying under the following circumstances:

  • Request is idempotent (currently just GET or HEAD)
  • Connection has already been used successfully and is being reused
  • No data has yet been received for the response headers

The comments so far suggest that this kind of change is something you're open to. Hopefully my approach is reasonable :)
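
Those three conditions translate into a predicate along the lines of the sketch below; the function name and the way connection state is passed in are assumptions, not the CL's actual code:

package sketch

import "net/http"

// shouldRetryRequest mirrors the three conditions above. The two
// booleans stand in for the Transport's per-connection bookkeeping.
func shouldRetryRequest(req *http.Request, connWasReused, gotResponseData bool) bool {
	// 1. Only idempotent methods (currently just GET and HEAD).
	if req.Method != "GET" && req.Method != "HEAD" {
		return false
	}
	// 2. Only on a connection that has already been used successfully
	// and is being reused; this also bounds the number of retries.
	if !connWasReused {
		return false
	}
	// 3. Only if no response header bytes have been received yet.
	return !gotResponseData
}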

@bradfitz
Contributor

bradfitz commented Apr 6, 2015

https://go-review.googlesource.com/#/c/3210 is out for review

@gopherbot
Author

CL https://golang.org/cl/3210 mentions this issue.

bradfitz pushed a commit that referenced this issue Dec 1, 2015
If we try to reuse a connection that the server is in the process of
closing, we may end up successfully writing out our request (or a
portion of our request) only to find a connection error when we try to
read from (or finish writing to) the socket. This manifests as an EOF
returned from the Transport's RoundTrip.

The issue, among others, is described in #4677.

This change follows some of the Chromium guidelines for retrying
idempotent requests only when the connection has already been used
successfully and no header data has yet been received for the response.

As part of this change, an unexported error, errMissingHost, was defined
where it had previously been constructed inline. errMissingHost is
the only non-network error returned from a Request's Write() method.

Additionally, this breaks TestLinuxSendfile because its test server
explicitly triggers the type of scenario this change is meant to retry
on. Because that test server stops accepting conns on the test listener
before the retry, the test would time out. To fix this, the test was
altered to use a non-idempotent request method (POST).

Change-Id: I1ca630b944f0ed7ec1d3d46056a50fb959481a16
Reviewed-on: https://go-review.googlesource.com/3210
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
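
The errMissingHost detail matters because a failed request write has to be classified before retrying: only network-level failures should send the request out again on a fresh connection. A sketch of that distinction, with the error message text being an assumption:

package sketch

import "errors"

// errMissingHost is, per the commit message above, the only
// non-network error Request.Write can return; the message text is an
// assumption for illustration.
var errMissingHost = errors.New("http: Request.Write on Request with no Host or URL set")

// retryableWriteError reports whether a request write failure can be
// blamed on the connection (and hence retried on a fresh one) rather
// than on the request itself.
func retryableWriteError(err error) bool {
	return err != nil && err != errMissingHost
}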
@glasser
Contributor

glasser commented Feb 15, 2016

Should this issue be closed now that 5dd372b is merged? Or does that only cover some of the cases of this issue?

@bradfitz
Contributor

@glasser, yes, probably. I can't remember anything else this was open for.
