
net/http: Client does not retry idempotent requests on transport failure #4677

Closed

gopherbot opened this issue Jan 18, 2013 · 27 comments

@gopherbot

by patrick.allen.higgins:

RFC 2616 section 8.1.4 says: "Client software SHOULD reopen the transport connection
and retransmit the aborted sequence of requests without user interaction so long as the
request sequence is idempotent"

http.Client.Get() does not do this and does not document that callers are responsible
for doing so.
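
For illustration, a caller-side workaround looks something like the sketch below. getWithRetry is a hypothetical name, not part of net/http, and it blindly treats every error as retryable, which is only safe for genuinely idempotent requests:

package main

import (
	"fmt"
	"net/http"
)

// getWithRetry retries an idempotent GET a bounded number of times,
// since http.Client.Get does not retransmit after a transport failure.
func getWithRetry(c *http.Client, url string, attempts int) (*http.Response, error) {
	var err error
	for i := 0; i < attempts; i++ {
		resp, e := c.Get(url)
		if e == nil {
			return resp, nil
		}
		// A stale keep-alive connection typically surfaces here as an
		// EOF or "connection reset" error; looping dials afresh.
		err = e
	}
	return nil, fmt.Errorf("GET %s failed after %d attempts: %v", url, attempts, err)
}

func main() {
	resp, err := getWithRetry(http.DefaultClient, "http://example.com/", 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}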
@bradfitz
Contributor

Comment 1:

If only server authors consistently made their GET handlers idempotent.
I don't think this is safe to do by default. It could be per-Client opt-in, or we
could just document it.

Status changed to Accepted.

@rsc
Contributor

rsc commented Jan 22, 2013

Comment 2:

What does Chrome do?

@bradfitz
Contributor

Comment 3:

See also issue #3514 (a race between the client reusing a connection and the server
shutting that connection down). Fixing this would kinda fix that, if it were safe to do.
I don't know what Chrome does.

@bradfitz
Contributor

Comment 4:

[+willchan for Chrome question]

@evmar

evmar commented Jan 22, 2013

Comment 5:

Relevant Chrome code, I think:
"1264 // This method determines whether it is safe to resend the request after an
1265 // IO error.  It can only be called in response to request header or body
1266 // write errors or response header read errors.  It should not be used in
1267 // other cases, such as a Connect error."
http://git.chromium.org/gitweb/?p=chromium.git;a=blob;f=net/http/http_network_transaction.cc;h=caa8f11909ff40e5cee9dd6be3814233220d36f1;hb=HEAD#l1264

@bradfitz
Contributor

Comment 6:

And:
bool HttpNetworkTransaction::ShouldResendRequest(int error) const {
  bool connection_is_proven = stream_->IsConnectionReused();
  bool has_received_headers = GetResponseHeaders() != NULL;

  // NOTE: we resend a request only if we reused a keep-alive connection.
  // This automatically prevents an infinite resend loop because we'll run
  // out of the cached keep-alive connections eventually.
  if (connection_is_proven && !has_received_headers)
    return true;
  return false;
}
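
Transliterated into Go, the same policy would look roughly like the sketch below; shouldResendRequest and its boolean inputs are illustrative names, not existing net/http identifiers:

// The two booleans would come from the transport's per-connection
// state; they are stand-ins here, not real net/http fields.
func shouldResendRequest(connWasReused, receivedResponseHeaders bool) bool {
	// Resend only if the request went out on a reused keep-alive
	// connection and no response headers have arrived. Requiring a
	// reused connection bounds the retries, because the pool of
	// cached keep-alive connections eventually runs out.
	return connWasReused && !receivedResponseHeaders
}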

@gopherbot
Author

Comment 7 by willchan@chromium.org:

* If we get a RST at the TCP level when we try to reuse a persistent connection, we
retry (because middleboxes often time out connections and send RSTs if you still try
to use them). On desktop, you can afford to be wasteful and use TCP keepalives to
mitigate this too. On mobile, that wakes up the radio and may cost money, so you
shouldn't do it; just deal with the errors. We close persistent connections after a
fixed period to minimize how often we hit these errors. On mobile, we do deferred
socket closes (wait for the radio to wake up due to other HTTP requests, and then
close all timed-out sockets).
* If we pipeline requests and get a transport error, we pray that HEADs and GETs are
actually idempotent, and retry.
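
On the desktop side, enabling TCP keepalives for a Go client might look like the sketch below; the timeout and interval values are arbitrary choices, not recommendations:

package httpkeepalive

import (
	"net"
	"net/http"
	"time"
)

// newKeepAliveClient returns a client whose dialer enables TCP
// keepalives, one way to detect dead middlebox connections early on
// desktop-class machines. Per the trade-offs above, this is exactly
// what you would not want on mobile.
func newKeepAliveClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			Dial: (&net.Dialer{
				Timeout:   10 * time.Second,
				KeepAlive: 30 * time.Second, // probe idle connections
			}).Dial,
		},
	}
}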

@rsc
Contributor

rsc commented Jan 30, 2013

Comment 8:

Labels changed: added priority-later, removed priority-triage.

@rsc
Contributor

rsc commented Mar 12, 2013

Comment 9:

Labels changed: added go1.1maybe, removed go1.1.

@robpike
Contributor

robpike commented May 18, 2013

Comment 10:

Labels changed: added go1.2maybe, removed go1.1maybe.

@rsc
Contributor

rsc commented Jul 30, 2013

Comment 12:

Labels changed: added feature.

@robpike
Contributor

robpike commented Aug 30, 2013

Comment 13:

Not for 1.2.

Labels changed: removed go1.2maybe.

@rogpeppe
Contributor

Comment 14:

FWIW, we just came across an issue that triggers this problem reliably.
It happens when GOMAXPROCS=1 and something does a lot of CPU-bound work
(in our case, reading a bunch of data from disk and gzipping it) and then
issues an HTTP request. The connection times out while the code is spinning,
but the http transport code doesn't get scheduled to see the EOF until the
code stops spinning. We therefore see the EOF at almost exactly the same time
as the request is issued, and the request fails because the transport doesn't
realise that the EOF is out of band.
The problem is exacerbated because the window for the race is larger than
it needs to be: the request is marked as in flight before it has been handed
to the writing goroutine, instead of the count being incremented just before
conn.Write is called.
Here's a program that demonstrates the issue reliably for me:
   http://play.golang.org/p/bVf9wsCJSx
The second GET request fails with "can't write HTTP request on broken connection" or
"EOF".
It succeeds when run with GOMAXPROCS>0.
For the time being we can work around this by avoiding connection reuse
on all our http client connections, but this is less than ideal.
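
That workaround can be expressed directly on the Transport; a minimal sketch:

package workaround

import "net/http"

// noReuseClient returns a client that never reuses connections: each
// request gets a fresh connection, so a timed-out keep-alive
// connection can never be picked up by a later request. The cost is a
// new TCP (and possibly TLS) handshake per request.
func noReuseClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{DisableKeepAlives: true},
	}
}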

@rogpeppe
Contributor

Comment 15:

> It succeeds when run with GOMAXPROCS>0.
GOMAXPROCS>1, of course!

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 16:

Labels changed: added go1.3maybe.

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 17:

Labels changed: removed feature.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 18:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 19:

Labels changed: added repo-main.

@bradfitz
Contributor

Comment 20:

Issue #8122 has been merged into this issue.

@bradfitz
Contributor

Comment 21:

Issue #9122 has been merged into this issue.

@gopherbot
Author

Comment 22 by james.defelice:

It would be great to see this fixed in the upcoming 1.4 release

@bradfitz
Contributor

Comment 23:

That won't happen. Go 1.4 was frozen 3 months ago. We have a release cycle that's 3
months of work, then 3 months of stabilization. This would need to be done between
December 1st-ish and March 1st-ish.

bgentry added a commit to bgentry/go that referenced this issue Jan 22, 2015
If we try to reuse a connection that the server is in the process of
closing, we may end up successfully writing out our request (or a
portion of our request) only to find a connection error when we try to
read from (or finish writing to) the socket. This manifests as an EOF
returned from the Transport's RoundTrip.

This is a test for one of the issues described in issue golang#4677.
bgentry added a commit to bgentry/go that referenced this issue Jan 23, 2015
If we try to reuse a connection that the server is in the process of
closing, we may end up successfully writing out our request (or a
portion of our request) only to find a connection error when we try to
read from (or finish writing to) the socket. This manifests as an EOF
returned from the Transport's RoundTrip.

The issue, among others, is described in golang#4677.

This change follows some of the Chromium guidelines for retrying
idempotent requests only when the connection has already been used
successfully and no header data has yet been received for the response.

Change-Id: I1ca630b944f0ed7ec1d3d46056a50fb959481a16
@bgentry
Contributor

bgentry commented Jan 23, 2015

I've made an attempt to resolve one specific, reproducible example of this issue: https://go-review.googlesource.com/3210

Following the comments in this thread about Chromium's network stack, I'm only retrying under the following circumstances:

  • Request is idempotent (currently just GET or HEAD)
  • Connection has already been used successfully and is being reused
  • No data has yet been received for the response headers

The comments so far suggest that this kind of change is something you're open to. Hopefully my approach is reasonable :)
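
Those three conditions translate into a predicate along the lines of the sketch below; the function name and the way connection state is passed in are assumptions, not the CL's actual code:

package sketch

import "net/http"

// shouldRetryRequest mirrors the three conditions above. The two
// booleans stand in for the Transport's per-connection bookkeeping.
func shouldRetryRequest(req *http.Request, connWasReused, gotResponseData bool) bool {
	// 1. Only idempotent methods (currently just GET and HEAD).
	if req.Method != "GET" && req.Method != "HEAD" {
		return false
	}
	// 2. Only on a connection that has already been used successfully
	// and is being reused; this also bounds the number of retries.
	if !connWasReused {
		return false
	}
	// 3. Only if no response header bytes have been received yet.
	return !gotResponseData
}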

@bradfitz
Contributor

bradfitz commented Apr 6, 2015

https://go-review.googlesource.com/#/c/3210 is out for review

@gopherbot
Author

CL https://golang.org/cl/3210 mentions this issue.

bradfitz pushed a commit that referenced this issue Dec 1, 2015
If we try to reuse a connection that the server is in the process of
closing, we may end up successfully writing out our request (or a
portion of our request) only to find a connection error when we try to
read from (or finish writing to) the socket. This manifests as an EOF
returned from the Transport's RoundTrip.

The issue, among others, is described in #4677.

This change follows some of the Chromium guidelines for retrying
idempotent requests only when the connection has already been used
successfully and no header data has yet been received for the response.

As part of this change, an unexported error, errMissingHost, was defined
where it had previously been constructed inline. errMissingHost is
the only non-network error returned from a Request's Write() method.

Additionally, this breaks TestLinuxSendfile because its test server
explicitly triggers the type of scenario this change is meant to retry
on. Because that test server stops accepting conns on the test listener
before the retry, the test would time out. To fix this, the test was
altered to use a non-idempotent request method (POST).

Change-Id: I1ca630b944f0ed7ec1d3d46056a50fb959481a16
Reviewed-on: https://go-review.googlesource.com/3210
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
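
The errMissingHost detail matters because a failed request write has to be classified before retrying: only network-level failures should send the request out again on a fresh connection. A sketch of that distinction, with the error message text being an assumption:

package sketch

import "errors"

// errMissingHost is, per the commit message above, the only
// non-network error Request.Write can return; the message text is an
// assumption for illustration.
var errMissingHost = errors.New("http: Request.Write on Request with no Host or URL set")

// retryableWriteError reports whether a request write failure can be
// blamed on the connection (and hence retried on a fresh one) rather
// than on the request itself.
func retryableWriteError(err error) bool {
	return err != nil && err != errMissingHost
}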
@glasser
Contributor

glasser commented Feb 15, 2016

Should this issue be closed now that 5dd372b is merged? Or does that only cover some of the cases of this issue?

@bradfitz
Contributor

@glasser, yes, probably. I can't remember anything else this was open for.
