Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: provide mechanism to mark non-idempotent Requests as retry-able #19943

Closed
flimzy opened this issue Apr 12, 2017 · 13 comments
Closed

net/http: provide mechanism to mark non-idempotent Requests as retry-able #19943

flimzy opened this issue Apr 12, 2017 · 13 comments
Labels
FeatureRequest FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@flimzy
Copy link
Contributor

flimzy commented Apr 12, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jonhall/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build059328702=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

Tests in one of my applications are frequently receiving the error http: server closed idle connection. Code comments make me think this error should never be visible. (I fully suspect a buggy server, which is dropping connections, but this issue isn't about that behavior, per se.)

This issue is simply intended to clarify the intended behavior of the net/http package with respect to this error message.

Is the fact that I see this error message a bug which should be reported? (If so, I'll produce a MRE and file a followup issue). Or is the source comment (line 605 here) just misleading?

@bradfitz
Copy link
Contributor

Yeah, I don't think that should be visible. I thought it was an internal error that caused the Transport to pick a new conn.

What's your Transport config & usage of the Transport look like?

@bradfitz bradfitz changed the title Question: Is http.errServerClosedIdle truly meant only for internal use, as indicated by code comments? net/http: Transport's errServerClosedIdle escapes to callers Apr 12, 2017
@bradfitz bradfitz added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Apr 12, 2017
@bradfitz bradfitz added this to the Go1.9Maybe milestone Apr 12, 2017
@flimzy
Copy link
Contributor Author

flimzy commented Apr 12, 2017

Thanks for the quick response. I'll work on putting together a minimal reproduction case, and follow up shortly.

@flimzy
Copy link
Contributor Author

flimzy commented Apr 12, 2017

I am able to reproduce with this minimal Go code, communicating with a CouchDB 1.6.1 server, running in a Docker environment (I can provide minimal setup instructions if necessary). I've not yet been able to reproduce the same behavior with an httptest server, which would of course make it so much easier to debug. Since it's not possible to completely reproduce this with just the Go code, I am including as much relevant information as I can think of below. Please let me know what additional info I can provide.

main.go

Note that the failure is semi-random. It's probably triggered by a race condition (whether the race is happening in the Go library, or on the server, I don't yet know).

With 2 or more simultaneous HTTP requests, the script inevitably panics with one of two errors, after successfully handling an arbitrary number of requests.

panic: Put http://localhost:9000/foo/7.txt: http: server closed idle connection

or

panic: Put http://localhost:9000/foo/4.txt: EOF

I'm not sure how related these are, except that I expect they're both related to the server dropping the connection unexpectedly (perhaps the difference is whether the drop occurs mid-request, or between requests).

As I increase concurrency, the frequency of 'http: server closed idle connection' increases relative to the EOF errors. For example:

  • with 2 concurrent requests, after 30 runs I observed 28 EOF panics, and 2 closed idle connection panics.
  • with 5 concurrent requests and 30 runs, I observed 11 EOF panics versus 19 closed idle connection panics.

I have also done some network captures (sudo tcpdump -i lo -s 0 -w capture) to capture the network traffic of both classes of failure.

This run is captured here:

$ go run main.go
PUT http://localhost:9000/foo/5.txt
PUT http://localhost:9000/foo/2.txt
PUT http://localhost:9000/foo/1.txt
PUT http://localhost:9000/foo/3.txt
PUT http://localhost:9000/foo/4.txt
PUT http://localhost:9000/foo/10.txt
PUT http://localhost:9000/foo/15.txt
PUT http://localhost:9000/foo/6.txt
PUT http://localhost:9000/foo/7.txt
panic: Put http://localhost:9000/foo/10.txt: EOF

goroutine 70 [running]:
main.doRequest(0xc420010ec0, 0xc40000000a)
        /home/jonhall/go/src/github.com/flimzy/issue19943/main.go:33 +0x32f
created by main.main
        /home/jonhall/go/src/github.com/flimzy/issue19943/main.go:21 +0x17b
exit status 2

And this run here:

$ go run main.go
PUT http://localhost:9000/foo/5.txt
PUT http://localhost:9000/foo/2.txt
PUT http://localhost:9000/foo/3.txt
PUT http://localhost:9000/foo/1.txt
PUT http://localhost:9000/foo/4.txt
PUT http://localhost:9000/foo/10.txt
panic: Put http://localhost:9000/foo/10.txt: http: server closed idle connection

goroutine 58 [running]:
main.doRequest(0xc420010ec0, 0xc40000000a)
        /home/jonhall/go/src/github.com/flimzy/issue19943/main.go:33 +0x32f
created by main.main
        /home/jonhall/go/src/github.com/flimzy/issue19943/main.go:21 +0x17b
exit status 2

A couple of random observations:

  • As indicated in a comment in the test code, if I don't consume the response body, the error doesn't occur. I don't know if this is related to actually consuming the body, or just timing.

  • If I don't send a request body, the error also doesn't occur. Again: I'm unsure how this relates.

  • The counter is only for convenience when matching the test requests with the HTTP packets in the dumps. The specific URL used, and the HTTP response status code received seem to be immaterial to the failure.

@rmb938
Copy link

rmb938 commented Jun 2, 2017

@flimzy I am having a similar issue however in a slightly different situation.

I am running a python uwsgi server in Docker and using DELETE methods instead of PUT with no concurrent connections.

I seem to only be able to replicate it while running in Docker with a golang client. Using curl, postman, ruby, python, ect.. to make the same request does not result in any EOF or TCP errors.

Doing a few packet traces and following the TCP stream for the DELETE request shows that as soon as the request is made the TCP connection is ACK'd then FIN, ACK'd and closed without the server having time to respond.

I can only seem to replicate the issue when running the web server inside Docker. So it seems that whatever delay or modification to the TCP packets Docker is doing, the golang http client cannot handle correctly.

@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.10 Jul 20, 2017
@dunjut
Copy link

dunjut commented Jan 3, 2018

Is there any update on this issue? @bradfitz

@bradfitz
Copy link
Contributor

bradfitz commented Jan 5, 2018

Sorry, I'm just getting back to this after some leave & backlog.

Thanks for the pcaps.

I can reproduce this with the following Go server:

package main

import (
        "io"
        "log"
        "math/rand"
        "net/http"
)

func main() {
        log.Fatal(http.ListenAndServe(":9000", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                // Occasionally say that we're keeping the connection open but lie and close it:
                if rand.Intn(50) == 0 {
                        conn, _, err := w.(http.Hijacker).Hijack()
                        if err != nil {
                                return
                        }
                        io.WriteString(conn, "HTTP/1.1 404 Object Not Found\r\nServer: CouchDB/1.6.1 (Erlang OTP/17)\r\nDate: Wed, 12 Apr 2017 18:39:49 GMT\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Length: 44\r\nCache-Control: must-revalidate\r\n\r\n{\"error\":\"not_found\",\"reason\":\"no_db_file\"}\n")
                        conn.Close()
                        return
                }
                w.WriteHeader(404)
                io.WriteString(w, "{\"error\":\"not_found\",\"reason\":\"no_db_file\"}\n")
        })))
}

And then I get what you were seeing with your main.go above:

....
PUT http://localhost:9000/foo/134.txt
PUT http://localhost:9000/foo/140.txt
PUT http://localhost:9000/foo/137.txt
PUT http://localhost:9000/foo/138.txt 
PUT http://localhost:9000/foo/139.txt 
PUT http://localhost:9000/foo/136.txt
panic: Put http://localhost:9000/foo/139.txt: http: server closed idle connection

goroutine 523 [running]:
main.doRequest(0xc4200184a0, 0xc40000008b)
        /home/bradfitz/issue19943-client.go:33 +0x2f4
created by main.main
        /home/bradfitz/issue19943-client.go:20 +0x13f
exit status 2

I'll investigate.

@bradfitz
Copy link
Contributor

bradfitz commented Jan 5, 2018

So here's what's happening:

The HTTP Transport would normally retry those requests, but because they're POST requests they're not idempotent and Go conservatively assumes it might be unsafe to retry them. In fact, Go would've retried them had the server's EOF happened earlier before any bytes were written, but Go manages to write a few bytes to the server first, so Go has to assume the server acted on them.

We currently have no way to mark an *http.Request as safe to retry when it's not a GET/HEAD/OPTIONS/TRACE. I suppose we can repurpose that bug for this.

The other root cause of this bug is that CouchDB is lying and saying it'll keep the connection open but then closing it immediately, which is making Go pick a connection which is just going to hang up. Again, that's okay for GET requests (Go will retry), but not for POSTs. It's also okay when there's not a lot of traffic happening quickly, because Go will see the hangup and remove that connection from the idle pool.

Perhaps Go could also keep stats per-hostname of which servers are lying about connection keep-alives.

In any case, the error message returned here is valid and intentional. I'll send a change to update the comments on that internal error message, and we'll keep this bug open to do more in future Go releases.

@bradfitz bradfitz added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jan 5, 2018
@gopherbot
Copy link

Change https://golang.org/cl/86375 mentions this issue: net/http: document internal error errServerClosedIdle more

@bradfitz bradfitz changed the title net/http: Transport's errServerClosedIdle escapes to callers net/http: provide mechanism to mark non-idempotent Requests as retry-able Jan 5, 2018
@bradfitz bradfitz modified the milestones: Go1.10, Go1.11 Jan 5, 2018
gopherbot pushed a commit that referenced this issue Jan 5, 2018
Updates #19943

Change-Id: Iea249be51a7af3264bee9ee2b28dbd91043275fc
Reviewed-on: https://go-review.googlesource.com/86375
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@bradfitz bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018
@iand
Copy link
Contributor

iand commented Sep 13, 2018

There seems to be a growing support for idempotency-key headers to indicate that a POST should be idempotent. Stripe, GoCardless, Worldpay and Paypal support idempotency-key and Openbanking supports x-idempotency-key. As far as I can tell there is no current effort to specify via IETF but there does seem to be convergence among financial APIs.

Go could look for this header and infer that the request is intended to be idempotent.

@ncw
Copy link
Contributor

ncw commented Nov 5, 2018

An rclone user has been reporting this on POST-ing to the dropbox API.

I've put a work-around in for the moment.

Since errServerClosedIdle is no longer private could it be exposed? It would mean I could kill this, most hideous piece of code in the world...

// retriableErrorStrings is a list of phrases which when we find it
// in an an error, we know it is a networking error which should be
// retried.
//
// This is incredibly ugly - if only errors.Cause worked for all
// errors and all errors were exported from the stdlib.
var retriableErrorStrings = []string{
	"use of closed network connection", // internal/poll/fd.go
	"unexpected EOF reading trailer",   // net/http/transfer.go
	"transport connection broken",      // net/http/transport.go
	"http: ContentLength=",             // net/http/transfer.go
	"server closed idle connection",    // net/http/transport.go
}

@bradfitz
Copy link
Contributor

bradfitz commented Nov 5, 2018

@iand, nice! Thank you for all those links. That's the first I've heard of that and it's great to see. I've always wanted to see a header like that so I'm pleased that it both exists and is pretty widely used.

I say we make Go respect both idempotency-key and x-idempotency-key. If Go sees either, it can mark the request as retry-able.

@gopherbot
Copy link

Change https://golang.org/cl/147457 mentions this issue: net/http: make Transport respect {X-,}Idempotency-Key header

@gopherbot
Copy link

Change https://golang.org/cl/191779 mentions this issue: net/http: document additional uses of Requesty.GetBody()

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FeatureRequest FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

7 participants