Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: document Transport errors more, types of errors returned, how to check canceled, etc #15935

Closed
erikdubbelboer opened this issue Jun 2, 2016 · 19 comments
Labels
Documentation FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@erikdubbelboer
Copy link
Contributor

At the moment the only wat to know if an http.Request was canceled is to check the error string. If http.errRequestCanceled is exported this becomes much easier.

I can submit a change to gerrit if needed?

@davecheney
Copy link
Contributor

@erikdubbelboer what would you do if errRequestCancalled was exported ? If you want to retry the request, would it be better if the error returned implemented a Temporary() bool method ?

@erikdubbelboer
Copy link
Contributor Author

@davecheney we use http for some internal communication. It's around 5000 requests per second with a 100ms timeout before we cancel the request. At the moment we log all errors but would like to ignore errors that aren't important to us. If errRequestCancalled was exported we could type assert this error and ignore it in a more robust way than doing a strings.HasSuffix and hoping the messages never changes.

I'm not sure which errors you would call temporary.

For example we already ignore connection refused errors using:

    if uerr, ok := err.(*url.Error); ok {
        if noerr, ok := uerr.Err.(*net.OpError); ok {
            if scerr, ok := noerr.Err.(*os.SyscallError); ok {
                if scerr.Err == syscall.ECONNREFUSED {
                    ignore = true
                }
            }
        }
    }

@ianlancetaylor ianlancetaylor added this to the Go1.8 milestone Jun 2, 2016
@tombergan
Copy link
Contributor

Don't you already know the request was canceled since your code canceled the request, or does cancellation happen across an API boundary? Also, I think multiple different errors can be returned depending on when the request is cancelled (currently errRequestCanceled or errRequestCanceledConn).

@erikdubbelboer
Copy link
Contributor Author

Yes it's across an API boundary. Both sides are internal code so we could fix this internally but I thought it would also be nice for other people to be able to detect this error this way.

Shouldn't in theory all error types the Go API returns be exported so a user can always detect them reliably without resorting to string comparisons?

@tombergan
Copy link
Contributor

Shouldn't in theory all error types the Go API returns be exported so a user can always detect them reliably without resorting to string comparisons?

I don't entirely disagree. I have some ugly code that detects net/http's errTimeout by inspecting the error string. It would be nice if that error was exported.

The counter-argument is that this forces the API to have a mostly-stable list of possible errors, which can be difficult. How do you decide which errors to export? If your errors are too abstract, the error messages are less useful. If your errors are too low-level, there can be heavy churn as the code changes and old errors are deprecated (e.g., see issue #15150).

Strictly speaking, I don't think all errors should be exported, but maybe there's a small set of stable errors to export? ErrRequestCanceled and ErrResponseHeadersTimedOut are both good candidates since they relate directly to behaviors controlled by the net/http API.

@bradfitz
Copy link
Contributor

@nightlyone might have suggestions for what to do here, per his #15192

@nightlyone
Copy link
Contributor

@tombergan having a type satisfying https://godoc.org/github.com/nightlyone/errorclass#Canceled for errRequestCanceled might help this effort.

So the user of the http package will know what actions to take from these error classifications.

It seems the errTimeout is already correctly classified and Temporary() will will return true there. It will even report that it is a timeout.

The exact error is usually not needed to decide what to do once the application is running, since complete error enumerations is an endless, never complete task.

But what is useful, is to support the decision, whether I should retry (Temporary() == true), ignore it (in the cancel case) or report it.

@tombergan
Copy link
Contributor

@nightlyone

The exact error is usually not needed to decide what to do once the application is running, since complete error enumerations is an endless, never complete task. But what is useful, is to support the decision, whether I should retry (Temporary() == true), ignore it (in the cancel case) or report it.

If only it were that simple! Before seeing this issue, you might have reasonably concluded that Temporary() was "good enough" to make programmatic decisions regarding HTTP errors. Then this bug added another case, which is to ignore canceled errors. Your errorclass seems to handle this case.

Then a future bug wants to know if the request timed out because Transport.ResponseHeaderTimeout was exceeded, or because Transport.TLSHandshakeTimeout was exceeded, or because the req.Context exceeded its deadline, or because of a DNS timeout, or because of a TCP connect timeout, etc. You cannot answer these questions with a simple IsTimeout() method on the error.

For example: I maintain a large proxy service that uses net/http. Rather than logging net/http errors as strings, we convert net/http errors into a custom enum error space (mostly because it's easy to verify that an enum doesn't contain PII, compared to an error string, which can contain arbitrary info; also, enums are easier to process than strings at large scales). Part of that service is ~70 lines of code that tries to convert the error returned by RoundTrip() or Body.Read() into an error in our custom enum space. This code actually isn't too bad. The standard library already exports types like net.OpError, net.DNSError, and os.SyscallError. However, we check for ResponseHeaderTimeout errors by checking for error strings that contain "timeout awaiting response headers". This introspection would be unnecessary if net/http's errTimeout was exported.

I agree with your comment that "complete error enumerations is an endless, never complete task", but that comment is also overly pessimistic. Here's a simple rule: if an API exports N knobs, then it should also export N error types or vars, where each error means that the operation failed due to the corresponding knob. For net/http.RoundTrip, I think three errors could be exported following this rule:

errTimeout               // Transport.ResponseHeaderTimeout
errRequestCanceled       // Transport.CancelRequest, Request.Cancel, Request.Context
tlsHandshakeTimeoutError // Transport.TLSHandshakeTimeout

Note that errRequestCanceledConn doesn't need to be exported since it's controlled by the same knob as errRequestCanceled. Plus, the caller can use the new httptrace API to figure out when the error happened.

(Footnote: I think you could make a case that RoundTrip should return Request.Context().Err() instead of errRequestCanceled when Request.Context() completes, to preserve that original error.)

@bradfitz
Copy link
Contributor

(Footnote: I think you could make a case that RoundTrip should return Request.Context().Err() instead of errRequestCanceled when Request.Context() completes, to preserve that original error.)

@ianlancetaylor recently submitted 0b5f2f0 which does that.

Maybe that's enough for this bug?

@tombergan
Copy link
Contributor

Yep, I think that would satisfy @erikdubbelboer's specific request.

WDYT about also exporting errTimeout and tlsHandshakeTimeoutError because of this suggested rule:

if an API exports N knobs, then it should also export N error types or vars, where each error means that the operation failed due to the corresponding knob.

@bradfitz
Copy link
Contributor

Both of those already have a IsTimeout() bool method which return true.

I don't want the error package to be a minefield of error types and variables obscuring the useful parts.

I also don't want the implementation to be restricted to returning specific error variables over time, constraining the ability to refactor things. I'd rather promise properties of the errors, like that its IsTimeout() == true (ideally with a helper somewhere to get at it) or things like "if the context becomes done, you get back its error, wrapped in a URLError", which is kinda implicit already from the http docs.

Actually, it's not. The only mention of url.Error at https://golang.org/pkg/net/http/ is this one appearance:

        // If CheckRedirect returns an error, the Client's Get
        // method returns both the previous Response (with its Body
        // closed) and CheckRedirect's error (wrapped in a url.Error)
        // instead of issuing the Request req.

But nothing else talks about *url.Error.

Before we start ad-hoc fixing stuff like exporting single variables, I'd rather have a concerted effort to document the status quo where one already exists, make inconsistent things consistent, then document them, and write new tests locking in behavior for the documentation. Maybe we have some of those tests already, and some (but not enough) docs.

But I don't think the answer is more error variables right now.

Witness this error variable:

        // Deprecated: ErrWriteAfterFlush is no longer used.
        ErrWriteAfterFlush = errors.New("unused")

It's in the public docs and was only documented to be no longer used after it was discovered that it was vestigial (#15150). We don't want more of those.

Do you want to specifically know when a TLS handshake timeout happens? Can you not already tell between httptrace hooks and the Timeout() bool method?

@tombergan
Copy link
Contributor

tombergan commented Aug 24, 2016

Can you not already tell between httptrace hooks

I think that's right -- these questions can be answered via the httptrace package. So I think you can close this bug.

I also don't want the implementation to be restricted to returning specific error variables over time, constraining the ability to refactor things.

For the record (not important; you can close this bug), I don't think this is a problem for the rule proposed above because an error is exported only if a knob exists in the public API. You're constrained to support that error for as long as you're constrained to support that API knob.

@bradfitz
Copy link
Contributor

@tombergan, I'll just slightly repurpose this bug instead, changing its title.

@bradfitz bradfitz changed the title net/http: export errRequestCanceled net/http: document errors more, types of errors returned, how to check canceled, etc Aug 24, 2016
@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 7, 2016
@bradfitz bradfitz changed the title net/http: document errors more, types of errors returned, how to check canceled, etc net/http: document Transport errors more, types of errors returned, how to check canceled, etc Nov 11, 2016
@glasser
Copy link
Contributor

glasser commented Dec 10, 2016

As described in #18272, it would also be nice if you could easily tell if an error returned by http.Transport.RoundTrip came from communicating with the target server or from reading request.Body (or a third place I'm not considering).

@bradfitz
Copy link
Contributor

I've started some notes in https://docs.google.com/a/golang.org/document/d/15J1ypbcK7OJ2IJ6TEgB3TitNPTCpo0kAwBbrZu2tzPo/edit?usp=sharing about what Transport actually does today.

Hopefully people don't depend on the current behavior too much. I seem to recall other bugs though where people have made deep assumptions about specific error types returned by RoundTrip, even though nothing is promised.

@gopherbot
Copy link

CL https://golang.org/cl/34381 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 14, 2016
The client code was using time.Now() (wall time) to determine whether
the cause of a non-nil error meant that a timeout had occured. But on
Windows, the clock used for timers (time.After, time.Sleep, etc) is
much more accurate than the time.Now clock, which doesn't update
often.

But it turns out that as of the recent https://golang.org/cl/32478 we
already have the answer available easily. It just wasn't in scope.

Instead of passing this information along by decorating the errors
(risky this late in Go 1.8, especially with #15935 unresolved), just
passing along the "didTimeout" func internally for now. We can remove
that later in Go 1.9 if we overhaul Transport errors.

Fixes #18287 (I hope)

Change-Id: Icbbfceaf02de6c7ed04fe37afa4ca16374b58f3c
Reviewed-on: https://go-review.googlesource.com/34381
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@bradfitz
Copy link
Contributor

Well, I looked into this, but it's too much to document for Go 1.8, not that I think we'd even want to document it in its current state. (See earlier comment)

@gopherbot
Copy link

CL https://golang.org/cl/37495 mentions this issue.

gopherbot pushed a commit that referenced this issue Mar 2, 2017
If I put a 10 millisecond sleep at testHookWaitResLoop, before the big
select in (*persistConn).roundTrip, two flakes immediately started
happening, TestTransportBodyReadError (#19231) and
TestTransportPersistConnReadLoopEOF.

The problem was that there are many ways for a RoundTrip call to fail
(errors reading from Request.Body while writing the response, errors
writing the response, errors reading the response due to server
closes, errors due to servers sending malformed responses,
cancelations, timeouts, etc.), and many of those failures then tear
down the TCP connection, causing more failures, since there are always
at least three goroutines involved (reading, writing, RoundTripping).

Because the errors were communicated over buffered channels to a giant
select, the error returned to the caller was a function of which
random select case was called, which was why a 10ms delay before the
select brought out so many bugs. (several fixed in my previous CLs the past
few days).

Instead, track the error explicitly in the transportRequest, guarded
by a mutex.

In addition, this CL now:

* differentiates between the two ways writing a request can fail: the
  io.Copy reading from the Request.Body or the io.Copy writing to the
  network. A new io.Reader type notes read errors from the
  Request.Body. The read-from-body vs write-to-network errors are now
  prioritized differently.

* unifies the two mapRoundTripErrorFromXXX methods into one
  mapRoundTripError method since their logic is now the same.

* adds a (*Request).WithT(*testing.T) method in export_test.go, usable
  by tests, to call t.Logf at points during RoundTrip. This is disabled
  behind a constant except when debugging.

* documents and deflakes TestClientRedirectContext

I've tested this CL with high -count values, with/without -race,
with/without delays before the select, etc. So far it seems robust.

Fixes #19231 (TestTransportBodyReadError flake)
Updates #14203 (source of errors unclear; they're now tracked more)
Updates #15935 (document Transport errors more; at least understood more now)

Change-Id: I3cccc3607f369724b5344763e35ad2b7ea415738
Reviewed-on: https://go-review.googlesource.com/37495
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
@bradfitz bradfitz removed their assignment Jul 6, 2017
@bradfitz bradfitz modified the milestones: Go1.10, Go1.9 Jul 6, 2017
@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
@gopherbot gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018
@bradfitz
Copy link
Contributor

bradfitz commented Dec 4, 2018

Duplicate of #9424. (Not sure why I never noticed earlier)

@bradfitz bradfitz closed this as completed Dec 4, 2018
@golang golang locked and limited conversation to collaborators Dec 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Documentation FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

10 participants