Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: Client.Do returns different errors on context deadline depending on req/conn state #31863

Open
whiteglasses opened this issue May 6, 2019 · 5 comments
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@whiteglasses
Copy link

What version of Go are you using (go version)?

go version go1.12.4 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/m.laletin/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/m.laletin/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build654146316=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Run multiple times program on my server (24 cpu)
https://play.golang.org/p/d00_wxKDN5f
With single goroutine - problem did not appears.
Problem appeared not every run, but in 3-4 runs at least one was faulty.

What did you expect to see?

all goroutines: done = true, err = Get http://240.0.0.1/test: context deadline exceeded

What did you see instead?

all gouroutines returned at the same time, after 4 seconds, like i specified in parent context.WithTimeout. But some goroutines returned before parentContext.Done was closed (error was i/o timeout), and some gorotines returned after parentCtx.Done was closed (with expected error "context deadline exceeded").

2019-05-06T18:29:37.969098118+03:00 start
2019-05-06T18:29:41.969304132+03:00 done = false, err = Get http://240.0.0.1/test: dial tcp 240.0.0.1:80: i/o timeout
2019-05-06T18:29:41.969347778+03:00 done = false, err = Get http://240.0.0.1/test: dial tcp 240.0.0.1:80: i/o timeout
2019-05-06T18:29:41.969298753+03:00 done = false, err = Get http://240.0.0.1/test: dial tcp 240.0.0.1:80: i/o timeout
2019-05-06T18:29:41.969580442+03:00 done = true, err = Get http://240.0.0.1/test: context deadline exceeded
2019-05-06T18:29:41.969298324+03:00 done = false, err = Get http://240.0.0.1/test: dial tcp 240.0.0.1:80: i/o timeout
2019-05-06T18:29:41.969589394+03:00 done = true, err = Get http://240.0.0.1/test: context deadline exceeded
2019-05-06T18:29:41.969461857+03:00 done = false, err = Get http://240.0.0.1/test: dial tcp 240.0.0.1:80: i/o timeout
2019-05-06T18:29:41.969294915+03:00 done = false, err = Get http://240.0.0.1/test: dial tcp 240.0.0.1:80: i/o timeout
2019-05-06T18:29:41.969336744+03:00 done = false, err = Get http://240.0.0.1/test: dial tcp 240.0.0.1:80: i/o timeout
2019-05-06T18:29:41.969433802+03:00 done = false, err = Get http://240.0.0.1/test: dial tcp 240.0.0.1:80: i/o timeout
@bradfitz bradfitz changed the title http.Client.Do context deadline before Done net/http: Client.Do returns different errors on context deadline depending on req/conn state May 6, 2019
@bradfitz bradfitz added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 6, 2019
@bradfitz bradfitz added this to the Go1.14 milestone May 6, 2019
@MikeBeller
Copy link

I thought this was an interesting way to learn more about the go runtime, so I looked into it.
First of all, I determined that this can be traced back to the behavior of net.DialContext.
If you call it with this nonexistent address, and a context that has a deadline, it will
always return with i/o timeout error, but sometimes the context will be Done and sometimes
it will not. So I wrote a much simpler test, that always fails (after a few loops). See the
test TestA1 in the following playground:

https://play.golang.org/p/xhSHGthpa3U

My next question was whether this was really a bug. Does it matter if the context is Done()
or not, as long as the reason for the error is obvious? So I developed the following workaround,
shown in TestA2 of the same playground. It always succeeds.

Just to see whether this was some kind of new behavior, I tested on go versions all the way
back to 1.7, and TestA1 behaves the same way on all versions. It is not a (recent) regression.

At this point i'm staring at runtime/netpoll.go and internal/poll/fd_poll_runtime.go, but I would
need some pointers to make sense of it.

I'm willing to look further in to this if it would be of help but first I'd like to hear from someone
in the know. Based on what I've written here, is it really a bug?

@ianlancetaylor
Copy link
Contributor

It sounds like the issue is that context.WithTimeout gets turned into a net package deadline, and the effect is that it's unpredictable whether we return an I/O timeout error or a context done error. That doesn't seem like a bug, exactly, but I suppose it would be nice if we could consistently return a context done error. To make that work I think we would have to make sure that we can reliably distinguish a net package timeout due to setting a deadline based on a context, and a net package timeout due to any other cause. If we could do that then if we get an error due to a deadline based on context, we could safely wait for the context do be done and return the context's error.

@fraenkel
Copy link
Contributor

When I took a quick look, the problem was the one who did the dial saw the actual error and then it became a race between the deadline channel and the error propagation through layers of context as to what outcome each request sees. There is no easy way to fix this unless we have some well defined structure on how errors are propagated rather than creating go funcs that wait for 1 of many possible outcomes.

@MikeBeller
Copy link

Well, one approach (perhaps consistent with @ianlancetaylor's comment) is just to add a few lines to net/dial.go sysDial.dialSingle. Where it currently says:

	if err != nil {
		return nil, &OpError{Op: "dial", Net: sd.network, Source: la, Addr: ra, Err: err} // c is non-nil interface containing nil pointer
	}

Insert a check for the deadline being past due, and if so, wait on ctx.Done() as follows:

	if err != nil {
		if os.IsTimeout(err) {
			if dl, ok := ctx.Deadline(); !ok || dl.Before(time.Now()) {
				// Issue 31863.  Prefer the context error to the i/o error
				<-ctx.Done()
				return nil, &OpError{Op: "dial", Net: sd.network, Source: la, Addr: ra, Err: ctx.Err()}
			}
		}
		return nil, &OpError{Op: "dial", Net: sd.network, Source: la, Addr: ra, Err: err} // c is non-nil interface containing nil pointer
	}

I tried this and built go. It passes my tests and the standard regression suite.

@MikeBeller
Copy link

I'm checking back in on this issue -- I'm happy to prepare a PR with my proposed fix and tests, but didn't want to move forward on it unless folks with a good understanding of the runtime thought such a PR was worth considering.

@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
jhendrixMSFT added a commit to jhendrixMSFT/azure-sdk-for-go that referenced this issue Mar 14, 2023
There is a race between error propagation and closing a context's
deadline channel.  The result is that while the API that takes a context
returns the correct error, context.Err() might return nil.

See golang/go#31863 for more info.
jhendrixMSFT added a commit to Azure/azure-sdk-for-go that referenced this issue Mar 15, 2023
* Use underlying error when checking for context errors

There is a race between error propagation and closing a context's
deadline channel.  The result is that while the API that takes a context
returns the correct error, context.Err() might return nil.

See golang/go#31863 for more info.

* further refinements

* revert ordering of error check

* add changelog entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants