net/http: Transport.CancelRequest is inherently racy #11013

jacobsa · 2015-06-01T04:03:20Z

The implementation of http.Transport.CancelRequest looks for the supplied request struct in a map of in-flight requests, doing nothing if it doesn't exist. If I understand correctly, that means that this snippet is racy, in the sense that cancellation of the context may be lost if it is seen before RoundTrip gets around to sticking the request in the map:

func cancellableDo(ctx context.Context, req *http.Request) (err error) {
  // Propagate cancellation of the context to the HTTP request.
  go func() {
    <-context.Done()
    http.DefaultTransport.CancelRequest(req)
  }()

  // Call through.
  err = http.DefaultTransport.RoundTrip(req)
  return
}

This is made more likely to happen by RoundTripper implementations like oauth2.Transport, which may make network calls before calling through to RoundTrip. In the end I'm forced to cancel in a hacky loop to ensure that the HTTP package pays attention.

Is there anything that can be done to eliminate this race?

ianlancetaylor · 2015-06-03T00:51:23Z

CC @bradfitz

DanielMorsing · 2015-06-11T14:07:35Z

This came up on the list again. https://groups.google.com/d/topic/golang-nuts/m-Ocj7YVahU/discussion

I'm starting to think that it's impossible to do cancellation without races with the current interface. We might have to make a new RoundTrip interface which takes a context object and store the cancellation on there, like what http://blog.golang.org/context describes.

bradfitz · 2015-06-11T15:10:35Z

As pointed out in that thread, the documentation is correct. You can't cancel it until it's been started.

This code is also "racy":

var mu sync.Mutex
m := map[string]string{}
go func() { mu.Lock(); m["foo"] = "bar"; mu.Unlock() }()
go func() { mu.Lock(); delete(m, "foo"); mu.Unlock() }()

... sometimes the map entry will be there and sometimes it won't, because you're not waiting to delete it until it's been added.

That doesn't mean there's a bug with maps or with delete.

We could add more words if that would help, but we can't start adding contexts to APIs for cancellation because the standard library a) doesn't have them, and b) is frozen.

DanielMorsing · 2015-06-11T15:41:42Z

Having the outcome of the map example be non-deterministic is fine, but for the Transport, the stakes are a bit higher. If you cancel a request before it has been started, RoundTrip might block indefinitely when the request does issue, leading to a leaked goroutine and a leaked net.Conn since you never end up calling resp.Body.Close() on the response.

Clearly this is a punt to go1.6 issue, but still think we need to address this. The current API makes it too easy to write code that is subtly wrong.

jacobsa · 2015-06-11T23:01:27Z

@bradfitz: Of course there's no bug, in the sense that the API does what is promises. But I'm saying I think it's a less useful API than it could be. Your example is not fair: the user can synchronize on the completion of an insertion into a map they own, but it's impossible for them to synchronize on RoundTrip inserting into its internal map. So it's impossible for them to ensure that a cancellation will not be lost.

You can't change the existing CancelRequest function because of the API freeze, I understand. But surely there is another solution that is backwards compatible, by adding a new method? context.Context would be the "best" thing to use if designing from scratch since the community has standardized on it, but that doesn't mean the standard library can't have its own bespoke version (e.g. pass in a cancellation channel).

bradfitz · 2015-06-11T23:16:58Z

@jacobsa, the use cases this was originally designed for was calling Transport.CancelRequest after Transport.RoundTrip has returned (and then you know it's in the in-flight map). See #3672 for history.

And for those concerned about Transport.RoundTrip taking too long to return the headers can use Transport.ResponseHeaderTimeout. History is in #3362.

If you really want to know what phase of its lifecycle a Transport's *Request is in, perhaps that ties into #9194

In addition to lack of contexts, it's also unfortunate that CancelRequest doesn't return a value too.

But unless there's something concrete to do here, I'm tempted to close this bug. So please propose something if you have an idea.

Maybe Transport could keep track of the set of requests which have been canceled but not yet started (and then fail the request if it does come in later), and start a timer to remove that map entry in N seconds? But what is N?

jacobsa · 2015-06-11T23:33:41Z

@bradfitz: Thanks for the background. I see, it makes a lot more sense when you view it from the lens of wanting to cancel once you've already received the response headers and are still reading the body. I came at it from "wanting to cancel at any time".

I'm sure you're aware that setting a timeout up-front is less satisfying than being able to cancel at an arbitrary time, especially when you're trying to bridge a context.Context API with net/http. But it's still helpful of course, since it bounds your latency from "context is cancelled" to "I actually return from my function that calls into net/http". Still, it has the same trouble you mentioned: how do I choose the timeout?

Here are two concrete proposals:

Add a callback that lets me know when the request becomes cancellable, so I can synchronize on that. (As you said, this might tie in with net/http: Transport: add ConnState callback for http.Transport #9194.)
Add a chan struct{} field to http.Request. Approximately the same semantics of Context.Done: closing it means the request is regarded is cancelled, nil means not cancellable in this manner.

I personally like (2). The cost is that it would require net/http to maintain a background goroutine with a list of cancellation channels for in-flight requests (if any had ever been made), calling reflect.Select to wait for one of them to be closed or for a request to be finished. But this is what users have to do today if they want to support cancellation in the context.Context style, and they would still need to do it if (1) were chosen.

bradfitz · 2015-06-12T02:42:13Z

I kinda like (2) also. But: what is its type?

a) <-chan struct{} // closed on cancelation
b) chan struct{} // closed on cancelation

And if (b), who can close it? You wouldn't want multiple goroutines calling close on it. One might panic then.

What about Transport.CancelRequest? It would reasonably want to cancel it also. Maybe? I suppose we don't want to allocate the channel by default (and Transport.CancelRequest can't assume it's already there, nor set it from another goroutine), so I CancelRequest will have to forever use its existing mechanism. That leads me to say (a), which is also the type returned by Context.Done in http://godoc.org/golang.org/x/net/context#Context

jacobsa · 2015-06-12T02:54:38Z

I think <-chan struct{} would be better, yeah.

Regarding Transport.CancelRequest: I think you could actually leave this as-is, documenting its downsides and maybe marking it as deprecated, but have it continue to do the heavy lifting of cancellation. That is:

RoundTrip inserts into the map then tells the cancellation goroutine about the channel.
The cancellation goroutine calls CancelRequest when one of the channels it knows about is closed.

Odd to be using the deprecated API to implement the new one internally, but it's safe because you can synchronize on the insert into the map. Put another way: if the old API weren't already promised, it could just be an implementation detail.

jacobsa · 2015-06-28T21:59:12Z

@bradfitz: What do you think of my previous comment? If you're good with the idea, I'm happy to send a CL.

bradfitz · 2015-06-28T22:18:34Z

I don't care much about the implementation. What's easiest or most efficient. I think both the easiest and most efficient is selecting from the *Request.Cancel <-chan struct{} channel (nil or not) would be easiest, without a new map insert or goroutine or anything.

Please send a CL, ideally soon. The tree is increasingly frozen.

jacobsa · 2015-06-29T00:10:13Z

@bradfitz: Oh duh, I was way overcomplicating it, you're right. Code review that just adds new select cases (no new goroutines) is here.

gopherbot · 2015-06-29T01:00:15Z

CL https://golang.org/cl/11601 mentions this issue.

Due to golang/go#11013 race conditions in the remote testcases mean that HTTP requests aren't actually canceled. In client.go::httpDo() the goroutine that actually runs the http.Do() may not execute until after that routine's select{} loop, at which point the Context may have already been canceled. Unfortunatel with Go < 1.5 there is a race in the http.Transport package that drops cancellations on the floor. Work around the Go http.Transport cancellation issue by checking a private cancellation variable, since context.Context has no mechanism to check whether it has already been canceled. This can be removed once Flannel can depend on Go 1.5, at which point the code should start to use http.Request.Cancel instead of calling http.transport.CancelRequest(). Second, use a WaitGroup in doTestWatch() to assert that the outstanding watch is actually canceled.

Due to golang/go#11013 race conditions in the remote testcases mean that HTTP requests aren't actually canceled. In client.go::httpDo() the goroutine that actually runs the http.Do() may not execute until after that routine's select{} loop, at which point the Context may have already been canceled. Unfortunatel with Go < 1.5 there is a race in the http.Transport package that drops cancellations on the floor. Work around the Go http.Transport cancellation issue by checking a private cancellation variable, since context.Context has no mechanism to check whether it has already been canceled. This can be removed once Flannel can depend on Go 1.5, at which point the code should start to use http.Request.Cancel instead of calling http.transport.CancelRequest(). Second, use a WaitGroup in doTestWatch() to assert that the outstanding watch is actually canceled. This issue can be reproduced by using the remote_test.go hunks but not the client.go hunks. Specifically, doTestWatch() calls cancel() before watch.go::WatchLeases() calls sm.WatchLeases(), which in this case is RemoteManager.WatchLeases(), which then calls RemoteManager.httpGet() -> httpDo(), which then calls http.Client.Do() which has the aformentioned cancellation race.

Due to golang/go#11013 race conditions in the remote testcases mean that HTTP requests aren't actually canceled. In client.go::httpDo() the goroutine that actually runs the http.Do() may not execute until after that routine's select{} loop, at which point the Context may have already been canceled. Unfortunately with Go < 1.5 there is a race in the http.Transport package that drops cancellations on the floor. This issue can be reproduced by using the remote_test.go hunks that use a WaitGroup to wait for subnet.WatchLeases() to exit after the Context is canceled, but not the client.go hunks. Specifically, doTestWatch() calls cancel() before watch.go::WatchLeases() calls sm.WatchLeases(), which in this case is RemoteManager.WatchLeases(), which then calls RemoteManager.httpGet() -> httpDo(), which then calls http.Client.Do() which has the aformentioned cancellation race. Work around the Go http.Transport cancellation issue by checking a private cancellation variable, since context.Context has no mechanism to check whether it has already been canceled. This can be removed once Flannel can depend on Go 1.5, at which point the code should start to use http.Request.Cancel instead of calling http.transport.CancelRequest(). This obviously doesn't fix the race, but I think that has to wait for Go 1.5. Second, use a WaitGroup in doTestWatch() to assert that the outstanding watch is actually canceled.

Due to golang/go#11013 race conditions in the remote testcases mean that HTTP requests aren't actually canceled. In client.go::httpDo() the goroutine that actually runs the http.Do() may not execute until after that routine's select{} loop, at which point the Context may have already been canceled. Unfortunately with Go < 1.5 there is a race in the http.Transport package that drops cancellations on the floor. This issue can be reproduced by using the remote_test.go hunks that use a WaitGroup to wait for subnet.WatchLeases() to exit after the Context is canceled, but not the client.go hunks. Specifically, doTestWatch() calls cancel() before watch.go::WatchLeases() calls sm.WatchLeases(), which in this case is RemoteManager.WatchLeases(), which then calls RemoteManager.httpGet() -> httpDo(), which then calls http.Client.Do() which has the aformentioned cancellation race. To fix the races more completely (though not as completely as Go 1.5 will do) we have to copy most of http.Transport into the tree and implement some cancellation fixups ourselves. A few parts of the new Transport are notable: 1) CancelRequest() - if the request is not yet being tracked, the cancelation is saved in another map for later 2) getConn() - the real work is done here; when adding a new request to the internal table, check if it the client has asked to cancel it, and actually cancel it. Second if we are going to use an idle connection for the request, and the request has already been canceled, then cancel the idle connection too. Second, use a WaitGroup in doTestWatch() to assert that the outstanding watch is actually canceled.

mikioh changed the title ~~http: Transport.CancelRequest is inherently racy~~ net/http: Transport.CancelRequest is inherently racy Jun 1, 2015

ianlancetaylor added this to the Go1.5Maybe milestone Jun 3, 2015

bradfitz closed this as completed in 8b4278f Jun 30, 2015

mikioh modified the milestones: Go1.5, Go1.5Maybe Jun 30, 2015

dcbw mentioned this issue Aug 11, 2015

remote: ensure testing HTTP requests are canceled flannel-io/flannel#282

Merged

pwaller mentioned this issue Nov 17, 2015

x/net/context/ctxhttp: remove racy go1.4 support #13293

Closed

rht mentioned this issue Nov 22, 2015

WIP: working on making close notify work ipfs/kubo#1992

Closed

golang locked and limited conversation to collaborators Jun 29, 2016

gopherbot added the FrozenDueToAge label Jun 29, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

net/http: Transport.CancelRequest is inherently racy #11013

net/http: Transport.CancelRequest is inherently racy #11013

jacobsa commented Jun 1, 2015

ianlancetaylor commented Jun 3, 2015

DanielMorsing commented Jun 11, 2015

bradfitz commented Jun 11, 2015

DanielMorsing commented Jun 11, 2015

jacobsa commented Jun 11, 2015

bradfitz commented Jun 11, 2015

jacobsa commented Jun 11, 2015

bradfitz commented Jun 12, 2015

jacobsa commented Jun 12, 2015

jacobsa commented Jun 28, 2015

bradfitz commented Jun 28, 2015

jacobsa commented Jun 29, 2015

gopherbot commented Jun 29, 2015

Navigation Menu

net/http: Transport.CancelRequest is inherently racy #11013

net/http: Transport.CancelRequest is inherently racy #11013

Comments

jacobsa commented Jun 1, 2015

ianlancetaylor commented Jun 3, 2015

DanielMorsing commented Jun 11, 2015

bradfitz commented Jun 11, 2015

DanielMorsing commented Jun 11, 2015

jacobsa commented Jun 11, 2015

bradfitz commented Jun 11, 2015

jacobsa commented Jun 11, 2015

bradfitz commented Jun 12, 2015

jacobsa commented Jun 12, 2015

jacobsa commented Jun 28, 2015

bradfitz commented Jun 28, 2015

jacobsa commented Jun 29, 2015

gopherbot commented Jun 29, 2015