
net/http: Data races when using Transport.CancelRequest in an HTTP reverse proxy #6995

Closed
cespare opened this issue Dec 20, 2013 · 22 comments


cespare commented Dec 20, 2013

Please refer to this gist: https://gist.github.com/cespare/f4352a11bac5cced9458
(I have also attached these files in a tarball to this issue.)

In particular, note the proxy handler:
https://gist.github.com/cespare/f4352a11bac5cced9458#file-proxy-go-L34-L67

I have an HTTP reverse proxy similar to httputil.ReverseProxy. At high loads, I noticed
a few protocol errors like this:

"2013/12/20 11:55:21 Unsolicited response received on idle HTTP channel starting
with "H"; err=<nil>"

When I enabled the race detector, it found a race (pasted at the bottom of this
report). The race is between HTTP client code (one of Transport's goroutines) and a
server goroutine.

Repro steps:

1. Run the server: go build -o server server.go && ./server
2. Run the proxy with the race detector: go build -race -o proxy proxy.go && ./proxy
3. Run the client to make a bunch of requests to the proxy: go build -o client client.go && ./client

What is the expected output?

No races

What do you see instead?

I immediately get data race warnings.

Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
linux/amd64

Which version are you using?  (run 'go version')
1.2

Also tested and confirmed in:
1.1, 1.1.1, 1.1.2, and devel +a768c0592b88

Notes and questions:

- This is a much-reduced version of my real application
- The error repros reliably only with higher numbers of concurrent connections
- I can only get the race warning to trigger by using GOMAXPROCS > 1
- The triggering condition seems to be that I'm canceling requests myself, after some
timeout, by calling Transport.CancelRequest (in this minimal version I could just rely
on ResponseHeaderTimeout, but in my real application I need per-request timeouts)
- Removing transport.CancelRequest gets rid of the race warnings. Similarly, I cannot
repro by using httputil.ReverseProxy.
- Is my usage of RoundTrip in conjunction with CancelRequest ok? I am assuming that
after RoundTrip returns, you need to handle it the usual way (close the body if err ==
nil) even if you had called CancelRequest previously.

Here's the race:

2013/12/20 13:24:22 Now listening on localhost:7575
==================
WARNING: DATA RACE
Read by goroutine 43:
  net/http.(*body).Read()
      /home/caleb/apps/go/src/pkg/net/http/transfer.go:531 +0x51
  io.(*LimitedReader).Read()
      /home/caleb/apps/go/src/pkg/io/io.go:398 +0x13e
  io/ioutil.devNull.ReadFrom()
      /home/caleb/apps/go/src/pkg/io/ioutil/ioutil.go:144 +0xbd
  io.Copy()
      /home/caleb/apps/go/src/pkg/io/io.go:348 +0x139
  io.CopyN()
      /home/caleb/apps/go/src/pkg/io/io.go:317 +0xf1
  net/http.(*chunkWriter).writeHeader()
      /home/caleb/apps/go/src/pkg/net/http/server.go:787 +0x1b08
  net/http.(*chunkWriter).Write()
      /home/caleb/apps/go/src/pkg/net/http/server.go:246 +0xb2
  bufio.(*Writer).flush()
      /home/caleb/apps/go/src/pkg/bufio/bufio.go:494 +0x15a
  bufio.(*Writer).Flush()
      /home/caleb/apps/go/src/pkg/bufio/bufio.go:483 +0x34
  net/http.(*response).finishRequest()
      /home/caleb/apps/go/src/pkg/net/http/server.go:999 +0x9d
  net/http.(*conn).serve()
      /home/caleb/apps/go/src/pkg/net/http/server.go:1171 +0xc31

Previous write by goroutine 235:
  net/http.(*body).Close()
      /home/caleb/apps/go/src/pkg/net/http/transfer.go:634 +0xc1
  net/http.(*transferWriter).WriteBody()
      /home/caleb/apps/go/src/pkg/net/http/transfer.go:207 +0x2e0
  net/http.(*Request).write()
      /home/caleb/apps/go/src/pkg/net/http/request.go:400 +0xa38
  net/http.(*persistConn).writeLoop()
      /home/caleb/apps/go/src/pkg/net/http/transport.go:797 +0x210

Goroutine 43 (running) created at:
  net/http.(*Server).Serve()
      /home/caleb/apps/go/src/pkg/net/http/server.go:1644 +0x2c1
  net/http.(*Server).ListenAndServe()
      /home/caleb/apps/go/src/pkg/net/http/server.go:1612 +0xc0
  main.main()
      /home/caleb/test/race-repro/proxy.go:77 +0x206

Goroutine 235 (finished) created at:
  net/http.(*Transport).dialConn()
      /home/caleb/apps/go/src/pkg/net/http/transport.go:529 +0x8d5
  net/http.func·014()
      /home/caleb/apps/go/src/pkg/net/http/transport.go:419 +0x76
==================

Attachments:

  1. cespare-race-repro.tgz (3200 bytes)

cespare commented Dec 20, 2013

Comment 1:

Some folks on IRC also noticed that it actually panics if you let it run for a bit. See
below.
I've only managed to get it to panic when using -race.
2013/12/20 14:44:34 http: panic serving 127.0.0.1:44267: runtime error: slice bounds out
of range
goroutine 4 [running]:
net/http.func·009()
        /home/caleb/apps/go/src/pkg/net/http/server.go:1093 +0xbf
runtime.panic(0x697cc0, 0x995dea)
        /home/caleb/apps/go/src/pkg/runtime/panic.c:248 +0x106
bufio.(*Reader).ReadSlice(0xc210039120, 0xc2108a630a, 0x0, 0x0, 0x0, ...)
        /home/caleb/apps/go/src/pkg/bufio/bufio.go:259 +0x717
bufio.(*Reader).ReadLine(0xc210039120, 0x0, 0x0, 0x0, 0x7f6bb5ea6b00, ...)
        /home/caleb/apps/go/src/pkg/bufio/bufio.go:305 +0x71
net/textproto.(*Reader).readLineSlice(0xc2108a63c0, 0x7f6bb665d5e0, 0x7f6bb5ea6c08,
0xc210059000, 0x422ab2, ...)
        /home/caleb/apps/go/src/pkg/net/textproto/reader.go:55 +0x83
net/textproto.(*Reader).ReadLine(0xc2108a63c0, 0xc210901340, 0x405225, 0x5f7f80,
0x7f6bb5ea6c98)
        /home/caleb/apps/go/src/pkg/net/textproto/reader.go:36 +0x35
net/http.ReadRequest(0xc210039120, 0xc210901340, 0x0, 0x0)
        /home/caleb/apps/go/src/pkg/net/http/request.go:526 +0x99
net/http.(*conn).readRequest(0xc210059300, 0x0, 0x0, 0x0)
        /home/caleb/apps/go/src/pkg/net/http/server.go:575 +0x395
net/http.(*conn).serve(0xc210059300)
        /home/caleb/apps/go/src/pkg/net/http/server.go:1123 +0x60f
created by net/http.(*Server).Serve
        /home/caleb/apps/go/src/pkg/net/http/server.go:1644 +0x2c1


dvyukov commented Dec 21, 2013

Comment 2:

Labels changed: added threadsanitizer.


cespare commented Dec 23, 2013

Comment 3:

I reproduced this on darwin/amd64/1.2 as well.

@davecheney
Contributor

Comment 4:

@cespare, please try this patch
https://golang.org/cl/45780045
I am not certain that it solves the underlying issue, but it should fix your race.

Labels changed: added release-go1.3, repo-main.

Owner changed to @davecheney.

Status changed to Accepted.


dvyukov commented Dec 27, 2013

Comment 5:

Sometimes such fixes hide more serious underlying problem -- e.g. when objects are being
accessed from wrong goroutines or events happen in a completely wrong order.


cespare commented Dec 27, 2013

Comment 6:

Dave, thanks for taking a look.
Unfortunately, in this case Dmitry is right. Locking around that particular field
(closed) simply pushes the race to another shared field. The problem is more fundamental.
==================
WARNING: DATA RACE
Read by goroutine 263:
  io.(*LimitedReader).Read()
      /home/caleb/apps/go/src/pkg/io/io.go:392 +0x51
  net/http.(*body).Read()
      /home/caleb/apps/go/src/pkg/net/http/transfer.go:535 +0xe4
  io.(*LimitedReader).Read()
      /home/caleb/apps/go/src/pkg/io/io.go:398 +0x13e
  io/ioutil.devNull.ReadFrom()
      /home/caleb/apps/go/src/pkg/io/ioutil/ioutil.go:151 +0xd3
  io.Copy()
      /home/caleb/apps/go/src/pkg/io/io.go:348 +0x139
  io.CopyN()
      /home/caleb/apps/go/src/pkg/io/io.go:317 +0x10b
  net/http.(*chunkWriter).writeHeader()
      /home/caleb/apps/go/src/pkg/net/http/server.go:783 +0x1b6c
  net/http.(*chunkWriter).Write()
      /home/caleb/apps/go/src/pkg/net/http/server.go:246 +0xb2
  bufio.(*Writer).flush()
      /home/caleb/apps/go/src/pkg/bufio/bufio.go:494 +0x17a
  bufio.(*Writer).Flush()
      /home/caleb/apps/go/src/pkg/bufio/bufio.go:483 +0x34
  net/http.(*response).finishRequest()
      /home/caleb/apps/go/src/pkg/net/http/server.go:995 +0x9d
  net/http.(*conn).serve()
      /home/caleb/apps/go/src/pkg/net/http/server.go:1167 +0xc67
Previous write by goroutine 409:
  io.(*LimitedReader).Read()
      /home/caleb/apps/go/src/pkg/io/io.go:399 +0x170
  net/http.(*body).Read()
      /home/caleb/apps/go/src/pkg/net/http/transfer.go:535 +0xe4
  io.(*LimitedReader).Read()
      /home/caleb/apps/go/src/pkg/io/io.go:398 +0x13e
  bufio.(*Writer).ReadFrom()
      /home/caleb/apps/go/src/pkg/bufio/bufio.go:622 +0x271
  io.Copy()
      /home/caleb/apps/go/src/pkg/io/io.go:348 +0x139
  net/http.(*transferWriter).WriteBody()
      /home/caleb/apps/go/src/pkg/net/http/transfer.go:197 +0x838
  net/http.(*Request).write()
      /home/caleb/apps/go/src/pkg/net/http/request.go:401 +0xa9c
  net/http.(*persistConn).writeLoop()
      /home/caleb/apps/go/src/pkg/net/http/transport.go:797 +0x255
Goroutine 263 (running) created at:
  net/http.(*Server).Serve()
      /home/caleb/apps/go/src/pkg/net/http/server.go:1640 +0x2d1
  net/http.(*Server).ListenAndServe()
      /home/caleb/apps/go/src/pkg/net/http/server.go:1608 +0xc0
  main.main()
      /tmp/f4352a11bac5cced9458/cespare-race-repro/proxy.go:77 +0x25c
Goroutine 409 (running) created at:
  net/http.(*Transport).dialConn()
      /home/caleb/apps/go/src/pkg/net/http/transport.go:529 +0x967
  net/http.func·015()
      /home/caleb/apps/go/src/pkg/net/http/transport.go:419 +0x76
==================


cespare commented Dec 27, 2013

Comment 7:

I should add that the panic is also still present.


cespare commented Dec 27, 2013

Comment 8:

This may be useful: I've put the server, proxy, and client in a single Go program:
http://play.golang.org/p/OFIt2wXIr_
To repro this issue, you should be able to download this and run a single command:
$ go run -race proxy.go

@davecheney
Contributor

Comment 9:

@dmitry, I agree: CL 45780045 should probably not be merged; the race warning is a
signal of a more serious issue.
@caleb, thanks for the single-file repro. What concerns me is that your test should,
in essence, be TestTransportCancelRequest. I will continue to investigate.

@davecheney
Contributor

Comment 10:

var err error
timedOut := false
finished := make(chan bool)
go func() {
        resp, err = transport.RoundTrip(proxyReq)
        finished <- true
}()
select {
case <-finished:
        timer.Stop()
case <-timer.C:
        transport.CancelRequest(proxyReq)
        <-finished
        timedOut = true
}
if err == nil {
        defer resp.Body.Close()
}

I am suspicious of this piece of code. I do not believe that err is being assigned
correctly. I am in the camp that a channel receive is not a happens-before event. In
fact, the assignment to resp is also suspect.

@davecheney
Contributor

Comment 11:

Please consider this replacement:
        go func() {
                resp, err := transport.RoundTrip(proxyReq)
                finished <- struct {
                        *http.Response
                        error
                }{resp, err}
        }()
        select {
        case v := <-finished:
                resp := v.Response
                err := v.error
                timer.Stop()
                if err == nil {
                        defer resp.Body.Close()
                }
                if err != nil || timedOut {
                        http.Error(w, "error -- timeout or other", http.StatusBadGateway)
                        return
                }
                CopyHeader(w.Header(), resp.Header)
                w.WriteHeader(resp.StatusCode)
                io.Copy(w, resp.Body)
        case <-timer.C:
                transport.CancelRequest(proxyReq)
                <-finished
                timedOut = true
        }


cespare commented Dec 27, 2013

Comment 12:

Hi Dave,
I disagree with your assessment of that piece of code. Quoting from the Go memory model
document:
"Within a single goroutine, the happens-before order is the order expressed by the
program."
"A send on a channel happens before the corresponding receive from that channel
completes."
Combining these two properties, it seems to me that the assignment to resp and err does,
in fact, happen before any reads of those variables later on in the code.
Furthermore, the race detector did not find concurrent access to these; only in net/http
code as indicated (I understand that this is not proof of the absence of a race).
To double-check, I made your suggested change in http://play.golang.org/p/OQrSxQZPsq,
and verified that I see the same data race as in the original post.


cespare commented Dec 27, 2013

Comment 13:

As Dominik pointed out in IRC, my code is actually almost the same as the code examples
under "Channel communication" in the memory model document:
http://golang.org/ref/mem#tmp_6

@davecheney
Contributor

Comment 14:

I have no interest in debating this point of the memory model in this ticket; let's
return to the issue at hand.


cespare commented Dec 27, 2013

Comment 15:

@dave sure -- the point being that even without the code you're suspicious of, the same
race is still happening.

@bradfitz
Contributor

Comment 16:

Here's a proposed fix: https://golang.org/cl/46570043. I don't have a new test for it
yet, so it's not ready for submission, but it fixes the repro in comment #8.
Btw, I agree comment #8 has a data race. I changed that code to:
http://play.golang.org/p/WQwK-PhpiK (also removing the ResponseHeaderTimeout, which had
nothing to do with the repro).


cespare commented Dec 31, 2013

Comment 17:

Hey Brad, I appreciate you taking a look at this. I can confirm that the patch fixes the
race in my app.
Can you describe the race that you see in my code from comment #8
(http://play.golang.org/p/OFIt2wXIr_)? Looking at the change you made, it seems like
you're thinking that two goroutines in handleProxy can race on err (declared on line 58)
but I don't see how that can be, since all code paths to get to the read of err on line
78 must first have received from finished (<-finished).
In fact, it seems to me that your version (http://play.golang.org/p/WQwK-PhpiK) doesn't
always close the response body; consider the sequence:
- Timer fires
- In the other goroutine, transport.RoundTrip (line 63) has completed successfully
- transport.CancelRequest in line 73 runs (doesn't do anything)
Now resp.Body is never closed.


bradfitz commented Jan 6, 2014

Comment 18:

> Looking at the change you made, it seems like you're thinking that two
> goroutines in handleProxy can race on err (declared on line 58) but I don't
> see how that can be, since all code paths to get to the read of err on line
> 78 must first have received from finished (<-finished).
You're right. I did misread that (I missed the second receive). The code looks fine.


bradfitz commented Jan 7, 2014

Comment 19:

New CL with tests at https://golang.org/cl/46570043 if anybody wants to review
and/or test.


bradfitz commented Jan 7, 2014

Comment 20:

This issue was closed by revision affab3f.

Status changed to Fixed.

@bradfitz
Contributor

Comment 21:

This issue was updated by revision 9b0560e.

Fixes issue #7092
R=golang-codereviews, khr
CC=golang-codereviews
https://golang.org/cl/51700043

@bradfitz
Contributor

Comment 22:

Issue #7595 has been merged into this issue.

@rsc rsc added this to the Go1.3 milestone Apr 14, 2015
@rsc rsc removed the release-go1.3 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
This issue was closed.