Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: data race in TestTransportResponseHeaderTimeout #7264

Closed
dvyukov opened this issue Feb 5, 2014 · 23 comments
Closed

net/http: data race in TestTransportResponseHeaderTimeout #7264

dvyukov opened this issue Feb 5, 2014 · 23 comments
Labels
FrozenDueToAge Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@dvyukov
Copy link
Member

dvyukov commented Feb 5, 2014

parent: 19010:d067c6de4ec3 tip, linux/amd64
$ go test -v -race -cpu=16,16,16,16 -run=TestTransportResponseHeaderTimeout net/http

WARNING: DATA RACE
Write by goroutine 51:
  sync.raceWrite()
      src/pkg/sync/race.go:41 +0x35
  sync.(*WaitGroup).Wait()
      src/pkg/sync/waitgroup.go:120 +0x16d
  net/http/httptest.(*Server).Close()
      src/pkg/net/http/httptest/server.go:168 +0x6a
  net/http_test.TestTransportResponseHeaderTimeout()
      src/pkg/net/http/transport_test.go:1254 +0x6de
  testing.tRunner()
      src/pkg/testing/testing.go:398 +0x10f

Previous read by goroutine 366:
  sync.raceRead()
      src/pkg/sync/race.go:37 +0x35
  sync.(*WaitGroup).Add()
      src/pkg/sync/waitgroup.go:60 +0xc1
  net/http/httptest.(*waitGroupHandler).ServeHTTP()
      src/pkg/net/http/httptest/server.go:198 +0x5e
  net/http.serverHandler.ServeHTTP()
      src/pkg/net/http/server.go:1599 +0x1cc
  net/http.(*conn).serve()
      src/pkg/net/http/server.go:1162 +0xc36

Goroutine 51 (running) created at:
  testing.RunTests()
      src/pkg/testing/testing.go:479 +0xbb5
  testing.Main()
      src/pkg/testing/testing.go:411 +0xad
  main.main()
      /tmp/go-build864201384/net/http/_test/_testmain.go:443 +0xdc

Goroutine 366 (running) created at:
  net/http.(*Server).Serve()
      src/pkg/net/http/server.go:1646 +0x2d1


The test sends request to /slow:
    mux.HandleFunc("/slow", func(w ResponseWriter, r *Request) { time.Sleep(2 * time.Second) })
gets timeout error after 500ms and then closes the server with:
    defer ts.Close()

Close waits for all outstanding handlers to finish:
func (s *Server) Close() {
    s.Listener.Close()
    s.wg.Wait()
    ...
}

The wg is +1'ed in the handler:
func (h *waitGroupHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    h.s.wg.Add(1)
    defer h.s.wg.Done() // a defer, in case ServeHTTP below panics
    h.h.ServeHTTP(w, r)
}

So essentially when the test closes the server and waits for all outstanding handlers to
finish, ServeHTTP may not yet have +1'ed wg, so wg.Wait will return early w/o waiting
for the handler.


httptest.NewServer API assumes that tests wait at least for handlers to start executing.
Then Close will wait for handlers to finish.
@bradfitz
Copy link
Contributor

bradfitz commented Feb 7, 2014

Comment 1:

Sorry, I don't understand this bug report.

@dvyukov
Copy link
Member Author

dvyukov commented Feb 7, 2014

Comment 2:

We call wg.Wait before the relevant wg.Add has happened.

@bradfitz
Copy link
Contributor

bradfitz commented Feb 7, 2014

Comment 3:

I don't see where.
Also, you're not even clear about what you think is at fault---  that test or
httptest.Server?
If no HTTP request has been received, there is nothing to wait for in
httptest.Server.Close.
This test makes 3 HTTP requests. How will the wg.Add not be called 3 times?
I also can't reproduce this.

@dvyukov
Copy link
Member Author

dvyukov commented Feb 7, 2014

Comment 4:

The test makes 3 requests, but it does not know that the servers handler for /slow
request has started. It may not be even received from Listen. Or received from Listen
and the handler goroutine is kicked off, but it has not yet executed wg.Add.
The test is at fault, it must do something along the lines of:
received := make(chan bool)
mux.HandleFunc("/slow", func(w ResponseWriter, r *Request) {
  received <- true
  time.Sleep(2 * time.Second)
})
<-received
// Only NOW we do know that wg.Add has happened!
ts.Close()
-------------------------------------
Essentially what happens here is:
var wg sync.WaitGroup
go func() {  // this is slow handler
  wg.Add(1)
  time.Sleep(2 * time.Second)
  wg.Done()
}()
wg.Wait()

@bradfitz
Copy link
Contributor

bradfitz commented Mar 3, 2014

Comment 5:

In the general case, I agree that the wg.Add(1) could be earlier in httptest, and I
could imagine scenarios that would exploit that, but you wrote that the test
(TestTransportResponseHeaderTimeout) is at fault and I kinda disagree.
The test makes three synchronous HTTP requests. They're not in goroutines.
How can I reproduce the data race you saw? Did you have local modifications?
Normally it's very easy to repro races, but I can't make this one trigger, and an
eyeball analysis of all the relevant code seems fine too.
I totally understand what you're talking about with wg.Add(1) being done in a goroutine
(and I'll send a CL for httptest to do the wg.Add(1) earlier), but I don't see how this
can happen here, at least in the success paths, unless the test is already failing for
other reasons.
Maybe I could imagine a scenario where the second HTTP request actually connects, but
the machine is so loaded, the Go scheduler doesn't run the http Handler goroutine for
seconds, meanwhile the client code goroutine does run and passes, the func exits, and
then the Handler runs.  Was the builder very overloaded?  I guess I'm never surprised by
how slow the build machines can be.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 3, 2014

Comment 6:

Sent https://golang.org/cl/70950043

Owner changed to @bradfitz.

Status changed to Started.

@dvyukov
Copy link
Member Author

dvyukov commented Mar 4, 2014

Comment 7:

I will try to provide more reliable reproducer today.
> Maybe I could imagine a scenario where the second HTTP request actually connects, but
the machine is so loaded, the Go scheduler doesn't run the http Handler goroutine for
seconds, meanwhile the client code goroutine does run and passes, the func exits, and
then the Handler runs.
The race detector does not care about timings, it watches when things are not
synchronized.
> I totally understand what you're talking about with wg.Add(1) being done in a
goroutine (and I'll send a CL for httptest to do the wg.Add(1) earlier)
Just doing wd.Add after Accept won't help. Because if you do http.Get with timeout, then
you don't know that even Accept has returned when timeout fires.

@gopherbot
Copy link

Comment 8:

CL https://golang.org/cl/70950043 references this issue.

@bradfitz
Copy link
Contributor

Comment 9:

Assigning to Dmitry to create a repro.

Owner changed to @dvyukov.

Status changed to Accepted.

@bradfitz
Copy link
Contributor

Comment 10:

Labels changed: added testing.

@dvyukov
Copy link
Member Author

dvyukov commented Apr 11, 2014

Comment 11:

Mailed CL with small and reliable reproducer:
https://golang.org/cl/86770043

Owner changed to @bradfitz.

@gopherbot
Copy link

Comment 12:

CL https://golang.org/cl/86770043 mentions this issue.

@gopherbot
Copy link

Comment 13:

CL https://golang.org/cl/86770043 mentions this issue.

@gopherbot
Copy link

Comment 14:

CL https://golang.org/cl/86770043 mentions this issue.

@dvyukov
Copy link
Member Author

dvyukov commented Apr 11, 2014

Comment 15:

This issue was updated by revision 388d533.

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/86770043

@gopherbot
Copy link

Comment 16:

CL https://golang.org/cl/87640043 mentions this issue.

@gopherbot
Copy link

Comment 17:

CL https://golang.org/cl/87640043 mentions this issue.

@gopherbot
Copy link

Comment 18:

CL https://golang.org/cl/87640043 mentions this issue.

@bradfitz
Copy link
Contributor

Comment 19:

This bug finally reproduces consistently for me, and I can confirm that
https://golang.org/cl/70950043 fixes the race.
ba12:src bradfitz$ go test -v -race -run=TestTransportResponseHeaderTimeout net/http
=== RUN TestTransportResponseHeaderTimeout
==================
WARNING: DATA RACE
Write by goroutine 5:
  sync.raceWrite()
      /Users/bradfitz/go/src/pkg/sync/race.go:41 +0x35
  sync.(*WaitGroup).Wait()
      /Users/bradfitz/go/src/pkg/sync/waitgroup.go:122 +0x176
  net/http/httptest.(*Server).Close()
      /Users/bradfitz/go/src/pkg/net/http/httptest/server.go:168 +0x74
  net/http_test.TestTransportResponseHeaderTimeout()
      /Users/bradfitz/go/src/pkg/net/http/transport_test.go:1307 +0xe8e
  testing.tRunner()
      /Users/bradfitz/go/src/pkg/testing/testing.go:422 +0x10f
Previous read by goroutine 11:
  sync.raceRead()
      /Users/bradfitz/go/src/pkg/sync/race.go:37 +0x35
  sync.(*WaitGroup).Add()
      /Users/bradfitz/go/src/pkg/sync/waitgroup.go:60 +0xbe
  net/http/httptest.(*waitGroupHandler).ServeHTTP()
      /Users/bradfitz/go/src/pkg/net/http/httptest/server.go:198 +0x5e
  net/http.serverHandler.ServeHTTP()
      /Users/bradfitz/go/src/pkg/net/http/server.go:1675 +0x1e8
  net/http.(*conn).serve()
      /Users/bradfitz/go/src/pkg/net/http/server.go:1176 +0xf16
Goroutine 5 (running) created at:
  testing.RunTests()
      /Users/bradfitz/go/src/pkg/testing/testing.go:503 +0xb4d
  testing.Main()
      /Users/bradfitz/go/src/pkg/testing/testing.go:435 +0xa8
  main.main()
      /var/folders/zr/clvw_31d1wdcd404ybr3pmch0000gn/T/go-build455016757/net/http/_test/_testmain.go:477 +0xdc
Goroutine 11 (running) created at:
  net/http.(*Server).Serve()
      /Users/bradfitz/go/src/pkg/net/http/server.go:1723 +0x339
==================
--- PASS: TestTransportResponseHeaderTimeout (2.01 seconds)
PASS
Found 1 data race(s)
exit status 66
FAIL    net/http    3.048s
ba12:src bradfitz$ hg clpatch 70950043
70950043:
    net/http/httptest: lock Server's WaitGroup earlier, at Accept time
    Fixes issue #7264
    Reviewer: dvyukov@google.com
    CC: golang-codereviews@googlegroups.com
    Files:
        src/pkg/net/http/httptest/server.go
ba12:src bradfitz$ go test -v -race -run=TestTransportResponseHeaderTimeout net/http
=== RUN TestTransportResponseHeaderTimeout
--- PASS: TestTransportResponseHeaderTimeout (2.01 seconds)
PASS
ok      net/http    3.049s
ba12:src bradfitz$ go test -v -race -run=TestTransportResponseHeaderTimeout net/http
=== RUN TestTransportResponseHeaderTimeout
--- PASS: TestTransportResponseHeaderTimeout (2.01 seconds)
PASS
ok      net/http    3.049s
I will reply on that CL to try to convince Dmitry why I think this is correct.

@bradfitz
Copy link
Contributor

Comment 20:

This issue was updated by revision 172acae.

Races:
http://build.golang.org/log/a2e401fdcd4903a61a3375bff5da702a20ddafad
http://build.golang.org/log/ec4c69e92076a747ac6d5df7eb7b382b31ab3d43
I think this is the first time I've actually seen a manifestation
of issue #7264, and one that I can reproduce.
I don't know why it triggers on this test and not any others
just like it, or why I can't reproduce issue #7264
independently, even when Dmitry gives me minimal repros.
Work around it for now with some synchronization to make the
race detector happy.
The proper fix will probably be in net/http/httptest itself, not
in all hundred some tests.
LGTM=rsc
R=rsc
CC=dvyukov, golang-codereviews
https://golang.org/cl/87640043

@gopherbot
Copy link

Comment 21:

CL https://golang.org/cl/87970045 mentions this issue.

@bradfitz
Copy link
Contributor

Comment 22:

This issue was closed by revision 6ddd995.

Status changed to Fixed.

@rsc
Copy link
Contributor

rsc commented Sep 8, 2014

Comment 23:

This issue was closed by revision 50fc0f1.

@dvyukov dvyukov added fixed Testing An issue that has been verified to require only test changes, not just a test failure. labels Sep 8, 2014
@rsc rsc added this to the Go1.3 milestone Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jun 25, 2018
Now it's failing on Windows:

panic: httptest: failed to listen on a port: listen tcp 127.0.0.1:0:
listen: An operation on a socket could not be performed because the
system lacked sufficient buffer space or because a queue was full.

Since we can't seem to understand what the test is trying to test,
and because it is causing problems on multiple systems,
delete it.

Fixes golang#7264.

TBR=bradfitz
CC=brainman, golang-codereviews
https://golang.org/cl/141210043
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 9, 2018
Now it's failing on Windows:

panic: httptest: failed to listen on a port: listen tcp 127.0.0.1:0:
listen: An operation on a socket could not be performed because the
system lacked sufficient buffer space or because a queue was full.

Since we can't seem to understand what the test is trying to test,
and because it is causing problems on multiple systems,
delete it.

Fixes golang#7264.

TBR=bradfitz
CC=brainman, golang-codereviews
https://golang.org/cl/141210043
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

4 participants