Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: WaitGroup showing up with race detected #8054

Closed
fraenkel opened this issue May 21, 2014 · 11 comments
Closed

sync: WaitGroup showing up with race detected #8054

fraenkel opened this issue May 21, 2014 · 11 comments

Comments

@fraenkel
Copy link
Contributor

go version devel +cd96728bb365 Mon May 19 14:29:45 2014 -0700 linux/amd64

Running our tests with -race is showing what I hope is a false positive within
sync.WaitGroup

What steps reproduce the problem?
This happens when running our tests against our GoRouter.  

The code can be found here: https://github.com/cloudfoundry/gorouter
To run the test cases, just execute ./scripts/test

The failure can be seen here.
https://travis-ci.org/cloudfoundry/gorouter/jobs/25560700

Using go 1.2, no race is ever detected.


==================
WARNING: DATA RACE
Write by goroutine 25:
  sync.raceWrite()
      /home/travis/.gvm/gos/tip/src/pkg/sync/race.go:41 +0x35
  sync.(*WaitGroup).Wait()
      /home/travis/.gvm/gos/tip/src/pkg/sync/waitgroup.go:122 +0x176
  github.com/cloudfoundry/gorouter/proxy.(*proxy).Wait()
      /home/travis/gopath/src/github.com/cloudfoundry/gorouter/proxy/proxy.go:88 +0x4b
  github.com/cloudfoundry/gorouter/router.func·003()
      /home/travis/gopath/src/github.com/cloudfoundry/gorouter/router/router.go:145 +0x77


Previous read by goroutine 20:
  sync.raceRead()
      /home/travis/.gvm/gos/tip/src/pkg/sync/race.go:37 +0x35
  sync.(*WaitGroup).Add()
      /home/travis/.gvm/gos/tip/src/pkg/sync/waitgroup.go:60 +0xbe
  github.com/cloudfoundry/gorouter/proxy.(*proxy).ServeHTTP()
      /home/travis/gopath/src/github.com/cloudfoundry/gorouter/proxy/proxy.go:117 +0x2e2
  net/http.serverHandler.ServeHTTP()
      /home/travis/.gvm/gos/tip/src/pkg/net/http/server.go:1673 +0x1e8
  net/http.(*conn).serve()
      /home/travis/.gvm/gos/tip/src/pkg/net/http/server.go:1174 +0xefc
@ianlancetaylor
Copy link
Contributor

Comment 1:

Labels changed: added repo-main, release-go1.3maybe, racedetector.

@dvyukov
Copy link
Member

dvyukov commented May 21, 2014

Comment 2:

Why do you think it's a false positive?
All similar races that I saw before were real bugs in user code. The report suggests
that you wg.Wait before all root wg.Add's.

Status changed to WaitingForReply.

@fraenkel
Copy link
Contributor Author

Comment 3:

There are cases where the code could Wait() before all Add()s are invoked.  However, the
document states that Wait() may exit early which I am willing to live with.
This is following that same pattern that httptest.Server uses which would mean that code
is potentially broke as well. In this case however, that is not the case.  I am sending
in a request that is Add()ed and then invoking a Wait().

@dvyukov
Copy link
Member

dvyukov commented May 21, 2014

Comment 4:

I do not understand. Do you call Wait before Add or not?

@fraenkel
Copy link
Contributor Author

Comment 5:

For this test case, no.  I added some printouts to prove to myself that Adds happened
before Wait.  In general, I cannot guarantee that since there is a window of possibility
which I am willing to live with.
The below shows the Data Race again, but this time I printed before/after around my
Adds/Waits.
https://travis-ci.org/cloudfoundry/gorouter/jobs/25689283

@dvyukov
Copy link
Member

dvyukov commented May 21, 2014

Comment 6:

Prints can't prove it. Prints only show how things happened in this particular
execution. I still think that you call Wait concurrently with root Add's.

@fraenkel
Copy link
Contributor Author

Comment 7:

I did some more testing just to satisfy myself.
On my mac, and linux VMs, I could never reproduce the problem with 1.2.2 or 1.3beta2.
This "race" only seems to appear on Travis.  It doesn't seem to matter how many times I
try.
I even broke down and wrote an easier testcase which couldn't recreate the issue.
Even if I force Adds after Wait but before Done, I never got a data race which I did
think was a bit odd.

@rsc
Copy link
Contributor

rsc commented May 21, 2014

Comment 8:

Given that the Add call is happening from proxy.ServeHTTP called from the http server,
it seems clear that the Add is concurrent with the Wait: the Wait can't possibly know
when a new request is going to come in.
It sounds like you are saying that you want this Wait to apply to all the Adds that come
in before it and any subset of the ones racing with it but don't care about the race.
Not caring about the race doesn't mean that it's not a race. This is usually a mistake
and probably worth fixing in your code just so that you can get meaningful reports from
the race detector about other races.
The fix would be to have a "current" wait group, and have that protected by a mutex, and
your Wait method would lock the current wait group, swap in a brand new one, and call
Wait on the old one.

Status changed to WorkingAsIntended.

@fraenkel
Copy link
Contributor Author

Comment 9:

I hate to belabor the point but it seems you just invalidated the WaitGroup
implementation.
I understand the rules that are in place when using WaitGroup.
All Add()s before a Wait().  I am assuming Done() is allowed any time after a
corresponding Add() including after Wait().
The problem I have is the implementation of Done() which invalidates the rules of use. 
Done() is simply an Add(-1) which means Add() can occur after Wait().  Or are we really
saying that you cannot allow the count to go to zero and then above 0 with an
outstanding Wait()?
The moment I have to wrap this all with a Mutex I might as well count myself and use a
buffered channel or Condition or even a WaitGroup of 1 which I trigger when my count is
0.
But like I stated in the beginning net/http/httptest/Server.go has the same race since
it too makes the same bad assumption on how to use a WaitGroup.

@dvyukov
Copy link
Member

dvyukov commented May 22, 2014

Comment 10:

The rule for WaitGroup is: all Add's that can transition the counter from 0 to >0 must
happen before Wait. Add's with negative values, or Add's with positive values which
start when counter >0 can be executed at any time.
>  Or are we really saying that you cannot allow the count to go to zero and then above
0 with an outstanding Wait()?
I don't see how it can be useful.

@gopherbot
Copy link

Comment 11:

The documentation sentence "Note that calls with positive delta must happen before the
call to Wait, or else Wait may wait for too small a group." is notoriously confusing.
Given comment #10 here, could that that be clarified too? People read that as "all
calls" or "some calls" depending on what they wish it would do.

This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants