sync: potential race when using cond.Broadcast #14064
Comments
Cond uses a special sync semaphore (think of a rendezvous on a sync chan), so the described scenario can't happen.
Dmitry, I think there's still a race in the Broadcast case outside the mutex. Suppose there are N goroutines waiting on a cv (but not on the semaphore yet). Then one goroutine calls Broadcast while, concurrently, another goroutine calls Wait on it. What happens to that last goroutine is non-deterministic (it may or may not get the notification), but what happens to the N goroutines that were already waiting is deterministic: they should all get the notification. Under this implementation, it looks like one of them may miss its notification when the "new" goroutine steals it.
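For concreteness, here's a deliberately simplified, self-contained model of the scheme under discussion. The oldCond name and the buffered channel standing in for the semaphore are illustrative assumptions, not the runtime's code (the real syncSema is a rendezvous-style handoff, as noted above, but the failure mode is the same because a release is not addressed to any specific waiter):

```go
package sketch

import "sync"

// oldCond models the pre-fix Cond: Broadcast releases as many semaphore
// counts as there are registered waiters, but nothing ties a count to the
// goroutine it was meant for.
type oldCond struct {
	L       sync.Locker   // user lock, held around condition checks
	mu      sync.Mutex    // protects waiters
	waiters int
	sema    chan struct{} // stand-in for the runtime semaphore
}

func newOldCond(l sync.Locker) *oldCond {
	return &oldCond{L: l, sema: make(chan struct{}, 1<<20)}
}

func (c *oldCond) Wait() {
	c.mu.Lock()
	c.waiters++ // step 1: register as a waiter
	c.mu.Unlock()
	c.L.Unlock() // step 2: release the user lock
	// A preemption here opens the race window: a Broadcast that runs now
	// credits the semaphore, and any other Wait can consume that credit.
	<-c.sema   // step 3: sleep on the semaphore
	c.L.Lock() // step 4: reacquire the user lock
}

func (c *oldCond) Broadcast() {
	c.mu.Lock()
	n := c.waiters
	c.waiters = 0
	c.mu.Unlock()
	for i := 0; i < n; i++ {
		c.sema <- struct{}{} // n wakeups, addressed to no one in particular
	}
}
```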
@wedsonaf please provide a full example program that will misbehave.
This is hard to reproduce because we need the thread to lose the CPU between releasing the lock and sleeping on the semaphore. So to make it easier, I modified the sync package to optionally introduce an artificial 1s delay in the spot we care about. This is controlled by the new sync.ForceWait variable (see the diff below). The code below will result in a deadlock. Note that if we remove the extra waiter altogether, the code runs to completion after the 1s wait.

```go
package main

import (
	"runtime"
	"sync"
	"time"
)

func main() {
	var m sync.Mutex
	cond := sync.NewCond(&m)

	// Start a waiter.
	ch := make(chan struct{})
	sync.ForceWait = true
	go func() {
		runtime.LockOSThread()
		m.Lock()
		ch <- struct{}{}
		cond.Wait()
		m.Unlock()
		ch <- struct{}{}
	}()
	<-ch
	m.Lock()
	m.Unlock()

	// We know that the waiter is in the cond.Wait() call because we
	// synchronized with it, then acquired/released the mutex it was
	// holding when we synchronized.
	go func() {
		cond.Broadcast()
	}()

	// Sleep before we wait to give Broadcast a chance to wait for the waiter.
	time.Sleep(20 * time.Millisecond)

	// Start the extra waiter, without the forced wait.
	sync.ForceWait = false
	go func() {
		m.Lock()
		cond.Wait()
		m.Unlock()
	}()

	// This will only be satisfied when the first waiter wakes up. We'll
	// deadlock if it never wakes up.
	<-ch
}
```

Below is the diff to the sync package:

```diff
diff --git a/src/runtime/sema.go b/src/runtime/sema.go
index b54621b..0ea5d63 100644
--- a/src/runtime/sema.go
+++ b/src/runtime/sema.go
@@ -291,3 +291,8 @@ func syncsemcheck(sz uintptr) {
 		throw("bad syncSema size")
 	}
 }
+
+//go:linkname sleep sync.runtime_Usleep
+func sleep(us uint32) {
+	usleep(us)
+}
diff --git a/src/sync/cond.go b/src/sync/cond.go
index 0aefcda..a9f7b4b 100644
--- a/src/sync/cond.go
+++ b/src/sync/cond.go
@@ -10,6 +10,8 @@ import (
 	"unsafe"
 )
 
+var ForceWait = false
+
 // Cond implements a condition variable, a rendezvous point
 // for goroutines waiting for or announcing the occurrence
 // of an event.
@@ -60,6 +62,11 @@ func (c *Cond) Wait() {
 		race.Enable()
 	}
 	c.L.Unlock()
+
+	if ForceWait {
+		runtime_Usleep(1000000)
+	}
+
 	runtime_Syncsemacquire(&c.sema)
 	c.L.Lock()
 }
diff --git a/src/sync/runtime.go b/src/sync/runtime.go
index c66d2de..8874b18 100644
--- a/src/sync/runtime.go
+++ b/src/sync/runtime.go
@@ -45,3 +45,5 @@ func runtime_canSpin(i int) bool
 
 // runtime_doSpin does active spinning.
 func runtime_doSpin()
+
+func runtime_Usleep(usec uint32)
```
Here's a proposed fix: https://go-review.googlesource.com/18892
CL https://golang.org/cl/18892 mentions this issue.
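Per the CL's description, the fix replaces the semaphore with a ticket-based notify list inside the runtime. Here is a minimal userspace sketch of that ticket idea; the ticketCond type and its channel-per-ticket representation are illustrative assumptions, not the CL's actual code:

```go
package sketch

import "sync"

// ticketCond sketches a ticket-based notify list: Broadcast wakes exactly
// the tickets issued before it ran, so a later Wait cannot steal an earlier
// waiter's wakeup.
type ticketCond struct {
	L      sync.Locker              // user lock, held around condition checks
	mu     sync.Mutex               // protects the fields below
	wait   uint64                   // next ticket to hand out
	notify uint64                   // tickets below this have been notified
	chans  map[uint64]chan struct{} // one wakeup channel per outstanding ticket
}

func newTicketCond(l sync.Locker) *ticketCond {
	return &ticketCond{L: l, chans: make(map[uint64]chan struct{})}
}

// Wait mirrors Cond.Wait: take a ticket while L is held, release L, block
// until a Broadcast covers the ticket, then reacquire L.
func (c *ticketCond) Wait() {
	c.mu.Lock()
	t := c.wait
	c.wait++
	ch := make(chan struct{})
	c.chans[t] = ch
	c.mu.Unlock()

	c.L.Unlock()
	<-ch // if Broadcast already ran, ch is closed and this returns at once
	c.L.Lock()
}

// Broadcast snapshots the current ticket range and wakes only those waiters.
func (c *ticketCond) Broadcast() {
	c.mu.Lock()
	for t := c.notify; t < c.wait; t++ {
		close(c.chans[t])
		delete(c.chans, t)
	}
	c.notify = c.wait
	c.mu.Unlock()
}
```

The key property: a Wait that begins after a Broadcast holds a ticket outside the Broadcast's snapshot, so it cannot consume a wakeup addressed to the earlier waiters.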
@wedsonaf yes, this is a bug in Cond. Have you actually hit this scenario in real life, or is it a thought experiment?
It was a "directed" thought experiment: I was investigating an issue where it seemed like a waiting goroutine wasn't waking up on a broadcast, but it turned out to be something else. |
Copied from a posting to golang-nuts by Wedson Almeida Filho. I (Ian) don't think there is a problem with cond.Signal, but there does seem to be a problem with cond.Broadcast.
I'm under the impression that the current implementation of condition variables in Go is incorrect and can lead to deadlocks. cond.Wait does the following:

1. Registers the caller as a waiter (increments the waiter count).
2. Releases the lock (c.L.Unlock()).
3. Sleeps on the semaphore (runtime_Syncsemacquire(&c.sema)).
4. Reacquires the lock (c.L.Lock()).
It looks like it's possible for one thread to "steal" a signal/broadcast intended for another one if it loses the CPU between steps 2 & 3 above. For example, the following sequence seems problematic:
1. Thread A calls cond.Wait and loses the CPU between steps 2 & 3 above (it is preempted by the kernel scheduler).
2. Thread B calls cond.Signal or cond.Broadcast; the semaphore gets incremented to 1.
3. Thread C calls cond.Wait; the wait on the semaphore is immediately satisfied by the count stored by the previous step.

A more concrete example might be a [simplified] consumer/producer implementation (I'm only keeping track of a count):
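A sketch of what such an implementation might look like (the names consume, produce, count, and max follow the sequence below; the value of max and the Broadcast in consume are assumptions):

```go
package sketch

import "sync"

const max = 10 // assumed bound

var (
	m     sync.Mutex
	cond  = sync.NewCond(&m)
	count int
)

func consume() {
	m.Lock()
	for count == 0 {
		cond.Wait() // sleep until there is something to consume
	}
	count--
	cond.Broadcast() // wake producers blocked on count == max
	m.Unlock()
}

func produce() {
	m.Lock()
	for count == max {
		cond.Wait() // sleep until there is room
	}
	count++
	cond.Broadcast() // wake consumers blocked on count == 0
	m.Unlock()
}
```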
For the code above, suppose count == 0. The following sequence is possible:

1. Thread A calls consume, which will call cond.Wait; suppose it loses the CPU between steps 2 & 3.
2. Thread B calls produce, which will call cond.Broadcast, which will cause the semaphore to go to 1.
3. Thread B calls produce repeatedly, until count reaches the max.
4. Thread B's next call to produce calls cond.Wait, which will be satisfied immediately by the count left over from step 2.
5. Thread B sees that count is still at the max, so it calls cond.Wait again, and sleeps.

The threads are now deadlocked: Thread A's wakeup was consumed in step 4, so when it finally runs it sleeps on the semaphore forever. Granted, it's a race condition that is hard to hit, but it looks like a correctness issue.