The plan9_arm builder is frequently failing TestBlockProfile in runtime/pprof, for example here. The failing case is always the blockSelectRecvAsync function:
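```go
func blockSelectRecvAsync() {
	c := make(chan bool, 1)
	c2 := make(chan bool, 1)
	go func() {
		// Child: sleep, then make c ready.
		time.Sleep(blockDelay)
		c <- true
	}()
	// Parent: expected to block here until the child sends on c.
	select {
	case <-c:
	case <-c2:
	}
}
```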
The test requires the parent goroutine to block in the select statement while the child is sleeping. If the child's send happens first, the select finds c already ready and returns without blocking, so no block event is recorded. Success therefore depends on the speed of the test platform. On the plan9_arm builder (a Raspberry Pi with 4 cores but only 1GB of RAM, running multiple tests in parallel), the blockDelay time (10ms) is not always enough to guarantee that the parent goroutine will be scheduled and reach the select before the child wakes.
Simply increasing the sleep time to 2*blockDelay is a possible fix. Empirically this seems to be enough, but it doesn't actually guarantee correctness: if the OS process running the parent goroutine is preempted just after forking a new process for the child, we can't put a bound on how long it might be delayed.
We could insert an extra synchronisation step to give the parent a "head start":
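```go
func blockSelectRecvAsync() {
	c := make(chan bool, 1)
	c2 := make(chan bool, 1)
	ready := make(chan bool, 1)
	go func() {
		<-ready // don't start the delay until the parent has signalled
		time.Sleep(blockDelay)
		c <- true
	}()
	ready <- true // buffered, so the parent does not wait here
	select {
	case <-c:
	case <-c2:
	}
}
```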
This also seems to work empirically. In theory, the OS process running the parent goroutine could still be preempted between sending on the ready channel and blocking in the select statement, but I think this is much less likely than being descheduled after creating the child.
On further testing, neither of my initial suggestions (2*blockDelay sleep time or extra sync) completely eliminates the failures. I've submitted a CL to retry the blockSelectRecvAsync test three times, which seems to do the trick. @dvyukov, could I ask you to review it please?
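The CL itself is not reproduced here. As a rough sketch of the retry idea, assuming the timing-sensitive check can be expressed as a function that reports success, a hypothetical helper `retryFlaky` re-runs it up to a fixed number of attempts and fails only if every attempt fails. One success is enough, because the failure mode is the child winning the race, not the profiler producing a wrong result.

```go
package main

import "fmt"

// retryFlaky runs check up to attempts times and reports whether any run
// succeeded. It is a hypothetical helper for timing-sensitive checks such
// as "did the select block long enough to appear in the block profile?".
func retryFlaky(attempts int, check func() bool) bool {
	for i := 0; i < attempts; i++ {
		if check() {
			return true
		}
	}
	return false
}

func main() {
	// Simulate a check that loses the race twice before succeeding.
	calls := 0
	flaky := func() bool {
		calls++
		return calls == 3
	}
	ok := retryFlaky(3, flaky)
	fmt.Println(ok, calls) // prints "true 3"
}
```

The trade-off is that a retry lowers the failure probability rather than eliminating it: a genuine regression that makes every attempt fail is still reported.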