Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/pprof: TestBlockProfile is timing dependent #15096

Closed
millerresearch opened this issue Apr 4, 2016 · 4 comments
Closed

runtime/pprof: TestBlockProfile is timing dependent #15096

millerresearch opened this issue Apr 4, 2016 · 4 comments

Comments

@millerresearch
Copy link
Contributor

The plan9_arm builder is frequently failing TestBlockProfile in runtime/pprof, for example here. The failing case is always the blockSelectRecvAsync function:

func blockSelectRecvAsync() {
    c := make(chan bool, 1)
    c2 := make(chan bool, 1)
    go func() {
        time.Sleep(blockDelay)
        c <- true
    }()
    select {
    case <-c:
    case <-c2:
    }
}

The test requires the parent goroutine to block in the select statement while the child is sleeping. The problem is that success is dependent on the speed of the test platform. On the plan9_arm builder (a Raspberry Pi with 4 cores but only 1GB of RAM, running multiple tests in parallel), the blockDelay time (10ms) is not always enough to guarantee that the parent goroutine will be dispatched before the child wakes.

Simply increasing the sleep time to 2*blockDelay is a possible fix. Empirically this seems to be enough; but it doesn't actually guarantee correctness. If the OS process running the parent goroutine is preempted just after forking a new process for the child, we can't really put a bound on how long it might be delayed.

We could insert an extra synchronisation to give the parent a "head start":

func blockSelectRecvAsync() {
    c := make(chan bool, 1)
    c2 := make(chan bool, 1)
    ready := make(chan bool, 1)
    go func() {
        <-ready
        time.Sleep(blockDelay)
        c <- true
    }()
    ready <- true
    select {
    case <-c:
    case <-c2:
    }
}

This also seems to work empirically. In theory the OS process running the parent goroutine could still be preempted between reading the ready channel and blocking in the select statement; but I think this is much less likely than being descheduled after creating the child.

@bradfitz
Copy link
Contributor

bradfitz commented Apr 4, 2016

@dvyukov, any preference on fix?

@bradfitz bradfitz added this to the Unplanned milestone Apr 4, 2016
@dvyukov
Copy link
Member

dvyukov commented Apr 4, 2016

Run the tests that depend on timing several times until we get the expected result. I don't see a better solution for such tests.

@gopherbot
Copy link

CL https://golang.org/cl/21604 mentions this issue.

@millerresearch
Copy link
Contributor Author

On further testing, neither of my initial suggestions (2*blockDelay sleep time or extra sync) completely eliminates the failures. I've submitted a CL to retry the blockSelectRecvAsync test three times, which seems to do the trick. @dvyukov, could I ask you to review it please?

@golang golang locked and limited conversation to collaborators Apr 8, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants