Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: signal_recv can throw spuriously #4383

Closed
dvyukov opened this issue Nov 14, 2012 · 8 comments
Closed

runtime: signal_recv can throw spuriously #4383

dvyukov opened this issue Nov 14, 2012 · 8 comments
Milestone

Comments

@dvyukov
Copy link
Member

dvyukov commented Nov 14, 2012

14850:34142c2654fd tip

The code is:

bool
runtime·sigsend(int32 s)
{
    uint32 bit, mask;

    if(!sig.inuse || s < 0 || s >= 32*nelem(sig.wanted) || !(sig.wanted[s/32]&(1U<<(s&31))))
        return false;
    bit = 1 << (s&31);
    for(;;) {
        mask = sig.mask[s/32];
        if(mask & bit)
            break;      // signal already in queue
        if(runtime·cas(&sig.mask[s/32], mask, mask|bit)) { // (1-0), (9-2)
            // Added to queue.
            // Only send a wakeup if the receiver needs a kick.
            if(runtime·cas(&sig.kick, 1, 0)) // (3-0), (10-2)
                runtime·notewakeup(&sig); // (8-0), (11-2)
            break;
        }
    }
    return true;
}

// Called to receive the next queued signal.
// Must only be called from a single goroutine at a time.
func signal_recv() (m uint32) {
    static uint32 recv[nelem(sig.mask)];
    int32 i, more;
    
    for(;;) {
        // Serve from local copy if there are bits left.
        for(i=0; i<NSIG; i++) { // (0-1)
            if(recv[i/32]&(1U<<(i&31))) {
                recv[i/32] ^= 1U<<(i&31);
                m = i;
                goto done;
            }
        }

        // Get a new local copy.
        // Ask for a kick if more signals come in
        // during or after our check (before the sleep).
        if(sig.kick == 0) {
            runtime·noteclear(&sig);
            runtime·cas(&sig.kick, 0, 1); // (2-1), (6-1)
        }

        more = 0;
        for(i=0; i<nelem(sig.mask); i++) {
            for(;;) {
                m = sig.mask[i];
                if(runtime·cas(&sig.mask[i], m, 0)) // (4-1)
                    break;
            }
            recv[i] = m;
            if(m != 0)
                more = 1;
        }
        if(more)
            continue; // (5-1)

        // Sleep waiting for more.
        runtime·entersyscall();
        runtime·notesleep(&sig); // (7-1)
        runtime·exitsyscall();
    }

done:;
    // goc requires that we fall off the end of functions
    // that return values instead of using our own return
    // statements.
}

Consider that the code is executed by 3 threads as shown in comments. The notation is
(x-y) -> execution order x, executed by thread y. All variables are initially 0.
I.e. first thread 1 checks for signals:
        for(i=0; i<NSIG; i++) { // (0-1)
then thread 0 sets a signal bit:
        if(runtime·cas(&sig.mask[s/32], mask, mask|bit)) { // (1-0), (9-2)
then thread 1 sets kick:
            runtime·cas(&sig.kick, 0, 1); // (2-1), (6-1)
and so on.
It will lead to runtime·throw("notewakeup - double wakeup").
@rsc
Copy link
Contributor

rsc commented Nov 26, 2012

Comment 1:

Looks like the fix would be to introduce
static int32 kicking;
in signal_recv and replace
runtime.cas(&sig.kick, 0, 1)
with
if(!kicking)
    kicking = runtime.cas(&sig.kick, 0, 1)
and also replace
runtime.notesleep(&sig)
with
runtime.notesleep(&sig)
kicking = 0
What do you think, Dmitriy?

@dvyukov
Copy link
Member Author

dvyukov commented Nov 27, 2012

Comment 2:

it wont' help.
consider all vars are initially = 0
signal_recv sets kick=1
sigsend sets a signal bit, kicking=1, kick=0, and signals note
signal_recv discovers new signal bit and return
on the next round signal_recv clears note, sets kick=1 again and blocks on note
at this point it deadlocks, because kicking=1 so nobody will wake it

@dvyukov
Copy link
Member Author

dvyukov commented Nov 27, 2012

Comment 3:

the more separate non-atomic state, states and branches we have, the more tricky it
becomes to reason about it
I am thinking about something along the lines of:
Note note;
uint32 mask[...];
uint32 state;
// can be in 3 states: (1) no signals, no waiter, (2) have signals, (3) have waiter.
sigsend()
{
  set bit in mask
  in a CAS loop {
    if state==0 -> state=HAVE_SIGNALS
    if state==HAVE_SIGNALS -> do nothing
    is state==HAVE_WAITER -> state=HAVE_SIGNALS, signal note
  }
}
signal_recv()
{
  for(;;) {
    the same logic with local copy
    in a CAS loop {
      if state==0 -> state=HAVE_WAITER, block on note, clear note
      if state==HAVE_SIGNALS -> state=0, recheck mask
      is state==HAVE_WAITER -> throw
    }
  }
}
This brings 2 simplifications: (1) we have both conditions in a single var (have
signals, have waiters), so it's slightly simpler to reason, (2) if signal_recv sets
state=HAVE_WAITER, then it always blocks after that.

@rsc
Copy link
Contributor

rsc commented Nov 27, 2012

Comment 4:

Sure.

@rsc
Copy link
Contributor

rsc commented Dec 10, 2012

Comment 5:

Labels changed: added size-l.

@dvyukov
Copy link
Member Author

dvyukov commented Dec 24, 2012

Comment 6:

I am on it. In a stress test is actually crashes in a second.

Owner changed to @dvyukov.

Status changed to Accepted.

@dvyukov
Copy link
Member Author

dvyukov commented Dec 24, 2012

Comment 7:

https://golang.org/cl/6996060/

@dvyukov
Copy link
Member Author

dvyukov commented Dec 28, 2012

Comment 8:

This issue was closed by revision 91484c6.

Status changed to Fixed.

@rsc rsc added this to the Go1.1 milestone Apr 14, 2015
@rsc rsc removed the go1.1 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants