
runtime, net: spurious wakeups in netpoll using kevent #14548

Closed
bdarnell opened this issue Feb 28, 2016 · 8 comments

@bdarnell

  1. What version of Go are you using (go version)?

go version go1.6 darwin/amd64

  2. What operating system and processor architecture are you using (go env)?

    GOARCH="amd64"
    GOOS="darwin"

  3. What did you do?

Open and close a lot of sockets to localhost in multiple goroutines, writing to the client side of each socket as soon as net.Dial returns.

Runnable example here: https://github.com/tamird/go-conn-repro. This is the same repro case as #14539, but this issue is about the low-level networking problem discovered here rather than the error handling in crypto/tls.

  4. What did you expect to see?

All connections should succeed.

  5. What did you see instead?

Sometimes, connections hang and cause the test to time out. With the patch from https://go-review.googlesource.com/#/c/19990/ to return errors properly, we see that net.Conn.Write is returning "socket is not connected", i.e. ENOTCONN.

Analysis:

runtime/netpoll.go may have spurious wakeups, in which a goroutine blocked in pollDesc.WaitWrite may be released even though the socket is not writeable, and likewise for reads. This is normally fine: WaitWrite and WaitRead are used in loops, so if the goroutine is woken up too early it just gets EAGAIN from the system call and goes back to sleep.

However, there is at least one case where an early wakeup results in an error other than EAGAIN: a socket that has not yet completed its asynchronous connect(2) call returns ENOTCONN for some system calls, including getpeername and write. In addition, because the connection has not yet completed, the socket does not yet have an error to be returned by getsockopt(SOL_SOCKET, SO_ERROR), so netFD.connect believes the connection has completed successfully. Once connect and Dial return, the connection is presumably writeable, but the first call to Write may fail (this is exacerbated by the faulty error handling in crypto/tls, but is problematic in any case).

My evidence that this in fact happening comes from running the above test case under dtruss. Compare a successful connection:

connect(0x9, 0xC8202BB42C, 0x10)         = -1 Err#36
kevent(0x6, 0xC82004ACA8, 0x2)       = 0 0
kevent(0x6, 0x0, 0x0)        = 1 0
getsockopt(0x9, 0xFFFF, 0x1007)      = 0 0
getsockname(0x9, 0xC82004AEE4, 0xC82004AEE0)         = 0 0
getpeername(0x9, 0xC82004AEE4, 0xC82004AEE0)         = 0 0
setsockopt(0x9, 0x6, 0x1)        = 0 0
write(0x9, "\026\003\001\0", 0x84)       = 132 0

with an unsuccessful one:

connect(0x7, 0xC82038078C, 0x10)         = -1 Err#36
kevent(0x6, 0xC820176CA8, 0x2)       = 0 0
write(0x9, "\025\003\003\0", 0x1F)       = 31 0
close(0x9)       = 0 0
kevent(0x6, 0x0, 0x0)        = 0 0
getsockopt(0x7, 0xFFFF, 0x1007)      = 0 0
getsockname(0x7, 0xC820176EE4, 0xC820176EE0)         = 0 0
getpeername(0x7, 0xC820176EE4, 0xC820176EE0)         = -1 Err#57
setsockopt(0x7, 0x6, 0x1)        = 0 0
write(0x7, "\0", 0x84)       = -1 Err#57

(errno translations: 36=EINPROGRESS, 57=ENOTCONN). In the successful case, getsockopt is not called until kevent returns with a new event. In the failing case, there is no call to kevent with non-empty results in between the connect and getsockopt calls, so there must be a lingering wakeup caused by reuse of pollDesc objects that started before this snippet, which unblocks the goroutine and allows it to proceed to call getsockopt.

runtime/netpoll.go refers to the possibility of getting stale notifications, but says they are handled by the use of the seq field. This appears to be the case for deadlines, but seq does not appear to be consulted on the path between kqueue and waking up the goroutine.

I see two approaches to fixing this: either remove the possibility of spurious wakeups from netpoll.go so that a return from WaitWrite guarantees that the fd became writeable (e.g. check seq or fd when unblocking a waiter), or make netFD.connect tolerant of spurious wakeups by testing for the writeability of the socket before returning (e.g. attempt to Write(nil) and if it returns ENOTCONN go back into the WaitWrite loop).

All my analysis of this bug has been on OS X and I don't know which parts of this may vary on other platforms. This may shed some light on a TODO in tcpsock_posix.go.

@mikioh
Contributor

mikioh commented Feb 28, 2016

From my experience with #13853, I suspect this is a darwin-specific issue. You can write a small dtrace script to see more details. I'm not sure whether this issue occurs when using external connectivity rather than loopback; I guess it's worth trying external connectivity.

@mikioh mikioh changed the title net: Spurious wakeups in netpoll cause broken connections runtime, net: Spurious wakeups in netpoll cause broken connections Feb 28, 2016
@bdarnell
Author

I agree that this is likely darwin-specific. On linux, I believe spurious wakeups are possible, but they are relatively harmless because write(2) returns EAGAIN instead of ENOTCONN in this case and the regular retry loop takes care of it (but it's not completely harmless: I think this is one way in which a connection can end up with a nil RemoteAddr).

Spurious wakeups are also more common on kqueue platforms because netpollclose is a no-op there: this means that pollDesc objects are put back on the freelist before their file descriptors have been closed. But I tried making netpollclose unregister the fd with kqueue, and it did not resolve the problems seen here.

@mikioh
Contributor

mikioh commented Feb 29, 2016

so there must be a lingering wakeup caused by reuse of pollDesc objects that started before this snippet, which unblocks the goroutine and allows it to proceed to call getsockopt.

I don't know why you suppose so. I've just inspected the TCP control block inside the kernel using the following snippet and observed that the kernel notified the application even while the TCP state was still "syn-sent". Furthermore, it happened without any notification through ev.flags such as EV_EOF or EV_ERROR.

package main

import (
        "log"
        "net"
        "runtime"
        "sync"

        "github.com/mikioh/tcp"
)

func main() {
        ln, err := net.Listen("tcp", "127.0.0.1:0")
        if err != nil {
                log.Fatal(err)
        }
        defer ln.Close()

        go func() {
                for {
                        c, err := ln.Accept()
                        if err != nil {
                                return
                        }
                        go func(c net.Conn) {
                                defer c.Close()
                                var b [512]byte
                                read(c, b[:])
                                write(c, b[:])
                        }(c)
                }
        }()


        attempts := int(1e4)
        var wg sync.WaitGroup
        wg.Add(attempts)
        N := runtime.GOMAXPROCS(-1) * 2
        sem := make(chan bool, N)
        for i := 0; i < attempts; i++ {
                sem <- true
                go func() {
                        defer wg.Done()
                        defer func() {
                                <-sem
                        }()
                        c, err := net.Dial(ln.Addr().Network(), ln.Addr().String())
                        if err != nil {
                                return
                        }
                        defer c.Close()
                        var b [512]byte
                        write(c, b[:])
                        read(c, b[:])
                }()
        }
        wg.Wait()

        log.Printf("%dx%d concurrent runs", N, attempts)
}

func write(c net.Conn, b []byte) {
        n, err := c.Write(b)
        if n != len(b) || err != nil {
                switch runtime.GOOS {
                case "darwin", "freebsd", "linux":
                        tc, xerr := tcp.NewConn(c)
                        if xerr != nil {
                                log.Fatal(xerr)
                        }
                        ti, xerr := tc.Info()
                        if xerr != nil {
                                log.Fatal(xerr)
                        }
                        log.Fatalf("err=%q tcp-state=%q\n", err, ti.State)
                default:
                        log.Fatalf("err=%q\n", err)
                }
        }
}

func read(c net.Conn, b []byte) {
        for nr := 0; nr != len(b); {
                n, err := c.Read(b[nr:]) // read into the unfilled tail
                nr += n
                if err != nil {
                        log.Fatal(err)
                }
        }
}

@mikioh
Contributor

mikioh commented Feb 29, 2016

FWIW, my dtrace script is the following:

#!/usr/sbin/dtrace -s

syscall::kevent:entry
/execname == "tcp"/
{
        self->kevent = arg0;
        printf("tid=%d", tid);
}

syscall::kevent:return
/execname == "tcp"/
{
        printf("tid=%d s=%d n=%d errno=%d", tid, self->kevent, arg0, errno);
}

syscall::socket:entry
/execname == "tcp"/
{
        printf("tid=%d", tid);
}

syscall::socket:return
/execname == "tcp"/
{
        printf("tid=%d s=%d s=%d errno=%d", tid, arg0, arg1, errno);
}

syscall::connect:entry
/execname == "tcp"/
{
        printf("tid=%d s=%d", tid, arg0);
        self->connect = arg0;
}

syscall::connect:return
/execname == "tcp"/
{
        printf("tid=%d s=%d errno=%d", tid, self->connect, errno);
}

syscall::listen:entry
/execname == "tcp"/
{
        printf("tid=%d s=%d", tid, arg0);
        self->listen = arg0;
}
syscall::listen:return
/execname == "tcp"/
{
        printf("tid=%d s=%d errno=%d", tid, self->listen, errno);
}

syscall::accept:entry
/execname == "tcp"/
{
        printf("tid=%d s=%d", tid, arg0);
        self->accept = arg0;
}

syscall::accept:return
/execname == "tcp"/
{
        printf("tid=%d s=%d s=%d errno=%d", tid, self->accept, arg1, errno);
}

syscall::getsockopt:entry
/execname == "tcp"/
{
        printf("tid=%d s=%d", tid, arg0);
        self->getsockopt = arg0;
}

syscall::getsockopt:return
/execname == "tcp"/
{
        printf("tid=%d s=%d errno=%d", tid, self->getsockopt, errno);
}

syscall::read:entry
/execname == "tcp"/
{
        printf("tid=%d s=%d", tid, arg0);
        self->read = arg0;
}

syscall::read:return
/execname == "tcp"/
{
        printf("tid=%d s=%d n=%d errno=%d", tid, self->read, arg0, errno);
}
syscall::write:entry
/execname == "tcp"/
{
        printf("tid=%d s=%d", tid, arg0);
        self->write = arg0;
}

syscall::write:return
/execname == "tcp"/
{
        printf("tid=%d s=%d n=%d errno=%d", tid, self->write, arg0, errno);
}

syscall::close:entry
/execname == "tcp"/
{
        printf("tid=%d s=%d", tid, arg0);
        self->close = arg0;
}

syscall::close:return
/execname == "tcp"/
{
        printf("tid=%d s=%d errno=%d", tid, self->close, errno);
}

@mikioh
Contributor

mikioh commented Feb 29, 2016

@bdarnell, I'd be glad if you have spare time to investigate this issue further.

@bdarnell
Author

so there must be a lingering wakeup caused by reuse of pollDesc objects that started before this snippet, which unblocks the goroutine and allows it to proceed to call getsockopt.

I don't know the reason why you suppose so.

The reason is in the two dtruss snippets. The successful case has kevent(0x6, 0x0, 0x0) = 1 0 while the unsuccessful case has kevent(0x6, 0x0, 0x0) = 0 0. The first result value is the number of events returned by kevent, and in both cases there was only one kevent call in between the calls to connect and getsockopt.

The goroutine in netFD.connect() should not be brought out of its call to WaitWrite until a new result from kevent unblocks it. Since it did anyway, I think another thread's call to kevent was in the process of waking up this pollDesc, but the pollDesc was concurrently reassigned from the old goroutine and fd to the new one. I'm not 100% positive this is what's going on because this is some subtle code, but comments in the netpoll files seem to suggest that spurious wakeups are possible, which would cause problems for netFD.connect().

@mikioh
Contributor

mikioh commented Mar 1, 2016

The reason is in the two dtruss snippets.

Um, I'm still not sure, because there are two kinds of kevent calls: one for initial registration with the single per-process kqueue, and another for capturing kernel events, and I don't know whether dtruss distinguishes them as well as dtrace does. Anyway, when I tweak the dtrace script mentioned above like the following:

syscall::kevent:entry
/execname == "tcp" && arg2 != 0/
{
        self->kevent = arg0;
        self->nreg = arg2;
        self->nevt = arg4;
        printf("REG: tid=%d s=%d nreg=%d nevt=%d", tid, arg0, arg2, arg4);
}

syscall::kevent:entry
/execname == "tcp" && arg2 == 0/
{
        self->kevent = arg0;
        self->nreg = arg2;
        self->nevt = arg4;
        printf("EVT: tid=%d s=%d nreg=%d nevt=%d", tid, arg0, arg2, arg4);
}

syscall::kevent:return
/execname == "tcp" && self->nreg != 0/
{
        printf("REG: tid=%d s=%d nreg=%d nevt=%d n=%d errno=%d", tid, self->kevent, self->nreg, self->nevt, arg0, errno);
}

syscall::kevent:return
/execname == "tcp" && self->nreg == 0/
{
        printf("EVT: tid=%d s=%d nreg=%d nevt=%d n=%d errno=%d", tid, self->kevent, self->nreg, self->nevt, arg0, errno);
}

we can see

  0    338                   connect:return tid=243094 s=8 errno=36
  0    867                     kevent:entry REG: tid=243094 s=4 nreg=2 nevt=0
  0    868                    kevent:return EVT: tid=243088 s=4 nreg=0 nevt=64 n=3 errno=0
  0    868                    kevent:return REG: tid=243094 s=4 nreg=2 nevt=0 n=0 errno=0
  0    377                 getsockopt:entry tid=243094 s=8
  0    147                       read:entry tid=243088 s=18
  0    378                getsockopt:return tid=243094 s=8 errno=0
  0    148                      read:return tid=243088 s=18 n=512 errno=0
  0    147                       read:entry tid=243097 s=11
  0    338                   connect:return tid=243099 s=10 errno=36
  0    338                   connect:return tid=243095 s=17 errno=36
  0    153                      close:entry tid=243088 s=18
  0    148                      read:return tid=243097 s=11 n=512 errno=0
  0    867                     kevent:entry REG: tid=243099 s=4 nreg=2 nevt=0
  0    867                     kevent:entry REG: tid=243095 s=4 nreg=2 nevt=0
  0    149                      write:entry tid=243094 s=8
  0    153                      close:entry tid=243097 s=11
  0    338                   connect:return tid=243098 s=9 errno=36
  0    150                     write:return tid=243094 s=8 n=-1 errno=57

Perhaps there's a race condition in the kqueue implementation on Darwin, like #14127.

Summary:

  • There is a spurious kevent notification during the TCP connection setup process
  • The notification occurs not only with Go 1.4 and below but also with Go 1.5 and above
    • I confirmed that it happened with Go 1.4 last night
  • Still unclear whether the notification depends on the kernel version
    • At least it occurs on OS X 10.11
  • Still unclear what the root cause of the spurious notification is

@gopherbot

CL https://golang.org/cl/20468 mentions this issue.

@mikioh mikioh changed the title runtime, net: Spurious wakeups in netpoll cause broken connections runtime, net: spurious wakeups in netpoll using kevent Mar 21, 2016
@golang golang locked and limited conversation to collaborators Mar 21, 2016