log/syslog, net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket #23604

Closed
esmet opened this issue Jan 29, 2018 · 20 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@esmet

esmet commented Jan 29, 2018

Writes to syslog using a unix datagram socket (the default behavior on Ubuntu 14.04, and probably other distributions) may become stuck indefinitely if the syslog daemon closes /dev/log.

The following reproduces the issue. This code is derived from a reproducer found in the comments section of #5932 - https://play.golang.org/p/vp_e6n8VJuX
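(The playground code isn't reproduced inline here, but a minimal sketch along these lines captures the idea: open the local syslog writer, which talks to /dev/log over a unix datagram socket, and write in a tight loop. The tag and loop count are arbitrary; this is illustrative rather than the exact reproducer.)

package main

import (
	"log"
	"log/syslog"
)

func main() {
	// Connects to the local syslog daemon; on typical Linux setups this
	// is the /dev/log unix datagram socket.
	w, err := syslog.New(syslog.LOG_INFO|syslog.LOG_USER, "repro")
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()
	// Hammer the socket. If the daemon closes /dev/log at the wrong
	// moment, a write can end up stuck in waitWrite, per this report.
	for i := 0; i < 1000000; i++ {
		w.Info("test line") // error ignored so the loop keeps writing
	}
}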

To reproduce:

  • Execute the above reproducer binary several times in the background (3-6 times is usually good).
  • Restart your syslog daemon. I'm using syslog-ng, so sudo service syslog-ng restart suffices.
  • Repeat this exact experiment a few times until you observe some of the runs never exit. Kill them with -SIGQUIT and you should hopefully see the same stack pasted below. It sometimes helps to stagger things: run some in the background, restart the daemon, wait a few seconds, run some more in the background, restart the daemon again. Get aggressive. Anger the system.


We first observed this bad behavior in a production environment last week during a DNS outage. The outage prevented our syslog-ng daemon from reloading its configuration properly, since it exits uncleanly on configuration reload when DNS is unavailable. We then observed a large portion of our production boxes hung completely, with one goroutine waiting on syslog and the rest waiting to acquire the log package mutex. I believe the syslog-ng daemon restart exposed the bad behavior in the syslog and/or net packages. Our syslog write volume is fairly high.

After that, I reproduced the issue locally using the above reproducer. I think the badness has to do with missing a socket close event, or some other invalid state transition in the net code.

Goroutine stack after killing the stuck process with -SIGQUIT: https://pastebin.com/GuV5JZDS

Environment:
$ go version
go version go1.7 linux/amd64
$ go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jesmet/.gvm/pkgsets/go1.7/global"
GORACE=""
GOROOT="/home/jesmet/.gvm/gos/go1.7"
GOTOOLDIR="/home/jesmet/.gvm/gos/go1.7/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build126114309=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"

@esmet
Author

esmet commented Feb 2, 2018

This issue is highly related to, if not the same as, #5932 (there are multiple trains of thought in that issue).

@davecheney
Contributor

Sorry to hear you're having problems. Sadly, Go 1.7 is long out of support. Can you upgrade to 1.9.x or 1.10rc1 and verify whether the problem still occurs?

@davecheney davecheney added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Feb 2, 2018
@esmet
Author

esmet commented Feb 2, 2018

I was able to reproduce this just now using Go 1.10rc1.

Here's the full goroutine stack trace: https://pastebin.com/xwnYJKi7

@esmet
Author

esmet commented Feb 2, 2018

I modified the runtime to print certain key pollDesc events: close, evict, pollUnblock, etc. I noticed that when I restart syslog-ng and see two log lines, one close/evict/unblock group for each syslog connection, the process ends up not hanging. When I only see one group of those calls, the other connection hangs. My current theory is that something is not calling FD::Close(), which calls fd.pd.evict(), which calls runtime_pollUnblock.

Still digging...

@esmet
Author

esmet commented Feb 4, 2018

After much more digging I'm pretty certain that we should be seeing an EPOLLHUP or EPOLLERR condition in netpoll() when the /dev/log read side closes unexpectedly, but we're not. Since we don't wake up waitWrite and the fd is half shut down, the syslog write stays stuck.

@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone Feb 4, 2018
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Feb 4, 2018
@esmet
Author

esmet commented Feb 4, 2018

It's worth mentioning that I've reproduced this on Linux kernel versions 3.13.0 and 4.4.0.

@esmet
Author

esmet commented Feb 5, 2018

I have to revise my analysis.

EPOLLHUP and EPOLLERR are probably not expected on a datagram socket, since it's connectionless, and I think a hang up only makes sense over a connection-oriented socket. The real event that should be observed when the reading end closes is EPOLLOUT, because it means our next write is non-blocking and will fail immediately with ECONNREFUSED.
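(As a concrete illustration of that expectation, here is a minimal, self-contained sketch; the socket path is arbitrary. Once the reading side of a connected unixgram socket goes away, the next write should fail immediately with a "connection refused" error rather than block.)

package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	const path = "/tmp/unixgram-demo.sock" // illustrative path

	srv, err := net.ListenPacket("unixgram", path)
	if err != nil {
		panic(err)
	}
	c, err := net.Dial("unixgram", path)
	if err != nil {
		panic(err)
	}

	// Close the reading end, roughly what a syslog daemon restart does.
	srv.Close()
	os.Remove(path)

	// Expected: an immediate ECONNREFUSED-style error, not a hang.
	_, err = c.Write([]byte("test line"))
	fmt.Println(err)
	c.Close()
}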

Now the question is: are we missing the delivery of an EPOLLOUT event to the pollDesc in the golang runtime, or does the EPOLLOUT never come from epoll/linux?

I tried reproducing the environment in C, using a client and server:

https://pastebin.com/59AT01AT client
https://pastebin.com/Y37xbjvp server

I was not able to reproduce the issue with 8 concurrent clients and a single server. Using strace and some logging code that's since been removed, I was able to confirm that the clients are indeed bouncing between EAGAIN and epoll_wait (as expected) and that when the server dies, they immediately transition to ECONNREFUSED until the server comes back, at which point everything is ok again.

To reiterate, in the golang reproducer, when the server dies, the clients sometimes get stuck in code that is waiting for an EPOLLOUT, and it seemingly never comes.

The negative result with C suggests, but does not prove, that this could be an issue with the way the Go runtime is using epoll and pollDescs, as opposed to a bug in epoll itself. But a lot of digging around the internet suggests that epoll is far from perfect and could certainly be the root cause, especially when there are multiple clients connected to the same server socket.

I'd appreciate any feedback on the analysis so far and would love some suggestions on where to look next in the pollDesc / netpoll code for possible bugs.

@bradfitz
Contributor

bradfitz commented Feb 5, 2018

/cc @ianlancetaylor

@dmitshur dmitshur changed the title log/syslog + net/net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket log/syslog, net/net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket Mar 10, 2018
@mmathys

mmathys commented May 16, 2018

Has this been fixed?

@bradfitz
Contributor

This bug is still open.

@fraenkel
Contributor

Ran the example from above and it exited on my machine (Ubuntu 18.04, kernel 4.15.0).
Tested both 1.10.2 and tip.

@ianlancetaylor ianlancetaylor changed the title log/syslog, net/net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket log/syslog, net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket Jun 20, 2018
@ianlancetaylor
Contributor

I think I may have been able to recreate the problem on tip with this program.

package main

import (
	"fmt"
	"io/ioutil"
	"net"
	"os"
	"runtime"
	"sync"
	"time"
)

func main() {
	temp, err := ioutil.TempFile("", "test")
	if err != nil {
		die(err)
	}
	addr := temp.Name()
	temp.Close()
	os.Remove(addr)

	go server(addr)

	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			client(addr)
		}()
	}
	wg.Wait()
}

func server(addr string) {
	for {
		l, err := net.ListenPacket("unixgram", addr)
		if err != nil {
			die(err)
		}
		for i := 0; i < 20; i++ {
			var buf [4096]byte
			if _, _, err := l.ReadFrom(buf[:]); err != nil {
				die(err)
			}
		}
		if err := l.Close(); err != nil {
			die(err)
		}
		os.Remove(addr)
		time.Sleep(time.Millisecond)
	}
}

func client(addr string) {
	var c net.Conn
	for i := 0; i < 500000; i++ {
		for c == nil {
			var err error
			c, err = net.Dial("unixgram", addr)
			if err != nil {
				time.Sleep(time.Microsecond)
			}
		}
		if _, err := c.Write([]byte("test line")); err != nil {
			c.Close()
			c = nil
		}
		if i%10000 == 0 {
			time.Sleep(time.Millisecond)
		}
	}
}

func die(err error) {
	_, file, line, _ := runtime.Caller(1)
	fmt.Fprintf(os.Stderr, "%s:%d: %v\n", file, line, err)
	os.Exit(1)
}
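(If a run of this program wedges, sending SIGQUIT, e.g. kill -QUIT <pid>, makes the Go runtime dump all goroutine stacks, which should show a goroutine parked in waitWrite as in the original report.)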

@ianlancetaylor
Contributor

I captured an strace of the program in the previous comment. Some things I see:

Adding descriptor 19 to epoll:

[pid 213739] epoll_ctl(4, EPOLL_CTL_ADD, 19, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=126060880, u64=140415491869008}} <unfinished ...>

Descriptor 19 is returned by epoll a few times, as seen by observing the u64 value 140415491869008 stored with EPOLL_CTL_ADD:

[pid 213761] <... epoll_pwait resumed> [{EPOLLOUT, {u32=126060880, u64=140415491869008}}, {EPOLLOUT, {u32=126060672, u64=140415491868800}}, {EPOLLOUT, {u32=126059424, u64=140415491867552}}, {EPOLLOUT, {u32=126060048, u64=140415491868176}}, {EPOLLOUT, {u32=126058800, u64=140415491866928}}, {EPOLLOUT, {u32=126062336, u64=140415491870464}}, {EPOLLOUT, {u32=126061920, u64=140415491870048}}, {EPOLLOUT, {u32=126058592, u64=140415491866720}}], 128, -1, NULL, 3) = 8

After the last time descriptor 19 is returned by epoll, we see one last write to descriptor 19:

[pid 213761] write(19, "test line", 9 <unfinished ...>
[pid 213743] epoll_pwait(4,  <unfinished ...>
[pid 213761] <... write resumed> )      = -1 EAGAIN (Resource temporarily unavailable)

We never see descriptor 19 returned by epoll again. We never see any other reference to descriptor 19.

Shortly before the last write to descriptor 19 I see this:

[pid 213740] recvfrom(6,  <unfinished ...>
...
[pid 213740] <... recvfrom resumed> "test line", 4096, 0, 0xc000128cf0, [112->0]) = 9
...
[pid 213740] epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc000128e0c <unfinished ...>
...
[pid 213740] <... epoll_ctl resumed> )  = 0

Shortly after the last write to descriptor 19 I see this:

[pid 213740] close(6 <unfinished ...>
...
[pid 213740] <... close resumed> )      = 0

Interestingly, the close takes a fairly long time. Between the start of the close and the completion I see this, in which descriptor 6 is opened again and closed again. Not sure if that means anything; probably not, since other descriptors seem to be returned by epoll as expected.

[pid 213761] socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0 <unfinished ...>
...
[pid 213761] <... socket resumed> )     = 6
[pid 213761] setsockopt(6, SOL_SOCKET, SO_BROADCAST, [1], 4 <unfinished ...>
...
[pid 213761] <... setsockopt resumed> ) = 0
...
[pid 213761] connect(6, {sa_family=AF_UNIX, sun_path="/tmp/test046398477"}, 21 <unfinished ...>
...
[pid 213761] <... connect resumed> )    = -1 ECONNREFUSED (Connection refused)
...
[pid 213761] close(6 <unfinished ...>
...
[pid 213761] <... close resumed> )      = 0

There are numerous calls to epoll_wait after the last write to descriptor 19, but none of them report descriptor 19 as ready.
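(For anyone who wants to capture a similar trace: running the reproducer under something along the lines of strace -f -o trace.txt ./repro, with -f to follow all threads, is enough to see the epoll_ctl/epoll_pwait/write activity above; the output file and binary names are placeholders.)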

@ianlancetaylor
Contributor

The upshot of the last comment is: as far as I can tell, the kernel is failing to return EPOLLOUT events for some descriptors when the descriptor on the other end of the Unix socket is closed. This is also what @esmet reported above in #23604 (comment) .

Searching the web, I see a similar question at https://stackoverflow.com/questions/38441059/edge-triggered-epoll-for-unix-domain-socket .

This does make me suspect that this is a kernel bug. I'm running 4.9.0-6-amd64.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jun 20, 2018
@fraenkel
Contributor

I am on a 4.15 kernel and I see similar traces.

@ianlancetaylor
Contributor

I suspect that this C program shows the same problem. It's hard to be really sure. I would appreciate it if others could look over this code and see if they can spot any bugs. Thanks.

#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/un.h>

static char temp[11];

static void die(const char* msg, int err) {
  fprintf(stderr, "%s: %s\n", msg, strerror(err));
  exit(EXIT_FAILURE);
}

struct condlock {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int ready;
};

void condlock_init(struct condlock* cl) {
  int err;

  err = pthread_mutex_init(&cl->mutex, NULL);
  if (err != 0) {
    die("pthread_mutex_init", err);
  }
  err = pthread_cond_init(&cl->cond, NULL);
  if (err != 0) {
    die("pthread_cond_init", err);
  }
  cl->ready = 0;
}

void condlock_signal(struct condlock* cl) {
  int err;

  err = pthread_mutex_lock(&cl->mutex);
  if (err != 0) {
    die("pthread_mutex_lock", err);
  }
  cl->ready = 1;
  err = pthread_cond_signal(&cl->cond);
  if (err != 0) {
    die("pthread_cond_signal", err);
  }
  err = pthread_mutex_unlock(&cl->mutex);
  if (err != 0) {
    die("pthread_mutex_unlock", err);
  }
}

void condlock_wait(struct condlock* cl) {
  int err;
  struct timespec ts;

  /* If everything worked as expected, the poller would always report
     that a descriptor is ready and call condlock_signal.  We give it
     20 seconds to do so.  If it fails, we are seeing a bug.  */

  memset(&ts, 0, sizeof ts);

  err = pthread_mutex_lock(&cl->mutex);
  if (err != 0) {
    die("pthread_mutex_lock", err);
  }

  if (!cl->ready) {
    ts.tv_sec = time(NULL) + 20;
    err = pthread_cond_timedwait(&cl->cond, &cl->mutex, &ts);
    if (err != 0) {
      if (err == ETIMEDOUT) {
	fprintf(stderr, "waited more than 20 seconds for descriptor to be ready\n");
	exit(EXIT_FAILURE);
      } else {
	die("pthread_cond_timedwait", err);
      }
    }
  }

  cl->ready = 0;

  err = pthread_mutex_unlock(&cl->mutex);
  if (err != 0) {
    die("pthread_mutex_unlock", err);
  }
}

static int pollfd;

static void poll_init() {
  int d;

  d = epoll_create(20);
  if (d < 0) {
    die("epoll_create", errno);
  }
  pollfd = d;
}

static void poll_add(int d, struct condlock *cond) {
  struct epoll_event ev;

  memset(&ev, 0, sizeof ev);
  ev.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET;
  ev.data.u64 = (uint64_t)(uintptr_t)(void*)(cond);
  if (epoll_ctl(pollfd, EPOLL_CTL_ADD, d, &ev) < 0) {
    die("epoll_ctl add", errno);
  }
}

static void poll_del(int d) {
  if (epoll_ctl(pollfd, EPOLL_CTL_DEL, d, NULL) < 0) {
    die("epoll_ctl del", errno);
  }
}

static void *poller(void *arg __attribute__((unused))) {
  int c;
  struct epoll_event events[128];
  int i;
  struct condlock *cl;

  while (1) {
    c = epoll_wait(pollfd, events, 128, -1);
    for (i = 0; i < c; i++) {
      cl = (struct condlock*)(void*)(uintptr_t)(events[i].data.u64);
      condlock_signal(cl);
    }
  }

  abort(); /* unreachable */
}

static void *server(void *arg __attribute__((unused))) {
  struct sockaddr_un sun;
  struct condlock cl;
  int s;
  int i;
  char buf[4096];

  memset(&sun, 0, sizeof sun);
  sun.sun_family = AF_UNIX;
  memcpy(sun.sun_path, temp, sizeof temp);

  condlock_init(&cl);

  while (1) {
    s = socket(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0);
    if (s < 0) {
      die("socket", errno);
    }

    if (bind(s, (struct sockaddr*)(&sun), sizeof sun) < 0) {
      die("bind", errno);
    }

    poll_add(s, &cl);

    for (i = 0; i < 20; i++) {
      if (recv(s, buf, sizeof buf, 0) < 0) {
	if (errno != EAGAIN && errno != EWOULDBLOCK) {
	  die("recv", errno);
	}
	condlock_wait(&cl);
      }
    }

    poll_del(s);
    if (close(s) < 0) {
      die("close", errno);
    }
    if (remove(temp) < 0) {
      die("remove", errno);
    }

    usleep(1000); /* one millisecond */
  }

  abort(); /* unreachable */
}

static void *client(void *arg __attribute__((unused))) {
  const char * const msg = "test line";
  struct sockaddr_un sun;
  struct condlock cl;
  int s;
  int i;
  int flags;

  memset(&sun, 0, sizeof sun);
  sun.sun_family = AF_UNIX;
  memcpy(sun.sun_path, temp, sizeof temp);

  condlock_init(&cl);

  s = -1;
  for (i = 0; i < 500000; i++) {
    while (s < 0) {
      s = socket(AF_UNIX, SOCK_DGRAM, 0);
      if (s < 0) {
	die("socket", errno);
      }
      if (connect(s, (struct sockaddr *)(&sun), sizeof sun) < 0) {
	if (errno != ECONNREFUSED && errno != ENOENT) {
	  die("connect", errno);
	}
	if (close(s) < 0) {
	  die("close", errno);
	}
	s = -1;
	usleep(1); /* one microsecond */
      } else {
	flags = fcntl(s, F_GETFL, 0);
	if (flags < 0) {
	  die("fcntl", errno);
	}
	if (fcntl(s, F_SETFL, flags | O_NONBLOCK) < 0) {
	  die("fcntl", errno);
	}

	poll_add(s, &cl);
      }
    }

    if (write(s, msg, strlen(msg)) < 0) {
      if (errno == EAGAIN || errno == EWOULDBLOCK) {
	condlock_wait(&cl);
      } else if (errno == ECONNREFUSED) {
	poll_del(s);
	if (close(s) < 0) {
	  die("close", errno);
	}
	s = -1;
      } else {
	die("write", errno);
      }
    }

    if (i % 10000 == 0) {
      usleep(1000); /* one millisecond */
    }
  }

  if (s > 0) {
    poll_del(s);
    if (close(s) < 0) {
      die("close", errno);
    }
  }

  return NULL;
}

#define CLIENT_COUNT (20)

int main() {
  int d;
  pthread_attr_t attr;
  int err;
  pthread_t tid;
  int i;
  pthread_t clients[CLIENT_COUNT];

  memcpy(temp, "testXXXXXX", sizeof temp);
  d = mkstemp(temp);
  if (d < 0) {
    die("mkstemp", errno);
  }
  close(d);
  remove(temp);

  err = pthread_attr_init(&attr);
  if (err != 0) {
    die("pthread_attr_init", err);
  }
  err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
  if (err != 0) {
    die("pthread_attr_setdetachstate", err);
  }

  poll_init();
  err = pthread_create(&tid, &attr, poller, NULL);
  if (err != 0) {
    die("pthread_create", err);
  }

  err = pthread_create(&tid, &attr, server, NULL);
  if (err != 0) {
    die("pthread_create", err);
  }

  for (i = 0; i < CLIENT_COUNT; i++) {
    err = pthread_create(&clients[i], NULL, client, NULL);
    if (err != 0) {
      die("pthread_create", err);
    }
  }

  for (i = 0; i < CLIENT_COUNT; i++) {
    err = pthread_join(clients[i], NULL);
    if (err != 0) {
      die("pthread_join", err);
    }
  }

  return 0;
}
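(Assuming a standard Linux toolchain, the program above builds with something like gcc -pthread -o epolltest epolltest.c; the file and binary names are placeholders.)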

@fraenkel
Contributor

The above looks correct and does fail the same way when it does fail. I did happen to have the C version succeed once but only once.

The one thing that bothers me is that my reading of all the various issues people have had with epoll always comes down to how file descriptors are managed and how you can get into trouble. The client never really closes its connection unless there is some error. Whether that is contributing to the issue, I really don't know. I have been playing with the code, but nothing seems to fix the issue or increase the rate of success.

@ianlancetaylor
Contributor

Sent bug report to netdev mailing list: https://www.spinics.net/lists/netdev/msg509646.html .

@ianlancetaylor
Contributor

Reply on netdev mailing list, with kernel patch: https://www.spinics.net/lists/netdev/msg512686.html

@ianlancetaylor
Contributor

Closing this issue since it doesn't seem to be a Go bug.

@golang golang locked and limited conversation to collaborators Jul 13, 2019