log/syslog, net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket #23604

Closed
esmet opened this issue Jan 29, 2018 · 20 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@esmet

esmet commented Jan 29, 2018

Writes to syslog using a unix datagram socket (the default behavior on Ubuntu 14.04, and probably other distributions) may become stuck indefinitely if the syslog daemon closes /dev/log.

The following reproduces the issue. This code is derived from a reproducer found in the comments section of #5932 - https://play.golang.org/p/vp_e6n8VJuX
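(The playground code isn't reproduced inline here, but a minimal sketch along these lines captures the idea: open the local syslog writer, which talks to /dev/log over a unix datagram socket, and write in a tight loop. The tag and loop count are arbitrary; this is illustrative rather than the exact reproducer.)

package main

import (
	"log"
	"log/syslog"
)

func main() {
	// Connects to the local syslog daemon; on typical Linux setups this
	// is the /dev/log unix datagram socket.
	w, err := syslog.New(syslog.LOG_INFO|syslog.LOG_USER, "repro")
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()
	// Hammer the socket. If the daemon closes /dev/log at the wrong
	// moment, a write can end up stuck in waitWrite, per this report.
	for i := 0; i < 1000000; i++ {
		w.Info("test line") // error ignored so the loop keeps writing
	}
}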

To reproduce:

  • Execute the above reproducer binary several times in the background (3-6 times is usually good).
  • Restart your syslog daemon. I'm using syslog-ng, so sudo service syslog-ng restart suffices.
  • Repeat this exact experiment a few times until you observe some of the runs never exit. Kill them with -SIGQUIT and you should hopefully see the same stack pasted below. It sometimes helps to stagger things: run some in the background, restart the daemon, wait a few seconds, run some more in the background, restart the daemon again. Get aggressive. Anger the system.


We first observed this bad behavior in a production environment last week during a DNS outage. The outage prevented our syslog-ng daemon from reloading its configuration properly, since it exits uncleanly on configuration reload when DNS is unavailable. We then observed a large portion of our production boxes hung completely, with one goroutine waiting on syslog and the rest waiting to acquire the log package mutex. I believe the syslog-ng daemon restart exposed the bad behavior in the syslog and/or net packages. Our syslog write volume is fairly high.

After that, I reproduced the issue locally using the above reproducer. I think the badness has to do with missing a socket close event, or some other invalid state transition in the net code.

Goroutine stack after killing the stuck process with -SIGQUIT: https://pastebin.com/GuV5JZDS

Environment:
$ go version
go version go1.7 linux/amd64
$ go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jesmet/.gvm/pkgsets/go1.7/global"
GORACE=""
GOROOT="/home/jesmet/.gvm/gos/go1.7"
GOTOOLDIR="/home/jesmet/.gvm/gos/go1.7/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build126114309=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"

@esmet
Author

esmet commented Feb 2, 2018

This issue is highly related to, if not the same as, #5932 (there are multiple trains of thought in that issue).

@davecheney
Contributor

Sorry to hear you're having problems. Sadly, Go 1.7 is long out of support. Can you upgrade to 1.9.x or 1.10rc1 and verify whether the problem still occurs?

@davecheney davecheney added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Feb 2, 2018
@esmet
Author

esmet commented Feb 2, 2018

I was able to reproduce this just now using Go 1.10rc1.

Here's the full goroutine stack trace: https://pastebin.com/xwnYJKi7

@esmet
Author

esmet commented Feb 2, 2018

I modified the runtime to print certain key pollDesc events: close, evict, pollUnblock, etc. I noticed that when I restart syslog-ng and see two log lines, one close/evict/unblock group for each syslog connection, the process ends up not hanging. When I only see one group of those calls, the other connection hangs. My current theory is that something is not calling FD::Close(), which calls fd.pd.evict(), which calls runtime_pollUnblock.

Still digging...

@esmet
Author

esmet commented Feb 4, 2018

After much more digging I'm pretty certain that we should be seeing an EPOLLHUP or EPOLLERR condition in netpoll() when the /dev/log read side closes unexpectedly, but we're not. Since we don't wake up waitWrite and the fd is half shut down, the syslog write stays stuck.

@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone Feb 4, 2018
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Feb 4, 2018
@esmet
Author

esmet commented Feb 4, 2018

It's worth mentioning that I've reproduced this on Linux kernel versions 3.13.0 and 4.4.0.

@esmet
Author

esmet commented Feb 5, 2018

I have to revise my analysis.

EPOLLHUP and EPOLLERR are probably not expected on a datagram socket, since it's connectionless, and I think a hang up only makes sense over a connection-oriented socket. The real event that should be observed when the reading end closes is EPOLLOUT, because it means our next write is non-blocking and will fail immediately with ECONNREFUSED.
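(As a concrete illustration of that expectation, here is a minimal, self-contained sketch; the socket path is arbitrary. Once the reading side of a connected unixgram socket goes away, the next write should fail immediately with a "connection refused" error rather than block.)

package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	const path = "/tmp/unixgram-demo.sock" // illustrative path

	srv, err := net.ListenPacket("unixgram", path)
	if err != nil {
		panic(err)
	}
	c, err := net.Dial("unixgram", path)
	if err != nil {
		panic(err)
	}

	// Close the reading end, roughly what a syslog daemon restart does.
	srv.Close()
	os.Remove(path)

	// Expected: an immediate ECONNREFUSED-style error, not a hang.
	_, err = c.Write([]byte("test line"))
	fmt.Println(err)
	c.Close()
}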

Now the question is: are we missing the delivery of an EPOLLOUT event to the pollDesc in the golang runtime, or does the EPOLLOUT never come from epoll/linux?

I tried reproducing the environment in C, using a client and server:

https://pastebin.com/59AT01AT client
https://pastebin.com/Y37xbjvp server

I was not able to reproduce the issue with 8 concurrent clients and a single server. Using strace and some logging code that's since been removed, I was able to confirm that the clients are indeed bouncing between EAGAIN and epoll_wait (as expected) and that when the server dies, they immediately transition to ECONNREFUSED until the server comes back, at which point everything is ok again.

To reiterate, in the golang reproducer, when the server dies, the clients sometimes get stuck in code that is waiting for an EPOLLOUT, and it seemingly never comes.

The negative result with C suggests, but does not prove, that this could be an issue with the way the Go runtime is using epoll and pollDescs, as opposed to a bug in epoll itself. But a lot of digging around the internet suggests that epoll is far from perfect and could certainly be the root cause, especially when there are multiple clients connected to the same server socket.

I'd appreciate any feedback on the analysis so far and would love some suggestions on where to look next in the pollDesc / netpoll code for possible bugs.

@bradfitz
Contributor

bradfitz commented Feb 5, 2018

/cc @ianlancetaylor

@dmitshur dmitshur changed the title log/syslog + net/net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket log/syslog, net/net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket Mar 10, 2018
@mmathys

mmathys commented May 16, 2018

Has this been fixed?

@bradfitz
Contributor

This bug is still open.

@fraenkel
Contributor

Ran the example from above and it exited on my machine (Ubuntu 18.04, kernel 4.15.0).
Tested both 1.10.2 and tip.

@ianlancetaylor ianlancetaylor changed the title log/syslog, net/net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket log/syslog, net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket Jun 20, 2018
@ianlancetaylor
Contributor

I think I may have been able to recreate the problem on tip with this program.

package main

import (
	"fmt"
	"io/ioutil"
	"net"
	"os"
	"runtime"
	"sync"
	"time"
)

func main() {
	temp, err := ioutil.TempFile("", "test")
	if err != nil {
		die(err)
	}
	addr := temp.Name()
	temp.Close()
	os.Remove(addr)

	go server(addr)

	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			client(addr)
		}()
	}
	wg.Wait()
}

func server(addr string) {
	for {
		l, err := net.ListenPacket("unixgram", addr)
		if err != nil {
			die(err)
		}
		for i := 0; i < 20; i++ {
			var buf [4096]byte
			if _, _, err := l.ReadFrom(buf[:]); err != nil {
				die(err)
			}
		}
		if err := l.Close(); err != nil {
			die(err)
		}
		os.Remove(addr)
		time.Sleep(time.Millisecond)
	}
}

func client(addr string) {
	var c net.Conn
	for i := 0; i < 500000; i++ {
		for c == nil {
			var err error
			c, err = net.Dial("unixgram", addr)
			if err != nil {
				time.Sleep(time.Microsecond)
			}
		}
		if _, err := c.Write([]byte("test line")); err != nil {
			c.Close()
			c = nil
		}
		if i%10000 == 0 {
			time.Sleep(time.Millisecond)
		}
	}
}

func die(err error) {
	_, file, line, _ := runtime.Caller(1)
	fmt.Fprintf(os.Stderr, "%s:%d: %v\n", file, line, err)
	os.Exit(1)
}
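(If a run of this program wedges, sending SIGQUIT, e.g. kill -QUIT <pid>, makes the Go runtime dump all goroutine stacks, which should show a goroutine parked in waitWrite as in the original report.)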

@ianlancetaylor
Contributor

I captured an strace of the program in the previous comment. Some things I see:

Adding descriptor 19 to epoll:

[pid 213739] epoll_ctl(4, EPOLL_CTL_ADD, 19, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=126060880, u64=140415491869008}} <unfinished ...>

Descriptor 19 is returned by epoll a few times, as seen by observing the u64 value 140415491869008 stored with EPOLL_CTL_ADD:

[pid 213761] <... epoll_pwait resumed> [{EPOLLOUT, {u32=126060880, u64=140415491869008}}, {EPOLLOUT, {u32=126060672, u64=140415491868800}}, {EPOLLOUT, {u32=126059424, u64=140415491867552}}, {EPOLLOUT, {u32=126060048, u64=140415491868176}}, {EPOLLOUT, {u32=126058800, u64=140415491866928}}, {EPOLLOUT, {u32=126062336, u64=140415491870464}}, {EPOLLOUT, {u32=126061920, u64=140415491870048}}, {EPOLLOUT, {u32=126058592, u64=140415491866720}}], 128, -1, NULL, 3) = 8

After the last time descriptor 19 is returned by epoll, we see one last write to descriptor 19:

[pid 213761] write(19, "test line", 9 <unfinished ...>
[pid 213743] epoll_pwait(4,  <unfinished ...>
[pid 213761] <... write resumed> )      = -1 EAGAIN (Resource temporarily unavailable)

We never see descriptor 19 returned by epoll again. We never see any other reference to descriptor 19.

Shortly before the last write to descriptor 19 I see this:

[pid 213740] recvfrom(6,  <unfinished ...>
...
[pid 213740] <... recvfrom resumed> "test line", 4096, 0, 0xc000128cf0, [112->0]) = 9
...
[pid 213740] epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc000128e0c <unfinished ...>
...
[pid 213740] <... epoll_ctl resumed> )  = 0

Shortly after the last write to descriptor 19 I see this:

[pid 213740] close(6 <unfinished ...>
...
[pid 213740] <... close resumed> )      = 0

Interestingly, the close takes a fairly long time. Between the start of the close and the completion I see this, in which descriptor 6 is opened again and closed again. Not sure if that means anything; probably not, since other descriptors seem to be returned by epoll as expected.

[pid 213761] socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0 <unfinished ...>
...
[pid 213761] <... socket resumed> )     = 6
[pid 213761] setsockopt(6, SOL_SOCKET, SO_BROADCAST, [1], 4 <unfinished ...>
...
[pid 213761] <... setsockopt resumed> ) = 0
...
[pid 213761] connect(6, {sa_family=AF_UNIX, sun_path="/tmp/test046398477"}, 21 <unfinished ...>
...
[pid 213761] <... connect resumed> )    = -1 ECONNREFUSED (Connection refused)
...
[pid 213761] close(6 <unfinished ...>
...
[pid 213761] <... close resumed> )      = 0

There are numerous calls to epoll_wait after the last write to descriptor 19, but none of them report descriptor 19 as ready.
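(For anyone who wants to capture a similar trace: running the reproducer under something along the lines of strace -f -o trace.txt ./repro, with -f to follow all threads, is enough to see the epoll_ctl/epoll_pwait/write activity above; the output file and binary names are placeholders.)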

@ianlancetaylor
Contributor

The upshot of the last comment is: as far as I can tell, the kernel is failing to return EPOLLOUT events for some descriptors when the descriptor on the other end of the Unix socket is closed. This is also what @esmet reported above in #23604 (comment) .

Searching the web, I see a similar question at https://stackoverflow.com/questions/38441059/edge-triggered-epoll-for-unix-domain-socket .

This does make me suspect that this is a kernel bug. I'm running 4.9.0-6-amd64.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jun 20, 2018
@fraenkel
Contributor

I am on a 4.15 kernel and I see similar traces.

@ianlancetaylor
Contributor

I suspect that this C program shows the same problem. It's hard to be really sure. I would appreciate it if others could look over this code and see if they can spot any bugs. Thanks.

#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/un.h>

static char temp[11];

static void die(const char* msg, int err) {
  fprintf(stderr, "%s: %s\n", msg, strerror(err));
  exit(EXIT_FAILURE);
}

struct condlock {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int ready;
};

void condlock_init(struct condlock* cl) {
  int err;

  err = pthread_mutex_init(&cl->mutex, NULL);
  if (err != 0) {
    die("pthread_mutex_init", err);
  }
  err = pthread_cond_init(&cl->cond, NULL);
  if (err != 0) {
    die("pthread_cond_init", err);
  }
  cl->ready = 0;
}

void condlock_signal(struct condlock* cl) {
  int err;

  err = pthread_mutex_lock(&cl->mutex);
  if (err != 0) {
    die("pthread_mutex_lock", err);
  }
  cl->ready = 1;
  err = pthread_cond_signal(&cl->cond);
  if (err != 0) {
    die("pthread_cond_signal", err);
  }
  err = pthread_mutex_unlock(&cl->mutex);
  if (err != 0) {
    die("pthread_mutex_unlock", err);
  }
}

void condlock_wait(struct condlock* cl) {
  int err;
  struct timespec ts;

  /* If everything worked as expected, the poller would always report
     that a descriptor is ready and call condlock_signal.  We give it
     20 seconds to do so.  If it fails, we are seeing a bug.  */

  memset(&ts, 0, sizeof ts);

  err = pthread_mutex_lock(&cl->mutex);
  if (err != 0) {
    die("pthread_mutex_lock", err);
  }

  if (!cl->ready) {
    ts.tv_sec = time(NULL) + 20;
    err = pthread_cond_timedwait(&cl->cond, &cl->mutex, &ts);
    if (err != 0) {
      if (err == ETIMEDOUT) {
	fprintf(stderr, "waited more than 20 seconds for descriptor to be ready\n");
	exit(EXIT_FAILURE);
      } else {
	die("pthread_cond_timedwait", err);
      }
    }
  }

  cl->ready = 0;

  err = pthread_mutex_unlock(&cl->mutex);
  if (err != 0) {
    die("pthread_mutex_unlock", err);
  }
}

static int pollfd;

static void poll_init() {
  int d;

  d = epoll_create(20);
  if (d < 0) {
    die("epoll_create", errno);
  }
  pollfd = d;
}

static void poll_add(int d, struct condlock *cond) {
  struct epoll_event ev;

  memset(&ev, 0, sizeof ev);
  ev.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET;
  ev.data.u64 = (uint64_t)(uintptr_t)(void*)(cond);
  if (epoll_ctl(pollfd, EPOLL_CTL_ADD, d, &ev) < 0) {
    die("epoll_ctl add", errno);
  }
}

static void poll_del(int d) {
  if (epoll_ctl(pollfd, EPOLL_CTL_DEL, d, NULL) < 0) {
    die("epoll_ctl del", errno);
  }
}

static void *poller(void *arg __attribute__((unused))) {
  int c;
  struct epoll_event events[128];
  int i;
  struct condlock *cl;

  while (1) {
    c = epoll_wait(pollfd, events, 128, -1);
    for (i = 0; i < c; i++) {
      cl = (struct condlock*)(void*)(uintptr_t)(events[i].data.u64);
      condlock_signal(cl);
    }
  }

  abort(); /* unreachable */
}

static void *server(void *arg __attribute__((unused))) {
  struct sockaddr_un sun;
  struct condlock cl;
  int s;
  int i;
  char buf[4096];

  memset(&sun, 0, sizeof sun);
  sun.sun_family = AF_UNIX;
  memcpy(sun.sun_path, temp, sizeof temp);

  condlock_init(&cl);

  while (1) {
    s = socket(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0);
    if (s < 0) {
      die("socket", errno);
    }

    if (bind(s, (struct sockaddr*)(&sun), sizeof sun) < 0) {
      die("bind", errno);
    }

    poll_add(s, &cl);

    for (i = 0; i < 20; i++) {
      if (recv(s, buf, sizeof buf, 0) < 0) {
	if (errno != EAGAIN && errno != EWOULDBLOCK) {
	  die("recv", errno);
	}
	condlock_wait(&cl);
      }
    }

    poll_del(s);
    if (close(s) < 0) {
      die("close", errno);
    }
    if (remove(temp) < 0) {
      die("remove", errno);
    }

    usleep(1000); /* one millisecond */
  }

  abort(); /* unreachable */
}

static void *client(void *arg __attribute__((unused))) {
  const char * const msg = "test line";
  struct sockaddr_un sun;
  struct condlock cl;
  int s;
  int i;
  int flags;

  memset(&sun, 0, sizeof sun);
  sun.sun_family = AF_UNIX;
  memcpy(sun.sun_path, temp, sizeof temp);

  condlock_init(&cl);

  s = -1;
  for (i = 0; i < 500000; i++) {
    while (s < 0) {
      s = socket(AF_UNIX, SOCK_DGRAM, 0);
      if (s < 0) {
	die("socket", errno);
      }
      if (connect(s, (struct sockaddr *)(&sun), sizeof sun) < 0) {
	if (errno != ECONNREFUSED && errno != ENOENT) {
	  die("connect", errno);
	}
	if (close(s) < 0) {
	  die("close", errno);
	}
	s = -1;
	usleep(1); /* one microsecond */
      } else {
	flags = fcntl(s, F_GETFL, 0);
	if (flags < 0) {
	  die("fcntl", errno);
	}
	if (fcntl(s, F_SETFL, flags | O_NONBLOCK) < 0) {
	  die("fcntl", errno);
	}

	poll_add(s, &cl);
      }
    }

    if (write(s, msg, strlen(msg)) < 0) {
      if (errno == EAGAIN || errno == EWOULDBLOCK) {
	condlock_wait(&cl);
      } else if (errno == ECONNREFUSED) {
	poll_del(s);
	if (close(s) < 0) {
	  die("close", errno);
	}
	s = -1;
      } else {
	die("write", errno);
      }
    }

    if (i % 10000 == 0) {
      usleep(1000); /* one millisecond */
    }
  }

  if (s > 0) {
    poll_del(s);
    if (close(s) < 0) {
      die("close", errno);
    }
  }

  return NULL;
}

#define CLIENT_COUNT (20)

int main() {
  int d;
  pthread_attr_t attr;
  int err;
  pthread_t tid;
  int i;
  pthread_t clients[CLIENT_COUNT];

  memcpy(temp, "testXXXXXX", sizeof temp);
  d = mkstemp(temp);
  if (d < 0) {
    die("mkstemp", errno);
  }
  close(d);
  remove(temp);

  err = pthread_attr_init(&attr);
  if (err != 0) {
    die("pthread_attr_init", err);
  }
  err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
  if (err != 0) {
    die("pthread_attr_setdetachstate", err);
  }

  poll_init();
  err = pthread_create(&tid, &attr, poller, NULL);
  if (err != 0) {
    die("pthread_create", err);
  }

  err = pthread_create(&tid, &attr, server, NULL);
  if (err != 0) {
    die("pthread_create", err);
  }

  for (i = 0; i < CLIENT_COUNT; i++) {
    err = pthread_create(&clients[i], NULL, client, NULL);
    if (err != 0) {
      die("pthread_create", err);
    }
  }

  for (i = 0; i < CLIENT_COUNT; i++) {
    err = pthread_join(clients[i], NULL);
    if (err != 0) {
      die("pthread_join", err);
    }
  }

  return 0;
}
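(Assuming a standard Linux toolchain, the program above builds with something like gcc -pthread -o epolltest epolltest.c; the file and binary names are placeholders.)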

@fraenkel
Contributor

The above looks correct and does fail the same way when it does fail. I did happen to have the C version succeed once but only once.

The one thing that bothers me is that my reading of all the various issues people have had with epoll always comes down to how file descriptors are managed and how you can get into trouble. The client never really closes its connection unless there is some error. Whether that is contributing to the issue, I really don't know. I have been playing with the code, but nothing seems to fix the issue or increase the rate of success.

@ianlancetaylor
Contributor

Sent bug report to netdev mailing list: https://www.spinics.net/lists/netdev/msg509646.html .

@ianlancetaylor
Contributor

Reply on netdev mailing list, with kernel patch: https://www.spinics.net/lists/netdev/msg512686.html

@ianlancetaylor
Contributor

Closing this issue since it doesn't seem to be a Go bug.

@golang golang locked and limited conversation to collaborators Jul 13, 2019