New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net: TCP listen queue size set to 0 on Linux #5030
Labels
Milestone
Comments
Any theories on how maxListenerBacklog could be returning 0 without your patch? I can't see how it could happen (syscall.SOMAXCONN is 0x80, and even if /proc/sys/net/core/somaxconn said 0, the function wouldn't use it). What does cat /proc/sys/net/core/somaxconn say? I half expect you to say -1. Labels changed: added priority-later, go1.1maybe, removed priority-triage. Status changed to Accepted. |
Comment 2 by ryan_disabled@heroku.com: Not theories yet. $ cat /proc/sys/net/core/somaxconn 262144 We are still looking into the issue. |
Comment 3 by ryan_disabled@heroku.com: We added a debugging print statement in maxListenerBacklog to print n. The result: maxListenerBacklog result: 262144 We then applied the following patch: diff -r 25b2d31c3148 src/pkg/net/sock_linux.go --- a/src/pkg/net/sock_linux.go Mon Mar 11 16:23:06 2013 -0700 +++ b/src/pkg/net/sock_linux.go Tue Mar 12 01:57:21 2013 +0000 @@ -21,5 +21,6 @@ if n == 0 || !ok { return syscall.SOMAXCONN } - return n + println("maxListenerBacklog result: ", n) + return n - 1 } When we run our sample program and check ss -l, we see that Send-Q is: 65535. |
Does the 65535 make ab happy? One possibility is that the Send-Q listing is truncating the value to 16 bits, and before you were measuring the difference between 262144 and 99, not between 0 and 99. Perhaps a smaller backlog increases performance by giving the server less work to do. Seems unlikely but you never know. Another possibility is that the system call is truncating the value to 16 bits. Seems even less likely. If you run the server under strace -f -e listen, what argument is shown for the listen system call? |
Comment 5 by kr@heroku.com: We'll try that, as well as comparing its behavior with an actual value of 0. Also, we left out some possibly important information from the original report: the 99th percentile of both sets of results are ~50ms; only the max time of is ~3s. We suspected that we were observing the timeout for resending a SYN when the original attempt gets dropped if the listen queue is full, and that only a small fraction (<1%) of requests were being dropped. |
Comment 6 by kr@heroku.com: I tried this a few more times. In all cases strace reports the same value that I gave as input. In the table below "max Recv-Q" is the largest value I observed by eyeballing repeated output of the Recv-Q column of ss -l. max strace "Recv-Q" "Send-Q" ab99 abmax 0 1 0 1ms 2696ms 3 4 3 2ms 3075ms 99 8 99 4ms 21ms 65535 10 65535 4ms 23ms 65536 1 0 1ms 2874ms 65537 2 1 1ms 6270ms 65635 10 99 4ms 21ms 262142 9 65534 3ms 23ms 262143 10 65535 4ms 20ms 262144 1 0 1ms 5859ms 262145 1 0 1ms 4312ms 262243 1 0 1ms 6248ms So yes, 65535 makes ab happy, and in general the values that ss -l reports are consistent with the performance we see from ab. Also, all values >= /proc/sys/net/core/somaxconn are showing up as 0, whereas all values < that file seem to be truncated to 16 bits. I have no idea where that truncation is happening. |
struct sock in linux/include/net/sock.h defines the sk_max_ack_backlog field to be an unsigned short. Setting /proc/sys/whatever/somaxconn bigger than 65535 on Linux is setting things up for sadness. Go tries to do the right thing by using that file, but the instructions in that file are counterproductive. This is true all the way to the latest 3.x head indexed by LXR. It sounds like we should make the Linux somaxconn reader max out at 65535. Thanks for helping track this down. Wow. |
This issue was closed by revision e64f3f2. Status changed to Fixed. |
Comment 12 by kr@heroku.com: Linux man page listen(2) says "If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn, then it is silently truncated to that value; …" And this description fits what we observed here. So why not: func maxListenerBacklog() int { return 1<<16 - 1 } Separately, I'm not too keen on participating in LKML. |
This issue was closed.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
by ryan_disabled@heroku.com:
The text was updated successfully, but these errors were encountered: