Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: TestDialTimeoutFDLeak failure #4384

Closed
alberts opened this issue Nov 14, 2012 · 19 comments
Closed

net: TestDialTimeoutFDLeak failure #4384

alberts opened this issue Nov 14, 2012 · 19 comments
Milestone

Comments

@alberts
Copy link
Contributor

alberts commented Nov 14, 2012

What steps will reproduce the problem?

go test -v net

--- FAIL: TestDialTimeoutFDLeak-106 (0.52 seconds)
dial_test.go:260: 134 good connects; expected at most 133
FAIL

Which compiler are you using (5g, 6g, 8g, gccgo)?

6g

Which operating system are you using?

linux

Which version are you using?  (run 'go version')

40ba4d4e4672 tip

Please provide any additional information below.

GOMAXPROCS=106
@alberts
Copy link
Contributor Author

alberts commented Nov 15, 2012

Comment 1:

a few more manifestations:
--- FAIL: TestDialTimeoutFDLeak-51 (0.51 seconds)
dial_test.go:260: 134 good connects; expected at most 133
pollServer WaitFD: epoll_wait: bad file descriptor
FAIL
--- FAIL: TestDialTimeoutFDLeak-33 (0.51 seconds)
dial_test.go:260: 134 good connects; expected at most 133
dial_test.go:260: 135 good connects; expected at most 133
dial_test.go:260: 136 good connects; expected at most 133
pollServer WaitFD: epoll_wait: bad file descriptor
FAIL

@bradfitz
Copy link
Contributor

Comment 3:

This doesn't surprise me, given my comment here:
maxGoodConnect := listenerBacklog + 5 // empirically 131 good ones (of 128). who knows?
Mikioh, what is your machine's /proc/sys/net/core/somaxconn value?
I don't understand why we're able to connect more than SOMAXCONN anyway (the listener
backlog we use), considering we never call Accept.
Help wanted.

@mikioh
Copy link
Contributor

mikioh commented Nov 18, 2012

Comment 4:

% cat /proc/sys/net/core/somaxconn
512
% uname -a
Linux vm2 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64
x86_64 GNU/Linux
I guess, somaxconn is just the threshold to accept function; I mean 
it affects max # of queue that keeps incoming SYN-SENT sockets 
during LISTEN and SYN-RECEIVED or ESTABLISHED state. That means 
that we should make mock TCP sender that can delay last SYN sending
if we need more accurate test. Long live 3-way handshaking.

@mikioh
Copy link
Contributor

mikioh commented Nov 18, 2012

Comment 5:

Ah, original motivation on that test is timeout of TCP sender handshaking.
In that case, we should make mock TCP receiver that never return SYN/ACK
packet. Hm, is there anyone who want to rewrite NS-2/3 in Go? ;)

@bradfitz
Copy link
Contributor

Comment 6:

Mikioh, do you understand what's going wrong here?
I'm not sure I do.

@alberts
Copy link
Contributor Author

alberts commented Nov 21, 2012

Comment 7:

Test issues with backlogs aside, is the
pollServer WaitFD: epoll_wait: bad file descriptor
that pops up in some runs indicative of another problem?

@bradfitz
Copy link
Contributor

Comment 8:

Probably. I've seen that too occasionally.

@mikioh
Copy link
Contributor

mikioh commented Nov 21, 2012

Comment 9:

Sorry for the confusion, I misunderstood.
Q. How related TCP listener's backlog to SOMAXCONN
A. Linux counts # of ESTABLISHED incoming sockets as backlog.
cf. tcp_v4_conn_request in net/ipv4/tcp_ipv4.c:
        /* Accept backlog is full. If we have already queued enough
         * of warm entries in syn queue, drop request. It is better than
         * clogging syn queue with openreqs with exponentially increasing
         * timeout.
         */
        if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
                goto drop;
Other BSDs doesn't, it counts both SYN-SENT and ESTABs.
Not sure for Windows.
Q. Why we're able to connect more than SOMAXCONN on Linux?
A. Because latest Linux on multicare/SMP works great on lock-contention!?
Hi fullung, which kernel version are you using?
I failed to repro the issue on 3.2.0-32-generic Ubuntu.

@minux
Copy link
Member

minux commented Nov 21, 2012

Comment 10:

re #7, #8:
the message about epoll_wait might be from TestAddFDReturnsError,
and if it is, it's expected.

@alberts
Copy link
Contributor Author

alberts commented Nov 21, 2012

Comment 11:

re #10: did some tests and I think you're right.

@alberts
Copy link
Contributor Author

alberts commented Nov 21, 2012

Comment 12:

While I was looking into this, I ran into another problem:
#!/bin/bash
set -xe
go test -c net
while true; do
GOMAXPROCS=$[ 1 + $[ RANDOM % 128 ]] ./net.test -test.run=TestAddFDReturnsError
-test.timeout=10s
done
TestAddFDReturnsError seems to hang every few hundred runs... Should I file a new bug?

@bradfitz
Copy link
Contributor

Comment 13:

I copied Dave Cheney on this bug to look into Comment 12 while he works on the poll
server code.  But yes, that's probably another bug.  Or add this info to issue #4434.

@alberts
Copy link
Contributor Author

alberts commented Nov 27, 2012

Comment 14:

TestAddFDReturnsError failure is issue #4423.

@rsc
Copy link
Contributor

rsc commented Dec 9, 2012

Comment 15:

Since we don't really know what the limit should be, can we make it 150 and call it a
day?

Labels changed: added go1.1maybe, removed go1.1.

Status changed to Accepted.

@rsc
Copy link
Contributor

rsc commented Dec 10, 2012

Comment 16:

Labels changed: added size-s.

@davecheney
Copy link
Contributor

Comment 17:

https://golang.org/cl/6923046
fullung: can you please test, with note to the alternative suggestion in the CL.

Owner changed to @davecheney.

Status changed to Started.

@alberts
Copy link
Contributor Author

alberts commented Dec 11, 2012

Comment 18:

LGTM
Ran the test a few thousand times. Increasing the limit works without having to restrict
GOMAXPROCS to 1. linux/amd64 on kernel 3.3.8 with a dual Xeon E5-2670, if that matters.

@davecheney
Copy link
Contributor

Comment 19:

This issue was closed by revision 4766a35.

Status changed to Fixed.

@mikioh
Copy link
Contributor

mikioh commented Dec 13, 2012

Comment 20:

This issue was closed by revision feb509c.

mikioh pushed a commit that referenced this issue Apr 3, 2015
This change makes TestDialTimeoutFDLeak work on almost all the supported
platforms.

Updates #4384.

Change-Id: I3608f438003003f9b7cfa17c9e5fe7077700fd60
Reviewed-on: https://go-review.googlesource.com/8392
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@rsc rsc added this to the Go1.1 milestone Apr 14, 2015
@rsc rsc removed the go1.1maybe label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants