-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net: connect after polling initialization #8426
Labels
Milestone
Comments
Comment 2 by jason@eggnet.com: No, connect still returns EINPROGRESS. But adding the descriptor to epoll prior to calling connect causes an event for this descriptor to show up in epoll_wait being executed in another thread. That causes go to believe the descriptor is ready after calling connect later on. go then calls getsockopt which clearly doesn't consider EINPROGRESS an error, thus returning 0. go believes connect succeeded and moves on to to the write. Only looking for EPOLLIN is against the man page. The proper flag to look for is EPOLLOUT like go already does. The problem is simply that go is polling a file descriptor prior to connect, which appears to lead to undefined behavior. |
Comment 4 by jason@eggnet.com: Agreed. Solving this for multiple platforms makes this a lot more complicated. I also noticed one more problem with just triggering on EPOLLIN. The second epoll_wait call that has this event also does not have EPOLLIN set. If go just looks for EPOLLIN, it will probably just leave this fd open forever. I guess the question is do you have the ability to test something on all of the supported go target environments? It's pretty hard to replicate the problem in go. On the other hand replicating it in C should be straight forward. A program could be written to spawn a thread in epoll_wait (or the kqueue, etc equivalents on relevant platforms) to determine which operating systems exhibit the problem in the first place. A program could also be written to test the behavior of polling after connect, basically a portable version of the test.c program I uploaded. I was digging around in libev, since it is a popular cross platform event library, for a potential answer to this question. The library is just a thin abstraction layer around various event handling mechanisms. I guess that is why it is popular. In short, the library doesn't call connect. That is something the user of the library is supposed to do. But I think libev would be useful for simplifying the creation of a portable test.c program since it covers all of the supported go platforms. |
Comment 5 by jason@eggnet.com: The C tests I suggested may not be necessary. I took a look at libuv, used by node.js, rust and others. Unlike libev, libuv has a connect function. uv__tcp_connect is defined in this file: https://github.com/joyent/libuv/blob/9b4f2b84f10c96efa37910f324bc66e27aec3828/src/unix/tcp.c line 127 is the connect call line 149 adds the file descriptor to the poller. This is platform agnostic behavior with the exception of windows. Here is the code for windows https://github.com/joyent/libuv/blob/master/src/win/tcp.c handle->func_connectex is called before REGISTER_HANDLE_REQ, which as far as I can tell, means connect is called before the polling function on windows as well. libuv appears to have unit tests for all of this. |
Comment 7 by jason@eggnet.com: Agreed. |
thanks for the investigation and great help. here is a fix; https://golang.org/cl/120820043/ though, not tested on dragonfly (async-connect enabled platform) yet. |
Comment 9 by jason@eggnet.com: Thanks, this looks great. |
Comment 10 by jason@eggnet.com: I've applied the patch and it seems to work. I can't reproduce any issues, thanks! |
CL https://golang.org/cl/120820043 mentions this issue. |
This issue was closed by revision c0325f5. Status changed to Fixed. |
Issue #8276 has been merged into this issue. |
Issue #8276 has been merged into this issue. |
Comment 18 by jason@eggnet.com: It is urgent for issue #8276 and applications like it, that infer something significant from a successful connect. It could be a monitoring application that infers connectivity, or maybe an http proxy that can fail over on connect but not write. Of course, along with enough traffic/concurrency to trigger the problem. |
Comment 19 by martin.garton@ft.com: Triggering the problem needs very little traffic or concurrency. The following code shows the problem in milliseconds for me. for { // replace 1.2.3.4 with a host that is down _, err := net.DialTimeout("tcp", "1.2.3.4:12345", time.Second) if err == nil { panic("Oops. We 'connected' to an non-existing host") } } (NB. If the host is listening, or returns a RST, or you get ICMP Destination unreachable then this may not show the problem.) If the proposed CL is believed risky, can we consider reverting the change that originally broke this?: https://code.google.com/p/go/source/detail?r=5f662f12d550 That would at least get us back to 1.2 functionality. |
Approved for Go 1.3.1. This code needs to stop changing. It seems to get rewritten on every release. We can't work toward stability when things get rewritten so often. If the rewrites can't stop on their own, we will have to introduce an explicit freeze for large parts of package net. Please try to wind them down. |
> Please try to wind them down. yup. > It seems to get rewritten on every release. for the record, the bug was introduced in go1.1 by https://code.google.com/p/go/source/detail?r=fed789ce8072 and i overlooked that nb "new network poller can have spurious readiness notifications" when i fixed the issue regarding async-connect feature on dragonfly in go1.3. so pls blame me, ugh. |
adg
added a commit
that referenced
this issue
May 11, 2015
…oll on linux ««« CL 120820043 / 06a4b59c1393 net: prevent spurious on-connect events via epoll on linux On Linux, adding a socket descriptor to epoll instance before getting the EINPROGRESS return value from connect system call could be a root cause of spurious on-connect events. See golang.org/issue/8276, golang.org/issue/8426 for further information. All credit to Jason Eggleston <jason@eggnet.com> Fixes #8276. Fixes #8426. LGTM=dvyukov R=dvyukov, golang-codereviews, adg, dave, iant, alex.brainman CC=golang-codereviews https://golang.org/cl/120820043 »»» TBR=r, rsc CC=golang-codereviews https://golang.org/cl/128110045
wheatman
pushed a commit
to wheatman/go-akaros
that referenced
this issue
Jun 25, 2018
On Linux, adding a socket descriptor to epoll instance before getting the EINPROGRESS return value from connect system call could be a root cause of spurious on-connect events. See golang.org/issue/8276, golang.org/issue/8426 for further information. All credit to Jason Eggleston <jason@eggnet.com> Fixes golang#8276. Fixes golang#8426. LGTM=dvyukov R=dvyukov, golang-codereviews, adg, dave, iant, alex.brainman CC=golang-codereviews https://golang.org/cl/120820043
wheatman
pushed a commit
to wheatman/go-akaros
that referenced
this issue
Jul 9, 2018
On Linux, adding a socket descriptor to epoll instance before getting the EINPROGRESS return value from connect system call could be a root cause of spurious on-connect events. See golang.org/issue/8276, golang.org/issue/8426 for further information. All credit to Jason Eggleston <jason@eggnet.com> Fixes golang#8276. Fixes golang#8426. LGTM=dvyukov R=dvyukov, golang-codereviews, adg, dave, iant, alex.brainman CC=golang-codereviews https://golang.org/cl/120820043
This issue was closed.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
by jason@eggnet.com:
Attachments:
The text was updated successfully, but these errors were encountered: