net: dial to a non-existent address doesn't return an error #8276

Closed
gopherbot opened this issue Jun 24, 2014 · 19 comments

Comments

@gopherbot

by coocood:

Before filing a bug, please check whether it has been fixed since the
latest release. Search the issue tracker and check that you're running the
latest version of Go:

Run "go version" and compare against
http://golang.org/doc/devel/release.html  If a newer version of Go exists,
install it and retry what you did to reproduce the problem.

Thanks.

What does 'go version' print?

go version go1.3 linux/amd64

What steps reproduce the problem?
If possible, include a link to a program on play.golang.org.

http://play.golang.org/p/PVBGNsPLN1

What happened?

The returned error is nil when dialing a non-existent address.

What should have happened instead?

The error should not be nil.

Please provide any additional information below.

It happens only on Go 1.3; Go 1.2 doesn't have this issue. I compared the source
code and found that it was caused by a change to the "connect" method in the
"net/fd_unix.go" file. When I replaced this method with the old version and
recompiled Go 1.3, the issue disappeared.
@davecheney
Contributor

Comment 1:

Your example code does not compile. Did you paste the correct version?

Status changed to WaitingForReply.

@gopherbot
Author

Comment 2 by coocood:

I pasted to play.golang.org from my local file and forgot to import the package.
Here is the correct one:
http://play.golang.org/p/T4FsbNsqkc

@mikioh
Contributor

mikioh commented Jun 24, 2014

Comment 3:

interesting, could you please show/tell us:
- the output of the attached tcpinfo.go in the environment where the issue occurs (you
need to run "go get github.com/mikioh/tcp" first),
- the output of all routing information from iproute2 (ip r/n/l/a) or a similar command
(pls don't forget to anonymize your private identifiers such as mac64/eui64 addresses),
- how you verified that the address (100.100.100.100) is non-existent in your
environment, your ip packet routing domain.

Attachments:

  1. tcpinfo.go (487 bytes)

@gopherbot
Author

Comment 4 by coocood:

The address was picked at random. I've tried other addresses; any non-existent address
will do.
My 'ip r' output is:
fe80::21f1:62c0:1c01:76a1 dev eth0 lladdr a4:1f:72:6b:52:e9 router STALE
10.12.104.1 dev wlan0 lladdr 44:2b:03:7d:15:4a STALE
10.12.113.1 dev eth0 lladdr 44:2b:03:7d:15:53 REACHABLE
The output of tcpinfo.go looks like:
dial tcp 100.100.100.100:10000: i/o timeout
dial tcp 100.100.100.100:10000: i/o timeout
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m58.388s LastDataReceived:4h31m58.388s LastAckReceived:4h31m58.388s
CC:0xc20800ec00 SysInfo:0xc208018870}
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m58.388s LastDataReceived:4h31m58.388s LastAckReceived:4h31m58.388s
CC:0xc20800ecc0 SysInfo:0xc2080188c0}
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m58.388s LastDataReceived:4h31m58.388s LastAckReceived:4h31m58.388s
CC:0xc20800ed80 SysInfo:0xc208018910}
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m58.388s LastDataReceived:4h31m58.388s LastAckReceived:4h31m58.388s
CC:0xc20800ee40 SysInfo:0xc208018960}
dial tcp 100.100.100.100:10000: i/o timeout
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m59.388s LastDataReceived:4h31m59.388s LastAckReceived:4h31m59.388s
CC:0xc20800ef60 SysInfo:0xc2080189b0}
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m59.388s LastDataReceived:4h31m59.388s LastAckReceived:4h31m59.388s
CC:0xc20800f020 SysInfo:0xc208018a00}
dial tcp 100.100.100.100:10000: i/o timeout
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h32m0.388s LastDataReceived:4h32m0.388s LastAckReceived:4h32m0.388s
CC:0xc20800f140 SysInfo:0xc208018a50}
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h32m0.388s LastDataReceived:4h32m0.388s LastAckReceived:4h32m0.388s
CC:0xc20800f200 SysInfo:0xc208018aa0}
dial tcp 100.100.100.100:10000: i/o timeout
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h32m1.392s LastDataReceived:4h32m1.392s LastAckReceived:4h32m1.392s
CC:0xc20800f320 SysInfo:0xc208018af0}
...

@mikioh
Contributor

mikioh commented Jun 24, 2014

Comment 5:

nice, pretty interesting, looks like
> State:syn-sent
something similar to half-tcp hole punching happens. which version of linux are you
running? also, if possible, pls let us know the details of your network environment
configuration (especially for ip-layer: sysctl states, iptables for nat44,
6rd/nat64/xlat for address translations and link-layer) for repro. i just tried to repro
it on freebsd but could not.

@gopherbot
Author

Comment 6 by coocood:

This issue was first caught in our production environment, which runs CentOS 6.3.
Our application works like a sentinel: it detects another application's failure by dialing
its address, and if the error is nil, we consider that application alive.
My local machine runs Ubuntu 14.04 and can also reproduce this issue, and I haven't
changed any system variables since installing it.
So I guess it has nothing to do with the linux version or configuration.
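
For readers following along, the sentinel pattern described in this comment can be sketched roughly as below. This is only an illustration, not the reporter's code: the names isAlive and probeLocal are hypothetical, and the loopback listener exists only so the sketch runs without touching a real network. It shows why the bug matters: a probe of this shape reports "alive" whenever the dial returns a nil error.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// isAlive (hypothetical helper) reports whether a TCP connection to addr
// can be established within timeout. A nil error from the dial is the
// only signal it uses, so a spurious nil error yields a false "alive".
func isAlive(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// probeLocal (hypothetical helper) starts a throwaway listener on
// loopback, probes it, closes it, and probes the same address again.
func probeLocal() (up, down bool, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false, err
	}
	addr := ln.Addr().String()
	up = isAlive(addr, time.Second)
	ln.Close()
	down = isAlive(addr, time.Second)
	return up, down, nil
}

func main() {
	up, down, err := probeLocal()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println("while listening:", up, "after close:", down)
}
```

On a Go version without this bug, the first probe should succeed and the second should fail once the listener is closed.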

@mikioh
Contributor

mikioh commented Jun 24, 2014

Comment 7:

thanks, looping in dmitriy.
dmitriy: in short, on linux, it looks like somewhere in
net.pollDesc.WaitWrite->net.pollDesc.Wait('w')->runtime.runtime_pollWait(..,
'w')->netpollunblock we may fail to call epoll_wait. can you identify the location?

@mikioh
Contributor

mikioh commented Jun 24, 2014

Comment 8:

i wrote a workaround cl, https://golang.org/cl/105400043/, as you suggested. can
you try this and report back the result? thanks.

Status changed to New.

@gopherbot
Author

Comment 9 by coocood:

I tried this CL, and it fixes the issue.
I hope there will be a 1.3.1 release soon.
Thank you.

@mikioh
Contributor

mikioh commented Jun 24, 2014

Comment 10:

glad to hear that, but that's a workaround. the heart of this issue is that on linux we
sometimes fail to call epoll_wait. that means we'll face worse problems when we use
the tcp fastopen protocol or similar stuff.

@gopherbot
Author

Comment 11 by garton:

I have also hit this same issue.
Is the real issue and fix understood now?
If so, I'll stop debugging this.  If not, I'll carry on and post my findings.

@gopherbot
Author

Comment 12 by garton:

It may be obvious already, but in case it helps:
The commit that first broke this is:
https://code.google.com/p/go/source/detail?r=5f662f12d550
Reverting the change (against release 1.3) appears to solve the problem for me. Judging
from the history, though, this possibly reinstates some other issue.

@mikioh
Contributor

mikioh commented Jul 28, 2014

Comment 13:

a fix: https://golang.org/cl/120820043/
though it has not been tested on dragonfly (async-connect enabled platform) yet.

@gopherbot
Author

Comment 14:

CL https://golang.org/cl/120820043 mentions this issue.

@mikioh
Contributor

mikioh commented Jul 29, 2014

Comment 15:

Labels changed: added release-go1.3.1.

@mikioh
Contributor

mikioh commented Jul 29, 2014

Comment 16:

This issue was closed by revision c0325f5.

Status changed to Fixed.

@rsc
Contributor

rsc commented Aug 11, 2014

Comment 17:

Merging with 8426 because the same CL claims to fix both.

Status changed to Duplicate.

Merged into issue #8426.

@rsc
Contributor

rsc commented Aug 11, 2014

Comment 18:

Labels changed: added release-none, removed release-go1.3.1.

@adg
Contributor

adg commented Aug 13, 2014

Comment 19:

This issue was closed by revision 073fc578434b.

Status changed to Fixed.

adg added a commit that referenced this issue May 11, 2015
…oll on linux

««« CL 120820043 / 06a4b59c1393
net: prevent spurious on-connect events via epoll on linux

On Linux, adding a socket descriptor to epoll instance before getting
the EINPROGRESS return value from connect system call could be a root
cause of spurious on-connect events.

See golang.org/issue/8276, golang.org/issue/8426 for further information.

All credit to Jason Eggleston <jason@eggnet.com>

Fixes #8276.
Fixes #8426.

LGTM=dvyukov
R=dvyukov, golang-codereviews, adg, dave, iant, alex.brainman
CC=golang-codereviews
https://golang.org/cl/120820043
»»»

TBR=r, rsc
CC=golang-codereviews
https://golang.org/cl/128110045
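
The ordering described in the commit message above can be illustrated with a small Linux-only sketch. It is not the runtime's actual code and makes several assumptions: it uses raw calls from the standard syscall package, the names connectErr and probe are hypothetical, and a loopback listener stands in for a real peer. The socket is registered with epoll only after connect has returned EINPROGRESS, and the final result is read from SO_ERROR rather than inferred from the writability event alone.

```go
//go:build linux

package main

import (
	"fmt"
	"net"
	"syscall"
)

// connectErr (hypothetical helper) does a non-blocking connect and
// returns the final connect result, or ETIMEDOUT after timeoutMs.
func connectErr(ip [4]byte, port, timeoutMs int) error {
	fd, err := syscall.Socket(syscall.AF_INET,
		syscall.SOCK_STREAM|syscall.SOCK_NONBLOCK|syscall.SOCK_CLOEXEC, 0)
	if err != nil {
		return err
	}
	defer syscall.Close(fd)

	// Step 1: issue connect first and look at its return value.
	err = syscall.Connect(fd, &syscall.SockaddrInet4{Port: port, Addr: ip})
	if err != syscall.EINPROGRESS {
		return err // nil means connected at once; anything else failed at once
	}

	// Step 2: only now add the descriptor to an epoll instance.
	epfd, err := syscall.EpollCreate1(syscall.EPOLL_CLOEXEC)
	if err != nil {
		return err
	}
	defer syscall.Close(epfd)
	ev := syscall.EpollEvent{Events: syscall.EPOLLOUT, Fd: int32(fd)}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd, &ev); err != nil {
		return err
	}

	// Step 3: wait for the connect attempt to finish.
	events := make([]syscall.EpollEvent, 1)
	for {
		n, err := syscall.EpollWait(epfd, events, timeoutMs)
		if err == syscall.EINTR {
			continue // interrupted by a signal; retry (this restarts the timeout)
		}
		if err != nil {
			return err
		}
		if n == 0 {
			return syscall.ETIMEDOUT
		}
		break
	}

	// Step 4: an event alone is not proof of success; read SO_ERROR.
	soerr, err := syscall.GetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_ERROR)
	if err != nil {
		return err
	}
	if soerr != 0 {
		return syscall.Errno(soerr)
	}
	return nil
}

// probe (hypothetical helper) connects to a loopback listener, then to
// the same port after the listener has been closed.
func probe() (openErr, closedErr, setupErr error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, nil, err
	}
	port := ln.Addr().(*net.TCPAddr).Port
	ip := [4]byte{127, 0, 0, 1}
	openErr = connectErr(ip, port, 1000)
	ln.Close()
	closedErr = connectErr(ip, port, 1000)
	return openErr, closedErr, nil
}

func main() {
	openErr, closedErr, setupErr := probe()
	fmt.Println("open port:", openErr, "closed port:", closedErr, "setup:", setupErr)
}
```

The SO_ERROR check in step 4 is the usual way to learn the outcome of a non-blocking connect; it is shown here as a defensive measure and is independent of the registration order the commit changes.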
@golang golang locked and limited conversation to collaborators Jun 25, 2016
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jun 25, 2018
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 9, 2018
This issue was closed.