
net/rpc/jsonrpc: Dial: incorrect result returned after multiple dials #2690

Closed
gopherbot opened this issue Jan 12, 2012 · 17 comments
Milestone: Go1

@gopherbot

by Bond.Dmitry:

What steps will reproduce the problem?
1. Choose a free port. In my case it is 50000.
2. Call jsonrpc.Dial("tcp", "localhost:50000") many times with a short delay
between calls. In my case I call it 500,000 times with a 1-microsecond delay.

The code: https://gist.github.com/1602551 (or see the attachment)

What is the expected output?

An error and a nil client from every Dial call, because nothing is listening on
that port.

What do you see instead?

After some random number of calls, Dial returns no error and an rpc client with
strange contents.

Results that I got with the code above:

imp@imp:~/Projects/temp/go/2012/jsonrpc_bug/rpcbug$ ./rpcbug
Got unexpected result after 14337 dials: &{{0 0} {0 0} { 0 <nil>} 0
0xf840026200 map[] false false}
Done 500000 dials successfully

imp@imp:~/Projects/temp/go/2012/jsonrpc_bug/rpcbug$ ./rpcbug
Got unexpected result after 22413 dials: &{{0 0} {0 0} { 0 <nil>} 0
0xf840026200 map[] false false}
Done 500000 dials successfully

imp@imp:~/Projects/temp/go/2012/jsonrpc_bug/rpcbug$ ./rpcbug
Got unexpected result after 2211 dials: &{{0 0} {0 0} { 0 <nil>} 0
0xf840026f80 map[] false false}
Done 500000 dials successfully

imp@imp:~/Projects/temp/go/2012/jsonrpc_bug/rpcbug$ ./rpcbug
Got unexpected result after 10365 dials: &{{0 0} {0 0} { 0 <nil>} 0
0xf840026200 map[] false false}
Got unexpected result after 490291 dials: &{{0 0} {0 0} { 0 <nil>} 0
0xf840026f00 map[] false false}
Done 500000 dials successfully


Which compiler are you using (5g, 6g, 8g, gccgo)?

6g version weekly.2011-12-22 11071

Which operating system are you using?

Ubuntu 10.04.3 LTS

Which revision are you using?  (hg identify)

4a8268927758 weekly/weekly.2011-12-22

Please provide any additional information below.

Attachments:

  1. main.go (518 bytes)
@gopherbot
Author

Comment 1 by Bond.Dmitry:

Changing jsonrpc.Dial to rpc.Dial doesn't fix the problem.

@gopherbot
Author

Comment 2 by Bond.Dmitry:

I've found that the problem is probably in net.DialTCP: after some number of
dials it returns strange connections, which likely causes these symptoms.
I changed the rpc.Dial code to:

    tcpAddr, _ := net.ResolveTCPAddr("tcp", "localhost:50001")
    conn, err := net.DialTCP("tcp", nil, tcpAddr)
    dialsCount++
    if err == nil {
        fmt.Printf("Got unexpected result after %d dials: %v %v\n", dialsCount, conn, tcpAddr)
    ...

and got the same result:
imp@imp:~/Projects/temp/go/2012/jsonrpc_bug/rpcbug$ ./rpcbug
Got unexpected result after 150410 dials: &{0xf8400bd140} 127.0.0.1:50001

@robpike
Contributor

robpike commented Jan 13, 2012

Comment 3:

Labels changed: added priority-go1, removed priority-triage.

@robpike
Contributor

robpike commented Jan 13, 2012

Comment 4:

Status changed to Accepted.

@robpike
Contributor

robpike commented Jan 13, 2012

Comment 5:

Owner changed to builder@golang.org.

@bytbox
Contributor

bytbox commented Jan 17, 2012

Comment 6:

Reproducing under Arch (Linux 3.1.9, at tip), running connect attempts in
multiple goroutines (code attached), I observe that the calls to DialTCP that
return an unexpected non-error take unusually long (often >10 seconds).

Attachments:

  1. main.go (592 bytes)

@mikioh
Contributor

mikioh commented Jan 17, 2012

Comment 7:

Seems like Linux network-stack-dependent behavior, funny.
! After 2516 dials: &net.TCPConn{fd:(*net.netFD)(0xf84000fdc0)}
&net.TCPAddr{IP:[]byte{0x7f, 0x0, 0x0, 0x1}, Port:50001}, from: 127.0.0.1:50001, to:
127.0.0.1:50001

@mikioh
Contributor

mikioh commented Jan 17, 2012

Comment 8:

My comment #7 was wrong; it also reproduces on freebsd/amd64:
! After 5499 dials: &net.TCPConn{fd:(*net.netFD)(0xf8400a3320)}
&net.TCPAddr{IP:[]byte{0x7f, 0x0, 0x0, 0x1}, Port:50001}, from: 127.0.0.1:50001, to:
127.0.0.1:50001

@mikioh
Contributor

mikioh commented Jan 17, 2012

Comment 9:

Aha, I think we can close this issue as WorkingAsIntended.
http://stackoverflow.com/questions/4949858/how-can-you-have-a-tcp-connection-back-to-the-same-port

@gopherbot
Author

Comment 10 by Bond.Dmitry:

I think that before closing this as WorkingAsIntended, a workaround for this
situation should be described in the documentation. How should I 'monitor' a
TCP address with the Go packages without getting this strange result?

@gopherbot
Author

Comment 11 by Bond.Dmitry:

And as to the original topic, I think (json)rpc.Dial should at least somehow
check whether the connection is invalid and return an error and nil rather than
a working client. Currently I have to manually check the returned object
against the "&{{0 0} {0 0} { 0 <nil>} 0 ... map[] false false}" pattern to find
out that it is actually a 'fake' client I shouldn't use. But such a check seems
inconvenient and incorrect.

@gopherbot
Author

Comment 12 by pkorotkov:

My proposal is to allow for (and handle) such unexpected behavior *inside* the
net.DialTCP() function, guaranteeing that dialing an unlistened port always
returns an error.

@mikioh
Contributor

mikioh commented Jan 20, 2012

Comment 13:

#12: I don't think we can determine whether a connection is expected or not,
even if we can list all IP interface addresses on the target node, because
that depends on the user, the caller of DialTCP.
#11: I'd be happy if you could fix the issue for json-rpc.
#10: I'd also be happy if you could proceed to fix the documentation comment
in regard to "simultaneous TCP active open causes the problem".

@gopherbot
Author

Comment 14 by Bond.Dmitry:

A small update: I've checked 'normal' connections and they have the same
pattern, so checking for the pattern is not actually an option.

I've followed the link you gave, and I see that this problem occurs when the
client gets the same local and foreign addresses. But you could actually check
for this equality, and perhaps offer a special option for a client that really
wants to "connect to itself"; in all other cases, perform an additional
connect attempt when the two addresses are equal. Or any other solution, but I
really doubt this is currently working as intended.

It has very serious effects: for example, I try to connect to a service that
is not up yet, and after some connection attempts I get a seemingly good and
ready rpc.Client. When I call methods using that client, I get a panic in
another goroutine, which crashes the whole application.

This means that if a service is temporarily down, it can crash the front end,
because there is absolutely no way to catch a panic in another goroutine after
a Call on a 'fake' client that cannot be identified as fake. So a solution is
still needed.

@rsc
Contributor

rsc commented Jan 30, 2012

Comment 16:

Labels changed: added go1-must.

@rsc
Contributor

rsc commented Feb 12, 2012

Comment 17:

Ouch.
Will detect and kill.

Owner changed to @rsc.

Status changed to Started.

@rsc
Contributor

rsc commented Feb 13, 2012

Comment 18:

This issue was closed by revision cbe7d8d.

Status changed to Fixed.

@rsc rsc added this to the Go1 milestone Apr 10, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
@rsc rsc removed their assignment Jun 22, 2022