Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: timeout support for Dial #240

Closed
gopherbot opened this issue Nov 17, 2009 · 26 comments
Closed

net: timeout support for Dial #240

gopherbot opened this issue Nov 17, 2009 · 26 comments
Milestone

Comments

@gopherbot
Copy link

by emilliken:

Right now it takes over 3 minutes for net.Dial() to return if a tcp host is
down.  I would like to be able to specify the timeout.

Currently you can somewhat get around this by Dialing in a separate
goroutine and just letting it finish, but this sounds like a waste of a
resource.
@rsc
Copy link
Contributor

rsc commented Nov 17, 2009

Comment 1:

I don't know how to provide this without making the Dial interface more complicated.

Owner changed to r...@golang.org.

Status changed to Thinking.

@gopherbot
Copy link
Author

Comment 2 by emilliken:

perhaps this could be done if/when go would allow one goroutine to terminate another?

@rsc
Copy link
Contributor

rsc commented Dec 2, 2009

Comment 3:

Labels changed: added packagechange.

@gopherbot
Copy link
Author

Comment 4 by runningwild:

TCPListener.Accept() is also awkward to use without a timeout.  Once
TCPListener.Accept() is called it blocks 
until a connection is received.  Without a timeout or the ability to kill other
goroutines the best I can do, if I 
want to stop listening and I want to free up that TCP port, is one of the following:
1. Call TCPListener.Close() from another goroutine, which leaves a stalled goroutine
resident that just 
consumes resources (albeit a constant amount) and pollutes the output in the event of a
stack trace.
2. I can actually dial the addr that I was listening on which will dislodge the
TCPListener.Accept(), but that 
seems like a very silly and unintuitive thing to have to do, and it also requires extra
code so that the listener 
knows when it was being dislodged like that.
As with Dial I think this could be fixed by either a timeout or being able to kill other
goroutines, or instead by 
TCPListener.Accept() returning an error if TCPListener.Close() is called from another
goroutine.

@gopherbot
Copy link
Author

Comment 5 by graycardinalster:

Good day!
I write some server. The server creates a new goroutine and performs Accept (). This
gouroutine may never be completed, if the client does not connect. This is a serious
hole. What to do?

@rsc
Copy link
Contributor

rsc commented Jan 26, 2010

Comment 6:

@graycardinalster: Accept is easier than Dial.  I think that if in
another goroutine you wait the appropriate length of time
and then do l.Close(), the Accept will return with an error.

@gopherbot
Copy link
Author

Comment 7 by graycardinalster:

As in other goroutine know if they have already made a Close ()?

@gopherbot
Copy link
Author

Comment 8 by marius@vastech.co.za:

There is another problem with ListenTCP/Accept:
If a TCPListener is dialed in quick succession (e.g.: by multiple clients) a client may 
be connected without error even though no Accept was called on the listener.
See the attached programs.
TCPListen.go accepts one connection, closes the listener and sleeps.
TCPDial.go dials the listener twice.
Without the delay in tcpListen.go between AcceptTCP and Close, I get the behaviour I 
expect from tcpDial - the seconds connection is refused:
0
dial tcp :10000: connection refused
With the delay the second call to DialTCP returns without and error, and I get the 
unexpected result that both sessions connected:
0
1

Attachments:

  1. tcpDial.go (314 bytes)
  2. tcpListen.go (439 bytes)

@rsc
Copy link
Contributor

rsc commented Feb 25, 2010

Comment 9:

@marius: This is standard behavior for TCP.  Accept means give me the
next connection, not "let the next connection connect".  The kernel has 
a queue of incoming already-connected connections and just grabs the
next one.  You'd see this behavior in any other language too, and there's
nothing Go can do about it.

@bradfitz
Copy link
Contributor

Comment 10:

I just hit this limitation in Go's Dial function too and found this bug.
Russ wrote:
> I don't know how to provide this without making the
> Dial interface more complicated.
I would argue that the Dial interface is already needlessly complicated.  How often 
does somebody need to specify the local address?  Usually it'll be blank.  Even the 
docs highlight this ugliness in their examples:
Dial("tcp", "", "12.34.56.78:80")
Dial("tcp", "", "google.com:80")
Dial("tcp", "", "[de:ad:be:ef::ca:fe]:80")
Why not (something like) instantiate a "DialParams" message, and Dial on that?
pendingDial := conn.DialParams("tcp", "12.34.56.78:80")
pendingDial.SetTimeout(100)
tcp := pendingDial.Dial()
Keep Dial() if you want, but DialParams gives extensibility.  In common case:
tcp := conn.DialParams("tcp", "google.com:80").Dial()
Maybe that's not idiomatic Go code, though.
But a bunch of positional parameters is ugly, especially when 1 or more of them is 
often not set.

@gopherbot
Copy link
Author

Comment 11 by clbanning:

Had a similar problem with websocket.Dial() for attaching to servers with DHCP-assigned
network addresses.  The following works great; just call getWebsocConn() while looping
through a range of addresses. 
type websoc struct {
   c *websocket.Conn
   err os.Error
}
func getWebsocConn(wsString, destString string, timeOut int) (*websocket.Conn, os.Error)
{
   var ws websoc
   var ok bool
   ch := make(chan websoc,1)
   go dialConn(wsString, destString, ch)
   tic := time.Seconds()
   for {
      ws, ok = <-ch
      if ok == true {
         break
      }
      toc := time.Seconds()
      if to := toc-tic; int(to) > timeOut {
         ws.err = os.ECONNREFUSED
         break
      }
   }
   return ws.c, ws.err
}
func dialConn(wsString, destString string, ch chan websoc) {
   var ws websoc
   c, ok := websocket.Dial(wsString,"",destString)
   ws.c = c
   ws.err = ok
   ch <- ws
}
Yeah, it does leave some go processes waiting for net.Dial() to timeout - but that's
pretty insignificant overhead for being able to go ahead and get some real processing
done in a dynamic environment.

@gopherbot
Copy link
Author

Comment 12 by clbanning:

Had a similar problem with websocket.Dial().  The server(s) are on a network with
DHCP-assigned addresses.  So I fixed it by looping over a range of addresses while
calling GetWebsocConn(), below.
type websoc struct {
   c *websocket.Conn
   err os.Error
}
func GetWebsoc(wsString, destString string, timeOut int) (*websocket.Conn, os.Error) {
   var ws websoc
   var ok bool
   ch := make(chan websoc,1)
   go dialWebsocConn(wsString, destString, ch)
   tic := time.Seconds()
   for {
      ws, ok = <-ch
      if ok == true {
         break
      }
      toc := time.Seconds()
      if to := toc-tic; int(to) > timeOut {
         close(ch)
         ws.err = os.ECONNREFUSED
         break
      }
   }
   return ws.c, ws.err
}
func dialWebsocConn(wsString, destString string, ch chan websoc) {
   var ws websoc
   c, ok := websocket.Dial(wsString,"",destString)
   ws.c = c
   ws.err = ok
   ch <- ws
}
Sure there is some processing overhead from orphaned 'go dialWebsocConn()' procs; but it
pretty insignificant compared to being able to something useful done. (Of course, once
I've found the server of interest, I move that address to the 'head' of the address
range list; so I usually get a hit on the first address tried.)

@gopherbot
Copy link
Author

Comment 13 by clbanning:

Had a similar problem with websocket.Dial().  The server(s) are on a network with
DHCP-assigned addresses.  So I fixed it by looping over a range of addresses while
calling GetWebsocConn(), below.
type websoc struct {
   c *websocket.Conn
   err os.Error
}
func GetWebsoc(wsString, destString string, timeOut int) (*websocket.Conn, os.Error) {
   var ws websoc
   var ok bool
   ch := make(chan websoc,1)
   go dialWebsocConn(wsString, destString, ch)
   tic := time.Seconds()
   for {
      ws, ok = <-ch
      if ok == true {
         break
      }
      toc := time.Seconds()
      if to := toc-tic; int(to) > timeOut {
         close(ch)
         ws.err = os.ECONNREFUSED
         break
      }
   }
   return ws.c, ws.err
}
func dialWebsocConn(wsString, destString string, ch chan websoc) {
   var ws websoc
   c, ok := websocket.Dial(wsString,"",destString)
   ws.c = c
   ws.err = ok
   ch <- ws
}
Sure there is some processing overhead from orphaned 'go dialWebsocConn()' procs; but
it's pretty insignificant compared to being able to get something useful done. (Of
course, once I've found the server of interest, I move that address to the 'head' of the
address range list; so I usually get a hit on the first address tried.)

@alberts
Copy link
Contributor

alberts commented Oct 5, 2010

Comment 14:

The problem Marius was having would be solved by having a ListenWithBacklog function for
making listeners.

@niemeyer
Copy link
Contributor

Comment 15:

The main underlying issue, which is partly fixed by bradfitz's suggestion, seems to
be that Go's abstraction is a bit different from the usual OS's one. There are
generally two steps: one for allocating the data structure, and then a second one for
"dialing" out.  This gives some time for further tweaking the parameters before an
action is performed. Since Go is acting differently, it's missing an obvious place
for sticking the setting change.
Two high-level suggestions on how to fix this:
1) Separate out the two actions in Go as well.  E.g Prepare() returns the allocated
struct which has some further actions on it, plus the Dial() method to connect out,
or similar.
2) Provide a generic settings parameter to Dial(), which may be given through
different structs/interfaces depending on the type of options needed by the
operation, and also may be nil for defaults.  This ensures forward compatibility with
new settings, and enables custom actions to be done within Dial() itself before any
delaying calls take place.

@alberts
Copy link
Contributor

alberts commented Nov 19, 2010

Comment 16:

Some relevant discussion on golang-dev:
http://groups.google.com/group/golang-dev/browse_thread/thread/87cdea3187972bce

@snaury
Copy link
Contributor

snaury commented May 20, 2011

Comment 17:

Btw, recently there was a change that made net.Dial use non-blocking connect(), this
could mean that the default system-wide timeout might be gone now, and so if you try to
connect to hosts that drop all packets it might never finish (I haven't tested it
though).
Anyway, related to this bug, what do you think about introducing global/default
timeouts? With APIs like net.SetDefaultTimeout, net.SetDefaultReadTimeout,
net.SetDefaultWriteTimeout, net.SetDefaultConnectTimeout? While having global variables
like that might not be a good idea in general, usually single application doesn't use
different default timeouts for different sockets/connections, so being able to set
reasonable defaults on per-application basic might be useful.

@bradfitz
Copy link
Contributor

Comment 18:

Such globals seem dangerous.
If anything a default timeout of 5 minutes or something might make more sense for people
who don't specify it on a specific Dial.
I wonder if there's anything we do or could do with finalizers and removing fds from the
epoll set when their corresponding Go-level objects go away.
i.e.
go func()) {
   conn, _ := net.Dial("tcp", "drop-packets.com")
   ch <- conn
})
select {
case <-time.After(2e9):
case conn := <-ch
}
... if the timeout happens, and "conn" disappears and is GC'd, perhaps the fd that's
forever hung on drop-packets.com could be removed from the epoll set.

@gopherbot
Copy link
Author

Comment 19 by emilliken:

In my case, I wanted to keep the number of connections somewhere below
syscall.Getdtablesize().  Not knowing exactly when the connection is GC'd and destroyed
could make it difficult to keep track of where I am.  I'm in favor or something that
could indicate to net.Dial exactly how long to try.

@snaury
Copy link
Contributor

snaury commented May 30, 2011

Comment 20:

Ok, I just tested a simple black-hole connection and it turns out that even if socket is
non-blocking there's still socket timeout somewhere. Which is awesome news, since it
means go is safe for anti-malware research projects.
As for timeouts and accept it's actually surprising that shutdown doesn't always work,
i.e.:
package main
import "fmt"
import "net"
import "time"
func main() {
    l, e := net.Listen("tcp", "127.0.0.1:8080")
    if e != nil {
        fmt.Printf("%v\n", e)
        return
    }
    go func() {
        <- time.After(1e9)
        println("Closing listener")
        e := l.Close()
        if e != nil {
            fmt.Printf("%v\n", e)
        }
    }()
    println("Starting to accept")
    _, e = l.Accept()
    if e != nil {
        fmt.Printf("%v\n", e)
        return
    }
    println("Done")
}
Hangs up forever on Mac OS X, while correctly stopping with "invalid argument" on Linux.
What gives? :-/

@snaury
Copy link
Contributor

snaury commented May 30, 2011

Comment 21:

It turns out that fd.Close doesn't forward errors from syscall.Shutdown, because same
program in Python gives me "error: [Errno 57] Socket is not connected" on the shutdown
call (as per manpage, turns out on BSD shutdown is for full-duplex connections only).
But if I change fd.Close to forward errors the some tests start failing because of
unexpected errors in Close. :(

@rsc
Copy link
Contributor

rsc commented Jul 25, 2011

Comment 22:

Assigning to Brad.  We worked out an API.

Owner changed to @bradfitz.

Status changed to Accepted.

@rsc
Copy link
Contributor

rsc commented Dec 9, 2011

Comment 23:

Labels changed: added priority-later.

@rsc
Copy link
Contributor

rsc commented Dec 12, 2011

Comment 24:

Labels changed: added priority-go1.

@bradfitz
Copy link
Contributor

@bradfitz
Copy link
Contributor

Comment 26:

This issue was closed by revision 964309e.

Status changed to Fixed.

@rsc rsc added this to the Go1 milestone Apr 10, 2015
@rsc rsc removed the priority-go1 label Apr 10, 2015
@mikioh mikioh changed the title timeout support for net.Dial() net: timeout support for Dial Jul 30, 2015
@golang golang locked and limited conversation to collaborators Aug 5, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants