internal/poll: Close should not return until the fd is really closed #21856

slingamn · 2017-09-13T06:52:01Z

What version of Go are you using (`go version`)?

1.9

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

linux/amd64

Proposal

I'm taking TCPListener as an example, but I'm pretty sure the issue applies to UDP and Unix domain sockets as well. TCPListener.Close is documented as "Close stops listening on the TCP address." However, Close() does not guarantee that the address is safe to listen on again. This is because if a concurrent goroutine is in the middle of an Accept() call on the same listener, it will hold an additional reference, via an fdmutex readLock, to the underlying file descriptor. Consequently, in the typical case, Close() will merely decrement the reference count and return without actually calling close(2) on the file descriptor.

Because the runtime sets SO_REUSEADDR on all its listen sockets, the address will be free for bind(2) again immediately after the close(2) system call returns. But since close(2) will not execute until both Close() and Accept() return, a synchronization primitive (e.g., a chan or a sync.WaitGroup) is currently required in order to produce the necessary happens-before relationship to safely create a new TCPListener.

Here's a test case (adapted from one suggested by @davecheney on the golang-nuts list):

https://gist.github.com/slingamn/3b0f81169de6578c732ec279828c4866

This case will reliably panic with: panic: listen tcp :6502: bind: address already in use.

The proposal is that all the Listener types affected by this issue should perform this synchronization automatically in their Close methods. This would (at least on POSIX platforms) make it safe to re-listen on an address immediately after Close.

Here are some reasons in support of this change:

The present behavior is counterintuitive because it contradicts the behavior of the underlying, eponymous system calls. (On Linux, with SO_REUSEADDR set, the address is available immediately after close(2) returns; any concurrent accept(2) call will fail with EBADF. This behavior is arguably being simulated, inaccurately, by the polling layer, which does not actually do a blocking accept(2).)
The present behavior is a source of race conditions. If the time.Sleep() call is deleted from the above test case, the program reliably executes without a panic, even when built with -race. It's easy to write code that appears to work, but has a lurking race.
The present behavior is a source of confusion. Here's one example, which includes a workaround that either never worked, or does not work under the current runtime.
Althought "happens-after both Accept() and Close() return" is currently a sufficient condition to safely listen on the address again, this is not clearly specified in the documentation; it seems more like an implementation detail than anything else. It is not clear that future versions of the runtime will behave the same way.
The change appears to have no serious implications for backwards compatibility or performance (since closing a Listener is a relatively uncommon operation, relative to, e.g., accepting new connections).

Thanks very much for your time.

The text was updated successfully, but these errors were encountered:

slingamn · 2017-09-15T18:41:25Z

Sorry --- did anyone have thoughts on this?

ianlancetaylor · 2017-09-15T18:50:36Z

Proposal reviews are done approximately weekly. When this one will be reviewed depends on the overall load. There have been a bunch of proposals recently, for some reason.

My initial reaction is that this seems reasonable, but what is special about TCPListener? Currently I believe that for all the net types with a Close method the Close method returns only with a promise that the socket will be closed when all pending I/O is complete or times out. Should we change all of the Close methods to wait until the socket descriptor is actually really closed?

slingamn · 2017-09-15T19:19:56Z

Thanks for the clarification --- I was confused because the latency of the review process in practice often outperforms the specification :-) (This seems to be a common issue in systems design, e.g., O_PONIES.)

In my opinion, TCPListener, UDPListener, and UnixListener are all special because their file descriptor is associated with a unique resource --- the address they were bound to with bind(2) --- that will not be released until close(2) is called. (Normally, the resource being held is not unique --- something like an ephemeral port, or just the entry in the file descriptor table. It is less important to know exactly when such a resource has been released.)

This was my intention with the original proposal title of "Listener.Close() should not return until the underlying address is freed". The goal is to have Listener.Close reclaim the address so that it can be bound again; the fact that this can only be accomplished via close(2) is more of a detail.

rsc · 2017-09-18T20:49:42Z

It's not clear to me if this should be restricted to listeners. It's plausible that other special fds might also exist and that we should instead make sure that when any fd Close returns, the underlying close(2) system call has returned.

It's then not clear to me if this would potentially cause any deadlocks by making Close wait for pending I/O to be aborted, and that pending I/O might take a long time.

Or maybe we should do this for all fds we poll, since at least there we (think we) know how to abort pending I/O promptly.

slingamn · 2017-09-20T08:59:18Z

I think it would definitely help to enumerate any other cases that might be "special"; thus far I can only think of listeners.

It seems like an implementation at the poll / fdMutex level should be deadlock-free, because the fdMutex semaphores are the "innermost" synchronization primitives held on those code paths. It also seems like the code shouldn't have to wait for any blocking I/O calls to complete, just for other goroutines to wake up from waitRead and waitWrite and realize that Close is trying to close the socket? But this in itself could conceivably have an impact on performance for some workloads.

rsc · 2017-09-25T20:23:20Z

Per discussion with @ianlancetaylor let's just try making internal/poll's Close do this for all file descriptors and back down to Listeners only if we find some reason to do so.

gopherbot · 2017-09-26T03:53:37Z

Change https://golang.org/cl/66150 mentions this issue: internal/poll: don't return from Close until descriptor is closed

gopherbot · 2017-09-27T00:49:48Z

Change https://golang.org/cl/66334 mentions this issue: net: use newLocalListener in TestClosingListener.

Updates #21856 Change-Id: I9baa51fe23e6dd2fcf9dd14f7acfaf7457571e1d Reviewed-on: https://go-review.googlesource.com/66334 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Crawshaw <crawshaw@golang.org> Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>

gopherbot · 2017-12-13T14:34:50Z

Change https://golang.org/cl/83715 mentions this issue: net, os: don't wait for Close in blocking mode

Updates #7970 Updates #21856 Updates #23111 Change-Id: I0cd0151fcca740c40c3c976f941b04e98e67b0bf Reviewed-on: https://go-review.googlesource.com/83715 Reviewed-by: Russ Cox <rsc@golang.org>

gopherbot · 2017-12-14T03:03:36Z

Change https://golang.org/cl/83995 mentions this issue: os: don't wait for Close if the File was returned by NewFile

os.NewFile doesn't put the fd into non-blocking mode. In most cases, an *os.File returned by os.NewFile is in blocking mode. Updates #7970 Updates #21856 Updates #23111 Change-Id: Iab08432e41f7ac1b5e25aaa8855d478adb7f98ed Reviewed-on: https://go-review.googlesource.com/83995 Reviewed-by: Ian Lance Taylor <iant@golang.org>

gopherbot added this to the Proposal milestone Sep 13, 2017

gopherbot added the Proposal label Sep 13, 2017

ianlancetaylor changed the title ~~proposal: net.Listener.Close() should not return until the underlying address is freed~~ proposal: net: Listener.Close should not return until the socket is really closed Sep 13, 2017

rsc changed the title ~~proposal: net: Listener.Close should not return until the socket is really closed~~ internal/poll: Close should not return until the fd is really closed Sep 25, 2017

rsc assigned ianlancetaylor Sep 25, 2017

rsc modified the milestones: Proposal, Go1.10 Sep 25, 2017

rsc added the Proposal-Accepted label Sep 25, 2017

gopherbot closed this as completed in 382d492 Sep 26, 2017

davecheney mentioned this issue Oct 31, 2017

net: Listener.Close() does not fully stop the server when Accept()'ing at the same time #22502

Closed

xor-gate mentioned this issue Jan 4, 2018

Unstable redistest.StartCluster due to redis "address already in use" mna/redisc#7

Closed

thejosephstevens mentioned this issue May 31, 2018

Fix for Windows CLI Port Forwarding fission/fission#715

Merged

fturib mentioned this issue Jul 17, 2018

plugin/metrics - fails to follow metric directive after 2 restarts coredns/coredns#1969

Closed

golang locked and limited conversation to collaborators Dec 14, 2018

gopherbot added the FrozenDueToAge label Dec 14, 2018

rsc unassigned ianlancetaylor Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

internal/poll: Close should not return until the fd is really closed #21856

internal/poll: Close should not return until the fd is really closed #21856

slingamn commented Sep 13, 2017

slingamn commented Sep 15, 2017

ianlancetaylor commented Sep 15, 2017

slingamn commented Sep 15, 2017

rsc commented Sep 18, 2017

slingamn commented Sep 20, 2017

rsc commented Sep 25, 2017

gopherbot commented Sep 26, 2017

gopherbot commented Sep 27, 2017

gopherbot commented Dec 13, 2017

gopherbot commented Dec 14, 2017

internal/poll: Close should not return until the fd is really closed #21856

internal/poll: Close should not return until the fd is really closed #21856

Comments

slingamn commented Sep 13, 2017

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

Proposal

slingamn commented Sep 15, 2017

ianlancetaylor commented Sep 15, 2017

slingamn commented Sep 15, 2017

rsc commented Sep 18, 2017

slingamn commented Sep 20, 2017

rsc commented Sep 25, 2017

gopherbot commented Sep 26, 2017

gopherbot commented Sep 27, 2017

gopherbot commented Dec 13, 2017

gopherbot commented Dec 14, 2017

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?