Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: Dialer{DualStack: true}.Dial("tcp", "DNS reg-name") mishandles and cancels the winner connection #15035

Closed
mikioh opened this issue Mar 31, 2016 · 7 comments
Milestone

Comments

@mikioh
Copy link
Contributor

mikioh commented Mar 31, 2016

The following snippet fails mostly with tip:

package main

import (
    "log"
    "net"
    "time"
)

func main() {
    c, err := (&net.Dialer{DualStack: true}).Dial("tcp", "golang.org:80")
    if err != nil {
        log.Fatal(err)
    }
    defer c.Close()
    time.Sleep(3 * time.Second)
    req := []byte("GET /robots.txt HTTP/1.0\r\nHost: golang.org\r\n\r\n")
    if _, err := c.Write(req); err != nil {
        log.Fatal(err)
    }
    log.Println("done")
}

# go run snippet.go
write tcp 192.168.86.22:32962->216.58.197.209:80: i/o timeout

A quick look: It seems like dialParallel in dial.go cancels all racers, not only losers but a winner connection. This issue occurs only on tip (Go 1.7 devel).

func dialParallel(ctx *dialContext, ...) {
        cancel := make(chan struct{})
(snip)
        for !primaryResult.done || !fallbackResult.done {
                select {
                 case res := <-results:
(snip)
                        // On success, cancel the other racer (if one exists.)
                        if res.error == nil && cancel != nil {
                                close(cancel)
                                cancel = nil
                        }

/CC @pmarks-net

@mikioh mikioh added this to the Go1.7 milestone Mar 31, 2016
@mikioh
Copy link
Contributor Author

mikioh commented Mar 31, 2016

@pmarks-net,

Sorry, I completely overlooked this case during the review.

@pmarks-net
Copy link
Contributor

This snippet fails in a similar way, using the go1.6 release:

package main

import (
        "log"
        "net"
        "time"
)

func main() {
        cancel := make(chan struct{})
        c, err := (&net.Dialer{Cancel: cancel}).Dial("tcp", "golang.org:80")
        if err != nil {
                log.Fatal(err)
        }
        close(cancel)
        defer c.Close()
        time.Sleep(3 * time.Second)
        req := []byte("GET /robots.txt HTTP/1.0\r\nHost: golang.org\r\n\r\n")
        if _, err := c.Write(req); err != nil {
                log.Fatal(err)
        }
        log.Println("done")
}

The Dialer documentation says:

// Cancel is an optional channel whose closure indicates that
// the dial should be canceled. 

I would interpret "the dial should be canceled" to mean that after Dial returns, closing the channel has no effect. Instead, it seems to travel back in time to kill the existing connection.

Was Cancel intended to apply just to the Dial operation, or to the entire lifetime of the connection?

@mikioh
Copy link
Contributor Author

mikioh commented Mar 31, 2016

I think both cases are slightly different. The former, which I mentioned, is just about how to handle the hidden channel for cancellation of multiple dial racers. The latter is for customer's dial cancellation.

Was Cancel intended to apply just to the Dial operation, or to the entire lifetime of the connection?

I believe the former.

@pmarks-net
Copy link
Contributor

I believe the former.

Then this is a pre-existing bug in dialTCP, which dialParallel is bringing to light. It's true that dialParallel cancels both racers, but only after one of them has successfully returned a connection.

So if cancelation were not retroactive, then dialParallel would work correctly.

@mikioh
Copy link
Contributor Author

mikioh commented Mar 31, 2016

So if cancelation were not retroactive

Yup, agreed.

@pmarks-net
Copy link
Contributor

I think the bug is here:
https://github.com/golang/go/blob/master/src/net/fd_unix.go#L108

The <-cancel and <-done cases are racing each other after connect returns. Instead, connect needs to wait for the goroutine to acknowledge the done event before returning.

@gopherbot
Copy link

CL https://golang.org/cl/21497 mentions this issue.

@golang golang locked and limited conversation to collaborators Apr 5, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants