New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net: HTTP requests can be not completed if network interface settings was changed and connection doesn't return an error for a long time #33515
Comments
Hello @dedalqq, thank you for filing this bug and welcome to the Go project! Firstly for a disclaimer, I am not a networking expert but I can debug here and there, so my findings are just from my late night investigations. Here we go: I am not a networking expert but from reading through the literature on TCP KeepAlive ka e.g. from http://man7.org/linux/man-pages/man7/tcp.7.html and then in the Go net source code for Linux Lines 15 to 25 in 9c1f14f
Given a KeepAlive duration of 30seconds, for every single TCP KeepAlive probe, the connection has to be idle for 30 seconds but also the duration between probes is 30 seconds, so per probe we have to wait for: wTPerProbe := 30s + 30s = 60s To get the number of default probes, we look again in http://man7.org/linux/man-pages/man7/tcp.7.html and find minWait := wTPerProbe * nProbes = 60s * 9 probes = 540s = 9 min so from my rough maths, assuming that all is well, we've got that value and that would explain what you are seeing, a wait of 8 minutes of which we've got 9 minutes being that empirical minWait I also made a standalone repro to absolve the blame from net/http by just making a TLS connection directly and making requests to a server and then you can perform your network switches package main
import (
"bufio"
"context"
"crypto/tls"
"fmt"
"log"
"net"
"net/http"
"net/http/httputil"
"sync"
"time"
)
func main() {
baseDialer := &net.Dialer{
Timeout: 30 * time.Second,
KeepAlive: 30 * time.Second,
DualStack: true,
}
conn, err := tls.DialWithDialer(baseDialer, "tcp", "golang.org:443", &tls.Config{
InsecureSkipVerify: true,
})
if err != nil {
log.Fatalf("Failed to dial to connection: %v", err)
}
defer conn.Close()
log.Printf("RemoteAddr: %s\n", conn.RemoteAddr())
var allGoroutinesWg sync.WaitGroup
defer allGoroutinesWg.Wait()
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
allGoroutinesWg.Add(1)
// Requests goroutine.
go func() {
defer allGoroutinesWg.Done()
i := 0
paths := []string{"/doc/tos.html", "/pkg/net/http%23Transport/", "/nonexistent"}
for {
for _, path := range paths {
asStr := fmt.Sprintf("GET %s HTTP/1.1\r\nConnection: keep-alive\r\nHost: golang.org\r\n\r\n", path)
n, err := conn.Write([]byte(asStr))
log.Printf("\n\n\033[35mWrote %d bytes, error: %v\033[00m\n\n", n, err)
select {
case <-time.After(3800 * time.Millisecond):
i++
fmt.Printf("Making the next request: #%d", i)
case <-ctx.Done():
fmt.Printf("Exiting writing goroutine: %v", ctx.Err())
return
}
}
}
}()
// Responses are read here.
br := bufio.NewReader(conn)
for {
res, err := http.ReadResponse(br, nil)
if err != nil {
log.Printf("\033[31mError: %v\033[00m", err)
return
}
blob, _ := httputil.DumpResponse(res, true)
fmt.Printf("\n\n\n%s\n\n\n", blob)
}
} and in the midst of running it, I'll switch from my WiFi connection to my cell phone's network to replicate your network switch and eventually the program will fail with 2019/08/10 01:24:13 Error: read tcp \
[2607:fb90:2821:d316:f56a:d483:8da7:3831]:60717->[2607:f8b0:4007:801::2011]:443: read: no route to host
Exiting writing goroutine: context canceled |
I am going to mark this as working as intended. package main
import (
"fmt"
"log"
"net"
"golang.org/x/sys/unix"
)
func main() {
ln, err := net.Listen("tcp", ":0")
if err != nil {
log.Fatalf("Failed to grab an available address: %v", err)
}
defer ln.Close()
conn, err := net.Dial("tcp", ln.Addr().String())
if err != nil {
log.Fatalf("Failed to dial to listener: %v", err)
}
defer conn.Close()
tcpConn, ok := conn.(*net.TCPConn)
if !ok {
log.Fatalf("Unfortunately never made a TCP connection: %T", conn)
}
fmt.Printf("Addr: %s\n", ln.Addr())
file, err := tcpConn.File()
if err != nil {
log.Fatalf("Failed to find the associated file: %v", err)
}
fd := file.Fd()
tcpInfo, err := unix.GetsockoptTCPInfo(int(fd), unix.SOL_TCP, unix.TCP_INFO)
if err != nil {
log.Fatalf("Failed to get socket option: %v", err)
}
log.Printf("TCPInfo: %#v\n", tcpInfo)
} and as per the dialer, if you use the default one in the default transport, please expect it to be >=9 minutes given that maths on its behavior. |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Also it reproduce on go1.12 linux/amd64
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
I run next code on VM with Centos. This code run 5 goroutines each of which starts a cycle with http requests per second:
https://play.golang.org/p/uRcIWDhqwF2
And then i change network on it VM. You can see this moment in output:
Full output:
And after change ip on VM you can see what goroutines 1 and 4 is not responding. Ip changed in 08:16:42 but before 08:23:21 no posts from goroutines 1 and 4. Also not posts as "Close number: " (from line 86) and no posts about error (from lien 91).
What did you expect to see?
I expect the request to complete no more than half a minute after changing the address since I use default http transport. Default http transport have half a minute keepalive.
What did you see instead?
I see what goroutines 1 and 4 not response 8 minute after ip was changed. I think it's a incorrect behavior
The text was updated successfully, but these errors were encountered: