net: DialTimeout is not able to connect when /etc/resolv.conf has some unreachable nameservers at the top #57694

Open
psasidhar opened this issue Jan 9, 2023 · 4 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@psasidhar
What version of Go are you using (go version)?

(oath_tools) N9N7C9PNWP:hca palakas$ go version
go version go1.19.4 darwin/arm64
(oath_tools) N9N7C9PNWP:hca palakas$

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

What did you do?

We have a small helper function like this:

// dial attempts to open a tcp connection to the host:port address with a 3 second timeout
func dial(host string, port int) error {
	conn, err := net.DialTimeout("tcp", fmt.Sprintf("%s:%d", host, port), time.Second*3)
	if err == nil {
		conn.Close()
		return nil
	}

	return err
}

This function fails with the error "dial tcp: lookup zts.athens.yahoo.com: i/o timeout" when the first nameservers listed in /etc/resolv.conf are unreachable.

For example:

[palakas@a26074b9 ~]$ cat /etc/resolv.conf
; Created by cloud-init on instance boot automatically, do not edit.
;
nameserver 98.136.206.44
nameserver 98.136.206.45
nameserver 98.136.206.41
nameserver 98.136.206.42
[palakas@a26074b9 ~]$ telnet 98.136.206.44 53
Trying 98.136.206.44...
^C
[palakas@a26074b9 ~]$ telnet 98.136.206.45 53
Trying 98.136.206.45...
^C
[palakas@a26074b9 ~]$ telnet 98.136.206.41 53
Trying 98.136.206.41...
Connected to 98.136.206.41.
Escape character is '^]'.
^C^]

telnet> quit
Connection closed.
[palakas@a26074b9 ~]$ nslookup www.yahoo.com
Server:		98.136.206.41
Address:	98.136.206.41#53

www.yahoo.com	canonical name = new-fp-shed.wg1.b.yahoo.com.
Name:	new-fp-shed.wg1.b.yahoo.com
Address: 98.137.11.165
Name:	new-fp-shed.wg1.b.yahoo.com
Address: 2001:4998:24:120d::f000

[palakas@a26074b9 ~]$

What did you expect to see?

I expected the Go resolver to move on to a reachable nameserver and return its answer. For example, nslookup and dig have no trouble resolving names when an unreachable nameserver is listed first in /etc/resolv.conf.

What did you see instead?

We get a timeout, possibly after the resolver has tried the unreachable nameservers.

@mateusz834
Member

Try running it with GODEBUG=netdns=go or GODEBUG=netdns=cgo.

Also note that resolv.conf has a limit of 3 nameservers. From resolv.conf(5):

Up to MAXNS (currently 3, see <resolv.h>) name
servers may be listed, one per keyword.

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jan 9, 2023
@bcmills bcmills added this to the Backlog milestone Jan 9, 2023
@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 9, 2023
@bcmills
Contributor

bcmills commented Jan 9, 2023

(CC @ianlancetaylor @neild)

@bcmills bcmills added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jan 9, 2023
@psasidhar
Author

psasidhar commented Jan 10, 2023

I added the 4th entry only to observe the behavior and to verify that the Go code actually reads all 3 entries. After adding debug logging to functions in the net package, I can see that for some lookups it attempts to connect to all 3 nameservers; however, the functions seem to return with the timeout from the first entry.

To isolate the problem, I wrapped the net.DialTimeout call in a small driver program, main.go, as follows.

cat main.go Output
[palakas@a26074b9 ~]$ cat main.go
package main

import (
	"fmt"
	"log"
	"net"
	"time"
)

// dial attempts to open a tcp connection to the host:port address with a 3 second timeout
func dial(host string, port int) error {
	conn, err := net.DialTimeout("tcp", fmt.Sprintf("%s:%d", host, port), time.Second*3)
	if err == nil {
		conn.Close()
		return nil
	}

	return err
}

func main() {

	err := dial("www.yahoo.com", 443)
	if err != nil {
		log.Fatalf("unable to reach 443 on www.yahoo.com, err: %v", err)
	}
	log.Printf("sucess test")
}
[palakas@a26074b9 ~]$
Single working nameserver in /etc/resolv.conf; the code works fine. Output:
[palakas@a26074b9 ~]$ cat /etc/resolv.conf
; Created by cloud-init on instance boot automatically, do not edit.
;
;nameserver 98.136.206.44
;nameserver 98.136.206.45
nameserver 98.136.206.41
[palakas@a26074b9 ~]$ go run ./main.go
2023/01/10 00:02:21 sucess test
[palakas@a26074b9 ~]$
Two bad entries and one good entry in /etc/resolv.conf; we get a timeout. Output:
[palakas@a26074b9 ~]$ cat /etc/resolv.conf
; Created by cloud-init on instance boot automatically, do not edit.
;
nameserver 98.136.206.44
nameserver 98.136.206.45
nameserver 98.136.206.41
[palakas@a26074b9 ~]$
[palakas@a26074b9 ~]$
[palakas@a26074b9 ~]$ go run ./main.go
2023/01/10 00:08:14 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
[palakas@a26074b9 ~]$
With GODEBUG set. Output:
[palakas@a26074b9 ~]$ GODEBUG=netdns=go go run ./main.go
2023/01/10 00:08:44 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
[palakas@a26074b9 ~]$ GODEBUG=netdns=cgo go run ./main.go
2023/01/10 00:08:51 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
[palakas@a26074b9 ~]$
One bad entry and one good entry in /etc/resolv.conf; with netdns=cgo the function works intermittently, but not with netdns=go. Output:
[palakas@a26074b9 ~]$ cat /etc/resolv.conf
; Created by cloud-init on instance boot automatically, do not edit.
;
;nameserver 98.136.206.44
nameserver 98.136.206.45
nameserver 98.136.206.41
[palakas@a26074b9 ~]$ for i in `seq 10`
> do
> GODEBUG=netdns=cgo go run ./main.go
> done
2023/01/10 00:12:53 sucess test
2023/01/10 00:12:55 sucess test
2023/01/10 00:12:58 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:01 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:05 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:08 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:10 sucess test
2023/01/10 00:13:13 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:16 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:18 sucess test
[palakas@a26074b9 ~]$ for i in `seq 10`; do GODEBUG=netdns=go go run ./main.go; done
2023/01/10 00:13:30 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:33 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:36 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:39 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:42 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:46 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:49 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:52 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:55 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
2023/01/10 00:13:58 unable to reach 443 on www.yahoo.com, err: dial tcp: lookup www.yahoo.com: i/o timeout
exit status 1
[palakas@a26074b9 ~]$
go env Output
[palakas@a26074b9 ~]$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/palakas/.cache/go-build"
GOENV="/home/palakas/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/palakas/go/pkg/mod"
GONOPROXY="go.ouryahoo.com,go.vzbuilders.com"
GONOSUMDB="go.ouryahoo.com,go.vzbuilders.com"
GOOS="linux"
GOPATH="/home/palakas/go"
GOPRIVATE="go.ouryahoo.com,go.vzbuilders.com"
GOPROXY="https://edge.artifactory.ouroath.com:4443/artifactory/go-proxy,https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19.4"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build337859814=/tmp/go-build -gno-record-gcc-switches"
[palakas@a26074b9 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.6 (Ootpa)
[palakas@a26074b9 ~]$

@mateusz834
Member

mateusz834 commented Feb 12, 2023

The timeout you provide as an argument causes the net package to create an internal context with that deadline, which is then passed to address resolution.

go/src/net/dial.go

Lines 393 to 399 in 261fe25

if !deadline.IsZero() {
	if d, ok := ctx.Deadline(); !ok || deadline.Before(d) {
		subCtx, cancel := context.WithDeadline(ctx, deadline)
		defer cancel()
		ctx = subCtx
	}
}

I feel that it works as intended.

@seankhliao seankhliao removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 18, 2023
4 participants