net: Resolver's lookup function behavior is oblivious to a custom Dial function #54244

Closed
th0m opened this issue Aug 3, 2022 · 1 comment


th0m commented Aug 3, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/vagrant/.cache/go-build"
GOENV="/home/vagrant/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/vagrant/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/vagrant/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1847147377=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Create a Resolver and override its Dial function so that a fixed resolver IP is always used: 1.2.3.4:53.
Set resolv.conf to:

nameserver 1.1.1.1
nameserver 8.8.8.8
nameserver 8.8.4.4

Use it to resolve cloudflare.com NS.

Source code

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	r := &net.Resolver{
		// Force the pure-Go resolver so that the Dial hook below is used.
		PreferGo: true,
		// Ignore the address chosen by the resolver and always dial 1.2.3.4:53.
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{
				Timeout: 3 * time.Second,
			}
			return d.DialContext(ctx, "udp", "1.2.3.4:53")
		},
	}
	_, err := r.LookupNS(context.Background(), "cloudflare.com")
	if err != nil {
		fmt.Println(err)
	}
}

I have noticed that the lookup function loops over all resolvers in resolv.conf even though the Dial function has been overridden.
As a result, the number of attempts scales with the number of entries in resolv.conf, although none of the resolv.conf servers is actually contacted because the Dial function is overridden.
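To make the loop visible without sending any packets, here is a minimal sketch that records every address the resolver passes to Dial. The helper name `recordDialAddresses` is made up for illustration, and the Dial hook deliberately fails instead of connecting. It assumes a Unix-like system where `PreferGo: true` selects the Go resolver.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"sync"
)

// recordDialAddresses (hypothetical helper) runs an NS lookup with a Dial
// hook that never touches the network: it records the network and address
// the resolver asked for, then returns an error.
func recordDialAddresses(name string) ([]string, error) {
	var (
		mu    sync.Mutex
		addrs []string
	)
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			mu.Lock()
			addrs = append(addrs, network+" "+address)
			mu.Unlock()
			return nil, errors.New("dial disabled for this demonstration")
		},
	}
	_, err := r.LookupNS(context.Background(), name)

	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), addrs...), err
}

func main() {
	addrs, err := recordDialAddresses("cloudflare.com")
	for i, a := range addrs {
		fmt.Printf("dial %d: %s\n", i+1, a)
	}
	fmt.Println("error:", err)
}
```

The recorded addresses come from the system's resolver configuration, so the count depends on the machine: the number of nameserver entries, the attempts setting, and any search domains all affect it.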

What did you expect to see?

I would have expected the (r *Resolver) lookup function to not fetch resolv.conf configuration when the Dial function has been overridden with a custom server.

Also I would have expected the number of servers in resolv.conf to not influence the number of retries.

What did you see instead?

With 3 resolvers in resolv.conf, I am seeing 6 queries in total (3 servers * 2 attempts):

21:25:00.403789 IP 10.0.2.15.38464 > 1.2.3.4.53: 16792+ NS? cloudflare.com. (32)
21:25:05.404578 IP 10.0.2.15.36711 > 1.2.3.4.53: 53187+ NS? cloudflare.com. (32)
21:25:10.410653 IP 10.0.2.15.48693 > 1.2.3.4.53: 10620+ NS? cloudflare.com. (32)
21:25:15.418829 IP 10.0.2.15.42359 > 1.2.3.4.53: 44619+ NS? cloudflare.com. (32)
21:25:20.425065 IP 10.0.2.15.39995 > 1.2.3.4.53: 9573+ NS? cloudflare.com. (32)
21:25:25.431093 IP 10.0.2.15.41456 > 1.2.3.4.53: 36792+ NS? cloudflare.com. (32)

With 2 resolvers in resolv.conf (I deleted server 1.1.1.1), I am seeing 4 queries in total (2 servers * 2 attempts):

21:26:22.121101 IP 10.0.2.15.48327 > 1.2.3.4.53: 57107+ NS? cloudflare.com. (32)
21:26:27.126828 IP 10.0.2.15.42868 > 1.2.3.4.53: 61507+ NS? cloudflare.com. (32)
21:26:32.132852 IP 10.0.2.15.39008 > 1.2.3.4.53: 2860+ NS? cloudflare.com. (32)
21:26:37.145549 IP 10.0.2.15.51778 > 1.2.3.4.53: 2160+ NS? cloudflare.com. (32)

In both cases the program completes with the following error that contains the wrong server (8.8.4.4:53 isn't actually used, see #43703)

lookup cloudflare.com on 8.8.4.4:53: read udp 10.0.2.15:41456->1.2.3.4:53: i/o timeout

Feel free to let me know if you need any other information, thank you!

@seankhliao (Member)

Given how this works (the resolver gives Dial an address and expects a conn associated with that address back), I would consider this to be working as intended, if unfortunate for this particular situation. The resolver has no way of knowing what Dial did to choose the address it actually connects to.

See also #12503

seankhliao closed this as not planned on Aug 3, 2022
golang locked and limited conversation to collaborators on Aug 3, 2023