net: cgo dns resolver doesn't work as expected. #66114

Closed
brotherlu-xcq opened this issue Mar 5, 2024 · 4 comments
Labels
WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

brotherlu-xcq commented Mar 5, 2024

Go version

go 1.19 linux/amd64

Output of go env in your module/workspace:

GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://goproxy.cn"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3621145106=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I tried to request a URL on a local domain (one that can only be resolved by the local DNS resolver), as shown below:

package main

import (
	"context"
	"log"
	"net"
	"net/http"
	"net/http/httptrace"
	"os"
	"time"
)

var s = true
var host = "test_demo.qa_default.namingdns.com:9898"

func main() {
	dialer := &net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
	}
	clt := &http.Client{
		Transport: &http.Transport{
			Proxy:           http.ProxyFromEnvironment,
			MaxIdleConns:    100,
			IdleConnTimeout: 30 * time.Second,
			// Force IPv4 regardless of the network the transport asks for.
			DialContext: func(ctx context.Context, network, addr string) (conn net.Conn, err error) {
				return dialer.DialContext(ctx, "tcp4", addr)
			},
		}}
	file, err := os.OpenFile("run.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	log.SetOutput(file)
	// Start 20 workers that each request the host in a loop.
	for i := 0; i < 20; i++ {
		go func() {
			for {
				start := time.Now()
				if !s {
					continue
				}
				req, err := http.NewRequest("GET", "http://"+host, nil)
				if err != nil {
					log.Printf("%v\n", err)
					continue
				}
				trace := &httptrace.ClientTrace{
					DNSDone: func(info httptrace.DNSDoneInfo) {
						if info.Err != nil {
							// Note: this prints the outer err (still nil here), not info.Err,
							// which is why the log shows "DNS ERROR: <nil>".
							log.Printf("DNS ERROR: %v, address: %+v\n", err, info)
						}
						timeCost := time.Since(start).Seconds()
						if timeCost > 5 {
							log.Printf("execute time more than 5s, %vs\n", timeCost)
						}
					},
				}
				req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
				resp, err := clt.Do(req)
				if err != nil {
					log.Printf("%v\n", err)
				} else {
					resp.Body.Close()
				}
				time.Sleep(time.Millisecond * 100)
			}
		}()
	}
	// Block forever so the worker goroutines keep running.
	select {}
}

The local /etc/resolv.conf looks like this:

nameserver 127.0.0.1
nameserver xx.xx.xx.xx
nameserver xx.xx.xx.xx
nameserver xx.xx.xx.xx
options timeout:1 attempts:3

The DNS resolver sometimes uses the 3rd nameserver, even though my local resolver works without any problem. When I debug the program remotely with dlv, the problem disappears. It also disappears when I build the program with -tags 'netgo'.
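Since the report says the problem goes away with -tags 'netgo', a similar effect can usually be had without a rebuild by asking the net package to prefer its own resolver. The sketch below sets PreferGo on a net.Resolver and attaches it to the dialer; the helper name newGoResolver is illustrative and not part of the original program. Setting GODEBUG=netdns=go in the environment is another way to request the Go resolver.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newGoResolver returns a resolver that prefers Go's built-in DNS client
// over the cgo (libc) one. The function name is illustrative.
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	dialer := &net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
		Resolver:  newGoResolver(),
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	// Host name taken from the report; it only resolves on the reporter's network,
	// so elsewhere this is expected to print an error.
	addrs, err := dialer.Resolver.LookupHost(ctx, "test_demo.qa_default.namingdns.com")
	fmt.Println(addrs, err)
}
```

This only sidesteps libc; it does not explain why the cgo resolver picks a later nameserver.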

What did you see happen?

Some requests return the messages below:

2024/03/05 19:20:30 DNS ERROR: <nil>, address: {Addrs:[] Err:lookup test_demo.qa_default.namingdns.com: no such host Coalesced:true}
2024/03/05 19:20:30 Get "http://test_demo.qa_default.namingdns.com:9898": dial tcp4: lookup test_demo.qa_default.namingdns.com: no such host

What did you expect to see?

The first nameserver should always be used to resolve the domain.

mateusz834 changed the title from "cgo: the dns resolver doesn't work as expected." to "net: cgo dns resolver doesn't work as expected." on Mar 5, 2024
@mateusz834
Member

mateusz834 commented Mar 5, 2024

Can you run dig @address test_demo.qa_default.namingdns.com for every resolver in resolv.conf and show the output?

Libc issues are not in our scope; you have to report this to the libc maintainers.
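The same per-nameserver check can also be scripted in Go. The sketch below reads the nameserver lines from /etc/resolv.conf and sends one lookup to each server through a net.Resolver with a custom Dial function. The helper names parseNameservers and resolverFor are illustrative, port 53 is assumed, and the parser handles only plain "nameserver <addr>" lines.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"net"
	"os"
	"strings"
	"time"
)

// parseNameservers extracts the addresses from "nameserver" lines.
// Commented-out lines do not match because their first field is not "nameserver".
func parseNameservers(conf string) []string {
	var servers []string
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "nameserver" {
			servers = append(servers, fields[1])
		}
	}
	return servers
}

// resolverFor returns a Go resolver that sends every query to one server.
func resolverFor(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			var d net.Dialer
			// Ignore the address chosen by the resolver and dial the given server.
			return d.DialContext(ctx, network, net.JoinHostPort(server, "53"))
		},
	}
}

func main() {
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Println("read resolv.conf:", err)
		return
	}
	for _, ns := range parseNameservers(string(data)) {
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		addrs, err := resolverFor(ns).LookupHost(ctx, "test_demo.qa_default.namingdns.com")
		cancel()
		fmt.Printf("%s: addrs=%v err=%v\n", ns, addrs, err)
	}
}
```

Because this goes through Go's resolver, it shows what each server answers, not which server libc actually chooses.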

mateusz834 added the WaitingForInfo label on Mar 5, 2024
@brotherlu-xcq
Author

Thank you for replying so quickly. The dig results for the first and second nameservers look like this:

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.15 <<>> test_demo.qa_default.namingdns.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33737
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test_demo.qa_default.namingdns.com. IN A

;; ANSWER SECTION:
test_demo.qa_default.namingdns.com. 1 IN A      10.179.173.156

;; Query time: 93 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Mar 06 11:31:16 CST 2024
;; MSG SIZE  rcvd: 113

The dig results for the 3rd and 4th nameservers look like this:

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.15 <<>> @xx.xx.xx.xx test_demo.qa_default.namingdns.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 9787
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test_demo.qa_default.namingdns.com. IN A

;; AUTHORITY SECTION:
namingdns.com.          600     IN      SOA     dns1.registrar-servers.com. hostmaster.registrar-servers.com. 1706026847 43200 3600 604800 3601

;; Query time: 93 msec
;; SERVER: xx.xx.xx.xx#53(xx.xx.xx.xx)
;; WHEN: Wed Mar 06 11:34:29 CST 2024
;; MSG SIZE  rcvd: 133

@mateusz834
Member

It is weird that libc does not always query the first nameserver. The only thing I can suggest is to limit the number of nameservers to 3; maybe this is the reason for the issue? It might truncate the first nameserver and only query the three nameservers that return the NXDOMAIN code. The resolv.conf manual notes that up to 3 nameservers can be specified in the resolv.conf file.

resolv.conf(5):

nameserver Name server IP address
Internet address of a name server that the resolver should
query, either an IPv4 address (in dot notation), or an
IPv6 address in colon (and possibly dot) notation as per
RFC 2373. Up to MAXNS (currently 3, see <resolv.h>)
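To see what a MAXNS limit of 3 would mean for a four-entry file, here is a minimal sketch. It ASSUMES the resolver keeps the first entries in file order and ignores the rest; the quoted excerpt does not say which entries are dropped, so this should be checked against the libc in use. The addresses after 127.0.0.1 are placeholders for the redacted entries, and splitByLimit is a hypothetical helper.

```go
package main

import "fmt"

// maxNS mirrors the MAXNS value quoted from resolv.conf(5).
const maxNS = 3

// splitByLimit returns the entries a resolver would keep and those it would
// ignore, ASSUMING it keeps the first `limit` entries in file order.
func splitByLimit(servers []string, limit int) (kept, ignored []string) {
	if len(servers) <= limit {
		return servers, nil
	}
	return servers[:limit], servers[limit:]
}

func main() {
	// Placeholder names standing in for the redacted entries in the report.
	servers := []string{"127.0.0.1", "ns2", "ns3", "ns4"}
	kept, ignored := splitByLimit(servers, maxNS)
	fmt.Println("kept:", kept)
	fmt.Println("ignored:", ignored)
}
```

Under that assumption the fourth entry is the one ignored and 127.0.0.1 stays first, so trimming the file to three entries is mainly a way to rule the limit out as a cause.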

@brotherlu-xcq
Author

Thank you very much @mateusz834. It seems this doesn't resolve my problem. I will try to report this to the libc maintainers.
