net: DNS Lookups Very Slow (netgo) #49253

Closed
moloch-- opened this issue Oct 31, 2021 · 21 comments
Labels: FrozenDueToAge, NeedsInvestigation

Comments

@moloch--

moloch-- commented Oct 31, 2021

What version of Go are you using (go version)?

Possible duplicate/regression of #21906 and/or #26960

$ go version

go version go1.17.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/moloch/Library/Caches/go-build"
GOENV="/Users/moloch/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/foobar/go/pkg/mod"
GOOS="darwin"
GOPATH="/Users/foobar/go"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.17.2/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.17.2/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.2"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/hk/q06r53v5267fdmk1nw5zqmpc0000gn/T/go-build4154594950=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Example code: https://gist.github.com/moloch--/7ffa9548b3c0f826b39533fcde7e1105

  1. Tried to resolve a domain using the Go resolver (netgo)
  2. PreferGo: false seems to be ignored when cross-compiling, which is somewhat expected but extremely unintuitive.

What did you expect to see?

DNS queries should take about as long as they do with the system resolver.

What did you see instead?

It takes Go 5+ seconds to resolve a domain.

[Three screenshots of the program's output from Oct 31, 2021; images not preserved.]

@davecheney
Contributor

What’s the result that is resolved after 5 seconds? I have a suspicion that’s a timeout.

@moloch--
Author

moloch-- commented Oct 31, 2021

No, they resolve correctly:

[Screenshot of the resolved output; image not preserved.]

Again, with PreferGo set to false:

➜  dnstest ./main google.com
go package net: using cgo DNS resolver
go package net: hostLookupOrder(google.com) = cgo
Query took: 7.489067ms
ips = [{142.250.191.206 } {2607:f8b0:4009:814::200e }]
err = <nil>

@moloch--
Author

The main problem is with cross-compiling: there doesn't seem to be a way to cross-compile a working version of this code, as it always uses the extremely slow Go resolver. PreferGo: false only works when you're compiling for the same platform.

@davecheney
Contributor

That is correct: cgo is disabled by default when cross-compiling.

thanm changed the title from "DNS Lookups Very Slow (netgo)" to "net: DNS Lookups Very Slow (netgo)" Nov 1, 2021
thanm added the NeedsInvestigation label Nov 1, 2021
@thanm
Contributor

thanm commented Nov 1, 2021

@ianlancetaylor @neild per owners.

@ianlancetaylor
Contributor

A request: please put plain text as plain text in comments. Please do not use images. Plain text is much easier to read, and permits cut and paste. Thanks.

@ianlancetaylor
Contributor

You can enable cgo when cross-compiling, but you must have a C cross-compiler. Set the environment variable CC to the cross-compiler and set CGO_ENABLED=1.

@ianlancetaylor
Contributor

Are you running the program on amd64 darwin?

@moloch--
Author

moloch-- commented Nov 2, 2021

Yes, darwin/amd64. I understand I can use a cross-compiler; however, I just didn't expect such a drastic difference in query times. Configuring a macOS cross-compiler can also be complicated due to the macOS SDK licensing, since the SDK is generally required for a functional macOS cross-compiler. (I'll also post text going forward; the original code is linked in the gist.)

@neild
Contributor

neild commented Nov 2, 2021

It takes Go 5+ seconds to resolve a domain.

Running your test program on my OS X laptop:

$ GODEBUG=netdns='go+9' ~/src/go2/bin/go run main.go google.com
go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(google.com) = files,dns
Query took: 54.871774ms
$ GODEBUG=netdns='9' ~/src/go2/bin/go run main.go google.com
go package net: using cgo DNS resolver
go package net: hostLookupOrder(google.com) = files,dns
Query took: 57.814102ms

I don't know why resolution is taking 5 seconds on your machine, but it's not fundamental to the Go resolver. The very round value of 5 seconds is quite suspicious.

@moloch--
Author

moloch-- commented Nov 2, 2021

Yes, it does seem environmental, but the oddity is that dig, the cgo resolver, etc. all still work normally.

@moloch--
Author

moloch-- commented Nov 2, 2021

Okay, I think I figured it out: the pure-Go resolver relies on /etc/resolv.conf, whereas the system resolver uses scutil or some other platform-specific mechanism. I had an entry for 1.1.1.1 (Cloudflare DNS) in my /etc/resolv.conf, but my router doesn't properly handle 1.1.1.1. The system resolver seems smart enough to figure out that this nameserver doesn't work well and uses the alternate DNS server instead. The pure-Go resolver doesn't figure this out and only fails over after a 5s timeout.

moloch-- closed this as completed Nov 2, 2021
@mvdan
Member

mvdan commented Nov 2, 2021

The five seconds look like some of the defaults we have in the net package:

timeout: 5 * time.Second,

What does your resolv.conf look like? If it has multiple entries, I wonder if one entry is hitting the 5s timeout, then the next entry correctly resolves.

@mvdan
Member

mvdan commented Nov 2, 2021

Oops, comment race. I still wonder if there's something to do in the net package. It's technically right, but taking five seconds to give a good result is not intuitive at all.

@mistydemeo
Contributor

@mvdan Sorry for the bump, but I identified this exact issue and was looking into whether it needed to be reported when I ran into this report. Should this be reopened? I've had to stop using Mac cross-compilation for some code that runs into this because of the five-second delay.

@mvdan
Member

mvdan commented Dec 8, 2021

@mistydemeo at least OP's problem seemed to stem from a bad resolv.conf - do you have a similar situation?

If the system resolver is faster or more reliable in these situations, I think it would be reasonable to reopen the issue as an enhancement.

@seankhliao
Member

The system resolver issues are #16345 and #12524.

@mvdan
Member

mvdan commented Dec 8, 2021

Thanks @seankhliao; it seems like #12524 is the proper fix.

@mistydemeo
Contributor

It's #12524. In my case, it's caused by a VPN-provided resolver that's not in /etc/resolv.conf.

A comment from 2018 indicated it wasn't considered a big deal because you can build for Darwin on Darwin, but cross-compiling between Intel and ARM64 has made this much easier to run into consistently, even when building for Darwin on Darwin.

@mvdan
Member

mvdan commented Dec 9, 2021

Indeed, and I don't think anyone is disputing that #12524 should happen. I believe the labels "help wanted" and "needs investigation" roughly mean "this requires some expertise and time, and we lack either at the moment".

@CompuRoot

It doesn't look like the slowness of Go's DNS resolver is specific to Darwin: today I ran into the same issue on a plain Linux machine.

All network tools, such as dig, wget and curl, resolve hosts in under 50ms on that machine.

This Go program, compiled with CGO_ENABLED=0 and --ldflags="-s -w":

package main

import (
    "fmt"
    "log"
    "net"
    "time"
)

func main() {
    start := time.Now()

    // With CGO_ENABLED=0 this goes through the pure-Go resolver, which
    // consults /etc/hosts as well as DNS.
    ips, err := net.LookupHost("localhost")
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("\nQuery time: %s", time.Since(start))
    fmt.Printf("\nIPs = %v\n\n", ips)
}

returned a query time of 1.9 to 2.2 seconds for localhost. After digging into the issue, I found out that the netgo DNS resolver uses not only /etc/resolv.conf but also /etc/hosts (which had around 800,000 lines and a total size of around 23 MB on the machine with slow DNS resolution).

After reducing the hosts file to just the following:

127.0.0.1   localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0     ip6-localnet
ff00::0     ip6-mcastprefix
ff02::1     ip6-allnodes
ff02::2     ip6-allrouters

everything got back to "normal", and Go's resolver resolves localhost in roughly a quarter of a millisecond.

It looks like Go reads the whole hosts file and normalizes it (the readHosts() function in go/src/net/hosts.go) before it starts evaluating DNS, which is why the first DNS query always takes far too long. Also, if the hosts file changes frequently, Go's resolver rereads the whole file, so the first DNS query after each update again takes a long time.

I believe this "issue" (a huge hosts file) affects all platforms, since cgo-free programs must process a big hosts file the same way.

I'm actually hesitant to call this behavior an issue unless someone figures out how it could be improved.

golang locked and limited conversation to collaborators Dec 5, 2023
10 participants