net: LookupIP("doesnotexist.domain") returns "server misbehaving" when resolv.conf contains search lists #12712

Closed
mbenkmann opened this issue Sep 22, 2015 · 18 comments

@mbenkmann

The following test program returns "lookup doesnotexist.domain: no such host" when run with https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz but returns "lookup doesnotexist.domain on 172.16.2.203:53: server misbehaving" with https://storage.googleapis.com/golang/go1.5.linux-amd64.tar.gz and https://storage.googleapis.com/golang/go1.5.1.linux-amd64.tar.gz

package main

import (
	"fmt"
	"net"
)

func main() {
	// Look up a name that should not exist; only the returned error matters here.
	_, err := net.LookupIP("doesnotexist.domain")
	fmt.Println(err)
}

When I change resolv.conf to use nameserver 8.8.8.8 the output is correct. Apparently something has changed in Go 1.5 that prevents it from understanding the reply from our internal DNS server.

nslookup does not have a problem:

> nslookup doesnotexist.domain 172.16.2.203
Server:         172.16.2.203
Address:        172.16.2.203#53

** server can't find doesnotexist.domain: NXDOMAIN

> nslookup doesnotexist.domain 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8#53

** server can't find doesnotexist.domain: NXDOMAIN

dig also has no problem

> dig @172.16.2.203 doesnotexist.domain
; <<>> DiG 9.9.5-3ubuntu0.4-Ubuntu <<>> @172.16.2.203 doesnotexist.domain
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 37809
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;doesnotexist.domain.               IN      A

;; AUTHORITY SECTION:
.                   1200    IN      SOA     a.root-servers.net. nstld.verisign-grs.com. 2015092200 1800 900 604800 86400

;; Query time: 2 msec
;; SERVER: 172.16.2.203#53(172.16.2.203)
;; WHEN: Tue Sep 22 13:13:11 CEST 2015
;; MSG SIZE  rcvd: 123
@bradfitz
Contributor

Without more information or a way to reproduce it we won't be able to do much.

Can you capture the network traffic when you run the Go program, or dig?

From that we should be able to write a test, once we see what's arriving on the wire from your DNS server.

@bradfitz changed the title from net.LookupIP() returns "server misbehaving" to net: LookupIP returns "server misbehaving" with Go's DNS client on Sep 22, 2015
@mbenkmann
Author

If you can tell me how to make a local copy of Go's net package work, I can dump out the relevant data structures and maybe even find the issue myself. Unfortunately just copying /opt/go/src/net to my local directory and changing import "net" to import "../net" does not work. It says "imports internal/singleflight: use of internal package not allowed". Copying /opt/go/src/internal, too, does not help. Copying "net" and "internal" to $GOPATH/src does not work (without changing import "net") either. They are simply ignored and the global system version keeps being used.

@bradfitz
Contributor

You won't be able to do that. Just build Go from source and modify the net package.

Run "./src/make.bash" from the "src" directory to recompile Go. It takes 30-60 seconds to build everything.

@mbenkmann
Author

I've found the problem. My resolv.conf has a line "search foo bar" for 2 subdomains foo and bar. For whatever reason queries to anything.bar are answered with error code 2 SERVFAIL. LookupIP("doesnotexist.domain") apparently uses the search field in resolv.conf and queries "doesnotexist.domain", "doesnotexist.domain.foo" and "doesnotexist.domain.bar" in sequence and reports the LAST ERROR CODE returned, which happens to be SERVFAIL. If the order in resolv.conf is changed to "search bar foo", LookupIP reports the NXDOMAIN error (no such host).

While it is apparent that the DNS server is misbehaving, the current LookupIP() implementation is suboptimal, because:

  • the error code returned depends on the ordering in the search line of resolv.conf
  • the behaviour of Go 1.5 is different from Go 1.4
  • the error code reported does not relate to the actual query, which is not what the programmer expects, especially not in a case where the host queried contains a ".". I'm not even sure if it is proper to append the search subdomains to names that contain a ".".

In any case, I would change LookupIP() so that if the DNS server returns different error codes for the different queries made due to "search", LookupIP() always returns the error code returned for the query with the actual argument passed to LookupIP(). This is the only query that is guaranteed to happen and therefore the only code that can be expected to be consistent.
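
To make the difference between the two policies concrete, here is a minimal sketch. It does not touch the real resolver: fakeQuery, candidates, lastError and originalError are hypothetical names invented for illustration, and fakeQuery only imitates a server that answers SERVFAIL for anything under "bar" and NXDOMAIN for everything else. The query order (name as given first, then one query per search entry) is the order described above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical stand-ins for the two outcomes discussed in this thread.
var (
	errNoSuchHost        = errors.New("no such host")
	errServerMisbehaving = errors.New("server misbehaving")
)

// fakeQuery imitates the DNS server described above: any name under the
// "bar" search domain gets SERVFAIL, everything else gets NXDOMAIN.
func fakeQuery(name string) error {
	if strings.HasSuffix(name, ".bar") {
		return errServerMisbehaving
	}
	return errNoSuchHost
}

// candidates lists the names tried: the name as given, then one per search entry.
func candidates(name string, search []string) []string {
	names := []string{name}
	for _, s := range search {
		names = append(names, name+"."+s)
	}
	return names
}

// lastError reports the error of the final failed query (the behaviour reported here).
func lastError(name string, search []string, query func(string) error) error {
	var err error
	for _, n := range candidates(name, search) {
		if err = query(n); err == nil {
			return nil
		}
	}
	return err
}

// originalError reports the error for the name exactly as passed in (the behaviour proposed here).
func originalError(name string, search []string, query func(string) error) error {
	var first error
	for i, n := range candidates(name, search) {
		err := query(n)
		if err == nil {
			return nil
		}
		if i == 0 {
			first = err
		}
	}
	return first
}

func main() {
	for _, search := range [][]string{{"foo", "bar"}, {"bar", "foo"}} {
		fmt.Println(search, "last:", lastError("doesnotexist.domain", search, fakeQuery))
		fmt.Println(search, "original:", originalError("doesnotexist.domain", search, fakeQuery))
	}
}
```

With the last-error policy the result flips when the search entries are reordered; with the original-name policy it stays "no such host" for both orderings.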

@bradfitz added this to the Go1.6 milestone on Sep 24, 2015
@bradfitz
Contributor

"the behaviour of Go 1.5 is different from Go 1.4"

Go 1.5 uses its pure Go resolver by default for DNS lookups, and only falls back to libc's resolver for special cases. See https://golang.org/doc/go1.5#net for details.

So, it's not surprising that the Go DNS resolver has some rough edges. It's being exercised a lot more than it has in the past.

/cc @mikioh

@mikioh
Contributor

mikioh commented Sep 24, 2015

FWIW, IIRC, troubleshooting tools such as dig and/or drill don't use the search list in resolv.conf for super (not sub) domains by default.

It looks like you have some ideas about LookupIP. I'm not the original designer of LookupIP, but I guess he wanted to make LookupIP easier for name-to-address mapping traversal. If you are thinking of a new API for some purpose (sorry, we cannot change the behavior of LookupIP because it works as a stub resolver for helping Dial), please follow the procedure: https://github.com/golang/proposal#readme

Thank you.

@mikioh closed this as completed on Sep 24, 2015
@mikioh changed the title from net: LookupIP returns "server misbehaving" with Go's DNS client to net: LookupIP("doesnotexist.domain") returns "server misbehaving" when resolv.conf contains search lists on Sep 24, 2015
@bradfitz
Contributor

Why did you close this bug?

@bradfitz reopened this on Sep 24, 2015
@mikioh
Contributor

mikioh commented Sep 24, 2015

PS: If you need more low-level control for DNS, there are external packages. For example, http://godoc.org/github.com/miekg/dns.

@mikioh
Contributor

mikioh commented Sep 24, 2015

My hand just slipped.

@bradfitz
Contributor

I don't think @mbenkmann wanted low-level control of DNS. I interpreted this bug as Go's native DNS resolver just misbehaving compared to libc.

@mbenkmann
Author

To repeat, here is what I (and I would assume most application programmers) expect from LookupIP()

  • consistent behavior. Ordering of entries in configuration files should not matter.
  • the correct answer. The host "doesnotexist.domain" does not exist. I want that answer. That the DNS server has a corrupt database for a subdomain "bar" unrelated to my query is no excuse for giving the wrong answer to my query. I'm an application programmer. I support my application. I don't want support requests from users with the error description "Your application says my DNS server is misbehaving. But it's working fine with every other application."
  • consistency across Go versions. In the same environment Go 1.4 gives a different answer than 1.5. BAD!
  • if there are multiple different errors that LookupIP() could return (in this case "host not found" and "server misbehaving") I want the error most closely related to MY CODE, because that's what I can debug. MY CODE requests resolving of "doesnotexist.domain" and the appropriate error is "host not found". That LookupIP() performs some additional queries behind the scenes to try extra hard to give me a non-error reply is nice, but if that behind-the-scenes code fails, I'm not interested in its error code if there's an error code available directly related to my code.

@mbenkmann
Author

I'm repeating myself but I really need to drive this point home: My application, like many, resolves host names provided by users. Users make mistakes, especially typos. When LookupIP() fails I write the error into a log file and users look at that log file. It makes a HUGE difference if that log file says "host not found" or "server misbehaving". It's the difference between allowing the user to quickly realize he's made a typo and fix the problem himself or forcing the user to file a support request with me that will lead to a fruitless discussion unpleasant to both sides about a DNS server and system configuration that neither I nor the user have under our control.

@bradfitz
Contributor

I totally agree.

A packet capture would be helpful for debugging and writing a regression test.

@mbenkmann
Author

There's nothing special about the server's reply. It's just an error code 2 SERVFAIL which triggers the "server misbehaving" branch in net/dnsclient.go:answer(). The answer() function is called for each of the queries attempted during lookup (i.e. for each entry in search) and as currently implemented the last error propagates up to be the return from LookupIP(). So for regression testing, just set up your test DNS server to reply SERVFAIL for each query that ends in a certain subdomain and list that subdomain in resolv.conf's search line. Once this issue is fixed, it should no longer matter if the broken subdomain is listed first or last and unless the argument to LookupIP() explicitly includes the broken subdomain, the SERVFAIL should never propagate up.
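
As a rough sketch of the mapping described above (not the actual net package source): the response code in the DNS header decides which error comes out. The constant names and errorForRcode are hypothetical; the numeric values are the standard RCODE values from RFC 1035. The sketch deliberately ignores any other header flags the real code may look at.

```go
package main

import "fmt"

// Response codes from the DNS message header (RFC 1035, section 4.1.1).
const (
	rcodeSuccess       = 0
	rcodeFormatError   = 1
	rcodeServerFailure = 2 // SERVFAIL
	rcodeNameError     = 3 // NXDOMAIN
)

// errorForRcode is a hypothetical, simplified stand-in for the rcode check:
// success yields no error, a name error means the host does not exist, and
// every other code is treated as the server misbehaving.
func errorForRcode(rcode int) string {
	switch rcode {
	case rcodeSuccess:
		return ""
	case rcodeNameError:
		return "no such host"
	default:
		return "server misbehaving"
	}
}

func main() {
	for _, rc := range []int{rcodeSuccess, rcodeFormatError, rcodeServerFailure, rcodeNameError} {
		fmt.Printf("rcode %d -> %q\n", rc, errorForRcode(rc))
	}
}
```

A test server only needs to set rcode 2 for names under the broken subdomain and rcode 3 for everything else to reproduce the two messages seen in this issue.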

@j16sdiz

j16sdiz commented Oct 14, 2015

Related to #12778?

@danp
Contributor

danp commented Nov 13, 2015

Similar to #12778 I took a quick look at glibc to see how it handles this. If I'm reading/interpreting correctly, there is a special case for SERVFAIL when trying names with search domains appended causing it to try the next search domain. Should Go's stub resolver do the same?

@danp
Contributor

danp commented Nov 16, 2015

I did some more digging and was able to repro this using a DNS server built with github.com/miekg/dns.

Instead of the current capturing and returning of the last seen error it probably would make sense to prefer the error encountered when looking up the name closest to what was passed in to LookupIP and friends.

This would mostly mimic what glibc does (here and here), though it only returns the error encountered when looking up the provided name if it has enough dots. Otherwise it will return the last error encountered while trying names with search domains appended, similar to what the Go implementation is doing now.

I'm happy to put a CL together if preference for errors close to user input sounds like a good plan.

@gopherbot

CL https://golang.org/cl/16953 mentions this issue.
