Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: builtin DNS stub resolver returns query results including the second best records unconditionally #11081

Closed
estesp opened this issue Jun 5, 2015 · 11 comments
Milestone

Comments

@estesp
Copy link

estesp commented Jun 5, 2015

  • Version: 1.4.2; simple net.LookupIP() program compiled both with -tags "netgo" and without
  • OS/Arch: Ubuntu Linux 14.04LTS, x64
  • Steps:
    1. may require special DNS server crafted, as to reproduce the problem you need a search <somedomain> in your /etc/resolv.conf and the nameserver in use needs to have a catch-all in the zone def'n responding with an AAAA record for all matches on *.somedomain.
    2. given the above, if you search for any hostname with a valid A (IPv4) record from any authoritative nameserver, but no AAAA (IPv6) record, with 1.4.2-netgo you will be returned both the valid IPv4 address as well as the catch-all IPv6 record for <somedomain>
  • Expected behavior:: The cgo resolver as well as pure libc-based (which you would expect as they should be one and the same) resolving only replies with the (correct) IPv4 A record for the hostname.
  • Current behavior:: Any Go program compiled with netgo will receive incorrect results from a DNS client lookup for this specific DNS server setup; for a real-world scenario see this issue in Docker, which is compiled statically with Go 1.4.2 with -tags "netgo": Go-lang "netgo" DNS resolver bug with catch-all DNS server entries moby/moby#10863

While I would like to say I have the exact set of changes in net/dnsclient_unix.go figured out as to the difference, it seems to hinge on two things--one is that the requests to the nameserver now come from two different connections (source ports) from the Go resolver (because of the go routines used in 1.4.2 that aren't used in cgo, or in 1.3.3 cgo or netgo), and secondly, the initial AAAA response is not found instead of the nameserver doing a forward-request to upstream (e.g. Google public DNS) which you see in the 1.3.3 trace below. Whichever is the exact root cause, the result is that the 1.4.2 netgo code decides it is time to append the search domain and try again for the AAAA record, which hits the "catch-all" set up in the nameserver, which then responds with the catch-all AAAA record for that domain, which means a client may decide to select that as the valid response, when the hostname really has no AAAA record, but has a valid A record which is also returned.

We can debate about the ugliness of catch-all entries in a DNS setup, but the root issue seems to be that netgo has different behavior in this particular scenario than cgo/libc and also from 1.3.3, not to mention we can't control whether certain users will have DNS configurations and backends beyond their control.

go 1.3.3 trace from a simple net.LookupIP("index.docker.io"):

10:25:33.922616 IP 172.17.0.22.57536 > 172.17.0.13.domain: 51477+ A? index.docker.io. (33)
10:25:33.922669 IP 172.17.0.22.57536 > 172.17.0.13.domain: 51477+ A? index.docker.io. (33)
10:25:33.922704 IP 172.17.0.22.57536 > 172.17.0.13.domain: 27449+ AAAA? index.docker.io. (33)
10:25:33.922715 IP 172.17.0.22.57536 > 172.17.0.13.domain: 27449+ AAAA? index.docker.io. (33)
10:25:33.923099 IP 172.17.0.13.62093 > google-public-dns-a.google.com.domain: 25323+ [1au] A? index.docker.io. (44)
10:25:33.923343 IP 172.17.0.13.29026 > google-public-dns-a.google.com.domain: 30220+ [1au] AAAA? index.docker.io. (44)
10:25:33.923918 IP 172.17.0.13.16392 > google-public-dns-a.google.com.domain: 17430+ [1au] NS? . (28)
10:25:33.948675 IP google-public-dns-a.google.com.domain > 172.17.0.13.62093: 25323 1/0/1 A 162.242.195.84 (60)
10:25:33.949118 IP 172.17.0.13.domain > 172.17.0.22.57536: 51477 1/13/0 A 162.242.195.84 (260)
10:25:33.954178 IP google-public-dns-a.google.com.domain > 172.17.0.13.29026: 30220 0/1/1 (114)

go 1.4.2-netgo trace from a simple net.LookupIP("index.docker.io"):

13:42:50.021716 IP 172.17.0.23.47708 > 172.17.0.13.domain: 54982+ A? index.docker.io. (33)
13:42:50.021791 IP 172.17.0.23.47708 > 172.17.0.13.domain: 54982+ A? index.docker.io. (33)
13:42:50.021925 IP 172.17.0.23.36219 > 172.17.0.13.domain: 64311+ AAAA? index.docker.io. (33)
13:42:50.021938 IP 172.17.0.23.36219 > 172.17.0.13.domain: 64311+ AAAA? index.docker.io. (33)
13:42:50.022036 IP 172.17.0.13.domain > 172.17.0.23.36219: 64311 0/1/0 (103)
13:42:50.022103 IP 172.17.0.13.domain > 172.17.0.23.36219: 64311 0/1/0 (103)
13:42:50.022337 IP 172.17.0.13.55684 > google-public-dns-b.google.com.domain: 22399+ [1au] A? index.docker.io. (44)
13:42:50.022583 IP 172.17.0.13.56478 > google-public-dns-b.google.com.domain: 7997+ [1au] NS? . (28)
13:42:50.022834 IP 172.17.0.23.55110 > 172.17.0.13.domain: 49279+ AAAA? index.docker.io.testdocker.org. (48)
13:42:50.022856 IP 172.17.0.23.55110 > 172.17.0.13.domain: 49279+ AAAA? index.docker.io.testdocker.org. (48)
13:42:50.023017 IP 172.17.0.13.domain > 172.17.0.23.55110: 49279* 1/2/2 AAAA 2002::2002:2002:1 (144)
13:42:50.023140 IP 172.17.0.13.domain > 172.17.0.23.55110: 49279* 1/2/2 AAAA 2002::2002:2002:1 (144)
13:42:50.053067 IP google-public-dns-b.google.com.domain > 172.17.0.13.55684: 22399 1/0/1 A 162.242.195.84 (60)
13:42:50.053494 IP 172.17.0.13.domain > 172.17.0.23.47708: 54982 1/13/0 A 162.242.195.84 (260)

Additional resources available--I have a container with the BIND setup shown above, as well as the simple program performing net.LookupIP(). Also you can see moby/moby#10863 for further discussion although fair warning that some of it is off-track as the problem was being narrowed down.

@bradfitz
Copy link
Contributor

bradfitz commented Jun 5, 2015

Please try to reproduce with Go tip (Go 1.5). No further work is planned for Go 1.4.

@estesp
Copy link
Author

estesp commented Jun 5, 2015

@bradfitz sure, I'll try with 1.5. My main goal is to make sure the bug is resolved if it still exists in 1.5, and the netgo resolver code still looks very similar to 1.4.2. I'll verify by testing with tip to make sure it hasn't been resolved in some way that isn't clear from a quick look at the code.

@bradfitz
Copy link
Contributor

bradfitz commented Jun 5, 2015

/cc @mikioh

@bradfitz bradfitz added this to the Go1.5Maybe milestone Jun 5, 2015
@estesp
Copy link
Author

estesp commented Jun 5, 2015

The problem still exists in tip/Go 1.5:

root@57a6b75c2d53:/# ./testdial-gotip-1.5.0-netgo index.docker.io
addr: 162.242.195.84
addr: 2002::2002:2002:1

@bradfitz
Copy link
Contributor

bradfitz commented Jun 5, 2015

Where is the testdial code?

@estesp
Copy link
Author

estesp commented Jun 5, 2015

https://gist.github.com/estesp/89ec461423e3d4e10ab4

package main

import (
    "fmt"
    "net"
    "os"
)

func main() {
    addrs, err := net.LookupIP(os.Args[1])
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }

    for _, addr := range addrs {
        fmt.Printf("addr: %s\n", addr)
    }
}

@mikioh mikioh changed the title net: 1.4.2 DNS client mishandles A/AAAA address lookup vs. libc/cgo/1.3.3-netgo net: query pipeline reordering happens in builtin DNS stub resolver with domain, search options Jun 6, 2015
@gopherbot
Copy link

CL https://golang.org/cl/10836 mentions this issue.

@mikioh mikioh changed the title net: query pipeline reordering happens in builtin DNS stub resolver with domain, search options net: builtin DNS stub resolver returns query results including the second best records unconditionally Jun 9, 2015
@gopherbot
Copy link

CL https://golang.org/cl/10931 mentions this issue.

@estesp
Copy link
Author

estesp commented Jul 8, 2015

@mikioh I see that Go 1.5 beta 1 is out--do we still think the resolver fix(es) can make Go 1.5? It would be very helpful to have proper IPv6/IPv4 response ordering and bug fixes in 1.5.

@mikioh
Copy link
Contributor

mikioh commented Jul 8, 2015

@estesp,

I don't know, sorry. The project owners will decide.

mikioh pushed a commit that referenced this issue Jul 13, 2015
This change does clean up as preparation for fixing #11081.

- renames cfg to resolvConf for clarification
- adds a new type resolverConfig and its methods: init, update,
  tryAcquireSema, releaseSema for mutual exclusion of resolv.conf data
- deflakes, simplifies tests for resolv.conf data; previously the tests
  sometimes left some garbage in the data

Change-Id: I277ced853fddc3791dde40ab54dbd5c78114b78c
Reviewed-on: https://go-review.googlesource.com/10931
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot
Copy link

CL https://golang.org/cl/12214 mentions this issue.

@mikioh mikioh closed this as completed in 6aa48a9 Jul 28, 2015
@golang golang locked and limited conversation to collaborators Aug 5, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants