net: pure Go dns does not handle UDP packets > 512 octets #13561

Closed
waddles opened this issue Dec 10, 2015 · 6 comments

Comments

@waddles

waddles commented Dec 10, 2015

I have found a bug in the pure Go resolver that causes Dial to fail even though the server returns correct responses. The problem surfaced as a failure when searching for docker images in docker 1.9.1 built on go1.4.3:

DEBU[0003] Calling GET /v1.21/images/search
INFO[0003] GET /v1.21/images/search?term=phusion
DEBU[0003] hostDir: /etc/docker/certs.d/docker.io
DEBU[0003] pinging registry endpoint https://index.docker.io/v1/
DEBU[0003] attempting v1 ping for registry endpoint https://index.docker.io/v1/
DEBU[0003] Index server: https://index.docker.io/v1/
ERRO[0003] Handler for GET /v1.21/images/search returned error: Get https://index.docker.io/v1/search?q=phusion: dial tcp: lookup index.docker.io on x.x.x.x:53: no such host
ERRO[0003] HTTP Error                                    err=Get https://index.docker.io/v1/search?q=phusion: dial tcp: lookup index.docker.io on x.x.x.x:53: no such host statusCode=404

Similar issues are #12712 and #12778

I have tested go1.3.0 and it did not exhibit this problem, but at least 1.4.3, 1.5.1 and 1.5.2 are affected.

Tested using the following code:

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "os"
)

func main() {
    resp, err := http.Get(os.Args[1])
    if err != nil {
        log.Fatal(err)
    }
    page, err := ioutil.ReadAll(resp.Body)
    resp.Body.Close()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s", page)
}

Note: we run only an IPv4 stack. IPv6 is disabled on the client.

Test fails with pure Go resolver

# /tmp/fetch-1.5.2 https://index.docker.io/v1/search?q=phusion
2015/12/10 12:17:42 Get https://index.docker.io/v1/search?q=phusion: dial tcp: lookup index.docker.io on x.x.x.x:53: no such host
# docker search phusion
Error response from daemon: Get https://index.docker.io/v1/search?q=phusion: dial tcp: lookup index.docker.io on x.x.x.x:53: no such host

DNS packet capture: PureGO_correct_but_fails.txt

Here we see the AAAA response contains only CNAME RRs but no AAAA RRs. The resolver queries all 4 name servers and all 4 responses contain CNAME and A RRs which the resolver ignores. It then searches using the search domain (from /etc/resolv.conf) appended, finally determining that the host cannot be found.

Test succeeds when forced to use cGo resolver

(By setting LOCALDOMAIN in the environment, as described in the docs)

# LOCALDOMAIN= /tmp/fetch-1.5.2 https://index.docker.io/v1/search?q=phusion
{"num_pages": 11, "num_results": 257, "results": [{"is_automated": true, ...

DNS packet capture: CGO_correct_but_succeeds.txt

Here we see the same AAAA and A RRs are returned but this time the resolver accepts the A records and the dial call succeeds.

Summary

The name server returns correct responses for both A and AAAA queries but the pure Go resolver ignores the A response records when the AAAA response contains no RRs.

I would have also expected that since IPv6 is disabled, querying for AAAA records is redundant, but the C library does it too. That may be because many hosts will have both stacks enabled but only have IPv4 routing configured so querying AAAA may be valid and successful but will require an A query anyway. Unfortunately that means the DNS server's load gets doubled for every address resolution.

Recommendations

  • Review section 3. Expected Behaviour of RFC4074
  • Enable AAAA queries only if the IPv6 stack is enabled
@mikioh
Contributor

mikioh commented Dec 10, 2015

Can you please try with tip? I guess that the fixes for #12778 and #13090 mitigate this. I just tried tip on linux as follows:

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1

export GODEBUG=netdns=go+2
/tmp/fetch https://index.docker.io/v1/search?q=phusion
go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(index.docker.io) = files,dns
{"num_pages": 11, "num_results": 257,  ...}

export GODEBUG=netdns=cgo+2
/tmp/fetch https://index.docker.io/v1/search?q=phusion
go package net: using cgo DNS resolver
go package net: hostLookupOrder(index.docker.io) = cgo
{"num_pages": 11, "num_results": 257, ...}

@mikioh mikioh added this to the Go1.6 milestone Dec 10, 2015
@waddles
Author

waddles commented Dec 10, 2015

I am still seeing it with tip

go version devel +e05b48e Thu Dec 10 08:04:07 2015 +0000 linux/amd64

export GODEBUG=netdns=go+2
go run fetch.go https://index.docker.io/v1/search?q=phusion
go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(index.docker.io) = files,dns
2015/12/11 00:44:23 Get https://index.docker.io/v1/search?q=phusion: dial tcp: lookup index.docker.io on x.x.x.x:53: no such host
exit status 1

export GODEBUG=netdns=cgo+2
go run fetch.go https://index.docker.io/v1/search?q=phusion
go package net: using cgo DNS resolver
go package net: hostLookupOrder(index.docker.io) = cgo
{"num_pages": 11, "num_results": 259, ...}

I discovered that changing the nameservers in /etc/resolv.conf solves the problem - at least I tested a few other public name servers and found no problem.

After careful inspection of the response packets from Google's nameservers and ours, I can see no difference between them other than that Google's servers (and BIND at least) use pointers to compress the response packet. When the parser sees a length octet in a name where the 2 high bits are set (ie the 16-bit value is 0xC000 or higher), it treats the remaining 14 bits as an offset from the start of the message and reads the rest of the name from that earlier position. It is defined in RFC1035 section 4.1.4 Message Compression and described in O'Reilly's DNS and BIND 4th edition, chapter 15.2.3.
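
For illustration, pointer handling as defined in RFC1035 section 4.1.4 might look something like the sketch below. It is not the code in net/dnsmsg.go: readName, the hop limit of 10, and the hand-built message in main are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// readName decodes the domain name that starts at offset off in msg,
// following compression pointers. It returns the dotted name and the
// offset just past the name at its original position.
// Hypothetical helper written for this example.
func readName(msg []byte, off int) (string, int, error) {
	var labels []string
	next := -1 // where parsing resumes; fixed when the first pointer is followed
	hops := 0
	for {
		if off >= len(msg) {
			return "", 0, errors.New("name runs past end of message")
		}
		c := int(msg[off])
		switch c & 0xC0 {
		case 0x00: // ordinary label: a length octet, then that many bytes
			if c == 0 { // zero length ends the name
				if next < 0 {
					next = off + 1
				}
				return strings.Join(labels, ".") + ".", next, nil
			}
			off++
			if off+c > len(msg) {
				return "", 0, errors.New("label runs past end of message")
			}
			labels = append(labels, string(msg[off:off+c]))
			off += c
		case 0xC0: // pointer: low 14 bits are an offset from the start of msg
			if off+1 >= len(msg) {
				return "", 0, errors.New("truncated pointer")
			}
			if next < 0 {
				next = off + 2
			}
			off = (c&0x3F)<<8 | int(msg[off+1])
			hops++
			if hops > 10 { // arbitrary limit to stop pointer loops
				return "", 0, errors.New("too many compression pointers")
			}
		default:
			return "", 0, errors.New("reserved label type")
		}
	}
}

func main() {
	msg := make([]byte, 12) // zeroed 12-octet header, enough for the example
	// Offset 12: index.docker.io written out in full.
	msg = append(msg, "\x05index\x06docker\x02io\x00"...)
	// Offset 29: "www" followed by a pointer to offset 18 (the "docker" label).
	msg = append(msg, "\x03www\xC0\x12"...)

	full, _, err := readName(msg, 12)
	fmt.Println(full, err)
	compressed, next, err := readName(msg, 29)
	fmt.Println(compressed, next, err)
}
```

A server that never emits pointers simply never reaches the 0xC0 case, and the uncompressed response ends up larger.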

Our nameservers do not use this method of compression and instead include the full label for every RR. I will need to do some more research on that. I still think the problem is in Go but may not be exactly as described in the initial post.

@mikioh
Contributor

mikioh commented Dec 11, 2015

Feel free to correct the description and send a patch including a test case (see https://github.com/golang/go/wiki#contributing-to-the-go-project) if you see a bug in https://github.com/golang/go/blob/master/src/net/dnsmsg.go#L452. It's likely because the builtin DNS stub resolver was designed as an auxiliary resolver and became the primary one in Go 1.5.

@waddles waddles changed the title from "net: Go dns queries both AAAA and A but ignores A when no AAAA are returned" to "net: pure Go dns does not handle UDP packets > 512 octets" Dec 11, 2015
@waddles
Author

waddles commented Dec 11, 2015

Well we managed to work around our test case by enabling compression on our name servers which are built with https://github.com/miekg/dns - an awesome Go DNS library.

However, I'm pretty certain that the pure Go resolver does not implement RFC6891 which would allow UDP packet sizes > 512 octets to be requested and handled.
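
To make the RFC6891 point concrete, here is a rough sketch of what advertising a larger UDP payload size involves on the wire. It is not taken from package net or from miekg/dns. buildQuery, truncated, and the 4096 value are assumptions for the example, and buildQuery expects a non-empty name with labels of at most 63 octets.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

const (
	typeA   = 1
	typeOPT = 41     // OPT pseudo-RR type from RFC6891
	classIN = 1
	flagRD  = 0x0100 // recursion desired
	flagTC  = 0x0200 // truncated: the answer did not fit in the UDP reply
)

func appendUint16(b []byte, v uint16) []byte {
	return append(b, byte(v>>8), byte(v))
}

// buildQuery builds a one-question DNS query. If udpSize is non-zero an
// OPT pseudo-RR advertising that UDP payload size is added to the
// additional section. Hypothetical helper written for this example.
func buildQuery(id uint16, name string, qtype, udpSize uint16) []byte {
	msg := make([]byte, 12, 64)
	binary.BigEndian.PutUint16(msg[0:], id)
	binary.BigEndian.PutUint16(msg[2:], flagRD)
	binary.BigEndian.PutUint16(msg[4:], 1) // QDCOUNT
	if udpSize > 0 {
		binary.BigEndian.PutUint16(msg[10:], 1) // ARCOUNT
	}
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		msg = append(msg, byte(len(label)))
		msg = append(msg, label...)
	}
	msg = append(msg, 0) // root label ends the question name
	msg = appendUint16(msg, qtype)
	msg = appendUint16(msg, classIN)
	if udpSize > 0 {
		msg = append(msg, 0)              // owner name: root
		msg = appendUint16(msg, typeOPT)  // TYPE
		msg = appendUint16(msg, udpSize)  // CLASS field carries the payload size
		msg = append(msg, 0, 0, 0, 0)     // TTL field: extended RCODE, version 0, flags
		msg = appendUint16(msg, 0)        // RDLENGTH: no options
	}
	return msg
}

// truncated reports whether a response has the TC bit set, in which case
// a resolver would normally retry the query over TCP.
func truncated(resp []byte) bool {
	return len(resp) >= 4 && binary.BigEndian.Uint16(resp[2:])&flagTC != 0
}

func main() {
	plain := buildQuery(1, "index.docker.io", typeA, 0)
	edns := buildQuery(1, "index.docker.io", typeA, 4096)
	fmt.Println("plain query:", len(plain), "octets")
	fmt.Println("EDNS0 query:", len(edns), "octets")
}
```

Without the OPT record a server must keep its UDP reply within 512 octets and set the TC bit when the answer does not fit, so a resolver that neither sends OPT nor retries over TCP loses the answer.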

@mikioh mikioh modified the milestones: Unplanned, Go1.6 Dec 11, 2015
@mikioh
Contributor

mikioh commented Dec 11, 2015

See #13356. Also IETF dnsops guys will update RFC 5966 soon; https://tools.ietf.org/html/draft-ietf-dnsop-5966bis.

@mikioh
Contributor

mikioh commented Dec 11, 2015

This is a duplicate of #6464.

@mikioh mikioh closed this as completed Dec 11, 2015
@golang golang locked and limited conversation to collaborators Dec 14, 2016