net: DNS timeouts can alter the IPv4/IPv6-ity of LookupIP #17448
Comments
I don't remember the code at the moment, but that sounds perhaps a bit hacky. I'd rather not overload the meanings of In any case, the Go 1.8 freeze is fast approaching. Is this something you want to work on in the next couple weeks? I think @mdempsky and I at least are racing to finish up other stuff. /cc @mikioh too. |
There are environments where real queries fail like this. I don't think I'd also like to see a summary of how the popular system resolvers behave. On Oct 15, 2016 03:17, "Brad Fitzpatrick" notifications@github.com wrote:
|
I think it's fine to make the newly added methods of Resolver in Go 1.8 more deterministic and configurable for people who want to use the Lookup APIs deterministically, such as telco operators, but I'm not keen to change the default "non-deterministic" behavior, because IP infrastructure consumers usually don't care about the details of the underlying machinery; they care only about the returned error value and result. Fortunately we've not yet seen reports like "exchanging multiple different RRs on multiple DNS transports is inefficient" or "exchanging multiple different RRs on a single DNS transport makes some L4-7 appliance malfunction". Maybe it's a good time to start improving the existing builtin DNS stub resolver, though I guess it requires some blueprint. |
I'm curious, do you have actual examples of this? I've heard of cases where AAAA queries return REFUSED or SERVFAIL, but no response at all is pretty harsh.
For The harm arises when applications use DNS responses in less-obvious ways. For example, imagine a connection algorithm that says:
Where the operation can only be restarted by killing the process. If the client is IPv4-only or IPv6-only, then it could get stuck connecting to a non-viable address family. Or, consider an application that generates a unique ID by querying DNS, selecting one result with a fixed IPv4 or IPv6 bias. DNS flakiness could alter the answer, when it would be more correct for an incomplete result to fail. I'm not trying to argue that the above are ideal designs, but they work 99.9...% of the time, and the principle of least astonishment would suggest adding more nines.
Can you suggest a list? Note that only resolvers exposing "Lookup A+AAAA" as a single operation are in scope for this issue. |
It's prevalent enough that Ubuntu documentation talks about it: https://help.ubuntu.com/community/WebBrowsingSlowIPv6IPv4 Also lots of people online talking about it
Well, the obvious starting set is Darwin, recent GNU libc, and Windows. |
I mentioned this previously, but glibc's behavior appears to be:
For example:
|
Alright, I spun up Windows Server 2016 in a VM, deleted the "disable IPv6" cruft, and forwarded DNS through my Linux VPS. Here's how
This... isn't helping my case, is it? |
I don't have a good way to test macOS, but we've established that my proposal would in fact be stricter than both the Linux/Windows system resolvers. I see that there is a But which should be the default for |
I am firmly in the camp that "tolerate broken DNS infrastructure" should remain the default. Or at least "do the same thing as the system resolver". I have a Mac to test on, but I'm not sure of the best way to block the AAAA queries. |
Here's how I managed to test Windows:
|
You can use PF, which nowadays ships by default on various BSD variants, including macOS (OS X). |
Can you please elaborate on I thought that you wanted stuff like https://tools.ietf.org/html/draft-wkumari-dnsop-multiple-responses or some experimental implementation as a solution to https://tools.ietf.org/html/draft-ietf-dnsop-no-response-issue for in-house tools. But it looks like you want more. I'm still not sure whether your proposal would work well in the wild, or whether DNS is the best tool to address the issues you are assuming.
I'm not sure it would help, because the handling of A and AAAA queries varies between getaddrinfo implementations. IIRC, in the early 2000s developers realized that simply following RFC 3484 doesn't solve IPv6-IPv4 fallback issues in the wild; see RFC 4074 and RFC 4472. As a result, some implementations check IP routing information/connectivity before querying to reduce unnecessary message exchange, some query the A record first, some experimental ones set a timeout for the AAAA query based on the RTT of a previously successful A query/response, and perhaps others do something similar or more complicated. Moreover, we can see AAAA filtering in the wild as a side effect of World IPv6 Day/Launch. I feel like adding a few control knobs is fine, but I'm not keen on changing the default behavior.
Agree. |
What I'll probably do is add a |
I'm a bit confused. What does this mean? I believe you are not talking about RRset because A and AAAA are different types. Also as defined in RFC 4472, DNS transports and DNS records are independent. I think it's impossible to determine what kind of correlation exists between different class/type RRs without retrieving extra information like I-Ds mentioned above. Am I missing something? |
I'm confused by your confusion. I'm describing a completely-ordinary DNS name, with A and AAAA records, whose values are not changing:
In this case, |
Well, [edit]
I wrote the above for another project and pasted it into this issue by mistake, but by synchronicity it looks related to this issue. |
Right, this change would only affect DNS transport failures, due to a timeout or socket error. Any situation where the DNS server actually responds (empty RR set, NXDOMAIN, lame delegation, SERVFAIL, REFUSED, etc.) would maintain the existing behavior. |
CL https://golang.org/cl/32572 mentions this issue. |
Further experience with b/31811300 has shown that (at least our own) resolvers can spuriously return This implies that the strict error handler should actually be watching for |
When using the non-cgo DNS resolver, the following program:
Outputs kernel.org's IPv4/IPv6 addresses, as expected:
However, if you blackhole `AAAA` responses, the output becomes IPv4-only:

And vice versa for `A` responses:

If you replace `INPUT` with `OUTPUT`, the timeouts become socket errors, and the same thing happens, but more quickly.

Essentially, the problem is that `LookupIP` presents the pair of queries as a single transaction, but internally, failures are non-atomic, so a flaky network can cause the function to report success, but only provide half of the addresses. The fix is to return a failure instead of a half-success.

At Google, there have been cases involving other resolvers (b/31811300) where network flakiness causes a client to make an incorrect address selection decision, leading to minor outages, confused engineers, and/or ugly workarounds. I haven't directly observed problems with Go's resolver in particular, but I'd like to preemptively fix this while it's still theoretical.
The relevant `dnsclient_unix.go` changes would be:

- In `tryOneName`, distinguish socket errors from DNS protocol errors. My best idea is to set `IsTimeout=true` for all socket errors, because `IsTemporary` is already used to indicate `SERVFAIL`.
- In `goLookupIPOrder`, cause any `Timeout()` error to abort the `nameList` loop, discard all addresses, and fail immediately.

(Interestingly, `getaddrinfo` also exhibits this problem, but asymmetrically: dropping `AAAA` replies causes a name to become IPv4-only, but dropping `A` replies returns an error.)