net: parallel A/AAAA queries are mishandled by broken DNS servers #35521
Comments
Fedora made Go 1.13.4 available. The error is still present.
This error message is coming from net, not http2. Retitled. @albrro, can you provide some evidence showing that your DNS actually works? From this bug report as it stands, I'm tempted to just say "your DNS is broken". What does Also, try running with
Also interesting would be /etc/resolv.conf and /etc/nsswitch.conf
Apart from the Go programs on my laptop, nothing else has any DNS problems. I will upload some packet captures shortly.
$ dig @192.168.01 registry.fedoraproject.org
; <<>> DiG 9.11.11-RedHat-9.11.11-1.fc31 <<>> @192.168.01 registry.fedoraproject.org
;; OPT PSEUDOSECTION:
;; ANSWER SECTION:
;; Query time: 91 msec
$ GODEBUG=netdns=2 go run /tmp/foo.go
Body:HTTP/2.0 200 OK
Content-Type: application/json; charset=utf-8
{}
$ go run /tmp/foo.go
Ugh. I inserted the containing the name service info in the middle of the previous comment.
Please run it a few times to see if it's just flaky and sometimes able to actually hit your DNS. Also try
Done. The file go.test1.txt is the result of this script:
The pings are to separate each invocation. The packets are padded with a unique payload to distinguish them. While the script was running I captured the packets. You can see ping resolving its target and the Go program also using DNS every time.
@mdempsky, you like DNS, right? :) Any idea why Go's resolver times out here, but cgo works, and dig works?
Looking.
Same program, no changes. Also, to make it clear, my system has ZERO problems with DNS ... except for the test program and other Go programs like Toolbox. I thought it interesting that the invocations that time out still generated normal (ish?) DNS traffic.
Whatever DNS software is running on 192.168.0.1 seems broken. If you look in the pcap file, you'll see a bunch of lines like this:
This corresponds to one of the invocations where the Go DNS client is trying to resolve registry.fedoraproject.org. It's sending two concurrent DNS requests: one for AAAA records from port 58863 with TXID 31317, and one for A records from port 47056 with TXID 34616. But what's happening here is the AAAA record response (i.e., for the first question) is being sent to port 47056 with TXID 34616 (i.e., for the second question). This is very wrong, and possibly a security issue. The reason using the cgo resolver works here is that when the parallel queries fail, it retries them sequentially:
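To make the mismatch concrete, here is a minimal Go sketch using the ports and TXIDs quoted above. The `query` and `response` types and the `accepts` function are hypothetical simplifications for illustration, not the actual code of Go's resolver or of glibc. The sketch assumes a client that checks the destination port, the transaction ID, and the echoed question type before accepting a reply.

```go
package main

import "fmt"

// query is a hypothetical, simplified view of one outstanding DNS question.
type query struct {
	srcPort int    // local UDP port the question was sent from
	txid    uint16 // DNS transaction ID chosen by the client
	qtype   string // record type asked for: "A" or "AAAA"
}

// response is a hypothetical, simplified view of one packet from the server.
type response struct {
	dstPort int    // local UDP port the packet arrived on
	txid    uint16 // transaction ID echoed by the server
	qtype   string // record type of the echoed question
}

// accepts reports whether r can be taken as the answer to q: it must arrive
// on the socket the question was sent from, carry the same transaction ID,
// and echo the same question type.
func accepts(q query, r response) bool {
	return r.dstPort == q.srcPort && r.txid == q.txid && r.qtype == q.qtype
}

func main() {
	// The two concurrent questions described in the comment above.
	aaaa := query{srcPort: 58863, txid: 31317, qtype: "AAAA"}
	a := query{srcPort: 47056, txid: 34616, qtype: "A"}

	// What the broken server sent: the AAAA answer, addressed to the
	// port and TXID of the A question.
	bogus := response{dstPort: 47056, txid: 34616, qtype: "AAAA"}

	fmt.Println("AAAA query accepts it:", accepts(aaaa, bogus))
	fmt.Println("A query accepts it:", accepts(a, bogus))
}
```

Under these assumptions neither outstanding question can accept the packet: it never reaches the AAAA query's socket, and it answers the wrong question on the A query's socket. That is consistent with the i/o timeout in the original report.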
If you add "options single-request" to your /etc/resolv.conf file, it should work. But really, the DNS server is broken and should be fixed.
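For reference, a minimal /etc/resolv.conf with that option might look like the fragment below. The nameserver address is the one from the error message in the original report. On systems where the file is generated by NetworkManager or a similar tool (an assumption about this setup), a manual edit may be overwritten.

```
nameserver 192.168.0.1
options single-request
```

With this option the A and AAAA lookups are sent one after the other instead of in parallel, so the server never has two outstanding questions to mix up.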
Well, I have a no-name router in what it calls "bridge mode" to jump the gap all the way to where my laptop is. It is buggy. What if I make /etc/resolv.conf go directly to 1.1.1.1? Looks like the issue has a workaround (or three). So it seems that the libc resolver, like you say, is more resilient to buggy servers. Thanks!
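A per-program variant of the same idea, for cases where editing /etc/resolv.conf is not an option, is to give the Go program its own resolver that talks to a known-good server. This is a sketch, not something proposed in this thread: `newResolver` is a hypothetical helper, and the 5 and 10 second timeouts are arbitrary. It relies only on the standard `net.Resolver` fields `PreferGo` and `Dial`.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newResolver returns a resolver that uses Go's built-in DNS client and sends
// every query to server (host:port), ignoring the nameserver addresses that
// would otherwise come from /etc/resolv.conf.
func newResolver(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			// Discard the address Go picked and dial the chosen server.
			return d.DialContext(ctx, network, server)
		},
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	r := newResolver("1.1.1.1:53")
	addrs, err := r.LookupHost(ctx, "registry.fedoraproject.org")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```

This does not make the queries sequential. It only avoids the buggy router, in the same way that pointing /etc/resolv.conf at 1.1.1.1 does.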
Tried again making my "real" router (192.168.1.1) the default in resolv.conf and there is no problem. So long as I avoid relying on the buggy router, all is well.
Glad to hear that both 1.1.1.1 and 192.168.1.1 work okay. Would you mind testing my hypothesis that "options single-request" should fix it too when using the problematic router?
Yes, the single request option works. It also crashes my so-called "router". Its MAC address info is not even in the MAC vendors' DB. It is made by Xinwei. Should I delete that?
Can you elaborate what you mean by this? How does it work, but also crash?
The option allows the Go test program to function. Meanwhile, perhaps because of what we've been doing, the router lost its connection to the main router, lost all of its IP info, and just lies there. It's not the first time this has happened, but I've never found what triggers it. Perhaps it runs out of memory?
I see. That seems odd, but consistent with the broken DNS server behavior. It sounds like you have a few workarounds here, so I'm going to close the issue. Thanks for your help looking into it.
What version of Go are you using (go version)?
This is the latest available from the Fedora repository. Everything is "standard issue".
Does this issue reproduce with the latest release?
unknown
What operating system and processor architecture are you using (go env)?
Fedora 31 Workstation, fresh install
What did you do?
Fedora Toolbox (a container management program) failed to download images, complaining that it was unable to get DNS responses. I searched and found issue 1948 for libpod on its GitHub project page (libpod is a tool that Toolbox uses), and there I found a simple program that was suggested for testing. That issue is closed, and the bug was difficult to reproduce.
The program in its entirety:
This small test program failed with the same error, so I thought a better place to report it would be here.
There is no problem with name resolution on the laptop. Please note that normal DNS query and response traffic is generated yet it does not reach the program.
What did you expect to see?
some http data
What did you see instead?
$ go run /tmp/foo.go
Error: Get https://registry.fedoraproject.org/v2/: dial tcp: lookup registry.fedoraproject.org on 192.168.0.1:53: read udp 192.168.0.3:37044->192.168.0.1:53: i/o timeout