Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: make LookupCNAME consistent between Unix and Windows, document #50101

Closed
bradfitz opened this issue Dec 11, 2021 · 28 comments
Closed

net: make LookupCNAME consistent between Unix and Windows, document #50101

bradfitz opened this issue Dec 11, 2021 · 28 comments

Comments

@bradfitz
Copy link
Contributor

LookupCNAME is pretty weird right now.

Despite the name, it entirely ignores CNAME records on Unix. It launches A and AAAA record lookups to recursive resolvers and returns the first response name found in the A and AAAA, skipping over any CNAME. (and not even asking for a CNAME)

But it documents that it does that...

https://pkg.go.dev/net#LookupCNAME

A canonical name is the final name after following zero or more CNAME records. LookupCNAME does not return an error if host does not contain DNS "CNAME" records, as long as host resolves to address records.

OTOH, on Windows, it does what you would expect from the name itself: it looks up CNAME records:

func (*Resolver) lookupCNAME(ctx context.Context, name string) (string, error) {
        // TODO(bradfitz): finish ctx plumbing. Nothing currently depends on this.
        acquireThread()
        defer releaseThread()
        var r *syscall.DNSRecord
        e := syscall.DnsQuery(name, syscall.DNS_TYPE_CNAME, 0, nil, &r, nil)

Here's a demo of a program behaving differently:

func main() {
	txt, err := net.LookupTXT("cname-to-txt.go4.org")
	log.Printf("LookupTXT = %q, %v", txt, err)

	cname, err := net.LookupCNAME("cname-to-txt.go4.org")
	log.Printf("cname = %q, %v", cname, err)
}

On Linux/Mac:

2021/12/10 21:19:45 LookupTXT = ["foo=bar"], <nil>
2021/12/10 21:19:45 cname = "", lookup cname-to-txt.go4.org: no such host

On Windows:

2021/12/10 21:11:45 LookupTXT = ["foo=bar"], <nil>
2021/12/10 21:11:45 cname = "test-txt-record.go4.org.", <nil>

I like the Windows behavior better, FWIW. That's what I was looking for, but apparently it doesn't exist.

Can we either:

  1. add LookupCNAMERecord that actually looks up a CNAME record
  2. redefine LookupCNAME to be like Windows, perhaps adding a LookupCanonicalName with the current weird Unix behavior of LookupCNAME?

But at minimum: document whatever the rules are and make Unix and Windows match? At least in Resolver.PreferGo mode?

@rsc
Copy link
Contributor

rsc commented Dec 15, 2021

I think LookupCNAME came about that way because we are using getaddrinfo, and it returns the underlying name almost as a side effect of the lookup, in res.ai_canonname. Should we stop using glibc for this call and make it match Windows? It might make it fail where it was succeeding before? Not sure.

@rsc
Copy link
Contributor

rsc commented Dec 15, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Dec 15, 2021
@rsc
Copy link
Contributor

rsc commented Jan 5, 2022

The problem we have is that we need to use getaddrinfo(AI_CANONNAME) on Macs, because port 53 is blocked to non-libc code. Is there some way to make getaddrinfo succeed for hosts that have a CNAME record but for which that named host has no A/AAAA records? If not, it's very hard to implement the Windows LookupCNAME behavior on systems like Macs.

@rsc
Copy link
Contributor

rsc commented Jan 12, 2022

Looks like even though it's not in the man pages, macOS may have res_ninit. So maybe we should look into using that in place of getaddrinfo(AI_CANONNAME). If that's possible, then it would seem OK to change this.

The specific case being changed is when the name has a CNAME but no A/AAAA record, which is currently an error on Unix but succeeds on Windows. With this change, it would succeed everywhere in this (unusual) case.

Does anyone want to look into how hard it would be to make the Go code use libresolv on Mac? That might also help for things like MX lookups.

@rsc
Copy link
Contributor

rsc commented Jan 19, 2022

Sounds like we are still waiting for someone to check what can be done on the Mac.

@rsc
Copy link
Contributor

rsc commented Feb 2, 2022

@bradfitz says there is a CL in net that got rolled back that used libresolv directly. The only problem was it used res_init instead of res_ninit, because the latter is undocumented (but apparently present). The relevant issue is #12524. Perhaps someone wants to try resurrecting that CL using the thread-safe APIs? Also related: #16345 and #31705. The rollback was https://go.dev/cl/180843.

@rsc
Copy link
Contributor

rsc commented Feb 16, 2022

On hold for anyone who wants to try to implement the new behavior on Mac.

@rsc
Copy link
Contributor

rsc commented Feb 16, 2022

Placed on hold.
— rsc for the proposal review group

@rsc rsc moved this from Active to Hold in Proposals (old) Feb 16, 2022
@rsc
Copy link
Contributor

rsc commented Feb 20, 2022

Hacked up something that seems to work on macOS. I will clean it up and mail it out next week.

@rsc rsc self-assigned this Feb 20, 2022
@rsc rsc moved this from Hold to Active in Proposals (old) Feb 23, 2022
@rsc
Copy link
Contributor

rsc commented Feb 23, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc removed the Proposal-Hold label Mar 2, 2022
@rsc
Copy link
Contributor

rsc commented Mar 2, 2022

I haven't cleaned it up yet, but it certainly seems to work and it passes Brad's test TXT record.

Does anyone object to changing LookupCNAME to succeed for hosts with CNAME but not A records on Unix, like it already does on Windows?

@slrz
Copy link

slrz commented Mar 2, 2022

I've seen it used to get the local machine's FQDN, by passing the result of os.Hostname to LookupCNAME and letting the resolver worry about search domains and what not. That seems to be exactly the kind of thing that relies on LookupCNAME essentially behaving like gai's AI_CANONNAME.

@slrz
Copy link

slrz commented Mar 2, 2022

A quick search surfaces the package github.com/Showmax/go-fqdn which does just that. It falls back to other methods if LookupCNAME doesn't work but those rely on properly configured PTR records (or explicit /etc/hosts entries).

If we changed LookupCNAME to be about CNAME records, I would prefer to have the "canonical name" behaviour available through some other way.

@rsc
Copy link
Contributor

rsc commented Mar 9, 2022

Sorry, I don't understand. I had not realized AI_CANONNAME meant anything other than 'do a CNAME lookup'.

What is the circumstance in which AI_CANONNAME consults something other than CNAME records, and what does it consult?

@slrz
Copy link

slrz commented Mar 9, 2022

Invoking getaddrinfo for the name "host" with AI_CANONNAME in ai_flags and an otherwise zero hints structure (meaning AF_UNSPEC) does the following (at least on glibc):

  • consult /etc/resolv.conf to get name servers and search domains
  • for each configured search domain d, do A as well as AAAA queries for host.$d
  • return the names from sucessful responses to the caller in ai_canonname [edit: only the first name found is returned to the caller]

That's with "files dns" listed in nsswitch.conf. It's anyone's guess what it might do with other NSS modules.

The Linux man pages describe this behaviour as:

If hints.ai_flags includes the AI_CANONNAME flag, then the ai_canonname field of the first of the addrinfo structures in the returned list is set to point to the official name of the host.

I guess "official name" is meant to refer to the host's FQDN here.

@rsc
Copy link
Contributor

rsc commented Mar 16, 2022

@slrz OK, it sounds like the difference is for hosts with A/AAAA records but no CNAME, such as 'google.com'.
So LookupCNAME("google.com") = "google.com", nil, at least on Unix.
But probably on Windows LookupCNAME("google.com") is an error because it actually looks for CNAME records.
Is that the behavior difference you were getting at?

Normally we try very hard to avoid behavior changes, but in this case it is hard to see what would break given that Windows has never behaved the way Unix does, and we'd be making the Windows behavior the standard one.

@rsc
Copy link
Contributor

rsc commented Mar 23, 2022

There are two possible options.

  1. Make LookupCNAME everywhere match LookupCNAME on Windows today, which means it looks for CNAME records and that's it.
  2. Make LookupCNAME match a combination of Unix and Windows, where it looks for CNAME records but also falls back to returning the input string (like "google.com") for names that have A/AAAA records and no CNAME.

Given the rest of the API, with things like LookupMX, I think everyone expects LookupCNAME to mean (1) today. I certainly did. It seems like we should at least try to go down that path, and if we find out why we have to do (2), at least we'll know why.

Maybe we could get the code for (1) ready to land at the start of the Go 1.20 cycle and see how far we get?

Does anyone object to this? Or does anyone know why we must do (2)?

@slrz
Copy link

slrz commented Mar 26, 2022

@slrz OK, it sounds like the difference is for hosts with A/AAAA records but no CNAME, such as 'google.com'. So LookupCNAME("google.com") = "google.com", nil, at least on Unix. But probably on Windows LookupCNAME("google.com") is an error because it actually looks for CNAME records. Is that the behavior difference you were getting at?

Exactly, but more importantly also LookupCNAME("amsterdam") = "amsterdam.example.com.", nil when invoked on a system with example.com in its DNS search list.

Normally we try very hard to avoid behavior changes, but in this case it is hard to see what would break given that Windows has never behaved the way Unix does, and we'd be making the Windows behavior the standard one.

Besides the "go-fqdn" library linked above, I know of at least one program that will be broken by this change in behaviour. As it's a companion program to the server parts of a video conferencing system that only run on Linux anyway (BigBlueButton), the current Windows behaviour is not very relevant. Usage there is like go-fqdn or the LookupCNAME("amsterdam") example above: invoke LookupCNAME with the result of os.Hostname as part of an attempt to mimic hostname -f.

I'm not saying those are unfixable or that what they're doing is a good idea, just that they will break where they didn't before.

@slrz
Copy link

slrz commented Mar 26, 2022

  1. Make LookupCNAME match a combination of Unix and Windows, where it looks for CNAME records but also falls back to returning the input string (like "google.com") for names that have A/AAAA records and no CNAME.

Just returning the input string won't be sufficient: the callers are generally trying to mimic hostname -f so it's the DNS search list handling and consultation of host aliases from /etc/hosts that's required.

@rsc
Copy link
Contributor

rsc commented Mar 30, 2022

Exactly, but more importantly also LookupCNAME("amsterdam") = "amsterdam.example.com.", nil when invoked on a system with example.com in its DNS search list.

This definitely is something I don't want to break. Thanks.

If we want to make LookupCNAME the same as much as possible on Linux and Windows without breaking existing uses, it looks like we could do a CNAME lookup (including adding DNS search suffixes), and if that works we're done, and otherwise fall back to A/AAAA lookup, and if that works, return the name that had the record. On Linux this would mean adding the explicit CNAME lookup, and on Windows it would mean adding the A/AAAA fallback.

And then we could separately add a StrictCNAME bool field to net.Resolver to make Resolver.LookupCNAME mean only do the CNAME lookup.

Does that sound reasonable?

@rsc rsc moved this from Active to Likely Accept in Proposals (old) Apr 13, 2022
@rsc
Copy link
Contributor

rsc commented Apr 13, 2022

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Likely Accept to Accepted in Proposals (old) May 4, 2022
@rsc
Copy link
Contributor

rsc commented May 4, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: net: make LookupCNAME consistent between Unix and Windows, document net: make LookupCNAME consistent between Unix and Windows, document May 4, 2022
@rsc rsc modified the milestones: Proposal, Backlog May 4, 2022
@gopherbot
Copy link

Change https://go.dev/cl/446179 mentions this issue: net: unify CNAME handling across ports

@gopherbot
Copy link

Change https://go.dev/cl/451420 mentions this issue: doc/go1.20: add release notes for net package

gopherbot pushed a commit that referenced this issue Nov 18, 2022
For #50101
For #51152
For #53482
For #55301
For #56515

Change-Id: I11edeb4be0a7f80fb72fd7680a3407d081f83b8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/451420
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Damien Neil <dneil@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
@mateusz834
Copy link
Member

mateusz834 commented Nov 30, 2022

I looked at the implementation and it seems that when the DNS server responds with multiple CNAME like:

sth.go.dev. IN CNAME 1.sth.go.dev
1.sth.go.dev. IN CNAME 2.sth.go.dev

Then the LookupCNAME returns 1.sth.go.dev, not the 2.sth.go.dev. Is that the behaviour we want? Shouldn't we return the last one in the chain??

But when there will be an A/AAAA resource, then it will return 2.sth.go.dev.
Edit: probably it will return 1.sth.go.dev, so this is a backwards compatibility breaking change, previously it returned the 2.sth.go.dev.

sth.go.dev. IN CNAME 1.sth.go.dev
1.sth.go.dev. IN CNAME 2.sth.go.dev
2.sth.go.dev IN A 192.0.2.1

Edit: on windows the last one in the chain is returned

// returns the last CNAME in chain.
so this probably only affects Linux and MacOS

@mateusz834
Copy link
Member

mateusz834 commented Dec 1, 2022

@rsc @ianlancetaylor
This change breaks /etc/hosts aliases while using cgo mode on linux.

[mateusz@arch testaa]$ cat /etc/hosts
# Static table lookup for hostnames.
# See hosts(5) for details.
127.0.0.1 localhost.localdoman localhost
::1 localhost.localdomain localhost

::1  arch.localdomain arch
[mateusz@arch testaa]$ GODEBUG=netdns=cgo+2 ./main arch
go package net: confVal.netCgo = true  netGo = false
go package net: using cgo DNS resolver
go package net: hostLookupOrder(arch) = cgo
arch.
[mateusz@arch testaa]$ GODEBUG=netdns=go+2 ./main arch
go package net: confVal.netCgo = false  netGo = true
go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(arch) = files,dns
arch.localdomain.
[mateusz@arch testaa]$ cat main.go
package main

import (
        "fmt"
        "net"
        "os"
)

func main() {
        a, err := net.LookupCNAME(os.Args[1])
        if err != nil {
                fmt.Printf("ERROR: %v\n", err)
                os.Exit(1)
        }
        fmt.Println(a)
}

On unix systems we cannot only use the dns resolver.

@gopherbot
Copy link

Change https://go.dev/cl/454397 mentions this issue: net: use getaddrinfo while searching for cname on unix systems

@gopherbot
Copy link

Change https://go.dev/cl/455275 mentions this issue: net: rework the CNAME handling on unix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

6 participants