net: revisit unconditional use of cgo lookups for darwin #16345

Closed
danp opened this issue Jul 13, 2016 · 34 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@danp
Contributor

danp commented Jul 13, 2016

https://golang.org/cl/8945 changed to using the native Go stub resolver for most systems. Darwin was excluded, here, due to firewall warnings.

Is this still an issue?

On my 10.11 system with the firewall enabled this test program doesn't produce any warnings, even with cgo disabled:

package main

import (
    "fmt"
    "net"
)

func main() {
    ns, err := net.LookupHost("www.google.com")
    if err != nil {
        panic(err)
    }
    fmt.Println(ns)
}
% CGO_ENABLED=0 GODEBUG=netdns=2 go run main.go
go package net: built with netgo build tag; using Go's DNS resolver
go package net: hostLookupOrder(www.google.com) = files,dns
[216.58.192.164 2607:f8b0:4009:80e::2004]

(nothing pops up when running)

There are probably other reasons to conditionally exclude Darwin for now, such as if /etc/resolver config is used (#12524), but perhaps removing this blanket condition could be a start.

cc @mdempsky and @bradfitz since you authored that change.

@ianlancetaylor ianlancetaylor added this to the Go1.8 milestone Jul 13, 2016
@bradfitz
Contributor

The Darwin exclusion predates https://golang.org/cl/8945 I thought. Can you do some digging and investigate its history?

What do we gain by using Go's resolver by default on Mac? It seems like we'd need to do #12524 first as you mentioned otherwise we risk doing more harm than good.

@danp
Contributor Author

danp commented Jul 19, 2016

I did some digging!

Before https://golang.org/cl/8945 the only way to use Go's resolver (for most platforms, not just darwin) was to build with netgo (and/or without cgo?). This caused the stubs to return false for completed and triggered use of the Go resolver.

I discovered along the way that android is also currently summarily excluded from using Go's resolver due to #10714, though it's done per request instead of per-conf-setup like darwin. In the CL for that, @minux called for the same to be done for iOS and then subsequently found the exclusion for darwin.

All unix platforms except for darwin and android now use Go's resolver unless weirdness is detected along here or here. I think we could gain more consistency across platforms if darwin were to join in with necessary safety checks.

Re #12524, that could be something we detect and fall back to cgo for initially. Once there is support for it in the Go resolver the check could be removed and Go's resolver could be used in that case.

At the very least if we decide not to pursue this it would be good to update the comment around the darwin exclusion to explain it's due to more than possible firewall warnings.

@minux
Member

minux commented Jul 20, 2016 via email

@danp
Contributor Author

danp commented Jul 20, 2016

I only have immediate access to 10.11 and 10.10 systems, neither of which pop up anything when trying the test program in the description (which I've updated to show exactly how I ran it). Trying to see if any community members have access to older systems for testing. Worth noting that 10.10 would have been current at the time of https://golang.org/cl/8945.

I'm also unable to find anything when searching for this issue happening generally. @bradfitz, did you experience this firsthand? Is it possible it was this standard firewall warning that pops up when a process listens for outside connections, and maybe caused by something else?

@groob
Contributor

groob commented Jul 20, 2016

@dpiddy @bradfitz I don't think this is a bug. I tested on 10.11 and the application firewall is not triggered either.

Having a "server" triggers the firewall on OS X, since that would offer an incoming connection. But DNS resolution is outgoing, so the firewall would ignore it.

@danp
Contributor Author

danp commented Aug 16, 2016

Any further thoughts on this?

I can try and put together ways to detect when we should not use Go's resolver if we want to move forward with removing the blanket condition.

@bradfitz
Contributor

bradfitz commented Sep 9, 2016

@dpiddy, I can test on OS X 10.8, now that we have VM-based builders and ancient versions available.

@danp
Contributor Author

danp commented Sep 9, 2016

Great! Let me know what I can do to help.

@bradfitz
Contributor

bradfitz commented Sep 9, 2016

I tried on OS X 10.8 with the firewall on in its most restrictive mode (block all incoming connections), and I saw no pop-up dialog using Go's DNS resolver. I set GODEBUG=netdns=go+2 and saw it use Go's DNS resolver and get the right answers.

@bradfitz
Contributor

bradfitz commented Sep 9, 2016

I can try and put together ways to detect when we should not use Go's resolver if we want to move forward with removing the blanket condition.

Propose away.

@danp
Contributor Author

danp commented Sep 21, 2016

Hope to spend some good time on this soon. So far, the existence of /etc/resolver is the first obvious cgo fallback condition.

What I'm more unsure of is this bit in the darwin resolver(5) man page:

However, client configurations are not limited to file storage. The implementation of the DNS multi-client search strategy may also locate client configuratins in other data sources, such as the System Configuration Database. Users of the DNS system should make no assumptions about the source of the configuration data.

I'm not familiar with what those other data sources might be or if we can detect their use.

Issues like docker/for-mac#19 suggest scutil can be used to discover DNS config in the System Configuration Database. Would that give enough hints on when cgo should be preferred? Would it work for all Go-supported OS X versions?

There might also be special names that should prefer cgo.

Any available insight on these things would be greatly appreciated!

And if it would help get things started to open a CL removing the blanket darwin exclusion but falling back to cgo if /etc/resolver exists, I can certainly do that.

@quentinmit
Contributor

scutil is a wrapper around the SystemConfiguration framework. scutil --dns prints the current resolver configuration. The default configuration is DNS from DHCP/manual, followed by mdns for a number of specific domains ("local" and the PTR domains).

It looks like Chromium uses the unexported symbol dns_configuration_copy to get the configuration: https://chromium.googlesource.com/chromium/src/+/b4aadc32a7fd6d42ac3cc9adbb20a8ee4a267572/net/dns/dns_config_watcher_mac.cc

@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 7, 2016
@quentinmit
Contributor

Also, another link. Here is the implementation of dns_configuration_copy (though I think we should use it from libSystem and/or SystemConfiguration.framework, not copy it into Go):

http://publicsource.apple.com/source/configd/configd-802.40.13/dnsinfo/dnsinfo_copy.c

@danp
Contributor Author

danp commented Oct 7, 2016

Thanks for the tips!

though I think we should use it from libSystem and/or SystemConfiguration.framework, not copy it into Go

Any pointers on what that might look like?

Couple other notes from digging:

  • resolver(5) describes support for custom ports both in nameserver lines with 1.2.3.4.55 for port 55 and via the port directive. Use of port would trigger setting unknownOpt on the config but the nameserver form will need consideration.
  • like OpenBSD, I don't think OS X has any notion of nsswitch.conf so the condition there should probably be expanded or reworked
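
To make the first note concrete, here is a rough sketch of splitting the IPv4 nameserver form that carries a trailing port. The helper splitDarwinNameserver is hypothetical; it assumes only the dotted IPv4 form described above (1.2.3.4.55 for port 55), defaults to port 53, and does not attempt IPv6 addresses or the separate port directive.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// splitDarwinNameserver is a hypothetical helper. It splits an IPv4
// nameserver value of the form "a.b.c.d" or "a.b.c.d.port" into the
// address and the port, using 53 when no port is given.
func splitDarwinNameserver(s string) (host string, port int, err error) {
	parts := strings.Split(s, ".")
	switch len(parts) {
	case 4:
		host, port = s, 53
	case 5:
		host = strings.Join(parts[:4], ".")
		port, err = strconv.Atoi(parts[4])
		if err != nil || port < 1 || port > 65535 {
			return "", 0, fmt.Errorf("invalid port in nameserver %q", s)
		}
	default:
		return "", 0, fmt.Errorf("unsupported nameserver form %q", s)
	}
	// Reject anything that is not a valid IPv4 address.
	if ip := net.ParseIP(host); ip == nil || ip.To4() == nil {
		return "", 0, fmt.Errorf("invalid IPv4 address in nameserver %q", s)
	}
	return host, port, nil
}

func main() {
	for _, ns := range []string{"1.2.3.4", "1.2.3.4.55", "1.2.3"} {
		host, port, err := splitDarwinNameserver(ns)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(net.JoinHostPort(host, strconv.Itoa(port)))
	}
}
```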

@quentinmit
Contributor

OS X's version of nsswitch.conf is the data in the SystemConfiguration framework.

Look at the Chromium source code; also look at our Keychain code in crypto/x509. I imagine we'd do something similar where we weakly link the symbols and call them from cgo.

Though an open question is whether it's actually worth the complexity to use the native resolvers for some but not all queries, I think.

@danp
Contributor Author

danp commented Oct 7, 2016

The crypto/x509 tip helped me get an idea for what would be involved, thanks.

Though an open question is whether it's actually worth the complexity to use the native resolvers for some but not all queries, I think.

If the result of this is deciding it's too complex to use Go's resolver when the native resolver (via cgo) is available, that's fine with me. At least we know and have info to consider for the future.

There might still be room for improving the experience when the native resolver is unavailable, such as with support for /etc/resolver.

@rsc rsc added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsFix The path to resolution is known, but the work has not been done. labels Oct 20, 2016
@rsc
Contributor

rsc commented Oct 27, 2016

On my OS X 10.11.6 system, if I go to Security & Privacy and then Firewall and Turn Firewall On and then Firewall Options... and then check "Block all incoming connections", then GODEBUG=netdns=go+2 go run lookup.go hangs, but GODEBUG=netdns=cgo+2 go run lookup.go keeps working.

It is true that without "Block all incoming connections", the cgo mode does not get popup dialogs like it used to long ago. But I think we probably still can't turn on Go resolution by default.

For the record, the original CL adding cgo support for resolving host names, specifically for OS X, was golang.org/cl/4437053 aka c9164a5.

I agree it would be nice if we could do what we do on Linux etc where we look at resolv.conf and nsswitch.conf and decide if it's OK to use the pure Go resolver. The reason we look at nsswitch.conf is to see if there are any non-DNS lookup methods configured, and if so we delegate to the C library. On the Mac there is no nsswitch.conf but as I understand it there's effectively always a non-DNS lookup method configured (Bonjour). So if there were an accurate nsswitch.conf we'd never use the Go resolver by default.

To summarize:

  1. The most restrictive OS X firewall setting makes the Go resolver hang.
  2. OS X has no nsswitch.conf but if it had an accurate one we'd see non-DNS resolution methods listed and would choose to use the cgo resolver.

For both these reasons, I think we should leave the default on OS X where it is, namely using the cgo resolver.

Note that people who want to use the pure Go resolver need not recompile their programs, as in @danp's example (the CGO_ENABLED=0 is causing package net to be rebuilt entirely). It suffices to set GODEBUG=netdns=go, as mentioned in the net package doc.
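
For a single program, a similar choice can be scoped to one resolver rather than the whole process. The sketch below assumes a Go release that provides net.Resolver and its PreferGo field, which postdates parts of this thread; newGoResolver is a hypothetical helper name, and the lookup in main needs network access.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newGoResolver is a hypothetical helper. It returns a resolver that asks
// package net to prefer its built-in Go DNS client for lookups made
// through this resolver only.
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	addrs, err := newGoResolver().LookupHost(ctx, "www.google.com")
	if err != nil {
		// Report rather than panic: the lookup may fail without network access.
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```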

@bradfitz
Contributor

bradfitz commented Nov 1, 2016

On the Mac there is no nsswitch.conf but as I understand it there's effectively always a non-DNS lookup method configured (Bonjour).

That's only for *.local names, or things without a dot. Even on Linux when Avahi/mDNS stuff is listed in nsswitch.conf, we only do cgo if we see *.local or no dot (if search domains are listed), iirc. We could probably do the same on Darwin.
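
As a rough sketch of that rule, under the reading given above (hand off to the system resolver only for *.local names, or for names with no dot when search domains are configured). The function needsNativeResolver is hypothetical and is not the logic in package net.

```go
package main

import (
	"fmt"
	"strings"
)

// needsNativeResolver is a hypothetical helper, not a function in package
// net. After lowercasing and trimming one trailing dot, it returns true for
// "local" and names ending in ".local", and for names containing no dot
// when search domains are configured.
func needsNativeResolver(name string, hasSearchDomains bool) bool {
	name = strings.ToLower(strings.TrimSuffix(name, "."))
	if name == "local" || strings.HasSuffix(name, ".local") {
		return true
	}
	if !strings.Contains(name, ".") && hasSearchDomains {
		return true
	}
	return false
}

func main() {
	for _, n := range []string{"www.google.com", "printer.local", "fileserver"} {
		fmt.Printf("%-16s native=%v\n", n, needsNativeResolver(n, true))
	}
}
```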

I'm going to kick this to Unplanned for now. If somebody wants to own this and do the pure Go thing when it's really safe on macOS and won't be annoying for the user, feel free to research and post your plan. It could go in Go 1.9 if there's a plan that addresses @rsc's concerns.

@bradfitz bradfitz modified the milestones: Unplanned, Go1.8 Nov 1, 2016
@bradfitz bradfitz removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Nov 1, 2016
@danp
Contributor Author

danp commented Feb 22, 2017

Just tried the test on my 10.12.3 system, with "Block all incoming connections" enabled, and it worked both with GODEBUG=netdns=go+2 and GODEBUG=netdns=cgo+2. Can others confirm?

@shawnps
Member

shawnps commented Feb 25, 2017

@danp same result for me

@bitglue

bitglue commented May 1, 2018

VPN is a common use case where using the Go native resolver with current behavior would break many things. For example, an IKEv2 VPN which does not do full tunneling will not install the DNS server provided in the IKE configuration as the system default, and thus will not end up in /etc/resolv.conf, thus will not be used by the native Go resolver. It is possible to push additional IKE configuration attributes which specify the VPN-provided DNS server should be used for some domains, or even all domains (less than ideal, because privacy), but in no case is this exposed through /etc/resolv.conf: one must query SystemConfiguration to get it.

@mjreed-wbd

Agreed with @bitglue. Anything that relies on /etc/resolv.conf is broken on MacOS; that file may not even exist, but even if it does its contents don't necessarily have anything to do with the current DNS resolution configuration being used by the native resolver. Since there's no way to use the native config from the go resolver without a lot of special-case code (which will require either CGO or shelling out anyway), falling back on the native resolver even if CGO isn't otherwise enabled seems reasonable.

@peterbourgon

Anything that relies on /etc/resolv.conf is broken on MacOS; that file may not even exist, but even if it does its contents don't necessarily have anything to do with the current DNS resolution configuration being used by the native resolver.

Any reference for this?

@bitglue

bitglue commented Jun 2, 2021

Any reference for this?

The comment at the head of the file:

#
# macOS Notice
#
# This file is not consulted for DNS hostname resolution, address
# resolution, or the DNS query routing mechanism used by most
# processes on this system.
#
# To view the DNS configuration used by this system, use:
#   scutil --dns
#
# SEE ALSO
#   dns-sd(1), scutil(8)
#
# This file is automatically generated.
#

And resolver(5):

The configuration for a particular client may be read from a file having the format described in this man page. These
are at present located by the system in the /etc/resolv.conf file and in the files found in the /etc/resolver directory.
However, client configurations are not limited to file storage. The implementation of the DNS multi-client search
strategy may also locate client configuratins in other data sources, such as the System Configuration Database. Users
of the DNS system should make no assumptions about the source of the configuration data.

@peterbourgon

@bitglue Neither of those things substantiates the claim that "Anything that relies on /etc/resolv.conf is broken on MacOS".

@bitglue

bitglue commented Jun 2, 2021

@peterbourgon You're right -- the MacOS documentation doesn't say anywhere that Go is broken. It does however say pretty clearly that the builtin resolver doesn't use resolv.conf at all, and that it gets its configuration from other sources, like /etc/resolver and the System Configuration Database.

We can then reasonably hypothesize that because Go uses a file for resolver configuration that MacOS does not use (resolv.conf), and Go does not use the configuration sources the MacOS resolver does use (/etc/resolver, System Configuration Database, maybe more), that the two may not behave identically. In other words, "broken" behavior.

If you want specific scenarios where that happens, you already have them in this thread and the many other issues that have mentioned it. I don't know how it could be made more clear.

@peterbourgon

peterbourgon commented Jun 2, 2021

We can then reasonably hypothesize that ... the two may not behave identically.

True!

In other words, "broken" behavior.

False! 😉

I agree that the Go resolver should change to follow macOS standard behaviors and that the current implementation produces broken results for many users. But the root cause of that broken behavior is not the Go resolver, it's broken configuration in resolv.conf. The nameservers in that file should work, and if they don't it's a problem with whatever wrote them there.

But this is just splitting hairs. It's unlikely that we're gonna convince Tim to change anything here. Go should change instead.

@bhcleek
Contributor

bhcleek commented Jun 2, 2021

The nameservers in that file should work, and if they don't it's a problem with whatever wrote them there.

One of the problems I've seen is when using a VPN client that doesn't update /etc/resolv.conf on a mac. The thing that wrote the original entries wasn't broken, and arguably neither is the VPN client. In such a case, the nameservers in /etc/resolv.conf can't be used to resolve names on the VPN.

@peterbourgon

One of the problems I've seen is when using a VPN client that doesn't update /etc/resolv.conf on a mac.

This seems like a problem with that client; mine does, as another datapoint.

@bhcleek
Contributor

bhcleek commented Jun 2, 2021

seems like a problem with that client

That's the arguably part 😁 . While updating the file is nice to have, it's not clear that failing to update /etc/resolv.conf is wrong on a system where that file is documented as not being used for DNS resolution by most applications.

@bitglue

bitglue commented Jun 3, 2021

But the root cause of that broken behavior is not the Go resolver, it's broken configuration in resolv.conf

You are assuming all possible configurations could somehow be represented in resolv.conf. They can't.

For example, the MacOS resolver can be configured to send queries for a particular domain to an alternative server. This situation is frequently encountered by VPN users. How would you represent that in /etc/resolv.conf?

MacOS does in fact do a pretty good job of keeping /etc/resolv.conf in sync as much as possible. But since not all configuration options can be represented within the constraints of that file, there's only so much it can do. Mostly it's not a problem, because the only things that ship with Mac OS which use this file are programs like dig and host, programs which are for making DNS queries, not resolving hostnames, and these programs also have options to directly specify an alternative DNS server if that's what you want.

The problem with the pure go resolver is it assumes making DNS queries and resolving hostnames are the same thing. This isn't really true:

  • Unqualified hostnames are searched for in the search domains.
  • If the name ends in .local, you probably want to use mDNS, not DNS.
  • The name could be in /etc/hosts.
  • The name could be resolved through LDAP, or nscd.

The pure Go resolver captures some but not all of this behavior. This results in the pure Go resolver being some amount of "broken" not just on MacOS, but on Linux as well. Go's own documentation admits the many limitations of the pure Go resolver:

By default the pure Go resolver is used, because a blocked DNS request consumes only a goroutine, while a blocked C call consumes an operating system thread. When cgo is available, the cgo-based resolver is used instead under a variety of conditions: on systems that do not let programs make direct DNS requests (OS X), when the LOCALDOMAIN environment variable is present (even if empty), when the RES_OPTIONS or HOSTALIASES environment variable is non-empty, when the ASR_CONFIG environment variable is non-empty (OpenBSD only), when /etc/resolv.conf or /etc/nsswitch.conf specify the use of features that the Go resolver does not implement, and when the name being looked up ends in .local or is an mDNS name.
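
The environment-variable conditions in that quoted passage can be sketched as follows. This is an illustration, not the code in package net: envForcesCgo is a hypothetical name, the lookup parameter has the same signature as os.LookupEnv so the check can be exercised without touching the real environment, and the OpenBSD-only ASR_CONFIG check and the file-based checks are left out.

```go
package main

import (
	"fmt"
	"os"
)

// envForcesCgo is a hypothetical helper. It mirrors the environment
// conditions quoted above: LOCALDOMAIN being present (even if empty), or
// RES_OPTIONS or HOSTALIASES being non-empty.
func envForcesCgo(lookup func(string) (string, bool)) bool {
	if _, ok := lookup("LOCALDOMAIN"); ok {
		return true // presence alone is enough, even with an empty value
	}
	for _, key := range []string{"RES_OPTIONS", "HOSTALIASES"} {
		if v, _ := lookup(key); v != "" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("environment prefers the cgo resolver:", envForcesCgo(os.LookupEnv))
}
```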

The reason this brokenness tends to be noticed on MacOS more than on Linux is that most binary releases of popular Go programs (kubectl and terraform are two I've used personally) are cross-compiled, which in practice involves disabling cgo. So, the fallback to the cgo resolver can't happen. Native-compiled Go programs on MacOS resolve fine because they use the cgo resolver unconditionally. A Linux Go binary with cgo disabled is also broken, but disabling cgo on Linux isn't nearly as common.

If this decision to unconditionally use the cgo resolver were changed, then Go would need to do a better job detecting situations where it would not behave as desired and falling back, as it does on Linux. As explained in the documentation quoted above, there are many things that might trigger these fallbacks, but many of these are lacking their MacOS counterparts. For example, MacOS doesn't have a /etc/nsswitch.conf. I believe the equivalent configuration is in the System Configuration Database, which isn't mentioned at all among the checks that might trigger a fallback to the cgo resolver.

In other words, no longer unconditionally using the cgo resolver on Darwin would subject all Go users, not just those using cross-compiled binaries, to the brokenness already widely reported here and in the other issues mentioned.

@bitglue

bitglue commented Jun 3, 2021

One of the problems I've seen is when using a VPN client that doesn't update /etc/resolv.conf on a mac.

This seems like a problem with that client; mine does, as another datapoint.

The problem is in your expectation that /etc/resolv.conf is the canonical source of configuration. It says right in the file it's not. And for good reason: the MacOS resolver can do a lot of stuff which simply can't be configured in /etc/resolv.conf because the format of this file, which was set decades ago, simply doesn't support it. Things like only sending DNS queries for *.workstuff.example.com to the DNS server at 10.255.255.254 over the VPN tunnel that's a route only for 10.0.0.0/8.

The built-in VPN client works this way. If you configure a VPN tunnel to route all traffic, then it does update /etc/resolv.conf (indirectly: what it does is also route all DNS queries to the VPN-provided DNS server through the System Configuration Database, which then updates /etc/resolv.conf in a best-effort attempt to maintain compatibility with a file syntax that predates the design of the MacOS resolver). If the VPN tunnel routes only some traffic then it can also provide a DNS server for some domains, but this doesn't end up in /etc/resolv.conf because there's no way to articulate the idea in that syntax.
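
The per-domain routing described above can be sketched as a longest-suffix match. This illustrates the idea only and is not how the macOS resolver or package net is implemented; the route type, pickServer, and the default server address 192.168.1.1 are hypothetical, while the domain and VPN server address come from the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// route is a hypothetical type pairing a domain suffix with the
// nameserver that should receive queries for names under it.
type route struct {
	domain string
	server string
}

// pickServer is a hypothetical helper. It returns the server of the route
// whose domain is the longest suffix match for name, or def when no route
// matches. A single flat resolv.conf nameserver list cannot express this.
func pickServer(name string, routes []route, def string) string {
	name = strings.ToLower(strings.TrimSuffix(name, "."))
	best, bestLen := def, -1
	for _, r := range routes {
		d := strings.ToLower(strings.TrimSuffix(r.domain, "."))
		if name == d || strings.HasSuffix(name, "."+d) {
			if len(d) > bestLen {
				best, bestLen = r.server, len(d)
			}
		}
	}
	return best
}

func main() {
	routes := []route{{domain: "workstuff.example.com", server: "10.255.255.254"}}
	for _, n := range []string{"wiki.workstuff.example.com", "www.google.com"} {
		fmt.Printf("%s -> %s\n", n, pickServer(n, routes, "192.168.1.1"))
	}
}
```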

@peterbourgon

peterbourgon commented Jun 3, 2021

The problem is in your expectation that /etc/resolv.conf is the canonical source of configuration.

I do not expect that it is the canonical source of configuration. I expect that it is a working source of configuration, and I expect that anything which modifies my host's nameservers will also modify that file, as a secondary but still necessary action.

But, again, this is moot, because we agree on what should happen: the pure Go DNS resolver on macOS should consult the actual source of truth instead of this (often broken) proxy.

@dr2chase dr2chase added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 4, 2021
@danp
Contributor Author

danp commented Nov 11, 2022

I think with #12524 fixed by having darwin always use cgo-less libc calls this can be closed. Thanks, all!
