Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: prefer /etc/hosts over DNS when no /etc/nsswitch.conf is present #35305

Closed
schnoddelbotz opened this issue Nov 1, 2019 · 16 comments
Closed
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Proposal Proposal-Accepted
Milestone

Comments

@schnoddelbotz
Copy link

schnoddelbotz commented Nov 1, 2019

What version of Go are you using (go version)?

$ go version
go version go1.13.3 linux/amd64

Does this issue reproduce with the latest release?

Yes, it does.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build004276885=/tmp/go-build -gno-record-gcc-switches"

What did you do?

  • Built a go containerized Go app using best practices as documented by Docker:
    • using a multi-stage build
    • with CGO_ENABLED=0
    • with Alpine as the final stage for the container built
  • After successful build, run the app ...
    • by starting a SSH tunnel on the host system
    • by starting the container with --network=host to let container connect to tunnel
    • by telling container/app to use localhost as target to connect to:
ssh -L 59187:redis.example.com:6379 joe@bastion.example.com
docker run --net=host --rm myGoApp:latest /app/redisDoSomething -host localhost -port 59187

What did you expect to see?

The Go app should consider the container's /etc/hosts file to resolve localhost to 127.0.0.1.

What did you see instead?

The Go app did not consider /etc/hosts inside the container; it queried DNS servers to resolve localhost. Actually, no /etc/hosts entry would ever be considered, all lookups are using DNS directly.

Some background / Summary

In essence, everybody following documented Docker best-practices of using multi-stage builds and disabling CGO is prone to run into this issue sooner or later. In full paranoia mode, it could even be seen as security issue, as resolving localhost via DNS could be a potential attack vector, e.g. on public hot spots?

By re-opening the issue, I'd kindly ask for review of the original issue and to reconsider whether Go's net package could be adjusted to use /etc/hosts if present, even if no /etc/nsswitch.conf is found.

The issue just consumes too much developer resources and introduces work-arounds in way too many places, IMHO. Finally, and just for laughs, we ran into this issue as some joker created a localhost entry in our LAN, pointing to his PC.

@ianlancetaylor
Copy link
Contributor

When there is no /etc/nsswitch.conf file, the default for glibc for the hosts line is dns [!UNAVAIL=return] files. Go is intended to be compatible with that.

If Go never looks at /etc/hosts, even if a DNS lookup fails, that sounds like a bug. But the code does look OK to me at first glance:

	// If /etc/nsswitch.conf doesn't exist or doesn't specify any
	// sources for "hosts", assume Go's DNS will work fine.
	if os.IsNotExist(nss.err) || (nss.err == nil && len(srcs) == 0) {
		if c.goos == "solaris" {
			// illumos defaults to "nis [NOTFOUND=return] files"
			return fallbackOrder
		}
		if c.goos == "linux" {
			// glibc says the default is "dns [!UNAVAIL=return] files"
			// https://www.gnu.org/software/libc/manual/html_node/Notes-on-NSS-Configuration-File.html.
			return hostLookupDNSFiles
		}
		return hostLookupFilesDNS
	}

Changing Go to look at /etc/hosts first, before doing a DNS lookup, would not be compatible with glibc.

@schnoddelbotz
Copy link
Author

Thanks, @ianlancetaylor

When there is no /etc/nsswitch.conf file, the default for glibc for the hosts line is dns [!UNAVAIL=return] files. Go is intended to be compatible with that.
Changing Go to look at /etc/hosts first, before doing a DNS lookup, would not be compatible with glibc.

I believe you indirectly pointed out the root problem: A potentially wrong assumption about the glibc default -- effectively based on an outdated glibc manual, as far as I can tell:

The Go code in question points at some NSS glibc notes. BUT ... these notes were last updated in 1996 (!) by Ulrich Drepper:

Unsurprisingly, more than 20 years later, the NSS glibc notes seem outdated and seem to no longer match actual code. Comparing the commit from 1996 with current version, we have different behaviour nowadays:

  50 # define DEFAULT_DEFCONFIG "files"
 ...
 157   /* No configuration data is available, either because nsswitch.conf
 158      doesn't exist or because it doesn't have a line for this database.
 159
 160      DEFCONFIG specifies the default service list for this database,
 161      or null to use the most common default.  */
 162   if (*ni == NULL)
 163     {
 164       *ni = nss_parse_service_list (defconfig ?: DEFAULT_DEFCONFIG);

Yes, there's still a mention of "hosts", "dns [!UNAVAIL=return] files", but AFAICT by the sourrounding comment, that portion is /* Called by nscd and nscd alone. */.

I've always been able to override DNS with local hosts entries in the past, without tweaking system defaults. I think it's a common, broadly documented use-case (example, showing that even Windows users can rely on hosts as primary default source).
I was also unable to find the stated default anywhere else:

  • glibc documents hosts: files nisplus nis dns as example
  • glibc sources' nsswitch.conf show hosts: files dns as default/example
  • NetBSD manual page shows hosts: files dns as example
  • FreeBSD example: hosts: cache files dns
  • musl statically tries hosts before dns

Given the current de-facto work-around for Alpine-based containers, including Go itself, is...

RUN [ ! -e /etc/nsswitch.conf ] && echo 'hosts: files dns' > /etc/nsswitch.conf

I'd kindly ask to reconsider my request, based on above observations:
If there's no nsswitch.conf BUT a hosts file exists, try the hosts file before using DNS.

If someone has an account for glibc bugzilla, I'd appreciate if a bug could be opened there to further clarify/ensure that I'm not mistaken and this whole issue here just is simply based on their outdated docs ... and can safely be fixed in both places. Thanks!

@schnoddelbotz
Copy link
Author

I filed an issue to clarify the apparent glibc documentation issue https://sourceware.org/bugzilla/show_bug.cgi?id=25156

If I'm not mistaken, glibc actually falls back to files only if nsswitch.conf is absent; PLEASE do not align Go with that behaviour 😉

Will report back here for sure once the glibc issue brings clarification.

@schnoddelbotz
Copy link
Author

I had to take another look at glibc code, with the friendly help of djdelorie on glibc IRC. He pointed me at
https://sourceware.org/git/?p=glibc.git;a=blob;f=nss/hosts-lookup.c;hb=HEAD

Meaning: I was wrong, glibc really queries dns first in absence of nsswitch.conf 🤦‍♂ 😭
Really -- Sorry for the fuss.

But... It all still feels wrong?!
All Linux+BSD distros ship files dns as mentioned above, the glibc example does it, musl does it without etc. ... only glibc on a "broken" system without nsswitch.conf does it the other way round... leading to Go's observed "broken-by-default" behaviour on Alpine/musl. If Go insists on "compatibility" with glibc behaviour, the only remaining potential fix I see now is to change glibc :-)

I'll try to start that discussion over on libc-alpha now, as I was told the default change is at least discussable.

Would you mind keeping this bug open some more days until I have feedback? Thanks!

@FiloSottile FiloSottile changed the title net: Should prefer /etc/hosts over DNS (even) if no /etc/nsswitch.conf is present proposal: net: prefer /etc/hosts over DNS (even) if no /etc/nsswitch.conf is present Nov 5, 2019
@gopherbot gopherbot added this to the Proposal milestone Nov 5, 2019
@rsc rsc changed the title proposal: net: prefer /etc/hosts over DNS (even) if no /etc/nsswitch.conf is present proposal: net: prefer /etc/hosts over DNS when no /etc/nsswitch.conf is present Nov 6, 2019
@rsc
Copy link
Contributor

rsc commented Nov 6, 2019

Retitled for clarity. We'll wait for @schnoddelbotz to report back about glibc, but even in the absence of progress there, it seems plausible to diverge from glibc and prefer /etc/hosts over DNS. Probably glibc never runs on systems where /etc/nsswitch.conf is missing, whereas Go clearly does.

@erikwilson
Copy link

Thanks for proposing this fix @schnoddelbotz!

What glibc and go are doing here feels very wrong, every sane /etc/nsswitch.conf configuration prefers files then dns. This is particularly a problem for networks that setup a search path as localhost can resolve to something completely wrong. For Alpine/musl systems this has been particularly painful, please fix asap! :)

@schnoddelbotz
Copy link
Author

Thanks for retitling and reconsidering the issue as proposal, despite my previous misconception about glibc behaviour. There was only one response to the corresponding glibc-alpha thread in the last 6 days, but it was positive. Citing Florian Weimer's response:

> The proposed new default would be
> #define DEFAULT_CONFIG "files dns"
> ... effectively making the /etc/hosts file the primary source for
> hosts lookups, until overridden by a nsswitch.conf file.

I think this is a reasonable change to make.  I've wanted this for a
quite a while and was waiting for the new nsswitch.conf parser (which I
expected would have centralized the policy in one place).  But we can
make this change now.  It's just a bit tedious to find all the places
that need changing.

I'd happily create a PR here now, but somehow doubt it's really helpful/needed? Just let me know...
Thanks again!

@erikwilson
Copy link

erikwilson commented Nov 10, 2019

I think a PR is absolutely needed, https://golang.org/src/net/conf.go feels like a hot mess, it is making too many assumptions about the system. An upstream glibc change will have no effect on the go dns resolver, so go needs to change since it is making assumptions and emulating broken configs.

I feel like this is an opportunity for Golang to be opinionated and consistent about what is happening here. For musl I would still need to set the netcgo tag to force cgo resolution or maybe RES_OPTIONS to force hostLookupFilesDNS for things which are compiled without cgo or the netgo tag. It feels like there should be a better way, either through an environment variable or other means, to specify the result of hostLookupOrder.

@schnoddelbotz
Copy link
Author

I admittedly only focused on fixing the hosts-over-dns issue for Alpine with CGO_ENABLED=0 builds, which felt like a low-hanging fruit: I could only have imagined a tiny PR that deletes lines 205-209 of conf.go and maybe some doc tweaks in net.go to mention that /etc/hosts is considered before using DNS if nsswitch.conf is absent.

It's perfectly possible that I also mis-read Go code here (besides glibc), but are you sure that the tweak above wouldn't be sufficient to force hostLookupFilesDNS for things which are compiled without cgo, as there's no nsswitch.conf, @erikwilson? And for musl with CGO, you'd always get musl's "static" hostLookupFilesDNS order mentioned above, wouldn't you? Personally, I like automagic behaviour, as long as it matches "working" OS default/configured behaviour i.e. not (today's) glibc without a nsswitch.conf ;)
Just my non-relevant 2¢, let's see what maintainers say... maybe worth a separate issue?

@erikwilson
Copy link

Yah, maybe the magic behavior of when to use cgo dns should be a separate issue.

I doubt very few people, if anyone, will want DNS then file resolution. The problem with the current net conf logic is the assumption that certain libraries are used, even tho that may not be the case. It is trying to be too magical and failing badly. Might as well just always return hostLookupFilesDNS unless something overrides it and remove all of the c.goos logic. If you want system behavior then just use cgo dns.

@rsc
Copy link
Contributor

rsc commented Nov 13, 2019

The consensus here seems overwhelming, and maybe glibc will even make the change.
This seems like a likely accept.

Leaving open for a week for final comments.

@rsc
Copy link
Contributor

rsc commented Nov 20, 2019

No change in consensus. Accepted.

@rsc rsc closed this as completed Nov 20, 2019
@rsc rsc changed the title proposal: net: prefer /etc/hosts over DNS when no /etc/nsswitch.conf is present net: prefer /etc/hosts over DNS when no /etc/nsswitch.conf is present Nov 20, 2019
@rsc rsc modified the milestones: Proposal, Go1.15 Nov 20, 2019
@rsc
Copy link
Contributor

rsc commented Nov 20, 2019

Did not mean to close; sorry.

@rsc rsc reopened this Nov 20, 2019
@ianlancetaylor ianlancetaylor added the NeedsFix The path to resolution is known, but the work has not been done. label Nov 20, 2019
@SinisterMinister
Copy link

Just ran into this issue myself in a really bizarre fashion so I wanted to notate it here just in case it was related. When debugging connectivity issues with Consul, I noticed that it would sometimes get the IP defined in the /etc/hosts file and others it would use DNS.

Example Consul outupt

    2019/12/23 20:52:08 [INFO] agent: Joining LAN cluster...
    2019/12/23 20:52:08 [INFO] agent: (LAN) joining: [HOSTS_OVERRIDDEN_DOMAIN]
    2019/12/23 20:52:08 [INFO] agent: started state syncer
==> Consul agent running!
    2019/12/23 20:52:08 [WARN] manager: No servers available
    2019/12/23 20:52:08 [ERR] agent: failed to sync remote state: No known Consul servers
    2019/12/23 20:52:08 [WARN] agent: (LAN) couldn't join: 0 Err: 1 error occurred:
	* Failed to join 172.x.x.x: Remote state is encrypted and encryption is not configured

In this case, the HOSTS_OVERRIDDEN_DOMAIN is set to a 192.168.x.x/32 IP but it's still picking up the 172.x.x.x/32 IP from DNS. Both the /etc/hosts and /etc/nsswitch.conf files are setup correctly but which it picks seems to be non-deterministic for some reason as it was originally working and occasionally will pick the hosts file when restarted.

$ docker exec -it consul-agent sh
/ # cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.x.x	HOSTS_OVERRIDDEN_DOMAIN
/ # cat /etc/nsswitch.conf
hosts: files dns

The container is using host networking and pulls the /etc/hosts file from the host itself. The container appears to be based on an Alpine 3.9 image running on a CentOS 7.7 host running kernel version 3.10.0-1062.7.1.el7.x86_64. Not sure if any of that is useful but I figured I'd throw it in there just in case. Let me know if there's any more information that may help.

ecordell added a commit to ecordell/operator-registry that referenced this issue Mar 26, 2020
ecordell added a commit to ecordell/operator-registry that referenced this issue Mar 26, 2020
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/operator-registry that referenced this issue Apr 12, 2020
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/operator-registry that referenced this issue May 18, 2020
@odeke-em
Copy link
Member

We didn't cover much progress before the Go1.15 code freeze, thus I shall move this to the Go1.16 milestone.

@odeke-em odeke-em modified the milestones: Go1.15, Go1.16 May 23, 2020
ncopa added a commit to ncopa/go that referenced this issue Jun 18, 2020
Do not mimic glibc behavior if /etc/nsswitch.conf is missing. This will
will likely be missing on musl libc systems and glibc systems will likely
always have it, resulting in localhost lookup being done over DNS rather
than from /etc/hosts.

Do what makes most sense rather than making any assumption about the
libc.

Fixes golang#35305
ncopa added a commit to ncopa/go that referenced this issue Jun 18, 2020
Do not mimic glibc behavior if /etc/nsswitch.conf is missing. This will
will likely be missing on musl libc systems and glibc systems will likely
always have it, resulting in localhost lookup being done over DNS rather
than from /etc/hosts.

Do what makes most sense rather than making any assumption about the
libc.

Fixes golang#35305
@gopherbot
Copy link

Change https://golang.org/cl/238629 mentions this issue: net: prefer /etc/hosts over DNS when no /etc/nsswitch.conf is present

ncopa added a commit to ncopa/go that referenced this issue Oct 16, 2020
Do not mimic glibc behavior if /etc/nsswitch.conf is missing. This will
will likely be missing on musl libc systems and glibc systems will likely
always have it, resulting in localhost lookup being done over DNS rather
than from /etc/hosts.

Do what makes most sense rather than making any assumption about the
libc.

Fixes golang#35305
andreabolognani added a commit to andreabolognani/buildkit that referenced this issue Nov 20, 2020
This is a workaround for

  golang/go#35305

which makes it possible to successfully push to a local registry
using

  $ docker buildx build --push --tag localhost:5000/foo .

assuming, of course, that the builder instance currently in use
has been created using

  $ docker buildx create --driver-opt network=host

The workaround is no longer going to be necessary once Go 1.16
is used for builds.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
storjBuildBot pushed a commit to storj/edge that referenced this issue Jan 15, 2021
Go <1.16 has unexpected default behavior when the /etc/nsswitch.conf
file is missing: It will default to making a network request before
looking in local files like /etc/hosts. Unfortunately the very popular
alpine images do not have /etc/nsswitch.conf by default and so we must
create one ourselves.

There are many reports of this causing issues with name resolution:

golang/go#22846
golang/go#35305

Until this is fixed in a released version of Go (hopefully 1.16) we need
this change to ensure lookups happen in the expected way (files and then
network).

Change-Id: I246a767ae2767f0f83bf3b4229938a292cf9737e
ankitathomas pushed a commit to ankitathomas/operator-framework-olm that referenced this issue Apr 6, 2021
see: golang/go#35305
(upstream operator-registry commit: 597b6ba90b2b13e3e0d1b90167fb1b6452505817)
openshift-merge-robot pushed a commit to openshift/operator-framework-olm that referenced this issue Jul 14, 2021
see: golang/go#35305

Upstream-repository: operator-registry
Upstream-commit: 597b6ba90b2b13e3e0d1b90167fb1b6452505817
@golang golang locked and limited conversation to collaborators Oct 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Proposal Proposal-Accepted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants