Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: narrow 'mod download' default set #44435

Closed
danp opened this issue Feb 19, 2021 · 18 comments
Closed

cmd/go: narrow 'mod download' default set #44435

danp opened this issue Feb 19, 2021 · 18 comments

Comments

@danp
Copy link
Contributor

danp commented Feb 19, 2021

Context

go mod download is often used in build systems to "warm the cache" so subsequent operations don't need to fetch any more data. However, it can currently download more than expected. See #41431.

In #41431 (comment), @bcmills suggests using go list -test -deps ./... instead. This would only fetch only the modules transitively imported by packages in the main module.

A similar suggestion of go list -test all was made recently in the Gophers slack.

Proposal

This issue proposes changing the default selection for go mod download to match go list all. This would make go mod download more intuitive for the cache-warming case which typically does not involve needing dependencies' test dependencies.

The current selection could be moved to a flag if necessary.

@gopherbot gopherbot added this to the Proposal milestone Feb 19, 2021
@bcmills
Copy link
Contributor

bcmills commented Feb 19, 2021

CC @jayconrod @matloob

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Feb 24, 2021
@rsc rsc changed the title proposal: change go mod download's default selection proposal: cmd/go: narrow 'mod download' default set May 4, 2021
@rsc
Copy link
Contributor

rsc commented May 4, 2021

For clarity, right now go mod download downloads the modules listed by go list -m all.
This proposal is to limit it to the modules supplying packages for go list all, which is a smaller set.

Note that we would still need to download go.mod files for something like the whole original set,
and those should still be reported in go mod download -json, in entries that would simply omit
the full-source links if the full source was not downloaded.

@rsc rsc moved this from Incoming to Active in Proposals (old) May 5, 2021
@rsc
Copy link
Contributor

rsc commented May 19, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals (old) May 19, 2021
@zikaeroh
Copy link
Contributor

zikaeroh commented May 25, 2021

I think that this would break the cache warming case for when you only have go.mod and go.sum and none of the code, as I described in: #45551 (comment)

This case is common when building docker containers as it allows you to populate a module cache before you copy the code in. If the module cache state is solely dependent on the code (which is what go list all effectively does), then the cache is populated after the code is added in a new layer, and any code change will invalidate the layer and cause the cache to be thrown away.

It seemed like based on #45551 (comment) that this was a supported case and would actually be improving in the future, but go list all when I only have go.mod and go.sum nets me no packages while go list -m all does appear to query them.

cc @jayconrod

@bcmills
Copy link
Contributor

bcmills commented May 25, 2021

@zikaeroh, if you're setting up a fast Docker builder, you probably want to prime the build cache too — not just the module cache.

In that case, you would either want to go build $PACKAGES (where $PACKAGES is an externally-computed list of the packages that you expect to be relevant to the build), or go ahead and upload a baseline copy of the code to be build and run go test -c -o /dev/null ./... or similar.

Or, to put it another way: if you're ok with wasting some space on the build image (as you would be with go mod download today), then you're likely better off wasting that space on a redundant copy of the source code for your module instead of redundant copies of the source code for unused modules in your dependency graph.

@zikaeroh
Copy link
Contributor

zikaeroh commented May 25, 2021

if you're setting up a fast Docker builder, you probably want to prime the build cache too — not just the module cache.

In my experience, getting the 2 GB of dependencies my project needs (or about 1 GB non-test) is much, much more time consuming than the actual build itself.

In that case, you would either want to go build $PACKAGES (where $PACKAGES is an externally-computed list of the packages that you expect to be relevant to the build), or go ahead and upload a baseline copy of the code to be build and run go test -c -o /dev/null ./... or similar.

Or, to put it another way: if you're ok with wasting some space on the build image (as you would be with go mod download today), then you're likely better off wasting that space on a redundant copy of the source code for your module instead of redundant copies of the source code for unused modules in your dependency graph.

Unfortunately, docker can't work this way. The build context for docker is based on the current file system. I can't pre-copy a different version (which version?) of the code into the container when building because there's only one version; the current one. If I make any change to the source code, any layer that depended on it is invalidated, hence why I only copy in go.mod and go.sum, as they don't change that often and provide the info required to pre-populate the cache (even if overzealous).

Note that I'm not "wasting space"; I build the binary, then copy it to another image (see: https://github.com/hortbot/hortbot/blob/master/Dockerfile). In this case, the only "waste" is the cache that'd on the builder (versus where the image runs), which is what I want to happen anyway. This is a common pattern when building docker images with compiled code; there's no reason to ship the source, the Go compiler, caches, etc.

@seankhliao
Copy link
Member

It can work that way: use an old version of your build stage as the base image for your newer builds. (Note newer docker versions can mount volumes as caches removing the need for layering gymnastics)

The "wasting space" refers to your build stage, which you are obviously caching to be able to reuse it. In most cases an old copy of your code + accurate dependencies takes up less space than the extended set of dependencies. The wasted space also translates to longer build times if your CI is stateless as it now needs to restore/save larger caches from a remote

@zikaeroh
Copy link
Contributor

If you believe it can be done, then I would appreciate some sort of example to look at; I'm having a hard time conceptualizing how that would work without shifting the caching responsibility to me or extra scripts (managing which parent image I'm using when that's encoded in Dockerfile, removing old items, etc) or breaking statelessness (some at-build volume). It seems like a shame to have to be managing all of this when it was so simple to achieve before following only docker's best practices documentation.

@rsc
Copy link
Contributor

rsc commented May 26, 2021

This is only about changing the default for go mod download with no arguments.
@bcmills, would go mod download all still get the old behavior?
Or maybe the default should be "what's listed in go.mod", which would work well with the new module pruning?

@rsc
Copy link
Contributor

rsc commented May 26, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Likely Accept to Active in Proposals (old) May 26, 2021
@andig
Copy link
Contributor

andig commented May 26, 2021

@rsc I understand go mod download all would retain the current behaviour of downloading when only go.mod/sum exists? I totally supprt @zikaeroh‘s case of fast docker builds without additional steps or mounting volumes (which you might not have on CI). Making this change would break all those builds but at least allow fixing them by adding all.

Or maybe the default should be "what's listed in go.mod", which would work well with the new module pruning?

This would not break anybody‘s builds (and even get rid of the parsing step?)- are there negative side effects?

@zikaeroh
Copy link
Contributor

This would not break anybody‘s builds (and even get rid of the parsing step?)- are there negative side effects?

If the definition of "what's listed in go.mod" is that it will download only the modules listed in go.mod and none of their dependencies (as "all deps and their deps" is the old behavior), then the effect of that change would be to change over-warming to under-warming, as the rest of the dependencies would be pulled in at some future step, probably during go build. This would cause some element of unpredictability based on what projects happen to import (and what their deps import), and probably will cause some confusing caching changes for existing Dockerfiles.

The only way to avoid the unpredictability would be to bump the module to 1.17 to enable lazy loading and the extra entries in go.mod (which I'm not sure all projects will do given the misconceptions/lack of clarity around what the version actually indicates; #46201, #30791).

@bcmills
Copy link
Contributor

bcmills commented May 27, 2021

@zikaeroh

The only way to avoid the unpredictability would be to bump the module to 1.17 to enable lazy loading and the extra entries in go.mod

Or, conversely: to only enable the new go mod download behavior for modules that specify go 1.17 or higher.

@bcmills
Copy link
Contributor

bcmills commented May 27, 2021

I like the idea of making the change contingent on the go version and defaulting to downloading only the modules listed explicitly in the go.mod file. That gives the ability to run go test ./... (which I expect is what most CI systems are running) after go mod download without the need to re-scan the entire package import graph.

I agree that go mod download all should continue to download all of the modules in go list -m all regardless.

@rsc
Copy link
Contributor

rsc commented Jun 2, 2021

OK, so it sounds like we are converging on changing the 'mod download' set for go.mod that use module graph pruning (go 1.17+), but this feature would ship in Go 1.18 since we are already in the Go 1.17 freeze.

Do I have that right?

@rsc
Copy link
Contributor

rsc commented Jun 9, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals (old) Jun 9, 2021
@rsc rsc moved this from Likely Accept to Accepted in Proposals (old) Jun 16, 2021
@rsc
Copy link
Contributor

rsc commented Jun 16, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: cmd/go: narrow 'mod download' default set cmd/go: narrow 'mod download' default set Jun 16, 2021
@gopherbot
Copy link

Change https://golang.org/cl/357310 mentions this issue: cmd/go: download fewer dependencies in 'go mod download'

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

7 participants