cmd/go: unclear how to cache transitive dependencies in a Docker image #27719

Open
wedow opened this issue Sep 17, 2018 · 61 comments
Labels
modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@wedow

wedow commented Sep 17, 2018

What version of Go are you using (go version)?

go version go1.11 linux/amd64

Does this issue reproduce with the latest release?

yes

What did you do?

I'm attempting to populate a Docker cache layer with compiled dependencies based on the contents of go.mod. The general recommendation with Docker is to use go mod download; however, this only caches sources.

go build all can be used to compile these sources but instead of relying on go.mod contents, it requires my application source to be present to determine which deps to build. This causes a cache invalidation on every code change and renders the step useless.

Here's a Dockerfile demonstrating my issue:

FROM golang:1.11-alpine
RUN apk add git

ENV CGO_ENABLED=0 GOOS=linux

WORKDIR /app

COPY go.mod go.sum ./

RUN go mod download

# this fails
RUN go build all
# => go: warning: "all" matched no packages

COPY . .

# this now works but isn't needed
RUN go build all

# compile app along with any unbuilt deps
RUN go build

From package lists and patterns:

When using modules, "all" expands to all packages in the main module and their dependencies, including dependencies needed by tests of any of those.

where the main module is defined by the contents of go.mod (if I'm understanding this correctly).

Since "the main module's go.mod file defines the precise set of packages available for use by the go command", I would expect go build all to rely on go.mod and build any packages listed within.

Other actions which support "all" have this issue but some have flags which resolve it (go list -m all).

@davecheney
Contributor

davecheney commented Sep 18, 2018 via email

@wedow
Author

wedow commented Sep 18, 2018

Thanks Dave, go build ./... is a bit of an improvement since it doesn't include the test dependencies that all does. However it still requires my application source to be present and gives go: warning: "./..." matched no packages if run with only go.mod and go.sum present.

@davecheney
Contributor

davecheney commented Sep 18, 2018 via email

@wedow
Author

wedow commented Sep 18, 2018

For sure. I've found in most previous projects that dependency build times are fast enough to not be an issue so in the end the existing behaviour is probably fine.

Part of my current project is the creation of a custom Terraform Provider for managing some of our internal systems. Building the Terraform packages only happens once locally so not a big deal, but they need to be rebuilt every time a new docker image is built. When these packages are already compiled, go build completes in under a second. When they need to be rebuilt from scratch, go build can take up to two minutes locally or longer on our CI servers.

Some time can be saved by using go mod download to cache the Terraform package sources but afaict there is no command to compile them after download without having our package main present for go build to determine what the dependencies actually are.

Based on the existing module documentation, I would expect the go.mod file to have an accurate list of required dependencies and for the toolchain to be able to rely on it in isolation.

We do similar things with projects in other languages for building Docker images. The flow is generally:

  1. Copy package manifest (Gemfile, package.json, etc.) into container
  2. Download dependency code and compile associated libraries (bundle, npm install, etc.)
  3. Copy the rest of our project source into container

This lets us avoid having to rebuild dependencies on every commit. It would be nice if this could be replicated with the Go module system. go mod download gets us halfway but doesn't allow caching of compilation artifacts.

Here's an example repo: https://github.com/wedow/docker-go-build

To see the issue we're having, clone it and run docker build ., then add a comment or something to main.go and run docker build . again. Ideally all deps would be built and cached prior to the COPY . . step and the final go build would be a sub-second operation.

@bcmills bcmills added modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 18, 2018
@bcmills bcmills added this to the Go1.12 milestone Sep 18, 2018
@myitcv
Member

myitcv commented Nov 13, 2018

I think what you're after here is:

go list -export $(go list -m)/...
The -export flag causes list to set the Export field to the name of a
file containing up-to-date export information for the given package.

This will populate the build cache (go env GOCACHE) with the results of compiling for the -export flag. The module cache ($GOPATH/pkg/mod) as you say contains the module-related caches.

If you want to install main packages too then:

go install $(go list -f '{{ $ip := .ImportPath}}{{if eq .Name "main"}}{{$ip}}{{end}}' $(go list -m)/...)

@bcmills
Contributor

bcmills commented Nov 13, 2018

go build all can be used to compile these sources but instead of relying on go.mod contents, it requires my application source to be present to determine which deps to build.

Yes, that is working as designed: in module mode, all refers to the transitive imports of the packages in the main module, not the packages in its module dependencies. That's not going to change.

This causes a cache invalidation on every code change and renders the step useless.

If the code changes are only in your .go source files, then only the cache entries for the packages containing those source files should be invalidated: the cache contents for the other transitive dependencies should be unaffected.

The build artifact cache is separate from the module cache: the former is controlled by GOCACHE (and defaults to a go-build directory under the user cache directory, such as $HOME/.cache/go-build on Linux), while the latter is the pkg/mod subdirectory of the first entry in GOPATH. You may need to set the GOCACHE environment variable to make sure it is within the container; see Build and test caching for detail.

Can you confirm that both the build cache and the module cache are present and populated in your docker image after the first go build all?

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Nov 13, 2018
@bcmills bcmills changed the title cmd/go: go action all ignores go.mod file cmd/go: unclear how to cache transitive dependencies in a Docker image Nov 13, 2018
@bcmills bcmills modified the milestones: Go1.12, Go1.13 Nov 13, 2018
@wedow
Author

wedow commented Nov 29, 2018

Thanks guys, I think there may be some confusion about which caches are being affected and when.

The issue is in how docker caches layers after each operation. When my source files are changed, all side effects which occur after the COPY . . line (such as populating GOCACHE) are lost. Those changes are isolated in a layer which has been invalidated and must be fully rebuilt.
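To make the layer behaviour concrete, here is a toy model in Go. It is not Docker's actual implementation: it simply assumes each layer's cache key is a hash of the parent key, the instruction text, and a digest of any copied files. The `step` type, the digest strings, and the function names are all made up for illustration.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// step is one Dockerfile instruction plus a digest of the files it copies in.
// Both fields are illustrative; real Docker cache keys carry more detail.
type step struct {
	instruction string
	inputDigest string // empty for RUN steps
}

// layerKeys derives one cache key per step. Each key hashes the previous key,
// so a change at step i also changes the key of every later step.
func layerKeys(steps []step) []string {
	keys := make([]string, len(steps))
	parent := ""
	for i, s := range steps {
		sum := sha256.Sum256([]byte(parent + "\x00" + s.instruction + "\x00" + s.inputDigest))
		keys[i] = fmt.Sprintf("%x", sum[:6])
		parent = keys[i]
	}
	return keys
}

// firstInvalidated returns the index of the first step whose key differs,
// or -1 if every layer can be reused.
func firstInvalidated(before, after []string) int {
	for i := range before {
		if i >= len(after) || before[i] != after[i] {
			return i
		}
	}
	return -1
}

// build models the Dockerfile from the original report for a given
// (hypothetical) digest of the application source tree.
func build(sourceDigest string) []step {
	return []step{
		{"COPY go.mod go.sum ./", "mod-v1"},
		{"RUN go mod download", ""},
		{"COPY . .", sourceDigest},
		{"RUN go build", ""},
	}
}

func main() {
	before := layerKeys(build("src-v1"))
	after := layerKeys(build("src-v2"))
	fmt.Println("first rebuilt step:", firstInvalidated(before, after))
}
```

In this model a source change leaves the go.mod and go mod download layers reusable, but the COPY . . layer (index 2) and everything after it is rebuilt, including whatever go build wrote to GOCACHE.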

The go list -export $(go list -m)/... command works great for populating GOCACHE but since it must come after COPY . ., it must be fully re-run during every build. go build all also has this issue.

I'm looking for a command which can compile the dependencies listed in the go.mod file in isolation so that it can occur before that COPY . . line that adds our sources to the container. Totally understand if that's not possible with the current module system. I may just experiment with parsing and building the deps separately.

@myitcv
Member

myitcv commented Dec 1, 2018

The go list -export $(go list -m)/... command works great for populating GOCACHE but since it must come after COPY . ., it must be fully re-run during every build

I'm unclear why you say it must come after the copy - please can you explain?

I'm looking for a command which can compile the dependencies listed in the go.mod file in isolation so that it can occur before that COPY . .

go list -export $(go list -m)/... should be all you need here. But let's first unravel the question above.

@bcmills
Contributor

bcmills commented Dec 1, 2018

@myitcv, note that -export may at some point do less than a full build. I don't think it's a perfect fit for the use-case.

@hinshun

hinshun commented Jan 18, 2019

You have to export both GOCACHE and GOPATH/pkg/mod:

Example:

FROM golang:1.11-alpine AS mod
RUN apk add -U git
WORKDIR /src
COPY go.mod .
COPY go.sum .
RUN go mod download

FROM golang:1.11-alpine
COPY --from=mod $GOCACHE $GOCACHE
COPY --from=mod $GOPATH/pkg/mod $GOPATH/pkg/mod
WORKDIR /src
COPY . .
RUN go build

alexwh added a commit to BEANSQUAD/paul-bot that referenced this issue Jan 19, 2019
this has the disadvantage of being a large image (because it's based on
golang:alpine and not alpine, plus the fact that building the project
creates various cache files), though these cache files are the things we
need in order to speed up compiling the project on the main go build
line. most of the compilation time is spent on the dependent libraries
(e.g. discordgo, crypto, stdlib, etc).

there is not a functional method of compiling dependencies from a bare
go.mod and go.sum file[1], you must have a valid go project in the
directory for go build all to work, at which point the docker layer
cache has been invalidated by `COPY . /app`. the proposed
`go list -export $(go list -m)/...` does not compile all dependencies
either, showcased by checking the size of $GOCACHE before and after
running `go build all`

without doing funky stuff like bind-mounting a volume container into the
build container[2], inflating the image size for faster compiling seems
to be the best tradeoff, as the image will only stay local anyway

[1] golang/go#27719
[2] https://github.com/banzaicloud/docker-golang
alexwh added a commit to BEANSQUAD/paul-bot that referenced this issue Jan 22, 2019
@dbudworth

@myitcv the go list trick only works if you have your source present.
The way we avoid re-downloading all deps is to simply copy over go.mod and go.sum, then run go mod download, which creates the package source cache but does not create the compiled cache of the modules.

So we're looking for a way to get the stuff listed in go.mod compiled and placed in ~/.cache before we copy all the project source over; this lets us avoid the lengthy re-compile of our deps on each build.

think of it as a 2 phase build
phase 1: copy go.mod, download and (hopefully) compile deps
phase 2: copy project source and compile our stuff against phase 1 cached stuff

@wedow
Author

wedow commented Feb 16, 2019

@dbudworth it doesn't really seem possible to do what we're looking to do with the currently available tooling. I came up with a hacky workaround to get the results I was looking for and just updated my example repo to illustrate it.

The basic idea is the use of a dummy import file which can trigger the compilation of dependencies when run through go build. This file is added with go.mod to the docker image, then compiled to prime the cache, then removed before adding the real application source files.

While I'd much prefer a way to compile dependencies separate from application code as part of the official toolchain, this method does dramatically reduce subsequent docker image build times for our project and has really sped up our CI process.

@dinvlad

dinvlad commented Feb 28, 2019

Same issue here. go build step depends on main.go AND it compiles vendor dependencies. Which means every time we change main.go, it will recompile it AND all of the vendor dependencies. The only way around that for now appears to be @wedow's workaround of a dummy_main.go that includes dummy imports of all vendor dependencies. So we run go build on that file first, and only then we COPY/ADD main.go and go build the latter (but this later go build now reuses deps pre-compiled with the previous go build).

This would be somewhat easier to handle if docker build supported a -v option so we could mount a "compilation cache" directory at build time.

@benweissmann

Would it be possible to add a --install or --compile flag to go mod download, that would compile and cache the downloaded packages?

@bcmills
Contributor

bcmills commented Apr 12, 2019

@benweissmann, that seems like it would have significant overlap with go get, which does build and install the requested packages.

@bcmills
Contributor

bcmills commented Apr 12, 2019

@dinvlad

go build step depends on main.go AND it compiles vendor dependencies. Which means every time we change main.go, it will recompile it AND all of the vendor dependencies.

The Go build cache is content-addressed, and contains intermediate artifacts. If you are correctly storing the build cache (as @hinshun describes), then it should not recompile dependencies whose sources are unchanged.

The only way around that for now appears to be @wedow's workaround of a dummy_main.go that includes dummy imports of all vendor dependencies.

You can use go list to query the dependencies of your top-level package and request to build those dependencies explicitly. (A dummy .go file is fine too, but not strictly necessary.)

@bcmills bcmills removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 12, 2019
@bcmills
Contributor

bcmills commented Apr 12, 2019

Please try the above approach (saving both GOCACHE and GOPATH/pkg/mod and using go list to compute the set of packages to warm the cache) and let us know if there are any remaining issues.

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 12, 2019
@wedow
Author

wedow commented Apr 16, 2019

@bcmills I'm kind of at a loss on how to explain the issue in a different way. The go list approach is incompatible with docker's caching mechanism. It requires the presence of my application source. Any subsequent change to that source invalidates docker's cache which also throws away anything in GOCACHE.

Similarly, @hinshun's approach of copying GOCACHE from a previous build step has no effect because go mod download doesn't populate GOCACHE. There is nothing to be copied.

You mention an --install flag would overlap with go get, but go get requires application source whereas go mod download does not and works on .mod files. If there is a way to have either go get operate on .mod files in isolation, or have go mod download populate GOCACHE after downloading, there'd be no issue. Since this doesn't work, we need a new option or command or something to accomplish this.

Personally, go mod download --install or even go mod install seem like good fits.

@bcmills
Contributor

bcmills commented Apr 16, 2019

The go list approach is incompatible with docker's caching mechanism. It requires the presence of my application source. Any subsequent change to that source invalidates docker's cache

Yes, you'd need to prime the cache in your Docker image from a specific version of your application source, and changing that source would invalidate the image caching. (I suspect that you could discard that source from the final image, but I don't use Docker much so I'm a bit fuzzy on the details.)

You could also use go list to compute the dependency versions (and dependency packages), and build those even without your application source.

go get does not require your application source in general: it can download packages and modules as needed. (You still need to pass it an appropriate list of packages to build, though.)

@rsc rsc added this to the Backlog milestone Oct 9, 2019
@SimonAlling

Following the advice given by @hinshun, @nicollecastrog and @arjunpur, I made a PR to Kubeapps that I think solves this exact problem. Here is the Dockerfile for reference:

# syntax = docker/dockerfile:experimental

FROM golang:1.13 as builder
WORKDIR /go/src/github.com/kubeapps/kubeapps
COPY go.mod go.sum ./
COPY vendor vendor
COPY pkg pkg
COPY cmd cmd
ARG VERSION
# With the trick below, Go's build cache is kept between builds.
# https://github.com/golang/go/issues/27719#issuecomment-514747274
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 go build -installsuffix cgo -ldflags "-X main.version=$VERSION" ./cmd/tiller-proxy

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /go/src/github.com/kubeapps/kubeapps/tiller-proxy /proxy
EXPOSE 8080
CMD ["/proxy"]

Compared to having go mod download cached by Docker, but not the Go build cache, this brings the total build time (for make kubeapps/tiller-proxy) down from 40 seconds to about 5 seconds.

@Feresey

Feresey commented Jan 24, 2020

A simpler way:

FROM golang:1.13-alpine

RUN apk update \
    && apk add --no-cache git

WORKDIR /attacker

COPY ./go.mod .

RUN go mod graph | cut -d '@' -f 1 | cut -d ' ' -f 2 | sort | uniq | tr '\n' ' ' | xargs go get -v

COPY . .

RUN CGO_ENABLED=0 go test -c 

CMD ./attacker.test

@Fryuni

Fryuni commented Feb 9, 2020

It took me a while to find out that such a simple thing, which every other language I have used let me take for granted, can't be done natively in Go.

It is quite a pain having to switch our entire CI ecosystem because of a missing command option, but it seems to be the only way for modules and caching to offer anything beyond allowing code outside of GOPATH, and to achieve Go's goal of fast builds for containerized applications.

@bcmills
Contributor

bcmills commented Feb 10, 2020

It is quite a pain having to switch our entire CI ecosystem because of a missing command option

@Fryuni, what is the “missing command option” to which you refer? Most of the recent progress on this issue has been folks figuring out the proper docker configuration (#27719 (comment), #27719 (comment), and others), rather than any proposed changes in the go command.

@Fryuni

Fryuni commented Feb 10, 2020

@bcmills A command to build the dependency cache. Or a flag to do it with go mod download.

In Python, for example, pip install -r requirements.txt will download all the dependencies and compile C dependencies if they are not pre-compiled. That means that this dockerfile will have everything in the cache correctly and won't recompile the dependencies:

FROM python:3.7
WORKDIR /app
COPY requirements.txt /app
RUN pip install -r requirements.txt

CMD ["python", "main.py"]
COPY . /app

The same is equally simple in Node, Java, Ruby, etc.
But in Go... this happens:

FROM golang:1.13
WORKDIR /src
COPY go.mod go.sum /src/
RUN go mod download

# At this point there is no cache for the dependencies binaries, but there should be
# Either with a command like `go mod build-cache` or a flag for the previous like `go mod download --build-cache`

# Building a cache after this step is pointless: any change in the code throws it away
COPY . /src

# The build cache is only created here
RUN go build -o /app .

CMD ["/app"]

Everyone is figuring out a way to use other Docker features to compensate for this missing one, using experimental features to sidestep Docker's layer architecture just to inject a cache alongside a RUN command.

That is exactly why we are having to change our CI ecosystem. We currently use managed solutions, but those (very wisely) do not allow experimental features to be enabled on dockerd on your CI pipeline. We are changing to a self-hosted solution in order to use them.

@kuujo

kuujo commented Feb 11, 2020

Agree with @Fryuni. FWIW my first attempt was to do this with go mod download and I was also looking for a flag on the command that would facilitate this. That would be a good solution IMO. I’m still using go mod vendor as a workaround and it has worked great. Presumably a go mod download flag would produce a similar but more elegant solution.

@jschaf

jschaf commented Feb 13, 2020

I'd like to avoid experimental docker features to cache go module compilation artifacts, so I tried @Feresey's approach of using go get to download and install dependencies.

RUN go mod graph | cut -d '@' -f 1 | cut -d ' ' -f 2 | sort | uniq | tr '\n' ' ' | xargs go get -v

However, this failed on this go.mod file:

module github.com/my-module

go 1.13

require (
	gopkg.in/DataDog/dd-trace-go.v1 v1.20.1
)

To repro without a go.mod file:

$ go get gopkg.in/DataDog/dd-trace-go.v1

go get gopkg.in/DataDog/dd-trace-go.v1: no Go source files

I'm not sure what's going on here. All the actual imports are gopkg.in/DataDog/dd-trace-go.v1/ddtrace so I don't know why the module is different.

@jschaf

jschaf commented Feb 13, 2020

I also posted this to StackOverflow: https://stackoverflow.com/questions/60200363/create-docker-container-to-run-go-test-with-all-module-dependencies-downloaded

@bcmills
Contributor

bcmills commented Feb 13, 2020

@jschaf, go get (without the -d argument) requests to fetch and build the packages named on the command line.

Packages are not 1:1 with modules: a module contains packages — often many of them, and often many that are not going to be relevant to building the packages in your module. That's why much of the discussion above (for example, #27719 (comment)) focuses on packages rather than modules.

It's also why a flag to go mod download would not be a great fit for this use-case: if we were to add some flag to go mod download that also builds all of the packages within the downloaded modules, it would encourage folks to build (and cache) a bunch of extraneous dependencies that they won't actually end up needing.

@Fryuni

Fryuni commented Feb 13, 2020

Honestly, prebuilding a cache that has more than what I'm going to need is way better than not building any cache at all. After all, that is the build image; having extra data there is not a problem, it is expected. The final binary should be moved to another image in a multi-stage build, as per best practices for keeping the final Docker image small.

Also, I never expected it to build only the cache of what I'm going to use, but the cache of the dependencies declared, whether my code uses them or not. This cache is built before the code is added to the image, so it obviously cannot be optimized for the code.

This is similar to what happens with TypeScript: you install all your dependencies and transitive dependencies entirely, but when you compile to JS, the output only includes what is actually used.

@montanaflynn

montanaflynn commented Apr 24, 2020

I'm running into this now that we've switched to using modules, whereas before we could use:

RUN go get -d -v ./...
RUN go install -v ./...

Now it seems our only options are to use an experimental Docker engine that isn't supported by our CI or some of our developers' machines, or to live with slow builds.

I think there's been a lot of confusion about docker cache vs build cache and go module source cache vs go module build cache. To reiterate the issue for @bcmills what we all really want is:

go mod download --install

or

go mod install

This would allow us to leverage the non-experimental layer caching that existing Docker versions have supported forever, the same way we use it to avoid re-downloading the module sources every time we build an image.

For example, here is a Dockerfile which caches the Go module source in a Docker layer so subsequent docker builds use the cached layer and don't re-download the modules. The problem is that if I change anything in the main source, I have to rebuild all the modules. If go mod download --install existed, the compiled dependencies would be cached in a Docker layer and speed up go build, which would only build the actual example app instead of all the dependencies.

FROM golang:1-alpine AS build
ARG COMMIT_HASH
WORKDIR /example-app

COPY ./go.mod ./go.mod
COPY ./go.sum ./go.sum

RUN go mod download

COPY ./*.go ./

ENV GOARCH=amd64
ENV CGO_ENABLED=0
ENV GOOS=linux

RUN go build -o example .

FROM scratch
WORKDIR /app
ENV PATH=/bin/
COPY --from=build /example-app/example ./example
ENTRYPOINT ["./example"]

@futek

futek commented Jun 27, 2020

Would it be possible to add a --install or --compile flag to go mod download, that would compile and cache the downloaded packages?

Personally, go mod download --install or even go mod install seem like good fits.

Correct me if I'm wrong but isn't the requested extension to the go command equivalent to the following?

go mod download --json | jq -r '"\(.Path)@\(.Version)"' | xargs go get -v

This obviously depends on jq to transform the JSON output so it would still be nice to have it built into the go command to avoid that extra dependency.

Example of usage in a Dockerfile:

FROM golang:1.14-alpine AS build
WORKDIR /go/src/app
ENV CGO_ENABLED=0
RUN apk add --no-cache jq
COPY go.mod go.sum ./
RUN go mod download --json | jq -r '"\(.Path)@\(.Version)"' | xargs go get -v
COPY . .
RUN go build -o /go/bin/app

FROM gcr.io/distroless/base
COPY --from=build /go/bin/app /
ENTRYPOINT ["/app"]

@montanaflynn

@futek unfortunately that doesn't always work and results in this:

The command '/bin/sh -c go mod download --json | jq -r '"\(.Path)@\(.Version)"' | xargs go get -v' returned a non-zero code: 123

reproduction repo: https://github.com/montanaflynn/golang-docker-cache

@tv42

tv42 commented Jun 29, 2020

@montanaflynn That looks like broken software you're trying to build, not an issue with the jq kludge.

	github.com/coreos/bbolt: github.com/coreos/bbolt@v1.3.5: parsing go.mod:
	module declares its path as: go.etcd.io/bbolt
	        but was required as: github.com/coreos/bbolt

@futek

This obviously depends on jq to transform the JSON output so it would still be nice to have it built into the go command to avoid that extra dependency.

That would be very simple to write as a go run buildall.go helper, to avoid the jq dependency. It could also easily subsume the --json and xargs parts.

@montanaflynn

@tv42 If you remove the line:

RUN go mod download --json | jq -r '"\(.Path)@\(.Version)"' | xargs go get -v

Then it works and actually downloads far fewer dependencies, presumably just what's needed for the resulting binary. I think there are edge cases, and associated logic built into the go CLI, that should be applied to any solution to the problem of caching the built dependencies.

Example Dockerfile and docker build logs: https://gist.github.com/montanaflynn/9c7365f0b74635f18268f12897b0b6eb

There are other one-liner shell solutions in this comment thread as well that try to use the dependencies from go mod and go get, for example this comment suggested:

RUN go mod graph | cut -d '@' -f 1 | cut -d ' ' -f 2 | sort | uniq | tr '\n' ' ' | xargs go get -v

Which kind of worked for my reproduction, except while it installed even more dependencies than go mod download --json | jq -r '"\(.Path)@\(.Version)"' | xargs go get -v it also missed some that were later picked up by go build. It also failed entirely for this commenter.

Example Dockerfile and docker build logs: https://gist.github.com/montanaflynn/2d8a5532077e501ec86b4ad643cd1075

I think these one-liner combinations of go mod and go get, while they may work for a specific set of dependencies, will run into issues across the full spectrum of software built with Go. That is why we need this in the official go CLI, where any problems can be reported and fixed.

@futek

futek commented Jun 30, 2020

@montanaflynn

@futek unfortunately that doesn't always work and results in this:

The command '/bin/sh -c go mod download --json | jq -r '"\(.Path)@\(.Version)"' | xargs go get -v' returned a non-zero code: 123

reproduction repo: https://github.com/montanaflynn/golang-docker-cache

Right, it appears that passing all indirect dependencies to go get is not equivalent to what happens during a normal build. I think it suffices to just pass the direct dependencies to go get like in this one-liner:

go mod graph | grep "^$(go mod edit -json | jq -r .Module.Path) " | cut -d ' ' -f 2 | xargs go get -v

(i.e. grab the module name from go.mod and use it to filter direct dependencies in the output of go mod graph)

There are other one-liner shell solutions in this comment thread as well that try to use the dependencies from go mod and go get, for example this comment suggested:

RUN go mod graph | cut -d '@' -f 1 | cut -d ' ' -f 2 | sort | uniq | tr '\n' ' ' | xargs go get -v

Now that I look at this one again, it seems to be trying to do exactly the same thing by relying on the fact that the root module doesn't have a version suffix (@...). However, it didn't work with any version of cut I tried, because cut prints a line unchanged when it can't find the delimiter, even though the line has no second field. Passing -s to cut should fix that, making the following one-liner almost equivalent to the one above, except that it throws away the version suffix, which seems wrong (would it always pick the correct version?):

go mod graph | cut -d '@' -f 1 | cut -s -d ' ' -f 2 | xargs go get -v

Building on that, this is the "simplest" version I can come up with that also retains the version:

go mod graph | grep -v '@.*@' | cut -d ' ' -f 2 | xargs go get -v

I'm sure there are a lot of ways to do this (which could break in various subtle ways) so I'm still voting for an official go flag/command that handles this correctly without the need to maintain one-liners/scripts like this.

@montanaflynn

@futek I appreciate the thought but that command fails for drone's dependencies.

go: github.com/NVIDIA/gpu-monitoring-tools@v0.0.0-20200622050622-c34507425bdb requires
	k8s.io/kubernetes@v1.18.2 requires
	k8s.io/api@v0.0.0: git init --bare in /go/pkg/mod/cache/vcs/917454838ed90b2f0e9868490d4b59302d7a7e8f8826d51d313bd68be346ecce: exec: "git": executable file not found in $PATH

Even after installing git it still fails with this error:

go: github.com/NVIDIA/gpu-monitoring-tools@v0.0.0-20200622050622-c34507425bdb requires
	k8s.io/kubernetes@v1.18.2 requires
	k8s.io/api@v0.0.0: reading k8s.io/api/go.mod at revision v0.0.0: unknown revision v0.0.0

When removing RUN go mod graph | grep -v '@.*@' | cut -d ' ' -f 2 | xargs go get -v and just letting go build handle the dependencies it works fine.

For some projects it can certainly improve the Docker build time dramatically, but it doesn't work everywhere or for every project. I'll still be using it for a few projects where I know it works with their dependencies. In some cases I'm seeing 10x faster Docker image builds!

By the way I think this might be a little simpler to understand and only requires a single pipe to awk:

go mod graph | awk '{if ($1 !~ "@") print $2}' | xargs go get -v
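As a quick sanity check, the awk filter can be compared with the `grep -v '@.*@' | cut` pipeline on the same canned input. This is only a sketch: the graph below is made up (hypothetical module `example.com/app` and dependencies), so it shows that the two filters agree on this shape of input, not that either one works for every project.

```shell
#!/bin/sh
# Canned `go mod graph` output; all module names and versions are hypothetical.
sample_graph() {
cat <<'EOF'
example.com/app example.com/liba@v1.2.0
example.com/app example.com/libb@v0.3.1
example.com/liba@v1.2.0 example.com/libc@v0.1.0
EOF
}

# Filter 1: drop lines that contain two "@" (dependency-to-dependency edges).
with_grep="$(sample_graph | grep -v '@.*@' | cut -d ' ' -f 2)"

# Filter 2: keep lines whose first field has no "@" (the main module).
with_awk="$(sample_graph | awk '{if ($1 !~ "@") print $2}')"

if [ "$with_grep" = "$with_awk" ]; then
    echo "filters agree:"
    echo "$with_awk"
else
    echo "filters differ"
fi
```

Both filters keep the version suffix, so go get is asked for the exact versions that `go mod graph` reports.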

@Seb-C

Seb-C commented Dec 25, 2020

I just spent time on this issue as well, and none of the workarounds works. Can't we just have a command to do this?

dokterbob added a commit to ipfs-search/ipfs-search that referenced this issue Feb 3, 2021
Significantly speeds up repeated go builds.

Ref: golang/go#27719
@barosl

barosl commented Feb 6, 2021

I'm in the same boat. I'm currently using the go mod graph workaround suggested by others, but it is hugely cumbersome. As others said, the very feature mentioned in the article is supported by nearly every language I've ever used. No, the BuildKit mount workaround is not a solution, as it is not supported by many CIs. And above all, why should we utilize an experimental feature when there has always been an established, well-understood and just "natural" feature? Every Dockerfile optimization guide urges us to use layer cache. Could we at least acknowledge there's a problem?

By the way, the go mod graph command can be further simplified into:

go mod graph | awk '$1 !~ /@/ { print $2 }' | xargs -r go get

I added the -r flag to xargs to not error on an empty output. It is a GNU extension.
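Where -r is not available, the empty case can be handled in the shell instead. GNU xargs without -r runs the command once even when its input is empty, so the sketch below skips the call entirely when the filter prints nothing. It uses `echo go get` as a dry run so that nothing is downloaded; `plan_go_get` is a hypothetical helper name and the module names are made up.

```shell
#!/bin/sh
# Reads `go mod graph` output on stdin and prints the `go get` command that
# would be run, or nothing at all if there are no direct dependencies.
plan_go_get() {
    deps="$(awk '$1 !~ /@/ { print $2 }')"
    # Guard in the shell rather than relying on the -r extension of xargs.
    if [ -n "$deps" ]; then
        printf '%s\n' "$deps" | xargs echo go get
    fi
}

# One direct dependency and one indirect edge: only the direct one is used.
printf '%s\n' \
    'example.com/app example.com/liba@v1.2.0' \
    'example.com/liba@v1.2.0 example.com/libc@v0.1.0' | plan_go_get

# An empty graph prints nothing and runs nothing.
printf '' | plan_go_get
```

Dropping the `echo` turns the dry run into the real command.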

dokterbob added a commit to ipfs-search/ipfs-search that referenced this issue Feb 6, 2021
Significantly speeds up repeated go builds.

Ref: golang/go#27719
@arvenil

arvenil commented Aug 11, 2021

@SimonAlling

Following the advice given by @hinshun, @nicollecastrog and @arjunpur, I made a PR to Kubeapps that I think solves this exact problem. Here is the Dockerfile for reference:

# syntax = docker/dockerfile:experimental

FROM golang:1.13 as builder
WORKDIR /go/src/github.com/kubeapps/kubeapps
COPY go.mod go.sum ./
COPY vendor vendor
COPY pkg pkg
COPY cmd cmd
ARG VERSION
# With the trick below, Go's build cache is kept between builds.
# https://github.com/golang/go/issues/27719#issuecomment-514747274
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 go build -installsuffix cgo -ldflags "-X main.version=$VERSION" ./cmd/tiller-proxy

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /go/src/github.com/kubeapps/kubeapps/tiller-proxy /proxy
EXPOSE 8080
CMD ["/proxy"]

Compared to having go mod download cached by Docker, but not the Go build cache, this brings the total build time (for make kubeapps/tiller-proxy) down from 40 seconds to about 5 seconds.

Isn't --mount=type=cache,target=/go/pkg/mod unnecessary since you are using vendor? At least in my case it's empty and doesn't speed up anything. Just --mount=type=cache,target=/root/.cache/go-build works like a charm and speeds up build from 19s to 3s. Thank you ❤️

@seankhliao
Member

If you're using vendor, you already have an accurate list of dependencies to build and cache without needing BuildKit:

COPY go.mod go.sum ./
COPY vendor vendor
RUN go build ./vendor/...

@lincolnmantracer

While --mount=type=cache,target=/root/.cache/go-build works for caching the intermediate outputs of a single RUN directive, it doesn't work for building an image that can be reused across multiple Dockerfiles, and it's completely useless in CI systems like GitHub Actions, which start with a clean slate (but do support sharing a layer cache, in several different ways).

Having the ability to produce a layer with just COPY go.mod go.sum that warms the build cache would be immensely useful for both cases.

(That said, the other thing that would help would be enabling sharing of the mount=type=cache mounts across instances, perhaps with cache-from and cache-to. See for example this GitHub issue with some workarounds: docker/buildx#156)
