cmd/go: build cache not safe for concurrent builds #43645

Closed
zimmski opened this issue Jan 12, 2021 · 8 comments
Labels
FrozenDueToAge; GoCommand (cmd/go); NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.); WaitingForInfo (Issue is not actionable because of missing required information, which needs to be provided.)
Milestone

Comments

@zimmski
Contributor

zimmski commented Jan 12, 2021

What version of Go are you using (go version)?

1.15.6. Before that, we used 1.12.*.

Does this issue reproduce with the latest release?

1.15.6 is latest, so yes.

What operating system and processor architecture are you using (go env)?

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS="-v -trimpath"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/builds/symflower/symflower/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/builds/symflower/symflower"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang-11"
CXX="clang++-11"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build695024912=/tmp/go-build -gno-record-gcc-switches"

What did you do?

(This issue was created because of #40461, where I asked whether I should create a new issue or whether it fits the referenced one.)

Gist: Parallel "go build" processes that use the same GOCACHE directory lead to errors such as this one, reported for crypto/tls:

/usr/local/go/src/crypto/tls/common.go:22:2: can't open import: "net": open /cache/go-build/87/8713176dd630051cd4b09a7fa534da9a291da82ba5c84c02966f3ee18dc3a206-d: no such file or directory

Longer: We switched from Go 1.12 to 1.15.6 for one project and are now running into constant non-deterministic build/test problems, which I think happen because Go's build cache is not concurrent-safe. Our CI does one "go build" over a whole branch and uses one common GOCACHE per CI node, so multiple CI jobs can write to the same cache directory. That part was never a problem, even with the newer Go version (I do not know why). After the "go build" CI job, multiple jobs run in parallel to do linting and testing (via "go test"). These jobs constantly run into errors such as this one, reported for crypto/tls:

/usr/local/go/src/crypto/tls/common.go:22:2: can't open import: "net": open /cache/go-build/87/8713176dd630051cd4b09a7fa534da9a291da82ba5c84c02966f3ee18dc3a206-d: no such file or directory

Some additional information that might be interesting:

  • The CI runs in Docker containers over Kubernetes
  • The GOCACHE points to a directory on an EXT4 partition which is local and not synced over the network.

What did you expect to see?

We expected that all builds could use the same cache directory at the same time, and that builds that worked with an older Go version would still work with the latest version.

What did you see instead?

Hundreds of errors such as this one, reported for crypto/tls:

/usr/local/go/src/crypto/tls/common.go:22:2: can't open import: "net": open /cache/go-build/87/8713176dd630051cd4b09a7fa534da9a291da82ba5c84c02966f3ee18dc3a206-d: no such file or directory

That led to a failing build. We never had this problem with Go 1.12; the cache just worked. With 1.15.6 it does not. Because of this problem, we are effectively back to having no cache at all.


We do not run "go clean" at all, so this is hopefully not #31948.

Our current workaround:

  • Use a GOCACHE directory inside the running container. The "build" job copies the whole GOCACHE to all other CI jobs, which can then reuse the cache, so the cache is only valid for one CI pipeline. This creates HUGE artifacts but is the only way we currently see to have a half-working Go build cache.
@seankhliao seankhliao changed the title Build cache of Go 1.15 is not concurrent-safe even though Go 1.12 worked flawlessly cmd/go: build cache not safe for concurrent builds Jan 12, 2021
@seankhliao seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 12, 2021
@ianlancetaylor
Contributor

CC @bcmills @jayconrod

@ianlancetaylor ianlancetaylor added this to the Go1.17 milestone Jan 12, 2021
@ianlancetaylor ianlancetaylor added the GoCommand cmd/go label Jan 12, 2021
@bcmills
Contributor

bcmills commented Jan 12, 2021

@zimmski, are the errors always for that specific source file ($GOROOT/src/crypto/tls/common.go) and imported package (net), or do the source files and packages vary from error to error?

@bcmills
Contributor

bcmills commented Jan 12, 2021

The CI runs in Docker containers over Kubernetes

Could you give the output of running go env within the container?

@zimmski
Contributor Author

zimmski commented Jan 13, 2021

@zimmski, are the errors always for that specific source file ($GOROOT/src/crypto/tls/common.go) and imported package (net), or do the source files and packages vary from error to error?

I will take a look at a few pipelines and update this comment with what I find:

I see "can't open import:" errors, both of which reference "/usr/local/go", which is the unpacked Go archive from https://golang.org/dl/. So far this seems isolated to that directory.

/usr/local/go/src/crypto/tls/common.go:22:2: can't open import: "net": open /cache/go-build/87/8713176dd630051cd4b09a7fa534da9a291da82ba5c84c02966f3ee18dc3a206-d: no such file or directory
# vendor/golang.org/x/net/http/httpguts
/usr/local/go/src/vendor/golang.org/x/net/http/httpguts/httplex.go:8:2: can't open import: "net": open /cache/go-build/87/8713176dd630051cd4b09a7fa534da9a291da82ba5c84c02966f3ee18dc3a206-d: no such file or directory
# runtime

Other errors are of the form "can't import facts for package"; so far they all concern standard library packages.

vet: in runtime, can't import facts for package "internal/cpu": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory

Here is the full output of one such failing build. Note that this failure comes from a subtest in which we compile 10 other test cases in parallel, each with different literals in the code. One fails but the others do not.

# crypto/tls
/usr/local/go/src/crypto/tls/common.go:22:2: can't open import: "net": open /cache/go-build/87/8713176dd630051cd4b09a7fa534da9a291da82ba5c84c02966f3ee18dc3a206-d: no such file or directory
# vendor/golang.org/x/net/http/httpguts
/usr/local/go/src/vendor/golang.org/x/net/http/httpguts/httplex.go:8:2: can't open import: "net": open /cache/go-build/87/8713176dd630051cd4b09a7fa534da9a291da82ba5c84c02966f3ee18dc3a206-d: no such file or directory
# runtime
vet: in runtime, can't import facts for package "internal/cpu": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# sync
vet: in sync, can't import facts for package "internal/race": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# math/rand
vet: in math/rand, can't import facts for package "math": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# crypto/hmac
vet: in crypto/hmac, can't import facts for package "crypto/subtle": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# hash/crc32
vet: in hash/crc32, can't import facts for package "internal/cpu": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# strings
vet: in strings, can't import facts for package "internal/bytealg": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# bytes
vet: in bytes, can't import facts for package "internal/bytealg": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# strconv
vet: in strconv, can't import facts for package "math": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# crypto/rc4
vet: in crypto/rc4, can't import facts for package "crypto/internal/subtle": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# syscall
vet: in syscall, can't import facts for package "internal/bytealg": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# regexp/syntax
vet: in regexp/syntax, can't import facts for package "unicode": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# regexp
vet: in regexp, can't import facts for package "unicode": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# reflect
vet: in reflect, can't import facts for package "unicode": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# encoding/binary
vet: in encoding/binary, can't import facts for package "math": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# vendor/golang.org/x/crypto/poly1305
vet: in vendor/golang.org/x/crypto/poly1305, can't import facts for package "crypto/subtle": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# crypto/sha512
vet: in crypto/sha512, can't import facts for package "internal/cpu": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# crypto/cipher
vet: in crypto/cipher, can't import facts for package "crypto/internal/subtle": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# crypto/sha1
vet: in crypto/sha1, can't import facts for package "internal/cpu": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# crypto/sha256
vet: in crypto/sha256, can't import facts for package "internal/cpu": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# crypto/des
vet: in crypto/des, can't import facts for package "crypto/internal/subtle": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# crypto/aes
vet: in crypto/aes, can't import facts for package "crypto/internal/subtle": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# vendor/golang.org/x/crypto/chacha20
vet: in vendor/golang.org/x/crypto/chacha20, can't import facts for package "vendor/golang.org/x/crypto/internal/subtle": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# os
vet: in os, can't import facts for package "internal/testlog": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# fmt
vet: in fmt, can't import facts for package "math": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory
# net/textproto
vet: /usr/local/go/src/net/textproto/textproto.go:32:2: could not import net (open /cache/go-build/87/8713176dd630051cd4b09a7fa534da9a291da82ba5c84c02966f3ee18dc3a206-d: no such file or directory)
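
To check whether the missing cache entries vary from error to error, the log can be grouped by the hash in each missing file name. The sketch below does that; `countMissing` and `isEmptySHA256` are hypothetical helper names, and the two sample lines are copied from the output above. One thing it makes visible: e3b0c442...b855 is the SHA-256 digest of empty input, so if cache data files are named after the hash of their content (an assumption about the cache layout, not something confirmed in this thread), the missing "facts" entry would be an empty file.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// missingEntry captures the 64-hex-digit hash from messages of the form
// "open <dir>/<hash>-d: no such file or directory".
var missingEntry = regexp.MustCompile(`open \S+/([0-9a-f]{64})-d: no such file or directory`)

// countMissing returns, for each missing cache hash, the number of log
// lines that mention it.
func countMissing(log string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if m := missingEntry.FindStringSubmatch(sc.Text()); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

// isEmptySHA256 reports whether hash is the hex SHA-256 digest of zero bytes.
func isEmptySHA256(hash string) bool {
	sum := sha256.Sum256(nil)
	return hash == hex.EncodeToString(sum[:])
}

func main() {
	// Two lines copied from the failing build output in this issue.
	log := `/usr/local/go/src/crypto/tls/common.go:22:2: can't open import: "net": open /cache/go-build/87/8713176dd630051cd4b09a7fa534da9a291da82ba5c84c02966f3ee18dc3a206-d: no such file or directory
vet: in runtime, can't import facts for package "internal/cpu": open /cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d: no such file or directory`

	counts := countMissing(log)
	hashes := make([]string, 0, len(counts))
	for h := range counts {
		hashes = append(hashes, h)
	}
	sort.Strings(hashes)
	for _, h := range hashes {
		fmt.Printf("%s lines=%d emptyContent=%v\n", h, counts[h], isEmptySHA256(h))
	}
}
```

Applied to the full output above, every failure refers to one of only two cache files, which suggests that a small number of entries went missing rather than the cache as a whole.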

@zimmski
Contributor Author

zimmski commented Jan 13, 2021

The CI runs in Docker containers over Kubernetes

Could you give the output of running go env within the container?

Of course. Sorry that I posted my local one; that obviously does not make much sense. The following output is directly from one of the failing CI pipelines.

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS="-v -trimpath"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/builds/symflower/symflower/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/builds/symflower/symflower"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang-11"
CXX="clang++-11"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build695024912=/tmp/go-build -gno-record-gcc-switches"

Edit: updated the issue description.

@zimmski
Contributor Author

zimmski commented Jan 13, 2021

I have now gone through some failures, and it looks like the failures, even though they are spread throughout our build, always come from the "net" package. I will now add a new build that again uses the GOCACHE from the node itself. However, since none of the branches use that cache anymore, I am wondering whether this will happen with that branch at all.

@bcmills
Contributor

bcmills commented Jan 13, 2021

In the configuration that fails, how is the /cache partition mounted?

Do the same commands exhibit the same failure mode when run outside of Docker?

(I'm trying to rule out the possibility of a bug introduced by a Docker filesystem hook somewhere, or at least distill down the bug to something that we can reproduce independent of Docker.)
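
One way to answer the mount question from inside the container is to look up the cache path in /proc/self/mounts. The sketch below assumes a Linux container where that file is readable; `findMount` is a hypothetical helper, the default path is the GOCACHE value reported in this issue, and mount points containing escaped characters (such as spaces) are not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mount holds the first four fields of one /proc/self/mounts line.
type mount struct {
	device, point, fstype, options string
}

// findMount scans mounts (text in /proc/self/mounts format) and returns the
// entry whose mount point is the longest path prefix of path. When two
// entries share a mount point, the later one wins because it shadows the
// earlier one.
func findMount(mounts, path string) (mount, bool) {
	var best mount
	found := false
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 {
			continue
		}
		mp := f[1]
		covers := mp == "/" || path == mp || strings.HasPrefix(path, mp+"/")
		if covers && (!found || len(mp) >= len(best.point)) {
			best = mount{device: f[0], point: mp, fstype: f[2], options: f[3]}
			found = true
		}
	}
	return best, found
}

func main() {
	path := "/cache/go-build" // GOCACHE as reported in this issue
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	data, err := os.ReadFile("/proc/self/mounts")
	if err != nil {
		fmt.Println("cannot read mount table:", err)
		return
	}
	if m, ok := findMount(string(data), path); ok {
		fmt.Printf("%s is on %s (device %s, type %s, options %s)\n",
			path, m.point, m.device, m.fstype, m.options)
	} else {
		fmt.Println("no mount entry covers", path)
	}
}
```

Running it both inside the container and on the node itself would show whether the container sees the directory through a different filesystem layer than the host does.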

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jan 14, 2021
@gopherbot

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)
