cmd/go: "file exists" errors when trying to fetch modules #36447
Comments
I forgot to mention the env vars. We run the following:
|
From the filenames involved in the error, it seems likely that the failing call is this one: go/src/cmd/go/internal/modfetch/fetch.go Line 122 in daacf26
That seems to imply one of the following possibilities:
Either way, given the information we have so far this seems more likely to be a bug in the underlying filesystem than in the Could you try running |
Note that the concurrency strategy for (We use file-locking in the module cache because idempotent writes would be significantly less efficient in many cases, and because it is otherwise difficult to signal that a downloaded module is complete and ready for use. In contrast, within |
This was my initial suspicion, but we're using a pretty recent stable Docker on the most recent Ubuntu LTS, with an ext4 disk. It doesn't get more standard and stable than this, I think.
That's a good point. Though the other CI builds could do concurrent module fetches, if the cache isn't up to date. It's this build that's causing problems that doesn't have any concurrent steps whatsoever. Which is why I'm extra confused.
I realised this issue wouldn't have much actionable for you, but I still filed it in case you saw something that I didn't. And in case others would find it useful in the future, if they encounter the same error. I'll give those |
Ok, wow, this is beyond embarrassing. The CI config was buggy; someone had messed with it while I was away on vacation, and they removed the dependency between the "run I did look at that twice, but of course, I'm only human :( Apologies for the noise and the waste of time. This is definitely a filesystem data race that's entirely our fault. |
This happens sporadically on `golang:1.13.5` with `Docker version 19.03.5, build 633a0ea838` and Linux `4.15.0-72-generic #81-Ubuntu`.

It's happened on a CI build job three times in the past week, for a job that runs twice per hour. So, roughly, about 1% of the time. I haven't been able to reliably reproduce the error, nor do we run these jobs with Go tip.
Unfortunately, this is happening with a piece of internal end-to-end testing, so its source and build jobs are not public.
Here is the log, since it doesn't contain any sensitive info:
The `gopath` directory in question is cached between builds. The way we do that is by atomically storing a `tar.zst` archive of the `$HOME/.cache/` directory at the end of a successful build, and extracting it at the start.

It should be noted that this `go test` docker container does not share any volumes with other docker containers, e.g. other concurrent `go test` commands. Because of how this CI system is designed, `$HOME` is a volume, because it needs to persist between build steps. Perhaps this affects how the filesystem works, since `$GOPATH` is under it.

I tried to do some debugging, but failed to find the cause so far. Here is a summary:

- `/root/openbank-services/.cache/gopath/pkg/mod/github.com/gogo/protobuf@v1.2.2-0.20190723190241-65acae22fc9d` exists and looks correct. Though this might be a newer archive.
- `src/cmd/go/internal/modfetch/fetch.go`, and the locking and renaming of the directory looks non-racy to me.
- `fetch.go` would error immediately if locking wasn't supported, instead of silently using no locking.
- `err == nil && fi.IsDir()` and then just `os.Rename`. But I guess this scenario would mean that `$GOPATH` got corrupted.

I'd be surprised if our setup was to blame, because another of our CI pipelines does run many `cmd/go` commands concurrently with shared `$GOPATH` and `$GOCACHE` via the same volume setup. We've run thousands of those jobs in the past month alone, and I don't recall a single error like this.

/cc @bcmills @jayconrod