Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: apparent file/directory race in Go 1.15.7 on windows amd64 #44137

Closed
myitcv opened this issue Feb 6, 2021 · 5 comments
Closed

cmd/go: apparent file/directory race in Go 1.15.7 on windows amd64 #44137

myitcv opened this issue Feb 6, 2021 · 5 comments
Labels
FrozenDueToAge GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@myitcv
Copy link
Member

myitcv commented Feb 6, 2021

What version of Go are you using (go version)?

$ go version
go version go1.15.7 windows/amd64

Does this issue reproduce with the latest release?

Unclear.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
  set GO111MODULE=
  set GOARCH=amd64
  set GOBIN=
  set GOCACHE=C:\Users\runneradmin\AppData\Local\go-build
  set GOENV=C:\Users\runneradmin\AppData\Roaming\go\env
  set GOEXE=.exe
  set GOFLAGS=
  set GOHOSTARCH=amd64
  set GOHOSTOS=windows
  set GOINSECURE=
  set GOMODCACHE=C:\Users\runneradmin\go\pkg\mod
  set GONOPROXY=
  set GONOSUMDB=
  set GOOS=windows
  set GOPATH=C:\Users\runneradmin\go
  set GOPRIVATE=
  set GOPROXY=https://proxy.golang.org,direct
  set GOROOT=C:\hostedtoolcache\windows\go\1.15.7\x64
  set GOSUMDB=sum.golang.org
  set GOTMPDIR=
  set GOTOOLDIR=C:\hostedtoolcache\windows\go\1.15.7\x64\pkg\tool\windows_amd64
  set GCCGO=gccgo
  set AR=ar
  set CC=gcc
  set CXX=g++
  set CGO_ENABLED=1
  set GOMOD=
  set CGO_CFLAGS=-g -O2
  set CGO_CPPFLAGS=
  set CGO_CXXFLAGS=-g -O2
  set CGO_FFLAGS=-g -O2
  set CGO_LDFLAGS=-g -O2
  set PKG_CONFIG=pkg-config
  set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\RUNNER~1\AppData\Local\Temp\go-build637926094=/tmp/go-build -gno-record-gcc-switches

What did you do?

We have seen three recent build flakes in the CUE project where Go 1.15.7 is running atop a windows-2019 GitHub actions environment machine:

The failure comes in a test that uses cmd/go via go/packages, and results in the following:

go proxy: no archive example.com v1.0.0
--- FAIL: TestScript (0.21s)
    --- FAIL: TestScript/get_go_types (4.21s)
        testscript.go:382: 
            # Test that a get go works for various "interesting" types,
            # including those special cases that result in _ and string
            # that are specifically mentioned in help get go.
            #
            # This test does not exercise with/without --local. Instead
            # that is left for other tests, so that this test can focus
            # on the various rules of get go.
            # cue get go (3.740s)
            > cue get go --local ./...
            [stderr]
            could not load Go packages:
            	./...: [-: pattern ./...: stat $WORK\tmp\go-build655607887: directory not found]
            [exit status 1]
            FAIL: testdata\script\get_go_types.txt:10: unexpected command failure

This looks like some sort of file/directory race.

I realise there isn't a whole load of detail here, but the flakes appear to be entirely random and as such we can't reproduce. As we find more information I will of course share it here.

What did you expect to see?

A passing test

What did you see instead?

The above.

cc @bcmills @jayconrod

@myitcv myitcv added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. GoCommand cmd/go labels Feb 6, 2021
@dmitshur dmitshur added this to the Backlog milestone Feb 6, 2021
@bcmills
Copy link
Contributor

bcmills commented Feb 8, 2021

It would be helpful to know which layer produced that error — did it originate from go list, or from go/packages itself?

With what arguments — and what directory contents — does cue get go --local invoke go/packages?

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Feb 8, 2021
@myitcv
Copy link
Member Author

myitcv commented Feb 10, 2021

Apologies, that would have been sensible information to include in the first place. That was sloppy of me.

We're doing a plain go/packages.Load here:

https://github.com/cuelang/cue/blob/fdccfbc52e248257d2ea20de1b2e0ed8589c3c8a/cmd/cue/cmd/get_go.go#L398

with a ./... pattern, run from the root of this txtar:

https://github.com/cuelang/cue/blob/fdccfbc52e248257d2ea20de1b2e0ed8589c3c8a/cmd/cue/cmd/testdata/script/get_go_types.txt

did it originate from go list, or from go/packages itself?

That I don't know I'm afraid.

FWIW we're using golang.org/x/tools v0.0.0-20200612220849-54c614fe050c.

@bcmills bcmills removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Feb 10, 2021
@bcmills
Copy link
Contributor

bcmills commented Feb 10, 2021

This does indeed look like a race somewhere, but it's hard to tell in what layer.
(Is it in cmd/go or go/packages — CC @matloob — or in cue get or @rogpeppe's testscript fork?)

We've seen enough weird filesystem interactions on Windows that I wouldn't be surprised by any of those, but I don't think we have the bandwidth to go hunting for the problem without a more isolated reproducer.

@bcmills bcmills modified the milestones: Backlog, Unplanned Feb 10, 2021
@myitcv
Copy link
Member Author

myitcv commented Feb 10, 2021

or in cue get or @rogpeppe's testscript fork

This I think unlikely, because we get this fail during the running of a test.

We've seen enough weird filesystem interactions on Windows that I wouldn't be surprised by any of those, but I don't think we have the bandwidth to go hunting for the problem without a more isolated reproducer.

Understood. I was raising this morning in the hope that if someone else saw it, combined experience/reports might point to the problem. It's certainly not a critical problem for us, just something I spotted via a number of CI fails.

@myitcv
Copy link
Member Author

myitcv commented Mar 12, 2021

Unsurprisingly, this is actually a problem within our test setup.

The clue is actually in the use of the package pattern ./.... In a standard testscript setup, a temporary directory is established at $WORK/tmp. All files in a testscript archive are extracted to $WORK. Most testscripts, including this one, run commands from the root of a test, i.e. $WORK. In this case, $WORK/tmp sits within the CUE module, and hence ./... matches $WORK/tmp. Except that directories within $WORK/tmp come and go as go/packages does its thing. Hence why we see this apparent race - it's happening during the walk of directories to find packages that match ./....

The solution here is to establish a temp dir (and home dir for that matter) in directories within $WORK that are not walked by ./...; in this case we use . as a prefix to the directory name.

@myitcv myitcv closed this as completed Mar 12, 2021
cueckoo pushed a commit to cuelang/cue that referenced this issue Mar 12, 2021
We have recently been seeing a number of test flakes in the
get_go_types.txt which were ultimately reported as golang/go#44137.

However, it turns out this is actually caused by an interaction between
the ./... pattern and use of go/packages within a testscript test.

In a standard testscript setup, a temporary directory is established at
$WORK/tmp. All files in a testscript archive are extracted to $WORK.
Most testscripts, including get_go_types.txt, run commands from the
root, i.e. $WORK, for convenience. In this case, $WORK/tmp sits within
the Go module, and hence ./... matches $WORK/tmp. Except that
directories within $WORK/tmp come and go as go/packages does its thing.
Hence why we see this apparent race - it's happening during the walk of
directories to find packages that match ./...

Resolve this by hiding temp and home dirs so that they are never walked
for the ./... pattern by either cmd/go or cmd/cue.

Change-Id: Ia8c7ffd471f9879d3283a47db5f2a5e165f619e1
cueckoo pushed a commit to cuelang/cue that referenced this issue Mar 12, 2021
We have recently been seeing a number of test flakes in the
get_go_types.txt which were ultimately reported as golang/go#44137.

However, it turns out this is actually caused by an interaction between
the ./... pattern and use of go/packages within a testscript test,
specifically go/packages (via cmd/go) writing temporary files and
directories to $WORK/tmp which then end up racing with the walk of $WORK
to find directories/packages that match ./...

As of v1.8.0, in a standard testscript setup a temporary directory is
established at $WORK/.tmp. We now follow that same pattern for the home
directory for consistency's sake.

Change-Id: Ia8c7ffd471f9879d3283a47db5f2a5e165f619e1
cueckoo pushed a commit to cuelang/cue that referenced this issue Mar 12, 2021
We have recently been seeing a number of test flakes in the
get_go_types.txt which were ultimately reported as golang/go#44137.

However, it turns out this is actually caused by an interaction between
the ./... pattern and use of go/packages within a testscript test.

In a standard testscript setup, a temporary directory is established at
$WORK/tmp. All files in a testscript archive are extracted to $WORK.
Most testscripts, including get_go_types.txt, run commands from the
root, i.e. $WORK, for convenience. In this case, $WORK/tmp sits within
the Go module, and hence ./... matches $WORK/tmp. Except that
directories within $WORK/tmp come and go as go/packages does its thing.
Hence why we see this apparent race - it's happening during the walk of
directories to find packages that match ./...

Resolve this by hiding temp and home dirs so that they are never walked
for the ./... pattern by either cmd/go or cmd/cue.

Change-Id: Ia8c7ffd471f9879d3283a47db5f2a5e165f619e1
cueckoo pushed a commit to cuelang/cue that referenced this issue Mar 12, 2021
We have recently been seeing a number of test flakes in the
get_go_types.txt which were ultimately reported as golang/go#44137.

However, it turns out this is actually caused by an interaction between
the ./... pattern and use of go/packages within a testscript test,
specifically go/packages (via cmd/go) writing temporary files and
directories to $WORK/tmp which then end up racing with the walk of $WORK
to find directories/packages that match ./...

As of v1.8.0, in a standard testscript setup a temporary directory is
established at $WORK/.tmp. We now follow that same pattern for the home
directory for consistency's sake.

Change-Id: Ia8c7ffd471f9879d3283a47db5f2a5e165f619e1
Reviewed-on: https://cue-review.googlesource.com/c/cue/+/9002
Reviewed-by: CUE cueckoo <cueckoo@gmail.com>
Reviewed-by: Paul Jolly <paul@myitcv.org.uk>
@golang golang locked and limited conversation to collaborators Mar 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants