Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/race: potential false positive when synchronizing with unix.Flock() #50139

Closed
iamcalledrob opened this issue Dec 13, 2021 · 5 comments
Closed

Comments

@iamcalledrob
Copy link

iamcalledrob commented Dec 13, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17 darwin/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/rob/Library/Caches/go-build"
GOENV="/Users/rob/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/rob/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/rob/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.17"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/rob/Code/flockRaceExample/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/3l/mj1p0_6x3dnb850mkd9ph8b00000gn/T/go-build4246288242=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

  1. In multiple goroutines, get a unique file descriptor using unix.Open("/path/to/file",...)
  2. In multiple goroutines, concurrently attempt to acquire an exclusive lock via unix.Flock(fd, unix.LOCK_EX)
  3. A single goroutine acquires the lock, unblocks, then prints Entering critical section. It then mutates a variable, prints Exited critical section and unlocks (by closing via unix.Close(fd))
  4. The next goroutine then acquires the lock and does the same thing (and so on)

Full test-case here: https://go.dev/play/p/5CY5B1q9WUf
Note: doesn't run in go.dev/play due to needing the race flag. Repros for me in darwin/arm64 and linux/arm64

Oddities:

  1. The race detector will reliably trip (or not trip) based on whether unrelated file actions happen in the critical sections (notably, writing to os.Stdout / os.StdErr). If no writes to Stdout/Stderr occur, or an unmatched pair of writes occur (e.g. Stdout + Stderr), the race detector doesn't trip. If a single write to either Stdout or Stderr occurs OR a matched pair of writes occurs, the detector trips when mutating the variable. I discovered this whilst debugging since log.Printf() writes to Stderr and fmt.Printf() writes to Stdout.
  2. Stepping through in the debugger with 2 goroutines (iters=2) shows that the goroutines are not in the critical section concurrently. The second goroutine only unblocks after the first calls unix.Close(fd) (i.e. after the critical section has been left), and a data race is then detected once it reaches line 33 (testVar++).

What did you expect to see?

11:09:58 Entering critical section (fd 3)
11:09:58 Exited critical section (fd 3)
11:09:59 Entering critical section (fd 4)
11:09:59 Exited critical section (fd 4)

Process finished with the exit code 0

What did you see instead?

11:11:59 Entering critical section (fd 3)
11:11:59 Exited critical section (fd 3)
11:12:00 Entering critical section (fd 4)
==================
WARNING: DATA RACE
Read at 0x00c00013c018 by goroutine 8:
  main.main.func1()
      /Users/rob/Code/flockRaceExample/main.go:33 +0x1b4

Previous write at 0x00c00013c018 by goroutine 7:
  main.main.func1()
      /Users/rob/Code/flockRaceExample/main.go:33 +0x1c8

Goroutine 8 (running) created at:
  main.main()
      /Users/rob/Code/flockRaceExample/main.go:49 +0x200

Goroutine 7 (finished) created at:
  main.main()
      /Users/rob/Code/flockRaceExample/main.go:49 +0x200
==================
11:12:00 Exited critical section (fd 4)
Found 1 data race(s)

Process finished with the exit code 66

Is it possible that the synchronization that happens via Flock() is invisible to the race detector, and this is a false positive? I found a potentially related discussion here: #5363 (comment)

@prattmic
Copy link
Member

I believe your analysis here is correct. We do not instrument Flock for the race detector, so it does not treat it as a barrier. Note that we do instrument syscall.Read/Write/Readv/etc, but not any other direct syscalls.

cc @dvyukov

Duplicate of #5363.

@randall77
Copy link
Contributor

Is it possible that the synchronization that happens via Flock() is invisible to the race detector, and this is a false positive?

That sounds about right.

I think this would be hard to fix, though. How would you know that two Flock calls are actually operating on the same lock? You'd need to map from the file descriptor to the inode+device and use that as the lock identifier.

@prattmic
Copy link
Member

I think this would be hard to fix, though. How would you know that two Flock calls are actually operating on the same lock? You'd need to map from the file descriptor to the inode+device and use that as the lock identifier.

syscall.Read et. al seem to have the same problem, as all calls share the same "lock word": https://cs.opensource.google/go/go/+/master:src/syscall/syscall_unix.go;l=194;drc=master;bpv=1;bpt=1. Thus false negatives are quite possible.

@bcmills
Copy link
Contributor

bcmills commented Dec 13, 2021

I don't think Go memory model specifies unix.Flock as a synchronization point — the race detector really is correct, in that the compiler is not obligated to flush writes from one goroutine before releasing a file lock that is subsequently acquired by a different goroutine.

If you are using Flock for mutual exclusion within a process, I suggest that you pair it with a Go synchronization primitive (such as a channel or a sync.Mutex). For example, see the pairing implemented in cmd/go/internal/lockedfile.Mutex:
https://cs.opensource.google/go/go/+/master:src/cmd/go/internal/lockedfile/mutex.go;drc=8c896aa46655cfbdfd9971fb16a830046fcdf81c

If you need to support arbitrary file paths, you could pair that with, say, a sync.Map mapping each filename to the corresponding Mutex-like object.

@iamcalledrob
Copy link
Author

Thanks everyone for the extremely helpful context. The race condition makes sense—I'd never considered that writes may not occur immediately.

Hopefully the details in this issue will be useful for others in future.

@golang golang locked and limited conversation to collaborators Dec 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants