Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/sys/windows Crash in ReadFile() when race enabled #65365

Open
derekwbrown opened this issue Jan 30, 2024 · 10 comments
Open

x/sys/windows Crash in ReadFile() when race enabled #65365

derekwbrown opened this issue Jan 30, 2024 · 10 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done. OS-Windows
Milestone

Comments

@derekwbrown
Copy link

Go version

go 1.21.5

Output of go env in your module/workspace:

set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\db\AppData\Local\go-build
set GOENV=C:\Users\db\AppData\Roaming\go\env
set GOEXE=.exe
set GOEXPERIMENT=
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=c:\go\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=c:\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=c:\go\1.21.5\go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLCHAIN=auto
set GOTOOLDIR=c:\go\1.21.5\go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.21.5
set GCCGO=gccgo
set GOAMD64=v1
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=C:\src\datadog-windows-procmon\go.mod
set GOWORK=
set CGO_CFLAGS=-O2 -g
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-O2 -g
set CGO_FFLAGS=-O2 -g
set CGO_LDFLAGS=-O2 -g
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=C:\Users\db\AppData\Local\Temp\go-build3031818741=/tmp/go-build -gno-record-gcc-switches

What did you do?

I have windows/go code which crashes unexpectedly.
Specifically, for ReadFile() and WriteFile() any call crashes when the third parameter done is Nil, which is a legal value, when raceenabled


func ReadFile(fd Handle, p []byte, done *uint32, overlapped *Overlapped) error {
	err := readFile(fd, p, done, overlapped)
	if raceenabled {
		if *done > 0 {
			raceWriteRange(unsafe.Pointer(&p[0]), int(*done))
		}
		raceAcquire(unsafe.Pointer(&ioSync))
	}
	return err
}

What did you see happen?

Code crashes when race is enabled and done is nil.

What did you expect to see?

I would expect it to not crash.
the done pointer may be nil when initiating an overlapped operation; in this case the done value isn't filled in until the operation completes (by calling GetQueuedCompletionStatus().

@gopherbot gopherbot added this to the Unreleased milestone Jan 30, 2024
@qmuntal
Copy link
Contributor

qmuntal commented Jan 30, 2024

Thanks for reporting this bug @derekwbrown. ReadFile does define done as optional, so we should also support that.

@qmuntal qmuntal added the NeedsFix The path to resolution is known, but the work has not been done. label Jan 30, 2024
@gopherbot
Copy link

Change https://go.dev/cl/559375 mentions this issue: windows: support nil done parameter in ReadFile and WriteFile

@derekwbrown
Copy link
Author

@qmuntal the fix provided will fix the crash. Howver, it will break code that uses ReadFile() in this way. From the ReadFile() docs:

. Use NULL for this parameter if this is an asynchronous operation to avoid potentially erroneous results.

However, the code change is to simply pass lpNumberOfBytesRead as a pointer to a local variable.

@gopherbot
Copy link

Change https://go.dev/cl/559502 mentions this issue: Revert "windows: support nil done parameter in ReadFile and WriteFile"

gopherbot pushed a commit to golang/sys that referenced this issue Jan 30, 2024
This reverts CL 559375.

Reason for revert: introduced a different regression (golang/go#65378).

Fixes golang/go#65378.
Updates golang/go#65365.

Change-Id: Ie2a602415913b04b9d9b65fee5c6a54c0267b35e
Cq-Include-Trybots: luci.golang.try:x_sys-gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/sys/+/559502
Auto-Submit: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@qmuntal
Copy link
Contributor

qmuntal commented Jan 31, 2024

@qmuntal the fix provided will fix the crash. Howver, it will break code that uses ReadFile() in this way. From the ReadFile() docs:
. Use NULL for this parameter if this is an asynchronous operation to avoid potentially erroneous results.
However, the code change is to simply pass lpNumberOfBytesRead as a pointer to a local variable.

Thanks for catching this. Although it is true that the docs warns about passing a non-null lpNumberOfBytesRead for overlapped operations, it does not say that it can't be done, and CL 559375 is safe as per https://devblogs.microsoft.com/oldnewthing/20230215-00/?p=107832:

So it’s okay to pass a non-null lpNumberOfBytesRead to Read­File, even when issuing asynchronous I/O, provided that you do so into a different variable from the one that the completion routine uses.

What I want to avoid is users side stepping the race detector by just passing a nil done. A better solution that is more "safe" as per the docs could be this:

func ReadFile(fd Handle, p []byte, done *uint32, overlapped *Overlapped) error {
	var n uint32
        if done == nil && overlapped == nil {
                // Non-asynchronous case, done must be set.
                done = &n
        }
	err := readFile(fd, p, &n, overlapped)
	if raceenabled {
		if done != nil && *done > 0 {
			raceWriteRange(unsafe.Pointer(&p[0]), int(n))
		}
		raceAcquire(unsafe.Pointer(&ioSync))
	}
	return err
}

@derekwbrown
Copy link
Author

derekwbrown commented Jan 31, 2024

so this case

 if done == nil && overlapped == nil {

is explicitly an error:

This parameter can be NULL only when the lpOverlapped parameter is not NULL.

So, unless the caller is doing something blatantly wrong, then the assignment done = &n will never be called.
(and if the caller does this, then readFile will return with an error and no read will actually occur)

So, if overlapped is not nil, then it's OK (and in fact expected) that done is nil, and can/should proceed with calling readFile with no lplpNumberOfBytesRead param.

if overlapped is nil, then done must not be nil, and so everything's fine.

Given that, I think your suggestion above simplifies back to

func ReadFile(fd Handle, p []byte, done *uint32, overlapped *Overlapped) error {
   err := readFile(fd, p, done, overlapped)
	if raceenabled {
   	if done != nil && *done > 0 {
   		raceWriteRange(unsafe.Pointer(&p[0]), int(*done))
   	}
   	raceAcquire(unsafe.Pointer(&ioSync))
      }
	return err
}

(or similar; it's not clear to me whether the raceAcquire() needs to be called )

Full disclosure... I don't totally understand what the race detector is testing/how it's designed to work. But in general, overlapped I/O requires care (by the caller). If the caller initiates overlapped I/O, the caller guarantees the buffer to be valid and not used until the operation completes, via separate call to GetOverlappedResult() or GetQueuedCompletionStatus() So trying to do race detection on the buffer in the overlapped case doesn't seem valid, at least to my understanding.

@derekwbrown
Copy link
Author

@qmuntal any update?

@qmuntal
Copy link
Contributor

qmuntal commented Feb 13, 2024

So, unless the caller is doing something blatantly wrong, then the assignment done = &n will never be called.

Thanks for the explanation, I now see that my last proposal doesn't make much sense.

So trying to do race detection on the buffer in the overlapped case doesn't seem valid, at least to my understanding.

Agree. The problem is that ReadFile allows to pass an overlapped structure even if the file hasn't been opened with FILE_FLAG_OVERLAPPED, in which case nNumberOfBytesToRead can be set to null and it will do a synchronous read. There are some valid use cases to do so, in which the race detector should complain if there are races.

Given that it is not possible to directly know if the read/write will be asynchronous by just looking at the parameters, I propose another approach: if the error returned is ERROR_IO_PENDING, then we skip the race detector, as the nNumberOfBytesToRead can contain potentially erroneous results (quoting the docs). It would look like this:

func ReadFile(fd Handle, p []byte, done *uint32, overlapped *Overlapped) error {
	var n uint32
	err := readFile(fd, p, &n, overlapped)
	if done != nil {
		*done = n
	}
	if raceenabled {
		// If err is ERROR_IO_PENDING, the operation is is pending
		// completion asynchronously. n is not reliable and can't be
		// used for the race detector.
		if err != ERROR_IO_PENDING && n > 0 {
			raceWriteRange(unsafe.Pointer(&p[0]), int(n))
		}
		raceAcquire(unsafe.Pointer(&ioSync))
	}
	return err
}

func WriteFile(fd Handle, p []byte, done *uint32, overlapped *Overlapped) error {
	if raceenabled {
		raceReleaseMerge(unsafe.Pointer(&ioSync))
	}
	err := writeFile(fd, p, done, overlapped)
	// If err is ERROR_IO_PENDING, the operation is is pending
	// completion asynchronously. n is not reliable and can't be
	// used for the race detector.
	if raceenabled && err != ERROR_IO_PENDING && *done > 0 {
		raceReadRange(unsafe.Pointer(&p[0]), int(*done))
	}
	return err
}

What do you think?

@derekwbrown
Copy link
Author

derekwbrown commented Feb 20, 2024

I'm still not wild about the implementation changing the semantics of the call.
If I call ReadFile(f, buf, nil, &ol), I'm not expecting the implementation to change that (and whatever fun operating system stuff goes along with it). IIUC, providing overlapped means that the buffer shouldn't be touched until "later" (for some definition of later) and anything in done is of questionable value, and "potentially erroneous".

I would suggest


func ReadFile(fd, p, done, overlapped) {
    err := readFile(fd, p, done, overlapped) 
    if raceenabled {
        if err == nil && done && *done > 0 {
            raceWriteRange(...)
        }
        raceAcquire()
    }
    return err
}

(it's not clear to me if the done > 0 part of my conditional has any value)

In the event that overlapped is non-nil, the docs are quite explicit that done should be nil. So I really would not be in favor of the implementation changing the value of my parameter. That is, if I provide nil for the done parameter, I think it's really unexpected for the implementation to provide a non-nil value for that parameter.

it also seems like raceWriteRange should only ever be called when err == nil, because with any error (not just IO_PENDING), the buffer won't have been touched.

While this also adds additional semantics, you might even put in

if done && overlapped {
   return ERROR_INVALID_PARAMETER
}

Which is more aggressive than the underlying API, but would at least guarantee that (go) callers can't confuse the interface (and the race checker).

@qmuntal
Copy link
Contributor

qmuntal commented Feb 22, 2024

IIUC, providing overlapped means that the buffer shouldn't be touched until "later" (for some definition of later) and anything in done is of questionable value, and "potentially erroneous".

One can pass an overlapped object with synchronous files, in which case done contains a valid and useful value for the race detector (source):

Considerations for working with synchronous file handles:
...
If lpOverlapped is not NULL, the read operation starts at the offset that is specified in the OVERLAPPED structure and ReadFile does not return until the read operation is complete. The system updates the OVERLAPPED offset and the file pointer before ReadFile returns.

--

In the event that overlapped is non-nil, the docs are quite explicit that done should be nil.

It says "The lpNumberOfBytesRead parameter should be set to NULL", that's a recommendation, not an obligation, else it would say "must be set to NULL". AFAIU, our usage of lpNumberOfBytesRead is safe as per https://devblogs.microsoft.com/oldnewthing/20230215-00/?p=107832:

So it’s okay to pass a non-null lpNumberOfBytesRead to Read­File, even when issuing asynchronous I/O, provided that you do so into a different variable from the one that the completion routine uses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. OS-Windows
Projects
None yet
Development

No branches or pull requests

4 participants