Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: Read of a device driver fails only with Go 1.20 #60211

Closed
joefowler opened this issue May 15, 2023 · 14 comments
Closed

os: Read of a device driver fails only with Go 1.20 #60211

joefowler opened this issue May 15, 2023 · 14 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@joefowler
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.20.4 linux/amd64

Does this issue reproduce with the latest release?

This issue reproduces ONLY with Go 1.20 (I have tried 1.20.3 and 1.20.4), and does not arise on 1.19 or earlier.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/pcuser/.cache/go-build"
GOENV="/home/pcuser/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/pcuser/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/pcuser/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.20"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.20/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.4"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/pcuser/go/src/github.com/usnistgov/dastard/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3739805746=/tmp/go-build -gno-record-gcc-switches"
GOROOT/bin/go version: go version go1.20.4 linux/arm64
GOROOT/bin/go tool compile -V: compile version go1.20.4
uname -v: #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023

What did you do?

A colleague and I run a large project that has to control and receive data from a special-purpose PCI-express card, running custom firmware. This is a data acquisition system for high-speed x-ray and gamma-ray detectors, arrays of superconducting sensors (just for context). It’s worked extremely well since we launched the Go program in 2017 (as a replacement for a C++ monstrosity).

We talk to this PCIe card by opening/closing and reading/writing device-special files provided (obviously) by a device driver. There are control registers for configuration and a scatter-gather DMA for transferring the high-speed data (typically 20 to 200 MB/second, depending on one’s instrument configuration) to the computer RAM.

Code hangs at the unix.Read call in the following, if running Go 1.20:

	gobuffer := C.GoBytes(unsafe.Pointer(buffer), C.int(bufferLength))
	n, err := unix.Read(int(fd), gobuffer)

Here

  • buffer *C.char is a buffer allocated in C code by posix_memalign
  • bufferLength is 33554432 (that is, 32 MB or 0x1000000)
  • unix refers to package "golang.org/x/sys/unix"
  • fd is the descriptor of the open device special file to be read.

What did you expect to see?

Go 1.16, 1.17, 1.18, and 1.19 all run our program just fine. The device is read successfully, returning high speed data to the Go program. I expected to see the same when switching to Go 1.20.

What did you see instead?

When I build the program with Go 1.20 the program hangs (0% CPU load) at the point where the program tries to perform the system call to the unix.Read(...) function on the device file. The build succeeds, and the configuration steps (appear to) work correctly at run time--these steps set up the scatter-gather DMA cycle. When we try to read from the DMA buffer the first time, the program hangs, and only in Go 1.20!

I’m afraid I cannot provide a minimum reproducible example, owing to the fact that you’d need our specific hardware (running our specific firmware) and the corresponding device drivers. I am very sorry I can’t offer a demonstration in the playground. I realize this issue might be something no one but me can diagnose.

Still, maybe someone can offer ideas? Is there something special I should know about Go 1.20 that might help me track down the problem? I’ve read the 1.20 release notes a dozen times, but maybe I’m missing the significance of the key point in there? Or did something in cgo actually break in Go 1.20?

The info above says I'm using Ubuntu 22.04 with Go 1.20.4. The same problem has also been noted on a different PC running Ubuntu 20.04, and on both computers with Go 1.20.3. I have not tried earlier 1.20.x releases.

I checked with the Go Forum, without success.

@randall77
Copy link
Contributor

Definitely strange.

Some things to try:

  • run with strace and check that the system call trace looks reasonable. See if the read call is at the end. See if the trace differs materially from 1.19 to 1.20.
  • Try writing the entire range of buffer just before calling Read. That will rule out various page permission issues.
  • Are you doing anything with signals? Maybe a signal the runtime needs (SIGURG, maybe others) are blocked? (Not sure why this would be different between 1.19 and 1.20.)

@joefowler
Copy link
Author

@randall77 : Thanks for these good ideas. I'll look into them.

Regarding signals, I am definitely redirectingos.Interrupt to ensure that a Ctrl-C results in an orderly shutdown. Doesn't seem like that ought to mess up any other signals, but I could certainly check. Maybe I could stop intercepting the SIGINT and see if anything changes.

@ianlancetaylor
Copy link
Contributor

Where do you get the fd value that you pass to unix.Read? Do you get it from unix.Open or are you using the os package?

If the latter, I wonder if this could be caused by https://go.dev/cl/420334, though I don't immediately see how.

@joefowler
Copy link
Author

@ianlancetaylor Thanks. Interesting thought. I am opening this SGDMA device through os.OpenFile("/dev/whatever_sgdma") and getting the fd value from that, and I am setting the O_NONBLOCK flag. Specifically:

	if dev.FileSGDMA, err = os.OpenFile(fname("sgdma"),
		os.O_RDWR|syscall.O_NONBLOCK, 0666); err != nil {
		dev.Close()
		return nil, err
	}

Hmm. Isn't syscall deprecated?

The changelist you pointed to could possibly be relevant, yes. I'll experiment with other ways of opening, what happens if I don't set it as non-blocking, etc., and see whether I learn anything....

@ianlancetaylor
Copy link
Contributor

This program will print true in Go 1.19, false in Go 1.20. This is likely the problem you are encountering.

package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
)

func main() {
	f, err := os.OpenFile("/home/iant/hello.go", os.O_RDWR|syscall.O_NONBLOCK, 0666)
	if err != nil {
		log.Fatal(err)
	}
	fd := f.Fd()
	flag, _, e := syscall.Syscall(syscall.SYS_FCNTL, fd, syscall.F_GETFL, 0)
	if e != 0 {
		log.Fatal(e)
	}
	fmt.Printf("nonblocking: %t\n", flag&syscall.O_NONBLOCK != 0)
}

@ianlancetaylor
Copy link
Contributor

@gopherbot Please open backport for 1.20.

This is an edge case that broke in the 1.20 release.

@ianlancetaylor ianlancetaylor changed the title Read of a device driver fails only with Go 1.20 os: Read of a device driver fails only with Go 1.20 May 16, 2023
@ianlancetaylor ianlancetaylor added the NeedsFix The path to resolution is known, but the work has not been done. label May 16, 2023
@ianlancetaylor ianlancetaylor added this to the Go1.21 milestone May 16, 2023
@gopherbot
Copy link

Backport issue(s) opened: #60217 (for 1.20).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link

Change https://go.dev/cl/495079 mentions this issue: os: if descriptor is non-blocking, retain that in Fd method

@joefowler
Copy link
Author

@ianlancetaylor Great detective work, my friend! I am amazed that I found an actual bug in Go.

I can verify that my use case is fixed by adding the following lines after I do the os.OpenFile(...):

	fd := dev.FileSGDMA.Fd()
	flag, _, e := syscall.Syscall(syscall.SYS_FCNTL, fd, syscall.F_GETFL, 0)
	if e != 0 {
		panic(e)
	}
	if flag&unix.O_NONBLOCK == 0 {
		fmt.Printf("Boo!    The NONBLOCK flag is unset (flags=0x%x)\n", flag)
		if e := syscall.SetNonblock(int(fd), true); err != nil {
			panic(e)
		}
		flag, _, e = syscall.Syscall(syscall.SYS_FCNTL, fd, syscall.F_GETFL, 0)
		if e != 0 {
			panic(e)
		}
		if flag&unix.O_NONBLOCK == 0 {
			fmt.Printf("Boo!    The NONBLOCK flag is STILL unset (flags=0x%x)\n", flag)
		} else {
			fmt.Printf("Hooray! The NONBLOCK flag is set now (flags=0x%x)\n", flag)
		}
	}

When I run my program with this addition, I get

Boo!    The NONBLOCK flag is unset (flags=0x8002)
Hooray! The NONBLOCK flag is set now (flags=0x8802)

...and more importantly, the program proceeds to operate correctly. Thanks, team!

@bcmills
Copy link
Contributor

bcmills commented May 17, 2023

Just checking my understanding: is this the same behavior we were discussing in #58408 (comment)?

@ianlancetaylor
Copy link
Contributor

It's the same behavior in a different form. Your comment in #58408 is about passing a non-blocking file descriptor to os.NewFile. This issue is about calling os.OpenFile and passing syscall.O_NONBLOCK.

@gopherbot
Copy link

Change https://go.dev/cl/496015 mentions this issue: [release-branch.go1.20] os: if descriptor is non-blocking, retain that in Fd method

gopherbot pushed a commit that referenced this issue May 18, 2023
…t in Fd method

For #58408
For #60211
Fixes #60217

Change-Id: I30f5678b46e15121865b19d1c0f82698493fad4e
Reviewed-on: https://go-review.googlesource.com/c/go/+/495079
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
(cherry picked from commit f777726)
Reviewed-on: https://go-review.googlesource.com/c/go/+/496015
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Heschi Kreinick <heschi@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/496715 mentions this issue: [release-branch.go1.20] net, os: net.Conn.File.Fd should return a blocking descriptor

gopherbot pushed a commit that referenced this issue May 23, 2023
…cking descriptor

Historically net.Conn.File.Fd has returned a descriptor in blocking mode.
That was broken by CL 495079, which changed the behavior for os.OpenFile
and os.NewFile without intending to affect net.Conn.File.Fd.
Use a hidden os entry point to preserve the historical behavior,
to ensure backward compatibility.

For #58408
For #60211
For #60217

Change-Id: I8d14b9296070ddd52bb8940cb88c6a8b2dc28c27
Reviewed-on: https://go-review.googlesource.com/c/go/+/496080
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
(cherry picked from commit b950cc8)
Reviewed-on: https://go-review.googlesource.com/c/go/+/496715
bradfitz pushed a commit to tailscale/go that referenced this issue May 25, 2023
…t in Fd method

For golang#58408
For golang#60211
Fixes golang#60217

Change-Id: I30f5678b46e15121865b19d1c0f82698493fad4e
Reviewed-on: https://go-review.googlesource.com/c/go/+/495079
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
(cherry picked from commit f777726)
Reviewed-on: https://go-review.googlesource.com/c/go/+/496015
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Heschi Kreinick <heschi@google.com>
bradfitz pushed a commit to tailscale/go that referenced this issue May 25, 2023
…cking descriptor

Historically net.Conn.File.Fd has returned a descriptor in blocking mode.
That was broken by CL 495079, which changed the behavior for os.OpenFile
and os.NewFile without intending to affect net.Conn.File.Fd.
Use a hidden os entry point to preserve the historical behavior,
to ensure backward compatibility.

For golang#58408
For golang#60211
For golang#60217

Change-Id: I8d14b9296070ddd52bb8940cb88c6a8b2dc28c27
Reviewed-on: https://go-review.googlesource.com/c/go/+/496080
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
(cherry picked from commit b950cc8)
Reviewed-on: https://go-review.googlesource.com/c/go/+/496715
bradfitz pushed a commit to tailscale/go that referenced this issue May 25, 2023
…t in Fd method

For golang#58408
For golang#60211
Fixes golang#60217

Change-Id: I30f5678b46e15121865b19d1c0f82698493fad4e
Reviewed-on: https://go-review.googlesource.com/c/go/+/495079
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
(cherry picked from commit f777726)
Reviewed-on: https://go-review.googlesource.com/c/go/+/496015
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Heschi Kreinick <heschi@google.com>
bradfitz pushed a commit to tailscale/go that referenced this issue May 25, 2023
…cking descriptor

Historically net.Conn.File.Fd has returned a descriptor in blocking mode.
That was broken by CL 495079, which changed the behavior for os.OpenFile
and os.NewFile without intending to affect net.Conn.File.Fd.
Use a hidden os entry point to preserve the historical behavior,
to ensure backward compatibility.

For golang#58408
For golang#60211
For golang#60217

Change-Id: I8d14b9296070ddd52bb8940cb88c6a8b2dc28c27
Reviewed-on: https://go-review.googlesource.com/c/go/+/496080
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
(cherry picked from commit b950cc8)
Reviewed-on: https://go-review.googlesource.com/c/go/+/496715
@gopherbot
Copy link

Change https://go.dev/cl/501699 mentions this issue: doc/go1.21: mention NewFile on non-blocking descriptor

gopherbot pushed a commit that referenced this issue Jun 8, 2023
The returned descriptor now remains in non-blocking mode.

For #58408
For #60211

Change-Id: I88d33c180db642d055b4fed3b03a9afa02e746bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/501699
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Bypass: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants