Skip to content

os/exec: possible race handling stdout&stderr pipes #69060

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
rgooch opened this issue Aug 25, 2024 · 7 comments
Open

os/exec: possible race handling stdout&stderr pipes #69060

rgooch opened this issue Aug 25, 2024 · 7 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@rgooch
Copy link

rgooch commented Aug 25, 2024

Go version

go version go1.22.6 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/rgooch/.cache/go-build'
GOENV='/home/rgooch/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/rgooch/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/rgooch/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go1.22.6-amd64'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go1.22.6-amd64/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.6'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/rgooch/git/Dominator/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build438858212=/tmp/go-build -gno-record-gcc-switches'

What did you do?

stderr := &strings.Builder{}
stdout := &strings.Builder{}
cmd := exec.Command(path, args...)
cmd.Env = make([]string, 0)
cmd.Stderr = stderr
cmd.Stdout = stdout
if err := cmd.Start(); err != nil {
	return err
}
if err := cmd.Wait(); err != nil {
	return err
}

This captures the essence of what I'm doing when starting a process. Note that there are other goroutines, in particular a goroutine that is accepting HTTP CONNECT requests concurrently.
I have not yet been able to reproduce the problem with a minimal programme.

I've taken a look at the os/exec implementation and haven't yet seen a potential bug, but given the core of what I'm doing is so simple, I have to wonder if there is a bug lurking somewhere in the standard library. It may be a subtle interaction with other file descriptors being created concurrently.

What did you see happen?

The Start() succeeds, but sometimes the Wait() never returns, despite the process (and all its children) exiting.
When Wait() never returns, I was able to capture the file descriptors in the child process before it exited:

lr-x------ 1 0 0 64 0 -> /dev/null
l-wx------ 1 0 0 64 1 -> pipe:[61465]
l-wx------ 1 0 0 64 2 -> pipe:[61466]
lr-x------ 1 0 0 64 10 -> /tmp/test.sh

As you can see, stdout and stderr each have the write-side of their own pipe, as expected. The child process (and any dependents) definitely exit shortly afterwards; they do not appear in the process table.

Inside the parent process (the one using os/exec), these are the file descriptors:

lrwx------    1 0        0               64 Aug 25 03:41 0 -> /dev/console
lrwx------    1 0        0               64 Aug 25 03:41 1 -> /dev/console
lrwx------    1 0        0               64 Aug 25 03:42 10 -> /dev/pts/0
lrwx------    1 0        0               64 Aug 25 03:42 11 -> socket:[23564]=
lrwx------    1 0        0               64 Aug 25 03:41 2 -> /dev/console
lrwx------    1 0        0               64 Aug 25 03:41 3 -> socket:[15381]=
lrwx------    1 0        0               64 Aug 25 03:41 4 -> anon_inode:[eventpoll]
lr-x------    1 0        0               64 Aug 25 03:41 5 -> pipe:[61446]|
l-wx------    1 0        0               64 Aug 25 03:41 6 -> pipe:[61446]|
l-wx------    1 0        0               64 Aug 25 03:42 7 -> /tmp/log
lrwx------    1 0        0               64 Aug 25 03:42 8 -> socket:[61457]=
lrwx------    1 0        0               64 Aug 25 03:41 9 -> /dev/ptmx

Both the read and write side of the stderr pipe are open, and the stdout pipe is nowhere to be found.

What did you expect to see?

The Wait() should return as soon as the process exists.
Also, the read-side of the stdout and stderr pipes should be open in the parent process, but not the write-side.

@seankhliao
Copy link
Member

I think we'll need a reproducer to do anything about this report.

@seankhliao seankhliao added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 25, 2024
@ianlancetaylor
Copy link
Member

You didn't really say what the child process is doing, and in particular whether it might itself be spawing other processes.
See the discussion at https://pkg.go.dev/os/exec#Cmd.WaitDelay. Could this be related to that?

@rgooch
Copy link
Author

rgooch commented Aug 27, 2024

The subprocess is quite simple: it's a BusyBox shell (ash) script:

#! /tmp/bb/ash

set -eu

ls -Fl /proc/$$/fd > /tmp/test.fds

sleep 0.05

exit 0

I know about WaitDelay, but that should not be needed because the process and all children exit promptly.

@ianlancetaylor
Copy link
Member

Thanks.

Both the read and write side of the stderr pipe are open, and the stdout pipe is nowhere to be found.

I may be misreading, but it seems to me that the pipes in the child are 61465 and 61466, whereas the pipe in the parent is
61446, which is different from both. That is, the parent doesn't seem to have either of the pipes in the child, though it does have a different unrelated pipe.

If you send the hanging process a SIGQUIT, or press the ^\ keys, do you get a stack trace of where the program is hanging? It might help to know exactly which system call is not returning. Thanks.

@rgooch
Copy link
Author

rgooch commented Aug 27, 2024

Ah, good eye.
Here are extracts of the relevant stack trace:

goroutine 1 [chan receive, 1246 minutes]:
os/exec.(*Cmd).awaitGoroutines(0xc0003fc180, 0x0)
	/usr/local/go1.22.6-amd64/src/os/exec/exec.go:957 +0x3b1
os/exec.(*Cmd).Wait(0xc0003fc180)
	/usr/local/go1.22.6-amd64/src/os/exec/exec.go:924 +0x2d0

goroutine 181 [IO wait, 1246 minutes]:
internal/poll.runtime_pollWait(0x7fc1d4e7acc8, 0x72)
	/usr/local/go1.22.6-amd64/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0xc000690140, 0x72, 0x1)
	/usr/local/go1.22.6-amd64/src/internal/poll/fd_poll_runtime.go:84 +0xb1
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go1.22.6-amd64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000690120, {0xc00028c000, 0x8000, 0x8000})
	/usr/local/go1.22.6-amd64/src/internal/poll/fd_unix.go:164 +0x466
os.(*File).read(...)
	/usr/local/go1.22.6-amd64/src/os/file_posix.go:29
os.(*File).Read(0xc000398008, {0xc00028c000, 0x8000, 0x8000})
	/usr/local/go1.22.6-amd64/src/os/file.go:118 +0xad
io.copyBuffer({0xcde0a0, 0xc0000fa000}, {0xcde5c0, 0xc00048e010}, {0x0, 0x0, 0x0})
	/usr/local/go1.22.6-amd64/src/io/io.go:429 +0x29b
io.Copy(...)
	/usr/local/go1.22.6-amd64/src/io/io.go:388
os.genericWriteTo(0xc000398008, {0xcde0a0, 0xc0000fa000})
	/usr/local/go1.22.6-amd64/src/os/file.go:269 +0x70
os.(*File).WriteTo(0xc000398008, {0xcde0a0, 0xc0000fa000})
	/usr/local/go1.22.6-amd64/src/os/file.go:247 +0xd5
io.copyBuffer({0xcde0a0, 0xc0000fa000}, {0xcddf00, 0xc000398008}, {0x0, 0x0, 0x0})
	/usr/local/go1.22.6-amd64/src/io/io.go:411 +0xd4
io.Copy(...)
	/usr/local/go1.22.6-amd64/src/io/io.go:388
os/exec.(*Cmd).writerDescriptor.func1()
	/usr/local/go1.22.6-amd64/src/os/exec/exec.go:580 +0x5b
os/exec.(*Cmd).Start.func2(0xc0002640a0)
	/usr/local/go1.22.6-amd64/src/os/exec/exec.go:733 +0x3d
created by os/exec.(*Cmd).Start in goroutine 1
	/usr/local/go1.22.6-amd64/src/os/exec/exec.go:732 +0x11c5

goroutine 182 [IO wait, 1246 minutes]:
internal/poll.runtime_pollWait(0x7fc1d4e7aad8, 0x72)
	/usr/local/go1.22.6-amd64/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0xc000690200, 0x72, 0x1)
	/usr/local/go1.22.6-amd64/src/internal/poll/fd_poll_runtime.go:84 +0xb1
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go1.22.6-amd64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0006901e0, {0xc0001aa000, 0x8000, 0x8000})
	/usr/local/go1.22.6-amd64/src/internal/poll/fd_unix.go:164 +0x466
os.(*File).read(...)
	/usr/local/go1.22.6-amd64/src/os/file_posix.go:29
os.(*File).Read(0xc000398030, {0xc0001aa000, 0x8000, 0x8000})
	/usr/local/go1.22.6-amd64/src/os/file.go:118 +0xad
io.copyBuffer({0xcde0a0, 0xc0000fa160}, {0xcde5c0, 0xc000124000}, {0x0, 0x0, 0x0})
	/usr/local/go1.22.6-amd64/src/io/io.go:429 +0x29b
io.Copy(...)
	/usr/local/go1.22.6-amd64/src/io/io.go:388
os.genericWriteTo(0xc000398030, {0xcde0a0, 0xc0000fa160})
	/usr/local/go1.22.6-amd64/src/os/file.go:269 +0x70
os.(*File).WriteTo(0xc000398030, {0xcde0a0, 0xc0000fa160})
	/usr/local/go1.22.6-amd64/src/os/file.go:247 +0xd5
io.copyBuffer({0xcde0a0, 0xc0000fa160}, {0xcddf00, 0xc000398030}, {0x0, 0x0, 0x0})
	/usr/local/go1.22.6-amd64/src/io/io.go:411 +0xd4
io.Copy(...)
	/usr/local/go1.22.6-amd64/src/io/io.go:388
os/exec.(*Cmd).writerDescriptor.func1()
	/usr/local/go1.22.6-amd64/src/os/exec/exec.go:580 +0x5b
os/exec.(*Cmd).Start.func2(0xc0002640e0)
	/usr/local/go1.22.6-amd64/src/os/exec/exec.go:733 +0x3d
created by os/exec.(*Cmd).Start in goroutine 1
	/usr/local/go1.22.6-amd64/src/os/exec/exec.go:732 +0x11c5

@ianlancetaylor
Copy link
Member

Thanks. That definitely looks like the parent process is waiting for the pipes to be closed by the child. That is, the kind of thing affected by WaitDelay. I don't have an explanation for why the parent is still waiting, though.

@seankhliao seankhliao added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Sep 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants