Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: os: Goroutine-friendly waiting on processes #60481

Open
mitar opened this issue May 29, 2023 · 8 comments
Open

proposal: os: Goroutine-friendly waiting on processes #60481

mitar opened this issue May 29, 2023 · 8 comments
Labels
Milestone

Comments

@mitar
Copy link
Contributor

mitar commented May 29, 2023

Waiting on a sub-process in Go (or through Process.Wait or syscall.Wait4) does not align well with multi-threaded goroutine-based Go programs. At least on Linux they call into waitpid syscall (or one of related ones like waitid) which are designed that a process can call wait on a subprocess only once. While writing my init project in Go I realized this is a limitation which makes it hard to make whole program modular. You cannot have multiple different parts of a program observe status of subprocesses, e.g., one for logging what is happening with all subprocesses, another for spawning and waiting for programs, third for reaping unknown zombies who got reparented to the init process, and lastly if your init uses ptrace to inspect those unknown processes, you even hit bugs #60321 and in combination with reaping loop you have a problem that it might be the reaping loop which gets its wait woke up after you attach to the subprocess.

In short, currently having in a Go program just one process.Wait() and another syscall.Wait4(-1, ...) create an inherently racy execution which makes it hard to have a modular program. E.g., installing go-reaper package inherently breaks any process.Wait, sometimes, randomly, in hard to debug manner.

Because of this I propose that a new API is introduced which hides away the complexities of waiting on sub-processes in a similar way how os.signal package does for signal handling. Signal handling also has similar limitations at a low level, but because there is a Go abstraction which maps them into channels, any part of a program can hook into signals without interfering with the rest of the program. I propose the following API:

// WaitCode is enumeration of possible process states.
type WaitCode int

const (
	// Process has terminated normally.
	ProcessExited WaitCode = iota
	// Process was terminated by a signal.
	ProcessSignaled
	// Process was stopped by delivery of a signal, but not trapped.
	ProcessStopped
	// Process has stopped and trapped.
	ProcessTrapped
	// Child continued.
	ProcessContinued
	// Process was killed by a signal and dumped core.
	ProcessDumped
)

// WaitInfo provides information about the state of the process after waiting.
type WaitInfo struct {
	Id int // Pid or pidfd
	Status int
	Code   WaitCode
}

// Exited returns true if the process has terminated normally.
func (i WaitInfo) Exited() bool { return i.Code == ProcessExited }

// Signaled returns true if the process was terminated by a signal.
func (i WaitInfo) Signaled() bool { return i.Code == ProcessSignaled }

// Stopped returns true if the process was stopped by delivery of a signal, but not trapped.
func (i WaitInfo) Stopped() bool { return i.Code == ProcessStopped }

// Trapped returns true if the process has stopped and trapped.
func (i WaitInfo) Trapped() bool { return i.Code == ProcessTrapped }

// Continued returns true if the process has continued.
func (i WaitInfo) Continued() bool { return i.Code == ProcessContinued }

// CoreDump returns true if the process was killed by a signal and dumped core.
func (i WaitInfo) CoreDump() bool { return i.Code == ProcessDumped }

func (i WaitInfo) ExitStatus() int {
	if !i.Exited() {
		return -1
	}
	return i.Status
}

func (i WaitInfo) Signal() syscall.Signal {
	if !i.Signaled() {
		return -1
	}
	return syscall.Signal(i.Status)
}

func (i WaitInfo) StopSignal() syscall.Signal {
	if !i.Stopped() {
		return -1
	}
	return syscall.Signal(i.Status)
}

func (i WaitInfo) TrapCause() int {
	if !i.Trapped() {
		return -1
	}
	return i.Status
}

// Wait waits on the process with pid core one or more state transitions to those listed.
// Once transition to that state happens, a WaitInfo is sent to the provided channel.
// Wait can be called multiple times on the same process with an arbitrary combination
// of codes to wait on. When context is canceled, waiting gets canceled as well.
func Wait(ctx context.Context, pid int, c chan<- WaitInfo, code ...WaitCode) {

}

// WaitPidfd is similar to Wait, just that it operates on process file handles available on Linux.
func WaitPidfd(pidfd int, c chan<- WaitInfo, code ...WaitCode) {

}

// Waits on any process to transition to states listed. Once transition to that state happens,
// a WaitInfo is sent to the provided channel.Can be called multiple times with  an arbitrary
// combination of codes to wait on. When context is canceled, waiting gets canceled as well.
func WaitAny(c chan<- WaitInfo, code ...WaitCode) {

}

WaitPidfd is Linux specific. Maybe it could be extended to Windows process handles as well?

Current possible list of states/codes are inspired by unix. Not all of them will be relevant on other platforms. Maybe there are other states we should add other platforms have, but I am not familiar with them? I think possible states should be an union of what is available on supported platforms because the API is cross-platform.

I use context instead of explicit Stop like signal package has as I find it more modern and suitable for a new API. We could provide both or just Stop if others feel differently.

WaitInfo is inspired by WaitStatus and API is similar on purpose.

This has to be in the core Go library as os.exec and os.Process should use this internally (any anything else waiting on processes in stdlib), otherwise we didn't fix the main issue.

@mitar mitar added the Proposal label May 29, 2023
@gopherbot gopherbot added this to the Proposal milestone May 29, 2023
@thediveo
Copy link

Maybe because it is named "Go" it naturally is adverse to "Wait"...? 🤡

@ianlancetaylor
Copy link
Contributor

The os package is intended to be somewhat OS independent. Some aspects of this only seem to make sense on Unix systems.

It's not yet obvious to me why we need new API. Programs that need this level of control over subprocesses are likely system dependent and can use the syscall API. I don't see a reason to worry about conflicts between the syscall API and the os API; if you need the syscall API, use only the syscall API. Then you can take advantage of whatever facilities the operating system provides, and can produce the same channel-based API that you are describing here.

@mitar
Copy link
Contributor Author

mitar commented May 29, 2023

But I am not free to use whatever I want, because if I want to use os.exec package I am bound to what os.exec is using. Sure, I could reimplement os.exec with channel-based API but that looks to me like a heavy lift.

I really find here analogy with os.signal which uses channels and not simply a callback when signal arrives, and not a blocking call which waits for the signal to arrive. The channel makes the API composable. Wait lacks this.

@mitar
Copy link
Contributor Author

mitar commented May 30, 2023

Alternative could be that os.Process get a WaitFunc field which by default calls current wait implementation, but one can set it so that when you (or Cmd) calls Wait your implementation is called. This would allow one to swap current implementation with channel based one. But it is still not perfect because libraries wrapping os.Process which might not expose/propagate this new field/option would still interfere.

@thediveo
Copy link

But it is still not perfect because libraries wrapping os.Process which might not expose/propagate this new field/option would still interfere.

That is exactly the crucial point IMHO that the OP brings forward with his proposal. Regarding the "wrapping" I see problems with the many command and process-related packages already out there for some time and it might often need package updates deep down the chain to enable this feature. That's why I find the OP's proposal highly useful from my developer's consumer point of view.

@thediveo
Copy link

thediveo commented Jun 4, 2023

Ping?

@ianlancetaylor
Copy link
Contributor

@thediveo There are more than 200 open proposals, unfortunately. We will get to this one. Thanks.

@EdSchouten
Copy link

I'm dealing with the same issue in https://github.com/buildbarn/bb-remote-execution/. I currently need to wrap execution of all workers into tini, as a plain Go binary can't do any zombie process reaping. It would be nice if all Go binaries had some kind of dedicated goroutine that constantly called syscall.Wait4(-1), and used channels or something to route wait information to the appropriate part of your program.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

5 participants