os/exec: race on cmd.Wait() might lead to panic #28461

empijei · 2018-10-29T12:43:25Z

The current implementation of cmd.Wait has a race condition: if multiple goroutines call it, it might cause a panic.

In the first part of the method, copied below, two concurrent calls might both read c.finished as false, both set it to true, both invoke c.Process.Wait(), and both close c.waitDone before any error checking is performed. c.waitDone is a chan struct{}, and a double close will cause a panic.

func (c *Cmd) Wait() error {
	if c.Process == nil {
		return errors.New("exec: not started")
	}
	if c.finished {
		return errors.New("exec: Wait was already called")
	}
	c.finished = true

	state, err := c.Process.Wait()
	if c.waitDone != nil {
		close(c.waitDone)
	}
	//[...]
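The panic does not depend on os/exec at all: closing any channel twice panics at run time. The standalone sketch below reproduces what two racing Wait calls would do to c.waitDone and recovers the runtime error. closeTwice is an illustrative name, and the local waitDone only mirrors the field in the excerpt.

```go
package main

import "fmt"

// closeTwice closes a channel twice, as two unsynchronized Wait calls
// could, and returns the message of the recovered panic.
func closeTwice() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	waitDone := make(chan struct{})
	close(waitDone) // first Wait call
	close(waitDone) // second Wait call: panics
	return "no panic"
}

func main() {
	fmt.Println(closeTwice())
}
```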

Since waiting is a synchronization primitive, I'd expect one of two things:

- The documentation should state that this is not safe for concurrent use (probably the best approach here).
- Some form of synchronization should prevent races. I would either atomically CAS c.finished (but I'm not a fan of atomics, and it would require changing its type to some sort of int) or protect c with a mutex, which would be my suggested solution for this case.

I would happily send a CL for either option.


mvdan · 2018-10-29T13:55:56Z

I'm unclear as to whether every standard library function that isn't safe for concurrent use needs to be documented as such. For this particular case it seems fine to me, as it wasn't obvious from reading the godoc, and I can imagine someone making the mistake.

/cc @bradfitz @ianlancetaylor as per https://dev.golang.org/owners/ for a decision.

bradfitz · 2018-10-29T15:33:47Z

In general we only document the abnormal cases. The assumption when unstated is that things are not safe for concurrent use, that zero values are not usable, that implementations implement interfaces faithfully, and that only one of the return values is non-zero and meaningful.

That said, in this case I agree it would be a reasonable assumption for users to think that two concurrent Wait calls are valid, so I wouldn't object to docs or a fix here to allow concurrent calls.

ianlancetaylor · 2018-10-29T18:38:55Z

Fixing this to fully permit concurrent wait calls would require us to save the *ProcessState value, so let's not do that. But I think it would be nice to fix it so that all but one of the calls returns an error, if possible. We may be able to rely on the kernel returning ECHILD for all but the first call that it sees.

bradfitz · 2018-10-29T20:03:50Z

> Fixing this to fully permit concurrent wait calls would require us to save the *ProcessState value, so let's not do that.

Why do you dismiss that option? Just curious. It's not a huge struct, and the lifetime of *exec.Cmd isn't typically long. It's not usually retained anywhere.

ianlancetaylor · 2018-10-29T21:34:04Z

It just doesn't feel like the right approach to me. I'm open to counter-argument.

empijei · 2018-10-30T09:45:50Z

Thanks @bradfitz for clarifying the assumptions. I only pointed this out because this might be a peculiar case, given the synchronizing nature of the Wait method.

One more question: should the doc change go on Cmd (specifying that concurrent calls to its methods are not safe) or just on Wait?

As for fixing it, I agree with @ianlancetaylor. I think external processes should be treated as "outside of Go's boundaries". More specifically: if synchronization or orchestration is needed with respect to an external process, interactions with the process should be proxied by a goroutine, which would then use channels and other primitives in an idiomatic way. I wouldn't want to read Go code that waits on external processes directly instead of using a sync.WaitGroup or receiving from a chan.

bcmills · 2018-10-31T13:47:56Z

> Fixing this to fully permit concurrent wait calls would require us to save the *ProcessState value

I think I'm missing something: don't we already save the *ProcessState in the Cmd.ProcessState field?

ianlancetaylor · 2018-10-31T14:00:02Z

@bcmills In this issue we're talking about the os package, not the os/exec package.

bcmills · 2018-10-31T14:40:07Z

I'm still not sure that I follow. As @empijei notes, it seems like we could address the reported race with a mutex in *exec.Cmd, without touching the os package itself. I'm envisioning something like:

func (c *Cmd) Wait() error {
	if c.Process == nil {
		return errors.New("exec: not started")
	}

	// waitMu and waitErr would be new fields on Cmd: the mutex serializes
	// concurrent callers, and waitErr caches the first call's error.
	c.waitMu.Lock()
	defer c.waitMu.Unlock()

	if c.waitErr != nil {
		return c.waitErr
	}

	if c.ProcessState == nil {
		if c.finished {
			return errors.New("exec: ProcessState is nil after Wait was called")
		}
		// Record the state and error on c so later callers can reuse them.
		c.ProcessState, c.waitErr = c.Process.Wait()
		if c.waitDone != nil {
			close(c.waitDone)
		}
		for range c.goroutine {
			// c.waitErr == nil implies c.ProcessState is non-nil here.
			if err := <-c.errch; err != nil && c.waitErr == nil && c.ProcessState.Success() {
				c.waitErr = err
			}
		}
		c.closeDescriptors(c.closeAfterWait)
		c.finished = true
	}

	// Every caller, not just the first, sees the same result.
	if c.waitErr == nil && !c.ProcessState.Success() {
		return &ExitError{ProcessState: c.ProcessState}
	}
	return c.waitErr
}

ianlancetaylor · 2018-10-31T14:47:13Z

I suppose that I have mentally transmuted this issue to applying to os.(*Process).Wait rather than os/exec.(*Cmd).Wait. You're right that the original report is about os/exec, but it still sort of seems to me that we should fix the problem in the os package rather than only in the os/exec package.

bcmills · 2018-10-31T14:54:25Z

Maybe, although I think there is a qualitative difference: calling Wait manually on a *os.Process obtained from FindProcess or StartProcess seems like a power-user feature, whereas I see (*exec.Cmd).Wait pretty frequently in ordinary application code.

ianlancetaylor · 2018-10-31T15:06:31Z

Sure, agreed, but presumably if we fix the os package we will get the os/exec fix for free. So, why not?

bcmills · 2018-10-31T18:36:34Z

I don't think we can get the os/exec fix for free: even if we make c.Process.Wait safe to call concurrently, we'll still need to synchronize access to fields such as c.ProcessState, c.finished, and c.waitDone.

And then there is the question of whether Wait should be idempotent or should return an error for all but one call. You suggested the latter (in #28461 (comment)), but I would expect a method that allows concurrent (not just repeated) Wait calls to return the same result to all callers, just as (*sync.WaitGroup).Wait allows multiple goroutines to wait on the same set of events.

ianlancetaylor · 2018-10-31T19:29:26Z

The wait system call supports concurrent calls, but returns ECHILD to all but the first.

If we support concurrent os.(*Process).Wait calls, then Process must preserve the ProcessState. Which it could do.

rsc · 2018-11-14T18:38:34Z

Leaning toward making it safe for multiple goroutines to call both Waits, but for now let's leave things alone - don't document it, don't change it. It's been this way for 12 releases.

arxeiss · 2020-10-23T11:45:12Z

I know this is an old one, sorry. I also found it problematic to wait for a process to finish from multiple goroutines: only the first call returns the exit status, and later calls just return an error.

Here is a little workaround. It's not perfect, but it works:

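import (
	"os/exec"
	"sync"
)

// errorWrap holds the result of the first Wait call; a nil *errorWrap
// means Wait has not returned yet, which a bare nil error cannot express.
type errorWrap struct {
	err error
}

// Cmd embeds exec.Cmd and adds WaitTS, which is safe to call from
// multiple goroutines.
type Cmd struct {
	exec.Cmd
	waitLock sync.Mutex // zero value is ready to use, no initialization needed
	errWrap  *errorWrap
}

// WaitTS calls Wait at most once and returns the same error to every
// caller. Callers that arrive while Wait is running block on waitLock.
func (c *Cmd) WaitTS() error {
	c.waitLock.Lock()
	defer c.waitLock.Unlock()
	if c.errWrap == nil {
		c.errWrap = &errorWrap{
			err: c.Wait(),
		}
	}
	return c.errWrap.err
}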

shivamsouravjha · 2024-02-07T09:19:25Z

Hey folks!
Is there any update around this?

I am working on a project where my Go program starts a child application with cmd.Run() (cmd is an *exec.Cmd). When exiting the child application, I kill all of its PIDs, and to do that I check cmd.ProcessState.

The race detector then reports a data race between my code reading ProcessState and the write c.ProcessState = state inside Wait.

These two actions (checking the state and calling cmd.Run) happen in separate goroutines; the goroutine doing the cleanup checks ProcessState before killing the PIDs.

mvdan added the NeedsDecision label Oct 29, 2018

mvdan added this to the Go1.12 milestone Oct 29, 2018

rsc modified the milestones: Go1.12, Go1.13 Nov 14, 2018

mvdan mentioned this issue Feb 17, 2019: text/template: document concurrency properties for the instances returned by (*Template).New #30281 (Closed)

empijei mentioned this issue Mar 6, 2019: all: document the default doc assumptions around concurrency, zero values, interfaces and return values #30632 (Open)

andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019

rsc modified the milestones: Go1.14, Backlog Oct 9, 2019

edigaryev mentioned this issue Nov 17, 2020: Better child process handling cirruslabs/cirrus-ci-agent#23 (Closed)

bcmills mentioned this issue Apr 26, 2022: os/exec: documentation unclear on whether (*Cmd).Wait must be called #52580 (Closed)

shivamsouravjha mentioned this issue Feb 7, 2024: [bug]: fix race conditions in various flows within keploy keploy/keploy#1392 (Closed)