os/exec: add fields for managing termination signals and pipes #50436

bcmills · 2022-01-04T20:01:57Z

Background

#23019 (accepted but not yet implemented; CC @ianlancetaylor @bradfitz) proposed to change exec.Cmd.Wait to stop the goroutines that are copying I/O to and from a completed exec.Cmd; see that proposal for further background on the problem it aims to address. However, as noted in #23019 (comment) and #23019 (comment), any feasible implementation of the proposal requires the use of an arbitrary timeout, and the proposal does not include a mechanism to adjust that timeout. (Given our history with the Go project's builders, I am extremely skeptical that any particular hard-coded timeout can strike an appropriate balance between robustness and latency.)

#31774, #22757, and #21135 proposed to allow users of exec.CommandContext to customize the signal sent to the command when the context is canceled. They were all declined due to lack of concrete demand for the feature (#21135 (comment), #22757 (comment), #31774 (comment)). We have since accrued a number of copies of functions that work around the feature's absence. In the Go project alone, we have:

I'm attempting to add yet another variation (in CL 373005) in order to help diagnose #50014. However, for this variation (prompted by discussions with @aclements and @prattmic) I have tried to make this variation a minimally-invasive change on top of the exec.Cmd API.

I believe I have achieved that goal: the API requires the addition of only 2–3 new fields and no new methods or top-level functions. You can view (and try out) a prototype as github.com/bcmills/more/os/moreexec, which provides a drop-in replacement for a subset of the exec.Cmd API.

Proposal

I propose the addition of the following fields to the exec.Cmd struct, along with their corresponding implementation:

	// Context is the context that controls the lifetime of the command
	// (typically the one passed to CommandContext).
	Context context.Context

	// If Interrupt is non-nil, Context must also be non-nil and Interrupt will be
	// sent to the child process when Context is done.
	//
	// If the command exits with a success code after the Interrupt signal has
	// been sent, Wait and similar methods will return Context.Err()
	// instead of nil.
	//
	// If the Interrupt signal is not supported on the current platform
	// (for example, if it is os.Interrupt on Windows), Start may fail
	// (and return a non-nil error).
	Interrupt os.Signal

	// If WaitDelay is non-zero, the command's I/O pipes will be closed after
	// WaitDelay has elapsed after either the command's process has exited or
	// (if Context is non-nil) Context is done, whichever occurs first.
	// If the command's process is still running after WaitDelay has elapsed,
	// it will be terminated with os.Kill before the pipes are closed.
	//
	// If the command exits with a success code after pipes are closed due to
	// WaitDelay and no Interrupt signal has been sent, Wait and similar methods
	// will return ErrWaitDelay instead of nil.
	//
	// If WaitDelay is zero (the default), I/O pipes will be read until EOF,
	// which might not occur until orphaned subprocesses of the command have
	// also closed their descriptors for the pipes.
	WaitDelay time.Duration

The new Context field is exported only in order to simplify the documentation for the Interrupt and WaitDelay fields. (It was requested and rejected in #46699, but the objection there was my own — due to concerns about the interactions with the API in this proposal. It could be excised from this proposal without damaging anything but documentation clarity.)

The new Interrupt field sets the signal to be sent when the Context is done. exec.CommandContext explicitly sets it to os.Kill in order to maintain the existing behavior of exec.CommandContext, but I expect many users on Unix platforms will want to set it to os.Interrupt or syscall.SIGQUIT instead.

The new WaitDelay field sets the interval to wait for input and output after process termination or an interrupt signal. That interval turns out to be important for many testing applications (such as the Go Playground implementation and the cmd/go test suite). It also generalizes nicely to the use-cases in #23019: setting WaitDelay without Context provides bounded I/O wait times without sending a preceding signal.

Compatibility

I believe that this proposal is entirely backward-compatible (in contrast with #23019). The zero-values for the new fields provide exactly the same behavior as a Cmd returned by exec.Command or exec.CommandContext today.

Caveats

This proposal does not address graceful shutdown on Windows (#22757 (comment); CC @mvdan). However, it may be possible to extend it to do so by providing special-case Windows behavior when the Interrupt field is set to os.Interrupt, or by adding an InterruptFunc func(*Cmd) callback that would also be invoked when Context is done.

The proposed API also does not provide a mechanism to send an Interrupt signal followed by os.Kill after a delay but still wait for subprocesses to close all I/O pipes. I believe the use-cases for that scenario are sufficiently niche to be provided only by third-party libraries: sending SIGKILL to the parent process makes it likely that subprocesses will not know to shut down, so in the vast majority of cases users should either not send SIGKILL at all (WaitDelay == 0), forcibly terminate the pipes to try to kill the subprocesses with SIGPIPE (WaitDelay > 0), or do something platform-specific to try to forcibly shut down an entire process group (outside the scope of this proposal).

Alternatives considered

In #31774 (comment), @bradfitz suggested a field Kill func(*os.Process), which would presumably be added instead of the Interrupt field in this proposal. However, I believe that such a field would be simultaneously too complex and not powerful enough:

The Kill field would be too complex for most Unix applications, which overwhelmingly only need to send one of SIGTERM, SIGINT, SIGQUIT, or SIGKILL — why pass a whole callback when you really just want to say which signal you need?
A *os.Process callback would still not be powerful enough for Windows applications. If I understand the discussion in os: Interrupt is not sendable on Windows #6720 correctly (CC @alexbrainman), CTRL_BREAK_EVENT is sent to an entire process group, not a single *os.Process, so Windows users would also need a mechanism for creating (or determining) such a group, or some completely separate out-of-band way to request that the process terminate (such as by sending it a particular input or IPC message).

Given the above, the Interrupt field seems more ergonomic: it gives the right behavior for Unix users, and if Windows users want to do something more complex they can set Interrupt to nil and start a separate goroutine in between the calls to (*Cmd).Start and (*Cmd).Wait to implement whatever custom logic they want.

The text was updated successfully, but these errors were encountered:

gopherbot · 2022-01-04T20:14:09Z

Change https://golang.org/cl/373005 mentions this issue: internal/moreexec: add a utility package for manipulating os/exec.Cmd

seebs · 2022-01-04T20:22:54Z

// If the command's process is still running after WaitDelay has elapsed,
// it will be terminated with os.Kill before the pipes are closed.

I think this may be innately race-prone in that theoretically, the process can terminate and be replaced by a new process with that PID between the check and the kill. I have vague recollections of someone having told me there's a fancy new feature in some Linuxes that could bypass this, and it's admittedly a very small window for a very large amount of stuff to happen.

seebs · 2022-01-04T20:27:26Z

I'm a bit unclear on the meaning of the exported fields. Are they required to be set prior to launching a command, or can they be changed during the lifespan of the command? What happens if I change c.Interrupt after starting a process? What happens if I write a new value into c.Context while the process is running, and then after doing that close the previous context? etc.

I'm not sure what the answers should be, I'm pretty sure that any reasonably plausible answer will be fine, but I think it should be explicit. I think I'd favor "Context is a read-only field once the command is started, Interrupt and WaitDelay are both writeable".

bcmills · 2022-01-04T20:30:51Z

the process can terminate and be replaced by a new process with that PID between the check and the kill.

That is #13987, which seems orthogonal.

bcmills · 2022-01-04T20:32:03Z

I'm a bit unclear on the meaning of the exported fields. Are they required to be set prior to launching a command, or can they be changed during the lifespan of the command?

They are like the other (existing) exported fields of exec.Cmd: they must be set prior to Start (or equilavent) and then not modified until Wait (or equivalent) returns.

seebs · 2022-01-04T20:33:38Z

It is the same issue as 13987, but it's a new circumstance in which we hit that which might be non-obvious to a user, who is assuming that if we are killing "the" process, we'll pick the right one.

bcmills · 2022-01-04T20:36:03Z

That's true, but applies equally to all of the (numerous) workarounds that build the same behavior on top of os/exec.Cmd today. (I don't think this proposal makes that race any worse for anybody than it already is.)

ianlancetaylor · 2022-01-04T20:36:14Z

theoretically, the process can terminate and be replaced by a new process with that PID between the check and the kill. I have vague recollections of someone having told me there's a fancy new feature in some Linuxes that could bypass this, and it's admittedly a very small window for a very large amount of stuff to happen.

The Go standard library already uses that feature (the waitid or wait6 system call with an option of WNOWAIT) on Linux, DragonFly, and FreeBSD. (For a while we used waitid on macOS but stopped doing so due to #19314.)

mvdan · 2022-01-04T21:07:06Z

However, as noted in #23019 (comment) and #23019 (comment), any feasible implementation of the proposal requires the use of an arbitrary timeout, and the proposal does not include a mechanism to adjust that timeout. (Given our history with the Go project's builders, I am extremely skeptical that any particular hard-coded timeout can strike an appropriate balance between robustness and latency.)

I share this concern enthusiastically. Depending on what program you're executing, and in what environment and context, you may need very different timeouts. I also agree that we should let the user decide. As much as I would love for Go to just "do the right thing" without having to extend the API, I don't think we can do that without introducing footguns.

I believe that this proposal is entirely backward-compatible (in contrast with #23019). The zero-values for the new fields provide exactly the same behavior as a Cmd returned by exec.Command or exec.CommandContext today.

I had originally given #23019 a thumbs up, but I've moved my thumbs up here now instead :) I even think #23019 might qualify as a breaking change, given how it may break some unlucky programs where the timeout may lead to losing some output.

bcmills · 2022-01-04T22:53:55Z

Thinking about this some more, there is one other interesting caveat: error behavior.

If we stop reading the program's output before EOF, it may be useful for the caller to know that. Stopping output before EOF seems orthogonal to the exit status of the program, which is what the Wait and similar methods currently report.

Should setting WaitDelay cause Wait and similar methods to return a different error if end up forcibly closing I/O pipes?

henvic · 2022-01-04T23:22:34Z

The Interrupt field seems super useful to me.

However, IMHO adding the Context field to simplify the documentation might end up hurting more than helping as now I've to understand why there are two ways to pass context. Especially after reading this:

// If Interrupt is non-nil, Context must also be non-nil and Interrupt will be
// sent to the child process when Context is done.

henvic · 2022-01-04T23:24:58Z

Someone might ask themselves: is it only if I pass Context as a field or also when using CommandContext?

bcmills · 2022-01-05T04:45:35Z

IMHO adding the Context field to simplify the documentation might end up hurting more than helping as now I've to understand why there are two ways to pass context.

Note that without the Context field, we would still have to explain that Interrupt can only be used (or only has an effect) with Cmd instances returned by CommandContext. So I'm not sure that removing it would help with that confusion much. 😅

Someone might ask themselves: is it only if I pass Context as a field or also when using CommandContext?

Probably we could update the CommandContext doc comment to explicitly state that it sets the Context and Interrupt fields?

bcmills · 2022-01-05T04:49:19Z

Should setting WaitDelay cause Wait and similar methods to return a different error if end up forcibly closing I/O pipes?

I gave this some thought and arrived at:

Setting WaitDelay should not change the exit code if the command fails. The fact that the command failed is almost always enough for the caller to know that its output may be suspecct.
However, if the command exits with a successful status (that is, if Wait would otherwise return nil) but we had to forcibly close the pipes, we should return a distinguished error (perhaps a new exec.ErrWaitDelay?) so that the caller knows the input and/or output may have been truncated.

bcmills · 2022-01-05T14:48:16Z

I thought some more about the appropriate Windows behaviors. I added an “Alternatives considered” section and removed the future option for CTRL_BREAK_EVENT, since per #6720 it seems not to be feasible.

Instead, on Windows the Start method should fail with an error wrapping syscall.EWINDOWS if the specified Interrupt signal is not supported (that is, if it is not either nil or os.Kill).

rsc · 2022-01-05T19:18:01Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

rsc · 2022-01-19T18:33:43Z

It seems like the docs need to distinguish what we call cmd.Stdin from cmd.Stdout/cmd.Stderr. cmd.Stdin is already closed when we finish writing whatever was standard input, independent of exiting/waiting/cancellation.

I'm also confused by the WaitDelay description mixing cancellation and ordinary exits.

Maybe it would help to see the diff that would implement this? I looked at CL 373005 but it's a whole package, not a diff.

bcmills · 2022-01-19T19:00:35Z

It seems like the docs need to distinguish what we call cmd.Stdin from cmd.Stdout/cmd.Stderr. cmd.Stdin is already closed when we finish writing whatever was standard input, independent of exiting/waiting/cancellation.

A hang due to stdin copying analogous to the one involving stdout and stderr can occur if the command spawns its own subprocess with cmd.Stdin = os.Stdin and then exits without either process reading stdin to completion. (The copying goroutine is started in (*Cmd).stdin, and it only closes the pipe once copying is complete — which may be arbitrarily long if the writes don't fit in whatever buffer the kernel provides.)

So, at least in that case, we really do need to treat the I/O pipes symmetrically.

rsc · 2022-01-26T18:39:17Z

Thanks for the clarification about stdin.

I still feel like it would help me to see the diff in the package, as opposed to a fork as in CL 373005. I'm still not 100% sure I understand the proposal.

bcmills · 2022-02-02T16:36:36Z

Found another “run or kill” function in the main Go repo, and this one doesn't even bother using an appropriate signal (it skips straight to SIGKILL):
https://cs.opensource.google/go/go/+/master:test/run.go;l=834-854;drc=f4aa021985e9ae4a9a395f8fbe32ad08d2bfda3b

That manifests as a less-than-useful failure mode in #50973.

rsc · 2022-02-02T18:25:16Z

I'm still a little confused about the implications here. I'd like to see a concrete CL to understand it better. Will leave this open for this week, but let me know if you'd rather we put it on hold for now.

gopherbot · 2022-11-01T13:19:21Z

Change https://go.dev/cl/446875 mentions this issue: internal/testenv: adjust timeout calculations in CommandContext

gopherbot · 2022-11-01T13:27:07Z

Change https://go.dev/cl/446876 mentions this issue: runtime: check for ErrWaitDelay is runBuiltTestProg

I noticed some test failures in the build dashboard after CL 445597 that made me realize the grace period should be based on the test timeout, not the Context timeout: if the test itself sets a short timeout for a command, we still want to give the test process enough time to consume and log its output. I also put some more thought into how one might debug a test hang, and realized that in that case we don't want to set a WaitDelay at all: instead, we want to leave the processes in their stuck state so that they can be investigated with tools like `ps` and 'lsof'. Updates #50436. Change-Id: I65421084f44eeaaaec5dd2741cd836e9e68dd380 Reviewed-on: https://go-review.googlesource.com/c/go/+/446875 Auto-Submit: Bryan Mills <bcmills@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

ErrWaitDelay is not expected to occur in this test, but if it does it indicates a failure mode very different from the “failed to start” catchall that we log for other non-ExitError errors. Updates #50436. Change-Id: I3f4d87d502f772bf471fc17303d5a6b483446f8f Reviewed-on: https://go-review.googlesource.com/c/go/+/446876 Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com>

This permits us to safely support concurrent access to files on Plan 9. Concurrent access was already safe on other systems. This does introduce a change: if one goroutine calls a blocking read on a pipe, and another goroutine closes the pipe, then before this CL the close would occur. Now the close will be delayed until the blocking read completes. Also add tests that concurrent I/O and Close on a pipe are OK. For golang#50436 For golang#56043 Change-Id: I969c869ea3b8c5c2f2ef319e441a56a3c64e7bf5 Reviewed-on: https://go-review.googlesource.com/c/go/+/438347 Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: David du Colombier <0intro@gmail.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Rob Pike <r@golang.org>

Fixes golang#50436. Change-Id: I9dff8caa317a04b7b2b605f810b8f12ef8ca485d Reviewed-on: https://go-review.googlesource.com/c/go/+/401835 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Bryan Mills <bcmills@google.com> Auto-Submit: Bryan Mills <bcmills@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>

This adds a testenv.CommandContext function, with timeout behavior based on the existing logic in cmd/go.TestScript: namely, the command is terminated with SIGQUIT (if supported) with an arbitrary grace period remaining until the test's deadline. If the test environment does not support executing subprocesses, CommandContext skips the test. If the command is terminated due to the timout expiring or the test fails to wait for the command after starting it, CommandContext marks the test as failing. For tests where a shorter timeout is desired (such as for fail-fast behavior), one may be supplied using context.WithTimeout. The more concise Command helper is like CommandContext but without the need to supply an explicit Context. Updates golang#50436. Change-Id: Ifd81fb86c402f034063c9e9c03045b4106eab81a Reviewed-on: https://go-review.googlesource.com/c/go/+/445596 Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Austin Clements <austin@google.com> Auto-Submit: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>

For most tests, the test's deadline itself is more appropriate than an arbitrary timeout layered atop of it (especially once golang#48157 is implemented), and testenv.Command already adds cleaner timeout behavior when a command would run too close to the test's deadline. That makes RunWithTimeout something of an attractive nuisance. For now, migrate the two existing uses of it to testenv.CommandContext, with a shorter timeout implemented using context.WithTimeout. As a followup, we may want to drop the extra timeouts from these invocations entirely. Updates golang#50436. Updates golang#37405. Change-Id: I16840fd36c0137b6da87ec54012b3e44661f0d08 Reviewed-on: https://go-review.googlesource.com/c/go/+/445597 Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Auto-Submit: Bryan Mills <bcmills@google.com> Reviewed-by: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

I noticed some test failures in the build dashboard after CL 445597 that made me realize the grace period should be based on the test timeout, not the Context timeout: if the test itself sets a short timeout for a command, we still want to give the test process enough time to consume and log its output. I also put some more thought into how one might debug a test hang, and realized that in that case we don't want to set a WaitDelay at all: instead, we want to leave the processes in their stuck state so that they can be investigated with tools like `ps` and 'lsof'. Updates golang#50436. Change-Id: I65421084f44eeaaaec5dd2741cd836e9e68dd380 Reviewed-on: https://go-review.googlesource.com/c/go/+/446875 Auto-Submit: Bryan Mills <bcmills@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

ErrWaitDelay is not expected to occur in this test, but if it does it indicates a failure mode very different from the “failed to start” catchall that we log for other non-ExitError errors. Updates golang#50436. Change-Id: I3f4d87d502f772bf471fc17303d5a6b483446f8f Reviewed-on: https://go-review.googlesource.com/c/go/+/446876 Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com>

bcmills · 2022-11-03T20:41:21Z

The implementation is merged and I've added a couple of call sites that use this API in tests to shake out any loose ends.

I think all that's left at this point is a release note!

gopherbot · 2022-11-09T16:20:43Z

Change https://go.dev/cl/449076 mentions this issue: doc/go1.20: add a release note for os/exec API changes

Updates #50436. Change-Id: Ib6771221bda1c81d5593b29d7287ebcf169882ff Reviewed-on: https://go-review.googlesource.com/c/go/+/449076 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com>

gopherbot · 2022-11-17T14:19:27Z

Change https://go.dev/cl/377835 mentions this issue: internal/testenv: add a Command function that replaces exec.Command

The function is derived from the testenv.Command function in the main repo as of CL 446875, with a couple of modifications to allow it to build (with more limited functionality) with Go versions as old as 1.16 (currently needed in order to test gopls with such versions). testenv.Command sets up an exec.Cmd with more useful termination behavior in the context of a test: namely, it is terminated with SIGQUIT (to get a goroutine dump from the subprocess) shortly before the test would otherwise time out. Assuming that the test logs the output from the command appropriately, this should make deadlocks and unexpectedly slow operations easier to diagnose in the builders. For golang/go#50014. Updates golang/go#50436. Change-Id: I872d4b24e63951bf9b7811189e672973d366fb78 Reviewed-on: https://go-review.googlesource.com/c/tools/+/377835 Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Joedian Reid <joedian@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Robert Findley <rfindley@google.com> gopls-CI: kokoro <noreply+kokoro@google.com>

gopherbot · 2022-12-07T22:35:25Z

Change https://go.dev/cl/456116 mentions this issue: cmd/go: use Cancel and WaitDelay to terminate test subprocesses

gopherbot · 2023-01-10T16:05:01Z

Change https://go.dev/cl/460997 mentions this issue: cmd/go/internal/vcweb: simplify hgHandler cancellation

gopherbot · 2023-01-10T16:44:50Z

Change https://go.dev/cl/460998 mentions this issue: os/exec: avoid leaking an exec.Cmd in TestWaitInterrupt

In CL 436655 I added a GODEBUG setting to this test process to verify that Wait is eventually called for every exec.Cmd before it becomes unreachable. However, the cmdHang test helpers in TestWaitInterrupt/Exit-hang and TestWaitInterrupt/SIGKILL-hang intentially leak a subprocess in order to simulate a leaky third-party program, as Go users might encounter in practical use. To avoid tripping over the leak check, we call Wait on the leaked subprocess in a background goroutine. Since we expect the process running cmdHang to exit before its subprocess does, the call to Wait should have no effect beyond suppressing the leak check. Fixes #57596. Updates #52580. Updates #50436. Change-Id: Ia4b88ea47fc6b605c27ca6d9d7669c874867a900 Reviewed-on: https://go-review.googlesource.com/c/go/+/460998 Run-TryBot: Bryan Mills <bcmills@google.com> Auto-Submit: Bryan Mills <bcmills@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

Updates #50436. Updates #56163. Fixes #24050. Change-Id: I1b00eb8fb60e0879f029642b5bad97b2e139fee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/456116 Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Bryan Mills <bcmills@google.com> Auto-Submit: Bryan Mills <bcmills@google.com>

This uses the new Cancel and WaitDelay fields of os/exec.Cmd (added in #50436) to interrupt or kill the 'hg serve' command when its incoming http.Request is canceled. This should keep the vcweb hg handler from getting stuck if 'hg serve' hangs after the request either completes or is canceled. Fixes #57597 (maybe). Change-Id: I53cf58e8ab953fd48c0c37f596f99e885a036d9b Reviewed-on: https://go-review.googlesource.com/c/go/+/460997 Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Auto-Submit: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

Add WaitDelay to ensure cmd.Wait() returns in a reasonable timeframe if the goroutines that cmd.Start() uses to copy Stdin/Stdout/Stderr are blocked when copying due to a sub-subprocess holding onto them. Read more details in these issues: - golang/go#23019 - golang/go#50436 This isn't the original intent of kill-delay, but it seems reasonable to reuse it in this context. Fixes canonical#149

…/out/err (#275) Use os.exec's Cmd.WaitDelay to ensure cmd.Wait returns in a reasonable timeframe if the goroutines that cmd.Start() uses to copy stdin/out/err are blocked when copying due to a sub-subprocess holding onto them. Read more details about the issue in golang/go#23019 and the proposed solution (that was added in Go 1.20) in golang/go#50436. This solves issue #149, where Patroni wasn't restarting properly even after a `KILL` signal was sent to it. I had originally mis-diagnosed this problem as an issue with Pebble not tracking the process tree of processes that daemonise and change their process group (which is still an issue, but is not causing this problem). The Patroni process wasn't being marked as finished at all due to being blocked on the `cmd.Wait()`. Patroni starts sub-processes and "forwards" stdin/out/err, so the copy goroutines block. Thankfully Go 1.20 introduced `WaitDelay` to allow you to easily work around this exact problem. The fix itself is [this one-liner] (#275): s.cmd.WaitDelay = s.killDelay() * 9 / 10 // 90% of kill-delay This will really only be a problem for services, but we make the same change for exec and exec health checks as it won't hurt there either. Also, as a drive-by, this PR also canonicalises some log messages: our style is to start with an uppercase letter (for logs, not errors) and to use "Cannot X" rather than "Error Xing". Fixes #149.

bcmills added the Proposal label Jan 4, 2022

gopherbot added this to the Proposal milestone Jan 4, 2022

bcmills mentioned this issue Jan 4, 2022

proposal: os/exec: allow CommandContext termination signal to be specified #31774

Closed

ianlancetaylor added this to Incoming in Proposals (old) Jan 4, 2022

rsc moved this from Incoming to Active in Proposals (old) Jan 5, 2022

This was referenced Jan 12, 2022

x/tools/cmd/godoc: improve diagnostics for server hangs in TestWeb/GOPATH #50014

Closed

cmd/fix: timeout in TestRewrite on netbsd builders #50581

Open

ianlancetaylor mentioned this issue Jan 27, 2022

os/exec: consider changing Wait to stop copying goroutines rather than waiting for them #23019

Closed

bcmills mentioned this issue Feb 2, 2022

test: issue46234.go failures with "command exceeded time limit" #50973

Open

prattmic mentioned this issue Dec 6, 2022

api: audit for Go 1.20 #57126

Closed

bcmills mentioned this issue Jan 20, 2023

context: add AfterFunc #57928

Closed

benhoyt mentioned this issue Aug 15, 2023

fix(servstate): use WaitDelay to avoid Command.Wait blocking on stdin/out/err canonical/pebble#275

Merged

imliuda mentioned this issue Nov 13, 2023

result output parameter of script templates occasionally become empty argoproj/argo-workflows#11330

Closed

3 tasks

golang locked and limited conversation to collaborators Jan 10, 2024

gopherbot added the FrozenDueToAge label Jan 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

os/exec: add fields for managing termination signals and pipes #50436

os/exec: add fields for managing termination signals and pipes #50436

bcmills commented Jan 4, 2022 •

edited

gopherbot commented Jan 4, 2022

seebs commented Jan 4, 2022

seebs commented Jan 4, 2022

bcmills commented Jan 4, 2022

bcmills commented Jan 4, 2022 •

edited

seebs commented Jan 4, 2022

bcmills commented Jan 4, 2022

ianlancetaylor commented Jan 4, 2022

mvdan commented Jan 4, 2022

bcmills commented Jan 4, 2022

henvic commented Jan 4, 2022

henvic commented Jan 4, 2022

bcmills commented Jan 5, 2022

bcmills commented Jan 5, 2022

bcmills commented Jan 5, 2022

rsc commented Jan 5, 2022

rsc commented Jan 19, 2022

bcmills commented Jan 19, 2022

rsc commented Jan 26, 2022

bcmills commented Feb 2, 2022

rsc commented Feb 2, 2022

gopherbot commented Nov 1, 2022

gopherbot commented Nov 1, 2022

bcmills commented Nov 3, 2022

gopherbot commented Nov 9, 2022

gopherbot commented Nov 17, 2022

gopherbot commented Dec 7, 2022

gopherbot commented Jan 10, 2023

gopherbot commented Jan 10, 2023

os/exec: add fields for managing termination signals and pipes #50436

os/exec: add fields for managing termination signals and pipes #50436

Comments

bcmills commented Jan 4, 2022 • edited

Background

Proposal

Compatibility

Caveats

Alternatives considered

gopherbot commented Jan 4, 2022

seebs commented Jan 4, 2022

seebs commented Jan 4, 2022

bcmills commented Jan 4, 2022

bcmills commented Jan 4, 2022 • edited

seebs commented Jan 4, 2022

bcmills commented Jan 4, 2022

ianlancetaylor commented Jan 4, 2022

mvdan commented Jan 4, 2022

bcmills commented Jan 4, 2022

henvic commented Jan 4, 2022

henvic commented Jan 4, 2022

bcmills commented Jan 5, 2022

bcmills commented Jan 5, 2022

bcmills commented Jan 5, 2022

rsc commented Jan 5, 2022

rsc commented Jan 19, 2022

bcmills commented Jan 19, 2022

rsc commented Jan 26, 2022

bcmills commented Feb 2, 2022

rsc commented Feb 2, 2022

gopherbot commented Nov 1, 2022

gopherbot commented Nov 1, 2022

bcmills commented Nov 3, 2022

gopherbot commented Nov 9, 2022

gopherbot commented Nov 17, 2022

gopherbot commented Dec 7, 2022

gopherbot commented Jan 10, 2023

gopherbot commented Jan 10, 2023

bcmills commented Jan 4, 2022 •

edited

bcmills commented Jan 4, 2022 •

edited