
os: process already finished, when sending syscall.SIGCONT #19314

Closed
nbari opened this issue Feb 28, 2017 · 11 comments

Comments

@nbari

nbari commented Feb 28, 2017


What version of Go are you using (go version)?

go version go1.8 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/nbari/projects/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/6x/tk00dzdd2hz4x3_ybq5zcw1w0000gn/T/go-build113311936=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

When starting a process and sending signals to it with Process.Signal, I noticed that the second signal, syscall.SIGCONT, fails with os: process already finished, whereas sending the same signal with syscall.Kill works as expected.

For demonstration purposes, I have created this naive example:

https://play.golang.org/p/5CVRyXp2Py

package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

func main() {
	exit := make(chan error, 1)
	go run(exit)

	for {
		select {
		case <-exit:
			println("fin, restarting")
			run(exit)
		default:
			time.Sleep(time.Second)
			println("running...")
		}
	}
}

func run(ch chan<- error) {
	cmd := exec.Command("sleep", "3")
	if err := cmd.Start(); err != nil {
		print(err.Error())
		os.Exit(1)
	}
	fmt.Printf("Pid: %d\n", cmd.Process.Pid)
	go func() {
		ch <- cmd.Wait()
	}()

	time.Sleep(2 * time.Second)
	fmt.Printf("%v\n", cmd.Process.Signal(syscall.SIGSTOP))

	time.Sleep(2 * time.Second)

	// Using this will return an os: process already finished
	fmt.Printf("%v\n", cmd.Process.Signal(syscall.SIGCONT))

	// This works as expected
	//fmt.Printf("%v\n", syscall.Kill(cmd.Process.Pid, syscall.SIGCONT))
}

What did you expect to see?

nil

What did you see instead?

os: process already finished

@ianlancetaylor
Contributor

For the record, your program runs with no error on GNU/Linux (Ubuntu Trusty).

I see that you have a loop. How often do you see the error from SIGCONT?

@ianlancetaylor ianlancetaylor added this to the Go1.9 milestone Feb 28, 2017
@nbari
Author

nbari commented Feb 28, 2017

Hi, the program runs, but the problem is that the SIGCONT does not seem to be applied: every time the signal is sent I get os: process already finished, in contrast to sending the signal with syscall.Kill(cmd.Process.Pid, syscall.SIGCONT), which works.

@ianlancetaylor
Contributor

To be clear, when I run the program on GNU/Linux, I see an infinite sequence of

Pid: 14304
<nil>
<nil>
running...
fin, restarting
Pid: 14305
<nil>
<nil>
running...
fin, restarting
Pid: 14307
<nil>
<nil>
running...
fin, restarting

That is what I mean when I say that your program runs with no error.

I understand from your reply that you see an error every single time in the loop.

@nbari
Author

nbari commented Feb 28, 2017

This is what I get when using cmd.Process.Signal(syscall.SIGCONT):

Pid: 2900
running...
<nil>
running...
running...
os: process already finished
running...
running...
...

But when using syscall.Kill(cmd.Process.Pid, syscall.SIGCONT), it works:

Pid: 3195
running...
<nil>
running...
running...
<nil>
running...
fin, restarting
Pid: 3202
<nil>
<nil>
running...
fin, restarting

I have been told I have a race condition on the command structure. I still need to test that, but apart from it, this could be an operating-system issue: so far it seems to happen only on macOS (darwin). I tested on FreeBSD 11, and both options work as expected there.

@ianlancetaylor
Copy link
Contributor

Could just be Darwin-specific behavior of the sleep command.

@nbari
Author

nbari commented Feb 28, 2017

I am using the sleep command only to create the example, but the same thing happens with any other command, for example this one:

package main

import (
	"fmt"
	"os"
	"time"
)

func main() {

	for i := 1; i < 10; i++ {
		if i%3 == 0 {
			fmt.Fprintf(os.Stderr, "STDERR i: %d\n", i)
		} else {
			fmt.Printf("STDOUT i: %d\n", i)
		}
		time.Sleep(time.Second)
	}

}

@ianlancetaylor
Contributor

Which version of Darwin are you running?

I think this could happen if waitid does not work as expected.
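To see why a misbehaving waitid would produce exactly this error, here is a simplified model. It is not the os package source; proc, wait and signal are hypothetical stand-ins. As described later in this thread, Wait first blocks until the child is "waitable" and marks the process done before reaping it, and Signal refuses to send once done is set. If the waitable check returns on a stop instead of an exit, done is set while the child is merely stopped, and the later SIGCONT is rejected.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// errFinished mirrors the error text reported in this issue.
var errFinished = errors.New("os: process already finished")

// proc is a hypothetical, simplified model of os.Process bookkeeping.
type proc struct {
	isdone uint32       // 1 once Wait believes the child has exited
	sigMu  sync.RWMutex // lets Wait wait for in-flight signal calls
}

func (p *proc) done() bool { return atomic.LoadUint32(&p.isdone) == 1 }

// wait models the order of operations: block until "waitable", then mark
// done. blockUntilWaitable stands in for the waitid(WEXITED|WNOWAIT) call.
func (p *proc) wait(blockUntilWaitable func() bool) {
	if blockUntilWaitable() {
		atomic.StoreUint32(&p.isdone, 1)
		p.sigMu.Lock() // wait for any signal call already in progress
		p.sigMu.Unlock()
	}
	// A real implementation would now call wait4 to reap the child.
}

// signal refuses to deliver once the process is marked done.
func (p *proc) signal(send func() error) error {
	p.sigMu.RLock()
	defer p.sigMu.RUnlock()
	if p.done() {
		return errFinished
	}
	return send()
}

func main() {
	p := &proc{}
	send := func() error { return nil } // pretend kill(2) succeeded

	fmt.Println("SIGSTOP:", p.signal(send))

	// Model the Darwin behavior: the waitable check returns on a stop.
	p.wait(func() bool { return true })

	fmt.Println("SIGCONT:", p.signal(send))
}
```

Running this prints <nil> for SIGSTOP and os: process already finished for SIGCONT, matching the reporter's output.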

@nbari
Author

nbari commented Feb 28, 2017

16.4.0

$ uname -a
Darwin M20160001.local 16.4.0 Darwin Kernel Version 16.4.0: Thu Dec 22 22:53:21 PST 2016; root:xnu-3789.41.3~3/RELEASE_X86_64 x86_64

@jbardin
Contributor

jbardin commented Feb 28, 2017

@ianlancetaylor:

Here's a condensed example:

package main

import (
	"fmt"
	"log"
	"os/exec"
	"syscall"
	"unsafe"
)

func main() {
	cmd := exec.Command("sleep", "10")
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	// Signal when wait4 would return immediately. WNOWAIT leaves the child
	// unreaped; the first argument, 1, is P_PID.
	go func() {
		var siginfo [128]byte
		psig := &siginfo[0]
		_, _, e := syscall.Syscall6(syscall.SYS_WAITID, 1, uintptr(cmd.Process.Pid), uintptr(unsafe.Pointer(psig)), syscall.WEXITED|syscall.WNOWAIT, 0, 0)
		fmt.Println("WAITID RETURNED -- this shouldn't happen:", e)
	}()

	err := cmd.Process.Signal(syscall.SIGSTOP)
	if err != nil {
		log.Fatal(err)
	}
	// The child is stopped, not exited, so Wait keeps blocking here.
	cmd.Wait()
}

On darwin (or at least on Sierra) this will print WAITID RETURNED with an errno of 0.

That syscall is what Process.blockUntilWaitable uses before marking the process as done, and here it returns on SIGSTOP even though we pass WEXITED and not WSTOPPED. That looks like a Darwin bug.
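For contrast, here is a sketch of the expected semantics using wait4 rather than waitid, assuming Linux or another system whose wait4 follows POSIX. A stopped child is reported only when the caller asks for stop events with WUNTRACED, and a wait for termination alone keeps blocking until the child actually exits. The names wait4 and observeStopThenExit are hypothetical. The sketch reaps the child with raw syscall.Wait4, so it must not be combined with cmd.Wait.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// wait4 retries syscall.Wait4 when it is interrupted by a signal.
func wait4(pid int, ws *syscall.WaitStatus, options int) error {
	for {
		_, err := syscall.Wait4(pid, ws, options, nil)
		if err != syscall.EINTR {
			return err
		}
	}
}

// observeStopThenExit stops a child and checks which wait4 calls report it.
// It reaps the child itself, so cmd.Wait must never be called on it.
func observeStopThenExit() (stopped, exited bool, err error) {
	cmd := exec.Command("sleep", "1")
	if err = cmd.Start(); err != nil {
		return
	}
	pid := cmd.Process.Pid
	var ws syscall.WaitStatus

	if err = syscall.Kill(pid, syscall.SIGSTOP); err != nil {
		return
	}
	// Stop events are reported only because WUNTRACED is requested here.
	if err = wait4(pid, &ws, syscall.WUNTRACED); err != nil {
		return
	}
	stopped = ws.Stopped() && ws.StopSignal() == syscall.SIGSTOP

	if err = syscall.Kill(pid, syscall.SIGCONT); err != nil {
		return
	}
	// With no options, wait4 returns only once the child terminates.
	if err = wait4(pid, &ws, 0); err != nil {
		return
	}
	exited = ws.Exited() && ws.ExitStatus() == 0
	return
}

func main() {
	stopped, exited, err := observeStopThenExit()
	fmt.Println("stopped:", stopped, "exited:", exited, "err:", err)
}
```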

@ianlancetaylor
Contributor

@jbardin Thanks a lot for trying that out. Sounds like we have to stop using waitid on Darwin.

@gopherbot

CL https://golang.org/cl/37610 mentions this issue.


4 participants