Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: runtime test timeout in TestCrashDumpsAllThreads #19196

Closed
sandipmgiri opened this issue Feb 20, 2017 · 35 comments
Closed

runtime: runtime test timeout in TestCrashDumpsAllThreads #19196

sandipmgiri opened this issue Feb 20, 2017 · 35 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@sandipmgiri
Copy link

sandipmgiri commented Feb 20, 2017

Hi,

I could able to build and test successfully on RHEL7.3 VM, however on jenkins server I am getting runtime test timeout issue. Is there any way to solve this issue ?

Please find attached logfile.
golang_runtime_test_failure.txt

Thanks!

@ALTree
Copy link
Member

ALTree commented Feb 20, 2017

Possibly related: #18442

Also: why did you delete the following sections:

  • What version of Go are you using (go version)?
  • What operating system and processor architecture are you using (go env)?

from the issue template?

@ALTree ALTree changed the title runtime test case failure issue runtime: runtime test timeout Feb 20, 2017
@sandipmgiri
Copy link
Author

What version of Go are you using (go version)?

bash-4.2# go version
go version devel +a610957 Tue Feb 14 04:41:36 2017 +0000 linux/ppc64le

What operating system and processor architecture are you using (go env)?

bash-4.2# go env
warning: GOPATH set to GOROOT (/go) has no effect
GOARCH="ppc64le"
GOBIN=""
GOEXE=""
GOHOSTARCH="ppc64le"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GORACE=""
GOROOT="/go"
GOTOOLDIR="/go/pkg/tool/linux_ppc64le"
GCCGO="/usr/bin/gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build976661655=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

@ALTree ALTree added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 20, 2017
@ALTree ALTree added this to the Go1.9 milestone Feb 20, 2017
@ianlancetaylor
Copy link
Contributor

The hang is occurring in TestCrashDumpsAllThreads. That test runs a simple test program, sends it a SIGQUIT signal, and checks that output contains information about all the threads. In this case it appears that the child process never terminates, and the runtime test program hangs waiting to read the output. I don't know why.

@ianlancetaylor ianlancetaylor changed the title runtime: runtime test timeout runtime: runtime test timeout in TestCrashDumpsAllThreads Feb 20, 2017
@hartzell
Copy link

I wonder if this has something to do with having a controlling terminal, and or running under a pseudo-terminal?

When I ssh into my build box and build go by hand, I have:

$ printenv | grep -i tty
SSH_TTY=/dev/pts/9

That environment varible is not set in a Jenkins slave's environment.

We're sending SIGQUIT, which Wikipedia sez:

The SIGQUIT signal is sent to a process by its controlling terminal when the user requests that the process quit and perform a core dump.

If the process doesn't have a controlling terminal it might be ignoring SIGQUIT.

@hartzell
Copy link

This change to TestCrashDumpAllThreads allows the build to run successfully.

   26 func TestCrashDumpsAllThreads(t *testing.T) {
   27   _, err := os.Open("/dev/tty")
   28   if err != nil {
   29     t.Skipf("Skipping, unable to open /dev/tty: %v")
   30   }

Newbie mystery, I thought that the t.Skipf() would get written into the build output but I don't see it. I'm not sure if I have to somehow enable verbose mode or .... So there's a chance that it's just following Heisenberg...

@ianlancetaylor
Copy link
Contributor

The presence or absence of a controlling terminal should be irrelevant here. We are sending SIGQUIT directly using the kill system call. The note from Wikipedia is accurate as far as it goes, but what it is trying to say is that if the user types the quit character (default is ^\) on the controlling terminal of a process, then the kernel sends SIGQUIT to the process on behalf of the user. We aren't pressing ^\ or using the terminal in any way, we are sending SIGQUIT directly.

The t.Skipf message only shows up if you use the -test.v option to see what happens with each test, which the default build scripts do not do.

@hartzell
Copy link

Sorry for the false start and thanks for the explanation!

@sandipmgiri
Copy link
Author

sandipmgiri commented Feb 21, 2017

For now , Is it possible to skip runtime test ? If yes , then what we have to do ?
( I am using "./all.bash" command to install and test the golang package)

Thanks!

@hartzell
Copy link

@ianlancetaylor -- Is there a clue to be had in the fact that this test (tests) tend to fail when run in a Jenkins job and not from a command line? Do you have any ideas of where to start poking at it?

@ianlancetaylor
Copy link
Contributor

Well, I'm sure there is a clue that the failure is in a Jenkins job, but I don't know what that clue is.

If you can recreate this at will, I would suggest running the child process under strace -f, and feed the output to os.Stdout and os.Stderr so that we can see it rather than writing to a bytes.Buffer.

@hartzell
Copy link

@ianlancetaylor -- that looks doable-by-me. Thanks!

@hartzell
Copy link

@ianlancetaylor

Sending the output to std{out,err} resulted in Jenkins spinning for 15 minutes before I got bored and killed the job.

I sent the output to a pair of files instead. Here's the diff (no snickering...):

$ diff crash_unix_test.go.orig  crash_unix_test.go
10c10
< 	"bytes"
---
> 	// "bytes"
59c59
< 	cmd = exec.Command(filepath.Join(dir, "a.exe"))
---
> 	cmd = exec.Command("strace", "-f", filepath.Join(dir, "a.exe"))
74,76c74,89
< 	var outbuf bytes.Buffer
< 	cmd.Stdout = &outbuf
< 	cmd.Stderr = &outbuf
---
> 	// var outbuf bytes.Buffer
> 	// cmd.Stdout = &outbuf
> 	// cmd.Stderr = &outbuf
>
> 	// cmd.Stdout = os.Stdout
> 	// cmd.Stderr = os.Stderr
> 	so, err := os.Create("/tmp/poodle-stdout")
> 	if err != nil {
> 		t.Fatalf("Unable to create /tmp/poodle-stdout: %v", err)
> 	}
> 	cmd.Stdout = so
> 	se, err := os.Create("/tmp/poodle-stderr")
> 	if err != nil {
> 		t.Fatalf("Unable to create /tmp/poodle-stderr: %v", err)
> 	}
> 	cmd.Stderr = se
106,111c119,125
< 	out = outbuf.Bytes()
< 	n := bytes.Count(out, []byte("main.loop("))
< 	if n != 4 {
< 		t.Errorf("found %d instances of main.loop; expected 4", n)
< 		t.Logf("%s", out)
< 	}
---
> 	// out = outbuf.Bytes()
> 	// n := bytes.Count(out, []byte("main.loop("))
> 	// if n != 4 {
> 	// 	t.Errorf("found %d instances of main.loop; expected 4", n)
> 	// 	t.Logf("%s", out)
> 	// }
> 	t.Logf("Falling out the bottom")

/tmp/poodle-stdout is empty.

/tmp/poodle-stderr contains:

$ cat /tmp/poodle-stderr
execve("/tmp/go-build599743547/a.exe", ["/tmp/go-build599743547/a.exe"], [/* 77 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x4fdea8)       = 0
sched_getaffinity(0, 8192, {ffffffff, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...}) = 640
mmap(0xc000000000, 65536, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc000000000
munmap(0xc000000000, 65536)             = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaaaad000
mmap(0xc420000000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc420000000
mmap(0xc41fff8000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc41fff8000
mmap(0xc000000000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc000000000
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaaaed000
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaaafd000
rt_sigprocmask(SIG_SETMASK, NULL, [QUIT], 8) = 0
sigaltstack(NULL, {ss_sp=0, ss_flags=SS_DISABLE, ss_size=0}) = 0
sigaltstack({ss_sp=0xc420002000, ss_flags=0, ss_size=32672}, NULL) = 0
rt_sigprocmask(SIG_SETMASK, [QUIT], NULL, 8) = 0
gettid()                                = 36799
rt_sigaction(SIGHUP, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGHUP, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGINT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGQUIT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGILL, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGILL, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGTRAP, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGTRAP, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGABRT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGABRT, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGBUS, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGBUS, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGFPE, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGFPE, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGUSR1, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGUSR1, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGSEGV, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSEGV, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGUSR2, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGUSR2, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPIPE, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGALRM, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGALRM, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGTERM, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGTERM, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGSTKFLT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSTKFLT, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGCHLD, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGURG, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGURG, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGXCPU, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGXCPU, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGXFSZ, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGXFSZ, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGVTALRM, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGVTALRM, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGPROF, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPROF, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGWINCH, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGWINCH, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGIO, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGIO, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGPWR, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPWR, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGSYS, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSYS, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRTMIN, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_1, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_2, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_2, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_3, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_3, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_4, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_4, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_5, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_5, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_6, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_6, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_7, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_7, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_8, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_8, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_9, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_9, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_10, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_10, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_11, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_11, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_12, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_12, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_13, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_13, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_14, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_14, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_15, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_15, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_16, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_16, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_17, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_17, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_18, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_18, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_19, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_19, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_20, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_20, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_21, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_21, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_22, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_22, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_23, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_23, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_24, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_24, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_25, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_25, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_26, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_26, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_27, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_27, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_28, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_28, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_29, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_29, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_30, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_30, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_31, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_31, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigaction(SIGRT_32, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_32, {0x44e650, ~[], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0x44e780}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[], [QUIT], 8) = 0
clone(Process 36809 attached
 <unfinished ...>
[pid 36809] gettid( <unfinished ...>
[pid 36799] <... clone resumed> child_stack=0xc420044000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD) = 36809
[pid 36809] <... gettid resumed> )      = 36809
[pid 36799] rt_sigprocmask(SIG_SETMASK, [QUIT],  <unfinished ...>
[pid 36809] arch_prctl(ARCH_SET_FS, 0xc420034068 <unfinished ...>
[pid 36799] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 36809] <... arch_prctl resumed> )  = 0
[pid 36809] sigaltstack(NULL, {ss_sp=0, ss_flags=SS_DISABLE, ss_size=0}) = 0
[pid 36809] sigaltstack({ss_sp=0xc420036000, ss_flags=0, ss_size=32768} <unfinished ...>
[pid 36799] rt_sigprocmask(SIG_SETMASK, ~[],  <unfinished ...>
[pid 36809] <... sigaltstack resumed> , NULL) = 0
[pid 36799] <... rt_sigprocmask resumed> [QUIT], 8) = 0
[pid 36809] rt_sigprocmask(SIG_SETMASK, [QUIT],  <unfinished ...>
[pid 36799] clone( <unfinished ...>
[pid 36809] <... rt_sigprocmask resumed> NULL, 8) = 0
Process 36810 attached
[pid 36809] gettid( <unfinished ...>
[pid 36810] gettid( <unfinished ...>
[pid 36809] <... gettid resumed> )      = 36809
[pid 36799] <... clone resumed> child_stack=0xc420040000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD) = 36810
[pid 36810] <... gettid resumed> )      = 36810
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36799] rt_sigprocmask(SIG_SETMASK, [QUIT],  <unfinished ...>
[pid 36810] arch_prctl(ARCH_SET_FS, 0xc420034468 <unfinished ...>
[pid 36799] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 36810] <... arch_prctl resumed> )  = 0
[pid 36810] sigaltstack(NULL <unfinished ...>
[pid 36799] rt_sigprocmask(SIG_SETMASK, ~[],  <unfinished ...>
[pid 36810] <... sigaltstack resumed> , {ss_sp=0, ss_flags=SS_DISABLE, ss_size=0}) = 0
[pid 36799] <... rt_sigprocmask resumed> [QUIT], 8) = 0
[pid 36810] sigaltstack({ss_sp=0xc42004e000, ss_flags=0, ss_size=32768} <unfinished ...>
[pid 36799] clone( <unfinished ...>
[pid 36810] <... sigaltstack resumed> , NULL) = 0
[pid 36809] <... select resumed> )      = 0 (Timeout)
Process 36811 attached
[pid 36810] rt_sigprocmask(SIG_SETMASK, [QUIT],  <unfinished ...>
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36811] gettid( <unfinished ...>
[pid 36810] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 36799] <... clone resumed> child_stack=0xc420042000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD) = 36811
[pid 36811] <... gettid resumed> )      = 36811
[pid 36810] gettid( <unfinished ...>
[pid 36799] rt_sigprocmask(SIG_SETMASK, [QUIT],  <unfinished ...>
[pid 36811] arch_prctl(ARCH_SET_FS, 0xc420034868 <unfinished ...>
[pid 36810] <... gettid resumed> )      = 36810
[pid 36799] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 36810] select(0, NULL, NULL, NULL, {0, 3} <unfinished ...>
[pid 36811] <... arch_prctl resumed> )  = 0
[pid 36799] futex(0x4fdf50, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 36811] sigaltstack(NULL <unfinished ...>
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36811] <... sigaltstack resumed> , {ss_sp=0, ss_flags=SS_DISABLE, ss_size=0}) = 0
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36811] sigaltstack({ss_sp=0xc420056000, ss_flags=0, ss_size=32768} <unfinished ...>
[pid 36810] <... select resumed> )      = 0 (Timeout)
[pid 36811] <... sigaltstack resumed> , NULL) = 0
[pid 36810] futex(0x4fdf50, FUTEX_WAKE, 1 <unfinished ...>
[pid 36811] rt_sigprocmask(SIG_SETMASK, [QUIT],  <unfinished ...>
[pid 36810] <... futex resumed> )       = 1
[pid 36799] <... futex resumed> )       = 0
[pid 36811] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 36810] futex(0xc420034510, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 36811] gettid()                    = 36811
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36811] futex(0xc420034910, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36799] readlinkat(AT_FDCWD, "/proc/self/exe",  <unfinished ...>
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36809] futex(0xc420034910, FUTEX_WAKE, 1 <unfinished ...>
[pid 36811] <... futex resumed> )       = 0
[pid 36809] <... futex resumed> )       = 1
[pid 36799] <... readlinkat resumed> "/tmp/go-build599743547/a.exe", 128) = 28
[pid 36811] futex(0xc420034910, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36799] futex(0xc420034910, FUTEX_WAKE, 1 <unfinished ...>
[pid 36811] <... futex resumed> )       = 0
[pid 36799] <... futex resumed> )       = 1
[pid 36811] select(0, NULL, NULL, NULL, {0, 3} <unfinished ...>
[pid 36799] mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaab0d000
[pid 36811] <... select resumed> )      = 0 (Timeout)
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36799] futex(0x4fd908, FUTEX_WAIT, 0, {0, 100000} <unfinished ...>
[pid 36811] futex(0x4fd908, FUTEX_WAKE, 1 <unfinished ...>
[pid 36809] sched_yield( <unfinished ...>
[pid 36799] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid 36811] <... futex resumed> )       = 0
[pid 36809] <... sched_yield resumed> ) = 0
[pid 36811] futex(0xc420034910, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 36799] sched_yield( <unfinished ...>
[pid 36809] futex(0x4fd850, FUTEX_WAKE, 1 <unfinished ...>
[pid 36799] <... sched_yield resumed> ) = 0
[pid 36809] <... futex resumed> )       = 0
[pid 36799] futex(0x4fd850, FUTEX_WAKE, 1 <unfinished ...>
[pid 36809] futex(0x4fd918, FUTEX_WAIT, 0, {60, 0} <unfinished ...>
[pid 36799] <... futex resumed> )       = 0
[pid 36799] futex(0x4fd918, FUTEX_WAKE, 1 <unfinished ...>
[pid 36809] <... futex resumed> )       = 0
[pid 36799] <... futex resumed> )       = 1
[pid 36809] sched_yield( <unfinished ...>
[pid 36799] futex(0xc420034910, FUTEX_WAKE, 1 <unfinished ...>
[pid 36809] <... sched_yield resumed> ) = 0
[pid 36811] <... futex resumed> )       = 0
[pid 36799] <... futex resumed> )       = 1
[pid 36809] futex(0x4fd850, FUTEX_WAKE, 1 <unfinished ...>
[pid 36811] futex(0xc420034910, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 36799] futex(0xc420034910, FUTEX_WAKE, 1 <unfinished ...>
[pid 36809] <... futex resumed> )       = 0
[pid 36811] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid 36799] <... futex resumed> )       = 0
[pid 36811] select(0, NULL, NULL, NULL, {0, 3} <unfinished ...>
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36811] <... select resumed> )      = 0 (Timeout)
[pid 36811] futex(0xc420034510, FUTEX_WAKE, 1 <unfinished ...>
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36811] <... futex resumed> )       = 1
[pid 36810] <... futex resumed> )       = 0
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36810] rt_sigprocmask(SIG_SETMASK, ~[], [QUIT], 8) = 0
[pid 36810] clone( <unfinished ...>
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36810] <... clone resumed> child_stack=0xc420098000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD) = 36814
Process 36814 attached
[pid 36810] rt_sigprocmask(SIG_SETMASK, [QUIT],  <unfinished ...>
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36810] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 36814] gettid()                    = 36814
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36814] arch_prctl(ARCH_SET_FS, 0xc420086068 <unfinished ...>
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36814] <... arch_prctl resumed> )  = 0
[pid 36814] sigaltstack(NULL, {ss_sp=0, ss_flags=SS_DISABLE, ss_size=0}) = 0
[pid 36814] sigaltstack({ss_sp=0xc42008a000, ss_flags=0, ss_size=32768} <unfinished ...>
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36814] <... sigaltstack resumed> , NULL) = 0
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36814] rt_sigprocmask(SIG_SETMASK, [QUIT], NULL, 8) = 0
[pid 36814] gettid()                    = 36814
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36814] rt_sigprocmask(SIG_SETMASK, ~[], [QUIT], 8) = 0
[pid 36814] clone( <unfinished ...>
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36809] select(0, NULL, NULL, NULL, {0, 20}Process 36816 attached
 <unfinished ...>
[pid 36816] gettid( <unfinished ...>
[pid 36814] <... clone resumed> child_stack=0xc420094000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD) = 36816
[pid 36816] <... gettid resumed> )      = 36816
[pid 36814] rt_sigprocmask(SIG_SETMASK, [QUIT],  <unfinished ...>
[pid 36816] arch_prctl(ARCH_SET_FS, 0xc42009a068 <unfinished ...>
[pid 36814] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36816] <... arch_prctl resumed> )  = 0
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36816] sigaltstack(NULL, {ss_sp=0, ss_flags=SS_DISABLE, ss_size=0}) = 0
[pid 36816] sigaltstack({ss_sp=0xc42009e000, ss_flags=0, ss_size=32768}, NULL) = 0
[pid 36816] rt_sigprocmask(SIG_SETMASK, [QUIT],  <unfinished ...>
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36816] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36816] gettid()                    = 36816
[pid 36816] select(0, NULL, NULL, NULL, {0, 3} <unfinished ...>
[pid 36809] <... select resumed> )      = 0 (Timeout)
[pid 36809] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 36816] <... select resumed> )      = 0 (Timeout)
[pid 36816] write(3, "x", 1 <unfinished ...>
[pid 36809] <... select resumed> )      = 0 (Timeout)
Process 36799 detached
Process 36809 detached
Process 36810 detached
Process 36811 detached
Process 36814 detached
Process 36816 detached

What can I do next (and thanks for helping!)?

@hartzell
Copy link

ps. That test "passed" (noticed that I lopped the testy parts out) and/but took a bit over 20 seconds.

ok  	runtime	20.596s

@hartzell
Copy link

I connected the dots with instructions that @minux gave me in a related thread.

I ran a build with an unmodified crash_unix_test.go, waited until the problematic test was running, then did as @minux suggested.

It's not what I expected, but here's the result:

$ ps auxwww | grep a.exe
me 28496  366  0.0   2984   912 ?        Rl   20:03   2:52 /tmp/go-build163565211/a.exe
me 39993  0.0  0.0 112648   976 pts/1    S+   20:04   0:00 grep --color=auto a.exe
$ gdb -p 28496
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 28496
Reading symbols from /tmp/go-build163565211/a.exe...done.
0x000000000047d4dd in main.loop (i=3, c=0xc420014240) at /tmp/go-build163565211/main.go:37
37			for j := 0; j < 0x7fffffff; j++ {
warning: File "/.../tmp/poodle/go/src/runtime/runtime-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
	add-auto-load-safe-path /.../tmp/poodle/go/src/runtime/runtime-gdb.py
line to your configuration file "/.../.gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/.../.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
(gdb) set logging file /tmp/gdb.log
(gdb) set logging on
Copying output to /tmp/gdb.log.
(gdb) set height 0
(gdb) thread apply all bt

Thread 1 (process 28496):
#0  0x000000000047d4dd in main.loop (i=3, c=0xc420014240) at /tmp/go-build163565211/main.go:37
#1  0x000000000044d411 in runtime.goexit () at /isilon/Analysis/scratch/hartzelg/tmp/poodle/go/src/runtime/asm_amd64.s:2197
#2  0x0000000000000003 in ?? ()
#3  0x000000c420014240 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb) quit
A debugging session is active.

	Inferior 1 [process 28496] will be detached.

Quit anyway? (y or n) y
Detaching from program: /tmp/go-build163565211/a.exe, process 28496

Here is the /tmp/gdb.log file:


Thread 1 (process 28496):
#0  0x000000000047d4dd in main.loop (i=3, c=0xc420014240) at /tmp/go-build163565211/main.go:37
#1  0x000000000044d411 in runtime.goexit () at /isilon/Analysis/scratch/hartzelg/tmp/poodle/go/src/runtime/asm_amd64.s:2197
#2  0x0000000000000003 in ?? ()
#3  0x000000c420014240 in ?? ()
#4  0x0000000000000000 in ?? ()
A debugging session is active.

	Inferior 1 [process 28496] will be detached.

Quit anyway? (y or n) Detaching

@aclements
Copy link
Member

@hartzell, sorry for dropping this on the floor.

Could you re-do what you did with gdb in your last comment, but first add import _ "runtime/cgo" to crashDumpsAllThreadsSource in runtime/crash_unix_test.go? (The problem is that GDB assumes it can look at the pthread data structures to find the threads in the process, but those only exist in a cgo binary. Hence, without an import of cgo, you only see the initial thread.)

@hartzell
Copy link

hartzell commented Jun 9, 2017

Hi @aclements -- I'm on vacation (miles and worlds away from work). I'll be back in about 10 days and will see if I can do anything with this. I no longer have access to that particularly large machine but will see I can get it again.

@bradfitz bradfitz added the Testing An issue that has been verified to require only test changes, not just a test failure. label Jun 29, 2017
@aclements
Copy link
Member

Ping @hartzell

@aclements aclements added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jul 10, 2017
@chlunde
Copy link
Contributor

chlunde commented Jul 16, 2017

It looks like that when you start a program from Java, not just Jenkins, the SIGQUIT signal is blocked. I do not know if this blocking is a JVM bug or feature: The Java runtime prints a stack trace and keeps on running on SIGQUIT, while most programs would of course terminate. It might be that Java tries to protect other programs from Ctrl-\ in terminal sessions by blocking this signal, because typing Ctrl-\ (unlike kill -QUIT PID) would also be sent to the Go program if it is in the same process group.

When I patch the Go runtime to unblock SIGQUIT,

diff --git a/src/runtime/sigtab_linux_generic.go b/src/runtime/sigtab_linux_generic.go
index 874148e1d2..8f4508dd90 100644
--- a/src/runtime/sigtab_linux_generic.go
+++ b/src/runtime/sigtab_linux_generic.go
@@ -19,7 +19,7 @@ var sigtable = [...]sigTabT{
        /* 0 */ {0, "SIGNONE: no trap"},
        /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
        /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
-       /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+       /* 3 */ {_SigNotify + _SigThrow + _SigUnblock, "SIGQUIT: quit"},
        /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
        /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
        /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},

all.bash passes.

Are there any good reasons for leaving SIGQUIT blocked, except for the terminal Ctrl-\ issue?

Here /proc/PID/status when I'm running /usr/bin/sleep from Java:

$ grep -E 'Sig|Pnd' /proc/28083/status
SigQ:	0/47016
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000004
SigIgn:	0000000000001000
SigCgt:	0000000000000000
$ kill -QUIT 28083
$ grep -E 'Sig|Pnd' /proc/28083/status
SigQ:	1/47016
SigPnd:	0000000000000000
ShdPnd:	0000000000000004
SigBlk:	0000000000000004
SigIgn:	0000000000001000
SigCgt:	0000000000000000

@chlunde chlunde marked this as a duplicate of #19203 Jul 16, 2017
@hartzell
Copy link

I've closed #19203 in favor of this, @chlunde has a more amenable test case.

Rather than patching the go runtime, can a analogous change be made in the test?

@aclements
Copy link
Member

@chlunde, thanks for figuring that out! I suspect you're right about the reasoning behind Java blocking SIQQUIT.

Rather than patching the go runtime, can a analogous change be made in the test?

This seems like the right approach to me. Could someone try the following patch? (This isn't the right way to do it, but if this works I'll put together a real CL)

--- a/src/runtime/crash_test.go
+++ b/src/runtime/crash_test.go
@@ -12,18 +12,24 @@ import (
        "io/ioutil"
        "os"
        "os/exec"
+       "os/signal"
        "path/filepath"
        "regexp"
        "runtime"
        "strconv"
        "strings"
        "sync"
+       "syscall"
        "testing"
        "time"
 )
 
 var toRemove []string
 
+func init() {
+       signal.Notify(make(chan os.Signal), syscall.SIGQUIT)
+}
+
 func TestMain(m *testing.M) {
        status := m.Run()
        for _, file := range toRemove {

@chlunde
Copy link
Contributor

chlunde commented Jul 17, 2017

@aclements It kind of works, a little bit. For the parent process, I can send a signal. The test still hangs.

Reproducer:

public class Wrap {
    public static void main(String... args) throws Exception {
        Process proc = new ProcessBuilder(args).inheritIO().start();
        System.exit(proc.waitFor());
    }
}
java -cp ~/tmp Wrap ./runtime.test -test.run TestCrashDumpsAllThreads -test.v

@aclements
Copy link
Member

Thanks. I added some prints in the runtime and confirmed that that change to the test does cause it to unblock SIGQUIT with sigprocmask, but it remains blocked according to SigBlk in proc. In fact, it's the only blocked signal according to proc, even though we're setting a signal mask of 0xfffffffe7bfe7b23, which I'm rather confused by. @ianlancetaylor, any idea why this isn't working?

@ianlancetaylor
Copy link
Contributor

Unblocking the signal in the runtime.test program does not unblock the signal in child processes started by the runtime.test program. Using signal.Notify to unblock SIGQUIT arranges for the signal thread (the one started by ensureSigM in runtime/signal_unix.go) to call pthread_sigmask to unblock SIGQUIT for that thread. But that thread is not the one that is going to be starting the child process. The child process will be started by some other thread, and it will inherit the signal mask of that other thread. That thread's signal mask will preserve the fact that SIGQUIT was blocked when the program was started.

@ianlancetaylor
Copy link
Contributor

It's likely that this patch in addition to yours would fix the problem.

diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index 4f61f6164b..29caef5711 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -2862,7 +2862,26 @@ func syscall_runtime_AfterForkInChild() {
 
 	// When we are the child we are the only thread running,
 	// so we know that nothing else has changed gp.m.sigmask.
-	msigrestore(getg().m.sigmask)
+	sigmask := getg().m.sigmask
+
+	// Update the mask based on what the os/signal package has done.
+	if sig.inuse {
+		for i := 0; i < _NSIG; i += 32 {
+			wanted := atomic.Load(&sig.wanted[i/32])
+			ignored := atomic.Load(&sig.ignored[i/32])
+			for j := 0; j < 32; j++ {
+				bit := uint32(1) << uint(j)
+				if wanted & bit != 0 {
+					sigdelset(&sigmask, i+j)
+				}
+				if ignored & bit != 0 {
+					sigaddset(&sigmask, i+j)
+				}
+			}
+		}
+	}
+
+	msigrestore(sigmask)
 }
 
 // Called from syscall package before Exec.

@chlunde
Copy link
Contributor

chlunde commented Jul 17, 2017

I'm not sure if I understand the goal here, is it

  1. Make it possible to build and test most of Go runtime on a jenkins server
  2. Make it possible to build and test Go runtime on jenkins including this specific test
  3. Make SIGQUIT work our of the box for everyone testing Go programs using Jenkins or another java based build server
  4. create a new interface for unblocking signals in child processes and 2)

or something else? If it is 1), maybe it would be less intrusive to skip this test if an environment variable set by Jenkins is defined, and drop that if Java/Jenkins changes the behaviour.

@ianlancetaylor
Copy link
Contributor

I agree that for the 1.9 release, if there is an environment variable that we can use to reliably detect whether we are running under Jenkins, then we should skip that test if the environment variable is set. Do you know if there is such an environment variable?

For 1.10 I think a patch along the lines of the one I posted above would be reasonable, so that Go programs have a way to control the signal mask of child processes. Or, of course, we could also simply add a field to syscall.SysProcAttr, although I'm not especially excited about doing that.

@aclements
Copy link
Member

I agree that for the 1.9 release, if there is an environment variable that we can use to reliably detect whether we are running under Jenkins, then we should skip that test if the environment variable is set.

Or we could export something to the runtime test that checks if SIGQUIT is blocked.

@hartzell
Copy link

RE: Jenkins environment variables (scroll down to "Jenkins set environment variables): https://wiki.jenkins.io/display/JENKINS/Building+a+software+project

JENKINS_URL seems most evocative of the list. I use WORKSPACE in some of my tooling.

@chlunde
Copy link
Contributor

chlunde commented Jul 18, 2017

@ianlancetaylor that patch works, tested with Wrap.java and in Jenkins.

@hartzell I don't have JENKINS_URL here on a freshly started jenkins server.

I also took a stab at trying to detect if SIGQUIT is blocked, but I'm not sure if this is the right approach and portable, given that sigset is different and there are buildtags involved.

diff --git a/src/runtime/crash_unix_test.go b/src/runtime/crash_unix_test.go
index fdb3267006..73da55d267 100644
--- a/src/runtime/crash_unix_test.go
+++ b/src/runtime/crash_unix_test.go
@@ -31,6 +31,10 @@ func TestCrashDumpsAllThreads(t *testing.T) {
                t.Skipf("skipping; not supported on %v", runtime.GOOS)
        }
 
+       if runtime.Sigisblocked(int(sigquit)) {
+               t.Skip("skipping; SIGQUIT is blocked, see golang.org/issue/19196")
+       }
+
        // We don't use executeTest because we need to kill the
        // program while it is running.
 
diff --git a/src/runtime/export_linux_test.go b/src/runtime/export_linux_test.go
index ef0c111677..c8c94702b6 100644
--- a/src/runtime/export_linux_test.go
+++ b/src/runtime/export_linux_test.go
@@ -8,3 +8,15 @@ package runtime
 
 var NewOSProc0 = newosproc0
 var Mincore = mincore
+
+func sigismember(mask *sigset, i int) bool {
+       clear := *mask
+       sigdelset(&clear, i)
+       return clear != *mask
+}
+
+func Sigisblocked(i int) bool {
+       var sigmask sigset
+       sigprocmask(_SIG_SETMASK, nil, &sigmask)
+       return sigismember(&sigmask, i)
+}
$ java -cp ~/tmp Wrap ./runtime.test -test.run TestCrashDumpsAllThreads -test.v
=== RUN   TestCrashDumpsAllThreads
--- SKIP: TestCrashDumpsAllThreads (0.00s)
	crash_unix_test.go:35: skipping; SIGQUIT is blocked, see golang.org/issue/19196
PASS

$ ./runtime.test -test.run TestCrashDumpsAllThreads -test.v
=== RUN   TestCrashDumpsAllThreads
--- PASS: TestCrashDumpsAllThreads (1.29s)
PASS

@hartzell
Copy link

@hartzell I don't have JENKINS_URL here on a freshly started jenkins server.

I just created a new job on my Jenkins server (v2.32.2) that had a single "Execute shell" build step that contained

#!/bin/sh
printenv

and its output included JENKINS_URL.

Wonder what's different? I guess if you really don't have it then it's not universal and so the "why?" doesn't matter.

Is there another variable on the list that is in your jobs' environments?

@chlunde
Copy link
Contributor

chlunde commented Jul 18, 2017

I just ran java -jar jenkins-war-2.70-SNAPSHOT.war and accepted the default plugins, without configuring jenkins:

BUILD_DISPLAY_NAME
BUILD_ID
BUILD_NUMBER
BUILD_TAG
EXECUTOR_NUMBER
HUDSON_COOKIE
HUDSON_HOME
HUDSON_SERVER_COOKIE
JENKINS_HOME
JENKINS_SERVER_COOKIE
JOB_BASE_NAME
JOB_DISPLAY_URL
JOB_NAME
RUN_CHANGES_DISPLAY_URL
RUN_DISPLAY_URL
WORKSPACE

@hartzell
Copy link

There are some things that don't have default values, instead it forces you to the configuration page the first time you connect and provides default values that get used if/when you save the config page w/out providing anything specific.

This is one of them.

I have a setup that stands up a Jenkins server w/out requiring any user intervention, I had to add a bit of groovy to ensure that it got set, similar to what's described here on the jenkins wiki.

BUILD_ID and BUILD_NUMBER are on your list and the official-ish wiki page. JENKINS_HOME is seductive but it's not on the wiki page...

@aclements
Copy link
Member

@chlunde, your patch looks pretty reasonable to me, other than some minor build tag issues. May I turn it into a CL? (You're welcome to, of course, but I'm happy to.)

@chlunde
Copy link
Contributor

chlunde commented Jul 19, 2017

@aclements Sure, go ahead

@gopherbot
Copy link

CL https://golang.org/cl/50330 mentions this issue.

@golang golang locked and limited conversation to collaborators Jul 20, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

8 participants