runtime: unix: c-shared libraries segfault handler overrides the default handler #14899

jtsylve · 2016-03-21T20:04:43Z

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.6 darwin/amd64 or go version go1.6 darwin/386

What operating system and processor architecture are you using (go env)?

GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/joe/go:/Users/joe/src/repo004/GOPATH"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.6/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.6/libexec/pkg/tool/darwin_amd64"
GO15VENDOREXPERIMENT="1"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fno-common"
CXX="clang++"
CGO_ENABLED="1"

What did you do?

Compile a library using buildmode=c-shared
Link against a C program
Cause the C program to segfault in any way (ex. dereference a NULL pointer)

What did you expect to see?

The program should crash and the default segfault handler should be called (since the crash was not caused by Go).

I'd expect to see a crash like this:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   crash                           0x00000001023cdf91 main + 17
1   libdyld.dylib                   0x00007fff89f915c9 start + 1

What did you see instead?

Go's signal handler takes precedence, obscuring the stack trace, and making it default to debug crashes in applications that link against Go c-shared libraries.

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libmygolib.dylib                   0x02ef6955 runtime.raiseproc + 37 (sys_darwin_386.s:75)

Thread 1:
0   libmygolib.dylib                   0x02ef6d7a runtime.mach_semaphore_wait + 10 (sys_darwin_386.s:454)

Thread 2:
0   libmygolib.dylib                   0x02ef6c3e runtime.usleep + 78 (sys_darwin_386.s:330)

Thread 3:
0   libmygolib.dylib                   0x02ef6d7a runtime.mach_semaphore_wait + 10 (sys_darwin_386.s:454)

Thread 4:
0   libmygolib.dylib                   0x02ef6d7a runtime.mach_semaphore_wait + 10 (sys_darwin_386.s:454)

Thread 5:
0   libmygolib.dylib                   0x02ef6d7a runtime.mach_semaphore_wait + 10 (sys_darwin_386.s:454)

I believe this has to do with the following code:

TEXT runtime·raise(SB),NOSPLIT,$0
    // Ideally we'd send the signal to the current thread,
    // not the whole process, but that's too hard on OS X.
    JMP runtime·raiseproc(SB)

TEXT runtime·raiseproc(SB),NOSPLIT,$24
    MOVL    $(0x2000000+20), AX // getpid
    SYSCALL
    MOVQ    AX, DI  // arg 1 - pid
    MOVL    sig+0(FP), SI   // arg 2 - signal
    MOVL    $1, DX  // arg 3 - posix
    MOVL    $(0x2000000+37), AX // kill
    SYSCALL
    RET

The text was updated successfully, but these errors were encountered:

ianlancetaylor · 2016-03-21T20:49:57Z

Do you have any suggestions for how to fix this?

jtsylve · 2016-03-21T20:59:36Z

Not yet. Can you try to help me understand why it's currently implemented in the way it is and what the difficulties may be in addressing this issue? I assume it's because the Go runtime was originally not supposed to be loaded into other processes, but I'm not sure.

It's currently making it very difficult / near impossible to debug some crashes in our product, so I am eager to help fix the issue or at least come up with a work around.

ianlancetaylor · 2016-03-21T21:24:41Z

The Go language specifies that certain operations cause a runtime panic. These operations include dereferencing a pointer that is nil or that is somehow invalid. That is implemented by installing a signal handler, catching the signal, and turning it into a panic if appropriate.

When using c-shared the Go runtime looks at the location where the signal occurred. If the signal occurred running non-Go code, it removes the Go signal handler, installs the previous handler if there was one, and raises the signal again. That is presumably what you are seeing.

Go c-shared libraries are always multi-threaded and always have background threads running Go code (for the garbage collector if nothing else), so it doesn't work to only install the signal handler while Go code is running.

For your purposes you could decide that you don't care about correct Go handling of invalid pointer references, and simply remove the Go signal handler after loading the shared library. That might work for you, but it would not be a good choice for most Go programs or shared libraries.

Otherwise, I don't know what to do, but I'm certainly open to suggestions. The handling of signals with mixed C/Go code is definitely a complex area; see https://golang.org/pkg/os/signal .

jtsylve · 2016-03-21T21:43:03Z

When using c-shared the Go runtime looks at the location where the signal occurred. If the signal occurred running non-Go code, it removes the Go signal handler, installs the previous handler if there was one, and raises the signal again. That is presumably what you are seeing.

This is what I would expect, but in the way that it's re-raising the signals (by using the kill system call to send the signal to the process) we lose all context.

The Windows handler seems to check to see if the IP that caused the fault is not in the range of Go, and if so tells Windows to try the next handler in line. (

go/src/runtime/signal_windows.go

Line 70 in a03bdc3

if !isgoexception(info, r) {

)

Couldn't we do something similar?

I believe in C you can do something like the following:

static struct sigaction _sigsev_old;
static struct sigaction _sigsev;

...

// Initialize segfault handler, saving original 
_sigsev.sa_sigaction = sigsev_handler;
_sigsev.sa_flags = SA_SIGINFO;
sigaction(SIGBUS, &_sigsev, &_sigsev_old);

...

// Somewhere inside the segfault handler
if (!isgoexception(siginfo)) {
    _sigsev_old.sa_flags = SA_RESETHAND;
    sigaction(SIGSEGV, &_sigsev_old, NULL);
}

ianlancetaylor · 2016-03-21T22:21:26Z

What you suggest we do is what we already do.

The crash report you show above is not coming from Go, it's coming from something else, either the non-Go signal handler or the OS. You clearly aren't happy with that crash report. Are you unhappy because it shows 6 different stacks, or because the first stack says runtime.raiseproc rather than something in your own program, or both, or something else entirely?

jtsylve · 2016-03-21T22:27:56Z

This is the standard OS X crash report. I get why shows multiple stacks. That makes sense and is expected. Our problem is that the identifies runtime.raiseproc as the source of the fault rather than whatever actually causes the fault.

jtsylve · 2016-03-21T23:32:33Z

Is it possible to fix up the context stack in these situations?

ianlancetaylor · 2016-03-21T23:58:59Z

One possibility would be to simply return from the Go signal handler, after restoring the old signal handler. That should cause the instruction to be executed again, raising the signal again, this time invoking the old handler. I'm not sure but it seems worth a try.

But a serious problem with that idea, or in general with any attempt to fix up the stack in the signal context, is that there would be no way to reinstall the Go signal handler if the program did not simply crash. In this case, of course, it does crash, but in the general case the non-Go signal handler might handle the signal and continue executing.

jtsylve · 2016-03-22T00:35:10Z

Well I thought you could just call the original sighandler, passing the same context struct without uninstalling the go signal handler. This doesn't seem to be what's happening (at least when the original sig handler is the default).

go/src/runtime/signal2_unix.go

Line 42 in 5fea2cc

if fwdFn == _SIG_DFL {

I tried removing that check, but then I think I get into some sort of infinite sighandling loop. It's hard to debug at that level, because you can't print anything out due to the nosplit

jtsylve · 2016-03-22T19:11:36Z

I've got an idea that at least seems to work on my end. Let me know what you think and I'll submit a code review request.

raisebadsignal resets the signal handler to the original, calls raise to re-raise the signal, and then if it returns resets the signal handler to the go signal handler. raisebadsignal only seems to be called if the original handler couldn't be called with the context in sigfwdgo.

I propose that in the case that the signal is _SIG_DFL we return from raisebadsignal after resetting the handler instead of raising the signal with raise. This will allow the default handler to be called without messing with the stack.

We can also only do it for signals which we know the default handler does not recover from, so we know that there's no chance that the signal will be recovered and needed to be re-setup.

ianlancetaylor · 2016-03-22T20:43:25Z

That sounds like it might work. We would have to check that the signal was produced because of execution--that it was not the result of a call to kill or pthread_kill. Basically we have to be sure that the signal really will happen again.

jtsylve · 2016-03-23T00:26:43Z

So I've got a patch, but I'm seeing something really weird that I can't figure out. I'm bootstrapping with Go 1.6 and I get a segfault in runtime.sigaction + 33 (sys_darwin_amd64.s:212) while compiling. I've narrowed it down to what causes it, but I have no idea why. I defined another sigtable flag so that we can easily lookup which default handlers are fatal on each OS. If I set the flag for SIGKILL to anything other than 0, I get the segfault in the build process (even if that's the only change).

Any idea what might cause this? A search through the codebase tells me that the field is only checked against specific bits. Setting additional flags shouldn't cause a problem...

TEXT runtime·sigaction(SB),NOSPLIT,$0-24
        MOVL    mode+0(FP), DI          // arg 1 sig
        MOVQ    new+8(FP), SI           // arg 2 act
        MOVQ    old+16(FP), DX          // arg 3 oact
        MOVQ    old+16(FP), CX          // arg 3 oact
        MOVQ    old+16(FP), R10         // arg 3 oact
        MOVL    $(0x2000000+46), AX     // syscall entry
        SYSCALL
        JCC     2(PC)
        MOVL    $0xf1, 0xf1  // crash <- crashing here because the syscall fails
        RET

jtsylve · 2016-03-23T00:44:36Z

The setup to that syscall looks wrong to me. Why is the 3rd parameter being added to 3 different registers? The sigaction syscall only has 3 parameters...

jtsylve · 2016-03-23T02:35:08Z

OK I figured out why the syscall is failing. It's returning EINVAL, which according to the xnu source of sigaction is because it's being passed SIGKILL.

if (signum <= 0 || signum >= NSIG ||
        signum == SIGKILL || signum == SIGSTOP)
        return (EINVAL);

For whatever reason changing the flags on SIGKILL must cause sigaction to be called on it. I still can't figure out why that is, but since the handler can't be changed anyway I don't need to set the flags on that one.

jtsylve · 2016-03-23T03:27:05Z

CL: https://golang.org/cl/21006/

gopherbot · 2016-03-23T04:01:00Z

CL https://golang.org/cl/21006 mentions this issue.

jtsylve · 2016-03-25T01:18:23Z

Apparently the CL broke the Solaris build. I'm looking into it.

gopherbot · 2016-03-25T19:01:04Z

CL https://golang.org/cl/21145 mentions this issue.

This fixes the problems with signal handling that were inadvertently introduced in https://go-review.googlesource.com/21006. Fixes #14899 Change-Id: Ia746914dcb3146a52413d32c57b089af763f0810 Reviewed-on: https://go-review.googlesource.com/21145 Reviewed-by: Ian Lance Taylor <iant@golang.org>

ianlancetaylor changed the title ~~darwin: c-shared libraries segfault handler overrides the default handler~~ runtime: darwin: c-shared libraries segfault handler overrides the default handler Mar 21, 2016

ianlancetaylor added the OS-Darwin label Mar 21, 2016

ianlancetaylor added this to the Go1.7 milestone Mar 21, 2016

jtsylve changed the title ~~runtime: darwin: c-shared libraries segfault handler overrides the default handler~~ runtime: unix: c-shared libraries segfault handler overrides the default handler Mar 22, 2016

gopherbot closed this as completed in df2b2eb Mar 24, 2016

golang locked and limited conversation to collaborators Mar 26, 2017

gopherbot added the FrozenDueToAge label Mar 26, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: unix: c-shared libraries segfault handler overrides the default handler #14899

runtime: unix: c-shared libraries segfault handler overrides the default handler #14899

jtsylve commented Mar 21, 2016

ianlancetaylor commented Mar 21, 2016

jtsylve commented Mar 21, 2016

ianlancetaylor commented Mar 21, 2016

jtsylve commented Mar 21, 2016

ianlancetaylor commented Mar 21, 2016

jtsylve commented Mar 21, 2016

jtsylve commented Mar 21, 2016

ianlancetaylor commented Mar 21, 2016

jtsylve commented Mar 22, 2016

jtsylve commented Mar 22, 2016

ianlancetaylor commented Mar 22, 2016

jtsylve commented Mar 23, 2016

jtsylve commented Mar 23, 2016

jtsylve commented Mar 23, 2016

jtsylve commented Mar 23, 2016

gopherbot commented Mar 23, 2016

jtsylve commented Mar 25, 2016

gopherbot commented Mar 25, 2016

runtime: unix: c-shared libraries segfault handler overrides the default handler #14899

runtime: unix: c-shared libraries segfault handler overrides the default handler #14899

Comments

jtsylve commented Mar 21, 2016

ianlancetaylor commented Mar 21, 2016

jtsylve commented Mar 21, 2016

ianlancetaylor commented Mar 21, 2016

jtsylve commented Mar 21, 2016

ianlancetaylor commented Mar 21, 2016

jtsylve commented Mar 21, 2016

jtsylve commented Mar 21, 2016

ianlancetaylor commented Mar 21, 2016

jtsylve commented Mar 22, 2016

jtsylve commented Mar 22, 2016

ianlancetaylor commented Mar 22, 2016

jtsylve commented Mar 23, 2016

jtsylve commented Mar 23, 2016

jtsylve commented Mar 23, 2016

jtsylve commented Mar 23, 2016

gopherbot commented Mar 23, 2016

jtsylve commented Mar 25, 2016

gopherbot commented Mar 25, 2016