New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: unix: c-shared libraries segfault handler overrides the default handler #14899
Comments
Do you have any suggestions for how to fix this? |
Not yet. Can you try to help me understand why it's currently implemented in the way it is and what the difficulties may be in addressing this issue? I assume it's because the Go runtime was originally not supposed to be loaded into other processes, but I'm not sure. It's currently making it very difficult / near impossible to debug some crashes in our product, so I am eager to help fix the issue or at least come up with a work around. |
The Go language specifies that certain operations cause a runtime panic. These operations include dereferencing a pointer that is nil or that is somehow invalid. That is implemented by installing a signal handler, catching the signal, and turning it into a panic if appropriate. When using c-shared the Go runtime looks at the location where the signal occurred. If the signal occurred running non-Go code, it removes the Go signal handler, installs the previous handler if there was one, and raises the signal again. That is presumably what you are seeing. Go c-shared libraries are always multi-threaded and always have background threads running Go code (for the garbage collector if nothing else), so it doesn't work to only install the signal handler while Go code is running. For your purposes you could decide that you don't care about correct Go handling of invalid pointer references, and simply remove the Go signal handler after loading the shared library. That might work for you, but it would not be a good choice for most Go programs or shared libraries. Otherwise, I don't know what to do, but I'm certainly open to suggestions. The handling of signals with mixed C/Go code is definitely a complex area; see https://golang.org/pkg/os/signal . |
This is what I would expect, but in the way that it's re-raising the signals (by using the kill system call to send the signal to the process) we lose all context. The Windows handler seems to check to see if the IP that caused the fault is not in the range of Go, and if so tells Windows to try the next handler in line. ( go/src/runtime/signal_windows.go Line 70 in a03bdc3
Couldn't we do something similar? I believe in C you can do something like the following: static struct sigaction _sigsev_old;
static struct sigaction _sigsev;
...
// Initialize segfault handler, saving original
_sigsev.sa_sigaction = sigsev_handler;
_sigsev.sa_flags = SA_SIGINFO;
sigaction(SIGBUS, &_sigsev, &_sigsev_old);
...
// Somewhere inside the segfault handler
if (!isgoexception(siginfo)) {
_sigsev_old.sa_flags = SA_RESETHAND;
sigaction(SIGSEGV, &_sigsev_old, NULL);
} |
What you suggest we do is what we already do. The crash report you show above is not coming from Go, it's coming from something else, either the non-Go signal handler or the OS. You clearly aren't happy with that crash report. Are you unhappy because it shows 6 different stacks, or because the first stack says |
This is the standard OS X crash report. I get why shows multiple stacks. That makes sense and is expected. Our problem is that the identifies |
Is it possible to fix up the context stack in these situations? |
One possibility would be to simply return from the Go signal handler, after restoring the old signal handler. That should cause the instruction to be executed again, raising the signal again, this time invoking the old handler. I'm not sure but it seems worth a try. But a serious problem with that idea, or in general with any attempt to fix up the stack in the signal context, is that there would be no way to reinstall the Go signal handler if the program did not simply crash. In this case, of course, it does crash, but in the general case the non-Go signal handler might handle the signal and continue executing. |
Well I thought you could just call the original sighandler, passing the same context struct without uninstalling the go signal handler. This doesn't seem to be what's happening (at least when the original sig handler is the default). go/src/runtime/signal2_unix.go Line 42 in 5fea2cc
I tried removing that check, but then I think I get into some sort of infinite sighandling loop. It's hard to debug at that level, because you can't print anything out due to the |
I've got an idea that at least seems to work on my end. Let me know what you think and I'll submit a code review request.
I propose that in the case that the signal is _SIG_DFL we return from We can also only do it for signals which we know the default handler does not recover from, so we know that there's no chance that the signal will be recovered and needed to be re-setup. |
That sounds like it might work. We would have to check that the signal was produced because of execution--that it was not the result of a call to kill or pthread_kill. Basically we have to be sure that the signal really will happen again. |
So I've got a patch, but I'm seeing something really weird that I can't figure out. I'm bootstrapping with Go 1.6 and I get a segfault in Any idea what might cause this? A search through the codebase tells me that the field is only checked against specific bits. Setting additional flags shouldn't cause a problem... TEXT runtime·sigaction(SB),NOSPLIT,$0-24
MOVL mode+0(FP), DI // arg 1 sig
MOVQ new+8(FP), SI // arg 2 act
MOVQ old+16(FP), DX // arg 3 oact
MOVQ old+16(FP), CX // arg 3 oact
MOVQ old+16(FP), R10 // arg 3 oact
MOVL $(0x2000000+46), AX // syscall entry
SYSCALL
JCC 2(PC)
MOVL $0xf1, 0xf1 // crash <- crashing here because the syscall fails
RET |
The setup to that syscall looks wrong to me. Why is the 3rd parameter being added to 3 different registers? The |
OK I figured out why the syscall is failing. It's returning if (signum <= 0 || signum >= NSIG ||
signum == SIGKILL || signum == SIGSTOP)
return (EINVAL); For whatever reason changing the flags on SIGKILL must cause |
CL https://golang.org/cl/21006 mentions this issue. |
Apparently the CL broke the Solaris build. I'm looking into it. |
CL https://golang.org/cl/21145 mentions this issue. |
This fixes the problems with signal handling that were inadvertently introduced in https://go-review.googlesource.com/21006. Fixes #14899 Change-Id: Ia746914dcb3146a52413d32c57b089af763f0810 Reviewed-on: https://go-review.googlesource.com/21145 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Please answer these questions before submitting your issue. Thanks!
go version
)?go version go1.6 darwin/amd64
orgo version go1.6 darwin/386
go env
)?buildmode=c-shared
The program should crash and the default segfault handler should be called (since the crash was not caused by Go).
I'd expect to see a crash like this:
Go's signal handler takes precedence, obscuring the stack trace, and making it default to debug crashes in applications that link against Go c-shared libraries.
I believe this has to do with the following code:
The text was updated successfully, but these errors were encountered: