Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgo: use of libc's backtrace from a C++ signal handler invoked for Go code crashes on ppc64le #46374

Open
laboger opened this issue May 25, 2021 · 7 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@laboger
Copy link
Contributor

laboger commented May 25, 2021

What version of Go are you using (go version)?

$ go version
tip

Does this issue reproduce with the latest release?

yes

This original failure is documented here cockroachdb/cockroach#62979

I'm opening this issue to get some feedback on this problem and possible solutions. I know there have been other problems reported in the community related to cgo stack unwinding and this falls into that category.

In this case there is a problem because there is a C++ signal handler which calls backtrace from libc, and the interrupt can happen in Go or non-Go code. The code where this feature was added can be found here cockroachdb/cockroach@957b4bd.

The call to backtrace gets a SEGV on ppc64le which is not unexpected since backtrace expects the stackframe to contain the backchain pointer (according to the PPC64 ABI). But since Go doesn't maintain the backchain pointer, when backtrace tries to unwind the stack it eventually tries to use something as a pointer which is not a pointer and SEGVs.

I've seen several other issues related to unwinding stacks with cgo, and in particular one that mentions the use of cgosymbolizer by as a solution. It seems like cgosymbolizer could be used in this case from within a signal handler to give the desired stack information when Go code is involved.

@ianlancetaylor @cherrymui Does this seem like a possible solution, to create a Go signal handler which could then use cgosymbolizer to dump the stacks?
@pmur

@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 25, 2021
@mknyszek mknyszek added this to the Backlog milestone May 25, 2021
@mknyszek
Copy link
Contributor

Does this reproduce with older versions of Go as well, or is this new?

Judging by your analysis, this is true for older Go versions as well, hence why I put it in the backlog. If it's new, we probably need to deal with it this release.

@laboger
Copy link
Contributor Author

laboger commented May 26, 2021

Backlog is fine, I don't think this has ever worked. I'm not expecting this to get fixed for this release but trying to understand if this should work as is. To me it seems like they are doing some things that might not work with Go. And if that is the case is there another way this could be written to make it work, or at least not SEGV for platforms where this would be a problem.

Here is a summary as I understand their code where this started failing:

There is some C++ code to dump out all the stacks for a process.
A C++ signal handler for SIGRTMIN is created which calls libc's backtrace function.
A helper function reads /proc/self/task to create a list of all the threads in the process. Then it walks through the list and sends SIGRTMIN to each using rt_tgsigqueueinfo.
If there is a Go thread which receives the signal then the C++ signal handler is called and gets a SEGV in libc's backtrace.

A few things that seem suspicious (although this "works" on x86) and I don't know if they directly contribute to the problem:

  • I don't think backtrace is async-safe so shouldn't be called from a signal handler
  • The man pages for rt_tgsigqueueinfo say this is not intended to be used in user code. There is no libc wrapper for it.

If I write a program to call backtrace using cgo, this does not SEGV. The Go code will call asmcgocall before calling the C code and that will clear out the location for the backchain that backtrace expects. Since it is nil, backrace will stop at that point. In the scenario where Go code is running and is interrupted so the C++ signal handler gets control, there is nothing like asmcgocall in between to indicate that backtrace should stop.

I'm trying to create a smaller, simpler reproducer for this.

@pmur
Copy link
Contributor

pmur commented May 26, 2021

A simple reproducer of the backtrace crash for ppc64le can be done with some minor changes to the example in #46286 by replacing the obvious segfaulting code with a call to backtrace, and replacing the deadlock with a sufficiently long sleep.

@laboger
Copy link
Contributor Author

laboger commented May 26, 2021

A simple reproducer of the backtrace crash for ppc64le can be done with some minor changes to the example in #46286 by replacing the obvious segfaulting code with a call to backtrace, and replacing the deadlock with a sufficiently long sleep.

I meant a reproducer that has multiple Go and non-Go threads where a signal is sent to all to generate backtrace-type information like in the original issue.

@ianlancetaylor
Copy link
Contributor

The backtrace function is not officially async-signal-safe, but in practice it is. To make it work across Go code I guess that we would have to change the Go ABI to use the standard stack frame layout. That seems like it might be a good idea in general, much as we use frame pointers on x86. But I don't know how hard it would be.

I agree that using cgosymbolizer ought to work.

@laboger
Copy link
Contributor Author

laboger commented May 27, 2021

To make it work across Go code I guess that we would have to change the Go ABI to use the standard stack frame layout. That seems like it might be a good idea in general, much as we use frame pointers on x86. But I don't know how hard it would be.

If we could do this, that would be great. There are other tools the expect the PPC64 ABI so don't work correctly. I was always under the impression this was not an option due to compatibility issues. Something to investigate for the next release due to the work and risk.

I agree that using cgosymbolizer ought to work.

Based on my understanding, it would be better to have a Go signal handler to dump out the stacks for threads running Go and use cgosymbolizer from that signal handler. Does that seem like the right approach?

@ianlancetaylor
Copy link
Contributor

Based on my understanding, it would be better to have a Go signal handler to dump out the stacks for threads running Go and use cgosymbolizer from that signal handler. Does that seem like the right approach?

I think that should work, yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants