Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/cgo: can't safely use signal handling on Darwin, missing support from Go runtime #22805

Closed
knz opened this issue Nov 19, 2017 · 17 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@knz
Copy link

knz commented Nov 19, 2017

  • Go version: 1.9
  • Reproduces with latest release
  • using Darwin/amd64 - this issue is not relevant to any other OS, but is perhaps relevant to other darwin platforms.

tl;dr: Problem: C's standard sigaction on Darwin cannot retrieve the current signal trampoline, therefore cgo code cannot properly restore the Go trampoline. Solution: document the problem and expose a C symbol for use by cgo code to restore Go's signal configuration.

The following repository contains an example of the problem and an example of the manual, ugly solution (barring a clean solution provided by Go):

https://github.com/knz/sigtramp-bug

(Note: this code can only be compiled and run on a darwin system.)

What did you expect to see?

The following code, when called from cgo, should restore Go's signal handler properly:

void test(void) {
  struct sigaction act, oact;

  // save go's signal handler
  sigaction(SIGWINCH, 0, &oact);

  // this code is irrelevant
  act.sa_flags = SA_RESTART|SA_ONSTACK; // as per go docs
  act.sa_handler = xxx;
  sigaction(SIGWINCH, &act, 0);
  // end of irrelevant code

  // restore go's signal handler
  sigaction(SIGWINCH, &oact, 0);
}

The final sigaction call in the code above should ensure that the values saved at the beginning are restored, and thus that any change by the C code is invisible to Go when the function returns.

What did you see instead?

See e.g. https://github.com/knz/sigtramp-bug/tree/master/cgo-failure

After the function completes, issuing the signal causes the following panic:

runtime: newstack sp=0xc420009be8 stack=[0xc42004e000, 0xc420050000]
        morebuf={pc:0x7fff6b863f5a sp:0xc420009bf0 lr:0x0}
        sched={pc:0x403a68a sp:0xc420009be8 lr:0x0 ctxt:0x0}
fatal error: runtime: stack split at bad time

Or sometimes also:

fatal: morestack on g0

Anyway, investigation reveals the following:

https://opensource.apple.com/source/Libc/Libc-1244.1.7/sys/sigaction.c.auto.html

In other words, sigaction always overrides the trampoline with Apple's own _sigtramp. Go's runtime.sigtramp is lost.

Moreover, the underlying syscall __sigaction can set but not retrieve the signal handler (see declaration in the same file: the oact argument has type struct sigaction not struct __sigaction).
This is arguably a bug/misfeature on Apple's side.

Solution - manual

The manual solution is to manually re-inject Go's own runtime.sigtramp in the signal configuration, as done here:

https://github.com/knz/sigtramp-bug/tree/master/cgo-fix

This is cumbersome because:

Solution - desired

  1. the Go docs about Cgo should say "a program that wishes to temporarily set signal handler and then restore Go's signal configuration must use a function sigrestore provided by the Go runtime` (or alternatively a pair sigsave/sigrestore)

  2. the Go runtime should provide the corresponding (sigsave/)sigrestore function for use by Cgo code.

@knz knz changed the title runtime: expose sigtramp on darwin for use by Cgo programs runtime/cgo: can't safely use signal handling on Darwin, missing support from Go runtime Nov 19, 2017
@odeke-em
Copy link
Member

/cc @ianlancetaylor @aclements

@bradfitz
Copy link
Contributor

And @bcmills.

@bradfitz bradfitz added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Nov 20, 2017
@bradfitz bradfitz added this to the Go1.11 milestone Nov 20, 2017
@bcmills
Copy link
Contributor

bcmills commented Nov 20, 2017

This is arguably a bug/misfeature on Apple's side.

Well, no. The Go runtime is using direct syscalls, but as I understand it the BSD-style platforms define libc as the supported API, not the direct syscalls—and sa_tramp is not part of the POSIX libc API. We're the ones messing around with internals, so it's on us to make sure we do it correctly.

Probably the most robust solution would be to extend https://golang.org/cl/33142 to the BSD platforms. (If there is any C code in the binary, we really ought to use libc instead of doing our own syscalls.)

@bcmills
Copy link
Contributor

bcmills commented Nov 20, 2017

therefore cgo code cannot properly restore the Go trampoline.

Portable C code cannot safely restore any signal handler in a multi-threaded program anyway: it is possible for some other sigaction call to occur in the interim, in which case the call to “restore” the handler will actually drop some third handler that was still supposed to be present.

(For that matter, portable C code cannot safely set any signal handler in a multi-threaded program: instead, it should register handlers before threads have started and change their behavior through atomic variables rather than sigaction calls.)

If we can't fix it easily, it's at least worth documenting, but we should give a more robust recommendation: forward signals explicitly, don't try to restore their handlers.

@knz
Copy link
Author

knz commented Nov 20, 2017

@bcmills you're raising a strawman here. The Go runtime calls sigaction once upon runtime initialization. If a Go user wishes to call a C library that plays with signals knowing that all the calls to that library will be serialized and there is no other uses of C code using signals in the rest of the program, then saving/restoring the signal configuration is safe wrt threading.

At CockroachDB we ran into this when using an external library to read input in an interactive program. In any REPL the signalling logic will be perfectly single-threaded. I don't think this is an exotic case and I believe the Go runtime should help by providing either some support or using libc's own sigaction as you proposed above.

@bcmills
Copy link
Contributor

bcmills commented Nov 20, 2017

If a Go user wishes to call a C library that plays with signals knowing that all the calls to that library will be serialized and there is no other uses of C code using signals in the rest of the program

I must be missing something. If you know for a fact that no other libraries in the entire program are making use of the signal, why restore the handler at all?

@knz
Copy link
Author

knz commented Nov 20, 2017

Pseudo-code:

  for {
      line := C.read() // <- this temporarily overrides the handler for SIGWINCH then restores it

     doSomethingInGo(line) // this can take some time, during which SIGWINCH can be received
  }

In this code, if Go's trampoline is not restored properly by C.read(), a subsequent SIGWINCH while running doSomethingInGo will cause a sigsegv/panic. See the example code linked from the initial PR description.

@bcmills
Copy link
Contributor

bcmills commented Nov 20, 2017

I understand what the example code does. What I don't understand is the underlying problem or use-case it attempts to address.

That code is racy unless you somehow know that the signal is only delivered synchronously from the same thread (e.g. via pthread_kill; see #19326). Otherwise, it is possible that a SIGWINCH sent prior to the call to C.test will not be delivered until the new handler is in place (erroneously(?) skipping the Go handler), or that a SIGWINCH sent toward the end of the call will not be delivered until the Go handler is restored.

In either of those cases, the program has to be prepared for the signal to arrive at the wrong handler and respond accordingly, presumably by forwarding to the correct handler. And if you're forwarding anyway, saving and restoring the handler adds system calls and potential races for little apparent benefit.

@ianlancetaylor
Copy link
Contributor

I think that if we fix #17490 then this problem will go away.

I don't think that we should provide sigsave and sigrestore functions. Aside from the valid raciness concerns that @bcmills has raised, if C code knows that it has to call sigsave and sigrestore then that C code can just as easily cooperate with Go's os/signal package to get whatever information it needs. If there were a way to make this work with C code that does not know that it is running in a Go program then that might be worth considering, but I don't see a way.

Closing this since I think this will wind up getting fixed incidentally, and since I don't think there is anything reasonable to do until that happens. Please comment if you disagree.

@knz
Copy link
Author

knz commented Nov 22, 2017

that C code can just as easily cooperate with Go's os/signal package to get whatever information it needs.

I don't believe that's true -- Go's os/signal does not expose runtime.sigtramp, and does not allow the C code to restore Go's behavior after it completes.

Fixing #17490 will only "make this problem go away" if libSystem does not expose the sa_tramp struct field. Otherwise, you won't have any incentive to not use it, and then the problem will persist. Are you sure of this resolution?

@ianlancetaylor
Copy link
Contributor

What I meant by saying that the C code can cooperate with the os/signal package is that Go's os/signal package can report whenever SIGWINCH occurs, and the C code can use that information to do what it needs to do.

You're correct that I am assuming that libSystem does not expose sa_tramp. I can't imagine that it does, but I haven't actually checked.

@knz
Copy link
Author

knz commented Nov 22, 2017

How does the Go code can "report that the signal occurs to the C code"?
In order to make the C code handle the signal it has to be interrupted when the signal is received, to process it (e.g. the C code might be running a read() syscall, and will want to know SIGWINCH was received immediately after read completes, or by interrupting the read call). How do you suggest Go can interrupt the C code, other than using a signal? And thereby requiring the C code to set up a handler, and not confuse Go afterwards like the initial issue description explains?

@ianlancetaylor
Copy link
Contributor

You're right: that case is not going to work. Signal handling in multiple-language programs has many difficulties.

In any case, we are not going to add sigsave and sigrestore calls.

@knz
Copy link
Author

knz commented Nov 25, 2017

Signal handling in multiple language programs has many difficulties

Of the various language run-times I've had the chance to work with (ML, Erlang, Python's), Go is the only one that causes difficulty here. I don't know what the best solution is (I'm OK reading you don't like the idea of an extra API), however the issue described by the title of this issue still exists, and the proposed strategy (addressing #17490) does not yet clearly appear to be a solution to this issue.

I'd say the resolution here is ... lacking.

@knz
Copy link
Author

knz commented Nov 25, 2017

Do you recognize that if used the title "runtime/cgo: can't safely use signal handling on Darwin" without writing "missing support from the runtime", the issue would still not be resolved?

I can perhaps create a new issue that describes the problem at hand, and then we can spell out how exact;y having the Go runtime use libSystem directly will address the use case. At least it would help spell out the requirement that Go's signal handlers must still properly run when libc installs its own trampoline (_sigtramp) when C code calls sigaction.

@ianlancetaylor
Copy link
Contributor

If Go code always installs signal handlers using sigaction, then there will be no problem.

Let's fix #17490, which we have to fix anyhow, and then see where we are.

@gopherbot
Copy link

Change https://golang.org/cl/116875 mentions this issue: runtime: use libc's signal functions on Darwin

gopherbot pushed a commit that referenced this issue Jun 12, 2018
sigaction, sigprocmask, sigaltstack, and raiseproc.

Fix bug in mstart_stub where we weren't saving callee-saved registers,
so if an m finished the pthread library calling mstart_stub would
sometimes fail.

Update #17490
Update #22805

Change-Id: Ie297ede0997910aa956834e49e85711b90cdfaa7
Reviewed-on: https://go-review.googlesource.com/116875
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
benesch added a commit to cockroachdb/cryptopp that referenced this issue Oct 16, 2018
As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb/cockroach#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb/cockroach#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.
benesch added a commit to cockroachdb/cryptopp that referenced this issue Oct 16, 2018
As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb/cockroach#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb/cockroach#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.
benesch added a commit to benesch/cockroach that referenced this issue Oct 16, 2018
Bump CryptoPP to pick up a fix for cockroachdb#31380.
Details reproduced below.

Fix cockroachdb#31380.

---

As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.

Release note (bug fix): CockroachDB no longer crashes due to a SIGTRAP error
soon after startup on macOS Mojave (cockroachdb#31380).
craig bot pushed a commit to cockroachdb/cockroach that referenced this issue Oct 16, 2018
31516: c-deps: bump CryptoPP to avoid SIGTRAP on macOS r=mberhault a=benesch

Bump CryptoPP to pick up a fix for #31380.
Details reproduced below.

Fix #31380.

---

As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in #31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.

Release note (bug fix): CockroachDB no longer crashes due to a SIGTRAP error
soon after startup on macOS Mojave (#31380).

31517: roachtest: deflake acceptance/bank/zerosum-splits r=andreimatei a=benesch

This test requires that the experimental_force_split_at session var be
set to force ALTER ... SPLIT AT to work even with the merge queue
enabled. gosql.DB's connection pool will occasionally open a new
connection which does not have the var set. Set the session var in the
same batch of statements as the ALTER ... SPLIT AT command so that the
session var is always set in the session that executes the ALTER ...
SPLIT AT command.

Fix #31510.

Release note: None

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
benesch added a commit to benesch/cockroach that referenced this issue Oct 16, 2018
Bump CryptoPP to pick up a fix for cockroachdb#31380.
Details reproduced below.

Fix cockroachdb#31380.

---

As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.

Release note (bug fix): CockroachDB no longer crashes due to a SIGTRAP error
soon after startup on macOS Mojave (cockroachdb#31380).
benesch added a commit to cockroachdb/cryptopp that referenced this issue Oct 16, 2018
As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb/cockroach#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb/cockroach#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.
benesch added a commit to benesch/cockroach that referenced this issue Oct 16, 2018
Bump CryptoPP to pick up a fix for cockroachdb#31380.
Details reproduced below.

Fix cockroachdb#31380.

---

As part of its CPU feature detection, CryptoPP installs a SIGILL signal
handler before issuing the cpuid instruction. The intent is to
gracefully degrade on CPUs that don't support the cpuid instruction.

The problem is that it is impossible to safely overwrite a signal
handler installed by the Go runtime in go1.10 on macOS
(golang/go#22805). This causes CockroachDB 2.0 to crash on macOS Mojave:
cockroachdb#31380.

The situation has improved on the Go front, as go1.11 makes it possible
to safely save and restore signal handlers installed by the Go runtime
on macOS.

Still, we can do better and support go1.10. There is no need to bother
installing a SIGILL handler, as the cpuid instruction is supported by
every x86-64 CPU. We can instead use conditional compilation to make
sure that we never execute a cpuid instruction on a non x86-64 CPU.

Note that CPU feature detection is performed at executable load time
(see the attribute(constructor) on DetectX86Features); therefore any
reference to function which calls DetectX86Features (notably HasAESNI)
corrupts the signal handler. It's not entirely clear why this corruption
later leads to the SIGTRAP seen in cockroachdb#31380--is
something in macOS or the Go runtime generating a SIGILL and trying to
handle it gracefully?--but regardless, not mucking with the signal
handler fixes the issue.

Release note (bug fix): CockroachDB no longer crashes due to a SIGTRAP error
soon after startup on macOS Mojave (cockroachdb#31380).

diff --git a/c-deps/cryptopp b/c-deps/cryptopp
index c621ce0532..6d6064445d 160000
--- a/c-deps/cryptopp
+++ b/c-deps/cryptopp
@@ -1 +1 @@
-Subproject commit c621ce053298fafc1e59191079c33acd76045c26
+Subproject commit 6d6064445ded787c7d6bf0a021fb718fddb63345
@golang golang locked and limited conversation to collaborators Jun 7, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

6 participants