New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: unexpected return pc for runtime.sigpanic #23576
Comments
Interesting. Thanks for the repro. This happened recently on linux/arm on the build dashboard, too: https://build.golang.org/log/4552a24f91dc893759cce93184c415f3986fcbc6 I haven't dug into this very deeply, but it sort of looks like we're injecting a |
Thank you for the repro @josharian and my apologies for totally forgetting to work on diagnosing and reproducing this bug last year, but on the bright side no one could have done it better than you :) |
@odeke-em want to bisect? :) |
@josharian gladly, you've done the heavy lifting already :) |
Obviously commit dbd8f3d aka CL https://go-review.googlesource.com/#/c/89016/ exposes this problem, now my next step is to bisect And also good news, here is a standalone repro package main
/*
#include <stdio.h>
void printIt(void *ptr) {
char *s = (char *)ptr;
printf(":%c\n", *s);
}
*/
import "C"
func main() {
C.printIt(nil)
} |
@odeke-em, thanks for looking into this! I'm not positive that your reproducer is quite the same since it unwinds to asmcgocall, while Josh's only gets to the sigpanic. If it is the same, then I think we've had the underlying issue for a long time. Go 1.8 also cuts off the traceback at asmcgocall and presumably was just silently failing to unwind past there. It would make sense to cut off the traceback at asmcgocall since that performs a system stack switch and thus there's nowhere to unwind to from it most of the time. (But it wouldn't make sense to cut it off at sigpanic, which is what makes me think your reproducer might be different from Josh's in an important way, though it still highlights a bug.) |
We only forward signals if there is a C signal handler to which to forward them. If there is no C signal handler (the normal case) and if the Go runtime expects to handle the signal itself (the normal case) we don't forward. So, yes, we can definitely insert a call to Perhaps we should undo the hex dump for 1.10, or skip it when we think we may be executing C code. |
Thank you @aclements, I'll take a deeper look sometime when I get more bandwidth, it's an interesting issue but I am finding myself having to tear up many CLs and commits :)
@ianlancetaylor nice, in deed :) package main
/*
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
void sigHandler(const int sig) {
printf("got this signal: %d\n", sig);
exit(0);
}
void printIt(void *ptr, int handleSig) {
if (handleSig) {
struct sigaction sa;
sa.sa_handler = sigHandler;
sigaction(SIGSEGV, &sa, NULL);
}
char *s = (char *)ptr;
printf(":%c\n", *s);
}
*/
import "C"
func main() {
C.printIt(nil, 1)
} |
For the original reproducer, the problem is indeed that we're injecting a For the second reproducer, after switching to the system stack, The difference between the reproducers is rather amusing. In general we should see things like in the original reproducer. However, two GCC optimizations combine to make it appear like |
It turns out this doesn't work. If the C code calls back in to Go, we switch back to the regular G stack and if we try to unwind from there, we'll see the call to |
Change https://golang.org/cl/90896 mentions this issue: |
Change https://golang.org/cl/90895 mentions this issue: |
That's surprising: doesn't I guess the base-pointer and return address can't be saved there because they might be moved during a callback from C. But why does the callback from C need to run on the same G at all? (I think we've talked before about treating C calls as implicitly noescape and picking up a new goroutine stack instead of resuming on the old one.) It seems like we could save the G in the system stack frame and zero it out in TLS, so that if we get a callback from C we run it on a fresh goroutine. What would that break? ( At any rate, I think all the things I'm speculating about here are too invasive for a fix yet in 1.10... |
It does not and, unfortunately, I'm not sure it even can. The function itself is declared with 0 stack size, but it opens up a 64 byte frame and then aligns it on the system stack. I think traceback is still using the 0 stack size (I would expect the pcsp table to include the 64 byte change, but empirically it may not be; I haven't dug in to that), so we'd have to put the return PC at 0(SP). But that 64 byte frame is some spill location for the Windows fastcall calling convention (though I wasn't able to find documentation on why it needed that), so 0(SP) may be clobbered (if that's indeed what's going on with the 64 byte frame).
They're on the system stack, so they'll never get moved.
I'm not sure if there's much point in running the callback on a new G, but it certainly could run on a new stack, as I think we've talked about. We may not want to pay the cost of allocating a new stack at every C callback, so perhaps a cleverer thing would be to keep using the original G stack, but if we need to grow it, only copy back as far as the last C callback and keep both segments around. Of course, then we're sort of back in segmented stack land and Russ really didn't like the sound of that. :) |
asmcgocall switches to the system stack and aligns the SP, so gentraceback both can't unwind over it when it appears on the system stack (it'll read some uninitialized stack slot as the return PC). There's also no point in unwinding over it, so don't. Updates #23576. Change-Id: Idfcc9599c7636b80dec5451cb65ae892b4611981 Reviewed-on: https://go-review.googlesource.com/90895 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
No, I mean that the stack to which they point might get moved, and the saved register values wouldn't be updated to match (since they would be on the system stack, instead of the goroutine stack being moved).
Ah, I was thinking we would construct something that looks like a C stack frame rather than a Go stack frame. If I understand correctly, today the system stack in asmcgocall looks like this before the
If a frame pointer is present, it still points to the goroutine stack at that point, so the root of the stack appears to be an invalid frame. (We know how to restore the caller frame from the current stack frame and the TLS register, but that process doesn't follow the usual C or Go ABIs.) We could instead make that look like a 64-byte standard (Windows-and-SysV-compatible) call frame:
Then we mark As far as I can tell, the main problem with that approach is that the saved |
This is probably a duplicate of some other recent runtime issues, but it happens to be very easily reproduced using Go 1.10rc1. I'm sure it can be minimized further, but I am almost out of dev time for several days, so I wanted to get it up here ASAP in case it was helpful for @aclements.
Repro: Grab https://gist.github.com/josharian/cec3dbe91b2cffaf7419f53757cce3d3 and run it from inside any git repo. (See https://github.com/libgit2/git2go if there are questions about how to install git2go.) Pasted code:
The resulting crash is obviously programmer error (rw.Push expects a non-nil parameter), but the "unexpected return pc" part shouldn't be present.
Result:
System details
The text was updated successfully, but these errors were encountered: