runtime: unix domain socket crash on darwin High Sierra 10.13.6, go 1.21.0 #62337
Comments
If the stack trace is correct this is a |
Hm... Possibly related to #60449? Though, if it's signal specific it might be unrelated; that's more general memory corruption we're seeing. How easy is this to reproduce? |
I didn't manually try to resize the window, but I do remember the terminal "blinking", so perhaps there was a terminal resize invoked by some code... the Go process loads R 4.0.5 as a library, which, being a huge old crusty C codebase, might very well be doing something with the terminal... or GNU readline, or libedit, might be doing stuff. The process tells R to open a graphics window, so XQuartz on darwin gets launched and tries to open a graphics window... and OSX tries to bring that graphics window to the top of the desktop, which perhaps sends a signal to the terminal (which is iTerm2 build 3.1.5). Any number of components might have done stuff to the terminal, I guess.
@mknyszek |
Ah hah. Thanks Ian for the idea about SIGWINCH. I was able to reproduce a crash again by resizing the terminal window. So perhaps this is unrelated to the unix domain socket stuff, and more related to terminal stuff. Also I realized that because the R library (loaded by the main Go process and called by cgo) loads a shared object (.dylib) that also contains Go code, there are two copies of the Go runtime involved (one from the main process, one embedded in the .dylib), and I cannot tell up front which one is doing the crashing... but the goroutine stack trace does seem to be from the Go runtime of the main, initial process, because it has calls to the web-server portion that I use to show graphics and the history of the R command line interaction. However, the other runtime might be crashing??? I don't think I was ever able to completely isolate it from the occasional signal, unfortunately, and sometimes they get through to the "inside" runtime, I suspect (does that make sense? i.e. perhaps the fix for #13034 still left some ways through). |
Looking at the source for libedit, which is what I'm linking the R library against, https://opensource.apple.com/source/libedit/libedit-31/src/sig.c.auto.html I notice that it does not use the SA_ONSTACK flag when taking over signals while it reads a command line. So I think I can see a race: between the time libedit returns from reading a command line and the time my C code restores the SA_ONSTACK flag on the signal handlers, here https://github.com/glycerine/embedr/blob/master/cpp/embedr.cpp#L349 a signal whose handler does not have SA_ONSTACK set could get through to the top/main Go runtime. Moreover, this is fragile anyway, since the runtime is running in the background on a separate thread, right? |
@ianlancetaylor Is there a way to have the Go runtime not crash if SA_ONSTACK is not set? making sure it is always set is super tricky when using these big old C systems like R and libedit... (referring to
from https://pkg.go.dev/os/signal |
possibly related: #22805 |
I'm trying the following to keep signals away from the Host (main) Go runtime. I'd appreciate any thoughts about whether this would be expected to work... or is there still a subtle race here? Right after crossing the CGO border into running C code, I save the set of signal handlers (which would include any host Go runtime signal handlers for growing stacks), and then null out all the signal handlers: https://github.com/glycerine/embedr/blob/master/cpp/embedr.cpp#L176 So if the libedit/libreadline code that sets SIGWINCH does stuff, the host Go runtime should never see it, even in that small race window between the readline code finishing and the Go code getting to run again, because we are still on the C side of the CGO go->into C call. Then I restore the host Go runtime signal handlers before returning from the CGO call. https://github.com/glycerine/embedr/blob/master/cpp/embedr.cpp#L210 I think this should prevent the kernel from ever giving the host Go runtime a signal callback, right? I'm still puzzling about how to protect the guest Go runtime (inside the .so/DLL/.dylib) from getting signals. Thinking on it. Any suggestions welcome. I think it has to get signals in order to grow its stacks, no? (or maybe not? that would be greatly simplifying)... I can save the guest Go runtime signal handlers after it has initialized, but there are a lot of calls back and forth between R and Go guest code in the DLL, where I don't have a way to intercept in C and make sure the guest runtime signal handlers are installed... without modifying the C code for R, which would be painful. |
So reading about
This seems to say that I would need to get the synchronous signals and SIGPIPE back in place for the guest Go runtime to be 100% happy. If this is an accurate conclusion, it seems like some kind of callback mechanism to let my code restore to the guest Go runtime code its proper/expected signal handlers without having to modify the calling C code would be highly useful (to put it mildly, really we want to avoid random sporadic crashes to our working programs in most all cases). |
Anywho, it's pretty clear to me this crash was not a Go bug but rather a coding mistake on my part. Feel free to close this issue. But if you all do have advice on how to handle the signals for the guest c-shared Go runtime, I would be grateful for guidance. Thank you. |
Thanks for the thorough investigation. When any Go code is used in a process, all signals must have the If
There is not. That said, since you mention it, we could consider having some knob for this somewhere. Though I don't know where. Turning the knob on would tend to waste a lot more memory on goroutine stacks, but it might be an acceptable tradeoff for a C program calling Go code that doesn't want to run very many goroutines. That should be a different issue, though. Assuming you can't change the C code that you are calling, one possible fix for your program might be to have a hook after the signal handlers are installed that just sets the In any case it doesn't sound like there is anything to change in Go for this issue, so closing. Thanks again. |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
This is the latest
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I started using unix domain sockets on go1.21.0 on High Sierra. I once observed a crash. It is not easy to reproduce; it happens only sporadically, so it is probably some race.
I assume if High Sierra is no longer supported, that may be why... perhaps there was some runtime evolution of the code, and I'll just need to go back to an older Go release. But it would be nice to know if that is the case, or perhaps this really does indicate a bug in the darwin unix domain socket handling Go runtime code.
What did you expect to see?
Not a crash.
What did you see instead?
A crash a moment after starting a unix domain listener.
goroutine 0 [idle]:
runtime.(*sigctxt).sigcode(...)
/usr/local/go1.21.0/src/runtime/signal_darwin_amd64.go:43
runtime.(*sigctxt).sigFromUser(...)
/usr/local/go1.21.0/src/runtime/os_unix_nonlinux.go:14
runtime.sighandler(0x1c, 0x100a783a0?, 0x7ffeefbfd360?, 0x100a777a0)
/usr/local/go1.21.0/src/runtime/signal_unix.go:676 +0x177 fp=0x7ffeefbfd338 sp=0x7ffeefbfd2d0 pc=0x100051e37
runtime.sigtrampgo(0x1c, 0x8300, 0x820000008303)
/usr/local/go1.21.0/src/runtime/signal_unix.go:490 +0x13c fp=0x7ffeefbfd3b0 sp=0x7ffeefbfd338 pc=0x10005181c
runtime.sigtramp()
/usr/local/go1.21.0/src/runtime/sys_darwin_amd64.s:189 +0x46 fp=0x7ffeefbfd400 sp=0x7ffeefbfd3b0 pc=0x100071486
the place where this crash occurred:
caller was
here is the unix domain code; it creates a simple cooperative lock between processes, to avoid
both working on the same file