runtime: cgo stuck because of go signal handler went into dead loop if pid=1 #59569
Comments
We will still need a way to reproduce this issue to verify any fixes.
What does “containered CentOS 7” mean? Does this change the semantics of rt_sigaction? We set the action for SIGSEGV to SIG_DFL, so it shouldn't be possible to deliver it to a handler anymore.
First, thanks for your help.
The container is built from CentOS 7.
Yes, I can see that from … I can keep an environment for debugging the next time it reproduces, if anybody is interested enough to debug the issue.
I am not asking for the patch to be merged; I want someone with enough expertise to review the changes.
Can you show us the output from …
@ianlancetaylor The strace is in #56649 (we probably should have reopened that one rather than filing a new bug).
Thanks. I also don't understand that strace. It appears to be running the code in …
In other words, the process receives a … You said that this only happens occasionally. What does the …
Yep, I agree with you. So this may be a kernel bug, I guess. We have also recently asked our SRE team to record the kernel version of any host that reproduces the issue.
It takes some time. I will provide the log and keep an environment for debugging the next time it reproduces.
@ianlancetaylor Thanks in advance.
Thanks. I just noticed … I don't know how much we care about this edge case, but we could change …
It's weird: the process crashes and exits in almost all cases, and they all run as PID 1 in a container. In other words, a container with only one process (whether it comes from C/C++, Go, or something else) should crash and exit on SIGSEGV; otherwise something unexpected has happened. Imagine a case where every container crashes but none of them exits... So I still think there is a bug hiding in Go or the kernel. Thanks for your attention!
The failure will only occur if the signal is delivered on a thread that was created by C, not by Go. So there is some unpredictability as to precisely how it will be handled. It is only when the signal is delivered to a C thread that we go into the endless loop of signal delivery due to running on PID 1.
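To make the loop concrete, here is a minimal, hypothetical Go model of the sequence described above. It is not runtime code and makes no system calls; the names `outcome`, `handleFault`, and `maxFaults` are illustrative only. It assumes a simplified bad-signal path for C-created threads (reset the handler to SIG_DFL, raise the signal, and if nothing happens restore the Go handler and return, so the faulting instruction runs again), and it assumes that a raised, non-forced SIG_DFL signal kills any process except PID 1.

```go
package main

import "fmt"

// outcome is how a simulated SIGSEGV ends in this toy model.
type outcome string

const (
	exited  outcome = "process exits"
	looping outcome = "still looping"
)

// handleFault is a hypothetical model, not runtime code. It simulates a
// SIGSEGV whose fault repeats every time the thread resumes.
// maxFaults bounds the simulation; the real loop has no bound.
func handleFault(onCThread bool, pid, maxFaults int) (outcome, int) {
	if !onCThread {
		// Assumption taken from the comment above: on a Go-created
		// thread the process ends rather than looping.
		return exited, 1
	}
	for n := 1; n <= maxFaults; n++ {
		// The handler is reset to SIG_DFL and the signal is raised
		// (a non-forced send). Assumption: that kills any process
		// except PID 1, where a SIG_DFL signal is ignored.
		if pid != 1 {
			return exited, n
		}
		// The raise had no effect: the Go handler is restored and the
		// handler returns, so the faulting instruction runs again and
		// the next iteration stands for the next SIGSEGV.
	}
	return looping, maxFaults
}

func main() {
	cases := []struct {
		onCThread bool
		pid       int
	}{
		{false, 1}, {true, 42}, {true, 1},
	}
	for _, c := range cases {
		res, faults := handleFault(c.onCThread, c.pid, 5)
		fmt.Printf("C thread=%v pid=%d: %s after %d fault(s)\n", c.onCThread, c.pid, res, faults)
	}
}
```

Only the combination of a C-created thread and PID 1 never reaches the exit branch, which mirrors the endless signal delivery described in the comment.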
For reference, the kernel ignores signals with a SIG_DFL handler for init tasks (SIGNAL_UNKILLABLE) unless the signal is "forced": https://elixir.bootlin.com/linux/v6.3/source/kernel/signal.c#L89
Kernel-generated signals (such as SIGSEGV from a page fault) are "forced": https://elixir.bootlin.com/linux/v6.3/source/kernel/signal.c#L1245
So we should be able to raise the signal by explicitly faulting after removing the signal handler.
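The kernel rule cited above can be summarized with a small, hypothetical Go model. It is not kernel code; the type `task` and the function `deliverDefault` are illustrative names. It keeps only the two cases the comment describes (a SIG_DFL signal sent to a SIGNAL_UNKILLABLE init task, with or without force) and leaves out special cases such as ptrace and SIGKILL/SIGSTOP.

```go
package main

import "fmt"

// task is a hypothetical, simplified view of the state that matters
// here; it is not a kernel data structure.
type task struct {
	unkillable bool // models SIGNAL_UNKILLABLE, set for an init task (PID 1)
}

// deliverDefault reports whether a signal whose handler is SIG_DFL takes
// effect on t. forced is true for kernel-generated signals such as a
// SIGSEGV from a page fault, and false for a signal sent with raise/kill.
func deliverDefault(t *task, forced bool) bool {
	if forced {
		// Simplification: for a forced signal with a SIG_DFL handler,
		// the kernel drops the unkillable protection before sending.
		t.unkillable = false
	}
	// Simplification: an unkillable init task ignores SIG_DFL signals.
	return !t.unkillable
}

func main() {
	// raise() from inside a handler: not forced.
	fmt.Println(deliverDefault(&task{unkillable: true}, false))
	// Re-executing the faulting access with no handler installed: forced.
	fmt.Println(deliverDefault(&task{unkillable: true}, true))
	// An ordinary process receives the signal either way.
	fmt.Println(deliverDefault(&task{}, false))
}
```

In this model a raise() from the handler corresponds to forced=false and is ignored by PID 1, while letting the faulting access run again after removing the handler corresponds to forced=true, which is why explicitly faulting is expected to terminate the process.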
Yep, you are right. I can reproduce the issue myself with the following steps.
Notice:
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Sure
What operating system and processor architecture are you using (go env)?
What did you do?
I first found the bug in go1.10.3; after updating to go1.19, it still happens. I found the root cause here:
go/src/runtime/signal_unix.go, line 964 in de475e8
In almost all real cases the program is about to crash, but in my situation it does not crash after the usleep. I cannot explain why. Then I tried to fix it myself with the following patch:
What did you expect to see?
panic
What did you see instead?
See #56649