New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
misc/cgo/testcarchive: TestSIGPROF failures with "signal: profiling timer expired" #43828
Comments
Given that this has affected at least three TryBots in the last month alone, I'm marking it as a release-blocker via #11811. |
Checking on this as a release blocker. Are there any updates? @ianlancetaylor |
I'll take a look. |
I think I have a hunch. When we turn off profiling,
If a pending profiling signal lands right after the Cas but before the setsig (including in another thread running concurrently), in the signal handler, sigfwdgo will see that we're not handling that signal ( Maybe we could do the Cas after the setsig? We have |
Good catch. I think it would be correct to change the code to if atomic.Load(&handlingSig[_SIGPROF]) != 0 {
...
setsig(...)
atomic.Store(&handlingSig[_SIGPROF], 0)
} It can mess up if there a race between enabling and disabling profiling, but that will always have unpredictable results. |
SGTM. With that change I still see failures, possibly with a lower rate (same for CAS'ing to 2 then store 0). Still looking. |
While I haven't got any useful debugging, here is another guess: when we do the atomic load, setsig, atomic store, a signal handler may be running concurrently on another thread, and it may still observe Maybe we need some mechanism to synchronize with a concurrently running signal handler. Or maybe we could atomic store |
With this patch I cannot reproduce the failure. But the code is a bit weird... Maybe there is a better way. diff --git a/src/runtime/signal_unix.go b/src/runtime/signal_unix.go
index dbcbfc67bc..12340628ad 100644
--- a/src/runtime/signal_unix.go
+++ b/src/runtime/signal_unix.go
@@ -298,12 +298,14 @@ func setProcessCPUProfilerTimer(hz int32) {
// that use profiling don't want to crash on a stray SIGPROF.
// See issue 19320.
if !sigInstallGoHandler(_SIGPROF) {
- if atomic.Cas(&handlingSig[_SIGPROF], 1, 0) {
+ if atomic.Load(&handlingSig[_SIGPROF]) != 0 {
h := atomic.Loaduintptr(&fwdSig[_SIGPROF])
if h == _SIG_DFL {
h = _SIG_IGN
+ atomic.Storeuintptr(&fwdSig[_SIGPROF], h)
}
setsig(_SIGPROF, h)
+ atomic.Store(&handlingSig[_SIGPROF], 0)
}
}
}
@@ -1034,6 +1036,8 @@ func sigfwdgo(sig uint32, info *siginfo, ctx unsafe.Pointer) bool {
// If we aren't handling the signal, forward it.
if atomic.Load(&handlingSig[sig]) == 0 || !signalsOK {
+ fwdFn = atomic.Loaduintptr(&fwdSig[sig]) // XXX load again
+
// If the signal is ignored, doing nothing is the same as forwarding.
if fwdFn == _SIG_IGN || (fwdFn == _SIG_DFL && flags&_SigIgn != 0) {
return true |
Interesting. So the problem may be that the switch from What if we change the code that enables profiling instead? If |
Good idea! Thanks. Sent CL https://go-review.googlesource.com/c/go/+/374074 . |
Change https://golang.org/cl/374074 mentions this issue: |
2021-01-18T17:21:53-5a8fbb0/freebsd-amd64-11_2
2021-01-14T22:01:23-e125ccd/linux-amd64-wsl
See previously #43710, #19320
CC @ianlancetaylor
The text was updated successfully, but these errors were encountered: