runtime/pprof: apparent deadlock in TestGoroutineSwitch on linux-armv6l #47505
Comments
2021-08-11T22:07:50-dea23e9/linux-arm-aws
The trace in 2021-08-11T22:07:50-dea23e9/linux-arm-aws seems to have better detail: similar failure mode, but those two
For comparison, a successful run in https://build.golang.org/log/8b12b357fab1605247238fe334c5b2aec2e090c0 shows the entire
Reproduced in a gomote. It turns out this was a latent bug brought to the forefront by #47554, which turned on

@cherrymui figured out the full issue: on ARMv6 we are missing some atomics, so we use a kernel helper and lock a spinlock from a table indexed by the atomically accessed value's address (I guess the kernel helper needs to be synchronized for the same address?). When the profiling signal is enabled, we try to use the atomics in the signal handler for the profiling signal. If the signal happens to land while one of these atomics is in use (so a spinlock is held), and the handler indexes into the same spinlock in the table, the program can self-deadlock.

I don't know the full details as to why it works this way, but the fix is trivial: just back out of the profiling signal handler if it discovers it has landed in the kernel helper. @prattmic is working on putting the fix up for review.
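To make the collision described above concrete, here is a minimal sketch of an address-indexed spinlock table. It is not the runtime's code: the table size, the hash in `lockFor`, and the helper `handlerWouldDeadlock` are all hypothetical, and a try-lock stands in for the real spinning lock so the example terminates instead of hanging.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tableSize is an assumed number of spinlocks in the table.
const tableSize = 57

// spinlock is a minimal try-lock built on a 32-bit compare-and-swap.
type spinlock struct{ state uint32 }

func (l *spinlock) tryLock() bool { return atomic.CompareAndSwapUint32(&l.state, 0, 1) }
func (l *spinlock) unlock()       { atomic.StoreUint32(&l.state, 0) }

var locks [tableSize]spinlock

// lockFor picks a spinlock from the table based on the address of the
// value being accessed. The hash is an assumption made for illustration.
func lockFor(addr uintptr) *spinlock {
	return &locks[(addr>>3)%tableSize]
}

// handlerWouldDeadlock models a signal that interrupts an atomic operation
// on the address held, after which the handler performs an atomic operation
// on the address wanted. It reports whether the handler would need a
// spinlock that the interrupted code still holds. A real spinning lock
// would never return in that case.
func handlerWouldDeadlock(held, wanted uintptr) bool {
	outer := lockFor(held)
	if !outer.tryLock() {
		panic("lock unexpectedly held")
	}
	defer outer.unlock()

	inner := lockFor(wanted)
	if inner.tryLock() {
		inner.unlock()
		return false
	}
	return true
}

func main() {
	a := uintptr(0x1000)
	b := a + 8*tableSize // different address, same table slot
	c := a + 8           // different address, different table slot

	fmt.Println(handlerWouldDeadlock(a, b))
	fmt.Println(handlerWouldDeadlock(a, c))
}
```

Under these assumptions the first call reports a collision and the second does not, which matches the "bad luck" nature of the failure: the handler only hangs when its address happens to hash to the slot that is already locked.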
@gopherbot Please open backport issues for 1.15 and 1.16. This bug is not new and can cause deadlocks on linux/arm (
Backport issue(s) opened: #47674 (for 1.15), #47675 (for 1.16). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
Change https://golang.org/cl/341889 mentions this issue:
I'm not sure what you mean by this, but I'll note that the kernel helper is used to lock the spinlock (the spinlock is locked with Cas, Cas calls the cas kernel helper). So if SIGPROF arrives when we are still in the helper but after the lock is successfully taken (probably on the RET), then another use of atomics from the signal handler may try to lock the same spinlock and deadlock. I believe it is simply bad luck when we happen to pick the same spinlock.
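The "back out of the handler" idea can be sketched as a check on the interrupted program counter. This is an illustration only, not the code from the CL: the `profiler` type, its fields, and `inKernelHelper` are hypothetical names, and the check assumes the 32-bit ARM Linux layout in which the kernel user helpers live in the page starting at 0xffff0000.

```go
package main

import "fmt"

// kernelHelperBase is the start of the high page where 32-bit ARM Linux
// maps its kernel-provided user helpers (assumed layout for this sketch).
const kernelHelperBase = 0xffff0000

// inKernelHelper reports whether pc lies in the kernel helper region.
func inKernelHelper(pc uint32) bool {
	return pc&kernelHelperBase == kernelHelperBase
}

// profiler is a hypothetical stand-in for the profiling signal handler's state.
type profiler struct {
	samples int // samples recorded normally
	lost    int // samples dropped because recording them was unsafe
}

// onSignal models the handler: if the signal interrupted a kernel helper,
// a spinlock may be held, so the handler backs out without touching atomics.
func (p *profiler) onSignal(pc uint32) {
	if inKernelHelper(pc) {
		p.lost++
		return
	}
	p.samples++
}

func main() {
	p := &profiler{}
	p.onSignal(0xffff0fc0) // inside the helper page
	p.onSignal(0x00010000) // ordinary user code
	fmt.Println(p.samples, p.lost)
}
```

Dropping the sample is conservative: the handler cannot tell whether the lock was actually taken at the moment of the signal, so it treats any PC inside the helper page as potentially inside the critical section.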
Got it. I misunderstood the purpose of the kernel helper. |
@gopherbot Please open a backport issue for 1.17. This bug is not new and can cause deadlocks on linux/arm ( |
Change https://golang.org/cl/341853 mentions this issue: |
Change https://golang.org/cl/341890 mentions this issue: |
… helpers

On Linux ARMv6 and below runtime/internal/atomic.Cas calls into a kernel cas helper at a fixed address. If a SIGPROF arrives while executing the kernel helper, the sigprof lostAtomic logic will miss that we are potentially in the spinlock critical section, which could cause a deadlock when using atomics later in sigprof.

For #47505
Fixes #47688

Change-Id: If8ba0d0fc47e45d4e6c68eca98fac4c6ed4e43c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/341889
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
(cherry picked from commit 20a620f)
Reviewed-on: https://go-review.googlesource.com/c/go/+/341890

… helpers

On Linux ARMv6 and below runtime/internal/atomic.Cas calls into a kernel cas helper at a fixed address. If a SIGPROF arrives while executing the kernel helper, the sigprof lostAtomic logic will miss that we are potentially in the spinlock critical section, which could cause a deadlock when using atomics later in sigprof.

For #47505
Fixes #47675

Change-Id: If8ba0d0fc47e45d4e6c68eca98fac4c6ed4e43c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/341889
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
(cherry picked from commit 20a620f)
Reviewed-on: https://go-review.googlesource.com/c/go/+/341853

What version of Go are you using (go version)?
go1.17rc2
Does this issue reproduce with the latest release?
Literally yes.
What operating system and processor architecture are you using (go env)?
linux-armv6l
What did you do?
Ran tests as part of building a release. They failed the first 2 of 3 attempts.
See #47502
What did you expect to see?
Success
What did you see instead?