-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime: SIGPROF during stack barrier install can panic #11863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Can we just cas something to make the stack barrier insertion wait for SIGPROF or vice versa? That particular race has to be separate threads, because SIGPROF cannot be interrupted. |
For whatever reason I can't reproduce this on master (92e2252) or even on the cherry-picked version of the latest version of CL 12674 (510fd13). I can still reproduce it on https://go-review.googlesource.com/12674/2 (but not on any later version). |
CL https://golang.org/cl/12748 mentions this issue. |
We're seeing this issue on go 1.5.1 Stack trace:
|
@DanielMorsing, this issue was closed and labeled Go 1.5. It's best to open a new issue. This one might be ignored by all our dashboards, since it's not new or targeted at an upcoming release. In general we don't re-use old bugs. |
Alright, Will do. |
CL https://golang.org/cl/17036 mentions this issue. |
A sigprof during stack barrier insertion or removal can crash if it detects an inconsistency between the stkbar array and the stack itself. Currently we protect against this when scanning another G's stack using stackLock, but we don't protect against it when unwinding stack barriers for a recover or a memmove to the stack. This commit cleans up and improves the stack locking code. It abstracts out the lock and unlock operations. It uses the lock consistently everywhere we perform stack operations, and pushes the lock/unlock down closer to where the stack barrier operations happen to make it more obvious what it's protecting. Finally, it modifies sigprof so that instead of spinning until it acquires the lock, it simply doesn't perform a traceback if it can't acquire it. This is necessary to prevent self-deadlock. Updates #11863, which introduced stackLock to fix some of these issues, but didn't go far enough. Updates #12528. Change-Id: I9d1fa88ae3744d31ba91500c96c6988ce1a3a349 Reviewed-on: https://go-review.googlesource.com/17036 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
CL https://golang.org/cl/17057 mentions this issue. |
…ier ops A sigprof during stack barrier insertion or removal can crash if it detects an inconsistency between the stkbar array and the stack itself. Currently we protect against this when scanning another G's stack using stackLock, but we don't protect against it when unwinding stack barriers for a recover or a memmove to the stack. This commit cleans up and improves the stack locking code. It abstracts out the lock and unlock operations. It uses the lock consistently everywhere we perform stack operations, and pushes the lock/unlock down closer to where the stack barrier operations happen to make it more obvious what it's protecting. Finally, it modifies sigprof so that instead of spinning until it acquires the lock, it simply doesn't perform a traceback if it can't acquire it. This is necessary to prevent self-deadlock. Updates #11863, which introduced stackLock to fix some of these issues, but didn't go far enough. Updates #12528. Change-Id: I9d1fa88ae3744d31ba91500c96c6988ce1a3a349 Reviewed-on: https://go-review.googlesource.com/17036 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-on: https://go-review.googlesource.com/17057
The following sequence of events can lead to an out-of-bounds access attempt in the runtime:
Because of the particularly prickly context, this double faults and turns into a "panic: fatal error: malloc deadlock".
I can reproduce this ~1 in 10 runs by applying https://go-review.googlesource.com/12674 and running
This should have nothing to do with CL 12647, but applying the CL makes it easy to reproduce (presumably because of some effect on timings).
I'm not sure what the solution to this is. We already have a few cases where we just give up when we're walking the stack for a profile. We could do that here, too, if gentraceback encounters a stack barrier it wasn't expecting. Alternatively, we could make sigprof pickier about Gs in _Gsyscall, though I'm not sure how exactly.
@rsc @RLH @ianlancetaylor @dvyukov
The text was updated successfully, but these errors were encountered: