Skip to content

runtime: SIGPROF during stack barrier install can panic #11863

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
aclements opened this issue Jul 25, 2015 · 8 comments
Closed

runtime: SIGPROF during stack barrier install can panic #11863

aclements opened this issue Jul 25, 2015 · 8 comments
Milestone

Comments

@aclements
Copy link
Member

The following sequence of events can lead to an out-of-bounds access attempt in the runtime:

  1. A SIGPROF comes in on a thread while the G on that thread is in _Gsyscall. The sigprof handler calls gentraceback, which saves a local copy of the G's stkbar slice. Currently the G has no stack barriers, so this slice is empty.
  2. On another thread, the GC concurrently scans the stack of the goroutine being profiled (it considers it stopped because it's in _Gsyscall) and installs stack barriers.
  3. Back on the sigprof thread, gentraceback comes across a stack barrier in the stack and attempts to look it up in its (zero length) copy of G's old stkbar slice and attempts an out-of-bounds access.

Because of the particularly prickly context, this double faults and turns into a "panic: fatal error: malloc deadlock".

I can reproduce this ~1 in 10 runs by applying https://go-review.googlesource.com/12674 and running

cd $GOROOT/src/runtime/pprof
go test -c
stress ./pprof.test -test.v -test.short

This should have nothing to do with CL 12647, but applying the CL makes it easy to reproduce (presumably because of some effect on timings).

I'm not sure what the solution to this is. We already have a few cases where we just give up when we're walking the stack for a profile. We could do that here, too, if gentraceback encounters a stack barrier it wasn't expecting. Alternatively, we could make sigprof pickier about Gs in _Gsyscall, though I'm not sure how exactly.

@rsc @RLH @ianlancetaylor @dvyukov

@aclements aclements added this to the Go1.5 milestone Jul 25, 2015
@rsc
Copy link
Contributor

rsc commented Jul 27, 2015

Can we just cas something to make the stack barrier insertion wait for SIGPROF or vice versa? That particular race has to be separate threads, because SIGPROF cannot be interrupted.

@aclements
Copy link
Member Author

For whatever reason I can't reproduce this on master (92e2252) or even on the cherry-picked version of the latest version of CL 12674 (510fd13). I can still reproduce it on https://go-review.googlesource.com/12674/2 (but not on any later version).

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/12748 mentions this issue.

@DanielMorsing
Copy link
Contributor

We're seeing this issue on go 1.5.1

Stack trace:

found next stack barrier at 0xc857445a00; expected [*0xc8427059f0=0x869015 *0xc842705e38=0x61b1de]
fatal error: missed stack barrier

goroutine 0 [idle]:
runtime.throw(0xcb8fc0, 0x14)
    /root/.gvm/gos/go1.5.1/src/runtime/panic.go:527 +0x90
runtime.gentraceback(0x413929, 0x7fea597f9b48, 0x0, 0xc8206ca300, 0x0, 0xc8207a7738, 0x40, 0x0, 0x0, 0x6, ...)
    /root/.gvm/gos/go1.5.1/src/runtime/traceback.go:261 +0x1053
runtime.sigprof(0x413929, 0x7fea597f9b48, 0x0, 0xc8206ca300, 0xc82079c000)
    /root/.gvm/gos/go1.5.1/src/runtime/proc1.go:2585 +0x50b
runtime.sighandler(0xc80000001b, 0xc8207a7bb0, 0xc8207a7a80, 0xc8206ca300)
    /root/.gvm/gos/go1.5.1/src/runtime/signal_amd64x.go:47 +0xbe
runtime.sigtrampgo(0x1b, 0xc8207a7bb0, 0xc8207a7a80)
    /root/.gvm/gos/go1.5.1/src/runtime/signal_linux.go:94 +0x95
runtime.sigtramp(0x1, 0x0, 0xc8207a0000, 0x0, 0x7fa0, 0x4000000, 0xc859cb1c80, 0x0, 0x0, 0x3, ...)
    /root/.gvm/gos/go1.5.1/src/runtime/sys_linux_amd64.s:234 +0x1b
runtime.sigreturn(0x0, 0xc8207a0000, 0x0, 0x7fa0, 0x4000000, 0xc859cb1c80, 0x0, 0x0, 0x3, 0xd817e3, ...)
    /root/.gvm/gos/go1.5.1/src/runtime/sys_linux_amd64.s:238

goroutine 870933 [copystack]:
runtime.callers(0x4, 0xc842705298, 0x20, 0x20, 0xd)
    /root/.gvm/gos/go1.5.1/src/runtime/traceback.go:562 fp=0xc857445268 sp=0xc857445260
runtime.mProf_Malloc(0xc82e414580, 0x40)
    /root/.gvm/gos/go1.5.1/src/runtime/mprof.go:235 +0x7f fp=0xc8574453e8 sp=0xc857445268
runtime.profilealloc(0xc82079c000, 0xc82e414580, 0x40)
    /root/.gvm/gos/go1.5.1/src/runtime/malloc.go:811 +0x98 fp=0xc857445410 sp=0xc8574453e8
runtime.mallocgc(0x40, 0x0, 0x3, 0x412bd5)
    /root/.gvm/gos/go1.5.1/src/runtime/malloc.go:699 +0x5d3 fp=0xc8574454e0 sp=0xc857445410
runtime.rawstring(0x31, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/runtime/string.go:264 +0x70 fp=0xc857445528 sp=0xc8574454e0
runtime.rawstringtmp(0x0, 0x31, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/runtime/string.go:107 +0xb7 fp=0xc857445560 sp=0xc857445528
runtime.concatstrings(0x0, 0xc8427056e0, 0x3, 0x3, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/runtime/string.go:48 +0x1c2 fp=0xc857445698 sp=0xc857445560
runtime.concatstring3(0x0, 0xc82e412420, 0x28, 0xc2d0e8, 0x4, 0xc841aaf3d0, 0x5, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/runtime/string.go:62 +0x6a fp=0xc8574456e8 sp=0xc857445698
found next stack barrier at 0xc857445a00; expected [*0xc8427059f0=0x869015 *0xc842705e38=0x61b1de]
fatal error: missed stack barrier
panic during panic

goroutine 0 [idle]:
runtime.startpanic_m()
    /root/.gvm/gos/go1.5.1/src/runtime/panic1.go:67 +0x141
runtime.systemstack(0xd862e0)
    /root/.gvm/gos/go1.5.1/src/runtime/asm_amd64.s:278 +0xab
runtime.startpanic()
    /root/.gvm/gos/go1.5.1/src/runtime/panic.go:505 +0x14
runtime.throw(0xcb8fc0, 0x14)
    /root/.gvm/gos/go1.5.1/src/runtime/panic.go:526 +0x83
runtime.gentraceback(0x452670, 0xc857445260, 0x0, 0xc859cb1c80, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
    /root/.gvm/gos/go1.5.1/src/runtime/traceback.go:261 +0x1053
runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc859cb1c80, 0x0)
    /root/.gvm/gos/go1.5.1/src/runtime/traceback.go:552 +0xc8
runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc859cb1c80)
    /root/.gvm/gos/go1.5.1/src/runtime/traceback.go:529 +0x48
runtime.tracebackothers(0xc8206ca180)
    /root/.gvm/gos/go1.5.1/src/runtime/traceback.go:666 +0xda
runtime.dopanic_m(0xc8206ca180, 0x42e9a0, 0xc8207a7588)
    /root/.gvm/gos/go1.5.1/src/runtime/panic1.go:104 +0x1f9
runtime.dopanic.func1()
    /root/.gvm/gos/go1.5.1/src/runtime/panic.go:514 +0x32
runtime.systemstack(0xc8207a7560)
    /root/.gvm/gos/go1.5.1/src/runtime/asm_amd64.s:278 +0xab
runtime.dopanic(0x0)
    /root/.gvm/gos/go1.5.1/src/runtime/panic.go:515 +0x61
runtime.throw(0xcb8fc0, 0x14)
    /root/.gvm/gos/go1.5.1/src/runtime/panic.go:527 +0x90
runtime.gentraceback(0x413929, 0x7fea597f9b48, 0x0, 0xc8206ca300, 0x0, 0xc8207a7738, 0x40, 0x0, 0x0, 0x6, ...)
    /root/.gvm/gos/go1.5.1/src/runtime/traceback.go:261 +0x1053
runtime.sigprof(0x413929, 0x7fea597f9b48, 0x0, 0xc8206ca300, 0xc82079c000)
    /root/.gvm/gos/go1.5.1/src/runtime/proc1.go:2585 +0x50b
runtime.sighandler(0xc80000001b, 0xc8207a7bb0, 0xc8207a7a80, 0xc8206ca300)
    /root/.gvm/gos/go1.5.1/src/runtime/signal_amd64x.go:47 +0xbe
runtime.sigtrampgo(0x1b, 0xc8207a7bb0, 0xc8207a7a80)
    /root/.gvm/gos/go1.5.1/src/runtime/signal_linux.go:94 +0x95
runtime.sigtramp(0x1, 0x0, 0xc8207a0000, 0x0, 0x7fa0, 0x4000000, 0xc859cb1c80, 0x0, 0x0, 0x3, ...)
    /root/.gvm/gos/go1.5.1/src/runtime/sys_linux_amd64.s:234 +0x1b
runtime.sigreturn(0x0, 0xc8207a0000, 0x0, 0x7fa0, 0x4000000, 0xc859cb1c80, 0x0, 0x0, 0x3, 0xd817e3, ...)
    /root/.gvm/gos/go1.5.1/src/runtime/sys_linux_amd64.s:238

@DanielMorsing DanielMorsing reopened this Oct 14, 2015
@bradfitz
Copy link
Contributor

@DanielMorsing, this issue was closed and labeled Go 1.5. It's best to open a new issue. This one might be ignored by all our dashboards, since it's not new or targeted at an upcoming release.

In general we don't re-use old bugs.

@DanielMorsing
Copy link
Contributor

Alright, Will do.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/17036 mentions this issue.

aclements added a commit that referenced this issue Nov 19, 2015
A sigprof during stack barrier insertion or removal can crash if it
detects an inconsistency between the stkbar array and the stack
itself. Currently we protect against this when scanning another G's
stack using stackLock, but we don't protect against it when unwinding
stack barriers for a recover or a memmove to the stack.

This commit cleans up and improves the stack locking code. It
abstracts out the lock and unlock operations. It uses the lock
consistently everywhere we perform stack operations, and pushes the
lock/unlock down closer to where the stack barrier operations happen
to make it more obvious what it's protecting. Finally, it modifies
sigprof so that instead of spinning until it acquires the lock, it
simply doesn't perform a traceback if it can't acquire it. This is
necessary to prevent self-deadlock.

Updates #11863, which introduced stackLock to fix some of these
issues, but didn't go far enough.

Updates #12528.

Change-Id: I9d1fa88ae3744d31ba91500c96c6988ce1a3a349
Reviewed-on: https://go-review.googlesource.com/17036
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/17057 mentions this issue.

aclements added a commit that referenced this issue Nov 20, 2015
…ier ops

A sigprof during stack barrier insertion or removal can crash if it
detects an inconsistency between the stkbar array and the stack
itself. Currently we protect against this when scanning another G's
stack using stackLock, but we don't protect against it when unwinding
stack barriers for a recover or a memmove to the stack.

This commit cleans up and improves the stack locking code. It
abstracts out the lock and unlock operations. It uses the lock
consistently everywhere we perform stack operations, and pushes the
lock/unlock down closer to where the stack barrier operations happen
to make it more obvious what it's protecting. Finally, it modifies
sigprof so that instead of spinning until it acquires the lock, it
simply doesn't perform a traceback if it can't acquire it. This is
necessary to prevent self-deadlock.

Updates #11863, which introduced stackLock to fix some of these
issues, but didn't go far enough.

Updates #12528.

Change-Id: I9d1fa88ae3744d31ba91500c96c6988ce1a3a349
Reviewed-on: https://go-review.googlesource.com/17036
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-on: https://go-review.googlesource.com/17057
@golang golang locked and limited conversation to collaborators Nov 18, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants