Skip to content

runtime: crash with "leftover defer" #11023

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rhysh opened this issue Jun 1, 2015 · 4 comments
Closed

runtime: crash with "leftover defer" #11023

rhysh opened this issue Jun 1, 2015 · 4 comments
Milestone

Comments

@rhysh
Copy link
Contributor

rhysh commented Jun 1, 2015

$ go version
go version devel +8cd191b Sat May 30 12:21:56 2015 +0000 linux/amd64
$ uname -a | awk '$2="host"'
Linux host 3.13.0-52-generic #86~precise1-Ubuntu SMP Tue May 5 18:08:21 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I have a process that receives data over a few hundred concurrent TCP connections and writes them to files. It's been crashing on recent versions of tip (it was stable on 1.4.1) with messages like "runtime: g422: leftover defer sp=0xdeaddeaddeaddead pc=0x48d60e" and "runtime: g883: leftover defer sp=0x6576452e67736d62 pc=0x69203a746553746e". This may be related to #9610 and #10941, but I've seen it (once) with sp != 0xdeaddeaddeaddead .

runtime: g422: leftover defer sp=0xdeaddeaddeaddead pc=0x48d60e
    defer 0xc2087e7680 sp=0xdeaddeaddeaddead pc=0x48d60e
fatal error: traceback has leftover defers

runtime stack:
runtime.throw(0x8ce070, 0x1d)
    /usr/local/go/src/runtime/panic.go:527 +0x96
runtime.gentraceback(0x4435a0, 0xc208780a78, 0x0, 0xc208848a00, 0x0, 0x0, 0x7fffffff, 0x9298a0, 0xc208045e38, 0x0, ...)
    /usr/local/go/src/runtime/traceback.go:450 +0x7b9
runtime.copystack(0xc208848a00, 0x1000)
    /usr/local/go/src/runtime/stack1.go:569 +0x180
runtime.newstack()
    /usr/local/go/src/runtime/stack1.go:757 +0xb8a
runtime.morestack()
    /usr/local/go/src/runtime/asm_amd64.s:330 +0x82

goroutine 422 [copystack]:
runtime.cansemacquire(0xc208ed80e8, 0xc208848a80)
    /usr/local/go/src/runtime/sema.go:151 fp=0xc208780a80 sp=0xc208780a78
runtime.semacquire(0xc208ed80e8, 0xc208848a01)
    /usr/local/go/src/runtime/sema.go:69 +0x5d fp=0xc208780ae0 sp=0xc208780a80
net.runtime_Semacquire(0xc208ed80e8)
    /usr/local/go/src/runtime/sema.go:48 +0x2d fp=0xc208780af8 sp=0xc208780ae0
net.(*fdMutex).RWLock(0xc208ed80e0, 0x1, 0x0)
    /usr/local/go/src/net/fd_mutex.go:143 +0x16f fp=0xc208780b60 sp=0xc208780af8
net.(*netFD).readLock(0xc208ed80e0, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:161 +0x40 fp=0xc208780b80 sp=0xc208780b60
net.(*netFD).Read(0xc208ed80e0, 0xc20896e00e, 0xff2, 0xff2, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:221 +0x6f fp=0xc208780c50 sp=0xc208780b80
net.(*conn).Read(0xc2089ba010, 0xc20896e00e, 0xff2, 0xff2, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:124 +0xe7 fp=0xc208780cb8 sp=0xc208780c50
bufio.(*Scanner).Scan(0xc20882b280, 0xc2080b4240)
    /usr/local/go/src/bufio/scan.go:180 +0x87a fp=0xc208780e78 sp=0xc208780cb8
redacted(0xc20812a0a0, 0x7f375ce70a30, 0xc2089ba010, 0xc2080b4240)
    /redacted.go:99 +0x93 fp=0xc208780fc0 sp=0xc208780e78
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1670 +0x1 fp=0xc208780fc8 sp=0xc208780fc0
created by redacted
    /redacted.go:81 +0x21e

goroutine 1 [chan receive]:
main.main()
    /redacted.go:230 +0xe7b

goroutine 5 [chan send]:
redacted(0xc20809e000)
    /redacted.go:199 +0x1f0
created by redacted.init.1
    /redacted.go:184 +0x6e

[snip]

The stack trace with non-0xdead address is a bit more interesting - it looks like two separate threads are dumping stacks.

runtime: g883: leftover defer sp=0x6576452e67736d62 pc=0x69203a746553746e
    defer 0xc20af93780 sp=0x6576452e67736d62 pc=0x69203a746553746e
fatal error: unexpected signal during runtime execution
[signal 0xb code=0x80 addr=0x0 pc=0x452ba4]

runtime stack:
runtime.throw(0x8ef170, 0x2a)
    /usr/local/go/src/runtime/panic.go:527 +0x96
runtime.sigpanic()
    /usr/local/go/src/runtime/sigpanic_unix.go:12 +0x5d
runtime.gentraceback(0x4435a0, 0xc2088e8a78, 0x0, 0xc2095de000, 0x0, 0x0, 0x7fffffff, 0x9298a0, 0xc2085f3e38, 0x0, ...)
    /usr/local/go/src/runtime/traceback.go:448 +0x6d4
runtime.copystack(0xc2095de000, 0x1000)
    /usr/local/go/src/runtime/stack1.go:569 +0x180
runtime.newstack()
    /usr/local/go/src/runtime/stack1.go:757 +0xb8a
runtime.morestack()
    /usr/local/go/src/runtime/asm_amd64.s:330 +0x82

goroutine 883 [copystack]:
runtime.cansemacquire(0xc208ab2c48, 0xc2095de080)
    /usr/local/go/src/runtime/sema.go:151 fp=0xc2088e8a80 sp=0xc2088e8a78
runtime.semacquire(0xc208ab2c48, 0xc2095de001)
    /usr/local/go/src/runtime/sema.go:69 +0x5d fp=0xc2088e8ae0 sp=0xc2088e8a80
net.runtime_Semacquire(0xc208ab2c48)
    /usr/local/go/src/runtime/sema.go:48 +0x2d fp=0xc2088e8af8 sp=0xc2088e8ae0
net.(*fdMutex).RWLock(0xc208ab2c40, 0x1, 0x0)
    /usr/local/go/src/net/fd_mutex.go:143 +0x16f fp=0xc2088e8b60 sp=0xc2088e8af8
net.(*netFD).readLock(0xc208ab2c40, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:161 +0x40 fp=0xc2088e8b80 sp=0xc2088e8b60
net.(*netFD).Read(0xc208ab2c40, 0xc20886b081, 0xf7f, 0xf7f, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:221 +0x6f fp=0xc2088e8c50 sp=0xc2088e8b80
net.(*conn).Read(0xc209ab0508, 0xc20886b081, 0xf7f, 0xf7f, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:124 +0xe7 fp=0xc2088e8cb8 sp=0xc2088e8c50
bufio.(*Scanner).Scan(0xc20894b280, 0xc2080c0240)
runtime: g382: leftover defer sp=0x6576452e67736d62 pc=0x69203a746553746e
    defer 0xc20dfabfc0 sp=0x6576452e67736d62 pc=0x69203a746553746e
fatal error: unexpected signal during runtime execution
    /usr/local/go/src/bufio/scan.go:180 +0x87a fp=0xc2088e8e78 sp=0xc2088e8cb8
redacted(0xc20812a0a0, 0x7f4989e2f8d8, 0xc209ab0508, 0xc2080c0240)
    /redacted.go:99 +0x93 fp=0xc2088e8fc0 sp=0xc2088e8e78
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1670 +0x1 fp=0xc2088e8fc8 sp=0xc2088e8fc0
created by redacted
    /redacted.go:81 +0x21e

goroutine 1 [chan receive, 6 minutes]:
main.main()
    /redacted.go:230 +0xe7b

goroutine 5 [chan send]:
redacted(0xc20809e000)
    /redacted.go:199 +0x1f0
created by redacted.init.1
    /redacted.go:184 +0x6e

[snip]
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/10713 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/10791 mentions this issue.

methane pushed a commit to methane/go that referenced this issue Jun 6, 2015
Issues golang#10240, golang#10541, golang#10941, golang#11023, golang#11027 and possibly others are
indicating memory corruption in the runtime. One of the easiest places
to both get corruption and detect it is in the allocator's free lists
since they appear throughout memory and follow strict invariants. This
commit adds a check when sweeping a span that its free list is sane
and, if not, it prints the corrupted free list and panics. Hopefully
this will help us collect more information on these failures.

Change-Id: I6d417bcaeedf654943a5e068bd76b58bb02d4a64
aclements added a commit that referenced this issue Jun 7, 2015
Stack barriers assume that writes through pointers to frames above the
current frame will get write barriers, and hence these frames do not
need to be re-scanned to pick up these changes. For normal writes,
this is true. However, there are places in the runtime that use
typedmemmove to potentially write through pointers to higher frames
(such as mapassign1). Currently, typedmemmove does not execute write
barriers if the destination is on the stack. If there's a stack
barrier between the current frame and the frame being modified with
typedmemmove, and the stack barrier is not otherwise hit, it's
possible that the garbage collector will never see the updated pointer
and incorrectly reclaim the object.

Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove
and its variants) detect when the destination is in the stack and
unwind stack barriers up to the point, forcing mark termination to
later rescan the effected frame and collect these pointers.

Fixes #11084. Might be related to #10240, #10541, #10941, #11023,
 #11027 and possibly others.

Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd
Reviewed-on: https://go-review.googlesource.com/10791
Reviewed-by: Russ Cox <rsc@golang.org>
aclements added a commit that referenced this issue Jun 16, 2015
Issues #10240, #10541, #10941, #11023, #11027 and possibly others are
indicating memory corruption in the runtime. One of the easiest places
to both get corruption and detect it is in the allocator's free lists
since they appear throughout memory and follow strict invariants. This
commit adds a check when sweeping a span that its free list is sane
and, if not, it prints the corrupted free list and panics. Hopefully
this will help us collect more information on these failures.

Change-Id: I6d417bcaeedf654943a5e068bd76b58bb02d4a64
Reviewed-on: https://go-review.googlesource.com/10713
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
@aclements
Copy link
Member

Hi @rhysh. We've fixed several memory corruption and lost write barrier issues in the runtime over the past few weeks. Please try to reproduce the problem with current master and reopen this issue if it's still happening. Thanks!

@rhysh
Copy link
Contributor Author

rhysh commented Jun 23, 2015

Hi @aclements - I ran my app with 8fa1a69 (from 17 Jun 2015) for a few days and did not observe any crashes. It looks like this is resolved. Thanks!

@golang golang locked and limited conversation to collaborators Jun 25, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants