Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: stack trace shows confusing call sequence during panic #13857

Closed
rogpeppe opened this issue Jan 7, 2016 · 16 comments
Closed

runtime: stack trace shows confusing call sequence during panic #13857

rogpeppe opened this issue Jan 7, 2016 · 16 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@rogpeppe
Copy link
Contributor

rogpeppe commented Jan 7, 2016

I had a program that seemed to be deadlocking, and it showed the
following confusing stack trace:

goroutine 8 [semacquire]:
sync.runtime_Semacquire(0xc82019c91c)
    /home/rog/go/src/runtime/sema.go:47 +0x26
sync.(*WaitGroup).Wait(0xc82019c910)
    /home/rog/go/src/sync/waitgroup.go:127 +0xb4
gopkg.in/juju/charmstore.v5-unstable/internal/entitycache.(*Cache).Close(0xc82019c900)
    /home/rog/src/go/src/gopkg.in/juju/charmstore.v5-unstable/internal/entitycache/cache.go:90 +0x2d
sync.(*Mutex).Unlock(0xc82019c9a8)
    /home/rog/go/src/sync/mutex.go:109 +0x9f
gopkg.in/juju/charmstore.v5-unstable/internal/entitycache_test.newChanStore.func2(0xc8201de0c0, 0xc82016a600, 0x3, 0x0, 0x0)
    /home/rog/src/go/src/gopkg.in/juju/charmstore.v5-unstable/internal/entitycache/cache_test.go:253 +0xa0
gopkg.in/juju/charmstore.v5-unstable/internal/entitycache_test.(*callbackStore).FindBaseEntity(0xc820118860, 0xc8201de0c0, 0xc82016a600, 0x0, 0x0, 0x0)
    /home/rog/src/go/src/gopkg.in/juju/charmstore.v5-unstable/internal/entitycache/cache_test.go:275 +0x44
gopkg.in/juju/charmstore.v5-unstable/internal/entitycache.(*Cache).getBaseEntity(0xc82019c900, 0xc8201de0c0, 0xc82016a600, 0x0, 0x0, 0x0, 0x0)
    /home/rog/src/go/src/gopkg.in/juju/charmstore.v5-unstable/internal/entitycache/cache.go:187 +0x85
gopkg.in/juju/charmstore.v5-unstable/internal/entitycache.(*Cache).(gopkg.in/juju/charmstore.v5-unstable/internal/entitycache.getBaseEntity)-fm(0xc8201de0c0, 0xc82016a600, 0x0, 0x0, 0x0, 0x0)
    /home/rog/src/go/src/gopkg.in/juju/charmstore.v5-unstable/internal/entitycache/cache.go:82 +0x50
gopkg.in/juju/charmstore.v5-unstable/internal/entitycache.(*stash).fetch(0xc82019c998, 0xc8201de0c0, 0xc82016a600, 0x0, 0x0, 0x0)
    /home/rog/src/go/src/gopkg.in/juju/charmstore.v5-unstable/internal/entitycache/cache.go:364 +0x5b
gopkg.in/juju/charmstore.v5-unstable/internal/entitycache.(*stash).entity(0xc82019c998, 0xc8201de0c0, 0xc82016a7e0, 0x0, 0x0, 0x0, 0x0)
    /home/rog/src/go/src/gopkg.in/juju/charmstore.v5-unstable/internal/entitycache/cache.go:285 +0x49d
gopkg.in/juju/charmstore.v5-unstable/internal/entitycache.(*Cache).BaseEntity(0xc82019c900, 0xc8201de060, 0xc82016a7e0, 0xbc05f0, 0x0, 0x0)
    /home/rog/src/go/src/gopkg.in/juju/charmstore.v5-unstable/internal/entitycache/cache.go:158 +0xd8
gopkg.in/juju/charmstore.v5-unstable/internal/entitycache_test.(*suite).TestEntityIssuesBaseEntityQuerySequentiallyForPromulgatedURL(0xddd8e0, 0xc8200f60f0)
    /home/rog/src/go/src/gopkg.in/juju/charmstore.v5-unstable/internal/entitycache/cache_test.go:105 +0xf1e
reflect.Value.call(0x9bb6e0, 0xddd8e0, 0x213, 0xaa1a88, 0x4, 0xc8200d9f00, 0x1, 0x1, 0x0, 0x0, ...)
    /home/rog/go/src/reflect/value.go:435 +0x120d
reflect.Value.Call(0x9bb6e0, 0xddd8e0, 0x213, 0xc8200d9f00, 0x1, 0x1, 0x0, 0x0, 0x0)
    /home/rog/go/src/reflect/value.go:303 +0xb1
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1(0xc8200f60f0)
    /home/rog/src/go/src/gopkg.in/check.v1/check.go:757 +0x489
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1(0xc82007fc80, 0xc8200f60f0, 0xc820114680)
    /home/rog/src/go/src/gopkg.in/check.v1/check.go:651 +0x6f
created by gopkg.in/check%2ev1.(*suiteRunner).forkCall
    /home/rog/src/go/src/gopkg.in/check.v1/check.go:652 +0x299

It seems fairly clear that sync.Mutex.Unlock should not be calling entitycache.Cache.Close directly.

It seems we're unlocking an already-unlocked mutex (a bug in the code), so sync.Mutex is panicking, but shouldn't this fact be visible in the stack trace?

To reproduce, there's a branch in github.com/rogpeppe/charmstore called stacktrace-oddity; it should be checked out into gopkg.in/juju/charmstore.v5-unstable and its prereqs go-gotten. Running go test in internal/entitycache reproduces the issue reasonably often (reproduced under Go 1.4.1 and Go 1.6 beta). ^\ to see the stack trace when it's deadlocked (go 1.4 panics with a deadlock).

@ianlancetaylor ianlancetaylor changed the title stack trace shows confusing call sequence during panic runtime: stack trace shows confusing call sequence during panic Jan 7, 2016
@ianlancetaylor ianlancetaylor added this to the Go1.6Maybe milestone Jan 7, 2016
@ianlancetaylor
Copy link
Contributor

CC @aclements

@aclements
Copy link
Member

The confusing call sequence is because it's in the process of unwinding defers. It looks like TestEntityIssuesBaseEntityQuerySequentiallyForPromulgatedURL pushed (*Cache).Close on the defer stack, so the panic in Unlock, in a sense, did call (*Cache).Close, though I agree that the presentation of this could perhaps be less confusing. Close wound up waiting on a waitgroup that never finished (and perhaps never will?), so you got stuck in the middle of unwinding the defers. Since, in principle, it was still possible to recover this panic at this point, it couldn't print out the panic.

So, given this, I'm curious if you have thoughts on what you would like to see in the stack trace that would make this clearer.

@rogpeppe
Copy link
Contributor Author

rogpeppe commented Jan 8, 2016

I'd expect to see some mention of "panic" in the stack trace. I'd expect to see it
called by the function that panicked (Mutex.Unlock in this case).

As it turned out, this particular case happened when there were two nested
panics - the code in newChanStore.func2 was trying to send on a closed
channel, which caused a deferred unlock to run, which panicked (because
it was unlocking an unlocked mutex), which caused the
deferred Close to run, which deadlocked because the Unlock
actually locked the mutex (see issue #13879)

Here's a simpler piece of code that illustrates the same issue (http://play.golang.org/p/r8F94naBit):

package main
import (
    "fmt"
)

var c = make(chan int)

func main() {
    close(c)
    defer halt()
    defer b()
    a()
}

func halt() {
    select{}
}

func a() {
    fmt.Println("a")
    c <- 1
}

func b() {
    fmt.Println("b")
    panic("b")
}

Something along this kind of line would be preferable (without any
thought as to whether it's possible or not!)

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [select (no cases)]:
main.halt()
    /home/rog/src/tst.go:16 +0x1b
panic()
    ???
main.b()
    /home/rog/src/tst.go:26 +0x124
panic()
    ???
main.a()
    /home/rog/src/tst.go:21 +0x10c
main.main()
    /home/rog/src/tst.go:12 +0x5b

Of course it would be even better if there was some indication of
where the deferred calls were being made from... when there's a
very deep stack trace, it can be challenging trying to work out
what's actually going on.

@aclements
Copy link
Member

It should be easy to show the panics in the traceback. In fact, you will see them if you run with GOTRACEBACK=system. They just don't show by default because "panic" doesn't actually exist and the runtime function that implements it gets elided because it's a runtime function. We could just special case traceback to show the runtime.gopanic frame and probably rewrite it to "runtime.panic" (or maybe even "panic") to make it really clear.

Giving more information about what frames belong to the defer and where that defer came from is a bit trickier, but I think still doable. The defer is still on the G's defer list while it's running, so most of the information is there, including the exact PC where the defer was created. We might have to record not just that the defer has started, but what SP it's running at to give traceback enough bread crumbs.

/cc @randall77 for any thoughts.

@aclements
Copy link
Member

As @ohler pointed out in #5832, not printing panics is actually a regression from when we renamed the runtime panic function but didn't update some string matching in traceback. I have a CL to fix that which I'll send when the tree reopens.

@minux
Copy link
Member

minux commented Feb 12, 2016 via email

@aclements
Copy link
Member

I don't think this is important enough to hold up 1.6.0, but we could cherry-pick it for 1.6.1.

@aclements
Copy link
Member

@rogpeppe, any thoughts on how you would like deferrals to appear in the traceback? Adding the panic calls in is straightforward, but I'm not sure what to show for deferrals (if anything). Already any call from a panic frame is a deferral. Is there more information you'd like to see? We could simply mark those as deferred calls so users don't need to know that that's what it means when panic "calls" a function. We could add a line that says where they were deferred from, perhaps printing the function name so it's easier to match up with the later frame?

@minux
Copy link
Member

minux commented Feb 12, 2016 via email

@aclements
Copy link
Member

I didn't mean to suggest we would do 1.6.1 just for this. But, yes, it is just a few string comparisons that need to be fixed. I've posted my CL at https://go-review.googlesource.com/19492 and will let @rsc decide if we should go ahead and push it for 1.6.

@gopherbot
Copy link

CL https://golang.org/cl/19492 mentions this issue.

gopherbot pushed a commit that referenced this issue Feb 16, 2016
We used to include panic calls in tracebacks; however, when
runtime.panic was renamed to runtime.gopanic in the conversion of the
runtime to Go, we missed the special case in showframe that includes
panic calls even though they're in package runtime.

Fix the function name check in showframe (and, while we're here, fix
the other check for "runtime.panic" in runtime/pprof). Since the
"runtime.gopanic" name doesn't match what users call panic and hence
isn't very user-friendly, make traceback rewrite it to just "panic".

Updates #5832, #13857. Fixes #14315.

Change-Id: I8059621b41ec043e63d5cfb4cbee479f47f64973
Reviewed-on: https://go-review.googlesource.com/19492
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@rogpeppe
Copy link
Contributor Author

@aclements It would be useful to show where the defer call is made, otherwise it can be hard to find that out. ("deferred from xxxxx" where xxxxx would indicate calling function and source location of defer statement? )

@aclements
Copy link
Member

@rogpeppe, I thought about that, but it's not obvious to me why that's particularly useful. If the defer is for a closure constructed at the defer point, the source location is only going to be a few lines from what it already prints in the traceback. But perhaps it's more useful when you defer something other than a closure? It looks like that's the case in your original traceback.

@quentinmit quentinmit added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label May 19, 2016
@quentinmit
Copy link
Contributor

@aclements Is there anything left to do as part of this issue? Do you want to add more information about the location of the defer call?

@aclements
Copy link
Member

@quentinmit, the important part (showing panic calls) is fixed. The defer part is a "nice to have" and may break things that try to parse the backtrace format, so maybe bump this to Go1.8Early?

@aclements aclements modified the milestones: Go1.8Early, Go1.7 May 20, 2016
@rsc
Copy link
Contributor

rsc commented Oct 18, 2016

@rogpeppe's program now prints:

a
b
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [select (no cases)]:
main.halt()
    /tmp/x.go:16 +0x14
panic(0x84ae0, 0xc42000a310)
    /Users/rsc/go/src/runtime/panic.go:485 +0x241
main.b()
    /tmp/x.go:26 +0xdd
panic(0x8a360, 0xc42000a2f0)
    /Users/rsc/go/src/runtime/panic.go:485 +0x241
main.a()
    /tmp/x.go:21 +0xc5
main.main()
    /tmp/x.go:12 +0x6e
exit status 2

I think that's clear enough.

@rsc rsc closed this as completed Oct 18, 2016
@golang golang locked and limited conversation to collaborators Oct 18, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

7 participants