
proposal: Add a Partial Deadlock Detector For Go #13759

Closed
rfliam opened this issue Dec 29, 2015 · 9 comments

rfliam commented Dec 29, 2015

The Proposal

Deadlocks make programs fail. Partial deadlocks (where a subset of the goroutines in a system are deadlocked) are difficult to find and debug. While predicting the occurrence of these deadlocks is not computable, detecting them when they occur is. Adding a partial deadlock detector would be a powerful tool for debugging our concurrent programs.

Like the race detector, a flaggable deadlock detector would use environment variables to choose whether to log or exit on detected deadlocks. The detector would not catch all deadlocks or potential deadlocks, only a subset of those that are actually occurring.

The goal of this proposal is to see whether such a feature is desirable and to present a plausible implementation. If it is desirable, a detailed design doc will follow.

Is this the Halting Problem?

No. If the necessary conditions (https://en.wikipedia.org/wiki/Deadlock#Necessary_conditions) are met, the program will not continue.

Proposed Implementation

High Level

In short, the proposed implementation is to "hook" into the garbage collector and use its mark phase to build a wait-for graph (https://en.wikipedia.org/wiki/Wait-for_graph), then use Tarjan's algorithm (https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm) to find deadlock cycles, that is, the strongly connected components with no edges to non-deadlocked components.

More Detailed

As Go uses a mark-sweep garbage collector, most of the computationally expensive work necessary for deadlock detection is already being done. Here is how the proposed detector would work, in very high-level terms (a rough sketch of the detection step follows the list):

  1. At the beginning of the mark phase, all goroutines in a wait state are examined, along with the object each is waiting on.
  2. As objects are marked (colored black or gray) from the various roots, if an object is one of the "waited on" resources, it is added as an edge in our wait-for graph.
  3. Any resource which is not referenced from another root is added as an edge to a "nothing" node.
  4. At the end of the mark phase (i.e. after stop-the-world), run Tarjan's algorithm (or similar) to determine the strongly connected components.
  5. If a strongly connected component has no edges to a non-deadlocked component, the involved goroutines are deadlocked.
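
To make steps 4 and 5 concrete, here is a minimal, self-contained sketch of the detection step, written outside the runtime with invented types (goroutines are plain integer IDs, the wait-for graph is a map, and the "live" set stands in for non-blocked goroutines). It illustrates the idea only; it is not a proposed runtime API:

```go
package main

import "fmt"

// graph[g] lists the goroutines that g is waiting for (those that could wake it).
type graph map[int][]int

// tarjanSCC returns the strongly connected components of g in reverse
// topological order of the condensation (sink components come first).
func tarjanSCC(g graph) [][]int {
	index, low := map[int]int{}, map[int]int{}
	onStack := map[int]bool{}
	var stack []int
	var sccs [][]int
	next := 0

	var visit func(v int)
	visit = func(v int) {
		index[v], low[v] = next, next
		next++
		stack = append(stack, v)
		onStack[v] = true
		for _, w := range g[v] {
			if _, seen := index[w]; !seen {
				visit(w)
				if low[w] < low[v] {
					low[v] = low[w]
				}
			} else if onStack[w] && index[w] < low[v] {
				low[v] = index[w]
			}
		}
		if low[v] == index[v] { // v is the root of a component: pop it off
			var scc []int
			for {
				w := stack[len(stack)-1]
				stack = stack[:len(stack)-1]
				onStack[w] = false
				scc = append(scc, w)
				if w == v {
					break
				}
			}
			sccs = append(sccs, scc)
		}
	}
	for v := range g {
		if _, seen := index[v]; !seen {
			visit(v)
		}
	}
	return sccs
}

// deadlocked returns the goroutines in components that contain no live
// (non-blocked) goroutine and have no edge to a non-deadlocked component.
func deadlocked(g graph, live map[int]bool) []int {
	sccs := tarjanSCC(g)
	sccOf := map[int]int{}
	for i, scc := range sccs {
		for _, v := range scc {
			sccOf[v] = i
		}
	}
	dead := make([]bool, len(sccs))
	var out []int
	// Components arrive in reverse topological order, so every outgoing
	// edge points at a component whose status is already decided.
	for i, scc := range sccs {
		ok := true
		for _, v := range scc {
			if live[v] {
				ok = false
			}
			for _, w := range g[v] {
				if j := sccOf[w]; j != i && !dead[j] {
					ok = false
				}
			}
		}
		dead[i] = ok
		if ok {
			out = append(out, scc...)
		}
	}
	return out
}

func main() {
	// G1 and G2 wait on each other, G3 waits on G1, G4 waits on G5, and G5 is running.
	g := graph{1: {2}, 2: {1}, 3: {1}, 4: {5}, 5: nil}
	fmt.Println(deadlocked(g, map[int]bool{5: true})) // some ordering of 1, 2, 3
}
```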

Notes

For simplicity's sake, I have intentionally glossed over:

  • Dealing with timers (interruptible waits) and globally referenced resources
  • The fact that an object referenced by multiple roots may contain a reference to a waited-on object in another root (paths to the waited-on object would have to be stored and examined when previously colored nodes are encountered)
  • Send and receive mechanics on channels

RLH (Contributor) commented Dec 29, 2015

Here is one approach that doesn't perturb the GC and its data structures very much and focuses the additional work only on objects with a proven partial deadlock, after eliminating all objects not involved in such cycles.

Distinguish all objects being waited on and their associated Goroutines, and do not mark and trace these Goroutines' roots; otherwise perform the normal transitive mark to completion. Then inspect the waited-on objects and their associated Goroutines one at a time. If an object is marked, it is reachable from an unblocked Goroutine, so trace the objects reachable from the Goroutine associated with that marked blocking object. Repeat until all non-perma-blocked Goroutines are discovered and traced. Correctness, completion, and termination proofs could probably be based on the various distributed termination proofs in the literature.

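A rough sketch of that iteration, in Go-flavored pseudocode, might look like the following; every helper here (isBlocked, markFrom, markFromGlobals, waitTarget, isMarked) is invented shorthand for the corresponding GC operation, not a real runtime function:

```go
// Go-flavored pseudocode for the scheme sketched above. None of these
// helpers exist in the runtime; they stand in for GC operations.
func findPermaBlocked(gs []*g) []*g {
	var blocked []*g
	for _, gp := range gs {
		if isBlocked(gp) {
			blocked = append(blocked, gp) // defer tracing these roots
		} else {
			markFrom(gp) // normal transitive mark from runnable goroutines
		}
	}
	markFromGlobals()

	// Release any blocked goroutine whose waited-on object has become
	// reachable from the marked portion of the heap, and trace it too.
	for changed := true; changed; {
		changed = false
		for i, gp := range blocked {
			if gp != nil && isMarked(waitTarget(gp)) {
				markFrom(gp)
				blocked[i] = nil
				changed = true
			}
		}
	}

	// Whatever remains is blocked on an object reachable only from other
	// still-blocked goroutines: a candidate partial deadlock.
	var perma []*g
	for _, gp := range blocked {
		if gp != nil {
			perma = append(perma, gp)
		}
	}
	return perma
}
```
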
At this point the untraced Goroutines are blocked on objects that are only reachable from perma-blocked Goroutines. Furthermore, the objects involved in the interesting dependencies are reachable yet unmarked at this point. This focuses the work on the interesting objects and interesting Goroutines.

These Goroutines and associated objects are revealed to the user. Constructing the who-depends-on-whom graph could also happen, but I haven't thought that through completely.

To continue, trace the perma-blocked Goroutines and finish the GC cycle.

This approach solves the problem of having to maintain all paths to the blocked object. All the algorithm needs to know is that there is at least one non-local path to the blocking object from a non-perma-blocked Goroutine, and this approach provides that single bit of information.

Using GC for these types of things is a great idea.

dr2chase (Contributor) commented Jan 1, 2016

Years ago I worked on a Java system that detected vanilla deadlocks on the fly. In a green-threads-like system (which Go also resembles), when a thread is about to enter a wait state because of a lock, and the lock records the thread that owns it (this was true in our implementation), the about-to-wait thread can trace from lock to thread to lock to thread, and so on, as long as the threads are in a wait state. If this search runs "too far" (e.g., more than 16 steps), the algorithm switches to serious deadlock detection: it notes the current thread, and then the 32nd, 64th, etc., and checks each new thread encountered against those recorded, searching for a duplicate.
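
A minimal sketch of that wait-chain walk, using an invented owner-recording lock type (sync.Mutex does not record its owner, so nothing here is a real Go API):

```go
// ownedLock and thread are hypothetical types used only to illustrate the
// lock -> owner -> lock -> owner walk described above.
package deadlock

type ownedLock struct {
	owner *thread // nil when unlocked
}

type thread struct {
	waitingOn *ownedLock // nil when runnable
}

// aboutToWait is called by t just before it blocks on l. It walks the chain
// of owners; the power-of-two checkpoints mirror the "16th, 32nd, 64th, ..."
// recording described above and guarantee any cycle is eventually revisited.
func aboutToWait(t *thread, l *ownedLock) (deadlock bool) {
	recorded := map[*thread]bool{t: true}
	cur, steps := l, 0
	for {
		owner := cur.owner
		if owner == nil || owner.waitingOn == nil {
			return false // chain ends at an unlocked lock or a runnable thread
		}
		if recorded[owner] {
			return true // reached a thread we already saw: a deadlock cycle
		}
		steps++
		if steps >= 16 && steps&(steps-1) == 0 {
			recorded[owner] = true // checkpoint at steps 16, 32, 64, ...
		}
		cur = owner.waitingOn
	}
}
```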

This is something that can take the place of spinning for a lock; our system didn't do any spinning by default and enabling the deadlock detector tended to improve performance slightly. The sort of information that we found useful in a deadlock was a stack trace for each thread involved.

I think (not sure) that we actually marked threads on the fly while searching for deadlocks; a "searching" thread would abandon a search if it encountered some other thread's mark, and a thread would report a cycle if it encountered its own mark; however, I think that this was vulnerable to sometimes missing deadlocks in a race-to-deadlock, and also required more expensive load/store barriers.

bradfitz added this to the Unplanned milestone Jan 21, 2016

adg (Contributor) commented Aug 15, 2016

This proposal is okay in principle; it just needs someone to write a detailed proposal document that explains how it will be implemented.

adg (Contributor) commented Sep 26, 2016

Ping?

minux (Member) commented Sep 27, 2016 via email

adg (Contributor) commented Oct 31, 2016

This issue requires more feedback from people who work on the runtime. Is this desirable? Or is it impractical given existing constraints?

@aclements @RLH @dr2chase @josharian @minux

aclements (Member) commented:

I'm not sure I fully understand what's being proposed. Traditional waits-for analysis doesn't require any sort of integration into the marking process, so I'm not sure what you're getting at there (I've done waits-for analysis on runtime locks using just a few hooks and a totally lame GDB script). I'm also not clear what edges you're adding to the waits-for graph in your detailed steps.

Channels

But my main concern is that most synchronization in Go is done with channels, and channels don't fit into traditional waits-for analysis and deadlock detection. For example, if a goroutine is blocked on receiving from a channel, it could become unblocked at some unknown point in the future if the send side of that channel is reachable by any non-blocked goroutine. But reachability is far from a sufficient condition for a send to happen. Channels don't meet the "necessary conditions" you linked to.

You could in principle trace the reachability of channels (perhaps this is what you're getting at by integrating it into the garbage collector?), but that has both false positives and false negatives. A channel may be reachable by a goroutine, but that goroutine will never send on it, in which case we cannot detect a deadlock even if there is one (as a simple case, the channel could be globally reachable). On the flip side, it's perfectly valid (if a bit odd) to do something like <-make(chan int). That's a self-cycle, but is it a deadlock? What if the channel is being used as an iterator and the receiving side breaks out early? That's certainly a leak, but arguably we should garbage collect the sender goroutine and move on. With locks it's very clear that dynamic cycles are never desired nor intended. With channels this is much less clear.
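
Two toy programs that illustrate those cases (the names here are invented for the example):

```go
package main

import "time"

// ch is globally reachable, so a reachability-based detector would treat the
// receiver below as potentially wakeable, even though nothing ever sends.
var ch = make(chan int)

func main() {
	// Reachable channel, but no goroutine will ever send on it: this
	// goroutine is blocked forever, yet reachability alone cannot prove it.
	go func() { <-ch }()

	// Self-cycle: the only reference to this channel is held by the blocked
	// receiver itself. Certainly a leak, but is it a "deadlock"?
	go func() { <-make(chan int) }()

	time.Sleep(100 * time.Millisecond) // let both goroutines block, then exit
}
```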

Mutexes

But even if we restrict ourselves to sync.Mutex, we still have to consider reachability because sync.Mutex can be unlocked by any goroutine, not just the one that last locked it (perhaps that was a mistake).
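
For reference, this hand-off pattern is legal Go: the goroutine that unlocks a sync.Mutex need not be the one that locked it, so a blocked Lock can be released by any goroutine that can reach the mutex. A small illustrative program (not taken from the proposal):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	mu.Lock() // locked by main

	go func() {
		time.Sleep(10 * time.Millisecond)
		mu.Unlock() // legal: unlocked by a goroutine other than the locker
	}()

	// main blocks here and is woken only because some other goroutine that
	// can reach mu happens to unlock it, which is why the analysis has to
	// consider reachability rather than lock ownership.
	mu.Lock()
	fmt.Println("acquired")
}
```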

Considering reachability

Can we account for reachability? I think we could, but it's actually quite onerous, since you have to know that the mutex (or channel) is reachable specifically from a non-blocked goroutine. Currently, the garbage collector has no idea what's reachable from what. Distinguishing multiple types of reachability means it would have to flood a secondary property through the heap graph. Propagating that information requires either performing a second full mark phase, or dividing the mark phase into two strictly separate sub-phases, where the first starts from just non-blocked goroutines and global roots and finishes a complete graph flood before starting a second flood from blocked goroutines.

Counter-proposal

Of course, I agree that it would be a great debugging tool to be able to find deadlocks. Here's a counter-proposal. Rather than trying to do this online while a program is running, do it offline, perhaps as a heap dump analysis (once we have a library for that). The runtime could help out by recording which goroutine holds each mutex. I posit that most of the time you know a deadlock has happened and the tricky part is figuring out exactly what's involved in the deadlock. Deadlocks, conveniently, sit around waiting to be debugged, so you could capture a heap dump and let a tool find cycles for you. These cycles could even be potential cycles (flagged as such), where the channel or mutex is still reachable from a non-blocked goroutine, but maybe that goroutine is never going to send/unlock. A channel or mutex that's only globally reachable is probably a good signal here (and requires yet another reachability analysis).

rfliam (Author) commented Nov 1, 2016

Channels don't meet the "necessary conditions" you linked to. ...
That's a self-cycle, but is it a deadlock? ... That's certainly a leak, but arguably we should garbage collect the sender goroutine and move on.

I feel this is a bit of a semantic disagreement. A goroutine that will never proceed but cannot be garbage collected is "deadlocked" to me. We can detect a subset of goroutines that will never proceed; I don't think anyone would expect us to detect all goroutines which will not proceed (isn't that the halting problem?). We could certainly collect these goroutines, and that may be very valuable. I was, however, concerned about the performance impact if we change the GC to detect them. Like race, I had hoped this feature would be low overhead... not no overhead.

perhaps this is what you're getting at by integrating it into the garbage collector?

Yes, reachability is why I proposed linking into the garbage collector.

Can we account for reachability? I think we could, but it's actually quite onerous ... Propagating that information requires either performing a second full mark phase, or dividing the mark phase into two strictly separate sub-phases where the first starts from just non-blocked goroutines and global roots and finishes a complete graph flood before starting a second flood from blocked goroutines.

My proposal was focused on the two-phase mark, though I see I did not make that clear. In practice the number of roots (driven mostly by the number of goroutines) far exceeds the number of cores we have to mark with anyway, so prioritizing unblocked goroutines shouldn't hurt performance too badly.

The details of the GC (especially the 1.8 and 1.9 GCs) are not familiar to me, so the amount of work needed to divide the mark into two phases isn't something I can estimate. Keep in mind the intention is that this, like race, would not be enabled by default, to avoid performance penalties.

Rather than trying to do this online while a program is running, do it offline, perhaps as a heap dump analysis (once we have a library for that).

I've actually done this, and we have used it internally. The problem is that, like races, one generally doesn't know to start looking until the application has died in production, and then you can't fetch a heap. Deadlocks and leaked goroutines have even led some to decry their use. Anecdotally, I have heard similar sentiments from engineers at Gophercon. This would also obviate the need for features like #6705 and lots of related test code.

rsc (Contributor) commented Nov 21, 2016

This sounds like a great idea. The devil is in the details. We'll mark this proposal accepted to reflect that we're certainly in favor of having this kind of functionality, but we are not ourselves planning to do the work. If you would like to do this and can make it work, that would be great.
