runtime: express lock rank graph as a DAG like go/build/deps_test.go #53789

aclements · 2022-07-11T18:42:12Z

The runtime's lock rank graph currently requires a full manual transitive closure of the lock rank graph. This makes it annoying to maintain and some of the edge lists are very long.

We should replace this with something like the DAG language in go/build/deps_test.go. In particular, the runtime's lock ranking falls into a few natural strata, which are very natural to express in the deps DAG, and extremely verbose in the current approach. This would also significantly improve the readability of diffs to the graph (see, for example, 482669d). Finally, it would let us auto-generate the total order, rather than trying to keep them consistent by hand.

We could parse this at runtime startup (only if lock rank checking is enabled anyway), though the current totally static structure has the advantage of being available for lock checking from the moment the process starts.

It may be better to go:generate the static ranking structure from the DAG language. This would keep the structure available immediately at startup, and would also make it easy to share the parser with deps_test.go without trying to fit it into the constraints of early runtime initialization.

The text was updated successfully, but these errors were encountered:

aclements · 2022-07-19T16:40:03Z

I need to clean this up by hand, but here's the current lock graph automatically converted to deps language and reduced:

NONE < dummy, sysmon, sweepWaiters, assistQueue, cpuprof, sweep, pollDesc, deadlock, itab, notifyList, root, rwmutexW, defer, newmHandoff, debugPtrmask, faketimeState, ticks, raceFini, pollCache, debug;
sysmon < scavenge, forcegc;
assistQueue, cpuprof, forcegc, pollDesc, scavenge, sweep, sweepWaiters < sched;
sched < allg, allp, gFree;
allp < timers;
itab < reflectOffs;
scavenge, sweep < hchan;
scavenge < traceBuf;
allg, hchan, reflectOffs, timers, traceBuf < fin;
traceBuf < traceStrings;
allg, hchan, notifyList, reflectOffs, timers, traceStrings < mspanSpecial, profInsert, profBlock, profMemActive, gcBitsArenas, spanSetSpine, mheapSpecial;
profMemActive < profMemFuture;
hchan, root, sched, traceStrings < trace;
fin, notifyList, trace < traceStackTab;
timers < netpollInit;
rwmutexW, sysmon < rwmutexR;
gcBitsArenas, netpollInit, profBlock, profInsert, profMemFuture, spanSetSpine, traceStackTab < gscan;
gscan, rwmutexR < stackpool;
gscan < stackLarge, hchanLeaf;
hchan, notifyList < sudog;
defer, gscan, mspanSpecial, sudog < wbufSpans;
stackLarge, stackpool, wbufSpans < mheap;
mheap, mheapSpecial < globalAlloc;
mheap < pageAllocScav;
deadlock < panic;

To the surprise of no-one, the current lock rank graph is not transitively closed, meaning there are locks for which it's okay to acquire a then b or b then c, but not a then c. The new deps-based approach automatically constructs the full reachability matrix, which will fix this, too.

gopherbot · 2022-07-21T02:10:08Z

Change https://go.dev/cl/418716 mentions this issue: runtime: add missing trace lock edges

gopherbot · 2022-07-21T02:10:09Z

Change https://go.dev/cl/418717 mentions this issue: runtime: add mayAcquire annotation for finlock

gopherbot · 2022-07-21T02:10:09Z

Change https://go.dev/cl/418714 mentions this issue: runtime: delete unused lock ranks

gopherbot · 2022-07-21T02:10:10Z

Change https://go.dev/cl/418715 mentions this issue: runtime: generate the lock ranking from a DAG description

gopherbot · 2022-07-21T02:10:11Z

Change https://go.dev/cl/418593 mentions this issue: internal/dag: add a Graph type and make node order deterministic

gopherbot · 2022-07-21T02:10:11Z

Change https://go.dev/cl/418718 mentions this issue: runtime: make the lock rank DAG make more sense

gopherbot · 2022-07-21T02:10:12Z

Change https://go.dev/cl/418719 mentions this issue: runtime: clean up panic and deadlock lock ranks

gopherbot · 2022-07-21T02:10:14Z

Change https://go.dev/cl/418720 mentions this issue: runtime: add mayAcquire annotation for trace.lock

gopherbot · 2022-07-21T02:10:15Z

Change https://go.dev/cl/418592 mentions this issue: go/build, internal/dag: lift DAG parser into an internal package

aclements · 2022-07-21T14:43:02Z

Now that it's in the DAG language, I was able to generally rationalize the lock graph, in part because it was now easy to generate a visual graph, and group it by subsystem. This rationalization led me to find a few genuinely missing edges, including one that detected a previously unknown rank violation and true lock cycle (#53979).

The graph is still pretty complicated. Now that it's easier to think about, we could consider breaking some of these long paths. In particular, I think we should break the TRACE -> STACKGROW path so that tracing is at the leaf of the lock graph, and possibly the WB -> mheap path, which has caused us issues in the past.

Here's what it looked like prior to cleaning up:

This lifts the DAG parser from the go/build dependencies test into its own package that can be reused elsewhere. I tried to keep the code as close as possible. I changed some names to reflect the more general purpose of internal/dag. Most of the changes are related to error handling, since internal/dag doesn't take a testing.T on which to report errors. Notably, parseRules now returns a slice of parsed rules rather than calling a callback because this made it easier to separate fatal parsing errors from non-fatal graph checking errors. For #53789. Change-Id: I170b84fd85f971cfc1a50972156d48e78b45fce3 Reviewed-on: https://go-review.googlesource.com/c/go/+/418592 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

The go/types package doesn't care about node ordering because it's just querying paths in the graph, but we're about to use this for the runtime lock graph, and there we want determinism. For #53789. Change-Id: Ic41329bf2eb9a3a202f97c21c761ea588ca551c8 Reviewed-on: https://go-review.googlesource.com/c/go/+/418593 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com>

For #53789. Change-Id: Ic7379afcfdcc47b541bac9b44b5bc6b43604fc0a Reviewed-on: https://go-review.googlesource.com/c/go/+/418714 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com>

We're missing lock edges to trace.lock that happen only rarely. Any trace event can potentially fill up a trace buffer and acquire trace.lock in order to flush the buffer, but this happens relatively rarely, so we simply haven't seen some of these lock edges that could happen. With this change, we promote "fin, notifyList < traceStackTab" to "fin, notifyList < trace" and now everything that emits trace events with a P enters the tracer lock ranks via "trace", rather than some things entering at "trace" and others at "traceStackTab". This was found by inspecting the rank graph for things that didn't make sense. Ideally we would add a mayAcquire annotation that any trace event can potentially acquire trace.lock, but there are actually cases that violate this ranking right now. This is #53979. The chance of a lock cycle is extremely low given the number of conditions that have to happen simultaneously. For #53789. Change-Id: Ic65947d27dee88d2daf639b21b2c9d37552f0ac0 Reviewed-on: https://go-review.googlesource.com/c/go/+/418716 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

We're missing lock edges to finlock that happen only rarely. Anything that calls mallocgc can potentially trigger sweeping, which can potentially queue a finalizer, which acquires finlock. While this can happen on any malloc, it happens relatively rarely, so we simply haven't seen some of the lock edges that could happen. Add a mayAcquire annotation to mallocgc to capture the possibility of acquiring finlock. With this change, we add "fin" to the set of "malloc" locks. Several of these edges were already there, but not quite all of them. This was found by inspecting the rank graph for things that didn't make sense. For #53789. Change-Id: Idc10ce6f250596b0c07ba07ac93f2198fb38c22b Reviewed-on: https://go-review.googlesource.com/c/go/+/418717 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

This groups, comments, and generally reorganizes the lock rank graph description by subsystem. It also introduces several pseudo-nodes that more cleanly describe the inherent layering of lock ranks by subsystem. I believe this doesn't actually change the graph, but haven't verified this. For #53789. Change-Id: I72f332f5a23b8217c7dc1b21411631ad48cee4b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/418718 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>

I'm not entirely sure why these locks are currently ranked "deadlock < panic" since we drop panic before acquiring deadlock, and we actually want deadlock to be below panic because panic is implicitly below everything else and we want deadlock to be, too. My best guess is that we had this edge because we intentionally acquire deadlock twice to deadlock, and that causes the lock rank checking to panic on the second acquire. Fix this in a more sensible way by capturing that deadlock can be acquired in a self-cycle and flipping the rank to "panic < deadlock" to express that deadlock needs to be under all other locks, just like panic. For #53789. Change-Id: I8809e5d102ce473bd3ace0ba07bf2200ef60263f Reviewed-on: https://go-review.googlesource.com/c/go/+/418719 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Austin Clements <austin@google.com>

This lifts the DAG parser from the go/build dependencies test into its own package that can be reused elsewhere. I tried to keep the code as close as possible. I changed some names to reflect the more general purpose of internal/dag. Most of the changes are related to error handling, since internal/dag doesn't take a testing.T on which to report errors. Notably, parseRules now returns a slice of parsed rules rather than calling a callback because this made it easier to separate fatal parsing errors from non-fatal graph checking errors. For golang#53789. Change-Id: I170b84fd85f971cfc1a50972156d48e78b45fce3 Reviewed-on: https://go-review.googlesource.com/c/go/+/418592 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

The go/types package doesn't care about node ordering because it's just querying paths in the graph, but we're about to use this for the runtime lock graph, and there we want determinism. For golang#53789. Change-Id: Ic41329bf2eb9a3a202f97c21c761ea588ca551c8 Reviewed-on: https://go-review.googlesource.com/c/go/+/418593 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com>

For golang#53789. Change-Id: Ic7379afcfdcc47b541bac9b44b5bc6b43604fc0a Reviewed-on: https://go-review.googlesource.com/c/go/+/418714 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com>

Currently, the runtime lock rank graph is maintained manually in a large set of arrays that give the partial order and a manual topological sort of this partial order. Any changes to the rank graph are difficult to reason about and hard to review, as well as likely to cause merge conflicts. Furthermore, because the partial order is manually maintained, it's not actually transitively closed (though it's close), meaning there are many cases where rank a can be acquired before b and b before c, but a cannot be acquired before c. While this isn't technically wrong, it's very strange in the context of lock ordering. Replace all of this with a much more compact, readable, and maintainable description of the rank graph written in the internal/dag graph language. We statically generate the runtime structures from this description, which has the advantage that the parser doesn't have to run during runtime initialization and the structures can live in static data where they can be accessed from any point during runtime init. The current description was automatically generated from the existing partial order, combined with a transitive reduction. This ensures it's correct, but it could use some manual messaging to call out the logical layers and add some structure. We do lose the ad hoc string names of the lock ranks in this translation, which could mostly be derived from the rank constant names, but not always. I may bring those back but in a more uniform way. We no longer need the tests in lockrank_test.go because they were checking that we manually maintained the structures correctly. Fixes golang#53789. Change-Id: I54451d561b22e61150aff7e9b8602ba9737e1b9b Reviewed-on: https://go-review.googlesource.com/c/go/+/418715 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

We're missing lock edges to trace.lock that happen only rarely. Any trace event can potentially fill up a trace buffer and acquire trace.lock in order to flush the buffer, but this happens relatively rarely, so we simply haven't seen some of these lock edges that could happen. With this change, we promote "fin, notifyList < traceStackTab" to "fin, notifyList < trace" and now everything that emits trace events with a P enters the tracer lock ranks via "trace", rather than some things entering at "trace" and others at "traceStackTab". This was found by inspecting the rank graph for things that didn't make sense. Ideally we would add a mayAcquire annotation that any trace event can potentially acquire trace.lock, but there are actually cases that violate this ranking right now. This is golang#53979. The chance of a lock cycle is extremely low given the number of conditions that have to happen simultaneously. For golang#53789. Change-Id: Ic65947d27dee88d2daf639b21b2c9d37552f0ac0 Reviewed-on: https://go-review.googlesource.com/c/go/+/418716 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

We're missing lock edges to finlock that happen only rarely. Anything that calls mallocgc can potentially trigger sweeping, which can potentially queue a finalizer, which acquires finlock. While this can happen on any malloc, it happens relatively rarely, so we simply haven't seen some of the lock edges that could happen. Add a mayAcquire annotation to mallocgc to capture the possibility of acquiring finlock. With this change, we add "fin" to the set of "malloc" locks. Several of these edges were already there, but not quite all of them. This was found by inspecting the rank graph for things that didn't make sense. For golang#53789. Change-Id: Idc10ce6f250596b0c07ba07ac93f2198fb38c22b Reviewed-on: https://go-review.googlesource.com/c/go/+/418717 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

This groups, comments, and generally reorganizes the lock rank graph description by subsystem. It also introduces several pseudo-nodes that more cleanly describe the inherent layering of lock ranks by subsystem. I believe this doesn't actually change the graph, but haven't verified this. For golang#53789. Change-Id: I72f332f5a23b8217c7dc1b21411631ad48cee4b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/418718 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>

I'm not entirely sure why these locks are currently ranked "deadlock < panic" since we drop panic before acquiring deadlock, and we actually want deadlock to be below panic because panic is implicitly below everything else and we want deadlock to be, too. My best guess is that we had this edge because we intentionally acquire deadlock twice to deadlock, and that causes the lock rank checking to panic on the second acquire. Fix this in a more sensible way by capturing that deadlock can be acquired in a self-cycle and flipping the rank to "panic < deadlock" to express that deadlock needs to be under all other locks, just like panic. For golang#53789. Change-Id: I8809e5d102ce473bd3ace0ba07bf2200ef60263f Reviewed-on: https://go-review.googlesource.com/c/go/+/418719 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Austin Clements <austin@google.com>

Now that we've moved the trace locks to the leaf of the lock graph, we can safely annotate that any trace event may acquire trace.lock even if dynamically it turns out a particular event doesn't need to flush and acquire this lock. This reveals a new edge where we can trace while holding the mheap lock, so we add this to the lock graph. For #53789. Updates #53979. Change-Id: I13e2f6cd1b621cca4bed0cc13ef12e64d05c89a7 Reviewed-on: https://go-review.googlesource.com/c/go/+/418720 Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

aclements added the FeatureRequest label Jul 11, 2022

aclements added this to the Backlog milestone Jul 11, 2022

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 11, 2022

mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 11, 2022

aclements self-assigned this Jul 12, 2022

gopherbot closed this as completed in 2b8a9a4 Aug 4, 2022

golang locked and limited conversation to collaborators Aug 4, 2023

gopherbot added the FrozenDueToAge label Aug 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: express lock rank graph as a DAG like go/build/deps_test.go #53789

runtime: express lock rank graph as a DAG like go/build/deps_test.go #53789

aclements commented Jul 11, 2022 •

edited

aclements commented Jul 19, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

aclements commented Jul 21, 2022

runtime: express lock rank graph as a DAG like go/build/deps_test.go #53789

runtime: express lock rank graph as a DAG like go/build/deps_test.go #53789

Comments

aclements commented Jul 11, 2022 • edited

aclements commented Jul 19, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

gopherbot commented Jul 21, 2022

aclements commented Jul 21, 2022

aclements commented Jul 11, 2022 •

edited