runtime: Stack() calls lock against each other unnecessarily, even when single-goroutine #56400

Open
seebs opened this issue Oct 24, 2022 · 13 comments
Labels
compiler/runtime: Issues related to the Go compiler and/or runtime.
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Performance
Milestone
Backlog

Comments

@seebs
Contributor

seebs commented Oct 24, 2022

What version of Go are you using (go version)?

$ go version
1.19.2

Does this issue reproduce with the latest release?

y

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
(encountered on Linux but I assume it shows up everywhere)

What did you do?

I saw a profile that showed 85% of CPU time in calls to runtime.Stack(), and 81% of total CPU time in calls to printlock and printunlock.

What did you expect to see?

I expected independent calls, each writing into its own buffer rather than to a shared stream and each running on a different CPU, to proceed without lock contention.

What did you see instead?

Every single write in any part of the traceback process ultimately goes through print (and friends), which means it takes printlock and printunlock. This is true even when g.writebuf is non-nil and the different gs are writing into different buffers. As a result, the stack tracebacks block each other, and since each is making a large number of individual calls to print, that results in heavy lock contention. (To complicate things, there are cases where a non-nil g.writebuf still doesn't cause the writes to go into the buffer, such as when gp.m is dying.)

This looks like a mess to fix. It's "easy" to fix by changing the code to thread a write target through it, except that you absolutely don't want any of this code dealing with stuff like io.Writer or other higher-level abstractions. I also don't know whether it's actually completely safe to have multiple copies of print running, against different write buffers, on different gs. In the case where you really are writing to a stream, you probably don't want a buffer. You don't want to have two copies of all of this code, identical except for their output. But having two calls to runtime.Stack() generate hundreds of contended locks seems pretty bad too...
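For reference, a minimal sketch of the kind of workload that hits this (not the original profiled program; the goroutine and iteration counts are arbitrary). Every goroutine writes into its own private buffer, yet a CPU profile of this should still show a large fraction of time under printlock/printunlock:

    package main

    import (
        "runtime"
        "sync"
    )

    func main() {
        const goroutines = 8
        const iterations = 10000

        var wg sync.WaitGroup
        for i := 0; i < goroutines; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                // Each goroutine gets its own buffer, so there is no shared
                // resource in principle -- but every piece of output still
                // goes through the runtime's global print lock.
                buf := make([]byte, 64<<10)
                for j := 0; j < iterations; j++ {
                    runtime.Stack(buf, false)
                }
            }()
        }
        wg.Wait()
    }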

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Oct 24, 2022
@heschi
Contributor

heschi commented Oct 25, 2022

cc @golang/runtime

@heschi heschi added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 25, 2022
@heschi heschi added this to the Backlog milestone Oct 25, 2022
@mknyszek
Contributor

In triage this doesn't seem like a high priority to look at. Stack could probably avoid the printlock, but that might be tricky today. This might be more doable if we clean up gentraceback in general (CC @aclements).

@mknyszek
Contributor

(In other words, effectively blocking this on #54466.)

@prattmic
Member

Note that using runtime.Callers should avoid this lock contention.
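For example, a sketch of that kind of substitution (the helper names here are made up): capture raw PCs with runtime.Callers, which takes no print lock, and only expand them with runtime.CallersFrames when the trace is actually needed.

    package main

    import (
        "fmt"
        "runtime"
        "strings"
    )

    // captureStack records program counters only; no formatting, no print lock.
    func captureStack() []uintptr {
        pcs := make([]uintptr, 64)
        // Skip runtime.Callers and captureStack itself.
        n := runtime.Callers(2, pcs)
        return pcs[:n]
    }

    // formatStack expands the PCs into a readable trace on demand.
    // Unlike runtime.Stack, this cannot show function arguments.
    func formatStack(pcs []uintptr) string {
        var sb strings.Builder
        frames := runtime.CallersFrames(pcs)
        for {
            frame, more := frames.Next()
            fmt.Fprintf(&sb, "%s\n\t%s:%d\n", frame.Function, frame.File, frame.Line)
            if !more {
                break
            }
        }
        return sb.String()
    }

    func main() {
        pcs := captureStack()
        fmt.Print(formatStack(pcs))
    }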

@josharian
Contributor

There was also some discussion in the Gophers Slack about one possible hack fix:

 func printlock() {
-       mp := getg().m
+       gp := getg()
+       if gp.writebuf != nil {
+               return
+       }
+       mp := gp.m

(and the same for printunlock)
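Spelled out in full (a sketch only, assuming the Go 1.19 shape of these functions in runtime/print.go, and not a tested patch; whether the early return is safe is exactly the question discussed below), the patched pair might look like:

    // Skip the global debuglock when this goroutine is printing into its own
    // writebuf, as runtime.Stack does: there is no shared stream to serialize.
    func printlock() {
        gp := getg()
        if gp.writebuf != nil {
            return
        }
        mp := gp.m
        mp.locks++ // do not reschedule between printlock++ and lock(&debuglock)
        mp.printlock++
        if mp.printlock == 1 {
            lock(&debuglock)
        }
        mp.locks--
    }

    func printunlock() {
        gp := getg()
        if gp.writebuf != nil {
            return
        }
        mp := gp.m
        mp.printlock--
        if mp.printlock == 0 {
            unlock(&debuglock)
        }
    }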

@ianlancetaylor
Contributor

That's not the only reason printlock exists. It also ensures that if two goroutines call print in parallel with multiple arguments then the output will not be interleaved.

@josharian
Contributor

Right. But if I read the code correctly (big if), when gp.writebuf is non-nil, we are always non-concurrent.

@ianlancetaylor
Contributor

Perhaps you are not aware that the compiler generates calls to printlock. When the compiler sees print(a, b, c), it generates code like

    printlock()
    printint(a)
    printint(b)
    printint(c)
    printunlock()

@josharian
Contributor

I am indeed quite aware of that. :)

In those cases, however, gp.writebuf is nil (the output goes straight to stderr). gp.writebuf is only non-nil for runtime.Stack and a runtime test.

@ianlancetaylor
Contributor

Sure, but my point was that if two different goroutines call print in parallel with multiple arguments, their output will not be interleaved. And that still seems true to me. Am I missing something?

(It's true that the output from print can be interleaved with other output that the program might be writing to standard error. That seems harder to avoid, though as you imply it could be done by setting writebuf.)

@josharian
Contributor

Am I missing something?

I hope so? :P

The idea of the patch above was that we effectively disable printlock and printunlock only on the relevant G when its writebuf is non-nil. writebuf is non-nil only when runtime.Stack is actively using its (systemstack's) G to print a stack. All other goroutines that are printing will have nil writebufs, and the locking will proceed as usual.

@ianlancetaylor
Contributor

Argh, sorry, I get it now. Completely misunderstood. Apologies.

@seebs
Contributor Author

seebs commented Oct 27, 2022

I did note that runtime.Callers produces part of this data without locking, but there's no way to get the argument info if you want it, and no easy way to get the pretty-printed output. So it solves the performance problem but loses information; as far as I can tell, nothing else available to a user can produce the call stack with arguments.

So far as I can tell, writebuf can only be non-nil during a call to runtime.Stack(), and during one specific test that also assigns to it for some reason. There are no other uses in the code.

So I think in practice the "just dummy out the locks when writebuf is non-nil" behavior would be correct, but I'd feel a bit nervous about it being brittle. I'm also not 100% sure what it would do when you request tracebacks for all goroutines, but I assume it works. (I also note that if you request tracebacks for all goroutines, the world is stopped; good news, no lock contention, bad news, you already took every performance hit.)

That said, I'm certainly not about to volunteer to try to make the traceback stuff work differently; that looks like a really hard problem.
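For completeness, the two modes being contrasted above (a trivial sketch): runtime.Stack(buf, false) dumps only the calling goroutine and does not stop the world, while runtime.Stack(buf, true) stops the world and dumps every goroutine.

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        buf := make([]byte, 1<<20)

        // Current goroutine only: no stop-the-world, but (today) still
        // serialized through printlock/printunlock internally.
        n := runtime.Stack(buf, false)
        fmt.Printf("current goroutine: %d bytes of traceback\n", n)

        // All goroutines: stops the world for the duration of the dump,
        // so lock contention is moot but the cost is much higher.
        n = runtime.Stack(buf, true)
        fmt.Printf("all goroutines: %d bytes of traceback\n", n)
    }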

CLoZengineer pushed a commit to FeatureBaseDB/featurebase that referenced this issue Oct 28, 2022
We thought stack traces were mildly expensive. We were very wrong.
Due to a complicated issue in the Go runtime, simultaneous requests
for stack traces end up contending on a lock even when they're not
actually contending on any resources. I've filed a ticket in the
Go issue tracker for this:

	golang/go#56400

In the meantime: Under some workloads, we were seeing 85% of all
CPU time go into the stack backtraces, of which 81% went into the
contention on those locks. But even if you take away the contention,
that leaves us with 4/19 of all CPU time in our code going into
building those stack backtraces. That's a lot of overhead for a
feature we virtually never use.

We might consider adding a backtrace functionality here, possibly
using `runtime.Callers` which is much lower overhead, and allows us
to generate a backtrace on demand (no argument values available,
but then, we never read those because they're unformatted hex
values), but I don't think it's actually very informative to know
what the stack traces were of the Tx; they don't necessarily reflect
the current state of any ongoing use of the Tx, so we can't necessarily
correlate them to goroutine stack dumps, and so on.
CLoZengineer added a commit to FeatureBaseDB/featurebase that referenced this issue Oct 28, 2022
* don't obtain stack traces on rbf.Tx creation

fhaynes pushed a commit to FeatureBaseDB/featurebase that referenced this issue Nov 17, 2022