Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/trace/v2: goroutine analysis page doesn't identify goroutines consistently #65574

Closed
nsrip-dd opened this issue Feb 7, 2024 · 7 comments
Closed
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@nsrip-dd
Copy link
Contributor

nsrip-dd commented Feb 7, 2024

Go version

go version go1.22rc2 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/nick.ripley/Library/Caches/go-build'
GOENV='/Users/nick.ripley/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/nick.ripley/go/pkg/mod'
GONOPROXY='redacted
GONOSUMDB='redacted'
GOOS='darwin'
GOPATH='/Users/nick.ripley/go'
GOPRIVATE='redacted'
GOPROXY='redacted'
GOROOT='/Users/nick.ripley/sdk/go1.22rc2'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/nick.ripley/sdk/go1.22rc2/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22rc2'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/f3/g91d13pd6kd3vdxts_gsgd1r0000gn/T/go-build1407906451=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Collected an execution trace with the new v2 tracer and viewed it with go tool trace. I then navigated to the "Goroutine analysis" page.

What did you see happen?

I didn't see the goroutines I was looking for on the analysis page, like main.main. Many of the names in the "Start location" column look like where a goroutine would be waiting, not where it started. Here's an example of what I see, looking an an execution trace from running unit tests:

Start location Count Total execution time
testing.tRunner 2450 10.246053307s
runtime.gopark 9 6.733193086s
runtime.gcMarkDone 1 806.957881ms
runtime.goparkunlock 1 301.65869ms
runtime.(*scavengerState).park 1 153.088717ms
runtime.(*traceAdvancerState).start.func1 1 114.573568ms
time.Sleep 1 7.663809ms
runtime.chanrecv1 1 5.670979ms
runtime.traceStartReadCPU.func1 1 592.705µs
runtime/trace.Start.func1 1 255.957µs
(Inactive, no stack trace sampled) 2 0s

Here, for example, the goroutines called runtime.gopark are actually the runtime.gcBgMarkWorker goroutines, and runtime.chanrecv1 is actually main.main waiting on a channel.

What did you expect to see?

The "start location" of a goroutine should be the name of the entry function. For example, the goroutine created by the statement go myFunction() should be called myFunction.

The immediate issue is that the current goroutine summary code uses the wrong end of the call stack to identify a goroutine here. There's a secondary issue, which is that the UI uses a PC as a key for grouping together goroutines, but the tracer no longer records the "starting" PC for goroutines which existed at the start of the trace. So grouping by PC is no longer consistent. We could remove the dependency on the starting PC from the goroutine pages and just use the name for grouping. I can send a CL if we're okay with that.

As a followup, should we include starting PCs in traces again? The old tracer recorded the starting PC of every goroutine at the beginning of the trace. The starting PC would mainly be helpful for identifying goroutines where we don't see any other call stacks. We could also consider recording the full call stack of a goroutine the first time we see it, e.g. where it's blocked at the start of the trace, instead of/in addition to the starting PC. Happy to open a separate issue or leave a comment somewhere else if this point is worth discussing more?

cc @mknyszek

@gopherbot
Copy link

Change https://go.dev/cl/562455 mentions this issue: cmd/trace/v2,internal/trace: use correct frame for identifying goroutines

@mknyszek
Copy link
Contributor

mknyszek commented Feb 7, 2024

The immediate issue is that the current goroutine summary code uses the wrong end of the call stack to identify a goroutine here. There's a secondary issue, which is that the UI uses a PC as a key for grouping together goroutines, but the tracer no longer records the "starting" PC for goroutines which existed at the start of the trace. So grouping by PC is no longer consistent. We could remove the dependency on the starting PC from the goroutine pages and just use the name for grouping. I can send a CL if we're okay with that.

'Doh. Thank you. Yes, these are both issues we should fix. I think using the name for grouping is fine and good.

As a followup, should we include starting PCs in traces again?

Yes, this was an oversight that was brought to my attention relatively recently. There might be another issue about it already, but a brief look around didn't reveal anything.

I think instead of taking a PC though, we should consider taking the full stack trace. The tracer takes ownership of the goroutine when it reads its status for the case where a goroutine is waiting or in a syscall for an entire generation.GoStatus events will just have an additional stack argument (usually 0, unless it falls into one of these cases). We would have to take and store the stack here: https://cs.opensource.google/go/go/+/master:src/runtime/trace2.go;l=336?q=trace2.go&ss=go%2Fgo:src%2Fruntime%2F. We then emit the stack later, if it turns out we're going to emit a GoStatus event for the goroutine.

We can do the same thing with just a PC, but it's not that much more work to just take the whole stack.

@mknyszek
Copy link
Contributor

mknyszek commented Feb 7, 2024

I think we should consider this for backport. I don't think the Go 1.22 cmd/trace tool should be broken in this way forever. :P The backport risk here is very low, because it's only dev tooling.

I do think we should maybe open up a separate issue for the "some goroutines are missing a stack" problem.

@mknyszek
Copy link
Contributor

mknyszek commented Feb 7, 2024

@gopherbot Please open backport issues for Go 1.21 and Go 1.22. This makes the cmd/trace UI harder to interpret with no easy workaround. The fix is very safe because it only touches debug tooling, specifically go tool trace, not any Go code someone would run in production.

EDIT: Go 1.21 backport not necessary. That was muscle memory on my part.

@gopherbot
Copy link

Backport issue(s) opened: #65576 (for 1.21), #65577 (for 1.22).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@dr2chase dr2chase added the NeedsFix The path to resolution is known, but the work has not been done. label Feb 8, 2024
@nsrip-dd
Copy link
Contributor Author

nsrip-dd commented Feb 9, 2024

I do think we should maybe open up a separate issue for the "some goroutines are missing a stack" problem.

Done :) #65634

@gopherbot
Copy link

Change https://go.dev/cl/562558 mentions this issue: [release-branch.go1.22] cmd/trace/v2,internal/trace: use correct frame for identifying goroutines

@dmitshur dmitshur added this to the Go1.23 milestone Feb 14, 2024
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
…ines

To determine the identity of a goroutine for displaying in the trace UI,
we should use the root frame from a call stack. This will be the
starting function for the goroutine and is the same for each call stack
from a given goroutine. The new tracer no longer includes starting PCs
for goroutines which existed at the start of tracing, so we can't use a
PC for grouping together goroutines any more. Instead, we just use the
name of the entry function for grouping.

Fixes golang#65574

Change-Id: I5324653316f1acf0ab90c30680f181060ea45dd7
Reviewed-on: https://go-review.googlesource.com/c/go/+/562455
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
gopherbot pushed a commit that referenced this issue Feb 27, 2024
…e for identifying goroutines

To determine the identity of a goroutine for displaying in the trace UI,
we should use the root frame from a call stack. This will be the
starting function for the goroutine and is the same for each call stack
from a given goroutine. The new tracer no longer includes starting PCs
for goroutines which existed at the start of tracing, so we can't use a
PC for grouping together goroutines any more. Instead, we just use the
name of the entry function for grouping.

For #65574
Fixes #65577

Change-Id: I5324653316f1acf0ab90c30680f181060ea45dd7
Reviewed-on: https://go-review.googlesource.com/c/go/+/562455
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
(cherry picked from commit a517131)
Reviewed-on: https://go-review.googlesource.com/c/go/+/562558
Reviewed-by: Michael Pratt <mpratt@google.com>
bradfitz pushed a commit to tailscale/go that referenced this issue Mar 5, 2024
…e for identifying goroutines

To determine the identity of a goroutine for displaying in the trace UI,
we should use the root frame from a call stack. This will be the
starting function for the goroutine and is the same for each call stack
from a given goroutine. The new tracer no longer includes starting PCs
for goroutines which existed at the start of tracing, so we can't use a
PC for grouping together goroutines any more. Instead, we just use the
name of the entry function for grouping.

For golang#65574
Fixes golang#65577

Change-Id: I5324653316f1acf0ab90c30680f181060ea45dd7
Reviewed-on: https://go-review.googlesource.com/c/go/+/562455
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
(cherry picked from commit a517131)
Reviewed-on: https://go-review.googlesource.com/c/go/+/562558
Reviewed-by: Michael Pratt <mpratt@google.com>
romaindoumenc pushed a commit to TroutSoftware/go that referenced this issue Mar 6, 2024
…e for identifying goroutines

To determine the identity of a goroutine for displaying in the trace UI,
we should use the root frame from a call stack. This will be the
starting function for the goroutine and is the same for each call stack
from a given goroutine. The new tracer no longer includes starting PCs
for goroutines which existed at the start of tracing, so we can't use a
PC for grouping together goroutines any more. Instead, we just use the
name of the entry function for grouping.

For golang#65574
Fixes golang#65577

Change-Id: I5324653316f1acf0ab90c30680f181060ea45dd7
Reviewed-on: https://go-review.googlesource.com/c/go/+/562455
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
(cherry picked from commit a517131)
Reviewed-on: https://go-review.googlesource.com/c/go/+/562558
Reviewed-by: Michael Pratt <mpratt@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants