Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/trace: port to v2 traces [freeze exception] #63960

Closed
21 of 22 tasks
felixge opened this issue Nov 6, 2023 · 23 comments
Closed
21 of 22 tasks

cmd/trace: port to v2 traces [freeze exception] #63960

felixge opened this issue Nov 6, 2023 · 23 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@felixge
Copy link
Contributor

felixge commented Nov 6, 2023

Tracking issue for updating cmd/trace for the v2 overhaul (#60773) in CL 538515.

Main Viewer:

Sub-Pages:

cc @mknyszek

@felixge felixge changed the title cmd/trace v2 cmd/trace v2: tracking issue Nov 6, 2023
@heschi heschi added the NeedsFix The path to resolution is known, but the work has not been done. label Nov 6, 2023
@heschi heschi added this to the Go1.22 milestone Nov 6, 2023
@heschi
Copy link
Contributor

heschi commented Nov 6, 2023

cc @golang/runtime

@heschi heschi added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 6, 2023
@gopherbot
Copy link

Change https://go.dev/cl/540256 mentions this issue: internal/trace: implement MutatorUtilizationV2

@gopherbot
Copy link

Change https://go.dev/cl/540259 mentions this issue: internal/trace: implement goroutine analysis for v2 traces

@mknyszek mknyszek changed the title cmd/trace v2: tracking issue cmd/trace: port to v2 traces [freeze exception] Nov 8, 2023
@mknyszek
Copy link
Contributor

mknyszek commented Nov 8, 2023

@golang/release

Hey release folks, I'd like to request a freeze exception for this work. It's a little early and there's a solid chance that we'll get this all done before the freeze rolls around, but I wanted to get this on your radar ASAP.

My reasoning for the freeze exception is as follows:

This is a blocker for enabling the new execution tracer (#60773) by default, because we shouldn't ship a runtime that produces traces that users have no tooling for.

However, this codebase is far from mission-critical. If it has bugs, it's not going to break any production services. The new execution tracer on the other hand is in good shape and will easily land before the freeze.

Simultaneously, because the new execution tracer work is largely targeting developers that build tools for other developers, asking downstream users to set a GOEXPERIMENT is a fairly high hurdle to overcome, and would effectively delay building on top of the new foundation from February to August.

Note that I'm also suggesting to enable the new execution tracer by default after the freeze window. While this seems risky, I designed the new tracer's tests to run even if the GOEXPERIMENT is disabled, so we'll still get exactly the same coverage from CI until cmd/trace is fully finished. I can also easily expand this testing to running several std packages' tests under the new tracer and validating the results if that would help ease concerns about test coverage.

I propose we also set a hard cutoff on this work at the first release candidate. If we can't get it done by the time the release candidate is being built, then we'll drop it for Go 1.22.

Note that we can still land all the in-progress changes before then (to avoid carrying an inventory of unsubmitted CLs) because cmd/trace selects its behavior based on the trace version. As a result, it'll still work exactly the same for old traces; cmd/trace will only be left in a partially-working state for new traces. This further reduces the risk and provides us with an easy off-ramp.

@heschi
Copy link
Contributor

heschi commented Nov 8, 2023

Approved if needed: we don't expect to get a lot of coverage of this feature before RC1 anyway, so as long as it lands before that it should be okay. However, please try to test with large-scale Google services as early as possible.

@mknyszek mknyszek self-assigned this Nov 8, 2023
gopherbot pushed a commit that referenced this issue Nov 10, 2023
This change adds a new MutatorUtilization for traces for Go 1.22+.

To facilitate testing, it also generates a short trace with the
gc-stress.go test program (shortening its duration to 10ms) and adds it
to the tests for the internal/trace/v2 package. Notably, we make sure
this trace has a GCMarkAssistActive event to test that codepath.

For #63960.
For #60773.

Change-Id: I2e61f545988677be716818e2a08641c54c4c201f
Reviewed-on: https://go-review.googlesource.com/c/go/+/540256
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue Nov 10, 2023
For #63960.

Change-Id: I1efe35435e32623aba894a915114e394570ebc56
Reviewed-on: https://go-review.googlesource.com/c/go/+/540259
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot
Copy link

Change https://go.dev/cl/541535 mentions this issue: internal/trace/v2: disable cgo callback test if cgo is not available

gopherbot pushed a commit that referenced this issue Nov 10, 2023
For #63960.

Change-Id: I3d8d1567c4ee213e2ffd2bd91d0ffae9c4c42b92
Reviewed-on: https://go-review.googlesource.com/c/go/+/541535
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
@felixge
Copy link
Contributor Author

felixge commented Nov 10, 2023

Syscall rendering now works. Instead of rendering them as instant events, they are now rendered as slices. For example:

CleanShot 2023-11-10 at 18 02 15@2x

For blocking syscalls, only the time they are executing on a P (before sysmon declares them "blocking") is displayed on the P lane. The Arg label "blocked" indicates that the syscall got blocked:

CleanShot 2023-11-10 at 18 00 53@2x

While typing this I realize that it might be nice to display the totalDuration of blocked syscalls. I added it as a nice-to-have item in the list above.

@gopherbot
Copy link

Change https://go.dev/cl/541257 mentions this issue: cmd/trace/v2: add goroutine analysis pages

@gopherbot
Copy link

Change https://go.dev/cl/541259 mentions this issue: internal/trace/traceviewer: make the mmu handler more self-contained

@gopherbot
Copy link

Change https://go.dev/cl/541258 mentions this issue: cmd/trace: common up the mmu page and add it to cmd/trace/v2

@gopherbot
Copy link

Change https://go.dev/cl/541999 mentions this issue: cmd/trace/v2: add support for pprof endpoints

@gopherbot
Copy link

Change https://go.dev/cl/541998 mentions this issue: cmd/trace: refactor pprof HTTP SVG serving into traceviewer

@gopherbot
Copy link

Change https://go.dev/cl/542076 mentions this issue: internal/trace: add task analysis for v2 traces

@gopherbot
Copy link

Change https://go.dev/cl/542077 mentions this issue: cmd/trace/v2: add support for task and region endpoints

@gopherbot
Copy link

Change https://go.dev/cl/542001 mentions this issue: cmd/trace: factor out durationHistogram

@gopherbot
Copy link

Change https://go.dev/cl/542218 mentions this issue: cmd/trace: add basic support for v2 traces

@gopherbot
Copy link

Change https://go.dev/cl/543019 mentions this issue: internal/trace/v2: redefine NoTask and add BackgroundTask

@gopherbot
Copy link

Change https://go.dev/cl/543595 mentions this issue: cmd/trace/v2: add support for goroutine filtering

@gopherbot
Copy link

Change https://go.dev/cl/543596 mentions this issue: cmd/trace/v2: add support for a task-oriented procs-based view

@gopherbot
Copy link

Change https://go.dev/cl/543695 mentions this issue: cmd/trace/v2: add thread-oriented mode for v2 traces

@gopherbot
Copy link

Change https://go.dev/cl/543937 mentions this issue: cmd/trace/v2: add support for the goroutine-oriented task view

@gopherbot
Copy link

Change https://go.dev/cl/543995 mentions this issue: cmd/trace/v2: emit regions in the goroutine-oriented task view

gopherbot pushed a commit that referenced this issue Nov 21, 2023
This change refactors the cmd/trace package and adds most of the support
for v2 traces.

The following features of note are missing in this CL and will be
implemented in follow-up CLs:
- The focustask filter for the trace viewer
- The taskid filter for the trace viewer
- The goid filter for the trace viewer
- Pprof profiles
- The MMU graph
- The goroutine analysis pages
- The task analysis pages
- The region analysis pages

This CL makes one notable change to the trace CLI: it makes the -d flag
accept an integer to set the debug mode. For old traces -d != 0 works
just like -d. For new traces -d=1 means the high-level events and -d=2
means the low-level events.

Thanks to Felix Geisendörfer (felix.geisendoerfer@datadoghq.com) for
doing a lot of work on this CL; I picked this up from him and got a
massive headstart as a result.

For #60773.
For #63960.

Change-Id: I3626e22473227c5980134a85f1bb6a845f567b1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/542218
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
This is a complete fork and most of a rewrite of the goroutine analysis
pages for v2 traces. It fixes an issue with the old page where GC time
didn't really make any sense, generalizes the page and breaks things
down further, and adds clarifying text.

This change also modifies the SummarizeGoroutines API to not stream the
trace. This is unfortunate, but we're already reading and holding the
entire trace in memory for the trace viewer. We can revisit this
decision in the future. Also, we want to do this now because the
GoroutineSummary holds on to pointers to events, and these events will
be used by the user region and user task analyses. While tracev2 events
are values and they should be equivalent no matter how many times we
parse a trace, this lets us reference the event in the slice directly.

For #60773.
For #63960.
Fixes #62443.

Change-Id: I1c5ab68141869378843f4f2826686038e4533090
Reviewed-on: https://go-review.googlesource.com/c/go/+/541257
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
This change moves the MMU HTTP handlers and functionality into the
traceviewer package, since unlike the goroutine pages the vast majority
of that functionality is identical between v1 and v2. This change
involves some refactoring so that callers can plug in their own mutator
utilization computation functions (which is the only point of difference
between v1 and v2). The new interface isn't especially nice, but part of
the problem is the MMU handlers depend on specific endpoints to exist. A
follow-up CL will clean this up a bit.

Like the previous CL did for goroutine analysis, modify the v2 mutator
utilization API to accept a slice of trace events. Again, we might as
well reuse what was already parsed and will be needed for other
purposes. It also simplifies the API slightly.

For #60773.
For #63960.

Change-Id: I6c21ec8d1bf7e95eff5363d0e0005c9217fa00e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/541258
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
The last change made the MMU rendering code common and introduced a new
API, but it was kind of messy. Part of the problem was that some of the
Javascript in the template for the main page referred to specific
endpoints on the server.

Fix this by having the Javascript access the same endpoint but with a
different query variable. Now the Javascript code doesn't depend on
specific endpoints, just on query variables for the current endpoint.

For #60773.
For #63960.

Change-Id: I1c559d9859c3a0d62e2094c9d4ab117890b63b31
Reviewed-on: https://go-review.googlesource.com/c/go/+/541259
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
For #60773.
For #63960.

Change-Id: Id97380f19267ec765b25a703ea3e2f284396ad75
Reviewed-on: https://go-review.googlesource.com/c/go/+/541998
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
This change adds support for the pprof endpoints to cmd/trace/v2.

In the process, I realized we need to pass the goroutine summaries to
more places, and previous CLs had already done the goroutine analysis
during cmd/trace startup. This change thus refactors the goroutine
analysis API once again to operate in a streaming manner, and to run
at the same time as the initial trace parsing. Now we can include it in
the parsedTrace type and pass that around as the de-facto global trace
context.

Note: for simplicity, this change redefines "syscall" profiles to
capture *all* syscalls, not just syscalls that block. IIUC, this choice
was partly the result of a limitation in the previous trace format that
syscalls don't all have complete durations and many short syscalls are
treated as instant. To this end, this change modifies the text on the
main trace webpage to reflect this change.

For #60773.
For #63960.

Change-Id: I601d9250ab0849a0bfaef233fd9b1e81aca9a22a
Reviewed-on: https://go-review.googlesource.com/c/go/+/541999
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
The v2 trace parser currently handles task inheritance and region task
association incorrectly. It assumes that a TaskID of 0 means that there
is no task. However, this is only true for task events. A TaskID of 0
means that a region gets assigned to the "background task." The parser
currently has no concept of a "background task."

Fix this by defining the background task as task ID 0 and redefining
NoTask to ^uint64(0). This aligns the TaskID values more closely with
other IDs in the parser and also enables disambiguating these two cases.

For #60773.
For #63960.

Change-Id: I09c8217b33b87c8f8f8ea3b0203ed83fd3b61e11
Reviewed-on: https://go-review.googlesource.com/c/go/+/543019
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
For v1 traces, cmd/trace contains code for analyzing tasks separately
from the goroutine analysis code present in internal/trace. As I started
to look into porting that code to v2 traces, I noticed that it wouldn't
be too hard to just generalize the existing v2 goroutine summary code to
generate exactly the same information.

This change does exactly that.

For #60773.
For #63960.

Change-Id: I0cdd9bf9ba11fb292a9ffc37dbf18c2a6a2483b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/542076
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
This code will be useful for the new tracer, and there's no need to
duplicate it. This change copies it to internal/trace/traceviewer, adds
some comments, and renames it to TimeHistogram.

While we're here, let's get rid of the unused String method which has a
comment talking about how awful the rendering is.

Also, let's get rid of uses of niceDuration. We'd have to bring it
with us in the move and I don't think it's worth it. The difference
between the default time.Duration rendering and the niceDuration
rendering is usually a few extra digits of precision. Yes, it's noisier,
but AFAICT it's not substantially worse. It doesn't seem worth the new
API, even if it's just internal. We can also always bring it back later.

For #60773.
For #63960.

Change-Id: I795f58f579f1d503c540c3a40bed12e52bce38e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/542001
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
This change fills out the last of cmd/trace's subpages for v2 traces by
adding support for task and region endpoints.

For #60773.
For #63960.

Change-Id: Ifc4c660514b3904788785a1b20e3abc3bb9e55f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/542077
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
This change adds support for the trace?goid=<goid> endpoint to the trace
tool for v2 traces.

In effect, this change actually implements a per-goroutine view. I tried
to add a link to the main page to enable a "view by goroutines" view
without filtering, but the web trace viewer broke the browser tab when
there were a few hundred goroutines. The risk of a browser hang probably
isn't worth the cases where this is nice, especially since filtering by
goroutine already works. Unfortunate, but c'est l'vie. Might be worth
revisiting if we change out the web viewer in the future.

For #60773.
For #63960.

Change-Id: I8e29f4ab8346af6708fd8824505c30f2c43db796
Reviewed-on: https://go-review.googlesource.com/c/go/+/543595
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
This change implements support for the trace?focustask=<taskid> endpoint
in the trace tool for v2 traces.

Note: the one missing feature in v2 vs. v1 is that the "irrelevant" (but
still rendered) events are not grayed out. This basically includes
events that overlapped with events that overlapped with other events
that were in the task time period, but aren't themselves directly
associated. This is probably fine -- the UI already puts a very obvious
focus on the period of time the selected task was running.

For #60773.
For #63960.

Change-Id: I5c78a220ae816e331b74cb67c01c5cd98be40dd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/543596
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
This is a nice-to-have that's now straightforward to do with the new
trace format. This change adds a new query variable passed to the
/trace endpoint called "view," which indicates the type of view to
use. It is orthogonal with task-related views.

Unfortunately a goroutine-based view isn't included because it's too
likely to cause the browser tab to crash.

For #60773.
For #63960.

Change-Id: Ifbcb8f2d58ffd425819bdb09c586819cb786478d
Reviewed-on: https://go-review.googlesource.com/c/go/+/543695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Nov 21, 2023
This change adds support for a goroutine-oriented task view via the
/trace?taskid=<taskid> endpoint. This works but it's missing regions.
That will be implemented in a follow-up CL.

For #60773.
For #63960.

Change-Id: I086694143e5c71596ac22fc551416868f0b80923
Reviewed-on: https://go-review.googlesource.com/c/go/+/543937
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/546026 mentions this issue: doc: add release notes for the new execution tracer

gopherbot pushed a commit that referenced this issue Dec 4, 2023
For #60773.
For #62627.
For #63960.

For #61422.

Change-Id: I3c933f7522f65cd36d11d38a268556d92c8053f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/546026
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
For golang#60773.
For golang#62627.
For golang#63960.

For golang#61422.

Change-Id: I3c933f7522f65cd36d11d38a268556d92c8053f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/546026
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants