New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/trace: TestAnalyzeAnnotationTaskTree failures since 2022-07-08 #53784
Comments
See previously #24081. |
Marking as release-blocker for Go 1.19 because this appears to have been introduced sometime last week. |
I was able to reproduce this on my linux-amd64 workstation:
Though I seem to have gotten lucky? I can't reproduce again even with |
The immediate problem is that the task has too many regions. It should only include task1.region, but also contains a bunch of other goroutines. These are almost certainly GC worker goroutines, which were started while this task was active. I believe we fixed this before for pprof labels, we may need a similar fix for tasks/regions. It looks like region inheritance is actually done in Timing-wise, it seems likely that something changed that caused the start time of the first GC to change slightly. Most likely CL 416417. (cc @mknyszek, though I don't think there is anything wrong with that CL) Edit: I got a stack trace, these goroutines are definitely GC workers. |
This can be reproduced pretty easily with: package main
import (
"context"
"runtime"
"runtime/trace"
"os"
)
var sink int
func child(ctx context.Context) {
runtime.GC()
for i := 0; i < 100000; i++ {
sink = i
}
}
func main() {
f, err := os.Create("/tmp/trace.out")
if err != nil {
panic(err)
}
defer f.Close()
if err := trace.Start(f); err != nil {
panic(err)
}
defer trace.Stop()
ctx := context.Background()
ctx, task := trace.NewTask(ctx, "thisisatask")
defer task.End()
trace.WithRegion(ctx, "thisisaregion", func() {
go child(ctx)
for i := 0; i < 5*100000; i++ {
sink = i
}
})
} On
Interestingly, the "goroutine view" visualization already filters out the GC goroutines. I think we should just exclude system goroutines from the task inheritance mentioned above. Other goroutines created in a region are interesting to see even if they don't perform task operations. GC goroutines are not. (It may be interesting to see that a GC was triggered by this task, but these gcBgMarkWorker events only appear on the first GC, which makes this a poor "GC start" indicator") |
Change https://go.dev/cl/416858 mentions this issue: |
Change https://go.dev/cl/416860 mentions this issue: |
If a goroutine is started within a user region, internal/trace assigns the child goroutine a nameless region for its entire lifetime which is assosciated the same task as the parent's region. This is not strictly necessary: a child goroutine is not necessarily related to the task unless it performs some task operation (in which case it will be associated with the task through the standard means). However, it can be quite handy to see child goroutines within a region, which may be child worker goroutines that you simply didn't perform task operations on. If the first GC occurs during a region, the GC worker goroutines will also inherit a child region. We know for sure that these aren't related to the task, so filter them out from the region list. Note that we can't exclude system goroutines from setting activeRegions in EvGoCreate handling, because we don't know the goroutine start function name until the first EvGoStart. Fixes golang#53784. Change-Id: Ic83d84e23858a8400a76d1ae2f1418ef49951178 Reviewed-on: https://go-review.googlesource.com/c/go/+/416858 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
greplogs -l -e 'FAIL: TestAnalyzeAnnotationTaskTree'
2022-07-11T08:23:57-59ab6f3/linux-amd64-nocgo
2022-07-08T18:51:33-5c1a13e/linux-loong64-3a5000
2022-07-08T17:00:50-180bcad/freebsd-amd64-race
2022-07-08T12:57:49-14abe8a/darwin-amd64-race
2022-07-08T12:57:49-14abe8a/openbsd-amd64-70
The first of these failures in the logs is at CL 415634, but the failure rate for this test is not 100% — it could have been introduced in a somewhat earlier change. However, given the number of failures in the log this looks like a very recent regression.
(attn @golang/runtime)
The text was updated successfully, but these errors were encountered: