Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/trace: TestAnalyzeAnnotationTaskTree failures since 2022-07-08 #53784

Closed
bcmills opened this issue Jul 11, 2022 · 7 comments
Closed

cmd/trace: TestAnalyzeAnnotationTaskTree failures since 2022-07-08 #53784

bcmills opened this issue Jul 11, 2022 · 7 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Jul 11, 2022

greplogs -l -e 'FAIL: TestAnalyzeAnnotationTaskTree'
2022-07-11T08:23:57-59ab6f3/linux-amd64-nocgo
2022-07-08T18:51:33-5c1a13e/linux-loong64-3a5000
2022-07-08T17:00:50-180bcad/freebsd-amd64-race
2022-07-08T12:57:49-14abe8a/darwin-amd64-race
2022-07-08T12:57:49-14abe8a/openbsd-amd64-70

The first of these failures in the logs is at CL 415634, but the failure rate for this test is not 100% — it could have been introduced in a somewhat earlier change. However, given the number of failures in the log this looks like a very recent regression.

(attn @golang/runtime)

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 11, 2022
@bcmills bcmills added this to the Go1.19 milestone Jul 11, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 11, 2022
@bcmills
Copy link
Contributor Author

bcmills commented Jul 11, 2022

See previously #24081.

@bcmills
Copy link
Contributor Author

bcmills commented Jul 11, 2022

Marking as release-blocker for Go 1.19 because this appears to have been introduced sometime last week.

@prattmic
Copy link
Member

I was able to reproduce this on my linux-amd64 workstation:

$ go test cmd/trace -run TestAnalyzeAnnotationTaskTree -count 100
--- FAIL: TestAnalyzeAnnotationTaskTree (0.00s)
    annotations_test.go:228: got task 4:        task1
                start: 25956 end: 639582 complete: true
                13 goroutines
                13 regions:
                        task1.region(goid=21)
                        (goid=23)
                        (goid=24)
                        (goid=11)
                        (goid=25)
                        (goid=26)
                        (goid=27)
                        (goid=28)
                        (goid=29)
                        (goid=30)
                        (goid=31)
                        (goid=32)
                        (goid=33)
                1 children:
                        task2
        ; want {parent: children:[task2] regions:[task1.region]}
FAIL
FAIL    cmd/trace       0.146s
FAIL

Though I seem to have gotten lucky? I can't reproduce again even with -count 10000.

@prattmic prattmic self-assigned this Jul 11, 2022
@prattmic
Copy link
Member

prattmic commented Jul 11, 2022

The immediate problem is that the task has too many regions. It should only include task1.region, but also contains a bunch of other goroutines.

These are almost certainly GC worker goroutines, which were started while this task was active. I believe we fixed this before for pprof labels, we may need a similar fix for tasks/regions.

It looks like region inheritance is actually done in internal/trace when parsing the trace: https://cs.opensource.google/go/go/+/master:src/internal/trace/goroutines.go;drc=c1a4e0fe014568501b194eb8b04309f54eee6b4c;bpv=1;bpt=1;l=169

Timing-wise, it seems likely that something changed that caused the start time of the first GC to change slightly. Most likely CL 416417. (cc @mknyszek, though I don't think there is anything wrong with that CL)

Edit: I got a stack trace, these goroutines are definitely GC workers.

@prattmic
Copy link
Member

This can be reproduced pretty easily with:

package main

import (
        "context"
        "runtime"
        "runtime/trace"
        "os"
)

var sink int

func child(ctx context.Context) {
        runtime.GC()
        for i := 0; i < 100000; i++ {
                sink = i
        }
}

func main() {
        f, err := os.Create("/tmp/trace.out")
        if err != nil {
                panic(err)
        }
        defer f.Close()

        if err := trace.Start(f); err != nil {
                panic(err)
        }
        defer trace.Stop()

        ctx := context.Background()
        ctx, task := trace.NewTask(ctx, "thisisatask")
        defer task.End()

        trace.WithRegion(ctx, "thisisaregion", func() {
                go child(ctx)
                for i := 0; i < 5*100000; i++ {
                        sink = i
                }
        })
}

On go tool trace /usertask?type=thisisatask, we see:


When	Elapsed	Goroutine ID	Events
0.000079565s	332.722µs		Task 1 (goroutine view) (complete)
0.000079565	 .         	1	task thisisatask (id 1, parent 0) created
0.000089574	 .    10009	1	region thisisaregion started (duration: 319.7µs)
0.000098214	 .     8640	1	new goroutine 7: main.child
0.000412287	 .   314073	1	task end
0.000617513	 .   205226	7	new goroutine 17: runtime.gcBgMarkWorker
0.000653357	 .    35844	7	new goroutine 8: runtime.gcBgMarkWorker
0.000700023	 .    46666	7	new goroutine 33: runtime.gcBgMarkWorker
0.000733251	 .    33228	7	new goroutine 18: runtime.gcBgMarkWorker
0.000765459	 .    32208	7	new goroutine 9: runtime.gcBgMarkWorker
0.000797095	 .    31636	7	new goroutine 19: runtime.gcBgMarkWorker
0.000824072	 .    26977	7	new goroutine 34: runtime.gcBgMarkWorker
0.000864920	 .    40848	7	new goroutine 35: runtime.gcBgMarkWorker
0.000890686	 .    25766	7	new goroutine 36: runtime.gcBgMarkWorker
0.000920815	 .    30129	7	new goroutine 20: runtime.gcBgMarkWorker
0.000957888	 .    37073	7	new goroutine 10: runtime.gcBgMarkWorker
0.000978858	 .    20970	7	new goroutine 37: runtime.gcBgMarkWorker
GC:181.036µs

Interestingly, the "goroutine view" visualization already filters out the GC goroutines.

I think we should just exclude system goroutines from the task inheritance mentioned above.

Other goroutines created in a region are interesting to see even if they don't perform task operations. GC goroutines are not. (It may be interesting to see that a GC was triggered by this task, but these gcBgMarkWorker events only appear on the first GC, which makes this a poor "GC start" indicator")

@gopherbot
Copy link

Change https://go.dev/cl/416858 mentions this issue: internal/trace: don't report regions on system goroutines

@gopherbot
Copy link

Change https://go.dev/cl/416860 mentions this issue: cmd/trace: temporarily strip system goroutines for tests

jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
If a goroutine is started within a user region, internal/trace assigns
the child goroutine a nameless region for its entire lifetime which is
assosciated the same task as the parent's region.

This is not strictly necessary: a child goroutine is not necessarily
related to the task unless it performs some task operation (in which
case it will be associated with the task through the standard means).

However, it can be quite handy to see child goroutines within a region,
which may be child worker goroutines that you simply didn't perform task
operations on.

If the first GC occurs during a region, the GC worker goroutines will
also inherit a child region. We know for sure that these aren't related
to the task, so filter them out from the region list.

Note that we can't exclude system goroutines from setting activeRegions
in EvGoCreate handling, because we don't know the goroutine start
function name until the first EvGoStart.

Fixes golang#53784.

Change-Id: Ic83d84e23858a8400a76d1ae2f1418ef49951178
Reviewed-on: https://go-review.googlesource.com/c/go/+/416858
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@golang golang locked and limited conversation to collaborators Jul 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

3 participants