Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/trace: give more context to "Scheduler wait" table column on /goroutine page #63430

Open
rhysh opened this issue Oct 6, 2023 · 0 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@rhysh
Copy link
Contributor

rhysh commented Oct 6, 2023

What version of Go are you using (go version)?

$ go1.21 version
go version go1.21.2 darwin/arm64

Does this issue reproduce with the latest release?

Yes, this is present in the Go 1.21 series and in the development branch.

What operating system and processor architecture are you using (go env)?

go env Output
$ go1.21 env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/rhys/Library/Caches/go-build'
GOENV='/Users/rhys/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/rhys/go/pkg/mod'
GONOPROXY='*'
GONOSUMDB='*'
GOOS='darwin'
GOPATH='/Users/rhys/go'
GOPRIVATE='*'
GOPROXY='direct'
GOROOT='/Users/rhys/go/version/go1.21'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/Users/rhys/go/version/go1.21/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.21.2'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/lf/n4ry0qv5639_3z0nhyhy3z7c0000gs/T/go-build360557930=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

I ran the reproducer for #63422 and collected an execution trace. I used go tool trace to open a browser view, clicked the "Goroutine analysis" link, and clicked on one of the goroutine types (testing.tRunner). I looked at the "Scheduler wait" column, and saw that one of the goroutines had spend a total of 84ms waiting to run.

$ go1.21 test ./h2_unplug_test.go -test.run=/http1 -request-timeout=10ms -request-count=30000 -unplug-interval=10000 -test.trace=/tmp/t
--- FAIL: TestNetsplit (2.33s)
    --- FAIL: TestNetsplit/http1 (2.33s)
        h2_unplug_test.go:155: total=30000 unplugs=3 failures=3
FAIL
FAIL    command-line-arguments  2.497s
FAIL

$ go1.21 tool trace /tmp/t
2023/10/06 14:48:04 Parsing trace...
2023/10/06 14:48:05 Splitting trace...
2023/10/06 14:48:07 Opening browser. Trace viewer is listening on http://127.0.0.1:54953

What did you expect to see?

I expected that figure to come with the relevant context: That goroutine played a role in executing 30,000 HTTP requests in sequence. It was scheduled a total of 60,035 times. Its average wait time was 1.4µs, and (if I've calculated correctly) its p99 wait time was 6.3µs, the p100 / worst case time was 532µs, and the second-worst was 82µs.

$ go1.21 tool trace -d /tmp/t | awk '{if ($2=="GoStart" && $4=="g=8") {n+=1}} END {print n}'
2023/10/06 14:56:01 Parsing trace...
60035

I think that table should have an additional column showing the number of times the goroutine has been scheduled (the count of GoStart events), "60035" in this case.

Following that, I also think that table should give some hints about the tail behavior of the goroutine's scheduler interactions. Maybe that's the p100 / worst case scheduling latency.

What did you see instead?

The table shows "84ms" of "Scheduler wait", without any of that additional context. That makes it look like the scheduler has problems, when a much closer look reveals that this is an especially chatty goroutine ... one that uses the scheduler perhaps a little too often, but which has a good experience nearly every time.

Screenshot 2023-10-06 at 2 48 21 PM

CC @golang/runtime , and @mknyszek in particular

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 9, 2023
@dmitshur dmitshur added this to the Backlog milestone Oct 9, 2023
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

3 participants