
runtime/trace: missing cpu samples #58877

Closed
felixge opened this issue Mar 5, 2023 · 7 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

felixge commented Mar 5, 2023

The tracer seems to be dropping CPU samples in a way that might not be explained by #52674. At least I'm surprised by the pattern. cc @rhysh

What version of Go are you using (go version)?

go version go1.20.1 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/ubuntu/.cache/go-build"
GOENV="/home/ubuntu/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/ubuntu/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/ubuntu/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/ubuntu/sdk/go1.20.1"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/ubuntu/sdk/go1.20.1/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.1"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2440112406=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Run the benchmarks in encoding/json while capturing a trace with CPU samples.

go test -run '^$' -bench . -cpuprofile trace.pprof -trace json.trace encoding/json

You can download the files above here: Google Drive

What did you expect to see?

The trace should contain lots of CPU samples throughout, especially while all the Ps are executing goroutines.

What did you see instead?

There are several seconds during which all Ps are executing goroutines, but no CPU samples are recorded.

[screenshot: CleanShot 2023-03-05 at 10 46 51@2x]

I noticed this while implementing traceutils flamescope which also clearly shows this pattern:

[screenshot: CleanShot 2023-03-05 at 10 41 27@2x]

As far as I can tell the trace should contain 126,873 samples, but it only contains 105,756 samples.

$ go tool pprof -top json.pprof | grep Total
Duration: 76.34s, Total samples = 1268.73s (1662.02%)

$ traceutils breakdown count json.trace | grep CPU
| EventCPUSample         |   105756 | 0.30%   |
gopherbot added the compiler/runtime label Mar 5, 2023
rhysh commented Mar 5, 2023

Thanks @felixge. What part of the behavior you see is inconsistent with #52674?

Are you able to reproduce the problem with a smaller output file?

Is the pattern consistent across different runs?

Have you looked at the delta between the CPU profile and the execution trace's CPUSample events to see which kinds of samples are missing from the trace? I don't think this one will be fruitful; between 700 and 800 ms, the benchmark is running BenchmarkCodeEncoder so I'd expect the content of the samples to be the same, which is evidence against the samples' content somehow causing them to be dropped.

I see no CPUSample events between 0.878 seconds and 7.773 seconds, so it's even a bit worse than your screenshot shows. The additional-row spans on Proc 0 near 5.5 seconds are MARK ASSIST, SWEEP, and STW.

The execution trace includes 2094 CPUSample events leading up to the gap at 0.878–7.773 seconds. Spot-checking several of those, they seem to have very deep call stacks, often at or near the 64-frame limit. It seems likely that this is filling the profBuf that we use for the execution tracer's CPU profile samples, as described in #52674 (a bit more than 1900 samples).

$ go tool trace -d ~/Downloads/json.trace | grep CPUSample > /tmp/cpuse

$ cat /tmp/cpuse | awk '{print int($1/100e6),$1/1e6}' | grep -C 4 '^8 ' | tail
7 717.821
7 717.823
7 717.824
7 721.825
8 877.786
77 7773.8
77 7773.8
77 7773.8
77 7773.81

$ cat /tmp/cpuse | awk '{if ($1<1e9) {print $0}}' | wc -l 
    2094
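
The 0.878–7.773s gap can also be located mechanically from the -d dump. A minimal Go sketch (findGaps is a made-up helper; the timestamps are copied, in milliseconds, from the awk output above):

```go
package main

import "fmt"

// findGaps returns [start, end] pairs (same unit as ts) for every gap
// between consecutive timestamps that is longer than minGap.
func findGaps(ts []float64, minGap float64) [][2]float64 {
	var gaps [][2]float64
	for i := 1; i < len(ts); i++ {
		if ts[i]-ts[i-1] > minGap {
			gaps = append(gaps, [2]float64{ts[i-1], ts[i]})
		}
	}
	return gaps
}

func main() {
	// CPUSample timestamps (ms) around the gap, from the awk output.
	ts := []float64{717.821, 717.823, 717.824, 721.825, 877.786, 7773.8, 7773.8, 7773.81}
	fmt.Println(findGaps(ts, 1000)) // [[877.786 7773.8]]
}
```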

felixge commented Mar 5, 2023

> Thanks @felixge. What part of the behavior you see is inconsistent with #52674?

My understanding of #52674 is that samples can be dropped if trace buffers aren't flushed often enough. I could imagine that happening here, but I wouldn't expect the problem to last for several seconds given that the workload causes non-stop trace activity. Then again, maybe I'm overestimating how many trace buffer flushes that activity triggers.

Other than that I have no strong reason to believe this is not caused by #52674 . If that's the root cause, then this issue could be seen as a "real world" example of the problem.
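
For intuition, the failure mode described in #52674 boils down to a fixed-capacity buffer whose writer drops samples when the reader hasn't drained it in time. This is a simplified toy, not the runtime's actual profBuf implementation:

```go
package main

import "fmt"

// sampleBuf is a toy fixed-capacity sample buffer: writes beyond cap
// are dropped and only counted, mimicking an overflow record.
type sampleBuf struct {
	samples  []string
	cap      int
	overflow int // samples dropped since the last read
}

func (b *sampleBuf) write(s string) {
	if len(b.samples) >= b.cap {
		b.overflow++ // buffer full: the sample is lost
		return
	}
	b.samples = append(b.samples, s)
}

// read drains the buffer, returning what was kept and how many were dropped.
func (b *sampleBuf) read() (kept []string, dropped int) {
	kept, dropped = b.samples, b.overflow
	b.samples, b.overflow = nil, 0
	return
}

func main() {
	b := &sampleBuf{cap: 2}
	for _, s := range []string{"a", "b", "c", "d"} {
		b.write(s) // reader never runs in between, so "c" and "d" are dropped
	}
	kept, dropped := b.read()
	fmt.Println(len(kept), dropped) // 2 2
}
```

If the tracer only drains the real buffer at infrequent flush points, a sustained burst of deep-stack samples could plausibly keep it full for a long stretch, matching the multi-second gap seen here.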

> Are you able to reproduce the problem with a smaller output file?

No, but I haven't tried very hard.

> Is the pattern consistent across different runs?

Yeah, it seems pretty consistent between runs for me.

> Have you looked at the delta between the CPU profile and the execution trace's CPUSample events to see which kinds of samples are missing from the trace?

I haven't. Is there a way to extract a CPU pprof from a trace?

> I don't think this one will be fruitful; between 700 and 800 ms, the benchmark is running BenchmarkCodeEncoder so I'd expect the content of the samples to be the same, which is evidence against the samples' content somehow causing them to be dropped.

Makes sense.

> I see no CPUSample events between 0.878 seconds and 7.773 seconds, so it's even a bit worse than your screenshot shows. The additional-row spans on Proc 0 near 5.5 seconds are MARK ASSIST, SWEEP, and STW.

Ack.

> The execution trace includes 2094 CPUSample events leading up to the gap at 0.878–7.773 seconds. Spot-checking several of those, they seem to have very deep call stacks, often at or near the 64-frame limit.

Yeah, some benchmarks are definitely creating very deep stacks. But we'd expect those to be truncated, not dropped, if they exceed the 64 frame limit, right?

> It seems likely that this is filling the profBuf that we use for the execution tracer's CPU profile samples, as described in #52674 (a bit more than 1900 samples).

Ok, maybe that's what's happening after all.

> $ go tool trace -d ~/Downloads/json.trace | grep CPUSample > /tmp/cpuse

TIL about -d 🙈 - thanks 🙇.

> $ cat /tmp/cpuse | awk '{print int($1/100e6),$1/1e6}' | grep -C 4 '^8 ' | tail
> 7 717.821
> 7 717.823
> 7 717.824
> 7 721.825
> 8 877.786
> 77 7773.8
> 77 7773.8
> 77 7773.8
> 77 7773.81
>
> $ cat /tmp/cpuse | awk '{if ($1<1e9) {print $0}}' | wc -l
>     2094

Nice analysis. So based on this, you think this is probably caused by #52674 - right?

danp commented Mar 5, 2023

Out of curiosity I added a println("empty") here:

// Looks like an overflow record from the profBuf. Not much to

And I seem to get one or more consistently during BenchmarkMarshalBytes. Examples:

BenchmarkMarshalBytes/32
empty
BenchmarkMarshalBytes/32-8       	13310040	        91.08 ns/op

BenchmarkMarshalBytes/32
empty
BenchmarkMarshalBytes/32-8       	13002902	        88.95 ns/op

BenchmarkMarshalBytes/32
empty
empty
empty
BenchmarkMarshalBytes/32-8       	13341460	        86.97 ns/op

That's pretty early on in the run for me. I tried running the whole set of benchmarks once or twice but didn't see it pop up for any others.

In my last full-set run I got two empties:

BenchmarkMarshalBytes/32
empty
empty
BenchmarkMarshalBytes/32-8       	13634491	        86.77 ns/op

And looking at the flamescope I see:

[screenshot: Screenshot 2023-03-05 at 4 56 53 PM]

There's an empty stretch a few seconds in. This is confirmed by some trace-to-SQLite bits I'm playing with (here, which also includes the query I used to generate the flamescope data; inspired by you, @felixge, thanks!):

sqlite> select round(ts/1e9, 1), count(*) from events where type='CPUSample' and ts/1e9 between 3 and 5 group by 1 order by 1;
┌──────────────────┬──────────┐
│ round(ts/1e9, 1) │ count(*) │
├──────────────────┼──────────┤
│ 3.0              │ 31       │
│ 3.1              │ 61       │
│ 3.2              │ 59       │
│ 3.3              │ 62       │
│ 3.4              │ 61       │
│ 3.5              │ 57       │
│ 3.6              │ 53       │
│ 3.7              │ 1        │
│ 4.8              │ 5        │
│ 4.9              │ 9        │
│ 5.0              │ 5        │
└──────────────────┴──────────┘
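
The same per-0.1s binning can be sketched in Go without SQLite (the timestamps here are made up for illustration, not taken from the trace):

```go
package main

import "fmt"

func main() {
	// Bucket CPUSample timestamps (in seconds) into 0.1s bins,
	// mirroring the GROUP BY in the SQL query above.
	ts := []float64{3.01, 3.05, 3.12, 3.15, 3.18, 3.71}
	counts := map[int]int{}
	for _, t := range ts {
		counts[int(t*10)]++ // bin index = tenths of a second
	}
	for _, bin := range []int{30, 31, 37} {
		fmt.Printf("%.1f %d\n", float64(bin)/10, counts[bin])
	}
}
```

A sudden drop in the per-bin count, like the fall from ~60 samples per bin to 1 at t=3.7 above, is exactly the signature of the gap.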

cagedmantis added this to the Backlog milestone Mar 6, 2023
cagedmantis added the NeedsInvestigation label Mar 6, 2023
@cagedmantis

cc @golang/runtime @mknyszek @prattmic @dvyukov

@mknyszek

Should this be deduplicated with #52674?


felixge commented Mar 15, 2023

> Should this be deduplicated with #52674?

Yes. Based on the discussion above, I think we can close this one as a dupe and maybe update the description of #52674 to link to this issue for a concrete example?

@mknyszek

SGTM! Closing as duplicate of #52674.

mknyszek closed this as not planned (won't fix, can't repro, duplicate, stale) Mar 15, 2023
golang locked and limited conversation to collaborators Mar 14, 2024