
runtime/trace: missing cpu samples #58877

Closed
felixge opened this issue Mar 5, 2023 · 7 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

felixge commented Mar 5, 2023

The tracer seems to be dropping CPU samples in a way that might not be explained by #52674. At least I'm surprised by the pattern. cc @rhysh

What version of Go are you using (go version)?

go version go1.20.1 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/ubuntu/.cache/go-build"
GOENV="/home/ubuntu/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/ubuntu/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/ubuntu/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/ubuntu/sdk/go1.20.1"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/ubuntu/sdk/go1.20.1/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.1"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2440112406=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Run the benchmarks in encoding/json while capturing a trace with CPU samples.

go test -run '^$' -bench . -cpuprofile trace.pprof -trace json.trace encoding/json

You can download the files above here: Google Drive

What did you expect to see?

The trace should contain lots of CPU samples throughout, especially while all the Ps are executing goroutines.

What did you see instead?

There are several seconds during which all Ps are executing goroutines, but no CPU samples are recorded.

[screenshot: CleanShot 2023-03-05 at 10 46 51@2x]

I noticed this while implementing traceutils flamescope which also clearly shows this pattern:

[screenshot: CleanShot 2023-03-05 at 10 41 27@2x]

As far as I can tell the trace should contain 126,873 samples, but it only contains 105,756 samples.

$ go tool pprof -top json.pprof | grep Total
Duration: 76.34s, Total samples = 1268.73s (1662.02%)

$ traceutils breakdown count json.trace | grep CPU
| EventCPUSample         |   105756 | 0.30%   |
gopherbot added the compiler/runtime label Mar 5, 2023
rhysh commented Mar 5, 2023

Thanks @felixge. What part of the behavior you see is inconsistent with #52674?

Are you able to reproduce the problem with a smaller output file?

Is the pattern consistent across different runs?

Have you looked at the delta between the CPU profile and the execution trace's CPUSample events to see which kinds of samples are missing from the trace? I don't think this one will be fruitful; between 700 and 800 ms, the benchmark is running BenchmarkCodeEncoder so I'd expect the content of the samples to be the same, which is evidence against the samples' content somehow causing them to be dropped.

I see no CPUSample events between 0.878 seconds and 7.773 seconds, so it's even a bit worse than your screenshot shows. The additional-row spans on Proc 0 near 5.5 seconds are MARK ASSIST, SWEEP, and STW.

The execution trace includes 2094 CPUSample events leading up to the gap at 0.878–7.773 seconds. Spot-checking several of those, they seem to have very deep call stacks, often at or near the 64-frame limit. It seems likely that this is filling the profBuf that we use for the execution tracer's CPU profile samples, as described in #52674 (a bit more than 1900 samples).

$ go tool trace -d ~/Downloads/json.trace | grep CPUSample > /tmp/cpuse

$ cat /tmp/cpuse | awk '{print int($1/100e6),$1/1e6}' | grep -C 4 '^8 ' | tail
7 717.821
7 717.823
7 717.824
7 721.825
8 877.786
77 7773.8
77 7773.8
77 7773.8
77 7773.81

$ cat /tmp/cpuse | awk '{if ($1<1e9) {print $0}}' | wc -l 
    2094
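
The 0.878–7.773s gap can also be located mechanically from the -d dump. A minimal Go sketch (findGaps is a made-up helper; the timestamps are copied, in milliseconds, from the awk output above):

```go
package main

import "fmt"

// findGaps returns [start, end] pairs (same unit as ts) for every gap
// between consecutive timestamps that is longer than minGap.
func findGaps(ts []float64, minGap float64) [][2]float64 {
	var gaps [][2]float64
	for i := 1; i < len(ts); i++ {
		if ts[i]-ts[i-1] > minGap {
			gaps = append(gaps, [2]float64{ts[i-1], ts[i]})
		}
	}
	return gaps
}

func main() {
	// CPUSample timestamps (ms) around the gap, from the awk output.
	ts := []float64{717.821, 717.823, 717.824, 721.825, 877.786, 7773.8, 7773.8, 7773.81}
	fmt.Println(findGaps(ts, 1000)) // [[877.786 7773.8]]
}
```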

felixge commented Mar 5, 2023

> Thanks @felixge. What part of the behavior you see is inconsistent with #52674?

My understanding of #52674 is that samples can be dropped if trace buffers aren't flushed often enough. I could imagine that happening here, but I wouldn't expect the problem to last for several seconds given that the workload causes non-stop trace activity. Then again, maybe I'm overestimating how many trace buffer flushes that activity triggers.

Other than that I have no strong reason to believe this is not caused by #52674 . If that's the root cause, then this issue could be seen as a "real world" example of the problem.
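
For intuition, the failure mode described in #52674 boils down to a fixed-capacity buffer whose writer drops samples when the reader hasn't drained it in time. This is a simplified toy, not the runtime's actual profBuf implementation:

```go
package main

import "fmt"

// sampleBuf is a toy fixed-capacity sample buffer: writes beyond cap
// are dropped and only counted, mimicking an overflow record.
type sampleBuf struct {
	samples  []string
	cap      int
	overflow int // samples dropped since the last read
}

func (b *sampleBuf) write(s string) {
	if len(b.samples) >= b.cap {
		b.overflow++ // buffer full: the sample is lost
		return
	}
	b.samples = append(b.samples, s)
}

// read drains the buffer, returning what was kept and how many were dropped.
func (b *sampleBuf) read() (kept []string, dropped int) {
	kept, dropped = b.samples, b.overflow
	b.samples, b.overflow = nil, 0
	return
}

func main() {
	b := &sampleBuf{cap: 2}
	for _, s := range []string{"a", "b", "c", "d"} {
		b.write(s) // reader never runs in between, so "c" and "d" are dropped
	}
	kept, dropped := b.read()
	fmt.Println(len(kept), dropped) // 2 2
}
```

If the tracer only drains the real buffer at infrequent flush points, a sustained burst of deep-stack samples could plausibly keep it full for a long stretch, matching the multi-second gap seen here.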

> Are you able to reproduce the problem with a smaller output file?

No, but I haven't tried very hard.

> Is the pattern consistent across different runs?

Yeah, it seems pretty consistent between runs for me.

> Have you looked at the delta between the CPU profile and the execution trace's CPUSample events to see which kinds of samples are missing from the trace?

I haven't. Is there a way to extract a CPU pprof from a trace?

> I don't think this one will be fruitful; between 700 and 800 ms, the benchmark is running BenchmarkCodeEncoder so I'd expect the content of the samples to be the same, which is evidence against the samples' content somehow causing them to be dropped.

Makes sense.

> I see no CPUSample events between 0.878 seconds and 7.773 seconds, so it's even a bit worse than your screenshot shows. The additional-row spans on Proc 0 near 5.5 seconds are MARK ASSIST, SWEEP, and STW.

Ack.

> The execution trace includes 2094 CPUSample events leading up to the gap at 0.878–7.773 seconds. Spot-checking several of those, they seem to have very deep call stacks, often at or near the 64-frame limit.

Yeah, some benchmarks are definitely creating very deep stacks. But we'd expect those to be truncated, not dropped, if they exceed the 64 frame limit, right?

> It seems likely that this is filling the profBuf that we use for the execution tracer's CPU profile samples, as described in #52674 (a bit more than 1900 samples).

Ok, maybe that's what's happening after all.

> $ go tool trace -d ~/Downloads/json.trace | grep CPUSample > /tmp/cpuse

TIL about -d 🙈 - thanks 🙇.

> $ cat /tmp/cpuse | awk '{print int($1/100e6),$1/1e6}' | grep -C 4 '^8 ' | tail
> 7 717.821
> 7 717.823
> 7 717.824
> 7 721.825
> 8 877.786
> 77 7773.8
> 77 7773.8
> 77 7773.8
> 77 7773.81
>
> $ cat /tmp/cpuse | awk '{if ($1<1e9) {print $0}}' | wc -l
>     2094

Nice analysis. So based on this, you think this is probably caused by #52674 - right?

danp commented Mar 5, 2023

Out of curiosity I added a println("empty") here:

// Looks like an overflow record from the profBuf. Not much to

And I seem to get one or more consistently during BenchmarkMarshalBytes. Examples:

BenchmarkMarshalBytes/32
empty
BenchmarkMarshalBytes/32-8       	13310040	        91.08 ns/op

BenchmarkMarshalBytes/32
empty
BenchmarkMarshalBytes/32-8       	13002902	        88.95 ns/op

BenchmarkMarshalBytes/32
empty
empty
empty
BenchmarkMarshalBytes/32-8       	13341460	        86.97 ns/op

That's pretty early on in the run for me. I tried running the whole set of benchmarks once or twice but didn't see it pop up for any others.

In my last full-set run I got two empties:

BenchmarkMarshalBytes/32
empty
empty
BenchmarkMarshalBytes/32-8       	13634491	        86.77 ns/op

And looking at the flamescope I see:

[screenshot: Screenshot 2023-03-05 at 4 56 53 PM]

There's an empty stretch a few seconds in. This is confirmed by some trace-to-SQLite bits I'm playing with (here, which also includes the query I used to generate the flamescope data; inspired by you, @felixge, thanks!):

sqlite> select round(ts/1e9, 1), count(*) from events where type='CPUSample' and ts/1e9 between 3 and 5 group by 1 order by 1;
┌──────────────────┬──────────┐
│ round(ts/1e9, 1) │ count(*) │
├──────────────────┼──────────┤
│ 3.0              │ 31       │
│ 3.1              │ 61       │
│ 3.2              │ 59       │
│ 3.3              │ 62       │
│ 3.4              │ 61       │
│ 3.5              │ 57       │
│ 3.6              │ 53       │
│ 3.7              │ 1        │
│ 4.8              │ 5        │
│ 4.9              │ 9        │
│ 5.0              │ 5        │
└──────────────────┴──────────┘
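
The same per-0.1s binning can be sketched in Go without SQLite (the timestamps here are made up for illustration, not taken from the trace):

```go
package main

import "fmt"

func main() {
	// Bucket CPUSample timestamps (in seconds) into 0.1s bins,
	// mirroring the GROUP BY in the SQL query above.
	ts := []float64{3.01, 3.05, 3.12, 3.15, 3.18, 3.71}
	counts := map[int]int{}
	for _, t := range ts {
		counts[int(t*10)]++ // bin index = tenths of a second
	}
	for _, bin := range []int{30, 31, 37} {
		fmt.Printf("%.1f %d\n", float64(bin)/10, counts[bin])
	}
}
```

A sudden drop in the per-bin count, like the fall from ~60 samples per bin to 1 at t=3.7 above, is exactly the signature of the gap.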

cagedmantis added this to the Backlog milestone Mar 6, 2023
cagedmantis added the NeedsInvestigation label Mar 6, 2023
@cagedmantis

cc @golang/runtime @mknyszek @prattmic @dvyukov

@mknyszek

Should this be deduplicated with #52674?


felixge commented Mar 15, 2023

> Should this be deduplicated with #52674?

Yes. Based on the discussion above, I think we can close this one as a dupe and maybe update the description of #52674 to link to this issue for a concrete example?

@mknyszek

SGTM! Closing as duplicate of #52674.

mknyszek closed this as not planned (won't fix, can't repro, duplicate, stale) Mar 15, 2023
golang locked and limited conversation to collaborators Mar 14, 2024