runtime/trace: missing cpu samples #58877
Comments
Thanks @felixge. What part of the behavior you see is inconsistent with #52674? Are you able to reproduce the problem with a smaller output file? Is the pattern consistent across different runs? Have you looked at the delta between the CPU profile and the execution trace's CPUSample events to see which kinds of samples are missing from the trace?

I don't think this one will be fruitful; between 700 and 800 ms, the benchmark is running.

I see no CPUSample events between 0.878 seconds and 7.773 seconds, so it's even a bit worse than your screenshot shows. The additional-row spans on Proc 0 near 5.5 seconds are MARK ASSIST, SWEEP, and STW.

The execution trace includes 2094 CPUSample events leading up to the gap at 0.878–7.773 seconds. Spot-checking several of those, they seem to have very deep call stacks, often at or near the 64-frame limit. It seems likely that this is filling the trace buffers quickly.
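(For reference, a minimal sketch of how such gaps can be spotted programmatically by tallying CPUSample events per second. This assumes the trace is in the newer Go 1.22+ format that the golang.org/x/exp/trace parser understands; the file attached to this issue predates that format, so this is illustrative only:)

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"os"
	"slices"
	"time"

	"golang.org/x/exp/trace"
)

func main() {
	f, err := os.Open("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r, err := trace.NewReader(bufio.NewReader(f))
	if err != nil {
		log.Fatal(err)
	}

	// Bucket CPUSample events into one-second bins; a multi-second gap in
	// the samples (like 0.878s-7.773s above) shows up as missing bins.
	buckets := map[int64]int{}
	for {
		ev, err := r.ReadEvent()
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
		if ev.Kind() == trace.EventStackSample {
			buckets[int64(time.Duration(ev.Time())/time.Second)]++
		}
	}

	// Print the bins in order so gaps are easy to eyeball.
	secs := make([]int64, 0, len(buckets))
	for s := range buckets {
		secs = append(secs, s)
	}
	slices.Sort(secs)
	for _, s := range secs {
		fmt.Printf("second %d: %d CPU samples\n", s, buckets[s])
	}
}
```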
> What part of the behavior you see is inconsistent with #52674?

My understanding of #52674 is that samples could be dropped if trace buffers are not being flushed often enough. I could imagine that happening here, but I wouldn't expect the problem to last for so many seconds given that the workload is causing non-stop trace activity. Then again, maybe I'm overestimating how many trace buffers would get flushed as a result. Other than that, I have no strong reason to believe this is not caused by #52674. If that's the root cause, then this issue could be seen as a "real world" example of the problem.
> Are you able to reproduce the problem with a smaller output file?

No, but I haven't tried very hard.
> Is the pattern consistent across different runs?

Yeah, it seems pretty consistent between runs for me.
> Have you looked at the delta between the CPU profile and the execution trace's CPUSample events to see which kinds of samples are missing from the trace?

I haven't. Is there a way to extract a CPU pprof from a trace?
Makes sense.
Ack.
> Spot-checking several of those, they seem to have very deep call stacks, often at or near the 64-frame limit.

Yeah, some benchmarks are definitely creating very deep stacks. But we'd expect those to be truncated, not dropped, if they exceed the 64-frame limit, right?
Ok, maybe that's what's happening after all.
TIL about that, thanks!
Nice analysis. So based on this, you think this is probably caused by #52674, right?
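(A hedged sketch of the extraction asked about above: nothing in this thread says such a tool exists, but one could assemble a pprof profile from the trace's CPUSample events, assuming github.com/google/pprof/profile and a recent golang.org/x/exp/trace where Stack.Frames returns an iterator. The file names are placeholders:)

```go
package main

import (
	"io"
	"log"
	"os"

	"github.com/google/pprof/profile"
	"golang.org/x/exp/trace"
)

func main() {
	in, err := os.Open("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	r, err := trace.NewReader(in)
	if err != nil {
		log.Fatal(err)
	}

	p := &profile.Profile{
		SampleType: []*profile.ValueType{{Type: "samples", Unit: "count"}},
	}
	funcs := map[string]*profile.Function{}
	locs := map[uint64]*profile.Location{}

	for {
		ev, err := r.ReadEvent()
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
		if ev.Kind() != trace.EventStackSample {
			continue
		}
		// Convert the sample's stack into pprof locations, deduplicating
		// functions by name and locations by PC.
		var stack []*profile.Location
		for fr := range ev.Stack().Frames() {
			loc := locs[fr.PC]
			if loc == nil {
				fn := funcs[fr.Func]
				if fn == nil {
					fn = &profile.Function{
						ID:       uint64(len(funcs) + 1),
						Name:     fr.Func,
						Filename: fr.File,
					}
					funcs[fr.Func] = fn
					p.Function = append(p.Function, fn)
				}
				loc = &profile.Location{
					ID:      uint64(len(locs) + 1),
					Address: fr.PC,
					Line:    []profile.Line{{Function: fn, Line: int64(fr.Line)}},
				}
				locs[fr.PC] = loc
				p.Location = append(p.Location, loc)
			}
			stack = append(stack, loc)
		}
		// One pprof sample per CPUSample event; pprof tooling aggregates
		// identical stacks when rendering.
		p.Sample = append(p.Sample, &profile.Sample{
			Location: stack,
			Value:    []int64{1},
		})
	}

	out, err := os.Create("cpu_from_trace.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()
	if err := p.Write(out); err != nil {
		log.Fatal(err)
	}
}
```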
Out of curiosity I added a print at this point in the runtime (line 846 in b94dc38):
And I seem to get one or more consistently during BenchmarkMarshalBytes. Examples:
That's pretty early on in the run for me. I tried running the whole set of benchmarks once or twice but didn't see it pop up for any others. In my last full-set run I got two empties:
And looking at the flamescope I see an empty run a few seconds in. This is confirmed by some trace-to-sqlite bits I'm playing with (here, which also has the query I used to generate the flamescope data; inspired by you, @felixge, thanks!):
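(The linked trace-to-sqlite code isn't reproduced in this thread; as a rough guess at its shape, assuming the mattn/go-sqlite3 driver and the golang.org/x/exp/trace parser, a table of sample timestamps is enough to run a per-second query like the one described:)

```go
package main

import (
	"database/sql"
	"io"
	"log"
	"os"

	_ "github.com/mattn/go-sqlite3"
	"golang.org/x/exp/trace"
)

func main() {
	db, err := sql.Open("sqlite3", "trace.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS cpu_samples (ts_ns INTEGER)`); err != nil {
		log.Fatal(err)
	}

	f, err := os.Open("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r, err := trace.NewReader(f)
	if err != nil {
		log.Fatal(err)
	}
	for {
		ev, err := r.ReadEvent()
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
		// Record one row per CPUSample event, keyed by its trace timestamp.
		if ev.Kind() == trace.EventStackSample {
			if _, err := db.Exec(`INSERT INTO cpu_samples (ts_ns) VALUES (?)`, int64(ev.Time())); err != nil {
				log.Fatal(err)
			}
		}
	}
	// A query in the spirit of the flamescope check, where empty seconds
	// show up as missing rows:
	//   SELECT ts_ns/1000000000 AS sec, COUNT(*) FROM cpu_samples GROUP BY sec ORDER BY sec;
}
```

Wrapping the inserts in a single transaction would make this usable on large traces; the row-at-a-time version keeps the sketch short.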
Should this be deduplicated with #52674?
SGTM! Closing as duplicate of #52674.
The tracer seems to be dropping CPU samples in a way that might not be explained by #52674. At least I'm surprised by the pattern. cc @rhysh
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

`go env` Output

What did you do?
Run the benchmarks in encoding/json while capturing a trace with CPU samples.
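(The exact invocation isn't shown in this issue; assuming the standard `go test` flags, something like the following captures both artifacts at once. Since Go 1.19, the runtime forwards CPU profile samples into the execution trace as CPUSample events when both are active:)

```sh
go test -run='^$' -bench=. -cpuprofile=cpu.pprof -trace=trace.out encoding/json
```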
You can download the files above here: Google Drive
What did you expect to see?
The trace should contain CPU samples throughout, especially while all the Ps are executing goroutines.
What did you see instead?
There are several seconds where all Ps are executing goroutines, but no CPU samples are being recorded.
I noticed this while implementing `traceutils flamescope`, which also clearly shows this pattern:
As far as I can tell the trace should contain 126,873 samples, but it only contains 105,756 samples.
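(A hedged sketch of where an expected count like 126,873 can come from: summing the samples/count series of the pprof CPU profile via github.com/google/pprof/profile, then comparing it against a CPUSample tally from the trace like the one sketched earlier:)

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/google/pprof/profile"
)

func main() {
	f, err := os.Open("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	p, err := profile.Parse(f)
	if err != nil {
		log.Fatal(err)
	}

	// CPU profiles carry two value series per sample: samples/count and
	// cpu/nanoseconds. Summing the first gives the expected sample count.
	var total int64
	for _, s := range p.Sample {
		if len(s.Value) > 0 {
			total += s.Value[0]
		}
	}
	fmt.Printf("CPU profile samples: %d\n", total)
	// Compare against the number of CPUSample events counted in the trace;
	// in this issue that comparison was 126,873 vs. 105,756.
}
```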