runtime/pprof: block profiler: fix sampling bias #44192

Closed
felixge opened this issue Feb 9, 2021 · 16 comments
Labels: FrozenDueToAge, NeedsFix (the path to resolution is known, but the work has not been done)
Milestone: Go1.17

Comments

felixge (Contributor) commented Feb 9, 2021

What version of Go are you using (go version)?

$ go version
go version go1.15.7 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/felix.geisendoerfer/Library/Caches/go-build"
GOENV="/Users/felix.geisendoerfer/Library/Application Support/go/env"
GOEXE=""
GOFLAGS="-count=1"
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/felix.geisendoerfer/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/felix.geisendoerfer/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.15.7_1/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.15.7_1/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/examples/block-bias/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/mg/xwzs985x35789zk2l14nk8k80000gn/T/go-build460068460=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I created a program with two goroutines. One is blocked on a slow time.NewTicker() loop; the other on a ticker that is 10x faster. I then observed the program with the block profiler, using the duration of the slow ticker as the rate.

Code: https://gist.github.com/felixge/6f9701dd26707dbe7d351805860a4608
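
The linked gist has the actual code; below is a rough sketch of the setup described, where the ticker durations, the 10-second run time and the output file name are my assumptions:

package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// tickLoop blocks on a ticker channel forever, so the goroutine spends
// essentially all of its time blocked.
func tickLoop(d time.Duration) {
	t := time.NewTicker(d)
	defer t.Stop()
	for range t.C {
	}
}

func main() {
	// Use the slow ticker's duration (in nanoseconds) as the block profile rate.
	runtime.SetBlockProfileRate(int(100 * time.Millisecond))

	go tickLoop(100 * time.Millisecond) // slow ticker
	go tickLoop(10 * time.Millisecond)  // 10x faster ticker

	time.Sleep(10 * time.Second)

	f, err := os.Create("block.pprof")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	pprof.Lookup("block").WriteTo(f, 0)
}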

What did you expect to see?

Both goroutines are blocked 100% of their time, so I'd expect both of them to show up equally in the block profile, as shown below:

[Screenshot: expected block profile, with both goroutines accounting for roughly equal blocking time]

What did you see instead?

The block profiler's sampling appears to be biased, preferring infrequent long block events over frequent short ones, even when they contribute the same amount of overall blocking. This can be seen in the block profile I actually got:

[Screenshot: actual block profile, dominated by the goroutine blocking on the slow ticker]

I've done a lot of additional research on this which can be found here: https://github.com/felixge/go-profiler-notes/blob/main/block.md#sampling-bias

Proposal

I propose fixing the profiler in a way that removes the sampling bias without changing the API or increasing overhead. Concretely, this might be done with a patch like this:

felixge/go@master...debias-blockprofile-rate

I've described this in more detail along with a simulation here: https://github.com/felixge/go-profiler-notes/blob/main/block.md#simulation--proposal-for-improvement

The downside to this approach is that it obscures the "average event duration", which can currently be determined by dividing the delay/nanoseconds value by the contentions/count value in the profile. I have a few ideas around this that I'd be happy to discuss.

Additionally, I propose improving the documentation for the block profiler to give people a better idea of what the rate value means, and perhaps also making some recommendations about overhead.

I'm happy to work on all of the above and submit the patches for it.

@seankhliao seankhliao changed the title block profiler: fix sampling bias runtime/pprof: block profiler: fix sampling bias Feb 10, 2021
@seankhliao seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 10, 2021
felixge (Contributor, Author) commented Feb 24, 2021

@seankhliao would it make sense to add the label "proposal" for this? I'm afraid it might not be reviewed otherwise.

seankhliao (Member) commented:
cc @hyangah

hyangah (Contributor) commented Feb 25, 2021

Thank you for the detailed explanation and writeup.

cc @dvyukov (blockprofile author), @pjweinb (mutexprofile author), @rsc (heap profile author) - all three profiles share similar sampling logic.

Reducing the sampling bias sounds great. I have very limited experience with the block profile, so I want to learn more about the goal here. In your motivating example scenario, why is it important to present the two kinds of events equally? If one type of event usually blocks 10x longer on average, isn't it more interesting and worth more attention?

felixge (Contributor, Author) commented Feb 25, 2021

@hyangah thx.

all three profiles share similar sampling logic.

It's similar, but I think all three of them use slightly different approaches:

  • Heap profiler: uses an internal counter to accumulate bytes allocated. Any event causing the counter to exceed MemProfileRate gets sampled and resets the counter. Very simplified explanation; see runtime/malloc.go for full details.
  • Block profiler: samples events with a probability of min(1, duration/rate); see the sketch after this list. In code review @dvyukov mentioned that this was intended to be similar to MemProfileRate.
  • Mutex profiler: the simplest sampling implementation. It doesn't take the event duration into account at all and just samples events with a probability of 1/rate.
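
For illustration, here is a minimal sketch of the block profiler's sampling decision as described above. This is my own simplification, not the actual runtime code: blockSampled is a made-up name and math/rand stands in for the runtime's internal random source.

package main

import (
	"fmt"
	"math/rand"
)

// blockSampled reports whether a block event of the given duration (in
// cycles) would be recorded at the given profiling rate. Events at least
// as long as rate are always kept; shorter events are kept with
// probability cycles/rate.
func blockSampled(cycles, rate int64) bool {
	if rate <= 0 {
		return false
	}
	return cycles >= rate || rand.Int63n(rate) < cycles
}

func main() {
	const rate = 1000
	kept := 0
	for i := 0; i < 1000000; i++ {
		if blockSampled(100, rate) { // short events: kept ~10% of the time
			kept++
		}
	}
	fmt.Printf("kept %d of 1000000 short events\n", kept)
}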

Reducing the sampling bias sounds great. I have very limited experience with the block profile, so I want to learn more about the goal here. In your motivating example scenario, why is it important to present the two kinds of events equally? If one type of event usually blocks 10x longer on average, isn't it more interesting and worth more attention?

Great point. I think it depends : ). I could imagine a scenario where you have two producers sending work into two separate channels, sometimes getting blocked while doing so. I'd expect the block profiler to be able to tell me which of those two producers is spending more time blocked than the other. Given the current bias, it could give me the wrong answer to that question.

Or, from another point of view, imagine if the heap profiler were biased towards big allocations (I hope it isn't; I still need to analyze it some more). It could make you chase a few big allocation sources even though they may contribute less overall garbage than thousands of small allocations that the profiler is telling you to ignore by under-representing them.

Anyway, I'd be okay if the bias is here to stay. But if it is, I think it's worth documenting, which I'd be happy to send patches for.

dvyukov (Member) commented Feb 25, 2021

Interesting. I can see how this change makes sense:
felixge/go@master...debias-blockprofile-rate#diff-4af3428aab976e43f655b4e1a35c92a608a92294609190aa42378d2d265d501cR414

Have you looked at what's done in tcmalloc?
https://github.com/google/tcmalloc/blob/master/tcmalloc/sampler.cc#L168-L177
The logic there was given a lot of thought and it passed several rounds of de-biasing already :)
At least for heap sampling it would be good to mimic exactly the same logic I think. Maybe for other samplers as well.

felixge (Contributor, Author) commented Feb 25, 2021

Interesting. I can see how this change makes sense:

Cool : ). Do you think the fact that it obscures the mean of the samples (total_delay / count) is a problem? It's kind of a nice thing to be able to compute, but OTOH I'm not sure if anybody ever looks at it.

Have you looked at what's done in tcmalloc?

Not yet, thanks for the link!

At least for heap sampling it would be good to mimic exactly the same logic I think. Maybe for other samplers as well.

If the current heap sampling could be improved, that'd be awesome. But I'll need some time to fully wrap my head around the current Go implementation as well as tcmalloc's sampler before I could contribute anything useful to that discussion. If you're aware of any good design documents or literature discussing heap sampling techniques, please let me know.

dvyukov (Member) commented Feb 25, 2021

Do you think the fact that it obscures the mean of the samples (total_delay / count) is a problem?

Lies in profiles are definitely bad. If I see any numbers that look like a lie, I may throw away the whole profile because I don't know what I can trust there.
Hmm... could we make blockRecord.count a float and record the real number of cycles and rate/cycles samples? It should fix the total but preserve the average.

If you're aware of any good design documents or literature discussing heap sampling techniques, please let me know.

I am not aware of anything besides comments in sampler.h:
https://github.com/google/tcmalloc/blob/master/tcmalloc/sampler.h#L33

felixge (Contributor, Author) commented Feb 25, 2021

Lies in profiles are definitely bad. If I see any numbers that look like a lie, I may throw away the whole profile because I don't know what I can trust there.

👍 I love this perspective / mindset.

Hmm... could we make blockRecord.count a float and record the real number of cycles and rate/cycles samples? It should fix the total but preserve the average.

This sounds interesting, but I don't think I understand it. Can you explain it differently?

dvyukov (Member) commented Feb 25, 2021

Hmm... could we make blockRecord.count a float and record the real number of cycles and rate/cycles samples? It should fix the total but preserve the average.

This sounds interesting, but I don't think I understand it. Can you explain it differently?

In saveblockevent we do:

func saveblockevent(cycles int64, skip int, which bucketType) {
	...
	b.bp().count++
	b.bp().cycles += cycles
	...
}

Your change increases cycles, but count is still incremented by 1. This makes the average incorrect.
My idea was to do something like:

	b.bp().count += rate/cycles
	b.bp().cycles += rate

This increases cycles, but also proportionally increases count.
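
With rate = 100, for instance, a sampled event of 25 cycles would add rate/cycles = 4 to count and rate = 100 to cycles, so its contribution to the mean is 100/4 = 25 cycles, matching the real event duration.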

felixge (Contributor, Author) commented Feb 25, 2021

Edit: I updated this comment. You can still see the original below, but I've hidden it because it was confusing.

Old Comment (Please ignore)
	b.bp().count += rate/cycles
	b.bp().cycles += rate

I spent some time thinking about and playing around with this. I'm assuming your snippet above would be used instead of the patch I proposed? If that's the case, I think it will fix the sample mean issue, but flip the bias in total duration so that short frequent events are favored over long infrequent ones.

Example 1:

rate = 100
20 events with 500 cycles each
--->
count = 4 (100/500*20)
cycles = 2000 (100*20)
mean = 500 (2000/4)

Example 2:

rate = 100
10 events with 1000 cycles each
--->
count = 1 (100/1000*10)
cycles = 1000 (100*10)
mean = 1000 (1000/1)

As you can see, the total cycles in example 1 end up being twice as high as in example 2, even though both sets of events contribute the same amount of blocking.

Unless I'm missing something, I think the only way to fix the bias without skewing the sample mean is to either change the logic used for deciding which events get sampled, or to add the mean as its own value type in the profile. But maybe there is an elegant solution I'm overlooking.


FWIW, I also played with a little simulation of your suggestion. Below you can see the impact of the rate on an application that produces three distinct kinds of block events a, b, c with population means of 1000ns, 2000ns and 3000ns respectively.

[Plot: simulated profile results for a, b and c at various rates with the suggested change applied]

And this is how things look right now before applying any patches:

[Plot: the same simulation with the current, unpatched block profiler]

  b.bp().count += rate/cycles
  b.bp().cycles += rate

@dvyukov it took me a bit to figure out whether you were suggesting this change in addition to my patch or instead of it. Neither combination seemed to work. However, I think I finally figured out that your idea can be applied by a new patch that simply does this inside saveblockevent. At least in my testing it seems to fix the bias and produce a sample mean that converges on the population mean.

if cycles < rate {
	// Short events are sampled with probability cycles/rate, so scale
	// each sample up to represent rate/cycles events totaling rate
	// cycles (this assumes count becomes a float, as suggested above).
	b.bp().count += rate/cycles
	b.bp().cycles += rate
} else {
	// Events at least as long as the rate are always sampled and
	// recorded as-is.
	b.bp().count += 1
	b.bp().cycles += cycles
}
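
Applied to the two examples above with rate = 100: in example 1, all 20 events of 500 cycles take the else branch, giving count = 20, cycles = 10000, mean = 500; in example 2, the 10 events of 1000 cycles give count = 10, cycles = 10000, mean = 1000. Both totals match the real 10000 cycles of blocking and both means match the real event durations. For events shorter than rate, which are only sampled with probability cycles/rate, the first branch scales each sample up so that the expected total and the per-sample mean still come out right.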

I'll think this over again tomorrow when it's not late, but I'd be happy to send a patch + tests for this if you agree. Let me know.

dvyukov (Member) commented Feb 26, 2021

Yes, I meant in combination with your patch. Something similar to the snippet you posted. I just did not elaborate enough, sorry.

felixge (Contributor, Author) commented Feb 26, 2021

Yes, I meant in combination with your patch. Something similar to the snippet you posted. I just did not elaborate enough, sorry.

No worries, in retrospect I think it was clear enough, but my brain didn't want to cooperate 🙈. I'll try to prepare a patch for this now to continue the discussion.

felixge added a commit to felixge/go that referenced this issue Feb 26, 2021
Block profiles were biased towards infrequent long events over frequent
short events. This fix corrects the bias by aggregating shorter events
as longer but less frequent in the profiles. As a result their
cumulative duration will be accurately represented in the profile
without skewing their sample mean (duration/count).

Credit to @dvyukov for suggesting to adjust the count in the
saveblockevent function.

Fixes golang#44192.
hyangah (Contributor) commented Feb 26, 2021

I see that the heap profiler's sampling is more similar to what the tcmalloc sampler does: it keeps an internal counter to track the next sampling point instead of tossing a coin for every event. Is it an option to change the block profile sampler to be more like what the current heap profiler and tcmalloc sampler are doing?
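
For context, here is a minimal sketch of that counter-based approach. The names (counterSampler, record) are made up and the details differ from both the Go heap profiler and tcmalloc; it only illustrates the idea of counting down to a randomized next-sample threshold instead of flipping a coin per event.

package main

import (
	"fmt"
	"math/rand"
)

// counterSampler samples on average once per rate units of "weight"
// (bytes allocated, cycles blocked, ...).
type counterSampler struct {
	rate      float64
	remaining float64
}

func newCounterSampler(rate float64) *counterSampler {
	s := &counterSampler{rate: rate}
	s.reset()
	return s
}

// reset draws the next sampling threshold from an exponential
// distribution with mean rate, as in Poisson sampling, so that the
// expected sampled weight stays unbiased.
func (s *counterSampler) reset() {
	s.remaining = rand.ExpFloat64() * s.rate
}

// record consumes weight units and reports whether this event crossed
// the sampling threshold and should be recorded.
func (s *counterSampler) record(weight float64) bool {
	s.remaining -= weight
	if s.remaining > 0 {
		return false
	}
	s.reset()
	return true
}

func main() {
	s := newCounterSampler(1000)
	sampled := 0
	for i := 0; i < 1000000; i++ {
		if s.record(10) { // 1e6 events of weight 10 => roughly 10000 samples
			sampled++
		}
	}
	fmt.Println("samples:", sampled)
}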

felixge (Contributor, Author) commented Feb 26, 2021

@hyangah I think the main concern there would be doing this in a way that doesn't fundamentally impact the overhead of block profiling or change the semantics of the blockprofilerate in a way that will lead to less accurate profiles. But if there is consensus that this would be a better direction, I'd be happy to give it a shot.

Meanwhile I got a patch ready based on @dvyukov's suggestions here: felixge@a3dc518

But I'll hold off on submitting it for now. I also still have to sort out some CLA stuff on my end.

felixge added a commit to felixge/go that referenced this issue Mar 9, 2021
Block profiles were biased towards infrequent long events over frequent
short events. This fix corrects the bias by aggregating shorter events
as longer but less frequent in the profiles. As a result their
cumulative duration will be accurately represented in the profile
without skewing their sample mean (duration/count).

Credit to @dvyukov for suggesting to adjust the count in the
saveblockevent function.

Fixes golang#44192.

Change-Id: I71a99d7f6ebdb2d484d44890a2517863cceb4004
gopherbot commented:
Change https://golang.org/cl/299991 mentions this issue: runtime/pprof: fix block profile bias

felixge (Contributor, Author) commented Mar 9, 2021

@hyangah @dvyukov I just submitted my current patch. I think it's a decent improvement for now.

However, I'd also be happy to look into aligning the sampling with heap/tcmalloc in a follow-up patch if that's something you're interested in.

@dmitshur dmitshur added this to the Go1.17 milestone Jun 4, 2021
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 4, 2021
@golang golang locked and limited conversation to collaborators Jun 4, 2022