runtime: GC assists are inefficient at the beginning of marking #15361

dvyukov · 2016-04-19T06:01:31Z

go version devel +318da8d Sat Apr 16 02:55:03 2016 +0000 linux/amd64

I've looked at http benchmark trace:

trace.zip

At the beginning of marking all goroutines are both in debt and cannot scan anything (because GC goroutines are busy with roots and did not produce enough workbufs yet). So what happens is that all procs cycle through all runnable goroutines and park them on assistQueue. That's first 200us of GC with small goroutine boxes all blocking in gcAssistAlloc.

gcAssistAlloc must be able to help with root scanning, because it may be the only available GC work. Switching to other goroutines makes small sense, because the first malloc will trigger assist and will bring us to where we started.

Maybe it also makes sense to give goroutines some temporal credit in the beginning of GC. Currently we start concurrent phase just to force every goroutine to assist. If it turns out that we need aggressive assists, we can make up for it later. But in the expected case GC is triggered at such point that we don't need any assists. The idea is similar to https://go-review.googlesource.com/#/c/20969 but not black and white (GOGC=199 requires assists, while GOGC=200 does not need them entirely).

You can also see that Proc 4 is idle for some reason and does not help GC.

RLH · 2016-04-19T13:47:19Z

Having the goroutines attempting to help out early in the GC while roots
are still being scanned seems like the problem. Fighting for a small amount
of work at this point might not be efficient. In the case where the root
scan is not complete and there are no work bufs available we could just
allow the goroutine to go ahead and allocate. This is part of the larger
discussion about what kinds of availability promises the Go GC wants to
make. At this point 200us is in the noise, hopefully it won't be in the
noise in the 1.8 timeframe.

On Tue, Apr 19, 2016 at 2:01 AM, Dmitry Vyukov notifications@github.com
wrote:

go version devel +318da8d Sat Apr 16 02:55:03 2016 +0000 linux/amd64

I've looked at http benchmark trace:
[image: trace]
https://cloud.githubusercontent.com/assets/1095328/14628858/f047c932-0601-11e6-8405-ce35de5bb539.png
trace.zip https://github.com/golang/go/files/225283/trace.zip

At the beginning of marking all goroutines are both in debt and cannot
scan anything (because GC goroutines are busy with roots and did not
produce enough workbufs yet). So what happens is that all procs cycle
through all runnable goroutines and park them on assistQueue. That's first
200us of GC with small goroutine boxes all blocking in gcAssistAlloc.

gcAssistAlloc must be able to help with root scanning, because it may be
the only available GC work. Switching to other goroutines makes small
sense, because the first malloc will trigger assist and will bring us to
where we started.

Maybe it also makes sense to give goroutines some temporal credit in the
beginning of GC. Currently we start concurrent phase just to force every
goroutine to assist. If it turns out that we need aggressive assists, we
can make up for it later. But in the expected case GC is triggered at such
point that we don't need any assists. The idea is similar to
https://go-review.googlesource.com/#/c/20969 but not black and white
(GOGC=199 requires assists, while GOGC=200 does not need them entirely).

You can also see that Proc 4 is idle for some reason and does not help GC.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#15361

aclements · 2016-04-19T14:22:39Z

I agree with @dvyukov that assists should be able to scan roots. I don't see why they shouldn't; the amount of fighting is bounded by the number of Ps whether it's assists doing it or the background mark worker. The difficulty is that we currently don't count root scanning as "scan work", so assists can't get credit for it (and can't satisfy their debt with it). This also means that the background worker won't build up global credit while it's scanning roots, which is probably contributing to the delay. Tracking the stats on that is somewhat subtle, which is why we don't do it right now, but it's definitely doable.

I believe we'll also want this when we start re-scanning stacks during concurrent mark so that assists can help out with that.

In the case where the root scan is not complete and there are no work bufs available we could just allow the goroutine to go ahead and allocate.

We used to do this and then I worked very hard to eliminate this uncontrolled allocation window because @dvyukov had some great benchmarks that used this window to completely blow out the heap goal.

dvyukov · 2016-04-19T14:40:58Z

Tracking the stats on that is somewhat subtle, which is why we don't do it right now, but it's definitely doable.

But it should be positive without stats. Threads will do useful work instead of switching goroutines.

We used to do this and then I worked very hard to eliminate this uncontrolled allocation window because @dvyukov had some great benchmarks that used this window to completely blow out the heap goal.

Can't we give them temporal credit? The intention is that background scanners will cover that backlog soon and we can allow goroutines to work without assists at all in the expected case. But if background scanners do not, then goroutines will have to assists aggressively and compensate for initial allocations. The duration of the temporal credit must not be time based, it must be allocation based. That is, if we expect to do scanning while application allocates X bytes, we can give credit for X/10 allocations. I don't know the assist system good enough to say whether such credit system can avoid overallocation.
It also looks like a more reliable solution to the problem described in cl/20969. The change turns off assists when GOGC>200. But that still can lead to overallocation. Instead we could give goroutines credit for longer period of time.

quentinmit · 2016-09-28T22:25:06Z

@aclements Is this bug still relevant with the new changes to reduce the amount of extra work that is done for each assist?

aclements · 2016-09-29T02:16:13Z

@aclements Is this bug still relevant with the new changes to reduce the amount of extra work that is done for each assist?

Yes. This issue arises because there's no work at all for assists at the beginning of the cycle, not because they want more work than they can get. Reducing the extra work does mean assists become useful earlier in the cycle, just not yet at the beginning.

gopherbot · 2016-10-31T03:00:57Z

CL https://golang.org/cl/32431 mentions this issue.

gopherbot · 2016-10-31T03:00:58Z

CL https://golang.org/cl/32432 mentions this issue.

This lifts the part of gcAssistAlloc that runs on the system stack to its own function in preparation for letting assists perform root jobs (notably stack scanning). This makes it easy to see that there are no references to the user stack once we've entered gcAssistAlloc1, which means it's safe to shrink the stack while in gcAssistAlloc1. This does not yet make assists perform root jobs, so it's not actually possible for the stack to shrink yet. That will happen in the next commit. The code in gcAssistAlloc1 is identical to the code that's currently passed in a closure to systemstack with one exception. Currently, we set the "completed" variable in the enclosing scope to indicate that the assist completed the mark phase. This is exactly the sort of cross-stack reference lifting this function is meant to prevent. We replace this variable with setting gp.param to nil or non-nil to indicate the completion status. Updates #15361. Change-Id: Iba7cfb758c781070a441aea86c0117b399a24dbd Reviewed-on: https://go-review.googlesource.com/32431 TryBot-Result: Gobot Gobot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org>

dvyukov assigned aclements Apr 19, 2016

aclements added this to the Go1.8Early milestone Apr 19, 2016

quentinmit added the NeedsFix label Sep 28, 2016

rsc modified the milestones: Go1.8, Go1.8Early Oct 20, 2016

gopherbot closed this as completed in d35dfd4 Oct 31, 2016

golang locked and limited conversation to collaborators Oct 31, 2017

gopherbot added the FrozenDueToAge label Oct 31, 2017

rsc unassigned aclements Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: GC assists are inefficient at the beginning of marking #15361

runtime: GC assists are inefficient at the beginning of marking #15361

dvyukov commented Apr 19, 2016

RLH commented Apr 19, 2016

aclements commented Apr 19, 2016

dvyukov commented Apr 19, 2016

quentinmit commented Sep 28, 2016

aclements commented Sep 29, 2016

gopherbot commented Oct 31, 2016

gopherbot commented Oct 31, 2016

runtime: GC assists are inefficient at the beginning of marking #15361

runtime: GC assists are inefficient at the beginning of marking #15361

Comments

dvyukov commented Apr 19, 2016

RLH commented Apr 19, 2016

aclements commented Apr 19, 2016

dvyukov commented Apr 19, 2016

quentinmit commented Sep 28, 2016

aclements commented Sep 29, 2016

gopherbot commented Oct 31, 2016

gopherbot commented Oct 31, 2016