-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime: GC assists are inefficient at the beginning of marking #15361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Having the goroutines attempting to help out early in the GC while roots On Tue, Apr 19, 2016 at 2:01 AM, Dmitry Vyukov notifications@github.com
|
I agree with @dvyukov that assists should be able to scan roots. I don't see why they shouldn't; the amount of fighting is bounded by the number of Ps whether it's assists doing it or the background mark worker. The difficulty is that we currently don't count root scanning as "scan work", so assists can't get credit for it (and can't satisfy their debt with it). This also means that the background worker won't build up global credit while it's scanning roots, which is probably contributing to the delay. Tracking the stats on that is somewhat subtle, which is why we don't do it right now, but it's definitely doable. I believe we'll also want this when we start re-scanning stacks during concurrent mark so that assists can help out with that.
We used to do this and then I worked very hard to eliminate this uncontrolled allocation window because @dvyukov had some great benchmarks that used this window to completely blow out the heap goal. |
But it should be positive without stats. Threads will do useful work instead of switching goroutines.
Can't we give them temporal credit? The intention is that background scanners will cover that backlog soon and we can allow goroutines to work without assists at all in the expected case. But if background scanners do not, then goroutines will have to assists aggressively and compensate for initial allocations. The duration of the temporal credit must not be time based, it must be allocation based. That is, if we expect to do scanning while application allocates X bytes, we can give credit for X/10 allocations. I don't know the assist system good enough to say whether such credit system can avoid overallocation. |
@aclements Is this bug still relevant with the new changes to reduce the amount of extra work that is done for each assist? |
Yes. This issue arises because there's no work at all for assists at the beginning of the cycle, not because they want more work than they can get. Reducing the extra work does mean assists become useful earlier in the cycle, just not yet at the beginning. |
CL https://golang.org/cl/32431 mentions this issue. |
CL https://golang.org/cl/32432 mentions this issue. |
This lifts the part of gcAssistAlloc that runs on the system stack to its own function in preparation for letting assists perform root jobs (notably stack scanning). This makes it easy to see that there are no references to the user stack once we've entered gcAssistAlloc1, which means it's safe to shrink the stack while in gcAssistAlloc1. This does not yet make assists perform root jobs, so it's not actually possible for the stack to shrink yet. That will happen in the next commit. The code in gcAssistAlloc1 is identical to the code that's currently passed in a closure to systemstack with one exception. Currently, we set the "completed" variable in the enclosing scope to indicate that the assist completed the mark phase. This is exactly the sort of cross-stack reference lifting this function is meant to prevent. We replace this variable with setting gp.param to nil or non-nil to indicate the completion status. Updates #15361. Change-Id: Iba7cfb758c781070a441aea86c0117b399a24dbd Reviewed-on: https://go-review.googlesource.com/32431 TryBot-Result: Gobot Gobot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org>
go version devel +318da8d Sat Apr 16 02:55:03 2016 +0000 linux/amd64
I've looked at http benchmark trace:

trace.zip
At the beginning of marking all goroutines are both in debt and cannot scan anything (because GC goroutines are busy with roots and did not produce enough workbufs yet). So what happens is that all procs cycle through all runnable goroutines and park them on assistQueue. That's first 200us of GC with small goroutine boxes all blocking in gcAssistAlloc.
gcAssistAlloc must be able to help with root scanning, because it may be the only available GC work. Switching to other goroutines makes small sense, because the first malloc will trigger assist and will bring us to where we started.
Maybe it also makes sense to give goroutines some temporal credit in the beginning of GC. Currently we start concurrent phase just to force every goroutine to assist. If it turns out that we need aggressive assists, we can make up for it later. But in the expected case GC is triggered at such point that we don't need any assists. The idea is similar to https://go-review.googlesource.com/#/c/20969 but not black and white (GOGC=199 requires assists, while GOGC=200 does not need them entirely).
You can also see that Proc 4 is idle for some reason and does not help GC.
The text was updated successfully, but these errors were encountered: