runtime: fix yet another race in bgsweep
Currently it's possible that bgsweep finishes before all spans
have been swept (we only know that sweeping of all spans has *started*).
In such a case bgsweep may fail to wake up the runfinq goroutine when it needs to.
finq may still be nil at this point, but some finalizers may be queued later.
Make bgsweep wait for sweeping to *complete*; then it can decide for sure
whether it needs to wake up runfinq.
Update issue 7533
Hello rsc@golang.org (cc: golang-codereviews@googlegroups.com), I'd like you to review this change to https://dvyukov%40google.com@code.google.com/p/go/
10 years, 1 month ago
(2014-03-14 09:14:35 UTC)
#1
10 years, 1 month ago
(2014-03-14 09:19:33 UTC)
#2
This is somewhat uglier than I would like to.
One constraint on the fix is that on-demand sweep can't wake up runfinq (it's
called from mallocgc, and runtime.ready/newg allocate).
Another possible solution is to count span sweep completions. Then bgsweep can
wait for nspancompleted==nspan. But that once again will be spinning (on-demand
sweep can't wake up goroutines). And it will introduce another contended atomic
RMW (there can easily be 100'000 spans and sweeping of spans with large objects
can be very fast).
10 years, 1 month ago
(2014-03-14 18:24:32 UTC)
#3
Can we create the finq goroutine during the first call to runtime.SetFinalizer?
Then runfing would only ever need to call ready, which I believe does not
allocate.
10 years, 1 month ago
(2014-03-17 11:06:19 UTC)
#4
On 2014/03/14 18:24:32, rsc wrote:
> Can we create the finq goroutine during the first call to
runtime.SetFinalizer?
> Then runfing would only ever need to call ready, which I believe does not
> allocate.
ready can allocate to create a new M,
but I like this idea, and creating a new M is not strictly necessary for finalizers
10 years, 1 month ago
(2014-03-17 11:08:20 UTC)
#5
On 2014/03/17 11:06:19, dvyukov wrote:
> On 2014/03/14 18:24:32, rsc wrote:
> > Can we create the finq goroutine during the first call to
> runtime.SetFinalizer?
> > Then runfing would only ever need to call ready, which I believe does not
> > allocate.
>
> ready can allocate to create a new M
> but I like this idea and creating new M is not strictly necessary for
finalizers
No, this does not work:
if we ever allocate under schedlock, then that allocation can cause MSpan_Sweep
-> wakefing -> ready(fing).
To execute ready(fing) we need schedlock (local run queues are now bounded, so
on overflow we need schedlock to put goroutines onto the global run queue).
10 years, 1 month ago
(2014-03-17 12:41:10 UTC)
#6
On 2014/03/17 11:08:20, dvyukov wrote:
> On 2014/03/17 11:06:19, dvyukov wrote:
> > On 2014/03/14 18:24:32, rsc wrote:
> > > Can we create the finq goroutine during the first call to
> > runtime.SetFinalizer?
> > > Then runfing would only ever need to call ready, which I believe does not
> > > allocate.
> >
> > ready can allocate to create a new M
> > but I like this idea and creating new M is not strictly necessary for
> finalizers
>
> no, this does not work
> if we ever allocate under schedlock, then that allocation can cause
MSpan_Sweep
> -> wakefing -> ready(fing)
> to execute ready(fing) we need schedlock (because now local queues bounded, so
> on overflow we need schedlock to put into global run queue).
OK, PTAL.
This became quite a complex change, and I can't say that I trust it 100%.
I had to split gclock into gclock/finlock, because otherwise the following
deadlock is possible (it's possible today on tip):
wakefing (locks gclock) -> ready(fing) -> allocate new M -> mallocgc ->
MSpan_Sweep -> queuefinalizer (locks gclock again)
With this change we can allocate while holding gclock again (because mallocgc
can lock finlock at worst, but not gclock), thus the deadlock is resolved.
Also, waking fing becomes merely a write to a bool flag (fingwake), so we can
do it during span sweep.
I think this is pretty clean.
https://codereview.appspot.com/75960043/diff/180005/src/pkg/runtime/mgc0.c
File src/pkg/runtime/mgc0.c (right):
src/pkg/runtime/mgc0.c:2613: runtime·lock(&gclock);
Can this ...
10 years, 1 month ago
(2014-03-24 19:04:00 UTC)
#7
10 years, 1 month ago
(2014-03-25 21:01:10 UTC)
#10
LGTM
Please submit even though it doesn't completely fix the problem. Sometimes the
problem is a leak due to conservative collection of one word or another. Once
the heap dump is in I will change the sync.Pool test to write a heap dump on
failure, and then we can fetch the dump from the builder to figure out what went
wrong.
Regarding ARM: okay, submit without the atomics, but I'm not 100% sure we
won't need to add them later.
10 years, 1 month ago
(2014-03-26 11:06:39 UTC)
#11
On 2014/03/25 21:01:10, rsc wrote:
> LGTM
>
> Please submit even though it doesn't completely fix the problem. Sometimes the
> problem is a leak due to conservative collection of one word or another.
Yeah, it seems to be the case.
I've checked state of GC/fing/bgsweep when the test fails, and it all looks OK
-- all spans are swept and finalizer queue is empty. It seems that the heap
block is not collected at all.
> Once
> the heap dump is in I will change the sync.Pool test to write a heap dump on
> failure, and then we can fetch the dump from the builder to figure out what
went
> wrong.
10 years, 1 month ago
(2014-03-26 11:11:45 UTC)
#12
*** Submitted as https://code.google.com/p/go/source/detail?r=725e6ae0d712 ***
runtime: fix yet another race in bgsweep
Currently it's possible that bgsweep finishes before all spans
have been swept (we only know that sweeping of all spans has *started*).
In such a case bgsweep may fail to wake up the runfinq goroutine when it needs to.
finq may still be nil at this point, but some finalizers may be queued later.
Make bgsweep wait for sweeping to *complete*; then it can decide for sure
whether it needs to wake up runfinq.
Update issue 7533
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/75960043
Issue 75960043: code review 75960043: runtime: fix yet another race in bgsweep
(Closed)
Created 10 years, 1 month ago by dvyukov
Modified 10 years, 1 month ago
Reviewers: gobot
Comments: 4