Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: sync: add a HasRun method to sync.Once #19245

Closed
btracey opened this issue Feb 22, 2017 · 11 comments
Closed

proposal: sync: add a HasRun method to sync.Once #19245

btracey opened this issue Feb 22, 2017 · 11 comments

Comments

@btracey
Copy link
Contributor

btracey commented Feb 22, 2017

In some cases, a different code path should be taken depending on if sync.Once() has been executed (example below). It is possible to work around this issue by using a mutex, but Once can return this information more efficiently

func (o *Once) HasRun() bool {
    return atomic.LoadUint32(&o.done) == 1
}

Use case:
In gonum's implementation of a multivariate Normal distribution, we only compute and store the covariance matrix if necessary. Most commonly, users need to compute probabilities and generate random numbers, neither of which actually need the covariance matrix. Computing and storing the matrix is O(n^3) operation and O(n^2) respectively, which can be a significant additional cost. We have a setSigma method that uses a sync.Once to compute and store the covariance matrix if it's actually required. This is fine, except there are also operations which only need a single element of the covariance matrix, for example getting the marginal distribution along the single dimension. If the covariance matrix has already been computed (due to some other method call), then it's an O(1) operation to return the marginal. If it has not been computed, then a single element of the covariance matrix can be computed in O(n) time, instead of forcing the entire (n^3) operation of the covariance matrix.

At the moment, we have the code

func (n *Normal) MarginalNormalSingle(i int, src *rand.Rand) distuv.Normal {
	var std float64
	if n.sigma != nil {
		std = n.sigma.At(i, i)
	} else {
		// Reconstruct the {i,i} diagonal element of the covariance directly.
		for j := 0; j <= i; j++ {
			v := n.lower.At(i, j)
			std += v * v
		}
	}
	return distuv.Normal{
		Mu:     n.mu[i],
		Sigma:  math.Sqrt(std),
		Source: src,
	}
}

// setSigma computes and stores the covariance matrix of the distribution.
func (n *Normal) setSigma() {
	n.once.Do(func() {
		n.sigma = mat64.NewSymDense(n.Dim(), nil)
		n.sigma.FromCholesky(&n.chol)
	})
}

These two methods cannot be called concurrently, because checking that n.sigma == nil directly races against n.Once setting n.sigma. It is possible to work around this using a mutex, with something like

func (n *Normal) sigmaNil() bool {
    n.RLock()
    b := n.sigma == nil
    n.RUnlock()
    return b
}

and a similar transformation to setSigma, but it would be really nice to just have the MarginalNormal code be

func (n *Normal) MarginalNormalSingle(i int, src *rand.Rand) distuv.Normal {
   var std float64
   if n.once.HasRun() {
   	std = n.sigma.At(i, i)
   } else {
   	// Reconstruct the {i,i} diagonal element of the covariance directly.
               std = ...
   }
}
@bradfitz
Copy link
Contributor

This seems ripe for misuse. And even in your example code it looks like you're doing it wrong.

Your code permits two goroutines to both see HasRun() return false and then do duplicate initialization work.

You should instead do:

func (n *Normal) MarginalNormalSingle(i int, src *rand.Rand) distuv.Normal {
     n.once.Do(n.initStd)
     std := n.std
     ....
}

func (n *Normal) initStd() {
    ....
}

@btracey
Copy link
Contributor Author

btracey commented Feb 22, 2017

I don't understand. Your initStd (which is our setSigma), computes the entire covariance matrix, which computes O(n^2) matrix elements and takes O(n^3) time. If the covariance matrix is not set, instead MarginalNormalSingle computes one matrix element in O(n) time. That is a significant cost savings. Yes, it's possible that two different goroutines could compute the same marginal, which would duplicate work, but it takes O(n^2) goroutines to compute the same marginal before you lose overall.

@bradfitz
Copy link
Contributor

Sorry, I didn't read all your code. I just responded looking at your final snippet.

You can work around this without any new API. Just use atomic.StoreInt32 in your n.once.Do(func() { and LoadInt32 to check whether it's done.

I don't think we want to encourage such racy checks. We also didn't accept Mutex.TryLock in #6123.

@btracey
Copy link
Contributor Author

btracey commented Feb 22, 2017

You can work around this without any new API. Just use atomic.StoreInt32 in your n.once.Do(func() { and LoadInt32 to check whether it's done.

By which you mean replicating the code that's in sync.Once?

@griesemer
Copy link
Contributor

Agreeing with @bradfitz . This should not go in.

sync.Once is here exactly to handle concurrent invocations correctly. Every time you provide a simple predicate that doesn't also atomically handle an action depending on the predicate, you are producing racy code in a concurrent environment.

If you still wanted that predicate, you can do exactly as Brad suggested: simply use an atomic store to set the predicate in the function you provide to sync.Once.

@btracey
Copy link
Contributor Author

btracey commented Feb 23, 2017

Okay.

@btracey
Copy link
Contributor Author

btracey commented Feb 23, 2017

Sorry, I thought I understood yesterday, but I don't think I do anymore.

@bradfitz says that "my code permits two goroutines to see HasRun() as false and do duplicate initialization work". This is not true. sync.Once ensures the initialization work only happens once. In my case here, the code in sync.Once is expensive, but does not necessarily need to happen. If it has happened, then the code uses that work as a shortcut, but if it has not happened it can avoid the work by avoiding the data modified by sync.Once entirely.

@griesemer : Isn't having sync.HasRun exactly atomically handling an action depending on the predicate? If the predicate has run, then it does one way, and if it hasn't, it avoids that memory altogether. HasRun is (proposed as) atomic, so it's not a race condition in the "should set off the race detector" sense of the word.

I'm not necessarily trying to change the argument, but you both seem to be suggesting that the above code is bad. I don't understand why. It's not a race condition if the check happens atomically, and it increases the efficiency (in big O) for almost all executions of the code.

@griesemer
Copy link
Contributor

@btracey Think about the code you'd write using the HasRun predicate: Even if HasRun returns false, by the time the caller checks the result of HasRun (which is still false), the code may actually have run (and if you'd call HasRun again it would return true) - hence the race condition.

@btracey
Copy link
Contributor Author

btracey commented Feb 23, 2017

Right, I understand that. This is why there needs to be protection with an atomic or mutex.

The code that uses the answer returns the same number in either condition, it just uses a different means to acquire the number. On the one hand, I can see the argument that this kind of unpredictability is bad, but on the other using sync.Once in the first place doesn't seem all that different -- it's impossible to tell if a specific goroutine will get to use the fast path or have to wait for the call to f() to conclude.

If the argument is just "this is too dangerous a tool to expose", then understood. It seemed like there was an additional "and changing your code to use the atomics is a bad idea" and I'm trying to understand why.

@griesemer
Copy link
Contributor

@btracey It's a dangerous tool to expose exactly because it can't be used correctly without yet another synchronization mechanism. So why not use just one mechanism instead?

@btracey
Copy link
Contributor Author

btracey commented Feb 23, 2017

Well, with HasRun exposed it would be the same synchronization mechanism (the done field in Once), while without the method another (or altogether different) synchronization mechanism is required. However, I understand your point that HasRun gives the impression the call either has or hasn't run, while in reality all that can be said is that the call either has definitely run or it has maybe run.

@golang golang locked and limited conversation to collaborators Feb 23, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants