Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: x/sync/singleflight: add a method for per-caller post-processing of shared result #56349

Open
costela opened this issue Oct 20, 2022 · 9 comments
Labels
Milestone

Comments

@costela
Copy link
Contributor

costela commented Oct 20, 2022

Background

In order to safely deal with some classes of shared resources that can be acquired via a singleflight group (anything that requires explicit cleanup/freeing; files, shared buffers, etc), there is need for a mechanism to account for all goroutines that received a reference to the shared resource.

It is not sufficient to perform this accouting after the singleflight call returns, since any single waiting goroutine may complete its work on the shared resource before other goroutines start, leading to misbehavior.

Proposal

To solve this missing functionality, there were suggestions (not converted to proposals) to expose singleflights internal count of duplicated calls.

I propose a different approach, inspired by this comment: extend the singleflight.Group API to include a second function parameter, called by every duplicated caller.

E.g.:

sg := &singleflight.Group{}
sg.DoShared(
	"somekey",
	func() (interface{}, error) {
       	// acquire shared resource
	},
	func(shared interface{}, err error) {
		// perform arbitrary per-caller post-processing;
		// guaranteed to run before any caller returns.
	},
)

Pros:

  • explicit
  • versatile
  • has no impact on existing usage
  • leaves error handling to the consumer (with the exception of panics)

Cons:

  • increases API surface area
  • less intuitive API

Relates to golang/sync#9 and golang/sync#20

@ianlancetaylor
Copy link
Contributor

I don't understand when this post-processing function would be called. The singleflight code just returns the value to the caller. It can't know when the caller is done with the value.

@costela
Copy link
Contributor Author

costela commented Oct 20, 2022

@ianlancetaylor the second function parameter would be called after the first function parameter returns, but before the Do() method returns to the caller, ensuring all callers (the one that actually executed the closure as well as any of the others) get a chance to do something to the shared value. This something must not be all the caller wants to do with the resource (it doesn't need to be "done" with the value), but should be enough to ensure accurate accounting.

One way to use it would be something like:

type ExpensiveResource { /* something */ }

func (ExpensiveResource) Cleanup() { /* something */ }

type sharedExpensiveResource struct {
	readers atomic.Int64
	ExpensiveResource
}

func (s *sharedExpensiveResource) Cleanup() {
	if left := s.readers.Add(-1); left == 0 {
		s.ExpensiveResource.Cleanup()
	}
}

sf := &singleflight.Group{}

func somePotentiallyConcurrentFuntion() {
	out, err := sf.DoShared(
		"foo",
		func() (interface{}, error) {
			s := &sharedExpensiveResource{
				ExpensiveResource: /* expensive stuff */, 
			}
			s.readers.Add(1)
		},
		func(in interface{}, error) {
			res := in.(*sharedExpensiveResource)
			res.readers.Add(1)
		},
	)

	res := out.(*sharedExpensiveResource)

	res.DoStuff()
	res.Cleanup()
}

All calls of somePotentiallyConcurrentFunction are guaranteed they will only get to res.Cleanup() after all other calls sharing the same sharedExpensiveResource got their chance to run their "accounting" (in this case res.readers.Add(1)).

Does that clear it up?

Or am I being obtuse? 😅

@ianlancetaylor
Copy link
Contributor

Just to understand the proposal, would it be the same to write

func DoShared() *sharedExpensiveResource {
	out, err, _ := sf.Do("foo", 
		func() (interface{}, error) {
			s := &sharedExpensiveResource{
				ExpensiveResource: /* expensive stuff */, 
			})
	res := in.(*sharedExpensiveResource)
	res.readers.Add(1)
	return res
}

and then always call that DoShared function instead of sf.DoShared?

@seankhliao
Copy link
Member

I don't understand how readers in the example would ever be accurate?

@ianlancetaylor
Copy link
Contributor

Is there a key difference that would make it accurate when the proposed DoShared method?

@costela
Copy link
Contributor Author

costela commented Oct 21, 2022

@ianlancetaylor

would it be the same to write

I don't think it would: in this case there is no guarantee the callers of DoShared would have a consistent view of readers. If two goroutines (G1, G2) enter DoShared concurrently and end up in the same singleflight call, they are only guaranteed to be synchronized up to the point where Do returns. After that G1 can do all its work with sharedExpensiveResource and (crucially!) call Cleanup before G2 ever gets to res.readers.Add.

Or am I overlooking something obvious?

@seankhliao sorry, do you mean Ian's example or mine?

@ianlancetaylor
Copy link
Contributor

@costela Thanks. I guess the proposal is that before we return a value to any caller, we call the new function as many times as there are callers. Or perhaps instead of calling it several times we should instead call a function once with the number of results we are going to return.

@costela
Copy link
Contributor Author

costela commented Oct 21, 2022

@ianlancetaylor the advantage of having a per-caller closure would be allowing each caller to decide if they're still interested in sharing the resource after a potentially long time waiting for the singleflight.

But I admit: that's maybe veering too far into speculation territory. The use-cases that were brought up until now would AFAICT also work with a simple reference counter.

@costela
Copy link
Contributor Author

costela commented Mar 16, 2023

Just for the record, because I had this discussion somewhere else: one alternative would be introducing some single-method interface, which the returned any could optionally implement. Something like:

type Sharer interface {
    Share()
}

This single method would then be called before Do returns to any caller, giving the same guarantees as above.

This has the advantage of not increasing the API surface area, at the price of making the code more implicit and less discoverable. There's also the risk of "accidental implementation", which would need further workarounds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

4 participants