Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: sync, sync/atomic: add PoolOf, MapOf, ValueOf #47657

Open
ianlancetaylor opened this issue Aug 11, 2021 · 41 comments
Open

proposal: sync, sync/atomic: add PoolOf, MapOf, ValueOf #47657

ianlancetaylor opened this issue Aug 11, 2021 · 41 comments
Labels
generics Issue is related to generics Proposal Proposal-Hold
Milestone

Comments

@ianlancetaylor
Copy link
Contributor

This proposal is for use with #43651. We propose using type parameters to add compile-time-type-safe variants of sync.Pool, sync.Map, and atomic.Value.

This proposal is not targeted for any specific Go release.

package sync

// A PoolOf is a set of temporary objects of type T that may be individually saved and retrieved.
// ...and so forth
type PoolOf[T any] struct {
    ...
    New func() T
}

// Put adds x to the pool.
func (p *PoolOf[T]) Put(x T)

// Get selects an arbitrary item from the Pool, removes it from the
// Pool, and returns it to the caller.
// The bool result reports whether a value was found.
// ...and so forth
//
// If Get would otherwise return the second result as false and p.New is non-nil,
// Get returns the result of calling p.New and true.
func (p *PoolOf[T]) Get() (T, bool)

// MapOf is like a Go map[K]V but is safe for concurrent use
// by multiple goroutines without additional locking or coordination.
// ...and so forth
type MapOf[K comparable, V any] struct { ... }

// Load returns the value stored in the map for a key, or the zero value if no
// value is present.
// The ok result indicates whether value was found in the map.
func (m *MapOf[K, V]) Load(key K) (value V, ok bool)

// Store sets the value for a key.
func (m *MapOf[K, V]) Store(key K, value V)

// LoadOrStore returns the existing value for the key if present.
// Otherwise, it stores and returns the given value.
// The loaded result is true if the value was loaded, false if stored.
func (m *MapOf[K, V]) LoadOrStore(key K, value V) (actual V, loaded bool)

// LoadAndDelete deletes the value for a key, returning the previous value if any.
// The loaded result reports whether the key was present.
func (m *MapOf[K, V]) LoadAndDelete(key K) (value V, loaded bool)

// Delete deletes the value for a key.
func (m *MapOf[K, V]) Delete(key K)

// Range calls f sequentially for each key and value present in the map.
// ... and so forth
func (m *MapOf[K, V]) Range(f func(key K, value V) bool)
package atomic

// A ValueOf provides an atomic load and store of a value of a given type.
// The zero value for a Value returns the zero value of the type from Load.
// Once Store has been called, a Value must not be copied.
//
// A Value must not be copied after first use.
type ValueOf[T any] struct { ... }

// Load returns the value set by the most recent Store.
// It returns the zero value of T if there has been no call to Store for this Value.
func (v *ValueOf[T]) Load() (val T)

// Store sets the value of the Value to x.
func (v *ValueOf[T]) Store(val T)

// Swap stores new into Value and returns the previous value. It returns the zero value
// if the Value is empty.
func (v *ValueOf[T]) Swap(new T) (old T)

// Note: no CompareAndSwap method; it would require a comparable T.
@DeedleFake
Copy link

DeedleFake commented Aug 11, 2021

Would it be possible to remove the double implementation that would result from this using type aliases? For example,

type Pool = PoolOf[interface{}]

I'm not familiar enough with the nuances of type aliases to be sure if this is fully backwards compatible or if it would have weird edge cases, such as in embedding. (I'm not actually entirely sure that it's legal to alias to a generic instantiation.)

@ianlancetaylor
Copy link
Contributor Author

@DeedleFake Unfortunately it's not quite the same. The current version of sync.Pool.Get returns a single result and relies on the caller checking for nil. That doesn't work for sync.PoolOf.Get, because Go has no universal does-not-exist value, so sync.PoolOf.Get has to return two results.

@ianlancetaylor
Copy link
Contributor Author

That said, it's likely that sync.Pool could be reimplemented in terms of sync.PoolOf[interface{}].

@DeedleFake
Copy link

DeedleFake commented Aug 11, 2021

The bool second return seems like it's adding functionality. If I'm reading the docs correctly, the current sync.Pool.Get() never returns nil unless p.New() returns nil, in which case that's the user deciding that that's a valid value, not the pool's implementation using it as a marker value.

The second return in the generic implementation doesn't seem to map to any existing behavior.

Edit: It can return nil in the case that it needs to create a new object but p.New is nil. Is that particularly different than just returning the zero value of T, which will be nil for interface{}?

@ianlancetaylor
Copy link
Contributor Author

For interface{} it's pretty clear that nil is not a real value. For other types, the zero value may be a real value. Perhaps we could get away with it for most uses, but I think it is cleaner to separate the value from whether there was a value in the pool.

@DeedleFake
Copy link

So, just to clarify, what you're saying is that the incompatibility isn't non-obvious nil-equivalents, but rather that the new version isn't just a genercized implementation, but also adds functionality in the form of the ability to tell if a value was in the pool already or was generated on the fly? I'm asking because you had originally said

The current version of sync.Pool.Get returns a single result relies on the caller checking for nil. That doesn't work for sync.PoolOf.Get, because Go has no universal does-not-exist value, so sync.PoolOf.Get has to return two results.

but the current implementation's return of nil isn't particularly useful most of the time. In fact, I was kind of surprised that it doesn't panic in those circumstances. I think that that's where my confusion came from. The current implementation returning nil is, as far as I can tell, completely different and unrelated circumstances from those which determine the value of the second return in the new implementation.

@zephyrtronium
Copy link
Contributor

... adds functionality in the form of the ability to tell if a value was in the pool already or was generated on the fly?

This is exactly the circumstance under which the current Get returns nil, assuming New never returns nil itself.

@DeedleFake
Copy link

DeedleFake commented Aug 12, 2021

No it isn't. The current implementation returns nil, assuming that p.New doesn't itself return nil, if and only if it wants to generate a new value but p.New == nil.

It can't possibly be the same functionality as is proposed here because that would mean that it was both returning nil to indicate that it had generated a new value and was, simultaneously, returning non-nil because it's also returning the generated value. It can't do both at the same time with a single return value.

Edit: You could use it simulate the proposed functionality by manually setting p.New to nil before calling Get(). Is that what you're suggesting? Because that seems rather non-obvious, and requires you to have a completely separate New() function that you keep track of. If you ever actually set p.New, you lose thread-safety, as the documentation states.

Edit 2: The documentation does say that p.New is optional. Huh. Never noticed that before. Still seems non-obvious to me.

Edit 3: Alright, so I did a quick survey and it seems like there are some cases of people using this functionality of sync.Pool. I am still surprised at the API, though. Just feels really weird to me. Might just be the way the documentation is written, though.

Either way, I feel like this is getting further and further off of the main topic. I'm fully in favor of this proposal, regardless of these minor details.

@ianlancetaylor
Copy link
Contributor Author

It's true that if the pool's New field is set, then there is no significant difference between Pool and the proposed PoolOf. But most pools do not set the New field, and in that case the original Pool returns nil to indicate that it couldn't find anything in the pool, and the proposed PoolOf indicates that by returning an additional boolean result.

@seankhliao seankhliao added the generics Issue is related to generics label Aug 12, 2021
@DeedleFake
Copy link

DeedleFake commented Aug 12, 2021

Honestly, the more I think about it the more that what I find strange about the existing API isn't the behavior when p.New is nil, but rather that p.New exists at all. It feels like sync.Pool just has an extra minor convenience feature attached to it that has no particular impact on the actual purpose of the type.

As long as new implementations are being built anyways, could New be removed from PoolOf? Pool could still be reimplemented via PoolOf by just making it

type Pool struct {
  p PoolOf[interface{}]
  New func() interface{}
}

func (p *Pool) Get() interface{} {
  v, ok := p.p.Get()
  if !ok && (p.New != nil) {
    return p.New()
  }
  return v
}

// And Put(), too.

@deanveloper
Copy link

Documentation is going to be very messy with similarly named structures/functions, as will autocomplete. As #47619 (comment) says, maybe it's time for /v2.

@DeedleFake
Copy link

DeedleFake commented Aug 13, 2021

I've been wondering recently if maybe now that modules are a well-accepted thing, everything that isn't key to the functioning of the language should be moved into golang.org/x and properly versioned. That would also allow updates to the standard library to be completely decoupled from updates to the language, especially thanks to the go directive in go.mod. Then just deprecate most of the standard library, and maybe remove it for a full Go 2 release. You could soft remove it, too, by only allowing access to it via low enough go version specifications.

Edit: I forgot to mention that a potential downside to this is complicating bootstrapping a Go compiler onto a system with limited or no network capabilities, as basic functionality could wind up being external accidentally, but I'm not sure that this is nearly a big enough problem to really worry about.

@rsc
Copy link
Contributor

rsc commented Aug 18, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Aug 18, 2021
@CAFxX
Copy link
Contributor

CAFxX commented Aug 23, 2021

// Note: no CompareAndSwap method; it would require a comparable T.

Not having CompareAndSwap is quite unfortunate.

I imagine something like the following could fill the gap, but it would be ugly-ish and it's probably why it has been left out:

func CompareAndSwap[T comparable](v *ValueOf[T], old, new T) (T, bool)

(update: or, obviously, something like)

type ComparableValueOf[T comparable] struct{ 
	v ValueOf[T]
}

func (v *ComparableValueOf[T]) CompareAndSwap(old, new T) (T, bool)

Are we omitting it now because we think there are going to be better ways to do this more cleanly in the future?

@ianlancetaylor
Copy link
Contributor Author

Are we omitting it now because we think there are going to be better ways to do this more cleanly in the future?

One possibility is to permit generic types to have methods that use different, tighter, constraints. Those methods would only be available if the type argument(s) satisfied those constraints. I don't know whether that would be a good idea or not.

@twmb
Copy link
Contributor

twmb commented Sep 1, 2021

p.New exists at all

But most pools do not set the New field

As a counterpoint, I've never not used New when writing a pool. There are also more instances of a sync.Pool in the stdlib with New initialized than not:

$ rg -U 'sync\.Pool\{(\n\s+)?New' --stats --count
src/io/io.go:1
src/reflect/type.go:1
src/fmt/print.go:1
src/fmt/scan.go:1
src/os/dir_unix.go:1
src/encoding/gob/encode.go:1
src/encoding/json/scanner.go:1
src/crypto/tls/conn.go:1
src/net/http/server.go:1
src/net/http/header.go:1
src/net/http/h2_bundle.go:7
src/internal/poll/splice_linux.go:1
src/cmd/compile/internal/types/fmt.go:1

19 matches
35 matched lines
13 files contained matches

vs.

$ rg 'sync\.Pool(;|$)' --stats --count
src/reflect/type.go:1
src/regexp/regexp.go:1
src/regexp/exec.go:1
src/regexp/backtrack.go:1
src/encoding/json/encode.go:1
src/archive/zip/register.go:2
src/math/big/nat.go:1
src/net/http/request.go:1
src/net/http/server.go:3

12 matches
12 matched lines
9 files contained matches

and, some of these instances are sync.Pool fields in structs that are later actually initialized with New. Using the New field means that I can just blindly type-assert any return, rather than always checking if v == nil { initialize } (or, in PoolOf, if !ok { initialize }). This is even more useful if the Get function is used in more than just one location.

In short, New is very valuable. Returning an extra bool from Get is fine by me -- similar to how I blindly type-assert today, I can blindly ignore the bool return in the future. I'm not sure what the worry is about the New field: there's no downside to not setting it, and there's only downside to removing it.

@rogpeppe
Copy link
Contributor

rogpeppe commented Sep 2, 2021

It's maybe worth pointing out that atomic.ValueOf is a little cleaner than atomic.Value when it comes to storing interface types. It seems that this would be valid, for example, which it isn't for atomic.Value:

var v atomic.ValueOf[interface{}]

func init() {
    v.Store("a")
    v.Store(1)
}

@Merovius
Copy link
Contributor

Merovius commented Sep 7, 2021

One way to avoid the boolean return of PoolOf[T].Get is to make New no longer optional. That is, either panic if New is nil, or require a use of NewPool[T any](new func() T) *PoolOf[T] by making the field opaque (and in practice also panic if it is nil). The panics are unfortunate, but as long as it's clearly documented as an API requirement, that seems fine to me.

This assumes that there are no cases where you'd want to use Pool without New, of course. Like @DeedleFake I can't really imagine a circumstance where that would be useful, but maybe I'm not imaginative enough. I looked at the codesearch they linked and at least the first couple of results would actually be well-served by having a New function.

@rsc
Copy link
Contributor

rsc commented Sep 9, 2021

It's maybe worth pointing out that atomic.ValueOf is a little cleaner than atomic.Value when it comes to storing interface types. It seems that this would be valid, for example, which it isn't for atomic.Value:

It depends on how it is implemented. I am not sure that it would necessarily be implemented in a way that allows changing the underlying interface type.

@rsc
Copy link
Contributor

rsc commented Sep 9, 2021

I am increasingly uncomfortable with the long-term effects of the "Of" suffix.
Ian, Robert, and I have been batting around other options and posted a discussion at #48287 to try to solicit more.

@bcmills
Copy link
Contributor

bcmills commented Sep 9, 2021

A thought regarding atomic.ValueOf. If I were new to Go, I would probably tend to assume that, say, an atomic.Value[time.Duration] uses atomic operations on time.Duration values, but what it would actually do under this proposal is allocate boxed time.Duration values and use atomic operations on the pointers.

To me, that merits splitting atomic.Value into several different types, based on the allocation properties we can guarantee for the type. Maybe atomic.Pointer[T] (for any pointer type), atomic.Integer[T] (for any integer type of specific sizes), and one final one for interface types (atomic.InterfaceValue?)?

I'm not sure whether that makes the naming problem easier or harder. 😅

@rogpeppe
Copy link
Contributor

rogpeppe commented Sep 10, 2021

A thought regarding atomic.ValueOf. If I were new to Go, I would probably tend to assume that, say, an atomic.Value[time.Duration] uses atomic operations on time.Duration values, but what it would actually do under this proposal is allocate boxed time.Duration values and use atomic operations on the pointers.

That's an interesting thought. If we had a constructor, say NewValueOf, we could do something like this, where we use a different concrete type under the hood depending on what the actual type is, but also provide different concrete types for when it's important to be able to use the zero value.

@rogpeppe
Copy link
Contributor

To me, that merits splitting atomic.Value into several different types, based on the allocation properties we can guarantee for the type. Maybe atomic.Pointer[T] (for any pointer type), atomic.Integer[T] (for any integer type of specific sizes), and one final one for interface types (atomic.InterfaceValue?)?

Technically you might only need one type AFAICS, as type constraints are expressive enough to allow most of the types that can be accessed atomically (notably absent are struct types containing single pointerlike values).

// DirectAtomicType is an constraint that admits only types that
// can be set directly with system-level atomic APIs.
type DirectAtomicType[V any, K comparable] interface {
	SimpleAtomicType
	~*V |
	~map[K]V |
	~chan V |
	~<-chan V |
	~chan<- V
}

// SimpleAtomicType is a constraint that admits non-pointer
// types that can be set directly with system-level atomic APIs.
type SimpleAtomicType interface {
	~int |
	~int64 |
	~int32 |
	~uint |
	~uintptr |
	~uint64 |
	~uint32
}

// DirectValue is like Value except that it stores its value directly
// inside the struct without bookkeeping overhead.
type DirectValue[T DirectAtomicType[V, K], V any, K comparable] struct {
	x T
}

You'd want to be able to do a generic type-parameter switch though, and generic type aliases would be nice too:

// SimpleDirectValue is an alias for DirectValue that can store
// non-pointer types.
// This is using the type alias feature proposed in issue 46477.
type SimpleDirectValue[T SimpleAtomicType] = DirectValue[T, struct{}, struct{}]

The code might look like this: https://go2goplay.golang.org/p/LVak2XRNC81

@rsc
Copy link
Contributor

rsc commented Oct 27, 2021

Putting this on hold until Go 1.18 is out. We're not going to make any changes to the library for Go 1.18.

@rsc rsc moved this from Active to Hold in Proposals (old) Oct 27, 2021
@rsc
Copy link
Contributor

rsc commented Oct 27, 2021

Placed on hold.
— rsc for the proposal review group

@kf6nux
Copy link

kf6nux commented Apr 19, 2022

Should this Proposal be moved from Hold to Active? The hold condition (Go 1.18 released) is resolved.

@ianlancetaylor
Copy link
Contributor Author

I think we don't yet understand what to do for this kind of change. See also #48287.

@kf6nux
Copy link

kf6nux commented Apr 19, 2022

I see that discussion is labeled as closed. Is it being actively discussed somewhere?

@ianlancetaylor
Copy link
Contributor Author

I don't know. It's early days yet.

@muhlemmer
Copy link
Contributor

About the PoolOf discussion regarding New() vs ok bool, I think you can do without both, because of the type safety it brings with generics.

  1. Using it with struct values is pointless, because passing by value creates a copy. So there won't be memory re-usage;
  2. Is there any real use for sync.PoolOf for build-in (base) types? And even if it will be used like that, Get() will safely return it's zero value, which is a sane state.

IMHO:

type PoolOf[T any] struct {
...
}

func (p *PoolOf[T]) Put(x T)
func (p *PoolOf[T]) Get() T

Can be used like:

type myType struct

func newMyType() *myType{ return &myType }

var pool PoolOf[*myType]

func main() {
    v := pool.Get()
    if v == nil {
        v = newMyType()
    }
    defer pool.Put(v)
    ...
}

@dsnet
Copy link
Member

dsnet commented Feb 25, 2023

A performance advantage of a generic sync.Pool is that we we're no longer limited to only using pointers with it if we want to avoid an allocation. This is commonly the case when trying to pool []byte, where one often resorts to using the more unnatural *[]byte representation instead. The more natural []byte representation causes the slice header to always be heap allocated whenever calling sync.Pool.Put.

In https://go.dev/cl/471200, I struggled with this problem and decided to pool []byte instead of *[]byte since handling []byte resulted in both simpler and overall faster code (even when working against the continuous slice header allocation).

@randall77
Copy link
Contributor

I dealt with the pooling []byte problem in https://go-review.googlesource.com/c/go/+/444820 . I kept a separate set of slice headers that got reused in addition to the backing store itself, to keep the slice header from being (repeatedly) heap allocated.

@dsnet
Copy link
Member

dsnet commented Feb 25, 2023

I agree with @muhlemmer that the ok bool return for sync.Pool.Get is unnecessary, but disagree about getting rid of sync.Pool.New and also disagree that using of sync.Pool with "struct values is pointless". My and @randall77's comment just earlier provide a use-case for non-pointers, where a slice header is semantically equivalent to a struct value.

My argument against ok bool in sync.Pool.Get is as follows:

  • The purpose of using sync.Pool is as a lossy cache for memory allocations.
  • Any cached memory object must contain a pointer somewhere (either explicitly as a *T or implicitly through a []E, map[K]V, or a chan).
  • Therefore, the zero value of T in sync.PoolOf[T] cannot be useful since it contains no pointers to allocated memory.
  • User code can always check for non-nilness instead checking the ok bool return value.
    • In the common case of pooling a *T, []E or map[K]V, these can be trivially checked by comparing to nil.
    • In the rarer (but totally valid) case of pooling a struct, that struct must have a field somewhere that contains a pointer to allocated memory. The user code can check the nil-ness of that pointer.

Yes, dropping ok bool makes it impossible to distinguish between non-presence and a New that happened to return the zero value, but under what circumstance is that a useful distinction? sync.Pool is not meant to be a general purpose map.

@earthboundkid
Copy link
Contributor

Given that #56102 was approved, can this come off of hold and be evaluated?

@thepudds
Copy link
Contributor

Hi @carlmjohnson, there is a decent chance resolution here is at least somewhat intertwined with #48287 and how to approach this category of change?

@ianlancetaylor
Copy link
Contributor Author

Yes, this should now be considered to be on hold for further thoughts along the lines of #48287. We don't have a general plan for updating packages to add generic support.

@rip-create-your-account
Copy link

I'm proposing to use the opportunity to extend sync.Pool to also allow Put() and Get() of multiple objects in a single call. This proposal could be implemented by adding a few new methods:

// PutFromSlice adds all entries from xs to the pool.
//
// It behaves like the following code:
// for _, x := range xs {
//     pool.Put(x)
// }
func (p *PoolOf[T]) PutFromSlice(xs []T)

// FillSlice selects arbitrary items from the Pool, removes them from the Pool,
// and places them into xs.
//
// If there is not enough items to fill the slice, the remaining part of the
// slice is filled with nils. But if p.New is non-nil, the remaining part is
// filled with the results of calling p.New repeatedly.
//
// It behaves like the following code:
// for i := range xs {
//     xs[i] = pool.Get()
// }
func (p *PoolOf[T]) FillSlice(xs []T)

The motivation is performance in the context of packet processing. It's starting to become a pattern in my hot path to have a slice of objects to Get() or Put(). Right now I'm just calling Get/Put in a loop to fill/release such slices. One unfortunately common detail that I have going on is that one goroutine does the Gets and after some processing it's some another goroutine that Puts those objects back to the pool. When under load this ends up meaning that the goroutine doing the Gets mostly ends up taking the slow path of Get(). Get() of a slice of objects surely allows new optimizations in the slow path considering how complex of a data structure Pool is.

Even under light load the common case for me is a Put/Get of len(slice)>=4 and a small hiccup can cause len(slice)>=16. Under heavy load len(slice)>=42 and in the worst case len(slice)>=128. As I explained above under heavy load the Get() path consistently takes the slow path so I would really like to see a batch API of some sort included.

@someview
Copy link

Good idea. This can show clear requirements,not interface that represents any. Type assertions are no longer needed for performance

@karelbilek
Copy link

@ianlancetaylor #48287 seem closed now? I don't understand the "discussion" system so I don't know if there was an actual output from that

@earthboundkid
Copy link
Contributor

math/rand is becoming math/rand/v2 in Go 1.22, so maybe the answer is to add sync/v2 for a generic map.

@ianlancetaylor
Copy link
Contributor Author

At the moment, I agree: our path forward will be a sync/v2 package. We're going to see how math/rand/v2 plays out first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
generics Issue is related to generics Proposal Proposal-Hold
Projects
Status: Hold
Development

No branches or pull requests