Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync/atomic: add Uint64Pair #61236

Open
haraldrudell opened this issue Jul 8, 2023 · 34 comments
Open

sync/atomic: add Uint64Pair #61236

haraldrudell opened this issue Jul 8, 2023 · 34 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted Proposal Proposal-Accepted
Milestone

Comments

@haraldrudell
Copy link

I suggest CAS2 is implemented in the atomic package similar to atomic.align64, operational for hardware that supports it.

possibly also:

  • atomic OR
  • LL/SC Load-link/store-conditional

  • With the 2022 memory model, go1.19 atomics became useful after years of unproductive near-sighted discussions
  • However, wait-free algorithms now use CAS2, ie. 128-bit atomic, which is supported on most cpus but not in Go
  • implementing CAS spinners in lock-free applications produces long thread wait-times or 100% cpu, so wait-free is better: threads are neither suspended or spinning. We will have 100 cores and 100 GiB RAM from Apple soon enough
  • what is desired is a performant wait-free queue and wait-free free-list stack
  • fast-path of wCQ Nikolaev Ravindran 2022 and wfq Yang Mellor-Crummey 2016
  • this would enable an initial thread-pool to have wait-free and allocation-free data sink

darwin-arm64 darwin-amd64 linux-amd64

I think also atomic generics could do with another polish
Like this AtomicMax that is more flexible in what underlying integer types and named types can be used: https://github.com/haraldrudell/parl/blob/main/atomic-max.go

  • by inventing align64 only in atomic, 64-bit things must go in atomic
@gopherbot gopherbot added this to the Proposal milestone Jul 8, 2023
@seankhliao seankhliao changed the title proposal: support CAS2 in Go affected/package: atomic proposal: sync/atomic: support CAS2 Jul 8, 2023
@bcmills bcmills added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 10, 2023
@ianlancetaylor
Copy link
Contributor

I think the proposal here is

// CompareAndSwapUint128 executes the compare-and-swap operation for a 128-bit
// integer, represented as a pair of 64-bit integers.
func CompareAndSwapUint128(addr *uint64, old1, old2, new1, new2 uint64) (swapped bool)

But see also #9455. If we adopt that, then this version of the function would no longer make sense. So perhaps the proposal here should be

// CompareAndSwapUint64Pair executes the compare-and-swap operation for a 128-bit
// integer, represented as a pair of 64-bit integers.
func CompareAndSwapUint64Pair(addr *uint64, old1, old2, new1, new2 uint64) (swapped bool)

Presumably hardware does not have a 128-bit CAS operation could implement this using something like the lock table in runtime/internal/atomic/atomic_arm.go.

@ianlancetaylor ianlancetaylor changed the title proposal: sync/atomic: support CAS2 proposal: sync/atomic: add CompareAndSwapUint128 Jul 12, 2023
@randall77
Copy link
Contributor

Perhaps

func CompareAndSwapUint64Pair(addr *[2]uint64, old, new [2]uint64) (swapped bool)

On x86 at least, the hardware instructionCMPXCHG16B requires its argument to be 16-byte aligned. I'm not sure how we would enforce that. It sounds like #599 all over again.

@merykitty
Copy link

merykitty commented Jul 12, 2023

We could have atomic.Uint64Pair which can be enforced specially by the compiler to have 16-byte alignment. An API point to operate on arbitrary pairs of uint64 seems improbable.

@haraldrudell
Copy link
Author

  1. CAS2 atomic.CompareAndSwap is one required 128-bit instruction for wait-Free 2016+
  2. The other is fetch-and-add FAA atomic.Add
  3. LL/SC is only used if the other 2 do not exist and is actually more complicated, likely for some unusual platforms that are not arm64 or amd64
  4. If there is none of that, it’s slow-path

There is a Go package that uses array of uint64, [2]uint64 in its api:
github.com/CAFxX/atomic128

Generally, the data used is not a 128-bit number but a composite of often two 64-bit numbers that are changed atomically as a pair

@CAFxX
Copy link
Contributor

CAFxX commented Jul 17, 2023

There is a Go package that uses array of uint64, [2]uint64 in its api: https://github.com/CAFxX/atomic128

FTR this was just out of necessity (due to the lack of uint128) more than anything else. It's true that both intel and arm define their DWCAS instructions as operating on pairs of 64-bit values, but that does not automatically mean we should expose that in our APIs (especially given the alignment requirements that are uncharacteristic of uint64s)

@rsc
Copy link
Contributor

rsc commented Aug 9, 2023

Which systems would support this function?

@rsc
Copy link
Contributor

rsc commented Aug 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@CAFxX
Copy link
Contributor

CAFxX commented Aug 10, 2023

Which systems would support this function?

Natively: I'm aware of at least amd64 with CMPXCHG16B (AMD64v2 and above) and aarch64/ARMv8.1 with LSE

Others will have to be emulated (I just use a mutex in my library in this case, but other implementations are possible e.g. for aarch64 there are DW LL/SC instructions that were available before the native DWCAS)

@rsc
Copy link
Contributor

rsc commented Aug 16, 2023

We don't want to have an API that's unavailable or always fails on 32-bit systems, so what do we do on 32-bit systems?
If we use a lock on 32-bit systems then we need to provide ways for code to read the 64-bit halves or the whole 128-bit value. (Currently there's no way to find out what the value has!)

Perhaps the right path forward is to define a type wrapper like atomic.Uint64 and not the un-wrapped one.

package atomic
type Uint64Pair struct { ... unexported ... }
func (x *Uint64Pair) Load() [2]uint64
func (x *Uint64Pair) Store(new [2]uint64) 
func (x *Uint64Pair) CompareAndSwap(old, new [2]uint64) bool
func (x *Uint64Pair) Swap(new [2]uint64) (old [2]uint64)

But there would not be corresponding top-level functions, to avoid direct access to the atomic uint64 values.

Edit: Added Swap.

@someview
Copy link

someview commented Aug 26, 2023

How about add something like MarkableReference and TimeStampedReference in java,just extend the val :

func CompareAndSwapUint64Pointer(oldPtr unsafe.Pointer, newPtr unsafe.Pointer, new1, new2 uint64) (swapped bool)

This is useful for lock free data structure:
hashmap.add

@rsc
Copy link
Contributor

rsc commented Aug 30, 2023

@someview, I am not sure I understand what you mean by "just extend the val" here. It sounds like you mean assume that oldPtr and newPtr both point to a pair of uint64 values, but if we do that, then there will be a problem on 32-bit systems (which must use locks to implement cas128) that the individual uint64s will be available for ordinary reads or even atomic.LoadUint64, neither of which will be correct since they will not use the lock too. The point of the Uint64Pair type is to hide the actual uint64 values behind an abstraction that only allows using them in this specific way.

@someview
Copy link

If cas128 must support

@someview, I am not sure I understand what you mean by "just extend the val" here. It sounds like you mean assume that oldPtr and newPtr both point to a pair of uint64 values, but if we do that, then there will be a problem on 32-bit systems (which must use locks to implement cas128) that the individual uint64s will be available for ordinary reads or even atomic.LoadUint64, neither of which will be correct since they will not use the lock too. The point of the Uint64Pair type is to hide the actual uint64 values behind an abstraction that only allows using them in this specific way.

This may be double-word CAS,not atomic128 with 32-bit system

@rsc
Copy link
Contributor

rsc commented Aug 30, 2023

It sounds like people are generally happy with #61236 (comment).
Have all remaining concerns about this proposal been addressed?

@randall77
Copy link
Contributor

We probably want Swap also, the other types have that.

@rsc
Copy link
Contributor

rsc commented Aug 31, 2023

Added Swap, thanks.

@CAFxX
Copy link
Contributor

CAFxX commented Aug 31, 2023

I suspect quite a few uses of a DWCAS would be to deal with pointers... Should we consider also PointerPair/[2]unsafe.Pointer in this proposal (with the caveat that the size will differ between 32 and 64 bit archs), or should we leave it for later?

@ianlancetaylor
Copy link
Contributor

I suggest that PointerPair be a different proposal, since, as you say, it's not necessarily a 128-bit issue. People might in principle be running into it today on 32-bit systems. That separate proposal should ideally point out some algorithms that would benefit from a PointerPair implementation. Thanks.

@rsc
Copy link
Contributor

rsc commented Sep 7, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@zephyrtronium
Copy link
Contributor

Somehow I missed this until now. To echo @CAFxX, I'm not really sure what algorithms can actually make use of uint64 pairs in particular, i.e. without needing any pointers in the pair. The classical wait-free linked list algorithm using atomic pairs needs one or two pointers, depending on the element type. wCQ seems to need zero or one depending on the element type. wFQ doesn't actually seem to use 128-bit atomics, unless I missed it in my skim, and if it would then at least one would be a pointer.

@haraldrudell's opening post says, "what is desired is a performant wait-free queue and wait-free free-list stack." That doesn't seem to be possible under unsafe.Pointer rules with just integer operations, barring implementing a whole manual allocator to circumvent Go's garbage collector.

So, it seems like the only potential use case of this is a fixed-buffer wait-free queue with uint64 elements. That seems dubious to me. What am I missing?

@zephyrtronium
Copy link
Contributor

I have been thinking more about this. Is there any reason not to make the API generic?

package atomic
type Pair[Fst ~*FstElem | ~uintptr, Snd ~*SndElem | ~uintptr, FstElem, SndElem any] struct { ... }
func (*Pair[Fst, Snd, _, _]) Load() (Fst, Snd)
func (*Pair[Fst, Snd, _, _]) Store(Fst, Snd) 
func (*Pair[Fst, Snd, _, _]) CompareAndSwap(oldFst Fst, oldSnd Snd, newFst Fst, newSnd Snd) bool
func (*Pair[Fst, Snd, _, _]) Swap(Fst, Snd) (Fst, Snd)

Per #61236 (comment), I'm happy to open a separate proposal for this since it is not a 128-bit issue per se. I'm posting here first because I don't think the current proposal meets its own goals, whereas this does. The list of algorithms that would benefit is the same (unless someone has an example otherwise); it would only make the implementations of those algorithms much simpler and safer as well as being the same on 32-bit platforms as on 64-bit, except where uint64 elements in particular would appear.

@rsc rsc changed the title proposal: sync/atomic: add CompareAndSwapUint128 proposal: sync/atomic: add Uint64Pair Sep 20, 2023
@rsc
Copy link
Contributor

rsc commented Sep 20, 2023

The generic version seems untenable to me. How do I put one of those in a struct?

type MyType struct {
    Field atomic.Pair[*byte, *int, byte, int]
}

is pretty awkward. And what if one type is uintptr?

    Field atomic.Pair[uintptr, *int, ????, int]

What goes in the ????? I guess we can provide anything but then explaining why 'any' appears here:

    Field atomic.Pair[uintptr, *int, any, int]

is very awkward.

We don't have answers for these questions in the current generics, nor in the future plans.

If we don't try to generify, it seems like we need three types: Uint64Pair, PointerPair[T1, T2], and Uint64AndPointer[T].
That's not ideal, but it works, and this is sync/atomic, which is never that pretty.

@CAFxX
Copy link
Contributor

CAFxX commented Sep 20, 2023

Uint64AndPointer[T]

This should maybe be UintptrAndPointer[T], considering 32 bit architectures.

@rsc
Copy link
Contributor

rsc commented Oct 3, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Oct 4, 2023

Right now it looks like the proposal is to add three types with this API:

package atomic
type Uint64Pair struct { ... unexported ... }
func (x *Uint64Pair) Load() (uint64, uint64)
func (x *Uint64Pair) Store(uint64, uint64) 
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

type PointerPair[T1, T2 any] struct { ... unexported ... }
func (x *PointerPair[T1, T2]) Load() (*T1, *T2)
func (x *PointerPair[T1, T2]) Store(new1 *T1, new2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

type Uint64AndPointer[T1 any] struct { ... unexported ... }
func (x *Uint64AndPointer[T1]) Load() (uint64, *T1)
func (x *Uint64AndPointer[T1]) Store(uint64, *T1)
func (x *Uint64AndPointer[T1]) CompareAndSwap(old1 uint64, old2 *T1, new1 uint64, new2 *T2) bool
func (x *Uint64AndPointer[T1]) Swap(new1 uint64, new2 *T1) (old1 uint64, old2 *T1)

And maybe also UintptrAndPointer.

The arrays are gone because they only apply in Uint64Pair.

It's not clear if this is the road we want to go down, but I don't see another one.

@rsc
Copy link
Contributor

rsc commented Oct 11, 2023

This is the API we think we'd use. Is there anyone who has a use case that isn't addressed by this? And is there anyone who has a use case that is addressed by this? (We want to make sure we're not adding something no one needs.) Thanks!

@zephyrtronium
Copy link
Contributor

PointerPair with CAS makes it straightforward to implement an efficient lock-free linked list with arbitrary elements, which directly addresses a use case I have. Uint64AndPointer does the same for uint64 elements, which is probably still useful. It's a bit of a shame that having them as separate types means that the same type can't provide both varieties of linked list, but I'm not sure how common the latter is anyway.

If @haraldrudell or someone else could check that my reading of wCQ in #61236 (comment) is correct, that would confirm another use case for Uint64AndPointer.

@rsc
Copy link
Contributor

rsc commented Oct 26, 2023

Have all remaining concerns about this proposal been addressed?

Add to sync/atomic:

package atomic

// A Uint64Pair is an atomic pair of uint64 values.
// The zero value is a pair of zeros.
type Uint64Pair struct { ... unexported ... }

// Load atomically loads and returns the pair stored in x.
func (x *Uint64Pair) Load() (v1, v2 uint64)

// Store atomically stores the pair v1, v2 into x.
func (x *Uint64Pair) Store(v1, v2 uint64) 

// CompareAndSwap executes the compare-and-swap operation for x.
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool

// Swap atomically stores new1, new2 into x and returns the old pair.
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

// A PointerPair is an atomic pair of pointer values.
// The zero value is a pair of nils.
type PointerPair[T1, T2 any] struct { ... unexported ... }

// Same doc comments as above.
func (x *PointerPair[T1, T2]) Load() (v1 *T1, v2 *T2)
func (x *PointerPair[T1, T2]) Store(v1 *T1, v2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

// A Uint64AndPointer is an atomic pair consisting of a uint64 and a pointer.
// The zero value is a zero uint64 and a nil pointer.
type Uint64AndPointer[T any] struct { ... unexported ... }

// Same doc comments as above.
func (x *Uint64AndPointer[T]) Load() (v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) Store(v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) CompareAndSwap(old1 uint64, old2 *T, new1 uint64, new2 *T) bool
func (x *Uint64AndPointer[T]) Swap(new1 uint64, new2 *T) (old1 uint64, old2 *T)

@rsc
Copy link
Contributor

rsc commented Nov 1, 2023

Note that like with atomic.Int64, the compiler and toolchain will provide the necessary alignment automatically, so there's no need to worry about it or mention it here.

@rsc
Copy link
Contributor

rsc commented Nov 2, 2023

This seems like a likely accept but we may well not get to it until Go 1.23 at this point.

@rsc
Copy link
Contributor

rsc commented Nov 2, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

Add to sync/atomic:

package atomic

// A Uint64Pair is an atomic pair of uint64 values.
// The zero value is a pair of zeros.
type Uint64Pair struct { ... unexported ... }

// Load atomically loads and returns the pair stored in x.
func (x *Uint64Pair) Load() (v1, v2 uint64)

// Store atomically stores the pair v1, v2 into x.
func (x *Uint64Pair) Store(v1, v2 uint64) 

// CompareAndSwap executes the compare-and-swap operation for x.
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool

// Swap atomically stores new1, new2 into x and returns the old pair.
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

// A PointerPair is an atomic pair of pointer values.
// The zero value is a pair of nils.
type PointerPair[T1, T2 any] struct { ... unexported ... }

// Same doc comments as above.
func (x *PointerPair[T1, T2]) Load() (v1 *T1, v2 *T2)
func (x *PointerPair[T1, T2]) Store(v1 *T1, v2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

// A Uint64AndPointer is an atomic pair consisting of a uint64 and a pointer.
// The zero value is a zero uint64 and a nil pointer.
type Uint64AndPointer[T any] struct { ... unexported ... }

// Same doc comments as above.
func (x *Uint64AndPointer[T]) Load() (v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) Store(v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) CompareAndSwap(old1 uint64, old2 *T, new1 uint64, new2 *T) bool
func (x *Uint64AndPointer[T]) Swap(new1 uint64, new2 *T) (old1 uint64, old2 *T)

@rsc
Copy link
Contributor

rsc commented Nov 10, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

Add to sync/atomic:

package atomic

// A Uint64Pair is an atomic pair of uint64 values.
// The zero value is a pair of zeros.
type Uint64Pair struct { ... unexported ... }

// Load atomically loads and returns the pair stored in x.
func (x *Uint64Pair) Load() (v1, v2 uint64)

// Store atomically stores the pair v1, v2 into x.
func (x *Uint64Pair) Store(v1, v2 uint64) 

// CompareAndSwap executes the compare-and-swap operation for x.
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool

// Swap atomically stores new1, new2 into x and returns the old pair.
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

// A PointerPair is an atomic pair of pointer values.
// The zero value is a pair of nils.
type PointerPair[T1, T2 any] struct { ... unexported ... }

// Same doc comments as above.
func (x *PointerPair[T1, T2]) Load() (v1 *T1, v2 *T2)
func (x *PointerPair[T1, T2]) Store(v1 *T1, v2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

// A Uint64AndPointer is an atomic pair consisting of a uint64 and a pointer.
// The zero value is a zero uint64 and a nil pointer.
type Uint64AndPointer[T any] struct { ... unexported ... }

// Same doc comments as above.
func (x *Uint64AndPointer[T]) Load() (v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) Store(v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) CompareAndSwap(old1 uint64, old2 *T, new1 uint64, new2 *T) bool
func (x *Uint64AndPointer[T]) Swap(new1 uint64, new2 *T) (old1 uint64, old2 *T)

@rsc rsc changed the title proposal: sync/atomic: add Uint64Pair sync/atomic: add Uint64Pair Nov 10, 2023
@rsc rsc modified the milestones: Proposal, Backlog Nov 10, 2023
@Zheng-Xu
Copy link
Contributor

Zheng-Xu commented Nov 13, 2023

In https://pkg.go.dev/sync/atomic#pkg-note-BUG , it is said

On ARM, 386, and 32-bit MIPS, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically via the primitive atomic functions (types Int64 and Uint64 are automatically aligned). The first word in an allocated struct, array, or slice; in a global variable; or in a local variable (because the subject of all atomic operations will escape to the heap) can be relied upon to be 64-bit aligned.

So far, we don't have 16-byte aligned data structure in Go. Will we provide the capability to cast something else to *Uint64Pair, for example the address of a struct to *Uint64Pair?

More specifically, can we guarantee below Load() is atomic?

var comp complex128
(*atomic. Uint64Pair)(unsafe.Pointer(&comp)).Load()

@cherrymui
Copy link
Member

@Zheng-Xu as commented in #61236 (comment) , atomic. Uint64Pair will be properly aligned, to 16-byte if the architecture's atomic operations require it. That said, the alignment of complex128 is not changed, so that unsafe conversion is not safe.

@mauri870
Copy link
Member

mauri870 commented Dec 8, 2023

I would be happy to work on this for the 1.23 cycle, given that I have previous experience implementing internal and sync/atomic APIs. I'll keep it assigned unless someone else is interested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted Proposal Proposal-Accepted
Projects
Status: Accepted
Development

No branches or pull requests