Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync/atomic: add OR/AND operators for unsigned types #61395

Open
aktau opened this issue Jul 17, 2023 · 57 comments · May be fixed by #64331
Open

sync/atomic: add OR/AND operators for unsigned types #61395

aktau opened this issue Jul 17, 2023 · 57 comments · May be fixed by #64331

Comments

@aktau
Copy link
Contributor

aktau commented Jul 17, 2023

Update, Aug 16 2023: Current proposal at #61395 (comment).


Original related proposal: #31748

Use case: we have types with methods that set a value. These methods manipulate a bitset indicating that the value was set, which is used (e.g.) for data serialization. Users of this API know to use a lock to manage concurrently reading/writing the same field, but they are allowed to concurrently write to different fields. Given that the bitsets are logically shared between different fields, we must manipulate them atomically. Currently that takes the form of a CAS loop:

type MyStruct struct{
  x int32
  y int32
  // ...

  present uint32 // Contains "present" bits for 32 fields.
}

func (m *MyStruct) SetX(v int32) {
  m.x = v
  setPresent(&m.present[0], 1)
}

func (m *MyStruct) SetY(v int32) {
  m.y = v
  setPresent(&m.present[0], 2)
}

func setPresent(part *uint32, num uint32) {
	for {
		old := atomic.LoadUint32(part)
		swapped := atomic.CompareAndSwapUint32(part, old, old|(1<<(num%32)))
		if swapped {
			return
		}
		// Yield and then try the swap again.
		runtime.Gosched()
	}
}

// similar for clearPresent, but with AND.

But, on x86-64, there are atomic versions of AND/OR that do this in one go, as mentioned in #31748. Using this would not only make the setters faster, it would likely also allow inlining them: setPresent is too complex to inline.

cc @aclements @ianlancetaylor @randall77

@rsc
Copy link
Contributor

rsc commented Jul 17, 2023

But, on x86-64, there are atomic versions of AND/OR that do this in one go, as mentioned in #31748. Using this would not only make the setters faster, it would likely also allow inlining them: setPresent is too complex to inline.

setPresent should not be calling runtime.Gosched. There's nothing another goroutine is going to do to help move things along (unlike, say, waiting for a mutex to be unlocked).

If you remove the Gosched, it should be inlinable. At least, -m says this is inlinable:

func orcas(x *uint32, bit uint32) {
	for {
		old := atomic.Load(x)
		if atomic.Cas(x, old, old|bit) {
			break
		}
	}
}

I wrote this in runtime/atomic_test.go:

package runtime_test

import (
	"runtime/internal/atomic"
	"testing"
)

func BenchmarkCas(b *testing.B) {
	var x uint32
	for i := 0; i < b.N; i++ {
		for {
			old := atomic.Load(&x)
			if atomic.Cas(&x, old, old|1) {
				break
			}
		}
	}
}

func BenchmarkOr(b *testing.B) {
	var x uint32
	for i := 0; i < b.N; i++ {
		atomic.Or(&x, 1)
	}
}

The result is:

BenchmarkCas-16                  	154634635	         7.429 ns/op
BenchmarkOr-16                   	270709120	         4.462 ns/op

So this would be an x86-only optimization that wins less than 2X. I'm not sure this really makes sense given how unusual it typically is. The main argument I can see is inlinability, but it seems to be inlinable already if written efficiently.

@aktau
Copy link
Contributor Author

aktau commented Jul 17, 2023

I was wondering about runtime.Gosched() as well, I couldn't figure out why it would help. Thanks for highlighting that it is indeed unnecessary.

So this would be an x86-only optimization that wins less than 2X. I'm not sure this really makes sense given how unusual it typically is. The main argument I can see is inlinability, but it seems to be inlinable already if written efficiently.

A 2x win may still be significant, but I'll know more when I add a better benchmarking setup. But also I found LDSET/LDSETA/LDSETAL/LDSETL for ARM, quoting:

Atomic bit set on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, performs a bitwise OR with the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

I think the "atomically" may be referring to the entire operation. To be honest the mnemonic doesn't make it sound like this is a bitwise or. Perhaps an ARM expert could chime in.

@rsc
Copy link
Contributor

rsc commented Jul 17, 2023

Thanks for the link to LDSETAL. That looks like it might be good enough, although we would have to think carefully about whether the acquire-release semantics it guarantees matches the sequentially consistent semantics we want for sync/atomic when limited to the OR operation. Perhaps it does but perhaps not.

@aclements
Copy link
Member

But also I found LDSET/LDSETA/LDSETAL/LDSETL for ARM

That's correct. ARMV8.1 added instructions for atomic And (also, Min/Max, and, for some reason, Xor). LDCLR, LDCLRA, LDCLRAL, LDCLRL is the equivalent for atomic And.

@aclements
Copy link
Member

A 2x win may still be significant

I'll note that there was also no contention in @rsc's benchmark. Typically, CAS loops collapse far worse under contention than direct atomic operations.

@rsc
Copy link
Contributor

rsc commented Jul 19, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@zephyrtronium
Copy link
Contributor

This might be a stretch, but another option could be to rewrite the simplest form of the CAS loop to the corresponding atomic operation on platforms that support it, similar to how "intrinsics" for encoding/binary work. That way old code using the thing that works now benefits as well. New API would still be convenient, but I think atomic flag sets that want it are rare, and if it's intrinsified then it can live in any package. I am given to understand this would be harder than existing rewrite rules, though, because it spans more than a single basic block. It may also be hard to formulate proofs that the rewrite preserves semantics.

@aclements
Copy link
Member

This might be a stretch, but another option could be to rewrite the simplest form of the CAS loop to the corresponding atomic operation on platforms that support it

@randall77 actually prototyped exactly this. I'm not necessarily opposed to having these rewrite rules, but I don't think they're a substitute for just giving people the API that they need. Often, when writing low-level code like this, the developer wants to say what they mean and have a good sense of what's actually going on, so having to express this operation through a rather opaque rewrite rule seems too subtle.

@rsc
Copy link
Contributor

rsc commented Jul 26, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Aug 2, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: sync/atomic: add OR/AND operators for unsigned types sync/atomic: add OR/AND operators for unsigned types Aug 2, 2023
@rsc rsc modified the milestones: Proposal, Backlog Aug 2, 2023
@mateusz834
Copy link
Member

What new API additions were proposed here?

@rsc
Copy link
Contributor

rsc commented Aug 2, 2023

I guess we all assumed we knew what the API was but forgot to write it down. Oops!
I think the new API is take anything that says Add in the current sync/atomic package and s/Add/And/ and also s/Add/Or/.
So there would be both new top-level functions and new methods on Int32, Int64, Uint32, Uint64, and Uintptr.

@mauri870
Copy link
Member

mauri870 commented Aug 5, 2023

Anyone currently working on this one? I've made some progress already and would like to finish the work. If that is ok please feel free to assign the issue to me. Thanks.

@mauri870
Copy link
Member

mauri870 commented Aug 5, 2023

I see that the current internal/atomic Or/And don't return a new value as opposed to the Add apis.

If we want to keep the same api style as Add we need to change Or/And to return a new value instead of applying the bitwise operation inplace.

Just want to confirm that this is indeed an expected outcome, I see that it is used in a couple places in runtime and atomic tests.

@randall77
Copy link
Contributor

We should make the new API for sync/atomic consistent with what is already in sync/atomic. So And/Or should return the new value.

If that makes it inconsistent with runtime/internal/atomic, so be it. We can always clean up runtime/internal/atomic later. (Or maybe you are saying we should make them consistent as part of this change so we can share SSA ops like ssa.AtomicOr8? That would be fine.)

One thing that bugs me a bit: there is a choice to return the new value or the old value. With Add the choice doesn't really matter, as you can always derive the other choice by subtracting/adding the arg you passed to Add. But that doesn't work with And/Or as they are not reversible. You can get the new value from the old value, but not vice versa. I guess we should think a bit about what that return value might be used for. (I guess if you really needed to know the old value you could use a CAS loop instead.)

@aclements
Copy link
Member

It seems really unfortunate to have these operations destroy useful information by returning the new value. In isolation, clearly we would want them to return the old value, but that would be confusingly inconsistent with the rest of sync/atomic.

These operations could return both the old and new value, like:

func AndUint32(addr *uint32, mask uint32) (old, new uint32)
func (x *Uint32) And(mask uint32) (old, new uint32)

This would allow us to return the old value, while making the return signature unmistakably different from other functions in sync/atomic to avoid confusion. I'm sure most uses would assign one result or the other (or possibly both) to _, and the compiler could easily optimize away the unnecessary parts of the operation. In general it can be easy to mistake the order of multiple results of the same type, but in this case the result order is consistent with many other norms like left-to-right timelines and LTR writing.

This reminds me a lot of GCC's atomic intrinsics, where several have both an __atomic_OP_fetch and an __atomic_fetch_OP variant that differ in whether they return the old value or the new value. In Go we don't have the limitation of returning just one or the other.

The one downside I see to returning both old and new is that these functions couldn't be used as part of a larger expression. Users could of course wrap it in their own operation that returned just one result or the other, and use that in expression contexts.

@mauri870
Copy link
Member

mauri870 commented Aug 6, 2023

If that makes it inconsistent with runtime/internal/atomic, so be it. We can always clean up runtime/internal/atomic later. (Or maybe you are saying we should make them consistent as part of this change so we can share SSA ops like ssa.AtomicOr8? That would be fine.)

I haven't consider touching the SSA. I'm not really familiar with it to pull this off. My idea is to implement Or32/Or64 and And32/And64 in internal/atomic to serve as basis for the sync/atomic work. In order to not mess up the current Or/And and variants I will leave them untouched for now.

These operations could return both the old and new value, like:

I like this approach, it is unfortunate that it prevents its use in larger expressions, at least this approach is explicit about the returns so people don't accidentally use old for new and vice versa. Another option is just to stick with the api we currently have for add, returning the new value only

@aclements
Copy link
Member

I like this approach, it is unfortunate that it prevents its use in larger expressions

Looking at the handful of uses in the runtime, in fact, none of them use the result at all. So I'm not too worried about the inability to use these operations in a larger expression.

at least this approach is explicit about the returns so people don't accidentally use old for new and vice versa.

+1

@gopherbot
Copy link

Change https://go.dev/cl/543035 mentions this issue: runtime/race: update race syso files to support atomic And, Or

gopherbot pushed a commit that referenced this issue Nov 16, 2023
TSAN recently got support for Go's new atomic And and Or
operations (#61395). This CL updates the race syso files to
include the change. Also regenerate cgo dynamic imports on darwin.

OpenBSD/AMD64 is not updated, as TSAN no longer supports OpenBSD
(#52090).

Linux/PPC64 is not updated, as I'm running into some builder
issues. Still working on it.

For #61395.
For #62624.

Change-Id: Ifc90ea79284f29a356f9e8a5f144f6c690881395
Reviewed-on: https://go-review.googlesource.com/c/go/+/543035
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
mauri870 added a commit to mauri870/go that referenced this issue Nov 16, 2023
Without having all the architectures implementing the And/Or operators
merged I can't proceed with the public sync/atomic apis. This CL adds a
generic implementation that should work for all the remaining arches,
while waiting for the native assembly implementations.

For golang#61395
mauri870 added a commit to mauri870/go that referenced this issue Nov 16, 2023
Without having all the architectures implementing the And/Or operators
merged I can't proceed with the public sync/atomic apis. This CL adds a
generic implementation that should work for all the remaining arches,
while waiting for the native assembly implementations.

For golang#61395
@gopherbot
Copy link

Change https://go.dev/cl/543175 mentions this issue: runtime/internal/atomic: add generic implementation for And/Or

@gopherbot
Copy link

Change https://go.dev/cl/543397 mentions this issue: runtine/race: update race syso for PPC64LE

gopherbot pushed a commit that referenced this issue Nov 20, 2023
Without having all the architectures implementing the And/Or operators
merged I can't proceed with the public sync/atomic apis. This CL adds a
generic implementation that should work for all the remaining arches,
while waiting for the native assembly implementations in CL 531835,
CL 531678, CL 531895.

I regret the oversight of not pushing this earlier.

For #61395

Change-Id: Ib2d67f359fe324b4743eb79e9c8e52e8f6f5476c
GitHub-Last-Rev: d350927
GitHub-Pull-Request: #64214
Reviewed-on: https://go-review.googlesource.com/c/go/+/543175
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
gopherbot pushed a commit that referenced this issue Nov 20, 2023
Following CL 543035, this CL updates race syso for Linux/PPC64LE.
Now we have update all of them (except OpenBSD).

For #61395.
Fixes #62624.

Change-Id: I9e1d758355114a50ff206e5d78dc4ea8a06367d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/543397
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
mauri870 added a commit to mauri870/go that referenced this issue Nov 21, 2023
Turns out after adding the generic implementation for And/Or we ended up
with duplicated ops that are exactly the same for arm.

This CL removes the arm code and adds arm to the generic build flags.

For golang#61395
@gopherbot
Copy link

Change https://go.dev/cl/544017 mentions this issue: runtime/internal/atomic: deduplicate And/Or ops on arm

mauri870 added a commit to mauri870/go that referenced this issue Nov 21, 2023
When I initially added the wasm code for these ops I did not saw that
wasm actually has the Cas operations implemented, although they are
merely pointer assignments since wasm is single threaded.

Now with a generic implementation for And/Or we can add wasm to the
build tags.

For golang#61395
@gopherbot
Copy link

Change https://go.dev/cl/544116 mentions this issue: runtime/internal/atomic: deduplicate And/Or code on wasm

gopherbot pushed a commit that referenced this issue Nov 21, 2023
Turns out after adding the generic implementation for And/Or we ended up
with duplicated ops that are exactly the same for arm.

Apologies for the oversight, this CL removes the redundant arm code and
adds arm to the generic build flags.

For #61395

Change-Id: Id5e5a5cf113774948f8e772592e898d0810ad1f6
GitHub-Last-Rev: 4d8c857
GitHub-Pull-Request: #64299
Reviewed-on: https://go-review.googlesource.com/c/go/+/544017
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
gopherbot pushed a commit that referenced this issue Nov 22, 2023
When I initially added the wasm code for these ops I did not saw that
wasm actually has the Cas operations implemented, although they are
merely pointer assignments since wasm is single threaded.

Now with a generic implementation for And/Or we can add wasm to the
build tags.

For #61395

Change-Id: I997dc90477c772882d6703df1b795dfc0d90a699
GitHub-Last-Rev: 92736a6
GitHub-Pull-Request: #64300
Reviewed-on: https://go-review.googlesource.com/c/go/+/544116
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
mauri870 added a commit to mauri870/go that referenced this issue Nov 22, 2023
This CL adds race instrumentation for the new sync/atomic And/Or apis.

I belive the only missing implementation is for openbsd/amd64, that does
not have the newer tsan ops since it is stuck in tsan V2. I'll send the
race impl for openbsd alongside the sync/atomic code in a follow-up CL.

For golang#61395
@gopherbot
Copy link

Change https://go.dev/cl/544476 mentions this issue: runtime: race instrumentation for sync/atomic And/Or

@mauri870 mauri870 modified the milestones: Backlog, Go1.23 Nov 22, 2023
@mauri870
Copy link
Member

mauri870 commented Nov 22, 2023

Brief update: with Go1.22 entering a freeze, I've adjusted the milestone to Go1.23. Given our time constraints, it seems adequate to postpone since so far we have only internal bits merged.

@gopherbot
Copy link

Change https://go.dev/cl/544455 mentions this issue: sync/atomic: public And/Or apis

gopherbot pushed a commit that referenced this issue Feb 5, 2024
These primitives will be used by the new And/Or sync/atomic apis.

For #61395

Change-Id: I64b2e599e4f91412e0342aa01f5fd53271e9a333
GitHub-Last-Rev: 9755db5
GitHub-Pull-Request: #63314
Reviewed-on: https://go-review.googlesource.com/c/go/+/531895
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
These primitives will be used by the new And/Or sync/atomic apis.

For golang#61395

Change-Id: I64b2e599e4f91412e0342aa01f5fd53271e9a333
GitHub-Last-Rev: 9755db5
GitHub-Pull-Request: golang#63314
Reviewed-on: https://go-review.googlesource.com/c/go/+/531895
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Feb 20, 2024
These primitives will be used by the new And/Or sync/atomic apis.

For #61395

Change-Id: Ia9b4877048002d3d7d1dffa2311d0ec5f38e4ee5
GitHub-Last-Rev: 20dea11
GitHub-Pull-Request: #63318
Reviewed-on: https://go-review.googlesource.com/c/go/+/531678
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

Successfully merging a pull request may close this issue.