New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: Swapping elements of a [2]any
uses 2 separate writebarriers
#62126
Comments
It appears my question is much simpler: why do consecutive assignments to interfaces require multiple writebarrier checks, and why is this not required for consecutive assignments to pointer types? I'm not aware of the internals of gc, but this asymmetry strikes me as an unnecessary cost. |
The difference you're seeing is that when writing two pointers in a row, we can merge those two pointer writes under a single barrier check. When you're writing interfaces, or strings, the writes are pointer-scalar-pointer-scalar, so we need two write barrier checks because the intervening scalar write prevents merging the barrier checks. Note that for 1.21 the generated write barrier code has changed significantly. I think the behavior you observed is still present, but may have different performance characteristics. The new barriers also may make this easier to fix because the new barriers can be batched more easily. In particular, it should be easier to just plain ignore intervening scalar writes. |
That pointer-scalar split is what I was observing - though oddly, it reorders the scalars to be first, then the pointers to be next, and still splits them. I will try 1.21 to see if it is fixed. Very helpful response though, thank you. It may be that in practice, the write barrier is negligible enough that it doesn't matter to perform multiple times. |
Yes, that's so the register holding the scalar is used first, so if it dies at that point it doesn't have to be saved around the write barrier call. |
Here's a simple example for anyone looking at this:
It would be nice if that compiled to a single barrier check + a |
Change https://go.dev/cl/521498 mentions this issue: |
This lets us combine more write barriers, getting rid of some of the test+branch and gcWriteBarrier* calls. With the new write barriers, it's easy to add a few non-pointer writes to the set of values written. We allow up to 2 non-pointer writes between pointer writes. This is enough for, for example, adjacent slice fields. Fixes golang#62126 Change-Id: I872d0fa9cc4eb855e270ffc0223b39fde1723c4b Reviewed-on: https://go-review.googlesource.com/c/go/+/521498 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Uncertain
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
I was profiling some code and found that swapping 2 elements of a slice was slower than expected. I found that swapping elements if they were pointer types instead of interfaces reached the performance I expected.
This link shows the naive method of swapping elements, and the unsafe method which produces the asm and performance I expected: https://go.dev/play/p/NQZn03hsQU4
Here are the benchmarks:
My assumption is that the compiler should get the performance of the 3rd benchmark in which I use unsafe to force my expected asm to be generated, but is represented here as the 2nd benchmark.
What did you expect to see?
For the code
where arr is an array of the empty interface, I expected to see just one write barrier check, just as I do for the (paraphrased) asm for swapping a normal pointer type.What did you see instead?
I unexpectedly found two writebarrier checks, where the asm was writing the values of each interface with separate writebarrier checks, like this (paraphrased) asm.
By using unsafe to treat the [2]any as a [4]uintptr and doing 2 sets of swaps, I was able to observe the asm doing just 1 writebarrier check. I currently believe that one check has the same semantics as two checks (in regards to concurrent gc sweep correctness), and this might be incorrect. I'm almost certain however it has the same semantics in regards to concurrent observability promises.
The text was updated successfully, but these errors were encountered: